Changes to MalariaGEN cloud data access

News 18 Apr 2024

Read about the changes below and fill out the data access request form to maintain access to MalariaGEN data.

Upcoming changes to accessing MalariaGEN data in the cloud

MalariaGEN data resources provide an integrated view of malaria parasite and vector genomes from across the globe. These data are available to everyone to benefit the science and surveillance of malaria.

We currently use Google Cloud to host MalariaGEN data resources. Storing data in the cloud enables users from anywhere in the world to access and analyse the data for free via cloud computing services such as Google Colab. Making genomic data accessible to the widest possible audience is an important goal for MalariaGEN.

To help us continue providing cloud data resources for the community, we are making some changes to how data are accessed. If you are actively using MalariaGEN data, or are planning to use it in the future, please read this carefully and follow the steps below.

What is changing?

Currently, MalariaGEN data are stored in Google Cloud Storage (GCS) buckets which are publicly accessible, meaning that data can be retrieved anonymously. This has minimised the technical hurdles that users have to navigate before they can start analysing data, making it very convenient for all parties.

However, we need to change the storage configuration so that users must authenticate before accessing data. Everyone who wants to access these data will still be able to do so, but anonymous access will no longer be possible.

Why is it changing?

We are making this change because we are finding that some users are repeatedly downloading large volumes of data from GCS to locations outside of Google Cloud. This is inefficient because network transfer is relatively slow: users who want to perform computations locally would be better served by downloading the data to local storage once and then accessing it locally thereafter. Excessive data transfer out of Google Cloud also increases our running costs, potentially impacting our ability to maintain a free service for other users.
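The "download once, then work locally" pattern described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not part of the malariagen_data package: `fetch_once` is a hypothetical helper, and `download` stands in for whatever tool you use to copy a single file out of cloud storage.

```python
from pathlib import Path
from typing import Callable


def fetch_once(
    remote_path: str,
    cache_dir: Path,
    download: Callable[[str, Path], None],
) -> Path:
    """Return a local copy of remote_path, downloading it only on first use.

    `download` is a hypothetical callable supplied by the caller that copies
    the remote object at `remote_path` into the given local file.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Drop any URL scheme (e.g. "gs://") and flatten the rest into one file name.
    local_name = remote_path.split("://", 1)[-1].replace("/", "_")
    local_path = cache_dir / local_name
    if not local_path.exists():
        # Download to a temporary name first, so an interrupted transfer is
        # never mistaken for a complete cached copy on the next run.
        partial_path = local_path.with_name(local_name + ".partial")
        download(remote_path, partial_path)
        partial_path.replace(local_path)
    # Every later call is served from local disk without any network transfer.
    return local_path
```

Only the first call for a given path triggers a network transfer; later analyses read the cached file from local disk. This sketch handles single files only.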

Also, some MalariaGEN data resources, such as those generated by the MalariaGEN Vector Observatory, balance the needs of public health and scientific research: all data are made immediately accessible for public health and educational purposes, but data owners may request a publication embargo of up to 2 years. Requiring authentication will also allow us to ensure that all data users are aware that some data within MalariaGEN data resources are subject to terms of use, including a publication embargo.
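Because a window of "up to 2 years" is easy to misjudge near its boundaries, the sketch below shows the date arithmetic for the maximum case. It is illustrative only: the function names and the assumption that the window starts on a single `release_date` are hypothetical, and the terms of use attached to each dataset determine the actual embargo dates.

```python
from datetime import date


def embargo_end(release_date: date, years: int = 2) -> date:
    """Return the date on which an embargo of `years` years would lapse."""
    try:
        return release_date.replace(year=release_date.year + years)
    except ValueError:
        # 29 February has no counterpart in a non-leap year; use 28 February.
        return release_date.replace(year=release_date.year + years, day=28)


def under_embargo(release_date: date, today: date, years: int = 2) -> bool:
    """True while `today` falls before the end of the embargo window."""
    return today < embargo_end(release_date, years)
```

For example, `embargo_end(date(2024, 4, 18))` gives `date(2026, 4, 18)`.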

What do I have to do?

If you are an existing user or would like to access MalariaGEN data in future, please now complete the following four steps.

Step 1. Make sure you have a Google Account

To allow us to configure data access permissions, you will need to provide us with an email address that is associated with a Google account. This could be a standard Google (i.e., Gmail) account, or alternatively it could be your work email if your employer uses Google Workspace.

Step 2. Fill out the data access request form

Please fill out and submit the following form:

https://forms.gle/kCqistorZyxaU4LP7 

All requests for data access will be granted, subject to verification checks and agreement to reasonable use. This is to ensure that the data resources remain accessible to everyone. Submitting this form will allow us to configure storage permissions and monitor storage for excessive network usage in future.

Step 3. Upgrade the malariagen_data Python package

If you access data via the malariagen_data Python package, please upgrade to version 9.0 or higher. These newer versions will automatically use your authentication credentials when accessing data in Google Cloud.
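To check whether the package in your current Python environment already meets this requirement, something like the following sketch can be used; upgrading itself is usually done with `pip install --upgrade malariagen_data`. The helper names are hypothetical, and `parse_release` deliberately looks only at the leading numeric components, so a pre-release such as `9.0.0rc1` is treated as `9.0.0`. The `packaging` library's `Version` class handles full version semantics if you need them.

```python
import re
from importlib import metadata
from typing import Optional, Tuple


def parse_release(version: str) -> Tuple[int, ...]:
    """Leading numeric release components, e.g. "9.0.0rc1" -> (9, 0, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(version: str, minimum: Tuple[int, ...] = (9, 0)) -> bool:
    """True if `version` is at least `minimum`, ignoring pre-release suffixes."""
    release = parse_release(version)
    width = max(len(release), len(minimum))
    # Pad with zeros so that, for example, (9,) compares equal to (9, 0).
    padded_release = release + (0,) * (width - len(release))
    padded_minimum = minimum + (0,) * (width - len(minimum))
    return padded_release >= padded_minimum


def installed_version(dist: str = "malariagen_data") -> Optional[str]:
    """Installed version of the distribution, or None if it is not installed."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    current = installed_version()
    if current is None:
        print("malariagen_data is not installed")
    elif meets_minimum(current):
        print(f"malariagen_data {current} is recent enough")
    else:
        print(f"malariagen_data {current} found; please upgrade to 9.0 or higher")
```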

Step 4. Set up Google Cloud authentication credentials

If you are only accessing data via the malariagen_data Python package from within Google Colab, you can skip this step, because authentication credentials will be obtained automatically.

If you are accessing data from any other location, you will need to authenticate with Google Cloud. Further details of authentication are provided in the user guide documentation.
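Outside Colab, Google Cloud client libraries generally look for Application Default Credentials (ADC), which you can create with the Google Cloud CLI command `gcloud auth application-default login`; the user guide remains the authoritative reference for the setup malariagen_data expects. As a hedged sketch, the helpers below (all hypothetical names) check whether a credentials file is present in the standard ADC locations: the file named by the `GOOGLE_APPLICATION_CREDENTIALS` environment variable, otherwise the file `gcloud` writes under its configuration directory. Colab is detected with a simple `sys.modules` heuristic. The code does not validate the credentials, and it ignores other ADC sources such as the Compute Engine metadata server.

```python
import os
import sys
from pathlib import Path
from typing import Mapping, Optional

ADC_FILE_NAME = "application_default_credentials.json"


def running_in_colab() -> bool:
    """Heuristic: Colab imports the google.colab module at kernel start-up."""
    return "google.colab" in sys.modules


def default_adc_path(env: Mapping[str, str]) -> Path:
    """Where `gcloud auth application-default login` writes its credentials file."""
    if env.get("CLOUDSDK_CONFIG"):
        # An explicit gcloud configuration directory takes precedence.
        config_dir = Path(env["CLOUDSDK_CONFIG"])
    elif os.name == "nt":
        config_dir = Path(env.get("APPDATA", "")) / "gcloud"
    else:
        config_dir = Path.home() / ".config" / "gcloud"
    return config_dir / ADC_FILE_NAME


def find_adc_file(env: Optional[Mapping[str, str]] = None) -> Optional[Path]:
    """Return the credentials file that would be used, or None if none exists."""
    env = os.environ if env is None else env
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        # In this sketch, an explicitly configured file must exist; no fallback.
        path = Path(explicit)
        return path if path.is_file() else None
    path = default_adc_path(env)
    return path if path.is_file() else None


def needs_manual_auth(env: Optional[Mapping[str, str]] = None) -> bool:
    """True if credentials still need to be set up before accessing data."""
    if running_in_colab():
        return False
    return find_adc_file(env) is None
```

Calling `needs_manual_auth()` with no arguments inspects the real environment; passing a dictionary, as in `needs_manual_auth({"CLOUDSDK_CONFIG": "/tmp/gcloud"})`, makes the check easy to test.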

When will the changes happen?

We are planning to change the storage configuration and remove anonymous access on or before 25 April 2024.

However, it will take some time to process applications and configure permissions, so please submit your data access request as soon as possible to ensure continued access. We may also need to implement the changes sooner than planned, so please don’t delay.

How can I get help?

If you have any questions about these changes, need technical assistance, or would like advice on getting the best performance when accessing data in cloud storage, please get in touch with us via support@malariagen.net – we’re more than happy to help!

Thanks in advance for your kind understanding as we work through these changes.