Using MalariaGEN Data
Before using MalariaGEN data, please read more about the terms that govern the datasets you are looking at.

April 2024 update:

Currently, MalariaGEN data are stored in Google Cloud Storage (GCS) buckets which are publicly accessible, meaning that data can be retrieved anonymously. This has minimised the technical hurdles that users have to navigate before they can start analysing data, making it very convenient for all parties.

However, we need to change the way the data storage is configured, to require users to authenticate prior to accessing data. Find out more about the changes.
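As a rough illustration of what the move from anonymous to authenticated access means at the request level, here is a minimal sketch using only the Python standard library. It builds, but does not send, a download request against the Google Cloud Storage JSON API, once without credentials and once with an OAuth 2.0 bearer token. The bucket name, object path and token value are hypothetical placeholders, and `build_object_request` is an illustrative helper, not part of any MalariaGEN tooling. How you obtain a token depends on your setup (for example, `gcloud auth print-access-token`), and the access route MalariaGEN recommends may differ from this sketch.

```python
from typing import Optional
from urllib.parse import quote
from urllib.request import Request

# Base URL of the Google Cloud Storage JSON API.
GCS_API = "https://storage.googleapis.com/storage/v1"


def build_object_request(bucket: str, object_name: str,
                         access_token: Optional[str] = None) -> Request:
    """Build (but do not send) a download request for one GCS object."""
    # The object name is percent-encoded as a single path segment,
    # so any slashes inside it become %2F.
    url = (f"{GCS_API}/b/{quote(bucket, safe='')}"
           f"/o/{quote(object_name, safe='')}?alt=media")
    request = Request(url)
    if access_token is not None:
        # Authenticated access: attach an OAuth 2.0 bearer token.
        request.add_header("Authorization", f"Bearer {access_token}")
    return request


# Hypothetical names, for illustration only.
anonymous = build_object_request("example-bucket", "path/to/file.txt")
authenticated = build_object_request("example-bucket", "path/to/file.txt",
                                     access_token="TOKEN")
```

Once a bucket no longer allows anonymous reads, only a request like the second one, carrying a valid token for an account that has been granted access, would be expected to succeed.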

Request data access

Everyone who wants to access these data will still be able to, but users will no longer be able to access data anonymously.

Fill out this form

Our approach to data-sharing is governed by two simple principles:

  1. Data-sharing must be equitable and appropriate, and
  2. Data users must acknowledge the work of contributing researchers by citing the data source.

These principles are applied in different ways for different data releases because the definitions of “equitable” and “appropriate” can vary. They depend on the community of researchers involved, the scientific objectives of the project for which the data were generated, and the nature of the data. For example, human data are subject to different restrictions from parasite data.

MalariaGEN data resources – amounting to more than 130 terabytes to date – have been generated in collaboration with independent researchers in malaria-endemic countries through a number of distinct projects with varied scientific and translational objectives.

To maximise the benefit of these data resources to the scientific community, many of our projects make data available online ahead of their own analyses and publications. Some resources are open access, but data are typically released under specific Terms of Use or data release policies, which aim to balance ease of access against the sustainability of the science and the underlying partnerships.

The relevant Terms of Use or data release policy will clearly state any restrictions on data usage, outline acceptable data use, and provide guidance on how to cite the data source.

All data users are expected to respect the contributions of their scientific colleagues by abiding by the relevant terms of use, where applicable, and always providing appropriate acknowledgement.

Terms of Use

This page provides detail about how to apply for access to human GWAS data and links to the relevant policies regarding data use.
These Terms of Use were applied to the Ag1000G Phase 1 and Phase 2 data releases when they were first publicly released, and were lifted in March 2022. These datasets are now available open access. The Terms of Use still apply to Phase 3 datasets.
Although malaria is generally an endemic rather than an epidemic disease, and the focus of this project is on surveillance of disease vectors rather than pathogens, our data terms of use build on MalariaGEN’s approach to data sharing, and adopt norms which have been established for rapid sharing of pathogen genomic data during disease outbreaks.
The Pf3k Pilot Phase Terms of Use were applied to Pilot Phase data releases when they were publicly released. In September 2016 these restrictions were lifted from Pf3k pilot data release packages 1-5. The Terms of Use below have now been archived and the data are available open access.
These Terms of Use were applied to the P. falciparum Community Project Jan. 2016 data release when it was first publicly released, and were lifted in February 2017. This dataset is now available open access.
These Terms of Use were applied to the Anopheles funestus Genomic Surveillance Project