About the data

This web application is designed to share the findings of the MalariaGEN Plasmodium falciparum Community Project, an international collaboration with partner studies in over 20 malaria-endemic countries.

We are preparing a scientific publication that will describe the methods in more detail and present further analyses of the data. We have made this web application openly available ahead of publication because we want to share this information with the research community as soon as possible. Please treat the data in the web application as provisional until the accompanying paper is finalised and accepted for publication. That paper should be cited whenever the data are used in other publications; we will update this page with citation information as soon as it is available.

We ask that users respect the interests of the many individuals who have contributed to this work. If you have questions regarding data use, please email Vikki Cornelius, vikkic@well.ox.ac.uk.

This web application is under active development, and we invite your feedback: how you are using the tool, what works well, any bugs you find, and ideas for new features or functionality. Get in touch by emailing appsupport@malariagen.net.

Building an atlas of parasite genome variation

A core goal of the Community Project is to establish an atlas of P. falciparum genome variation that is of direct benefit to ongoing scientific research and ultimately to malaria control.

In an earlier analysis of 227 parasite samples collected at six locations in Africa, Asia and Oceania, we identified more than 86,000 exonic single nucleotide polymorphisms (SNPs). This initial SNP catalogue can be browsed and queried online and is described in detail in Manske, Miotto et al., Nature (2012), PMID 22722859. It was also used for the analysis published in Miotto et al., Nature Genetics (2013), PMID 23624527.

The data presented here are based on analysis of 3,248 samples from 40 separate locations in 20 countries. The number of SNPs has increased to over 600,000, primarily because of the larger number of samples in this dataset.

Users should refer to the accompanying paper (see above) for details of the samples and methods used. In brief, parasite DNA was obtained from blood samples collected from patients with malaria. Short-read DNA sequence data were generated using Illumina technology and mapped against the 3D7 reference genome. Subsequent analytical steps yielded an initial list of more than 2 million SNPs, which was narrowed down to about 400,000 exonic SNPs after applying stringent quality-control filters. All samples included in this dataset were genotyped at each typable SNP. Population genetic analyses were performed to determine the major divisions of global population structure, and each sample was assigned to one of seven major population groups. Allele frequencies were then calculated for each of the seven populations. Additional analysis was performed on variation in four genes previously associated with drug resistance: pfcrt, pfdhfr, pfdhps and pfmdr1.
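
For readers unfamiliar with per-population allele frequencies, the following Python sketch illustrates the general idea for a single SNP. It is not the project's pipeline, which is described in the accompanying paper. It assumes, for simplicity, that each sample contributes one allele call per SNP (0 for the reference allele, 1 for the non-reference allele, None for a missing call). The function name, sample identifiers and population labels are all hypothetical.

```python
from collections import defaultdict


def allele_frequencies(genotypes, populations):
    """Return {population: non-reference allele frequency} for one SNP.

    genotypes:   dict mapping sample id -> 0 (reference), 1 (non-reference),
                 or None (missing call). Hypothetical encoding.
    populations: dict mapping sample id -> population label.
    """
    non_ref = defaultdict(int)  # non-reference calls per population
    called = defaultdict(int)   # samples with a usable call per population
    for sample, call in genotypes.items():
        if call is None:
            continue  # samples with missing calls are excluded from the denominator
        pop = populations[sample]
        called[pop] += 1
        non_ref[pop] += call
    # Populations with no called samples are omitted from the result.
    return {pop: non_ref[pop] / called[pop] for pop in called}


# Illustrative data only; these are not real samples or populations.
genotypes = {"s1": 0, "s2": 1, "s3": 1, "s4": None, "s5": 1, "s6": 0}
populations = {"s1": "A", "s2": "A", "s3": "B", "s4": "B", "s5": "B", "s6": "B"}
freqs = allele_frequencies(genotypes, populations)
```

In this toy example, population A has one non-reference call among two called samples, and population B has two among three, because the sample with a missing call is not counted.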