An open dataset of Plasmodium falciparum genome variation in 7,000 worldwide samples

24 Feb 2021
MalariaGEN et al.
Wellcome Open Research 2021, 6:42, DOI: 10.12688/wellcomeopenres.16168.1

This page provides information about the Pf6 dataset, which contains genome variation data on 7,000 worldwide samples of Plasmodium falciparum. The key publication is MalariaGEN et al., Wellcome Open Research 2021, 6:42, DOI: 10.12688/wellcomeopenres.16168.1. You can browse summary data using the Pf6 data exploration tool.

Background and previous releases

This dataset is based on the MalariaGEN Plasmodium falciparum Community Project, which supported groups around the world to integrate parasite genome sequencing into clinical and epidemiological studies of malaria. It comprises multiple partner studies, each with its own research objectives and led by a local investigator. Genome sequencing is performed centrally, and partner studies are free to analyse and publish the genetic data produced on their own samples, in line with MalariaGEN’s guiding principles on equitable data sharing.

Aggregated data from the Community Project were initially released through a companion project called Pf3k, whose goal was to bring together leading analysts from multiple institutions to benchmark and standardise methods of variant discovery and genotype calling. The Pf3k dataset can be explored using an interactive web application.

The open dataset was enlarged in 2016 when multiple partner studies contributed to a consortial publication on 3,488 samples from 23 countries.  The variants and genotypes described in this publication used version 3 of the analysis pipeline. Data produced using an earlier version of the data analysis pipeline can be explored using an interactive web application.

About the version 6 data pipeline

In 2018 the Plasmodium falciparum Community Project upgraded to version 6 of its variant discovery and genotype calling pipeline. Details of the methods can be found in the accompanying paper and here. The major change from previous versions is that the version 6 pipeline is based on GATK and uses findings on genome accessibility generated by the P. falciparum Genetic Crosses Project.

Content of the data release

This release contains details on contributing partner studies, sample metadata and key sample attributes inferred from genomic data, and genomic data including raw sequence reads. Further details and analytical results can be found in the accompanying data release paper.

These data are available open access. Publications using these data should acknowledge and cite the source of the data using the following format: "This publication uses data from the MalariaGEN Plasmodium falciparum Community Project as described in ‘An open dataset of Plasmodium falciparum genome variation in 7,000 worldwide samples’, MalariaGEN et al., Wellcome Open Research 2021, 6:42, DOI: 10.12688/wellcomeopenres.16168.1."

  • Study information: Details of the 49 contributing partner studies, including description, contact information and key people.
  • Sample provenance and sequencing metadata: sample information including partner study information, location and year of collection, ENA accession numbers, and QC information for 7,113 samples from 28 countries.
  • Measure of complexity of infections: characterisation of within-host diversity (FWS) for 5,970 QC pass samples.
  • Drug resistance marker genotypes: genotypes at known markers of drug resistance for 7,113 samples, containing amino acid and copy number genotypes at six loci: crt, dhfr, dhps, mdr1, kelch13, plasmepsin 2-3.
  • Inferred resistance status classification: classification of 5,970 QC pass samples by inferred resistance to 10 drugs or drug combinations, and by hrp2 and hrp3 gene deletions relevant to rapid diagnostic test (RDT) detection. The drugs and combinations are chloroquine, pyrimethamine, sulfadoxine, mefloquine, artemisinin, piperaquine, sulfadoxine-pyrimethamine for treatment of uncomplicated malaria, sulfadoxine-pyrimethamine for intermittent preventive treatment in pregnancy, artesunate-mefloquine and dihydroartemisinin-piperaquine.
  • Drug resistance markers to inferred resistance status: details of the heuristics utilised to map genetic markers to resistance status classification.
  • Gene differentiation: estimates of global and local differentiation for 5,561 genes.
  • Short variant genotypes: genotype calls on 6,051,696 SNPs and short indels in 7,113 samples from 29 countries, available as both VCF and Zarr files.
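The Fws measure listed above quantifies within-host diversity by comparing the heterozygosity observed inside a sample (Hw) with the heterozygosity of the population (Hs), as Fws = 1 - Hw/Hs. The Python sketch below is a simplified illustration under stated assumptions: it averages Hw and Hs over all covered biallelic SNPs rather than reproducing the exact procedure used for this release, and the function names and inputs (per-sample reference and alternative read counts, plus population alternative allele frequencies) are hypothetical.

```python
from typing import Sequence


def heterozygosity(p: float) -> float:
    """Expected heterozygosity of a biallelic site with allele frequency p."""
    return 1.0 - (p * p + (1.0 - p) * (1.0 - p))


def fws(
    ref_counts: Sequence[int],
    alt_counts: Sequence[int],
    pop_alt_freqs: Sequence[float],
) -> float:
    """Simplified Fws = 1 - mean(Hw) / mean(Hs) over covered SNPs.

    Hypothetical helper: the release's own Fws estimates may use a
    different procedure (e.g. binning SNPs by population allele frequency).
    """
    hw_values = []  # within-sample heterozygosity, from read counts
    hs_values = []  # population heterozygosity, from allele frequencies
    for ref, alt, pop_p in zip(ref_counts, alt_counts, pop_alt_freqs):
        depth = ref + alt
        if depth == 0:
            continue  # skip SNPs with no reads in this sample
        hw_values.append(heterozygosity(alt / depth))
        hs_values.append(heterozygosity(pop_p))
    if not hs_values or sum(hs_values) == 0:
        raise ValueError("need at least one covered, polymorphic SNP")
    mean_hw = sum(hw_values) / len(hw_values)
    mean_hs = sum(hs_values) / len(hs_values)
    return 1.0 - mean_hw / mean_hs


# A clonal infection: every covered SNP shows a single allele, so Fws = 1.
print(fws([30, 0, 25], [0, 40, 0], [0.3, 0.5, 0.2]))  # 1.0
# An even two-strain mix at common SNPs: Hw equals Hs, so Fws = 0.
print(fws([10, 10], [10, 10], [0.5, 0.5]))  # 0.0
```

Values close to 1 indicate an essentially clonal infection; a threshold such as Fws ≥ 0.95 is a commonly used convention for calling an infection monoclonal, but the README is the place to check how the values in this release should be interpreted.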

A README file describes in fine detail all the files included in the release, the format and interpretation of each column, and contains some tips and tricks for accessing genotype data in VCF and zarr files.
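As a minimal illustration of reading genotype calls from a VCF file, the Python sketch below parses the standard GT subfield using only the standard library. It is not the approach described in the README: the tiny VCF text, the sample names S1 and S2, and the "LowQual" filter label are made up, and working with over six million variants would more realistically use a dedicated library or the Zarr files.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Genotype = Optional[Tuple[int, ...]]  # None means a missing call


def parse_vcf_genotypes(lines: Iterable[str]):
    """Return sample names and (chrom, pos, ref, alts, calls) for PASS records."""
    samples: List[str] = []
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns follow FORMAT
            continue
        chrom, pos, _id, ref, alt, _qual, filt, _info, fmt = fields[:9]
        if filt != "PASS":
            continue  # keep only variants that passed all filters
        gt_index = fmt.split(":").index("GT")
        calls: Dict[str, Genotype] = {}
        for sample, value in zip(samples, fields[9:]):
            gt = value.split(":")[gt_index]
            alleles = gt.replace("|", "/").split("/")
            calls[sample] = None if "." in alleles else tuple(int(a) for a in alleles)
        records.append((chrom, int(pos), ref, alt.split(","), calls))
    return samples, records


# Hypothetical two-sample VCF; the second record fails a made-up filter.
VCF_TEXT = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n"
    "Pf3D7_07_v3\t100\t.\tA\tT\t50\tPASS\t.\tGT:AD\t0/0:10,0\t1/1:0,12\n"
    "Pf3D7_07_v3\t200\t.\tG\tC,T\t20\tLowQual\t.\tGT:AD\t0/1:5,5\t./.:0,0\n"
)

samples, records = parse_vcf_genotypes(VCF_TEXT.splitlines())
print(samples)        # ['S1', 'S2']
print(records[0][4])  # {'S1': (0, 0), 'S2': (1, 1)}
```

The parser treats phased (`|`) and unphased (`/`) separators alike and reports a call as missing if any allele is `.`; both are simplifying choices for the sketch.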

NOTE: Many browsers no longer support links to FTP sites. If you experience difficulties, you may need to change your browser settings.

Supplementary data

The following supplementary data are available as a single document download:

  • Supplementary Note 
    • Analysis of local differentiation score
    • The classic 76T chloroquine resistance mutation in crt is found on multiple haplotypes
    • Sulfadoxine-pyrimethamine resistance is widespread and associated with many haplotypes
    • mdr1 duplications have many different breakpoints
    • Artemisinin, piperaquine, and mefloquine resistance
    • No evidence of resistance to less commonly used antimalarials
  • Supplementary Table 1. Breakdown of analysis set samples by geography
  • Supplementary Table 2. Studies contributing samples
  • Supplementary Table 3. Summary of discovered variant positions
  • Supplementary Table 4. Breakpoints of duplications of gch1
  • Supplementary Table 5. Breakpoints of duplications of mdr1
  • Supplementary Table 6. Breakpoints of duplications of plasmepsin 2-3
  • Supplementary Table 7. Genes ranked by global differentiation score
  • Supplementary Table 8. Genes ranked by local differentiation score
  • Supplementary Table 9. Number of samples used to determine proportions in Table 2
  • Supplementary Table 10. Frequencies of mutations associated with mono- and multi-drug resistance pre- and post-2011
  • Supplementary Table 11. Frequency of crt amino acid 72-76 haplotypes
  • Supplementary Table 12. Frequencies of dhfr (51, 59, 108, 164) and dhps (437, 540, 581, 613) multi-locus haplotypes
  • Supplementary Table 13. Frequency of HRP2 and HRP3 deletions by country
  • Supplementary Table 14. Alleles at six mitochondrial positions used for species identification
  • Supplementary Figure 1. Histogram of local differentiation score for all genes
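The supplementary note on the crt 76T mutation and Supplementary Table 11 (crt amino acid 72-76 haplotypes) can be illustrated with a toy Python sketch that assembles a 72-76 haplotype string from per-codon amino acid calls and applies a single-marker rule: 76T counts as resistant and K76 as sensitive. This is a hypothetical simplification for illustration only; the heuristics actually used in this release are documented in the "Drug resistance markers to inferred resistance status" file, and the input format here (a dict of one-letter codes, with "-" for missing and "*" for mixed calls) is an assumption.

```python
from typing import Dict

CRT_CODONS = (72, 73, 74, 75, 76)  # codons spanning the classic crt haplotype


def crt_haplotype(calls: Dict[int, str]) -> str:
    """Join one-letter amino acid calls at crt codons 72-76; '-' marks missing."""
    return "".join(calls.get(codon, "-") for codon in CRT_CODONS)


def chloroquine_status(calls: Dict[int, str]) -> str:
    """Toy rule based only on codon 76 (hypothetical, not the release's heuristic)."""
    aa76 = calls.get(76, "-")
    if aa76 == "T":
        return "resistant"
    if aa76 == "K":
        return "sensitive"
    return "undetermined"  # missing ('-') or mixed ('*') call at codon 76


mutant = {72: "C", 73: "V", 74: "I", 75: "E", 76: "T"}
wild_type = {72: "C", 73: "V", 74: "M", 75: "N", 76: "K"}
print(crt_haplotype(mutant), chloroquine_status(mutant))        # CVIET resistant
print(crt_haplotype(wild_type), chloroquine_status(wild_type))  # CVMNK sensitive
```

Because the same 76T change can sit on several 72-76 backgrounds, as the supplementary note reports, keeping the full haplotype string alongside the inferred status is a cheap way to preserve that information.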

Publications that have used the P. falciparum Community Project data resource, prior to and including version 6

  • Auburn S, Campino S, Clark TG, et al. An effective method to purify Plasmodium falciparum DNA directly from clinical blood samples for whole genome high-throughput sequencing. PLoS One 2011; 6: e22213.
  • Venkatesan M, Amaratunga C, Campino S, et al. Using CF11 cellulose columns to inexpensively and effectively remove human DNA from Plasmodium falciparum-infected whole blood samples. Malar J 2012; 11: 41.
  • Manske M, Miotto O, Campino S, et al. Analysis of Plasmodium falciparum diversity in natural infections by deep sequencing. Nature 2012; 487: 375–9.
  • Vauterin P, Jeffery B, Miles A, et al. Panoptes: Web-based exploration of large scale genome variation data. Bioinformatics 2017; 33. DOI:10.1093/bioinformatics/btx410.
  • MalariaGEN Plasmodium falciparum Community Project. Genomic epidemiology of artemisinin resistant malaria. Elife 2016; 5. DOI:10.7554/eLife.08714.
  • Miotto O, Almagro-Garcia J, Manske M, et al. Multiple populations of artemisinin-resistant Plasmodium falciparum in Cambodia. Nat Genet 2013; 45: 648–55.
  • Ariey F, Witkowski B, Amaratunga C, et al. A molecular marker of artemisinin-resistant Plasmodium falciparum malaria. Nature 2014; 505: 50–5.
  • Nwakanma DC, Duffy CW, Amambua-Ngwa A, et al. Changes in malaria parasite drug resistance in an endemic population over a 25-year period with resulting genomic evidence of selection. J Infect Dis 2014; 209: 1126–35.
  • Ashley EA, Dhorda M, Fairhurst RM, et al. Spread of Artemisinin Resistance in Plasmodium falciparum Malaria. N Engl J Med 2014; 371: 411–23.
  • Kamau E, Campino S, Amenga-Etego L, et al. K13-propeller polymorphisms in Plasmodium falciparum parasites from sub-Saharan Africa. J Infect Dis 2014; 211: 1352–5.
  • Ravenhall M, Benavente ED, Mipando M, et al. Characterizing the impact of sustained sulfadoxine/pyrimethamine use upon the Plasmodium falciparum population in Malawi. Malar J 2016; 15: 575.
  • Gomes AR, Ravenhall M, Benavente ED, et al. Genetic diversity of next generation antimalarial targets: A baseline for drug resistance surveillance programmes. Int J Parasitol Drugs Drug Resist 2017; 7: 174–80.
  • Apinjoh TO, Mugri RN, Miotto O, et al. Molecular markers for artemisinin and partner drug resistance in natural Plasmodium falciparum populations following increased insecticide treated net coverage along the slope of mount Cameroon: Cross-sectional study. Infect Dis Poverty 2017; 6. DOI:10.1186/s40249-017-0350-y.
  • Ross LS, Dhingra SK, Mok S, et al. Emerging Southeast Asian PfCRT mutations confer Plasmodium falciparum resistance to the first-line antimalarial piperaquine. Nat Commun 2018; 9: 3314.
  • Amato R, Pearson RD, Almagro-Garcia J, et al. Origins of the current outbreak of multidrug-resistant malaria in southeast Asia: a retrospective genetic study. Lancet Infect Dis 2018; 18: 337–45.
  • Amambua-Ngwa A, Amenga-Etego L, Kamau E, et al. Major subpopulations of Plasmodium falciparum in sub-Saharan Africa. Science 2019; 365: 813–6.
  • Hamilton WL, Amato R, van der Pluijm RW, et al. Evolution and expansion of multidrug-resistant malaria in southeast Asia: a genomic epidemiology study. Lancet Infect Dis 2019; published online July. DOI:10.1016/S1473-3099(19)30392-5.
  • van der Pluijm RW, Imwong M, Chau NH, et al. Determinants of dihydroartemisinin-piperaquine treatment failure in Plasmodium falciparum malaria in Cambodia, Thailand, and Vietnam: a prospective clinical, pharmacological, and genetic study. Lancet Infect Dis 2019; published online July. DOI:10.1016/S1473-3099(19)30391-3.
  • Miotto O, Amato R, Ashley EA, et al. Genetic architecture of artemisinin-resistant Plasmodium falciparum. Nat Genet 2015; 47: 226–34.
  • Takala-Harrison S, Jacob CG, Arze C, et al. Independent Emergence of Artemisinin Resistance Mutations Among Plasmodium falciparum in Southeast Asia. J Infect Dis 2015; 211: 670–9.
  • Amato R, Lim P, Miotto O, et al. Genetic markers associated with dihydroartemisinin–piperaquine failure in Plasmodium falciparum malaria in Cambodia: a genotype–phenotype association study. Lancet Infect Dis 2017; 17: 164–73.
  • Borrmann S, Straimer J, Mwai L, et al. Genome-wide screen identifies new candidate genes associated with artemisinin susceptibility in Plasmodium falciparum in Kenya. Sci Rep 2013; 3: 3318.
  • Wendler JP, Okombo J, Amato R, et al. A Genome Wide Association Study of Plasmodium falciparum Susceptibility to 22 Antimalarial Drugs in Kenya. PLoS One 2014; 9: e96486.
  • Zhu L, Tripathi J, Rocamora FM, et al. The origins of malaria artemisinin resistance defined by a genetic and transcriptomic background. Nat Commun 2018; 9: 5158.
  • Sepúlveda N, Phelan J, Diez-Benavente E, et al. Global analysis of Plasmodium falciparum histidine-rich protein-2 (pfhrp2) and pfhrp3 gene deletions using whole-genome sequencing data and meta-analysis. Infect Genet Evol 2018; 62: 211–9.
  • Williams AR, Douglas AD, Miura K, et al. Enhancing blockade of Plasmodium falciparum erythrocyte invasion: assessing combinations of antibodies against PfRH5 and other merozoite antigens. PLoS Pathog 2012; 8: e1002991.
  • Benavente ED, Oresegun DR, de Sessions PF, et al. Global genetic diversity of var2csa in Plasmodium falciparum with implications for malaria in pregnancy and vaccine development. Sci Rep 2018; 8: 15429.
  • Amambua-Ngwa A, Tetteh KKA, Manske M, et al. Population genomic scan for candidate signatures of balancing selection to guide antigen characterization in malaria parasites. PLoS Genet 2012; 8: e1002992.
  • Campino S, Marin-Menendez A, Kemp A, et al. A forward genetic screen reveals a primary role for Plasmodium falciparum Reticulocyte Binding Protein Homologue 2a and 2b in determining alternative erythrocyte invasion pathways. PLOS Pathog 2018; 14: e1007436.
  • Crosnier C, Iqbal Z, Knuepfer E, et al. Binding of Plasmodium falciparum merozoite surface proteins DBLMSP and DBLMSP2 to human immunoglobulin M is conserved amongst broadly diverged sequence variants. J Biol Chem 2016; epub ahead. DOI:10.1074/jbc.M116.722074.
  • Amambua-Ngwa A, Jeffries D, Amato R, et al. Consistent signatures of selection from genomic analysis of pairs of temporal and spatial Plasmodium falciparum populations from the Gambia. Sci Rep 2018; 8. DOI:10.1038/s41598-018-28017-5.
  • Duffy CW, Amambua-Ngwa A, Ahouidi AD, et al. Multi-population genomic analysis of malaria parasites indicates local selection and differentiation at the gdv1 locus regulating sexual development. Sci Rep 2018; 8: 15763.
  • Duffy CW, Ba H, Assefa S, et al. Population genetic structure and adaptation of malaria parasites on the edge of endemic distribution. Mol Ecol 2017; 26: 2880–94.
  • Duffy CW, Assefa SA, Abugri J, et al. Comparison of genomic signatures of selection on Plasmodium falciparum between different regions of a country with high malaria endemicity. BMC Genomics 2015; 16: 527.
  • Mobegi VA, Duffy CW, Amambua-Ngwa A, et al. Genome-Wide Analysis of Selection on the Malaria Parasite Plasmodium falciparum in West African Populations of Differing Infection Endemicity. Mol Biol Evol 2014; 31: 1490–9.
  • Shetty AC, Jacob CG, Huang F, et al. Genomic structure and diversity of Plasmodium falciparum in Southeast Asia reveal recent parasite migration patterns. Nat Commun 2019; 10: 2665.
  • Auburn S, Campino S, Miotto O, et al. Characterization of within-host Plasmodium falciparum diversity using next-generation sequence data. PLoS One 2012; 7: e32891.
  • Assefa SA, Preston MD, Campino S, Ocholla H, Sutherland CJ, Clark TG. estMOI: estimating multiplicity of infection using parasite deep sequencing data. Bioinformatics 2014; 30: 1292–4.
  • Murray L, Mobegi VA, Duffy CW, et al. Microsatellite genotyping and genome-wide single nucleotide polymorphism-based indices of Plasmodium falciparum diversity within clinical infections. Malar J 2016; 15: 275.
  • Chang H-H, Worby CJ, Yeka A, et al. THE REAL McCOIL: A method for the concurrent estimation of the complexity of infection and SNP allele frequency for malaria parasites. PLoS Comput Biol 2017; 13: e1005348.
  • O’Brien JD, Iqbal Z, Wendler J, Amenga-Etego L. Inferring Strain Mixture within Clinical Plasmodium falciparum Isolates from Genomic Sequence Data. PLOS Comput Biol 2016; 12: e1004824.
  • Robinson T, Campino SG, Auburn S, et al. Drug-resistant genotypes and multi-clonality in Plasmodium falciparum analysed by direct genome sequencing from peripheral blood of malaria patients. PLoS One 2011; 6: in press.
  • O’Brien JD, Amenga-Etego L, Li R. Approaches to estimating inbreeding coefficients in clinical isolates of Plasmodium falciparum from genomic sequence data. Malar J 2016; 15: 473.
  • Zhu SJ, Almagro-Garcia J, McVean G. Deconvolution of multiple infections in Plasmodium falciparum from high throughput sequencing data. Bioinformatics 2018; 34: 9–15.
  • Zhu SJ, Hendry JA, Almagro-Garcia J, et al. The origins and relatedness structure of mixed infections vary with local prevalence of P. falciparum malaria. Elife 2019; 8. DOI:10.7554/eLife.40845.
  • Henden L, Lee S, Mueller I, Barry A, Bahlo M. Identity-by-descent analyses for measuring population dynamics and selection in recombining pathogens. PLOS Genet 2018; 14: e1007279.
  • Schaffner SF, Taylor AR, Wong W, Wirth DF, Neafsey DE. hmmIBD: software to infer pairwise identity by descent between haploid genotypes. Malar J 2018; 17: 196.
  • Samad H, Coll F, Preston MD, Ocholla H, Fairhurst RM, Clark TG. Imputation-Based Population Genetics Analysis of Plasmodium falciparum Malaria Parasites. PLoS Genet 2015; 11: e1005131.
  • Ravenhall M, Campino S, Clark TG. SV-Pop: population-based structural variant analysis and visualization. BMC Bioinformatics 2019; 20: 136.
  • Jacob CG, Tan JC, Miller BA, et al. A microarray platform and novel SNP calling algorithm to evaluate Plasmodium falciparum field samples of low DNA quantity. BMC Genomics 2014; 15: 719.
  • Preston MD, Assefa SA, Ocholla H, et al. PlasmoView: A Web-based Resource to Visualise Global Plasmodium falciparum Genomic Variation. J Infect Dis 2014; 209: 1808–15.