Catalogue of Genetic Variation in P. falciparum - v6.0

Released on 13 Nov 2020.


This page contains information about the Pf6 data release: data generated by the Plasmodium falciparum Community Project using the version 6 pipeline for variant discovery and genotype calling. This release contains sample information, accession numbers and genotype calls for samples as described in Plasmodium falciparum Community Project: about the version 6 data.

In 2018 the Plasmodium falciparum Community Project upgraded to version 6 of its variant discovery and genotype calling pipeline. Details of the methods can be found in the accompanying paper. The major change from previous versions is that the version 6 pipeline is based on GATK and uses the findings on genome accessibility generated by the P. falciparum Genetic Crosses Project.

These data are available open access. Publications using these data should acknowledge and cite the source of the data using the following format: “This publication uses data from the MalariaGEN Plasmodium falciparum Community Project as described in ‘An open dataset of Plasmodium falciparum genome variation in 7,000 worldwide samples. MalariaGEN et al, Wellcome Open Research 2021, 6:42. DOI: 10.12688/wellcomeopenres.16168.1’”.

Data sets

Study information

Details of the 49 contributing partner studies, including description, contact information and key people.

Download study information.

Sample provenance and sequencing metadata

Sample information including partner study information, location and year of collection, ENA accession numbers, and QC information for 7,113 samples from 28 countries.

Download sample provenance and sequencing metadata.
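As an illustration of working with the sample metadata once it is downloaded, the following minimal sketch counts samples per country and how many of them passed QC. It assumes the file is tab-delimited with a header row and has columns named `Country` and `QC pass`; these column names and the `True`/`False` encoding are assumptions, so check them against the README before relying on them. The sample rows are made up for the example.

```python
import csv
import io
from collections import Counter


def count_samples_by_country(handle, country_col="Country", qc_col="QC pass"):
    """Return (total, qc_passed) Counters keyed by country.

    `handle` is any open text file or file-like object containing
    tab-delimited data with a header row. The column names are
    assumptions and can be overridden.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    totals = Counter()
    passed = Counter()
    for row in reader:
        country = row[country_col]
        totals[country] += 1
        # Assumed encoding: the QC column holds the text "True" or "False".
        if row[qc_col].strip().lower() == "true":
            passed[country] += 1
    return totals, passed


# Made-up rows standing in for the real metadata file.
example = (
    "Sample\tCountry\tQC pass\n"
    "S1\tGhana\tTrue\n"
    "S2\tGhana\tFalse\n"
    "S3\tCambodia\tTrue\n"
)
totals, passed = count_samples_by_country(io.StringIO(example))
for country in sorted(totals):
    print(country, totals[country], passed[country])
```

To run this against the downloaded file, pass an open file handle, for example `open(path, newline="")`, in place of the `io.StringIO` object.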

Measure of complexity of infections

Characterisation of within-host diversity (Fws) for 5,970 QC pass samples.

Download measure of complexity of infections.
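Fws values are often summarised by splitting samples into predominantly clonal and mixed infections. The sketch below does this with a threshold of 0.95, which is a commonly used convention rather than something stated on this page. It also assumes a tab-delimited file with columns named `Sample` and `Fws`; both names are assumptions, and the example values are made up.

```python
import csv
import io

# Assumption: Fws above 0.95 is treated as a predominantly clonal infection.
CLONAL_THRESHOLD = 0.95


def classify_fws(handle, sample_col="Sample", fws_col="Fws",
                 threshold=CLONAL_THRESHOLD):
    """Map each sample ID to "clonal" or "mixed" based on its Fws value.

    Rows whose Fws value is missing or not numeric are skipped.
    """
    labels = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        try:
            fws = float(row[fws_col])
        except (TypeError, ValueError):
            continue  # missing or malformed value
        labels[row[sample_col]] = "clonal" if fws > threshold else "mixed"
    return labels


# Made-up values standing in for the real file.
example = "Sample\tFws\nS1\t0.99\nS2\t0.60\nS3\t\n"
labels = classify_fws(io.StringIO(example))
print(labels)
```

The threshold is a parameter so that a different cut-off can be tried without changing the function.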

Drug resistance marker genotypes

Genotypes at known markers of drug resistance for 7,113 samples, containing amino acid and copy number genotypes at six loci: crt, dhfr, dhps, mdr1, kelch13, plasmepsin 2-3.

Download drug resistance marker genotypes.

Inferred resistance status classification

Classification of 5,970 QC pass samples into different types of resistance to 10 drugs or combinations of drugs and to RDT detection: chloroquine, pyrimethamine, sulfadoxine, mefloquine, artemisinin, piperaquine, sulfadoxine-pyrimethamine for treatment of uncomplicated malaria, sulfadoxine-pyrimethamine for intermittent preventive treatment in pregnancy, artesunate-mefloquine, dihydroartemisinin-piperaquine, and hrp2 and hrp3 gene deletions.

Download inferred resistance status classification.

Drug resistance markers to inferred resistance status

Details of the heuristics utilised to map genetic markers to resistance status classification.

Download drug resistance markers to inferred resistance status.

Gene differentiation

Estimates of global and local differentiation for 5,561 genes.

Download gene differentiation.

Short variant genotypes

Genotype calls on 6,051,696 SNPs and short indels in 7,113 samples from 29 countries, available both as VCF and zarr files.

Download genotype call VCF: ftp://ngs.sanger.ac.uk/production/malaria/pfcommunityproject/Pf6/Pf_6_vcf/

Download genotype call zarr: ftp://ngs.sanger.ac.uk/production/malaria/pfcommunityproject/Pf6/Pf_6.zarr.zip

NOTE: You may need to download a free FTP client to access the FTP links.
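As an alternative to a graphical FTP client, the files can in principle be fetched from a script. The following is a minimal sketch using only the Python standard library, which can open `ftp://` URLs. The helper names `split_ftp_url` and `download` are hypothetical, and the download itself is not run here because the archive may be large.

```python
import posixpath
from urllib.parse import urlparse
from urllib.request import urlretrieve

# URL taken from the zarr download link above.
ZARR_URL = (
    "ftp://ngs.sanger.ac.uk/production/malaria/"
    "pfcommunityproject/Pf6/Pf_6.zarr.zip"
)


def split_ftp_url(url):
    """Split an ftp:// URL into (host, remote directory, file name)."""
    parts = urlparse(url)
    if parts.scheme != "ftp":
        raise ValueError(f"not an FTP URL: {url}")
    return (
        parts.hostname,
        posixpath.dirname(parts.path),
        posixpath.basename(parts.path),
    )


def download(url, dest=None):
    """Fetch `url` to a local file and return the local path.

    By default the file is saved in the current directory under its
    remote file name.
    """
    _, _, filename = split_ftp_url(url)
    target = dest or filename
    urlretrieve(url, target)
    return target


host, directory, filename = split_ftp_url(ZARR_URL)
print(host, directory, filename)
# download(ZARR_URL)  # uncomment to fetch the archive
```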

Release notes

Acknowledging and citing these data:
18 Apr 2023

This release contains details on contributing partner studies, sample metadata and key sample attributes inferred from genomic data, and genomic data including raw sequence reads. Further details and analytical results can be found in the accompanying data release paper: An open dataset of Plasmodium falciparum genome variation in 7,000 worldwide samples. MalariaGEN et al, Wellcome Open Research 2021, 6:42. DOI: 10.12688/wellcomeopenres.16168.1

A README file describes in fine detail all the files included in the release, the format and interpretation of each column, and contains some tips and tricks for accessing genotype data in VCF and zarr files.

Open access

Our approach to sharing data

Data package contact

Citations

Publications using these data should acknowledge and cite the source of the data using the following format:

“This publication uses data from the MalariaGEN Plasmodium falciparum Community Project as described in ‘An open dataset of Plasmodium falciparum genome variation in 7,000 worldwide samples. MalariaGEN et al, Wellcome Open Research 2021, 6:42. DOI: 10.12688/wellcomeopenres.16168.1’”