Our primary goal is to undertake a comprehensive analysis of genome variation in 3,000 parasite samples representing the major malaria endemic regions of the world. In doing so, we'll:
Provide an open set of P. falciparum genome sequence data that captures common variation across multiple populations in different parts of the world
Use a combination of short- and long-read sequencing technologies in controlled settings to establish standards for accuracy and completeness in the inference of P. falciparum genome sequence variation and to characterise the quality of information obtained from standard approaches
Combine information from read-mapping, full de novo assembly, variant assembly and iterative reassembly of specific genes to obtain the most comprehensive resource on P. falciparum variation to date
Develop new high-quality reference genomes that will increase the resolution and accuracy of variation analysis across the whole sample set
Analyse the data to learn about parasite population structure, epidemiology and history, mutational and recombinational processes generating diversity, evolutionary processes including drug resistance and immune evasion, and how such phenomena differ between populations and regions
The primary output of the project will be an open access data resource with companion publications on genomic diversity and population genetics that together provide a detailed description of P. falciparum genome variation across the major malaria endemic regions.
Other outputs will include papers on methodology and the standardisation of protocols for P. falciparum sequence analysis and genotype calling. All of the underlying data will be made publicly available for use by the scientific community, initially under Fort Lauderdale conditions.
Scientific working groups will drive forward specific areas of analysis, including statistics and population genetics (led by Gil McVean and Roberto Amato), technology benchmarking (led by Dan Neafsey and Jim Stalker) and reference genomes (led by Matt Berriman and Thomas Otto). The MalariaGEN Resource Centre will provide support for partner studies, data production pipelines, communications and project management. The Project is overseen by the Pf3k Management Committee, which comprises the working group leaders, with support from members of the MalariaGEN Resource Centre.
The Pf3k project will have several discrete phases, beginning with a pilot phase that commenced in June 2014. During the pilot phase, the Project is analysing Illumina short-read sequence data on 2,512 samples from multiple locations in Africa and Asia, together with laboratory samples for benchmarking and methods development. The MalariaGEN P. falciparum Community Project and the Broad Institute, together with their partners, have contributed the samples for the pilot phase. The Project will generate genotype calls by a range of different methods, compare those methods and report performance metrics for each.
During the pilot phase, the Project is undertaking a series of planned analyses of the Pilot Phase data (2,512 samples) that will form the basis of a manuscript, 'A global reference for genomic variation in Plasmodium falciparum'. The planned analyses cover:
Sequence data and quality including SNPs, short tandem repeats, haplotypes and patterns of linkage disequilibrium
Population genetic phenomena such as population comparisons, mutation and recombination rates (haplotype structure and LD)
Signals of selection and demographic analyses
Merozoite surface proteins
var genes and genes implicated in drug resistance
Congo (Democratic Republic of the Congo) (CD)
The Gambia (GM)
We work with investigators who are pursuing independent partner studies in a number of malaria-endemic countries.
The Project will publicly release data on a regular basis and prior to publication. Raw sequence reads will be deposited in either the European Nucleotide Archive (ENA) or the NCBI. Alignments and variant calls will be released for individual samples, and data formats and software developed by the Project will be made publicly available. Associated sample information will be made available in the public domain through the MalariaGEN website and other public databases as appropriate. Each public data release will include contact information for the lead investigators who contributed the samples.