The Pf3k project will publicly release data on a regular basis and prior to publication. Raw sequence reads will be deposited in the European Nucleotide Archive (ENA) or the NCBI Sequence Read Archive (SRA). Alignments and variant calls will be released for individual samples, and data formats and software developed by the Project will be made publicly available. Associated sample metadata will be made available in the public domain through the MalariaGEN website and other public databases as appropriate. Each public data release will include contact information for the lead investigators who contributed the samples.
The MalariaGEN P. falciparum Community Project and the Broad Institute, together with their partners, have agreed to contribute an initial data set to the Pf3k project. The Data Producers (the Pf3k Consortium and its contributing investigators) will release these data prior to publication, in the expectation that the data will be valuable to many researchers and that the Data Producers' planned analyses of these data will be respected.
Analyses of Project data
The Project plans to publish analyses of: sequence data and quality; SNPs, short tandem repeats, copy number variations and other structural variants; haplotypes and patterns of linkage disequilibrium; population genetic phenomena such as population comparisons, mutation and recombination rates, signals of selection and demographic analyses; functional annotations; and analyses of regions of general interest such as genes encoding merozoite surface proteins, var genes, and genes implicated in drug resistance. Because the data are derived from blood samples of infected individuals in a natural setting, some samples will include mixed infections of P. falciparum with other Plasmodium species, and the analysis of these mixed infections will form part of the Project analyses.
Talks, posters, and papers on all such analyses are to be published first by approved presenters on behalf of the Project. Once these planned analyses have been published by the Project, Data Users are free to present and publish using the Project data. For more information about the Project manuscript, see the Pf3k project page.
In consultation with the Project, Data Producers may make presentations and publish papers on more extensive analyses of specific topics coincident with the main Project analysis presentations and publications.
Methods development using Project data
Data Users who have used small amounts of Project data may present methods development posters, talks, and papers that include these data prior to the main Project publications. The Project should be acknowledged and cited using the format given below. Methods presentations or papers on global analyses, or on analyses using large amounts of Project data, would be treated in the same way as large-scale analyses of Project data: Data Producers may make presentations or submit papers at the same time as the main Project presentations and papers, and others could do so after the Project publishes on the global analyses.
Candidate region studies using Project data
Data Users may present and publish on the use of Project data in specific chromosome regions (unless otherwise stated in this document) or as summaries (such as the total number of variants), prior to the main Project publications. The Project should be acknowledged and cited using the format given below.
Population comparisons using Project data
Data Users may use Project data as controls or additional information for comparisons with their own data sets, prior to the main Project publications, provided this use does not conflict with global analyses of Project data. The Project should be acknowledged and cited using the format given below.
Acknowledging and citing the source of the data
Publications using Project data should cite the source using the following format: "This publication uses data generated by the Pf3k project (www.malariagen.net/pf3k) and in [here give details of most recent relevant consortial publication]." Consortial publications will be listed on the Pf3k webpages.
Uses of the data to study other organisms
The open access sequence read data may only be used for Plasmodium genome analysis and must not be used to investigate humans or other organisms.
Data Users who have questions about whether they may make presentations or submit papers using Project data may contact the Pf3k Management Committee (pf3k_mc [at] malariagen.net).