Pf3k Pilot Phase Terms of Use

6 Oct 2015

The Pf3k Pilot Phase Terms of Use applied to Pilot Phase data releases when they were publicly released. In September 2016 these restrictions were lifted from Pf3k pilot data release packages 1-5. The Terms of Use below have now been archived and the data are available open access.

Data release

The Pf3k project will publicly release data on a regular basis and prior to publication. Raw sequence reads will be deposited in the European Nucleotide Archive (ENA) or the NCBI Short Read Archive (SRA). Alignments and variant calls will be released on individual samples, and data formats and software developed by the Project will be made publicly available. Associated sample metadata will be made available in the public domain through the MalariaGEN website and other public databases as appropriate. Public release of the data will be associated with contact information for the lead investigators that have contributed the samples.

The MalariaGEN P. falciparum Community Project and the Broad Institute, together with their partners, have agreed to contribute an initial data set to the Pf3k project. The Data Producers (the Pf3k Consortium and its contributing investigators) will release these data prior to publication, in the expectation that they will be valuable for many researchers and that their planned analyses using these data will be respected.

Terms of Use

To support the initial data release, the Consortium developed draft Terms of Use, adapted from those of the human 1000 Genomes project, which was widely viewed as a standard in data sharing for large-scale genetic variation data resource projects. Guided by the Fort Lauderdale Agreement for Sharing Data from Large-scale Biological Research Projects, Data Users may use the data for their own studies, but are expected to allow the Data Producers to first present and publish on their intended analyses.

Analyses of Project data

The Project plans to publish analyses of: sequence data and quality; SNPs, short tandem repeats, copy number variations and other structural variants; haplotypes and patterns of linkage disequilibrium; population genetic phenomena such as population comparisons, mutation and recombination rates, signals of selection and demographic analyses; functional annotations; and analyses of regions of general interest such as genes encoding merozoite surface proteins, var genes, and genes implicated in drug resistance. Because the data are derived from blood samples of infected individuals in a natural setting, some samples will include mixed infections of P. falciparum with other Plasmodium species, and this will form part of the Project analyses.

Talks, posters, and papers on all such analyses are to be published first by approved presenters on behalf of the Project. When these planned analyses have been published by the Project, then Data Users are free to present and publish using the Project data. For more information about the Project manuscript, see the Pf3k project page.

In consultation with the Project, Data Producers may make presentations and publish papers on more extensive analyses of specific topics coincident with the main Project analysis presentations and publications.

Methods development using Project data

Data Users who have used small amounts of Project data may present methods development posters, talks, and papers that include these data prior to the main Project publications. The Project should be acknowledged and cited using the format given below. Methods presentations or papers on global analyses, or on analyses using large amounts of Project data, are treated like large-scale analyses of Project data: Data Producers may make presentations or submit papers at the same time as the main Project presentations and papers, and others may do so after the Project publishes on the global analyses.

Candidate region studies using Project data

Data Users may present and publish on use of Project data in specific chromosome regions (unless otherwise stated in this document) or as summaries (such as the total number of variants), prior to the main Project publications. The Project should be acknowledged and cited using the format given below.

Population comparisons using Project data

Data Users may use Project data as controls or additional information for comparisons with their own data sets, prior to the main Project publications, provided this use does not conflict with global analyses of Project data. The Project should be acknowledged and cited using the format given below.

Project data exempt from terms of use

Data contributed by the Broad Institute for 137 samples from Senegal have previously been released open access and are therefore exempt from Project terms of use.

Acknowledging and citing the source of the data

Publications using Project data should cite the source using the following format: "This publication uses data generated by the Pf3k project ( and in [here give details of most recent relevant consortial publication]." Consortial publications will be listed on the Pf3k webpages.

Uses of the data to study other organisms

The open access sequence read data may only be used for Plasmodium genome analysis and must not be used to investigate humans or other organisms.


Data Users who have questions about whether they may make presentations or submit papers using Project data may contact the Pf3k Management Committee (pf3k_mc [at]