Pf3k pilot data release 3

Project: Pf3k

Released on 14 Apr 2015

This release contains sample information, accession numbers, and baseline genotypes for 2,512 samples: the 1,931 samples included in the 2.0 pilot data release plus an additional 581 samples collected in Ghana, Mali and Malawi.

At the time of their release, these data were subject to the Pf3k Pilot Phase Terms of Use. In September 2016 these restrictions were lifted, and the dataset is now open access.

3.0 Data

This data set comprises sample information for 2,512 P. falciparum samples.

This data set includes:

  • A table of sample metadata in tab-delimited and Excel file formats. This table includes:
    • Accessions for downloading the sequence reads from the European Nucleotide Archive (ENA)
    • Sampling location
    • Contributing partner study ID and contact person
    • Mapping metadata including sequence coverage metrics

These data can be downloaded from the Wellcome Trust Sanger Institute public FTP site.

3.1 Data

This data set contains baseline genotypes for the 3.0 sample set. These genotypes are based on a set of high-quality SNP loci from the MalariaGEN partner studies. However, the samples have not been through de novo variant discovery as a set, so the genotypes should not be taken as a quality-controlled output of the Pf3k project. They are provided for public interest and as a basis for future methods development. For more information, see the README files on the FTP site.

Note that the set of SNP loci in this release is larger than in the earlier Pf3k releases (944k compared with 682k), so this release contains additional genotype data for previously released samples.

This data set includes:

These data can be downloaded from the Wellcome Trust Sanger Institute public FTP site.

Release notes

9 Feb 2016
Analysis BAMs removed

This release previously contained analysis BAM files, one per sample, aligned to the 3D7_v3 reference. These data have been superseded; please see Pf3k pilot data release 5 for the latest analysis BAMs for this sample set.

Known issues

19 May 2015
Incorrect DP values

For some SNPs, the Total Depth (DP) field in both the 'INFO' and 'genotypes' columns is greater than the sum of the allele depths (AD). If you use this field, we recommend that you recalculate it prior to any analysis.
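One possible way to recalculate DP is sketched below in Python. It is a minimal illustration, not the project's own procedure. It assumes that the VCF uses the standard AD and DP keys in the FORMAT column and a DP key in INFO, and that the intended value of DP is the sum of the allele depths: per sample for the genotype columns, and across all samples for INFO. The function name recalc_dp is hypothetical.

```python
def recalc_dp(vcf_line: str) -> str:
    """Return a VCF data line with DP recomputed from AD.

    Assumption: DP should equal the sum of the AD values, per sample in
    the genotype columns and summed over all samples in INFO.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    fmt_keys = fields[8].split(":")
    if "AD" not in fmt_keys:
        return "\t".join(fields)  # nothing to recompute from
    ad_idx = fmt_keys.index("AD")
    dp_idx = fmt_keys.index("DP") if "DP" in fmt_keys else None

    total = 0
    for i in range(9, len(fields)):
        values = fields[i].split(":")
        if ad_idx >= len(values):
            continue  # missing genotype such as "./." with trailing keys dropped
        # Skip missing allele depths written as "."
        depths = [int(x) for x in values[ad_idx].split(",") if x.isdigit()]
        sample_dp = sum(depths)
        total += sample_dp
        if dp_idx is not None and dp_idx < len(values):
            values[dp_idx] = str(sample_dp)
            fields[i] = ":".join(values)

    # Replace only an existing INFO DP entry; other INFO keys are untouched.
    info = fields[7].split(";")
    fields[7] = ";".join(
        f"DP={total}" if item.startswith("DP=") else item for item in info
    )
    return "\t".join(fields)
```

The function works on one tab-separated data line at a time, so it can be applied while streaming through the file; header lines (those starting with "#") should be passed through unchanged.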

19 May 2015
Samples with zero reads

86 samples in the VCF file are incorrectly reported as having zero reads at every position of the genome. However, the BAM files and statistics generated from these samples are correct.
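Affected samples can also be found by inspecting the VCF directly. The Python sketch below is a rough illustration under stated assumptions: it assumes the allele depths are stored under the standard AD key in the FORMAT column, and it treats a sample as having zero reads if no record shows a positive allele depth for it. The function name zero_read_samples is hypothetical.

```python
def zero_read_samples(vcf_lines):
    """Return names of samples with no positive allele depth in any record.

    Assumption: per-sample read counts are stored under the AD FORMAT key.
    """
    names = []
    has_reads = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            names = fields[9:]  # sample names follow the 9 fixed columns
            has_reads = [False] * len(names)
            continue
        fmt_keys = fields[8].split(":")
        if "AD" not in fmt_keys:
            continue
        ad_idx = fmt_keys.index("AD")
        for j, sample in enumerate(fields[9:]):
            values = sample.split(":")
            if ad_idx >= len(values):
                continue  # missing genotype with trailing keys dropped
            depths = values[ad_idx].split(",")
            if any(x.isdigit() and int(x) > 0 for x in depths):
                has_reads[j] = True
    return [name for name, seen in zip(names, has_reads) if not seen]
```

Any iterable of lines can be passed in, for example an open file handle on an uncompressed VCF.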

For a list of affected samples, see the README file for this data release: