Pf3k pilot data release 3
Project: Pf3k

Released on 14 Apr 2015.

Parasite

This release contains sample information, accession numbers, and baseline genotypes for 2,512 samples, comprising the 1,931 samples included in the 2.0 pilot data release and an additional 581 samples collected in Ghana, Mali and Malawi.

At the time of their release, these data were subject to the Pf3k Pilot Phase Terms of Use. In September 2016, these restrictions were lifted and this dataset is now available open access.

Data sets

3.0 Data

This data set comprises sample information for 2,512 P. falciparum samples.

This data set includes:

  • A table of sample metadata in tab-delimited and Excel file formats. This table includes:
    • Accessions for downloading the sequence reads from the European Nucleotide Archive (ENA)
    • Sampling location
    • Contributing partner study ID and contact person
    • Mapping metadata including sequence coverage metrics
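As a rough illustration of working with the tab-delimited metadata table, the Python sketch below reads it with the standard library, counts samples per country and looks up the ENA accessions for one sample. The column names `sample`, `country` and `run_accessions`, and the assumption that several accessions for one sample share a single comma-separated field, are hypothetical. Check the header of the actual file and adjust the constants before use.

```python
import csv
from collections import Counter

# HYPOTHETICAL column names: the real header of the Pf3k metadata table may differ.
SAMPLE_COL = "sample"
COUNTRY_COL = "country"
ACCESSION_COL = "run_accessions"


def load_metadata(lines):
    """Read a tab-delimited table (file handle or iterable of lines) into row dicts."""
    return list(csv.DictReader(lines, delimiter="\t"))


def samples_per_country(rows):
    """Count samples per sampling country."""
    return Counter(row[COUNTRY_COL] for row in rows)


def accessions_for(rows, sample_id):
    """Return the ENA accessions for one sample.

    Assumes (hypothetically) that multiple accessions are comma-separated.
    Returns an empty list if the sample is not in the table.
    """
    for row in rows:
        if row[SAMPLE_COL] == sample_id:
            return [acc.strip() for acc in row[ACCESSION_COL].split(",") if acc.strip()]
    return []
```

Usage: `rows = load_metadata(open(path, newline=""))`, where `path` is wherever you saved the downloaded tab-delimited file.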

These data can be downloaded from the Wellcome Trust Sanger Institute public FTP site.

NOTE: Many browsers no longer support links to FTP sites. If you are having difficulties, you may need to change your browser settings.

Go to FTP

3.1 Data

This data set contains baseline genotypes for the 3.0 sample set. These genotypes are based on a set of high-quality SNP loci from the MalariaGEN partner studies. However, these samples have not been through de novo variant discovery as a set, so the genotypes should not be taken as a quality-controlled output of the Pf3k project. They are provided for public interest and as a basis for future methods development. For more information, see the README files on the FTP site.

Note that this release provides genotypes at a larger set of SNP loci than earlier Pf3k releases (944k compared with 682k), so it also contains additional genotype data for previously released samples.

This data set includes:

  • A VCF file (http://vcftools.sourceforge.net/specs.html) containing genotypes for all 3.0 samples at 944k high-quality SNP loci.
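A minimal sketch for sanity-checking the VCF after download: it reads the sample names from the `#CHROM` header line and counts the data records, using only the Python standard library. The file name is not given on this page, so `open_vcf` simply assumes that a `.gz` suffix means gzip or bgzip compression (Python's `gzip` module reads both). The record count should be in the region of the 944k SNP loci described above.

```python
import gzip


def open_vcf(path):
    """Open a VCF as text; assumes a '.gz' suffix means gzip/bgzip compression."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def summarise_vcf(lines):
    """Return (sample_names, number_of_data_records) from VCF text lines.

    Streams the input line by line, so it works on large files.
    """
    samples = []
    n_records = 0
    for line in lines:
        if line.startswith("##"):
            continue  # meta-information lines
        if line.startswith("#CHROM"):
            # Fixed columns: CHROM POS ID REF ALT QUAL FILTER INFO FORMAT;
            # sample names start at the 10th column.
            samples = line.rstrip("\r\n").split("\t")[9:]
            continue
        if line.strip():
            n_records += 1
    return samples, n_records
```

Usage: `with open_vcf(path) as fh: samples, n = summarise_vcf(fh)`, where `len(samples)` should match the number of samples in the 3.0 set.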

These data can be downloaded from the Wellcome Trust Sanger Institute public FTP site.

Go to FTP

Release notes

MedRxiv
9 Feb 2016

This release previously contained analysis BAM files, one-per-sample, aligned to the 3D7_v3 reference. These data have been superseded; please see Pf3k pilot data release 5 for the latest analysis BAMs for this sample set.

Known issues

Incorrect DP values
19 May 2015

For some SNPs, the total depth (DP) field in both the ‘INFO’ and ‘genotypes’ columns is greater than the sum of the allelic depths (AD). If you use this field, we recommend that you recalculate it before any analysis.
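One way to follow this recommendation is to recompute each sample's DP as the sum of its AD values. The sketch below does this for a single VCF data line. It assumes the standard VCF layout (FORMAT in column 9, samples from column 10). It rewrites only the per-sample DP and leaves the INFO-level DP untouched, because how that value should be rebuilt depends on your analysis.

```python
def recompute_dp(format_keys, sample_field):
    """Return DP recomputed as the sum of AD for one sample, or None.

    format_keys: the FORMAT column split on ':' (e.g. ['GT', 'AD', 'DP']).
    sample_field: that sample's column (e.g. '0/1:10,5:20').
    Returns None when AD is absent or missing ('.').
    """
    values = dict(zip(format_keys, sample_field.split(":")))
    ad = values.get("AD")
    if ad is None or ad == ".":
        return None
    # AD holds one depth per allele (REF first); skip missing entries.
    return sum(int(x) for x in ad.split(",") if x != ".")


def fix_record_dp(line):
    """Rewrite the per-sample DP values of one VCF data line from AD.

    Lines without a DP key in FORMAT are returned unchanged. Samples whose
    AD is missing keep their original DP.
    """
    cols = line.rstrip("\r\n").split("\t")
    keys = cols[8].split(":")
    if "DP" not in keys:
        return line
    dp_idx = keys.index("DP")
    for i in range(9, len(cols)):
        parts = cols[i].split(":")
        new_dp = recompute_dp(keys, cols[i])
        # Trailing FORMAT fields may be dropped, so check DP is present.
        if new_dp is not None and dp_idx < len(parts):
            parts[dp_idx] = str(new_dp)
            cols[i] = ":".join(parts)
    return "\t".join(cols) + "\n"
```

Applied to every data line of the file, with header lines passed through unchanged, this produces a copy of the VCF whose per-sample DP agrees with AD.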

Samples with zero reads
19 May 2015

86 samples in the VCF file are incorrectly reported as having zero reads at every position of the genome. However, the BAM files and statistics generated from these samples are correct.

For a list of affected samples, see the README file for this data release.
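The README is the authoritative list of affected samples. To cross-check it against your copy of the VCF, the sketch below flags samples whose depth is zero or missing at every record. Which field shows the zero-read problem is an assumption here, so the depth field is a parameter (`AD` by default, `DP` as an alternative). The function streams the file line by line, so it does not need to load the whole VCF into memory.

```python
def zero_depth_samples(lines, depth_key="AD"):
    """Return samples whose depth_key sums to zero (or is missing) at every record.

    depth_key is an assumption: 'AD' (comma-separated per-allele depths)
    or 'DP' (a single value); both are summed the same way. Records whose
    FORMAT lacks depth_key are skipped, so if no record has it, every
    sample is reported.
    """
    samples = []
    has_reads = []
    for line in lines:
        if line.startswith("##"):
            continue
        cols = line.rstrip("\r\n").split("\t")
        if line.startswith("#CHROM"):
            samples = cols[9:]
            has_reads = [False] * len(samples)
            continue
        if not line.strip():
            continue
        keys = cols[8].split(":")
        if depth_key not in keys:
            continue
        idx = keys.index(depth_key)
        for j, field in enumerate(cols[9:]):
            parts = field.split(":")
            if idx < len(parts):
                depth = sum(int(x) for x in parts[idx].split(",") if x not in (".", ""))
                if depth > 0:
                    has_reads[j] = True
    return [s for s, seen in zip(samples, has_reads) if not seen]
```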

Open access

Archived

Citations

To cite this release directly, please use the following format:

The Pf3K Project (2015): pilot data release 3.

http://www.malariagen.net/data_package/pf3k-pilot-data-release-3/