Pf3k pilot data release 4
Project: Pf3k

Released on 16 Oct 2015.

Parasite

This release contains sample information and accession numbers, analysis BAMs, and de novo variant discovery and genotyping results for 2,512 samples collected in 14 countries, plus five lab strains included for method development and validation.

At the time of their release, these data were subject to the Pf3k Pilot Phase Terms of Use. These restrictions were lifted in September 2016, and the dataset is now open access.

Data sets

4.1 Data

This data set contains de novo genotypes for the samples in the 4.0 data set. Genotyping, covering both SNP and indel variants, was performed with a pipeline based on GATK best practices (http://www.broadinstitute.org/gatk/guide/best-practices). These genotypes should not be taken as quality-controlled output of the Pf3k project; they are provided for public interest and as a basis for future methods development.

For more information, see the README files on the ftp site.

Files in this release include:

  • Per-chromosome VCF files (http://vcftools.sourceforge.net/specs.html) containing genotypes for all 4.0 samples at ~2M high-quality SNP and indel loci.
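
As a rough illustration of working with a per-chromosome VCF file, the sketch below tallies SNP and indel records using only the Python standard library. It assumes the files follow the standard tab-delimited VCF layout (REF in column 4, ALT in column 5). The chromosome and sample names in the example are invented, and the classification is a simplification: any record whose alleles are not all single bases is counted as an indel.

```python
import gzip
from collections import Counter


def classify_variant(ref, alt_field):
    """Return 'snp' if REF and every ALT allele are single bases, else 'indel'.

    Simplification: multi-base substitutions are also counted as 'indel'.
    """
    alts = [a for a in alt_field.split(",") if a not in (".", "*")]
    if len(ref) == 1 and all(len(a) == 1 for a in alts):
        return "snp"
    return "indel"


def count_variant_types(lines):
    """Count SNP and indel records in an iterable of VCF text lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, header and blank lines
        fields = line.rstrip("\n").split("\t")
        counts[classify_variant(fields[3], fields[4])] += 1
    return counts


def count_variant_types_in_file(path):
    """Same as count_variant_types, reading a plain or gzip-compressed file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_variant_types(handle)


# Invented records for illustration only; not taken from the release.
EXAMPLE_VCF = (
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample_A\n"
    "chr_example\t100\t.\tA\tT\t50\tPASS\t.\tGT\t0/1\n"
    "chr_example\t200\t.\tAT\tA\t50\tPASS\t.\tGT\t1/1\n"
    "chr_example\t300\t.\tG\tC,GA\t50\tPASS\t.\tGT\t0/2\n"
)

example_counts = count_variant_types(EXAMPLE_VCF.splitlines())
print(dict(example_counts))
```

For a downloaded file, `count_variant_types_in_file` can be called with the local path. For anything beyond simple tallies, a dedicated VCF library is likely a better choice than hand-rolled parsing.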

These data can be downloaded from the Wellcome Trust Sanger Institute public ftp site.

NOTE: Many browsers now do not support links to FTP sites. If you are experiencing difficulties, you may need to change your browser settings.

Go to FTP

4.0 Data

This data set includes sample information and analysis BAMs from 2,517 samples.

The files in this data set include:

  • A table of sample metadata in tab-delimited and Excel file formats. This table includes the accessions for downloading the sequence reads from the European Nucleotide Archive (ENA), the sampling location, the contributing partner study ID and contact person, and mapping metadata including sequence coverage metrics.
  • Analysis BAM files. These files, one per sample, contain alignments of the raw sequence reads to the 3D7v3.1 reference genome.
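
The tab-delimited metadata table can be read with Python's standard `csv` module. The sketch below groups one column by another, for example ENA accessions by sampling location. The column names (`sample`, `country`, `ena_accession`) and the rows are hypothetical placeholders; check the header row of the actual file, or the README files on the ftp site, for the real names.

```python
import csv
import io
from collections import defaultdict


def group_column_by(handle, key_column, value_column):
    """Group the values of one column by another in a tab-delimited table.

    `handle` is any open text file object whose first row holds column names.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    groups = defaultdict(list)
    for row in reader:
        groups[row[key_column]].append(row[value_column])
    return dict(groups)


# Hypothetical column names and rows, for illustration only.
EXAMPLE_TABLE = (
    "sample\tcountry\tena_accession\n"
    "sample_A\tCountry_X\tACC0001\n"
    "sample_B\tCountry_Y\tACC0002\n"
    "sample_C\tCountry_X\tACC0003\n"
)

accessions_by_country = group_column_by(
    io.StringIO(EXAMPLE_TABLE), "country", "ena_accession"
)
print(accessions_by_country)
```

With the real file, pass an open handle instead of the in-memory example, for instance `open(path, newline="")`, where `path` points to the downloaded table.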

These data can be downloaded from the Wellcome Trust Sanger Institute public ftp site.

Go to FTP

Known issues

VCF version
16 Oct 2015

The headers of the VCF files declare VCF specification version 4.1, but the files are in fact version 4.2. This results from a known bug in the version of GATK that was current at the time of release; see http://gatkforums.broadinstitute.org/gatk/discussion/5893/gatk-3-4-46-producing-v4-2-vcfs
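
Tools that rely on the declared version may need this taken into account. As a minimal sketch, the version a file declares can be read from its first line, which the VCF specification requires to be of the form `##fileformat=VCFv4.x`. The function names here are illustrative and not part of any Pf3k tooling.

```python
import gzip

FILEFORMAT_PREFIX = "##fileformat="


def parse_fileformat(first_line):
    """Return the declared version string (e.g. 'VCFv4.1') from a VCF's first line."""
    line = first_line.rstrip("\r\n")
    if not line.startswith(FILEFORMAT_PREFIX):
        raise ValueError("first line is not a ##fileformat declaration")
    return line[len(FILEFORMAT_PREFIX):]


def declared_vcf_version(path):
    """Read the declared version from a plain or gzip-compressed VCF file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        return parse_fileformat(handle.readline())


# The declaration described in this known issue.
print(parse_fileformat("##fileformat=VCFv4.1\n"))
```

This only reports what the header says; it does not verify which version of the specification the file body actually follows.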

Open access

Archived

Citations

To cite this release directly, please use the following format:

The Pf3K Project (2015): pilot data release 4. http://www.malariagen.net/data_package/pf3k-pilot-data-release-4/