Ag1000G phase 1 preview data release
Project: Ag1000G

Released on 21 May 2014.

Mosquito

This page contains information about the phase 1 preview data release from the Anopheles gambiae 1000 Genomes project. This data release comprises variant calls on 103 samples collected in Uganda.

Any use of Project data is subject to the Terms of Use.

Data sets

Downloads

This data release comprises variant call data, available as either VCF or HDF5 format files, and other supporting data files, including a table of sample metadata.

All of the data files included in this release can be downloaded from the Wellcome Trust Sanger Institute public FTP site.

The same data files are also available from Amazon S3; see the following URL for a list of file locations:

If you are downloading files, please use the Sanger FTP site where possible. The ag1000g-eu S3 bucket is hosted in the eu-west-1 region, and so is fastest and most cost-efficient when downloading data into AWS compute resources hosted in the same region.

NOTE: Many browsers no longer support links to FTP sites. If you have difficulty opening the FTP site, you may need to change your browser settings or use a dedicated FTP client.


Known issues

Incorrect data associated with missing calls (HDF5 only)
2 Jun 2014

In the HDF5 format files, where a genotype call is missing, the other call data fields (e.g., GQ, AD, DP) may have incorrect values due to a bug in the format conversion software. This applies only to missing genotype calls; for all other calls, the data fields in the HDF5 format files are correct and correspond to the data in the VCF format files.
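One defensive way to work around this when reading the HDF5 files is to mask the per-call fields wherever the genotype is missing, so the incorrect values never enter downstream statistics. The sketch below is a minimal NumPy illustration, not part of the release. It assumes (unconfirmed by this page) that genotypes have been loaded as an integer array of shape (n_variants, n_samples, ploidy) with missing alleles coded as -1, and the function name `mask_missing_calls` is hypothetical.

```python
import numpy as np


def mask_missing_calls(genotypes, field):
    """Return `field` as a masked array, hiding values at missing genotype calls.

    Hypothetical helper. Assumes `genotypes` has shape
    (n_variants, n_samples, ploidy) with missing alleles coded as -1, and that
    `field` (e.g. GQ, DP, or AD) has shape (n_variants, n_samples, ...).
    """
    # Treat a call as missing if any of its alleles is missing.
    missing = np.any(genotypes < 0, axis=-1)
    # Add trailing axes so the mask also covers per-allele fields such as AD.
    extra_dims = field.ndim - missing.ndim
    mask = missing.reshape(missing.shape + (1,) * extra_dims)
    mask = np.broadcast_to(mask, field.shape).copy()
    return np.ma.masked_array(field, mask=mask)


# Example: two variants x two samples, diploid; sample 2 is missing at variant 1.
gt = np.array([[[0, 1], [-1, -1]],
               [[1, 1], [0, 0]]])
gq = np.array([[99, 12],
               [50, 40]])
masked_gq = mask_missing_calls(gt, gq)
print(masked_gq.mean())  # mean GQ over non-missing calls only: 63.0
```

Masking rather than deleting keeps the array shapes aligned with the genotype array, so the masked fields can still be indexed by variant and sample.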

Missing filters on FS, MQ, QD and ReadPosRankSum
30 May 2014

Four of the FILTER annotations that are declared in the header of the VCF were not actually applied to the variants due to an error in the VCF processing pipeline. These FILTER annotations are:

##FILTER=<ID=FS,Description="FS > 60">
##FILTER=<ID=MQ,Description="MQ < 40">
##FILTER=<ID=QD,Description="QD < 5">
##FILTER=<ID=ReadPosRankSum,Description="ReadPosRankSum < -8">

If you use these data, we recommend that you apply these variant filters yourself before any analysis. If you use GATK to apply them, your JEXL expressions must use the correct value type: these are all Float fields, so, e.g., the expression for the FS filter should be “FS > 60.0” rather than “FS > 60”.
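The four filters can also be reapplied without GATK. The following pure-Python sketch is a hypothetical illustration, not part of the release. It takes each threshold from the FILTER declarations above, parses the INFO column of each data line of an uncompressed VCF, and appends the IDs of any failed filters to the FILTER column. Records that lack one of the annotations are left unfiltered for that annotation; this is an assumption you may want to change. All function names are illustrative.

```python
# Thresholds from the FILTER declarations in the VCF header.
# Each predicate returns True when the variant FAILS that filter.
UNAPPLIED_FILTERS = {
    "FS": lambda v: v > 60.0,
    "MQ": lambda v: v < 40.0,
    "QD": lambda v: v < 5.0,
    "ReadPosRankSum": lambda v: v < -8.0,
}


def parse_info(info):
    """Parse a VCF INFO column (e.g. 'DP=100;FS=12.3;DB') into a dict."""
    fields = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True  # flags such as DB have no value
    return fields


def failed_filters(info):
    """Return the IDs of the unapplied filters that this record fails."""
    fields = parse_info(info)
    failed = []
    for filter_id, fails in UNAPPLIED_FILTERS.items():
        value = fields.get(filter_id)
        # Assumption: a missing annotation does not fail the filter.
        if value is None or value is True or value == ".":
            continue
        if fails(float(value)):  # compare as floats, as these are Float fields
            failed.append(filter_id)
    return failed


def apply_filters(vcf_line):
    """Return a VCF data line with failed filter IDs added to FILTER."""
    cols = vcf_line.rstrip("\n").split("\t")
    # VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO ...
    failed = failed_filters(cols[7])
    if failed:
        existing = [] if cols[6] in (".", "PASS") else cols[6].split(";")
        cols[6] = ";".join(existing + failed)
    return "\t".join(cols)


def filter_vcf(in_path, out_path):
    """Copy a VCF, passing header lines through and refiltering data lines."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(line if line.startswith("#") else apply_filters(line) + "\n")
```

For a bgzipped VCF, `gzip.open(path, "rt")` can replace `open` on the input side. Note that a record which was PASS and now fails one of these filters loses its PASS status, which matches the usual VCF convention.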

Multiallelic filter
30 May 2014

This preview release is a subset of a larger callset that will be released in the near future. The Multiallelic filter was applied to the larger callset rather than to this subset, so some variants annotated as Multiallelic in this preview release will actually have only two segregating alleles among the samples included here.
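To find which Multiallelic-annotated variants are effectively biallelic in this preview release, you can count the distinct alleles actually observed in the 103 samples. The minimal NumPy sketch below is a hypothetical illustration, not part of the release. It assumes genotypes in an integer array of shape (n_variants, n_samples, ploidy), with alleles coded 0 (REF), 1, 2, ... (ALT) and missing alleles coded as -1, and the function name is illustrative.

```python
import numpy as np


def count_segregating_alleles(genotypes):
    """Count the distinct alleles observed at each variant.

    Hypothetical helper. Assumes `genotypes` has shape
    (n_variants, n_samples, ploidy), alleles coded 0 (REF), 1, 2, ... (ALT),
    and missing alleles coded as -1 (these are excluded from the count).
    """
    counts = np.empty(len(genotypes), dtype=int)
    for i, calls in enumerate(genotypes):
        observed = np.unique(calls)
        counts[i] = np.count_nonzero(observed >= 0)
    return counts


gt = np.array([
    [[0, 0], [0, 1], [-1, -1]],  # REF and ALT1 observed -> 2
    [[0, 2], [1, 1], [0, 0]],    # REF, ALT1 and ALT2 observed -> 3
    [[2, 2], [0, 2], [0, 0]],    # REF and ALT2 observed -> 2
])
n_alleles = count_segregating_alleles(gt)
print(n_alleles)  # [2 3 2]
```

Given a boolean array marking variants whose FILTER column includes Multiallelic, `is_multiallelic & (n_alleles == 2)` would then flag the variants that are biallelic within this release.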

Open access

Archived


Citations

To cite these data directly, please use the following citation format:

The Anopheles gambiae 1000 Genomes Consortium (2014): Ag1000G phase 1 preview data release. MalariaGEN. http://www.malariagen.net/data_package/aag1000g-phase-1-preview-data-release/