Ag1000G phase 1 preview data release

Project: Ag1000G

Released on 21 May 2014

This page contains information about the phase 1 preview data release from the Anopheles gambiae 1000 Genomes project. This data release comprises variant calls on 103 samples collected in Uganda.

Any use of Project data is subject to the Terms of Use.


This data release comprises variant call data, available as either VCF or HDF5 format files, and other supporting data files, including a table of sample metadata.

All of the data files included in this release can be downloaded from the Wellcome Trust Sanger Institute public FTP site.

The same data files are also available from Amazon S3; see the following URL for a list of file locations:

If you are downloading files, please use the Sanger FTP site where possible. The ag1000g-eu S3 bucket is hosted in the eu-west-1 region, and so is fastest and most cost-efficient when downloading data into AWS compute resources hosted in the same region.

Known issues

2 Jun 2014
Incorrect data associated with missing calls (HDF5 only)

In the HDF5 format files, where there is a missing genotype call, other data fields (e.g., GQ, AD, DP) may have incorrect values due to a bug in the format conversion software. This applies only to missing genotype calls; for all other calls, the data fields in the HDF5 format files are correct and correspond to the data in the VCF format files.
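One way to guard against this issue is to blank out the affected fields wherever the genotype call is missing before doing any analysis. The following is a minimal sketch in plain Python, not part of the release. It assumes the genotype and field values have already been read from the HDF5 file into nested lists, and that a missing allele is encoded as -1; both are assumptions about the file layout that should be checked against the actual data. The function names are hypothetical.

```python
MISSING = -1  # ASSUMPTION: missing alleles are encoded as -1 in the HDF5 data


def is_missing(genotype):
    """Return True if any allele in the genotype call is missing."""
    return any(allele == MISSING for allele in genotype)


def mask_missing_calls(genotypes, field, fill=None):
    """Return a copy of `field` with values replaced by `fill` where the
    corresponding genotype call is missing.

    genotypes: rows (variants) of per-sample allele tuples,
               e.g. [[(0, 1), (-1, -1)], ...]
    field:     rows of per-sample values (e.g. GQ or DP), same shape
    """
    return [
        [fill if is_missing(gt) else value for gt, value in zip(gt_row, val_row)]
        for gt_row, val_row in zip(genotypes, field)
    ]


# Illustrative values only: two variants, two samples.
genotypes = [[(0, 0), (-1, -1)], [(0, 1), (1, 1)]]
gq = [[99, 37], [45, 60]]
masked_gq = mask_missing_calls(genotypes, gq)
```

Here the GQ value for the second sample at the first variant is replaced, because that genotype call is missing. The same function can be applied to DP; for AD, which holds several values per call, the fill value would need to match that shape.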

30 May 2014
Missing filters on FS, MQ, QD and ReadPosRankSum

Four of the FILTER annotations that are declared in the header of the VCF were not actually applied to the variants due to an error in the VCF processing pipeline. These FILTER annotations are:

##FILTER=<ID=FS,Description="FS > 60">

##FILTER=<ID=MQ,Description="MQ < 40">

##FILTER=<ID=QD,Description="QD < 5">

##FILTER=<ID=ReadPosRankSum,Description="ReadPosRankSum < -8">

If you use these data, it is recommended that you apply these variant filters yourself prior to any analysis. If you use GATK to apply these filters, you must use JEXL expressions with the correct value type. These are all Float fields, so, e.g., the correct expression for the FS filter is "FS > 60.0".
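For readers who prefer not to use GATK, the four filters can also be applied directly to the INFO column of each VCF record. The following is a minimal sketch in plain Python, not the project's pipeline. The thresholds come from the FILTER header lines above; the function names are hypothetical, and the choice to skip a filter when its annotation is absent from a record is an assumption.

```python
# Thresholds taken from the FILTER header lines; all values are floats.
FILTERS = {
    "FS": lambda v: v > 60.0,
    "MQ": lambda v: v < 40.0,
    "QD": lambda v: v < 5.0,
    "ReadPosRankSum": lambda v: v < -8.0,
}


def parse_info(info):
    """Parse a VCF INFO string into a dict; flag entries map to True."""
    out = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out


def failed_filters(info):
    """Return the names of the filters that this INFO string fails."""
    fields = parse_info(info)
    failed = []
    for name, fails in FILTERS.items():
        raw = fields.get(name)
        if raw is None or raw is True:
            continue  # ASSUMPTION: an absent annotation does not fail the filter
        try:
            value = float(raw)
        except ValueError:
            continue  # non-numeric value, e.g. "."
        if fails(value):
            failed.append(name)
    return failed


def apply_filters(vcf_line):
    """Return the record with any newly failed filters added to FILTER."""
    cols = vcf_line.rstrip("\n").split("\t")
    existing = [] if cols[6] in (".", "PASS", "") else cols[6].split(";")
    merged = existing + [f for f in failed_filters(cols[7]) if f not in existing]
    if merged:
        cols[6] = ";".join(merged)
    return "\t".join(cols)


# Illustrative record only, not taken from the release.
line = "2L\t100\t.\tA\tT\t50.0\tPASS\tFS=75.2;MQ=38.0;QD=12.5\tGT\t0/1"
filtered = apply_filters(line)
```

In this example the record fails FS and MQ, so its FILTER column changes from PASS to FS;MQ. Records that fail none of the four filters keep their original FILTER value.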

30 May 2014
Multiallelic filter

This preview release is a subset of a larger callset which will be released in the near future. The Multiallelic filter was applied to the larger callset, and so some variants annotated in this preview release as Multiallelic will actually only have two segregating alleles.
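To check whether a variant annotated as Multiallelic really has more than two alleles segregating among the samples in this preview release, the distinct alleles observed in the genotype calls can be counted. The following is a minimal sketch, assuming genotypes are given as tuples of allele indices (0 for the reference allele) with -1 for a missing allele; the function names and the encoding are assumptions, not part of the release.

```python
def segregating_alleles(genotypes, missing=-1):
    """Return the set of allele indices observed across called genotypes."""
    return {allele for gt in genotypes for allele in gt if allele != missing}


def is_multiallelic_in_subset(genotypes, missing=-1):
    """Return True if more than two distinct alleles are observed."""
    return len(segregating_alleles(genotypes, missing)) > 2


# Illustrative calls for one variant with two alternate alleles declared:
# only the reference allele (0) and the second alternate (2) are observed.
calls = [(0, 0), (0, 2), (-1, -1), (2, 2)]
observed = segregating_alleles(calls)
multiallelic = is_multiallelic_in_subset(calls)
```

In this example only two alleles are observed, so the variant would be biallelic within the subset even though it carries the Multiallelic annotation.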