Ag1000G phase 1 AR3 data release
Project: Ag1000G

Released on 22 Jul 2015.


This data release includes variant calls and associated data for 845 mosquito specimens — 765 wild-caught specimens collected from eight countries across sub-Saharan Africa, and 80 specimens comprising parents and progeny of four crosses. All mosquitoes were sequenced by the Wellcome Trust Sanger Institute’s Malaria programme.

Any use of Project data is subject to the Terms of Use.

Data sets


This data release comprises variant call data, available as either VCF or HDF5 format files, and other associated data files.

All of the data files included in this release can be downloaded from the Wellcome Trust Sanger Institute public FTP site.

NOTE: Many browsers no longer support links to FTP sites. If you have difficulty opening the link, you may need to change your browser settings.

Go to FTP

Release notes

Genome accessibility
22 Jul 2015

This release includes new data on genome accessibility. The “accessibility” directory within the FTP site contains files providing a number of metrics of genome accessibility for each position in the AgamP3 reference genome, derived from alignments of sequence reads from the 765 wild-caught samples to the reference. Also included is a mask specifying which positions are considered accessible and which are not.
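Given a per-position accessibility mask, restricting an analysis to accessible sites amounts to a lookup for each variant position. The sketch below assumes that the mask for a single chromosome arm has already been loaded into a sequence of booleans, where index 0 corresponds to reference position 1. The loading step, the helper name `select_accessible` and the toy values are hypothetical and are not part of the release.

```python
from typing import List, Sequence


def select_accessible(positions: Sequence[int], mask: Sequence[bool]) -> List[int]:
    """Return the variant positions (1-based) that fall on accessible sites.

    Hypothetical helper: assumes mask[i] is True when reference position
    i + 1 is accessible, for a single chromosome arm.
    """
    selected = []
    for pos in positions:
        if pos < 1 or pos > len(mask):
            raise ValueError(f"position {pos} outside mask of length {len(mask)}")
        if mask[pos - 1]:  # convert 1-based position to 0-based index
            selected.append(pos)
    return selected


# Toy example: a 10 bp arm where positions 4-6 are inaccessible.
toy_mask = [True, True, True, False, False, False, True, True, True, True]
print(select_accessible([2, 5, 7, 10], toy_mask))  # [2, 7, 10]
```

For genome-scale data, the same lookup would usually be done with vectorised array indexing rather than a Python loop. The loop is used here only to keep the sketch dependency-free.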

Crosses
6 Nov 2014

Also new in this release are variant calls for four crosses between parents derived from various established colonies, including the Mali and Pimperena colonies. Each cross comprises two parents and around 18 progeny. The “variation/crosses” directory contains variant calls in both VCF and HDF5 formats.

Variant filtering
22 Jul 2015

The raw variant calls for the main phase 1 cohort of 765 wild-caught samples have not changed since the previous phase 1 AR2 release; however, the variant filtering strategy has changed. Variant filters now make use of the genome accessibility metrics described above. The new filtering strategy is generally more conservative than that of the AR2 release, so some variants that previously passed all filters may now fail one or more of them. Variant calls for the 765 wild-caught samples are in the “variation/main” directory, in both VCF and HDF5 formats.
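In VCF files, the outcome of site filtering is recorded in the FILTER column. By convention, this column holds "PASS" when a site passed every filter and lists the names of the failed filters otherwise. The sketch below keeps only passing records from VCF text lines. It assumes that the release's VCF files follow this standard convention. The helper name `passing_variants`, the chromosome arm, and the filter names in the toy records are made up for illustration.

```python
from typing import Iterable, Iterator, Tuple


def passing_variants(vcf_lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (CHROM, POS) for records whose FILTER column is PASS.

    Hypothetical helper. FILTER is the 7th tab-separated column of a VCF
    data line; it is "PASS" only when the site passed every filter,
    otherwise it lists the failed filter names separated by ";".
    """
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, header and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, filt = fields[0], int(fields[1]), fields[6]
        if filt == "PASS":
            yield chrom, pos


# Toy records; the filter names "LowQual" and "Inaccessible" are invented.
records = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "3R\t100\t.\tA\tT\t50\tPASS\t.",
    "3R\t200\t.\tG\tC\t50\tLowQual;Inaccessible\t.",
    "3R\t300\t.\tC\tG\t50\tPASS\t.",
]
print(list(passing_variants(records)))  # [('3R', 100), ('3R', 300)]
```

You can find the variants that passed under the old filters but fail under the new ones by taking the set difference between the passing sites of the two releases.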

Haplotypes
22 Jul 2015

In addition to the unphased genotype calls, this release includes phased haplotypes estimated for both the 765 wild-caught individuals and the parents and progeny of the crosses. These are available in the “haplotypes” directory in HDF5 and SHAPEIT formats, together with estimates of phasing error rates across the genome.
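The switch error rate is a common way to express phasing error. It is the fraction of consecutive heterozygous sites at which the inferred phase flips relative to a known true phase. The sketch below computes it for one individual. It illustrates the general metric only; it is an assumption that this matches how the release's phasing error estimates were produced. The helper name `switch_error_rate` and the toy data are hypothetical.

```python
from typing import Sequence


def switch_error_rate(true_hap: Sequence[int], inferred_hap: Sequence[int]) -> float:
    """Switch error rate between a true and an inferred phasing.

    Hypothetical helper. Both arguments give, for one individual, the allele
    (0 or 1) carried on the first haplotype at each heterozygous site, in
    genomic order; the second haplotype carries the other allele. A switch
    error is counted wherever the inferred phase flips relative to the truth
    between two consecutive heterozygous sites.
    """
    if len(true_hap) != len(inferred_hap):
        raise ValueError("inputs must cover the same heterozygous sites")
    if len(true_hap) < 2:
        raise ValueError("need at least two heterozygous sites")
    # True where the inferred first haplotype matches the true first haplotype.
    orientation = [t == i for t, i in zip(true_hap, inferred_hap)]
    # Count changes of orientation between neighbouring heterozygous sites.
    switches = sum(a != b for a, b in zip(orientation, orientation[1:]))
    return switches / (len(orientation) - 1)


# Six heterozygous sites; the inferred phase flips once, after site 3.
truth = [0, 1, 1, 0, 1, 0]
inferred = [0, 1, 1, 1, 0, 1]
print(switch_error_rate(truth, inferred))  # 0.2
```

A consistent global flip of both haplotypes counts as zero switch errors, because haplotype labels are arbitrary. Only changes in relative phase between neighbouring heterozygous sites are counted.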



To cite these data directly, please use the following citation format:

The Anopheles gambiae 1000 Genomes Consortium (2015): Ag1000G phase 1 AR3 data release. MalariaGEN.