Ag1000G phase 2 AR1 data release

Project: Ag1000G

Released on 6 Nov 2017

This data release includes genome-wide variant calls, haplotypes and associated data for 1,142 wild-caught specimens collected from 13 countries spanning sub-Saharan Africa, and 234 specimens comprising parents and progeny of 11 lab crosses.

Any use of Project data is subject to the Terms of Use.

If you have any questions regarding these data or would like to report an issue, please email Chris Clarkson (cc28 [at] or raise an issue via GitHub.

All mosquitoes were sequenced by the Wellcome Trust Sanger Institute’s Malaria programme.

This data release comprises variant call data, available in several different file formats, and other associated data files.

All of the data files included in this release can be downloaded from the Wellcome Trust Sanger Institute public FTP site using a freely available FTP client.
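
For scripted downloads, a minimal sketch using Python's standard ftplib module might look like the following. The function name download_file and its three parameters are hypothetical; the FTP host and file paths are not given here and must be taken from the release itself.

```python
from ftplib import FTP


def download_file(host, remote_path, local_path):
    """Fetch one file over anonymous FTP and write it to local_path.

    host and remote_path are placeholders supplied by the caller;
    nothing about the real FTP layout is assumed here.
    """
    with FTP(host) as ftp:
        ftp.login()  # no arguments means anonymous login
        with open(local_path, "wb") as out:
            # RETR streams the remote file in binary chunks to out.write
            ftp.retrbinary("RETR " + remote_path, out.write)
```

Binary transfer is used so that compressed files are not altered in transit.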

Release notes

Organisation of VCF files

VCF files containing variation data for the wild-caught specimens are available from the “variation/main/vcf” sub-folder within the FTP site. Files are further organised into sub-directories providing different subsets of the overall dataset. The “all” sub-directory contains the complete dataset, including all variants discovered. The “pass” sub-directory contains only SNPs that passed all quality filters. For most analyses it is recommended to work only with PASS variants.
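
When working from the “all” files, PASS variants can be selected by checking the FILTER column, which is the seventh tab-separated column in the VCF format. The sketch below illustrates this on a tiny in-memory example; the records and the filter name LowQual are invented for illustration and are not taken from the release.

```python
import io

# Tiny in-memory VCF with made-up records (hypothetical data).
EXAMPLE_VCF = (
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "2L\t101\t.\tA\tT\t50\tPASS\t.\n"
    "2L\t102\t.\tG\tC\t12\tLowQual\t.\n"
    "2L\t103\t.\tC\tA\t60\tPASS\t.\n"
)


def iter_pass_records(handle):
    """Yield (chrom, pos) for records whose FILTER column is exactly PASS."""
    for line in handle:
        if line.startswith("#"):
            continue  # skip meta-information and column header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 7:
            continue  # not a complete record
        if fields[6] == "PASS":
            yield fields[0], int(fields[1])


pass_sites = list(iter_pass_records(io.StringIO(EXAMPLE_VCF)))
print(pass_sites)
```

The same function accepts any open text file handle, so it could be pointed at a decompressed VCF file instead of the in-memory example.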

Variant filters

A number of annotations have been added to the FILTER column in the VCF files. These annotations indicate quality filters that apply to the given variant. The VCF file headers contain information about the meaning of each of the filters used.
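
In the VCF format, filter definitions appear in the header as lines beginning with ##FILTER. A minimal sketch for collecting them into a dictionary is shown below, assuming each line carries only the ID and Description fields. The filter IDs and descriptions in the example are hypothetical and do not reflect the filters actually used in this release.

```python
import re

# Hypothetical header lines; the filter IDs and descriptions are invented.
EXAMPLE_HEADER = [
    '##fileformat=VCFv4.1',
    '##FILTER=<ID=LowQual,Description="Low quality">',
    '##FILTER=<ID=HighCoverage,Description="Coverage too high, example only">',
    '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO',
]

# Matches lines of the form ##FILTER=<ID=name,Description="text">
FILTER_LINE = re.compile(r'^##FILTER=<ID=([^,>]+),Description="(.*)">\s*$')


def read_filter_descriptions(lines):
    """Map each filter ID to its description, reading header lines only."""
    filters = {}
    for line in lines:
        if not line.startswith("##"):
            break  # meta-information lines end at the #CHROM column header
        match = FILTER_LINE.match(line)
        if match:
            filters[match.group(1)] = match.group(2)
    return filters


filter_descriptions = read_filter_descriptions(EXAMPLE_HEADER)
for name, text in filter_descriptions.items():
    print(name, "-", text)
```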

Genome accessibility

This release includes a revised map of genome accessibility, derived from an analysis of sequence read alignments for the 1,142 wild-caught specimens in this release. Various files are included in the “accessibility” sub-folder, including accessibility metrics and a mask specifying which positions are considered accessible and which are not.
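
As an illustration of how such a mask might be applied, the sketch below keeps only variant positions that fall at accessible sites. It assumes the mask has already been loaded as one boolean per genome position, with index 0 corresponding to position 1; this layout, the function name and the values are assumptions made for the example, not a description of the file format used in the release.

```python
def filter_accessible(positions, is_accessible):
    """Keep 1-based positions whose mask entry is True.

    is_accessible is assumed to hold one boolean per position,
    with index 0 corresponding to position 1.
    """
    return [
        pos
        for pos in positions
        if 1 <= pos <= len(is_accessible) and is_accessible[pos - 1]
    ]


# Hypothetical mask covering positions 1 to 8 of a contig.
mask = [True, True, False, False, True, True, True, False]
variant_positions = [1, 3, 5, 8, 12]

accessible_positions = filter_accessible(variant_positions, mask)
accessible_fraction = sum(mask) / len(mask)

print(accessible_positions)
print(accessible_fraction)
```

Positions beyond the end of the mask are dropped rather than treated as accessible, which is the more conservative choice.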

Haplotypes

Phased haplotypes, estimated at biallelic SNPs for both the wild-caught individuals and the crosses, are available from the “haplotypes” sub-directory.

Sample metadata

Files containing metadata about the specimens included in this release are available from the “samples” sub-directory. These include the sampling location for the wild-caught individuals and the parentage of individuals from the crosses.
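
A common first step with sample metadata is to tabulate specimens by sampling location. The sketch below does this with Python's standard csv module, assuming a tab-delimited file with a header row. The column names sample_id and country and the values shown are hypothetical; the actual column names should be checked in the metadata files.

```python
import csv
import io
from collections import Counter

# Hypothetical tab-delimited metadata; column names and values are invented.
EXAMPLE_TABLE = (
    "sample_id\tcountry\n"
    "S0001\tCountryA\n"
    "S0002\tCountryB\n"
    "S0003\tCountryA\n"
)


def count_by_column(handle, column):
    """Count rows per distinct value of the named column."""
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter(row[column] for row in reader)


counts = count_by_column(io.StringIO(EXAMPLE_TABLE), "country")
for value, n in counts.most_common():
    print(value, n)
```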