Ag1000G phase 2 AR1 data release
Project: Ag1000G

Released on 6 Nov 2017.


This data release includes genome-wide variant calls, haplotypes and associated data for 1,142 wild-caught specimens collected from 13 countries spanning sub-Saharan Africa, and 234 specimens comprising parents and progeny of 11 lab crosses.

Any use of Project data is subject to the Terms of Use.

If you have any questions regarding these data or would like to report an issue, please email Chris Clarkson or raise an issue via GitHub.

All mosquitoes were sequenced by the Wellcome Trust Sanger Institute’s Malaria programme.

Data sets


This data release comprises variant call data, available in several different file formats, and other associated data files.

All of the data files included in this release can be downloaded from the Wellcome Trust Sanger Institute public FTP site using a freely available FTP client.


Release notes

VCF files containing variation data for the wild-caught specimens are available from the “variation/main/vcf” sub-folder within the FTP site. Files are further organised into sub-directories providing different subsets of the overall dataset. The “all” sub-directory contains the complete dataset, including all variants discovered. The “pass” sub-directory contains only SNPs that passed all quality filters. For most analyses it is recommended to work only with PASS variants.
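As a minimal sketch of restricting an analysis to PASS variants, the following Python reads VCF text and keeps only records whose FILTER column is exactly PASS. The helper names, the example records and the filter name "ExampleFilter" are hypothetical and chosen for illustration only; they are not taken from the release files.

```python
import gzip


def open_vcf(path):
    """Open a plain or gzip-compressed VCF file as text (path is supplied by the caller)."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def iter_pass_records(lines):
    """Yield the tab-separated fields of each VCF record whose FILTER column is 'PASS'."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information lines and the column header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 7:
            continue  # not a complete VCF record
        if fields[6] == "PASS":  # FILTER is the 7th column in VCF
            yield fields


# Hypothetical records for illustration; real files contain many more columns.
EXAMPLE_LINES = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "2L\t101\t.\tA\tG\t50\tPASS\t.",
    "2L\t202\t.\tC\tT\t12\tExampleFilter\t.",
    "2L\t303\t.\tG\tA\t60\tPASS\t.",
]

if __name__ == "__main__":
    for record in iter_pass_records(EXAMPLE_LINES):
        print(record[0], record[1])
```

To apply the same filter to a downloaded file, the lines could be read with something like `with open_vcf(path) as f: ...`, where `path` points to a local copy of one of the VCF files.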

A number of annotations have been added to the FILTER column in the VCF files. These annotations indicate quality filters that apply to the given variant. The VCF file headers contain information about the meaning of each of the filters used.
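The filter definitions can be read from the header programmatically. The sketch below assumes the header follows the standard VCF convention of `##FILTER=<ID=...,Description="...">` meta lines; the function name and the example filter shown are hypothetical and do not reflect the actual filters used in this release.

```python
import re

# Matches standard VCF FILTER meta lines, e.g. ##FILTER=<ID=x,Description="y">
FILTER_RE = re.compile(r'^##FILTER=<ID=([^,>]+),Description="(.*)">\s*$')


def read_filter_descriptions(lines):
    """Return a dict mapping each filter ID to its description, read from VCF header lines."""
    filters = {}
    for line in lines:
        if not line.startswith("##"):
            break  # meta-information lines end at the #CHROM column header
        match = FILTER_RE.match(line)
        if match:
            filters[match.group(1)] = match.group(2)
    return filters


# Hypothetical header for illustration only.
EXAMPLE_HEADER = [
    "##fileformat=VCFv4.1",
    '##FILTER=<ID=ExampleFilter,Description="Hypothetical filter for illustration">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
]

if __name__ == "__main__":
    for filter_id, description in read_filter_descriptions(EXAMPLE_HEADER).items():
        print(filter_id, "-", description)
```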

This release includes a revised map of genome accessibility, derived from an analysis of sequence read alignments for the 1,142 wild-caught specimens in this release. Various files are included in the “accessibility” sub-folder, including accessibility metrics and a mask specifying which positions are considered accessible and which are not.
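To illustrate how an accessibility mask might be used, the sketch below keeps only variant positions that fall at accessible sites. It assumes the mask has already been loaded into memory as one boolean per reference position; the format of the files in the "accessibility" sub-folder is not described here, so loading is deliberately left out, and all names and values are hypothetical.

```python
def accessible_positions(positions, is_accessible):
    """Keep the 1-based positions that the mask marks as accessible.

    positions: iterable of 1-based coordinates on one chromosome.
    is_accessible: sequence of booleans, where index 0 corresponds to position 1.
    Positions beyond the end of the mask are treated as not accessible.
    """
    return [
        pos
        for pos in positions
        if 1 <= pos <= len(is_accessible) and is_accessible[pos - 1]
    ]


def fraction_accessible(is_accessible):
    """Return the fraction of positions marked accessible (0.0 for an empty mask)."""
    if not is_accessible:
        return 0.0
    return sum(1 for flag in is_accessible if flag) / len(is_accessible)


if __name__ == "__main__":
    # Hypothetical five-position mask and variant positions, for illustration only.
    mask = [True, False, True, True, False]
    print(accessible_positions([1, 2, 3, 5, 9], mask))
    print(fraction_accessible(mask))
```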

Phased haplotypes, estimated at biallelic SNPs for both the wild-caught individuals and the crosses, are available from the “haplotypes” sub-directory.

Files containing metadata about the specimens included in this release are available from the “samples” sub-directory. This includes sampling location for the wild-caught individuals, and parentage for the individuals from the crosses.


To cite these data directly, please use the following citation format:

The Anopheles gambiae 1000 Genomes Consortium (2017): Ag1000G phase 2 AR1 data release. MalariaGEN.