This data release includes genome-wide variant calls, haplotypes and associated data for 1,142 wild-caught specimens collected from 13 countries spanning sub-Saharan Africa, and 234 specimens comprising parents and progeny of 11 lab crosses.
If you have any questions regarding these data or would like to report an issue, please email Chris Clarkson (firstname.lastname@example.org) or raise an issue via GitHub.
All mosquitoes were sequenced by the Wellcome Trust Sanger Institute’s Malaria programme.
This data release comprises variant call data, available in several different file formats, and other associated data files.
All of the data files included in this release can be downloaded from the Wellcome Trust Sanger Institute public FTP site using a freely available FTP client.
VCF files containing variation data for the wild-caught specimens are available from the “variation/main/vcf” sub-folder within the FTP site. Files are further organised into sub-directories providing different subsets of the overall dataset. The “all” sub-directory contains the complete dataset, including all variants discovered. The “pass” sub-directory contains only SNPs that passed all quality filters. For most analyses it is recommended to work only with PASS variants.
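As a rough illustration of restricting an analysis to PASS variants, the sketch below streams VCF text and keeps only records whose FILTER column is exactly `PASS`. The example records and the `LowQual` filter name are made up for demonstration and are not taken from the release. For a gzip-compressed file, the same function could be fed from `gzip.open(path, "rt")`.

```python
import io

def iter_pass_records(lines):
    """Yield the tab-split fields of VCF data lines whose FILTER is PASS."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, meta lines and the column header
        fields = line.split("\t")
        # VCF fixed columns: CHROM POS ID REF ALT QUAL FILTER INFO ...
        if fields[6] == "PASS":
            yield fields

# Tiny made-up example; not real data from the release.
example_vcf = io.StringIO(
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "2L\t100\t.\tA\tT\t50\tPASS\t.\n"
    "2L\t150\t.\tG\tC\t12\tLowQual\t.\n"
    "2L\t200\t.\tC\tA\t60\tPASS\t.\n"
)

pass_positions = [int(fields[1]) for fields in iter_pass_records(example_vcf)]
print(pass_positions)
```

Streaming line by line keeps memory use flat, which matters for genome-wide files; a dedicated VCF library would be a reasonable alternative for anything beyond simple filtering.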
A number of annotations have been added to the FILTER column in the VCF files. These annotations indicate quality filters that apply to the given variant. The VCF file headers contain information about the meaning of each of the filters used.
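One way to inspect the filters is to read their definitions from the `##FILTER` header lines and then tally how often each one appears in the FILTER column. The following is a minimal sketch using only the Python standard library; the filter names `LowQual` and `Repeat` and their descriptions are hypothetical placeholders, and the actual names should be read from the release's own VCF headers.

```python
import re
from collections import Counter

# Matches header lines of the form ##FILTER=<ID=name,Description="text">
FILTER_HEADER = re.compile(r'^##FILTER=<ID=([^,>]+),Description="(.*)">\s*$')

def read_filter_definitions(lines):
    """Return {filter ID: description} parsed from the VCF meta lines."""
    definitions = {}
    for line in lines:
        if not line.startswith("##"):
            break  # meta lines end at the #CHROM column header
        match = FILTER_HEADER.match(line)
        if match:
            definitions[match.group(1)] = match.group(2)
    return definitions

def count_filters(lines):
    """Count occurrences of each FILTER value across the data lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        filter_field = line.rstrip("\n").split("\t")[6]
        # Multiple failed filters are separated by semicolons.
        for name in filter_field.split(";"):
            counts[name] += 1
    return counts

# Made-up example lines; filter names are hypothetical.
example_lines = [
    "##fileformat=VCFv4.1\n",
    '##FILTER=<ID=LowQual,Description="Hypothetical low-quality filter">\n',
    '##FILTER=<ID=Repeat,Description="Hypothetical repeat-region filter">\n',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "2L\t100\t.\tA\tT\t50\tPASS\t.\n",
    "2L\t150\t.\tG\tC\t12\tLowQual;Repeat\t.\n",
    "2L\t200\t.\tC\tA\t9\tLowQual\t.\n",
]

filter_definitions = read_filter_definitions(example_lines)
filter_counts = count_filters(example_lines)
for name, count in sorted(filter_counts.items()):
    print(name, count, filter_definitions.get(name, ""))
```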
This release includes a revised map of genome accessibility, derived from an analysis of sequence read alignments for the 1,142 wild-caught specimens in this release. Various files are included in the “accessibility” sub-folder, including accessibility metrics and a mask specifying which positions are considered accessible and which are not.
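To show how such a mask might be applied, the sketch below assumes the mask has already been loaded into memory as one boolean per reference position (`True` meaning accessible). The on-disk format of the mask files is not described here, so the loading step is omitted and the values are invented for illustration.

```python
def accessible_fraction(mask):
    """Fraction of positions flagged as accessible (0.0 for an empty mask)."""
    return sum(mask) / len(mask) if mask else 0.0

def keep_accessible(positions, mask):
    """Keep 1-based positions whose mask entry is True.

    mask[i] is assumed to describe reference position i + 1.
    """
    return [p for p in positions if 1 <= p <= len(mask) and mask[p - 1]]

# Invented toy mask covering eight positions; not real data.
mask = [True, True, False, False, True, True, True, False]
variant_positions = [1, 3, 5, 8]

kept_positions = keep_accessible(variant_positions, mask)
fraction = accessible_fraction(mask)
print(kept_positions, fraction)
```

Knowing the accessible fraction is useful beyond filtering: per-base statistics such as nucleotide diversity are usually normalised by the number of accessible positions rather than by the full window length.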
Phased haplotypes estimated at biallelic SNPs, for both the wild-caught individuals and the crosses, are available from the “haplotypes” sub-directory.
Files containing metadata about the specimens included in this release are available from the “samples” sub-directory. The metadata include the sampling location for the wild-caught individuals and the parentage for individuals from the crosses.
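A common first step with sample metadata is to summarise specimens by sampling location. The sketch below assumes a tab-delimited file with a header row; the column names `sample_id` and `country`, and all values shown, are hypothetical and should be replaced with whatever the files in the “samples” sub-directory actually use.

```python
import csv
import io
from collections import Counter

def count_by_column(handle, column, delimiter="\t"):
    """Count rows per distinct value of `column` in a delimited text file."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    return Counter(row[column] for row in reader)

# Invented metadata table; column names and values are placeholders.
example_metadata = io.StringIO(
    "sample_id\tcountry\n"
    "S0001\tCountryA\n"
    "S0002\tCountryB\n"
    "S0003\tCountryA\n"
)

samples_per_country = count_by_column(example_metadata, "country")
for country, n in sorted(samples_per_country.items()):
    print(country, n)
```

For a real file, pass an open text handle instead, for example `open(path, newline="")`.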
To cite these data directly, please use the following citation format:
The Anopheles gambiae 1000 Genomes Consortium (2017): Ag1000G phase 2 AR1 data release. MalariaGEN. http://www.malariagen.net/data_package/ag1000g-phase-2-ar1/.