This data release includes variant calls and associated data for 845 mosquito specimens — 765 wild-caught specimens collected from eight countries across sub-Saharan Africa, and 80 specimens comprising parents and progeny of four crosses. All mosquitoes were sequenced by the Wellcome Trust Sanger Institute’s Malaria programme.
The variant call data are available in both VCF and HDF5 formats, alongside other associated data files.
All of the data files included in this release can be downloaded from the Wellcome Trust Sanger Institute public FTP site.
NOTE: Many browsers no longer support links to FTP sites. If you have difficulty accessing the data, you may need to change your browser settings.
Genome accessibility (22 Jul 2015)
This release includes new data on genome accessibility. The “accessibility” directory within the FTP site contains files providing a number of metrics of genome accessibility for each position in the AgamP3 reference genome, derived from alignments of sequence reads from the 765 wild-caught samples to the reference. Also included is a mask specifying which positions are considered accessible and which are not.
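As a minimal sketch of how such a mask might be applied, the Python snippet below treats the mask as a per-position list of booleans and keeps only the variant positions that fall on accessible sites. The function names, the in-memory representation and the toy values are assumptions for illustration only; the actual layout of the files in the “accessibility” directory is not described here.

```python
def fraction_accessible(mask):
    """Return the fraction of positions flagged as accessible."""
    if not mask:
        return 0.0
    return sum(1 for flag in mask if flag) / len(mask)


def accessible_positions(positions, mask):
    """Keep the 1-based positions whose mask entry is True.

    mask[i] is assumed to describe reference position i + 1.
    """
    return [pos for pos in positions if 1 <= pos <= len(mask) and mask[pos - 1]]


# Toy mask for a hypothetical 10 bp contig (not real data).
toy_mask = [True, True, False, False, True, True, True, False, True, True]
toy_variants = [2, 3, 5, 8, 10]

print(fraction_accessible(toy_mask))
print(accessible_positions(toy_variants, toy_mask))
```

For whole-chromosome masks, an array library would be a more practical choice than Python lists; the logic stays the same.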
Crosses (6 Nov 2014)
Also new in this release are variant calls for four crosses between parents derived from various established colonies, including the Mali and Pimperena colonies. Each cross comprises two parents and around 18 progeny. The “variation/crosses” directory contains variant calls in both VCF and HDF5 formats.
Variant filtering (22 Jul 2015)
The raw variant calls for the main phase 1 cohort of 765 wild-caught samples have not changed since the previous phase 1 AR2 release, but the variant filtering strategy is different. Variant filters now make use of the genome accessibility metrics mentioned above. The new filtering strategy is generally more conservative than that of the AR2 release, so some variants that previously passed all filters may now fail one or more of them. Variant calls for the 765 wild-caught samples are in the “variation/main” directory, in both VCF and HDF5 formats.
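Because the filters changed between releases, downstream analyses will usually want to select only the variants that pass all filters. The sketch below shows one way to do this for VCF text, relying on the standard VCF convention that the seventh column (FILTER) reads PASS for such records. The function name, the toy records and the "LowQual" filter label are hypothetical and do not come from this release; a compressed file could be read in the same way by opening it with `gzip.open(path, "rt")`.

```python
import io


def iter_passing_variants(handle):
    """Yield (chrom, pos, ref, alt) for records whose FILTER column is PASS."""
    for line in handle:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        chrom, pos, _id, ref, alt, _qual, filt = fields[:7]
        if filt == "PASS":
            yield chrom, int(pos), ref, alt


# Toy VCF content (not real data); "LowQual" is a made-up filter name.
toy_vcf = io.StringIO(
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "2L\t100\t.\tA\tT\t50\tPASS\t.\n"
    "2L\t150\t.\tG\tC\t12\tLowQual\t.\n"
    "2L\t200\t.\tC\tG\t60\tPASS\t.\n"
)
passing = list(iter_passing_variants(toy_vcf))
print(passing)
```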
Haplotypes (22 Jul 2015)
In addition to the unphased genotype calls, this release includes phased haplotypes estimated for both the 765 wild-caught individuals and the parents and progeny of the crosses. Data are available in the “haplotypes” directory in HDF5 and SHAPEIT formats. The directory also includes estimates of phasing error rates across the genome.
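To illustrate what phased data look like in practice, the sketch below parses a single line in the style of a SHAPEIT2 `.haps` file, where five site columns (chromosome, variant ID, position, first allele, second allele) are followed by two 0/1 haplotype columns per individual. This assumes the files in the “haplotypes” directory follow that convention, which should be checked against the files themselves; the function name and the toy line are hypothetical.

```python
def parse_haps_line(line):
    """Split one .haps-style line into site information and haplotype pairs."""
    fields = line.split()
    chrom, snp_id, pos, allele0, allele1 = fields[:5]
    alleles = [int(a) for a in fields[5:]]
    if len(alleles) % 2 != 0:
        raise ValueError("expected two haplotype columns per individual")
    # Pair consecutive columns: (first haplotype, second haplotype).
    pairs = list(zip(alleles[0::2], alleles[1::2]))
    site = {
        "chrom": chrom,
        "id": snp_id,
        "pos": int(pos),
        "alleles": (allele0, allele1),
    }
    return site, pairs


# Toy line for three individuals (not real data).
toy_line = "2L SNP1 100 A T 0 1 1 1 0 0"
site, pairs = parse_haps_line(toy_line)
print(site["pos"], pairs)
```

Each pair holds the allele codes carried on an individual's two haplotypes, so `(0, 1)` denotes a heterozygote with the first allele on the first haplotype.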
Data package contact
To cite these data directly, please use the following citation format:
The Anopheles gambiae 1000 Genomes Consortium (2015): Ag1000G phase 1 AR3 data release. MalariaGEN. http://www.malariagen.net/data_package/ag1000g-phase1-ar3/