This page provides information about data generated by phase 2 of the Anopheles gambiae 1000 Genomes Project (Ag1000G), an international collaboration working to discover natural genetic variation in malaria mosquito populations and build an open data resource for mosquito research and surveillance.
The Ag1000G phase 2 data resource includes genome-wide single nucleotide polymorphism (SNP) calls, SNP haplotypes, copy number variation (CNV) calls, and associated data for 1,142 wild-caught mosquito specimens collected from 13 countries spanning sub-Saharan Africa, and 234 specimens comprising parents and progeny of 11 lab crosses.
All mosquitoes were sequenced by the Parasites and Microbes Programme at the Wellcome Sanger Institute.
For general information about use of MalariaGEN data, see our approach to sharing data.
If you use these data, please cite the following publication:
- The Anopheles gambiae 1000 Genomes Consortium (2020) "Genome variation and population structure among 1,142 mosquitoes of the African malaria vector species Anopheles gambiae and Anopheles coluzzii." Genome Research 30: 1533-1546.
If you use the CNV data, please also cite the following publication:
- Eric R. Lucas et al. (2019) "Whole-genome sequencing reveals high complexity of copy number variation at insecticide resistance loci in malaria mosquitoes." Genome Research 29: 1250-1261.
If you have any technical or scientific questions regarding the data, or would like to report an issue, please email Chris Clarkson (cc28 [at] sanger.ac.uk) or raise an issue via GitHub.
Raw sequence reads for all mosquitoes included in this release are available from the European Nucleotide Archive (ENA). ENA sample accessions are available separately for wild-caught samples and lab cross samples.
All other data are available from the Ag1000G public FTP site.
NOTE: Links to FTP sites no longer work in the Google Chrome web browser. If you are experiencing difficulties, please update your browser settings or use an alternative browser.
A selection of data are also available from Google Cloud Storage via the ag1000g-release bucket located in the us-central1 region.
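As a minimal sketch of how the bucket contents could be browsed programmatically, the Python below builds a request to the Google Cloud Storage JSON API object-listing endpoint using only the standard library. The bucket name is the one given above. It assumes the bucket permits anonymous listing, which is not confirmed here, and the helper names `build_listing_url` and `list_bucket` are hypothetical. No object paths inside the bucket are assumed, so the listing starts at the top level.

```python
import json
import urllib.parse
import urllib.request

BUCKET = "ag1000g-release"  # bucket name as given on this page


def build_listing_url(bucket, prefix="", delimiter="/", max_results=100):
    """Build a Cloud Storage JSON API URL that lists objects in a bucket."""
    query = urllib.parse.urlencode(
        {"prefix": prefix, "delimiter": delimiter, "maxResults": max_results}
    )
    return f"https://storage.googleapis.com/storage/v1/b/{bucket}/o?{query}"


def list_bucket(bucket=BUCKET, prefix=""):
    """Return (sub-prefixes, object names) found directly under `prefix`.

    Needs network access, and assumes anonymous listing is allowed.
    """
    with urllib.request.urlopen(build_listing_url(bucket, prefix)) as response:
        payload = json.load(response)
    prefixes = payload.get("prefixes", [])
    names = [item["name"] for item in payload.get("items", [])]
    return prefixes, names
```

Calling `list_bucket()` with no arguments would return the top-level "directories" and files. Passing one of the returned prefixes back in as `prefix` descends one level. If anonymous access is refused, an authenticated client such as `gsutil` would be needed instead.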
Contents of the data resource
Below is an overview of the main contents of the data resource.
- Contributing partner studies: information about the studies that contributed mosquito specimens to Ag1000G, including details of mosquito sampling sites and collection methods, and contact information.
- Sample provenance and metadata for wild-caught samples and lab crosses: sampling location, year of collection and ENA accession for all mosquito samples included in this resource.
- Insecticide resistance marker genotypes: genotypes at validated and putative markers of insecticide resistance for the 1,142 wild-caught samples, including genotypes at non-synonymous SNPs in target-site resistance genes and copy number variations in metabolic resistance genes.
- Genome-wide SNP calls: Analysis-ready SNP calls for both wild-caught samples and lab crosses, filtered and annotated.
- Genome-wide SNP haplotypes: Haplotypes for the wild-caught samples, phased at biallelic SNPs.
- Genome-wide CNV calls: CNV calls for wild-caught samples.
- Genome accessibility map: Accessibility map and associated data, needed for many population genetics analyses.
- SNP allele frequencies: Allele frequencies for all SNPs in each of the 16 populations sampled in this cohort, defined by species and country of collection.
- Conserved Cas9 targets: Locations of genome regions that could be used as Cas9 targets (23-mers containing a protospacer adjacent motif) with data on SNPs and nucleotide diversity within targets.
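To illustrate what the per-population SNP allele frequencies listed above represent, here is a small Python sketch that groups samples by species and country and computes the alternate allele frequency at one biallelic SNP. The function name, the genotype encoding (0 = reference, 1 = alternate, -1 = missing) and the toy data are assumptions made for illustration only. They are not the release file format or the method used to produce the released frequencies, for which the README is the reference.

```python
from collections import defaultdict


def allele_frequencies(populations, genotypes):
    """Alternate allele frequency per population at one biallelic SNP.

    populations: one (species, country) tuple per sample
    genotypes:   one (allele1, allele2) tuple per sample,
                 coded 0 = reference, 1 = alternate, -1 = missing
    """
    alt_counts = defaultdict(int)
    called_counts = defaultdict(int)
    for population, genotype in zip(populations, genotypes):
        for allele in genotype:
            if allele < 0:
                continue  # skip missing calls
            called_counts[population] += 1
            if allele == 1:
                alt_counts[population] += 1
    # Populations with no called alleles are left out of the result.
    return {
        population: alt_counts[population] / called_counts[population]
        for population in called_counts
    }


# Toy data: four hypothetical samples in two populations.
populations = [
    ("gambiae", "Ghana"),
    ("gambiae", "Ghana"),
    ("coluzzii", "Ghana"),
    ("coluzzii", "Ghana"),
]
genotypes = [(0, 1), (1, 1), (0, 0), (0, -1)]

freqs = allele_frequencies(populations, genotypes)
```

Missing calls are excluded from the denominator, so each frequency is taken over called alleles only. Other conventions are possible.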
See the README file for further details about the files included in the release, including file formats and meanings of fields and columns used.
Details of the sampling and analytical methods used to generate these data can be found in the accompanying paper, "Genome variation and population structure among 1,142 mosquitoes of the African malaria vector species Anopheles gambiae and Anopheles coluzzii" (Genome Research 30: 1533-1546).
Further details of the CNV calling methods are available from the paper, "Whole-genome sequencing reveals high complexity of copy number variation at insecticide resistance loci in malaria mosquitoes".
For convenience, we have also compiled the variant calling and phasing methods into a single document available from the public FTP site.
Publications using these data
Eric R. Lucas et al. (2019) Whole-genome sequencing reveals high complexity of copy number variation at insecticide resistance loci in malaria mosquitoes. Genome Research 29, 1250-1261.
Eric R. Lucas et al. (2019) A high throughput multi-locus insecticide resistance marker panel for tracking resistance emergence and spread in Anopheles gambiae. Scientific Reports 9.
Christina M. Bergey et al. (2019) Assessing connectivity despite high diversity in island populations of a malaria mosquito. bioRxiv.
R. Rebecca Love et al. (2019) In Silico Karyotyping of Chromosomally Polymorphic Malaria Mosquitoes in the Anopheles gambiae Complex. G3: Genes, Genomes, Genetics 9 (10), 3249-3262.
Alexander T. Xue et al. (2019) Discovery of ongoing selective sweeps within Anopheles mosquito populations using deep learning. bioRxiv.
Bhavin S. Khatri and Austin Burt (2019) Robust Estimation of Recent Effective Population Size from Number of Independent Origins in Soft Sweeps. Molecular Biology and Evolution 36 (9), 2040-2052.
Kyros Kyrou et al. (2018) A CRISPR–Cas9 gene drive targeting doublesex causes complete population suppression in caged Anopheles gambiae mosquitoes. Nature Biotechnology 36, 1062-1066.
Chris S. Clarkson et al. (2018) The genetic architecture of target-site resistance to pyrethroid insecticides in the African malaria vectors Anopheles gambiae and Anopheles coluzzii. bioRxiv.
Taedong Yun et al. (2018) Improved non-human variant calling using species-specific DeepVariant models. Google DeepVariant Blog.
José L. Vicente et al. (2017) Massive introgression drives species radiation at the range limit of Anopheles gambiae. Scientific Reports 7.
Paul Vauterin et al. (2017) Panoptes: web-based exploration of large scale genome variation data. Bioinformatics 33 (20), 3243-3249.
Andrew Brantley Hall et al. (2016) Radical remodeling of the Y chromosome in a recent radiation of malaria mosquitoes. PNAS 113 (15), E2114-E2123.