Ag1000G: Anopheles gambiae 1000 Genomes

Ag1000G is a global collaboration using whole genome deep sequencing to provide a high-resolution view of genetic variation in natural populations of Anopheles gambiae, the principal vector of Plasmodium falciparum malaria in Africa.

Overview

Discovering natural genetic variation – Ag1000G is using high-throughput sequencing of a large number of wild-caught mosquitoes sampled from across Africa to build a comprehensive catalogue of genetic variation in natural vector populations. As genetic variation is relevant to a wide range of scientific and vector control applications, Ag1000G will make these data available as a resource to the scientific community. The current focus is on A. gambiae sensu stricto and A. coluzzii, but the project may in future expand to consider other members of the species complex.

Describing the structure and history of vector populations – Ag1000G is analysing genetic variation data to characterise key features of natural vector populations, such as patterns of diversity, linkage disequilibrium and recombination, population structure and gene flow, signals of recent selection, and demographic history.

Connecting genetic variation and population biology with ecology and malaria epidemiology – Ag1000G aims to study associations between genotype and broad phenotypes such as ecological specialisation and differences in local malaria epidemiology.

Sampling locations

Ag1000G samples have been collected in the following countries: 

  • Angola
  • Bioko Island, Equatorial Guinea
  • Burkina Faso
  • Cameroon
  • Côte d'Ivoire
  • Gabon
  • Ghana
  • Guinea-Bissau
  • Guinea-Conakry
  • Kenya
  • Mayotte Islands, France
  • The Gambia
  • Uganda 

Data

Ag1000G samples have been sequenced by the Wellcome Trust Sanger Institute using Illumina high-throughput technology. The sequence data are then used to discover genetic variation between samples and to make genotype calls. Both the sequence data and the variant call data generated by the project will be publicly released on a regular basis.

Data Releases

See the links below for details of data releases, including information about how to access and download data.

Latest release

  • Ag1000G phase 1 AR2 data release – this release comprises variant call data on 765 A. gambiae and A. coluzzii mosquito specimens collected from eight countries.

Previous releases

Use of the project data, presentations and publications, and authorship

The data producers (the Consortium and its Contributing Investigators) will release the Project data prior to publication, in the expectation that they will be valuable for many researchers. In keeping with the Fort Lauderdale principles, data users may use the data for their own studies, but are expected to allow the Consortium and its Contributing Investigators to make the first presentations and to publish the first papers with global analyses of the data.

Global analyses of Project data

The Project plans to publish global analyses of the sequence data and quality, SNPs, structural variants, STRs, microsatellites, transposable elements, haplotypes and LD patterns, population genetic phenomena such as population comparisons, mutation rates, signals of selection and functional annotations, demographic history, as well as analyses of regions of general interest such as inversion breakpoints, regions associated with insecticide resistance, regions associated with resistance to Plasmodium infection or vectorial capacity, regions associated with adaptation to broad ecological conditions, and regions associated with incipient speciation. Talks, posters and papers on all such analyses are to be published first by approved presenters on behalf of the Anopheles gambiae 1000 Genomes Consortium. Once these planned analyses have been published by the Consortium, researchers inside and outside the Consortium are free to present and publish using the Project data for these and other analyses.

Large-scale analyses of Project data

Groups within the Project may make presentations and publish papers on behalf of the Consortium on more extensive analyses of topics to be included in the main analysis presentations and papers, coincident with the main project analysis presentations and papers. The major points would be included in the main Project presentations and papers, but these additional presentations and papers allow more focussed discussion of methods and results. 

Methods development using Project data

Researchers who have used small amounts of Project data (< 10% of the genome) may present methods development posters, talks and papers that include these data prior to the first major Project paper, without needing Project approval or authorship, although the Project should be acknowledged and the data should be cited (see below). Methods presentations or papers on global analyses or analyses using large amounts of Project data, on topics that the Consortium plans to examine, would be similar to large-scale analyses of Project data: researchers within the Project may make presentations or submit papers at the same time as the main Project presentations and papers, and others could do so after the Project publishes the first major analysis paper.

Other studies using Project data

Researchers may present and publish on use of Project data in specific chromosome regions (that are not of general interest) or as summaries (such as the total number of variants) for other studies (such as studies on specific aspects of vector biology) without Project approval, prior to the first major Project paper being published. The Project should be acknowledged and the data should be cited (see below).

Population comparisons using Project data

Researchers may use Project data as controls or additional information for comparison with their samples from other populations, prior to the major Project paper being published, as long as the analyses that the Project plans to do are not included. These are not Project studies and the Project should not be listed as an author; however, the Project should be acknowledged and the data should be cited (see below).

Citing Project data

Any publication which has made use of Project data should directly cite the data release that was used. Guidance on how to cite a specific data release will be given on that data release’s webpage.

Researchers who have questions about whether they may make presentations or submit papers using Project data, or whether to include the Anopheles gambiae 1000 Genomes Consortium as an author, may contact Martin Donnelly (M.J.Donnelly@liverpool.ac.uk).

Coordination and membership

The Ag1000G Project is run by a research consortium with partners from multiple institutions. The Partner Working Group and the Project Management Group provide strategic guidance and operational management to the Project. Members of these groups, and affiliates involved in data production and analysis, comprise the Ag1000G Consortium.

Partner Working Group 

The Partner Working Group (PWG) comprises the investigators contributing samples, other significant resources or strategic guidance to the Project:

  • Martin Donnelly, Chair (Liverpool School of Tropical Medicine and Wellcome Trust Sanger Institute)
  • Nora Besansky (University of Notre Dame)
  • Beniamino Caputo (University of Rome)
  • Alessandra della Torre (University of Rome)
  • Charles Godfray (University of Oxford)
  • Mara Lawniczak (Wellcome Trust Sanger Institute)
  • Dominic Kwiatkowski (University of Oxford and Wellcome Trust Sanger Institute)
  • Janet Midega (KEMRI)
  • Dan Neafsey (Broad Institute)
  • Samantha O’Loughlin (Imperial College)
  • Joao Pinto (NOVA University Lisbon)
  • Michelle Riehle (University of Minnesota)
  • Igor Sharakhov (Virginia Tech)
  • Kenneth Vernick (Institut Pasteur)
  • David Weetman (Liverpool School of Tropical Medicine)
  • Craig Wilding (Liverpool John Moores University)

Project Management and Operations Group 

The Project Management Group (PMG) is responsible for project management, data production and handling, QC and exploratory analyses in conjunction with the Data Analysis Group (DAG).

  • Dominic Kwiatkowski, Chair (University of Oxford and Wellcome Trust Sanger Institute)
  • Tiago Antao (Liverpool School of Tropical Medicine)
  • Vikki Cornelius (University of Oxford)
  • Martin Donnelly (Liverpool School of Tropical Medicine and Wellcome Trust Sanger Institute)
  • Christa Henrichs (University of Oxford)
  • Ben Jeffery (University of Oxford)
  • Mara Lawniczak (Wellcome Trust Sanger Institute)
  • Alistair Miles (University of Oxford)
  • Dawn Muddyman (Wellcome Trust Sanger Institute)
  • Jim Stalker (Wellcome Trust Sanger Institute)
  • Ian Wright (University of Oxford)

Data Analysis Group 

The Data Analysis Group (DAG) is undertaking core project analyses and includes members of both the PWG and PMG, alongside additional members of the Ag1000G Consortium.

  • Dominic Kwiatkowski, Chair (University of Oxford and Wellcome Trust Sanger Institute)
  • Tiago Antao (Liverpool School of Tropical Medicine)
  • Michael Bateman (University of Cambridge)
  • Giordano Botta (University of Rome and La Sapienza University Di Roma)
  • Chris Clarkson (Liverpool School of Tropical Medicine)
  • Martin Donnelly (Liverpool School of Tropical Medicine and Wellcome Trust Sanger Institute)
  • Danica Fabrigar (University of Oxford)
  • Michael Fontaine (University of Notre Dame)
  • Nick Harding (University of Oxford)
  • Mara Lawniczak (Wellcome Trust Sanger Institute)
  • Alistair Miles (University of Oxford)
  • Janet Midega (KEMRI)

Funding and support 

Sequencing of all samples is being carried out at the Wellcome Trust Sanger Institute, funded by the Wellcome Trust. Core analytical, management and support teams are funded by the Wellcome Trust via the MalariaGEN Resource Centre and the Wellcome Trust Sanger Institute.

Contact

For more information about the project, contact Alistair Miles (alistair.miles@well.ox.ac.uk).