Large public data set gives unprecedented view of African mosquito genome variation

The first major data set from the Anopheles gambiae 1000 Genomes (Ag1000G) project has been released, comprising whole genome sequence data on 765 mosquitoes collected from 8 countries spanning sub-Saharan Africa. This release from phase 1 of the project offers an unprecedented view of genetic differences within and between mosquito populations, in locations where malaria remains a public health priority.

News 11 Dec 2014
Anopheles gambiae. Photo credit: Johns Hopkins Malaria Research Institute, WikiCommons 2006, CC-BY2.5.

A. gambiae is one of the primary vectors of the malaria parasite Plasmodium falciparum. This new data set is a quantum leap in terms of the breadth and depth of data now available for researchers studying how the species has become such an effective transmitter of malaria, and tracking the emergence and spread of insecticide resistance.

The release incorporates data on 44 million single nucleotide polymorphisms (SNPs). That is more than the human 1000 Genomes Project discovered from a similar number of genomes, even though the human genome is more than ten times larger than that of A. gambiae, and it highlights the spectacular natural diversity that exists in mosquito populations.

“Understanding how and why mosquitoes are genetically different from each other is fundamental to many areas of malaria research,” says Dominic Kwiatkowski, one of the project’s founders and chair of the Ag1000G data analysis group. “We hope these data will be a valuable resource for the community and will lead to new discoveries that make a difference for malaria control in Africa.”

The Ag1000G project is using whole genome deep sequencing to provide a high-resolution view of genetic variation in natural populations of A. gambiae. The wild-caught mosquitoes are collected at field sites across Africa by Ag1000G partners and sequenced in the UK at the Wellcome Trust Sanger Institute in Hinxton.

Martin Donnelly, of the Liverpool School of Tropical Medicine, explained the importance of this shared data resource. “We hope that these data will allow the Anopheles research community to understand the evolutionary processes that make Anopheles gambiae such a formidable malaria vector.” Professor Donnelly is a co-founder of the project and chairs the Ag1000G partner working group, which brings together partners from 13 research institutions.

A strong ethos of the project is to release data prior to publication, with the expectation that they will be valuable for other researchers, in keeping with the Fort Lauderdale principles regarding the publication of global analyses of the data.

As well as files available for download, data in this release can be explored via a new web application developed by the MRC Centre for Genomics and Global Health (CGGH). The application can be used to find and query SNPs and visualise genotypes in individual mosquitoes.

This powerful new tool has been designed to help researchers access the full richness of the data, which can be challenging without dedicated computational facilities and analytical support.

“The sheer scale and complexity of these data can be daunting, even for a seasoned bioinformatician,” says Alistair Miles, who coordinated this data release and is leading analyses of the data. “Many different evolutionary forces and historical events have left their mark on these 765 genomes, both ancient and recent. Early analyses have shown that insecticide pressure in particular has had a profound impact, reinforcing our concerns that mosquito populations across Africa are evolving rapidly in response to malaria control interventions.”

The Ag1000G analysis group is currently working on a paper describing these and other key features of the data and plans to share its findings as soon as possible.

For more information about the project, visit the Ag1000G home page. For technical information about this data release, see the Ag1000G phase 1 AR2 data release page.