The Ag1000G Consortium has made a new data release based on whole-genome sequencing of 765 wild-caught mosquitoes from eight African countries and an additional 80 mosquitoes from four crosses carried out in the lab.
This release builds on the previous release from phase 1, adding new data on genome accessibility, genetic variation in the four crosses, and estimated haplotypes for both wild-caught and lab mosquitoes. Taken together, these data provide a higher-resolution view of genetic variation in the major malaria vector, and pave the way for studies of evolutionary processes such as the emergence and spread of insecticide resistance.
The laboratory crosses, bred at the Liverpool School of Tropical Medicine and sequenced at the Wellcome Trust Sanger Institute, are essentially mosquito families – parents and offspring. Piecing together a mosquito genome is tricky, and comparing the genetic changes between generations of related mosquitoes can help researchers to identify errors in this process.
“In some regions of the genome, we’re confident that when a genetic change occurs we will be able to find it. In other regions, it’s not so straightforward,” says Alistair Miles, who coordinated this data release. “We’ve used the new data from the crosses to build a deeper understanding of which parts of the genome we can study with confidence. This has really helped us to improve the quality of the data on genetic variation within the wild mosquito populations.”
While the variant calls in the wild-caught mosquitoes are unchanged from the previous phase 1 release, the data on genome accessibility led to a new filtering strategy that excludes the genetic changes that are most likely to be errors, leaving a more accurate picture of genetic variation.
Also new in this release are the first genome-wide data on mosquito haplotypes, which open the door to a range of new analyses of population structure and natural selection. “Estimating haplotypes for wild-caught mosquitoes is very challenging, because natural genetic diversity is exceptionally high,” says Nick Harding, who led this analysis. “We may be able to improve the data in future releases but for now this is a valuable first step.”
The data can be explored via an interactive web application, developed by the MRC Centre for Genomics and Global Health. The application can be used to find single nucleotide polymorphisms (SNPs) and visualise genotypes of individual mosquitoes, as well as interact with other data including genome accessibility and genetic diversity within the different natural populations.
The Ag1000G Consortium is currently working on a paper describing various large-scale analyses of data from this release. In the meantime, use of the data by researchers outside the Consortium is very much encouraged. The data are released under Fort Lauderdale conditions; for more information, see the Consortium’s statement on data use or contact Alistair Miles <alistair.miles [at] well.ox.ac.uk>.
Read more about the data release: www.malariagen.net/data/ag1000g-phase1-AR3
Explore the data: www.malariagen.net/apps/ag1000g
Learn more about the Ag1000G Consortium: www.malariagen.net/ag1000g