Data archive
Apr 2024
Species: A. gambiae sensu lato
The MalariaGEN Vector Observatory Anopheles gambiae data resource version 3.9 (Ag3.9) contains single nucleotide polymorphism (SNP) calls, copy number variant (CNV) calls and SNP haplotypes from whole-genome sequencing of mosquitoes collected in Angola (8), Benin (11), Burkina Faso (7), Cameroon (87), Equatorial Guinea (9), Ethiopia (273), Gabon (288), Guinea Bissau (14), Kenya (373), Madagascar (10), Mali (30), Sao Tome and Principe (31), South Africa (127), Tanzania (14), The Gambia (2193), The Union of the Comoros (35), Uganda (113), Zambia (6) and Zimbabwe (10) from 1988 to 2022. Ag3.9 contains 3639 whole genome sequences from An. coluzzii, An. arabiensis, An. melas, An. merus, An. quadriannulatus and An. gambiae.
Nov 2023
The MalariaGEN Vector Observatory Anopheles gambiae data resource version 3.8 (Ag3.8) contains single nucleotide polymorphism (SNP) calls, copy number variant (CNV) calls and SNP haplotypes from whole-genome sequencing of mosquitoes collected in Burkina Faso (387 samples), Gabon (43 samples), Nigeria (117 samples), and Uganda (1714 samples) from 2011 to 2020. Ag3.8 contains 2261 whole genome sequences from An. coluzzii, An. arabiensis, An. fontenillei and An. gambiae.
Nov 2023
Species: A. gambiae sensu lato
The MalariaGEN Vector Observatory Anopheles gambiae data resource version 3.7 (Ag3.7) contains single nucleotide polymorphism (SNP) calls, copy number variant (CNV) calls and SNP haplotypes from whole-genome sequencing of mosquitoes collected in Benin (451 samples), Burkina Faso (24 samples), Cameroon (744 samples), Gabon (4 samples), Ghana (528 samples), Guinea (1 sample), and Tanzania (794 samples) from 2007 to 2021. Ag3.7 contains 2546 whole genome sequences from An. coluzzii, An. arabiensis, An. fontenillei and An. gambiae. This release includes openly-available data from two literature studies: Barrón, M.G., Paupy, C., Rahola, N. et al. A new species in the major malaria vector complex sheds light on reticulated species evolution. Sci Rep 9, 14753 (2019). https://doi.org/10.1038/s41598-019-49065-5 Crawford, J.E., Riehle, M.M., Markianos, K., et...
Nov 2023
Species: A. gambiae sensu lato
The MalariaGEN Vector Observatory Anopheles gambiae data resource version 3.6 (Ag3.6) contains single nucleotide polymorphism (SNP) calls, copy number variant (CNV) calls and SNP haplotypes from whole-genome sequencing of mosquitoes collected in Côte d’Ivoire (379 samples), South Africa (341 samples), Uganda (483 samples) and Zambia (201 samples) from 2017 to 2021. Ag3.6 contains 1404 whole genome sequences from An. coluzzii, An. gambiae and An. arabiensis species.
Nov 2023
Species: A. gambiae sensu lato
The MalariaGEN Vector Observatory Anopheles gambiae data resource version 3.5 (Ag3.5) contains single nucleotide polymorphism (SNP) calls, copy number variant (CNV) calls and SNP haplotypes from whole-genome sequencing of mosquitoes collected in the Democratic Republic of Congo (673 samples), Ethiopia (85 samples) and The Gambia (382 samples) from 2001 to 2020. Ag3.5 contains 1140 whole genome sequences from An. gambiae, An. coluzzii, and An. arabiensis species.
Nov 2023
Species: A. gambiae sensu lato
The MalariaGEN Vector Observatory Anopheles gambiae data resource version 3.4 (Ag3.4) contains single nucleotide polymorphism (SNP) calls, copy number variant (CNV) calls and SNP haplotypes from whole-genome sequencing of mosquitoes collected in Burkina Faso (1126 samples), Ghana (485 samples) and Mali (206 samples) from 2014 to 2018. Ag3.4 contains 1817 whole genome sequences from An. coluzzii, An. gambiae and An. arabiensis species.
Nov 2023
Species: A. gambiae sensu lato
The MalariaGEN Vector Observatory Anopheles gambiae data resource version 3.3 (Ag3.3) contains single nucleotide polymorphism (SNP) calls, copy number variant (CNV) calls and SNP haplotypes from whole-genome sequencing of mosquitoes collected in Ghana (945 samples), Kenya (1 sample) and Uganda (56 samples) from 2013 to 2018. Ag3.3 contains 1002 whole genome sequences from An. coluzzii, An. gambiae and An. arabiensis species.
Nov 2023
Species: A. gambiae sensu lato
The MalariaGEN Vector Observatory Anopheles gambiae data resource version 3.2 (Ag3.2) contains single nucleotide polymorphism (SNP) calls, copy number variant (CNV) calls and SNP haplotypes from whole-genome sequencing of mosquitoes collected in Benin (232 samples), Côte d’Ivoire (38 samples), Ghana (666 samples), Mali (23 samples) and Togo (179 samples) from 2013 to 2018. Ag3.2 contains 1138 whole genome sequences from the An. coluzzii and An. gambiae species.
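Each Ag3.x entry gives per-country sample counts alongside an overall total, so the two can be cross-checked. The following is a minimal sketch of such a check, using text copied from the Ag3.2 entry; the helper name `tally_samples` and the regular expression are illustrative assumptions and are not part of any MalariaGEN tooling.

```python
import re

# Per-country fragment copied from the Ag3.2 entry above.
description = (
    "Benin (232 samples), Côte d’Ivoire (38 samples), Ghana (666 samples), "
    "Mali (23 samples) and Togo (179 samples)"
)


def tally_samples(text):
    """Parse 'Country (N samples)' fragments into a {country: count} dict.

    Hypothetical helper for illustration only.
    """
    pattern = re.compile(r"([^,()]+?)\s*\((\d[\d,]*) samples?\)")
    counts = {}
    for name, number in pattern.findall(text):
        # Drop a leading "and " left over from the last item of the list.
        name = re.sub(r"^\s*and\s+", "", name).strip()
        counts[name] = int(number.replace(",", ""))
    return counts


counts = tally_samples(description)
total = sum(counts.values())
print(counts)
print("total:", total)
```

The summed total can then be compared with the number of whole genome sequences stated in the same entry (1138 for Ag3.2).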
Nov 2023
Species: A. gambiae sensu lato
The MalariaGEN Vector Observatory Anopheles gambiae data resource version 3.1 (Ag3.1) contains single nucleotide polymorphism (SNP) calls, copy number variant (CNV) calls and SNP haplotypes from whole-genome sequencing of mosquitoes collected in Mali from 2012 to 2015. Ag3.1 contains 647 whole genome sequences from mosquito samples collected in two locations in Mali. Mosquito species are mostly An. coluzzii, with a smaller number of An. arabiensis and An. gambiae.
Dec 2022
Species: P. falciparum
This page provides information about the Pf7 dataset which contains genome variation data on over 20,000 worldwide samples of Plasmodium falciparum. Open the Pf7 app to view summary information about contributing studies, countries, and resistance profiles. This release contains details on contributing partner studies, sample metadata and key sample attributes inferred from genomic data, and genomic data including raw sequence reads. A description of the dataset can be found here. These data are available open access. Publications using these data should acknowledge and cite the source of the data using the following format: “This publication uses MalariaGEN data as described in ‘Pf7: an open dataset of Plasmodium falciparum genome variation in 20,000 worldwide samples’. MalariaGEN et al, Wellcome Open...
Feb 2022
Species: P. vivax
This page provides information about the Pv4 dataset, which contains genome variation data on 1,895 worldwide samples of Plasmodium vivax. The key publication is MalariaGEN et al, Wellcome Open Research 2022, 7:136 https://doi.org/10.12688/wellcomeopenres.17795.1. Full details of the methods can be found in the accompanying paper. The major changes from the v1 (May 2016 data release) pipeline are that we now a) map to the PvP01 reference genome rather than PvSal1 and b) use a pipeline based on current GATK best practices which is analogous to the Pf6 pipeline. This release contains details on contributing partner studies, sample metadata and key sample attributes inferred from genomic data, and genomic data including raw sequence reads. Further details and analytical results can be...
Dec 2021
We used DNA from over 4,000 children ascertained with severe malaria in the period 1995-2009 to test for association between human and P. falciparum genetic variants. All individuals were from Banjul, The Gambia, and from Kilifi County, Kenya, and were previously analysed for human genotypes (see doi:10.1038/s41467-019-13480-z and https://www.malariagen.net/resource/25/). For this study, we generated P. falciparum genome sequence reads using the Illumina X Ten platform and used these to identify and call parasite genetic variation. We then tested for association between human and parasite variants using a logistic regression approach implemented in the software HPTEST (https://www.well.ox.ac.uk/~gav/hptest). More information about the methodology is available on the data resource page: https://www.malariagen.net/resource/32/. Full details of our data generation and processing are available in the manuscript:...
Nov 2021
This data release includes whole genome sequences from 302 wild-caught mosquitoes collected from five sites in Cambodia. The mosquitoes included in this study were An. minimus s.s. All mosquitoes were sequenced using Illumina technology with 150bp long reads by the Wellcome Sanger Institute Parasites and Microbes Programme.
Nov 2021
This data release includes copy number variant (CNV) calls, genome-wide single nucleotide polymorphism (SNP) calls, haplotypes as well as sample metadata and sequence read alignments from whole-genome sequencing of 2,784 wild-caught mosquitoes collected from 19 countries in sub-Saharan Africa, and 297 mosquitoes comprising parents and progeny of 15 lab crosses. Three mosquito species are represented: Anopheles gambiae, Anopheles coluzzii and Anopheles arabiensis. This data was generated by the Ag1000G project which is part of the MalariaGEN vector observatory but can also be analysed together with data from the Anopheles gambiae genomic surveillance project.
Nov 2021
Species: A. gambiae sensu lato
Project: Ag1000G
This data release includes phased haplotypes for 2,784 wild-caught mosquitoes collected from 19 countries in sub-Saharan Africa. These haplotypes can be analysed directly or used as haplotype reference panels to improve phasing of other samples. Three mosquito species are represented: Anopheles gambiae, Anopheles coluzzii and Anopheles arabiensis. All mosquitoes were sequenced using Illumina technology by the Wellcome Sanger Institute Parasites and Microbes programme.
Jul 2021
Species: A. gambiae sensu lato
Project: Ag1000G
This data release includes copy number variant (CNV) calls from whole-genome sequencing of 2,784 wild-caught mosquitoes collected from 19 countries in sub-Saharan Africa, and 297 mosquitoes comprising parents and progeny of 15 lab crosses. Three mosquito species are represented: Anopheles gambiae, Anopheles coluzzii and Anopheles arabiensis. All mosquitoes were sequenced using Illumina technology by the Wellcome Sanger Institute Parasites and Microbes programme.
Feb 2021
Species: A. gambiae sensu lato
Project: Ag1000G
This data release includes sample metadata, sequence read alignments and genome-wide single nucleotide polymorphism (SNP) calls from whole-genome sequencing of 2,784 wild-caught mosquitoes collected from 19 countries in sub-Saharan Africa, and 297 mosquitoes comprising parents and progeny of 15 lab crosses. Three mosquito species are represented: Anopheles gambiae, Anopheles coluzzii and Anopheles arabiensis. All mosquitoes were sequenced using Illumina technology by the Wellcome Sanger Institute Parasites and Microbes programme.
Jan 2021
Species: P. falciparum
Project: GenRe-Mekong
This page contains information about the first GenRe-Mekong project data release (v1.0), comprising Genetic Report Cards data from 9,623 Plasmodium falciparum samples. The v1.0 data release contains details on contributing partner studies, sample metadata and key sample attributes inferred from genomic data. The release is accompanied by a publication detailing the project and highlighting a number of key analyses and results. An accompanying Resource page provides access to a wealth of supplementary information, including all methods and protocols used, details of reagents, information about the participating studies, as well as sample information, accession numbers, genotype calls and phenotype predictions. GenRe-Mekong data were produced using the SPOTmalaria genetic surveillance framework. SPOTmalaria implements standardized methodologies and processing pipelines for extracting and nalysed...
Nov 2020
Species: P. falciparum
This page contains information about the Pf6 data release: data generated by the Plasmodium falciparum Community Project using the version 6 pipeline for variant discovery and genotype calling. This release contains sample information, accession numbers and genotype calls for samples as described in Plasmodium falciparum Community Project: about the version 6 data. In 2018 the Plasmodium falciparum Community Project upgraded to version 6 of its variant discovery and genotype calling pipeline. Details of the methods can be found in the accompanying paper. The major change from previous versions is that the version 6 pipeline is based on GATK and utilises findings on genome accessibility generated by P. falciparum Genetic Crosses Project. These data are available open access. Publications using these...
Aug 2019
This release contains six packages of Illumina whole-genome sequence data and Illumina Omni 2.5M genotype data for individuals from three African countries. Individuals were collected as nominally unrelated (Burkina Faso) or as family trios (Cameroon and Tanzania). This data was also used to analyse a specific genomic region in Leffler et al. (2017), “Resistance to malaria through structural variation of red blood cell invasion receptors”, Science, 356.
Aug 2019
This data release contains SNP genotype data and association test results from our analysis of severe malaria in eleven populations. Data for eleven populations (Gambia, Mali, Burkina Faso, Ghana, Nigeria, Cameroon, Tanzania, Malawi, Kenya, Vietnam and Papua New Guinea) are available. The datasets available include raw Illumina Omni 2.5M genotype data from each population analysed in the above manuscript. In addition, we provide a set of processed data for a subset of samples that passed our quality control process, including phased and imputed SNP genotypes and a set of association test summary statistics. A set of genotypes at selected genetic variants generated using the Sequenom MassArray platform on a larger set of individuals is also available.
Jul 2019
The Gambian Genome Variation Project (GGVP) is a collaboration of the MRC Unit in The Gambia, the Wellcome Sanger Institute, the MRC Centre for Genomics and Global Health at Oxford University, and the MalariaGEN Resource Centre. The purpose of the project was to support the discovery and understanding of genetic variants that influence human disease. Please see the 1000 Genomes FTP site for a full description of the data including terms of use.
Nov 2017
Species: A. gambiae sensu lato
Project: Ag1000G
This data release includes genome-wide variant calls, haplotypes and associated data for 1,142 wild-caught specimens collected from 13 countries spanning sub-Saharan Africa, and 234 specimens comprising parents and progeny of 11 lab crosses. Any use of Project data is subject to the Terms of Use. If you have any questions regarding these data or would like to report an issue, please email Chris Clarkson (cc28@sanger.ac.uk) or raise an issue via GitHub. All mosquitoes were sequenced by the Wellcome Trust Sanger Institute’s Malaria programme.
Dec 2016
Species: A. gambiae sensu lato
Project: Ag1000G
This release adds further data on the Ag1000G phase 1 cohort, comprising 845 mosquito specimens in total. Included in the release are improved haplotype data for the 765 wild-caught specimens (now including the X chromosome) and results from various population genomic analyses and SNP validation experiments. See also the Ag1000G phase 1 AR3 data release. All mosquitoes were sequenced by the Wellcome Trust Sanger Institute’s Malaria programme. Any use of Project data is subject to the Terms of Use.
Jun 2016
Species: P. vivax
This page contains information about the P. vivax Genome Variation project May 2016 data release. This data release contains sample information, accession numbers, and genotype calls for samples used in the analyses described in Pearson et al, 2016. The full text article is accessible online. These data are available open access.
Mar 2016
Background: This data release contains SNP genotype data and association test results from our ongoing analysis of severe malaria in eleven populations. Data for three populations (Gambia, Malawi and Kenya) are available currently; additional populations will be added as they become available. If you use these data, please cite: Malaria Genomic Epidemiology Network. A novel locus of resistance to severe malaria in a region of ancient balancing selection. Nature. 2015 Oct 8;526(7572):253-7. doi: 10.1038/nature15390. This release contains two types of data: SNP genotype data. These data reflect genotyping of all samples on the Illumina Omni 2.5M array and are provided in VCF format. Additionally, we provide the clinical status, gender, and sickle trait status of each sample, and information on...
Feb 2016
Species: P. falciparum
Project: Pf3k
This page contains information about the pilot data release 5 from the Pf3k project. This release contains de novo variant discovery and genotyping across an updated sample set from the pilot phase of the project. At the time of their release, these data were subject to the Pf3k Pilot Phase Terms of Use. In September 2016, these restrictions were lifted and this dataset is now available open access.
Jan 2016
Species: P. falciparum
This page contains information about the P. falciparum Community Project Jan. 2016 data release. This release contains sample information, accession numbers and genotype calls for samples used in the analyses described in MalariaGEN P. falciparum Community Project, 2016. At the time of their release, these data were subject to the P. falciparum Community Project Terms of Use. In February 2017, these restrictions were lifted and these data are now available open access.
Oct 2015
Species: P. falciparum
Project: Pf3k
This release contains sample information and accession numbers, analysis BAMs, and de novo variant discovery and genotyping across 2,512 samples collected in 14 countries, as well as five lab strains included for method development validation. At the time of their release, these data were subject to the Pf3k Pilot Phase Terms of Use. In September 2016, these restrictions were lifted and this dataset is now available open access.
Aug 2015
Species: P. falciparum
An important output of the P. falciparum Community Project is the Catalogue of Genetic Variation in P. falciparum, which includes: single nucleotide polymorphisms (SNPs) identified via deep sequencing; allele frequencies in different geographical populations; and measures of genome accessibility – such as %GC, mapping quality and coverage – to visualise potential difficulties in identifying SNPs and accurately genotyping samples. The data in the Catalogue of Genetic Variation in P. falciparum are periodically updated and made available open access.
Jul 2015
Species: A. gambiae sensu lato
Project: Ag1000G
This data release includes variant calls and associated data for 845 mosquito specimens — 765 wild-caught specimens collected from eight countries across sub-Saharan Africa, and 80 specimens comprising parents and progeny of four crosses. All mosquitoes were sequenced by the Wellcome Trust Sanger Institute’s Malaria programme. Any use of Project data is subject to the Terms of Use.
May 2015
Species: P. falciparum
The P. falciparum Genetic Crosses project 1.0 data release comprises sequence data and variant calls on parents and 78 progeny clones from the crosses 3D7xHB3, HB3xDd2 and 7G8xGB4, sequenced by the Wellcome Trust Sanger Institute’s Malaria programme.
Apr 2015
Species: P. falciparum
Project: Pf3k
This release contains sample information, accession numbers, and baseline genotypes for 2,512 samples comprised of the 1,931 samples included in the 2.0 pilot data release as well as an additional 581 samples collected in Ghana, Mali and Malawi. At the time of their release, these data were subject to the Pf3k Pilot Phase Terms of Use. In September 2016, these restrictions were lifted and this dataset is now available open access.
Jan 2015
Species: P. falciparum
Download information about P. falciparum samples and sequence data included in published population-level analyses: ENA accession numbers; country of origin; contributors’ names and contact information. These samples and data were contributed to the P. falciparum Community Project. At the time of their release, these data were subject to the P. falciparum Community Project Terms of Use. In February 2017, these restrictions were lifted and this dataset is now available open access.
Dec 2014
Species: A. gambiae sensu lato
Project: Ag1000G
This data release includes variant calls on 765 mosquito specimens collected from eight countries across sub-Saharan Africa and sequenced by the Wellcome Trust Sanger Institute’s Malaria programme. Any use of Project data is subject to the Terms of Use.
Nov 2014
Species: P. falciparum
Project: Pf3k
This data release contains sample information, accession numbers, and baseline genotypes for 1,931 samples comprised of the 1,794 samples included in the Pf3k 1.0 pilot data release along with an additional 137 samples contributed by the Broad Institute. At the time of their release, these data were subject to the Pf3k Pilot Phase Terms of Use. In September 2016, these restrictions were lifted and this dataset is now available open access.
Aug 2014
Species: P. falciparum
Project: Pf3k
This data release comprises sample information and accession numbers for sequence reads from 1,794 samples collected from multiple locations in Africa and Asia, and contributed by the MalariaGEN P. falciparum Community Project and their partners. At the time of their release, these data were subject to the Pf3k Pilot Phase Terms of Use. In September 2016, these restrictions were lifted and this dataset is now available open access.
Jun 2014
The initial study and data description are published in: Band G et al. (2013). Imputation-based meta-analysis of severe malaria in three African populations. PLoS Genet. 9:e1003509. This data release contains three separate data packages of SNP genotype data for cases and controls from three populations: Gambia, Kenya and Malawi. This data has been deposited in the European Genotyping Archive under EGA Study Code EGAS00001000807. All cases have been diagnosed with malaria in a hospital. Controls were samples from within the general population and from new births. All samples are unrelated (but see Readme files for further details). The information provided here is common to each of the three country datasets and where differences exist these are noted. Data set structure...
May 2014
Species: A. gambiae sensu lato
Project: Ag1000G
This page contains information about the phase 1 preview data release from the Anopheles gambiae 1000 Genomes project. This data release comprises variant calls on 103 samples collected in Uganda. Any use of Project data is subject to the Terms of Use.
Feb 2014
Species: P. falciparum
An important output of the P. falciparum Community Project is the Catalogue of Genetic Variation in P. falciparum, which includes: single nucleotide polymorphisms (SNPs) identified via deep sequencing; allele frequencies in different geographical populations; and measures of genome accessibility – such as %GC, mapping quality and coverage – to visualise potential difficulties in identifying SNPs and accurately genotyping samples. The data in the Catalogue of Genetic Variation in P. falciparum are periodically updated and made available open access.
Sep 2013
Species: P. falciparum
An important output of the P. falciparum Community Project is the Catalogue of Genetic Variation in P. falciparum, which includes: single nucleotide polymorphisms (SNPs) identified via deep sequencing; allele frequencies in different geographical populations; and measures of genome accessibility – such as %GC, mapping quality and coverage – to visualise potential difficulties in identifying SNPs and accurately genotyping samples. The data in the Catalogue of Genetic Variation in P. falciparum are periodically updated and made available open access.
May 2012
Species: P. falciparum
An important output of the P. falciparum Community Project is the Catalogue of Genetic Variation in P. falciparum, which includes: single nucleotide polymorphisms (SNPs) identified via deep sequencing; allele frequencies in different geographical populations; and measures of genome accessibility – such as %GC, mapping quality and coverage – to visualise potential difficulties in identifying SNPs and accurately genotyping samples. The data in the Catalogue of Genetic Variation in P. falciparum are periodically updated and made available open access.
Mar 2011
This release contains SNP genotype data from cases and controls genotyped on Affymetrix 500K array. This data has been deposited in the European Genotyping Archive under EGA Study Code EGAS00000000026, which is split into two data packages:
EGA Study ID | EGA Data Set ID | # Samples
EGAS00000000026_Controls | EGAD00000000017 | 1,496
EGAS00000000026_Cases | EGAD00000000018 | 1,059
All cases have been diagnosed with malaria in a hospital. Controls were samples from within the general population and from new births.
Data Package Structure
Each data package contains: Genotypes, Intensities, Samples, Supplementary_data.
Genotypes
The Affymetrix 500K SNP chip can yield approximately 2GB per cohort, so this platform’s genotype data have been partitioned according to chromosome and sorted according to SNP position. Each file is presented in tab-delimited...
Jan 2011
This release contains SNP genotype data from mother-father-child trios genotyped on the Illumina 650Y array. These data include parents and a single offspring (so-called trios) from three partner studies within Consortial Project 1. All children have been diagnosed with malaria in a hospital. This data release contains a total of 4,174 samples that have passed quality control. These data have been deposited in the European Genotyping Archive (EGA) under EGA Study IDs: EGAS00000000087 (Gambia) and EGAS00000000088 (Ghana). Data files Annotations Sample Support Files Sample overlap with Gambian Case-Control on Affymetrix 500k platform Sample QC SNP QC Description of data Intensities (Normalised Signals) Genotypes Plink_files Phased Supplementary_data Sample Support Files (650Y_samples_[country]Trios.txt) The tab delimited Samples Files lists the following information for...