The Wellcome Sanger Institute’s core facility, at peak efficiency, can sequence 40,000 billion DNA bases per day. In 2021, it churned out the equivalent of one gold-standard human genome (in which all 3 billion bases are read on average 30 times each) every three minutes.
That’s impressive and important, but might be overkill when it comes to tracking malaria in endemic countries.
In the case of malaria and other diseases not endemic to the UK, every molecule of DNA that passes through Sanger’s sequencing machines must come from elsewhere. The process of collecting, shipping, and sequencing samples and then analysing and returning data can take too long when partners are looking for answers to public health questions such as:
“Do my samples have this mutation at this location, which we know causes resistance to this drug?”
For specific questions like this, there is a much faster and cheaper way to get the answers hidden in the genetic code: amplicon sequencing.
What is an amplicon?
An amplicon is simply a short fragment of DNA or RNA that gets amplified (i.e. copied many times). In the context of genomic surveillance, if you pick regions of the genome where mutations are known to confer resistance, you can then seek out and sequence just those regions.
For example, there’s a gene in Plasmodium falciparum parasites called k13 (or kelch-13) that is 2,181 nucleotides long and codes for 726 amino acids. Exactly what it does for the parasites is not currently known, although it is thought to be important for the parasite’s development in human blood. All it takes is a single mutation in its genetic code, a single nucleotide polymorphism (SNP), that changes one of at least 10 amino acids (the 539th and 580th being two common ones), and the parasite can resist artemisinin. Artemisinin is currently the best available anti-malarial drug, and losing it would be disastrous.
This is a scary prospect, but it also presents an opportunity. If it’s only k13 that determines the parasites’ susceptibility to artemisinin, then by focusing solely on the 2,181 nucleotides that make up k13 and ignoring the other 22 million in the parasite’s genome, you can find out quickly and cheaply if your anti-malarials will work. That is crucial information, and it is better to know within weeks or months than to wait years.
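To make those numbers concrete, here is a minimal Python sketch of the arithmetic. It uses only the figures quoted above (2,181 nucleotides, 726 amino acids, roughly 22 million other nucleotides). The function names are made up for illustration, and the coordinate mapping assumes the coding sequence is one uninterrupted reading frame starting at nucleotide 1.

```python
# Figures quoted in the article.
K13_LENGTH_NT = 2181          # nucleotides in the k13 gene
K13_LENGTH_AA = 726           # amino acids in the k13 protein
OTHER_GENOME_NT = 22_000_000  # rest of the parasite genome (approximate)

# 726 codons of 3 nucleotides, plus one 3-nucleotide stop codon, give 2,181.
assert K13_LENGTH_AA * 3 + 3 == K13_LENGTH_NT


def codon_coordinates(aa_position: int) -> tuple[int, int]:
    """Return the 1-based (first, last) nucleotide positions of one codon.

    Assumption: a single uninterrupted reading frame starting at nucleotide 1.
    """
    if not 1 <= aa_position <= K13_LENGTH_AA:
        raise ValueError(f"position must be between 1 and {K13_LENGTH_AA}")
    first = 3 * (aa_position - 1) + 1
    return first, first + 2


def targeted_fraction() -> float:
    """Share of the genome that is read when only k13 is sequenced."""
    return K13_LENGTH_NT / (K13_LENGTH_NT + OTHER_GENOME_NT)


if __name__ == "__main__":
    for position in (539, 580):
        first, last = codon_coordinates(position)
        print(f"amino acid {position}: nucleotides {first}-{last}")
    print(f"k13 is about {targeted_fraction():.4%} of the genome")
```

Under these assumptions, the two common resistance positions sit within a couple of hundred nucleotides of each other, and the whole gene is roughly one ten-thousandth of the genome.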
How does amplicon sequencing work?
Amplicon sequencing for malaria genomic surveillance revolves around a few processes including selective whole genome amplification (sWGA), the polymerase chain reaction (PCR), incorporation of tags, and Illumina sequencing.
The first step is to extract the parasite DNA from dried blood spots. This is tricky because, in a typical blood spot, more than 95% of the genetic material will be human rather than parasite. Cristina Ariani, the Malaria Parasite Surveillance Lead at the Wellcome Sanger Institute’s Genomic Surveillance Unit (GSU), co-led the research that established sWGA as a viable strategy for obtaining enough parasite DNA to make further analysis possible. The sWGA method uses primers that preferentially bind to and amplify parasite DNA sequences while leaving the human DNA behind.
Next, we want to amplify specific genes within the parasite genome. In the lab, researchers combine a DNA-synthesising enzyme with a soup of free nucleotides and “primers”: short, unique sequences that attach to the regions of the genome being targeted. As the mixture cycles through different temperatures, the DNA double helix separates, the primers bind to the single strands on either side of the target region, and the enzyme gets to work building new copies of the selected areas. In a matter of hours, if everything has gone well, there will be millions of copies of the amplicon.
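The jump to millions of copies follows from the doubling at each temperature cycle. The sketch below uses the standard idealised model, in which every cycle multiplies the number of copies by (1 + efficiency). The function names, the starting quantity, and the efficiency values are illustrative assumptions, not figures from the protocol.

```python
def copies_after_pcr(start_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Estimate amplicon copies after a number of PCR cycles.

    efficiency = 1.0 means perfect doubling every cycle; real reactions
    fall short of that and plateau as reagents run out.
    """
    if start_copies < 0 or cycles < 0:
        raise ValueError("start_copies and cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return start_copies * (1.0 + efficiency) ** cycles


def cycles_needed(start_copies: int, target_copies: int, efficiency: float = 1.0) -> int:
    """Smallest number of cycles that reaches at least target_copies."""
    if start_copies <= 0 or not 0.0 < efficiency <= 1.0:
        raise ValueError("need start_copies > 0 and 0 < efficiency <= 1")
    copies = float(start_copies)
    cycles = 0
    while copies < target_copies:
        copies *= 1.0 + efficiency
        cycles += 1
    return cycles


if __name__ == "__main__":
    print(copies_after_pcr(1, 20))          # one template, perfect doubling
    print(cycles_needed(1, 1_000_000))      # cycles to pass a million copies
    print(cycles_needed(1, 1_000_000, 0.9)) # the same with 90% efficiency
```

With perfect doubling, a single template molecule passes a million copies in 20 cycles, which is why a few hours of thermal cycling is enough.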
Picking the right combination of letters for a primer is tough. Primers must be short so that they’re easy to produce, and unique and specific to the region of interest on the genome so that only the intended sequence is amplified. If there are multiple target areas, all the primers must also play nicely together by having similar annealing temperatures.
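A rough sketch of those three checks in Python follows. It estimates melting temperature with the Wallace rule (2 °C for each A or T, 4 °C for each G or C), a simple rule of thumb for short primers. Real primer design tools use more detailed thermodynamic models and also search both DNA strands. The function names and the 5 °C tolerance are assumptions made for this example.

```python
def wallace_tm(primer: str) -> int:
    """Approximate melting temperature (degrees C) by the Wallace rule."""
    seq = primer.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("primer must be a non-empty string of A, C, G, T")
    at_count = seq.count("A") + seq.count("T")
    gc_count = seq.count("G") + seq.count("C")
    return 2 * at_count + 4 * gc_count


def occurrences(primer: str, template: str) -> int:
    """Count exact matches (overlaps included) on the given strand only."""
    primer, template = primer.upper(), template.upper()
    return sum(
        1
        for start in range(len(template) - len(primer) + 1)
        if template.startswith(primer, start)
    )


def is_unique(primer: str, template: str) -> bool:
    """True if the primer matches the template in exactly one place."""
    return occurrences(primer, template) == 1


def compatible(primers: list[str], max_tm_spread: int = 5) -> bool:
    """True if all primers melt within max_tm_spread degrees of each other.

    The 5-degree default is an illustrative tolerance, not a protocol value.
    """
    temperatures = [wallace_tm(p) for p in primers]
    return max(temperatures) - min(temperatures) <= max_tm_spread
```

For instance, `compatible(["ACGTACGT", "AACCGGTT"])` is true because both toy primers have the same base composition, while an all-A primer and an all-G primer of the same length would fail the check.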
There are three “panels” of primers used in the MalariaGEN amplicon toolkit: Genetic Report Card 1 (GRC1), GRC2, and Speciation. GRC1 has 68 pairs of primers that amplify 68 different regions of the parasite genome that are important for drug resistance. The GRC2 panel is similar to GRC1 but has 66 pairs of primers. Between GRC1 and GRC2, we can identify mutations that are known to lead to resistance to many typical anti-malarial drugs. These include artemisinin, chloroquine, mefloquine, piperaquine, pyrimethamine, and sulfadoxine. To complete the set, the Speciation panel has only two pairs of primers. These are used to confirm whether the Plasmodium in the sample is indeed P. falciparum or a different species, like P. vivax.
How does the lab process work? A drop of solution containing all the primers is placed in every well of a plate, and then amplified DNA from dried blood spots goes in, one sample per well. The targeted genes get amplified through PCR, and then a unique set of DNA barcode tags is added to each sample. This allows the samples to be pooled for sequencing while their identity can still be determined afterwards. A series of quality-control steps ensures that the samples going into a bench-top next-generation sequencer like an Illumina MiSeq are high quality. The resulting data is used to create genetic report cards, where, for each sample, it’s possible to say which drugs the parasite is likely to be resistant to.
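The barcode-then-pool idea can be illustrated with a toy Python model. Everything in it is hypothetical: the four-letter barcodes, the sample names, the idea that each read is just a barcode followed by a single codon, and the majority-vote rule. Real pipelines handle barcodes and variant calling with dedicated software, and a sample can contain more than one parasite strain, which a majority vote ignores.

```python
from collections import Counter, defaultdict

# Hypothetical values for illustration only.
BARCODE_LENGTH = 4
BARCODE_TO_SAMPLE = {"ACGT": "sample_A", "TGCA": "sample_B"}
RESISTANCE_AMINO_ACID = "Y"  # toy marker: tyrosine in place of cysteine
CODON_TO_AA = {"TGT": "C", "TGC": "C", "TAT": "Y", "TAC": "Y"}  # partial table


def demultiplex(reads: list[str]) -> dict[str, list[str]]:
    """Group pooled reads by sample, using the barcode at the start of each read."""
    by_sample = defaultdict(list)
    for read in reads:
        barcode, insert = read[:BARCODE_LENGTH], read[BARCODE_LENGTH:]
        sample = BARCODE_TO_SAMPLE.get(barcode)
        if sample is not None:  # reads with unknown barcodes are dropped
            by_sample[sample].append(insert)
    return dict(by_sample)


def report_card(reads: list[str], min_reads: int = 2) -> dict[str, str]:
    """Give each sample a one-line call based on its most common amino acid."""
    report = {}
    for sample, inserts in demultiplex(reads).items():
        calls = Counter(CODON_TO_AA[i] for i in inserts if i in CODON_TO_AA)
        if sum(calls.values()) < min_reads:
            report[sample] = "insufficient data"
            continue
        amino_acid, _count = calls.most_common(1)[0]
        if amino_acid == RESISTANCE_AMINO_ACID:
            report[sample] = "resistance marker present"
        else:
            report[sample] = "resistance marker absent"
    return report


if __name__ == "__main__":
    pooled = ["ACGTTAT", "ACGTTAT", "ACGTTGT", "TGCATGT", "TGCATGC", "GGGGTAT"]
    for sample, call in report_card(pooled).items():
        print(sample, "->", call)
```

The point of the sketch is only that the barcode lets a read be traced back to its well after pooling, so one sequencing run can produce a separate report for every sample.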
For a full, technical breakdown of the protocol, the MalariaGEN team has produced the Amplicon Sequencing Toolkit, available online from malariagen.net.
Amplicon sequencing for genomic surveillance of malaria has been implemented in the Greater Mekong Subregion by partners including Olivo Miotto. Labs in Ghana (led by Lucas Amenga-Etego) and The Gambia (led by Alfred Amambua-Ngwa) are now set up to do the same.
The Ghanaian and Gambian labs are part of a larger NIHR-funded project to create a West African genomic surveillance network, with regional hubs sequencing samples from all over the region and returning actionable data to National Malaria Control Programs (NMCPs) quickly. This kind of surveillance can be used to inform policy decisions about which drugs to use in which areas, and can help spot emerging threats before they become severe.
While useful for parasite drug resistance surveillance, amplicon sequencing can also be used to identify vector species. Mara Lawniczak and her team have developed a panel of 62 primers called ANOSPP that can be used to identify mosquito species. The protocol is also non-destructive, which means that samples can be kept intact for other analyses. The details of the approach were published in Molecular Ecology Resources in 2022.
As amplicon sequencing for genomic surveillance moves from pilot programs to established protocols, the proof of its efficacy is ever more apparent.
The goal of the Wellcome Sanger Institute’s new Genomic Surveillance Unit (GSU) is to support partners around the world to achieve their vision for genomic surveillance. This will likely include amplicon sequencing.
That’s not to say that the Wellcome Sanger Institute’s core facility will go quiet as sequencing moves elsewhere. Whole Genome Sequencing (WGS) remains the only way to discover previously unknown mutations and tease out their effects. There are also certain analyses, like copy number variation, that can only be accomplished with WGS data. New scientific discovery will continue to require the facilities, expertise, and risk-tolerance that sequencing hubs like the Sanger Institute can supply.
It’s becoming clear that amplicon sequencing is a good option for routine genomic surveillance of malaria parasites. It works, it’s cost-effective, and the whole end-to-end process can be handled in-country. Doing the work in-country provides quicker answers that are more relevant to public health questions, builds a larger pool of trained personnel, and makes it easier to integrate genomic information within the community.