Dealing with missing data on ethnicity in Malawi

5 August 2012
By Vysaul Nyirongo
Photo courtesy of Vysaul Nyirongo.

In Blantyre, Malawi, we have been collecting DNA samples from children with severe malaria admitted to the Malawi-Liverpool-Wellcome Trust and Blantyre Malaria Project research ward since 1997. These datasets were contributed to MalariaGEN's Consortial Project 1. One of the challenges we have had to overcome in the project is a lack of ethnicity data for some of these retrospective cases.

Collection of data on ethnicity

In the early years of recruitment we did not collect information on the tribe or ethnicity of patients. In later years we collected only the ethnicity of the mother. As a result, ethnicity data is missing for a large proportion of the cases recruited before 2006. Furthermore, a fairly high proportion of the children are of mixed ethnicity, i.e. their mother and father come from different ethnic groups. Data on ethnicity is important for genetic analyses because it makes it possible to control for the confounding effects of population stratification in ethnically diverse communities. The missing data therefore raised a concern for us about how to avoid problems caused by unknown population structure.

Learning new statistical methods

At the MalariaGEN Data Fellow workshops held in Oxford and at our site in Blantyre, I learnt from informatics team members that there are statistical methods that can mitigate possible population stratification problems. These include software packages such as STRUCTURE (a free package that uses multi-locus genotype data to investigate population structure) and EIGENSTRAT (which detects and corrects for population stratification in genome-wide association studies using principal components analysis). All this is in addition to learning how to get started on the basic analysis of our epidemiological genomic data!
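
To give a feel for the principal components idea behind tools like EIGENSTRAT, here is a minimal Python sketch. It is not EIGENSTRAT itself and it does not use our study data: the genotypes are simulated, and the function name, group sizes and allele frequencies are all assumptions made up for illustration. It only shows the first step, which is to standardise a samples-by-SNPs genotype matrix and extract the leading principal components, so that samples from groups with different allele frequencies separate along those components even when no ethnicity labels are recorded.

```python
import numpy as np


def top_principal_components(genotypes, n_components=2):
    """Return per-sample scores on the leading principal components.

    genotypes: array of shape (n_samples, n_snps) holding 0/1/2 allele counts.
    (Hypothetical helper for illustration; not part of any package.)
    """
    g = np.asarray(genotypes, dtype=float)
    # Estimated allele frequency of each SNP across all samples.
    p = g.mean(axis=0) / 2.0
    # Drop SNPs with no variation, which cannot be standardised.
    keep = (p > 0) & (p < 1)
    g, p = g[:, keep], p[keep]
    # Centre each SNP and scale by its expected binomial standard deviation.
    z = (g - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    # Singular value decomposition; columns of u are the sample-space axes.
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


# Simulated example: two groups whose allele frequencies differ slightly.
rng = np.random.default_rng(0)
n_per_group, n_snps = 50, 500
freq_a = rng.uniform(0.1, 0.9, n_snps)
freq_b = np.clip(freq_a + rng.normal(0.0, 0.15, n_snps), 0.05, 0.95)
geno = np.vstack([
    rng.binomial(2, freq_a, (n_per_group, n_snps)),
    rng.binomial(2, freq_b, (n_per_group, n_snps)),
])
group = np.repeat([0, 1], n_per_group)  # known here only because we simulated it

pcs = top_principal_components(geno)
mean_a = pcs[group == 0, 0].mean()
mean_b = pcs[group == 1, 0].mean()
print("Mean PC1 score, group A:", round(mean_a, 2))
print("Mean PC1 score, group B:", round(mean_b, 2))
```

In a real analysis the leading components would then be used to adjust the association test, for example as covariates, which is the part of the method this sketch leaves out.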