This release contains SNP genotype data from cases and controls genotyped on the Affymetrix 500K array.
These data have been deposited in the European Genome-phenome Archive (EGA) under EGA Study Code EGAS00000000026, which is split into two data packages:
EGA Study ID | EGA Data Set ID | # Samples |
---|---|---|
EGAS00000000026_Controls | EGAD00000000017 | 1,496 |
EGAS00000000026_Cases | EGAD00000000018 | 1,059 |
- All cases have been diagnosed with malaria in a hospital.
- Controls were sampled from the general population and from new births.
Data Package Structure
Each data package contains:
Genotypes
The Affymetrix 500K SNP chip can yield approximately 2 GB of data per cohort, so this platform’s genotype data have been partitioned by chromosome and sorted by SNP position.
Each file is presented in tab-delimited format and contains one genotype per line. Regardless of how the SNPs are organised, all assays are sorted according to sample so that the file can be readily separated into sample blocks. It should also be noted that all genotypes for Affymetrix have been configured to the ‘+’ strand of the SNP. The following is a brief example of the genotype data format:
SNP | SAMPLE | GENOTYPE | SCORE |
---|---|---|---|
rs1234567 | ID-XXXXXXX | CC | 0.9262 |
rs1234568 | ID-XXXXXXX | TC | 0.8650 |
rs1234569 | ID-XXXXXXX | AA | 0.9117 |
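Because the calls are sorted by sample, a genotype file can be split into per-sample blocks in a single pass. The following Python sketch illustrates one way to do this. It assumes that the file starts with a header row and that the four columns appear in the order shown above; neither is confirmed here, so check a real file first. The names `Call`, `read_calls` and `sample_blocks` are hypothetical, and the demo rows are made up.

```python
import csv
import io
from itertools import groupby
from typing import Iterator, List, NamedTuple, TextIO, Tuple


class Call(NamedTuple):
    snp: str
    sample: str
    genotype: str
    score: float


def read_calls(handle: TextIO, has_header: bool = True) -> Iterator[Call]:
    """Yield one Call per line of a tab-delimited genotype file."""
    reader = csv.reader(handle, delimiter="\t")
    if has_header:
        next(reader, None)  # assumption: first row is SNP/SAMPLE/GENOTYPE/SCORE
    for row in reader:
        if not row:
            continue  # skip blank lines
        snp, sample, genotype, score = row[:4]
        yield Call(snp, sample, genotype, float(score))


def sample_blocks(calls: Iterator[Call]) -> Iterator[Tuple[str, List[Call]]]:
    """Group consecutive calls by sample; relies on the file being sorted by sample."""
    for sample, block in groupby(calls, key=lambda call: call.sample):
        yield sample, list(block)


# Made-up rows in the documented layout, not real data.
demo = io.StringIO(
    "SNP\tSAMPLE\tGENOTYPE\tSCORE\n"
    "rs1234567\tID-0000001\tCC\t0.9262\n"
    "rs1234568\tID-0000001\tTC\t0.8650\n"
    "rs1234567\tID-0000002\tCT\t0.9117\n"
)
blocks = list(sample_blocks(read_calls(demo)))
for sample, calls in blocks:
    print(sample, len(calls))
```

`groupby` only merges adjacent rows, so it holds one sample's calls in memory at a time; if a file turned out not to be sorted by sample, the same sample would appear in more than one block.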
Intensities
Quantile normalised signal data were generated from the Affymetrix intensity (‘CEL’) files and used as input to the CHIAMO genotype calling program. Software to perform the normalisation is available (see Available software). The format of the signal data is tab-delimited plain text; there is one line per SNP, consisting of IDs, position, alleles and one pair of intensities per sample for each of the two alleles. All genotypes have also been configured to the ‘+’ strand of the SNP. The following is a brief example of a signal file.
AFFYID | RSID | pos | AlleleA | AlleleB | 1234A1_A | 1234A1_B | 1234A2_A | … |
---|---|---|---|---|---|---|---|---|
SNP_A-0123456 | rs001 | 10000 | C | T | 0.407238 | 1.366599 | 0.347438 | … |
SNP_A-0123457 | rs002 | 20000 | A | G | 0.958866 | 1.084143 | 0.148448 | … |
SNP_A-0123458 | rs003 | 30000 | C | G | 1.943426 | 0.291587 | 1.610764 | … |
Please note that these files may contain very long lines and are not intended to be human-readable.
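Given the long lines, it is usually easier to stream a signal file one SNP at a time than to load it whole. The sketch below is one possible reader in Python, under two assumptions that should be checked against a real file: the first line is a header, and the intensity columns come in adjacent pairs named `<sample>_A` and `<sample>_B` as in the example above. The function name `read_signals` is hypothetical, and the demo values are made up.

```python
import io
from typing import Dict, Iterator, TextIO, Tuple

FIXED_COLUMNS = 5  # AFFYID, RSID, pos, AlleleA, AlleleB


def read_signals(handle: TextIO) -> Iterator[Tuple[dict, Dict[str, Tuple[float, float]]]]:
    """Yield (snp_info, {sample: (intensity_A, intensity_B)}) for each SNP line."""
    header = handle.readline().rstrip("\n").split("\t")
    a_columns = header[FIXED_COLUMNS::2]  # every first column of each A/B pair
    # Assumption: A-allele columns end in "_A"; strip it to recover the sample ID.
    samples = [name[:-2] if name.endswith("_A") else name for name in a_columns]
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < FIXED_COLUMNS:
            continue  # skip blank or truncated lines
        snp = {
            "affy_id": fields[0],
            "rsid": fields[1],
            "pos": int(fields[2]),
            "allele_a": fields[3],
            "allele_b": fields[4],
        }
        values = [float(value) for value in fields[FIXED_COLUMNS:]]
        pairs = dict(zip(samples, zip(values[0::2], values[1::2])))
        yield snp, pairs


# Made-up two-sample file in the documented layout, not real data.
demo = io.StringIO(
    "AFFYID\tRSID\tpos\tAlleleA\tAlleleB\t1234A1_A\t1234A1_B\t1234A2_A\t1234A2_B\n"
    "SNP_A-0123456\trs001\t10000\tC\tT\t0.407238\t1.366599\t0.347438\t0.5\n"
    "SNP_A-0123457\trs002\t20000\tA\tG\t0.958866\t1.084143\t0.148448\t1.25\n"
)
records = list(read_signals(demo))
first_snp, first_pairs = records[0]
print(first_snp["rsid"], first_pairs["1234A1"])
```

Reading line by line keeps memory use proportional to the number of samples rather than the number of SNPs.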
Samples
We are providing data from two cohorts, in files that come with information describing each sample. These files are tab-delimited and contain each sample’s gender, plate and well number, cohort and ethnic group. They are denoted ‘samples’ files; for example, Affymetrix_20080506fs1_samples_AFC.txt. The following is a brief example of a sample support file:
SAMPLE | GENDER* | COHORT | PLATE/WELL | ETHNICITY** |
---|---|---|---|---|
ID-XXXXXX1 | 2 | AFC | 12701b2 | Jola |
ID-XXXXXX2 | 1 | AFC | 12701c2 | Fula |
ID-XXXXXX3 | 2 | AFC | 12701d2 | Others |
* Females denoted 2, males denoted 1, undefined on manifest is denoted 0.
** Ethnicity is given only for the major ethnic groups; all other groups have been pooled and labelled “Others”.
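The gender codes can be decoded while the samples file is read. The Python sketch below shows one way to do this, assuming a header row and the five columns in the order shown above; both assumptions should be verified against a real samples file. The names `read_samples` and `GENDER_LABELS` are hypothetical, and the demo rows are made up.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterator, TextIO

# Codes as documented: 1 = male, 2 = female, 0 = undefined on the manifest.
GENDER_LABELS = {"1": "male", "2": "female", "0": "undefined"}


def read_samples(handle: TextIO, has_header: bool = True) -> Iterator[Dict[str, str]]:
    """Yield one record per sample from a tab-delimited samples file."""
    reader = csv.reader(handle, delimiter="\t")
    if has_header:
        next(reader, None)  # assumption: first row holds the column names
    for row in reader:
        if not row:
            continue  # skip blank lines
        sample, gender, cohort, plate_well, ethnicity = row[:5]
        yield {
            "sample": sample,
            "gender": GENDER_LABELS[gender.strip()],  # KeyError on an undocumented code
            "cohort": cohort,
            "plate_well": plate_well,
            "ethnicity": ethnicity,
        }


# Made-up rows in the documented layout, not real data.
demo = io.StringIO(
    "SAMPLE\tGENDER\tCOHORT\tPLATE/WELL\tETHNICITY\n"
    "ID-0000001\t2\tAFC\t12701b2\tJola\n"
    "ID-0000002\t1\tAFC\t12701c2\tFula\n"
    "ID-0000003\t2\tAFC\t12701d2\tOthers\n"
)
samples = list(read_samples(demo))
gender_counts = Counter(record["gender"] for record in samples)
print(gender_counts)
```

Letting an unexpected gender code raise an error, rather than mapping it silently to "undefined", makes format surprises visible early.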
Note that, for some data sets on this site, the chromosome X data have been split into two ‘chromosomes’, 23 and 24, because the region not homologous with Y (23) needed to be treated differently from the pseudoautosomal region (24).
Supplementary data
In March 2013, we added some supplementary data for use with this dataset.
A ReadMe file accompanying the supplementary data describes their contents and how they relate to the main data.
Data sets
Controls
EGA Study ID: EGAS00000000026
EGA Data Set ID: EGAD00000000017 (1,496 controls)
Method: Affymetrix 500K array
For each of these samples, one set of genotypes – called by CHIAMO – is available, as discussed and used in the analysis by the Wellcome Trust Case Control Consortium (WTCCC).
Cases
EGA Study ID: EGAS00000000026
EGA Data Set ID: EGAD00000000018 (1,059 cases)
Method: Affymetrix 500K array
For each of these samples, one set of genotypes – called by CHIAMO – is available, as discussed and used in the analysis by the Wellcome Trust Case Control Consortium (WTCCC).
Apply for access
Archived
Data package contact
Citations
Further details and methodologies can be found in Jallow et al. Genome-wide and fine-resolution association analysis of malaria in West Africa. Nat. Genet. 2009 Jun; 41(6): 657-665.