Wednesday, June 15, 2011

Getting the most out of the Interpretome chromosome painting + some suggestions for the future

The Interpretome chromosome "painting" is the best tool of its kind at the moment. It's ultra user-friendly, and has a range of advanced options that can be tweaked to dig deep into one's inter-continental ancestry. It's still a work in progress, so some of the features aren't very accurate, like the painting based on data from HapMap 3. So for the time being, I recommend sticking to the standard HapMap 2 option. This means getting a three-way inter-continental admixture test, based on the following references: white Americans of Northern and Western European origin (CEU), Yoruba from Nigeria (YRI), and Chinese and Japanese (CHB + JPT).

However, I suggest that users crank up the "number of samples" to 50, and lower the SNP/Kb threshold to 40 SNPs/120 Kb. After running tests on myself and selected reference samples, I believe this setting is optimal for Europeans from anywhere in Europe. Basically, this seems to be the zone where Europeans are likely to get the maximum amount of information with only slight noise. Below are two screen caps of results obtained from my own raw data file after using the tool multiple times at the settings described above. One suggests the genome to be entirely of Northern and/or Western European origin, and the other shows two small East Asian hits near the left tip (telomere) of chromosome 6.

Most Europeans with no known ancestry from outside of Europe should expect such results: either one or the other, or both if you run the test more than once. It's unlikely that small segments which flash in and out at these settings, and are positioned near the generally more highly conserved telomeres (tips) or centromeres (middle break points), indicate genuine admixture. However, I would say that if they're of a fair size (i.e. more than a couple of scratches), and show up regularly at 40 SNPs/120 Kb, then they're certainly worth investigating further. Indeed, if they also show up at higher SNP/Kb settings, then I think it's a fairly sure bet that they're real.
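For anyone who prefers to keep track of repeated runs in a script rather than by eye, the rule of thumb above can be roughly sketched as follows. This is only an illustration under my own assumptions: Interpretome doesn't export segments in this form as far as I know, so the `Segment` record, the `recurring_segments` helper, the 75% recurrence cut-off and the 100 Kb minimum size are all hypothetical, and the coordinates in the toy data are invented rather than taken from my results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One non-European painted segment, noted down by hand (hypothetical record)."""
    chrom: str
    start: int      # base-pair position where the segment begins
    end: int        # base-pair position where the segment ends
    ancestry: str   # reference label, e.g. "YRI" or "CHB+JPT"


def overlaps(a, b):
    """True if two segments share a chromosome, an ancestry label and some bases."""
    return (a.chrom == b.chrom and a.ancestry == b.ancestry
            and a.start < b.end and b.start < a.end)


def recurring_segments(runs, min_fraction=0.75, min_length_bp=100_000):
    """Keep one representative of each segment that is big enough and
    reappears in at least min_fraction of the runs.

    Both thresholds are arbitrary assumptions, not values used by the tool.
    """
    kept = []
    for run in runs:
        for seg in run:
            if seg.end - seg.start < min_length_bp:
                continue  # too small to bother with
            if any(overlaps(seg, k) for k in kept):
                continue  # this region is already represented
            hits = sum(any(overlaps(seg, other) for other in r) for r in runs)
            if hits / len(runs) >= min_fraction:
                kept.append(seg)
    return kept


# Toy data: four runs at the same settings (all coordinates are made up).
runs = [
    [Segment("2", 50_000_000, 52_000_000, "YRI")],
    [Segment("6", 200_000, 450_000, "CHB+JPT"),
     Segment("2", 50_100_000, 52_200_000, "YRI")],
    [Segment("2", 49_900_000, 51_800_000, "YRI")],
    [Segment("6", 250_000, 400_000, "CHB+JPT"),
     Segment("2", 50_000_000, 52_000_000, "YRI")],
]

kept = recurring_segments(runs)
for seg in kept:
    print(seg)
```

In this made-up example the chromosome 6 hit turns up in only two of the four runs, so it's dropped as a flicker, while the chromosome 2 segment appears every time and is kept for a closer look. The sketch doesn't check how close a segment sits to a telomere or centromere, so that part still has to be judged by eye.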

I won't bother reviewing the HapMap 3 option just now. However, I'd like to make a few suggestions to Konrad K. and the team. Firstly, I don't think it's useful to rely on recently admixed reference populations, like the Mexicans (MEX) or African-Americans (ASW). They'll only cause noise and confusion. For instance, it's likely that many Iberians will end up with multiple "Mexican" segments using the HapMap 3 option, not because they have Mexican ancestry, but because many of the Mexican references are partly Spanish. So it'd be much more useful to rely on "purer" American samples, such as the Amerindians from the HGDP.

Indeed, it'd be wonderful if the Interpretome team could at some point put together a seven-way test, based on relatively unmixed samples from Northern Europe, Southern Europe, the Middle East, South Asia, East Asia, the Americas and Africa. They could do this by combining samples from the HGDP, Behar et al., and even Rasmussen et al. As far as I can see, it should be possible.

