Friday, November 21, 2008

A couple of PCAs with Polish samples


Context is important when it comes to genetic affinity, and this is clearly illustrated by an article published in the European Journal of Human Genetics this week. For instance, on the intra-European principal component analysis (PCA) featured in this paper, Poles from Warsaw and Lodz basically sit between the samples from Dresden and Moscow, but overlap more with the latter. However, when an African and two East Asian populations are added (i.e. the plot turns into an inter-continental one), the Poles end up almost at the top of a much tighter European blob, overlapping heavily with Swedes, Slovaks, Germans and Czechs, just to name a few.




Key: CEU = Utah Americans of Western and Northern European ancestry; CHB = Northern Han Chinese from Beijing; JPT = Japanese from Tokyo; YRI = Yoruba from Nigeria.
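
To get a feel for why the European samples bunch up once distant populations are added, here is a minimal sketch in Python with NumPy. Everything in it is made up for illustration: the three "European-like" groups (west, centre, east), the two outgroups, the allele frequencies, the drift sizes and the helper names are all assumptions, and none of it comes from the Heath et al. data or methods. It just runs a plain SVD-based PCA twice, once on the three close groups and once with the outgroups included, and compares how big the west-east gap looks relative to the width of the plot.

```python
import numpy as np

def pca_scores(genotypes, n_components=2):
    """Project individuals onto the top PCs of the column-centred genotype matrix."""
    centred = genotypes - genotypes.mean(axis=0)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

def relative_gap(scores, rows_a, rows_b):
    """Distance between two group centroids, as a fraction of the PC1 range."""
    gap = np.linalg.norm(scores[rows_a].mean(axis=0) - scores[rows_b].mean(axis=0))
    extent = scores[:, 0].max() - scores[:, 0].min()
    return gap / extent

rng = np.random.default_rng(0)
n_snps, n_per_pop = 2000, 30  # toy sizes, chosen only to keep the run fast

def drifted(freqs, scale):
    """Perturb allele frequencies to mimic divergence from a shared ancestor."""
    return np.clip(freqs + rng.normal(0.0, scale, freqs.size), 0.01, 0.99)

def sample(freqs):
    """Draw diploid genotypes (0, 1 or 2 copies) for one population."""
    return rng.binomial(2, freqs, size=(n_per_pop, freqs.size))

# Hypothetical populations: three close ones on a gradient, two distant outgroups.
base = rng.uniform(0.2, 0.8, n_snps)
west = drifted(base, 0.08)
east = drifted(base, 0.08)
centre = (west + east) / 2
outgroup_a = drifted(base, 0.25)
outgroup_b = drifted(base, 0.25)

close_only = np.vstack([sample(west), sample(centre), sample(east)])
with_outgroups = np.vstack([close_only, sample(outgroup_a), sample(outgroup_b)])

west_rows = slice(0, n_per_pop)
east_rows = slice(2 * n_per_pop, 3 * n_per_pop)

europe_ratio = relative_gap(pca_scores(close_only), west_rows, east_rows)
global_ratio = relative_gap(pca_scores(with_outgroups), west_rows, east_rows)

print("west-east gap / PC1 range, close groups only:", round(europe_ratio, 3))
print("west-east gap / PC1 range, with outgroups:   ", round(global_ratio, 3))
```

In this toy setup the first two components of the second run are taken up by the outgroups, so the differences among the three close groups barely register on the plot. The same samples can look well separated or nearly identical depending on what else is in the analysis.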


The study also shows a pairwise fixation index (Fst) table featuring 12 European, one American (CEU), one Sub-Saharan (YRI), and two East Asian (CHB and JPT) samples. Most of the results appear to align with geography, although Scandinavians show higher affinity to the East Asians than their southern neighbors, the Germans and Poles, do. This suggests that East Asian or East Asian-like admixture diffused into Scandinavia from the north.
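
For anyone wondering how a pairwise Fst table like this is put together, below is a minimal sketch in plain Python. It uses Hudson's estimator in its ratio-of-averages form, which is my choice for illustration; the paper may well have used a different estimator. The population labels, allele frequencies and sample sizes are invented, so the numbers mean nothing beyond showing the mechanics: lower Fst means higher affinity.

```python
from itertools import combinations

def hudson_fst(freqs_a, freqs_b, n_a, n_b):
    """Hudson's Fst over a set of SNPs, computed as a ratio of sums.

    freqs_a, freqs_b: allele frequencies of the same SNPs in two populations.
    n_a, n_b: number of sampled chromosomes in each population.
    """
    numerator = 0.0
    denominator = 0.0
    for p_a, p_b in zip(freqs_a, freqs_b):
        # Squared frequency difference, corrected for finite sample size.
        numerator += ((p_a - p_b) ** 2
                      - p_a * (1 - p_a) / (n_a - 1)
                      - p_b * (1 - p_b) / (n_b - 1))
        # Chance that one allele drawn from each population differs.
        denominator += p_a * (1 - p_b) + p_b * (1 - p_a)
    return numerator / denominator

# Hypothetical allele frequencies at three SNPs (not from the paper).
populations = {
    "north": [0.30, 0.55, 0.70],
    "south": [0.32, 0.50, 0.72],
    "far":   [0.80, 0.10, 0.20],
}
n_chromosomes = 200  # assumed sample size, the same for every population

# Build the pairwise table, one entry per unordered pair.
fst_table = {
    (a, b): hudson_fst(populations[a], populations[b], n_chromosomes, n_chromosomes)
    for a, b in combinations(populations, 2)
}

for pair, value in fst_table.items():
    print(pair, round(value, 4))
```

With so few SNPs the estimate for the two close populations can dip slightly below zero, which is normal for this kind of sample-size-corrected estimator and is usually read as "no detectable differentiation".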



Abstract: An investigation into fine-scale European population structure was carried out using high-density genetic variation on nearly 6000 individuals originating from across Europe. The individuals were collected as control samples and were genotyped with more than 300 000 SNPs in genome-wide association studies using the Illumina Infinium platform. A major East–West gradient from Russian (Moscow) samples to Spanish samples was identified as the first principal component (PC) of the genetic diversity. The second PC identified a North–South gradient from Norway and Sweden to Romania and Spain. Variation of frequencies at markers in three separate genomic regions, surrounding LCT, HLA and HERC2, were strongly associated with this gradient. The next 18 PCs also accounted for a significant proportion of genetic diversity observed in the sample. We present a method to predict the ethnic origin of samples by comparing the sample genotypes with those from a reference set of samples of known origin. These predictions can be performed using just summary information on the known samples, and individual genotype data are not required. We discuss issues raised by these data and analyses for association studies including the matching of case-only cohorts to appropriate pre-collected control samples for genome-wide association studies.

Simon C Heath et al, Investigation of the fine structure of European populations with applications to disease association studies, European Journal of Human Genetics (2008) 16, 1413–1429; doi:10.1038/ejhg.2008.210

