Monday, October 6, 2014

The power of imputation


The latest version of the Affymetrix Human Origins genotype dataset, published last month along with Lazaridis et al. 2014, is an awesome resource for population genetics (see here). However, it lacks Polish samples, which is a major drawback as far as this blogger is concerned.

Hopefully this oversight is corrected soon. In the meantime, I decided to include 15 Poles from the Eurogenes Project dataset in my copy of the Human Origins. But in order to do that I first had to impute around 460K genotypes for each of these people.

Imputing so many markers might sound pretty crazy, but it's actually very doable, especially for genetically homogeneous groups with relatively low haplotype diversity, like the Polish population. I used BEAGLE 3.3.2 for the job, mostly because I'm familiar with it, but also because it's quick and accurate.

My reference panel included 1090 individuals, most of them shared by Eurogenes and Human Origins, and just over 1 million markers. Only around 130K of the markers were shared by the two datasets, but well over 50% of the 1 million genotypes were observed in each of the Poles. This meant that I was imputing sporadically missing data, which is certainly a more sensible strategy than attempting to fill in long stretches of empty calls.
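For anyone who wants to run the same sanity check on their own data, here's a minimal sketch of how the share of observed genotypes per sample could be computed before imputation. This isn't the pipeline used for this post: the 0/1/2 coding, the -1 missing-data code, the toy matrix and the `observed_fraction` helper are all assumptions made for illustration.

```python
import numpy as np

def observed_fraction(genotypes, missing=-1):
    """Return, per sample, the share of markers with a non-missing call."""
    g = np.asarray(genotypes)
    return (g != missing).mean(axis=1)

# Toy matrix (hypothetical): 3 samples x 8 markers.
# Genotypes are coded as 0/1/2 allele counts, with -1 for a missing call.
toy = np.array([
    [0, 1, -1, 2, 1, -1, 0, 2],
    [-1, -1, -1, 1, 0, -1, -1, 2],
    [2, 1, 0, 0, 1, 1, 2, 0],
])

fractions = observed_fraction(toy)

# Flag samples where more than half of the markers were actually observed,
# i.e. where imputation would be filling in sporadic gaps rather than
# long stretches of empty calls.
keep = fractions > 0.5
print(fractions, keep)
```

Samples that fall well under the 50% mark would be the ones where imputation has to fill in long runs of missing markers, which is the riskier scenario.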

Everything seems to have worked out just fine, and the proof is in the pudding. Below are two Principal Component Analyses (PCA) featuring the Poles alongside 50 samples from the HGDP. The first PCA is based on observed genotypes, while the second is based on markers that were imputed into the Polish genomes. PCA is very sensitive to artifacts like genotyping errors, but as you can see, there's very little difference between these results. Also, keep in mind that the SNPs used in the Human Origins were specifically chosen for population genetics, while those in the Eurogenes dataset come from chips mostly designed for commercial ancestry and medical work.
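The logic of this comparison can be sketched in a few lines of Python. To be clear, this is only an illustration on simulated data, not the software or the settings behind the plots in this post: the two toy populations, the 2% error rate standing in for imputation mistakes, and the `genotype_pca` helper are all assumptions.

```python
import numpy as np

def genotype_pca(genotypes, n_components=2):
    """PCA of a samples x markers matrix of 0/1/2 allele counts via SVD."""
    g = np.asarray(genotypes, dtype=float)
    g = g[:, g.std(axis=0) > 0]   # drop markers with no variation
    g = g - g.mean(axis=0)        # centre each marker
    u, s, _ = np.linalg.svd(g, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

rng = np.random.default_rng(0)
n_per_pop, n_markers = 20, 4000

# Two simulated populations with somewhat different allele frequencies.
freq_a = rng.uniform(0.1, 0.9, n_markers)
freq_b = np.clip(freq_a + rng.normal(0, 0.15, n_markers), 0.05, 0.95)
pop_a = rng.binomial(2, freq_a, size=(n_per_pop, n_markers))
pop_b = rng.binomial(2, freq_b, size=(n_per_pop, n_markers))
genos = np.vstack([pop_a, pop_b])

# First half of the markers plays the role of the observed genotypes.
observed = genos[:, :2000]

# Second half plays the role of the imputed genotypes: 2% of calls are
# replaced with random genotypes to mimic imputation errors (assumption).
imputed = genos[:, 2000:].copy()
errors = rng.random(imputed.shape) < 0.02
imputed[errors] = rng.integers(0, 3, size=errors.sum())

pc1_obs = genotype_pca(observed)[:, 0]
pc1_imp = genotype_pca(imputed)[:, 0]

# The sign of a principal component is arbitrary, so compare absolute values.
agreement = abs(np.corrcoef(pc1_obs, pc1_imp)[0, 1])
print(f"|correlation| between PC1 from the two marker sets: {agreement:.3f}")
```

If the imputed markers were badly wrong, the samples would shift around between the two runs and the agreement would drop, which is the same thing the eye picks up when comparing the two plots.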


Also, here's a PCA based on more than 300K SNPs, both observed and imputed in the Poles, featuring all of the West Eurasian samples from the filtered version of Human Origins, as well as the 15 Polish individuals. Note that the Poles cluster more or less between the Czechs and groups from the East Baltic region, and overlap most strongly with Belarusians, which makes sense.



Citations...

Brian L. Browning, Sharon R. Browning, A Unified Approach to Genotype Imputation and Haplotype-Phase Inference for Large Data Sets of Trios and Unrelated Individuals, AJHG, Volume 84, Issue 2, p210–223, 13 February 2009, DOI: http://dx.doi.org/10.1016/j.ajhg.2009.01.005

Lazaridis et al., Ancient human genomes suggest three ancestral populations for present-day Europeans, Nature, 513, 409–413 (18 September 2014), doi:10.1038/nature13673

15 comments:

Nirjhar007 said...

@David and others
Italian academic scholar and Indologist Prof. Giacomo Benedetti has a very refreshing document on the cradle of the PIE people and culture:
http://new-indology.blogspot.in/2014/10/can-we-finally-identify-real-cradle-of.html
In the article, the Kurds and Iranians have a very crucial role! So it would be a great honor if you would kindly visit the scholar's blog and give your valuable thoughts...
Have a Great Day.

Mike Thomas said...

"Poles cluster more or less between the Czechs and groups from the East Baltic region, and overlap most strongly with Belarusians, which makes sense."

Hmm. This, and the homogeneity of the Poles, might be explained by the fact that the Poland-Belarus region was the one most affected by depopulation after the collapse of the Roman period?

Davidski said...

All Northeastern Europeans are genetically homogeneous, and many are actually even more homogeneous than Poles.

The reason for this is low population density across Northeastern Europe until very recent times, coupled with several massive in-situ expansions, like the Baltic and Slavic expansions, and probably the earlier Corded Ware and Unetice expansions too.

As a result, there's comparatively very little haplotype diversity in this area of Europe, and the highest levels of European Mesolithic ancestry.

Nirjhar007 said...

David,
I think you should read the article I have linked on PIE origins. It's not just genetics that will lead us to the origins of PIE; the various other factors described there will have to agree too!
Good Day.
P.S. Waiting eagerly for the aDNA of Samara and Corded Ware.
N.

Mike Thomas said...

David; so then clearly, the Elbe - Dnieper expanse of northern Europe has more or less always been a population sink, not source.

Davidski said...

I guess it depends on the period. This area was a major population source for much of the Balkans, the Carpathian Basin and even large areas of Germany during the Slavic expansions. The genetic imprints of those migrations are still easily seen in these areas.

Mike Thomas said...

By that are you referring to M458, as an example? How can you be sure where exactly M458 expanded from? Just because peak frequencies might currently lie in modern Poland, it does not necessarily imply that M458 expanded from there. Are you privy to any other data which confirms this?

Davidski said...

Population sinks have high haplotype diversity, because the populations there have heterogeneous origins (like Italians or French).

On the other hand, recently settled regions from the same or similar sources show very low haplotype diversity and founder effects (like eastern Finns and French Canadians).

But Poles only have relatively low haplotype diversity, and show no signs of any major founder effects within historic times. So Poland can't be described as a sink, nor a recently colonized region from a Slavic homeland elsewhere.

And yes, there's quite a bit of complexity in the structure of M458 in Poland, with the ancestral mutations to both M458 and the Scandinavian Z284 present there. Z280 also shows a lot of diversity in Poland, and certainly has a very old presence at the very least in the Vistula mouth in Pomerania. You can see that here...

https://www.familytreedna.com/public/r1a/default.aspx?section=results

So it's very likely that the ancestors of present-day Poles have been sitting around in what is now Poland since the Corded Ware days. Although I wouldn't discount older links too, considering the high allele sharing between the recently tested Gotland hunter-gatherers and Poles, Lithuanians and Belorussians.

http://polishgenes.blogspot.com.au/2012/04/prehistoric-scandinavians-genetically.html

http://polishgenes.blogspot.com.au/2014/01/poles-more-indigenous-to-europe-than.html

http://eurogenes.blogspot.com.au/2014/04/low-genomic-diversity-among-ancient.html

Mike Thomas said...

"But Poles only have relatively low haplotype diversity, and show no signs of any major founder effects within historic times"

(for the latter) Based on what empirical data? Isn't the overall genetic homogeneity, and overriding predominance of M458, wholly suggestive of a recent (e.g. Middle Ages) founder effect? And some genealogists, e.g. Piotr Gowzdz, argue that there was very recent population growth in Poland (1500 years ago); not to mention Ralph & Coop's paper on IBD, which suggests there was indeed depopulation and repopulation throughout much of EE.

"So it's very likely that the ancestors of present-day Poles have been sitting around in what is now Poland since the Corded Ware days"

Yes, maybe "around Poland", but to argue that there has been definite cultural and ethnic continuity *within Poland* is to make absurd deductions about something which is ultimately "social" and "sociolinguistic" from a whole other line of evidence ("biological").

Davidski said...

M458 reaches a max of around 35% in Southern Poland, and drops to less than 10% in the north, where various local, and rather old types of Z280 take over. There's also a lot of I2 in the south and east, often more basal than among the east and south Slavs.

Also, Poles simply don't behave like a bottlenecked population in any analyses, like Finns or Basques on PCA. Recent founder effects are out of the question.

And I didn't see any data in the Ralph and Coop paper suggesting a depopulation in Eastern Europe. Their data actually show large in-situ expansions within, and possibly out of the region, going back over 2,000 years.

But Ralph and Coop did seem to have an unnecessary fixation with the Huns, so I'd be wary about taking their interpretations of what happened in Eastern Europe too seriously.

Mike Thomas said...

"But Ralph and Coop did seem to have an unnecessary fixation with the Huns, so I'd be wary about taking their interpretations of what happened in Eastern Europe too seriously."

I think their calculations still hold, but certainly the Huns were just a handful, relatively speaking ...

"And I didn't see any data in the Ralph and Coop paper suggesting a depopulation "

"Depopulation" is a relative term, of course. We are not talking about the extinction of all hominids here, but the archaeological data, now increasingly supplemented by pollen analysis and other palynological evidence, leaves no doubt: there was large-scale depopulation of eastern Europe, north of the Carpathians, from c. 450 AD. Of course, there were relictual areas, "refugia", where ancestral "northern European" lineages survived and repopulated Poland. Thus modern-day Poles still appear to be nothing other than north Europeans (!)

So there you see, you have *apparent* signals of genetic continuity, and yet fine-grained archaeological analysis shows rather large-scale depopulation in most parts of Poland during that brief interlude (450 - 650), which genetic data alone would have missed.

In fact, the Late Antiquity situation was nothing unusual. The same thing happened after the collapse of the Lusatian culture.

It must be linked to the particularities of the climate in northeastern Europe (being a subboreal version of Atlantic, or whatever - don't quote me), a propensity for soil degradation before the advent of more advanced farming strategies in the EMAs, and perhaps even endemic warfare...

Matt said...

Davidski: "Population sinks have high haplotype diversity, because the populations there have heterogeneous origins (like Italians or French).

On the other hand, recently settled regions from the same or similar sources show very low haplotype diversity and founder effects (like eastern Finns and French Canadians)."


In Europe, this seems kind of hard to examine, maybe because the differences in haplotypes between putative ancestral populations, or their closest modern proxies, seem small enough to yield to even fairly small founder effects.

For instance, on a continental scale, I think haplotype diversity in Europe is lowest in Britain and Ireland, particularly Ireland and Scotland.

http://www.nature.com/ejhg/journal/v18/n11/pdf/ejhg201087a.pdf

This paper only shows haplotype diversity as lower in Ireland and Scotland than in England, which is lower than in Sweden, which is lower than in Bulgaria, which is lower than in Spain.

But another paper by Auton looked at a wider range of populations and again found haplotype diversity to be higher in Italy and Spain, higher in Germany than England or Ireland and higher in Northeast Europe than England or Ireland -

http://genome.cshlp.org/content/suppl/2009/05/01/gr.088898.108.DC1/Supplementary_Material.pdf

(although their Northeast Europe was a bunch of Baltic, Finnish, Scandinavian and other North Slavic populations, so go figure if there'd be any difference if they separated them out into linguistic clusters, with the richer datasets now available).

Rather than look at diversity alone (e.g. France has lower haplotype diversity than Spain, but seems at least as likely to be a sink, in fact more so), I wonder if any researchers have looked at private or unique haplotypes, with sources perhaps holding high numbers of private haplotypes relative to diversity.

Within Europe I wonder whether there's a SW-NW cline and a NE-NW cline that run together and mix in NW Europe. Haplotype diversity reduces on both clines, so there is low diversity, but there may be particularly low private haplotype diversity. But then that might depend on the abundance of private haplotypes (whether there are enough for statistical power)?

Davidski said...

Haplotype diversity appears to be slightly lower in the UK and Ireland than across the entire eastern half of Europe, including Scandinavia in the second study.

It's difficult to say what that means for Poles.

Mike Thomas said...

I read in a paper on African buffalo that, even with a drop in population of 80%, there was no real impact "genetically" in terms of diversity. So even a marked drop in population in the immediate post-Roman period in many parts of Europe might not be perceptible by means of extant genetic analysis, as pockets of diversity nevertheless survived.

Davidski said...

Haplotype diversity depends on the size of the effective population, not the total population.

You can have 99% of a population wiped out, but if the people who survive carry all of the effective population then genetic diversity won't drop.

In fact, if mobility increases as a result of whatever happened, as people move around to re-populate the emptied areas, then genetic diversity might actually increase.
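To put rough numbers on the effective population point, here's a minimal sketch using the textbook Wright-Fisher expectation for the loss of heterozygosity under drift, H_t = H_0 * (1 - 1/(2*Ne))^t. Heterozygosity is used here as a simple stand-in for haplotype diversity, and the starting value, the two effective population sizes and the number of generations are all made-up assumptions, not estimates for any real population.

```python
def expected_heterozygosity(h0, ne, generations):
    """Expected heterozygosity after drift in an idealised Wright-Fisher
    population of constant effective size ne."""
    return h0 * (1.0 - 1.0 / (2.0 * ne)) ** generations

h0 = 0.30          # starting heterozygosity (hypothetical)
generations = 8    # roughly 200 years at 25 years per generation (assumption)

# Census size collapses, but the effective population stays large.
large_ne = expected_heterozygosity(h0, 10_000, generations)

# The effective population itself shrinks to a few dozen individuals.
small_ne = expected_heterozygosity(h0, 50, generations)

print(f"Ne = 10,000: {large_ne:.4f}")
print(f"Ne = 50:     {small_ne:.4f}")
```

Under these assumptions diversity barely moves while the effective size stays large, no matter what happens to the head count, and only starts to erode noticeably when the effective size itself becomes very small.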