Monday, October 28, 2013

Ancient DNA from prehistoric Bulgaria and Denmark

A new paper in the AJHG describes a cost-effective method for significantly increasing the yield of authentic DNA from ancient samples:

By using biotinylated RNA baits transcribed from genomic DNA libraries, we are able to capture DNA fragments from across the human genome. We demonstrate this method on libraries created from four Iron Age and Bronze Age human teeth from Bulgaria, as well as bone samples from seven Peruvian mummies and a Bronze Age hair sample from Denmark. Prior to capture, shotgun sequencing of these libraries yielded an average of 1.2% of reads mapping to the human genome (including duplicates). After capture, this fraction increased substantially, with up to 59% of reads mapped to human and enrichment ranging from 6- to 159-fold.
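For anyone wondering how the fold-enrichment figures relate to the mapping percentages, here's a minimal Python sketch. It assumes that enrichment is simply the post-capture fraction of reads mapping to human divided by the pre-capture fraction for the same library; the paper may define it somewhat differently (for instance, after removing duplicates). The function names and read counts are hypothetical and aren't taken from the paper.

```python
def mapped_fraction(mapped_reads, total_reads):
    """Fraction of sequenced reads that map to the human genome."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return mapped_reads / total_reads


def fold_enrichment(pre_fraction, post_fraction):
    """Assumed definition: post-capture fraction over pre-capture fraction."""
    if pre_fraction <= 0:
        raise ValueError("pre_fraction must be positive")
    return post_fraction / pre_fraction


# Hypothetical library: 0.5% of reads map to human before capture, 30% after.
pre = mapped_fraction(5_000, 1_000_000)
post = mapped_fraction(300_000, 1_000_000)
print(round(fold_enrichment(pre, post), 1))  # roughly 60-fold
```

Note that the 1.2% in the abstract is an average across libraries and the 59% is the best case, so dividing one by the other doesn't give the enrichment of any particular library.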

This is particularly good news for studies that aim to extract autosomal DNA from hundreds of ancient remains, like Gothenburg University's The Rise project, which I excitedly blogged about earlier this year (see here). In fact, I suspect the aforementioned Danish hair sample is one of the samples from The Rise dataset. I say that because Morten Allentoft is a co-author on this paper, and he's also doing the DNA analysis for The Rise (see here).

In any case, below are two global Principal Component Analyses (PCA) featuring one of the ancient Bulgarians (V2) and the ancient Dane (M4). The principal components (PC1 & 2) were computed using only modern samples, and the ancient samples were then projected onto the resulting PCA space.
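To make the "compute on modern samples, then project the ancient ones" idea concrete, here's a small dependency-free Python sketch. It is not the pipeline used in the paper: the genotypes are made up (allele counts of 0, 1 or 2 at four SNPs), the helper names are my own, and the principal axes are found with plain power iteration rather than the software normally used for this. It also ignores missing genotypes and the per-SNP scaling usually applied to real data.

```python
import math


def column_means(rows):
    """Per-SNP mean genotype across the reference (modern) individuals."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def center(row, means):
    """Subtract the reference means from one individual's genotypes."""
    return [x - m for x, m in zip(row, means)]


def covariance(centered_rows):
    """SNP-by-SNP sample covariance matrix of the centered reference data."""
    n, p = len(centered_rows), len(centered_rows[0])
    return [[sum(r[i] * r[j] for r in centered_rows) / (n - 1)
             for j in range(p)] for i in range(p)]


def mat_vec(matrix, v):
    """Matrix-vector product for a list-of-lists matrix."""
    return [sum(a * b for a, b in zip(row, v)) for row in matrix]


def leading_eigenpair(matrix, iterations=500):
    """Power iteration: largest eigenvalue and unit eigenvector (symmetric matrix)."""
    v = [1.0 / (i + 1) for i in range(len(matrix))]  # arbitrary start vector
    for _ in range(iterations):
        w = mat_vec(matrix, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(matrix, v)))
    return eigenvalue, v


def principal_axes(cov, k=2):
    """First k principal axes, found by power iteration with deflation."""
    axes, m = [], [row[:] for row in cov]
    for _ in range(k):
        lam, v = leading_eigenpair(m)
        axes.append(v)
        # Remove the component just found so the next pass finds the next axis.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(len(v))]
             for i in range(len(v))]
    return axes


def project(row, means, axes):
    """Center a sample with the REFERENCE means and project it onto the axes."""
    centered = center(row, means)
    return [sum(c * a for c, a in zip(centered, axis)) for axis in axes]


# Made-up genotypes: three individuals from each of two modern groups.
modern = [
    [0, 0, 2, 2], [0, 1, 2, 2], [1, 0, 2, 1],   # group A
    [2, 2, 0, 0], [2, 1, 0, 0], [1, 2, 0, 1],   # group B
]
ancient = [0, 0, 2, 1]  # hypothetical ancient sample, resembling group A

means = column_means(modern)
axes = principal_axes(covariance([center(r, means) for r in modern]))
modern_scores = [project(r, means, axes) for r in modern]
ancient_score = project(ancient, means, axes)  # never influences the axes
print(ancient_score)
```

The point to take away is that the ancient sample is centered with the modern means and multiplied by the modern axes, so it has no say in how PC1 and PC2 are defined. In this toy example it falls on the same side of PC1 as group A.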

The ancient Bulgarian is sitting more or less where modern Bulgarians are usually found on such global plots. On the other hand, the ancient Dane is clearly shifted towards East Asia and the Americas, and as a result clusters with Finns, which I suppose is somewhat unexpected because that never happens with modern Danes. So either there's a problem with the analysis, like, say, projection bias (see below for more details), or this Bronze Age Dane was in fact more eastern in terms of global genetic affinities than modern Danes. The latter might well be true if, for instance, he was a recent descendant of migrants from the east (like present-day Russia), and/or he harbored more Mesolithic hunter-gatherer ancestry than Danes do today.

Now, here are a couple of PCAs limited to European samples from the supplemental data PDF, including another ancient Bulgarian (K8) and the same ancient Dane (M4). Unfortunately, PC1 appears to be mostly a reflection of the well-documented and very recent founder effect and strong genetic drift experienced by the Finnish population. In other words, it's not saying much more than that the ancient samples weren't affected by the same demographic events and genetic drift as Finns during the past few hundred years. It might have been possible to get more informative results by reducing the Finnish sample to only a handful of the least drifted (i.e. least Finnish-like) individuals.

Moreover, it's curious that both ancient samples land in more or less the middle of their respective plots in PC2, despite the fact that they come from very different parts of Europe. I suspect that in these instances projection bias is indeed the problem.

Projection bias is similar to the "calculator effect" (see here), but it affects PCA, especially PCAs that include only closely related populations, such as those from Europe. For more background see Haasl et al. 2012 and Lee et al. 2012.

It's also interesting to note that two of the Iron Age Bulgarians are reported as belonging to mtDNA haplogroups U3b and HV, respectively. Both of these haplogroups are generally accepted to be of Near Eastern origin. They're rare in Europe today (usually <2%), but relatively more common in Bulgaria than in most other European countries. This suggests some genetic continuity in Bulgaria from at least the Iron Age to the present. Indeed, U3 has been reported from early Neolithic samples from Germany and Ukraine, which means that the ancient Bulgarian U3 lineage need not have arrived in Europe from the Near East during the metal ages.


Carpenter et al., Pulling out the 1%: Whole-Genome Capture for the Targeted Enrichment of Ancient DNA Sequencing Libraries, The American Journal of Human Genetics (2013)

See also...

More info on two Thracian genomes from Iron Age Bulgaria + a complaint

PCA projection bias in ancient DNA studies

Sunday, October 27, 2013

Afghan Hindu Kush: a genetic sink

This paper by Di Cristofaro et al. might well be the turning point in modern population genetics, at least as far as Eurasia is concerned. Not only do the authors give up on the standard but dodgy method of dating Y-chromosome expansion times with Y-STR diversities, but they also conclude that, contrary to popular belief, Afghanistan and surrounds cannot be the source of any major population expansions into other parts of Eurasia. So this study really goes against the grain of what we've seen from academia in recent years, and I have to say it's very refreshing to finally read a paper like this, which doesn't make the dubious claim that Y-chromosome haplogroup R1a is native to India.

Below are a few quotes and figure 2 from the study, showing the spatial distribution of six Ancestry Components (AC) from the K=9 ADMIXTURE analysis. Note the presence of the North European-specific AC4 in Central Asia, but the almost complete lack of the South Asian-specific AC7 in Europe.

Given the uncertainties associated with Y-STR mutation rates [73] together with the onset of recent estimations of the Time to Most Recent Common Ancestor (TMRCA) of the various branching events in SNP based Y phylogenies using ‘complete’ Y sequences [74–76], in prudence, we choose not to estimate expansion times based on Y-STR diversities.


Our autosomal and haploid data suggested that the Afghan Hindu Kush populations exhibit a blend of components from Europe, the Caucasus, Middle East, East and South Asia. This juxtaposition of autosomal and haploid markers could reflect important male and female influences contributing to the Afghan populations’ genetic make-up. Considering autosomal data, all ancestral components displayed a decreasing gradient of their frequencies when approaching Afghanistan. Finding the highest genetic frequencies in a region does not necessarily mean that this region was the original source: it has been shown that geographic distributions can result from various modalities besides natural selection such as geographic barriers, subsequent migrations, replacement, isolation, and the surfing effect [69]. However, the fact that all the ancestral components reach a lower frequency when in Afghanistan supports the model of a convergence of migrations [87,88].


Although the modern Afghan population is made up of ethnically and linguistically diverse groups, the similarity of the underlying gene pool and its underlying gene flows from West and East Eurasia and from South Asia is consistent with prehistoric post-glacial expansions, such as an eastward migration of humans out of the Fertile Crescent in the early Neolithic period, and the arrival of northern steppe nomads speaking the Indo-Iranian variety of Indo-European languages. Taken together, these events led to the creation of a common genetic substratum that has been veneered with relatively recent cultural and linguistic differences.

Di Cristofaro J, Pennarun E, Mazières S, Myres NM, Lin AA, et al. (2013) Afghan Hindu Kush: Where Eurasian Sub-Continent Gene Flows Converge. PLoS ONE 8(10): e76748. doi:10.1371/journal.pone.0076748

See also...

The Poltavka outlier

Wednesday, October 16, 2013

Revised migration routes of Eurasian Y-chromosome haplogroups

A preprint at arXiv argues that most Chinese paternal lineages can be grouped into three subclades within Y-chromosome haplogroup O3, and that these expanded rapidly during the East Asian Neolithic. Moreover, it includes a series of maps showing early migration routes of modern humans across Eurasia. These maps suggest that Y-chromosome haplogroups R1a and R1b broke away from R1 about 14.8K years ago somewhere in West Central Asia, and then non-Indo-European groups loaded with R1b migrated to the Atlantic fringe via a route north of the Black Sea. R1a is singled out as the Proto-Indo-European marker, which makes sense based on its latest phylogeny and elevated presence in various ancient samples (see here).

Haplogroup P diverged into Q and R at ~24.1 kya, slightly before the LGM. Most Q individuals in Han Chinese belong to the Q1a1-M120 clade, while R’s in Han Chinese are mostly R1a1-M17. The separation events of R1 and R2, and R1a and R1b are estimated here at 19.9 and 14.8 kya, respectively. R1b roamed till the Atlantic coast, forming some of the non-Indo-European groups (e.g. Basque) [32].

Yan et al., Y Chromosomes of 40% Chinese Are Descendants of Three Neolithic Super-grandfathers, arXiv:1310.3897v1 [q-bio.PE]