Tuesday, July 29, 2014

Analysis of Upper Paleolithic Siberian forager Afontova Gora-2

Apparently, this 15,000-year-old genome from Central Siberia is heavily contaminated with modern DNA (see section SI 5.2.3 in Raghavan et al. 2013). However, apart from MA-1, it's the only Ancient North Eurasian (ANE) sample available right now, so I thought I'd take a closer look at it.

The shared drift statistics using f3(Mbuti;AG-2,Test) do suggest contamination from a present-day Eastern European source, with, for instance, Ukrainians from Lviv showing an unexpectedly strong signal (third on the list below just behind Pima Indians). This makes sense since AG-2 was probably mainly handled by Slavic-speaking Soviet archaeologists and museum staff.

Shared drift with AG-2 (spreadsheet)
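
For readers unfamiliar with the statistic, here is a minimal sketch of the idea behind an outgroup f3 of the form f3(Mbuti;AG-2,Test): the average, across SNPs, of the product of the allele frequency differences between each of the two populations and the outgroup. Higher values mean more drift shared since the split from the outgroup. The function name and all of the frequencies below are made up for illustration, and the sketch leaves out the sample-size corrections and standard errors that dedicated software applies, so it is not the code behind the spreadsheet.

```python
def outgroup_f3(freq_out, freq_a, freq_b):
    """Naive f3(Out; A, B): mean of (a - o) * (b - o) over usable SNPs.

    Each argument is a list of allele frequencies in the same SNP order.
    None marks a SNP with no data; such SNPs are skipped.
    """
    total = 0.0
    used = 0
    for o, a, b in zip(freq_out, freq_a, freq_b):
        if o is None or a is None or b is None:
            continue
        total += (a - o) * (b - o)
        used += 1
    if used == 0:
        raise ValueError("no SNPs with data in all three populations")
    return total / used


# Hypothetical toy frequencies, NOT real data.
mbuti = [0.10, 0.50, 0.90, 0.20]
ag2 = [1.0, 0.0, None, 1.0]        # low-coverage sample: one allele or missing
test_close = [0.90, 0.10, 0.50, 0.80]
test_far = [0.20, 0.50, 0.80, 0.30]

f3_close = outgroup_f3(mbuti, ag2, test_close)
f3_far = outgroup_f3(mbuti, ag2, test_far)
print(f"close: {f3_close:.4f}  far: {f3_far:.4f}")
```

Ranking the Test populations by this value is what produces a list like the one in the spreadsheet. With only a few thousand usable SNPs, the differences between neighbouring populations on such a list can easily be within noise.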

Indeed, in the Eurogenes K15 test, the Baltic component is the most important for AG-2, and this component is modal among Balto-Slavic populations. However, AG-2 fails to register any Mediterranean-specific admixture. At the very least, this is interesting, because all present-day Europeans show this influence. In fact, out of the four K15 components typical of the Near East, only the West Asian component appears for AG-2. This component actually peaks in the Caucasus, where today ANE reaches its highest levels in West Eurasia.

Eurogenes K15 results for AG-2

North_Sea 11.3
Atlantic 0.01
Baltic 22.83
Eastern_Euro 20.53
West_Med 0
West_Asian 4.63
East_Med 0
Red_Sea 0
South_Asian 13.9
Southeast_Asian 0
Siberian 5.97
Amerindian 16.07
Oceanian 4.77
Northeast_African 0
Sub-Saharan 0

4 Ancestors Oracle results based on the K15 ancestry proportions suggest that AG-2 might simply be a more westerly ANE sample than MA-1, perhaps with some European forager ancestry. Below are a few examples of the best population approximations; note the strong showing by StoraFörvar11, a Mesolithic genome from near Gotland, Sweden. The full list can be seen here.

1 Brahmin_UP+North_Amerindian+StoraFörvar11+StoraFörvar11 @ 8.364493
2 Burusho+North_Amerindian+StoraFörvar11+StoraFörvar11 @ 8.411899
3 MA-1+MA-1+StoraFörvar11+Tatar @ 8.427561
4 Kshatriya+North_Amerindian+StoraFörvar11+StoraFörvar11 @ 8.437549
5 Gujarati+North_Amerindian+StoraFörvar11+StoraFörvar11 @ 8.45127
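
To give a rough idea of what a search like this involves, here is a sketch that assumes each candidate is an equal-weight average of four reference populations (repeats allowed) scored by Euclidean distance to the target's component percentages. That scoring rule is my assumption for the example, and the reference names and three-component values are invented placeholders, so treat it as an illustration of the approach rather than the Oracle's actual code.

```python
from itertools import combinations_with_replacement
from math import sqrt


def best_mixtures(target, references, n_ancestors=4, top=3):
    """Return the `top` closest equal-weight mixtures as (distance, names)."""
    scored = []
    names = sorted(references)
    for combo in combinations_with_replacement(names, n_ancestors):
        # Average the component percentages of the chosen ancestors.
        mix = [
            sum(references[name][i] for name in combo) / n_ancestors
            for i in range(len(target))
        ]
        dist = sqrt(sum((m - t) ** 2 for m, t in zip(mix, target)))
        scored.append((dist, combo))
    scored.sort()
    return scored[:top]


# Hypothetical three-component references and target, NOT real K15 data.
references = {
    "RefA": [100.0, 0.0, 0.0],
    "RefB": [0.0, 100.0, 0.0],
    "RefC": [0.0, 0.0, 100.0],
}
target = [50.0, 25.0, 25.0]

for dist, combo in best_mixtures(target, references):
    print("+".join(combo), "@", round(dist, 6))
```

A population appearing twice in a combination, as StoraFörvar11 does in several of the results above, simply counts for half of the mixture.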

However, I was only able to use around 13K SNPs that overlapped with my dataset for all of the tests here. So perhaps these markers were much less affected by contamination than the rest? In any case, here are three Principal Component Analyses (PCA) to finish things off. Again, AG-2 basically looks like the genome of a late ANE survivor with a solid contribution from indigenous European foragers. Hopefully this can be confirmed or debunked in the near future with a much higher quality sequence of its genome.

Update 20/08/2014: In the above analysis I used variants from the 1st extraction AG-2 bam file. To try to get more markers I have now also processed the apparently lower quality supernatant bam. Merging the two files has given me just over 30K SNPs to play with, and I think the extra markers have made a positive difference. Below are the updated results, which I'd say appear more accurate because they're much more similar to those of MA-1 (see here and here).

Revised Eurogenes K15 results for AG-2

North_Sea 12.63
Atlantic 0
Baltic 12.77
Eastern_Euro 30.26
West_Med 0
West_Asian 1.13
East_Med 0
Red_Sea 0
South_Asian 18.44
Southeast_Asian 0
Siberian 3.84
Amerindian 17.34
Oceanian 3.6
Northeast_African 0
Sub-Saharan 0
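
To make the changes between the two runs easier to see, here is a small sketch that subtracts the original K15 percentages from the revised ones and sorts the components by the size of the shift. The numbers are copied from the two lists in this post; the function and variable names are just for this example.

```python
COMPONENTS = [
    "North_Sea", "Atlantic", "Baltic", "Eastern_Euro", "West_Med",
    "West_Asian", "East_Med", "Red_Sea", "South_Asian", "Southeast_Asian",
    "Siberian", "Amerindian", "Oceanian", "Northeast_African", "Sub-Saharan",
]

# K15 percentages as listed in this post (original run, then revised run).
original = [11.3, 0.01, 22.83, 20.53, 0, 4.63, 0, 0,
            13.9, 0, 5.97, 16.07, 4.77, 0, 0]
revised = [12.63, 0, 12.77, 30.26, 0, 1.13, 0, 0,
           18.44, 0, 3.84, 17.34, 3.6, 0, 0]


def component_shifts(before, after, names):
    """Map each component to (after - before), largest absolute shift first."""
    shifts = {n: round(a - b, 2) for n, b, a in zip(names, before, after)}
    ordered = sorted(shifts.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return dict(ordered)


shifts = component_shifts(original, revised, COMPONENTS)
for name, delta in shifts.items():
    if delta != 0:
        print(f"{name:<12} {delta:+.2f}")
```

The two largest shifts are Baltic (down about 10 points) and Eastern_Euro (up about 10 points), followed by South_Asian and West_Asian.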

Revised 4 Ancestors Oracle results for AG-2
Revised shared drift with AG-2 (spreadsheet)

The PCA plots based on the new set of markers look almost identical to those above, so I won't bother posting them. By the way, I updated the Eurogenes ancient genomes datasheet with the revised AG-2 K15 results (see here).

