Monday, July 13, 2020

Don't believe everything you read in peer reviewed papers


Case in point, here's a quote from a recent paper at the Journal of Human Genetics (emphasis is mine):

The Mordovian and Csango samples have a moderate to slight orientation toward the Central-Asian and Siberian Turkic groups. This could suggest the more significant East Eurasian or Turkic ancestry of these populations, which should be further investigated. German samples are inhomogeneous, and some of the German samples also show this tendency, which can be the result of the recent 20th century Turkish immigration into Germany [42].

Nope, these German samples don't show anything even remotely resembling recent Turkish ancestry. The authors of the paper, Ádám, V., Bánfai, Z., Maász, A. et al., should've been able to figure this out, even with the standard analyses that they ran on their dataset. Failing that, the peer reviewers at the Journal of Human Genetics should've noticed that the authors were confused.

Moreover, if the authors and peer reviewers had actually bothered to take a closer look at the metadata for these samples, which were sourced from the Estonian Biocentre, they'd have seen that they're not even from Germany. In fact, they represent self-reported ethnic Germans from Russia.

My own quick and dirty analysis of these individuals suggests that many of them harbor East Slavic and/or Volga Finnic ancestries. Indeed, only some of them can pass genetically for run of the mill Germans from Germany. The Principal Component Analysis (PCA) below is self-explanatory. It was run with the Vahaduo Custom PCA tools freely available here. The relevant PCA datasheet can be gotten here.
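For anyone who'd rather run something similar offline, here's a minimal Python sketch of a PCA on a datasheet of this sort. It's not the Vahaduo code, which runs in the browser. It assumes the datasheet is plain text with one sample per line, a label followed by comma-separated numeric coordinates, and it finds the top components by power iteration on the covariance matrix, using only the standard library. The sample labels and values in the demo are made up.

```python
import math
import random


def parse_datasheet(text):
    """Parse 'Label,v1,v2,...' lines (assumed format) into labels and numeric rows."""
    labels, rows = [], []
    for line in text.strip().splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        labels.append(parts[0])
        rows.append([float(x) for x in parts[1:]])
    return labels, rows


def pca(rows, n_components=2, iterations=500, seed=0):
    """Project rows onto their top principal components (power iteration + deflation)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d) of the centered data.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        v = [rng.random() + 0.1 for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iterations):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break  # no variance left in this direction
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue; subtract that component out (deflation).
        eigval = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
        components.append(v)
        cov = [[cov[i][j] - eigval * v[i] * v[j] for j in range(d)] for i in range(d)]
    # Scores: each centered sample projected onto each component.
    return [[sum(c[j] * comp[j] for j in range(d)) for comp in components]
            for c in centered]


if __name__ == "__main__":
    sheet = """Sample_A1,0.10,0.20,0.30
Sample_A2,0.12,0.21,0.29
Sample_B1,0.50,0.10,0.40
Sample_B2,0.52,0.11,0.41"""
    labels, rows = parse_datasheet(sheet)
    for label, (pc1, pc2) in zip(labels, pca(rows)):
        print(f"{label}\t{pc1:.4f}\t{pc2:.4f}")
```

The signs of the components are arbitrary, so a plot may come out mirrored relative to other tools while showing the same clustering.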


That's not to say, of course, that some Germans don't have recent Turkish ancestry, because an increasing number of Germans nowadays do, nor that people with German heritage in Russia shouldn't identify as Germans, because that's entirely their choice.

This blog post isn't about what it takes to be German, and this is not something that I ever want to discuss for obvious reasons. The point I'm making here is that the authors and peer reviewers of the said paper at the Journal of Human Genetics were sloppy and half-arsed in their approach. And, sadly, this isn't an isolated case in peer reviewed scientific literature dealing with human population genetics.

I feel that the Estonian Biocentre is also partly to blame for this cock-up, due to its somewhat peculiar sampling and labelling strategies. For instance, its scientists rely solely on self-reported identity to establish the ethnic origins of their samples, and they apparently never remove genetic outliers from their datasets, or even try to identify them.

Unfortunately, I fear that this relaxed approach will eventually lead to basic errors and even unusual conclusions in a number of so-called peer reviewed papers.

I first raised this issue with the Estonian Biocentre about five years ago, when I noticed that some of the supposedly Polish individuals in its dataset were genetically more similar to various groups from northern Russia than to Poles from Poland. These individuals also showed significant Siberian ancestry, which was very unusual indeed. Where the hell did the Estonian Biocentre find Poles who resembled people from near the Arctic Circle, you might ask? Apparently in Estonia.

OK, I can imagine that sampling ethnic Poles from Estonia may have been easier for the Estonian Biocentre than sampling Poles from Poland. And Estonian Poles certainly make for interesting and useful data points. However, as you can see in the PCA below, some of these samples (labeled Polish_Estonia by me) aren't representative of the native Polish population, and yet the Estonian Biocentre not only lumps them in with its Poles from Poland sample set, but even labels them with the word "Poland". The relevant PCA datasheet can be gotten here.


But, based on my communications with some of the scientists at the Estonian Biocentre, including head honcho Mait Metspalu, it seems that nothing will ever change there with regard to this issue. Who knows, perhaps some day we'll see a paper based on Estonian Biocentre data in the Journal of Human Genetics claiming that Poles originated near the Arctic Circle? I wouldn't be shocked if that actually happened.

Citation...

Ádám, V., Bánfai, Z., Maász, A. et al. Investigating the genetic characteristics of the Csangos, a traditionally Hungarian speaking ethnic group residing in Romania. J Hum Genet (2020). https://doi.org/10.1038/s10038-020-0799-6

See also...

Like three peas in a pod

Wednesday, July 17, 2019

Viking invasion at bioRxiv


A new preprint featuring hundreds of Viking Age genomes has appeared at bioRxiv [LINK]. Titled Population genomics of the Viking world, it looks like a solid effort overall, although I'm skeptical about its conclusions. I might elaborate on that in the comments below, but I'll have a lot more to say on the topic if and when I get to check out the ancient genomes with my own tools. Details about the new samples, including their Y-chromosome haplogroup assignments, are available here. Below is the abstract, emphasis is mine:

The Viking maritime expansion from Scandinavia (Denmark, Norway, and Sweden) marks one of the swiftest and most far-flung cultural transformations in global history. During this time (c. 750 to 1050 CE), the Vikings reached most of western Eurasia, Greenland, and North America, and left a cultural legacy that persists till today. To understand the genetic structure and influence of the Viking expansion, we sequenced the genomes of 442 ancient humans from across Europe and Greenland ranging from the Bronze Age (c. 2400 BC) to the early Modern period (c. 1600 CE), with particular emphasis on the Viking Age. We find that the period preceding the Viking Age was accompanied by foreign gene flow into Scandinavia from the south and east: spreading from Denmark and eastern Sweden to the rest of Scandinavia. Despite the close linguistic similarities of modern Scandinavian languages, we observe genetic structure within Scandinavia, suggesting that regional population differences were already present 1,000 years ago. We find evidence for a majority of Danish Viking presence in England, Swedish Viking presence in the Baltic, and Norwegian Viking presence in Ireland, Iceland, and Greenland. Additionally, we see substantial foreign European ancestry entering Scandinavia during the Viking Age. We also find that several of the members of the only archaeologically well-attested Viking expedition were close family members. By comparing Viking Scandinavian genomes with present-day Scandinavian genomes, we find that pigmentation-associated loci have undergone strong population differentiation during the last millennia. Finally, we are able to trace the allele frequency dynamics of positively selected loci with unprecedented detail, including the lactase persistence allele and various alleles associated with the immune response. 
We conclude that the Viking diaspora was characterized by substantial foreign engagement: distinct Viking populations influenced the genomic makeup of different regions of Europe, while Scandinavia also experienced increased contact with the rest of the continent.

Margaryan et al., Population genomics of the Viking world, bioRxiv, posted July 17, 2019, doi: https://doi.org/10.1101/703405

See also...

They came, they saw, and they mixed

Who were the people of the Nordic Bronze Age?

Asiatic East Germanics

Wednesday, January 31, 2018

Modern-day Poles vs Bronze Age peoples of the East Baltic


Below are three of my staple Principal Component Analyses (PCA) featuring Baltic Bronze Age (Baltic_BA) samples from the recent Mittnik et al. 2018 paper (open access here). On each of the plots I've also highlighted modern-day Balts and Poles. The latter two PCAs also include most of the other ancients from the said paper (listed here). They're not highlighted, but all of the relevant datasheets are available here, here and here, and are easy to plot with the Past software.

No doubt, these Bronze Age peoples of the East Baltic, and in particular the four individuals from Turlojiske, Lithuania, are very closely related to modern-day Balts and northern Slavs. They may well be our ancestors, or at least close relatives thereof. This is argued and demonstrated well enough by Mittnik et al., and it clearly shows in my PCA, especially the first one, which is designed to focus on ethnolinguistically specific genetic drift in Northern Europe.




Nevertheless, overall, they do clearly show a higher cut of indigenous European Hunter-Gatherer ancestry relative to modern-day Northeast Europeans (note how in the second PCA the Baltic_BA samples pull towards the European Hunter-Gatherers compared to Balts and especially Poles). I'm not exactly sure what the explanation is for this yet. Indeed, there might be several different explanations. But generally speaking, it's probably in large part the result of post-Bronze Age gene flow into the Baltic region from Central Europe.

See also...

Early Baltic Corded Ware form a genetic clade with Yamnaya, but...

The genetic history of Northern Europe (or rather the South Baltic)

Genetic and linguistic structure across space and time in Northern Europe

Tuesday, January 2, 2018

On the genomic history of North Eurasia (Triska et al. 2017)


Over at BMC Genetics at this LINK. The accompanying dataset is freely available here, although it includes fewer than 300K SNPs, so its overlap with the Human Origins and EGDP datasets isn't great. Emphasis is mine:

Background: The history of human populations occupying the plains and mountain ridges separating Europe from Asia has been eventful, as these natural obstacles were crossed westward by multiple waves of Turkic and Uralic-speaking migrants as well as eastward by Europeans. Unfortunately, the material records of history of this region are not dense enough to reconstruct details of population history. These considerations stimulate growing interest to obtain a genetic picture of the demographic history of migrations and admixture in Northern Eurasia.

Results: We genotyped and analyzed 1076 individuals from 30 populations with geographical coverage spanning from Baltic Sea to Baikal Lake. Our dense sampling allowed us to describe in detail the population structure, provide insight into genomic history of numerous European and Asian populations, and significantly increase quantity of genetic data available for modern populations in region of North Eurasia. Our study doubles the amount of genome-wide profiles available for this region.

We detected unusually high amount of shared identical-by-descent (IBD) genomic segments between several Siberian populations, such as Khanty and Ket, providing evidence of genetic relatedness across vast geographic distances and between speakers of different language families. Additionally, we observed excessive IBD sharing between Khanty and Bashkir, a group of Turkic speakers from Southern Urals region. While adding some weight to the “Finno-Ugric” origin of Bashkir, our studies highlighted that the Bashkir genepool lacks the main “core”, being a multi-layered amalgamation of Turkic, Ugric, Finnish and Indo-European contributions, which points at intricacy of genetic interface between Turkic and Uralic populations. Comparison of the genetic structure of Siberian ethnicities and the geography of the region they inhabit point at existence of the “Great Siberian Vortex” directing genetic exchanges in populations across the Siberian part of Asia.

Slavic speakers of Eastern Europe are, in general, very similar in their genetic composition. Ukrainians, Belarusians and Russians have almost identical proportions of Caucasus and Northern European components and have virtually no Asian influence. We capitalized on wide geographic span of our sampling to address intriguing question about the place of origin of Russian Starovers, an enigmatic Eastern Orthodox Old Believers religious group relocated to Siberia in seventeenth century. A comparative reAdmix analysis, complemented by IBD sharing, placed their roots in the region of the Northern European Plain, occupied by North Russians and Finno-Ugric Komi and Karelian people. Russians from Novosibirsk and Russian Starover exhibit ancestral proportions close to that of European Eastern Slavs, however, they also include between five to 10 % of Central Siberian ancestry, not present at this level in their European counterparts.

Conclusions: Our project has patched the hole in the genetic map of Eurasia: we demonstrated complexity of genetic structure of Northern Eurasians, existence of East-West and North-South genetic gradients, and assessed different inputs of ancient populations into modern populations.

Triska et al., Between Lake Baikal and the Baltic Sea: genomic history of the gateway to Europe, BMC Genetics, 2017 18(Suppl 1):110, https://doi.org/10.1186/s12863-017-0578-3
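The point about SNP overlap is easy to check before merging datasets. Here's a minimal Python sketch, assuming both datasets are in PLINK .bim format (whitespace-separated columns: chromosome, SNP ID, genetic position, base-pair position, allele 1, allele 2). EIGENSTRAT .snp files put the SNP ID first and the chromosome second, so the column indices would need changing for those. Matching on chromosome and position is offered as an option, because different arrays don't always use the same ID for the same marker. The file names in the example are hypothetical.

```python
def snp_keys(bim_lines, by_position=False):
    """Collect SNP IDs, or (chromosome, bp) pairs, from PLINK .bim lines."""
    keys = set()
    for line in bim_lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        chrom, snp_id, bp = fields[0], fields[1], fields[3]
        keys.add((chrom, bp) if by_position else snp_id)
    return keys


def snp_overlap(bim_a, bim_b, by_position=False):
    """Return (shared, total_a, total_b) marker counts for two .bim files."""
    a = snp_keys(bim_a, by_position)
    b = snp_keys(bim_b, by_position)
    return len(a & b), len(a), len(b)


# Example with hypothetical file names:
# with open("triska2017.bim") as fa, open("human_origins.bim") as fb:
#     shared, n_a, n_b = snp_overlap(fa, fb)
#     print(f"{shared} of {n_a} SNPs are also among the {n_b} in the other set")
```

Position matching only works if both files use the same genome build and write chromosome codes the same way (e.g. "1" rather than "chr1").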

Monday, June 19, 2017

Polish aDNA PCA


Below is a Principal Component Analysis (PCA) that I put together for an upcoming presentation on Polish ancient DNA (aDNA). The five RISE samples are from Allentoft et al. 2015, including RISE569, the early Slavic genome from the Czech Republic, which was initially wrongly labeled as that of a Czech Bell Beaker (see here). PL_N17 is an Early Bronze Age (EBA) sample from Gustorzyn, Northern Poland (see here).


I also ran outgroup f3 statistics of the form f3(European_pop,Test,Yoruba) for each of these samples to compare their genetic affinities to present-day European populations. Although outgroup f3 statistics aren't as sensitive as haplotype-based tests, I think these results look interesting and useful, with both PL_N17 and RISE569 seemingly showing strong links to modern-day West Slavs. The full output is available in a zip file here.

Poland_EBA PL_N17
Lithuanian 0.175778
Ukrainian_West 0.174866
Sorb 0.174334
Estonian 0.174313
Icelandic 0.17397
Irish 0.173863
Polish_West 0.173743
Polish_East 0.173549
Czech 0.173545
Norwegian 0.173533

Early Slav RISE569
Sorb 0.169171
Lithuanian 0.168945
Estonian 0.168819
Polish_West 0.168267
Polish_East 0.168143
Irish 0.168092
Czech 0.167941
Norwegian 0.167787
Icelandic 0.167696
Finnish 0.167685
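For readers unfamiliar with outgroup f3 statistics, here's the basic idea as a minimal Python sketch. With Yoruba as the outgroup O, f3(A, B; O) is the average of (pA - pO)(pB - pO) across SNPs, where p is a population's allele frequency, so it gets larger the more genetic drift A and B share after splitting from O. This is a naive estimator for illustration only: unlike ADMIXTOOLS' qp3Pop, it doesn't correct for finite sample sizes or compute block-jackknife standard errors, so its values shouldn't be compared directly with the output above. The allele frequencies in the demo are hypothetical.

```python
def outgroup_f3(p_a, p_b, p_out):
    """Naive outgroup f3(A, B; O): mean of (pA - pO) * (pB - pO) across SNPs."""
    if not (len(p_a) == len(p_b) == len(p_out)) or not p_a:
        raise ValueError("need equal-length, non-empty allele frequency lists")
    total = sum((a - o) * (b - o) for a, b, o in zip(p_a, p_b, p_out))
    return total / len(p_a)


def rank_by_shared_drift(test, outgroup, references):
    """Rank reference populations by f3(Reference, Test; Outgroup), highest first."""
    scores = {name: outgroup_f3(freqs, test, outgroup)
              for name, freqs in references.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical allele frequencies at four SNPs.
    yoruba = [0.10, 0.80, 0.50, 0.30]
    test = [0.60, 0.20, 0.90, 0.70]
    refs = {"Pop_X": [0.55, 0.25, 0.85, 0.65], "Pop_Y": [0.30, 0.60, 0.60, 0.40]}
    for name, f3 in rank_by_shared_drift(test, yoruba, refs):
        print(name, round(f3, 6))
```

Sorting in descending order mirrors how the lists above are presented, with the population sharing the most drift with the test sample at the top.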

See also...

Testing for genetic continuity in Poland from the Bronze Age to the present

Saturday, May 20, 2017

Shared maternal ancestry between Slavs and Germanics probably dates to the Metal Ages


Over at the Russian Journal of Genetics behind a paywall at this LINK. Emphasis is mine:

Abstract: The structure and diversity of mitochondrial DNA (mtDNA) macrohaplogroup U lineages in Russians from Eastern Europe are studied on the basis of analysis of variation of nucleotide sequences of complete mitochondrial genomes. In total, 132 mitochondrial genomes belonging to haplogroups U1, U2e, U3, U4, U5, U7, U8a, and K are characterized. Results of phylogeographic analysis show that the mitochondrial gene pool of Russians contains mtDNA haplotypes belonging to subhaplogroups that are characteristic only of Russians and other Eastern Slavs (13.7%), Slavs in general (11.4%), Slavs and Germans (17.4%), and Slavs, Germans, and Baltic Finns (9.8%). Results of molecular dating show that ages of mtDNA subhaplogroups to which Russian mtDNA haplotypes belong vary in a wide range, from 600 to 17000 years. However, molecular dating results for Slavic and Slavic-Germanic mtDNA subhaplogroups demonstrate that their formation mainly occurred in the Bronze and Iron Ages (1000–5000 years ago). Only some instances (for subhaplogroups U5b1a1 and U5b1e1a) are characterized by a good agreement between molecular dating results and the chronology of Slavic ethnic history based on historical and archaeological data.

Malyarchuk, B.A., Derenko, M.V. & Litvinov, A.N. The macrohaplogroup U structure in Russians. Russ J Genet (2017) 53: 498. doi:10.1134/S1022795417020053
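The abstract doesn't say how the subhaplogroup ages were estimated. One common approach for mtDNA is the rho (ρ) statistic: the average number of mutations separating each sampled haplotype from the founder (root) haplotype of a subhaplogroup, multiplied by the number of years per mutation. Here's a minimal Python sketch, assuming a rate of roughly one mutation per 3,624 years for the whole mitogenome (the figure often quoted from Soares et al. 2009, applied here without their purifying-selection correction). Whether Malyarchuk et al. used this method or this rate is an assumption, and the mutation counts are hypothetical.

```python
YEARS_PER_MUTATION = 3624.0  # assumed whole-mitogenome rate; check against the paper


def rho(mutations_from_root):
    """Mean number of mutations from the founder haplotype to each sampled haplotype."""
    if not mutations_from_root:
        raise ValueError("need at least one haplotype")
    return sum(mutations_from_root) / len(mutations_from_root)


def rho_age(mutations_from_root, years_per_mutation=YEARS_PER_MUTATION):
    """Point estimate of subhaplogroup age in years (no standard error)."""
    return rho(mutations_from_root) * years_per_mutation


if __name__ == "__main__":
    # Hypothetical subhaplogroup: five genomes 0, 1, 1, 2 and 1 mutations from the root.
    counts = [0, 1, 1, 2, 1]
    print(f"rho = {rho(counts):.2f}, age ~ {rho_age(counts):.0f} years")
```

A real estimate would also report an error range, which with only a handful of mutations per lineage is wide enough that a Bronze Age versus Iron Age call can easily hinge on it.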

Tuesday, May 16, 2017

Globular Amphora people were starkly different from Yamnaya people


The figure below is from the recent Mathieson et al. 2017 preprint; slightly edited to highlight the results of nine Globular Amphora Culture (GAC) samples from two burial sites in what are now Poland and Ukraine.


Despite living in East Central Europe at about the same time as the nearby Yamnaya people of the Pontic-Caspian Steppe, these GAC individuals show practically zero Yamnaya-related or steppe ancestry (note the almost total absence of the orange "Yamnaya" component in the Globular_Amphora results in the ADMIXTURE bar graph). Instead, they're very similar to Chalcolithic and Middle Neolithic Central and Western Europeans, with whom they overlap in the Principal Component Analysis (PCA).

During the tail end of the GAC period, East Central Europe was suddenly dominated by a new archaeological complex called the Corded Ware Culture (CWC). Although most CWC individuals sampled to date show minor GAC-related ancestry, they're overwhelmingly Yamnaya-like, which suggests that by and large the CWC population has its origins on the Pontic-Caspian Steppe. In fact, some of the earliest CWC examples from the Baltic States, such as Latvia_LN in the ADMIXTURE bar graph, are basically identical to Yamnaya people.

It was suggested not long ago that the presence of Yamnaya-related ancestry in modern-day Europeans could be mostly explained by the so-called Isolation-by-Distance phenomenon (see here). But as I said at the time, this was a major faux pas, and thanks to these GAC samples I now have direct evidence from ancient DNA to back me up. So forget the idea of anything resembling a gentle cline in Yamnaya-like ancestry east to west across Europe before proto-CWC and Yamnaya exploded from the steppes.

By the way, in that critique I said that it's not possible to recapitulate ancient populations with ADMIXTURE components. I stand by that statement, although as we can see in Mathieson et al. 2017, it is possible to get close at times with enough of the right ancient samples; close enough to make some general observations anyway.

Interestingly, on the PCA plot, the European Bronze Age cluster is more or less halfway between GAC and Latvia_LN. This is also where modern-day Poles and Ukrainians cluster on such plots when they're not significantly skewed by projection bias or shrinkage. Thus, I do wonder whether the Slavs of East Central Europe are essentially a 50/50 mixture of early CWC and late GAC. I'll try and test this when the Mathieson et al. 2017 dataset goes online.
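One quick and dirty first pass at a 50/50 hypothesis like this, before anything formal such as qpAdm, is to find the mixture of two source coordinates that lands closest to a target in the same coordinate space (PCA scores, say). Here's a minimal Python sketch of that least-squares fit. The coordinates are hypothetical placeholders, not values from Mathieson et al. 2017, and a good fit in a couple of dimensions is a hint rather than proof, since quite different mixtures can project to the same spot on a PCA.

```python
import math


def fit_two_way(target, source_a, source_b):
    """Fit target ~ alpha * A + (1 - alpha) * B by least squares, with 0 <= alpha <= 1.

    Returns (alpha, distance), where distance is the Euclidean gap between the
    target and the best-fitting mixture.
    """
    diff = [a - b for a, b in zip(source_a, source_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("the two sources have identical coordinates")
    # Unconstrained minimiser of ||(t - b) - alpha * (a - b)||^2 ...
    alpha = sum((t - b) * d for t, b, d in zip(target, source_b, diff)) / denom
    # ... clipped to a valid mixing proportion (exact for this 1-D convex problem).
    alpha = min(1.0, max(0.0, alpha))
    fitted = [alpha * a + (1 - alpha) * b for a, b in zip(source_a, source_b)]
    distance = math.sqrt(sum((t - f) ** 2 for t, f in zip(target, fitted)))
    return alpha, distance


if __name__ == "__main__":
    # Hypothetical PC1/PC2 coordinates, for illustration only.
    early_cwc = [0.040, 0.030]
    late_gac = [-0.020, 0.050]
    target = [0.011, 0.041]
    alpha, dist = fit_two_way(target, early_cwc, late_gac)
    print(f"early CWC ~ {alpha:.2f}, late GAC ~ {1 - alpha:.2f}, distance {dist:.4f}")
```

A large leftover distance would suggest that the two proposed sources alone can't explain the target, whatever the mixing proportion.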

Reference...

Mathieson et al., The Genomic History Of Southeastern Europe, bioRxiv, Posted May 9, 2017, doi: https://doi.org/10.1101/135616

See also...

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Tuesday, March 28, 2017

Hints of deep genetic substructure in Iron Age Poland


A paper at Infection, Genetics and Evolution looks at the susceptibility to infectious diseases in two late Iron Age groups from Central Poland. I can't wait to see genome-wide and Y-chromosome data from these and other ancient Polish populations. Judging by the outcomes presented in this paper, and also rumors that I've heard from Polish labs, we're in for some major surprises. Emphasis is mine:

Abstract: For thousands of years human beings have resisted life-threatening pathogens. This ongoing battle is considered to be the major force shaping our gene pool as every micro-evolutionary process provokes specific shifts in the genome, both that of the host and the pathogen. Past populations were more susceptible to changes in allele frequencies not only due to selection pressure, but also as a result of genetic drift, migration and inbreeding. In the present study we have investigated the frequency of five polymorphisms within innate immune-response genes (SLC11A1 D543N, MBL2 G161A, P2RX7 A1513C, IL10 A-1082G, TLR2 –196 to –174 ins/del) related to susceptibility to infections in humans. The DNA of individuals from two early Roman-Period populations of Linowo and Rogowo was analysed. The distribution of three mutations varied significantly when compared to the modern Polish population. The TAFT analysis suggests that the decreased frequency of SLC11A1 D543N in modern Poles as compared to 2nd century Linowo samples is the result of non-stochastic mechanisms, such as purifying or balancing selection. The disparity in frequency of other mutations is most likely the result of genetic drift, an evolutionary force which is remarkably amplified in low-size groups. Together with the FST analysis, mtDNA haplotypes' distribution and deviation from the Hardy-Weinberg equilibrium, we suggest that the two populations were not interbreeding (despite the close proximity between them), but rather inbreeding, the results of which are particularly pronounced among Rogowo habitants.

...

Although no sound evidence of population differentiation was found when comparing the samples of Linowo and Rogowo, it is worth noticing that the distribution of mtDNA haplotypes between these two settlements differs remarkably. Apart from the two haplotypes (rCRS and 16126C) that occur in both studied groups, no other pattern of mtDNA SNPs is shared between them. The lack of reflection of these dissimilarities in the FST analysis is probably a result of the low-size group which is more exposed to result bias or low diversity of haplotypes among Rogowo individuals. All of the above allows to draw the theoretical conclusion that although these two settlements date back to the same period and are located within 55 km (or around 160 km along the Vistula River) of one another, they are genetically remote.

Lewandowska et al., The genetic profile of susceptibility to infectious diseases in Roman-Period populations from Central Poland, Infection, Genetics and Evolution, Volume 47, January 2017, Pages 1–8, https://dx.doi.org/10.1016/j.meegid.2016.11.011
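The abstract leans partly on deviation from Hardy-Weinberg equilibrium to argue for inbreeding. Here's a minimal Python sketch of the usual chi-square version of that test for a single biallelic locus, together with the inbreeding coefficient F = 1 - Hobs/Hexp, which is positive when heterozygotes are scarcer than expected. The genotype counts are hypothetical, and I'm not claiming this is exactly what Lewandowska et al. ran; with sample sizes as small as those from ancient cemeteries, an exact test is usually preferred over the chi-square approximation.

```python
import math


def hardy_weinberg(n_hom_ref, n_het, n_hom_alt):
    """Chi-square HWE test (1 df) and inbreeding coefficient for one biallelic locus.

    Returns (chi2, p_value, F), where F = 1 - observed/expected heterozygosity.
    """
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        raise ValueError("no genotypes")
    p = (2 * n_hom_ref + n_het) / (2 * n)  # frequency of the reference allele
    q = 1 - p
    observed = [n_hom_ref, n_het, n_hom_alt]
    expected = [p * p * n, 2 * p * q * n, q * q * n]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    h_exp = 2 * p * q
    f = 1 - (n_het / n) / h_exp if h_exp > 0 else 0.0
    return chi2, p_value, f


if __name__ == "__main__":
    # Hypothetical counts showing a heterozygote deficit, as expected under inbreeding.
    chi2, p_value, f = hardy_weinberg(8, 4, 8)
    print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}, F = {f:.2f}")
```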

See also...

R1a-Z280 from Early Bronze Age Northern Poland

Wednesday, July 27, 2016

Lipka Tatars vs Balto-Slavs


Note the huge difference in this ADMIXTURE bar graph from the recent Pankratov et al. paper between Lipka Tatars from Belarus and nearby Balts and Slavs. The Lipka Tatars are almost identical to Volga Tatars despite having lived in their current homeland for about 500 years. I'm guessing the fact that they're Sunni Muslims might have something to do with it.


Pankratov, V. et al. East Eurasian ancestry in the middle of Europe: genetic footprints of Steppe nomads in the genomes of Belarusian Lipka Tatars. Sci. Rep. 6, 30197; doi: 10.1038/srep30197 (2016).

Saturday, June 18, 2016

Poles in the new Human Origins dataset


Harvard's Human Origins dataset is being updated with 238 new samples, including 23 from Poland (15 from Poznan in western Poland and 8 from Lublin in eastern Poland). It should be available for download soon at the Reich Lab website here, although many of the new samples will only be accessible to people who sign a waiver. Below is a Principal Component Analysis (PCA) from Lazaridis et al. 2016 featuring the new samples. Interestingly, most of the Poles, probably those from Poznan, cluster with Sorbs from eastern Germany.


Citation...

Lazaridis et al., The genetic structure of the world's first farmers, bioRxiv preprint, posted June 16, 2016, doi: https://dx.doi.org/10.1101/059311