Saturday, July 18, 2015

Around 65% LN/EBA European ancestry in the Hindu Kush (?)


One of the toughest nuts to crack in population genetics has proved to be the story of the people of the Hindu Kush. However, using TreeMix and ancient genomes from the recent Allentoft et al. and Haak et al. papers, I'm seeing most of the Kalash and Pathan individuals from the HGDP modeled as ~65% Late Neolithic/Early Bronze Age (LN/EBA) European and ~35% Central Asian. This, to me at least, makes a lot of sense. For instance:



The Kalash and Pathan samples that can't be modeled in this way, at least with the reference populations that I'm using, are fitted within a framework that closely resembles the old two-way Ancestral South Indian/Ancestral North Indian model (ASI/ANI). They usually score ~12% admixture from the branch leading to the Dai of southern China, which is obviously the proxy for ASI.


Both of these models are correct; they just show the same thing in different ways. So if we mesh them together the Kalash and Pathans come out ~65% LN/EBA European (which includes substantial Caucasus or Caucasus-related ancestry), ~12% ASI, and ~23% something as yet undefined.
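To make the bookkeeping explicit, here is a minimal sketch of how the two models combine. It assumes, as the paragraph above does, that the ~12% Dai-related share is carved out of the ~35% Central Asian share. The function and variable names are illustrative only, and none of this is TreeMix output.

```python
def split_central_asian(central_asian, asi):
    """Split the Central Asian share into ASI and an undefined remainder."""
    return {"ASI": asi, "undefined": round(central_asian - asi, 2)}


ln_eba = 0.65         # LN/EBA European share quoted in the post
central_asian = 0.35  # Central Asian share from the first model

parts = split_central_asian(central_asian, asi=0.12)
combined = {"LN/EBA European": ln_eba, **parts}

print(combined)
print(round(sum(combined.values()), 2))  # the three shares should add up to 1
```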

If I had to guess, I'd say the mystery ~23% was Neolithic admixture from what is now Iran. But ancient DNA has thrown plenty of curve balls at us already, so that's a low confidence prediction, even though it does make good sense.

It's also interesting to see the migration edges running from the Ulchi of east Siberia to the LN/EBA Europeans. This might be a signal of minor Eastern non-African (ENA), in other words East Eurasian, admixture. Then again, it might just be the algorithm trying to compensate for something, like excess Eastern Hunter-Gatherer (EHG) ancestry.

The full output from my analysis can be downloaded here. The reference samples and markers are listed here and here.

See also...

The Poltavka outlier

The real thing

The enigma of the Kalash

53 comments:

Nirjhar007 said...

Very interesting, David, I am impressed! BTW, when is the second part of ''Badasses'' coming?
And yes please make the comments section pop out to make it easier to comment...

Matt said...

Yeah, stuff like this makes OK sense. Just for clarity, I don't really see anything necessarily incorrect in the idea that present South-Central Asians may have got a fair amount of Bronze Age steppe mixture (as much as ADMIXTURE doesn't naturally seem to go with that on the whole).

It's just that, if so, whatever else they got seems to me to have had to be more concentratedly basal and divergent on the tree than 40% of a 20:80 Dai:Georgian mix. And relative to ENA relatedness, the relatedness to WHG relative to ANE seems too low for 9:32:59 Dai:Georgian:Andronovo. (I also wonder if fitting Georgian to these trees would make that more obvious?)

This seems to satisfy that OK, for the most part. The South-Central Asians are either substantively ANI, or they have a 35% mixture edge from a population that actually sits on the ENA branch (although weakly as the outgroup to all others).

That whole region is terra incognita for ancient DNA, so the pre-Andronovo population could have anything from 1/3 to 2/3 persistence, and we really don't have anything to tell it apart, depending on what exactly it was like. It could be anything between a third of something quite basal like this TreeMix run is finding (the South Central Asian edge is not even from the same branch as West Eurasian) or maybe even up to two thirds or more of something much less basal (closer to "ANI").

Nirjhar007 said...

Andronovo has nothing to do with SC Asia, So just beat it....

Davidski said...

Matt,

Adding Georgians messes things up, possibly because these trees can't handle more than one modern group with complex ancestry. Or maybe because we're doubling up on what's already there, like the Caucasus ancestry in Yamnaya?

For example, there's a weird migration edge here running from Yamnaya to...Oceania and Australia? I'm guessing it's something linked to ANE, but not very informative.

https://drive.google.com/file/d/0B9o3EYTdM8lQRDJLc2xMQjdCRVU/view?usp=sharing

Alberto said...

I went back to the paper about ANI-ASI admixture (http://www.cell.com/AJHG/abstract/S0002-9297%2813%2900324-8) to check for estimated proportions and dates of admixture. Pathans are estimated to have 71% ANI, which should account for any Georgian-like and Lithuanian-like, the other 29% being Onge/Dai/Malay-like ASI.

An interesting part about the dates of admixture:

"It is also important to emphasize what our study has not shown. Although we have documented evidence for mixture in India between about 1,900 and 4,200 years BP, this does not imply migration from West Eurasia into India during this time. On the contrary, a recent study that searched for West Eurasian groups most closely related to the ANI ancestors of Indians failed to find any evidence for shared ancestry between the ANI and groups in West Eurasia within the past 12,500 years* (although it is possible that with further sampling and new methods such relatedness might be detected). An alternative possibility that is also consistent with our data is that the ANI and ASI were both living in or near South Asia for a substantial period prior to their mixture. Such a pattern has been documented elsewhere; for example, ancient DNA studies of northern Europeans have shown that Neolithic farmers originating in Western Asia migrated to Europe about 7,500 years BP but did not mix with local hunter gatherers until thousands of years later to form the present-day populations of northern Europe."

The study stating that there is no shared ancestry (or very little) between ANI groups and West Eurasian groups is from 1999 and based on mtDNA (http://www.cell.com/current-biology/abstract/S0960-9822%2800%2980057-3?_returnURL=http%3A%2F%2Flinkinghub.elsevier.com%2Fretrieve%2Fpii%2FS0960982200800573%3Fshowall%3Dtrue), so I'm not sure how accurate that might be.

In any case, it's interesting that the earliest date of admixture is 2200 BCE. This means that if we get samples from IVC they will most likely predate any admixture event, so they will be either pure ANI or pure ASI. That will be really helpful to start to crack this nut with more than theoretical analysis (all very interesting of course, but not comparable to having the real thing).
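The 2200 BCE figure follows from the 4,200 years BP upper bound in the quoted passage. A small sketch of the conversion, assuming "present" means roughly the year 2000; the radiocarbon convention of 1950 would shift the answer by 50 years. The function name is made up for illustration.

```python
def bp_to_bce(years_bp, reference_year):
    """Convert years before present to an approximate BCE year.

    A negative result means a CE date. The missing year zero is ignored,
    which is negligible at this level of precision.
    """
    return years_bp - reference_year


for reference_year in (1950, 2000):
    oldest = bp_to_bce(4200, reference_year)
    youngest = bp_to_bce(1900, reference_year)
    print(reference_year, oldest, youngest)
```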

Sein mentioned that next month we might get IVC DNA? There have been news and rumours about it for a while, do you have any further information about it?

Nirjhar007 said...

Alberto,
//so they will be either pure ANI or pure ASI.//
I think they will not be pure on the basis of that estimate, because it is of limited reach; there was ASI-ANI admixture before, though the 4.2 kya one started the main one.
I think this admixture is the result of the eastward shift of the Harappan people, which was ASI dominant.
So in my opinion a sample from ~2500 BC will be ~80-85% ANI.

Nirjhar007 said...

^ The South And Eastern areas where Harappans shifted were ASI Dominant.

Alberto said...

Nirjhar,

Yes, those estimates based on modern DNA might not be completely accurate, though the authors seem quite confident about them.

But an important point about it is that it's very unlikely that the IVC was pure ASI. On the contrary, if we had to guess and choose between them being pure ANI or pure ASI, I think we'd all go for pure ANI. And that implies no significant migration from outside S-C Asia.

Whether correct or not, I don't know. Hopefully soon we'll have an answer.

Seinundzeit said...

David,

Thanks for attempting this!

Very fascinating stuff, and nicely in line with the qpAdm output. This, in conjunction with the d-stats that show Pashtun-Lithuanian gene flow, Pashtun-Afanasievo gene flow, and Pashtun-Yamnaya gene flow (rather than Pashtun-Georgian gene flow, Pashtun-Georgian gene flow, and Pashtun-Iranian gene flow), and in conjunction with Chad's f3 stats which show that Pashtuns have the strongest signal of admixture with either Corded Ware + Onge or SHG + Mala, all provide very robust and unambiguous verification of the qpAdm models (although the qpAdm models themselves should be enough for this sort of thing, it is excellent to see backing from other methods). I think this sort of backing from other formal methods is what some commentators (hat tip to Matt) wanted to see.

At the end of the day, I'm pretty sure that 60%-70% LN/EBA ancestry in the Hindu Kush is a solid/accurate proposition, one that will be backed via South Central Asian aDNA.

For what it's worth, we also have solid evidence from the uniparental data (the dominance of R1a1a in the region, and many mtDNA links to Sintashta/Andronovo). On top of that, this makes sense of the archaeology, which tells us that the BMAC and IVC cultures suffered fairly total collapse (probably due to a combination of climatic changes, overpopulation, and other less tangible factors like ideological transformations). Naturally, we also see severe depopulation. After this civilizational collapse + severe depopulation, we see the rise of groups across South Central Asia whose material culture is a hybrid between Andronovo and remnants of the BMAC (and with many steppe-derived traditions, like horse sacrifice). That can't be a coincidence.

Alberto said...

@Sein

Just a small comment about these stats, since it was me who asked for them and they ended up being quite confusing:

D(Lithuanian, Georgian; Pathan, Mbuti) D=0.0051, Z=3.294
D(RISE_baAfan, Georgian; Pathan, Mbuti) D=0.0079, Z=2.512
D(Lithuanian, Iranian; Pathan, Mbuti) D=0.0128, Z=7.764

They probably just mean that Georgian and Iranian have more Basal Eurasian admixture than Lithuanian and Afanasievo (in the case of Iranian probably also some Sub-Saharan might be at work). If in those stats you change Pathan for more southern populations the score should be higher (it might peak with something like Kharia or Dai). So basically they are useless for their purpose and it's probably safer to ignore them.
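For readers who haven't worked with these statistics, here is a rough sketch of how a statistic of the form D(W, X; Y, Z) is computed from per-SNP allele frequencies, using the usual allele-frequency form of the numerator and denominator. The toy frequencies are invented for illustration, and the Z-scores quoted above come from a block jackknife that is not reproduced here. The |Z| >= 3 cutoff is a common convention, not something stated in the comment.

```python
def d_statistic(w, x, y, z):
    """D(W, X; Y, Z) from equal-length lists of per-SNP allele frequencies.

    Positive values mean W shares more alleles with Y than X does,
    relative to Z.
    """
    sites = list(zip(w, x, y, z))
    numerator = sum((a - b) * (c - d) for a, b, c, d in sites)
    denominator = sum(
        (a + b - 2 * a * b) * (c + d - 2 * c * d) for a, b, c, d in sites
    )
    return numerator / denominator


# Toy allele frequencies at four SNPs (made up, not real data).
w = [0.9, 0.1, 0.8, 0.3]
x = [0.5, 0.5, 0.5, 0.5]
y = [0.8, 0.2, 0.7, 0.4]
z = [0.5, 0.5, 0.5, 0.5]
print(d_statistic(w, x, y, z))

# The three statistics quoted in the comment, as (D, Z) pairs.
quoted = {
    "D(Lithuanian, Georgian; Pathan, Mbuti)": (0.0051, 3.294),
    "D(RISE_baAfan, Georgian; Pathan, Mbuti)": (0.0079, 2.512),
    "D(Lithuanian, Iranian; Pathan, Mbuti)": (0.0128, 7.764),
}
passes_cutoff = {name: abs(z_score) >= 3 for name, (_, z_score) in quoted.items()}
print(passes_cutoff)
```

With that cutoff, the Afanasievo statistic (Z=2.512) would not count as significant, while the other two would.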

Davidski said...

I think what might have happened with one of those old ASI/ANI models is that they confused some of the steppe ancestry for ASI.

Davidski said...

Matt,

I think that those Central Asian migration edges into most of the Kalash/Pathans would start at the base of the West Eurasian branch, and not on the ENA branch, if the 12% of Dai-related input wasn't there.

Davidski said...

The South Indian-like GujaratiD have around 40% ancestry from the EBA steppe.

https://drive.google.com/file/d/0B9o3EYTdM8lQd3l4bjhJSDQ1dWM/view?usp=sharing

Unknown said...

David,
I'm getting different numbers from that, with a much better p-value. I don't have the Rise samples, or the Paniya, which would help. I get about 45% Corded Ware, 34% Georgian, and 21% Kharia, with a p-value that was about 2e-08. I think if you could get the Paniya on there, it would be about 33%, and give a better picture of LN/EBA stuff.

Davidski said...

But how do you know Georgians and Kharia are viable references, and don't produce good fits by coincidence?

There's no p-value in Treemix. These tests are unsupervised and they don't need the perfect reference samples to be available.

I just build the scaffolding and watch what happens.

Chad said...

Paniyas would be better, they have less additional Atayal. Georgians aren't great with the chi-squared. Starcevo does better. Maybe if you play around with some mixes with Starcevo on the data.ind, or whatever you use. Iraqi and Iranian Jews don't look terrible either. Using Starcevo increases the ASI bearing group a tad and increases Steppe stuff. I can play around with it some more later tonight or tomorrow.

Davidski said...

But none of these reference samples are perfect.

Let me put it another way; Treemix has an infinite number of options to model the Kalash, Pathans and Gujarati on these trees.

I don't ask it to tell me how much LN/EBA European admixture the Kalash, Pathans and Gujarati have.

And yet, it chooses to nest most of the Kalash and Pathan variation within LN/EBA Europe, and also chooses to run a migration edge from LN/EBA Europe to the Gujarati.

Chad said...

True, but it's not perfect either. It's affected by the samples like Admixture and qpAdm. There's been some odd ones, like 8% East African into Stuttgart, and such.

Davidski said...

It wasn't from East Africa, but from a basal part of the tree, and it was describing Basal Eurasian admixture.

To more accurately define the level of Basal Eurasian admixture in Stuttgart I'd have to build the scaffolding with that in mind.

This tree is already too complex for that, because it's focusing on much later events.

Chad said...

Okay. Another thing that I notice with the Paniya is that no matter what admixture run I do, whether it involves just the Paniya making a component, or the program making them with a mix of Onge, Papuan, and Atayal, and such for ASI, each time they go into the Pathans at almost 33%. That isn't just for me. Moorjani found the same thing. They had the Paniya at 84% ASI, and the Pathan at 27%. Almost perfectly at the same rate. If the Paniya can fit into the Pathans at 33%, the remaining 2/3 certainly won't all be LN/EBA, but maybe half of it, or just over. Central and SC Asia had some pretty good sized archaeological cultures. They certainly didn't disappear. I think the Tajiks will be the only ones that are close to 50% Andronovo, or just over 50%.
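The "almost 33%" figure can be checked against the two estimates quoted in the comment. A tiny sketch, assuming the Pathan ASI share scales linearly with the Paniya one (an assumption of this illustration, not a tested model); the function name is hypothetical.

```python
def share_of_reference(target_component, reference_component):
    """Fraction of the reference population needed to supply the target's component."""
    return target_component / reference_component


paniya_asi = 0.84  # Paniya ASI estimate quoted in the comment
pathan_asi = 0.27  # Pathan ASI estimate quoted in the comment

paniya_like = share_of_reference(pathan_asi, paniya_asi)
print(round(paniya_like, 3))
```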

Davidski said...

The idea behind running trees like this is to make use of the fact that genetic variation is clinal.

So the South Asian migration edges running from the branch that leads to the Dai are much more likely to represent indigenous South Asian ancestry than the Paniya, who have fairly complex ancestry themselves.

Unknown said...

@ Sein

"On top of that, this makes sense of the archaeology, which tells us that the BMAC and IVC cultures suffered fairly total collapse (probably due to a combination of climatic changes"

Not quite. The IVC, yes, it 'collapsed', or perhaps shifted east following monsoonal rain. But the BMAC did not collapse. In fact, it arose exactly after the demise of the IVC, and continued well into the MBA.

Matt said...

Davidski: I think that those Central Asian migration edges into most of the Kalash/Pathans would start at the base of the West Eurasian branch, and not on the ENA branch, if the 12% of Dai-related input wasn't there.

Yeah, a population that is 1/3 Dai, 2/3 "something else" could sit weakly on the East Asian branch, if the "something else" was essentially equally positioned to the West Eurasian and East Asian branch (e.g. so much on the base of the West Eurasian branch it's almost equally related to both branches, aka Basal Eurasian).

A population that is 1/5 Dai, 4/5 Georgian, as in a literal interpretation of prior models, seems like it shouldn't, and should instead still sit decidedly on the West Eurasian branch. That's assuming Georgian is anywhere close to Bronze & Iron Age Armenia, for example. But I can't be 100% sure without seeing where Georgian would be on that tree, which it seems like TreeMix can't really cope with.

Seinundzeit said...

Mike,

In terms of temporal context, both the BMAC and IVC have considerable overlap (as far as I know, it would be incorrect to say that it arose directly after the demise of IVC, but then again, I need to read more material on the subject).

Also, David Anthony draws a somewhat different picture in "The Horse, the Wheel and Language". Based on that text (my main introduction to the subject, at this point), BMAC settlements decreased radically in size, and in the later period we see a very strong intensification of contact and interaction with southernly extensions of the Andronovo culture (a melding of material culture seems to occur, not to mention the horse sacrifices).

Chad,

As we saw with the ASI ADMIXTURE experiments you tried, that software is exceedingly volatile. When you simply added a few new populations, the components would change radically, even when dealing with the same K. Depending on the experiment, you had the Paniya range from only 30% ENA (and 70% West Eurasian) all the way to 80% ENA (and only 20% West Eurasian), although you usually got them to be around 60%-55% ENA + 40%-45% West Eurasian.

At the end of the day, ADMIXTURE isn't the best guide here, especially when both qpAdm and TreeMix are showing us the same results (and not to mention the fact that the d-stats corroborate the qpAdm and TreeMix results).

Chad said...

FYI,
I've got the Kharia at the following
30.3% Onge
18.8% Papuan
27.0% Atayal
23.8% Georgian
chi-square 1.038 tail-prob .9594

31.3% Onge
18.0% Papuan
27.1% Atayal
21.5% BedouinB
2.1% Georgian

chi-square 0.753 tail-prob .8606


changing it up
47.7% Onge
26.6% Atayal
18.1% BedouinB
7.6% Georgian
chi-square 2.359 tail-prob .6707

another...
46% Dai
47.1% Georgian
6.9% BedouinB
chi-square 43.165 tail-prob 3.42159e-08-- very poor

I've got some more that I can do later, but the possibility of the Paniya really being 84%ENA/ASI, is looking better.


Chad said...

My admixture run, without the Onge and Papuan, did make them 84% of a component. I'm going to try and pull that out of a future run. You're talking about supervised, which is tough without real ancient pops there. In the unsupervised, Pathans are always 33% of the ENA in the Paniya. Every single time!

Davidski said...

I doubt it's pure ENA. If you run it with Admixture it'll probably come back mixed.

Another problem is that the same components in different populations often mean somewhat different things. This happens a lot.

Unknown said...

David, look above. Kharia consistently at about 76% ENA.

Unknown said...

David,
If you have the Paniya in plink, I'll run it as the same mix.

Davidski said...

I've sent you the Paniya. But the files I sent you only have ~110K SNPs, because that's how many overlap between the usual Illumina chip and the Haak 2015 dataset.
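For anyone wanting to check this kind of marker overlap themselves, here is a minimal sketch that intersects the variant IDs of two PLINK .bim files. It assumes both datasets use the same SNP identifiers in the second column, which is not guaranteed (matching on chromosome and position is an alternative). The function names and toy lines are made up for illustration.

```python
def snp_ids(bim_lines):
    """Collect variant IDs (second column) from the lines of a PLINK .bim file."""
    ids = set()
    for line in bim_lines:
        fields = line.split()
        if len(fields) >= 2:  # skip blank or malformed lines
            ids.add(fields[1])
    return ids


def shared_snps(bim_a, bim_b):
    """Variant IDs present in both .bim files."""
    return snp_ids(bim_a) & snp_ids(bim_b)


# Toy .bim content: chromosome, ID, genetic distance, position, allele 1, allele 2.
chip = ["1 rs1 0 1000 A G", "1 rs2 0 2000 C T", "2 rs3 0 3000 G A"]
ancient = ["1 rs2 0 2000 C T", "2 rs3 0 3000 G A", "2 rs4 0 4000 T C"]

overlap = shared_snps(chip, ancient)
print(len(overlap), sorted(overlap))
```

The same functions accept open file objects, since iterating over a text file yields its lines.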

Seinundzeit said...

The Kharia aren't carbon copies of the Paniya on any ADMIXTURE run that I've seen. Quite the contrary, they have much more of a Southeast Asian affinity on HarappaWorld (the Paniya don't come even close to that much "SE-Asian" on that ADMIXTURE run), they have a lot of extra ENA that isn't ASI.

Basically, the Paniya are more West Eurasian-admixed than the Kharia. And since it seems the Kharia are around 75% ENA, the Paniya being around 84% ASI is not supported by these results.

Seinundzeit said...

Also, HarappaWorld has the "South-Indian" component peak in Paniya at 84%, while the HGDP Pashtuns and Kalash average out to only 21%-23%. Same with Dodecad K12b. So, the 84% to 33% pattern you are seeing isn't some sort of necessary outcome.

In addition, Everest tried the whole "zombie" approach with the component that peaks at 84% in Paniya (in HarappaWorld). It came out 50% Caucasus + 50% SE-Asian and Papuan. So it is very far from being an ENA component, and is clearly a hybrid between West Eurasian (Caucasus-like) and ENA (Southeast Asian-like, with a slight Oceanian affinity).

Seinundzeit said...

Sorry for multiple consecutive postings (this is the final one), but the models you are trying might benefit from the addition of ancient samples.

Matt said...

@ Chad, what kind of pright outgroups are you using in those models? I'm interested in what is splitting apart the Onge, Papuan and Atayal like ancestry in Kharia?

Balaji said...

Davidski,

Thank you very much for your interesting work. I am glad that you put in a “?” in the title of your post. You have recapitulated in a different format what the Reich Lab found. That is, that Pathans and Kalash have about 70% West-Eurasian-like ancestry (ANI) and about 30% of another kind of ancestry (ASI). Your TreeMix results do not prove that the 70% ANI originated in Europe or in the Steppe.

And your TreeMix results do not give any special role to Sintashta or Andronovo as ancestors to Pathans or Kalash.

Regarding the role of the steppe, even in Europe, Haak et al. only show that the Corded Ware people have substantial Steppe ancestry. They can model present-day Europeans using qpAdm as having steppe ancestry. But this does not prove Steppe ancestry in all Europeans, especially Southern Europeans.

Davidski said...

Balaji, check this out...

https://drive.google.com/file/d/0B9o3EYTdM8lQN0dnUHppbWc4QUE/view?usp=sharing

Unknown said...

Sein

You're right about the timing, sorry. For some reason I thought the IVC ended much earlier (2500 BC instead of c. 1800). Nevertheless, my overall comment remains:

* the BMAC appears to have been a native development from Chalcolithic villages in the foothills.

* the forts must have been defensive at least in part, but also served for organization and storage. If anything, the major threat might have been local competitors rather than raiding nomads from the north - as Lamberg-Karlovsky puts it - it was a system of "local khans presiding over tribal contexts" - much like later eras of Central Asia.

* contact with steppe was only occasional ("However, steppe material in Margiana is rare, and the new motifs which appear in Period 2 relate to the oasis environment itself (snakes, scorpions, boar, etc.) rather than to the steppe nomadic traditions." - F. T. Hiebert) and context specific. Steppe tribes brought secondary products, as the residue remains of "Andronovo" Incised ceramics demonstrate, and perhaps other materials.

* its overall cultural synthesis is entirely original, and cannot be reduced to either a steppic nor Mesopotamian origin.

* toward the later phases, it expands toward the Indus, central Iran, etc. Perfect timing for the appearance of Indo-Aryans, IMO, incl. the Mittani.

* the final abandonment of the forts c. 1700 appears to be due to sudden and massive shift south. Perhaps over expansion with shortfall of food supply, and general exhaustion of what was essentially a consumptive but non-productive system. Again, this dovetails with point above. BMAC-derived material is being found in Iran and Pakistan on a regular basis, as archaeology accelerates in these regions.

* however, the final phases of BMAC (1700 BC onward) are poorly studied. "There is, however, little, if any, evidence for conflict between the BMAC and the ephemeral pastoral nomadic presence" (Lamberg-Karlovsky).

So only aDNA can solve the issue of its origins and transformation, etc

Matt said...

Davidski: I think what might have happened with one of those old ASI/ANI models is that they confused some of the steppe ancestry for ASI.

I do think that has some plausibility - the ENA like affinity in WHG / EHG / ANE teaming up with real ENA affinity via ASI. But I don't know. It seems mostly plausible when you're *just* thinking about Pathans and Kalash and other people from North India / Pakistan. If you consider the whole Indian cline, then it seems like you would require increasing WHG / EHG / ANE to correlate with real ASI.

....

Re: ANI - going back to Dienekes' experiments with this component, he generated an ANI component, then put it through an MDS with components he'd generated called West Asian, SW Asian, South European and North European - http://tinyurl.com/obhh6k2. Rotate that and you've got http://i.imgur.com/3bC9UAg.jpg which would sort of fit with ANI being a little "northeast" relative to other Eurasian components than the West Asian, although more east than north. It is still far from the "North European" component though.

Re: Y-dna groups, I don't know much about this topic but, apparently R1a runs about 50% in Pathans / Pashtoons in Pakistan - http://www.ncbi.nlm.nih.gov/pubmed/24709582 (http://forensic.yonsei.ac.kr/presentation/75.pdf). No I or I2. The other 50% y-dna looks South Asian, Middle East and East Asia in frequency (so not really other steppe associated lineages taking down the %).
Groups seem to vary and Kalash are only around 20% R1a (per Wikipedia - https://en.wikipedia.org/wiki/Y-DNA_haplogroups_in_South_Asian_populations).

Seems like if they have 65% steppe ancestry, the male line ancestry runs a little below their autosome. Slightly female biased migrations (still with a fair amount of males)? That seems like a different case than other IE migrations where what we think is the IE y-dna runs at a higher percentage than the overall autosome. But then a lot can happen between then and now (e.g. Basques). Maybe R1a was higher at one time, and men with those lineages were systematically replaced a little.

Nirjhar007 said...

Sein,
//Also, David Anthony draws a somewhat different picture in "The Horse, the Wheel and Language". Based on that text (my main introduction to the subject, at this point), BMAC settlements decreased radically in size, and in the later period we see a very strong intensification of contact and interaction with southernly extensions of the Andronovo culture (a melding of material culture seems to occur, not to mention the horse sacrifices).//
A complete Hogwash, there were some traces of interactions at best which is normal but that trace also disappears when we go more South.
SSC/IVC didn't disappear the people moved SE and probably NW also.
There were no movements from Steppes to India but some to Steppe from BMAC area.

Unknown said...

David,
I only have the bed, bim, and fam files. I don't have the plink ones for the SA samples.

Unknown said...

Kharias may be a little more Atayal, but the Paniya have more of the Onge/Papuan stuff.

Seinundzeit said...

Matt,

Depending on the paper, Pashtuns range from 50% R1a1a to 70% R1a1a (although two Ghilji groups in one paper are 80% R1a1a, and the sample size is very large). The people of the Punjab are around 50% R1a1a. After R1a1a, the next largest haplogroups for Pashtuns are usually either some subclade of Q or L, followed by G (not to mention H. Although, sometimes, some subclade of G constitutes the second largest haplogroup). Although, since some Pashtun data-sets turn out to be around 70% R1a1a (and at least one has turned out to be 80% R1a1a), the share of these other haplogroups tend to be pretty paltry.

Also, the Kalash are a severely drifted population, so a much more genetically cosmopolitan group like Pashtuns also turn out to be much more representative in terms of y-DNA. Even Punjabis are better representatives, and as already mentioned, they are around 50% R1a1a.

As far as Dienekes' ANI component, that just brings us back to the fact that we have formal methods all pointing in one specific direction (both qpAdm and TreeMix are showing 60%-70% LN/EBA European ancestry in South Central Asia), so it seems very strange to concentrate on much older work which involved ADMIXTURE and no aDNA samples whatsoever.

Anyway, his component can't really be called ANI, since it isn't even arrived at via methods using f-stats (which is how ANI is defined in Reich et al.), and since it doesn't involve aDNA samples, which are now absolutely necessary.

Chad,

Not only do they have more "Onge/Papuan stuff", they also have much more West Eurasian ancestry. And, their modal component turns out to be 50% West Eurasian, when analyzed using other ADMIXTURE components from the same K.

Seinundzeit said...

I should have posted this before, here are some Pashtun y-DNA frequencies from a few papers:

R1a1a=65.8%
G2c=6.2%
H1a=4.1%
L3=4.1%
J2b2=2.7%
J2a=2.1%
C3=2.1%
R2a=2.1%
Q1a3=1.4%
G2a=1.4%
G=0.7%
G2a1=0.7%
C5=0.7%
C5a=0.7%
J1=0.7%
J2a5=0.7%
J2b1=0.7%
L1=0.7%
Q1=0.7%
Q1a=0.7%
R1b1a2a=0.7%
R2=0.7%

R1a1a=58.82%
G2c-M377=15%
J2a4-P55=6%
H-M69*=5.88%
Q-M242*=3%
G1-M285=2.941%
L1c-M357=2.94%
H1a-M82=2.94%

19 HGDP Pashtuns, the population on the TreeMix graphs (naturally, not all 22, since 3 samples are female)

R1a1a=42.10%
R1b1a2a=10.52%
Q1a3-M346=10.52%
R2-M479*=5.26%
G2c-M377=5.26%
G2a3a-M406=5.26%
L1c-M357=5.26%
L1a-M76=5.26%
H-M69*=5.26%
H1a-M82=5.26%
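With only 19 males, each individual is worth a little over 5 percentage points, which is worth keeping in mind when comparing this sample to the larger studies. The sketch below converts the HGDP percentages listed above back into whole-number counts, assuming the stated sample size of 19 and that the percentages were truncated rather than rounded.

```python
n_males = 19  # HGDP Pashtun males, as stated above

percentages = {
    "R1a1a": 42.10,
    "R1b1a2a": 10.52,
    "Q1a3-M346": 10.52,
    "R2-M479*": 5.26,
    "G2c-M377": 5.26,
    "G2a3a-M406": 5.26,
    "L1c-M357": 5.26,
    "L1a-M76": 5.26,
    "H-M69*": 5.26,
    "H1a-M82": 5.26,
}

# Nearest whole number of individuals behind each percentage.
counts = {hg: round(pct / 100 * n_males) for hg, pct in percentages.items()}
total = sum(counts.values())

print(counts)
print(total)
```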

I can find more examples.

Also, I should note that substantial percentages of y-DNA haplogroup I have been reported among Yusufzai Pakistani Pashtuns (who also have an odd peak in y-DNA Q). Also, Afghan Tajiks do display some y-DNA I, and they are autosomally very similar to Pashtuns. Finally, although this has much less weight (since it isn't from an academic source), there is a Punjabi individual at 23andMe who has y-DNA I.

So one needs to look at the big picture, when talking about the y-DNA.

Matt said...

The general population in Pakistan is put in the sample of 638 here at around 37.1% - https://en.wikipedia.org/wiki/List_of_R1a_frequency_by_population.

Doesn't seem like there's *too* much consistency. Sindhis of Pakistan, R1a in Underhill et al 2009 - 49%, Qamar et al 2002 - 12.3%. Same sample size (>100), same country. Pashtuns of Pakistan, Firasat 2007 - 44.8%, Qamar 2002 - 10.8% (both >90); Pashtuns of Afghanistan, Haber et al 2012 - 51% (sample 49). The consensus looks to be around 50% if you also look at those other studies you posted, and it may be that the Qamar study is just the outlier for those two.
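One way to gauge whether gaps like these could be plain sampling noise is a two-proportion z-score. This is a rough sketch under simple random sampling, using an unpooled standard error and assuming sample sizes of exactly 100 and 90 (the comment only gives lower bounds, and larger samples would make the gap look even less like noise). The function name is illustrative.

```python
import math


def two_proportion_z(p1, n1, p2, n2):
    """Z-score for the difference between two sample proportions (unpooled SE)."""
    standard_error = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / standard_error


# R1a frequencies quoted above; sample sizes are assumed minimums.
z_sindhi = two_proportion_z(0.49, 100, 0.123, 100)
z_pashtun = two_proportion_z(0.448, 90, 0.108, 90)

print(round(z_sindhi, 1), round(z_pashtun, 1))
```

Both values come out well above the usual thresholds, so under these assumptions the disagreement between studies is unlikely to be explained by sample size alone.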

Generally seems higher than the very low numbers for Iran, which seem from that list are about 13% for the general population (sample size 150).

The Kalash apparently at least have no East Eurasian mtdna, which seems interesting (could be drift, otherwise suggests any ASI admixture into them was male mediated). While of 230 Pathans in Pakistan, mtdna is "55% from West Eurasian, 40% South Asian, 5% East Asia" https://docs.google.com/file/d/0ByzF6-KpQXK3ckVFY3E1a1lSc0cxcEJEV0JJVEVwUQ/edit?pli=1

Alberto said...

I think the only population with no West Eurasian at all and more ASI than Dai might be Malay. They still have a lot of East Asian, but less than Dai.

Re: the ANI-ASI study, they used Onge for the ASI, so I don't think it's likely that it confused ASI with steppe ancestry. But I do think that 29% ASI for Pathans is too high. The problem is probably that neither Onge nor Georgians are the right populations, though they are the closest to ANI and ASI among modern ones, apparently:

"Georgians are the most closely related West Eurasian group to the ANI provides a good fit to the data for many models that we tested, whereas models with Europeans in their place do not provide as good a fit. Although we believe that the Onge are only distantly related to ASI, we do not replace the Onge in our analysis because this is the only group we have data from that is consistent with forming a clade with the ASI (the only requirement for our method to work is for the outgroup to form a clade with the ASI)."

They consistently found Georgians to share the most drift with ANI, which is consistent with other data supporting Georgians as the best proxy for ancient Central Asians among modern populations.

But yes, I agree with probably everyone that we need ancient DNA from all the relevant places to really know. Till then it's all too speculative.

Davidski said...

You can see on this tree why Georgians are such a good proxy for ANI.

https://drive.google.com/file/d/0B9o3EYTdM8lQd3kwN0k1Mk13NWs/view?usp=sharing

But ANI wasn't a single population. It's a composite to describe the West Eurasian ancestry in South Asia. So when we dig a little deeper, we see this...

https://drive.google.com/file/d/0B9o3EYTdM8lQZk52SDEzdGp0NXc/view?usp=sharing

So Georgians are a good proxy for ANI, but a lousy proxy for steppe admixture in South Asia.

Seinundzeit said...

David,

A very interesting tree. So overall, the Kalash are placed quite close to Georgians. But as you note about digging deeper, they clearly come out as predominantly LN/EBA European, but with very heavy Georgian-like admixture.

Alberto,

The Singapore Malay probably do have very minor West Eurasian admixture from India and the Near East.

Matt,

Your point about variation between studies is certainly correct. Although, all the papers I've seen show a range of 50% to 80%, so Qamar et al. 2002 is definitely the odd one out (I'll have to give it a read).

Just a side note, but many of the mtDNA lineages counted as "South Asian" in that mtDNA paper are better construed as "West Eurasian lineages that are restricted to South Asia in terms of geographical distribution". The recent paper on West Eurasian mtDNA in South Asia construed many of these same lineages as West Eurasian.

Also, for what it's worth, the Kalash don't display any ASI Y-DNA haplogroups. Both Y-DNA haplogroup L and Y-DNA haplogroup H are clearly West Eurasian, but just restricted in terms of geographical distribution to South Asia (although this isn't really quite true, since L is fairly common in West Asia + the Caucasus, and western Europe has its own ancient subclade of H, one which we know came with Neolithic farmers, since Starcevo_EN has that same subclade of H). I don't think any population in northern South Asia really displays any Y-DNA haplogroup which we can tie to ASI.

Alberto said...

@Sein

"The Singapore Malay probably do have very minor West Eurasian admixture from India and the Near East."

From what I've seen they seem to have less than Dai, who have less than Kharia. So it's very minor indeed. Do you think they have more than any of them?

Krefter said...

If South/Central Asians were 65% European, wouldn't we be able to tell by their physical appearance? This should at least lead to suspicion of these numbers.

I guess it's possible. I've seen a few Indian and half European people, and they could pass as Afghan or whatever. But, if an entire population was that way, I'm not sure if almost 100% of them would.

Davidski said...

South Asians aren't up to 65% modern European; they're up to 65% EBA steppe, most of which lies in Central Asia.

I have no idea what a 65% EBA steppe/23% Southwest Asian Neolithic farmer/12% South Asian hunter-gatherer is supposed to look like, but probably not like a modern Lithuanian, Norwegian or Ukrainian.

Krefter said...

@Sein,

If you're from South/Central Asia you would know. I trust actual DNA more than anything and expect other things to make sense later. Anyways, facial features are probably the place to look because they change less quickly.

There are many undiscovered mutations that cause pale features in Europeans, and they're probably all recessive. What I mean is that both parents usually need to carry the gene for the kid to have a pale feature. The genes that cause blue eyes and red hair have been discovered (for the most part), and that's how they work.

Almost 100% of non-Europeans don't carry any of these genes. So, a Euro/non-Euro mixed population will be much darker than Europeans. Latinos are an example of this.

So, if the steppe ancestors of South/Central Asians had many light-pigmentation genes, those genes would have little room to express themselves once mixed with non-Europeans.
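To put a rough number on this argument, here is a minimal sketch that takes the comment's assumptions at face value: a single, fully recessive variant, random mating, and no copies of the variant in the non-steppe ancestors. The 65% share is the figure from the post; the 0.8 allele frequency in the source population and the function name are purely hypothetical, chosen only for illustration.

```python
def recessive_trait_freq(steppe_share, freq_in_steppe, freq_elsewhere=0.0):
    """Expected share of people showing a fully recessive trait.

    Assumes one locus, random mating (Hardy-Weinberg proportions),
    and a simple linear mix of allele frequencies by ancestry share.
    """
    # Allele frequency in the mixed population is the ancestry-weighted average.
    q = steppe_share * freq_in_steppe + (1.0 - steppe_share) * freq_elsewhere
    # A recessive trait shows only in people carrying two copies.
    return q * q


# Hypothetical allele frequency of 0.8 in the steppe source population.
unmixed = recessive_trait_freq(1.0, 0.8)
mixed = recessive_trait_freq(0.65, 0.8)

print(f"trait frequency, unmixed source: {unmixed:.2%}")
print(f"trait frequency, 65% steppe mix: {mixed:.2%}")
```

Under these assumptions the trait drops from about 64% to about 27% of people, a much steeper fall than the drop in ancestry from 100% to 65%, because the trait frequency scales with the square of the allele frequency.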

Davidski said...

The Andronovo and Sintashta samples we have show normal Northern/Eastern European pigmentation traits, and were probably lighter in real life than Southern Euros.

Seinundzeit said...

I guess it depends on the samples. For example, here are RISE505's genotypes at some skin pigmentation-implicated SNPs:

ASIP, rs2424984: TC (if I'm not wrong, northern Europeans are always homozygous for the derived allele T)

ASIP, rs6058017: AG (same here, if I'm not wrong, northern Europeans are always homozygous for the derived allele A)

SLC45A2, rs16891982: GC (in this case, I know that northern and eastern European populations are always 99%-100% GG at this SNP, and southern Europeans closer to 85%-90% GG at this SNP)

I'm homozygous for the derived "European" variants at these SNPs, so my skin is probably lighter than whatever shade this individual was.

Yet, my skin color is still on the darker end of the southern European range (I've known only four Italians with darker skin than myself, despite knowing quite a few people of Italian ancestry).

In addition, looking at pigmentation SNPs for the West Asian and South Asian raw-data files that I have, and taking into consideration their actual skin color (since I know those people IRL), at least this Andronovo individual was quite dark. And we already know that Yamnaya and Afanasievo were much more so.

But, RISE505 could easily be an outlier. I wouldn't be surprised if many steppe Indo-Iranians had the same skin color as modern Northern/Eastern Europeans.
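For anyone who wants to repeat the comparison above on their own raw data, here is a minimal sketch that tallies derived-allele copies at the three SNPs listed for RISE505. The derived alleles and the RISE505 genotypes are the ones given in this comment; the variable and function names are hypothetical, and the tally is only a count of alleles, not a prediction of skin color.

```python
# Derived ("European") allele at each SNP, as stated in the comment above.
DERIVED = {
    "rs2424984": "T",   # ASIP
    "rs6058017": "A",   # ASIP
    "rs16891982": "G",  # SLC45A2
}

# RISE505 genotypes as listed in the comment above.
RISE505 = {"rs2424984": "TC", "rs6058017": "AG", "rs16891982": "GC"}

# Someone homozygous for the derived allele at all three SNPs.
HOMOZYGOUS_DERIVED = {snp: allele * 2 for snp, allele in DERIVED.items()}


def derived_copies(genotypes):
    """Count copies (0, 1 or 2) of the derived allele at each SNP."""
    return {snp: genotypes[snp].count(allele) for snp, allele in DERIVED.items()}


for label, sample in [("RISE505", RISE505), ("homozygous derived", HOMOZYGOUS_DERIVED)]:
    counts = derived_copies(sample)
    print(label, counts, "total:", sum(counts.values()), "of", 2 * len(DERIVED))
```

RISE505 carries one derived copy at each of the three SNPs, so three of six, against six of six for someone homozygous at all of them.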