Aryan Invasion Theory: The Genetics (Part II)

For other articles in this series, see Part I, Part III and here.

The Aryan Invasion Theory (AIT) has been debunked by almost all recent datasets and studies in genetics, linguistics, mythology/religion and archaeology. Yet the theory keeps cropping up in one form or another, especially in its milder form, the Aryan Migration Theory (AMT). AMT is simply the invasion theory rephrased as a migration, because there is no archaeological evidence of any invasion of the Indus Valley Civilization (IVC) around 1500 BC, the date originally given for the AIT. Most often, proponents of AIT or its sneaky cousin AMT cite either outdated studies or half-truths to push their ideas. The most blatant misinformation and half-truths come from genetics, which, unlike linguistics, is a hard science with little room for guesswork. So it is very important to look at the current genetic evidence.

One of the most frequently used justifications for the AIT is that North Indians are genetically similar to Europeans, and hence the theory must be true. The proponents even cite various genetic studies to make this point. However, this is a classic case of presenting half-truths. The similarity is indeed higher between North Indians and Europeans than between South Indians and Europeans. But while this similarity does show that Indians and Europeans are related, it fails to show how they are related, i.e. it gives no information on whether Indians migrated to Europe or Europeans (as Aryans) migrated to or invaded India. The proponents of AIT simply assume an invasion of Europeans/Central Asians/Aryans into India, in line with their preconceived notion of an Aryan invasion/migration. They are also almost always silent on the time period of this invasion/migration.

But unfortunately for AIT proponents, deeper genetic studies can now give the direction of movement of a population as well as the time period of the mixing. Recent studies of the Single Nucleotide Polymorphisms (SNPs) that define R1a1 on the Y chromosome, together with mitochondrial DNA studies, have helped us better understand human migration patterns around the start of the Holocene (i.e. the end of the last ice age). The Y chromosome is found only in males, so it is passed only from father to son and gives valuable insight into patrilineal gene flow and migrations. Mitochondrial DNA, on the other hand, is inherited exclusively from the mother (unlike the chromosomes in the nucleus, which come from both father and mother), so it helps us understand matrilineal gene flow and migration. In layman's terms, tracing the Y chromosome traces the fathers and grandfathers of an individual or group, while tracing mitochondrial DNA traces the mothers and grandmothers.

The study by G. Lucotte is one such major genetic study. It examines in detail the distribution of the R1a1 haplotype among Europeans, Central Asians and Indians. Since R1a1 is common to all three population groups, most studies use it to trace migration patterns in Eurasia. The study looks at the prevalence of R1a1 and at the amount of variation within the R1a1 haplotype. The principle behind it is that newer groups of people have far less variation among them, while older population groups have more. In other words, older populations are a lot less homogeneous than newer ones. This is how the ancestry of modern humans is traced back to Africa: Africans have the highest amount of genetic variation among the different groups of people today. In the study's own words:

In the present study we have extended the field of detection of haplotype XI/haplogroup R1a subject to other countries previously uncovered in our preceding articles [9,10]: these countries are mainly Northern Europe, Georgia and Armenia, Near/Middle East, NorthAfrica, Iran and Afghanistan, Pakistan and India. We found high haplotype XI frequencies values in Afghanistan (18.4%), in Iran (26.5%), in Pakistan (28% and 30.4%) and in India; in this last subcontinent, the maximal value of 61.3% was found in Punjab. We have refound in our samples the clear distinction initially established by Pamjav et al. [21] between Indian Z93 populations and European Z280 populations: all our South Asian populations are Z93, while almost all our European populations are Z280. Datations show that the Z93 Pakistano-Indian group is the most ancient (about 15.5K years); in Europe, the Eastern populations are the most ancient (about 12.5K years) and the Northern ones the most recent (about 6.9K years).

So the data from the study gives evidence that the R1a haplotype of the Pakistano-Indian group, whose frequency peaks in Punjab, is the oldest, the Eastern European haplotype is in the middle and the Northern European haplotype is the youngest. This pretty much buries the Aryan invasion/migration theories: if AIT/AMT were true, the genetic results would show the Eastern European haplotype as the oldest and the Indian haplotype as the youngest, since AIT claims that Europeans (the parent population) came to North India as Aryan invaders/migrants. It is also important to note the ages of the different population groups. The study states that the Eastern European population is about 12.5K years old while the Northern European population is about 6.9K years old. And the Punjabi/North Indian population is about 15.5K years old.

The age component in these studies is the most important one, because according to the Aryan Invasion/Migration theory the IVC was Dravidian and ended because of a migration/invasion by the Aryans around 1500 BC, i.e. roughly 3,500 years ago. In the light of the above and other similar genetic studies, the Aryan Invasion/Migration theory is off the mark by at least 10,000 years. The significance of this fact is that it demolishes the argument of AIT peddlers who push the narrative that Hinduism is not native to India and came with the Aryan invaders. Since the North Indian population is about 15,000 years old, this claim falls flat on its face, which makes Hinduism indigenous to India and India alone.
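The size of that gap can be checked with a few lines of Python. This is only a sketch of the arithmetic: the ages are the ones quoted from the Lucotte study above, 1500 BC is taken as roughly 3.5 thousand years before the present (an approximation assumed here), and the function names are hypothetical.

```python
# Ages in thousands of years, as quoted from the Lucotte study above.
REPORTED_AGES_KY = {
    "Z93 Pakistano-Indian group": 15.5,
    "Eastern Europe": 12.5,
    "Northern Europe": 6.9,
}

# Assumption for illustration: 1500 BC is about 3.5 thousand years ago.
PROPOSED_INVASION_KY = 3.5


def oldest_first(ages):
    """Return the group names ordered from oldest to youngest."""
    return sorted(ages, key=ages.get, reverse=True)


def gap_to_invasion(ages, invasion_ky):
    """Return how many thousand years each reported age predates the proposed date."""
    return {name: round(age - invasion_ky, 1) for name, age in ages.items()}


if __name__ == "__main__":
    print(oldest_first(REPORTED_AGES_KY))
    print(gap_to_invasion(REPORTED_AGES_KY, PROPOSED_INVASION_KY))
```

With these inputs the Pakistano-Indian group comes first in the ordering, and its reported age sits about 12 thousand years before the proposed invasion date.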

The actual population composition of India

As per the latest genetic studies, the population of India is derived from two separate population groups, ANI (Ancestral North Indians) and ASI (Ancestral South Indians). The data suggests that ANI came to India around 60,000 BC and ASI around 45,000 BC. Despite the labels, ANI and ASI are not synonymous with Aryans and Dravidians, as some AMT/AIT enthusiasts try to make out. North Indians on average have about 60% ANI genes and 40% ASI genes, while South Indians on average have about 40% ANI genes and 60% ASI genes. The names mean only that ANI contributed more to North Indians and ASI contributed more to South Indians; they do not denote a distinct North Indian Aryan or South Indian Dravidian entity. This is also why South Indians, too, share genetic similarity with Europeans. Since North Indians share 40% of their genetic material with South Indians, and Europeans share 60-70% genetic similarity with North Indians, Europeans have 20-30% genetic commonality with South Indians.

This could also mean that the mixture of North and South Indian ancestors happened more than 5,000 years ago. It also puts to rest the theory that Aryan invaders became the upper castes and Dravidians the lower castes. Hinduism originated 5,000 years ago, after ANI and ASI had already mixed, and the upper castes of North India have 30-40% ASI genes while the lower castes of South India have 30-40% ANI genes. The birth of the castes therefore happened only after a nearly thorough mixing of the two parent populations of India, ASI and ANI.
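The 20-30% figure for Europeans and South Indians can be reproduced with a back-of-the-envelope calculation. The sketch below simply multiplies the shares quoted in this section. Treating shared ancestry as multiplicative is a simplifying assumption made here for illustration, not a method taken from the genetic studies, and the function name is hypothetical.

```python
def chained_share(north_south_share, europe_north_range):
    """Multiply the North-South share by the low and high Europe-North shares.

    Simplifying assumption (not from the studies): shares combine multiplicatively.
    """
    low, high = europe_north_range
    return (round(north_south_share * low, 2), round(north_south_share * high, 2))


if __name__ == "__main__":
    # 40% North-South share, 60-70% Europe-North similarity, as quoted in the text.
    print(chained_share(0.40, (0.60, 0.70)))
```

With the quoted inputs the result falls between 0.24 and 0.28, i.e. inside the 20-30% range given above.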

ANI migrations (Source:
ASI migrations (Source:

Technical Summary on using Genetics for dating migration patterns

The first part is sample selection. As in any other study, random samples are taken from a selected population group. Then a particular gene locus is selected for study. A locus is the place on the chromosome where a gene sits, and the gene at a locus can come in different variants. Take tall and short bean plants, for example: the locus is the place where the height gene sits, and the tall and short genes are the different variants of the gene found there.

Once you understand loci and the different gene variants, we can move on to the next topic, i.e. single nucleotide polymorphisms (SNPs)!

An SNP is nothing but a change in a single nucleotide of the DNA sequence. The SNPs used here do not change the behaviour of a gene significantly, so no discernible difference in phenotype (appearance) can be seen. Because such SNPs don’t confer any genetic advantage, they can be used as tools to study the distribution and migration of populations. For example, if a tallness gene is identified and its distribution is studied, you will find variations in its distribution because of the different survival advantages that being tall gives. So a tallness gene cannot be used for studying migrations, as selection pressures (better survival, mate selection etc.) can confound the study. But SNPs that don’t affect survival significantly are of great help in studying migrations, since the pattern of their distribution can only be the result of migrations. But how does one apply SNPs to interpret the data here?

Let’s say there is a parent population A living in a place called X, and that it carries the SNP that defines R1a1. Any new mutation arising in the R1a1 lineage will be passed on within that group to its progeny. Also, the frequency distribution of any new mutation will be similar across population A.

Let’s say that after a few centuries a subgroup of population A moves from place X to a new place Y. Since X is far from Y, any new mutations that arise in population A at X won’t be found in the population at Y. New mutations will occur in Y, and new mutations will occur in X as well, but since X holds the parent population, it will have a greater variety of variations. So the higher the degree of variation in the genetics, the older the population must be.

So when scientists study the X and Y populations and their SNPs, the population with the higher density of variations is the older of the two. This is how scientists concluded that modern humans came out of Africa: the density of variations is highest there.
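The X and Y example can be made concrete with a minimal sketch in Python. It is not taken from the Lucotte study. It assumes a simple neutral model in which the expected number of differences D between two random individuals changes each generation as D_next = (1 - 1/N) * D + 2 * mu, where N is the population size and mu the mutation rate per individual per generation. All population sizes, the founder count, the mutation rate and the generation counts are hypothetical values chosen for illustration.

```python
def step(diversity, pop_size, mu):
    """Advance the expected pairwise diversity by one generation.

    With probability 1/pop_size two individuals share a parent (no inherited
    differences); afterwards each of the two gains mu new mutations on average.
    """
    return (1 - 1 / pop_size) * diversity + 2 * mu


def evolve(diversity, pop_size, mu, generations):
    """Apply `step` for the given number of generations."""
    for _ in range(generations):
        diversity = step(diversity, pop_size, mu)
    return diversity


def parent_and_offshoot(pop_x=100, pop_y=30, founders=5, mu=0.05,
                        before_split=150, after_split=30):
    """Return the expected diversity of parent X and offshoot Y today.

    All default values are hypothetical.
    """
    x_at_split = evolve(0.0, pop_x, mu, before_split)
    x_now = evolve(x_at_split, pop_x, mu, after_split)
    # Founding generation: Y's members descend from only `founders` individuals.
    y_start = step(x_at_split, founders, mu)
    y_now = evolve(y_start, pop_y, mu, after_split - 1)
    return x_now, y_now


if __name__ == "__main__":
    x_now, y_now = parent_and_offshoot()
    print(f"parent X: {x_now:.2f}, offshoot Y: {y_now:.2f}")
```

Under these assumptions the parent population X ends up with the higher expected diversity, because the offshoot Y passed through a small founding group and has had little time to build its variation back up.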

Now, when the subgroup of population A moves into Y, there might be some natives in the land Y too. When the subgroup that migrated from X to Y mixes with these natives, the frequency/density of R1a1 among them decreases. So the higher the density of R1a1, the older the population group. That’s how the above-mentioned study by Lucotte claims that Punjab, with the highest density of R1a, has the oldest population group.
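The dilution described here is a weighted average, which a short Python sketch can show. The function name and the example numbers are hypothetical; only the 61.3% figure echoes the Punjab value quoted from the study, and it is used purely as an input for illustration.

```python
def frequency_after_mixing(migrant_freq, native_freq, migrant_share):
    """Return the haplogroup frequency in the mixed population.

    migrant_share is the fraction of the mixed population contributed by migrants.
    """
    if not 0.0 <= migrant_share <= 1.0:
        raise ValueError("migrant_share must be between 0 and 1")
    return migrant_share * migrant_freq + (1 - migrant_share) * native_freq


if __name__ == "__main__":
    # Hypothetical case: migrants carry R1a1 at 61.3%, natives do not carry it,
    # and migrants make up half of the mixed population.
    print(frequency_after_mixing(0.613, 0.0, 0.5))
```

Whenever the natives carry the haplogroup at a lower frequency than the migrants, the mixed population ends up below the source frequency, which is the pattern this paragraph relies on.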

Other References:


Its primary frequency and diversity distribution correlates well with some of the major Central and East European river basins where settled farming was established before its spread further eastward. Importantly, the virtual absence of M458 chromosomes outside Europe speaks against substantial patrilineal gene flow from East Europe to Asia, including to India, at least since the mid-Holocene.


A peculiar observation of the highest frequency (up to 72.22%) of Y-haplogroup R1a1* in Brahmins hinted at its presence as a founder lineage for this caste group. Further, observation of R1a1* in different tribal population groups, existence of Y-haplogroup R1a* in ancestors and extended phylogenetic analyses of the pooled dataset of 530 Indians, 224 Pakistanis and 276 Central Asians and Eurasians bearing the R1a1* haplogroup supported the autochthonous origin of R1a1 lineage in India and a tribal link to Indian Brahmins. However, it is important to discover novel Y-chromosomal binary marker(s) for a higher resolution of R1a1* and confirm the present conclusions.

*Autochthonous origin means indigenous origin


Analysis of associated STR diversity profiles revealed that among the R1a1a*(xM458) chromosomes “the highest diversity is observed among populations of the Indus Valley yielding coalescent times above 14 KYA (thousands of years ago)”, whereas the R1a1a* diversity declines toward Europe where its maximum diversity and coalescent times of 11.2 KYA are observed in Poland, Slovakia and Crete. As islands such as Crete have been subject to multiple episodes of colonization from different source regions, it is not inconsistent that R1a1a* Td predates the date of its first colonization by the first farmers approximately 9 KYA.38 Also noteworthy is the drop in R1a1a* diversity away from the Indus Valley toward central Asia (Kyrgyzstan 5.6 KYA) and the Altai region (8.1 KYA) that marks the eastern boundary of significant R1a1a* spread (Figure 1, Supplementary Table S4.). In Europe, Poland also has the highest R1a1a7-M458 diversity, corresponding to approximately an 11 KYA coalescent time.