The Origin and Ancient Migrations of the Khasi People: Genetics Tells the Story


"The greatest history book ever written is the one hidden in our DNA." (Spencer Wells, Director, Genographic Project, National Geographic Society)

India has a population of 1.2 billion. As per the 2011 census, the tribal population of the country is 104.3 million, constituting 8.6% of the total. Broadly, Scheduled Tribes inhabit two distinct geographical areas, central India and the Northeast. Tribals in India form about one-fourth of the world's indigenous peoples1-3.

Tribals are often referred to as adivasi, a catch-all term for the diverse tribal groups of India. The term carries the specific meaning of being the original inhabitants of the region. Substantiating evidence of tribal origins comes from a variety of historical sources, supplemented by linguistic studies and, more recently, by genetic studies. The archaeological data are somewhat thin, especially in Northeast India. Few scholars have attempted to look at the available studies as a whole, since academicians tend to be experts in one discipline. The sudden burst of genetic studies, hitherto unavailable to historians, has not yet entered the history books, as historians are understandably reluctant to cross over into unfamiliar territory.

Among the 220-odd tribes of NE India, only the Khasis speak an Austro-Asiatic (AA) language, a family of languages generally considered to be the oldest identifiable language group of a region that spreads from east India to Southeast Asia. 'Austro' is a prefix meaning 'southern'. One of the major theories of the origin of the AA languages postulates a locus in northeastern India or somewhere in the vicinity of the Bay of Bengal. The two main language groups in the Northeast are Tibeto-Burman, spoken e.g. by the Garo, the Bodo and most of the Arunachal tribes, and Indo-European, e.g. Assamese. Dravidian, another major language group of India, has no native speakers in the Northeast.

The Austro-Asiatic languages are thought to be the first languages to be spoken in ancient India, the early form of which is called Proto-Austro-Asiatic. Among the AA languages, Munda predates the others5. The date of separation of the two main Austroasiatic subfamilies, Muṇḍā and Mon-Khmer (e.g. Khasi), has never been estimated and must be placed well back in prehistory6. The competing theory that the AA languages originated in SE Asia is partly based on the observation that the Mon-Khmer languages show greater diversity and spread in that region. This argument is weakened by the relative isolation of AA languages in India: they were later surrounded by other, more dominant language groups, which inhibited their spread in eastern and NE India7.

The view that the AA languages were the earliest to be spoken in India is consonant with a body of genetic studies that have explored the origin and migrations of the earliest inhabitants of India. While the linguistic and genetic evidence is not fully conclusive, there are several interesting pointers to the origin of the Khasis that have scientific credibility.

 Out of Africa and In Our Genes

Homo sapiens originated in east Africa about 200,000 years ago. Driven by drought and the search for food, a small band set out northwards, crossing the Red Sea to the Arabian peninsula. Over the following millennia, migrations spread out in various directions, but first towards the East, reaching India about 60,000-70,000 years ago. In later migrations, the other continents became populated, and these first intrepid humans became the 'aborigines' of their lands, followed much later by other human groups. From India, passing through the Northeast corridor, onward migrations travelled to East Asia, Southeast Asia and Australasia. The "Out of Africa" theory is the most widely accepted model of the geographic origin and early migration of modern humans. This initial migration out of Africa was responsible for the peopling of the world.

It has been estimated that from a population of 2,000 to 5,000 individuals in Africa, only a small group, possibly as few as 150 to 1,000 people, crossed the Red Sea. Today the Red Sea is about 12 miles (20 km) wide at the Bab-el-Mandeb straits, but 50,000 years ago sea levels were 70 meters lower (owing to glaciation) and the crossing was much narrower. The group that crossed the Red Sea travelled along the coast of Arabia and Persia until reaching India, which appears to be the first major settling point8.

Drawing on the growing use of genetics to track genealogies, the National Geographic Society set up the Genographic Project to study ancient ethnic communities. Material from genetic studies has been culled and compiled in a reader-friendly format on the project's website, and anyone can get their DNA tested by sending in a vial containing a saliva swab.

According to the website, "When DNA is passed from one generation to the next, most of it is recombined by the processes that give each of us our individuality. But some parts of the DNA chain remain largely intact through the generations, altered only occasionally by mutations, which become 'genetic markers'. These markers allow geneticists to trace our common evolutionary time line back many generations. Different populations carry distinct genetic markers. Following the markers through the generations reveals a genetic tree on which today's many diverse branches can be followed backwards to their common African root. The markers in our genes allow us to chart the ancient human migrations from Africa across the continents"8.

Through the eons of time, the full story remains written in our genes.

Fig 1: Human migration out of Africa (Wikimedia Commons).

 The Munda and Khasi

Austro-Asiatic populations are considered to be the first to have arrived and settled in India. The ancestors of the Munda arrived about 66,000 years ago, and the first genetic offshoot was the Khasi, about 57,000 years ago, who then migrated to Northeast India. Khasi speakers probably went on to Southeast Asia via the Northeast Indian corridor about 40,000 years ago. The AA Khasi tribes thus represent a genetic continuity between the populations of South Asia and Southeast Asia. The Khasis, the only AA speakers in the Northeast, are surrounded by tribes of Sino-Tibetan (Tibeto-Burman) origin, who came to the region 10,000-20,000 years ago.

In a 2007 genetic study of 25 groups from different parts of the country, blood samples of 1,222 individuals from the major ethnolinguistic groups were tested. These included Indo-European, Dravidian, Tibeto-Burman and the three AA populations residing in India: (1) Mundari, spoken by tribes inhabiting the Chota-Nagpur plateau in Central and Eastern India, (2) Mon-Khmer, spoken by the Nicobarese and Shompen tribes of the Andaman and Nicobar islands, and (3) Khasi-Khmuic, represented by the Khasi of Northeast India. Ninety-two Khasis from the West Khasi Hills, East Khasi Hills, Jaintia Hills and Ri-Bhoi in Meghalaya were tested. The primary institutes that conducted the study were the Indian Statistical Institute and the Centre for Cellular and Molecular Biology, both in Hyderabad.

The authors concluded that, "Our results suggest a strong paternal genetic link, not only among the subgroups of Indian Austro-Asiatic populations but also with those of Southeast Asia… The results also indicate that the haplogroup O-M95 had originated in the Indian Austro-Asiatic populations ~65,000 yrs BP (Before Present) and their ancestors carried it further to Southeast Asia via the Northeast Indian corridor. Subsequently, in the process of expansion, the Mon-Khmer populations from Southeast Asia seem to have migrated and colonized Andaman and Nicobar Islands at a much later point of time. Our findings are consistent with the linguistic evidence, which suggests that the linguistic ancestors of the Austro-Asiatic populations have originated in India and then migrated to Southeast Asia."9

O-M95 is a Y-chromosome haplogroup that originated among the Munda; within India it is found only among Austro-Asiatic populations, but it is now seen all over Southeast Asia. This strongly suggests that the Austro-Asiatic populations of India are linked to Southeast Asian populations not only linguistically but also genetically. Tracking such markers allows a calculation of the TMRCA (time to most recent common ancestor), which indicates that the Khasi appeared about 57,000 years ago, 9,000 years after the Munda.

Fig 2: Map showing present-day distribution of Austro-Asiatic groups and the routes of migration of the different Austro-Asiatic linguistic subgroups of India (BMC Evol Biol).

 Migrations through the Northeast land corridor

Two major routes have been proposed for the initial entry of humans to East Asia: (1) via Central Asia to Northeast Asia, and subsequently onwards to Southeast Asia and beyond, and (2) through India to Southeast Asia. Given its unique geographic position, Northeast India is the only region which provides a land bridge between the Indian subcontinent and Southeast Asia, sandwiched between the mountains of the Eastern Himalayas on the north and the Indian Ocean on the south.

 "Given that the Austro-Asiatic linguistic family is considered to be the oldest and spoken by certain tribes in India, Northeast India and entire Southeast Asia, we expect that populations of this family from Northeast India should provide the signatures of genetic link between Indian and Southeast Asian populations. In order to test this hypothesis, we analyzed mtDNA and Y-Chromosome SNP and STR data of the eight groups of the Austro-Asiatic Khasi from Northeast India and the neighboring Garo and compared with that of other relevant Asian populations. The results suggest that the Austro-Asiatic Khasi tribes of Northeast India represent a genetic continuity between the populations of South and Southeast Asia, thereby advocating that northeast India could have been a major corridor for the movement of populations from India to East/Southeast Asia."10

 The technical language aside, the conclusions of the study are clear.  The genetic evidence points to the direction of migration from Northeast India to Southeast Asia and beyond, supplementing evidence from linguistic studies. The authors of the study describe the Khasi tribe as providing the hitherto missing link between the AA populations of the two regions and this finding is highlighted in the title of the article.10

The data from archaeology are of more recent antiquity, but are mentioned to round off this discussion. Because of heavy rains and steep hillsides, archaeological evidence is not easily preserved and may have been washed into the rivers and floodplains. Neolithic stone tools have been found at several locations in the Northeast11. In Meghalaya, stone implements have been found at several sites, notably around Umiam-Barapani, including Sohbetpneng. A large 'tool factory' has been excavated at the foot of the Lum Diengiei hill slope, with 'unfinished' and 'un-ground' tools. The earliest Neolithic sites in India date to around 7500 BC, but the Northeast sites have not been conclusively dated.

In the last few years there has been a deluge of papers on genetic origins, not all in agreement. Because of their highly technical nature, the conclusions are sometimes difficult to decipher. Historians are not geneticists and geneticists are not historians and neither are linguists. These interdisciplinary gulfs are yet to be bridged in history textbooks. But a great deal of light has been shed on the origins and antiquity of the Khasis and other ethnic groups of the Northeast.

The above is based on intensive research conducted by Mr. Glenn Kharkongor.


  1. Census of India Website: Office of the Registrar General. Registrar. 2011.
  2. Statistical Profile of Scheduled Tribes in India 2013. Ministry Of Tribal Affairs Statistics Division Government of
  3. Who are Indigenous Peoples? United Nations Factsheet. documents/5session_factsheet1.pdf (nd)
  4. George van Driem. Glimpses of the Ethnolinguistic Prehistory of Northeastern India. In Origins and Migrations in the Extended Eastern Himalayas. Toni Huber and Stuart Blackburn (eds). Leiden: Brill, 2012. pp. 187-211.
  5. Linguistic History of the Indian Subcontinent. (nd)
  6. Austroasiatic languages. Encycl Britannica. Austroasiatic-languages. (nd)
  7. Paul Sidwell and Roger Blench. Chap 14: The Austroasiatic Urheimat: the Southeastern Riverine Hypothesis. In Part IV, Origins and diversification: the case of Austroasiatic groups. In Dynamics of Human Diversity, N. J. Enfield (ed.), 317-345. Pacific Linguistics, 2011.
  8. The Genographic Project by National Geographic – Human Migration … (nd)

  9. V. Kumar, A. N. S. Reddy, J. P. Babu, T. N. Rao, B. T. Langstieh, K. Thangaraj, A. G. Reddy, L. Singh and B. M. Reddy. Y-chromosome evidence suggests a common paternal heritage of Austro-Asiatic populations. BMC Evolutionary Biology, 7:47, 2007. doi:10.1186/1471-2148-7-47
  10. Reddy BM, Langstieh BT, Kumar V, Nagaraja T, Reddy ANS, et al (2007) Austro-Asiatic Tribes of Northeast India Provide Hitherto Missing Genetic Link between South and Southeast Asia. PLoS ONE 2(11): e1141. doi:10.1371/journal.pone.0001141
  11. Piecing Together from Fragments: Re-evaluating the 'Neolithic' Situation in Northeast India. Tiatoshi Jamir, Department of History & Archaeology, Nagaland University. (nd)

Featured image Courtesy: Tour My India Blog and Meghalaya Online