Current pool of ultimate collection of mitochondrial DNA from remnants of Kalash

Abstract The complete mitochondrial DNA (mtDNA) control region of 111 individuals from the Kalash population of Pakistan is presented for forensic applications and to infer the population's genetic parameters. We detected 14 different haplotypes in total, of which only five were unique and nine were shared by more than one individual. The population shows low haplotype diversity (0.8393), a high random match probability (0.1682), and consequently a low power of discrimination (0.832). The haplogroup distribution indicates that the genetic ancestry of the Kalash is mainly West Eurasian (98.8%), with a very small South Asian contribution (0.9%). Neither African nor East Asian lineages were detected among these Kalash. This study will contribute to database development for forensic applications and help trace the evolutionary history of this ethnic group.


Introduction
Pakistan is thought to lie on the route taken by modern humans out of Africa and is among the first territories they inhabited. On the basis of culture and language, Pakistan is usually divided into 16 ethnic groups of diverse ancestry. The major ethnic groups include the Punjabis, Pathans, Sindhi, Saraiki, Muhajir, Balochi, Kalashi, and Makrani (Rakha et al. 2011). The Kalasha or Kalash people are an Indo-Iranian (Indo-European) speaking group living in the Chitral district of Khyber-Pakhtunkhwa province of Pakistan (Denker 1981; Ayub et al. 2015). This unique tribe among the Indo-Aryan peoples of Pakistan belongs to the Dardic family. Census data report a population of 5000 individuals, a religious minority with rich cultural attributes (Ayub et al. 2015). On account of their legends and myths, the Kalash are popularly associated with ancient Greece, but their traditions are much closer to Vedic and pre-Zoroastrian ones (Mela-Athanasopoulou 2011). Some Kalash proudly claim descent from the soldiers of Alexander the Great, but wide-ranging genetic studies do not support this claim (Williams et al. 2015). Autosomal and Y-chromosome short tandem repeat (STR) analyses suggested no Greek admixture in the Kalash gene pool (Mansoor et al. 2004; Firasat et al. 2007). Previous mitochondrial DNA (mtDNA) analysis of the Kalash also provided little insight into their evolutionary history because of the small cohort studied (44 individuals; Quintana-Murci et al. 2004). Further, in contrast to previous mtDNA-based Kalash studies, in which haplogroups were assigned by high-resolution RFLP analysis (Ayub et al. 2015), this study characterizes the Kalash maternal lineage by sequencing the entire mtDNA control region, approximately 1122 bp long.
This study, with the largest number of Kalash samples (111) analyzed so far, therefore aims to analyze the mtDNA control region and identify the haplogroup composition of the Kalash.

Materials and methods
DNA extraction, amplification, and sequencing
amplified under the same conditions as described in . Sequencing of the entire mtDNA control region, spanning nucleotide positions 16,024-16,569 and 1-576, was performed using the BigDye Terminator version 3.1 Cycle Sequencing Ready Reaction Kit (Applied Biosystems, Carlsbad, CA) according to the manufacturer's instructions.

Data analysis
All samples were sequenced bidirectionally and evaluated twice, as recommended by the ISFG (Parson et al. 2014). The online tools MitoTool (Fan and Yao 2011), mtDNA profiler (Yang et al. 2013), and HaploGrep (Weissensteiner et al. 2016), which use PhyloTree as the classification tree, were used to evaluate the quality of the mtDNA data. Haplogroups were assigned using the most recent PhyloTree release, build 17 (Utrecht, The Netherlands; Van Oven and Kayser 2009). The population statistics (haplotype diversity, random match probability, and power of discrimination) were calculated using DnaSP version 6 (Rozas et al. 2017). Analysis of molecular variance (AMOVA) and pairwise FST values were calculated using Arlequin version 3.5 (Institute of Ecology and Evolution, University of Bern, Switzerland; Excoffier and Lischer 2010). The Kalash data were compared with samples from other ethnic groups of Pakistan (Pathan (Rakha et al. 2011), Kashmiri (Rakha et al. 2016), Saraiki, and Makrani) and with population datasets from other countries, including Uzbekistan, China, Dubai, Egypt, Iraq, Kuwait, Laos, Thailand, and Vietnam (Alshamali et al. 2008; Irwin et al. 2008, 2009; Saunier et al. 2009; Zimmermann et al. 2009; Irwin et al. 2010). A median-joining haplotype network was constructed using the software NETWORK (Kong et al. 2016).
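As an illustration of how the three population statistics relate to the haplotype counts, the following is a minimal Python sketch. It assumes the commonly used definitions: random match probability as the sum of squared haplotype frequencies, power of discrimination as its complement, and haplotype diversity as the sample-size-corrected complement. The source does not state the exact formulas applied in DnaSP, so this is not a reproduction of that software, and the function name and the toy haplotype labels (H1, H2, H3) are hypothetical.

```python
from collections import Counter


def forensic_parameters(haplotypes):
    """Summarise a list with one haplotype label per individual.

    Assumed definitions (not confirmed by the source):
      RMP = sum(p_i ** 2)
      PD  = 1 - RMP
      HD  = n / (n - 1) * (1 - RMP)
    """
    counts = Counter(haplotypes)
    n = sum(counts.values())
    if n < 2:
        raise ValueError("need at least two individuals")
    # Probability that two randomly drawn individuals share a haplotype.
    rmp = sum((c / n) ** 2 for c in counts.values())
    return {
        "n": n,
        "haplotypes": len(counts),
        # Haplotypes observed in exactly one individual.
        "unique": sum(1 for c in counts.values() if c == 1),
        "rmp": rmp,
        "pd": 1.0 - rmp,
        "hd": n / (n - 1) * (1.0 - rmp),
    }


if __name__ == "__main__":
    # Hypothetical toy sample: four individuals, three haplotypes.
    toy = ["H1", "H1", "H2", "H3"]
    print(forensic_parameters(toy))
```

Applied to real data, the input would be the list of 111 haplotype assignments, one entry per individual.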

Results and discussion
In this study, we present mtDNA control region data for 111 Kalash individuals. The complete control region sequences have been submitted to GenBank and are available under accession numbers KM358270-KM358380. We observed 14 different haplotypes (five unique and nine shared) with only 47 polymorphic sites. The detected haplotypes, their frequencies, and their haplogroups are presented in Table 1. The Kalash population shows an mtDNA haplotype diversity of 0.8393, a random match probability of 0.1682, and a power of discrimination of 0.832 (Table 2). A total of eight haplogroups were found in the Kalash population. Among them, the highest frequency was observed for haplogroup R0a'b (28.8%); this haplogroup was not found in the Makrani, Pathan (Rakha et al. 2011), Hazara (Rakha et al. 2017), or Saraiki populations.
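The three reported values can be checked against one another. The short calculation below assumes the standard definitions, which the source does not state explicitly: with haplotype frequencies $p_i$ and sample size $n = 111$, the random match probability is $\sum_i p_i^2$, the power of discrimination is its complement, and the haplotype diversity applies the $n/(n-1)$ sample-size correction.

```latex
\begin{align*}
\mathrm{RMP} &= \sum_i p_i^2 = 0.1682,\\
\mathrm{PD}  &= 1 - \mathrm{RMP} = 1 - 0.1682 = 0.8318 \approx 0.832,\\
\mathrm{HD}  &= \frac{n}{n-1}\,\bigl(1 - \mathrm{RMP}\bigr)
              = \frac{111}{110} \times 0.8318 \approx 0.839.
\end{align*}
```

Under these assumptions the reported power of discrimination (0.832) and haplotype diversity (0.8393) agree with the reported random match probability to within rounding.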
The genetic diversity of the Kalash is exceptionally low, reflecting traditions that are conserved in every aspect of life. In another recent study, the Kalash showed the largest genetic distance from other ethnic groups. Moreover, the Kalash have been reported as a genetic outlier that does not fall into any defined genetic cluster (Rosenberg et al. 2002). This could reflect the strong impact of evolutionary forces such as genetic drift, with a reduction from a formerly large population to the current small one. Likewise, despite living in the subcontinent, the Kalash show virtually no traces of East or South Asian lineages, which could be due to a founder event in which a small number of individuals became isolated and over time grew into the present population. These findings parallel those of Cardoso et al. (2012) for the Waorani tribe from the Amazon. With only five unique haplotypes, the Kalash appear as a conserved population with a poor gene pool and low haplotype diversity, mainly due to a high rate of endogamy. Very few populations have been reported with such low mtDNA haplotype diversity. Only 11 unique haplotypes were reported in the Kichwa population from Ecuador (Baeta et al. 2012), and only three unique haplotypes were reported in the Waorani tribe from the Ecuadorian Amazon (Cardoso et al. 2012).
The Kalash dataset was compared with 20 other populations by AMOVA (Table S1). Most of the observed variance (96.11%) was attributable to differences within populations, and only 3.89% to differences among populations. Pairwise FST values were comparatively higher for the populations from East and South East Asia and lower for the South and West Asian populations (Table S2). The most dominant haplogroups in the Kalash mtDNA pool are U4, R0a, U2e, and J2, indicating Western Eurasian influence (Table 1).
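The AMOVA partition reported above can be read as an overall fixation index. As a sketch, assuming a single-level design (populations not nested in groups, which the source does not specify), the among-population share of the total variance is the fixation index itself:

```latex
\[
\Phi_{ST} \;=\; \frac{\sigma^2_{\text{among}}}{\sigma^2_{\text{among}} + \sigma^2_{\text{within}}}
\;=\; \frac{3.89}{3.89 + 96.11} \;\approx\; 0.039 .
\]
```

On this reading, about 4% of the total mtDNA variation distinguishes the compared populations from one another, while the pairwise values in Table S2 show how that differentiation is distributed between individual population pairs.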
A Western Eurasian origin of the Kalasha people was also indicated by Rahman et al. (2020). The median-joining network analysis of the Kalash population showed considerable divergence between haplotypes. The network has many independent branches giving rise to many sub-branches that are separated by several mutations (Figure 2). The Kalash population illustrates the population complexity of the Central Asian region and has a very low frequency of South Asian haplogroups. After comparison with the rCRS (NC_012920), the Kalash population presented only 14 haplotypes in total (nine shared and five unique). The West Eurasian haplogroup R0a+60.1T was observed most frequently (28.8%) in this population, affirming previous studies as far as West Eurasian dominance is concerned. These data are in accordance with the results of Rahman et al. (2020), although the number of individuals included in that study (38 samples) was smaller than the cohort of this study (111 samples). Compared with other Pakistani populations, this population showed West Eurasian haplogroups at an unmatched frequency (98.2%), including R0a+60.1T (28.8%), U4 (27%), U2e1h (17.1%), J2b1a (14.4%), R2 (6.3%), and H2a1 (3.6%). South Asian ancestry is represented by only one haplogroup, M65a* (0.9%). The limited haplogroup diversity is most probably a consequence of the isolation of the population.