Population genetics and forensic efficiency of 30 InDel markers in four Chinese ethnic groups residing in Sichuan

Abstract
Sichuan Province lies at the junction between the Qinghai-Tibet Plateau and the low-altitude plains. It has also served as a corridor for the migration and expansion of Sino-Tibetan-speaking populations since the Neolithic expansion of Proto-Tibeto-Burman populations from the Middle/Upper Yellow River during the Majiayao period (3300–2000 BC). However, the population structure and the corresponding genetic diversity of forensic-related markers in this region remain unclear. We therefore genotyped 30 insertion-deletion (InDel) markers in 444 samples from four ethnic groups (Han, Tibetan, Hui and Yi) from Sichuan Province using the Investigator® DIPplex kit to characterize their population genetics and forensic parameters. All loci were in Hardy-Weinberg Equilibrium (HWE) after applying a Bonferroni correction, and no pair of loci showed prominent linkage disequilibrium. The combined matching probability (CMP) and the combined power of discrimination (CPD) are larger than 1.8089 × 10⁻¹¹ and 0.99999999995, respectively. Principal component analysis, multi-dimensional scaling plots and a Neighbour-Joining tree among 65 worldwide populations indicated that Sichuan Hui and Han are genetically close to Hmong-Mien and Tai-Kadai-speaking populations, and that Sichuan Tibetan and Yi bear a strong genetic affinity with Tibeto-Burman-speaking populations. The model-based genetic structure further supports the genetic affinity between the studied populations and linguistically close populations.

Key Points
Forensic parameters of 30 insertion-deletions (InDels) in 444 individuals from four populations are reported, which showed abundant genetic affinity and diversity among populations and high value in personal identification.
Genetic similarities existed between the studied populations and ethnically, linguistically close populations.
Sichuan Hui and Han are genetically close to Hmong-Mien and Tai-Kadai-speaking populations.
Sichuan Tibetan and Yi bear a strong genetic affinity with Tibeto-Burman-speaking populations.


Introduction
Insertion-deletion (InDel) polymorphisms have become a promising and powerful supplementary tool in forensic personal identification and kinship testing, since they offer a lower mutation rate, smaller amplicons and no stutter peaks compared with the traditional gold standard, short tandem repeats (STRs) [1][2][3][4][5].
Lying in Southwest China, in the hinterland of the Eurasian continent, Sichuan Province has a complex topography spanning the Qinghai-Tibet Plateau, Hengduan Mountains, Yunnan-Guizhou Plateau, Qinba Mountains and Sichuan Basin. Evidence from both archaeology and linguistics suggests that these regions played an important role in the formation of modern Sino-Tibetan-speaking populations, especially the Tibeto-Burman-speaking populations that spread southwards and westwards from the middle and upper basins of the Yellow River during the Yangshao period (about 7 000–5 000 years BP) along with the expansion of millet cultivation [6,7].
Today, the permanent population of Sichuan exceeds 80 000 000 according to the Sichuan Population Statistics Bulletin 2018 (http://tjj.sc.gov.cn/tjxx/zxfb/201903/t20190319_277119.html), which is nearly equivalent to the population of Germany. As a multi-ethnic province, Sichuan is home to the largest concentration of the Yi ethnic group, the second-largest Tibetan community and a large number of Hui people. However, the Han group is still the largest population (93.53%). Over the past several hundred years, these ethnic groups have made up the largest inflow of immigrants and have contributed substantially to the genetic variability of the province. In this study, we used the Investigator® DIPplex kit (Qiagen, Hilden, Germany) to investigate population genetics and forensic efficiency in these four Sichuan ethnic groups.

Sample collection
Bloodstains were collected from 444 healthy unrelated individuals from Sichuan Province (155 Chengdu Hans, 132 Liangshan Yis, 119 Tibetans of Ganzi and Aba, and 38 Chengdu Huis) after obtaining written informed consent from the participants, with the approval of the Ethics Committee of the Institute of Forensic Medicine, Sichuan University (K2015008).

DNA extraction, quantification, amplification and genotyping
Genomic DNA was extracted with the DNA Blood Mini Kit (Qiagen) and quantified using a NanoDrop-2000 Spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA) according to the manufacturer's instructions. InDel amplification was carried out using the Investigator® DIPplex PCR Amplification Kit (Qiagen) following the manufacturer's protocols, and the products were analyzed with an Applied Biosystems 3130 Genetic Analyzer and GeneMapper v3.2 (Thermo Fisher).

Quality control
Control DNA 9948 (Qiagen) and ddH2O (Qiagen) were amplified and genotyped alongside the population samples as the positive and negative controls, respectively. All experiments were carried out in our laboratory, which has been accredited by the China National Accreditation Service for Conformity Assessment (CNAS) and under ISO 17025.

Forensic characteristics
All InDel markers were successfully amplified and genotyped in the 444 samples, with no significant deviation from Hardy-Weinberg Equilibrium (HWE) and no significant linkage disequilibrium after applying a Bonferroni correction (P > 0.0017) (see raw genotype data in Supplementary Table S2).
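The reported threshold of 0.0017 is consistent with dividing a significance level of 0.05 by the 30 loci tested. The sketch below illustrates that arithmetic together with a simple chi-square goodness-of-fit check for HWE at one biallelic InDel locus. It is an illustration only: the 0.05 significance level, the function names and the genotype counts are assumptions, and the chi-square test stands in for whatever test the authors actually used, which the text does not specify.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


def hwe_chi2_pvalue(n_ii, n_id, n_dd):
    """Chi-square HWE test (1 degree of freedom) for a biallelic InDel locus.

    n_ii, n_id, n_dd are the observed counts of insertion/insertion,
    insertion/deletion and deletion/deletion genotypes.
    """
    n = n_ii + n_id + n_dd
    if n == 0:
        raise ValueError("at least one genotype is required")
    p = (2 * n_ii + n_id) / (2 * n)  # insertion allele frequency
    q = 1.0 - p                      # deletion allele frequency
    observed = (n_ii, n_id, n_dd)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Skip classes with zero expectation (monomorphic locus).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical genotype counts, not taken from Supplementary Table S2.
threshold = bonferroni_threshold(0.05, 30)
p_value = hwe_chi2_pvalue(25, 50, 25)
print(f"threshold = {threshold:.4f}, p = {p_value:.4f}, "
      f"deviates from HWE: {p_value < threshold}")
```

A locus would be flagged as deviating from HWE only if its p-value fell below the corrected threshold.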

Population genetic analyses
To further explore the genetic similarities and differences between populations, data on the 30 InDel markers from 61 worldwide populations were collected into the reference database mentioned above [4,5,. The heat map was built from the insertion allele frequencies (DIP+) of the 30 InDels in the four Sichuan populations and the 61 previously published populations (Supplementary Figure S1). On the left side of the heat map, the 65 populations were separated into two clusters: Mexican populations and others. The second cluster was divided into two branches: African populations (Somali, Nigerian, Xhosa, Zulu) and others. The remaining populations separated roughly into Asian, European and American groups. In contrast with other populations, most Asian populations show markedly low insertion frequencies at the HLD111, HLD39 and HLD122 loci and distinctly high frequencies at the HLD118, HLD99, HLD64 and HLD81 loci.
Multi-dimensional scaling (MDS), principal component analysis (PCA) and a Neighbour-Joining (N-J) tree were generated based on the Rst values generated with PHYLIP v.3.6 (Supplementary Table S4). As shown in the MDS plot (Supplementary Figure S2), population stratification along continental geographical boundaries is identifiable. Genetic similarities and differences were further dissected using PCA (Figure 1). The top three components explain 84.254% of the total differences (PC1 = 57.019%, PC2 = 18.228%, PC3 = 9.007%). Yi and Tibetan from Sichuan are genetically close to Tibeto-Burman-speaking populations (Yunnan Yi, Tibetan from Tibet, Qinghai and Sichuan, Tujia from Hubei), while Hui and Han are located close to Hmong-Mien and Tai-Kadai-speaking populations (Zhuang, Dong and Miao from Guangxi).
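The insertion allele frequencies underlying such a heat map can be derived from genotype counts by allele counting. The following is a minimal sketch of that step, assuming hypothetical population labels and genotype counts; only the locus names HLD111 and HLD118 come from the text, and the function names are illustrative rather than part of any published pipeline.

```python
def insertion_frequency(n_ii, n_id, n_dd):
    """Insertion (DIP+) allele frequency from genotype counts at one locus."""
    n = n_ii + n_id + n_dd
    if n == 0:
        raise ValueError("at least one genotype is required")
    # Each II individual carries two insertion alleles, each ID carries one.
    return (2 * n_ii + n_id) / (2 * n)


def frequency_matrix(genotype_counts, loci):
    """Rows = populations, columns = loci (in the order given by `loci`)."""
    return {
        population: [insertion_frequency(*counts[locus]) for locus in loci]
        for population, counts in genotype_counts.items()
    }


# Hypothetical counts (II, ID, DD); these are NOT data from the study.
hypothetical_counts = {
    "PopulationA": {"HLD111": (10, 20, 70), "HLD118": (70, 20, 10)},
    "PopulationB": {"HLD111": (40, 40, 20), "HLD118": (25, 50, 25)},
}

matrix = frequency_matrix(hypothetical_counts, ["HLD111", "HLD118"])
for population, row in matrix.items():
    print(population, [round(value, 3) for value in row])
```

The resulting population-by-locus matrix is the kind of input a clustered heat map would take, with each cell holding one insertion allele frequency.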
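The percentage attributed to each principal component is its eigenvalue divided by the sum of all eigenvalues, and the cumulative figure is the running total. The short sketch below shows this arithmetic and checks that the three reported percentages add up to the stated 84.254%. The eigenvalues in the first example are hypothetical, since the paper reports only the percentages.

```python
def explained_variance_percent(eigenvalues):
    """Percentage of total variance explained by each component."""
    total = sum(eigenvalues)
    if total <= 0:
        raise ValueError("eigenvalues must sum to a positive number")
    return [100.0 * value / total for value in eigenvalues]


def cumulative(percentages):
    """Running total of the per-component percentages."""
    running = 0.0
    totals = []
    for value in percentages:
        running += value
        totals.append(running)
    return totals


# Hypothetical eigenvalues, used only to illustrate the calculation.
print(explained_variance_percent([4.0, 3.0, 2.0, 1.0]))

# Percentages reported in the text for PC1, PC2 and PC3.
reported = [57.019, 18.228, 9.007]
print(round(cumulative(reported)[-1], 3))
```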
The evolutionary genetic relationship revealed by the N-J tree (Supplementary Figure S3) follows a pattern of ethnic, geographical and linguistic affinity.
Asian populations cluster together at the broad scale, while the Sichuan Tibetan group remains close to the other 10 Tibetan groups and the Sichuan Hui group sits next to Xinjiang Hui in the tree. Guangdong Han is considerably closer to the geographically close Tai-Kadai or Hmong-Mien-speaking populations (Dong, Miao, and Zhuang from Guangxi).
STRUCTURE analysis was used to dissect the genetic structure of the four Sichuan populations we studied and 32 other populations on the basis of their raw genotypes (Figure 2) [8, 10, 12-17, 19, 21-26, 30, 33]. K represents the predefined number of ancestral components. Genetic affinity among linguistically close populations can be identified when K increases from 5 to 6. The four investigated populations share a large number of ancestry components with neighbouring East Asian populations, in accordance with the results of MDS, PCA and the N-J tree. Furthermore, the Silk Road, the earliest and most important channel of communication between eastern and western civilizations, also provided an opportunity for gene flow between the populations along its route, with Xinjiang at its midpoint. As a result, Turkic-speaking groups (Xinjiang Hui, Kazak, and Kyrgyz) bear both Asian and European ancestries and appear intermediate in genetic structure between populations from East Asia and Europe when K equals 6.

Conclusion
According to the obtained results, the Han, Yi, Tibetan, and Hui groups of Sichuan were scattered among the Asian groups in terms of genetic distance, evolutionary relationship and genetic structure. The least significant differences in the genetic relationship were found between the studied populations and ethnically, linguistically close populations. Furthermore, based on the acquired values, the Investigator® DIPplex PCR Amplification Kit has proved to be extremely useful as a supplement to STRs in forensic casework in the four populations we studied.

Figure 1. Principal components analysis among 65 worldwide populations revealed by the top three components (PC1 = 57.019%, PC2 = 18.228%, PC3 = 9.007%), which together explain 84.254% of the total differences.

Authors' contributions
Fei Wang, Guanglin He, Zheng Wang, and Yiping Hou designed and conducted the experiment. Mengge Wang, Jing Liu and Xing Zou analysed data. Shouyu Wang, Mengyuan Song, Ziwei Ye and Mingkun Xie helped with background checking and picture drawing. All the authors discussed the results together. Fei Wang and Guanglin He wrote the manuscript. All authors contributed to the final text and approved it.

Compliance with ethical standards
All procedures performed in studies involving human participants were in accordance with the ethical standards of the Ethics Committee of the Institute of Forensic Medicine, Sichuan University (K2015008) and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. Written informed consent was obtained from all individual participants included in the study.