Genetic and structural characterization of 20 autosomal short tandem repeats in the Chinese Qinghai Han population and its genetic relationships and interpopulation differentiations with other reference populations

Abstract
China is a multi-ethnic country composed of 56 ethnic groups, of which the Han Chinese account for 91.60%. Qinghai Province is located in the northeastern part of the Qinghai–Tibet Plateau, has an area of 72.12 × 10^4 km2, and is the fourth largest province in China. In the present study, we investigated the genetic polymorphisms of 20 short tandem repeat (STR) loci in a Qinghai Han population, as well as its genetic relationships with other populations. A total of 273 alleles were identified in 2 000 individuals at 20 loci, and the allelic frequencies ranged from 0.000 2 to 0.532 7. The 20 STR loci were highly polymorphic in the studied group. Observed and expected heterozygosities ranged from 0.613 0 to 0.907 5 and from 0.614 8 to 0.920 0, respectively. The combined power of discrimination, and the probability of exclusion in duo and trio cases were 0.999 999 999 999 999 999 999 999 34, 0.999 996 0 and 0.999 999 996 5, respectively. Analyses of interpopulation differentiation revealed that the most significant differences were found between the Qinghai Han and Malaysian populations, while no significant differences were found between the Qinghai Han and the Han populations from Shaanxi and Jiangsu. The results of principal component analysis, multidimensional scaling analysis and phylogenetic reconstructions also suggested close relationships between the Qinghai Han and the other two Han populations. The present results, therefore, indicate that these 20 STR loci can be used for paternity testing and individual identification in forensic applications, and may also provide information for studies of genetic relationships between the Qinghai Han and other groups.


Introduction
The Han population is the largest of the 56 officially recognized ethnic groups in China. Findings from the 6th National Population Census of 2010 suggest that they make up 91.60% of the overall Chinese population with a population of 1 220 844 520, and they are also distributed worldwide. The Chinese language, used as the spoken and written language of Han people, belongs to the Sino-Tibetan language family. The appellation 'Han' can be traced back to the Han dynasty of the second and third centuries and represents the majority of the Chinese population to date.
Qinghai Province has the fourth largest land area in China. It is located in the northeastern part of the Qinghai-Tibet Plateau, which has an altitude over 3 000 m above sea level. The history of Qinghai Province began during the Han dynasty when General Huo Qubing built the military fortress known as Xipingting. This was the former site of Xining, which appeared during the Ming dynasty between 1368 and 1644. In 2010, Qinghai had a population of 5 626 723, of which 39% belonged to ethnic minorities, including the Tibetans, Mongolians, Kazaks, Hui, Tu and Salar.

Sample collection and DNA extraction
Blood samples were collected from 2 000 unrelated healthy Han individuals living in Qinghai Province, China, whose ancestors over the past three generations were Han individuals who had not migrated or interbred with other ethnic groups. All of the participants signed an informed consent form and completed a questionnaire providing information about their direct blood relatives over three generations. The experimental procedures conformed to the human and ethical research principles of Xi'an Jiaotong University Health Science Center, China. Genomic DNA extraction was performed using the Chelex-100 procedure as described by Walsh et al. [14].
Polymerase chain reaction (PCR) amplification and STR typing
PCR amplification was performed using the PowerPlex® 21 System (Promega, Madison, WI, USA). The total volume of each PCR reaction was 25 μL, containing 5 μL PowerPlex® 21 5× Master Mix, 5 μL PowerPlex® 21 5× Primer Pair Mix, 1 ng template DNA and amplification grade water. Amplification was carried out using a GeneAmp PCR System 9700 Thermal Cycler (Applied Biosystems, Foster City, CA, USA) under the manufacturer's specifications. The AB PRISM 3130 Genetic Analyzer (Applied Biosystems) was used to obtain sample genotypes. Raw data were analysed using GeneMapper ID 3.2 software (Applied Biosystems). 9947A DNA was used as a positive control.

Statistical analysis
Allelic frequencies of the 20 STRs, their forensically relevant parameters, and P-values for exact tests of Hardy-Weinberg equilibrium were calculated using the modified Powerstat (version 1.2) spreadsheet. Linkage disequilibrium (LD) between pairs of STR loci was analysed using Genepop version 4.0.10 [15]. Based on genetic data of the 13 overlapping STRs, genetic differentiation comparisons (P-values) between the Qinghai Han population and other reference populations were conducted using Arlequin software version 3.1 with the analysis of molecular variance (AMOVA) method [16].
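The per-locus parameters mentioned above can be illustrated with a minimal Python sketch. It assumes the commonly used textbook definitions (He = 1 − Σp², PIC as defined by Botstein et al., DP = 1 − Σ of squared genotype frequencies); the spreadsheet used in this study may apply different conventions or sample-size corrections, and the function name `locus_parameters` and the toy genotypes are hypothetical.

```python
from collections import Counter
from itertools import combinations


def locus_parameters(genotypes):
    """Compute basic parameters for one STR locus.

    genotypes: list of (allele_a, allele_b) tuples, one per individual.
    Returns allele frequencies, observed/expected heterozygosity,
    PIC and power of discrimination (uncorrected textbook formulas).
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("at least one genotype is required")

    # Each individual contributes two alleles.
    allele_counts = Counter(a for g in genotypes for a in g)
    freqs = {a: c / (2 * n) for a, c in allele_counts.items()}

    # Observed heterozygosity: share of individuals with two different alleles.
    ho = sum(1 for a, b in genotypes if a != b) / n

    # Expected heterozygosity under Hardy-Weinberg equilibrium.
    he = 1.0 - sum(p * p for p in freqs.values())

    # PIC = He - sum over allele pairs of 2 * p_i^2 * p_j^2.
    pic = he - sum(2 * p * p * q * q for p, q in combinations(freqs.values(), 2))

    # Power of discrimination from observed genotype frequencies.
    genotype_counts = Counter(tuple(sorted(g)) for g in genotypes)
    dp = 1.0 - sum((c / n) ** 2 for c in genotype_counts.values())

    return {"freqs": freqs, "Ho": ho, "He": he, "PIC": pic, "DP": dp}


if __name__ == "__main__":
    # Toy data only, not taken from the study.
    toy = [(9, 9), (9, 10), (10, 10), (9, 10)]
    print(locus_parameters(toy))
```

With real data, one such call per locus would yield a table of per-locus values comparable in layout to Table 1.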
Population genetic structure analysis among the Qinghai Han population and other populations was performed using Structure software version 2.2 [17]. The pairwise genetic distance (DA) and fixation index (Fst) of the studied Han group and other populations were calculated using the DISPAN program [18] and Arlequin software version 3.1, respectively. Heatmaps of DA and Fst between these populations were plotted by R software (https://www.r-project.org/). Principal components analysis (PCA) of these populations was drawn using MVSP software version 3.1 [19] based on the allelic frequencies of the 13 overlapping STRs. Multidimensional scaling (MDS) analysis of the Qinghai Han population and other compared populations was plotted using IBM SPSS version 18.0 (IBM Co., Armonk, NY, USA). Two different phylogenetic trees were constructed by MEGA software version 5.0 [20] and PHYLIP software version 3.6 to determine the phylogenetic relationships between Qinghai Han and other populations.
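As an illustration of the pairwise distance, the following Python sketch computes a DA value from allele frequencies, assuming DA refers to the distance of Nei et al. (1983), DA = 1 − (1/r) Σ_loci Σ_alleles √(x·y), where r is the number of loci. The function name `nei_da`, the input layout and the toy frequencies are assumptions made for this example and do not reproduce DISPAN itself.

```python
import math


def nei_da(pop_x, pop_y):
    """Nei's D_A distance between two populations.

    pop_x, pop_y: dict mapping locus name -> dict mapping allele -> frequency.
    Only loci present in both populations are used (the overlapping loci).
    """
    loci = pop_x.keys() & pop_y.keys()
    if not loci:
        raise ValueError("the two populations share no loci")

    total = 0.0
    for locus in loci:
        fx, fy = pop_x[locus], pop_y[locus]
        # Alleles missing from either population contribute sqrt(0) = 0.
        total += sum(math.sqrt(fx[a] * fy[a]) for a in fx.keys() & fy.keys())

    return 1.0 - total / len(loci)


if __name__ == "__main__":
    # Toy frequencies only, not taken from the study.
    a = {"TPOX": {8: 0.5, 11: 0.5}}
    b = {"TPOX": {8: 0.6, 11: 0.4}}
    print(nei_da(a, b))
```

Identical frequency profiles give a distance of 0, and populations sharing no alleles give 1, which is the range a DA heatmap would cover.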

Results and discussion
Allelic distributions and forensic parameter analysis of 20 STR loci
The allelic frequencies of 20 autosomal STR loci and their corresponding forensic relevant parameters are shown in Table 1. A total of 273 alleles were found in the studied Han population within the 20 loci (Table 1). The minimum allelic frequency was 0.000 2 and the maximum was 0.532 7. The lowest values of the power of discrimination (DP), and the probability of exclusion (PE) in duo and trio cases were 0.792 6, 0.203 5 and 0.306 8, respectively, at the TPOX locus, while the highest values were 0.987 5, 0.720 5 and 0.810 8, respectively, at the Penta E locus. With the exception of locus TPOX, the polymorphism information content (PIC) of all remaining loci reached above 0.6. Highest observed heterozygosity (Ho) and expected heterozygosity (He) values were observed at the Penta E locus, while the lowest Ho and He values were at the TPOX locus. The combined DP, and PE in duo and trio cases were 0.999 999 999 999 999 999 999 999 34, 0.999 996 0 and 0.999 999 996 5, respectively. These results revealed that these 20 STR loci are highly polymorphic, and have the potential to be used in both forensic human identification and paternity testing in the Qinghai Han population.
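Combined values of this kind are usually obtained by treating the loci as independent, so that the combined value equals 1 − Π(1 − v_i) over the per-locus values v_i. The sketch below assumes that rule and uses Python's `decimal` module, because a result such as the combined DP reported here, which has 24 nines after the decimal point, cannot be represented in ordinary double-precision floats. The function name `combined_power` is hypothetical, and only the two per-locus DP values quoted in the text (TPOX and Penta E) are used in the example.

```python
from decimal import Decimal, getcontext

# Enough significant digits to keep results very close to 1 distinguishable from 1.
getcontext().prec = 60


def combined_power(values):
    """Combine per-locus DP (or PE) values assuming independent loci.

    values: iterable of numbers or numeric strings in the range [0, 1].
    Returns 1 - product(1 - v) as a Decimal.
    """
    residual = Decimal(1)
    for v in values:
        residual *= Decimal(1) - Decimal(str(v))
    return Decimal(1) - residual


if __name__ == "__main__":
    # DP at TPOX and Penta E, as quoted in the text above.
    print(combined_power(["0.7926", "0.9875"]))
```

Passing all 20 per-locus values from Table 1 would give the combined figure under the same independence assumption.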

LD analysis
The results of LD tests are shown in Table 2. After Bonferroni correction, the exact P-values of two (D19S433 and FGA; TH01 and FGA) out of the 190 pairwise comparisons were below the significance level (0.05/190 ≈ 0.000 263). LD can be influenced by many factors, such as selection, the rate of recombination, the mutation rate, genetic drift, the system of mating, population structure and genetic linkage. As the loci in each of these two pairs are located on different chromosomes, genetic linkage cannot explain the observed LD. However, additional studies are required to determine the role of other factors in LD.
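The corrected threshold follows directly from the number of locus pairs: 20 loci give C(20, 2) = 190 pairwise tests, and 0.05 divided by 190 is approximately 0.000 263. A minimal Python sketch of this arithmetic is given below; the function names and the P-values in the usage example are hypothetical placeholders, not results from Table 2.

```python
from math import comb


def bonferroni_threshold(n_loci, alpha=0.05):
    """Return (number of pairwise tests, Bonferroni-corrected threshold)."""
    n_tests = comb(n_loci, 2)
    return n_tests, alpha / n_tests


def significant_pairs(p_values, threshold):
    """Return the locus pairs whose P-value falls below the threshold.

    p_values: dict mapping (locus_a, locus_b) -> exact-test P-value.
    """
    return [pair for pair, p in p_values.items() if p < threshold]


if __name__ == "__main__":
    n_tests, threshold = bonferroni_threshold(20)
    print(n_tests, threshold)

    # Placeholder P-values for illustration only.
    example = {("locus_A", "locus_B"): 0.0001, ("locus_A", "locus_C"): 0.01}
    print(significant_pairs(example, threshold))
```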

Interpopulation differentiations between the Qinghai Han population and other compared populations
The P-values of genetic differentiation comparisons are shown in Table 3. Significant differences (P < 0.05) were observed between the Qinghai Han and the following groups: the Malaysian at 13 loci, the Tibetan and She at 11 loci, the Uygur and Shui at 10 loci, the Zhuang at five loci, the Dong and Yi at three loci and the Hui, Guangdong Han and Russian at two loci. No significant differences were observed among the Qinghai Han, Jiangsu Han, and Shaanxi Han groups. The highest ethnic diversity was observed at the D18S51 locus, where significant differences were found in eight out of 13 compared groups. The lowest ethnic diversity was observed at CSF1PO and TPOX loci, where significant differences were found in only three out of 13 compared groups.
Population structure clustering analysis of the 14 populations

Genetic distance (DA and Fst) analysis among the 14 populations
Pairwise genetic distances of the Qinghai Han population and other reference populations are shown in Figure 2 and Supplementary Tables S1 and S2. As shown in Figure 2(A), close relative

Phylogenetic reconstructions of the Qinghai Han population and other Chinese populations
Phylogenetic analysis was used to explore genetic relationships between the Qinghai Han group and other populations, as shown in Figure 4. The Neighbor-Joining tree method (Figure 4(A)) demonstrated that the Qinghai Han population formed a sub-branch of the tree with Shaanxi Han and Jiangsu Han populations. Another phylogenetic tree (Figure 4(B)) also showed that the Qinghai Han population was close to Shaanxi Han and Jiangsu Han populations. These phylogenetic results are consistent with the findings of the interpopulation differentiation study, genetic distance analysis, PCA and MDS described above, which likely reflects similar genetic distributions of Qinghai Han, Shaanxi Han and Jiangsu Han populations.

Conclusion
The present study indicated that the loci examined were highly polymorphic in the studied Qinghai Han population. We also found that the Qinghai Han population has close genetic relationships with the Shaanxi Han and Jiangsu Han populations. These results suggest that the loci can be used for paternity testing and forensic human identification, and could also provide information about the genetic relationships between the Qinghai Han and other groups.