Genetic analysis of 23 Y-STR loci in the Va population from Yunnan Province, Southwest China

Abstract
Background: Y-chromosomal short tandem repeat (Y-STR) polymorphisms are widely used in forensic DNA analysis. However, there is a lack of information about the Chinese Va population in the Y-STR Haplotype Reference Database.
Aim: To establish the Y-chromosome Haplotype Reference Database of the Yunnan Va population and investigate the population genetic relationships with other geographically adjacent groups.
Subjects and methods: In total, 23 Y-STR loci were genotyped with the PowerPlex Y23 Kit in 368 unrelated healthy Va males from Yunnan Province, Southwest China. Genetic polymorphism was analysed using the YHRD’s AMOVA tools and the MEGA 6.0 software.
Results: The gene diversity (GD) of the 23 Y-STR loci ranged from 0.3092 (DYS19) to 0.7868 (DYS385a/b). Haplotype analysis yielded 204 different haplotypes, of which 144 were unique. The haplotype diversity (HD) and discrimination capacity (DC) were 0.9852 and 0.5543, respectively. Comparison of the Yunnan Va group with 22 reference groups revealed that the Yunnan Va was isolated from the other groups.
Conclusions: The 23 Y-STR loci were highly polymorphic and informative in the Yunnan Va population, and the results enrich the basic genetic information available for forensic investigation and population genetic studies.


Introduction
The Va (also called "Wa") is an ancient population in China, mainly distributed in the southwest of Yunnan Province. Ximeng and Cangyuan counties are the main places where the Va people live in compact communities; others are scattered across Lancang, Menglian, Shuangjiang, Gengma, and Zhenkang counties and the Xishuangbanna Dai Autonomous Prefecture. The Han, Yi, Dai, Hani, Lahu, Jingpo, Blang, De'ang, and Lisu ethnic groups coexist with the Va people in Yunnan Province. The Va is the 26th largest ethnic minority, with a population of 430,977 according to the 2020 census, of whom 98.8% live in Yunnan. Historical records show that the forbears of today's Va, Blang, and De'ang peoples came under the rule of the Han Dynasty. The Blang, De'ang, and Va populations may have originated from the ancient Bai-Pu in the south of China (M. Cang 1997). The Va language belongs to the Palaung-Va group of the Austroasiatic language family. Before the founding of the People's Republic of China in 1949, the Va people had no written language, except in some areas where an alphabetic script was used; they kept records, kept accounts, and passed messages using material objects or by engraving bamboo strips. An alphabetic script was created for the Va people in 1957. The sampling location of the studied population is shown in Supplementary Figure S1.
For most of its length, the Y chromosome is uniparentally inherited and escapes recombination. Thus, variation arises only by the sequential accumulation of new mutations, reflecting the history of the paternal lineage. Y-chromosomal short tandem repeat (Y-STR) polymorphisms have attracted increasing interest, not only for population genetics and evolutionary studies but also for forensics, particularly in cases where standard autosomal DNA profiling is not informative. Haplotypes composed of Y-STRs are very useful both for excluding suspects from involvement in a crime by demonstrating non-matching haplotypes and for identifying groups of male relatives belonging to the same paternal lineage by demonstrating haplotype matches (Leite et al. 2008; Huang et al. 2011). The PowerPlex Y23 (Promega Corporation) system has been used to investigate approximately 100,000 Y-STR reference databases from various populations around the world on the YHRD website; however, population data for the Chinese Va are lacking, and the genetic relationships between the Va minority and other Chinese populations or adjacent Asian populations are unclear. In this study, we present allele frequencies and haplotype distributions of 23 Y-STR loci in the Va group from Yunnan Province, China, and compare pairwise genetic distances with other populations.

Study population
A total of 368 blood samples from the Va ethnic group were collected in Ximeng Va Autonomous County, Pu'er City, Yunnan Province, after informed consent was obtained. This study was approved by the ethics committee of Kunming Medical University, Yunnan, China (No. KMMU2020MEC013).

DNA typing
Genomic DNA was extracted from FTA cards with the Chelex-100 method (Walsh et al. 1991). PCR for the 23 Y-STR loci was carried out on a GeneAmp PCR System 9700 (Applied Biosystems, USA) using the PowerPlex Y23 (PPY23; Promega Corporation) PCR amplification kit according to the PPY23 protocol, with a 10.0 μl reaction volume per sample containing 2.0 μl PCR Master Mix, 1.0 μl Primer Pair Mix, 6.0 μl ddH2O, and 1.0 μl template DNA. The PCR conditions were as follows: pre-denaturation at 96 °C for 2 min; 30 cycles of 94 °C for 10 s, 61 °C for 1 min, and 72 °C for 30 s; a final extension hold at 60 °C for 20 min; and a final soak at 4 °C. The amplified products were separated on an ABI 3130XL Genetic Analyser (Applied Biosystems, USA) and subsequently analysed with GeneMapper ID-X v.1.5 software (Thermo Fisher Scientific, Waltham, MA, USA) by comparison with the allelic ladders provided in the kit. DNA typing and nomenclature assignment followed the ISFG recommendations (Bar et al. 1997; Lincoln 1997). 2800M Control DNA and Nuclease-Free Water were used as positive and negative controls, respectively. Our laboratory has participated in and passed the YHRD quality control. The population accession number is YA005788.

Statistical analysis
Allele frequencies and haplotype frequencies were estimated by direct gene counting. Single-marker GD was calculated according to Nei with the formula $GD = n\left(1 - \sum P_i^2\right)/(n - 1)$, where $n$ is the total number of samples and $P_i$ is the relative frequency of the $i$-th allele (Clegg 1987). For multi-locus markers such as DYS385a/b, the haplotype frequencies were calculated from the combinations of their two alleles. HD was calculated in the same way as GD: $HD = n\left(1 - \sum P_i^2\right)/(n - 1)$, where $n$ and $P_i$ denote the total number of sampled haplotypes (one per individual) and the relative frequency of the $i$-th haplotype, respectively. DC was computed as the ratio between the number of different haplotypes and the total number of sampled haplotypes. The analysis of molecular variance (AMOVA) test was conducted to calculate the population pairwise genetic distance (Rst) between Yunnan Va and 22 adjacent reference populations. A multidimensional scaling (MDS) plot was generated based on Rst values. AMOVA and MDS were both performed in YHRD via online statistical tools (http://www.yhrd.org). With the exception of DYS385 and DYF387S1, multi-locus markers were ignored for Rst calculation and MDS analysis, and all haplotypes with duplications, micro variants, and null alleles were excluded from this analysis based on the algorithm of YHRD's AMOVA tools. A neighbor-joining (NJ) phylogenetic tree was constructed using MEGA 6.0 software to help illustrate population relationships (Tamura et al. 2013).
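The GD, HD, and DC definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the software used in the study: the function names and the toy data are hypothetical, and haplotypes are assumed to be represented as tuples of allele calls.

```python
from collections import Counter


def diversity(observations):
    """Nei's diversity n * (1 - sum(p_i^2)) / (n - 1).

    Pass allele calls at one locus to obtain GD, or whole haplotypes
    (as hashable tuples) to obtain HD.
    """
    n = len(observations)
    if n < 2:
        raise ValueError("at least two observations are required")
    counts = Counter(observations)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n * (1.0 - sum_sq) / (n - 1)


def discrimination_capacity(haplotypes):
    """Number of different haplotypes divided by the number sampled."""
    return len(set(haplotypes)) / len(haplotypes)


# Hypothetical toy data: four males typed at two loci.
alleles = [14, 14, 15, 16]
haplotypes = [(14, 30), (14, 30), (15, 31), (16, 29)]

gd = diversity(alleles)
hd = diversity(haplotypes)
dc = discrimination_capacity(haplotypes)
print(f"GD = {gd:.4f}, HD = {hd:.4f}, DC = {dc:.4f}")
```

Because GD and HD share one formula, a single function covers both; only the unit being counted (allele or haplotype) changes.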

Forensic parameters of the 23 Y-STR loci system when applied to the Yunnan Va population
Allelic frequencies and locus diversities for the 23 Y-STR markers typed in the Va population are shown in Table S1. A total of 138 different alleles were found across the 23 Y-STR loci, with allelic frequencies ranging from 0.0027 to 0.8179. The number of alleles per locus varied from 3 at DYS391, DYS437, and DYS393 to 29 at DYS385a/b. The GD values of the 23 loci ranged from 0.3092 (DYS19) to 0.7868 (DYS385a/b); among single-locus markers, the highest GD was observed for Y-GATA-H4 (0.7542). Seventeen of the 23 loci had GD values greater than 0.5; the six exceptions were DYS576 (0.3942), DYS448 (0.4880), DYS389II (0.4527), DYS19 (0.3092), DYS391 (0.4886), and DYS390 (0.4968) (see Table S1). The haplotype distribution and associated frequencies are displayed in Table S2. A total of 204 different haplotypes were found at the 23 loci in the 368 Yunnan Va individuals, of which 144 (70.59%) were unique, 36 appeared twice, 8 occurred three times, and 7 were observed four times; two haplotypes each were shared among five, nine, and ten individuals, and one haplotype each was shared among seven, eight, and thirty-seven individuals. The overall HD was 0.9852, with a DC of 0.5543. Micro variants were observed at the DYS385a/b (12.1,19; 12.3,12.3; and 14.2,19) and DYS458 (15.1) loci, and no null allele was identified in this study. Individuals with non-standard alleles are listed in Table S3 and were confirmed by repeating the entire experimental procedure. The results indicate that the 23-locus Y-STR kit has a high power of discrimination and can be used to identify paternal lineages in the Va group in Yunnan Province.
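As a consistency check, the reported HD and DC can be recomputed from the haplotype sharing counts given above. The sketch below assumes the distribution is read as two haplotypes each shared by five, nine, and ten individuals and one haplotype each shared by seven, eight, and thirty-seven individuals; the variable names are illustrative.

```python
# Copies per haplotype -> number of distinct haplotypes with that many copies.
# Assumed reading of the reported distribution (see lead-in).
spectrum = {1: 144, 2: 36, 3: 8, 4: 7, 5: 2, 7: 1, 8: 1, 9: 2, 10: 2, 37: 1}

n_distinct = sum(spectrum.values())                  # different haplotypes
n_total = sum(k * m for k, m in spectrum.items())    # sampled individuals

# Sum of squared haplotype frequencies, then Nei's correction n / (n - 1).
sum_sq = sum(m * (k / n_total) ** 2 for k, m in spectrum.items())
hd = n_total * (1 - sum_sq) / (n_total - 1)
dc = n_distinct / n_total
unique_share = spectrum[1] / n_distinct

print(n_distinct, n_total)           # 204 368
print(round(hd, 4), round(dc, 4))    # 0.9852 0.5543
print(round(100 * unique_share, 2))  # 70.59
```

Under this reading the counts add up to 204 haplotypes in 368 individuals, and the recomputed HD, DC, and share of unique haplotypes match the reported values.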

Genetic relationship between the Yunnan Va population and reference populations in China or adjacent countries
The pairwise Rst values and associated p-values were calculated for Yunnan Va and 22 other populations in the YHRD database, and the results are shown in Table S4. Table S5 contains detailed information about the compared groups, including the number of haplotypes and the reference accession number. The 23 populations, each with at least 100 samples, were chosen for the phylogenetic analysis. There was a significant genetic difference (p-values <0.0002, after Bonferroni's correction) between the Yunnan Va population and the other 22 previously reported reference populations. Based on Rst values, the largest genetic distance was detected between Yunnan Va and Qinghai Tibetan (Rst = 0.3949), followed by Sichuan Tibetan (Rst = 0.3867), while the smallest genetic distance was identified with the Hunan Miao population (Rst = 0.1549). Population relationships were visualised in the MDS plot and the NJ phylogenetic tree, as shown in Figures 1 and 2. Overall, the distributions of the 23 populations were in accordance with their ethno-geographical regions or language families. Sichuan Tibetan and Qinghai Tibetan converged closely in the upper right quadrant of the MDS plot, and Yunnan Yi and Yunnan Lisu in the bottom left quadrant, forming separate clusters. Yunnan Bai and Guizhou Miao populations clustered closely with Yunnan Han and Sichuan Han, as well as with Hunan Miao, Sichuan Yi, Guizhou Bouyei, Thailand Yong, South Korea Korean, and even North Vietnam Kinh, in the middle left of the MDS plot, whereas Yunnan Hui, Guizhou Tujia, Xinjiang Mongolian, Xinjiang Uighur, and Inner Mongolia Mongolian were distributed in the middle right of the MDS plot. The MDS scatter diagram showed that 8 of the 23 populations (Yunnan Va, Sichuan Tibetan, Qinghai Tibetan, Sichuan Qiang, Tsukuba Japanese, Yunnan Yi, Yunnan Lisu, and Uttar Pradesh Indian) fell on the periphery of the scatterplot.
As shown in Figure 2, two main clusters were observed: Hui, Uighur, Mongolian, and Tibetan populations clustered in the lower branch, whereas the other ten southern Chinese populations and four neighbouring Southeast Asian populations clustered together in the upper branch. Interestingly, the NJ phylogenetic tree and MDS plot demonstrated that the Yunnan Va population was distinct from the other compared groups. This could be attributed to the fact that the Va samples were collected in Ximeng Va Autonomous County, Pu'er City, Yunnan Province, where the Va people live in compact communities; this area, also known as the Ava hilly region, is enclosed by undulating mountain ridges some 2000 metres above sea level. The closed social environment and unique geography may have led to a lack of gene flow with other populations. In the NJ phylogenetic tree, we also found that geographically adjacent or ethnically close populations have small genetic distances, and groups from similar language families have a higher genetic affinity. For example, Yunnan Lisu, Yunnan Yi, Yunnan Han, and Yunnan Bai, which are members of the Sino-Tibetan language family and live in the same area, appeared to have closer evolutionary relationships. Xinjiang Mongolian, Inner Mongolia Mongolian, and Xinjiang Uighur populations, which belong to the Altaic language family, form a distinct branch. Tibetans in Sichuan and Qinghai, who are descended from the same people, have a close genetic relationship. Likewise, the Han and Qiang in Sichuan, who live in the same area, are genetically similar. Some of these phenomena were also found in previously published studies (He et al. 2017; Fan et al. 2019).
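The Bonferroni-corrected threshold quoted above (p < 0.0002) is consistent with dividing a nominal significance level by the number of pairwise comparisons among 23 populations. The sketch below assumes a nominal level of 0.05 and that the correction was applied over all population pairs; neither assumption is stated in the text.

```python
from math import comb

n_populations = 23   # Yunnan Va plus 22 reference populations
alpha = 0.05         # assumed nominal significance level

# Number of distinct population pairs in the Rst matrix.
n_pairs = comb(n_populations, 2)

# Bonferroni correction: divide alpha by the number of tests.
threshold = alpha / n_pairs

print(n_pairs)               # 253
print(round(threshold, 4))   # 0.0002
```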

Conclusions
In summary, this is the first report of a Y-STR genetic database of the Chinese Va population, which will supplement the existing Y-STR population database. We also used these data to assess the genetic relationships between the Chinese Va and 22 reference populations, and the results indicated that the Chinese Va was isolated from other populations.

Figure 1. Multidimensional scaling (MDS) analysis for Yunnan Va and the other 22 reference populations based on pairwise Rst values.

Figure 2. A neighbor-joining (NJ) phylogenetic tree of the Yunnan Va population and the other 22 reference populations, constructed from a distance matrix of Rst values. The lengths of bars represent genetic distances.