Population genetic analysis of 12 X-STR markers in Slovakia

Abstract Background: During the last 20 years, X-chromosomal STR markers have become widely used in forensic genetics and paternity testing. Nevertheless, to exploit their full potential in any given population, a reliable reference dataset needs to be established. Since no relevant studies concerning these markers have been performed on the Slovak population so far, we decided to analyse several commonly used markers in this population. Aim: To create an informative set of Slovak population data concerning X-STR markers. Subjects and methods: We genotyped 378 individuals and analysed 12 loci (DXS10148, DXS10135, DXS8378, DXS7132, DXS10079, DXS10074, DXS10103, HPRTB, DXS10101, DXS10146, DXS10134 and DXS7423) localised in four distinct linkage groups. Results: Our analysis showed that the most informative marker is DXS10135 (PIC = 0.927) and the most informative linkage group (LG) is LG1, with 149 different haplotypes. This analysis also confirmed linkage disequilibrium for two pairs of markers (DXS10101-DXS10103 and DXS10101-HPRTB) within LG3 in female samples. No statistically significant departure from HWE was observed for any locus. Moreover, the interpopulation comparison of 8 European populations based on haplotype frequencies showed no statistically significant FST values in any LG, except for LG2 in comparison with the German population. Conclusion: We created a haplotype database for forensic analyses and kinship testing in Slovakia, as well as a combined Central European (CE) dataset, which can be used to further increase the decision power in similar analyses in the future.


Introduction
In recent years, many autosomal short tandem repeats (STRs) have been extensively analysed and adopted for human identification and parentage testing. This effort led to the development of sets of 20-30 STR markers, which are generally sufficient to give reliable results in criminal casework and kinship analyses (Phillips, 2017). However, when more complex relationship cases are under investigation (the alleged father is not available for testing, sibship analyses, etc.), X-chromosomal markers can be used either to complement weak or inconclusive information from autosomal loci or to provide evidence on their own (e.g. if a female child is of incestuous origin) (Gomes et al. 2020). This has led to increased interest in their analysis from population and forensic perspectives in recent years. Currently, X-STR data are available for many human populations worldwide (Gomes et al. 2020), mostly obtained with the Investigator Argus X-12 kit (Qiagen GmbH, Germany). Such datasets are available for most Central European populations (Czech Republic, Hungary, Poland) (Łuczak et al. 2011; Horváth et al. 2012; Zidkova et al. 2014). However, only one study on X-STR variation in the Slovak population has been conducted, employing a small set of four X-STR loci and a sample of just 116 individuals (Cybulska et al. 2008). Therefore, the aim of this study was to employ a larger sample (378 individuals) to examine all 12 STR loci incorporated in the Argus X-12 kit and thus extend the number of X-STR markers analysed in the Slovak population. These data can then be used primarily in deficiency paternity cases in which autosomal STR markers do not lead to conclusive results.

Samples and experiments
Genomic DNA was extracted from buccal swabs of 378 anonymised individuals (189 males and 189 females) using a forensicGEM kit (MicroGEM International Plc, UK). X-STR amplification was performed using the Investigator Argus X-12 STR kit (Qiagen GmbH, Germany) according to the manufacturer's instructions. The STR fragments were separated and detected using a 3500 Genetic Analyzer (Thermo Fisher Scientific, USA), and individual genotypes were analysed using GeneMapper ID-X v1.4. Allele designation followed the allelic ladder included in the Investigator Argus X-12 STR kit.
All samples used in this study were collected during routine paternity testing after signed informed consent had been obtained, and were further analysed in accordance with the valid legislation of the Slovak Republic (Personal Data Protection Act, Act No. 18/2018 Collection of Laws).

Statistical analysis
The assessment of allele frequencies (males and females together), the exact test of Hardy-Weinberg equilibrium (HWE; female samples only) and the linkage disequilibrium (LD) test (separately for female and male samples) were carried out using Arlequin software v3.5 (Excoffier and Lischer, 2010). Haplotype frequencies were calculated in male samples only. Pairwise FST values for the comparison of related populations based on haplotype frequencies and the AMOVA test of homogeneity were also calculated with Arlequin software v3.5. The computational tool available at the ChrX-STR.org 2.0 database (Szibor et al. 2006) was used to assess the forensic and paternity testing parameters of all X-STRs, including polymorphism information content (PIC), homozygosity (HOM), heterozygosity (HET), paternity exclusion chance in trios and duos (MEC_T and MEC_D) and the power of discrimination for males (PD_M) and females (PD_F). The cumulative values of PD_M and PD_F (cPD_M and cPD_F) were also evaluated using Microsoft Excel. Additionally, PD_M for the four distinct linkage groups, as well as the cumulative PD_M, was computed from the haplotype frequencies observed in males using Microsoft Excel.
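The single-locus parameters listed above can be illustrated with a minimal Python sketch. It assumes the formulas commonly used for X-chromosomal markers (Botstein's PIC, the Desmarais trio and duo exclusion chances, and the usual PD_M and PD_F expressions) and expected rather than observed heterozygosity. The function name `x_str_indices` and the example frequencies are hypothetical, and the sketch is not a reproduction of the ChrX-STR.org tool.

```python
from typing import Dict, Sequence


def x_str_indices(freqs: Sequence[float]) -> Dict[str, float]:
    """Forensic indices of one X-STR locus from its allele frequencies.

    Assumption: standard textbook formulas for X-chromosomal markers;
    the exact implementation used by ChrX-STR.org may differ in detail.
    """
    if abs(sum(freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")

    s2 = sum(p ** 2 for p in freqs)  # sum of squared frequencies
    s3 = sum(p ** 3 for p in freqs)  # sum of cubed frequencies
    s4 = sum(p ** 4 for p in freqs)  # sum of fourth powers

    return {
        "HET": 1.0 - s2,                    # expected heterozygosity
        "PIC": 1.0 - s2 - s2 ** 2 + s4,     # polymorphism information content
        "PD_M": 1.0 - s2,                   # power of discrimination, males
        "PD_F": 1.0 - 2.0 * s2 ** 2 + s4,   # power of discrimination, females
        "MEC_T": 1.0 - s2 + s4 - s2 ** 2,   # exclusion chance, trios
        "MEC_D": 1.0 - 2.0 * s2 + s3,       # exclusion chance, father/daughter duos
    }


if __name__ == "__main__":
    # Hypothetical locus with four equally frequent alleles.
    for name, value in x_str_indices([0.25, 0.25, 0.25, 0.25]).items():
        print(f"{name}: {value:.4f}")
```

Under these formulas the trio exclusion chance and PIC coincide numerically, which is why only one of them is sometimes tabulated for X-linked loci.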
Multidimensional scaling (MDS) plot analysis was performed using XLSTAT software v2022.3.2 (Addinsoft, 2019). All tests were evaluated at the 5% significance level, and Bonferroni correction for multiple testing was applied.

Results
The allelic frequencies of the 12 X-STRs are summarised in Supplementary Table 1. A total of 178 alleles were observed among the 12 X-STR markers. The PIC and HET values of all studied loci were higher than 0.6, indicating that all of these loci are highly polymorphic. The DXS10135 locus was the most informative marker (PIC = 0.927), with 27 alleles. The least polymorphic locus was DXS8378 (PIC = 0.629), with only 6 alleles. All evaluated forensic and paternity testing parameters for the X-chromosome markers are listed in Table 1.
No statistically significant departure from HWE (female samples only) was observed for any locus (Supplementary Table 2). Furthermore, genotyping results revealed significant LD (p < 0.0008, after Bonferroni correction for multiple testing) in female samples for two pairs of markers (DXS10101-DXS10103 and DXS10101-HPRTB) within linkage group 3 (LG3) (Supplementary Table 3). The LD test in male samples confirmed the LD between markers DXS10101 and DXS10103, but not between markers DXS10101 and HPRTB. Moreover, in male samples, significant LD was observed between markers DXS10146 and DXS7423 within LG4 (Supplementary Table 4). No LD was observed within LG1 and LG2, which is probably due to their large diversity, requiring an even larger sample size to detect possible LD between these markers (Kling et al. 2015; Bergseth et al. 2022).
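The Bonferroni-adjusted threshold quoted above can be checked with a short calculation. The sketch below assumes that the correction was applied across all pairwise comparisons among the 12 loci; this is one reading that is consistent with the reported value, not a statement of the exact procedure used in Arlequin. The helper name `bonferroni_threshold` is hypothetical.

```python
from math import comb


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Assumption: every pair of the 12 loci is tested once.
n_loci = 12
n_pairs = comb(n_loci, 2)  # number of locus pairs
threshold = bonferroni_threshold(0.05, n_pairs)

print(f"{n_pairs} pairwise tests, threshold = {threshold:.6f}")
```

With 66 locus pairs, 0.05/66 is approximately 0.00076, which rounds to the reported 0.0008.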
When the markers were examined as groups (linkage groups, i.e. haplotypes), the observed number of haplotypes was 149, 110, 108 and 127 for LG1, LG2, LG3 and LG4, respectively, which represented 4.6-7.6% of the total number of possible haplotypes for each LG. In LG1, the most informative group of markers, the most common haplotype had a frequency of 2.6%, whereas in LG3, the least informative group, the most common haplotype had a frequency of 4.2% (Table 2). LG1 showed the highest PD_M (0.9917), followed by LG4 (0.9895) (Table 2), while the cumulative PD_M was estimated at 0.999999983. Haplotype frequencies for all LGs are summarised in Supplementary Table 5.
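The haplotype-based PD_M and its cumulative value can be sketched as follows. The sketch assumes that PD_M of a linkage group is one minus the sum of squared haplotype frequencies observed in males, and that the cumulative value treats the linkage groups as independent. The function names and the haplotype labels in the usage example are hypothetical, and the example does not use the study data.

```python
from collections import Counter
from math import prod
from typing import Hashable, Iterable, Sequence


def haplotype_pd_m(haplotypes: Iterable[Hashable]) -> float:
    """PD_M of one linkage group from the haplotypes observed in males."""
    counts = Counter(haplotypes)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("at least one haplotype is required")
    # One minus the probability that two random males share a haplotype.
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def cumulative_pd(pd_values: Sequence[float]) -> float:
    """Cumulative power of discrimination, assuming independent groups."""
    return 1.0 - prod(1.0 - pd for pd in pd_values)


if __name__ == "__main__":
    # Hypothetical haplotype labels for four males in a single linkage group.
    print(haplotype_pd_m(["h1", "h1", "h2", "h3"]))
    # Hypothetical PD_M values for two independent linkage groups.
    print(cumulative_pd([0.5, 0.5]))
```

Because each factor (1 - PD_M) is small for highly diverse linkage groups, the product shrinks quickly, which is how four groups can yield a cumulative value very close to 1.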
The observed haplotypes were further analysed to compare the Slovak population data with other European populations, namely Czech (Zidkova et al. 2014), Hungarian [...]. This is in agreement with other studies that have shown that European populations are quite homogeneous with respect to these four linkage groups (Mršič et al. 2018; Rębała et al. 2015). When the pairwise FST values were used to create an MDS plot, no clear grouping according to the geographical or linguistic characteristics of individual populations was observed, although the Slovak and Czech populations are located closer to one another than to other populations (Figure 1). This may reflect their common history, as both nations were part of a single state (the former Czechoslovakia) from 1918 to 1992. As the FST analysis resulted in very low and non-significant values in all comparisons (Supplementary Table 6), and the AMOVA test of homogeneity between the Czech, Slovak and Hungarian populations also showed no population stratification (Supplementary Table 7), we combined these three geographically proximal haplotype frequency databases (Czech, Slovak and Hungarian) into one unit, a CE dataset (Central Europe, Supplementary Table 8), to increase the exclusion or inclusion probabilities in complex kinship analyses employing X-chromosome STRs.
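The idea behind a pairwise differentiation measure based on haplotype frequencies can be illustrated with a simplified sketch. It uses Nei's GST with equal population weights, which is a swapped-in, simpler statistic and not the AMOVA-based FST estimator implemented in Arlequin; it also gives no significance test. The function names and the example frequencies are hypothetical.

```python
from typing import Dict


def gene_diversity(freqs: Dict[str, float]) -> float:
    """Haplotype (gene) diversity: one minus the sum of squared frequencies."""
    return 1.0 - sum(p * p for p in freqs.values())


def pairwise_gst(pop_a: Dict[str, float], pop_b: Dict[str, float]) -> float:
    """Nei's G_ST between two populations, weighted equally.

    Assumption: inputs map haplotype labels to frequencies summing to 1.
    """
    haplotypes = set(pop_a) | set(pop_b)
    # Frequencies in the pooled population (simple average of both).
    pooled = {h: 0.5 * (pop_a.get(h, 0.0) + pop_b.get(h, 0.0)) for h in haplotypes}
    h_s = 0.5 * (gene_diversity(pop_a) + gene_diversity(pop_b))  # within
    h_t = gene_diversity(pooled)                                  # total
    return 0.0 if h_t == 0.0 else (h_t - h_s) / h_t


if __name__ == "__main__":
    # Hypothetical haplotype frequencies in two populations.
    pop_1 = {"h1": 0.5, "h2": 0.3, "h3": 0.2}
    pop_2 = {"h1": 0.4, "h2": 0.4, "h3": 0.2}
    print(pairwise_gst(pop_1, pop_2))
```

Values near zero indicate that almost all diversity lies within populations, which is the pattern that motivates pooling closely related databases.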

Discussion
The basic population data and allele frequencies in this study showed that all 12 examined X-STR markers are highly polymorphic (up to 27 alleles). Interestingly, locus DXS10135 (the most polymorphic in our dataset) is the most polymorphic locus in many worldwide X-chromosome STR studies, whereas loci DXS7423 and DXS8378 consistently show low PIC, PD_F and PD_M values and only a few alleles (Zhang et al. 2012; Salvador et al. 2018; García et al. 2019; Kakkar et al. 2020; Bini et al. 2021). The small number of alleles at both loci could be primarily explained by very low mutation rates (about ten times lower than at the DXS10135 locus) (Pinto et al. 2020). Furthermore, no statistically significant deviation from Hardy-Weinberg equilibrium was detected for any of the 12 loci, indicating no signs of population stratification. Moreover, linkage disequilibrium (LD) analysis revealed LD between marker pairs only within LG3 and LG4, but not within LG1 and LG2. This is not surprising because, owing to the large diversity within these two LGs, detection of LD between their markers requires a larger sample size (Kling et al. 2015; Bergseth et al. 2022). The study of haplotype data in the Swedish population also showed a high probability of not detecting LD in these two LGs when the sample size was less than 400 male individuals (Kling et al. 2015). We therefore retained the concept of four LGs, assuming meiotic transmission of markers within a single group without recombination.
Since the estimation of LD is important for kinship analyses based on X-chromosomal markers, the use of haplotypes instead of single markers is recommended whenever LD has been observed (Tillmar et al. 2017). LD between markers within a single LG also leads to the need for larger haplotype databases to cover a significant number of possible haplotypes (Pinto et al. 2020). As not all haplotype frequencies can be known, a mathematical model has been developed to estimate the frequencies of unobserved haplotypes (lambda model), and this approach is implemented in FamLinkX software (Kling et al. 2015). However, previous simulations showed that the lambda model is still sensitive to unobserved haplotypes with expected rare frequencies and can sometimes lead to false results (Tillmar et al. 2017), which makes a database covering a larger number of possible haplotypes even more important. Therefore, we performed the AMOVA test of homogeneity on three historically and geographically related populations (Czech, Slovak and Hungarian) to explore the possibility of integrating them into one larger haplotype dataset. Interestingly, only a negligible fraction of the variation lies between these populations, whereas more than 98% lies within individual populations, with very small non-significant FST values. Taking together the data from the pairwise FST analysis and the AMOVA test, we pooled the haplotype data of all three populations and established a CE (Central Europe) dataset, which might be useful in population studies in the future.
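The lambda model mentioned above can be sketched in a few lines. The sketch assumes the form in which the model is usually stated: the estimated haplotype frequency is (c + λp) / (C + λ), where c is the number of times the haplotype was observed, C is the database size, p is the frequency expected under linkage equilibrium (the product of allele frequencies) and λ is a weight chosen by the analyst. The function name and all numerical values are hypothetical; this is not a reproduction of the FamLinkX implementation.

```python
from math import prod
from typing import Sequence


def lambda_haplotype_frequency(
    count: int,
    database_size: int,
    allele_freqs: Sequence[float],
    lam: float,
) -> float:
    """Haplotype frequency estimate under the lambda model (assumed form).

    count          -- observations of the haplotype in the database (c)
    database_size  -- total number of haplotypes in the database (C)
    allele_freqs   -- frequencies of the alleles making up the haplotype
    lam            -- weight given to the linkage-equilibrium expectation
    """
    if lam < 0 or database_size + lam <= 0:
        raise ValueError("invalid lambda or database size")
    expected = prod(allele_freqs)  # frequency expected without LD
    return (count + lam * expected) / (database_size + lam)


if __name__ == "__main__":
    # Hypothetical three-marker haplotype, unobserved among 189 males.
    print(lambda_haplotype_frequency(0, 189, [0.1, 0.2, 0.05], lam=1.0))
    # The same haplotype if it had been observed three times.
    print(lambda_haplotype_frequency(3, 189, [0.1, 0.2, 0.05], lam=1.0))
```

For an unobserved haplotype the estimate depends entirely on λ and on the linkage-equilibrium expectation, which illustrates why the estimate is sensitive for rare haplotypes and why a larger database reduces that dependence.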

Conclusion
X-chromosomal STRs have become a useful tool in forensic and kinship practice. To introduce these markers for analyses in the Slovak population, we explored the genetic and forensic parameters of 12 X-STR markers in 378 individuals and established a haplotype dataset for future forensic analyses and kinship testing. We also created a combined CE dataset of three proximal populations to increase the decision power in complex kinship analyses in which X-chromosomal STR markers are the preferred choice.

Figure 1. Two-dimensional MDS plot drawn from FST genetic distances calculated from haplotype frequencies of nine European populations. Kruskal's stress = 0.134.

Table 1. Paternity testing and forensic indices of 12 X-chromosome STR markers in the Slovak population.

Table 2. Haplotype statistics and PD_M values of four X-STR linkage groups in 189 males of the Slovak population.
PD_M: power of discrimination (males).