Unveiling genetic diversity and forensic utility of SureID® human DNA identification kits: a comprehensive analysis of 44 autosomal STR loci in English and Irish populations

Abstract Background: Human identification and kinship testing in forensic science rely on Short Tandem Repeat (STR) multiplex kits, typically containing loci recommended by standard sets. However, complementary kits with additional STR loci can be valuable in complex cases. Allele frequency databases specific to the population are essential for accurate forensic analysis. Aim: This study aimed to generate allele frequencies and population genetic data for 44 autosomal STR loci from SureID® PanGlobal and 27comp kits in English and Irish populations for forensic casework, human identification, and kinship testing. Subjects and methods: Buccal swab samples from 645 White Caucasians (365 English, 280 Irish) were collected. DNA was extracted and amplified using the mentioned kits. Quality control, statistical analysis, and genetic distance calculations were performed. Results: Both kits demonstrated robustness with no significant deviations from Hardy-Weinberg Equilibrium (HWE). Variant alleles and minor discordances between kits were observed. Syntenic STR pairs were identified but showed no significant linkage. A close genetic relationship was found between English and Irish populations, allowing for combined databases. Conclusions: The SureID® PanGlobal and 27comp kits showed high discriminatory power and reliability in the English and Irish populations. Care is needed when handling variant alleles, discordances, and syntenic loci. Combining data from both populations is feasible for a comprehensive database. Further studies are required to explore their effectiveness in diverse populations.


Introduction
Human identification and kinship testing utilise Short Tandem Repeat (STR) multiplex kits, which normally contain STR loci recommended by the European Standard Set (ESS) (ENFSI DNA Working Group 2019) and Combined DNA Index System (CODIS) (FBI 2017). Testing these recommended STR loci ensures international compatibility with DNA databases and enables data sharing for forensic investigative purposes. However, complementary kits containing STR loci outside these standard sets could be beneficial in complex cases where these loci do not provide enough discrimination, such as, but not limited to, mixed samples, close kinship, and incest (Alsafiah and Goodwin 2022). Moreover, it is also important that specific allele frequency databases are developed for the populations where the kits are used, especially for complementary STR loci, which are less studied (Schneider 2007, 2012; Butler 2015). Where no specific allele frequency database exists, the use of a global database would lack a true individual representation of the samples in question (Iyavoo et al. 2019).
In population studies, the PanGlobal kit has been used to examine the forensic features and genetic structure of the Hotan Uyghur population, adjacent to Central Asia, for which it was found to be highly polymorphic, making it suitable for paternity testing as well as forensic personal identification (Chen et al. 2019). This kit has also been used for a study on a Central Indian population alongside other multiplex STR kits (Dash et al. 2021). In an investigation that statistically evaluated parentage determination on Indian populations, this kit gave the highest values for parameters of forensic importance as well as a high degree of forensic efficacy for identification (Shrivastava et al. 2021b). The efficacy for direct amplification of this kit was also successfully tested with a reduced PCR reaction volume of 10 µL on DNA extracted from saliva samples (Shrivastava et al. 2021a). However, when used on degraded, aged DNA, the performance of this kit was found to be very limited (Liu et al. 2022).
Regarding the 27comp kit, to our knowledge, there are currently no published studies available. Some investigations that used the 23comp kit are accessible, such as cross-species validation (Singh et al. 2019), an allele frequency study (Iyavoo et al. 2019), as well as an evaluation of its use for kinship testing (Alsafiah et al. 2019). The 23comp kit has also been validated with the minimum criteria to meet the Scientific Working Group on DNA Analysis Methods (SWGDAM) and European Network of Forensic Science Institutes (ENFSI) validation standards for a supplementary STR kit (Alsafiah et al. 2019; Iyavoo et al. 2019).
The aim of this study was to generate allele frequencies and population genetic data for 44 autosomal STR loci from both PanGlobal and 27comp kits in English and Irish populations for use in forensic casework, human identification, and kinship testing.

Population samples
Buccal swab samples were collected from a total of 645 persons of European descent, comprising 365 English and 280 Irish unrelated individuals with self-proclaimed ethnicity for three generations. Informed consent was obtained during sampling under Anglia DNA Services terms and conditions (Anglia DNA Services 2018). Samples were anonymised and used for internal concordance database creation.

DNA profiling
DNA from buccal swabs was extracted using the Buccalyse DNA Release Kit (Isohelix™). Extracted DNA samples were directly amplified using both the SureID® PanGlobal Human DNA Identification Kit (Ningbo Health Gene Technologies) and the SureID® 27comp Human DNA Identification Kit (Ningbo Health Gene Technologies) on the GeneAmp™ PCR System 9700 with the Silver 96-Well Block (Applied Biosystems™). Capillary electrophoresis was carried out using an ABI Prism® 3500xL Genetic Analyser (Applied Biosystems™). Allele designations were made on GeneMapper® ID-X Software v1.6 (Applied Biosystems™) with the aid of allelic ladders provided by the manufacturer. All procedures were performed as per the manufacturer's instructions.

Statistical analysis
Quality checks for any matching profiles or profiles sharing alleles at all loci were performed using the GenAlEx 6.5 platform (Peakall and Smouse 2012). Forensic and population genetic parameters were generated using STRAF 2.1.5 (Gouy and Zieger 2017) and FORSTAT (Ristow and D'Amato 2017), web-based forensic and population genetic analysis tools. Hardy-Weinberg Equilibrium (HWE) was determined via STRAF, employing 1,000 permutations. The obtained results were then evaluated using both the conventional significance threshold p-value of 0.05 and an adjusted p-value of 0.001, derived through Bonferroni correction (adjusted p-value = 0.05/44) (Guo and Thompson 1992; Bland and Altman 1995). Each pair of markers in the combined kits was tested for linkage by calculating the p-values for pairwise Linkage Disequilibrium (LD) using STRAF. STRAF was also used to generate the principal component analysis (PCA) scatter plot.
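As an illustration of the two significance thresholds described above, the following minimal Python sketch computes the Bonferroni-adjusted threshold for 44 loci and labels a per-locus p-value against both thresholds. The function names are hypothetical and are not part of STRAF or any other tool used in this study; the sketch only reproduces the arithmetic 0.05/44.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Return the per-test significance threshold after Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def classify(p_value: float, alpha: float = 0.05, n_tests: int = 44) -> str:
    """Label a per-locus p-value against the corrected and uncorrected thresholds."""
    adjusted = bonferroni_threshold(alpha, n_tests)
    if p_value < adjusted:
        return "significant after correction"
    if p_value < alpha:
        return "significant only without correction"
    return "not significant"


# 44 loci tested at a conventional alpha of 0.05
threshold = bonferroni_threshold(0.05, 44)
print(f"adjusted threshold: {threshold:.4f}")
print(classify(0.0069))
```

A p-value such as 0.0069 falls between the two thresholds, so it is flagged only when no correction is applied.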

Allele frequencies and population genetic parameters
The allele frequencies (Supplementary Table S1) and population genetic parameters (Supplementary Table S2) were successfully generated for the English and Irish populations for 44 autosomal STR loci from both PanGlobal and 27comp kits. The allele frequencies for both populations are graphically represented in Figure 1, while Figure 2 shows the allele count for each population with markers grouped by kit. In both figures, locus SE33 clearly emerged as the most polymorphic, with 37 alleles in the English population and 35 in the Irish population. In contrast, markers D5S818, TH01, and TPOX in the Irish population, and D6S474 in both populations, appeared as the least polymorphic, with 6 alleles. Since locus SE33 is absent in the 27comp kit, the most polymorphic marker in this kit was D21S2055 for both populations, with 23 alleles in the English and 22 in the Irish population. In general, the Irish population showed fewer alleles than the English population, likely due to the lower number of individuals sampled.
The power of discrimination (PD) for SE33, the most discriminating locus in the PanGlobal kit, was 0.9924 for the English and 0.9908 for the Irish population. On the other hand, the most discriminating loci in the 27comp kit were D1S1656 (PD = 0.9790) for the English and D12S391 (PD = 0.9775) for the Irish population. For the PanGlobal kit, the combined power of exclusion (PE) was 0.999999999991 for the English and 0.999999999990 for the Irish population, while in the 27comp kit, the PE was 0.999999999995 for the English and 0.99999999997 for the Irish population. However, with the combined 44 STR loci from both kits, the PE was calculated as 100% in both populations, with match probabilities (MP) of 1 in 1.40 × 10^52 and 1 in 8.19 × 10^51 in the English and Irish population respectively. The MP were 1 in 3.07 × 10^29 in the English and 1 in 1.87 × 10^29 in the Irish population for the PanGlobal kit, and 1 in 8.03 × 10^29 in the English and 1 in 6.40 × 10^29 in the Irish population for the 27comp kit.
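To make the combined parameters concrete, the sketch below shows how per-locus values are conventionally combined under the product rule, which assumes the loci are independent. The per-locus numbers are hypothetical placeholders rather than values from the Supplementary Tables, and the function names are illustrative only.

```python
from math import prod


def combined_match_probability(locus_mps):
    """Combined MP: product of per-locus match probabilities (assumes independent loci)."""
    return prod(locus_mps)


def combined_power_of_exclusion(locus_pes):
    """Combined PE: 1 minus the product of per-locus non-exclusion probabilities."""
    return 1.0 - prod(1.0 - pe for pe in locus_pes)


# Hypothetical per-locus values for three loci (not taken from this study)
mps = [0.05, 0.02, 0.10]
pes = [0.60, 0.70, 0.50]

cmp_value = combined_match_probability(mps)
cpe_value = combined_power_of_exclusion(pes)
print(f"combined MP: 1 in {1 / cmp_value:.3g}")
print(f"combined PE: {cpe_value:.4f}")
```

Because the product rule relies on independence, any linkage between syntenic loci would need to be handled separately before values are multiplied in this way.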
No significant deviation from HWE was observed after applying the Bonferroni correction to account for the number of STR loci tested (p-value = 0.001), with the lowest p-value recorded at 0.0069 (specifically at the TPOX locus within the Irish population). Nevertheless, the indiscriminate application of the Bonferroni correction to HWE tests in the context of population genetics has been a topic of contentious debate. Critics argue that, when applied to HWE tests, the null hypothesis (H0) of the Bonferroni correction may be irrelevant, while it becomes easier for researchers to prematurely accept H0 and conclude that a locus conforms to HWE. Additionally, when this correction is applied to tests involving numerous loci, the probability of a single test conforming to HWE can greatly increase, and the probability of type II errors (accepting H0 when the alternative hypothesis is true) is also increased (Ye, Wang and Hou 2020). Consequently, to address these concerns, the conventional significance threshold of 0.05 without corrections was also employed to assess potential deviations and facilitate a more comprehensive discussion of these findings. With a p-value of 0.05, loci D3S1744, D11S2368, D13S317, and D14S1434 in the English population, and loci D12S391, D19S253, D19S433, and TPOX in the Irish population showed deviations from HWE. These deviations could stem from bias in the self-declaration of ethnicity by donors, or from inbreeding, population substructure, selection, or population size. This has previously been reported in the studies of Walsh et al. (2003) and Iyavoo et al. (2019), where the developed databases were determined to be fit for forensic use. Another common cause of deviation from equilibrium is genotyping error. In STR typing, null alleles resulting from primer binding site mutations are frequently encountered and can lead to lack of heterozygosity and false rare homozygotes, which would cause deviation from HWE (Graffelman and Weir 2022). However, the observed and expected heterozygosity for the 7 markers deviating from the equilibrium obtained in this study (Supplementary Table S2) did not reveal incongruencies and appeared in line with the results for the other markers and results from other investigations (Choueiri et al. 2006; Iyavoo et al. 2022).
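The comparison of observed and expected heterozygosity mentioned above can be sketched as follows. This is a minimal illustration with hypothetical genotypes, using the uncorrected expected heterozygosity (1 minus the sum of squared allele frequencies); it is not the implementation used by STRAF, which may apply a sample-size correction.

```python
from collections import Counter


def heterozygosity(genotypes):
    """Return (observed, expected) heterozygosity for a single locus.

    genotypes: list of (allele_a, allele_b) tuples, one per individual.
    Expected heterozygosity is the uncorrected 1 - sum(p_i^2).
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("at least one genotype is required")
    observed = sum(1 for a, b in genotypes if a != b) / n
    allele_counts = Counter(allele for pair in genotypes for allele in pair)
    total_alleles = 2 * n
    homozygosity = sum((c / total_alleles) ** 2 for c in allele_counts.values())
    return observed, 1.0 - homozygosity


# Hypothetical genotypes at one locus for four individuals
sample = [("8", "8"), ("8", "11"), ("11", "11"), ("8", "11")]
ho, he = heterozygosity(sample)
print(f"Ho = {ho:.3f}, He = {he:.3f}")
```

An observed value well below the expected value at a locus would be consistent with a heterozygote deficit of the kind that null alleles can produce.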

Variant alleles
Several variant alleles were observed in the English (Supplementary Table S1) and Irish (Supplementary Table S2) populations. A variant allele is an allele which is not covered by the binset of a kit (Gettings et al. 2015). Among them, the most frequently observed variant alleles were allele 14.3 at locus D1S1656 in both populations and allele 20.3 at locus D13S325 in the English population. Allele 14.3 at locus D1S1656 is covered in the binset of common forensic STR kits such as GlobalFiler™ PCR Amplification Kit (Applied Biosystems 2019), PowerPlex® Fusion System (Promega Corporation 2020), and VeriFiler™ Express PCR Amplification Kit (Applied Biosystems 2023). Locus D13S325 is not recommended by ESS and CODIS, thus allele frequencies for this marker are not available on online databases such as the National Institute of Standards and Technology (NIST) Population Dataset (STRBase; https://strbase.nist.gov/NISTpop.htm) (Ruitberg 2001), pop.STR (http://spsmart.cesga.es/popstr.php) (Amigo et al. 2009), and ALFRED, the Allele FREquency Database (https://alfred.med.yale.edu/) (Rajeevan et al. 2012) [accessed 2023 Feb 17]. The ENFSI DNA working group STR population database (STRidER; http://strider.online) (Bodner et al. 2016) has reported locus D13S325 allele frequencies for the Saudi population but allele 20.3 was not observed [accessed 2023 Feb 17]. In addition, microvariant alleles 15.3 and 16.3, which were detected at locus D1S1656, were identified as the most frequent variant alleles in the PanGlobal kit, but not in the 27comp kit since, in this kit, both these alleles were covered by the virtual bins.
Comparison of 6 overlapping loci between PanGlobal and 27comp showed consistent allele calls between them. Thus, these overlapping loci could be used for quality assurance, to prevent any sample mix-up when the 27comp kit is used as a complementary kit for the PanGlobal kit to solve any complex cases requiring additional loci.

Concordance study
A concordance study using control DNA samples and comparison of common loci (Supplementary Table S3) revealed that, for the HDplex kit at locus D6S474, the alleles were 1 repeat shorter than in the profiles of the 27comp and 23comp kits. This has been previously reported in a population study using the 23comp kit (Iyavoo et al. 2019). Discordance at locus D15S659 was also observed between the 27comp and 23comp kits, where the alleles in the 23comp kit were 1 repeat shorter than in the 27comp kit. Although the 27comp kit is an extended version of the 23comp kit with four additional STR loci, the discordance between these kits was not explained or rectified by the manufacturer. For the STR loci in the PanGlobal kit, no mismatch was found when comparison of common loci was carried out with other kits. As reported previously in a study conducted by Shrivastava et al. (2021b), concordance of PanGlobal loci was also found with AmpFℓSTR™ Identifiler™ Plus PCR Amplification Kit, PowerPlex® 21 System, GlobalFiler™ PCR Amplification Kit, PowerPlex® Fusion 6C System, and VeriFiler™ Plus PCR Amplification Kit.
The general assumption of recombination rate in humans predicts no linkage when a genetic distance of at least 50 cM is present between two loci, where 1 cM ≈ 1 Mb (Pritchard and Przeworski 2001; O'Connor and Tillmar 2012; Tillmar et al. 2017). However, especially in kinship tests, the estimated recombination fraction rather than physical distance should be considered to evaluate the possible impact of closely located loci (Phillips et al. 2012). Hence, the recombination fraction for syntenic pairs with less than 50 Mb distance has been reported, as calculated by Phillips (2017), in Table 1. A recombination rate below 10% could be observed between 3 syntenic pairs located less than 10 Mb apart: SE33-D6S1043, D18S51-D18S1364, and D21S2055-Penta D, while vWA-D12S391, located 6.36 Mb apart, had a recombination rate of 11.7%. Although the recombination fraction for the pair D21S11-D21S2055 was not available, the genetic distance and the recombination rate calculated between D21S11-D21S1270 and D21S1270-D21S2055 (0.16-0.17) would lead us to assume that the recombination between D21S11 and D21S2055 would be >10%. The pair vWA-D12S391 has been previously investigated and revealed the necessity of introducing linkage in the calculation of likelihood ratios for kinship testing, primarily in scenarios where the phase of the individuals typed was of relevance, e.g. incest (O'Connor and Tillmar 2012).
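The relationship between genetic distance and recombination fraction under the Kosambi mapping function can be sketched as below. This is a generic illustration of the standard formula, r = 0.5 · tanh(2d) with d in Morgans, and its inverse; the function names are hypothetical, and the sketch does not reproduce the calculations of Phillips (2017), which rely on locus-specific genetic map positions.

```python
import math


def kosambi_recombination_fraction(distance_cm: float) -> float:
    """Recombination fraction for a genetic distance in centimorgans (Kosambi)."""
    if distance_cm < 0:
        raise ValueError("distance must be non-negative")
    d_morgans = distance_cm / 100.0
    return 0.5 * math.tanh(2.0 * d_morgans)


def kosambi_distance_cm(recombination_fraction: float) -> float:
    """Inverse Kosambi function: genetic distance in centimorgans."""
    r = recombination_fraction
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


# Illustrative inputs: a 50 cM separation, and a recombination fraction of 11.7%
r_50 = kosambi_recombination_fraction(50.0)
d_from_r = kosambi_distance_cm(0.117)
print(f"r at 50 cM: {r_50:.4f}")
print(f"distance for r = 0.117: {d_from_r:.2f} cM")
```

The function saturates towards 0.5 as distance grows, which is why widely separated syntenic loci behave as if unlinked.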
To test for LD between each pair of loci in both the English and Irish populations, the p-values for the presence of linkage were calculated (Supplementary Table S4). As shown in Table 1, no significant linkage between each of the syntenic pairs in the combined kit was observed, apart from D3S3045-D3S1744 in the Irish population, with a p-value of 0.0349. This shows that in the datasets presented in this study, no association between the closest syntenic loci (<10 Mb) was observed.
Regardless of these data, since ignoring any degree of physical linkage between syntenic STRs could be problematic (Nothnagel et al. 2010; O'Connor and Tillmar 2012), care would be advised for closely located markers. In such cases, the options to consider are excluding one of the markers in the syntenic pair, treating the pair as a single marker, or using biostatistical tools to evaluate the impact of the linkage on the likelihood ratio (O'Connor and Tillmar 2012; Phillips et al. 2012; Tillmar and Phillips 2017).

Genetic distance study
A genetic distance study between the English and Irish populations using 44 STR loci from both kits revealed genetic similarity between them (pairwise Fst = 0.0005), with low differentiation defined as Fst between 0 and 0.05 (Sethuraman 2013). This similarity is also illustrated in the PCA scatter plot (Figure 3), where the first two principal components, PC1 and PC2, account for a minuscule 0.75% and 0.78% of the variance between these two populations, respectively. This shows the genetic affinity between them, enabling the combination of both populations as a single database, if required. The genetic proximity observed between English and Irish populations mirrors the outcomes reported in a previous study utilising the VeriFiler Express PCR Amplification Kit, which yielded an Fst value of 0.0013 along with PC1 and PC2 values of 0.88% and 0.86%, respectively (Perry et al. 2023).
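For readers unfamiliar with Fst, the sketch below computes a simple single-locus fixation index between two populations as (HT - HS)/HT from allele frequencies. It weights both populations equally and applies no sample-size correction, so it is only a conceptual illustration; it is not necessarily the estimator implemented in the software used in this study, and the allele frequencies shown are hypothetical.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity 1 - sum(p_i^2) from an allele -> frequency mapping."""
    return 1.0 - sum(p * p for p in freqs.values())


def fst_two_populations(freqs_a, freqs_b):
    """Single-locus Fst = (HT - HS) / HT for two equally weighted populations."""
    alleles = set(freqs_a) | set(freqs_b)
    mean_freqs = {
        a: (freqs_a.get(a, 0.0) + freqs_b.get(a, 0.0)) / 2.0 for a in alleles
    }
    hs = (expected_heterozygosity(freqs_a) + expected_heterozygosity(freqs_b)) / 2.0
    ht = expected_heterozygosity(mean_freqs)
    if ht == 0.0:
        return 0.0  # locus is monomorphic across both populations
    return (ht - hs) / ht


# Hypothetical allele frequencies at one locus
pop_1 = {"10": 0.50, "11": 0.50}
pop_2 = {"10": 0.50, "11": 0.50}
fst_identical = fst_two_populations(pop_1, pop_2)

# Two populations fixed for different alleles (complete differentiation)
fst_fixed = fst_two_populations({"10": 1.0}, {"11": 1.0})
print(fst_identical, fst_fixed)
```

Identical frequencies give a value of 0 and complete fixation for different alleles gives 1, which frames the 0 to 0.05 range cited above as low differentiation.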
Comparison with other populations was not performed as there is currently a lack of population data for the STR markers present in these kits.

Conclusion
The SureID® PanGlobal Human DNA Identification Kit, containing both CODIS and ESS recommended loci, has already been evaluated in some studies on Asian populations. In contrast, the SureID® 27comp Human DNA Identification Kit is a relatively new kit that shows potential in increasing the discrimination power when used in combination with other kits. The outcomes of this study highlight the exceptional discriminatory power and robustness of both the PanGlobal and 27comp kits for kinship testing and forensic identification within the English and Irish populations. Whether employed individually or in conjunction, these kits exhibited no deviation from HWE when the Bonferroni correction was applied. Furthermore, when a p-value of 0.05 was utilised, only a minimal number of significant deviations were observed, likely attributable to false positives.
Since these kits are not well studied for European populations and frequency databases are lacking, allele calls for some of the variant alleles (labelled as off-ladder alleles) should be assigned with care. Laboratories using these kits could also add virtual bins to cover the variant alleles, at their discretion. Moreover, due to the discordance found between the 27comp kit and the HDplex and 23comp kits, it is recommended that, for results comparison or utilisation of databases generated using the 27comp kit profiles, caution should be exercised to avoid wrong allele calls.
Another issue to keep under consideration when using these kits, especially when combined, is the presence of syntenic loci that could lead to biased likelihood ratio calculations for kinship testing. In this study, no significant LD was observed between the most closely located loci. However, careful consideration should be given to choosing a method that avoids this bias.
To summarise this evaluation, the genetic distance between the two populations was estimated, revealing a very close relationship. This meant that the data for the English and Irish populations could be combined to provide a well-sized database for these two SureID® kits. More population studies using these kits would be required to generate data that could be used for comparison purposes. This would provide a deeper understanding of the effectiveness of these markers in distinguishing amongst populations.
Ultimately, in order to benefit from the increased discrimination power, the laboratories deciding to utilise these two kits, individually or combined, will have to apply their judgement to evaluate if they can take the necessary steps to address the problems arising from the current lack of frequency database information, the existence of variant alleles, the observed occasional discordance, and the presence of syntenic loci.

Figure 1. Graphical comparison of allele frequency distribution between English and Irish populations for each marker of the combined SureID® kits.

Figure 2. Allele count for the combined SureID® markers in English and Irish populations. The dashed red line represents the minimum number of alleles counted among the markers under consideration. The green frame represents markers from the 27comp kit, while the blue frame represents markers from the PanGlobal kit. The 6 markers included in both frames represent loci shared by both kits.

Figure 3. PCA scatter plot portraying the genetic relationship between English and Irish populations.

Table 1. Chromosomal position and approximate genetic location (GRCh37) for 44 STR loci obtained from combining the PanGlobal and 27comp kits. For each pair with a genetic distance below 50 Mb (in bold), the recombination fraction (Kosambi mapping function) is reported. STR pairs with a recombination rate below 10% are reported in bold. Data were retrieved from Phillips (2017). The recombination fraction for the D21S11-D21S2055 pair was not available in the Phillips (2017) article and was therefore omitted (N/A). Pairwise LD p-values calculated using STRAF for syntenic pairs in both the English and Irish populations are also shown, with significant p-values (<0.05) in bold.