Development of an Okinawa panel for biogeographic inference of Okinawans

Abstract
Background: The Precision ID Ancestry Panel, with 165 SNP markers, was unable to differentiate between mainland Japanese and Okinawa Japanese or to distinguish either group from other East Asian populations.
Aim: An Okinawa panel was developed to further separate Okinawa Japanese individuals from mainland Japanese and other Asian groups. Seventy-five SNPs were selected from the most informative markers reported in the literature. From these, a subset of 22 SNPs was selected to separate Okinawa Japanese with a minimal number of markers.
Subjects and methods: Samples were collected from 48 unrelated individuals from mainland Japan and 46 unrelated residents of Okinawa Prefecture. Data were evaluated by STRUCTURE, principal component (PCA), and GenoGeographer analyses.
Results: The 22-SNP set showed levels of differentiation in the STRUCTURE and PCA analyses similar to those of the 75-SNP set. In the GenoGeographer analysis, the 75-SNP and 22-SNP sets correctly assigned the Okinawan population as the most likely population of origin for 32 and 31 of the 46 Okinawa Japanese individuals, respectively.
Conclusion: Neither SNP set could completely differentiate Okinawa Japanese from other Asian groups. However, both sets should be useful in criminal investigations when sample material, cost, and time are limited.


Introduction
Ancestry informative markers (AIMs) are useful for estimating the biogeographic background of an individual and may be used to narrow down the number of possible suspects in a crime (Hollard et al. 2017; Sun et al. 2019). The development of next generation sequencing and of commercial kits specifically designed for estimation of biogeographic ancestry (Jäger et al. 2017; Pereira et al. 2017) has made it easier to implement ancestry inference as a routine investigation in forensic genetic laboratories. Most commercial panels were developed to separate major population groups, which means that intracontinental variation is usually not represented and that these panels may not be suitable for distinguishing among closely related populations. Therefore, specific sets of AIMs that can differentiate among populations within a specific region have been reported, for example for North African and Middle Eastern populations (Pereira et al. 2019; Truelsen et al. 2021), European and South Asian populations (Phillips et al. 2013), East Asian countries (Shi et al. 2019), Korea (Jung et al. 2019), and Japan (Yuasa et al. 2015).
The population structure of modern Japan is described by the dual structure model. The first inhabitants of the Japanese archipelago (the Jomon people) came from Southeast Asia during the Upper Palaeolithic, approximately 30,000 years before the current era (BCE). Historical evidence documents that this group admixed with a second group of migrants from Northeast Asia (the Yayoi people) in the Yayoi period, approximately 300 BCE (Hanihara 1991). Although the dual structure model has not been rejected completely, recent genetic analyses suggest that the Japanese population might have an even more complex structure (Jinam et al. 2015). The present-day Ainu, an indigenous group living on Hokkaido in the northernmost part of Japan, and the Ryukyuans, an indigenous group living on Okinawa in the south, are believed to retain more Jomon components based on morphological measurements (Jinam et al. 2015). Phylogenetic analyses of 641,314 SNPs showed that mainland Japanese were located between an Ainu-Ryukyuan cluster and continental Asians, suggesting that they are an admixed population with components from these two groups (Jinam et al. 2012). Yamaguchi-Kabata et al. (2008) analysed 140,387 SNPs in 7,001 individuals from Japan and showed that the Ryukyu (another name for Okinawa) individuals formed a distinct cluster that was clearly separated from the mainland individuals. However, typing such a large number of SNPs is not practical for routine forensic genetic investigations. Nakanishi et al. (2018) typed mainland Japanese and Okinawan Japanese using the 165 SNPs of the Precision ID Ancestry Panel (Thermo Fisher Scientific, Waltham, MA) and performed population genetic analyses of the two populations. Their results showed that the panel could not differentiate between mainland Japanese, Okinawan Japanese, and other Asian populations. However, despite this lack of resolution, several SNPs were identified whose frequencies differed significantly between mainland Japanese and Okinawan Japanese.
Today, Okinawa is one of the most popular tourist destinations in Japan, and many tourists from mainland Japan and nearby countries visit the islands. Moreover, many Americans live in Okinawa because there are several active US military bases on the main island. In a multi-ethnic environment such as Okinawa Prefecture, DNA-based ancestry inference is therefore likely to be important in criminal investigations. Because sample material, cost, and time are generally limited in criminal investigations, the ability to distinguish Okinawan Japanese using a small number of SNPs would be useful. In this work, we developed an SNP set to further differentiate between Okinawan and mainland Japanese. To reduce the number of SNPs needed for investigations, we selected and evaluated the most suitable SNPs and created an "Okinawa panel" with as few SNPs as possible that may separate Okinawa Japanese individuals from mainland Japanese and other Asian groups.

Samples and DNA extraction
Samples were collected from 48 unrelated individuals from mainland Japan, living in 34 prefectures, and from 46 unrelated residents of Okinawa Prefecture. For each individual typed in this study, both parents were from the same prefecture. Genomic DNA was extracted from buccal swabs or FTA cards using the QIAamp DNA Mini Kit (Qiagen, Venlo, the Netherlands) following the manufacturer's recommendations. Informed consent was obtained from all participants.

Selection and sequencing of SNPs
Supplementary Table S1 lists the 75 SNPs evaluated in this study, which were selected from the studies of Yamaguchi-Kabata et al. (2008 … 2018) (5 SNPs). These SNPs were chosen among SNPs showing significant differences in genotype frequencies between mainland Japanese and Okinawa Japanese; the genetic distance for each SNP was also considered. The 27 SNPs selected from the study of Nakanishi et al. (2018) were typed with the Precision ID Ancestry Panel following the methods described in that study. The remaining 48 SNPs were typed by Sanger sequencing using the BigDye Terminator v1.1 Cycle Sequencing Kit (Thermo Fisher Scientific) and a 3730 Genetic Analyzer (Thermo Fisher Scientific). The primer sequences are shown in Supplementary Table S2. PCR amplification of the 48 SNPs was performed in 18 separate reactions, each containing 1–4 SNPs, as shown in Supplementary Table S2. Twenty-microlitre reaction mixtures contained 10 μl KOD One PCR Master Mix (Toyobo, Osaka, Japan), 1 μl of each 10 μM oligonucleotide primer (final concentration 0.5 μM each), and 2 ng template DNA. Three-step PCR was performed using a SimpliAmp Thermal Cycler (Thermo Fisher Scientific) with the following program: 40 cycles of 98 °C for 10 s, 60 or 62 °C for 5 s, and 68 °C for 1 s. Two-step PCR was performed using a SimpliAmp Thermal Cycler with the following program: 40 cycles of 98 °C for 10 s and 68 °C for 5 s.
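The final primer concentration stated above follows from the dilution relation C1·V1 = C2·V2. A minimal Python check of that arithmetic; the function name `final_concentration` is hypothetical and only illustrates the calculation, and the inputs are the volumes and concentrations given in the text:

```python
def final_concentration(stock_um: float, added_ul: float, total_ul: float) -> float:
    """Return the final concentration (uM) after diluting a stock into a reaction.

    Uses C1 * V1 = C2 * V2, so C2 = C1 * V1 / V2.
    """
    if total_ul <= 0:
        raise ValueError("total reaction volume must be positive")
    return stock_um * added_ul / total_ul


# Values from the reaction set-up described in the text.
REACTION_VOLUME_UL = 20.0  # total reaction volume
PRIMER_STOCK_UM = 10.0     # primer stock concentration
PRIMER_VOLUME_UL = 1.0     # volume of each primer added

print(final_concentration(PRIMER_STOCK_UM, PRIMER_VOLUME_UL, REACTION_VOLUME_UL))  # 0.5
```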

Data analysis
Allele frequencies, Hardy-Weinberg equilibrium (HWE), and Wright's fixation index (FST) from locus-by-locus analysis of molecular variance (AMOVA) were calculated using Arlequin v3.5.2.2 (Excoffier and Lischer 2010). The HWE analysis was carried out using 1,000,000 Markov chain Monte Carlo (MCMC) steps and 1,000,000 dememorization steps. Pairwise linkage disequilibrium (LD) tests and tests of genetic differentiation between populations (Fisher's exact test for each SNP) were performed using Genepop v4.7.5 (Rousset 2008). Both tests were carried out using 10,000 MCMC steps, 1,000 batches, and 10,000 iterations per batch. Correction for multiple testing was performed according to Bonferroni (1936), by dividing the significance level of 0.05 by the number of comparisons.
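Genepop approximates the exact-test p-values with a Markov chain. For a single biallelic SNP compared between two populations, however, the allele-count table is only 2 × 2, so the exact p-value can also be enumerated directly. The sketch below assumes that allelic version of the test (two-sided, summing all tables no more probable than the observed one); it is not Genepop's implementation, the function names are hypothetical, and the counts in the usage line are illustrative only. It also computes the Bonferroni threshold for 75 tests:

```python
from math import comb


def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are the two populations; columns are counts of the two alleles.
    All tables with the same margins are enumerated, and the p-value is the
    sum of the hypergeometric probabilities that do not exceed the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def table_prob(x: int) -> float:
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = table_prob(a)
    total = 0.0
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        p = table_prob(x)
        if p <= p_obs * (1 + 1e-7):  # relative tolerance for floating-point ties
            total += p
    return min(1.0, total)


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


print(round(bonferroni_threshold(0.05, 75), 6))  # 0.000667
print(round(fisher_exact_2x2(1, 9, 11, 3), 6))   # 0.002759 (illustrative counts)
```

The Bonferroni level of 0.05/75 ≈ 0.00067 is the threshold that appears, rounded to 0.0007, in the HWE results.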
STRUCTURE, PCA, and GenoGeographer interpopulation analyses were performed using data from seven Asian populations. In addition to the mainland Japanese (MNL) and Okinawa Japanese (OKI), reference data from five populations (93 Chinese Dai in Xishuangbanna, China (CDX); 103 Han Chinese in Beijing, China (CHB); 105 Han Chinese South (CHS); 104 Japanese in Tokyo, Japan (JPT); and 99 Kinh in Ho Chi Minh City, Vietnam (KHV)) were obtained from the 1000 Genomes Project (https://www.genome.gov/27528684/1000-genomes-project). The distribution of genetic ancestry was investigated using STRUCTURE v2.3.4 (Pritchard et al. 2000; Falush et al. 2003) with a burn-in of 100,000 steps followed by 100,000 MCMC repetitions. The "admixture" and "correlated allele frequencies" models were used. The number of assumed populations (K) was varied from K = 2 to K = 6, and three independent runs were performed for each value of K to check the consistency of the results. To determine the most likely value of K, Structure Harvester v0.6.94 (Earl and vonHoldt 2012), which implements the Evanno method (Evanno et al. 2005), was used. The STRUCTURE results were visualised using CLUMPP v1.1.2 (Jakobsson and Rosenberg 2007) and Distruct v1.1 (Rosenberg 2003). PCA was performed using an in-house Python script based on sklearn.decomposition.PCA from scikit-learn v0.16.1 (Halko et al. 2011).
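The Evanno method used to choose K is short enough to sketch. The minimal version below assumes that ΔK is computed from the mean ln P(D) across the independent runs for each K and divided by the standard deviation of ln P(D) at that K; the function name and the likelihood values are hypothetical and are not the study's results:

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k: dict[int, list[float]]) -> dict[int, float]:
    """Evanno's delta K from STRUCTURE log-likelihoods, ln P(D), per K.

    lnp_by_k maps each K to the ln P(D) values of its independent runs;
    K values are assumed to be consecutive integers with at least two runs each.
    delta K = |L(K+1) - 2 L(K) + L(K-1)| / sd(L(K)), using the mean L over runs.
    It is undefined for the smallest and largest K tested.
    """
    ks = sorted(lnp_by_k)
    means = {k: mean(runs) for k, runs in lnp_by_k.items()}
    delta = {}
    for k in ks[1:-1]:
        second_diff = means[k + 1] - 2 * means[k] + means[k - 1]
        delta[k] = abs(second_diff) / stdev(lnp_by_k[k])
    return delta


# Hypothetical ln P(D) values: three runs for each K = 2..6.
example = {
    2: [-100.0, -101.0, -99.0],
    3: [-80.0, -80.5, -79.5],
    4: [-78.0, -77.0, -79.0],
    5: [-77.5, -76.5, -78.5],
    6: [-77.0, -76.0, -78.0],
}
delta_k = evanno_delta_k(example)
best_k = max(delta_k, key=delta_k.get)
print(best_k, delta_k)
```

With K tested from 2 to 6, ΔK can only be evaluated for K = 3, 4, and 5, which is why the edges of the tested range can never be selected by this criterion.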
Each individual was further analysed using the GenoGeographer software (Tvedebrink et al. 2017; Mogensen et al. 2020), which was developed to assign the most likely population of origin and to calculate the statistical weight of the evidence in the form of a likelihood ratio (LR). A z-score was computed for each individual, using the seven Asian populations, including MNL and OKI, as reference populations. The analyses were performed using a leave-one-out approach, in which the tested individual was excluded from the reference dataset each time the analysis was conducted. The population assignment was classified as "Rejected" if no reference population was assigned (z-score > 1.64, p < 0.05 for all populations); "Ambiguous" if more than one metapopulation was assigned (z-score ≤ 1.64; p ≥ 0.05) and the likelihoods of the assigned populations were not significantly different from each other; or "Accepted" if only one metapopulation was assigned or if, among several assigned metapopulations, the likelihood of one was significantly higher than those of all other assigned metapopulations (p < 0.05).
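GenoGeographer computes the z-scores and the likelihood comparisons itself. The sketch below only encodes the three decision rules stated above, assuming that a population counts as assigned when its z-score is at most 1.64. The names `classify_assignment` and `significantly_higher` are hypothetical and are not part of GenoGeographer's interface, and the input values are invented for illustration:

```python
from typing import Callable

Z_CRITICAL = 1.64  # one-sided 5% critical value of the standard normal distribution


def classify_assignment(
    z_scores: dict[str, float],
    log_likelihoods: dict[str, float],
    significantly_higher: Callable[[str, str], bool],
) -> tuple[str, list[str]]:
    """Apply the Rejected / Ambiguous / Accepted rules to one profile.

    z_scores: z-score of the profile against each reference population.
    log_likelihoods: log-likelihood of the profile under each population.
    significantly_higher(a, b): hypothetical stand-in for a test of whether
        population a's likelihood is significantly higher than b's (p < 0.05).
    """
    # A population is "assigned" if the profile is not an outlier for it.
    assigned = [pop for pop, z in z_scores.items() if z <= Z_CRITICAL]
    if not assigned:
        return "Rejected", []
    if len(assigned) == 1:
        return "Accepted", assigned
    best = max(assigned, key=lambda pop: log_likelihoods[pop])
    if all(significantly_higher(best, other) for other in assigned if other != best):
        return "Accepted", [best]
    return "Ambiguous", sorted(assigned)


# Hypothetical values for one profile.
z = {"OKI": 0.4, "MNL": 1.1, "CHB": 2.3}
ll = {"OKI": -41.0, "MNL": -44.5, "CHB": -60.2}
print(classify_assignment(z, ll, lambda a, b: True))   # ('Accepted', ['OKI'])
print(classify_assignment(z, ll, lambda a, b: False))  # ('Ambiguous', ['MNL', 'OKI'])
```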

Results and discussion
The allele frequencies for the 75 loci in the mainland Japanese and the Okinawan Japanese are listed in Supplementary Table S3. None of the loci were monomorphic in either population, and none of the 75 SNPs showed a statistically significant deviation from HWE in either population (Supplementary Table S4; Bonferroni-corrected significance threshold p < 0.0007). Pairwise LD testing found no significant association between loci (data not shown). Supplementary Table S5 lists the genotype counts in the mainland and Okinawan Japanese and the p-values from Fisher's exact test for the 75 markers. Overall, 35 SNPs showed significant differences between the mainland and Okinawan Japanese. One of these SNPs was rs17822931-G/A (538G>A; Gly180Arg), located in the ABCC11 gene, which determines earwax type (Yoshiura et al. 2006): AA homozygotes have dry earwax, whereas AG heterozygotes and GG homozygotes have wet earwax (Ohashi et al. 2011). In this study, the frequency of the G allele (presented as the complementary allele, C, in Supplementary Table S3) was significantly higher in the Okinawa Japanese than in the mainland Japanese (p = 0.006). This agrees not only with a prior study (Yamaguchi-Kabata et al. 2008) but also with the report that this SNP has different allele frequencies in the Jomon and Yayoi people (Super Science High School Consortium 2009).
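Arlequin tests HWE with a Markov chain approximation of the exact test. For a single biallelic SNP, the exact distribution of heterozygote counts given the allele counts can be enumerated in full, so the HWE check reported above can be sketched as follows. This is a simplified illustration, not Arlequin's code; `hwe_exact_p` and the genotype counts in the usage lines are hypothetical:

```python
from math import factorial


def hwe_exact_p(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Exact test of Hardy-Weinberg proportions for one biallelic SNP.

    Enumerates every possible heterozygote count compatible with the observed
    allele counts and sums the probabilities that do not exceed the observed one.
    """
    n = n_aa + n_ab + n_bb
    n_a = 2 * n_aa + n_ab  # copies of allele a
    n_b = 2 * n_bb + n_ab  # copies of allele b
    numer_const = factorial(n) * factorial(n_a) * factorial(n_b)
    denom_const = factorial(2 * n)

    def het_prob(het: int) -> float:
        # P(het heterozygotes | n individuals, n_a copies of allele a)
        hom_a = (n_a - het) // 2
        hom_b = (n_b - het) // 2
        numer = 2 ** het * numer_const
        denom = factorial(hom_a) * factorial(het) * factorial(hom_b) * denom_const
        return numer / denom

    p_obs = het_prob(n_ab)
    total = 0.0
    # Heterozygote counts must have the same parity as n_a.
    for het in range(n_a % 2, min(n_a, n_b) + 1, 2):
        p = het_prob(het)
        if p <= p_obs * (1 + 1e-7):  # relative tolerance for floating-point ties
            total += p
    return min(1.0, total)


threshold = 0.05 / 75  # Bonferroni-corrected level for 75 SNPs
p = hwe_exact_p(20, 22, 4)  # hypothetical genotype counts for 46 individuals
print(p, p < threshold)
```

Exact integer factorials keep the calculation precise for sample sizes of the order used here (about 50 individuals per population).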
Supplementary Table S6 shows the locus-by-locus AMOVA FST values and respective p-values for the 75 markers. From these, a subset of 22 SNPs was selected based on an FST > 6.5% and a significant p-value (p < 0.05) between the mainland Japanese and the Okinawan Japanese. STRUCTURE, principal component, and GenoGeographer analyses were performed using both the original 75-SNP set and the 22-SNP subset. Besides the two Japanese populations, the analyses included reference data for other Asian populations from the 1000 Genomes Project (https://www.genome.gov/27528684/1000-genomes-project).
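The selection rule above (FST > 6.5% and p < 0.05) can be sketched as a simple filter. To keep the example self-contained, FST is computed here with a simplified heterozygosity-based estimator, (H_T − H_S)/H_T, which is a swapped-in stand-in for the locus-by-locus AMOVA estimate from Arlequin and applies no sample-size correction. The SNP identifiers, frequencies, and p-values are hypothetical, not values from Supplementary Table S6:

```python
from typing import NamedTuple


class SnpStat(NamedTuple):
    snp_id: str
    fst: float      # FST between mainland and Okinawan Japanese
    p_value: float  # p-value associated with the FST estimate


def two_population_fst(p1: float, p2: float) -> float:
    """Simplified FST = (H_T - H_S) / H_T for one biallelic SNP in two populations.

    p1, p2: frequency of the same allele in each population. No sample-size
    correction is applied, unlike AMOVA-based estimates.
    """
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity of the pooled population
    if h_t == 0:
        return 0.0
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population heterozygosity
    return (h_t - h_s) / h_t


def select_snps(stats: list[SnpStat], min_fst: float = 0.065, alpha: float = 0.05) -> list[str]:
    """Keep SNPs with FST above min_fst (6.5%) and p below alpha."""
    return [s.snp_id for s in stats if s.fst > min_fst and s.p_value < alpha]


# Hypothetical SNPs, not values from Supplementary Table S6.
stats = [
    SnpStat("snp_1", two_population_fst(0.20, 0.60), 0.001),
    SnpStat("snp_2", two_population_fst(0.40, 0.45), 0.300),
    SnpStat("snp_3", 0.080, 0.120),
]
print(select_snps(stats))  # ['snp_1']
```

Requiring both criteria excludes SNPs such as `snp_3`, whose FST passes the cut-off but whose differentiation is not statistically supported.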
STRUCTURE analyses were run from K = 2 to K = 6 (data not shown). Figure 1 shows the STRUCTURE results at the optimal K = 3 detected by Structure Harvester for the 75-SNP (Figure 1A) and 22-SNP (Figure 1B) sets. The global apportionment of ancestry for mainland and Okinawan Japanese is similar for both sets. The mainland Japanese share ancestry components with the other Asian populations, whereas the Okinawan Japanese appear to form a separate cluster dominated by the blue component. The same pattern was observed in the PCA plot (Supplementary Figure 1), in which the Okinawan Japanese were more clearly separated from the main cluster that included the other Asian populations (CDX, CHS, CHB, KHV, JPT, and mainland Japanese).
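The PCA separation described above was obtained with scikit-learn's PCA class. An equivalent minimal sketch uses NumPy's SVD directly and assumes that genotypes are coded as 0/1/2 allele counts, since the coding used in the in-house script is not stated. The function name and genotypes are hypothetical:

```python
import numpy as np


def genotype_pca(genotypes, n_components: int = 2):
    """PCA of an individuals x SNPs matrix of allele counts (0, 1 or 2).

    Each SNP column is mean-centred (as sklearn.decomposition.PCA does by
    default) and the principal components are obtained by SVD.
    Returns the individual scores and the explained-variance ratios.
    """
    x = np.asarray(genotypes, dtype=float)
    x = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    variance = s ** 2
    return scores, variance[:n_components] / variance.sum()


# Hypothetical genotypes: two individuals per group, four SNPs.
group_a = [[0, 0, 1, 0], [0, 1, 0, 0]]
group_b = [[2, 2, 1, 2], [2, 1, 2, 2]]
scores, ratio = genotype_pca(group_a + group_b)
print(np.round(scores[:, 0], 3), np.round(ratio, 3))
```

The sign of each principal component is arbitrary in an SVD, so only the relative positions of individuals along an axis, not its direction, should be interpreted.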
The GenoGeographer analysis was performed considering each of the seven Asian metapopulations. The most likely population of origin was determined for each individual using a leave-one-out approach (Table 1). The 75-SNP set correctly assigned the Okinawan population as the most likely population of origin for 32 individuals (70%). Two individuals were rejected, meaning that none of the reference populations was a likely population of origin. For the remaining 12 individuals, the population assignment was classified as ambiguous, meaning that two or more reference populations were assigned as likely populations of origin and that their likelihoods were not statistically significantly different. These 12 individuals were classified as originating from MNL, OKI, and/or JPT (Supplementary Figure 2). The 22-SNP set correctly assigned the Okinawan population as the most likely population of origin for 31 individuals (67%). Three individuals were rejected. Seven of the remaining 12 individuals, all with an "Ambiguous" classification, were classified as Japanese (any combination of MNL, OKI, and JPT; see Supplementary Figure 3). The statistical weights of the population assignments were calculated in the form of likelihood ratios (Table 2) using the likelihoods reported by GenoGeographer. With the 75-SNP set, strong evidence (LR > 10,000) for Okinawan origin was obtained for the Okinawan individuals with an "Accepted" classification. Only when the alternative hypothesis was that the individual was mainland Japanese did 12 of the 32 individuals obtain an LR below 10,000. As expected, the LRs were lower with the 22-SNP set. Nevertheless, LRs > 10,000 were obtained for 11, 26, and 23 of the Okinawans with an "Accepted" classification when the alternative hypothesis was that the tested individual was MNL, CHB, or KHV, respectively. For Okinawans classified as "Ambiguous", the LRs were even lower and reached 10,000 or higher only when all 75 SNPs were used and the alternative hypothesis was that the individual was CHB or KHV.
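The LRs and assignment rates above are simple functions of the reported likelihoods and counts. A minimal sketch, assuming the likelihoods are available as natural logarithms (GenoGeographer's own output format may differ); the function names and the log-likelihood values in the usage lines are hypothetical, while the counts 32/46 and 31/46 are those reported above:

```python
import math

STRONG_EVIDENCE_LR = 10_000  # threshold used in the text for strong evidence


def likelihood_ratio(loglik_h1: float, loglik_h2: float) -> float:
    """LR = L(H1) / L(H2), computed from natural-log likelihoods."""
    return math.exp(loglik_h1 - loglik_h2)


def assignment_rate(correct: int, total: int) -> float:
    """Percentage of individuals assigned to the correct population."""
    return 100 * correct / total


print(round(assignment_rate(32, 46)), round(assignment_rate(31, 46)))  # 70 67

# Hypothetical log-likelihoods under H1 = OKI and H2 = MNL.
lr = likelihood_ratio(-41.0, -51.5)
print(f"LR = {lr:.0f}, strong evidence: {lr > STRONG_EVIDENCE_LR}")
```

Working with differences of log-likelihoods avoids underflow, because the likelihood of a full multi-locus profile is usually far too small to store directly as a floating-point number.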
Most of the mainland Japanese typed in this work were classified as ambiguous with both SNP sets, suggesting that this panel is not suitable for discriminating mainland Japanese from other Asian populations. This was also reflected in the statistical weights of the population assignments (Supplementary Table S7), which were lower than those for the Okinawans and rarely reached 10,000. Furthermore, the likelihoods were not significantly different for many of the comparisons (25% and 41% for the 75- and 22-SNP sets, respectively), and LRs were therefore not calculated for these comparisons.
Although the weights of the population assignments of Okinawan individuals were lower with the 22-SNP set than with the 75-SNP set, separation of the Okinawan population from the other Asian groups could still be obtained with the smaller number of markers. The 22-SNP set may therefore be useful in cases where the sample material is limited. However, for most purposes, the 75-SNP set would be more useful for the identification of Okinawa Japanese. Nevertheless, more studies with other Asian populations and with more individuals from both mainland Japan and Okinawa are needed to confirm and validate the applicability of the two panels.
In conclusion, we created the Okinawa panel by selecting SNPs that differed significantly between the mainland and Okinawa Japanese. The Okinawa panel represents a novel SNP set towards the differentiation of Okinawan Japanese from mainland Japanese and other Asian groups (Yamaguchi-Kabata et al. 2008; Watanabe et al. 2021). Mizuno et al. (2021) reported in a computer simulation that approximately 3,000 randomly selected SNPs were needed to distinguish Japanese subjects from other East Asians. A panel with a small number of SNPs is therefore unlikely to completely differentiate Okinawa Japanese from other Asian groups; however, the Okinawa panel should be useful in criminal investigations when sample material, cost, and time are limited.
Funding
The author(s) reported there is no funding associated with the work featured in this article.

Figure 1. Membership coefficients from STRUCTURE analysis based on the (A) 75-SNP and (B) 22-SNP sets. The most likely number of clusters (K) for both sets is 3.

Table 1. Results of the GenoGeographer analyses using the 75-SNP set (upper) and the 22-SNP set (lower).
a No likely population among the reference populations.
b AIM profile has at least two likely populations of origin, and the likelihoods are not statistically significantly different.