Haplotype analysis for Irish ancestry

Abstract
Forensic haplotype analysis of the male Y chromosome is currently used to establish the number of male donors in sexual assaults, the number of male bleeders in blood pattern analysis, and for ancestry correlation to genetic founder populations in biogeographic studies. In forensic laboratory applications, its primary use is DNA profile generation from trace amounts of male DNA in the presence of excess female DNA (e.g. spermatozoa identification, male component of fingernail scrapings). Our study supports the potential use of the Y chromosome in a “dragnet” approach (most haplotypes are unique) similar to that described by Kayser in 2017 for solving a cold case sex assault and homicide in The Netherlands. Our study also investigated whether an ancestral Irish genetic “footprint” linked to the surname O’Brien could be identified, and it identified multiple founder group origins in Ireland and England as well as three samples with the Dal Riata (a Gaelic overkingdom) ancestral haplotype. This study indicates that haplotype correlates with Irish ancestry but not conclusively with the O’Brien surname.


Introduction
Genetic markers are used routinely for forensic applications and DNA databases [1][2][3][4][5][6]. Y-STRs are a specialty form of DNA testing that can be a useful tool for forensic applications. Y-STR markers are male specific, inherited as a genetic block, and often represent a common paternal lineage [7][8][9][10]. Over the years, Y chromosome-specific STRs have been used in criminal casework involving sexual assault with a multitude of different scenarios such as mixed stain samples, fingernail scrapings, sperm negative casework, cases of multiple male assailants and when evolutionary and genealogical background needs to be inferred [7][8][9][10]. Variations in DNA that are inherited together are known as haplotypes.
Haplotypes are important because, like surnames, they are passed down generationally from father to son. Individuals who share a common surname might be expected to share more of their DNA if genetically related and, barring mutation, should be genetically identical by descent (IBD) [6,7]. For this project, the loci DYS19, DYS389I/II, DYS390, DYS391, DYS392, DYS393 and DYS385a/b (the European minimal haplotype), the loci DYS438 and DYS439 (recommended by the Scientific Working Group on DNA Analysis Methods (SWGDAM)) and the locus DYS437 were used in the comparison of surnames [7,8]. A patrilineal surname is inherited in the same way as the non-recombining region of the Y chromosome; therefore, a correlation between the two should be recognizable [7][8][9][10][11][12][13][14].
There are several Irish ancestry projects related to biogeographical ancestry. Three major ones are described here. The Ireland Y-DNA Project has 8 580 members and offers both Y-chromosome (paternal) testing and mitochondrial DNA (maternal) testing to help verify historical information regarding ancestry. One hundred and eighty-one samples with the surname O'Brien are included in this project (https://www.familytreedna.com/groups/ireland-heritage/about, accessed May 9, 2019). A second project researches the history of the Irish clans to identify founder populations and associated relatives many generations later. The goals of the Irish clan Y-DNA project are to connect genetic relatives and verify paternity and county of origin. Over 60 Irish clans are listed (http://www.clansofIreland.ie/baile/dna, accessed October 10, 2018). A third project has easily accessible haplotype data for identification of an O'Brien genetic footprint or common haplotype. The O'Brien surname project is available online (https://www.familytreedna.com/groups/obrien/about/background, accessed October 10, 2018). This website consists of sampled individuals correlated by geography, Y-chromosome haplotype and single nucleotide polymorphisms (SNPs).

Materials and methods
Participants in our study were selected based on surname and ancestry and, after giving informed consent, were asked to fill out a questionnaire (Appendix I) to gather background information. Buccal samples were collected and processed for male DNA haplotypes. Sample collection was performed according to University of New Haven guidelines and after Institutional Review Board (IRB) approval. Following human sampling, DNA from cotton swabs was extracted using the QIAamp® DNA Mini kit (Qiagen; Germantown, MD, USA) following the buccal swab spin protocol [15]. DNA extractions were quantified using an ABI Quantifiler™ total Human DNA Quantification kit (according to the manufacturer's recommendations; ThermoFisher Scientific, Waltham, MA, USA). Real-time PCR was performed on an Applied Biosystems 7500 Real-Time PCR system (ThermoFisher Scientific). All samples were amplified following the AmpFℓSTR® Yfiler® PCR Amplification kit (ThermoFisher Scientific) manufacturer's protocol on an Applied Biosystems PCR GeneAmp 9700 thermocycler (ThermoFisher Scientific). After PCR amplification, sample fragment separation and detection were performed on an Applied Biosystems 3130xl Prism® Genetic Analyzer (ThermoFisher Scientific). The results were then analyzed using GeneMarker software (SoftGenetics, LLC; State College, PA, USA) to score alleles at each genetic locus.
The haplotypes were transferred to Excel software for data analysis. A pair-wise analysis was performed on all the samples within each population. The resulting data were assessed for a common genetic footprint between specific populations. When reporting new match criteria with Y-chromosome polymorphisms, the DNA Commission of the International Society of Forensic Genetics (ISFG) recommends three methods: the counting method, the likelihood ratio and the Bayesian statistic [16]. Our study used the counting method or "frequentist approach" to compare haplotypes from different populations.
The haplotype frequency is defined by a simple equation: X/N, where X = number of times the haplotype was observed and N = number of samples in the database.
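The counting method can be illustrated with a minimal Python sketch. The function name, the three-locus haplotypes and the toy database below are hypothetical and chosen only for illustration; they are not the study's data or its Excel workflow.

```python
from collections import Counter


def haplotype_frequencies(database):
    """Return {haplotype: X/N} for every haplotype in the database.

    X is the number of times a haplotype was observed and
    N is the number of samples in the database (counting method).
    """
    n = len(database)
    if n == 0:
        raise ValueError("database is empty")
    counts = Counter(database)
    return {haplotype: x / n for haplotype, x in counts.items()}


# Hypothetical toy database: each haplotype is a tuple of alleles
# at (DYS19, DYS391, DYS393). These values are invented.
toy_database = [
    (14, 11, 13),
    (14, 11, 13),
    (14, 10, 13),
    (15, 11, 12),
]

freqs = haplotype_frequencies(toy_database)
print(freqs[(14, 11, 13)])  # observed 2 times in 4 samples: 0.5
```

Representing each haplotype as a tuple keeps it hashable, so identical haplotypes are counted together without any locus-by-locus comparison.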

Results
Ten O'Briens of Irish descent were successfully sampled and analyzed (Table 1). In the interview questionnaires, all the donors identified as Caucasian with paternal origins in Ireland; none were adopted, and birthplaces varied. From the results obtained, the most common shared alleles in the haplotype were a 13 at locus DYS393 and a 14 at locus DYS19 (each in 90% of the sampled population). Interestingly, O'Briens whose counties of origin bordered one another or were in close geographic proximity had a greater percentage of shared haplotypes than O'Briens whose counties of paternal origin were geographically distant. This illustrates the importance of geography in surname analysis, as farming communities often maintain family relationships for generations in a restricted area due to land assets. All the sampled Irish O'Briens had a unique haplotype except one pairing from County Clare. Within the general Irish population, all the male subjects were Caucasian and 66.66% of the population sampled was of pure Irish descent (Table 2). The remaining 33.34% of the sampled population was of mixed Irish and other European ancestry. Like the O'Brien population, all the analyzed samples had their paternal lineages originating in Ireland. The paternal lineages originated from counties Dublin, Clare, Cork, Limerick, Mayo, Galway and Kerry. The locations of birth were varied.
The randomly sampled control population's nationality, country of paternal origin and location of birth were much more diversified than those of the other two sampled populations (Table 3). In this control group, 83.33% of the population self-identified as Caucasian, 13.34% self-identified as Hispanic and 3.33% self-identified as Asian. The paternal origins for this population covered different regions of Western and Eastern Europe, the Mediterranean, Oceania, the Caribbean, the Middle East and Central America. Birthplace locations varied.

Table 1 (partial; the upper rows and the sample column headers were not recovered). Alleles are listed one column per sampled O'Brien.
[label missing]  13  12  13  13  13  13  13  13  13     13
DYS391           11  10  11  11  10  10  11  10  11     11
DYS439           12  12  13  13  12  12  11  10  13     11
DYS635           23  21  23  23  23  23  23  21  23     23
DYS392           13  11  13  13  14  14  13  11  13     13
YGATAH           12  10  13  12  12  12  12  11  12     12
DYS437           15  16  14  15  15  15  15  15  15,16  15
DYS438           12  10  12  12  12  12  12  10  12     12
DYS448           19  20  19  19  18  18  18  20  18     19
a OB5 and OB6 are not known to be genetically related but both originate from County Clare, Ireland.
b OB9 and OB11 are consistent with the Dal Riata founder haplotype with the exception of the DYS389II locus.
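The share of samples carrying the most common allele at a locus can be computed as in the following sketch, which uses the DYS391, DYS635 and DYS438 rows reported in Table 1. The function name and data layout are assumptions made for illustration; the study itself tabulated these values in Excel.

```python
from collections import Counter

# Allele rows as reported in Table 1, one value per sampled O'Brien,
# in the order the values appear in the table.
table1_rows = {
    "DYS391": [11, 10, 11, 11, 10, 10, 11, 10, 11, 11],
    "DYS635": [23, 21, 23, 23, 23, 23, 23, 21, 23, 23],
    "DYS438": [12, 10, 12, 12, 12, 12, 12, 10, 12, 12],
}


def modal_allele(alleles):
    """Return (most common allele, fraction of samples carrying it)."""
    allele, count = Counter(alleles).most_common(1)[0]
    return allele, count / len(alleles)


for locus, alleles in table1_rows.items():
    allele, share = modal_allele(alleles)
    print(f"{locus}: allele {allele} in {share:.0%} of samples")
```

For these three rows, the sketch reports allele 11 at DYS391 in 60% of samples, and alleles 23 at DYS635 and 12 at DYS438 in 80% of samples each.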
After the pair-wise analysis of the randomly sampled O'Brien population was completed, trends were reviewed to determine whether there were any similarities among males with varying locations of paternal origin and locations of birth (Table 4). In the remainder of the sampled population there was an abundance of haplotype dissimilarity, as one might anticipate with unrelated paternal lineages that lack a geographic connection. This control population indicates greater haplotype uniqueness and suggests that the Y-STR haplotype may be useful for screening individuals in a "dragnet" approach to identifying or eliminating persons of interest in a crime scene investigation (Table 5). Although Y-STR haplotypes have not been used to search national DNA databases for candidate matches due to population frequency issues, for crime scenes with a geographic restriction and required access to a victim, this may be a quick approach to narrowing the list of potential unrelated male individuals to fully investigate.
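A pair-wise haplotype comparison of the kind described above can be sketched in Python as follows. The sample names, loci and alleles are invented, and counting mismatched loci is only one plausible way to score a pair; the study's own comparison was carried out in Excel and may differ in detail.

```python
from itertools import combinations


def mismatches(hap_a, hap_b):
    """Count the loci typed in both haplotypes at which the alleles differ."""
    shared_loci = hap_a.keys() & hap_b.keys()
    return sum(hap_a[locus] != hap_b[locus] for locus in shared_loci)


def pairwise_mismatches(samples):
    """Return {(sample_1, sample_2): mismatch count} for every pair."""
    return {
        (a, b): mismatches(samples[a], samples[b])
        for a, b in combinations(sorted(samples), 2)
    }


# Hypothetical samples (locus -> allele); not data from this study.
samples = {
    "S1": {"DYS19": 14, "DYS391": 11, "DYS393": 13},
    "S2": {"DYS19": 14, "DYS391": 11, "DYS393": 13},
    "S3": {"DYS19": 15, "DYS391": 10, "DYS393": 13},
}

table = pairwise_mismatches(samples)
matching_pairs = [pair for pair, diff in table.items() if diff == 0]
print(table)           # mismatch count for each pair
print(matching_pairs)  # pairs sharing an identical haplotype
```

In a screening setting, pairs with zero mismatches would be the candidates to investigate further, while a population in which every pair differs at one or more loci corresponds to the haplotype uniqueness noted for the control group.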