Genetic characterization of four dog breeds with Illumina CanineHD BeadChip

Dear Editor, Because humans live in close contact with domesticated animals, molecular analysis of animal DNA is increasingly being admitted as evidence in forensic investigations. In 2011, recommendations ...

In total, we genotyped 48 samples using the CanineHD BeadChip WG-440-1001 (Illumina, Inc., San Diego, CA, USA). The sample details are provided in Supplementary Table S1. The 37 unrelated individuals investigated were GS (n = 12), DS (n = 7), SS (n = 7) and M (n = 11). Since the CanineHD BeadChip WG-440-1001 accommodates a maximum of 48 samples in parallel, we also tested three DS puppies from a single brood, one negative control and seven re-sequenced samples. The seven re-sequenced samples came from a second blood collection, taken one year after the initial collection. The aims of the investigation were: (1) to characterize the genetic profile of the four pure dog breeds by quantifying the genetic differentiation among them and the degree of genetic homogeneity within breeds; and (2) to determine whether the results can be applied to the design of breed recognition strategies aimed at distinguishing these dog breeds, as well as identifying individuals.
Genomic DNA (gDNA) was isolated using a QIAamp DNA Blood Kit following the manufacturer's protocol (QIAamp; Qiagen, Hilden, Germany). DNA was quantified using agarose gel electrophoresis and a Nanodrop ND-200 spectrophotometer (Thermo Fisher Scientific). Detailed concentration information is listed in Supplementary Table S1. The gDNA concentration of every sample was at least 50 ng/µL. DNA samples were whole-genome amplified for 20-24 h at 37 °C, fragmented, precipitated and resuspended in an appropriate hybridization buffer. The samples were hybridized on the prepared BeadChips for 16-24 h at 48 °C. Following hybridization, nonspecifically hybridized samples were removed by washing, while the remaining specifically hybridized loci were processed for the single-base extension reaction, stained and imaged on an Illumina iScan Reader. The SentrixBarcode and SentrixPosition of each sample on the chip are listed in Supplementary Table S1. We used GenomeStudio and the accompanying guidelines from Illumina (www.illumina.com) to identify individuals suitable for genetic profile analyses. Genotype data generated by the iScan system were loaded into the Illumina GenomeStudio Genotyping Module for primary data analysis, including raw data normalization, clustering and genotype calling (https://support.illumina.com.cn/array/array_kits/caninehd_whole-genome_genotyping_kit/documentation.html?langsel=/cn/). A final custom report was created from GenomeStudio using the PLINK Input Report, which generated a PED file and a MAP file for downstream analyses.
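To illustrate what the PED and MAP files contain, the following is a minimal sketch of reading them in Python. It assumes the standard PLINK text layout (four MAP columns; six leading PED columns followed by two alleles per SNP, with "0" for a missing allele). The function names and the toy records are hypothetical and are not part of the original analysis.

```python
from typing import Dict, List, Tuple

Genotype = Tuple[str, str]
MISSING = "0"  # PLINK's default code for a missing allele


def parse_map(lines: List[str]) -> List[str]:
    """Return the SNP identifiers (second column) of PLINK MAP lines."""
    return [line.split()[1] for line in lines if line.strip()]


def parse_ped(lines: List[str], snp_ids: List[str]) -> Dict[str, Dict[str, Genotype]]:
    """Map each individual ID to {SNP ID: (allele 1, allele 2)}."""
    samples: Dict[str, Dict[str, Genotype]] = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        # Columns: family, individual, sire, dam, sex, phenotype, then alleles.
        individual, alleles = fields[1], fields[6:]
        if len(alleles) != 2 * len(snp_ids):
            raise ValueError(f"{individual}: allele count does not match MAP file")
        samples[individual] = {
            snp: (alleles[2 * i], alleles[2 * i + 1])
            for i, snp in enumerate(snp_ids)
        }
    return samples


def call_rate(genotypes: Dict[str, Genotype]) -> float:
    """Fraction of SNPs for which both alleles were called."""
    called = sum(1 for g in genotypes.values() if MISSING not in g)
    return called / len(genotypes)


# Toy records (hypothetical, three SNPs and two dogs).
map_lines = ["1 snp1 0 1000", "1 snp2 0 2000", "2 snp3 0 500"]
ped_lines = [
    "GS GS-01 0 0 1 -9 A A G T 0 0",
    "DS DS-01 0 0 2 -9 A G G G C C",
]
snps = parse_map(map_lines)
dogs = parse_ped(ped_lines, snps)
for name, genotypes in dogs.items():
    print(name, round(call_rate(genotypes), 4))
```

In practice the files would be read from disk, for example with `open(path).read().splitlines()`; the in-memory lists above only keep the sketch self-contained.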
We evaluated the population genetic profiles using a Bayesian inference model in the program STRUCTURE 2.3.3 [11]. We used a burn-in of 10 000 iterations, followed by 10 000 Markov Chain Monte Carlo repetitions, and evaluated three possible numbers of population clusters (K = 2-4). Each parameter setting was repeated three times. We used STRUCTURE HARVESTER and CLUMPP v1.1.2 [12] to summarize the output, which included estimates for delta K, and plotted individual assignments with Distruct v1.1. The STRUCTURE approach has become a standard method of evaluating the number of genetic clusters in a dataset, but it assumes equilibrium genetic conditions (Hardy-Weinberg and linkage equilibrium), which may not be fulfilled in all breeds. Therefore, we also evaluated the data with principal component analysis (PCA), which makes no such equilibrium assumptions, using the adegenet package in R 2.14.2. A phylogenetic tree was generated using Mega 7.0 (https://megasoftware.net/). The genetic differentiation between breeds was calculated using Fst [13]. Fst values of 0.05 to 0.15 and 0.15 to 0.25 were taken to indicate moderate and large differentiation, respectively [14].
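As an illustration of the delta K statistic mentioned above, the sketch below computes it from replicate log-likelihood values, using the commonly described form: the absolute second-order difference of the mean ln P(D) across K, divided by the standard deviation of ln P(D) at K. The input values are hypothetical, and the exact output of STRUCTURE HARVESTER may differ in detail.

```python
from statistics import mean, stdev
from typing import Dict, List


def delta_k(ln_prob: Dict[int, List[float]]) -> Dict[int, float]:
    """Compute delta K for every K that has both K-1 and K+1 available.

    ln_prob maps K to the ln P(D) values of its replicate runs.
    """
    means = {k: mean(values) for k, values in ln_prob.items()}
    result: Dict[int, float] = {}
    for k in sorted(ln_prob):
        # The statistic needs both neighbours, so the smallest and largest
        # K tested never receive a value.
        if k - 1 not in means or k + 1 not in means:
            continue
        second_difference = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        spread = stdev(ln_prob[k])
        result[k] = second_difference / spread if spread > 0 else float("inf")
    return result


# Hypothetical ln P(D) values for three replicates at K = 2, 3 and 4.
runs = {
    2: [-1000.0, -1002.0, -998.0],
    3: [-900.0, -901.0, -899.0],
    4: [-890.0, -895.0, -885.0],
}
print(delta_k(runs))
```

With the three K values tested here, only K = 3 has both neighbours, so it is the only K for which this statistic is defined.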
The PLINK Input Report provided 173 662 SNPs for each of the 47 samples, giving 8 162 114 genotypes; no genotypes were called for the negative control. The call rate of the 47 samples ranged from 99.32% to 99.66%, with an average of 99.53%. For the seven re-sequenced samples, genotypes that were called in both runs were fully consistent. However, some SNPs were called in one run but yielded no genotype in the other. The rate of this type of error ranged from 0.0144% to 0.0311% and involved 54 SNPs, which were excluded from the subsequent analyses.
Among the 37 unrelated samples, data were screened with the following steps: (1) ... A Venn diagram (Supplementary Figure S1) was constructed with VENNY 2.1 to show the numbers of SNPs, unique or shared, across the four breeds. Venn diagrams use overlapping circles to show the relations between finite sets, here the SNP sets of the breeds, and are most useful for identifying what the breeds have in common. A breed-specific SNP ("private SNP") was defined as a SNP for which one of the alleles was detected in only one breed (a fixed SNP). The numbers of "private SNPs" in GS, DS, M and SS were 11 494, 2 325, 5 841 and 17 329, respectively. We also validated the data with the three DS puppies from one brood (samples DS-O-1, DS-O-2 and DS-O-3; Supplementary Table S1) and found that 1 882 of the 2 325 DS "private SNPs" had fixed genotypes. These "private SNPs", identified as breed-specific markers, would be helpful for breed identification or for evaluating the purity of a breed. Moreover, the number of "private SNPs" is expected to decrease as more samples are tested. Grasso et al. [14] found 99, 99, and 11 190 fixed SNPs for Corriedale, Merino and Creole sheep, respectively. Wiggans et al. [15] reported that a set of 622 SNPs can be used to determine breed identity as part of the quality control process for dairy cattle. Ramos et al. [16] reported 29 146 putative breed-specific SNPs in five pig breeds (Duroc, Landrace, Large White, Pietrain and Wild Boar). In future studies, an independent group of samples from the four canine breeds should be tested to validate the "private SNPs" reported here.
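The duplicate-sample comparison and the "private SNP" definition described above can be sketched as follows. This is an illustrative Python outline under simple assumptions (genotypes stored as allele pairs, "0" for a missing allele); the function names and toy genotypes are hypothetical, and the code is not the pipeline used in the study.

```python
from typing import Dict, List, Set, Tuple

Genotype = Tuple[str, str]
MISSING = "0"


def compare_duplicates(
    first: Dict[str, Genotype], second: Dict[str, Genotype]
) -> Tuple[List[str], List[str]]:
    """Return (discordant SNPs, SNPs called in only one of the two runs)."""
    discordant, one_sided = [], []
    for snp in sorted(first.keys() & second.keys()):
        g1, g2 = first[snp], second[snp]
        called1, called2 = MISSING not in g1, MISSING not in g2
        if called1 and called2:
            if sorted(g1) != sorted(g2):  # allele order is irrelevant
                discordant.append(snp)
        elif called1 != called2:
            one_sided.append(snp)
    return discordant, one_sided


def private_alleles(
    breeds: Dict[str, List[Dict[str, Genotype]]]
) -> Dict[str, Dict[str, Set[str]]]:
    """For each breed, list SNPs with an allele seen in that breed only."""
    seen: Dict[str, Dict[str, Set[str]]] = {}
    for breed, animals in breeds.items():
        per_snp: Dict[str, Set[str]] = {}
        for genotypes in animals:
            for snp, g in genotypes.items():
                per_snp.setdefault(snp, set()).update(a for a in g if a != MISSING)
        seen[breed] = per_snp

    private: Dict[str, Dict[str, Set[str]]] = {}
    for breed, per_snp in seen.items():
        hits: Dict[str, Set[str]] = {}
        for snp, alleles in per_snp.items():
            elsewhere: Set[str] = set()
            for other, other_snps in seen.items():
                if other != breed:
                    elsewhere |= other_snps.get(snp, set())
            unique = alleles - elsewhere
            if unique:
                hits[snp] = unique
        private[breed] = hits
    return private


# Toy data (hypothetical): one dog per breed, two SNPs.
toy = {
    "GS": [{"snp1": ("A", "A"), "snp2": ("C", "T")}],
    "DS": [{"snp1": ("A", "G"), "snp2": ("C", "C")}],
}
print(private_alleles(toy))
```

Because an allele counts as private only while it remains unobserved in every other breed, adding animals can only keep or shrink each breed's list, which matches the expectation stated above.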
The numbers of polymorphic SNPs (MAF > 0.05) in GS, DS, M and SS were 52 871, 68 048, 61 525, and 46 965, respectively. The numbers of highly polymorphic SNPs (MAF > 0.4) in GS, DS, M and SS were 535, 1 583, 2 730 and 2 227, respectively; GS had far fewer highly polymorphic SNPs than the other three breeds (about 1% of its polymorphic SNPs). Among these highly polymorphic SNPs, only 129 were observed in all four breeds; these could be used for canine parentage testing and individual identification. We analyzed the 129 SNPs in the three DS puppies from one brood (samples DS-O-1, DS-O-2 and DS-O-3) and their parents (1-Z07-A and 1-Z08-B) and found that all of them followed Mendel's laws. These polymorphic SNPs can also distinguish one individual from another.
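The MAF thresholds and the Mendelian check on the parent-offspring trio can be sketched in Python as below. The function names and example genotypes are hypothetical; the sketch assumes biallelic SNPs stored as allele pairs with "0" for a missing allele, and it treats any trio with a missing genotype as uninformative rather than inconsistent.

```python
from collections import Counter
from typing import List, Tuple

Genotype = Tuple[str, str]
MISSING = "0"


def minor_allele_frequency(genotypes: List[Genotype]) -> float:
    """Frequency of the second most common allele among called alleles."""
    counts = Counter(a for g in genotypes for a in g if a != MISSING)
    if len(counts) < 2:
        return 0.0  # monomorphic or no calls at all
    ranked = sorted(counts.values(), reverse=True)
    return ranked[1] / sum(ranked)


def classify_maf(maf: float) -> str:
    """Apply the thresholds used in the text (MAF > 0.05 and MAF > 0.4)."""
    if maf > 0.4:
        return "highly polymorphic"
    if maf > 0.05:
        return "polymorphic"
    return "not polymorphic"


def is_mendelian_consistent(child: Genotype, sire: Genotype, dam: Genotype) -> bool:
    """True if the child can have received one allele from each parent."""
    if MISSING in child or MISSING in sire or MISSING in dam:
        return True  # cannot be refuted with missing data
    a, b = child
    return (a in sire and b in dam) or (b in sire and a in dam)


# Hypothetical genotypes at one SNP in four dogs of one breed.
snp_calls = [("A", "A"), ("A", "G"), ("G", "G"), ("A", "A")]
maf = minor_allele_frequency(snp_calls)
print(maf, classify_maf(maf))

# Hypothetical trio: a heterozygous puppy from two opposite homozygotes.
print(is_mendelian_consistent(("A", "G"), ("A", "A"), ("G", "G")))
```

A SNP panel passes the check for a litter when every puppy is consistent with both parents at every SNP, which is the sense in which the 129 shared SNPs were examined above.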
With the filtered 76 599 SNPs, we performed STRUCTURE, PCA and phylogenetic tree analyses. Supplementary Figure S2(A,B) supported the presence of three genetic clusters (K = 3), as determined by the delta K method. With K = 3, the GS, M and SS were clearly differentiated. Only the DS individuals were assigned to all three clusters, with high levels of admixture. Both the PCA (Supplementary Figure S2(C)) and the Neighbor-net tree (Supplementary Figure S2(D)) revealed a clear genetic separation of the four breeds. In Supplementary Figure S2(C), the four breeds were clearly differentiated by the three principal components, which sufficiently accounted for the observed population structure, with findings very similar to those in Supplementary Figure S2(A). Only the DS samples formed a loose cluster. In Supplementary Figure S2(D), individuals of the same breed clustered together, and the different breeds were placed on distant branches. The four breeds considered in this study were clearly genetically differentiated from each other, regardless of their current small population sizes.
Large genetic differentiation was observed among the GS, M and SS, with Fst values ranging from 0.18 to 0.22 (Supplementary Table S2). Low-to-moderate genetic differentiation was detected when DS was compared with the three other breeds, with Fst values ranging from 0.04 to 0.14 (Supplementary Table S2). This result agrees with the above findings.
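To make the Fst categories concrete, the following Python sketch computes a simple pairwise Fst from allele frequencies and labels it with the thresholds stated earlier. It uses a basic heterozygosity-based form, (HT - HS) / HT summed over SNPs with equal weight for both breeds, which is a simplifying assumption and not necessarily the estimator of reference [13]. The labels outside the 0.05 to 0.25 range, the function names and the frequencies are likewise assumptions for illustration.

```python
from typing import List


def pairwise_fst(freqs_a: List[float], freqs_b: List[float]) -> float:
    """Ratio-of-sums Fst for two populations from per-SNP allele frequencies."""
    if len(freqs_a) != len(freqs_b):
        raise ValueError("both populations need the same SNPs")
    numerator = 0.0
    denominator = 0.0
    for p_a, p_b in zip(freqs_a, freqs_b):
        p_mean = (p_a + p_b) / 2
        h_total = 2 * p_mean * (1 - p_mean)  # expected heterozygosity, pooled
        h_within = (2 * p_a * (1 - p_a) + 2 * p_b * (1 - p_b)) / 2
        numerator += h_total - h_within
        denominator += h_total
    return numerator / denominator if denominator > 0 else 0.0


def classify_fst(fst: float) -> str:
    """Label an Fst value; only 'moderate' and 'large' come from the text."""
    if fst < 0.05:
        return "little"  # assumed label for values below the stated range
    if fst < 0.15:
        return "moderate"
    if fst < 0.25:
        return "large"
    return "very large"  # assumed label for values above the stated range


# Hypothetical allele frequencies at three SNPs in two breeds.
breed_a = [0.10, 0.50, 0.80]
breed_b = [0.30, 0.55, 0.60]
value = pairwise_fst(breed_a, breed_b)
print(round(value, 4), classify_fst(value))
```

Under these labels, the reported values of 0.18 to 0.22 fall in the "large" band, while 0.04 to 0.14 spans the "little" and "moderate" bands.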
To conclude, the genetic characterization, despite the small population size, showed relatively high genetic diversity among the four dog breeds considered in this study. The results could be helpful in developing specific sets of SNPs for breed identification, individual identification and parentage testing, all of which could be used in forensics, population genetics, and other analyses.

Authors' contributions
Zihao Yang, Jingyi Zhang, Jiashuo Zhang and Ruiyang Tao performed the experiments and the statistical analysis; Wei Ren, Jie Zhang and Jilin Dong collected the samples; Chengtao Li participated in the study design and coordination; Suhua Zhang conceived of the study, performed the statistical analysis and wrote the manuscript. All authors contributed to the final text and approved it.

Compliance with ethical standards
No approval from the Animal Care and Use Committee was obtained because no animals were used.

Disclosure statement
No potential conflict of interest was reported by the authors.