Age structuring and spatial heterogeneity in prion protein gene (PRNP) polymorphism in white-tailed deer

ABSTRACT Chronic-wasting disease (CWD) is a prion-derived fatal neurodegenerative disease that has affected wild cervid populations on a global scale. Susceptibility has been linked unambiguously to several amino acid variants within the prion protein gene (PRNP). Quantifying their distribution across landscapes can provide critical information for agencies attempting to adaptively manage CWD. Here we attempt to further define management implications of PRNP polymorphism by quantifying the contemporary geographic distribution (i.e., phylogeography) of PRNP variants in hunter-harvested white-tailed deer (WTD; Odocoileus virginianus, N = 1433) distributed across Arkansas (USA), including a focal spot for CWD since detection of the disease in February 2016. Of these, PRNP variants associated with the well-characterized 96S non-synonymous substitution showed a significant increase in relative frequency among older CWD-positive cohorts. We interpreted this pattern as reflective of a longer life expectancy for 96S genotypes in a CWD-endemic region, suggesting either decreased probabilities of infection or reduced disease progression. Other variants showing statistical signatures of potential increased susceptibility, however, seemingly reflect an artefact of population structure. We also showed marked heterogeneity across the landscape in the prevalence of ‘reduced susceptibility’ genotypes. This may indicate, in turn, that differences in disease susceptibility among WTD in Arkansas are an innate, population-level characteristic that is detectable through phylogeographic analysis.


Introduction
Chronic wasting disease (CWD) is a fatal neurodegenerative disorder that affects white-tailed deer (WTD; Odocoileus virginianus) and related cervids [1,2], with severe impacts on native wildlife, that also reverberate economically for recreational hunting and ancillary commercial enterprises [3,4]. Most CWD eradication efforts have proven unsuccessful thus far [5], leading to its continued spread and increased prevalence [6]. Given this, wildlife managers in many jurisdictions are responding by shifting long-term goals away from eradication and instead towards suppression, containment, and mitigation [7,8].
Several factors have impeded the eradication of CWD, including aspects of life history in both host and agent, as well as limited knowledge with regards to how these interact with environment to define CWD epidemiology. Pathogenicity and transmission, for example, occur via the structural transformation of a naturally occurring cellular prion protein (PrP C ) into a misfolded 'pathogenic' isoform (PrP SC ) [9]. The efficiency with which this occurs, coupled with an extensive incubation period [10], serve to confound proactive surveillance and management. Additionally, both vertical [11,12] and horizontal transmission [13,14] are seemingly involved, with prion persistence well established within 'environmental reservoirs' [15][16][17]. As a result, surveillance and monitoring are being increasingly used by state agencies to inform harvest and selective-removal-based management strategies to suppress the disease where it cannot be eradicated [18,19]. This is complicated particularly due to a potential for longdistance host dispersal [20][21][22], and prion 'shedding' by individuals with subclinical infections [23,24].
CWD transmission among herds [26,27]. Considerable effort to date has focused on characterizing intrinsic susceptibility, especially with regard to the quantification of genetic polymorphisms that encode PrP C (PRNP) [28][29][30][31]. One consistent approach is to identify those PRNP gene variants that are associated with reduced CWD susceptibility [32,33]. The amino acid composition of the resulting protein is thus assumed to impact disease progression [34], although the exact mechanism remains unclear.
Consistent among those variants reported to be associated with reduced CWD susceptibility is a nonsynonymous mutation corresponding to an amino acid substitution at position 96 (i.e., from glycine to serine; hereafter 96G versus 96S). Inoculation studies employing this mutation have successfully delayed the progression of CWD in both WTD and proxies [13,35]. Our primary interest is to characterize if (and how) these PRNP alleles vary spatially [30,36], as one landscape-level axis that depicts a 'resistance' to CWD spread. We provide such an analysis by employing WTD sampled state-wide as a basis for the phylogeography of the PRNP gene (i.e., the geographic distribution of individuals associated with a gene genealogy [37]). We then examine spatial and age-structured patterns within this phylogeographic framework to ask several questions regarding the role of PRNP variants on population dynamics in CWD-endemic areas: Do 'reduced susceptibility' variants have an effect on survivorship (e.g., as one might expect if PRNP polymorphisms do indeed drive susceptibility)? If so, does this leave a detectable signature of biased fitness (e.g., as a result of increased survivability and thus potential reproductive output)? What are the impacts of PRNP polymorphism on population demographics? We address these questions at two spatial scales: 1) within a dense sampling of the CWD-focal area (Newton County, Arkansas) from which prevalence, and, presumably, measurable impacts on population demography are highest; and 2) state-wide, where sampling densities were lower, and in many areas within which CWD has not yet been detected (though we note that lack of detection does not mean lack of occurrence). The county-level spatial scale allowed us to examine demographic impacts within a recently detected outbreak in a novel area (Newton County), wherein dense sampling and high prevalence allow sufficient sampling for testing hypotheses of age structuring and selection on PRNP polymorphisms, while the state-level spatial scale allowed broad-scale analyses of heterogeneity and phylogeographic structuring. The combination of these two spatial scales allows for superimposing differential susceptibilities and fitness across a CWD-absent landscape which could in turn facilitate the creation of management scenarios to project and potentially mitigate disease spread.

Data generation
From 2016 to 2019, ear and tongue tissue samples were collected from 1,720 harvested WTD across 75 counties in Arkansas (Supplemental Table S1; Figure 1). Subsequently, tissue samples from 1,460 WTD were amplified and sequenced, yielding ~800 nucleotides of the PRNP gene and PRNP psg pseudogene. From these data, we obtained 1,433 sequences, of which 316 were obtained from Newton County, the CWD focal area. Sequences were trimmed to 720 unambiguously scored nucleotides, with 11 sites found to be polymorphic (Table 1). Three previously reported polymorphic sites (i.e., nt285, nt299 and nt372; Brandt et al., 2015Brandt et al., , 2018 were found to be invariable in our data, whereas one additional site had a novel synonymous substitution (nt499, A/C). Three sites (i.e., nt286, nt367 and nt676) also reflected non-synonymous substitutions, corresponding to amino acid substitutions 96S, 122 T and 255 K, respectively.
Amplification and sequencing of the PRNP PSG pseudogene were successful in 30% of our samples (443 of 1,459). Our comparison of the 443 PRNP and PRNP PSG haplotypes indicated that nucleotide polymorphism at site nt413 did not represent a biological variant of the PRNP gene but was instead off-target amplification of the PRNP PSG pseudogene.

PRNP haplotype frequencies
Haplotype frequencies (Table 2 and S2) differed slightly from those reported in other states (e.g., Wisconsin and Illinois [30,31]). The four most frequent haplotypes in Illinois and Wisconsin (i.e., >10%), were also common among Arkansas samples (Figure 3; Haplotypes A-D). We also found that Haplotype A, most common in Illinois/Wisconsin at 30%, occurred in only 15% of our samples ( Figure 3; Table 2). Haplotypes B and D were instead most common in our data (each at ~23%). Haplotype C, associated with reduced CWD susceptibility, was detected at a frequency of 15%, quite similar to that in Illinois/Wisconsin (17%). Two additional haplotypes (E and G) also occurred at high frequencies in Arkansas (7% and 11%, respectively), whereas they were found at <5% in the Illinois/Wisconsin [30]. Out of the 16 rare haplotypes with previously reported frequencies ≤1% (Haplotypes K-Z [30]), seven were also observed at low frequencies in Arkansas (i.e., Haplotypes K, L, O, P, R, T, and V). Haplotype frequencies also differed across counties (Supplemental Table S1), although we note that sampling effort was uneven across the state.

Evidence for disease-mediated selection on PRNP variants
Haplotypes B and C did not occur in the same frequency within CWD-positive and CWD-negative deer (Table 2; Figure 3), and likewise were the most spatially heterogeneous of the observed haplotypes ( Figure 4). Haplotype B was over-represented within CWDpositive deer, both state-wide (OR = 2.00, p = 0.000001), and in Newton County (OR = 1.43, p= 0.033). Haplotype C was under-represented within CWD-positive deer, both state-wide (OR = 0.30, p = 0.00003) and Newton County (OR = 0.42, p = 0.015).
If Haplotypes B and C influence CWD susceptibility, then their relative frequency of occurrence should vary among deer age-classes, reflecting a biased probability  Table 1. PRNP haplotypes tabulated from 1,433 white-tailed deer tissues collected from 75 counties in Arkansas (2016-2019). Haplotypes (Hap) are identified by letter (A-V) following Brandt et al. (2015Brandt et al. ( , 2018. Haplotypes denoted as AR1-4 are previously unreported. Mutations that differ from haplotype A are shaded, with green indicating synonymous substitutions (no amino acid change) and blue as non-synonymous (amino acid changed in protein; NSS). Also listed are amino acid position and nucleotide site (Brandt et al. 2015(Brandt et al. , 2018 (Table 2), and tick marks representing number of mutations (=nucleotide substitutions) distinguishing one from another (Table 1). Colour codes reflect relative frequency among CWD-positive (red) and CWD-negative/undetected (green). Letters correspond to haplotype names (per Brandt et al. 2015), with haplotypes unique to Arkansas indicated with numbers (AR#). Haplotypes sharing the 96S mutation are indicated with (*).
of reaching older age classes. We restricted our analyses to Newton County to ensure only deer from CWDendemic locations were considered. We found the frequency of Haplotype C significantly more common in older deer, both when measured as relative allele frequency (p= 0.036; R 2 = 0.635; Figure 5), and as an agepartitioned odds ratio (p= 0.043; R 2 = 0.601; Figure 5), though we note that the sample size for older cohorts was lower than other cohorts ( Figure S2). The increase in Haplotype C remained substantial even when we considered only the relative frequency of each haplotype among CWD-positive deer from Newton County (i.e., 4% of yearling/fawn haplotypes, and 17% of those from individuals older than 5; Supplemental Table S3). Among CWD-positive deer state-wide, Haplotype C was recorded at 5.24% of all sampled haplotypes ( Table 2). By contrast, Haplotype B showed neither a discernible relationship regarding CWD status, nor relative frequencies across age classes ( Figure 5; Supplemental Table S3).

Discussion
Our diagnosis of variability in the PRNP gene across 1,433 WTD collected from 75 counties in Arkansas was comparable to that found in other states (Tables 1-2).
Of the 20 haplotypes we detected, 16 were previously identified in other states, with those alleles at higher frequencies in Arkansas also being most common elsewhere [30]. Four novel variants were found at low frequencies (Table 2; Figure 2-3), with paralog artefacts as a source of this variation eliminated due to our sequencing of the pseudogene. Haplotype C, characterized by the 96S substitution, showed a significantly biased representation among CWD-positive deer, being under-represented in younger deer and overrepresented in older CWD-positive deer ( Figure 5). This suggests that 96S-individuals tend to live longer following CWD-infection than those with alternate genotypes. This could reflect either a reduced likelihood of contracting the disease or a slower disease progression following exposure.
We also found Haplotype B as significantly overrepresented in CWD-positive deer (Table 2 and Figure 3) but failed to find a reciprocal impact on life expectancy ( Figure 5). Instead, we found Haplotype B to be most extensive within the region where CWD is currently centred (Figure 4). This, in turn, may suggest heterogeneity in Haplotype B frequency represents an artefact of population structure, a well-known confounding variable recognized in trait-association studies [38][39][40][41]. We also note that decreased precision of ageing of white-tailed deer on the basis of tooth development and wear patterns may also contribute some variability, particularly with regard to haplotype frequencies among older cohorts [42][43][44]. We also cannot rule out the potential for linked variation as influencing the observed pattern in Haplotype B, however our results do not find any evidence for increased susceptibility of the variant. Using panels of nuclear markers, we suggest further study of patterns of spatial connectivity and population structure in wild WTD [45] as a means of separating potential spatial and Table 2. PRNP haplotype frequencies and odds-ratios, as associated with CWD status of white-tailed deer in Arkansas, 2016-2019. Haplotypes were derived from unphased sequences representing 720 nucleotides of the PRNP gene. Haplotypes (Hap) are identified by letter (A-V) following Brandt et al. (2015Brandt et al. ( , 2018. Haplotypes denoted AR1-4 are previously unreported. Listed are total numbers (N) and relative frequency (%) and associated values for deer that were CWD-negative (-), CWD-positive (+) or untested (?). Odds Ratio (OR) reflects the relative representation of a haplotype in CWD-positive deer; OR>1 indicated over-representation, OR<1 underrepresentation. SE = standard error, CI = 95% confidence interval, Z = OR Z-score and p= OR p-value. Values in bold are significant.  phylogeographic drivers of haplotype frequencies from those driven by disease-mediated selection. The recent detection of CWD in Arkansas may suggest an insufficient time to generate the genotype frequency differential we observed for Haplotype C (e.g., Figure 5). However, the prevalence rate in the Arkansas CWD Management Zone is suggestive that it was present in the landscape much earlier. Here, the  putative peak prevalence rate of 22-30% (Middaugh and Riggs, pers. comm) points to a longer-standing epidemic. For example, Miller et. al [46] showed prevalence rates in silico to reach 1% in a 15-20 year span, and 15% in 37 to 50 years, using plausible transmission rates. This seemingly suggests that the time required to reach a prevalence rate of >22% is likely to be longer than 37-50 years. However, we note that true prevalence likely varies as a function of numerous demographic factors [47] and also likely exhibits a non-linear temporal trend [48], thus conclusions are highly dependent on the exact model parameterization [49].

Management implications of PRNP variation
Structural variants of the PrP protein play a role in disease progression [28,34]. However, the exact mechanisms remain poorly understood. The primary variant we have implicated herein (i.e., 96S) as influencing disease susceptibility has been supported as such in laboratory settings. For example, Mathiason et al [23]. inoculated WTD with prion strains and examined time-to-detection (via saliva) across deer in multiple cohorts. Several infected individuals having the 96S prion gene variant remained undetected at 18 months post-inoculation, although this may represent an insufficient time for the necessary in vivo prion protein build-up. Race et al [35]. similarly inoculated transgenic mice expressing different whitetailed deer 96GG and 96SS PRNP genotypes and showed that this delay in disease progression also extended to heterozygotes (96GS genotype).
Despite compelling evidence for an inhibitory mechanism at the 96S allele, its use as a management tool remains unclear. Genetically guided selective breeding in domestic sheep (Ovis aries) has reduced scrapie incidence [50,51]. In a captive setting such an approach may be viable for WTD [35], although we note that the degree of protection offered by the 96S genotype to CWD is likely much lower than that seen among the most 'protective' genetic variants to scrapie in sheep [50]. However, captive deer maintained at high density and under genetic selection could also drive artificial selection for emergent prion strains with heightened pathogenicity in 96S deer, as well as potentially expanding the host range to include novel species [52].
The implication of 96S frequency with regard to the spread of CWD within and among herds is likewise uncertain. An important question is whether the potential for an increased incubation period [53] associated with 96S could also produce a similar extended period for asymptomatic and subclinical transmission/prion shedding [24,54]. If so, this could increase the probability of transmission by 96S deer, thus promoting its increase within populations (Supplemental Figure S3). An understanding of the prion growth kinetics across host genotypes is needed, as well as a more thorough understanding of prion strain evolution [55].

Conclusion
Our results corroborate previous research conducted with WTD in Illinois and Wisconsin: reduced CWD susceptibility in PRNP variants associated with the nonsynonymous 96S mutation [30,31]. We demonstrated that a common haplotype (Haplotype C) harbouring 96S increased in relative frequency when older CWDpositive cohorts were examined.
Management implications of this research requires further epidemiological understanding necessary to predict outcomes (Supplemental Figure S3). The next step in understanding the current distribution and future spread of CWD in Arkansas requires the characterization of context-specific factors, including 1) population structure as a potential driver of PRNP trends, 2) landscape features that modulate deer dispersal, population demography, and density, and 3) increased knowledge of epidemiology, including the interactions among context-specific factors [56]. We thus advocate for a landscape genomic framework for WTD as the next logical step to characterize CWD spread such that fine-scaled deer movement patterns can be more effectively parsed and interpreted [57].

CWD surveillance and prevalence in Arkansas
In October 2015, CWD was initially detected in Arkansas in a 2.5-year old female elk (Cervus canadensis) legally harvested near Pruitt in Newton County. In February 2016, a CWD-positive female WTD was found dead in Ponca in Newton County. During March 2016, biologists from the Arkansas Game and Fish Commission (AGFC) collected 266 WTD tissue samples within a 50,500 ha focal region. CWD prevalence was 23% and differed by gender (female = 20%, male = 32%). For these surveys, to include the samples collected for molecular work, ageing of deer was done by examination of tooth replacement and wear [58]. Although the method of ageing described by Severinghaus [58] is well established, we do note that it has reduced accuracy in older cohorts as compared with alternative methods using cementum annuli [44,59].
Subsequent state-wide monitoring, which included hunter-harvested and road-killed deer, identified CWD-positive individuals outside of the initial focal region. Additional state-wide sampling efforts, in conjunction with hunter harvested and bi-annual surveys, established a state-wide baseline for occurrence of CWD. As of April 2020, 818 WTD and 23 elk have tested positive for CWD. Given the incidence of confirmed CWD-positive deer, AGFC established a CWD Management Zone (CWDMZ) that included counties within a 16 km radius of identified positives. At the completion of this study (June 2019), the CWDMZ encompassed 19 counties of northwestern Arkansas ( Figure 1).
There has been a concerted effort by the AGFC to proactively manage CWD prevalence and potential disease spread in Arkansas, which included hunting regulations that promote harvest of young male deer and increased harvest of female deer (e.g., removal of antler point restrictions and altered bag limits). Addition restrictions included prohibiting baiting and feeding to reduce grouping behaviour in deer and to hinder human-mediated transmission via hunting and subsequent removal of carcases from with the CWDMZ. The CWDMZ has subsequently been expanded to encompass the known distribution of CWD. During the 2018/ 2019 deer and elk hunting seasons, 246 additional CWD-positive cervids (241 WTD and five elk) were detected. Moreover, the AGFC has mandated a compulsory testing requirement for harvested elk, and a voluntary test for WTD, facilitated by a statewide network of deer head drop-off locations.

DNA extraction and amplification
Frozen ear or tongue tissue was homogenized with the TissueLyser II (QIAGEN© Corporation, Maryland, USA), with genomic DNA subsequently extracted using the QIAamp Fast DNA Tissue Kit protocol (QIAGEN© Corporation, Maryland, USA). To ascertain presence of high-quality genomic DNA (i.e., molecular weight >20kb), a 5 μl aliquot of the DNA extract was separated on a 2% agarose gel and visualized using GelGreen on a blue-light transilluminator (Gel Doc™ EZ Imager; Bio-Rad). DNA was then used as template material to amplify a coding section of the PRNP gene, following a protocol modified from previous studies [31,32]. For the functional PRNP gene, the forward primer (CWD-13) straddles Intron 2 and Exon 3, with the reverse primer (CWD-LA) located 850bp downstream [32]. To ascertain if the polymorphisms were indeed in the functional PRNP gene, we tested for the presence of the non-coding PRNP pseudogene (PRNP PSG ) by using pseudogene primers 223 and 224 [33].
Amplifications If both PRNP and PRNP PSG amplified in a sample, then each was sequenced to identify the true polymorphism in the functional PRNP gene. Amplicons were enzymatically purified, sequenced using BigDye v. 3.1 (Applied Biosystem Inc., Forest City, CA) dyeterminator chemistry, and resolved on an ABI 3730XL GeneAnalyzer (University of Illinois Keck Centre for Functional and Comparative Genomics). Sequences were manually edited using Sequencher (v 5.4, Gene Codes, Ann Arbour MI) and aligned against a reference database of PRNP gene sequences obtained from the NCBI GenBank (Accession # AF156185.1; AY3600089.1; AY3600091.1).

Haplotype phasing and network construction
Following alignment, sequences were phased to haplotypes (paired nuclear alleles) using the programPHASE2 [60], which reconstructs haplotypes using a probabilistic model of linkage disequilibrium. Only haplotypes assigned with >90% posterior probability (N = 1,433) were retained. Scripts employed to format inputs and parse haplotype phasing are available at github.com/tkchafin/haploTools. Haplotypes were then categorized using a published nomenclature [31], with haplotype frequencies calculated globally, by county, and by CWD status. We constructed a haplotype network to visualize similarity amongst haplotype, using the median-joining algorithm employed by POPART [61]. Scripts to formulate these input files are found at: github.com/tkchafin/scripts.

Analysis of PRNP variants
We first applied spatial interpolation to examine the structure of PRNP haplotypes as distributed across the state. Haplotype frequencies were computed by first dividing the state into non-overlapping 'pseudopopulations' that contained between 5 and 10 sampling localities each. This was done because our state-wide sampling process lacked a priori information with regard to natural population structure. Our results yielded N = 211 polygons (Supplemental Figure 1). We then applied Empirical Bayesian Kriging in ARCMAP v10.7.1 (Esri, Inc.) to interpolate our posterior probabilities.
To associate disease with PRNP variants, we followed prior studies [30,31] by computing odds ratios (OR). We first consider the probability of displaying an outcome (=CWD status) given the presence of a focal haplotype. An OR>1 = an over-representation of the focal haplotype among CWD-positive deer; an OR<1 indicates the focal haplotype is under-represented. We identified and evaluated haplotypes separately, by examining haplotype frequencies among age classes. We did so because a bias in relative representation can be driven by multiple factors such as population structure, which may drive a coincident relationship.
Here, we assumed that if a haplotype indeed affects the probability of survival to adulthood in CWD-present regions, presumably by reducing disease risk and/or progression, then it should show a significant change in relative representation in older age groups.