Comprehensive genotyping of the C9orf72 hexanucleotide repeat region in 2095 ALS samples from the NINDS collection using a two-mode, long-read PCR assay.

Abstract Objective: Expansion of the G4C2 repeat tract in the C9orf72 gene is linked to frontotemporal dementia (FTD) and amyotrophic lateral sclerosis (ALS). Here, we provide comprehensive genotyping of the C9orf72 repeat region for the National Institute of Neurological Disorders and Stroke (NINDS) ALS collection (n = 2095), using a novel bimodal PCR assay capable of amplifying nearly 100% GC-rich sequences. Methods: A single-tube 3-primer PCR assay mode, resolved using capillary electrophoresis, was used for sizing up to 145 repeats with single-repeat accuracy, for detecting expansions irrespective of their overall size, and for flagging confounding 3′ sequence variations (SVs). A modified two-primer PCR mode, resolved via agarose gel electrophoresis, provided further size information for hyper-expanded samples (>145 repeats) up to ∼5.8 kb amplicons (∼950 G4C2 repeats). Results: Within the evaluated cohort, 177 (8.4%) samples were expanded, with 175 (99%) samples being hyper-expanded. 3′-SVs were identified in 64 (3.1%) samples, and were most common in expanded alleles. Genotypes of all 606 (29%) homozygous samples were confirmed using an orthogonal PCR assay. Conclusion: This study and PCR method may improve and standardize molecular characterization of the C9orf72 locus, and have the potential to inform phenotype–genotype correlations and therapeutic development in ALS/FTD.


Introduction
Microsatellite repeat expansions are associated with neuromuscular and neurodegenerative disorders, including frontotemporal dementia (FTD) and amyotrophic lateral sclerosis (ALS) (1). FTD and ALS are considered part of the same clinical continuum and are linked by a hexanucleotide repeat element, (G4C2)n, in intron 1 of the chromosome 9 open reading frame 72 (C9orf72) gene (1). Expansions in the C9orf72 repeat region have been reported in approximately 0.14% (1:700) of the population (2) and are enriched in patients with familial FTD and familial ALS (25% and 20–67%, respectively). The expansion also appears in up to 7% of sporadic FTD and ALS patients, making it the most prevalent genetic mutation in both disorders (3–5).
Although a validated threshold for repeat expansion pathogenicity has yet to be established, normal alleles have fewer than 20 repeats whereas disease-associated C9orf72 expansions have more than 30, and more typically, hundreds to thousands of repeats. Expanded alleles also manifest extensive somatic size mosaicism (1,6). The impact of repeat length and associated DNA methylation status on a range of clinical phenotypes requires further elucidation (3,7). Moreover, C9orf72 expansions may influence other neurodegenerative disorders, including Alzheimer's disease (2).
Molecular characterization of the hexanucleotide repeat can be challenging because the large size and high GC content of pathogenic C9orf72 expansions impede polymerization during PCR and limit the utility of conventional PCR-based fragment sizing methods. To this point, Akimoto et al. (8) evaluated C9orf72 genetic testing paradigms across laboratories and found a high incidence of false positives and false negatives with laboratory-developed PCR assays. They concluded that more informative and consistent analysis methods are needed, recommending a combination of gene-specific (GS) and repeat-primed (RP) PCR as a minimum for research use and Southern blotting for clinical diagnostics (8). In addition to the PCR challenges imposed by GC-rich repetitive DNA, repeat size analysis is confounded by sequence variations (SVs) at the 3′ end of the repeat region (5,9). 3′-SVs within PCR priming sites or target regions can attenuate amplification efficiency, distort allele sizing, and/or cause allele dropout. Testing of 6891 clinical samples identified 3′-SVs in 3% of the population (10), and that report extended the previously established 3′-variable region to include an additional 50 bp downstream of the repeat tract (10).
In this study, we demonstrate the analytical capabilities and performance of a novel two-mode, multiplexed PCR chemistry for genotyping the C9orf72 repeat tract. The assay balances co-amplification of GS and RP PCR products for accurate capillary electrophoresis (CE)-based sizing of C9orf72 repeats from genomic DNA (gDNA) samples. We show the utility of this GS/RP-PCR assay by genotyping the National Institute of Neurological Disorders and Stroke (NINDS) repository's ALS collection. With over 2000 samples, this collection is the largest publicly available annotated set of patient-derived cell lines and matched genetic materials, and it is a valuable, readily accessible resource for researchers studying ALS and other neurodegenerative disorders. We used GS/RP-PCR/CE to quantify the number of hexanucleotide repeats in this collection at single-repeat resolution while also detecting 3′-SVs and low-level size mosaicism. All expanded samples were further sized using a complementary GS-PCR/agarose gel electrophoresis (AGE) workflow that resolves amplicons exceeding the resolution window of CE; this companion analysis generated reproducible results for PCR products of up to 5.8 kb, consistent with amplification of about 950 repeats.

Materials and methods
Clinical genomic DNA samples

PCR/CE assays
Samples were analyzed using the AmplideX® PCR/CE C9orf72 Kit (Cat. # 49581; Research Use Only; Asuragen, Inc., Austin, TX). A general schematic of the 2- and 3-primer assays in the context of the C9orf72 gene region is shown in Figure 1.
Briefly, gDNA was PCR amplified using three primers: two GS primers that flank the repeat region (GS forward primer 5′-CGCAGCCTGTAGCAAGCTCTGGAACTCAGGAGTCG-3′; GS reverse primer 56-FAM/TGCGCCTCCGCCGCCGCGGGCGCAGGCACCGCAACCGCA-3′) and one forward RP primer designed to be complementary to three G4C2 repeats. Thermal cycling was performed on Applied Biosystems Veriti and 9700 Thermal Cyclers (Thermo Fisher, Waltham, MA) using 98 °C for 5 min; 37 cycles of 97 °C for 35 s, 62 °C for 35 s, and 72 °C for 3 min; and a final extension at 72 °C for 10 min. PCR product (2 µL) was mixed with Hi-Di™ Formamide (Thermo Fisher, Waltham, MA) and ROX 1000 Size Ladder (Asuragen, Inc., Austin, TX) for analysis by CE (3500xL Genetic Analyzer, Thermo Fisher, Waltham, MA). The FAM-labeled amplicons were detected using a fragment analysis protocol (50 cm capillary; 2.5 kV, 20 s injection; 19.5 kV run for 2400 s). By adjusting CE run parameters, the POP-7 sizing range can be extended beyond 145 repeats. Lower run voltages (<12 kV) allowed hyper-expanded alleles to be sized up to 180 repeats, at the cost of longer run times and a loss in sensitivity because the signal spreads within a "pile-up peak" (>145 repeats) that comprises aggregated amplicons too long to be adequately resolved by CE (Supplementary Figure 2). Electropherograms (.fsa files) were analyzed using GeneMapper v4.1 and manually annotated for the start of the repeat profile and the size of the full-length product. A PCR/CE control admixture sample composed of C9orf72 alleles with 2, 5, 8, and 10 repeats and a no-template control were included in all experiments. Repeat size was determined using a linear fit adjustment of the ROX ladder size peaks to the PCR/CE control sample alleles. Homozygous and discordant samples were further analyzed using an alternative 3′-distal (alternative GS reverse) priming site with the following sequence (6): 5′-ATGCAGGCAATTCCACCAGTCGCTAGAGGCGAAAGC-3′.
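The linear calibration against the 2-, 5-, 8-, and 10-repeat control can be sketched as follows. This is one plausible reading of the "linear fit adjustment", not the kit's actual software: it fits apparent amplicon size (bp, as sized against the ROX ladder) against the known repeat numbers of the control alleles, then inverts the line to convert sample peaks into whole repeats. The control peak sizes are hypothetical placeholders.

```python
# Sketch: convert CE peak sizes (bp) to G4C2 repeat counts by a least-squares
# line fitted to a control with known alleles. Peak sizes are HYPOTHETICAL.


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def bp_to_repeats(size_bp, intercept, slope):
    """Invert the calibration line and round to the nearest whole repeat."""
    return round((size_bp - intercept) / slope)


# Known control alleles (repeats) and their apparent ROX-sized peaks (bp).
CONTROL_REPEATS = [2, 5, 8, 10]
CONTROL_SIZES_BP = [142.1, 160.3, 178.2, 190.4]  # hypothetical values

intercept, slope = fit_line(CONTROL_REPEATS, CONTROL_SIZES_BP)

if __name__ == "__main__":
    for peak in (148.0, 172.5, 311.0):
        print(peak, "bp ->", bp_to_repeats(peak, intercept, slope), "repeats")
```

Because the slope is fitted rather than fixed at 6 bp per repeat, the fit can absorb the small mobility bias that GC-rich fragments may show relative to the ladder; real control peaks would be read from the GeneMapper export.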

PCR/AGE assay
To assess expanded alleles with >145 repeats, the AmplideX® PCR/CE C9orf72 Kit was used with the following modifications: (1) 80 ng gDNA input, (2) omission of the RP primer (GS primers only), and (3) an alternative thermal cycling protocol (Supplementary materials). A total of 177 expanded samples were reflexed to this modified 2-primer assay and resolved by CE and AGE. A positive PCR/AGE control, consisting of samples ND06751 and ND09492 mixed in equal parts, was used to evaluate assay processivity, sensitivity, and sizing accuracy. PCR products (13 µL) and 2-Log DNA Ladder (NEB) were loaded onto a 12-well Reliant Precast 1% SeaKem Gold DNA MiniGel (Lonza Walkersville, Inc., Walkersville, MD) and visualized with ethidium bromide. AGE data were analyzed using GelAnalyzer software (2010a, Dr. Istvan Lazar). To improve the accuracy of software-assisted sizing and account for non-linear electrophoretic mobility, fragment sizing was partitioned into three partially overlapping AGE size ranges defined using the reference ladder. Target DNA bands were assigned visual intensity levels of high (H), medium (M), or low (L) by comparison to the 2-Log DNA Ladder bands, with the 4.0–10.0 kb ladder bands serving as the medium-intensity reference.
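Partitioning gel sizing into overlapping ranges can be illustrated with a piecewise semi-log standard curve. This sketch does not reproduce GelAnalyzer's algorithm: it assumes a separate straight line of log10(size) versus migration distance in each of three windows, and it sizes a band with the window whose ladder span best centres it. The ladder distances and window boundaries are hypothetical.

```python
import math

# HYPOTHETICAL ladder calibration: (size_bp, migration_mm) for 2-Log bands.
LADDER = [
    (500, 40.0), (700, 35.5), (1000, 31.0), (1500, 26.0), (2000, 22.5),
    (3000, 18.0), (4000, 15.0), (5000, 13.0), (6000, 11.5), (8000, 9.5),
    (10000, 8.0),
]
# Three partially overlapping size windows (bp); boundaries are illustrative.
RANGES_BP = [(500, 2000), (1500, 5000), (4000, 10000)]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def build_windows(ladder, ranges_bp):
    """Fit log10(size) against distance separately within each window."""
    windows = []
    for lo, hi in ranges_bp:
        pts = [(d, math.log10(bp)) for bp, d in ladder if lo <= bp <= hi]
        dists = [d for d, _ in pts]
        intercept, slope = fit_line(dists, [y for _, y in pts])
        windows.append((min(dists), max(dists), intercept, slope))
    return windows


def size_band(distance_mm, windows):
    """Size a band with the covering window whose centre is nearest."""
    candidates = [w for w in windows if w[0] <= distance_mm <= w[1]]
    if not candidates:
        raise ValueError("band lies outside the calibrated ladder range")
    _lo, _hi, intercept, slope = min(
        candidates, key=lambda w: abs(distance_mm - (w[0] + w[1]) / 2)
    )
    return 10 ** (intercept + slope * distance_mm)


windows = build_windows(LADDER, RANGES_BP)

if __name__ == "__main__":
    for d in (24.0, 13.0):
        print(f"{d} mm -> ~{size_band(d, windows):.0f} bp")
```

Using several short semi-log segments instead of one curve keeps each fit close to linear, which is the usual motivation for splitting a wide agarose sizing range.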

DNA sequence analysis
Following PCR amplification and size analysis, samples with abnormal repeat profiles were cloned and sequenced to resolve allele-specific changes. C9orf72 repeat amplicons were cloned into the pGEM-T Easy Vector System II (Promega, Madison, WI). Positive colonies were expanded and sequenced (Sanger) on an ABI Prism 3730xl DNA analyzer (Thermo Fisher, Waltham, MA).

PCR assay design and performance evaluation
We designed two sets of single-tube PCR reagents with overlapping components to amplify the ALS/FTD-associated (G4C2)n repeats in the C9orf72 gene (Figure 1(a)). A 3-primer PCR design operates in two modes by combining degenerate repeat priming with repeat-flanking GS priming (GS/RP-PCR; Figure 1(b)). Omitting the RP primer yields a 2-primer GS-PCR that generates large amplicons free of the RP-profile background, which further clarifies the detection of full-length products (Figure 1(c) and 1(e)). GS/RP-PCR amplicons produce an overlapping electrophoretic repeat profile with a 6 bp peak spacing and create two distinct yet complementary data signatures that are ideally resolved by high-resolution CE (Figure 1(d)). When hyper-expanded alleles (>145 repeats) are present, the high-molecular-weight amplicons can exceed the sizing range of CE (950 bp; Figure 1(d), inset) and produce a "pile-up peak". GS-PCR products generated with the 2-primer assay design can be resolved by CE or, for hyper-expanded repeats, by AGE (Figure 1(e)).
The 3-primer GS/RP-PCR configuration enables CE-based sizing up to 145 repeats (Figure 2). Allele sizing can be assessed from the mobility of GS amplicons or by directly counting RP amplicon peaks. Importantly, the RP peak profile independently identifies hyper-expanded alleles and thus resolves uncertainty in allele zygosity (Figure 2(a)).
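How the RP profile resolves zygosity can be sketched as a small calling rule. The cutoffs follow the text (normal <20, expanded >30, CE window of about 145 repeats); the "intermediate" label for 20–30 repeats, the CeResult structure, and the field names are hypothetical conveniences, not the kit's output format. A single GS peak with no hyper-expanded RP signal is flagged for the orthogonal 3′-distal primer, mirroring the reflex described in Methods.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Cutoffs taken from the text: normal < 20, expanded > 30, CE window ~145.
# Labelling 20-30 repeats as "intermediate" is an illustrative assumption.
NORMAL_MAX = 19
EXPANDED_MIN = 31
CE_LIMIT = 145


@dataclass
class CeResult:
    """Hypothetical summary of one sample's GS/RP-PCR/CE electropherogram."""
    gs_peaks: List[int]       # repeat sizes of resolved GS peaks
    rp_beyond_limit: bool     # RP ladder continues past the CE sizing window
    pileup_peak: bool         # aggregated signal at the top of the CE range


def classify_allele(repeats: int) -> str:
    if repeats <= NORMAL_MAX:
        return "normal"
    if repeats >= EXPANDED_MIN:
        return "expanded"
    return "intermediate"


def call_genotype(res: CeResult) -> Tuple[str, List[str], bool]:
    """Return (category, allele labels, needs_orthogonal_check)."""
    peaks = sorted(set(res.gs_peaks))
    hyper = res.rp_beyond_limit or res.pileup_peak
    labels = [f"{n} ({classify_allele(n)})" for n in peaks]
    if hyper:
        # The RP profile reveals an allele too long to give a GS peak on CE.
        labels.append(f">{CE_LIMIT} (hyper-expanded)")
    classes = {classify_allele(n) for n in peaks}
    if hyper or "expanded" in classes:
        category = "expanded"
    elif "intermediate" in classes:
        category = "intermediate"
    else:
        category = "normal"
    # A single GS peak without hyper-expanded RP signal is either a true
    # homozygote or a dropout (e.g. a 3'-SV under a primer), so flag it.
    needs_check = len(peaks) == 1 and not hyper
    return category, labels, needs_check


print(call_genotype(CeResult([2], rp_beyond_limit=True, pileup_peak=True)))
# ('expanded', ['2 (normal)', '>145 (hyper-expanded)'], False)
```

The key design point is that a lone GS peak is not, by itself, evidence of homozygosity: the size-agnostic RP signal decides whether the "missing" allele is hyper-expanded or still unexplained.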
Analytical verification experiments showed that the assay detected expanded alleles over a 2-log range of gDNA input, down to 1 ng, and revealed low-level mosaic alleles down to a 5% mass fraction of an expanded allele spiked into a 95% normal background (Figure 2(b,c)). Similarly, reproducible low-level mosaic amplicon profiles were observed in 67% of expanded samples, were absent from normal samples, and could be consistently resolved by CE and AGE (Figure 2(d) and Supplementary Figure 3). Finally, the assay was evaluated against 569 previously characterized gDNA samples from the NINDS ALS repository (11), demonstrating full categorical concordance and 99.3% sizing agreement within ±1 repeat (Supplementary Table 1). Interestingly, the four size-discordant samples (ND11513, ND11685, ND120467, ND12382), previously reported as homozygous normal (11), were re-examined using an alternative 3′-distal primer configuration and were determined to be heterozygous (two different normal alleles in each case), consistent with the results of the primary PCR assay.
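The ±1-repeat sizing agreement can be computed by comparing diploid calls allele by allele after sorting. In this sketch both callers are assumed to report two alleles per sample (homozygotes written as (n, n)); the genotype values are placeholders chosen only to reproduce the reported counts of 565 concordant and 4 discordant samples out of 569.

```python
from typing import Sequence, Tuple

Genotype = Tuple[int, int]  # two repeat sizes; homozygotes written as (n, n)


def alleles_agree(ref: Genotype, new: Genotype, tol: int = 1) -> bool:
    """True if each sorted allele pair differs by at most `tol` repeats."""
    return all(abs(a - b) <= tol for a, b in zip(sorted(ref), sorted(new)))


def percent_agreement(pairs: Sequence[Tuple[Genotype, Genotype]], tol: int = 1) -> float:
    """Percentage of samples whose two calls agree within `tol` repeats."""
    agree = sum(alleles_agree(ref, new, tol) for ref, new in pairs)
    return 100.0 * agree / len(pairs)


# Placeholder genotypes: 565 concordant samples plus 4 previously
# "homozygous" samples that the new assay calls heterozygous.
pairs = [((2, 8), (2, 8))] * 565 + [((5, 5), (5, 8))] * 4
print(f"{percent_agreement(pairs):.1f}% within +/-1 repeat")  # 99.3% within +/-1 repeat
```

Sorting before comparison matters: a reference call of (5, 5) against a new call of (5, 8) fails on the second allele, which is exactly the homozygous-versus-heterozygous pattern seen in the four discordant samples.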

Genotyping of the NINDS ALS sample collection
Following verification of the assay's analytical performance, the GS/RP-PCR was used to genotype cell-line gDNA samples from the NINDS ALS Repository (n = 2095). C9orf72 genotype information for the sample set is provided in Supplementary Table 1 and summarized in Figure 3(a).
Consistent with the established prevalence of C9orf72 expansions in ALS-positive populations (10), 177 (8.4%) of the 2095 samples harbored expanded C9orf72 alleles, and only two of these samples had primary expanded alleles between 30 and 145 repeats (ND10554 and ND12780, with 56 and 70 repeats, respectively). The size distribution of normal C9orf72 alleles also agreed with earlier analyses of European populations (12), underscoring the relatively high frequency of 2-, 5-, and 8-repeat alleles (78% of all normal alleles; Figure 3(b)) and the resulting high prevalence of homozygous samples (28.9%).
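Why a few common alleles drive up homozygosity can be shown with the Hardy–Weinberg expectation, P(homozygous) = Σ pᵢ². The frequencies below are hypothetical: 2, 5, and 8 repeats together make up 78% of alleles, as in the text, and the remaining 22% is spread evenly over eleven other sizes purely for illustration. This is a textbook population-genetics calculation added for intuition, not an analysis performed in the study.

```python
from typing import Dict


def expected_homozygosity(freqs: Dict[int, float]) -> float:
    """Hardy-Weinberg expectation: P(homozygous) = sum over alleles of p_i**2."""
    if abs(sum(freqs.values()) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    return sum(p * p for p in freqs.values())


# HYPOTHETICAL frequencies: 2, 5 and 8 repeats sum to 78% (as in the text);
# the other 22% is split evenly over eleven sizes purely for illustration.
freqs = {2: 0.44, 5: 0.14, 8: 0.20}
freqs.update({n: 0.02 for n in (3, 4, 6, 7, 9, 10, 11, 12, 13, 14, 15)})

print(f"expected homozygosity ~ {expected_homozygosity(freqs):.1%}")  # ~ 25.8%
```

With these made-up numbers the expectation is about 26%, the same order as the observed 28.9%; the sketch ignores expanded alleles and any departure from equilibrium, so it only illustrates the direction of the effect.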
All 177 expanded samples were analyzed using the GS-PCR assay and resolved by CE and AGE. Amplicons as large as 5.8 kb were produced, corresponding to 950 repeats (Figure 3(c); ND12667, ND11081). The results also demonstrated the mosaic nature of C9orf72 cell-line gDNA (Figures 2(d) and 3(c)). AGE images for the 177 samples and the corresponding table of expanded genotypes are provided in Supplementary Figure 2 and Supplementary Table 2, respectively.
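Converting an AGE band (or the span of a mosaic smear) into repeat counts is simple arithmetic once the non-repeat portion of the GS amplicon is known. In this sketch FLANK_BP is an assumption: about 100 bp is what the text's approximate 5.8 kb ↔ ~950-repeat correspondence implies, but the true value depends on the primer pair and should be taken from the assay design.

```python
REPEAT_UNIT_BP = 6  # one G4C2 hexanucleotide
FLANK_BP = 100      # ASSUMED non-repeat length of the GS amplicon


def repeats_from_amplicon(size_bp: float, flank_bp: float = FLANK_BP) -> int:
    """Approximate repeat count for a GS amplicon of the given size."""
    if size_bp < flank_bp:
        raise ValueError("amplicon shorter than its non-repeat flanks")
    return round((size_bp - flank_bp) / REPEAT_UNIT_BP)


def repeat_range(band_a_bp: float, band_b_bp: float, flank_bp: float = FLANK_BP):
    """Convert the edges of a smeared (mosaic) band into a repeat-count range."""
    lo, hi = sorted((band_a_bp, band_b_bp))
    return repeats_from_amplicon(lo, flank_bp), repeats_from_amplicon(hi, flank_bp)


print(repeats_from_amplicon(5800))  # 950
print(repeat_range(3100, 2200))     # (350, 500)
```

At these sizes, gel-sizing uncertainty of a few percent translates into tens of repeats, so such values are best read as approximate ranges rather than single-repeat calls.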
Sequence variations have been identified at the 3′ end of the repeat region (3′-SVs) (5,9,10). Although 3′-SVs have unknown clinical significance, they can confound interpretation by causing allele dropouts or by distorting amplicon mobility and thus skewing C9orf72 allele size conversions. To circumvent these effects, primers were positioned outside regions with reported variability (Figure 1(a) and section "Materials and methods"). Furthermore, the RP amplicons, primed in the complementary orientation, span this variable region and can therefore identify and indirectly characterize 3′-SVs through their impact on sample repeat profiles (Figure 3(d)). Sixty-four of the 2095 ALS samples (3.1%) had indels downstream of the repeat tract. These indels were found primarily in expanded samples, where they occurred in 30% of expanded cell lines; by comparison, indels were rare (0.6%) in non-expanded samples. Insertions were 10.6-fold less common than deletions and manifested either as RP signal dips or as offsets in the otherwise fixed 124 bp RP start site (Figure 3(d)). A subset of these samples (n = 6) was further investigated by Sanger sequencing, which confirmed the RP-based analysis and identified previously reported 3′-SVs (10). Sequencing also revealed an insertion of a common 6 bp element within the repeat stretch (Figure 3(d)) in two unrelated samples. Interestingly, this 6 bp element also maps directly downstream of the repeat stretch, where it was previously identified as a frequent 3′-deletion site (5,9).
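The two RP-profile signatures described above, an offset from the fixed 124 bp start site and local signal dips, can be screened for automatically. This is a hypothetical heuristic, not the analysis used in the study (which relied on manual annotation): the dip threshold (half the mean of the neighbouring peaks) and the ±1 bp start-site tolerance are illustrative assumptions, and the sign interpretation in the comment is ours.

```python
from typing import Dict, List

RP_START_BP = 124  # expected first RP peak, as stated in the text
FRACTION = 0.5     # HYPOTHETICAL dip threshold relative to neighbours
TOL_BP = 1         # HYPOTHETICAL sizing tolerance for the start site


def rp_start_offset(peaks_bp: List[float], expected: float = RP_START_BP) -> int:
    """Signed shift (bp) of the first RP peak from its expected position."""
    return round(min(peaks_bp) - expected)


def find_dips(heights: List[float], fraction: float = FRACTION) -> List[int]:
    """Indices of RP peaks lower than `fraction` x the mean of their neighbours."""
    dips = []
    for i in range(1, len(heights) - 1):
        neighbour_mean = (heights[i - 1] + heights[i + 1]) / 2
        if heights[i] < fraction * neighbour_mean:
            dips.append(i)
    return dips


def screen_rp_profile(peaks_bp: List[float], heights: List[float],
                      tol_bp: int = TOL_BP) -> Dict[str, object]:
    """Flag a sample whose RP profile suggests a 3'-SV."""
    offset = rp_start_offset(peaks_bp)
    dips = find_dips(heights)
    return {
        # Interpretation (assumed): a negative offset is consistent with a
        # deletion in the flank spanned by RP amplicons, a positive one with
        # an insertion.
        "start_offset_bp": offset,
        "dip_indices": dips,
        "flag_3prime_sv": abs(offset) > tol_bp or bool(dips),
    }


peaks = [124 + 6 * i for i in range(10)]
heights = [1000, 950, 900, 300, 850, 800, 780, 760, 740, 720]
print(screen_rp_profile(peaks, heights))
```

Flagged samples would still need confirmation, for example by the alternative 3′-distal primer or by sequencing, as was done for a subset here.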
To ensure that no alleles dropped out of the primary analysis, all 606 samples (29%) initially identified as homozygous were verified by PCR with an alternative distal priming site located 194 bp downstream of the standard primer. These orthogonal results confirmed the findings of the primary GS/RP-PCR assay, despite the notable presence of 3′-SVs in this sample set.

Discussion
PCR innovations have produced reliable systems for amplifying long tracts of repetitive GC-rich DNA, including those with >95% GC content and >1000 CGG repeats in the 5′-UTR of the FMR1 gene (13,14). In this study, we leveraged these advances to design, verify, and apply a novel PCR assay to characterize the C9orf72 repeat region across the complete NINDS ALS sample collection (n = 2095). The results broadly agreed with earlier population studies (5,12) in allele size distributions and categorical sample genotypes, as well as in the frequency and location of 3′-SVs. Additionally, we observed genotypic concordance for an available subset of previously annotated NINDS samples (n = 569) (11). These findings underscore the analytical validity of the C9orf72 PCR assay, as well as the relevance of the NINDS cell lines for ALS research.
The PCR technology offers several benefits over conventional C9orf72 PCR and recently published long-read assays (15,16). First, the PCR can navigate extreme GC content (>98% GC) and accurately size up to 145 repeats by CE, with an extended range of up to 950 repeats by AGE. Second, the sensitivity of the assay enables detection of expanded alleles from 1 ng of gDNA and identification of expanded major and minor alleles down to a 5% mass fraction. Third, the single-tube, 3-primer (GS/RP-PCR) design overlays the RP profile onto prominent GS peaks, generating a multiplex, information-rich readout that adds analytical value in three ways: (1) it enhances sizing accuracy through direct RP peak counting that can confirm sizing from GS peaks; (2) it confirms sample zygosity through sensitive, size-agnostic detection of expanded repeat profiles and pile-up peaks; and (3) it flags 3′-SVs through repeat-profile irregularities such as signal dips or peak offsets. Finally, the assay may be used with gDNA isolated from whole blood and other biosamples, with results comparable to those from cell-line DNA.
Deviations in RP-PCR profiles reflect repeat SVs, which are associated with phenotypic or heritable genetic features in several other repeat disorders (10,17–19). In this study, we identified 3′-SVs in the C9orf72 repeat region in 3.1% of the NINDS samples, predominantly in expanded samples. Sequencing data confirmed these observations and identified, in the repeat tract of two unrelated samples, an insertion of a 6 bp element (5′-CGTGGT-3′) previously reported as a deletion "hot spot" (5,9,10). These results reiterate the highly variable nature of this region and the need to interrogate the hexanucleotide sequence space for insertions and repeat interruptions. To this end, "dips" in the GS/RP-PCR repeat profile are well suited to sensitive, rapid identification of such interruptions deep within the repeat region and could be used in future screening studies to help elucidate their prevalence and possible impact on C9orf72 biology.
From an analytical standpoint, 3′-SVs may attenuate PCR efficiency and/or cause allele dropouts if they affect PCR priming sites. This, in turn, could lead to misclassification of samples as homozygous (only a single allele detected). Indeed, this may well explain the discordance of the four size-discrepant samples in our study that were previously classified as homozygous (11) (Supplementary Table 1).
3′-SVs have been shown to extend 80 bp beyond the repeat element (10). If 3′-SVs fall within the GS-PCR amplicons, they can cause allele-sizing inaccuracies and even categorical miscalls (20,21). This is a particular concern for "at-risk" alleles near the expanded categorical cutoff of 30 repeats. We estimate from our data that at-risk alleles that also carry a 3′-SV are rare, occurring at approximately 1 in 10,000 alleles; this estimate is based on a 0.6% indel rate in alleles with <145 repeats and a 1.8% frequency of at-risk alleles with 17–42 repeats (Figure 3(b)). Furthermore, the GS/RP-PCR assay design allows inspection of RP peak-profile irregularities, which can help identify 3′-SVs and at-risk samples.
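The 1-in-10,000 estimate follows from multiplying the two stated rates. The sketch makes explicit the assumption behind that product, namely that indel status is independent of allele size within the <145-repeat range.

```python
def at_risk_indel_frequency(indel_rate: float, at_risk_fraction: float) -> float:
    """Joint frequency of an at-risk-sized allele that also carries an indel.

    Multiplying the two rates assumes indel status is independent of allele
    size within the <145-repeat range; that independence is an assumption.
    """
    return indel_rate * at_risk_fraction


freq = at_risk_indel_frequency(0.006, 0.018)  # 0.6% and 1.8%, from the text
print(f"{freq:.2e} -> about 1 in {round(1 / freq):,} alleles")
# 1.08e-04 -> about 1 in 9,259 alleles
```

The exact product, about 1 in 9,300, is rounded in the text to the order-of-magnitude figure of 1 in 10,000.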
When a skewed RP start site is observed and the repeat profiles of individual alleles are out of phase, careful repeat-peak tracing can accurately resolve the skewed size and even indicate which allele carries the underlying 3′-SV (Figure 3(d)).
The analytical risks associated with rare variants are shared by all sizing assays, including Southern blotting (e.g., when variants occur at restriction enzyme cut sites or within a probe hybridization area, or as indels that alter the size of the repeat region). These risks can be mitigated by running orthogonal or multiplexed/multimodal confirmatory tests. Our results highlight the robust performance of the described PCR primer designs, as sample genotypes agreed with previous genotyping results and with results obtained using an alternative PCR design.
The majority of C9orf72 expanded samples have hyper-expanded alleles, yet several reports, including this study, have identified patients or patient-derived samples with primary expanded alleles shorter than 100 repeats (22–25). Interestingly, some of these patients remain asymptomatic at ages at which full penetrance is normally observed, suggesting a more complex mechanism of pathogenicity and inheritance for shorter expansions that may be influenced by genetic modifiers, SVs, environmental conditions, and somatic mosaicism (22,24,25). Although rare, these observations warrant investigation into the factors influencing pathology and into the categorical cutoffs for expansions, questions that the sizing capabilities of the PCR assays described here can help address.
In conclusion, we have described a multipurpose, streamlined PCR assay that quantifies molecular features of the C9orf72 repeat region. The assay reports repeat sizes from 2 to 950 repeats, detects expansions of >950 repeats in agreement with other assays, and flags SVs around the repeat tract. Importantly, this reagent system provides a rapid time to result in a single-tube format, requiring 4.5 hours for PCR/CE and 7 hours for PCR/AGE; accepts inputs 10- to 100-fold lower than published assays (8); and is configured and verified to help standardize results across laboratories. These capabilities can address well-documented performance inconsistencies in current assays (8) and may offer more reliable results in screening and genotype–phenotype studies (26). Finally, the technology may also help accelerate new therapeutic approaches, such as antisense oligonucleotides (27–30) or Cas9-based agents (31,32), for repeat expansion disorders such as ALS/FTD.

… official views of the NINDS or the National Institutes of Health. This study used cell-line DNA samples from the NINDS Repository, as well as clinical data. NINDS Repository sample numbers corresponding to the samples used are listed in Supplementary Table 1.

Declaration of interest
All authors are employed by Asuragen, Inc. Asuragen markets an RUO assay for genotyping C9orf72 hexanucleotide repeats (AmplideX® PCR/CE C9orf72 Kit). This work was supported by NINDS [R44NS089423] (to EB).