Multiple detection and spread of novel strains of the SARS-CoV-2 B.1.177 (B.1.177.75) lineage that test negative by a commercially available nucleocapsid gene real-time RT-PCR

ABSTRACT Several lineages of SARS-CoV-2 are currently circulating worldwide. During SARS-CoV-2 diagnostic activities performed in the Abruzzo region (central Italy), several strains belonging to the B.1.177.75 lineage tested negative for the N gene but positive for the ORF1ab and S genes (+/+/- pattern) by the TaqPath COVID-19 CE-IVD RT-PCR Kit manufactured by Thermofisher. By sequencing, a unique mutation, the synonymous 28948C > T, was found in the N-negative B.1.177.75 strains. Although we have no knowledge of the nucleotide sequences of the primers and probe adopted by this kit, it is likely that N gene dropout only occurs when 28948C > T is coupled with 28932C > T, the latter being present in all B.1.177.75 sequences available on public databases. An epidemiological analysis was also performed. The majority of the N-negative B.1.177.75 cases belonged to two clusters apparently unrelated to each other, and both clusters involved young people. However, the phylogeny for sequences with the +/+/- pattern strongly supports a genetic connection and one common source for both clusters; genetic comparison thus points to a connection rather than to the independent emergence of the same mutation in two apparently unrelated clusters. This study highlights once more the importance of sharing genomic data to link apparently unrelated epidemiological clusters and, notably, to update molecular tests.


Introduction
A hallmark of coronaviruses (CoVs) is their exceptional genetic plasticity, which may promote changes in their antigenic profile, tissue tropism or host range by means of two distinct mechanisms. First, the viral replicase (an RNA-dependent RNA polymerase) does not possess good proof-reading activity; the incorporation of wrong nucleotides at each replication cycle and the consequent accumulation of mutations in the viral genome therefore lead to a progressive differentiation of the viral progeny from the parental strain. This mechanism may cause the progressive adaptation of the viral surface proteins to the cell receptors, thus increasing viral fitness. Second, the distinctive replicating machinery of CoVs facilitates homologous and heterologous recombination events [1,2].
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) (species Severe acute respiratory syndrome-related coronavirus, subgenus Sarbecovirus, genus Betacoronavirus, family Coronaviridae) is the causative agent of the current pandemic of CoV respiratory disease, named coronavirus disease 2019 (COVID-19) by the WHO [3][4][5]. Since its emergence, SARS-CoV-2 has evolved rapidly, and several viral lineages (containing unique constellations of mutations, including several of known biological importance located in the S1 portion of the spike protein) are now circulating worldwide [6]. Among these lineages, B.1.1.7, P.1, and B.1.351 have raised international concern because of their enhanced transmission capabilities, mortality rates and/or reduced neutralization by the specific immunity stimulated by previous infections or by vaccines [7].
However, besides being able to affect viral pathobiology [8][9][10][11], mutations may have a significant impact on direct virus diagnosis by polymerase chain reaction (PCR)-based assays. Fast and accurate molecular detection of SARS-CoV-2 RNA in naso-oropharyngeal swab specimens is the first step to quickly identify positive cases and thus to prevent the virus from spreading further and to limit the impact of the pandemic on the human population. Hence, mutations occurring in those regions of the genome that are the target of molecular tests may generate false negative results, which could hamper proper diagnosis, contact tracing, and quarantine measures.
Several real-time RT-PCR assays have been developed and are commercially available for detection of SARS-CoV-2 RNA. They are generally designed to detect multiple (up to three) SARS-CoV-2 ORFs, including the large viral polymerase gene located in the 5′-most two thirds of the SARS-CoV-2 genome and ORFs encoding structural proteins located at the 3′ end of the genome. One of the most common kits for SARS-CoV-2 RNA diagnosis is the TaqPath COVID-19 CE-IVD RT-PCR Kit manufactured by Thermofisher® (Thermo Fisher Scientific, Waltham, MA, USA). This test simultaneously detects three different portions of the SARS-CoV-2 genome, located in the ORF1ab, spike (S) and nucleocapsid (N) protein ORFs, respectively. It is currently used for molecular diagnosis of SARS-CoV-2 at the Istituto Zooprofilattico Sperimentale dell'Abruzzo e del Molise (IZSAM), a veterinary public health institute designated by the Italian Ministry of Health and the Abruzzo region as a diagnostic hub for COVID-19 diagnosis and genome analysis [12].
In this manuscript, we describe the detection of multiple strains belonging to the B.1.177.75 lineage which tested negative for the N gene with the Thermofisher molecular test but positive for the ORF1ab and S genes. The implications for SARS-CoV-2 diagnosis of mutations in the N gene, potentially involved in the failed detection of the N target, are discussed.

Ethical approval
The results analysed in the present study derive from the official control activities performed by the Public Health Local Authority of the Abruzzo region, and no specific ethical approval is required.

Surveillance and epidemiological analysis
Specimens that are N-negative but positive for the ORF1ab and S genes (+/+/- pattern) are considered positive for SARS-CoV-2 according to the manufacturer's recommendations.
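The target-pattern notation used throughout this study can be made explicit with a short sketch. The class and function names below are illustrative assumptions and are not part of the kit software; only the ORF1ab/S/N ordering and the meaning of the +/+/- pattern come from the text. The sketch takes per-target detection calls as given and deliberately does not encode any CT cutoff.

```python
from typing import NamedTuple


class TargetCalls(NamedTuple):
    """Per-target detection calls for one swab (True = target detected).

    Hypothetical container; the field order mirrors the ORF1ab/S/N
    order used in the text.
    """
    orf1ab: bool
    s: bool
    n: bool


def pattern(calls: TargetCalls) -> str:
    """Render the calls in ORF1ab/S/N order, e.g. '+/+/-'."""
    return "/".join("+" if detected else "-" for detected in calls)


def is_n_dropout(calls: TargetCalls) -> bool:
    """True only when ORF1ab and S are detected and N is not (+/+/-)."""
    return calls.orf1ab and calls.s and not calls.n
```

For example, `pattern(TargetCalls(True, True, False))` gives the `+/+/-` string used for the N-negative samples described here.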
The first three samples (2021TE101854, 2021TE101848 and 2021TE101850) showing this pattern were collected on 19 February 2021 in Torino di Sangro (Chieti province). According to additional information provided by the Local Health Authority, the three samples belonged to individuals of the same family. As a consequence, we monitored the spread of N-negative SARS-CoV-2 strains between 20 February and 6 March 2021. N-gene negative samples were processed by means of an additional real-time RT-PCR targeting the N gene, 2019-nCoV_N1 [13]. Epidemiological analysis was performed for samples showing the +/+/- pattern with the support of the Local Health Authority (Azienda Sanitaria Locale, ASL), which provided personal data and epidemiological information on the detected cases.

Sequencing and genome analysis
Among the N-negative samples, those showing cycle threshold (CT) values <20 by the ORF1ab and S real-time RT-PCR assays were, when possible, further processed for whole genome sequencing by next generation sequencing (NGS) and for genome analysis as described previously by our group [12]. Samples were processed by NGS by means of two different protocols: the Artic v3 protocol and the CovidSeq protocol. For the Artic v3 protocol [14], cDNA was synthesized starting from 2.25 µl of purified RNA and then amplified with Q5 high-fidelity DNA polymerase (NEB) using each of the two Artic v3 primer pools tiling the SARS-CoV-2 genome. Library preparation was carried out using Nextera DNA Flex Library Prep (Illumina Inc., San Diego, CA, USA) and deep sequencing was performed on the MiniSeq (Illumina Inc., San Diego, CA, USA) using the MiniSeq Mid Output Kit (300-cycles) and standard 150 bp paired-end reads. For the CovidSeq protocol, the libraries were prepared from 8.5 µl of purified RNA, according to the manufacturer's protocol (Illumina COVIDSeq Test, Illumina Inc., San Diego, CA, USA). This method combines the ARTIC multiplex PCR protocol with Illumina sequencing technology. Deep sequencing was performed on the NextSeq 500 platform (Illumina Inc., San Diego, CA, USA) using the NextSeq 500/550 High Output Reagent Cartridge v2, with 75 cycles and 36 bp paired-end reads. For bioinformatic analyses, all tools were run with default parameters unless otherwise specified. Quality control of the reads was performed using FastQC. Reads were trimmed by Trimmomatic [15]. SARS-CoV-2 consensus sequences were obtained using iVar [16], and reads were then mapped to the reference sequence (Wuhan-Hu-1 [GenBank accession number NC_045512]) by Snippy [17].
Consensus sequences were submitted to the Pango COVID-19 lineage assigner [18] for lineage assignment. Obtained sequences were submitted to GISAID (Table 1). Sequences from public databases ([...] April 2021) were downloaded and aligned using MAFFT implementing the FFT-NS-2 algorithm [19]. Alignments were manually inspected in Geneious Prime® 2021.1.1 [20] and identical sequences were removed. A phylogenetic tree was estimated using IQTree with the Hasegawa-Kishino-Yano nucleotide substitution model and gamma-distributed rate variation among sites (HKY+Γ), with 1000 replicates of the SH-like approximate likelihood ratio test (SH-aLRT) for branch support [21,22]. The N gene from one of the +/+/- samples was used for a blastn search via the National Center for Biotechnology Information (NCBI) and the top 100 hits were added to the data set.
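The removal of identical sequences was carried out manually in Geneious Prime; the step itself can be illustrated with a minimal sketch. This is not the procedure used in the study: the function name, the name-to-sequence dictionary, and the case-insensitive comparison are all assumptions made for illustration.

```python
def drop_identical(records: dict[str, str]) -> dict[str, str]:
    """Keep the first record for each distinct aligned sequence.

    `records` maps a sequence name to its aligned sequence. Comparison
    is case-insensitive (an assumption); input order is preserved.
    """
    seen: set[str] = set()
    unique: dict[str, str] = {}
    for name, seq in records.items():
        key = seq.upper()
        if key not in seen:
            seen.add(key)
            unique[name] = seq
    return unique
```

Keeping the first occurrence is an arbitrary choice; which representative is retained does not affect the tree topology, only the tip labels.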

SARS-CoV-2 positive samples showing the +/+/- pattern are circulating in central Italy
From 20 February to 6 March 2021, a total of 32,681 nasopharyngeal swab specimens were tested at IZSAM for SARS-CoV-2 RNA by means of the TaqPath COVID-19 CE-IVD RT-PCR Kit. A total of 5,277 swabs tested positive for SARS-CoV-2 in the study period (5,277/32,681, 16.15%). Of these, 17/5,277 (0.32%) showed a +/+/- diagnostic pattern for the ORF1ab, S and N genes, respectively. CT values, sampling dates and onset of symptoms related to these cases are shown in Table 1. These samples were collected from individuals living in three municipalities located in the province of Chieti: Torino di Sangro, Lanciano, and Paglieta.
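The two proportions reported above follow directly from the counts. As a quick check, with a hypothetical helper name:

```python
def percentage(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


tested = 32_681      # swabs tested in the study period
positive = 5_277     # swabs positive for SARS-CoV-2
n_dropout = 17       # positives showing the +/+/- pattern

positivity = percentage(positive, tested)        # 16.15
dropout_share = percentage(n_dropout, positive)  # 0.32
```

Note that the second figure is expressed relative to positive swabs, not to all swabs tested.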
A blastn search of the N gene from a sequence carrying both 28932C > T and 28948C > T revealed five additional sequences with this same mutation pair, all of which were from Switzerland. Notably, two of these Swiss sequences were obtained in October 2020. Phylogenetic evaluation [...]. This group also includes one sequence from Switzerland, which contains the wild-type C nucleotide at position 28932 and the mutation T at position 28948.

Two apparently unrelated epidemiological clusters were identified
Information provided by the Local Health Authority highlighted the existence of two apparently unrelated clusters (clusters A and B) within the samples under investigation, plus three additional unrelated cases, all caused by N-negative SARS-CoV-2 strains of the B.1.177.75 lineage. In both clusters and in all cases, clinical signs were generally mild and included fever, coughing, headache, and myalgia. None of the cases required hospitalization. Cluster A, identified from the first three samples collected on 19 February 2021, included 10 cases. It involved two related families and two individuals living in adjacent houses. The first case of cluster A, a 3-year-old child, showed clinical signs on February 17, followed by his parents (February 18 and 19) and his older sister (February 20). Both children attended the same day-care in Torino di Sangro, where several SARS-CoV-2 cases (nine cases in total, including the two in our study population) were detected starting from February 15. One of these cases was the teacher of the positive 3-year-old child. Interestingly, none of the other SARS-CoV-2 cases related to the day-care, including the teacher, showed the +/+/- pattern, as they were also positive for the N gene. The remaining cases belonging to cluster A showed clinical signs shortly after the first case, between February 18 and 20.
Cluster B comprised seven cases identified from February 27 to March 5. The cases, of which only five were symptomatic, belonged to two families living in Lanciano (Abruzzo region), linked by two teenagers attending the same class. Interestingly, two swab samples collected from pupils of the same school on March 2 and 3, respectively, showed the +/-/+ diagnostic pattern, a feature that, with the adopted molecular assay, is strongly suggestive of the B.1.1.7 lineage, which was widespread in the area [23,24]. Thus, the cluster B cases were unlikely to be linked to those occurring at school.
Phylogenetic analysis showed that all sequences form a distinct clade and are thus genetically related to each other (Figure 2). Three additional unrelated cases showing the +/+/- diagnostic pattern were identified at IZSAM during the survey period, but only two were sequenced. One of these involved an individual who worked in the same town as the cases of cluster A; the remaining two cases did not have any apparent connection to the other cases.
In addition, two further sequences, downloaded from GISAID, are present in this clade. One (hCoV-19/Italy/ABR-IZSGC-160455/2021) was collected after the survey period, while the other (hCoV-19/Italy/ABR-EN893/2021) was processed by another diagnostic and sequencing hub in the Abruzzo region. Unfortunately, we do not have any epidemiological information on these cases.

Discussion
We describe here the circulation of B.1.177.75 strains that are N-negative when processed with one of the most common kits for SARS-CoV-2 RNA diagnosis, the TaqPath COVID-19 CE-IVD RT-PCR Kit manufactured by Thermofisher. This phenomenon is likely due to the emergence, in the B.1.177.75 lineage, of the synonymous 28948C > T mutation. Although we have no knowledge of the nucleotide sequences of the primers and probe adopted by this kit, it is likely that this gene dropout only occurs when 28948C > T is coupled with 28932C > T, the latter being present in all B.1.177.75 sequences available on public databases. Thus, 28948C > T is the sole mutation distinguishing the N-negative B.1.177.75 samples from N-positive strains belonging to the same lineage. It is known that even a few mismatches in the oligonucleotide binding region can affect amplification efficiency, and can prevent amplification altogether when located at the very 3′ end of the primer(s) or in the middle of the TaqMan probe [25][26][27]. In a TaqMan assay for detection of rabies virus RNA, the number of sequence mismatches between gene-specific oligonucleotides and the target sequence significantly affected amplification, and point mutations at the centre of the probe resulted in false-negative results by preventing probe binding and subsequent fluorescence [28]. Point mutations at positions 11-17 of the probe target site were found to yield false-negative results in a real-time RT-PCR assay for human respiratory syncytial virus [29].
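The positions of the two mutations within the N gene can be worked out with a short sketch. The N ORF start coordinate used below (28,274 in the Wuhan-Hu-1 reference, NC_045512) is taken from the reference annotation and is an assumption not stated in the text; the function name is likewise hypothetical. The kit's oligonucleotide sequences remain unknown, so nothing here locates the actual primer or probe binding sites.

```python
# Assumption: 1-based start of the N ORF in the Wuhan-Hu-1 reference
# (NC_045512); this value is not given in the text.
N_ORF_START = 28_274


def codon_position(genome_pos: int, orf_start: int = N_ORF_START) -> tuple[int, int]:
    """Return (codon number, position within codon), both 1-based."""
    offset = genome_pos - orf_start  # 0-based offset into the ORF
    if offset < 0:
        raise ValueError("position lies upstream of the ORF start")
    return offset // 3 + 1, offset % 3 + 1


first = codon_position(28_932)   # (220, 2)
second = codon_position(28_948)  # (225, 3)
spacing = 28_948 - 28_932        # 16 nucleotides apart
```

Under this assumption, 28948C > T falls on the third base of codon 225, which is consistent with its description as synonymous. The two sites lie only 16 nucleotides apart, so both could in principle fall within the binding region of a single primer or probe, which would fit the hypothesis that dropout requires both changes.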
Mutations 28948C > T and 28932C > T in the N gene are not novel among the many mutations observed during SARS-CoV-2 evolution and adaptation to the human host. Indeed, they have been noticed, independently and regardless of the lineage, in SARS-CoV-2 sequences available on public databases. However, we were not able to quantify their presence across the GISAID database. GISAID currently holds ∼1.5 million sequence entries, too many for us to analyse exhaustively. Although GISAID offers the option to search for specific mutations/variants, those found in this study are not on that list, which suggests that their global prevalence is low.
As we did not observe any +/+/- diagnostic pattern prior to 19 February 2021, and considering that IZSAM processes for SARS-CoV-2 RNA the vast majority (up to 65%) of swab samples of the entire Abruzzo region, we may speculate that mutation 28948C > T emerged with case 2021TE101854, a 3-year-old child who likely got infected by a B.1.177.75 N-positive individual, possibly his teacher at the day-care. Epidemiologically, this is the most likely scenario, as the upstream contacts of 2021TE101854 did not show this mutation (being N-gene positive by real-time RT-PCR) and as this child was the first case to show symptoms in the cluster. The phylogeny for sequences with the +/+/- pattern strongly supports a genetic connection and one common source for clusters A and B, as well as for the epidemiologically unrelated cases. Epidemiological connections are difficult to establish, particularly when young people are involved. Genetic comparison was nevertheless useful in supporting a connection rather than the independent emergence of the same mutation in two apparently unrelated clusters. In addition, the lack of an epidemiological link suggests the presence of additional cases (not sequenced) with the +/+/- pattern. Indeed, additional SARS-CoV-2 diagnostic hubs are active in the Abruzzo region; thus, we cannot exclude that other samples harbouring this mutation might have been tested elsewhere and potentially processed with different diagnostic kits that do not reveal this diagnostic pattern.
This study certainly has some limitations. First, we were not able to sequence any of the N-positive cases connected to the N-negative clusters, as a consequence of the fast turnaround of samples at the COVID-19 diagnostic hub of the IZSAM. These samples were unfortunately discarded before epidemiological investigations were carried out.
[Figure caption, truncated] [...] Figure 1 for those sequences containing the mutation leading to the +/+/- pattern. Lineages are indicated and sequences belonging to cluster A or B are coloured in orange and green, respectively. Branch length is indicative of nucleotide substitutions per site.
Second, we did not demonstrate with certainty that the combination of 28948C > T and 28932C > T is responsible for the observed N-gene dropout. In this regard, however, it is important to point out that, in support of our hypothesis, 28948C > T is the only mutation differentiating N-negative from N-positive B.1.177.75 samples and that, when the N-negative samples were tested with a different N-based molecular test, all of them tested positive.
The number of infections caused by N-negative B.1.177.75 strains is limited overall and, importantly, these strains do not seem to cause more severe disease or more sustained transmission. However, the existence of strains carrying this genomic hallmark has obvious consequences for molecular diagnosis, as the Thermofisher TaqPath COVID-19 CE-IVD RT-PCR Kit is one of the most widely used assays for SARS-CoV-2 diagnosis. Albeit limited, this evidence highlights the need for continuous surveillance and sharing of genomic data, which are essential to update molecular tests, and the need for multiple genetic targets in nucleic acid amplification tests.
One of the most worrisome and currently widespread lineages, B.1.1.7, is characterized by S-gene dropout when tested with the same assay used in this study. This characteristic is related to an in-frame, 6-nucleotide deletion in the S gene, which is responsible for a 2-amino acid deletion at positions 69 and 70 of the spike protein (69-70del). This particular deletion has been observed in multiple distinct lineages besides B.1.1.7, notably in the mink cluster V lineage from Denmark and also in some B.1.177 strains. It cannot be ruled out that, as a consequence of convergent evolution or homologous recombination events, the 28948C > T and 28932C > T mutations may emerge in other lineages, including B.1.1.7, which could lead to substantial diagnostic problems. Therefore, it remains essential to share genomic data whenever novel diagnostic patterns are observed with multi-target assays.