Rapidly emerging SARS-CoV-2 B.1.1.7 sub-lineage in the United States of America with spike protein D178H and membrane protein V70L mutations

ABSTRACT The SARS-CoV-2 B.1.1.7 lineage is highly infectious and as of April 2021 accounted for 92% of COVID-19 cases in Europe and 59% of COVID-19 cases in the U.S. It is defined by the N501Y mutation in the receptor-binding domain (RBD) of the Spike (S) protein, and a few other mutations. These include two mutations in the N-terminal domain (NTD) of the S protein, HV69-70del and Y144del (also known as Y145del due to the presence of tyrosine at both positions). We recently identified several emerging SARS-CoV-2 variants of concern, characterized by Membrane (M) protein mutations, including I82T and V70L. We now identify a sub-lineage of B.1.1.7 that emerged through sequential acquisitions of M:V70L in November 2020 followed by a novel S:D178H mutation first observed in early February 2021. The percentage of B.1.1.7 isolates in the US that belong to this sub-lineage increased from 0.15% in February 2021 to 1.8% in April 2021. To date, this sub-lineage appears to be U.S.-specific with reported cases in 31 states, including Hawaii. As of April 2021, it constituted 36.8% of all B.1.1.7 isolates in Washington. Phylogenetic analysis and transmission inference with Nextstrain suggest this sub-lineage likely originated in either California or Washington. Structural analysis revealed that the S:D178H mutation is in the NTD of the S protein and close to two other signature mutations of B.1.1.7, HV69-70del and Y144del. It is surface-exposed and may alter NTD tertiary configuration or accessibility, and thus has the potential to affect neutralization by NTD-directed antibodies.

Introduction
B.1.1.7 emerged in the UK and was the first major SARS-CoV-2 variant of concern (VOC) that is both more transmissible and apparently more virulent [1]. It now accounts for 50-90% of the COVID-19 cases in the US and Europe. The Spike (S) protein N501Y mutation in the receptor-binding domain (RBD) confers higher binding affinity of the S protein for ACE2, while two deletions in the N-terminal domain (NTD), HV69-70del and Y144del, may also play a role in ACE2 receptor binding or neutralizing antibody escape [2]. With millions of new B.1.1.7 cases in recent months, there is a very high probability of continued acquisition of new mutations, some of which may result in the emergence of new and even more infectious sub-lineages of B.1.1.7. Although these new mutations may not be significantly deleterious by themselves, when they appear in the context of other mutations within this VOC the result may be a more transmissible or pathogenic virus. This calls for rigorous genomic surveillance for newly acquired mutations in previously reported VOCs, including but not limited to B.1.1.7 and B.1.351.
Using the Children's Hospital Los Angeles (CHLA) COVID-19 Analysis Research Database (CARD) [3], and viral sequences submitted to GISAID and NCBI GenBank, we have routinely performed genomic epidemiology and genomic surveillance studies of local, national and international databases [4][5][6][7][8][9]. This allowed us to identify a new rapidly expanding SARS-CoV-2 lineage (B.1.575) with a signature mutation I82T in the M gene [7]. In the same study, we identified multiple other M mutations including V70L that are currently being encountered with significantly increased frequency. We have identified the M:V70L mutation in multiple SARS-CoV-2 lineages but primarily in the B.

Ethics approval
The study design conducted at Children's Hospital Los Angeles was approved by the Institutional Review Board under IRB CHLA-16-00429.

SARS-CoV-2 whole genome sequencing
Whole genome sequencing of the 2900 samples previously confirmed at Children's Hospital Los Angeles to be positive for SARS-CoV-2 by reverse transcription-polymerase chain reaction (RT-PCR) was performed as previously described [5].

SARS-CoV-2 sequence and variant analysis, and emerging variant monitoring
Full-length SARS-CoV-2 sequences were periodically downloaded from GISAID [10,11] and NCBI GenBank. They were combined with SARS-CoV-2 sequences from CHLA patients, annotated, and curated using a suite of bioinformatics tools, CHLA-CARD, as previously described [3]. A custom Surging Mutation Monitor (SMM) standardized and integrated the viral genome and demographic data to identify trends in surging mutations and lineages at the state and country levels. The current study was based on the 1.33 million global viral genomes that were available on 1 May 2021.
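The kind of trend monitoring described above can be illustrated with a minimal sketch. It is not the SMM implementation, which is not described in detail here. The record layout (`region`, `date`, `mutations`), the function names, and the two-fold threshold are all assumptions made for illustration, and the records are invented toy data.

```python
from collections import defaultdict

def monthly_prevalence(records, mutation):
    """Return {(region, 'YYYY-MM'): fraction of genomes carrying `mutation`}."""
    totals = defaultdict(int)
    carriers = defaultdict(int)
    for rec in records:
        key = (rec["region"], rec["date"][:7])  # month taken from an ISO date string
        totals[key] += 1
        if mutation in rec["mutations"]:
            carriers[key] += 1
    return {key: carriers[key] / totals[key] for key in totals}

def surging_months(prevalence, region, min_fold=2.0):
    """List months whose prevalence is at least `min_fold` times that of the
    previous month with data (hypothetical threshold, not from the study)."""
    months = sorted(m for (r, m) in prevalence if r == region)
    flagged = []
    for earlier, later in zip(months, months[1:]):
        before = prevalence[(region, earlier)]
        after = prevalence[(region, later)]
        if before > 0 and after / before >= min_fold:
            flagged.append(later)
    return flagged

# Toy records, invented for illustration only (not data from the study).
records = [
    {"region": "WA", "date": "2021-03-02", "mutations": {"S:N501Y"}},
    {"region": "WA", "date": "2021-03-09", "mutations": {"S:N501Y", "S:D178H"}},
    {"region": "WA", "date": "2021-03-15", "mutations": set()},
    {"region": "WA", "date": "2021-03-20", "mutations": set()},
    {"region": "WA", "date": "2021-04-01", "mutations": {"S:D178H"}},
    {"region": "WA", "date": "2021-04-05", "mutations": {"S:D178H"}},
    {"region": "WA", "date": "2021-04-07", "mutations": set()},
    {"region": "WA", "date": "2021-04-11", "mutations": set()},
]

prevalence_by_month = monthly_prevalence(records, "S:D178H")
flagged = surging_months(prevalence_by_month, "WA")
print(prevalence_by_month[("WA", "2021-03")], prevalence_by_month[("WA", "2021-04")])
print(flagged)
```

In this toy example the prevalence doubles from March to April, so April is flagged. A real monitor would also need to handle small denominators, which can make month-to-month ratios unstable.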

Results
Identification of a rapidly emerging B.1.1.7 sublineage
We evaluated 1,333,679 SARS-CoV-2 viral genomes available on 1 May 2021, including 2900 from our own institution and the rest from GISAID and NCBI GenBank. We searched for SARS-CoV-2 mutations with a significantly higher prevalence rate both in the US and globally. Candidate mutations were further partitioned by pangolin lineage to identify emerging mutations in the context of a specific lineage, such as B.1.1.7 or B.1.351. We focused initially on the M mutations that we previously identified, including V70L, which was spiking near the end of 2020 [7]. Overall, the percentage of isolates that carried the M:V70L mutation had been relatively stable in the US and globally, with a gradual month-to-month increase (Table 1). In the vast majority of cases, the M:V70L mutation occurred on the B.1.1.7 lineage. While the percentage of B.1.1.7 isolates with the V70L mutation remained relatively stable across the world, the percentage fluctuated significantly in the US, attributable largely to the initial small number of B.1.1.7 cases in the U.S.
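Partitioning by lineage amounts to asking what fraction of a lineage's genomes carry a given set of mutations. The sketch below shows one way to express that, assuming each genome is a dictionary with a `lineage` label and a set of `mutations`; the field names, the function name, and the toy genomes are hypothetical and are not taken from the study.

```python
def carrier_share(genomes, lineage, required):
    """Fraction of `lineage` genomes that carry every mutation in `required`."""
    in_lineage = [g for g in genomes if g["lineage"] == lineage]
    if not in_lineage:
        return 0.0
    # Set containment: a genome counts only if it has all required mutations.
    hits = sum(1 for g in in_lineage if required <= g["mutations"])
    return hits / len(in_lineage)

# Toy genomes, invented for illustration only (not data from the study).
genomes = [
    {"lineage": "B.1.1.7", "mutations": {"S:N501Y"}},
    {"lineage": "B.1.1.7", "mutations": {"S:N501Y", "M:V70L"}},
    {"lineage": "B.1.1.7", "mutations": {"S:N501Y", "M:V70L", "S:D178H"}},
    {"lineage": "B.1.1.7", "mutations": {"S:N501Y"}},
    {"lineage": "B.1.351", "mutations": {"S:N501Y", "S:D178H"}},
]

share_v70l = carrier_share(genomes, "B.1.1.7", {"M:V70L"})
share_both = carrier_share(genomes, "B.1.1.7", {"M:V70L", "S:D178H"})
print(share_v70l, share_both)
```

Requiring the full set of mutations keeps a genome that carries S:D178H on another background, such as the B.1.351 toy genome, out of the B.1.1.7 sub-lineage count.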
We identified the acquisition of another S mutation, D178H, in this B.1.1.7 sub-lineage (Figure 1 and Table 2), which was estimated to have occurred on 23 January 2021. By April, the prevalence of SARS-CoV-2 isolates carrying the S:D178H mutation had increased to 1.05% nationally and was as high as 14.77% in Washington. When we examined the prevalence of S:D178H in the context of the B.  Table 3). In California, S:D178H was first seen in December 2020, but it was not seen within the B.1.1.7 lineage until 4 February 2021 (Table 3). Its prevalence increased to 1.6% (45/2904) in April compared to all isolates studied in California,

Protein structure and mutation effect prediction
The 3D structure of the Spike protein was visualized using the CoV3D mutation viewer. This showed that the S:D178H mutation is structurally close to two signature deletions of B.1.1.7, HV69-70del and Y144del (Figure 4). All three are surface-exposed and likely alter N-terminal domain (NTD) tertiary configuration.

Discussion
It should be noted that our study relied primarily upon sequence data deposited at GISAID, which represents a limitation and could also introduce potential bias, as state public health laboratories have varied sequencing capacity and nonuniform data sharing and reporting practices. It is, however, less likely to be significantly biased by the practice of a single laboratory. As an example, we included 2900 SARS-CoV-2 sequences that we obtained at Children's Hospital Los Angeles since March 2020. We found only 29 sequences belonging to the B. This abrupt change could reflect undersampling or potentially reflect superspreader events. Given the spike in the number of B.1.1.7 cases, this sub-lineage clearly appears to be more transmissible than even the original B.1.1.7 lineage. This finding warrants prompt and further attention by public health authorities, as this mutation profile is closely linked to the resurgence of cases in Washington in particular. It also strongly supports the now widely recognized need for more extensive SARS-CoV-2 viral sequencing of PCR-positive COVID-19 cases for detection of new mutations of concern as part of widespread genomic surveillance [16,17].
The S:D178H mutation, while demonstrably associated here with the more pathogenic B.1.1.7 lineage, is not necessarily by itself more pathogenic. Dozens of SARS-CoV-2 genomes that carry the S:D178H mutation were reported before February, but none of these demonstrated the increased frequency seen when the mutation occurs in the context of the B.1.1.7 lineage. Phylogenetic analysis revealed a distinct and long branch leading to the new S:D178H branch after M:V70L. Together, these observations suggest that the S:D178H mutation is recurrent, but only increased exponentially in the context of the more pathogenic B.1.1.7 lineage, which serves as an argument for its fitness. This is the same observed  The S:D178H mutation arose independently again in the US on the B.1.1.7-M:V70L background. The rapid increase in its prevalence only after its acquisition by the B.1.1.7-M:V70L sub-lineage suggests this combination of mutations is associated with increased transmissibility. It is also of interest that this mutation occurs in the NTD, unlike most of the mutations associated with current VOCs, which are centred on the spike protein RBD, implying that NTD mutations beyond the original 69-70del and the 144del are of concern. Finally, it should be noted that this NTD mutation co-exists with the previously reported M protein mutation M:V70L, suggesting that M protein mutations also contribute to enhanced biologic "fitness" or pathogenicity of this sub-lineage.
The appearance of the S:D178H mutation in the context of the B.1.1.7 lineage is temporally associated with the increased incidence of COVID-19 in Washington. New cases in Washington were higher than the national level at the time of the study. According to the New York Times COVID-19 dashboard, the 7-day average of new cases on May 2 was 1379 in Washington, which was only a 50% reduction compared to 2757 cases on December 15. In comparison, the number on May 2 was 49,270 in the US, a 77.3% reduction from the December 15 number of 217,325. The appearance of this new B.1.1.7 sub-lineage temporally linked to increased cases in Washington warrants further investigation.
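The two reductions quoted above can be checked with a short calculation. The helper name is illustrative, and the sketch takes the December 15 national figure as 217,325.

```python
def percent_reduction(earlier, later):
    """Percentage decline from `earlier` to `later`."""
    return 100.0 * (earlier - later) / earlier

# 7-day averages of new cases quoted in the text (December 15 vs May 2).
washington = percent_reduction(2757, 1379)
national = percent_reduction(217325, 49270)

print(f"Washington: {washington:.1f}%  US: {national:.1f}%")
# prints "Washington: 50.0%  US: 77.3%"
```

The Washington decline is therefore roughly 27 percentage points smaller than the national decline over the same interval.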
The potential effect of the S:D178H mutation on immunity and vaccine "escape" also warrants further analysis. Mutations in the Spike N-terminal domain have been associated with a lack of neutralization by NTD-directed antibodies, especially when the N5 loop is affected [18,19]. The NTD initiates viral binding to the ACE2 receptor-expressing host cell. Since D178H falls in the NTD close to the N5 loop, it may alter NTD structure and antibody recognition. It may thus have a similar immune evasion effect as the HV69-70del and Y144del mutations [20], or it may further enhance that of the two other mutations, based on the 3D model. These findings highlight the continued importance of active genomic surveillance to monitor the spread of this B.1.1.7-M:V70L-S:D178H sub-lineage.