Shotgun proteomics of SARS-CoV-2 infected cells and its application to the optimisation of whole viral particle antigen production for vaccines

Severe acute respiratory syndrome-related coronavirus 2 (SARS-CoV-2) has resulted in a pandemic and continues to spread quickly around the globe. Currently, no effective vaccine is available to prevent COVID-19, and intense global development activity is in progress. In this context, the different technology platforms face several challenges arising from the involvement of a new virus that is still not fully characterised. Finding the right conditions for virus amplification, for the development of vaccines based on inactivated or attenuated whole viral particles, is among them. Here, we describe the establishment of a workflow based on shotgun tandem mass spectrometry data to guide the optimisation of the conditions for viral amplification. In parallel, we analysed the dynamics of the host cell proteome following SARS-CoV-2 infection, providing a global overview of the biological processes modulated by the virus that could be further explored to identify drug targets to address the pandemic.


SARS-CoV-2 belongs to the B lineage of the beta-coronaviruses and is closely related to the SARS-CoV virus [1,2]. It is the causative agent of COVID-19, a severe acute respiratory syndrome that spread world-wide within a few weeks starting in December 2019 in Wuhan [3]. The four major structural genes encode the nucleocapsid protein (N), the spike protein (S), a small membrane protein (SM) and the membrane glycoprotein (M), with an additional membrane glycoprotein (HE) occurring in the HCoV-OC43 and HKU1 beta-coronaviruses [3].
Based on the speed at which the outbreak of COVID-19 has developed, SARS-CoV-2 appears to spread easily in the human population. The reproductive number (R0) of the virus is currently thought to be around 3, suggesting the potential for sustained human-to-human transmission, which appears to occur through respiratory droplets and potentially a fecal-oral route [4]. In this pandemic situation, one of the outstanding questions concerns the possibility of containing the spread of SARS-CoV-2 and its persistence in the human population. Social distancing policies, lockdowns and other containment measures have been implemented worldwide to slow down the spread. Current roadmaps to lifting the restrictions rely on the deployment of effective diagnostics, therapies, and eventually on the development of an effective vaccine.
Several platforms are being used to develop vaccines against SARS-CoV-2, including spike subunit, DNA, RNA, whole-virion, and nanoparticle vaccines. Most successful antiviral vaccines employ inactivated or attenuated whole viral particles as vaccine antigen and depend on the induction of neutralizing antibodies [5,6] against structural proteins of the virus. However, virus yields from the dedicated cell culture systems could be relatively low compared to quantities envisioned to be required for massive vaccine production. In addition, the production campaigns are time-consuming and highly demanding due to the danger of working with these pathogens, and thus optimization of the production of whole viral particle antigen is of utmost interest for vaccines. Concomitantly with vaccine development, a better understanding of how the host responds to SARS-CoV-2 infection may help direct further therapeutic avenues.
Multiple proteomics strategies have proven insightful for understanding coronavirus structure and the molecular mechanisms of infection. Tracheal tissues of chickens infected with infectious bronchitis coronavirus were analyzed by 2D-DIGE and MALDI-TOF tandem mass spectrometry to establish the host response [7]. Vero cells infected with porcine epidemic diarrhea virus (PEDV) were analyzed by shotgun proteomics [8]. Different PEDV coronavirus strains were compared with an iTRAQ-labeling quantitative approach, showing differences in the induction of the inflammatory cascade [9]. The dynamics of the host proteins triggered by specific overexpressed coronavirus genes was also established [10]. While no literature is yet available on the proteomics characterization of SARS-CoV-2 virus, several studies of interest have recently been submitted and should soon be available [11,12,13].
Here, we describe the establishment of a workflow based on shotgun tandem mass spectrometry data that, in addition to providing more basic information about SARS-CoV-2 infection, aims to guide the optimisation of the conditions for whole viral particle antigen production and to aid SARS-CoV-2 vaccine development.

Infection
For the kinetic, 1×10^6 Vero cells seeded into 25 cm² flasks were grown to cell confluence in 5 mL DMEM supplemented with 5% FCS and 0.5% penicillin-streptomycin for one night at 37°C under 9% CO2. They  (Thermo Fisher Scientific) prior to in-gel trypsin proteolysis performed as described in Hartmann et al. [15].

Liquid chromatography-mass spectrometry
Peptides were identified using an UltiMate 3000 nano-LC system (Thermo Fisher Scientific) coupled with a Q-Exactive HF mass spectrometer (Thermo Fisher Scientific

MS/MS data interpretation and label-free protein quantification
The MS/MS spectra recorded on each sample were assigned to peptide sequences using the Mascot higher than 98% of the top ion score. Proteins were grouped if they shared at least one peptide, and in each group label-free quantification was based on PSM counts for each protein following the principle of parsimony. Proteins identified by one or more specific peptides were retained for the analysis (protein FDR 1%).
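The grouping rule stated above (proteins sharing at least one peptide fall into the same group, with PSM counts kept per protein) can be illustrated with a minimal sketch. This is not the authors' pipeline: the input layout (one `(peptide, proteins)` tuple per PSM), the function name `group_proteins` and the protein labels are assumptions made for illustration, and the parsimony step itself is not implemented.

```python
from collections import defaultdict


def group_proteins(psms):
    """Group proteins that share at least one peptide and count PSMs per protein.

    psms: list of (peptide, [protein, ...]) tuples, one tuple per PSM
          (hypothetical input layout, not the format used in the study).
    Returns {group_representative: {protein: psm_count}}.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the lexicographically smallest name as representative.
            parent[max(ra, rb)] = min(ra, rb)

    psm_counts = defaultdict(int)
    for _, proteins in psms:
        for protein in proteins:
            find(protein)              # register the protein
            psm_counts[protein] += 1   # spectral count per protein
        for other in proteins[1:]:
            union(proteins[0], other)  # shared peptide links the proteins

    groups = defaultdict(dict)
    for protein in parent:
        groups[find(protein)][protein] = psm_counts[protein]
    return dict(groups)


if __name__ == "__main__":
    example = [
        ("PEPA", ["P1"]),
        ("PEPB", ["P1", "P2"]),
        ("PEPB", ["P1", "P2"]),
        ("PEPC", ["P3"]),
    ]
    print(group_proteins(example))
```

A parsimony step would then choose, within each group, the smallest set of proteins that explains all observed peptides; how that choice was made in the study is not described in this section.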

MS/MS data repository
The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE [17] partner repository with the dataset identifier PXD018594 and DOI 10.6019/PXD018594.

Data analysis
Principal component analysis was done as previously described [18]. Co-expression cluster analysis was performed using the Bioconductor R package coseq v1.5.2 [19], with the protein abundance matrix as input. A log-CLR transformation was applied to the matrix to normalize protein abundances, and the K-means algorithm was chosen to detect co-expressed clusters across the different time points. The K-means algorithm was repeated 20 times in order to determine the optimal number of clusters. The resulting number of clusters in each run was recorded, and the most parsimonious cluster partition was selected using the slope heuristics approach. Both the PCA and the coseq analysis were performed after removal of proteins with spectral counts lower than three (1402 host protein groups retained). Finally, proteins assigned to the different clusters were retained for cluster visualization and gene ontology (GO) enrichment analysis of each cluster. Statistically enriched (FDR ≤ 0.05) GO terms among proteins differentially expressed between pairwise samples, or among proteins assigned to each co-expression cluster, were identified using Metascape [20]. The most statistically enriched GO terms were visualized with ggplot2 [21].
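As an illustration of the pre-processing described above, the following Python sketch filters low-count proteins and applies a log centred log-ratio (log-CLR) transformation to each protein's profile across time points. It is a simplified stand-in, not the coseq implementation: the function names, the pseudocount of 1, and the reading of "spectral counts lower than three" as the total across all samples are assumptions.

```python
import math


def filter_low_counts(matrix, min_total=3):
    """Keep proteins whose summed spectral count reaches min_total.

    matrix: {protein: [count at each time point]} (hypothetical layout).
    Assumption: the threshold applies to the total across samples.
    """
    return {p: counts for p, counts in matrix.items() if sum(counts) >= min_total}


def log_clr(counts, pseudocount=1.0):
    """Log centred log-ratio transform of one protein's count profile.

    Counts are shifted by a pseudocount (assumption, to avoid log(0)),
    converted to proportions across samples, log-transformed, and
    centred on the mean log proportion.
    """
    shifted = [c + pseudocount for c in counts]
    total = sum(shifted)
    logs = [math.log(s / total) for s in shifted]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


if __name__ == "__main__":
    # Placeholder counts for illustration only, not data from the study.
    abundance = {"protA": [0, 4, 10], "protB": [1, 0, 1], "protC": [5, 5, 5]}
    kept = filter_low_counts(abundance)
    for protein, counts in kept.items():
        print(protein, [round(v, 3) for v in log_clr(counts)])
```

The transformed rows sum to zero by construction, so a protein with a flat profile maps to a vector of zeros; these rows are the kind of input a K-means clustering step would then operate on.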

Profiling of virus production by tandem mass spectrometry
To understand the dynamics associated with SARS-CoV-2 infection and determine optimal conditions for whole viral particle antigen production, we infected Vero E6 cells with SARS-CoV-2 at two multiplicities of infection (MOI 0.01 and 0.001) and monitored the kinetics of the infection by tandem mass spectrometry over several days (Figure 1). Overall, we identified 3220 Vero cell proteins and 6 SARS-CoV-2 proteins with 27388 and 94 unique peptides (FDR below 1%), respectively (Table S1) was in accordance with the notable cytopathic effect following intense virus replication observed here for each culture at this time-point, as previously reported [22].
As expected, a larger variety of peptides was found for overrepresented proteins. Figure 2B shows the distribution of this diversity across the time points for the three most abundant viral proteins.
Interestingly, the increase in protein levels relied on the increasing abundance of the same set of peptides, with only a few new sequences registered at the peak of viral production compared with an early time point.
To evaluate to what extent the virus profiles obtained by tandem mass spectrometry reflected virus production, we measured SARS-CoV-2 RNA molecules by quantitative PCR analysis across the same time points (Figure 2C). Variations in the yields of the most abundant viral proteins reflected variations in the number of SARS-CoV-2 RNA molecules, confirming that LC-MS/MS with label-free quantitation can be applied to monitor SARS-CoV-2 infection kinetics.
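One simple way to quantify this kind of agreement between spectral counts and RNA copy numbers is a Pearson correlation on log-transformed values. The sketch below is an assumption about how such a comparison could be made, not the analysis performed in this study; the function names, the pseudocount and the numbers in the example are placeholders.

```python
import math


def log10_shifted(values, pseudocount=1.0):
    """log10 of each value after adding a pseudocount (assumption, avoids log10(0))."""
    return [math.log10(v + pseudocount) for v in values]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Invented placeholder values for illustration only, NOT measurements
    # from this study: spectral counts of one viral protein and RNA copy
    # numbers at the same time points.
    spectral_counts = [0, 2, 15, 120, 300]
    rna_copies = [1e3, 1e4, 1e6, 1e8, 5e8]
    r = pearson(log10_shifted(spectral_counts), log10_shifted(rna_copies))
    print(f"Pearson r on log10 scale: {r:.3f}")
```

Working on the log scale keeps the few late, very high values from dominating the coefficient; a rank-based measure such as Spearman's would be an alternative when the relationship is monotonic but not linear.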

Characterization of host cell protein dynamics upon SARS-CoV-2 infection
Next

Discussion
In the race to develop a vaccine to fight the global spread of SARS-CoV-2, different technology platforms have been evaluated [23]. In this context, obtaining vaccines based on inactivated or attenuated whole viral particles may be hampered by the difficulty of finding the right conditions for virus amplification. Virus yields from the dedicated cell culture systems could also represent a limitation.
Concomitantly with vaccine development, inactivated virus particles are also of interest for testing real serology or screening neutralizing antibodies. Evidently, production of well-characterized active virus particles is also of interest for fundamental research purposes. Given the requirement for speed, here we evaluate the use of LC-MS/MS as a tool for guiding the optimisation of the conditions for SARS-CoV-2 whole viral particle antigen production.
The results presented here demonstrate the potential of our pipeline to profile virus production across time. In particular, by analysing the proteome of Vero cells infected with SARS-CoV-2 at two different MOI, it was possible to monitor changes in the levels of three SARS-CoV-2 structural proteins and three non-structural ones. However, as in other analyses [11,13], we could not detect peptides from protein E. The lack of detection of other accessory proteins could be attributed to differences in sample processing, with the protocol described here favouring simplified steps and speed while maintaining accuracy. Deeper analyses are envisaged for monitoring virus homogeneity during the different steps of viral production once the most permissive conditions have been established.
Remarkably, comparable profiles were obtained at the two tested MOI, with the profiles obtained at lower MOI slightly delayed and hence more insightful regarding the timing of the burst in the abundance of viral levels.
Overrepresented proteins were described by a larger variety of peptides. Interestingly, the increase in protein levels relied on the increasing abundance of the same set of peptides, with only a few new sequences registered at late time points, suggesting that absolute quantification of the virus could be obtained with targeted approaches that follow early detectable peptides. Eventually, tandem mass spectrometry proteotyping [24,25] could also be proposed to detect SARS-CoV-2 viruses.
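The idea of following early detectable peptides can be sketched as a simple selection rule: keep peptides that are already observed at an early time point and remain observed at every later one. The function name, the input layout and the thresholds below are hypothetical and serve only to illustrate the reasoning; they do not describe a selection procedure used in this study.

```python
def early_persistent_peptides(counts_by_peptide, early_index=0, min_count=1):
    """Select peptides detected from an early time point onwards.

    counts_by_peptide: {peptide: [spectral count at each time point]},
                       time points in chronological order (hypothetical layout).
    early_index:       index of the time point regarded as "early".
    min_count:         minimum spectral count to call a peptide detected.
    Returns the sorted list of peptides detected at early_index and at
    every subsequent time point.
    """
    selected = []
    for peptide, counts in counts_by_peptide.items():
        window = counts[early_index:]
        if window and all(c >= min_count for c in window):
            selected.append(peptide)
    return sorted(selected)


if __name__ == "__main__":
    # Placeholder peptide labels and counts, for illustration only.
    example = {
        "PEPTIDEA": [1, 3, 10],   # detected early and persistently
        "PEPTIDEB": [0, 0, 5],    # appears only at the late time point
        "PEPTIDEC": [2, 0, 4],    # detected early but not consistently
    }
    print(early_persistent_peptides(example))
```

Peptides passing such a filter would be natural candidates for a targeted assay, subject to the usual checks on uniqueness and detectability that a targeted method requires.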
Besides profiling virus production, our mass spectrometry analysis of the whole cell content provides insights regarding the cellular response to SARS-CoV-2 infection. While more detailed information is available regarding virus cell entry, a better understanding of the different steps in the SARS-CoV-2 replication cycle is needed [26]. To our knowledge, this is one of the first attempts to characterize the cellular response to infection with SARS-CoV-2 in Vero E6 cells. Interestingly, such proteomics data can be acquired on different cell lines from humans and primates in order to define, by comparative proteomics, the common mechanisms of cell infection and the mechanisms specific to a given cell line.
The analysis of our proteomic data suggested substantial temporal remodelling of the host proteome Here, we show that our pipeline based on LC-MS/MS analysis is a suitable tool for the characterisation of SARS-CoV-2 production. We therefore suggest that it could be of use in the optimisation of the conditions for viral amplification, to speed up the initial steps in favour of those that, later in the development process, will require a more careful evaluation of effectiveness and safety.
In addition, the peptide information described here is sufficient to enable a targeted analysis, opening the possibility of using mass spectrometry-based targeted approaches to evaluate critical aspects (i.e. quality and quantity) during the different steps of the virus purification process. Furthermore, the characterized changes in cellular protein networks upon SARS-CoV-2 infection provide valuable insights that could be further explored to guide the identification of drug targets to address the pandemic caused by SARS-CoV-2.
We anticipate that the same workflow could be successfully applied to expedite the characterisation of human organ-on-a-chip (Organ Chip) microfluidic culture devices used to obtain insights into the different steps of the virus life cycle, as well as to study human disease pathogenesis [27] in response to infection by variants of SARS-CoV-2, with or without the addition of existing [28,29,30] and novel therapeutics.

Author contributions
LG, FG, OP, LB, and JA conceptualized the study design; FG, JCG, HB, SR, NB performed the experiments; LG, OP, DG, FG, LB and JA analysed the data; GM, KC, SD, MAR, GS, CF, AD, BAB, and CB contributed reagents and software development; LG and JA drafted the manuscript. All authors read and approved the final manuscript.