The SARS-CoV-2 Spike Protein Mutation Explorer: using an interactive application to improve the public understanding of SARS-CoV-2 variants of concern

Abstract Due to the COVID-19 pandemic the virus responsible, SARS-CoV-2, became a source of intense interest for non-expert audiences. The viral spike protein gained particular public interest as the main target for protective immune responses, including those elicited by vaccines. The rapid evolution of SARS-CoV-2 resulted in variations in the spike that enhanced transmissibility or weakened vaccine protection. This created new variants of concern (VOCs). The emergence of VOCs was studied using viral sequence data which was shared through portals such as the online Mutation Explorer of the COVID-19 Genomics UK consortium (COG-UK/ME). This was designed for an expert audience, but the information it contained could be of general interest if suitably communicated. Visualisations, interactivity and animation can improve engagement and understanding of molecular biology topics, and so we developed a graphical educational resource, the SARS-CoV-2 Spike Protein Mutation Explorer (SSPME), which used interactive 3D molecular models and animations to explain the molecular biology underpinning VOCs. User testing showed that the SSPME had better usability and improved participant knowledge confidence and knowledge acquisition compared to COG-UK/ME. This demonstrates how interactive visualisations can be used for effective molecular biology communication, as well as improving the public understanding of SARS-CoV-2 VOCs.


Introduction
Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) emerged in late 2019, leading to the global Coronavirus Disease 2019 (COVID-19) pandemic which has claimed over 6.4 million lives as of August 2022 (World Health Organization, 2022).SARS-CoV-2 is a betacoronavirus with a ribonucleic acid (RNA) genome; of the proteins it encodes, four are incorporated into virus particles, namely the spike (S), envelope (E), membrane (M) and nucleocapsid (N) proteins (Chan et al., 2020;Peacock et al., 2021).The spike proteins are distributed across the outer viral envelope, giving the virus the characteristic crown-like or 'solar corona' appearance from which it gets its name (Benton et al., 2020;Ke et al., 2020).The spike protein is a homotrimer, meaning that it is composed of three identical smaller units called monomers; each monomer is split into two subunits, termed S1 and S2.The S1 subunit contains two key regions termed the N-terminal domain (NTD) and receptor-binding domain (RBD).As indicated by its name, the RBD binds to the host cell surface receptor Angiotensin-Converting Enzyme 2 (ACE-2) (Letko et al., 2020;Peacock et al., 2021) and is also the part of the spike protein principally targeted by antibodies of the immune system.The Nterminal domain is also an important target for the immune system, to which host antibodies can bind to block infection (Casalino et al., 2020).The spike protein can alternate between 'open' and 'closed' conformations, allowing for the RBD to bind to host cells when open, and avoid recognition by host antibodies when closed (Benton et al., 2020;Ke et al., 2020).The S2 subunit plays an important role in membrane fusion, where the virus fuses with the host cell to release its contents (Wrobel et al., 2020).The spike proteins are one of the key determinants of infectivity and, because their functions can be blocked by the binding of antibodies, of antiviral immunity (Peacock et al., 2021).SARS-CoV-2 has undergone significant evolution since becoming established as a new human pathogen, with mutations that cause changes in the spike protein having particularly important consequences for the course of the pandemic.As the spike protein acquires changes in regions that improve the ability of the virus to infect, transmit and evade the host's immune system, such as the RBD and NTD, variants which are more harmful to the human populationtermed variants of concern (VOCs)have emerged (Harvey et al., 2021).As of August 2022, five key variants or VOCs have been designated, referred to as Alpha, Beta, Gamma, Delta and Omicron (World Health Organization, 2022), of which Omicron is currently circulating around the globe.Many research groups and public health bodies are monitoring the spread of SARS-CoV-2 and its accumulating mutations, including the COVID-19 Genomics UK consortium (COG-UK).COG-UK maintains an online Mutation Explorer (ME) resource (COG-UK/ME, http:// sars2.cvr.gla.ac.uk/cog-uk/), which provides open access data on SARS-CoV-2 spike mutations and VOCs (Wright, 2021).As this resource is tailored to the needs of expert users, we recognised that its data are likely to be less accessible to a non-expert audience.We reasoned that incorporating interactive visual resources into the COG-UK/ME would allow us to combat this issue and enhance public understanding of the vital data contained on the website.
It has been demonstrated that introducing interactivity and visualisations can bring molecular biology topics into a more comprehensible format for those without a scientific background (Jenkinson, 2018).Interactivity can better allow the user to engage with the material, boosting 'active learning' and allowing for higher engagement and deeper information retention (Bruce-Low et al., 2013).Including three-dimensionality into teaching molecular biology can also be informative, as it better allows the user to view protein structures and gain spatial cues, as well as boosting understanding of protein function (H€ offler, 2010).Additionally, by visualising these complex topics in animations rather than in static formats, it is possible to convey protein dynamics with a visual narrative, which can further facilitate learning (Barak & Hussein-Farraj, 2013;Jenkinson, 2018).Therefore, user learning may be improved by incorporating interactive visual resources into resources conveying complex scientific information, such as the COG-UK/ME.
To address this need for interactivity, we created a graphical resource, the SARS-CoV-2 Spike Protein Mutation Explorer (SSPME).The SSPME is an online resource that brings together 3D models and narrated molecular animations into an interactive application to explain SARS-CoV-2 spike protein amino acid substitutions (linked to underlying point mutations in the virus RNA) and VOCs to a general audience.Usertesting showed that SSPME provides a better user experience and increases the acquired knowledge of non-specialists when compared to the COG-UK/ME website.As well as providing a resource to help the general public interpret information about SARS-CoV-2 VOCs and to help the explanation of these topics in the media, this research provides a case study of using interactive graphical resources to facilitate molecular biology education for the general public.

The SARS-CoV-2 Spike Protein Mutation Explorer
Early user-testing of the COG-UK/ME was completed in order to compile feedback on this resource and identify areas for improvement that could be addressed within the SSPME (Supplementary Material).Participants from this early user-testing of the COG-UK/ME would form the Control Group (CG) in this research.CG results indicated that this development should include more introductory information on the SARS-CoV-2 viral structure, including more information about the regions of the spike protein that were most relevant for VOCs.In addition, engaging visualisations were requested for use throughout the COG-UK/ME website in order to facilitate user understanding of these concepts.
A combination of visualisation techniques were used to develop the SSPME, containing information on SARS-CoV-2 spike mutations and VOCs.Briefly, using a combination of protein visualisation and 3D modelling software (Table 1), 3D models of relevant proteins were created from experimentally-resolved protein structures made available through the RCSB Protein Databank (Table 2).
An accurate, interactive spike protein model was developed for use in the application and within the animations.In addition, 3D models of other proteins relevant for SARS-CoV-2 properties (see Table 2) were created using mMaya and used within the animations.All scientific content developed within the animations and the application was reviewed and validated by virologists from the MRC-University of Glasgow Centre for Virus Research and COG-UK/ME.The final application, the SSPME, is hosted freely online (https://sc2-application.itch.io/sars-cov-2-mutation-explorer).The full methodology used to create the SSPME has been described in detail previously (Iannucci et al., 2022).
The SSPME has the goal of providing the public with a concise and comprehensible visual resource for the SARS-CoV-2 spike protein and VOCs, as well as containing animations that explain key viral processes related to SARS-CoV-2 infectivity, transmissibility and antigenicity.The features of the application are explained briefly below.
When a user visits the SSPME website, after encountering the opening and instruction pages, they are introduced to a 3D SARS-CoV-2 spike protein model embedded in a portion of the viral membrane.Using the computer mouse, the user can rotate, zoom in and interact with four main regions of interest (ROI) on the spike protein: the RBD, the NTD, the FCS and the glycans (Figure 1).
From this area of the SSPME, along the right-hand side the user can select buttons to alternate the spike protein RBD's conformation between open and closed (two different states that are important for understanding SARS-CoV-2 infection and immunity; Harvey et al., 2021), download a screenshot of the protein in its current position, access the VOC menu and review a glossary panel of technical terms (Figure 2).
The VOC menu, which opens as a pop-up alongside the interactive spike protein, provides an overview of VOCs that have played a dominant role in the pandemic up to the time of release (when the application was first released on 09 July 2021, this was the Alpha, Beta, Gamma and Delta VOCs; the application was updated on 01 December 2021 to include the Omicron VOC) (World Health Organization, 2022).To illustrate the specific amino acid substitutions that define these VOCs clearly, we used static 3D visualisations provided by the COG-UK/ME (Wright, 2021).
To understand why these amino acid substitutions affect SARS-CoV-2 biology, users need to be provided with detailed contextual information about the ROIs in which they occur.When a user mouse-clicks a ROI on the spike protein, they are taken to that ROI's corresponding region-specific page, which goes into more detail about the ROI and how changes in this region may impact the properties of SARS-CoV-2 (Figure 3).
It is from the RBD-and NTD-specific pages that the 3D molecular animations can be accessed, demonstrating key viral properties and providing visual explanations of how changes in these regions could affect SARS-CoV-2 (Figure 4).

Experimental methods
User-testing was carried out to compare the impact of the SSPME to that of the original COG-UK/ME site on learning and knowledge acquisition of the users, as well as exploring the usability of the application.This study received Ethical Approval from The Glasgow School of Art's Learning and Teaching Office.

Participant recruitment and demographics
In order to recruit participants for this study, digital posters were regularly advertised on Twitter, making particular use of the MRC-University of Glasgow Centre for Virus Research (https://twitter.com/CVRinfo) account which, due to their subject matter and high follower count, could potentially extend our reach for recruitment.Participants signed up via a Google Form which confirmed that they were 16 years or older and had access to a computer and internet.Participants were anonymised throughout the study and assigned to two groups: a Control Group (CG) which would examine the original COG-UK/ME website and a Test Group (TG) which would examine the SSPME.Informed consent and  demographics were collected from all participants (Table 3).All participant demographics were later used in the assessment of the user-testing phase of this study.

Experimental procedure and data analysis
Following participant recruitment, sign-up and obtainment of informed consent, participants were privately emailed an overview of the experimental study and given access to the experimental questionnaire (taking them through seven stages; see Supplementary Material) (Figure 5).
In the first stage (Stage 1: Participant Demographics), information on the participants, such as age and level of education, were collected.A fivepoint Likert scale was used in Stage 2 (Pre-Experiment Rate Your Knowledge) and Stage 5 (Post-Experiment Rate Your Knowledge).Questions asked the user to rate their level of knowledge on different parts of the SARS-CoV-2 structure, with 1 meaning 'strongly disagree' and 5 meaning 'strongly agree'; this was used to measure confidence of knowledge.Stage 4 (Online Resource Review) differed between the CG and TG, with the CG reviewing the COG-UK/ME and the TG reviewing the SSPME.The data obtained from the pre-and post-experiment quiz portions of the experimental questionnaire (Stage 3: Pre-Experiment Quiz and Stage 6: Post-Experiment Quiz) allowed us to determine the knowledge acquisition of the participants.In order to measure the usability of the resources in this project (Stage 7: Usability Questionnaire), the System Usability Scale (SUS) and associated guidelines were used to score the usability of each resource; the SUS score was determined by a standardised series of 10-questions assessing their perceived ability to use the resource successfully (Brooke, 1996;Smyk, 2020).An optional open-ended comment section was provided at the end to obtain additional qualitative feedback which would allow the researchers to assess the success of different aspects of the application and identify areas for improvement.Apart from this, the question order in the pre-and post-experiment quiz was randomised to remove any potential order effect in the study.The results of these methods were used to assess the overall success of the SSPME compared to the COG-UK/ME and to draw general conclusions about the use of 3D interactive visualisations in scientific education and communication.

Use of the SSPME improves knowledge acquisition
We tested for differences in participant knowledge acquisition (determined by analysing the differences in the number of correct responses between preand post-experiment quizzes).The number of correct responses for the CG and TG between the pre-and post-experiment quiz questions is shown in Figure 6.
As experimental data were not normally distributed, we used a Mann-Whitney U test to statistically analyse the difference between groups and a Wilcoxon Signed Rank Test to assess the significance of results within each group.
We did not find any statistically significant difference between pre-experiment knowledge levels for the CG and TG groups (U(N CG ¼ 9, N TG ¼ 14) ¼ 54.00, p ¼ .564).For the CG, there was no statistically significant difference in knowledge acquisition between pre-and post-experiment (Z(N CG ¼ 9) ¼ À0.79, p ¼ .429).However, for the TG there was a very significant difference in knowledge acquisition between pre-and post-experiment (Z(N TG ¼ 14) ¼ À3.16, p< .005).In addition, the post-experiment knowledge acquisition of the TG was significantly higher than that of the CG (U(N CG ¼ 9, N TG ¼ 14) ¼ 30.00, z ¼ À2.15, p < .05).These results suggest that CG and TG had similar knowledge before undertaking the experiment, and while the CG did not improve their knowledge after using the COG-UK/ME, the TG acquired knowledge in a significant manner after using the SSPME.Therefore, we concluded that the SSPME was better than the COG-UK/ME in allowing the participants to acquire knowledge after use.

Use of the SSPME boosts knowledge confidence
We tested for differences in participant knowledge confidence (determined by a Likert scale self-evaluation of how well the participant felt they knew the material between pre-and post-experiment) between the CG and TG by Mann-Whitney U test.Before starting the experiment, there were no significant differences in knowledge confidence between the CG and TG.However, following the experiment, the CG and TG knowledge confidences differed significantly on all questions (p < .05for all five questions and p < .01 for two of the five questions) indicating an increase in knowledge confidence in the TG.We therefore concluded that the SSPME was better than the COG-UK/ME at allowing the participants to feel more confident in their knowledge after using the SSPME.

The SSPME demonstrates strong usability
SUS scores were calculated after user-testing.The COG-UK/ME received 42.78 (Grade F or 'Awful') and the SSPME received 82.68 (Grade A or 'Excellent'), showing a dramatic difference between the resources apparent usability for non-expert audiences (Alathas, 2018;Sauro, 2016).From these results, we concluded that the SSPME was more user-friendly to participants than the COG-UK/ME.

Open-ended feedback indicates that the SSPME is positively received by users
Open-ended feedback received from the CG was used early in the research timeline in order to inform the development of the SSPME (see Supplementary Material).Much of the feedback from the CG expressed a desire for the COG-UK/ME to contain more basic background information to help in understanding the data on the website.Additionally, it was noted that visual aids would be beneficial for the COG-UK/ME.
Open-ended feedback received from the TG was used to assess the effectiveness of the SSPME and to identify areas for improvement (see Supplementary Material).The feedback received on the SSPME showed a lot of positive responses, describing the SSPME as 'easy to use' with 'clear' and 'understandable' information and animations.Additionally, the TG provided valuable constructive comments, suggesting potential improvements for future versions of the SSPME, such as redesigning and relocating the animation and implementing interactive models of specific VOCs.

Research findings
The COG-UK/ME was developed as a data sharing resource for SARS-CoV-2 genome data, with the stated aim of allowing anyone to follow information over time on important changes in the SARS-CoV-2 genome.Given the technical nature of the data presented, the primary audience of COG-UK/ME was specialists in virology, public health, pharmaceutical industry and bioinformatics.However, the widespread public and media interest in SARS-CoV-2 and its variants during the pandemic meant that it was desirable to make this resource accessible to as wide an audience as possible.By doing so, the implications of SARS-CoV-2 mutations and VOCs on COVID-19 restrictions and public health measures could be better understood by the public, improving the science communication about the pandemic as a whole.
User-testing results show that not only did the SSPME improve the knowledge acquisition of the TG, but it also enabled the TG to be more confident in their knowledge of the material.These results also showed that the COG-UK/ME did not have the same impact on the CG, whose knowledge acquisition and confidence of knowledge did not improve as effectively after using the website.Additionally, results revealed that the SSPME has higher usability, earning a place in the top 10% ranking, while the COG-UK/ME fell in the bottom 10% ranking for usability under the SUS guidelines.This suggests that the SSPME can support a better user experience for nonspecialist users, compared to the COG-UK/ME.These results are promising and indicate that this visual resource could benefit the wider public.

Experimental review and limitations
There were limitations placed on the user-testing process of this project due to the COVID-19  pandemic.This project was conducted entirely remotely, which had advantages and disadvantages.It is possible that recruiting participants in an online format via social media allowed a wider participant population to view the study and sign-up.However, it is also possible that conducting the user-testing in person could have increased participation and improved questionnaire follow-through; noting the difference between the number of individuals who initially signed up for the study (N CG ¼ 55, N TG ¼ 61) and the number who fully completed the study (N CG ¼ 9, N TG ¼ 14) suggests that this may be the case.Email reminders were sent to combat this issue during the research, but they did not substantially improve completion.The relatively low number of participants who fully completed the study limited our ability to interpret the significance of the experimental results.Further work could examine the knowledge retention of participants to determine if the SSPME had lasting impacts on participant knowledge of SARS-CoV-2 spike proteins and VOCs.User-testing could then examine the participants' knowledge after a longer period of time to assess medium to long-term retention of knowledge.Furthermore, to eliminate any potential order effects on participant knowledge, a crossed-factorial experimental design could be applied during user-testing.
When reviewing the demographics of the participants in this study, we noted that many participants had strong academic backgrounds, with 57% of both the CG and TG participants having a Master's degree or other higher degree.This level of education and training can be argued as not fully representative of the 'general public' that the SSPME was originally designed for.It is likely that by advertising this scientific study primarily through platforms with higher engagement from those interested in science, the participants who signed up for this study already had some background in scientific topics explored in this project.This concern is reflected in Table 1, where many participants considered their knowledge of these topics to be at least 'basic'.In the future, conducting participant recruitment across more varied channels and platforms could result in a greater number of participants with more wide-ranging educational backgrounds.However, the importance of advanced research to everyday life during the SARS-CoV-2 pandemic meant that even people with a high level of relevant education and training could benefit from explanatory resources.In this case, even though the TG had well-educated participants, they still demonstrated an increase in both knowledge acquisition and confidence of knowledge after using the SSPME.Therefore, the data obtained in this study is still very encouraging in analysing the potential of the SSPME for the wider public.

Future research in visual scientific communication
Considering the encouraging user-testing results of this study, future research could explore how implementing interactive 3D visualisations could assist in improving scientific communication on other complex topics.Including this technology into science communication can give the viewer a new perspective, allowing them to visually explore challenging topics such as molecular structures and dynamic processes.Without interactive visualisations, the viewer can only try to grasp this material through text and static images, which can be extremely difficult for understanding topics that require building a complex mental image of something that, due to its microscopic scale, is not part of our normal experience (Barak & Hussein-Farraj, 2013;H€ offler, 2010;Jenkinson, 2018).Beyond their educational use, applications of this sort can also have a wider public value.During the COVID-19 pandemic, it has been clear that miscommunication and misinformation can circulate widely with deleterious consequences for public health measures.The development of effective communication toolslike the SSPMEcan increase the public's understanding of complex scientific topics and by doing so may help them to make well-informed decisions about how to protect themselves and the people around them from viruses such as SARS-CoV-2.

Conclusion
This project successfully developed an interactive 3D visual resource, the SSPME, to allow non-expert audiences to understand SARS-CoV-2 spike mutations and VOCs, demonstrating promising results on the SSPME's potential for educating the public on these topics.These user-testing results suggest that the COG-UK/ME website will be made more accessible to the general public through the 3D visualisations and interactivity of the SSPME.Not only could this benefit those without a scientific background, but it could also be valuable for informed non-specialist communicators such as teachers and journalists.

Figure 1 .
Figure 1.The 3D SARS-CoV-2 spike protein scene of the SARS-CoV-2 Spike Protein Mutation Explorer.Screenshots show the 3D SARS-CoV-2 spike protein embedded in the viral membrane (A) and the same scene when the user has rotated the model and is hovering their mouse over the NTD (B).

Figure 2 .
Figure 2. Accessing additional information through the SARS-CoV-2 Spike Protein Mutation Explorer.Screenshots show (A) the VOC menu, which is accessed via the right-hand side button which is labelled 'VOC' and contains an image of two simple viruses and (B) the Glossary panel, which is accessed from the bottom right-hand side button containing an image of books.

Figure 3 .
Figure 3. Information about Regions of Interest (ROI) in the SARS-CoV-2 Spike Protein Mutation Explorer.Screenshot shows the NTD-specific ROI page, one of the four ROI-specific pages, which is accessed by clicking the button labelled 'NTD' on the interactive 3D spike protein (see Figure2).Going from top to bottom, the buttons along the right-hand side allow the user to hear the narrated text, alter the text size, access the animation and access the Glossary panel.

Figure 4 .
Figure 4. Explanatory animations in the SARS-CoV-2 Spike Protein Mutation Explorer.Screenshots show 3D molecular animations, which are contained within the RBD-and NTD-specific ROI pages of the SARS-CoV-2 Spike Protein Mutation Explorer, showing (A) SARS-CoV-2 virus particles and (B) a close-up SARS-CoV-2 spike protein avoiding host antibodies, with the antibody binding sites highlighted with a dashed line.Mucin proteins present in the nasal mucus are visible as background detail.

Figure 5 .
Figure 5.The flow of the seven stages of the experimental questionnaire.

Figure 6 .
Figure 6.Knowledge acquisition with the CG (n ¼ 9) and TG (n ¼ 14).The number of correct answers out of 11 questions between the pre-and post-experiment quiz questions is shown for both the CG and TG, with individual responses plotted along with box and whisker plots.Differences between pre-and post-experiment were tested for significance with a Wilcoxon Signed Rank Test.Ã p < .005;n.s.p > .05.

Table 1 .
The protein visualisation and 3D modelling software used in the development of the SSPME.

Table 2 .
The experimentally resolved data used in generating the 3D models of proteins during the development of the SSPME.

Table 3 .
The demographics collected from all participants within this study.