Localizing COVID-19 Misinformation: A Case Study of Tracking Twitter Pandemic Narratives in Pennsylvania Using Computational Network Science

The recent COVID-19 outbreak has highlighted the importance of effective communication strategies to control the spread of the virus and debunk misinformation. By using accurate narratives, both online and offline, we can motivate communities to follow preventive measures and shape attitudes toward them. However, the abundance of misinformation stories can lead to vaccine hesitancy, obstructing the timely implementation of preventive measures, such as vaccination. Therefore, it is crucial to create appropriate and community-centered solutions based on regional data analysis to address mis/disinformation narratives and implement effective countermeasures specific to the particular geographic area. In this case study, we have attempted to create a research pipeline to analyze local narratives on social media, particularly Twitter, to identify misinformation spread locally, using the state of Pennsylvania as an example. Our proposed methodology pipeline identifies main communication trends and misinformation stories for the major cities and counties in southwestern PA, aiming to assist local health officials and public health specialists in instantly addressing pandemic communication issues, including misinformation narratives. Additionally, we investigated anti-vax actors’ strategies in promoting harmful narratives. Our pipeline includes data collection, Twitter influencer analysis, Louvain clustering, BEND maneuver analysis, bot identification, and vaccine stance detection. Public health organizations and community-centered entities can implement this data-driven approach to health communication to inform their pandemic strategies.

The WHO report highlights the risk of misinformation, confusion, and mistrust undermining efforts to promote the use of vital resources, services, and information without active community engagement (World Health Organization, 2020, p. 7).
To empower communities and help people take control of their lives, it is essential to build a strategy that creates opportunities for them to participate in the COVID-19 response and develop locally appropriate, community-centered solutions. To implement this approach, it is important to develop a data-driven framework that can identify the key challenges that need to be addressed. This framework will help improve communication quality and provide adequate information to assist people in their decision-making processes.
According to the WHO report, the strategy for fighting COVID-19 disinformation and vaccine hesitancy should be community-led, data-driven, and collaborative while also reinforcing capacity and local solutions (World Health Organization, 2020). This effort requires the participation of state and local government, universities, religious organizations, libraries, media newsrooms, and other community-centered entities. To accelerate the coordinated communication strategy and community response, we suggest a research pipeline to inform effective health communication using data-driven analysis of social media conversations about COVID-19 with a local focus on state discourse. Our case study of southwestern Pennsylvania provides an example of how this data-driven approach was implemented by considering the local conversations on social media, particularly Twitter.
One of the main challenges for health officials during the COVID-19 pandemic was to find an effective communication strategy to encourage people to stay home, wash hands, practice social distancing, wear a mask, and use the available vaccines, practices that were implemented to help eliminate or minimize the negative consequences of the pandemic (Mummert & Weiss, 2013). Constantly growing vaccine hesitancy, mis/disinformation spread about the pandemic and the vaccines, news avoidance, and the usage of alternative media brought even more issues and complications (Allington et al., 2020; Bonnevie et al., 2022; Chou & Budenz, 2020; Hornik et al., 2021; Puri et al., 2020). Dubé et al. (2013) indicate that vaccine hesitancy is a complex phenomenon that is difficult to define. Attitudes toward vaccination are viewed on a continuous scale, ranging from a positive stance and active demand for vaccines to a negative stance and complete refusal to receive them. Vaccine-hesitant individuals fall somewhere in the middle of this continuum. They are often hesitant to receive vaccines that are safe and recommended. At the same time, people with antivaccination attitudes express more vaccine skepticism and relate to the strongly negative side of the continuum (Lindeman, Svedholm-Häkkinen, & Riekki, 2022). Several factors can influence vaccine hesitancy and decision-making, including past experiences, political beliefs, the information environment, media literacy, trust in government, and public health communication strategies.
Furthermore, vaccination has been the subject of numerous controversies and scares, such as the fraudulent link between COVID-19 vaccines and infertility (Wesselink et al., 2022) and others. Media and the Internet, particularly social media, provide a platform for anti-vaccination advocates and conspiracy theorists to disseminate mis/disinformation. Although the terms mis/disinformation are used interchangeably in this paper, there is a difference between the two. Misinformation refers to the spread of false information, regardless of whether there was an intent to deceive (Sherman, 2018), while disinformation contains false information intentionally created to mislead and misinform (Fallis, 2015). However, identifying intent can be challenging, especially for users on social media platforms.
During the COVID-19 pandemic, misinformation, disinformation, and vaccine hesitancy narratives spread rapidly online among anti-vax social media users. These trends have jeopardized efforts to promote health communication aimed at persuading people to get vaccinated against COVID-19. According to previous research, highly polarized and active anti-vaccine conversations were mainly influenced by political and nonmedical Twitter users, while less than 10% of the tweets stemmed from the medical community (Hernandez et al., 2021). One of the most significant contributing factors to vaccine hesitancy is considered to be the vast proliferation of mis/disinformation that led the WHO to declare an "infodemic." According to Scannell et al. (2021), the nonstop propagation of mis/disinformation has sparked confusion, suspicion, and negative sentiment toward the COVID-19 vaccine. To counter those issues, we need to attract the attention of society and healthcare professionals to the growing number of disinformation stories and ensure the presence of medical fact-checking information that would debunk disinformation narratives. A failure to target COVID-19 social media anti-vax discourse may continue to disrupt the mass-vaccination plans worldwide, including in the US.
As a result, one of the necessary steps in addressing vaccine hesitancy could be debunking the disinformation stories disseminated online. To combat these harmful narratives, it is necessary to develop a systemic approach that enables health practitioners and public health communicators to quickly identify online disinformation and address it promptly at the community level. To reach this goal, we propose a methodological pipeline that utilizes computational methods to gather social media data about COVID-19 within a specific geographical area. This data is then used to identify particular disinformation narratives being spread within the community at the state level. Therefore, we will be able to facilitate the implementation of misinformation debunking approaches and target local communities with more effective health communication strategies. For our case study, we use an example of southwestern Pennsylvania. To address the issues outlined, we have formulated several research questions: RQ1: What trends in social media discourse are demonstrated on Twitter regarding the propagation of pro-vaccination and anti-vaccination narratives in southwestern Pennsylvania? RQ2: What trends in social media discourse could be identified on Twitter regarding the propagation of narratives by bots and authentic users?
RQ3: What types of disinformation stories in Pennsylvania can be identified through computational methods? RQ4: What strategies are used to spread those disinformation stories throughout the area?

Methodology
We have collected Twitter data for the southwestern Pennsylvania COVID-19 Vaccine Project since the beginning of April 2021 to compile a set of tweets that capture conversations about the vaccine in southwestern Pennsylvania. Our initial collection of streamed tweets used keywords such as "Moderna," "vaccine," and "Pfizer," which did not limit the location where the tweets originated. Additionally, we started collecting tweets using a geographic bounding box for tweets with geolocation information, to find tweets originating from specific locations in Pennsylvania. Spatial bounding boxes let users select tweets by placing squares on maps or using geolocation coordinates (Landwehr & Carley, 2014). Data collection resulted in weekly reports for local healthcare professionals, including medical groups, religious organizations, and community service groups. The vaccine keyword data is processed with a hierarchical location prediction neural network that extracts a location for each tweet. This process yields tweets from the larger Pennsylvania cities: Philadelphia, Pittsburgh, Erie, Norristown, Chester, Bethlehem, and Allentown. These comprise the bulk of our final processed tweets.
The bounding box set is processed via keywords to limit tweets to those concerning the vaccine, as we know the tweets originate from southwestern Pennsylvania. Most of these tweets are from Pittsburgh and Philadelphia, but we have also collected tweets from 1275 other identifiable locations in Pennsylvania. These other tweets come from outlying suburbs, townships, and boroughs in rural areas and a few very specific user-defined locations (e.g., "Interstate 80 Rest Area: Danville"). Cities and counties have similar percentages of tweets collected since July 2021, with Philadelphia and Pittsburgh (and Allegheny County) representing most of the set (See Tables 1 and 2).
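The two-stage filter described above (vaccine keywords plus a Pennsylvania bounding box) can be sketched as follows. The keyword list and the box coordinates here are illustrative assumptions, not the project's exact configuration:

```python
# Two-stage filter sketch: keep a tweet only if it mentions a vaccine
# keyword AND its geolocation falls inside a rough Pennsylvania bounding
# box. Coordinates and keywords are illustrative, not the study's exact
# configuration.

PA_BBOX = (-80.52, 39.72, -74.69, 42.27)       # (west, south, east, north), assumed
KEYWORDS = ("vaccine", "moderna", "pfizer")    # per the stream keywords above

def in_bbox(lon: float, lat: float, bbox=PA_BBOX) -> bool:
    """Return True if the (lon, lat) point lies inside the bounding box."""
    west, south, east, north = bbox
    return west <= lon <= east and south <= lat <= north

def keep_tweet(tweet: dict) -> bool:
    """Keep geolocated tweets that mention a vaccine keyword."""
    text = tweet.get("text", "").lower()
    if not any(k in text for k in KEYWORDS):
        return False
    coords = tweet.get("coordinates")          # Twitter v1.1 style: [lon, lat]
    if not coords:
        return False
    return in_bbox(coords[0], coords[1])
```

In practice this predicate would be applied line by line to the streamed JSON, which is how a multi-terabyte raw collection is reduced to the state-level vaccine subset.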
The raw tweets collected, including all locations for the keyword stream and all conversations from the geolocation stream, total approximately 4.3 terabytes of data since April 2021. Once processed to limit the data to Pennsylvania-located tweets concerning vaccines, we have 32 gigabytes of data, or about 6 million tweets from almost 1.4 million users. During our analysis, we applied the principles of social cybersecurity, which aims to analyze, comprehend, and predict changes in human behavior, as well as the social, cultural, and political outcomes that result from cyber-mediated activities. The main goal of social cybersecurity research is to develop the cyberinfrastructure that would enable society to maintain its essential character in a cyber-mediated information environment, even when faced with changing conditions or social cyber threats (Carley, 2020). The main guidelines of this field highlight the significance of analyzing the communication strategies used in social media and exploring effective countermeasures (Beskow & Carley, 2019; Carley, Cervone, Agarwal, & Liu, 2018).
Our pipeline consists of data collection, data filtering, bot detection with BotHunter (Beskow & Carley, 2018), Twitter analysis of influencers such as super spreaders and super friends, analysis of BEND maneuvers (Blane, Bellutta, & Carley, 2022), and stance detection analysis (see Figure 1). Super spreaders and super friends possess high-ranking communication centrality scores, meaning their messages are widely spread or they have many friends with substantial influence in the collected dataset. Other influencers are users with an active network presence, tweeting often or mentioning other users. They also operate in central parts of the conversation, such as by using important hashtags or mentioning important users (Alieva, Ng, & Carley, 2022; Uyheng & Carley, 2019).
The data was collected using twarc and analyzed with the ORA and NetMapper software tools (Carley, 2014; Carley, Reminga, & Carley, 2018). We used the BotHunter tool, a tiered supervised machine learning approach for bot detection and characterization, to identify bot activities. Afterward, we used NetMapper to compute language cues and ORA to compute reports and scores. ORA produces several metrics for Twitter data, such as the list of super spreaders (users whose content is often shared, and who therefore spread information effectively) and super friends (users who exhibit frequent two-way communication, facilitating large or strong communication networks).
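The two influencer roles can be approximated on a retweet/mention edge list with a few lines of standard-library code. This is a simplified proxy for illustration, not ORA's actual centrality computation:

```python
from collections import Counter, defaultdict

def influencer_scores(edges):
    """edges: (source, target) pairs meaning `source` retweets/mentions `target`.

    Returns two dicts: a super-spreader proxy (how often each user's content
    is picked up by others, i.e., weighted in-degree) and a super-friend
    proxy (number of reciprocated, two-way communication partners)."""
    out_nbrs = defaultdict(set)   # who each user communicates toward
    spreaders = Counter()         # how often each user is retweeted/mentioned
    for src, dst in edges:
        out_nbrs[src].add(dst)
        spreaders[dst] += 1
    users = set(out_nbrs) | set(spreaders)
    # A "super friend" partner is a neighbor the user both reaches and is
    # reached by, capturing frequent two-way communication.
    friends = {u: sum(1 for v in out_nbrs[u] if u in out_nbrs[v]) for u in users}
    return dict(spreaders), friends
```

Ranking users by these two scores yields candidate super-spreader and super-friend lists of the kind described above.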
The Louvain method was used to identify network communities participating in the COVID-19 discussion in southwestern PA (Blondel et al., 2008). The Louvain algorithm is a widely adopted method for community detection that allows a more granular rendering of the network (Alieva & Carley, 2021). We also used stance detection analysis to compute positive and negative stances about vaccination in PA for users and messages. To identify the communication strategies of anti-vax spreaders, we implemented BEND maneuver analysis, which includes 16 categories of maneuvers for online persuasion and manipulation (Beskow & Carley, 2019). The BEND framework serves as a tool for deciphering strategic engagement and information maneuvers (Carley, 2020). The framework divides maneuvers in the information space into positive and negative actions affecting either the narrative or the network structure. Narrative maneuvers focus on the content of the message, while network maneuvers target network communities and structures. We used ORA software to compute the BEND analysis (see Table 3 for an overview of BEND maneuvers). For stance detection analysis, we compiled a list of common hashtags and URLs, where each hashtag and URL has a stance code: negative, positive, or neutral. We used this list with ORA software to code tweets in our dataset. As a result, we implemented a mixed-method approach combining quantitative analysis of the most influential stories and narratives (e.g., network analytics, Louvain clustering, BEND maneuvers, stance detection) with qualitative observations (e.g., qualitative discourse analysis, textual analysis) of the disinformation trends found in southwestern Pennsylvania, with a focus on the major cities and counties (see Figure 1). We compiled weekly reports identifying the trending hashtags, stories, and tweets per week, with a deeper focus on popular disinformation stories and narratives.
Those weekly reports are available online (Alieva, Robertson, & Carley, 2021).
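The Louvain clustering step can be illustrated with networkx's implementation of the Blondel et al. (2008) algorithm (the study itself used ORA for this step); the toy retweet network below is assumed for demonstration:

```python
import networkx as nx

# Toy retweet network: two tightly knit user groups (5-cliques) joined by a
# single bridge edge. Real input would be the Pennsylvania retweet graph.
g = nx.Graph()
g.add_edges_from((u, v) for u in range(5) for v in range(u + 1, 5))       # group A
g.add_edges_from((u, v) for u in range(5, 10) for v in range(u + 1, 10))  # group B
g.add_edge(4, 5)  # one cross-group retweet

# Louvain community detection; a fixed seed makes tie-breaking reproducible.
communities = nx.community.louvain_communities(g, seed=42)
```

On this toy graph the two cliques come back as separate communities; on real Twitter data the `resolution` parameter controls how granular the resulting clusters are.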

Results
As a result of the data analysis for the April 2021 to February 2022 period, we identified the main topics and trends in stories, such as wearing masks; implementing vaccine mandates; new COVID-19 variants; vaccination and masks for children; COVID-19 regulations at schools; coronavirus issues related to sports and athletes as well as the military and the US Army; and the trending topic of booster vaccination. Overall, we visualized the number of tweets about COVID-19 over time (see Figure 2). The number of tweets increased over the summer of 2021 (notably in July), reached peaks in May and September 2021, and then slowly decreased with occasional spikes. Based on the tweets collected during these dates, the peak in May 2021 is attributed to the new CDC guidelines that allowed vaccinated individuals to forgo masks indoors and outdoors; many organizations and schools in Pennsylvania nevertheless opted to maintain mask mandates. Another peak occurred in September 2021 following the US president's announcement of vaccine mandates for employees nationwide.
Twitter analysis in ORA computes lists of the main super spreaders and super friends. The leading super spreader list of vaccine hesitancy stories in Pennsylvania includes the Children's Health Defense website (childrenshealthdefense.org) and the founder of the organization, Robert F. Kennedy Jr. (@RobertKennedyJr). Other organizations and websites on the list that actively spread disinformation, anti-vax, and vaccine hesitancy narratives are: One America News Network or One America News (OANN or OAN); Red Voice Media; Daily Caller; the video platforms Rumble and Bitchute; Epoch Times; and media organizations such as dailyexpose.uk; www.dailymail.co.uk; theblaze.com; expose.uk; thepostmillennial.com; www.rebelnews.com; and others, as well as multiple Twitter users. YouTube links with disinformation content sometimes appeared in our analysis but were eventually deleted by the platform, while services like Rumble and Bitchute do not moderate COVID-19 disinformation. Most of the trends relate to narratives that occurred nationally; however, we could also identify several local narratives. Most of them concerned the local numbers of COVID-19 cases and deaths as well as discussions of mask and vaccination mandates by local groups. We identified negative framing in individual stories spread by the media in the state. For example, we found disinformation stories claiming severe side effects after COVID-19 vaccination (a story with the headline "Pennsylvania girl suffers a stroke and brain hemorrhage 7 days after being vaccinated"). We could also identify local media narratives promoting vaccine hesitancy framing in their stories (e.g., a local story with the headline "'I'm Not Willing To Go Through This Again:' Woman Diagnosed With Tinnitus After COVID Vaccine"). Previous research found that messages with negative framing produce a more substantial persuasive effect (Block & Keller, 1995).
Ashwell and Murray (2020) found that negatively framed news is perceived as more credible and, therefore, more easily accepted. For that reason, local healthcare professionals should address stories with negative framing of vaccination to avoid detrimental consequences. Among anti-vax and disinformation topics, users also focused on promoting natural immunity and various alternative treatments, including ivermectin. In addition, the emphasis was often on the "experimental" nature of the COVID-19 vaccine and on misrepresentations of data and scientific findings. Local stories would cover the national context, Pennsylvania, and nearby states. Many, if not most, of the influencers spreading disinformation originate from outside Pennsylvania; still, we find them and their tweets in our data because they are retweeted, quoted, and replied to by Twitter users in Pennsylvania. The most popular pro-vaccination tweet in the entire dataset was: "So y'all banning abortions while simultaneously saying you can't force people to get vaccinated because it's their body . . . make it make sense," while the most popular anti-vaccination tweet in the dataset was: "They're not 'vaccine passports,' they're movement licenses. It's not a vaccine, it's experimental gene therapy. 'Lockdown' is at best completely pointless universal medical isolation and at worst ubiquitous public incarceration. Call things what they are, not their euphemisms." Our analysis indicates that negative framing tends to attract people's attention and motivate them to share a message, as observed in both the pro-vaccination tweet, which invokes negative views of abortion bans, and the anti-vax tweet, which employs conspiracy framings such as "experimental gene therapy" and narratives emphasizing a perceived threat to individual freedom. Generally, lockdowns and vaccine mandates received the most negative responses in the tweets.
With BotHunter and stance detection analysis, we identified trends in Twitter communication between bots and non-bots and between users with positive and negative stances on vaccination. See Figure 3 for the difference between bots and non-bots over time and Figure 4 for the difference between users with positive and negative stances about vaccination. The full dataset indicates a prevalence of positive stance messages about vaccination. Nevertheless, starting from October 2021, the numbers of positive and negative messages became almost equal, suggesting that users who support vaccination became less active over time. Although the number of users posting negative messages also decreased over time, negative stance narratives still formed a growing share of all tweets, largely because of the decline in positive stance narratives. Moreover, the overall communication network is characterized by echo chambers and polarized communities (see Figure 5).
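The hashtag-based stance coding described in the Methodology can be sketched as a simple lookup: each coded hashtag carries a stance value, and a tweet is scored by the hashtags it contains. The hashtag lists below are illustrative assumptions, not the study's actual coded list:

```python
# Illustrative stance codebook: each hashtag maps to +1 (positive stance),
# -1 (negative stance), or 0 (neutral). These entries are assumed examples,
# not the study's actual coded hashtag/URL list.
STANCE = {
    "#vaccineswork": +1, "#getvaccinated": +1,
    "#novaccine": -1, "#vaccineinjury": -1,
    "#covid19": 0,
}

def tweet_stance(text: str) -> str:
    """Label a tweet positive/negative/neutral from its coded hashtags."""
    score = sum(STANCE.get(tok.lower(), 0)
                for tok in text.split() if tok.startswith("#"))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Aggregating these labels per week or month yields stance-over-time series of the kind plotted in Figure 4.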
For the next step, we have used Louvain clustering and investigated the most influential groups in the dataset. Since we aimed to investigate all anti-vax users (bots and non-bots identified by the algorithm), we employed a mixed-method approach to qualitatively examine top influencers in each Louvain group and identify a group with the most influential anti-vax users.
The anti-vax users are either real users, bots, trolls, or cyborgs that spread anti-vax messages, vaccine hesitancy, and COVID-19 disinformation narratives online. In this context, a "bot" is an account fully managed by computer software and programmed to produce automated messages, while a "cyborg" can refer to either a human aided by a bot or a bot aided by a human. A "troll" uses social media to intentionally provoke an emotional reaction from as many users as possible by posting offensive and emotionally charged content (Paavola, Helo, Jalonen, Sartonen, & Huhtinen, 2016). The diverse nature of these accounts highlights why relying solely on computational methods is not always possible. As a result, we used a qualitative approach to manually check each list of the most influential users and identify anti-vax users. Next, we examined the Louvain clusters where those users were present. After conducting a stance detection analysis, we identified that the same group was leading in negative stance messaging about vaccination. Therefore, we ran a BEND maneuvers report for the group with the most influential anti-vax and negative stance users to investigate the narratives and strategies used to amplify the anti-vax discourse.
After investigating the lists of influencers, we discovered that anti-vax users' messages tend to be disseminated by groups of other users. These users also engage in frequent two-way communication and facilitate large or strong communication networks. Furthermore, they have high degree centrality (they are linked to many users), communicate in groups, and demonstrate their intent to influence other users by frequently retweeting, replying, mentioning, and quoting. The following section provides an overview of the BEND analysis, highlighting examples of harmful communication maneuvers.

BEND Maneuvers in Pennsylvania COVID-19 Twitter Discourse
BEND maneuver analysis includes 16 categories of maneuvers for online persuasion and manipulation. We found that positive narrative maneuvers (Explain, Excite, Engage), positive network maneuvers (Back, Build, Boost, Bridge), and certain negative narrative maneuvers (Dismiss, Dismay, and Distract) prevailed in the conversation, while Enhance and Neglect were not actively implemented (see Figure 6).
We discuss each group of the maneuvers we found and provide examples. The negative narrative maneuvers, namely Dismiss, Distort, Distract, and Dismay, are employed to amplify disinformation. The Dismiss maneuver expresses denial of facts, the Distort maneuver changes or reinterprets information, and the Distract maneuver creates noise and confusion. Meanwhile, the Dismay maneuver evokes attitudes of sadness, fear, anxiety, or anger (see Table 4 for examples).
Anti-vax users also utilize positive narrative maneuvers such as Explain, Enhance, Excite, and Engage. Explain provides additional details and context, while Enhance covers the views of others and provides more information about the discourse. Excite is used to attract the audience through positive expression, and Engage provides more arguments for better associations with a particular idea. See examples of these maneuvers from anti-vax users in Table 5.
Negative network maneuvers, such as Neutralize, Nuke, Narrow, and Neglect, attempt to eliminate the impact of counternarratives in the conversation. Neutralize targets a particular influential opinion, while Nuke is used to split the community. Narrow polarizes and isolates groups, while Neglect is used to reduce the community (see examples in Table 6).
Positive network maneuvers, including Build, Back, Boost, and Bridge, strengthen connections between actors in a community network. Build creates communities, while Back supports the opinions of a group. Boost maneuver enhances the connections between network actors, and Bridge adds linkages between various groups (see examples in Table 7).
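The taxonomy laid out in the preceding paragraphs can be encoded as a small lookup table, which is convenient when tallying ORA's maneuver reports by group. The helper function here is our own illustration, not part of ORA:

```python
# The 16 BEND maneuvers grouped by (space, valence), as described in the
# framework: B/E are positive network/narrative maneuvers, N/D their
# negative counterparts.
BEND = {
    ("network", "positive"):   ["Back", "Build", "Boost", "Bridge"],
    ("narrative", "positive"): ["Explain", "Enhance", "Excite", "Engage"],
    ("network", "negative"):   ["Neutralize", "Nuke", "Narrow", "Neglect"],
    ("narrative", "negative"): ["Dismiss", "Distort", "Distract", "Dismay"],
}

def maneuver_group(name: str):
    """Return the (space, valence) group for a maneuver name, or None."""
    for group, names in BEND.items():
        if name.capitalize() in names:
            return group
    return None
```

Summing per-tweet maneuver labels through this mapping gives the group-level prevalence comparison reported in the next paragraph.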
Generally, we observe that B maneuvers (positive network maneuvers), E maneuvers (positive narrative maneuvers), and D maneuvers (Dismiss, Dismay, Distract) are predominantly present in discussions on vaccine hesitancy and anti-vax sentiments on Twitter in southwestern Pennsylvania. These maneuvers aim to promote vaccine hesitancy and anti-vax discourse by building a community around anti-vax narratives and engaging the target audience with more controversial discourse.

Conclusion
Addressing vaccine hesitancy requires debunking disinformation that is spread online. To combat these harmful narratives, we have developed a systematic approach that enables health practitioners and public health communicators to identify and address online disinformation at the community level. Our computational pipeline gathers social media data about COVID-19 in a specific geographic area and uses it to identify disinformation narratives being spread within the community at the state level. Our analysis revealed that negative messaging often attracts people's attention and encourages them to share it. This pattern was observed in both pro-vaccination and anti-vaccination messages. Tweets expressing negativity toward lockdowns and vaccine mandates were the most prevalent.
Furthermore, we identified that anti-vaccination users employ positive network and narrative maneuvers to promote vaccine hesitancy and anti-vaccination beliefs on Twitter in southwestern Pennsylvania by building a community around anti-vax narratives and engaging the target audience with controversial discourse.

Limitations
By focusing on a specific geographic location, such as a state, city, or county, we were able to identify and analyze disinformation being spread at the local level, providing a more effective approach to understanding the problem and developing countermeasures. While this method may limit our ability to extrapolate our results to other locations and states, it can be applied by various organizations in different locations. Moreover, this study's focus on a smaller geographic area can also be a limitation, as it may hinder our ability to observe more significant trends in disinformation. The smaller number of tweets in these areas could make it difficult to draw conclusions about disinformation patterns and compare them with larger cities where more tweets originate. Furthermore, this study only covers Twitter, and a multi-platform approach would provide richer data and enhance our analysis.

Recommendations
To ensure effective health communication strategies, it is crucial to prioritize official and timely communication, shape attitudes toward vaccination beforehand, and reframe social media as a valuable resource. Healthcare organizations should implement social media strategies to counter disinformation. Positive messages on platforms like Twitter can promote behavior change and build positive attitudes about vaccination. Communication strategies should be adjusted based on the audience and the epidemiological situation. This data-driven approach can guide communication strategies for public health organizations, civil society, mass media, and other community-centered groups during the pandemic.