January 6th on Twitter: measuring social media attitudes towards the Capitol riot through unhealthy online conversation and sentiment analysis

ABSTRACT While social media can serve as public discussion forums of great benefit to democratic debate, discourse propagated through them can also stoke political polarization and partisanship. A particularly dramatic example is the January 6, 2021 incident in Washington D.C., when a group of protesters besieged the US Capitol, resulting in several deaths. The public reacted by posting messages on social media, discussing the actions of the participants. Aiming to understand their perspectives under the broad concept of unhealthy online conversation (i.e. bad faith argumentation, overly hostile or destructive discourse, or other behaviours that discourage engagement), we sample 1,300,000 Twitter posts from the #Election2020 dataset dating from January 2021. Using a fine-tuned XLNet model trained on the Unhealthy Comment Corpus (UCC) dataset, we label these texts as healthy or unhealthy, and further characterize unhealthy texts using a taxonomy of seven unhealthy attributes. Using the NRCLex sentiment analysis lexicon, we also detect the emotional patterns associated with each attribute. We observe that these conversations contain accusatory language aimed at the ‘other side’, limiting engagement by defining others in terms they do not themselves use or identify with. We find evidence of three attribute clusters, in addition to sarcasm, a divergent attribute that we argue should be researched separately. We find that emotions identified from the text do not correlate with the attributes, the two approaches revealing complementary characteristics of online discourse. Using latent Dirichlet allocation (LDA), we identify topics discussed within the attribute-sentiment pairs, linking them to each other using similarity measures. The results we present aim to help social media stakeholders, government regulators, and the general public better understand the contents and the emotional profile of the debates arising on social media platforms, especially as they relate to the political realm.


Introduction
The rise of social media during the past two decades presents renewed challenges to democratic societies. On the one hand, social media is a public forum where a well-informed citizenry can engage in the type of dialogue and debate that is the lifeblood of democracy; on the other, in recent years, social media has proved it can also be a vector for disinformation, fake news, bullying, hate speech, and, most importantly for the present paper, increasingly extreme political polarization. Thus, discourse on social media, far from being merely a question of moderation or censorship solely of concern to social media platform administrators and stakeholders, has the potential to result in definite social effects in the real world.
One of the most dramatic such manifestations in recent years has been the so-called Capitol riot that took place in Washington, D.C. at the Capitol building, the seat of Congress, the U.S. legislature. On 6 January 2021, a group of Trump supporters attending the lawful 'Save America' rally turned aggressive, broke police lines, and breached the Capitol building with the intent of interrupting the certification process of Electoral College votes ('Capitol Riots Timeline', 2022), while displaying a mixture of pro-Trump and extremist iconography (Simpson & Sidner, n.d.). The violent incident resulted in the death of five people and significant damage to the credibility of the US democratic process ('Capitol Riots Timeline', 2022). These events did not occur in a vacuum, as a spontaneous outburst of violence; rather, they must be understood in the context of the controversial 2020 US Presidential Election cycle, dominated by bitter polarization (Abilov et al., 2021; E. Chen et al., 2021; Lees & Cikara, 2020), ad hominem arguments, ceaseless online polemics, and allegations of voter fraud against the victorious Democratic party candidate Joe Biden from mass media personalities and pro-Trump factions of the Republican party (Abilov et al., 2021; #StopTheSteal, 2021). As accusations of vote rigging immediately call into question the legitimacy of any democratic system, these events must be understood in the context of a more general political realignment, in which social media discourse plays an important role. Research must be directed towards an understanding of this realignment and its potential to reflect on the ill health of our democratic institutions.
As a contribution towards this goal, in the current paper, we wish to better understand the difficulties in communication posed by such polarization and partisanship concerning the events of 6 January 2021, in contemporary posts on Twitter, a popular social media platform (Most Popular Social Media Apps in U.S., n.d.). We use the broad notion of 'unhealthy online conversation' (Price et al., 2020) to characterize social media discourse relevant to the events. More specifically, we are interested in answering the following questions: (1) what proportion of relevant tweets are part of unhealthy conversations? (2) how does this proportion change over time in response to significant events? (3) do unhealthy conversations have specific markers we can identify in text data? (4) are there any specific emotional patterns that correlate with specific attributes of unhealthy conversations? (5) can we identify topics of concern relevant to polarized discourse? To investigate these research questions, we retrieve 1,300,000 tweets from the #Election2020 dataset (E. Chen et al., 2021) dating from January 2021, and classify them with an XLNet deep learning (DL) model trained on the Unhealthy Comment Corpus (UCC) dataset, which is annotated for the attributes healthy and unhealthy, and the sub-attributes hostile, antagonistic, dismissive, condescending, sarcastic, generalization, and/or unfair generalization. On this dataset, we achieve higher performance than the sample implementation presented by the authors (Price et al., 2020). We also use the NRCLex emotion lexicon (Mohammad & Turney, 2013) to identify the emotions expressed in the texts, and compare and contrast our findings with the predicted attributes of unhealthy conversation. Finally, we proceed to discover discussion topics for each sub-attribute using latent Dirichlet allocation (LDA) and follow the trends of the attributes during the period studied.
The paper is structured as follows: in the section 'Related Work', we review relevant background material informing this paper. In the section 'Data and Methods', we discuss the data and methodology we used for training our classifiers, comparing their performance on the UCC dataset with the literature. We also briefly detail the emotion identification and topic modelling techniques we used in this section. In the section 'Results', we present the results of the analyses we performed in detail. In the section 'Discussion', we present our interpretation of our findings and attempt to situate them within a growing literature on social media analysis as it applies to political polarization. Finally, in the section 'Conclusions', we conclude our study, acknowledge its limitations, and give recommendations for further research.

Related work
Use of Twitter data in social science research
Social media presents an attractive source of text data that can be used in text mining for social science applications (Macanovic, 2022). Twitter is an especially attractive option because of the high volume of data and the flexible API access that was offered to academics prior to the recent change in administration of this platform (Calma, 2023). Computational methods such as dictionary methods, semantic and network analysis tools, and language models have been widely applied to such data in social science fields such as sociology (Macanovic, 2022), aiding in theory building and testing. Supervised and unsupervised machine learning techniques that are popular in many domains (Dobriţa et al., 2022; Oprea et al., 2021; Preda et al., 2018) are also increasingly used. This approach has come to be known as computational social science (Conte et al., 2012; Lazer et al., 2009; Macanovic, 2022). Nevertheless, these methods must be used carefully and with an understanding that they present a complement to traditional social science methodologies (Macanovic, 2022).
Twitter data specifically has been used by Ruan and Lv in assessing the public's attitudes towards electric vehicles (Ruan & Lv, 2023). The authors use large data sets and compare the data obtained from Twitter with data obtained from Reddit, another text-based social media platform, and conclude that different information can be obtained from the two sources (Ruan & Lv, 2023). They also highlight that such methods can provide novel insights when compared with traditional methods such as surveys, questionnaires, or interviews (Ruan & Lv, 2023). The study highlights the difference between the optimistic tone of the politicians, journalists, and other public figures, and the more mixed perceptions of the general public (Ruan & Lv, 2023). K. Chen et al. used a 5,000,000-text corpus obtained from Twitter to uncover the public's perceptions of climate change and climate strikes (K. Chen et al., 2023). They show that Twitter activity spikes around major events related to these topics, stress the difference between the topics on which policymakers and news outlets focus as opposed to other climate actors, and call attention to the highly politicized nature of the discussion (K. Chen et al., 2023). Tschirky and Makhortykh also use Twitter data related to the 2022 siege of Mariupol in the context of the ongoing Russo-Ukrainian war to compare manually-performed qualitative content analysis and automated topic modelling performed through LDA (Tschirky & Makhortykh, 2023). They highlight the limitations of qualitative approaches such as close reading, discourse analysis, and qualitative content analysis, mostly related to the small sample size of texts, but also point out the shortcomings of quantitative approaches such as topic modelling, related to the limited interpretability of the results from a framing analysis perspective (Tschirky & Makhortykh, 2023). They note, however, that the two approaches lead to similar results, and suggest that other quantitative research methods, such as text classification using Transformers-based models, can lead to better and more interpretable results (Tschirky & Makhortykh, 2023).
Finally, Ouni et al. have used data sourced from Twitter to submit an entry to the author profiling task of the PAN19 challenge (Ouni et al., 2022). They highlight Twitter's popularity as a communications platform, and the growing importance of data sourced from Twitter in social science-related fields such as politics, forensics, marketing, and security (Ouni et al., 2022). They showcase a classification pipeline able to predict, based on the text of a tweet, whether the author is a bot or a human, and if human, determine their gender (Ouni et al., 2022). They use this pipeline with multiple languages and obtain high performance for both sub-tasks (Ouni et al., 2022). They obtain this superior performance by using hand-built feature sets incorporating stylistic information, showing that there are statistical differences between texts produced by bots and those produced by humans (Ouni et al., 2022). These results show that the usage of Twitter data within the social sciences is a well-established practice that can reveal new and complementary findings when used alongside traditional research methods.

Online toxicity and incivility
The issue of problematic online communication has only recently begun to be posed as a question of unhealthy (i.e. not informative) conversation, as past research focused mostly on combatting toxicity, or more overt abusive and destructive behaviour, often with the aim of moderating or censoring it within social media settings. According to Wang et al., such automated detection still mostly relies on heuristic methods such as text matching (K. Wang et al., 2021), making the practical aim of such research the development of more advanced algorithmic moderation methods. Recently, interest arose in detecting less overt harmful behaviours, such as incivility (Y. Chen & Wang, 2022). Studying YouTube comments, also in the context of the 2020 US Presidential election, Chen and Wang highlight the potential limits of automatic moderation, observing that the YouTube recommendation algorithm and the company's advertising policies are at least partly responsible for the development of toxic and 'incivil' conversations in the comments as drivers for engagement (Y. Chen & Wang, 2022). They stress rather the regulatory aspect (Y. Chen & Wang, 2022), highlighting that simple moderation is often insufficient given contradictory market incentives.

Unhealthy online conversations
An even broader and more nuanced approach is proposed by Price et al., who introduce a novel nomenclature that refers to conversations as either healthy, defined as conversations 'where posts and comments are made in good faith, are not overly hostile or destructive, and generally invite engagement' (Price et al., 2020), or unhealthy, where at least some of these conditions are not met. Furthermore, they establish a taxonomy of 7 attributes of unhealthy conversations: hostile, antagonistic, dismissive, condescending, sarcastic, generalization, and/or unfair generalization. They release their dataset, the Unhealthy Comment Corpus (UCC), and provide a sample BERT-based model to be used as a baseline classifier. It is their work that primarily informs the current approach.
Several other authors also use the UCC dataset and the framework of unhealthy online conversation in their research. Gilda, Giovanini, Silva, and Oliveira perform a survey of different classifier architectures, including a variant of the long short-term memory neural network (LSTM), with the UCC dataset used as a benchmarking tool (Gilda et al., 2022). Their aim is to engineer improved algorithmic moderation techniques, and as such they do not apply the classifiers to out-of-domain data as we do. Our classifiers also outperform theirs when run on the validation dataset included in UCC. On the other hand, Lashkarashvili and Tsintsadze adapt the methodology for the Georgian language (Lashkarashvili & Tsintsadze, 2022), curating and releasing their own Georgian-language annotated dataset. They also note that in the online forum from which they obtained their out-of-domain data, the 'politics' section was deemed the most probable to contain unhealthy discussions, validating our hypothesis that political polarization might be related to unhealthy online conversation. However, to the best of our knowledge, we are the first to apply this methodology to social media data at the n > 1M scale, as well as to the topic of political polarization specifically.

Sentiment analysis
The topic of classifying unhealthy texts immediately calls to mind the more traditional natural language processing (NLP) tasks of sentiment analysis, polarity analysis, or stance analysis. While NLP techniques such as these have been widely cited and applied in the literature, one of the ways in which they differ from our approach is that the definition of unhealthy conversations is deliberately left broad, referring more to a set of counter-productive pragmatics (Price et al., 2020) than any concrete semantic content, like the value judgement towards a specific target in the case of stance analysis (ALDayel & Magdy, 2021; R. Wang et al., 2019) or the general positive or negative meaning of a text in the case of polarity analysis (Giachanou & Crestani, 2016).
There are numerous sentiment analysis techniques discussed in the literature, mainly grouped into two categories: lexicon-based, which counts words with special meaning to establish the sentiment behind a text, such as SenticNet2 (Cambria et al., 2012), NRCLex (Mohammad & Turney, 2013), and VADER (Cristescu et al., 2022; Hutto & Gilbert, 2014); and machine learning-based (Lashkarashvili & Tsintsadze, 2022), in which case an algorithm is used to infer the sentiment automatically, with hybrid approaches also possible (Cambria et al., 2020).
While the state-of-the-art in the domain of sentiment analysis involves a deep learning (DL) approach using large transfer learning models such as BERT (Cascini et al., 2022), one of the advantages of lexicon-based methods is their speed and simplicity, and especially the fact that a manually annotated dataset is not needed, unlike in the case of supervised learning. The difficulties in the development of such a set, and the lack of appropriate publicly available datasets, are an ongoing challenge in the field (ALDayel & Magdy, 2021; Cotfas et al., 2021; R. Wang et al., 2019).
As mentioned, we aim to compare the approach of evaluating text corpora in terms of unhealthy conversations with sentiment analysis. In order to facilitate this comparison, we have chosen to use the NRCLex emotion lexicon provided by Mohammad and Turney (Mohammad & Turney, 2013), in part because, in addition to identifying the polarity of the text, it also provides a more detailed sentiment breakdown based on psycholinguistic research (Mohammad & Turney, 2013). The authors used the crowdsourcing tool Amazon Mechanical Turk, a platform where requesters can submit tasks including data entry, classification, annotation, etc. for crowd-workers called Turkers (Mohammad & Turney, 2013) to perform for a monetary reward, to compile a large corpus of emotional annotations on over 10,000 uni- and bi-grams, of which the final lexicon retains 8,883 (Mohammad & Turney, 2013).
While not offering the best performance possible, in part because of its ignorance of context (Lashkarashvili & Tsintsadze, 2022; Mohammad & Turney, 2013), NRCLex is nevertheless suitable as a baseline to investigate what exactly the two approaches measure when it comes to informal social media texts, that is, how strongly correlated the sub-attributes of unhealthy conversation are with traditional sentiment measures.

Data and methods
We use supervised ML models trained on public datasets to classify unhealthy Twitter conversations, the NRCLex sentiment lexicon to perform sentiment analysis, and unsupervised ML techniques (LDA) to reveal latent topics in the data (Figure 1).

Data
As mentioned above, we have leveraged the UCC dataset for the training stage. This dataset contains 44,000 short texts in English annotated with the seven sub-attributes hostile, antagonistic, dismissive, condescending, sarcastic, generalization, and/or unfair generalization, as well as the healthy flag. The annotations are not exclusive: each text can belong to any number of classes. Thus, classifying unhealthy comments is a multi-label classification problem rather than a multi-class one, where class membership is exclusive.
We chose to use this dataset due to its large size and the clear definition of its attributes; earlier datasets with similar aims, such as the Google Jigsaw toxic comment dataset used by Chen and Wang (Y. Chen & Wang, 2022), are in some respects inferior: there is limited information on how the data was compiled, the attributes used for annotation are only loosely defined, and subtler forms of harmful or bad-faith comments, such as generalizations, dismissive comments, or a condescending tone, are not found within the set. However, the Jigsaw dataset is larger (160,000 texts) and thus attractive for a more restricted study of outright toxicity or hate speech (Toxic Comment Classification Challenge, n.d.).
While the authors note the sampling bias introduced by the comments being collected from the user comments section of a single Canadian newspaper (Price et al., 2020), we feel applying the resulting models to contemporary North American texts is appropriate and should result in minimal issues due to linguistic and cultural proximity, since the out-of-domain data was obtained from the #Election2020 dataset released by Chen, Deb and Ferrara (E. Chen et al., 2021). This is an enormous dataset consisting of over 1.2 billion tweets referring to the 2020 US Presidential election (E. Chen et al., 2021). As mentioned above, we feel the events must be understood in this context, so this choice of data source is appropriate. We restricted our analysis to January 2021, when the events took place, and as such we downloaded only tweets from this period. We removed all tweets not in English (∼12%). Even so, to make the data manageable given hardware limitations, we had to take a 3.5% stratified random sample of the tweets based on their datetime. Our final dataset numbered approximately 1,300,000 (1.3 M) tweets in total.
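The stratified sampling step can be illustrated with a short sketch. This is a toy example on synthetic (id, day) pairs, not our actual pipeline: in our case the stratification key was the tweet datetime, and the exact bucketing shown here is an assumption for illustration.

```python
import random
from collections import defaultdict

def stratified_sample(items, key, fraction, seed=42):
    """Draw the same fraction from every stratum so the sample
    preserves the original distribution over the strata."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in items:
        strata[key(item)].append(item)
    sample = []
    for group in strata.values():
        # round per stratum, so the total is approximately n * fraction
        sample.extend(rng.sample(group, round(len(group) * fraction)))
    return sample

# Toy corpus: (id, day) pairs standing in for tweets and their datetimes.
tweets = [(i, f"2021-01-{(i % 31) + 1:02d}") for i in range(10_000)]
sample = stratified_sample(tweets, key=lambda t: t[1], fraction=0.035)
print(len(sample))  # close to 350 (3.5% of 10,000, rounded per stratum)
```

Because the fraction is applied within each stratum, every day of the month remains represented in proportion to its original volume.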

Representation
For the classical machine learning algorithms, we represented the data as an n × m document-term matrix M, where the columns represent the features (tokens), the rows represent the documents, and each element M_ij represents the TF-IDF score, a measure of the token's importance within the document (Vajjala et al., 2020). We used the TF-IDF vectorizer from the scikit-learn Python package. For the deep learning models, we used pretrained tokenizers to obtain word embeddings: the WordPiece encoder for BERT (Devlin et al., 2019), byte-pair encoding for RoBERTa (Liu et al., 2019), and a SentencePiece-derived tokenizer for XLNet (Yang et al., 2019). We used the tokenizer implementations from the HuggingFace provider.
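To make the representation concrete, the following is a minimal pure-Python sketch of the TF-IDF weighting behind the document-term matrix. Note that scikit-learn's TfidfVectorizer uses a smoothed IDF and L2 row normalization by default, so its exact values differ; this textbook variant only illustrates the idea.

```python
import math
from collections import Counter

def tfidf_matrix(docs):
    """Build an n x m document-term matrix of TF-IDF scores.
    Rows are documents, columns are vocabulary tokens."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    n = len(tokenized)
    # document frequency: number of documents containing each token
    df = Counter(tok for doc in tokenized for tok in set(doc))
    matrix = []
    for doc in tokenized:
        tf = Counter(doc)
        row = [
            (tf[tok] / len(doc)) * math.log(n / df[tok]) if tf[tok] else 0.0
            for tok in vocab
        ]
        matrix.append(row)
    return vocab, matrix

docs = ["the vote was fair", "the vote was rigged", "fair vote"]
vocab, M = tfidf_matrix(docs)
# "the" appears in 2 of 3 documents, so its weight is low; "rigged"
# appears only in the second document, so it scores highest there.
```

Tokens common to most documents (like "the") receive low weights, while tokens specific to one document dominate that document's row.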

Classifier evaluation
Price et al. use the area under the receiver operating characteristic curve (ROC-AUC) performance measure to evaluate their model. The ROC curve describes the relationship between the true positive rate (TPR, or sensitivity, Equation 1) and the false positive rate (FPR, Equation 2) as the decision threshold τ is varied in a model over the interval [0, 1] (Kelleher et al., 2020):

TPR = TP / (TP + FN)   (1)

FPR = FP / (FP + TN)   (2)

where TP is the number of true positives, FN the number of false negatives, TN the number of true negatives, and FP the number of false positives. The AUC score is then the definite integral of the ROC curve, and can be approximated using the trapezoidal rule (Kelleher et al., 2020):

AUC = Σ_{i=2}^{|T|} [FPR(T_i) − FPR(T_{i−1})] · [TPR(T_i) + TPR(T_{i−1})] / 2   (3)

where T is the set of all thresholds τ and T_i is the i-th threshold τ.
A greater value of the AUC score means better performance, with an AUC of 1 corresponding to a perfect classifier, which achieves a TPR of 1 (all positives correctly identified) while producing no false positives at all.
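The threshold sweep and trapezoidal integration described above can be sketched as follows (a simple reference implementation on toy scores, not the evaluation code we used):

```python
def roc_auc(scores, labels):
    """Compute ROC-AUC by sweeping the decision threshold over all
    predicted scores and integrating TPR over FPR (trapezoidal rule)."""
    pos = sum(labels)
    neg = len(labels) - pos
    # sort thresholds descending so FPR increases monotonically
    thresholds = sorted(set(scores), reverse=True)
    points = [(0.0, 0.0)]
    for tau in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= tau and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= tau and y == 0)
        points.append((fp / neg, tp / pos))
    points.append((1.0, 1.0))
    auc = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        auc += (x1 - x0) * (y0 + y1) / 2  # trapezoid area
    return auc

# A perfect ranking of positives above negatives gives AUC = 1.0
print(roc_auc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))  # 1.0
```

A classifier that ranks some negatives above positives scores below 1; a random ranking scores around 0.5.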
We trained six classifiers in total: three classical machine learning algorithms, adapted to the multi-label problem using the one-vs.-all ensemble method: logistic regression (LR), random forest (RF), and support vector machine (SVM); and three deep learning models: BERT (Devlin et al., 2019), RoBERTa (Liu et al., 2019), and XLNet (Yang et al., 2019). To facilitate comparisons, we calculated the ROC-AUC score for each sub-attribute as well as the attribute healthy, then took their mean to capture the overall performance ranking of the models (see Table 1).
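The one-vs.-all reduction can be sketched as follows. The MajorityClassifier here is a deliberately trivial stand-in for LR/RF/SVM, used only to keep the example self-contained and dependency-free:

```python
class MajorityClassifier:
    """Trivial stand-in for a real binary classifier: always predicts
    the majority class seen during training."""
    def fit(self, X, y):
        self.majority = int(sum(y) * 2 >= len(y))
        return self

    def predict(self, X):
        return [self.majority] * len(X)

class OneVsAll:
    """Reduce a multi-label problem to one independent binary
    classifier per label (the one-vs.-all ensemble)."""
    def __init__(self, make_clf, labels):
        self.clfs = {lbl: make_clf() for lbl in labels}

    def fit(self, X, Y):
        # Y maps each label name to its 0/1 column
        for lbl, clf in self.clfs.items():
            clf.fit(X, Y[lbl])
        return self

    def predict(self, X):
        return {lbl: clf.predict(X) for lbl, clf in self.clfs.items()}

X = [["tweet 1"], ["tweet 2"], ["tweet 3"], ["tweet 4"]]
Y = {"hostile": [1, 1, 1, 0], "sarcastic": [0, 0, 0, 1]}
preds = OneVsAll(MajorityClassifier, Y).fit(X, Y).predict(X)
```

Because each label gets its own classifier, a text can be predicted as belonging to any number of classes at once, matching the non-exclusive UCC annotations.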
Contrary to our previous experience, XLNet (mean ROC-AUC score 0.8075) slightly outperformed RoBERTa (mean ROC-AUC score 0.8026), with the difference especially visible at the level of the sarcastic class, which Price et al. note is particularly challenging to predict (Price et al., 2020). XLNet obtains a 0.7410 score for this class, better than RoBERTa (0.6999) and significantly better than the baseline proposed by the authors (0.5880) (Price et al., 2020). The subjective quality of the predictions from XLNet is also significantly better than that of the other models we experimented with, and the class proportions within its predictions are more consistent with Price et al.'s observations.

Sentiment analysis
After performing unhealthy attribute prediction on the dataset, we also perform emotion detection using the lexicon-based approach cited above, introduced by Mohammad and Turney (Mohammad & Turney, 2013) and available in an enhanced and updated form as the NRCLex Python package, which allows an arbitrary sequence of text to be scored along 8 dimensions: the emotions fear, anger, anticipation, trust, surprise, sadness, disgust, and joy. These attributes are based on the Plutchik wheel model of emotion (Mohammad & Turney, 2013). In addition to these, the polarities positive and negative are also output by this library (thus, 10 dimensions in total).
NRCLex requires the data to be cleaned and lemmatized (that is, each token transformed into its standard dictionary form) before it can analyze the text, so we used the NLTK Python package for this purpose. For each document, we ran the emotion classifier and saved the output into a vector e ∈ ℝ^10 to facilitate comparison with the outputs of the unhealthy sub-attribute classification.
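As an illustration of lexicon-based scoring, the sketch below builds the 10-dimensional vector e from a three-word toy lexicon. The entries and their tags are invented for the example; the real NRC lexicon contains 8,883 annotated terms and is accessed through the NRCLex package rather than a hand-written dictionary.

```python
# Toy stand-in for the NRC lexicon: lemma -> set of emotion/polarity tags.
DIMS = ["fear", "anger", "anticipation", "trust", "surprise",
        "sadness", "disgust", "joy", "positive", "negative"]
TOY_LEXICON = {
    "riot": {"fear", "anger", "negative"},
    "hope": {"anticipation", "joy", "positive"},
    "betray": {"sadness", "disgust", "negative"},
}

def emotion_vector(lemmas):
    """Count lexicon hits per dimension, giving a vector e in R^10."""
    e = [0.0] * len(DIMS)
    for lemma in lemmas:
        for tag in TOY_LEXICON.get(lemma, ()):
            e[DIMS.index(tag)] += 1.0
    return e

# Tokens are assumed already cleaned and lemmatized (NLTK in our pipeline).
print(emotion_vector(["riot", "hope"]))
```

Words absent from the lexicon contribute nothing, which is one source of the context-blindness noted earlier for lexicon-based methods.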

Topic discovery
We used LDA, an unsupervised ML technique that reveals unobserved structures ('topics') in text data by treating each topic as a mixture of tokens and each document as a mixture of topics (Vajjala et al., 2020), separately for each attribute. For each attribute, we optimized the number of topics k, as well as the hyperparameters η (the Dirichlet prior of the topic-word probability distribution) and α (the Dirichlet prior of the document-topic probability distribution), using the C_v coherence measure (which measures the similarity between the sets of tokens comprising a topic, related to cosine similarity), excluding terms appearing in fewer than 50 tweets or in more than 95% of tweets. We used the Gensim Python package for this analysis.
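The vocabulary pruning step (dropping terms appearing in fewer than 50 tweets or in more than 95% of tweets) can be sketched as below; in Gensim it corresponds to Dictionary.filter_extremes(no_below=50, no_above=0.95). The thresholds in the usage example are scaled down to fit a three-document toy corpus.

```python
from collections import Counter

def prune_vocabulary(tokenized_docs, min_docs, max_frac):
    """Drop terms appearing in fewer than `min_docs` documents or in
    more than `max_frac` of all documents before fitting LDA."""
    n = len(tokenized_docs)
    df = Counter(tok for doc in tokenized_docs for tok in set(doc))
    keep = {t for t, c in df.items() if c >= min_docs and c / n <= max_frac}
    return [[tok for tok in doc if tok in keep] for doc in tokenized_docs]

# 'rt' appears in every document (above 95%), 'vote' in only one document.
docs = [["rt", "capitol"], ["rt", "vote"], ["rt", "capitol"]]
pruned = prune_vocabulary(docs, min_docs=2, max_frac=0.95)
print(pruned)  # [['capitol'], [], ['capitol']]
```

Pruning removes both near-ubiquitous boilerplate terms, which carry no topical signal, and very rare terms, which inflate the vocabulary without supporting stable topic estimates.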

Results
We present our findings from the perspective of the unhealthy sub-attributes characteristic of our data and compare them with the results obtained using the sentiment lexicon NRCLex. We create sentiment profiles for each sub-attribute to facilitate comparisons, observing some of the same dynamics we have discussed during training. Finally, we present the results of our topic analysis.

Attribute distribution and trend analysis
Overall, 35.30% of the tweets were labelled as unhealthy. However, since the authors' methodology allows for texts to exhibit some unhealthy characteristics without being labelled as fully unhealthy, 42.39% of tweets exhibit at least some unhealthy traits. Out of these tweets, overtly toxic attributes such as antagonize, dismissive, and hostile were most prevalent (58.28%, 38.38%, and 36.45%, respectively), followed by condescending (20.44%) and sarcastic (13.82%). The least well-represented were the more subtle generalization and unfair generalization attributes (7.79% and 7.97%).
Even during training, some features of the seven attributes became apparent. They cluster along distinct lines (see Figure 2): generalization and unfair generalization cluster together and are learned very distinctly by all models, including the baseline proposed by the authors (Price et al., 2020), despite their low proportions in the data. This raises the question of whether there is any value in making a distinction between the two at all, as they are clearly intrinsically related. We believe one of these classes is not informative and thus redundant; we propose that follow-up analyses merge them during training and prediction, treating them as one, as they appear to be extremely highly correlated in our data (Pearson correlation coefficient of 0.9599).
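The correlation quoted above is the standard Pearson coefficient computed over the two 0/1 label columns; the sketch below uses invented toy labels, not our data:

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors;
    applied here to the 0/1 label columns of two attributes."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Two nearly identical binary label columns correlate strongly.
gen = [1, 1, 0, 0, 1, 0, 0, 0]
unfair = [1, 1, 0, 0, 1, 0, 0, 1]
print(round(pearson(gen, unfair), 3))  # 0.775
```

With labels that agree on all but one tweet, the coefficient is already close to the 0.96 regime we observe at scale.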
The second cluster consists of attributes generally considered toxic rather than simply unhealthy: hostile and antagonize. The 'toxic' cluster can be seen to rise rapidly and then quickly decay, while the others, especially generalization (which denotes both types of generalization in the figure), show less volatility and follow the general pattern of the tweets. The third consists of the attributes which are more subtly destructive: dismissive and condescending. Sarcastic, a feature difficult to identify even for humans, was difficult to learn, as mentioned above; it is possible the higher success of XLNet when learning sarcasm compared to the other models is because its internal structure is better suited to picking up subtle differences in word order (Yang et al., 2019). Nevertheless, we note several characteristics of sarcasm within the observed data and the training results that suggest further improvements can be made for sarcasm detection. This is discussed at more length in the next sections.
As can be seen in the first two plots in Figure 3, where attributes that clustered together are displayed together, there is a strong daily cyclical pattern reflecting the day-night cycle which becomes attenuated for certain periods. For instance, during the events and their immediate aftermath (January 6-10, denoted with E1 and E2), this cyclical pattern is almost flattened for all clusters, suggesting heightened engagement during all times of the day. These might correspond to periods of intense discussion, or the spread of the topic outside the US (especially in Europe) due to live news coverage and online engagement; it might also indicate the deployment of bots.
The second period of heightened unhealthy conversation happened during January 13-16 (denoted with E3), when, as a result of his role in the Capitol riots, Congress took steps towards impeaching Trump a second time (Sheth, n.d.). The cyclical pattern prevails after January 16, except for a significant flare-up of toxic posts during January 19-23 (denoted with E4), corresponding to Trump officially leaving office on January 20, together with the new President Biden's inauguration ceremony (CNN, 2021). Finally, the third graph in Figure 3 shows the ratio of unhealthy tweets to the overall number of tweets. Its near-stationarity suggests that unhealthy conversation makes up a relatively stable proportion (between 30 and 50%) of the overall discussion throughout the entire period, being a perpetual occurrence during the normal course of social media discourse, not merely a response to dramatic events. Thus, positing a causal relationship between social media phenomena and political violence in the real world is unwarranted; political polarization appears in this case to be a particular case of unhealthy conversation, which only through some other, unrelated conditions leads to dramatic effects.

Sentiment profiles of unhealthy online conversations
We were interested in how the attributes of unhealthy online conversations correlated with another popular way of analyzing social media texts, emotion identification or sentiment analysis. We ran the lexicon-based sentiment analysis package NRCLex on the labelled set and represented the output as a vector e ∈ ℝ^10, with each component representing one of the 8 emotions in addition to the polarities positive and negative. We then binarized the outputs, setting each component of e to 1 if the text exhibits the corresponding sentiment and to 0 otherwise.
We then computed the correlation matrix between the 8 emotions, the 2 polarities, and the 7 sub-attributes using Cramer's V-measure, a statistic that measures the correlation between categorical variables (Liebetrau, 1983). This statistic yields a score in the range [0, 1], with 0 meaning no correlation and 1 meaning perfect correlation (Liebetrau, 1983). Note that Cramer's V-measure does not give the direction of the correlation (direct or inverse) like Pearson's correlation coefficient, only the strength of the association.
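For two binary variables, Cramer's V-measure reduces to the absolute value of the phi coefficient of their 2×2 contingency table, which makes it straightforward to compute directly (a toy sketch, not the exact implementation we used):

```python
import math

def cramers_v(x, y):
    """Cramer's V between two binary variables; for a 2x2 contingency
    table this reduces to the absolute phi coefficient."""
    # contingency counts of the 2x2 table
    a = sum(1 for i, j in zip(x, y) if i == 1 and j == 1)
    b = sum(1 for i, j in zip(x, y) if i == 1 and j == 0)
    c = sum(1 for i, j in zip(x, y) if i == 0 and j == 1)
    d = sum(1 for i, j in zip(x, y) if i == 0 and j == 0)
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return abs(a * d - b * c) / denom if denom else 0.0

# Identical variables give V = 1; independent ones give V near 0.
print(cramers_v([1, 1, 0, 0], [1, 1, 0, 0]))  # 1.0
```

The absolute value is what makes the measure directionless: perfectly inverse variables also score 1, which is why only association strength can be read off Figure 4.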
As can be seen in Figure 4, the two sets of features, Price et al.'s attributes of unhealthy conversation (Price et al., 2020) and Plutchik's emotion wheel as identified by Mohammad and Turney's lexicon (Mohammad & Turney, 2013), are mostly independent of each other. On the other hand, they do display significant correlation within their own feature sets; for instance, within the attributes of unhealthy conversation, generalization and unfair generalization are very highly correlated (Cramer's V of 0.96, almost perfect correlation), an issue which we have already noted. We can also observe significant correlations between dismissive and condescending (0.7), hostile and condescending (0.55), and hostile and dismissive (0.51). With the exception of hostile and condescending, and though the strength of these correlations is greater than what Price et al. report, they are consistent with what the authors observe within the UCC data itself (Price et al., 2020). It is interesting to note that, again, sarcastic appears to be very distinct; it has almost 0 correlation with every other feature except for a weak correlation with the attribute condescending (0.1), which appears sound from an intuitive point of view, as sarcasm is often employed in order to patronize (Keivanlou-Shahrestanaki et al., 2022), though Price et al. do not observe this, but rather a small, inverse relationship (Price et al., 2020).
Within the emotion-based features, fear is moderately correlated with anger (0.39), disgust (0.35), joy (0.38), and positive sentiment (0.27); anger is moderately correlated with disgust (0.42), joy (0.34), and positive sentiment (0.38); negative sentiment is correlated with anticipation (0.36), trust (0.32), and sadness (0.36). Surprisingly, negative sentiment is not correlated with any of the sub-attributes, not even the most toxic ones. This suggests that negative general sentiment can in no way be equated with toxicity, supporting other similar findings in the literature (ALDayel & Magdy, 2021).
It is also worthwhile to observe that there is, in fact, some correlation, albeit weak, between the emotion disgust and some of the unhealthy sub-attributes, such as hostile (0.21), dismissive (0.13), and condescending (0.13), and between positive sentiment and the attributes hostile (0.16), dismissive (0.13), and condescending (0.12). In order to disambiguate these counter-intuitive findings, we decided to investigate further by building emotion profiles for each attribute (see Figure 5).
These profiles were built by calculating, for each attribute, the number of tweets identified as exhibiting each emotion divided by the total number of tweets with the attribute (see Equation 4). These were then plotted using radial coordinates to visualize their shape. This visualization reinforces some of our previous observations, as the plots for antagonize, condescending, dismissive, and hostile appear to be very similar to each other, suggesting a similar emotional profile, with a surprisingly high proportion of tweets with positive sentiment, a large proportion of tweets exhibiting disgust, and a moderate proportion of tweets exhibiting fear, anger, and surprise. Generalization and unfair generalization appear to have an emotional profile distinct from the other attributes and very similar to each other. Sarcastic is, yet again, distinct from the other sub-attributes, and is the only sub-attribute with a large proportion of tweets exhibiting negative sentiment and the emotion anticipation. Finally, healthy tweets were included as a control to facilitate comparisons.
p(e_i | a_j) = N(e_i, a_j) / N(a_j)    (4)

where e_i is the i-th emotion component of the emotion vector e, a_j is the j-th attribute of unhealthy conversation, the numerator N(e_i, a_j) is the number of tweets exhibiting both sentiment e_i and attribute a_j, and the denominator N(a_j) is the total number of tweets exhibiting attribute a_j.
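The profile computation of Equation 4 can be sketched directly. Here `tweets` is a hypothetical list of (attribute-labels, emotion-labels) pairs standing in for the classifier and NRCLex outputs:

```python
def emotion_profile(tweets, attribute, emotions):
    """For a given attribute a_j, return the share of a_j-labelled tweets
    that also exhibit each emotion e_i (Equation 4)."""
    with_attr = [ems for attrs, ems in tweets if attribute in attrs]
    if not with_attr:
        return {e: 0.0 for e in emotions}
    return {e: sum(1 for ems in with_attr if e in ems) / len(with_attr)
            for e in emotions}

# Hypothetical labelled tweets: (unhealthy attributes, detected emotions).
tweets = [
    ({"hostile"}, {"disgust", "positive"}),
    ({"hostile"}, {"disgust"}),
    ({"sarcastic"}, {"negative", "anticipation"}),
    ({"healthy"}, {"trust"}),
]
profile = emotion_profile(tweets, "hostile", ["disgust", "positive", "negative"])
# Both hostile tweets show disgust, one of two shows positive sentiment.
```

Plotting one such dictionary per attribute in radial coordinates yields the radar plots of Figure 5.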

Topic analysis
While discussing polarization in a similar context, Lees and Cikara stress the intergroup conflict aspect of the phenomenon (Lees & Cikara, 2020). This aspect is best captured by the traditionally toxic attributes antagonize and hostile, but such discussions appear to be of limited value to the study of polarization, as these topics seem uninformative, consisting mostly of noise and vulgar abuse.
On the other hand, the attributes indicating generalization in a political context are especially informative: through their markers of stereotyping and labelling, they express the kind of meta-perceptions that are indicative of misperceived polarization (Lees & Cikara, 2020). They paint the 'other side', the outgroup, with broad, possibly inaccurate strokes, using simple slur-like keywords to describe them and what they believe (see Table 2: 'gop', the Republican party; 'maga', a contraction of Trump's slogan 'make America great again!'; 'proud boys', a militant extremist movement (Simpson & Sidner, n.d.); 'nazi'). Also important to note are the moralistic and emotional tone ('traitor', 'insurrection', 'terrorist'), as well as the racial aspect ('white', 'black', 'supremacist'). It is also interesting to note how topics across attributes relate to each other. In Figure 6 we have set up a network, with named topics as nodes and edges representing the cosine similarity between the topics. There are two main clusters relevant to meta-perceptions, consisting of accusations against the outgroup (racists, Proud Boys, white supremacists against the Capitol rioters, etc.) and generalizations about the outgroup (equating Democrats, liberals, and leftists, and likewise Republicans). It is also noteworthy that one topic (generalization_anti_vote) was not partisan but rather appeared to be broadly critical of the two-party system, opposing participation in the voting process. Finally, the bottom cluster consists of uninformative topics (condescending_not_informative) and traditional toxic comments (vulgar_abuse), which in this context can be seen as noise.
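The network construction can be sketched as follows. The topic names and term weights below are hypothetical placeholders for LDA output; the thresholding at cosine similarity ≥ 0.5 matches the criterion used for Figure 6.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(u.get(t, 0.0) * v.get(t, 0.0) for t in set(u) | set(v))
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def topic_network(topics, threshold=0.5):
    """Return edges (a, b, similarity) between all topic pairs at or above
    the similarity threshold; topics are nodes, similarities are edges."""
    names = list(topics)
    return [(a, b, round(cosine(topics[a], topics[b]), 3))
            for i, a in enumerate(names) for b in names[i + 1:]
            if cosine(topics[a], topics[b]) >= threshold]

# Hypothetical LDA topics as term-weight vectors.
topics = {
    "accuse_racists": {"racist": 0.9, "supremacist": 0.4},
    "accuse_supremacists": {"supremacist": 0.8, "white": 0.5, "racist": 0.3},
    "vulgar_abuse": {"expletive": 1.0},
}
edges = topic_network(topics)
# Only the two accusation topics share vocabulary, so they form one edge;
# vulgar_abuse remains an isolated node.
```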
We have also computed, using the same methodology, sets of topics for each attribute-emotion pair. The most interesting selection of these can be seen in Table 3. While these additional topics are similar to the ones computed in Table 2, it is interesting to note that the topics computed for some combinations appear to be quite coherent; for instance, dismissive_anger_1 appears focused on themes of deception, while the generalization topics indicate slightly different identifications of the outgroup.
As a final consideration, on the issue of detecting sarcasm, we have observed that certain keywords (see Tables 2 and 3) and syntactical structures (such as beginning a phrase with the interjections 'yeah', 'oh', or 'well') have been deemed important by the model. The presence of 'light' profanities (as opposed to gross vulgarities) might also be characteristic of sarcastic comments. The co-occurrence of negative sentiment and the emotion anticipation might also provide clues towards sarcasm detection. It must be noted, however, that these might not generalize well to other contexts, as they appear at least in part to be internet-specific ('lol') or simply North American colloquialisms. Nevertheless, looking for such structures might become useful in retrieving sarcasm-related data from social media, similar to other cues such as the '/s' token on Reddit (Khodak et al., 2018).
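A retrieval heuristic based on these cues could look like the sketch below. This is not a sarcasm classifier, only a filter for gathering candidate data; the specific cue words are illustrative choices based on the observations above.

```python
import re

# Interjection at the start of the tweet ('yeah', 'oh', 'well'),
# optionally followed by a comma or exclamation mark.
INTERJECTION_START = re.compile(r"^(yeah|oh|well)\b[,!]?", re.IGNORECASE)

# The explicit Reddit-style sarcasm marker '/s' as a standalone token.
EXPLICIT_MARKER = re.compile(r"(^|\s)/s(\s|$)")

def sarcasm_cues(text):
    """Return the list of sarcasm-retrieval cues present in a tweet."""
    cues = []
    if INTERJECTION_START.search(text.strip()):
        cues.append("interjection_start")
    if EXPLICIT_MARKER.search(text):
        cues.append("explicit_marker")
    return cues

hits = sarcasm_cues("Oh, what a totally peaceful protest /s")
```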

Discussion
Useful for the interpretation of the results is the concept of meta-perception we borrow from the field of social psychology, defined as beliefs about what others believe (Lees & Cikara, 2020). Negative meta-perceptions are thought to be central to online political polarization, where engagement with the outgroup is mediated through text-based conversations. We have detected textual markers of such expressions, especially in tweets exhibiting the dismissive, condescending, and generalization attributes. Misperceived polarization refers to an unwillingness of actors to engage in a social media context due to negative meta-perceptions, the belief that 'the other side' is unwilling to compromise or even present as a good-faith actor (Lees & Cikara, 2020). Thus, toxicity appears to be a marker of avoiding dialogue due to these negative meta-perceptions and is indicative of a breakdown in communication in intergroup confrontation online. We have seen evidence of this in the fast rise and fall of toxic spikes visible in the data (see Figure 3).
The proportions and clustering of the attributes discussed above, the distinct emotional profiles found for the attributes, and the distinctions between topics generated from them all suggest the presence of different 'layers' of toxicity and unhealthiness exhibited through polarized or adversarial online discussions, with toxicity being the most virulent, followed by bad-faith argument, trolling, dismissiveness, or superiority, followed by different levels of generalization regarding the outgroup.Sarcasm appears to develop along a distinct axis and as such could be considered an unrelated characteristic that can appear at any level, but it is yet unclear how it affects online discussions.
Moreover, our findings suggest that the attributes of unhealthy conversation and the NRCLex emotion detection lexicon measure different, unrelated aspects when applied to social media discourse. Nevertheless, they appear to be good complements to each other, allowing researchers to gauge both the rhetorical forms and the emotional contents of the discussion. When coupled with a topic analysis technique such as LDA, it is possible to infer semantics as well. When combined, these three elements allow invaluable insight into the collective mind, as much as it can be seen reflected as through a scanner, darkly; that is, reflected through the imperfect and systematically distorted lens of social media.

Conclusions
During the course of this study, we were interested in investigating a series of research questions. We can now sketch some answers: (1) approximately 42% of all tweets sampled have been identified as being at least to some extent part of unhealthy conversations (Figure 3); (2) this proportion varies widely over time (between 30 and 50%), changing in response to significant events, and the attribute-wise distribution is not constant, as controversial topics tend to lead to spikes in toxic (antagonize, dismissive, or hostile) comments, which rapidly decline (see Figure 3); (3) unhealthy conversations do, in fact, have specific markers we can identify in text data, such as the lexical markers found in Table 3, suggesting that there are specific pragmatics of antagonism at play in Twitter discourse; (4) there are specific emotional patterns that correlate with specific attributes of unhealthy conversations (see Figure 5), but there is no simple relationship between any of the unhealthy attributes and the emotions in the NRCLex model (see Figure 4), suggesting that they represent distinct aspects of the texts; (5) we were indeed able to identify some topics of concern relevant to polarized discourse (Tables 2 and 3, Figure 6).

We have gathered Twitter posts related to the 6 January 2021 Capitol riot in the context of the 2020 US Presidential elections and detected unhealthy attributes of posts using an ML approach. We compared our results with those obtained from the NRCLex emotion lexicon. We followed the trend of each attribute in relation to the events and revealed latent topics in the data using LDA.
We found that a substantial proportion of relevant tweets are part of unhealthy conversations, but that they are aggregated along a relatively small number of topics. We also find that unhealthy conversations have specific markers for specific attributes that we can identify in text data, and that we can point out patterns in how unhealthy conversations promote polarization or the perception of polarization, especially in the subtler layers of online toxicity, but that traditionally toxic conversations are generally not informative with respect to political polarization. We also note that the proportion of unhealthy tweets increases in response to significant events, eventually decaying as conversations turn toxic and engagement declines; at the same time, we observe that, overall, unhealthy conversations make up a relatively stable proportion of tweets over time, suggesting that politically polarized discourse is a particular case of unhealthy online discourse.
We also observe that the attributes of unhealthy conversations do not correlate with specific emotions as revealed by the NRCLex emotion lexicon, but that there are some distinct emotional profiles for each attribute that tend to support our earlier observations about them clustering together: antagonize, hostile, condescending and dismissive cluster together; generalization and unfair generalization are very highly correlated and should be merged in future studies, and are separate from the other attributes, while being highly informative to how social media users perceive the 'other side'/outgroup; and finally, sarcasm is a distinct axis of conversation separate from other forms of unhealthy conversations.
A possible weakness of our interpretation of the results is that, by necessity given the large volume of data and lack of underlying demographic information, we assume homogeneity and symmetry across political divides, but this was clearly not the case during the 2020 election, when unsubstantiated and sensational arguments tended to be presented in the mass media much more by certain sections of pro-Republican and specifically pro-Trump voices (#StopTheSteal, 2021).However, as noted above, in online conversations the proportions of stereotypical generalizations of the outgroup do appear to be consistent regardless of who is speaking or who is seen as the 'other side'.
From a theoretical perspective, these findings are evidence for the hypothesis that social media represents a distorted view of social life, as the level of political discourse appears to be limited to a specific partisan mode of communication, where labelling the 'other side' and identifying one's 'own side' is more important than evidence-based argument or good-faith engagement.From a methodological perspective, we have outlined a way to assess the topics of interest in a specific field of study, in our case the 2020 US Presidential Election, and gained insight into what topics are most widely discussed in an unhealthy way on social media.These insights can be potentially used to dispel the aura surrounding these topics, as social media actors can highlight the established facts and combat their hijacking by self-serving, politically-motivated forces.By drawing attention to these controversial topics, we believe information campaigns can be built to address them and potentially defuse their inflammatory potential, yet we are skeptical about the role social media can play in this, especially given its potential to inflate already-existing partisan biases.
Finally, at the practical level, we have built a classification pipeline able to perform these analyses and documented our results, which can help social media stakeholders and government regulators understand the magnitude of the problem of political polarization and gain better insight into the topics discussed by agents involved in politics-related unhealthy conversations online, an important first step towards addressing it via internal social media reform and/or legislative action. Similarly, they can provide guidance to the general public by highlighting certain linguistic characteristics of the confrontational inter-group dynamics leading to unproductive and unhealthy discussions online, involvement in which can be avoided or minimized by paying attention to language use, especially the presence of labelling and stereotyping.
Future research on the phenomenon of polarization using unhealthy comments should focus on two general directions.Firstly, markers of polarization can be detected best in the middle layers of unhealthy attributes (dismissive, condescending, generalization), and research can focus on studying the prevalence and nature of accurate or misperceived polarization in these layers.In addition, more data labelled with these attributes is needed.The other direction concerns studying the role of sarcasm and irony in informing unhealthy conversations, and how it relates to online political polarization.

Figure 1. The study methodology we used.

Figure 2. ROC curves and ROC-AUC scores for each attribute, for the XLNet model we trained.

Figure 3. Tweet trends by unhealthy attribute cluster and by the proportion of unhealthy tweets relative to the total number of tweets.

Figure 5. Radar plots showing the proportion of tweets exhibiting each sentiment, for each sub-attribute.

Figure 6. Network formed of LDA topics (nodes) linked by cosine similarity scores (edges). Only scores ≥ 0.5 have been included.

Table 1. Mean ROC-AUC score of the classifiers.

Table 2. Most relevant topics.