The online hostility hypothesis: representations of Muslims in online media

ABSTRACT Using a large data set of online media content in eight European countries, this paper broadens the empirical investigation of the online hostility hypothesis, which posits that interactions on social sites such as blogs and forums contain more hostile expressions toward minority groups than social interactions offline or in editorial news media. Overall, our results are consistent with the online hostility hypothesis when comparing news media content with social sites, but we find that negatively charged representations are common in both media types. It is instead the amount of attention to Muslims and Islam on social sites that most clearly differs and is the main driver of online hostility in the online media environment more broadly conceived.


Introduction
Representations of Muslims and Islam in Western news media have long been recognized for their remarkably negative tone (Ahmed & Matthes, 2017; Bleich & van der Veen, 2022; Dixon & Williams, 2015). In the digital age, with the proliferation of online media, concerns arise about the potential exacerbation of negative depictions, particularly on platforms that lack traditional journalistic filters and editorial control. The online environment, with its relative anonymity and absence of face-to-face communication (Brown, 2018; Duffy et al., 2005), could give rise to even more negative or hostile portrayals of minority groups. While this phenomenon has been empirically supported in case studies of specific online forums and blogs (e.g., Miller, 2017; Törnberg & Törnberg, 2016), there remains a dearth of research on representations of minority groups in the broader online landscape beyond platforms known for their hostility.
This study addresses these knowledge gaps by examining the representations of Muslims and Islam across a vast corpus of online data encompassing eight European countries where Muslims are a significant and growing minority. Existing studies have primarily focused on specific contexts or the anglosphere, leaving a considerable void in cross-national coverage analysis (Bleich & van der Veen, 2022, p. 9). Our research seeks to bridge this gap by examining online hostility toward Muslims in established European democracies.
Conceptually, we emphasize a crucial distinction in the forms and prevalence of negativity in the representation of Muslims when exploring the online hostility hypothesis. We posit that the online hostility hypothesis can manifest through either or both of two empirical patterns. First, it could be reflected in more hostile forms of representation on social sites. Second, there may be a disparity in the amount of attention paid to Muslims on social sites compared to news media. If negativity is pervasive in both realms, differences in attention levels may influence the perception of a more hostile media environment on social sites. Hence, we pose two key questions: (1) Are representations of Muslims and Islam more hostile on social sites? (2) To what extent does content mentioning Muslims or Islam occur more frequently on social sites compared to news media?
In this study, we analyze a big data corpus comprising 41 billion words from various sources, including editorial news media and user-generated content on social sites such as blogs and forums. The dataset covers eight strategically selected European democracies where Muslims play an important role and are confronted with a growing presence of radical right populist parties: Norway, Sweden, Denmark, the Netherlands, Germany, France, Spain, and the UK.
Methodologically, we employ recent advances in computational linguistics and applied machine learning, more specifically, word embedding models. This approach allows us to move beyond the identification of explicit forms of hostile expressions and examine more implicit manifestations of hostility (Bolukbasi et al., 2016; Garg et al., 2018). Unlike studies that rely on word co-occurrence counts, our approach captures the meaning of words based on their semantic context within the corpus. Furthermore, it overcomes language barriers, allowing for cross-lingual comparisons and offering a broader understanding of media coverage of Muslims and Islam across different European languages.
In sum, this study aims to contribute substantially to the understanding of representations of Muslims and Islam in the ever-evolving digital media landscape. By examining both the prevalence of negativity and the attention Muslim communities receive in the online sphere, we aim to shed light on the role of online media in shaping public discourse and perceptions across European democracies. As online communication continues to shape societal attitudes and behaviors, insights from this research have critical implications for promoting inclusivity, countering stereotypes, and fostering constructive dialogue among diverse communities.

The online hostility hypothesis
Online media have brought significant changes to the way we create, distribute, and consume information. While traditional news media still play a central role in these processes, newer forms of online media such as blogs, forums and other user-generated content platforms have emerged without any editorial function or content moderation. Consequently, curbing online harassment, hostility, hate speech, and derogatory speech has become increasingly challenging (Sandberg & Segesten, 2022). In this environment, hostile representations have become a growing concern, especially with regard to ethnic or religious minority groups, as they are frequently targeted and marginalized in many online environments (Bilewicz & Soral, 2020; Soral et al., 2020).
Scholarly work has highlighted how the tone and style of online interactions often tend to be harsher, more hostile, and less civil than in other settings (Sandberg & Segesten, 2022; Sydnor, 2018). A key point of divergence within the research field is whether this pattern is primarily driven by an increase in hostile behaviors and the intensity of such expressions or whether they simply become more observable and traceable. Proponents of the causality thesis argue that the absence of mass media gatekeeping processes and control over information makes it easier to express hostility toward out-groups. A lower risk of legal and social penalties can, in addition, create an environment where the social repercussions of such speech are less severe, i.e., the social pressure to control prejudice weakens (Walker et al., 2015). The abundance of information available online and the resulting struggle for attention could also drive content to become more extreme (Brady et al., 2017).
Online hostility is a concept covering a wide range of adverse expressions, including incivility, intolerance, cyberbullying, hate speech, trolling, and flaming (Sandberg & Segesten, 2022). These subcategories, in turn, vary greatly in their usage. For instance, flaming, a term that was popular in the past (Jane, 2015), is now rarely used, whereas incivility has become a more commonly used term, albeit one without an agreed-upon definition. Some have defined incivility loosely as 'the set of behaviors that threaten democracy, deny people their personal freedoms, and stereotype social groups' (Papacharissi, 2004, p. 267). Others, such as Rossini (2022), make a distinction between incivility and intolerance, viewing intolerance as an expression of negative stereotypes of groups based on race, sex, gender, or religion. Online hostility, in contrast, is a broader concept that encompasses a range of various negative expressions.
Overall, the drivers of hostility in the online public sphere are still a live research question. Conventional wisdom has it that anonymity and lack of face-to-face interactions make users less prone to act in accordance with social norms. Online interactions might be perceived as angrier and less civil than exchanges in offline settings due to a psychological mismatch between the impersonal online environment and human adaptations to face-to-face interaction (Sydnor, 2018). Recent studies, however, have questioned the alleged role of anonymity as a driver of online hostility (Bor & Petersen, 2022; Rossini, 2022). They instead argue that the 'hostility gap' can be explained by the public nature of online discussions and their increased visibility, which expose people to more hostile attacks directed against strangers than otherwise would be noticeable to them (Bor & Petersen, 2022). The motivation for sharing hostile content has further been associated with discontent rather than a genuine belief in the content, which has been referred to as a 'need for chaos' (Petersen et al., 2018). Thus, sharing hostile content about Muslims might be more strongly motivated by political and strategic considerations and less of a reflection of sincere beliefs. To sum up, the causes of online hostility are likely multiple, and we currently do not know exactly which factors weigh more heavily on the observed patterns. However, they jointly lead to a strong expectation of increased hostility in social forms of online media.

Representations of Muslims and Islam
The media plays an essential role in shaping the public sphere and public perceptions of minority groups, as it determines both visibility in the media, or lack thereof, as well as how religious or ethnic minority groups are portrayed in the coverage. In societies where the majority population has limited contact with Muslims, it is problematic, from a democratic representation perspective, when Muslims are persistently portrayed in a negative light or associated with threats and security risks. Such portrayals are likely to influence how this group is viewed by society at large, as well as how the group perceives its own place within that society (Bleich & van der Veen, 2022). One-sided negative reporting can therefore have a detrimental impact on the social cohesion of the wider community, particularly when it leads to stigmatization, discrimination, or prejudice toward Muslims (e.g., Saleem & Ramasubramanian, 2019; Saleem et al., 2017). In the first truly large-scale analysis of newspaper coverage of Muslims, conducted recently, Bleich and van der Veen (2022) examine the tone of coverage in well-established traditional news media, mainly in the U.S. They found that coverage of Muslims is remarkably negative in tone by any measure and much more negative than the coverage of other groups. This pattern was consistent over time, with certain predictable changes occurring after the 9/11 terrorist attacks in 2001. Even if the national newspapers they examine, such as the New York Times and the Washington Post, may not set out to stigmatize Muslims, it is nevertheless the case that the tone in their aggregated coverage of Muslims stands out as especially negative (Bleich & van der Veen, 2022).
Not all the consequences of this negatively charged coverage are known. But we do know that negative news has a disproportionate influence on cognition and attitudes (Kroon et al., 2021; Soroka & McAdams, 2015; Soroka et al., 2019). Thus, media coverage matters: 'What the media communicate both reflects and reinforces perceptions of Muslims that shape social boundaries in our societies' (Bleich & van der Veen, 2022, p. 4).
It has been suggested that more distinctive, group-specific constructs should be considered when studying the dynamics of prejudice toward Muslims as a minority group in Western countries where Muslims are perceived to represent a challenge to the modern Christian world (Oskooii et al., 2021). In that vein, Western societies view Muslims not only as a religious minority but also as a cultural minority (Kalkan et al., 2009). Representations linked to social identity in addition to religious traits could therefore be particularly relevant for Muslims, not least in relation to being perceived as a threat to the social identity of the majority population. At the same time, it is important to highlight that prejudiced views of Muslims should not be confounded with a legitimate critique of Muslim practices based on secular grounds (Imhoff & Recker, 2012; Tartaglia et al., 2019). When examining representations of Muslims, negativity could stem from either secular critique of Islam or prejudices building on negative stereotypes; jointly, however, they might lead to essentially negative representations of a vulnerable group in society.
In the coverage of Muslims, patterns of representation that are not intentionally negative or explicitly hostile are also of importance. It is therefore of interest to examine more broadly the nature of representations of Muslims in large media corpora. Next, we turn our focus to how these representations might take form outside the journalistic newsroom, in the broader online environment.

Online social sites
Social sites such as the blogosphere, social news, forums, and message boards represent a highly diverse and heterogeneous environment online. Previous studies have mainly been case studies focused on specific sites and forums where hostile content is expected. This limited scope has therefore left us uncertain about the extent and nature of depictions of Muslims in the broader online environment. Hostile expressions, in general, have been observed to gain traction on social sites. Studies of the portrayal of Muslims and Islam on specific sites have shown that discussions often revolve around the perceived political threat from Islam, with Muslims being depicted as an outgroup embroiled in conflict, violence, and extremism (Awan, 2014; Miller, 2017; Törnberg & Törnberg, 2016). Additionally, Islam is frequently portrayed as a religion oppressing women and homosexuals, and more extreme associations in social forums have been made between Islam, pedophilia, and child marriages (Törnberg & Törnberg, 2016). Taken together, these studies reveal a 'toxic' online environment where hostility toward Muslims is prevalent and reinforced. Hostility toward Muslims in online discussions also tends to increase after Islamist terror attacks (Álvarez-Benjumea & Winter, 2020), indicating the responsive nature of online discussions to real-time events. In such instances, particularly pernicious negative stereotypes about Muslims as potential terrorists and violent extremists are reinforced (Sides & Gross, 2013).
Although limited in their scope, the existing studies on individual social sites suggest a widespread negative bias. We can therefore formulate a general expectation that negative outgroup stereotypes pertaining to Muslims are, at the very least, reproduced on social sites. We might also expect to see a potentially stronger negative bias or other types of negative portrayals than in news media, as these two media types are naturally used for different kinds of communication.
Turning our focus to a potential attention gap that, together with the prevalence of negativity, would support the online hostility hypothesis, we now consider how social sites online can be expected to drive attention to Muslims in Western countries. Previous research suggests that internet forums 'serve as an online amplifier that reflects and reinforces existing discourses in traditional media' of Muslims and Islam (Törnberg & Törnberg, 2016, p. 132). Hate speech directed at Muslims and anti-Muslim sentiment are not only explained by users' motivations but also by a platform's policy and the specific opportunities that arise for these types of expressions on a platform (Ben-David & Matamoros Fernández, 2016). Factors such as the absence of content moderation can therefore be expected to lead to more extreme expressions. The surges in hateful and xenophobic content online after terrorist attacks have, for example, been found to be highly dependent on which social norms are prevalent in online forums (Álvarez-Benjumea & Winter, 2020). Platforms where the level of monitoring is low or editorial function is absent are thus at higher risk of reinforcing prejudice. If others voice xenophobic opinions in online environments, subsequent comments also tend to become more extreme (Álvarez-Benjumea & Winter, 2020). For many online social sites, we can assume that attention to Muslims as a minority group is amplified. This is especially true if the news media is perceived to be neglecting or toning down issues related to Muslim integration. A need to bring such issues to light might then develop, while controversial aspects at the same time often drive attention online (positive or negative).
It should also be pointed out that exposure to hostile sentiments may have additional effects in the broader online environment compared with traditional news media. In the absence of strong social norms online, negative stereotypes or hostile sentiments might be reinforced in a downward spiral. This kind of debate climate may also risk muting moderate voices and lead to (self-) exclusion of minority groups (Alkazemi, 2015; Bor & Petersen, 2022; Slater, 2007). Moreover, if certain expressions and attitudes are more visible than others, it might lead to the assumption that these views are more widespread than they are. It is therefore central to shed light on the potential differences in how Muslims as a minority group are portrayed across media types and to what extent the amount of attention differs. So even in instances where the two types of media are equally negative (or display variation in the type of negative context), the mere share of attention on social sites is important because it might lead to perceptions that some views are more widespread than they de facto are.
Based on our comprehensive review of the literature concerning online hostility toward Muslims in Europe, we have formulated a two-dimensional concept to better understand the 'online hostility hypothesis'. The first dimension focuses on the intensity or nature of expressed hostility, while the second dimension examines the amount of negative attention. By assessing the evidence in our study against these conditions, we aim to determine the level of support for the online hostility hypothesis in relation to Muslim representations.
To accomplish this, we have chosen to concentrate on eight European democracies, namely Norway, Sweden, Denmark, the Netherlands, Germany, France, Spain, and the UK. These countries were selected based on their shared experience of considerable growth within their Muslim minority populations, as well as the presence of radical right populist parties (RRPs) represented in their parliaments. The presence of RRPs in the political landscape indicates that the issue of migration is politicized and potentially extensively discussed in both editorial news media and unmoderated social sites.
Utilizing online text data from these eight countries provides us with a robust and substantial dataset for conducting our empirical analysis.

Data and methods
For the analysis we have utilized a distributional semantic online lexicon developed by the Linguistic Explorations of Societies (LES) project (Dahlberg et al., 2023). The lexicon allows researchers to explore meanings and usages of words in online media across a substantial number of languages.¹ The LES distributional semantic online lexicon contains a massive number of news articles, blog posts, and forum posts that are divided into 'news' media and 'social' content. Along with the text content, each document contains metadata, including time stamps, source country, and source language. Overall, the lexicon covers data from 45 languages and 102 countries, and in combination, it covers 141 language-country combinations (for a detailed overview, see Dahlberg et al., 2023). Of these available data, we chose to focus on established democracies in Western Europe that have experienced considerable growth within their Muslim minority populations and in which questions related to Muslim minorities have been central to political campaigns and mobilization in the period of data collection, as indicated by the presence of radical right populist parties. Further narrowing our scope, we have, in addition to the above-mentioned criteria, chosen to include only those countries where a sufficient amount of data exists for the analysis and where the authors have the basic knowledge of the languages required to conduct sensible analyses and interpretations. This left us with a sample of eight countries: Norway, Sweden, Denmark, the Netherlands, Germany, France, Spain, and the UK. Within these countries, there are important variations that might influence our measurements, such as media system, journalistic conventions, and migration restrictiveness. In addition, we cannot fully account for the possible variations when it comes to media sites included or excluded in the corpus, as will be further discussed below. Our results should be interpreted with these possible caveats in mind.
In the lexicon, data is provided by the Swedish language technology company Gavagai. The lexicon is continuously updated with daily batches of structured online text data in the JSON format; each JSON file contains several documents.² It should be mentioned that text data has become a profitable commodity and acquiring data from large technology companies is difficult and very costly, even for larger data distributors. Consequently, the LES lexicon only contains sources open to the public and free of charge. Web articles placed behind paywalls are omitted from the sample, as are data from closed platforms, such as Facebook, Instagram or Twitter (Dahlberg et al., 2023). Controlling the content of the different sources in the text database is practically impossible. Dahlberg et al. (2023) have, however, provided a list of the top ten URLs for each language, country, and media type available in the lexicon (see the table in Appendix 1). The list ranks in descending order all of the stem URLs sampled in the models according to the number of web documents originating from each particular URL. In spite of the large number of documents, it is important to remember that they do not represent the entire text corpus for a given language-country combination. Rather, it is a large, although not random, sample of the largest sources from each language-country combination.
The distinction between social sites and editorial media in our study is justified for several compelling reasons. The classification decision was made independently by technicians at the data provider, ensuring objectivity and consistency in the analysis. This pragmatic approach avoids potential biases that could arise if researchers subjectively classified the data based on their research interests.
News media, including traditional newspapers, are recognized as established editorial sources with content produced through professional journalistic practices. While some news articles may feature reader-generated content in comment sections, the primary content is authored by professionals and the comment sections are strictly moderated. Consequently, the vast majority of text data from these sites is expected to be editorial in nature, making it highly relevant for examining news media representations.
On the other hand, social sites comprise a diverse range of online platforms, such as forums and blogs, where communities of users actively generate and disseminate content. These platforms offer more participatory and decentralized spaces for discussions and exchanges. The data from social sites is open and accessible to a broader audience, which allows for greater anonymity and potentially more candid expressions. This characteristic is particularly important for exploring the online hostility hypothesis, as more anonymous spaces may facilitate more pronounced expressions of online hostility.
A drawback in our analysis is the fact that we do not have access to closed platforms like Facebook, Twitter, or Instagram. At the same time, these kinds of media platforms often require user verification, reducing the level of anonymity. In contrast, open web forums, categorized as social sites in our study, offer a higher degree of anonymity, enabling users to maintain their privacy through the use of non-verified e-mail addresses. This feature may contribute to more candid expressions and a potentially heightened level of online hostility.
Additionally, the division between news media and social sites aligns with the evolving nature of online communication. Social platforms are known for their participatory nature, where user-generated content and interactions play a significant role. By distinguishing these participatory spaces from news media, we can focus on distinct aspects of online communication and identify patterns of semantic similarity in the representations of Muslims. The division in our study is thus both practical and methodologically reasonable.
From a methodological point of view, the lexicon builds on word vectors or word embedding models. These models are machine-learning methods used in natural language processing and distributional semantics in which each word is represented by a vector that captures the semantic relations between terms. According to the distributional hypothesis, the degree of semantic similarity between two linguistic terms depends on how similar the linguistic contexts in which they appear are (Lenci, 2008, p. 3). In this respect, it is not the co-occurrence of words per se that is of interest but rather the co-occurrence with the same other words. That is, it is the shared context that matters, as two words that co-occur with the same cluster of other words are considered semantically similar (Garg et al., 2018). Few social science studies have yet utilized these recent advances in natural language processing (Rheault & Cochrane, 2020), but they show clear advantages over more traditional lexicon-based approaches. Crucially, these models have proven to be a useful tool for studying subtle aspects of out-group hostility in mass-mediated content (Kroon et al., 2021).
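The point that shared context, rather than direct co-occurrence, drives semantic similarity can be illustrated with a minimal sketch. It uses raw context counts on an invented toy corpus instead of trained embeddings, so it is a simplification and not the LES implementation; the function names `context_counts` and `cosine` and the example sentences are assumptions made for illustration only.

```python
import math
from collections import Counter


def context_counts(sentences, target, window=2):
    """Count the words that appear within `window` tokens of `target`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i, token in enumerate(tokens):
            if token != target:
                continue
            start = max(0, i - window)
            stop = min(len(tokens), i + window + 1)
            for j in range(start, stop):
                if j != i:
                    counts[tokens[j]] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    shared = a.keys() & b.keys()
    dot = sum(a[w] * b[w] for w in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Invented toy corpus: "doctor" and "nurse" never occur in the same sentence,
# but they occur with the same neighbouring words.
toy_corpus = [
    "doctor treated patient",
    "nurse treated patient",
    "doctor examined child",
    "nurse examined child",
    "river flooded valley",
]

doctor = context_counts(toy_corpus, "doctor")
nurse = context_counts(toy_corpus, "nurse")
river = context_counts(toy_corpus, "river")

doctor_nurse = cosine(doctor, nurse)  # identical contexts
doctor_river = cosine(doctor, river)  # no shared contexts
print(doctor_nurse, doctor_river)
```

In this toy case the two words with identical contexts receive the maximum similarity even though they never appear together, while the word with no shared context receives a similarity of zero.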
The LES distributional semantic online lexicon is an implementation of a specific type of word2vec model, Continuous Bag of Words (CBOW), which was developed by Mikolov et al. (2013). The selection of this model is motivated by its particularly suitable properties, including effective computational implementation and high-quality vectors. The CBOW model assigns context and target vectors to words via an initial randomization of vectors, which makes it computationally efficient. To make the semantic similarity of different words comparable, the vectors are subsequently updated and trained to satisfy the distributional hypothesis. More specifically, by gathering and processing numerous samples of text data, the CBOW model updates the vectors. To determine a score for the target word and its context, the components of the single target and context vectors are multiplied and summed (Dahlberg et al., 2023). There are, however, some drawbacks associated with word2vec models. Antoniak and Mimno (2018) used a smaller corpus to demonstrate that different word embedding methods yield unstable results. In the case of small corpora, it is unclear whether this result reflects corpus characteristics or language constants. This variance is, however, significantly lower for larger corpus sizes (Pierrejean & Tanguy, 2018). Given that the median corpus size of all models in the LES lexicon (~140,000,000 words for news media data and ~110,000,000 for social data) is considerably greater than the largest corpus used by Antoniak and Mimno, one can expect less variability of this type.
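The scoring step of a CBOW-style model can be sketched as follows. This is a simplified illustration under stated assumptions and not the LES or Gavagai implementation: it uses a full softmax over a tiny vocabulary, whereas practical word2vec training typically relies on approximations such as negative sampling, and it omits the training updates entirely. The two-dimensional vectors, the example words, and the function name `cbow_probabilities` are all hypothetical.

```python
import math


def average(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def dot(u, v):
    """Multiply the components pairwise and sum them."""
    return sum(a * b for a, b in zip(u, v))


def cbow_probabilities(context_words, context_vecs, target_vecs):
    """Score every candidate target word against the averaged context."""
    hidden = average([context_vecs[w] for w in context_words])
    scores = {w: dot(hidden, vec) for w, vec in target_vecs.items()}
    # Softmax, shifted by the maximum score for numerical stability.
    max_score = max(scores.values())
    exps = {w: math.exp(s - max_score) for w, s in scores.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}


# Hypothetical two-dimensional vectors, chosen by hand for illustration.
context_vecs = {"strong": [1.0, 0.0], "coffee": [0.8, 0.2]}
target_vecs = {"espresso": [1.0, 0.1], "bicycle": [-1.0, 0.5]}

probs = cbow_probabilities(["strong", "coffee"], context_vecs, target_vecs)
print(probs)
```

During training, the vectors would be adjusted so that the probability of the word actually observed in each context increases; words that appear in similar contexts thereby end up with similar vectors.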
Our analytical strategy builds on using the word embeddings themselves as a measure and the output from the word2vec model in terms of similarity scores for neighboring terms. A cosine similarity higher than around 0.7 usually indicates a robust semantic similarity in terms of how two terms are used (Dahlberg et al., 2023). For our analysis of negatively loaded words, we observed that the top 30 most semantically similar terms to 'Muslims' all had a similarity score exceeding 0.7. We therefore chose to narrow our analysis of negative terms to this subset of the top 30 across countries and then proceeded to qualitatively interpret and categorize these terms based on their negative connotations. In the section of our analysis where we examine several negative terms across media types, the variation in the similarity scores between the same terms is of importance. For this part of our analysis, therefore, we included the top 100 most semantically similar terms.
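The retrieval of nearest neighbors by cosine similarity, combined with the 0.7 threshold, can be sketched as below. The three-dimensional vectors and the placeholder words are invented for illustration, and `nearest_neighbors` is a hypothetical helper, not a function of the LES lexicon.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_neighbors(target, embeddings, k=30, threshold=0.7):
    """Return up to k most similar terms whose similarity exceeds threshold."""
    similarities = [
        (word, cosine_similarity(embeddings[target], vector))
        for word, vector in embeddings.items()
        if word != target
    ]
    similarities.sort(key=lambda pair: pair[1], reverse=True)
    return [(word, sim) for word, sim in similarities[:k] if sim > threshold]


# Invented placeholder vectors; real embeddings have many more dimensions.
toy_embeddings = {
    "target": [1.0, 0.0, 0.0],
    "near_a": [0.9, 0.1, 0.0],
    "near_b": [0.7, 0.7, 0.0],
    "far_c": [0.0, 1.0, 0.0],
}

neighbors = nearest_neighbors("target", toy_embeddings, k=30, threshold=0.7)
neighbor_words = [word for word, _ in neighbors]
print(neighbor_words)
```

Setting `k=100` and lowering or removing the threshold would correspond to the wider neighbor list used when similarity scores for the same terms are compared across media types.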

Representations of Muslims in online social and news media
In the first part of the analysis, we focused on examining evidence of a difference in the tone of Muslim representation between the two media types. First, we explored whether 'Muslims' tended to be associated with a larger set of negatively charged terms in social content. Next, we examined the intensity of negatively charged words, that is, whether they were more strongly associated with 'Muslims' in social content than in news media content.

Negatively charged words
In the first test, which focused on whether representations of Muslims on social sites are characterized by a larger variety of negative associations, we examined the 30 words with the highest cosine similarity scores for each of the two media types. We qualitatively interpreted which terms were 'negatively loaded'; these often appeared as 'extremists', 'terrorists' or 'Islamists', while other terms were neutral, such as nationalities or general religious terms such as 'priests' or 'imam' (see Appendix 2). Cosine similarity indicates the level of semantic similarity between two terms as represented by the vectors. This reflects the semantic context in which the terms occur and should not be confused with document context. Thus, it is important to keep in mind when interpreting the results that any negative term reflects a semantic context rather than a co-occurrence of words, which is why the terms are not necessarily used directly to describe Muslims but are used in a similar way.
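The counting step of this first test can be sketched as follows. The neighbor lists below are short, made-up examples and not the study's results; the coded set reuses only the example terms named above, and `count_negative` is a hypothetical helper.

```python
def count_negative(neighbors, negative_terms, k=30):
    """Count how many of the top-k neighbor terms were coded as negative."""
    top_k = neighbors[:k]
    return sum(1 for term in top_k if term.lower() in negative_terms)


# Manually coded negative terms (the qualitative step), stored in lower case.
negative_terms = {"extremists", "terrorists", "islamists"}

# Made-up neighbor lists ordered by similarity; NOT the paper's data.
toy_neighbors = {
    "news": ["imam", "priests", "extremists"],
    "social": ["extremists", "terrorists", "imam"],
}

counts = {
    media: count_negative(terms, negative_terms)
    for media, terms in toy_neighbors.items()
}
difference = counts["social"] - counts["news"]
print(counts, difference)
```

A positive difference would point in the direction of the online hostility hypothesis for that country; whether such a difference is statistically meaningful is a separate question that this sketch does not address.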
Figure 1 shows that to the extent that there is a systematic difference between social and news content, it is in the direction of the online hostility hypothesis. In most of the countries, there were slightly more negative semantically similar words to 'Muslims' in social forums compared to news media (see Figure 1). In the Nordic countries and in the UK, there were more negatively loaded words on social forums, with Sweden displaying the largest difference between media types. Sweden is also the country with the largest number of negatively loaded words in social content (7 out of 30), followed by Denmark (6 out of 30), and Norway and the UK (5 out of 30). Nevertheless, negatively loaded words were more prominent in social content in only five of the eight examined countries. Additionally, the difference was statistically significant only in Sweden.³ Across both media types, there emerged a notable coherence among negatively loaded terms such as 'Islamists,' 'extremists,' 'Nazis,' 'terrorists,' and 'jihadists'. This pattern signifies a discursive trend related to extremism and terrorism, wherein these terms shared analogous semantic contexts with the term 'Muslims.' This consistent association was observed across nearly all countries and within both media types. However, it is worth noting that Spain diverged from this prevailing trend.
In countries where fewer negatively loaded words appeared in news media (Denmark, Sweden, Norway, UK, and Germany), we found a clear pattern of a linguistic context relating to terrorism and extremism, but no other negatively charged words appeared in the news content. Taken together, the following words appeared in news media content in these countries: 'nazis,' 'extremists,' 'terrorists,' 'jihadists,' 'militants,' 'right-wing extremists,' 'fundamentalists,' and 'mørkemaend' [roughly religious fundamentalists].
In social content, we found additional negatively loaded words with high similarity scores, such as 'negroes' in Sweden and Germany,⁴ 'barbarians' in the Netherlands, 'invaders' in Germany and Spain,⁵ 'communists' in Denmark and Spain, and 'fascists' in the UK (see Appendix 2). This suggests a larger repository of negatively loaded terms in the semantic context pertaining to social content, although the differences across media types were not consistently present across all contexts.
Among the top 30 terms with the highest linguistic similarity, we also noticed a clear tendency across countries toward fewer religious words related to 'Muslims' on social sites. Instead, we noted connections to terms about minority groups, immigration or national identity in general, such as the Roma, Yazidis, or Palestinians. Other social identity-related terms, such as 'women,' 'feminists,' 'homosexuals,' and 'transgender,' were also found, as well as 'oppression'. Apart from minority groups and immigration-related terms, these other identity-based markers appeared foremost in the Nordic countries and the Netherlands. This could indicate a stronger discourse in these countries around the perceived conflict between gender and sexual rights and Islam.
To sum up, we find that the most prevalent negative associations in news media are centered around the linkages to terrorism and extremism, whereas additional negatively charged terms emerged in the social forum data in a majority of the countries but not in all (e.g., barbarians, invaders, and negroes). The first part of this analysis of tone thus lends some support to the online hostility hypothesis when it comes to a larger span of negative sentiment. That noted, the most prominent negative association, between Muslims and terrorism, is present in both media types and not specific to social forums. For the next part of the analysis, we examined whether these negative associations were stronger in social content.

Cosine similarity
In the second step of our analysis, we took full advantage of the cosine similarity metric to examine to what extent the similarity score differed between sources for a chosen negatively charged term (the higher the similarity score, the more similar the use of the two terms). A similarity score that is close to 1 indicates that two words are used more or less synonymously. We examined how this score for the most common negatively charged representation of Muslims (the association with terrorism and extremism) differed across media types. As these associations were found in all countries and in both social and news content (among the top 100 neighboring terms with the highest similarity score), we were able to compare the similarity scores of an identical set of corresponding words across media types. In previous studies, the association with terrorism and extremism has been found in both news media and on various internet forums. However, according to the hostility hypothesis, this link could be stronger, i.e., have a higher cosine similarity score, on social sites than in the news media.
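The comparison described above can be sketched as follows. This is a hedged illustration only: it assumes one embedding model per media type, and the names `similarity_gap`, `social`, `news`, and the placeholder terms with their two-dimensional vectors are hypothetical and not taken from the study.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_gap(target, terms, social, news):
    """Map each term to (social score, news score, social minus news)."""
    rows = {}
    for term in terms:
        # Each score is computed inside its own model, never across models.
        social_score = cosine_similarity(social[target], social[term])
        news_score = cosine_similarity(news[target], news[term])
        rows[term] = (social_score, news_score, social_score - news_score)
    return rows


# Hypothetical toy models; labels and vectors are placeholders.
social = {"target": [1.0, 0.0], "term_a": [0.9, 0.1], "term_b": [0.6, 0.8]}
news = {"target": [1.0, 0.0], "term_a": [0.7, 0.7], "term_b": [0.6, 0.8]}

# Only terms present in both models can be compared.
shared_terms = sorted((social.keys() & news.keys()) - {"target"})
gaps = similarity_gap("target", shared_terms, social, news)
for term, (s, n, diff) in gaps.items():
    print(f"{term}: social={s:.3f} news={n:.3f} gap={diff:+.3f}")
```

A positive gap for a term means that the term sits closer to the target word in the social model than in the news model. Restricting the comparison to terms found in both models mirrors the use of an identical set of words across media types.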
Figure 2 shows that nearly all the words related to terrorism/extremism had higher similarity scores in social content, overall supporting the online hostility hypothesis, although the cosine similarity scores are high in both media types. This finding generalizes across all countries, but there are some notable country differences that could be further explored in future work. In Spain, the similarity scores were the closest between media types, although for only two phrases: 'fundamentalists' and 'extremists.' We found the biggest difference between media types in Sweden, where the cosine similarity scores for social content were high (all phrases scored between 0.83 and 0.88) and news media had among the lowest scores. Compared to Sweden, Denmark and Norway both showed a smaller gap between media types but otherwise displayed a similar pattern: a tendency toward more hostile portrayals on social sites. However, the differences in similarity scores between media types were significant only for Sweden.⁶ To sum up, Figure 2 also provides some, albeit modest, evidence that is consistent with the online hostility hypothesis. The main finding is that the semantic similarity between Muslims and terrorism and extremism is quite high in both media types.

The attention gap
To examine the second part of the online hostility hypothesis, the potential attention gap, which holds that Muslims and Islam are discussed more often on social sites than in news media, we compared differences in attention between media types across countries. Both the amount of textual data and the total term count varied by country, depending upon the number of unique words and the size of the textual corpora. As an indicator of the amount of attention, we therefore calculated the relative term frequency per thousand of the terms 'Muslim', 'Muslims' and 'Islam' in each country for both social sites and news media sources (the count of the translated terms divided by the total term count, multiplied by 1,000).
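As a worked illustration of this calculation, the sketch below applies the formula to the Danish corpus sizes and term counts reported in the note to Figure 3. The function and variable names are assumptions introduced here for illustration.

```python
def relative_frequency(term_count, corpus_size, per=1000):
    """Occurrences of the terms per `per` words of the corpus."""
    return term_count / corpus_size * per


# Danish counts as reported in the note to Figure 3.
social_dk = relative_frequency(28_269, 287_515_778)
news_dk = relative_frequency(44_341, 306_396_404)

# Attention gap: positive when social sites devote a larger share
# of their content to the terms than news media do.
gap_dk = social_dk - news_dk

print(f"social: {social_dk:.4f}")  # 0.0983
print(f"news:   {news_dk:.4f}")    # 0.1447
print(f"gap:    {gap_dk:+.4f}")
```

Normalizing by corpus size is what makes the values comparable across countries and media types whose corpora differ in size. For Denmark the gap is negative, consistent with the country being an exception to the general pattern.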
Figure 3 shows the amount of attention directed toward Muslims and Islam in news media and social sites. The x-axis shows the relative frequency of mentions in the whole corpus (per thousand), and the countries are sorted in descending order by the level of attention in social forums.
The main takeaway from Figure 3 is considerable evidence of an attention gap, as 'Muslim/Muslims' and 'Islam' made up a larger share of the total content on social sites than in news media. This implies that the broader online media environment contributes to increased visibility of content about Muslims and Islam in Europe.⁷ There were some differences across countries. To start with, Sweden had a somewhat larger gap than Norway, and 'Muslim(s)/Islam' occurred more often in Norwegian news media than in Swedish news media. However, the extent to which these terms appeared on social sites was equivalent (the relative term frequency was about 0.11 in both countries). In the Netherlands, news media wrote even less about Muslims and Islam, although these topics occurred far more often in social content (just as in Sweden). In the UK, the relative share of mentions was much higher in social content, resulting in the largest gap despite mentions in news media not being exceptionally low (0.08). In France, there was a slightly smaller attention gap between social and news media sites, as the terms 'Muslim(s)/Islam' occurred relatively frequently in both. The relative share of mentions in Germany was similar to that of France, with only a slightly lower representation of Muslims in social content. In news media content, the lowest relative share was observed in the Netherlands, followed by Sweden and Spain.
It is generally common in all these countries, then, for Muslims and Islam to attract considerably more attention on social sites than in news media; however, Figure 3 also shows that there are two exceptions to this general pattern. Denmark stands out as the only country where Muslims and Islam made up a much larger share of news media content than of social content. The difference is mainly the result of a comparatively high share of mentions in Danish news media. This raises interesting hypotheses about peculiarities in the amount of coverage about Islam and Muslims in traditional news media in Denmark. Spain stands out as a country context with comparatively less content about Islam and Muslims in both media types.

Discussion and conclusion
In this study, we explored the online hostility hypothesis, which posits that user-generated content in online media might lead to an increase in hostile expressions, particularly affecting minority groups. By utilizing advanced word embedding models, we aimed to analyze not only explicit traces of online hostility but also the more implicit forms of biases in the representation of Muslims within the broader online environment.
An essential aspect of our research involved comparing data categorized into 'social' and 'news' media content. We acknowledge the inherent imperfections in this classification due to blurred boundaries between the two, a challenge that is common in studies focusing on online media. Nevertheless, this does not diminish the importance of our results, as our study still contributes valuable insights into the distinctions in online rhetoric between social sites and news media. By centering our analysis on social sites such as blogs and forums, we gained access to representations in more participatory and user-generated spaces, offering insights into how minority groups are discussed in online environments where users actively generate content.
We conceptualized the general online hostility hypothesis as a two-dimensional framework that distinguishes between whether (1) representations of Muslims on social sites take on more hostile forms and (2) the amount of negative attention given to Muslims is greater.
Regarding the first dimension, our results are mainly influenced by representations linked to extremism and terrorism in social forums, which (re)produce and reinforce existing representations in the news media, rather than by Muslim representations being considerably more hostile on social sites. In most contexts studied, social sites tended toward a larger repertoire of derogatory terms, possibly reflecting a larger presence of extremist content in social forums. Although these terms were not numerous, the fact that some quite extreme wordings, such as 'barbarians,' were found among the terms with the highest similarity scores says something about the semantic context in which Muslims appear in social forums. In addition, our results showed that the linkages to terrorism and extremism were overall more strongly pronounced on social sites in many countries, but not all.
Regarding the second dimension, our study yielded intriguing findings concerning the attention gap. We discovered that representations of Muslims and Islam received more attention on social sites than on news media sites in most of the countries under study. This discrepancy underscores the significant role social platforms play in shaping the discourse around minority groups, potentially contributing to a more hostile public sphere.
Moreover, the selection of countries allowed us to explore diverse socio-political contexts, contributing to a comprehensive analysis. While acknowledging country-level differences, we highlight common patterns of negative representations and attention given to Muslims on social sites across these European democracies. Muslims and Islam were represented more often in social forums than in news media in most of the countries studied. However, Denmark and Spain were two exceptions. In Denmark, the relative frequency of the terms 'Muslim/Muslims' and 'Islam' was much higher in news media than in social content, and in Spain, the relative frequency was similar between media types. We interpret these findings as indicating that the Spanish political discourse around Muslims and Islam is less charged regardless of media type, whereas in Denmark, the news media stand out because of the comparatively high amount of coverage. Such variations need to be followed up in future research.
In conclusion, our results are consistent with the online hostility hypothesis and more strongly support an attention gap between the two types of media. It is important to note that hostility in online media is no less prevalent than previous research has indicated. The sheer size of the data analyzed here is not well suited to capturing the harm that could be done by the most aggressive or hostile expressions found on various unedited blogs and forums. Nonetheless, our findings should be interpreted in the context of representations of Muslims in a very broad and popular 'social' online environment.
This study contributes to a larger body of work that documents how social sites with user-generated content, such as social media, blogs, and forums, have contributed to a fundamental reshaping of the public sphere. A common challenge in this study and most others is the lack of access to data from some of the largest social media companies, such as Facebook, Instagram and Twitter. It is possible, but ultimately remains to be established, that the results of our study would have supported the online hostility hypothesis even more strongly if such data could have been included. Also, as mentioned, a limitation of our study is the challenging task of distinguishing between news media and social sites in the online realm. News media content is integrated into social platforms, and vice versa. Despite the imprecise categorization of the sources, our results show clear differences between the two types of online media.
Large platforms today have active policies against content that is intolerant or hostile toward minority groups. A potential consequence of these policies might be representation that is increasingly similar to legacy news media. Nonetheless, the negative effects of one-sided angles and representations, such as Muslims being predominantly portrayed as extremists or terrorists, remain. The increased attention paid to Muslims on social sites therefore contributes to a more hostile public sphere.
Our findings underscore the importance of addressing negative portrayals of minority groups in online media. By emphasizing shared challenges and overall trends, our study enriches the understanding of the impact of user-generated content on public discourse. As we acknowledge the limitations and challenges faced in analyzing online representations, future research could explore data from larger social media platforms to enhance the understanding of online hostility. Despite these limitations, our contribution highlights the need for increased attention from media companies, regulators, and researchers to promote more inclusive and respectful online environments for minority groups.

Figure 1. Online hostility: number of negatively loaded terms. Note. The figure shows the number of negatively loaded words among the top 30 words with the highest similarity score in social and news media content. See Appendix 2 for the complete list of most similar terms.

Figure 2. Cosine similarity scores. Note. The figure shows cosine similarity scores for terms related to extremism/terrorism that were found in both media sources in the eight countries of interest. The maximum value along the y-axis is 1.

Figure 3. Attention gap between social and news media sites. Note. The figure displays the relative term frequency of the terms 'Muslim', 'Muslims', and 'Islam' in social and news media across eight European countries. The relative frequency values in the figure were calculated by dividing the total number of occurrences of the terms by the total number of words in each corpus, then multiplying by a constant of 1,000. This allows for a comparison of term usage across different corpus sizes. Taking Denmark as an example, the corpus size for social sources is 287,515,778 and the term frequency is 28,269; dividing the latter by the former and multiplying by 1,000 gives a value of 0.0983. The corpus size for news sources in Denmark is 306,396,404 and the term frequency is 44,341, resulting in a value of 0.1447.