Examining the emotional impact of sarcasm using a virtual environment

ABSTRACT This study aimed to investigate the emotional impact of sarcasm. Previous research in this area has mainly required participants to answer questions based on written materials, and results have been mixed. With the aim of instead examining the emotional impact of sarcasm when used in a more conversational setting, the current study utilized animated video clips as stimuli. In each clip, one individual answered general knowledge questions while the other provided feedback that could be delivered either literally or sarcastically, and either complimented or criticized the question answerer’s performance. Participants rated the feedback based on (a) the anticipated emotional impact on the recipient, (b) how the speaker intended the recipient to feel, and (c) whether the speaker intended to be humorous. Results overall supported the Tinge Hypothesis, showing that sarcastic criticism was rated as less negative than literal criticism, and sarcastic compliments (also termed “ironic praise”) were rated as less positive than literal compliments, when judged from both the perspective of the recipient and of the speaker. The speaker was also perceived to be intending to be more humorous when sarcastic feedback was given.

Irony is used in around 8% of conversational turns between friends (Gibbs, 2000), making it a common form of nonliteral language. Verbal irony describes a situation in which an individual makes a statement that means the opposite of what they say (Booth, 1974). This article concentrates specifically on sarcastic irony (also known as sarcasm), which is a common form of irony most frequently used in situations of interpersonal communication (Hancock, 2004). Although the definition of sarcastic irony is debated, it is commonly believed to differ from general irony in that the statement or attitude conveyed tends to be one of ridicule (Lee & Katz, 1998;Toplak & Katz, 2000), directed toward a specific individual (Kreuz & Glucksberg, 1989). This can either be through uttering a positive statement to imply something negative (sarcastic criticism) or uttering a negative statement to imply something positive (sarcastic compliment). For example, if someone performed poorly in a test, a sarcastic response may be "Wow, you're so smart" (sarcastic criticism), or alternatively, if they performed exceptionally well, a sarcastic individual might say, "Wow, you did terribly" (sarcastic compliment).
Although ironic criticism is more common than ironic praise (Sperber & Wilson, 1981), ironic compliments do nonetheless still occur. It is interesting to note that some researchers may argue that sarcastic compliments do not exist, as they do not easily fit with the definition of sarcasm (i.e., to criticize an individual). However, even though a sarcastic compliment would intend to praise, there may still be an additional element of implied criticism that is not present in a literal compliment, as sarcasm tends to be delivered in a more mocking manner (Kreuz, Long, & Church, 1991).
Since verbal irony, and therefore sarcasm, often involves stating the opposite of what is meant, the words used are unlikely to resemble the speaker's intended meaning. Consequently, irony and sarcasm can be fairly ambiguous and may result in processing difficulty, for example, disruption to eye movements during reading of an ironic comment (e.g., Au-Yeung, Kaakinen, Liversedge, & Benson, 2015; Filik, Leuthold, Wallington, & Page, 2014; Filik & Moxey, 2010; Kaakinen, Olkoniemi, Kinnari, & Hyönä, 2014; Olkoniemi, Ranta, & Kaakinen, 2016; Turcan & Filik, 2016). In addition to potentially resulting in processing difficulty, misinterpretation of a sarcastic comment can result in failed communication (Clark, 1996). Why then, is sarcasm commonly used as opposed to the generally more straightforward and comprehensible literal alternative? It seems likely that nonliteral comments entail different discourse goals that would not be achieved through the use of literal statements (Gerrig & Gibbs, 1988; Kreuz et al., 1991). Roberts and Kreuz (1994) explained that in order to justify the decision to use nonliteral language, such as sarcasm, the additional information gained from the chosen expression of speech must outweigh the possibility for misinterpretation of the intended meaning.
Experimental research into the additional goals achieved through the use of sarcasm has suggested a range of social and emotional functions, for example: to be polite (Kumon-Nakamura, Glucksberg, & Brown, 1995); to save face (Dews, Kaplan, & Winner, 1995; Jorgensen, 1996); to identify with the in-group (Colston, 1997); to mock (Katz & Pexman, 1997; Kreuz et al., 1991); and to harshly criticize (Gibbs, 1986; Jorgensen, 1996; Kreuz & Glucksberg, 1989). One particular goal that has often been associated with the use of sarcasm is to be humorous. For example, Kreuz et al. (1991) found that people chose to use sarcasm with the intent to be funny and witty. Additionally,  and Toplak and Katz (2000) found that ironic comments were perceived as more humorous than their literal alternatives. Furthermore, Dress, Kreuz, Link, and Caucci (2008) found that although sarcasm was perceived to be negative, it was also rated as more humorous. In an attempt to explain the effect of humor,  suggested that the disparity between what the speaker intends and what the speaker actually says inevitably creates tension and that this tension, together with the surprise of hearing a comment that is the opposite to what is expected, results in the comment being perceived as humorous.
Thus, sarcasm may be used to influence the emotional impact a comment has on the recipient compared to the literal alternative. However, there is disagreement regarding whether this effect is to mute (e.g., , or instead enhance (e.g., Colston, 1997) the positive or negative nature of the statement. The tinge hypothesis, also known as the "muting the meaning hypothesis", was first proposed by  and suggests that sarcasm is used to attenuate the condemnation or praise in a message, relative to the literal, direct alternative. This is based on the idea that when a person first hears a sarcastic statement, they must initially process the literal meaning to some degree (as shown by, e.g., Fein, Yeari, & Giora, 2015;Giora et al., 2007), which then "tinges" the intended meaning. For example, using the negative term "awful" to sarcastically compliment an individual would tinge the comment with the negative literal meaning of awful, thus making it appear less positive than the positive literal alternative of say, "great." Likewise, using the positive term "great" to sarcastically criticize someone would tinge the comment with the positive literal meaning of great, to appear less negative than the negative literal alternative, "awful." A number of studies have previously been conducted to evaluate the tinge hypothesis, but conflicting results have been reported. For example,  investigated the impact of ironic insults and ironic compliments by presenting participants with a booklet displaying transcripts of conversations between three individuals. While participants were able to freely read the transcripts, a narrator also read the conversations aloud, altering their tone of voice depending on the feedback given. Each scenario ended with the speaker expressing either a literal or ironic comment to another character about offensive (study one) or commendable (study two) behaviors. 
Participants then completed a series of rating scales regarding how condemning or praising they viewed the comments to be. They found evidence for the tinge hypothesis in that ironic compliments were viewed as being significantly less praising than their literal alternative in study two and similarly, ironic criticisms were viewed as significantly less condemning than their literal alternative in study one. Thus, an ironic statement muted the emotional impact of the message in comparison to the emotions produced from an equivalent literal statement (for further evidence see Filik et al., 2016;Jorgensen, 1996).  related their findings to Brown and Levinson's (1987) politeness theory, which suggests that ironic insults are used as an alternative to literal insults because their indirectness consequently enables the criticism to be expressed in a less face-threatening, more polite manner. However, this theory only accounts for sarcastic criticisms while the tinge hypothesis considers sarcastic compliments too.
Conflicting evidence, however, is presented by Colston (1997). He argued that the negative emotional impact of a message is enhanced, rather than muted, through the use of sarcasm. He proposed that a contrast is created when an individual comments using a more desirable outcome than what actually occurred, as in sarcastic criticism. This contrast then makes the current outcome appear more negative than if a literal comment had been made. Consequently, this could lead to increased condemnation through the use of sarcasm. He carried out a similar experiment to  but with no verbal presentation of the stimuli and found that ironic criticisms were judged as more condemning than literal criticisms.
Further evidence that sarcasm may not always mute the meaning of a message was provided by Toplak and Katz (2000), who built on Colston's (1997) findings by considering the perspective of the speaker of the comment as well as that of the recipient. They found that sarcastic comments were rated as more impolite from both perspectives, relative to the literal alternative. Here, the speaker was rated as intending to be more verbally aggressive and offensive through the use of sarcasm and the recipient was perceived to be more offended by such sarcastic remarks. Therefore, it would seem that sarcasm enhanced the criticism from both points of view (for further evidence of an enhanced effect see Bowes & Katz, 2011;Filik, Hunter, & Leuthold, 2015;Gibbs, 2000;Lee & Katz, 1998;Leggitt & Gibbs, 2000). Subsequent studies have generally followed the methodologies used by  and Colston (1997), where participants read short vignettes with characters making literal or sarcastic remarks to other characters and then rated the statements on various measures. This led to the question of why such varying results have been found when the methodologies appear relatively similar. One potential reason is that a range of different dependent measures have been used. Although most researchers used rating scales to gather participants' responses, the specific questions to be answered varied. For example, Colston (1997) used ratings of condemnation;  used ratings of humor and degree of insult;  additionally used ratings of criticalness; and Jorgensen (1996) gathered a range of ratings including politeness, thoughtfulness, degree of hostility, and seriousness.
As a result, Pexman and Olineck (2002) argued that the inconsistent findings were due to the different dependent measures across past studies varying on the basis of whose emotions the participants were required to interpret. They argued that some studies required participants to judge the speaker intent, and therefore take the perspective of the speaker, whilst other studies had participants consider the perspective of the recipient, thus measuring social impression. For example, they proposed that ratings of mocking and sarcasm measured speaker intent while ratings of politeness and positivity measured social impression. In a subsequent study, Pexman and Olineck (2002) found that the tinge hypothesis only applied to sarcastic insults when participants were asked to judge the social impression (impact on the recipient and/or a bystander) and rather an enhanced negativity effect was observed with speaker intent. Muting was however found from both perspectives with sarcastic compliments. This therefore indicates that the muting effect may be moderated by the perspective that the participant is required to take. Pexman and Olineck (2002) referred to this idea as the "modified tinge hypothesis."
Pexman and Olineck (2002) argued that this effect was observed for sarcastic criticism because the intended meaning is not always obvious. Therefore, when considering the intent of a speaker, the perceiver (be it the recipient of the comment, or an onlooker) must focus on the negative information provided by the context, which consequently enhances the perceived negativity of the comment. Contrarily, when taking the perspective of the addressee or bystander (rating social impression) negative information in the context may not be taken into account since the contextual information may not be readily available, thus allowing the message to be muted by the positive literal meaning of the sarcastic comment. They proposed that this does not occur with sarcastic compliments because, unlike sarcastic criticisms, the negative literal meaning of the sarcastic utterance is evident and cannot be misinterpreted. Therefore, there is no need to take into account the context, thus affecting ratings similarly, with a muting effect from both perspectives.
Having said this, Pexman and Olineck's (2002) work was based on the untested assumption that mocking and sarcasm ratings assess speaker intent, and that politeness and general positivity assess social impression. Consequently, Bowes and Katz (2011) carried out a factor analytic study, but failed to find single factors of speaker intent and social impression, thus failing to support the above assumptions. They additionally found that it did not matter what perspective the participant took; sarcastic comments were still viewed as more victimizing than literal ones, further criticizing the tinge hypothesis and supporting the enhanced negativity view of Colston (1997).
However, as in previous studies, Bowes and Katz's (2011) measurements of intentions and impressions were not directly separate. As a result, Boylan and Katz (2013) carried out a study with similar methodologies but directly asked participants for either ratings of speaker intention or social impression of the criticisms presented, rather than assuming that certain factors implied impression or intent. They subsequently found similar results to Pexman and Olineck (2002), where sarcastic comments were rated not only as more mocking (enhanced negativity effect) but also as more humorous, positive, and polite (muting effect). However, their results did not support the explanation that the differences were caused as a result of the perspective taken.
As a result of these varied findings with written scenarios, the current study aims to assess the emotional impact of sarcasm in more of an animated conversational setting. As described above, participants in previous studies read and/or listened to literal versus sarcastic comments. Although research has demonstrated that participants are good at imagining the written scenarios and taking on the perspective of the characters involved (Toplak & Katz, 2000), the methodology could still be made more realistic or usage-based, regarding how an individual would encounter sarcasm in everyday life. Therefore, in the current study, participants will be presented with virtual scenarios on a computer screen, which will provide them with the opportunity to observe and listen to spoken interactions, as opposed to having to read a transcript and imagine the scenario.
In the current study, during each scenario, one character will ask another character a general knowledge question to which their response will be either correct or incorrect. The character asking the questions will then provide feedback on the other character's responses, which will vary in terms of literality (literal or sarcastic) and valence (compliment or criticism). Following each scenario, participants will be presented with three rating scales that will assess: the perceived emotional impact of the feedback on the recipient (from very negative to very positive); how the speaker intended to make the recipient feel when giving the feedback (from very negative to very positive); and the intent of the speaker to be humorous (from not very to very).
Based on previous research, we hypothesize that the use of sarcastic feedback will impact emotions differently to literal feedback. Specifically, the tinge hypothesis would predict that a muting effect will occur. That is, sarcastic criticisms will be rated less negatively than literal criticism, from both the perspective of the speaker and of the recipient. Likewise, sarcastic compliments would be rated less positively than literal compliments, again from both perspectives. Alternatively, following Colston (1997), it would be predicted that sarcasm may enhance the negativity of a message, following which we would expect the opposite pattern of results. Third, the modified tinge hypothesis would predict that the muting effect would vary, depending on the perspective taken. When considering the perspective of the recipient, both sarcastic compliments and sarcastic criticisms would be rated as less positive and negative, respectively. On the other hand, when considering the speaker's perspective, it would be expected that sarcastic compliments would be rated as less positive than literal compliments. However, an enhanced negativity effect would be expected with sarcastic criticisms, resulting in more negative ratings than their literal alternative.
A secondary aim was to assess whether sarcastic irony would still be perceived as being used with the intent to be more humorous compared to the literal alternative using the current paradigm. Following the results of Kreuz et al. (1991), it was hypothesized that the perceived intent to be humorous for both sarcastic criticism and sarcastic compliments would be rated as greater than that of literal criticisms and literal compliments.

Method
Participants: Forty native English-speaking students from a range of academic institutions including the universities of Nottingham, Lincoln, Loughborough, Derby, Manchester, and Liverpool took part (mean age = 20.5 years, SD = 2.05, 23 females).
Materials: The stimuli were created in and recorded from an open world role-playing PC game, The Elder Scrolls V: Skyrim. This game was used as it features a sophisticated editor called the Creation Kit (http://www.creationkit.com). This allows the creation of relatively realistic environments and the implementation of animated and voiced characters, with facial and lip-sync animation matching the spoken audio track. The stimuli were presented as individual video clips using the "Open Sesame" software and included a total of 60 general knowledge questions, which were taken from http://www.pubquizarea.com/. On selecting the quiz questions to be used in the current study, we wished to ensure that the questions would be perceived as challenging and not trivially easy (so that it would be believable that the hidden character would give an incorrect response on 50% of trials). To this end, 15 participants completed a selection of 463 questions from the website, and their responses were then marked. The final 60 materials that were selected reflected a range of difficulties (M = 41.6% correct, SD = 21.9%, range = 0-87%). The questions and the feedback to responses were pre-recorded by a male native English speaker. This audio was assigned to a custom character using the Creation Kit, along with the answers to the general knowledge questions.
The resulting stimuli each consisted of a short video clip with a question, answer, and response to that answer. The speaking character (the "Quizmaster") was presented in the center of the scene, and would ask a general knowledge question to another (unseen) character (the "Player"). Four potential answers then appeared as text on-screen, to the right hand side of the Quizmaster: one correct and three incorrect (see Figure 1 for an example of the character and potential answers). The response made by the player was then indicated on the screen by an arrow and highlighted in bold (see Figure 2). The Quizmaster then provided either literal or sarcastic feedback to the player's response. There were four variations of feedback per question, which were matched so that they only varied with regard to the final word: literal compliment, literal criticism, sarcastic compliment, and sarcastic criticism (see Table 1 for an example question and the Appendix for a wider selection; the full set of materials is available from the corresponding author). Participants listened to the interactions via headphones.
The literal compliment and sarcastic criticism conditions were differentiated by tone of voice (likewise for the literal criticism and sarcastic compliment conditions). Following the approach of Filik et al. (2014), since we wanted all materials to sound natural, when the stimuli were recorded, the speaker was instructed to speak as he would normally in order to convey the message that was intended (i.e., literal or sarcastic) in all cases. Stimuli were then reviewed by the authors and rerecorded if necessary. To further aid in the correct interpretation of the feedback as being literal or sarcastic, the participant was also informed of whether the player's answer had been correct or incorrect via the presentation of either a green tick (to indicate correct) or a red cross (to indicate incorrect) positioned in the center of the screen for 1,000 ms.
Three 7-point rating scales were then presented sequentially. These measured the emotional impact (1), speaker intent (2), and the intent to be humorous (3):
(1) How do you think the recipient of the comment would feel in response to the feedback given? Very Negative-1 2 3 4 5 6 7-Very Positive
(2) How do you think the speaker intended the recipient of the comment to feel in response to the feedback given? Very Negative-1 2 3 4 5 6 7-Very Positive
(3) How humorous do you think the speaker intended to be? Not Very-1 2 3 4 5 6 7-Very
There were 60 trials in total, consisting of 30 trials with the player making an incorrect response, half of these receiving literal criticism and half receiving sarcastic criticism as feedback, and 30 trials with the player making a correct response, half receiving a literal compliment and half receiving a sarcastic compliment. Trials were counterbalanced so that each participant observed each trial in only one of the four conditions.
Procedure: Participants were first presented with a set of instructions stating that they would be required to watch a scenario on the computer screen and then answer a series of three questions after each clip. They were also reminded that the player was aware of whether or not their answer was correct or incorrect. They clicked "I'm ready" and pressed the space bar to continue. Before each trial, the participant was presented with a fixation point for 300 ms. They were then presented with the video clips on the computer screen where they observed the conversations between the two characters. Following each video clip, participants were presented with the three 7-point rating scales. These were presented individually in the center of the screen and the next question appeared once the participant had made their response. They were given as much time as they needed before answering, and made their response by pressing the corresponding number on the keyboard. Once participants had responded to all three questions, a black screen appeared, instructing them to "press space" to continue. This allowed participants to have a break following each trial if needed. The next trial commenced when they pressed the space bar.
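The counterbalancing of the 60 questions across the four feedback conditions can be illustrated with a simple rotation scheme. The sketch below is an assumption about how such lists might be built and is not taken from the study materials: the function name `build_lists`, the use of exactly four presentation lists, and the rotation rule are all hypothetical. It shows only that rotating conditions over items gives each list 15 trials per condition while every item appears in all four conditions across lists.

```python
from collections import Counter

# The four feedback conditions named in the Materials section.
CONDITIONS = (
    "literal compliment",
    "literal criticism",
    "sarcastic compliment",
    "sarcastic criticism",
)


def build_lists(n_items=60, conditions=CONDITIONS):
    """Rotate conditions over items (hypothetical scheme, one list per condition).

    Each returned dict maps an item index to the condition in which that
    item is shown on that presentation list.
    """
    n_cond = len(conditions)
    lists = []
    for list_index in range(n_cond):
        assignment = {
            item: conditions[(item + list_index) % n_cond]
            for item in range(n_items)
        }
        lists.append(assignment)
    return lists


lists = build_lists()
# Number of trials per condition on each list.
counts_per_list = [Counter(assignment.values()) for assignment in lists]
```

Under this scheme a participant assigned to any one list sees each question once, and the player's answer in each clip would be correct for the two compliment conditions and incorrect for the two criticism conditions.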

Results
Separate 2 feedback literality (literal vs. sarcastic) × 2 valence (compliment vs. criticism) repeated measures analyses of variance were conducted on participants' mean rating scores for each of the three dependent variables, treating both participants (F1) and items (F2) as random variables.
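As an illustration of this kind of analysis, the sketch below computes the three one-degree-of-freedom effects of a 2 × 2 fully repeated measures design from contrast scores. It rests on the standard result that, for a two-level within-unit contrast with its own error term, F(1, n - 1) equals the squared one-sample t statistic on the contrast scores. The function names and the toy ratings are invented for illustration and are not the study's data; the same function would be applied to participant means for F1 and to item means for F2.

```python
from statistics import mean, variance


def contrast_f(scores):
    """Return (F, (df1, df2)) for a one-df within-unit contrast.

    F is the squared one-sample t statistic on the contrast scores.
    """
    n = len(scores)
    return n * mean(scores) ** 2 / variance(scores), (1, n - 1)


def anova_2x2(cells):
    """cells: one (lit_comp, sarc_comp, lit_crit, sarc_crit) tuple of mean
    ratings per unit (participant for F1, item for F2)."""
    literality = [((lc + lr) - (sc + sr)) / 2 for lc, sc, lr, sr in cells]
    valence = [((lc + sc) - (lr + sr)) / 2 for lc, sc, lr, sr in cells]
    interaction = [(lc - sc) - (lr - sr) for lc, sc, lr, sr in cells]
    return {
        "literality": contrast_f(literality),
        "valence": contrast_f(valence),
        "interaction": contrast_f(interaction),
    }


# Invented toy ratings for four units; NOT data from the study.
toy_cells = [(6, 3, 2, 3), (7, 4, 1, 3), (6, 4, 2, 4), (5, 3, 2, 3)]
results = anova_2x2(toy_cells)

# Simple main effect of literality within compliments (literal minus sarcastic).
simple_compliments = contrast_f([lc - sc for lc, sc, _, _ in toy_cells])
```

With 40 participants and 60 items, this approach yields error degrees of freedom of 39 and 59, respectively.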
Question 1 (perceived emotional impact): How do you think the recipient of the comment would feel in response to the feedback given?
Analysis of simple main effects indicated that when compliments were literal they were rated as more positive than when they were sarcastic, F1(1, 39) = 198.43, p < .001; F2(1, 59) = 857.83, p < .001, but when criticisms were literal they were rated as more negative than when they were sarcastic, F1(1, 39) = 27.10, p < .001; F2(1, 59) = 111.82, p < .001 (see Figure 3).
Question 2 (perceived emotional-impact intent): How do you think the speaker intended the recipient of the comment to feel in response to the feedback given?

Discussion
The results showed that ratings of the perceived emotional impact of feedback were influenced by literality. Specifically, literal compliments were rated as more positive than sarcastic compliments and literal criticisms as more negative than sarcastic criticisms. The same pattern of effects was observed when participants considered the perceived intent of the speaker. These results are consistent with a range of previous findings including those by , Filik et al. (2016). Following the explanation proposed by , this effect can be explained in relation to the tinge hypothesis, in which the sarcastic meaning of the comment is "tinged" by the literal meaning. For example, in the current study, one potential feedback was "Wow! Your History is great." If this was intended sarcastically in response to an incorrect answer, then the positive literal meaning of "great" would tinge the intended negative message to appear less negative.
As previously mentioned, the current results could have alternatively followed an enhanced negativity effect (Colston, 1997) or the effect could have varied depending on the perspective taken by the participant (Pexman & Olineck, 2002). However, the current findings are inconsistent with these hypotheses. First, Colston (1997) proposed that sarcasm would induce higher negativity ratings irrespective of the feedback valence, thus suggesting sarcastic criticisms and sarcastic compliments would both be perceived as more negative than their literal alternatives, regardless of perspective taken. While this study found that sarcasm reduced the perceived positivity of compliments, enhanced negativity was not observed with sarcastic criticisms (from either the perspective of the speaker or the recipient) and therefore this hypothesis was not supported. Alternatively, Pexman and Olineck (2002) proposed that a muting effect would be observed for sarcastic compliments and sarcastic criticisms when considering the impression of the feedback but would only apply to sarcastic compliments when considering the speaker intent, with enhanced negativity for sarcastic criticisms. However, since the current study found a muting effect from both the point of view of the speaker and the point of view of the recipient, this hypothesis was also not supported. Although Colston (1997) and Toplak and Katz (2000) found evidence for the enhanced negativity effect, and Pexman and Olineck (2002) found evidence for the modified tinge hypothesis, the current findings are inconsistent with these. One potential explanation for why the current findings do not support those of Colston (1997) may be due to the auditory mode of stimulus presentation. Colston (1997) proposed that his findings differed from those of  due to differences in the way their stimuli were presented. 
He suggested that specific prosodic cues (which were present in the current study) may have been present in the acoustic presentation used in  that consequently decreased the negativity of the message. In the current study, participants also had access to visual information due to the video presentation of the stimuli. Given that there was no condition in which participants simply heard the audio, it is difficult to know the extent to which each factor may have contributed to the muting effect. However, it is important to note that Pexman and Olineck (2002) and Filik et al. (2016) used standard written materials and also found muting effects, suggesting that the tone of voice (or presence of video-based information) may not be the reason for the observed effect.
It is also the case that the current stimuli differed somewhat from the kinds of scenarios that are commonly used in the literature (usually written texts describing some kind of social interaction). The use of a quiz-type scenario was designed to create a situation where someone has either performed well, or badly, thus eliciting direct criticism or praise. Another key consequence of such a task is the consideration of the extent to which the participant might feel that the character "deserved" to be criticized or praised for their performance. For instance, if someone gives a correct answer for a trivially easy question, then any form of praise may seem over the top. In contrast, if someone gives an incorrect answer for a very difficult question, then any form of criticism may seem overly harsh. This question undoubtedly applies to the kinds of actions that are criticized and praised in the scenarios used in previous studies in the literature, and would be an interesting avenue for future research. In relation to this, it may also be of interest to establish how the character that utters the criticism or praise actually feels about the actions or performance of the recipient, in addition to how they intend the recipient to feel.
The results of the current study are also inconsistent with Pexman and Olineck's (2002) explanation of why previous studies have found conflicting findings. They proposed that the differences were due to the perspective taken by the participant. However, this is not the case in the current study, where a consistent muting effect was found when participants rated both the emotional impact on the recipient, and the intent of the speaker.
The current results also showed that when considering the speaker's intent to be humorous, sarcastic feedback was rated as being used with a greater intent to be humorous than literal feedback, in relation to both compliments and criticism. This finding could be explained by the speaker intending to maintain a positive relationship with the recipient and relieve their anxiety about answering incorrectly. For example,  proposed that the speaker-addressee relationship may be impacted less negatively when ironic criticisms are used compared to literal criticisms in stressful situations. They proposed that literal criticisms may increase the recipient's stress while humorous ironic criticism may relieve some of this stress. Since the current study employed a situation where the recipient of the feedback was being tested on their general knowledge, it would seem fair to assume that to some degree, they may feel an element of stress. However, this does not explain why sarcastic compliments were also rated with a greater intent to be humorous than literal compliments, as it is likely that there would be little anxiety induced from answering a question correctly.
Although the current research aimed to provide a more naturalistic representation of a social interaction than used in many previous studies, there are still aspects in which this approach could be further developed. For example,  pointed out that one issue with all previous research is that the participant has consistently been an "observer" of the situation rather than actually experiencing the ironic or literal remarks themselves. This was the case for the current study and it is possible that ratings may have differed if the participant was the recipient of the feedback. Therefore, one aim of our future research is to make the participant the subject of the feedback and directly measure their emotional response (using direct measures of emotional responding such as those employed by Thompson, Mackenzie, Leuthold, & Filik, 2016). This will also allow us to further test the recent proposal that emotional responses to sarcasm may change over time (Filik, Brightman, Gathercole, & Leuthold, 2017).
To conclude, there have been varied findings in relation to the emotional impact of sarcasm, with some studies finding a muting effect, some finding an enhanced negativity effect, and others finding the effect to vary depending on the perspective taken. By employing stimuli representing a conversational setting, as opposed to using written or spoken scenarios, this study demonstrated that sarcastic irony, when used in the context of feedback on task performance, can mute the emotional impact of criticism and praise, as judged by an observer, supporting the tinge hypothesis. We also found that in this context, the speaker is judged as intending to be more humorous when giving sarcastic feedback than when giving literal feedback. In order to get an even clearer idea of the emotional impact of sarcasm, our next steps are to make the participant the recipient of the feedback and directly measure the emotional impact of sarcasm.

Funding
This work was supported by the Economic and Social Research Council [grant number ES/L000121/1], awarded to Filik. We would like to thank S. Ling for processing the audio files and implementing them in the Creation Kit.