Cognitive features of indirect speech acts

ABSTRACT The offer of some cake can be declined by saying "I am on a diet" – an indirect reply. Here, we asked whether certain well-established psychological and conceptual features are linked to the (in)directness of speech acts – an issue unexplored so far. Subjects rated direct and indirect speech acts performed by the same critical linguistic forms in different dialogic contexts. We found that indirect replies were understood with less certainty, and were less predictable from, less coherent with and less semantically similar to their context question. These effects were smaller when direct and indirect replies were matched for the type of speech act they performed than when they were not. Crucially, all measured cognitive dimensions were strongly associated with each other. These findings suggest that indirectness goes hand in hand with a set of cognitive features, which should be taken into account when interpreting experimental findings, including neuroimaging studies of indirectness.


Introduction
In day-to-day communication, people often express themselves indirectly. For instance, exchanges such as "Would you like to have dinner at a steakhouse?", followed by the reply "I am vegetarian", are common and seamlessly understood. In this case, the reply is understood as implicating (+>) a "no". From a theoretical perspective, indirect speech acts have been described as cases of language use in which a speaker "utters a sentence, means what he says, but also means something more" (Searle, 1975). From this perspective, indirect speech acts allow the speaker to perform one speech act and, in addition, another one. On Searle's account, the listener then infers the additional meaning intended by the speaker by using general world knowledge, but also by assuming that the speaker is cooperative and that his/her contributions are relevant. Similarly, Grice attempts to provide a rational framework to explain how indirect speech acts are comprehended (Grice, 1975). He also proposes that conversational success rests on the cooperative nature of conversation, implying that all communicating partners are cooperative and assume the same of each other. In Grice's words, this means that they follow a communicative principle to "Make [their] contribution such as is required, at the stage at which it occurs, by the accepted purpose or direction of the talk exchange in which [they] are engaged" (Grice, 1975). This further implies that speakers follow several communicative maxims, including the maxim of Relation (Grice, 1975), and say things that are relevant to the scope of the conversation rather than producing utterances that are unconnected to each other. In Grice's cooperative framework, indirect speech acts (which give rise to Relevance implicatures) are those that prima facie appear to violate the maxim of Relation but, on closer inspection, do not.
The "irrelevance" is only apparent, as the implied (second) meaning conveyed by the utterance is in fact relevant to the ongoing conversation. Another peculiarity of indirect speech acts is that the implicated content is not logically entailed by the literal meaning of the utterance. Thus, the reply "I am vegetarian" in the example above conversationally implicates that the person replying does not want to join the person who made the offer at the steakhouse, although it does not logically imply (entail) it. Finally, indirect speech acts are strongly context-dependent, where context is meant in a broad sense, including the immediate physical context, the linguistic context and background knowledge or common ground. In our example, "I am vegetarian" would hardly ever be understood as declining the offer in the absence of a linguistic context similar to the question "Would you like to have dinner at a steakhouse?". These features suggest that a range of cognitive properties may distinguish direct from indirect speech acts.
The phenomenon of indirectness has also attracted attention in psycho- and neurolinguistics. Comprehension of the intended indirect meaning is thought to result from an inferential process that allows the comprehender to go beyond the (often irrelevant) literal meaning and find the relevant non-literal one. The exact mechanisms underlying the processing and understanding of indirect speech acts have been the object of debate and research, resulting in several cognitive accounts (the Standard Pragmatic Model inspired by Grice, 1975, and Searle, 1975; the Direct Access Hypothesis, Gibbs, 2002; the Graded Salience Hypothesis, Giora, 1997, 2002; Relevance Theory, Sperber & Wilson, 1995; see Meibauer, 2019, and Ruytenbeek, 2021, for reviews of open issues). In addition, experimental studies of the neural (and other physiological) correlates of indirectness have assessed which processing delays characterise the processing of indirect (as compared with direct) speech acts and which brain areas are specifically engaged by it. These studies showed that indirect replies elicit different EEG (Coulson & Lovett, 2010) and pupillary responses (Tromp et al., 2016). Overall, neuroimaging studies showed relatively consistently that two major brain networks were active when indirect replies were contrasted with direct replies (Bašnáková et al., 2014, 2015; Feng et al., 2017, 2021; Jang et al., 2013; Shibata et al., 2011; van Ackeren et al., 2016). The first network involved areas such as the medial prefrontal cortex (mPFC), the left and/or right temporoparietal junction (TPJ) and the precuneus. These activations were interpreted as part of an inferential process that eventually allows the listener to understand the communicative intention of the speaker (Bašnáková et al., 2014, 2015; Feng et al., 2017, 2021; Jang et al., 2013; Shibata et al., 2011).
The second network, consistently found active in the same contrasts, groups together several bilateral cortical areas that have been related to language, such as the inferior frontal gyrus (IFG) and the middle temporal gyrus (MTG), as well as the temporal poles (TP). These activations were interpreted as reflecting greater demands on coherence building, needed to construct the situation model, and on semantic binding, needed to bridge larger semantic gaps between the indirect reply and its context (Bašnáková et al., 2014, 2015; Feng et al., 2017, 2021; Jang et al., 2013; Shibata et al., 2011).
However, in order to study the mechanisms of indirectness comprehension at both the cognitive and the neural level, it is essential first to understand in which ways indirectness differs from directness. Interestingly, a systematic quantitative study of how direct and indirect speech acts are perceived, and of which cognitive properties distinguish them from one another, is still lacking. In particular, the interpretation of neuroimaging results has limited scope if no information about the cognitive properties of indirect vs. direct speech acts is available. For instance, the above-mentioned studies interpreted greater activation in the right MTG and right IFG as resulting from the greater effort required to achieve a coherent reading of an indirect reply. This interpretation rests on other studies having established that these very same areas play a role in coherence building, and on the intuition that indirect replies might be less coherent with their context than direct ones (reverse inference; Poldrack, 2006). However, a crucial piece of information is missing: it has not yet been shown that indirect replies actually have lower coherence with their context than direct ones. Only once such evidence is available can the claim that processing indirectness requires greater coherence-building effort be fully justified. Similarly, other properties of indirectness may need to be characterised to better understand how its processing engages particular neural and/or cognitive mechanisms. Therefore, the goal of the present study is to characterise (at least some of) the cognitive properties of indirectness, so as to inform investigations of the cognitive and neural mechanisms of indirectness comprehension.
In the previous section, we provided the classical definitions of indirect speech acts (or Relevance implicatures) by Searle and Grice. These definitions allow us to formulate hypotheses about how direct and indirect speech acts may be perceived differently. In particular, if indirect speech acts are context-dependent, then the relationship between direct and indirect speech acts and their respective linguistic contexts might differ systematically, which in turn might affect the cognitive processes engaged during the comprehension of indirectness. As stated by Searle, building on Grice, indirect speech acts result from an apparent violation of a maxim. In other words, they seem not to satisfy a tacit "rule" that typically constrains communication. Therefore, we hypothesise that indirect replies might be less predictable than direct ones. In addition, it is specifically the maxim of Relation that is apparently violated by indirect speech acts. This means that the utterances used to perform an indirect speech act might appear semantically unrelated to, or disconnected from, their context. Therefore, we hypothesise that indirect speech acts might be less coherent with, and less semantically related to, their context. Finally, as the non-literal message conveyed by an indirect speech act is not entailed by the literal interpretation of the utterance (but only implicated via an inferential scheme), it may be interpreted with less certainty than a direct reply. As these four dimensions of predictability, semantic relatedness, coherence and interpretative certainty might all be related to the linguistic definition of indirectness, we also hypothesise that they correlate with one another. Importantly, these four properties are also known to be associated with specific patterns of brain activity (see Discussion), which might also be detected in neuroimaging studies of indirect speech act comprehension.
As such, they are of particular importance given our aim to inform neuroimaging research. Note that, whereas some of these features (e.g. Coherence) are sometimes discussed when interpreting experimental work, others, including Predictability and interpretative Certainty, are rarely taken into account (see Discussion).
Additionally, neuroimaging studies have approached the neural correlates of indirectness from different points of view. Whereas some focused on specific cases of indirectness, for instance indirect utterances used to convey a request/directive speech act (Coulson & Lovett, 2010; Tromp et al., 2016; van Ackeren et al., 2012), others examined the neural correlates of indirectness using a broader variety of stimuli conveying various communicative intentions and, therefore, speech act functions (or illocutionary forces), such as statements, requests, opinion expressions, disclosures, request refusals, excuses, etc. (Bašnáková et al., 2014, 2015; Feng et al., 2017, 2021; Jang et al., 2013; Shibata et al., 2011; van Ackeren et al., 2016). Finally, further studies examined indirect speech acts depending on whether or not they had a face-saving effect (Bašnáková et al., 2014, 2015), namely whether the use of indirectness made an utterance more polite and more socially acceptable, as in the case of indirect excuses, indirect refusals or indirect negative opinions. However, none of the previous neuroimaging studies of indirectness reported matching the speech act function between the direct and indirect conditions. Judging from the example stimuli given in some well-cited works, some studies appear to have used a mixture of matched and unmatched stimuli without including this property as a factor in their analysis (Bašnáková et al., 2014, 2015), while others appear to have used only unmatched stimuli (van Ackeren et al., 2012). Thus, the presence or absence of a speech act (SA) change co-occurring with indirectness has never been manipulated in a controlled fashion within the same study, nor has it been the object of systematic investigation.
This factor may nonetheless affect the neural mechanisms involved in the comprehension of indirectness, given that different types of speech acts have been shown to be associated with different neural signatures (see e.g. Boux et al., 2021; Egorova et al., 2013, 2014, 2016; Tomasello et al., 2019, 2022). We therefore created two sets of stimuli: one in which this confound was removed, namely where the direct and indirect conditions performed the same speech act type, and another with "non-SA-matched" direct and indirect speech acts, as commonly used in neurocognitive studies. This important difference is explained in more detail below. Take the reply "I am healthy again": it can be read as a direct reply in the context of the question "Have you still got a cold?", and indirectly in the context of the question "Are you still taking these pills?". In both cases the reply, while conveying different messages, has an assertive communicative function (Searle, 1979), namely that of describing a state of affairs. There is therefore no change of speech act type co-occurring with indirectness. Now consider a different example. The reply "I am vegetarian" is read as a direct reply with an assertive communicative intention in the context of the question "Do you eat meat?". However, when read in the context of the question "Would you like to have dinner at a steakhouse?", it is interpreted as declining an offer (a commissive speech act; Searle, 1979). In this latter case, indirectness co-occurs with a change in speech act type.
In the present study, we separately examined indirect replies with and without a change in speech act type, as the co-occurrence of such a change might require different cognitive mechanisms. For instance, it might require additional processing, as not only the propositional content of the utterance but also its speech act type has to be inferred and recomputed. We therefore hypothesised that indirect replies with a speech act change might differ more substantially from their direct counterparts than direct and indirect twins matched for speech act function, and that these additional differences would be attributable to the difference in speech act function. Nevertheless, we still expected indirect replies to be rated markedly differently from direct ones, and to receive lower Coherence, Predictability, Semantic Similarity and interpretative Certainty ratings.
To sum up, our aim in the present study was to assess whether there are systematic differences in how direct and indirect replies are perceived. We studied direct and indirect replies that were conveyed by the same linguistic form but acquired their direct/indirect pragmatic status from the preceding context question. This approach is similar to the methodology used in recent neurocognitive studies (Bašnáková et al., 2014, 2015; Feng et al., 2017, 2021; Jang et al., 2013; Shibata et al., 2011; van Ackeren et al., 2016) and therefore maximises the comparability of the findings. Additionally, we assessed whether these properties were affected by whether indirectness co-occurred with a change of speech act type relative to the direct response, a factor that has not been systematically examined so far. To this end, we used two different sets of stimuli, in which indirectness occurred either with (non-SA-matched) or without (SA-matched) a change in speech act type relative to the direct interpretation. Importantly, our stimulus material was created following established linguistic definitions of indirectness (see above and Materials and Methods). We then asked participants to rate the direct and indirect replies on the cognitive dimensions of Certainty of interpretation, Coherence with the context question, Semantic Similarity to the context question and Predictability. We thereby examined whether theoretical linguistic notions were reflected in how lay subjects perceive indirectness. In addition, to check for congruency between the established linguistic criteria used in stimulus generation and the subjects' understanding of "direct" and "indirect" replies, we asked participants to rate Directness. Finally, we asked whether all these rated properties were closely associated with one another and, in particular, whether they were consistently tied to indirectness.

Subjects
Twenty-eight healthy adult volunteers (11 males, 16 females, 1 diverse; mean age = 25.5 ± 4.8 years (SD), median = 24, range = [20, 33]) took part in our study. All subjects were right-handed (mean LQ = 80.4 ± 19.4 SD), as assessed by the Oldfield Handedness Test (Oldfield, 1971), did not report any psychological or neurological disorder, and had normal or corrected-to-normal visual acuity. Additionally, all were native speakers of English, which was also the only language they spoke at native level. The study was carried out in accordance with the Declaration of Helsinki after ethical approval had been obtained from the Ethics Committee of the Charité Universitätsmedizin, Campus Benjamin Franklin (Berlin, Germany). All participants were recruited via advertisements on campus. They all signed an informed consent form before the beginning of the experiment and received monetary compensation of 10 EUR/hour. The entire session, including net task time, breaks, instructions and administrative forms, was always rounded up to the full hour and therefore compensated with 30 or 40 EUR.
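The laterality quotient (LQ) reported above follows Oldfield (1971): LQ = 100 × (R - L) / (R + L), where R and L are the summed right- and left-hand preference scores. The short Python sketch below only illustrates that arithmetic; the input format (two pre-summed scores) is an assumption, since the per-item scoring used in the study is not described here.

```python
def laterality_quotient(right_points: float, left_points: float) -> float:
    """Oldfield-style laterality quotient (LQ), ranging from -100 to +100.

    right_points / left_points: summed right- and left-hand preference
    scores (hypothetical input format; the study's per-item scoring is
    not reported in this section).
    """
    total = right_points + left_points
    if total == 0:
        raise ValueError("at least one preference point is required")
    # Positive values indicate right-hand dominance, negative values left-hand.
    return 100.0 * (right_points - left_points) / total


print(laterality_quotient(18, 2))  # 80.0
```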

Stimuli
Individual stimuli were minimal dialogues consisting of two utterances: a question (interrogative sentence) uttered by partner A (henceforth the "context question") and a reply, a declarative sentence uttered by partner B (henceforth the "critical" reply). Each reply was preceded by one of two alternative context questions, which determined whether the critical reply was direct or indirect. All questions were yes/no (polar) questions. Therefore, all replies could be interpreted either as a "yes" (henceforth positive polarity items) or as a "no" (henceforth negative polarity items) to the question. Note that the labels positive/negative polarity items only reflect their interpretation as a "yes" or "no" answer in the present study and are unrelated to the linguistic notion of polarity, which denotes a distributional property of certain lexical items across affirmative and negative sentence types (Baker, 1970). In selecting sentence pairs for direct and indirect speech acts, we followed the classic criteria of Grice (1975) and others (Levinson, 1983; Searle, 1975). Specifically, indirect replies were defined as (i) apparently violating Grice's maxim of Relation with respect to the context question (Grice, 1975), (ii) performing one speech act by way of performing another (Searle, 1975) and (iii) implicating a non-literal level of meaning that is not entailed by the literal meaning of the sentence conveying it (Levinson, 1983). In contrast, direct replies were defined as not fulfilling criteria (i)-(iii) while providing a straightforward literal reply to the context question.
Two different sets of stimuli, speech act matched (SA-matched) and non-speech act matched (non-SA-matched), were used, which differed in the speech acts they conveyed. We first address what the Sets have in common and then how they differ. The direct condition was constructed identically in the two Sets and consisted of a question whose communicative function (i.e. speech act type or illocutionary force) was querying information and a subsequent affirmative whose communicative function was providing that factual information. For instance, questions such as "Is your cat hurt?" or "Have you decided on a destination?" were followed by the replies "It got wounded." and "We are not sure yet.", respectively. However, the two Sets differed in their indirect condition. In the SA-matched set, the indirect condition consisted of a context question whose function was again querying information (e.g. "Are you bringing your cat to the vet?") and of an indirect reply (e.g. "It got wounded.") which conveyed an indirect assertive speech act (+> Yes, I am bringing my cat to the vet). Therefore, within the SA-matched set, the only difference between the indirect critical replies and the complementary direct critical replies was their in/directness, as, importantly, both conveyed assertive speech acts. By contrast, in the non-SA-matched set, the indirect condition consisted of a question conveying an offer/proposal speech act and of a reply conveying a speech act of accepting (in one half of the stimulus set) or rejecting (in the other half) the offer/proposal. For instance, the question "Shall I buy the train tickets?" was followed by the critical reply "We are not sure where to go yet.", implicating a rejection of the offer (+> No, don't buy the tickets). The critical replies in the non-SA-matched set were thus assertive speech acts in the direct condition but conveyed a different speech act type (e.g. offer declination) in the indirect condition. Therefore, in the non-SA-matched set, the difference between direct and indirect replies consisted not only of their in/directness but also of their speech act type. Stimulus examples for both the SA-matched and the non-SA-matched sets are provided in Table 1.
All context questions and critical replies consisted of a single clause 3 to 8 words long (see Table 2), and the critical reply was the same in the direct and indirect conditions, thus being identical in all relevant psycholinguistic variables, including length, bigram/trigram frequency and lemma frequency. However, to exclude potential context effects due to surface similarity between context questions and critical replies, the direct and indirect conditions were matched on several additional variables (see Table 3): length of the context question in words, pronoun repetitions between context question and critical reply, number of coreferences between context question and critical reply, number of lemmas repeated between context question and critical reply, and cosine similarity between the semantic vectors computed for the context question and the critical reply. Cosine similarity is a measure of distributional semantic similarity between individual words or larger stretches of text, here based on Latent Semantic Analysis (LSA). LSA is a statistical method which, after training on a corpus, represents any word in its training vocabulary as a vector indexing the distributional properties of that item across many texts in a multidimensional semantic space. Novel combinations of these words (e.g. sentences) that were not part of the training corpus can also be represented as vectors in this semantic space by adding the vectors of their component words. The semantic similarity between two sentences is then conceptualised as the cosine of the angle formed by the two corresponding vectors (Landauer et al., 1998; Landauer et al., 2007).
In the present study, the cosine similarity between question and reply was obtained from the online tool at http://lsa.colorado.edu/, selecting the term-to-term comparison and the tasaALL semantic space (300 semantic dimensions). The corpus on which the distributional measures were calculated comprised written language from different types of documents, including novels, newspaper articles and other texts, estimated to correspond to reading levels up to that of a first-year college student (Landauer et al., 2007).
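As a rough illustration of the sentence-level LSA similarity described above, the Python sketch below sums word vectors to obtain a sentence vector and takes the cosine of the angle between two such vectors. The three-dimensional toy vectors are invented for illustration and are not taken from the tasaALL space; the online tool may also apply term weighting that this sketch does not reproduce.

```python
import math


def sentence_vector(words, space):
    """Sum the vectors of all words found in `space`.

    Words missing from the (toy) vocabulary are skipped, mirroring the idea
    that only words seen during training have a vector.
    """
    dims = len(next(iter(space.values())))
    vec = [0.0] * dims
    for w in words:
        if w in space:
            vec = [a + b for a, b in zip(vec, space[w])]
    return vec


def cosine(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical 3-dimensional space; NOT the 300-dimensional tasaALL space.
space = {
    "cat": [0.9, 0.1, 0.0],
    "vet": [0.7, 0.3, 0.1],
    "wounded": [0.5, 0.5, 0.2],
    "tickets": [0.0, 0.2, 0.9],
}
q = sentence_vector("are you bringing your cat to the vet".split(), space)
r = sentence_vector("it got wounded".split(), space)
print(round(cosine(q, r), 3))
```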
Each stimulus set consisted of 76 critical replies, each of which could be presented in the direct or indirect condition. Half of them were to be interpreted as a "yes" (positive polarity items) and half as a "no" (negative polarity items), with each critical reply keeping the same polarity in both conditions. All the above-mentioned properties were matched across the eight conditions resulting from crossing the factors SA-matching [SA-matched, non-SA-matched], Polarity [yes, no] and Directness [direct, indirect]. Although a small number of items had to be excluded from the analysis (see below for details), we made sure that the final item sets used for evaluation remained well matched on these properties, as reported in Tables 2 and 3. Specifically, differences in the length of the critical reply and in the number of content words in the critical replies were tested with a 2 × 2 ANOVA with factors SA-matching [SA-matched, non-SA-matched] and Polarity [yes, no] and were not significant (all main and interaction effects p > 0.05). Differences in cosine semantic similarity and in the length of the context question between conditions were also not significant, as assessed by a 2 × 2 × 2 mixed ANOVA with factors SA-matching [SA-matched, non-SA-matched], In/Directness [direct, indirect] and Polarity [yes, no] (all main and interaction effects p > 0.05). The number of repeated pronouns, the number of coreferences and the number of repeated lemmas were also comparable between conditions, as assessed by likelihood-ratio chi-squared tests applied to all 12 relevant pairwise comparisons (all p > 0.05). As none of the indicators of semantic relatedness between context question and reply differed between conditions, we assumed that the degree of semantic relatedness of direct and indirect sequences was comparable.
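The likelihood-ratio chi-squared tests mentioned above compare observed counts with the counts expected under independence via the G statistic, G = 2 Σ O ln(O/E). The Python sketch below implements this for a single 2 × 2 table (df = 1, so the p value follows from the complementary error function). How the counts are tabulated (items with vs. without a repeated pronoun, by condition) and the numbers themselves are hypothetical.

```python
import math


def g_test_2x2(table):
    """Likelihood-ratio (G) chi-squared test of independence for a 2x2 table.

    Returns (G, p) with df = 1. Cells with an observed count of 0 contribute 0,
    following the convention 0 * log(0) = 0.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if observed > 0:
                g += 2.0 * observed * math.log(observed / expected)
    g = max(g, 0.0)  # guard against tiny negative rounding error
    # Chi-squared survival function with 1 df: P(X > g) = erfc(sqrt(g / 2)).
    p = math.erfc(math.sqrt(g / 2.0))
    return g, p


# Hypothetical counts: items with / without a repeated pronoun
# in the direct (first row) and indirect (second row) condition.
g, p = g_test_2x2(((12, 23), (10, 25)))
print(f"G = {g:.3f}, p = {p:.3f}")
```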

Experimental procedure
Data collection was carried out at the Brain Language Laboratory at the Freie Universität Berlin. Subjects sat in a sound-proof cabin facing a computer monitor. They were instructed to read all question-reply pairs displayed on the screen and to rate them. The ratings were prompted by the questions reported in Table 4 and were given on a 7-point Likert scale, with the respective anchor labels written below the extreme values (1 and 7) and, where applicable, the middle (4) of the scale. Subjects were encouraged to give intuitive ratings and to use the whole range of the scales. The written stimuli were presented visually using PsychoPy 2 (Peirce et al., 2019) in five distinct blocks. In each block, subjects rated all question-reply pairs on one of the following dimensions: (Function, FUN-R) the affirmative's function as a positive or negative reply; (Coherence, COH-R) the coherence between the speech acts performed by the two sentences of the pair; (Directness, DIR-R) the directness of the speech act performed by the second sentence; (Predictability, PRE-R) the predictability of the second sentence in the context of the first; (Semantic Similarity, SSI-R) the semantic similarity between the two sentences. Additionally, the certainty (CER-R) with which the affirmative was attributed a "yes"/"no" function was derived from the Function rating and corresponded to the distance between the Function rating and the middle of the Function scale. Verbal instructions for each individual rating are available in the online repository (see Data Availability Statement). To avoid response biases due to the exact wording of the rating questions, rating questions 3 to 5 were available in two versions, one for each half of the subjects ("How in/direct … ?"; "How un/predictable … ?"; "How close/distant … ?"). For the same reason, the anchor labels of the Likert scales were mirrored for half of the subjects.
Question wording and anchor-label layout were, however, kept constant across blocks within subjects. The order of the blocks (and thus the order in which each subject gave the individual ratings) was randomised across participants.
All stimuli (direct and indirect from both sets) were displayed in random order within each block, with a different randomisation for each subject. Each subject was exposed to all stimuli (both the direct and indirect versions), such that every item was rated 28 times and every subject saw each stimulus version (direct vs. indirect) five times, once per block. For each trial, the question and the reply were shown together on one slide, on separate lines in the upper half of the screen. At the same time, the question prompting the rating and the rating scale itself were displayed in the lower part of the screen. Subjects selected one of the discrete Likert scale values with the left and right arrow keys and confirmed with the return key (Figure 1). Each screen was shown until a selection was confirmed, and the next screen with a new sentence pair was shown immediately. Overall, the ratings took ca. 2 h. Short breaks were allowed in the middle and at the end of each block (i.e. about every 10-15 min). In addition, subjects were asked to leave the testing cabin and take a longer (15 min) break after the end of the third block.
[Table caption and note fragments recovered from extraction: "... and Polarity (Yes, No). The number of items (n) is indicated for each condition. Note that the critical utterance is identical between the direct and indirect condition." Note: "In the SA-matched and the non-SA-matched sets both "yes" and "no" polarity items were present in equal numbers."]
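To make the randomisation and counterbalancing scheme above concrete, here is a minimal Python sketch that builds one subject's session plan: a subject-specific shuffle of the five rating blocks, a fresh shuffle of all stimuli within each block, and an assignment of the two counterbalancing factors. The function name and the rule used to assign wording versions and mirrored anchors (based on the subject number) are hypothetical; the experiment itself was run in PsychoPy and its exact assignment scheme is not reported here.

```python
import random

# The five rating blocks named in the procedure.
RATING_BLOCKS = ["FUN-R", "COH-R", "DIR-R", "PRE-R", "SSI-R"]


def session_plan(subject_id, stimuli):
    """Return (plan, counterbalancing) for one subject.

    plan: list of (block, trials) in the subject's block order, where every
    block contains every stimulus exactly once in a freshly shuffled order.
    counterbalancing: hypothetical assignment of question wording version and
    mirrored anchor labels derived from the subject number.
    """
    rng = random.Random(subject_id)  # subject-specific, reproducible seed
    block_order = list(RATING_BLOCKS)
    rng.shuffle(block_order)
    plan = []
    for block in block_order:
        trials = list(stimuli)  # all direct and indirect pairs, once per block
        rng.shuffle(trials)
        plan.append((block, trials))
    counterbalancing = {
        "wording_version": subject_id % 2,  # hypothetical rule
        "mirrored_anchors": (subject_id // 2) % 2 == 1,  # hypothetical rule
    }
    return plan, counterbalancing


stimuli = [(f"item{i:02d}", cond) for i in range(1, 5) for cond in ("direct", "indirect")]
plan, cb = session_plan(3, stimuli)
print([block for block, _ in plan], cb)
```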

Analysis
Data preprocessing and statistical analyses were performed in Matlab 2014b (The MathWorks Inc., Natick, MA, 2000), R 3.6.1 (R Core Team, 2019) and SPSS Statistics 26 (IBM, Armonk, NY). First, all ratings given on Likert scales with mirrored anchor labels (see above) were inverted so that they matched the ratings given with non-mirrored anchor labels. Values from the Function rating were transformed to produce an additional variable, Certainty (CER-R), defined as the rectified distance of the Function rating from the centre of the scale (range 0-3). For comparability with the other scales, which started from 1, we added 1 to the rectified values. Our final Certainty scale therefore ranged from 1 to 4 and captured how close Function ratings were to the "yes" or "no" extremes. For instance, a Function rating of 1 or 7 corresponded to a Certainty of 4, while a Function rating of 4 corresponded to a Certainty of 1. The following dimensions were thus available for statistical analysis: certainty (CER-R) of the yes/no interpretation, coherence (COH-R) between question and reply, directness of the reply (DIR-R), predictability of the reply (PRE-R), and semantic similarity between question and reply (SSI-R). The Function ratings (FUN-R) themselves were not analysed further except for item rejection, as explained below.
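The two transformations described above can be written as a short Python sketch: un-mirroring a 1-7 rating and deriving Certainty from a Function rating. The function names are illustrative and are not taken from the authors' analysis scripts.

```python
SCALE_MIN, SCALE_MAX = 1, 7
MIDPOINT = (SCALE_MIN + SCALE_MAX) // 2  # 4


def unmirror(rating, mirrored):
    """Flip a 1-7 rating given with mirrored anchor labels back to the canonical orientation."""
    return SCALE_MIN + SCALE_MAX - rating if mirrored else rating


def certainty(function_rating):
    """CER-R: rectified distance of a FUN-R rating from the midpoint (0-3), plus 1 (range 1-4)."""
    return abs(function_rating - MIDPOINT) + 1


print([certainty(r) for r in range(1, 8)])  # [4, 3, 2, 1, 2, 3, 4]
```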
Next, direct-indirect item pairs were excluded from all analyses if, based on the average rating across all subjects, (1) the sentence pair in the "direct" condition was judged to be more indirect than the "indirect" one (SA-matched set: 3 pairs; non-SA-matched set: 5 pairs); and/or (2) one of the two stimuli was not predominantly assigned the expected function, meaning that "no" items were rejected if the average FUN-R > 3.5 and "yes" items were rejected if the average FUN-R < 4.5 (SA-matched set: 1 pair; non-SA-matched set: 7 pairs). This led to the exclusion of 16 direct-indirect stimulus pairs across sets (SA-matched set: 4 pairs, i.e. 5.3%; non-SA-matched set: 12 pairs, i.e. 15.8%). As unequal numbers of "yes" and "no" pairs were excluded within each set, 4 additional pairs were removed across sets (SA-matched set: 2; non-SA-matched set: 2) so that both sets retained an equal number of "yes" and "no" pairs. These latter pairs were selected so as to balance the remaining items of each set. The final analysis included 70 item pairs in the SA-matched set (6 exclusions overall, i.e. 7.9%) and 62 in the non-SA-matched set (14 exclusions overall, i.e. 18.4%), with equal numbers of "yes" and "no" pairs within each Set (Table 1).
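The two exclusion criteria above can be expressed as a Python sketch operating on item-level mean ratings. The field names are hypothetical, and the sketch assumes that, after un-mirroring, higher DIR-R values mean more direct and that FUN-R runs from 1 ("no") to 7 ("yes"), which is consistent with the 3.5/4.5 thresholds. The subsequent removal of extra pairs to balance "yes" and "no" items is not modelled.

```python
def wrong_function(mean_fun, polarity):
    """Criterion (2): mean FUN-R is not on the expected side of the scale.

    Assumes 1 = "no" and 7 = "yes" after un-mirroring.
    """
    if polarity == "no":
        return mean_fun > 3.5
    if polarity == "yes":
        return mean_fun < 4.5
    raise ValueError(f"unknown polarity: {polarity!r}")


def reject_pair(pair):
    """Return True if a direct-indirect pair meets exclusion criterion (1) and/or (2).

    `pair` holds item-level mean ratings under hypothetical field names;
    DIR-R is assumed to be oriented so that higher = more direct.
    """
    # (1) the "direct" version was judged more indirect than the "indirect" one
    reversed_directness = pair["dir_direct"] < pair["dir_indirect"]
    # (2) either version was not predominantly assigned the expected function
    misassigned = (wrong_function(pair["fun_direct"], pair["polarity"])
                   or wrong_function(pair["fun_indirect"], pair["polarity"]))
    return reversed_directness or misassigned


example = {"polarity": "yes", "dir_direct": 6.1, "dir_indirect": 2.8,
           "fun_direct": 6.5, "fun_indirect": 5.2}
print(reject_pair(example))  # False: both criteria are passed
```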

Linear mixed models analysis
Our a priori hypotheses concerning differences in the measured properties between direct and indirect replies and the effect of speech act matching (see Introduction) were tested using linear mixed models (LMM) from the R package lme4 (Bates et al., 2015). All models included random intercepts for both subject and item, which accounted for inter-subject and inter-item variability, respectively. The present study includes three independent variables: In/Directness [Direct, Indirect], Speech Act (SA) matching [SA-matched, non-SA-matched] and Polarity [yes, no]. For each rating, we started by building a null model without any fixed effects. We then progressively increased the complexity of the model by adding the various factors, alone or in interaction. All models used R's default treatment contrasts, with direct as the base level of the In/Directness predictor, SA-matched as the base level of the SA-matching predictor and no as the base level of the Polarity predictor. At each increase in complexity, the model was compared with the previous one using a likelihood ratio test (LRT).
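Each step of this model-building procedure compares two nested models. The likelihood ratio statistic is twice the difference in their log-likelihoods, and it is referred to a χ² distribution whose degrees of freedom equal the number of added parameters. The Python sketch below only illustrates that arithmetic, using closed-form χ² tail probabilities for integer degrees of freedom. It is not the authors' code, and the log-likelihood values in the usage line are made up. In R, this is the comparison that lme4's `anova()` method reports for two nested `lmer` fits, which it refits by maximum likelihood by default.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) of a chi-square variable with integer df >= 1 (closed forms)."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        half, term, total = x / 2, 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + 2*phi(sqrt(x)) * sum_{j=1}^{(df-1)/2} x^(j - 1/2) / (1*3*...*(2j-1))
    p = math.erfc(math.sqrt(x / 2))
    if df > 1:
        term = math.sqrt(x)  # j = 1 term: x^(1/2) / 1
        acc = term
        for j in range(2, (df + 1) // 2):
            term *= x / (2 * j - 1)
            acc += term
        p += 2 * math.exp(-x / 2) / math.sqrt(2 * math.pi) * acc
    return p


def likelihood_ratio_test(loglik_simple: float, loglik_complex: float, df: int):
    """LRT for nested ML fits; df = number of extra parameters in the more complex model."""
    stat = 2.0 * (loglik_complex - loglik_simple)
    return stat, chi2_sf(stat, df)


stat, p = likelihood_ratio_test(-1520.0, -1500.0, df=1)  # hypothetical log-likelihoods
print(f"chi2(1) = {stat:.1f}, p = {p:.2e}")
```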

Correlations between dimensions
If indirectness comes with differences in Predictability, Certainty, Semantic Similarity and Coherence with its linguistic context, it is possible that these dimensions also correlate with one another. To test for linear relationships between the ratings, we computed pairwise Pearson correlations between the item-wise average ratings, with the two sets collapsed. Collapsing the data across sets was motivated by the results of the linear mixed models analysis reported below. Correlation analyses were performed on all rating dimensions except Function. Function was omitted because visual inspection indicated that its ratings were not normally distributed but bimodal. This was an unsurprising consequence of the fact that the replies were always interpretable as "yes" or "no", which pushed subjects to give ratings clustered at the extremes of the Function scale. Note, however, that after their transformation into the Certainty variable described above, the Function data could still be used. The correlations were Bonferroni corrected for 10 comparisons (all pairs of the five remaining dimensions), and we report corrected p values.
Table 4. Questions used to prompt the rating of each of the measured dimensions, together with their respective anchors.
Figure 1. Example of the experimental procedure. The stimulus to be rated is presented in the upper part of the screen (interrogative and affirmative simultaneously). The rating question and the Likert scale are shown in the lower part of the screen. Subjects select their ratings using the arrow keys and confirm them with the return key.
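The correlation step can be sketched as follows. The sketch computes Pearson's r for every pair of the five item-wise mean rating vectors (ten pairs), and it applies the Bonferroni adjustment, which multiplies each p value by the number of comparisons and caps the result at 1. This is a minimal Python illustration with made-up toy data, not the authors' SPSS/R analysis. The p value of each r (a t test with n − 2 degrees of freedom) is assumed to come from a statistics package and is not computed here.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pairwise_correlations(ratings):
    """Correlate every pair of dimensions; `ratings` maps a dimension to its item-wise mean ratings."""
    return {(a, b): pearson_r(ratings[a], ratings[b])
            for a, b in combinations(sorted(ratings), 2)}


def bonferroni(p_values, m=None):
    """Bonferroni-adjusted p values: multiply by the number of comparisons m, capped at 1."""
    m = len(p_values) if m is None else m
    return [min(1.0, p * m) for p in p_values]


# Hypothetical item-wise means for four items (made-up numbers, for illustration only).
toy = {
    "CER-R": [3.8, 2.1, 3.5, 1.9],
    "COH-R": [6.5, 3.0, 6.1, 2.8],
    "DIR-R": [6.7, 2.5, 6.0, 2.2],
    "PRE-R": [6.2, 2.9, 5.8, 3.1],
    "SSI-R": [5.9, 2.4, 5.5, 2.0],
}
r = pairwise_correlations(toy)
print(len(r))  # 10 pairs for five dimensions
```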

Principal component analysis
A principal component analysis (PCA) was performed as an exploratory analysis to further quantify the relationship between the various ratings. PCA allows one to determine whether the variability captured by our five ratings is better described by a smaller number of underlying variables. In particular, it allowed us to investigate whether our original dimensions all tended to load onto (i.e. be assimilated into) the same underlying component or whether they segregated onto different ones. In other words, it allows one to check how interconnected these dimensions are. Before proceeding, the Kaiser-Meyer-Olkin measure of sampling adequacy (KMO), Bartlett's Test of Sphericity and the determinant of the correlation matrix were computed to ensure that our data met the assumptions of PCA (Field, 2000). Next, the average values of Coherence, Directness, Predictability, Semantic Similarity and Certainty for the items of both sets collapsed (264 items) were entered into the PCA (a 264 items × 5 rated properties matrix). The number of components to extract was determined using the Kaiser criterion, and varimax (orthogonal) rotation was applied to keep the components orthogonal.
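The core quantities of this PCA can be illustrated without a full eigendecomposition. The following minimal Python sketch estimates the largest eigenvalue of a correlation matrix by power iteration and reports the share of variance it carries. It then applies a simple sufficient check for the Kaiser criterion: the remaining eigenvalues of a correlation matrix are non-negative and sum to p − λ1, so if that remainder is below 1, no further component can exceed an eigenvalue of 1. This is our illustration, not the authors' SPSS procedure, and it omits the KMO, Bartlett and rotation steps. The equicorrelation helper builds a hypothetical test matrix.

```python
import math


def leading_eigenvalue(corr, iters=1000, tol=1e-12):
    """Largest eigenvalue of a symmetric correlation matrix, by power iteration.

    Assumes all correlations are positive: the leading eigenvector then has all-positive
    entries, so the uniform start vector cannot be orthogonal to it.
    """
    n = len(corr)
    v = [1.0 / math.sqrt(n)] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
        # Rayleigh quotient v^T C v estimates the eigenvalue for the current unit vector v.
        new_lam = sum(v[i] * sum(corr[i][j] * v[j] for j in range(n)) for i in range(n))
        if abs(new_lam - lam) < tol:
            return new_lam
        lam = new_lam
    return lam


def first_component_summary(corr):
    p = len(corr)                    # trace of a correlation matrix = number of variables
    lam1 = leading_eigenvalue(corr)
    share = lam1 / p                 # proportion of total variance carried by PC1
    # Remaining eigenvalues are >= 0 and sum to p - lam1; if that remainder is below 1, no other
    # component can pass the Kaiser criterion (a sufficient, not a necessary, condition).
    single_component_guaranteed = lam1 > 1 and (p - lam1) < 1
    return lam1, share, single_component_guaranteed


def equicorrelation(p, r):
    """Hypothetical p x p correlation matrix with every off-diagonal entry equal to r."""
    return [[1.0 if i == j else r for j in range(p)] for i in range(p)]


print(first_component_summary(equicorrelation(5, 0.9)))
```

For five variables that all correlate at 0.9, the call returns an eigenvalue of about 4.6, a share of about 0.92 and `True`. When the check returns `False`, the full set of eigenvalues would be needed to apply the Kaiser criterion.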

Ratings across (In)directness, SA-matching and polarity
For each rating, the outputs of all individual comparisons in the linear mixed models analysis are shown in Table 5 together with the corresponding statistical parameters. As our hypotheses for the rated dimensions of interest might be considered related to one another, the table also provides the significance criterion after correction for multiple comparisons (Bonferroni, 5 comparisons). All best models remained significant after this correction. To further investigate the two- and three-way interactions, we conducted post-hoc tests with Tukey's HSD correction for multiple comparisons to identify where the differences occurred. For conciseness, we report in the text only significant differences in pairwise comparisons between conditions that differ in a single factor. A full report of all post-hoc pairwise comparisons can be found in Supplementary Material B (Tables SB1-B5). Additionally, an indication of inter-rater reliability is provided by the mean standard deviation of each item, separated by SA-matching, Directness and Polarity, which is reported in the Supplementary Material, Table S.A1. The linear mixed models analysis indicated that all ratings were explained by a significant main effect of the In/Directness condition alone (DIR-R: χ²(1) = 3241.8, p < 0.001; CER-R: χ²(1) = 1106.9, p < 0.001; COH-R: χ²(1) = 3183.5, p < 0.001; PRE-R: χ²(1) = 1119.6, p < 0.001; SSI-R: χ²(1) = 2414.3, p < 0.001; see Table 5), indicating that indirect replies generally received lower ratings than direct replies. Importantly, the fact that the Directness rating was reflected by the In/Directness factor confirms the classification of speech acts into direct and indirect categories, which had been performed during stimulus preparation according to established linguistic criteria.
However, testing further models that allowed for interactions with the factors Polarity and SA-matching showed that, for all variables, some interaction effects were detectable and accounted for the data better than a main effect of In/Directness alone (see Table 5 and Figure 2). The Certainty rating was best explained by a two-way interaction between the factors In/Directness and Polarity (CER-R: χ²(1) = 42.6, p < 0.001, see Table 5), meaning that the difference in these ratings between direct and indirect replies was modulated by whether the reply was intended as "yes" or "no". Direct items received significantly higher Certainty ratings than indirect ones in both the "no" (p < 0.001) and the "yes" (p < 0.001) Polarity condition. However, while direct items received similar ratings regardless of whether they were interpreted as "yes" or "no" (p = 0.703), indirect replies were judged to have slightly higher Certainty when conveying a "no" rather than a "yes" (p < 0.001).
Most importantly, the ratings of Directness, Coherence, Predictability and Semantic Similarity were all best explained by a three-way interaction between the factors In/Directness, SA-matching and Polarity (DIR-R: χ²(4) = 22.9, p < 0.001; COH-R: χ²(4) = 53.2, p < 0.001; PRE-R: χ²(4) = 46.2, p < 0.001; SSI-R: χ²(4) = 16.9; see Table 5), indicating that these four ratings were modulated by all three factors in a complex manner. Note that none of the ratings was explained by an interaction between In/Directness and SA-matching alone. Consistent with the main prediction, Directness, Coherence, Predictability and Semantic Similarity ratings were significantly lower for indirect than for direct items, irrespective of Polarity or SA-matching. Indirect replies thus received lower ratings than direct ones in all SA-matching by Polarity combinations (p < 0.001). While this difference underlies the main effect of the In/Directness factor, the three-way interactions were due to the following modulation: indirect replies with positive polarity were rated lower than the corresponding negative polarity items, but this effect was significant only for the non-SA-matched materials (p < 0.001). Recall that in the non-SA-matched set, indirect "yes" replies performed an acceptance of an offer (e.g. "Shall we go to the cinema?" being responded to by saying "There is an interesting new movie."), while indirect "no" replies performed a declination of an offer or invitation (e.g. "Shall I buy the tickets?" being responded to by saying "We haven't decided on a destination yet."). Interestingly, in the non-SA-matched set, the items calling for a "yes" answer were rated not only lower than their corresponding "no" items but also significantly lower than their positive ("yes") SA-matched counterparts on almost all scales tested, including Directness, Predictability and Coherence (DIR-R: p = 0.009; COH-R: p = 0.003; PRE-R: p = 0.034). For Semantic Similarity, the average values differed numerically in the same direction, but the difference was not significant (SSI-R: p = 0.374). These results partially support our second hypothesis that speech act mismatch aggravates the cognitive differences between direct and indirect speech acts, as these differences became relatively stronger for positive non-SA-matched items.
Table 5. For each of the ratings of certainty (CER-R), coherence (COH-R), directness (DIR-R), predictability (PRE-R) and semantic similarity to the question (SSI-R), the table provides information about the tested models, namely the Bayesian Information Criterion (BIC), the Akaike Information Criterion (AIC), and the log-likelihood (logLik).

Correlations between dimensions
To examine a possible link between the five main dependent variables, we computed Pearson correlations for all pairwise combinations. Function (FUN-R) was excluded in its raw form but included in its transformed form, namely the Certainty ratings (CER-R) (Figure 3). All pairwise correlations were significant (all p < 0.001) after Bonferroni correction for multiple comparisons (10 comparisons). Overall, they confirmed a strong positive linear association between all dimensions, with Pearson's R coefficients all greater than 0.80 (Figure 3). Thus, whenever an item received low (high) ratings on one scale, it was very likely to also receive low (high) ratings on the other scales. These results clearly document a strong correlative link between the measures.

Principal component analysis
To examine whether any of the five dimensions could be dissociated from the others in the face of these strong correlations, we performed a principal component analysis. Ratings of Certainty, Coherence, Directness, Predictability and Semantic Similarity to the context question were entered into a single analysis. Sampling adequacy was confirmed by a Kaiser-Meyer-Olkin (KMO) measure of 0.883. Conceptually, the KMO indicates the ratio between the variance that is shared among the variables and the variance that is not shared. It ranges from 0 to 1. A value larger than 0.8 is considered appropriate for PCA and indicates that "the pattern of correlations is relatively compact" (Field, 2000). Bartlett's Test of Sphericity confirmed that the correlations between input variables were sufficiently large (χ²(10) = 2395.460, p < 0.001). Yet the variables were not collinear, as indicated by a correlation matrix determinant of 0.00015. Our dataset therefore met the assumptions for reliable PCA (Field, 2000, pp. 683-686). Of the five resulting principal components, principal component 1 (PC1) had an Eigenvalue of 4.59 and by itself explained 91.86% of the variance in the data. All further components had Eigenvalues below 0.3 (see Table 7(A) and Figure 4) and thus did not pass the Kaiser criterion of an Eigenvalue greater than 1 (Kaiser, 1960). Additionally, all original dimensions loaded similarly onto PC1, with rotated factor loadings above 0.9 (Table 7(B)). Thus, a single principal component (PC1) explained most of the variance in our set of items.
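The reported figures can be cross-checked with simple arithmetic; this is our own check, not an analysis reported by the authors. In a PCA of a correlation matrix of p standardised variables the Eigenvalues sum to p, here p = 5, so

$$
\frac{\lambda_1}{p} = \frac{4.59}{5} \approx 0.918,
\qquad
\sum_{k=2}^{5} \lambda_k = 5 - 4.59 = 0.41 < 1 .
$$

The first value matches the reported 91.86% up to rounding of the Eigenvalue. The second shows that none of the remaining components could have reached the Kaiser threshold of 1. As a rough approximation, which assumes that all ten correlations share a common value $\bar r$, the leading Eigenvalue of such an equicorrelation matrix is $1 + (p-1)\bar r$. A value of $\lambda_1 = 4.59$ would therefore correspond to $\bar r \approx (4.59 - 1)/4 \approx 0.90$, in line with the coefficients above 0.80 reported in Figure 3.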

Discussion
In the present study, we asked whether the linguistic indirectness (In/Directness factor) of speech act sequences expressed by two consecutive sentences is systematically associated with other cognitive variables. These variables included the interpretative Certainty and Predictability of the second speech act and the Coherence and Semantic Similarity between the two sentences. Furthermore, we assessed whether any such association was modulated by whether indirectness co-occurred with a speech act change (SA-matching factor) and by whether the reply was intended to be understood as a "yes" or a "no" response (Polarity factor). Importantly, we also asked whether these cognitive properties of direct and indirect speech acts were interlinked with one another. As expected, subjects consistently found the indirect replies to be less direct than the direct ones. They also judged the corresponding interpretation to be less certain, and the replies to be less coherent with, less predictable from and less semantically similar to the context question. Complex three-way interactions of the factors In/Directness (direct/indirect), Polarity (yes/no) and SA-matching (SA-matched/non-SA-matched) were seen for the Directness ratings and for those of Coherence, Predictability and Semantic Similarity. These interactions were due to significant differences between the cognitive ratings of replies meant to express "yes" and "no" responses in the non-SA-matched set, but not in the SA-matched set. Note that in the non-SA-matched set, indirect replies came together with a change of speech act function relative to their direct control. There, "yes" replies performed an acceptance of an offer (e.g. "Shall we go to the cinema?" being responded to by saying "There is an interesting new movie."), while the indirect "no" replies performed a declination of an offer or invitation (e.g. "Shall I buy the tickets?" being responded to by saying "We haven't decided on a destination yet."). Conversely, in the SA-matched set, no change of speech act function occurred relative to the direct control. There, "yes" replies performed a confirmation (e.g. "Are you bringing your cat to the vet?" being responded to by saying "It got wounded."), while "no" replies performed a disconfirmation (e.g. "Did you have time for sightseeing?" being responded to by saying "It was a business trip."). Thus, in the non-SA-matched set, for all four rating variables, the "yes" replies received relatively lower values than the "no" replies. They also received lower values than their SA-matched "yes" counterparts, although this latter effect reached significance for only three of the four rating dimensions (not for Semantic Similarity). In other words, for some of the stimuli (positive polarity items), the lack of speech act matching enlarged the cognitive differences between direct and indirect speech acts.
Table 6. Summary of the fixed and random effects of the best-fitting model for each of the collected ratings of certainty (CER-R), coherence (COH-R), directness (DIR-R), predictability (PRE-R) and semantic similarity to the question (SSI-R).
Figure 3. Correlation matrix for the following rated dimensions: certainty (CER-R), coherence with the question (COH-R), directness (DIR-R), predictability (PRE-R) and semantic similarity to the question (SSI-R). The plots below the diagonal are scatter plots of the relationship between each pair of variables, together with the regression line in red. Each observation represents an item and its average score on a given scale. The plots above the diagonal show the respective Pearson correlation coefficient (R) and significance level after correction for multiple comparisons (*p < 0.05, **p < 0.01, ***p < 0.001).
Furthermore, indirect replies, but not direct ones, had a more certain interpretation when they conveyed a "no" than when they conveyed a "yes". Crucially, all ratings displayed strong, positive and significant correlations with each other. This finding was further supported by the principal component analysis (PCA), which yielded a single major component explaining ca. 92% of the variance, onto which all our rating dimensions loaded about equally. This suggests that these ratings are all indices of one single underlying property. Furthermore, a supplementary analysis showed that the difference between direct and indirect item pairs in any rating correlated with the differences in all other ratings (Supplementary Material C). Finally, an item-by-item inspection identified only a very small subset of direct-indirect item pairs in which the direct and indirect items were matched in terms of the above-mentioned properties while still differing in their directness rating (Supplementary Material C).
The present study thus demonstrated that the perceived cognitive properties of direct and indirect speech acts differ significantly. This holds even when these speech acts are conveyed by the same linguistic form and when their relationship with their linguistic context is matched in terms of various psycholinguistic variables. Indirectness of speech acts does not stand alone but is almost always tied to differences in interpretative Certainty, Predictability, Coherence and Semantic Similarity to the context. Furthermore, for some communicative activities, including the items in our sets that were interpretable as "yes" responses, the cognitive differences between indirect and direct speech acts appear to be particularly strong if the two are not matched for communicative function. This latter observation shows that a lack of speech act matching may artificially alter and enhance the cognitive differences linked with in/directness per se.

Properties of indirectness
The first and most important finding of the current study is that direct and indirect speech acts differed on all five dimensions examined. Compared to direct replies, indirect replies were (1) perceived as less direct by the participants, (2) interpreted with less certainty, and considered (3) less coherent with their context, (4) less predictable and (5) less semantically similar to their context. These rating dimensions were also modulated by other factors (SA-matching and Polarity, see Discussion section "Speech Act Type, Polarity and Politeness"). They were, however, first and foremost affected by the direct/indirect status of the critical utterance, as indicated by the estimates of the respective linear mixed models (see Table 6). Before discussing each property individually, we would like to address one potential confound that could have affected all ratings: subjects were exposed to the same stimuli multiple times as they progressed through the various rating blocks. The degree of exposure to the stimuli could therefore have affected the subjects' responses. To evaluate this possibility, we performed additional analyses (see Supplementary Material D). These showed that the degree of exposure to the stimuli (i.e. the position of the block for a given rating in each subject's session) did not significantly affect the Coherence, Directness, Predictability and Semantic Similarity ratings. The only rating that was significantly affected was the Certainty of the interpretation: the more subjects were exposed to the stimuli, the more certain they were in their interpretation of indirect replies, but not of direct ones. Being exposed to the same indirect reply multiple times might have given subjects more time to think about an appropriate interpretation and to become certain of it.
Exposure therefore had a significant facilitatory effect on interpretation. This does not contradict the result of our main analysis, however, but reinforces it: our main analysis still detected differences in Certainty between direct and indirect replies even though repeated exposure to the indirect stimuli reduced this difference. To sum up, there was no evidence that exposure affected the ratings on any dimension other than Certainty. Even for Certainty, which was only slightly affected, the general pattern of direct replies being interpreted with greater Certainty than indirect ones remained.

Ratings of directness
First, the Directness ratings indicated that, as we expected, indirect stimuli were rated as less direct than their direct counterparts. Although this finding may seem trivial at first sight, it is important because it confirms that our stimulus choice was appropriate for investigating the phenomenon of linguistic indirectness. In other words, the a priori construction of direct and indirect stimuli according to well-established linguistic criteria (see Material and methods) was in line with lay subjects' more intuitive understanding of in/directness. In addition, note that indirect replies on average received ratings that were rather central on the Likert scale. This is consistent with the fact that indirectness, being commonly used in daily communication, is not perceived as an "extreme" phenomenon.

Ratings of predictability
Indirect replies were rated as significantly less predictable than direct ones. One might suspect that this observation was related to peculiarities of our study. In particular, one could object that, in our design, the reply was always indirect whenever the context question in the non-SA-matched set was understandable as an offer. This could inadvertently have led subjects to predict an indirect (vs. direct) reply and could therefore have biased the Predictability ratings toward relatively higher values. Had the Predictability ratings indeed been affected in this way, both response options to an offer, i.e. acceptance and declination, should have achieved higher Predictability ratings than the indirect replies in the SA-matched condition, where the direct/indirect status of the reply was not predictable. Contrary to this expectation, the Predictability ratings of indirect declinations (non-SA-matched set) were in fact comparable to those of indirect disconfirmations (SA-matched set). Only the Predictability ratings of indirect acceptances (non-SA-matched set) were significantly lower than those of indirect confirmations (SA-matched set). This pattern of results is incompatible with the possibility that the pattern observed for the indirect replies in the non-SA-matched set was caused by the contingency of an indirect reply following an offer. It nevertheless remains possible that, had the indirectness of the reply not been predictable from the offer in the context question, indirect declinations and acceptances might both have received lower Predictability scores than they did. Importantly, this cannot explain the difference that we find between indirect acceptances and indirect declinations, which we interpret as a consequence of politeness dynamics (see below), since the offer would have made both more predictable.
We therefore consider the Predictability ratings to be affected neither by the degree of exposure to the stimuli nor by the fact that the indirectness of the reply in the non-SA-matched set was always anticipated by an offer. Grice proposed that human communication is based on a set of principles (or maxims) that people tacitly take for granted during communication (Grice, 1975). Indirect speech acts result from an apparent violation of the maxim of Relation, which states that speakers typically say things that are relevant to the ongoing discourse or situation. The expectation that a speaker follows the principle of Relevance might constrain the choice of utterances in their propositional content and, consequently, also in their form. For instance, if one asks "Where are the scissors?", the direct (and most immediately relevant) response mentions a location in a sentence similar to "The scissors/they are [location]", e.g. "They are in Jim's bedroom". If the response is indirect, however, it may omit any information about the location of the scissors from its propositional content, e.g. "Jim used them for his arts & crafts project". Note that in the indirect case, the reply deviates considerably from the expected direct reply in its propositional content and, consequently, also in its form (syntactically, lexically, phonologically, etc.). Indeed, there are relatively few ways of communicating something directly, but many, if not unlimited, ways of communicating the same thing indirectly (Holtgraves, 1994, 1999). We therefore consider the lower predictability ratings of indirect replies to result from weaker constraints on their propositional content and form.
It is of course still possible that certain ways of expressing something indirectly are more predictable than others (see Discussion section "Speech Act Type, Polarity and Politeness") but based on the present data it appears that there is nevertheless a general and strong characteristic of indirectness, that it is less predictable than direct communication.

Ratings of semantic similarity and coherence relative to the context
In the present study, we cannot tell for sure to what extent the ratings of semantic similarity and coherence were affected by the indirectly conveyed message (i.e. by the utterance at its non-literal level). However, indirect replies showed decreased coherence with, and semantic similarity to, the context relative to direct ones. This fits well with the Gricean framework, in which indirect speech acts are defined as an apparent violation of the maxim of Relation (Grice, 1975; Levinson, 1983; Searle, 1975). Indeed, for an utterance to be considered unrelated to the ongoing discussion, it should be thematically disconnected, namely semantically distant, from the preceding linguistic context at the surface level. At the same time, it should still achieve a certain coherence at a non-literal level that fits the context and situation. Note in this respect that Semantic Similarity, Coherence and Directness were the three properties that correlated most strongly with one another (Figure 3), supporting the idea that they might be particularly closely related. We therefore consider it most likely that the decreased semantic similarity and coherence of indirect replies with their context are driven by differences at the literal level. Together with predictability, they are most likely a direct consequence of the apparent violation of the maxim of Relation. One may be inclined to state that indirectness, superficial coherence and semantic relationship are intrinsically connected. Our experimental study adds that it is indeed very difficult to find example stimuli that dissociate these three cognitive features.

Ratings of Certainty of interpretation
Next comes Certainty of interpretation. Several researchers have pointed out that the implicature carried by a specific utterance is often associated with a degree of indeterminacy (Grice, 1975; Levinson, 1983). Holtgraves (1998) stresses that the same indirect speech act may allow multiple implicatures to be derived, which typically makes indirectness ambiguous or vague, at least to a degree. For example, if Sarah replies "I am vegetarian" to the invitation "Are you coming to the steakhouse tonight?", it is not quite certain which of these implicatures is the intended one (+> I am not coming; +> I am coming but will not eat with you; +> I am showing you my appreciation by accepting this invitation to a steakhouse although I am vegetarian; +> I am morally judging you for eating habits that I do not endorse; etc.). Furthermore, one of the classic tests of implicatures is that, as opposed to literally conveyed meaning, they are cancellable or defeasible (Levinson, 1983). In other words, their implicated propositional content can be negated without causing a semantic contradiction. As a consequence, indirectness has often been considered to involve a lesser degree of commitment on the part of the speaker and to be "plausibly deniable" (Pinker et al., 2008; Reboul, 2017) or "off-record" (Brown & Levinson, 1987). Taking an experimental approach, Lee and Pinker (2010) compared the same "intended message" conveyed in a more direct vs. a more indirect manner. They found that subjects considered the message less certain the more indirectly it was expressed. Similarly, Sternau et al. (2015) compared the same message when it was conveyed directly (bare linguistic meaning/explicature) vs. indirectly (strong/weak implicature). They found that, for indirect speech acts, comprehenders were less confident in their truth judgements of the implicated content.
Our present results are consistent with these previous studies and extend them by taking a different approach. These previous studies kept the "intended message" constant and varied the linguistic form to achieve different degrees of (in)directness. We take the opposite approach and use the same linguistic form as either a direct or an indirect means of conveying different intended messages. We thereby show that indirect speech acts are understood with less certainty even when the direct and indirect stimuli are conveyed by exactly the same sentence. A possible reason for this could be that the comprehender implicitly knows that the implicature could be defeated shortly thereafter during the conversation, or potentially also at a later time point.

Relationship between properties
Strikingly, these dimensions were all strongly and positively intercorrelated (see Figure 3), and a PCA could not separate the variability in our data into multiple underlying dimensions (see Figure 4). Instead, nearly all the variance in the ratings, including that in the directness ratings, was accounted for by a single component (PC1), which we could consider the directness-to-indirectness dimension. A further analysis (Supplementary Material S.C) also indicated that whenever the direct and indirect replies within a pair scored differently on one scale, they most likely had equally distant scores on all other measured scales. Further item-by-item examination indicated that only very few item pairs in our set escaped this pattern (see Supplementary Material, Tables S.C1 and S.C2). These different analyses all converge on the finding that the properties of Predictability, Coherence with the context, Semantic Similarity to the context and Certainty of the interpretation are not easily separable from one another. Most likely, they all represent different facets of the phenomenon of indirectness. Note that we do not, of course, claim that these properties are the same thing as indirectness. Indeed, a linguistic stimulus can display these properties without being indirect; for instance, an utterance can be unpredictable without being indirect. However, our results seem to indicate a solid relationship between indirectness and these properties. When an utterance is indirect, this seems not to be dissociable from its being, to a degree, unpredictable, uncertain, dissimilar to the preceding linguistic context and incoherent with it. This link between properties is also consistent with the fact that differences in the individual properties are all tied to different linguistic explanations of indirectness (as discussed above) and can be seen as their direct cognitive manifestations.
So, altogether, it seems that the interrelated perceived (cognitive, psychological) properties that we found to differ systematically between direct and indirect replies in the current study are most likely inextricable from one another and intrinsic to indirectness itself. Overall, therefore, the characteristics of indirectness identified by linguistic theorists are clearly reflected at the cognitive level in the mind of the comprehender.

Speech act type, Polarity and Politeness
The independent variable of in/directness of the reply most strongly affected the ratings across all measured dimensions. In addition, further factors modulated this central effect in a more fine-grained manner. The ratings of Certainty of interpretation were best explained by an interaction between the factors In/Directness and Polarity. In contrast, the ratings of Directness, Coherence, Predictability and Semantic Similarity were best explained by a three-way interaction between In/Directness, Polarity and SA-matching. The crucial difference behind this latter interaction can be described as follows: in the non-SA-matched set, indirect replies conveying a "no" (namely declining an invitation/offer) consistently received ratings that tended to be slightly closer to those of their direct counterparts than did indirect replies conveying a "yes" (namely accepting an invitation/offer). Interestingly, for the Certainty ratings, this pattern was also found in the SA-matched set. These effects are difficult to attribute to the change of speech act type in general, as such a change should have affected all indirect replies co-occurring with a speech act change, irrespective of whether they conveyed a "yes" or a "no". Rather, it appears that for some speech acts (yes-responses) the change of speech act function enhanced the difference between direct and indirect replies, whereas for others this effect was not significant. How, then, can this difference between "yes" and "no" indirect replies in the non-SA-matched set be explained? This question motivates a closer look at the speech act changes realised in the non-SA-matched set.
We suggest that these findings can be interpreted in the framework of Politeness Theory (Brown & Levinson, 1987), according to which indirectness is one of the linguistic strategies typically used in natural conversation to mitigate face-threatening acts (FTAs), which constitute an attack on the face of the hearer or of the speaker him/herself. In this context, the concept of face (Goffman, 1955) corresponds to the wish of each individual to be unimpeded (negative face) and to be desirable (positive face). In the present study, the non-SA-matched condition consisted of an interrogative-affirmative sentence pair, where the interrogative conveyed an offer and the affirmative was understandable as an acceptance or rejection of the offer. Note that similar stimulus sets involving face-saving replies are common in neurocognitive research (see e.g. "Did you find my presentation convincing?" - "It's hard to give a good presentation" from Bašnáková et al., 2014b; "Have you received any grants or scholarships during your studies?" - "The competition for scholarships in my field is extremely harsh." from Bašnáková et al., 2015; "Will my film be successful at the box office?" - "It is hard for audiences to really enjoy a literary film." from Feng et al., 2021). More specifically, in our study, offers included both proposals to engage in joint activities (henceforth invitations, such as "Shall we have some drinks?") and offers to do something for the other person (henceforth offers of favour, such as "Shall I do the dishes?"). In the framework of Politeness Theory, an invitation made by A being rejected by B (negative polarity items) constitutes an FTA for A, as it threatens A's positive face. Similarly, when A offers a favour, a rejection by B can potentially be an FTA for A, because it threatens the positive face of A, whose good intention is being turned down.
Note that it is in principle possible that B accepting an offer of favour made by A also constitutes an FTA, albeit to A's negative face, as it commits A to actually doing the favour. However, in the present study, the offers always concerned rather trivial and small favours, such that their acceptance (positive polarity) would most likely represent only a minor imposition on A. Thus, we consider that overall, in our set of stimuli, the "no" reply conveying a rejection would generally have been more face-threatening than the "yes" reply conveying an acceptance. Therefore, in the indirect items of the non-SA-matched condition, the context question (invitation or offer of favour) opens the possibility for the subsequent reply to be an FTA. This was not the case for indirect items in the SA-matched condition, as these were mere assessments of a state of affairs, without face-threatening potential. Furthermore, speech act matching guaranteed that the speech act function was the same in the direct and indirect conditions; the lack of such matching brings with it the danger of introducing additional differences, such as the presumed difference in face threat. If, as Politeness Theory predicts, indirectness is frequent or more likely to occur when a face-threatening message is conveyed, then we would expect speakers to have greater motivation to use, and comprehenders more reason to expect and thus process, an indirect speech act when it is used to perform a face-threatening act. This would be relevant for our negative polarity non-SA-matched indirect condition. Our present results fit well with this prediction, as they indicate that the face-threatening indirect replies, i.e. the invitation/offer declinations, scored higher on all rated scales than the non-face-threatening invitation/offer acceptances.
Compared to indirect acceptances, indirect declinations seemed to be perceived in a way more similar to direct replies. This pattern of results suggests that indirect declinations could be easier to process than indirect acceptances. Conversely, positive non-SA-matched indirect replies, which were not used to convey an FTA in their respective contexts (e.g. in response to an offer), appear to be perceived as relatively more anomalous than their SA-matched indirect counterparts on most cognitive dimensions, suggesting a greater indirectness effect for non-SA-matched items than for matched ones when FTA issues are not relevant. In our proposed interpretation, it is the face-saving, politeness-related function of indirectness, specifically in the negative-response condition, that counteracts and minimises the cognitive difference otherwise introduced by the lack of speech act matching.
Our findings are in line with previous research reporting interactions between the perception of indirect speech acts and the presence of a face-threatening context. Indirect replies were found to be recognised as conveying indirect meaning more often, and to be understood relatively more quickly, when they occurred in a face-threatening context (Holtgraves, 1991, 1998). Similar results were replicated in an eye-tracking study, where reading of indirect replies was found to be less fluent when their use was not justified by a face threat (Stewart et al., 2017). An unexpected result of our study, however, is that the Certainty of the interpretation was affected by an interaction between the factors Directness and Polarity also in the SA-matched conditions, which did not include FTAs.
This latter effect is difficult to explain, and we do not have a fully convincing explanation to offer. One possibility, which again rests on Politeness Theory, is that indirectly conveyed negative replies (replies that communicate a "no") co-occur highly with face-threatening contexts. Therefore, the mere fact that an indirect reply conveys a "no" could bias subjects towards reading the minimal question-answer dialogue as a face-threatening scenario, even when it is not one. This in turn might have caused these politeness effects to "spill over" onto the SA-matched set, which did not actually involve a face threat. This possible explanation, however, remains highly speculative, and further work is needed to confirm such a "spill-over" effect. To sum up, the present results are mostly in line with previous research establishing that indirect replies used to perform a face-threatening speech act, such as rejecting an offer, are easier to process than indirect replies not performing an FTA. Additionally, they provide further insight into which properties of the indirect replies are affected, namely certainty of interpretation, coherence relative to the question, directness and predictability. Finally, they demonstrate how a change of speech act type between direct and indirect conditions can be associated with additional differences (here, face threat; see Bašnáková et al., 2014), which may overlay and confound the differences in cognitive properties normally present between direct and indirect speech acts per se.
Contrasting with the pattern seen for the non-SA-matched set, the positive SA-matched replies, which, according to our analysis, were not overlaid by a confounding difference in face threat, showed more substantially reduced cognitive ratings for indirect relative to direct speech acts matched for their illocutionary function on most rating dimensions (indirectness, coherence, predictability).

Implications for research on linguistic indirectness
Taken together, these results suggest that indirectness is a multifaceted phenomenon, as it co-occurs with a range of other factors. If indirectness is inextricably associated with lower predictability, lower coherence, lower certainty and lower semantic similarity, these properties are most likely each reflected in patterns of brain activity. Awareness of these properties should inform related psycholinguistic and neurolinguistic research and is of particular interest for the interpretation of neuroimaging studies. Indeed, studies investigating the neural correlates of indirectness have relied on reverse inference (Poldrack, 2006) to interpret neuronal activation patterns: they explained the activation of certain brain regions by stating that these regions were involved in certain cognitive processes, but only assumed that these processes were required during the comprehension of indirectness. With the present study, we provide evidence for several systematic differences in cognitive properties between direct and indirect speech acts, thus providing solid ground for the interpretation of neuroimaging results and addressing the reverse inference problem.
fMRI studies conducted so far have isolated two main brain networks associated with the processing of indirectness (Bašnáková et al., 2014, 2015; Feng et al., 2017, 2021; Jang et al., 2013; Shibata et al., 2011; van Ackeren et al., 2012, 2016). The first is the Theory of Mind network, including the temporo-parietal junction (TPJ), medial prefrontal cortex (mPFC) and precuneus, which is hypothesised to help the hearer take the perspective of the speaker, or understand what the speaker "really means". Second, activations in regions belonging to the language network but also extending to right-hemisphere homologue areas, such as the bilateral inferior frontal gyrus (IFG), bilateral middle temporal gyrus (MTG) and anterior temporal lobe (ATL), have been interpreted as reflecting semantic integration, semantic unification and coherence building. The interpretation of these latter activation foci is congruent with our present result that, compared to direct replies, indirect ones are characterised by reduced Semantic Similarity and Coherence relative to the context. However, none of the regions typically activated during the comprehension of indirectness has been interpreted in relation to lower Predictability or greater Uncertainty in the interpretation of indirect replies. For instance, the above-mentioned studies consistently find a large portion of the mPFC to be active in response to indirect replies. The mPFC is a multifunctional brain region that has been found to be divisible into multiple subregions based on the type of task that activates it (Amodio & Frith, 2006; De La Vega et al., 2016).
In particular, whereas the anterior part of the mPFC (arMFC in Amodio & Frith, 2006; anterior portion of the mPFC in De La Vega et al., 2016) was found to be associated with mentalizing, person perception and social processing, the more dorsal part (prMFC in Amodio & Frith, 2006; middle portion of the mPFC in De La Vega et al., 2016) was associated with other cognitive functions such as "decision making" and the processing of "uncertainty". In several neuroimaging studies of indirectness, the activation detected in the mPFC seems to overlap with the anterior portion but also to extend partially into the dorsal middle portion, which could possibly be a consequence of the higher degree of uncertainty in interpreting the "intended meaning". Alternatively, or possibly in addition, this activation could reflect a greater involvement of prediction or prediction error processing in indirect speech act comprehension. Predictive processing has recently attracted substantial attention in the cognitive neurosciences (Friston, 2005; Wolpert et al., 1995; Wolpert & Kawato, 1998) and also in psycho- and neurolinguistics (Huettig, 2015; Pickering & Clark, 2014). Most studies exploring the processing of predictable vs. unpredictable linguistic stimuli (where the unpredictable stimuli did not, however, constitute a semantic incongruency or syntactic violation) used EEG and MEG methods and found larger N400 responses for less predictable words (Kutas & Hillyard, 1984; León-Cabrera et al., 2017; Van Berkum et al., 2005), as well as broad frontal anticipatory activity, so-called Prediction Potentials, before the onset of semantically predictable speech and written text (Grisoni et al., 2016, 2017).
In view of the relatively reduced predictability of indirect replies (compared with direct ones) revealed by the present study, indirect replies could be expected to elicit stronger N400 responses than direct ones, as well as enlarged semantic prediction potentials preceding critical words of the reply sentence. Unfortunately, to the best of our knowledge, only one study (Coulson & Lovett, 2010) has investigated indirectness with EEG methods, a scarcity that is probably related to the methodological difficulties of such an enterprise. In particular, for indirect replies (typically sentences) it is difficult to create stimuli with a well-defined point in time at which the indirectness of the speech act becomes effective, which makes it difficult to constrain the analysis of electrophysiological data with high temporal resolution to specific time windows. Coulson and Lovett (2010) examined ERP responses while subjects read seven-word sentences that could be understood as indirect requests or as direct statements depending on the preceding context. In their case, the context was not defined by a previous turn in a dialogue, but by a brief text describing the situation. They found no differences between direct statements and indirect requests in the N400 responses to any of the individual words in the critical sentence. At first glance, this appears to contrast with our finding that indirectness tends to be associated with decreased predictability. However, it is well known that requests, due to their potential to threaten the negative face of the hearer, are very frequently performed in indirect form following a politeness strategy (Brown & Levinson, 1987; Holtgraves, 1991, 1994). Indeed, in our present data, indirect replies that had implications for politeness were only minimally less predictable than their direct counterparts (see Figure 2).
Thus, it is also possible that the indirect replies used by Coulson and Lovett (2010) were in fact not that "unpredictable", owing to their politeness function, which could explain why no N400 differences were found between the direct and indirect conditions.
A further consideration concerns the way indirectness has been operationalised in the literature. More recent studies of indirectness have preferred experimental designs in which the very same critical stimulus could be either direct or indirect depending on the preceding context (Bašnáková et al., 2014, 2015; Feng et al., 2017, 2021; van Ackeren et al., 2016). While some studies attempted to equalise semantic similarity to the context in direct and indirect stimuli by several means (e.g. Bašnáková et al., 2014, 2015, and the present study), others used LSA, a measure of semantic relatedness, as a criterion to quantify and operationalise indirectness (Feng et al., 2017). In these studies, the cosine similarity based on latent semantic analysis (LSA) (Landauer et al., 1998, 2007) between the context and the critical reply was used as a proxy for semantic similarity. In the current study, our direct/indirect stimuli were also counterbalanced for LSA-based cosine similarity and for further indicators of semantic relation (number of repeated lemmas, number of repeated pronouns, number of coreferences). Nevertheless, subjects still perceived indirect stimuli as less semantically related to their context. This is likely due to the inherent limitation of using a sum of the semantic vectors of individual words to obtain semantic information about a larger construction: all sequential and combinatorial information is lost in this case. LSA is indeed an imperfect tool that fails to represent various aspects of language, such as lexical ambiguity, idiomatic meaning and metaphor. Therefore, the present claims are limited to a set of stimuli that was matched on LSA-based cosine similarity between the context question and the critical utterance across conditions.
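The loss of sequential information can be made concrete with a toy example. The sketch below assumes, as described above, that a sentence vector is the sum of its word vectors and that relatedness is the cosine between sentence vectors; the three-dimensional word vectors are invented for illustration and are not LSA vectors, which would instead be derived from a term-document matrix by singular value decomposition. Because vector addition ignores order, two sentences containing the same words in a different order, and hence with different meanings, receive a cosine of 1.

```python
import math

# Invented 3-dimensional word vectors, for illustration only (not real LSA output).
WORD_VECTORS = {
    "dog": [0.9, 0.1, 0.0],
    "bites": [0.2, 0.8, 0.1],
    "man": [0.1, 0.2, 0.9],
}


def sentence_vector(words):
    """Compose a sentence vector by summing word vectors; word order is discarded."""
    total = [0.0, 0.0, 0.0]
    for word in words:
        total = [t + w for t, w in zip(total, WORD_VECTORS[word])]
    return total


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


if __name__ == "__main__":
    a = sentence_vector("dog bites man".split())
    b = sentence_vector("man bites dog".split())
    print(cosine(a, b))  # 1.0 up to floating-point rounding: order information is lost
```

Under this composition, any meaning carried by word order or by the combination of words, including the pragmatic relation between a question and an indirect reply, is invisible to the similarity measure.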
However, more recent contextualised language models such as BERT (Devlin et al., 2018) or ELMo (Peters et al., 2018), whose word representations depend on the surrounding sentence, could offer a useful alternative to LSA. To sum up, our data suggest that increased semantic distance from the context sentence might be an intrinsic property of indirect stimuli, as it closely correlates with perceived indirectness. On the basis of these results, using behavioural ratings of semantic similarity between context and critical utterance as a proxy for the degree of indirectness (as in Feng et al., 2017) might be a sound approach.
Our study revealed the intrinsic relationship between indirectness and the cognitive properties of semantic relatedness, predictability, coherence and certainty of understanding. Our results are therefore relevant for any study making claims about specific features of indirectness, as the multiple implications for the related cognitive processes need to be taken into account. This applies in particular to studies aiming to draw inferences about the brain loci of indirectness. In all of these cases, additional studies are needed to disentangle which feature of indirectness, or which combination of features, is crucial for a specific brain locus to "light up". Interpretations offered in the literature so far have, in many cases, not addressed these aspects.
In our study, we examined differences between direct and indirect speech acts that differed in their speech act function and, in addition, speech-act-matched sets in which the critical speech act performed by the second sentence had the same speech act function. As mentioned, the factor "SA-matching" was involved in a three-way interaction with In/Directness and Polarity, whereby the indirect items with negative polarity were also characterised by face threat, which, as we argue, led to relatively enhanced cognitive ratings for the indirect condition. We noted that, in the absence of a face-threat difference (i.e. for the positive polarity items), the discrepancies between most cognitive ratings of SA-matched vs. non-SA-matched indirect speech acts were relatively more pronounced. In this context, we note again that, to our knowledge, none of the previously published neurocognitive studies reported having implemented such matching. It may therefore be that some of the brain activation signatures of indirectness reported so far are due to a change in speech act function rather than to indirectness per se. We therefore suggest implementing speech act matching in future studies of neurocognitive differences related to in/directness, or considering the effects of its absence.

Limitations and outlook
One criticism that can be raised concerning the present study is the degree to which cognitive properties rated by lay subjects can be considered reliable. Patterns in our data seem to confirm that the subjects had an appropriate understanding of these dimensions. First, the Directness ratings reflected the a priori categorisation of stimuli as direct or indirect. Second, the rating of Semantic Similarity between context question and critical replies was slightly (although not significantly) higher for direct positive replies than for direct negative replies in the SA-matched set. This seems reasonable, given that the former, but not the latter, were a paraphrase of the context question: direct replies entailing a "yes", as opposed to those entailing a "no", consisted of a reformulation of the question's propositional content in affirmative form, which might have increased the semantic similarity of the reply to the context question. For instance, if the context question "Is your cat hurt?" is followed by the confirmatory direct reply "It got wounded.", the critical words "hurt" and "wounded" are closely related semantically, and the propositional content of the two utterances is likewise similar. However, if the context question "Did he grow up in the country?" is followed by the disconfirming direct reply "He has always lived in the city.", there is less overlap in propositional content between the two utterances, although a semantic link between "country" and "city" can hardly be denied. Clearly, subjects were sensitive to this difference, although LSA was not. Third, the ratings provided by the subjects correlated significantly with the corresponding logRTs (see Supplementary Material E). The explicit ratings are therefore supported by the implicit measure of reaction times, even though subjects were only instructed to be accurate, not fast, and rated in the absence of any time constraint.
To sum up, while lay subjects most likely have a more intuitive understanding of properties such as those rated here, it is very unlikely that they fully lack meta-linguistic understanding. Furthermore, the goal of the present work was precisely to examine whether and how the linguistic definition of (in)directness is reflected in the perceived properties of indirectness, in order to inform psycho- and neurolinguistic studies of indirectness. Of course, the overarching goal of such studies is to investigate the mechanisms of indirectness comprehension in the mind and brain of the average individual, who might indeed lack a high degree of meta-linguistic awareness.
Our interpretation of the three-way interactions rests on Politeness Theory (Brown & Levinson, 2006) combined with a specific effect of SA-matching. However, in the present study, we did not collect ratings of the perceived Politeness of the replies or of the perceived face threat associated with the minimal question-reply dialogue. As the present results suggest that perceived face threat might play a role in how indirectness is perceived, future studies should consider evaluating this dimension too, to provide further support for our claim. Additionally, in the present study we only assessed dimensions related to our hypothesis, i.e. dimensions on which we expected direct and indirect replies to score differently. We did not include any negative control variable, i.e. a dimension unrelated to in/directness on which we would not have expected direct and indirect replies to differ. One possibility, for instance, would have been to ask participants for grammatical acceptability ratings, which are not expected to vary with the in/direct status of the utterance. Including such additional variables was difficult in the present study, as this would have risked excessively fatiguing the subjects. Nevertheless, we acknowledge that including a negative control variable would have made our experimental design stronger.
Given the strong associations that we find here between the assessed dimensions, one could raise the question of whether subjects simply carried over the score they gave a stimulus on one scale to the other scales. Note, however, that the design of the rating procedure should have minimised such a potential issue, as each rating question was presented in a different rating block (see Material and Methods section "Experimental Procedure"), such that subjects never had to rate the same stimulus on all dimensions at the same time. This is also supported by the individual subject trajectories displayed in Figure SF.1 (Supplementary Material), which indicate that subjects did not provide the same values across all ratings.
The present study investigated only a subset of the types of indirectness (understood as Relevance implicatures) that can be encountered in natural language. Indirect utterances, however, are not only replies to questions and do not always implicate a "yes" or a "no". They may also convey many types of speech acts other than assertions, acceptances and rejections (Holtgraves, 1991, 1998, 1999; Holtgraves & Robinson, 2020). Most notably, indirect requests, which have been the object of much previous research (Clark, 1979; Gibbs & Mueller, 1988; Holtgraves, 1994; Trott & Bergen, 2018; see Ruytenbeek, 2017 for a critical review), were not examined in the present study. However, we consider it possible that the present findings generalise to some degree to these other types of indirect speech acts. In particular, indirect requests, which constitute a face threat because they involve an imposition and therefore threaten the negative face of the hearer, could follow a pattern similar to the indirect declinations of offers in our non-SA-matched set.
Finally, the way pragmatics in general, and indirectness more specifically, is used may be subject to cross-linguistic and cross-cultural variation. The current work was based on English stimulus material evaluated by a cohort of native speakers of English. Moreover, the present study continues a long tradition of theoretical research in pragmatics that is itself mostly based on English. The degree to which the current findings extend to other languages and, more generally, other cultures should be the object of further investigation.

Conclusion
The present study investigated the cognitive properties of linguistic directness vs. indirectness of consecutive speech acts, here expressed by question and reply sentences, and how these properties are affected by other factors, such as the speech act type of the reply, i.e. whether it is affirmative or disconfirming (factor "Polarity"). Overall, indirect replies differed from direct ones in that they were perceived as less coherent with their linguistic context, more semantically distant from it, less predictable, and as yielding more uncertain interpretations. These main differences were finely modulated by the type of speech act conveyed, such that indirect declinations of offers or invitations were evaluated more similarly to direct replies than indirect acceptances were to their direct counterparts, possibly due to the face-threatening function of the former. When such face issues were not present (positive replies), the cognitive ratings of indirect speech acts were relatively lower than those of their direct counterparts compared to the situation with non-SA-matched stimuli, thus suggesting enhanced cognitive differences for non-matched in/direct speech acts. Furthermore, the properties that distinguished between direct and indirect replies were strongly intercorrelated. We conclude that linguistic indirectness is characterised by specific cognitive properties. We also argue that these features are not only occasionally associated with indirectness but are systematic and intrinsic to it, as cognitive manifestations of the linguistic concept of indirectness, and thus represent genuine conceptual features of the phenomenon. These distinct properties most likely have differential impacts on the way indirectness is processed in the mind and brain.
Therefore, this knowledge should be used to support and guide, but also to challenge, the interpretation of psycholinguistic and neurolinguistic studies on indirectness. Similarly, it provides a basis for improving future experimental designs that aim to understand the individual contributions of brain areas or brain networks involved in the understanding of indirectness. Finally, our findings also highlight that the specific type of speech act performed indirectly, together with the matching of direct and indirect items, is an important factor that can affect the underlying mechanisms of indirectness comprehension. Future studies should aim to understand these mechanisms while more systematically varying the type of speech act being performed indirectly.