Detection of early warning signals for overruns in IS projects: linguistic analysis of business case language

Many Information Systems (IS) projects fail to be completed within budget and on schedule. A contributing factor is the so-called planning fallacy in which people tend to underestimate the resources required to complete a project. In this paper, we propose that signals of the planning fallacy can be detected in a project's business case. We investigated whether language usage in business cases can serve as an early warning signal for overruns in IS projects. Drawing on two theoretical perspectives – the Linguistic Category Model (LCM) and Construal Level Theory (CLT) – two sets of rival hypotheses were tested concerning the relationship between project overruns and whether the language usage in a business case is abstract or concrete. A linguistic analysis of the business cases of large IS projects in the Netherlands suggests that concrete language usage in the business case is associated with bigger budget and schedule overruns. For researchers, our study contributes to the existing literature on the importance of language usage. For practitioners, our study provides an early warning indicator for overruns.


Introduction
Failing to complete projects on time and within budget is a common problem which plagues many organisations (e.g., Kutsch et al., 2011; Project Management Institute, 2018). Recent surveys have estimated that cost overruns affect as many as 60% to 70% of projects, and the estimates for schedule overruns are similar (KPMG, 2017; Wellingtone, 2017). Unfortunately, many IS projects also fail to meet their targets with regard to budget, schedule, and functionality (e.g., Cao, 2008; Conboy, 2010; Keil et al., 2000; Lang et al., 2013). It is estimated that only 36% of software projects are completed on time and within budget (Standish Group, 2014). Large software projects appear to be particularly prone to overruns, with 66% experiencing cost overruns and 33% experiencing schedule overruns (Bloch et al., 2012). Even though advances have been made in software development (e.g., agile practices), the problem of budget and schedule overruns persists (e.g., Conboy, 2010; Flyvbjerg, 2006, 2018). As Conboy (2010, p. 273) states in his study on budget overruns in IS development projects: "There is no reason to suggest that this trend is improving. Such failures are not restricted to certain industry sectors or project types".
Many (IS) projects fail to meet their targets due to poor initial estimation (e.g., Kutsch et al., 2011; Lang et al., 2013; Nelson & Morris, 2014), and people appear to be unable to learn from past mistakes. While people acknowledge failures to accurately estimate in the past, they generally do not question the accuracy of their current predictions (Kahneman & Tversky, 1982), arguing that no meaningful comparison with past projects can be made due to the perceived uniqueness of the new project (Kahneman & Lovallo, 1993). In addition, failures to meet estimates in past projects are often attributed to external or incidental factors.
In this paper, we suggest that the abstraction level of language usage in a business case of an IS project can serve as an early warning signal regarding the quality of the initial project planning as well as the risk of budget and schedule overruns. It is well known that language plays an important role in IS projects (Bostrom, 1989; Chiasson & Davidson, 2012; Conboy et al., 2012; Truex & Baskerville, 1998) as it is used to "describe the IS we build, to explain and justify their possible uses and implications, and to represent the data and information they contain . . ." (Chiasson & Davidson, 2012, p. 192). Prior literature has suggested that language might play a role in some of the persistent problems of information systems (Chiasson & Davidson, 2012; Conboy et al., 2012).
CONTACT Nick Benschop benschop@ese.eur.nl
In this study, we extend the discourse on the role of language in IS projects by examining language usage in business cases with the aim of gaining insight into which projects are at risk of budget and schedule overruns. Drawing on information leakage research, which suggests that small differences in message wording might reveal valuable information (Sher & McKenzie, 2006), we investigate whether language usage in business cases can provide early detection of over-optimism in IS projects that could later manifest in budget and schedule overruns.
We draw upon the Linguistic Category Model (LCM) to analyse the extent to which concrete or abstract language is employed in business cases of IS projects (Riley et al., 2014; Semin & Fiedler, 1991). Prior research has established that the abstraction level of language usage is related to the way in which people think about, or construe, objects. According to Construal Level Theory (CLT), people can form abstract (i.e., focused on high-level, general, central, enduring and decontextualised features) or concrete representations (i.e., focused on specific and unique aspects or details) of an object (Trope & Liberman, 2003). When people construe objects at an abstract level (i.e., using a high construal level), they tend to use more abstract language, as compared to when they construe objects at a concrete level (i.e., using a low construal level). Through analysis of the language used in business cases, CLT can provide insight into how managers construe projects. These project construals may in turn influence the estimates that managers make.
Based on the literature, two rival explanations concerning the relationship between construal and planning exist. One explanation, supported by the planning fallacy literature, implies that adopting a view that ignores the concrete details of a project (i.e., high construal level) can produce more accurate estimates (e.g., Haji-Kazemi et al., 2015; Kahneman & Lovallo, 1993; Lovallo & Kahneman, 2003; Mitchell, 2006). Another explanation, supported by CLT, suggests that a concrete view that takes into account the myriad details of a project (i.e., low construal level) can lead to more accurate plans (Trope & Liberman, 2003). This study aims to address this theoretical tension by testing two sets of hypotheses based on these two different theoretical perspectives. A further contribution is made by testing, in large IS projects, the relationship between (1) language usage in business cases and (2) budget and/or schedule overruns.

Theoretical background
Prior research suggests that how things are said matters. For example, Riley et al. (2014) conducted an experiment demonstrating that the willingness of people to invest in a company changes depending on whether its earnings press release is described using concrete or abstract language. However, in the context of IS projects, we are not aware of any research concerning the relationship between the abstraction level of language usage and budget and/or schedule overruns. This study contributes to the discourse on the role of language in IS projects by exploring this relationship. To do so, we draw upon both CLT and the LCM. Before getting into the specific aspects of CLT and LCM, we provide a brief review of the literature on the planning fallacy as it relates to budget and schedule overruns in IS projects.

Overruns of IS projects and the planning fallacy
Poor planning is a serious problem for IS projects (Kutsch et al., 2011;Moløkken-Østvold & Jørgensen, 2003;Shmueli et al., 2016a). Specifically, unrealistic estimates at the start of a project are a key cause of budget and schedule overruns (Flyvbjerg et al., 2002;Lang et al., 2013;Nelson & Morris, 2014;PricewaterhouseCoopers, 2014). Prior research suggests that underestimation is much more common than overestimation (Flyvbjerg, 2008;Flyvbjerg et al., 2002;Kutsch et al., 2011). This is especially true in complex situations (Connolly & Dean, 1997), with larger tasks typical of software engineering and management (Halkjelsvik & Jørgensen, 2012) and of large governmental IS projects such as the ones studied in this paper.
Three main causes for over-optimistic estimates have been identified: technical reasons, political reasons, and psychological reasons (Flyvbjerg, 2008; Flyvbjerg et al., 2002; Kutsch et al., 2011). Technical reasons relate to not being able to make accurate forecasts due to incomplete, inaccurate or incorrect information (Connolly & Dean, 1997; Kutsch et al., 2011; Meyer et al., 2002). Political reasons involve making intentionally over-optimistic estimates to increase the likelihood of the project being selected or started (i.e., strategic misrepresentation) (Flyvbjerg et al., 2002; Kutsch et al., 2011). Psychological reasons refer to cognitive biases that can unintentionally lead people to make estimates that are over-optimistic. There are various psychological reasons for poor software estimates (see, for example, Buehler et al., 2010; Jørgensen & Moløkken-Østvold, 2004; Halkjelsvik & Jørgensen, 2012), amongst which the planning fallacy (Kutsch et al., 2011) is the most well known.
The planning fallacy (Kahneman & Lovallo, 1993; Lovallo & Kahneman, 2003) refers to situations in which people ignore distributional data related to similar projects when making project estimates. As a result, they are likely to underestimate what is required to complete a project (Kahneman & Lovallo, 1993; Lovallo & Kahneman, 2003). The planning fallacy is a robust effect which has been observed in a variety of contexts, including IS project management, and it is considered an important cause of budget and schedule overruns in IS projects (Kutsch et al., 2011).
In this study, we suggest that the language usage in business cases of IS projects can serve as an early warning signal for budget and schedule overruns. In the following subsections, we will elaborate upon the relationship between language usage and budget and/or schedule overruns.

Linguistic Category Model
Semin and Fiedler (1991) developed the LCM as a language classification model which can be used to categorise language usage based on level of abstraction. As Riley et al. explain, the same behaviour or event can be described using varying levels of abstraction: "For example, a physically violent act can be represented by 'to punch,' 'to hurt,' 'to hate,' or 'aggressive'" (Riley et al., 2014, p. 63).
Research shows that an event or behaviour described at an abstract level, e.g., "aspiration", is perceived as less verifiable and less informative by an observer, whereas a concrete description of an event or a behaviour, e.g., "to stare", is perceived as verifiable and representing detail. Abstract representations are also "perceived as relatively stable over time and generalizable across settings" (Riley et al., 2014, p. 63), whereas an event or behaviour described in concrete language is perceived as drawing attention to incidental and situational factors (Riley et al., 2014;Semin & Fiedler, 1988).
The LCM deals with different levels of abstraction in language usage by grouping word choices into five categories: descriptive action verbs, interpretive action verbs, state action verbs, state verbs and adjectives. Each category varies on the dimension of abstractness, with descriptive action verbs being the most concrete and adjectives being the most abstract (see Table 1).
In the context of business cases for IS projects, we theorise that the use of abstract vs. concrete language may reveal useful information about the project. Prior research has indicated that the level of abstraction in how a person says something can provide valuable information about how a person mentally construes it (Fujita et al., 2006; Semin & Fiedler, 1989). Furthermore, Trope and Liberman (2010) suggest that there is a link between the construal level of individuals and the accuracy of their planning estimates. Building on this idea, we next discuss Construal Level Theory, which provides a theoretical basis for interpreting the relationship between language usage in IS project business cases and subsequent budget and/or schedule overruns.

Table 1. Categories in the Linguistic Category Model, based on Semin and Fiedler (1991).
Category: Descriptive Action Verbs (DAVs)
Explanation: Constitute the most concrete (i.e., the least abstract) language usage. These verbs refer to a specific action (e.g., 'to kick') rather than to a broader description which can encompass a variety of actions (e.g., 'to hurt'). DAVs have a clear beginning and end and refer to a single, observable event (Semin & Fiedler, 1991).
Examples in an IS project context: To log in, to run on, to hire
Category: Interpretive Action Verbs (IAVs)
Explanation: IAVs refer to a broader category of actions. For example, the IAV 'to help' someone does not refer to one specific action. It can refer to giving advice, carrying something, listening to someone, etc. (Semin & Fiedler, 1991).
Examples in an IS project context: To assure, to process, to implement
Category: State Action Verbs (SAVs)
Explanation: Actions which fall into this category cannot be objectively observed, unlike those which are classified as DAVs and IAVs. For example, it is difficult for an observer to objectively verify that a person indeed 'surprised' someone else. Rather than describing an action itself, they relate to an (emotional or cognitive) reaction to an action (Semin & Fiedler, 1991).
Examples in an IS project context: To remind, to burden
Category: State Verbs (SVs)
Explanation: Refer to mental processes or states. They do not need to be linked to any particular action and can persist over time. Unlike SAVs they do not refer to reactions evoked in others as a result of an action but rather describe perceptions or emotions of a person (Semin & Fiedler, 1991).
Examples in an IS project context: To consider, to expect, to assume, to strive for
Category: Adjectives (ADJ)
Explanation: Refer to qualities or characteristics of a person, object or situation. Nouns that are used to classify a person or object are also included in the category (e.g., the baker 'is a father') as they also provide information about a characteristic of the person or object in question (Coenen et al., 2006). An ADJ does not have to refer to a specific situation, a specific action or even a specific actor. As such, they make up the most abstract category (Semin & Fiedler, 1991).
Examples in an IS project context: Important, negative, timely, external, substantial

Construal Level Theory
CLT holds that people form either abstract or concrete mental construals when thinking about objects (Stephan et al., 2010). As Stephan et al. (2010, p. 270) describe: "any event or object can be represented at different levels of construal". At a high construal level, an object is construed more abstractly, with a focus on the general, central, enduring, and decontextualised features of the object, whereas at a low construal level an object is construed more concretely, with a focus on the specific and incidental aspects or details of the object (Trope & Liberman, 2003). Construal level is related to the concept of psychological distance, which refers to how near or far an object is perceived to be. The further the object is removed from the self in the here and now, the greater the psychological distance towards that object. The CLT literature refers to four types of psychological distance: temporal, spatial, social, or hypothetical. Trope and Liberman (2010, p. 440) describe the relationship between psychological distance and construal level as follows: "Transcending the self in the here and now entails mental construal, and the farther removed an object is from direct experience, the higher (more abstract) the level of construal of that object." In other words, a high construal level is associated with greater psychological distance whereas a low construal level is associated with less psychological distance.

The link between the level of abstractness and construal level
The concept of abstraction and concreteness plays a central role in both CLT and the LCM. Trope and Liberman (2010) explain that actions, like objects, can be represented at different levels. In the LCM, more abstract categories also refer to less concrete descriptions of the action being performed, or even the performer of the actions, and to broader descriptions of the type of action being performed, as well as its general meaning and valence (Semin & Fiedler, 1991). 1 Trope and Liberman (2010) note that a direct link has been established in prior literature between language abstraction and psychological distance. Several studies have shown that as psychological distance increases, so does the usage of abstract language. For example, a study by Fujita et al. (2006) demonstrated that when describing psychologically distant events, subjects used more abstract language. Similarly, subjects who were asked to describe actions of themselves used more concrete language, as compared to subjects who were asked to describe the actions of others (Semin & Smith, 1999). This suggests that the language used by someone to describe objects or actions could reveal information about their construal level. This idea is consistent with the phenomenon of "information leakage" (Sher & McKenzie, 2006). Gaining insight into someone's construal level can be valuable since theory suggests that there is a link between construal level and planning accuracy (Trope & Liberman, 2003).

Relation of construal level to budget and schedule overruns

Theory which suggests that a high construal level will improve estimates
Kahneman and Lovallo (1993) suggest that when estimating schedules and budgets, people commonly adopt an "inside view" that can lead to over-optimistic estimates. Estimates generated from an inside view draw upon "knowledge of the specifics of the case, the details of the plan that exists, [and] some ideas about likely obstacles and how they might be overcome" (Kahneman & Lovallo, 1993, p. 25). Generating estimates in this manner is intuitive and common (Kahneman & Lovallo, 1993; Lovallo & Kahneman, 2003). However, this approach can cause people to ignore distributional information about past projects when making their estimates (Kahneman & Lovallo, 1993; Kutsch et al., 2011; Lovallo & Kahneman, 2003). This has also been suggested to be the reason why people do not seem to learn from past estimation mistakes (Buehler et al., 1994; Kahneman & Lovallo, 1993; Lovallo & Kahneman, 2003), which could explain why the problem of poor estimation persists. Furthermore, when using an inside view, taking into consideration all possible scenarios for what the future may bring is very difficult, which means that some possible obstacles will not be factored into the estimates (Kahneman & Lovallo, 1993; Lovallo & Kahneman, 2003).
A more realistic forecast can be generated by adopting an "outside view" (Flyvbjerg, 2013;Haji-Kazemi et al., 2015;Kahneman & Lovallo, 1993;Lovallo & Kahneman, 2003;Stingl & Geraldi, 2017), which ignores the details of the project and, instead, bases estimates on the outcomes of similar projects. If necessary, these estimates can then be adjusted based on a comparison between the current project and the prior projects (Kahneman & Lovallo, 1993;Lovallo & Kahneman, 2003).
Just as the inside view focuses on the details of the project at hand, so does the adoption of a low construal level. Conversely, the outside view focuses on reference class abstraction and ignores the incidental features of the project (Lovallo & Kahneman, 2003). As such, an outside view has much in common with a high construal level in that both favour abstraction and avoid the details of the project. In line with this, Halkjelsvik and Jørgensen (2012) observe that prior studies suggest a link between a high construal level and more accurate estimates, though they point out that these findings are not conclusive and that there are conflicting findings in the literature.
Further support for the link between a high construal level and accurate estimates is provided by Buehler and Griffin (2003), who suggest that focusing on specific, step-based plans to generate estimates tends to cause people to ignore relevant prior experiences, leading to over-optimistic estimates. Focusing on concrete specifics is consistent with a low construal level. Furthermore, several experiments by Kanten (2011) suggest that low temporal psychological distance is associated with lower task duration estimates. Similarly, Peetz et al. (2010) found that low psychological distance generated lower estimates of expected completion times. Theory thus suggests that a high construal level may be associated with more realistic estimates, which should reduce budget and schedule overruns. Since a high construal level is reflected in abstract language usage, we state the following hypotheses:
Hypothesis 1a: The use of more abstract language in sections of the business case related to the IS project budget is associated with smaller budget overruns.
Hypothesis 2a: The use of more abstract language in sections of the business case related to the IS project schedule is associated with smaller schedule overruns.

Theory which suggests that a low construal level will improve estimates
There is also a theoretical perspective that suggests that a low construal level can improve planning. Under CLT it is argued that people often adopt a high construal level when planning for the future, basing their estimates on abstract representations. Trope and Liberman (2003) suggest that, since these abstract representations of the future are oversimplifications, people might ignore important details. These authors link this reliance on oversimplified representations of the future to overconfident predictions and to the planning fallacy, implying that people with a high construal level may actually be more prone to over-optimistic and thus less accurate budget and schedule estimates.
A second reason why a low construal level might lead to more accurate plans is related to the concepts of feasibility and desirability. According to CLT, when people adopt a low construal level, feasibility concerns become more important and desirability concerns become less important (Liberman & Trope, 1998). In goal-directed activities such as IS projects, the desirability of the goal is associated with a high construal level, but the feasibility of attaining this goal is associated with a low construal level (Liberman & Trope, 1998). Thus, when people adopt a high construal level, they tend to focus more on the desirability of an IS project and less on feasibility, possibly underestimating the resources required to carry out the project. This is in line with Siddiqui et al. (2014), who find that, for complex tasks, people with a low construal level are more attuned to the various steps that are required to complete the task, leading to higher duration estimates and thus a reduction of the planning fallacy. From this theoretical perspective, high construal may be associated with inaccurate estimates that will give rise to budget and schedule overruns. Thus, we state the following set of alternative hypotheses:
Hypothesis 1b: The use of more abstract language in sections of the business case related to the IS project budget is associated with bigger budget overruns.
Hypothesis 2b: The use of more abstract language in sections of the business case related to the IS project schedule is associated with bigger schedule overruns.
Our rival hypotheses are included in our theoretical model, which is depicted in Figure 1.

Research design
Our research design followed a sequential exploratory mixed-method approach in which the results of an initial qualitative analysis served as input for a subsequent quantitative analysis that was used to empirically test the relationships found during the exploratory qualitative phase (Creswell et al., 2003). As Creswell et al. (2003, p. 181) describe: "the purpose of this design is to use quantitative data and results to assist in the interpretation of qualitative findings." Specifically, linguistic analysis of the business cases of six Dutch governmental IS projects was performed in the qualitative phase using the LCM to manually code all instances of DAVs, IAVs, SAVs, SVs and adjectives (as described in Table 1) related to the budget and schedule of the project. This provided the data necessary for the quantitative phase which involved a two-way frequency analysis to determine whether there was a statistically significant relationship between budget and/or schedule overruns and the language abstraction level.

Dataset
The dataset consisted of six Dutch governmental IS projects with a budget of at least five million euros.
These projects are considered to be a relevant sample since a parliamentary investigation concluded that many such projects exhibit significant problems and that 1-5 billion euro is being wasted each year (Commission Elias, 2014). To have meaningful outcome data with which to assess budget and schedule performance, we limited our analysis to business cases of either completed or abandoned projects. Collaboration with the Dutch government enabled access to the business case documents.
Like many projects (Project Management Institute, 2018), the ones that we examined neither completely followed a waterfall approach nor were entirely agile, but were managed using hybrid methods which contained elements of both approaches. In all the IS projects studied, the schedule and budget had been estimated at the start of the project and formalised into a business case. This is a common practice for such large government IS projects in the Netherlands, where specific budgets have to be estimated and requested well in advance before they can be allocated.

Outcome variables
Budget and schedule overrun values, which constitute the outcome variables for our study, were obtained by comparing the initial estimates of the required budget and schedule at the start of the project with the actual costs and time spent at the time of completion or when the project was abandoned (Flyvbjerg, 2018; Flyvbjerg et al., 2002). Since an overrun of two million euros is a lot more significant for a project of five million euros than for a project of fifty million euros, the budget overrun was calculated as a percentage of the initial budget estimate. Similarly, the schedule overrun was calculated as the percentage difference between the initial time estimate and the actual amount of time spent on the project. Based on these percentages, the IS projects were assigned to specific categories, describing the degree of overrun on an ordinal scale (0: No overrun; 1: Overrun of 1-100%; 2: Overrun of 101-200%; 3: Overrun of more than 200%). This enabled a two-way frequency analysis of the relationship between the level of language abstraction and overruns.
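The overrun calculation and the ordinal scale described above can be sketched as follows. This is a minimal illustration rather than the procedure actually used in the study: the function names are hypothetical, and the handling of boundary values (for example, treating any positive overrun of up to 100% as category 1) is an assumption, since the text only gives the category ranges.

```python
def overrun_percent(estimate: float, actual: float) -> float:
    """Overrun expressed as a percentage of the initial estimate."""
    if estimate <= 0:
        raise ValueError("the initial estimate must be positive")
    return (actual - estimate) * 100.0 / estimate


def overrun_category(percent: float) -> int:
    """Map an overrun percentage to the ordinal scale 0-3.

    Assumed boundaries: 0 = no overrun, 1 = up to 100%,
    2 = above 100% and up to 200%, 3 = more than 200%.
    """
    if percent <= 0:
        return 0
    if percent <= 100:
        return 1
    if percent <= 200:
        return 2
    return 3


# The same two-million-euro overrun weighs differently by project size.
small = overrun_percent(5_000_000, 7_000_000)    # five-million-euro project
large = overrun_percent(50_000_000, 52_000_000)  # fifty-million-euro project
print(small, overrun_category(small))
print(large, overrun_category(large))
```

Expressing the overrun relative to the initial estimate is what makes projects of different sizes comparable on one ordinal scale.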

Coding and analysis process
Each of the business cases was coded and analysed using Atlas.ti, a computer-aided qualitative data analysis software. Since theory on the LCM relates to the abstraction level of specific words, coding was performed at the word level. This is the common practice for measuring and coding language abstraction, as described in the LCM coding manual (Coenen et al., 2006). Each instance (i.e., each word belonging to one of the LCM categories) was manually coded and assigned a score representing the abstraction level of the word in Atlas.ti, based on practices and coding rules described in the LCM coding manual (Coenen et al., 2006). While coding, we focused specifically on the sections of the business case in which the budget and schedule were discussed, and each instance was also coded to indicate whether the word appeared in a section about the budget or schedule.
Instances of concrete or abstract language usage were classified into one of five categories: DAV, IAV, SAV, SV and ADJ. Specifically, we identified all instances of DAVs, IAVs, SAVs or SVs (see Table 1), in accordance with the LCM coding manual (Coenen et al., 2006). Verbs that did not meet the criteria for any of these categories were not coded. Adjectives, as well as nouns which classified objects (e.g., my pet is a dog), were coded as an ADJ (Coenen et al., 2006).
Each DAV, IAV, SAV, SV and ADJ was assigned an abstraction score. All coded predicates were given a score between 1 and 4, where a score of 1 represents the lowest level of abstraction (i.e., the most concrete level) and a score of 4 represents the highest level of abstraction (i.e., the most abstract level). 2 Table 2 provides an overview of the abstraction score associated with each of the five LCM categories.

Two-way frequency analysis
Two-way frequency analysis using non-parametric statistical tests was performed to probe whether the relative number of observations of abstract (or concrete) language usage coded according to the LCM is associated with the degree of budget and schedule overruns. Specifically, Chi-square tests were performed to determine whether the number of observations of abstract (or concrete) language usage was higher (or lower) than one would expect if there were no relationship between the abstraction level of language usage and overruns in budget and schedule. In addition, Goodman and Kruskal's gamma was calculated to probe the direction of the effect, the effect size, and its statistical significance. The following two relationships were examined:
(1) Abstraction of language usage related to budget and overruns in budget (H1a & H1b)
(2) Abstraction of language usage related to schedule and overruns in schedule (H2a & H2b)
Tests of multi-way frequency analyses, such as the Chi-square test, generate estimates that are too conservative when the expected frequency in any cell falls below one or when the expected frequency is less than five for at least 20% of the cells (Tabachnick & Fidell, 1996; Watson & Gallois, 2002). While performing our analyses, we found that some of the expected frequencies, particularly for DAVs, fell below these thresholds. This is not surprising since usage of DAVs is relatively uncommon. Riley et al. (2014) similarly observed a very low number of DAVs in their study. Following an approach employed by Watson and Gallois (2002), who also encountered the situation of counts being too low when performing multi-way frequency analyses, the five LCM categories were collapsed into two groups. DAVs, IAVs and SAVs, which constitute the most concrete language, were put into the "concrete" category. SVs and ADJs, which constitute the most abstract language, were put into the "abstract" category. This two-level ordinal variable was used in our two-way frequency analyses.
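To illustrate the collapsing step and the direction measure, the following sketch groups the five LCM categories into the two-level variable and computes Goodman and Kruskal's gamma from concordant and discordant pairs. The function names and the counts are hypothetical and are not taken from the study's data; the significance test for gamma is omitted.

```python
CONCRETE = {"DAV", "IAV", "SAV"}
ABSTRACT = {"SV", "ADJ"}


def collapse(category: str) -> str:
    """Collapse an LCM category into 'concrete' or 'abstract'."""
    if category in CONCRETE:
        return "concrete"
    if category in ABSTRACT:
        return "abstract"
    raise ValueError(f"unknown LCM category: {category}")


def goodman_kruskal_gamma(table):
    """Gamma for a two-way table whose rows and columns are both
    ordered from the lowest to the highest level."""
    concordant = 0
    discordant = 0
    n_rows, n_cols = len(table), len(table[0])
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):   # only rows ranked above row i
                for m in range(n_cols):
                    pairs = table[i][j] * table[k][m]
                    if m > j:
                        concordant += pairs  # both variables increase
                    elif m < j:
                        discordant += pairs  # one increases, one decreases
    if concordant + discordant == 0:
        raise ValueError("gamma is undefined when there are no untied pairs")
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical counts: rows are concrete then abstract language,
# columns are overrun categories 0, 1, 2 and 3.
illustrative_counts = [
    [10, 20, 30, 40],
    [40, 30, 20, 10],
]
gamma = goodman_kruskal_gamma(illustrative_counts)
print(collapse("IAV"), collapse("ADJ"), gamma)
```

With rows ordered from concrete to abstract, a negative gamma means that abstract language tends to co-occur with lower overrun categories, and a positive gamma means the opposite.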

Outcomes of the qualitative coding process
The aim of the coding process was to identify and analyse all instances of concrete and abstract language usage in the six IS project business cases. The following four examples of the coding (which was based on Table 1) provide insight into how the coding process worked for different LCM categories and into how abstract or concrete language was used in the business cases. For the readers' sake, the texts in these examples have been translated from Dutch to English. One example from a section related to the project budget involved a warning that hiring outside expertise could lead to significant costs: "The possibility of hiring knowledge, can lead to sizeable 'out of pocket' costs." In this sentence, the verb "to hire" meets the criteria of a DAV and the two adjectives "sizeable" and "out of pocket" meet the criteria of ADJs. In addition, all three instances received a second code which indicated that they were used in a section related to the project budget.
Another example, this time illustrating an IAV, is provided in a discussion of the cost savings that the project could realise, stating that the new system would allow the organisation to "process more volume at a massive scale, at lower costs". The verb "to process" meets all the requirements of an IAV. A third example, involving an SAV, is provided in a discussion of the costs of not updating an existing system, where it was described that doing so would "burden the offices to a higher degree". The verb "to burden" is an example of an SAV. Finally, an example involving an SV is the use of the verb "to expect" in the following sentence: "For the implementation of this part, no costs are expected".
In total, we identified and assigned an abstraction score to 1,880 instances of concrete and abstract language. Table 3 provides an overview of the number of observations for each LCM category as well as information about the number of observations related to the budgets and schedules of projects. The number of IAVs and ADJs was relatively high, while the number of SAVs, DAVs and (to a lesser extent) SVs was relatively low. As described above, the LCM categories were subsequently grouped into instances of concrete language usage, consisting of DAVs, IAVs and SAVs, and abstract language usage, consisting of SVs and ADJs, forming the basis for our subsequent quantitative analysis.

Language usage related to budget overruns
Table 4 summarises the occurrence frequency of concrete and abstract language usage related to the budget overruns of projects. Its columns represent the different categories of budget overrun, as described previously. Of the six IS projects, one experienced no budget overrun, one experienced an overrun of less than 100%, two experienced an overrun of between 100% and 200% and two experienced an overrun of more than 200%. The rows provide information about (1) the actual count, that is, how many instances of abstract and concrete language were observed for the various degrees of budget overruns, and (2) the number of instances that would be expected under the assumption that there was no relationship between abstraction of language usage and budget overruns. Rather than focusing on the differences between absolute counts across categories (i.e., the columns), our interest here was in differences between the expected frequencies and the actual frequencies in the different cells. These differences provide information about the relationship between language usage and budget overruns. To make this comparison easier, Table 4 depicts both the actual count and the expected number of observations in any given cell, assuming no relationship between the two variables. Thus, if the actual counts differ significantly from the expected values, the two variables are related.
As depicted in Table 4, for projects with no overruns, more instances of abstract language were observed than would be expected, yet for high degrees of overruns in budget, the actual count of abstract language usage was lower than expected. A reverse pattern can be noticed for observations of concrete language usage. Specifically, for projects with no budget overruns or overruns under 200%, fewer instances of concrete language were observed than would be expected, yet for budget overruns of over 200%, the actual count of concrete language was higher than expected.
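The comparison of actual and expected counts can be made concrete with a short sketch. The table used here is hypothetical and does not reproduce Table 4, and the helper names are likewise illustrative. Expected counts are derived from the row and column totals, and the Chi-square statistic sums the squared differences between actual and expected counts, each divided by the expected count.

```python
# Hypothetical 2 x 4 table (not the study's Table 4).
# Columns: no overrun, <100%, 100-200%, >200%.
observed = {
    "concrete": [71, 58, 142, 177],
    "abstract": [100, 75, 150, 140],
}


def expected_counts(table):
    """Expected counts per cell under 'no relationship' (independence)."""
    rows = list(table.values())
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    n = sum(row_totals)
    return {
        label: [rt * ct / n for ct in col_totals]
        for label, rt in zip(table, row_totals)
    }


def chi_square(table):
    """Return the Pearson Chi-square statistic and its degrees of freedom."""
    expected = expected_counts(table)
    statistic = sum(
        (o - e) ** 2 / e
        for label in table
        for o, e in zip(table[label], expected[label])
    )
    n_rows = len(table)
    n_cols = len(next(iter(table.values())))
    return statistic, (n_rows - 1) * (n_cols - 1)


expected = expected_counts(observed)
statistic, df = chi_square(observed)

# Show actual minus expected for every cell, then the test statistic.
for label in observed:
    diffs = [round(o - e, 1) for o, e in zip(observed[label], expected[label])]
    print(label, diffs)
print(f"chi-square = {statistic:.2f}, df = {df}")
```

With these made-up counts, abstract language is over-represented in the no-overrun column and under-represented in the more-than-200% column, which is the kind of pattern the paragraph above describes. The statistic is then compared against the Chi-square distribution with the stated degrees of freedom; for three degrees of freedom the 5% critical value is 7.815.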
To determine if these observed differences were statistically significant, a Chi-square test was conducted, which revealed a significant effect (χ²(3) = 16.72, p < 0.01). While this indicates that abstraction level of language usage is related to budget overruns, it does not provide information about the direction of the effect. Hypothesis 1a predicts that more abstract language is related to smaller budget overruns, whereas Hypothesis 1b predicts that more abstract language is related to bigger budget overruns. To test these hypotheses, a Goodman and Kruskal's gamma test was performed, which revealed a negative relationship between abstraction of language usage and overruns in budget (γ = −0.23, SE = 0.08, p < 0.01), indicating that more abstract language is related to smaller budget overruns. Hence, Hypothesis 1a was supported and Hypothesis 1b was not.

Language usage related to schedule overruns
Table 5 provides an overview of concrete and abstract language usage related to the schedule overruns. Its columns represent the various categories of schedule overrun, and its rows show the observed frequencies of abstract and concrete language as well as the expected frequencies under the assumption of no relationship between the two variables. For schedule overruns, we used the same intervals as above, but without a more than 200% column, since no projects exhibited more than a 200% overrun. Of the six IS projects, one experienced no schedule overrun, four experienced an overrun of less than 100% and one experienced an overrun of between 100% and 200%. When there were no schedule overruns, more instances of abstract language are noticeable in Table 5 than would be expected. For high degrees of overruns in schedule, however, the actual count was noticeably lower than expected. A reverse pattern can be noticed for concrete language usage. Using a Chi-square test, a statistically significant difference between the expected and actual frequencies was found (χ²(2) = 6.73, p = 0.04). Therefore, abstraction of language usage and schedule overruns are related. Hypothesis 2a predicts that more abstract language is related to smaller overruns in schedule, whereas Hypothesis 2b predicts that more abstract language is related to bigger schedule overruns. In order to test these hypotheses, a Goodman and Kruskal's gamma test was performed and a negative relationship was found between abstraction of language usage and overruns in schedule (γ = −0.15, SE = 0.06, p = 0.01). This indicates that more abstract language is related to smaller schedule overruns, supporting Hypothesis 2a and not Hypothesis 2b.
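Because the Chi-square test is silent on direction, the sign of Goodman and Kruskal's gamma carries the interpretation in both the budget and the schedule analyses. The sketch below computes gamma from concordant and discordant pairs for an ordered table, using hypothetical counts rather than the study's data. It also shows how a gamma and its standard error translate into a two-sided p-value under the assumption of a large-sample normal approximation; the text does not state that this was the exact test used.

```python
import math


def goodman_kruskal_gamma(table):
    """Gamma = (C - D) / (C + D) for a table whose rows and columns
    are both listed from lowest to highest category."""
    concordant = discordant = 0
    n_rows, n_cols = len(table), len(table[0])
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):      # rows ranked higher than i
                for m in range(n_cols):
                    pairs = table[i][j] * table[k][m]
                    if m > j:                   # both variables increase
                        concordant += pairs
                    elif m < j:                 # one increases, one decreases
                        discordant += pairs
    return (concordant - discordant) / (concordant + discordant)


def two_sided_p(gamma, standard_error):
    """Normal-approximation p-value (an assumption, see the lead-in)."""
    z = abs(gamma / standard_error)
    return math.erfc(z / math.sqrt(2))


# Hypothetical counts. Rows: concrete (low abstraction), abstract (high).
# Columns: no overrun, <100%, 100-200%, >200%.
table = [
    [71, 58, 142, 177],
    [100, 75, 150, 140],
]
gamma = goodman_kruskal_gamma(table)
print(f"gamma = {gamma:.2f}")  # negative: more abstract, smaller overruns

# Gamma and SE as reported in the text for budget and schedule overruns.
print(two_sided_p(-0.23, 0.08))
print(two_sided_p(-0.15, 0.06))
```

Under that approximation, the reported values give roughly p = 0.004 for budget overruns and p = 0.012 for schedule overruns, which is consistent with the reported p < 0.01 and p = 0.01.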
In sum, our analysis showed a statistically significant relationship between abstraction of language usage in business cases and budget/schedule overruns. Specifically, in the business cases analysed, overruns in budget and schedule were smaller when more abstract language was used in project planning. Next, the implications and limitations of our research are discussed as well as suggestions for future research.

Implications for research
First, this study contributes to the body of knowledge on overruns in IS projects by identifying a potential early warning signal for budget and schedule overruns. This is an important contribution because budget and schedule overruns are among the most persistent problems that IS projects encounter (Cao, 2008; Conboy, 2010; Lang et al., 2013). Yet, relatively little attention has been given to identifying potential early warning signals of budget and schedule overruns. One such early warning signal, abstraction level of language usage in business cases, is proposed here and supported by our findings.
The second contribution of our study lies in extending the existing literature on the importance of language usage in IS projects (Bostrom, 1989; Chiasson & Davidson, 2012; Conboy et al., 2012). In a special issue of EJIS on opportunities for qualitative methods in IS research, Chiasson and Davidson (2012) have suggested that our understanding of IS implementation could be improved by analysing the language usage in IS documentation. In response, we explored and analysed the role of language usage abstraction in IS project business cases. Our findings suggest that language usage in business cases could provide valuable prognostic information about IS projects.
Third, our study provides further insight into the relationship between construal level and planning. As Halkjelsvik and Jørgensen (2012) pointed out, prior studies suggest that there is a link between a high construal level and more accurate estimates, but their findings are conflicting. While some theory suggests that adopting an abstract view can reduce the planning fallacy (e.g., Buehler & Griffin, 2003; Kahneman & Lovallo, 1993; Kanten, 2011; Lovallo & Kahneman, 2003; Peetz et al., 2010), there is also theory that argues that a concrete view can promote more feasible and realistic plans (Siddiqui et al., 2014; Trope & Liberman, 2003). To address this theoretical tension, we tested two sets of rival hypotheses and found that abstract language usage is associated with smaller budget and schedule overruns. This suggests that a high construal level can lead to more realistic estimates, at least in the context of IS projects. More research is needed to gain further insight into the conditions under which a high construal level is, or is not, beneficial to estimation.
The fourth contribution of our research is to the literature on language abstraction and its effect on decision making. To our knowledge, this is the first study to identify a connection between abstraction of language usage in IS projects and the planning fallacy. More broadly, this research opens the door for other scholars to apply our linguistic analysis method to gain insights into other problems plaguing IS projects. Our process, while manual, could potentially be automated once an exhaustive database of DAVs, IAVs, SAVs, SVs and ADJs has been compiled, allowing researchers to examine the effects of language abstraction in a large body of projects.
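As a rough indication of what such automation might look like, the sketch below tags words by dictionary lookup and averages their abstraction scores. Everything in it is hypothetical: the tiny lexicon reuses only the example words from the coding illustrations given earlier, the scores of 1 for DAVs, 3 for SVs and 4 for ADJs are assumed from the usual LCM convention (only the score of 2 for IAVs and SAVs is stated in the notes), and a plain lookup ignores the sentence context on which manual LCM coding relies.

```python
import re
from collections import Counter

# Assumed abstraction scores; only IAV = SAV = 2 is stated in the notes.
SCORES = {"DAV": 1, "IAV": 2, "SAV": 2, "SV": 3, "ADJ": 4}

# Hypothetical mini-lexicon built from the coding examples in the text.
LEXICON = {
    "hire": "DAV", "hiring": "DAV",
    "process": "IAV",
    "burden": "SAV",
    "expect": "SV", "expected": "SV",
    "sizeable": "ADJ",
}


def code_text(text):
    """Return the LCM category of every word found in the lexicon."""
    words = re.findall(r"[a-z]+", text.lower())
    return [LEXICON[w] for w in words if w in LEXICON]


def summarise(text):
    """Count categories and compute the mean score and concrete share."""
    codes = code_text(text)
    counts = Counter(codes)
    concrete = sum(counts[c] for c in ("DAV", "IAV", "SAV"))
    return {
        "counts": dict(counts),
        "mean_score": sum(SCORES[c] for c in codes) / len(codes) if codes else None,
        "concrete_share": concrete / len(codes) if codes else None,
    }


sample = ("The possibility of hiring knowledge can lead to sizeable costs. "
          "For the implementation of this part, no costs are expected.")
print(summarise(sample))
```

A usable tool would need a far larger, validated lexicon and some way of resolving words whose category depends on context, so this should be read as a starting point rather than a replacement for trained coders.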

Implications for practice
Budget and schedule overruns in IS projects represent a significant problem for organisations and can occur regardless of whether traditional (e.g., waterfall) or agile project management approaches are used (Lang et al., 2013; Nelson & Morris, 2014). The results of this study suggest that language usage in IS project business cases can provide early warning signals of possible budget and schedule overruns. Specifically, an abundance of concrete language might suggest over-optimistic planning that may ultimately result in budget and schedule overruns. IS practitioners could use linguistic analysis to detect signs of potential over-optimism. When such signs are detected, IS managers can benefit from reevaluating the project plans and then putting adequate project control mechanisms in place to minimise the risk of budget or schedule overruns.
IS projects often face a high degree of uncertainty. As such, the desire to provide concrete descriptions of these projects in business cases is understandable. However, there may be a downside to adopting this approach. Our findings suggest that concrete language usage in business cases of IS projects is associated with bigger budget and schedule overruns. To avoid this problem, it may be possible to use more abstract language to evoke a high construal level, which in turn may lead to more accurate plans.
Since the information in business cases plays an important role in the decision to start a project, it is also important for organisations to be aware that concrete or abstract language usage in business cases may have an effect on a range of stakeholders, including the executives (Cadle & Yeates, 2008). Riley et al. (2014) showed that investors are more willing to invest in an organisation when positive (negative) news is described using concrete (abstract) language. It is possible that executives assessing a business case can similarly be influenced by the language usage in business cases. This may lead to undertaking projects that should not be undertaken, or to not undertaking projects that should be undertaken. It thus could be worthwhile to educate decision makers about the effect that small changes in language usage can have on decisions in IS projects in order to avoid biasing decisions in ways that could be detrimental.

Limitations
As is the case with all research, this study has limitations. First, we were only able to obtain access to a limited number of large governmental IS projects, since business cases for such projects are not readily available to the public. As such, we only explored these effects in the specific context of large governmental IS projects in the Netherlands. While CLT provides no indication that such effects would be limited to this specific context, further research is needed to confirm our findings. While the pattern of results that we obtained is intriguing and suggestive of a relationship between language abstraction and project overruns, further quantitative research employing larger and more diverse samples of IS projects is warranted to determine if our results are generalisable.
Second, this study only examined language abstraction and there may be other ways in which language usage could play a role in IS projects. Third, we certainly do not mean to suggest that language abstraction in business cases is necessarily the best or only predictor of budget and schedule overruns. Yet, the relationship between language usage and overruns observed in this study could serve among other predictors as an early warning signal. Fourth, while a relationship was found to exist between business case language abstraction and project overruns, it is uncertain whether the relationship is causal in nature.
Finally, our study focused on budget and schedule overruns without investigating other important factors in project management such as quality. The reason for this is that we were unable to obtain accurate outcome data regarding project quality as the organisation we worked with did not have a systematic way of capturing and quantifying this type of outcome data.

Directions for future research
In addition to further research aimed at addressing the limitations noted above, there are a few other possibilities for future research. The existing study examined one-way written communication, but a considerable amount of communication on projects is two-way and spoken rather than written. Thus, the frequent face-to-face communication among analysts, designers, developers and users, which is key to agile IS projects, forms a particularly interesting setting for further research. Perhaps similar language usage by multiple parties could reinforce certain views about the project, or one party could be influenced by another to adopt their form of language use.
There is also value in studying language usage at various stages of IS projects, other than at the business case stage. It might be particularly interesting to study how language usage develops and changes over the course of the project. Future research could investigate whether the development of language during a project differs depending on project performance. In addition, it could be relevant to study how people in different IS project roles might use language differently. For example, project owners might use language differently than do project managers and different information might be leaked as a result.
While the current research focuses on the abstractness of language usage, there are other differences in how things are said that may leak information to, or influence, people involved in IS projects. For example, a recent study by Idan et al. (2018) found that giving similar descriptions of a situation using either verbs (e.g., "settling") or nouns (e.g., "the settlement") can influence emotions of readers and their support for specific courses of action. This suggests there may be more to the effects of verbs than only the abstraction level.
Finally, from the perspective of the LCM, all adjectives are treated as equal and as representing the same abstraction level. Yet, prior research into attribute framing has shown that using different adjectives when describing an object can lead to big differences in evaluations of that object, even when both adjectives convey the same factual information (Levin et al., 1998). A common practice of real-estate agents, for example, is to carefully choose specific adjectives to increase the attractiveness of a property (e.g., describing a tiny home as "cosy" rather than "cramped"). The choice of adjectives may thus subconsciously bias decision makers.

Conclusion
Based on the planning fallacy, CLT, and LCM, we identified in this exploratory study a potential early warning signal for overruns in IS projects: the abstraction level of language in the business case. Using a sequential exploratory mixed-methods design, a linguistic analysis of the language usage in business cases was conducted to determine whether there is a relationship between budget or schedule overruns in IS projects and abstraction of language usage. Subsequently, we performed a two-way frequency analysis to determine whether this relationship was statistically significant. Our findings indicate that concrete language usage in the business case is associated with bigger budget and schedule overruns. This suggests that the presence of predominantly concrete language usage in a business case could serve as an early warning signal for overruns in IS projects.
Our study contributes to the existing literature on IS project management by (1) identifying a potential early warning signal for overruns, (2) extending the existing literature on the importance of language usage by demonstrating that the language usage in business cases could provide valuable information about the risk of overruns, (3) expanding the discussion about the relationship between construal level and task estimates by proposing that estimates made with a high construal level are more accurate, and (4) connecting the research on abstraction of language usage to the planning fallacy and identifying an early warning indicator for overruns.
Our study opens the door for further research into the role of language usage in IS projects and specifically into how language usage is related to the planning fallacy and overruns. Additional research, incorporating a larger and richer sample of IS projects, could help to provide further statistical support to our findings. In addition, future studies could focus on different aspects of language usage in IS projects in order to discover different ways in which language may subconsciously influence decision making and/or what valuable information leakage may result from this choice of language. With this study, we hope to have provided a foundation for further research on this subject.

Notes
1. In psychology, valence refers to the perceived attractiveness of an object, event, or situation.
2. Both IAVs and SAVs were assigned an abstraction score of 2, as prescribed by the LCM coding manual (Coenen et al., 2006). Prior research has indicated that these two categories do not differ significantly in abstraction level (Semin & Fiedler, 1991).