Measuring bullying at work with the short-negative acts questionnaire: identification of targets and criterion validity

ABSTRACT The current study aims to investigate the psychometric properties of the abbreviated version of the Negative Acts Questionnaire, also known as the SNAQ (Short Negative Acts Questionnaire). A Latent Class analysis of 7,790 observations from 38 Belgian organizations demonstrated that four latent classes of respondents can be distinguished in our data: ‘not bullied’, ‘work-related criticism’, ‘occasionally bullied’, and ‘severe targets’. As with the original full version, both the occasionally bullied and the severe targets align with the theoretical definition of workplace bullying as exposure to repeated and systematic negative behavior. These two clusters differ not only in the extent to which they report bullying but also in the type of behavior they report. Whereas severe targets had a high probability of reporting social isolation, this type of behavior was more likely to be absent among the occasionally bullied group. The results from the Tukey HSD post-hoc test demonstrated that both the occasionally bullied and the severe targets experienced deteriorating health, more sickness absenteeism, and lower job satisfaction than the two other latent class clusters. Hence, the SNAQ seems to be a psychometrically sound and easy-to-use instrument to identify targets exposed to varying degrees of workplace bullying.

Since the turn of the millennium, academic and applied interest in workplace bullying has grown strongly (Einarsen, Hoel, Zapf, & Cooper, 2011), with bullying increasingly being included in surveys on the psychosocial working environment. The phenomenon crystallises and manifests itself through negative social behaviours, such as humiliating remarks, gossiping, finger pointing, or excluding employees from the social group or social activities. Negative social behaviours of these kinds are not necessarily problematic, yet become so when frequently and consistently targeted towards particular employees (Zapf, 1999).
A vast body of studies has demonstrated the multiple negative effects of workplace bullying, especially for those targeted. In a meta-analysis, Nielsen and Einarsen (2012) reported cross-sectional associations between workplace bullying, on the one hand, and mental ill health, somatisation, psychological ill health, post-traumatic stress, burnout, sleep, and strain, respectively, on the other hand, with Pearson's r varying between .23 and .37. Similar relationships exist between workplace bullying and job-related outcomes such as turnover intention, organisational commitment, and job satisfaction (Nielsen & Einarsen, 2012). When comparing targets of bullying with employees who were not bullied, the difference between both groups often exceeds one standard deviation (Einarsen, Hoel, & Notelaers, 2009; Leon-Perez, Notelaers, Arenas, Mundate, & Medina, 2014). A later meta-analysis saw 57% of the targets of bullying reporting symptoms of Post-Traumatic Stress Disorder above thresholds for caseness (Nielsen, Tangen, Idsmoe, Matthiesen, & Magerøy, 2015). Longitudinal studies support these cross-sectional findings. For instance, in a five-year prospective study, Einarsen and Nielsen (2015) found exposure to bullying to be a significant predictor of subsequent increases in mental health problems. In a three-wave longitudinal study employing a representative sample, bullied employees appeared to be twice as likely to display subsequent suicidal ideation as non-bullied employees (Nielsen, Nielsen, Notelaers, & Einarsen, 2015). Hence, it is clear that workplace bullying poses a serious threat to the health of the workforce.
Two main approaches to measuring exposure to workplace bullying exist, both of which are based on self-reports. Regarding the first approach, the self-labelling method investigates the extent to which a respondent claims to have been bullied within a certain time frame, using a single-item measure (Nielsen, Notelaers, & Einarsen, 2011). This approach seems to have a high content validity if the respondents are presented with a precise and easy-to-grasp theoretical definition of the concept (Nielsen et al., 2011). Yet, the self-labelling method has some flaws as well, as it does not offer any insight into the nature of the behaviours involved. As people may have different personal thresholds for labelling themselves as a victim, the self-labelling method comprises a very subjective approach in which personality, emotional factors, and cognitive factors may figure as potential biases (Felblinger, 2008; Lewis, 2004; Nielsen et al., 2011).
In the light of these shortcomings, the behavioural experience method has been proposed as an alternative approach (Nielsen et al., 2011). Here, irrespective of the label that respondents put on their own experiences, respondents are asked to report, within a given time period, how frequently they have been exposed to various types of negative social behaviours that are typical for bullying situations if occurring repeatedly over time. For this purpose, different inventories or scales have been developed (see Table 2 from Nielsen et al., 2011 for an extensive overview). Some scales have only been used in one single study, whereas others, such as the Leymann Inventory of Psychological Terror (LIPT; Leymann, 1990), the Negative Acts Questionnaire (NAQ/NAQ-R; Einarsen et al., 2009; Einarsen & Raknes, 1997), and the EAPA-T (Escartín, Rodríguez-Carballeira, & Zapf, 2010), have been employed in a range of studies. Nielsen (2009) concluded, however, that the NAQ-R seems to be by far the most utilised inventory, with its popularity reflected in the interest in the validation article of the NAQ-R (Einarsen et al., 2009), which by the end of 2017 had been cited more than 700 times (scholar.google.com).
The NAQ-R consists of 22 items and adapted the NAQ from Scandinavia to the American-Anglo-Saxon culture. Its items resulted from working with focus groups counting in total 61 respondents (Einarsen et al., 2009). The NAQ-R items cover different types of negative behaviour: person-oriented bullying, including social isolation, is measured with 12 items, work-related bullying with 7 items, and physically intimidating bullying with three items. Empirical research has demonstrated that physical forms of aggression may differ from workplace bullying. In some studies, these indicators had low factor loadings (Einarsen et al., 2009), even beneath a threshold of .3 (Giorgi, Arenas, & Leon-Perez, 2011). Also, studies employing Latent Class (LC) Analysis to deal with the highly skewed distributions of negative acts have found that, on the one hand, targets of workplace bullying hardly reported acts of physical aggression, while on the other hand, targets of physical aggression rarely reported bullying behaviours (Einarsen et al., 2009; Leon-Perez et al., 2014; Reknes et al., 2017).
Scholars have further questioned the NAQ-R's discriminant validity, claiming that it measures other phenomena than mere bullying. For example, Fevre, Robinson, Jones, and Lewis (2010) and Ólafsson and Jóhannsdóttir (2004) claimed that some work-related behaviours, e.g. "excessive monitoring of work" or "being given tasks with unreasonable deadlines," may not be perceived as behaviours attributed to bullying. They suggested that these behaviours should rather be seen to form part of the managerial prerogative. Specifically, they argued that managers sometimes would need to give employees other work, renege on promises given, as well as monitor work closely (cf. transactional leadership). As a counter-argument, Beale and Hoel (2011) posited that in cases where these practices are applied excessively or for personal gain, they could be considered bullying, particularly when used together with other types of bullying behaviour. Employing the two negative acts criterion (Mikkelsen & Einarsen, 2001), under which a respondent who reports having been subjected to two or more negative acts "weekly" or more often is classified as a victim, Notelaers and De Witte (2003) reported that the discriminating properties of some work-related negative social behaviours were weak. Moreover, using this criterion, half of the identified targets reported only work-related negative behaviours. The idea of lower discriminating properties of work-related negative behaviours also came to the fore in two Spanish studies (Escartín, Rodríguez-Carballeira, Zapf, Porrúa, & Martín-Peña, 2009; Rodríguez-Carballeira, Solanelles, Vinacua, García, & Martín-Peña, 2013). In these studies, respondents indicated that the most severe forms of bullying were related to emotional abuse (similar to person-oriented bullying), whereas work-related forms of bullying were of lower severity.
In their seminal work on the original NAQ scale, Einarsen and Raknes (1997) employed principal component analysis to explore the dimensionality of the scale. They separated social isolation from person-oriented bullying and from work-related bullying. In line with this, studies have indicated that targets of bullying are distinct from other classes, not only with respect to the frequency of reported negative acts but also with respect to the kind of negative acts reported. In particular, severe targets reported behaviours associated with social exclusion (Notelaers, Vermunt, Baillien, Einarsen, & De Witte, 2011). The importance of social exclusion as a major part of the phenomenology of bullying comes clearly to the fore in the social ostracism research where social isolation as such is conceived as a severe social stressor (Williams, 2007). With fMRI scans, researchers have even identified its location in the brain and demonstrated that rejection and physical pain are similar, not only in that they are both distressing but also in that they share a common somatosensory representation (Kross, Berman, Mischel, Smith, & Wager, 2011).
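The two negative acts criterion described above can be illustrated with a minimal sketch. The numeric coding of the response scale (1 = never to 5 = daily) and the function name are assumptions made for illustration; the respondents are invented and are not study data.

```python
# Hypothetical numeric coding of the frequency scale (an assumption, not taken from the source).
NEVER, OCCASIONALLY, MONTHLY, WEEKLY, DAILY = 1, 2, 3, 4, 5


def is_target_two_acts(responses, threshold=WEEKLY, min_acts=2):
    """Return True if at least `min_acts` items were reported at `threshold` or more often."""
    frequent_acts = sum(1 for r in responses if r >= threshold)
    return frequent_acts >= min_acts


# Invented respondents answering nine items.
respondent_a = [1, 1, 4, 5, 2, 1, 1, 1, 1]  # two acts reported weekly or more often
respondent_b = [2, 2, 2, 2, 4, 2, 2, 2, 2]  # only one act reported weekly or more often

print(is_target_two_acts(respondent_a))  # True
print(is_target_two_acts(respondent_b))  # False
```

Note that such a rule treats every item as equally informative, which is exactly the property that the studies cited above call into question for the work-related items.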

The present study
The NAQ-R is relatively long with its 22 items, making it difficult to integrate into theoretical and practical research that is administered via online platforms. Indicative of the need for a short version of the NAQ is the number of short NAQ-like inventories that have been proposed. The NAR-RUS (Simons, Stark, & DeMarco, 2011) was developed in a sample of nurses in a Massachusetts hospital and selected only the four NAQ items with the highest factor loadings in a principal component analysis. In Japan, the NAQ-R was adapted and reduced to 12 items without a clear rationale for omitting and including items (Takaki et al., 2009). In Italy, Giorgi and colleagues (2011) developed a 17-item NAQ based on earlier experience, thereby omitting 2 indicators of social exclusion, and one of the two items measuring violence or aggression. In Spain, a 12-item version of the NAQ-R was developed, wherein 2 items referring to social isolation and 6 indicators of work-related bullying were omitted (Jiménez, Muñoz, Gamarra, & Herrer, 2014).
Scholars may have different reasons for developing a shorter questionnaire. However, some did not provide a rationale for choosing items, while others based their decisions on statistical criteria only. We agree that there is a need for a shorter questionnaire. In the current paper, we present and investigate the psychometric properties of a short version of the Negative Acts Questionnaire (SNAQ), taking the above reasoning and empirical findings into consideration. We aimed for an equal number of indicators for each of the three distinguished forms of bullying (person-oriented, work-related, and social exclusion), while creating an overall valid measure of exposure to workplace bullying with items chosen based on the aforementioned issues and a range of statistical analyses to investigate the item properties of the original NAQ-R scale. The SNAQ was presented to a wider audience at two conferences of the International Association of Workplace Harassment and Bullying. For the sake of brevity, the analyses presented there, in both structural equation modelling and LC modelling across different samples from the UK, Belgium, and Norway, will not be described here.
In this contribution, we aim to investigate the psychometric properties of the SNAQ that was disseminated and used among scholars from different countries such as Denmark, Finland, Sweden, Lithuania, UK, Spain, France, Italy, Norway, India, Jordan, South-Korea, and New-Zealand. First, we investigate the dimensionality of the SNAQ by comparing the overall fit of the measurement models that allow for the identification of targets. Second, we examine the scale's criterion-related validity, by focusing on its predictive validity. Given the vast body of empirical research supporting the negative effects of workplace bullying with respect to health and well-being (Kivimäki, Elovainio, & Vahtera, 2000;Nielsen & Einarsen, 2012), the criterion validity of the distinguished latent classes will be investigated for job satisfaction, need for recovery, self-perceived health, somatic symptoms, sickness absenteeism, and presenteeism.

Sample and procedure
The current sample was selected from a larger sample of 101,046 respondents from 381 different Belgian organisations collected by a statistical consulting agency that specialises in the measurement of occupational stress for Belgian Health and Safety Executives who, by Belgian law, are entitled to guide organisations and employers with respect to their prevention policies regarding safety, ergonomics, health, and well-being. Between January 2008 and May 2016 these health and safety bodies measured occupational well-being with the aim of formulating occupational health prevention measures. The Short Inventory to Monitor Psychosocial Hazards (Notelaers, De Witte, van Veldhoven, & Vermunt, 2007) formed the skeleton of the survey and, depending upon the request of the organisations, extra instruments were added including the SNAQ, scales for health, sickness absenteeism, and so forth. The data were collected in different ways. In some cases, data were collected in organised group sessions to allow employees to complete a paper-and-pencil version of the questionnaire while at work. For some organisations, the paper-and-pencil version of the survey was distributed by mail (internally or externally). In others, both a paper-and-pencil self-administered survey and an electronic version were used. Finally, most participating organisations employed an electronic survey distributed to employees' e-mail addresses. Anonymous paper-and-pencil questionnaires were collected during group sessions or were returned to sealed boxes that were collected directly by the health and safety bodies. Alternatively, in many organisations, employees were given the option of returning completed questionnaires directly by mail to the specific health and safety body or to the statistical consultancy agency in a sealed envelope. No members of a surveyed organisation had access to any of the completed questionnaires, whether manually or electronically completed, thereby guaranteeing anonymity. E-mail addresses were deleted.
The final sample that was extracted from the larger database was determined by the absence or presence of criterion variables which were needed to investigate the psychometric qualities of the SNAQ. Thus, samples were only included if job satisfaction, need for recovery, self-perceived health, somatic symptoms, sickness absenteeism, and presenteeism were surveyed together with the SNAQ. The final sample consisted of 7790 employees from 38 organisations distributed across sectors as follows: industrial, e.g. manufacturing and distribution (N = 18); private service sector (N = 12); local or centralised public service providers (N = 4); health sector (N = 2); railway sector (N = 1); and school (N = 1). Forty percent of the sample was at least 45 years old. Female respondents (37%) were underrepresented. Approximately one out of four respondents had supervisory responsibilities. Two-thirds of the respondents had a tenure of 15 or more years in their current organisation. Half of the sample worked during the day, 1 out of 5 respondents worked in a fixed day shift, approximately 5% worked in night shifts, 6% worked in irregular shifts, and the remainder had other types of work arrangements. Almost 9 out of 10 respondents had a fixed contract, approximately 1 out of 10 had a temporary position, and the remainder had another type of contract. Despite the heterogeneity of the sample, it is not representative of the Belgian workforce as a whole due to the overrepresentation of men, private sector employees, supervisors and respondents with a higher education, and due to the underrepresentation of employees working only during regular office hours.

Measures
For the investigation of the criterion validity, we used both single-item measures and scales. Self-perceived health was measured with six items from the Questionnaire for Older Employees (Hellemans, 2013) using a five-point response scale ranging from "strongly disagree" to "strongly agree." Examples of items are: "I seem to get sick easier than others" and "My daily life is hindered by my health." Cronbach's α was .78.
Somatic symptoms were measured with 10 dichotomous items from the Flemish Work Monitor (Bourdeaud'hui, Janssens, & Vanderhaege, 2004). The format of the question was as follows: "During the last two weeks, did you suffer from … " for example, "neck or shoulder pain?," "headache?" or "pain in your chest or heart region?" Cronbach's α was .77.
A single item measured the evaluation of the respondents' general health by means of the following question: "How would you rate your general health over the past two weeks?" (Bourdeaud'hui et al., 2004). The response scale ranged from 1 (excellent) to 5 (bad).
Two items were used to measure sickness absenteeism: "In the past twelve months, how many times did you stay at home due to illness or an accident? (Parental leave is not considered an illness)" (response categories ranged from "not once" to "five times or more") and "In the past twelve months, how many days did you stay at home due to illness or an accident?" Presenteeism was measured with a single item: "In the past twelve months, how many times did you go to work while you should have stayed home for health reasons?" (response categories ranged from "not once" to "five times or more") (Bourdeaud'hui et al., 2004).
Both recovery need and job satisfaction were measured with five items each. Job satisfaction was measured with items such as "I dread going to work" (reverse coded) and "I'm pleased to start my day's work." Recovery need was measured with items such as "I find it difficult to relax at the end of a working day" and "Because of my job, at the end of the working day I feel absolutely exhausted." The response scale was dichotomous: "yes" - "no." Cronbach's α of the job satisfaction scale was .80 and that of recovery need was .78.
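For readers less familiar with the reliability coefficient reported for these scales, the following is a minimal sketch of how Cronbach's α can be computed from item scores. It uses the usual variance-based formula with sample variances; the function name and the toy responses are illustrative assumptions and do not reproduce the study data.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of the sum score)
    """
    k = len(rows[0])
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Invented dichotomous responses (1 = yes, 0 = no) to five items.
toy_rows = [
    [1, 1, 1, 0, 1],
    [0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1],
    [0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0],
]
print(round(cronbach_alpha(toy_rows), 2))
```

For dichotomous items, as used for recovery need and job satisfaction, this coefficient coincides with the Kuder-Richardson 20 coefficient when the same variance estimator is used throughout.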

Statistical considerations
Previous research on workplace bullying has mainly relied on standard linear regression techniques. However, this kind of approach has some significant statistical limitations that may hamper the validity of the reported findings. The first limitation concerns the response scale of the indicators in these measures. The response anchors often express a frequency of exposure like the following: "never," "occasionally," "often" ("monthly"), "weekly," and "always" (or "daily"). Strictly speaking, such response anchors do not constitute an interval scale but should rather be treated as ordinal ones. A second limitation consists of measuring responses using a frequency count (Hershcovis & Reich, 2013). This assumes that all incidents are equal in severity and interpretation, whereas it is reasonable to assume that different types of behaviours may have different consequences (Hershcovis & Reich, 2013; Hoel, Faragher, & Cooper, 2004). Third, the validity of the conclusions drawn from previous studies may be threatened. Conclusion or statistical validity is the degree to which conclusions that we reach about relationships in our data are reasonable (Trochim, 2000). Most of the earlier investigations have relied on statistical techniques that assume normally distributed data, that is to say, that the population from which they are drawn would be distributed according to a "normal" or "bell-shaped" curve. If this assumption is not true, one is likely to obtain an incorrect estimate of the true relationship (Li, 2016; Trochim, 2000; Vermunt & Magidson, 2005). In the current study, the average skewness of the items was 1.97, whereas the kurtosis was on average 4.54. Such a very skewed distribution of the prevalence of bullying is not an exception, because the average score in bullying studies is often around 1.5, using a one- to five-point rating scale. Moreover, the average standard deviation tends to be rather small as well, i.e. lower than 0.60.
Vermunt and Magidson (2005) proposed the use of Latent Class Cluster (LCC) and Latent Class Factor (LCF) analysis for such data. LC models are suitable for several reasons. First, LC analysis does not depend strongly upon distributional assumptions. Hence, LC analysis can deal with the fact that the indicators for workplace bullying are highly skewed. Second, LC analysis can also deal with count, continuous, interval, ordinal, and nominal measures. Third, LC analysis can also take into account the fact that item properties, such as item difficulty and discriminatory power of items, may diverge (Vermunt, 2001). Finally, because an LC model describes a measurement model with a categorical latent variable, it also suits the purpose of identifying victims.
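To make the skewness argument concrete, the sketch below computes moment-based skewness and excess kurtosis for a single item. The source does not state which estimator was used (statistical packages often apply small-sample corrections), so the plain moment estimators here are an assumption, and the item responses are invented to mimic a typical negative-acts item where most respondents answer "never."

```python
def skewness_and_excess_kurtosis(values):
    """Moment-based skewness and excess kurtosis (no small-sample correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return skewness, excess_kurtosis


# Invented item: 80 x "never" (1), 15 x "occasionally" (2), 3 x "monthly" (3), 2 x "weekly" (4).
toy_item = [1] * 80 + [2] * 15 + [3] * 3 + [4] * 2
skew, kurt = skewness_and_excess_kurtosis(toy_item)
print(f"skewness = {skew:.2f}, excess kurtosis = {kurt:.2f}")
```

A symmetric distribution yields a skewness of zero, so strongly positive values such as those of the toy item signal that methods assuming normality are a poor match for this kind of indicator.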
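As a reminder of what such a model looks like, the standard LC cluster model can be written as follows. The notation is ours and is not taken from the source: $X$ is the categorical latent variable with $K$ classes, $y_{ij}$ is the response of respondent $i$ to item $j$, and $J$ is the number of items. How the item response probabilities are parameterised in the present study (for instance, with ordinal restrictions) is not specified here.

```latex
% Probability of a response pattern under local independence
P(\mathbf{y}_i) = \sum_{x=1}^{K} P(X = x) \prod_{j=1}^{J} P(y_{ij} \mid X = x)

% Posterior class membership probability, used to classify respondents
P(X = x \mid \mathbf{y}_i) =
  \frac{P(X = x) \prod_{j=1}^{J} P(y_{ij} \mid X = x)}{P(\mathbf{y}_i)}
```

The product over items expresses the local independence assumption: within a class, responses to different items are assumed to be unrelated. The second expression is what allows respondents to be assigned to the class with the highest posterior probability.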

Establishing fit of LCC or factor models
Evaluating the fit of LC models is not straightforward. There are many possible indicators of fit and rules of thumb that should be taken into account. For model selection, the Bayesian Information Criterion (BIC) is used most often. Among others, McCutcheon (1987) and Hagenaars (1990) suggested selecting the model with the lowest BIC. After selecting a specific model, it is assessed whether it fits the data. A model that does not fit the data has a significant likelihood-ratio chi-squared statistic (L²). However, for very sparse tables such as the ones we have, Langeheine, Pannekoek, and Van de Pol (1996) suggested a bootstrapping procedure. In addition to statistical fit measures, it is also important to inspect local fit. An important source of information for evaluating local fit or misfit and its origin is the set of bivariate residuals (BVRs). The BVRs show how much association between each pair of indicators remains unexplained, with the 1-cluster model serving as a reference. Ideally, the BVR value should be lower than 3.84, a value which corresponds to a significant χ² with 1 degree of freedom (Statistical Innovations, 2013). However, as the L² follows a χ² distribution, the BVR is also quite sensitive to large sample sizes. Therefore, we suggest using a more relative threshold, where the reduction of the BVR should be at least 90%. To identify the origin(s) of misfit, Uebersax (2009) invited researchers to closely inspect not only the BVRs of the model with the lowest BIC but also the difference between the BVRs of different estimated LC models. This allows researchers to identify whether an additional class is the mere result of the residual associations between only a few indicators, a situation that according to Uebersax must be avoided. We heed this advice, because the large sample size in our study may hamper the power of both BIC and L² for selecting the most appropriate model (Paas, 2014). Indeed, the proper use of these statistical fit measures has only been illustrated for samples with a maximum of 500 respondents, which "leaves big data in the rain" (Paas, 2014). Finally, one assesses the quality of the classification. Here R², entropy R², and the total rate of classification errors, due to adjacent erroneous classifications, are indicators of (mis)classification. Table 1 gives an overview of the different measurement models that were estimated with Latent Gold 5 and their respective fit measures.
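The BIC-based selection rule described above can be sketched as follows. The formula used is the common log-likelihood form of the BIC; whether the reported values were computed from the log-likelihood or from L² is not stated in the source, so this is an assumption. The log-likelihoods and parameter counts are invented purely for illustration and are not results from the study.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion in its log-likelihood form: -2*LL + p*ln(N)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


# Hypothetical candidate models: number of classes -> (log-likelihood, number of parameters).
candidates = {
    1: (-52000.0, 27),
    2: (-47000.0, 37),
    3: (-45800.0, 47),
    4: (-45300.0, 57),
}
n_obs = 7790  # sample size reported in the text; all other numbers are made up

scores = {k: bic(ll, p, n_obs) for k, (ll, p) in candidates.items()}
best_k = min(scores, key=scores.get)  # the model with the lowest BIC is preferred
for k, value in sorted(scores.items()):
    print(f"{k}-class model: BIC = {value:.1f}")
print("lowest BIC:", best_k)
```

Because the penalty term grows only with ln(N), in very large samples the improvement in log-likelihood from an extra class can easily outweigh it, which is one way to see why BIC alone may keep favouring more classes.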

Model selection
According to the BIC, the LC model with the largest number of classes fitted the data best. The bootstrapped p-value of L² was significant in all nine simple LC models. The Proportion Reduction of Error (PRE) measure showed that adding more than 4 clusters to the model led to less than a 1.5% additional reduction of error. The entropy R² showed a large decrease in comparison with the 1-, 2-, and 3-cluster models. Yet, it was still close to the 0.7 rule of thumb in the 4-LC model. However, the entropy R² of the model with 5 latent classes was further away from the threshold value. The total amount of erroneous adjacent classification in the last column of the model with 5 latent classes was almost 20%. Overall, the fit statistics appeared to be too inconclusive to select an appropriate model. Therefore, the BVRs and the classification should be inspected. The 3-LC solution explained more than 95% of almost all bivariate associations of the 1-LC solution. Only the initial association between "Someone withholding information which affects your performance" and "Persistent criticism of your work and effort" was rather poorly explained by the 3-LC solution (89.5%). A further inspection of the BVRs showed that some L² values were rather high. The BVR between the two mentioned items was high (L² = 130; df = 1). In addition, other BVRs were also high, for example: the BVR between "Having insulting or offensive remarks made about your person, attitudes or your private life" and "Being shouted at or being a target of spontaneous rage" (L² = 95; df = 1), the BVR between "Being ignored or excluded" and "Being ignored or facing a hostile reaction when you approach" (L² = 72; df = 1), and the BVR between "Being ignored or facing a hostile reaction when you approach" and "Persistent criticism of your work and effort" (L² = 84; df = 1).
In the 4-LC solution most BVRs decreased. Compared to the initial associations in the 1-LC model, where no difference was made between persons, the 4-LC model explained 27 of the 36 associations for 99% or more, whereas the 3-LC model explained 20 out of 36 associations for 99% or more. However, the aforementioned BVRs still seemed rather high in the 4-LC model. The model with 5 latent classes explained all but four associations for more than 99%. Adding a sixth class could not explain much more, given that five latent classes explained almost all initial associations for 99% or more.
Since the high BVRs disappeared in the 5-LC model, it may be possible that only some negative social behaviours were related to the new class, making the latter a statistical artefact. Uebersax (2009) described this scenario and suggested relaxing the assumption of local independence.
To test the assumption of local independence, we allowed direct associations between five pairs of indicators for which the BVR of the 4-LC model was larger than 40 (L² = 40; df = 1). The last line of Table 1 portrays the fit statistics of this model and indicates that it not only had the lowest BIC but also had a bootstrapped p-value of 0.01. Hence, according to the BIC and the bootstrapping outcomes of the L², this model was the most appropriate one. The alternative strategy to relax the local independence assumption, that is, adding latent variables, is portrayed by the 2- and 3-LC confirmatory factor models. Both their BICs and BVRs were amongst the highest, which indicates a deteriorated fit.
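The counting of explained associations reported above can be sketched in a few lines. With nine indicators there are 36 item pairs, and for each pair the relative reduction of the bivariate residual is taken with respect to the 1-class model. The function names and the residual values below are hypothetical and only illustrate the bookkeeping.

```python
from itertools import combinations


def bvr_reduction(bvr_one_class, bvr_model):
    """Share of the 1-class bivariate residual that a model with more classes explains."""
    return 1.0 - bvr_model / bvr_one_class


def count_explained(baseline, model, threshold=0.99):
    """Number of item pairs whose residual association is reduced by at least `threshold`."""
    return sum(
        1 for pair in baseline
        if bvr_reduction(baseline[pair], model[pair]) >= threshold
    )


# Nine indicators give 9 * 8 / 2 = 36 pairs.
item_pairs = list(combinations(range(1, 10), 2))
print(len(item_pairs))  # 36

# Invented residuals for three pairs only (not values from the study).
baseline = {(1, 2): 1200.0, (1, 3): 900.0, (2, 3): 1500.0}
four_class = {(1, 2): 6.0, (1, 3): 130.0, (2, 3): 10.0}
print(count_explained(baseline, four_class))        # pairs explained for 99% or more
print(count_explained(baseline, four_class, 0.90))  # pairs meeting a 90% threshold
```

A pair that stays far below the threshold while most others pass it, such as the second toy pair, is the kind of local misfit that motivates inspecting individual residuals rather than relying on global fit alone.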

Meaning of the LCC
The conditional means and the cluster loadings are depicted in Table 2. The conditional means portray the average response across items given the LC cluster. In the second column, one can find the "not bullied" respondents. Their average response to the negative social behaviours was "never" (average over all items equals 1.077). Almost 32% of the sample fell within this category of not being bullied. The second LC cluster, in the third column, refers to respondents whose average response was also closest to "never" (1.38). Yet, they had a higher average for being occasionally confronted with "Someone withholding information which affects your performance," "Persistent criticism of your work and effort," and "Spreading gossip and rumours about you." Because the inspection of the conditional probabilities (which portray the probability that a respondent responded "never," "occasionally," "monthly," or "weekly or more often" to a certain item) showed that "Repeated reminders of your errors or mistakes" was also more frequently reported, we suggest labelling this LC cluster as the "infrequent criticism about your work" cluster, which fitted the experience of approximately 41% of the respondents. The average response to negative social behaviours of the third LC cluster was 1.88. Because this is close to the second response category, "occasionally," we labelled this group as "occasionally bullied," comprising 23.6% of the respondents in the current sample. The overall average of the final LC cluster appeared to be 2.9, which corresponds to the "monthly" response category. Because these respondents were frequently confronted with negative social behaviours, they were labelled as "severe targets." The prevalence of being a severe target of bullying amounts to 3.5% in this sample.
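The link between conditional response probabilities and the conditional means discussed above can be sketched as follows. The scoring of the four response categories as 1 to 4 is an assumption, and the probabilities are invented for illustration rather than taken from Table 2.

```python
def conditional_mean(probabilities, scores=(1, 2, 3, 4)):
    """Expected item score within a class: sum over categories of score * P(category | class).

    Categories are assumed to be scored 1 = never, 2 = occasionally,
    3 = monthly, 4 = weekly or more often.
    """
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("conditional probabilities must sum to 1")
    return sum(score * p for score, p in zip(scores, probabilities))


# Invented conditional probabilities for one item in two hypothetical classes.
low_exposure_class = (0.93, 0.06, 0.01, 0.00)
high_exposure_class = (0.10, 0.25, 0.30, 0.35)

print(round(conditional_mean(low_exposure_class), 2))
print(round(conditional_mean(high_exposure_class), 2))
```

Averaging such item-level means across all items of a class gives the kind of overall class average that is used above to relate each cluster to a response category.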

Criterion-related validity
After establishing the most appropriate measurement model, the LC classifications were exported to an SPSS file (see for another example: De Cuyper, Rigotti, De Witte, & Mohr, 2008). Thereafter, the criterion validity of the latent classes was assessed using a one-way ANOVA, followed by a pair-wise Tukey HSD post hoc test to discern differences between the latent classes. The one-way analysis of variance shows that all results are significant (p < .001). Hence, the between-group variance is significantly higher than the within-group variance. Furthermore, the multi-comparison procedure using Tukey's HSD pair-wise difference test (recommended given the uneven prevalence of the four groups) showed that all exposure groups were significantly (p < .01) different from one another, except for the number of days of sickness absence in the last 12 months. The number of sickness absence days for the not bullied group and for those reporting infrequent criticism of their work was similar. Table 3 also shows that all means are as expected: targets of severe bullying reported the highest score for health deterioration, showed the highest levels of presenteeism and need for recovery, and experienced the lowest level of job satisfaction.
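The comparison of between-group and within-group variance that underlies the one-way ANOVA can be illustrated with a minimal sketch. It computes only the F statistic and its degrees of freedom; the p-value and the Tukey HSD comparisons require the F and studentized range distributions, which are left to a statistics package. The group scores are invented and do not correspond to Table 3.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of individual scores around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)

    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented outcome scores for four exposure groups of unequal size.
toy_groups = [
    [1.0, 1.5, 1.2, 1.1, 1.4, 1.3],
    [1.6, 1.8, 1.5, 1.9, 1.7],
    [2.2, 2.5, 2.1, 2.4],
    [3.1, 3.4, 2.9],
]
f_stat, df_b, df_w = one_way_anova_f(toy_groups)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F value means that the class means differ by more than would be expected from the spread of scores within the classes, which is the pattern the post hoc comparisons then break down pair by pair.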

Discussion
The aim of the current study was to test the psychometric qualities of the SNAQ. We established that four LCC were sufficient to describe the associations between the nine indicators of the measurement model. A detailed inspection of the posterior conditional probabilities indicated that this 4-LC model differentiated "severe targets of bullying" well from respondents that were "occasionally bullied" or "infrequently criticised about their work," and from respondents who did not report exposure to bullying. The analysis of variance and the post hoc pair-wise comparisons clearly indicated that the average scores for somatic symptoms, self-perceived health, sickness absenteeism, presenteeism, need for recovery, and job satisfaction were indeed significantly different for the four LCC. In fact, severe targets of bullying had the worst scores, followed by occasionally bullied respondents, respondents who were infrequently criticised about their work, and from not bullied respondents, respectively.
As with the full 22-item version of the questionnaire (NAQ-R), LCA enables us to distinguish between different groups or classes that differ with respect to the frequency of the reported acts. In the "not bullied" class, the average conditional probability of reporting never having been exposed to negative social behaviours was close to 100%. The class labelled "infrequent criticism about your work" hardly reported exposure to any negative social behaviours that are not work-related. In this cluster, however, the work-related negative social behaviours were more frequently reported: the average of the three conditional probabilities of having been confronted "occasionally" with these behaviours was 0.41, whereas the corresponding average for the other items was 0.21. The cluster labelled "occasionally bullied" reported, on average, being confronted occasionally with all the included negative social behaviours. Yet, their average conditional probability of never having been exposed to these negative acts was 0.368. This relatively high average conditional probability of responding "never" was due to the high conditional probability of never having been confronted with the items related to social isolation. The conditional probability of responding "never" was approximately 0.50 or higher for "Being ignored or excluded," "Being ignored or facing a hostile reaction when you approach" and "Practical jokes carried out by people you do not get along with." Only in the fourth LC, that is, among the severe targets of bullying, were these kinds of negative social behaviours also frequently reported. For this group of respondents, the average conditional probability of responding "monthly or more often" to these four specific items was 0.45. The average conditional probability of being confronted monthly or more often with the other negative behaviours was as high as 0.68.
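Class sizes and conditional response probabilities of this kind are also what determine the class to which a respondent is assigned. The sketch below shows one common way of doing so, assuming the standard latent class assumption of local independence: the posterior probability of each class is proportional to the class size times the product of the conditional probabilities of the observed answers, and the respondent is assigned to the class with the highest posterior (modal assignment). All class sizes, probabilities, and the two-item, two-class setup are hypothetical and chosen for brevity; they are not estimates from this study.

```python
from math import prod


def posterior_class_probs(class_sizes, cond_probs, responses):
    """Posterior P(class | responses) under local independence.

    class_sizes: {class_name: prior class probability}
    cond_probs:  {class_name: [ {category: probability}, ... ]}, one dict per item
    responses:   list of observed categories, one per item
    """
    # Joint probability of class membership and the observed response pattern.
    joint = {
        c: class_sizes[c]
        * prod(cond_probs[c][j][r] for j, r in enumerate(responses))
        for c in class_sizes
    }
    total = sum(joint.values())
    # Normalise so that the posteriors sum to one (Bayes' rule).
    return {c: p / total for c, p in joint.items()}


# Hypothetical two-class, two-item model (NOT the 4-LC model of the study).
class_sizes = {"not bullied": 0.8, "severe target": 0.2}
cond_probs = {
    "not bullied": [
        {"never": 0.90, "occasionally": 0.08, "monthly+": 0.02},
        {"never": 0.90, "occasionally": 0.08, "monthly+": 0.02},
    ],
    "severe target": [
        {"never": 0.10, "occasionally": 0.30, "monthly+": 0.60},
        {"never": 0.20, "occasionally": 0.30, "monthly+": 0.50},
    ],
}

# One hypothetical respondent's answers to the two items.
responses = ["monthly+", "occasionally"]
posterior = posterior_class_probs(class_sizes, cond_probs, responses)
modal_class = max(posterior, key=posterior.get)
print(modal_class, round(posterior[modal_class], 3))
```

Although the "severe target" class is the smaller one in this toy setup, the reported answers are so much more probable under that class that it receives the higher posterior probability.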
The resemblance of these four clusters with the four clusters that are labelled in a similar manner in the original NAQ is strong (Notelaers et al., 2011): comparing the above-mentioned conditional probabilities with those presented in the 2006 and the 2011 studies clearly demonstrates the resemblance. Also, a comparison with the findings in the UK sample in the validation paper for the NAQ-R reconfirms the resemblance (Einarsen et al., 2009). As with the full NAQ-R and the NAQ, social isolation is reported frequently (that is, monthly or more often) in the severe targets' cluster. Hence, it seems that the latent class cluster model concords with the idea that there are three types of negative acts, that is, person-oriented acts, work-related acts, and negative acts that involve social exclusion. Both in previous studies and in the current one, severe targets have the highest likelihood of reporting that they have been systematically exposed to these three types of behaviours. The outcomes regarding the other clusters (that is, rare and occasional bullying) in the aforementioned studies are similar to the ones in this study: respondents reported elevated levels of exposure to work-related and person-oriented negative behaviours but hardly any or no exposure to the negative acts that involve social isolation. However, compared to the outcomes of the LC analysis on the full/long version of the NAQ(R), the current analysis of the short version did not generate the two classes of "limited negative encounters" and "work-related bullying" found when employing the long version. The 5-LC solution we mentioned in the Results section appeared to have an extra cluster that matched the "limited negative encounters" label rather well.
However, following Uebersax (2009), who explained his strategy to allow local dependencies between the residuals of indicators, we concluded that this cluster could be viewed as an artefact because it could be ascribed to the reduction in five BVRs when going from the 4- to the 5-LC model. Furthermore, unlike an LC approach to the full version of the NAQ (Notelaers et al., 2011), the "work-related bullying" cluster did not clearly emerge in the current study. This may be a result of omitting negative acts which, as others have argued, may not be perceived as bullying behaviours but may instead be considered acceptable and enacted within the managerial prerogative (Fevre et al., 2010; Ólafsson & Jóhannsdóttir, 2004).
Previously, bullying researchers have argued that there may exist different types of bullying at work. Brodsky (1976) Einarsen and Raknes (1997), the present LC solution revealed one cluster of respondents reporting exposure to work-related negative acts on an infrequent basis ("Someone withholding information which affects your performance," "Persistent criticism of your work and effort," and "Repeated reminders of your errors or mistakes") and one cluster wherein work-related behaviours were reported on a frequent basis, as were acts of personal derogation and social isolation (severe targets). From the behaviours reported by the four clusters, evidence of these three main forms of negative social behaviours was therefore found, yet not as sub-dimensions of a higher-order construct. Indeed, both confirmatory LC factor models with two and three dimensions fitted the data less well. Moreover, the polychoric correlation between the factors was over 0.90. Finally, a second-order LCF model had an even higher BIC (BIC was 99303). This implies that bullying researchers should be careful with respect to differentiating between dimensions. There are indeed different types of negative social behaviours, yet our results indicate that this does not mean that one can distinguish between different forms of bullying as such. Only severe targets have a high likelihood of experiencing them repeatedly. The latter coincides strongly with the definition of workplace bullying as repeated and systematic negative behaviour over a longer period of time, thereby supporting the validity of the SNAQ.

Limitations
In the current study, we have used a selection of Belgian organisations. The large sample was rather heterogeneous, which contributes to the generalizability of the meaning of the LC solution. However, the sample is not representative of the Belgian workforce as a whole, which hampers the generalizability of the reported prevalence rate of workplace bullying. In addition, the reported properties of the SNAQ need to be further investigated in other samples as well. Given the use of the SNAQ in other countries, we hope that scholars will report the psychometric properties in detail and will collaborate in investigating them from a cross-cultural perspective. Moreover, future work using cross-national data is needed in order to investigate the generalizability across countries. Furthermore, it must be underlined that the data were cross-sectional, which implies that one cannot yet safely conclude that bullying causes ill health as indicated by the study. Next, the criterion variables were all self-reported data, which may evoke common-method variance. In the case of need for recovery, job satisfaction and self-perceived health, it may seem straightforward that the evaluation is "in the eye of the beholder," making it difficult to circumvent common-method bias. However, as earlier research has shown that health and sickness absenteeism can be measured using more objective data, future validation may profit from research designs that use official health registries, or information from occupational physicians or general medical practitioners, to evaluate respondents' health.

Conclusion and practical implications
All in all, we argue that it is sound to claim that the current study shows systematic empirical and theoretical support for the validity of the SNAQ as a measure of exposure to severe and occasional bullying as opposed to no or less frequent exposure to negative social behaviours at work. The Short NAQ has similar properties to the full version of the NAQ. As with the full version, different groups of respondents may be identified with respect to the type and the frequency of negative social behaviours. Moreover, similarly to the outcomes regarding the full version, both the occasionally bullied and the severe targets' LC clusters aligned with the theoretical definition of workplace bullying, being exposure to repeated and systematic negative behaviour. Whereas the severe targets reported, on average, monthly exposure to negative social behaviour including social isolation, the latter type of negative social behaviours was more likely to be absent among the occasionally bullied. These respondents experienced deteriorated health, appeared to be more absent, and disliked their job much more in comparison to the others. Still, with respect to the outcomes, the difference between severe targets and occasionally bullied was almost twice as large as the difference between occasionally bullied and the remaining two classes, which added to the usefulness of LC modelling for identifying severe targets.
The ability to distinguish between different types of groups according to their level of exposure to negative social behaviours serves both theory and practice. Researchers interested in testing hypotheses on the correlates of bullying may want to focus on severe targets by using the classification probability, as proposed in this contribution, or they may focus on people not bullied and compare them with other exposure groups. In this way, they may obtain more precise estimates for their research on the relationships between antecedents and consequences of bullying (for examples of such approaches, see Notelaers, Baillien, De Witte, Einarsen, & Vermunt, 2013; Vander Elst, Notelaers, & Skogstad, early online). Practitioners, in particular those interested in risk assessment and control, may use the LC cluster solution to define the need for and the scope of primary, secondary, and tertiary interventions (Einarsen et al., 2009). When analysed together with covariates such as function and department, the SNAQ LC framework may assist in assessing risk groups in detail as well. Our findings are therefore in line with the aim of developing a short measure of workplace bullying to be employed in general surveys of psychosocial working environments and in (longitudinal) research projects, where space in questionnaires is scarce. We therefore see many uses of the SNAQ both in applied and scientific studies of working environment quality. However, this does not mean that the NAQ-R is obsolete. On the contrary, as there may be cross-cultural differences in both the prevalence and the nature of workplace bullying across countries, we would also argue that the NAQ-R is the best alternative when exploring bullying in new countries, in very new industries, or in new contexts.
Also, where researchers and practitioners suspect that organisational culture or subcultures may shape the form that workplace bullying takes, the NAQ-R seems to be the better choice as long as the cross-cultural validity of the SNAQ has not yet been assessed.