Assessing pupils’ attitudes towards religious and worldview diversity – development and validation of a nuanced measurement instrument

ABSTRACT In this article, we outline the development and validation of an instrument for the assessment of attitudes towards religious diversity. The instrument uses four parallel scales, which evaluate attitudes towards Muslims, Christians, Jews and Non-religious people. Each scale is subdivided into eight parallel profile types and tests for the acceptance of six levels of social proximity. Drawing on a sample of 281 respondents (including Muslims, Christians, Jews, Non-religious and Other) and employing item response theory, we examine the reliability and validity of the instrument and present first results. We found that respondents discriminated greatly between the eight profile types and made clear distinctions with regard to the proximity of social relationships. Relative to these distinctions within each religion, differences between the four groups Muslims, Christians, Jews and Non-religious appeared minimal. These findings suggest that the results of previous research which indicate scepticism towards Islam despite general respect for religious diversity could be due to stereotyping.


Introduction
In the context of increasingly diverse societies, reliable data on young people's attitudes towards religious plurality are one of many desiderata. How do young people perceive religions and worldviews that differ from their own? How do they feel about the increasing diversity of religions and worldviews? On the one hand, studies suggest that young people plead for respect for all religions (Yendell 2016; Francis, Penny, and McKenna 2016) and appreciate the acknowledgement of religious diversity in Religious Education and in school in general (Ipgrave 2016). On the other hand, adolescents appear to be sceptical of religious plurality, especially with regard to Muslims (e.g. Yendell 2016; Brockett, Village, and Francis 2009). These findings are in accordance with studies that report on prejudice towards Muslims (Yendell and Friedrichs 2012; Pickel 2013; Hafez and Schmidt 2015; Halm and Sauer 2017) and on Muslim pupils' experiences of discrimination (e.g. Willems 2017).
The somewhat contradictory findings suggesting both respect and scepticism call for more fine-grained analyses. Who or what is it exactly that pupils find easy to tolerate or, conversely, hard to accept? Which (real or assumed) circumstances can cause respect and openness to turn into reserve and rejection? Our study aimed at developing a measurement instrument able to gauge such nuances. In the following, we describe its multi-cycled development, examine evidence for its reliability and validity, and report first results.

The research construct
The study of attitudes is a major field in social psychology. Attitudes are conceptualised as latent constructs composed of cognitive, affective and behavioural components (e.g. Eagly and Chaiken 1993; Zanna and Rempel 1988). We focus on the affective component: we ask not whether respondents are willing to accept diversity but rather how comfortable they are with it. This focus was chosen for two reasons. First, emotion is an essential dimension of attitude, especially when it comes to religion. Second, focusing on the affective component allowed us to limit response bias due to social desirability. When questioned on a cognitive basis, many people tend to profess an attitude of acceptance towards diversity in an effort to conform to a socially desirable standard of behaviour. Our approach was to frame the conversation in such a way as to limit this effect. A similar strategy was employed by the Eurobarometer 2018 (European Commission 2018).
As to 'religious and worldview diversity', there is no one universally accepted definition of the term (Bertram-Troost 2011). In this study, we decided to focus on attitudes towards Muslims, Christians, Jews and Non-religious people, because these are the four groups that are numerically strongest and most frequently covered by the media within our intended regions of study, Europe and North America. This allowed us to work on the assumption that young people have formed attitudes towards members of the groups in question. To avoid the continuous use of the cumbersome phrase 'religious and worldview diversity', 'religious diversity' is used hereafter in a broad sense that includes non-religious worldviews such as agnosticism and atheism.
More specifically, we concentrate on the acceptance of social proximity as the content area. Our main question is this: What profile types do pupils feel comfortable interacting with, as neighbours and classmates or more intimately as friends or family members? Acceptance of social proximity is a classic theme in the debate over migration (Wark and Galliher 2007), which made it possible for us to build on previous research.

Item response theory
Item response theory (IRT), also known as latent trait theory, provides a rich array of psychometric tools to measure latent variables such as competences and attitudes. It has become state of the art in education research, particularly in the domains of math and science education (e.g. Wu and Adams 2006; Konnemann, Asshoff, and Hammann 2016). Its application to religious education research, however, has been limited so far (Benner et al. 2011; Riegel and Kindermann 2017). Unlike classical test theory (also known as 'true score theory'), in which analyses are executed on the entire test, IRT is a probabilistic approach that focuses on individual items. IRT has a number of advantages that have been driving its increased use. First of all, unbiased item estimates may be obtained from non-representative samples and, correspondingly, person estimates are independent of the specific sample of items administered to the respondents (Embretson 1996; Ayala 2013; Siddiq, Gochyyev, and Wilson 2017).
The simplest model within the IRT framework is the Rasch model (Rasch 1960). Based on the Rasch model, a number of additional models have been developed such as the Rating Scale Model (Andrich 1978) and the Partial Credit Model (PCM) (Masters 1982), which can both be applied to polytomous data. Furthermore, the unidimensional models have been expanded into a variety of multidimensional models to account for constructs consisting of several factors (Siddiq, Gochyyev, and Wilson 2017). Which of the models is used depends on the structure of the instrument (e.g. dichotomous or polytomous responses), on the research interest (e.g. measurement or explanation) and on a comparison of model fit after running analyses with different models under consideration (Boeck and Wilson 2004).
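For orientation, the two models named above can be written out in their standard textbook form. The notation is ours and is an assumption for illustration: \theta_n denotes the latent attitude of respondent n, \delta_i the difficulty of a dichotomous item i, and \delta_{ij} the difficulty of step j of a polytomous item i with categories 0, ..., m_i.

```latex
% Rasch model for a dichotomous item i
P(X_{ni} = 1 \mid \theta_n) = \frac{\exp(\theta_n - \delta_i)}{1 + \exp(\theta_n - \delta_i)}

% Partial Credit Model for a polytomous item i (the empty sum for category 0 is defined as 0)
P(X_{ni} = k \mid \theta_n) =
  \frac{\exp\!\left(\sum_{j=1}^{k} (\theta_n - \delta_{ij})\right)}
       {\sum_{h=0}^{m_i} \exp\!\left(\sum_{j=1}^{h} (\theta_n - \delta_{ij})\right)},
  \qquad k = 0, 1, \dots, m_i
```

With m_i = 1 the Partial Credit Model reduces to the Rasch model, which is why it is a natural extension for items scored in more than two ordered categories.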

Research question
The objective of our study was to establish a construct for attitudes towards religious diversity and to develop a corresponding measurement instrument. To this end, we addressed the following questions in particular: (1) What evidence can be presented to support the internal validity (e.g. reliability, item fit, internal structure) and the external validity of the instrument (e.g. correlation to background variables)? (2) Does the measurement instrument produce reliable and valid results for adolescents between 14 and 15 years of age, and between 16 and 18 years of age? (3) Is 'acceptance of religious diversity' a unidimensional or multidimensional construct? That is, do people differ in their degree of acceptance towards Muslims, Christians, Jews and the Non-religious? (4) How are attitudes towards each of the four groups (Muslims, Christians, Jews and the Non-religious) related to background variables such as religious affiliation?

Multi-cyclical development of the measurement instrument
We employed the Construct Modeling approach suggested by Wilson (2005). Construct and measurement instrument were developed in multiple cycles. In a first cycle, we conducted qualitative interviews with pupils (age 14-15, n = 2) which included items from the survey by Francis, Penny, and McKenna (2016). The interviews indicated that the precision of the items would be decisive for the reliability and validity of the instrument. This is reflected in an interview passage with a 14-year-old girl. The passage illustrates that the interviewee's responses were to a considerable degree determined by her interpretation of the items and not simply by her attitudes towards religious diversity. The phrasing of the item ('all religious groups', 'a Muslim') leads her to consider a number of different interpretations: world religions versus the flying spaghetti monster, open-minded versus restrictive Muslim.
In a second cycle, we developed a construct map, items and a scoring guide using expert panelling and cognitive think-aloud interviews (Hermisson 2017). We were able to draw on individual items from the study by Francis, Penny, and McKenna (2016, see the interview above) and from the survey by Halm and Sauer (2017), which included an item on the acceptance of Muslims as neighbours. The reported findings from the qualitative interviews prompted us to build a double differentiation into the structure of the items: (a) Instead of testing for the acceptance of 'a Muslim', a spectrum of 12 specific profiles was developed ranging from secular to fundamentalist with parallel profiles for Christians, Jews and the Non-religious. (b) In addition to 'neighbour', we included additional levels of social proximity such as friendships and family relationships to test where respondents draw the line.
In a third cycle, we conducted a pilot study (n = 114) to test the construct map and the measurement instrument with regard to its calibration, reliability and validity (Hermisson 2017). This included two think-alouds (age 14-15, n = 2) and four exit interviews (age 14-15, n = 2, age ≥25, n = 2).
In a fourth cycle, we revised the construct map on the basis of the outcomes of the pilot study and eliminated redundant items and items with poor fit. Four additional items were developed to close a calibration gap.

Item design: 32 profiles and 6 levels of social proximity
The item design process resulted in: (a) a spectrum of 4 × 8 profile types and (b) six levels of social proximity.
(a) The spectrum of 4 × 8 profile types is presented in Table 1. The four items on each of the eight levels were constructed as closely parallel as possible and often differ only with regard to stated religious affiliation (Muslim, Christian, Jew or Non-religious). This is also true for items that needed adaptation for specific religions (in particular, regarding religious clothing, Level 4). The items were developed, tested and revised with a view to the optimal calibration of the instrument. This means that the spectrum was designed to cover the full range of item difficulty. We hypothesised that item difficulty would increase from each level (1-8) to the next for each of the four scales.
(b) Regarding the levels of social proximity, we drew on the Bogardus Social Distance Scale (Wark and Galliher 2007), a well-established scale in the social sciences. Since the Bogardus Scale stems from an era of patriarchal family structures (it was created in 1924) and was designed for adult males, we adapted it for postmodern family structures, including patchwork families, and reformulated it for young people as respondents. Note that the original Bogardus Scale as well as our adaptation implies a Guttman scaling approach (Guttman 1944, 1950; Wilson and Gochyyev 2013). Guttman's scalogram is based on the assumption that a respondent who endorses a more extreme statement will also endorse all less extreme ones; e.g. a respondent who feels comfortable accepting a 'Muslim who strictly follows Islamic teaching' (m5) as a 'close friend' will also feel comfortable accepting him/her on a less intimate level (e.g. as a neighbour). Designing an instrument that conforms to such a strict framework has advantages for the statistical analysis, while requiring careful item and item-category specifications. This way the instrument allows for polytomous scoring of the items rather than five successive dichotomous scorings, one for each individual level of social proximity. In the Rasch modelling approach used in Construct Modelling (Wilson 2005), this structure is softened to a probability.
The empirical data from the pilot study (n = 114) indicated that the Guttman scale applied to four of the six levels of social proximity, while the other two levels (family relationship vs. close friendship) did not display a clear hierarchical order. Therefore, we revised the levels and the scoring so as to better conform to the Guttman structure (see Table 2). That this in fact worked was confirmed by the present study.
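The Guttman assumption described above can be made concrete with a small sketch. It checks whether a set of endorsed proximity levels forms an unbroken run from the least intimate level upward. The level names and their ordering are illustrative assumptions, not the revised scoring of Table 2, and the function name is hypothetical.

```python
# Hypothetical ordering of proximity levels, from least to most intimate.
# This ordering is an assumption for illustration only.
LEVELS = [
    "fellow citizen",
    "neighbour or classmate",
    "close friend",
    "immediate family member",
]


def is_guttman_pattern(endorsed):
    """Return True if the endorsed levels form a perfect Guttman pattern.

    A perfect pattern endorses every level up to some point and none beyond
    it, i.e. the flags read True, ..., True, False, ..., False.
    """
    flags = [level in endorsed for level in LEVELS]
    # A violation occurs when a more intimate level is endorsed although
    # the level directly below it is not.
    return all(earlier or not later for earlier, later in zip(flags, flags[1:]))


consistent = is_guttman_pattern({"fellow citizen", "neighbour or classmate"})
violation = is_guttman_pattern({"fellow citizen", "close friend"})
```

In the Rasch framework such violations are not excluded; they are simply expected to be improbable, which is the sense in which the deterministic Guttman structure is softened to a probability.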

The questionnaire
The questionnaire contained 32 items employing the 4 × 8 profile types of Muslims, Christians, Jews and Non-religious specified in Table 2. For each individual profile, respondents were asked to indicate whether they felt comfortable having the person as: a fellow citizen, neighbour or classmate, close friend, immediate family member, (future) spouse or partner, or none of the above. The formulation 'I feel comfortable' was used to gauge the affective dimension of the respondents' attitudes as discussed in Section 2.1. Table 3 presents an excerpt of the questionnaire. Both an online version and a paper-and-pencil version of the questionnaire were developed. Note that, when a respondent chose more than one category, the score was determined by the highest level of proximity chosen.
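The scoring rule for multiple selections can be sketched as follows. The numeric scores assigned to the categories are an assumption made for illustration (the actual scoring is given in Table 2), and `CATEGORY_SCORES` and `score_item` are hypothetical names.

```python
# Hypothetical category scores; the instrument's actual scoring is defined
# in Table 2 and may differ from these illustrative values.
CATEGORY_SCORES = {
    "none of the above": 0,
    "fellow citizen": 1,
    "neighbour or classmate": 2,
    "close friend": 3,
    "immediate family member": 4,
    "(future) spouse or partner": 5,
}


def score_item(selected):
    """Score one item from the categories a respondent ticked.

    If several categories were ticked, the highest level of proximity
    determines the score. An empty selection is treated as missing (None).
    """
    if not selected:
        return None
    return max(CATEGORY_SCORES[category] for category in selected)


example_score = score_item(["fellow citizen", "close friend"])
```

Taking the maximum yields one polytomous score per item, which is the form of data the Partial Credit Model expects.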

Independent variables
The background questionnaire consisted of five additional questions regarding age, gender, religious affiliation (Muslim, Christian, Jew, Buddhist, Hindu, Non-religious or Other), personal importance of one's own religion/non-religious worldviews (ranging from 'not important' to 'very important') and frequency of contacts with Muslims, Christians, Jews and Non-religious, respectively (ranging from 'a lot' to 'not at all').

Multidimensional and polytomous item response modelling
We accounted for the polytomous nature of the responses by applying the PCM (Masters 1982). To model the four correlated profile groups, we used the multidimensional version of the PCM. The Multidimensional Random Coefficients Multinomial Logit Model (MRCML) was proposed by Adams, Wilson, and Wang (1997) as the variant of the multidimensional IRT model in the Rasch modelling framework. MPlus 8 (Muthén and Muthén 1998-2017) and ConQuest 3 (Adams, Wu, and Wilson 2012) were used for the estimation of all of the measurement models presented in the paper.

Table 3. Excerpt of the questionnaire
(1) I feel comfortable having a Christian who is not practicing his/her religion (e.g. doesn't pray, doesn't go to church) as a ◽ fellow citizen ◽ neighbour or classmate ◽ close friend ◽ immediate family member (e.g. stepsister, brother-in-law) ◽ (future) spouse or partner ◽ none of the above.
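To illustrate how the Partial Credit Model turns a respondent's latent attitude into response probabilities, the following minimal sketch computes the category probabilities for one item in the unidimensional case. It is not the estimation routine of MPlus or ConQuest; the function name and the step difficulties in the example are hypothetical.

```python
import math


def pcm_category_probabilities(theta, step_difficulties):
    """Return P(X = k | theta) for k = 0..m under the Partial Credit Model.

    theta: latent attitude of the respondent (in logits).
    step_difficulties: difficulties of the m steps between adjacent categories.
    """
    # Cumulative sums of (theta - delta_j); the empty sum for category 0 is 0.
    cumulative = [0.0]
    for delta in step_difficulties:
        cumulative.append(cumulative[-1] + (theta - delta))
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(cumulative)
    weights = [math.exp(value - peak) for value in cumulative]
    total = sum(weights)
    return [weight / total for weight in weights]


# Hypothetical step difficulties for a five-category item.
steps = [-1.0, 0.0, 1.0, 2.0]
probs_low = pcm_category_probabilities(-1.0, steps)
probs_high = pcm_category_probabilities(2.0, steps)
```

A respondent with a higher latent attitude is more likely to select the higher categories, so the expected score computed from `probs_high` exceeds that from `probs_low`.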

Latent regression
We compared attitudes towards the four groups (Muslims, Christians, etc.) across background variables (e.g. gender, age, etc.) and used the so-called latent regression or item response model with manifest predictors (Mislevy 1987; Verhelst and Eggen 1989; Zwinderman 1991, 1997) to account for measurement errors in the latent variables when investigating the relationship to the background variables. We thus regressed the actual latent variable itself within the model. This approach is a better alternative to regressing predictions (e.g. EAP scores) of the four latent variables on background variables (Adams, Wilson, and Wang 1997).
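In its generic form, a latent regression of this kind can be written as below. The notation is an assumption for illustration: \theta_{nd} is the latent attitude of respondent n on dimension d (here the four scales), \mathbf{z}_n is the vector of background variables including a constant, and \boldsymbol{\beta}_d holds the regression coefficients for dimension d.

```latex
\theta_{nd} = \mathbf{z}_n^{\top} \boldsymbol{\beta}_d + \varepsilon_{nd},
\qquad
\boldsymbol{\varepsilon}_n = (\varepsilon_{n1}, \dots, \varepsilon_{n4})^{\top}
  \sim N(\mathbf{0}, \boldsymbol{\Sigma})
```

Because the regression is placed on \theta_{nd} itself rather than on a point estimate of it, the coefficients are estimated jointly with the measurement model and are not attenuated by the measurement error that point estimates carry.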

Evidence of internal validity
We applied the unidimensional PCM and the four-dimensional PCM to examine model fit and to test for the dimensionality of the instrument. As indicated in Table 4 below, the difference in deviances between the two models is 348 (16,055-15,707) with 12 (14-2) degrees of freedom, which is statistically significant at 0.005 (based on a conservative test described by Rabe-Hesketh and Skrondal 2005). The difference in deviance implies that the four-dimensional PCM fits significantly better than the unidimensional PCM. This is also supported by the Akaike Information Criterion (AIC). The number of parameters does not include the item parameters which were fixed ('anchored') to the values obtained from the dimensional alignment procedures (Schwartz, Ayers, and Wilson 2017). The parameters in the unidimensional model are the mean and variance for the single dimension. The parameters in the four-dimensional model are four means, four variances and six covariances between the four dimensions. The correlation among the four dimensions (i.e. attitudes towards Muslims, Christians, Jews, and Non-religious) is highest for Christians and Jews (estimated at 0.979) and lowest for attitudes towards Muslims and Non-religious (estimated at 0.685) (see Appendix - Table A1).
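The arithmetic behind this model comparison can be retraced with a short sketch. It uses the deviances and parameter counts quoted above and a plain chi-square reference distribution, which is a simplification: the authors apply a more conservative test (Rabe-Hesketh and Skrondal 2005). The AIC is computed here as deviance plus twice the number of estimated parameters, so the resulting values are derived from the quoted figures and are not copied from Table 4.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable.

    Uses the closed form exp(-x/2) * sum_{k < df/2} (x/2)^k / k!,
    which is valid only for an even number of degrees of freedom.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


# Values quoted in the text.
deviance_uni, params_uni = 16055, 2     # mean and variance
deviance_four, params_four = 15707, 14  # 4 means, 4 variances, 6 covariances

lr_statistic = deviance_uni - deviance_four
degrees_of_freedom = params_four - params_uni
p_value = chi2_sf_even_df(lr_statistic, degrees_of_freedom)

# AIC = deviance + 2 * number of estimated parameters (smaller is better).
aic_uni = deviance_uni + 2 * params_uni
aic_four = deviance_four + 2 * params_four
```

Even under this naive reference distribution the p-value is far below 0.005, and the four-dimensional model has the smaller AIC, in line with the conclusion stated above.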
In order to provide evidence for the internal validity of the instrument, we investigated its internal consistency (see Table 5). For the four-dimensional model, our findings show dimension-specific reliabilities of 0.95, 0.93, 0.98 and 0.95 for attitudes towards Muslims, Christians, Jews and Non-religious, respectively, and an overall reliability of 0.91. The type of reliability reported here is the Expected A Posteriori (EAP) reliability (Adams, Wu, and Wilson 2012), which is commonly used in the IRT framework. The EAP reliabilities and a Cronbach's alpha of 0.91 (Cronbach 1951) indicate a high reliability, both for the overall instrument and for each of the four individual scales. To check whether the items are well aligned with the multidimensional Rasch model, weighted mean square fit statistics were estimated for each item. Note that item and item-step parameters were fixed for the dimensional alignment of the four scales. More specifically, we used the delta-dimensional alignment (DDA) technique (Schwartz 2012), since each of the four factors has its own scale in the four-dimensional Rasch model. The item fit statistics suggest that all items fit reasonably well: for the multidimensional model, all 32 items have a weighted mean square fit below 4/3 (1.33), the value used as the acceptable upper bound (Adams and Khoo 1996), with only one of the 128 item steps falling outside this range (see Table 5).
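As a rough illustration of what a weighted mean square fit statistic measures, the sketch below computes it in its common textbook form: the sum of squared residuals divided by the sum of the model-implied variances. This is an assumption about the general form only; the software used by the authors may compute the statistic differently in detail, and the function name and example data are hypothetical.

```python
def weighted_mean_square(observed, category_probs):
    """Weighted mean square (infit) for one item.

    observed: observed category scores, one per respondent.
    category_probs: for each respondent, the model probabilities of the
        categories 0..m (e.g. obtained from a Partial Credit Model).
    """
    squared_residuals = 0.0
    model_variance = 0.0
    for score, probs in zip(observed, category_probs):
        expected = sum(k * p for k, p in enumerate(probs))
        variance = sum((k - expected) ** 2 * p for k, p in enumerate(probs))
        squared_residuals += (score - expected) ** 2
        model_variance += variance
    return squared_residuals / model_variance


# Hypothetical two-category example: residuals match the model variance exactly.
fit_value = weighted_mean_square([0, 1], [[0.5, 0.5], [0.5, 0.5]])
```

Values near 1 indicate that the observed variation matches what the model predicts; values above the 4/3 bound mentioned above point to more noise in the responses than the model expects.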
Finally, the empirical results are in accordance with the hypothesised structure of the instrument. We had hypothesised the difficulty of the profile types to increase from each level (1-8) to the next for each of the four scales (Muslims, Christians, Jews, Non-religious). Overall, the empirical results confirm this hypothesis. The values presented in Table 6 indicate the estimated item difficulties for each of the 32 items. Low values indicate easy items. For each of the scales, item difficulty increases from level to level. Consider, e.g. the Muslims scale: The estimated difficulty of m1 (= Muslim item on Level 1) is lower than m2, m2 is easier than m3 etc. Only 5 out of the 32 items breach the hypothesised structure (j3, n2, m7, c7, j7).
The fact that the empirical results are, for the most part, in accordance with the hypothesised internal structure of the instrument can be considered further evidence of its validity.
The visual representation presented in Figure 1 provides the cumulative category characteristic curves for the response choices (increasing levels of social proximity). The curves are averaged across all items and show the probability of selecting a particular category or higher (i.e. '[fellow citizen] OR higher', '[neighbour or classmate] OR higher' and so on). Note that selecting '[none of the above] OR higher' implies any category, hence the curve for this is not shown (the probability is one). Regarding the six levels of social proximity, we found that the thresholds from one level to the next are fairly equally distributed for Levels 1-5 (from 'none of the above' to 'close friend and family member'). However, the threshold to Level 6 (to (future) spouse or partner) appears to be much harder to cross, as illustrated by the curve being further apart, and has less discriminatory power as illustrated by the flatness of the curve.
This anomaly could be a valid result in accordance with the construct, as intimate personal relationships accentuate differences. However, exit interviews indicated that other reasons could play a role as well. One respondent in particular stated that she preferred to remain celibate. Therefore, in order not to compromise the validity of our instrument, we collapsed the two levels '(future) spouse or partner' and 'close friend and family member' for the analysis reported below. The spouse/partner category nonetheless yields relevant insights that are worth exploring elsewhere.

External validity evidence
The validity of the measurement instrument is also supported by correlation to the background variable 'religious affiliation'. We had hypothesised that respondents would favour relationships with others from the same religious background. As discussed in detail in Section 4.5, this was confirmed in the analysis of the data. Therefore, the background variable 'religious affiliation' is external evidence for the validity of the instrument.

Does the instrument work for adolescents?
Since the results reported so far are all based on a sample consisting of respondents across all ages, the question of how well the measurement instrument works for adolescents is yet to be examined. Evidence based on the response process as well as on the correlation with the background variable 'age' suggests that the reliability and validity of the instrument are high for adolescents aged 14 years and older. First of all, the instrument was geared to include this age group and adolescents were involved in the formative phase of the instrument (see Section 3.1 above). Second, all of the five think-alouds and two of the five exit interviews were conducted with adolescents (all age 14-15) and indicated that young people are fully capable of understanding the structure and content of the questionnaire and the individual items. This is confirmed, finally, by person-fit statistics, which assess the validity of the instrument at the individual level and are based on the consistency of an individual's item response pattern (Embretson and Reise 2000, 238). Checking for person fit allows us to identify individuals with aberrant response patterns that are due, e.g., to misunderstandings of the questionnaire or of individual items. Person-fit statistics (Table 7) indicated that all age groups, including adolescents aged 14-15 and 16-18 years, exhibited fairly consistent response patterns and are well within the expected range of less than 4/3 (1.33).

Capability of the measurement instrument
The development process resulted in a dense and complex measurement instrument. For any given sample, it allows answers to a large variety of specific research questions surrounding religious diversity. These include the following:

Individual items of a scale
What do respondents perceive as 'easy', what as 'difficult' to accept? E.g. is it easier to accept a Christian who practices only in private than a Christian who talks about his/her faith in public?

Classical item discrimination
Which items are most sensitive for measuring respondents' attitudes?

Parallel items of the four scales
For example, what is easier to accept: a hijab, a kippah, a 'Jesus saves' T-shirt or an 'Atheist' T-shirt? A Christian or an Atheist eager to convert?

Difficulty of the four scales
How comfortable are respondents with Muslims, Christians, Jews and the Non-religious in general? Is there a particular group that is discriminated against?

External variable religious affiliation
How do respondents with affiliations to individual religions or worldviews relate to religious diversity and to each one of the four groups (Muslims, Christians, Jews and Non-religious)? Is there a tendency for Christian anti-Semitism, Muslim anti-Semitism or Jewish Islamophobia?

External variables of age, gender and frequency of contact
Are young respondents more comfortable with religious diversity than older respondents? Are there gender-specific differences in the attitudes towards Muslims, Christians, Jews and the Non-religious or in the acceptance of single items (e.g. hijab)? Is personal contact correlated with a higher level of acceptance, as Allport's contact hypothesis (1954) postulates?
Due to limited space, we will only demonstrate the use of the instrument by presenting a few of our findings. Detailed results will be given elsewhere. We focus on three issues: (a) item and scale difficulty, (b) the external variable 'religious affiliation' and its effect on acceptance and (c) the religious clothing question.

Selected findings
The Wright map in Figure 2 provides a visual representation of many of our main findings. The left side of the Wright map shows the respondents' estimated mean level of acceptance for each of the four scales. Each 'X' represents 2.8 respondents. The most accepting respondents are placed at the top; the least accepting respondents are placed at the bottom. The right side of the map indicates the estimated mean difficulty for each of the 32 items. The map indicates the following:

Not 'the' Muslim or 'the' Christian but considerable diversification
Respondents varied greatly in their acceptance of the eight profile levels, ranging roughly from -2 to +4 logits. While most respondents felt comfortable having close relationships with secular Muslims, Christians, etc. (m2, c2, j2, n2), they preferred not to interact with Muslims, Christians, etc. who are against religious freedom (m8, c8, j8, n8).

Tipping point
The large gap between the items of Level 5 ('strictly following Muslim/Christian etc. teaching') and Level 6 ('arguing that any other faith or worldview is wrong') indicates that respect for other faiths and worldviews, or a lack thereof, marks a bifurcation. Acceptance diminished greatly when other belief systems were demeaned (m6, c6, j6, n6). This gap persisted even though we had aimed for a balanced calibration and revised the items accordingly after the pilot study.

Equal difficulties of the four scales
We had anticipated divergent difficulties of the four scales (attitudes towards Muslims, Christians, etc.). In the Wright map, this would have resulted in a lower distribution of the 'XXX's, say for the Muslim scale. However, contrary to our expectations, none of the four groups was perceived as significantly harder to accept by the respondents as a whole. This lack of discrimination is shown in greater detail by the mean estimates presented in Table 8. The values indicate that the maximum difference in difficulty between the individual scales is less than 0.2 logits. None of these differences is statistically significant.

Similar difficulties of parallel items
This lack of discrimination applies equally to the level of individual items represented on the right side of the Wright map. See, e.g. the four most difficult items on the top: The estimated conditional difficulty of accepting someone who is against religious freedom does not differ greatly whether the person is Muslim (m8), Christian (c8), Jew (j8) or Non-religious (n8).

Hijab easier to accept than 'Jesus saves' or 'Atheist' T-shirt
While respondents found most parallel items similarly easy or difficult to accept, there are exceptions. This is particularly true for the clothing item, which is the least parallel of the eight item levels for substantive reasons. As the Wright map and Table 7 indicate, respondents felt more comfortable accepting a kippah (j4, −1.588 logits) or a hijab (m4, −1.534 logits) than a 'Jesus saves' T-shirt (c4, −0.979 logits) or an 'Atheist' T-shirt (n4, −0.8111 logits). Thus, even though all four pieces of clothing display personal beliefs, the two rooted in traditional religious practice enjoy higher degrees of acceptance. This is due not only to favourable ratings received from the respondents' own group, but also to ratings from the other groups.
Additional findings relate to the background variable of 'religious affiliation'. Table 8 below shows the difference in attitudes towards each of the four groups Muslims, Christians, Jews and Non-religious regressed on the respondents' self-identification. Note that we ran a latent regression model in which we controlled for age and gender, but that those results are not indicative of significant differences (see Appendix - Table A2). We therefore report the values from Table 9, because here the estimates of the reference category are more directly interpretable, as discussed in the comments below. The higher the value of a given group of respondents for any given scale, the more accepting these respondents are towards members of this religion.
The values indicate the following:

Preference for one's own tribe
As discussed in Section 4.2, respondents accepted closer social relations with people from their own religious background, even when intimate relationships were disregarded. However, the differences that respondents made between the four groups are not very large in some cases (e.g. a maximum of 0.32 logits for Christian respondents, Row 2), whereas others are more pronounced (e.g. a maximum of 1.12 logits for Non-religious respondents, Row 4). Only Jewish respondents slightly favoured Non-religious profiles over Jewish ones. This is most likely due to the fact that some of the Jewish respondents from a Reformed Jewish congregation identify as cultural Jews and Atheists and proved to be fairly reserved towards the most extreme Jewish profiles.

Equality for all
Respondents as a whole were fairly balanced in their acceptance of religions and worldviews other than their own and, by and large, did not discriminate between Muslim, Christian, Jewish or Non-religious otherness. Consequently, no group was at a general disadvantage. We found neither Muslim nor Christian anti-Semitic tendencies. Muslim respondents, while fairly reserved towards the three other groups, made no significant difference in their acceptance of Christians (−0.47) versus Jews (−0.56). Christian respondents differed only slightly in their acceptance of Jews (0.37) versus Christians (0.45) and were more open to social interaction with Jews from the whole spectrum of Jewish profiles (0.37) than Jewish respondents (0.11). Similarly, we found neither Christian nor Jewish anti-Muslim tendencies: Christian respondents were found to be equally accepting of Muslims (0.14) as of the Non-religious (0.13). Jewish respondents did not differ significantly in their acceptance of Muslims (−0.37) or of Christians (−0.36). Non-religious respondents differed most pronouncedly in their acceptance of their own group (0.82) versus the other groups (−0.30, −0.27, −0.08). In contrast, the distinctions they drew between these three religious groups are minor. The estimates presented in Table A1 also suggest that the greatest divide is not between the different world religions but rather between religious and non-religious groups. This conclusion is also supported by findings on individual items.

Conclusion
We have developed a nuanced and versatile instrument, using item response theory as the psychometric approach, to measure attitudes towards religious and worldview diversity. In comparison to previous research, the instrument allows for a more fine-grained analysis. This was accomplished by testing for the acceptance of a spectrum of eight specific social relationship profiles of Muslims with parallel profiles for Christians, Jews and Non-religious across six levels of social proximity. Internal evidence (internal consistency, item fit statistics and the empirical confirmation of the hypothesised structure) as well as external evidence (the correlation to the background variable 'religious affiliation') provide evidence for the reliability and validity of the developed instrument. This also holds true for adolescent respondents (from the age of 14 years).
We found that the four-dimensional PCM fits statistically significantly better than its unidimensional version, with the lowest correlation (0.69) found between acceptance of the Non-religious and acceptance of Muslims, indicating that 'acceptance of religious diversity' is a multidimensional construct. In all four groups, respondents accepted closer social proximity with people from their own religious background. However, respondents did not discriminate significantly among the three groups other than their own. That is, the differences discovered between the four groups Muslims, Christians, Jews and Non-religious were minor. No religious group was at a general disadvantage, even with regard to the often-disputed hijab.
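For readers less familiar with the partial credit model (PCM), the following minimal sketch shows how the standard PCM turns a respondent's latent acceptance and an item's step difficulties into response-category probabilities. It is an illustration only, not the estimation code used in this study: the function names and the step difficulties are hypothetical, and the two latent values are merely borrowed from the range of estimates reported above.

```python
import math


def pcm_probabilities(theta, step_difficulties):
    """Category probabilities for one item under the partial credit model.

    theta: latent acceptance of the respondent (in logits).
    step_difficulties: one difficulty per step between adjacent categories.
    """
    # Cumulative sums of (theta - delta_k); category 0 has an empty sum of 0.
    cumulative = [0.0]
    for delta in step_difficulties:
        cumulative.append(cumulative[-1] + (theta - delta))
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(cumulative)
    weights = [math.exp(value - peak) for value in cumulative]
    total = sum(weights)
    return [weight / total for weight in weights]


def expected_score(probabilities):
    """Expected response category given the category probabilities."""
    return sum(k * p for k, p in enumerate(probabilities))


# Hypothetical step difficulties for a four-category item (assumption).
steps = [-1.0, 0.0, 1.0]
probs_low = pcm_probabilities(-0.5, steps)
probs_high = pcm_probabilities(0.8, steps)
print(expected_score(probs_low), expected_score(probs_high))
```

Under this model a higher latent acceptance always yields a higher expected response category, which is why the scale estimates can be compared directly across groups.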
In contrast, respondents discriminated strongly between the eight profile types of each scale and made clear distinctions with regard to social relationships. They specifically indicated noticeable reserve towards profile types that lack respect for other faiths or worldviews. Our data thus suggest that the distinction between the eight profile types is highly relevant.
Previous research indicated that young people, even though they profess respect for all religions (Yendell 2016; Francis, Penny, and McKenna 2016), are sceptical about religious plurality, especially with regard to Muslims (e.g. Yendell 2016; Brockett, Village, and Francis 2009). The fact that we did not discover comparable levels of discrimination towards Muslims could be due to the nature of the sample, since the data were collected in a region where diversity is valued. Comparing these results to a European sample in the future may prove interesting. However, the data suggest a more interesting hypothesis. The results of previous research may be explained by respondents associating a particular religion (e.g. Islam) with a distinct profile type (e.g. a Muslim strictly following Muslim teaching, m5) due to stereotyping. Given the considerable differences that respondents made between the eight profile types within a particular religion, even a weak association due to stereotyping can have a significant effect. Testing this interpretation is a worthwhile avenue for further research, for which the instrument provides a necessary tool.
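The arithmetic behind this hypothesis can be sketched as follows. All numbers are hypothetical and chosen only for illustration; they are not estimates from our data. We assume that acceptance depends solely on the profile type and not on the religion, and that respondents asked about a religion in general picture a 'strict' profile with some probability.

```python
def generic_acceptance(p_strict, acceptance_strict, acceptance_other):
    """Expected acceptance of a religion 'in general' when respondents
    picture a strict profile with probability p_strict and some other
    profile otherwise."""
    return p_strict * acceptance_strict + (1.0 - p_strict) * acceptance_other


# Hypothetical profile-level acceptance, identical for both religions.
ACCEPTANCE_STRICT = -1.0
ACCEPTANCE_OTHER = 0.5

# Hypothetical stereotyping: the strict profile comes to mind somewhat
# more often for one religion than for the other.
generic_a = generic_acceptance(0.30, ACCEPTANCE_STRICT, ACCEPTANCE_OTHER)
generic_b = generic_acceptance(0.15, ACCEPTANCE_STRICT, ACCEPTANCE_OTHER)

# The gap equals the difference in probabilities times the profile gap.
gap = generic_b - generic_a
print(round(generic_a, 3), round(generic_b, 3), round(gap, 3))
```

In this sketch the two religions are treated identically at the profile level, yet a generic question would show a gap between them. The gap grows with the size of the difference between profile types, which is why large within-religion differences make even a weak association consequential.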

Disclosure statement
No potential conflict of interest was reported by the authors.

Notes on contributors
Sabine Hermisson is an assistant professor at the Institute for Religious Education, Protestant Theological Faculty, University of Vienna, Austria. Her research interests include attitude research, religious diversity, theology and natural science, and qualitative and quantitative methods. In 2016/2017, she was a visiting scholar at the Pacific School of Religion and the Center for Theology and Natural Science, Berkeley.
Perman Gochyyev is a behavioural statistician and a project manager at the Graduate School of Education, University of California, Berkeley, and at the Berkeley Evaluation and Assessment Research (BEAR) Center. His research areas include psychometric and multilevel modelling, and evaluation. Prior to joining the BEAR Center, he was the Manager of Research and Analytics at Medallia's CX Strategy Research group and a research associate at WestEd.

Mark Wilson is a professor at the Graduate School of Education, University of California, Berkeley, and Director of the Berkeley Evaluation and Assessment Research (BEAR) Center. His work spans a range of issues in measurement and assessment from the development of new statistical models for analysing measurement data, to the development of new assessments in diverse subject matter areas such as science education, patient-reported outcomes and child development.

Note. Attitudes are statistically different from the reference category "Muslims" at p < .05 (*), p < .01 (**) and p < .001 (***).