A Rasch and factor analysis of an Indonesian version of the Student Perception of Opportunity Competence Development (SPOCD) questionnaire

Abstract This study is the first to investigate and validate the psychometric properties of an Indonesian translation of the Student Perception of Opportunity Competence Development (SPOCD) questionnaire, which covers five key competencies related to the mathematics learning process, namely: (1) thinking; (2) relating to others; (3) using language, symbols and text; (4) managing self; and (5) participating and contributing. The SPOCD questionnaire comprises 30 items with a 5-point Likert scale. A total of 1413 Indonesian high-school students (46.4% male, 53.6% female) from three regions (eastern, western and central Indonesia), aged 16 to 19 years and in the 10th to 12th grades, completed the SPOCD questionnaire. A Rasch rating scale model (RSM) and confirmatory factor analysis were applied to evaluate the psychometric characteristics and factor structure of the SPOCD questionnaire. The results indicated that the assumptions of unidimensionality, local independence and monotonicity of the SPOCD questionnaire were met. In the Rasch RSM assessment, the SPOCD questionnaire response set worked well, matched the threshold estimates and functioned as an appropriate model for the response categories. The SPOCD questionnaire showed excellent psychometric characteristics for use in measuring the development of competence in high-school students in Indonesia.


PUBLIC INTEREST STATEMENT
The research investigates the psychometric properties of the Indonesian translation of the Student Perception of Opportunity Competence Development (SPOCD) questionnaire in a high-school context. This was done within the framework of Rasch measurement models and confirmatory factor analysis. These models were chosen because they set strict standards for measurement quality. The study is the first validity study of this questionnaire. The findings indicated that the items in the SPOCD questionnaire were generally adequate and appropriate for the measurement of opportunities for competence development in mathematics as perceived by Indonesian students. These findings should help to facilitate use of the SPOCD questionnaire in future educational evaluation studies and encourage awareness, understanding and use of the scale among students, teachers, and decision-makers. Although the current study focused on the SPOCD questionnaire, the same method may be used to assess and validate other educational performance assessment scales.

Introduction
With a population of around 271 million people spread across some 17,000 islands, Indonesia is the fourth most populous country in the world, with more than 200 ethnic groups speaking 300 different languages and dialects (Worldometers, 2019). Such diverse population demographics are closely related to differences in student academic performance. The underperformance of Indonesian students in international assessment programs has become a cause for concern and provided an impetus for calls for major national education reform (Wihardini, 2016). One cause of this underperformance is the poor mathematical competency found in large-scale assessments and in computer-based national exams for Indonesian students, in which mathematics has been reported as being "too hard" from the student perspective (Swaragita, 2018; Zulkardi, Putri, & Wijaya, 2019). Low educational performance carries a high cost, especially in Indonesia, despite the efforts of the many involved in education to improve it.
Addressing deficiencies in mathematical competencies has become a growing concern of the Indonesian government, following the implementation of a competence-based curriculum in 2004 by the Indonesian Ministry of National Education. However, despite the government's wish to promote mathematical literacy, when the educational curriculum was changed in 2013, competency development in mathematics education was not investigated, specifically in relation to student perceptions of the mathematics learning process. This situation reveals a shortcoming in the Indonesian educational curriculum, especially when compared to many other countries.
One interesting approach to competency development was put forward by the Ministry of Education in New Zealand concerning the New Zealand education curriculum (Ministry of Education of New Zealand, 2007), where five key competencies were identified, namely: thinking; using language, symbols, and texts; managing self; relating to others; and participating and contributing. These key competencies were not simply a New Zealand Ministry of Education proposal, but had been adapted from an Organization for Economic Cooperation and Development project, and many other countries have their own version, meaning there was already a rich research base and a wealth of practical ideas that New Zealand could draw on (Ministry of Education, New Zealand, 2015).
These five key student competencies are developed through the learning process when applying various learning strategies, learning models and assessment techniques that are appropriate to the subject matter. In the learning process, teachers organize the learning environment so that students can interact with each other, with teachers and with the learning resources. Students can discuss issues and conduct self-monitoring in order to be able to understand the mathematics material well. The quality of the learning process is key to improving student competency, and any classroom learning activity can help students learn and influence their competencies (Brown, McNamara, Hanley, & Jones, 1999).
Previous studies have also examined mathematical competencies, including these five competencies. Theorizing has occurred on how competence with written mathematical symbols develops (Hiebert, 1988). Mathematical thinking was explored within the context of the Scottish Curriculum for Excellence reform (Hudson, Henderson, & Hudson, 2014). More specifically, it has been claimed that mathematics self-efficacy can be thought of as part of one of the key five competencies, managing self, which is "associated with self-motivation, a 'can-do' attitude, and with students seeing themselves as capable learners" (Bonne & Lawes, 2016). How the competency "relating to others" can be applied has been explored among New Zealand students (Tait-McCutcheon, 2014). Furthermore, another study has analyzed the collective learning process occurring in the classroom in terms of the evolution of classroom mathematical practices (Cobb, Stephan, McClain, & Gravemeijer, 2001). These studies have illustrated the importance of certain key competencies in the mathematical context. However, these competencies, specifically the five key competencies previously identified, have never been measured in students in Indonesia especially in the context of mathematics learning, due to the unavailability of relevant instruments. The availability of appropriate and readily usable measuring instruments is very important because of Indonesia's vast geographical area and the distance between large cities and remote areas. It is also essential that any potential instruments can be used by mathematics teachers in Indonesia, which precludes the use of English-language instruments as teachers in Indonesia are generally not required to speak English, especially in remote areas.
Although this kind of measurement has never been undertaken with Indonesian students and there are few measurement instruments available, we identified an appropriate measurement tool that measures the same competencies as classified in the New Zealand Education curriculum, and also an appropriate context-based measure, for mathematics. To assess the development of student competence in terms of the five key competencies in learning mathematics, a Student Perception of Opportunity Competence Development (SPOCD) questionnaire was adopted for use. The five key competencies provided the scale factors of the SPOCD questionnaire for measuring students' perceptions of development opportunities concerning mathematics competence. Although various reports have discussed research using the SPOCD questionnaire instrument, the validity of the SPOCD questionnaire in evaluating the five key competencies has not been assessed. This study is the first to address this gap. We adapted the SPOCD questionnaire for use in the Indonesian language, and sought to validate the SPOCD questionnaire and to assess its factor structure, as well as its psychometric characteristics, for the first time.
The psychometric evaluation of this type of instrument has typically been conducted using classical test theory. However, within the psychological sciences and for educational measurement, there has been a growing shift to using modern test theory, such as item response theory and the Rasch measurement model (Kean, Brodke, Biber, & Gross, 2018). A Rasch measurement model allows for investigation of many aspects of a measure. These include the response format, the fit of individual items and persons, dimensionality, local independence testing and the use of Wright maps (Andrich & Marais, 2019). A key feature of this model is the notion of parameter separation for the person and item estimates, which allows users to compare persons and items directly without reference to each other (Wright, 1968;Wright & Masters, 1982). Furthermore, many studies, while using confirmatory factor analysis (CFA) to validate the factor structure of a scale (Flora & Flake, 2017), have also used Rasch analysis "as a contrasting statistical approach" (Allison, Baron-Cohen, Stone, & Muncer, 2015). Given that CFA allows for formal statistical tests of multiple aspects of hypothesized models and provides a strong empirical basis concerning the factor structure of a model (Brown, 2015), we considered that a combination of both methods would provide relevant information on the psychometric characteristics of the SPOCD questionnaire from differing perspectives.
Consequently, the aim of this study was to investigate the dimensionality or factor structure and measurement properties of the SPOCD questionnaire, using both modern psychometrics analysis, such as CFA, and item and sample independent models, such as Rasch models. As a result of this study, we hope that the SPOCD questionnaire may subsequently be used directly by teachers to measure student perceptions of their mathematics competence without concern for the validity, reliability and score interpretative quality of the SPOCD questionnaire.
1.1. The Student Perception of Opportunity Competence Development (SPOCD) questionnaire
The SPOCD questionnaire was developed at Curtin University, Australia, as an instrument to measure students' perceptions of competency development in the mathematics learning process, although no publications have as yet resulted concerning its development. The SPOCD questionnaire covers five competencies, namely: (1) thinking: involving the use of creative and critical processes to make sense of information, experiences, and ideas; (2) relating to others: that is, listening actively, recognizing different points of view, negotiating and sharing ideas; (3) using language, symbols and text: concerning how students make meaning, that is, how they express and communicate their ideas, experiences, and information using a rich mix of language, symbols and texts, including spoken and written language, visual language such as photos and video, the symbols used in mathematics and much more; (4) managing self: that is, being self-motivated, having a can-do attitude, and understanding oneself as a learner; and (5) participating and contributing: including contributing in a group, making connections with others and creating opportunities for others in a group (Ministry of Education, New Zealand, 2007, 2015; Tait-McCutcheon, 2014).
Each competency consists of 6 items, evaluated using a 5-point Likert scale (see Appendix A for item wording). The response options are almost never, seldom, sometimes, often and almost always. In this study, we assumed that all the SPOCD questionnaire items together would constitute a single construct, which we termed opportunity for competence development. This construct allowed for a bipolar dimension (less-more), with a single line representing a single (unidimensional) construct, with the relevant person located at some point along the line. This approach enables a global score from the SPOCD questionnaire to be produced that can be understood by mathematics teachers throughout Indonesia for use as an evaluation and diagnostic tool. Higher scores would indicate that the students perceived greater opportunities to develop competence in the mathematics learning context.
The SPOCD questionnaire was translated into Bahasa Indonesia (the Indonesian language). The translation process complied with the standards set by the International Test Commission Guidelines for Test Adaptation (International Test Commission, 2018). The original English version of the SPOCD questionnaire used in this study was translated into Bahasa Indonesia by two qualified translators who are Indonesian and lecturers at our university. They are fluent in English and have postgraduate degrees from Australian universities. The Bahasa Indonesia version of the SPOCD questionnaire was then translated back into English by native English-speaking translators. A committee of experts, comprising the first author and two lecturers fluent in English with more than 15 years of teaching experience, reviewed the original English version, the English back translation, and the Bahasa Indonesia version of the SPOCD questionnaire in order to confirm an agreed and acceptable translation. The questionnaires were in paper-and-pencil format. Participation was voluntary and the purpose of the study was stated in a letter that accompanied the questionnaires. Participating persons were required to sign the informed consent form, also attached to the questionnaire.

The Rasch model
The probabilistic model proposed by Rasch (1960) revolutionized psychometrics (Mair, 2018). The Rasch model is an advanced measurement approach, which is able to overcome some limitations of classical test theory, such as a lack of control over the difficulty level of scale items and appropriate ordering of ordinal response categories (Mitchell-Parker, Medvedev, Krageloh, & Siegert, 2018). Although mathematically simple, it is extremely effective in terms of measurement, and can be used when constructing a scale that meets the highest standards of measurement (Mair, 2018). Rasch measurement models, such as the Rasch rating scale model (RSM) and the dichotomous Rasch model, are distinguished from other latent trait models by a fundamental statistical characteristic, namely, separable person and item parameters; thus providing a more complete set of statistical data. This feature makes possible "specifically objective" comparisons of persons and items, enabling comprehensive measurement (Masters & Wright, 1984).
The Rasch RSM is appropriate for assessing polytomous data using a rating scale, as employed in this study (Andrich, 1978). Where applicable, a response rating scale yields ordinal data that need to be transformed into an interval scale to be useful. A Rasch RSM generally follows the equation:

$$\ln\left(\frac{P_{nik}}{P_{ni(k-1)}}\right) = B_n - D_i - F_k$$

where $P_{nik}$ is the probability that person $n$, on encountering item $i$, would be observed in category $k$; $P_{ni(k-1)}$ is the probability that person $n$ would be in category $k-1$; $B_n$ is the latent ability level of person $n$; $D_i$ is the difficulty of item $i$ (the difficulty for respondents to endorse the item); and $F_k$ is the threshold governing the probability of being observed in category $k$ relative to category $k-1$. In this study, estimates of item difficulty ($D_i$) and respondent trait level ($B_n$) were expressed on a scale of log-odds ratios, or logits. The average logit was arbitrarily set at 0, with positive logits indicating higher than average estimates and negative logits lower than average estimates. A threshold indicates the location on the trait level scale at which a respondent had a 50/50 chance of choosing a higher category than the current one (Luo et al., 2009).
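To make the model concrete, the category probabilities it implies can be computed directly from the cumulative sum of the $B_n - D_i - F_k$ terms. The following sketch is illustrative only: the function name and the example logit values are hypothetical, not parameters from the SPOCD analysis.

```python
import numpy as np

def rsm_category_probs(theta, delta_i, thresholds):
    """Category probabilities for one item under the Rasch rating scale model.

    theta      : person ability B_n (logits)
    delta_i    : item difficulty D_i (logits)
    thresholds : Rasch-Andrich thresholds F_1..F_m (logits), shared by all items

    For category k (0..m) the cumulative logit is the sum over j <= k of
    (theta - delta_i - F_j), with the empty sum equal to 0 for k = 0.
    """
    steps = np.concatenate(([0.0], theta - delta_i - np.asarray(thresholds)))
    logits = np.cumsum(steps)              # cumulative numerator per category
    probs = np.exp(logits - logits.max())  # numerically stabilised softmax
    return probs / probs.sum()

# Example: a 5-category item with ordered (hypothetical) thresholds
probs = rsm_category_probs(theta=0.5, delta_i=0.0,
                           thresholds=[-1.5, -0.5, 0.5, 1.5])
print(probs.round(3))
```

Note that the log-odds of adjacent categories recover the equation above: `log(probs[k] / probs[k-1])` equals `theta - delta_i - thresholds[k-1]`.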
A Rasch RSM requires tenable assumptions for accurate estimation, including (1) construct unidimensionality, (2) local independence, and (3) a monotonic scale (that is, higher scores refer to higher levels of the latent construct) (de Ayala, 2009). In addition, a Rasch RSM requires that the rating scale categories increase in line with endorsement difficulty, and that the thresholds for each item are ordered (DiStefano & Morgan, 2010).

Confirmatory factor analysis (CFA)
A CFA model specifies how the observed variables are related to the latent variables. CFA is one type of structural equation model and provides a powerful method for testing a variety of hypotheses about a set of measured variables (Flora & Curran, 2004). In this study, the factorial structure of the translated version of the SPOCD questionnaire was tested using CFA. We investigated three hypothesized models, namely, first-order, five-factor and higher-order models of the SPOCD questionnaire. Analysis was conducted with Mplus version 8.3 (Muthén & Muthén, 1998), using the weighted least squares mean and variance corrected estimator. Several goodness-of-fit indices were used to check the adequacy of the hypothesized models. The chi-square test was used to evaluate each model, and the standardized root mean squared residual (SRMR) and the root mean square error of approximation (RMSEA) (Browne & Cudeck, 1993) were used to estimate the lack of fit of the model compared with a perfect model. The larger the value, the greater the misspecification: an RMSEA value of less than 0.05 is considered indicative of an adequately fitting model, and smaller SRMR values are associated with better-fitting models, with values below 0.05 considered evidence of good fit (Maydeu-Olivares & Joe, 2014). Finally, an incremental fit index, the comparative fit index (CFI) (Bentler, 1990), was also employed. The CFI enables a comparison of the fit between the specified model and a null model. For the CFI, a value of .90 or above is generally considered to indicate an acceptable model (Hu & Bentler, 1999).
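For readers who wish to verify these indices by hand, RMSEA and CFI can be computed from the chi-square statistics that the software reports, using the standard formulas (Browne & Cudeck, 1993; Bentler, 1990). The chi-square values below are hypothetical, chosen for illustration; they are not the output of the models fitted in this study.

```python
import math

def rmsea(chi2, df, n):
    # Root mean square error of approximation (Browne & Cudeck, 1993)
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def cfi(chi2_m, df_m, chi2_b, df_b):
    # Comparative fit index (Bentler, 1990): target model vs. baseline (null) model
    d_m = max(chi2_m - df_m, 0.0)
    d_b = max(chi2_b - df_b, d_m)
    return 1.0 - d_m / d_b if d_b > 0 else 1.0

# Hypothetical chi-square values for a sample of n = 1413
print(rmsea(chi2=1250.0, df=400, n=1413))   # ~0.039 -> adequate (< 0.05)
print(cfi(1250.0, 400, 20000.0, 435))       # ~0.957 -> acceptable (> .90)
```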

Rasch analysis
An analysis was conducted employing the Rasch RSM to calibrate the SPOCD questionnaire items, using the Winsteps (v. 3.65) statistical package (Linacre, 2008). Person and item parameters were estimated using joint maximum likelihood estimation. In order to establish the psychometric properties of each of the subscales, the following analyses were performed: (1) an analysis of the dimensionality of each version to check for the expected unidimensionality in the Rasch RSM, using principal component analysis of the residuals (PCAR); (2) testing of the local independence assumption using the Q3 statistic; (3) a verification of the fit of each of the questions to the model, taking into account the mean square (MNSQ) statistics under the Rasch RSM; (4) a verification of the adequacy of the response categories to establish the functioning of their order and their rating scale discrimination; (5) a verification of reliability for persons and items; and (6) a construction of a graphical representation using a Wright map and data concerning the test information function of the scales.

Dimensionality
In testing the unidimensionality assumption of the measuring instrument, PCAR was performed (Chou & Wang, 2010; Smith, 2002). According to PCAR criteria, a test can be concluded to measure a single dimension when the variance explained by the measure is >30% (Linacre, 1998). The test employed here met this criterion, with 35.6% of the variance (16.6 in eigenvalue units) explained by the measure, indicating a single dimension (unidimensionality). The measurement model of the SPOCD questionnaire thus proved to be unidimensional, confirming that the Rasch RSM assumption of unidimensionality had been met and that further analysis was worthwhile. This finding is in line with the results concerning the factor structure following the CFA, as noted earlier.

Local independence
The Rasch RSM is based on the specification of "local independence". Local independence means that, for examinees of a given proficiency level, performance on one item is independent of performance on any other item (Mair, 2018). After the assumption of unidimensionality was shown to have been met, the assumption of local independence was tested using the Q3 statistic (Yen, 1984). Using the Q3 criterion that the raw residual correlation between any pair of items should not exceed 0.30 (Christensen, Makransky, & Horton, 2017; Das Nair, Moreton, & Lincoln, 2011), no items were found to exhibit local dependence. The highest raw residual correlation was between items 29 and 30, at 0.28, which is < 0.30. In other words, the assumption of local independence was met in this study.
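The Q3 check can be reproduced from a fitted model's residuals: Q3 is simply the correlation matrix of the item residuals (observed minus model-expected scores). A minimal sketch follows, with toy data standing in for real Rasch-model expected scores; the function name is illustrative.

```python
import numpy as np

def q3_matrix(observed, expected):
    """Yen's Q3: correlations between item residuals (observed - expected).

    observed, expected : (n_persons, n_items) arrays, where expected scores
    come from the fitted Rasch model. Off-diagonal values > 0.30 flag
    possible local dependence between item pairs.
    """
    residuals = observed - expected
    return np.corrcoef(residuals, rowvar=False)

# Toy illustration with random responses (real use: Rasch expected scores)
rng = np.random.default_rng(0)
obs = rng.integers(1, 6, size=(200, 5)).astype(float)
exp = np.full_like(obs, obs.mean())
q3 = q3_matrix(obs, exp)
flagged = [(i, j) for i in range(5) for j in range(i + 1, 5)
           if abs(q3[i, j]) > 0.30]
print(flagged)
```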

Item fit
Item fit measures such as infit and outfit MNSQ statistics can also be used to determine how well each item contributes to defining one common construct as evidence of scale unidimensionality. An infit or outfit MNSQ value of 1 is considered ideal according to Rasch RSM specifications, and values in the range of 0.5-1.5 are effective for measurement (Andrich & Marais, 2019;Bond & Fox, 2015).
All 30 items of the SPOCD questionnaire were found to be within the acceptable range (0.5-1.5) on the infit and outfit MNSQ statistics. In addition, the point-measure correlations for the SPOCD questionnaire ranged from 0.44 to 0.74, as set out in Table 2, indicating that all the items functioned in the same direction (Bond & Fox, 2015). This result further supported the findings of the Rasch RSM in this study. In general, the Rasch item-fit statistics supported the unidimensionality of the SPOCD questionnaire scale.
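Under the Rasch model, outfit is the unweighted mean of squared standardized residuals and infit is the information-weighted version. A minimal sketch of both statistics, assuming the model-expected score and variance for each person-item encounter are available from a fitted model (the function name is illustrative):

```python
import numpy as np

def mnsq_fit(observed, expected, variance):
    """Infit and outfit mean-square statistics per item.

    observed, expected, variance : (n_persons, n_items) arrays, where
    expected and variance are the model-implied score mean and variance
    for each person-item encounter. The expectation of both statistics
    is 1.0; values of 0.5-1.5 are productive of measurement.
    """
    sq_resid = (observed - expected) ** 2
    # Outfit: mean of squared standardized residuals (outlier-sensitive)
    outfit = np.mean(sq_resid / variance, axis=0)
    # Infit: information-weighted mean square (sensitive to on-target misfit)
    infit = sq_resid.sum(axis=0) / variance.sum(axis=0)
    return infit, outfit
```

In Winsteps output, items whose infit or outfit MNSQ drifts far above 1 add noise to the measure, while values far below 1 indicate redundancy.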

Rating scale diagnostics
In practice, most rating scales tend to be unequally spaced rather than conforming to the assumption of equal spacing between points in a response set. Rasch measurement diagnostics were used to evaluate how well the SPOCD questionnaire's response set functioned to create an interpretable measure (Kim & Kyllonen, 2006). For each item, we examined the number of endorsements, the shape of the distribution of endorsements, and the MNSQ statistics of each response for the SPOCD questionnaire (Table 3). As presented in Table 3, the distribution of the observed frequencies was negatively skewed, with no more than 7% of the total endorsements falling into the first category (almost never). However, there was no threshold disordering, because the thresholds were ordered from negative to positive. This conclusion was further supported by the fact that none of the infit and outfit MNSQ measures was greater than 2 (Linacre, 2010), indicating that no noise was introduced into the measurement process. Based on this information, we concluded that the SPOCD questionnaire's response set functioned well.

The results of the analysis of the scale are presented graphically in Figure 1, which shows the category response function of the SPOCD questionnaire. The graph shows the recommended pattern, in which each scale category was the most probable response at some ability level. In summary, the analysis indicated that the overall rating scale functioned appropriately.

Note: Infit & outfit: a mean-square statistic. This is a chi-squared statistic divided by its degrees of freedom. Its expectation is 1.0. Values of 0.5-1.5 are productive of measurement.
PTMEA: an observed point-measure correlation. Negative reported PTMEA correlations suggest that the orientation of the scoring on the item may be opposite to the orientation of the latent variable.

Reliability
In the Rasch RSM, reliability is estimated for both persons and items. The person separation reliability (Wright & Masters, 1982), which estimates how well the instrument differentiates persons on the measured variable, was 0.91 for the SPOCD questionnaire. The person separation index, which estimates the spread of persons on the measured variable in standard error units, was 3.17. This value indicated good separation among persons. Reliability and separation for items, estimated in the same manner as for persons, were 0.99 and 10.56, respectively, indicating excellent psychometric characteristics for the SPOCD questionnaire. A separation index of 1.5 is considered sufficient to carry out an individual-level analysis, and a magnitude of > 2.5 is considered sufficient to conduct a comparative analysis at the group level (Tennant & Conaghan, 2007); these criteria were met in this study.
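The reported separation and reliability values are linked by a simple algebraic relationship (Wright & Masters, 1982), which can be used to check the figures above:

```python
import math

def separation_from_reliability(r):
    # Separation index G = sqrt(R / (1 - R)) (Wright & Masters, 1982)
    return math.sqrt(r / (1.0 - r))

def reliability_from_separation(g):
    # Inverse relationship: R = G^2 / (1 + G^2)
    return g * g / (1.0 + g * g)

# Values reported for the SPOCD questionnaire
print(round(separation_from_reliability(0.91), 2))   # 3.18 (reported: 3.17)
print(round(reliability_from_separation(10.56), 2))  # 0.99
```

The tiny discrepancy between 3.18 and the reported 3.17 reflects rounding of the reliability to two decimal places.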

Wright map
To investigate the sixth research question, we evaluated the items in the SPOCD questionnaire to determine which behaviors were "harder" for students to endorse. Calibrated scores for both persons and items in relation to the opportunity for mathematics competence development construct are provided in Figure 2, plotted using a Wright map (Wilson & Draney, 2002). A Wright map combines basic construct map concepts with a Rasch RSM to produce an effective map for interpreting the results of measurements undertaken. In such a map, the overall results for persons and items can be easily compared. The Wright map depicting the results of the Indonesian version of the SPOCD questionnaire can be seen in Figure 2.

Note: Threshold: Rasch-Andrich thresholds. These are the relationships between adjacent categories, and correspond to the points where adjacent category probability curves cross. Observed count and %: the count of occurrences of this category. Infit & outfit: a mean-square statistic. Its expectation is 1.0. Values of 0.5-1.5 are productive of measurement.
From Figure 2, it is clear that the least endorsed item was 17: "Pekerjaan yang saya lakukan dalam pembelajaran matematika ditampilkan di dalam kelas dalam bentuk grafik, poster, dan bentuk lainnya (Work that I do in mathematics is displayed around the room on charts, posters and in other ways)". In contrast, the most endorsed item was 29: "Saya bekerja dalam kelompok matematika, dengan teman sebangku, dan teman sekelas (I sometimes work in a mathematics group, sometimes with a partner and sometimes with the whole class)". The mean person measure was 0.43 [standard deviation (SD) = 0.74], suggesting that the average student's opportunity for competence development in mathematics was higher than the average item difficulty of the scale (zero). Furthermore, the person measures ranged from −1.70 to 5.86, which exceeded the item difficulty range (−0.79 to 0.56).

Test information function
Once the relevant information has been determined for each item, it is of considerable interest to find the point along the ability scale at which a test provides the most precise measure of ability. The test information function (TIF) at ability level θ is simply the sum of the individual item information functions (Doran, 2005); the TIF of the scale can be seen in Figure 3. Figure 3 shows the measurement information function of the SPOCD questionnaire. The findings indicated that the information function was highest at moderate opportunity levels for competence development, where the information obtained from the measurements was considerable. At high opportunity levels for competence development, the information obtained from the measurements was relatively low, as it was at low latent trait levels. These results showed that the SPOCD questionnaire produced optimal information when given to individuals with moderate trait levels.

Discussion and conclusion
The need for more effective competence development measures has increased. In this study, we used the Rasch RSM and CFA to assess and validate the SPOCD questionnaire. The findings indicated that the items in the SPOCD questionnaire were generally adequate and appropriate for measuring opportunities for competence development in mathematics as perceived by Indonesian students. The Rasch RSM proved particularly useful in assessing and validating the SPOCD questionnaire. In addition, the investigation of the construct validity of the SPOCD questionnaire using CFA provided evidence for a higher-order dimensional structure of the scale, after comparing goodness-of-fit indices with those of the one- and five-factor models.
The results of the validation analyses using the Rasch RSM indicated that the SPOCD questionnaire had acceptable item and person reliability. Even a perfectly unidimensional scale will not be useful in practice if the resultant scale score has unacceptably low reliability (Gerbing & Anderson, 1988); the acceptable Rasch reliabilities of the SPOCD questionnaire make its scores useful in practice. The SPOCD questionnaire also demonstrated responsiveness to competence developmental change and was able to differentiate between students giving differing levels of responses. The Rasch RSM analysis offered further support to these findings through an additional analysis of the performance of each item and response category of the SPOCD questionnaire. The Rasch RSM assessment indicated that all the SPOCD questionnaire items had acceptable infit and outfit MNSQ statistics, suggesting that, in general, the SPOCD questionnaire items were consistent with the measurement of a single underlying construct. The CFA results supported a similar conclusion, as the higher-order model fit the data reasonably well.

Figure 3. Test information function of the SPOCD questionnaire. This shows the Fisher information for the test (set of items) at each point along the latent variable. The test information function reports the "statistical information" in the data corresponding to each score or measure on the complete test.
Person measures from the Rasch analysis provide a single score representing opportunities for competence development in mathematics. A researcher can convert the total ordinal scores into interval-level data based on person estimates from the Rasch model (Andrich & Marais, 2019); Indonesian teachers have had guidelines on, and have been trained to carry out, such conversions since the mid-1970s, under the Ministry of Education and Culture, Republic of Indonesia (Nasoetion et al., 1976). Although the original conception of the Rasch model and the results of our analyses support the use of the SPOCD to produce a single score, some researchers may wish to produce separate scores for the five subscales. The good psychometric properties of the higher-order CFA model of the SPOCD, which modeled the individual subscales as first-order factors, suggest that it would be reasonable to use the subscale factor scores if needed (cf. Brown & Croudace, 2015). However, it should be noted that the scores generated from the CFA (i.e., EAP or MAP scores) differ from the Rasch measures because of the different estimators and models involved (de Ayala, 2009). Further work is needed to explore analytical procedures that use subscale factor scores in relation to other variables.
The Rasch RSM assessment also revealed adequate performance of the five SPOCD questionnaire response categories, through showing that: 1) the average measure estimate for each response category increased monotonically and in the expected direction as the response categories moved from lower to higher categories; 2) the thresholds of the adjacent response categories increased monotonically and in the expected direction, and; 3) each of the five response categories showed acceptable infit and outfit MNSQ statistics. This was one of benefits of using the Rasch model (Tennant & Conaghan, 2007).
Some limitations of this study were that, first, although the sampled individuals were representative of the high school student populations from which they were selected, the use of a nonprobability sampling approach may not provide an accurate representation of this population in the study areas. However, the use of Rasch RSM analysis was likely to have overcome this issue as its methodology does not depend on the sampling involved, enabling generalization concerning the effective measurement properties of the SPOCD questionnaire. Second, more research is needed to fully understand the gender differences that come into play when considering opportunities for competence development in mathematics.
In summary, the application of the Rasch RSM and CFA to the SPOCD questionnaire helped to confirm the accuracy and usefulness of the SPOCD questionnaire summary scores, because all individuals have their own standard errors, and provided this scale with additional psychometric support. The findings from this study should help to facilitate use of the SPOCD questionnaire in future educational evaluation studies and encourage awareness, understanding and use of the scale among students, teachers, and decision-makers. Although the current study focused on the SPOCD questionnaire, the same method may be used to assess and validate other educational performance assessment scales.

Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
1 Research and Evaluation of Education Department, Universitas Negeri Jakarta, Jakarta, Indonesia. 2 Faculty of Psychology, Gadjah Mada University, Yogyakarta, Indonesia. 3 School of Education, Curtin University, Western Australia.

Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Appendix A (excerpt)
When our teacher is working with another group, I work considerately alongside others, completing the tasks that are set.
I am encouraged to read about maths and to get my maths ideas from lots of different places.
My teacher shows us lots of different ways of recording our maths thinking and solutions, using words, symbols and diagrams.
In our group lessons our teacher writes maths problems and records our ideas, and this helps me to understand better.
I talk a lot with others about what maths words and symbols mean.
Work that I do in maths is displayed around the room on charts, posters and in other ways.
I am encouraged to write and present my maths ideas in lots of different ways.
Managing self
In maths I always know what I am learning and why.
In maths, when the teacher is working with another group, I am expected to work independently and to take responsibility for my own learning.
When I can choose in maths, I know I need to make sensible decisions about what activity will challenge me, not just choose the easy ones. (Continued)