The Finnish Version of the Affinity for Technology Interaction (ATI) Scale: Psychometric Properties and an Examination of Gender Differences

Abstract
The pervasiveness of technical systems in our lives calls for a broad understanding of the interaction between humans and technology. The affinity for technology interaction (ATI) scale measures a person's tendency to actively engage in or to avoid interaction with technological systems, including both software and physical devices. This research presents a psychometric analysis of a Finnish version of the ATI scale. The data consisted of 796 responses from students at a Finnish university and were analyzed using factor analysis and both nonparametric and parametric item response theory. The Finnish version of the ATI scale proved to be essentially unidimensional, showed high reliability estimates, and formed a strong Mokken scale. Hierarchical multiple regression analysis showed that men had a slightly higher affinity for technology than women when controlling for age and field of study; however, the effect size was small.


Introduction
Urban legend or not, a famous quote from the 1950s, attributed to Thomas J. Watson, the long-time chairperson and CEO of International Business Machines, claimed that there would be a market for only five electronic computers (IBM, 2007). The future turned out differently, and today we live amid ubiquitous technical systems. Technological systems pervade many fields of life, including work (e.g., van Laar et al., 2017), education (e.g., Kim, Merrill, Xu, & Sellnow, 2021), sports (e.g., Cranmer et al., 2021), health (e.g., Tajudeen et al., 2021), culture (e.g., Kim & Lee, 2022), virtual life (e.g., Taufik et al., 2021), and even the afterlife (Beaunoyer & Guitton, 2021). Thus, technological development poses several challenges that call for multidisciplinary, international, and global research efforts (Stephanidis et al., 2019).
Scholars have identified mixed effects of information and communication technologies on our lives (Ali et al., 2020). Many effects are reciprocal and mediated by personality traits and other individual differences. For example, individual differences can affect self-disclosure in social media (Chen et al., 2015), usability assessment (Kortum & Oswald, 2018), gaming (Caci et al., 2019), online learning (Alabdullatif & Velázquez-Iturbide, 2020), and online privacy literacy and behavior (Sindermann et al., 2021). Furthermore, meta-analyses have documented gender differences in attitudes towards technology (Cai et al., 2017; Whitley, 1997): in general, men tended to have slightly more favorable attitudes towards technology. Information technology can facilitate personality research, especially from the idiographic point of view (Matthews et al., 2021; Montag & Elhai, 2019), and, as Matthews et al. (2021) point out, socio-technological change might drive the evolution of contemporary trait models in future society. Thus, it is important to have valid constructs and psychometric instruments to discern individual differences and understand the phenomena underlying interactions between humans and technology.
A general concept for depicting the relationship between humans and technology is a person's affinity for technological systems and devices. Edison and Geissler (2003) consider affinity for technology an attitude and a "positive affect towards technology (in general)." Franke et al. (2019) define affinity for technology interaction as the question of "whether users tend to actively approach interaction with technical systems or, rather, tend to avoid intensive interaction with new systems." Viewed this way, ATI can be regarded as a key personal resource that shapes how users interact with technology.
One promising scale for assessing human-technology interaction is the affinity for technology interaction (ATI) scale developed by Franke et al. (2019). The scale was initially developed in English and German, and translations are currently also available in Italian, Spanish, Romanian, Persian, and Dutch. However, to our knowledge, besides the original English and German versions, no published analyses of the psychometric properties of the scale exist for other languages. In cross-cultural research, validating translated scales plays a crucial role, especially when the scales are constructed to measure universal constructs or phenomena (Cha et al., 2007).
This research presents a psychometric analysis of the Finnish version of the ATI scale. We elaborate on the subject by examining gender differences in ATI. In other words, we address the validity evidence concerning the scale's internal structure and its ability to capture differences and similarities (AERA, APA, & NCME, 2014). We use a comprehensive analytical process based on methodological triangulation and multiple sources of information. By presenting a psychometric analysis of the Finnish version of the scale, we aim to add value to the original research-based model of using the ATI scale to measure individual differences in affinity for technology interaction.

Materials and methods
Affinity for technology interaction (ATI) scale
The starting point for the definition of ATI given above is the realization that affinity for technology interaction and need for cognition (NFC) are closely related; following Schmettow and Drees (2014), Franke et al. (2019) propose that the two "should be conceptualized in a close relationship." Drawing on, for example, Cacioppo and Petty (1982), Cacioppo et al. (1996), and Fleischhauer et al. (2010), they note that NFC can today be seen as "the inter-individually varying, stable intrinsic motivation to engage in cognitively challenging tasks." Given that NFC can be applied in different psychological domains, they consider it useful to develop ATI in line with the NFC construct.
Franke et al. (2019) aimed both to develop and to validate a new scale for assessing ATI. Their goal was to provide "a highly economical and reliable unidimensional scale that is suitable for differentiating between users across the whole range of the ATI trait," keeping in mind that ATI is a general interaction style in relation to technology. The resulting ATI scale is a unidimensional 9-item scale with 3 reverse-worded (RW) items, all measured on a 6-point Likert scale. A shorter version of the scale (ATI-S), consisting of a subset of 4 items, is currently available in German and English. Franke et al. (2019) summarized the results of their validation process, which comprised multiple studies (N > 1500), as follows. First, factor analyses indicated that the ATI scale is unidimensional. Second, reliability estimated using coefficient $\alpha$ ranged from good to excellent. Third, construct validity analyses supported the expected relationships with need for cognition, geekism, technology enthusiasm, computer anxiety, control beliefs in dealing with technology, success in technical problem-solving and technical system learning, technical system usage, and the Big Five personality dimensions. Fourth, item analysis and descriptive statistics showed that the scale can differentiate between higher- and lower-ATI participants. Fifth, analyses of demographic variables showed a large gender effect, a small age effect, and no effect of educational background. Thus, the results indicated that it is possible to "discriminate between participants based on their differing tendency to actively engage in intensive (i.e., cognitively demanding) technology interaction." The ATI scale has been used in varied contexts, including studies on partially automated vehicles (e.g., Boelhouwer et al., 2020; Schartmüller et al., 2019), automated decision-making in health care (Schlicker et al., 2021), use of information technology among primary care physician trainees (Wensing et al., 2019), privacy concerns in users' acceptance of e-Health technologies (Schomakers et al., 2019), activity tracker usage, and augmented reality (Kammler et al., 2019).
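To make the scoring of the 9-item scale concrete, the following minimal sketch reverse-codes the three reverse-worded items on the 6-point scale (x becomes 7 - x) and averages the nine item scores. The positions of the reverse-worded items (ATI3, ATI6, and ATI8) follow the results reported later in this paper; the function name, the 1-6 coding, and the use of the item mean rather than the sum score are illustrative assumptions.

```python
# Minimal sketch: scoring one respondent on the 9-item ATI scale.
# Assumptions (illustrative): responses are coded 1-6 and the
# reverse-worded items are ATI3, ATI6 and ATI8 (1-based positions).
REVERSE_WORDED = {3, 6, 8}
MIN_CAT, MAX_CAT = 1, 6

def score_ati(responses):
    """Return (reverse-coded item scores, mean score) for 9 responses."""
    if len(responses) != 9:
        raise ValueError("expected 9 item responses")
    coded = []
    for position, value in enumerate(responses, start=1):
        if not MIN_CAT <= value <= MAX_CAT:
            raise ValueError(f"ATI{position}: {value} outside 1-6")
        # Linear reverse coding maps 1<->6, 2<->5, 3<->4.
        if position in REVERSE_WORDED:
            value = MIN_CAT + MAX_CAT - value
        coded.append(value)
    return coded, sum(coded) / len(coded)

# Hypothetical responses of one person.
coded, mean_score = score_ati([5, 4, 2, 6, 5, 1, 4, 2, 5])
print(coded, mean_score)
```

Whether the mean or the sum is reported does not change how respondents are ordered, because the two differ only by a factor of 9.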
the original German version of the ATI scale for creating the Finnish translation. A native German speaker and a German language teacher evaluated the connotations of a few essential wordings in the initial Finnish version against the original German version to achieve consistency with both original versions. After a few minor refinements, the initial translated version was back-translated into English by two independent professional translators. The back-translated versions showed excellent similarity to the original English version of the scale: their exact word-by-word equivalence with the original was 70% and 76%, respectively, and 77% and 85% when synonyms were also counted. Thus, the final translated Finnish version (Table 1) was used in the primary data collection. The translations of the introductory text and the Likert categories are presented in Appendix A.

Data collection and participants
We used a non-probabilistic convenience sample of students (N = 796) studying at a Finnish public multidisciplinary research university (ISCED 2011 levels 6-8). The data were collected using an online questionnaire. The link to the questionnaire was sent through student email lists in six faculties or departments. The questionnaire contained a privacy statement complying with the GDPR, and informed consent was obtained from the participants. The participants had the opportunity to enter a raffle to win one of 10 gift cards worth 20 euros each. Demographic information (i.e., age and gender) was collected, as was the faculty in which the respondent was studying. Gender was asked using a single-item open-ended question because "it allows respondents to define their own gender using whatever terminology they choose" (Cameron & Stinson, 2019).
Respondents' ages ranged from 17 to 73 years (Mdn = 25, M = 27.6, SD = 8.9). In the open-ended gender field, 519 (65%) respondents identified as women, 264 (33%) as men, and 7 (1%) as nonbinary; the remaining 6 (1%) responses (i.e., preferred not to answer or uninterpretable) were coded as missing. Respondents represented six university faculties or departments: information technology (28% of all respondents, 47% women in the subsample), natural sciences (22%, 67% women), humanities and social sciences (14%, 74% women), sport and health sciences (14%, 76% women), education (11%, 92% women), business school (10%, 66% women), and other units (1%). There were eight responses with missing values for age or gender. Compared to the general population, the sample is limited in age and educational background, as it consists of relatively young people pursuing university studies. The use of a convenience sample makes the study exploratory in nature and limits the generalizability of the results to the general population, which is discussed in more detail in the study limitations in Section 4.1.

Psychometric protocol
Our analysis consisted of four main phases: (i) describing the data using common descriptive statistics, (ii) applying nonparametric item response theory by conducting a Mokken scale analysis, (iii) conducting factor analysis based on classical test theory, and (iv) applying parametric item response theory by conducting a complementary analysis based on the partial credit model. A similar approach, excluding the analysis based on parametric item response theory, has been applied to part of the original ATI scale data (cf. Lezhnina & Kismihók, 2020). Furthermore, we applied the scale in a hierarchical multiple regression analysis to examine gender differences in affinity for technology interaction. All analyses were conducted using R version 4.0.3 (R Core Team, 2020) and the packages mentioned later in the methods section.
Our analytical process was as follows (Figure 1). We began with descriptive statistics and examined whether the data contained any peculiarities (e.g., excess skewness or kurtosis, ceiling or floor effects, categories without responses). We then conducted a Mokken scale analysis, a convenient nonparametric method that makes no assumptions about the data distribution while providing an initial assessment of important scale properties. Mokken scale analysis addresses whether the total score of the scale can be used to order persons with respect to the measured construct. A scale that forms a Mokken scale is a promising candidate for further analysis.
Next, we used factor analyses to evaluate the structural properties of the scale in detail. Assessing dimensionality is critical, and it "requires informed judgment that balances statistical information with conceptual plausibility and utility" (Fabrigar & Wegener, 2012, p. 148). For an existing scale, the expected number of dimensions is naturally determined in the original validation research. However, a new translation and a new cultural context necessitate a new assessment of dimensionality.
After the Mokken scale analysis and the factor analysis, we scrutinized the scale further using parametric item response theory. Because the ability to order persons on the latent variable is an important feature of a scale, we used the partial credit model (PCM), as it is the least restrictive parametric IRT model that still possesses the stronger property of stochastic ordering of the latent variable by the total scale score (i.e., SOL by $X_+$) (Hemker et al., 1997; Ligtvoet, 2012; van der Ark, 2005). Furthermore, the PCM analysis provided information at the item level. For conciseness, the complementary analysis based on the PCM is presented in Appendix C.
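The PCM is not written out in this section. For readers unfamiliar with it, the sketch below computes the standard PCM category probabilities for a single item, $P(X = x \mid \theta) \propto \exp\sum_{h=1}^{x}(\theta - \delta_h)$, where category 0 has an empty sum. The step parameters are hypothetical values chosen for illustration, not estimates from this study, and the code is not the estimation routine used in our analysis.

```python
import math

def pcm_probabilities(theta, deltas):
    """P(X = x | theta) for x = 0..m under the partial credit model.

    deltas are the m step parameters of one item; category 0 has an
    empty sum in the numerator, i.e. a logit of 0.
    """
    logits = [0.0]
    for delta in deltas:
        # Cumulative sum of (theta - delta_h) for h = 1..x.
        logits.append(logits[-1] + theta - delta)
    top = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]

def expected_score(theta, deltas):
    """Expected item score, sum over x of x * P(X = x | theta)."""
    return sum(x * p for x, p in enumerate(pcm_probabilities(theta, deltas)))

# Hypothetical step parameters for one 6-category item (responses 1-6 recoded 0-5).
steps = [-2.0, -1.0, 0.0, 1.0, 2.0]
for theta in (-2.0, 0.0, 2.0):
    print(theta, round(expected_score(theta, steps), 2))
```

The expected item score increases with $\theta$, which is the item-level counterpart of ordering persons by their total score.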

Descriptive statistics and multivariate outliers
The data were first examined using basic descriptive statistics. Nonnormality is common in real-world psychological and educational data (Cain et al., 2017). Mardia's tests for multivariate skewness and kurtosis (Mardia, 1970; Mecklin & Mundfrom, 2007) were used to assess whether the data follow a multivariate normal distribution (McDonald & Ho, 2002); a significant result indicates that the data deviate from multivariate normality. Univariate skewness and kurtosis were assessed using Fisher's skewness ($G_1$) and kurtosis ($G_2$) (Cain et al., 2017; Joanes & Gill, 1998). A scalogram was used to depict individual response patterns visually (e.g., Massof, 2004).
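The two univariate statistics can be computed directly from the central moments. This is a minimal sketch assuming the sample-adjusted definitions of Joanes and Gill (1998), their type 2 (the variant reported by SAS and SPSS); it is not the R function used in our analysis, and the item responses in the usage line are hypothetical.

```python
import math

def fisher_skewness_kurtosis(x):
    """Sample skewness G1 and excess kurtosis G2 (Joanes & Gill, 1998, type 2)."""
    n = len(x)
    if n < 4:
        raise ValueError("G2 needs at least 4 observations")
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n  # central moments with divisor n
    if m2 == 0:
        raise ValueError("constant data: skewness and kurtosis undefined")
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3
    G1 = g1 * math.sqrt(n * (n - 1)) / (n - 2)
    G2 = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)
    return G1, G2

# Hypothetical responses to one item on the 1-6 scale.
print(fisher_skewness_kurtosis([5, 6, 4, 5, 6, 3, 5, 6, 2, 5]))
```

Negative $G_1$ indicates a longer left tail (many high responses), and $G_2$ is zero for normally distributed data.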
To detect possible multivariate outliers, we used the Mahalanobis-Minimum Covariance Determinant with a breakdown point of 0.25 (MMCD75). MMCD75, a robust version of the traditional Mahalanobis distance, has been suggested to detect outlying values efficiently while maintaining an acceptable false detection rate (Leys et al., 2018). We analyzed the data with and without the outliers. For transparency, both results are reported when the difference is more than negligible or the result is otherwise crucial for assessing the effect of outliers in the data (e.g., measures of association). All potential outliers identified by the MMCD75 method were depicted visually using a scalogram.

Nonparametric item response theory
Models belonging to nonparametric item response theory (NIRT) are data-driven exploratory models that assume that the relationship between the latent variable and the item score is restricted only by order (Sijtsma, 2005). One such model is the monotone homogeneity model (MHM) for dichotomous (Mokken, 1971, 1997) and polytomous (Molenaar, 1997) data. The MHM is also known as the nonparametric graded response model (np-GRM) (Sijtsma & van der Ark, 2020, p. 233), and it is the most general of all well-known IRT models for polytomous data (Hemker et al., 2001; van der Ark & Bergsma, 2010). The general MHM has some desirable psychometric properties concerning the total scale score. Thus, we start our analysis by assessing the applicability of the MHM to the collected scale data.
The MHM is based on three key assumptions: (i) unidimensionality, which means that all items measure the same latent variable; (ii) local independence, which means that the item scores depend only on the person's latent variable; and (iii) latent monotonicity of the item step response functions (ISRFs), which means that the functions are nondecreasing in the latent variable (Sijtsma, 2005; Sijtsma & van der Ark, 2017; van der Ark, 2012). A stricter model, the double monotonicity model (DMM), additionally assumes invariant item ordering, which means that the items can be ordered in the same way at all levels of the latent variable (Sijtsma & van …). Mokken scale analysis (MSA) provides a set of tools that can be used to analyze how dichotomous or polytomous scale data meet the assumptions of the MHM and the DMM. Scale identification in MSA involves examining whether these assumptions hold in the data, in other words, assessing the scalability, local independence, and invariant item ordering of the scale items in addition to the monotonicity of the ISRFs. Scalability in MSA is based on the coefficient of homogeneity $H$ (Loevinger, 1948; Mokken, 1971), also called the scalability coefficient (Sijtsma & van der Ark, 2017). Existing scales can be evaluated directly using the inter-item coefficients $H_{jk}$, the item coefficients $H_j$, and the overall coefficient $H$ for the whole scale (Mokken, 1971, 1997). A higher $H_j$ implies better item discrimination, whereas items with values close to 0 do not discriminate well with respect to the latent variable (Sijtsma & van der Ark, 2017; Straat et al., 2014). Thus, a common approach for deciding whether to include items in a scale is to define a threshold value $c$ so that $H_j > c$ for all items. The lowest threshold traditionally used for including an item in the scale is $H_j > 0.30$, and excluded items are considered unscalable (Mokken, 1971; Sijtsma & van der Ark, 2017).
For classifying complete scales, $0.30 \le H < 0.40$ indicates a weak Mokken scale, $0.40 \le H < 0.50$ a medium scale, and $H \ge 0.50$ a strong scale (Mokken, 1971, p. 185).
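The scalability coefficients and the classification rule can be illustrated with a short sketch. It assumes the covariance-based formulation, in which $H_{jk}$ is the covariance of items $j$ and $k$ divided by the maximum covariance attainable given their marginal distributions, and $H_j$ and $H$ are the ratios of the corresponding sums. The function names and data are hypothetical, no standard errors are computed, and no item is assumed to be constant.

```python
from itertools import combinations

def _cov(a, b):
    """Covariance with divisor n (the divisor cancels in every H ratio)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n

def scalability_coefficients(data):
    """Return (H_jk, H_j, H) for a respondents-by-items score matrix."""
    cols = list(zip(*data))
    k = len(cols)
    cov, cov_max = {}, {}
    for j, l in combinations(range(k), 2):
        cov[j, l] = _cov(cols[j], cols[l])
        # Maximum covariance given both marginal distributions:
        # pair the two columns after sorting each of them.
        cov_max[j, l] = _cov(sorted(cols[j]), sorted(cols[l]))
    h_jk = {pair: cov[pair] / cov_max[pair] for pair in cov}
    h_j = []
    for j in range(k):
        pairs = [pair for pair in cov if j in pair]
        h_j.append(sum(cov[p] for p in pairs) / sum(cov_max[p] for p in pairs))
    h = sum(cov.values()) / sum(cov_max.values())
    return h_jk, h_j, h

def classify_scale(h):
    """Mokken's (1971) rules of thumb for the overall H."""
    if h >= 0.50:
        return "strong"
    if h >= 0.40:
        return "medium"
    if h >= 0.30:
        return "weak"
    return "not a Mokken scale"

# Hypothetical 1-6 responses of five people to three (already reverse-coded) items.
scores = [[1, 2, 1], [2, 2, 3], [3, 4, 3], [5, 4, 6], [6, 5, 5]]
h_jk, h_j, h = scalability_coefficients(scores)
print([round(v, 2) for v in h_j], round(h, 2), classify_scale(h))
```

Reverse-worded items must be recoded first; otherwise their covariances with the other items, and hence their coefficients, are negative.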
Instead of relying on arbitrary threshold values, the automated item selection procedure (AISP) provides a way to examine the scalability and dimensionality of the scale items. AISP is an iterative process that selects items from the initial item pool so that (i) the selected item has a positive covariance with each of the already selected items, (ii) the item has $H_j \ge c$, and (iii) the selected item maximizes the overall $H$ of the scale together with the other selected items (Hemker et al., 1995). Instead of selecting a single threshold value $c$, one suggested approach is to run AISP for a sequence of thresholds (e.g., $c \in \{0.05, 0.10, 0.15, \ldots, 0.60\}$) (Hemker et al., 1995; Sijtsma & van der Ark, 2017). Examining the sequential outcome pattern of AISP can reveal whether the data form one or more scales and whether some items turn out to be unscalable at a certain level of $c$ (Hemker et al., 1995). Two different procedures have been proposed for AISP: Mokken's procedure (Mokken, 1971, p. 191) and a genetic algorithm (Straat et al., 2013). The two algorithms might yield different results for the same data (Sijtsma & van der Ark, 2017). The minimum sample size for AISP depends on item quality, but at least 250 to 500 responses are needed (Straat et al., 2014).
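Mokken's procedure can be sketched as a greedy search that applies criteria (i)-(iii) above. This is a simplified sketch under stated assumptions: it builds only the first scale, omits the significance tests of the full procedure and the genetic-algorithm alternative, assumes no constant items, and uses hypothetical names and data.

```python
from itertools import combinations

def _cov(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n

def select_first_scale(data, c=0.30):
    """Greedy, simplified sketch of Mokken's AISP for the first scale only."""
    cols = list(zip(*data))
    k = len(cols)
    cov, cmax = {}, {}
    for j, l in combinations(range(k), 2):
        cov[j, l] = cov[l, j] = _cov(cols[j], cols[l])
        cmax[j, l] = cmax[l, j] = _cov(sorted(cols[j]), sorted(cols[l]))

    def h_scale(items):
        pairs = list(combinations(items, 2))
        return sum(cov[p] for p in pairs) / sum(cmax[p] for p in pairs)

    def h_item(j, items):
        others = [l for l in items if l != j]
        return sum(cov[j, l] for l in others) / sum(cmax[j, l] for l in others)

    # Start with the item pair that has the largest H_jk.
    start = max(combinations(range(k), 2), key=lambda p: cov[p] / cmax[p])
    if cov[start] / cmax[start] < c:
        return []
    scale = list(start)
    while True:
        best, best_h = None, None
        for j in range(k):
            if j in scale or any(cov[j, l] <= 0 for l in scale):
                continue  # criterion (i): positive covariances only
            candidate = scale + [j]
            if h_item(j, candidate) < c:
                continue  # criterion (ii): H_j >= c
            h = h_scale(candidate)
            if best_h is None or h > best_h:  # criterion (iii): maximize H
                best, best_h = j, h
        if best is None:
            return sorted(scale)
        scale.append(best)

# Hypothetical data: items 0-2 share one ordering, item 3 is unrelated.
data = [[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 2], [4, 4, 4, 1]]
for c in (0.05, 0.30, 0.60):
    print(c, select_first_scale(data, c))
```

Running the selection over a grid of thresholds, as in the loop above, mirrors the sequential use of AISP described in the text.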
The local independence of the scale items can be assessed using a procedure based on conditional association (CA). The CA procedure flags items as locally dependent and removes them one by one based on conditional covariances, using the indices $W^{(1)}$, $W^{(2)}$, and $W^{(3)}$, to identify a locally independent item set (Straat et al., 2016). An item or an item pair is flagged as locally dependent if $W > Q_3 + 1.5 \times \mathrm{IQR}$, where $Q_3$ is the third quartile and IQR is the interquartile range of the empirical $W$ distribution (Straat et al., 2016), that is, if $W$ lies beyond Tukey's upper inner fence (Tukey, 1977, p. 44). In this study, we used the CA procedure implemented in the mokken package (van der Ark, 2012).
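The flagging rule itself is a plain Tukey fence and can be sketched independently of how the $W$ indices are computed (which this sketch does not attempt). Quartiles here use Python's "inclusive" interpolation, which corresponds to R's default quantile type 7; whether the mokken package computes quartiles the same way is an assumption. The function name and $W$ values are hypothetical.

```python
import statistics

def tukey_upper_flags(w_values):
    """Indices of W values above Tukey's upper inner fence, plus the fence."""
    q1, _, q3 = statistics.quantiles(w_values, n=4, method="inclusive")
    fence = q3 + 1.5 * (q3 - q1)  # Q3 + 1.5 * IQR
    return [i for i, w in enumerate(w_values) if w > fence], fence

# Hypothetical W values for nine items; the last one stands out.
flagged, fence = tukey_upper_flags([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 100.0])
print(flagged, fence)
```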
Latent monotonicity means that the item step response function is nondecreasing in the latent variable (van der Ark, 2012). In other words, the higher a person's value on the latent variable, the higher the probability of responding in a category typical of a higher attribute level (Sijtsma & van der Ark, 2017). Manifest monotonicity, a property observable in the scale data, can be used to assess latent monotonicity using a procedure implemented in the R package mokken (van der Ark, 2007, 2012). The procedure groups respondents by their rest score (the total score excluding the item under study), merging adjacent rest scores until each group meets a minimum group size criterion minsize. Manifest monotonicity is then assessed by checking whether the proportion of respondents reaching each item step is nondecreasing across increasing rest score groups; decreases exceeding a minimum value minvi are considered violations. For the data in this study, minsize = N/10 and minvi = 0.03 were used (van der Ark, 2007).
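The check can be sketched as follows. This is a simplified illustration, not the mokken implementation: rest score groups are formed by a plain left-to-right merge, only adjacent groups are compared, and no significance test is applied. The function names and data are hypothetical.

```python
def rest_score_groups(rest, minsize):
    """Merge adjacent rest-score values until each group has >= minsize people."""
    counts = {}
    for r in rest:
        counts[r] = counts.get(r, 0) + 1
    groups, current, size = [], [], 0
    for r in sorted(counts):
        current.append(r)
        size += counts[r]
        if size >= minsize:
            groups.append(current)
            current, size = [], 0
    if current:  # fold a too-small remainder into the last group
        if groups:
            groups[-1].extend(current)
        else:
            groups.append(current)
    return groups

def manifest_monotonicity_violations(data, item, minsize, minvi=0.03):
    """Decreases larger than minvi in P(X_item >= x) between adjacent groups."""
    scores = [row[item] for row in data]
    rest = [sum(row) - row[item] for row in data]
    groups = rest_score_groups(rest, minsize)
    violations = []
    for x in sorted(set(scores))[1:]:  # item steps X_item >= x
        props = []
        for g in groups:
            members = [s for s, r in zip(scores, rest) if r in g]
            props.append(sum(s >= x for s in members) / len(members))
        for a in range(len(props) - 1):
            drop = props[a] - props[a + 1]
            if drop > minvi:
                violations.append((x, a, drop))  # (step, group index, size)
    return violations

# Hypothetical data: three items that rise together with the latent trait.
example = [[v, v, v] for v in (1, 2, 3, 4, 5) for _ in range(2)]
print(manifest_monotonicity_violations(example, item=0, minsize=len(example) // 10))
```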

Factor analysis
The first step in factor analysis is to assess the dimensionality of the data and decide how many factors to retain. A suggested approach is to use multiple methods to assess dimensionality and compare their results (Lubbe, 2019). To assess dimensionality and the number of factors to retain, we used parallel analysis (PA) and minimum average partials (MAP). PA compares the structure of the collected data with that of randomly generated data: components whose eigenvalues in the actual data exceed the corresponding eigenvalues from the random data are retained. PA is often described as one of the most accurate and robust rules for determining dimensionality (Lubbe, 2019), and it performs well in a wide variety of scenarios. PA with PCA extraction (PA-PCA, also known as Horn's PA; Horn, 1965) using polychoric correlations has been suggested to be suitable for all types of data (Garrido et al., 2013). For PA-PCA, we used a nonparametric version of parallel analysis with column permutation (500 random data sets), polychoric correlations, and quantile thresholds of 50% (median, PA-PCA-m) and 95% (PA-PCA-95) (Auerswald & Moshagen, 2019; Buja & Eyuboglu, 1992).
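Permutation-based parallel analysis can be sketched in a few dozen lines. The sketch makes simplifying assumptions that differ from the analysis described above: it uses Pearson rather than polychoric correlations (estimating polychoric correlations is beyond a short example), a dependency-free Jacobi eigenvalue routine, and by default 100 rather than 500 permuted data sets. All names and the generated data are hypothetical.

```python
import random

def correlation_matrix(data):
    """Pearson correlation matrix of a respondents-by-items matrix."""
    cols = list(zip(*data))
    n, k = len(data), len(cols)
    means = [sum(c) / n for c in cols]
    dev = [[v - m for v in c] for c, m in zip(cols, means)]
    norms = [sum(d * d for d in col) ** 0.5 for col in dev]
    return [[sum(a * b for a, b in zip(dev[i], dev[j])) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]

def eigenvalues(matrix, sweeps=50, tol=1e-12):
    """Eigenvalues (descending) of a symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    k = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(k) for j in range(k) if i != j)
        if off < tol:
            break
        for p in range(k - 1):
            for q in range(p + 1, k):
                if abs(a[p][q]) < 1e-15:
                    continue
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = (1.0 if theta >= 0 else -1.0) / (abs(theta) + (theta * theta + 1) ** 0.5)
                c = 1 / (t * t + 1) ** 0.5
                s = t * c
                for r in range(k):  # a <- a J (columns p and q)
                    arp, arq = a[r][p], a[r][q]
                    a[r][p], a[r][q] = c * arp - s * arq, s * arp + c * arq
                for r in range(k):  # a <- J^T a (rows p and q)
                    apr, aqr = a[p][r], a[q][r]
                    a[p][r], a[q][r] = c * apr - s * aqr, s * apr + c * aqr
    return sorted((a[i][i] for i in range(k)), reverse=True)

def parallel_analysis(data, n_sims=100, quantile=0.95, seed=1):
    """Number of components whose eigenvalues beat the permutation reference."""
    rng = random.Random(seed)
    observed = eigenvalues(correlation_matrix(data))
    cols = [list(c) for c in zip(*data)]
    simulated = []
    for _ in range(n_sims):
        shuffled = []
        for col in cols:
            col = col[:]
            rng.shuffle(col)  # permute each column independently
            shuffled.append(col)
        simulated.append(eigenvalues(correlation_matrix(list(zip(*shuffled)))))
    retained = 0
    for i, value in enumerate(observed):
        reference = sorted(sim[i] for sim in simulated)[int(quantile * (n_sims - 1))]
        if value <= reference:
            break
        retained += 1
    return retained

# Hypothetical one-factor data: five noisy indicators of one latent score.
gen = random.Random(42)
sample = []
for _ in range(300):
    latent = gen.gauss(0, 1)
    sample.append([latent + 0.5 * gen.gauss(0, 1) for _ in range(5)])
print(parallel_analysis(sample, n_sims=50, quantile=0.95))
```

Permuting each column independently keeps every item's marginal distribution while destroying the correlations between items, which is what makes the permuted eigenvalues a reference for "no structure". Setting `quantile=0.5` corresponds to the median-based variant.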
Another promising recent approach for analyzing the dimensions of psychological constructs is exploratory graph analysis (EGA). EGA draws on network psychometrics, which in turn aims to combine latent variable models and network models (Epskamp et al., 2017, 2018). EGA uses partial correlations and the Gaussian graphical model together with a clustering algorithm for weighted networks (i.e., the Walktrap algorithm). EGA has been suggested to have several advantages over more traditional methods. For example, the results of EGA can be interpreted visually instead of through a factor loading matrix, and no decisions about factor rotation are needed. Two different estimators have been suggested for EGA: the graphical least absolute shrinkage and selection operator (GLASSO) (Friedman et al., 2008) and the triangulated maximally filtered graph (TMFG) (Massara et al., 2016). The advantage of EGA-TMFG is that it does not assume the data to be multivariate normal, and it has been suggested to perform best with unidimensional data. Furthermore, the total entropy fit index (TEFI), based on Von Neumann entropy, can be used to evaluate EGA model fit; lower TEFI values indicate lower disorder (i.e., better fit).
Confirmatory factor analysis (CFA) covers the steps of model specification, estimation, and evaluation (Brown, 2015). The model specification in our research was based on the original (a priori) unidimensional model (i.e., Franke et al., 2019). The model was estimated using polychoric correlations (Holgado-Tello et al., 2010) and robust diagonally weighted least squares (DWLS) estimation with test statistics adjusted for mean and variance (i.e., the scale-shifted approach, also known as WLSMV; El-Sheikh et al., 2017), which is a recommended estimation method for ordinal data (Beauducel & Herzberg, 2006; DiStefano & Morgan, 2014; Foldnes & Grønneberg, 2021; Forero et al., 2009; Li, 2016a, 2016b, 2021). To describe the goodness of fit of the CFA models, we used the standardized root mean square residual (SRMR) as an indicator of absolute fit, the root mean square error of approximation (RMSEA) as an indicator of parsimony-corrected fit, and the comparative fit index (CFI) and Tucker-Lewis index (TLI) as indicators of comparative fit. In general, SRMR and RMSEA values closer to zero and CFI and TLI values closer to one indicate better model fit. In particular, SRMR is relatively insensitive to the choice of estimator and appropriate for ordinal models (Shi & Maydeu-Olivares, 2020). Various suggestions for cutoff values and combination rules for acceptable model fit can be found in the literature (e.g., TLI or CFI > 0.95 and SRMR < 0.09 (Hu & Bentler, 1999); the dynamic fit index (McNeish & Wolf, 2021)); however, no "golden rule" exists (e.g., Greiff & Heene, 2017; Shi et al., 2019).
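For reference, the sketch below computes RMSEA, CFI, and TLI from model and baseline chi-square statistics using the conventional formulas, and applies the combination rule as summarized above. It is an illustration under assumptions: with WLSMV, lavaan reports scaled or robust variants computed from adjusted statistics, some software uses N rather than N - 1 in RMSEA, SRMR is taken as given rather than computed, and the chi-square values are hypothetical.

```python
import math

def fit_indices(chi2_m, df_m, chi2_b, df_b, n):
    """RMSEA, CFI and TLI from model (m) and baseline (b) chi-square statistics."""
    excess_m = max(chi2_m - df_m, 0.0)
    excess_b = max(chi2_b - df_b, excess_m)
    rmsea = math.sqrt(excess_m / (df_m * (n - 1)))
    cfi = 1.0 - excess_m / excess_b if excess_b > 0 else 1.0
    tli = (chi2_b / df_b - chi2_m / df_m) / (chi2_b / df_b - 1.0)
    return {"RMSEA": rmsea, "CFI": cfi, "TLI": tli}

def hu_bentler_combination(cfi, tli, srmr):
    """Combination rule as summarized in the text: (TLI or CFI) > 0.95 and SRMR < 0.09."""
    return (tli > 0.95 or cfi > 0.95) and srmr < 0.09

# Hypothetical chi-square values for a 9-item one-factor model (df = 27).
indices = fit_indices(chi2_m=50.0, df_m=27, chi2_b=2000.0, df_b=36, n=757)
print({k: round(v, 3) for k, v in indices.items()})
print(hu_bentler_combination(indices["CFI"], indices["TLI"], srmr=0.04))
```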
Furthermore, we conducted a specification search using modification indices (i.e., Lagrange multipliers) to examine localized areas of strain in the model. Complementing the global assessments based on goodness-of-fit measures, modification indices together with the expected parameter change provide insight into local misspecifications (e.g., Greiff & Heene, 2017). The use of modification indices is exploratory, and any modifications should be grounded in the theoretical assumptions underlying the model (Brown, 2015, p. 106). CFA was conducted, and modifications were applied based on the expected parameter change and power analysis (Saris et al., 2009), using the R package lavaan (Rosseel, 2012).
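The combination of modification index (MI), expected parameter change (EPC), and power can be sketched as the four-way decision rule usually attributed to Saris et al. (2009). The power of the 1-df MI test to detect a misspecification of size $\delta$ uses the noncentrality $\lambda = \mathrm{MI}\,(\delta / \mathrm{EPC})^2$. The threshold for "high" power (0.75), the choice of $\delta$, the $\geq$ comparison of EPC with $\delta$, and all names are assumptions for illustration; this is not lavaan's API.

```python
import math
from statistics import NormalDist

def mi_power(mi, epc, delta, alpha=0.05):
    """Power of the 1-df MI test to detect a misspecification of size delta."""
    if epc == 0:
        raise ValueError("EPC must be nonzero")
    ncp = mi * (delta / epc) ** 2  # noncentrality parameter for size delta
    z = NormalDist().inv_cdf(1 - alpha / 2)
    root = math.sqrt(ncp)
    phi = NormalDist().cdf
    # For 1 df, P(chi2_1(ncp) > z^2) = P(|Z + sqrt(ncp)| > z).
    return (1 - phi(z - root)) + phi(-z - root)

def judge_parameter(mi, epc, delta, alpha=0.05, high_power=0.75):
    """Four-way decision for one fixed parameter (sketch of Saris et al., 2009)."""
    critical = NormalDist().inv_cdf(1 - alpha / 2) ** 2  # chi2(1) critical value
    significant = mi > critical
    high = mi_power(mi, epc, delta, alpha) >= high_power
    if significant and high:
        # Both the test and its power are strong: let the EPC size decide.
        return "misspecified" if abs(epc) >= delta else "not misspecified"
    if significant:
        return "misspecified"
    if high:
        return "not misspecified"
    return "inconclusive"

# Hypothetical MI and EPC values for one fixed parameter, with delta = 0.1.
print(judge_parameter(mi=20.0, epc=0.3, delta=0.1))
```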
We estimated the reliability of the scale from the classical test theory (CTT) and factor analysis points of view. Coefficient $\alpha$ (e.g., Cronbach, 1951) is based on the assumptions of CTT (Lord & Novick, 1968, pp. 36-38). The underlying idea of reliability is replicability: the reliability $\rho_{XX'}$ of a test reflects the degree of linear correlation between two parallel tests having the same formal properties (Sijtsma & Pfadt, 2021). In essence, coefficient $\alpha$ is a lower bound to reliability (Sijtsma, 2009; Sijtsma & Pfadt, 2021); however, under approximate unidimensionality it is close to $\rho_{XX'}$ (Sijtsma & Pfadt, 2021). In contrast, the reliability coefficient $\omega$ (e.g., McDonald, 1999, pp. 88-90) is based on a factor analysis (FA) model. As suggested for categorical data, we estimated reliability following the FA approach by using categorical omega ($\omega_c$) with bias-corrected and accelerated bootstrap confidence intervals (Dunn et al., 2014; Kelley & Pornprasertmanit, 2016), implemented in the R package MBESS (Kelley, 2007).
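The two coefficients can be contrasted with a short sketch: $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_j \sigma_j^2}{\sigma_X^2}\right)$ from raw item scores, and $\omega = \frac{(\sum_j \lambda_j)^2}{(\sum_j \lambda_j)^2 + \sum_j \theta_j}$ from a one-factor model with loadings $\lambda_j$ and residual variances $\theta_j$. This sketch assumes the loadings are already estimated (the values below are hypothetical); the categorical $\omega_c$ with bootstrap intervals computed by MBESS is not reproduced here.

```python
import statistics

def coefficient_alpha(data):
    """Cronbach's alpha for a respondents-by-items score matrix."""
    k = len(data[0])
    item_variances = sum(statistics.variance(col) for col in zip(*data))
    total_variance = statistics.variance([sum(row) for row in data])
    return k / (k - 1) * (1 - item_variances / total_variance)

def omega_from_loadings(loadings, residual_variances):
    """McDonald's omega for a one-factor model with factor variance fixed to 1."""
    common = sum(loadings) ** 2
    return common / (common + sum(residual_variances))

# Hypothetical data: five respondents, three items.
responses = [[5, 4, 5], [3, 3, 4], [6, 5, 6], [2, 3, 2], [4, 4, 5]]
print(round(coefficient_alpha(responses), 3))
# Hypothetical standardized loadings of 0.7 for nine items (residuals 1 - 0.49).
print(round(omega_from_loadings([0.7] * 9, [0.51] * 9), 3))
```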

Differential item functioning
Differential item functioning (DIF) means that persons with the same level of the latent variable respond differently to an item depending on some other characteristic. For example, if an item exhibits gender-based DIF, men and women with the same level of the latent variable have different probabilities of choosing the response categories. A wide variety of methods have been proposed to detect DIF, but many of them suffer from fundamental issues (e.g., requiring a priori chosen anchor items) (Bechger & Maris, 2015; Yuan et al., 2021). To detect uniform DIF, we used a recent lasso-based method that does not require anchor items (Schauberger & Mair, 2020). The DIF analysis was executed using the R package GPCMlasso (Schauberger, 2019).

Total scale score
When a latent variable is measured using a psychometric scale, the total scale score $X_+$, usually calculated as the unweighted sum of all item scores, is assumed to be a proxy for the measured latent variable. Some IRT models have a property called stochastic ordering of the latent variable by $X_+$ (SOL by $X_+$), which implies that a higher total score $X_+$ results in a higher expected latent variable value (Hemker et al., 1997). If the data comply with a model having the SOL by $X_+$ property, the simple sum score $X_+$ can be used to order respondents on the latent variable (Sijtsma & Hemker, 2000). SOL by $X_+$ holds for the MHM with dichotomous data (Mokken, 1997; Sijtsma & Hemker, 2000), but the MHM with polytomous data does not imply it (Hemker et al., 1997). However, a property called weak SOL has been proposed to apply to the MHM with polytomous data, which in turn is argued to justify ordering respondents on the latent variable by the total score $X_+$ (van der Ark & Bergsma, 2010). The MHM for polytomous items does not imply complete person ordering, but it allows for pairwise person ordering (van der Ark et al., 2019). Even though the MHM might not be fully satisfactory for the exact ordering of individuals, it can be used to order groups of people using measures of central tendency (e.g., mean and median), because people with a higher $X_+$ have, on average, a higher value on the latent variable than people with a lower $X_+$ (Zwitser & Maris, 2016). The corrected item-total correlation indicates the coherence between an item and the other items in a scale, and it is one of the best methods for item assessment when constructing tests (Zijlmans et al., 2018). It is calculated by correlating the item score with the total scale score computed without that item. Items with higher corrected item-total correlations are more desirable (DeVellis, 2017, p. 142).
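The corrected item-total correlation described above can be sketched directly: for each item, correlate its score with the sum of the remaining items. The function names and data are hypothetical, and the sketch assumes that no item and no rest score is constant (otherwise the correlation is undefined).

```python
import math

def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def corrected_item_total(data):
    """Correlation of each item with the sum score of the remaining items."""
    totals = [sum(row) for row in data]  # X+ for each respondent
    result = []
    for j in range(len(data[0])):
        item = [row[j] for row in data]
        rest = [t - s for t, s in zip(totals, item)]  # X+ without item j
        result.append(pearson(item, rest))
    return result

# Hypothetical data: five respondents, three items.
responses = [[5, 4, 5], [3, 3, 4], [6, 5, 6], [2, 3, 2], [4, 4, 5]]
print([round(r, 2) for r in corrected_item_total(responses)])
```

Excluding the item from the total avoids the inflation that arises when an item is correlated with a sum that contains the item itself.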

Ethical considerations
The research was conducted following the guidelines of the World Medical Association (WMA) Declaration of Helsinki. According to the guidelines of the Finnish National Board on Research Integrity and the research institution where the research was conducted, ethical pre-evaluation or permission was not needed. Participation in the research was voluntary. A research and privacy statement was prepared in accordance with the GDPR and national legislation. Informed consent was obtained before respondents answered the questionnaire. Identifying information (i.e., email address) was requested to enable the voluntary gift card raffle, and it was also possible to participate entirely anonymously. The data were anonymized immediately after data collection.

Descriptive statistics
First, multivariate outliers were identified using MMCD75 (Leys et al., 2018), and 39 (4.9%) responses were flagged as potential outliers. After excluding the outliers, n = 757 responses remained, with demographic properties similar to those of the complete data. We analyzed the data with and without outliers. Results including outliers are reported in the text or marked with †. It is worth noting that the distributional properties of the data reflect the responses of this particular convenience sample of relatively young, educated people. Figure 2 depicts the distributions of the answer categories of each scale item without outliers. All answer categories received responses. The fewest responses were in the Completely disagree categories of items ATI8 (n = 20; 2.6%) and ATI9 (n = 14; 1.8%). Table 2 describes the distributional properties of the items without outliers. The effect of the outliers on the distributional properties was negligible. Mardia's tests for multivariate skewness and kurtosis of the items were significant, indicating that the data deviated from multivariate normality. As expected, the reverse-worded items ATI3, ATI6, and ATI8 were negatively correlated with all other items. After reverse-coding these items using linear scaling, all inter-item correlations were positive.
We used a scalogram to visualize the variation in the respondents' response patterns. Figure 3 shows all response patterns excluding the outliers. In the figure, the respondents are ordered by their sum score X+, and the scale items are ordered by their item sum score. Thus, the colors depicting agreement tend to accumulate toward the top right corner of the figure, and the colors depicting disagreement toward the lower left corner. Visual inspection showed that item ATI3R in particular exhibited a somewhat irregular response pattern. Appendix Figure D1 depicts the scalogram containing only the outliers identified by the MMCD75 procedure. Several dubious response patterns can be identified among them (e.g., extremely high and low scorers and contradictory responses). As described above, instead of selecting outliers subjectively, we removed all outliers suggested by MMCD75 and report results both with and without outliers.
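The ordering behind a scalogram can be sketched as below. This is an illustrative sketch only, assuming the data are stored as a list of respondents, each a list of item scores with RW items already reverse-coded; the function name `scalogram_order` and the toy data are hypothetical. Respondents are sorted by descending sum score X+ (high scorers at the top) and items by ascending item sum score (most endorsed items on the right), so that, for data close to a Guttman pattern, agreement collects in the top right corner.

```python
from typing import List, Sequence, Tuple


def scalogram_order(
    data: Sequence[Sequence[int]], item_names: Sequence[str]
) -> Tuple[List[List[int]], List[str]]:
    """Reorder a respondents x items score matrix for a scalogram.

    Rows: descending sum score X+ (highest scorers first, i.e., at the top).
    Columns: ascending item sum score (most endorsed items last, i.e., on the right).
    """
    n_items = len(item_names)
    item_sums = [sum(row[j] for row in data) for j in range(n_items)]
    col_order = sorted(range(n_items), key=lambda j: item_sums[j])
    row_order = sorted(range(len(data)), key=lambda i: sum(data[i]), reverse=True)
    ordered = [[data[i][j] for j in col_order] for i in row_order]
    return ordered, [item_names[j] for j in col_order]


# Hypothetical toy data: 3 respondents x 3 items, RW items already reverse-coded.
toy = [[1, 2, 1], [6, 5, 4], [3, 4, 2]]
matrix, columns = scalogram_order(toy, ["A", "B", "C"])
print(columns)
for row in matrix:
    print(row)
```

The reordered matrix could then be drawn as a heatmap (for example with matplotlib's `imshow`), with one color per answer category.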

Mokken scale analysis
We utilized non-parametric item response theory (Sijtsma, 2005), namely the monotone homogeneity model (Mokken, 1971), by applying Mokken scale analysis (Sijtsma & van der Ark, 2017) to the scale data. First, we examined the scalability of the scale items using the coefficient H (Appendix Table B1). After the reverse-worded items were reverse-coded, all scalability coefficients were positive. For individual items, the values were $0.43 < H_j < 0.66$ ($0.014 < SE < 0.027$), all exceeding the conventional cutoff value $c = 0.3$. For item pairs, the values were $0.35 < H_{jk} < 0.82$. The effect of including outliers was small. Local independence was assessed using the conditional association procedure (Straat et al., 2016). Without outliers, all items were found to be locally independent. With outliers, $W^{(1)}$ flagged the item pair ATI7-ATI1 as locally dependent. We examined the monotonicity of the ISRFs using minsize = N/10 and minvi = 0.03, and there were only six (five†) violations of manifest monotonicity, all in RW items. All violations of manifest monotonicity were non-significant at α = .05, which indicates that the assumption of monotonicity holds. However, the data both with and without outliers showed significant violations of invariant item ordering. Thus, the data did not support the stricter assumption of the double monotonicity model.
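The scalability coefficients can be illustrated with the covariance-ratio formulation of Mokken's H, in which $H_{jk}$ is the observed covariance of items j and k divided by the maximum covariance attainable given their marginal distributions; $H_j$ and $H$ pool the same numerators and denominators over the relevant item pairs. The sketch below is an assumption-laden illustration, not the software used in the study: it computes point estimates only (no standard errors), assumes RW items are already reverse-coded, and assumes every item has non-zero variance. The function names and toy data are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def _cov(x: Sequence[float], y: Sequence[float]) -> float:
    """Population covariance of two equally long score vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / n


def _cov_max(x: Sequence[float], y: Sequence[float]) -> float:
    """Largest covariance attainable with the same marginal distributions:
    pair both vectors in sorted order (rearrangement inequality)."""
    return _cov(sorted(x), sorted(y))


def mokken_h(
    data: Sequence[Sequence[float]],
) -> Tuple[Dict[Tuple[int, int], float], List[float], float]:
    """Point estimates of H_jk (item pairs), H_j (items), and H (whole scale)."""
    k = len(data[0])
    cols = [[row[j] for row in data] for j in range(k)]
    num: Dict[Tuple[int, int], float] = {}
    den: Dict[Tuple[int, int], float] = {}
    for j, l in combinations(range(k), 2):
        num[(j, l)] = _cov(cols[j], cols[l])
        den[(j, l)] = _cov_max(cols[j], cols[l])  # assumes no zero-variance item
    h_jk = {pair: num[pair] / den[pair] for pair in num}
    h_j = []
    for j in range(k):
        pairs = [p for p in num if j in p]
        h_j.append(sum(num[p] for p in pairs) / sum(den[p] for p in pairs))
    h = sum(num.values()) / sum(den.values())
    return h_jk, h_j, h


# Hypothetical toy data: a perfect Guttman pattern gives H = 1,
# and a perfectly reversed pair gives H = -1.
_, _, h_guttman = mokken_h([[0, 0], [1, 0], [1, 1], [2, 1], [2, 2]])
_, _, h_reversed = mokken_h([[0, 2], [1, 1], [2, 0]])
print(h_guttman, h_reversed)
```

Under the conventional rules of thumb, item coefficients of at least 0.3 are required, and a scale-level H of at least 0.5 indicates a strong scale.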
In summary, the Finnish version of the ATI scale formed a strong Mokken scale that met the criteria of the monotone homogeneity model. Figure 4 shows the results of the parallel analysis. PA-PCA supported a unidimensional structure because the eigenvalue of the first component in the scale data was higher, and the eigenvalue of the second component lower, than the corresponding values in the simulated parallel data. The two smallest MAP values were 0.045 (0.047†) and 0.053 (0.049†), and the smallest MAP value likewise suggested a unidimensional structure. The effect of including outliers on the parallel analysis was negligible.
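Horn's parallel analysis on principal component eigenvalues (PA-PCA) can be sketched as follows. This is a simplified illustration, assuming Pearson correlations, standard normal reference data, 100 simulated data sets, and the 95th percentile as the reference value; this section does not state which correlation type, number of replications, or percentile the study used, and the function name and toy data are hypothetical. A component is retained while its observed eigenvalue exceeds the corresponding simulated reference eigenvalue.

```python
import numpy as np


def parallel_analysis(data, n_sim=100, quantile=0.95, seed=0):
    """Horn's parallel analysis on PCA eigenvalues of the Pearson correlation matrix.

    Returns the observed eigenvalues, reference eigenvalues from simulated
    standard normal data of the same size, and the number of leading
    components whose observed eigenvalue exceeds the reference.
    """
    x = np.asarray(data, dtype=float)
    n, p = x.shape
    # eigvalsh returns ascending eigenvalues of a symmetric matrix; reverse to descending.
    observed = np.linalg.eigvalsh(np.corrcoef(x, rowvar=False))[::-1]
    rng = np.random.default_rng(seed)
    simulated = np.empty((n_sim, p))
    for s in range(n_sim):
        z = rng.standard_normal((n, p))
        simulated[s] = np.linalg.eigvalsh(np.corrcoef(z, rowvar=False))[::-1]
    reference = np.quantile(simulated, quantile, axis=0)
    n_retain = 0
    for obs, ref in zip(observed, reference):
        if obs <= ref:  # stop at the first component that does not beat chance
            break
        n_retain += 1
    return observed, reference, n_retain


# Hypothetical one-factor data: 500 respondents, 9 items loading 0.8 on one factor.
rng = np.random.default_rng(1)
factor = rng.standard_normal((500, 1))
items = 0.8 * factor + 0.6 * rng.standard_normal((500, 9))
obs, ref, k = parallel_analysis(items)
print(k)
```

For ordinal Likert data, polychoric correlations are a common alternative to the Pearson correlations used in this sketch.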

Exploratory graph analysis
We applied EGA using both GLASSO and TMFG estimation to the data without outliers. The model using TMFG (Figure 5(a)) was chosen as the final model because TMFG has been suggested to perform better with unidimensional data and because it showed a smaller TEFI value (TEFI = −4.97) than the GLASSO estimation (TEFI = −4.57; Figure 5(b)). Both estimation methods showed similar two-dimensional structures, except that the GLASSO estimation assigned ATI7 to the same dimension as the RW items. The structures were identical for the data including outliers, except that the GLASSO estimation also assigned ATI1 to the same dimension as ATI7 and the RW items (Appendix Figure D2(a,b)).

Confirmatory factor analysis
We conducted CFA for the original unidimensional a priori model (Model 1). The model without outliers showed a better fit than the model including outliers (Table 4). The post hoc exploratory power analysis of the modification indices suggested adding correlated errors between ATI6R-ATI7, ATI6R-ATI8R, ATI3R-ATI8R, ATI3R-ATI6R, and ATI1-ATI6R. These results reflect local misfit in the model; in other words, the model does not adequately reproduce the relationships between the item pairs mentioned above. Adding correlated errors based on modification indices can be justified if there is reason to believe that some of the covariation is due to a common exogenous cause rather than the latent variable (Brown, 2015, p. 157). For the RW item pairs, the discrepancy could be caused by common method bias (DiStefano & Motl, 2006; Podsakoff et al., 2003; Woods, 2006) and by systematic bias related to item wording (Dalal & Carter, 2014, p. 117). For the other item pairs (i.e., ATI6R-ATI7 and ATI1-ATI6R), the local misfit could be due to their polar opposite wording, as polar opposite items can affect the factorial structure (Zhang et al., 2016). Consequently, we added the correlated errors listed above, and the modified model (Model 2) showed an improved and sufficient fit both with and without outliers.

Differential item functioning
We examined uniform DIF with respect to age and gender using a regularization approach based on the lasso principle and the PCM (Schauberger & Mair, 2020). Only item ATI3R exhibited uniform DIF with respect to gender, and none of the items showed DIF with respect to age. The results were the same for the data both with and without outliers.

Total scale score
The corrected item-total correlations were adequate, ranging between 0.51 (ATI3R) and 0.82 (ATI5). The RW items had the lowest corrected item-total correlations (0.51-0.67). The Shapiro-Wilk normality test was significant (W = 0.99, p < .001), indicating that the total score was not normally distributed. However, the histogram and quantile-quantile plot of the total score (Figure 6) showed an approximately normal shape. The sample mean (M = 33.3) was close to the center of the scale (C = 31.5). There were no ceiling or floor effects. The scale mean for all respondents without outliers was 3.7 (SD = 1.0), for women 3.5 (SD = 0.97), for men 4.1 (SD = 0.99), and for a very small sample of non-binary respondents 4.0 (SD = 0.88). Appendix Table B2 shows the descriptive statistics of the total score in different groups. Figure 7 shows the difference in mean ATI scale score between women and men by field of study in this sample. Men showed slightly higher scores than women, notably in the fields of information technology and natural sciences. These results describe the differences in this particular sample, and convenience sampling limits their generalizability.
Figure 5. EGA for the complete data without outliers using TMFG (a) showed better fit (TEFI = −4.97), supporting a two-dimensional structure. Edges represent the partial correlations between items.
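The corrected item-total correlation and the scale center can be illustrated with the short sketch below. It assumes nine items scored 1 to 6, so the total score ranges from 9 to 54 and its midpoint is 9 × 3.5 = 31.5, matching C above; the sample mean of 33.3 likewise corresponds to 33.3 / 9 ≈ 3.7 on the item scale. "Corrected" means that each item is correlated with the sum of the remaining items rather than with a total that includes the item itself. The helper names and toy data are hypothetical.

```python
from math import sqrt
from typing import List, Sequence

# Assumed scale format: 9 items, each scored 1..6.
N_ITEMS, ITEM_MIN, ITEM_MAX = 9, 1, 6


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def corrected_item_total(data: Sequence[Sequence[float]]) -> List[float]:
    """Correlate each item with the sum of the other items (rest score)."""
    k = len(data[0])
    result = []
    for j in range(k):
        item = [row[j] for row in data]
        rest = [sum(row) - row[j] for row in data]
        result.append(pearson(item, rest))
    return result


def scale_center(n_items: int = N_ITEMS, lo: int = ITEM_MIN, hi: int = ITEM_MAX) -> float:
    """Midpoint of the possible total-score range [n_items * lo, n_items * hi]."""
    return (n_items * lo + n_items * hi) / 2


def to_item_scale(total: float, n_items: int = N_ITEMS) -> float:
    """Express a total score as a mean item score."""
    return total / n_items


print(scale_center())
print(round(to_item_scale(33.3), 1))
```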
We used hierarchical multiple regression analysis to determine whether adding gender improved the prediction of the total ATI scale score over and above age and field of study alone. The difference in explained variance between the models with and without gender as an independent variable indicates the effect of gender on ATI when controlling for age and field of study. After fitting the model, visual inspection of the plot of studentized residuals versus unstandardized predicted values and of the quantile-quantile plot did not reveal heteroscedasticity or violations of normality in the full model. The full regression model regressing the total ATI scale score on gender, age, and field of study was statistically significant, $R^2 = 0.18$, $R^2_{adj} = 0.17$, $F(9, 739) = 17.76$, $p < .001$. The full model results indicated that, when controlling for age and faculty, men scored 4.6 points higher on the total ATI scale, 95% CI [3.2, 5.9] (0.51 points on the Likert scale). Compared with a nested model without the gender variable, adding gender to the prediction of the total ATI scale score led to a statistically significant increase in the coefficient of determination, $\Delta R^2_{adj} = 0.05$, $F(2, 739) = 22.51$, $p < .001$. In other words, gender accounted for an additional 5% of the variance in the data, and the corresponding effect size of gender on the ATI score, $f^2 = 0.06$, 95% CI [0.05, 0.07], was small by conventional criteria (Cohen, 1988, p. 413). The effect of outliers on the effect size was negligible.
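The reported effect size and F statistic can be checked against the usual formulas for an R² increment, $f^2 = (R^2_{full} - R^2_{reduced}) / (1 - R^2_{full})$ and $F = [(R^2_{full} - R^2_{reduced}) / q] / [(1 - R^2_{full}) / df_{resid}]$. The sketch below plugs in the rounded values reported above, assuming q = 2 added gender terms (inferred from F(2, 739)) and treating the reported ΔR² of 0.05 as the unadjusted change, which is only an approximation; the function names are hypothetical.

```python
def cohens_f2_increment(r2_full: float, r2_reduced: float) -> float:
    """Cohen's f^2 for the R^2 gained by adding a set of predictors."""
    return (r2_full - r2_reduced) / (1.0 - r2_full)


def f_change(r2_full: float, r2_reduced: float, q: int, df_resid: int) -> float:
    """F statistic with df = (q, df_resid) for the R^2 change when q predictors are added."""
    return ((r2_full - r2_reduced) / q) / ((1.0 - r2_full) / df_resid)


# Rounded values from the text; treating Delta R^2 = 0.05 as unadjusted is an approximation.
R2_FULL = 0.18
R2_REDUCED = R2_FULL - 0.05
Q_GENDER_TERMS = 2  # assumption inferred from the reported F(2, 739)
DF_RESID = 739

f2 = cohens_f2_increment(R2_FULL, R2_REDUCED)
f_stat = f_change(R2_FULL, R2_REDUCED, Q_GENDER_TERMS, DF_RESID)
per_item_difference = 4.6 / 9  # total-score difference spread over 9 items
print(round(f2, 3), round(f_stat, 2), round(per_item_difference, 2))
```

With these rounded inputs, f² ≈ 0.061 and F ≈ 22.5, which agree with the reported f² = 0.06 and F(2, 739) = 22.51 up to rounding, and 4.6 / 9 ≈ 0.51 reproduces the per-item difference quoted above.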
In a meta-analysis by Cai et al. (2017), the overall weighted effect size of gender on attitudes toward technology (i.e., men having more favorable attitudes toward technology) across all 87 reported studies was small. On the other hand, when comparing group means between men and women in a quota sample, Franke et al. (2019) identified a large effect in ATI (i.e., men having a higher ATI score). Naturally, the findings on gender differences in our study cannot be generalized due to the sampling method and sample characteristics. However, the results are one indication that the translated scale is able to capture differences between groups.

Discussion
The affinity for technology interaction (ATI) scale is a psychometric instrument used to quantify the tendency of a person to "actively approach interaction with technical systems or, rather, tend to avoid intensive interaction with new systems". This research presented a psychometric analysis of a Finnish version of the scale and its properties. The main aim of the analyses was to assess the evidence concerning the scale's internal structure (i.e., its dimensionality and the functioning of the individual items). Furthermore, we examined its ability to capture differences and similarities in ATI among students in higher education by gender and field of study.
The Finnish translation was conducted using a professional forward-backward translation process with a committee approach (Sousa & Rojjanasrirat, 2011). Data were collected using convenience sampling, and the respondents were university students from six different faculties in a Finnish university. Our comprehensive analysis involved factor analysis and analyses based on both parametric and non-parametric IRT. In addition, we examined outliers and reported the results both with and without them. In general, the outliers appeared to slightly weaken the psychometric properties of the scale, but the effect was minimal.
Unidimensionality is a convenient feature of a psychometric scale, and the original scale has been deemed unidimensional in previous studies (Lezhnina & Kismihók, 2020). For the translated version in this study, parallel analyses using traditional PA-PCA and MAP supported a unidimensional structure. On the other hand, a network psychometrics approach using EGA showed a two-dimensional structure in which the RW items separated into a second dimension. Post hoc analysis of the unidimensional CFA model showed structural indications similar to those of EGA. Thus, the RW items, as well as items ATI1 and ATI7, showed some departure from unidimensionality. This departure could be caused by common method bias related to the RW items. It is well known in the literature that RW items can weaken the unidimensional structure of a scale (Boley et al., 2020; Suárez-Álvarez et al., 2018). Furthermore, they can affect response patterns (Baumgartner et al., 2018; Woods, 2006; Zhang et al., 2016), and mixed scales can also be less reliable and have more measurement error (Dalal & Carter, 2014; Schriesheim et al., 1991).
In general, however, the translated version showed at least a moderate fit to the unidimensional model, even though cut-off values for fit indices in ordinal CFA are not yet settled in the research literature. Essential unidimensionality means that the scale has one dominant dimension and may contain minor dimensions that still tap the same latent variable (Slocum-Gori & Zumbo, 2011). Thus, the translated version could be considered essentially unidimensional. From both the CTT and FA points of view, the translated scale showed excellent reliability estimates.
The Mokken scale analysis based on non-parametric IRT showed that the data of the translated scale fitted the MHM well. In other words, the scale conformed to the requirements of unidimensionality, monotonicity, and local independence. The original ATI scale has also been found to support invariant item ordering (Lezhnina & Kismihók, 2020). However, the Finnish version in our research did not support invariant item ordering, indicating that the items cannot be assumed to have the same difficulty ordering across all levels of the latent variable. The translated version formed a strong Mokken scale, which means that it supports at least the weak SOL by X+. Thus, it is possible to form a composite total scale score that can be used to order persons on the latent variable (van der Ark & Bergsma, 2010).
The possibility of using the total scale score to order persons on the latent variable (SOL by X+) and the near absence of uniform differential item functioning (only item ATI3R was flagged with respect to gender) allowed us to extend our analysis to gender differences in ATI. We used multiple regression to assess the gender difference in ATI while controlling for the known covariates, age and the faculty in which the respondent studied. The results showed that men exhibited slightly more affinity for technology interaction than women. Specifically, the difference among the IT students in the sample was interesting, as this subsample was more balanced in terms of gender than the other subsamples. The findings can be considered indications of the differential validity of the scale. While the effect size relating to gender was small, a similar difference between genders has been identified in the research literature, and the effect size in this study was comparable to the findings of a recent meta-analysis by Cai et al. (2017). Naturally, it is worth questioning whether the difference is large enough to have any practical significance (cf. Whitley, 1997). However, as technical systems gain more and more traction and influence in our lives, even a small effect can become significant and meaningful over time. Technological agency is arguably an essential characteristic that enables and promotes equal participation in various fields of life. Thus, it is of the utmost importance to have valid, cross-culturally applicable instruments for assessing different personal stances toward technology and technical systems.
Our study makes several contributions. First, to our knowledge, this is the first comprehensively analyzed translation besides the original ATI scale versions (cf. Franke et al., 2019) and the first Finnish psychometric scale for measuring affinity for technology interaction. Second, whereas previous research has analyzed the ATI scale using factor analysis and non-parametric IRT methods (Lezhnina & Kismihók, 2020), our analysis also utilized parametric IRT methods (PCM). Lastly, we provided an estimate of the gender effect on affinity for technology interaction.

Limitations and future research
There are limitations in our study that need to be considered. Using an online questionnaire as a medium could be a source of common method bias (Podsakoff et al., 2003). Data were collected from university students who had completed secondary education and were pursuing at least a bachelor's degree. Moreover, the sample consisted mostly of relatively young people. These sample characteristics limit the generalizability of the results to the general population, which is the target population of the ATI scale. Specifically, the results concerning gender differences in ATI can only be applied to the population used in this research (i.e., Finnish university students).
However, it is notable that the fairly large sample covered a broad range of fields of study, and it can thus be seen as sufficiently representative of Finnish university students. The translated version of the scale and the results presented here can be useful in research targeting university students (e.g., on educational technology, social media use, or technology adoption). Considering the solid results, the simple language used in the scale, and the nature of the construct, one could assume that the internal structure of the translated scale in the general population would follow the results presented here. These promising results should encourage researchers to conduct more extensive studies. Future research should complement the findings of this study by examining the properties of the translated scale and gender differences among older people and people with more diverse educational backgrounds, using samples representing the general population and other modes of administering the questionnaire.
In general, results concerning gender-based differences in affinity for technology interaction should be treated with caution. The threat of a stereotypical interpretation is important to consider as it can have detrimental effects in various situations (e.g., Barber, 2020;Cadaret et al., 2017;Doyle & Thompson, 2021). Thus, it is worth noting that research examining gender-based differences has the potential to advance unfounded and stereotypical beliefs if not conducted with rigor and interpreted with care. On the other hand, it is essential to examine the differences, for example, from the equity point of view. For that purpose, functional measurement instruments are valuable assets.
We aimed to conduct an accurate and thorough translation; however, in cross-cultural research, complete equivalence between languages is challenging to achieve. Thus, the Finnish translation presented here should be examined in different contexts in future research. Furthermore, we did not assess the relationship of ATI with other similar constructs, as few related and validated psychometric scales are available in Finnish. Thus, future studies should examine the relationships of ATI with similar constructs within a nomological network.
We used correlated errors to modify the CFA model, which can be problematic (Hermida, 2015). An alternative approach would be to examine the effect of the reverse-worded items using a method factor (DiStefano & Motl, 2006). In general, the use of reverse-worded items is itself a controversial topic, and future studies should address how different types of reverse-worded items affect the factor structure (Zhang et al., 2016). Furthermore, item ATI3R, which showed weaker psychometric properties, should be assessed critically and possibly revised, as was also noted in a previous study (cf. Lezhnina & Kismihók, 2020).

Conclusion
We analyzed the psychometric properties of a forward-backward translated Finnish version of the affinity for technology interaction (ATI) scale. The analysis utilized factor analysis, non-parametric IRT, and parametric IRT. To conclude, the Finnish version of the ATI scale showed solid psychometric properties. Furthermore, the scale proved to be essentially unidimensional, having high reliability estimates, and forming a strong Mokken scale. The scale also showed differential validity by identifying a gender difference with respect to the measured construct: men showed slightly more affinity towards technology among the respondents in the sample; however, the effect size was small.

About the Authors
Ville Heilala is a doctoral researcher at the Faculty of Information Technology at the University of Jyväskylä. He holds two Master's degrees, one in education and one in computer science. His research relates to understanding human learning using computational methods.
Riitta Kelly is an English teacher at the Centre for Multilingual Academic Communication, University of Jyväskylä, Finland. She is also a doctoral researcher in Applied Linguistics at the University of Jyväskylä.
Mirka Saarela is a research fellow at the Faculty of Information Technology at the University of Jyväskylä. Her research interests combine machine learning, explainable artificial intelligence, cognitive computing, and education.
Päivikki Jääskelä is an Adjunct Professor and a Senior Researcher at the Finnish Institute for Educational Research at the University of Jyväskylä, Finland. Her research focuses on university student agency, learner-centered pedagogy, and teacher development in higher education.
Tommi Kärkkäinen has been a full professor of Mathematical Information Technology at the Faculty of Information Technology at the University of Jyväskylä since 2002. He has led almost 50 R&D projects, supervised over 50 Ph.D. students, and published over 180 peer-reviewed articles.

Appendix A. Translations of the introductory text and Likert options
The original ATI scale, the items, and the introductory text can be found in the original scale developers' article and on their website.

Parametric item response theory
The partial credit model (PCM; Masters, 1982), an extension of the dichotomous Rasch model to polytomous data, is the simplest of the polytomous IRT models widely used in various measurement and assessment scenarios (Masters, 2016, p. 110). Bond et al. (2020, p. 238) define a Rasch model as "a theoretical mathematical description of how fundamental measurement should operate with social/psychological variables". They continue that "no real, empirical data will ever fit Rasch's theoretical ideal"; the question is rather how closely the model supports the measurement decisions one wants to make (Bond et al., 2020). One important decision a researcher usually wants to make is whether the respondents can be ordered based on their total score. While PCM is a strict model, it is the least strict of the models that still have the property of stochastic ordering of the latent variable by the total scale score (i.e., SOL by X+) (Hemker et al., 1997; Ligtvoet, 2012; Sijtsma & Hemker, 2000; van der Ark, 2005). PCM has been used to analyze scales relating to, for example, attitudes (e.g., Aghekyan, 2020), technology interaction (e.g., Makransky et al., 2017), and behavioral intent (e.g., Chen & Jin, 2020).
PCM estimates two kinds of parameters: the person's ability on the latent variable and the item difficulty, which represents the location of the item on the latent variable continuum at which choosing either of the extreme categories is equally likely (Masters, 2016, p. 111). Another characteristic of PCM is that the category thresholds are allowed to vary between items, which in turn can give valuable information about how the response categories function across items (Wetzel & Carstensen, 2014).
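The PCM category probabilities can be written as $P(X = x \mid \theta) = \exp\big(\sum_{k=1}^{x} (\theta - \delta_k)\big) / \sum_{h=0}^{m} \exp\big(\sum_{k=1}^{h} (\theta - \delta_k)\big)$, where $\theta$ is the person location, $\delta_k$ are the item's category thresholds, and an empty sum equals zero. The sketch below evaluates this formula for a single item; the threshold values are made-up illustrations, not estimates from this study, and the function name is hypothetical. It also shows the property described above: at $\theta$ equal to the mean of the thresholds (the item location), the two extreme categories are equally likely.

```python
from math import exp
from typing import List, Sequence


def pcm_probs(theta: float, deltas: Sequence[float]) -> List[float]:
    """P(X = 0..m | theta) for one partial credit item with thresholds deltas (length m)."""
    logits = [0.0]  # category 0 corresponds to the empty sum
    for d in deltas:
        logits.append(logits[-1] + (theta - d))
    top = max(logits)  # subtract the maximum for numerical stability
    weights = [exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


# Illustrative thresholds for a 6-category item (5 thresholds); not estimates from the study.
deltas = [-2.0, -1.0, 0.0, 1.0, 2.0]
location = sum(deltas) / len(deltas)  # item location = mean threshold
probs = pcm_probs(location, deltas)
print([round(p, 3) for p in probs])
```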
The PCM was fitted in R using the eRm package (Mair & Hatzinger, 2007b) and the conditional maximum likelihood (CML) procedure, which has mathematical and epistemological advantages (Mair & Hatzinger, 2007a).
Unidimensionality was assessed using principal component analysis of residuals (PCAR). Under unidimensionality, the partial credit model is expected to explain all systematic variance in the data, so that the residuals represent random noise. Thus, unidimensionality could be supported if all eigenvalues obtained from the PCA of the residuals are less than two and there are no contrasting item loadings on the first component (Bond et al., 2020, p. 254). Local dependency (LD) was evaluated using the $Q_3$ statistic, the Pearson correlation between raw item residuals (Yen, 1984). A difference greater than 0.20 between the largest observed residual correlation and the average of the observed residual correlations, denoted $Q_{3,*}$, could indicate LD (Christensen et al., 2017).
The goodness of fit at the item level was assessed using unweighted mean-square (i.e., outfit) and weighted mean-square (i.e., infit) statistics. Optimal infit and outfit values are close to 1.0, whereas statistically significant values greater than 1.0 indicate underfit (i.e., unpredictability and more variation than the model expects), and values less than 1.0 indicate overfit (i.e., overly deterministic response patterns and less variation than the model expects) (Bond et al., 2020, p. 241). Overfit can occur, for example, in the case of item redundancy, and it can cause smaller standard errors, inflated reliability, and local dependency (Bond et al., 2020, p. 241). On the other hand, underfit degrades measurement quality and is in practice more important to diagnose than overfit (Bond et al., 2020, p. 241).
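The two mean-square statistics can be sketched from their classic definitions: with residual $y_{ni} = x_{ni} - E_{ni}$ and model variance $W_{ni}$, outfit is the mean of the squared standardized residuals $y_{ni}^2 / W_{ni}$, and infit is $\sum_n y_{ni}^2 / \sum_n W_{ni}$, i.e., squared residuals weighted by their information. This minimal sketch uses these unconditional formulas with category probabilities (and hence expected scores and variances) supplied from an already fitted model; it does not reproduce the conditional-residual versions computed with the iarm package in this study, and the function names and toy values are hypothetical.

```python
from typing import Sequence, Tuple


def moments(probs: Sequence[float]) -> Tuple[float, float]:
    """Expected score E and variance W of one response, given P(X = 0..m)."""
    e = sum(x * p for x, p in enumerate(probs))
    w = sum((x - e) ** 2 * p for x, p in enumerate(probs))
    return e, w


def outfit_infit(
    observed: Sequence[float], expected: Sequence[float], variance: Sequence[float]
) -> Tuple[float, float]:
    """Unweighted (outfit) and information-weighted (infit) mean squares for one item."""
    sq_resid = [(o - e) ** 2 for o, e in zip(observed, expected)]
    outfit = sum(r / w for r, w in zip(sq_resid, variance)) / len(sq_resid)
    infit = sum(sq_resid) / sum(variance)
    return outfit, infit


# Hypothetical toy item: two respondents, each with P(X = 0) = P(X = 1) = 0.5.
e, w = moments([0.5, 0.5])
print(outfit_infit([1, 0], [e, e], [w, w]))  # residuals exactly as large as expected
```

Values near 1.0 mean the residuals are about as large as the model predicts; values well above 1.0 (underfit) or below 1.0 (overfit) would be flagged as described above.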
Outfit and infit statistics were calculated based on the conditional residuals (Müller, 2020b) using the iarm package (Müller, 2020a).
We examined the category probability curves visually to find possible disordered categories (i.e., reversed deltas or disordered thresholds). The ordering of category thresholds should follow the ordering of the actual response categories. Disordering can occur, for example, when a category is not the most likely category at any point along the latent variable (Wetzel & Carstensen, 2014). Disordered categories could indicate dependence among the underlying items; however, rash judgments should be avoided, as disordering does not necessarily imply that the item is not functioning as expected (Adams et al., 2012). Another visual representation, the person-item map (a.k.a. Wright map), depicts the respondents in terms of their ability and the items in terms of their location and thresholds along the latent variable continuum on a logit scale (Bond et al., 2020). Thus, a person-item map can be used to evaluate, for example, the ordering of the response categories and the coverage of the items along the latent variable.
Figure C1. Item characteristic curve of item ATI4, showing an ordered category structure. Similar ordering was found in items ATI1, ATI8R, and ATI9.
Figure C2. Item characteristic curve of item ATI2, showing a slightly disordered category structure. Similar ordering was found in items ATI3R and ATI5.
Figure C3. Person-item map of the PCM, with items ordered by location. Numbered circles represent the intersection points of adjacent item characteristic curves. Asterisks on the right margin mark the items exhibiting category disordering.

Results
All results for the PCM were similar with and without outliers, and the results without outliers are reported here. In the post hoc analysis, the assumption of unidimensionality was assessed using PCA of the residual correlations. The largest eigenvalue was 2.19, slightly greater than the suggested threshold value of 2. Further examination showed contrasting item loadings: the RW items and ATI7 showed negative loadings, and all other items showed positive loadings on the largest component. The results indicated that the residuals possibly contained unexplained variance, which would compromise the assumption of unidimensionality from the PCM point of view. The residual correlations ($\bar{Q}_3 = -0.12$, $Q_{3,sd} = 0.16$, $Q_{3,\min} = -0.36$, $Q_{3,\max} = 0.23$, $Q_{3,*} = 0.35$) indicated the existence of local dependency. For seven of the 36 item pairs, the residual correlation exceeded the average by more than 0.20, and for two item pairs, ATI2-ATI4 and ATI2-ATI5, by more than 0.30. The effect of including outliers was minimal for assessing both unidimensionality and local dependency.
Item fit indices (Table C1) indicated significant outfit and infit for all items except ATI6R, ATI7, and ATI9. Significant overfit could be expected because of item content similarity. However, only items ATI3R and ATI8R showed significant underfit, which can degrade the measurement results (Bond et al., 2020, p. 241). For item ATI3R, the misfit was extreme, indicating unpredictability and excess variation. The unpredictable pattern of item ATI3R was also noticeable in the scalogram (Figure 3) and in the other analyses.
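The $Q_3$ summaries can be sketched as follows: compute the Pearson correlation for every pair of item residual columns, take their mean $\bar{Q}_3$, and form $Q_{3,*} = Q_{3,\max} - \bar{Q}_3$. With the values reported above, 0.23 - (-0.12) = 0.35, and nine items give 9 choose 2 = 36 pairs. The sketch assumes residuals (observed minus expected scores) from an already fitted PCM; the function names and toy residuals are hypothetical.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def q3_summary(
    residuals: Sequence[Sequence[float]], threshold: float = 0.20
) -> Tuple[Dict[Tuple[int, int], float], float, float, List[Tuple[int, int]]]:
    """Pairwise Q3, their mean, Q3* = max - mean, and pairs exceeding the mean by > threshold."""
    k = len(residuals[0])
    cols = [[r[j] for r in residuals] for j in range(k)]
    q3 = {(j, l): pearson(cols[j], cols[l]) for j, l in combinations(range(k), 2)}
    q3_bar = sum(q3.values()) / len(q3)
    q3_star = max(q3.values()) - q3_bar
    flagged = [pair for pair, v in q3.items() if v - q3_bar > threshold]
    return q3, q3_bar, q3_star, flagged


# Hypothetical residuals for 4 respondents x 3 items; items 0 and 1 share residual variation.
toy_residuals = [[1, 1, 1], [-1, -1, 1], [1, 1, -1], [-1, -1, -1]]
q3, q3_bar, q3_star, flagged = q3_summary(toy_residuals)
print(round(q3_star, 3), flagged)
```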
Items ATI1, ATI4, ATI8R, and ATI9 showed a clearly ordered category structure (Appendix Figure C1). On the other hand, items ATI2, ATI3R, and ATI5 showed disordered thresholds between two adjacent categories (Appendix Figure C2), and items ATI6R and ATI7 showed one extremely narrow category. Notably, the middle response categories Slightly disagree and Slightly agree showed disordering or a very narrow range on the latent variable (Appendix Figure C3), indicating that respondents were more likely to choose the more extreme categories.
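Threshold disordering and never-modal categories can be checked numerically as sketched below. The example assumes PCM thresholds are already available (e.g., from a fitted model) and scans a grid of $\theta$ values from -6 to 6 logits; the threshold values and function names are hypothetical illustrations, not estimates for the ATI items. As noted above, a flagged item is a prompt for closer inspection rather than proof that the item malfunctions.

```python
from typing import List, Sequence, Set, Tuple


def disordered_pairs(deltas: Sequence[float]) -> List[Tuple[int, int]]:
    """0-based indices (k, k+1) of adjacent thresholds that are not in increasing order."""
    return [(k, k + 1) for k in range(len(deltas) - 1) if deltas[k] >= deltas[k + 1]]


def modal_categories(
    deltas: Sequence[float], lo: float = -6.0, hi: float = 6.0, step: float = 0.01
) -> Set[int]:
    """Categories that are the most probable PCM response somewhere on a theta grid."""
    seen: Set[int] = set()
    n_steps = int(round((hi - lo) / step))
    for i in range(n_steps + 1):
        theta = lo + i * step
        # Unnormalized PCM log-probabilities: cumulative sums of (theta - delta_k).
        logits = [0.0]
        for d in deltas:
            logits.append(logits[-1] + (theta - d))
        seen.add(max(range(len(logits)), key=logits.__getitem__))
    return seen


ordered = [-2.0, -1.0, 0.0, 1.0, 2.0]  # hypothetical 6-category item
reversed_item = [-2.0, 0.5, -0.5, 2.0]  # hypothetical 5-category item, 2nd and 3rd thresholds reversed
print(disordered_pairs(ordered), sorted(modal_categories(ordered)))
print(disordered_pairs(reversed_item), sorted(modal_categories(reversed_item)))
```

In the second example, category 2 is never the most probable response at any point on the grid, which is the situation described by Wetzel and Carstensen (2014).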
The person-item map showed that the items cover a wide range of the latent variable continuum (Appendix Figure C3). Items ATI4 and ATI9 showed relatively evenly distributed category thresholds. Item ATI8R proved to be the most difficult item to agree with, providing information about the higher end of the continuum. In particular, item ATI9 showed several desirable properties: it covers a wide range of the latent variable with ordered and relatively evenly spaced thresholds, it is located in the middle of the continuum, and its item fit statistics are adequate.
Appendix D. Results for data including outliers
Figure D1. Scalogram containing only the outliers (n = 39) suggested by MMCD75; respondents are ordered by total score and items by total item sum.
Figure D2. EGA for the complete data including outliers.