What are we measuring with the morningness–eveningness questionnaire? Exploratory factor analysis across four samples from two countries

ABSTRACT Individual variability in diurnal preference or chronotype is commonly assessed with self-report scales such as the widely used morningness–eveningness questionnaire (MEQ). We sought to investigate the MEQ’s internal consistency by applying exploratory factor analysis (EFA) to determine the number of underlying latent factors in four different adult samples, two each from the United Kingdom and Brazil (total N = 3,457). We focused on factors that were apparent in all samples, irrespective of particular sociocultural diversity and geographical characteristics, so as to show a common core reproducible structure across samples. Results showed a three-factor solution with acceptable to good model fit indexes in all studied populations. Twelve of the 19 MEQ items in the three-correlated factor solution loaded onto the same factors across the four samples. This shows that the scale measures three distinguishable, yet correlated constructs: (1) items related to how people feel in the morning, which we termed efficiency of dissipation of sleep pressure (recovery process) (items 1, 3, 4, 5, 7, 9, 13, and 19); (2) items related to how people feel before sleep, which we called sensitivity to buildup of sleep pressure (items 2, 10, and 12); and (3) peak time of cognitive arousal (item 11). Although the third factor was not regarded as consistent since only one item was common among all samples, it might represent subjective amplitude. These results suggested that the latent constructs of the MEQ reflect dissociable homeostatic processes in addition to a less consistent propensity for cognitive arousal at different times of the day. By analyzing answers to MEQ items that compose these latent factors, it may be possible to extract further knowledge of factors that affect morningness–eveningness.


Introduction
The alternations between wakefulness and sleep are negotiated through an interaction between the circadian clock (Process C) and a sleep homeostat which measures the buildup and dissipation of sleep pressure (Process S) (Borbély 1982; Dijk and Lockley 2002). A wide range of molecular and behavioral processes, including hormone levels, core body temperature, and sleep-wake patterns, follow intrinsically generated cycles of approximately 24 h, which are known as circadian rhythms. In humans, circadian typology, which produces continuous variables that can be used to classify people into categories or chronotypes (e.g., morning, intermediate, and evening type) with distinguishable morningness-eveningness (M-E) profiles, is one of the most studied individual differences in circadian rhythmicity. However, the idea that M-E reflects a continuum is today regarded as a more appropriate way of characterizing individuals than using coarsely categorized cutoff chronotype scores (e.g., Caci et al. 2008). A number of instruments have been proposed to measure individual differences in sleep-wake cycles. These include (number of citations in Web of Knowledge as of 14 June 2020 appended in parentheses as a measure of their use to date) the morningness-eveningness questionnaire (MEQ: Horne and Östberg 1976, N = 2445 citations), the Composite Scale of Morningness (CSM: Smith et al. 1989, N = 620 citations), which contains nine items derived from the MEQ and four items from the Diurnal Type Scale (DTS: Torsvall and Åkerstedt 1980, N = 277 citations), the reduced 5-item version of the MEQ (rMEQ: Adan and Almirall 1991, N = 246 citations), the Early/late Preference Scale (PS: Smith et al. 2002, N = 127 citations), the Munich ChronoType Questionnaire (MCTQ: Roenneberg et al. 2003, N = 885 citations), the Chronotype Questionnaire (ChQ: Ogińska 2011, N = 39 citations), later modified to build the Caen Chronotype Questionnaire (CCQ: see Dosseville et al. 2013, N = 29 citations), with further modifications by Ogińska et al. (2017), the Circadian Energy Scale (CIRENS: Ottoni et al. 2011, N = 29 citations), and the Morningness-Eveningness Stability Scale-Improved (MESSi: Randler et al. 2016, N = 37 citations), composed of four items derived from the CSM, nine items from the CCQ (Dosseville et al. 2013), and one item from the CIRENS (Ottoni et al. 2011). With the exception of the MCTQ, which measures the extent to which rhythmic biobehavioural events correspond to environmental ones (phase of entrainment, see Levandovsky et al. 2013; Roenneberg 2015), these questionnaires evaluate subjective diurnal rhythm, subjective phase, and M-E orientation, correlating with sleep/wake habits (see Ogińska 2011), mostly using ordinal, Likert-type responses to questions or statements. Although most of these scales are similar in the latter respect, they differ in phrasing, in number of items, and in number of response alternatives, all of which yield different psychometric properties that are not directly comparable.
The MEQ is the oldest, most cited, and one of the most widely used measures in chronobiology and sleep research despite its self-reported and thus subjective nature (see Adan et al. 2012 for a comprehensive review). This is possibly so because scores in the MEQ have been shown to be valid by predicting objective and subjective measures, such as intrinsic circadian period (Duffy et al. 2001), alternations in temperature (Bailey and Heitkemper 2001) and cognition throughout the day (Yoon et al. 1998), circadian secretion of melatonin (Kantermann et al. 2015) and cortisol (Bailey and Heitkemper 2001), patterns of reported sleep and wake times, and capacity to adapt to night shifts and presence of sleep disorders (Sack et al. 2007). This questionnaire enquires about preferred rather than actual timings, and scores vary between individuals due to sociocultural factors (Biswas et al. 2014; Natale et al. 2009), age (Adan and Almirall 1990; Duffy et al. 2001), sex (Adan and Natale 2002), heritability (Hida et al. 2014; von Schantz et al. 2015), and geographical characteristics, such as latitude (e.g., Miguel et al. 2017) and longitude (e.g., Shawa and Roden 2016).
The MEQ was designed to measure M-E to determine suitability for shift work and was initially validated using changes in sleep temperature that are known to follow circadian rhythms (Horne and Östberg 1976). Later studies, however, showed that the MEQ also harbors dimensions related to sleep homeostasis (Mongrain et al. 2006;Taillard et al. 2003;Viola et al. 2007).
A slower buildup of sleep pressure can result in a later preferred bedtime, while a faster dissipation of sleep pressure may favor earlier wake-up time and better general disposition in the morning (Mongrain et al. 2006). This, in part, explains why adolescents, who display eveningness preference in the MEQ and attend schools with early morning start times, obtain insufficient sleep on school nights (Arrona-Palacios et al. 2015). This reflects the earlier developmental reduction of sensitivity to buildup of sleep pressure, leading to a delayed sleep propensity at this age. On the other hand, later maturation of the efficiency of dissipation of sleep pressure results in adolescents' considerable need for sleep and, therefore, in later wake-up times than an early school start allows (see Crowley et al. 2018). The different ontogenetic time courses of sensitivity to buildup and efficiency of dissipation of sleep pressure indicate the physiological separability of these two homeostatic phenomena (Crowley et al. 2018).
Furthermore, subjective amplitude, also termed distinctiveness, is another aspect of sleep-wake cycles, which is explicitly measured in only some of the scales that assess circadian rhythms (i.e., ChQ, CCQ, CIRENS, and MESSi). It reflects the degree of awareness of one's own states of hyper-and hypo-activation and of the ability to modulate psychophysiological state, such as levels of alertness/energy throughout the day or night, representing the range of diurnal variations in mood and activation (Dosseville et al. 2013;Oginska et al. 2017). Because MEQ scores correlate with the subjective amplitude subscale of the ChQ (Dosseville et al. 2013), it is possible that, to some degree, the MEQ also measures circadian amplitude and the homeostatic drive, as discussed above, beyond its ability to assess individual differences in M-E.
Hence, a deeper look into the psychometric properties of the MEQ is necessary to understand more about what the scale actually measures. One of the attributes of the MEQ is its reliability, which has been confirmed by calculating its internal consistency using Cronbach's alpha in many samples worldwide with different characteristics (e.g., Li et al. 2011). However, this index of internal consistency is based on rigid, somewhat unrealistic assumptions that limit its use under many conditions (McNeish 2017; Raykov 1997, 1998, 2001), including its applicability to the MEQ. One of these assumptions is that all items of the scale must discriminate the intended measured behavior equally; in other words, the magnitude of the correlation of each item with the underlying M-E indicator should be the same. There is no evidence that this is so in the case of the MEQ. Another assumption is unidimensionality, meaning that all items of the scale should reflect a single construct, which does not seem to be the case. Di , for instance, have pointed out that the homogeneity range among items of the MEQ is low, suggesting the scale is not unidimensional. Assuming the unidimensionality of the MEQ would mean that M-E is guided by a single process. However, to understand individual differences in this respect, it must be considered that these cycles reflect the interplay between Process C, which depends on time of day (circadian time), and Process S, determined by the duration of wakefulness (Borbély 1982; Dijk and Lockley 2002) and efficiency in dissipating this need during sleep (Rusterholz et al. 2016), as explained above. Additionally, some individuals display what has been termed a bimodal response, characterized by simultaneous signs of extreme morning and evening types (Martynhak et al. 2010; Tempaku et al. 2017).
These findings point to the existence of different facets of M-E that may interfere with how people answer different items of the MEQ, which can be masked when total scores of the questionnaire are used, and/or its unidimensionality is assumed.
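As a rough illustration of the index discussed above, Cronbach's alpha can be computed from the item variances and the variance of the total score. The sketch below is not the procedure used in any of the cited studies; the function name `cronbach_alpha` and the toy responses are hypothetical and serve only to show the arithmetic.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])  # number of items
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across respondents.
    item_variances = [pvariance([row[j] for row in responses]) for j in range(k)]
    # Variance of the summed (total) score across respondents.
    total_variance = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical toy data: 5 respondents x 4 items (not MEQ data).
toy = [
    [1, 2, 1, 2],
    [2, 2, 3, 2],
    [3, 4, 3, 3],
    [4, 4, 4, 5],
    [5, 4, 5, 4],
]
alpha = cronbach_alpha(toy)
print(round(alpha, 3))
```

A high alpha by itself says nothing about how many constructs the items reflect, which is why the assumptions listed above matter when the index is applied to the MEQ.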
How these homeostatic and circadian factors affect the way people self-rate their M-E with the MEQ has seldom been investigated. Some studies have proposed to do so by analyzing the MEQ via principal component analysis (PCA), one of the most frequently used model-based component extraction methods (e.g., Jankowski 2013). However, results have been mixed. For instance, Adan and Natale (2002) used the principal component extraction method to determine the MEQ dimensions in an Italian sample and reported three dimensions: Time of greatest efficiency (items 6, 11, 15, 17, 18), sleep time/sleep phase (1, 2, 10, 12, 14, and 16), and awakening time/sleep inertia (3, 4, 5, 7, 8, 9, and 13). By contrast, Li et al. (2011) found two dimensions in a Chinese sample: "sleep phase" (items 1, 2, 3, 4, 5, 6, 7, 8, 13, and 19) and "time of greatest efficiency" (items 10, 11, 12, 14, 15, 16, 17, and 18). Hätönen et al. (2008) used another statistical approach, a factor analysis with least squares and maximum likelihood (ML) estimators. Factor analysis yielded four factors in these Finnish participants: General preferences (items 9, 11, 15, 16, 17, and 19), morning activities (items 1, 4, 5, 7, 8, and 13), evening activities (item 15); and times for physical work (item 12). However, ML estimators are not ideal when answers are not on a continuous scale, such as in the MEQ, and, therefore, alternative estimators that are appropriate for analyzing ordinal scales should have been used, such as weighted least square with mean and variance adjusted (WLSMV) ones (Beauducel and Herzberg 2006).
Clearly, no pattern of aggregation of items into components is apparent from the above-mentioned studies. This may be due to two circumstances. The first is the use of diverse populations, which might present different characteristics that affect how persons answer the MEQ, as discussed above. The second is the inadequacy of PCA for establishing the structural validity of the MEQ, that is, for determining possible different underlying processes affecting the MEQ scores. Indeed, PCA is a variable reduction technique (Brown 2015), which takes the scores of large sets of measured variables (e.g., questions or items of a questionnaire) and reduces them to scores of a smaller set of composite variables called components, which preserve as much information from the original variables as possible and are not linearly correlated with each other. The first component explains the highest variance in responses to a scale, that is, it accounts for as much variability as possible from the total answers. Each succeeding component then explains a progressively smaller amount of variance, and the analyst must decide how many meaningful components to retain, that is, whether the items of a given questionnaire/scale can be represented more parsimoniously by a set of derived components that are fewer in number than the items. Another important issue is that although PCA provides the variance accounted for by each component, these are usually inflated values because errors are not accounted for (Schmitt 2011). Measurement errors can be random, such as arbitrarily choosing a specific alternative answer, misreading a question, or having responses biased by one's present state of mind. In these cases, the errors are unlikely to be the same among different respondents. Nonrandom errors can also occur and be similar among respondents; these have to do with ambiguities in the wording of a given question, biased answers toward the end of a questionnaire due to respondents' tiredness, etc. As measurement errors increase, reliability decreases; they therefore have to be taken into account when determining the factor structure and consistency of a scale (Edwards and Bagozzi 2000).
According to Bandalos (2018), factor analysis provides evidence based on the internal consistency (formerly called construct validity). As suggested by Di , a better alternative to describe the structure of the MEQ is to use exploratory factor analysis (EFA), a technique of structural equation modeling (SEM) which explores the structure of the data and returns the number of latent factors that emerge. Latent factors are different from principal components. They are mathematically inferred from scores in items of a questionnaire that share variance and reflect underlying, non-observable constructs (Brown 2015). If only one factor emerges, for instance, this means that the questionnaire is unidimensional. Two factors would indicate that there are two underlying sources that affect how people answer the questions within a scale. EFA provides factor loadings of each question or item on each latent factor and the pattern of intercorrelations among factors. Importantly, it accounts for measurement errors, unlike PCA (Schmitt 2011). Depending on the way the analysis is set up, factors can be allowed to correlate with each other (oblique rotation) or not (orthogonal rotation). Factors are expected to correlate within a scale such as the MEQ, because all questions track different underlying facets of sleep-wake cycles. Assuming that these facets are uncorrelated would be unrealistic. In this respect, EFA is also more adequate than PCA, especially when items are in ordered-categorical format (i.e., Likert scale) as in the MEQ.
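In standard textbook notation (the symbols below are introduced here for illustration only and are not taken from the original analyses), the common factor model with correlated factors described above can be sketched as:

```latex
\mathbf{x} = \boldsymbol{\Lambda}\,\boldsymbol{\xi} + \boldsymbol{\varepsilon},
\qquad
\boldsymbol{\Sigma} = \boldsymbol{\Lambda}\,\boldsymbol{\Phi}\,\boldsymbol{\Lambda}^{\top} + \boldsymbol{\Theta}
```

Here $\mathbf{x}$ is the vector of item responses (for ordinal items, the continuous variables assumed to underlie them), $\boldsymbol{\Lambda}$ holds the factor loadings, $\boldsymbol{\xi}$ the latent factors, $\boldsymbol{\Phi}$ the factor correlation matrix, and $\boldsymbol{\Theta}$ the diagonal matrix of unique (error) variances. An orthogonal rotation fixes $\boldsymbol{\Phi} = \mathbf{I}$, whereas an oblique rotation leaves its off-diagonal elements free. PCA has no counterpart to $\boldsymbol{\Theta}$, which is the sense in which it does not account for measurement error.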
In sum, although PCA and EFA may appear to produce similar results, they are conceptually different. Only EFA statistically identifies the most adequate number of latent constructs and the factor structure that underlies a set of variables while taking into account measurement errors (Brown 2015) and the ordinal and intercorrelated nature of responses, as is the case for the MEQ. Whenever the purpose is to identify unknown latent constructs, as in the case of the MEQ, applying EFA is the more sensible approach (Fabrigar et al. 1999), since it provides evidence of internal consistency.
The only study that has analyzed the factor structure of the MEQ in this way was published by Caci et al. (2008). These authors found a four-factor solution with an exploratory PCA followed by EFA using data from French participants. They named the factors peak time (items 1, 9, 11, 15, 17, and 18), morning affect (items 4, 5, and 7), retiring (items 2, 8, 10, 12, 16, and 19), and rising (items 3 and 13). Their choice of software (i.e., STATA), which uses maximum likelihood (ML) as an estimator, might have biased the parameter estimates (factor loadings and thresholds) because of the non-continuous answer format (i.e., Likert scale) of the MEQ, as in the case of Hätönen et al. (2008). Different authors have argued in favor of estimators from the weighted least-squares family over the ML estimator for ordered-categorical items (Beauducel and Herzberg 2006; Kaplan 2012; Muthén and Kaplan 1985). Indeed, the importance of using factor analysis (i.e., EFA and CFA) has been acknowledged in the analyses of other M-E scales, such as the rMEQ, MESSi, and CSM, as will be described below. Therefore, in order to gain insights into the possible factor structure of the MEQ, we considered the factor structure of other similar questionnaires that was determined with adequate analyses (EFA using the WLSMV estimator, scree test, and oblique rotation, CFA, or bifactor models).
The psychometric properties of the rMEQ (Adan and Almirall 1991) have been assessed using latent variable approaches (i.e., EFA and CFA), which are robust techniques for understanding the factor structure underlying a set of items. Urbán et al. (2011), for instance, confirmed the one-factor solution proposed by Adan and Almirall (1991). Since the rMEQ contains only five of the MEQ items, its unidimensional factor structure cannot be extended to the full MEQ.
Regarding the MESSi, some studies have also used latent variable approaches (EFA and CFA), and confirmed a three-factor structure (e.g., Tomažič and Randler 2018). Although four items of the MESSi are derived from the CSM (CSM items 3, 4, 12, and 13), only the first two are derived from the MEQ (MEQ items 4, 5). Hence, MESSi's factor structure is not directly applicable to that of the MEQ.
The third scale analyzed with factor analytic approaches was the CSM, which, at first glance, should have more bearing on the factor structure of the MEQ, because nine of its 13 items (items 1-9) are derived from the MEQ (corresponding to MEQ items 1, 2, 4, 5, 7, 9, 10, 11, and 19) and because of the high correlation of total scores (i.e., > 0.90) between the MEQ and CSM (Caci et al. 2008; Smith et al. 1989). However, it must be stressed that the MEQ items within the CSM will not behave psychometrically in the same way, because different patterns of intercorrelations between items might emerge and, consequently, the number of factors and how the items correlate with them might also change. Overall, the great majority of studies on the CSM show two (Bhatia et al. 2013; Díaz-Morales and Sanchez-Lopez 2004; Kato et al. 2019; Smith et al. 2002) or three (Caci et al. 2005, 2008; Randler and Díaz-Morales 2007) interrelated factors that varied in the number of items, in the items themselves in each factor, and in the interpretation of what they represent (see Supplementary file for a description of the findings of these papers).
However, these data serve as an indication that M-E does not fit as a single dimension, as indicated by findings of Di . Consequently, we hypothesized that the MEQ would be multidimensional, although we did not predict the number of factors or the pattern of interrelations between the items and factors. Additionally, because various prior studies using the CSM seem to have shown a factor with a large proportion of items that pertain to how people feel in the morning (Caci et al. 2005; Kato et al. 2019; Randler and Díaz-Morales 2007; Smith et al. 1989), we hypothesized that we would also find a factor with this characteristic. Prior investigations of the CSM and MESSi structure have provided no specific interpretation of what the different factors could represent in terms of homeostatic, subjective amplitude, and/or circadian biological underpinnings that are known to be dissociable using other methodological approaches. This is important because an adequate statistical approach is not enough to understand the factor structure of a scale unless it incorporates substantive theory about what the factors indicate. An example would be factors that correspond to separable homeostatic processes (Crowley et al. 2018; Dijk and Lockley 2002; Mongrain et al. 2006; Taillard et al. 2003; Viola et al. 2007).
We, therefore, sought to determine the internal consistency of the MEQ by describing its factor structure using EFA with WLSMV in four samples drawn from two countries, Brazil and the United Kingdom. Although the diversity of these samples in terms of geographical location and sociocultural and genetic profiles can affect the MEQ answers, as discussed above, the rationale behind the use of data from different contexts was to determine a common core structure underlying possible emerging latent factors that reflect sleep patterns present irrespective of this diversity, which can nevertheless impact how people behave (e.g., Rad et al. 2018). This is a first step toward dealing with comparability of models across different samples, so that future studies can use robust types of analyses across samples, such as invariance techniques, which allow direct comparisons of the effects of culture, sex, age, and so forth. In other words, obtaining similar models in samples drawn from different locations is important to produce findings that are not sample-specific and to provide clues about how they vary according to economic and sociocultural factors and context. Having two independent samples each from Brazil and the UK also enabled the determination of variations of responses within the same nation.

Participants
We used data from four different studies with a total of 3,457 adult participants of both sexes (1,636 men), ranging in age from 18 to 113 y. Study one was conducted in São Paulo, a Brazilian megalopolis (N = 1,177; 425 men; ages ranging from 18 to 80 y; mean = 30.19 [SD = 10.24] y; latitude: −23.6; longitude: 46.6). Study two involved another Brazilian cohort, from Baependi, a small countryside town with fewer than 20,000 inhabitants (N = 1,343; 782 men; ages ranging from 18 to 113 y; mean = 45.07 [SD = 17.02] y; latitude: −22.0; longitude: 44.9). Studies three and four were collected from visitors to the Science Museum in London, UK, in 2001 and 2004 (2001

Measure of morningness-eveningness
For the UK samples, the original English version of the morningness-eveningness questionnaire (MEQ) (Horne and Östberg 1976) was used, and for the Brazilian samples, the validated translation into Brazilian Portuguese (Benedito-Silva et al. 1990). This questionnaire includes 19 items evaluated on 4- or 5-point ordinal scales, generating a score associated with people's morningness (higher scores) or eveningness (lower scores) after reversal of the scores of some items. However, we analyzed answers to each item without reversing scores and, for the five items (questions 1, 2, 10, 17, and 18) that are rated on a continuous scale, used ordinal scores following the instructions of the original questionnaire.

Procedure
All studies from which the samples were obtained were approved by local Ethics Committees, and all participants provided informed consent, conforming to international ethical standards (Portaluppi et al. 2010). Participants also provided demographic information and filled in the full version of the MEQ. The London samples were collected in person during two separate periods from visitors to the Science Museum in London (Jones et al. 2007; Robilliard et al. 2002). The São Paulo sample was collected online during two consecutive years as a subset of a larger sample of about 13,000 participants from the Brazilian population (Miguel et al. 2017). The Baependi sample was collected as part of the Baependi Heart Study through verbal interviews with a scribe (von Schantz et al. 2015). All participants were aged ≥ 18 y. Anonymized databanks from all four studies were aggregated for statistical analyses.

Statistical analysis
Descriptive statistics (proportions and counts at item level) and polychoric correlations were computed for each sample. Exploratory Factor Analysis (EFA) was applied separately to each of the four datasets. We employed weighted least squares estimation using a diagonal weight matrix, with standard errors and a mean- and variance-adjusted test statistic (WLSMV estimator). This is the default estimator in Mplus for ordinal items (Muthén and Muthén 1998-2017), and it estimates the magnitude of the loadings more precisely (Beauducel and Herzberg 2006). An oblique rotation (GEOMIN, the default rotation in Mplus; Muthén and Muthén 1998-2017) was used, which allows for correlation among latent factors, because there was no a priori reason to assume that factors would be totally independent.
As a first inspection of the number of factors to be extracted, we used a scree test based on eigenvalues. A scree test provides a graph in which the factors form the horizontal axis and the eigenvalues the vertical axis. The graph is used to show the point where lines drawn through the plotted eigenvalues change slope, or the last significant decreasing trend for eigenvalues (Brown 2015). Together with scree tests, standardized solutions were evaluated taking into account the theoretical background underlying the MEQ answers. Because the importance of items with small factor loadings (< 0.4) can easily be overinterpreted, we focused on items with factor loadings > 0.4, which indicate acceptable correlations with their latent factors (Nunnally 1967).
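The quantities inspected in a scree test can be tabulated in a few lines. The following is a minimal sketch, assuming the full set of eigenvalues of an item correlation matrix is already available; the function name `scree_summary` and the eigenvalues are hypothetical (a six-item toy example, not values from the MEQ samples).

```python
def scree_summary(eigenvalues):
    """Return (factor number, eigenvalue, share of variance, drop to next) tuples."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)  # equals the number of items for a full correlation matrix
    summary = []
    for i, value in enumerate(ordered):
        # Drop to the next eigenvalue; undefined (None) for the last one.
        drop = value - ordered[i + 1] if i + 1 < len(ordered) else None
        summary.append((i + 1, value, value / total, drop))
    return summary


# Hypothetical eigenvalues for a six-item scale (they sum to 6).
toy_eigenvalues = [2.9, 1.3, 0.8, 0.4, 0.35, 0.25]
summary = scree_summary(toy_eigenvalues)

for number, value, share, drop in summary:
    bar = "#" * round(value * 10)  # crude text version of the scree plot
    print(f"factor {number}: {value:5.2f} ({share:5.1%}) {bar}")
```

The point at which the drops become small and roughly constant is what the visual inspection looks for; as stated above, that inspection is combined with theoretical considerations and is not a mechanical rule.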
Model fit indices were evaluated as described by Schermelleh-Engel et al. (2003): p-value of the χ² test, comparative fit index (CFI), Tucker-Lewis index (TLI), root-mean-square error of approximation (RMSEA), and standardized root-mean-square residual (SRMR). The CFI and TLI should be ≥ 0.95. RMSEA values ≤ 0.08 indicate an acceptable to good approximate model fit, while the SRMR should be < 0.10. The p-value of the corresponding test of approximate fit (chi-square) should be > 0.05 (i.e., a non-statistically significant p-value is considered). Finally, we compared the factor solutions in all samples and the items that loaded on each factor that were common to all studied populations.

Exploratory factor analysis
EFA yielded similar intercorrelated three-factor solutions based on the scree test, with good fit indices in all samples (Figure 1). The GEOMIN rotated factor loadings, which serve to determine item reliability, and the correlations among factors are shown in Table 1. Table 2 shows the model fit indices across the four tested samples regarding EFA results for one- to four-factor solutions. It is important to consider that, statistically, the more factors that are extracted in the EFA, the better the fit indices are expected to be, as shown in Table 2 for the four-factor solution across all samples. However, the choice of the number of factors must also consider the scree plot and the theoretical explainability of the factors. Comparing the fit indices and dissociable sleep-wake phenomena, the three-correlated factor solution was selected because of the small reduction in eigenvalues with more than three factors, because it made theoretical sense (addressed in the Discussion), and because it showed adequate fit indices across the four samples. All chi-square p-values were < 0.05. Since this metric is sensitive to sample size (Tanaka 1987), other indices were considered. Acceptable to good fits for three factors were found for the UK 2001 (RMSEA = 0.048, CFI = 0.984, TLI = 0.976, SRMR = 0.036) and UK 2004 samples (RMSEA = 0.060, CFI = 0.977, TLI = 0.966, SRMR = 0.041). In the São Paulo sample, the same was found for three factors (RMSEA = 0.047, CFI = 0.977, TLI = 0.967, SRMR = 0.033). In the Baependi sample, TLI and CFI for three factors were not acceptable, but RMSEA was acceptable and SRMR was good (RMSEA = 0.064, CFI = 0.941, TLI = 0.914, SRMR = 0.049). It is important to note that it is not a requirement that all indices exhibit good fit in order to accept the model solution. Other elements, such as interpretability and parsimony of the factor solution, must be considered. For Baependi, we, therefore, also consider the three-correlated factor solution for our discussion and interpretability.
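The cutoffs given in the Statistical analysis section can be applied to the three-factor indices reported above as a simple check. This is only an illustrative sketch: the names `CUTOFFS`, `THREE_FACTOR_FIT`, and `failed_indices` are hypothetical, and the numbers are those quoted in the text for the three-factor solutions.

```python
# Cutoffs as stated in the Statistical analysis section.
CUTOFFS = {
    "RMSEA": lambda v: v <= 0.08,
    "SRMR": lambda v: v < 0.10,
    "CFI": lambda v: v >= 0.95,
    "TLI": lambda v: v >= 0.95,
}

# Three-factor fit indices as reported in the text for each sample.
THREE_FACTOR_FIT = {
    "UK 2001": {"RMSEA": 0.048, "CFI": 0.984, "TLI": 0.976, "SRMR": 0.036},
    "UK 2004": {"RMSEA": 0.060, "CFI": 0.977, "TLI": 0.966, "SRMR": 0.041},
    "São Paulo": {"RMSEA": 0.047, "CFI": 0.977, "TLI": 0.967, "SRMR": 0.033},
    "Baependi": {"RMSEA": 0.064, "CFI": 0.941, "TLI": 0.914, "SRMR": 0.049},
}


def failed_indices(fit):
    """Names of the fit indices that do not meet their cutoff, sorted by name."""
    return sorted(name for name, meets in CUTOFFS.items() if not meets(fit[name]))


for sample, fit in THREE_FACTOR_FIT.items():
    failed = failed_indices(fit)
    print(sample, "all cutoffs met" if not failed else f"below cutoff: {failed}")
```

Such a check only flags which indices miss their cutoffs; as noted above, the decision to retain a solution also rests on interpretability and parsimony.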
We named the three-correlated factors efficiency of dissipation of sleep pressure (Factor 1), sensitivity to buildup of sleep pressure (Factor 2), and time of greatest cognitive arousal (Factor 3).
The items in bold font in Table 1 are those with factor loadings > 0.4. This is the minimal factor loading commonly regarded as meaningful in the literature (Nunnally 1967), because it corresponds to at least 16% of variance shared between the item and the latent factor on which it loads. There were 12 items with loadings > 0.4 (underlined in Table 1) that were common among all samples in the same factors. Correlations between factors (Table 1) were < 0.8, indicating that the three factors, despite being associated, also correspond to separable constructs, showing good divergent validity.
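The 16% figure follows from squaring the loading (0.4² = 0.16), and the search for items that load on the same factor in every sample is a set intersection. The sketch below illustrates both steps with hypothetical loadings; the function names and the toy data are assumptions, not values from Table 1, and the use of absolute loadings is likewise an assumption made here so that negatively keyed items are not excluded.

```python
def shared_variance(loading):
    """Proportion of variance shared between an item and its factor."""
    return loading ** 2


def common_salient_items(loadings_by_sample, cutoff=0.4):
    """Items whose absolute loading on the same factor exceeds cutoff in every sample."""
    samples = list(loadings_by_sample.values())
    common = {}
    for factor in samples[0]:
        # One set of salient items per sample for this factor.
        item_sets = [
            {item for item, value in sample[factor].items() if abs(value) > cutoff}
            for sample in samples
        ]
        common[factor] = sorted(set.intersection(*item_sets))
    return common


# Hypothetical loadings: sample -> factor -> {item number: loading}.
toy_loadings = {
    "sample_a": {"F1": {1: 0.62, 3: 0.55, 10: 0.12}, "F2": {1: 0.05, 3: 0.10, 10: 0.71}},
    "sample_b": {"F1": {1: 0.58, 3: 0.35, 10: 0.20}, "F2": {1: 0.02, 3: 0.45, 10: 0.66}},
}
common = common_salient_items(toy_loadings)
print(common)
print(round(shared_variance(0.4), 2))
```

In this toy example item 3 is salient on F1 in one sample and on F2 in the other, so it is not retained as a common item for either factor.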

Discussion
This paper investigated the psychometric properties of the MEQ by applying exploratory factor analysis (EFA). If the MEQ had simply measured a unidimensional M-E construct, all the items should have loaded on a single factor, good fit indices should have appeared, and all the factor loadings should have exhibited very strong values (i.e., close to 1). The analysis revealed, however, that the best solution had three correlated factors with a similar structure across samples in terms of factor loadings and factor correlations. Consequently, it was possible to identify similarities in the nature of the factors and core items that were reproducible across the four samples from the two countries. Our study was not designed to provide evidence of what these separable factors represent; thus, we can only hypothesize about their possible biological underpinnings. However, two of the latent factors underlying the responses to the MEQ items apparently reflect separable homeostatic processes, and the third dimension may relate to subjective amplitude, as will be discussed. In all samples, the majority of items (1, 3, 4, 5, 7, 9, 13, and 19) loaded on Factor 1, which we named "efficiency of dissipation of sleep pressure", as these items describe how respondents feel when they wake up in the morning (e.g., "Assuming adequate environmental condition, how easy do you find getting up in the morning?", which was the item that had the highest loading on this factor). Faster dissipation of sleep pressure may favor earlier wake-up time and better disposition in the morning, the opposite being true when dissipation is not as efficient (Mongrain et al. 2006). This corroborates prior work with the CSM in which the factor with the most items is usually found to be related to questions/statements about how people feel at the beginning of the day (e.g., Adan et al. 2005; Caci et al. 2005; Kato et al. 2019; Randler and Díaz-Morales 2007; Smith et al. 1989).
Interestingly, item 19, which requires responders to rate themselves as evening, morning, or intermediate subtypes, loaded on Factor 1, suggesting that people tended to determine their own chronotypes depending on their efficiency in dissipation of sleep pressure. Turco et al. (2015) assessed the reliability of item 19 of the MEQ by comparing the results with the full questionnaire, with the time of subjective sleepiness during a waking day, and with real-life sleep timing variables. They found that healthy adults could describe their diurnal preference based on this item. Arrona-Palacios and Diaz-Morales (2016) also used this item in Mexican and Spanish adolescents and corroborated these findings. This has important consequences. For instance, genome-wide association studies have determined genetic influences on chronotype by classifying participants into morning or evening chronotypes (Jones et al. 2019). These studies did not use a full questionnaire, but were based on a single question, essentially corresponding to question 19 of the MEQ, which loaded onto Factor 1. However, our results suggest that this item may relate more to dissipation of sleep pressure (Process S) than to circadian parameters (Process C). Laboratory studies have shown this to be the case for the variable number tandem repeat (VNTR) polymorphism in the PER3 gene, which had been reproducibly associated with MEQ score (Viola et al. 2007).
Table 2. Model fit indices of the one- to four-factor solutions applied to data of the Morningness-Eveningness Questionnaire (MEQ) using exploratory factor analyses under weighted least squares using a diagonal weight matrix with standard errors and mean- and variance-adjusted (WLSMV) estimator across the four samples. df = degrees of freedom. Acceptable values of the fit indices are: RMSEA = Root Mean Square Error of Approximation ≤ 0.08; SRMR = Standard Root Mean Square Residual ≤ 0.10; CFI = Comparative Fit Index ≥ 0.95; TLI = Tucker-Lewis Index ≥ 0.95.
Results of some studies using the CSM also confirm our findings. As noted by Smith et al. (1989), " . . . the first (and, therefore, most important factors) are identified by morning items, and the evening items are clustered only in the less significant factors. This result may reflect that the morning items are more diagnostic than evening items in predicting adjustment to schedule changes". Likewise, Adan et al. (2005) showed that most items of the CSM (CSM items 3-6 and 10-12, of which the first 4 correspond to MEQ items 4, 5, 7, 9) loaded on a "morning factor". Similar results were reported by Caci et al. (2005), Kato et al. (2019), and Randler and Díaz-Morales (2007). Moreover, there is similarity between the content of morning-related factors in different studies of the CSM and the MEQ items that loaded on factor 1 in our results (items 1, 3, 4, 5, 7, 9, 13, and 19). In addition, item 9 of the CSM, which corresponds to item 19 of the MEQ, loaded on the morning activity factor in the factor structure of the CSM proposed by Smith et al. (1989) and on the Morningness/time of day preference factor in the factor structure of the CSM proposed by Kato et al. (2019). What no publication has done to date is contextualize what a "morning factor" means in terms of M-E profiles.
We suggest this factor reflects the responders' ability to dissipate sleep pressure, a homeostatic process that is separable from buildup of sleep pressure, which reflects how sleepy people feel at the end of the day (see Crowley et al. 2018; Dijk and Lockley 2002; Mongrain et al. 2006; Taillard et al. 2003; Viola et al. 2007).
Factor 2 comprised fewer items common to all samples (items 2, 10, and 12). These items enquire about how responders feel in the evening, and we named this factor "sensitivity to buildup of sleep pressure", such that a slower buildup of sleep pressure would result in a later preferred bedtime (Mongrain et al. 2006). Among these items, the one with the highest factor loading was "At what time in the evening do you feel tired and as a result in need of sleep?" (item 10). Item 16, related to how easy the responders find physical exercise in late evening, also loaded on factor 2 in all except the UK 2001 data, suggesting that it reflects sensitivity to sleep pressure in many, but not all, samples.
Factor 3 was less consistent across samples, as only one item (item 11) of the MEQ loaded on this factor in all studied populations. This item relates to the timing of greatest cognitive arousal ("You wish to be at the peak of your performance for a test which you know is going to be mentally exhausting and lasting for two hours . . . "). An equivalent item related to physical arousal (item 15) also loaded on Factor 3 in three of the four samples (all except UK2004). We have no explanation for this smaller loading in this specific sample, but together with the pattern of effects for item 11, our findings suggest that mental/physical arousal tends to be separable from buildup and dissipation of sleep pressure. This factor may represent subjective amplitude, thought to correspond to the range of variation in people's ability to modulate their levels of alertness, mood, and activation (see Dosseville et al. 2013; Ogińska 2011; Ogińska et al. 2017). Its inconsistency corroborates prior findings that the psychometric properties of subjective amplitude are still unsatisfactory in the scales that specifically assess it (Dosseville et al. 2013; Ogińska 2011; Ogińska et al. 2017), possibly for two reasons: the complexity of this phenomenon (Ogińska et al. 2017), or the fact that the MEQ was not designed to measure circadian amplitude. Nonetheless, the subjective amplitude subscale of the ChQ has been found to correlate (weakly) with MEQ scores (Dosseville et al. 2013). It should be considered that amplitude factors have been identified using the MESSi questionnaire in samples from different countries (e.g., Tomažič and Randler 2018), although these studies reported that the internal consistency (Cronbach's alpha) was lower for the amplitude factor than for the other two dimensions (i.e., morning and evening factors) (Tomažič and Randler 2018).
Item 18 also enquires about peak activity ("at what time of day do you think that you reach your feeling best peak?"), but it did not consistently load on any factor, not even factor 3, possibly because it was interpreted differently among responders in both countries. After all, "feeling best" is quite an ambiguous term. Additionally, this item is rated on a continuous scale; therefore, inconsistent results may also have occurred because of inadequate cutoff scores used to separate answers into ordinal responses. For instance, responders are awarded three points if the answer is that they feel best from 10:00 until 17:00 h, and two points if between 17:00 and 21:00 h. This may unrealistically separate peak arousal times and result in inconsistent scores, so these cutoff scores should be reassessed. One way of doing so would be to run EFA or confirmatory factor analysis declaring those items as continuous variables, which would still work under the WLSMV estimator. Moreover, the use of clock times in the MEQ has been criticized by some authors (see Di Smith et al. 2002) and led to the development of scales without clock time items, such as the PS (Smith et al. 2002) and the Morning Affect Scale. Similarly, item 17, which also relates to arousal during the day ("suppose that you can choose your own work hours . . . "), had quite a different pattern of loadings among samples (Factors 1 and 3 in UK2004, Factor 3 in the UK2001 sample, Factor 1 in Baependi, and no factor in the São Paulo sample). This may have arisen because of differences in the predominance of physical and/or mental work in each sample and/or because its answers are on a continuous scale divided into ordinal scores that may be inappropriate. Overall, our results corroborate the inconsistency in measures of subjective amplitude.
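The concern about coarse cutoffs for item 18 can be illustrated with a minimal Python sketch. It covers only the two scoring bands quoted above (three points for 10:00 to 17:00 h, two points for 17:00 to 21:00 h). The function name `item18_points_excerpt` is hypothetical, and treating each band as a half-open interval is an assumption, since the text does not say how the boundary at 17:00 h is handled.

```python
def item18_points_excerpt(hour):
    """Ordinal points for the two clock-time bands quoted in the text.

    `hour` is a decimal clock time (e.g. 16.5 means 16:30). Only the two
    bands mentioned in the text are covered; treating each band as the
    half-open interval [start, end) is an assumption. Any other time
    returns None because its scoring is not given in the text.
    """
    if 10.0 <= hour < 17.0:
        return 3
    if 17.0 <= hour < 21.0:
        return 2
    return None


# Two peak times almost seven hours apart receive the same score...
same_band = (item18_points_excerpt(10.0), item18_points_excerpt(16.75))
# ...while two peak times half an hour apart receive different scores.
across_cutoff = (item18_points_excerpt(16.75), item18_points_excerpt(17.25))
print(same_band, across_cutoff)
```

The sketch only shows how a continuous answer is collapsed into ordinal categories. It does not suggest which alternative cutoffs would be more appropriate.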
Intercorrelations among factors per sample were low (r < 0.450), except for the higher relation of Factors 1 and 2 in the UK2001 sample (r = 0.764). Together with what was discussed above, this suggests that the MEQ predominantly reflects at least two clearly separable, but interrelated, homeostatic aspects of sleep-wake cycles, namely dissipation and buildup of sleep pressure. This separability explains why, for instance, MEQ scores can indicate individuals of high efficiency of dissipation of sleep pressure (predominant in morning types), low sensitivity to buildup of sleep pressure (evening types), and also ones regarded as displaying both traits (bimodal types: Martynhak et al. 2010; Tempaku et al. 2017), indicating that these processes are indeed separable, as proposed by Rusterholz et al. (2016). There is also developmental evidence for this. In a recent review, Crowley et al. (2018) showed that sensitivity to buildup of sleep pressure markedly decreases in the early years of adolescence, while dissipation of sleep pressure only matures in early adulthood.
Directly comparing the present factor structure with the only MEQ study that used EFA (Caci et al. 2008) is not advisable, because the analyses may have been biased by the limitations of the software, which is adequate for continuous but not ordinal answer formats (Beauducel and Herzberg 2006; Kaplan 2012; Muthén and Kaplan 1985). Still, there is similarity between the factor structure reported by Caci et al. (2008) and our findings. Items 2, 10, and 12 loaded on the "retiring factor" in their study and on factor 2 (buildup of sleep pressure) in ours. Also, items 4, 5, and 7 loaded on their "morning factor", which corresponded to our factor 1 (efficiency of dissipation of sleep pressure).
Regarding the 5-item rMEQ, the items that correspond to items 1, 7, and 19 of the full MEQ in the present study loaded on factor 1, while item 10 loaded on factor 2; item 18 did not load on any of our factors. Although our study was not designed to directly compare the MEQ and rMEQ, a descriptive contrast between them shows that these five items were not the most reliable items in our factor structure (i.e., did not have the highest factor loadings) nor were they core items (i.e., did not replicate across the samples). Hence, in terms of presenting a balanced factor structure, our findings indicate the rMEQ does not reflect the same factors tapped by the full MEQ. In other words, the rMEQ partly measures the efficiency of dissipation of sleep pressure and includes only one item representing factor 2, which is not enough to statistically specify a second domain (minimally two indicators are needed under a multidimensional solution) (see Bollen 1989; Kenny et al. 1998).
It is noteworthy that some items loaded onto specific factors depending on country. This happened for the item related to appetite in the morning (item 6), and best time to go to bed in the evening (item 2), both of which loaded on Factor 1 only in the UK samples, apart from the common loading on Factor 2 of the latter item in all samples. These differences may be related to sociocultural factors (e.g., Biswas et al. 2014; Natale et al. 2009), age (e.g., Adan and Almirall 1990), or sex (e.g., Adan and Natale 2002). Finding a likely configural factor structure, i.e., the same number of factors with items loading on the same factors, is an initial step for psychometric testing of the "stability" of the constructs across samples (see Van de Schoot et al. 2012). To this end, future studies designed for this specific purpose should examine invariance of the two-factor solution underlying the core cross-sample reproducible items of the MEQ. If the model holds despite these variations in scores, this will indicate the MEQ measures the same underlying constructs irrespective of the sample characteristics, an essential property for any tool to understand human behavior. Given the current results from our EFA study, it seems unlikely that the MEQ will achieve strong levels of invariance, for which intercepts/thresholds and factor loadings are assumed to be equal across samples. Without evidence from invariance testing, direct comparison across samples of the latent factors found here will not be trustworthy (Vandenberg and Lance 2000).
Moreover, it should be considered that some items loaded onto more than one factor (cross-loadings) in the UK samples (Table 1). This happened for the item related to best time to go to bed (item 2) in both UK samples. Also, items 11, 17, and 18 showed cross-loadings in the UK 2004 sample. Sociocultural, age, or sex effects may explain the cross-loadings of items 2 and 11, because they only appeared in the UK populations. Some other items also loaded very differently across samples. Item 17 loaded onto factor 3 in the UK 2001 data and onto factor 1 in Baependi and the UK 2004 data, with cross-loading in the latter sample, but did not load onto any factor in the São Paulo sample. Item 18 did not load onto any factor, except in the UK 2004 sample, with cross-loading. Brown (2015) suggests eliminating items with cross-loadings; future studies could therefore consider these items as "poorly behaved items" and remove them.
It should be noted that we did not generate Cronbach's alpha as a general reliability index of the MEQ, mainly because of two issues among others (see McNeish 2017; Raykov 1997, 1998, 2001): (a) an assumption of this index is that the scale be unidimensional, while we found a three-factor solution underlying the MEQ; and (b) we used EFA, under which the traditional reliability coefficients cannot be properly calculated; this is more suitable under the CFA approach, which, under our rationale of firstly exploring the factor structure, would have required larger samples, in which half the sample would be used to run EFA and the other half to run CFA. Via EFA, it is possible to determine the reliability of each item, i.e., via factor loadings that show the correlation between observed variables and the latent factors. We also presented the correlations between the latent factors, which indicate the extent to which the factors are distinguishable from each other. Factor analyses also provide evidence-based internal consistency (Bandalos 2018), and the returned solutions and their congruence across the four samples showed some core similarities that could be seen as a first step for future studies interested in cross-cultural comparison.
Our results clearly show that the MEQ measures a multidimensional construct. However, multidimensionality does not guarantee that subscales can provide meaningful and reliable information about subdomains that are distinguishable from a general underlying construct, which is usually measured with total raw scores.
Even in the presence of multidimensionality, the use of the total scale scores (summing all the items) can be justified (Gustafsson and Åberg-Bengtsson 2010). This means that prior findings based on the total MEQ scores constituted by 19 items might be reevaluated, since a great many MEQ datasets are available, after excluding items that were poor in terms of reliability, i.e., had small factor loadings, and items that showed many cross-loadings, which may reflect their lack of specificity. After excluding items with low factor loadings, data can also be used to test bifactor models in which separable factors or dimensions are assumed to reflect a global underlying construct following analytic corrections proposed by Eid et al. (2017), using CFA departing from our EFA results considering factors 1 and 2 (factor 3 was not reliable) to provide: a) extra evidence about the adequacy of likely subscores across samples with different characteristics; and b) insightful evidence in understanding consistency, specificity, and reliability for each item. Traditional bifactor analyses have shown anomalous results, such as vanishing specific factors and irregular loading patterns. Eid et al. (2017, 2018, 2020) argue that the application of traditional bifactor modeling requires a two-level sampling process that is usually not present in empirical studies. In these publications, Eid and colleagues also demonstrate how alternative bifactor models with a G-factor and specific factors can be derived that are better defined for the actual single-level sampling design. Therefore, reanalysis under these authors' bifactor specifications would be necessary to understand the factor structure of the MEQ under a general underlying factor. Based on our results, the ideal model solution would be that of a two-correlated factor solution constituted by eight items in factor 1 (1, 3, 4, 5, 7, 9, 13, and 19) and three items in factor 2 (2, 10, and 12).
Concerning limitations, it should be pointed out that the data of the samples we used were obtained from studies that varied in terms of experimental design and were not all representative of the populations in their respective countries of origin. Therefore, we cannot be certain the results, i.e., the three-correlated factor solution, would be replicated in other populations. Still, the samples were large and from two countries that differ in many respects. We focused on the items that similarly loaded on latent factors found in the exploratory factor analyses to try to disentangle aspects of M-E that are sample-unspecific from those that vary according to economic and sociocultural factors and context (see Rad et al. 2018). Obtaining a factor solution with good fit indexes is the first step to allow investigations of invariance in the MEQ responses worldwide. Invariance analysis ensures that the psychometric properties of a scale are directly comparable across cultures, age, sex, etc., disentangling bias related to item features, i.e., factor loadings and thresholds, that can vary depending on (i.e., are not stable across) cross-population characteristics. This type of testing uses CFA procedures that were not carried out here. The reason for this was that although the Brazilian datasets were large, which could have allowed us to split the sample in two and calculate EFA for one half and confirm the factor structure with CFA in the other, the UK samples were too small considering the number of thresholds to be estimated (alternative answers for all items). If we had done so, we would have faced imprecision of confidence intervals.
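What counts as a solution with good fit indexes can be made concrete with the cutoffs listed in the Table 2 caption (RMSEA ≤ 0.08, SRMR ≤ 0.10, CFI ≥ 0.95, TLI ≥ 0.95). The following is a minimal Python sketch, not the procedure used in the study. The function name `check_fit` and the two sets of fit values are hypothetical and do not reproduce any numbers from Table 2.

```python
# Cutoffs as listed in the Table 2 caption: ("max", x) means the index
# should be at most x; ("min", x) means it should be at least x.
CUTOFFS = {
    "RMSEA": ("max", 0.08),
    "SRMR": ("max", 0.10),
    "CFI": ("min", 0.95),
    "TLI": ("min", 0.95),
}


def check_fit(indices):
    """Return {index_name: bool} stating whether each index meets its cutoff."""
    result = {}
    for name, (kind, cutoff) in CUTOFFS.items():
        value = indices[name]
        result[name] = value <= cutoff if kind == "max" else value >= cutoff
    return result


# Hypothetical fit values for two competing solutions (illustration only).
one_factor = {"RMSEA": 0.11, "SRMR": 0.12, "CFI": 0.85, "TLI": 0.83}
three_factor = {"RMSEA": 0.05, "SRMR": 0.04, "CFI": 0.97, "TLI": 0.96}

print(check_fit(one_factor))
print(all(check_fit(three_factor).values()))
```

Such a check only screens candidate solutions against the stated cutoffs. It says nothing about invariance, which requires the CFA-based testing described above.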
In summary, we conclude that the MEQ measures two consistent separable, yet interrelated, latent constructs across four different samples that seem to reflect two homeostatic processes, sensitivity to buildup of sleep pressure and efficiency of dissipation of sleep pressure, rather than circadian parameters. Items that enquire about physical and mental arousal during the day seem to vary much more according to the population under study, even when comparing samples from the same country, possibly because of differences in culture/context and/or demographic characteristics. This paper is intended to serve as a prototype about how to conduct EFA and the strengths of using