Monitoring the recovery-stress states of athletes: Psychometric properties of the acute recovery and stress scale and short recovery stress scale among Dutch and Flemish athletes

ABSTRACT The Acute Recovery and Stress Scale (ARSS) and the Short Recovery and Stress Scale (SRSS) are recently introduced instruments to monitor recovery and stress processes in athletes. In this study, our aims were to replicate and extend previous psychometric assessments of the instruments by incorporating recovery and stress dimensions into one model. Therefore, we conducted five confirmatory factor analyses (CFA) and determined structural validity, internal consistency, and construct validity. Dutch and Flemish athletes (N = 385, 213 females, 170 males, 2 others, 21.03 ± 5.44 years) completed the translated ARSS and SRSS, the Recovery Stress Questionnaire for Athletes (RESTQ-Sport-76), the Rating of Perceived Exertion (RPE) and the Total Quality of Recovery (TQR). There was a good model fit for the replicated CFA, sub-optimal model fit for the models that incorporated recovery and stress into one model, and satisfactory internal consistency (α = .75 to .87). The correlations within and between the ARSS and SRSS, as well as between the ARSS/SRSS and the RESTQ-Sport-76 (r = .31 to −.77 for the ARSS, r = .28 to −.63 for the SRSS), the RPE (r = .19 to −.23), and the TQR (r = .63 to −.63) also supported construct validity. The combined findings support the use of the ARSS and SRSS to assess stress and recovery in sports-related research and practice.


Introduction
Optimising the balance between recovery and stress can enhance performance and decrease the risk of injury and illness for athletes, making it a critical aspect of training and coaching (Smith, 2003). Therefore, researchers and practitioners are constantly searching for methods to capture both recovery and stress. Indeed, it is recommended to closely monitor the physical and psycho-social recovery and stress of athletes during the training process, as both recovery and stress are highly "intertwined and interdependent constructs" (Kellmann & Kallus, 2001).
Monitoring practices can be performed by measuring physiological, psychological, biochemical, and immunological responses (Kenttä & Hassmén, 1998). However, it is not feasible to collect these measures on a daily basis, because they are often invasive, costly, and time-consuming (e.g., biochemical markers such as creatine phosphokinase need to be derived from blood samples) (Hug et al., 2003; Lehmann et al., 1998; Petibois et al., 2002; Robson, 2003; L. L. Smith, 2000). To tackle these disadvantages, athlete monitoring through self-report was introduced as a valid and time-efficient alternative (Saw et al., 2015). Furthermore, it is suggested that self-report is more sensitive to training than physiological, biochemical, and immunological measures (Montull et al., 2022; Saw et al., 2015). Therefore, it is mostly the preferred method for athlete monitoring in practice (Taylor et al., 2012).
Ideally, self-report measures include the cause, intensity, and frequency of recovery and stress-related activities, or their consequences, such as fatigue, muscle soreness, or mood and concentration disturbances. This allows coaches, staff, and athletes themselves to adjust the cause (e.g., adjust the load), or intervene on the process (e.g., cognitive restructuring) or the consequences (e.g., adapt the recovery strategies) (Fry et al., 1994; Kenttä & Hassmén, 1998). Recently, the Acute Recovery and Stress Scale (ARSS) and the Short Recovery and Stress Scale (SRSS) were developed to assess the emotional, physical, and mental aspects of recovery and stress on a day-to-day basis (Kellmann & Kölling, 2019). Studies conducted in the UK and Germany have reported promising psychometric properties for the scales. For instance, the questionnaires showed satisfactory internal consistency, and convergent validity was supported by correlations with the RESTQ-Sport-76 (Hitzschke et al., 2016; Nässi et al., 2017). In addition, all scales of the ARSS were affected by the changing loads of a field hockey training camp (Kölling et al., 2015). For the full details of the development and psychometric properties in the German and English cohorts, we refer to the manual of the ARSS and the SRSS (Kellmann & Kölling, 2019).
There are, however, important steps to be made to establish the validity of the ARSS and SRSS. First, previous validation studies proceeded from two separate models, which implies that recovery and stress are two independent and unrelated constructs. However, the so-called "scissors model", as defined by Kallus and Kellmann (2000), suggests that recovery demands and stress states are interrelated (Kellmann, 2010). Moreover, from a psychometric perspective, it is important to include the interrelation between the domains in one model to test whether the "scissors model" fits. Second, given that the purpose of the ARSS and SRSS is to frequently monitor the load and recovery of athletes, it is important to determine their validity with respect to the daily load and recovery experienced. Currently, the Rating of Perceived Exertion (RPE) and Total Quality of Recovery (TQR) are widely used in practice (Borg, 1982; Kenttä & Hassmén, 1998) and are considered general measures of exertion and recovery. A combination of measures such as the RPE and TQR with the ARSS and SRSS could have significant benefits for practice. For instance, practitioners could design a workflow in which the single-item measures function as a quick assessment, followed by a comprehensive and more nuanced assessment of the complex and multidimensional nature of recovery and stress with the ARSS and SRSS. Hence, in a next validation step, the relations between the ARSS/SRSS and these single-item questions need to be determined. Third, as cultural diversity of sports teams is increasing (Maderer et al., 2014), validation in a broader population is warranted. Because practitioners might want to compare athletes from various nationalities within one team, it is necessary that questionnaires have comparable psychometric qualities (Hambleton & De Jong, 2003). Translating and adapting questionnaires for different contexts is common practice in other fields such as psychology (Hambleton & De Jong, 2003; Van De Vijver & Hambleton, 1996), and its importance has recently been stated in sport science as well (Jeffries et al., 2020). Hence, although the English and German questionnaires revealed promising initial results for different samples of athletes, validation in a broader population is required.
This study therefore aims to advance the ongoing process of validating the ARSS and SRSS among Dutch and Flemish athletes. After we translated the ARSS/SRSS, we replicated the analysis of the structural validity done by Kölling et al. (2020) for the purpose of comparison with earlier results. Next, we determined the structural validity with five alternative models that included both the recovery and stress dimensions (for the proposed structure of the models, see Appendix 1). Then, we followed the COnsensus-based Standards for the selection of Health Measurement INstruments (COSMIN) guidelines (Mokkink et al., 2019) and analysed the internal consistency, and the construct validity with the RESTQ-Sport-76, RPE, and TQR, in a large group of athletes.

Participants
To properly validate the ARSS/SRSS, we aimed to include at least 320 participants aged 16 years or older from various endurance and team sports. This sample size was chosen according to the upper limit of the rule of thumb that Terwee et al. (2007) proposed for factor analysis (number of items × 10; for the 32 ARSS items, 32 × 10 = 320). To ensure a representative sample of the population, we considered all genders, athletes with and without disabilities, different levels of sports, and athletes from all regions in the Netherlands and Flanders, Belgium. Therefore, the research population was recruited through the Dutch Sports Federation, Flemish sport federations, university student athletes, and from the circle of acquaintances of the researchers. All participants were native Dutch speakers.
The study protocol was approved by the Ethics Committee (PSY-1920-S-0513), and informed consent was obtained from all athletes. Of the 850 athletes we contacted, 385 athletes (aged 16-57 years) completed the full questionnaires, which were then considered for analysis. The five sports with the highest number of participants were soccer (n = 74), athletics (n = 29), field hockey (n = 28), volleyball (n = 27), and basketball (n = 27). Participants competed at the Olympic (n = 24), continental (n = 35), national (n = 266), or regional (n = 60) levels. The descriptive statistics of the included athletes are shown in Table 1.

Translation procedure
The English versions of the ARSS/SRSS were translated through a parallel back-translation procedure (Vallerand, 1989). Both questionnaires were translated into Dutch by six sports scientists, including academic staff of the Rijksuniversiteit Groningen and the Vrije Universiteit Brussel, and experts in endurance and team sports (i.e., rowing and football). All group members individually translated the items of the English ARSS/SRSS. The English version was used because this version is used worldwide, incorporates extra adjectives, and the outcomes can serve as a reference (Kellmann & Kölling, 2019).
First, the agreements and disagreements between the six translations were analysed. Items that were identical in at least three out of the six translations were considered to have sufficient agreement. In cases of greater variation, different translations were considered. The following ordered procedure was used for the consideration of the items: a) use in sports, b) translation closest to English, c) German equivalent, and d) use of the Dutch version of the RESTQ-Sport-76. After this procedure, a group of nine sports scientists, sports psychologists, and applied sports scientists (including members who translated the questionnaire) had the opportunity to provide feedback on the Dutch translation of the items. In case of disagreement, consensus was reached after one more round of feedback.
After agreement on the Dutch version, the result was translated into English by a near-native English speaker. Then, the original English questionnaire and the new English questionnaire were compared. Any ambiguities were discussed until a consensus was reached. Finally, the Dutch version was pretested with a small group of athletes, who were asked to provide feedback on the questionnaire. Their feedback on the items or questions was used to address ambiguities. No items were added or removed compared to the English version.

Design and measures
Participants received a link to an online survey hosted in Qualtrics (Qualtrics, 2022). This survey included demographics with questions about age, gender, sport type, and sport level. This was followed by questions about the last training, such as duration (in blocks of 15 minutes), the RPE (on a scale of 6-20, from "no exertion at all" to "maximal exertion") (G. A. V. Borg, 1982; E. Borg & Borg, 2001), the TQR (on a scale of 6-20, from "no recovery at all" to "maximal recovery") (Kenttä & Hassmén, 1998), and time since last training (in half hours). Subsequently, participants filled out the ARSS, SRSS, and RESTQ-Sport-76.
The ARSS consists of a list of 32 adjectives related to recovery and stress that are preceded by the sentence: "at this moment I feel/I am". Each item describes a different state of recovery or stress (e.g., "strong" or "muscle exhaustion"). The items are grouped in eight scales, of which four describe the Recovery dimension (Physical Performance Capability, Mental Performance Capability, Emotional Balance, Overall Recovery). The four other scales describe the Stress dimension (Muscular Stress, Lack of Activation, Negative Emotional State, Overall Stress). Means and total scores of these scales are calculated. The SRSS is a compact version of the ARSS and consists of eight items that correspond to the eight scales of the ARSS (Hitzschke et al., 2016). For the SRSS, the items of the corresponding ARSS scale serve as descriptors for each item (e.g., Muscular Stress is described by muscle soreness and muscle stiffness). The items of the SRSS are rated in relation to the highest recovery or stress state of the athlete. Both the ARSS and SRSS items are rated on a Likert-type rating scale from 0 (does not apply at all) to 6 (fully applies). For full details of these questionnaires, we refer to the manual by Kellmann and Kölling (2019).
The Dutch RESTQ-Sport-76 is composed of 76 questions that can be answered on a Likert scale from 0 (never) to 6 (always) (Kellmann & Kallus, 2001; Nederhof et al., 2008). The statements refer to the frequency of perceptions of stress and of recovery activities in the last week (e.g., "last week, I had muscle pain after performance" or "last week, my body felt strong"). The questionnaire consists of 19 scales, which provide insights regarding non-sport and sport-specific aspects of recovery and stress. For further information, see the manual (Kallus & Kellmann, 2016). This study was published as a preprint on SportRxiv (https://sportrxiv.org/index.php/server/preprint/view/290) (Brauers et al., 2023).

Statistical analysis
For analysis, we used R with the packages lavaan, semTools, and semPlot (R Core Team, 2020; Epskamp et al., 2015; Jorgensen et al., 2022; Rosseel, 2012). Descriptive statistics were calculated, and the means (M) and standard deviations (SD) were determined for all values.
The structural validity of the ARSS was determined with confirmatory factor analysis (CFA) rather than exploratory factor analysis, because the factor structure has been determined previously (Kellmann & Kölling, 2019; Kölling et al., 2020). The analysis of the structural validity was done in two steps. First, we replicated the steps described by Kölling et al. (2020) and performed three CFAs with robust maximum likelihood estimators (a first-order model, a hierarchical model, and a bifactor model). Second, we conducted five CFAs (orthogonal first-order, single-factor, bifactor, oblique lower-order, and higher-order models) in which we included all items to assess the proposed multidimensional structure of the ARSS (for the proposed structures, see appendix 1). Initially, in accordance with the models described by Kölling et al. (2020), we allowed correlations between the error variances of the items strong and physically capable, muscle exhaustion and muscle fatigue, as well as muscle soreness and muscle stiffness. To describe the global fit of the models, we reported the root mean square error of approximation (RMSEA), comparative fit index (CFI), Tucker-Lewis Index (TLI), and standardized root mean square residual (SRMR). Following the recommendations by Credé and Harms (2015), we did not interpret the global fit indices using arbitrary cut-off values (as these were not developed for higher-order models) but rather presented the change in the χ² statistic when comparing different models, using the alternative approach described by Satorra and Bentler (2001), in combination with reporting the RMSEA, CFI, TLI, and SRMR. Ideally, if the χ² statistic is non-significant at an alpha level of .01 (p > .01), the CFI is high, and the SRMR and RMSEA are low, the global fit of the model is assumed to be good. We did not determine cross-cultural validity of the ARSS/SRSS because the sample sizes were too small.
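As an illustration of the model comparison step, the sketch below computes a scaled χ² difference statistic in the form commonly attributed to Satorra and Bentler (2001). It is not the authors' analysis code (the analyses were run in R); the function name and the input numbers are hypothetical, and it assumes that the unscaled maximum likelihood χ² values, degrees of freedom, and scaling correction factors of two nested models are available.

```python
def scaled_chi2_difference(t0, df0, c0, t1, df1, c1):
    """Scaled chi-square difference for two nested models.

    t0, df0, c0: unscaled ML chi-square, degrees of freedom, and scaling
                 correction factor of the more restricted (nested) model.
    t1, df1, c1: the same quantities for the less restricted model.
    Returns (scaled difference statistic, difference in degrees of freedom).
    """
    if df0 <= df1:
        raise ValueError("the nested model must have more degrees of freedom")
    # Scaling correction for the difference test.
    cd = (df0 * c0 - df1 * c1) / (df0 - df1)
    if cd <= 0:
        # The correction can turn out non-positive in practice; the
        # statistic is then not interpretable in this form.
        raise ValueError("non-positive scaling correction")
    return (t0 - t1) / cd, df0 - df1


# Hypothetical numbers, for illustration only.
statistic, df_diff = scaled_chi2_difference(
    t0=120.0, df0=50, c0=1.2,
    t1=100.0, df1=48, c1=1.1,
)
print(f"scaled difference = {statistic:.2f} on {df_diff} df")
```

The resulting statistic would be referred to a χ² distribution with df_diff degrees of freedom. When both scaling factors equal 1, the function reduces to the ordinary χ² difference.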
After we examined the structural validity, we assessed the internal consistency with Cronbach's α. Next, the corrected item-total correlation was calculated to assess the strength of the relationship between individual items and the total score of the scale that the item belongs to. Finally, we determined the inter-item correlations between different items within the scale.
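The two reliability statistics named here can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' R code; the function names and the response data are hypothetical. Cronbach's α is computed from the item variances and the variance of the scale total, and the corrected item-total correlation correlates each item with the sum of the remaining items of its scale.

```python
from math import sqrt
from statistics import mean, variance


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cronbach_alpha(rows):
    """rows: one list of item scores per respondent (same scale)."""
    k = len(rows[0])
    items = list(zip(*rows))              # transpose to one tuple per item
    totals = [sum(row) for row in rows]   # scale total per respondent
    item_var_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


def corrected_item_total(rows):
    """Correlation of each item with the sum of the other items."""
    result = []
    for j in range(len(rows[0])):
        item = [row[j] for row in rows]
        rest = [sum(row) - row[j] for row in rows]
        result.append(pearson(item, rest))
    return result


# Hypothetical responses of five athletes to a four-item scale (0-6).
responses = [
    [5, 4, 5, 4],
    [3, 3, 2, 3],
    [6, 5, 6, 6],
    [2, 3, 2, 1],
    [4, 4, 3, 4],
]
print(round(cronbach_alpha(responses), 2))
print([round(r, 2) for r in corrected_item_total(responses)])
```

Excluding the item from its own total avoids inflating the correlation, which is why the corrected version is reported rather than the raw item-total correlation.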
Because no gold standard is available, we determined construct validity rather than criterion validity. According to the guidelines proposed by the COSMIN initiative (Mokkink et al., 2019), we formulated hypotheses about the magnitude of the relations within and between the ARSS/SRSS, the RESTQ-Sport-76, RPE, and TQR. Based on previous research (Kölling et al., 2020; Nässi et al., 2017), we formulated the following hypotheses: 1) there are moderate to large positive correlations within the Recovery and Stress domains of the ARSS and SRSS, as well as moderate to large negative correlations between the Recovery and Stress domains of the ARSS and SRSS; 2) there are large to very large positive correlations between the ARSS and SRSS; 3) there are significant positive correlations between the ARSS/SRSS scales and scales with similar dimensions on the RESTQ-Sport-76, and significant negative correlations with opposite scales; and 4) there are significant positive correlations between the exertion and recovery factors (i.e., RPE, TQR) and the stress and recovery scales of the ARSS/SRSS. If 75% or more of the proposed hypotheses were confirmed, the construct validity of the questionnaire was considered good (Terwee et al., 2018). The correlation coefficients were determined with Pearson correlations (r) and considered trivial (r < .1), small (.1 < r ≤ .3), moderate (.3 < r ≤ .5), large (.5 < r ≤ .7), very large (.7 < r ≤ .9), almost perfect (r > .9), or perfect (r = 1) (Hopkins et al., 2009). The alpha level was set at .05.
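The magnitude labels and the 75% rule described above can be expressed as a short sketch. The function names are hypothetical, and two choices are assumptions rather than statements from the text: the labels are applied to the absolute value of r so that negative correlations are classified by size, and a coefficient of exactly .1 is treated as trivial because the listed intervals leave that value unassigned.

```python
def classify_correlation(r):
    """Label the size of a correlation using the thresholds listed above."""
    size = abs(r)  # assumption: sign is ignored when labelling magnitude
    if size > 1:
        raise ValueError("a correlation cannot exceed 1 in absolute value")
    if size == 1:
        return "perfect"
    # Upper bounds are inclusive, matching e.g. .3 < r <= .5 for "moderate".
    # Assumption: exactly .1 is labelled "trivial".
    for upper, label in [
        (0.1, "trivial"),
        (0.3, "small"),
        (0.5, "moderate"),
        (0.7, "large"),
        (0.9, "very large"),
    ]:
        if size <= upper:
            return label
    return "almost perfect"


def validity_is_good(confirmed, total, threshold=0.75):
    """True if at least 75% of the predefined hypotheses are confirmed."""
    return confirmed / total >= threshold


# Coefficients taken from the abstract, used here only as inputs.
for r in (0.63, -0.23, 0.19, 0.77):
    print(r, classify_correlation(r))
```

Under these assumptions, the correlation of .63 between the TQR and Overall Recovery would be labelled large and the correlation of −.23 between the RPE and Overall Recovery small.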

Structural validity
After examining the inter-item correlations, corrected item-total correlations, and the CFAs, one item (MPC2; receptive) was deleted because it had the largest negative contribution to Cronbach's alpha and the corrected item-total correlation, as well as a low factor loading.
First, we replicated the models described by Kölling et al. (2020) (see appendix 2). In addition, we estimated an orthogonal first-order, a single-factor, a bifactor, an oblique lower-order, and a higher-order model with all items of the ARSS. Although none of the alternative models reached optimal global fit values, the oblique lower-order model was retained for further analysis as it had the best global fit. Table 2 displays the full details of the factor loadings, Figure 1 displays the results of the CFA of the oblique lower-order model, and appendix 3 displays the standardized factor loadings and the correlation matrix.

Internal consistency
The descriptive statistics for the ARSS and SRSS are presented in Tables 3 and 4. Internal consistency of the ARSS scales ranged between α = .59 and α = .87. The corrected item-total correlations ranged between r = .47 and r = .79, and were all significant. The ARSS also demonstrated good internal consistency for the Recovery dimension (α = .91) and the Stress dimension (α = .90).
As the SRSS is a condensed version of the ARSS, each item was supported by four example adjectives from the ARSS. Internal consistency of the SRSS was good for both the Recovery (α = .78) and Stress (α = .75) dimensions. The corrected item-total correlations ranged from r = .55 to r = .65 for the Recovery dimension and from r = .31 to r = .72 for the Stress dimension. The correlations between the items ranged from .33 to .61 for the Short Recovery Scale and from .18 to .67 for the Short Stress Scale (Table 5). The correlations between Recovery and Stress ranged from −.17 to −.68. All correlation coefficients were significant (p < .05).

Correlations within and between the ARSS and SRSS
All correlations between the corresponding scales and items of the ARSS and SRSS (Table 5) were significant (p < .05) and could be considered large to very large (r ranged from .65 to .77).

Correlations between RESTQ-Sport-76, ARSS and SRSS
Convergent validity of the ARSS and SRSS was assessed by examining their scores in relation to the RESTQ-Sport-76 (Figure 2). The ARSS Negative Emotional State showed the largest correlation with the RESTQ-Sport-76 Emotional Stress (r = .69). Likewise, the SRSS Negative Emotional State showed the largest correlation with the RESTQ-Sport-76 Emotional Stress (r = .66). Overall, 291 of the 304 correlations were significant, and the coefficients were moderate to large, while the coefficients with the SRSS were consistently smaller. Considering the hypothesis-relevant relations between the different questionnaires, a congruent pattern was found for the ARSS and SRSS. This pattern showed positive correlations among the related areas and negative correlations between the opposite areas for both stress and recovery. For example, the ARSS Muscular Stress scale showed a large coefficient with the RESTQ-Sport-76 Injury scale (r = .53), but not with Self-efficacy (r = −.03) or Self-regulation (r = −.12).

Correlation between RPE and TQR and ARSS and SRSS
Finally, we calculated the correlations between the RPE, TQR, and ARSS/SRSS (N = 385). There were significant correlations between the RPE and Overall Recovery (r = −.23) and Muscular Stress (r = .19). The TQR and all scales of the ARSS, except for Lack of Activation, were significantly correlated, of which Overall Recovery (r = .63), Muscular Stress (r = −.63), and Overall Stress (r = −.51) had the largest correlations. Similar patterns were found for the SRSS, although the magnitude of the correlations was smaller, and fewer correlations were significant (see Appendix 4).

Discussion
This study aimed to determine structural validity, internal consistency, and construct validity of the ARSS and SRSS in a large group of Dutch-speaking athletes. Our findings indicate that these short questionnaires are easy to administer and that their structural validity, internal consistency, and construct validity are sufficient. An interesting observation is that the descriptive statistics of the ARSS and SRSS were similar to those previously reported (Hitzschke et al., 2016; Kölling et al., 2020; Nässi et al., 2017). When comparing the means and standard deviations of all specific adjectives for the ARSS, the results most closely resembled those of Kölling et al. (2020) and Hitzschke et al. (2016). The difference in mean scores between the current study and these studies only exceeded one point for the first item of Overall Recovery for the study by Hitzschke et al. (2016).

Table 6. Predefined hypotheses and instances in which the hypotheses are confirmed.
Hypothesis | Confirmed in
There are trivial to large positive correlations of .3 to .7 within the Recovery domain of the ARSS and Stress domains of the ARSS. | 6/6
There are trivial to large positive correlations of .3 to .7 within the Recovery domain of the ARSS and Stress domains of the SRSS. | 4/6
There are trivial to large positive correlations of .3 to .7 within the Stress domain of the ARSS and Stress domains of the ARSS. | 6/6
There are trivial to large positive correlations of .3 to .7 within the Stress domain of the ARSS and Stress domains of the SRSS. | 4/6
There are trivial to large negative correlations of .3 to .7 between the Recovery and Stress domains of the ARSS. | 13/16
There are trivial to large negative correlations of .3 to .7 between the Recovery and Stress domains of the SRSS. | 12/16
There are moderate to large positive correlations of .5 to .7 between the ARSS and SRSS. | 8/8
There are significant correlations between the ARSS scales and SRSS items with similar dimensions on the RESTQ-Sport-76. | 291/304
There are significant positive correlations between the exertion and recovery factors (i.e., RPE, TQR) and stress and recovery scales of the ARSS. | 9/16
There are significant positive correlations between the exertion and recovery factors (i.e., RPE, TQR) and stress and recovery items of the SRSS. |

Structural validity
To determine the structural validity, we replicated the analyses as described by Kölling et al. (2020) and retrieved similar global fit indices. Next, we applied multiple CFAs that included both the recovery and stress dimensions. On the whole, at least psychometrically speaking, Recovery and Stress can be seen as intertwined and interdependent constructs with clear, and correlated, underlying scales, but the global fit of this model can still be improved. For instance, during the estimation of the final model, a warning was generated indicating that the variance-covariance matrix of the estimated parameters was not positive definite. This may be caused by multicollinearity or by having too few observations relative to the number of parameters in the model. Despite this warning, the results of the model are presented for the sake of completeness, but should be interpreted with caution. Although the global fit of the recovery-stress model is not optimal, this model reflects the interrelatedness of recovery and stress, which is in line with the holistic nature of the "scissors model" and findings of previous studies on the relation between recovery and stress (Kellmann & Kallus, 2001; Neumann et al., 2022; Sansone et al., 2020). Therefore, the inclusion of both recovery and stress dimensions in one model offers a comprehensive assessment of the athlete's overall recovery-stress state. By considering the interrelatedness of these dimensions, the model can capture the complex nature of the athlete's recovery-stress state more accurately. Hence, this model has the potential to add valuable insights to the field of sports science, and it would be interesting to improve the model further by applying it in other contexts, such as in athletes with other cultural backgrounds or in high-stakes occupations.

Internal consistency
The deletion of the Mental Performance Capability item receptive increased internal consistency, with all scales being above the reliability threshold of α > .70. In addition, the corrected item-total correlations (r > .47) and inter-item correlations (r > .41) all provided evidence for good reliability of the scale. These high values are in accordance with validation studies among German and English samples (Kölling et al., 2015, 2020). We discussed several possible alternative items and checked with the authors of the original questionnaire whether there were German items that could be considered. However, the items that they proposed were already covered in the final Dutch translation, and there was a consensus among all translators to delete the item in the final translation.

Construct validity: RESTQ-Sport-76
To determine the construct validity, we specified five hypotheses a priori based on previous studies (Kölling et al., 2020; Nässi et al., 2017). Of all the possible correlations, 90% were as expected. However, 4 out of 19 correlations between the Muscular Stress scale of the ARSS and the RESTQ-Sport-76, and 4 out of 8 correlations between the scales of the ARSS and the RESTQ-Sport-76 Self-regulation scale were not significant. In addition, there were some small differences with the previous studies. The slightly weaker relations for the SRSS items could be explained by the fact that each SRSS item consists of one question, whereas the scales of the ARSS and RESTQ-Sport-76 are based on four questions, which makes the outcome more robust (Diamantopoulos et al., 2012). A possible explanation for these differences could be that the Dutch RESTQ-Sport-76 refers to the preceding week, whereas the German and English versions refer to the preceding three days/nights.

Relation with RPE and TQR
To assess whether the ARSS and SRSS scores were related to previous training, we studied the association between the preceding training RPE and subsequent recovery. There were some significant but trivial correlations between the RPE and the ARSS and SRSS. The most likely explanation for the trivial correlations is that enough time elapsed for full recovery between the end of the training and the moment that the athlete completed the questionnaire, as this was on average 34.57 hours (see Table 1) (Doeven et al., 2018). Another explanation could be the reduced precision of the RPE as a result of recall bias (Scantlebury et al., 2018). Kölling et al. (2015) found a relation between the intensity rated by the coach and the ARSS. However, they collected the ARSS twice a day, before and after training, during a 5-day training camp, thus excluding recall bias. Furthermore, the present study neglects the timeframe between training and measurement of stress and recovery states, as the questionnaires were filled out independently of training. Thus, it is unknown what kind of activities an athlete has engaged in since their last training. This means that athletes can be exposed to stressors in daily life, or perform recovery-enhancing activities after the training (Otter et al., 2015). This could theoretically influence the relation between the preceding RPE and the score on the ARSS or SRSS at a later moment (Otter et al., 2015).
In contrast to the relation with the RPE, there were more significant relations between the TQR and the ARSS and SRSS. One could argue that TQR integrates different recovery dimensions into one question. This suggests that TQR could serve as an early marker and that the ARSS and SRSS could be used to further distinguish between recovery dimensions (Kenttä & Hassmén, 1998). For instance, the TQR could be applied on a daily basis before the training or match, and when the scores deviate from normal, the ARSS and SRSS could be used to determine what aspects of recovery should be intervened on. The same approach could be applied after training with the RPE and the stress dimensions of the ARSS and SRSS. However, these dimensions should be assessed immediately after training to improve precision and avoid recall bias. With such an approach, practitioners benefit from the properties of single-item questions that minimise respondent burden and have a high face validity, while maintaining the superior psychometric properties of multiple-item scales such as the ARSS and SRSS (Jeffries et al., 2023).

Limitations and future research
This study has limitations that must be considered. First, we received 385 valid responses while 850 athletes received a link to the questionnaire. Therefore, a selection bias could not be ruled out. However, the absolute number of respondents was above the target (N = 320). This means that with the current number of participants, the study design is adequate to assess construct validity (Mokkink et al., 2019). For the multigroup CFA, however, the required sample size (n = 200 per subgroup) was not met and therefore, these results should be interpreted with caution (Terwee et al., 2007).
For future research, it would be fruitful to better understand the reasons behind the lower fit of the models including both the recovery and stress dimensions. These reasons can be diverse and may relate to, amongst others, instruction, translation, grouping of items, anchors used, and scoring. For instance, the ARSS instruction asks athletes to rate their current state, whereas the SRSS instruction asks athletes to rate their current state compared to their best recovery or stress state ever. These different instructions could have led to confusion among respondents when filling out both questionnaires. Additionally, the CFA revealed high factor loadings between some Dutch items and their underlying factor (appendix 3; seven correlations > .8), which makes it difficult to determine the unique contributions of each item. This could mean that some correlations should be omitted from the model, or that items with high factor loadings within a scale should be collapsed (Chen et al., 2001). If confirmed in other studies, the number of items may be reduced. Future studies could determine whether these choices affect the measured construct and could add new items or remove items to reduce multicollinearity, and to determine if the global model fit of a model that includes both recovery and stress could be improved.

Conclusion
This study provides a new step in the validation process of promising new scales to measure recovery and stress: the ARSS and SRSS. Our psychometric assessment revealed novel evidence that recovery and stress could be considered as intertwined constructs, which is in line with the theorised model. However, the model fit of the models that include both constructs is less than ideal. In addition, we found valuable evidence that the Dutch translations of the ARSS and SRSS show sufficient construct and convergent validity, and are both correlated with total quality of recovery. Therefore, these questionnaires can be used by coaches and athletes to assess the perception of recovery and stress in sports-related research and practice, for instance to measure recovery and stress before and after training. Future research could focus on unravelling the underlying relations between recovery and stress to further improve the structure of the questionnaires.
Note. ARSS = Acute Recovery and Stress Scale, df = Degrees of Freedom, RMSEA = Root Mean Square Error of Approximation, CFI = Comparative Fit Index, TLI = Tucker-Lewis Index, SRMR = Standardized Root Mean Square Residual. Because we did not include the "other" gender in the multigroup CFA, the total number of participants for the multigroup confirmatory factor analysis on gender does not add up to 385.

Figure 1. Final oblique lower-order model with all items of the ARSS. Note: PPC = Physical Performance Capability; MPC = Mental Performance Capability; EB = Emotional Balance; OR = Overall Recovery; MS = Muscular Stress; LA = Lack of Activation; NES = Negative Emotional State; OS = Overall Stress.
Note: The upper matrix describes the correlations within the ARSS scales, the lower matrix describes the correlations within the SRSS items.PPC = Physical Performance Capability, MPC = Mental Performance Capability, EB = Emotional Balance, OR = Overall Recovery, MS = Muscular Stress, LA = Lack of Activation, NES = Negative Emotional State, OS = Overall Stress, * = p<.05.

Table 1. Description of the included athletes.

Table 2. Results of the confirmatory factor analysis of the Dutch ARSS for the total sample and subsamples.

Table 3. Means, standard deviations (SD), standardized alphas, corrected item-total correlations, and inter-item correlations of the Dutch ARSS for the sample. Scores range between 0 (does not apply) and 6 (fully applies).
Notes: PPC = Physical Performance Capability, MPC = Mental Performance Capability, EB = Emotional Balance, OR = Overall Recovery, MS = Muscular Stress, LA = Lack of Activation, NES = Negative Emotional State, OS = Overall Stress, a = item deleted.

Table 4. Means, standard deviations (SD), standardized alphas, corrected item-total correlations, and inter-item correlations of the Dutch SRSS for the sample. Scores range between 0 (does not apply) and 6 (fully applies).

Table 5 presents the correlations within the ARSS scales and Table 6 presents the summary of the hypotheses and results for determining construct validity. Within the Recovery dimension (Table 5: upper left quadrant) of the ARSS, r ranged from .50 to .73. Within the Stress dimension (Table 5: lower right quadrant), r ranged from .22 to .69. The correlations between Recovery and Stress (upper right quadrant) ranged from −.16 to −.73. All correlations were statistically significant (p < .05).

Table 5. Pearson correlations within the Dutch ARSS scales, the Dutch SRSS items, and between the scales/items.
Nässi et al., 2017), 63 out of 304 (21%) correlations differed by more than .10 (but almost always less than .20). The correlations between the SRSS and the RESTQ-Sport-76 showed largely the same pattern as those for the ARSS scales, but were somewhat smaller. Within the Short Recovery Scale of the SRSS, the highest correlations were found with the RESTQ-Sport-76 scales of Overall Recovery and Physical Recovery.