Measuring reliable change in cognition using the Edinburgh Cognitive and Behavioural ALS Screen (ECAS)

Abstract Background: Cognitive impairment affects approximately 50% of people with amyotrophic lateral sclerosis (ALS). Research has indicated that impairment may worsen with disease progression. The Edinburgh Cognitive and Behavioural ALS Screen (ECAS) was designed to measure neuropsychological functioning in ALS, with its alternate forms (ECAS-A, B, and C) allowing for serial assessment over time. Objective: The aim of the present study was to establish reliable change scores for the alternate forms of the ECAS, and to explore practice effects and test-retest reliability of the ECAS’s alternate forms. Method: Eighty healthy participants were recruited, with 57 completing two and 51 completing three assessments. Participants were administered alternate versions of the ECAS serially (A-B-C) at four-month intervals. Intra-class correlation analysis was employed to explore test-retest reliability, while analysis of variance was used to examine the presence of practice effects. Reliable change indices (RCI) and regression-based methods were utilized to establish change scores for the ECAS alternate forms. Results: Test-retest reliability was excellent for ALS Specific, ALS Non-Specific, and ECAS Total scores of the combined ECAS A, B, and C (all > .90). No significant practice effects were observed over the three testing sessions. RCI and regression-based methods produced similar change scores. Conclusion: The alternate forms of the ECAS possess excellent test-retest reliability in a healthy control sample, with no significant practice effects. The use of conservative RCI scores is recommended. Therefore, a change of ≥8, ≥4, and ≥9 for ALS Specific, ALS Non-Specific, and ECAS Total score is required for reliable change.


Introduction
Cognitive and behavioural symptoms affect approximately 50% of patients with amyotrophic lateral sclerosis (ALS), of whom 15% develop frontotemporal dementia (FTD); the two conditions form a disease spectrum. Executive dysfunction, language dysfunction, and social cognitive deficits are commonly reported (1)(2)(3). Cognitive and behavioural symptoms in ALS can precede motor symptoms (4) and have been associated with reduced survival (5,6), disengagement with life-prolonging interventions (7), and increased caregiver burden (8,9). Neuropsychological status has additional implications for end-of-life care planning, capacity to consent, and powers of attorney (2). Thus, timely and accurate knowledge of patients' cognitive and behavioural status is vital for providing person-centred care.
The Edinburgh Cognitive and Behavioural ALS Screen (ECAS) was developed to offer a comprehensive screening tool to assess the cognitive and behavioural status of patients with ALS (10). The ECAS has been validated against full neuropsychological test batteries (11)(12)(13)(14)(15)(16) and was designed specifically for patients with ALS in that it accommodates motor disability.
Cognitive decline over the course of the disease (17), or in response to specific disease factors such as respiratory insufficiency (18), has been suggested, and it is consequently important to monitor the progression of cognitive symptoms. However, repeated assessment using neuropsychological tests may result in improvement due to practice effects, i.e. improvement in performance due to learning test content or test-taking strategies. For clinicians, it can be difficult to interpret whether an observed difference in test performance is due to true change in the patient's situation (recovery or decline) or to extraneous factors. Measurement error, regression to the mean, and practice effects can produce or exaggerate changes in performance between testing sessions. Furthermore, demographic factors such as age and education level, as well as baseline performance, can influence change in scores between testing sessions and therefore obscure a patient's true performance variation (19). With regard to the ECAS, practice effects have been demonstrated with healthy controls in the executive, language, and memory domains over six months when using version A for repeat assessments (20).
Recently, alternate forms of the ECAS have been developed to accommodate the repeated assessment of patients with ALS (21). The alternate forms of the ECAS (the ECAS-B and ECAS-C) were designed to retain the construct characteristics and level of difficulty of the original ECAS-A while reducing potential practice effects. The ECAS-B and C have been shown to be equivalent to the original ECAS-A, and resistant to practice effects from repeated administration (21). However, it has yet to be determined what can be regarded as a meaningful change on a case-by-case basis. Numerous methods have been proposed to support the interpretation of change scores on neuropsychological tests. Most notably, clinicians have utilized reliable change indices (RCI) and regression-based methods.
With regard to RCI methods, change in scores for an individual patient is interpreted in the context of normal healthy variation, such that an observed change in a patient's score needs to fall outside of the standard error of healthy controls' test-retest variability (22). Numerous variations of the RCI have been developed which adjust for factors such as practice effects, and regression to the mean (23). Conversely, regression-based methods employ regression models to predict performance at follow-up from initial test performance. Again, significant differences between a patient's predicted and actual score are used to determine reliable change. The regression-based method additionally allows the inclusion of moderating variables such as age and education, and controls for practice effects and regression to the mean (24). RCI and regression-based methods allow clinicians to interpret patients' change scores or can provide a meaningful interpretation or endpoint for clinical trials.
The aims of this study are: (1) to examine whether practice effects are observed using the ECAS alternate forms over clinically meaningful test-retest intervals; (2) to determine test-retest reliability of the ECAS-A-B-C over clinically meaningful intervals; and (3) to compare common methods for measuring reliable change in a patient's ECAS score across serially administered alternate versions.

Participants
Eighty healthy Irish and Scottish adults, representative of the demographic characteristics of ALS patients, were recruited. Only those participants who completed the ECAS at two or more timepoints were included in the present study. Fifty-seven participants completed one follow-up assessment, and 51 participants completed two follow-up assessments. Exclusion criteria included: a history of dyslexia, marked premorbid reading or writing difficulties, or a learning disability; nonfluent English reading and writing abilities; a history of neurological conditions that could affect cognition, such as major hemispheric stroke, traumatic brain injury, and severe active epilepsy; alcohol and drug dependencies; and having a known blood relative with ALS. Participants were recruited through a research volunteer panel and through local community noticeboards. Non-blood relatives of ALS patients were also recruited as control participants. All participants provided informed written consent and this research was approved by the University of Edinburgh Psychology Research Ethics Committee and the Beaumont Research Ethics Committee. Participants' travel costs associated with participation were reimbursed.

Procedure
Participants were assessed every four months on three occasions. The ECAS is a measure of cognitive and behavioural functioning designed specifically for ALS. For the purposes of this study, the ECAS behavioural interview was not included. The ECAS consists of three versions (A-B-C) which were designed to be administered serially. Each version of the ECAS consists of 15 parallel tests, categorized into five cognitive domains. The executive, language, and verbal fluency domains are described as ALS Specific functions, while the memory and visuospatial domains are described as ALS Non-Specific. The ALS Specific and ALS Non-Specific domains combine to generate a measure of global cognitive functioning, namely, the ECAS Total score. At each assessment point, an alternate version of the ECAS was administered such that the ECAS-A was given at Time 1, the ECAS-B at Time 2, and the ECAS-C at Time 3.

Statistical analysis
Statistical analyses were conducted using R 3.3.2. Change scores were calculated for each time comparison by subtracting the baseline score from the follow-up score (i.e. Time 2 − Time 1, Time 3 − Time 2, and Time 3 − Time 1). Welch t-tests were used to compare change scores between centres to ensure comparability. When data did not meet assumptions of normality, Mann-Whitney U-tests were employed. In all cases, Time 1 is synonymous with ECAS-A, Time 2 with ECAS-B, and Time 3 with ECAS-C.
Test-retest reliability of the alternate forms of the ECAS (A-B-C) was examined using intraclass correlation coefficients (ICC) with mean-rating, absolute-agreement, two-way random-effects models. ICCs were calculated for the component (language, fluency, executive, memory, visuospatial) and composite (ALS Specific, ALS Non-Specific, ECAS Total) domains of the ECAS.
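As a rough illustration of the ICC form named above, the following Python sketch computes a mean-rating, absolute-agreement, two-way random-effects ICC from a complete participants-by-sessions score matrix, using the standard mean-squares formulation. It is not the code used in the study (which was run in R); the function name `icc_a_k` and the example scores are hypothetical.

```python
def icc_a_k(scores):
    """Mean-rating, absolute-agreement, two-way random-effects ICC.

    `scores` is a list of rows, one per participant, each holding that
    participant's score at every session (no missing values allowed).
    """
    n = len(scores)       # number of participants
    k = len(scores[0])    # number of sessions (e.g. ECAS-A, -B, -C)
    grand = sum(x for row in scores for x in row) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Partition the total sum of squares into participants, sessions, error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Absolute agreement: session (column) variance counts against agreement.
    return (ms_rows - ms_error) / (ms_rows + (ms_cols - ms_error) / n)


# Hypothetical ECAS Total scores for five controls across three sessions.
example = [
    [118, 117, 119],
    [104, 106, 105],
    [125, 124, 126],
    [ 98, 101,  99],
    [111, 110, 112],
]
print(round(icc_a_k(example), 3))
```

Because the absolute-agreement form penalises systematic shifts between sessions, a consistent practice effect would lower this coefficient even if participants kept the same rank order.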
Practice effects were explored using one-way repeated measures analysis of variance (ANOVA) models to examine the presence of a main effect of Time. Because repeated measures ANOVAs are limited to balanced designs, only those participants who completed all three time-points were included in this analysis.
Change indices were calculated using four types of model: two reliable change index (RCI) methods and two regression-based methods. Each method corrects for slightly different moderating effects. The RCI JT method accounts for measurement error, while the Chelune method additionally accounts for practice effects. While significant practice effects in using alternate versions of the ECAS have not previously been found (21), small improvements may be present which might not reach statistical significance. The simple regression method incorporates corrections for regression to the mean, whereby individuals who perform in the extremes tend to converge on the group mean at follow-up. Finally, the multiple regression method allows for the incorporation of potential moderating variables, such as age and education, that may influence change over time. Given the higher sensitivity of the ALS Specific, ALS Non-Specific, and ECAS Total scores to cognitive impairment against a full neuropsychological battery (13), change score analysis was conducted for these composite domains.

Method 1: RCI (JT method)
The first RCI model calculated was the Jacobson and Truax method (JT method) (25), which accounts for measurement error. The JT method is calculated as the difference between Time 2 and Time 1 divided by the standard error of the difference ($SE_{diff}$) between these two time-points. The standard error of the difference is derived from the standard error of the measurement ($SE_m$) such that $SE_{diff} = \sqrt{2(SE_m)^2}$. The standard error of the measurement is calculated with the equation $SE_m = s_1\sqrt{1 - r_{xx}}$, where $s_1$ is the standard deviation of the Time 1 ECAS (i.e. for ECAS-B, the preceding version is ECAS-A, and for ECAS-C the preceding version is ECAS-B) and $r_{xx}$ is the test-retest reliability coefficient between these two ECAS forms. Therefore, the RCI equation using the JT method is calculated with the formula:

$$RCI = \frac{X_2 - X_1}{SE_{diff}}$$

Reliable change is defined by values beyond ±1.645 (two-tailed 90% confidence interval). The formula was then restructured to calculate the upper and lower 'thresholds' of reliable change ($X$), i.e. the number of points increase/decrease required between two testing sessions to constitute a reliable change. Therefore, the equation was restructured as:

$$X = \pm 1.645 \times SE_{diff}$$

Method 2: RCI (Chelune method)
The second RCI method employed is the Chelune method (26), which corrects for measurement error and practice effects. While the alternate versions of the ECAS were developed to account for practice effects, it is possible that small non-significant improvements continue to exist. Additionally, accounting for practice effects here may help to account for any small but non-significant differences in difficulty present in the alternate versions. The Chelune method is similar to the JT method, taking the form of:

$$RCI = \frac{(X_2 - X_1) - (\bar{X}_2 - \bar{X}_1)}{SE_{diff}}$$

Here, the denominator is again the standard error of the difference ($SE_{diff}$). However, the Chelune method adds a constant to the numerator to account for systematic changes in performance such as practice effects. This is achieved by calculating the mean difference in performance between Time 2 and Time 1 ($\bar{X}_2 - \bar{X}_1$) and subtracting this from an individual's change score. As before, this equation was restructured to solve for $X$, resulting in:

$$X = (\bar{X}_2 - \bar{X}_1) \pm 1.645 \times SE_{diff}$$
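The two RCI calculations can be sketched in a few lines of Python. This is only an illustration of the standard JT and Chelune formulas, with $SE_m = s_1\sqrt{1 - r_{xx}}$ and $SE_{diff} = \sqrt{2(SE_m)^2}$; the function names are hypothetical. The standard deviation of 11.26 is the ECAS-A Total value reported in the Discussion, whereas the reliability of 0.90 and the mean change of 0.5 are placeholder assumptions, not results from this study.

```python
import math

Z_90 = 1.645  # two-tailed 90% confidence interval, as used in the text


def se_diff(sd_time1, r_xx):
    """Standard error of the difference from the Time 1 SD and reliability."""
    se_m = sd_time1 * math.sqrt(1.0 - r_xx)   # standard error of measurement
    return math.sqrt(2.0 * se_m ** 2)


def jt_threshold(sd_time1, r_xx, z=Z_90):
    """JT method: symmetric threshold, reliable change if |change| exceeds it."""
    return z * se_diff(sd_time1, r_xx)


def chelune_thresholds(sd_time1, r_xx, mean_change, z=Z_90):
    """Chelune method: (lower, upper) thresholds shifted by the mean change."""
    half_width = z * se_diff(sd_time1, r_xx)
    return mean_change - half_width, mean_change + half_width


def rci(score_t1, score_t2, sd_time1, r_xx, mean_change=0.0):
    """Change index; mean_change=0 gives JT, otherwise the Chelune form."""
    return ((score_t2 - score_t1) - mean_change) / se_diff(sd_time1, r_xx)


# SD from the Discussion; r_xx and mean_change are illustrative assumptions.
print(jt_threshold(11.26, 0.90))
print(chelune_thresholds(11.26, 0.90, mean_change=0.5))
print(rci(108, 96, 11.26, 0.90))
```

With these placeholder inputs the JT threshold comes out at roughly 8.3 points. The Chelune thresholds are the same width but are no longer symmetric about zero, because the mean control change is added to both bounds.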

Method 3: Simple linear regression
The first regression-based method employs a simple linear regression model that predicts follow-up performance based on the preceding performance. First, a patient's predicted Time 2 score is calculated using the basic regression equation:

$$\hat{X} = \beta X + C$$

where $\hat{X}$ is an individual's predicted Time 2 score, $\beta$ is the beta coefficient for the predictor in the model, $X$ is the Time 1 score, and $C$ is the intercept estimate of the model. Next, the discrepancy between the observed Time 2 score and the predicted Time 2 score is calculated and referred to as the residual (i.e. Time 2 − predicted Time 2). To extract a change index, this residual is then divided by the residual standard error (or standard error of the estimate; SEE). When values of the residual divided by the SEE fall outside ±1.645 (two-tailed 90% confidence interval), a reliable change can be determined. To determine reliable change 'thresholds', the equation is restructured to solve for the residual such that:

$$\pm(X - \hat{X}) = 1.645(SEE)$$

where $(X - \hat{X})$ is the difference between the observed score and the estimated score. The same procedure is used for predicting Time 3 performance from Time 2.
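The regression-based change index can be illustrated with a small, dependency-free Python sketch: fit an ordinary least-squares line to control data, compute the residual standard error (SEE), and standardise a patient's residual. The function names and the control scores below are hypothetical and deliberately tiny; they are not the study data or its code.

```python
import math


def fit_simple_regression(baseline, follow_up):
    """Least-squares fit of follow_up on baseline: returns (beta, C, SEE)."""
    n = len(baseline)
    mean_x = sum(baseline) / n
    mean_y = sum(follow_up) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(baseline, follow_up))
    s_xx = sum((x - mean_x) ** 2 for x in baseline)
    beta = s_xy / s_xx
    intercept = mean_y - beta * mean_x
    sse = sum((y - (beta * x + intercept)) ** 2
              for x, y in zip(baseline, follow_up))
    see = math.sqrt(sse / (n - 2))   # residual standard error, n - 2 df
    return beta, intercept, see


def change_index(observed, baseline_score, beta, intercept, see):
    """Residual (observed minus predicted follow-up) divided by the SEE."""
    predicted = beta * baseline_score + intercept
    return (observed - predicted) / see


# Hypothetical control scores at Time 1 and Time 2.
time1 = [101, 102, 103, 104, 105]
time2 = [102, 104, 105, 104, 105]
beta, intercept, see = fit_simple_regression(time1, time2)

threshold = 1.645 * see                       # +/- points around the prediction
index = change_index(100, 103, beta, intercept, see)
print(beta, intercept, see, threshold, index)
```

An index beyond ±1.645 would be read as a reliable change; equivalently, the residual can be compared directly against `threshold`.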

Method 4: Multiple linear regression
The second regression-based method is a multiple linear regression model to explore whether age, education, sex, preceding performance, or testing interval affects the model's prediction. Potential predictors were selected based on their correlation with the respective outcome variable (i.e. ALS Specific, ALS Non-Specific, or ECAS Total scores). A relationship with sex was explored using Mann-Whitney U-tests. Variables of interest were entered into each model in a single block and only retained if their individual contribution to the model was significant (i.e. backward elimination). Significantly influential cases were removed based on diagnostic plots. Multiple regression equations, similar to simple linear regression equations, take the form of:

$$\hat{X} = \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_J X_J + C$$

where $\beta_1$ is the coefficient of the first variable of interest, $X_1$ is the observed score for the first variable, $\beta_2$ is the coefficient for the second variable, and so on to the $J$th variable. Again, this equation is restructured to solve for $(X - \hat{X})$ such that:

$$\pm(X - \hat{X}) = 1.645(SEE)$$

[...] effects and thus the presence of ties in the language and visuospatial subtests, formal analysis was not conducted. However, given the similarity in mean scores and the lack of significant differences for their composite domains, a lack of observable practice effects may be assumed. Therefore, no evidence of practice effects was found for the repeated assessment using the ECAS alternate versions (A-B-C) over clinically relevant test-retest intervals of three to four months.

Method 3: Simple linear regression
Simple linear regression models were built to predict follow-up scores based on previous performance. Table 3 provides data for calculating predicted ECAS-B performance from ECAS-A, and for predicting ECAS-C from ECAS-B, using the equation $\hat{X} = \beta X + C$. This predicted score can then be converted into a change index, with ±1.645 constituting a reliable deviation from predicted performance. Alternatively, the column $X - \hat{X}$ in Table 3 provides upper and lower thresholds calculated as $\pm(X - \hat{X}) = 1.645(SEE)$.

Method 4: Multiple linear regression
Variables of interest were explored as potential moderating factors in multiple regression models using correlational analysis. Significant variables were entered into regression models in a single block and individual variables were only retained if their contribution to the model remained significant. For the multiple regression models, the variance inflation factor for the predictors did not exceed 2. Table 4 displays the results of these models. For the prediction of ECAS-B, education was retained in the model for ALS Specific and ECAS Total scores. For the ECAS-C, education significantly added to the model for ALS Non-Specific scores. No other variables were retained in the final multiple regression models. Additional models were generated to predict ECAS-C from the combined performance on ECAS-A and ECAS-B. As with the previous multiple regression models, variables of interest were correlated with, and regressed onto, the ECAS-C. However, in this instance, the only variable retained was age, for ALS Non-Specific scores.

Table note: ΔX is the change in score required to be considered significant. The Chelune method results in different upper and lower thresholds due to its subtraction of a constant.

Example data
A 62-year-old male limb-onset ALS patient with 10.5 years of education was assessed at two timepoints, with a four-month interval between Time 1 and Time 2. The patient had no behavioural or respiratory symptoms at either time-point. For the ECAS Total, the patient scored 108 on the ECAS-A and 96 on the ECAS-B. This resulted in a change score of −12, which falls below the RCI threshold for significant decline by both the JT method and the Chelune method, and the recommended clinical thresholds (Table 2).
Using the simple regression-based method, this patient's predicted ECAS-B score is calculated using the equation $\hat{X} = \beta X + C$, where in this case $\hat{X} = (.648)(108) + 41.62$. The resulting predicted ECAS-B score is therefore 111.60 with a residual (i.e. ECAS-B minus predicted ECAS-B) of −15.2. Dividing this residual by the SEE gives a change index of −2.44, which is less than −1.645. Therefore the patient's score was significantly lower than predicted. Furthermore, the residual of −15.2 falls outside the simple regression-based threshold of ±10.24. The multiple regression-based method additionally includes the variable education, here 10.5 years; the resulting change index is −2.22. Again, this falls outside of ±1.645, and the residual (−13.23) falls outside the threshold of ±9.82. Therefore, under all measures, the patient presents with a significant and reliable decrease in cognitive functioning.

Table note: $R^2$ is the multiple $R^2$ when the model contains one predictor and the adjusted $R^2$ when the model contains more than one predictor. SEE is the residual standard error, $C$ is the intercept, $\beta$ is the beta coefficient associated with the subscript, and $X - \hat{X}$ is the residual (i.e. the difference between the model predicted score and the observed score). The $X - \hat{X}$ column indicates the number of points difference required between observed and estimated score to determine reliable difference; this is calculated as 1.645 × SEE.

Discussion
The monitoring of cognitive and behavioural symptoms longitudinally in ALS is integral to measuring disease progression, to assessing the outcome of clinical trials, and to providing person-centred care. While the recent development of alternate forms of the Edinburgh Cognitive and Behavioural ALS Screen (ECAS) provided the tools necessary to assess neuropsychological functioning over time, the present paper aimed to provide the data necessary for individual-level interpretation. Moreover, thresholds for significant decline or improvement provide viable end-points for clinical trials. An additional goal was to explore the test-retest reliability of the ECAS's alternate forms when testing across clinically relevant intervals. The present results demonstrate that the alternate versions of the ECAS provide a consistent method by which cognitive functioning can be monitored over time in patients with ALS. Building on the study by Crockford et al. (21), the present study aimed to explore whether the alternate forms of the ECAS ameliorate practice effects when administered over clinically meaningful testing intervals. No significant practice effects were observed for the ECAS-A, ECAS-B, and ECAS-C when administered sequentially. The alternate forms of the ECAS are therefore successful in ameliorating practice effects, confirming the findings of Crockford et al. (21).
The test-retest reliability of the alternate forms was additionally explored. Intraclass correlation coefficients (ICCs) were excellent for the composite ALS Specific, ALS Non-Specific, and ECAS Total scores. The individual cognitive subtests of the ECAS also performed well, achieving ICC values greater than .70. While the ICC values for the visuospatial task appear quite low, this is in part due to the dependency of the ICC calculation on between-subjects variability. Because there is very little between-subjects variability in the visuospatial task, the small differences present are exaggerated, suggesting a lower test-retest reliability than is warranted (27). However, as noted, the composite ALS Specific, ALS Non-Specific, and ECAS Total reliability was excellent. This is particularly pertinent given the sensitivity of these domains in detecting cognitive impairment against a full neuropsychological battery (13). As such, participants who were administered the ECAS forms serially showed good consistency and stability across testing sessions.
The primary aim of this paper was to establish methods of interpreting reliable change for patients with ALS. Four models were utilized, including two reliable change indices (RCI) and two regression based (RB) methods. These methods account for slightly different factors that may influence performance change. The RCI thresholds for reliable change are the minimum increase or decrease in performance necessary to be considered reliable. Both RCI methods produced similar thresholds for all comparisons (i.e. ECAS-A to ECAS-B, ECAS-B to ECAS-C, and ECAS-A to ECAS-C).
The RCI methods proposed by Jacobson and Truax (25) and modified by Chelune et al. (26) were developed on the assumption of repeated assessment using the same version of a test. For instance, the $SE_m$ in these authors' studies is calculated using the standard deviation and test-retest reliability of the same instrument. This does not pose an issue when one considers that the test-retest reliability used in the present study is the intraclass correlation between two ECAS forms. However, RCI calculations traditionally use the standard deviation of the instrument assuming equality of variation for Time 1 and Time 2. Fortunately, the standard deviations for ALS Specific, ALS Non-Specific, and ECAS Total across alternate forms were only trivially different (e.g. for ECAS Total scores, the standard deviations were 11.26, 11.38, and 11.87 for the ECAS-A, ECAS-B, and ECAS-C, respectively). It was therefore not deemed necessary to use a measure of shared variance in place of the standard deviation.
With regard to the regression-based methods, linear regression models provide predictive scores for patients based on their baseline ECAS performance, or a combination of this and demographic variables. The difference between a patient's predicted score and their actual score is used to determine whether the deviation constitutes a reliable difference. By dividing the residual by the standard error of the residual, one can determine if an individual's departure from their predicted performance is within normal variation, i.e. variation due to measurement error, practice effects, or regression to the mean. While regression-based methods may be more complicated to calculate, they may also provide more accurate predictions that take account of important moderating variables such as education level. However, some authors have argued that regression-based methods are not necessarily superior to RCI methods (e.g. (19)). Additionally, regression-based methods do have their own limitations. As noted by Crawford and Garthwaite (24), the error associated with predicting follow-up performance from baseline performance using regression-based techniques will be larger at the extremes, i.e. the residuals at the extremes are greater. Therefore, caution should be exercised when interpreting change in patients who score poorly at baseline. Because the sample herein is of healthy controls, while the target patient population would be expected to score toward the lower extremes, further research would be needed to clarify whether RCI and regression-based thresholds need to be developed based on initial test performance. This may be achieved by exploring these thresholds in a sample of MND patients where cognitive deterioration is not expected or found, for example in patients who have a slower disease progression.
An important caveat in utilizing these thresholds is that the ECAS is a cognitive screening tool, and not designed to replace full neuropsychological assessment. While a patient's test-retest performance may be reliably described as a decline using the thresholds herein, such findings should be corroborated with specialist neuropsychological input.
In deciding which method to utilize for detecting reliable change, a pragmatic approach is recommended. For research purposes, the multiple regression-based methods may provide more specific indicators of change. However, these regression-based methods are relatively more technical and complex to calculate. The ECAS was designed to be accessible to non-specialist health care professionals, and thus, the recommendation of regression-based methods may compromise the clinical utility of the ECAS. Given the similarity in scores across all four methods and the ease with which RCI methods can be included in a clinical environment, a conservative application of change scores is recommended for clinical purposes. Based on the most conservative JT method, and to reduce the number of false-positives, a change of ≥8, ≥4, or ≥9 points is recommended for a significant change in ALS Specific, ALS Non-Specific, or ECAS Total score, respectively.
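For readers who wish to apply the recommended clinical thresholds programmatically, a minimal Python sketch follows. The thresholds (8, 4, and 9 points) are those recommended in this paper; the function name, labels, and dictionary layout are hypothetical conveniences, and the output is no substitute for specialist neuropsychological input.

```python
# Recommended minimum change (in points) for a reliable change, per domain.
RECOMMENDED_THRESHOLDS = {
    "ALS Specific": 8,
    "ALS Non-Specific": 4,
    "ECAS Total": 9,
}


def classify_change(domain, score_t1, score_t2):
    """Label the change between two serial ECAS assessments for one domain."""
    change = score_t2 - score_t1            # follow-up minus baseline
    threshold = RECOMMENDED_THRESHOLDS[domain]
    if change <= -threshold:
        return "reliable decline"
    if change >= threshold:
        return "reliable improvement"
    return "no reliable change"


# The example patient above: ECAS Total of 108 (ECAS-A), then 96 (ECAS-B).
print(classify_change("ECAS Total", 108, 96))
```

The example patient's change of −12 points on the ECAS Total exceeds the 9-point threshold and is therefore labelled a reliable decline, in line with the worked example.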

Conclusions
Measuring the progression of cognitive symptoms in ALS has important clinical implications. Cognitive status can play an important role in how patients engage with interventions, in how clinicians engage with patients, and in what services may be appropriate. The alternate forms of the ECAS provide a method by which cognitive symptoms can be monitored over time. The present study built on this by providing a means by which a patient's change over time can be reliably interpreted. Four models of change indices were calculated. The reliable change indices may be the method with the highest clinical utility; however, regression based methods may play a role in more detailed analysis or clinical research. Additionally, the present study demonstrated that the test-retest reliability of the ECAS and its alternate forms is excellent for the ALS Specific, ALS Non-Specific, and ECAS Total scores. This, along with no evidence of significant practice effects, suggests that the ECAS-A-B-C are stable, consistent, and useful in monitoring ALS patients' cognitive performance over time.