What is the extent of reliability and validity evidence for screening tools for cognitive and behavioral change in people with ALS? A systematic review

Abstract Objective: This systematic review provides an updated summary of the existing literature on the validity of screening tools for cognitive and behavioral impairment in people with Amyotrophic Lateral Sclerosis (pwALS), and also focuses on their reliability. Method: The following cognitive and behavioral screening tools were assessed in this review: the Edinburgh Cognitive and Behavioral ALS Screen (ECAS); the ALS Cognitive Behavioral Screen (ALS-CBS); the Mini Addenbrooke’s Cognitive Examination (Mini-ACE); the Beaumont Behavioral Interview (BBI); the MND Behavior Scale (MiND-B); and the ALS-FTD Questionnaire (ALS-FTD-Q). A search of Medline, PsycINFO and Embase (21/09/2023) generated 37 results after exclusion criteria were applied. Evidence of internal consistency, item-total correlations, inter-rater reliability, clinical validity, convergent validity, and structural validity was extracted and assessed, and risk of bias was evaluated. Results: The cognitive component of the ECAS was the tool with most evidence of reliability and validity for the assessment of cognitive impairment in ALS. It is well-suited to accommodate physical symptoms of ALS. For behavioral assessment, the BBI or ALS-FTD-Q had the most evidence of reliability and validity. The BBI is more thorough, but the ALS-FTD-Q is briefer. Conclusions: There is good but limited evidence for the reliability and validity of cognitive and behavioral screens. Further evidence of clinical and convergent validity would increase confidence in their clinical and research use.


Introduction
Around 50% of people with ALS (pwALS) show signs of cognitive or behavioral change, with 15% having ALS-frontotemporal dementia (ALS-FTD) (1,2). Diagnostic criteria for cognitive impairment (ALSci), behavioral impairment (ALSbi), simultaneous cognitive and behavioral impairment (ALScbi), and ALS-FTD are outlined by the ALS-frontotemporal spectrum disorder (ALS-FTSD) criteria (3) and the consensus criteria for the diagnosis of behavioral variant FTD (bvFTD) (4). Several screening tools have been developed to detect changes to cognition and behavior in pwALS.
Previous systematic reviews of the reliability and validity of cognitive and behavioral screening tools for ALS have been conducted (5-7). A review of cognitive and behavioral screening tools (5) concluded that the cognitive component of the Edinburgh Cognitive and Behavioral ALS Screen (ECASc) (8) had strong clinical validity, though validity evidence of the behavioral component of the ECAS (ECASb) was limited. It was also concluded that the Beaumont Behavioral Interview (BBI) (9) assessed the widest range of behavioral impairment and so was the most promising, but validation evidence was limited.
Another systematic review (7) concluded that the ECASc and the cognitive component of the ALS Cognitive Behavioral Screen (ALS-CBSc) (10) had good clinical validity. Studies in this review (7) demonstrated satisfactory clinical validity for the behavioral component of the ALS-CBS (ALS-CBSb), BBI, and MND Behavior Scale (MiND-B) (11), but the review highlighted that the MiND-B assesses a smaller number of behavioral domains than the other screens and may not fully assess the Rascovsky et al. (4) criteria for bvFTD.
Taule et al. (6) conducted a systematic review of reliability and validity evidence for the ECASc, ALS-CBSc and other non-ALS-specific measures (12-15). They reported that the ECAS and versions of the Addenbrooke's Cognitive Examination (12,13) had been evaluated for the greatest number of psychometric properties.
Gosselt et al. (7) and Simon & Goldstein (5) did not review evidence for the reliability of the cognitive screens, and Taule et al. (6) only evaluated reliability and validity for the ECAS and ALS-CBS. Taule et al. (6) made no assessment of behavioral screens other than the ALS-CBS. None of these three reviews assessed the reliability or validity of the Mini Addenbrooke's Cognitive Examination (Mini-ACE) (16). While the Mini-ACE (16) is not an ALS-specific measure, it has been suggested as a suitable method of screening for cognitive change in ALS (17). Since Gosselt et al. (7), Simon & Goldstein (5), and Taule et al. (6) published their systematic reviews, further studies of the reliability and validity of the screening measures have emerged.
The aim of this systematic review was, therefore, to assess the extent of both current reliability and validity evidence for different screening tools for cognitive and behavioral change in pwALS, including the Mini-ACE. The ALS-specific cognitive screening tools (i.e., screening tools designed specifically to detect cognitive changes likely to be found in pwALS) assessed in this systematic review were the ECASc and the ALS-CBSc; of note, however, the ECASc also assesses cognitive functions that are less likely to be affected in pwALS, referred to as ALS non-specific scores, although only total ECASc scores were considered here. The ALS-specific behavioral screening tools that were assessed in this systematic review were the ECASb, the ALS-CBSb, the BBI, the MiND-B, and the ALS-FTD Questionnaire (ALS-FTD-Q) (18).

Methods
This systematic review was conducted according to Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines (19).

Eligibility
Articles in press, conference proceedings, abstracts, non-observational studies, papers not published in English, and other systematic reviews were excluded. Studies were excluded if they did not report reliability or validity statistics.
Participant samples could include pwALS or healthy controls. Reported statistics must have been based on screening tool total scores and not sub-section scores. To provide evidence of clinical validity, studies needed to have determined cutoff scores, evaluated cutoff scores based on their sensitivity and specificity, or evaluated a screen's ability to differentiate between pwALS without impairment and pwALS with ALSci or ALSbi (e.g., using receiver operating characteristic (ROC) curve analysis).
Test-retest studies where test sessions were conducted more than two weeks apart or studies that did not report the test-retest interval were excluded (20). Test-retest studies that were conducted over a longer period may be influenced by declining cognition and behavior.
Demonstrating a significant difference in scores between ALSci or ALSbi and ALS with no impairment was deemed insufficient evidence of clinical validity (21,22).
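To illustrate what evaluating a cutoff score by its sensitivity and specificity involves, the following is a minimal sketch rather than the procedure used by any included study. It assumes that lower screening scores indicate impairment (a score at or below the cutoff counts as screen-positive) and that a reference classification of impairment is available for each participant. The function name, the scores, and the cutoff value are hypothetical.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    scores: Sequence[float],
    impaired: Sequence[bool],
    cutoff: float,
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for a given cutoff.

    Assumption: a score <= cutoff is treated as screen-positive.
    `impaired` holds the reference (gold standard) classification.
    """
    tp = fn = tn = fp = 0
    for score, is_impaired in zip(scores, impaired):
        screen_positive = score <= cutoff
        if is_impaired and screen_positive:
            tp += 1  # impaired and detected
        elif is_impaired:
            fn += 1  # impaired but missed
        elif screen_positive:
            fp += 1  # unimpaired but flagged
        else:
            tn += 1  # unimpaired and not flagged
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one impaired and one unimpaired case")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical total scores and reference classifications for eight people.
scores = [98, 110, 120, 104, 125, 101, 118, 95]
impaired = [True, False, False, True, False, False, False, True]
sens, spec = sensitivity_specificity(scores, impaired, cutoff=105)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Repeating this calculation over a range of candidate cutoffs gives the points of a ROC curve, which is why ROC analysis and cutoff evaluation are treated together in the eligibility criteria above.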

Search strategy
The Embase, PsycINFO, and Medline databases were searched on 04/05/2022; searches were updated on 21/09/2023. The search terms are listed in Table 1. The process of article selection is detailed in Figure 1.

Process of study selection
Titles and abstracts of records identified through electronic database searching were screened and duplicates were removed. Reference lists of included studies were hand searched for articles not found in the electronic literature search and 13 were identified. After exclusion criteria were applied, 37 articles remained (Figure 1). Where it was not clear whether a paper met exclusion criteria, three researchers (LD, SV, LHG) decided whether papers should be included in the review.

Data synthesis and extraction
No changes were made to the data extraction form following a pilot of search terms, exclusion criteria and data extraction. Information was extracted from articles by one researcher (LD). One study author was contacted for statistical clarification. Studies were grouped by screening tool. See Supplementary Material for further details on data synthesis and extraction.

Risk of bias and study quality
No established quality evaluation tool was used in this systematic review; instead, characteristics that were important to the types of study assessed in this review were evaluated. This permitted the tailoring of the assessment of risk of bias and study quality to the currently evaluated studies. One researcher (LD) made risk of bias and study quality assessments with input from two other researchers (SV and LHG).
A risk of bias and study quality score was assigned to studies, with high scores indicating a low risk of bias (good quality) and low scores indicating a high risk of bias (poor quality). Bias and study quality scores were calculated as total points awarded for each bias assessment criterion over total points available (not all bias assessment criteria were applicable to each study) and are given as a percentage. See Supplementary Material and table legends for details.
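The scoring rule described above (points awarded over points available, with non-applicable criteria left out of both totals) can be sketched as follows. This is an illustrative reading of the description, not the authors' own scoring script; the function name and the example criteria are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def bias_quality_score(
    criteria: Sequence[Optional[Tuple[int, int]]],
) -> float:
    """Return points awarded over points available, as a percentage.

    Each entry is (points_awarded, points_available). A value of None
    marks a criterion that was not applicable to the study, so it is
    excluded from both the numerator and the denominator.
    """
    applicable = [c for c in criteria if c is not None]
    available = sum(max_points for _, max_points in applicable)
    if available == 0:
        raise ValueError("no applicable criteria")
    awarded = sum(points for points, _ in applicable)
    return 100.0 * awarded / available


# Hypothetical study: two one-point criteria met, one of two points on a
# two-point criterion, and one criterion that was not applicable.
example = [(1, 1), (1, 1), (1, 2), None]
print(bias_quality_score(example))  # 75.0
```

Excluding non-applicable criteria from the denominator means that studies assessed on different numbers of criteria can still be compared on the same percentage scale.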

Results
Demographic data extracted from the included studies is presented in Table 2 and ALS data is presented in Table 3. Excluded studies are listed in Supplementary Material. Risk of bias and study quality (Tables 4 and 5) varied across studies.

Reliability
Eighteen of the 37 studies assessed internal consistency (see Table 6).
There were nine studies showing acceptable internal consistency for the ECASc (8,23-30), while one did not demonstrate acceptable internal consistency (31) (Table 6). This latter study (31) had a lower study bias/quality score (Table 4) than all other studies that reported acceptable internal consistency for the ECASc. The ECASc has been shown to have fair to good inter-rater reliability (26,32,33). Studies that reported better inter-rater reliability for the ECASc had higher bias/quality scores (see Table 5). There was no evidence of test-retest reliability for the ECASc within the parameters set in this systematic review; while Kacem et al. (29) reported test-retest statistics, their methodology and reported results did not meet inclusion criteria. Item-total correlations were unavailable for the ECASc.
Inter-rater reliability, test-retest reliability, internal consistency, and item-total correlations of the ALS-CBSc were not reported in any of the studies, within the parameters set in this systematic review. Similarly, inter-rater reliability, test-retest reliability, item-total correlation, and internal consistency for the Mini-ACE were not reported in the included studies.
The internal consistency of the ECASb was questionable as it was only assessed by one study (25) that found its internal consistency to be statistically unacceptable. While inter-rater reliability and test-retest reliability for the ALS-CBSb were not reported in any of the studies included here, one study suggested that the ALS-CBSb had acceptable internal consistency (34).
The ALS-FTD-Q had acceptable internal consistency and 80% of items had acceptable item-total correlations (18,28) (Table 6). Inter-rater reliability and test-retest reliability of the ALS-FTD-Q were not reported in any of the included studies.
While inter-rater reliability, test-retest reliability, and item-total correlations of the BBI were not reported, three studies reported that the internal consistency of the BBI was acceptable (9,35,36) (Table 6).
One study reported that 7/9 MiND-B items had acceptable item-total correlations (37) and two studies reported acceptable internal consistency (11,37). The inter-rater reliability and test-retest reliability of the MiND-B were not reported in any of the studies.
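For readers less familiar with internal consistency statistics, the following minimal sketch computes Cronbach's alpha, one of the coefficients reported by the included studies (see Table 6), from item-level scores using the standard formula alpha = k/(k-1) x (1 - sum of item variances / variance of total scores). The data are hypothetical, and the thresholds the review used to judge a value as acceptable are not reproduced here.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(item_scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix.

    item_scores[r][i] is the score of respondent r on item i.
    Sample variances (n - 1 denominator) are used throughout.
    """
    n_respondents = len(item_scores)
    n_items = len(item_scores[0])
    if n_respondents < 2 or n_items < 2:
        raise ValueError("need at least two respondents and two items")
    # Transpose rows (respondents) into columns (items).
    columns = list(zip(*item_scores))
    item_variance_sum = sum(variance(column) for column in columns)
    total_variance = variance([sum(row) for row in item_scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return n_items / (n_items - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical responses from five people on a three-item scale.
responses = [
    [2, 3, 3],
    [1, 1, 2],
    [3, 3, 3],
    [0, 1, 1],
    [2, 2, 3],
]
print(round(cronbach_alpha(responses), 3))
```

Alpha rises as items covary more strongly relative to their individual variances, which is why it is usually interpreted alongside evidence that a scale is unidimensional.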

Validity
Clinical and convergent validity of the different measures are summarized in Table 7. No validity data for the ALS-FTD-Q cutoff score has been reported.
Overall, there is substantial evidence that the ECASc has good clinical validity.

Clinical validity data for the ALS-CBSc.
For the ALS-CBSc, nine studies reported validity statistics (10,25,42-47) (Table 7). Three studies with good methodological quality examined the clinical validity of the ALS-CBSc (10,43,47) (Table 7). There were some differences in the cutoff scores identified by the different studies but the reported sensitivity and specificity of those cutoff scores were very good to perfect.

Clinical validity data for the Mini-ACE.
Validity data was given for the Mini-ACE by two studies (17,26) (Table 7). For differentiating people with Alzheimer's dementia from pwALS, a cutoff slightly lower than the published cutoff score was identified, with excellent sensitivity and good specificity (26). For distinguishing ALS plus (pwALS with cognitive and/or behavioral symptoms) from cognitively normal pwALS, the published cutoff had fair sensitivity and excellent specificity (17).

Clinical validity data for the ALS-CBSb.
Six papers reported ALS-CBSb validity data (10,26,34,45,47,48) (Table 7). Two studies supported the clinical validity of the ALS-CBSb (10,47). They were in agreement over the cutoff score for determining ALS-FTD and reported similar cutoff scores for ALSbi, all with very good to excellent sensitivity and fair to very good specificity.

Clinical validity data for the ALS-FTD-Q.
There were no available studies to support the clinical validity of the ALS-FTD-Q.
As well as evidence of convergent validity between the ECASc and ALS-CBSc (25,26,42,44), there was also evidence of convergent validity between the ALS-CBSc and the Mini-ACE (26).In addition, there was evidence of convergent validity between the ALS-CBSc and other cognitive tests (45,46).
As well as the evidence of convergent validity between the ALS-FTD-Q and the BBI (52), convergent validity was demonstrated between the ALS-FTD-Q and the Frontal Systems Behavior Scale (FrSBe) (18,53), and between the ALS-FTD-Q and the Frontal Behavioral Inventory (FBI) (18,49,54) (Table 7).There was evidence of convergent validity between the BBI and the FrSBe and between the BBI and the FAB but this evidence was not extensive (9) (Table 7).There was evidence of divergent validity between the MiND-B and the ECASc, suggesting that the MiND-B measures behavioral change as an independent construct from cognitive change (37).
Structural validity.
One study (34), using Principal Components Analysis (PCA), reported that the ALS-CBSb has a single component structure. For the BBI, another study (35) reported that five items loaded weakly on the component of behavioral impairment when using PCA but reported a clear single factor structure. When PCA was conducted for the MiND-B to confirm unidimensionality, only 44% of raw variance was explained (target: 50%) (11). Both studies were awarded high scores in the risk of bias and study quality assessment. The structural validity of the ECASc, ALS-CBSc, Mini-ACE, ECASb, and ALS-FTD-Q has not been assessed.

Discussion
The purpose of this systematic review was to assess the evidence of reliability and validity relating to screening tools for cognitive and behavioral change in pwALS. This systematic review assessed 37 studies. Most (20/37) studies focused on the ECASc.

Cognitive screens
For cognitive screening tools, the methodological quality of studies varied, with 17/26 studies achieving the maximum possible score (signifying low risk of bias and high study quality).
Twenty studies reported reliability and/or validity statistics for the ECASc (Tables 6 and 7).
Given the extensive evidence of the reliability (internal consistency and inter-rater reliability) and validity of the ECASc and given that it is ALS-specific and tailored to people with ALS who are unable to either speak or write a response, the ECASc appears to be a suitable assessment option for clinicians and researchers, based on this review.
Ten studies reported validity statistics for the ALS-CBSc (Table 7). Overall, there was good evidence for the validity of the ALS-CBSc, though evidence of its reliability was limited. The ALS-CBSc is an ALS-specific measure, but there are insufficient adjustment mechanisms in place for people with ALS who cannot write or cannot say their responses. Despite good evidence of validity, the ALS-CBSc is limited by its lack of accessibility for some pwALS. Evidence of the validity of the Mini-ACE, in terms of its use with pwALS, was limited and there was no data concerning the Mini-ACE's reliability. The Mini-ACE is also not ALS-specific and makes no adjustments for writing/speech difficulties. Based on the evidence considered in this systematic review, the Mini-ACE was the weakest of the three cognitive screening tools for identifying cognitive impairment in pwALS.
Note: m = not relevant because whole sample was used or a separate additional sample was used. NA = not applicable. ECASc = ECAS cognitive section. * N ≥ 30 was awarded one point. ** Those using an ALS cohort were awarded one point. *** If raters were blinded to scores given by other raters, studies were awarded one point. **** Where a sub-sample was selected from the larger main study sample, one point was awarded if the sub-sample was randomly selected.

Behavioral screens
The methodological quality of behavioral screen studies was generally high. Reliability and validity statistics were presented for the ECASb by only four studies (23,25,26,34) (Tables 6 and 7). The internal consistency of the ECASb was questionable as it was only assessed by one study (25) that found its internal consistency to be statistically unacceptable. However, this may not necessarily be a weakness of the ECASb since there is no published evidence for its unidimensionality and no indication that identified behaviors contribute equally to a classification of ALSbi. Evidence of validity and reliability for the ECASb was, therefore, limited, thereby reducing evidence that can be used to justify its choice for the behavioral assessment of people with ALS.
There was good evidence for the validity of the ALS-CBSb but this was not extensive and there was limited evidence of ALS-CBSb reliability.
While evidence of the validity and reliability of the ALS-FTD-Q comes from a small number of studies, the quality of that evidence was good; the ALS-FTD-Q is, therefore, a good option for clinicians and researchers.
For the BBI, validity and reliability statistics were reported by four studies (9,35,36,52) (Tables 6 and 7). The BBI is ALS-specific and very thorough, though lengthy. The evidence reviewed in this study suggests that the BBI is a good option for researchers and clinicians who are not under assessment time constraints.
Three studies evaluated the reliability and validity of the MiND-B (11,17,37) (Tables 6 and 7). Due to limited evidence of the reliability and validity of the MiND-B evaluated in this review, it is likely not the best option for clinicians and researchers, especially given that other measures are available.
Consistent with a previous review (5), the current review suggested that the ECASc was the most appropriate screening tool for cognitive impairment in ALS with the strongest validation evidence. Another review (7) also concluded that the ECASc, along with the ALS-CBSc, had evidence of good clinical validity. A third review (6), however, highlighted that assessments of the psychometric properties of the ALS-CBSc are limited. The findings of this systematic review do not contradict this statement, but this is true of all of the assessed cognitive and behavioral screening tools, with the exception of the ECASc. It was previously highlighted (5) that the evidence of behavioral screening tool validity was limited; this is still the case, although eight studies have since contributed to the evidence-base (25,26,29,30,34,35,37,42). The results of the current review concur with previous conclusions (5) that the validity of the BBI and ALS-FTD-Q is similar but as the BBI is more comprehensive, it may be the better option. The advantage of the ALS-FTD-Q is that it is briefer and therefore quicker to complete than the BBI.
(61). Sensitivity and specificity scores of 1.0 are described as perfect; 0.90-0.99 are excellent; 0.80-0.89 are very good; 0.70-0.79 are good; 0.60-0.69 are fair; and 0.50-0.59 are medium; this descriptive system was generated for this systematic review.
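The descriptive labels that this review applies to sensitivity and specificity values (1.0 perfect; 0.90-0.99 excellent; 0.80-0.89 very good; 0.70-0.79 good; 0.60-0.69 fair; 0.50-0.59 medium) can be expressed as a small lookup, sketched below. Two points are assumptions rather than part of the review: each band is treated as running from its lower bound up to, but not including, the next band, and values below 0.50, for which no label is given, are returned as "not labeled". The function name is hypothetical.

```python
def describe_accuracy(value: float) -> str:
    """Map a sensitivity or specificity value to the review's label.

    Assumption: bands are half-open, e.g. 0.90 <= value < 1.0 is
    "excellent". Values below 0.50 have no label in the review.
    """
    if not 0.0 <= value <= 1.0:
        raise ValueError("value must lie between 0 and 1")
    if value == 1.0:
        return "perfect"
    bands = [
        (0.90, "excellent"),
        (0.80, "very good"),
        (0.70, "good"),
        (0.60, "fair"),
        (0.50, "medium"),
    ]
    for lower_bound, label in bands:
        if value >= lower_bound:
            return label
    return "not labeled"


for v in (1.0, 0.93, 0.85, 0.72, 0.65, 0.50):
    print(v, describe_accuracy(v))
```

Applying one fixed mapping to every reported value keeps terms such as "very good" and "excellent" comparable across screening tools and studies.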
This systematic review assessed most ALS-specific screening tools for cognitive and behavioral impairment. This is the first systematic review of several ALS-specific cognitive and behavioral screening tools, beyond the ECASc and ALS-CBS, to evaluate both their reliability and validity. It was critical of the methods used to generate cutoff scores as part of the risk of bias and study quality assessment. However, this review did not evaluate the effect of age and education on scores on screening tools and how cutoff scores may vary depending on these factors.
This review excluded test-retest reliability studies where test sessions were more than two weeks apart (20). In such cases the mean duration between test sessions was often around six months (55). While this may reduce practice effects, studies may instead be measuring sensitivity to progressive deterioration.
The current review assumed equivalence of the screening measures in different languages. Given that some cognitive items were altered to be culturally suitable, adapted cognitive screening tools may have functioned differently to the original.
All but one study (26) included in this review used healthy control group data for ROC curve analysis in order to evaluate the sensitivity and specificity of screening tool cutoff scores. These multiple studies are, therefore, informative when trying to identify impairment in pwALS when compared to the normal population. The comparison of pwALS with another clinical group (e.g., Alzheimer's disease; (26)) addresses a different diagnostic question and data on sensitivity and specificity must be examined in that context rather than in the context of distinguishing pwALS from healthy controls.
The conclusion of this review is that the ECASc remains the most suitable tool for assessing cognitive impairment in ALS because of its suitability to the population and the extensive evidence of its reliability (internal consistency and inter-rater reliability) and validity. For behavioral assessment, the evidence examined here suggests that the BBI or ALS-FTD-Q are the most suitable for clinical and research use, with the BBI being the most thorough and ALS-FTD-Q being briefer. Therefore, this systematic review recommends that clinicians and researchers consider the use of the ECASc to measure cognitive change in pwALS. When assessing behavioral change in pwALS, clinicians and researchers should weigh up the need for a more comprehensive as opposed to a briefer assessment and, if opting to choose other measures of ALSci and ALSbi, should apprise themselves of the psychometric properties of the measures, and of the clinical groups in which they have been validated, before adopting them for use.

Table 1. Search terms for systematic review evaluating the reliability and validity of cognitive and behavioral screening tools.

Table 2. (Continued). Language refers to the language in which the screening tool was delivered. ** = Median (IQR) is reported in place of mean (sd).
Note: ALSci = ALS with cognitive impairment. ALSni = ALS with no impairment. FTD = frontotemporal dementia. bvFTD = behavioral variant FTD. NR = not reported. ALSPlus = ALS with cognitive and/or behavioral impairment. ECASc = ECAS cognitive section. ALS-CBSc = ALS-CBS cognitive section. NA = not applicable. * = these statistics are representative of the wider sample of 40 ALS participants, rather than the sub-set of 18 participants for which validation statistics were generated. ** = Median (IQR) is reported in place of mean (sd).

Table 4. Risk of bias and study quality assessment. NA = not applicable as ROC curves were not generated and so gold standard classifications of impairment were not needed. ECASc = ECAS cognitive section. ALS-CBSc = ALS-CBS cognitive section. * N ≥ 30 was awarded one point. ** Those using an ALS cohort were awarded one point. *** If validated against a gold standard of clinical diagnosis or outcome of a full neuropsychological test battery, two points were awarded; if validated against a widely accepted cognitive/behavioral measure, one point was awarded.

Table 5. Risk of bias and study quality assessment for inter-rater reliability assessments.

Table 6. Internal consistency. Papers omitted from this table did not report internal consistency of the relevant screening tools. ECASc = ECAS cognitive section. * Internal reliability based on McDonald's ω, rather than Cronbach's α. C = control group. ** Both statistics were reported in different places in the paper but the corresponding author did not respond to a request for clarification.

Table 7. Test cutoff scores, sensitivity, specificity, area under the curve and convergent validity.

Table 7. (Continued). Papers omitted from this table did not report validity statistics of relevant screening tools. AUC = Area under the curve. * = used published cutoff score. ** = cutoff scores varied for age and education groups. † = Authors demonstrated divergent validity, that the MiND-B measures behavior, a concept that is separate to cognition (measured by the ECASc). ‡