The Abbey Pain Scale: not sufficiently valid or reliable for assessing pain in patients with advanced cancer

Abstract
Background: Patients with advanced cancer can be unable to verbalize their pain. The Abbey Pain Scale (APS), an observational tool, is used to assess pain in this setting, but has never been psychometrically tested for people with cancer. The aim of this study was to assess the validity, reliability, and the responsiveness of the APS to opioids for patients with advanced cancer in a palliative oncology care setting.
Material and Methods: Patients with advanced cancer and poor performance status, drowsiness, unconsciousness, or delirium, were assessed for pain using a Swedish translation of the APS (APS-SE) and, if possible, the Numeric Rating Scale (NRS). The assessments using APS were conducted simultaneously, but independently, by the same raters on two separate occasions, approximately one hour apart. Criterion validity was assessed by comparing the APS and NRS values using Cohen’s kappa (κ). Inter-rater reliability was determined using the intraclass correlation coefficient (ICC), internal consistency using Cronbach’s α, and responsiveness to opioids using the Wilcoxon signed-rank test.
Results: Seventy-two patients were included, of whom n = 45 could rate their pain using the NRS. The APS did not detect any of the n = 22 cases of moderate or severe pain self-reported using the NRS. The APS at first assessment had a κ of 0.08 (CI: −0.06 to 0.22) for criterion validity, an ICC of 0.64 (CI: 0.43–0.78) for inter-rater reliability, and a Cronbach’s α of 0.01 for internal consistency. The responsiveness to opioids was z = −2.53 (p = 0.01).
Conclusion: The APS was responsive to opioids but displayed insufficient validity and reliability and did not detect moderate or severe pain as indicated by the NRS. The study showed a very limited clinical use of the APS in patients with advanced cancer.
KEY MESSAGE: The Abbey Pain Scale (APS), an observational pain assessment tool, is used in patients with cancer who cannot verbalize their pain. However, when psychometrically tested, the APS did not display sufficient validity or reliability, so it cannot be recommended for clinical use in patients with advanced cancer.


Background
Patients with cancer suffer from many different symptoms, some of the most common being lack of energy, fatigue, and pain [1][2][3]. Although the prevalence of pain is high in patients undergoing curative cancer treatment, affecting 40% of them [4], it is even higher in patients with advanced, metastatic, or terminal cancer, among whom its prevalence is 58-74% [2,5,6]. Pain management is not always successful, and inadequate pain assessment has been described as a major obstacle when identifying physician-related barriers to cancer pain management [7][8][9]. When assessing pain, self-reporting is considered the gold standard [10,11], and the most widely used self-report pain instruments are the Numerical Rating Scale (NRS), the Visual Analogue Scale (VAS), and the Verbal Rating/Descriptor Scale (VRS/VDS). All these are considered valid and reliable for clinical use [10], although the NRS is recommended for patients with cancer according to patients' preference and in order to try to standardize which scale to use [11][12][13]. In the NRS, the patient verbally rates the pain intensity on a scale of 0-10, in which 0 represents 'no pain' and 10 is 'the worst pain imaginable' [11]. Many different wordings of anchor labels exist [11], and efforts are being made to create an international standard for anchor labels in the future [12].
As cancer progresses, it becomes increasingly difficult for some patients to express their pain due to delirium, sedation, or imminent death [14]. When patients can no longer verbally communicate their pain, observational pain assessment by staff members is needed instead. The Abbey Pain Scale (APS) is an observational pain assessment scale recommended by the Australian Pain Society [15] and the British Geriatrics Society [16] for patients with cognitive impairment.
It is also mentioned in the Swedish National Clinical Practice Guidelines for Palliative Care [17] as a tool for patients with dementia. The APS was developed in Australia in 2004 to assess pain in people with end-stage dementia in residential aged-care homes [18]. It quickly became the most widely used pain assessment tool for this purpose in Australia [19] and has been translated and psychometrically tested in Spain [20], Italy [21], Japan [22], and Denmark [23].
The APS was created to be an easy-to-use assessment scale and consists of six items: vocalization, facial expression, change in body language, behavioral change, physiological change, and physical change. Each item is illustrated by different examples, such as 'vocalization, e.g., whimpering, groaning, crying', that are rated from absent to severe (0-3 points) and are summed to a total pain score. The staff member completes the instrument by ticking boxes for the total pain score, ranging from 'no pain' (0-2 points) to 'severe pain' (14-18 points), and by classifying the pain as chronic, acute, or acute on chronic. The original study demonstrated a reasonable degree of validity between the APS and nurses' holistic pain assessments when calculated with gamma, 0.59 (p ≤ 0.001), and a Cronbach's α for internal reliability of 0.74. The figures for inter-rater reliability were not reported but were described as modest [18].
In Sweden, the APS is mainly distributed by the Swedish Register of Palliative Care (SRPC), a national quality register that collects data on end-of-life care [24]. The SRPC distributes the scale to almost 650 Swedish healthcare facilities, such as nursing homes, specialized palliative care units (stand-alone palliative care units, palliative hospital wards, and outpatient homecare), and, to some extent, hospitals. Although the APS was designed for pain assessment in patients with dementia, it is also widely used to assess pain in patients with other diagnoses in Sweden, such as cancer [Maria Andersson, SRPC register manager, personal communication, 25 September 2020].
The APS has been culturally adapted and translated into a Swedish version, the APS-SE [25]. A previous qualitative study of the APS in patients with advanced cancer revealed that it was not optimal in this context [26]. Some parts were not considered useful, and the pain score tended not to reflect the healthcare professionals' intuitive perception of patient suffering [26]. The aim of the present study was accordingly to assess the validity, reliability, and the responsiveness of the APS to opioids for patients with advanced cancer in a palliative oncology care setting.

Recruitment and setting
Recruitment was performed in care units that had already implemented the APS-SE [25]: four oncology wards at a university hospital and a stand-alone specialized palliative inpatient care unit. During weekly contacts, healthcare staff members were asked which patients potentially fulfilled the inclusion criteria. The charts for the suggested patients were then reviewed and the responsible nurses or physicians were consulted regarding the inclusion and exclusion criteria.
Patients 18 or more years old with advanced cancer were considered eligible if they exhibited poor performance [27], delirium [28], aphasia, and/or drowsiness/unconsciousness [29]. Patients were excluded if they had been diagnosed with dementia, had an intradural catheter, or if less than four hours had passed since the last as-needed dose of opioid. For details, see Table 1.
Patients were included regardless of staff members' preunderstanding of whether or not the patients were suffering from pain at the time of assessment. Patients able to communicate were given oral and written information about the study and signed an informed consent form. If patients were unconscious, suffering from delirium or fatigue, and unable to give informed consent, they were included without consent as permitted by the Swedish Ethical Review Authority. If any family members were present at the time of inclusion, they were informed and asked whether they believed that the patient wanted to be included. Although such proxy decisions do not exist in Swedish legislation, family member assumptions regarding patient opinions on participation were respected.

Data collection
The pain assessments were performed on the day of inclusion. Each patient was assessed with the APS on two separate occasions, approximately one hour apart; on each occasion, two raters assessed the patient simultaneously but independently, and the same two raters performed both assessments. Prior to the first assessment of each patient, a brief review of the APS was made with the staff member involved in the rating. Rater number one, either a nurse, physician, or researcher, varied between the patients, but the same staff member performed both the first and second assessments of each patient. Rater number two was the same researcher, a physician, during all assessments of all patients.
All patients able to communicate were asked whether they consented to physical examination including measuring the blood pressure (BP), as BP outside normal limits is an example of a physiological change given in the APS. If the patient was not communicative, the examination was limited to checking the body temperature by lightly touching the patient. When possible, the patient was asked about any physical changes such as skin tears or pressure areas; otherwise, the nurse in charge was consulted.
After completing the APS in the first and second assessments, all communicative patients were asked to rate their pain with the NRS by using the standardized phrase 'On a scale from 0 to 10, where 0 corresponds to no pain and 10 corresponds to worst pain imaginable - can you describe your pain right now?'. If they did not answer in digits even after a repeated request, they were considered incapable of completing the NRS. The scores and information about any patient requests for pain medication were forwarded to the nurse in charge, who then decided whether to give the patient pain medication, and if so, what, as part of routine healthcare. If the patient was given any medication between the two assessments, the kind of drug, dose, and route of administration were recorded. Time of death (if the patient was deceased by the end of data collection) was recorded from the patients' charts. Data were collected between February 2022 and March 2023.

Validity testing
Criterion validity, i.e., the association between the APS score and the gold standard (the self-reported NRS score), was determined using Cohen's kappa (κ). Criterion validity was only calculated for the first pain assessment. The mean APS score between raters one and two at the first assessment was calculated for each patient, and then translated to pain categories: no, mild, moderate, or severe pain. The APS score followed the original study's categorization: no pain = 0-2 points, mild pain = 3-7, moderate pain = 8-13, and severe pain = 14-18 [18]. The self-rated NRS score was translated to: no pain = 0 points, mild pain = 1-4, moderate pain = 5-6, and severe pain = 7-10 [30]. Criterion validity was also calculated for no pain versus pain as assessed with the APS (0-2 vs. ≥3 points) compared with the NRS (0 vs. ≥1 point).
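The categorization and agreement calculation described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: it assumes the unweighted form of Cohen's κ, it assumes that a half-point mean of two raters (e.g., 2.5) stays in the lower APS band (the text does not say how such means were handled), and all function names and example scores are hypothetical.

```python
from collections import Counter


def aps_category(score):
    """Map an APS total (0-18) to a pain category using the cut-offs quoted above."""
    if not 0 <= score <= 18:
        raise ValueError("APS total must lie between 0 and 18")
    # Assumption: a non-integer mean of two raters falls into the lower band.
    if score < 3:
        return "no"
    if score < 8:
        return "mild"
    if score < 14:
        return "moderate"
    return "severe"


def nrs_category(score):
    """Map a self-reported NRS score (0-10) to a pain category."""
    if not 0 <= score <= 10:
        raise ValueError("NRS score must lie between 0 and 10")
    if score == 0:
        return "no"
    if score <= 4:
        return "mild"
    if score <= 6:
        return "moderate"
    return "severe"


def cohen_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa for two equally long lists of category labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: product of the marginal proportions, summed over categories.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (observed - expected) / (1 - expected)


# Hypothetical scores for five patients (not study data).
aps_means = [1.0, 2.0, 4.0, 0.5, 3.0]
nrs_scores = [0, 5, 3, 7, 0]
kappa = cohen_kappa(
    [aps_category(s) for s in aps_means],
    [nrs_category(s) for s in nrs_scores],
)
print(round(kappa, 3))
```

The pain versus no pain comparison uses the same function after collapsing the three pain categories into one label.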

Reliability testing
Inter-rater reliability, i.e., whether two independent observers agreed in their APS ratings, was determined using the intraclass correlation coefficient (ICC) based on a single-rating, absolute-agreement, two-way, random-effects model [31]. The ICC was calculated at both the first and second assessments separately at n = 50 and at n = 72 patients.
Internal consistency reliability, i.e., how the individual items in the APS relate to one another and the APS score as a whole at the first assessment, was determined using Cronbach's α [32]. A separate α value was calculated for the subgroup of patients who consented to BP assessments.
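For readers unfamiliar with this ICC form, the following sketch computes the single-rating, absolute-agreement, two-way random-effects coefficient from the usual two-way ANOVA mean squares. It is an illustration under stated assumptions: the function name and the example ratings are hypothetical, no confidence interval is computed, and the study's own software is not named in the text.

```python
def icc_absolute_agreement(ratings):
    """ICC for single ratings, absolute agreement, two-way random effects.

    `ratings` is a list of rows, one per patient, each holding one score per rater.
    """
    n = len(ratings)          # number of patients
    k = len(ratings[0])       # number of raters
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need at least 2 patients and 2 raters, no missing scores")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                 # between-patient mean square
    ms_cols = ss_cols / (k - 1)                 # between-rater mean square
    ms_error = ss_error / ((n - 1) * (k - 1))   # residual mean square

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical APS totals from two raters for three patients (not study data).
# Rater two scores one point higher throughout, which lowers absolute agreement
# even though the two raters rank the patients identically.
example = [[1, 2], [2, 3], [3, 4]]
print(round(icc_absolute_agreement(example), 3))
```

The absolute-agreement form penalizes a systematic offset between raters, which a consistency-type ICC would ignore.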
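Cronbach's α, and the "α if item deleted" variant reported later in Table 4, can be sketched as below. This is a minimal illustration with hypothetical function names and made-up item scores; it uses sample variances, which is the common convention but is an assumption here.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for rows of per-item scores (one row per patient)."""
    k = len(item_scores[0])  # number of items
    if k < 2 or len(item_scores) < 2:
        raise ValueError("need at least 2 items and 2 patients")
    item_vars = [variance([row[j] for row in item_scores]) for j in range(k)]
    total_var = variance([sum(row) for row in item_scores])
    if total_var == 0:
        raise ValueError("alpha is undefined when all total scores are equal")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def alpha_if_item_deleted(item_scores):
    """Alpha recomputed with each item left out in turn (needs 3 or more items)."""
    k = len(item_scores[0])
    return [
        cronbach_alpha([[x for i, x in enumerate(row) if i != j] for row in item_scores])
        for j in range(k)
    ]


# Hypothetical scores on three items for four patients (not study data).
scores = [[0, 1, 0], [1, 1, 2], [2, 3, 2], [3, 2, 3]]
print(round(cronbach_alpha(scores), 3))
print([round(a, 3) for a in alpha_if_item_deleted(scores)])
```

When items vary independently of one another, the sum of the item variances approaches the variance of the total score and α approaches zero, which is the pattern a value such as 0.01 reflects.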

Responsiveness to opioids
Responsiveness, i.e., the ability to detect a change in the mean APS score and NRS score (if possible) between the first and second assessments, was determined using the Wilcoxon signed-rank test. The responsiveness was calculated separately for the group receiving opioids and, as a control, the group not receiving opioids.
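A paired before-and-after comparison of this kind can be sketched as follows. The sketch assumes the normal approximation with a tie correction and no continuity correction, with zero differences dropped; the text reports a z value but does not state which variant was used, so these choices, the function name, and the example scores are all assumptions.

```python
import math
from collections import Counter


def wilcoxon_signed_rank(first, second):
    """Return (z, two-sided p) for paired scores using the normal approximation."""
    diffs = [b - a for a, b in zip(first, second) if b != a]  # drop zero differences
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, giving tied values their average rank.
    abs_diffs = [abs(d) for d in diffs]
    order = sorted(range(n), key=lambda i: abs_diffs[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs_diffs[order[j + 1]] == abs_diffs[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)  # sum of positive ranks
    mean = n * (n + 1) / 4
    tie_term = sum(t ** 3 - t for t in Counter(abs_diffs).values()) / 48
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term

    z = (w_plus - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p from the standard normal
    return z, p


# Hypothetical scores at the first and second assessments (not study data).
first_scores = [5, 6, 7, 8]
second_scores = [4, 4, 4, 4]
z, p = wilcoxon_signed_rank(first_scores, second_scores)
print(round(z, 2), round(p, 3))
```

With differences defined as second minus first, a negative z indicates that scores fell between the assessments. For small samples an exact test is usually preferred over this approximation.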

Sample size
When planning this study, a statistician active in the field of cancer research was consulted. The statistician concluded that there was no certain way to calculate power because of the uncertainty about how the APS scores would be distributed within the target population. In previous studies validating the APS, 50-171 patients have been included [18,[20][21][22][23], so we aimed for at least 50 patients with the possibility of including up to 100. When analyzing the data from 50 patients, the inter-rater reliability test resulted in an ICC with a wide confidence interval, even though it has been suggested that 30-50 is a reasonable number of patients for estimating reliability [31,33]. We accordingly decided to include additional patients to a total of 72, but this resulted in only a slightly narrower confidence interval than that obtained at 50 patients.

Results
The participants
A total of 72 patients were included in the study (see Table 2).
The patients were bedridden during n = 134 (93%) of the assessments and sitting in a chair during the rest, n = 10 (7%). The median time between the assessments and death was 11 days (range 0.5-201). At the time of analysis, n = 70 (97%) of the patients were deceased.
Notes to Table 1: The first assessment could be made as soon as more than four hours had passed since the last as-needed opioid dose had been administered. a At least one inclusion criterion had to be present for patients to be eligible for inclusion. b Patients meeting either of these criteria were excluded.
Most participants, n = 51 (71%), received ongoing regular opioid medication with a median equivalent dose of 80 mg of morphine p.o. per day (10-935 mg p.o.). The rest, n = 21 (29%), had a prescription for opioids only as needed, with or without ongoing regular paracetamol and/or non-steroidal anti-inflammatory drugs (NSAIDs). Drugs for neuropathic pain, gabapentin or pregabalin, were given regularly to n = 11 (15%) of the patients, to complement the opioids. In n = 26 (36%) of the cases, the patient was given opioids as needed between assessments 1 and 2, with a median equivalent dose of 12.5 mg morphine p.o. (10-108 mg p.o.).

Validity
The mean APS score for all items at the first assessment was 2.35, with each item contributing as follows: vocalization, 0.28; facial expression, 0.66; change in body language, 0.13; behavioral change, 0.42; physiological change, 0.38; and physical change, 0.48.
The criterion validity, i.e., the association between pain categories assessed using the APS versus the self-reported NRS (n = 45) at the first assessment, was κ = 0.08 (95% CI: −0.06 to 0.22) (for the cross-tabulation table, see Table 3). The corresponding κ value for the subgroup that answered the NRS question and did not suffer from delirium (n = 38) was 0.15 (95% CI: 0.01 to 0.31). The association between the APS and NRS ratings regarding pain versus no pain for all patients was κ = 0.04 (95% CI: −0.19 to 0.26). Whether or not the patients with delirium were removed, the κ values were ≤0.2, which is considered slight [34]. None of the patients had more than mild pain according to the APS scores, although when using the NRS 13 patients self-reported moderate and nine severe pain at the first assessment.
Internal consistency reliability, i.e., how the individual items in the APS relate to one another and to the APS score as a whole at first assessment, was indicated by a Cronbach's α of 0.01. When each item was separately deleted, the Cronbach's α ranged from −0.22 to 0.28 (see Table 4).
For the group of completed BP assessments (n = 56), the Cronbach's α was 0.05, and when each item was separately deleted, the Cronbach's α ranged from −0.19 to 0.32 (see Table 4). An α value <0.5 is considered unacceptable [32].

Responsiveness to opioids
For the group (n = 26) receiving opioids between the first and second assessments, there were statistically significant decreases in both the APS (p = 0.011) and NRS (p = 0.003) scores. For the group not receiving opioids (n = 46), there was no statistically significant change for either of the instruments (see Table 5). A total of n = 45 patients self-reported pain with the NRS. Almost all, n = 41, were able to use the NRS in both assessments and thus responsiveness could be calculated. However, in n = 4 patients, only one assessment with the NRS was possible, making it impossible to calculate responsiveness for these patients.

Discussion
The APS is not sufficiently valid or reliable when used for patients with advanced cancer suffering from poor performance status, delirium, drowsiness, or unconsciousness. The APS displayed responsiveness to opioids and can therefore be useful to evaluate the effect of medication administered for pain, at least when assessed by the same staff member before and after administering medication. However, the APS did not detect moderate or severe pain as indicated by the NRS, and the agreement with the patients' own ratings was only slight. If used as a screening instrument, there is a clear risk of the assessment indicating no pain or a lower intensity of pain than the patient is actually suffering. This problem was previously addressed in a qualitative study of healthcare staff members' experience of using the APS in patients with advanced cancer [26]. To conclude, this study showed a very limited clinical use of the APS in patients with advanced cancer.
To maintain and improve healthcare quality and research, appropriate instruments and tools are needed. An instrument is usually tested for validity when originally developed, but this is unfortunately not always the case when instruments are adopted in different countries and/or settings. Guidelines exist to ensure that instruments are not only well translated linguistically but also culturally adapted to maintain their content validity across different settings and cultures before being psychometrically tested [35,36]. The APS has previously been scientifically translated and culturally adapted for patients with dementia in Sweden [25], but is being used in patients with cancer without any previous psychometric testing in this different population. In fact, although the APS is widely used to assess pain in patients with advanced cancer, no literature was found regarding its use in cancer populations worldwide apart from a single case report. In this case report, the APS was considered to have been used successfully for pain assessment in a patient with terminal cancer and comprehension difficulties [37]. To our knowledge, this is the first study psychometrically testing the APS in patients with advanced cancer. When testing the criterion validity, i.e., the APS scores compared with the NRS scores as the gold standard, it was only considered slight [34]. This slight agreement prevailed when comparing the ability to detect any pain between the APS and NRS. Since the applied gold standard varies between studies, the criterion validities are difficult to compare; nevertheless, our study shows less agreement than do studies of patients with dementia [18,[20][21][22][23]. When the APS was compared with nurses' overall holistic pain assessment, the validity of the APS was considered reasonable [18] and highly significant compared with observer estimation of pain [20]. The validity was moderate compared with that of the self-reporting VRS/VDS [22,23].
Previous reports indicate that patients with cancer rate their pain higher than do their caregivers [11,[38][39][40], but there is nonetheless a striking difference between the self-reported pain and the observed pain intensity rated using the APS: although 49% of the able patients self-reported moderate or severe pain, indicating a need for pain relief, there was not a single rating in these categories when using the APS. This indicates that the scale is too insensitive.
The inter-rater reliability was considered moderate: the ICC was 0.64 at first assessment but with a confidence interval ranging from poor to good [31]. The ICC for the inter-rater reliability in the original study was never disclosed but was declared modest at best [18]. Other studies reported ICC values that indicated good reliability, 0.82 in the Japanese study [22] and 0.84 in the Danish study [23]. In both these studies the ICC was calculated between only two raters. When exploring the feasibility and clinical utility of the Japanese version of the APS, the ICC was tested for several different raters (n = 23), and the inter-rater reliability decreased to 0.68-0.76 [41].
The Cronbach's α for internal consistency in this study was unacceptable [32] at α = 0.01. In contrast, the Cronbach's α varied between 0.52 and 0.71 [20,22,23] when calculated for patients with dementia. When looking separately at each item, there was a substantial correlation among all the items except for 'physiological change' in the Japanese study [22], whereas the Danish study [23] showed fairly good correlation between 'vocalization', 'facial expression', and 'change in body language', but no or poor correlation for the rest of the items. When tested in patients with advanced cancer, there was no substantial correlation for any of the items. The APS displayed responsiveness to the administration of pain medication to patients, as shown in the original study as well as in the Spanish and Danish studies [18,20,23]. As a measure of quality, such responsiveness was also calculated for the NRS, and the result was consistent with the decrease in post-intervention scores found in other studies [42,43]. A two-point change in the NRS has been suggested to be a clinically meaningful change in pain [44,45], but no data have been reported for a clinically meaningful change in APS scores.
Overall, the psychometric properties of the APS are better in patients suffering from dementia than in patients with advanced cancer. This seems reasonable since the instrument was developed for patients with dementia and, accordingly, some of the suggested examples in the APS are not always appropriate for patients with cancer. Patients with dementia may also still be in relatively good physical health, compared with patients with advanced cancer, when an observational pain assessment scale is needed. Patients with dementia are more likely to be bodily active, for example to display withdrawal or rocking back and forth while sitting in a chair, even if the patients' impaired cognitive abilities make it difficult to express their pain verbally. Patients with cancer are often confined to their bed at the end of life, and do not present changes in body language such as rocking or fidgeting. In this study, 62% of the patients self-reported pain, although the median time between the assessments and death was only 11 days. The difference in overall health between the populations, and the fact that cancer per se might yield invalid cancer-generated measurements [26], could explain why the APS is more appropriate for people with dementia than those with advanced cancer.
According to the World Health Organization (WHO), cancer is ranked 6th and dementia 7th among the leading causes of death in the world, and both thus affect millions of people worldwide [46]. The prevalence of pain is high and about the same in both populations, being experienced by 35-79% of patients with dementia [47][48][49] and 40-74% of patients with cancer [2,[4][5][6]. In both cases, inadequate pain assessment is considered a problem in managing pain effectively [8,50,51]. While this problem has been addressed in the clinical setting of dementia, leading to the development of at least eight pain assessment tools with strong or moderate psychometric evidence [51], no comparable observational pain assessment tool has been developed specifically for patients with advanced cancer. It is time to develop an observational pain assessment tool for the clinical oncology setting.

Strengths and limitations
To our knowledge, this is the first study examining the validity and reliability of an observational pain assessment tool for patients with advanced cancer.
The inclusion criterion 'delirium' was determined using the SQiD instrument, which was previously found feasible for patients hospitalized with cancer [28]. The SQiD was originally designed to be answered by a relative or friend, but in this study it was answered by staff members in charge of the patients. It is possible that this shift caused us to under- or overestimate the number of patients with delirium compared with assessments made by relatives. Friends and relatives presumably have a greater familiarity with the patient and can thus more accurately determine whether the patient is suffering from delirium.
Participating in this study probably generated some staff discussion of how to rate different items, likely making the inter-rater reliability higher than it would have been without the ongoing study. With fewer raters involved, the ICC would probably have been higher.

Conclusion
The APS displayed responsiveness to opioids but did not display sufficient validity or reliability and did not detect moderate or severe pain as indicated by the NRS in patients with advanced cancer. The study showed a very limited clinical use of the APS in this population. Further research is needed to develop a valid and reliable observational pain assessment scale for patients with advanced cancer.

Table 1.
Additional inclusion criteria, besides advanced cancer and age ≥18 years, and exclusion criteria.

Table 4.
Cronbach's α if each item is deleted separately.

Table 5.
APS and NRS responsiveness to opioids.
a Wilcoxon signed-rank test. b p ≤ 0.05 is considered statistically significant.