Asking the right questions to get the right answers: using cognitive interviews to review the acceptability, comprehension and clinical meaningfulness of patient self-report adverse event items in oncology patients

Background: Standardized reporting of treatment-related adverse events (AE) is essential in clinical trials, usually achieved by using the National Cancer Institute (NCI) Common Terminology Criteria for Adverse Events (CTCAE) reported by clinicians. Patient-reported adverse events (PRAE) may add value to clinician assessments, providing patient perspective on subjective toxicity. We developed an online patient symptom report and self-management system for real-time reporting and managing AE during cancer treatment integrated with electronic patient records (eRAPID). As part of this program we developed a patient version of the CTCAE (version 4.0), rephrasing terminology into a self-report format. We explored patient understanding of these items via cognitive interviews. Material and method: Sixty patients (33 female, 27 male) undergoing treatment were purposively sampled by age, gender and tumor group (median age 61.5, range 35-84; 12 breast, 12 gynecological, 13 colorectal, 12 lung and 11 renal). Twenty-one PRAE items were completed on a touch-screen computer. Subsequent audio-recorded cognitive interviews and thematic analysis explored patients' comprehension of items via verbal probing techniques during three interview rounds (n = 20 patients/round). Results: In total 33 item amendments were made; 29% related to question comprehension, 73% response option and 3% order effects. These amendments to phrasing and language improved patient understanding but maintained CTCAE grading and key medical information. Changes were endorsed by members of a patient advisory group (N

Patients receiving systemic cancer treatment can suffer adverse events (AE) (symptoms and side effects), which can compromise treatment plans, impair quality of life and escalate to emergency hospital admissions [1]. Currently, oncology treatment-related AE are assessed using the National Cancer Institute (NCI) Common Terminology Criteria for Adverse Events (CTCAE) version 4.0 [2]. Developed to standardize AE reporting in clinical trials, criteria describe the severity of organ toxicity for patients receiving cancer treatment. Toxicity is graded from mild (Grade 1) through to life threatening (Grade 4). In routine practice the CTCAE forms the basis for: (1) recommending treatment modification or delay and use of supportive medications; (2) standardizing documentation on the type and severity of AEs; and (3) standardizing advice to oncology clinicians and patients for the management of common AE [3].
CTCAE are designed to be reported by clinicians and many can be objectively measured (e.g., deterioration of liver function tests). However, 77 represent subjective symptoms experienced by patients, which require the clinician first to assess patient symptoms, then to assign an AE severity grade (based on CTCAE descriptors) [4]. These subjective judgements can lead to variations in reporting. Clinicians may often underestimate symptom severity and not fully appreciate patient priorities [5,6].
The importance of incorporating patients' perspectives in drug development and treatment evaluations has been increasingly recognized [7][8][9][10]. In 2005, the US Food and Drug Administration (FDA) issued detailed guidance on the use of patient-reported outcome measures (PROMs) in drug trials, emphasizing that patient benefit from a certain drug can only be claimed if this benefit is demonstrated using PROMs. Similarly, the importance of PROMs is recognized in the UK, where they have been collected in the National Health Service (NHS) since April 2009 for common elective surgical procedures, and by providers of NHS-funded care [11]. Although AE are documented consistently by physicians in clinical trials, in routine care the recording of AE by clinicians and reporting by patients is variable and often omitted. The National Confidential Enquiry into Patient Outcome and Death report noted the inconsistent documentation of AE as an important factor in the sub-optimal management of patients with significant life-threatening chemotherapy toxicity, such as neutropenic sepsis [1].
Accurate and comprehensive reporting of subjective AE may be improved by asking patients to self-report their own symptoms using validated instruments. If these do not provide sufficient symptom coverage of subjective AE, specially adapted self-report versions can be employed. In a pioneering study, Basch et al. reworded CTCAE terminology (Version 3) into a patient-reported format [12] using the existing descriptions for each severity grade. This straightforward approach preserves at face value the CTCAE severity grading, which has proven clinical utility. The resulting self-reported AE items have proven acceptable to patients and show concordance with clinician-evaluated AE [5]. In a subsequent US research program the NCI set out to create, in a systematic way, a patient-reported measurement system based on 77 identified subjective symptoms included in CTCAE v.4, known as NCI patient-reported outcome (PRO)-CTCAE. For each AE up to three patient-reported items were developed to individually measure the frequency, severity, and/or interference with activities [4,13]. This sound approach, based on measurement science, significantly facilitates patient self-reporting, but it does not provide immediate mapping of the patient-reported adverse events (PRAE) severity onto the CTCAE severity grading. Further planned research by NCI will establish and validate severity thresholds for the patient-reported items that correspond to the existing clinically relevant CTCAE severity grades.
We set out to improve monitoring and managing of AEs during routine cancer treatments by developing an online system, eRAPID (electronic patient self-Reporting of Adverse-events: Patient Information and aDvice), allowing patients to self-report and manage AE remotely in 'real-time' [14,15]. As part of this research program, we adapted the current CTCAE (Version 4.0) items for patient self-report using a similar approach to Basch et al. [16], i.e., rephrasing the CTCAE into patient language, thus maintaining the content and the established severity grading of AE. We chose this approach rather than using the NCI PRO-CTCAE items for two reasons. First, for eRAPID it was essential to use the CTCAE severity grades to guide the development of a clinical algorithm providing either self-management advice for Grades 1-2 AE or alerts to patients and clinicians for Grade 3 (high severity) AE. Second, validation data were not available for the NCI PRO-CTCAE items at that time [13].
To confirm the validity of the newly adapted PRAE items, we undertook cognitive interviews with patients receiving treatment for a range of common cancers in the UK. We aimed to explore the understanding, acceptability and clinical meaningfulness of the new PRAE items. This manuscript describes the first step in the eRAPID program, namely how cognitive interviews were employed to evaluate the PRAE items to ensure their accuracy and suitability for remote AE monitoring in routine oncology practice.

Development of PRAE items
Fundamental to the success and safety of the program is that the selected PRAE items are suited to frequent (weekly) completion, consistently interpreted by patients and correspond accurately to the CTCAE, as subsequent clinical management algorithms will be based on this system. To this end, we explored developing items that matched the CTCAE with one question and one response format (descriptors corresponding to severity grades) for the eRAPID symptom report questionnaire.
In total 16 of the most common treatment-related AE were identified from a review of clinical trials literature, AE reported on reputable UK cancer advice websites and PRAE from a databank of 1500 previously recorded and content-analyzed clinical consultations [16]. AE were ranked for commonality within and between five common cancer groups (Breast, Lung, Colorectal, Gynecological and Renal) including: nausea, mucositis, nose bleeds, vomiting, palmar-plantar erythema (PPE), chills, pain, loss of appetite, flu-like symptoms, diarrhea/increased stoma activity, insomnia, rash, fatigue, constipation, peripheral neuropathy, dyspnea and thrombocytic purpura. A further three item areas were added after consultation with clinical and patient representatives: depression, anxiety and a patient-reported version of the Eastern Cooperative Oncology Group (ECOG) performance status [17] along with a question on stoma activity (21 in total). We developed a set of core PRAE items mapping directly onto CTCAE version 4 (see Table 1). The suitability of the PRAE

Cognitive interviews
For some of the AEs, our approach resulted in complex descriptions of the frequency, severity and impact of symptoms compared with the simpler response options used in symptom or quality of life questionnaires (e.g., 'Have you had pain? Not at all, a little, quite a bit, very much'). It was therefore important to explore how patients would respond to these items. Cognitive interviews are an established methodology for pretesting questionnaires to establish patient understanding [18]. Typically, verbal probing techniques are used to explore participants' cognitive processes [18]. The technique has proven utility in pretesting questionnaires in clinical and health research and is particularly effective in helping to delineate complex ideas and concepts [19].

Pilot interview
Prior to the study interviews we piloted the technique with a single female breast cancer patient advocate (SK). This enabled us to refine the interview method and determine the timing and wording of cognitive probes and the level of patient burden when completing the questionnaire.

Procedure
During a six-month period patients completed PRAE (n = 21 items) alongside NCI PRO-CTCAE items (n = 56). Results from the cognitive interviews on NCI PRO-CTCAE items will be reported separately. The PRAE and NCI PRO-CTCAE items were presented alternately to all patients on a computer on a single occasion in a private area in the oncology outpatient clinic, the day case unit or the acute oncology admissions unit at St James's University Hospital, Leeds, UK. Participants were observed by a researcher whilst completing items, and asked to think aloud whilst answering. The researcher noted where patients experienced difficulty (e.g., understanding questions or choosing a response). Subsequently, patients took part in an audio-recorded interview and were retrospectively verbally probed (see Figure 1) on items where they had reported difficulty. Probes were determined a priori based on common sources of error in survey questions [18]. Interviews were transcribed verbatim and managed in NVivo software version 9. Concordant with recommended cognitive interview procedure, iterative rounds of testing were employed [18][19][20].
To increase the generalizability of the adaptations made to items, 20 different patients completed all the items in each of three rounds of testing (60 in total). Based on the feedback, changes to items were put forward to subsequent rounds. By the final round we concluded that saturation was reached because changes were minimal [21]. Patient partners from our patient advisory group (N = 11) endorsed the final changes.

Analysis
Interviews were transcribed verbatim and managed in NVivo software (version 9). In addition, all patient responses to AE items were transferred into SPSS, and the percentage of patients having difficulty with each item was computed. Initially patients' views on items were coded under the corresponding AE, e.g., nausea, pain, chills. Issues related to understanding of questions and suggestions for alternative wording were then sub-coded under the theme of question comprehension, and difficulties with selecting a response were coded under response option. Where patients intimated that they may have answered differently had the question been preceded or succeeded by another, this was coded as order effects.
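The two tallies described in this section (the percentage of patients having difficulty with each item, and the number of coded issues per theme) can be sketched in a few lines of Python. This is a minimal illustration only: the study itself used SPSS and NVivo, the function names and record layout are hypothetical, and the example data are invented rather than taken from the study.

```python
from collections import Counter

# Theme labels are taken from the text; everything else here is a
# hypothetical layout chosen for illustration.
THEMES = ("question comprehension", "response option", "order effects")


def percent_with_difficulty(difficulty_flags):
    """Map each item to the percentage of patients flagged as having difficulty.

    difficulty_flags: dict of item name -> list of booleans, one per patient.
    Items with no recorded patients are skipped.
    """
    return {
        item: round(100.0 * sum(flags) / len(flags), 1)
        for item, flags in difficulty_flags.items()
        if flags
    }


def count_by_theme(coded_issues):
    """Count coded issues per theme; themes with no issues are reported as 0.

    coded_issues: iterable of (item name, theme) pairs.
    """
    counts = Counter(theme for _item, theme in coded_issues)
    return {theme: counts.get(theme, 0) for theme in THEMES}


# Invented example data (NOT study results).
example_flags = {
    "nausea": [True, False, False, False],
    "chills": [True, True, False, False],
}
example_issues = [
    ("nausea", "question comprehension"),
    ("chills", "question comprehension"),
    ("chills", "order effects"),
]

if __name__ == "__main__":
    print(percent_with_difficulty(example_flags))
    print(count_by_theme(example_issues))
```

Keeping the per-item percentages separate from the theme counts mirrors the text: the former guided which items to revisit, while the latter describes the kind of problem found.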
A consensus-based approach to item modification was adopted to avoid the false problem identification that can arise when one researcher works alone [22][23][24]. Modifications were therefore made by two researchers (PH and LW) and a clinician (GV); some items were referred to the eRAPID project management team, clinical team and patient advisory group. Item revision was discussed after each round in view of suggestions, answers and comments from patients by reviewing: 1) audio recordings; 2) researcher notes; and 3) the percentage of patients who identified problems with the item. An item tracking matrix [19] was developed showing the item content, changes made and supportive quotes (Table 2). Preserving congruence with CTCAE criteria was an important consideration when amending items.
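An item tracking matrix of the kind described above can be thought of as a list of records, one per item per round. The sketch below is an assumption about how such a matrix might be represented, not the structure used in the study: the class, field and function names are hypothetical, and the round numbers attached to the example entries are illustrative because the text does not state in which round each change was made.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ItemChange:
    """One row of a hypothetical item tracking matrix."""
    item: str
    round_number: int  # illustrative; not stated in the text for these examples
    problem: str
    change_made: str
    supporting_quotes: List[str] = field(default_factory=list)


def history(matrix, item):
    """Return all recorded changes for one item, ordered by interview round."""
    return sorted(
        (change for change in matrix if change.item == item),
        key=lambda change: change.round_number,
    )


# Example rows paraphrasing changes reported in this paper for the chills item.
# Quotes are left empty rather than invented.
matrix = [
    ItemChange(
        item="chills",
        round_number=2,
        problem="'chills' continued to be misunderstood",
        change_made="dropped the word 'chills' from the question",
    ),
    ItemChange(
        item="chills",
        round_number=1,
        problem="understood as 'feeling cold'",
        change_made="added '(shivering, shaking, chattering of teeth)'",
    ),
]

if __name__ == "__main__":
    for change in history(matrix, "chills"):
        print(change.round_number, change.change_made)
```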

Item change
The percentage of patients expressing a problem with items was on average 36% (ranging from 6-100%, i.e., all patients having a problem). Sometimes, item amendments were initiated when as few as one patient expressed a problem. For example, one patient pointed out how incorrect information could have been elicited by the question 'Have you increased the number of times you have had to change your stoma bag?' and that 'emptying' is a more appropriate term, as patients do not 'change' their bags every time they have a bowel movement. This served as a reminder that frequency of occurrence is not always indicative of the magnitude of the problem [18]. In cases where only a few patients had difficulty, we sought advice from the clinical team and patient advocates. The iterative process of adapting the items is illustrated in Table 2 for the pain PRAE item.

Question comprehension modifications
The majority of items, including pain, nausea, stoma activity, diarrhea and fatigue, were modified to the satisfaction of participants following one or two amendments, e.g., adding 'any' as a prefix to 'have you had pain'. Other notable findings include that 15% of patients in Round 1 misunderstood nausea as, for example, 'feeling dizzy', leading to the addition of the suggested descriptors '(feeling sick and queasy)' to aid understanding. Following these changes, no problems were expressed by patients discussing this item in subsequent rounds. Two patients understood chills to mean 'feeling cold' rather than experiencing symptoms of infection; therefore, the descriptors '(shivering, shaking, chattering of teeth)' were added as a suffix (thus maintaining congruity with CTCAE grading). However, in further rounds the word 'chills' was dropped from the question as it continued to be misunderstood by 10% of patients. The descriptor 'achy' was added to 'hot, cold and shivery' to aid the understanding of 'flu-like symptoms', and the descriptor 'loose' was added to describe diarrhea. Further, it became clear that patients could not make the distinction between 'aging spots' and a 'purpuric rash' in the description of thrombocytic purpura. Therefore, it was eventually decided that we would have to consider using an image. Overall, 14 modifications related to question comprehension were made (for details see Table 4 of the online supplementary material). Depression and anxiety items were generally understandable to patients; however, clinicians suggested the removal of the prefix 'in the past two weeks' because, although a useful time frame for diagnostic purposes, it was thought redundant and confusing for weekly home self-report. The peripheral neuropathy item was also adapted by clinicians from tingling in 'hands and feet' to tingling in 'fingers and toes'.

Response option modifications
The majority of response options required minor amendments and were consistently interpreted by patients by the final round (e.g., mucositis, depression, constipation, nose bleeds, anxiety, peripheral neuropathy and chills). Following patient recommendation for the diarrhea item, two response options were merged, based on the similarity of descriptors, e.g., 'incontinence' and 'bowel opening' seven times a day.
Other items needed further modification to ensure consistent patient interpretation, e.g., for nausea we inserted 'the same amount and type of food' into the response option to accommodate patients who answered they 'ate and drank as usual', but when probed had actually changed what they ate or drank. Further, the quantifiers (e.g., a bit, quite and very) were removed from the nausea item after patients found it difficult to respond to both (1) interference and (2) quantity factors (e.g., 'I felt a bit sick but could not eat and drink normally'). These amendments ensured closer congruence with the CTCAE. The appetite response options 'ate and drank less than usual' and 'took supplement drinks' were amalgamated, as both response options applied to many patients experiencing a reduced appetite. Despite modification to the skin rash item responses, patients were persistently confused by the complexity of the text and multiple factors (e.g., itching/pain/oozing and peeling), resulting in the adaptation of the question to incorporate these multiple factors rather than the response option (see Table 5 of the online supplementary material). Patients had difficulty choosing a response for the palmar-plantar erythema item, so further descriptors were introduced and the word 'not' emboldened to emphasize the distinction between 'painful' and 'not painful' required by the CTCAE. A similar emboldening of 'was' and 'was not' was employed to assess whether rest relieved fatigue. Researchers also amended the constipation item to include more patient-friendly language, e.g., changing 'I have modified my diet' to 'I have changed my diet'.
Patient-reported descriptors matching the CTCAE instrumental (preparing meals/shopping) and self-care activities of daily living (ADL; bathing and feeding) were added to a number of PRAE items (fatigue, flu-like symptoms, breathing, depression, anxiety and PPE). We added 'shopping' to 'things I normally do' and replaced 'not able to carry out daily activities' with 'not able to take care of myself'.

Order effects
We reordered the item 'Have you had shivering or shaking chills?' to follow 'Have you had flu-like symptoms?', as a number of patients believed that 'chills' referred to being cold rather than a serious infection with temperature. The order change was designed to ensure a stronger association between flu-like symptoms and chills.
Overall 33 amendments were made to the items, with the majority made in Round 1 (n = 17), and Rounds 2 and 3 yielding n = 10 and n = 7, respectively.

[Table fragment recovered from the text: marital status of participants. Column labels are inferred from the row and column totals.]
Marital status   Total   Round 1   Round 2   Round 3
Married          32      7         11        14
Cohabiting       6       2         1         3
Separated        13      7         5         1
Widowed          6       2         3         1
Single           3       2         0         1

Discussion
Cognitive interviews were employed to pretest the content and acceptability of items to report AE in oncology patients undergoing treatment for a range of common cancers. Generally, the items presented to patients were interpreted consistently. However, the interviews uncovered a number of important issues in patient understanding and responses, and modifications were made accordingly. When adapting items, we endeavored to refine items to facilitate patient understanding, whilst maintaining the clinical applicability to reflect the CTCAE.
Modifications included asking patients to report any pain (not just cancer-related pain) and adding 'discomfort' as an additional descriptor to accommodate patients who could not endorse 'pain' as a concept. To aid comprehension, 'feeling sick and queasy' was added to nausea items.
In line with previous research, additional descriptors were included, in this case for items relating to anxiety, incontinence, flu and chills (the latter seemingly a cultural idiom which could not be embraced by UK patients). Additional or alternative descriptors are in line with adaptations made to the NCI PRO-CTCAE items during the cognitive interview process [4] with parentheses generally used to convey any additional contextual information. Another challenge when developing the items was reflecting the impact and interference on activities of daily living. The CTCAE reports these only for severe AEs whereas we had to adapt this general statement to the context of the item and use different descriptions.
Adapting the response options posed a greater challenge due to the need to represent the multiple attributes of the CTCAE (e.g., frequency, severity and interference). The majority of changes were made to items where symptom-specific information was required (e.g., eating habits). Also problematic was how to describe the CTCAE instrumental and self-care ADL. The level of interference with self-care and instrumental activities in these response options was particularly challenging, e.g., fatigue, PPE and skin rash. These items were fundamentally modified to accommodate the information required to align with the CTCAE, supporting previous studies where descriptions of daily living were misinterpreted [4]. This may have reflected cultural differences in interpretation (the CTCAE originates from the US). We strongly recommend that questionnaires be pretested using cognitive interviews as there appear to be subtle differences which affect understanding, even among native English speakers. For example, it appeared that our sample of English patients did not have the same understanding of the word 'chills'.
It is a known phenomenon that the order of presentation of questionnaire items can influence how they are answered [22]. Indeed, our findings indicated that similar items have to be mindfully presented to avoid influencing response selection.
We purposively sampled oncology patients by tumor group, age and gender to maximize the generalizability of the findings with the future goal of establishing an understanding of AE items across a range of common cancers for remote self-report. To further allow generalizability, we adopted an approach of presenting the items to a new set of patients in each round, rather than presenting to the same patients again. In essence, we took an exploratory [18] rather than a confirmatory approach to ascertain new insights/uncover new problems. With this approach, there is a risk that novel problems will continue to be unearthed. However, by the final round saturation was reached, as we were not discovering any new problems [21], and amendments were minimal. As others have suggested, it would be useful to engage in a dialog with those in the field to work towards an agreed best practice in cognitive interviewing [20,23,24]. Certainly, a balance must be achieved in terms of an exploratory or confirmatory approach to ensure that as many comprehension issues as possible are addressed, without focusing on irrelevant minutiae.
The findings from the cognitive interviews have resulted in a bank of items reviewed for clinical meaningfulness and face validity in a group of patients with common cancers in the UK. The items are a succinct measure of interference/severity of AE whilst avoiding asking about multiple attributes of each item. For our online adverse event reporting and management system eRAPID, the responses to the PRAE items would form the basis of the algorithmic questionnaire scoring. We developed each PRAE item by re-wording the CTCAE criteria into patient language, and each response option corresponded directly to the severity grading of CTCAE (Grades 0-3). The algorithms linking the AE severity to appropriate levels of advice were developed in consultation with clinician and patient representatives from each tumor group. For severe symptoms (Grade 3 AE) or a combination of several medically significant Grade 2 responses, patients would be advised to telephone the hospital immediately and a clinician would be alerted. For a Grade 1 response patients would be given self-management advice, and for a Grade 2 response patients would be advised to self-manage but asked to mention their symptoms at the next hospital visit.
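The grade-to-advice logic described above can be sketched as a simple rule cascade. The following Python is a hedged illustration only, not the eRAPID algorithm: the function name, the returned labels and the `grade2_alert_threshold` parameter are hypothetical, the paper says only that 'several' Grade 2 responses trigger an alert (three is an arbitrary placeholder), and the requirement that those Grade 2 responses be 'medically significant' is not modeled.

```python
from typing import Iterable


def triage(grades: Iterable[int], grade2_alert_threshold: int = 3) -> str:
    """Return an advice level for one completed symptom report.

    grades: one severity grade (0-3) per PRAE item in the report.
    grade2_alert_threshold: ASSUMED number of Grade 2 responses that
        escalates to an alert; the source text says only 'several'.
    """
    grades = list(grades)
    if any(g not in (0, 1, 2, 3) for g in grades):
        raise ValueError("grades must be integers from 0 to 3")

    # Grade 3, or several Grade 2 responses: telephone the hospital and
    # alert a clinician.
    if 3 in grades or grades.count(2) >= grade2_alert_threshold:
        return "contact_hospital_and_alert_clinician"
    # Grade 2: self-manage, but mention symptoms at the next hospital visit.
    if 2 in grades:
        return "self_manage_and_mention_at_next_visit"
    # Grade 1: self-management advice only.
    if 1 in grades:
        return "self_manage"
    # All Grade 0: nothing reported.
    return "no_advice_needed"


if __name__ == "__main__":
    print(triage([0, 1, 0]))   # a single mild symptom
    print(triage([2, 0, 1]))   # one moderate symptom
    print(triage([2, 2, 2]))   # several moderate symptoms
    print(triage([0, 3, 1]))   # one severe symptom
```

Checking the most severe condition first means a report can never be downgraded by its milder items, which is why each response option needs to map cleanly onto a single CTCAE grade.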
The necessary adaptations highlight the value of conducting pretests of items via cognitive interviews with patients and adapting them via discussion with the clinical team. Clearly, it is wrong to assume a common understanding [19] of item content across all respondents, and pretesting items in this way may impact on the quality and ultimately the consequences of PRO reporting. It is possible that the educational level of participants influenced the understanding of items. Over a third of our sample were educated to higher degree level, compared with an average of 25% for the general UK adult population in our area in 2011 [25]. This may have had a bearing on the level of engagement with the study and on participants' eagerness to suggest alternative wording of items.
The bank of items was included in an eRAPID usability study with adjuvant breast cancer patients. Prior to this, oncologists and clinical nurse specialists had recommended adjustments. Generally, these changes were based on contextual considerations, e.g., time period for reporting, prioritization of questions (as essential) for different tumor groups (leaving other questions optional), and addition of a yes/no prefix. Although cognitive interviewing is an excellent way to check patient understanding of items, we suggest that researchers also liaise with clinical staff to check their understanding, so as to situate the findings in a clinical perspective and in view of local treatment regimens. A full clinical validation of the items is now underway in a pilot study in common cancer groups undergoing systemic therapy.
In conclusion, cognitive interviews are a useful tool to pretest questions to allow adaptation prior to administration to ensure consistency in patient understanding. Pretesting of toxicity items in this way makes a necessary contribution towards developing a clinically relevant system for PRAE both in clinical trials and patient care thus improving patient safety and experiences during cancer treatment.