Predictors for mortality due to acute exacerbation of COPD in primary care: Derivation of a clinical prediction rule in a multicentre cohort study

Abstract
Background
Eighty percent of acute exacerbations of chronic obstructive pulmonary disease (AECOPD) are treated in primary care (PC). However, no predictive model has been derived or validated for use in PC to help general practitioners make decisions about these patients.
Objectives
To derive a clinical prediction rule for mortality from any cause within 30 days after the last PC visit.
Methods
Between December 2013 and November 2014, we performed a cohort study of people aged 40 and over who were treated for AECOPD in 148 health centres in Spain. We recorded demographic variables, past medical history, signs, and symptoms of the patients and derived a logistic regression model.
Results
The analysis included 1,696 cases of AECOPD, and 17 patients (1%) died during follow-up. A clinical prediction rule was derived based on the exacerbations suffered in the last 12 months, age, and heart rate, displaying an area under the receiver operating characteristic curve of 0.792 (95% confidence interval, 0.692–0.891) and good calibration.
Conclusion
This rule stratifies patients into three categories of risk and suggests to the physician a different action for each category: managing low-risk patients in PC, referring high-risk patients to hospital, and taking other criteria into account for decision-making in patients with moderate risk. These findings suggest that it is possible to accurately estimate the risk of death due to AECOPD without complex devices. Future studies on external validation and impact assessment are needed before this prediction rule can be used in clinical practice.


Introduction
Chronic obstructive pulmonary disease (COPD) is the fourth most frequent cause of death in the world. Mortality from COPD is expected to continue to rise in the coming decades, mainly due to the increase in tobacco consumption in low- and middle-income countries [1].
People with COPD experience many exacerbations over the course of their lives. The Global Initiative for Chronic Obstructive Lung Disease (GOLD) guidelines define an exacerbation as 'an acute worsening of respiratory symptoms that results in additional therapy' [2]. Until recently, exacerbations were considered accessory phenomena without influence on the disease itself. However, numerous studies have shown that exacerbations contribute decisively to the deterioration of lung function, quality of life, and work productivity of people with COPD, in addition to worsening the prognosis and increasing associated costs [2][3][4][5][6].
A subgroup of people with COPD suffer frequent exacerbations: those with two or more exacerbations per year. These people suffer a faster deterioration of lung function, spend more time at home, and have a poorer quality of life, a higher probability of hospital admission, and a higher risk of death than those with fewer exacerbations, regardless of the degree of deterioration of their lung function [7]. The best predictor of a patient's exacerbation frequency is the number of exacerbations in the previous year [8].
Exacerbations of COPD are heterogeneous. Lower respiratory inflammation differs depending on whether the aetiology is a viral or a bacterial infection. Patient characteristics also affect the severity of the exacerbation. Decision-making in this context is complex, and a tool to help physicians would be useful for both doctors and patients [9].
As expressed in the GOLD guidelines, 'prevention, early detection, and prompt treatment of exacerbations are vital to reduce the burden of COPD.' Although some predictive models have been published, they were derived and validated in the hospital setting. Almost all these models include variables that cannot be assessed in primary care (PC) [10][11][12][13][14][15][16]. We hypothesised that the past medical history, symptoms, and signs of a person who suffers an acute exacerbation of COPD (AECOPD) and is treated in PC make it possible to predict his or her death in the short term. The objective of this study was to derive a clinical prediction rule (CPR) that contained these predictors and supported the best decisions in the care provided to these patients.

Study design and participants
Methods have been described in detail elsewhere [17]. We designed a cohort study in PC including all people aged 40 and over who were treated between December 2013 and November 2014 in one of the 150 health centres (HC) of the Spanish provinces of Burgos, Salamanca, Soria, Valladolid, and Zamora and who were diagnosed with AECOPD (code ICD-9-CM 491.21). At the beginning of the study, these provinces had 736,183 inhabitants between 40 and 79 years of age. We excluded individuals who did not have a diagnosis of COPD in their electronic health record (EHR).
To study the prognosis of the AECOPD episode as a whole and not the prognosis of each visit to the general practitioner (GP) that the patient made during the same episode, we considered the visits made in the four weeks after a visit for AECOPD as part of the same episode of AECOPD. For patients who had several values of the same variable during the same episode of AECOPD, we selected the value corresponding to the visit at which the GP judged the patient's general condition to be worst. In patients who made several visits, death was determined 30 days after the last visit. In patients who had several exacerbations, each was considered an independent exacerbation.

Variables and data measurement
The outcome was death from any cause within 30 days after the last visit due to AECOPD. The independent variables were evaluated without knowledge of the outcome. Independent variables were, at the time of the visit, sex, age, peripheral arterial oxygen saturation (SpO2), systolic blood pressure (SBP), diastolic blood pressure (DBP), heart rate (HR), peripheral temperature, oedema in the legs, confusion, grade of dyspnoea according to the modified Medical Research Council (mMRC) dyspnoea scale, Charlson comorbidity index, cardiovascular disease, diabetes mellitus, dementia, cancer (except basal cell carcinoma), being included in the home care programme, type of health centre (rural or urban), season in which the episode began, and the number of exacerbations registered in the EHR in the last 12 months. The last body mass index (BMI) value was also included if it had been registered within the last year, and the last value of the forced expiratory volume in one second, expressed as a percentage of the expected value, was included if it had been registered in the last two years.
Confusion was defined as drowsiness, stupor, or coma. Cardiovascular disease was defined as heart failure, acute myocardial infarction, cerebrovascular disease, or peripheral arterial disease. The home care programme is offered to people who spend most of their time in bed (those who can only leave it with the help of others) and people with significant mobility impairments (preventing them from leaving home, except in exceptional cases) regardless of the cause, provided that the foreseeable duration of this disability exceeds two months.

Statistical analysis
The sample was described with a table of frequencies for the qualitative variables and a table of medians and interquartile ranges for the continuous variables. In the univariate analysis, the effect of qualitative variables was studied with Fisher's exact test when the expected frequency was less than five in more than 20% of the cells and with the χ² test in the rest of the cases. The effect of quantitative variables was studied with the Mann-Whitney U test after assessing their normality with the Kolmogorov-Smirnov and Shapiro-Wilk tests in both individuals who died and those who did not.
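The rule for choosing between Fisher's exact test and the χ² test can be sketched as follows. This is a minimal illustration in Python rather than the SPSS procedure used in the study; the function names and the two example tables are hypothetical and do not come from the study data.

```python
def expected_counts(table):
    """Expected cell counts under independence for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def choose_test(table, min_expected=5.0, max_fraction=0.20):
    """Return 'fisher' when more than 20% of cells have an expected count below 5,
    otherwise 'chi-square'."""
    expected = [e for row in expected_counts(table) for e in row]
    small = sum(1 for e in expected if e < min_expected)
    return "fisher" if small / len(expected) > max_fraction else "chi-square"


# Hypothetical 2x2 tables: rows = predictor present / absent, columns = died / survived.
sparse_table = [[3, 97], [14, 1582]]
dense_table = [[40, 60], [300, 1296]]

print(choose_test(sparse_table))
print(choose_test(dense_table))
```

With a rare outcome such as the one studied here, sparse cells are common, which is why the expected-frequency check matters before the χ² approximation is used.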
We carried out descriptive and pattern analyses of the missing data in which all the independent variables and outcomes were included. Assuming that missing data were missing at random, we performed a multiple imputation procedure in which the outcome and all the independent variables were included, using a fully conditional specification method with a maximum of ten iterations. Minimum and maximum allowable imputed values were defined for SpO2, SBP, DBP, HR, BMI, and peripheral temperature, so that the imputed values were biologically plausible. The univariate model used was multinomial logistic regression for categorical variables and linear regression for scale variables. One hundred imputations were made because the proportion of missing data was high. The SPSS syntax used can be found in the supplementary information.
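Each imputed data set yields its own estimate of any quantity of interest, and these estimates have to be combined into a single result. The sketch below assumes the conventional approach of Rubin's rules; the source does not state which pooling method was used, and the function name and example numbers are hypothetical.

```python
import statistics


def pool_rubin(estimates, variances):
    """Combine per-imputation estimates and their variances with Rubin's rules.

    Returns the pooled estimate and its total variance.
    """
    m = len(estimates)
    pooled = statistics.fmean(estimates)
    within = statistics.fmean(variances)        # average within-imputation variance
    between = statistics.variance(estimates)    # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical estimates from five imputed data sets (the study used 100).
estimates = [0.78, 0.80, 0.79, 0.81, 0.77]
variances = [0.0025] * 5

pooled, total = pool_rubin(estimates, variances)
print(pooled, total)
```

The between-imputation term is what carries the extra uncertainty due to the missing data, which is why it is inflated by the factor (1 + 1/m).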
We excluded independent variables whose predictive effect had not been demonstrated in previous studies and was not suspected based on the clinical knowledge of the principal investigator. Sex was excluded based on the results of the univariate analysis. For categorical variables, categories with few elements were collapsed to maximise their statistical significance in univariate analysis. For continuous variables, extreme outliers, defined as values more than three times the interquartile range below the first quartile or above the third quartile, were truncated. Quantitative variables were not categorised, and their linear relationship with the logit of the probability of the outcome was studied using the Box-Tidwell test. Only the number of previous exacerbations did not demonstrate linearity, so we tried several simple transformations of this variable, and the square-root transformation demonstrated linearity. The presence of collinearity between these variables was also studied through principal component analysis. Finally, we studied all the possible interactions of age with the remaining predictors by adding their cross-products, as well as the interactions between the independent variables with statistically significant relationships.
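The outlier rule can be illustrated with a short sketch. It assumes that "truncated" means replacing each extreme value with the nearest fence (first quartile minus three interquartile ranges, or third quartile plus three interquartile ranges). The quartile method and the example values are assumptions, and SPSS may compute quartiles slightly differently.

```python
import statistics


def truncate_extreme_outliers(values, k=3.0):
    """Clip values lying more than k interquartile ranges outside the quartiles.

    Returns the clipped values and the (lower, upper) fences that were applied.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    clipped = [min(max(v, lower), upper) for v in values]
    return clipped, (lower, upper)


# Hypothetical oxygen saturation readings (%), with one extreme low value.
spo2 = [90, 92, 93, 94, 95, 96, 97, 98, 60]

clipped, fences = truncate_extreme_outliers(spo2)
print(fences)
print(clipped)
```

Clipping keeps the observation in the data set while limiting its leverage on the regression coefficients.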
We derived a logistic regression model for all-cause mortality at 30 days with a stepwise regression method using the logarithm of the likelihood ratio as the selection criterion. The rule to remain in the model was a p-value less than 0.157. A variable was considered a confounding factor when eliminating it from the model changed the regression coefficient of another variable by more than 10 percent. The internal validity of the model was studied through bootstrap resampling, simulating 1,000 samples for each of the 100 imputations. We applied a uniform shrinkage factor, calculated with bootstrapping, to the regression coefficients and re-estimated the intercept based on the adjusted coefficients.
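The shrinkage step can be sketched as follows. This is a minimal illustration, not the authors' code: every slope is multiplied by the shrinkage factor, and the intercept is then re-estimated so that the sum of predicted probabilities equals the observed number of deaths, which is the maximum-likelihood intercept when the shrunken linear predictor is held fixed. The coefficients and patient rows are hypothetical; only the shrinkage factor of 0.921 is taken from the results of this study.

```python
import math


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def shrink_and_recalibrate(coefficients, rows, outcomes, shrinkage):
    """Apply uniform shrinkage to the slopes, then re-estimate the intercept."""
    events = sum(outcomes)
    if not 0 < events < len(outcomes):
        raise ValueError("need at least one event and one non-event")
    shrunk = [shrinkage * b for b in coefficients]
    offsets = [sum(b * x for b, x in zip(shrunk, row)) for row in rows]
    # Bisection: the expected number of events increases monotonically with the
    # intercept. The bracket assumes the solution lies within [-50, 50].
    low, high = -50.0, 50.0
    for _ in range(200):
        mid = (low + high) / 2.0
        expected = sum(sigmoid(mid + o) for o in offsets)
        if expected < events:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0, shrunk


# Hypothetical data: (square root of exacerbations, age in years, heart rate per minute).
rows = [(1.0, 70, 80), (1.41, 82, 110), (0.0, 65, 72), (2.0, 88, 120), (1.0, 75, 90)]
outcomes = [0, 1, 0, 1, 0]
unshrunk = [0.5, 0.05, 0.02]  # hypothetical slopes, not the published coefficients

intercept, shrunk = shrink_and_recalibrate(unshrunk, rows, outcomes, shrinkage=0.921)
predicted = [sigmoid(intercept + sum(b * x for b, x in zip(shrunk, r))) for r in rows]
print(intercept, shrunk, sum(predicted))
```

Shrinking the slopes alone would leave the average predicted risk miscalibrated, which is why the intercept is re-estimated afterwards.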
The model's discrimination was studied with the area under the receiver operating characteristic curve (AUROC), and the calibration was assessed with the calibration slope and the intercept. These performance measures were pooled parameters calculated from the 100 imputations.
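As an illustration of the discrimination measure, the AUROC equals the probability that a randomly chosen patient who died received a higher predicted risk than a randomly chosen survivor, with ties counted as one half. The sketch below computes it by direct pairwise comparison; the function name and example values are hypothetical, and statistical packages typically use an equivalent rank-based formula.

```python
def auroc(scores, outcomes):
    """Area under the ROC curve as the proportion of concordant (event, non-event) pairs."""
    events = [s for s, y in zip(scores, outcomes) if y == 1]
    non_events = [s for s, y in zip(scores, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("both outcome classes are required")
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Hypothetical predicted probabilities and observed 30-day deaths.
scores = [0.01, 0.02, 0.03, 0.20, 0.05]
outcomes = [0, 0, 1, 1, 0]

print(auroc(scores, outcomes))
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect separation of patients who died from those who survived.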
To support the clinician making decisions, the result was returned as a mortality risk category (low, moderate, or high), and a different action was proposed for each category. The thresholds were based on predicted probability quintiles and a decision curve analysis was performed.
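A quintile-based stratification of this kind can be sketched as follows. The sketch assumes that low risk means a predicted probability below the second quintile cut-point (the 40th percentile) and high risk a probability above the fourth cut-point (the top quintile). The function names, quantile method, and example probabilities are hypothetical.

```python
import statistics


def risk_thresholds(probabilities):
    """Return the second and fourth quintile cut-points of the predicted probabilities."""
    cuts = statistics.quantiles(probabilities, n=5, method="inclusive")
    return cuts[1], cuts[3]


def risk_category(probability, low_cut, high_cut):
    """Map a predicted probability to 'low', 'moderate', or 'high' risk."""
    if probability < low_cut:
        return "low"
    if probability > high_cut:
        return "high"
    return "moderate"


# Hypothetical predicted probabilities of death for ten exacerbations.
probabilities = [0.001 * i for i in range(1, 11)]
low_cut, high_cut = risk_thresholds(probabilities)
categories = [risk_category(p, low_cut, high_cut) for p in probabilities]
print(low_cut, high_cut)
print(categories)
```

Returning a category instead of a raw probability is a design choice that ties each result directly to a proposed action.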
All the data were collected through an ad hoc form implemented in the EHR and the Spanish National Death Index. Statistical analyses were performed with IBM SPSS Statistics 24 and RStudio 1.4.1106 for Windows.

Ethics
The study protocol was approved by the Burgos Research Ethics Committee (reference CEIC 1185), the Salamanca Research Ethics Committee, the Soria Research Ethics Committee (reference CEIC 1227), the East Valladolid Research Ethics Committee (reference PI-13-115), the West Valladolid Research Ethics Committee (reference PI-13-115) and the Zamora Research Ethics Committee [18,19].

Participants and model development
There were 2,238 exacerbations evaluated in 1,536 people. Of those, 307 (13.7%) exacerbations were excluded because a diagnosis of COPD was not included in the associated EHR, 192 (8.6%) were wholly diagnosed and treated in a hospital, and 43 (1.9%) had no available EHR. Finally, 1,696 exacerbations in 1,054 people (1.6 exacerbations per person) from 148 HC were included in the analysis (Figure 1).
The mean age of participants was 76 years and 84% were male. Seventeen people (1%) died within 30 days after the last PC visit for AECOPD. Of these, 15 (88.2%) died due to AECOPD and one died due to end-stage renal disease; in one case, the cause of death could not be recovered. Complete characteristics of the sample and the relationship of the predictors with the outcome in the univariate analysis are presented in Table 1.
In the pattern analysis, we observed that the variables with the most missing data were those related to the physical examination and the dyspnoea grade. In the descriptive analysis, we observed that the distribution of the missing data in all the variables that had them was related to some of the other variables.
Diastolic blood pressure, peripheral temperature, oedema in the legs, diabetes mellitus, cancer, dementia, type of HC, and season in which the episode began were excluded because they had not demonstrated predictive ability in previous studies and such ability was not suspected according to the clinical criteria of the authors. The categories of the variable 'grade of dyspnoea according to the mMRC scale' were combined to form a dichotomous variable called 'dyspnoea grade 4 according to the mMRC scale.' Likewise, the Charlson index categories were combined to form a dichotomous variable called 'Charlson index greater than 1.' All extreme outliers were considered biologically plausible. Seventeen peripheral arterial oxygen saturation (SpO2) values below 75% and four BMI values above 49.74 kg/m² were truncated. A lack of linearity was detected for the variable 'exacerbations in the last 12 months,' which was corrected by transforming it into its square root. No significant interactions were found between the predictors. A shrinkage factor of 0.921 was applied.

Model specification and performance
The regression coefficients for the full model, the variables eliminated, the values of the adjustment statistic in each step, and the results of the internal validation are shown in the supplementary information.
After shrinkage, the equation of the final model is:

$$P = \frac{1}{1 + e^{-(\beta_0 + \beta_1 \sqrt{x} + \beta_2 y + \beta_3 z)}}$$

Where: P is the probability of death from any cause within 30 days after the last primary care visit due to AECOPD; x is the number of exacerbations in the last 12 months; y is age, measured in years; z is heart rate, measured in min⁻¹; β0 is the re-estimated intercept; and β1 to β3 are the shrunken regression coefficients.
Figure 2 shows the calibration plot of the final model. Calibration slope was 1 (95% confidence interval, 0.45 to 1.55) and intercept was 0 (95% confidence interval, −0.48 to 0.48). AUROC was 0.811 (95% confidence interval, 0.72 to 0.902) for the final model (Figure 3).
This final model proposes three risk categories to be used in clinical practice, based on predicted probability quintiles. Patients with low risk would have a probability of death below the second quintile. The probability of death in these patients is below the prevalence, so physicians might treat them in primary care. On the other hand, patients in the top quintile of predicted probability might be followed very closely or referred to the hospital. Details about these risk categories are summarised in Table 2.
The decision curve analysis shows that the net benefit of the model is greater than that of the alternatives across this range of probabilities (Figure 4).
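For readers unfamiliar with decision curve analysis, the net benefit at a threshold probability pt is the proportion of true positives minus the proportion of false positives weighted by pt / (1 − pt). The sketch below uses this standard definition with hypothetical data; it does not reproduce the analysis behind Figure 4.

```python
def net_benefit(probabilities, outcomes, threshold):
    """Net benefit of acting on patients whose predicted risk is at or above the threshold."""
    n = len(outcomes)
    true_pos = sum(1 for p, y in zip(probabilities, outcomes) if p >= threshold and y == 1)
    false_pos = sum(1 for p, y in zip(probabilities, outcomes) if p >= threshold and y == 0)
    return true_pos / n - false_pos / n * threshold / (1 - threshold)


def net_benefit_treat_all(outcomes, threshold):
    """Net benefit of the default strategy of acting on every patient."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical predicted risks and outcomes; acting on no one has a net benefit of 0.
probabilities = [0.005, 0.01, 0.02, 0.03, 0.10, 0.30, 0.02, 0.01, 0.04, 0.25]
outcomes = [0, 0, 0, 0, 1, 1, 0, 0, 0, 0]
threshold = 0.05

model_nb = net_benefit(probabilities, outcomes, threshold)
all_nb = net_benefit_treat_all(outcomes, threshold)
print(model_nb, all_nb)
```

A model is useful at a given threshold when its net benefit exceeds both that of acting on everyone and that of acting on no one.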
We recommend estimating the patient's risk with the equation. As it is not yet implemented in any medical calculator, we propose a simple scoring system that can be easily remembered and used at the office (Table 3).

Main findings
This study derived a CPR for short-term mortality due to an AECOPD treated in PC, based on data collected prospectively in 148 HC over one full year. The predictors are the EXacerbations suffered in the last 12 months, the AGE, and the heart RATE (mnemonic, EXAGGERATE), none of which needs a complex instrument to be measured. The CPR stratifies patients into three categories of risk based on predicted probability quintiles and suggests to the doctor a different action for each category. Patients with low risk might be followed in primary care. Patients with moderate risk might be followed closely if the doctor decides to treat them in primary care. Patients with high risk might be followed very closely or referred to the hospital.

Strengths and limitations
Age clearly showed a predictive effect in this study: the older an adult suffering from an acute illness, the greater the probability of dying in the short term. This finding is compatible with previous studies on the subject [10][11][12][13][14][15][16]. The predictive effect of the number of previous exacerbations was also compatible with previous studies; as stated in the introduction, patients with frequent exacerbations have a higher risk of death than those with fewer exacerbations [7]. Heart rate had already been shown to have a predictive effect on mortality 30 days after hospital admission due to AECOPD in the derivation and validation of the BAP-65 rule [15]. An acute increase in the partial pressure of carbon dioxide (pCO2) has also been shown to interact in a very complex manner with different systems; in cases of moderate acute hypercapnia, an increase in heart rate is often observed. Tachycardia may therefore be an early sign of an acute increase in pCO2 prior to the onset of headache, agitation, or a decreased level of consciousness [20].
The study's main limitation is that the sample size was smaller than required [21]. This increases the risk that the model is overfitted; that is, that it has very good predictive performance in the derivation sample and poor predictive performance in new subjects. This risk was reduced by selecting predictors based on external information from the literature review and the authors' expertise, and by using a p-value less than 0.157, instead of less than 0.05, as the rule for retaining a predictor in the model. Nevertheless, the only way to determine the true degree of overfitting of the model presented in this work will be through an external validation study [19].
The study's other significant limitation is the large proportion of missing data, which may have occurred because the study did not intend to change the way the doctors collected information but took advantage of data collected in the EHR during usual clinical practice. To compensate for the resulting loss of power, a multiple imputation procedure with a large number of imputations was applied [22].
Only 16% of the people studied were women, and only two deaths were observed among them, which could reduce the predictive performance of the rule in women. This small proportion of women arises because they represent only between 22 and 29% of people with COPD in Spain [23].
Concerns may arise about using COPD diagnoses from EHRs. Previous validation studies suggest that such diagnoses have low sensitivity and high specificity, so they should not be used in prevalence studies but may be used to study risk factors [24][25][26].
As this CPR has not been externally validated, it should be used only in patients similar to those in whom it was derived [27].

Comparison with existing literature
None of the similar studies published to date derived or validated its rule in PC, where 80% of AECOPD cases are treated [28]. The rules most similar to ours are DeCOPD [12], DECAF [10,16], and the one derived by Esteban et al. [11], because their outcome was not in-hospital mortality but mortality within 30 days after contact with a doctor, such as an emergency department visit or a hospital admission. Likewise, these studies are also based on data obtained from the 'real world,' with patients recruited opportunistically while trying to interfere as little as possible with the usual practice of doctors. All these studies had small sample sizes, converted continuous variables into categorical ones (with a risk of losing information), and selected predictors according to the results of the univariate analysis (with a risk of excluding confounding factors or relevant interaction terms) [29]. In these studies, mortality was high, between 3.5 and 10.4%, because people treated in hospitals usually have more advanced disease and/or a more severe acute episode than people treated in PC. In our study, discrimination for predicting death from any cause within 30 days after the last PC visit due to AECOPD was 0.74 (95% confidence interval, 0.571–0.909) for the simplified B-AE-D index, a validated long-term mortality index in COPD that is independent of lung function [30].

Conclusion
This is the first predictive model derived in PC for the risk of short-term death due to AECOPD. Although death is a rare event, it can be accurately predicted from the exacerbations suffered in the last 12 months, age, and heart rate. In addition, the rule suggests a different action depending on the calculated risk.
There is enough evidence to design a large validation study in primary care of all existing predictive models. Subsequently, the impact that the best predictive model would have on patient-oriented outcomes of care, compared to usual practice, should be studied [27].