Predictive model of acute kidney injury in critically ill patients with acute pancreatitis: a machine learning approach using the MIMIC-IV database

Abstract
Background: Acute kidney injury (AKI) is a common and serious complication of severe acute pancreatitis (AP) and is associated with a high mortality rate. Early detection of AKI is crucial for prompt intervention and better outcomes. This study aimed to develop and validate machine learning (ML) models to predict the onset of AKI in patients with AP.
Methods: Patients with AP were extracted from the MIMIC-IV database. Feature selection was performed with random forest-based recursive feature elimination. Seven ML algorithms were trained: random forest (RF), support vector machine (SVM), k-nearest neighbors (KNN), naive Bayes (NB), neural network (NNET), generalized linear model (GLM), and gradient boosting machine (GBM). The best-performing model was fine-tuned and evaluated through split-set validation.
Results: We analyzed 1,235 critically ill patients with AP, of whom 667 (54%) developed AKI within 7 days of ICU admission. Forty-nine variables were used to construct the GBM, GLM, KNN, NB, NNET, RF, and SVM models, whose AUCs were 0.814 (95% CI, 0.763 to 0.865), 0.812 (95% CI, 0.769 to 0.854), 0.671 (95% CI, 0.622 to 0.719), 0.812 (95% CI, 0.780 to 0.864), 0.688 (95% CI, 0.624 to 0.752), 0.809 (95% CI, 0.766 to 0.851), and 0.810 (95% CI, 0.763 to 0.856), respectively. On the test set, the GBM performed consistently, with an AUC of 0.867 (95% CI, 0.831 to 0.903).
Conclusions: The GBM model predicted AKI accurately and may help clinicians identify high-risk patients and intervene in time to reduce mortality in critical care.


Introduction
Acute pancreatitis (AP) is a gastrointestinal disease that affects patients of all ages and both sexes worldwide and is a major cause of hospital admission [1,2]. Acute kidney injury (AKI) is a frequent complication of AP, especially in severe cases, where it often develops at an advanced stage, after other vital organs have deteriorated [3]. Despite comprehensive practice guidelines for AP management, morbidity and mortality remain high [4]. Mortality in severe AP is closely tied to organ failure and subsequent secondary infection, the key factors shaping AP outcomes [5]. It is therefore essential to assess AP patients clinically for symptoms and indicators of organ failure, including respiratory, cardiovascular, and renal dysfunction, in order to classify the disease accurately.
Clinicians therefore need to anticipate the likelihood of AKI early in patients with AP, since this foresight can guide and improve clinical interventions. In addition, an ideal method for detecting AKI should be minimally invasive, widely accessible, cost-effective, simple to perform, and reproducible [6]. Several studies have explored risk factors and built prognostic models for AKI in patients with AP; however, these studies were limited by small sample sizes and insufficient precision for robust prognostic modeling [7]. Prompt and accurate diagnosis of AKI in patients with AP therefore remains a challenge in clinical practice.
In recent years, powerful computational methods, particularly machine learning (ML), have been increasingly applied to disease prediction, and ML has become an integral part of medical research. ML analysis relies on iteratively applying diverse algorithms of varying complexity to a broad set of candidate variables, an approach that can yield predictions with high accuracy [8].
With this in mind, we aimed to develop a prognostic model of the risk of AKI in patients with AP, drawing on the extensive data in a critical care database. We envisioned this model as a tool that prompts early intervention and supports the management of patients at high risk of AKI. Such early identification is of particular value in intensive care.

Data source
This study used the Medical Information Mart for Intensive Care IV database, version 2.2 (MIMIC-IV v2.2), as its primary data source. MIMIC-IV, a publicly accessible critical care database from a single medical center, was approved by the Institutional Review Boards of Beth Israel Deaconess Medical Center (BIDMC, Boston, MA, USA) and the Massachusetts Institute of Technology (MIT, Cambridge, MA, USA). The database contains comprehensive records for 73,181 patients admitted to the intensive care units (ICUs) of BIDMC in Boston, Massachusetts, between 2008 and 2019 [9]. It includes well-documented demographic data, vital signs, laboratory results, fluid balance, and survival status, as well as diagnoses coded with the International Classification of Diseases, Ninth and Tenth Revisions (ICD-9 and ICD-10), which provide a standardized classification framework. The database also contains hourly physiological data from bedside monitors, validated by ICU nursing staff.
The database, compiled by clinicians, data scientists, and information technology experts, is de-identified to protect patients' health information. Because the data are anonymized, the database is exempt from human subjects research requirements, and individual patient consent was not required. Prospective users must pass a qualifying examination and obtain approval from the MIMIC-IV database administrators. One author of this study, Wenbin Lu, completed the mandatory training course and was authorized to extract data for research purposes (certification number: 50992435).

Patients and data variables
Data were extracted with Structured Query Language (SQL) in PostgreSQL (version 14.0). The SQL scripts used to extract patient information were obtained from the GitHub repository at https://github.com/MIT-LCP/mimic-code/tree/main/mimic-iv [10], ensuring methodological transparency and reproducibility.
Patients with acute pancreatitis were identified in the database using the International Classification of Diseases, Ninth Revision (ICD-9, code 577.0) and Tenth Revision (ICD-10, codes beginning with K85, matched with the SQL pattern K85%). Exclusion criteria were age under 18 years, an ICU stay shorter than 24 h, and a documented history of renal disease. For patients with multiple ICU admissions, only data from the first admission were retrieved, ensuring methodological consistency.
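The cohort definition above can be expressed as a simple filter. The following Python sketch assumes the relevant MIMIC-IV tables have already been joined into one record per ICU stay; the `IcuStay` fields and the helpers `is_acute_pancreatitis` and `select_cohort` are hypothetical names for illustration, not part of MIMIC-IV or mimic-code. It relies on MIMIC-IV storing ICD codes without the decimal point, so ICD-9 577.0 appears as `5770`, and the ICD-10 pattern `K85%` becomes a prefix match. The text does not say whether the first-stay rule is applied before or after the other criteria; here a stay must itself be the patient's first ICU stay.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

@dataclass
class IcuStay:
    """Hypothetical flattened record, one per ICU stay (field names are illustrative)."""
    stay_id: int
    is_first_icu_stay: bool                  # assumed precomputed from ICU admission times
    age: float                               # years at admission
    icu_los_hours: float                     # ICU length of stay in hours
    icd_codes: Tuple[Tuple[str, int], ...]   # (code without dot, ICD version)
    renal_history: bool                      # assumed flag for documented renal disease

def is_acute_pancreatitis(codes: Iterable[Tuple[str, int]]) -> bool:
    """ICD-9 577.0 (stored as '5770') or any ICD-10 code starting with 'K85'."""
    for code, version in codes:
        if version == 9 and code == "5770":
            return True
        if version == 10 and code.startswith("K85"):
            return True
    return False

def select_cohort(stays: Iterable[IcuStay]) -> List[IcuStay]:
    """Apply the inclusion and exclusion criteria described in the text."""
    cohort = []
    for s in stays:
        if not is_acute_pancreatitis(s.icd_codes):
            continue
        if s.age < 18 or s.icu_los_hours < 24 or s.renal_history:
            continue
        if not s.is_first_icu_stay:  # keep only the first ICU admission
            continue
        cohort.append(s)
    return cohort
```

In practice this filtering would be done in SQL against the database; the sketch only makes the order of the criteria explicit.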
For the eligible cohort, we collected a set of baseline parameters characterizing patients with AP: demographic characteristics, relevant medical history, vital signs, laboratory indices, and interventions. The interventions considered were invasive mechanical ventilation, renal replacement therapy, and vasoactive agents. Both vital signs and laboratory indices were based on the initial values recorded during the first 24 h after ICU admission.
The outcome was the incidence of AKI within seven days of ICU admission. AKI was diagnosed according to the 2012 Kidney Disease: Improving Global Outcomes (KDIGO) guidelines, defined by any of the following: an increase in serum creatinine (SCr) of ≥26.5 μmol/L (0.3 mg/dL) within 48 h; an increase in SCr to ≥1.5 times the baseline value (an increase of ≥50%); or a urine output of less than 0.5 mL/kg/h for more than six hours. Baseline SCr was defined as the lowest SCr value observed in the preceding week [11].
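A minimal Python sketch of the KDIGO criteria as stated above, applied to one patient's measurements. Assumptions: creatinine is in mg/dL with times in hours from ICU admission, urine output is already normalized to mL/kg/h in consecutive hourly bins, and the baseline is the lowest value among the supplied measurements in the preceding 7 days. The function name `kdigo_aki` is hypothetical, it does not stage AKI, and restricting events to the 7-day outcome window is left to the caller. The oliguria window is a parameter because the text says "more than six hours" while the KDIGO guideline states 6 h.

```python
from typing import Iterable, Tuple

EPS = 1e-9  # tolerance for floating-point comparisons of thresholds

def kdigo_aki(
    scr: Iterable[Tuple[float, float]],
    urine: Iterable[Tuple[float, float]] = (),
    oliguria_hours: int = 6,
) -> bool:
    """Return True if any KDIGO 2012 AKI criterion is met.

    scr   -- (hours since ICU admission, serum creatinine in mg/dL) pairs
    urine -- (hour, urine output in mL/kg/h) pairs, assumed one per consecutive hour
    """
    series = sorted(scr)
    for j, (tj, cj) in enumerate(series):
        earlier = [(ti, ci) for ti, ci in series[:j] if ti < tj]
        # Criterion 1: absolute rise >= 0.3 mg/dL (about 26.5 umol/L) within 48 h
        if any(tj - ti <= 48 and cj - ci >= 0.3 - EPS for ti, ci in earlier):
            return True
        # Criterion 2: >= 1.5 x baseline, baseline = lowest SCr in the preceding 7 days
        week = [ci for ti, ci in earlier if tj - ti <= 168]
        if week and cj >= 1.5 * min(week) - EPS:
            return True
    # Criterion 3: urine output < 0.5 mL/kg/h for `oliguria_hours` consecutive hours
    run = 0
    for _, rate in sorted(urine):
        run = run + 1 if rate < 0.5 else 0
        if run >= oliguria_hours:
            return True
    return False
```

For example, a rise from 1.0 to 1.3 mg/dL within 24 h meets the first criterion, whereas the same rise spread over 72 h meets neither creatinine criterion.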

Data cleaning and feature selection
Missing data were common in the database. Variables with more than 20% missing values were excluded; the proportion of missing values for each variable is given in Appendix 1. The remaining gaps were handled by multiple imputation [12]. Because continuous variables were skewed, they are presented as medians (interquartile ranges) and were compared between groups using the Mann-Whitney U test. Categorical variables are presented as counts (%) and were compared using the χ² test. Feature selection was performed by recursive feature elimination with random forest as the underlying model, using the 'rfe' function of the 'caret' package. The resulting subset of features was used in the subsequent machine-learning models.
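The two data-cleaning steps can be sketched in Python as follows. `drop_sparse_variables` applies the 20% missingness rule. `recursive_feature_elimination` is a simplified backward elimination that refits and re-ranks at every step; it is not caret's `rfe`, which by default ranks predictors once and evaluates a preset list of subset sizes with resampling. The `fit_and_score` callback is a hypothetical stand-in for fitting a random forest and returning its resampled score and per-feature importances.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

def drop_sparse_variables(
    rows: Sequence[Dict[str, Optional[float]]], max_missing: float = 0.20
) -> List[str]:
    """Return the variables whose share of missing (None) values is <= max_missing."""
    n = len(rows)
    keep = []
    for name in rows[0]:
        missing = sum(1 for r in rows if r.get(name) is None)
        if missing / n <= max_missing:
            keep.append(name)
    return keep

def recursive_feature_elimination(
    features: Sequence[str],
    fit_and_score: Callable[[List[str]], Tuple[float, Dict[str, float]]],
    min_features: int = 1,
) -> Tuple[List[str], float]:
    """Backward elimination: refit, drop the least important feature, keep the best subset.

    fit_and_score(subset) must return (resampled score, {feature: importance}).
    """
    current = list(features)
    best_subset, best_score = list(current), float("-inf")
    while True:
        score, importance = fit_and_score(current)
        if score > best_score:
            best_subset, best_score = list(current), score
        if len(current) <= min_features:
            break
        weakest = min(current, key=lambda f: importance[f])
        current.remove(weakest)
    return best_subset, best_score
```

Because the best subset is chosen by the resampled score rather than by a fixed size, the number of retained features is itself an outcome of the procedure.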

Model selection
Because the robustness of internal validation varies across machine-learning methods, the data were randomly split into a training set (70%) and a separate test set (30%) (Figure 1). The machine-learning algorithms were trained, in no predetermined order, on the training set using 10-fold cross-validation repeated three times to limit overfitting. For each model, sensitivity and specificity were also computed at the 'best' threshold. Here, the 'best' threshold is the one at which sensitivity and specificity are jointly maximized, that is, their sum is largest (Youden's index); it does not necessarily coincide with the optimal threshold for clinical application.
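The 70/30 split and the repeated cross-validation can be sketched with the Python standard library. The text states only that allocation was random; this sketch assumes the split is stratified by outcome (as caret's `createDataPartition` does), and the helper names and seeds are hypothetical. Unlike caret's default folds, `repeated_kfold` here does not stratify the folds.

```python
import random
from typing import Iterator, List, Sequence, Tuple

def stratified_split(
    labels: Sequence[int], train_frac: float = 0.7, seed: int = 42
) -> Tuple[List[int], List[int]]:
    """Split row indices into training and test sets, preserving class proportions."""
    rng = random.Random(seed)
    train: List[int] = []
    test: List[int] = []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))
        train += idx[:cut]
        test += idx[cut:]
    return sorted(train), sorted(test)

def repeated_kfold(
    indices: Sequence[int], k: int = 10, repeats: int = 3, seed: int = 1
) -> Iterator[Tuple[int, int, List[int], List[int]]]:
    """Yield (repeat, fold, fit_indices, validation_indices) for repeated k-fold CV."""
    rng = random.Random(seed)
    for r in range(repeats):
        shuffled = list(indices)
        rng.shuffle(shuffled)
        for f in range(k):
            val = shuffled[f::k]          # every k-th index forms one fold
            val_set = set(val)
            fit = [i for i in shuffled if i not in val_set]
            yield r, f, fit, val
```

With the cohort's 667 AKI and 568 non-AKI patients, cutting each class at 70% gives 467 + 398 = 865 training patients and 370 test patients, which matches the split sizes and the training-set AKI count reported in the Results.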
The algorithms trained were a generalized linear model, support vector machine, naive Bayes, k-nearest neighbors, random forest, neural network, and stochastic gradient boosting machine. All features retained by recursive feature elimination were used for every algorithm, although some algorithms, such as random forest, also perform feature selection internally. Training and 10-fold cross-validation were carried out with the 'caret' package in R, using the area under the receiver operating characteristic curve as the performance metric. Method-specific parameters received only basic tuning, following the package defaults [13][14][15][16]. Receiver operating characteristic curves were generated with the 'pROC' package, and 95% confidence intervals were computed from 2,000 stratified bootstrap replicates using the 'ci' function of 'pROC'.
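A plain-Python sketch of the quantities computed with 'pROC': the AUC (the probability that a randomly chosen AKI patient receives a higher predicted risk than a randomly chosen non-AKI patient), a percentile confidence interval from stratified bootstrap replicates that resample cases and controls separately, and the Youden-index threshold described above as the 'best' threshold. It illustrates the ideas and is not pROC's code: pROC interpolates quantiles and places thresholds midway between observed scores, so results can differ slightly. The function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else (0.5 if p == n else 0.0)
    return wins / (len(pos) * len(neg))

def stratified_bootstrap_ci(labels, scores, n_boot=2000, level=0.95, seed=0):
    """Percentile CI for the AUC; cases and controls are resampled separately."""
    rng = random.Random(seed)
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    stats: List[float] = []
    for _ in range(n_boot):
        bp = [rng.choice(pos) for _ in pos]
        bn = [rng.choice(neg) for _ in neg]
        stats.append(auc([1] * len(bp) + [0] * len(bn), bp + bn))
    stats.sort()
    lo = stats[int((1 - level) / 2 * n_boot)]
    hi = stats[int((1 + level) / 2 * n_boot) - 1]
    return lo, hi

def youden_threshold(labels, scores) -> Tuple[float, float]:
    """Threshold t (predict AKI if score >= t) maximizing sensitivity + specificity."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    best_t, best_sum = scores[0], -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < t)
        total = tp / n_pos + tn / n_neg
        if total > best_sum:
            best_t, best_sum = t, total
    return best_t, best_sum - 1.0  # (threshold, Youden's J)
```

The pairwise AUC is quadratic in sample size, which makes 2,000 replicates slow in pure Python; a rank-based AUC formula would be the practical choice.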

Model tuning and testing
After this model selection step, in which all models received only coarse-level tuning, the best-performing algorithm was refined further. The algorithm with the highest area under the receiver operating characteristic curve was selected as the optimal model. Fine-tuning consisted of deliberately adjusting the method-specific parameters of that algorithm to optimize its performance.
These method-specific parameters were optimized by a manual grid search: broad but practical ranges of values were specified for each parameter, and the performance of the resulting models was compared. For models that allow it, variable importance was computed for the final model; not all machine-learning algorithms support this calculation. Variable importance quantifies how much each feature contributes to classification, based on how the classifier is built and how the feature affects the performance metric.
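The manual grid search can be sketched as an exhaustive loop over parameter combinations. The grid below uses the GBM ranges reported in the Results, with keys named after caret's `gbm` tuning parameters (`n.trees`, `interaction.depth`, `shrinkage`, `n.minobsinnode`); the intermediate grid points are assumptions, and `evaluate` is a hypothetical callback that would return the cross-validated AUC of a model fitted with the given parameters.

```python
from itertools import product
from typing import Any, Callable, Dict, Optional, Sequence, Tuple

# Ranges from the Results; the intermediate grid points are assumptions.
GBM_GRID: Dict[str, Sequence[Any]] = {
    "n.trees": [50, 100, 150, 200],
    "interaction.depth": [1, 2, 4, 6, 8],
    "shrinkage": [0.1, 0.2, 0.3],
    "n.minobsinnode": [5, 10, 15, 20],
}

def grid_search(
    grid: Dict[str, Sequence[Any]],
    evaluate: Callable[[Dict[str, Any]], float],
) -> Tuple[Optional[Dict[str, Any]], float]:
    """Evaluate every combination in the grid and return the best one and its score."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = evaluate(params)  # e.g. mean cross-validated AUC
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

This grid has 4 × 5 × 3 × 4 = 240 combinations, each requiring a full cross-validation run, which is why coarse tuning precedes fine-tuning of only the best algorithm.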
Variable importance measures the influence of individual variables on the algorithm's performance: permuting or omitting a highly important variable from the model causes a corresponding drop in performance. The greater the importance, the more essential the variable is to the model's performance. However, variable importance does not directly yield an effect size for the relationship with the primary outcome [17]. Variable importance was computed with the 'varImp' function of the 'caret' package. The final model was then validated on the test set to assess its ability to generalize and to check for overfitting [18]. The process is shown in Figure 1. Discrimination was evaluated with the area under the receiver operating characteristic curve (AUC). Statistical significance was set at p < 0.05. All analyses were performed in R (version 4.2.2).
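The permutation idea described above can be sketched as follows: shuffle one feature column at a time and record how much the performance metric falls. This illustrates the concept only; for gbm models, caret's `varImp` reports gbm's relative influence rather than a permutation score. `predict` and `metric` are hypothetical callbacks, and the default accuracy metric uses a 0.5 cut-off.

```python
import random
from typing import Callable, List, Sequence

def accuracy(y: Sequence[int], p: Sequence[float]) -> float:
    """Share of cases classified correctly at a 0.5 cut-off."""
    return sum((pi >= 0.5) == (yi == 1) for yi, pi in zip(y, p)) / len(y)

def permutation_importance(
    predict: Callable[[List[List[float]]], List[float]],
    X: List[List[float]],
    y: Sequence[int],
    metric: Callable[[Sequence[int], Sequence[float]], float] = accuracy,
    n_repeats: int = 5,
    seed: int = 0,
) -> List[float]:
    """Mean drop in `metric` when each feature column is randomly permuted."""
    rng = random.Random(seed)
    baseline = metric(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            permuted = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - metric(y, predict(permuted)))
        importances.append(sum(drops) / n_repeats)
    return importances
```

A feature the model ignores gets an importance of exactly zero, which shows why importance reflects contribution to performance rather than an effect size.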

Results
After the exclusion criteria were applied, the cohort comprised 1,235 patients with acute pancreatitis (AP), of whom 667 (54%) developed acute kidney injury (AKI) within the subsequent 7 days. The training set included 865 patients and the test set 370. AKI occurred in 467 (53.9%) patients in the training set and 200 (54.0%) in the test set. The characteristics of the entire dataset are given in Table 1. The features retained after recursive feature elimination are shown in Figure 2.
For the gradient boosting machine, the method-specific parameters tuned were the number of trees (50 to 200), interaction depth (1 to 8), shrinkage (0.1 to 0.3), and the minimum number of observations in terminal nodes (5 to 20). The final gradient boosting machine used 50 trees, an interaction depth of 2, a shrinkage of 0.1, and a minimum of 10 observations per terminal node. The final variable importance is shown in Figure 3. The tuned model achieved an AUC of 0.816 (95% CI, 0.766 to 0.867).
On the test set, the model achieved an AUC of 0.867 (95% CI, 0.831 to 0.903), a negative predictive value of 77% (95% CI, 76 to 78%), a positive predictive value of 82% (95% CI, 81 to 83%), and an overall classification accuracy of 79% (95% CI, 78 to 80%). The test-set AUCs of all machine-learning classifiers are given in Table 3 and Figure 4. The calibration plot of the GBM model on the test set is shown in Appendix 2.
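The predictive values and accuracy reported above follow from the confusion matrix at the chosen threshold. A small Python helper is sketched below; the function names are hypothetical, and the counts in any usage are illustrative only, not the study's confusion matrix.

```python
from typing import Dict, Sequence

def confusion_counts(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Dict[str, int]:
    """Count TP, FP, TN, FN when predicting AKI for score >= threshold."""
    c = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for y, s in zip(labels, scores):
        pred = s >= threshold
        if pred and y == 1:
            c["tp"] += 1
        elif pred:
            c["fp"] += 1
        elif y == 1:
            c["fn"] += 1
        else:
            c["tn"] += 1
    return c

def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Threshold-dependent performance measures from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),   # positive predictive value
        "npv": tn / (tn + fn),   # negative predictive value
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }
```

For instance, 8 true positives, 2 false positives, 6 true negatives, and 4 false negatives give a PPV of 0.8, an NPV of 0.6, and an accuracy of 0.7. Unlike the AUC, these measures depend on the threshold and on the prevalence of AKI in the evaluated set.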

Discussion
We combined data from the MIMIC database with a diverse set of machine-learning algorithms to predict the likelihood of AKI within seven days of ICU admission in patients with acute pancreatitis. Compared by the area under the receiver operating characteristic curve, the model built with the GBM technique performed best, with an AUC of 0.814 (95% CI, 0.763 to 0.865) in the training set and 0.867 (95% CI, 0.831 to 0.903) in the test set. This performance is in line with our initial expectations.
Among classical regression methods, the generalized linear model (e.g., logistic regression) is a key tool for examining associations between AKI and relevant risk factors. For instance, Dongliang Yang et al. used logistic regression to build a model that predicted AKI and severe AKI in patients with moderately severe and severe acute pancreatitis (MSAP and SAP), and identified C-reactive protein, intra-abdominal pressure, and serum cystatin C as important predictors [19]. Similarly, Simin Wu et al. used multivariate logistic regression to construct a nomogram that predicted early AKI in patients with acute pancreatitis, with an AUC of 0.795 (95% CI, 0.758-0.832) in the training cohort and 0.772 (95% CI, 0.711-0.832) in the validation cohort [7]. However, some studies [20,21] suggest that conventional logistic regression achieves relatively modest AUCs, and others report higher prediction error and poorer performance than newer techniques.
In recent years, machine learning, a branch of artificial intelligence in which predictive algorithms are built by 'learning' from data, has received increasing attention. Because it can automatically analyze complex datasets and decipher intricate patterns to yield meaningful insights, it has outperformed conventional statistical methods in a number of studies. Notable work in this area includes that of Yi Yang on predicting the clinical risk of acute lung injury after severe acute pancreatitis (SAP) [23]. However, the relatively small sample sizes of these studies limited the AUC values achieved. The gradient boosting machine is a prominent machine-learning algorithm, recognized for its accuracy in predictive competitions, and is increasingly used as an alternative to conventional regression for predicting adverse clinical events. Consistent with these trends, our results indicate that the GBM model outperformed the other machine-learning frameworks and the traditional logistic regression model. Its improved accuracy in predicting AKI in patients with acute pancreatitis highlights the potential of GBM-based algorithms and supports GBM as a robust option for predictive modeling of adverse clinical events.
Examining variable importance in our model, we identified the features with the greatest influence on predicting AKI in patients with acute pancreatitis. Urine output was the most important, followed by invasive mechanical ventilation, white blood cell count, use of vasoactive drugs, mean heart rate, mean respiratory rate, and maximum creatinine. This agrees with observations across many medical conditions, in which changes in urine output often precede AKI [24]. AKI is an abrupt decline in renal function with many possible causes, including ischemia, nephrotoxic agents, and infection [25]. A decline in urine output reflects reduced renal perfusion and a lower glomerular filtration rate, heralding the onset of AKI. Our analysis also showed that invasive mechanical ventilation predicts AKI in acute pancreatitis, consistent with earlier investigations [26]. In ICU patients, acute respiratory failure caused by acute pancreatitis requires invasive mechanical ventilation. Although essential, this intervention can cause acute lung injury, worsening hypoxia and leading to vasoconstriction, reduced renal perfusion, and a lower glomerular filtration rate. In addition, invasive mechanical ventilation raises intrathoracic pressure, which reduces venous return and mean arterial pressure and thereby promotes prerenal hypoperfusion and the onset of AKI [27].
Cytokines, including IL-1β, IL-8, and IL-6, may play a central role in the pathogenesis of AKI. These mediators act on endothelial cells, leading to renal ischemia, thrombosis, and the release of oxygen free radicals [26]. Inflammatory mediators also increase mucosal permeability and facilitate endotoxin translocation. Endotoxin raises endothelin levels, causing vasoconstriction, reduced renal blood flow, and tubular necrosis, which further drives AKI [27]. This inflammatory milieu can also directly impair renal function, lowering the glomerular filtration rate and increasing the risk of AKI [28]. A retrospective study [29] showed that biomarkers measured at hospital admission (hematocrit, platelets, leukocytes, lymphocytes, albumin, CRP, the CRP/albumin ratio, the neutrophil/lymphocyte ratio, procalcitonin, urea, and creatinine) are effective prognostic indicators of AKI in patients with acute pancreatitis. This is consistent with the central role of white blood cells in driving the inflammatory response and supports their value in AKI surveillance. Notably, the systemic inflammatory response, which is closely linked to AKI, may also arise from localized inflammation within renal tissue [30].

Figure 3. Variable importance of features included in the gradient boosting machine algorithm for prediction of AKI. Variable importance reflects how much a given feature aids the classification process when the classifier is built, as determined by its effect on the performance measure. The greater the importance, the more essential the variable is to the performance of the model. Assumptions about effect size cannot be drawn directly from the relationship of variable importance to the primary outcome.
Our study also highlights the predictive value of vasoactive drugs for AKI in acute pancreatitis. Consistent with this, earlier research identified the need for mechanical ventilation, vasopressor use, and renal replacement therapy as a cluster of risk factors associated with higher mortality in patients with AP [31]. Critically ill patients often require higher doses of vasopressors to maintain blood pressure. Abnormal heart rate and respiratory rate reflect changes in circulatory and respiratory status, which in turn strongly affect renal function; disturbances in these systems set off a cascade of events that impairs the kidneys [32].
This study has several limitations. First, it was a retrospective, single-center study; prospective, multicenter studies are needed to improve clinical applicability and provide external validation. Second, the model did not include other important factors that may influence the development of AKI in acute pancreatitis, such as the etiology and severity grade of acute pancreatitis and variables related to intra-abdominal hypertension and abdominal compartment syndrome. Third, the sample size was relatively modest, and the model's accuracy and efficacy were assessed only by internal validation. To strengthen the generalizability and robustness of our findings, future studies should use larger samples and a more comprehensive set of variables.

Figure 1. Diagram of methods. The complete data set was split into training and test sets. The machine-learning methods were trained on the training set, and the best performer was selected for additional parameter tuning before being applied to the test set for validation.

Figure 2. Reduction of dimensionality by recursive feature elimination on the training data set. The number of features used for training was reduced from the list of features on the left to the list of features on the right.

Figure 4. Receiver operating characteristic curves of machine-learning methods for prediction of AKI in the test data set. A greater area under the receiver operating characteristic curve represents higher discriminative ability of the model. The areas under the receiver operating characteristic curves, as well as the specificity and sensitivity of each machine-learning model for prediction of AKI at the "best" threshold, are presented with 95% CIs. The "best" threshold refers to the threshold at which specificity and sensitivity are jointly maximized (their sum is largest).

Appendix 1. The proportion of missing values of variables.

Appendix 2. The calibration plot of the GBM model in the test set.

Table 3. Area under the receiver operating characteristic curves for each machine-learning classifier run on the test set.