Systematic review and meta-analysis of risk scores in prediction for the clinical outcomes in patients with acute variceal bleeding

Abstract
Background: Acute variceal bleeding (AVB) is a life-threatening condition that requires risk stratification to guide clinical treatment. Which risk scoring system reflects the prognosis most accurately remains controversial. We aimed to conduct a meta-analysis of the predictive value of GBS, AIMS65, Rockall (clinical Rockall score and full Rockall score), CTP and MELD.
Methods: PubMed, Web of Science, Embase, the Cochrane library, WANGFANG and CNKI were searched. Twenty-eight articles were included in the study. Meta-DiSc and MedCalc software were used to pool the predictive accuracy.
Results: Concerning in-hospital mortality, CTP, AIMS65, MELD, Full-Rockall and GBS had a pooled AUC of 0.824, 0.793, 0.788, 0.75 and 0.683, respectively. CTP had the highest sensitivity of 0.910 (95% CI: 0.864–0.944) with a specificity of 0.666 (95% CI: 0.635–0.696). AIMS65 had the highest specificity of 0.774 (95% CI: 0.749–0.798) with a sensitivity of 0.679 (95% CI: 0.617–0.736). For follow-up mortality, MELD, AIMS65, CTP, Clinical Rockall, Full-Rockall and GBS showed a pooled AUC of 0.798, 0.77, 0.746, 0.704, 0.678 and 0.618, respectively. CTP had the highest specificity (0.806, 95% CI: 0.763–0.843) with a sensitivity of 0.722 (95% CI: 0.628–0.804). GBS had the highest sensitivity of 0.800 (95% CI: 0.696–0.881) with a specificity of 0.412 (95% CI: 0.368–0.457). As for rebleeding, no score performed particularly well.
Conclusions: No ideal risk score was identified by our systematic review. CTP was superior to other risk scores in identifying AVB patients at high risk of death in hospital and patients at low risk within follow-up. Guidelines have recommended the use of GBS for risk stratification of patients with upper gastrointestinal bleeding. However, if oesophageal and gastric varices are the suspected cause of upper gastrointestinal bleeding, extra care should be taken, because the predictive ability of GBS was limited in this meta-analysis.
Key message: CTP was superior in identifying AVB patients at high risk of death in hospital and at low risk within follow-up. GBS, though recommended by guidelines, should be used cautiously when assessing AVB patients.


Introduction
Acute variceal bleeding (AVB) is one of the leading causes of acute upper gastrointestinal bleeding (AUGIB), and its incidence is second only to that of peptic ulcers [1]. Varices are most often a consequence of portal hypertension [2], commonly due to cirrhosis. Varices can be found in 50% of cirrhotic patients, and they develop at a rate of 5-15% per year [3]. Variceal bleeding may be brisk, and patients may soon develop shock. The 6 weeks mortality with each bleeding would be known. Whenever upper gastrointestinal bleeding is suspected, an urgent risk assessment should be done. Moreover, AVB, being much more dangerous than nonvariceal bleeding, has an even greater need for stratification scores. To date, several risk scores have been developed. The most widely used scores for predicting outcomes of upper gastrointestinal bleeding are the Glasgow-Blatchford score (GBS), the Rockall score and the AIMS65 score. In 2000, the GBS was developed and validated to predict in-hospital rebleeding, death and the need for intervention [7]. The Rockall score, which comprises the clinical Rockall score and the full Rockall score, was created in 1996 to predict death and rebleeding [8]. Saltzman et al. developed and validated the AIMS65 score in 2011 to predict in-hospital death [9]. However, the three risk scores were validated and compared mostly in AUGIB patients, especially those with acute nonvariceal upper gastrointestinal bleeding (ANVUGIB). Patients with variceal bleeding were excluded or accounted for only a small part. The best score for predicting the prognosis of AVB patients remains unclear. Two further predictive scores, developed for patients with chronic liver disease, are also increasingly used in variceal bleeding. The Child-Turcotte-Pugh (CTP) score is a valuable tool for determining the prognosis of chronic liver disease, especially cirrhosis [10]. Another scoring system for assessing the severity of chronic liver disease is the model for end-stage liver disease (MELD). It is commonly used to estimate mortality in patients who have had a transjugular intrahepatic portosystemic shunt (TIPS) procedure [11] and to prioritize patients for liver transplantation [12]. Previous studies have reported the predictive abilities of these two staging systems for outcomes in AVB patients, but which reflects the prognosis more accurately remains controversial [13,14]. We aimed to conduct a systematic review of the predictive value of GBS, AIMS65, Rockall (clinical Rockall score and full Rockall score), CTP and MELD in risk-stratifying AVB patients for mortality and rebleeding within three months after the initial bleeding.

Methods
Search strategy
"Variceal bleeding" and "risk scor*" were searched in PubMed, Web of Science, Embase, the Cochrane library, WANGFANG (Chinese) and CNKI (Chinese) from inception to February 2021 (the detailed search strategy is shown in the Supplementary materials). All search results were exported to EndNote version 8 (Thomson Reuters, Toronto, Canada).

Study selection
Eligible articles had to meet the following criteria: (1) participants were adults (≥18 years) who presented with AVB, confirmed by upper GI endoscopy (oesophageal, fundal, or both); (2) the study concerned the GBS, AIMS65, Rockall (clinical Rockall score and full Rockall score), CTP or MELD score; and (3) all risk scores were consistent with the internationally recognized standard. The exclusion criteria were duplicate articles, reviews, letters to the editor, case reports, animal studies and paediatric studies. After removal of duplicates, two reviewers (L.Y. and N.W.) independently screened the titles and abstracts of all retrieved studies. The full texts of the selected papers were then assessed separately against the eligibility and exclusion criteria. Disagreements were settled by discussion.

Outcome measures
Outcomes included mortality and rebleeding. Mortality was defined as all-cause death, including in-hospital death and follow-up death within three months. Rebleeding was defined as variceal bleeding that recurred after a 24-h period of clinical stability following haemostasis, and included in-hospital rebleeding and follow-up rebleeding within three months. A follow-up time within seven days was considered to be in hospital.

Data abstraction
Two reviewers (L.Y. and N.W.) extracted data from eligible articles. A third reviewer (H.C.) was consulted in case of disagreement. The following variables were collected from the included articles: author names, country, year of publication, study design, demographics and sample size, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), suggested cut-offs, the area under the receiver-operating characteristic curve (AUC), 95% CIs and SEs. In principle, if a test had an AUC lower than 0.5, the data were not included in the meta-analysis [15].

Quality assessment
The QUADAS-2 tool was used to assess the risk of bias and quality of the included articles [16]. This tool evaluates the risk of bias in four domains: patient selection, index test, reference standard, and flow and timing of the study. For this study, the index tests are the validated risk scores, and the patient outcomes within follow-up are the reference standards.

Statistical analysis
The ability of each scoring system to predict the outcomes (mortality and rebleeding) was assessed mainly by calculating the AUC. In this meta-analysis, AUCs and their SEs were used. If the SE was not reported in a study, it was calculated as follows: SE = (upper limit of 95% CI - lower limit of 95% CI)/(2 × 1.96) [17]. The choice between the random-effects model and the fixed-effects model depended on the heterogeneity of the studies. Subgroup analysis for in-hospital and follow-up outcomes was performed. A pooled AUC of 0.5 was considered to have no predictive power, > 0.5 and ≤ 0.7 poor predictive power, > 0.7 and ≤ 0.9 excellent predictive power, and one a perfect measure [18]. For the statistical analyses, MedCalc version 15.2 (Ostend, Belgium) was used. p < .05 was deemed significant. We also pooled the sensitivity, specificity and positive and negative likelihood ratios. Meta-DiSc version 1.4 (Ramón y Cajal Hospital, Madrid, Spain) was used for the assessment of heterogeneity and for calculating the I² statistic.
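The SE formula and the pooling step described above can be illustrated with a short sketch. It is not the MedCalc or Meta-DiSc implementation: the inverse-variance weighting and the DerSimonian-Laird estimate of between-study variance are assumptions chosen because they are common defaults, and the function names and the two-study input are hypothetical. Only the confidence interval 0.627-0.752 is taken from this review (clinical Rockall, in-hospital rebleeding).

```python
def se_from_ci(lower, upper, z=1.96):
    """Recover a standard error from a 95% CI: (upper - lower) / (2 * z)."""
    return (upper - lower) / (2 * z)


def pool_auc(aucs, ses):
    """Inverse-variance pooling of study-level AUCs.

    Assumption: DerSimonian-Laird is used for the random-effects model;
    the software used in the review may differ in detail.
    Returns the fixed-effect and random-effects estimates, Cochran's Q
    and the I^2 statistic (in percent).
    """
    weights = [1 / se ** 2 for se in ses]
    fixed = sum(w * a for w, a in zip(weights, aucs)) / sum(weights)

    # Cochran's Q measures the spread of study AUCs around the fixed estimate.
    q = sum(w * (a - fixed) ** 2 for w, a in zip(weights, aucs))
    df = len(aucs) - 1

    # Between-study variance (tau^2), truncated at zero.
    c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    re_weights = [1 / (se ** 2 + tau2) for se in ses]
    random_effects = sum(w * a for w, a in zip(re_weights, aucs)) / sum(re_weights)

    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {"fixed": fixed, "random": random_effects, "Q": q, "I2": i2}


# SE recovered from a CI reported in this review (0.627-0.752).
se_clinical_rockall = se_from_ci(0.627, 0.752)

# Hypothetical two-study example: AUCs of 0.70 and 0.90, each with SE 0.05.
example = pool_auc([0.70, 0.90], [0.05, 0.05])
print(round(se_clinical_rockall, 4), example)
```

With strongly discordant studies, as in the hypothetical example, I² is high and the random-effects model would be the natural choice; with homogeneous studies the two estimates coincide.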

Selection of studies
Through the electronic search, a total of 7388 articles were found. After removing the 3121 duplicates, the remaining records were screened against the inclusion and exclusion criteria. We excluded 4183 studies after reading the titles and abstracts. Eighty-four articles underwent full-text review. Seventeen articles were excluded because the full texts could not be obtained. Two reviews were excluded for not having the right study type. Fifteen studies were excluded because they examined other scores that were new or not validated and needed further research. Another five studies were excluded as the scores were not correctly calculated. Two articles were excluded as they did not measure the outcome of interest. Fourteen studies were excluded because they applied the risk scores to cirrhotic patients with upper gastrointestinal bleeding from other causes. One study was excluded because the UGIB was not confirmed by endoscopy. Finally, 28 articles were included in the study [13,14, (see Figure 1).
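The selection flow can be checked arithmetically. The sketch below only restates the counts given in the paragraph above; the variable names and dictionary keys are illustrative labels, not terms from the review.

```python
# Counts taken from the study-selection paragraph.
identified = 7388
duplicates = 3121
excluded_on_title_abstract = 4183

# Reasons for exclusion at full-text review (labels are illustrative).
full_text_exclusions = {
    "full text unavailable": 17,
    "review article": 2,
    "new or unvalidated score": 15,
    "score not correctly calculated": 5,
    "outcome of interest not measured": 2,
    "bleeding from other causes": 14,
    "UGIB not confirmed by endoscopy": 1,
}

screened = identified - duplicates
full_text_reviewed = screened - excluded_on_title_abstract
included = full_text_reviewed - sum(full_text_exclusions.values())

print(screened, full_text_reviewed, included)
```

The three printed numbers should match the 84 full-text articles and the 28 included studies reported in the text.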

Descriptive overview of included articles
Twenty-eight studies were included in this review (Table 1). The included studies were published between 2005 and 2020. All studies assessed mortality and rebleeding outcomes within a 3-month follow-up. Eleven out of 28 studies were prospective. Nine studies reported sensitivity and specificity values [19,21,26,27,32,34,36,38,42]. All studies presented AUCs and 95% CIs. An overview of the above five scores is shown in the Supplementary materials.

Risk of bias/quality of studies
Overall, the risk of bias and the applicability concerns were low across the twenty-eight included studies. Four studies did not specify whether patients were enrolled consecutively, nor did they specify exclusion criteria, raising a high risk of bias in patient selection [20,22,23,44]. The results of the reference standard were not known before calculating the risk scores in any of the studies. As a result, the reference standard had a low risk of bias in all studies. The index test also had a low risk of bias.

Outcomes of meta-analysis
Analysis of the diagnostic threshold suggested no heterogeneity caused by the threshold effect in this study. The pooled AUC values by MedCalc version 15.2 are shown in Tables 2, 3, 5 and 6. The information was stratified by follow-up time.

Rebleeding
As for rebleeding, no score performed exceptionally well. In predicting in-hospital recurrent bleeding, clinical Rockall had the highest AUC (0.689, 95% CI: 0.627-0.752) (Table 5). Regarding follow-up rebleeding, AIMS65 showed the highest AUC (0.682, 95% CI: 0.614-0.750) (Table 6). As no score had an AUC over 0.7, all scores showed low predictive power regardless of follow-up time.

Discussion
AVB, as a critical emergency, is characterized by fast bleeding, a high fatality rate and a high rebleeding rate. It is the most life-threatening complication of liver cirrhosis. In recent years, with the continuous development of new drugs, endoscopic intervention and other new diagnostic and treatment technologies, the mortality and rebleeding rates of AVB have declined. Despite this, the mortality at six weeks is still around 20% [3]. To provide a reference for follow-up treatment, accurately predicting the outcomes of oesophageal and gastric variceal bleeding through risk scoring systems has become a research hotspot for clinicians. To the best of our knowledge, this is the first meta-analysis to examine the predictive value of risk scores in AVB patients. CTP, AIMS65 and MELD showed good predictive power for in-hospital and follow-up mortality. Full-Rockall showed good predictive power for in-hospital mortality and low predictive power for follow-up mortality. Clinical Rockall showed good predictive power for follow-up mortality. GBS had low predictive power regardless of follow-up time. As for rebleeding, no score showed good predictive power.
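The summary above follows from comparing each pooled AUC for mortality with the 0.7 threshold. A minimal sketch of that comparison, using the pooled AUCs reported in this review, is given below; the function name and data layout are hypothetical.

```python
# Pooled AUCs for mortality as reported in this review.
IN_HOSPITAL_MORTALITY = {
    "CTP": 0.824, "AIMS65": 0.793, "MELD": 0.788,
    "Full Rockall": 0.75, "GBS": 0.683,
}
FOLLOW_UP_MORTALITY = {
    "MELD": 0.798, "AIMS65": 0.77, "CTP": 0.746,
    "Clinical Rockall": 0.704, "Full Rockall": 0.678, "GBS": 0.618,
}


def scores_above(pooled_aucs, threshold=0.7):
    """Names of scores whose pooled AUC exceeds the threshold, best first."""
    ranked = sorted(pooled_aucs.items(), key=lambda item: item[1], reverse=True)
    return [name for name, auc in ranked if auc > threshold]


print(scores_above(IN_HOSPITAL_MORTALITY))
print(scores_above(FOLLOW_UP_MORTALITY))
```

GBS falls below the threshold in both lists, and full Rockall drops out at follow-up, which is consistent with the statements in the paragraph above.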

CTP
The CTP score and classification are simple and well established, and have long been used to evaluate liver function reserve, surgical risk and prognosis [10]. In this study, we analysed the value of CTP in predicting outcomes of AVB patients. The results showed that CTP had the best predictive power for in-hospital mortality, with a pooled AUC of 0.824. Its pooled sensitivity was the highest (0.910, 95% CI: 0.864-0.944) with a specificity of 0.666 (95% CI: 0.635-0.696), which means CTP was superior to other risk scores in identifying patients who were at high risk of death in hospital. The predictive power declined slightly for follow-up mortality, with a pooled AUC of 0.746. With a high pooled specificity of 0.806 and a sensitivity of 0.722, CTP was also effective at triaging low-risk patients for early discharge or less intensive treatment, which has significant healthcare implications.
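To make the sensitivity and specificity figures for CTP easier to interpret, they can be converted into likelihood ratios. The sketch below applies the standard definitions to the pooled values quoted above. This is an illustration only: likelihood ratios pooled at study level (as Meta-DiSc does) need not equal ratios derived from pooled sensitivity and specificity, and the function name is hypothetical.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for one sensitivity/specificity pair.

    LR+ = sensitivity / (1 - specificity)
    LR- = (1 - sensitivity) / specificity
    """
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


# Pooled CTP values reported in this review.
ctp_in_hospital = likelihood_ratios(0.910, 0.666)
ctp_follow_up = likelihood_ratios(0.722, 0.806)

print("in-hospital LR+, LR-:", [round(x, 2) for x in ctp_in_hospital])
print("follow-up   LR+, LR-:", [round(x, 2) for x in ctp_follow_up])
```

Under these definitions, the lower LR- for in-hospital mortality reflects the high sensitivity of CTP in that setting, while the higher LR+ at follow-up reflects its higher specificity.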

MELD
MELD was first proposed by Malinchoc and later modified and improved by Malinchoc et al. and Kamath et al. [11,12]. The MELD score, according to Forman, was a valuable addition to the repertoire of prognostic instruments, and it seemed likely to dethrone the Child-Turcotte-Pugh method in the prognosis of chronic liver disease [45]. In contrast, Cholongitas stated that MELD did not perform better than the Child-Turcotte-Pugh score in non-transplant settings [46]. In this study, MELD had a lower pooled AUC than CTP (AUC: 0.788 vs. 0.824) for in-hospital mortality but had the highest pooled AUC of 0.798 for follow-up mortality. This means that MELD was not as good as CTP in predicting in-hospital mortality but performed better in predicting patient outcomes during the 3-month follow-up.

GBS
Stanley suggested that the GBS could identify UGIB patients who can be managed safely as outpatients, with an area under the ROC curve of 0.90 [47]. An international multicentre prospective study involving 3012 patients showed that GBS was best (AUC: 0.86) at predicting intervention or death [48]. In that study, only 143 patients had AVB, accounting for just 7% of the patients who underwent endoscopy. In our meta-analysis, which included only AVB patients, the results were different. GBS showed low predictive power for both mortality and rebleeding, with no AUC value above 0.7. The cause might be that the GBS was developed mostly in ANVUGIB patients, who usually have a milder condition and a better prognosis.
[Table footnotes: 3-month mortality and 3-month rebleeding. a 1: retrospective; 2: prospective; "-": not mentioned. b AUCs were less than 0.5 and were not included in the meta-analysis.]

AIMS65
The AIMS65 is a scoring system designed by Saltzman et al. based on the analysis and integration of clinical data from 29,222 patients. It is mainly used to assess the fatality rate of UGIB patients [9]. Hyett et al. compared AIMS65 and GBS in 278 UGIB patients and suggested that the AIMS65 score was superior in predicting inpatient mortality (AUC, 0.93 vs. 0.68, p < .001) [49]. The results of this meta-analysis, which concerned only AVB patients, were similar to the previous research. AIMS65 performed better than GBS, with a pooled AUC of 0.793 for in-hospital mortality and 0.77 for follow-up mortality.

Rockall score
The Rockall score was developed based on a prospective, unselected, multicentre study in 1996 [8]. Robertson reported that the full Rockall score had an AUC of 0.78 in predicting AUGIB inpatient mortality based on 424 study patients [50]. According to our meta-analysis, the results were similar in AVB patients. The full Rockall score had good predictive power for in-hospital mortality, with a pooled AUC of 0.75. However, its predictive power was low for follow-up mortality (AUC: 0.678). The fact that not all patients could undergo endoscopy limited the application of the full Rockall score. The clinical Rockall score was introduced to solve that problem. However, compared with other risk scores, few of the included articles concerned the clinical Rockall score (n = 3), and the meta-analysis for in-hospital mortality could not be conducted. The pooled AUC for follow-up mortality was 0.704, which showed that the clinical Rockall score had moderate predictive power.

Strengths and limitations
To our knowledge, no previous study has conducted a meta-analysis to compare the predictive value of risk scores in AVB patients, even though AVB is life-threatening and these patients would benefit most from risk stratification. Furthermore, the search was conducted in six different databases, allowing for greater comprehensiveness in the systematic search. The small number of included studies of the clinical and full Rockall scores when pooling sensitivity and specificity is a limitation of this meta-analysis.
When pooling sensitivity and specificity for the six risk scores, we found a high I² statistic, indicating significant heterogeneity. Most of the included AVB patients had cirrhosis. However, the aetiology of cirrhosis could differ, for example viral hepatitis, alcohol and other causes. Moreover, some studies included AVB patients with portal hypertension caused not only by cirrhosis but also by other conditions. The different aetiologies of oesophageal and gastric varices might cause selection bias. There was also clinical heterogeneity, as studies used different follow-up times. We dealt with the clinical heterogeneity by performing subgroup analyses for different follow-up times. Artificial intelligence is showing considerable potential in risk stratification. Shung et al. developed and validated a machine learning model for UGIB, showing 100% sensitivity with a specificity of 26% [51]. A similar method could be introduced for AVB patients, which could be more feasible and helpful.

Addition to previous research
To our knowledge, Ramaekers et al. performed a systematic review of the predictive value of risk scores in UGIB patients [52]. That study included all UGIB patients in the emergency department and did not perform subgroup analyses for AVB and ANVUGIB. Besides, the CTP and MELD scores were not included. The review concluded that GBS, with a pooled sensitivity of 0.99 and a specificity of 0.08 (cutoff score = 0), was superior to other risk scores for accurately identifying low-risk UGIB patients. Moreover, according to both US and UK guidelines, a GBS of zero is recommended to classify very low-risk AUGIB patients who can avoid admission [4,6]. However, the results of our meta-analysis were different. GBS showed low predictive power for both mortality and rebleeding, with no AUC value above 0.7. Besides, with a pooled sensitivity of 0.783 and a specificity of 0.493 for overall mortality, GBS was inferior to CTP, which had a pooled sensitivity of 0.848 and a specificity of 0.707. This indicated that GBS performed well mainly in UGIB patients overall, especially in NVUGIB patients, rather than in AVB patients. However, the cause of bleeding is known only once endoscopy has been completed. Endoscopy is needed to achieve the highest accuracy, which makes GBS less favourable. Thus, if a patient is suspected of bleeding from varices, caution is needed when using GBS to identify low-risk patients. Combining CTP with GBS in risk stratification might be a safer but more complicated choice and needs further validation. Horibe et al. developed a novel and simple scoring system, namely HARBINGER, to predict the outcomes for nonvariceal and variceal bleeding patients [53]. Their study showed that HARBINGER had greater accuracy than the GBS in predicting the urgency for an endoscopic intervention in all-cause UGIB patients (AUC, 0.74 vs. 0.63; p < .001). This simple score was further validated in a prospective multicentre Japanese setting involving 1486 patients. A triage cut-off of one proved accurate in ruling out suspected UGIB patients, with a sensitivity of 98.8% and a specificity of 15.5% [54].
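The comparison of GBS and CTP for overall mortality can be summarized in a single number with Youden's index (sensitivity + specificity - 1). This index is not reported in the review; it is used here only as an illustrative summary of the pooled values quoted above, and the function name is hypothetical.

```python
def youden_index(sensitivity, specificity):
    """Youden's J: 0 means no discrimination, 1 means perfect discrimination."""
    return sensitivity + specificity - 1


# Pooled values for overall mortality reported in this review.
gbs_j = youden_index(0.783, 0.493)
ctp_j = youden_index(0.848, 0.707)

print("GBS:", round(gbs_j, 3), "CTP:", round(ctp_j, 3))
```

On this measure, CTP roughly doubles the value obtained for GBS, in line with the conclusion that GBS should be used cautiously when varices are suspected.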

Conclusion
No ideal risk score was identified by our systematic review (CTP, MELD, GBS, AIMS65, full Rockall and clinical Rockall). CTP was superior to other risk scores in identifying AVB patients at high risk of death in hospital and patients at low risk of death within follow-up. Guidelines have recommended the use of GBS for risk stratification of patients with upper gastrointestinal bleeding. However, when oesophageal and gastric varices are suspected as the cause of upper gastrointestinal bleeding, extra care should be taken, because this meta-analysis found that the ability of GBS to predict death and rebleeding in AVB patients was limited. More research is needed to validate this in the future. Artificial intelligence might be an important direction for future development to help risk-stratify AVB patients.

Disclosure statement
No benefits in any form have been received or will be obtained from a commercial party related directly or indirectly to the subject of this article.

Author contributions
HC conceived the study. LY, NW and RS performed the analysis. LY was a major contributor in writing the manuscript. All authors have read and approved the manuscript.

Data availability statement
The study presents an analysis of data published in other studies.

Funding
The author(s) reported there is no funding associated with the work featured in this article.