Radiomics analysis of ultrasound to predict recurrence of hepatocellular carcinoma after microwave ablation

Abstract
Objective: To develop and validate an ultrasonic radiomics model for predicting the recurrence and differentiation of hepatocellular carcinoma (HCC). Convolutional neural network (CNN) ResNet 18 and Pyradiomics were used to analyze gray-scale ultrasonic images to predict the prognosis and degree of differentiation of HCC.
Methods: This retrospective study enrolled 513 patients with HCC who underwent preoperative gray-scale ultrasonic imaging, and their clinical characteristics were observed. Patients were randomly divided into training (n = 413) and validation (n = 100) cohorts. CNN ResNet 18 and Pyradiomics were used to analyze ultrasonic images of HCC and peritumoral images to develop a prognostic and differentiation model. Clinical characteristics were integrated into the radiomics model and patients were stratified into high- and low-risk groups. The predictive effect was evaluated using the C-index and receiver operating characteristic (ROC) curve.
Results: The model combining ResNet 18 and clinical characteristics achieved good predictive ability. The C-indices of early recurrence (ER), late recurrence (LR), and recurrence-free survival (RFS) were 0.695 (0.561–0.789), 0.715 (0.623–0.800) and 0.721 (0.647–0.795), respectively, in the validation cohort, which was superior to the clinical model and ultrasonic semantic model. The model could stratify patients into high- and low-risk groups, which showed significant differences (p < 0.001) in ER, LR, and RFS. The area under the curve for predicting the degree of HCC differentiation was 0.855 and 0.709 in the training and validation cohorts, respectively.
Conclusion: We developed and validated a radiomics model to predict HCC recurrence and HCC differentiation, which could also acquire pathological information in a noninvasive manner.
KEY RESULTS
A prognostic prediction model for hepatocellular carcinoma (HCC) was developed and validated using convolutional neural network (CNN) ResNet 18 applied to gray-scale ultrasound (US) images. A model for preoperative prediction of HCC differentiation was developed, avoiding invasive procedures. Compared with Pyradiomics, CNN ResNet was more suitable for extracting information from US images.


Introduction
Hepatocellular carcinoma (HCC) is the fourth most common cancer and the sixth most common cause of tumor-related mortality worldwide [1]. According to the World Health Organization, more than one million patients will die from HCC by 2030. Chinese patients account for 51.2% of all HCC patients worldwide; HCC has a prevalence of 36.5/10,000 and a mortality rate of 31.9/10,000 in China [2].
Ultrasound (US) is cost-effective during the diagnostic, therapeutic, and postoperative monitoring processes [3][4][5]. In particular, ablation has been included in many guidelines for the treatment of early-stage HCC [6][7][8]. Among the imaging modalities used to guide ablation, US is the most widely used and plays a pivotal role in preoperative decision making, operative guidance, and postoperative follow-up because of its real-time capability [9], which is unique in surveillance.
Nevertheless, because of the heterogeneity of tumors, the prognosis of patients with HCC varies and is challenging to predict. Among HCC patients who receive curative therapy, 50-70% develop recurrence within 5 years [1][10][11][12]. Additionally, the degree of differentiation of HCC and the presence of metastasis may affect patient outcomes. Previous staging systems, such as the Barcelona Clinic Liver Cancer, tumor-node-metastasis, and Hong Kong Liver Cancer systems, have been proposed for recurrence prediction based on clinical findings (e.g., tumor size and number). However, patients with similar sizes and numbers of tumors have different prognoses, which might be attributed to histopathology, such as tumor grade and heterogeneity, and identifying this information before biopsy or surgical resection is very difficult.
In 2012, Professor Lambin [13] proposed the concept of radiomics, which extracts from medical images a large amount of information related to the microstructure of tumors that cannot be identified by the naked eye, such as the histopathological structure and heterogeneity of the tumor. The information extracted from medical imaging may be related to the prognosis of HCC.
Recently, some radiomics studies have focused on the relationship between HCC recurrence and computed tomography (CT) or magnetic resonance imaging (MRI) [14][15][16][17][18]. Although they achieved a good predictive ability [19] using preoperative medical imaging features, the cost and concern for radiation exposure have limited the use of these techniques. To the best of our knowledge, only a few researchers have focused on US imaging. Therefore, we developed and validated a radiomics model based on gray-scale US to predict the recurrence of HCC, stratify patients into different risk groups, and predict the pathological differentiation of HCC in a noninvasive manner.

Materials and methods
We retrospectively collected data of 1880 patients with HCC who underwent microwave ablation (MWA) at our department between January 2009 and December 2017. Ethical approval was granted by the institutional ethics committee. The approved protocol number in the Human Subjects section is S2019-348-01. US imaging and clinical information were retrieved from the database. The inclusion criteria were as follows: (1) HCC confirmed by pathology or by two contrast-enhanced imaging modalities; (2) a single tumor ≤ 5 cm or tumor number ≤ 3 and maximum diameter ≤ 3 cm; (3) availability of clear gray-scale US images within 2 weeks before MWA; and (4) complete clinical characteristics. The exclusion criteria were as follows: (1) other treatments before MWA (such as surgical resection, radiation, or trans-arterial chemoembolization); (2) concomitant other malignant tumors; (3) loss to follow-up; (4) unclear US images or incomplete tumor on US; and (5) incomplete ablation and local tumor progression. A total of 513 patients were included in this study and randomly divided into a training cohort (400 patients) and validation cohort (113 patients) at a ratio of 4:1. Of the 513 patients, 270 had a clear pathological grade diagnosis and were divided into a training cohort of 197 patients and validation cohort of 73 patients. The primary endpoint was recurrence-free survival (RFS), which was defined as the time between the start of the operation and disease recurrence or death according to consensus guidelines for the definition of time-to-event endpoints [20].
The secondary endpoint was early recurrence (ER), which was defined as the time between the start of the treatment and recurrence within 2 years. Late recurrence (LR) was defined as the time between treatment and recurrence within 5 years. A flowchart of the process is shown in Figure 1. The requirement for informed consent was waived because of the retrospective study design.

Treatment procedure
Ablation procedures were performed by experienced doctors (J. Y. and Z. G. C.), who had more than 15 years of experience in ablation. The patients were placed in the supine or oblique position. Before the MWA antennae were placed, color Doppler and conventional US imaging were performed to determine the proper path for inserting the antennae. Subsequently, 1% lidocaine was injected between the skin and peritoneum along the scheduled path. Intravenous anesthesia was administered during ablation. All MWA antennae were placed on the target lesion under contrast-enhanced ultrasound (CEUS)/US guidance. During ablation, the surgeon monitored whether the ablation zone (hyperechoic area) completely covered the target lesion. After ablation, CEUS was performed to evaluate whether complete ablation had been achieved. Complete ablation was defined as an ablative margin covering at least 5 mm of the peri-tumor liver tissue. The procedure was performed using a cooled-shaft microwave system (KY-2000; Kangyou Medical, China).

US imaging acquisition and analysis
We retrospectively collected preoperative gray-scale US images and selected dynamic US videos, including complete liver lesions, sketched liver lesions, and liver background. The software used for sketching was LiverSketch, which was designed at Zhejiang University. All images were in DICOM format. The images were sketched into regions of interest (ROIs) by two doctors (Z. S. Y. and Z. J. K.) trained in US with more than 10 years of experience in liver US examination. Images of the liver lesions were sketched from their first appearance in the video until the lesions disappeared. The ROI was strictly based on the morphological and edge characteristics of the lesion and background, the shadow behind the lesions was kept out of the ROI, and unclear lesions were eliminated during this process. After sketching, a US doctor (Y. J.) with 15 years' experience reviewed these images to ensure a proper outline; images that did not pass the review were sketched again until the standard was met. Before using deep learning and radiomics to analyze the prognostic value of US images, we investigated whether some conventional US features were associated with the prognosis of HCC. During the process of outlining the ROI, we also assigned six US semantic features: echogenicity (hyper-, iso-, hypo-, or heterogeneous echogenicity), morphology (round, oval, or irregular), hypoechoic halo (without, thin, or thick hypoechoic halo), boundary (smooth or not smooth), posterior acoustic enhancement (absent or present), and intratumoral vascularity (absent or present). These features were analyzed in relation to HCC prognosis.

ResNet and pyradiomics model
Before the ROIs of the liver lesions and the peritumoral images were input, the US images were pre-processed with data augmentation (translation, horizontal flipping, and vertical flipping) to enlarge the training data and reduce overfitting. Data augmentation was performed only in the training cohort. Two methods were used to develop predictive models: convolutional neural network (CNN) Residual Network 18 (ResNet 18) and Pyradiomics software [21].
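The three augmentation operations named above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the shift sizes, and the zero padding are assumptions chosen only for illustration, and images are represented as plain lists of pixel rows.

```python
from typing import List

Image = List[List[int]]  # a 2-D gray-scale image as rows of pixel values


def flip_horizontal(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def flip_vertical(img: Image) -> Image:
    """Mirror the image top to bottom."""
    return [row[:] for row in img[::-1]]


def translate(img: Image, dx: int, dy: int, fill: int = 0) -> Image:
    """Shift the image by dx columns and dy rows; exposed pixels get `fill`."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = img[y][x]
    return out


def augment(img: Image, shifts=((5, 0), (0, 5))) -> List[Image]:
    """Return the original image plus flipped and translated copies.

    The shift values are hypothetical; the paper does not report them.
    """
    out = [img, flip_horizontal(img), flip_vertical(img)]
    out += [translate(img, dx, dy) for dx, dy in shifts]
    return out
```

Applied only to training images, each input yields several variants, which is how augmentation enlarges the training cohort without touching the validation cohort.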

ResNet 18 model
We then input the ROI of the training cohort into the classification network and revised the model after comparing it with the real label. In this process, the predicted value was the average value of every US image. Finally, a feature map was created including information on the input US images. In this process, the error continued to decrease in the training cohort; however, the error in the validation cohort decreased at the beginning and then increased. The training was stopped when the validation cohort achieved the best predictive effect. Peritumoral information was collected by outreaching 40 pixels. The workflow is illustrated in Figure 2.
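The stopping rule and the per-patient averaging described above can be sketched as follows. This is a simplified illustration under stated assumptions: the `patience` parameter and both function names are hypothetical, and the paper does not say which validation metric or tolerance was used.

```python
from typing import List, Tuple


def best_epoch(val_losses: List[float], patience: int = 5) -> Tuple[int, float]:
    """Pick the checkpoint to keep from a sequence of validation losses.

    Scanning stops once `patience` epochs pass without improvement
    (patience is an assumed setting, not taken from the paper).
    Returns (epoch index, validation loss) of the best epoch seen.
    """
    best_idx, best_loss = 0, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_idx, best_loss = epoch, loss
        elif epoch - best_idx >= patience:
            break  # validation error has started to rise: stop training
    return best_idx, best_loss


def patient_score(frame_scores: List[float]) -> float:
    """Average the per-image predictions of one patient into a single value."""
    if not frame_scores:
        raise ValueError("a patient needs at least one image")
    return sum(frame_scores) / len(frame_scores)
```

For a validation loss that falls and then rises, such as 0.9, 0.7, 0.6, 0.65, 0.7, the sketch keeps the third epoch, which matches the behavior described in the text.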

Pyradiomics model
Pyradiomics is an open-source Python package that can extract radiomics features from two-dimensional or three-dimensional medical images. After the ROI is acquired, the software can extract first-order statistics, shape-based features, texture-based features, and so on. Principal component analysis and lasso regression were used after features were extracted to reduce redundancy and dimensions. Logistic regression, random forest, and support vector machines were used to identify features and calculate the prediction value. Correlation analysis was used to screen for US features associated with recurrence of HCC. Peritumoral information was collected by extending the ROI outward by 40 pixels.
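One plausible reading of the 40-pixel peritumoral region is a lesion bounding box enlarged by 40 pixels on every side and clipped to the image. The sketch below illustrates that reading only; the paper does not state whether a bounding box or a dilated contour was used, and the names `expand_bbox` and `crop` are hypothetical.

```python
from typing import List, Tuple

BBox = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels


def expand_bbox(bbox: BBox, width: int, height: int, margin: int = 40) -> BBox:
    """Grow a lesion bounding box by `margin` pixels per side, clipped to the image."""
    x_min, y_min, x_max, y_max = bbox
    return (
        max(0, x_min - margin),
        max(0, y_min - margin),
        min(width, x_max + margin),
        min(height, y_max + margin),
    )


def crop(image: List[List[int]], bbox: BBox) -> List[List[int]]:
    """Crop a row-major 2-D image (list of rows) to the bounding box."""
    x_min, y_min, x_max, y_max = bbox
    return [row[x_min:x_max] for row in image[y_min:y_max]]
```

Clipping matters for lesions close to the edge of the US frame, where a full 40-pixel margin is not available on every side.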

Follow-up
CEUS was performed immediately after MWA to evaluate the treatment effect. If CEUS showed incomplete ablation, secondary ablation was performed immediately under US guidance. In the first month after ablation, contrast-enhanced MRI was performed to evaluate the effect of treatment. At the third month after ablation, gray-scale US was used to examine whether recurrence or intrahepatic metastasis had developed. If recurrence was suspected, contrast-enhanced imaging was performed for further evaluation. CE-MRI was performed once every month till the third year after initial therapy and then rechecked once a year until the endpoint, which included intrahepatic recurrence, extrahepatic metastasis, and death.

Statistics
Continuous variables were analyzed using Student's t-test. Categorical variables were analyzed using the chi-square test or Fisher's exact test. Univariate and multivariate Cox regression analyses were used to identify variables related to HCC prognosis. Variables with p < 0.1 in univariate analysis were included in the multivariate Cox regression. The results are presented as hazard ratios (HRs) and 95% confidence intervals (CIs). The predictability of the model was evaluated using the area under the curve (AUC), C-index, and 95% CI. The log-rank test was used to evaluate whether there were significant differences between the different risk groups. Statistical significance was set at p < 0.05. All statistical analyses were performed using R software (version 3.6.1, R Foundation for Statistical Computing, http://www.r-project.org). The R packages used in this study were survival, pROC, ggplot2, survminer, rms, survcomp, and nomogramEx.
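To make the C-index concrete, the following sketch computes Harrell's concordance index from follow-up times, event indicators, and model risk scores. It is a simplified illustration written in Python, not the R code used in the study: pairs with tied event times are ignored, and the function name is an assumption.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float], events: Sequence[int], risks: Sequence[float]
) -> float:
    """Harrell's C: share of comparable pairs ranked correctly by risk.

    A pair (i, j) is comparable when patient i had an observed event
    (events[i] == 1) strictly before the follow-up time of patient j.
    It is concordant when the model gave i the higher risk; tied risks
    count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored patients cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives a scale for reading the C-indices reported in this study.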

Results
A total of 513 patients were included in this study. The clinical characteristics of the patients are summarized in Table 1. The mean age was 58.6 ± 11.0 years, and the mean tumor size was 2.96 ± 1.13 cm. Complications are shown in Table E1 according to the CIRSE classification system. A total of 146 patients developed ER, 226 patients developed LR, and 293 patients developed recurrence or death during the follow-up interval. No significant differences were observed between the training and validation cohorts. In the model for predicting HCC differentiation, there was no significant difference between the cohorts. The US semantic features are presented in Figure E1 and Table E2. The Cox regression analysis of the US semantic features is shown in Table E3. Among these features, intratumoral vascularity might be associated with ER (HR = 1.712, 95% CI: 1.233-2.378, p = 0.001), LR (HR = 1.449, 95% CI: 1.114-1.884, p = 0.006) and RFS (HR = 1.349, 95% CI: 1.071-1.698, p = 0.011). However, after validation by Kaplan-Meier analysis, there was no significant difference in any semantic feature, as shown in Figure E2. For the ResNet 18 and Pyradiomics models constructed from tumor and peritumoral imaging, the AUCs of the ResNet 18 model for predicting ER, LR, and RFS were 0.685, 0.748, and 0.728 in the training cohort and 0.694, 0.653, and 0.614 in the validation cohort, respectively, which were superior to those of the Pyradiomics model. Adding peritumoral images improved predictive performance over tumor images alone. The receiver operating characteristic (ROC) curves are shown in Figure E3.

Developing prognostic model combined with the clinical characteristics
Clinical characteristics were analyzed using univariate and multivariate Cox regression, as shown in Table 2. After multivariate Cox regression analysis, diameter, tumor number, AFP (alpha-fetoprotein), ALBI (albumin-bilirubin) grade, AST (aspartate aminotransferase), and TBIL (total bilirubin) were related to ER; diameter and tumor number were related to LR, RFS, and AFP; and PLT (platelet count) was related to RFS. We integrated clinical characteristics and ResNet 18 and Pyradiomics into a multivariate Cox regression model. Table 3 summarizes the C-index of different models. For ER, the Pyradiomics mixed model was superior to the clinical model (p < 0.001), and the ResNet 18 mixed model was better than the Pyradiomics model (p < 0.001) in the training cohort; however, there was no significant difference in the validation cohort between the three models (p > 0.05). For LR, the Pyradiomics mixed model was better than the clinical model (p < 0.001). There was no difference between the Pyradiomics and ResNet 18 models in the training cohort (p = 0.092). For RFS, the ResNet mixed model was better than the clinical and Pyradiomics models in both the training (p < 0.001) and validation cohorts (p = 0.043).

Risk stratification and nomogram
The ResNet 18 model achieved the best performance among the four models. After the ResNet 18 mixed model was developed, patients were divided into two risk groups: high-risk and low-risk. The Kaplan-Meier method was used to compare the two groups for ER, LR, and RFS, as shown in Figure 3. There was a significant difference (p < 0.001) between the two groups for ER, LR, and RFS in both the training and validation cohorts. The high-risk group had a higher risk of recurrence than the low-risk group. We also drew a nomogram to visualize the Cox regression model of RFS, as shown in Figure 4. The calibration curves of the training and validation cohorts are shown in Figure E4.
Table E4 shows the clinical characteristics, and there was no significant difference in clinical characteristics between the training and validation cohorts. Considering the better performance of the ResNet 18 model, we used ResNet 18 to develop a model to predict the differentiation of HCC; tumor, peritumoral, and clinical information were included in this model. Table E5 shows that the mixed model combining tumor, peritumoral, and clinical information achieved the best predictive value in both the training and validation cohorts. The AUC and true positive rates were 0.885 and 0.853 in the training cohort, and 0.709 and 0.769 in the validation cohort, respectively. The ROC curves are shown in Figure 5.
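The risk stratification and group comparison reported in this section can be sketched in a few lines. The sketch assumes a median cutoff on the model's risk score, which the paper does not state, and implements a standard two-group log-rank test in Python rather than the R survival functions used in the study; both function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def stratify(scores: Sequence[float], cutoff: Optional[float] = None) -> List[str]:
    """Label each risk score 'high' or 'low'.

    The default cutoff is the median score (an assumption for illustration).
    """
    if cutoff is None:
        s = sorted(scores)
        mid = len(s) // 2
        cutoff = s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2
    return ["high" if x > cutoff else "low" for x in scores]


def logrank_test(
    times: Sequence[float], events: Sequence[int], groups: Sequence[str]
) -> Tuple[float, float]:
    """Two-group log-rank test; returns (chi-square statistic, p-value), 1 df."""
    obs_minus_exp = 0.0
    variance = 0.0
    for t in sorted({tt for tt, e in zip(times, events) if e}):
        # Patients still under observation just before time t.
        at_risk = [(tt, e, g) for tt, e, g in zip(times, events, groups) if tt >= t]
        n = len(at_risk)
        n_high = sum(1 for _, _, g in at_risk if g == "high")
        d = sum(1 for tt, e, _ in at_risk if tt == t and e)
        d_high = sum(1 for tt, e, g in at_risk if tt == t and e and g == "high")
        obs_minus_exp += d_high - d * n_high / n
        if n > 1:
            variance += d * (n_high / n) * (1 - n_high / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("test undefined: only one group is ever at risk")
    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

A small p-value indicates that the recurrence curves of the two groups differ, which is the comparison summarized by the Kaplan-Meier analysis in Figure 3.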

Discussion
This study aimed to develop a radiomics model based on US imaging to predict the recurrence and pathological differentiation of HCC. First, the relationship between the semantic features and prognosis was investigated. Cox regression and Kaplan-Meier analyses showed that there was no relationship between US semantic features and the prognosis of HCC. Subsequently, CNN ResNet 18 and Pyradiomics were used to develop a prognostic prediction model. The two radiomics models performed better than the clinical model. CNN ResNet 18 achieved a better effect than Pyradiomics. The relatively poor predictive effect of Pyradiomics might be attributed to the fact that this software is mostly used for X-ray, CT, and MRI images [22][23][24] and might not be suitable for US images. US has the advantages of real-time, dynamic imaging, but these same characteristics pose challenges for radiomics analysis because, unlike CT or MRI, US images are not standardized during the examination. Radiomics analysis of US images is relatively widely used in superficial organs, such as breast lesions, lymph nodes, and thyroid [25][26][27], but rarely in abdominal organs because deeper organs may be affected by gas in the gastrointestinal tract and lungs. In addition, Fornacon-Wood et al. [28] pointed out that the prognostic value is highly dependent on the platform used to extract features. After the addition of peritumoral US images, there was a significant difference (p < 0.01) in the CNN ResNet 18 model, in which the AUC was 0.54-0.69 for ER and 0.59-0.65 for LR, but there was no difference in RFS. This is similar to previous research showing that peritumoral images could improve the predictive value of radiomics analysis [19,29]. However, there was no significant difference after the addition of peritumoral imaging information to the Pyradiomics model. This suggests that CNN ResNet can extract more information than Pyradiomics.
The ResNet 18 mixed model was superior to the other three models in both the training and validation cohorts in terms of LR and RFS, indicating that radiomics analysis of US images could extract the prognostic information hidden in the images, which might help doctors stratify patients and make better clinical decisions, similar to previous research [30,31]. CNN ResNet 18 also achieved good performance in predicting the differentiation of HCC. The AUCs were 0.885 and 0.709 in the training and validation cohorts, respectively, so the model could be used to estimate the degree of differentiation of HCC before biopsy or surgery. Most US radiomics studies have focused on the differential diagnosis between benign and malignant lesions [31,32] or between HCC and cholangiocarcinoma [33]. However, the differentiation of HCC is vital and may influence follow-up decisions after treatment. Our study presented a viable method for acquiring this information noninvasively and achieved a performance comparable to that of existing research using MRI [17,34]. Moreover, our study focused on static gray-scale US images without CEUS. A few studies have explored the value of CEUS in clinical decisions and the prediction of recurrence [30,35]. These studies mostly focused on the use of CEUS in superficial organs. However, there are no mature programs for handling tens of thousands of frames of CEUS images, especially for deeper abdominal organs such as the liver, whose lesions are influenced by breathing. Further studies should explore the information hidden deep in CEUS images.

Limitations
This study has several limitations, the first of which is its retrospective nature. Although quality control was applied to the US images, a gap remains compared with prospective studies. Second, this was a single-center study, and its predictive value needs to be validated in an external cohort. Third, CEUS should be included in further studies; CEUS presents far more information than gray-scale US, including time-related flow perfusion information, although delineating the tumor across different time phases remains a key problem.

Conclusion
Using radiomics to analyze US images was a feasible, convenient, and effective way to predict the prognosis and differentiation of HCC. The CNN ResNet 18 model performed better than Pyradiomics in predicting HCC recurrence. Noninvasive prediction of the degree of differentiation could avoid invasive examinations such as biopsy and surgery, assist clinical decision making and follow-up, and help achieve better treatment effects.