Radiomics in surgical oncology: applications and challenges

Abstract
Surgery is a curative treatment option for many patients with malignant tumors. Increased attention has focused on the combination of surgery with chemotherapy, as multimodality treatment has been associated with promising results in certain cancer types. Despite these data, there remains clinical equipoise on optimal timing and patient selection for neoadjuvant or adjuvant strategies. Radiomics, an emerging field involving the extraction of advanced features from radiographic images, has the potential to revolutionize oncologic treatment and contribute to the advancement of personalized therapy by helping predict tumor behavior and response to therapy. This review analyzes and summarizes studies that use radiomics with machine learning in patients who have received neoadjuvant and/or adjuvant chemotherapy to predict prognosis, recurrence, survival, and therapeutic response for various cancer types. While studies in both neoadjuvant and adjuvant settings demonstrate above-average performance in predicting progression-free and overall survival, there remain many challenges and limitations to widespread implementation of this technology. The lack of standardization of common practices to analyze radiomics, limited data sharing, and absence of auto-segmentation have hindered the inclusion and rapid adoption of radiomics in prospective clinical studies.


Introduction
Cancer is the second leading cause of death worldwide, accounting for an estimated 9.6 million deaths in 2018 [1]. Advances in oncologic treatment, including novel surgical techniques, combined modality approaches and new chemotherapeutic regimens, have led to significant improvements in both cancer-related mortality and disease-free survival in multiple tumor types. Efforts to best identify patients who will respond to and benefit from such therapies remain a critical research priority in oncology.
Chemotherapy and surgery are the mainstay treatments for many tumor types [2,3]. While surgery remains the curative treatment for many solid tumors, combined surgical treatment with chemotherapy has been associated with improved survival and systemic disease control [4,5]. Neoadjuvant chemotherapy, which refers to the administration of systemic treatment before surgery, is routinely administered for inoperable breast, colorectal and lung cancers, and is also an option for other solid tumors [6][7][8][9]. Such upfront therapy is intended to reduce the size of the tumor and control progression of disease. Downstaging the tumor with upfront systemic therapy may increase margin-negative resections, make previously inoperable tumors resectable, and help manage micro-metastatic disease [10][11][12]. Adjuvant chemotherapy is provided after surgical intervention and is designed to reduce recurrence of disease [13]. Adjuvant chemotherapy is routinely used in high-risk patients with breast, colon, testicular, ovarian, lung, and pancreatic cancers [14].
As there are risks associated with both neoadjuvant and adjuvant chemotherapy, efforts to determine which patients may benefit from treatment are warranted. For neoadjuvant chemotherapy, progression of disease or development of metastasis may preclude surgical intervention, rendering the disease harder to treat [15]. Additionally, side effects from neoadjuvant therapy may leave a patient unfit for surgery. Adjuvant chemotherapy may similarly lead to long term complications due to the side effects from prolonged treatment [16]. Identifying predictors of treatment effectiveness would not only improve patient selection but also optimize the timing of these treatments.
Studies evaluating imaging modalities such as computed tomography (CT), contrast-enhanced ultrasound, magnetic resonance imaging (MRI), and positron emission tomography (PET) to determine patients' response to chemotherapy have given inconsistent results [17,18]. Analysis in these studies relied on either the sole expertise of radiologists or limited quantitative imaging parameters [17,18]. While radiologists have the training and ability to discern many imaging features, microscopic details relevant to clinical outcomes may be missed due to the limitations in human visualization [19].
Radiomics is an emerging and ever evolving field of study involving the extraction of many advanced quantitative features from radiographic images [20]. Computer algorithms are used to mine these features by analyzing multiple aspects of the images (i.e. physical, textural, histogram, filter-based, and fractal) [20]. Radiomics have been used to characterize tumor phenotypes for predicting prognosis and therapeutic response for various diseases [21,22]. Radiomics serve as a diagnostic and prognostic tool, which can be applied at many different timepoints in oncologic care. Use of this technology in the diagnosis stage, treatment planning stage, and surveillance stage (as outlined below in Figure 1) can complement traditional oncologic care and help personalize treatment options for patients with solid tumors. In this paper, we analyze and summarize papers that use radiomics to predict prognosis, recurrence, survival, and therapeutic response to chemotherapy in both neoadjuvant and adjuvant settings. Such data can be leveraged to appropriately identify patients who may benefit from perioperative chemotherapy and help optimize timing of surgery.

Technical approaches
Radiomics is an advanced computational method that derives radiological features from medical images with the ability to identify, diagnose, and predict patient response for various diseases. Radiomics can be derived explicitly using so-called handcrafted features, or implicitly using deep learning approaches. The general workflow of radiomics involves four main components: image acquisition, tumor segmentation, radiomic signature, and machine learning analysis. Each aspect of the workflow is complex, and the implementation varies across institutions and research groups. Figure 2 gives a general overview of the radiomic process from patient to predicted outcome.

Patient cohorts
The studies summarized in this paper were identified by combinations of the following search terms on Google: "adjuvant", "neoadjuvant", "cancer", "machine learning", "chemotherapy", and "radiomics". Twelve papers including patients who received neoadjuvant therapy and nine including patients who received adjuvant therapy are included in this review. These particular papers were chosen due to their varying cancer types, depth of knowledge conveyed about machine learning, and thorough understanding of radiomic analysis.
Figure 1. This graphic gives a top-level view of a patient's progression through initial screening, chemotherapy, surgery, and surveillance after resection. Radiomics serve as a diagnostic and prognostic tool, which can be applied at many different timepoints in oncologic care. Use of this technology in the diagnosis stage, treatment planning stage, and surveillance stage can complement traditional oncologic care and help personalize treatment options for patients with solid tumors [23].
Tables 1 and 2 illustrate the breadth of tumor types included in this review: ovarian, breast, gastric, pancreatic, rectal, gastroesophageal junction, cervical, prostate, sarcoma, mesothelioma, glioblastoma, lung, and skin and soft tissue. While the prognosis and management of each of these tumors is unique, radiomics provide a universal tool to aid in diagnosis and prognosis for these patients. In each study, standard protocols concerning patients and protected health information (PHI) are adhered to, such as ensuring the anonymization of PHI, having accountability to an institutional review board, and including waivers of informed consent. Clinical factors, which vary for each study, are acquired by reviewing patients' medical records. Patient inclusion and exclusion criteria are given that explain how the final cohort is developed. Each study reviewed patients for a duration of years, with some using a second or third cohort from another window of time as a validation set. Some studies highlighted in this review paper use cohorts from multiple institutions, with one set being the training data, and the other serving as the validation set. All of the performance metrics given in Tables 3 and 4 are derived from the validation sets of each paper.

Image acquisition & data processing
Each medical center represented in these studies uses different scanning equipment, parameters, and reconstruction methods in the acquisition of their images. The imaging modalities across the studies vary between computed tomography (CT) with or without contrast, magnetic resonance imaging (MRI) at 1.5 and/or 3.0 Tesla, radiomics of multiparametric MRI (RMM), and fluorodeoxyglucose-positron emission tomography (FDG-PET). The segmentation of tumor regions likewise varies between institutions and studies. Manual, semi-automatic, and automatic segmentation methods are represented across these studies. In all studies, the segmentation results were supervised and approved by one or more expert radiologists. Each study scaled and normalized the segmented images and applied dilation or filtration according to their standards. Tables 1 and 2 give the imaging modality used for each study.
Figure 2 (caption fragment). [...] tumor segmentation, radiomic signature, machine learning analysis, and the predicted outcome. These studies capture prognosis, recurrence, therapeutic response, survival, and tumor volume. These various outcome categories describe different aspects of the effectiveness of either neoadjuvant or adjuvant chemotherapy [24,25].

Radiomic analysis
The extraction of radiomic features differs between studies. The number of initial features ranges from the hundreds to the thousands. Matlab, the PyRadiomics library in Python, and in-house software developed by some institutions are used to extract radiomic features from the region of interest. Many different methods of feature selection are used to reduce the number of features. First, the majority of institutions remove duplicate values from the feature list. Most report excluding highly correlated features, or grouping highly correlated features together. These groups are identified by a Pearson linear correlation coefficient (the threshold varies by study), and all but the most significant features are eliminated. To further reduce the dimensionality of the features, the studies use one or more of these techniques: least absolute shrinkage and selection operator (LASSO), recursive feature elimination based on Naive Bayes (RFE-NB), selection by filter based on linear discriminant analysis (SBF-LDA), and recursive feature elimination based on random forest (RFE-RF). Final parsing of features is completed by using L1 regularization, Cox modeling, and the Mann-Whitney U test. Tables 1 and 2 summarize the radiomic features extracted, the features selected, and the models used to reduce feature dimensionality. Figure 3 gives a visual representation of a generalized version of the image data processing and radiomic analysis done in these studies.
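The correlation-based pruning step described above can be sketched in a few lines. This is a minimal illustration, not the procedure of any reviewed study: the function names `pearson` and `drop_correlated`, the 0.9 threshold, and the toy feature values are all assumptions, and the greedy rule "keep the first feature of each correlated group" stands in for "keep the most significant feature", so the caller is assumed to pass features already ordered by significance.

```python
from math import sqrt


def pearson(x, y):
    """Pearson linear correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant feature carries no linear correlation
    return cov / sqrt(var_x * var_y)


def drop_correlated(features, threshold=0.9):
    """Greedily keep a feature only if |r| with every kept feature is below threshold.

    `features` maps a feature name to its values across patients; insertion
    order is assumed to reflect significance (most significant first).
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy values for five patients (not taken from any study).
toy_features = {
    "volume": [1, 2, 3, 4, 5],
    "surface_area": [2, 4, 6, 8, 10.5],  # nearly collinear with volume
    "entropy": [5, 1, 4, 2, 3],
}
selected = drop_correlated(toy_features)
print(selected)
```

In this toy case the near-duplicate `surface_area` is discarded and the weakly correlated `entropy` is retained. A step such as LASSO would then typically be applied to the surviving features.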

Prediction modeling
The prediction models created from each study are guided by the desired outcome. The model types can be broken up into four main outcomes: prognosis, recurrence, survival, and therapeutic response. Studies analyzing recurrence or therapeutic response of any kind (ImmunoScore of gastric cancer (ISgc), pathologic complete response (pCR), pathologic good response (pGR), radiomics ImmunoScore (RIS), radiomic score (RS), tumor size) use supervised learning methods [26][27][28], whereas studies analyzing prognosis or survival (disease-free survival (DFS), overall survival (OS), progression-free survival (PFS), recurrence-free survival (RFS)) use unsupervised learning techniques. Software packages used for statistical analysis in these studies include Statistical Product and Service Solutions (SPSS) and R.

Deep learning
Deep learning is a recent advancement in artificial intelligence that incorporates many layers of computation to extract high-level features from raw input [47]. Deep learning has been useful in many areas of interest within medical applications such as survival prediction, image segmentation, and classification. These algorithms are increasingly implemented in medical imaging and have great potential for research and as an important tool in clinical radiology [48]. Unlike shallow machine learning algorithms that require hard-coded, handcrafted features, deep learning algorithms derive generalized features by learning on a training set. The model established through training is validated on an unseen test set and gives predictions of the probability of a certain class. Convolutional neural networks (CNNs), stacked denoising autoencoders (SDAs), and deep recurrent neural networks (RNNs) are examples of different types of deep neural networks. Figure 4 gives a top-level overview of the two ways radiomics can be implemented: explicitly, using handcrafted features and machine learning, or implicitly, using deep learning.
One drawback of deep learning is that these algorithms need a vast amount of data to detect reliable patterns for successful classification and regression. Individual medical centers may not have the volume of data needed to implement these algorithms. Transfer learning, data augmentation, and regularization techniques are some of the methods used to overcome the dearth of available data.
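As a rough illustration of data augmentation, the sketch below produces flipped and rotated copies of a single 2D image patch so that one annotated example yields several training samples. The helper names (`hflip`, `vflip`, `rot90_clockwise`, `augment`) and the 2x2 patch are hypothetical, images are represented as plain nested lists rather than arrays, and whether flips are anatomically appropriate for a given modality and tumor site is an assumption each study would need to check.

```python
def hflip(img):
    """Mirror the image left-to-right."""
    return [row[::-1] for row in img]


def vflip(img):
    """Mirror the image top-to-bottom."""
    return [row[:] for row in img[::-1]]


def rot90_clockwise(img):
    """Rotate the image by 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def augment(img):
    """Return the original patch plus three geometric variants."""
    return [img, hflip(img), vflip(img), rot90_clockwise(img)]


# Hypothetical 2x2 intensity patch.
patch = [[1, 2],
         [3, 4]]
variants = augment(patch)
for v in variants:
    print(v)
```

Each variant keeps the same outcome label as the original patch, which is what lets augmentation enlarge a small training set without new patients.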

Clinical applications
In the previous section, various technical approaches used to develop a radiomics model are summarized. This section describes the clinical applications of radiomics and highlights some of the performance metrics from the aforementioned studies. These compiled performance metrics do not describe the full range of results for each study, but a succinct picture of comparable performance metrics. Many of these studies combine their radiomic features with clinical and other quantitative features. These results report solely the radiomic feature results from these studies for simplicity. We do not attempt to analyze any of the data listed in this section, but show the range of metrics to give a sense of typical results from these types of studies. Tables 3 and 4 show the performance metrics for the radiomics studies highlighted in this review paper, separating the studies that focused on neoadjuvant chemotherapy from the studies that focused on adjuvant chemotherapy.

Improved tumor detection
Traditional imaging modalities, including CT and MRI, are critical for the characterization of tumor burden. These imaging modalities are utilized for clinical decision making, treatment planning, and prognosis. Despite high resolution images, small metastatic deposits may be missed. Clinically, detection of these micrometastases may impact surgical planning or treatment course. By using texture analysis to assess subtle changes in tissue, radiomics may allow for improved detection of early micrometastatic disease. One study, published in 2018, tested this hypothesis through evaluation of a mouse model in colorectal cancer. In this study, three independent texture features were associated with later appearance of metastatic deposits on MRI, thus suggesting that early tumor detection may be possible using this technique [49]. While many studies have focused on utilization of radiomics for distinguishing benign and malignant lesions or assessing therapy response, such data are promising as they suggest a role for radiomics in initial diagnosis.

Predicting response after neoadjuvant therapy
Appropriate selection of patients for neoadjuvant therapy remains a top research priority for a myriad of oncologic diseases. The intent of neoadjuvant therapy is to decrease tumor size, control micro-metastatic disease, and allow for appropriate patient selection for surgery. Radiomics have been studied as a tool for the a priori identification of patients who would benefit from such upfront chemotherapy. For patients with gastric cancer, for example, radiomics have been utilized to accurately predict response to systemic therapy. A recent study, including 106 patients with neoadjuvant chemotherapy before gastrectomy, introduced a CT-based radiomics score to predict response. The published "radclinicalscore" incorporated clinical variables and radiomic features, and was demonstrated to be highly effective at predicting treatment responders (AUC 0.77 in the training cohort, and AUC 0.82 in the validation cohort) [50].
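The AUC values quoted throughout this review can be read as the probability that a randomly chosen responder receives a higher model score than a randomly chosen non-responder. A minimal sketch of that pairwise definition follows; the function name `roc_auc` and the six labels and scores are illustrative assumptions and are not data from the cited study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5).

    `labels` holds 1 for responders and 0 for non-responders; `scores`
    holds the model output for each patient.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical radiomic scores for six patients.
example_labels = [1, 1, 0, 1, 0, 0]
example_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
example_auc = roc_auc(example_labels, example_scores)
print(round(example_auc, 3))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives context for values such as 0.77 and 0.82.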
In some tumors, such as locally advanced rectal cancer, pathologic complete response is possible after upfront chemoradiotherapy. As such, identification of treatment responders may obviate the need for resection in these patients. Notably, use of total neoadjuvant therapy (TNT) has become standard of care at many institutions, given the results of large-scale clinical trials in locally advanced rectal cancer [51]. Given this paradigm shift in rectal cancer treatment, new methods to predict pathologic complete response in this patient population have been developed [52]. Similarly, pretreatment radiomic features have been shown to predict pathologic response after neoadjuvant therapy in patients with non-small cell lung cancer and breast cancer [53,54]. These noninvasive proxies for tumor pathology have the potential to revolutionize locoregional options for cancer treatment and potentially avoid unnecessary surgery in treatment responders.

Selecting patients for adjuvant therapy
In the adjuvant setting, radiomics can be utilized to help clinicians prognosticate after surgery and determine which patients may require more aggressive treatment or follow-up. The integration of radiomics into clinical risk scores may improve risk stratification of patients and selection of patients who would benefit from adjuvant therapy after tumor resection. Such models have been successfully developed and validated in lung cancer [46]. Half of the adjuvant AUC results in this review range from 0.66 to 0.79, with the other half in the 0.80s (excluding an outlier of 0.47). The hazard ratios for the adjuvant studies do not reflect the fullness of each study, as they are often computed against patients under varying circumstances (i.e. those without surgery, without adjuvant chemotherapy, etc.). Nevertheless, the hazard ratios help show either the reduced risk or increased risk of an event for the patients in these studies. Taken together, the included studies appear to demonstrate reliable performance of radiomics in accurately predicting progression-free, disease-free, and overall survival. Although prediction of survival in the adjuvant setting does not appear to be as robust as prediction of progression in the neoadjuvant setting, these data, in conjunction with other clinical, pathologic and treatment features, may help clinicians better prognosticate after surgical intervention.

Selecting patients for surgery
The use of medical imaging to accurately predict tumor behavior can significantly impact surgical decision-making and treatment planning for cancer patients. Such technology can help clinicians better identify operative candidates based on preoperative clinical and imaging variables. In hepatocellular carcinoma, for example, identification of microvascular invasion (MVI) preoperatively may influence decisions to pursue resection or transplantation. To study the role of radiomics in predicting MVI, Zheng et al. evaluated 120 patients from two institutions who underwent resection of hepatocellular carcinoma. These data demonstrate that quantitative features could predict MVI among patients with tumors 5 cm with an AUC of 0.80, positive predictive value of 63 percent and negative predictive value of 85 percent [55].
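Positive and negative predictive values such as those reported above are computed from the confusion matrix at a chosen score threshold. The sketch below shows the arithmetic; the function name `predictive_values` and the counts are made up for illustration and are not the counts from Zheng et al.

```python
def predictive_values(tp, fp, tn, fn):
    """Return (PPV, NPV) from confusion-matrix counts.

    PPV = TP / (TP + FP): share of predicted-positive patients who truly have MVI.
    NPV = TN / (TN + FN): share of predicted-negative patients who truly do not.
    """
    if tp + fp == 0 or tn + fn == 0:
        raise ValueError("need at least one positive and one negative prediction")
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv


# Hypothetical counts for 100 patients (illustrative only).
ppv, npv = predictive_values(tp=30, fp=10, tn=50, fn=10)
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

Unlike AUC, both values depend on how common the condition is in the cohort, so they do not transfer directly between populations with different prevalence.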
For patients with pancreatic intraductal papillary mucinous neoplasms (IPMN), use of imaging to predict malignant transformation is ubiquitous in clinical practice. Imaging attributes, such as cyst size, duct diameter, or presence of solid enhancing components, have been considered predictive of IPMN-associated malignancy and often mandate further evaluation with endoscopy or resection. Radiomics have been proposed as a novel method to more precisely characterize imaging features and distinguish high-risk versus low-risk IPMNs. In one retrospective series analyzing 103 patients with IPMN, investigators sought to associate preoperative imaging features with pathologic features after resection: low-risk (low- and intermediate-grade dysplasia) versus high-risk disease (high-grade dysplasia or invasive carcinoma). A prediction model for risk assessment using clinical variables alone, including age, cyst size, presence of solid or mural nodule, symptoms and gender, was used as a comparison group. When using clinical variables alone, the multivariate prediction model demonstrated an AUC of 0.67; when combined with quantitative imaging features, the AUC was 0.79. Such data demonstrate that utilization of radiomics with clinical features may improve preoperative risk stratification and patient selection for further procedures [56]. These data have been re-demonstrated and validated in later studies [57,58]. Such findings help clinicians appropriately select patients for upfront surgical resection. This is of particular importance for patients with precancerous lesions, such as IPMN, where timely resection may translate into a significant survival advantage.
In addition to predicting treatment response, advanced imaging techniques can improve staging accuracy prior to resection. In pancreas cancer, for example, radiomics may provide a clearer distinction between locally advanced or borderline resectable disease; such a distinction with regard to vessel involvement can assist with operative planning and prognostication. Radiomics have further been demonstrated to aid in preoperative prediction of mediastinal lymph node metastasis in lung cancer or lymph node metastasis in colorectal cancer [59,60]. Other studies have demonstrated the efficacy of radiomics in differentiating well-differentiated liposarcoma from lipomas or invasive lung cancers from preinvasive lesions, with significant implications for operative planning [61,62]. Better preoperative characterization of disease burden can improve overall outcomes by better informing surgical approach.

Discussion
Surgery remains a curative treatment option for many patients with solid tumors; however, multimodal approaches, which combine surgery and chemotherapy or radiation, have become more prevalent with promising results [63,64]. Efforts to best identify patients who would benefit from surgery or multimodal approaches are increasingly critical to improve outcomes. In this paper, studies using radiomics and machine learning to predict therapeutic response, prognosis, recurrence, and survival in patients who underwent neoadjuvant or adjuvant chemotherapy were summarized to highlight the technical approaches used to generate appropriate radiomics models and a portion of the performance metrics rendered from these models. While the neoadjuvant studies are centered primarily around therapeutic response and the adjuvant studies focus on survival, both produced results that show above-average performance of their models in predicting outcome. Such data are promising for the future role of radiomics in clinical practice; however, the feasibility of widespread expansion and adoption remains unknown.
Traditionally, clinical factors such as tumor type, stage, patient co-morbidities and functional status have influenced timing and receipt of treatment. Researchers are now focusing on analyzing and developing personalized strategies taking into account various prognostic factors [65]. Radiomics has emerged as a tool to help identify treatment responders and risk-stratify patients based on projected outcomes. Radiomics coupled with robust machine learning models have shown promising results and a potential to develop personalized strategies for many cancers, including head-and-neck, breast, lung, prostate, colorectal, and gastric [66]. While this summary is limited and heterogeneous in tumor type, the high AUC metrics for neoadjuvant studies, ranging from 0.82 to 0.89, and adjuvant studies, ranging from 0.66 to 0.86, demonstrate the predictive power of radiomics in therapeutic response, prognosis, recurrence, and survival. These data, which suggest the reliability and reproducibility of this technique, may be clinically leveraged to guide therapy and tailor treatment to an individual patient.

Limitations and future work
Despite the promise and potential of radiomic applications, there remain many well-understood challenges and limitations that need to be addressed to create robust markers for clinical use. In each of these studies, the radiomic features extracted vary greatly, with no consistency in the type of features extracted or in the methodology used to select the final feature set [67]. The imaging modalities and types of machines used to capture the medical images differ widely between institutions [67]. These differences in imaging modalities do not allow for easy replication of prior experiments. To obtain the tumor regions, radiologists need to manually segment the images, which is time-consuming. Automatic and semi-automatic segmentation methods exist but require supervision and refinement by radiologists. Radiomic studies tend to be single institution, owing to the difficulties in sharing imaging data, with no external validation of the developed models [68]. The size of datasets varies, with many being too small to conclusively state that the models generated would render similar results on larger datasets. One solution to the small dataset issue is to combine datasets from multiple institutions [69]. However, medical, industrial, and international politics often prevent these large datasets from being readily formed.
For radiomic studies to advance past preliminary results, common practices must be standardized and adhered to [67]. Multicenter data sharing and data scalability would allow for deeper studies and more robust results [69]. Federated learning techniques are one possible solution to safe and secure multicenter data sharing. Federated learning seeks to address the problem of privacy and data governance by collaboratively training algorithms without exchanging the data itself [70]. Privatization methods are needed to secure data and encrypt it from attackers, which would strengthen the appeal of federated learning as a trusted method of multicenter data sharing [71]. The development of accurate, simple-to-use auto-segmentation algorithms would enable radiologists to analyze more data. Implementing multimodal data approaches integrating quantitative, clinical, histological, genomic, and radiomic data is essential in designing personalized treatments [68].
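To make the idea of training without exchanging data concrete, the sketch below shows the aggregation step of federated averaging (FedAvg), one common federated learning scheme and not necessarily the one used in the cited work. Each site trains locally and shares only its model parameters and sample count; a coordinating server combines them as a sample-weighted mean. The function name `federated_average`, the two sites, and their parameter values are hypothetical.

```python
def federated_average(site_updates):
    """Combine locally trained parameters into one global parameter vector.

    `site_updates` is a list of (n_samples, parameters) pairs, one per site.
    Only these summaries leave each institution; patient images do not.
    """
    total = sum(n for n, _ in site_updates)
    if total == 0:
        raise ValueError("no samples reported by any site")
    dim = len(site_updates[0][1])
    global_params = [0.0] * dim
    for n, params in site_updates:
        if len(params) != dim:
            raise ValueError("all sites must share the same model shape")
        for i, value in enumerate(params):
            global_params[i] += (n / total) * value
    return global_params


# Hypothetical updates from two institutions after one local training round.
updates = [
    (100, [1.0, 2.0]),  # site A: 100 patients
    (300, [3.0, 6.0]),  # site B: 300 patients
]
global_model = federated_average(updates)
print(global_model)
```

In practice this step is repeated over many rounds, and the shared parameters themselves may need protection (for example encryption or added noise), which is the concern the privatization methods above address.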
Recently, efforts have been proposed to standardize radiomic features and develop strong practices that the medical machine learning community could adopt as a standard. The Image Biomarker Standardization Initiative (IBSI) is a collaboration between 25 research groups who validated consensus-based reference values for 169 radiomics features [72]. The results of this study showed the potential to enable calibration and verification of radiomics software, if widely accepted as a standard.
Additionally, transparency in the construction and implementation of prediction models is proposed as a solution to curtail the lack of detail given in many papers regarding the models. The Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) Initiative was developed during a series of meetings between methodologists, health care professionals, and journal editors as a set of recommendations for the reporting of studies developing, validating, or updating a prediction model, whether for diagnostic or prognostic purposes [73].

Conclusion
Radiomics have shown great promise in personalizing chemotherapy treatment and options for patients. The studies analyzed in this paper have demonstrated the effectiveness of radiomics and machine learning to discern the patients that are positively affected by chemotherapy versus those in whom it is futile. The standardization of radiomic analysis, increased data sharing, and stronger auto-segmentation algorithms are needed to improve the adoption and implementation in clinical applications.

Disclosure statement
No potential conflict of interest was reported by the author(s).