A Novel Multi-Neural Ensemble Approach for Cancer Diagnosis

ABSTRACT Cancer is a complex global health concern that caused about 10 million deaths in 2020; hence, early detection is crucial. Early detection requires more precise technology that provides information about the patient’s cancer and allows clinicians to make better-informed treatment decisions. This study provides an in-depth analysis of multiple cancers and surveys the machine learning and deep learning techniques used in cancer research. It also proposes a stacking-based multi-neural ensemble learning method and evaluates its prediction performance on eight datasets, including benchmark datasets such as the Wisconsin Breast Cancer dataset and the mesothelioma, cervical cancer, non-small cell lung cancer survival, and prostate cancer datasets. The study also analyzes three real-time cancer datasets (Lung, Ovarian & Leukemia) from the Jammu and Kashmir region. The simulation findings indicate that the proposed methodology attained the highest prediction accuracy across all of the cancer datasets considered. Additionally, the proposed approach has been statistically validated. The purpose of this investigation was to develop and evaluate a prediction model that might be used as a biomarker for malignancy based on anthropometric, clinical, imaging, and gene data.


Introduction
Cancer is a leading cause of death worldwide, with an estimated 18.1 million new cases each year (Ferlay 2018). This study is motivated by the alarming rate at which new cancer cases are increasing (Islami et al. 2018). According to the World Health Organization's most recent data, 10 million cancer deaths occurred in 2020 alone, and millions of new cases are diagnosed each year. Table 1 summarizes the statistics for the cancers examined in this study (Bray, Ferlay, and Soerjomataram 2020).
CONTACT Surbhi Gupta sur7312@gmail.com School of Computer Science and Engineering, Shri Mata Vaishno Devi University, Katra, Jammu and Kashmir, India; Department of Computer Science and Engineering, Model Institute of Engineering and Technology, Kot Bhalwal, Jammu, Jammu and Kashmir, India.
Lung cancer is the most commonly diagnosed cancer, accounting for 11.6% of all cases worldwide. Despite advances in chemotherapy, the prognosis for lung cancer remains poor, with 5-year relative survival below 14% among men and approximately 18% among women in most countries (Bray, Ferlay, and Soerjomataram 2020). Tobacco use, in particular cigarette smoking, is the chief risk factor for lung cancer (Cruz, Tanoue, and Matthay 2011). Breast cancer is the second most threatening cancer in the world, with high incidence and mortality rates (Key, Verkasalo, and Banks 2001). According to the latest cancer statistics in Table 1, breast cancer alone accounts for a large number of cancer deaths worldwide (Ferlay 2018). Ovarian Cancer (OC) is the seventh most commonly diagnosed cancer among women worldwide (3% of deaths in women). OC is usually detected at a late stage, when the 5-year relative survival rate is just 29%. Only a small share of cases (15%) are diagnosed with localized tumors (Stage 1), when the 5-year survival rate is 92%. Notably, the overall survival rate for most cases ranges between 30% and 40% across the globe and has increased only modestly (2%-4%) since 1995 (Allemani et al. 2014; Torre et al. 2018). OC risk factors include environmental and lifestyle factors, for example, asbestos and talc powder exposure and cigarette smoking (Reid, De Klerk, and Bill Musk 2011).
In 2018, 4.5% of people died of leukemia. According to retrospective case reviews of leukemia, typical signs and symptoms include fever (17% to 77%), lethargy (12% to 39%), and bleeding (10% to 45%) (B. M. Reid, Permuth, and Sellers 2017). Around 33% of children had musculoskeletal symptoms, especially in the spine and long bones, and 75% had an enlarged liver or spleen, in roughly 7% of children at diagnosis (Sinigaglia et al. 2008). Leukemia survivors require periodic complete blood count monitoring as well as age- and sex-specific cancer screening (Shouval et al. 2019). Adults also present with constitutional symptoms, for example, fever, fatigue, and weight loss. They may experience shortness of breath, chest pain, excessive bruising, nosebleeds, or, in women, abnormal menstrual periods (Cornell and Palmer 2012).
Early detection of cancer offers a unique opportunity to increase the survival of cancer patients (B. M. Reid, Permuth, and Sellers 2017). Various models based on clinical data have been proposed in the literature. The main contributions of this study are as follows:
• The study proposes a novel approach that ensembles (stacks) multiple deep learning models with a gradient-boosting technique, named the stacking-based multi-neural ensemble, to classify cancer datasets and predict cancer diagnosis, stage, and survival time.
• The study addresses the limitations of previous studies, thereby presenting an improved approach.
• Three real-time cancer datasets (Lung, Ovarian & Leukemia) are collected from the Jammu & Kashmir region.
• The proposed model is tested on five benchmark datasets: the Wisconsin Breast cancer dataset, the Mesothelioma dataset, the Cervical cancer dataset, the non-small cell lung cancer (NSCLC) survival dataset, and the prostate cancer dataset.
• The performance of the proposed model is compared with previous studies, and the proposed model, i.e., the stacking-based multi-neural ensemble, attained better prediction results than all of them.
All implementation details of the prediction model are available on GitHub to facilitate its reuse by other researchers.
Medical data can now be found in multiple public and private data repositories, thanks to advances in database technology and the Internet. The healthcare industry is anticipated to create terabytes of data each year. Extracting valuable information for better healthcare is a difficult and vital task, and large amounts of data are now available for it. However, the amount of information actually gleaned from these data is minuscule. As a result, effective data organization, analysis, and interpretation are critical if tangible knowledge extraction is to be accomplished. Multiple computational techniques are necessary to identify relevant patterns and hidden knowledge in these enormous medical datasets. In the data mining process, large observational datasets are analyzed and important hidden patterns are extracted for data classification. Automated learning techniques have now begun to be applied to clinical data.
In this study, we have assessed the proposed strategy on eight datasets. Two datasets are extracted from digitized images, three are real-time cancer datasets, two are electronic health record databases comprising clinical attributes, and the remaining data are based on gene expression and clinical information. Across the vast literature on cancer prediction modeling, deep learning approaches have proven their value and achieved remarkable outcomes; however, none of the systems is entirely accurate. The results of our study confirm that the proposed stacking-based multi-neural ensemble learning strategy uses the cancer patient's data to produce more accurate predictions than single classifiers. The remainder of the article is grouped into seven sections. Section 2 reviews related research studies and prediction models. Section 3 describes the proposed methodology employed in the current study along with the dataset analysis. Section 5 presents the simulation results and their discussion. Finally, the article is concluded in the last section.

Theoretical Framework
Several research works have been done in the field of cancer detection (Coccia 2019; Korbar et al. 2021; Deshmukh and Kashyap 2022; Zhang et al. 2022; Kohli et al. 2021; Kumar 2020). Many researchers have used automated learning techniques for the prediction of cancer (Kumar et al. 2020; Kumar and Mahajan 2019; Kumar and Single 2021). A few such studies are mentioned in this section.
Lung Cancer: In 2017, Lynch conducted a study to predict lung cancer using unsupervised learning and reported Root Mean Square Error (RMSE) values (16.193 for k-Means). This study used approximately 10.4k lung cancer records from the Surveillance, Epidemiology, and End Results (SEER) program database. Other researchers have estimated the survival time of lung cancer patients by applying data mining approaches to lung cancer records from the SEER database, including collaborative clustering-based techniques (D. Chen et al. 2009), Support Vector Machine (SVM) and Logistic Regression (LR) (Fradkin, Muchnik, and Schneider 2005), and unsupervised methods in 2017. A similar study in 2017 examined supervised classification models for predicting lung cancer survival. The classification models employed in that study include Decision Trees.
Prostate Cancer: In (Yoo et al. 2019), a novel CNN-based model was applied to prostate cancer diagnosis. The data used in the study consisted of 427 patients, of whom 175 were cases and 252 were controls. The recommended model attained an area under the receiver operating characteristic curve (AUC) of 0.87. A deep learning-based computational model was proposed in 2020 (Tolkach et al. 2020) to predict prostate cancer; the classification accuracy of this deep learning architecture reached 98%. Another research study in 2020 proposed an automatic diagnosis of prostate cancer. It evaluated multiple classification models, such as KNN, Naïve Bayes (NB), SVM, and DT, and the best performance was achieved by neural learning models. Computer simulations also demonstrated that the data balancing strategy increased predictive performance from 84% to 93%. Recent research proposed multiple balancing techniques for attaining high accuracy. Many of the research studies on predicting prostate cancer diagnosis have successfully shown the importance of computer-aided diagnosis.
Cervical cancer: Cervical cancer was diagnosed using automated learning methodologies in (Wu and Zhou 2017). A Support Vector Machine (SVM) was used for classification along with Principal Component Analysis (PCA) and Recursive Feature Elimination (RFE). SVM-PCA and SVM-RFE with different feature sets were proposed in the study, and SVM-PCA displayed the best performance, attaining the highest classification score (93%). Another study carried out in 2017 (Ceylan and Pekel 2017) proposed multiple classification models to predict the risk of cervical cancer and compared the Bayesian model, DTs, and RF. RF achieved the highest accuracy, approximately 82%. In 2018, cervical cancer was diagnosed by balancing the data with SMOTE and using PCA for dimension reduction (Abdoh, Rizka, and Maghraby 2018). The technique was compared with the feature set selected by RFE, and the proposed design achieved 97.4% accuracy. Cervical cancer diagnosis was performed using stacked autoencoders and softmax classification in 2018 (Fernandes, Chicco, and Cardoso 2018), achieving a top AUC score of 97.25%. Recent research has also investigated the performance of a stacking ensemble of different classifiers on the cervical cancer dataset.
Leukemia: In 2018, a research study (Mei et al. 2018) applied neural learning to predict acute myeloid leukemia (AML). The dataset used in the study was taken from The Cancer Genome Atlas (TCGA) database. The implementation used stacked autoencoders to formulate a categorized DL model. The model, implemented in the R language, attained an accuracy of 83% in predicting prognosis. A review article published in 2019 (Salah et al. 2019) examined the use of ML models to predict leukemia diagnosis. A total of 58 research studies were reviewed. A significant observation of this review was that none of the articles applied ML models in real-world scenarios, and more than 90% of the articles used small and homogeneous samples. A research study in 2019 (Shouval et al. 2019) worked on predicting the survival of leukemia patients after Autologous Stem Cell Transplantation. A recent research study in 2020 (Maria, Devi, and Ravi 2020) employed ML to predict diagnosis, presenting a comparative study of SVM, KNN, Neural Networks, and NB for the classification of leukemia into its subtypes.
Ovarian Cancer: A deep CNN was used to predict the diagnosis of ovarian cancers in (Miao et al. 2018). The simulation results were validated using 10-fold cross-validation, and classification accuracy improved from 72.76% to 78.20% with the strategy proposed in the study. Another study conducted in 2019 (Kawakami et al. 2019) used 334 epithelial ovarian cancer (EOC) cases, of which 101 belonged to the benign group and the rest to the malignant group. ML models comprising Gradient Boosting Machine (GBM), SVM, RF, NB, and Neural Network were used. The ensemble technique (GBM & RF) delivered the top prediction performance of 92.4% AUC. A recent study in 2020 (Mingyang et al. 2020) aimed to assess the practical value of ML models in OC detection. The data comprised 349 patients with 49 features. The study identified notable features, and the learner produced better predictions and outperformed the prevailing OC prediction approaches.
Mesothelioma: Research work by (Mukherjee 2018) on the same feature set attained 99% with SVM. The study by Ilhan and Celik (Ilhan and Celik 2017) deployed Ensemble Learning with 10-fold cross-validation and achieved 100% classification accuracy. Recent research (Gupta and Kumar Gupta, 2021a) also explores the performance of multiple classifiers on the dataset. The research work (Kaur and Singh 2019) used K-NN and claimed 99.07% accuracy. A retrospective study (Hu and Zebo 2019) trained numerous deep learning algorithms and confirmed the stacked sparse autoencoder (SSAE) as the best model for malignant mesothelioma (MM) diagnosis. Two feature selection methods, the Genetic Algorithm (GA) and ReliefF, were used to select the features. GA chose a set of 19 highly significant features, and the combination of GA and SSAE achieved the highest attainable accuracy (100%). All of the above studies claimed high accuracies but, after examination, we observed that an input feature ("diagnosis method") used in the models duplicated the target diagnosis class, as confirmed by (Chicco and Rovelli 2019). This trivial feature makes the model virtually perfect and yields a high estimated accuracy. Hence, we do not endorse these results, as using such a feature violates the fundamentals of predictive modeling. Recent work by (Chicco and Rovelli 2019) on the same dataset confirmed that the accuracy stated by (Orhan et al. 2012) was trivial and that the Probabilistic Neural Network (PNN) could not perform well, obtaining an accuracy of 0.52. Their study was the first to address the redundant feature in the dataset. They also handled the class imbalance of the data using under-sampling. The highest accuracy, 0.82, was recorded using the Random Forest classifier on the balanced set.
Under-sampling proved effective in improving the prediction results, even though it imposes the constraint of discarding a portion of valuable data. Table 2 summarizes the literature review of the cancer research studies.

Proposed System
This section presents the flowchart of the cancer prediction procedure, the algorithm of the proposed classification model, a description of the hyperparameters used, and the proposed architecture. Missing values are imputed using the k-Nearest Neighbors (k-NN) imputation method ("An Introduction to Kernel and Nearest Neighbor Nonparametric Regression" 1992). Next, the data are transformed using data scaling procedures. A K-fold (K = 10) Cross-Validation (CV) approach was adopted, wherein MLP models were built on a training set and assessed on a test set. The training set corresponded to 75% of the total data. Figure 1 depicts the proposed workflow.
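The preprocessing and validation steps described above can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the scaler (`StandardScaler`), the number of neighbors, the single illustrative MLP, and the helper names `build_preprocessing_pipeline` and `cross_validated_accuracy` are all hypothetical choices, since the text does not specify them.

```python
import numpy as np
from sklearn.impute import KNNImputer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_preprocessing_pipeline(n_neighbors=5, max_iter=500):
    """k-NN imputation -> scaling -> one MLP (all settings are illustrative)."""
    return make_pipeline(
        KNNImputer(n_neighbors=n_neighbors),  # fill missing values from the nearest rows
        StandardScaler(),                     # assumed scaler; the text does not name one
        MLPClassifier(hidden_layer_sizes=(50, 50), max_iter=max_iter, random_state=0),
    )


def cross_validated_accuracy(X, y, n_splits=10, max_iter=500):
    """Mean accuracy over stratified K-fold cross-validation (K = 10 by default)."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    pipeline = build_preprocessing_pipeline(max_iter=max_iter)
    scores = cross_val_score(pipeline, X, y, cv=cv)
    return scores.mean()
```

Placing the imputer and scaler inside the pipeline means they are refit on each training fold, so no information from the held-out fold leaks into preprocessing.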

Data Analysis
This section describes the datasets explored in this research. The eight datasets used in the study fall into four categories: clinical databases, real-time databases, gene expression-based databases, and digitized datasets.

Clinical Databases
The cervical cancer and mesothelioma clinical datasets used in this study are authentic electronic health records of patients. They are freely available from the UCI Machine Learning Repository (University of California, Irvine). A total of 324 examples representing patient records were analyzed and tested. In the dataset, each case has 35 columns, i.e., features. One of the dataset features, named "diagnosis method", replicates the target variable "class of diagnosis." We therefore exclude this feature from further analysis to improve the reliability of the study.

Real-Time Datasets
Three real-time cancer datasets, i.e., the ovarian cancer, lung cancer, and leukemia datasets, are included in the study. The patients diagnosed with cancer were selected from multiple hospitals and clinics in the Jammu and Kashmir region. For each patient, the diagnosis was histologically confirmed. Records of cancer patients and healthy volunteers were included in the present study with the consent of all participants. We collected clinical, demographic, and anthropometric information for all participants under similar conditions. The predictors common to all three datasets are age, weight, height, and BMI.
• Ovarian Cancer Dataset (Verma et al. 2019): The ovarian cancer dataset was used in a previous study (Verma et al. 2019). We later gathered more data, for a total of 697 participants comprising 248 ovarian cancer patients (mean age 58.7 years, range 22-89) and 449 controls (mean age 56.44 years, range 25-89). The collected data include menopausal status (pre-/post-menopausal, or bleeding again after menopause), Pre/Post Menopause (for each participant, whether menopause was experienced earlier or later), age at menarche (i.e., age at onset of menses), presence of breast cancer nodules (whether the patient has also been diagnosed with breast nodules), and use of oral contraceptives. These clinical features have been identified as important risk factors (B. M. Reid, Permuth, and Sellers 2017). The target variable is the ovarian code, which indicates whether or not a person has ovarian cancer. The statistical description of the ovarian cancer dataset is provided in Table 3. Range and missing values are given for each predictor.
• Lung Cancer Dataset (Bhat et al. [...] The hemoglobin count differs from 8.6 (mean) in cases to 9.7 (median) in controls. The collected information includes smoking status, alcohol consumption habits, presence of fever, abnormal enlargement of the spleen or liver (splenomegaly/hepatomegaly), and the patient's hemoglobin count. These clinical features have been identified as significant risk factors in prior studies (Davis, Viera, and Mead 2014). The dataset description is given in Table 5.
Variable 7, i.e., "Alcoholic", indicates whether a person consumes alcohol (2), does not consume alcohol (2), or drinks occasionally (1). Hemoglobin count is a significant factor, as it reflects the amount of red blood cells (RBC) in the body, and is expressed in g/dL (grams per deciliter). The target variable is "case/control", which indicates whether an individual is a case (leukemia patient) or a control (healthy person).

Gene Expression Based Datasets
Various studies have reported the use of gene expression data and other high-dimensional genomic data for survival prediction (Chaddad et al. 2017; Skaug, Eide Msci, and Gulsvik 2007; Sun et al. 2018; Xiao et al. 2018; Cho and Won 2003; Shedden et al. 2008; Størvold et al. 2007; Y. Chen, Ke, and Chiu 2014). Here, the raw gene expression data (CEL files) and clinical data of non-small-cell lung carcinoma (NSCLC) patients were downloaded from the NCI database, a repository of high-throughput gene expression microarray data.
• Lung Cancer Survival Dataset: We investigated numerous datasets to evaluate the prognostic value of different parameters in lung cancer. We treated a survival time below 18 months as the high-risk group, a survival time in [18, 48] months as the moderate-risk group, and a survival time above 4 years as the low-risk group. The NSCLC data were collected mainly from four institutions and comprise 442 NSCLC patients. Patients' survival durations, ages, disease stages, treatment, and smoking history were all included in the clinical data. All gene expression profiling was carried out using Affymetrix HG-U133A chips. The treatment response data include age, race, sex, survival time, adjuvant chemotherapy, adjuvant radiation therapy, and stage statistics. Cases with missing survival time are omitted from further analysis. The summary of the patients' information, along with the classified risk groups, is given in Table 6.
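The risk grouping described above can be expressed as a small labeling function. This sketch assumes that survival time is recorded in months and that both ends of the [18, 48] interval fall in the moderate-risk group; the function name `risk_group` is hypothetical.

```python
def risk_group(survival_months):
    """Map a survival time in months to the risk group described in the text."""
    if survival_months is None:
        return None  # cases with missing survival time are excluded from analysis
    if survival_months < 18:
        return "high"       # shorter than a year and a half
    if survival_months <= 48:
        return "moderate"   # between 18 and 48 months, inclusive (assumed)
    return "low"            # longer than 4 years
```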

Classification Model
Given the success of neural networks in biomedicine in earlier studies, we chose to employ deep learning architectures (Gupta and Kumar 2021, 2022). To construct learning models that can learn the unknown relationships among various classifiers, we adopt stacking-based ensemble learning of neural classifiers. Ensemble Learning: Ensemble learning integrates several learning techniques, so the resulting model combines the strengths of its component learning strategies and can deliver superior performance. Several studies in the literature have combined models to raise prediction accuracy. For example, Bagging was introduced by Breiman (Breiman 2001) to combine the outputs of decision trees built on several randomly chosen subsets of the training data, which then vote for the final result. Boosting is an improved adaptation of Bagging that was advanced by Freund and Schapire (Vladimir et al. 2005). This strategy increases the weights of misclassified training samples in each iteration and finally combines the classification outcomes by weighted votes. Wolpert (Wolpert 1992) proposed using linear models to integrate the results of the learning structures, an approach known as Stacking or blending. In contrast to majority voting, which considers only linear relationships among classifiers, stacking classifiers can "learn" non-linear structures. Because stacking uses a learning approach to integrate the models, it is a considerably more powerful ensemble strategy.
Multilayer Perceptrons: Rosenblatt constructed the single-layer perceptron, a shallow neural network whose structure prevents it from performing non-linear classification (Rosenblatt 1958). Fast forward to 1986, when Rumelhart, Hinton, and Williams published the paper "Learning representations by back-propagating errors," presenting the ideas of backpropagation and hidden layers and thereby giving rise to Multilayer Perceptrons (MLPs) (Rumelhart and Hinton 1986). In the forward pass, the data flow from the input layer through the hidden layers to the output layer, and the prediction of the output layer is compared against the ground truth labels. Hidden layers are neuron nodes stacked between the inputs and outputs, allowing neural networks to learn intricate features gradually. In backpropagation, the weights are updated repeatedly to minimize the error, using the chain rule of calculus to obtain the partial derivatives of the error function. This procedure yields a gradient, i.e., a landscape of the error. The parameters are then adjusted along this gradient, and the error itself can be measured in various ways, including the Mean Square Error (MSE).
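The forward pass and chain-rule gradients described above can be illustrated with a small NumPy sketch of a one-hidden-layer network trained on MSE. The layer layout, function names, and learning rate are assumptions made for illustration and do not reproduce the models used in this study.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(params, X):
    """Forward pass: input -> hidden layer (sigmoid) -> output layer (sigmoid)."""
    W1, b1, W2, b2 = params
    H = sigmoid(X @ W1 + b1)    # hidden-layer activations
    out = sigmoid(H @ W2 + b2)  # network prediction
    return H, out


def mse_loss(params, X, y):
    _, out = forward(params, X)
    return np.mean((out - y) ** 2)


def backward(params, X, y):
    """Gradients of the MSE loss with respect to every parameter (chain rule)."""
    W1, b1, W2, b2 = params
    H, out = forward(params, X)
    # derivative of the loss w.r.t. the output layer's pre-activation
    d_out = 2.0 * (out - y) / y.size * out * (1.0 - out)
    gW2 = H.T @ d_out
    gb2 = d_out.sum(axis=0)
    # error propagated back to the hidden layer's pre-activation
    d_hid = (d_out @ W2.T) * H * (1.0 - H)
    gW1 = X.T @ d_hid
    gb1 = d_hid.sum(axis=0)
    return gW1, gb1, gW2, gb2


def sgd_step(params, grads, lr=0.1):
    """Move every parameter a small step against its gradient."""
    return tuple(p - lr * g for p, g in zip(params, grads))
```

Here `X` has shape (n, d), `y` has shape (n, 1), and `params` is the tuple `(W1, b1, W2, b2)`. Comparing `backward` against a finite-difference estimate of `mse_loss` is a simple way to check that the chain-rule derivation is consistent.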

Measures and Parameters of Variables
Hyperparameters are properties that govern the entire training process. They include variables that determine the network structure (for instance, the number of hidden units) and variables that determine how the network is trained (for example, the learning rate). Hyperparameters are set before training, i.e., before the weights and biases are optimized. They are significant because they directly control classification performance and have a substantial effect on the behavior of the model under training. Optimization hyperparameters relate to the optimization and training process, such as the learning rate and the number of epochs. Model hyperparameters, in contrast, relate to the structure of the model, such as the hidden layers and hidden units.
• Learning rate: If the learning rate is significantly below the optimum value, the model takes much longer (hundreds or thousands of epochs) to reach the optimum state. Conversely, if the learning rate is much larger than the optimum value, training overshoots the optimum state and the algorithm may not converge. We chose a learning rate of 0.001 after tuning the neural model.
• Epochs: We used 500 epochs for the training phase of each MLP classifier. The intuitive manual method is to train the model for as many iterations as the validation error continues to decrease.
• Hidden units: This is one of the more perplexing hyperparameters. The number of hidden units is related to the learning capacity of the model. We used 50, 150, and 200 units in MLP_1, MLP_2, and MLP_3, respectively. A further heuristic for the first hidden layer is the empirical observation that setting the number of hidden units above the number of inputs improves performance on a variety of tasks.
• Layers: MLP_1 is built with two hidden layers, MLP_2 with three hidden layers, and MLP_3 with four hidden layers.
• Optimizer: Adam stands for Adaptive Moment Estimation. It combines Momentum and RMSprop in a single method, which makes it a powerful and fast optimizer. Adam computes adaptive learning rates for each parameter and favors flat minima on the error surface. We compute the exponentially decaying averages of the past squared gradients (S_t) and of the past gradients (C_t) in eq. (i) and (ii).
C_t and S_t are estimates of the first moment (the mean) and the second moment (the uncentered variance) of the gradients, respectively. Their biases are countered using Ĉ_t and Ŝ_t, i.e., the bias-corrected first and second moment estimates, respectively. These are expressed mathematically in equations (iii) and (iv).
Finally, the Adam update rule is expressed in eq. (v).
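For reference, the standard Adam update can be written with the symbols C_t and S_t used here, numbered to match the references in the text. The remaining symbols are assumed notation following the usual formulation: g_t is the gradient at step t, β_1 and β_2 are the decay rates, θ_t are the parameters, η is the learning rate, and ε is a small constant for numerical stability.

```latex
\begin{align}
C_t &= \beta_1 C_{t-1} + (1-\beta_1)\, g_t \tag{i}\\
S_t &= \beta_2 S_{t-1} + (1-\beta_2)\, g_t^{2} \tag{ii}\\
\hat{C}_t &= \frac{C_t}{1-\beta_1^{t}} \tag{iii}\\
\hat{S}_t &= \frac{S_t}{1-\beta_2^{t}} \tag{iv}\\
\theta_{t+1} &= \theta_t - \frac{\eta}{\sqrt{\hat{S}_t}+\epsilon}\, \hat{C}_t \tag{v}
\end{align}
```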
• Activation function: The ReLU activation function is used for the input layer, and the Sigmoid is used as the activation function in the hidden and output layers. It is a rather straightforward architecture, yet complex enough to serve as a valuable function.
• Rectified Linear Units (ReLU): The ReLU function ensures that if y is greater than zero, the output remains y; if y is negative, the output is zero. In short, it returns the maximum of 0 and y. ReLU is expressed mathematically in equation (vi).
• Sigmoid activation: The activation function used in the hidden layers of the ANN is the Sigmoid. The output of the sigmoid function is monotonically increasing and lies between 0 and 1 (or between −1 and 1 for variants such as the hyperbolic tangent). The sigmoid function is defined mathematically in eq. (vii).

$$S(x) = \frac{1}{1 + e^{-x}} \tag{vii}$$
A sigmoid function is a mathematical function with a characteristic "S"-shaped curve, called the sigmoid curve.
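The two activation functions described above can be written in a few lines. This is an illustrative scalar sketch, with ReLU taken as max(0, y) and the sigmoid as 1 / (1 + e^(−x)); the branch on the sign of x is an implementation detail added here to avoid floating-point overflow for large negative inputs.

```python
import math


def relu(y):
    """Return y when it is positive, otherwise zero."""
    return max(0.0, y)


def sigmoid(x):
    """Logistic sigmoid, computed in a numerically stable way."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # safe: x is negative, so exp(x) cannot overflow
    return z / (1.0 + z)
```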

Friedman Ranking Test
The Friedman test is a non-parametric alternative to the repeated-measures ANOVA. It is used to detect whether there is a statistically significant difference among three or more groups that contain the same participants. Here, the Friedman test is used to determine the classifiers' ranks. At the 0.05 significance level (95% confidence level), the null hypothesis (H0: there is no significant variation in classifier performance) is rejected. Thus, the alternative hypothesis (H1) is supported, implying that a considerable difference exists between the classification results. Bonferroni-Holm adjustments were employed to determine the significance of the multi-neural ensemble over the other classifiers.
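The testing procedure described above can be sketched as follows, using `scipy.stats.friedmanchisquare` for the Friedman test and a hand-written Holm step-down adjustment for the pairwise p-values. The helper names `friedman_test` and `holm_adjust` are hypothetical, and any scores passed in would be the per-dataset results of each classifier; no values from the paper are assumed here.

```python
from scipy.stats import friedmanchisquare


def holm_adjust(p_values):
    """Holm step-down adjustment; adjusted p-values are returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # keep adjusted values monotone
        adjusted[idx] = running_max
    return adjusted


def friedman_test(scores_by_classifier, alpha=0.05):
    """scores_by_classifier: one list per classifier, aligned by dataset.

    Returns the test statistic, the p-value, and whether H0 is rejected at alpha.
    """
    statistic, p_value = friedmanchisquare(*scores_by_classifier)
    return statistic, p_value, p_value < alpha
```

The Friedman test requires at least three classifiers measured on the same datasets. If H0 is rejected, the pairwise comparisons of the ensemble against each other classifier can be corrected with `holm_adjust` before being compared with the significance level.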

Working Methodology
The dataset is denoted by X, where x belongs to a set of attributes and y denotes the target column; z represents the size of the data. The base learners (MLP_1, MLP_2, and MLP_3) are represented by M_1, M_2, and M_3, respectively. The algorithm trains each base learner on the original data (X) and saves the trained models as S_1, S_2, and S_3 (for M_1, M_2, and M_3, respectively). A new dataset (P_1) is then generated to hold the predictions (p) made by S_1, S_2, and S_3. This prediction dataset (P_1) is passed as input to the meta-model, the gradient boosting classifier (GBC, or G), and the result is stored in S'. Finally, the desired model is obtained. The algorithm of the stacking-based neural ensemble model is given in Figure 2.
The approach used in this study employs a gradient-boosting classifier (GBC) to stack deep neural networks, constructing a multi-model ensemble that predicts cancer by distinguishing normal from tumor conditions. The selected features of each cancer dataset are supplied to the three neural models. The GBC is then used to stack the outputs of the three base learners to obtain the final prediction. Figure 3 shows the learning architecture proposed in the study.
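The stacking procedure described above can be sketched with scikit-learn. This is an illustrative reading of the description, not the authors' released code: using equal widths in every hidden layer, the logistic activation for hidden layers, class probabilities (rather than hard labels) as the entries of the prediction dataset, and the function names are all assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.neural_network import MLPClassifier

# Hidden-layer layouts for MLP_1, MLP_2, MLP_3. Equal widths per layer are an
# assumption; the text gives only the unit counts (50, 150, 200) and depths (2, 3, 4).
HIDDEN_LAYOUTS = [(50, 50), (150, 150, 150), (200, 200, 200, 200)]


def fit_stacked_ensemble(X, y, max_iter=500, random_state=0):
    """Train M_1..M_3 on X, build the prediction dataset P_1, then fit G on P_1."""
    base_models = []
    for layout in HIDDEN_LAYOUTS:
        mlp = MLPClassifier(
            hidden_layer_sizes=layout,
            activation="logistic",     # sigmoid hidden units (assumed mapping)
            solver="adam",
            learning_rate_init=0.001,
            max_iter=max_iter,
            random_state=random_state,
        )
        base_models.append(mlp.fit(X, y))  # trained models S_1, S_2, S_3
    # P_1: one block of class probabilities per base learner
    P1 = np.hstack([m.predict_proba(X) for m in base_models])
    meta_model = GradientBoostingClassifier(random_state=random_state).fit(P1, y)
    return base_models, meta_model


def predict_stacked(base_models, meta_model, X):
    """Pass new data through the base learners, then through the meta-model."""
    P = np.hstack([m.predict_proba(X) for m in base_models])
    return meta_model.predict(P)
```

Fitting the meta-model on predictions for the same rows the base learners were trained on can overstate its accuracy. scikit-learn's `StackingClassifier` instead builds the prediction dataset from cross-validated predictions, which is a common alternative design.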
where i represents the data size and Z_m(q) is the hypothesis that all instances (q) are predicted correctly. Here, γ is the conditional function assessing the hypothesis Z_m(q): y = 1 if the condition γ is true and 0 otherwise, as calculated in Equation (ix).
The misclassified cases are then assigned a weight for the succeeding layer using Equation (x).
The weights of the data instances are updated in each iteration, as shown in Equation (xi).
The principle behind the weight-update approach is to encourage learning in which each classification model learns from the mistakes of the models at the preceding layers. Further, γ = 0 implies no update to the weight of the instance, as given in Equation (xii).
In the case of misclassification, the weight update for the particular instance is given in Equation (xiii).
After s repetitions, the final output is given in Equation (xiv).
Thus, the Gradient Boosting Classifier (GBC) works on a weighted-vote scheme in which each classification model depends on the prediction performance of the preceding (n−1) classifiers.
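The reweighting scheme described above, in which correctly classified instances keep their weight, misclassified instances gain weight, and the final output is a weighted vote, resembles the classical AdaBoost.M1 update. The sketch below follows that standard update as an assumption; it is not a reconstruction of Equations (ix) to (xiv), and the function names are hypothetical.

```python
import math


def reweight(weights, misclassified):
    """One AdaBoost.M1-style weight update.

    weights       -- current instance weights
    misclassified -- gamma per instance: 1 if the current model got it wrong, else 0
    Returns (alpha, new_weights), where alpha is the model's vote weight.
    """
    total = sum(weights)
    error = sum(w for w, g in zip(weights, misclassified) if g) / total
    if not 0.0 < error < 1.0:
        raise ValueError("weighted error must lie strictly between 0 and 1")
    alpha = math.log((1.0 - error) / error)
    # gamma = 0 leaves the weight unchanged; gamma = 1 scales it by exp(alpha)
    new_weights = [w * math.exp(alpha * g) for w, g in zip(weights, misclassified)]
    return alpha, new_weights


def weighted_vote(alphas, predictions):
    """Combine binary predictions in {-1, +1} from several models by weighted vote."""
    score = sum(a * p for a, p in zip(alphas, predictions))
    return 1 if score >= 0 else -1
```

In practice the updated weights are usually renormalized to sum to one before the next model is trained. Gradient boosting as implemented in common libraries fits each new model to the gradient of a loss function rather than reweighting instances, so this sketch illustrates the description in the text rather than a library's internals.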

Execution Details
The models were created on a Dell −15JPO9P computer equipped with an Intel Core i7-8550U processor running at 1.80 GHz and 8 GB of Random-Access Memory (RAM). All machine learning algorithms were implemented in Python 3.7 via Anaconda Navigator.

Results and Discussion
This section presents the experimental outcomes obtained after applying the proposed classification procedure. The graphs displaying the simulation results are plotted using the "matplotlib" library in Python. The evaluation parameters used to assess the prediction models are described in Table A1 (appendix). These evaluation parameters set the standard for evaluating the advantages and shortcomings of AI-based learning approaches. In Table A1, true positive (P) refers to positive cases correctly recognized by the model, false positive (Q) refers to negative cases wrongly identified as positive, true negative (R) refers to cases correctly predicted as negative, and false negative (S) refers to positive cases wrongly predicted as negative. Table A1 also describes the commonly used evaluation parameters, namely accuracy (Acc), Specificity (Spec.), Sensitivity (Sens.), F-measure, the Receiver Operating Characteristic (ROC) curve, and the Area under the Curve (AUC).
• Ovarian Cancer: Figure 6 shows that the best prediction accuracy on the Ovarian Cancer dataset was attained by the proposed stacked model (approximately 98%), followed by the neural network built using three or more hidden layers.

•
Lung Cancer: Figure 7 shows that the best prediction accuracy on the Lung Cancer dataset was attained by the proposed stacked model (95.7%) followed by the MLPs with two or more hidden layers (84%).
• Leukemia: Figure 8 expresses that the most appreciable accurateness of 99% was achieved using Stacked Model can predict leukemia.   Figure 9.
• Prostate Cancer: Figure 10 displays the accuracy achieved by all the neural techniques. The proposed model predicts prostate cancer with the highest accuracy (87%).
• Gene Expression Dataset: The proposed machine learning algorithm performed best on the gene expression-based dataset.
• Lung Cancer Survival: Figure 11 illustrates that the proposed model predicts the survival of lung cancer patients with the highest accuracy (88.47%), followed by MLP_1.
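
The evaluation parameters referenced in Table A1 can be computed directly from the four confusion-matrix counts P, Q, R, and S. The following is a minimal sketch that assumes Table A1 follows the standard textbook definitions; the function name `confusion_metrics` is hypothetical, and the Matthews correlation coefficient is included because it is reported later alongside the other parameters.

```python
import math


def confusion_metrics(P, Q, R, S):
    """Standard metrics from counts: P = true positives, Q = false positives,
    R = true negatives, S = false negatives (symbols as used for Table A1)."""
    total = P + Q + R + S
    accuracy = (P + R) / total
    sensitivity = P / (P + S) if (P + S) else 0.0   # true positive rate
    specificity = R / (R + Q) if (R + Q) else 0.0   # true negative rate
    precision = P / (P + Q) if (P + Q) else 0.0
    # F-measure: harmonic mean of precision and sensitivity.
    pr_sum = precision + sensitivity
    f_measure = 2 * precision * sensitivity / pr_sum if pr_sum else 0.0
    # Matthews correlation coefficient; defined as 0 when a marginal is empty.
    denom = math.sqrt((P + Q) * (P + S) * (R + Q) * (R + S))
    mcc = (P * R - Q * S) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": f_measure,
        "mcc": mcc,
    }
```

For example, calling `confusion_metrics(45, 5, 40, 10)` on these made-up counts returns one value per parameter; sensitivity is the entry to watch in a diagnostic setting, since it falls as the number of missed positive cases (S) grows.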

Prediction Results of the Proposed Model Accuracy
The final stacked neural model was assessed on each of the cancer datasets using different performance assessment parameters. The classification model was validated using a 10-fold cross-validation approach. The performance of the model was investigated using evaluation parameters (Powers 2020) such as Accuracy (Acc), Area under the Curve (AUC), F1-score (F1), Matthews Correlation Coefficient (MCC), Specificity (Spec.), and Sensitivity (Sens.), which are described in Table A1. The prediction results are reported in Table 7, which indicates that the prediction architecture proposed in the study performs well on all the cancer datasets. The following inferences need to be highlighted:
• The stacked neural model worked well on both the cervical cancer and mesothelioma datasets, attaining high prediction scores.
• The prediction model worked well on the three real-time datasets: the ovarian cancer dataset (binary target), the lung cancer dataset (a multi-class problem where the target is to predict the stage: 2, 3, or 4), and the leukemia dataset (binary target).
• The proposed prediction model performed exceptionally well on all the parameters for the Wisconsin breast cancer dataset.
• The proposed model worked well on the NSCLC gene-expression dataset, achieving appreciable prediction outcomes where the target is to predict the survival time of lung cancer patients.
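
The evaluation protocol described above can be sketched with scikit-learn as follows. This is an illustrative outline under stated assumptions, not the authors' implementation: the hidden-layer sizes, the internal `cv=3` setting, the feature standardization step, and the helper names `build_stacked_model` and `cross_validated_scores` are all hypothetical, and the scoring setup assumes a binary target encoded as 0/1.

```python
from sklearn.ensemble import GradientBoostingClassifier, StackingClassifier
from sklearn.metrics import make_scorer, matthews_corrcoef, recall_score
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_stacked_model(max_iter=300, seed=0):
    """Three MLP base learners stacked under a gradient boosting meta-learner."""
    # Assumed architectures: one, two, and three hidden layers.
    layer_options = {"mlp_1": (32,), "mlp_2": (32, 16), "mlp_3": (32, 16, 8)}
    base_learners = [
        (name, MLPClassifier(hidden_layer_sizes=sizes,
                             max_iter=max_iter, random_state=seed))
        for name, sizes in layer_options.items()
    ]
    stack = StackingClassifier(
        estimators=base_learners,
        final_estimator=GradientBoostingClassifier(random_state=seed),
        cv=3,  # internal folds that produce the meta-learner's training inputs
    )
    # Standardize features first, since MLPs are sensitive to feature scale.
    return make_pipeline(StandardScaler(), stack)


def cross_validated_scores(model, X, y, n_splits=10, seed=0):
    """Mean of each evaluation parameter over stratified k-fold validation."""
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scoring = {
        "acc": "accuracy",
        "auc": "roc_auc",
        "f1": "f1",
        "mcc": make_scorer(matthews_corrcoef),
        "sens": "recall",                                # recall of class 1
        "spec": make_scorer(recall_score, pos_label=0),  # recall of class 0
    }
    results = cross_validate(model, X, y, cv=folds, scoring=scoring)
    return {name: float(results["test_" + name].mean()) for name in scoring}
```

A call such as `cross_validated_scores(build_stacked_model(), X, y)` would return the fold-averaged value of each parameter for a feature matrix `X` and binary labels `y`. A multi-class target, such as the lung cancer stage, would need averaged variants of these scorers (for example `"f1_macro"` and `"roc_auc_ovr"`).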

Comparison of the Proposed Model
To evaluate the effectiveness of the proposed approach, we compared it with several approaches to predicting cancer or patient survival on the benchmark datasets (Cervical Cancer, Mesothelioma, Breast Cancer, and NSCLC gene expression-based lung cancer survival data). Comparisons based on the prediction accuracy (accuracy score) achieved by several research studies are summarized in Table 8, where the highest accuracy scores are bold-faced. Table 8 shows that the proposed classification technique, i.e., the stacking-based multi-neural ensemble system, performed better on these datasets than the techniques employed in previous research studies.

Statistical Analysis
Using the Friedman statistical significance test (WALs and Kelleher 1971) and Holm's post-hoc analysis (Holm 1979), the proposed stacking-based multi-neural ensemble is statistically compared with three deep learning approaches on each dataset. Table 9 contains the average Friedman ranks (the higher the rank, the better (Evans 2019)) and the adjusted p-values. The Friedman ranks of the classifiers demonstrate that the proposed stacked model significantly outperformed the MLP_1, MLP_2, and MLP_3 algorithms (at the 0.05 significance level).
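
A comparison of this kind can be sketched as follows. The sketch assumes an accuracy matrix with one row per dataset and one column per classifier, and it assumes the post-hoc step compares each classifier against a control using a normal approximation on average-rank differences, which is a common choice but is not confirmed by the text. The helper names `holm_adjust` and `friedman_with_holm` are hypothetical.

```python
import numpy as np
from scipy import stats


def holm_adjust(p_values):
    """Holm step-down adjustment of a family of p-values."""
    p = np.asarray(p_values, dtype=float)
    m = len(p)
    adjusted = np.empty(m)
    running_max = 0.0
    # Visit p-values from smallest to largest; multipliers are m, m-1, ..., 1.
    for i, idx in enumerate(np.argsort(p)):
        running_max = max(running_max, (m - i) * p[idx])
        adjusted[idx] = min(1.0, running_max)  # keeps adjusted values monotone
    return adjusted


def friedman_with_holm(scores, control=0):
    """Friedman test over classifiers (columns), then Holm-adjusted
    comparisons of every other classifier against the control column."""
    scores = np.asarray(scores, dtype=float)
    n, k = scores.shape  # n datasets, k classifiers
    statistic, p_value = stats.friedmanchisquare(*scores.T)
    # Rank within each dataset; a higher score gets a higher rank.
    ranks = np.apply_along_axis(stats.rankdata, 1, scores)
    avg_ranks = ranks.mean(axis=0)
    # Assumed post-hoc statistic: z-score of the average-rank difference.
    std_err = np.sqrt(k * (k + 1) / (6.0 * n))
    others = [j for j in range(k) if j != control]
    z = (avg_ranks[control] - avg_ranks[others]) / std_err
    raw_p = 2 * stats.norm.sf(np.abs(z))
    return statistic, p_value, avg_ranks, dict(zip(others, holm_adjust(raw_p)))
```

With the stacked model placed in the control column, a classifier would be judged significantly different from it when its adjusted p-value falls below 0.05.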
Based on the results, we observe that the proposed gradient boosting-based multi-neural approach produces results superior to those of each individual neural classification model. Owing to the complexity and high mortality of cancer, diagnostic precision is critical. Consequently, improving diagnosis prediction by applying machine learning systems is of great aid to cancer treatment. According to an earlier study (Kourou et al. 2015a), neural networks have been used in 70% of cancer research studies. This observation encouraged us to integrate multiple neural models to attain a more precise classification model. In the study, we presented an evaluation of the proposed multi-neural technique and the three different MLP models acting alone. The simulation results on the eight datasets show that the proposed strategy yields greater accuracy than any of the other learners performing individually. The sensitivity analysis shown in Figure 12 indicates that a single classification model displays uneven performance across datasets.

Conclusion
A gradient boosting-based multi-neural approach is presented to predict cancer diagnosis, stage, and survival. Multiple cancer datasets, including real-time, clinical, image-driven, and gene expression datasets, have been analyzed. The stacking-based multi-neural ensemble model combines the outputs of the three neural classifiers. Employing gradient boosting at the second level enables the ensemble method to automatically recognize the intricate relationships among the classifiers and thereby achieve better prediction. This exploratory investigation suggests that the proposed stacking-based deep learning model can be a useful biomarker for various tumors. An ideal classifier must achieve high sensitivity, as diagnosing a patient with a tumor as nonmalignant would be a significant hazard; in cancer studies, this misclassification can be more hazardous than categorizing a healthy patient as malignant. The gradient boosting used in the ensemble stage automatically learns complex structures: the instance labels are learned such that the outputs of the MLPs and the associations among them are considered. The gradient boosting learner works in a stepwise fashion by placing more weight on the instances that were misclassified in the previous stage, which yields appreciable cancer prediction accuracy. The classification outcomes achieved by the predictive model on each of the cancer datasets are sound enough to support the use of the proposed model in further studies and medical practice. The study has some limitations; for instance, the model has been evaluated on small datasets only, and it needs to be validated on considerably larger datasets. Also, the proposed approach has been evaluated only on cancer datasets; for the sake of generalizability, the proposed model needs to be validated on other disease datasets as well.
Regarding future directions, we aim to analyze the performance of the proposed model on other disease datasets.

Disclosure Statement
No potential conflict of interest was reported by the author(s).

Data Availability Statement
The online datasets are available at the following link: https://github.com/surbhigupta24/Stacking-Based-Multi-Neural-Ensemble-