Covid-19 diagnosis by WE-SAJ

With the global COVID-19 pandemic, the number of confirmed patients has increased rapidly, leaving the world with very few medical resources. Therefore, the fast diagnosis and monitoring of COVID-19 is one of the world's most critical challenges today. Artificial intelligence-based CT image classification models can quickly and accurately distinguish infected patients from healthy populations. Our research proposes a deep learning model (WE-SAJ) using wavelet entropy for feature extraction, a two-layer feedforward neural network (FNN) for classification, and the self-adaptive Jaya algorithm as the training algorithm. It achieves superior performance compared to the Jaya-based model, with a sensitivity of 85.47±1.84, specificity of 87.23±1.67, precision of 87.03±1.34, accuracy of 86.35±0.70, F1 score of 86.23±0.77, Matthews correlation coefficient of 72.75±1.38, and feature mutual information of 86.24±0.76. Our experiments demonstrate the potential of artificial intelligence techniques for COVID-19 diagnosis and the effectiveness of the self-adaptive Jaya algorithm compared to the Jaya algorithm for medical image classification tasks.


Introduction
Covid-19 is a respiratory disease caused by the novel coronavirus SARS-CoV-2. Since the first cases appeared in 2019, Covid-19 has spread to most countries and territories worldwide (Hotez et al., 2021). Although there are several effective vaccines against the disease, they have not been able to definitively stop the spread of Covid-19 due to its high variability. Depending on the severity of the disease, patients with Covid-19 may suffer from mild respiratory symptoms such as cough and fever, severe pneumonia, multi-organ failure, or even death (Struyf et al., 2020). As the global pandemic progresses, the number of severe cases and deaths from Covid-19 continues to rise, a significant blow to human life and the global economy. Scholars in various fields have become highly concerned about the potentially severe consequences of Covid-19 and have maintained a constant interest in possible solutions. An important issue that continues to be addressed is the lack of medical resources due to the rapid increase in patients. The current standard method of Covid-19 detection is Reverse Transcription Polymerase Chain Reaction (RT-PCR) (Fang et al., 2020), which has the disadvantage of a high proportion of false negatives and often requires multiple tests to produce reliable results, making the process highly time-consuming. Diagnostic methods using chest CT and X-ray images have the additional advantage of assessing the extent of a patient's disease (Fang et al., 2020). However, they require a large number of medical experts to perform the diagnosis.

CONTACT Shui-Hua Wang shuihuawang@ieee.org; Yu-Dong Zhang yudongzhang@ieee.org. *These authors contributed equally to this paper and should be regarded as co-first authors.
Meanwhile, the number of confirmed cases of Covid-19 is increasing rapidly. It is difficult to diagnose and monitor the vast number of Covid-19 patients promptly manually. Therefore, finding a quick and accurate diagnosis method is becoming one of the most critical tasks for stopping the spread of COVID-19.
Artificial intelligence has been a popular field of research in recent years, attracting many researchers to solve complex problems in areas such as medicine, economics, and cyber security (Chen et al., 2005; Du et al., 2007; Han & Huang, 2006; Wang et al., 2010; Wang & Huang, 2009). A significant advantage of AI is that machines can be trained to replace humans in repetitive and complex tasks (Du et al., 2006; Huang, 1999; Huang & Du, 2008; Meskó et al., 2018). In particular, artificial intelligence can address the diagnostic difficulties associated with the rapid increase in patients. Many scholars believe that machine learning techniques applied to medical images can effectively diagnose COVID-19 patients (Wehbe et al., 2021). Many studies on machine learning techniques for chest X-ray and CT images of COVID-19 patients have emerged, and some of them have achieved relatively good performance; a significant number have also implemented innovative and inspiring image processing methods for COVID-19. (Szegedy et al., 2015) proposed a 22-layer deep network for image classification and detection tasks. Their model is suitable for COVID-19 diagnosis, but their research did not mention which optimisation algorithm they used, which could be a direction for further improving their model's performance. (Lu, 2016) proposed a radial basis function (RBF) based model for brain disease diagnosis. Their model can be generalised to the diagnostic task of COVID-19, but did not achieve a stable and promising performance. (Chen, 2020) combined the grey-level co-occurrence matrix (GLCM) and support vector machine (SVM) to classify COVID-19 chest CT images and demonstrated the effectiveness of this method. (Yao & Han, 2020) worked on the COVID-19 chest CT image classification task using a wavelet entropy (WE) and biogeography-based optimisation (BBO) based method (WE-BBO).
They discovered the possible performance improvement from combining an optimisation algorithm with wavelet entropy. Based on WE-BBO, (Wang, 2021) proposed a combined wavelet and Jaya method for the same task and achieved improved performance, but their method requires the population size to be set manually, which may lead to locally optimal solutions. The Jaya algorithm they used was proposed by (Rao, 2016); it solves constrained and unconstrained optimisation problems by moving as close to the optimal solution as possible while avoiding the worst solution. Moreover, this algorithm is parameter-free and straightforward to use. (Rao & More, 2017) proposed a modified version of the Jaya algorithm, called the self-adaptive Jaya algorithm, which removes the Jaya algorithm's requirement for a manually set population size. This algorithm can automatically adjust the population size based on the current and previous population sizes and the current solution. Because of these advantages, self-adaptive Jaya can bring higher performance and application value to the model. We built this experiment on the research of (Wang, 2021) by replacing the Jaya algorithm, proposing a WE-SAJ model that combines the self-adaptive Jaya algorithm with wavelet entropy, and made considerable progress compared to the previous research.
Our contributions are as follows: (i) We propose a combined self-adaptive Jaya algorithm and wavelet entropy method (WE-SAJ) for Covid-19 diagnosis; (ii) We demonstrate the performance improvement of the self-adaptive Jaya algorithm over the Jaya algorithm for medical image classification models; (iii) We further demonstrate the value of AI technology for the COVID-19 diagnostic task. The second part of the paper presents the dataset used for the experiments. The third part describes the main methods involved in the experiments. Finally, the fourth part presents and discusses the experimental results.

Dataset
We used a chest CT image dataset for the experiment. The dataset consisted of chest CT scans from 77 men and 55 women, for a total of 132 subjects. Each sample consisted of one complete chest CT image and the corresponding nucleic acid test result. The dataset was divided into two groups, the COVID-19-infected group and the healthy group, and each group included 148 chest CT slices. The COVID-19-infected group consisted of chest CT images from 66 COVID-19 patients from the Fourth People's Hospital in Huai'an, China, while the healthy group consisted of chest CT images from 66 healthy subjects. Table 1 shows the statistics of the dataset. Figure 1 shows two samples from this dataset (Wang, 2021).

Wavelet transform
The Fourier transform is widely used in many areas of signal analysis as a method that transforms a signal from the time domain to the frequency domain. The form of the transform is shown in Equation (1):

F(ω) = ∫_{−∞}^{+∞} f(t) e^{−jωt} dt (1)

where ω refers to the frequency and t refers to time.
Although the Fourier transform can analyse the spectrum of a signal and has high application value, it has certain limitations when dealing with non-stationary signals. The Fourier transform can only capture which frequencies a signal generally consists of and cannot reflect the moments at which these frequencies occur, making it possible for two non-stationary signals that differ in the time domain to appear identical in the frequency domain (Saravanan & Ramachandran, 2010). Many signals in nature are non-stationary, and biological and medical signal analysis problems can rarely be solved using the straightforward Fourier transform. A simple and feasible way to solve such problems is to decompose the entire time-domain signal into many short-time signals, making each short-time signal approximately stationary, and then to apply the Fourier transform on this basis to identify the moment at which each frequency occurs. This decomposition process is known as windowing, and this Fourier transform based on signal decomposition is known as the short-time Fourier transform (Allen & Rabiner, 1977). However, this method is limited by the width of the window. A window that is too wide results in low temporal resolution and a lack of refinement in the time domain, while a window that is too narrow results in poor frequency resolution and a lack of precision in frequency analysis. Furthermore, the window's width does not change during a short-time Fourier transform, so the short-time Fourier transform is also not the best solution for non-stationary signal analysis.
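This limitation is easy to demonstrate numerically: a signal and its time reversal differ completely in the time domain yet share an identical magnitude spectrum. A minimal sketch (illustrative only, not part of the paper's pipeline):

```python
import numpy as np

# Two clearly different non-stationary signals: a low-then-high frequency
# sequence and its time reversal (high-then-low).
t = np.linspace(0, 1, 1000, endpoint=False)
x = np.concatenate([np.sin(2 * np.pi * 5 * t), np.sin(2 * np.pi * 50 * t)])
x_reversed = x[::-1]

# Their magnitude spectra are identical, so the plain Fourier transform
# cannot tell them apart even though they differ in the time domain.
spectrum_a = np.abs(np.fft.fft(x))
spectrum_b = np.abs(np.fft.fft(x_reversed))
print(np.allclose(spectrum_a, spectrum_b))  # True
```

For a real signal, time reversal only conjugates the spectrum (up to a phase factor), so the magnitudes coincide exactly; only a time-frequency method such as the wavelet transform below can distinguish the two.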
The wavelet transform replaces the infinitely long trigonometric basis of the Fourier transform with a finite, decaying wavelet basis, locating the moment at which a frequency occurs while obtaining the frequency itself (Quian Quiroga et al., 2001). The transformation equation is shown in Equation (2):

W(a, τ) = (1/√a) ∫_{−∞}^{+∞} f(t) ψ*((t − τ)/a) dt (2)

where a represents scale, τ represents translation, ψ represents the mother wavelet function (ψ* denotes its complex conjugate), and t represents time.
A wavelet is a wave that is more concentrated in the time domain than a sine wave in the Fourier transform, where the energy is finite and concentrated at a point. Wavelets can be used to efficiently extract information from a signal and analyse functions or signals at multiple scales of refinement through operational functions such as scaling and translation. The essence of the wavelet transform is similar to the Fourier transform in that a carefully selected basis represents the signal function. Each wavelet transform has a mother wavelet and a scaling function. The basis function of any wavelet transform is the set of scaling and translation of the mother wavelet and scaling function. This scaling and translation correspond to two variables in the wavelet transform function, the scale and the translation. The scale controls the scaling of the wavelet function, which corresponds to frequency, and the translation controls the translation of the wavelet function, which corresponds to time (Saritha et al., 2013). The wavelet transform can therefore capture which frequency components the signal contains at different moments, thus solving the shortcomings of the Fourier transform when it comes to analysing unstable signals.
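As a concrete illustration of the scaling-and-translation machinery, one level of the discrete wavelet transform can be sketched with the simple Haar wavelet. This is for self-containment only; the paper itself uses biorthogonal wavelets:

```python
import numpy as np

def haar_dwt(signal):
    """One level of the discrete Haar wavelet transform.

    Returns approximation (low-frequency) and detail (high-frequency)
    coefficients, each half the length of the input, reflecting the
    downsampling inherent in the wavelet transform.
    """
    even, odd = signal[0::2], signal[1::2]
    approx = (even + odd) / np.sqrt(2)   # scaling-function (low-pass) output
    detail = (even - odd) / np.sqrt(2)   # wavelet (high-pass) output
    return approx, detail

x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
a, d = haar_dwt(x)
# The Haar transform is orthogonal, so signal energy is preserved
# across the two sub-bands.
print(np.sum(x ** 2), np.sum(a ** 2) + np.sum(d ** 2))
```

Applying `haar_dwt` recursively to the approximation coefficients yields the multi-level decomposition described in the next section.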

Wavelet entropy
Wavelet entropy is a novel tool for analysing the instantaneous characteristics of non-stationary signals, based on the wavelet transform and combining wavelet decomposition with entropy. The original Shannon entropy provides a valuable criterion for quantifying and comparing the energy distribution in wavelet sub-bands. It can be defined as Equation (3) (Yildiz et al., 2009):

S(g) = −Σ_g P(g) log₂ P(g) (3)

where g refers to grey levels and P refers to the probabilities of the grey levels. Thus, S(g) can represent the energy distribution in wavelet sub-bands according to the probabilities of the grey levels occurring. However, most of the research on Shannon entropy has concerned engineering applications, and its physical meaning and principles have not been discussed in depth. Moreover, the shortcomings of Shannon entropy make it prone to wavelet mixing and energy leakage when dealing with non-stationary signals, which may lead to inaccurate or even incorrect results. Given this, many new solutions to these problems have emerged, such as relative wavelet entropy and Tsallis wavelet entropy (Chen & Li, 2014). Our research uses a 4-level decomposition with biorthogonal wavelets. Compared to orthogonal wavelet bases, biorthogonal wavelet bases resolve the incompatibility of symmetry and exact signal reconstruction. Biorthogonal wavelets consist of two wavelets forming a dual pair, which decompose and reconstruct the signal separately. Biorthogonal wavelets resolve the contradiction between the linear phase and orthogonality requirements and are widely used in signal and image reconstruction. In this research, wavelet entropy is used for feature extraction, and the extracted features are then fed into a two-layer feedforward neural network for classification.
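The feature-extraction idea can be sketched as follows: compute the entropy of Equation (3) over the relative energies of the wavelet sub-bands. This is a simplified illustration using a Haar decomposition rather than the paper's 4-level biorthogonal wavelet:

```python
import numpy as np

def haar_level(signal):
    """One level of the Haar DWT: (approximation, detail)."""
    even, odd = signal[0::2], signal[1::2]
    return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)

def wavelet_entropy(signal, levels=4):
    """Shannon entropy (Equation (3)) of the relative sub-band energies
    of a multi-level wavelet decomposition (Haar here, for simplicity)."""
    energies = []
    approx = np.asarray(signal, dtype=float)
    for _ in range(levels):
        approx, detail = haar_level(approx)
        energies.append(np.sum(detail ** 2))   # one detail sub-band per level
    energies.append(np.sum(approx ** 2))       # final approximation sub-band
    p = np.array(energies) / np.sum(energies)  # relative sub-band energies
    p = p[p > 0]                               # skip empty sub-bands (log(0))
    return -np.sum(p * np.log2(p))

rng = np.random.default_rng(0)
print(wavelet_entropy(rng.standard_normal(256)))
```

A constant signal concentrates all energy in the approximation sub-band and yields entropy 0, while a signal whose energy is spread evenly across sub-bands approaches the maximum entropy log₂(levels + 1).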

Feedforward neural network
The Feedforward Neural Network (FNN) (Jansen-Winkeln et al., 2021) is a typical deep learning model consisting of multiple layers of logistic regression models (continuous nonlinear functions); it is also known as a single- or multi-layer perceptron, depending on the number of network layers. Each network layer contains a number of neurones (perceptrons). The structure of a perceptron can be defined as in Equation (4) (Venkata, 2017):

y = Σ_{i=1}^{m} w_i x_i + b (4)

where w_i are the weights, x is the set of inputs x_i, m is the number of inputs to the perceptron, and b is the bias added after the sum of the weighted inputs. Learning the values of w_i and b allows the model to cope with different tasks. An FNN usually consists of an input layer, several hidden layers, and an output layer (Han et al., 2010). In an FNN, the first network layer is the input layer, the last layer is the output layer, and the layers between the input and output layers are hidden layers. Although a wide variety of classifiers already exist, when faced with classification tasks involving very similar classes, neural networks can add non-linearity and change the representation of the data through hidden layers to better generalise the model (Rudolph, 1997).
In this study, a two-layer FNN is used with Mean Squared Error (MSE) as the loss function. A sample FNN structure is shown in Figure 2, where m is a hyperparameter representing the number of hidden layers, and the number of perceptrons in each hidden layer is also a hyperparameter. c represents the number of inputs (x_c), and d represents the number of outputs (y_d), which depends on the number of output classes. For example, for a five-class classification task, d would equal 5.
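A minimal sketch of such a network's forward pass and MSE loss, with hypothetical layer sizes rather than the paper's trained configuration:

```python
import numpy as np

rng = np.random.default_rng(42)

def forward(x, w1, b1, w2, b2):
    """Forward pass of a two-layer feedforward network:
    input -> hidden (sigmoid) -> output."""
    # Each hidden unit is a perceptron (Equation (4)) followed by a
    # sigmoid non-linearity.
    hidden = 1.0 / (1.0 + np.exp(-(x @ w1 + b1)))
    return hidden @ w2 + b2

def mse(pred, target):
    """Mean Squared Error loss."""
    return np.mean((pred - target) ** 2)

# Hypothetical sizes: c = 5 input features, 10 hidden units, d = 2 classes
w1, b1 = rng.standard_normal((5, 10)), np.zeros(10)
w2, b2 = rng.standard_normal((10, 2)), np.zeros(2)

x = rng.standard_normal((4, 5))  # batch of 4 feature vectors
y = np.array([[1, 0], [0, 1], [1, 0], [0, 1]], dtype=float)
print(forward(x, w1, b1, w2, b2).shape, mse(forward(x, w1, b1, w2, b2), y))
```

In the paper's pipeline the weights and biases are not learned by gradient descent but optimised by the (self-adaptive) Jaya algorithm described next, with the MSE above as the fitness function.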

Jaya algorithm
The Jaya algorithm is a population-based heuristic algorithm (Cheng, 2018). In the Jaya algorithm, each search agent updates its current value based on the best and worst known values, continuously approaching the optimal solution while avoiding the worst one. This approach helps convergence to the global optimal solution (Rao, 2016; Degertekin et al., 2021). Compared to traditional evolutionary algorithms, the Jaya algorithm is parameter-free, effectively avoiding the locally optimal solutions that traditional evolutionary algorithms can produce after incorrect algorithmic tuning (Han, 2018). Moreover, it has greater reliability and generalisability than other heuristics and is widely used in optimisation applications and research in several fields. The basic equation of the Jaya algorithm is shown in Equation (5) (Zhao, 2018):

X′_{j,k,i} = X_{j,k,i} + r_{1,j,i} (X_{j,best,i} − |X_{j,k,i}|) − r_{2,j,i} (X_{j,worst,i} − |X_{j,k,i}|) (5)

where X_{j,k,i} is the value of the j-th variable of the k-th candidate solution in the i-th iteration, X_{j,best,i} and X_{j,worst,i} are the corresponding values in the best and worst candidate solutions, r_{1,j,i} and r_{2,j,i} are random numbers in [0, 1], and the updated value X′_{j,k,i} is accepted only if it yields better fitness.
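The Jaya update of Equation (5) can be sketched on a toy sphere function. This is an illustrative minimiser, not the paper's training code, which applies the same update to the FNN weights:

```python
import numpy as np

def jaya(cost, dim, pop_size=20, iters=100, lo=-5.0, hi=5.0, seed=0):
    """Minimise `cost` with the Jaya algorithm (Equation (5)):
    each candidate moves towards the best solution and away from
    the worst, with greedy acceptance of improvements."""
    rng = np.random.default_rng(seed)
    pop = rng.uniform(lo, hi, size=(pop_size, dim))
    costs = np.apply_along_axis(cost, 1, pop)
    for _ in range(iters):
        best, worst = pop[np.argmin(costs)], pop[np.argmax(costs)]
        r1, r2 = rng.random((pop_size, dim)), rng.random((pop_size, dim))
        cand = pop + r1 * (best - np.abs(pop)) - r2 * (worst - np.abs(pop))
        cand = np.clip(cand, lo, hi)
        cand_costs = np.apply_along_axis(cost, 1, cand)
        improved = cand_costs < costs  # greedy acceptance of better moves
        pop[improved], costs[improved] = cand[improved], cand_costs[improved]
    return pop[np.argmin(costs)], float(costs.min())

sphere = lambda x: float(np.sum(x ** 2))
best_x, best_cost = jaya(sphere, dim=3)
print(best_cost)
```

Because a candidate is replaced only when its cost improves, the best cost in the population is non-increasing over iterations.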

Self-adaptive Jaya algorithm
This study used a modified Jaya algorithm called the self-adaptive Jaya algorithm (Rao & More, 2017). This method divides the solutions into groups based on quality distinctions, distributed in the search space, to obtain the best solution. The most important feature of this algorithm is that the population size is determined automatically; its population development is shown in Equation (6) (Ravipudi & Neebha, 2018):

n_new = round(n_old + r · n_old) (6)

where n_new represents the new population size, n_old represents the old population size, and r is the relative population development rate, a random value taken from [−0.5, 0.5]. Since r is random, the new population may be larger or smaller than the old population.
• When the new population is smaller than the old population, only the current best candidates enter the next generation.
• When the new population equals the old population, no change occurs.
• When the new population is larger than the old population, all current candidates enter the next generation.
The flow chart of the Self-adaptive Jaya algorithm is shown in Figure 3. This research uses the Self-adaptive Jaya algorithm as the training algorithm.
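The population-resizing rules above can be sketched as follows. The `next_population` helper is a hypothetical illustration of Equation (6) and the survival rules, not code from the cited work:

```python
import numpy as np

def next_population(pop, costs, r):
    """Self-adaptive population resizing (Equation (6)):
    n_new = round(n_old + r * n_old), with r drawn from [-0.5, 0.5].
    Shrinking keeps only the current best candidates; growing keeps
    everyone (additional candidates would then be generated by the
    Jaya updates)."""
    n_old = len(pop)
    n_new = int(round(n_old + r * n_old))
    if n_new <= n_old:
        keep = np.argsort(costs)[:n_new]  # best-first (lowest-cost) survival
        return pop[keep]
    return pop  # all current candidates enter the next generation

rng = np.random.default_rng(1)
pop = rng.standard_normal((10, 3))
costs = np.array([5, 1, 7, 3, 9, 2, 8, 4, 6, 0], dtype=float)
print(len(next_population(pop, costs, r=-0.3)))  # 10 -> 7
print(len(next_population(pop, costs, r=0.0)))   # unchanged: 10
```

With r = −0.3 the population shrinks to round(10 − 3) = 7 best candidates; with r = 0 it is unchanged; with r > 0 all ten survive and the population grows in the subsequent update step.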

K-fold cross-validation
In machine learning, researchers usually divide the dataset into a training set, used for model training, and a test set, used to measure model performance and thus the generalisation of the model. However, machine learning is a data-driven science: the size of the dataset has a significant impact on the model's performance, with larger amounts of data tending to train higher-performance models. Many studies face data scarcity, and dividing the dataset into training and test sets reduces the amount of data available for training, thus affecting model performance. The core idea of cross-validation is to reuse data to increase the amount available for training while ensuring the availability of a test set. The K-fold cross-validation (Rajasekaran & Rajwade, 2021) used in this study is a widely used cross-validation method. It divides the dataset into a pre-specified K groups, takes each group in turn (without repetition) as the test set, uses all the other data as the training set, and evaluates the model on the held-out test set. Training is repeated K times, each time with a different group as the test set. Finally, the final performance (P_final) is obtained from the model performances over the K tests. Figure 4 illustrates a concrete form of K-fold cross-validation. To obtain a more reliable and robust result, we used 10-fold cross-validation to divide the dataset.
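The splitting scheme can be sketched as follows; `k_fold_indices` is an illustrative helper, not the paper's implementation:

```python
import numpy as np

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for K-fold cross-validation:
    each sample is used exactly once as test data across the K folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, test

# 296 slices (148 per class, as in the dataset above), 10 folds
seen = []
for train, test in k_fold_indices(296, k=10):
    assert len(train) + len(test) == 296
    seen.extend(test.tolist())
print(sorted(seen) == list(range(296)))  # every sample tested exactly once
```

The final performance P_final is then an aggregate (typically mean ± standard deviation, matching the ±-style figures reported in this paper) of the per-fold metrics.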

Evaluation
In this study, we evaluated the model performance by multiple methods. To be specific, they are Accuracy, Sensitivity, Precision, Specificity, F-score, Area Under the ROC Curve (AUC) and Confusion Matrix.

Confusion matrix
The confusion matrix is a visualisation tool used to see how the model performs on each class; it is represented as a matrix of n rows and n columns, where n is the number of classes in the classification task. The confusion matrix involves four central values: True Positive (TP, the number of positive samples correctly predicted as positive), True Negative (TN, the number of negative samples correctly predicted as negative), False Positive (FP, the number of negative samples incorrectly predicted as positive), and False Negative (FN, the number of positive samples incorrectly predicted as negative), as shown in Figure 5.
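Deriving the four counts from a set of predictions can be sketched as:

```python
import numpy as np

def confusion_counts(y_true, y_pred):
    """TP, TN, FP, FN for a binary task (1 = COVID-19, 0 = healthy)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # correctly flagged positives
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))  # correctly cleared negatives
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # false alarms
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # missed positives
    return tp, tn, fp, fn

y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
print(confusion_counts(y_true, y_pred))  # (3, 3, 1, 1)
```

All of the metrics in the following subsections are functions of these four counts.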

Accuracy
Accuracy is a widely used indicator for assessing the performance of machine learning models; it represents the proportion of correctly classified samples out of the total number of samples. In general, the higher the accuracy, the better the classifier. Equation (7) is the formula for accuracy:

Accuracy = (TP + TN) / (TP + TN + FP + FN) (7)

Precision
Precision is the proportion of samples predicted as positive that are truly positive. The calculation of precision is shown in Equation (8):

Precision = TP / (TP + FP) (8)

Specificity
Specificity, also known as the true negative rate, represents the ability of the model to recognise negative samples. The calculation of specificity is shown in Equation (9):

Specificity = TN / (TN + FP) (9)

Sensitivity
Sensitivity, also known as the true positive rate, represents the ability of the model to recognise positive samples. The calculation of sensitivity is shown in Equation (10):

Sensitivity = TP / (TP + FN) (10)

F-Score
The F-score is a benchmark generated by combining precision and sensitivity. The higher the F1-score, the more balanced and stable the classification model. The calculation of the F1-score is shown in Equation (11):

F1 = (2 × Precision × Sensitivity) / (Precision + Sensitivity) (11)
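The metrics of Equations (7) to (11) can be computed directly from the four confusion-matrix counts; a minimal sketch with hypothetical counts:

```python
def classification_metrics(tp, tn, fp, fn):
    """Equations (7)-(11), computed from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)       # Eq. (7)
    precision = tp / (tp + fp)                       # Eq. (8)
    specificity = tn / (tn + fp)                     # Eq. (9)
    sensitivity = tp / (tp + fn)                     # Eq. (10)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)  # Eq. (11)
    return accuracy, precision, specificity, sensitivity, f1

# Hypothetical counts for illustration, not the paper's results
acc, prec, spec, sens, f1 = classification_metrics(tp=30, tn=28, fp=2, fn=4)
print(round(acc, 4), round(prec, 4), round(sens, 4))
```

Note the denominators assume at least one predicted positive, one actual negative, and one actual positive; degenerate cases would need explicit handling.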

AUC
AUC, a nonparametric statistic unaffected by the category distribution (Wu & Flach, 2005), is one of the most common indicators used to evaluate binary classification models. AUC evaluates model performance by calculating the area under the curve (the ROC curve) of the model's classification results, plotted with the false positive rate on the horizontal axis and the true positive rate on the vertical axis. Since the AUC considers the classification ability of the model for both positive and negative cases, it can evaluate the classifier reasonably well even in the case of sample imbalance. Assuming the ROC curve is constructed from m points (x_1, y_1), (x_2, y_2), ..., (x_m, y_m), the estimate of the AUC can be defined as Equation (12) (Lee, 2019):

AUC = Σ_{i=1}^{m−1} (x_{i+1} − x_i)(y_i + y_{i+1}) / 2 (12)

Figure 6 illustrates a sample 4-level biorthogonal wavelet decomposition. Figure 6(a), the result of the first-level wavelet transform, shows the input map's low-frequency sub-band (upper left corner) and high-frequency sub-bands (upper right, lower left, and lower right corners). Since the wavelet transform introduces downsampling, the edge lengths of the four sub-band maps are half those of the input. In the second-level wavelet transform, Figure 6(b), the low-frequency and high-frequency sub-bands are obtained by applying a similar transformation to the low-frequency sub-band from the first-level wavelet transform. The results of the third-level wavelet transform, shown in Figure 6(c), and the fourth-level wavelet transform, shown in Figure 6(d), are obtained by applying this operation recursively. Note that the images in the sample are painted with pseudo-colour; they are actually greyscale.
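The trapezoidal estimate in Equation (12) can be sanity-checked on two textbook ROC curves; a minimal sketch:

```python
import numpy as np

def auc_trapezoid(fpr, tpr):
    """Trapezoidal estimate of AUC (Equation (12)) from ROC points
    sorted by increasing false positive rate."""
    fpr, tpr = np.asarray(fpr, dtype=float), np.asarray(tpr, dtype=float)
    # Sum of trapezoid areas: (x_{i+1} - x_i) * (y_i + y_{i+1}) / 2
    return float(np.sum(np.diff(fpr) * (tpr[:-1] + tpr[1:]) / 2.0))

# A perfect classifier's ROC hugs the top-left corner: AUC = 1.0.
print(auc_trapezoid([0.0, 0.0, 1.0], [0.0, 1.0, 1.0]))  # 1.0
# The chance diagonal: AUC = 0.5.
print(auc_trapezoid([0.0, 1.0], [0.0, 1.0]))            # 0.5
```

Real ROC curves such as those in Figure 7 have many intermediate threshold points, but the same formula applies.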

Statistical results
WE-SAJ uses wavelet entropy as the feature extraction method, a 2-layer FNN as the classifier, the self-adaptive Jaya algorithm as the training algorithm, and K-fold cross-validation to report unbiased performance. The experiments achieved strong performance. Table 3 shows the specific numerical comparison between the self-adaptive Jaya algorithm-based model (WE-SAJ) and the previous Jaya-based research (WE-Jaya). Compared to WE-Jaya, WE-SAJ achieves significant improvements in all performance metrics.

Self-adaptive Jaya compared to Jaya
WE-SAJ improves accuracy by more than ten percentage points, which means our method has higher practical value. More detailed improvements can be seen in the other performance indicators. WE-SAJ improves sensitivity by more than 12 percentage points and specificity by more than 11 percentage points. This suggests that WE-SAJ can ensure that more infected patients are correctly identified while misdiagnosing as few healthy people as possible, thereby identifying COVID-19 patients effectively and reducing unnecessary wastage of healthcare resources. The improvement in F1-score also demonstrates that the model achieves better overall performance with equal weighting of precision and sensitivity. The increase of over 11 percentage points in the FMI index indicates higher relevance of the features extracted by the model to the data labels, which means the model has improved its ability to extract useful features.
This series of improvements can be attributed to the fact that the automatic population sizing feature of the self-adaptive Jaya algorithm can effectively set the most appropriate population size, thus enhancing the tracking of the optimal solution and ultimately helping the model achieve better performance. Figure 7 illustrates the ROC curves and AUC values of WE-Jaya (a) and WE-SAJ (b), respectively. Each point on the graph corresponds to a threshold. When the threshold is maximal, True Positive Rate (TPR) = False Positive Rate (FPR) = 0, which corresponds to the origin (0, 0) of the graph. When the threshold is minimal, TPR = FPR = 1, corresponding to the point (1, 1) in the upper right corner. As the threshold decreases, both TPR and FPR increase. The ROC curves show that WE-SAJ achieves a lower false positive rate and a higher true positive rate than the Jaya-based model at most thresholds, which leads to a higher AUC for WE-SAJ than for the Jaya-based model. This indicates that WE-SAJ has more diagnostic value than WE-Jaya.

Comparison to state-of-the-art approaches
Compared to other state-of-the-art approaches to CT image classification of COVID-19, WE-SAJ shows significant improvement in all aspects; Table 4 gives the numerical comparison. The model obtains this overall improvement over these SOTA models because of the advantages of the self-adaptive Jaya algorithm: it can automatically adjust the population size based on the current and previous population sizes and the current solution, avoiding locally optimal solutions and increasing the possibility of finding the globally optimal solution. The results also demonstrate that our method is promising for the CT image classification task of COVID-19 and has great scope for further improvement.

Conclusions
Artificial intelligence-based medical image analysis techniques have significant applications in the fight against COVID-19. They can help address the diagnostic challenges caused by the lack of medical resources.
This experiment validates the feasibility of the wavelet entropy and self-adaptive Jaya algorithm-based model for the COVID-19 chest CT image classification task, achieving promising performance. The approach is highly generalisable and theoretically applicable to other types of medical image classification tasks, which needs to be further validated in future studies. Based on the experimentally obtained performance, we have reason to believe that, with further optimisation and improvement, we can obtain models with better performance for the diagnosis and recognition of COVID-19 and other diseases, and solve more medical challenges.

Disclosure statement
No potential conflict of interest was reported by the author(s).