Deep learning-based automated multiclass classification of chest X-rays into Covid-19, normal, bacterial pneumonia and viral pneumonia

Abstract Covid-19 has been a pandemic across almost all parts of the world. Due to its high spread rate and mortality rate, early detection is required. In the present study, we have used chest X-Rays to identify the presence of Covid-19 and several other pneumonia types (Viral and Bacterial). To perform this classification, we have used a transfer learning-based model relying upon a pre-trained VGG-16 classifier network, with the inception module as a pre-processing front end to this network. We present our model via three case study approaches, namely – Case (01) – four-class classification, Case (02) – three-class classification, and Case (03) – two-class classification. For these case studies, we have selected our classes from Normal, Covid-19, Viral Pneumonia, and Bacterial Pneumonia. We have evaluated our model’s classification performance on various parameters, such as accuracy, precision, sensitivity, specificity, false-positive rate, and F1-score, as one parameter alone is not sufficient to evaluate the performance. After training the network for all three cases, we have found Covid-19 classification accuracies of 91.86% for Case 01 (four classes), 97.67% for Case 02 (three classes), and 99.61% for Case 03 (two classes); all the other parameters are reported in the performance parameter section. Our proposed model's testing accuracies for the three cases are 87.32% for Case 01 (four classes), 96.89% for Case 02 (three classes), and 99.95% for Case 03 (two classes). Based on the achieved accuracies, our model showed comparable performance to pre-existing methods like VGG-16, Res-Net, and Inception-Net. These case studies can be used for the interpretation and classification of chest X-Rays into these classes, and with a larger dataset and more computational power, the proposed model can be extended to more classes.


Introduction
A global health crisis, Covid-19, began in December 2019 and was so impactful that it was declared a pandemic by the WHO (Singhal, 2020). It forced severe lockdowns and strict rules for movement across countries. It has changed many things, including lifestyle, work culture, economic conditions, and mortality rates all over the world, placing a massive burden on health infrastructure, which collapsed under the enormous number of patients and the limited medical resources available.
The novel coronavirus attacks the respiratory system of the human body. It infects the lungs, due to which the patient is unable to breathe correctly and may eventually die due to lack of oxygen. It has several symptoms (Cascella et al., 2022) such as fever, cough, and weakness; however, these symptoms vary from individual to individual based on their immune response. Loss of taste and smell is another symptom. As the Covid-19 infection progresses, the patient suffers chest pain and breathing difficulty, and such patients need an urgent supply of medical oxygen to restore breathing.
As of now, there is no particular drug that can cure the disease; instead, it is managed with symptomatic treatment. However, in several cases there may be no symptoms at all (Johansson et al., 2021). There are several vaccines (Polack et al., 2020) on the market, but none has proved 100% effective against the coronavirus. Even vaccinated people get infected with Covid-19 (Kissler et al., 2021), though with lesser severity and risk to life. The main challenges in dealing with the virus are its continuous rapid spread and its mutation into several variants. Recent coronavirus variants are delta and omicron, which are deadlier than earlier variants. Researchers have not yet tested vaccine effectiveness against omicron.
A better way to deal with Covid-19 is to make an early diagnosis as soon as possible, which can initiate treatment so that the spread of Covid-19 can be stopped or localized to a few persons and regions. The virus mainly spreads through human-to-human interaction, and several researchers have also found that it is transmitted through the air. It can also be passed to another person via contaminated surfaces or objects. Being an invisible threat and a hardly curable disease, it has affected almost all of the world population across all age groups, from small children to the oldest people. Nobody could avoid it, even after being quarantined, which shows how deadly it is for humankind. Bulk testing for Covid-19 enables us to find red zones where the number of patients, the spread rate, and the mortality rate are higher; we can mark such regions, implement a lockdown there, and quarantine their people. We need a method that is reliable, fast, and provides results in little time.
Out of several available methods (Giri et al., 2021) such as RT-PCR, INAT, antibody tests, serology tests, and medical imaging, Covid-19 testing traditionally uses the RT-PCR-based method, which takes considerable time (roughly 4-6 hr). The high spread rate, infection rate, and mortality rate motivate the search for alternative diagnostic solutions that take a fraction of that time. To this end, several pathologists started using Chest X-Ray and CT imaging as primary diagnostic tools, because these tools are widely available in every corner of the country. Also, considering that the virus mutates over time, it has been a primary requirement to diagnose it as early as possible.
Along with the RT-PCR report, X-Ray and CT scans have also been helpful for primary diagnosis of the virus and its severity based upon the CT score (Fang et al., 2020). With a CT score, patients are diagnosed as either Normal or Covid-19 and placed in quarantine or home isolation, but in several cases it has been found that Viral Pneumonia and other flu-like illnesses go undetected in the early stage. This gives false-negative results, which is dangerous for the patient: they have no symptoms and a negative CT score, yet they might be infected and can lose their lives. Hence, CT and X-Ray have certain limitations which need to be taken care of: a) they are unable to differentiate between Covid-19 and other respiratory infections; b) most people may get a normal CT score when severity is low, which does not accurately identify a patient's condition for early diagnosis; c) due to the rapid spread via contaminated objects and human-to-human interaction, healthy people and healthcare staff might also be infected while taking X-Rays of infected patients. Compared to CT, X-Ray is widely available, even in remote areas in most countries. Hence, we have utilized X-Rays for this study instead of CT, although CT is a more advanced imaging modality than X-Ray in terms of image quality and angle of visualization.
We introduce a deep transfer learning-based (Albahli & Albattah, 2021) method that classifies input X-Ray images into Normal, Covid-19, Viral Pneumonia, and Bacterial Pneumonia. To achieve this, we have combined the Inception module with the VGG-16 Net and compared the result against the pre-existing VGG-16 Net, Res-Net, and Inception-Net. With the proposed method, we achieved Covid-19 classification accuracies of 91.86% in Case (01), 97.67% in Case (02), and 99.61% in Case (03).

Related works
This section will review several methods and studies based on medical image classification, which give us insights about moving forward to conduct our research work.

Deep learning in medical imaging
With the latest improvements in computational capacity and the availability of massive datasets, researchers have developed various deep learning-based algorithms that outperform specialists at tasks such as medical image disease detection, classification, and segmentation. Due to this, we have been able to move towards automated diagnosis of several diseases, for example, brain tumor detection, skin lesion classification, breast cancer detection, and estimating the severity of diseases like Covid-19. Several deep learning architectures, such as Alex-Net (Krizhevsky et al., 2017), Le-Net (Lecun et al., 1998), Google-Net (Szegedy et al., 2015), VGG-16 Net (Simonyan & Zisserman), U-Net (Ronneberger et al.), and so on, have been used depending on the type of disease or task. No single deep learning architecture is sufficient for all tasks, and hence performance varies across tasks. Also, we do not always need the same performance metrics to evaluate a model, so a thorough check is necessary while designing and implementing it. Developing a model from scratch is challenging and requires vast computational resources, which might not be available at all institutions. But there is a way to deal with this: we can utilize pre-trained models, trained over millions of images, reuse their weight parameters up to the second-to-last layer, and tweak the final few layers based on our requirements.

Covid-19 classification
It has been two years since the first reported case of Covid-19 in December 2019. Deep learning-based image classification has existed for much longer, and many researchers have applied it to Covid-19 classification with the help of medical images such as Chest X-Ray and CT.
Wang et al. suggested one prevalent approach to classification (Wang et al., 2020) in the form of the proposed Covid-19-Net. They declared the open-source network design for Covid-19 classification using Chest X-Ray as the first of its kind. They also provided an open-source Covid-19X dataset.
Another notable work by Rajpurkar et al. (Rajpurkar et al., 2017) developed an algorithm that could beat radiologists at detecting Pneumonia from chest X-Ray images. They utilized a public dataset of chest X-Rays and detected 14 diseases better than radiologists. It was revolutionary work, as it surpassed radiologist performance.
A machine-learning-based approach was also applied by Kassani et al. (2020), in which they tried several deep learning-based models and machine learning classifiers to classify the results. Arias-Garzón et al. (2021) discussed Convolutional Neural Network (CNN)-based Covid-19 detection in X-Ray images. They studied the pre-existing VGG-16 and U-Net to process chest X-Rays and classify them as positive or negative for Covid-19, using a lung segmentation pre-processing step to remove unwanted portions of the chest X-Ray image. Their best model achieved a Covid-19 detection accuracy of around 97%. Das et al. (2021) presented an ensemble learning, CNN-based approach. Their work adopted multiple state-of-the-art models such as DenseNet201, Res-Net50V2, and InceptionV3, used a smaller dataset of 538 Covid-19-positive and 468 Covid-19-negative cases, and gave a classification accuracy of 91.62%. We improve on these results with a significant increase in the dataset and by tuning training parameters. Maghded et al. (2020), in one of their works, tried to create a novel AI-enabled framework for Covid-19 diagnosis using smartphones. In another notable work, the authors performed data collection via different online resources and aimed to build a Covid-19 detection algorithm; utilizing a simple CNN along with a pretrained Alex-Net, they achieved an accuracy of around 94.1%. Shah et al. (2022), in their work, gave insights about Covid-19 and its research challenges, providing a good review of the various factors associated with Covid-19.
Maghdid et al., in one of their works, designed a smartphone-based approach to manage Covid-19 lockdowns by finding the contamination zones.
Ferhat Ozgur Catak et al. (Catak & Şahinbaş, 2022), in one of their latest research articles, utilized a transfer-learning-based CNN model for Covid-19 detection. They proposed an uncertainty quantification-enhanced transfer-learning-based CNN to predict the presence of Covid-19 and achieved approximately 75% accuracy.
Based on the above literature survey, we found that there is a requirement for multiclass classification instead of just binary classification. This paper attempts multiclass classification in three separate case studies to make our model more versatile, so that it can be used as per the requirement and urgency of the situation. Our model performed best for binary classification, at 99.95% accuracy. It also performed very well in the other case studies involving multiclass classification: for four-class classification we achieved an accuracy of 87.32%, and for three-class classification we achieved 96.89%.

Research gaps
After reviewing the literature, we found a few common points that we need to take into consideration:
• Data insufficiency to train an extensive deep neural network to accurately classify into various classes.
• Most of the studies rely upon either just CT or CT with CXR.
• There is a need to develop a model that gives better classification results on a smaller dataset and addresses multi-class classification.
To deal with the problem of data scarcity, we collected the dataset from different resources and then cleaned it as per our requirements, performing data pre-processing and data augmentation to enhance its size. We stuck to chest X-Ray as the input image type mainly because it is widely available in most regions and cost-effective, and we utilized it for early Covid-19 detection using deep learning.

Original collected dataset
The Curated Covid-19 X-Ray Dataset (Rajpurkar et al., 2017) has been collected, assembled, and maintained from version 1 to version 3 by a joint effort between the Indian Institute of Science, PES University, MS Ramaiah Institute of Technology, and Concordia University. We have used version 3 of this dataset, which aggregates Covid-19 X-Rays from 15 publicly available datasets cited in the given reference.
The collected dataset includes four classes, namely: 1. Covid-19 X-Rays, 2. Normal X-Rays, 3. Viral Pneumonia X-Rays, and 4. Bacterial Pneumonia X-Rays. As shown in Figure 1, there are in total 4558 Covid-19 X-Rays, 5403 Normal X-Rays, 4497 Viral Pneumonia X-Rays, and 5768 Bacterial Pneumonia X-Rays. Of these, 1379 Covid-19 X-Rays, 1476 Normal X-Rays, 2690 Viral Pneumonia X-Rays, and 2588 Bacterial Pneumonia X-Rays are duplicates based on image similarity and were removed. Several other defective or poor-quality images were also removed, and the final dataset used in this study is shown in Figure 2. In the present study, we have studied the behaviour of the same model to evaluate classification accuracy over two, three, and four classes.

Data pre-processing
The dataset underwent extensive pre-processing to avoid class imbalance due to insufficient sample images from any particular class. To achieve that, we applied data augmentation to increase the number of sample images and make the distribution uniform. For data augmentation, we used the ImageDataGenerator module from the Keras library, tuning various augmentation parameters: rescale = 1/255, rotation_range = 20, width_shift_range = 0.2, and height_shift_range = 0.2. We also set horizontal_flip to "True": in a chest X-Ray, we are mainly focused on the vertical area, so a horizontal flip makes no real difference and does not lose much information.
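As a sketch, the augmentation settings above can be expressed with Keras' ImageDataGenerator (the directory name in the commented usage lines is illustrative, not from the paper):

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation settings from the study: rescale pixel values to [0, 1],
# apply small random rotations and shifts, and allow horizontal flips
# (acceptable for chest X-rays, where a flip loses little information).
train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True,
)

# Typical usage, assuming one sub-folder per class:
# train_gen = train_datagen.flow_from_directory(
#     "data4/train", target_size=(128, 128),
#     batch_size=16, class_mode="categorical",
# )
```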
To remove duplicate images, we searched for exact or near-exact duplicates with the help of the Pillow library in Python. With the help of hashing, we successfully removed these kinds of duplicates.
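A minimal sketch of this hashing-based de-duplication is below. It hashes the decoded, normalized pixel data with Pillow so that exact duplicates (and re-encoded or rescaled copies) collide; the paper does not publish its exact procedure, so the normalization choices here are assumptions, and true near-duplicate detection would need a perceptual hash, which this sketch omits.

```python
import hashlib
from PIL import Image

def image_hash(path):
    """Hash the decoded pixel content, after normalizing mode and size,
    so that byte-identical and trivially re-encoded duplicates collide."""
    with Image.open(path) as im:
        im = im.convert("L").resize((256, 256))
        return hashlib.md5(im.tobytes()).hexdigest()

def remove_duplicates(paths):
    """Keep the first occurrence of each distinct image, drop the rest."""
    seen, unique = set(), []
    for p in paths:
        h = image_hash(p)
        if h not in seen:
            seen.add(h)
            unique.append(p)
    return unique
```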
We have also selected the same number of images from each class to avoid overfitting or underfitting to one particular class, so that the network remains unbiased towards any specific class when performing classification tasks. We refer to the datasets as data2, data3, and data4 for the two-class, three-class, and four-class cases, respectively.

Prepared dataset
The dataset has been divided into three parts: Training data, Validation data, and Test data, in a ratio of 70% Training data, 20% Validation data, and the remaining 10% as Testing data. The testing data remain untouched during training; hence, testing classification accuracy can differ from training and validation classification accuracy, as we will discuss later in the results section. Figure 3 shows the number of images from each class used for training the proposed neural network architecture.
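The 70/20/10 split can be sketched in plain Python; the fixed random seed is illustrative and only ensures the split is reproducible:

```python
import random

def split_dataset(items, train_frac=0.70, val_frac=0.20, seed=42):
    """Shuffle and split into train/validation/test; whatever remains
    after the train and validation fractions (here 10%) becomes the
    held-out test set, which is never touched during training."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * train_frac)
    n_val = int(len(items) * val_frac)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])
```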

Basic block diagram
The basic block diagram of the proposed model architecture is given below in Figure 4. It starts with data preparation into training (70%), validation (20%), and testing (10%) sets. After that, we performed data pre-processing and augmentation as described in the data pre-processing section. We then prepared our transfer learning-based model by freezing all the input layers except the output layer, modified the output layer as per our desired goal, trained the model, and evaluated its performance against pre-existing methods. (Tiwari et al., Cogent Engineering, 2022)

Inception module
It is well established that the deeper the network, the better its performance, but this comes with a few drawbacks: a larger network means a higher number of parameters and suffers from overfitting if the training dataset is small. Another major drawback is increased computational complexity and resource requirements. To deal with these problems, sparse architectures are preferred over dense ones.
The inception module allows the internal layers to pick and choose which filter size is relevant for learning the required information. A larger filter size is used for global feature extraction, while a smaller filter size captures more locally distributed features; when the size of the target in the image differs, the appropriate branch recognizes it accordingly. In the present model, we have used this benefit of the inception block, originally used in Inception v1, to extract features from the images. Once these features are extracted by the different filters, they are concatenated before being fed to the next layer, as shown in Figure 5.

VGG-16 net
The VGG-16 Net is shown in Figure 6. Its basic principle is to investigate neural network depth for large-scale visual recognition by stacking multiple tiny (3 × 3) convolution filters to increase the network depth. The VGG-16 network was used in the ImageNet Challenge 2014 for large-scale image recognition into 1000 classes. The original VGG-16 net used an image size of 224 × 224, and the only pre-processing step was subtracting the mean RGB value. The image is then passed through several 3 × 3 convolution layers with stride set to 1 and padding set to 1, which preserves the spatial resolution after convolution; several max-pooling layers follow groups of convolution layers for downsampling. After several such convolution and max-pooling stages, a stack of fully connected dense layers follows, and the final layer is a SoftMax layer that gives the classification into 1000 classes.

Scaled Exponential Linear Units (SELU) activation function
The RELU (Rectified Linear Unit) activation function is popular due to its ability to deal with vanishing gradients. It gives a zero gradient for negative input values and a unit gradient for positive input values, and is defined by RELU(x) = max(0, x). But it suffers a dead state: when the rate of change of the weights is very high and the resulting value of x becomes negative in the next iteration, RELU is stuck on the left side of zero. Due to the zero gradient there, the affected cell stops contributing to training in the network. So, although vanishing gradients are avoided, this causes dying RELUs in various cells of the network.
RELU is attractive for its low complexity due to its linear nature for positive x values. To deal with the problem of dying neurons for negative values, it has been modified into another function called Leaky RELU. We thus have two options to choose from: RELU (dying neurons) or Leaky RELU (a small risk of vanishing gradients). But there is another approach, SELU, which deals with both issues and has the added advantage of self-normalization.
In this paper, we have used the "SELU" activation function, introduced in September 2017, instead of the famous "RELU" function. The basic SELU activation function is defined by SELU(x) = λx for x > 0 and SELU(x) = λα(e^x − 1) for x ≤ 0, with λ ≈ 1.0507 and α ≈ 1.6733. It works similarly to RELU for positive values of x, so there is no problem with vanishing gradients, and there are no dying neurons for negative values of x. Hence, SELU is our preferred choice for this research work.
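The two activation functions discussed above can be written out numerically; the constants λ and α are the standard self-normalizing values from Klambauer et al. (2017):

```python
import math

LAMBDA = 1.0507009873554805  # scale λ from the SELU paper
ALPHA = 1.6732632423543772   # α from the SELU paper

def relu(x):
    # Zero output and zero gradient for x < 0 -- the source of "dying ReLUs".
    return max(0.0, x)

def selu(x):
    # Scaled linear for positive x; saturating exponential for negative x,
    # so negative inputs keep a nonzero gradient (no dying neurons).
    return LAMBDA * x if x > 0 else LAMBDA * ALPHA * (math.exp(x) - 1.0)
```

For large negative x, SELU saturates towards −λα ≈ −1.758 instead of collapsing to zero, which is what enables the self-normalizing behaviour.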

Deep transfer learning
Since the introduction of CNNs, they have found use in many applications (Yang & Li, 2017), including autonomous car driving, the AlphaGo championship, image classification, and segmentation. With increasing dataset resources and computational capabilities, their use for automated diagnosis of diseases from medical images, through classification and segmentation, has risen. A simple CNN is a sequential stack of multiple layers with many neurons; the deeper the network and the more data it sees, the better the predictability of the outcome. However, it has certain limitations (Ching et al., 2018): it requires more computational resources and an enormous dataset, which might not be available or not in an appropriate form. To deal with this problem, we utilize pre-trained weights and models and change a few output dense layers according to our classification needs. Mathematically (Tan et al., 2018), we denote it with the notions of a Domain (D) and a Task (T).

A domain is defined as D = {X, P(X)}, where X is the feature space and P(X) is the marginal probability distribution over X. Given a domain D, a task is defined as T = {Y, f(x)}, where Y is the label space and f(x) is the target prediction function, i.e., the conditional probability P(y|x) learned from the training data.
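In Keras terms, the freeze-and-replace recipe described in this section can be sketched as follows. This is a generic sketch, not the paper's full model (which also adds an inception front end, described in the next section); the head sizes are illustrative, and weights="imagenet" is the initialization the study uses.

```python
import tensorflow as tf

def build_transfer_model(num_classes, weights="imagenet"):
    """Frozen pre-trained VGG-16 backbone with a new classification head."""
    base = tf.keras.applications.VGG16(
        include_top=False, weights=weights, input_shape=(128, 128, 3)
    )
    base.trainable = False  # keep the pre-trained feature extractor fixed

    # New head: SELU in the dense layer, and SoftMax or sigmoid at the
    # output depending on the number of classes, as in the paper.
    x = tf.keras.layers.Flatten()(base.output)
    x = tf.keras.layers.Dense(128, activation="selu")(x)
    if num_classes == 1:  # binary case
        out = tf.keras.layers.Dense(1, activation="sigmoid")(x)
    else:
        out = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(base.input, out)
```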

Proposed model
The proposed method combines the properties of the inception module with the VGG-16 model. The inception module carries three convolutional layers stacked in parallel with different kernel window sizes: 1 × 1, 3 × 3, and 5 × 5. These layers have a "SELU" activation function with padding set to "same." The module takes four inputs, namely layer_in, f1, f2, and f3, defined as the image input and the numbers of filters of the three convolution layers, respectively. In our experiment, we have taken the image input size as 128 × 128 × 3 and f1 = 16, f2 = 16, and f3 = 32.
The modified VGG-16 net in our research work starts with the inception block, in which we concatenate the three stacked convolutional layers; its output serves as the input to the VGG-16 neural network, as shown in Figure 7. The VGG-16 network consists of five blocks (Block-1 to Block-5) and a final classification block. Block-1 takes its input from the output of the inception module and consists of two Conv2D layers along with one max-pooling layer. The parameters used for these Conv2D layers are: number of filters = 64, kernel size = 3, stride = (1,1), padding = "same", learning rate = 0.0002; for the MaxPooling2D layer, the pool size was set to (2,2), stride = (2,2), and padding to "valid." Block 2 is similar to Block 1 except that the number of filters increases from 64 to 128. Block 3 is similar to the previous block but has an additional Conv2D layer and filters increased from 128 to 256; Blocks 4 and 5 follow the same pattern with further increased filters. Finally, the generated features feed into three sequential dense layers with appropriate activation functions for classification. We used the SoftMax activation function for multi-class classification and the sigmoid activation function for binary classification. At the last dense layer, we set the number of neurons to the total number of classes to classify: 4, 3, and 1 neurons for four-class, three-class, and binary classification, respectively.
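The inception front end described above can be sketched in Keras. The kernel sizes (1 × 1, 3 × 3, 5 × 5), SELU activation, "same" padding, and filter counts f1 = 16, f2 = 16, f3 = 32 follow the text; everything else is a minimal illustration rather than the authors' exact code.

```python
import tensorflow as tf
from tensorflow.keras import layers

def inception_block(layer_in, f1=16, f2=16, f3=32):
    """Three parallel convolutions (1x1, 3x3, 5x5) with SELU activation and
    'same' padding, concatenated along the channel axis."""
    conv1 = layers.Conv2D(f1, (1, 1), padding="same", activation="selu")(layer_in)
    conv3 = layers.Conv2D(f2, (3, 3), padding="same", activation="selu")(layer_in)
    conv5 = layers.Conv2D(f3, (5, 5), padding="same", activation="selu")(layer_in)
    return layers.concatenate([conv1, conv3, conv5])

# The concatenated feature map then feeds Block-1 of the modified VGG-16.
inputs = layers.Input(shape=(128, 128, 3))
features = inception_block(inputs)  # channels: 16 + 16 + 32 = 64
```

Because all three branches use "same" padding, their outputs share the 128 × 128 spatial size and can be concatenated directly.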

Training the network
In this proposed model, we have utilized the pretrained VGG-16 Net and Inception Net. To compare the performance of our model, we have also trained VGG-16 Net, Res-Net, and Inception-Net on our dataset. Table 1 summarises the performance parameters of the original papers along with their publication year and number of parameters.
We have utilized the Keras 2.3.1 framework with TensorFlow 2.1.0 and the scikit-learn library to train the proposed model. All experiments were performed on a Lenovo Legion Y730 laptop with an Nvidia RTX 2070Q GPU with 8 GB of memory, 16 GB of DDR4 RAM, CUDA 11.2, and a 1 TB SSD. We used the "SELU" activation function and an input image dimension of 128 × 128 × 3. We trained the three pre-existing networks by freezing the input layers and initializing with "ImageNet" weights, and we trained our model similarly. Training parameters were batch_size = 16, steps_per_epoch = 100, and learning_rate = 0.0001, and each model was trained for 50 epochs. These model hyperparameter values are summarised in Table 2.

Performance evaluation parameters
A classification model assigns each input to a particular category by finding the class with the maximum probability of occurrence. The confusion matrix characterizes the performance of the classifier system by counting the cases of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). For a typical three-class confusion matrix with cells numbered 1-9 row by row (rows = actual, columns = predicted), taking the first class as positive:
True Positive (TP) = Cell (1)
False Negative (FN) = Cell (2) + Cell (3)
False Positive (FP) = Cell (4) + Cell (7)
True Negative (TN) = Cell (5) + Cell (6) + Cell (8) + Cell (9)
All the performance metrics (Figures 8-13) use these values to compute Accuracy, Precision, Sensitivity, Specificity, False-Positive Rate (FPR), F1-Score, and so on. Figure 11 shows the specificity chart.
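The one-vs-rest extraction above, and the six metrics computed from it, can be sketched as follows (cells are read row by row, matching the cell numbering in the text):

```python
def class_metrics(cm, k):
    """One-vs-rest TP/FN/FP/TN and derived metrics for class k of a square
    confusion matrix cm (rows = actual, columns = predicted)."""
    n = len(cm)
    tp = cm[k][k]
    fn = sum(cm[k][j] for j in range(n) if j != k)   # rest of row k
    fp = sum(cm[i][k] for i in range(n) if i != k)   # rest of column k
    tn = sum(cm[i][j] for i in range(n) for j in range(n)
             if i != k and j != k)                   # everything else
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall / TPR
    specificity = tn / (tn + fp) if tn + fp else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return dict(TP=tp, FN=fn, FP=fp, TN=tn, accuracy=accuracy,
                precision=precision, sensitivity=sensitivity,
                specificity=specificity, FPR=fpr, F1=f1)
```

For the three-class case, calling `class_metrics(cm, 0)` reproduces exactly the Cell (1)-(9) mapping given above, and k = 1, 2 give the per-class metrics for the other two classes.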

Performance evaluation parameters for all three cases
Six performance parameters – Accuracy, Precision, Specificity, Sensitivity, False-Positive Rate (FPR), and F1-Score – are shown in Figures 8-13. We have calculated these parameters and compared them with three pre-existing models, training and testing all models on this dataset for all cases. Figures 14-16 give clear insights into which performance metrics improved in our proposed model. We have seen performance improvements despite using a smaller dataset and less computational power than the pre-trained VGG-16 Net, Res-Net, and Inception-Net.

Discussion and conclusion
In this paper, we have proposed a transfer-learning-based model, which discusses three case studies -Case (01) -Four-Class Classification, Case (02) -Three-Class Classification, and Case (03) -Two-Class Classification.
For Case (02), with our proposed model, we found training and validation accuracies of approximately 97.13% and 96.48%, respectively, and the testing accuracy on 387 images was approximately 96.89%. We also evaluated other performance parameters for each class, with accuracies of Covid-19 (97.67%), Normal (97.93%), and Pneumonia Bacterial (98.19%).
For Case (03), with our proposed model, we found training and validation accuracies of approximately 98.50% and 98.63%, respectively, and the testing accuracy on 258 images was approximately 99.95%. We also evaluated other performance parameters for each class, with accuracies of Covid-19 (99.61%) and Normal (99.61%).
All these performance parameters have been evaluated from the confusion matrices shown in Table 3 and Table 4. After evaluating the performance parameters, we compared overall model performance in Figures 14-16. We see that our model performed well in most cases and worked well at identifying individual classes. The introduction of the SELU activation function enabled us to deal with vanishing gradients and tackle the problem of dying neurons, which might otherwise affect network training.
Although this method has found promising results, there is still scope for performance improvement, which seems achievable with larger datasets and more computational power. The purpose of this study was to assist in the early diagnosis of Covid-19 and distinguish it from other flu-like illnesses such as Viral Pneumonia and Bacterial Pneumonia. An early diagnosis might help cluster people into several classes and treat them accordingly. In the future, we can collect more data and experiment with more models to get better accuracy, which can be helpful for the fully automated diagnosis of Covid-19 from just an X-Ray or CT.

Funding
The authors received no direct funding for this research.

Disclosure statement
No potential conflict of interest was reported by the author(s).