A fused lightweight CNN model for the diagnosis of COVID-19 using CT scan images

Computed tomography is an effective tool for the fast diagnosis of COVID-19. However, in high case-load scenarios, manual interpretation of the scan images by an expert is prone to delay and human error. An artificial intelligence (AI) based automated tool can be employed for fast and efficient diagnosis of this disease. For image-based diagnosis, convolutional neural networks (CNNs), a subcategory of AI, have been widely explored. However, these CNN models require significant computational resources for processing. Hence, in this work, the performance of two lightweight, little-explored CNN models, namely SqueezeNet and ShuffleNet, has been evaluated with CT scan images. While SqueezeNet produced an accuracy of 86.4%, ShuffleNet was able to provide an accuracy of 95.8%. To improve the accuracy further, a novel fused-model combining these two models has been developed and its performance evaluated. The fused-model outperformed the two base models with an overall accuracy of 97%. Analysis of the confusion matrix revealed an improved specificity of 96.08% and precision of 96.15%, with better fallout and false discovery rates of 3.91% and 3.84%, respectively.


Introduction
The COVID-19 disease, which broke out in Wuhan, China, towards the end of 2019, has spread throughout the world to infect more than 240,061,454 individuals and claim 4,887,600 lives by mid-October 2021 [1]. Its outbreak has placed a major strain on the already compromised healthcare systems of most countries, where healthcare personnel, especially nurses, are subjected to unbalanced workloads and undue stress [2]. The reverse transcription-polymerase chain reaction (RT-PCR) test is considered the gold standard for the diagnosis of COVID-19. Despite that, it has limitations such as the need for qualified technicians, a time-consuming manual process, limited availability of test kits and higher test expenses. It also suffers from a low detection rate and low sensitivity, and hence multiple tests are often required for diagnosis [3][4][5]. Significant information for the diagnosis of COVID-19 can be obtained by investigating radiological images such as X-rays and chest computed tomography (CT) scans of the infected individuals [3]. A major advantage of CT over chest X-ray is that it enables detailed visualization of the organ in three dimensions and allows examination of all types of tissues, which can aid in better diagnosis of the disease [6]. CT is also a sensitive method for the diagnosis of COVID-19 and has been widely used in conjunction with RT-PCR for improved diagnosis of this disease [3]. Qualified radiologists and physicians can interpret these scan images and screen out COVID-19 infected ones from images pertaining to other health conditions. Still, there are chances of human error in interpretation, mainly due to disparity in pathology [7][8][9]. It is at this juncture that artificial intelligence (AI) based interpretation of radiological images finds a significant place in medical diagnosis.
In the past, traditional machine learning algorithms were explored by the researchers for the diagnosis of various medical conditions [10][11][12][13][14][15][16][17]. This approach requires carefully selected features for building an effective prediction model. Some of the research works, pertaining to the use of machine learning techniques for medical diagnosis, are furnished in Table 1. However, in recent years, specifically for radiological image (X-rays, CT scans, mammograms and MRI scans) based diagnosis, convolutional neural networks (CNN) are being widely explored for various medical conditions from Alzheimer's to appendicitis [18][19][20][21][22][23][24][25].
Currently, the COVID-19 pandemic demands such a breakthrough technology for rapid diagnosis using CT scan images. Some of the earlier studies utilized a limited number of CT scan images with a transfer learning approach (pre-trained models) for the diagnosis of COVID-19; one such study was carried out by Wang et al. [26]. In another study, making use of 360 COVID-19 and 397 non-COVID-19 CT scan images and employing a decision fusion approach, Mishra et al. [28] were able to detect COVID-19 with an accuracy of 88.34%. The decision fusion approach combines the individual predictions of multiple CNN models by majority voting to improve the overall efficiency of the baseline models. Ying et al. [29] achieved an accuracy of 86% by training DRE-Net on 777 and 708 CT scan images pertaining to COVID-19 and healthy subjects, respectively. Some of the notable works using CT scan images for the diagnosis of COVID-19 are furnished in Table 2.
Today, lightweight CNN models such as MobileNet, SqueezeNet and ShuffleNet are of great interest among researchers as they enable implementation on devices with minimal computing resources. Ardakani et al. [37] utilized 10 different CNN models, including both shallow and deep architectures; SqueezeNet, MobileNet V2 and ResNet-18 reported accuracies of 82.84%, 92.16% and 91.67%, respectively, while the deeper architectures, namely ResNet-50 and ResNet-101, produced accuracies greater than 98%. In another study, Pham [35] evaluated the diagnosis of COVID-19 using 16 different CNN models, which likewise produced lower accuracies for the shallow CNNs; for example, SqueezeNet had an accuracy of 78.52% and AlexNet 74.50%. Silva et al. [31] employed EfficientNet-B0 with 1601 COVID-19 and 1693 non-COVID-19 CT scan images to obtain a classification accuracy of 87.68%. Polsinelli et al. [30] developed their own CNN architecture inspired by SqueezeNet and reported an accuracy of 85.03%. In this study, the authors propose a novel fused-model that combines the layers of SqueezeNet [38] and ShuffleNet [39] and utilizes the features from both models. This benefits the prediction performance as it integrates the advantages of features from these two distinct architectures. Other researchers have proposed similar fusion methods and reported improved accuracy [40][41][42]. A custom classifier has been added for the diagnosis, and the model has been fine-tuned for improved performance. The paper is organized as follows: Section 2 covers materials and methods, including the details of the original dataset, the techniques employed to enlarge it, the chosen CNN models and the implemented modifications. The encouraging results along with associated discussions are provided in Section 3, followed by a brief discussion of the limitations of the proposed model in Section 4. Concluding remarks are provided in Section 5.

Materials and methods
The CT scan dataset used for the study is publicly available (https://www.kaggle.com/plameneduardo/sarscov2-ctscan-dataset). The dataset has 1252 and 1230 slices of the COVID-19 and non-COVID-19 categories, respectively, obtained from a total of 120 patients [43]. Typically, deep learning models perform better with a large number of images, and hence the images in the dataset have been increased using augmentation. Augmentation introduces various transformations and perturbations such as blurring, shearing, flipping, and changes in brightness, rotation and translation. This has been shown to reduce overfitting and improve the generalization of deep learning models. In this study, random displacements in the range of ±10 pixels, random rotations of ±5° and random intensity alterations in the range of ±20 have been applied. In addition, image blurring, shearing of ±20° and image flipping operations have also been performed to increase the dataset size. As a result, the total number of images has been increased to 17,367, and the dataset has been split into training, validation and test sets as shown in Table 3.
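A few of the augmentation steps above can be sketched in stdlib-only Python on a grayscale image represented as a 2-D list of pixel values; this is an illustrative sketch only (the study used MATLAB), and rotation, shearing and blurring are omitted for brevity.

```python
import random

def augment(image, seed=None):
    """Minimal augmentation sketch: random horizontal flip and a random
    intensity shift in the range +/-20, clamped to the valid 0-255 range.
    `image` is a 2-D list of grayscale pixel values."""
    rng = random.Random(seed)
    # Random horizontal flip with probability 0.5.
    if rng.random() < 0.5:
        image = [row[::-1] for row in image]
    # Random intensity alteration in the range +/-20, clamped to [0, 255].
    shift = rng.randint(-20, 20)
    return [[min(255, max(0, p + shift)) for p in row] for row in image]
```

In a real pipeline each original slice would be passed through several such randomized transforms to multiply the dataset size, as done here to grow 2482 slices into 17,367 images.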
Each of the models selected in this study requires an input image of a specific dimension, and hence the images have been resized to 224 × 224 for ShuffleNet and 227 × 227 for SqueezeNet. The images have also been normalized with zero-centre normalization to bring the input data onto the same scale for efficient learning.
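Zero-centre normalization simply subtracts the mean image of the set from every image; a minimal sketch on images represented as nested lists (illustrative only, not the study's MATLAB implementation):

```python
def zero_center(images):
    """Subtract the per-pixel mean over the whole image set, so that the
    inputs are centred on zero and share a common scale."""
    n = len(images)
    h, w = len(images[0]), len(images[0][0])
    mean = [[sum(img[i][j] for img in images) / n for j in range(w)]
            for i in range(h)]
    return [[[img[i][j] - mean[i][j] for j in range(w)] for i in range(h)]
            for img in images]
```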
The hardware used for the study consists of 8 GB RAM and a 4 GB NVIDIA 1050Ti graphics card. All training and analysis operations have been performed using MATLAB 2020b. The entire methodology implemented for the experiment is shown in Figure 1. Initially, the experiments have been carried out with the two lightweight CNN models, namely SqueezeNet and ShuffleNet, using transfer learning. In transfer learning, the deep learning models come loaded with weight and bias parameters obtained from training on a large dataset such as ImageNet. Such pre-trained models have been shown to converge faster, as they have already learned certain basic features from the ImageNet images.
Although SqueezeNet and ShuffleNet are both lightweight CNN models, each follows an entirely different strategy to reduce model size and the computational resources required for execution. For example, SqueezeNet has a fire module block consisting of squeeze layers with 1 × 1 filters and expand layers with 1 × 1 and 3 × 3 filters, whereas ShuffleNet uses channel shuffling and point-wise grouped convolutions. SqueezeNet has been shown to produce an accuracy equivalent to AlexNet but with the number of parameters reduced from 60 million to 0.4 million, which reduces the memory requirement from 240 to 4.8 MB and improves execution speed. ShuffleNet has been shown to produce a lower classification error than AlexNet while requiring fewer multiplication operations, reducing the need for higher computing power. Before training, a few layers have to be modified for the diagnosis of COVID-19 from chest CT scan images. In the ShuffleNet model, the last layers, i.e. the fully connected, softmax and classification layers, have been replaced with new layers suitable for the diagnosis. The parameters of the new fully connected layer are randomly initialized and require a higher learning rate than the pre-trained layers for faster convergence. The model has been fine-tuned with the following hyperparameter settings: minibatch size of 32, 5 epochs, learning rate of 0.00001 for the pre-trained layers, L2 regularization of 0.1, weight learning rate of 0.00003 for the new layers and momentum of 0.9.
Similarly, in SqueezeNet, the last 5 layers, i.e. the convolution, ReLU, average pooling, softmax and classification layers, have been replaced with a new convolution layer (with a number of filters corresponding to the number of classes), ReLU, average pooling (with the same 14 × 14 filter size), softmax and classification layers, respectively. The hyperparameter configuration fine-tuned for this study consists of a minibatch size of 32, 5 epochs, a learning rate of 0.00001 for the pre-trained layers and momentum of 0.9. The results obtained using the above models are presented, analysed and discussed in Section 3.
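The hyperparameter settings reported for the two transfer-learning experiments can be collected in a plain configuration structure; this is illustrative only (the study itself was run in MATLAB), and the key names are ours, not the toolbox's.

```python
# Reported fine-tuning hyperparameters, gathered per model.
train_config = {
    "ShuffleNet": {
        "minibatch_size": 32,
        "epochs": 5,
        "lr_pretrained_layers": 1e-5,   # slow updates for pre-trained weights
        "lr_new_layers": 3e-5,          # faster updates for the new layers
        "l2_regularization": 0.1,
        "momentum": 0.9,
    },
    "SqueezeNet": {
        "minibatch_size": 32,
        "epochs": 5,
        "lr_pretrained_layers": 1e-5,
        "momentum": 0.9,
    },
}
```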
In the second experiment, the two pre-trained CNN models, namely SqueezeNet and ShuffleNet, have been fused to form a hybrid model for improving the diagnosis of COVID-19 using CT scan images, as shown in Figure 2. The two CNN models serve as feature extractors, and the extracted features are combined and fed to a classifier. To implement the proposed architecture, the feature maps resulting from the last feature extraction layers must have the same spatial dimension so that they can be concatenated along the depth dimension.
To facilitate the implementation, the fused-model has two input streams, 1 and 2, for providing the input image to SqueezeNet and ShuffleNet. Initially, images of dimension 227 × 227 are provided as input, which are then resized to 224 × 224 for the ShuffleNet stream using a maxpooling layer with a filter size of 4 × 4 and stride 1. In the case of SqueezeNet, as the input dimension matches the provided input image, it is fed directly to the subsequent layers for feature extraction. As the features from both models are combined using the activation maps resulting from the last ShuffleNet block and the last fire module block, the dimensions of the maps should be the same. The activation map resulting from SqueezeNet is of dimension 14 × 14, which is downsampled to 7 × 7 using a maxpooling filter of dimension 2 × 2 and stride 2. Hence, the resulting activation maps from ShuffleNet and SqueezeNet have dimensions of 7 × 7 × 544 and 7 × 7 × 512, respectively. A depth concatenation layer has been added to combine the maps along the depth direction. Finally, two hidden fully connected layers, each of 1024 neurons, have been added to the fused-model for efficient classification of the CT scan images. The last fully connected layer has a number of neurons equal to the number of classes and ends with the softmax and classification layers. The fused-model architecture created for this study is shown in Figure 3. The pseudo-code for the implemented algorithm is presented in Table 4, and the hyperparameters fine-tuned for training the fused-model are furnished in Table 5. The trained and validated models have been tested with an augmented test dataset. Further, classification accuracy has been analysed using the confusion matrix and performance metrics. In addition, the features which significantly influence the classification process have been visualized using Class Activation Mapping (CAM) [44].
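The resizing arithmetic above can be checked with the standard (no-padding) pooling output-size formula; a small sketch:

```python
def pool_output_size(in_size, kernel, stride):
    """Spatial output size of a pooling layer without padding:
    floor((in_size - kernel) / stride) + 1."""
    return (in_size - kernel) // stride + 1

# 227 -> 224 via the 4 x 4 max pool with stride 1 on the ShuffleNet stream.
# 14  -> 7   via the 2 x 2 max pool with stride 2 on the SqueezeNet maps.
# Depth concatenation then yields 7 x 7 x (544 + 512) = 7 x 7 x 1056 features.
```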
The obtained result and analysis using the fused-model have been discussed in Section 3.

Results and discussion
The three models, namely ShuffleNet, SqueezeNet and the fused-model, have been trained on the training dataset with a validation frequency of 100 iterations. The training progress with the minibatch accuracy and loss value for these models is shown in Figure 4. Before training, the validation accuracy obtained using the pre-trained features was 52.02% for SqueezeNet, which was higher than for ShuffleNet (47.50%) and the fused-model (49.49%). This shows that some of the basic features required for diagnosis had already been learned, which significantly contributed to the initial classification result. Interestingly, however, after 100 iterations in the first epoch, the validation accuracy of the fused-model was the highest at 85.48%, followed by ShuffleNet (82.12%). The performance of SqueezeNet was poor, with the accuracy improving only marginally (61.90%). It is also evident that the deviations in both minibatch accuracy and loss value were much higher for SqueezeNet than for ShuffleNet and the fused-model, indicating ambiguity in the features SqueezeNet relies on for the correct diagnosis of COVID-19. ShuffleNet and the fused-model reached nearly identical training accuracies of 98.2% and 98.5%, respectively, whereas the accuracy saturated at a relatively lower 86% for SqueezeNet. At the end of five epochs, the final validation accuracies obtained for the above-mentioned models were 94.4% (ShuffleNet), 85.9% (SqueezeNet) and 94.2% (fused-model).
The trained and validated models were then tested with a test dataset consisting of 1738 images of the COVID-19 and non-COVID-19 categories. On the test set, the accuracy of the fused-model was 97%, which is better than that of ShuffleNet (95.8%) and SqueezeNet (86.4%). The classification performance was further analysed using the confusion matrix shown in Figure 5. Sensitivity and specificity are important metrics that provide insight into the performance of the models and can be estimated from the True Positives (TP), True Negatives (TN), False Positives (FP) and False Negatives (FN) as Sensitivity = TP/(TP + FN) and Specificity = TN/(TN + FP). For instance, the fused-model has TP, TN, FP and FN of 851, 834, 34 and 19, respectively, giving an estimated sensitivity and specificity of 97.81% and 96.08%. In the case of SqueezeNet, sensitivity and specificity are lower than for the fused-model and ShuffleNet, since the numbers of FN and FP for the detection of COVID-19 from CT scan images were higher. A higher FN for the diagnosis would result in a significant delay in treatment. Similarly, many images in the non-COVID-19 category were falsely predicted as COVID-19 compared to the other models, namely ShuffleNet and the fused-model; this would also cause unwanted panic in patients and increase the stress on healthcare workers. In comparison, the fused-model has lower FN and FP, and hence its sensitivity, specificity and accuracy improved significantly. Further assessment was carried out using performance metrics estimated from the confusion matrix, as shown in Table 6. From Table 6 it is evident that the fused-model and ShuffleNet have approximately equal sensitivity values, but the former has better specificity and precision values of 96.08% and 96.15%, respectively.
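The reported metrics follow directly from the confusion-matrix counts; a short sketch reproducing them from the fused-model's stated TP/TN/FP/FN (small differences in the last decimal place can arise from truncation versus rounding):

```python
def metrics(tp, tn, fp, fn):
    """Standard performance metrics from confusion-matrix counts, in %."""
    return {
        "sensitivity": 100 * tp / (tp + fn),   # recall / true positive rate
        "specificity": 100 * tn / (tn + fp),
        "precision":   100 * tp / (tp + fp),
        "npv":         100 * tn / (tn + fn),   # negative predictive value
        "fpr":         100 * fp / (fp + tn),   # fallout
        "fdr":         100 * fp / (fp + tp),   # false discovery rate
        "accuracy":    100 * (tp + tn) / (tp + tn + fp + fn),
    }

# Fused-model counts reported above.
m = metrics(tp=851, tn=834, fp=34, fn=19)
f1 = 2 * m["precision"] * m["sensitivity"] / (m["precision"] + m["sensitivity"])
```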
This improvement is due to the lower false prediction of CT scan images belonging to the non-COVID-19 category and the higher correct prediction of the COVID-19 category. The fallout or false positive rate (FPR) and the false discovery rate were also lower compared to the ShuffleNet and SqueezeNet models. The fused-model produced the best F1 score of 96.98% and hence acts as a more efficient model for diagnosis than the original models. In addition, the performance of the proposed model was compared with other popular lightweight pre-trained CNN models, namely GoogLeNet, MobileNetV2, NasNetMobile and EfficientNet-B0 [45][46][47][48], as shown in Table 7. The proposed fused-model showed better sensitivity for the detection of COVID-19, with a higher F1 score and accuracy, than the other models. The assessment of the confusion matrix using performance metrics shows the diagnostic effectiveness of the trained model. The change in the prediction capability of a model can be assessed using the prediction score, which acts as a confidence index giving the probability of a CT scan image belonging to the COVID-19 or non-COVID-19 category. First, the prediction scores for the correctly predicted COVID-19 category images were analysed for SqueezeNet, ShuffleNet and the fused-model. Among the three models, ShuffleNet produced the best average prediction score of 0.98, followed by the fused-model and SqueezeNet with 0.97 and 0.87, respectively. Although the average prediction score provides a general picture of a model's ability, in this study the scores have been categorized into five ranges, as shown in Figure 6. The number of images in each range is counted and the model's performance is assessed based on these numbers. Based on this estimation, the number of images with prediction scores greater than 0.95 was higher for the ShuffleNet model.
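The binning of prediction scores into the five ranges used in this analysis can be sketched as follows; the exact boundary convention is our assumption (the paper does not state which edge each boundary value falls into), and scores below 0.51 cannot occur for the winning class of a binary softmax.

```python
from bisect import bisect_right

EDGES = [0.51, 0.8, 0.85, 0.9, 0.95, 1.0]
LABELS = ["0.51-0.8", "0.8-0.85", "0.85-0.9", "0.9-0.95", "0.95-1"]

def bin_scores(scores):
    """Count how many prediction scores fall into each of the five ranges."""
    counts = {label: 0 for label in LABELS}
    for s in scores:
        if s < EDGES[0]:
            continue  # winning-class score of a binary classifier is > 0.5
        i = min(bisect_right(EDGES, s) - 1, len(LABELS) - 1)
        counts[LABELS[i]] += 1
    return counts
```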
However, a greater number of COVID-19 category images were correctly predicted by the fused-model. This is due to the trade-off between the beneficial and non-beneficial features from SqueezeNet, which lowered the prediction scores in a few cases but increased the number of correct predictions of COVID-19 images for the fused-model.
Similarly, in the case of the non-COVID-19 category images, the prediction scores for a few images dropped, but the number of correct predictions increased, as is evident from Figure 7(c). When the prediction scores for the incorrectly predicted COVID-19 category were analysed for the fused-model, there was a reduction of 50% of images in the range 0.95-1 and of 75% in the ranges 0.9-0.95 and 0.8-0.85, compared to ShuffleNet, as shown in Figure 8. For the ranges 0.51-0.8 and 0.85-0.9, there was an increase in the number of images, as the prediction scores for the false category dropped from the higher ranges. This shows the improvement in the performance of the fused-model compared to ShuffleNet and SqueezeNet.
The prediction scores for images incorrectly predicted from the non-COVID-19 to the COVID-19 category (shown in Figure 9) with the fused-model show that the number of images in the ranges 0.95-1, 0.9-0.95, 0.8-0.85 and 0.51-0.8 dropped by 60%, 50%, 40% and 18%, respectively, compared to ShuffleNet. This shows that the prediction capability improved for the fused-model, as the prediction scores for the falsely predicted category dropped. Even though the scores for the correctly predicted category reduced slightly, they remained greater than 0.8 for 94.4% and 93.1% of the images belonging to the COVID-19 and non-COVID-19 categories, respectively.
In order to understand and visualize the reasons for misclassification, CAM has been utilized. It provides information about the image features that significantly influenced the decision process of the deep learning models; for example, background features that sway the prediction can be identified from the visualization. A visualization of a sample image of the COVID-19 category for the three models is shown in Figure 10.
From Figure 10 it can be observed that the areas in red are the strongest areas of activation influencing the prediction of the deep learning models. In the case of SqueezeNet, as shown in Figure 10(a), several areas of the lungs without COVID-19 symptoms show strong activation, which clearly indicates the inability of the model to learn the specific features of COVID-19. ShuffleNet, in contrast, shows higher activation confined to the symptomatic areas of COVID-19 in the lungs, although a significant portion of the strongest activation area still spills outside the lungs. In the case of the fused-model, the areas of strongest activation are more focused than for ShuffleNet. When the visualizations of a few incorrectly categorized images were analysed for the best-performing fused-model, the area of strongest activation was found outside the lung area, mainly due to an insufficient number of lung pixels in the particular slice and distortion of the symptoms beyond recognition by the performed augmentation.
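The standard CAM computation underlying these visualizations is a weighted sum of the final convolutional feature maps, with each channel weighted by the target class's fully connected weight; a minimal stdlib-only sketch (illustrative, not the study's MATLAB implementation):

```python
def class_activation_map(feature_maps, fc_weights):
    """Class Activation Mapping sketch: the CAM for a class is the sum of the
    final convolutional feature maps, each scaled by that class's fully
    connected weight for the corresponding channel.
    feature_maps: list of K maps, each an H x W list of lists.
    fc_weights:   list of K class weights (one per channel)."""
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * w for _ in range(h)]
    for fmap, wgt in zip(feature_maps, fc_weights):
        for i in range(h):
            for j in range(w):
                cam[i][j] += wgt * fmap[i][j]
    return cam
```

The resulting map is then upsampled to the input resolution and overlaid on the CT slice, with the highest values (red regions in Figure 10) marking the areas that most influenced the prediction.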
The diagnosis of COVID-19 using CT scan images can be performed using CNN models, which are being widely investigated. However, most studies have focused on deeper CNN models, which demand significant computing resources and a GPU for processing. In contrast, lightweight CNN models, especially SqueezeNet and ShuffleNet, have not been widely explored for the diagnosis of COVID-19 using CT scan images. These lightweight models can be executed on devices such as smartphones with limited computing capacity instead of relying on cloud-based platforms. Although these shallow CNN models are efficient in their use of computing resources, some compromise in diagnostic accuracy has to be accepted. To counteract this problem, two lightweight CNN models have been fused in this study to produce improved performance for the diagnosis of COVID-19. In an earlier study, Polsinelli et al. [30], with their own lightweight CNN model inspired by the SqueezeNet architecture, were able to produce an accuracy of 85.03%, whereas in our study an accuracy of 97% was achieved with the fusion of SqueezeNet and ShuffleNet. Another study by Pham [35] explored 16 different CNN models, including both shallower and deeper architectures. Interestingly, in that study the dataset without augmentation produced better accuracy than the augmented dataset, in contrast to this study and many others which have consistently shown improved performance with data augmentation [6,30,31,34]. In addition, Pham [35] did not analyse the regions of the image which influenced the classification process. The assessment of the prediction score has not been performed in many previous studies, which motivated the authors to focus on understanding the variation in performance between the original and fused models.
Although the classification performance of SqueezeNet was lower, some essential features not captured by ShuffleNet have been utilized in the fused-model, resulting in its improved performance. This enhancement was evident from the confusion matrix and the feature visualization. The fusion was especially effective for the non-COVID-19 category, where the number of misclassifications dropped significantly, decreasing the false positives.

Limitations
The focus of this study was to detect COVID-19 from CT scan images using deep learning techniques. Hence, the proposed model is not suitable for the detection of other viral or bacterial pneumonia with features similar to COVID-19; as a consequence, the trained model may predict the CT scan images of a few subjects with other viral or bacterial pneumonia as COVID-19. It is therefore always desirable to use this tool in conjunction with the RT-PCR test for fool-proof diagnosis of COVID-19. Notably, researchers like Xu et al. [27] performed experiments using CT scans of both other pneumonia and COVID-19 related pneumonia but reported a lower accuracy of 86.7%, which may be due to the similarity of symptoms.
Another limitation of the proposed model is its inability to determine the severity of the symptoms, which may be crucial in clinical diagnosis for determining effective treatment methods. In addition, insufficient lung pixels in a few CT scan slices and the limited number of parameters of lightweight CNN models result in a compromise of diagnostic accuracy. Despite these limitations, the deep learning tool in its existing condition can act as a supportive aid in diagnosis, but it needs further improvement and analysis before actual deployment.

Conclusion
In this study, the performance of two lightweight CNN models, SqueezeNet and ShuffleNet, and of a novel fused-model developed by combining the layers of these two base models has been evaluated for diagnosing COVID-19 from CT scan images. A total of 2482 publicly available chest CT scan images (COVID-19: 1252 and non-COVID-19: 1230) have been used, and the number of images has been increased to 17,367 (COVID-19: 8764 and non-COVID-19: 8603) through data augmentation. On the test set, while SqueezeNet and ShuffleNet produced accuracies of 86.4% and 95.8%, respectively, the novel fused-model was able to classify COVID-19 and non-COVID-19 CT images with an overall accuracy of 97%. The fused-model outperformed SqueezeNet and ShuffleNet with higher sensitivity (97.81%), specificity (96.08%), precision (96.15%), negative predictive value (97.77%) and F1 score (96.98%). Hence, this study suggests that the novel fused-model derived from SqueezeNet and ShuffleNet is a promising model for diagnosing COVID-19 from CT scan images using limited computing resources. Healthcare systems could employ AI-based systems which blend such predictive models with conventional radiological imaging for substantially lower cost, fast, reliable and accurate assessment of COVID-19 risk.

Disclosure statement
No potential conflict of interest was reported by the author(s).