A modified version of GoogLeNet for melanoma diagnosis

ABSTRACT Differential diagnosis of malignant melanoma, which is the cause of more than 75% of deaths amongst skin lesions, is vital for patients. Artificial intelligence-based decision support systems developed for the analysis of medical images are used in the solution of such problems. In recent years, various deep learning algorithms have been developed for this purpose. In our previous study, we compared the performances of AlexNet, GoogLeNet and ResNet-50 for the differential diagnosis of benign and malignant melanoma on the International Skin Imaging Collaboration: Melanoma Project (ISIC) dataset. In this study, we propose a CNN model obtained by modifying the GoogLeNet algorithm and compare the performance of this model with the previous results. For the experiments, we used 19,373 benign and 2,197 malignant diagnosed dermoscopy images obtained from this public archive. We compared the results according to eight performance metrics: polygon area metric (PAM), classification accuracy (CA), sensitivity (SE), specificity (SP), area under curve (AUC), kappa (K), F measure metric (FM) and time complexity (TC). According to the results, our proposed CNN achieved the best classification accuracy with 0.9309 and decreased the time complexity of GoogLeNet from 283 min 50 s to 256 min 26 s.


Introduction
Melanoma is a deadly form of skin cancer. It accounts for more than 75% of skin cancer deaths (Argenziano et al., 2012), although its incidence is as low as 4-5% among the types of skin cancer (Miller & Mihm, 2006). Diagnosis of melanoma at an early stage significantly increases the chance of survival, because these skin lesions can be treated easily when they are detected early. If it is diagnosed late, melanoma can progress to the last stage very quickly and result in metastasis, and the patient's chances of survival decrease. According to the statistics, a patient in the last stage of melanoma can survive up to 5 years.
The experience of the expert dermatologist performing the clinical examinations is a very important factor for the early diagnosis of melanoma. In clinical examinations, the diagnosis success rate of skin lesions increased with the use of images obtained via dermoscopy devices.
Dermoscopy is used to magnify the visual features of lesions that cannot be seen with the naked eye. Even with this type of image, it is still difficult to distinguish skin lesions that have very similar features such as shape, texture and edge irregularity. Based on the observed features, various analysis methods are used to decide whether the lesion is benign or malignant. Experts use the ABCD rule (Rigel et al., 2010, september), the CASH algorithm (Henning et al., 2007), the Menzies method (Vestergaard et al., 2008), and the 7-point checklist and 3-point checklist methods (2008,2008) to decide on the type of the lesion. Although these methods have differences and advantages compared to each other, the gold standard for the differential diagnosis of skin lesions is pathological examination. Requesting a pathological examination for each case is unnecessary, laborious and costly. It is also a worrying process for the patient.
As a result of advancements in artificial intelligence, various algorithms and methods are being developed to increase the success rates of clinical diagnosis. Promising results have been obtained in studies that apply deep learning models, which have increased in popularity in recent years. In some studies comparing the performances of artificial intelligence models and expert dermatologists, it has been revealed that deep learning algorithms provide more successful results in the diagnosis of skin lesions.
Melanoma could not be easily recognized before the use of dermoscopy and the discovery of these analysis methods. Lacking this information, most patients who suffered from melanoma died because of the aggressive growth of this skin disorder. Since the introduction of the ABCD method, melanoma has been diagnosed by features such as asymmetry, border irregularity, colour variation and a diameter over 6 mm. The ABCD abbreviation is formed from the first letters of those features.
In this study, we propose a modified version of GoogLeNet for the classification of malignant and benign melanoma. This paper is an extended version of the ACIIDS 2020 conference paper. In our previous work, we tested well-known CNN architectures for the benign and malignant skin lesion classification problem on ISIC dataset (ISIC Project, n.d.) images. We then compared the performances of those architectures in terms of accuracy and time complexity. In this version, we modified GoogLeNet and compared its performance with the results of the previous study. We also compare these results via visual PAM charts.

Deep learning models for benign and malignant skin lesion classification
Deep learning, which is a subset of machine learning, has gained popularity in recent years owing to improvements in computation power, especially in graphics processing units (GPUs). The LeNet-5 model, developed by Yann LeCun (LeCun et al., 2015) in 1998 and applied to the MNIST dataset, achieved the first successful results in this domain. Although deep learning (DL) had been known for a while, implementing these models on complex and huge datasets was not feasible because of the lack of computation power. The convolutional neural network (CNN) is a subset of deep learning techniques based on arbitrary convolution operations. The aim of these convolutions is to extract meaningful features from the given input image, signal or text information, and to summarize the input data with features that range from basic to complex.
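As a minimal sketch of how a convolution extracts a feature, the following pure-Python example slides a small kernel over a toy image. The function name `conv2d_valid`, the toy image and the edge kernel are hypothetical and chosen only for illustration. As in most CNN frameworks, the kernel is not flipped, so the operation is strictly a cross-correlation.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


# Toy 4x5 "image": dark on the left, bright on the right.
image = [
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]

# Hypothetical 3x3 kernel that responds to left-to-right intensity increases.
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = conv2d_valid(image, vertical_edge)
print(feature_map)  # [[0, 3, 3], [0, 3, 3]]
```

The response is zero over the uniform region and positive where the window covers the dark-to-bright transition, which is the sense in which a convolution summarizes the input with a feature. In a trained CNN the kernel weights are learned rather than fixed by hand.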
CNNs are mostly used in the image analysis domain. In recent years, especially in the field of medical image diagnosis, various CNN models have emerged for the differential diagnosis of various diseases. Skin lesion diagnosis using deep learning methods is one of these popular fields of interest (Hosny et al., 2019; Mahbod et al., 2019, may; Ray et al., 2020, may; Zhang et al., 2019). In particular, melanoma classification with deep learning methods has gained popularity in recent years (Ali et al., 2019, july; Favole et al., 2020, may; Winkler et al., 2019).

AlexNet & ResNet50 models
AlexNet (Krizhevsky et al., 2017), which was the winner of the 2012 ImageNet contest, is one of the plainest and most widely used CNN models in the studies of recent years. The main goal of AlexNet was to classify a huge dataset including 14 million images into 1000 different categories in the ImageNet competition, which was established in 2010 to compare different machine learning methods. Conventional machine learning methods were the winners of this competition until AlexNet outperformed all competitors in 2012. AlexNet became the winner with a 15.3% top-5 error rate, roughly 10 percentage points better than its closest competitor.
The AlexNet architecture is one of the simple versions of CNNs, like LeNet-5, but it is deeper than this earlier model. AlexNet has two more convolution layers than LeNet-5. AlexNet also introduced other modifications, such as additional fully connected layers, max pooling layers instead of average pooling layers, and the ReLU (Rectified Linear Unit) as the activation function instead of the tanh and sigmoid functions preferred in LeNet-5.
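The two substitutions mentioned above can be illustrated with a short sketch. The helper names and the toy feature map are hypothetical, and the non-overlapping 2 × 2 window is chosen only for brevity; it is not claimed to be the pooling configuration of AlexNet.

```python
import math


def relu(x):
    """Rectified Linear Unit: passes positive values, zeroes the rest."""
    return max(0.0, x)


def sigmoid(x):
    """Logistic sigmoid, which saturates towards 0 or 1 for large |x|."""
    return 1.0 / (1.0 + math.exp(-x))


def mean(values):
    return sum(values) / len(values)


def pool2x2(feature_map, reduce):
    """Apply `reduce` (for example max or mean) to non-overlapping 2x2 windows."""
    pooled = []
    for i in range(0, len(feature_map) - 1, 2):
        row = []
        for j in range(0, len(feature_map[0]) - 1, 2):
            window = [
                feature_map[i][j], feature_map[i][j + 1],
                feature_map[i + 1][j], feature_map[i + 1][j + 1],
            ]
            row.append(reduce(window))
        pooled.append(row)
    return pooled


fm = [
    [1.0, 3.0, 0.0, 2.0],
    [5.0, 7.0, 4.0, 6.0],
]
max_pooled = pool2x2(fm, max)   # keeps the strongest response per window
avg_pooled = pool2x2(fm, mean)  # smooths the responses per window

print(relu(-2.0), relu(3.5))    # 0.0 3.5
print(max_pooled, avg_pooled)   # [[7.0, 6.0]] [[4.0, 3.0]]
```

Unlike the sigmoid and tanh functions, ReLU does not saturate for large positive inputs, which is commonly cited as the reason it eases the training of deeper networks.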
ResNet (He et al., 2016), which was the 2015 winner of the ImageNet competition, is a deeper version of CNN models. To overcome the vanishing gradient problem while going deeper with convolutions, ResNet includes shortcuts between different convolution blocks. Because of this technique, ResNet was named Residual Neural Network. ResNet became the winner of the ImageNet 2015 competition with a top-5 error rate of 3.57%.
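The shortcut idea can be sketched in a few lines. The function names below are hypothetical, and the stand-in for the convolution layers is deliberately trivial; the sketch only shows that the block output is the transformed input plus the input itself.

```python
def residual_block(x, transform):
    """Return transform(x) + x element-wise; the `+ x` term is the shortcut."""
    fx = transform(x)
    return [f + xi for f, xi in zip(fx, x)]


def near_zero_layers(x):
    """Hypothetical stand-in for convolution layers that contribute nothing yet."""
    return [0.0 * xi for xi in x]


x = [0.5, -1.0, 2.0]
y = residual_block(x, near_zero_layers)
print(y)  # [0.5, -1.0, 2.0]
```

Even when the stacked layers contribute nothing, the block still passes its input through unchanged, so the signal and its gradient have a direct path across the block.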
Various versions of ResNet can be used for different purposes. Although the ResNet version that won the ImageNet competition has 152 layers, there are other versions such as ResNet-18, ResNet-32 and ResNet-101. In this work, we compared the performance of ResNet-50, which is a plainer and more successful version of ResNet.

GoogLeNet model
GoogLeNet (Szegedy et al., 2015), which was the winner of the ImageNet competition in 2014, has been identified as a deeper and wider CNN model. GoogLeNet achieved this success with a 6.67% top-5 error rate in this contest.
Instead of applying arbitrary convolutions like AlexNet, GoogLeNet has inception modules involving 1 × 1, 3 × 3 and 5 × 5 convolution sublayers together with a 3 × 3 max pooling block, all acting as parallel operations. These inception blocks receive data from the previous layers and then apply the parallel operations to the same data. To reduce the computational cost, a 1 × 1 convolution is applied before the 3 × 3 and 5 × 5 convolutions. In the pooling branch, however, the 1 × 1 convolution sublayer is placed after the max pooling layer. In each branch of the inception layer, different features are calculated on the data received from the previous layer. The outputs are then concatenated at the end of these parallel operations as the input for the next layers of the CNN. Instead of using fully connected layers, this model uses inception modules to overcome the overfitting problem.
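The effect of the 1 × 1 reduction and of the concatenation can be checked with simple arithmetic. The sketch below uses the branch widths listed for the inception (3a) module in Szegedy et al. (2015); the helper `conv_weights` is hypothetical and ignores bias terms.

```python
def conv_weights(kernel_size, in_channels, out_channels):
    """Number of weights (biases ignored) in a square convolution layer."""
    return kernel_size * kernel_size * in_channels * out_channels


# Branch widths of the inception (3a) module (Szegedy et al., 2015).
in_channels = 192
branch_1x1 = 64
reduce_3x3, branch_3x3 = 96, 128
reduce_5x5, branch_5x5 = 16, 32
pool_proj = 32

# Concatenation stacks the four branch outputs along the channel axis.
out_channels = branch_1x1 + branch_3x3 + branch_5x5 + pool_proj

# 5x5 branch: direct convolution versus 1x1 reduction followed by 5x5.
direct_5x5 = conv_weights(5, in_channels, branch_5x5)
reduced_5x5 = (conv_weights(1, in_channels, reduce_5x5)
               + conv_weights(5, reduce_5x5, branch_5x5))

print(out_channels)             # 256
print(direct_5x5, reduced_5x5)  # 153600 15872
```

For this module, the 1 × 1 reduction cuts the weights of the 5 × 5 branch by roughly a factor of ten, while the concatenated output still carries features from all four branches.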
In this network, max pooling operations are performed between some blocks to reduce the information transferred from previous blocks. Also, at the end of GoogLeNet there is an average pooling layer, unlike the max pooling layer used in AlexNet and ResNet.

Previous work
In our previous work, we compared the performances of the AlexNet, GoogLeNet and ResNet-50 CNNs for the classification of benign and malignant melanoma on dermoscopic images. In our last work (Favole et al., 2020, may), we described a region of interest-based approach for the classification of dermoscopic images of skin lesions.

Proposed CNN model
In our previous work (Favole et al., 2020, may), we used saliency map features to find the actual lesional area in the dermoscopy images obtained from the ISIC Dataset (reference). We then cropped the lesional regions from each image to feed the CNNs in our experiments. We applied data augmentation to the whole ISIC dataset to balance the benign, malignant and other classes. After balancing each class, we used those augmented images in our experiments. In the experiments with AlexNet, Inception1 and ResNet50, the accuracy scores did not improve compared with the results of our first study.
Data augmentation is mostly preferred for increasing the amount of data by adding modified copies of the actual data. We looked for a common representation of the benign and malignant classes without any data augmentation. Given the reduced resolution of the images fed to the CNN input layers, data augmentation is not useful for increasing accuracy with the well-known CNN methods. Instead, we focused on upsampling the CNN's output to preserve useful information obtained from dermoscopy images.
After a convolution operation has been performed, meaningful information that summarizes the previous layer is obtained. In deep networks, going deeper is not by itself the way to achieve good performance. In some cases, because of the vanishing gradient problem or the extraction of overly detailed information, anomalies can occur in the outputs. A deconvolution following a convolution can be a solution to those issues (Odena et al., 2016). Deconvolution should not be considered the inverse of convolution. It is a transposed convolution and can also be considered an upsampling operation. In our proposed model, we inserted a 3 × 3 deconvolution layer after the last convolution layer of the GoogLeNet model (see Figure 1).
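The upsampling behaviour of a transposed convolution can be sketched in pure Python. The function name `conv_transpose2d`, the stride of 1, the absence of padding and the all-ones kernel are assumptions made for illustration; the text does not state the stride, padding or learned weights of the inserted layer.

```python
def conv_transpose2d(x, kernel, stride=1):
    """Transposed convolution without padding: each input value scatters a
    scaled copy of `kernel` into the (larger) output."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(x) - 1) * stride + kh
    out_w = (len(x[0]) - 1) * stride + kw
    out = [[0.0] * out_w for _ in range(out_h)]
    for i, row in enumerate(x):
        for j, value in enumerate(row):
            for di in range(kh):
                for dj in range(kw):
                    out[i * stride + di][j * stride + dj] += value * kernel[di][dj]
    return out


x = [
    [1.0, 2.0],
    [3.0, 4.0],
]
ones_3x3 = [[1.0] * 3 for _ in range(3)]  # placeholder weights, not learned ones

y = conv_transpose2d(x, ones_3x3)
for row in y:
    print(row)
# [1.0, 3.0, 3.0, 2.0]
# [4.0, 10.0, 10.0, 6.0]
# [4.0, 10.0, 10.0, 6.0]
# [3.0, 7.0, 7.0, 4.0]
```

A 2 × 2 input becomes a 4 × 4 output, following the relation output size = (input size - 1) × stride + kernel size. Under the same assumptions, a 7 × 7 feature map would become 9 × 9.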

Software and hardware configuration
In the experiments, we used a device with a 2.40 GHz 8-core Intel Core i7-4700HQ CPU and 16 GB of RAM. The algorithms were developed and tested in MATLAB R2020b with the Deep Learning Toolbox. The experiments for the proposed CNN were conducted on the NVIDIA GeForce GT 750M graphics processing unit (GPU) available in this device.

ISIC dataset
The dataset used in this work is the International Skin Imaging Collaboration: Melanoma Project (ISIC), a public archive of clinical and dermoscopic images of skin lesions (International Skin Imaging Collaboration, ISIC). We selected a subset of 21,570 dermoscopic images from the 23,906 images available in this dataset. In this subset, 19,373 benign and 2,197 malignant images were allocated for binary classification.
Recent studies have addressed ISIC dataset images for solving the skin lesion classification problem. In those studies, various subsets of the ISIC dataset with a limited number of images have been used for the experiments.
In this work, our main goal is to classify skin lesion images regardless of high or low resolution. Thus, in our previous work and in this extended version, we use the whole ISIC dataset in the experiments.
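The class counts given above imply a strong imbalance, which the following short calculation makes explicit. Only the two counts come from the text; the variable names are illustrative.

```python
benign, malignant = 19_373, 2_197

total = benign + malignant
malignant_share = malignant / total
imbalance_ratio = benign / malignant

# Accuracy of a trivial classifier that labels every image as benign.
majority_baseline = benign / total

print(total)                        # 21570
print(round(malignant_share, 4))    # 0.1019
print(round(imbalance_ratio, 2))    # 8.82
print(round(majority_baseline, 4))  # 0.8981
```

Because a classifier that always predicts the benign class would already reach an accuracy of about 0.90 on this subset, classification accuracy is best read together with sensitivity, specificity and the other metrics.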

Data preprocessing
The images available in the ISIC dataset have different resolutions. To feed the network with these images, we applied basic data augmentation and reduced the size of the images. As in GoogLeNet, our proposed model accepts 224 × 224 images as the input of the network. For this purpose, we applied bicubic interpolation while resizing each of the 21,570 images in order to preserve some details.

CNN parameters
For a meaningful performance comparison with previous experiment results, we used the same configuration parameters as before. To apply binary classification with our proposed model, we modified the last softmax layer outputs to 2, defined the learn rate factor as 10 and set the bias learn rate factor to 10.

Training parameters
As in our previous work, we used the same 80% training and 20% validation subsets extracted from this dataset. We set the learning rate to 0.0003, the mini-batch size to 10 and the number of epochs to 5, as in our previous work.
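These settings determine the number of training iterations, which can be verified with a short calculation. The sketch assumes that the split is taken over the 21,570 selected images and that an incomplete final mini-batch is dropped in each epoch; it also assumes that the learn rate factor of 10 given under CNN parameters multiplies the global learning rate for the modified layer.

```python
total_images = 21_570
mini_batch_size = 10
epochs = 5
global_learning_rate = 0.0003
learn_rate_factor = 10  # factor set on the modified final layer

# 80% training / 20% validation split, using integer arithmetic.
train_images = total_images * 80 // 100
validation_images = total_images - train_images

# Assumption: the incomplete final mini-batch of each epoch is discarded.
iterations_per_epoch = train_images // mini_batch_size
total_iterations = iterations_per_epoch * epochs

# Assumption: the factor scales the global rate for that layer only.
final_layer_learning_rate = global_learning_rate * learn_rate_factor

print(train_images, validation_images)         # 17256 4314
print(iterations_per_epoch, total_iterations)  # 1725 8625
```

The result of 8625 iterations agrees with the iteration count reported for 5 epochs in the performance comparison.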

Performance comparisons
In our previous work, we ran experiments on the same training and validation datasets with the AlexNet, GoogLeNet and ResNet-50 architectures. According to the previous results, ResNet-50 achieved the best classification accuracy, and GoogLeNet achieved the second-best result, the closest to ResNet-50. AlexNet achieved the lowest accuracy but the best time complexity in those experiments.
We repeated the experiments with our proposed CNN on the same training and validation datasets to compare with the previous results. After these experiments, we also calculated the polygon area metric (PAM) (Aydemir, 2020, january) for the previous and current experiment results. PAM is calculated from well-known performance evaluation metrics: classification accuracy (CA), sensitivity (SE), specificity (SP), area under curve (AUC), kappa (K) and F measure metric (FM). According to the experiments, our proposed CNN achieved the best classification accuracy with 0.9309, sensitivity with 0.9739, kappa with 0.9620 and F metric with 0.9620. Detailed information about the CNN performances can be found in Table 1. During the experiments, the four models reached these scores at 5 epochs with 8625 iterations (see Figures 2 and 3).
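The idea behind PAM can be sketched as follows: the six metrics are placed on six axes 60 degrees apart, and the area of the resulting polygon is divided by the area of the regular hexagon obtained when every metric equals 1. The function name, the order of the metrics around the polygon and the example values are assumptions made for illustration; they are not the scores of any model in this study.

```python
import math


def polygon_area_metric(metrics):
    """Normalized area of the polygon spanned by `metrics` on equally
    spaced axes (six axes in the case described in the text)."""
    n = len(metrics)
    angle = 2 * math.pi / n
    # Each pair of neighbouring axes spans a triangle of area
    # 0.5 * r_i * r_(i+1) * sin(angle).
    area = 0.5 * math.sin(angle) * sum(
        metrics[i] * metrics[(i + 1) % n] for i in range(n)
    )
    full_area = 0.5 * math.sin(angle) * n  # about 2.598 for a unit hexagon
    return area / full_area


# Illustrative values in the assumed order CA, SE, SP, AUC, K, FM.
example_scores = [0.90, 0.80, 0.95, 0.85, 0.70, 0.75]
pam = polygon_area_metric(example_scores)
print(round(pam, 4))
```

Because neighbouring metrics are multiplied, a single weak metric reduces the area of two adjacent triangles, so PAM penalizes unbalanced performance more than a plain average would.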

Performance evaluation of CNNs
In our previous work, we stated that ResNet-50 was the best in terms of classification accuracy. In terms of time complexity, however, ResNet-50 was the worst, with a training duration of 354 min 46 s. Although AlexNet was the best in terms of time complexity with 239 min 4 s, it also produced the worst accuracy. According to these comparisons, GoogLeNet was the best solution in terms of cost/performance.
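In our current work, we modified GoogLeNet by adding a deconvolution layer to improve its performance and decrease its time complexity. As a result of the current experiments, the proposed CNN achieved the best classification accuracy with 0.9309 and decreased the training duration of GoogLeNet from 283 min 50 s to 256 min 26 s.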
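The training durations reported in the abstract and in this section can be compared directly with a short calculation. The durations are taken from the text; the helper and variable names are illustrative.

```python
def to_seconds(minutes, seconds):
    return minutes * 60 + seconds


training_times = {
    "AlexNet": to_seconds(239, 4),
    "Proposed CNN": to_seconds(256, 26),
    "GoogLeNet": to_seconds(283, 50),
    "ResNet-50": to_seconds(354, 46),
}

saved = training_times["GoogLeNet"] - training_times["Proposed CNN"]
saved_min, saved_sec = divmod(saved, 60)
relative_saving = saved / training_times["GoogLeNet"]

print(f"{saved_min} min {saved_sec} s saved ({relative_saving:.1%})")
# 27 min 24 s saved (9.7%)
```

Under these figures, the proposed CNN trains about 27 minutes faster than the original GoogLeNet, while AlexNet remains the fastest and ResNet-50 the slowest of the four models.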

Conclusion and future work
In this study, we propose a CNN model for the differential diagnosis of benign and malignant skin lesions. As a preliminary work, we compared the performances of AlexNet, GoogLeNet and ResNet-50 on the ISIC dataset to find the best working model in this problem domain. According to the previous results, ResNet-50 achieved the best accuracy despite having the worst time complexity. Although AlexNet achieved the best time complexity, it produced the worst accuracy scores among the three CNNs. We concluded that GoogLeNet-based CNNs may achieve good results in terms of both accuracy and time complexity in future works.
In our proposed model, we used an additional transposed convolutional layer after the last convolution in GoogLeNet. By applying this deconvolution, we aim to perform an upsampling operation that preserves some high-level information. We tested our proposed CNN with the same dataset and parameter configuration as in our previous work with AlexNet, GoogLeNet and ResNet-50. We calculated the performance of our CNN in terms of classification accuracy (CA), sensitivity (SE), specificity (SP), area under curve (AUC), kappa (K), F measure metric (FM), polygon area metric (PAM) and time complexity. We also calculated the SE, SP, AUC, K, FM and PAM scores for the results of the previous experiments with AlexNet, GoogLeNet and ResNet-50. Our proposed CNN achieved the best CA, SE, K and FM scores. Although ResNet-50 achieved the best PAM and AUC scores, our proposed CNN obtained the closest results on the same metrics. Our proposed CNN also decreased the time complexity obtained with the original GoogLeNet.
According to the current studies in this field, deep learning methods yield promising results in the differential diagnosis of skin lesions. Considering the results obtained with the modified GoogLeNet, we will try to design a problem-specific CNN for benign and malignant melanoma diagnosis. Our main goal is to produce an online teledermatology system based on a CNN model that can diagnose melanoma and other types of skin lesions with high accuracy in minimum time.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Notes on contributors
E. Yılmaz received his M.Sc. and Ph.D. degrees from Computer Engineering Department at Karadeniz Technical University in June'18. In January'02, he joined Karadeniz Technical University Department of Informatics, Trabzon/TURKEY as academic staff member. As a senior lecturer, he teaches lectures on Information Technologies. He was a post-doctoral fellow at LISITE (Laboratoire D'Informatique, Signal, Image, Telecommunication et Electronique) de L'ISEP, ISEP, Paris/FRANCE between September 2019 and September 2020. His current research includes the development of noise reduction, semiautomatic segmentation and classification algorithms using deep learning methods to increase non-invasive diagnosis of different lesion types (skin lesions and dental lesions) detected in medical images (dermoscopy, 3D CBCT images). Ercüment Yılmaz is also the founder of Yılmaz Bilişim, which is located in Trabzon Technology Development Zone. Yılmaz Bilişim is an R&D company that was established with the aim of developing the 'Road Traffic Simulation Software' product.