Machine learning approach for classification of maculopapular and vesicular rashes using the textural features of the skin images

Abstract Skin, the largest organ of the body, suffers from various disorders, one of which is rashes caused by infections. Rashes appear in different forms, and their texture features usually differ. The proposed algorithm classifies maculopapular and vesicular skin rashes using a machine learning approach. The initial pre-processing involved segmentation of the rash region. The characteristics of the rashes were extracted from the skin images, with the Gray-Level Co-Occurrence Matrix (GLCM) method used to extract the texture features. A backpropagation neural model was trained with the rash images. The features extracted from the unsegmented and the segmented images were taken separately and used to train and test the neural model. The performance of the model was studied in terms of accuracy, sensitivity, specificity, and F1-score. The developed machine learning algorithm has an average accuracy of 83.43% on the segmented images.


PUBLIC INTEREST STATEMENT
The pandemic has created a difficult situation for individuals seeking consultation with their preferred doctors. Because of limited consultation hours or restricted movement, patients have faced the discomfort of waiting at hospitals for their regular check-ups or consultations. Telemedicine has ensured life-supporting services in many areas of healthcare. The latest developments in teledermatology provide an opportunity to serve remote or isolated patients. The proposed machine learning method classifies the rash type using the attributes of skin rashes. Such developments in artificial intelligence and machine learning support the development of automated diagnosis tools.

Introduction
Skin, being the largest organ in our body, gets constantly bombarded with various external and internal stimuli. There are different types of diseases or disorders viz., warts, moles, skin cancer, rashes, measles, chickenpox, etc. that affect skin conditions (Lipworth & Topic Editor, 2020).
A study carried out in Upper Egypt indicates a high prevalence of skin diseases among the rural population, with about 86% of the population having one or more skin diseases (Abdel-Hafez et al., 2003). A similar pattern among rural populations is reported by studies carried out in Nepal and in different parts of India (Bommakanti & Pendyala, 2017; Gupta, 2015; Jain et al., 2016; Shrestha et al., 2014).
Today, healthcare specialists and dermatologists concentrate their research and practice in urban areas. People in rural areas lack access to better healthcare because of the scarcity of transportation and communication. There are many challenges in setting up and running proper laboratory services at rural health centres (Olusegun, 2012). The lack of funds, irregular power supply, and shortage of skilled technicians are a few of the constraints that hinder the functioning of rural medical laboratories.
In the case of a viral fever accompanied by rashes, the morphological details of the rashes play a major role in the differential diagnosis of the patient (McKinnon & Howard, 2000). As the complexity and the number of features associated with the disease increase, diagnosis and recognition become harder. A computer algorithm that combines image processing, feature extraction, and classification can be employed, which in turn helps in diagnosing the disease early. The image processing approach supports automation and acts as an assistive tool for dermatologists, improving the efficacy of teledermatology.
The role of teledermatology lies in the following segments: referral, consultation, diagnosis, treatment, review, follow-up, education, teleconferencing, and business plan. To provide education and support, there is a need for collaboration between the healthcare sector and the dermatologists. Therefore, teledermatology is a step forward in better dermatological care in general, and aesthetic care, in particular (Thomas & Kumar, 2013). The three important merits of teledermatology are efficacy, time reduction, and economic value.

Literature review
Fever and rash are common signs of many diseases. They may exist concurrently due to independent unrelated processes (such as chronic eczematous dermatitis and incidental influenza) or due to a common etiology (Lipworth & Topic Editor, 2020). Typical illnesses with the symptoms of fever and rash include viral fevers such as measles, rubella, and chickenpox. Additionally, drug reactions exhibit skin rash as one of their prominent clinical features.
Skin rashes are classified into various categories, such as maculopapular and vesicular, based on their size, morphology, appearance, and distribution. In viral diseases, maculopapular rashes are the most common type observed (Goldman et al., 2007; Kang, 2015). Maculopapular rashes generally present as distinct flat red bumps, and experts inspect the origin of the rash to determine whether it is contagious. The rashes can be infectious or non-infectious. For the clinical diagnosis of these rashes, the patient's travel history, animal contact, medications, and exposure to natural environments need to be considered in addition to the type of rash. Vesicular rashes, on the other hand, are clear fluid-filled rashes that measure less than 5 mm. Common diseases associated with them are chickenpox, herpes, and hand-foot-mouth disease (Dermnetz.org, 2015). The skin manifestations of COVID-19 also include signs of the maculopapular type of rash (Mariyath et al., 2021).
Specialists typically identify these rashes from a detailed history and examination findings. However, as the number of features associated with a disease increases, diagnosis and recognition become harder. Hence, computer-based algorithms can be employed. A typical machine learning technique for diagnosing skin diseases involves an image processing approach, in which the skin area of interest is acquired and analysed.
The Gray-Level Co-Occurrence Matrix (GLCM) provides details of the different combinations of gray levels that co-occur in an image. GLCM is a statistical method that examines the texture of a rash by looking at the spatial relationships between pixels in an image or an image section (Sampathila & Martis, 2020). The study of textural features using GLCM has been introduced in different areas of medical image processing. Wei et al. (2018) carried out classification experiments on three skin diseases, namely herpes, dermatitis, and psoriasis. The initial step was pre-processing of the original image. The areas of the skin lesions were separated, and vertical image segmentation was incorporated to improve the accuracy of identification. GLCM was used to extract texture features from the images of these skin diseases, and colour features were extracted as well; for the colour features, the marker-controlled watershed algorithm was incorporated. GLCM was used to find parameters such as contrast, correlation, entropy, uniformity, and energy. Finally, the support vector machine (SVM), which is well suited to small sample sizes, was used for the classification (Wei et al., 2018). In addition, GLCM has been used to extract brain textural information for classifying dementia and for retrieving MRI images (Bhalerao et al., 2017; Sampathila & Martis, 2020).
The classification of skin lesions has been attempted using features extracted from skin images with a classifier based on discriminant analysis. This model achieved a success rate of 97% (Maglogiannis et al., 2005). Another attempt at classifying skin diseases used neural networks based on the analysis of texture features. The algorithm was developed using a multi-layered feedforward network trained with GLCM features. The features extracted included energy, correlation, contrast, and homogeneity. The authors reported an accuracy, sensitivity, and specificity of 80%, 71.4%, and 87.5%, respectively (Islam et al., 2017). Automatic detection of skin diseases with a convolutional neural network (CNN) has been evaluated using an AlexNet implementation, which showed an accuracy of 96.5% (Vijaya & Srinivas Reddy, 2018a).
Although a variety of techniques have previously been adopted for the automatic detection of skin abnormalities, textural features in particular have been widely used in the analysis of medical data. The most common classification models used for this purpose are SVM, CNN, and multi-layered feedforward networks. This paper presents the classification of rashes into the maculopapular and vesicular sub-classes, based on an artificial neural network model. In this method, GLCM features are extracted from the rash images and are classified accordingly.

Methodology
The classification of skin rash images has been implemented with a backpropagation neural network. The images used for the study were obtained from AccessMedicine (Vijaya & Srinivas Reddy,) and the web resources of DynaMed (Michael & Topic Editor, 2019; Neilan & Topic Editor, 2019). Images of maculopapular and vesicular rashes typically show different visual details, as shown in Figure 1.
A supervised neural network model has been trained and tested for the desired rash classification. Figure 2 describes the general workflow of the skin rash image classification system. The major elements of this system are the input unit, the pre-processing unit, the feature extraction unit, and the classifier.
The inputs to the system were the skin images, which consisted of two types of rashes with a certain health condition such as fever. The study was carried out in two phases. In the first phase, we considered the images without applying the segmentation technique. In the second phase, the study was carried out applying the segmentation to extract the rash region. In both cases, the feature set was extracted and used for training and testing the classification model. The images used for the study were generated using the cropped rash regions from the reference set with the image manipulation tool (Neilan, 2020) available online. We considered 45 and 54 images of the maculopapular rash and the vesicular type respectively, with a size of 106 × 90 pixels each.
The pre-processing stage improves the quality of the image, helps in suppressing the insignificant regions of the data, and enhances the rashes spread over the skin region. Pre-processing comprises re-sampling of the image to a uniform size of 106 × 90 pixels, noise removal, and segmentation. As part of pre-processing, we measured the impact of image segmentation on the classifier accuracy. For the first set of experiments, we used the images directly; for the second set, image segmentation was applied before classification. The difference in accuracy between the two sets helps to decide whether image segmentation should be a mandatory step before feature extraction. Hence, we carried out our experiments with and without segmentation. The segmentation separates the foreground (rash region) from the background (other skin regions) with the Otsu method and the K-Means technique in MATLAB (R. Khandekar et al., 2021). The Otsu method is one of the most successful methods for segmenting a region of interest (Rohan Khandekar et al.). K-Means is another segmentation technique; it partitions the image into different regions of interest, including the background and the rash region, by selecting an appropriate number of clusters (Thanh et al., 2020).
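The Otsu step described above can be illustrated with a minimal NumPy sketch. It is not the MATLAB implementation used in the study: the function names `otsu_threshold` and `segment_rash` are hypothetical, the input is assumed to be an 8-bit grayscale image, and the rash is assumed to be brighter than the surrounding skin (the comparison would be flipped otherwise). K-Means is not covered here.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the 8-bit threshold that maximises the between-class variance."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    levels = np.arange(256)
    w0 = np.cumsum(prob)                 # weight of the class with values <= t
    w1 = 1.0 - w0                        # weight of the class with values > t
    cum_mean = np.cumsum(prob * levels)  # first moment accumulated up to t
    total_mean = cum_mean[-1]
    denom = w0 * w1
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (total_mean * w0 - cum_mean) ** 2 / denom
    sigma_b[denom == 0] = 0.0            # thresholds that leave one class empty
    return int(np.argmax(sigma_b))

def segment_rash(gray):
    """Boolean foreground mask; assumes the rash is brighter than the skin."""
    return gray > otsu_threshold(gray)

# Synthetic 106 x 90 pixel example: uniform "skin" with a brighter "rash" patch.
image = np.full((90, 106), 60, dtype=np.uint8)
image[30:60, 40:80] = 180
mask = segment_rash(image)
print("threshold:", otsu_threshold(image), "foreground pixels:", int(mask.sum()))
```

On real photographs the histogram is rarely this cleanly bimodal, which is one reason a clustering method such as K-Means may be tried alongside a single global threshold.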
The basic idea of feature extraction is to simplify the data by omitting the non-dominant features, if any (Amiri et al., 2014), thereby reducing the number of variables needed to describe a large data set and saving memory and computational power during the analysis. The extracted features serve as the input to the classification stage.
GLCM features are second-order statistical (texture) features, extracted from the images to differentiate the two classes of interest. The matrix has as many rows and columns as there are gray levels in the image. Since GLCM features are very sensitive to the size of the texture sample because of the large dimensionality of the matrix, the number of gray levels needs to be reduced (Albregtsen, 2008). The important features considered include contrast, correlation, energy, homogeneity, entropy, variance, kurtosis, and skewness.
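To make the construction concrete, the following sketch builds a normalised co-occurrence matrix after reducing the number of gray levels. It is an illustration only: `quantize` and `glcm` are hypothetical helper names, the choice of 8 levels is an assumption, and the matrix is left non-symmetric (each pixel pair is counted once, in the direction of the offset).

```python
import numpy as np

def quantize(gray, levels=8):
    """Map 8-bit intensities onto `levels` equally wide gray-level bins."""
    return (np.asarray(gray, dtype=np.int64) * levels) // 256

def glcm(q, levels, dr, dc):
    """Normalised G x G co-occurrence matrix for the pixel offset (dr, dc)."""
    q = np.asarray(q, dtype=np.int64)
    rows, cols = q.shape
    # Keep only reference pixels whose neighbour at (dr, dc) lies inside the image.
    r0, r1 = max(0, -dr), min(rows, rows - dr)
    c0, c1 = max(0, -dc), min(cols, cols - dc)
    ref = q[r0:r1, c0:c1]
    nbr = q[r0 + dr:r1 + dr, c0 + dc:c1 + dc]
    counts = np.zeros((levels, levels))
    np.add.at(counts, (ref.ravel(), nbr.ravel()), 1)  # tally each (i, j) pair
    return counts / counts.sum()

# Small example with 4 gray levels and a horizontal neighbour one pixel away.
patch = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 2, 2, 2],
                  [2, 2, 3, 3]])
P = glcm(patch, levels=4, dr=0, dc=1)
print(P.shape)  # the matrix has one row and one column per gray level
```

With 256 levels the matrix would have 65,536 cells for only about 9,500 pixel pairs in a 106 × 90 image, so most cells would be empty; quantising first keeps the estimate of each P(i, j) less sparse.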

Contrast: The difference in colour helps us to differentiate an object (here, the rash) from its background. The contrast is described with equation 1.
Correlation: It measures the similarities between the two signals. The equation for correlation is shown in equation 2.
Energy: The value ranges from 0 to 1. The value of energy in the case of a constant image is 1. The equation for energy is shown in equation 3.
$\text{Energy} = \sqrt{\sum_{i=0}^{G-1} \sum_{j=0}^{G-1} P(i,j)^{2}}$ (3)
Homogeneity: The value typically ranges from 0 to 1. It returns the value that measures the closeness of distribution of elements in the GLCM. The equation for homogeneity is shown in equation 4.
Entropy: A homogeneous image tends to have high entropy while an inhomogeneous image has low entropy. The equation for entropy is shown in equation 5.
Variance: It can be used for identifying sharp details such as the edges in an image. The equation for variance is shown in equation 6.
Kurtosis: In the presence of a low amount of noise and low resolution, the Kurtosis value tends to be lower. The equation for Kurtosis is shown in equation 7.
Skewness: It can be defined as the measure of the asymmetry of the probability distribution of a real-valued variable about the mean. The darker and the shinier surfaces are likely to be more positively skewed than the light and the dull images (itl.nist.gov, 2013). The equation for skewness is shown in equation 8.
Here, G is the number of gray levels, µ is the mean, P(i, j) is the corresponding element of the GLCM, µx and µy are the mean values of Px and Py, respectively, and σx and σy are the standard deviations of Px and Py, respectively.
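For orientation, several of these co-occurrence features are commonly written as follows in the notation defined above. These are standard textbook forms given as a sketch, not a reproduction of the authors' numbered equations: the index range 0 to G−1 is an assumption, and definitions vary between sources (for example, homogeneity is sometimes written with (i − j)² in place of |i − j|).

```latex
\begin{align*}
\text{Contrast}    &= \sum_{i=0}^{G-1}\sum_{j=0}^{G-1} (i-j)^{2}\, P(i,j) \\
\text{Correlation} &= \sum_{i=0}^{G-1}\sum_{j=0}^{G-1}
                      \frac{(i-\mu_x)(j-\mu_y)\, P(i,j)}{\sigma_x \sigma_y} \\
\text{Homogeneity} &= \sum_{i=0}^{G-1}\sum_{j=0}^{G-1} \frac{P(i,j)}{1+|i-j|} \\
\text{Entropy}     &= -\sum_{i=0}^{G-1}\sum_{j=0}^{G-1} P(i,j)\, \log P(i,j) \\
\text{Variance}    &= \sum_{i=0}^{G-1}\sum_{j=0}^{G-1} (i-\mu)^{2}\, P(i,j)
\end{align*}
```

In the entropy sum, terms with P(i, j) = 0 are taken as zero.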
The matrix represents the relative occurrence of the intensities of a pair of pixels separated by a given distance, ranging from 1 to the size of the image, in different directions. A matrix is built for the image with a predefined distance and angles of 0°, 45°, 90°, and 135°. The features obtained for different distances and directions differ from each other. The features obtained in different directions are generally averaged, but this loses directional information, which can eventually lead to poor classification accuracy (Singh & Srivastava, 2015). The values of contrast, correlation, energy, and homogeneity were therefore calculated for each of the four directions and assembled into a feature vector. A feature vector comprising 20 texture features was extracted for every rash image.
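The per-direction features can be sketched as below. This is an illustrative reading rather than the authors' code: the helper names, the (row, column) offsets chosen for the four angles, the distance of 1, and the square-root form of energy are all assumptions. The sketch yields 16 values (4 features × 4 directions); one reading consistent with the stated total of 20 is that the four remaining statistics (entropy, variance, kurtosis, skewness) are appended, but the text does not spell this out.

```python
import numpy as np

# Assumed (row, col) steps for each angle at distance 1.
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}

def glcm(q, levels, dr, dc):
    """Normalised co-occurrence matrix for the pixel offset (dr, dc)."""
    q = np.asarray(q, dtype=np.int64)
    rows, cols = q.shape
    r0, r1 = max(0, -dr), min(rows, rows - dr)
    c0, c1 = max(0, -dc), min(cols, cols - dc)
    ref = q[r0:r1, c0:c1]
    nbr = q[r0 + dr:r1 + dr, c0 + dc:c1 + dc]
    counts = np.zeros((levels, levels))
    np.add.at(counts, (ref.ravel(), nbr.ravel()), 1)
    return counts / counts.sum()

def four_features(p):
    """Contrast, correlation, energy, homogeneity of one normalised GLCM."""
    i, j = np.indices(p.shape)
    contrast = np.sum((i - j) ** 2 * p)
    energy = np.sqrt(np.sum(p ** 2))   # assumed form; some toolboxes omit the root
    homogeneity = np.sum(p / (1.0 + np.abs(i - j)))
    mu_x, mu_y = np.sum(i * p), np.sum(j * p)
    sd_x = np.sqrt(np.sum((i - mu_x) ** 2 * p))
    sd_y = np.sqrt(np.sum((j - mu_y) ** 2 * p))
    if sd_x * sd_y == 0:
        correlation = float("nan")     # undefined for a constant image
    else:
        correlation = np.sum((i - mu_x) * (j - mu_y) * p) / (sd_x * sd_y)
    return [contrast, correlation, energy, homogeneity]

def directional_vector(q, levels, distance=1):
    """Concatenate the four features for 0, 45, 90 and 135 degrees (no averaging)."""
    feats = []
    for angle in (0, 45, 90, 135):
        dr, dc = OFFSETS[angle]
        feats.extend(four_features(glcm(q, levels, dr * distance, dc * distance)))
    return np.array(feats)

# Vertical stripes: texture changes along rows but not along columns.
stripes = np.tile([0, 1], (6, 3))
vec = directional_vector(stripes, levels=2)
print(len(vec), "features; contrast at 0 and 90 degrees:", vec[0], vec[8])
```

The striped example shows why averaging over directions discards information: the contrast differs between the horizontal and vertical directions, and a single averaged value would hide that difference.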

Classification
Accurate classification depends on the learning technique. In this section, the classification of skin rashes with two learning models is reported. The backpropagation network (BPN) is a supervised model. The second model is designed with the support vector machine (SVM), which is also trained on labelled examples and is therefore supervised as well. Both models are trained and tested with the skin images, and the performance of each classifier is analysed with the confusion matrix.
The backpropagation network is a multilayer feedforward network trained with the backpropagation algorithm. Given pairs of training inputs and target outputs, the network learns to classify each input into the correct group. The cost function is minimised by adjusting the parameters of the model (Deepa & Sivanandam, 2011); the weights are updated using the gradient descent method. The main goal of the network is to give reasonable responses to inputs that are similar, but not identical, to the training inputs.
A BPN is trained in three stages: (i) feedforward of the input training pattern, (ii) backpropagation of the error, and (iii) updating of the weights. A BPN with an input layer of size 20, a hidden layer of 10 nodes, and an output layer of 2 nodes is trained. The neurons of the hidden and the output layers have a bias whose activation is 1. The output of a BPN can have either binary or bipolar values.
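The three training stages can be sketched for a 20-10-2 network as follows. This is a minimal NumPy illustration, not the MATLAB model used in the study: the sigmoid activation (giving binary-style outputs), the mean squared error cost, the learning rate, the number of epochs, and the synthetic two-class data are all assumptions, and `train_bpn` and `predict` are hypothetical names.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_bpn(x, t, hidden=10, lr=0.5, epochs=500, seed=0):
    """Batch gradient descent for a one-hidden-layer network with bias terms."""
    rng = np.random.default_rng(seed)
    w1 = rng.normal(0.0, 0.1, (x.shape[1], hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 0.1, (hidden, t.shape[1]))
    b2 = np.zeros(t.shape[1])
    n = x.shape[0]
    losses = []
    for _ in range(epochs):
        # (i) Feedforward of the training inputs.
        h = sigmoid(x @ w1 + b1)
        y = sigmoid(h @ w2 + b2)
        losses.append(float(np.mean((y - t) ** 2)))
        # (ii) Backpropagation of the error through the sigmoid derivatives.
        d_out = (y - t) * y * (1.0 - y)
        d_hid = (d_out @ w2.T) * h * (1.0 - h)
        # (iii) Gradient-descent update of the weights and biases.
        w2 -= lr * (h.T @ d_out) / n
        b2 -= lr * d_out.mean(axis=0)
        w1 -= lr * (x.T @ d_hid) / n
        b1 -= lr * d_hid.mean(axis=0)
    return (w1, b1, w2, b2), losses

def predict(params, x):
    """Index of the output node with the larger activation (0 or 1)."""
    w1, b1, w2, b2 = params
    return np.argmax(sigmoid(sigmoid(x @ w1 + b1) @ w2 + b2), axis=1)

# Synthetic stand-in for 20-element feature vectors of two classes.
rng = np.random.default_rng(1)
features = np.vstack([rng.normal(-1, 1, (30, 20)), rng.normal(1, 1, (30, 20))])
labels = np.array([0] * 30 + [1] * 30)
targets = np.eye(2)[labels]            # one output node per class
params, losses = train_bpn(features, targets)
print("first and last training loss:", losses[0], losses[-1])
```

With two output nodes, each class gets its own target node, so the predicted class is simply the node with the larger activation.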

Performance evaluation of the screening tool
The BPN algorithm was applied to both the segmented and the unsegmented feature sets, and the experiment was repeated ten times. The repetitions were split such that the first five used 70% of the data for training, 15% for validation, and 15% for testing, whereas the next five used 80% for training and 10% each for validation and testing. Each repetition gave a confusion matrix from which the accuracy, sensitivity, specificity, and F1-score were derived.
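The repetition scheme can be sketched as below for the 99 images (45 maculopapular and 54 vesicular). The random shuffling, the rounding of the split sizes, and the names `split_indices` and `repetition_plan` are assumptions made for illustration; the study does not state how the partitions were drawn.

```python
import numpy as np

def split_indices(n, train_frac, val_frac, seed):
    """Shuffle n sample indices and cut them into train / validation / test."""
    order = np.random.default_rng(seed).permutation(n)
    n_train = int(round(train_frac * n))
    n_val = int(round(val_frac * n))
    return order[:n_train], order[n_train:n_train + n_val], order[n_train + n_val:]

def repetition_plan(n=99):
    """Ten repetitions: five with 70/15/15 splits, then five with 80/10/10."""
    plan = []
    for rep in range(10):
        train_frac, val_frac = (0.70, 0.15) if rep < 5 else (0.80, 0.10)
        plan.append(split_indices(n, train_frac, val_frac, seed=rep))
    return plan

for rep, (train, val, test) in enumerate(repetition_plan(), start=1):
    print(rep, len(train), len(val), len(test))
```

With only 99 images, each test partition holds roughly 10 to 15 samples, so a single misclassification shifts the test accuracy by several percentage points; averaging over repetitions smooths this out.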
Accuracy is the ratio of the samples that were correctly classified to the total number of samples. The equation for the accuracy is shown in equation 9.
Sensitivity is the ratio of samples correctly labelled as positive to the actual positive samples. It is also called recall. The equation for sensitivity is shown in equation 10.
Specificity is the ratio of the correctly labelled negative samples to the actual negative samples. The equation for specificity is shown in equation 11. The F1-score is the harmonic mean of precision and recall. The equation for the F1-score is shown in equation 12.
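The four measures follow directly from the entries of a two-class confusion matrix, as in the sketch below. The function name and the counts in the example are hypothetical and are not results from the study; which rash type is treated as the positive class is likewise an assumption left to the caller.

```python
def classification_metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity (recall), specificity and F1 from confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)        # correctly labelled positives / actual positives
    specificity = tn / (tn + fp)        # correctly labelled negatives / actual negatives
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)  # harmonic mean
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "f1": f1}

# Hypothetical test partition of 15 images.
scores = classification_metrics(tp=8, fn=2, fp=1, tn=4)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

Because the F1-score ignores true negatives, it can stay high even when specificity is modest, which is worth keeping in mind when the two classes are unevenly represented.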

Results
The results obtained from the developed machine learning algorithm for screening skin rashes of two types, maculopapular and vesicular, and their significance are reported here. A comparative study between inputs with the region of interest segmented and the direct (unsegmented) inputs is also presented. The rash area of the image is initially segmented and separated from its background. The result obtained after segmenting the maculopapular rash is shown in Figure 3 and Figure 4.
We studied the performance of the developed classifier; the resulting confusion matrix is shown in Figure 5 (a) and the ROC plot in Figure 5 (b). The test accuracy of the proposed classification algorithm in classifying a given skin rash image as maculopapular or vesicular is 88%. Further, the test was repeated multiple times and the average values of the results were compared. Table 1 shows the average values of accuracy, sensitivity, specificity, and F1-score recorded during the study of the designed classifier. The table shows some improvement in accuracy when segmented images were used for classification.
Computer-aided diagnosis (CAD) based automated teledermatology is a requirement in the overall management of resources. The proposed algorithm helps in the automated diagnosis of the rash type and can further assist a tele-diagnostic system (Upadya et al., 2019).
The developed tool will be integrated with the telemedicine framework. Further, with an intelligent deep learning neural network or an object detection tool (Jaisakthi et al., 2018), a higher-performance teledermatology system can be deployed.
Rural health centres face many challenges in setting up proper laboratory facilities (Olusegun, 2012), and often have a shortage of specialised health consultants for diagnosing patients (Planning Commission of India, 2019). In such scenarios, the framework suggested for teledermatology implementation (Hameed et al., 2020; Upadya et al., 2019) can serve as a significant alternative for the early diagnosis of patients suffering from skin-related issues in rural locations. The framework (Upadya et al., 2019) needs an image processing unit as one of its major components to classify the images for disease identification. The results of the present rash image classification work encourage the use of a machine learning approach as one possible technique for automating the classification problem.

Conclusions
The proposed machine learning algorithm focuses on the classification of skin rashes into the maculopapular and vesicular types. The study used segmentation with the Otsu method, the K-Means method, and the Image Segmenter app in MATLAB on a Core i5 system. The rash regions were selected, and the features chosen for the study were contrast, energy, entropy, kurtosis, correlation, homogeneity, variance, and skewness. Then, the performance parameters of the BPN were studied. The accuracy, sensitivity, specificity, and F1-score of the proposed BPN model are 83.43 ± 5.3%, 92.39 ± 3.54%, 72.65 ± 11.9%, and 86 ± 3.72%, respectively. Once developed, the tool can be integrated with a mobile-based framework. The developed model demonstrates the possibility of its use in teledermatology applications for screening rashes.
Niranjana Sampathila, Harishchandra Hebbar & Sathish B Pai, Cogent Engineering (2022), 9: 2009093.