Efficient diabetic retinopathy diagnosis through U-Net – KNN integration in retinal fundus images

Diabetic retinopathy (DR) is a retinal disorder that can lead to blindness and affects people all over the world. Its major cause is long-standing diabetes, and early detection is the only way to preserve vision. This paper addresses five classes: Normal eye (No DR), Mild NPDR (Non-Proliferative Diabetic Retinopathy), Moderate NPDR, Severe NPDR, and PDR (Proliferative Diabetic Retinopathy). An effective method for identifying DR in retinal fundus images is proposed by combining the U-Net architecture with the K-nearest neighbours (KNN) algorithm: U-Net segments exudates in the retinal images, and KNN performs the final classification. This combination enables accurate feature extraction and efficient classification, effectively overcoming the computational challenges common to deep learning models. Experiments are carried out on a publicly available dataset of retinal fundus images from Kaggle to assess the effectiveness of the proposed strategy. Compared with GoogleNet, ResNet18, and VGG16, the proposed architecture provides more precise output, achieving a training accuracy of 82.96% and detecting PDR with high accuracy in a short time, which helps prevent loss of vision at an early stage.


Introduction
Diabetes mellitus, an abnormality in blood sugar levels, has become common among working-age people, and the number of cases has increased exponentially. By 2040, more than 600 million people are expected to be diabetic [1]. Diabetes causes many serious consequences, such as heart disease, stroke, retinopathy, and chronic kidney disease. One of the major complications is diabetic retinopathy, since it involves a very delicate vision problem. Approximately 103 million people were affected by diabetic retinopathy in 2020, and the number is expected to reach around 160 million by 2045 [2]. Diabetic retinopathy is a retinal neuro-vascular abnormality that occurs due to the degeneration of blood vessels in the eye. The retina is the layer that covers the back portion of the eye and is the most prominent part of vision; the fovea, located on the retina, is the point where reflected light is focused for normal vision. In diabetic retinopathy, the nerves and cells of the retina are affected by abnormalities such as exudates, venous beading, microaneurysms, intraretinal haemorrhages, and neo-vascularization. A patient affected by DR can be protected from further damage to vision, but DR cannot be treated and cured completely: it is a permanent, irreversible disease and is only preventable with treatment if detected early [3]. Early detection is the greatest challenge for ophthalmologists, since they manually check for the presence of external symptoms in fundus images of the eye.
A fundus image is a 2D representation of the 3-dimensional view of the eye, captured using a fundus camera (Figure 1), ophthalmoscopy, fundus photography (FP), Optical Coherence Tomography (OCT), or OCT angiography. The symptoms are very mild in the initial stage of DR, where the eye appears similar to a normal healthy eye, so clinicians may fail to identify the appropriate stage accurately. These problems have driven a drastic improvement in technology through the introduction of AI methods. Deep learning methods are used for the detection of diseases such as DR: AI models are developed to learn the features of healthy and affected eyes, and the features of trained and input images are compared to classify DR with better accuracy [1][2][3].
Depending upon severity, DR can be classified into 5 stages: Normal eye, Mild NPDR, Moderate NPDR, Severe NPDR, and Proliferative DR [1]. Deep learning achieves higher accuracy and efficiency and reduces the need for human intervention. Its most important advantage over other AI methods is its unique capacity to learn features automatically from a large number of datasets at different levels of representation. The convolutional neural network (CNN) is the most basic network for image classification among deep learning methods; it is mainly used for image recognition and tasks involving the processing of pixel data, and it can automatically learn features from images. The major steps of a CNN pipeline are image segmentation, image classification, and detection [1]. Various CNN models have been developed, such as VGG16, ResNet18, Inception V3, AlexNet, and NasNet Mobile.
This research focuses on five distinct categories: "Normal eye" (also known as "No DR"), "Mild NPDR" (Non-Proliferative Diabetic Retinopathy), "Moderate NPDR", "Severe NPDR", and "PDR" (Proliferative Diabetic Retinopathy). A high-performing method for the detection of diabetic retinopathy (DR) in retinal fundus images is introduced. This technique integrates the K-nearest neighbours (KNN) algorithm with the U-Net architecture, creating a strong framework for the identification of DR instances.
Generally, U-Net is an architecture in which a CNN model is incorporated to identify the best features [4]. Unlike other models, U-Net does not make use of dense layers or fully connected layers; it uses convolutional and max-pooling layers only, and it recognizes objects in an image after semantic segmentation. The proposed mechanism, which includes U-Net, is explained in Section III.

Related works
This section summarizes works related to the detection of diabetic retinopathy using machine learning methods. Artificial neural networks, the building blocks of deep learning, have used fundus images as input for identifying the presence of DR [5]. A high-level CAD algorithm has been used to detect exudates, haemorrhages, and microaneurysms [6]. Deep convolutional neural networks (CNNs) have been used for the automatic detection of retinal abnormalities from fundus images in publicly available datasets, including IDRID, ROC, and local hospital datasets [1,2,7-10]. An ensemble of CNNs has been used to provide accurate results [3,11,12]. A special type of CNN called U-Net, comprising both convolutional and de-convolutional layers, has been used for identifying DR [4]. Alternative architectures such as VGG16, ResNet, and the Inception family have been used to build models that account for the 5-class classification of DR [13,14]. K-nearest neighbour clustering has also provided good performance [15]. Early detection of DR has been performed by identifying RED (Retinal Exudates and Drusen) in fundus images [16-19]. In the SMFC (Superpixel Multi-Featured Classification) method, the input image is segmented into superpixels [20]. Other approaches involve segmentation of images during training [21-24]. Detection has been made according to the presence of three types of lesions: red, yellow, and white [25]. Lesions can also be classified using recurrent neural networks [26,27]. A Bayesian model has been used for training the dataset [11]. DR detection has also treated the choice of colour space as one of the important factors [28].
A median filter and adaptive histogram equalization have been used for noise reduction [29]. An entropy principle has been proposed for processing the fundus image [30]. Naive Bayes and SVM classifiers have been used for feature selection and pixel classification [31]. HOG and GLCM have been used for feature extraction with low complexity [32]. A combination of grayscale morphology, the active contour method, and region-wise classification has also played a major role [27,33]. Support vector machine classifiers have evolved for detecting hard exudates [34]. The HSV colour space has been used to overcome the unequal brightness and poor contrast of fundus images, with training done on a CNN architecture [35]. Multi-feature extraction with training on a patch-based CNN has provided additional merit [36]. The GLCM feature extraction method, together with SVM, K-NN, and decision tree classifiers, has been used to identify the affected eye [37].
The segmentation of the optic disc and blood vessels has been done in the preprocessing step using two different U-Nets; the preprocessed, enhanced image is then fed into CNN models that use transfer learning as their foundation [38]. Deep learning image processing technology is now crucial to computer-aided systems for detecting anomalies in diabetic retinopathy [39]. Segmentation and abnormality detection have been improved during preprocessing so that only relevant features are acquired during extraction [40]. One approach classifies diabetic retinopathy automatically in two main steps: in the first step, two different U-Net models identify the optic disc (OD) and blood vessels (BV) by breaking down the image into parts during initial processing [41]. The proposed method contrasts with GA, GWO, and numerous other current cutting-edge diabetic retinopathy classification approaches [42]. To address issues with unbalanced datasets, Gaussian space theory and generic data structures increase the dataset's accuracy and quantity [43].
Most of these works concentrate on the detection of exudates, the prominent symptom of diabetic retinopathy. Segmentation plays a vital role in forming patches of exudates, which makes the presence of DR easy to identify with good accuracy. Hence, in the proposed method, segmentation is implemented with a combination of U-Net and KNN to achieve better accuracy for the early detection of DR.

Dataset
The dataset of fundus images required for this paper is taken from the publicly available Kaggle Diabetic Retinopathy Detection dataset. Since additional white space may occur around the eye image, it is necessary to preprocess each image to a dimension of 256 × 256 to obtain better results. The dataset consists of more than 35,000 images under the five basic classifications of diabetic retinopathy: No DR, Mild NPDR, Moderate NPDR, Severe NPDR, and PDR. This classification follows the Early Treatment Diabetic Retinopathy Study (ETDRS). In the available datasets, the proportion of normal eye images is always greater than that of affected eyes. Both training and testing were done with the Kaggle dataset, split into a train dataset and a test dataset.
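This preprocessing step can be sketched in dependency-free Python (the study itself uses MATLAB): trim the white margin around the eye, then resize to a fixed dimension with nearest-neighbour sampling. The toy 4 × 4 image, the background value, and the helper names are hypothetical; a real pipeline would use an image library such as PIL or OpenCV.

```python
# Illustrative preprocessing sketch: trim the white border around the eye,
# then resize with nearest-neighbour sampling. The "image" is a nested list
# of grayscale values to keep the example dependency-free.

def crop_background(img, bg=255):
    """Drop border rows/columns whose pixels are all equal to the background value."""
    rows = [r for r, row in enumerate(img) if any(v != bg for v in row)]
    cols = [c for c in range(len(img[0])) if any(img[r][c] != bg for r in range(len(img)))]
    if not rows or not cols:
        return img
    return [row[cols[0]:cols[-1] + 1] for row in img[rows[0]:rows[-1] + 1]]

def resize_nearest(img, out_h=256, out_w=256):
    """Nearest-neighbour resize to out_h x out_w."""
    in_h, in_w = len(img), len(img[0])
    return [[img[y * in_h // out_h][x * in_w // out_w]
             for x in range(out_w)] for y in range(out_h)]

# Toy 4x4 "fundus image" with a white margin.
raw = [[255, 255, 255, 255],
       [255,   9,   5, 255],
       [255,   3,   7, 255],
       [255, 255, 255, 255]]
prepped = resize_nearest(crop_background(raw), out_h=4, out_w=4)
# prepped == [[9, 9, 5, 5], [9, 9, 5, 5], [3, 3, 7, 7], [3, 3, 7, 7]]
```

In practice the same two operations (crop, then resize to 256 × 256) would run over every image in the train and test sets before training.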
The performance of the developed model depends on the dataset used for training. The class proportions used for training with U-Net are given in Figure 2.

Proposed work
The flow of the process is depicted in the block diagram given in Figure 3.
The input image is pre-processed first, and the preprocessed image serves as the input for feature extraction. The extracted features are given to the U-Net architecture, whose encoder and decoder provide the segmentation of the image for better and more effective results, as shown in Figure 6. Classifier algorithms that can be used after segmentation include KNN, SVM, RF, DT, and NB; among these, KNN performs best and provides high accuracy. The output layer compares the features against the trained dataset, producing the appropriate detection of DR.
The Adam optimizer, the most popular optimizer for stochastic gradient descent, is used for training the dataset in this model. It replaces earlier optimization techniques and provides faster computation by estimating the first- and second-order moments of the gradients [44].
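The moment-estimation update can be sketched as follows in Python (the study uses MATLAB); the quadratic objective, learning rate, and step count are illustrative assumptions, while beta1, beta2, and eps follow commonly used Adam defaults:

```python
# Minimal sketch of Adam's first/second moment estimation, demonstrated on
# a hypothetical 1-D quadratic f(w) = (w - 3)^2 with gradient 2*(w - 3).
import math

def adam_minimize(grad, w, steps=2000, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
    m = v = 0.0                                # first and second moment estimates
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g        # update biased first moment
        v = beta2 * v + (1 - beta2) * g * g    # update biased second moment
        m_hat = m / (1 - beta1 ** t)           # bias-corrected moments
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

w_opt = adam_minimize(lambda w: 2.0 * (w - 3.0), w=0.0)
# w_opt lands close to the minimizer w = 3
```

Because the update divides the first moment by the square root of the second, the step size adapts per parameter, which is what gives Adam its fast, stable convergence in training.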
The model used for the segmentation process is the U-Net architecture. Image segmentation is the most important phase in digital image processing [12,36,37,44]. The U-Net architecture, a CNN model that has gained widespread recognition, is extensively deployed for image segmentation [4], while classification is done with the KNN algorithm. Unlike a typical CNN model, U-Net consists of convolutional layers, ReLU activation layers, and max-pooling layers only [44].
The U-Net is separated into 3 parts, as shown in Figure 4.
The encoder path follows a typical CNN structure incorporating multiple convolutional and pooling layers. These layers progressively reduce the spatial dimensions of the input image while capturing high-level features, so the encoder effectively captures contextual information, extracts hierarchical features, and downsamples the image.
The U-Net architecture incorporates skip connections establishing connections between corresponding layers of the encoder and decoder paths.These connections enable the decoder path to access and reuse high resolution features from the encoder path, thereby preserving fine-grained details during upsampling and enhancing segmentation accuracy.
The decoder path is responsible for upsampling the low-resolution feature maps obtained from the encoder path, thus restoring the original image's spatial resolution. It consists of upsampling layers followed by convolutional layers that concatenate feature maps from the corresponding encoder layer. This combination of local and global information facilitates precise segmentation outcomes.
The U-Net architecture is illustrated in Figure 5.
As the name indicates, U-Net is a U-shaped architecture with 2 portions, namely the contraction and expansion paths [4]. The contraction path comprises the encoder and the expansion path comprises the decoder, with the bottleneck at the centre and skip connections bridging the two paths. A sigmoid function confines each pixel value to the range 0 to 1 [6].
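The contraction/expansion behaviour described above can be traced with a shape-only Python sketch. The depth of 3, the base width of 64 channels, and the 256 × 256 input are illustrative assumptions, not the paper's exact configuration:

```python
# Shape-only sketch of the U-Net idea: the encoder halves spatial size
# while doubling channels, the decoder reverses this, and each decoder
# stage concatenates the matching encoder feature map (skip connection).

def unet_shapes(in_hw=256, base_ch=64, depth=3):
    h, ch = in_hw, base_ch
    skips, trace = [], []
    for _ in range(depth):                       # contraction (encoder) path
        trace.append(("down", ch, h))
        skips.append((ch, h))                    # saved for the skip connection
        h, ch = h // 2, ch * 2                   # max-pool halves H/W; channels double
    trace.append(("bottleneck", ch, h))
    for skip_ch, skip_h in reversed(skips):      # expansion (decoder) path
        h, ch = skip_h, ch // 2                  # up-convolution restores resolution
        trace.append(("up+concat", ch + skip_ch, h))  # channels after concatenation
        ch = skip_ch                             # following convs reduce channels again
    trace.append(("1x1 conv + sigmoid", 1, h))   # pixel-wise mask with values in [0, 1]
    return trace

layers = unet_shapes()   # each entry: (stage, channels, spatial size)
```

Printing `layers` shows the symmetric "U": resolution falls from 256 to 32 and climbs back to 256, while the skip connections double the channel count at each decoder stage before the convolutions reduce it.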
The U-Net architecture concludes with a final layer, typically a 1 × 1 convolutional layer followed by a sigmoid or softmax activation function. This final layer generates a pixel-wise segmentation mask, where each pixel represents the predicted class or label for the corresponding input image pixel [45]. Classification can be done with KNN, CNN, SVM, RF, DT, or NB algorithms; KNN is selected since it provides better accuracy than the other methods, as shown in Figure 8 [33].
The k-Nearest Neighbours (k-NN) algorithm is an efficient and simple supervised learning technique for classification and regression tasks. A new, unlabelled data point is compared to the "k" instances from the training dataset that are closest to it, and the new instance is labelled with the class that has the most support among its k nearest neighbours. For regression, the weighted average of the neighbours' target values is computed.
The choice of "k" is a crucial parameter, influencing the balance between noise reduction and decision boundary complexity.k-NN's simplicity allows it to handle multiclass classification and adapt to various data distributions.However, it has drawbacks: computational intensity, sensitivity to noise and outliers, and struggles with high-dimensional data.
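A minimal, dependency-free Python sketch of the majority-vote rule described above, using hypothetical 2-D feature vectors and class labels (in the proposed pipeline, the inputs would be features derived from the U-Net segmentation):

```python
# Sketch of k-NN majority-vote classification on hypothetical 2-D features.
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """train: list of (feature_vector, label) pairs. Returns the majority
    label among the k nearest neighbours by Euclidean distance."""
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical training points: low feature values ~ healthy, high ~ PDR.
train = [((0.1, 0.2), "No DR"), ((0.2, 0.1), "No DR"),
         ((0.9, 0.8), "PDR"), ((0.8, 0.9), "PDR"), ((0.85, 0.85), "PDR")]
pred = knn_predict(train, (0.9, 0.9), k=3)   # all 3 nearest neighbours are PDR
# pred == "PDR"
```

The choice of `k=3` here is arbitrary; as noted above, tuning "k" trades noise reduction against decision-boundary complexity.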

Results and discussion
Evaluation metrics are used to assess the performance and efficiency of deep learning models. These metrics furnish quantitative measures that facilitate a comprehensive understanding of a model's performance on a given task. Below is a compilation of commonly employed evaluation metrics in deep learning.
In general, a prediction may be a true positive, true negative, false positive, or false negative. True positives and false negatives are the most critical cases, since they concern the presence of DR. Accuracy, precision, and loss are calculated from these predictions. The confusion matrix for this case is given in Figure 7.

Accuracy
Accuracy is defined as the ratio of the number of correct predictions to the total number of predictions, covering true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN):

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Precision
Precision can be applied to address multiclass classification problems effectively. Precision serves as a valuable metric for evaluating the accuracy of positive predictions generated by a model. It involves computing the ratio of true positives to the total number of predicted positives. In the context of multiclass classification, precision can be determined for each class separately by designating it as the positive class and treating the remaining classes as negatives. This approach enables an independent assessment of the model's precision for each class.

Precision = TP / (TP + FP)

Recall

Recall, also known as sensitivity or true positive rate, measures the ability of a model to identify all relevant instances of a positive class from the total actual positive instances. In other words, recall quantifies the proportion of true positive predictions out of all actual positive instances. Mathematically, recall is calculated as the ratio of true positives (TP) to the sum of true positives and false negatives (FN):

Recall = TP / (TP + FN)

F1 score
The F1 score is a widely used evaluation metric in classification tasks that combines both precision and recall into a single value.It provides a balanced measure of a model's performance by considering both the ability to correctly identify positive instances (precision) and the ability to capture all actual positive instances (recall).
Mathematically, the F1 score is calculated as follows:

F1 = 2 × (Precision × Recall) / (Precision + Recall)

Specificity
Specificity is the proportion of actual negative instances that are correctly predicted as negative by the model:

Specificity = TN / (TN + FP)
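As a check on these definitions, the following Python sketch (the study itself uses MATLAB) computes all five metrics from binary confusion-matrix counts; the counts here are hypothetical, not results from the paper:

```python
# Compute the evaluation metrics above from raw confusion-matrix counts
# for a single (binary) class; the counts are illustrative only.

def metrics(tp, fp, fn, tn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # sensitivity / true positive rate
    f1 = 2 * precision * recall / (precision + recall)
    specificity = tn / (tn + fp)
    return accuracy, precision, recall, f1, specificity

acc, prec, rec, f1, spec = metrics(tp=40, fp=10, fn=10, tn=40)
# all five metrics equal 0.8 for these symmetric counts
```

For the 5-class DR problem, these quantities would be computed per class (one-vs-rest, as described in the Precision section) and then averaged.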
The actual processing time of the proposed model is approximately 12-13 hours for 10 epochs on a machine with 8 GB RAM and an Intel Core i5 12th-generation CPU. Simulation and experimentation in this study were conducted using MATLAB version 2023a, a versatile numerical computing platform widely used for scientific and engineering applications.
The experimental results of models such as GoogleNet, ResNet50, and VGG16 reach around 79% accuracy (Table 2). Compared with these results, the proposed U-Net is more accurate and faster, as shown in Figure 9. Using KNN as the classifier along with U-Net provides a validation accuracy of 80.78%, higher than the other classifiers, as represented in Table 1. In the proposed work, the detection and classification of diabetic retinopathy are done with U-Net semantic segmentation and the KNN algorithm. The approach provides high accuracy and helps prevent damage to the eye by predicting DR at an early stage.
The integration of U-Net and K-nearest neighbours (KNN) for diabetic retinopathy classification provides promising results but comes with certain limitations. The high-dimensional feature maps generated by U-Net may increase computational complexity when used with KNN. U-Net's local feature extraction design may not effectively capture the global context needed for KNN's classification approach, and the "curse of dimensionality" could degrade KNN's performance on the extensive feature space U-Net produces. Moreover, handling data imbalance, hyperparameter tuning, adaptability across datasets, and training data quality are crucial considerations for achieving optimal results with this integrated approach.

Conclusion
The U-Net-KNN model has demonstrated remarkable performance in diverse image segmentation challenges, establishing its position as a widely accepted approach in computer vision and medical imaging. The model provides a training accuracy of 82.96% and a validation accuracy of 80.78%. Its effectiveness extends to various image segmentation tasks, particularly in biomedical image analysis such as cell, organ, and lesion segmentation, thanks to its ability to capture both local and global information.
Researchers and practitioners often customize and adapt the U-Net architecture to meet specific segmentation requirements.This adaptation may include variations in network depth or width, utilization of different convolutional techniques, and the incorporation of additional regularization methods.

Figure 1 .
Figure 1. Fundus imaging technologies take high-resolution pictures of the back of the eye.

Figure 2 .
Figure 2. Distribution of the dataset among different classes.

Figure 3 .
Figure 3. The workflow diagram of the diabetic retinopathy detection.

Figure 4 .
Figure 4. The functional overview diagram of the U-Net architecture.

Figure 5 .
Figure 5.The architecture of the U-Net model.

Figure 6 .
Figure 6. Examples of results of the U-Net model for exudates segmentation. Left: original image. Right: segmented image (exudates marked in white).

Figure 7 .
Figure 7. Confusion matrix.

Figure 9 .
Figure 9. Comparative analysis of different deep learning architectures.

Figure 8 .
Figure 8. Training and validation accuracy curves for U-Net with different classifiers.

Table 1 .
Results obtained from the various classifiers with the U-Net architecture.

Table 2 .
Results obtained from the deep learning architectures.