Computer aided detection of leaf disease in agriculture using convolution neural network based squeeze and excitation network

ABSTRACT With the support that artificial intelligence provides for plant disease diagnosis and the rapid progress of agricultural technology, pertinent research is needed for long-term agricultural development. Diseases such as early and late blight have a significant influence on the quality and quantity of potatoes. Manual interpretation is a time-consuming way of sorting out leaf diseases. To classify diseases such as fungal, viral and bacterial infections in the potato leaf, an enhanced Convolutional Neural Network based on VGG16 is used. An Improved Median Filter is also used, which removes noise to a great extent. The convolution layers of VGG16, along with the Inception and SE blocks, are used in this research for classification. The global average pooling layer is used to reduce the model's training parameters and layers, and the Squeeze and Excitation Network attention mechanism is used to improve the model's ability to extract features. The approximate calculations can be done by using soft computing. Compared with other traditional convolutional neural networks, the proposed model achieved the highest classification accuracy of 99.3%.


Introduction
Potato is a well-known root vegetable and a staple food in many countries around the world [1,2]. Deep Learning (DL) has gained popularity in recent years since it has been used to deal with a variety of problems that can be solved by using traditional Machine Learning techniques [3]. Disease can spoil the quality and quantity of agricultural goods; when it occurs during the growing season, it results in harvest failure and premature harvesting [4].
Figure 1 explains the general procedure of extracting potato leaf images. The process involves taking images of the potato leaves, transferring them through Wi-Fi to a cloud server, extracting the data and processing the data. The Kaggle dataset proposed by Divyansh Tiwari [4] was utilized for the study. More than 55,000 leaf images (both healthy and unhealthy) of different varieties of plants are included in the collection [4,5]. A Machine Learning-based automatic system has been used to identify and classify potato leaf diseases: around 450 images comprising both healthy and diseased leaves, obtained from a publicly available database, were given as input to seven effective image segmentation algorithms [6,7].
VGG16 has been reported to classify the input potato leaves with an accuracy of 97.89%, and Mask R-CNN has been reported to identify diseases in potato leaves with an accuracy of 98% [8][9][10][11][12][13][14][15][16]. The New Plant Diseases Dataset available on Kaggle has been used for experiments with a DL-based method to identify and classify plant leaf diseases. Two datasets are constructed from this dataset for the Machine Learning classifier experiment. CNNs are a part of deep learning and are extremely successful in image processing. There are many tools available for automatically detecting plant leaf diseases. The proposed Squeeze and Excitation Network based Convolutional Neural Network (SENet-CNN) might constitute the foundation for establishing professional support. Solutions of this kind could promote excellent, sustainable agricultural approaches and increase the security of food production.
Several challenges remain in plant leaf disease classification. Limited by experimental conditions, such as the current platform and hardware, a large CNN takes a long time to train and converges slowly. A long training convergence time will affect the final classification accuracy.
This paper is organized as follows: the Introduction and Related works are presented in Sections 1 and 2, Section 3 describes the proposed methods, Section 4 presents the experimental results and Section 5 gives the conclusion.

Related works
Gowri Shankar et al. [17] used the Gray Level Co-occurrence Matrix (GLCM) to evaluate the image pixel-pair relation, which is the most needed part of the filtering and enhancing process and gives a better accuracy range. Abdalla Mohamed Hambal et al. [18] reviewed several linear as well as nonlinear filtering methodologies in the field of image noise reduction and compared the results of the different filtering techniques. Sandeep Kumar et al. [19] and Geetharamani et al. [20] propose a unique exponential spider monkey optimization approach for selecting significant features from a high-dimensional set of features.
ArunPriya et al. [21] and Pooja et al. [22] presented Support Vector Machine (SVM) classification for effective leaf recognition, in which 12 leaf features are extracted, orthogonalized into 5 principal variables and fed into the SVM as an input vector. Maryam Ouhami et al. [23] review machine learning methods that use numerous data sources and are applied to plant disease detection.
Sharma et al. [24] and Yegneshwar Yadhav et al. Anusha Rao et al. [28] investigate image enhancement as well as image conversion schemes. Subsequently, the extracted features were utilized to train a classifier using Neuro-Fuzzy Logic. Drako et al. [30] investigated and compared a Deep Neural Network (DNN) with the traditional RF algorithm for malware classification. N. Nandhini et al. [31] analyse the efficiency of classification performed using SVM, K-Nearest Neighbor (KNN) and Decision Trees based on the extracted characteristics. Mohamed Loey et al. [32] and Gobalakrishnan et al. [33] conduct a survey that introduces the use of DL in plant disease detection and analyses it in terms of the datasets utilized, the models used and the overall performance accomplished.
Punitha Kartikeyan et al. [34] proposed a smart and efficient technique for crop disease detection that uses computer vision and a machine learning technique called RF and is able to detect 20 different diseases of 5 common plants with 93% accuracy. ANN was used by Vyawahare Vishweshm [35] and Manya Afonso et al. [36] to detect the disease; the ANN model must undergo a training process and reached an accuracy of 65.68%. The use of DL techniques for sensing blackleg-diseased potato plants has also been discussed. Dor Oppenheim et al. [29] and Hossain et al. [37] proposed a technique for plant leaf disease detection and classification using a KNN classifier. Garima Shrestha et al. [38] gave a different perspective on plant disease detection using diverse CNN-based algorithms, with an accuracy of about 88.80%.
Jothiaruna et al. [39] and Chaojun Hou et al. [40] employed Advanced Comprehensive Color Features (ACCF) and the Region Growing method for the segmentation of disease spots.
Afifi et al. [41] proposed a tool employing two fundamental models, a Triplet network and a Deep Adversarial Metric Learning (DAML) strategy, which were built using three CNN structures (ResNet18, ResNet34 and ResNet50). Mustafa et al. [42] develop a five-layered CNN framework for automatically identifying plant disease using leaf images. In a real cultivation context, the proposed CNN model may be employed as a preliminary warning tool or disease diagnosis system. Table 2 shows the comparison of CNN applications in agriculture.

Proposed work
The Squeeze and Excitation Network based Convolutional Neural Network is used to classify potato leaf diseases in the proposed work. The images from the datasets are transferred with the help of a cloud server (IoT). Data collection is done in the data acquisition step (Figure 2).
In the next stage, preprocessing is done to resize the images and reduce the noise. The preprocessed image is then passed to feature extraction to extract the key features. Finally, classification is done by using SENet-CNN.

i. Dataset origin and characteristics
Image acquisition is typically done with digital cameras, with photos taken on-site or under controlled conditions. The goal of this stage is to gather and prepare an appropriate image dataset for the learning process. The image data obtained is used as input for subsequent processing and should be in one of the following formats: bmp, jpg, png or gif.

ii. Data annotation
This annotation process aims to label the class and location of the infected areas in the image. The outputs of this step are the coordinates of bounding boxes of different sizes with their corresponding class of disease and pest, which are subsequently evaluated against the network's predicted results during testing using the Intersection over Union (IoU). The red box shows the infected areas of the plant and parts of the background (Table 3).
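The IoU evaluation mentioned above can be illustrated with a minimal sketch. The box layout `(x_min, y_min, x_max, y_max)` and the function name `iou` are assumptions made for this example; the paper does not state how its boxes are stored.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes.

    Each box is (x_min, y_min, x_max, y_max); this layout is an
    assumption of the sketch, not taken from the paper.
    """
    # Corners of the overlapping rectangle (if any).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Width and height are clamped to zero when the boxes do not overlap.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

A predicted box that coincides with the annotated box gives an IoU of 1, and disjoint boxes give 0.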

iii. Data augmentation
Data augmentation is a method of modifying data while keeping the data's essence; it is applied because 5100 images are still insufficient for optimal performance. The augmentation parameters used in this work are based on simple geometric transformations such as translations, rotations, scale changes, shearing, and vertical and horizontal flips.
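A few of these geometric transformations can be sketched with NumPy alone. This is an illustration under stated assumptions, not the augmentation pipeline used in the paper: the function name `augment`, the 50% flip probability, the restriction of rotations to multiples of 90 degrees and the wrap-around translation of up to 10 pixels are all hypothetical choices, and scale changes and shearing are omitted.

```python
import numpy as np

def augment(image, rng):
    """Return a randomly flipped, rotated and translated copy of a square image.

    image: array of shape (H, W, C) with H == W.
    rng:   a numpy.random.Generator.
    """
    out = image
    if rng.random() < 0.5:
        out = out[:, ::-1]            # horizontal flip
    if rng.random() < 0.5:
        out = out[::-1, :]            # vertical flip
    # Rotation by a random multiple of 90 degrees (assumption: the paper
    # does not state the rotation angles it uses).
    out = np.rot90(out, k=int(rng.integers(0, 4)))
    # Translation by up to 10 pixels; np.roll wraps pixels around the
    # border, which is a simplification of a true translation.
    dy, dx = rng.integers(-10, 11, size=2)
    out = np.roll(out, shift=(int(dy), int(dx)), axis=(0, 1))
    return np.ascontiguousarray(out)
```

Because every operation here only rearranges pixels, the label of the image is unchanged and the augmented copy can be added to the training set under the same class.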

iv. Image pre-processing
This is a crucial stage in the image classification process.
Preprocessing is done in two steps: (a) image resizing and (b) image noise reduction.

Image resizing
The processing time for the detection and classification steps rises if the images are not resized. If an image has too much noise, it is not ready for processing. To standardize the input images in the dataset, the images are shrunk to 224 × 224 pixels.

Image noise reduction
Every electronic device receives and transmits noise. Images are corrupted by impulse noise when they are transferred over noisy channels. The Improved Median Filtering process was utilized in this paper to achieve improved outcomes.

Improved median filter
Due to noise, image quality and feature extraction become unreliable. In this study, a nonlinear filter is utilized to de-noise the data. Compared with the other filters, the Improved Median Filter is considered the better one for the noise removal process.

Algorithm 1: improved median filter
Algorithm for removing salt and pepper noise.
Step 1: A two-dimensional window is chosen and centred on the processed pixel $P(x, y)$ of the corrupted image.
Step 2: Arrange the pixels in the selected window in ascending order to obtain the sorted vector $V_0$. The median pixel value is denoted $P_{med}$, and the maximum and minimum pixel values are denoted $P_{max}$ and $P_{min}$. The first and last elements of $V_0$ are therefore $P_{min}$ and $P_{max}$, and $P_{med}$ is the middle element.
Step 3: Check the pixel range to find the uncorrupted pixels. A pixel that satisfies $P_{min} < P(x, y) < P_{max}$, $P_{min} > 0$ and $P_{max} < 255$ is taken as uncorrupted and is left unchanged; otherwise it is taken as corrupted.
Step 4: For a corrupted pixel there are two cases to follow:
Case 1: If the median satisfies $P_{min} < P_{med} < P_{max}$ and $0 < P_{med} < 255$, the corrupted pixel is replaced by $P_{med}$.
Case 2: If Case 1 is not fulfilled, $P_{med}$ is considered a noisy pixel. The difference between each pair of adjacent pixels across the vector $V_0$ is then evaluated to obtain the difference vector $V_D$. The maximum difference in $V_D$ is found, and its corresponding pixel in $V_0$ is assigned to the processed pixel.
Step 5: Repeat Steps 1 to 4 for the entire image until the end of the process.
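The steps of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration for 8-bit grayscale images, not the implementation used in the experiments. Several details are assumptions: the 3 × 3 window, the edge padding at the image border, the test of Case 1 on the median value, and in particular the tie-breaking rule of Case 2 (of the two sorted neighbours that form the largest gap, the one that is not 0 or 255 is chosen), since the text does not say which of the two pixels is taken.

```python
import numpy as np

def improved_median_filter(img, k=3):
    """Sketch of Algorithm 1 for a 2-D uint8 image and a k x k window."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")   # assumption: edge padding
    out = img.copy()
    height, width = img.shape
    for y in range(height):
        for x in range(width):
            # Steps 1-2: window centred on (y, x), sorted into V0.
            v0 = np.sort(padded[y:y + k, x:x + k].ravel()).astype(np.int64)
            p_min, p_med, p_max = int(v0[0]), int(v0[v0.size // 2]), int(v0[-1])
            p = int(img[y, x])
            # Step 3: uncorrupted pixels are left unchanged.
            if p_min < p < p_max and p_min > 0 and p_max < 255:
                continue
            # Step 4, Case 1: the median is usable.
            if p_min < p_med < p_max and 0 < p_med < 255:
                out[y, x] = p_med
                continue
            # Step 4, Case 2: largest gap between adjacent sorted values.
            i = int(np.argmax(np.diff(v0)))
            lo, hi = int(v0[i]), int(v0[i + 1])
            # Assumed rule: prefer the neighbour that is not an extreme value.
            out[y, x] = lo if lo not in (0, 255) or hi in (0, 255) else hi
    return out
```

On a flat gray patch with isolated salt (255) and pepper (0) pixels, this sketch restores the original gray value at the corrupted positions.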
Impulse noise of a selected density, ranging from 0.1 to 0.8, is purposely added to the input image, specifically to the pixels $s(i, j)$ for $1 \le i \le M_1$ and $1 \le j \le M_2$, in order to corrupt it. To analyse the relative filtering capacity of several filters, the peak signal-to-noise ratio (PSNR) is utilized.
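The corruption and scoring procedure described above can be sketched as follows, using the standard definitions of MSE (mean of squared pixel differences) and PSNR ($10 \log_{10}(MAX_I^2 / MSE)$). The function names and the equal split between salt and pepper pixels are assumptions of this example.

```python
import numpy as np

def add_salt_and_pepper(img, density, rng):
    """Corrupt a uint8 image: a fraction `density` of pixels becomes 0 or 255."""
    noisy = img.copy()
    corrupt = rng.random(img.shape) < density    # which pixels to corrupt
    salt = rng.random(img.shape) < 0.5           # assumed 50/50 salt vs pepper
    noisy[corrupt & salt] = 255
    noisy[corrupt & ~salt] = 0
    return noisy

def mse(a, b):
    """Mean square error between two images of identical shape."""
    diff = a.astype(np.float64) - b.astype(np.float64)
    return float(np.mean(diff ** 2))

def psnr(a, b, max_i=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(a, b)
    if err == 0:
        return float("inf")
    return float(10.0 * np.log10(max_i ** 2 / err))
```

A filter is then scored by comparing `psnr(original, filtered)` at each noise density; a higher value indicates that more of the original image was recovered.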
The PSNR is defined as
$$PSNR = 10 \log_{10}\left(\frac{MAX_I^2}{MSE}\right)$$
where $MAX_I$ is the maximum pixel value. Images of dimensions $M_1 \times M_2$ pixels are used for the simulations, and the Mean Square Error (MSE) is defined as
$$MSE = \frac{1}{M_1 M_2}\sum_{i=1}^{M_1}\sum_{j=1}^{M_2}\left(s(i, j) - \hat{s}(i, j)\right)^2$$
where $\hat{s}(i, j)$ denotes the corresponding pixel of the filtered image. The PSNR is thus directly linked to the MSE. The median filter is simple to use and can be used to remove a variety of noise types (Figure 3).
The VGG16 pre-training model, which specifies which layers of the initial network must be frozen during the pre-training phase and which layers are allowed to resume learning at a specified learning rate, forms the basis for the five convolutional layers.
To train the model on the supplied dataset, the stochastic gradient descent optimization technique was employed in this research. The momentum and weight decay were set to 0.9 and 0.0005, respectively, and the initial learning rate was set to 0.001.
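One update step of stochastic gradient descent with these settings can be sketched as below. The formulation shown (weight decay added to the gradient, velocity scaled by the learning rate) is one common variant and is an assumption; deep learning toolboxes differ in the exact form of the momentum update, and the function name is hypothetical.

```python
import numpy as np

def sgd_momentum_step(w, grad, velocity,
                      lr=0.001, momentum=0.9, weight_decay=0.0005):
    """Return updated (weights, velocity) after one SGD step.

    Defaults are the values quoted in the text; the update rule itself
    is an assumed, commonly used formulation.
    """
    g = grad + weight_decay * w                 # L2 weight decay term
    velocity = momentum * velocity - lr * g     # accumulate momentum
    return w + velocity, velocity
```

Starting from a zero velocity, the first step reduces to plain gradient descent with weight decay; later steps carry 90% of the previous update forward.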

Input layer
The input images are presented in this layer. The potato leaf image is taken as the input for the process.

Convolution layer
The convolution layer consists of five convolution blocks. It deals with both spatial and channel-wise relationships. The reduction process done in convolution is given in Table 4. The Rectified Linear Unit (ReLU) $f(a)$ is used as the activation function (Figure 6) and can effectively solve nonlinear problems with the help of neural networks:
$$f(a) = \max(0, a)$$
Here $a$ is the value of the neural network node:
$$a_i = \sum_j W_{i,j}\, x_j + b_i \quad (5)$$
In Equation (5), $W_{i,j}$ is the weight, $b_i$ is the $i$-th bias value and $x_j$ is the $j$-th pixel value.
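As a small worked illustration of the node value and the ReLU activation, the sketch below evaluates a layer with two nodes and two inputs. The function names and the numbers are made up for the example.

```python
import numpy as np

def relu(a):
    """Rectified Linear Unit: keeps positive values, zeroes the rest."""
    return np.maximum(0.0, a)

def layer_forward(weights, x, bias):
    """Node values a_i = sum_j W_ij * x_j + b_i, followed by ReLU."""
    return relu(weights @ x + bias)

# Hypothetical weights, inputs and biases.
W = np.array([[1.0, -1.0],
              [-2.0, 0.5]])
x = np.array([2.0, 1.0])
b = np.array([0.0, 1.0])
activations = layer_forward(W, x, b)   # node values are [1.0, -2.5] before ReLU
```

The second node has a negative value before activation, so ReLU sets it to zero while the first node passes through unchanged.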

Inception layer
The Inception layer (Figure 7) takes multi-scale information and extracts features for the purpose of creating the feature map. Max pooling is used to retain the most important features needed for mapping.

Global average pooling (GAP)
By averaging all the pixels of each feature map globally, global average pooling (GAP) provides one output for each feature map. Figure 8 shows a comparison between the fully connected layer and the global average pooling layer. The global average pooling layer is used to reduce the model's training parameters and layers.
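The parameter saving can be made concrete with a short sketch. The 7 × 7 × 512 feature-map size is the output of the VGG16 convolutional stack for a 224 × 224 input, and five output classes are used as in this work; the helper names are hypothetical, and the single dense classification layer is a simplification chosen for the comparison.

```python
import numpy as np

def global_average_pool(fmap):
    """Average each channel of an (H, W, C) feature map to one value."""
    return fmap.mean(axis=(0, 1))

def dense_params_after_flatten(h, w, c, n_classes):
    """Weights + biases of one dense layer fed by the flattened map."""
    return h * w * c * n_classes + n_classes

def dense_params_after_gap(c, n_classes):
    """Weights + biases of one dense layer fed by the GAP output."""
    return c * n_classes + n_classes

flat_count = dense_params_after_flatten(7, 7, 512, 5)   # 125445
gap_count = dense_params_after_gap(512, 5)              # 2565
```

With these assumed sizes, the classification layer after GAP has roughly 49 times fewer parameters than the same layer placed after flattening.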

Squeeze-and-excitation module
The SE module introduces an attention mechanism on the channels. Squeeze and excitation are its two important phases.
As a result, the network can efficiently grasp crucial features, enhance its capacity to extract features and improve the model's sensitivity to channel features. In Figure 5, SENet is positioned after the VGG16 convolutional layer in order to boost the network's focus on useful features and further weight the output characteristics of the entire convolutional layer. The network of the proposed SE module, which is created by stacking SE blocks, is shown schematically in Figure 9. The first step is the squeeze function, where $X$ is the input. The input data should be in $H \times W \times C$ format, where $H$ is the height, $W$ the width and $C$ the channel size.
If each channel has shape $H \times W$ and the $c$-th channel is denoted $x_c$, then the channel descriptor is
$$z_c = F_{sq}(x_c) = \frac{1}{H \times W}\sum_{i=1}^{H}\sum_{j=1}^{W} x_c(i, j)$$
The squeeze output has shape $1 \times 1 \times C$, as shown in Figure 7. After the squeeze comes the excitation phase, in which the channel descriptors from the squeeze phase are fed to fully connected layer 1, as shown in Figure 7. The excitation output is given by
$$s = F_{ex}(z, W) = \sigma\left(W_2\, \delta(W_1 z)\right)$$
where $\sigma$ is the sigmoid function, $\delta$ symbolizes the ReLU function, $W_1$ and $W_2$ are the weights of the two fully connected layers, and $z$ is the set of descriptors for the overall channel map. After the excitation phase, the channels are scaled with the corresponding modulation weights as follows:
$$\tilde{x}_c = F_{scale}(x_c, s_c) = s_c \cdot x_c$$
where $F_{scale}(x_c, s_c)$ refers to the channel-wise product between the scalar $s_c$ and the feature map $x_c \in \mathbb{R}^{H \times W}$, and $X = [x_1, x_2, \ldots, x_C]$ (Figure 10).
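The squeeze, excitation and scaling phases can be sketched in a few lines of NumPy. This is a minimal forward pass under stated assumptions rather than the trained module: the function name `se_block` is hypothetical, the two fully connected layers are written without bias terms, and the bottleneck size is set only by the shapes of `w1` and `w2`.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def se_block(x, w1, w2):
    """Squeeze-and-excitation forward pass for one (H, W, C) feature map.

    w1: (C_reduced, C) weights of fully connected layer 1.
    w2: (C, C_reduced) weights of fully connected layer 2.
    Returns the rescaled feature map and the channel weights s.
    """
    z = x.mean(axis=(0, 1))               # squeeze: one descriptor per channel
    hidden = np.maximum(0.0, w1 @ z)      # fully connected layer 1 + ReLU
    s = sigmoid(w2 @ hidden)              # fully connected layer 2 + sigmoid
    return x * s, s                       # scale: broadcast s over H and W
```

Each channel of the output is the corresponding input channel multiplied by a single weight between 0 and 1, so informative channels can be emphasised and less useful ones suppressed.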

Algorithm 2: proposed SENet-CNN based classification
Step 1: Input data are taken from the Kaggle dataset. Images of infectious diseases in potato leaves are considered.
Step 2: Preprocessing is done for the input images.
Case 1: To standardize the input images in the dataset, images obtained from many sources, possibly of different sizes, must be resized to 224 × 224 pixels.
Case 2: De-noising is done by using the Improved Median Filter as per Algorithm 1. The PSNR and MSE values are then calculated.
Step 3: Classification using SENet-CNN. The integrated SE module re-calibrates the original features along the channel dimension, and the fully connected layer is replaced with the greatest pooling layer.
Squeeze operation: applied as described in the squeeze-and-excitation module section above.
Step 5: Finally, the classification output is obtained.

Experimental results
The experiment is conducted on a 2.3 GHz Core i5 processor with 8 GB of RAM using MATLAB 2021a. The steps below demonstrate the outcomes obtained from the potato leaf image database.

(i) Database
The experiment is done on potato leaf images from the Kaggle dataset. The data were divided into three categories: training, validation and testing. To avoid overfitting, the validation data were used to tune the network parameters and hyperparameters, and a train-validation-test split of 70-20-10 per cent was used.
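A 70-20-10 split of this kind can be sketched as a random partition of image indices. The function name, the use of a seeded NumPy generator and the absence of per-class stratification are assumptions of this example.

```python
import numpy as np

def split_indices(n_images, rng, train=0.7, val=0.2):
    """Shuffle indices 0..n_images-1 and cut them into train/val/test parts."""
    order = rng.permutation(n_images)
    n_train = int(round(train * n_images))
    n_val = int(round(val * n_images))
    train_idx = order[:n_train]
    val_idx = order[n_train:n_train + n_val]
    test_idx = order[n_train + n_val:]     # the remainder is the test set
    return train_idx, val_idx, test_idx
```

For a dataset of about 5000 images this gives 3500 training, 1000 validation and 500 test images, with no image appearing in more than one part.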

Training phase
The learning experiment on the dataset was carried out using a Convolutional Neural Network with the VGGNet family architecture, specifically VGG16, which has 16 layers. The training epochs were run with a batch size of 32 and a learning rate of 0.01 to improve the performance of the model. The simulation parameters are shown in Table 5. Images with varying resolutions and sizes were acquired from a variety of sources, including a potato plantation in Malang, Indonesia; PlantVillage, an open-access image database; and Google Images. The resulting dataset of approximately 5000 images was classified into five classes: Alternaria solani, healthy, insect-affected leaves, virus-affected leaves and Phytophthora infestans, as in Figures 11 and 12.
Figure 13 shows the leaf disease spot detection results. Table 6 shows that the improved median filter performs better across the noise-density range than the other filters; the plot for this comparison is given in Figure 14.
(iii) Visualization of Feature Extraction and feature map in SENet-CNN
The mean, standard deviation and skew of the pixel values are computed. The mean and standard deviation are
$$\mu_j = \frac{1}{N}\sum_{i=1}^{N} x_{ji} \quad (7)$$
$$\sigma_j = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_{ji} - \mu_j\right)^2}$$
where $\mu_j$ denotes the mean value, $\sigma_j$ is the standard deviation, $N$ is the total number of pixels and $x_{ji}$ are the pixel values ($j = 1, 2, \ldots, n$).
Texture: The texture of an image is one of the most common features used as a regional descriptor in the image retrieval process. The texture feature values computed are homogeneity, contrast and correlation. Here, $P_{ij}$ is the $(i, j)$-th entry in a gray-tone spatial dependence matrix and $N$ is the number of distinct gray levels in the quantized image.
Shape: Another fundamental feature that aids comprehension of the image comparison is the shape. The shape features considered in this work are given in Table 7 and include the area, the solidity and the centroid, with
$$\text{Solidity} = A / H \quad (15)$$
Here, $\lambda_1$ and $\lambda_2$ are the eigenvalues, $H$ is the hull area and $A$ represents the area.
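The per-channel colour statistics can be sketched as below. The mean and standard deviation follow their usual definitions; the skew is computed here as the cube root of the third central moment, which is the form commonly used for colour moments and is an assumption, since the formula used in this work is not given in the text. The function name is hypothetical.

```python
import numpy as np

def colour_moments(img):
    """Mean, standard deviation and skew of each channel of an (H, W, C) image."""
    # Rows are pixels (i = 1..N), columns are channels (j = 1..n).
    pixels = img.reshape(-1, img.shape[-1]).astype(np.float64)
    mean = pixels.mean(axis=0)
    centred = pixels - mean
    std = np.sqrt((centred ** 2).mean(axis=0))
    # Assumed skew definition: cube root of the third central moment.
    skew = np.cbrt((centred ** 3).mean(axis=0))
    return mean, std, skew
```

The three vectors can be concatenated into a single colour feature vector per image.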
(i) Loss function and confusion matrix: The confusion matrix is used to determine how accurate a classification algorithm is for each classification category. Table 8 presents the confusion matrix output, which reveals a high level of model correctness for most classes. The confusion matrix in Table 8 thus shows an accuracy of 99.3%.

(ii) Performance measures of classification results:
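The parameters considered in this process are Precision, Recall, F-Score and Accuracy. In the following, TP, TN, FP and FN denote the numbers of true positives, true negatives, false positives and false negatives for a class.
Classification accuracy: The number of correct predictions divided by the total number of predictions yields the classification accuracy.
$$\text{accuracy(class)} = \frac{TP(\text{class}) + TN(\text{class})}{\text{total samples}} \quad (18)$$
Precision: The fraction of the samples assigned to a class that actually belong to it, defined as:
$$\text{precision(class)} = \frac{TP(\text{class})}{TP(\text{class}) + FP(\text{class})} \quad (19)$$
Recall: Another important statistic is recall, the fraction of the input samples of a class that are correctly assigned to it, defined as:
$$\text{recall(class)} = \frac{TP(\text{class})}{TP(\text{class}) + FN(\text{class})} \quad (20)$$
F-Score is a well-known metric that combines precision and recall, defined as:
$$\text{F-Score} = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}}$$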
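These measures can be computed per class from a confusion matrix, as in the sketch below. It assumes that rows hold the true classes and columns the predicted classes, and that every class is both present and predicted at least once so that no denominator is zero; the function name and the small example matrix are illustrative and are not results from this work.

```python
import numpy as np

def per_class_metrics(confusion):
    """Accuracy, precision, recall and F-Score for each class.

    confusion[i, j] = number of samples of true class i predicted as class j
    (assumed layout).
    """
    cm = np.asarray(confusion, dtype=np.float64)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp          # predicted as the class, but wrong
    fn = cm.sum(axis=1) - tp          # belong to the class, but missed
    tn = cm.sum() - tp - fp - fn
    accuracy = (tp + tn) / cm.sum()
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f_score

# Illustrative two-class matrix (made-up numbers).
example = [[8, 2],
           [1, 9]]
acc, prec, rec, f1 = per_class_metrics(example)
```

For the first class of this example, 8 of the 9 samples predicted as that class are correct and 8 of its 10 true samples are found, which gives a precision of 8/9 and a recall of 0.8.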

Plant village dataset
Four types of classifiers are compared here with the proposed work, and the results obtained are presented in Tables 9-11. Mean precision, recall, F-Score and total test accuracy are the performance parameters taken into account. Table 9 shows that the proposed methodology performs better on all parameters than all the other classifiers. The plot for Table 9 is shown in Figure 16.
From the results of Table 9, it is observed that the most successful learning strategy for the detection of plant diseases among all CNN architectures is the proposed SENet-CNN, with an accuracy of 99.3%. Furthermore, the precision and F-score of the proposed methodology are 98.5% and 97.6%, respectively.

Dataset 38
Table 10 shows that the proposed methodology performs better on all parameters than all the other classifiers. The plot for Table 10 is shown in Figure 17. The accuracy is high for SENet-CNN, whereas it is significantly lower for all other classifiers. Dataset 38 consists of 10,000 healthy and unhealthy leaf images divided into 38 categories by species and disease. Table 10 shows the result of applying different classifiers on Dataset 38 (see also Figures 19 and 20). To ensure that the results obtained and shared are useful to the scientific community, it is important to have a reproducible research perspective. The results obtained by the proposed SENet-CNN with the plant village dataset achieve a high degree of reliability when the study is replicated.

Training parameters and time
The number of parameters for every model is indicated in Table 12. As the table demonstrates, the ResNet framework has the most parameters. The model presented in this research has the fewest training parameters; however, because of the relatively great depth of this network, the training time for each round is slightly longer than that of AlexNet and GoogleNet. Compared with AlexNet, GoogleNet and ResNet101, the convolutional neural network described in this research has the benefit of requiring fewer training parameters. Table 13 provides the comparison of accuracy with the existing methods.

Conclusion and future work
This paper proposes an improved, accurate and automated system for recognizing diseased leaves. The proposed system uses SENet-CNN for the classification. In future research, the existing algorithms can also be utilized in outdoor conditions, with leaf fronts and leaf backs combined into a common dataset. The current proposed model consists of a combination of existing models, so its limitations are clear; to solve this problem, new techniques or designs of other architectures remain our priority in future work.

Figure 1. General architecture of extracting data through IoT.
[25] analyse a potential solution by training CNN frameworks with segmented image data and demonstrate that the established activation function improves CNN model accuracy. Prof. A. R et al. [26] reviewed the early detection of plant disease with the classification abilities of CNN and Inception v3. Debasish Das et al. [27] proposed to classify distinct types of leaf diseases using Support Vector Machine (SVM), Random Forest (RF) as well as Logistic Regression.

Figure 2. Proposed squeeze and excitation network-based convolutional neural network.

v. SENet-CNN based classification
The convolution layers of VGG16 along with the Inception and the SE block are used in this paper for classification. The first five convolution blocks, which are included in the VGG16 convolution layers, are predicated on the self-learning of low-to-high features of training images, while deeper convolutional layers extract the most high-level, abstract features while reducing the resolution of the feature maps. The proposed block diagram is shown in Figure 4. It consists of five convolution blocks under the convolution layer. Each block reduces the image size, and the final reduced image is passed on for classification. Filter size visualization in the proposed framework is shown in Figure 5. The VGG16 pre-training model,

Figure 3. Flowchart of the proposed framework.

Figure 5. Filter size visualization in the proposed framework.

Figure 8. Fully connected layer vs global average pooled layer.

Figure 10. The combined model of the SE module and the inception.

Figure 13. Leaf disease spots detection results: (a) multiple black rot spots in one leaf, (b) multiple black measles spots in one leaf, (c) multiple leaf blight spots in one leaf and (d) diversified diseased spots in one leaf.

Figure 15. Activation maps of potato disease image.

Figure 16. Performance evaluation of plant village dataset.

Table 1. Comparison of related works.

Table 2. Comparison of CNN applications in agriculture.

Class; Test images (before data augmentation); Test images (after data augmentation)

Table 6. PSNR at different noise density.

Table 7. Values of feature extracted from image.

Table 9. Performance analysis of various methods along with the proposed work for plant village dataset.

Table 10. Performance analysis of various methods along with the proposed work for Dataset 38.

Table 11. Performance analysis of various methods along with the proposed work for Dataset 15.
Table 11 shows that the proposed methodology performs better on all parameters than all the other classifiers. The plot for Table 11 is shown in Figure 18. Dataset 15 consists of 6000 images, categorized into 15 groups based on species and disease. The accuracy of all other classifiers is lower than that of the proposed SENet-CNN. Moreover, Figures 12-14 describe the values of the above-mentioned measures graphically. The accuracy comparison of existing methods with the proposed framework is shown in Table 13.

The precision on the test dataset is shown in Figures 19 and 20.

Table 12. Test training parameters of each network.
The proposed system uses SENet-CNN for the classification of images and the detection of potato leaf diseases. Three datasets available on Kaggle were used: DataSet 38, DataSet 15 and the Plant Village Dataset. An augmented dataset is employed to expand the size of the training dataset in order to improve the model's performance and generalization capacity. The proposed SENet-CNN has superior flexibility to changes in image spatial position than other CNN algorithms. Compared with other traditional convolutional neural networks, the proposed model achieved the highest classification accuracy of 99.3%.

Table 13. Comparison of accuracy with the existing methods.