Classification and Indirect Weighing of Sweet Lime Fruit through Machine Learning and Meta-heuristic Approach

ABSTRACT In the past few decades, both academia and industry have shown interest in agricultural post-harvest operations, with the aim of reducing post-harvest losses. Effective and innovative methodological frameworks are required to assist farmers in post-harvest decision-making. Fruit weight measurement is of prime importance in many food processing industries during sorting, grading, and packaging. In this work, different support vector machine (SVM) classifiers, as well as weighing models developed using an optimized adaptive neuro-fuzzy inference system (ANFIS) coupled with a computer vision system, are proposed. More precisely, weighing models based on a hybrid ANFIS approach using two well-known optimization algorithms, the genetic algorithm (GA) and particle swarm optimization (PSO), are analyzed. In the first approach, a series of GA-ANFIS models is evaluated for different population sizes. In the latter approach, different PSO-ANFIS models are evaluated by varying the most influential parameters. A comprehensive self-built color image database has been used for both calibration and validation of the models. From an economic point of view, this indirect way of weighing fruits may be useful to fruit growers and traders in choosing a market according to fruit size and weight before packaging. The results show the higher reliability and prediction capability of the proposed meta-heuristic (GA-ANFIS) model in estimating the weight of sweet lime fruit.


Introduction
Automatic classification and precise weighing of fruits have become popular areas of research, not only in academic institutions but also in the fruit processing and packaging industry. The sweet lime (Citrus limetta), locally known as "Sathukudi" or "mosambi", belongs to the Rutaceae family, Citrus genus, and Limetta species. It is an industrial fruit used in traditional medicines and is a source of potassium and vitamin C that restores energy while being low in calories and fats. The nutritional contents of sweet lime fruit are potassium (182 ± 39.4 mg), vitamin C (46.96 ± 7.64 mg), iron (0.11 ± 0.02 mg), calcium (25.79 ± 5.02 mg), copper (0.03 ± 0.00 mg), sodium (1.17 ± 0.45 mg), and energy (114 ± 5 kJ), all per 100 g of edible portion (Longvah et al., 2017). In addition, d-limonene, the major chemical constituent of sweet lime peel, has been explored and established as a good alternative to modern medicines for primary health care and for curing chronic disease (Khan et al., 2016). Because of these well-proven health benefits, sweet lime is a major ingredient in the fruit juice and food industries as well as in the pharmaceutical and beauty product industries. The productivity of sweet lime fruit in India increased from 10.8 to 17.7 metric tons per hectare between 2006 and 2018 (Horticultural Statistics at a glance 2018 report, 2020). Although productivity has increased, post-harvest operations such as grading and sorting of fruit are still performed manually. This manual handling is one of the reasons for the qualitative and quantitative losses that lead to economic loss for the grower (Ladaniya, 2008). A report shows that 5.8-18.1% of fruits are lost during harvest, post-harvest, handling and storage in India. The overall post-harvest loss of citrus fruit in India is 6.3% (Nanda et al., 2012).
Worldwide, the grading and packaging of all fruits in the citrus family is performed with size as the main attribute. Fruits of equal size and regular shape always attract the attention of the consumer. Size and weight are also considered quality measures, along with aroma, physical defects, color, shape and many more (El-Mesery et al., 2019). Hence, it is important to know the relation between the size and weight of the fruit (Moreda et al., 2009). Cubero et al. (2014) developed a computer vision system to inspect citrus fruits for size and color before the packaging line.
In a similar study, Spreer and Muller (2011) developed a good correlation equation between the geometric features and the weight of a mango fruit for use in contact-less sizing of mango. Ercisli et al. (2012) investigated the size and shape attributes of ten different walnut cultivars using image processing techniques. The relation between the extracted attributes and the walnut cultivar was determined using principal component analysis (PCA). Schulze et al. (2015) applied the length, width and thickness of the mango fruit as inputs to a non-linear feedforward ANN model for mass prediction. Jorquera-Fontena et al. (2017) reported the relationship between blueberry fruit weight and diameter. In the past few years, computer vision has proved itself as an alternative to manual post-harvest operations such as sorting and grading, especially in the supermarket, where it helps the customer as well as the cashier (Hossain et al., 2019). To provide a smart classification and packaging solution to agricultural industries, computer vision combined with intelligent soft computing techniques was recommended (Iraji, 2019). Yuan-Yuan Pu et al. (2019) investigated three ripening stages of Bananito fruit using three different classifiers. The firmness, some color features, and soluble solids content (SSC) of the fruit were used for internal quality and peel color identification. Arun Kumar et al. (2019) classified Indian pomegranate fruit into three grades using 134 features extracted through image processing and ANN. Wenwen Zhang et al. (2019) developed a multispectral vision system for potato defect identification and classification using a single-shot method.
Miraei Ashtiani et al. (2020) developed forecasting models for almond kernel mass prediction from the geometric attributes of the shell using different soft computing techniques. The authors applied several forecasting algorithms, viz. multi-linear regression (MLR), multi-layer perceptron (MLP), radial basis function (RBF), support vector machine (SVM), and ANFIS, to three varieties of almond. Appreciable work has been done in this field so far. Table 1 summarizes the outcomes of different fruit mass modeling studies using manual measurement and image processing over the past few years. The weighing models studied for different fruits have one or more of the flaws mentioned below.
• Most of the mass prediction models were developed using simple or multiple linear regression.
• The non-linearity in fruit mass modeling was not learned up to the mark.
• Most of the studies performed have used more than two views of the fruit.
To overcome these flaws, this work presents an optimized and efficient ANFIS network. The application of the meta-heuristic algorithms GA and PSO in food process engineering for estimating the weight of fruit is both effective and novel. ANFIS is used as a non-linear function approximator in different engineering areas because it combines the ability of a fuzzy system to handle uncertainty with the learning capability of a neural network (Jang, 1993). In recent studies on environmental issues, ANFIS has been used as a modeling tool for forecasting the water quality index of the Indian river Satluj (Tiwari et al., 2018). Similarly, with agricultural growth, post-harvest operations, and health as prime concerns, researchers all over the world have made many efforts using ANFIS. They recommended different ANFIS models for forecasting the weight percentage of flocculated asphaltene in oil (Keybondorian et al., 2018), strawberry yield in the greenhouse (Khoshnevisan et al., 2014a), and wheat grain yield (Khoshnevisan et al., 2014b). The results obtained in these studies encourage us to employ GA- and PSO-optimized ANFIS for predicting sweet lime fruit weight.
The relation between fruit weight and size is important during post-harvest operations and packaging. Hence, for the potential application of sorting, grading and packaging machines to bulk agricultural products, a more accurate weighing model is required. From this aspect, this work contributes mainly as follows:
• First, a computer vision system consisting of an image acquisition chamber, camera, illumination system and a computer has been used to prepare a more comprehensive self-built database.
• Second, an image processing algorithm has been used to extract seven features, viz. length, average width, depth, two perimeters, and two projected areas, from two views of the sweet lime fruit.
• Lastly, the Support Vector Machine (SVM) classifier and ANFIS have been applied to these 1D and 2D features to develop forecasting models for sweet lime fruit class and weight.
The developed supervised models have been improved with the meta-heuristic GA and PSO optimization algorithms.
The performance of the developed ANFIS model is improved by using the GA and PSO algorithms to optimize the weights of the if-then rules. The performance of these two models, which have high predictive potential, is evaluated statistically, and the results are compared with previously reported weighing models. This study reduces the need for the exhaustive, laborious and time-consuming weighing system used in the packaging line.
Thus, this indirect method of weighing fruits will be helpful to researchers, food scientists and agriculture-based industries in grading as well as packaging bulk agricultural products. A similar combination of weighing and classification models will be useful in developing many industrial applications. It can be utilized for packing bulk post-harvest fruits by supermarkets, wholesale traders, exporters and fruit processing industries.

Dataset Preparation
In this investigative work, the computation and extraction of features, and their modeling, were important. As no benchmark public dataset for estimating the weight of sweet lime fruit is available, we decided to create our own dataset. The weight of a fruit depends not only on its size but also on its chemical composition and maturity level. Hence, in order to develop a more general weight forecasting model, a total of 793 (≈ 206 kg) good quality (without any physical damage), ripened and semi-ripened Indian sweet lime fruits were procured over several purchases. The fruits were procured from the nearest local traders located in Trichy (78.7047° E and 10.7905° N), Tamil Nadu (India) from March 2018 to July 2019. The samples used for calibration and validation of the model were 70% (553) and 30% (240), respectively. The samples were hand-picked to ensure the observed variability in size and weight. To ensure wide-range validity of the developed model, the data set used for calibration as well as validation was divided into four ranges (>200 g as Class A; 150−200 g as Class B; 100−150 g as Class C; and <100 g as Class D).
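The class ranges and the 553/240 split described above can be sketched as follows. This is only an illustration: the handling of weights that fall exactly on 100, 150 or 200 g and the use of a seeded random shuffle are assumptions, since the text states only the ranges and the sample counts.

```python
import random


def weight_class(weight_g):
    """Map a fruit weight in grams to the class label used in the text."""
    # Boundary handling (exactly 100, 150 or 200 g) is an assumption.
    if weight_g > 200:
        return "A"
    if weight_g >= 150:
        return "B"
    if weight_g >= 100:
        return "C"
    return "D"


def split_calibration_validation(samples, n_calibration=553, seed=0):
    """Split samples into calibration and validation sets.

    553 of 793 samples is the split reported in the text; the seeded
    random shuffle is an assumption made for this sketch.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_calibration], shuffled[n_calibration:]
```

Called on the 793 sample identifiers, `split_calibration_validation` returns 553 calibration and 240 validation samples.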
The weight and dimensions of the samples were measured using a digital balance (Equal®, India, Model: EQ3, capacity: 3 kg and accuracy: 0.1 g) and a digital caliper (Dedso™, India, capacity: 200 mm, accuracy: 0.01 mm), respectively. The procedures adopted to measure these attributes and the color image acquisition setup used to capture two different views of each fruit were the same as in the previous studies (Phate et al., 2019a, 2019b). The algorithm developed for 1D and 2D feature extraction from the captured color image is shown in Figure 1. A threshold value determined automatically from the gray level histogram was applied to convert the gray image into a binary image (Otsu, 1979). Table 2 summarizes the input features extracted from the images and the output feature used for calibration and validation of the weighing models. The self-built dataset consists of a total of 1586 color images, each with a resolution of 1920 × 1080.
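The automatic thresholding step can be illustrated with a minimal pure-Python version of Otsu's method, which picks the gray level that maximizes the between-class variance of the histogram. It is a sketch rather than the authors' MATLAB routine: the flat list of 8-bit gray levels and the assumption that the fruit is brighter than the background are choices made only for this example.

```python
def otsu_threshold(gray_pixels, levels=256):
    """Return the gray level that maximizes between-class variance."""
    hist = [0] * levels
    for value in gray_pixels:
        hist[value] += 1
    total = len(gray_pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_threshold, best_variance = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for t in range(levels):
        weight_bg += hist[t]              # pixels with gray level <= t
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg     # pixels with gray level > t
        if weight_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        variance = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if variance > best_variance:
            best_variance, best_threshold = variance, t
    return best_threshold


def binarize(gray_pixels, threshold):
    """Mark pixels above the threshold as foreground (1), others as 0.

    Treating the brighter pixels as fruit is an assumption of this sketch.
    """
    return [1 if value > threshold else 0 for value in gray_pixels]
```

Summing the binary image gives a pixel count that can serve as a projected area once it is scaled by the camera calibration.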

Proposed Methodology
The actual computer vision system consists of the imaging chamber, three illumination lamps, a CCD imaging device with a resolution of 16.1 megapixels, and a personal computer (2.7 GHz, Intel Core i7) with MATLAB 2019b installed. Figure 2 shows the proposed methodology used in this work.

SVM Classifier
The SVM has been used as a non-linear classifier to classify the sweet lime fruits into four different classes. To simplify the calculations, the loss function used transforms the quadratic programming problem into a linear problem. In this work, four different kernel functions have been used. The optimal kernel parameters and hyperparameters have been found by ten-fold cross-validation in STATISTICA 10 (a data mining tool). Figure 3 shows the process adopted for classifying the sweet lime fruits using the type 1 SVM classifier with different kernel functions. Three diameters of the fruit extracted using image processing have been used as the input features. The target class assigned to each sample has been determined from its weight. The parameters set during classification were C = 10; degree of polynomial = 3; γ = 0.9; coefficient = 0; maximum number of epochs = 1000; stop at error = 0.001; and cache size = 40 MB.
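To make the role of the kernel parameters concrete, the sketch below writes out four common SVM kernels with the settings quoted above (γ = 0.9, degree 3, coefficient 0). Only the RBF and polynomial kernels are named in the text. The assumption that the other two are the linear and sigmoid kernels, and the assumption that the three diameters are scaled before use, are made for this illustration only.

```python
import math

# Values quoted in the text for the classification runs.
GAMMA, DEGREE, COEF0 = 0.9, 3, 0.0


def dot(x, y):
    """Inner product of two feature vectors of equal length."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    return dot(x, y)


def polynomial_kernel(x, y, gamma=GAMMA, degree=DEGREE, coef0=COEF0):
    return (gamma * dot(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=GAMMA):
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(x, y, gamma=GAMMA, coef0=COEF0):
    return math.tanh(gamma * dot(x, y) + coef0)
```

With three scaled diameters as the feature vector, each function returns the similarity that the SVM uses in place of a plain inner product.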

ANFIS
The ANFIS network was first proposed by Jang (1993). In this study, the ANFIS network was used as a function approximator because of its good learning ability and its capability to adapt quickly to system changes. The ANFIS structure has five layers. The output of layer 1 defines the membership degree. The shape of the Gaussian membership function used in this layer is determined by the non-linear premise parameters. The output of layer 2 represents the firing strength of the respective rule. Layer 3 normalizes the firing strengths. With the help of the input variables, defuzzification is performed in layer 4. The linear parameters used for the output membership functions are called the consequent parameters. Finally, the summation is carried out in the last layer. The nodes in layer 1 and layer 4 are adaptive, while the nodes in the other layers are non-adaptive. The parameters of the adaptive layers have been updated using the calibration dataset.
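The five layers described above can be traced in a short forward-pass sketch for a first-order Sugeno system. The data layout (`premise` and `consequent` as nested lists) and the use of the product to combine membership degrees are assumptions made for illustration; training of the parameters is not shown.

```python
import math


def gaussian_mf(x, center, sigma):
    """Gaussian membership degree of x."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)


def anfis_forward(inputs, premise, consequent):
    """Evaluate a first-order Sugeno ANFIS for one sample.

    premise[r][j]  = (center, sigma) of rule r for input j
    consequent[r]  = [p_1, ..., p_n, bias] of rule r
    """
    # Layer 1: membership degrees from the premise parameters.
    memberships = [
        [gaussian_mf(x, c, s) for x, (c, s) in zip(inputs, rule)]
        for rule in premise
    ]
    # Layer 2: firing strength of each rule (product, assumed here).
    firing = [math.prod(degrees) for degrees in memberships]
    # Layer 3: normalized firing strengths (total must be non-zero).
    total = sum(firing)
    normalized = [w / total for w in firing]
    # Layer 4: each rule's linear output weighted by its normalized strength.
    rule_outputs = [
        w * (sum(p * x for p, x in zip(params[:-1], inputs)) + params[-1])
        for w, params in zip(normalized, consequent)
    ]
    # Layer 5: overall output as the sum of the rule outputs.
    return sum(rule_outputs)
```

For the weighing model, `inputs` would hold the seven image features and the returned value would be the estimated weight in grams.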
In the present work, a Takagi-Sugeno type FIS with the fuzzy c-means (FCM) clustering method has been used because of its well-proven benefits over the other two methods, viz. subtractive clustering and grid partitioning (Phate et al., 2019b). FCM has been used for rule extraction and for determining the membership functions (MFs) of the antecedents and consequents. The consequent and premise parameters have been updated during the learning process. The predictive network may fall into a local minimum because of the complexity of the problem. To address this, the optimization methods PSO and GA have been used for updating the ANFIS network parameters. The ANFIS model was first tuned using a hybrid learning algorithm; subsequently, the MFs and the parameters of each MF were optimized using these well-known optimization algorithms to achieve lower error and improve model performance.
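As a rough illustration of the FCM step, the sketch below alternates the standard membership and center updates. The fuzzifier m = 2, the fixed iteration count and the caller-supplied initial centers are assumptions; the text does not report these settings.

```python
import math


def fcm_memberships(points, centers, m=2.0):
    """Membership of every point in every cluster (each row sums to 1)."""
    power = 2.0 / (m - 1.0)
    memberships = []
    for point in points:
        dists = [math.dist(point, center) for center in centers]
        if 0.0 in dists:
            # The point coincides with one or more centers.
            row = [1.0 if d == 0.0 else 0.0 for d in dists]
            row = [value / sum(row) for value in row]
        else:
            row = [1.0 / sum((d_i / d_j) ** power for d_j in dists)
                   for d_i in dists]
        memberships.append(row)
    return memberships


def fcm_centers(points, memberships, m=2.0):
    """Cluster centers as membership-weighted means of the points."""
    dim = len(points[0])
    centers = []
    for k in range(len(memberships[0])):
        weights = [row[k] ** m for row in memberships]
        total = sum(weights)
        centers.append(tuple(
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(dim)
        ))
    return centers


def fuzzy_c_means(points, initial_centers, m=2.0, iterations=50):
    """Alternate the two FCM updates for a fixed number of iterations."""
    centers = list(initial_centers)
    for _ in range(iterations):
        memberships = fcm_memberships(points, centers, m)
        centers = fcm_centers(points, memberships, m)
    return centers, fcm_memberships(points, centers, m)
```

Each resulting cluster can then seed one fuzzy rule, with the cluster center and spread giving initial Gaussian MF parameters.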

GA Optimized ANFIS
GA is a stochastic, search-based optimization algorithm. Optimization techniques have been used to tune the parameters of ANFIS. Well-known gradient-based algorithms such as back-propagation have the drawback that they are easily trapped in local optima. In the recent literature, many global optimization techniques have been used to train the ANFIS model. Figure 4 shows the detailed flowchart applied for developing the GA optimized ANFIS model. Techniques such as GA and PSO have been used by many researchers in recent years (Khandelwal et al., 2018; Wang et al., 2019) to optimize the performance of ANN and ANFIS. Moayedi et al. (2019) concluded that the most influential factor in optimization using GA is the population size. The models are developed with crossover percentage (p_c = 40%), mutation percentage (p_m = 70%), mutation rate (m_u = 0.15), and a maximum of 1000 iterations.
Moayedi et al. (2019) also identified the most influential factors in optimization using PSO as the population size, the personal and global learning coefficients (C_1 and C_2, respectively), and the inertia weight (IW). Figure 5 shows the detailed flowchart applied for developing the PSO optimized ANFIS model.
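The GA settings quoted in this section (crossover percentage 40%, mutation percentage 70%, mutation rate 0.15, at most 1000 iterations) can be placed in a small real-coded GA to show what each one controls. This is a generic sketch and not the authors' MATLAB implementation: uniform random parent selection, blend crossover, Gaussian mutation with step `sigma`, and truncation to the best individuals are all assumptions. In the ANFIS setting, `objective` would be the calibration error as a function of the MF parameters.

```python
import random


def ga_minimize(objective, dim, bounds, population=75, crossover_pct=0.40,
                mutation_pct=0.70, mutation_rate=0.15, iterations=1000, seed=0):
    """Minimize objective over a box with a simple real-coded GA."""
    rng = random.Random(seed)
    low, high = bounds

    def clip(value):
        return min(high, max(low, value))

    # Individuals are stored as (cost, genes) and kept sorted by cost.
    pop = []
    for _ in range(population):
        genes = [rng.uniform(low, high) for _ in range(dim)]
        pop.append((objective(genes), genes))
    pop.sort(key=lambda item: item[0])

    n_pairs = round(crossover_pct * population / 2)  # two children per pair
    n_mutants = round(mutation_pct * population)
    sigma = 0.1 * (high - low)                       # assumed mutation step

    for _generation in range(iterations):
        children = []
        for _pair in range(n_pairs):
            parent_a = rng.choice(pop)[1]
            parent_b = rng.choice(pop)[1]
            mix = [rng.random() for _ in range(dim)]
            children.append([clip(m * a + (1 - m) * b)
                             for m, a, b in zip(mix, parent_a, parent_b)])
            children.append([clip(m * b + (1 - m) * a)
                             for m, a, b in zip(mix, parent_a, parent_b)])
        for _mutant in range(n_mutants):
            parent = rng.choice(pop)[1]
            # Each gene is perturbed with probability mutation_rate.
            children.append([
                clip(g + rng.gauss(0.0, sigma)) if rng.random() < mutation_rate else g
                for g in parent
            ])
        pop.extend((objective(child), child) for child in children)
        pop.sort(key=lambda item: item[0])
        del pop[population:]                         # keep the best individuals

    best_cost, best_genes = pop[0]
    return best_genes, best_cost
```

Varying `population` while holding the other arguments fixed mirrors the population-size study reported for the GA-ANFIS models.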

Statistical Metrics
The performance of the ANFIS models optimized using GA and PSO is examined and validated using different statistical indices. The error evaluation terms and goodness-of-fit measures used are defined by equations (1) to (6). In these equations, N represents the sample size, while P_i and O_i represent the predicted and measured values of the i-th sample, respectively. Also, Ō and P̄ are the means of the N measured and predicted samples, respectively. Mean square error (MSE) and root mean square error (RMSE) are most commonly used as measures of residuals. They are given in equations (1) and (2), respectively. The mean bias error (MBE), shown in equation (3), conveys the average model bias. A value of zero indicates that the model predictions are unbiased.
Fractional bias (FB), stated in equation (4), indicates whether the model tends toward under-prediction or over-prediction. A positive value points toward under-prediction and a negative value signifies over-prediction, while a value of zero implies perfect agreement. The refined index of agreement (d_r), a refined version of the index of agreement (d) introduced by Willmott et al. (2012), is expressed in equation (5). It is a descriptive statistical measure of the model's ability to produce error-free predictions. It is bounded between −1 and 1, and its value indicates the extent to which the model predictions agree with the measurements. Willmott et al. (2015) compared d_r with Nash and Sutcliffe's coefficient of efficiency (NSE) and Legates and McCabe's measure (E_1). The comparison shows the better utility of d_r for judging goodness-of-fit.
Variance accounted for (VAF), also written as value account for, is one of the metrics used to indicate the prediction capability of the model. It is defined in equation (6). A value of 100 indicates excellent model performance.
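Equations (1) to (6) are not reproduced in this text, so the sketch below uses the commonly cited definitions of the six metrics and should be read as an assumption about their exact form. The sign conventions are chosen so that a positive FB corresponds to under-prediction, as stated above, and MBE is taken as predicted minus measured. The scaling constant c = 2 in d_r follows Willmott et al. (2012).

```python
import math


def _mean(values):
    return sum(values) / len(values)


def _variance(values):
    mu = _mean(values)
    return sum((v - mu) ** 2 for v in values) / len(values)


def mse(observed, predicted):
    return _mean([(p - o) ** 2 for o, p in zip(observed, predicted)])


def rmse(observed, predicted):
    return math.sqrt(mse(observed, predicted))


def mbe(observed, predicted):
    # Assumed sign convention: predicted minus measured.
    return _mean([p - o for o, p in zip(observed, predicted)])


def fractional_bias(observed, predicted):
    # Positive when the model under-predicts on average.
    mean_o, mean_p = _mean(observed), _mean(predicted)
    return 2.0 * (mean_o - mean_p) / (mean_o + mean_p)


def refined_index_of_agreement(observed, predicted, c=2.0):
    # Assumes the observations are not all identical.
    mean_o = _mean(observed)
    abs_error = sum(abs(p - o) for o, p in zip(observed, predicted))
    spread = c * sum(abs(o - mean_o) for o in observed)
    if abs_error <= spread:
        return 1.0 - abs_error / spread
    return spread / abs_error - 1.0


def vaf(observed, predicted):
    residuals = [o - p for o, p in zip(observed, predicted)]
    return (1.0 - _variance(residuals) / _variance(observed)) * 100.0
```

All functions take the measured weights first and the predicted weights second, both as equal-length sequences in grams.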

Sensitivity Analysis
Recognizing the relative effect of each model input feature on the model output is important in applications involving the automatic weighing of fruits. The strength of the relation (r_ij) between x_i and x_j is expressed by equation (7) for sensitivity analysis using the cosine amplitude method (Mohamad et al., 2017).
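Equation (7) is not reproduced in this text. The sketch below uses the usual form of the cosine amplitude method, in which r_ij is the sum of products of the two variables over all samples divided by the square root of the product of their sums of squares. Treating this as the form behind equation (7) is an assumption.

```python
import math


def cosine_amplitude(x_i, x_j):
    """Strength of relation r_ij between two equally long data series."""
    numerator = sum(a * b for a, b in zip(x_i, x_j))
    denominator = math.sqrt(sum(a * a for a in x_i) * sum(b * b for b in x_j))
    return numerator / denominator
```

Here `x_i` would hold one input feature over all samples and `x_j` the measured weight. Values close to 1 indicate a strong relation.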

Results and Discussion
This section presents the results obtained for various classification and weighing models. In total, four SVM classifiers and sixty-five different ANFIS weighing models are developed and the results obtained are compared and investigated using different statistical metrics explained in the previous section.

SVM Classification
Three diameters of the sweet lime fruit have been used as the input features for developing the SVM classifier. The classifier has been developed with four different kernel functions and the performance of each has been evaluated. In total, 553 samples have been used to calibrate the classifier and 240 samples to validate it. Table 3 shows the confusion matrix during calibration and validation of the classifier, along with the number of support vectors required. The SVM classifier using the RBF kernel requires the fewest support vectors.

ANFIS Cluster Size Optimization
In this section, fourteen different FCM clustered ANFIS models are evaluated by varying the cluster count from 2 to 15. The cluster size of the FCM clustered ANFIS network is optimized by ranking the model output using several statistical metrics, namely d_r, RMSE, and VAF. To select the best cluster size, a model ranking procedure is followed, in which each statistical term is ranked using the ordinal ranking method for both calibration and validation data. Lastly, the ranks assigned to each model are summed. The model with the highest total rank is considered the optimal model. The model with a cluster count of fifteen shows the optimal performance. The same cluster count is then used throughout the work.
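The ranking procedure can be sketched as below. The convention that the best value of a metric receives the highest rank (so that the highest total wins), the tie-breaking by model order, and the dictionary layout keyed by (metric, phase) are assumptions made for this illustration.

```python
# Larger d_r and VAF are better; smaller RMSE is better.
HIGHER_IS_BETTER = {"d_r": True, "VAF": True, "RMSE": False}


def ordinal_ranks(values, higher_is_better):
    """Give the worst value rank 1 and the best value rank n.

    Ties receive distinct ranks in model order (ordinal ranking).
    """
    worst_first = sorted(range(len(values)), key=lambda i: values[i],
                         reverse=not higher_is_better)
    ranks = [0] * len(values)
    for rank, index in enumerate(worst_first, start=1):
        ranks[index] = rank
    return ranks


def total_ranks(scores):
    """Sum the ranks of every model over all (metric, phase) entries.

    scores maps model name -> {(metric, phase): value}, for example
    ("RMSE", "validation"); all models must share the same keys.
    """
    names = list(scores)
    totals = dict.fromkeys(names, 0)
    for key in scores[names[0]]:
        metric = key[0]
        ranks = ordinal_ranks([scores[name][key] for name in names],
                              HIGHER_IS_BETTER[metric])
        for name, rank in zip(names, ranks):
            totals[name] += rank
    return totals
```

The selected model is then `max(totals, key=totals.get)`. The same helper applies to the population size, learning coefficient and inertia weight comparisons.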

GA-ANFIS Models
Using this hybrid network, fourteen different weighing models are developed by changing the population size from 25 to 600. The convergence plots of the GA-ANFIS models for different population sizes are shown in Figure 6. To determine the optimized model, the performance of each model during calibration and validation is ranked as shown in Table 5. The model with population size 75 has the highest total rank, which indicates its better performance over the other models. The results obtained for the proposed GA-ANFIS model with population size 75 during model validation are shown in Figure 7. The MBE and FB found for the proposed GA-ANFIS model during model calibration are 0.0159 and −0.0001, respectively, while during model validation they are −0.1417 and 0.00096, respectively.

PSO-ANFIS Models
For this hybrid network, a total of thirty-seven models are developed and evaluated. The inertia weight damping ratio is kept at 0.99. The parameters of the PSO are selected by trial and error. At first, the population size is varied from 25 to 600. The optimized population size is then kept fixed, and both the personal learning coefficient (C_1) and the global learning coefficient (C_2) are changed by trial and error. Lastly, with the optimized population size and learning coefficients fixed, the inertia weight (IW) is varied from 0.2 to 1.0. The performance of the PSO-ANFIS models for different population sizes during calibration is shown in Figure 8. Table 6 shows that the model with population size 300 has the maximum rank. Table 7 shows that the model with learning coefficients C_1 = 1 and C_2 = 2 has the optimal rank compared to the other models developed for different learning coefficients. Figures 9 and 10 show the performance of the PSO-ANFIS models during model calibration for different learning coefficients and different IW, respectively. Table 8 shows that, among the different inertia weights, the model with IW = 1 has the maximum rank. The results obtained for the proposed PSO-ANFIS model with population size 300, learning coefficients C_1 = 1, C_2 = 2 and IW = 1 during model validation are shown in Figure 11. The MBE and FB found for the proposed PSO-ANFIS model during model calibration are −0.0084 and 0.00006, respectively, while during model validation they are −0.0441 and 0.0003, respectively.
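The way the PSO parameters enter the search can be seen in a minimal sketch of the standard velocity and position updates. The defaults mirror the selected settings reported above (population 300, C_1 = 1, C_2 = 2, IW = 1, damping ratio 0.99). The iteration count, zero initial velocities, clamping to the bounds and the seeded generator are assumptions, and in the ANFIS setting `objective` would be the calibration error as a function of the MF parameters.

```python
import random


def pso_minimize(objective, dim, bounds, population=300, c1=1.0, c2=2.0,
                 inertia=1.0, damping=0.99, iterations=200, seed=0):
    """Minimize objective over a box with a basic global-best PSO."""
    rng = random.Random(seed)
    low, high = bounds
    positions = [[rng.uniform(low, high) for _ in range(dim)]
                 for _ in range(population)]
    velocities = [[0.0] * dim for _ in range(population)]
    personal_best = [p[:] for p in positions]
    personal_cost = [objective(p) for p in positions]
    best_index = min(range(population), key=personal_cost.__getitem__)
    global_best = personal_best[best_index][:]
    global_cost = personal_cost[best_index]

    for _iteration in range(iterations):
        for i in range(population):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + c1 * r1 * (personal_best[i][d] - positions[i][d])
                    + c2 * r2 * (global_best[d] - positions[i][d])
                )
                # Keep the particle inside the search box.
                positions[i][d] = min(high, max(low, positions[i][d] + velocities[i][d]))
            cost = objective(positions[i])
            if cost < personal_cost[i]:
                personal_cost[i] = cost
                personal_best[i] = positions[i][:]
                if cost < global_cost:
                    global_cost = cost
                    global_best = positions[i][:]
        inertia *= damping  # inertia weight damping after every iteration

    return global_best, global_cost
```

The staged tuning described above corresponds to calling this function first over a range of `population` values, then over pairs of `c1` and `c2`, and finally over values of `inertia`, ranking the resulting models at each stage.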

Assessments with Literature
In this section, the proposed model is compared with existing models. The best weighing models previously presented for sweet lime fruit are compared with the proposed model using the validation data set, as shown in Table 9. The table shows that the proposed GA-ANFIS model performs best among the models compared. Figure 12 compares the two optimized ANFIS models investigated in this work.

Conclusion
The SVM classifier developed using the polynomial kernel shows better performance than the other classifiers during calibration as well as validation. Optimized ANFIS weighing models using GA and PSO are proposed in this work. The most influential parameters of GA and PSO are optimized by trial and error. Along with some important statistical terms, a ranking procedure is followed to judge each model's performance. In almost all predictive models, higher values of d_r and VAF and a low value of RMSE show that the learning process is excellent. The comparative result shows the superiority of the proposed hybrid GA-ANFIS model. The sensitivity analysis shows that the projected areas have a higher impact on the predicted weight than the other inputs applied to the model. Also, the strength of the relation found for all the inputs is more than 0.98, which shows a good relation between inputs and output. Since the weight estimation models presented in this work have a promising future, the prospects of this research work are listed below.
• The indirect method of weight estimation will be helpful in the development and design of weight-based post-harvest equipment for sorting, grading and packaging sweet lime fruits.
• A similar approach can be useful for estimating the weight as well as the volume of other axisymmetric fruits.
• This may be the starting step toward on-tree weight estimation of fruits to monitor their growth as well as the yield.