A novel Chinese herbal medicine classification approach based on EfficientNet

Chinese Herbal Medicine (CHM) classification is a promising research issue in Intelligent Medicine. However, the small scale of the available CHM datasets and the reliance on traditional CHM classification models pose a huge challenge to obtaining promising classification results. To tackle these challenges, a novel large CHM classification (CHMC) dataset is first established, which includes 100 classes with about 10,000 samples. This dataset covers a wide range of medicinal materials with natural backgrounds. Further, the EfficientNetB4 model is proposed to perform CHM classification. EfficientNet uniformly scales up the depth, width, and resolution of the model, and obtains better accuracy because it balances all three dimensions of the network. To validate the superiority of EfficientNet and the effectiveness of the CHMC dataset, extensive experiments have been conducted, verifying that EfficientNetB4 is optimal for CHM classification, with a 5% improvement over the existing model. In addition, this model achieves state-of-the-art CHM classification performance, with a top-1 accuracy of 83.1% and a top-5 accuracy of 92.50%.


Introduction
Chinese Herbal Medicine (CHM) has a long history of development and broad applications, and it has attracted much attention from both academia and industry. It plays a critical role in human health. For example, CHM has made outstanding contributions to the effective fight against the novel coronavirus pneumonia in China. Consequently, accurate classification of CHM is essential in promoting the development of medical treatment.
CONTACT Fuzhong Li hualimengyu@163.com, 869536244@qq.com
Tremendous research efforts have been devoted to classifying CHM. Luo et al. (2013) proposed a novel processing technique based on the Locally Linear Embedding (LLE) algorithm and Linear Discriminant Analysis (LDA) to perform CHM classification, which could handle the high-dimensional nonlinear data of CHM. However, the dataset they used contains only six classes, which is quite small. Zhang et al. (2013) utilized a supervised local projection strategy to identify plant leaves and achieved superior classification performance. A Support Vector Machine (SVM) with Fourier characteristics and morphometric methods has been employed to classify the medicines in two test sets, containing 26 and 17 classes, respectively (Unger et al., 2016). Specifically, each class contains about 10 samples, and the accuracies achieved on the two sets are 73.21% and 84%, respectively. Dehan et al. (2014) also adopted two approaches, PCA and SVM, and conducted a comparison experiment, which showed that the SVM obtains superior results. In 2016, the SOM algorithm was utilized to identify CHM images (Wang et al., 2016). Although excellent performance was obtained, the scale of the dataset is relatively small, and it covers few classes of images. Liu et al. (2018) utilized GoogleNet for CHM classification. Although the dataset in that work is relatively large (50 classes), it still cannot meet our requirements. Further, the model they used is out of date. As stated above, although many methods have been developed to address CHM classification, promising results have not yet been achieved. The reasons can be summarized as follows: (1) the existing CHM datasets are small; (2) most CHM classification models are traditional and perform poorly. To overcome the above-mentioned problems, we established a novel large CHM classification (CHMC) dataset via crawler technology.
This dataset contains 100 classes, and each class contains 100 samples on average. Thus, the dataset includes 10,000 images in total. Among them, 8000 samples are utilized for training and the remaining 2000 for testing. Moreover, CHMC covers a rich variety of medicinal materials. Specifically, concerning the natural attributes of medicinal materials, CHMC involves botanical medicine (rice bud, cockscomb, cinnamon twig, hematoxylin), animal medicine (sea cuttlebone, sea dragon, earth dragon, corrugated fruit, scorpion) and mineral medicine (red stone fat, alum). Considering the medicinal part, CHMC contains roots (asarum, ginseng), bark (cork, pomegranate peel), seeds (lotus seeds, wild jujube kernels, orange cores), etc. Furthermore, most samples in CHMC have natural backgrounds, which can promote real-world applications.
Moreover, in order to obtain effective CHM classification performance, we propose to utilize the EfficientNet model to conduct CHM classification. Conventional Convolutional Neural Networks (ConvNets) are commonly developed at a fixed resource budget and then scaled up for better performance when more resources are available. For example, ResNet (He et al., 2016) can be scaled up from ResNet-18 to ResNet-200 by utilizing more layers. The most commonly used approach is to scale up ConvNets by their width (Zagoruyko & Komodakis, 2016) or depth (He et al., 2016). Another research line is to scale up models via image resolution (Huang et al., 2018). Although these models achieve better performance than their corresponding baselines, they scale only one of the three dimensions: width, depth, or image resolution. Two or three dimensions can be scaled arbitrarily, but arbitrary scaling needs tedious manual adjustment and still often leads to sub-optimal accuracy and efficiency. EfficientNet instead scales the model uniformly in all dimensions, carefully balancing the network width, depth, and resolution.
In summary, our contributions are as follows:
• We first establish a novel large CHMC dataset for CHM classification with 100 categories and 10,000 samples. CHMC contains a rich variety of medicinal materials, including botanical, animal, and mineral medicines, as well as root, bark, and seed medicines. Furthermore, most samples in CHMC have complex backgrounds that are close to natural conditions.
• We first propose to leverage the EfficientNet model, which has powerful classification capabilities, to conduct CHM classification. Through a simple but extremely effective compound coefficient, EfficientNet consistently scales all dimensions, including width, depth, and resolution, and obtains much better efficiency and accuracy than other models. Specifically, EfficientNetB4 achieves the best result for CHM classification, with a 5% improvement over the existing model. Additionally, EfficientNetB4 obtains state-of-the-art CHM classification performance, with a top-1 accuracy of 83.1% and a top-5 accuracy of 92.50%.
• Extensive ablation studies have been designed and executed to validate the performance of the EfficientNet model and the effectiveness of the dataset. Concretely, these studies include the evaluation of various models, the evaluation of the ResNet model with different layers, the evaluation of different EfficientNet variants, the comparison of ResNet50 and EfficientNetB4, and the evaluation of different dataset scales.
The remainder of this paper is organized as follows. Section 2 briefly reviews some related works. In Section 3, we describe the construction of the dataset and the EfficientNet model in detail. Experimental results are illustrated in Section 4 and some discussions are presented in Section 5. Finally, we conclude this paper in Section 6.

Related works
In this section, we review some works closely related to this study, focusing on deep neural networks. In recent years, deep convolutional neural networks (Krizhevsky et al., 2012; LeCun et al., 1989) have established a series of milestones for image classification (Krizhevsky et al., 2012; Zeiler & Fergus, 2014). Concretely, AlexNet (Krizhevsky et al., 2012), consisting of eight layers, won the 2012 ImageNet competition and started a new era of deep learning. Afterward, as networks became deeper, the performance of convolutional neural networks continued to make breakthroughs. In 2014, GoogleNet (Szegedy et al., 2014), with 22 layers, won the ImageNet competition and obtained 74.8% top-1 accuracy. The key to this model is its inception module. Moreover, VGG (Simonyan & Zisserman, 2014), with two variants, VGG16 and VGG19, ranked second in the competition of the same year. Although the above-mentioned networks have achieved promising classification results, they cannot be extended to deeper layers due to exploding or vanishing gradient problems. To tackle this problem, He et al. (2016) developed the ResNet framework, which reaches 152 layers on the ImageNet dataset and depths of 100 and even 1000 layers on CIFAR10. ResNet won the 2015 ImageNet competition. In 2017, SENet (Hu et al., 2017) became the champion of the ImageNet competition with a top-1 accuracy of 82.7%.
Although deeper networks tend to perform better, such large and complex models are quite difficult to apply in some real application scenarios, such as mobile or embedded devices. Thus, Howard et al. (2017) proposed the mobile model MobileNet. The essence of this network is the use of depthwise separable convolution, which not only reduces the computational complexity of the model, but also greatly reduces the model size. Moreover, Xception (Chollet, 2016) is an improvement of Inception v3 proposed by Google. It mainly uses depthwise separable convolution, similar to that of MobileNet, to replace the convolution operation in the original Inception v3. Xception obtained classification performance comparable to that of Inception v3 on the ImageNet dataset.
Moreover, numerous researchers have proposed to scale up ConvNets to pursue superior accuracy. However, most of this work considers depth, width, and resolution separately. To further enhance performance, Tan and Le (2019) systematically explored model scaling, demonstrated that carefully balancing network width, depth, and resolution leads to better results, and developed a novel model called EfficientNet.
Although convolutional neural networks have made a series of breakthroughs in recognition tasks, it is not clear which model is optimal for our scenario. Therefore, exploring the optimal model for our scenario is a key issue in this paper.

Dataset construction
This section elaborates the process of dataset construction in detail, including the data collection, cleaning, and preprocessing.

Data collection and cleaning
In the normal process of collecting medicinal materials, those with similar functions are usually stored together. Therefore, the pictures of the same medicinal material collected in pharmacies and similar settings are very similar, with little diversity. As a result, a model trained on such pictures generalizes poorly and is prone to overfitting. Aiming to increase the diversity and richness of the samples, we utilize a crawling strategy to collect images of medicinal materials under natural conditions with complex backgrounds, and clean them as necessary. In the cleaning process, we select images according to the proportion of the image occupied by the medicine, and images in which this proportion is small are deleted. The cleaned data is divided into a training set and a test set with a ratio of 4:1. To ensure the feasibility of the data, our collection process is partitioned into three stages with an increasing number of categories, specifically 29 categories, 50 categories, and 100 categories. After processing, each category contains 100 images. Some illustrative images are shown in Figure 1.
Furthermore, a wide variety of medicinal materials are collected. In terms of the natural attributes of medicinal materials, the dataset involves botanical medicine (rice bud, cockscomb, cinnamon twig, hematoxylin), animal medicine (sea cuttlebone, sea dragon, earth dragon, corrugated fruit, scorpion) and mineral medicine (red stone fat, alum), as shown in Figure 2. In terms of the medicinal part, it covers roots (asarum, ginseng), bark (cork, pomegranate peel), seeds (lotus seeds, wild jujube kernels, orange cores), etc., as shown in Figure 3.
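As a minimal illustration of the 4:1 division into training and test sets described above, the following Python sketch splits a dataset of 100 classes with 100 images each. Whether the split is performed per class (stratified), the random seed, the file names, and the function name `split_per_class` are all assumptions made for illustration; they are not specified in this paper.

```python
import random

def split_per_class(samples_by_class, train_ratio=0.8, seed=0):
    """Shuffle each class independently and split it at train_ratio (assumed stratified)."""
    rng = random.Random(seed)
    train, test = [], []
    for label, paths in sorted(samples_by_class.items()):
        shuffled = list(paths)
        rng.shuffle(shuffled)
        cut = int(round(len(shuffled) * train_ratio))
        train += [(path, label) for path in shuffled[:cut]]
        test += [(path, label) for path in shuffled[cut:]]
    return train, test

# Hypothetical file names: 100 classes with 100 images each.
dataset = {c: [f"class{c:03d}/img{i:03d}.jpg" for i in range(100)] for c in range(100)}
train_set, test_set = split_per_class(dataset)
print(len(train_set), len(test_set))  # 8000 2000
```

Splitting inside each class keeps the 4:1 ratio exact for every category, which matters when each category has only about 100 images.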

Data preprocessing
The major task in the data preprocessing stage is data augmentation. In this stage, we adopt several of the most commonly used data augmentation strategies for the CHM data, including rotation, translation, and shearing.
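Rotation, translation, and shearing can all be expressed as a single affine transform of the pixel coordinates. The Python sketch below composes the three into one 2x3 matrix and samples random parameters. It is only an illustration: the parameter ranges, the order of composition (shear, then rotation, then translation), and the function names are assumptions, since the exact augmentation settings are not given in this paper.

```python
import math
import random

def affine_matrix(angle_deg, tx, ty, shear_deg):
    """Return the 2x3 matrix that shears along x, rotates, then translates a point."""
    a = math.radians(angle_deg)
    s = math.tan(math.radians(shear_deg))
    cos_a, sin_a = math.cos(a), math.sin(a)
    # Rotation [[cos, -sin], [sin, cos]] multiplied by shear [[1, s], [0, 1]].
    return [
        [cos_a, cos_a * s - sin_a, tx],
        [sin_a, sin_a * s + cos_a, ty],
    ]

def apply_affine(m, x, y):
    """Map the point (x, y) through the 2x3 affine matrix m."""
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])

def random_affine(rng, max_angle=15.0, max_shift=0.1, max_shear=10.0):
    """Sample one augmentation; the ranges are illustrative assumptions."""
    return affine_matrix(
        rng.uniform(-max_angle, max_angle),
        rng.uniform(-max_shift, max_shift),
        rng.uniform(-max_shift, max_shift),
        rng.uniform(-max_shear, max_shear),
    )

matrix = random_affine(random.Random(0))
print(apply_affine(matrix, 0.5, 0.5))
```

In practice the sampled matrix would be passed to an image library's affine-warp routine; the sketch only shows how the three operations combine.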

Problem formulation
A sample $x_i$ is fed into a network $F$ composed of several phases of convolutional, pooling, or fully connected (fc) operations. The network contains $p$ phases, and phase $j$ has $L_j$ layers, each with operator $O_j$. Finally, $F$ outputs the logit $g(x_i)$ for sample $x_i$:
$$g(x_i) = F(x_i) = \bigodot_{j=1}^{p} O_j^{L_j}(x_i)$$
where $F$ denotes the classification network, $g(x_i)$ indicates the logit for sample $x_i$ from $F$, $O_j^{L_j}$ indicates that phase $j$ has $L_j$ layers, each with operator $O_j$, and $\bigodot$ indicates the composition of a series of layers. The predicted probability $p_c(x_i)$ of class $c$ for sample $x_i$ is calculated by the softmax layer of the network as
$$p_c(x_i) = \frac{\exp(g_c(x_i))}{\sum_{k=1}^{C} \exp(g_k(x_i))}$$
where $g_c(x_i)$ is the logit of class $c$ for $x_i$ and $C$ is the number of classes. The network can be trained through the conventional cross-entropy loss $L$ for classification tasks, which is obtained by the following equation:
$$L = -\sum_{i=1}^{N} \sum_{c=1}^{C} y_i^c \log p_c(x_i)$$
where $N$ is the number of training samples, $y_i$ is the true label for $x_i$, and $y_i^c$ is an indicator with $y_i^c = 1$ if $y_i = c$ and $y_i^c = 0$ if $y_i \neq c$. $L$ denotes the cross-entropy error between the correct labels and the predicted values, which drives the model to predict the correct results for the training samples.
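The softmax probability and cross-entropy loss of this formulation can be checked numerically with a few lines of Python. The sketch below assumes the standard softmax over class logits and a loss summed (not averaged) over samples; the function names and the toy logits are illustrative only.

```python
import math

def softmax(logits):
    """Convert class logits g_c(x_i) into probabilities p_c(x_i)."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(g - m) for g in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(batch_logits, labels):
    """Sum over samples of -log p_{y_i}(x_i); the indicator y_i^c selects the true class."""
    loss = 0.0
    for logits, y in zip(batch_logits, labels):
        loss -= math.log(softmax(logits)[y])
    return loss

# Toy example with three classes and two samples (values are made up).
logits = [[2.0, 0.5, -1.0], [0.0, 0.0, 0.0]]
labels = [0, 2]
print(cross_entropy(logits, labels))
```

Because the indicator $y_i^c$ is zero for every class except the true one, the double sum reduces to one logarithm per sample, which is what the loop computes.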

Compound scaling
The key idea of EfficientNet is to leverage a novel compound scaling strategy that uniformly scales the width, depth, and resolution of the network, aiming at better classification results when more resources are available. This model scaling strategy is presented in Figure 4(b). Figure 4(a) illustrates a baseline network example with smaller width, depth, and resolution, in which H and W denote the shape of the input tensor.
EfficientNet utilizes a compound coefficient $\phi$ to consistently scale the depth, width, and resolution of the network in the following way, as in (Tan & Le, 2019):
$$d = \alpha^{\phi}, \quad w = \beta^{\phi}, \quad r = \gamma^{\phi}, \quad \text{s.t. } \alpha \cdot \beta^{2} \cdot \gamma^{2} \approx 2, \; \alpha \geq 1, \; \beta \geq 1, \; \gamma \geq 1$$
where $d$, $w$, and $r$ are the scaling factors for depth, width, and resolution, and $\alpha$, $\beta$, and $\gamma$ are constants. $\phi$ is a user-specified coefficient that determines how many more resources are available for model scaling, while $\alpha$, $\beta$, and $\gamma$ specify how to distribute these extra resources to network depth, width, and resolution, respectively. Table 1 presents the structure of EfficientNet-B0 as in (Tan & Le, 2019). The key building block of EfficientNet-B0 is the mobile inverted bottleneck MBConv (Sandler et al., 2018), with added squeeze-and-excitation optimization (Hu et al., 2017). Starting from the baseline EfficientNetB0, the compound scaling method scales the network up to larger and superior variants, from EfficientNetB1 to EfficientNetB4. The detailed width_scale :
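To make the effect of the compound coefficient concrete, the following Python sketch evaluates the scaling factors for several values of φ. It assumes the constants α = 1.2, β = 1.1, γ = 1.15 reported by Tan and Le (2019) for the EfficientNet-B0 baseline; this paper does not state the values it uses, and the exact correspondence between φ and the named variants B1 to B4 is not reproduced here.

```python
# Constants reported by Tan & Le (2019); assumed here, not stated in this paper.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

def compound_scale(phi):
    """Return (depth, width, resolution, FLOPs) multipliers for coefficient phi."""
    depth = ALPHA ** phi
    width = BETA ** phi
    resolution = GAMMA ** phi
    # FLOPs grow roughly linearly with depth and quadratically with width and resolution.
    flops = (ALPHA * BETA ** 2 * GAMMA ** 2) ** phi
    return depth, width, resolution, flops

for phi in range(5):
    d, w, r, f = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}, FLOPs x{f:.2f}")
```

Since α · β² · γ² is close to 2, each unit increase of φ roughly doubles the computational cost, which is why φ can be read as a resource budget.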

Model training
For a fair comparison, all models in the experiments adopt SGD to update the parameters. Each model is trained for 50 epochs with a batch size of 8. In addition, Bayesian search is selected as the main hyperparameter search strategy in this paper.
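The training budget implied by these settings can be worked out directly. The Python sketch below shows one plain SGD update with L2 weight decay and counts the number of update steps for the 8000-image training set. Momentum, the learning-rate schedule, and the concrete learning rate and weight decay values are not stated in this paper, so the update rule and the numbers in the example call are assumptions.

```python
import math

def sgd_step(params, grads, lr, weight_decay=0.0):
    """One plain SGD update with L2 weight decay: w <- w - lr * (g + wd * w)."""
    return [w - lr * (g + weight_decay * w) for w, g in zip(params, grads)]

# Settings from the text: 50 epochs, batch size 8, 8000 training images.
EPOCHS, BATCH_SIZE, TRAIN_IMAGES = 50, 8, 8000
steps_per_epoch = math.ceil(TRAIN_IMAGES / BATCH_SIZE)
total_steps = EPOCHS * steps_per_epoch
print(steps_per_epoch, total_steps)  # 1000 50000

# Hypothetical learning rate and weight decay, for illustration only.
print(sgd_step([1.0, -2.0], [0.5, 0.1], lr=0.1, weight_decay=1e-4))
```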

Evaluation criteria
In order to verify the performance of the model, we utilize the following evaluation criteria: top-1 accuracy, top-5 accuracy, precision, recall, and F1 score. The top-1 accuracy is the proportion of samples whose highest-ranked predicted category matches the true label, and the top-5 accuracy is the proportion of samples whose five highest-ranked predicted categories contain the true label. The precision, $\text{precision} = \frac{TP}{TP+FP}$, indicates how many of the samples predicted as positive are correctly predicted. The recall, $\text{recall} = \frac{TP}{TP+FN}$, denotes how many of the truly positive samples are correctly predicted. TP, FP, and FN indicate the numbers of true positive, false positive, and false negative samples, respectively. The F1 score, $F1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$, is an index used in statistics to measure the accuracy of a classification model. It takes the precision and recall of the classification model into account simultaneously, and it is the harmonic mean of the two.
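These criteria are straightforward to compute from model outputs. The following Python sketch implements top-k accuracy and the precision, recall, and F1 formulas; the function names and the toy scores are illustrative assumptions, and returning 0.0 when a denominator is zero is a convention chosen for this sketch.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, y in zip(scores, labels):
        top = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += y in top
    return hits / len(labels)

def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from true positive, false positive, false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Toy example: two samples, three classes (scores are made up).
scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]]
labels = [1, 1]
print(top_k_accuracy(scores, labels, 1))  # 0.5
print(top_k_accuracy(scores, labels, 2))  # 1.0
print(precision_recall_f1(tp=8, fp=2, fn=2))
```

For a multi-class task such as CHM classification, the counts TP, FP, and FN would be gathered per class and the resulting scores averaged; the averaging scheme is not specified in this paper.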

Experiments
In this section, we describe our experiments in detail. Extensive experiments are conducted to examine the validity of the dataset and the effectiveness of the optimal model.

Evaluation of various models
To explore the performance of various models on CHM classification, we compare several popular deep neural networks on 29 classes of CHM. The compared models include ResNet, SE-ResNet, MobileNet, and EfficientNet, with results presented in Table 3. Table 3 illustrates that the EfficientNetB4 model obtains the best results, validating the effectiveness of EfficientNetB4 for CHM classification.

The necessity of hyperparameter search
To evaluate the effectiveness of hyperparameter search, the Bayesian strategy is employed to search for the optimal hyperparameters of several promising models, including MobileNetV3_large_x1_0, Xception71, ResNet50_vd, and EfficientNetB4. The comparison results of models with different parameters are shown in Figure 5: Figure 5(a) concerns the learning rate hyperparameter, and Figure 5(b) concerns the weight decay hyperparameter. Specifically, the search ranges of the learning_rate and weight_decay hyperparameters are determined by the algorithm. Figure 5 illustrates that the performance of the models is greatly improved after hyperparameter search, which validates its effectiveness.

Evaluation of different dataset scales
In order to verify the feasibility of the dataset we collected, we established it step by step and validated the performance at each scale. The rationale is that if the performance remains comparable to that on the small-scale dataset as the dataset grows, we can believe that the large dataset we constructed is valid. To be specific, we explore three dataset scales, containing 29, 50, and 100 classes of CHM. Furthermore, our experiments are based on the relatively optimal ResNet50 and EfficientNetB4 models, respectively. The comparison results are reported in Figure 6, which illustrates that, for the different dataset scales, the models obtain comparable performance in terms of top-1 accuracy, top-5 accuracy, and loss. This verifies that the data we collected is valid, even when the scale is expanded to 100 categories. In subsequent experiments, we employ the 100-category data by default.

Evaluation of ResNet model with different layers
ResNet is a relatively popular classification model. To verify the performance of different ResNet variants on the CHM data, we compared residual models with different numbers of layers, including ResNet18, ResNet34, ResNet50, ResNet101, and ResNet152. The comparison results are reported in Figure 7, which illustrates that for ResNet18, ResNet34, and ResNet50, the deeper the network, the better the performance. However, ResNet101 and ResNet152 obtain worse results than ResNet50, which may be because the dataset we established is still too small for such deep ResNet models. In short, ResNet50 is the optimal choice among the ResNet variants for our situation.

Evaluation of different EfficientNet variants
EfficientNet is a recent and promising classification model. To validate the performance of different EfficientNet variants on the CHM classification data, we compared the variants from EfficientNetB0 to EfficientNetB4. The comparison results are shown in Figure 8, which indicates that EfficientNetB4 obtains the optimal results.

Evaluation of ResNet50 and EfficientNetB4
Based on the above sections, we conclude that ResNet50 and EfficientNetB4 are the optimal models within their respective families. In order to determine the better of these two models, we compared them and present the results in Figure 9. Figure 9 illustrates that EfficientNetB4 is superior to ResNet50 in terms of the top-1 accuracy, top-5 accuracy, and loss criteria.

Discussions
CHM classification is essential to intelligent medicine, assisting people to better collect, identify, and process CHM. Furthermore, it can support better treatment of diseases such as the novel coronavirus pneumonia. However, CHM classification has not yet achieved promising performance. The reasons can be summarized as follows. First, the classification of CHM requires highly professional knowledge, and few people have this ability. Second, the previous Chinese herbal datasets are small, and the recognition models employed are not optimal.
To tackle the above-mentioned problems, a novel larger CHM dataset with 100 categories has been established. This dataset is more practical in terms of backgrounds and covered fields. Specifically, most examples in this dataset contain wild and complex backgrounds. Further, this dataset contains plant, animal, and mineral medicinal materials, and also includes root, bark, and seed medicinal materials.
In order to ensure that the established dataset is effective, we built it gradually, from small to large, with 29, 50, and 100 categories. Section 4.3 validates the performance at each scale. The results demonstrate that the datasets of different scales obtain promising and comparable performance, which verifies the feasibility of our novel dataset when it is scaled up to 100 categories.
Furthermore, we explore the effectiveness of using a Bayesian hyperparameter search strategy in training in order to improve CHM classification performance. Section 4.2 demonstrates that the results of the models are greatly improved after the hyperparameter search, which verifies its effectiveness.
To explore the effectiveness of EfficientNet for CHM classification, we design experiments from four aspects: (1) the evaluation of different models, (2) the evaluation of the ResNet model with different layers, (3) the evaluation of different EfficientNet variants, and (4) the comparison of ResNet50 and EfficientNetB4. According to the evaluation of different models, the ResNet50 and EfficientNetB4 models achieve superior results on the 29-category CHM dataset. In order to further assess the ResNet and EfficientNet models within their families, experiments (2) and (3) are carried out, which further validate the superiority of ResNet50 and EfficientNetB4 in their corresponding families (ResNet18 to ResNet152 and EfficientNetB0 to EfficientNetB4). Experiment (4) is finally used to determine the optimal model, and it shows that EfficientNetB4 is the best model for CHM classification.

Conclusions
This paper establishes a novel, relatively large CHM dataset with natural and complex backgrounds, which contains 100 categories and 10,000 samples in total. Furthermore, we are the first to apply EfficientNet to the CHM classification task. Numerous experiments have been carried out to validate the validity of the established dataset and the effectiveness of EfficientNet. Experimental results demonstrate that EfficientNetB4 is the superior classification model and obtains state-of-the-art CHM classification performance, far beyond other models (ResNet, SE-ResNet, MobileNet, Xception, etc.) in terms of all evaluation criteria. Future work will focus on constructing a larger CHM dataset with more categories and on more effective methods for classifying CHM.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Funding
This work was supported by the Shanxi Province Higher Education Innovation Project of China under grant 2020L0154 and the National Natural Science Foundation of China (NSFC) under grant 61702317.