Research on near-ground forage hyperspectral imagery classification based on fusion preprocessing process

ABSTRACT Accurate identification and classification of forage grass are pivotal in optimizing forage resources and breeding superior forage varieties. Given the low accuracy of forage image identification and classification, and the loss of some features during preprocessing, we propose an approach that integrates preprocessing operations directly into the model instead of performing them before feature analysis. We captured near-ground hyperspectral imagery of forage in the field and applied two deep learning models: Squeeze and Excitation ResNet (SEResNet) and Convolution Block Attention Module ResNet (CBAMResNet). These models not only harness the automatic learning capabilities of the deep ResNet network but also employ channel attention and a channel-plus-space dual attention mechanism, respectively, to filter and label important features. This approach enhances data extraction and analysis and strengthens the correlation between the channel and space dimensions while eliminating redundancy and noise. We compared the performance of the proposed methods with currently popular methods using six evaluation parameters, including overall accuracy (OA), average accuracy (AA) and the Kappa coefficient. Experimental results show that the OA of SEResNet and CBAMResNet is 96.57% and 98.35%, respectively. The experiments demonstrate the feasibility of incorporating preprocessing into the network and the effectiveness of this idea for forage classification research.


Introduction
China is a major animal husbandry country, but the quality of forage grass in most parts of China is relatively poor, and the yield of high-quality forage grass is seriously insufficient. More than 75% of the areas face grass shortage in winter, which limits the development of animal husbandry in China, leads to a heavy dependence on imported forage, and affects the living standards and income of people in pastoral areas (Zhang et al. 2020). Forage breeding in China started late and has developed relatively slowly, with problems such as a limited number of forage varieties and less prominent traits, so there is a certain gap with other countries in yield and quality (Jin et al. 2021). Since 2021, China's annual demand for forage feed has exceeded 10 million tons. Taking alfalfa as an example, the self-sufficiency rate of high-quality alfalfa in China is only 64%, while American alfalfa accounts for 93.5% of China's imports (Sun 2021). Given this situation, it is urgent to solve the problems of forage germplasm resources, breeding and genetics, to cultivate high-yield and high-quality forage varieties suitable for different regions and growth conditions in China, and to provide a guarantee for the high-quality development of animal husbandry (Aodeng and Liu 2019; Wang 2021).
The identification and classification of forage are not only crucial for the rational utilization and improvement of forage resources but also a key factor in developing high-quality forage varieties (Bengtsson et al. 2019; Dong et al. 2020; Li et al. 2021; Li et al. 2020). The appearance and quality of the same kind of forage can vary significantly in different soil environments and under various growth conditions, so the discrimination of forage species plays a particularly crucial role (Lei and Wang 2017). At present, most forage discrimination methods rely on manual and traditional techniques, which are time-consuming, laborious and often yield low accuracy. Therefore, rapid and accurate identification and classification of forage grass are important for the development, cultivation and production of forage resources.
Hyperspectral imagery (HSI) has gained popularity in the field of agricultural monitoring and analysis (Peng et al. 2022a; Xie et al. 2022). HSI offers a wide band range and rich spectral information. It can obtain feature information in different dimensions such as channel, space and spectrum, which strengthens the capacity to identify and detect objects, and it has been widely used in food safety, medical diagnosis, agricultural development and other fields (Jiang and Mei 2013; Li et al. 2023; Margherita et al. 2022; Mary et al. 2022; Rosalizan et al. 2022; Sajad et al. 2021; Zhang and Li 2014). However, the abundance of information in HSI also presents a challenge in pinpointing critical features. In recent years, researchers have applied deep learning to HSI to effectively extract deep feature information from the image, and have exploited massively parallel image processing to improve the analytical ability of HSI. Peng et al. (2019) proposed the GA-BPNN model, combining a back propagation neural network (BPNN) and a genetic algorithm (GA), to optimize forage resources by detecting the content of nitrogen, phosphorus and potassium in soil. Gao et al. (2018) presented a convolution neural network (CNN) HSI classification structure, which converts the one-dimensional spectral vector corresponding to each pixel in HSI into a two-dimensional spectral feature matrix, making full use of the spectral information of hyperspectral data. Liu et al. (2023) introduced a classification model based on a rotation-invariant uniform local binary pattern and a graph-based masking autoencoder, fine-tuning the convolution network with only a small number of samples to complete the classification of all nodes. Xu et al. (2023) proposed the spectral feature extraction method LGSF, using the local spectral feature extraction module LSFEM and the global spectral feature extraction module GSFEM to mitigate the defects of long-distance spectra. Qing et al. (2022) proposed 3DSA-MFN based on 3D multi-head self-attention, using convolution kernels of different sizes to extract features and combining the spatial and spectral features of the feature map to improve the 3D multi-head self-attention mechanism. Dhande and Malik (2023) proposed the deep convolution network DCN, which integrates near-field and far-field images to address the dataset-specific performance gap and improves the efficiency of multimodal enhanced image classification through correlation analysis.
The advancement of deep learning has led to the application of various models for the identification and classification of HSI. These models include generative adversarial networks (GAN), Visual Geometry Group Network (VGGNet), recurrent neural networks (RNN), residual network (ResNet), etc. Through the idea of identity mapping, ResNet alleviates the vanishing and exploding gradient problems caused by increasing the number of layers, effectively enhances feature extraction from images, greatly reduces the training time and improves the classification accuracy (He et al. 2015; Li et al. 2022b; Peng et al. 2022b). Arshi and Virendra (2022) proposed a ResNet152V2-based face recognition model and demonstrated its performance on images containing various changes in pose, occlusion and more, achieving a recognition accuracy of 97% on the AT&T dataset. Mohammed, Tannouche, and Ounejjar (2022) presented a deep learning approach for crop detection. They pre-trained the Faster RCNN ResNet model and achieved up to 100% accuracy in detecting peas in various datasets. Wang, Li, and Wu (2022) proposed an improved model that combines an improved multi-layer perceptron (IMLP) and ResNet. Deep overparameterized convolution was added to the IMLP to increase the learnable parameters of the model and improve the convergence speed. All of the above studies improve classification performance to a certain extent.
To improve the analytical ability of HSI for a better classification effect, researchers have combined ResNet with attention mechanisms to deeply extract spatial information and spectral features in HSI. The attention mechanism can allocate resources to more important tasks under limited computational power, focusing on the features that need more attention and thereby enhancing the overall performance (Chen et al. 2021b; Chen, Fan, and Chen 2021a; Yang et al. 2021). Zhong, Huang, and Tang (2022) proposed T-RNet, adding a Transformer to ResNet to model global information and suppress background noise, which enhances the focus on the target. Zhao et al. (2022) used linear spectral mixture analysis and a spectral index method to extract image elements. They constructed a ResNet model based on the real spectral information of the image elements, which reduces the massive demand for samples in deep learning. Qing and Liu (2021) proposed a multiscale residual convolution neural network combined with the channel attention mechanism, with classification accuracies of 99.82%, 99.81%, and 99.37% on three hyperspectral datasets.
Current studies on HSI mainly use public datasets and primarily focus on the pixels of the image, although an increasing number of studies are shifting their focus toward the shapes in the image. Among publicly available remote sensing datasets, the image spatial resolution has reached 30 m. Some datasets have improved resolution, such as Indian Pines at 20 m and Pavia University at 1.3 m. However, this is still not clear enough for forage identification and does not reach the ideal spatial resolution, because individual forage plants are small and multiple forage varieties grow in a given area. To clearly observe the color, shape and other characteristics of the forage, a high-definition HSI image with a spatial resolution of 10 cm or finer is needed. Existing datasets struggle to meet these specific requirements, making it difficult to achieve high-accuracy identification and classification of forage HSI images. However, if the spatial interpretation of hyperspectral imagery can be strengthened, the identification and classification effect can be improved to some extent. To address these issues, we decided to take hyperspectral imagery of near-ground forage in the field, establish an HSI image database dedicated to forage, and increase the number of samples.
At present, the application of deep learning to forage HSI images requires preprocessing, using techniques such as principal component analysis (PCA), Minimum Noise Fraction (MNF), Linear Discriminant Analysis (LDA) and other methods. These methods serve to reduce dimensionality, reduce redundancy and remove noise. However, this preprocessing often results in the loss of some features, leading to insufficient data extraction and analysis and reducing the experimental accuracy. In view of these problems, and considering the characteristics of data captured in the field, we removed the separate preprocessing step and integrated it into the network. In this context, we propose the Squeeze and Excitation ResNet model (SEResNet) and the Convolution Block Attention Module ResNet (CBAMResNet) for the identification and classification of forage HSI images.
The two models first extract the shallow features, then calculate the weight values of the features from the channel dimension (SEResNet) or from the channel plus spatial dimensions (CBAMResNet), mark the features critical to the classification, reduce the weight values of noisy and redundant data, and reduce the dimension of the data. Achieving the effect of preprocessing inside the network also strengthens the representation of features in the spatial and spectral dimensions. Subsequently, we use convolution kernels of different sizes to analyze the image features, which improves the data analysis ability. These models strengthen feature extraction and analysis, reduce the operation time, improve the identification and classification accuracy, and provide a new idea for the rapid and nondestructive detection of forage.
Our main contributions are as follows:
1. We captured high-precision forage HSI images in the field and established a dedicated forage image database.
2. We proposed a novel method that integrates the preprocessing process into the network, changing the conventional order in which data are first preprocessed and then analyzed. By calculating the importance of features to remove redundancy and noise, we achieve the effect of preprocessing.
3. We put forward SEResNet and CBAMResNet to validate the data preprocessing method in contribution 2. The models not only retain the advantages of automatic learning offered by deep networks but also improve data extraction and analysis through the attention mechanism, strengthen the correlation of the channel and space dimensions, and enhance the interpretation of spectral information.
4. We verified the effectiveness of our proposed method by comparing it with several popular preprocessing methods and other deep learning models.

Data acquisition and expansion
Most current studies on hyperspectral imagery use publicly available datasets, whose highest spatial resolution is around 1 m. However, forage plants are small, so a higher resolution is required to observe their appearance clearly, and there are currently no datasets dedicated to forage HSI images. We therefore took HSI images of near-ground grassland and built a database. From August 2020 to August 2022, we took HSI images of 10 kinds of forage in the field, such as Agropyron mongolianum, Old wheat awn and so on, at the Institute of Grassland Research of the Chinese Academy of Agricultural Sciences (CAAS). The area is dominated by grassland, consisting of gramineous and leguminous grasses.
We took near-ground images using a HyperSpec PTU-D48E hyperspectral imager with a spectral range of 400 nm to 1000 nm; the imager was placed on a tripod during capture. Before capturing, we adjusted the equipment parameters: the initial angle of the lens was set between 0° and 35°, the exposure time to 300 ms and the scanning length to 25°. The lens was kept 50 cm to 100 cm away from the forage, and each image contained only one forage species. The image acquisition process is as follows: selecting the averaging times and pixel mixing times, adjusting the exposure time and lens focal length, measuring the dark current, capturing an image W, and converting the acquired absolute image I into hyperspectral imagery X (Eq 1).
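Eq 1 is not reproduced in this text. As a rough illustration only, a commonly used black-and-white correction rescales each raw reading by the dark-current and white-reference readings. The formula in the sketch below is that common form and is an assumption, not necessarily the authors' Eq 1; the names `calibrate`, `raw`, `white` and `dark` are hypothetical.

```python
def calibrate(raw, white, dark, eps=1e-12):
    """Black-and-white correction of one spectrum (assumed common form).

    raw, white, dark: per-band readings of the scene, the white reference
    and the dark current. Returns (raw - dark) / (white - dark) per band;
    eps guards against division by zero.
    """
    return [(i - d) / max(w - d, eps) for i, w, d in zip(raw, white, dark)]


# Toy three-band example with made-up digital numbers.
reflectance = calibrate(raw=[3000.0, 2000.0, 1000.0],
                        white=[5000.0, 4000.0, 3000.0],
                        dark=[1000.0, 1000.0, 500.0])
print(reflectance)
```

In a real pipeline the same operation would be applied to every pixel of the 1004 × 972 × 125 cube rather than to a single spectrum.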
It takes about 4 min to capture an image, which is scanned from left to right. Variations in light, wind speed and weather during capture affect imaging quality, causing blurring, shadows, displacement and other problems, so we captured images between 9 am and 2 pm, with sufficient light and no wind. In practice, only black and white stripes can be seen before capturing, so it is critical to adjust the clarity of the image. We set the exposure time to 300 ms and selected the relatively clear black stripe to read the initial DN value, generally between 8,000 and 10,000. At this point the image is very blurred, so the optimal exposure time and focal length need to be set. After the black stripe becomes clear through repeated adjustment, the DN value is checked again. A value that has dropped to about 3,000 indicates high image clarity, and the image can be collected. After a long period of shooting a large number of images and removing low-definition ones, we finally obtained 75 near-ground forage HSI images of 1004 × 972 × 125, in which 1004, 972 and 125 correspond to width, height and number of channels (bands), respectively. Each image has a file size of approximately 500 MB and a spatial resolution of 6.5 cm. Figure 1 shows the false color images of the 10 forage species, composed of bands 11, 32 and 53, with wavelengths of 448, 549 and 650 nm, respectively.
Existing studies on HSI images mostly use publicly available datasets such as Indian Pines and Pavia University, and most of them focus on pixels and perform identification and classification at the pixel level (Divyesh, Ajay, and Onkar 2019; Kumar, Boggavarapu, and Prabukumar 2021; Reshma, Sowmya, and Soman 2016). Some studies focus on the shapes in the image itself, and we likewise focus on the shape, color, size and other characteristics of the forage images. In addition, our data were taken in the field, so the 10 kinds of forage could not all be included in one image; each image contains only one kind of forage, so the number of samples was insufficient. Each image has a file size of approximately 500 MB, so experimenting on the full images directly would consume a significant amount of memory and slow down the experiments. Therefore, we use cropping and rotation operations to expand the number of samples and reduce the image size. We set a 40 × 40 cropping box, select one HSI image of each forage and randomly cut out 1000 patches; we then select another image of each of the 10 forage species and rotate the patches by 90° and 180° after cutting, finally obtaining 20,000 forage images of size 40 × 40 × 125. Table 1 contains the forage data information.
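The cropping and rotation steps can be sketched as follows. This is a minimal illustration on a toy cube, not the authors' code: the function names are hypothetical, the rotation direction (clockwise) is an assumption, and the cube is stored as a nested list indexed as rows, columns, bands.

```python
import random


def random_crop(image, size, rng):
    """Cut a size x size patch (all bands kept) from a rows x cols x bands nested list."""
    height, width = len(image), len(image[0])
    top = rng.randrange(height - size + 1)
    left = rng.randrange(width - size + 1)
    return [row[left:left + size] for row in image[top:top + size]]


def rotate90(patch):
    """Rotate a patch by 90 degrees clockwise in the spatial plane."""
    return [list(col) for col in zip(*patch[::-1])]


def rotate180(patch):
    """Rotate a patch by 180 degrees in the spatial plane."""
    return [row[::-1] for row in patch[::-1]]


# Toy stand-in for one HSI cube: 8 x 10 pixels with 3 bands
# (the real images are 1004 x 972 pixels with 125 bands, cropped to 40 x 40).
rng = random.Random(0)
cube = [[[rng.random() for _ in range(3)] for _ in range(10)] for _ in range(8)]

patch = random_crop(cube, size=4, rng=rng)
augmented = [patch, rotate90(patch), rotate180(patch)]
print(len(augmented), len(patch), len(patch[0]), len(patch[0][0]))
```

Both rotations act only on the two spatial axes, so the spectral vector of every pixel is left untouched.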

Squeeze and Excitation ResNet (SEResNet)
Hyperspectral images are multi-channel three-dimensional images with a large number of channel features, which makes them more expressive. The Squeeze and Excitation (SE) module can extract more of these features. Devassy et al. (2023) proposed a CNN model with an SE module based on a hybrid compression method, which adjusts the scale of each channel appropriately by promoting meaningful feature maps and suppressing less important features. Xu et al. (2022) proposed a few-sample learning model based on dual-pooling squeeze and excitation attention (dSE). dSE adopts two pooling methods to emphasize the channels that respond to the foreground target, while using pixel descriptors and channel descriptors to capture locally identifiable channel features and pixel features. Rajendran, Prajoona, and Amutharaj (2022) proposed a deep feature extraction model for HSI classification, SE-CNN, which can extract both spatial and spectral features from HSI data. The squeeze and excitation blocks improve the representation quality of the CNN. All the above studies combined an advanced model with the SE module to improve model performance.
To make full use of the channel features, we propose SEResNet, which strengthens the ability of the deep neural network to learn and analyze features independently. SEResNet is shown in Figure 2. After the image is cropped, it first enters a 7 × 7 convolution layer that extracts image features, and then the Squeeze and Excitation operations are performed. The squeeze operation applies global average pooling to the feature map to obtain compressed features, one value per channel. The excitation operation then produces a weight value for each channel, which is learned during training according to the loss value. The correlation between channels is constructed using two fully connected (FC) layers: the first FC layer reduces the feature dimension, and after ReLU activation the second FC layer outputs the weight values. The two FC layers not only enhance the nonlinear mapping ability but also better fit the inter-channel correlations and reduce the influence of feature position on the classification results. Finally, the sigmoid function normalizes the weight of each channel, and the input feature map is multiplied by the obtained weights to output the weighted features. Through the above steps, we not only calculate the importance of the features but also mark the features. The marked features are then passed to the first ResBlock, composed of three residual blocks, to analyze the features, screen the important ones and remove those with small weight values. Three further ResBlocks follow for additional training and analysis of the weighted features: the second ResBlock consists of 4 residual blocks, the third of 23 residual blocks, and the fourth of 3 residual blocks. Finally, after global average pooling, two Dense layers and one softmax layer output the result, with dropout added between the Dense layers to prevent overfitting.
The calculation process of the SEBlock is given in Eq 2 to Eq 5. F_tr is the convolution operation, v_c is the c-th convolution kernel, v_c^s is the s-th channel of the c-th convolution kernel, and x^s is the s-th channel of the input feature map. F_sq converts the feature maps from [H, W, C] to [1, 1, C] by global average pooling. The weight values of the feature channels are then generated: W_1 and W_2 are the weights of the FC layers, r is the scaling parameter, and s characterizes the weights of the C feature maps in tensor U. Finally, the normalized weight s_c is multiplied by the feature map matrix u_c to obtain the final result x̃_c.
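Eq 2 to Eq 5 are not reproduced in this text. Assuming they follow the standard squeeze-and-excitation formulation, which matches the symbol definitions above, they can be written as below, where δ denotes ReLU, σ denotes the sigmoid, W_1 has shape (C/r) × C and W_2 has shape C × (C/r). The exact form in the original may differ.

```latex
\begin{align}
u_c &= F_{tr}(X) = v_c * X = \sum_{s} v_c^{s} * x^{s} \\
z_c &= F_{sq}(u_c) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i, j) \\
s &= F_{ex}(z, W) = \sigma\left(W_2\, \delta(W_1 z)\right) \\
\tilde{x}_c &= F_{scale}(u_c, s_c) = s_c\, u_c
\end{align}
```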
After incorporating the SE module, the network is able to capture more nonlinear relationships and better fit the correlation between channels, while reducing the number of parameters and computational requirements. SEResNet focuses on the features most critical to the task by identifying and analyzing the key features with large weights, while suppressing noise, background interference and redundant information by assigning them small weights, which realizes the effect of preprocessing. It then learns and analyzes these features through the deep network to enhance feature analysis and improve the identification performance for forage.
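The channel weighting performed by the SE module can be sketched in a few lines. This is a minimal illustration with random toy weights and no biases, not the authors' implementation; `se_block`, `w1` and `w2` are hypothetical names, and the feature map is stored as C channels of H × W values.

```python
import math
import random


def se_block(feature_map, w1, w2):
    """Squeeze-and-excitation over a feature map stored as C channels of H x W.

    w1: (C // r) x C weights of the first FC layer (dimension reduction).
    w2: C x (C // r) weights of the second FC layer (dimension restore).
    Biases are omitted to keep the sketch short.
    """
    # Squeeze: global average pooling turns each H x W channel into one number.
    z = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_map]
    # Excitation: FC -> ReLU -> FC -> sigmoid gives one weight in (0, 1) per channel.
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    s = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden)))) for row in w2]
    # Scale: multiply every value of a channel by that channel's weight.
    scaled = [[[s_c * v for v in row] for row in ch] for s_c, ch in zip(s, feature_map)]
    return scaled, s


rng = random.Random(0)
C, H, W, r = 4, 3, 3, 2   # toy sizes; r is the reduction ratio
x = [[[rng.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
w1 = [[rng.uniform(-1, 1) for _ in range(C)] for _ in range(C // r)]
w2 = [[rng.uniform(-1, 1) for _ in range(C // r)] for _ in range(C)]

y, weights = se_block(x, w1, w2)
print([round(w, 3) for w in weights])
```

Channels whose weight is close to 0 are effectively suppressed, which is the sense in which the module plays the role of a preprocessing step inside the network.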

Convolution Block Attention Module ResNet (CBAMResNet)
Convolution Block Attention Module (CBAM) is a lightweight attention mechanism in which weights are calculated along two dimensions, channel and spatial. To optimize computational efficiency, it employs a limited number of pooling layers and feature fusion operations. The weights are multiplied by the feature map for adaptive feature refinement, realizing a sequential attention structure from channel to space (Cao et al. 2020; Pang 2022). Spatial attention directs the neural network to focus on the features crucial for image classification, while channel attention handles the allocation relationship among the feature map channels. Combining the two dimensions enhances the model's performance. Sheng et al. (2023) proposed an improved CF-RCNN that integrates the convolution block attention module and a feature pyramid network (FPN) into the network and improves the ability to detect objects. Yin, Chen, and Zhang (2023) proposed a hybrid CNN-Transformer architecture, CTCA-Net, which embeds the extracted image features into a token sequence and models them with a transformer. The reconstructed features are subsequently sent to the cascade decoder to be aggregated with the shallow fine-grained features, enabling the model to maintain the integrity of the changes. Zhang et al. (2023) proposed a DFL-UNet + CBAM model, in which CBAM was inserted after the effective feature layers extracted by the backbone network and after the first upsampling, which enhanced the channel characteristics of leaf spots and improved the segmentation performance for apple leaf spot. The above methods combine advanced models with CBAM to effectively improve the ability to detect and analyze objects.
CBAMResNet, shown in Figure 3, also first uses a 7 × 7 convolution layer to extract image features, after which the data enter the CBAM module. The process starts with the channel attention mechanism, which performs MaxPool and AvgPool operations to obtain two different feature descriptors. These are fed into a multi-layer perceptron (MLP) with shared weights to learn inter-channel dependencies. The MLP outputs are passed through the sigmoid function to obtain the channel weights, which are multiplied by the features to generate channel-weighted features. From these, two H × W × 1 feature maps are obtained as the input of the spatial attention mechanism. The two feature maps are stacked along the channel dimension and then reduced to a single-channel feature map by a convolution operation. Finally, the spatial weights are normalized and multiplied by the feature maps to obtain spatially weighted features. CBAM not only calculates the importance of the features and finds the key features but also assigns low weights to noise and redundancy and reduces the dimension of the image. The weighted features are then passed to the ResBlocks, and finally go through global average pooling into two Dense layers and one softmax layer to output the result. To address the inconsistency between input and output dimensions, all blocks use a linear projection.
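The role of the linear projection in a residual block can be illustrated on a single feature vector. This is a simplified sketch under stated assumptions: the residual branch F is reduced to one linear layer with ReLU (real residual blocks use stacked convolutions), and `residual_unit`, `w_branch` and `w_proj` are hypothetical names with toy values.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def residual_unit(x, w_branch, w_proj):
    """Return ReLU(F(x) + W_s x) for a single feature vector x.

    w_proj is the linear projection W_s that maps x to the output
    dimension so that the shortcut and the branch can be added.
    """
    branch = [max(0.0, v) for v in matvec(w_branch, x)]
    shortcut = matvec(w_proj, x)
    return [max(0.0, b + s) for b, s in zip(branch, shortcut)]


x = [1.0, -2.0, 0.5]                             # 3 input features
w_branch = [[0.2, 0.1, 0.0], [0.0, -0.3, 0.4]]   # residual branch, maps 3 -> 2
w_proj = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]      # projection shortcut, maps 3 -> 2
out = residual_unit(x, w_branch, w_proj)
print(out)
```

Without `w_proj` the three-dimensional input could not be added to the two-dimensional branch output, which is the dimension mismatch the projection resolves.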
The channel attention mechanism, shown in Figure 4, focuses on the importance of the features along the channel dimension. The input data has size H × W × C, and two 1 × 1 × C feature maps are first obtained by global average pooling and global maximum pooling. The two feature maps are then each sent through a two-layer MLP with shared weights, and the two outputs are added and passed through the sigmoid function to obtain weight coefficients between 0 and 1. Finally, the weight coefficients are multiplied by the feature map to output the weighted feature map.
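The channel attention steps above can be sketched as follows. This is a minimal illustration with hand-picked toy weights and no biases, not the authors' implementation; `channel_attention`, `w1` and `w2` are hypothetical names, and the feature map is stored as C channels of H × W values.

```python
import math


def channel_attention(feature_map, w1, w2):
    """CBAM-style channel attention on a feature map stored as C channels of H x W."""
    def mlp(v):
        # Shared two-layer MLP: FC -> ReLU -> FC (biases omitted).
        hidden = [max(0.0, sum(w * a for w, a in zip(row, v))) for row in w1]
        return [sum(w * h for w, h in zip(row, hidden)) for row in w2]

    flat = [[v for row in ch for v in row] for ch in feature_map]
    avg = [sum(ch) / len(ch) for ch in flat]   # global average pooling, 1 x 1 x C
    mx = [max(ch) for ch in flat]              # global max pooling, 1 x 1 x C
    # Add the two MLP outputs and squash them to (0, 1) with the sigmoid.
    weights = [1.0 / (1.0 + math.exp(-(a + m))) for a, m in zip(mlp(avg), mlp(mx))]
    refined = [[[w * v for v in row] for row in ch] for w, ch in zip(weights, feature_map)]
    return refined, weights


x = [[[0.1, 0.9], [0.4, 0.2]],      # channel 0
     [[0.5, 0.5], [0.5, 0.5]],      # channel 1
     [[0.0, 0.3], [0.8, 0.1]],      # channel 2
     [[0.7, 0.6], [0.2, 0.9]]]      # channel 3
w1 = [[0.5, -0.2, 0.1, 0.3], [-0.4, 0.6, 0.2, -0.1]]      # C = 4 -> C / r = 2
w2 = [[0.3, -0.5], [0.8, 0.1], [-0.6, 0.4], [0.2, 0.7]]   # C / r = 2 -> C = 4

refined, weights = channel_attention(x, w1, w2)
print([round(w, 3) for w in weights])
```

The same MLP is applied to both pooled descriptors, which is what "shared weights" means here.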
The spatial attention mechanism is shown in Figure 5. The output of channel attention enters the spatial attention module, which focuses on the importance of features in space. The input feature map has size H × W × C, and maximum pooling and average pooling are carried out along the channel dimension to obtain two H × W × 1 feature maps. The two feature maps are then concatenated along the channel dimension, giving a feature map of size H × W × 2, which is reduced to one channel by a convolution layer. The output feature map is H × W × 1, and the spatial weight coefficients are generated by the sigmoid function. Finally, multiplying them with the feature map yields a spatially weighted feature map. The calculation formulas of CBAM are given in Eq 6 and Eq 7, where σ is the sigmoid function, M_c is the weight of the channel dimension, F′ is the channel output feature, and M_s is the weight of the spatial dimension.
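Eq 6 and Eq 7 are not reproduced in this text. Assuming they follow the standard CBAM formulation, which matches the description above, they can be written as below, where F is the input feature map, F′ = M_c(F) ⊗ F is the channel-weighted feature, f is the convolution that reduces the two stacked maps to one channel, and [ · ; · ] denotes concatenation along the channel dimension. The exact form in the original, including the kernel size of f, may differ.

```latex
\begin{align}
M_c(F) &= \sigma\left(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\right) \\
M_s(F') &= \sigma\left(f\left([\mathrm{AvgPool}(F');\ \mathrm{MaxPool}(F')]\right)\right)
\end{align}
```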
CBAMResNet calculates the weight values along two dimensions, channel and space, increases the attention paid to important features, removes noise, shadow and other interference factors, and reduces the dimension of the data. It thereby achieves the purpose of preprocessing without losing features and better reflects the idea of integrating preprocessing into the network. By the time the data enter the ResBlocks for training after CBAM, they contain a large number of key features, which greatly improves the model's ability to analyze the characteristics of forage and significantly reduces the training time and network parameters. Compared to SENet, CBAM not only enhances the generalization ability of the baseline model, so that it better learns and aggregates information in the target area, but also strengthens attention to spatial characteristics, further strengthens the processing and analysis ability, and enables better interpretation of the data in 3D images.
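The spatial half of this two-dimensional weighting can be sketched as follows, applied to a feature map that would already have passed through channel attention. This is a minimal illustration, not the authors' implementation: `spatial_attention` and `kernel` are hypothetical names, the kernel values are toy numbers, zero padding is an assumption, and the feature map is stored as C channels of H × W values.

```python
import math


def spatial_attention(feature_map, kernel):
    """CBAM-style spatial attention on a feature map stored as C channels of H x W.

    kernel: 2 x k x k convolution weights (k odd), one k x k slice for the
    average-pooled map and one for the max-pooled map; zero padding keeps H x W.
    """
    C, H, W = len(feature_map), len(feature_map[0]), len(feature_map[0][0])
    # Pool across channels: two H x W maps (H x W x 2 once stacked).
    avg = [[sum(feature_map[c][i][j] for c in range(C)) / C for j in range(W)] for i in range(H)]
    mx = [[max(feature_map[c][i][j] for c in range(C)) for j in range(W)] for i in range(H)]
    k = len(kernel[0])
    pad = k // 2
    weights = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            acc = 0.0
            # Convolve the two stacked maps down to a single channel.
            for plane, pooled in zip(kernel, (avg, mx)):
                for di in range(k):
                    for dj in range(k):
                        ii, jj = i + di - pad, j + dj - pad
                        if 0 <= ii < H and 0 <= jj < W:
                            acc += plane[di][dj] * pooled[ii][jj]
            weights[i][j] = 1.0 / (1.0 + math.exp(-acc))   # sigmoid -> (0, 1)
    # Every channel is multiplied by the same H x W weight map.
    refined = [[[weights[i][j] * ch[i][j] for j in range(W)] for i in range(H)]
               for ch in feature_map]
    return refined, weights


x = [[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]],   # channel 0
     [[0.9, 0.8, 0.7], [0.6, 0.5, 0.4], [0.3, 0.2, 0.1]]]   # channel 1
kernel = [[[0.1] * 3 for _ in range(3)] for _ in range(2)]  # toy 2 x 3 x 3 kernel

refined, weights = spatial_attention(x, kernel)
print([[round(w, 3) for w in row] for row in weights])
```

Because the weight map has a single channel, it rescales each pixel position identically across all bands, complementing the per-channel weights of the channel attention step.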

Comparison methods
The preprocessing methods used for comparison are Principal Component Analysis (PCA), Gray Wolf Optimizer (GWO), Minimum Noise Fraction (MNF), and InterBand Redundancy Analysis (IBRA). PCA (Akbar et al. 2022) reduces the dimensionality of the data while retaining as much of the original data as possible, by projecting the original features onto the dimensions with the maximum projection information, so as to minimize the loss of information after dimension reduction. GWO (Xie et al. 2018) adaptively adjusts the convergence factors and information feedback mechanism to achieve a balance between local optimization and global search, and has the advantages of a simple structure and few parameters. MNF (Islam et al. 2023) separates the noise in the data using a two-layered principal component transformation and orders the components by signal-to-noise ratio from large to small, overcoming the impact of noise on image quality. IBRA (Morales et al. 2021) is a filtering method based on collinearity analysis between each spectral band and its neighbors; by calculating the variance inflation factor, key information is found and redundant data is eliminated. The deep learning methods used for comparison include 3DCNN, AlexNet, VGGNet, and recurrent neural networks (RNN). AlexNet (Alex, Ilya, and Geoffrey 2017) applies convolution neural networks to large-scale image datasets and deepens the network. 3DCNN (Li et al. 2022a) extracts features in three dimensions. Each feature is connected to multiple adjacent features in the previous layer to better capture spectral information and spatial features. VGGNet (Wei et al. 2021) increases the number of network layers, which strengthens the expressive ability of the network, and replaces the original large convolution kernels with multiple small convolution kernels, enhancing the ability to learn features. RNN (Kim et al. 2021) establishes weight connections between the neurons of each layer and uses internal memory to process input data of arbitrary time series, with both internal feedback connections and feed-forward connections between the processing units.

Experimental evaluation parameters
The experiments were run on a Lenovo Legion Y9000P with an Intel(R) Core i7-12700H processor, 32 GB of memory, and a 64-bit Windows 11 operating system. We used the TensorFlow 2.6.0 framework and the Python 3.9 interpreter. Initially, we set the batch size to 64 and the learning rate to 0.001 and trained for 100 epochs using the Adam optimizer, whose hyperparameters are easy to interpret and which adapts the learning rate from the running mean of the gradient and the running mean of the squared gradient. The experiments are evaluated with six indexes: overall accuracy (OA), average classification accuracy (AA), Precision, Recall, Kappa, and Time. OA is the number of correctly classified samples divided by the number of test samples; AA is the average of the per-class classification accuracies; Precision is the proportion of samples predicted as a class that truly belong to that class; Recall is the proportion of samples truly belonging to a class that are correctly predicted; the Kappa coefficient measures the statistical consistency between predicted and true labels; Time is the training time.
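The accuracy-based indexes can all be derived from a confusion matrix, as the sketch below illustrates. It assumes the usual definitions (AA as the mean per-class recall, Kappa as Cohen's kappa), which may differ in detail from the authors' computation; the function name and the 3-class matrix are made up for illustration.

```python
def classification_metrics(confusion):
    """Compute OA, AA, per-class precision/recall and Kappa from a confusion matrix.

    confusion[i][j] = number of samples of true class i predicted as class j.
    """
    n_classes = len(confusion)
    total = sum(map(sum, confusion))
    row_sums = [sum(row) for row in confusion]                                # true counts
    col_sums = [sum(row[j] for row in confusion) for j in range(n_classes)]   # predicted counts
    diag = [confusion[i][i] for i in range(n_classes)]

    oa = sum(diag) / total
    recall = [d / r if r else 0.0 for d, r in zip(diag, row_sums)]
    precision = [d / c if c else 0.0 for d, c in zip(diag, col_sums)]
    aa = sum(recall) / n_classes   # AA taken as the mean per-class accuracy
    # Agreement expected by chance, from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    kappa = (oa - expected) / (1 - expected)
    return {"OA": oa, "AA": aa, "precision": precision, "recall": recall, "kappa": kappa}


# Made-up 3-class confusion matrix, for illustration only.
cm = [[50, 2, 3],
      [4, 45, 1],
      [0, 5, 40]]
m = classification_metrics(cm)
print(round(m["OA"], 4), round(m["AA"], 4), round(m["kappa"], 4))
```

For the paper's 10 forage species the matrix would be 10 × 10, with rows and columns ordered as in the confusion matrix figures.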

Parametric analysis
The division of the dataset affects the training level, the experimental speed and the degree of fitting. We used 30%, 50%, 70%, 80%, and 90% of the dataset as the training set; the experimental results are shown in Figure 6. When the training set is small, the model learns relatively poorly from the data and the results are low; when the training set reaches 70% of the dataset, the accuracy improves significantly, indicating that the learning ability has reached a high level. However, when the training set exceeds 80%, the accuracy decreases, which indicates that excessive training samples place a certain burden on the network. We therefore set the training data to 70%.
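The dataset division can be sketched as a shuffled index split. This is an illustration only: the text does not say whether the split was random or stratified by species, so the random shuffle, the seed and the name `split_indices` are assumptions.

```python
import random


def split_indices(n_samples, train_fraction, seed=0):
    """Shuffle sample indices and split them into training and test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]


# 20,000 patches in total; 70% is the fraction chosen in the text.
train_idx, test_idx = split_indices(20_000, 0.7)
print(len(train_idx), len(test_idx))  # 14000 6000
```

The other fractions tested (30%, 50%, 80%, 90%) correspond to 6,000, 10,000, 16,000 and 18,000 training patches.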
The learning rate determines whether the objective function can converge to a local minimum and how fast gradient descent proceeds. To keep the network performing at its best, we adjust the learning rate dynamically during training using a decay strategy. We tested learning rates of 0.001, 0.0001, 0.0003, 0.0005, 2e-3 and 2e-5, and then adjusted the learning rate partway through the experiment. The experimental results are shown in Figure 7. Setting the initial learning rate to 2e-5, then 0.001 for epochs 40 to 80 and 0.0001 after epoch 80, keeps the network in the best state, with the best convergence and the highest classification accuracy.
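The staged schedule can be written as a small function of the epoch index. The three rates come from the text; the exact boundary handling (whether epochs 40 and 80 belong to the earlier or later stage, and zero-based counting) is an assumption, as is the name `learning_rate`.

```python
def learning_rate(epoch):
    """Piecewise-constant schedule; epoch is counted from 0 (assumed)."""
    if epoch < 40:
        return 2e-5   # initial stage
    if epoch < 80:
        return 1e-3   # epochs 40 to 80
    return 1e-4       # after epoch 80


schedule = [learning_rate(e) for e in range(100)]
print(schedule[0], schedule[40], schedule[80])
```

With Keras, a function of this shape could be passed to the `tf.keras.callbacks.LearningRateScheduler` callback, though the text does not state how the authors applied the schedule.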
Batch size is a crucial factor that affects both the speed of training and the accuracy of the model. If the batch size is too small, it can result in gradient oscillation, while a batch size that is too large can increase the training time and make it more likely for the model to get stuck in a local minimum. Therefore, it is important to choose an appropriate batch size. We tested batch sizes of 16, 32, 48, 64, and 100. The experimental results are shown in Figure 8. When the batch size is small, the number of iterations is large, which makes the model unstable and the classification accuracy low; when the batch size is greater than 64, the experiment is noticeably slower, indicating that the number of samples input at once burdens the network and causes the accuracy to decrease. We therefore set the batch size to 48, which gave the best experimental effect.
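The link between batch size and the number of iterations can be made concrete with a short calculation. It assumes the 14,000 training patches implied by a 70% split of the 20,000 images; the helper name is hypothetical.

```python
import math

N_TRAIN = 20_000 * 70 // 100   # 14,000 training patches (70% of 20,000)


def iterations_per_epoch(n_train, batch_size):
    """Number of weight updates needed to pass once over the training set."""
    return math.ceil(n_train / batch_size)


steps = {b: iterations_per_epoch(N_TRAIN, b) for b in (16, 32, 48, 64, 100)}
print(steps)  # {16: 875, 32: 438, 48: 292, 64: 219, 100: 140}
```

A batch size of 16 therefore needs about three times as many updates per epoch as the chosen batch size of 48.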

Results and discussion
Table 2 displays the experimental results, revealing an OA of 96.57% for SEResNet. The SE module strengthens the feature correlation between channels, finds important features by calculating channel weights, and handles redundant information and noise well. The training process is illustrated in Figure 9, and the confusion matrix is depicted in Figure 10. In the confusion matrix, the x-axis represents the predicted value and the y-axis represents the true value, the axis labels are abbreviations of the names of the 10 forage species, and the color scale of 0-500 corresponds to the number of correctly classified samples.
CBAMResNet has an OA of 98.35%, indicating superior performance compared to SEResNet. The model computes feature weights for both the channel and spatial dimensions, and then uses a linear connection to pass the channel-weighted features into the spatial dimension. This method effectively filters out noise and other interference while enhancing the analysis of crucial classification information such as texture, color, and shape, which enables the network to better learn the key information. CBAMResNet therefore expresses the idea of integrated preprocessing better. The training process is shown in Figure 11 and the confusion matrix in Figure 12, with the same axes, species abbreviations, and 0-500 color scale as in Figure 10.
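For reference, OA, AA, and the Kappa coefficient can all be derived from a confusion matrix laid out as in Figures 10 and 12 (rows as true classes, columns as predicted classes). The sketch below uses the standard definitions; the function name `classification_metrics` and the 3-class toy matrix are illustrative assumptions and are not data from the experiments.

```python
def classification_metrics(confusion):
    """Compute (OA, AA, Kappa) from a square confusion matrix.

    confusion[i][j] = number of samples of true class i predicted as class j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n_classes))

    # Overall accuracy: fraction of all samples on the diagonal.
    oa = correct / total

    # Average accuracy: mean of the per-class accuracies (recall of each true class).
    per_class = [confusion[i][i] / sum(confusion[i])
                 for i in range(n_classes) if sum(confusion[i]) > 0]
    aa = sum(per_class) / len(per_class)

    # Kappa: agreement corrected for the agreement expected by chance.
    col_sums = [sum(confusion[i][j] for i in range(n_classes)) for j in range(n_classes)]
    expected = sum(sum(confusion[k]) * col_sums[k] for k in range(n_classes)) / (total * total)
    kappa = (oa - expected) / (1 - expected)
    return oa, aa, kappa


# Hypothetical 3-class confusion matrix, for illustration only.
toy_confusion = [
    [48, 2, 0],
    [3, 45, 2],
    [1, 4, 45],
]
oa, aa, kappa = classification_metrics(toy_confusion)
print(f"OA={oa:.4f} AA={aa:.4f} Kappa={kappa:.4f}")
```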
The experiments demonstrate that both SEResNet and CBAMResNet exhibit excellent feature analysis abilities. They also show that adding channel and spatial dual attention mechanisms analyzes the features better than adding a single mechanism: it finds the information important for classification more accurately, strengthens the ability of data analysis, and successfully integrates preprocessing operations into the network.
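The channel weighting performed by an SE module can be sketched in a few lines of plain Python, following the usual squeeze (global average pooling), excitation (two fully connected layers with ReLU and sigmoid), and scale steps. This is an illustrative sketch rather than the authors' implementation: the function name `se_channel_attention`, the omission of biases, and the tiny hand-picked weights are assumptions, and a real model would learn these weights during training.

```python
import math


def se_channel_attention(feature_map, w_reduce, w_expand):
    """Apply squeeze-and-excitation channel weighting.

    feature_map: list of C channels, each an H x W list of floats.
    w_reduce:    (C/r) x C weight matrix of the bottleneck layer (biases omitted).
    w_expand:    C x (C/r) weight matrix of the expansion layer (biases omitted).
    Returns (channel_weights, reweighted_feature_map).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_map]

    # Excitation: bottleneck layer with ReLU, then expansion layer with sigmoid,
    # producing one weight in (0, 1) per channel.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w_reduce]
    weights = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
               for row in w_expand]

    # Scale: multiply every value in a channel by that channel's weight.
    scaled = [[[v * wt for v in row] for row in ch]
              for ch, wt in zip(feature_map, weights)]
    return weights, scaled


# Hypothetical 2-channel, 2x2 feature map and hand-picked weights.
features = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[3.0, 3.0], [3.0, 3.0]],
]
w_reduce = [[0.5, 0.5]]       # 2 channels -> 1 hidden unit
w_expand = [[1.0], [-1.0]]    # 1 hidden unit -> 2 channel weights
weights, scaled = se_channel_attention(features, w_reduce, w_expand)
print([round(w, 4) for w in weights])
```

Channels that receive a weight near 0 are suppressed and channels with a weight near 1 pass through almost unchanged, which is how the attention module can take over the role of a separate band-selection or denoising step. CBAM additionally applies a spatial attention map after the channel weighting, which is not shown here.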

Comparative analysis of preprocessing methods
To demonstrate the effectiveness of the method, we compared it with several current advanced preprocessing methods: PCA, GWO, MNF, and IBRA. The experiment results are shown in Table 2. The OA of PCA-ResNet is 89.52%, showing that PCA removed redundant information and interference factors and effectively reduced the dimensionality of the data. The OA of GWO-ResNet is 91.75%. GWO reduces the amount of computation through a simple structure, speeds up the experiment, and finds the bands with important characteristics. The OA of MNF-ResNet is 93.09%. MNF separated the noise in the forage HSI, reduced the interference in the image, reduced the training data for the overall model, and shortened the computation time, performing better than PCA in the experiment. The OA of IBRA-ResNet is 93.36%. IBRA eliminated redundant spectral data and frequency band data, reduced the search range of spatial features, reduced the dimensionality of the forage HSI, and saved time in analyzing the data.
All of the aforementioned methods use preprocessing to eliminate redundant data and noise, which alleviates the burden on ResNet and effectively improves classification accuracy. However, the experiments show that their accuracy is lower than that of our proposed method. Although interference and redundancy are removed, part of the information that is deleted also contains key features for classification. The network therefore does not acquire sufficient features, and its ability to analyze the forage is not at its best. The experiments show that integrating preprocessing into the network is very effective: it not only finds more features but also improves the performance of the model and helps reduce and eliminate noise.

Comparative analysis with other methods
To enhance the credibility of our research, we conducted a comparative analysis between our methods and other deep learning methods, including 3DCNN, AlexNet, VGGNet, and RNN. The experiment results are shown in Table 3; the classification accuracy and other evaluation parameters of SEResNet and CBAMResNet are the best.
The experiment results showed the following: (1) Although 3DCNN analyzes features in three dimensions and exploits the unique characteristics of HSI, its shallow network architecture results in longer training times and lower accuracy. (2) AlexNet strengthens the ability to analyze features, enhances generalization through the local neuronal activity competition mechanism, and shows good robustness. (3) We used a 16-layer VGGNet and found that the use of multiple small convolution kernels enhances feature analysis. Moreover, we observed that increasing the number of parameters also improves classification accuracy. The results demonstrate that a deep network can effectively strengthen the ability to learn and analyze. (4) RNN makes full use of the fact that multiple neurons can coexist between layers to strengthen information transmission and feature representation.
The experiment results show that SEResNet and CBAMResNet verify the feasibility of integrating the preprocessing process into the network, and that combining the attention mechanism with ResNet can effectively improve performance and strengthen the analysis of forage characteristics. Our proposed method can identify and classify forage HSI with high accuracy.

Conclusion
In this study, aiming to overcome the limitations of traditional preprocessing for effective forage identification, we proposed SEResNet and CBAMResNet for forage HSI. The experimental results showed that SEResNet and CBAMResNet achieved classification accuracies of 96.57% and 98.35% respectively, and the other evaluation parameters also yielded favorable results. Our findings suggest that our methods can effectively extract key features, suppress unimportant information, reduce training time, and improve the overall performance of forage HSI identification and classification.
Our main work is as follows: (1) We took near-ground HSI images of forage in the field and constructed an HSI dataset. (2) We proposed a new data preprocessing scheme. The separate preprocessing step was replaced by one incorporated into the model, which removes redundancy and noise by calculating the importance of features and thereby achieves the preprocessing effect. (3) We proposed SEResNet and CBAMResNet to verify the idea in (2). We highlighted the automatic learning advantage of the deep network, strengthened the correlation of the data in both the channel and spatial dimensions, and improved the model's identification and classification performance. In our upcoming work, we plan to conduct further research on HSI and use its rich information to further enhance the classification accuracy of forage.

Figure 4. Schematic diagram of channel attention mechanism.

Figure 5. Schematic diagram of spatial attention mechanism.

Figure 6. Experiment results of different training ratio data.

Figure 7. Experiment results of different learning rates.

Figure 8. Experiment results of different batch sizes.

Figure 11.Relationship between accuracy, loss and training times of CBAMResNet.

Table 1. Information table of forage hyperspectral imagery.
Figure 9. Relationship between accuracy, loss and training times of SEResNet.

Table 3. Experiment results of the conventional methods (%).