Detection of attack behaviour of pigs based on deep learning

Attack behaviour detection is a valid method to protect pig health. Because of farm conditions and illumination changes in the piggery, pigs in video images often overlap one another, which makes recognizing pig attack behaviour difficult. We propose an improved YOLOX target detection model to overcome these difficulties. The improvements of the proposed model are: (1) a normalization attention mechanism is adopted in the last block of the neck network to gain global information, and (2) the IoU loss function in YOLOX is replaced by DIoU to improve detection accuracy. The pig attack behaviours considered in this paper include ear biting, tail biting, head-to-head collision and head-to-body collision. The dataset is built from manually observed attack video segments using the inter-frame difference method. In the pig attack behaviour detection experiments, the improved YOLOX model achieves 93.21% precision, which is 5.30% higher than the original YOLOX model. The experimental results show that the improved YOLOX can detect pig attack behaviour with high precision.


Introduction
Attack behaviour (including fighting, chasing, etc.) frequently occurs in intensive pig breeding. It is harmful to the pig's body and causes infection or even death in a harsh piggery environment (He et al., 2019). Monitoring and identifying pig attack behaviours by manual observation leads to high labour costs on intensive pig farms. Moreover, the accuracy and efficiency of attack behaviour detection cannot be ensured by manual observation. Recently, electronic sensors such as electronic ear tags, motion accelerometers and pressure pads have been used to detect attack behaviour. However, electronic sensors are costly and easily damaged, and may cause stress reactions in pigs (Pandey et al., 2021). Machine vision technology can effectively solve these problems.
In recent years, there have been many research results on vision-based pig behaviour recognition (Arulmozhi et al., 2021), such as posture recognition (Xue et al., 2018; Yang et al., 2019), climbing behaviour recognition (Li et al., 2019), and eating and drinking behaviour recognition (Chen et al., 2020; Jiang et al., 2021). Some traditional image processing techniques have also been applied to pig behaviour detection. For example, in Oczak et al. (2014), attack behaviours are classified by linear discriminant analysis after extracting motion features from historical pig motion images. In Lee et al. (2016), the support vector machine method is applied to identify two abnormal behaviours of pigs after obtaining pig activity features with a Kinect depth sensor. A hierarchical clustering method is adopted in Chen et al. (2017), which first extracts acceleration features of pigs by analysing displacement changes of pig targets between adjacent key frame images, and then classifies the degree of aggressive behaviour. These machine learning methods require extracting additional features from pig images through image processing in advance, which leads to low efficiency and a heavy workload in pig attack detection.
In the past few years, deep learning has shown its superiority over traditional methods in image and vision tasks. Via the extraction and learning of features from low to high dimensions, deep learning can handle detection and recognition tasks in most cases (Ren et al., 2017). Demonstrating strong learning and generalization ability in various fields, deep learning has also been widely used in pig behaviour detection. For example, Faster R-CNN is used to locate and identify individual pigs in group-housed pens (Yang et al., 2018), and also to identify five postures (standing, sitting, sternal recumbency, ventral recumbency and lateral recumbency) and accurately locate sows in loose pens (Zheng et al., 2018). A centre loss supervision signal is introduced into Faster R-CNN training to enhance the cohesion of intra-class features and thereby improve the accuracy of posture identification for lactating sows (Xue et al., 2018). A ResNet-FPN backbone is used to improve the Mask R-CNN deep learning model for identifying the mounting behaviour of pigs (Li et al., 2019). A 3D convolutional neural network model is proposed to identify the aggressive behaviour of herd pigs (Gao et al., 2019). The YOLOv4 model is applied to detect the dietary behaviour of pigs (Jiang et al., 2021). A convolutional network model that integrates 2D and 3D convolution features is proposed to identify the postures of sows (Xue et al., 2021). The CenterNet model is improved by introducing a lightweight MobileNet and a pyramid structure and is applied to target detection of herd pigs (Fang et al., 2021). A module integrating channel and spatial attention is proposed, together with the replacement of CIoU loss by EIoU loss, for individual recognition of pigs (Ning et al., 2022). The SE attention mechanism is introduced into the ShuffleNet V2 network to give higher weight to target features, thus improving target recognition accuracy (Lu et al., 2023). In addition, the channel attention mechanism can be introduced into the backbone network to guide the model to pay more attention to the channel characteristics of pig targets under occlusion, and Yang et al. (2023) improve IoU loss to make the model better identify occluded targets.
In summary, the attention mechanism can filter image information so that the model ignores irrelevant information and pays more attention to the salient features of the target in the image. Meanwhile, compared with IoU loss, DIoU loss can accelerate the regression speed of the model and reduce the missed detection rate. Improvements to IoU loss and the attention mechanism have shown excellent performance in both target detection and simple behaviour detection of pigs. However, few studies address the advanced attack behaviour of herd pigs in an interactive state; this paper is an attempt to bridge that gap. The main contributions of this paper are: (1) The inter-frame difference method is used to preprocess the data and avoid overfitting of the model; in this study, it extracts key frames to form a dataset for detecting the attack behaviour of herd pigs. (2) The normalized attention mechanism is inserted into the YOLOX model to improve its feature extraction ability. (3) The IoU loss in YOLOX is replaced by DIoU loss to improve the detection accuracy of the model. On this basis, a deep learning target detection model based on YOLOX is constructed for detecting the attack behaviour of herd pigs, so that the difficulties caused by the complicated postures of pigs and the complex piggery environment can be overcome. By training the model, an effective detector of pig attack behaviour is obtained, and the generalization and feasibility of the algorithm are verified by prediction tests carried out on test sets from different time periods.
The paper is organized into four sections. Section 2 introduces the data and the model improvement methods. The experimental results and model performance analysis are presented in Section 3. Section 4 presents conclusions and some directions for further research.

Data acquisition and preprocessing
The experimental data were collected in July 2020 at the Fenxi pig breeding base in Linfen City, Shanxi Province. A total of nine 5-month-old pigs were selected and placed in an enclosed piggery of size 4.5 m × 4 m × 2.8 m. Data were collected with the camera at a downward tilt angle of 60 degrees. Compared with top-down and head-up angles, this angle captures more varied behaviour characteristics of pigs and avoids large-area occlusion.
During data collection, we observed daily videos of the pigs and extracted 185 video clips containing aggressive behaviours. The key frames of these clips form the dataset of the experiment. Traditional key frame extraction is often manual, which suffers from subjectivity, low efficiency and high cost. Moreover, because pig attacks happen quickly, key frames of attack behaviour can hardly be extracted effectively by hand. In addition, because normal pigs move slowly and rest for long periods, manually screened data tend to contain many near-duplicate samples across the training and test sets, which weakens model fitting and robustness. Therefore, this paper uses the inter-frame difference method to extract key frames from the video clips, avoiding the problems of manual methods.
The key frame extraction process can be described as
\[
D(i,j) =
\begin{cases}
1, & \sum_{x,y} \left| f_j(x,y) - f_i(x,y) \right| > T_i, \\
0, & \text{otherwise},
\end{cases}
\]
where $f_i(x,y)$ is the grey value of pixel $(x,y)$ in the $i$th frame and $f_j(x,y)$ is the corresponding value in the $j$th frame. $T_i$ is a threshold set to 20% of the sum of the grey values of all pixels in frame $i$. If the absolute inter-frame difference exceeds $T_i$, then $D(i,j) = 1$ and frame $j$ is a key frame relative to frame $i$; otherwise $D(i,j) = 0$. After obtaining the key frames, we apply the following two steps to produce the final experimental dataset (a code sketch of the extraction follows the steps below). (1) Adjust the pixel resolution of the input image from 1920 × 1080 to 640 × 640.
(2) Convert the dataset to the PASCAL VOC format and annotate it; pigs exhibiting attack behaviour in the images are labelled as attack, as shown in Figure 1.
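As a minimal sketch of the key frame extraction described above, the following Python code applies the inter-frame difference rule with OpenCV. The function name and the 20% threshold ratio follow the description in the text; the remaining implementation details are assumptions, not the authors' original code.

```python
import cv2
import numpy as np

def extract_key_frames(video_path: str, ratio: float = 0.2):
    """Key frame extraction by inter-frame difference: frame j is a key frame
    relative to the reference frame i when the summed absolute grey-level
    difference exceeds T_i = ratio * (sum of grey values of frame i)."""
    cap = cv2.VideoCapture(video_path)
    ok, frame = cap.read()
    if not ok:
        return []
    key_frames = [frame]
    ref = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float64)  # frame i
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float64)
        if np.abs(gray - ref).sum() > ratio * ref.sum():  # D(i, j) = 1
            key_frames.append(frame)
            ref = gray  # frame j becomes the new reference frame i
    cap.release()
    return key_frames
```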

YOLOX-DNAM detection network
In this paper, pig attack behaviour is identified with the YOLOX-DNAM deep learning detection network, which improves the YOLOX detection network by introducing both NAM and DIoU. First, a normalized attention mechanism is inserted into the last part of the neck network of YOLOX, as shown in Figure 2. This improvement suppresses insignificant weights during training, so that training maintains high performance while improving computational efficiency. Second, the IoU loss function in the original YOLOX is replaced by Distance-IoU (DIoU). This replacement makes target box regression more stable during prediction and avoids the divergence problems that IoU loss can cause during training. At the same time, DIoU effectively improves the detection accuracy of YOLOX.

YOLOX detection network
With the development of deep learning target detection technology, the YOLO series has pursued the best speed-accuracy trade-off for real-time applications, absorbing the most advanced detection techniques and optimized best practices (Gai et al., 2021; Zhang et al., 2022). At present, YOLOX (Ge et al., 2021) offers the best compromise. The YOLOX network structure includes four parts: input, backbone, neck and prediction. In the input phase, the YOLOX network scales, enhances and normalizes the input image. The backbone network of YOLOX is composed of Darknet53 and an SPP structure to achieve feature extraction and fusion. The neck uses a feature pyramid (FPN) structure to obtain feature maps for prediction. The prediction section includes three decoupled heads, each consisting of five CBLs, three convolutions and two sigmoid functions. In addition, the prediction part includes mechanisms such as anchor-free detection, SimOTA and IoU loss. The overall network structure of YOLOX corresponds to Figure 2 with the NAM part removed.
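To make the head structure concrete, here is a minimal PyTorch sketch of one decoupled head matching the five-CBL, three-convolution, two-sigmoid description above. The channel widths, activation choice and names are illustrative assumptions, not YOLOX's exact implementation.

```python
import torch
import torch.nn as nn

def cbl(c_in: int, c_out: int, k: int = 3) -> nn.Sequential:
    """Conv-BN-LeakyReLU block (the 'CBL' unit)."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, 1, k // 2, bias=False),
        nn.BatchNorm2d(c_out),
        nn.LeakyReLU(0.1, inplace=True),
    )

class DecoupledHead(nn.Module):
    """One decoupled head: a stem CBL, then separate classification and
    regression branches (5 CBLs, 3 plain convolutions, 2 sigmoids)."""

    def __init__(self, in_channels: int, num_classes: int, width: int = 256):
        super().__init__()
        self.stem = cbl(in_channels, width, k=1)                       # CBL 1
        self.cls_branch = nn.Sequential(cbl(width, width), cbl(width, width))  # CBLs 2-3
        self.reg_branch = nn.Sequential(cbl(width, width), cbl(width, width))  # CBLs 4-5
        self.cls_pred = nn.Conv2d(width, num_classes, 1)  # conv 1, followed by sigmoid
        self.reg_pred = nn.Conv2d(width, 4, 1)            # conv 2, box coordinates
        self.obj_pred = nn.Conv2d(width, 1, 1)            # conv 3, followed by sigmoid

    def forward(self, x: torch.Tensor):
        x = self.stem(x)
        cls_feat = self.cls_branch(x)
        reg_feat = self.reg_branch(x)
        cls = torch.sigmoid(self.cls_pred(cls_feat))  # sigmoid 1
        reg = self.reg_pred(reg_feat)
        obj = torch.sigmoid(self.obj_pred(reg_feat))  # sigmoid 2
        return cls, reg, obj
```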

Normalization attention mechanism
The attention mechanism has become one of the most widely used plug-ins in deep learning. The normalized attention module (NAM) (Liu et al., 2021) reduces the weights of less important features by applying a sparsity penalty to the attention module, which makes the remaining weights computationally more effective while maintaining the same performance. NAM is a lightweight and efficient attention mechanism. Following the module integration pattern of CBAM, its channel attention and spatial attention sub-modules are redesigned. NAM can be embedded at the end of each network block; for residual networks, it is embedded at the end of the residual structure. The channel attention sub-module uses the scaling factors from batch normalization, which reflect the variance of each channel and thereby indicate channel importance. The NAM structure is shown in Figure 3.
Combining the redesigned spatial and channel attention modules, a sigmoid activation function is used to enhance the expressive ability of the neural network and further strengthen feature extraction, and the optimized feature map is added to the original feature map. In this paper, NAM improves feature extraction for pigs' high-level interactive behaviour, and stronger feature extraction improves the accuracy of the overall model.
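To illustrate how the BN scaling factors act as channel weights, here is a minimal PyTorch sketch of the NAM channel attention sub-module following the published NAM design; the class name and usage are illustrative.

```python
import torch
import torch.nn as nn

class NAMChannelAttention(nn.Module):
    """Minimal NAM channel attention: normalized BN scaling factors (gamma)
    weight the channels, followed by sigmoid gating of the input features."""

    def __init__(self, channels: int):
        super().__init__()
        self.bn = nn.BatchNorm2d(channels, affine=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = x
        x = self.bn(x)
        # normalized BN scaling factors serve as per-channel importance
        gamma = self.bn.weight.abs()
        weight = gamma / gamma.sum()
        x = x * weight.view(1, -1, 1, 1)
        return torch.sigmoid(x) * residual

# usage: gate a 256-channel feature map from the neck
att = NAMChannelAttention(256)
features = torch.randn(1, 256, 20, 20)
out = att(features)  # same shape as the input
```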

DIoU loss
This paper not only adds the normalization attention mechanism, but also replaces the IoU loss in the original YOLOX model with the DIoU loss (Zheng et al., 2020). As the most commonly used loss function for target detection, IoU loss measures the regression quality of the bounding box. The IoU loss ($L_{IoU}$) can be written as
\[
L_{IoU} = 1 - \frac{|B \cap B^{gt}|}{|B \cup B^{gt}|},
\]
where $B$ and $B^{gt}$ are the prediction box and target box, respectively. DIoU introduces a penalty term on IoU that measures the distance between the centre points of the target box and the prediction box; by minimizing this centre-point distance, the bounding box converges faster. The DIoU loss ($L_{DIoU}$) is
\[
L_{DIoU} = L_{IoU} + \frac{\rho^2(b, b^{gt})}{c^2},
\]
where $b$ and $b^{gt}$ are the centre points of $B$ and $B^{gt}$, respectively, $\rho(\cdot)$ is the Euclidean distance, and $c$ is the diagonal length of the smallest box enclosing the two bounding boxes.
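To make the formula concrete, here is a minimal PyTorch sketch of the DIoU loss for axis-aligned boxes in (x1, y1, x2, y2) format; the function name and box layout are illustrative assumptions, not YOLOX's internal API.

```python
import torch

def diou_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7):
    """DIoU loss for boxes of shape (N, 4) in (x1, y1, x2, y2) format:
    L_DIoU = 1 - IoU + rho^2(b, b_gt) / c^2."""
    # intersection and union
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)
    # squared distance between box centres (rho^2)
    rho2 = ((pred[:, 0] + pred[:, 2] - target[:, 0] - target[:, 2]) ** 2
            + (pred[:, 1] + pred[:, 3] - target[:, 1] - target[:, 3]) ** 2) / 4
    # squared diagonal of the smallest enclosing box (c^2)
    cw = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    ch = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])
    c2 = cw ** 2 + ch ** 2
    return 1 - iou + rho2 / (c2 + eps)
```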

Evaluation metrics
The evaluation indicators of the deep learning network model for pig attack detection are Precision, Recall, F1, AP and mAP. Precision represents detection precision and Recall represents the detection recall rate. F1 is an overall evaluation of Precision and Recall. The calculation formulas are
\[
Precision = \frac{TP}{TP + FP}, \quad
Recall = \frac{TP}{TP + FN}, \quad
F1 = \frac{2 \times Precision \times Recall}{Precision + Recall},
\]
where true positives (TP) / false positives (FP) are the numbers of samples predicted by the model as individual pig frames whose category tag is consistent/inconsistent with the actual tag, respectively, and false negatives (FN) is the number of samples in which no individual pig is detected.
In the P-R curve, Recall is the abscissa and Precision is the ordinate; Precision is negatively correlated with Recall. AP measures the area enclosed by the P-R curve and the coordinate axes, and is given by
\[
AP = \int_0^1 p(r)\, dr,
\]
where $p$ denotes Precision and $r$ denotes Recall. The larger the area under the P-R curve, the better the model.
The mAP value can be calculated by
\[
mAP = \frac{1}{N} \sum_{i=1}^{N} AP_i,
\]
where $N$ is the number of images in the test set. The mAP value represents the average accuracy of the model and evaluates its overall detection effect on the test set.
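A minimal sketch of these metrics follows, assuming TP/FP/FN counts are already available; the all-point interpolation used for AP is one common convention and is not necessarily the exact procedure used in the paper's evaluation.

```python
import numpy as np

def precision_recall_f1(tp: int, fp: int, fn: int):
    """Precision, Recall and F1 from detection counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

def average_precision(recall: np.ndarray, precision: np.ndarray) -> float:
    """Area under the P-R curve (all-point interpolation)."""
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([1.0], precision, [0.0]))
    # make precision monotonically non-increasing from right to left
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))

# usage with illustrative counts and an illustrative P-R curve
p, r, f1 = precision_recall_f1(tp=93, fp=7, fn=10)
ap = average_precision(np.array([0.2, 0.5, 0.9]), np.array([1.0, 0.9, 0.8]))
```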

Results
The improved YOLOX model is compared with the original YOLOX model on the test set. The evaluation indexes are listed in Table 1, Figure 4 shows a part of the prediction results, and Figure 5 gives the P-R curves of pig attack detection for the different improvements to YOLOX.

Performance analysis of the normalization attention mechanism
The attention mechanism helps neural networks suppress less significant features in channels or spatial locations. Many previous studies have focused on capturing salient features through attention operations (Liu et al., 2021). These methods successfully find mutual information between different feature dimensions, but they do not consider the contribution of the weight values themselves, which can further suppress insignificant features. Therefore, in order to extract more effective information from pig attack behaviour images, we add the normalization attention mechanism to YOLOX.
When an attack between pigs occurs, the captured images are usually affected by motion blur and occlusion because of the fast motion. In the original YOLOX model, it is difficult to detect such targets accurately with simple convolution modules. In this paper, the normalization attention mechanism is added to the YOLOX model (Figure 6(a)) to improve expressive ability, the capture of different local information, and detection accuracy. Meanwhile, false detections of occluded targets are also reduced, see Figure 6(b-d). Figure 6 demonstrates the potential of mining feature representations to improve the detection confidence of the prediction box.

Analysis of DIoU loss
In the original YOLOX model, IoU loss has its advantages, but it cannot reflect the distance between the prediction box and the ground-truth box, nor the degree of their overlap. In this paper, the IoU loss is replaced by the DIoU loss to alleviate this problem. DIoU loss converges faster than IoU loss in training because it incorporates the normalized distance between the prediction box and the target box. Furthermore, in YOLOX, DIoU is used in non-maximum suppression (NMS) to further improve detection performance.
By replacing IoU loss with DIoU loss, a more significant performance gain is achieved when detecting pig attack behaviour, as shown in Figure 7.

Conclusion
In order to solve the problem of low behaviour recognition accuracy caused by motion blur and occlusion when pigs attack, this paper proposes an improved YOLOX network. By adding the normalization attention module to YOLOX and replacing the IoU loss function with DIoU, the detection precision and mAP are increased by 5.30% and 4.32%, respectively. Furthermore, this paper designs an image preprocessing method for generating deep learning datasets to improve the robustness and generalization ability of the model. Experiments show that the model can accurately detect the attack behaviour of pigs in a pig farm environment.
This paper provides ideas for realizing accurate and personalized pig health monitoring on large farms. In the future, we will use this method to improve the efficiency of pig counting and of finishing-pig feeding behaviour recognition in complex scenes, and further study methods for enhancing feature extraction while reducing network complexity, in order to build a lightweight, multi-attention network structure that improves both detection accuracy and speed. According to application requirements and resource constraints, a lightweight network balancing accuracy and speed will be built.

Figure 6. Comparisons of the normalization attention mechanism.

Table 1. Indicators of different models for pig attack detection.