The abnormal traffic detection scheme based on PCA and SSH

Network abnormal traffic detection can monitor the network environment in real time by extracting and analysing network traffic characteristics, and plays an important role in network security protection. Existing detection methods cannot fully learn the spatio-temporal characteristics of data, achieve only limited classification accuracy, and are susceptible in both detection time and accuracy to redundant data in the samples. To solve these problems, this paper proposes a network abnormal traffic detection method (PCSS) integrating principal component analysis (PCA) and the single-stage headless face detector (SSH) algorithm. PCSS applies the PCA algorithm in data preprocessing to eliminate the interference of redundant data. At the same time, PCSS combines feature fusion with SSH to enhance feature extraction from data with unclear features, effectively improving detection speed and accuracy. Simulation experiments based on the IDS2017 and IDS2012 data sets are carried out in this paper. Experimental results show that PCSS is clearly superior to other detection models in detection speed and accuracy, providing a new method for efficiently detecting traffic attacks.


Introduction
In recent years, the Internet has been widely used in various fields of social production, affecting people's work and life. Network technology not only brings convenience to people but also brings certain risks and challenges. Due to the openness of the Internet architecture, various new attacks target the vulnerabilities of network protocols and applications. Some advanced network technologies have been adopted by criminals as new means of crime, posing a serious threat to social production and national security (S. Y. Ji et al., 2014).
The traffic generated by different network intrusion behaviours differs essentially from that of users' normal behaviour, so different network intrusion behaviours can in theory be detected (Marir et al., 2018). However, due to the complex topology of the network and the vast openness of the Internet, the network traffic data generated every day is not only large in scale but also high in dimension. As a result, traditional security monitoring methods can detect only limited risk information and cannot meet the requirements of attack and defence. In addition, high-dimensional traffic data increases the time and space complexity of the security monitoring programme, causing the "curse of dimensionality".
The traditional network abnormal traffic detection method mainly realises anomaly monitoring of various indicators in the traffic by manually setting a fixed threshold. As the network environment is complex and changeable and new network attacks occur constantly, the threshold often needs to be adjusted according to the actual situation. A manually set threshold can no longer meet the needs of network protection and security, so a network abnormal traffic detection system must learn to set the threshold dynamically and autonomously. With the continuous development of deep learning, its capacity for active learning has brought new opportunities to abnormal traffic detection, and many scholars have begun to introduce deep learning into network abnormal traffic detection (Cui et al., 2019). Deep learning can automatically learn the various characteristics of traffic and classify traffic on this basis, realising the detection of abnormal traffic with surprising results. At present, applying deep learning to network abnormal traffic detection still faces many challenges. High-dimensional traffic data makes real-time detection of network abnormal traffic with deep learning difficult. In addition, data imbalance biases traditional models towards the classes with large amounts of data, resulting in inaccurate predictions. For high-dimensional data, principal component analysis (PCA) is used to reduce the dimension and improve the convergence speed of the model. To address data imbalance, feature fusion technology is introduced into the network model to fuse the low-level and high-level features extracted from the data, further improving detection accuracy.
To solve some disadvantages of current deep learning methods, this paper proposes a new abnormal traffic detection model (PCSS). In this paper, PCA (Jin et al., 2008) is introduced into traffic data (Swarna Priya et al., 2020) preprocessing to reduce the dimension of high-dimensional data and obtain the key features of data. PCSS can fully learn the temporal and spatial features of data, and on this basis, carry out feature fusion. Moreover, the SSH framework in face detection is embedded into PCSS (Najibi et al., 2017), which enhances the extraction of semantic information in the data and further increases the receptive field.
The main contributions of this paper are as follows: (1) PCA is applied to the pre-processing of traffic data in this paper to reduce the dimension of high-dimensional data, remove the redundancy in the original data, extract the main features of data, and effectively reduce the data processing time.
(2) The SSH model in the field of face detection is introduced into the network model designed in this paper, which enhances the receptive field, fully extracts the semantic information, and further strengthens the feature extraction of the data with no obvious features.
(3) The PCSS model is verified on the IDS2012 and IDS2017 datasets. The experimental results show that the PCSS model is not only fast and robust but also achieves a high recognition rate.
The rest of this paper is organised as follows. The second section reviews the related work in the field of network abnormal traffic detection. Then, the third section introduces the PCA data processing method and the PCSS network model, and the analyses of the experimental results are given in the fourth section. Finally, the fifth section concludes this paper.

Related work
This section gives a comprehensive overview and discussion of the related work, focusing on the development of network abnormal traffic detection methods, including data processing methods and network model design.

Data processing method
Effective data processing can enhance the characteristic information of data, which is conducive to the accuracy of network intrusion detection. Duque (2015) introduced data cleaning into the field of traffic data processing early on and addressed the problem of data heterogeneity by converting the attributes of data sets into numerical data and storing them in a readable form. Experiments were carried out on the NSL-KDD dataset, and accuracy was improved by using the K-means algorithm to classify the processed data. In addition, data imbalance (Sreeja, 2019) also affects the detection accuracy of abnormal network traffic. Most standard algorithms assume that the class distribution is uniform or that the standard error of the class distribution is within a controllable range. Therefore, when faced with complex unbalanced data sets, these algorithms cannot represent the distribution characteristics of the data well, resulting in low prediction accuracy. Some scholars have used random oversampling and random undersampling (Deng et al., 2020) techniques to reduce data imbalance. Random oversampling draws samples from the minority classes with replacement and adds multiple copies of them to the training data, so a single instance may be selected many times, which easily leads to overfitting. Random undersampling, in turn, randomly selects samples from the majority classes and removes them, reducing the number of majority-class examples in the transformed data; however, it discards a large amount of data (Liang, Long, et al., 2021) and makes the decision boundary between minority and majority instances more difficult to learn, which reduces classification performance. Chawla et al. (2002) proposed the synthetic minority oversampling technique (SMOTE), which generates synthetic examples by operating in the feature space.
SMOTE oversamples minority classes by interpolating between multiple minority-class instances (Deng, Shang, Cai, Zhao & Zhou, 2021), randomly selecting neighbours from the k-nearest neighbours according to the amount of oversampling required, thus effectively avoiding overfitting. However, this method has some problems: there is a degree of blindness in SMOTE's choice of nearest neighbours, and it cannot overcome the distribution problem of unbalanced data sets, which easily causes distribution marginalisation. Liang, Xiao, et al. (2021) improved SMOTE and introduced it to the field of network anomaly traffic detection to preprocess the original traffic data. On this basis, they proposed a feature selection method based on information gain (Deng, Shang, Cai, Zhao & Song, 2021), which constructs a simplified feature subset of the original traffic data set and further improves the quality of the training data, but it takes a long time. Later work combined genetic algorithms with feature selection on this basis to further enhance feature selection, which effectively reduces data dimensionality, increases detection rates, and maintains a low false-positive rate. Anderson (1980) first proposed the concept of network abnormal traffic detection in 1980, aiming to detect illegal behaviours that damage hosts without interfering with network usage. With the continuous progress of science and technology, many methods (Q. Tian et al., 2020) have been used for intrusion detection to detect attacks with good prediction accuracy and improved real-time capability. Julisch (2003) initially used the network abnormal traffic detection method to detect abnormal attacks by clustering the traffic and then detecting abnormal traffic according to a manually set threshold (X. Chen et al., 2021), which achieved a certain effect; however, current network traffic data is too complex for thresholds to be set manually (Liang et al., 2020), so this method is less scalable. At present, network anomaly traffic detection is mainly performed through supervised machine learning algorithms such as support vector machines (SVM) (W. H. Chen et al., 2005), k-nearest neighbours (KNN) (S. Zhang et al., 2017), random forests (RF) (Paul et al., 2018), Naive Bayes (NB) (Murphy, 2006), and artificial neural networks (ANN) (Manzoor & Kumar, 2017). Machine learning-based methods are the most studied network anomaly traffic detection methods. However, all of them require the characteristics of traffic data to be extracted manually to determine its type, which is time-consuming and laborious, with a high false alarm rate and a low attack traffic detection rate.

Network abnormal traffic detection method
Deep learning (LeCun et al., 2015) has good adaptive, self-organising and generalisation abilities (H. Liu et al., 2021). Therefore, it can solve the problem that traditional machine learning requires the manual design of feature thresholds (Ioffe & Szegedy, 2015). Deep learning can give detection systems higher detection efficiency, so it has been widely studied in recent years. He et al. (2017) constructed an intrusion detection system based on a convolutional neural network (CNN) and applied a generative adversarial network to synthesise attack traces, which achieved a certain effect but performed poorly on high-dimensional data. Other researchers proposed a network abnormal traffic detection method combining CNN and LSTM, which can effectively model the feature information contained in traffic data and thereby automatically extract the spatio-temporal features of traffic data. However, the data processing and anomaly detection methods mentioned above have several problems (D'Angelo et al., 2021). First, data feature extraction is not sufficient to handle the traffic data generated by the current complex network environment. Second, scholars focus on the final overall classification metrics, ignoring the per-class metrics on abnormal traffic datasets. Third, both the KDD and NSL-KDD data sets are fixed, and their feature dimensions are not high enough to adequately simulate real network traffic, so feature selection on this basis is not reasonable. Fourth, many algorithms do not consider time costs; although some mainstream anomaly traffic detection algorithms can handle unbalanced abnormal traffic data at the cost of increased detection time (H. Ji et al., 2021), they do not meet the requirements of big data computation. This paper designs a network model (PCSS) that can learn spatio-temporal features of traffic data and fuse the learned features.
PCSS learns traffic data more fully. In addition, PCSS integrates the SSH module from the field of face detection, which enhances semantic information extraction and further improves classification efficiency and accuracy. Compared with traditional work, the method proposed in this paper is less time-consuming, has high detection accuracy, and has strong practical value.

Model and method
PCA is a data analysis method mainly used to reduce the dimensionality of high-dimensional data, so as to obtain the main attributes of the data. In the original network traffic data, there are many attributes that are duplicated or meaningless for traffic detection, resulting in an excessively high data dimension, slow training, and low accuracy of traditional network models. To solve this problem, this paper introduces the PCA algorithm into traffic data processing to simplify the attributes without losing the meaning of the traffic data itself, removing noise and redundancy to obtain more typical information.
In this section, the network model PCSS is proposed and the overall process of the network abnormal traffic detection method integrating PCA and SSH is given, as shown in Figures 1 and 2, respectively. Then, the data preprocessing method and how PCA is used to reduce the dimension of traffic data are introduced in detail.

PCSS network model
In this subsection, the network model proposed is described and the related technologies are introduced in detail.

Overall model architecture
As shown in Figure 1, the data processed by the PCA algorithm is input into the network model PCSS, which is mainly composed of two parallel convolutional layers divided into three stages. Feature fusion is used to fuse the temporal and spatial features of traffic data. PCSS combines the PCA algorithm with feature fusion technology to learn traffic data more fully and effectively solve the problem of data imbalance. PCSS is mainly divided into two layers and three stages. In the first stage, the first layer and the second layer learn the attribute characteristics of traffic data from different aspects. In the learning process, the features of each layer are fused multiple times to strengthen the learning results (Cui et al., 2020). The first layer consists of two convolution layers and an SSH module, in which the stride and padding of the first convolution are set to 1, while the second convolution has padding 1 and stride 2. The SSH module was originally applied in the field of face detection; it can use different detection and recognition methods for targets of different scales, which enhances the ability to classify data. The second layer is composed of two convolution layers and a pooling layer. The stride and padding of the two convolution layers are set to 1, and the pooling layer has padding 0 and stride 2. The learning results of the first stage are then fused through feature fusion and passed through the channel to the convolution layers of the second stage. The second stage is also a two-layer structure; each layer has one convolution layer, with padding and stride set to 1. In the second stage, the learned features are further combined. The third stage is a single-layer structure composed of a global conv module, average pooling, and an FCN classifier. This stage is mainly used to further extract the learned features.
After three stages of learning, we can more fully learn the spatial characteristics of traffic. Figure 2 is the overall workflow chart of the system. It mainly includes original data dimensionality reduction, data set preprocessing, network model training and abnormal traffic detection.
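As a quick sanity check on the stride and padding settings quoted above, the resulting feature-map sizes can be traced with the standard convolution output-size formula. This is only an illustrative sketch: the input size of 32 and the helper name `conv_out` are assumptions, not taken from the paper.

```python
def conv_out(size: int, kernel: int = 3, stride: int = 1, padding: int = 1) -> int:
    """Output size of a square convolution/pooling layer: floor((n + 2p - k)/s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

# First-layer settings described above: a 3x3 conv with stride 1 / padding 1
# keeps the size, while a 3x3 conv with stride 2 / padding 1 halves it.
size = 32                                    # illustrative input size
size = conv_out(size, stride=1, padding=1)   # 32 -> 32
size = conv_out(size, stride=2, padding=1)   # 32 -> 16
print(size)
```

This is why the stride-2 convolution in the first layer (and the stride-2 pooling in the parallel branch) plays the role of downsampling, while the stride-1/padding-1 convolutions preserve resolution.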
To reduce the model's complexity, the size of all convolution kernels in the PCSS network model is set to 3 × 3; multiple stacked 3 × 3 convolution kernels can obtain the receptive field of an equivalent larger convolution kernel. For the convolution operation, assuming that χ_{i−1} is the input feature map of the current layer and the convolution kernel ω_i is 3 × 3, the current output feature χ_i can be expressed as

χ_i = ω_i ∗ χ_{i−1} + b_i,

where b_i represents the bias term. In actual network traffic, the sample distributions of different traffic types differ greatly. In a highly imbalanced traffic sample, the system can quickly identify traffic categories with large sample sizes, but detection is often poor for categories with small sample sizes. Batch processing can alleviate overfitting and speed up training by reducing the internal covariate shift of the input data, so that the model converges quickly. Batch normalisation (Ioffe & Szegedy, 2015) reduces the imbalance seen by the network and adapts better to data with small sample sizes by calculating the mean and variance of a mini-batch of samples and then normalising the data. For n input samples, the mini-batch mean μ_B and variance σ_B² are calculated and each sample is normalised accordingly:

μ_B = (1/n) Σ_{i=1}^{n} x_i,

σ_B² = (1/n) Σ_{i=1}^{n} (x_i − μ_B)²,

x̂_i = (x_i − μ_B) / √(σ_B² + ε),

where μ_B represents the mean and σ_B² the variance of all samples in the mini-batch. Sample x_i is standardised with μ_B and σ_B² to obtain x̂_i, and each input layer uses this normalisation so that all input data follow the same distribution regardless of sample differences. To improve the stability of the network model, two learnable parameters γ and β are introduced to scale and shift the features,

y_i = γ x̂_i + β,

so the model can recover the original sample distribution and enhance generalisation across different types of data.
After batch normalisation (BN), nonlinearity is introduced through an activation function. To accelerate network convergence and counter gradient explosion and vanishing gradients, all activation functions use ReLU, so the final output feature map is expressed as

χ_i = g(ω_i ∗ χ_{i−1} + b_i),

where g is the activation function.
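The BN and ReLU steps above can be sketched in pure Python as follows. This is a minimal illustration, not the paper's implementation; the epsilon value and the default γ = 1, β = 0 are conventional assumptions.

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalise a mini-batch to zero mean and unit variance, then scale/shift."""
    n = len(batch)
    mean = sum(batch) / n                          # mini-batch mean mu_B
    var = sum((x - mean) ** 2 for x in batch) / n  # mini-batch variance sigma_B^2
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]

def relu(x):
    """ReLU activation applied after BN."""
    return max(0.0, x)

normalised = batch_norm([1.0, 2.0, 3.0, 4.0])
activated = [relu(x) for x in normalised]  # negative normalised values clamp to 0
```

In the real network these operations act per channel over 2-D feature maps, but the arithmetic per element is exactly as shown.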

SSH
SSH is a face detection algorithm whose main innovations lie in multi-scale detection, the introduction of more context information, and the grouping of loss functions. Its network model is shown in Figure 3; the model structure is improved on the basis of VGG16. The SSH face detection algorithm, like YOLO (Y. Tian et al., 2019) and SSD (W. Liu et al., 2016), detects directly from the first few convolutional layers of the classification network, which differs from conventional two-stage face detection algorithms. Face detection is tested on the WIDER (Solt, 2016) data set; compared with the ResNet101 network, SSH has fewer parameters, faster speed, and higher accuracy. As the network model structure diagram shows, the SSH network model introduces different detection modules at convolutional layers of different depths to detect faces of different scales, integrating three detection modules with different structures into the network model. The detection module M2 is spliced directly onto the conv5-3 convolution, while the detection module M3 is spliced onto conv5-3 after a max-pooling layer with a stride of 2; this max-pooling operation increases the receptive field, so M3 can detect larger faces than M2. For the detection module M1, the features of conv4-3 and conv5-3 are fused to detect small-sized faces. To reduce memory consumption, the number of channels is reduced from the original 512 dimensions to 128 dimensions by a 1 × 1 convolution, and a bilinear interpolation up-sampling operation increases the size of the deeper feature map. The upsampled features are then summed element-wise with the conv4-3 features and, after passing through a 3 × 3 convolution layer, spliced into the detection module M1.
SSH combines the three detection modules on convolutional feature maps of three different depths to detect small, medium, and large faces of different scales. This article fuses SSH into PCSS, which improves the ability to learn from sample data of different sizes.
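The M1-style fusion described above (1 × 1 channel reduction, upsampling of the deeper map, element-wise summation) can be sketched with NumPy as follows. All names, shapes, and weights are illustrative assumptions, and nearest-neighbour upsampling stands in for bilinear interpolation to keep the sketch short.

```python
import numpy as np

def conv1x1(feat, out_channels, rng):
    """Reduce the channel dimension with a random 1x1 convolution (illustrative weights)."""
    c, h, w = feat.shape
    weights = rng.standard_normal((out_channels, c)) * 0.01
    return np.einsum('oc,chw->ohw', weights, feat)

def upsample2x(feat):
    """Nearest-neighbour 2x upsampling (stand-in for bilinear interpolation)."""
    return feat.repeat(2, axis=1).repeat(2, axis=2)

def fuse_m1(conv4_3, conv5_3, reduced_channels=128, seed=0):
    """Fuse a shallow and a deep feature map as in the M1 module sketch:
    1x1 channel reduction on both maps, upsample the deeper one, element-wise sum."""
    rng = np.random.default_rng(seed)
    shallow = conv1x1(conv4_3, reduced_channels, rng)
    deep = upsample2x(conv1x1(conv5_3, reduced_channels, rng))
    return shallow + deep

f4 = np.zeros((512, 8, 8))   # conv4-3-like map
f5 = np.zeros((512, 4, 4))   # conv5-3-like map at half resolution
fused = fuse_m1(f4, f5)
print(fused.shape)
```

The point of the design is that the sum combines high-resolution spatial detail (conv4-3) with deeper semantic features (conv5-3) at a common 128-channel width.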

Data processing
Data processing mainly includes data preprocessing and dimensionality reduction of high-dimensional data using PCA.

Data preprocessing
The traffic data sets used in this article are in pcap format. Since the data in a pcap file is stored in binary format, it must be preprocessed before it can be input into the network model for training. Effective data processing can enhance the characteristic information in the data and improve the accuracy of the model to a certain extent. This paper proposes a raw-stream feature extraction method that can eliminate a large amount of zero data that is useless for feature learning. PCSS inherits and improves the methods provided by Y. Zhang, Chen, Jin, et al. (2019) and related work. First, the traffic is divided into flows according to the five-tuple, and an upper limit is set on the number of data packets that each flow can contain, which increases the number of data flows contained in the attack traffic. Second, a fixed number of data packets is intercepted from each data stream; when a stream contains fewer packets than this number, forward padding is used to complete the short stream. The relevant steps are as follows: Step 1: Based on the five-tuple information, the tags provided by the dataset are compared with the original pcap file, and all malicious traffic is extracted separately and stored in a csv file. This paper draws on the data processing method of Li et al. and improves it: the continuous pcap traffic is divided into multiple discrete flow units, and the per-flow information is extracted from these discrete units.
Step 2: Create several blank arrays to store the required data and determine whether each data packet belongs to malicious traffic. If so, store the five-tuple, original data, and label in the corresponding locations; otherwise, the packet is not processed. After the data traversal is completed, all malicious traffic is stored in a csv file. The extracted abnormal traffic is divided into data streams according to the five-tuple information, and only five data packets are extracted for each data stream; packets beyond the fifth are treated as the start of a new data stream.
Step 3: Clear the interference information in the network traffic data packets, eliminate duplicate data and empty files, and read all abnormal data streams. The data is converted from hexadecimal to decimal, and the data streams are reduced according to thresholds on the number and length of data packets, so as to obtain one file for each type of malicious traffic. For the length of the trimmed packet, if it is less than 96 bytes, it is padded with zeros to 96 bytes; if it is greater than 96 bytes, the extra part is cut off. If the number of packets in a network stream is less than 5, the earlier data is repeated at the end of the stream to fill it to 480 bytes instead of introducing zero elements, which makes the data distribution more compact and reduces redundancy.
After completing the above steps, each network traffic sample contains five data packets, each distinguished by the hexadecimal number "FF". The specific steps are shown in Algorithm 1.
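The trimming and forward-padding steps above can be sketched in pure Python as follows. The 96-byte packet length and five-packet flow length follow the figures quoted in the text; the function names and the exact forward-padding scheme (repeating earlier packets in order) are assumptions, and the "FF" separator between packets is omitted for brevity.

```python
PACKET_LEN = 96       # bytes kept per packet
FLOW_PACKETS = 5      # packets kept per flow
FLOW_LEN = PACKET_LEN * FLOW_PACKETS  # 480 bytes per flow

def trim_packet(pkt: bytes) -> bytes:
    """Trim a packet to PACKET_LEN bytes, zero-padding if it is shorter."""
    return pkt[:PACKET_LEN].ljust(PACKET_LEN, b'\x00')

def build_flow(packets: list) -> bytes:
    """Keep the first FLOW_PACKETS packets of a flow; if the flow is shorter,
    repeat earlier packets (forward padding) rather than appending zeros."""
    base = [trim_packet(p) for p in packets[:FLOW_PACKETS]]
    kept = list(base)
    i = 0
    while len(kept) < FLOW_PACKETS and base:  # forward padding for short flows
        kept.append(base[i % len(base)])
        i += 1
    return b''.join(kept)

flow = build_flow([b'\x01' * 120, b'\x02' * 40])  # 2 packets -> padded to 5
print(len(flow))
```

Each resulting flow is a fixed 480-byte record, which is what makes the samples directly stackable as network-model input.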

Reduce dimensionality of high-dimensional data by PCA
The basic steps of the PCA data dimensionality reduction algorithm are as follows: (1) Normalise the original traffic data.
(2) Select the complete data set. Suppose the input data form a matrix of size m × n with elements X_{11}, X_{12}, …, X_{mn}, which is to be reduced to a lower-dimensional representation. (3) Calculate the n-dimensional mean vector of the data. (4) Compute the covariance matrix of the centred data. (5) Calculate the eigenvalues and eigenvectors of this matrix. (6) Sort the obtained eigenvectors according to the size of their corresponding eigenvalues, and then select the top d eigenvectors to form an n × d matrix M. (7) Use the matrix M to transform the samples into the new space; this is the formation of the principal components. Finally, the reduced data is saved to the specified csv file.
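The steps above can be sketched with NumPy as follows. The function signature `pca(data, dimension)` mirrors the call mentioned in the text; the implementation details are a standard sketch, not the paper's code.

```python
import numpy as np

def pca(data: np.ndarray, dimension: int) -> np.ndarray:
    """Project m x n data onto its top `dimension` principal components."""
    mean = data.mean(axis=0)                    # step (3): n-dimensional mean vector
    centred = data - mean
    cov = centred.T @ centred / len(data)       # step (4): covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # step (5): symmetric eigendecomposition
    order = np.argsort(eigvals)[::-1]           # step (6): sort eigenvectors by eigenvalue
    components = eigvecs[:, order[:dimension]]  # the n x d matrix M
    return centred @ components                 # step (7): transform into the new space

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 8))  # illustrative 100 x 8 data matrix
reduced = pca(X, 3)
print(reduced.shape)
```

By construction the variance of the projected columns is non-increasing, i.e. the first principal component captures the most variance.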

Experiment and result analysis
At the beginning of this section, the environment required for the experiments is introduced, including the hardware environment, the software environment, and the data sets used. In addition, the evaluation indicators used in the experiments and the parameter configuration during model training are presented. In the last part of this section, the experiments are described in detail.

Experimental environment
The experimental environment is shown in Table 1.

Hyperparameter settings
In PCSS, we fix the size of the convolution kernels at 3 × 3. The batch size used during training is 256, momentum is fixed at 0.9, and weight decay is set at 5 × 10⁻⁴ to prevent overfitting and the model falling into a local optimum. We use the cross-entropy loss (CrossEntropyLoss) to optimise the model parameters and the Adam optimiser to accelerate network convergence. Setting the learning rate too large or too small affects the convergence of the model and may cause it to miss the optimum. Therefore, a total of 10 epochs are designed in this article; the learning rate setting of each epoch is shown in Table 2.
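The hyperparameters above can be collected into a configuration sketch like the following. The per-epoch learning rates are placeholders (the actual values are given in Table 2), and the dictionary keys and the decay schedule are assumptions for illustration only.

```python
# Training configuration matching the hyperparameters quoted in the text.
config = {
    "kernel_size": (3, 3),
    "batch_size": 256,
    "momentum": 0.9,
    "weight_decay": 5e-4,        # regularisation against overfitting
    "loss": "CrossEntropyLoss",
    "optimizer": "Adam",
    "epochs": 10,
}

def learning_rate(epoch: int, base_lr: float = 1e-3, decay: float = 0.5) -> float:
    """Placeholder schedule: halve the base rate every three epochs.
    The real per-epoch values come from Table 2."""
    return base_lr * (decay ** (epoch // 3))

schedule = [learning_rate(e) for e in range(config["epochs"])]
print(schedule[0], schedule[-1])
```

Stepping the learning rate down over the 10 epochs lets the model take large steps early and fine-tune near convergence, which is the usual motivation for a per-epoch schedule.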

Evaluation indicators
This section introduces the standards for evaluating detection methods in abnormal network traffic detection. All evaluation indicators are obtained from the confusion matrix (Visa et al., 2011). In the confusion matrix, TP denotes a positive sample predicted by the model as positive, TN a negative sample predicted as negative, FP a negative sample predicted as positive, and FN a positive sample predicted as negative. The relevant evaluation indicators are as follows: Precision is the proportion of samples predicted as positive that are actually positive. Recall is a measure of coverage: the proportion of all positive cases that are classified as positive. The false-positive rate refers to the percentage of negative samples that are predicted incorrectly. Accuracy, the detection accuracy, is the number of correctly predicted samples divided by the number of all samples. F1_score is a weighted average of precision and recall, with a maximum value of 1 and a minimum value of 0. The closer precision, recall, and accuracy are to 1, and the closer the false-positive rate is to 0, the better the model.
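The confusion-matrix indicators above can be written out directly as pure-Python functions over the four counts; the example counts at the bottom are illustrative, not experimental results.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of predicted positives that are truly positive: TP / (TP + FP)."""
    return tp / (tp + fp)

def recall(tp: int, fn: int) -> float:
    """Fraction of actual positives that are detected: TP / (TP + FN)."""
    return tp / (tp + fn)

def false_positive_rate(fp: int, tn: int) -> float:
    """Fraction of negatives predicted incorrectly: FP / (FP + TN)."""
    return fp / (fp + tn)

def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Correct predictions over all samples."""
    return (tp + tn) / (tp + tn + fp + fn)

def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)

tp, tn, fp, fn = 90, 85, 15, 10  # illustrative counts
print(round(accuracy(tp, tn, fp, fn), 3), round(f1_score(tp, fp, fn), 3))
```

In the multi-class setting of the experiments, these quantities are computed per class from the corresponding row and column of the confusion matrix and then averaged.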

Experimental results and analysis
In this subsection, three sets of experiments are designed to verify the performance of the network abnormal traffic detection method combining PCA and SSH proposed in this paper. Ablation experiments are performed on the IDS2012 (Atli et al., 2018) and IDS2017 (Panigrahi & Borah, 2018) data sets, respectively. First, the feasibility of the PCA algorithm in the field of abnormal traffic detection, and the superiority on IDS2012 of the PCSS model combined with the PCA algorithm over other methods, are verified. Then, the superiority of the PCSS model is further verified on the IDS2017 data set.

Comparison of PCA dimensionality reduction experiments
Currently, there are many dimensionality reduction methods. To verify that PCA is the best match for the PCSS network model, experiments are carried out on the IDS2012 data set combining PCSS with different dimensionality reduction methods, and the performance is evaluated through precision, recall and other indicators. From the experimental results in Figure 4, it can be seen that the accuracy of PCSS combined with PCA or ISOMAP is much better than that of PCSS combined with the other dimensionality reduction algorithms, but the time consumed by ISOMAP is much higher than that of the PCA algorithm, which further verifies the suitability of combining the PCSS network model with PCA.
To understand the errors of the experimental results of the proposed network abnormal traffic detection algorithm combining PCA and SSH, the experimental results are made into the heat diagram shown in Figure 5. The diagonal numbers in Figure 5 represent the number of correct predictions and the remaining numbers are the number of incorrect predictions. As can be seen from the figure, the method proposed in this paper has achieved a high detection success rate for the four kinds of abnormal flow detection in IDS2012 data set, and the prediction error is controlled at a very low level, which further verifies the superiority of the method proposed in this paper.

IDS2012 data set experimental verification
To verify the ability of PCSS combined with PCA data dimensionality reduction to detect abnormal network traffic, the proposed method is compared with other current abnormal network traffic detection methods on the IDS2012 data set. The specific experimental results are shown in Figures 6 and 7. From the experimental results, it can be seen that the PCSS network model proposed in this paper achieves good results even without the PCA algorithm, and is significantly better than detection methods such as CNN and CNN_LSTM in evaluation indexes such as precision and recall, while being basically on par with detection methods such as PCCN and DT. After adding the PCA algorithm, the method proposed in this paper is clearly better than all the other detection methods mentioned in this paper.
To verify the validity of the PCA algorithm, additional experiments are designed. The PCCN network model is reproduced, and the data processed by the PCA algorithm is fed into it. As can be seen from Figures 6 and 7, after training on PCA-processed data, PCCN improves significantly on the various evaluation indicators, which again verifies the feasibility of PCA in the field of abnormal traffic detection.

Further comparative experiments
Since the IDS2012 data set only contains traffic data of four attack types, further verification is carried out on the IDS2017 data set to explore the detection performance of the proposed PCSS algorithm on unknown traffic types in a complex environment. The specific experimental results are shown in Tables 3-7, where Table 7 gives the false-positive rate comparison. It can be seen that the proposed method has the highest overall accuracy on the IDS2017 data set compared to the other abnormal traffic detection methods. The ITSN network model reaches a high detection accuracy on several categories, but its detection balance is poor and its accuracy on the remaining categories needs to be improved. The experimental results show that the PCSS network model combined with the PCA data dimensionality reduction method has high detection accuracy both for single classes and overall, and its evaluation indicators such as precision, recall, and F1_score all achieve good experimental results.

Conclusion
This paper proposes a method for detecting abnormal network traffic that integrates PCA and SSH. The method first performs PCA dimensionality reduction on the original traffic data to obtain the essential characteristics of the data and reduce redundancy, and then sends the processed data to the network combining feature fusion and SSH. The proposed PCSS method achieves nearly 100% detection on the IDS2012 and IDS2017 data sets, and its performance is much higher than that of the comparison methods.
The experimental results show that the PCSS algorithm can resist data imbalance and fully learn the inherent characteristics of network traffic data from few samples. The network abnormal traffic detection method proposed in this paper opens up a new idea for real-time protection of network security and plays a positive role in promoting the development of the network security industry. Although the method integrating PCA and SSH performs excellently in detecting abnormal network traffic, the current network is complex and changeable and new attack types emerge endlessly. Because the PCSS model is based on a closed-set assumption, it cannot meet the detection requirements of new kinds of network abnormal traffic and scales poorly; new traffic may even be incorrectly assigned to one of the training classes. Considering the practical significance of abnormal traffic detection, it is necessary to design a network model that can automatically learn new types of attacks in the network environment. To resist various types of cyberattacks, the PCSS model will be improved in future work to enhance its practical value.