Network traffic prediction of the optimized BP neural network based on Glowworm Swarm Algorithm

ABSTRACT To improve how the structure and parameters of a neural network are set, this paper proposes an algorithm that uses the Glowworm Swarm Algorithm to optimize the BP neural network. The Glowworm Swarm Algorithm is used to obtain better initial weights and thresholds for the network. This compensates for the randomness with which the BP neural network selects its connection weights and thresholds, lets the network exploit its mapping and generalization ability, and gives it fast convergence and strong learning ability. The algorithm is applied to measured network traffic and compared with the BP neural network and with the BP neural network optimized by the Glowworm Swarm Algorithm. The simulation results show that the algorithm has higher forecast accuracy, which demonstrates its feasibility and effectiveness for forecasting.


Introduction
With the rapid development of the Internet and network technology, network scale keeps growing and network topology is becoming more and more complex, so the problems of network performance and quality of service have become prominent. Users want better and more secure services from the network; at the same time, network service providers want to improve the controllability and manageability of the network so as to raise its utilization. Therefore, when network resources are limited, building a network traffic forecast model, forecasting traffic in real time and making timely control decisions or adjustments will greatly improve network performance and quality of service.
Network traffic is nonlinear, long-range dependent, time-varying, self-similar and bursty (Beran, Sherman, & Taqqu, 1995; Park & Willinger, 2000; Roughan, Veitch, & Abry, 2000), and is a typical nonlinear time series. Many methods for network traffic prediction have been studied in recent years. The traditional network traffic prediction models are the Interrupted Poisson Process (IPP) model (Heffes & Lucantoni, 1986), the Markov Modulated Fluid Model (MMFM) (Maglaris, Anastassiou, Sen, Karlsson, & Robbins, 1988), the Auto Regressive Moving Average (ARMA) model (Heyman, Tabatabai, & Lakshman, 1992) and the Auto Regressive Integrated Moving Average (ARIMA) model (Grunenfelder, Cosmas, Manthorpe, & Odinma-Okafor, 1991); all of these methods are based on linear models. However, with the rapid growth of network scale, network traffic in real environments is very complex, and a linear method cannot accurately characterize nonlinear traffic, so the prediction results are unsatisfactory and the accuracy is low. Existing research has proved that a neural network with a nonlinear structure can approximate any nonlinear function (Hornik, Stinchcombe, & White, 1989). Because of its powerful learning and mapping ability, a neural network can construct a nonlinear model from input and output data and can accurately describe the nonlinear relationship among various factors, so it is widely used in network traffic prediction (Atiya, Alym, & Parlosa, 2005; Hou & Wu, 2015; Hussein, 2001; Yu, 2013; Zhao, 2013).
CONTACT Haitao Li lhtao425@126.com
In practical applications, however, the neural network has some unavoidable defects, such as slow convergence, a tendency to fall into local extreme points and poor adaptability; in particular, the initial weights and thresholds of the neural network have a great influence on its performance (Li & Li, 2011). The Glowworm Swarm Optimization algorithm is a swarm intelligence algorithm that simulates the mating and foraging behaviour of fireflies (Krishnanand & Ghose, 2009). It has strong global optimization ability and does not require the gradient of the objective function; it is also easy to use, robust and easy to implement, which makes it well suited to optimizing the parameters of a neural network. From the viewpoint of nonlinear time series, this paper proposes an algorithm that uses the Glowworm Swarm Algorithm to optimize the BP neural network (IGSOBPNN). The algorithm uses the Glowworm Swarm Algorithm to obtain better initial weights and thresholds, which compensates for the random selection of the connection weights and thresholds of the BP neural network. This lets the BP neural network exploit its mapping and generalization ability and gives it faster convergence and strong learning ability. The method is applied to model and forecast measured network traffic, and the simulation results show that it achieves higher prediction accuracy.

Glowworm Swarm Algorithm
The basic Glowworm Swarm Algorithm is a swarm intelligence optimization algorithm proposed by Krishnanand in 2005 (Krishnanand & Ghose, 2009). It models the search and optimization process as the process by which individual fireflies attract one another and move, and it uses the objective function of the problem to quantify how good each firefly's location is. In the algorithm, the fireflies are distributed over the domain of the objective function; each individual has its own decision radius and carries its own luciferin. A firefly's brightness is determined by the objective function value at its location: the brighter the firefly, the better the objective function value at its location, and the more fireflies it attracts towards itself. The decision radius of each firefly is affected by its neighbours: when few fireflies are nearby, the decision radius grows so that more fireflies fall within it, and when many fireflies are nearby, the decision radius shrinks. In the end, most of the fireflies gather at several locations, the extreme points of the objective function (that is, the points with the optimal objective function values).
Assume that there are N fireflies, that firefly i is located at (x_i, y_i), that the corresponding objective function value is f(x_i, y_i), and that the luciferin value at that location is T_i. The decision radius of each firefly is then updated as shown in formula (1).
In the formula, R_d^i(t + 1) is the decision radius of firefly i in generation t + 1, R_s is the perception radius, β is a control parameter, n_t is the threshold on the number of neighbours around a firefly, and N_i(t) is the number of fireflies with higher luciferin within the decision radius, which is given by formula (2). In that formula, x_j(t) is the position of firefly j in generation t and l_j(t) is the luciferin value of firefly j in generation t. A neighbouring firefly must lie within the decision radius R_d^i. The probability that firefly i moves towards its neighbour firefly j is calculated by formula (3), and the firefly's location is updated by formula (4), where s is the moving step.
After firefly i has moved to its new location, its luciferin value is recalculated according to formula (5). In the formula, l_i(t + 1) is the luciferin value of firefly i at iteration t + 1; ρ ∈ (0, 1) is a constant that represents the volatilization (decay) rate of luciferin; γ is a constant that represents the renewal rate of luciferin; and f(x_i(t + 1)) is the fitness function value.
Within its set of neighbours, when firefly i finds a firefly j with a higher luciferin value, and the distance between firefly i and firefly j is less than the decision radius, firefly i moves towards firefly j with probability p_ij(t), calculated by formula (3). Its position is then updated according to formula (4) and the objective function value at the new position is computed. Finally, the luciferin value is updated according to formula (5).
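For reference, the standard GSO update rules of Krishnanand and Ghose can be written with the symbols defined above as follows. This is a sketch of the usual formulation; it is assumed, not confirmed, to correspond to formulas (1) to (5). The norm notation, the summation index k and the use of N_i(t) as a set (with |N_i(t)| as its size) are conventions introduced here for illustration.

```latex
\begin{align}
R_d^i(t+1) &= \min\bigl\{ R_s,\ \max\bigl\{ 0,\ R_d^i(t) + \beta\,\bigl(n_t - |N_i(t)|\bigr) \bigr\} \bigr\} \tag{1}\\
N_i(t) &= \bigl\{ j : \lVert x_j(t) - x_i(t) \rVert < R_d^i(t),\ l_i(t) < l_j(t) \bigr\} \tag{2}\\
p_{ij}(t) &= \frac{l_j(t) - l_i(t)}{\sum_{k \in N_i(t)} \bigl( l_k(t) - l_i(t) \bigr)} \tag{3}\\
x_i(t+1) &= x_i(t) + s\,\frac{x_j(t) - x_i(t)}{\lVert x_j(t) - x_i(t) \rVert} \tag{4}\\
l_i(t+1) &= (1-\rho)\,l_i(t) + \gamma\, f\bigl(x_i(t+1)\bigr) \tag{5}
\end{align}
```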
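The movement, luciferin and decision-radius rules described above can be sketched as one iteration of a generic GSO loop. This is a minimal illustration under stated assumptions, not the authors' implementation: it follows the standard GSO formulation (luciferin is refreshed at the start of the iteration, which is equivalent to refreshing it after the previous move), it maximizes `fitness`, and the function name `gso_step` and all default parameter values are hypothetical choices.

```python
import math
import random


def gso_step(positions, luciferin, radii, fitness, *, rho=0.4, gamma=0.6,
             beta=0.08, n_t=5, step=0.03, r_s=3.0, rng=random):
    """Run one GSO iteration and return (positions, luciferin, radii).

    All default values are illustrative assumptions, not values from the paper.
    """
    n = len(positions)
    # Luciferin update: decay the old value and add the fitness reward.
    luciferin = [(1.0 - rho) * l + gamma * fitness(x)
                 for l, x in zip(luciferin, positions)]
    new_positions, new_radii = [], []
    for i in range(n):
        # Neighbours: brighter fireflies inside the decision radius of firefly i.
        neighbours = [j for j in range(n)
                      if j != i
                      and luciferin[j] > luciferin[i]
                      and math.dist(positions[i], positions[j]) < radii[i]]
        xi = list(positions[i])
        if neighbours:
            # Move towards neighbour j with probability proportional to the
            # luciferin difference.
            weights = [luciferin[j] - luciferin[i] for j in neighbours]
            j = rng.choices(neighbours, weights=weights, k=1)[0]
            d = math.dist(xi, positions[j])
            if d > 0.0:
                xi = [a + step * (b - a) / d for a, b in zip(xi, positions[j])]
        new_positions.append(xi)
        # Decision radius update: grow with few neighbours, shrink with many,
        # and keep the result inside [0, r_s].
        new_radii.append(min(r_s, max(0.0, radii[i] + beta * (n_t - len(neighbours)))))
    return new_positions, luciferin, new_radii
```

Calling `gso_step` repeatedly until an iteration limit is reached gives the basic search loop; the brightest firefly at the end is the candidate solution.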

BP neural network
The BP neural network is a multi-layer feedforward neural network with an input layer, a hidden layer and an output layer. Following Wang and Shi (2002), consider a three-layer BP neural network with m input-layer neurons, p hidden-layer neurons and one output-layer neuron; its structure is shown in Figure 1. The network realizes the mapping f: R^m → R^1, expressed by formula (6). In the formula, c_j is the connection weight from hidden node j to the output layer, b_j is the output of hidden node j, and ε is the threshold of the output layer.
The BP neural network uses the sigmoid function f(x) = 1/(1 + e^{−x}) as the transfer function, and the output of the hidden-layer nodes is given by formula (7). In the formula, w_ij is the weight between input node i and hidden node j, and θ_j is the threshold of hidden node j. The thresholds θ_j and ε and the connection weights w_ij and c_j are obtained by training the BP neural network, so that x_{i+1} can be predicted.
The gradient descent method is the learning rule of the BP network: the weights of the network are modified along the direction of the negative gradient so that the error objective function is minimized. The objective function is given by formula (8). In the formula, E_p is the error function of the pth sample, t_pj is the jth component of the desired output of the pth sample, and o_pj is the jth network output for the pth sample.
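A forward pass through the three-layer network described above can be sketched as follows. The sketch assumes the common convention in which each threshold is subtracted from the weighted sum and the output node is linear; the function names `sigmoid` and `forward` are illustrative and not taken from the paper.

```python
import math


def sigmoid(x):
    """Transfer function f(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, w, theta, c, eps):
    """Return (output, hidden outputs) of an m-p-1 network.

    x     : list of m inputs
    w     : w[i][j] is the weight from input i to hidden node j
    theta : list of p hidden-node thresholds
    c     : list of p hidden-to-output weights
    eps   : output-layer threshold
    """
    p = len(theta)
    # Hidden outputs: b_j = f(sum_i w_ij * x_i - theta_j)  (assumed form)
    b = [sigmoid(sum(w[i][j] * x[i] for i in range(len(x))) - theta[j])
         for j in range(p)]
    # Network output: sum_j c_j * b_j - eps  (assumed form)
    y = sum(c[j] * b[j] for j in range(p)) - eps
    return y, b
```

With all weights and thresholds set to zero, every hidden node outputs 0.5, which gives a quick sanity check of the wiring.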
Training a BP network with the gradient descent method often traps the network in a local minimum, which reduces its performance. Therefore, this paper uses the Glowworm Swarm Algorithm to optimize the BP neural network.
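To make the gradient descent learning rule concrete, the sketch below trains an m-p-1 network by per-sample gradient descent on the squared error E = (t − y)^2 / 2, starting from whatever initial weights it is given. It assumes subtracted thresholds and a linear output node; the names `predict`, `train_bp` and `total_error`, the learning rate and the epoch count are illustrative assumptions.

```python
import math


def predict(x, w, theta, c, eps):
    """Forward pass: return (output, hidden outputs)."""
    b = [1.0 / (1.0 + math.exp(-(sum(w[i][j] * x[i] for i in range(len(x))) - theta[j])))
         for j in range(len(theta))]
    return sum(cj * bj for cj, bj in zip(c, b)) - eps, b


def train_bp(samples, w, theta, c, eps, lr=0.1, epochs=200):
    """Gradient descent on E = 0.5 * (t - y)^2; the inputs are not modified."""
    w = [row[:] for row in w]
    theta, c = list(theta), list(c)
    for _ in range(epochs):
        for x, t in samples:
            y, b = predict(x, w, theta, c, eps)
            delta = y - t                          # dE/dy
            for j in range(len(theta)):
                # Back-propagated error at hidden node j (uses c_j before its update).
                g = delta * c[j] * b[j] * (1.0 - b[j])
                c[j] -= lr * delta * b[j]          # dE/dc_j = delta * b_j
                theta[j] += lr * g                 # dE/dtheta_j = -g
                for i in range(len(x)):
                    w[i][j] -= lr * g * x[i]       # dE/dw_ij = g * x_i
            eps += lr * delta                      # dE/deps = -delta
    return w, theta, c, eps


def total_error(samples, w, theta, c, eps):
    """Sum of the per-sample errors 0.5 * (t - y)^2."""
    return sum(0.5 * (t - predict(x, w, theta, c, eps)[0]) ** 2 for x, t in samples)
```

Because the result depends on the starting point, the same routine can end in different local minima for different initial weights, which is the sensitivity that a global search over the initial values is meant to reduce.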

Reconstruction of phase space
Phase space reconstruction theory is the basis of chaotic time series prediction: it recovers the phase space structure of the original system from a one-dimensional time series. Packard et al. and Takens proposed the coordinate delay method, which reconstructs the phase space from time-delayed copies of the series (Packard, Crutchfield, Farmer, & Shaw, 1980; Takens, 1981). The essence of the coordinate delay method is to construct m-dimensional phase space vectors from differently delayed values of the chaotic time series. In this construction, M = n − (m − 1)τ is the number of points in the reconstructed space, τ is the delay time, and m is the embedding dimension, that is, the dimension of the reconstructed phase space.
Therefore, a signal of n data points x_1, x_2, x_3, ..., x_n forms M = n − (m − 1)τ state points in the m-dimensional phase space. Connecting these phase points gives the trajectory of the n data points in the m-dimensional phase space, which characterizes how the state of the system evolves over time.
Takens' theorem proves that if the embedding dimension satisfies m ≥ 2d + 1, where d is the dimension of the system dynamics, then the dynamics of the one-dimensional measurements in the reconstructed phase space are equivalent to those in the phase space formed by the original state variables, and the chaotic attractors in the two phase spaces are diffeomorphic. In other words, Takens' theorem guarantees that a phase space topologically equivalent to that of the original dynamical system can be reconstructed from a one-dimensional chaotic time series, which provides the basis for predicting chaotic time series. Coordinate delay phase space reconstruction has two key parameters: the embedding dimension m and the delay time τ.
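The coordinate delay construction can be illustrated with a few lines of code. The sketch assumes the usual form of the delay vector, X_i = [x_i, x_{i+τ}, ..., x_{i+(m−1)τ}], which yields exactly M = n − (m − 1)τ points; the function name `delay_embed` is illustrative.

```python
def delay_embed(x, m, tau):
    """Return the M = n - (m - 1) * tau delay vectors of the series x.

    Each vector is [x[i], x[i + tau], ..., x[i + (m - 1) * tau]].
    """
    M = len(x) - (m - 1) * tau
    if M <= 0:
        raise ValueError("series too short for this embedding dimension and delay")
    return [[x[i + k * tau] for k in range(m)] for i in range(M)]
```

For example, a series of six points embedded with m = 3 and τ = 2 gives M = 6 − 2·2 = 2 phase points.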

Calculation of embedding dimension m
According to Takens' theorem, for an ideal one-dimensional time series that is infinitely long and noise-free, the embedding dimension m and the delay time τ can take any value. In practice, however, time series are finite in length and contain noise, so the embedding dimension and the delay time cannot be chosen arbitrarily without degrading the quality of the phase space reconstruction. The purpose of determining the embedding dimension is to make the reconstructed attractor topologically equivalent to the original attractor. The Cao method is used to calculate the embedding dimension m (Cheng & Ko, 2006). For the time series {x(n)}, x_d(i) and x_d^NN(i) are the ith vector in the d-dimensional space and its nearest neighbour, and x_{d+1}(i) and x_{d+1}^NN(i) are the ith vector in the (d + 1)-dimensional space and its nearest neighbour. The average value of a(i, d) is then calculated.
If the time series is deterministic, E_1(m) stops changing once m exceeds a certain value, and that value is taken as the embedding dimension m.
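The saturation test on E_1 can be sketched as follows. The sketch follows the commonly stated form of Cao's method, which is assumed rather than confirmed to match the formulas used here: a(i, d) is the ratio of the distance between a vector and its nearest neighbour in dimension d + 1 to the same distance in dimension d, E(d) is the mean of a(i, d), and E_1(d) = E(d + 1) / E(d). The maximum norm, the skipping of zero-distance neighbours and the names `cao_E` and `cao_E1` are assumptions made for illustration.

```python
import math


def cao_E(x, d, tau):
    """Mean of a(i, d) over all vectors that exist in both d and d + 1 dimensions."""
    n = len(x) - d * tau

    def vec(i, dim):
        return [x[i + k * tau] for k in range(dim)]

    def cheb(u, v):
        # Maximum-norm distance (an assumed choice of norm).
        return max(abs(a - b) for a, b in zip(u, v))

    total, count = 0.0, 0
    for i in range(n):
        vi = vec(i, d)
        best_j, best_dist = None, math.inf
        for j in range(n):
            if j == i:
                continue
            dist = cheb(vi, vec(j, d))
            if 0.0 < dist < best_dist:      # skip identical points
                best_j, best_dist = j, dist
        if best_j is None:
            continue
        total += cheb(vec(i, d + 1), vec(best_j, d + 1)) / best_dist
        count += 1
    if count == 0:
        raise ValueError("no valid nearest neighbours found")
    return total / count


def cao_E1(x, d_max, tau):
    """Return [E1(1), ..., E1(d_max)] with E1(d) = E(d + 1) / E(d)."""
    E = [cao_E(x, d, tau) for d in range(1, d_max + 2)]
    return [E[d] / E[d - 1] for d in range(1, d_max + 1)]
```

The embedding dimension is then read off as the smallest d after which the returned values stop changing appreciably; the tolerance used to decide that is left to the user.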

Calculation of delay time τ
At present, the main methods for determining the delay time τ are the series correlation methods and the phase space geometry methods. The auto-correlation method is a series correlation method; the auto-correlation function is simple to compute, but it can only extract the linear correlation within a time series. The average displacement method is a phase space geometry method and can be linked to the correlation criterion, but its curve may fluctuate strongly around the overall trend, which introduces a certain randomness. Therefore, this paper uses the multiple correlation method, which combines the auto-correlation method with the average displacement method (Wang, Sun, Fei, & Zhu, 2007); it has low computational complexity and strong noise immunity. The calculation is as follows. The average displacement of the time series {x(n)} in the reconstructed m-dimensional phase space is <S_m^2(τ)>, given by formula (12), where N is the number of points in the observed sequence. Ignoring the differences at the edge points, the mean of x_{i+jτ}^2 is treated as a constant equal to the mean of x_i^2; substituting this into formula (12) gives an expression in which R_xx(jτ) is the auto-correlation function of the sequence at time span jτ. The multiple correlation function defined by formula (13) is given by formula (14), and the first zero of formula (14) is taken as the time delay τ.
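The delay selection rule can be sketched in code. The sketch assumes the multiple correlation function is the sum of the auto-correlations R_xx(jτ) for j = 1, ..., m − 1, that the series is mean-removed before the correlations are computed (otherwise a zero crossing may not exist), and that the "first zero" is taken as the first delay at which the sum becomes non-positive. The function names are illustrative.

```python
def autocorr(x, lag):
    """Sample auto-correlation R_xx(lag) of an already mean-removed series."""
    n = len(x) - lag
    return sum(x[i] * x[i + lag] for i in range(n)) / n


def multiple_autocorr(x, m, tau):
    """Sum of R_xx(j * tau) for j = 1, ..., m - 1 (assumed form)."""
    return sum(autocorr(x, j * tau) for j in range(1, m))


def first_zero_delay(x, m, tau_max):
    """Smallest tau in 1..tau_max at which the multiple correlation is <= 0.

    Returns None if no such delay exists in the searched range.
    """
    mean = sum(x) / len(x)
    xc = [v - mean for v in x]
    for tau in range(1, tau_max + 1):
        if multiple_autocorr(xc, m, tau) <= 0.0:
            return tau
    return None
```

The search range `tau_max` must satisfy (m − 1)·tau_max < len(x) so that every lag has at least one pair of samples.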

Basic idea
The basic idea of GSOBPNN is as follows. The network structure of the BPNN is determined from the input and output parameters, which in turn determines the coding length of each individual in the Glowworm Swarm Algorithm. Each individual in the population contains all the weights and thresholds of the BPNN. The fitness of each individual is calculated by the fitness function, and the individual with the best fitness value is found through the position update, the decision radius update and the luciferin update. The best individual obtained by the Glowworm Swarm Algorithm is assigned to the BPNN as its initial weights and thresholds, and the BPNN is then trained to obtain the predicted values with the globally optimal solution.
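The link between the network structure and the coding length can be made concrete. For a network with m inputs, p hidden nodes and one output, an individual holds m·p input-to-hidden weights, p hidden-to-output weights, p hidden thresholds and one output threshold. The sketch below assumes the parts are laid out in the order hidden-to-output weights, input-to-hidden weights, hidden thresholds, output threshold; both that layout and the function names are illustrative assumptions.

```python
def coding_length(m, p):
    """Number of real values needed to encode an m-p-1 BP network."""
    return m * p + p + p + 1


def decode(individual, m, p):
    """Split a flat real-valued individual into (w, theta, c, eps).

    Assumed layout: [c (p values) | w (m * p values, row by row) |
                     theta (p values) | eps (1 value)].
    """
    if len(individual) != coding_length(m, p):
        raise ValueError("individual length does not match the network structure")
    c = list(individual[:p])
    flat_w = individual[p:p + m * p]
    w = [list(flat_w[i * p:(i + 1) * p]) for i in range(m)]
    theta = list(individual[p + m * p:p + m * p + p])
    eps = individual[-1]
    return w, theta, c, eps
```

For instance, a network with 3 inputs and 2 hidden nodes needs individuals of length 3·2 + 2 + 2 + 1 = 11.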

Optimization of BP neural network algorithm based on Glowworm Swarm Algorithm
The basic steps of the algorithm are as follows.
Step 1: Coding. The individuals of the Glowworm Swarm Algorithm use real-number coding: each individual is represented by a string of real numbers made up of four parts, namely the connection weights between the hidden layer and the output layer, the connection weights between the input layer and the hidden layer, the hidden-layer thresholds and the output-layer threshold. Each individual contains all the weights and thresholds of the BPNN, so each individual represents one complete BPNN.
Step 2: Initialization of the algorithm parameters. Randomly generate n firefly individuals to form the initial population, and set the initial luciferin L_0 of each firefly, the perception radius R_s, the initial step length s, the maximum step size s_max, the minimum step size s_min, the luciferin evaporation rate ρ and the luciferin update rate γ. Set the iteration counter to t = 0 and set the maximum number of iterations t_max.
Step 3: Calculation of the fitness function value. Fitness is the main index that describes the quality of an individual. This paper selects the normalized root mean square error (NRMSE) as the fitness function, given by formula (15). In the formula, S is the number of prediction samples, the normalizing term is the standard deviation of the time series, t_p is the desired output of sample p, and y_p is the actual output of sample p.
Step 4: Luciferin update. Update the luciferin value of each firefly in generation t according to formula (5).
Step 5: Position update. In the Glowworm Swarm Algorithm, when firefly i finds a firefly j with higher luciferin and the distance between the two is less than the decision radius, firefly i moves towards firefly j with the probability given by formula (3). Formula (5) is then used to calculate the luciferin of each firefly, the position is updated according to formula (4), the objective function value is calculated by formula (15), and the global optimal value is updated.
Step 6: Decision radius update. Update the dynamic decision radius according to formula (1).
Step 7: Termination check. If the number of iterations exceeds the maximum number of iterations, or the accuracy required by the user is reached, stop; otherwise return to Step 4.
Step 8: Decode the best individual obtained by the Glowworm Swarm Algorithm into the connection weights and thresholds of the BPNN and use them as the initial weights and thresholds of the forecasting model. The optimal solution of the network traffic prediction is then obtained by training the BPNN forecasting network.
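The fitness evaluation in Step 3 can be sketched as below. The sketch assumes the usual definition of the normalized root mean square error, the root mean square error divided by the standard deviation of the series, and it assumes that the population standard deviation of the desired outputs is used when no value is supplied. The function name `nrmse` and the optional `sigma` argument are illustrative.

```python
import math


def nrmse(targets, outputs, sigma=None):
    """Root mean square error divided by the standard deviation of the series.

    If sigma is None, the population standard deviation of `targets` is used
    (an assumption; the series-wide value can be passed in instead).
    """
    S = len(targets)
    if sigma is None:
        mean = sum(targets) / S
        sigma = math.sqrt(sum((t - mean) ** 2 for t in targets) / S)
    mse = sum((t - y) ** 2 for t, y in zip(targets, outputs)) / S
    return math.sqrt(mse) / sigma
```

A value of 0 means a perfect forecast, and a value of 1 means the forecast is no better than always predicting the mean. Since a smaller NRMSE is better while brighter fireflies attract others, one possible choice is to feed the negated NRMSE to the luciferin update; that sign convention is an assumption, not something stated in the steps above.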

The GSOBPNN model prediction
In this paper, SNMP-based data acquisition is used to monitor the traffic on a port of a CISCO6509 device at a node of the Hubei information centre. The monitoring period is 15 to 18 August 2015 and the sampling interval is 5 min. A total of 500 data points are taken and divided into two parts: the first 400 points form the training set and the remaining 100 points form the test set. Table 1 gives the test set data. The BPNN and GSOBPNN prediction models are each used to forecast the data set. Figure 2 shows the results of the BPNN prediction model and, to facilitate comparison, Figure 3 shows the results of the GSOBPNN and IGSOBPNN prediction models on the same scale.
From Figures 2 and 3, we can see that both prediction models track the trend of the measured network traffic well, and that the GSOBPNN prediction model outperforms the BP neural network prediction model. This shows that using the Glowworm Swarm Algorithm to optimize the BP neural network for network traffic prediction is feasible and effective.

Predictive performance analysis of different embedding dimensions and time delays
Five hundred experimental data points collected by an information centre in Hubei Province are treated as a time series. The maximum Lyapunov exponent of the time series is calculated by the Wolf method, giving λ_1 = 0.0392 > 0. Therefore, the network traffic time series in this paper can be regarded as a chaotic time series, and phase space reconstruction is used to process it. The mutual information method and the Cao method are used to determine the time delay and the embedding dimension. The optimal time delay is 2 and the optimal embedding dimension is 3.
Different network structures give different prediction results, and the embedding dimension and the time delay determine the input and output of the structure. Different embedding dimensions and time delays are therefore selected for further prediction experiments. When m = 3 and τ = 1, the input is X(t) = [x(t − 2), x(t − 1), x(t)] and the output is s(t) = x(t + 1). When m = 4 and τ = 1, the input is X(t) = [x(t − 3), x(t − 2), x(t − 1), x(t)] and the output is s(t) = x(t + 1). The results of all prediction models are summarized in Table 2. The normalized root mean square error (NRMSE) is used as the evaluation index of the prediction results.
As can be seen from Table 2, the GSOBPNN method predicts well both with the optimal embedding dimension and time delay and with non-optimal ones, so the GSOBPNN method is effective for network traffic time series prediction. Table 2 also shows that either method predicts better with the optimal embedding dimension and time delay than with non-optimal ones. This shows that the embedding dimension and the time delay are also main factors affecting the prediction results.
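The way the embedding dimension and the time delay fix the network input and output can be sketched as a sample-building routine. For τ = 1 it reproduces the pairs given above, for example X(t) = [x(t − 2), x(t − 1), x(t)] with target x(t + 1) when m = 3. The extension to other delays, where the inputs are spaced τ apart and the target stays one step ahead, is an assumption, and the function name `make_samples` is illustrative.

```python
def make_samples(x, m, tau=1):
    """Build (input vector, target) pairs for one-step-ahead prediction.

    Input : [x[t - (m - 1) * tau], ..., x[t - tau], x[t]]
    Target: x[t + 1]
    """
    span = (m - 1) * tau
    samples = []
    for t in range(span, len(x) - 1):
        inputs = [x[t - (m - 1 - k) * tau] for k in range(m)]
        samples.append((inputs, x[t + 1]))
    return samples
```

The number of input nodes of the network equals m, so changing the embedding dimension changes the network structure and therefore the coding length of each individual.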

Conclusion
Network traffic is nonlinear, long-range dependent, time-varying, self-similar and bursty, so it is difficult to obtain ideal results with the traditional linear prediction models. Based on nonlinear time series, this paper uses the Glowworm Swarm Algorithm to optimize the initial connection weights and thresholds of the BP neural network, so as to make better use of the fitting and nonlinear approximation capability of the BP neural network and to obtain the best neural network prediction model. The GSOBPNN forecasting model is applied to forecast network traffic, and its prediction accuracy is compared with that of BPNN. The simulation results show that the proposed method predicts network traffic more accurately and has good application prospects in network traffic prediction.