Security situational awareness of power information networks based on machine learning algorithms

To predict the security posture of power information networks accurately, we propose a method based on machine learning algorithms for detecting their security condition. The perception problem is first abstracted into a perception model. Sample data are then pre-processed with linear discriminant analysis (LDA) to optimise the data, obtain integrated features, and determine the best projection. To assess the system's security posture and learn the mapping to network posture values, the processed data are subsequently fed into an RBF neural network as training data. Finally, simulations using the KDD Cup99 dataset and attack data from power information networks show that the suggested technique is reliable for network security posture analysis, with detection rates frequently exceeding 90%.


Introduction
Dispatching automation systems are increasingly exposed to cyber threats and attacks because of the construction of smart grids and ongoing advances in information technology (Envelope, 2022). Efficiency, global connectivity, data sharing, remote work, economic development, innovation, the IoT, critical infrastructure, communication, digital services, education, and emergency response all depend on information networks. For instance, in 2010 the Stuxnet virus attacked Iran's uranium enrichment infrastructure by exploiting previously unknown flaws in information systems. Because traditional security techniques are no longer adequate to meet protection needs, situational awareness has emerged as a novel method of cyber security evaluation and protection (Liu & Yang, 2023). DoS and DDoS attacks, malware, network congestion, insider threats, ransomware infections, data breaches, APTs, botnets, IoT vulnerabilities, phishing, and software flaws all threaten the performance of an organisation's network, so effective security measures are essential to sustain peak performance. This has made situational awareness a crucial component of industrial control system cyber security. By examining key behavioural and state data in the network, situational awareness aims to identify whether a network is at risk from cyberattacks and to quantify the security status of the system.
Situational awareness was first introduced in disciplines such as sociology and avionics. The idea of situational awareness for cyber security was first presented by (Liang et al., 2021), who also built a reference model based on the concept of data fusion for the identification, assessment, and localisation of cyberattacks. Experiments in cybersecurity situational awareness cover the collection and preparation of data as well as the assessment and analysis of algorithms, with a particular focus on studying and optimising algorithms for anomaly detection and threat identification. Situational awareness in cybersecurity is not a brand-new idea in power systems. (Xu et al., 2021) proposed a paradigm for network security situational awareness in terms of physical security, system security, and protocol security and deployed it on substation networks, but did not perform any specific quantitative calculations. (Yang et al., 2021) start from a security situational awareness approach for computer networks and combine the probability of attack occurrence, the probability of attack success, and the threat of the attack to evaluate the current state of the power grid. However, that method does not describe its feature extraction in detail, and the information extracted from the features has not been validated. For effective anomaly detection and situational comprehension, the security situational awareness framework considered here blends Linear Discriminant Analysis (LDA) and a Radial Basis Function (RBF) neural network, aiming at thorough detection, fewer false positives, efficient resource utilisation, and flexibility. Feature extraction is a crucial stage in dimensionality reduction and can be accomplished with a variety of techniques. Principal Component Analysis (PCA), which converts high-dimensional data into a lower-dimensional space while retaining as much variance as possible, is one of the most frequently used approaches. LDA instead projects high-dimensional data onto discriminative directions to obtain a lower-dimensional space. To determine the best discriminative directions, LDA computes class means and scatter matrices and then performs an eigendecomposition.
Machine learning offers a range of options, including neural networks and random forests, which open new opportunities for situational awareness of network security as artificial intelligence advances quickly (Chen et al., 2021a). (Liu et al., 2021) suggested a ball vector machine classifier approach for electric power information networks, based on a quantum genetic algorithm with optimised training parameters, for precise classification of network posture. (Chen et al., 2021b), on the other hand, increase classification accuracy by splitting the dataset into subsets and integrating training and learning in a distribution line fault classification approach based on the random forest algorithm. The K-nearest neighbour technique is also used by (Wang et al., 2021) as a classifier for intrusion detection systems to find illegitimate attacks. In another study based on machine learning algorithms, IoT network security is predicted using pre-processed data and an RBF neural network. Since its launch at China Mobile, that system has discovered over 65,000 suspected illicit IoT cards, improving efficiency, detection, and classification while lowering operator expenses and false alarm rates (Meng, 2022).
This study suggests a security situational awareness approach that combines linear discriminant analysis (LDA) and radial basis function (RBF) neural networks to integrate the strengths of existing situational awareness methods (Shu et al., 2021). Because network features are complex and diverse, feeding them directly into a neural network does not allow the network state to be understood and perceived accurately (Liu et al., 2022). The samples are therefore first pre-processed with LDA, which fuses and extracts the feature metrics and produces the projection that gives the data their best separability. The goal of state awareness is then achieved by training the RBF neural network model on the processed input to learn the mapping to the network state values (Yu et al., 2021). The RBF neural network model is adaptable and well suited to non-linear data modelling and decision-making, since it can perform pattern recognition, regression, classification, anomaly detection, and time series analysis. In related work, a Scrapy web crawler architecture was used to investigate network security situation awareness and measurement. Data were gathered from the China Computer Network Intrusion Prevention Centre's vulnerability database and the Zhiming network security event websites. A text-based analysis tool was developed to improve data cleansing and provide complete answers. Compared with conventional methods, the crawler algorithm increased capacity by 12.79% and 29.33% and decreased reading time by 63.5% and 87.2% (Wu et al., 2022).
In summary, the development of the smart grid and improvements in information technology are making dispatching automation systems more susceptible to cyber-attacks. Situational awareness, which analyses behavioural and state data, is a novel technique for assessing and safeguarding complex systems. Machine learning methods such as neural networks and random forests offer new options for situational awareness in network security. However, because of its complexity, the network state is difficult to use directly as input to a neural network. A strategy based on machine learning methods is therefore suggested to forecast the security posture of power information networks. Sample data are pre-processed with linear discriminant analysis to find integrated features and improve the quality of the data. Simulations using the KDD Cup99 dataset and attack data from power information networks show the technique's dependability.

Power information network security situational awareness methods
Network risks significantly affect grid operations in power systems because the grid is becoming more and more dependent on information networks (Huang et al., 2021). To evaluate and anticipate the cyber security posture of a system effectively, performance metrics such as the operation, traffic patterns, and status of devices in the power information network need to be continuously monitored, collected, and extracted (Li et al., 2021). With understanding and predictive capabilities, information network defence can become proactive rather than reactive, enabling the prompt implementation of efficient security measures to defend the grid against attack. The three key components of power information network security situational awareness are as follows.

Situation element extraction
In this module, the security condition and raw data of the network under attack are obtained using a variety of sensors or detection devices, and the typical indicators that have the greatest impact on the power information network are extracted from them to serve as the data foundation for the subsequent work. Using a variety of sensors and detection tools in a network under attack enables thorough monitoring, real-time analysis, and anomaly detection, and provides a full picture of the network and its security condition as well as support for forensic investigation.

Posture understanding
To map the network's security state and create a macroscopic situational awareness model, the extracted situational element information is analysed using neural networks or mathematical models to ascertain the relationship between that information and the security situation. The macroscopic situational awareness approach improves decision-making and knowledge across a variety of fields by using neural networks to extract useful information from complicated data environments.

Situation prediction
Based on the posture elements extracted from the power dispatch automation system and on the mapping model obtained in the understanding stage, the security risk assessment and prediction of the electric power information network makes qualitative or quantitative inferences about the values of the network security posture. The assessment combines qualitative and quantitative analysis to pinpoint threats, weigh consequences, and foresee weaknesses, allowing for well-informed decision-making and ongoing improvement.

Security posture level classification
The security posture of the dispatch automation system is divided into five assessment levels according to the "Information Security Technology Information Security Risk Assessment Specification" (GB/T 20984-2007), which takes into account the risk factors of the system and the threat posed by attackers. Security posture values in the interval [0, 1] are then used to describe quantitatively the system behaviour and network characteristics for each level. The specification provides an organised framework for managing and analysing information security risks inside an organisation, detailing techniques and best practices (Zhang et al., 2021). The evaluation of machine-learning-based methods for network security situational awareness is based on these security levels. To assess the security levels comprehensively and to quantify the values under the various levels with a rating scale, we combine the observed phenomena of various attacks, such as the number of active ports, the severity of virus threats, and the number of open vulnerabilities (Song et al., 2021). The network security posture scale is shown in Table 1. Through this classification of security levels and quantitative description of security posture values, we can assess the security posture of the dispatch automation system accurately and provide an effective basis for evaluating and optimising machine-learning-based network security posture awareness methods.

LDA
To extract classification information and reduce the dimensionality of the feature space, LDA projects high-dimensional samples into the best discriminative vector space (Wang et al., 2021). Linear Discriminant Analysis (LDA) is a supervised method for classification problems that improves data representation and classification performance by mapping high-dimensional data into a lower-dimensional space. Because information network features are varied and complex, LDA can be used to obtain the optimal sample projection, extract integrated features, and eliminate redundant or overly complex information. LDA reduces dimensionality while increasing class separation, producing informative features for the subsequent RBF network. With better classification accuracy and well-defined decision boundaries, the combined method is well suited to identifying complicated, non-linear patterns and anomalies. Both PCA and LDA are dimensionality reduction methods: PCA concentrates on reducing data dimensions and is flexible across tasks, whereas LDA emphasises class separation and clear class boundaries in classification tasks.

Table 1 (rows recoverable from the source; the remaining rows and the original column headings are not legible).
Level | Posture value | Risk grade | Description
(not recoverable) | (not recoverable) | (not recoverable) | With more risks and weaknesses that could have an impact, there is a higher chance that external attackers will succeed in their intrusion.
4 | 0.75∼0.9 | Moderate risk | The rise in activities such as network attacks and virus software can result in system failures and jeopardise the system's ability to run steadily.
5 | 0.9∼1.0 | Highly dangerous | The regular operation of the power system may be threatened at any time by harmful viruses or other weaknesses.
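Let the number of samples collected from the power information network be $n$ and the total number of features be $d$, and let the sample matrix $X \in \mathbb{R}^{d \times n}$ contain $c$ categories (safety categories), where the number of samples in category $i$, denoted $\omega_i$, is $n_i$ and satisfies $\sum_{i=1}^{c} n_i = n$. The centroids of each category and of all samples are

$$\mu_i = \frac{1}{n_i} \sum_{x \in \omega_i} x, \qquad \mu = \frac{1}{n} \sum_{i=1}^{c} \sum_{x \in \omega_i} x .$$

Let the projection matrix be $W \in \mathbb{R}^{d \times d'}$ with $d' \le d$. The projected sample matrix is then $Z = W^{T} X$, and the centroids of the projected class-$i$ samples, $\zeta_i$, and of all projected samples, $\zeta$, are

$$\zeta_i = W^{T} \mu_i, \qquad \zeta = W^{T} \mu .$$

LDA finds an optimal discriminative projection by maximising the following objective function (written here in the standard Fisher form, since the original expression is not legible in the source):

$$J(W) = \frac{\left| W^{T} S_B W \right|}{\left| W^{T} S_W W \right|},$$

where $S_B = \sum_{i=1}^{c} n_i (\mu_i - \mu)(\mu_i - \mu)^{T}$ is the inter-class scatter matrix and $S_W = \sum_{i=1}^{c} \sum_{x \in \omega_i} (x - \mu_i)(x - \mu_i)^{T}$ is the intra-class scatter matrix, so that $W^{T} S_B W$ and $W^{T} S_W W$ are the inter-class and intra-class scatter of the samples in the new (projected) space.
Solving the generalised eigenvalue problem $S_B W = S_W W \Lambda$, where $\Lambda$ is the diagonal matrix of eigenvalues, yields the optimal projection matrix $W$. The steps of the algorithm for solving the LDA projection matrix are as follows.
(1) Input the training sample matrix $X \in \mathbb{R}^{d \times n}$, where $d$ denotes the number of feature indicators collected and $n$ is the number of samples collected. Compute the centre $\mu_i$ of each class of samples as well as the overall centre $\mu$.
(2) Compute the inter-class scatter matrix $S_B$ and the intra-class scatter matrix $S_W$.
(3) Find the eigenvalues and eigenvectors from $S_B W = S_W W \Lambda$; the projection matrix $W$ is formed from the first $d'$ eigenvectors, ordered by decreasing eigenvalue.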
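The three steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the implementation used in the experiments: the function name `lda_projection`, the use of a pseudo-inverse to solve the generalised eigenvalue problem, and the synthetic two-class data are all assumptions made for the example.

```python
import numpy as np

def lda_projection(X, labels, n_components):
    """Return the projection matrix W (d x n_components).

    X holds one sample per column (d x n), matching the notation in the text.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    d = X.shape[0]
    mu = X.mean(axis=1, keepdims=True)              # centre of all samples
    S_B = np.zeros((d, d))                          # inter-class scatter
    S_W = np.zeros((d, d))                          # intra-class scatter
    for c in np.unique(labels):
        X_c = X[:, labels == c]
        mu_c = X_c.mean(axis=1, keepdims=True)      # centre of class c
        S_B += X_c.shape[1] * (mu_c - mu) @ (mu_c - mu).T
        S_W += (X_c - mu_c) @ (X_c - mu_c).T
    # Generalised eigenproblem S_B w = lambda S_W w, solved here through
    # pinv(S_W) @ S_B (an assumption; other solvers would also work).
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(S_W) @ S_B)
    order = np.argsort(eigvals.real)[::-1][:n_components]
    return eigvecs[:, order].real

# Synthetic data: two classes in three dimensions (illustrative only).
rng = np.random.default_rng(0)
X = np.hstack([rng.normal(0.0, 1.0, (3, 50)), rng.normal(3.0, 1.0, (3, 50))])
labels = np.array([0] * 50 + [1] * 50)

W = lda_projection(X, labels, n_components=1)
Z = W.T @ X                                         # projected samples
```

With two classes the inter-class scatter matrix has rank one, so a single projection direction carries all of the discriminative information in this toy case.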

RBF neural networks
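RBF neural networks do not suffer from local minima or slow learning rates. (Lin et al., 2021) demonstrated that RBF neural networks can approximate arbitrary nonlinear functions with any accuracy. In machine learning techniques such as SVMs and clustering, the Radial Basis Function (RBF) is a kernel function that improves classification and clustering accuracy by preserving nonlinear features. The structure of RBF neural networks is shown in Figure 1. Common kernel types include Gaussian, polynomial, sigmoid, Laplacian, string, and custom kernels; between them they capture localised patterns, polynomial and sigmoidal relationships, outliers, sequence matches, and other problem-specific non-linear relationships. The hidden-layer nodes can use various kernel functions, but the most widely used is the Gaussian function, with $R_i(z)$ expressed as

$$R_i(z_k) = G\left(\lVert z_k - c_i \rVert\right) = \exp\left(-\frac{\lVert z_k - c_i \rVert^{2}}{2\sigma_i^{2}}\right),$$

where $G$ is the basis function, $z_k$ is the $k$-th sample vector, $c_i$ is the centre of the $i$-th hidden-layer neuron, and $\sigma_i$ is the scale (width) parameter of the kernel function.
The scale parameter controls how quickly a node's response decays with distance from its centre, and the basis function is the elementary building block from which the network output is assembled. From Figure 1, the output of the network is

$$y = \sum_{i=1}^{m} \omega_i R_i(z),$$

where $y$ is the output, $\omega_i$ is the network weight between hidden-layer node $i$ and the output layer, and $m$ is the number of hidden-layer nodes.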
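A small NumPy sketch may help make the hidden-layer and output computations concrete. It assumes a Gaussian kernel of the form exp(-||z - c||^2 / (2 sigma^2)), a single shared width `sigma`, fixed evenly spaced centres, and output weights fitted by linear least squares; none of these choices is stated in the paper, and the function names are hypothetical.

```python
import numpy as np

def rbf_activations(Z, centres, sigma):
    """Gaussian hidden-layer outputs; rows of Z are samples, rows of centres are c_i."""
    sq_dist = ((Z[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def fit_output_weights(Z, y, centres, sigma):
    """Least-squares estimate of the hidden-to-output weights (assumed training rule)."""
    R = rbf_activations(Z, centres, sigma)
    omega, *_ = np.linalg.lstsq(R, y, rcond=None)
    return omega

def rbf_predict(Z, centres, sigma, omega):
    """Network output y = sum_i omega_i * R_i(z) for every sample in Z."""
    return rbf_activations(Z, centres, sigma) @ omega

# Toy regression problem (illustrative only): one input feature, smooth target.
Z = np.linspace(0.0, 1.0, 40).reshape(-1, 1)
y = np.sin(2 * np.pi * Z[:, 0])
centres = np.linspace(0.0, 1.0, 8).reshape(-1, 1)   # eight hidden nodes
sigma = 0.15

omega = fit_output_weights(Z, y, centres, sigma)
y_hat = rbf_predict(Z, centres, sigma, omega)
```

Because the output layer is linear in the weights once the centres and widths are fixed, fitting the weights reduces to a linear least-squares problem, which is one reason RBF networks avoid the local minima of fully gradient-trained networks.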

Algorithm flow
Combining the theoretical description above, the proposed overall prediction process for power information network security situational awareness is as follows.
Step 1: According to the properties of the power information network, raw network data are gathered, features are extracted to create sample sets, and security category labels are then applied. Table 2 displays the format of the collected data. Each record is broken down into several dimensions, each of which corresponds to one collected indicator, together with the security category that corresponds to the network behaviour behind that record.
Step 2: The sample data generated in Step 1 are divided into training and test sample sets, and the training sample matrix is $X \in \mathbb{R}^{d \times n}$, where $d$ denotes the number of feature indicators collected and $n$ is the number of samples collected. Using the LDA optimisation process, the projection matrix $W$ and the sample matrix $Z$ in the optimal projection space are obtained. LDA is used here to counter the curse of dimensionality and to improve class separability before classification; it remains sensitive to class imbalance, outliers, non-linearity, and the underlying data distribution, which should be kept in mind when it is applied to classification, data visualisation, pattern identification, medical diagnosis, and similar tasks.
Step 3: Build the RBF neural network, use the LDA pre-processed data as the training input of the RBF network, and use the attack category or security index corresponding to this sample matrix as the training output. Then train the RBF neural network model; training is complete when a specified network error is reached, that is, when the mapping to the network posture value has been learned.
Step 4: The test data are used as input; after projection with the matrix $W$ and evaluation by the RBF neural network model, the corresponding situational awareness results are obtained.
The flow chart of the algorithm is shown in Figure 2.

Security situational awareness framework
The security situational awareness framework combines the LDA-RBF model, which serves as its core module, with the data acquisition module and the human-machine interface to form the security situational awareness structure of the power information network. The LDA-RBF technique combines Linear Discriminant Analysis (LDA) with a Radial Basis Function (RBF) neural network to handle complicated data interactions: it reduces dimensionality, makes use of non-linear transformation capabilities, and differentiates between normal and unusual behaviour, which improves security monitoring and anomaly detection. Figure 3 depicts the power information network security situational awareness structure. A further module, the "User Interaction and Feedback Module", improves usability and user experience by offering interactive features, feedback mechanisms, and instructions.
In the field monitoring region, smart measurement devices are usually deployed, and the data collected by each sensor in real time are transferred to a database via a concentrator (Zeng, 2021). The database has two primary sections: the historical behaviour database and the cyber threat database. The historical behaviour database keeps raw sample data and network data acquired in real time, logging everyday, innocuous network or system activity such as user actions, regular tasks, and communication patterns. It helps to establish baselines of typical behaviour and to identify deviations from the norm, which may point to security events or anomalies. The cyber threat database stores threat sample data used to assess the security posture of the network, including verified data on cyber threats, attacks, weaknesses, and criminal activity. Incoming data are compared with known threat profiles and attack patterns so that behaviours matching known dangers can be found.
LDA enhances class discrimination and identification accuracy by increasing inter-class distance and decreasing intra-class variance, thereby optimising data pre-processing and laying the groundwork for classification. A security posture awareness model is created with the LDA-RBF technique by gathering historical network data, using LDA for feature extraction, training an RBF neural network, tuning hyperparameters, deploying the model for real-time anomaly detection, and retraining it regularly (Ma & Zhang, 2021). Once trained, the model performs situational awareness of network behaviour on the real-time data to be measured. The human-machine interface module, which is divided into two sections, displays both the network security status and alarm alerts for the monitored region. The first part of the module generates the required rules for system requirements.

Experimental dataset
To conduct simulation tests of the suggested security situational awareness method, we used intrusion detection assessment data from the KDD Cup99 dataset for training and testing (Zhao et al., 2021). The KDD Cup 1999 dataset, with 4.9 million records and 41 network connection features, is widely used for analysing machine learning and data mining techniques for intrusion detection in cybersecurity research. The dataset includes normal network data and four primary attack types: DoS (Denial of Service), Probe (probing and scanning), U2R (User-to-Root) and R2L (Remote-to-Local). These attacks respectively overwhelm systems with traffic, gather information about flaws, give users elevated privileges, and exploit authentication weaknesses to gain access from a remote machine. Effective defences and intrusion detection systems require a thorough understanding of these categories. Each sample in the dataset includes 41 feature attributes as well as a label indicating whether the record is normal or the result of an attack.
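As an illustration of the record layout described above, the following sketch parses one comma-separated KDD Cup99 record into its 41 features and maps the label to one of the four attack categories. The label-to-category mapping follows the commonly used grouping of the training-set labels; the function name `parse_record` and the sample record (all numeric fields set to zero apart from the first few) are made up for the example.

```python
# Grouping of KDD Cup99 training labels into the categories named in the text.
ATTACK_CATEGORY = {"normal": "Normal"}
ATTACK_CATEGORY.update(dict.fromkeys(
    ["back", "land", "neptune", "pod", "smurf", "teardrop"], "DoS"))
ATTACK_CATEGORY.update(dict.fromkeys(
    ["ipsweep", "nmap", "portsweep", "satan"], "Probe"))
ATTACK_CATEGORY.update(dict.fromkeys(
    ["buffer_overflow", "loadmodule", "perl", "rootkit"], "U2R"))
ATTACK_CATEGORY.update(dict.fromkeys(
    ["ftp_write", "guess_passwd", "imap", "multihop", "phf", "spy",
     "warezclient", "warezmaster"], "R2L"))

def parse_record(line):
    """Split one record into (features, label, category).

    Features are returned as strings; three of them (protocol, service,
    flag) are symbolic and would need encoding before LDA is applied.
    """
    fields = line.strip().split(",")
    if len(fields) != 42:
        raise ValueError(f"expected 41 features plus a label, got {len(fields)} fields")
    features = fields[:41]
    label = fields[41].rstrip(".")          # labels end with a trailing dot
    return features, label, ATTACK_CATEGORY.get(label, "Unknown")

# Hypothetical record in the dataset's format (values are illustrative).
sample = ",".join(["0", "tcp", "http", "SF", "181", "5450"] + ["0"] * 35 + ["normal."])
features, label, category = parse_record(sample)
```

Labels that are not in the mapping are returned as "Unknown" rather than guessed, since the test portion of the dataset contains attack names that do not occur in the training portion.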
The experiments on situational awareness in cyber security had two components: detection of the type of cyberattack and quantitative evaluation of the cyber security posture. By training and testing on the KDD Cup99 dataset, we can assess how effectively the suggested mechanism recognises various network threats and obtain a precise quantitative assessment of the network security posture (Lai et al., 2020). These experimental findings serve as the foundation for confirming the viability and efficiency of the suggested mechanism.

Network attack category detection
A portion of the data samples was randomly selected for training and testing, and the types of attacks with the corresponding sample numbers are shown in Table 3.
Figure 4 illustrates the recognition rates for various attack types. It shows that the LDA-RBF approach generally achieves a recognition rate of over 90% (Shi et al., 2019). Because the dataset contains only a small number of "buffer overflow" samples, the accuracy rate for that type is 88%. In terms of accuracy, adding LDA, which emphasises class separability, dimensionality reduction, and feature interpretability, allows the method to surpass plain RBF and BP neural networks, which indicates that combining several methods can produce better outcomes. Owing to its higher accuracy rate, the suggested method exhibits a strong advantage over RBF neural networks and BP neural networks. LDA-BP, also used for comparison, is a hybrid technique that combines the feature extraction power of LDA with the learning capabilities of backpropagation (BP) neural network classification.
Assuming that the records in the "Normal" class are positive samples and the other classes are negative samples, the False Negative Rate (FNR) and False Positive Rate (FPR) can be used as performance indicators for the algorithm:

$$\mathrm{FNR} = \frac{P_e}{P}, \qquad \mathrm{FPR} = \frac{N_e}{N},$$

where $N_e$ is the number of negative samples with errors, $N$ is the total number of negative samples, $P_e$ is the number of errors in positive samples, and $P$ is the total number of positive samples. With this labelling, FNR is the share of normal records wrongly flagged as attacks (false alarms), and FPR is the share of attack records wrongly accepted as normal (missed attacks). It is critical to balance the trade-off between them. This entails adjusting decision thresholds, taking real-world consequences into account, and updating models as conditions change. The priorities, risk tolerance, and operational environment of the application determine the proper balance.
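The two indicators can be computed directly from predicted and true labels. The sketch below assumes FNR = P_e / P and FPR = N_e / N with "Normal" as the positive class, as the symbol definitions suggest, and it counts a negative sample as an error only when it is predicted as "Normal"; that interpretation, the function name `error_rates`, and the example labels are assumptions.

```python
def error_rates(y_true, y_pred, positive="Normal"):
    """Return (FNR, FPR) with `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    P = sum(1 for t, _ in pairs if t == positive)       # total positive samples
    N = len(pairs) - P                                   # total negative samples
    # Positive (normal) samples that were classified as some attack.
    P_e = sum(1 for t, p in pairs if t == positive and p != positive)
    # Negative (attack) samples that were classified as normal.
    N_e = sum(1 for t, p in pairs if t != positive and p == positive)
    fnr = P_e / P if P else 0.0
    fpr = N_e / N if N else 0.0
    return fnr, fpr

# Illustrative labels only; these are not results from the paper.
y_true = ["Normal", "Normal", "Normal", "Normal", "DoS", "DoS", "Probe", "U2R"]
y_pred = ["Normal", "Normal", "Normal", "DoS", "DoS", "Normal", "Probe", "U2R"]
fnr, fpr = error_rates(y_true, y_pred)
```

In this toy example one of four normal records is flagged as an attack and one of four attack records is accepted as normal, so both rates are 0.25.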
To reflect the superiority of the proposed method, several other algorithms were selected for comparison.The evaluation results of each method are shown in Table 4.
Compared with the RBF neural network alone, the LDA-RBF approach shows some gain in recognition accuracy. LDA reduces dimensionality while maintaining class separability, and the subsequent RBF neural network captures intricate non-linear patterns. This combination performs especially well when the data distribution is complex and non-linear. For the normal samples, which are numerous and all of the single "Normal" type, there is no difference between the two approaches (Wang et al., 2020). However, the missed-detection rate of negative samples was improved by 2.02% with the LDA-RBF approach. The approach developed in this paper also demonstrates a substantial advantage in network attack category detection when compared with the method in the literature (He et al., 2020).

Quantitative evaluation of security posture
The cyber attacks considered fall mainly into four categories, whose threat levels are quantified with reference to the relevant literature. The threat event quantification values are shown in Table 5.
The network threat value is established from the type of attack to which the network is subjected, the network security posture level to which it belongs is then determined, and finally the median value of that level is taken as the network security posture value. Figure 5 compares the output results. It shows that, in most cases, the predicted output of LDA-RBF matches the actual situation very well and outperforms the predicted output of RBF alone (Zhou et al., 2023a; Zhou et al., 2023b).
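The mapping from a threat value to a posture value can be sketched as a lookup of the level interval followed by taking its midpoint. Only the two level intervals that are legible in Table 1 (level 4: 0.75 to 0.9, level 5: 0.9 to 1.0) are used below; the function name `posture_value`, the treatment of interval boundaries, and the reading of "median value" as the interval midpoint are assumptions.

```python
# Level intervals recoverable from Table 1; the lower levels are omitted
# because their ranges are not legible in the source.
LEVEL_INTERVALS = {4: (0.75, 0.9), 5: (0.9, 1.0)}

def posture_value(threat_value, intervals):
    """Return (level, midpoint of the level's interval) for a threat value.

    Intervals are treated as closed on the left and open on the right,
    except that a value of exactly 1.0 belongs to the top level (assumed).
    """
    for level, (low, high) in sorted(intervals.items()):
        if low <= threat_value < high or threat_value == high == 1.0:
            return level, (low + high) / 2.0
    raise ValueError(f"threat value {threat_value} is outside the known intervals")

level, value = posture_value(0.8, LEVEL_INTERVALS)
```

A threat value of 0.8 falls in level 4, so its posture value is the midpoint of that interval; values below 0.75 raise an error here only because the corresponding intervals could not be recovered.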
The error values for each test sample are shown in Figure 6. Only three points show comparatively large errors; the remaining points lie within an error of 0.2, with very small output errors, and predict the network security posture values accurately.
Three error evaluation metrics commonly used in forecasting were selected to evaluate the forecasting results of the simulation experiments, namely Mean Absolute Error (MAE), Mean Square Error (MSE), and Root Mean Square Error (RMSE).
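For reference, the three metrics can be computed as follows. The sketch uses the standard definitions (mean of absolute errors, mean of squared errors, and the square root of the latter); the function name `forecast_errors` and the sample values are illustrative and are not taken from the experiments.

```python
import math

def forecast_errors(actual, predicted):
    """Return (MAE, MSE, RMSE) for two equally long sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [p - a for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in diffs) / len(diffs)    # mean absolute error
    mse = sum(e * e for e in diffs) / len(diffs)     # mean square error
    return mae, mse, math.sqrt(mse)                  # RMSE is the root of MSE

# Illustrative posture values only.
actual = [0.825, 0.95, 0.825, 0.95]
predicted = [0.80, 0.90, 0.90, 0.95]
mae, mse, rmse = forecast_errors(actual, predicted)
```

MSE and RMSE penalise large individual errors more heavily than MAE, so reporting all three gives a sense of both the typical error and the influence of outlying predictions.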
As Table 6 shows, the LDA-RBF technique achieves a smaller mean absolute error, mean square error, and root mean square error than the RBF neural network and the other methods. This is because LDA processes the data samples efficiently, leading to a larger improvement in accuracy. It demonstrates that, when used for situational awareness, the suggested LDA-RBF approach has greater stability and accuracy.

Power information network attack experiments
The electric power information network environment, which is based on a domestic electric power information system, is shown in Figure 7. The test equipment includes the vulnerability scanning tool (RSAS NX3 V6.0), the Ko lai Network Analysis System (CSNAS), the network performance tester (Spirent TestCenter C1), and the attack tester Avalanche. In these experiments, CSNAS is used to capture and analyse network traffic, RSAS is used to scan the system for vulnerabilities, and Avalanche is used to inject attack traffic.
In this environment, operation is simulated both under normal conditions and under Avalanche injection attacks. Attack information is collected regularly from the monitoring platform, network traffic information is collected by CSNAS, and vulnerability scanning information is collected by RSAS.

Experiment 1
As shown in Figure 7, the test equipment was connected to the dispatching automation network environment, and traffic information was collected from the router at regular intervals. Traffic statistics under normal conditions were collected every 10 s and are shown in Table 7; packet statistics under normal conditions are shown in Table 8.
A composite message with a digital signature is sent from the master server to the intelligent terminal RTU, and the time difference between the sending of the message and the return of the confirmation message from the terminal is calculated; this difference is the network transmission delay. The delay test results are shown in Table 9.

Experiment 2
The attack tester Avalanche was connected to the test network through a switch, and penetration attacks were carried out against the test system. The specific attack types included DDoS attacks, SQL injection attacks, UDP flooding attacks, replay attacks launched on end devices, and network storms.
After the storm traffic was added to the network, the network traffic was captured again (every 10 s). The network traffic statistics under attack are shown in Table 10, and the packet statistics under attack are shown in Table 11.
Once the storm traffic is added, the traffic on the network increases significantly and varies irregularly.
During the attack, the master emulator sends digitally signed load messages to the terminal to verify that the device can still operate correctly and to calculate the time difference between sending the message and receiving the acknowledgement. The results of the delay test under attack are shown in Table 12. Compared with Experiment 1, the latency has nearly doubled.
Experimental data from an information network environment provided by a power company were collected for experimental validation. Seven categories of network behaviour were collected, and a random portion of the data was selected as training and test data. The sample set is shown in Table 13.
Figure 8 displays the identification results for each attack on the information network. As shown in Figure 8, the suggested method has several benefits, particularly for the first three categories of samples, which are typically challenging to identify because activities such as port scanning have little effect on network traffic until the subsequent intrusion is carried out. Because such activity resembles acceptable user behaviour and system operations, produces no obvious anomalies, and normal behaviour itself varies widely, these samples are difficult to distinguish from normal or benign ones. Contextual knowledge and feature engineering are necessary for effective identification. The suggested approach uses LDA to pre-process the sample data so that the samples have the best possible separability.
The evaluation results of the algorithms were compared, and the comparison of the methods is shown in Table 14. The overall recognition rate improves by nearly 10%, so the proposed method offers a very significant improvement in recognition accuracy.

Conclusion
An LDA-RBF-based strategy is suggested in this paper to establish security situational awareness of power information networks. The approach first performs dimensionality reduction on the sample data while optimising the inter-class and intra-class relationships of the samples. The network security posture is then measured for situational awareness using RBF neural networks. The proposed method is compared with alternative methods through simulation experiments with the KDD Cup99 dataset and experimental data from power information networks. The test results demonstrate that the approach is highly accurate at detecting network threats.

Figure 3. Power information network security situational awareness structure.

Figure 4. Recognition rate by attack type.

Figure 5. Comparison of output results.

Figure 6. Error values for each test sample.

Figure 7. Power information network environment.

Figure 8. Identification results for each attack in the information network.

Table 1. Network security situation level table.

Table 2. Acquisition data format.

Table 3. Types of attacks and corresponding sample numbers.

Table 4. Evaluation results for each method.

Table 5. Quantitative values of threat events.

Table 6. Results of error evaluation indicators.

Table 7. Network traffic statistics under normal conditions.

Table 8. Packet statistics under normal conditions.

Table 9. Time delay test results.

Table 10. Network traffic statistics under attack.

Table 11. Packet statistics under attack.

Table 12. Delay test results under attack.

Table 13. Sample set.

Table 14. Comparison of methods.