Design of advanced intrusion detection systems based on hybrid machine learning techniques in hierarchically wireless sensor networks

Wireless sensor networks (WSNs) are an emerging military and civilian technology built from networks of sensor nodes. These nodes are typically deployed hierarchically and at random in remote, unmonitored sites. Their open deployment and wireless transmission expose WSNs to unique security threats, including routing attacks such as black hole, Sybil, sinkhole, and wormhole attacks. In this paper, we propose advanced intrusion detection systems based on hybrid machine learning (AIDS-HML) to identify and classify attacks in WSNs. The prediction models are trained and evaluated on benchmark datasets and compared with baseline models in terms of precision, recall, F1-score, and accuracy. The hybrid random forest and extreme gradient boosting (RF-XGB) model achieves a detection accuracy of 99.80% on the NSL-KDD benchmark dataset, and the hybrid cluster labelling K-Means (CLK-M) model achieves a classification accuracy of 100% on the UNSW_NB15 and CICIDS2017 benchmark datasets for binary attack classification. Different attack detection metrics were compared across the benchmark datasets to evaluate the quality of this work. In simulation, the proposed system performs feature extraction, route discovery, and attack detection efficiently, achieving an accuracy of 99.46%.


Introduction
Wireless sensor networks (WSNs) consist of affordable, low-power sensor nodes (Gao, 2014). Sensor nodes with multi-hop routing and self-organising intelligent sensor networks fall into this category (Gebreyesus, 2021). WSNs are autonomous, spatially distributed devices that use sensors to monitor physical and environmental conditions. Nodes in a WSN cooperatively collect data on some physical or environmental characteristic, such as noise, vibration, temperature, pressure, motion, or pollution, and relay that data to a hub node (Alsaedi et al., 2017; Farooqi et al., 2013; Han et al., 2019; Pundir et al., 2020). The raw data sensed by the nodes is sent to the cluster head and then stored and analysed at the base station (Blywis, 2009; Choi et al., 2004), as shown in Figure 1. WSNs are self-configured networks of low-cost, battery-powered nodes that are connected by radio and distributed hierarchically and randomly (Saravana Kumar et al., 2021; Singh et al., 2017). WSNs are a recent technology and have gained significant research attention (Singh et al., 2017). They comprise low-power, low-cost sensors randomly distributed over the target area, each deployed to perform specific tasks (Mangrulkar & Negandhi, 2018; Rouissi et al., 2019). The sensors have sensing and signal-processing capabilities and communicate wirelessly (Vinitha et al., 2019). In a WSN, a gateway connects the distributed wireless nodes to wired infrastructure. WSNs are a main component of the Internet of Things (IoT), using small, low-power sensors to collect data and monitor the environment (Saidi et al., 2020; Zhao et al., 2023). Wireless sensor nodes can be deployed over the target area in a distributed or hierarchical fashion, as shown in Figure 1.
The frequency of cyber-attacks targeting international businesses is increasing, driving the rapid development of intrusion detection systems (IDS) in both industry and academia (Mahbooba et al., 2021). Attackers who exploit vulnerabilities in a network's security measures can compromise the availability and confidentiality of its data (Abdulganiyu et al., 2023). An IDS is a network security solution that collects and analyses network data to detect abnormal behaviour and protect system resources (Zhang et al., 2020), and it is crucial in maintaining network security (Wang et al., 2021). Intrusions in WSNs fall into two types: anomalies and misuse. Anomaly detection uses mathematical models and compares estimated feature values with reference values to identify deviations from normal behaviour (Godala & Vaddella, 2020). Misuse detection, on the other hand, relies on previously observed malicious activities and their specific patterns to identify intrusions. Common causes of cybercrime include DoS and web attacks. Attacks in WSNs can be active or passive, involving unauthorised eavesdropping, information gathering, and packet manipulation (Elsaid & Albatati, 2020). To ensure protection, WSNs employ two layers of defence: intrusion detection and intrusion prevention technologies. An IDS tracks intruders, their activities, and the time, location, and network layer of each intrusion, providing valuable information to system administrators (Elsaid & Albatati, 2020). Cybercrimes pose significant threats to businesses because they introduce malicious attacks into networks. Intrusion detection, the examination and identification of security breaches in an information system, is therefore vital to cybersecurity.
Machine learning (ML) techniques offer generic solutions that continuously improve their performance (Praveen Kumar et al., 2019). In WSNs, ML is applied in various fields, enhancing performance and reducing the need for manual maintenance. It facilitates information extraction from the large volumes of data generated by sensors, enabling machine-to-machine (M2M) communications, cyber-physical systems, and the Internet of Things (IoT). ML is valuable in WSNs for optimising sensor nodes, energy management, localisation, node classification, routing, and detecting attacks across multiple layers of sensor networks. ML strategies are also employed to select cluster heads that improve energy performance in WSNs, as shown in Figure 2.
ML techniques are also helpful in detecting and removing malicious nodes at the cluster head, improving network reliability and operational lifetime (Praveen Kumar et al., 2019). ML algorithms efficiently classify DoS attacks and handle data aggregation and forwarding to the sink node. Data aggregation plays a critical role in WSNs, affecting energy consumption, storage requirements, network load, and processing speed, as shown in Figure 2. Various data aggregation methods, such as cluster-based, tree-based, in-network, or centralised approaches, can improve network reliability and lifespan. ML algorithms help collect and organise data, contributing to the security and efficiency of WSNs.
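Cluster-based aggregation can be sketched in a few lines. This is a minimal illustration, assuming each node reports one scalar reading per round and that the cluster head (CH) forwards the mean of its members' readings; the function name and the choice of the mean as the aggregate are ours, not the paper's.

```python
from collections import defaultdict
from statistics import mean


def aggregate_at_cluster_heads(readings, membership):
    """Aggregate one round of sensor readings at each cluster head.

    readings:   dict mapping node id -> sensed value for this round
    membership: dict mapping node id -> id of that node's cluster head
    Returns a dict mapping cluster head id -> mean of its members' readings,
    so one packet per cluster (not one per node) is forwarded to the sink.
    """
    per_ch = defaultdict(list)
    for node, value in readings.items():
        per_ch[membership[node]].append(value)
    return {ch: mean(values) for ch, values in per_ch.items()}


readings = {"n1": 20.0, "n2": 22.0, "n3": 30.0}
membership = {"n1": "CH1", "n2": "CH1", "n3": "CH2"}
print(aggregate_at_cluster_heads(readings, membership))  # {'CH1': 21.0, 'CH2': 30.0}
```

Here three readings become two forwarded packets; with larger clusters the saving in transmissions, and hence in energy, grows accordingly.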

Problem statement
The problem addressed in this research is the development of an advanced intrusion detection system (IDS) based on hybrid machine learning (AIDS-HML) to protect hierarchically structured wireless sensor networks (WSNs). Current IDSs struggle to achieve high detection accuracy while minimising false alarms, and the hierarchical organisation of sensor nodes adds complexity to the design of advanced IDSs (Paliwal & Kumar, 2018). The objective is to create a robust and efficient IDS that effectively identifies and responds to intrusions while taking the hierarchical structure of the WSN into account. The specific challenges addressed are:
1. Developing a hierarchical IDS architecture that adapts to the structure of the WSN and effectively handles data collection and aggregation.
2. Extracting relevant features from sensor data in a resource-constrained environment, and applying techniques that reduce dimensionality and select the most informative features.
3. Integrating multiple machine learning algorithms in a hybrid approach that leverages their individual strengths to improve intrusion detection accuracy.
4. Preventing the theft of sensitive information in WSNs, where the absence of physical lines of defence makes monitoring network traffic flow crucial.
5. Evaluating the proposed IDS in terms of detection accuracy, recall, precision, false alarm rate, resource utilisation, and scalability.
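The detection metrics listed in challenge 5 can all be derived from the confusion-matrix counts of a binary (attack vs normal) detector. The sketch below assumes the attack class is the positive class and takes the false alarm rate to be the false positive rate, FP / (FP + TN); the function name is illustrative.

```python
def ids_metrics(tp, fp, tn, fn):
    """Binary IDS metrics, treating 'attack' as the positive class."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # detection rate
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    false_alarm_rate = fp / (fp + tn) if fp + tn else 0.0  # false positive rate
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "false_alarm_rate": false_alarm_rate,
    }


# 110 attacks (90 caught, 20 missed) and 890 normal records (10 false alarms)
m = ids_metrics(tp=90, fp=10, tn=880, fn=20)
# accuracy = 0.97, precision = 0.9, recall ~ 0.818, F1 ~ 0.857, false alarm rate ~ 0.011
print({k: round(v, 3) for k, v in m.items()})
```

The example also shows why accuracy alone can mislead on imbalanced traffic: accuracy is 0.97 even though almost one attack in five is missed.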
Addressing these challenges will contribute to advanced IDSs tailored to hierarchically structured WSNs, improving security while maintaining network efficiency. The proposed solution aims to overcome the limitations of existing IDSs and to provide insights into effectively integrating hybrid machine learning techniques in WSNs. The NSL-KDD, UNSW_NB15, and CICIDS2017 datasets are used for training and testing and serve as benchmarks for evaluating the proposed system. In particular, the NSL-KDD dataset, which includes four attack categories (Denial of Service (DoS), Probing, Remote to Local (R2L), and User to Root (U2R)), is used to test the effectiveness of the proposed approach.
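In NSL-KDD each record carries a specific attack name, which must be mapped to one of the four categories, or to a binary normal/attack label, before training. The sketch below uses the standard category assignment for the attack names in the KDD'99/NSL-KDD training data. Names that appear only in the NSL-KDD test set (for example apache2 or mscan) are deliberately left as "unknown" here and would need to be added for a full evaluation; the function names are illustrative.

```python
# Standard KDD'99 / NSL-KDD category for each training-set attack name.
ATTACK_CATEGORY = {
    # Denial of Service
    "back": "DoS", "land": "DoS", "neptune": "DoS",
    "pod": "DoS", "smurf": "DoS", "teardrop": "DoS",
    # Probing
    "ipsweep": "Probe", "nmap": "Probe", "portsweep": "Probe", "satan": "Probe",
    # Remote to Local
    "ftp_write": "R2L", "guess_passwd": "R2L", "imap": "R2L", "multihop": "R2L",
    "phf": "R2L", "spy": "R2L", "warezclient": "R2L", "warezmaster": "R2L",
    # User to Root
    "buffer_overflow": "U2R", "loadmodule": "U2R", "perl": "U2R", "rootkit": "U2R",
}


def to_category(label):
    """Map a raw label (e.g. 'neptune' or KDD'99-style 'neptune.') to a category."""
    name = label.strip().lower().rstrip(".")
    if name == "normal":
        return "normal"
    return ATTACK_CATEGORY.get(name, "unknown")


def to_binary(label):
    """0 for normal traffic, 1 for any attack (including unmapped names)."""
    return 0 if to_category(label) == "normal" else 1


for raw in ["normal", "neptune", "satan", "guess_passwd", "rootkit", "apache2"]:
    print(raw, to_category(raw), to_binary(raw))
```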

Research motivation
Attacks affect WSNs deployed in many applications, including healthcare systems, industrial applications, traffic light management systems, and intelligent power generation and distribution systems that must meet real-time demand. It is therefore essential to survey and study hybrid security methods that combine two schemes, such as hybrid IDS, hybrid routing, and hybrid anomaly detection techniques. Hybrid security techniques are widely used to detect attacks in WSNs; they use datasets and hybrid models to evaluate the system's effectiveness for attack classification and detection. The significant contributions of this work, which explores various hybrid security techniques in wireless sensor networks, are as follows.
1. Design and simulate a secure attack detection scheme against routing attack scenarios in wireless sensor networks.
2. Explore various designs of hybrid security techniques using a combined attack-defence strategy.
3. Explore data processing and transformation techniques for evaluating security performance on benchmark datasets.
4. Explore various hybrid machine learning models for effective DoS attack detection and classification, using a public dataset to evaluate the system's effectiveness.
5. Explore wormhole tunnels and routing techniques to find the optimal path for data transmission and suitable mechanisms for secure routing and monitoring.
6. Analyse the location and detection of multiple attacks, such as wormhole and black hole attacks, using hybrid and combined schemes.
7. Explore hybrid techniques for WSN traffic analysis and for detecting complex attacks, using a public dataset containing normal and malicious network traffic.

Contributions of the paper
This paper's primary contribution is a novel hybrid machine learning approach to intrusion detection that strengthens the security and performance of WSNs while minimising resource consumption. Many earlier IDS strategies have applied benchmark machine learning approaches to improve the detection accuracy of WSN attacks. However, as the number and interconnectivity of wireless sensor nodes grow, routing attacks are launched more frequently. The key contributions of this research are:
• The design and planning of secure advanced intrusion detection systems for scalable, resource-optimised WSNs.
• A focus on the challenges of hierarchically structured WSNs. By considering the multi-level organisation of sensor nodes, the IDS can adapt its detection mechanisms more effectively, providing improved coverage and response throughout the network.
• An exploration of how advanced hybrid machine learning models can be used in WSN intrusion detection systems, with metrics that examine the most important aspects of intrusion detection.
• Insights into the effective integration of different machine learning approaches in WSNs. This knowledge can be extended to other domains where hybrid ML solutions may be beneficial.
• A reduction in false alarms and an improvement in the efficiency of intrusion detection, supporting secure data transmission across the network and data privacy and protection in WSNs.
The remainder of this paper is organised as shown in Figure 3. The first section provides a high-level overview of wireless sensor networks, sensor clustering, the motivation for this study, and the results. Section 2 summarises prior research on sophisticated machine learning strategies that use IDS data. Section 3 explains the network models underpinning the proposed hybrid security system for intrusion detection. Section 4 describes the methodology and dataset processing for the hybrid security methods. Section 5 examines the simulation and its outcomes. Section 6 presents the experimental setup, data collection, and analysis, and the final section discusses the takeaways and suggestions for future work.

Related works
This section reviews the literature on hybrid security techniques, including hybrid intrusion detection systems, machine learning techniques, and anomaly detection evaluated on benchmark datasets. Hybrid techniques effectively detect WSN attacks such as wormhole, flooding, sinkhole, and Sybil attacks. These techniques include hybrid intrusion detection systems, advanced HIDS, and intelligent and artificial immune hybrid intrusion detection systems (AIHIDS) (Singh et al., 2017). Hybrid routing protocols, hybrid misuse detection, hybrid optimisation algorithms, anomaly detection techniques, and hybrid clustering techniques are also discussed in this section.

Hybrid intrusion detection systems
Singh et al. (2017) presented an Advanced Hybrid Intrusion Detection System (AHIDS) for WSNs. AHIDS employs a cluster-based architecture with an improved LEACH protocol to reduce power consumption in sensor nodes. Hybrid artificial neural networks (HANNs) are used to detect and classify potential threats such as Sybil, wormhole, and Hello flood attacks. The system achieved high detection rates: 99.40% for Sybil attacks, 98.20% for Hello flood attacks, and 99.20% for wormhole attacks. Similarly, Cepheli et al. (2016) examined a hybrid intrusion detection technique that combines flexible and tunable parameters using parallel detection methods. Their hybrid system, coordinated by a central node, improves DDoS attack detection accuracy. Figure 4 shows block diagrams of the signature-based and anomaly-based detectors that classify traffic as normal or as a DDoS attack. The detection process analyses network traffic and extracts features to build an activity model. The DARPA 2000 dataset is used to evaluate the intrusion detection system, focusing on DDoS attacks and normal network traffic. The hybrid detection employs both anomaly and signature detectors: the anomaly detector separates normal from abnormal traffic through feature extraction, while the signature detector matches traffic features against predefined signature sets. AHIDS likewise uses anomaly detection blocks to separate normal from abnormal data packets and misuse detection blocks to recognise specific attacks, as shown in Figure 4. The detection of malicious nodes in AHIDS, based on fuzzy rules, involves three steps:
• It measures the history of data packet transmission in the WSN through base nodes.
• It selects the feature set and identifies the key elements for packet classification.
• It establishes anomaly intrusion detection based on data packet resolution.
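The parallel anomaly/signature arrangement described for Cepheli et al. (2016) and AHIDS can be illustrated with a toy detector. This is a minimal sketch, not the authors' system: the signature rule, the feature names (hello_rate, pkt_rate) and the 3-sigma threshold are all hypothetical. A record that matches a signature is reported as that known attack; otherwise it is compared with a per-feature profile of normal traffic and flagged as an anomaly if any feature deviates by more than z_max standard deviations.

```python
from statistics import mean, pstdev

# Hypothetical signature rules for the misuse (signature) detector.
SIGNATURES = [
    ("hello_flood", lambda r: r["hello_rate"] > 50),
]


def build_profile(normal_records, features):
    """Mean and population standard deviation of each feature over normal traffic."""
    profile = {}
    for f in features:
        values = [r[f] for r in normal_records]
        profile[f] = (mean(values), pstdev(values))
    return profile


def hybrid_detect(record, profile, signatures=SIGNATURES, z_max=3.0):
    # 1) Misuse branch: known attack patterns are matched first.
    for name, rule in signatures:
        if rule(record):
            return name
    # 2) Anomaly branch: flag large deviations from the normal profile
    #    (features with zero spread in the profile are skipped).
    for f, (mu, sigma) in profile.items():
        if sigma > 0 and abs(record[f] - mu) / sigma > z_max:
            return "anomaly"
    return "normal"


normal = [{"hello_rate": 5, "pkt_rate": 10},
          {"hello_rate": 6, "pkt_rate": 12},
          {"hello_rate": 4, "pkt_rate": 11}]
profile = build_profile(normal, ["hello_rate", "pkt_rate"])
print(hybrid_detect({"hello_rate": 80, "pkt_rate": 11}, profile))  # hello_flood
print(hybrid_detect({"hello_rate": 5, "pkt_rate": 40}, profile))   # anomaly
print(hybrid_detect({"hello_rate": 5, "pkt_rate": 11}, profile))   # normal
```

Checking signatures first gives known attacks a specific label, while the anomaly branch still catches unseen deviations; this complementarity is what motivates hybrid IDS designs.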
The fuzzy-based intrusion detection technique uses an MPNN, consisting of a back-propagation neural network (BPNN) and a feed-forward neural network (FFNN), for anomaly and misuse detection, as shown in Figure 5. It is trained with supervised learning to achieve the highest detection rate. The fuzzy-based AHIDS with FFNN and BPNN achieves higher attack detection accuracy through massive clustered training. The multilayer perceptron is used to estimate the error rate e_i, as given in Equation (1).
Where d_i is the desired output and a_i is the actual output obtained from the MPNN. The MPNN model, consisting of BPNN and FFNN, is applied to evaluate the detection accuracy for the various attack classes in WSNs.
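With d_i and a_i defined this way, Equation (1) most plausibly takes the standard output-error form used when training a multilayer perceptron by back-propagation. Both expressions below are our assumption rather than the authors' stated formula; the summed squared error E is the usual training objective built from e_i:

```latex
e_i = d_i - a_i, \qquad E = \frac{1}{2} \sum_{i} e_i^{2}
```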
The MPNN uses BPNN and FFNN techniques in the IHIDS to manage huge datasets and maintain the system's stability. The FFNN detects new types of attacks, and the BPNN clusters unknown attacks for MPNN supervised learning. The fuzziness F(V) of a membership vector is given by Equation (2), where V = {µ_1, µ_2, ..., µ_n} is a fuzzy set; the fuzziness values of the training and testing samples are categorised into high, mid, and low fuzziness groups. Gandhimathi and Murugaboopathi (2020) studied flow-based and cross-layer hybrid intrusion detection. Their approach aims to detect anomalous traffic and identify potential attacks based on narrow features. The process has two phases: in the first, a flow-based IDS classifies malicious nodes; in the second, packets are analysed using cross-layer features to verify and detect potential threats. The flow-based IDS monitors network traffic and uses network information to classify data as either normal or malicious, as shown in Figure 6. By monitoring network activity and tracking many parameters, flow-based anomaly detection can build a regular profile of behaviour. Misdetection, a hybrid technique, effectively detects blackhole attacks using K-medoid clustering on a synthetic dataset (Ahmad et al., 2019); it employs K-medoid clustering to identify anomalies caused by diversion and blackhole attacks. In addition, Devi and Jaison (2020) developed a novel SDN-based Hybrid Clone Node Detection (HCND) technique for WSNs, which proactively identifies cloned nodes using software-defined networking while maintaining and enhancing the Quality of Service (QoS) of the WSN. A hybrid multi-tiered IDS detects cyber-attacks on vehicular networks while reducing energy consumption and malicious anomalous nodes (Yang et al., 2022). Hybrid IDSs also detect cloning attacks in WSNs (Devi & Jaison, 2020) and classify IoT-based security attacks in healthcare applications using feature selection and a hybrid DT-GA (Saif et al., 2022). Rabbani et al. (2020) proposed an effective intrusion detection system that combines machine learning with traditional security techniques. The scheme ensures mathematically secure communication among nodes and detects malicious behaviour using data pre-processing and recognition modules. The recognition module trains and predicts with an optimised probabilistic neural network (PNN) on the UNSW-NB15 dataset, which contains normal and malicious traffic. The hybrid PSO-PNN scheme builds a self-optimised network, using particle swarm optimisation (PSO) to reduce misclassification errors and increase the classification accuracy of the PNN. PSO adapts the PNN structure, making it a self-adaptive network model based on swarm behaviour patterns. Figure 9 depicts the architecture of the proposed system. Network traffic features are extracted from raw network packets using tools such as Netmate, BRO-IDS, and Argus, and noisy features are removed so that malicious attacks in WSNs can be detected effectively. The necessary features are represented by numeric values and symbolic variables, which are normalised and transformed using the statistical characteristics in Equation (3).
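Assuming Equation (3) is the usual min-max normalisation, which matches the variables defined next, each numeric feature value Z is rescaled to the range [0, 1] as follows (the symbol Z_norm for the rescaled value is ours):

```latex
Z_{\mathrm{norm}} = \frac{Z - \min(Z)}{\max(Z) - \min(Z)}
```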

Where Z is the feature value, min(Z) is the minimum value, and max(Z) is the maximum value of the feature across the samples in the dataset.

Hybrid machine learning techniques
S. M. Kumar (Kumar, 2022) presented an optimised hybrid deep neural network that uses a feature selection algorithm to improve intrusion detection, evaluated on the UNSW-NB15 and NSL-KDD benchmark datasets. The selected features are fed into a convolutional neural network, and the features are arranged and selected using a distance measure and the correlation coefficient of the input packets. The distance D between two data points (a_i, b_i) of the input data x and y, used to arrange and select the features, is given by Equation (4).
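Assuming Equation (4) is the ordinary Euclidean distance, D = sqrt(sum_i (a_i - b_i)^2), and that the correlation coefficient mentioned is Pearson's r, both measures can be sketched in plain Python. The rank_features helper, which keeps the k features most correlated with the attack label, is a hypothetical illustration of correlation-based selection, not the selection algorithm of Kumar (2022).

```python
from math import sqrt


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0  # constant column -> 0


def rank_features(columns, target, k):
    """Return the k feature names with the largest |correlation| to the label."""
    scores = {name: abs(pearson(col, target)) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


columns = {"a": [1, 2, 3, 4], "b": [1, 1, 2, 1], "c": [4, 3, 2, 1]}
labels = [0, 0, 1, 1]  # 0 = normal, 1 = attack
print(euclidean([0, 0], [3, 4]))            # 5.0
print(rank_features(columns, labels, k=2))  # ['a', 'c']
```

Using the absolute value of r keeps features that are strongly but negatively correlated with the label, such as column c above.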
A convolutional neural network computes the selected features, and its outputs are fed into a long short-term memory configuration, with further modifications, to improve classification precision. Mahajan et al. (2022) explored hybrid machine learning and deep learning techniques for network traffic analysis and classification in wireless sensor networks, including attack detection on benchmark datasets. Faysal et al. (2022) proposed machine learning techniques to detect IoT-based WSN attacks using benchmark datasets; a hybrid of eXtreme Gradient Boosting and Random Forest (XGB-RF) effectively detected botnet attacks, with feature selection and classification assessed using various metrics. In the same year, Alghamdi (2022) introduced a novel optimiser-based technique, PO-CFNN, for IoT-based IDS. PO-CFNN has three stages, pre-processing, classification, and parameter optimisation, which transform networking information into a format better suited to intrusion identification. Sadikin et al. (2020) presented a hybrid intrusion detection system for ZigBee-based IoT systems that combines anomaly-based and rule-based machine learning techniques to detect attacks. A hybrid learning approach combining a Long Short-Term Memory network (LSTM) and a Convolutional Neural Network (CNN) extracts network traffic features in a hybrid IDS evaluated on the CICIDS2017 dataset, achieving 99.50% overall accuracy for attack type detection (Sun et al., 2020). In another study, a hybrid classification strategy combines a Kalman filter (KF) and an Extreme Learning Machine (ELM) to train a predictive classifier on the sink node; it detects random WSN anomalies with promising results on normal and faulty datasets (Biswas et al., 2019).
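To give a feel for the Kalman-filter half of the KF-ELM strategy of Biswas et al. (2019), the sketch below runs a scalar Kalman filter over one node's readings and flags readings whose innovation (measurement minus prediction) is unusually large. The constant-level state model, the noise variances q and r, and the fixed threshold are illustrative assumptions; the ELM classifier that Biswas et al. train on the sink node is not reproduced.

```python
def kalman_residuals(measurements, q=1e-3, r=0.5, p0=1.0):
    """Scalar Kalman filter with a constant-level state model.

    q: process noise variance, r: measurement noise variance,
    p0: initial estimate variance. Returns (estimates, residuals), where each
    residual is the innovation z - x_pred computed before the update.
    """
    x, p = measurements[0], p0  # initialise the state from the first reading
    estimates, residuals = [], []
    for z in measurements:
        p = p + q             # predict: level unchanged, uncertainty grows
        residual = z - x      # innovation against the prediction
        k = p / (p + r)       # Kalman gain
        x = x + k * residual  # update the estimate
        p = (1 - k) * p       # update the estimate variance
        estimates.append(x)
        residuals.append(residual)
    return estimates, residuals


def flag_anomalies(measurements, threshold=3.0, **kwargs):
    """Indices whose innovation magnitude exceeds the threshold."""
    _, residuals = kalman_residuals(measurements, **kwargs)
    return [i for i, e in enumerate(residuals) if abs(e) > threshold]


readings = [20.0, 20.1, 19.9, 20.0, 35.0, 20.1, 19.9]
print(flag_anomalies(readings))  # [4]
```

Because the filter partly absorbs the spike, the two readings after it also show elevated residuals; the threshold has to be chosen with that transient in mind.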
For binary attack classification, a hybrid of k-means and a Support Vector Machine (SVM) reduces training and testing times while maintaining high classification accuracy (Rose et al., 2020). Federated learning techniques have been used to create a privacy-friendly framework across multiple devices, evaluated on benchmark datasets (Liu et al., 2022). Hybrid optimisation and deep-learning-centric intrusion detection systems have been deployed in IoT-enabled smart cities using the Hybrid Chicken Swarm Genetic Algorithm (HCSGA) method (Gupta et al., 2022). That solution pre-processes the dataset, selects features with HCSGA and K-means, and classifies with a Deep Learning-based Hybrid Neural Network (DLHNN) classifier on the NSL-KDD benchmark dataset. Hybrid machine learning techniques use sampling methods and feature selection analysis to achieve better detection accuracy (Cao et al., 2022); to address sample imbalance, a hybrid sampling method combining ADASYN and RENN is employed. Hybrid deep learning methods have effectively identified malicious attacks when tested on benchmark datasets (Ullah et al., 2022). Umarani and Kannan (2020) proposed hybrid anomaly detection techniques based on artificial immune systems, using hybrid tissue growing techniques to detect malicious traffic in wireless sensor networks. Yin et al. (2019) presented an anomaly detection technique that recognises and separates normal and abnormal behaviours based on patterns learned from behaviours labelled as normal. Data mining techniques such as regression analysis, clustering analysis, outlier detection, and classification are used to extract knowledge for identifying patterns of malicious nodes, improving detection accuracy and efficiency. However, anomaly detection with machine learning models requires human effort and is error-prone. The remaining related works are summarised in Table 1.
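The abstract attributes 100% binary accuracy on UNSW_NB15 and CICIDS2017 to a hybrid cluster labelling K-Means (CLK-M). Assuming CLK-M follows the common cluster-then-label pattern, a minimal pure-Python sketch is: run k-means on the training features, label each cluster with the majority class of its members, and classify a new record by the label of its nearest centroid. This is not the authors' implementation, and it omits the SVM stage of the k-means/SVM hybrid of Rose et al. (2020); all function names are illustrative.

```python
import random
from collections import Counter


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means on a list of equal-length tuples."""
    centroids = random.Random(seed).sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Recompute each centroid as its members' mean; keep an empty one in place.
        new = [tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else centroids[j]
               for j, cl in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    return centroids


def label_clusters(points, labels, centroids):
    """Give each cluster the majority label of the training points it contains."""
    votes = [Counter() for _ in centroids]
    for p, y in zip(points, labels):
        votes[nearest(p, centroids)][y] += 1
    return [v.most_common(1)[0][0] if v else None for v in votes]


train = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
y = ["normal"] * 3 + ["attack"] * 3
cents = kmeans(train, k=2)
names = label_clusters(train, y, cents)


def predict(p):
    """Classify a record by the label of its nearest cluster."""
    return names[nearest(p, cents)]


print(predict((0.05, 0.05)), predict((5.0, 5.1)))  # normal attack
```

Only the cluster labelling uses the class labels, so the clustering itself stays unsupervised; with more clusters than classes, several clusters can share a label, which helps when attack traffic forms several distinct groups.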

Hybrid anomaly detection
Despite existing research on IDSs based on hybrid machine learning techniques in hierarchical wireless sensor networks, several research gaps remain. In particular, the optimal combination and configuration of machine learning algorithms in the hybrid approach needs further exploration: different algorithms have different strengths and weaknesses depending on the characteristics of the WSN, and investigating the most effective combinations and configurations can improve detection accuracy and efficiency. The gaps include:
1. Limited evaluation on real-world WSN deployments. Many studies rely on simulation-based evaluations or benchmark datasets. Future research should include more experimentation on real-world WSN deployments to validate the effectiveness of the proposed hybrid IDS techniques.
2. Scalability and resource constraints. Hierarchically structured WSNs often operate in resource-constrained environments, so IDS solutions must handle scalability and optimise resource usage while maintaining high detection accuracy.
3. Dynamic adaptation. WSNs are subject to dynamic environmental changes and evolving attack patterns, so IDS solutions should adapt dynamically and update their detection models to ensure continuous and effective protection.

Table 1. Summary of related works.
Ref. | Security technique | Research findings
… | … | Compared with other explainable machine learning models, Gaussian process regression performed exceptionally well, with a correlation coefficient of 1, a root mean square error of 0.007, and a bias of 0.006.
Ahmad et al. (2019) | Misdetection and K-medoid clustering | K-medoid clustering on a synthetic dataset is an efficient way of detecting hybrid black hole attacks.
Devi and Jaison (2020) | SDN-based hybrid clone node detection | Detects cloned nodes using proactive and verification processes based on software-defined networking in WSNs, maintaining and enhancing the quality of service.
Singh et al. (2022b) | Log transformation and feature scaling of the feature set, with a tuned Support Vector Regression (SVR) | The model accurately predicts the number of barriers, with a correlation coefficient (R) of 0.98, a root mean square error (RMSE) of 6.47, and a bias of 12.35.
Davahli et al. (2020) | Genetic Algorithm (GA) and grey wolf optimiser (GWO) | Reduces the dimensionality of wireless network traffic by selecting features for an Internet of Things intrusion detection system with an SVM classifier.
Sun et al. (2020) | Convolutional neural network (CNN) and long short-term memory network (LSTM) | Extracts network traffic features using a hybrid IDS evaluated on the CICIDS2017 dataset, achieving an overall accuracy of 99.50% for attack type detection.
Biswas et al. (2019) | Kalman filter (KF) with extreme learning machine (ELM) | A hybrid classification technique that trains a predictive classifier at the sink node, evaluated on the detection of random WSN anomalies with normal and faulty datasets.
Ren et al. (2019) | Hybrid data optimisation | Uses data sampling and feature selection with a genetic algorithm and random forest classifiers to obtain an optimal training set from the UNSW-NB15 dataset.
Regan and Leo Manickam (2019) | Optimised hybrid security model | A hybrid optimised technique for detecting malicious attacks using a hybrid secured model.
Moon and Ingole (2015) | IDS-secured hybrid approach | Provides a security technique for preventing and detecting attacks that achieves data integrity, authentication, and energy minimisation.
Deepa and Latha (2019) | Hybrid hierarchical secure routing using clustering | Uses a hybrid hierarchical secure algorithm for detection and packet delivery through a coordinating cluster head.
Sakthivel and Chandrasekaran (2018) | Hybrid security using dummy packets | A secure routing framework that detects malicious attacks using routing protocols and dummy packet optimisation.
Yang et al. (2022) | Hybrid multi-tiered IDS using machine learning | Detects internal and external cyber-attacks targeting vehicular networks, evaluated on the CICIDS2017 dataset.
Rose et al. (2020) | Hybrid k-means and support vector machine | Reduces training and testing times with promising classification accuracy for attack detection and classification.
VenkataRao and Ananth (2021) | Hybrid optimisation and secure clustering protocol | Improves performance using a hybrid secure clustering protocol and k-means clustering for attack detection.
Gupta et al. (2022) | Deep learning-based hybrid neural network | Detects and classifies attacks in IoT-enabled networks using the hybrid chicken swarm genetic algorithm method.
Cao et al. (2022) | Hybrid sampling method | Improves accuracy using feature selection analysis techniques.
Das and Namasudra (2022) | Hybrid encryption method | Improves security performance in IoT-based healthcare infrastructures.
Devi and Jaison (2020) | Clone detection technique | Efficiently detects and verifies cloning attacks in WSNs.
Ullah et al. (2022) | Hybrid deep learning | Detects malicious attacks proficiently on benchmark datasets.
Reshma et al. (2022) | Hybrid neighbor discovery protocol | Maximises energy efficiency and improves security performance in detecting malicious nodes based on hybrid machine learning.
Saif et al. (2022) | Hybrid IDS approach using feature selection | Uses a hybrid DT-GA to detect and classify security attacks on IoT healthcare applications at reduced cost.
The literature shows that comprehensive studies addressing the unique challenges and opportunities of the hierarchical structure of WSNs are still lacking. Furthermore, exploring the optimal combination and configuration of machine learning algorithms in the hybrid approach is crucial for improving detection accuracy and efficiency in WSNs. Future research should aim to fill these gaps and provide insights into the design and implementation of advanced IDSs for hierarchically structured WSNs.

Network models and clustering techniques
The hierarchically distributed wireless sensor network (WSN) model comprises a base station, cluster heads (CHs), and sensor nodes. In this setup, the sensor nodes communicate wirelessly with the cluster head and the sink node (Ghugar et al., 2019). The network model is designed under the following assumptions:
• Every mobile sensor node can roam freely within the network area (Pajila et al., 2021).
• The sensor nodes are deployed randomly.
• All sensor nodes are homogeneous.
• The sink can be installed at any location within the network's range.
Under these assumptions, the positions of unknown nodes in the network can be calculated using beacon and sink nodes, which know their own positions. The nodes of the WSN are unprotected against routing attacks, which typically shorten the lifespan of the sensor nodes by draining their batteries. Tunnels created by routing attacks distort the routing path and consume routing resources. To protect against denial-of-service and routing attacks, the proposed network model incorporates node-level security measures, and the proposed system uses advanced intrusion detection systems founded on hybrid machine learning approaches to identify and locate cyberattacks.
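The localisation step mentioned above can be illustrated with classic trilateration. The sketch assumes a 2-D field, exactly three non-collinear beacons with known coordinates, and noise-free distance estimates (for example derived from signal strength); real deployments would use more beacons and a least-squares fit. The function name is illustrative.

```python
import math


def trilaterate(beacons, distances):
    """Estimate (x, y) of an unknown node from three beacon positions and ranges.

    Subtracting the first circle equation (x - x1)^2 + (y - y1)^2 = d1^2 from the
    other two gives a 2x2 linear system, solved here with Cramer's rule.
    """
    (x1, y1), (x2, y2), (x3, y3) = beacons
    d1, d2, d3 = distances
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = d1**2 - d2**2 - x1**2 + x2**2 - y1**2 + y2**2
    b2 = d1**2 - d3**2 - x1**2 + x3**2 - y1**2 + y3**2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("beacons must not be collinear")
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det


beacons = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
ranges = [math.dist(b, (3.0, 4.0)) for b in beacons]  # ideal, noise-free ranges
print(trilaterate(beacons, ranges))  # approximately (3.0, 4.0)
```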
The proposed network model includes five types of nodes: sensor nodes, malicious nodes, central nodes, cluster nodes, and sink nodes, as shown in Figure 10. The CH acts as a root node to prevent malicious communication, and the central node and CH form the backbone for communication with the base station (BS). The CH uses an isolation table to conserve energy and detect attacks, with primary and secondary cluster heads performing intrusion detection (Ismail & Amin, 2019).
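The text does not specify how the primary and secondary cluster heads are elected. As a point of reference only, the classic LEACH rotation rule (the protocol that the AHIDS of Singh et al. (2017) improves on) lets each node that has not yet been a cluster head in the current epoch become one in round r when a uniform random number falls below T(n) = p / (1 - p * (r mod 1/p)), where p is the desired fraction of cluster heads. A minimal sketch, assuming this rule rather than the paper's own selection scheme:

```python
import random


def leach_threshold(p, r):
    """Classic LEACH threshold T(n) for round r, given desired CH fraction p.

    Applies only to nodes in G, i.e. nodes that have not yet served as cluster
    head in the current epoch of 1/p rounds.
    """
    epoch = round(1 / p)
    return p / (1 - p * (r % epoch))


def elect_cluster_heads(node_ids, eligible, p, r, rng):
    """Each eligible node becomes CH if its uniform draw falls below T(n)."""
    t = leach_threshold(p, r)
    return [n for n in node_ids if n in eligible and rng.random() < t]


rng = random.Random(42)
nodes = list(range(1, 11))
print(leach_threshold(0.1, 0))  # 0.1 in the first round of an epoch
print(elect_cluster_heads(nodes, {2, 5, 7}, p=0.1, r=9, rng=rng))  # [2, 5, 7]
```

Because T(n) reaches 1 in the last round of each epoch, every node that has not yet served is elected then, so the energy-hungry CH role rotates through all nodes once per epoch.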
The attack models provide a graphical representation of the network topology, along with details about key identities and routing information that can be used to identify and exploit security flaws in the system. As seen in Figure 11, attackers are presumed to have limited means and intelligence with which to interrupt network traffic. Several variants of WSN attacks are simulated to test the effectiveness of the proposed IDS, and the attack model measures how well and how securely the system functions. Threats are defined at each layer of the network: the application, transport, network, Media Access Control (MAC), and physical layers (Zou et al., 2016). Jamming attacks continuously send harmful data that disrupt short-range connections, and the transmitted jamming signals block legitimate users and services. A mathematical model of an attack is given in Equation (5).
Where $I_a$ is the transmitted information, which, depending on the IDS, can be correct or incorrect, $e_i$ is the expected information, and $m$ is the malicious content information (Gebreyesus, 2021). Malicious nodes are detected by auditing the network's transmitted data and energy consumption.
Channel priority is the major factor at the medium access control layer. Malicious nodes manipulate and change the back-off time. Attackers that advertise false information in the network corrupt network-layer routing information such as the minimum hop count.

Methodology
The proposed system starts with system design and simulation for generating and extracting datasets. Because real datasets are difficult to obtain in WSNs, we use standard datasets to evaluate the effectiveness of the new advanced intrusion detection system based on hybrid machine learning models. The usefulness and efficiency of the proposed method on various classes of attacks are demonstrated using publicly available datasets (Wu et al., 2020). The KDDCup 99, NSL-KDD, UNSW_NB15, and CICIDS2017 datasets are widely used in academic research as benchmark datasets for evaluating attack detection. The system's effectiveness is tested using the KDD Cup 99 dataset and the intrusion detection techniques. These datasets are used to evaluate the IDS with a predictive decision model. The 1999 DARPA dataset is also used in this work. The dataset is evaluated in offline and real-time modes, and the data are handled in several modes to establish the normal functioning of the network.

Benchmark datasets
The NSL-KDD data set was proposed to solve some of the issues with the KDD'99 data set (Meena & Choudhary, 2017). Although it is a newer, standardised version of the KDD data set, it still suffers from some of the problems discussed by McHugh, and it may not perfectly represent existing real networks. Nevertheless, it is used effectively as a standard data set that helps researchers compare network-based IDSs. In addition, the NSL-KDD training and testing sets contain a manageable number of records, which makes it affordable to run experiments on the entire data set rather than on a sample. NSL-KDD offers the following advantages over the original KDD'99 data set.
• Duplicate and redundant records are omitted from the training set, so classifiers are not trained on them.
• The test sets contain no duplicate records, so learners with higher detection rates on frequent records do not receive unfairly boosted performance.
• The number of records selected from each difficulty-level group is inversely proportional to the percentage of records in the original KDD data set.
• The training and test sets contain a reasonable number of records, so experiments can be run on the complete sets without randomly selecting a subset.
The NSL-KDD dataset is also used as a benchmark to test the detection performance of the proposed system with semi-supervised machine learning models (Praveen Kumar et al., 2019). Each record has 42 attributes, namely 41 features plus a class label, and the 41 features are grouped into basic, content, traffic, and host features. The dataset has a total of 148,515 samples, split into 80% training and 20% testing samples, as shown in Table 2, covering four classes of attacks. Feature vectors are extracted for training by splitting the dataset into normal and abnormal clusters. After training, the feature vectors are classified as belonging to the normal or abnormal cluster.

Table 3. Description of the four classes of attacks in the NSL-KDD dataset.
Attack class | Description
DoS | The attacker makes the network busy and denies legitimate users access.
R2L | The intruder tries to gain access to the network or a machine, for example through a specific version of FTP.
U2R | The attacker gains access to the system's root account and makes unauthorised attempts on the network.
Probe | The attacker attempts to gather information while evading the system's security.
The dataset consists of 23 classes of attack types, which are grouped into four attack categories: denial of service (DoS), remote to local (R2L), user to root (U2R), and probe. The DoS attack keeps network services busy so that authorised users cannot access them (Elsaid & Albatati, 2020). The U2R attack exploits vulnerabilities in the host system, for example by sniffing the passwords of legitimate users. The R2L attack remotely exploits vulnerabilities in a network host. The probe attack scans the network to collect information, violating the security policy. Probe and DoS attacks involve multiple connections, whereas the others involve a single connection (Pande et al., 2021). Table 3 shows the description of the four classes of attacks in the NSL-KDD benchmark dataset.
The UNSW_NB15 dataset is also used as a benchmark for evaluating the effectiveness of the proposed system. This dataset combines synthesised attack activities with normal network traffic (Jatti & Kishor Sontif, 2019). The UNSW_NB15 dataset, which contains normal and malicious activities in network traffic, was generated with the IXIA traffic generator configured with three virtual servers. The servers are connected through routers to public and private networks with their own IP addresses, and the routers are configured with a firewall that filters traffic into normal and malicious activities. The tcpdump tool is installed on the routers to capture the traffic produced by the IXIA tool, which is distributed among the network nodes and generates attack traffic together with normal traffic. The frequency distribution of the DoS attack class in the training and testing samples is shown in Table 4.
The method's effectiveness in identifying flooding attacks in WSNs is measured on the CICIDS2017 dataset. Table 5 details some of the key elements of the training and testing datasets, which are available online from the Canadian Institute for Cybersecurity. The dataset includes both benign and malicious network traffic. The benign traffic was generated to serve as realistic background activity while data were gathered, and it was compiled from the activity of 25 users across a variety of protocols. As shown in Table 6, the dataset used to develop the predictive machine learning models contains 485,881 instances and 31 features, divided into 80% training and 20% testing subsets. There are five distinct types of DoS attacks (Bansal & Kaur, 2018): Slowhttptest, Slowloris, Hulk, Heartbleed, and GoldenEye. The DoS attack samples used for training and testing are shown in Table 6.
Data processing, including normalisation, missing-value imputation, and aggregation, is required to rearrange the data before the training and testing phases begin. Missing values are filled in with the mean of the available values (Anbarasan et al., 2020). The data are then rescaled to the range [0, 1] using each feature's minimum and maximum values.
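The two preprocessing steps above, mean imputation followed by min-max scaling, can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline. The helper names `impute_mean` and `min_max_scale` and the sample column are hypothetical, and missing values are assumed to be encoded as `None` or NaN.

```python
import math

def impute_mean(column):
    """Replace missing entries (None or NaN) with the mean of the observed values."""
    observed = [v for v in column if v is not None and not math.isnan(v)]
    mean = sum(observed) / len(observed)
    return [mean if v is None or math.isnan(v) else v for v in column]

def min_max_scale(column):
    """Rescale values to [0, 1] with x' = (x - min) / (max - min)."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant feature: avoid division by zero
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

# Hypothetical 'duration' feature with two missing readings.
duration = [2.0, None, 8.0, float("nan"), 5.0]
filled = impute_mean(duration)   # observed mean is (2 + 8 + 5) / 3 = 5.0
scaled = min_max_scale(filled)
print(filled)  # [2.0, 5.0, 8.0, 5.0, 5.0]
print(scaled)  # [0.0, 0.5, 1.0, 0.5, 0.5]
```

In practice, compute the mean, minimum, and maximum on the training split only and reuse them for the test split. Otherwise, information from the test data leaks into training.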

Proposed AHIDS framework
Figure 12 depicts how the hybrid machine learning models in the proposed system categorise network data flow as either normal operation or harmful attacks. Hybrid machine learning approaches are commonly combined to build intrusion detection systems for attacks in WSNs (Shi & Li, 2022). As shown in Figure 12, the major technical components of the proposed framework are sensor deployment, data collection and information processing, data aggregation and clustering, decision-making, preprocessing, machine learning and optimisation for training and testing, classifier generation, and a detection and classification module for records. Because the proposed method is simple and effective, it has potential for deployment in real-world wireless sensor networks organised in hierarchical clusters. Conditional control statements are a key component of the hybrid machine learning approaches for deciding the outcomes of events. The new aspect of this decision-making method is the use of a collaborative process for data analysis, which in turn aids the automatic construction of predictive models. As seen in Figure 12, decision nodes are used for prediction, while leaf nodes give the final classification. The split into training and testing benchmarks is governed by rules and reasoning generated by the hybrid machine learning models, and the target categorisation is performed using a statistical metric.
Finally, the proposed system employs hybrid machine learning approaches (Gupta et al., 2022) to detect and localise attacks using data from both the attack and non-attack phases. Using the modified dataset, the proposed AIDS-HML learns to identify potential attacks. Under the conditional independence assumption, that is, that the features of each sample are independent of one another given its class label, AIDS-HML is an efficient classification method. AIDS-HML is thus an enhanced, hybridised form of machine learning for intrusion detection.
Designing advanced intrusion detection systems based on hybrid machine learning techniques for hierarchical wireless sensor networks offers several advantages but also has drawbacks. The potential drawbacks of the proposed system are listed below.

Disadvantages.
1. Increased Complexity. Implementing hybrid machine learning techniques in intrusion detection systems adds complexity to the design and deployment process. Combining different algorithms and managing their interactions requires expertise in both machine learning and wireless sensor network domains.
2. Higher Computational Demands. Hybrid systems may require more computational resources than single-method approaches. The processing and memory requirements can be significant, particularly in resource-constrained wireless sensor networks, which can affect system performance and energy efficiency.
3. Training and Maintenance Overhead. Hybrid machine learning models typically require more extensive training and maintenance processes. Ensuring accurate model updates, handling concept drift (changes in intrusion patterns over time), and managing retraining procedures can be time-consuming and resource-intensive.
4. Increased Vulnerability to Attacks. Advanced intrusion detection systems may become targets for attackers seeking to manipulate or evade detection. Hybrid machine learning models can be susceptible to adversarial attacks, in which attackers exploit vulnerabilities in the learning algorithms to deceive the system. Robustness against such attacks must be considered during system design.

Sensor deployment and routing techniques
Sensor nodes are deployed according to various network models for detecting and classifying attacks in WSNs. However, WSNs face significant routing difficulties because of their restricted power supply, low transmission bandwidth, limited memory, and low processing capacity (Praveen Kumar et al., 2019). Because of these limitations, an adversary can easily target individual nodes of a WSN deployed in a hostile area (Rouissi et al., 2019). It is therefore crucial to identify malicious attacks so that the network is not deceived by fabricated data supplied by compromised nodes. Here, we distinguish between internal and external attacks on WSNs.
External attacks are carried out by parties outside the network with the goal of reducing the effectiveness of the WSN. We therefore describe a proposal that protects against routing attacks while maintaining data integrity. This paper uses hybrid machine learning (HML) methods to create secure protocols for feature extraction and route discovery in moderately complex hybrid and tree network topologies in WSNs. Machine learning-based routing brings many advantages to WSNs, including the following.
• Machine learning techniques can adapt to new environments and select new CHs for routing in WSNs without re-programming.
• Hybrid machine learning models can serve various purposes in WSNs, including optimal routing, reduced communication overhead, and delay awareness.
In this study, we employ a genetic algorithm-based artificial neural network (GA-ANN) method to detect wormhole attacks and to determine energy-efficient, robust routes in WSNs. GA-ANN trains the routing protocols on a wide range of inputs, including residual energy, node distances, route discovery, path selection, feature extraction, cluster heads (CHs), border nodes, and the sink or base station. A large training set is produced, and the ANN learns, through backpropagation, effective threshold values for selecting a set of trustworthy CHs. This technique prevents data loss in WSNs and balances energy consumption among sensor nodes.
In this study, cluster-head selection and shortest-path routing in WSNs are treated as engineering optimisation problems whose optimal solutions must satisfy special conditions such as design principles, resource limitations, and safety requirements (Agushaka et al., 2022), as shown in Figure 13. Metaheuristic algorithms typically cannot solve constrained optimisation problems directly. Designing and optimising WSNs poses several challenges because of the limited energy, communication bandwidth, and processing capabilities of the sensor nodes (Ovelade & Ezugwu, 2021). However, when equipped with constraint-handling techniques (CHTs), optimisers can deal with both the objective function and the corresponding constraints. The purpose of optimisation is to find the optimal solution to a problem while taking all relevant factors into account. Optimisation methods work by gradually improving a set of generated solutions using a set of optimisation rules and evaluating those solutions with a defined objective function (Abualigah, Diabat, et al., 2021).
Characteristics such as an unknown search space, discrete or continuous search spaces, non-differentiable objective functions, high dimensionality, and non-convexity prevent many optimisation problems from being solved in a reasonable amount of time using classical methods alone (Ezugwu et al., 2022). Metaheuristic algorithms, coupled with appropriate fitness functions and problem-specific adaptations, have been used to improve the performance, reliability, and energy efficiency of IoT-WSNs (Abualigah et al., 2022). In each iteration, the algorithm evaluates the fitness of the candidate population using the objective function and constraints, and the next generation of candidates is produced based on the calculated fitness.
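The fitness-driven loop described above can be made concrete with a small genetic algorithm for cluster-head selection. This is a sketch under stated assumptions, not the optimiser used in the paper:

- Each candidate solution is a set of k CH indices.
- The fitness function and its weights are hypothetical. It is the mean distance from every node to its nearest CH, minus a weighted mean residual energy of the chosen CHs (lower is better).
- Truncation selection with elitism stands in for the authors' unspecified selection scheme.

```python
import math
import random

def fitness(chs, nodes, w_dist=1.0, w_energy=50.0):
    """Lower is better: mean node-to-nearest-CH distance minus weighted mean CH energy."""
    mean_dist = sum(min(math.dist(n["pos"], nodes[c]["pos"]) for c in chs) for n in nodes) / len(nodes)
    mean_energy = sum(nodes[c]["energy"] for c in chs) / len(chs)
    return w_dist * mean_dist - w_energy * mean_energy

def crossover(a, b, k, rng):
    """Child takes k distinct CH indices drawn from the union of both parents."""
    return tuple(sorted(rng.sample(sorted(set(a) | set(b)), k)))

def mutate(chs, n_nodes, rng, rate=0.2):
    """With probability `rate`, swap one CH for a random node that is not yet a CH."""
    chs = list(chs)
    if rng.random() < rate:
        outside = [i for i in range(n_nodes) if i not in chs]
        chs[rng.randrange(len(chs))] = rng.choice(outside)
    return tuple(sorted(chs))

def select_cluster_heads(nodes, k=3, pop_size=30, generations=60, seed=1):
    rng = random.Random(seed)
    n = len(nodes)
    population = [tuple(sorted(rng.sample(range(n), k))) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda c: fitness(c, nodes))  # evaluate fitness
        parents = ranked[: pop_size // 2]                               # truncation selection
        children = [ranked[0]]                                          # elitism: keep the best
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, k, rng), n, rng))
        population = children                                           # next generation
    best = min(population, key=lambda c: fitness(c, nodes))
    return best, fitness(best, nodes)

# Hypothetical field: 40 nodes in a 100 m x 100 m area with random residual energy.
field_rng = random.Random(0)
nodes = [{"pos": (field_rng.uniform(0, 100), field_rng.uniform(0, 100)),
          "energy": field_rng.uniform(0.2, 1.0)} for _ in range(40)]
best, score = select_cluster_heads(nodes)
print("cluster heads:", best, "fitness:", round(score, 2))
```

Constraints such as a minimum residual energy for a CH could be handled in the same place, for example as a penalty term added inside `fitness`.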
In the context of IoT-WSNs, an optimisation process involves finding the optimal values for specific system parameters in order to meet the system design requirements while minimising cost and finding the shortest path (Abualigah, Yousri, et al., 2021). The goal is to achieve an optimal configuration or solution that optimises the system's performance and efficiency. The parameters that are typically optimised in IoT-WSNs vary with the specific application and design objectives. Common parameters that are often optimised include the following.
• Node Placement. The optimal locations for sensor nodes are determined to achieve the desired coverage, connectivity, and energy efficiency. This involves finding the optimal positions or coordinates for deploying the sensor nodes within the network area.
• Routing. The optimal routing paths are identified to transmit data efficiently from source to destination nodes. This includes finding the shortest or most energy-efficient paths while considering network dynamics, congestion, and quality of service (QoS) requirements.
• Energy Management. Energy consumption is optimised by dynamically adjusting parameters such as sleep-wake schedules, duty cycles, or transmission power levels. The objective is to prolong the network lifetime while meeting the application requirements.
• Resource Allocation. The allocation of network resources, such as bandwidth and time slots, is optimised to ensure efficient utilisation. This involves determining how resources should be allocated among sensor nodes or applications to maximise overall system performance.
• Data Aggregation. The optimal strategies for aggregating data from multiple sensor nodes are identified to minimise redundant transmissions and conserve energy. This involves determining which nodes should aggregate data and how the data should be fused or compressed.
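For the routing item, a shortest-path search can be illustrated with Dijkstra's algorithm. This is an assumed, generic choice rather than the paper's routing protocol. The topology and link costs in the example are hypothetical; the costs could stand for distance or per-hop transmission energy, as long as they are non-negative.

```python
import heapq

def shortest_path(graph, source, target):
    """Dijkstra's algorithm over non-negative link costs.

    graph maps node -> {neighbour: cost}. Returns (total_cost, path),
    or (inf, []) when the target is unreachable.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue  # stale heap entry
        visited.add(u)
        if u == target:
            break
        for v, cost in graph.get(u, {}).items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return float("inf"), []
    path = [target]
    while path[-1] != source:  # walk predecessors back to the source
        path.append(prev[path[-1]])
    return dist[target], path[::-1]

# Hypothetical hierarchy: sensor S1 reaches the base station BS through cluster heads.
links = {
    "S1": {"CH1": 4.0, "CH2": 1.0},
    "CH2": {"CH1": 2.0, "BS": 6.0},
    "CH1": {"BS": 2.0},
}
cost, path = shortest_path(links, "S1", "BS")
print(cost, path)  # 5.0 ['S1', 'CH2', 'CH1', 'BS']
```

Setting a link's cost to the energy needed to transmit over that hop turns the same search into a minimum-energy route search.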
By employing an optimisation process in IoT-WSNs, system designers and engineers can identify the most efficient and cost-effective network configurations, enabling improved performance, energy efficiency, and overall system design.

Data pre-processing
The performance of a machine learning model is indeed influenced by the quality of the datasets on which it is trained (Singh et al., 2022b).The quality of the datasets can significantly impact the model's ability to learn patterns, generalise to new data, and make accurate predictions.Here are some key points regarding the importance of dataset quality in machine learning.
• High-quality datasets should be accurate, free from errors, and reflect the true values or labels of the target variable.
• Datasets should contain all the features and attributes required for the learning task.
• The datasets should represent the real-world problem the model aims to solve.
• Balancing the dataset or using appropriate techniques to handle class imbalance is crucial to ensure fair and accurate predictions.
• Proper preprocessing steps, such as cleaning, normalisation, and feature engineering, are essential for preparing the data before training the model.
• The size of the dataset can also influence the model's performance.
• Rigorous quality assurance measures should be applied to datasets, including data validation, outlier detection, and error handling.
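Two of these checks, removing duplicate records and handling class imbalance, can be sketched as follows. Random oversampling is used here only as one simple balancing technique. That choice is an assumption, since the paper does not say which technique it uses. The records and labels are hypothetical.

```python
import random
from collections import Counter

def drop_duplicates(records):
    """Keep the first occurrence of each identical (features, label) record."""
    seen, unique = set(), []
    for features, label in records:
        key = (tuple(features), label)
        if key not in seen:
            seen.add(key)
            unique.append((features, label))
    return unique

def random_oversample(records, seed=0):
    """Resample minority-class records with replacement until every class matches the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for rec in records:
        by_class.setdefault(rec[1], []).append(rec)
    target = max(len(recs) for recs in by_class.values())
    balanced = []
    for recs in by_class.values():
        balanced.extend(recs)
        balanced.extend(rng.choice(recs) for _ in range(target - len(recs)))
    return balanced

# Hypothetical records: (feature vector, class label).
data = [([0.1, 3], "normal"), ([0.1, 3], "normal"), ([0.4, 1], "normal"),
        ([0.9, 7], "normal"), ([0.8, 9], "dos")]
clean = drop_duplicates(data)
balanced = random_oversample(clean)
print(Counter(label for _, label in clean))     # normal: 3, dos: 1
print(Counter(label for _, label in balanced))  # normal: 3, dos: 3
```

Apply oversampling only to the training split. Duplicating records before the train/test split would place copies of test samples in the training data.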
The data preprocessing model converts raw network traffic into the format required by the classification model in the next stage (Zhao et al., 2023). To provide both training and testing data, this study measures and normalises raw network traffic using a variety of data preparation approaches. A raw dataset is one that has not undergone any preprocessing (Roy & Chowdhury, 2021), and it is typically incomplete, noisy, and poorly formatted. As a result, building machine learning models directly from a raw dataset is impractical, as shown in Figure 14.
Preprocessing the raw data, for example by eliminating duplicates and standardising the format, improves the efficiency of a machine learning model, which is why this step is so important before the training phase. The preprocessing techniques are described below.
Filtering: Filtering is one way to clean up data. The filter estimates a desired signal pattern from a distorted signal pattern, with the goal of minimising the mean square error between the estimated and intended signal patterns.
Feature selection: Feature selection approaches choose relevant and useful features for model learning, enhancing prediction accuracy and reducing overfitting, training time, and complexity. Common methods include filter, wrapper, and embedded methods. Selecting the right features is crucial for data mining projects, and Figure 15 illustrates the features selected from the NSL-KDD benchmark dataset (Intelligence et al., 2019). To improve application performance, it is necessary to identify the best feature selection method and apply it in the relevant processes. Feature selection reduces the dataset's attributes and captures the associations between them, streamlining the procedure. However, there is no universal approach to feature selection, and the dataset's characteristics should be considered when choosing a method. The primary challenge is finding the features that best discriminate between classes, which requires different strategies for different datasets.
Many different feature selection methods exist. Here, Spearman's rank correlation coefficient is used in a recursive feature selection process that dynamically selects features, as shown in Equation (6).
Where $\rho$ is the correlation coefficient, $x_i$ and $y_i$ are the rank values of the two feature variables, and $\bar{x}$ and $\bar{y}$ are the mean values of $x$ and $y$.
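The sketch below computes Spearman's ρ as the Pearson correlation of the ranks, with tied values receiving their average rank, and uses it in a simple greedy filter. The selection rule is an assumption standing in for the paper's recursive procedure:

1. Features are ranked by |ρ| with the class label.
2. A feature is skipped if its |ρ| with an already selected feature reaches a redundancy threshold.

The thresholds `relevance_min` and `redundancy_max` and the feature names are hypothetical.

```python
def rank(values):
    """1-based ranks; tied values receive the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks of x and y."""
    rx, ry = rank(x), rank(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def select_features(features, label, relevance_min=0.3, redundancy_max=0.9):
    """Greedy filter: most label-correlated first, skipping features redundant with ones kept."""
    scores = {name: abs(spearman(col, label)) for name, col in features.items()}
    selected = []
    for name in sorted(scores, key=scores.get, reverse=True):
        if scores[name] < relevance_min:
            break  # remaining features are even less relevant
        if all(abs(spearman(features[name], features[s])) < redundancy_max for s in selected):
            selected.append(name)
    return selected

label = [0, 0, 0, 1, 1, 1]  # 0 = normal, 1 = attack (hypothetical)
features = {
    "src_bytes": [10, 12, 11, 90, 95, 99],
    "src_kbytes": [0.010, 0.012, 0.011, 0.090, 0.095, 0.099],  # same information, rescaled
    "noise": [5, 1, 4, 2, 6, 3],
}
print(select_features(features, label))  # ['src_bytes']
```

The rescaled copy `src_kbytes` has exactly the same ranks as `src_bytes`, so it is rejected as redundant. The `noise` feature falls below the relevance threshold.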
Feature engineering: The feature engineering phase provides distinguishing cues that help isolate each pattern. When a raw dataset has a large feature set that is considered redundant, feature extraction is used to derive a set of non-redundant, informative features from the original feature set.
Windowing (Saheed et al., 2022).
Normalisation: Machine learning algorithms benefit from normalised data, which helps them achieve strong results in generalised prediction models. Min-max normalisation is used to standardise the dataset, scaling values to the range [0, 1] so that features with large numeric ranges do not overshadow features with smaller ranges (Mojtaba et al., 2016). Normalisation also avoids numerical problems during calculation, improving overall performance (Mojtaba et al., 2016; Saheed et al., 2022).

Machine learning and classification techniques
Data mining operations can use various techniques, including hybrid machine learning methods such as Naive Bayes (NB), Artificial Neural Networks (ANN), Decision Tree (DT), Extreme Gradient Boosting (XGB), Extra Tree (ET), Random Forest (RF), Ensemble Stacking (ES), and cluster labelling K-Means (CLK-M), for attack detection and categorisation (Intelligence et al., 2019). Machine learning enables systems to learn and improve from experience without explicit programming, enhancing the reliability, efficiency, and cost-effectiveness of computational procedures (Praveen Kumar et al., 2019). ML models are developed by combining the extracted or selected feature set with machine learning methods, so that complex data can be processed automatically and accurately. Supervised learning is suitable for standard fingerprinting data, while unsupervised or semi-supervised learning can be appropriate for crowdsourced data. Selecting the right data mining algorithm based on the dataset's structure is crucial for optimal performance. This overview outlines the use of data mining methods in classification procedures on representative benchmark datasets.

Naive Bayes
Naive Bayes is a basic classification technique that uses probability theory to determine the class to which a sample belongs. It is straightforward, because predictions can be made after a single scan of the data. The technique is based on a simplified form of Bayes' theorem.
Conditional probability theory is used to predict the class to which a given sample from a dataset belongs. Classes for test samples are determined using knowledge gained during training on the training dataset. Despite its apparent simplicity, the Naive Bayes algorithm is highly effective. The probabilities involved in Bayes' theorem are given in Equation (7).
$$P(c \mid x_1, x_2, \ldots, x_n) = \frac{P(x_1, x_2, \ldots, x_n \mid c)\,P(c)}{P(x_1, x_2, \ldots, x_n)} \tag{7}$$
Where $P(\cdot)$ denotes probability, $c$ is the class (the desired outcome), and $x_1, x_2, \ldots, x_n$ are the feature values (properties) of the sample.
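A minimal Gaussian Naive Bayes classifier shows how Equation (7) is used together with the conditional independence assumption. Modelling each feature as a per-class Gaussian is an assumption, since the paper does not state its likelihood model, and the training data are hypothetical. The denominator of Equation (7) is the same for every class, so the classifier compares only the numerators. It does so in log space to avoid numerical underflow.

```python
import math
from collections import defaultdict

class GaussianNaiveBayes:
    """Equation (7) with the naive factorisation P(x1..xn | c) = prod_j P(xj | c),
    where each P(xj | c) is modelled as a Gaussian (an assumption)."""

    def fit(self, X, y, var_floor=1e-9):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        self.priors, self.stats = {}, {}
        for label, rows in groups.items():
            self.priors[label] = len(rows) / len(X)          # P(c)
            cols = list(zip(*rows))
            means = [sum(c) / len(c) for c in cols]
            variances = [sum((v - m) ** 2 for v in c) / len(c) + var_floor
                         for c, m in zip(cols, means)]
            self.stats[label] = (means, variances)
        return self

    def _log_joint(self, row, label):
        # log P(c) + sum_j log N(x_j; mean_cj, var_cj); the evidence term is dropped
        # because it does not change which class has the largest posterior.
        means, variances = self.stats[label]
        log_p = math.log(self.priors[label])
        for x, m, v in zip(row, means, variances):
            log_p += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        return log_p

    def predict(self, X):
        return [max(self.stats, key=lambda c: self._log_joint(row, c)) for row in X]

# Hypothetical two-feature records (e.g. error rate, packets per second).
X_train = [[0.1, 20], [0.2, 22], [0.15, 19], [0.9, 300], [0.8, 280], [0.95, 310]]
y_train = ["normal"] * 3 + ["attack"] * 3
model = GaussianNaiveBayes().fit(X_train, y_train)
print(model.predict([[0.12, 21], [0.85, 290]]))  # ['normal', 'attack']
```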
An optimal Naive Bayes-based cluster head selection method is used for secure, low-power routing in WSNs. An ideal set of CHs maximises the network's lifetime while minimising the energy drain on individual sensor nodes. Naive Bayes keeps the network flexible when features are added or modified dynamically. This study describes a new adaptive integrated routing architecture for data collection using a Bayesian approach.

Random forest
A random forest (RF) is an ensemble of decision trees, each trained on a different sample of the data. Breiman introduced this ensemble classifier in 2001. In the random forest algorithm, a training subset is generated for each tree by bootstrap sampling, that is, by drawing records with replacement. To grow the trees, the attributes are picked at random: at each node, the algorithm considers a random subset of attributes and uses the best of them as the basis for a branch, as shown in Figure 16. The derived trees are therefore produced from randomly selected factors. The sampled datasets are used as input to the Classification And Regression Trees (CART) algorithm for tree building. Each tree labels the sample, the classes assigned by all trees are compiled, and each instance is assigned the class that receives the most votes. The CART algorithm prunes its trees, whereas the RF method does not. An important reason why the RF algorithm outperforms other decision tree approaches is that it does not rely on pruning.
The RF algorithm is fast, flexible, and more effective than alternative decision tree approaches despite using numerous trees. The CART algorithm uses the Gini index to decide which branch to create from each node. Tree development parameters include the number of trees and the number of variables considered per node. The basic operation of the RF algorithm for attack detection and classification, using training and testing on the benchmark datasets, is depicted in Figure 16.
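The following sketch illustrates the ingredients described above: bootstrap sampling with a random feature subset, Gini-based splitting, and a majority vote. To stay short, it makes two simplifications of full CART trees:

- Each tree is a depth-one decision stump.
- Split thresholds are midpoints between consecutive sorted feature values.

The data are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    """Gini index: 1 - sum over classes of p_k^2."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def best_stump(X, y, feature_ids):
    """Depth-one tree: the (feature, threshold) split with the lowest weighted Gini index."""
    best = None
    for f in feature_ids:
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between consecutive values
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, majority(left), majority(right))
    if best is None:  # no usable split: predict the majority class everywhere
        return feature_ids[0], float("inf"), majority(y), majority(y)
    return best[1:]

def fit_forest(X, y, n_trees=25, seed=0):
    rng = random.Random(seed)
    d = len(X[0])
    m = max(1, round(d ** 0.5))  # size of the random feature subset
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap: sample with replacement
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        forest.append(best_stump(Xb, yb, rng.sample(range(d), m)))
    return forest

def predict_forest(forest, row):
    """Each stump votes; the most common class wins."""
    return majority([left if row[f] <= t else right for f, t, left, right in forest])

X = [[1, 5], [2, 4], [3, 6], [8, 1], [9, 2], [7, 0]]  # hypothetical feature vectors
y = ["normal", "normal", "normal", "attack", "attack", "attack"]
forest = fit_forest(X, y)
print(predict_forest(forest, [2, 5]), predict_forest(forest, [8, 1]))  # normal attack
```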

Decision trees
Decision trees (DTs) are a supervised ML technique for classification that uses a set of if-then rules to simplify the process and improve human comprehension. A decision tree has two types of nodes: leaf nodes, which represent the outcomes, and decision nodes, which represent the choices (between alternatives) that lead to those outcomes. A decision tree predicts a class or target by inferring decision rules from training data. Decision trees are easy to understand, help eliminate confusion when making choices, and facilitate in-depth analysis. Connectivity, anomaly detection, data aggregation, and mobile sink path selection are just a few of the many WSN problems that decision trees can help address.
The algorithm employs a divide-and-conquer technique. In contrast to ID3, it normalises the information gain, computing a gain ratio from the information gain values. Intermediate trees can be built and restructured while the decision tree is being constructed. The method also prunes branches to eliminate potentially incorrect data and lower the error rate. If all samples at a node belong to the same class, the node is labelled as a leaf with that class. If a node contains samples from multiple classes, an optimal splitting attribute is chosen and the tree expands from there: each feature's information gain is computed, and the feature with the highest value is chosen as the decision node. In the WSN setting, cluster-head election is the best time to identify and remove any malicious nodes. After a decision node is identified, the procedure continues by creating child branches from that node. If all elements in a subset have the same value, the procedure ends and that value is used as the output. The process also ends if the subset contains exactly one node and no distinguishing features are identified.
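The split-selection step described above can be sketched for categorical attributes as follows. The sketch covers three parts: entropy, information gain normalised by split information (the gain ratio), and the pure-node leaf rule. The full recursive tree construction and pruning are not shown. The attribute names echo NSL-KDD fields, but the records are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(rows, labels, attr):
    """Information gain of a categorical attribute divided by its split information."""
    n = len(labels)
    parts = {}
    for row, lab in zip(rows, labels):
        parts.setdefault(row[attr], []).append(lab)
    gain = entropy(labels) - sum(len(p) / n * entropy(p) for p in parts.values())
    split_info = -sum(len(p) / n * math.log2(len(p) / n) for p in parts.values())
    return gain / split_info if split_info > 0 else 0.0

def choose_split(rows, labels):
    """Return None for a pure node (make it a leaf), else the attribute with the best gain ratio."""
    if len(set(labels)) == 1:
        return None
    return max(rows[0], key=lambda a: gain_ratio(rows, labels, a))

rows = [{"protocol": "tcp", "flag": "SF"}, {"protocol": "tcp", "flag": "S0"},
        {"protocol": "udp", "flag": "SF"}, {"protocol": "udp", "flag": "S0"}]
labels = ["normal", "attack", "normal", "attack"]
print(choose_split(rows, labels))  # flag
```

Here `flag` separates the classes perfectly (gain ratio 1.0), whereas `protocol` carries no information about the label (gain ratio 0.0).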

K-Means clustering
With minimal effort, the k-means method can divide a dataset into a specified number of groups. It starts from a random sample of k points as the initial centres and assigns each remaining point to its nearest centre. After the data are partitioned into clusters, the centroid of each cluster is recalculated. The centroids shift at each iteration until the algorithm reaches a plateau at which no centroid moves. If n is the number of points, k the number of centroids, i the number of iterations, and d the number of attributes, then the time complexity of the k-means algorithm is O(n · k · i · d). The minimisation function for the sum of squared errors is given in Equation (8).
Where $N$ is the number of data points in the $i$-th cluster and $\lVert x_i - y_j \rVert$ is the Euclidean distance between $x_i$ and $y_j$. As the simplest clustering method, k-means is also useful in WSNs for identifying ideal cluster heads (CHs) and detecting malicious nodes when transmitting data to the base station. The method also works well for locating productive mobile sink rendezvous points. The choice of k can affect the outcome, so selecting the right value of k is crucial for obtaining the best results from the analysed data. Several distance measures, such as Euclidean, Manhattan, and Minkowski, can be used to find neighbouring nodes, and the relevant formulas are given by Equation (9).
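The sketch below implements the assignment and update steps of k-means with a Minkowski distance, where p = 1 gives Manhattan and p = 2 gives Euclidean distance. It also reports the sum of squared errors of the final clustering. The `label_clusters` step tags each cluster with the majority class of its training points. It is our assumption of how cluster labelling (as in CLK-M) can turn k-means into a classifier, since the paper does not give its exact procedure. The points and labels are hypothetical.

```python
import random
from collections import Counter

def minkowski(a, b, p=2):
    """Minkowski distance; p=1 is Manhattan, p=2 is Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

def nearest(point, centroids, p=2):
    return min(range(len(centroids)), key=lambda j: minkowski(point, centroids[j], p))

def kmeans(points, k, p=2, max_iter=100, seed=0):
    rng = random.Random(seed)
    centroids = [list(c) for c in rng.sample(points, k)]  # k random points as initial centres
    for _ in range(max_iter):
        assign = [nearest(pt, centroids, p) for pt in points]  # assignment step
        updated = []
        for j in range(k):  # update step: mean of each cluster's members
            members = [pt for pt, a in zip(points, assign) if a == j]
            updated.append([sum(col) / len(members) for col in zip(*members)] if members else centroids[j])
        if updated == centroids:  # converged: no centroid moved
            break
        centroids = updated
    assign = [nearest(pt, centroids, p) for pt in points]
    sse = sum(minkowski(pt, centroids[a], 2) ** 2 for pt, a in zip(points, assign))
    return centroids, assign, sse

def label_clusters(assign, labels, k):
    """Assumed cluster-labelling step: tag each cluster with its majority training label."""
    tags = {}
    for j in range(k):
        members = [lab for a, lab in zip(assign, labels) if a == j]
        tags[j] = Counter(members).most_common(1)[0][0] if members else None
    return tags

points = [[0, 0], [1, 0], [0, 1], [10, 10], [11, 10], [10, 11]]  # hypothetical traffic features
labels = ["normal"] * 3 + ["attack"] * 3
centroids, assign, sse = kmeans(points, k=2)
tags = label_clusters(assign, labels, k=2)
print(tags[nearest([0.5, 0.5], centroids)], tags[nearest([10.5, 10.5], centroids)])  # normal attack
print(round(sse, 3))  # 2.667
```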

Hybrid-ensemble machine learning techniques
When multiple machine learning algorithms are combined into an ensemble, the resulting classification is often more accurate than that of any single model. This approach applies several learning procedures based on various machine learning approaches and then combines their results to categorise samples. The underlying algorithm performs two basic steps. First, the original dataset is partitioned, and a base model is trained on each subset. Second, the base models are aggregated into a single model that produces the results. The stacking strategy differs from standard machine learning methods because it involves a model-production step in which models built from the training set are combined. The stacking algorithm can be described as follows.
• Models are created during training by employing the dataset and the training method.
• Each derived model produces predictions for all of the dataset's training samples.
• The combiner method builds the final model from the outputs of the other models on the training dataset. Once the final model is obtained, it is used to categorise and test the dataset samples.
• A final prediction is made with the final model once all test samples have been classified, and the class predicted by the stacking algorithm is chosen for each sample.
The term ensemble technique describes three distinct approaches: bagging, boosting, and stacking. They differ in their data mining approaches and in the capabilities of the combiner models they use. Whereas boosting mainly aims to maximise predictive power and bagging mainly aims to minimise variance, stacking strives to do both. The function that generates a single model uses averaging (equal weights) in the bagging strategy, a weighted majority vote in the boosting approach, and logistic regression in the stacking approach.
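The three combiner functions can be sketched as follows. The stacking combiner is a plain logistic regression trained by batch gradient descent. The following values are all hypothetical:

- the base-model outputs;
- the model weights `alphas`;
- the learning rate and epoch count.

A proper stacking setup trains the combiner on out-of-fold predictions of the base models to avoid overfitting. That step is omitted here for brevity.

```python
import math
from collections import Counter, defaultdict

def bagging_combine(predictions):
    """Bagging: every base model gets an equal vote."""
    return Counter(predictions).most_common(1)[0][0]

def boosting_combine(predictions, alphas):
    """Boosting: weighted majority vote, each model weighted by its alpha."""
    score = defaultdict(float)
    for label, alpha in zip(predictions, alphas):
        score[label] += alpha
    return max(score, key=score.get)

def fit_stacking_combiner(base_outputs, y, lr=0.5, epochs=2000):
    """Stacking: logistic regression on base-model outputs (1 = attack, 0 = normal)."""
    w, b, n = [0.0] * len(base_outputs[0]), 0.0, len(y)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, target in zip(base_outputs, y):
            p = 1 / (1 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
            grad_w = [g + (p - target) * xi for g, xi in zip(grad_w, x)]
            grad_b += p - target
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]  # gradient step on the mean log-loss
        b -= lr * grad_b / n
    return w, b

def stacking_predict(w, b, x):
    return int(sum(wi * xi for wi, xi in zip(w, x)) + b >= 0)

# Hypothetical outputs of three base models on six training samples.
base_outputs = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 1]]
y = [1, 1, 0, 0, 1, 0]
w, b = fit_stacking_combiner(base_outputs, y)
print([stacking_predict(w, b, x) for x in base_outputs])  # [1, 1, 0, 0, 1, 0]
print(bagging_combine(["attack", "normal", "attack"]))    # attack
print(boosting_combine(["attack", "normal", "normal"], [2.0, 0.5, 0.5]))  # attack
```

In this toy example, the first base model agrees with the labels while the other two are noisy. The logistic-regression combiner learns to rely on the first model, which a plain equal-weight vote cannot do.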
For the proposed strategy, a tree-structured Parzen estimator (TPE) is combined with hyperparameter and Bayesian optimisation (BO) techniques to further enhance the classification performance of the hybrid machine learning models on the benchmark datasets. Fine-tuning the hyperparameters is integral to every machine learning process, and hyperparameter optimisation (HPO) improves ML performance while reducing practitioner involvement (Feurer & Hutter, 2019). HPO treats the model as a black box and searches globally for the hyperparameter values that give the best evaluation score. Bayesian optimisation has become increasingly prominent in HPO, including for deep neural networks, as a framework for the global optimisation of expensive black-box functions. It is an iterative method that uses a probabilistic surrogate model, commonly a Gaussian process, together with an acquisition function to decide which configuration to evaluate next. Random forests and tree-structured Parzen estimators are two tree-based alternatives to the Gaussian-process surrogate. This work combines Bayesian-based optimisation (BBO) with TPE to determine the optimal evaluation point for fully automated machine learning.
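To make the TPE idea concrete, the following minimal sketch implements a one-dimensional TPE loop in NumPy: observations are split at quantile `gamma` into a "good" density l(x) and a "bad" density g(x), each estimated with a Gaussian Parzen window, and the next point is the candidate that maximises l(x)/g(x), which under the TPE model maximises expected improvement. The objective `validation_loss`, the fixed bandwidth and all parameter values are assumptions; libraries such as Hyperopt or Optuna add the tree structure over conditional hyperparameters, adaptive bandwidths and priors, which are omitted here.

```python
import numpy as np


def parzen_pdf(x, centres, bandwidth):
    """Gaussian Parzen-window density estimate evaluated at the points x."""
    z = (x[:, None] - centres[None, :]) / bandwidth
    norm = len(centres) * bandwidth * np.sqrt(2 * np.pi)
    return np.exp(-0.5 * z ** 2).sum(axis=1) / norm


def tpe_minimise(objective, low, high, n_init=10, n_iter=40, gamma=0.25,
                 n_candidates=24, seed=0):
    """Minimal 1-D tree-structured Parzen estimator (TPE) loop."""
    rng = np.random.default_rng(seed)
    xs = list(rng.uniform(low, high, n_init))        # random warm-up points
    ys = [objective(x) for x in xs]
    bandwidth = 0.1 * (high - low)
    for _ in range(n_iter):
        order = np.argsort(ys)
        n_good = max(1, int(np.ceil(gamma * len(ys))))
        good = np.array(xs)[order[:n_good]]           # samples for l(x)
        bad = np.array(xs)[order[n_good:]]            # samples for g(x)
        # Draw candidates from l(x) and keep the one maximising l(x) / g(x).
        cand = rng.choice(good, n_candidates) + rng.normal(0, bandwidth, n_candidates)
        cand = np.clip(cand, low, high)
        score = parzen_pdf(cand, good, bandwidth) / (parzen_pdf(cand, bad, bandwidth) + 1e-12)
        x_next = float(cand[np.argmax(score)])
        xs.append(x_next)
        ys.append(objective(x_next))
    best = int(np.argmin(ys))
    return xs[best], ys[best]


# Hypothetical stand-in for a validation loss as a function of one
# hyperparameter (e.g. log10 of a learning rate); its minimum is at x = -2.
def validation_loss(x):
    return (x + 2.0) ** 2 + 0.1


best_x, best_loss = tpe_minimise(validation_loss, low=-5.0, high=0.0)
print(f"best x = {best_x:.3f}, loss = {best_loss:.4f}")
```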

Performance evaluation metrics
Detection rate [18], precision, false-negative rate and the receiver operating characteristic (ROC) curve are among the performance parameters that are measured and analysed. These indicators evaluate the system's performance, produce a classification report, and allow the results to be compared with those of other studies. The confusion matrix is one of the criteria used to rate the models (Intelligence et al., 2019). The metrics used in this paper are as follows:
• Routing attack detection rate: the percentage of detected routing attacks, including blackhole, wormhole, Sybil and misdirection attacks, out of the total number of simulated attacks. A higher detection rate indicates that the hybrid technique identifies malicious routing behaviour effectively.
• False positive rate: the proportion of legitimate network activities falsely identified as routing attacks. A low false positive rate is desirable to avoid unnecessary alarms and ensure the reliability of the detection system.
• False negative rate: the percentage of actual routing attacks that the hybrid technique failed to detect. A low false negative rate indicates that the method captures most routing attacks without missing significant ones.
• Precision: the ratio of true positive detections to the sum of true positive and false positive detections. Higher precision means that reported routing attacks are more likely to be genuine, reducing false alarms.
• Recall (sensitivity): the ratio of true positive detections to the sum of true positive and false negative detections. Higher recall means that the hybrid technique identifies a larger portion of routing attacks, ensuring better coverage.
• F1 score: the harmonic mean of precision and recall, giving a balanced assessment of the algorithm's accuracy in detecting routing attacks. A higher F1 score indicates a better trade-off between precision and recall.
• Execution time: the time taken by the hybrid technique to analyse network data and detect routing attacks. A shorter execution time enables real-time or near-real-time response to potential threats.
• Resource utilisation: the computational resources, memory, network bandwidth, time and energy required by the hybrid technique to perform routing attack detection. Efficient resource utilisation is essential for practical deployment.
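The rate-based metrics listed above follow directly from the four confusion-matrix counts. The sketch below computes them in plain Python; the function name `detection_metrics` and the example counts (100 simulated attacks, 900 benign flows) are hypothetical.

```python
def detection_metrics(tp, tn, fp, fn):
    """Detection metrics computed from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0        # detection rate / sensitivity
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)               # harmonic mean
    fpr = fp / (fp + tn) if fp + tn else 0.0            # false positive rate
    fnr = fn / (fn + tp) if fn + tp else 0.0            # false negative rate = 1 - recall
    return {"precision": precision, "recall": recall, "f1": f1,
            "fpr": fpr, "fnr": fnr}


# Hypothetical counts: 100 simulated attacks and 900 benign flows.
m = detection_metrics(tp=90, tn=880, fp=20, fn=10)
print({k: round(v, 4) for k, v in m.items()})
```

Here precision is 90/110 and recall is 90/100, so the F1 score, 180/210 ≈ 0.857, sits between them, closer to the lower value as a harmonic mean does.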
Values from the confusion matrix are used to compute the evaluation criteria. The entries of the confusion matrix are defined as follows.
• True positive (TP): the number of intrusion samples in the dataset that were correctly predicted as intrusions.
• True negative (TN): the number of normal samples that were correctly predicted as normal.
• False negative (FN): the number of intrusion samples that were wrongly classified as normal.
• False positive (FP): the number of normal samples that were wrongly classified as intrusions.
The detection rate is calculated by dividing TP by the total number of actual intrusion samples, that is, $\mathrm{DR} = TP/(TP+FN)$. Accuracy measures how well a system classifies data as the fraction of data points that it labels correctly out of the total number of data points. To demonstrate the system's efficiency, we use Equation (10):

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{10}$$

The designed IDS can produce four different outcomes for each traffic operation, which together form the confusion matrix. A true positive (TP) occurs when the IDS correctly reports malicious activity on the network (Mohd et al., 2020); a true negative (TN) occurs when the IDS correctly reports no malicious activity for normal traffic; a false positive (FP) occurs when the IDS reports malicious activity for normal traffic; and a false negative (FN) occurs when the IDS reports no malicious activity although an attack is present.
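The four outcomes can be counted directly from label vectors. In this NumPy sketch the ten labels are hypothetical, with 1 denoting an intrusion and 0 normal traffic; the detection rate and accuracy are then computed as defined above.

```python
import numpy as np

# Hypothetical true and predicted labels for ten traffic records (1 = intrusion).
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 1, 0, 1])
y_pred = np.array([1, 1, 0, 0, 0, 1, 0, 1, 0, 1])

tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # intrusions flagged as intrusions
tn = int(np.sum((y_true == 0) & (y_pred == 0)))  # normal traffic passed as normal
fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # normal traffic flagged (false alarm)
fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # intrusions missed

detection_rate = tp / (tp + fn)
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(tp, tn, fp, fn, detection_rate, accuracy)  # 4 4 1 1 0.8 0.8
```

For binary labels, scikit-learn's `confusion_matrix(y_true, y_pred).ravel()` returns the same four counts in the order tn, fp, fn, tp.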
Mean squared error (MSE).Mean squared error (MSE) measures the amount of error in statistical machine learning models for computing the position and distance of the wormhole attack between two points as in Equation ( 11).It assesses the average squared difference between the observed and predicted values of each sensor node's position and location, having its unique identity to detect routing attacks.When a model has no error, the MSE equals zero.As model error increases, its value increases.The mean squared error is also known as the mean squared deviation (MSD). Where.
• x i , and y i is the i th observed values.
• xi , and ȳi are the corresponding predicted values.
• n is the number of observations.
The MSE formula is closely related to the variance. For every observation, the difference between the observed and predicted values is squared; the squared differences are then summed and divided by the number of observations. Taking the square root of the MSE gives the root mean squared error (RMSE).
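As a worked example, the NumPy sketch below computes the MSE of estimated node positions, assuming the two-dimensional form in which the squared x and y errors are summed per node and then averaged over the n nodes. The four node positions are hypothetical.

```python
import numpy as np

# Hypothetical observed and estimated positions (x, y) of four sensor nodes, in metres.
observed = np.array([[10.0, 20.0], [35.0, 40.0], [50.0, 15.0], [70.0, 60.0]])
predicted = np.array([[11.0, 19.0], [35.0, 42.0], [49.0, 15.0], [70.0, 60.0]])

# Squared x and y errors summed per node, then averaged over the n nodes.
squared_error = ((observed - predicted) ** 2).sum(axis=1)   # [2, 4, 1, 0]
mse = squared_error.mean()
rmse = np.sqrt(mse)   # the square root belongs to the RMSE, not the MSE
print(mse, rmse)
```

The per-node squared errors are 2, 4, 1 and 0 m², so the MSE is 7/4 = 1.75 m² and the RMSE is about 1.32 m.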

Simulation and environmental setup
Network design and model simulations were executed in MATLAB R2021a on a 64-bit Windows 10 workstation with two Intel Xeon Silver 4214 processors (2.20 GHz) and 128 GB of installed RAM. Data processing and analysis with machine learning classifiers were performed with the Python libraries Keras, NumPy, scikit-learn, Seaborn and pandas using Anaconda Navigator, together with MATLAB R2021a. Table 7 lists the simulation parameters and their values for the network attack scenarios. This study assumes that Node-0 is the final destination for network traffic. The total simulation time is 5 s.
Wormhole routing attacks are simulated, and the results are generated using an artificial neural network optimised with a genetic algorithm (GA-ANN). A malicious node is then added to the network to generate and extract features for both benign and malicious network traffic. The procedure is repeated for another 5 s of simulation time to build a new database. Table 7 outlines the simulation scenarios, which use a mobility and routing protocol based on a random selection of intermediate and mobile nodes to identify and extract features from probable routes. A wormhole attack is injected between two malicious nodes, creating a tunnel between them.
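Why a wormhole distorts route discovery can be seen from hop counts alone. The following pure-Python sketch uses a hypothetical 10-node chain topology and hypothetical colluding nodes 1 and 8, and computes shortest hop counts by breadth-first search before and after a tunnel is added. An abnormally short route relative to the nodes' physical separation is the kind of signal that hop-count-based wormhole detection looks for; this is an illustration only, not the MATLAB simulation used in this work.

```python
from collections import deque


def hop_counts(adjacency, source):
    """Breadth-first search: minimum hop count from source to every reachable node."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in hops:
                hops[neighbour] = hops[node] + 1
                queue.append(neighbour)
    return hops


# Hypothetical chain of 10 nodes (0..9), each in radio range of its neighbours only.
n = 10
links = {i: set() for i in range(n)}
for i in range(n - 1):
    links[i].add(i + 1)
    links[i + 1].add(i)

before = hop_counts(links, 0)[n - 1]

# A wormhole tunnel between colluding nodes 1 and 8 makes them appear adjacent.
links[1].add(8)
links[8].add(1)
after = hop_counts(links, 0)[n - 1]
print(before, after)   # 9 3
```

The route from node 0 to node 9 shrinks from nine hops to three (0, 1, 8, 9), so routing protocols that prefer short paths will steer traffic through the attackers' tunnel.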

Experimental results and analysis
In this section, each sensor node in the network can establish connections with all other nodes, resulting in a fully connected network topology. This dynamic connectivity allows wormhole tunnels to form. Wormhole tunnels are virtual tunnels or channels created between two malicious nodes in the network, as shown in Figure 17 (a) and (b). These tunnels bypass normal routing mechanisms and enable attackers to disrupt the route discovery process. By exploiting them, attackers can carry out various routing attacks, such as selective forwarding, blackhole or Sybil attacks. The purpose of this study is to investigate the detection and prevention of such routing attacks in wireless sensor networks. By simulating a fully connected topology with wormhole tunnels, we aim to develop effective mechanisms to discover and mitigate routing attacks. The wormhole attack strongly affects the sensor nodes' energy consumption and timing, which are essential for effective communication, as shown in Figure 17 (c) and (d).
The simulation results show that the proposed attack detection and classification techniques are effective, with an average detection accuracy of 99.46% when the hop count and wormhole tunnel of the routing attacks are varied across the network. The results also show that the hybrid techniques reduce the prediction error and maximise performance, as shown by the histograms in Figure 18 (a) and (b). These histograms show the differences between the targets and the actual outputs, which give the errors for the unknown nodes; targets are the expected outputs and outputs are the values actually produced (Constraints, 2016). The training error is almost 0, whereas the testing error is higher than the training error. This confirms that the proposed technique is effective for detecting wormhole attacks in WSNs. The validation and efficiency of the proposed system are depicted in Figure 18 (c) and (d) for the detection of routing attacks at epochs 7 and 200, with minimum mean squared errors (MSE) of 0.0067 and $2.143 \times 10^{-8}$, respectively.

Attack detection analysis
The samples from the reference datasets were put through training and testing processes (Panigrahi et al., 2022). First, each sample is randomly assigned to one of two groups, the training set and the test set. Second, the whole training set is used for both training and testing. Finally, cross-validation is used to test how well the proposed model actually works. Performance is evaluated using the area under the curve, false rate, precision and classification accuracy. Machine learning models are used to assess how well the proposed method performs on benchmark datasets that simulate a variety of attacks against wireless sensor networks, and the hybrid optimised machine learning models use the same benchmark datasets when assessing the efficacy of the proposed system for detecting routing attacks in WSNs. Table 8 compares the results obtained with various machine learning algorithms and shows the comparative performance of the hybrid machine learning techniques. Cluster labelling (CL) k-means binary classification is used to further boost the performance of the proposed system.
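The split-then-cross-validate procedure can be sketched with scikit-learn. The synthetic imbalanced data, the random forest, the five stratified folds and the 75/25 split are assumptions for illustration rather than the settings behind Table 8; this sketch keeps the test set out of cross-validation until the final evaluation, which is one common choice.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate, train_test_split

# Synthetic, imbalanced stand-in for a benchmark dataset (0 = normal, 1 = attack).
X, y = make_classification(n_samples=1200, n_features=30, n_informative=10,
                           weights=[0.7, 0.3], random_state=1)

# Step 1: random, stratified assignment of samples to training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=1)

model = RandomForestClassifier(n_estimators=100, random_state=1)

# Cross-validation on the training set, scored with several metrics at once.
metrics = ["accuracy", "precision", "recall", "f1", "roc_auc"]
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
results = cross_validate(model, X_train, y_train, cv=cv, scoring=metrics)
cv_scores = {name: results[f"test_{name}"].mean() for name in metrics}

# Final fit on the full training set, then one evaluation on the held-out test set.
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
print({k: round(v, 3) for k, v in cv_scores.items()}, round(test_accuracy, 3))
```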
Hyperparameter and Bayesian optimisation (BO) techniques combined with the tree-structured Parzen estimator (BO-TPE) are used to boost the performance of the hybrid machine learning models in the proposed system. Table 8 displays the performance of the proposed scheme and many other machine learning models on the UNSW_NB15 benchmark dataset. On this dataset, the binary classification method employing hybrid cluster-labelling K-means obtains a classification accuracy of 100%.
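The cluster-labelling step is not spelled out in detail here; one common form, assumed in this sketch, is to fit k-means to the training data, label each cluster with the majority class of its training members, and classify a test sample by the label of its nearest centroid. The helper names `fit_cluster_labeller` and `predict_cluster_labeller`, the synthetic data and `n_clusters=20` are hypothetical, and the sketch does not reproduce the 100% result.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def fit_cluster_labeller(X, y, n_clusters, seed=0):
    """Fit k-means, then label each cluster with its majority class."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(X)
    cluster_label = np.zeros(n_clusters, dtype=int)
    for c in range(n_clusters):
        members = y[km.labels_ == c]
        if members.size:                      # an empty cluster defaults to class 0
            cluster_label[c] = np.bincount(members).argmax()
    return km, cluster_label


def predict_cluster_labeller(km, cluster_label, X):
    """Assign each sample to its nearest centroid and return that cluster's label."""
    return cluster_label[km.predict(X)]


# Synthetic binary data (0 = normal, 1 = attack), standardised before clustering.
X, y = make_classification(n_samples=1500, n_features=10, n_informative=5,
                           class_sep=2.0, random_state=2)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    stratify=y, random_state=2)
scaler = StandardScaler().fit(X_train)
km, cluster_label = fit_cluster_labeller(scaler.transform(X_train), y_train, n_clusters=20)
y_pred = predict_cluster_labeller(km, cluster_label, scaler.transform(X_test))
accuracy = float(np.mean(y_pred == y_test))
print(f"cluster-labelling accuracy: {accuracy:.3f}")
```

Using more clusters than classes lets each class be covered by several compact clusters, at the cost of some clusters containing only a handful of training samples.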
Table 9 shows how merging different hybrid machine learning models, and passing data frames from one machine learning model to the next, improves the proposed system's performance even further. The results demonstrate that hybrid ML models outperform their individual counterparts, performing better attack detection and classification on the NSL-KDD benchmark dataset, as shown in Figure 19 (a) and (b). For attack detection and classification, the hybrid of random forest and decision tree enhanced with extreme gradient boosting (XGB) achieves better results than either the random forest or the decision tree alone in terms of validity, accuracy, precision, recall and F1-score.
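A hybrid of a random forest and a gradient-boosted ensemble can be sketched with scikit-learn's `VotingClassifier`. Because the `xgboost` package may not be installed, `GradientBoostingClassifier` is used here as a stand-in for XGB; soft voting (averaging predicted probabilities) is one assumed way of combining the models, and the synthetic data do not reproduce the NSL-KDD results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              VotingClassifier)
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1500, n_features=25, n_informative=10,
                           random_state=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    stratify=y, random_state=3)

models = {
    "DT": DecisionTreeClassifier(random_state=3),
    "RF": RandomForestClassifier(n_estimators=200, random_state=3),
    # GradientBoostingClassifier stands in for XGBoost's XGBClassifier here.
    "GB": GradientBoostingClassifier(random_state=3),
}
# Hybrid: average the predicted class probabilities of RF and GB (soft voting).
# VotingClassifier fits its own clones, so the standalone models are unaffected.
models["RF+GB"] = VotingClassifier(
    estimators=[("rf", models["RF"]), ("gb", models["GB"])], voting="soft")

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
print({k: round(v, 3) for k, v in scores.items()})
```

Whether the hybrid beats its components depends on the data, so the printed scores should be compared rather than assumed.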
The proposed technique also compares favourably with previous work. L. Yang et al. (Yang et al., 2022) developed a multi-tiered hybrid intrusion detection system (MTH-IDS) for secure vehicular networks using the CICIDS2017 benchmark dataset for known and unknown attacks, achieving an average detection accuracy of 99.88% with binary classification. P. Sun et al. (Sun et al., 2020) developed a hybrid deep learning-based intrusion detection system (DL-IDS) using a convolutional neural network and a long short-term memory network (CNN-LSTM), which achieved an average detection accuracy of 98.67% on extracted network traffic. The comparison shows that the proposed scheme effectively detects DoS attacks in wireless sensor networks using the benchmark dataset, as shown by the attack detection performance metrics in Figure 20 (b). Kasongo (2021) presented an intrusion detection system for the Internet of Things using a random forest with genetic-algorithm-based feature selection (RF-GA), as shown in Figure 20; it achieved an average detection accuracy of 87.61%, compared with 100% for the proposed hybrid binary classification. Suleiman and Issac (2018) evaluated six machine learning classifiers for intrusion detection on the UNSW_NB15, phishing and NSL-KDD benchmark datasets, and their random forest-based intrusion detection system (RF-IDS) produced the best detection accuracy on UNSW_NB15. Temporal and spatial features are used to enhance attack detection and classification. This confirms that the proposed technique is effective for the detection and localisation of attacks, as shown in Figure 20 (a).
To achieve high-performance intrusion detection across a wide range of attack types, B. Media et al. (Intelligence et al., 2019) proposed a hybrid-layered IDS (HL-IDS) that employs several distinct machine learning and feature selection approaches, as shown in Figure 20 (b). In that system, the NSL-KDD dataset is first preprocessed with various feature selection algorithms to reduce its size.
G. H. Lai (Lai, 2016) proposed detecting wormhole attacks in WSNs using a low-power routing protocol and achieved 100% accuracy with a fixed range and fixed wormhole tunnel points. This confirms that the proposed technique is effective for localising and detecting routing attacks in wireless sensor networks using a benchmark dataset. Y. Yuan et al. (Yuan et al., 2018) presented a lightweight method for Sybil attack detection in distributed WSNs using the approximate point-in-triangulation test (APIT) localisation approach and achieved an average detection rate of 90%, which is lower than that of the proposed work. D. Upadhyay et al. (Upadhyay et al., 2021) proposed an intrusion detection framework for smart grids that applies gradient boosting feature selection before machine learning classification. The scheme combines feature engineering with machine learning classifiers and achieves the performance shown in Figure 21 (a). This confirms that the proposed method is effective for detecting DoS attacks in various wireless sensor network applications.
Nancy P et al. (Nancy et al., 2020) introduced an intelligent feature selection algorithm, the dynamic recursive feature selection algorithm (DRFSA), which picks an optimal number of important features to construct the dataset. They also presented an intrusion detection system based on a fuzzy logic algorithm (IF-IDS) using the NSL-KDD dataset, as shown in Figure 20, and proposed extending the decision tree approach and including convolutional neural networks as means of detecting intruders efficiently. G. Qi, J. Zhou et al. (Qi et al., 2021) presented ECABC-BPNN, a combination of back-propagation neural networks (BPNNs) and elite clone artificial bee colonies (ECABCs) that improves on the weight and threshold settings of the standard BPNN, as depicted in Figure 21 (b); ECABC-BPNN is then used to identify threats in a computer system's network.
The proposed system is evaluated on benchmark datasets using accuracy, precision, recall, F1-score and AUC. Its performance compares favourably with that of R. Khilar et al. (Khilar et al., 2022) and Saheed et al. (Saheed et al., 2022) across these evaluation metrics, which shows that the system effectively detects and localises DoS attacks in WSNs, as in Figure 22. The comparison with experiments on attack classification using the benchmark dataset is shown in Table 10.
The proposed scheme is further compared for validation with previous work on detecting and localising routing attacks in WSNs. S. Jiang et al. (Jiang et al., 2020) proposed an intrusion detection system based on a secure light gradient boosting machine (IDS-SLGBM) for wireless sensor networks, using the WSN-DS benchmark dataset with its classes of routing attacks.
The experimental results and analysis show that designing advanced intrusion detection systems (IDS) based on hybrid machine learning techniques for hierarchical wireless sensor networks introduces several novel aspects and contributions. The key points that highlight the novelty of this design are as follows.

Hierarchical Wireless Sensor Networks: A hierarchical architecture introduces an additional layer of complexity and organisation in wireless sensor networks. The network is divided into multiple levels or tiers, with different roles assigned to the nodes at each level. This hierarchical structure supports efficient data aggregation, routing and management in large-scale sensor networks.

Hybrid Machine Learning Techniques: The IDS design incorporates hybrid machine learning techniques, which combine multiple algorithms or approaches to enhance the accuracy and effectiveness of intrusion detection. Hybridisation can involve integrating different machine learning models, such as combining supervised and unsupervised learning methods or combining traditional rule-based techniques with machine learning algorithms.

Advanced Intrusion Detection:
The IDS focuses on advanced intrusion detection, aiming to detect sophisticated attacks that go beyond simple rule-based or signature-based detection methods. Advanced attacks often exhibit complex patterns or behaviours that are challenging to identify with traditional approaches. By leveraging machine learning techniques, the IDS can learn and adapt to evolving attack patterns, enabling the detection of novel and unknown attacks.

Novelty in Feature Selection: The design may introduce novel approaches to feature selection, which involves identifying the most relevant and discriminative features in the sensor data to train the machine learning models. Effective feature selection plays a crucial role in improving the accuracy and efficiency of intrusion detection systems, especially in resource-constrained wireless sensor networks.

Scalability and Efficiency Considerations:
The design considers the scalability and efficiency requirements of wireless sensor networks. Hierarchical structures and optimised machine learning techniques are employed to reduce the network's computational overhead, energy consumption and communication overhead. These considerations make the IDS suitable for deployment in resource-constrained environments.

Limitation of the proposed system
Although the advanced intrusion detection system based on hybrid machine learning techniques for hierarchical wireless sensor networks is a promising approach, it has certain limitations that should be acknowledged:

Scalability: The performance of the proposed system may degrade as the size of the wireless sensor network increases. Handling a large number of nodes and a large volume of data traffic could pose challenges in terms of computational resources and memory requirements.

Complexity and Overhead: The hybrid machine learning techniques used in the system may introduce additional complexity and computational overhead, particularly for resource-constrained sensor nodes. This could affect the real-time responsiveness and energy efficiency of the overall network.

Training Data Collection: Obtaining labelled training data for machine learning algorithms in wireless sensor networks can be challenging. Collecting a diverse and representative dataset of intrusion scenarios, including rare attacks, might be difficult because of limited resources and the controlled environment.

Intrusion Diversity: As new intrusion techniques and attack patterns emerge, the detection system may struggle to generalise and adapt to previously unseen or zero-day attacks, especially when it relies on pre-trained machine learning models.

Security and Privacy: An intrusion detection system deployed in the network could itself become a target for attacks. Adversaries might try to manipulate the system's behaviour or exploit its vulnerabilities to evade detection.

Adaptability to Network Changes: As the wireless sensor network topology changes due to node failures, additions or mobility, the intrusion detection system must adapt and maintain its effectiveness.
False Alarms: Hybrid machine learning techniques might lead to false alarms in certain situations, triggering unnecessary responses and consuming valuable resources for investigating non-existent attacks.

Conclusion and future work
The proposed advanced intrusion detection system based on hybrid machine learning effectively detects and classifies attacks in scalable and manageable hierarchically distributed wireless sensor networks. This research aimed to create a classification model for an advanced intrusion detection system based on hybrid machine learning, specifically tailored to detecting intrusions in wireless sensor networks. Each sensor node collects information on the state of its features and reports it to the cluster's central processing node; the cluster leader checks the data and forwards it to the main cluster head. The proposed hybrid machine learning models use training and testing data to identify attacks. In a simulated attack on a WSN, the proposed IDS-HML outperforms state-of-the-art systems in detection and localisation accuracy, and comparison with earlier research indicates that the simulated outcomes are credible. The simulation results show that the proposed system detects wormhole routing attacks with a localisation accuracy of 99.46%. The effectiveness of the system was measured in terms of accuracy, precision, TP rate, FP rate, F-measure, mean squared error and time. Using the CICIDS2017 benchmark dataset with normal and intrusion traffic for multiclass and binary classification, IDS-HML achieved an average detection accuracy of 99.82%, precision of 99.91%, F1-score of 99.85% and recall of 99.82%, and CLK-Means reached 100% accuracy. Network planning and attack-scenario simulation were implemented in MATLAB, and Python libraries were used for the hybrid machine learning classification techniques. The model uses logic rules for decision-making and interpretable predictive models.
Overall, the novelty lies in the combination of hierarchical wireless sensor networks, hybrid machine learning techniques, advanced intrusion detection capabilities, novel feature selection approaches, and considerations for scalability and efficiency.These elements contribute to the development of a robust and effective IDS for wireless sensor networks.
Although the proposed method performs well, it is essential to note that IoT-based WSNs are still susceptible to attacks not addressed in this study.Since the dynamics of the attacks change with time, the topology and design should cope with the attack scenarios.The countermeasures module is only provided in concept, which is another shortcoming.Therefore, we plan to investigate and eventually offer specialised advanced hybrid intrusion detection systems for each type of assault utilising benchmark datasets to evaluate hybrid machine learning techniques.In future work, we will explore collaborative advanced intrusion detection systems based on machine learning in IoT-based wireless sensor networks for different applications using benchmark datasets for evaluations.

Figure 2. Wireless sensor network model with clustered nodes for data aggregation to the sink node.

Figure 3. Organisation and framework of the proposed work.

Figure 4. Hybrid intrusion detection model with an anomaly detector and a rule-based detector.

Figure 5. Block diagram of the advanced hybrid intrusion detection system.

Figure 6. Block diagram of an intrusion detection system using the flow-based technique.

Figure 7. Hierarchical hybrid intrusion detection technique in wireless sensor networks using three phases of attack detection.

Figure 8. Illustration of data aggregation and clustering in WSNs.

Figure 9. Hybrid PSO-PNN technique for attack classification and detection.

Figure 10. Hierarchical topology and configuration model for secure wireless sensor networks.

Figure 11. Attacks at the physical and network layers: (a) jamming attacks and (b) sinkhole attacks.

Figure 12. Framework of the advanced hybrid intrusion detection system (AHIDS): block diagram using the attack detection and classification model.

Figure 13. Flow chart of the various phases of the proposed system.

Figure 14. Data pre-processing, training and testing framework for model evaluation using benchmark datasets.

Filter Method:
The filter technique employs feature-ranking methods for feature selection. Feature rankings indicate how important each feature is when constructing a model. The results of numerous statistical tests are used to rank the features. Each feature's connection with the

Figure 15. Selection of important features using the NSL-KDD benchmark dataset.

Figure 16. Block diagram of random forest operation for training and testing on the benchmark dataset.

Figure 17. Wireless sensor network deployment and routing discovery for analysing energy consumption and elapsed time. (a) Dynamic network deployment. (b) Routing discovery and feature extraction. (c) Energy consumed by nodes over time. (d) Time elapsed for each node.

Figure 18. Performance evaluation of the proposed system using hybrid GA-ANN with 100 samples and a varying number of epochs. (a) Histogram with 300 instances. (b) Histogram with 300 instances. (c) Best performance at epoch 7. (d) Best performance at epoch 100.

Figure 19. Performance comparison of various machine learning models using NSL-KDD. (a) RF-based comparative analysis and (b) DT-based comparative analysis.

Figure 20. Performance comparison of the proposed hybrid machine learning techniques using benchmark datasets. (a) Comparison based on UNSW_NB15 and (b) comparison based on CICIDS2017.

Figure 21. Performance comparison of the proposed technique with previous works. (a) Comparative analysis of the proposed scheme and (b) comparative analysis based on recall.

Table 1. Summary of different hybrid techniques from previous works.

Table 2. Distribution of various attack classes in NSL-KDD, with training and testing samples for performance testing (Li et al., 2018).

Table 3. Detailed technical description of the four types of attacks in the dataset.

Table 4. Frequency distribution of attacks in the dataset.

Table 5. Statistical distribution of the dataset, based on a subset of its attributes.

Table 6. Structure of the dataset with classifications of attacks.

1. Improved Detection Accuracy. Hybrid machine learning techniques combine the strengths of multiple algorithms, such as neural networks, decision trees or support vector machines. This combination can enhance intrusion detection accuracy by leveraging each technique's unique capabilities.
2. Adaptability to Dynamic Environments. Hierarchical wireless sensor networks often operate in dynamic environments where network topology, traffic patterns and intrusion characteristics change over time. Hybrid machine learning approaches can adapt to these changes and update their detection models accordingly, leading to better performance in dynamic scenarios.
3. Enhanced Scalability. Wireless sensor networks may consist of many nodes, making scalability a critical factor. Hybrid machine learning techniques can handle large-scale networks more effectively by distributing computational tasks among nodes and optimising resource utilisation.
4. Reduced False Positives and False Negatives. By combining multiple machine learning techniques, hybrid systems can mitigate the weaknesses of individual algorithms, reducing false positives (incorrectly identifying benign activity as an intrusion) and false negatives (failing to detect actual intrusions).

Table 7. WSN configuration of simulation settings.

Table 8. Evaluation of hybrid machine learning models on standard datasets.

Table 9. Comparison of various hybrid ML models using the NSL-KDD dataset for attack detection and classification.

Table 10. Performance evaluation of the proposed system using the NSL-KDD benchmark dataset.