Optimized cluster head selection using krill herd algorithm for wireless sensor network

ABSTRACT Wireless Sensor Networks (WSNs) can perform transmission among themselves, and examination is performed based on their range of frequency. It is quite difficult to recharge devices under adverse conditions. The main limitations are area of coverage, network lifetime, and aggregation and scheduling. Prolonging the lifetime of a network is key to its success, along with reliability of the data transferred, conservation of sensor energy and scalability. Many research works have been proposed to overcome this challenge and improve the network's lifespan by preserving the sensors' energy. Clustering schemes provide a low overhead and allocate resources efficiently, thus reducing the overall consumption of energy and reducing interference among the sensor nodes. Challenges such as node deployment and energy-aware clustering, along with data collection, can be treated as optimization problems in WSNs. For such Non-deterministic Polynomial (NP)-complete problems, an optimal solution can be sought through evolutionary and swarm intelligence (SI) algorithms, along with a number of other techniques. In this work, Krill Herd Algorithm based clustering is proposed.


Introduction
A Wireless Sensor Network (WSN) is a self-organizing network of numerous sensors that operates without fixed infrastructure. Sensor networks can be wired or wireless and can use optical or some other mode of communication, but the most suitable is short-range, low-power wireless communication. Many sensor nodes (SNs) work together, combining data processing, information collection, wireless communication and other functions, to form a robust sensor network. The concurrent tasks of sensing, gathering and processing environmental information, monitoring, and relaying data to users are accomplished by combining sensor technology, embedded computing, distributed information processing and wireless communication. WSNs comprise battery-powered devices that are energy constrained; the sensor nodes are generally inaccessible after installation, so when a battery drains an alternative source of energy is difficult to incorporate. Extending the network's lifetime is therefore an important issue in WSNs [1].
The lifetime and efficiency of a network are directly influenced by the quality of topology control, and a good topology scheme depends on a complete evaluation methodology. Along with those attributes and system features, the following three indicators are given major consideration when assessing WSN topology control:
• Coverage: This is the metric for the quality of service of a WSN. It focuses primarily on the coverage rate of the original node positioning and on whether the nodes can capture signals from the Region of Interest (ROI) completely and precisely.
• Connectivity: Sensor networks are generally large in scale, so connectivity ensures that the data obtained by the sensors can be delivered to the base station (BS).
• Network lifetime: This is the time from the start of operation until the percentage of dead nodes reaches a threshold.
One of the energy management features of WSNs is clustering, which divides the network into multiple clusters and allots one node in each cluster as the cluster head (CH) [2]. The CH reduces the work of the BS by consolidating the data received from each node and then sending it to the BS. This aids energy conservation in resource-constrained WSNs, as the BS receives data from fewer nodes [3]. The advantages of using CHs are as follows: (1) Data aggregation at the CH discards unwanted and uncorrelated data, thus saving sensor node energy. (2) As only the CHs have to maintain the local route set-up, routing is easily managed and only a small amount of routing information is required; this significantly enhances the scalability of the network. (3) As the SNs communicate only with their CHs, the exchange of redundant messages among them is avoided and communication bandwidth is conserved.
Clustering algorithms help to reduce energy consumption [4] in WSNs. A clustering algorithm operates in rounds, and each round has two phases: the set-up phase and the steady phase. Nodes are arranged into independent sets, or clusters, and a cluster head (CH) is chosen for each cluster. The CH collects the data of the sensor nodes that belong to its cluster, and the sensed data are sent to the sink via the CH. Several data aggregation techniques are applied in clustering algorithms to condense the collected data into its important information, which is then relayed to the BS. The CH plays a major role in maintaining the energy efficiency of clustering algorithms. The position of the selected CH within the cluster determines the intra-cluster communication distance, whereas the distance among the clusters determines the inter-cluster communication distance. Clusters with a high intra-cluster communication distance consume more energy than the other clusters [5].
For WSNs, LEACH [6] is the primary hierarchical cluster-based routing protocol. It first groups the nodes into clusters, and in every cluster a specialized node with additional privileges, referred to as the CH, generates and operates a Time Division Multiple Access (TDMA) schedule and transmits the aggregated data from the nodes to the BS, for which these nodes use Code Division Multiple Access (CDMA). The rest of the nodes are called cluster members. Communication in LEACH is divided into rounds, and every round has a set-up phase and a steady-state phase.
The advantages of LEACH are as follows:
• A node that has served as CH cannot be chosen as CH again in the same cycle of rounds, so every node has, to a certain extent, an equal opportunity to share the load imposed on the CH.
The disadvantages of LEACH are as follows:
• Only single-hop routing directly from the CHs to the BS is performed, a routing scheme that is unsuitable for large-area networks.
• LEACH cannot ensure real load balancing for SNs with differing amounts of energy, as CHs are chosen according to probabilities without any consideration of energy. SNs with low initial energy serve as CHs for the same number of rounds as SNs with greater initial energy, and so die prematurely, which can lead to energy holes and coverage issues.
• As CH selection is based on probabilities, it is difficult to ensure that the CHs are distributed uniformly throughout the network.
• Dynamic clustering brings extra overhead. For example, CH changes and advertisements might decrease the gain in energy consumption [7].
CH selection is an optimization problem and is NP-hard. When the network scales up, canonical optimization techniques become ineffective. Metaheuristics such as Genetic Algorithm (GA), Ant Colony Optimization (ACO) and Particle Swarm Optimization (PSO) are used in a number of engineering fields to handle such hard optimization problems. A number of protocols that extend the life of a WSN are based on Artificial Bee Colony (ABC), ACO and Cuckoo Search (CS). A good swarm intelligence (SI) algorithm is one that improves global search and converges faster to the global best solution. Numerous hybrid techniques have been put forth to enhance this.
Bio-inspired optimization algorithms that hold good for NP-hard problems include the GA and the Krill Herd (KH) optimization algorithm. Moreover, KH can be implemented easily, gives high-quality solutions and has appreciable exploration and exploitation features, which lead to quick convergence with the capacity to escape from local minima [8]. This work deals with CH selection using the GA and KH algorithms for WSNs. The KH algorithm has further benefits: it is very efficient for solving structural engineering problems, it is simple and easy to implement, and it is an effective optimization method comprising both exploration and exploitation mechanisms for pursuing the best solution.
The remainder of the paper is organized as follows. Related work in the literature is discussed in Section 2. The techniques used are explained in Section 3. Experimental results are discussed in Section 4, and Section 5 summarizes the work performed and states the future scope.

Related works
Singh and Lobiyal [9] suggested a PSO-based approach that generates energy-aware clusters by optimally choosing the CHs. PSO reduces the cost of locating the optimal positions of the CHs. Additionally, the PSO-based technique is semi-distributed, as it is executed within a cluster rather than at the BS. The objective function is based on intra-cluster distance, residual energy, node degree and the number of possible CHs. In addition, the suggested energy consumption model accounts for the expected number of packet retransmissions along the estimated route to the CHs. Simulations demonstrated the efficacy of the suggested work in terms of network lifetime and average packet transmissions.
Karaboga et al. [10] presented an energy-efficient clustering algorithm in which the ABC algorithm is employed to improve the network lifetime. The clustering method simulates the intelligent foraging behaviour of honeybee swarms. The suggested technique was compared with LEACH and PSO, which are used in many routing applications. The empirical results revealed that ABC-based clustering can be applied effectively to WSN routing protocols.
Selvi et al. [11] suggested Honey Bee Optimization (HBO). Its objective is to reduce energy consumption by searching for an optimal path at reduced cost. The aim of the approach is to improve network lifetime and throughput. In terms of energy efficiency parameters such as scalability and link quality, the suggested technique delivers better performance. The effective, biologically inspired search features of the ABC approach are thus used for building the energy-efficient clusters. The technique was verified using Matlab simulations.
Sirdeshapande and Udupi [12] proposed an efficient optimization algorithm, referred to as the Lion (FLION) clustering algorithm, with the aim of creating an energy-efficient routing path. Through rapid selection of CHs, this clustering algorithm improves the energy and the lifetime of the network nodes. In addition, the fitness function is based on five objectives: delay, energy of normal nodes, intra-cluster distance, inter-cluster distance and CH energy. The suggested fitness function locates the cluster centroid rapidly to obtain an effective routing path. The results show that the network lifetime is enhanced by the suggested FLION-based multi-objective clustering algorithm.
Although Multihop LEACH has been proposed in the literature, it proves inefficient, as the optimization is NP-hard. Vijayalakshmi and Anandan [13] dealt with a new technique that chooses the optimal routing path to enhance the lifespan of the network and its energy efficiency. A number of metaheuristic techniques, PSO in particular, have been used effectively, but they suffer from the local optima problem. The presented technique is based on the PSO and Tabu Search (TS) algorithms. Results document the effectiveness of the proposed Tabu PSO, showing an increase in the number of clusters formed and the percentage of alive nodes, and a decrease in the average packet loss rate and average end-to-end delay.
Cat Swarm Optimization (CSO) was proposed by Chandirasekaran & Jayabarathi [14]. This is a novel evolutionary approach that aims to decrease the intra-cluster distance between the nodes in a cluster and their CH, thereby optimizing the energy distribution in WSNs. The sensor nodes that form the cluster aid in analysing the WSN protocol performance. The unique trait of the proposed CSO scheme is that it uses the received signal strength, the remaining battery voltage and the intra-cluster distance of the sensor nodes, which are deployed in a field and grouped into clusters. The results of CSO were compared with LEACH-centralized (LEACH-C) and swarm-based PSO. The outcomes demonstrated a significant improvement in battery energy levels compared with PSO and conventional LEACH.
Rao & Banka [15] suggested Unequal Clustering and Routing Algorithms (UCRA) based on the novel Chemical Reaction Optimization (nCRO) paradigm, referred to as nCRO-UCRA. In this clustering scheme, the network is divided into groups of unequal size such that the smaller clusters are nearer to the sink and the larger ones are farther away. A CH selection algorithm is developed based on the nCRO paradigm, and the non-CH sensor nodes are allocated to the CHs on the basis of a derived cost function. This is followed by a routing algorithm that is also based on nCRO. Molecular structure encoding schemes and new potential energy functions are used in formulating these algorithms. nCRO-UCRA was simulated extensively for different WSN scenarios involving different numbers of CHs and sensors.
Jadhav and Shankar [16] proposed an energy-efficient CH selection algorithm based on the Whale Optimization Algorithm (WOA), called WOA-Clustering (WOA-C). The proposed algorithm chooses energy-aware CHs based on a fitness function that considers the residual energy of the node and the sum of the energy of adjacent nodes. Furthermore, WOA-C is evaluated against standard conventional protocols such as LEACH. Extensive simulations show the superior performance of the proposed algorithm in terms of residual energy, network lifetime and a longer stability period.
Chen et al. [17] proposed a Clustering Energy-Efficient Transmission Protocol (CEETP) for WSNs based on Ant Colony Path Optimization (CEETP-ACPO). First, the Distributed Cluster Computing Energy-Efficient Routing Scheme (DCCERS) is employed to select CHs based on the centre of gravity and node energy within the available range of nodes, and the SNs are then clustered. Second, the optimal path is chosen using an improved ACPO algorithm, in which the next-hop nodes in range are managed by a ring-angle search model and the transition probability is calculated from path pheromone, distance and current node energy. Simulation results show the effectiveness of the proposed protocol.
An Improved Cuckoo Search (ICS)-based energy-balanced node clustering protocol was proposed by Gupta and Jha [18], in which a new objective function is used for uniform distribution of CHs. Additionally, an Improved Harmony Search (IHS)-based routing technique is used for routing between the CHs and the sink. Compared with other state-of-the-art protocols, the proposed integrated Cuckoo-Harmony Search-based clustering and routing protocol shows significant improvement.
The development of a well-organized clustering algorithm can play a vital part in enhancing the lifetime of a network. Shopon et al. [19] presented a centralized method for energy-aware clustering of WSNs using the Krill Herd (KH) algorithm. The performance of the proposed algorithm was measured against other clustering protocols. The simulation results showed that the proposed algorithm achieved a longer sensor network lifetime than other algorithms of the same category.

Methodology
The objective of data aggregation is to reduce superfluous data transmission and extend the energy lifetime of the WSN. Data aggregation should take place using an effective clustering scheme. This section discusses energy-efficient CH selection using the LEACH protocol, CH selection using the GA and the proposed CH selection using the KH optimization algorithm.

Energy efficient cluster head selection using LEACH protocol
LEACH selects some of the nodes at random as CHs; the CH role is rotated to balance the energy consumed at the sensor nodes. Data collection is centralized and performed periodically. LEACH operation is divided into rounds, and every round has a set-up phase and a steady-state phase. A node decides whether to become a CH in the current round based on the pre-determined percentage of CHs suggested for the network and on how often the node has already been a CH. To make this decision, node n selects a random number between 0 and 1. The node becomes a CH in the present round if the number is lower than a threshold T(n) [20], which is set as in Equation (1). Here, p represents the chosen percentage of CHs, r represents the current round number and G represents the set of nodes that have not been chosen as CHs in the last 1/p rounds. This threshold ensures that every node becomes a CH at some point within 1/p rounds. Because LEACH selects CHs randomly, the energy stored in the nodes becomes unbalanced, which increases the total energy dissipated in the network. Also, the randomly placed CHs cannot be assumed to cover large areas evenly. This leads to inefficiency, as CHs that are farther from the BS end up spending more energy than the ones located near it.
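The election rule described above can be illustrated with a minimal sketch. It assumes the standard LEACH threshold from the literature, T(n) = p / (1 - p (r mod 1/p)) for nodes in G and 0 otherwise; the function names `leach_threshold` and `becomes_cluster_head` are hypothetical, and 1/p is assumed to be (close to) an integer.

```python
import random


def leach_threshold(p: float, r: int, in_g: bool) -> float:
    """Standard LEACH threshold T(n) for round r (assumed form).

    p    -- desired fraction of cluster heads, e.g. 0.1
    r    -- current round number
    in_g -- True if the node has not been CH in the last 1/p rounds
    """
    if not in_g:
        return 0.0
    period = round(1 / p)  # length of one rotation cycle (assumes 1/p is integral)
    return p / (1 - p * (r % period))


def becomes_cluster_head(p: float, r: int, in_g: bool, rng: random.Random) -> bool:
    """A node draws a number in [0, 1) and becomes CH if it is below T(n)."""
    return rng.random() < leach_threshold(p, r, in_g)
```

With p = 0.1, the threshold starts at 0.1 in round 0 and rises to about 1 in round 9, so any node still in G at the end of a cycle is elected.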
Additional factors have been considered to optimize the CH selection process so that the load is appropriately distributed over the entire network [21]. The aim is to attain energy efficiency with regard to both network lifetime and energy utilization. This extends LEACH's stochastic CH selection algorithm by adjusting the threshold T(n) of Equation (1) according to the node's remaining energy. Every node uses this threshold in every round to determine whether or not to become a CH; it is presented in Equation (2), where E residual and E initial represent the remaining and initial energy of the node before the transmission.
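As an illustration only, one common way to make the threshold energy-aware is to scale the standard LEACH threshold by the ratio E residual / E initial. This scaling is an assumption made for the sketch and may differ from the exact form of Equation (2); the function name `energy_aware_threshold` is hypothetical.

```python
def energy_aware_threshold(p: float, r: int, in_g: bool,
                           e_residual: float, e_initial: float) -> float:
    """LEACH threshold scaled by the node's remaining energy fraction.

    ASSUMPTION: the scaling factor is e_residual / e_initial; the paper's
    Equation (2) is not reproduced in the text and may differ in detail.
    """
    if not in_g or e_initial <= 0:
        return 0.0
    period = round(1 / p)                 # assumes 1/p is (close to) an integer
    base = p / (1 - p * (r % period))     # standard LEACH threshold
    return base * (e_residual / e_initial)
```

Under this assumed form, a node that has used half of its energy is half as likely to elect itself as a fresh node in the same round.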
The optimal number of clusters is determined using Equation (3), where k opt is the optimal number of CHs, N is the total number of sensor nodes, M is the side length of the sensing field and d toBS is the distance from the nodes to the BS.

CH selection using genetic algorithm (GA)
The GA, introduced by John Holland in 1970 [22], is based on Charles Darwin's theory of evolution. This adaptive heuristic algorithm mimics biological genetic evolution. It is employed for solving dynamic problems, is robust in searching for solutions in populations and has been applied to various NP-hard problems. However, the main difficulty in solving a problem with the GA is encoding the problem as a set of chromosomes in which every chromosome represents a solution to the problem.
The quality of each chromosome is assessed by a fitness function. On the basis of the fitness value, crossover and mutation operations are applied to the selected chromosomes. New solutions, known as offspring, are generated by concatenating elements of two chosen chromosomes. Mutation is then applied to the offspring to change one or more genetic elements, which prevents the search from getting trapped in local minima.
The GA for the proposed CH selection process is described below [23]. Population: The population comprises several individual solutions to the problem. The accuracy of the algorithm is directly proportional to the size of the population. The number of nodes present in the network determines the length of an individual. A 1 in the individual means that the node is a CH, whereas a 0 means that the node is a member node. The initial population is produced at random. Figure 2 shows the block diagram of the CH-based GA.
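The encoding described above can be sketched as follows. The helper names and the `ch_probability` parameter (the chance that a bit is initialised to 1) are assumptions made for illustration; the text only states that the initial population is random.

```python
import random


def random_individual(num_nodes: int, ch_probability: float,
                      rng: random.Random) -> list:
    """One chromosome: bit i is 1 if node i is a CH, 0 if it is a member node."""
    return [1 if rng.random() < ch_probability else 0 for _ in range(num_nodes)]


def initial_population(pop_size: int, num_nodes: int,
                       ch_probability: float = 0.1, seed: int = 0) -> list:
    """Randomly generated population; ch_probability is a hypothetical parameter."""
    rng = random.Random(seed)
    return [random_individual(num_nodes, ch_probability, rng)
            for _ in range(pop_size)]


def cluster_heads(individual: list) -> list:
    """Decode a chromosome into the list of node indices acting as CHs."""
    return [node for node, bit in enumerate(individual) if bit == 1]
```

For example, the chromosome `[0, 1, 0, 1]` decodes to nodes 1 and 3 acting as CHs, with nodes 0 and 2 as member nodes.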
Fitness function: This is an indicator of survivability. The fitness of every individual is computed using the fitness function. Four parameters are considered in the current work:

• Remaining Energy (E)
• Number of CHs
• Total Intra-cluster Communication Distance (IC)
• Total Distance from CHs to Base Station (BSD)
The values of the last two parameters depend on the number of CHs. The fewer the CHs, the smaller the total distance from the CHs to the BS and the higher the intra-cluster communication distance, and vice versa.
The fitness function is presented in Equation (4), where N represents the total number of nodes present in the network. As can be seen from the fitness function, more prominence is given to the total distance from the CHs to the BS (BSD).
Selection: This step selects individuals from the existing population to produce a new population. The objective of the selection function in the GA is to give members of the population with better fitness a greater chance to reproduce. Roulette Wheel, Tournament, Boltzmann, Rank and Random are some of the techniques for implementing the selection process.
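Of the selection techniques listed, Roulette Wheel selection is sketched below as one possible choice; the text does not state which technique was used. The sketch assumes that fitness values are non-negative and that higher is better, and the function name is hypothetical.

```python
import random


def roulette_wheel_select(population: list, fitness: list, rng: random.Random):
    """Pick one individual with probability proportional to its fitness.

    ASSUMPTION: fitness values are non-negative and larger means fitter.
    """
    total = sum(fitness)
    if total <= 0:
        # Degenerate case: no individual has positive fitness, pick uniformly.
        return rng.choice(population)
    pick = rng.uniform(0, total)
    running = 0.0
    for individual, fit in zip(population, fitness):
        running += fit
        if pick < running:
            return individual
    return population[-1]  # guards against floating-point round-off
```

An individual with nine times the fitness of another is selected roughly nine times as often, which gives fitter chromosomes a greater chance to reproduce without excluding weaker ones entirely.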
Crossover: The crossover operation takes place between two chromosomes with a probability specified by the crossover rate. The two chromosomes exchange the portions that are separated by the crossover point.
Mutation: The mutation operator is applied to every bit of a chromosome with a probability given by the mutation rate. A bit that was 0 changes to 1 after mutation, and vice versa.
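The two operators can be sketched on bit-string chromosomes as follows. Single-point crossover is assumed here, since the text mentions one crossover point; the function names are hypothetical.

```python
import random


def single_point_crossover(parent_a: list, parent_b: list,
                           crossover_rate: float, rng: random.Random):
    """Swap the tails of two chromosomes after a random crossover point."""
    if len(parent_a) < 2 or rng.random() >= crossover_rate:
        return parent_a[:], parent_b[:]          # no crossover: copy the parents
    point = rng.randint(1, len(parent_a) - 1)    # cut strictly inside the string
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b


def bit_flip_mutation(chromosome: list, mutation_rate: float,
                      rng: random.Random) -> list:
    """Flip each bit (0 -> 1, 1 -> 0) independently with the mutation rate."""
    return [1 - bit if rng.random() < mutation_rate else bit
            for bit in chromosome]
```

In the CH encoding, flipping a bit turns a member node into a CH or a CH back into a member node, so mutation directly changes the number of CHs in a candidate solution.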

Proposed CH selection using Krill Herd (KH) optimization algorithm
KH is a novel optimization strategy that can help solve complex problems. It is based on the behaviour of individual krill and is a member of the swarm intelligence algorithm family. The algorithm is modelled on krill swarms that hunt for food and communicate with other members of the swarm. Three movements are repeated in the KH technique, and the search directions proceed towards the best solution. The position of a krill is determined by three actions [24]:
• Movement induced by other krill;
• Foraging action;
• Physical diffusion.
KH adopts the Lagrangian model shown in Equation (5):
$$\frac{dX_i}{dt} = N_i + F_i + D_i \tag{5}$$
where $N_i$ is the motion induced by other krill, $F_i$ is the foraging motion and $D_i$ is the physical diffusion. In Equation (5), $i = 1, 2, \ldots, NP$, where $NP$ is the population size. For the first motion, a target effect, a local effect and a repulsive effect determine the motion direction $\alpha_i$. For krill $i$, this motion is given by Equation (6):
$$N_i^{new} = N^{max} \alpha_i + \omega_n N_i^{old} \tag{6}$$
where $N^{max}$ is the maximum induced speed, $\omega_n$ is the inertia weight and $N_i^{old}$ is the last motion.
The second motion is determined by the location of the food and by prior experience. For the ith krill, it is defined by Equations (7) and (8):
$$F_i = V_f \beta_i + \omega_f F_i^{old} \tag{7}$$
$$\beta_i = \beta_i^{food} + \beta_i^{best} \tag{8}$$
where $V_f$ is the foraging speed, $\omega_f$ is the inertia weight for the second motion, $F_i^{old}$ is the last foraging motion, $\beta_i^{food}$ is the food attraction and $\beta_i^{best}$ is the effect of the best fitness found by the ith krill so far.
The third motion is a random process with two parts: a maximum diffusion speed and a random directional vector. It is given by Equation (9):
$$D_i = D^{max} \delta \tag{9}$$
where $D^{max}$ is the maximum diffusion speed and $\delta$ is the random directional vector.
With the three movements, the position of a krill from $t$ to $t + \Delta t$ is given by Equation (10):
$$X_i(t + \Delta t) = X_i(t) + \Delta t \frac{dX_i}{dt} \tag{10}$$
where $\Delta t$ is obtained from Equation (11):
$$\Delta t = C_t \sum_{j=1}^{NV} (UB_j - LB_j) \tag{11}$$
where $NV$ is the total number of variables, $LB_j$ and $UB_j$ are the lower and upper bounds of the jth variable, respectively, and $C_t$ is a constant, 0.5. In the standard KH, the motion induced by other krill, foraging and physical diffusion are repeated for a fixed number of generations or until a stopping criterion is met.
Communication scenarios in a WSN are of two types: inter-cluster communication and intra-cluster communication [25]. The transmissions in either type may be a single hop to the BS or may go through multiple hops, first to the CH and then from the CH to the BS. This work uses the single-hop approach. The objective of clustering is to enhance intra-cluster communication and to select an appropriate cluster representative from all nodes in each round. Data from the various member nodes is aggregated at the CH and then sent to the BS, which decreases the amount of energy consumed. However, if the CH is always a fixed node, it gradually loses all its energy in the process. Hence, a node has to be assigned as the CH in every round. The decision of selecting a suitable node is made by the KH algorithm. A fresh CH in a given round is chosen based on the energy possessed by the node and its distance from the member nodes that are not CHs.
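The KH position update of Equations (5) to (11) can be sketched for a single krill as below. This is a minimal illustration under stated assumptions: the direction terms `alpha`, `beta` and `delta` are taken as inputs because their computation is not detailed in the text, the default values of `n_max` and `d_max` are placeholders, `v_f = 0.02` follows the searching speed reported in the experiments, and clamping to the bounds is an added assumption.

```python
def time_interval(lower: list, upper: list, c_t: float = 0.5) -> float:
    """Equation (11): delta_t = C_t * sum_j (UB_j - LB_j)."""
    return c_t * sum(ub - lb for lb, ub in zip(lower, upper))


def krill_step(x, n_old, f_old, alpha, beta, delta, lower, upper,
               n_max=0.01, v_f=0.02, d_max=0.005, w_n=0.5, w_f=0.5, c_t=0.5):
    """One KH update for a single krill; all vectors are lists of equal length.

    alpha, beta, delta -- induced direction, foraging direction and random
                          diffusion direction (supplied by the caller)
    Returns the new position and the new induced and foraging motions.
    """
    n_new = [n_max * a + w_n * n for a, n in zip(alpha, n_old)]  # Equation (6)
    f_new = [v_f * b + w_f * f for b, f in zip(beta, f_old)]     # Equation (7)
    d_new = [d_max * d for d in delta]                           # Equation (9)
    dt = time_interval(lower, upper, c_t)                        # Equation (11)
    # Equations (5) and (10): move along the sum of the three motions.
    x_new = [xi + dt * (n + f + d)
             for xi, n, f, d in zip(x, n_new, f_new, d_new)]
    # ASSUMPTION: positions are clamped to the search bounds.
    x_new = [min(max(v, lb), ub) for v, lb, ub in zip(x_new, lower, upper)]
    return x_new, n_new, f_new
```

For a 100 m by 100 m field with bounds [0, 100] in both dimensions, Equation (11) gives a step scale of 0.5 × 200 = 100, so an induced speed of 0.01 moves a krill by 1 m along that axis in one update.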
Clustering protocols operate in four phases and two stages. The four phases are: (1) choosing the CH, (2) formation of clusters, (3) aggregation of data and (4) communication of data. The two stages are the set-up stage and the steady-state stage. Initially, during the set-up stage, each sensor relays its location and residual energy to the BS. The BS computes the average energy from this data. In any given round, the CH is selected on the basis of having high energy relative to the average in that round, so that a competent node is selected as the CH for that round. The BS then runs the algorithm to determine the K fittest CHs. It minimizes the cost function [26] given in Equations (12)-(14), where f 1 is the maximum of the average Euclidean distance of the nodes to their associated CHs and |C p,k| is the number of nodes that belong to cluster C k of krill p. The function f 2 is defined as the ratio of the total initial energy of all nodes n i, i = 1, 2, ..., N, in the network to the total current energy of the CH candidates in the current round. β is a user-defined constant used to weigh the contribution of each of the sub-objectives.
The objective of the fitness function defined above is to simultaneously minimize the intra-cluster distance between the nodes and their CHs, quantified by f 1, and optimize the energy efficiency of the network, quantified by f 2. As per the definition of the cost function above, smaller values of f 1 and f 2 mean that the cluster is ideal, having an optimum number of nodes and the requisite energy for performing the tasks of a CH.
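The cost function described above can be sketched as follows. The weighted combination `beta * f1 + (1 - beta) * f2` is an assumption consistent with the description of β as a weighting constant; Equations (12)-(14) may differ in detail. Node positions are taken as coordinate tuples, and the function names are hypothetical.

```python
import math


def assign_to_nearest_ch(positions: list, ch_ids: list) -> dict:
    """Map each CH index to the list of node indices closest to that CH."""
    clusters = {ch: [] for ch in ch_ids}
    for node, pos in enumerate(positions):
        nearest = min(ch_ids, key=lambda ch: math.dist(pos, positions[ch]))
        clusters[nearest].append(node)
    return clusters


def clustering_cost(positions, initial_energy, current_energy, ch_ids, beta=0.5):
    """Weighted cost of one candidate set of CHs (assumed weighting).

    f1 -- maximum over clusters of the average node-to-CH distance
    f2 -- total initial energy of all nodes / total current energy of the CHs
    """
    clusters = assign_to_nearest_ch(positions, ch_ids)
    f1 = max(
        sum(math.dist(positions[n], positions[ch]) for n in members) / len(members)
        for ch, members in clusters.items() if members
    )
    f2 = sum(initial_energy) / sum(current_energy[ch] for ch in ch_ids)
    return beta * f1 + (1 - beta) * f2
```

A candidate set whose CHs are central to their clusters lowers f1, and one whose CHs hold more remaining energy lowers f2, so both push the cost down.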
Step 1. Initialize S krill, each holding K randomly selected CHs from among the suitable CH candidates.
Step 2. Calculate the cost function of each krill:
(i) For each node $n_i$, $i = 1, 2, \ldots, N$:
• Compute the distance $d(n_i, CH_{p,k})$ between node $n_i$ and all CHs $CH_{p,k}$.
• Assign node $n_i$ to the nearest CH $CH_{p,k}$, where
$$d(n_i, CH_{p,k}) = \min_{k = 1, 2, \ldots, K} \{ d(n_i, CH_{p,k}) \} \tag{15}$$
(ii) Evaluate the cost function using Equations (12) to (14).
Step 3. Find the best position found so far by each krill and the best-positioned krill in the swarm.
Step 4. Update the positions of the individuals in the search space using Equations (16) and (17).
Step 5. Repeat Steps 2-4 until the maximum number of iterations is reached.
After obtaining the optimal combination of clusters, the BS immediately communicates the information containing the CH ID of every node back to the sensor field. The clustering algorithm and the entire clustering procedure incorporating the KH algorithm in the WSN are shown in Figure 1.
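The overall loop of Steps 1-5 can be outlined with the sketch below. It is not the KH update itself: Equations (16) and (17) are not reproduced in the text, so Step 4 is replaced here by a simple random swap of one CH, kept only when it lowers the cost. The candidate rule (energy at or above the network average), the cost weighting and all names are assumptions made for illustration.

```python
import math
import random


def cost(positions, initial_energy, current_energy, ch_ids, beta=0.5):
    """Assumed weighted cost beta*f1 + (1-beta)*f2 for one set of CHs."""
    clusters = {ch: [] for ch in ch_ids}
    for pos in positions:  # Step 2(i): assign every node to its nearest CH
        nearest = min(ch_ids, key=lambda ch: math.dist(pos, positions[ch]))
        clusters[nearest].append(pos)
    f1 = max(sum(math.dist(p, positions[ch]) for p in members) / len(members)
             for ch, members in clusters.items() if members)
    f2 = sum(initial_energy) / sum(current_energy[ch] for ch in ch_ids)
    return beta * f1 + (1 - beta) * f2


def select_cluster_heads(positions, initial_energy, current_energy, k,
                         num_krill=10, max_iter=50, seed=0):
    """Search for K cluster heads; returns their node indices, sorted."""
    rng = random.Random(seed)
    average = sum(current_energy) / len(current_energy)
    # ASSUMPTION: nodes with at least average energy are suitable candidates.
    candidates = [i for i, e in enumerate(current_energy) if e >= average]

    def evaluate(ch_ids):
        return cost(positions, initial_energy, current_energy, ch_ids)

    # Step 1: each krill holds K randomly chosen candidate CHs.
    krill = [rng.sample(candidates, k) for _ in range(num_krill)]
    best = min(krill, key=evaluate)
    for _ in range(max_iter):                      # Step 5: iterate
        for idx, ch_ids in enumerate(krill):
            trial = ch_ids[:]
            # Step 4 STAND-IN: random swap of one CH, not the KH motion.
            options = [c for c in candidates if c not in trial]
            if options:
                trial[rng.randrange(k)] = rng.choice(options)
            if evaluate(trial) < evaluate(ch_ids):  # Steps 2-3: keep improvements
                krill[idx] = trial
        best = min(krill + [best], key=evaluate)
    return sorted(best)
```

On a field with two well-separated groups of nodes and k = 2, this search settles on one CH per group, since placing both CHs in the same group leaves the other group far from any CH and inflates f1.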

Results and discussion
LEACH, GA optimization and the proposed KH optimization methods are evaluated. The experiments are carried out for network sizes from 500 to 3000. The searching speed considered is 0.02 m/s. The average end-to-end delay, average Packet Delivery Ratio (PDR), average number of hops to sink and average remaining energy at network half-life are shown in Tables 1-4 and Figures 3-6. In each pair of percentages below, the first value is relative to LEACH and the second is relative to GA optimization.
Figure 3 shows that KH optimization attains a lower average end-to-end delay by 48.99% and 0.4% for 500 network size, by 65.6% and 17.32% for 1000 network size, by 65.56% and 17.76% for 1500 network size, by 41.52% and 7.98% for 2000 network size, by 35.54% and 13.95% for 2500 network size and by 42.48% and 8.11% for 3000 network size when compared with LEACH and GA optimization.
Figure 4 shows that KH optimization attains a higher average PDR by 7.29% and 3.83% for 500 network size, by 6% and 3.54% for 1000 network size, by 6.03% and 3.71% for 1500 network size, by 5.43% and 3% for 2000 network size, by 7.71% and 3.57% for 2500 network size and by 5.73% and 2.21% for 3000 network size when compared with LEACH and GA optimization.
Figure 5 shows that KH optimization attains a higher average number of hops to sink by 27.77% and the same value for 500 network size, by 21.73% and the same value for 1000 network size, by 29.37% and 12.98% for 1500 network size, by 28.83% and 18.66% for 2000 network size, by 23.43% and 7.24% for 2500 network size and by 22.38% and 6.71% for 3000 network size when compared with LEACH and GA optimization.
Figure 6 shows that KH optimization attains a higher average remaining energy at network half-life by 46.51% and 23.15% for 500 network size, by 31.33% and 14.43% for 1000 network size, by 24.71% and 6.18% for 1500 network size, by 31.32% and 10.98% for 2000 network size, by 33.33% and 7.4% for 2500 network size and by 37.68% and 21.62% for 3000 network size when compared with LEACH and GA optimization.

Conclusion
Advances in the field of WSNs have led to many novel protocols designed specifically for sensor networks, in which energy awareness is the area that needs the utmost focus, since energy plays a vital role in WSNs. In clustering algorithms, energy efficiency and load balancing are significant challenges. A large population of nodes can be managed effectively through clustering mechanisms, which help to reduce the nodes' energy consumption. This work involves energy-efficient CH selection techniques based on the GA and KH algorithms. The GA-based CH selection scheme optimizes the selection by choosing heads based on their remaining energy while considering a compromise between inter-cluster and intra-cluster distance. KH belongs to the group of bio-mimetic algorithms. It is inspired by the way many marine animals form groups that are non-random and underdispersed. In this work, the KH algorithm is applied to WSN clustering. Clusters are formed with the best possible node set in every round, and the gathered data are received at the assigned CH. The objective is thus to optimize the energy expenditure of the SNs while taking residual energy into account, which in turn extends the lifespan of the network. Results show that KH optimization has a higher average PDR by 7.29% and 3.83% for 500 network size, by 6% and 3.54% for 1000 network size, by 6.03% and 3.71% for 1500 network size, by 5.43% and 3% for 2000 network size, by 7.71% and 3.57% for 2500 network size and by 5.73% and 2.21% for 3000 network size when compared with LEACH and GA optimization. The future plan is to improve the existing results by proposing hybridized techniques.

Disclosure statement
No potential conflict of interest was reported by the authors.