Energy efficient data collection with multiple mobile sink using artificial bee colony algorithm in large-scale WSN

ABSTRACT In most wireless sensor networks (WSNs), a multi-hop routing algorithm is used to transmit the data collected by sensors to the user. In large-scale WSNs, multi-hop forwarding leads to the energy hole problem and high transmission overhead. To address these problems, this paper proposes a data collection algorithm based on multiple mobile sinks, which combines energy-balanced clustering with Artificial Bee Colony (ABC) based data collection. Cluster head election is based on the residual energy of the node. We focus on a large-scale, dense WSN that tolerates a certain amount of data latency, and investigate mobile Sink balance from three aspects: data collection maximization, mobile path length minimization, and network reliability optimization. Simulation results show that, in comparison with other algorithms such as random walk and Ant Colony Optimization (ACO), the proposed algorithm can effectively reduce data transmission, save energy, improve network data collection efficiency and reliability, and extend the network lifetime.


Introduction
In recent years, research on WSNs has developed rapidly, and WSNs are now commonly used in military, intelligent medical, and monitoring applications. Data collection is the key task in a WSN and has attracted the attention of a large number of researchers [1]. In the conventional data collection scheme, all nodes are fixed in position and collect data, which is then forwarded to the Sink through a routing protocol. Currently, the most challenging unsolved problems with this process include (1) the energy hole problem, where data streams follow a many-to-one pattern that subjects nodes near the Sink to a greater traffic load, resulting in premature energy depletion and the creation of an energy hole around the Sink; and (2) the communication overhead problem, where, because the energy of sensor nodes is limited and every routing protocol incurs control overhead, there is an inherent need to control the energy consumption of network nodes [2].
In most applications of WSNs, the nodes are battery-powered and located in unattended or harsh environments, where battery replacement is difficult or even impossible. Once a node's energy is exhausted, the node is disabled, which affects network operation and may partition the network, shortening its lifetime [3]. Network lifetime is therefore an important indicator of WSN performance, and data collection algorithms should save energy and maximize the network lifetime. Researchers have shown that hierarchical data gathering algorithms achieve remarkable performance in extending network lifetime. One important parameter in hierarchical routing protocols is cluster size. With small clusters, networks may encounter connectivity and coverage problems. In [4], the authors showed that if the cluster size is not properly chosen, the total energy consumption of the network increases exponentially, whether the cluster size is smaller or larger than the optimal value. Lian et al. [5] showed that up to 90% of the total energy of the network can be wasted when the entire network is subject to premature death. Equal-sized clusters and multi-hop routing are the main cause of the energy hole problem, which can be solved by using a mobile sink for data collection. A single mobile sink has to travel the whole deployment area, which is not feasible for a large-scale WSN, so multiple mobile sinks are used for energy-efficient data collection. The key challenge in this design is how to balance the workload among the mobile sinks and the energy consumption among the sensor nodes by controlling the movement of the mobile sinks.
Most existing algorithms for balancing the energy consumption of sensors are too difficult to implement in practice because of the many limitations imposed on WSNs by different applications. In many practical applications, these limitations seriously degrade the performance of existing algorithms. In this paper, we assume that a sensor network with multiple mobile sinks is to be deployed for monitoring a remote region, and that there is a locally connected road map for the mobile sinks in the monitored region. The problem of concern is how to schedule these mobile sinks to collect as much data from the sensors as possible so as to prolong the network lifetime.
In this paper, we formulate a novel constrained optimization problem, the energy management optimization problem of energy-constrained multiple mobile sinks for wireless sensor networks. We then propose an efficient algorithm for this problem, referred to as the Artificial Bee Colony based mobile sink movement algorithm, which balances not only the workload among the mobile sinks but also the energy consumption among the sensor nodes. The result is an efficient and reliable data collection mechanism for mobile WSNs (MWSNs) based on the artificial bee colony algorithm. The main contributions of this paper can be summarized as follows: (1) We address the mobile Sink data collection process, the cluster head selection problem, and mobile Sink path optimization. (2) The path optimization of the mobile Sink is formulated as a shortest-path problem; the artificial bee colony algorithm is then used to search for the optimal solution, that is, the shortest path of the mobile Sink, so as to improve network data collection efficiency.
In nature, honey bees exhibit several complicated behaviours such as mating, breeding, and foraging. These behaviours have been mimicked by several honey-bee-based optimization algorithms. This rich behaviour motivated the choice of bee colony optimization for path optimization in this study.
The remainder of this paper is organized as follows: Section 1 gives the introduction. Section 2 provides a literature review of the research area. The system model is explained in Section 3. A mobility-based energy efficiency algorithm is introduced in Section 4. Numerical results and conclusions are given in Sections 5 and 6, respectively.

Literature survey
Several studies on deploying mobile sinks for data collection have been reported in the literature. Grouping the nodes into optimal clusters is known to be an NP-hard problem [6]. Consequently, one effective method for optimizing the cluster size is to use optimization algorithms. The authors of [7] used Particle Swarm Optimization (PSO) to determine the optimal cluster size by minimizing the distance between member nodes and cluster heads (CHs) and decreasing the energy consumption of the network. The authors of [8] used a genetic algorithm (GA) to create optimal clusters for energy efficiency in WSNs. The authors of [9] proposed a self-clustering method for heterogeneous WSNs using a GA that optimized the network lifetime. In addition, there are other methods that adjust cluster size with different objectives. The authors of [10] aim to construct a routing tree such that the network lifetime is maximized while the routing path from each sensor to the sink is kept as short as possible. A generic cost model of energy consumption for data gathering is proposed in [11], where a routing tree is used for query evaluation. The work in [12] aims to find an optimal trajectory for each mobile sink and to determine the sojourn time of each mobile sink at each sojourn location in the trajectory such that the network lifetime is maximized. The use of a single mobile sink to prolong the network lifetime has also been explored in [13]. The work in [14] plans the route of a mobile base station and balances the network load to prolong network lifetime. Similarly, [15,16] propose a greedy maximum-residual-energy algorithm to control the mobile base station. For the case of a limited number of base station sojourn points, joint mobility and routing of the base station are proposed in [17][18][19]. Finally, [20][21][22] use mobile sink routing protocols to manage the path. However, the use of mobile sinks has many limitations in practical applications.
On the one hand, a WSN is usually deployed in regions that are difficult to access, so the area in which the mobile sinks can move is limited. On the other hand, the mobile sinks are usually installed on mobile platforms such as unmanned vehicles; the speeds of these vehicles are constrained by many factors such as road conditions and the platforms themselves, and the platforms also have limited energy, so their maximum travel distance per tour is limited as well. Thus, when designing a routing protocol for this scenario, we must take these constraints on mobile sinks into account in order to prolong the network lifetime efficiently and effectively.

Network model
A wireless sensor network can be modelled as an undirected graph $G = (V \cup MS, E)$, where V is the set of n stationary, homogeneous sensor nodes randomly deployed in a monitoring region, MS is the set of k mobile sinks, and E is the set of links between sensors and sinks. There is a link between two sensors u and v if they are within the transmission range of each other. We treat the mobile sinks as special sensors that can receive sensed data from sensor nodes and transfer the collected data to a remote monitoring centre. It is assumed that every sensor node has a unique identifier and a limited initial energy capacity Q, while each mobile sink has an energy capacity, with an unlimited energy supply available at the depot.
We assume that there are k mobile sinks $MS_1, MS_2, \ldots, MS_k$ $(k \ge 2)$. Let $R = (V_r, W_r)$ be the road map in the monitoring region, where $V_r$ is the set of road intersections and each edge in $W_r$ is a road. The k mobile sinks are constrained to move on the map R, and each of them has an energy capacity $e_M$. The energy of each mobile sink is mainly consumed by travelling, data transmission, and reception. Let $e_t$ be the sink energy consumption per unit length travelled and let $e_c$ be the sink energy consumption per unit time for data transmission or reception at its sojourn locations. In general, sensor nodes consume most of their energy on wireless communication, namely transmission and reception, while other consumption such as sensing and computing is negligible.
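The link rule of this graph model can be illustrated with a minimal Python sketch. It assumes that nodes are given as 2-D coordinates and share one transmission range; the names `build_links`, `positions` and `tx_range`, and the sample layout, are hypothetical and not taken from the paper.

```python
import math
from itertools import combinations

def build_links(positions, tx_range):
    """Return the undirected link set E: two nodes are linked when their
    Euclidean distance does not exceed the common transmission range."""
    links = set()
    for u, v in combinations(sorted(positions), 2):
        if math.dist(positions[u], positions[v]) <= tx_range:
            links.add((u, v))
    return links

# Hypothetical layout (metres): three sensors and one mobile sink,
# the sink being treated as a special sensor as in the model above.
positions = {"s1": (0, 0), "s2": (30, 40), "s3": (200, 0), "ms1": (30, 0)}
links = build_links(positions, tx_range=50)
```

Here sensor `s3` ends up without any link, which is the situation a mobile sink is meant to resolve by moving closer.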
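The energy constraint on a mobile sink can be made concrete with a small sketch. It assumes that the energy of one tour is the sum of a travel term (e_t times the tour length) and a communication term (e_c times the total sojourn time), and that a tour is feasible when this sum does not exceed e_M. The function names and numeric values are illustrative assumptions.

```python
def tour_energy(tour_length, sojourn_time, e_t, e_c):
    """Energy one mobile sink spends on a tour: travelling plus
    transmission/reception at its sojourn locations."""
    return e_t * tour_length + e_c * sojourn_time

def tour_is_feasible(tour_length, sojourn_time, e_t, e_c, e_M):
    """A tour is feasible if it fits within the sink's energy capacity e_M."""
    return tour_energy(tour_length, sojourn_time, e_t, e_c) <= e_M

# Hypothetical values: 2 J per metre travelled, 1 J per second of communication.
energy = tour_energy(tour_length=1000, sojourn_time=300, e_t=2, e_c=1)
feasible = tour_is_feasible(1000, 300, e_t=2, e_c=1, e_M=3000)
```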

Proposed approach
One purpose of using clustering techniques in WSNs is to decrease the energy consumption of sensor nodes. Figure 1(a) gives the general system model and Figure 1(b) shows the cluster formation among the nodes. The cluster formation procedure includes cluster head election and the joining process between the cluster head and its cluster members. After cluster formation, the cluster heads are treated as vertices for tree formation. For load balancing, a cluster head is re-elected only when its residual energy drops below the threshold value. The proposed clustering protocol consists of three phases: grid construction, cluster head selection, and cluster formation.
Grid formation is straightforward with the assistance of location-finding techniques or GPS, and establishing clusters in equally sized square grids incurs very little control overhead. In early grid formation techniques [23], the grid size $G_s \le R_c/\sqrt{5}$ was chosen so that nodes can communicate with the adjacent grids, where $G_s$ is the grid size and $R_c$ is the communication range of the sensor node. In more recent work [24], it was found that nodes in diagonal grids can communicate when the grid size satisfies $G_s \le R_c/\sqrt{8}$. This smaller grid provides better connectivity with neighbouring grids and the freedom to transmit data over shorter paths. Once the grid size is decided, each sensor node determines the grid coordinates (X, Y) of the grid to which it belongs from its location (x, y) using Equation (1). Finally, the entire network is divided into equal-sized grids or cells, as shown in Figure 1.
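A minimal sketch of the grid construction step follows. It assumes that the grid coordinate is obtained by integer division of the node position by the grid size, which is a common choice and may differ from the exact form of Equation (1). The function names are hypothetical.

```python
import math

def grid_size(comm_range, diagonal=True):
    """Largest grid side that still guarantees communication between
    neighbouring grids: Rc/sqrt(8) covers diagonal neighbours as well,
    Rc/sqrt(5) covers only horizontally or vertically adjacent grids."""
    return comm_range / math.sqrt(8 if diagonal else 5)

def grid_coords(x, y, gs):
    """Assumed mapping from a node location (x, y) to grid coordinates (X, Y)."""
    return (math.floor(x / gs), math.floor(y / gs))

gs = grid_size(80)            # grid side for an 80 m communication range
cell = grid_coords(25, 47, 10)
```

The two bounds follow from the worst case: two nodes in side-by-side grids are at most $G_s\sqrt{5}$ apart, and two nodes in diagonal grids are at most $G_s\sqrt{8}$ apart.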

Cluster head election
The cluster formation starts with the selection of a cluster head. To select the cluster head, the proposed protocol applies the following criteria:
• The residual energy of the sensor node should be at least the threshold. Let $G_z$ denote any grid, $CH(z)$ the cluster head of $G_z$, and $E_r(z)$ the residual energy of any node $z \in G_z$; then $E_r(z) \ge E_{th}$. This is the first criterion for cluster head selection.
• The sensor node should be close to the centroid of the grid. Let $z_1, z_2, z_3, \ldots, z_j$ be the nodes belonging to the grid $G_z$, and let $|D(z_i)|$ be the distance of node $z_i$ from the centroid of the grid. Then $CH(z) = \arg\min_{1 \le i \le j} |D(z_i)|$. This is the second criterion for cluster head selection.
Assume that $Z_i$ sensor nodes are deployed randomly over a grid $G_i$. $E_k$ denotes the residual energy of node $S_k$, and $l_k$ is the total length of the shortest-path routes from node $S_k$ to all other $(Z_i - 1)$ nodes in $G_i$. The energy expenditure of one cluster can be minimal if and only if the candidate node $S_k$ satisfies condition (2). In practical applications, it is not easy to find one candidate node that satisfies (2) in a large area with a large number of CNs. Therefore, in this study, a candidate node becomes a CH if it has the maximum CH selection value, which weights $l_k$ against $E_k(t)$ using positive real numbers α, β such that α + β = 1. These weighting factors are found heuristically in the course of the optimization, and the relative importance of the two objectives depends on them. The most profitable values of (α, β), which give the best balance between $l_k$ and $E_k(t)$, were chosen for the whole simulation. The CH election procedure for each area is described in more detail by the steps of the CHE algorithm given in Algorithm 1.
Step 2. Calculate the total length of the shortest path from node S_k to the (N_i − 1) normal nodes in the sub-region of interest.
Step 5. Repeat Steps 1 through 4 until a CH has been elected for every grid in the network.
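The two explicit election criteria (residual energy at least E_th, then minimum distance to the grid centroid) can be sketched as below. This is not Algorithm 1 itself: the weighted selection value with α and β is left out because its exact form is not given in the text, and the data layout and names are assumptions.

```python
import math

def elect_cluster_head(nodes, centroid, e_th):
    """Pick the cluster head of one grid.

    nodes    -- dict: node id -> (x, y, residual_energy)
    centroid -- (x, y) of the grid centroid
    e_th     -- residual-energy threshold
    Returns the id of the eligible node nearest to the centroid,
    or None if no node meets the energy threshold.
    """
    eligible = {nid: (x, y) for nid, (x, y, e) in nodes.items() if e >= e_th}
    if not eligible:
        return None
    return min(eligible, key=lambda nid: math.dist(eligible[nid], centroid))

# Hypothetical grid: node "a" is nearest to the centroid but too depleted.
nodes = {"a": (4, 5, 0.2), "b": (6, 6, 0.9), "c": (1, 1, 1.0)}
head = elect_cluster_head(nodes, centroid=(5, 5), e_th=0.5)
```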
Once a cluster head has been selected in each grid, it broadcasts an advertisement (CH_ADV) within the grid. The non-cluster-head nodes send a join message (CH_JOIN) to the cluster head of the same grid, which creates the cluster (Figure 1(b)). The cluster head then generates a time-slot schedule so that data can be collected from the cluster members in a collision-free manner. The cluster formation process is illustrated in Algorithm 2.
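The CH_ADV / CH_JOIN exchange and the resulting time-slot schedule can be pictured with a very small sketch. It is not Algorithm 2; it assumes that every non-cluster-head node of the grid hears the advertisement and joins, and that slots are assigned in order of node identifier, which is an arbitrary illustrative choice.

```python
def form_cluster(grid_nodes, cluster_head):
    """Return a collision-free slot schedule {member id: slot index}.

    Every node in the grid other than the cluster head is assumed to
    receive CH_ADV and to answer with CH_JOIN; the cluster head then
    gives each member its own time slot.
    """
    members = sorted(n for n in grid_nodes if n != cluster_head)
    return {node: slot for slot, node in enumerate(members)}

schedule = form_cluster(["n3", "n1", "n2"], cluster_head="n2")
```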

Energy Spent for Clustering:
We use the radio model applied by Soro and Heinzelman [4]. The energy spent by the transmitter to send an L-bit message follows this model; in the transmitter amplifier term (Figure 1.3), k = 2 is used for the free-space model and k = 4 for the multi-path model, with a distance threshold separating the two cases. The energy spent by the receiver to receive an L-bit message follows the same model.
The energy spent on clustering (the set-up phase) consists of the following terms:
• the energy expended by CH i for broadcasting the CH_ADV within its competition radius;
• the energy spent by a non-CH node for receiving the CH_ADV (assuming only one CH_ADV is received per non-CH node);
• the energy expended by node i for transmitting the CH_JOIN to its CH;
• the energy spent by a CH node for receiving a CH_JOIN:
$ECR_{JCMSG} = L_c \cdot E_{elec}$ (11)
The clustering energy of a non-CH node is
$EC_{non-CH} = ECR_{HDM} + ECT_{JCMSG}$ (12)
The total energy spent by the CH node of a single cluster for clustering, and the total energy spent on forming one cluster, denoted $EC_{Cluster}$, follow from these terms. The total energy spent in the set-up phase is denoted $EC_{Total}$:
$EC_{Total} = C \times EC_{Cluster}$ (15)
For data reporting, the energy spent by the dynamic routing protocol for reporting r readings (r rounds) comprises:
• the energy spent by the member nodes for transmitting the sensed value to the cluster head;
• the energy spent by the CH for receiving the data from its (k − 1) member nodes, aggregating the values (k messages including its own), and transmitting the aggregated value to the Sink. The energy spent by the CH for aggregating one bit is denoted $E_{DA}$. Aggregation reduces the k packets of L bits each (including that of the CH) to $L_1$ bits; that is, k × L bits are reduced to $L_1$ bits.
The total energy spent for a single transmission in a cluster (one data report from a cluster to the sink) is denoted $E_{Cluster}$. The total energy spent by all the nodes in the network (one data report of the whole network to the sink) is denoted $E_{Total}$:
$E_{Total} = C \times E_{Cluster}$ (20)
where C is the number of clusters.
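For orientation, the first-order radio model commonly used in the clustering literature can be sketched as follows. It is assumed here that this is the form behind the transmitter and receiver energies above. The constants (50 nJ/bit electronics energy, 10 pJ/bit/m² and 0.0013 pJ/bit/m⁴ amplifier coefficients) are widely used illustrative values and are not taken from this paper; all names are hypothetical.

```python
import math

# Assumed, commonly used parameter values (not from the paper).
E_ELEC = 50e-9        # J/bit, transmitter/receiver electronics
EPS_FS = 10e-12       # J/bit/m^2, free-space amplifier (k = 2)
EPS_MP = 0.0013e-12   # J/bit/m^4, multi-path amplifier (k = 4)
D0 = math.sqrt(EPS_FS / EPS_MP)   # distance threshold between the two models

def tx_energy(bits, dist):
    """Energy to transmit `bits` over `dist` metres."""
    k, eps = (2, EPS_FS) if dist < D0 else (4, EPS_MP)
    return bits * E_ELEC + bits * eps * dist ** k

def rx_energy(bits):
    """Energy to receive `bits`; same form as Equation (11)."""
    return bits * E_ELEC

def network_energy(num_clusters, cluster_energy):
    """Equations (15) and (20): total = C x per-cluster energy."""
    return num_clusters * cluster_energy

# A 4000-bit packet sent over 50 m, then received.
e_tx = tx_energy(4000, 50)
e_rx = rx_energy(4000)
```

With these assumed constants the threshold D0 is roughly 88 m, so a 50 m hop falls in the free-space regime.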

Artificial bee colony algorithm and data collection:
In 2005, Karaboga et al. [25] proposed an innovative heuristic method called the artificial bee colony (ABC) algorithm, inspired by the foraging behaviour of bee colonies. In the ABC algorithm, the "colony" contains three groups of "bees": onlookers, scouts, and employed bees. Each bee represents a position in the search space, and the algorithm uses populations of bees to identify the optimal path. A bee waiting on the "dance" area to choose a food source is an onlooker, a bee searching randomly is a scout, and a bee returning to a previously visited food source is an employed bee. The positions of food sources represent possible solutions to the optimization problem, and the amount of "nectar" of a food source corresponds to the quality (fitness) of the associated solution. The first half of the colony consists of employed bees and the second half consists of onlooker bees. The ABC algorithm can be split into four main steps:
(1) Initialization. Assume that the population size is SN, where $X_i = \{X_{i1}, X_{i2}, \ldots, X_{iD}\}$ $(i = 1, 2, \ldots, SN)$ is the i-th food source generated in the initial population and D is the vector dimension of the optimization problem. The initial population is generated randomly.
(2) Population updating. The initial positions of the food sources are randomly generated and each employed bee is assigned to a food source. In each iteration, every employed bee determines a new food source in the neighbourhood of its current one via (9) and computes the nectar amount of the new food source. If the nectar amount of the new food source is higher than that of the previous one, the employed bee moves to the new food source; otherwise it stays with the old one. In the update, $k \in \{1, 2, 3, \ldots, SN\}$, $j \in \{1, 2, 3, \ldots, D\}$, and rand(−1, 1) is a random number in (−1, 1) that controls the range in which the neighbour of $X_{ij}$ is produced. The neighbourhood scope gradually decreases as the search approaches the optimum solution.
(3) Bee source selection. In this stage, bees select food sources according to the income rate of the sources, which is calculated from the fitness value. Food sources with high income rates are more likely to be selected. Here fit($X_n$) is the fitness value of solution n, which is proportional to the nectar amount of food source $n \in \{1, 2, 3, \ldots, SN\}$, and the fitness is calculated from the objective function value of bee source $X_n$. The onlooker (follower) bees then search in the neighbourhood of the selected sources, which improves the local exploitation ability of the algorithm.
(4) Population elimination. If a solution shows no obvious improvement after *limit* consecutive update cycles, it is assumed to be trapped in a local optimum and is abandoned; the corresponding employed bee then turns into a scout bee and randomly produces a new solution to replace the eliminated one. In this replacement, $j \in \{1, 2, 3, \ldots, D\}$, rand(0, 1) is a random number in (0, 1), and $X_{max}$ and $X_{min}$ are the maximum and minimum values. The new solution replaces the old one, and the optimum solution is output accordingly.
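For reference, the four steps correspond to the following expressions in the standard ABC formulation found in the literature. It is assumed here, rather than confirmed by the text, that the paper uses exactly these forms; the symbols $V_{ij}$ (candidate position), $P_n$ (selection probability) and $f$ (objective function) are introduced for illustration.

```latex
\begin{align}
X_{ij} &= X_{\min,j} + \mathrm{rand}(0,1)\,\bigl(X_{\max,j} - X_{\min,j}\bigr)
  && \text{(initialization and scout replacement)}\\
V_{ij} &= X_{ij} + \mathrm{rand}(-1,1)\,\bigl(X_{ij} - X_{kj}\bigr), \quad k \neq i
  && \text{(neighbourhood search)}\\
P_n &= \frac{\mathrm{fit}(X_n)}{\sum_{m=1}^{SN} \mathrm{fit}(X_m)}
  && \text{(source selection)}\\
\mathrm{fit}(X_n) &=
  \begin{cases}
    \dfrac{1}{1 + f(X_n)}, & f(X_n) \ge 0\\[6pt]
    1 + \lvert f(X_n) \rvert, & f(X_n) < 0
  \end{cases}
  && \text{(fitness)}
\end{align}
```

For a minimization problem such as path length, $f(X_n) \ge 0$ always holds, so only the first fitness case applies and shorter paths receive higher fitness.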
The ABC algorithm is a relatively new population-based intelligent optimization method with the following advantages: (1) it converges globally and at a relatively quick speed; (2) its range of application is quite wide; (3) it requires relatively few parameters to be set compared with other optimization algorithms; and (4) because it is population-based, it is easy to implement and process.

ABC algorithm in proposed approach
It has two phases:

1. Initialization phase
2. Data collection phase
During the initialization phase, each mobile sink completes its task in two stages. First, the cluster heads and nodes send the shortest-hop information and communication time information collected in the preceding stage to the mobile Sink. From this information, the mobile Sink calculates the communication time assigned to each cluster head. This computation of communication time and data collection is expected to be completed offline by the mobile Sink, which has strong computing and storage capabilities.
In the second stage, the mobile Sink broadcasts the calculation results to the entire network, creating a series of matching lists between member nodes and cluster heads. Each node receiving this broadcast obtains its target cluster head, removes the items related to itself from the broadcast information, and continues broadcasting, thus completing the optimum cluster head selection process for the entire network.
Next comes the data collection phase. After initialization, all network nodes continuously collect data and send it to their target cluster head along the established routing tree. Each cluster head caches its own sensory data and that of its member nodes until the mobile Sink arrives. As in route planning with the ABC algorithm applied to the travelling salesman problem, the sink selects its route dynamically; in other words, before it collects the next round's data, it calculates and chooses the position of the next hop according to the current network environment parameters and mobility strategy to obtain the shortest route to each cluster head and node. An overview of the proposed data collection algorithm is given in Figure 2. The following provides the pseudocode of
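Independently of the authors' pseudocode, the route-selection idea can be illustrated with a minimal discrete ABC sketch for a single sink visiting a set of cluster heads. Several choices here are assumptions made for illustration only: a food source is a visiting order, a neighbouring source is obtained by swapping two stops, fitness is 1/(1 + tour length), and the parameter defaults are arbitrary. Road-map constraints and multiple sinks are not modelled.

```python
import math
import random

def tour_length(tour, points):
    """Length of the closed tour that visits `points` in the order `tour`."""
    return sum(math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def swap_neighbour(tour, rng):
    """Neighbouring food source: the same tour with two stops swapped."""
    i, j = rng.sample(range(len(tour)), 2)
    new = tour[:]
    new[i], new[j] = new[j], new[i]
    return new

def abc_tour(points, sn=20, limit=30, iterations=200, seed=0):
    """Search for a short closed tour with an ABC-style loop."""
    rng = random.Random(seed)
    n = len(points)
    sources = [rng.sample(range(n), n) for _ in range(sn)]  # random tours
    costs = [tour_length(t, points) for t in sources]
    trials = [0] * sn                       # cycles without improvement
    i_best = min(range(sn), key=costs.__getitem__)
    best_tour, best_cost = sources[i_best][:], costs[i_best]

    def try_improve(i):
        # Greedy selection between source i and one random neighbour.
        candidate = swap_neighbour(sources[i], rng)
        cost = tour_length(candidate, points)
        if cost < costs[i]:
            sources[i], costs[i], trials[i] = candidate, cost, 0
        else:
            trials[i] += 1

    for _ in range(iterations):
        for i in range(sn):                 # employed bees
            try_improve(i)
        fitness = [1.0 / (1.0 + c) for c in costs]
        for i in rng.choices(range(sn), weights=fitness, k=sn):  # onlookers
            try_improve(i)
        for i in range(sn):                 # scouts replace stagnant sources
            if trials[i] > limit:
                sources[i] = rng.sample(range(n), n)
                costs[i] = tour_length(sources[i], points)
                trials[i] = 0
        i_best = min(range(sn), key=costs.__getitem__)
        if costs[i_best] < best_cost:
            best_tour, best_cost = sources[i_best][:], costs[i_best]
    return best_tour, best_cost

# Hypothetical cluster-head positions (metres).
cluster_heads = [(0, 0), (100, 0), (100, 100), (0, 100), (50, 150)]
route, length = abc_tour(cluster_heads)
```

The swap move keeps every candidate a valid permutation, so no repair step is needed; a 2-opt move would be a natural alternative neighbourhood.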

Performance analysis
The MATLAB simulator was used to analyse the performance of the proposed method. In our simulation scenarios, 200-1000 sensor nodes and the mobile Sinks were initially deployed uniformly at random in square areas ranging from 300 × 300 m² to 1000 × 1000 m². The sensing radius was set to 80 m, the packet size was fixed at 4000 bits, and the velocity of the mobile sink, M_Sink, was set to 10 m/s. The ABC algorithm parameters were a population size of 50, a limit value of 200, and 60 iterations. Sensor nodes generated 10 bytes of data every 1 min and sent it to the cluster head node for storage. The maximum time delay permitted by the network was 20 min, the mobile Sink also completed a round of data collection every 20 min, and the mobile Sink collected about 1.5 Kbits of data from each cluster head node. With the mobile Sink moving at the constant speed M_Sink, the total mobile Sink path length ranged from 1000 m to 10,000 m. Figure 3 shows a sample scenario: Figure 3(a) shows the data collection paths of four mobile sinks and Figure 3(b) shows the data collection paths of three mobile sinks.
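A quick arithmetic check of these settings, written as a small sketch, shows that the reported path lengths are compatible with the 20 min delay bound when only travel time is counted; sojourn time at the cluster heads is ignored here, which is a simplifying assumption. The variable names are illustrative.

```python
NODE_BYTES_PER_MIN = 10          # data generated by each sensor node
ROUND_MIN = 20                   # length of one collection round / delay bound
SINK_SPEED = 10                  # m/s
PATH_RANGE_M = (1000, 10000)     # reported range of total path length

# Data one node accumulates during a round, in bits.
bits_per_node_per_round = NODE_BYTES_PER_MIN * 8 * ROUND_MIN

# Pure travel time for the shortest and longest reported paths, in seconds.
travel_times = tuple(length / SINK_SPEED for length in PATH_RANGE_M)

# The longest path must be travelled within the delay bound.
fits_delay_budget = max(travel_times) <= ROUND_MIN * 60
```

The longest path takes 1000 s of travel, leaving 200 s of the 1200 s round for communication at the sojourn points.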
Number of clusters: Too few cluster heads in a network create cluster coverage areas that are too large, requiring excessive energy to transmit data between member nodes and cluster heads over the greater distance. Conversely, too many cluster heads also lead to excess energy consumption, because cluster heads inherently consume much more energy than member nodes. The appropriate number of cluster heads must be chosen to ensure minimum energy consumption throughout the entire network. A comparison of the number of cluster heads resulting from the three algorithms is shown in Figure 4; the proposed algorithm selected a more reasonable number of clusters than the others and therefore consumed a reasonable amount of energy.
Energy consumption: Energy consumption is an important indicator of network performance. The total energy consumption of the proposed algorithm in comparison with the existing algorithms is given in Figure 5. As the number of simulation iterations increased, network energy consumption gradually increased for all three algorithms. Random walk showed the largest increase in energy consumption, followed by the ACO algorithm and finally the proposed algorithm, which saved 17.5% energy compared to random walk and 4.0% energy compared to ACO.
Transmission delay: The transmission delay (i.e. latency) of successfully received data packets is used to measure the real-time performance of the protocols. Figure 6 shows the delay obtained from simulation. Random walk showed the largest delay (nearly 2 s) initially, which then gradually decreased. The latency of the ACO and ABC algorithms was initially relatively large, around 0.8 s, mainly because swarm intelligence algorithms behave randomly during the first iterations while searching for the optimal path. After subsequent learning and optimization, the final delay of ACO was about 0.2 s, while that of the ABC algorithm was about 0.1 s.
Network connectivity: The continuous motion discretization method is generally used to calculate the rate of network connectivity in a mobile network. According to this method, the network topology does not change within a relatively short time period. For the network at a given moment, the node traversal method can also be used to calculate network connectivity: first an initial node is selected, and then directly connected nodes, two-hop connected nodes, and three-hop connected nodes are searched from it in turn until the number of nodes connected to the initial node no longer increases. As shown in Figure 7, as simulation iterations increased, the network connectivity rate of random walk was low and volatile, ranging between 0.2 and 0.75, while that of ACO was higher and more stable, ranging between 0.5 and 0.75; that of ABC was highest and most stable overall, but with some sizable fluctuations at certain points, ranging between 0.45 and 0.8. On the whole, the network connectivity of the artificial bee colony algorithm was best.
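The node traversal method described above amounts to a breadth-first search from the initial node. A minimal sketch follows; defining the connectivity rate as the fraction of nodes reachable from the initial node is an assumption, since the text does not give the exact formula, and the names are hypothetical.

```python
from collections import deque

def connectivity_rate(adjacency, start):
    """Fraction of nodes reachable from `start`, found hop by hop:
    direct neighbours first, then two-hop nodes, then three-hop nodes,
    until no new node is discovered."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adjacency.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) / len(adjacency)

# Hypothetical snapshot of the topology: nodes 4 and 5 are cut off from 1-3.
snapshot = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4]}
rate = connectivity_rate(snapshot, start=1)
```

Under the discretization assumption, this computation would be repeated for each short time interval in which the topology is treated as fixed.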

Conclusion
This work examined the optimization problem of maximizing network lifetime in WSNs through energy-balanced CH election and by finding the optimal data collection paths of the mobile sinks (MSs). Our investigation was based on the number and location of CHs in each round to optimize the travel paths of the MSs. The experimental results demonstrate that the proposed clustering algorithm can greatly extend the network lifetime compared with the Improved LEACH and LEACH algorithms. In addition, a heuristic artificial bee colony algorithm was proposed that can be applied to maximize data collection and minimize total energy consumption while optimizing network reliability. Simulation results show that, compared to ACO and random walk, the proposed algorithm improves WSN throughput, collects data more efficiently, and saves energy. In future work, we plan to introduce a data fusion mechanism into the sensor nodes and to combine it with the mobile Sink data collection algorithm.

Disclosure statement
No potential conflict of interest was reported by the authors.