UAV autonomous collision avoidance approach

ABSTRACT The conventional sense-and-avoid collision avoidance mode of UAV (unmaned aerial vehicle) lacks applicability and timeliness in a multi-threat environment. In this paper, a new efficient collision avoidance approach for uncertain threat environments derived from the idea of autonomous mental development is proposed. The proposed collision avoidance pattern consists of a sensory layer, a logic layer and a development layer. The threat information is sensed using the sensory layer, and the path planning approach in the logical layer is applied to the output configuration of UAV. In the development phase, the developmental networks approach is used for online learning, training and updating the logical layer so as to form the sense–action mapping, which is stored as the “basic experience” for UAV executing the avoidance manoeuvre. In the implementation phase, the command is executed by matching the sensing information and action base. The simulation results show that the proposed approach has better timeliness compared to the conventional approaches.


Introduction
With a wide application of aeronautical techniques in the field of economy and military, the number of Unmanned Aircraft Systems (UASs) increases significantly in airspace [1][2][3], while high-rise buildings, birds, complex air conditions, etc., have posed a significant challenge to air safety. The UASs must attain the same or higher security level compared with a manned aircraft [4]. Many researchers have focused on the UAV collision avoidance technique as a key for ensuring air safety. The Autonomous Mental Development (AMD) [5] method can simulate the mental development process of human-beings, which can also be applied in the decision-making of UAV collision avoidance in a complex threat environment.
Sensing and Avoidance (S&A) is a major pattern for threat avoidance. The aircraft's collision avoidance methods can be divided into the following categories: (1) the method of resolving the guidance law based on geometrical relationships [6][7][8]. This method calculates the guidance law of avoidance manoeuvre according to the relative distance, speed, acceleration, angle, etc., between the aircraft and threats, but it is difficult to apply in complex threat airspace. (2) The second method is based on real-time path planning [9][10][11][12]. With the development of research in this field, many path-planning approaches with significant improvement in timeliness have been proposed, such as artificial potential-field approaches, A Ã algorithm, artificial heuristic approaches and samplingbased path-planning approaches, etc., which can be applied to common path-planning problems in dynamic environments. However, the timeliness of such methods is still questionable due to the lack of computation efficiency and applicability in complex or high-dimensional dynamic environments. (3) The third approach is based on decision mechanism [13][14][15]. The decision-making approaches based on recognition-primed decision-making, Markov decision process and Bayesian decision theory are capable of implementing decision-making deduction considering real-time information. Thus, they have obvious advantages in terms of making threat avoidance strategy. However, the current researches mainly focus on how to make a better decision under certain rules [16], so it is flawed when dealing with complex situations beyond the known rules. Aiming at the current challenges in S&A techniques, it is necessary to introduce a learning mechanism to solve the above-mentioned problems [17].
Recently, the concept of AMD has been extended profoundly to robotic [18,19], particularly it is applied in visual development [20,21], language and behaviour learning techniques of robots [22,23]. The main idea is to simulate the mental development process of an infant, and the development of brain is realized by constant incremental learning through profound interaction with the environment, and storage of the learned knowledge through certain methods [5]. Unlike those approaches where the UAVs are programmed considering the mission, this idea proposed an approach for self-organization and incremental learning in a non-specific mission.
In this paper, a threat avoidance pattern that consists of a sensory layer, a logic layer and a development layer is proposed. The logic layer accomplishes the initial global configuration space planning by path-planning method and implements real-time local replanning in the sensory region when the threat is approaching, where the threat information is detected by the sensor layer. Subsequently, the avoidance manoeuvre under normal threat conditions is adopted as the fundamental experience, and the mapping of sense-action is formed through online learning and training, which is based on the developmental networks (DNs) [24]. Therefore, the number of neurons constantly grows by learning, training and implementing the process. Moreover, the "experience" is also constantly accumulated while dealing with different types of threats. Finally, the collision avoidance development process of UAVs is accomplished.

Mapping model
Sense information is obtained by the sensory layer; configuration information of the UAV is the output of the logic layer. As shown in Figure 1, the local sensing and path-planning outputs are the inputs in the DN, the network map the inputs and manoeuvre behaviour bank, UAV behaviour is output in manoeuvre bank. In the network, the input (sense)-output (behaviour) relationship sample formed by the sensory layer and the logic layer, mapping would be implemented by adjusting the weights of neurons in the DN. In the beginning, the behaviour bank has nothing, which constantly stores the mapping of sense-behaviour as experience in the process of DN training.
The sensor in the sensory layer is an onboard lidar with a detection range of dðtÞ, where t is the time (real-time data). Assume that the sensor is able to acquire all information regarding the environment, and dðtÞ ¼ ðd 1 ðtÞ; d 2 ðtÞ; . . . ; d n ðtÞÞ, where d i ðtÞ represents the shortest distance between the UAV and the threat in a certain direction, vðtÞ is the current velocity of the UAV, uðtÞ represents the output heading angle of the path planning. The simplified configuration of the UAV can be set as where ðx t ; y t Þ is the current position of the UAV. The heading angle uðtÞ is calculated as follows: where v xt , v yt are the velocity projection along x-axis and y-axis, respectively. Assuming that the value of velocity is constant, the next configuration is given as follows: The coordinates of the next position are The state of current environment is denoted as The mapping of sense-behaviour can be represented as: f :x 7 ! Z, and the next output can be calculated as follows: zðt þ 1Þ ¼ f ðxðtÞÞ

The logic layer
The logic layer outputs the control command through path-planning method. In this paper, the collision avoidance is considered as a decision-making process of a local area. Making use of the dual-planning idea that contains local and external planning within the sensory range, the avoidance problem of unknown type of threats is solved. Set G ¼ðV; CÞ as a grid map with weights, where V denotes the set of grid points, and C represents the cost space (the Euclidean distance is set as the heuristic function). Every grid point g 2 G (except the boundary grid) is directly linked to eight neighbouring grids, denoted as NeighbourðgÞ. The set of paths from the start node s to the target node d can be represented by And the cost of path is kðxÞ is defined as the path with minimum cost from a particular node x to a target node d, and is given as For a certain node x, the sensory range is set as LðxÞ, where LðxÞ 2 G. BðxÞ represents the margin region of LðxÞ, but it does not belong to LðxÞ.
x cur as the current position of the UAV. For any x 2 Lðx cur Þ, the initial cost is set as k 0 ðxÞ. When a threat obstacle is sensed in the original path, the modified cost function is given as follows: where cðx; yÞ represents the cost from x to a certain node y in the sense margin. Choosing the node y in margin with minimum cost, thus The path with minimum cost from x cur to a node y in margin is the local planned path. After accomplishing the local path planning, the configuration command is the output and is applied to satisfy the requirement of emergency collision avoidance. Then, the path from a node in the margin to the target node is calculated, which finally realizes the dual-planning. The computation cost of local paths in the sensory region is much less than those of the global paths. The delay caused by computation is minimum, and hence, it is more suitable for real-time threat avoidance.

Collision avoidance method
Based on the definition of sensory layer and logic layer mentioned above, the DN is used to connect both the layers in this section. The sample of knowledge is formed by mapping the sensory behaviour from the sensory layer to the logic layer. Then, it is trained using DN by adjusting the weights between neurons to form the mapping relationship and extend the bank to realize the development.

Algorithm of the developmental layer
As shown in Figure 2, the DN is composed of a sensory layer X, an internal brain area Y and a motor area Z.
The sensory input and effector output can be connected through the hidden networks in the brain. The networks among the three layers are connected bidirectionally. The layers X and Y are connected by bottom-up synapses, while the layers Y and Z are connected by top-down ones. In the incremental learning phase, the effector layer serves as an input layer, conducting the learning process through interactions with the outside world. Otherwise, it is treated as an output layer.
The dimensions of X, Y, Z are determined according to the actual input and output of DN. In this paper, the inputs are parameters of threat, i.e. vðtÞ, uðtÞ, dðtÞ, the outputs determine which behaviour would be selected in the bank. The complete process is shown in Figure 1. The procedure can be explained as follows: Initial state is at t ¼ 0. For any area A 2 X; Y; Z f g, initialize its renewal part N ¼ ðV; GÞ and the responsive vector r. V includes the weights of all synapses and G includes the ages of all neurons.
At t ¼ 1; 2; . . ., the task of collision avoidance is initiated in the simulation environment. The inputs are provided for network training; iterative computation is carried out in area A, and the corresponding function is as follows: where b represents the bottom-up vector, t denotes the top-down vector and r 0 is the response vector. The update method is N N 0 and r r 0 . The bottom-up and top-down vectors are in the Y-layer. The process mentioned above shows the weight update of the network nodes in the UAV collision avoidance process.
DN constantly performs the prediction for the next time Z. If the prediction value turns out to be a wrong match, it would be directly replaced by the measured value at the next moment in the learning process. That is to say, the wrong outputs from Z lead to the wrong behaviour of the UAV, and eventually, it will be corrected in the learning process. Hence, the right outputs are generated by path-planning modular.
For the weight vector v ¼ ðv b ; v t Þ in area A, its response vector is b=kbkÞ is the response vector after normalization, _ p ¼ ðb=kbk; t=ktkÞ is the output vector after normalization. The deviations of directions of _ v and _ p reflect the degree of matching. The top-k competition mechanism is adopted among the neurons within area A. When k = 1, the winner neuron j is defined as Only the winner neurons can be activated and update their weights of synapses, while the other neurons would stay in their inhibitory states. The Hebbian learning algorithm is adopted in the learning process of neurons. For each activated neuron j, the synapse vector is updated by the following mechanism with the increment of an activating age: where v 2 ðn j Þ is the learning rate of neuron j at activating age n j , v 1 ðn j Þ is the maintaining rate and v 1 ðn j Þ þ v 2 ðn j Þ ¼ 1. It can be described briefly as where t i is the activation time of neuron j. After the learning phase, mapping is formed between the neurons in Y and Z layers based on the inputs from the sensory area. Therefore, the whole training of collision avoidance is completed.

Threat avoidance method
DN is capable of performing incremental learning and internal network mapping through interactions with the outside world. In the development phase, import the threat states from the sensory layer into the logic layer, and then execute the mapping of sensory input and output behaviours. Thus, neurons were activated to form an "experience" of collision avoidance. In the execution phase, the UAV behaviour is exported directly based on the mapping between sensing information and behaviour bank. Such a recurrent process of development and implementation can realize the growth of the behaviour bank. The flow-chart of the proposed algorithm is shown in Figure 3.
In the algorithm, first, N samples are provided as input for training. Then, ask for z j , if it is not matching, the learning rate v 2 ðn j Þ is decreased. Decrease v j and return for the next matching. If it is matching, the learning rate v 2 ðn j Þ is increased, and v j and z 0 j are updated. In the execution phase, based on the inputs from the sensory layer, search for the matching of z j . If so, the UAV behaviour is output in the bank; otherwise, execute path planning to generate a new path to resolve the collision avoidance, and put the new result into the sample.

Simulation and analysis
To test the performance of the proposed method, different simple situations are designed to train the DN. The simulation is conducted in Matlab 7.0 using Windows XP, Intel Core i3, 3.3 GHz platform.

Training of the developmental networks
First, eight different simple situations are applied for training the basic avoidance behaviour of UAVs. The considered environment size is 10 km £ 10 km, with the beginning and ending depots as (0.5,4) and (9,4), respectively. The threat model is where r d values are 1 km and 0.6 km, ðx t ; y t Þ is the location of the threat. A different number of network neurons are set in the DN to determine the relationship between the activated age and the number of neurons.
As shown in Figure 4, the paths are exported by path planning according to different situations; then, online learning is implemented. The primary behaviour bank and samples are determined through training, which forms the basic mapping of sensebehaviour.
In the development process, the number of neurons is closely related to the activation ages. As shown in Figure 5, under the top-k mechanism, when the number of neurons is relatively small, the activation time of neurons in the same area increases significantly, and the updating rate of synapsis is relatively high, which means that the mapping relationships of neurons within this area is relatively large. When the number of neurons is large, the probability that the neurons are to be activated in the current scenario increases, while the neurons in other areas are inhibitory at the same time. This shows that neurons in different areas correspond to different scenarios, and it is unnecessary to repeatedly activate a certain group of neurons.

Validation of the algorithm
The scenarios with known static threats, unknown pop-up static threats and dynamic threats are set to validate the proposed algorithm. The threat model is considered as a hemisphere, where r d is 2.5 km, and ðx t ; y t ; z t Þ is the location of the threat.
The area size of environment is 20 km £ 20 km £ 4 km. The parameter settings of scenario-1 are shown in Table 1.
As shown in Figure 6, the hemispheres are threats, and the red lines are the flight paths of the UAV. At t = 25 s, the UAV was avoiding the first static threat, and the dynamic threat was moving at the same time. At t = 50 s, a threat suddenly turned up in the sensory region of the UAV. Thus, the UAV tried to match the state of the threat with its behaviour bank to avoid collision. At t = 75 s, a dynamic threat ran into the sensory region; the UAV started the third avoidance action. The heading angle of the UAV in scenario-1 is shown in Figure 7. This paper reserved the track of the dynamic threat to observe the threat situations and the UAV's behaviour of avoidance. The result shows that the proposed algorithm has good adaptability to such kinds of environment.
In scenario-2, nine static threats and two dynamic threats are set to test the real-time avoidance ability of the proposed approach under multi-threat conditions. The threat model is the same as the one in Section 4.1, and the r d is set randomly. The size of the environment is 20 km £ 20 km. Parameter settings are shown in Table 2.
As shown in Figure 8, the UAV encountered with threat1 and threat2 during the flight. The UAV and    process of avoiding threat1. It can be seen from the zaxis that they have not intersected in time. Figure 9(b) shows the similar process for threat2. The heading angle of the UAV in scenario-2 is shown in Figure 10.   Results of different scenarios validate the feasibility of the proposed algorithm.

Performance comparison
To show the performance differences, the proposed algorithm was compared with dynamic Particle Swarm Optimization (PSO) [25], dynamic A* algorithm [26] and dynamic Artificial Potential Field (APF) [27] for the same cases. The environment's size is considered as 50 km £ 50 km for this case. A total of 13 static threats and 7 pop-up threats are set. The threat model is the same as the one in Section 4.1, and the r d value is set randomly. The starting point is (0,0) and the end point is (50,50). The results are shown in the following.
As shown in Figure 11, the static threats and the pop-up threats appeared at different time. The paths generated by the four algorithms are entirely different. The three-dimensional oblique view of Figure 11 is shown in Figure 12(a), whereas Figure 12(b) indicates the corresponding left view. To illustrate the paths clearly, some of the threat images which interrupt the view have been removed. It can be seen that in the initial phase, the computational differences between four paths are minor. After encountering the first popup threat, the difference began to increase, and the directions of different paths turn out to deviate. With the change in threats, the difference of computational time and path's direction becomes more and more apparent. Figure 13 shows the heading angle of the UAV in different algorithms. The simulation results are summarized in Table 3.
As shown in Table 3, the shortest path was generated by the dynamic A Ã algorithm, while the longest path was generated by dynamic APF. The path generated by the DN is a bit longer than that of the dynamic A Ã method. The shortest computation time is for the DN algorithm; dynamic APF's computation time is longer than DN's algorithm, and that for dynamic PSO is the longest one. The proposed algorithm is fine enough in terms of calculation time and the cost of  path's length, and it can adapt to a workspace that exists pop-up threats by making avoidance behaviour in time.

Conclusions
In this paper, the concept of AMD is introduced. Also, the UAV collision avoidance method based on DN is proposed by combining the collision avoidance strategy of the UAV with the DN. Unlike the conventional sense-avoidance methods, the proposed method enables the UAV to obtain fundamental experience of collision avoidance by learning and training, as well as the online expansion of knowledge. For different types of threats, neurons in the DN realize growth and development by adjustment and updation of the synapses' weights. The simulation results show that the proposed method based on DN has better timeliness in threat avoidance. The disadvantage of the proposed method is that it is unable to confirm the area directly where neurons need to be activated, which will affect the computation time to a considerable extent. The future work will focus on the improvement of the sensory layer in the DN by classifying different types of threats when sensing information from the outside world, which can be applied to determine certain areas of activated neurons directly.