Angle-Aware Coverage with Camera Rotational Motion Control

This paper presents a novel control strategy for drone networks to improve the quality of 3D structures reconstructed from aerial images captured by drones. Unlike the existing coverage control strategies for this purpose, our proposed approach simultaneously controls both the camera orientation and drone translational motion, enabling more comprehensive perspectives and enhancing the map's overall quality. Subsequently, we present a novel problem formulation, including a new performance function to evaluate the drone positions and camera orientations. We then design a QP-based controller with a control barrier-like function for a constraint on the decay rate of the objective function. The present problem formulation poses a new challenge, requiring significantly greater computational effort than the case involving only translational motion control. We approach this issue technologically, namely by introducing JAX, utilizing just-in-time (JIT) compilation and Graphics Processing Unit (GPU) acceleration. We finally conduct extensive verifications through simulation in ROS (Robot Operating System) and show the real-time feasibility of the controller and its superiority over the conventional method.


Introduction
The advancement of 3D map reconstruction technology has played a pivotal role in supporting various sectors, including building information modeling [1,2], precision agriculture [3,4], and construction site inspection [5]. The success of the Structure from Motion (SfM) algorithm [6] has been instrumental in efficiently reconstructing a 3D model of a target object by analyzing a collection of images captured by a camera. Notably, the past decades have witnessed the introduction of numerous sensors and platforms for capturing high-quality images, with unmanned aerial vehicles (UAVs) emerging as particularly prevalent platforms capable of autonomous image acquisition [7]. Therefore, ensuring the efficient coverage of viewpoints by UAVs becomes a crucial requirement for achieving high-quality map reconstruction using the SfM algorithm.
To ensure efficient coverage of viewpoints, at least two crucial factors must be considered. Firstly, achieving comprehensive observation of every part of the scene from multiple angles is vital to enhance the reliability and accuracy of 3D reconstruction [7].
This necessitates sufficient overlap between viewpoints, compelling UAVs to explore a 6D configuration encompassing both 3D position and 3D orientation. However, in practice, the roll angle is often neglected, simplifying the search space to a 5D configuration for camera poses [8,9]. Secondly, addressing the one-time visit problem is essential [10]. This involves systematically capturing images across the target field, ensuring that each 5D point within the target field is observed and visited.
Addressing the aforementioned requirement, classical coverage control algorithms have traditionally been widely adopted to tackle the scenario outlined above [11]. Moreover, various studies have also explored coverage control strategies in environments with diverse shapes [12][13][14][15][16][17]. However, a notable limitation in the existing literature is that they do not consider coverage in a 5D search space, a crucial aspect in the context of UAV-based 3D map reconstruction. Furthermore, many of these studies tend to guide robots into static configurations, falling short of covering each point within the target field.
Alternatively, there has been a line of work on persistent coverage control algorithms, enabling UAVs to monitor the environment persistently [18][19][20][21]. However, once more, these studies predominantly focus on 2D coverage, overlooking the necessity to observe points from multiple angles. Furthermore, persistent coverage strategies may not address the specific requirement of a one-time visit, as specified earlier.
Recent work in angle-aware coverage control [10] marks the initial attempt at achieving 5D coverage with multiple UAVs. This approach also provides a solution to the one-time visit problem. By integrating the concept of capturing images from various angles within the coverage control framework, significant improvements in 3D map reconstruction quality have been demonstrated, as shown in [22]. However, it is essential to note that even in [10], the drones' camera orientations remained fixed, suggesting room for further improvement, which could be achieved through dynamic camera control of the UAV using gimbal mechanisms. In earlier explorations into camera angle control [23,24], the focus was primarily on adjusting the camera's angle while keeping the camera position static; therefore, those approaches could not solve the one-time visit problem.
In this paper, our objective is to extend the existing angle-aware coverage control framework [10] by integrating camera rotational motion control via gimbal mechanisms. This enhancement allows for simultaneous control of both camera orientation and drone motion within the coverage control framework. To achieve this, we present a new formulation of the problem that includes a new performance function to assess the camera orientations and drone positions. We then design a QP-based controller [25] with a constraint using a control barrier-like function on the decay rate of the objective function. This new problem formulation is significantly more computationally demanding than the case of coverage with translational motion control only. We address this challenge by implementing JAX [26], employing JIT compilation and GPU acceleration. Finally, using simulation in ROS, we conduct thorough verifications demonstrating the controller's real-time viability and superiority over the traditional approach.

Drones, Virtual Field, and Geometry
We consider a scenario involving n drones, each represented by the index set I := {1, . . ., n}, operating in three-dimensional Euclidean space. All the drones are regarded as rigid bodies, and we define two frames of reference: the world frame Σ_w and the body frame Σ_i, which is fixed on the camera of the i-th drone as illustrated in Fig. 2. The x, y, and z coordinates of the origin of Σ_i relative to Σ_w are denoted by x_i, y_i, and z_i, respectively. The orientation of Σ_i relative to Σ_w is denoted by R_i ∈ SO(3). Each drone i is equipped with an onboard camera that can be adjusted both horizontally, denoted by φ_i, and vertically, denoted by ϕ_i, using a gimbal system. The horizontal angle φ_i ranges from 0 to 2π, while the vertical angle ϕ_i ranges from 0 to π/2, effectively forming a hemisphere of observable angles for the camera.
During operation, all drones are assumed to maintain a constant altitude z_i = z_c. Consequently, we exclude the altitude component from the drone's position description, defining the state vector of drone i as p_i := [x_i y_i φ_i ϕ_i]^T ∈ R^4 with [x_i y_i]^T ∈ P, where P ⊂ R^2 represents a compact subset of a plane. Furthermore, each drone within the set I follows the single-integrator dynamics ṗ_i = u_i := [u_{x,i} u_{y,i} u_{φ,i} u_{ϕ,i}]^T. Here, u_{x,i} and u_{y,i} represent the linear velocity input for drone i, while u_{φ,i} and u_{ϕ,i} correspond to the angular velocity input for adjusting the gimbal angles of drone i.
Our main objective is to reconstruct the 3D structure of a given target field. In this paper, we model the set of position coordinates in Σ_w of all points in the target field to be observed as B ⊂ R^3, encompassing the highest and lowest points and any objects within. Now, it is widely known that every point in B should be observed from rich viewing angles in order to obtain a high-quality 3D structure. To reflect this objective, we define the horizontal angle θ_h ∈ [−π, π) and the vertical angle θ_v ∈ (0, π/2] to represent the angles from which we observe specific target points. The points to be observed are then characterized by five variables consisting of not only the position coordinates [x y z]^T ∈ B but also the viewing angles θ_h and θ_v (see Fig. 3). Accordingly, we consider a coverage control problem over the 5D virtual space Q_c ⊂ R^5, which encompasses all observation variables (x, y, z, θ_h, θ_v). Hence, the primary goal is to control each drone state p_i so as to monitor every point q ∈ Q_c. In the sequel, we discretize the 5D virtual space Q_c into m cells, and a representative point of the j-th cell is denoted by q_j = [x_j y_j z_j θ_{h,j} θ_{v,j}] ∈ Q_c. The collection of q_j, j = 1, 2, . . ., m is denoted by Q.
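As an illustration, the discretization of the 5D virtual space into the point set Q can be sketched as follows with JAX. The helper `build_virtual_grid`, the cell counts, and the coordinate ranges are hypothetical; only the variable ordering (x, y, z, θ_h, θ_v) and the angle ranges [−π, π) and (0, π/2] follow the text.

```python
import jax.numpy as jnp

def build_virtual_grid(x_rng, y_rng, z_rng, n_xyz, n_ang):
    """Discretize the 5D virtual space Q_c = (x, y, z, theta_h, theta_v)
    into a flat array of m representative points q_j (hypothetical helper)."""
    xs = jnp.linspace(x_rng[0], x_rng[1], n_xyz)
    ys = jnp.linspace(y_rng[0], y_rng[1], n_xyz)
    zs = jnp.linspace(z_rng[0], z_rng[1], n_xyz)
    # theta_h in [-pi, pi): exclude the endpoint since -pi and pi coincide
    th = jnp.linspace(-jnp.pi, jnp.pi, n_ang, endpoint=False)
    # theta_v in (0, pi/2]: start strictly above 0
    tv = jnp.linspace(jnp.pi / (2 * n_ang), jnp.pi / 2, n_ang)
    grids = jnp.meshgrid(xs, ys, zs, th, tv, indexing="ij")
    Q = jnp.stack([g.ravel() for g in grids], axis=1)  # shape (m, 5)
    return Q
```

In the paper's setting m reaches 1.5×10^7 cells, which is exactly the scale that motivates the JIT/GPU acceleration discussed later.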
Let us now introduce the geometry associated with the present control problem. Consider drone i with the state p_i := [x_i y_i φ_i ϕ_i]^T. The optical axis of the camera is then described in Σ_i. On the other hand, the position vector contained in q_j (its first three elements) can also be expressed in Σ_i, and the unit vector specified by the viewing angles θ_{h,j} and θ_{v,j} can be expressed in Σ_w.
Remark 1. In the context of drone camera capabilities, it is essential to address certain limitations that may affect their performance. While some drone cameras offer pitch, yaw, and roll rotation capability, the structural limitations of many gimbals restrict the range of yaw rotation angles, often preventing a complete 360-degree rotation. To address this, controlling the camera's yaw angle typically involves coordination with the drone's body rotation. However, in this paper, we do not focus on the drone's rotational behavior, and for simplicity, we consider it as part of the gimbal's angle control. Moreover, it is important to note that camera sensors generally have rectangular designs, leading to a rectangular beam-shaped observable area for the drone. Nevertheless, for the sake of simplicity and without loss of generality, we assume a circular shape for the drone's observable area.

Performance function and importance index
In this subsection, we present a performance function h(p_i, q_j) taking values in [0, 1]. This function characterizes a drone's coverage capability at state p_i with respect to a specific point q_j ∈ Q. A higher value of the performance function corresponds to a more effective coverage capability. The effective coverage of point q_j by a drone at state p_i depends on two key factors: (1) the geometric position of point q_j must fall within the field of view of the drone's camera; (2) the relative geometric relationship between the drone's position and the observed point q_j must match the viewing angles (θ_h, θ_v) of q_j.
The first condition is encoded by a function h_1, where fov is the half-angle of the camera's field of view. The second condition is encoded in a similar way by a function h_2. In the subsequent controller design, we employ the performance function h combining h_1 and h_2 with appropriate tuning of the parameters σ_1 and σ_2. Note that the parameter σ_1 should be tuned so that the performance function is sufficiently close to 0 for any point q outside of the field of view. On the other hand, we have not found a systematic way to tune σ_2 and instead tune this parameter empirically in the subsequent simulation; we leave this issue to future work.
Let us next introduce the importance index ψ_j ∈ [0, ∞) assigned to each point q_j ∈ Q. From the definition of the performance function h, a large value of h(p_i, q_j) for some i means that the drone already samples a good image of q_j, and we can then reduce the importance of the point q_j. Based on this observation, similarly to [10], we propose the update law

ψ̇_j = −δ max_{i∈I} h(p_i, q_j) ψ_j, (7)

with a decay gain δ > 0. In terms of the controller design, it is preferable that the performance function meets these two properties:
• Restricting the performance function within the range from 0 to 1 in order to govern the decay rate of ψ_j in (7).
• Ensuring a continuous non-zero gradient for the performance function in order to ensure the existence of an input u_i that increases the function b_i.
An example that meets the above properties is the Gaussian function, which is why we utilize it as the performance function.
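A minimal sketch of such a Gaussian performance function is given below. Since the exact expressions of h_1 and h_2 appear in the omitted equations, the particular composition here, the helper name `performance`, the assumed common altitude z_c, and the default parameter values are illustrative assumptions; only the roles of fov, σ_1, and σ_2 follow the text.

```python
import jax
import jax.numpy as jnp

def performance(p, q, fov=jnp.pi / 6, sigma1=0.1, sigma2=0.3):
    """Hypothetical Gaussian performance function h(p_i, q_j) in [0, 1].

    p = [x, y, phi, theta]: drone state (constant altitude z_c assumed)
    q = [x, y, z, theta_h, theta_v]: 5D virtual point
    h1 penalizes angular deviation of q from the optical axis (field of view);
    h2 penalizes mismatch with the desired viewing angle (theta_h, theta_v).
    """
    z_c = 10.0  # assumed common flight altitude
    cam = jnp.array([p[0], p[1], z_c])
    # unit optical axis from gimbal angles (phi: horizontal, theta: vertical;
    # theta = pi/2 points the camera straight down)
    axis = jnp.array([jnp.cos(p[2]) * jnp.cos(p[3]),
                      jnp.sin(p[2]) * jnp.cos(p[3]),
                      -jnp.sin(p[3])])
    ray = q[:3] - cam
    ray = ray / jnp.linalg.norm(ray)
    ang_fov = jnp.arccos(jnp.clip(axis @ ray, -1.0, 1.0))
    # sigma1 chosen so h1 is nearly 0 once ang_fov exceeds fov
    h1 = jnp.exp(-((ang_fov / fov) ** 2) / (2 * sigma1 ** 2))
    # unit vector from the point toward the desired camera position
    view = jnp.array([jnp.cos(q[3]) * jnp.cos(q[4]),
                      jnp.sin(q[3]) * jnp.cos(q[4]),
                      jnp.sin(q[4])])
    ang_view = jnp.arccos(jnp.clip(-ray @ view, -1.0, 1.0))
    h2 = jnp.exp(-(ang_view ** 2) / (2 * sigma2 ** 2))
    return h1 * h2
```

Evaluating h for all m cells at once then becomes a single `jax.vmap(performance, in_axes=(None, 0))` call, which composes directly with the JIT compilation and automatic differentiation discussed later.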

Objective function
We are now prepared to present the objective function to be minimized,

J := Σ_{j=1}^{m} ψ_j, (8)

namely the integral (over the discretized cells) of the density ψ_j across the entire region.
When the density function approaches values close to 0 for individual points, it signifies the effective capture of images for those points. As J approaches 0, it indicates that the region's coverage is nearing completion, which is crucial for capturing images suitable for reconstructing the 3D structure. Consequently, the primary objective is to steer the drones in a manner that drives J towards convergence to zero. However, due to the monotonically decreasing property of ψ_j in (7), it is trivial to achieve J → 0 itself. We thus impose another specification, J̇ ≤ −γ for a positive constant γ, to specify the efficiency of the task completion; this covers the primary objective J → 0. In summary, the problem to be addressed in this paper is to determine the velocity input u_i so that J̇ ≤ −γ is satisfied for a given parameter γ > 0.

QP-based Controller Design
In this section, we propose a controller based on quadratic programming, which enforces the constraint J̇ ≤ −γ using the concept of control barrier functions. To this end, we begin by computing the time derivative of J, in which the term I_i := Σ_{j∈V_i(p)} δ h(p_i, q_j) ψ_j (10) corresponds to the contribution by drone i to reducing J in (8). The set V_i(p), which depends on p := (p_i)_{i∈I}, is a Voronoi-like partition of the set M := {1, 2, . . ., m}, collecting the indices j for which drone i attains the largest performance h(p_i, q_j) among all drones. By defining b_{i,I} := I_i − γ/n as a candidate control barrier function, with a given γ > 0, we aim to compel each drone to minimize the objective function J at the specified decay rate γ. This criterion is met when the control input u_i is chosen such that the inequality ḃ_{i,I} ≥ −α_1(b_{i,I}) holds, where α_1 is a locally Lipschitz extended class K function. A continuous function α : (−b, a) → (−∞, ∞) is said to belong to extended class K for some a, b > 0 if it is strictly increasing and α(0) = 0 [27].
Furthermore, to adhere to the gimbal input limitations, we introduce a secondary control barrier function constraint, restricting the vertical angle ϕ_i of the drone's gimbal to the range from 0 to π/2. While vertical angles between π/2 and π are functionally equivalent to those between 0 and π/2, the physical structure of the gimbal imposes constraints on this range. Failing to enforce these constraints during optimization may result in commands that surpass the π/2 threshold, which not only prevents the gimbal from executing the command but also results in a reduction in horizontal rotation. By maintaining the vertical angle within the specified limits, we ensure optimal performance and prevent unintended consequences associated with exceeding the gimbal's mechanical boundaries.
Therefore, to ensure that the vertical angle ϕ_i of the gimbal stays between ϕ_min and ϕ_max, we define a control barrier function b_{i,ϕ} whose nonnegativity guarantees the appropriate vertical angle constraints throughout the optimization process. This criterion is met when the control input u_i is chosen such that the inequality ḃ_{i,ϕ} ≥ −α_2(b_{i,ϕ}) holds, where α_2 is a locally Lipschitz extended class K function. Now, we are ready to present the QP-based controller: each drone i minimizes ‖u_i‖^2 + ϵ w_i^2 subject to ḃ_{i,I} ≥ −α_1(b_{i,I}) − w_i and ḃ_{i,ϕ} ≥ −α_2(b_{i,ϕ}), (15) where ϵ is the penalty weight and w_i is the slack variable softening the decay-rate constraint.
Theorem 3.1. Suppose that no q_j (j ∈ M) is located on the boundary of V_i(p). When α_1 : R → R and α_2 : R → R are set as linear functions such that α_1(b_{i,I}) = a_1 b_{i,I} and α_2(b_{i,ϕ}) = a_2 b_{i,ϕ}, where a_1 > 0 and a_2 > 0 are positive scalars, the problem is equivalently reformulated as the explicit quadratic program (16). Please refer to Appendix A for the detailed proof of the controller (16).
Remark 2. The optimization problem (15) is always feasible because the constraint is softened by the slack variable w_i. However, the existence of a solution to (15) does not mean that the original specification I_i ≥ γ/n is met. For too large a γ, the specification might not be satisfied during the transient in the presence of the velocity limit of the drone. It is desirable that γ be chosen so that I_i ≥ γ/n is violated only when the coverage is almost completed, namely when J ≈ 0. However, a systematic way to determine such a γ for a given velocity limit is left for future work.

Computation acceleration
In contrast to conventional two-dimensional coverage control, dealing with coverage in a five-dimensional space significantly increases the computational load. Specifically, the number of cells m drastically increases in the five-dimensional case, which affects various processes including calculation of the performance function, updates of the importance index, Voronoi-like region partitioning, and, more significantly, gradient computation of the performance function. In particular, the gradient computation with respect to the four state variables x_i, y_i, φ_i, and ϕ_i takes up most of the computational time.
The work [10] presented a computationally efficient implementation of the controller with slight approximations, where the position of the drone monitoring a point from a specified view angle was uniquely determined, enabling the mapping of the five-dimensional target field to a two-dimensional space. In this paper, on the other hand, we assume that the camera orientation can be controlled. In this case, the camera state monitoring a point from a specified view angle is not uniquely determined, and the mapping from the five-dimensional field to a lower-dimensional space cannot be defined. Therefore, the same approach cannot be applied to the present problem. We thus approach this problem technologically, namely by introducing the JAX library, which accelerates the processing through just-in-time (JIT) compilation and GPU acceleration.

JIT compilation
As in [10], our program is primarily written in Python, and we use the NumPy library for matrix calculations. Since Python is an interpreted language, the code is not compiled before execution. This leads to a significant performance loss when running large-scale computations in Python compared to statically compiled code. The complex calculations involved in computing the gradient of the performance function exacerbate this loss, making real-time calculation challenging. We thus replace NumPy with JAX's NumPy-compatible API, allowing us to JIT-compile functions consisting of pure matrix computations.
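A minimal sketch of this replacement is shown below: a NumPy-style kernel, representative of (but much simpler than) the per-step workload, is decorated with `jax.jit`. The function name `coverage_step` and the exponential decay form are illustrative assumptions. Note that the first call includes tracing and compilation, and that timing must use `block_until_ready()` because JAX dispatches work asynchronously.

```python
import time
import jax
import jax.numpy as jnp

@jax.jit
def coverage_step(psi, h_all, delta=0.5, dt=0.1):
    """Decay the importance indices of all m cells in one step
    (assumed discretization of psi_dot = -delta * max_i h * psi)."""
    h_max = h_all.max(axis=0)  # best coverage over the n drones
    return psi * jnp.exp(-delta * h_max * dt)

m, n = 1_000_000, 3
psi = jnp.ones(m)
h_all = 0.5 * jnp.ones((n, m))

psi = coverage_step(psi, h_all)  # first call: traced and compiled
t0 = time.perf_counter()
psi = coverage_step(psi, h_all).block_until_ready()  # compiled call
print(f"one step: {time.perf_counter() - t0:.4f} s")
```

The same decorator applies unchanged to the gradient computation, since `jax.grad` of a jitted-compatible function is itself jittable.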
We use JIT in the two types of ROS nodes in our program: the central controller and the drone controllers. The importance index is updated in the central controller, while the computation of the performance function, the Voronoi-like partitioning, and the gradient computation are done in the drone controller, as Fig. 4 shows. The first two rows of Table 1 compare the computation speed with and without JIT. For the drone controller, which is more computationally intensive than the central controller, computation times over 1 s are unacceptable for real-time control. Meanwhile, after accelerating the functions with JIT, the computational speed increases significantly even on the same computer, reaching a computational frequency of around 10 Hz.

GPU acceleration
Since CPU performance is difficult to improve further and CPU computation time varies greatly with the size of the matrices involved, we utilize JAX's GPU acceleration feature to perform the matrix calculations on the GPU.
We conducted a timing comparison on a laptop running Ubuntu 20.04.3 LTS, equipped with a 20-core Intel i7-12800H CPU and an NVIDIA RTX A3000 GPU. Fig. 5 shows how the computation time consumed at each step changes over time when using the CPU and the GPU. We note that the first step takes a comparatively very long time, which is caused by the JIT compilation during the initial execution of the functions. When using the GPU, compilation took longer, but once compilation was complete, computation on the GPU took roughly 1/5 the time of the CPU and was relatively stable, whereas the time spent on the CPU fluctuated considerably.
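Device selection in JAX can be sketched as follows. The array sizes are arbitrary, and the CPU fallback is an assumption for machines without a visible GPU; the point is that moving the large, persistent arrays to the device once avoids paying the CPU-GPU transfer overhead at every control step.

```python
import jax
import jax.numpy as jnp

# Use a GPU if JAX can see one; otherwise fall back to the CPU.
try:
    device = jax.devices("gpu")[0]
except RuntimeError:
    device = jax.devices("cpu")[0]

# Place the persistent state on the chosen device once.
psi = jax.device_put(jnp.ones(1_000_000), device)      # importance indices
H = jax.device_put(jnp.ones((3, 1_000_000)), device)   # performance values

# Subsequent computations on these arrays now run on `device`.
J = (psi * H.max(axis=0)).sum().block_until_ready()
print(J)
```

After this one-time placement, the jitted per-step functions operate entirely on device memory, which is consistent with the stable GPU step times observed in Fig. 5.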
During CPU-based computations, the CPU usage is around 60%, while during GPU-based computations, the GPU usage is only around 24%. As indicated in Table 1, employing GPU acceleration for matrix computations results in significant time savings, particularly for large matrices, despite the overhead of data transfer between the CPU and GPU memory. We expect the gap between CPU and GPU computation times to widen further as the matrix size increases.

To simulate a drone equipped with a camera with a focal length of x[mm], we set the field of view of the drone to π/6. The initial angles of the gimbal for the three drones are all set to φ_i = 0 and ϕ_i = π/2, indicating that the cameras are initially pointed vertically downwards. The horizontal angles φ_i of the drone gimbals are limited to [0, 2π), while the vertical angles ϕ_i are limited to (0, π/2]. The field is divided into m = 1.5×10^7 cells of size 0.02 m × 0.02 m × 0.1 m × π/30 rad × π/30 rad. Other parameters are presented in Table 2. The quadratic programs are solved by CVXOPT [28].

Controller Evaluation and Verification
All the drones start the coverage at the same moment t = 0 s, and their positions and camera orientations are presented as a series of snapshots in Fig. 6. The importance index is a five-dimensional array that is difficult to visualize. We therefore average over the two angular dimensions θ_h and θ_v, reduce it to a three-dimensional (x, y, z) array, and visualize it as a point cloud. The color of the points represents the importance of the position: purple means not yet covered, and red means well covered. The region changes from purple gradually to red, representing good completion of the coverage. We observe that the drones tend to cover points within the task region while keeping their cameras pointed downwards at the beginning, and start moving out of the task region and turning their gimbals to cover more points at a later stage.
The evolution of the objective function J is shown in Fig. 7, indicating a linear decreasing trend for the majority of the time, which implies that the decrease of the objective function meets the required decay rate γ. After the coverage is completed to a certain extent, the decay rate γ becomes difficult to achieve and the constraint is violated, so the decline of the objective function gradually slows down.
Let us next compare the performance of the present controller with that in [10], which does not consider camera orientation control. Since these controllers employ different objective functions, the objective functions themselves cannot serve as a common metric for the comparison. To fairly compare the controllers, we need another metric that quantifies their performance. In this paper, we revert to the original use case to establish quantifiable metrics related to the actual 3D model quality. In practical applications, drones perceive the environment by taking photos or videos at a limited frequency, rather than through continuous performance functions. We mimic this behavior by updating the drone's coverage of the target region at a finite frequency. Specifically, we set a shooting rate for the drone, representing the number of photos it takes per second. During each shot, we record which points are covered by the drone. When a point has met the coverage criterion, it is marked as "covered". This process is similar to updating the objective function, but it uses Boolean values instead of continuous values. By tracking the reduction in the number of uncovered points, we can evaluate how well a controller contributes to the reconstruction of the 3D model. However, we still need to establish criteria to determine whether a point is considered covered. Similar to the performance function, we consider two aspects. A point is considered covered at a specific moment if and only if both of the following conditions are met: (1) The point lies within the field of view of the drone's camera at that moment.
(2) The angle between the line connecting the drone to that point and the observed angle of that point differs by less than a certain threshold.
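The Boolean coverage test described above can be sketched as follows. The helper name `is_covered`, the fixed altitude z_c, and the exact angle conventions are illustrative assumptions; the field-of-view half-angle π/6 and the angular threshold π/16 follow the text.

```python
import numpy as np

def is_covered(p, q, fov=np.pi / 6, thresh=np.pi / 16, z_c=10.0):
    """Boolean coverage test used for the evaluation metric (sketch):
    (1) q lies within the camera's field of view;
    (2) the drone-to-point direction matches the desired viewing angle
        (theta_h, theta_v) of q within `thresh`.
    p = [x, y, phi, theta]; q = [x, y, z, theta_h, theta_v]."""
    cam = np.array([p[0], p[1], z_c])
    axis = np.array([np.cos(p[2]) * np.cos(p[3]),
                     np.sin(p[2]) * np.cos(p[3]),
                     -np.sin(p[3])])
    ray = q[:3] - cam
    ray = ray / np.linalg.norm(ray)
    in_fov = np.arccos(np.clip(axis @ ray, -1.0, 1.0)) <= fov
    # unit vector from the point toward the desired camera position
    view = np.array([np.cos(q[3]) * np.cos(q[4]),
                     np.sin(q[3]) * np.cos(q[4]),
                     np.sin(q[4])])
    angle_ok = np.arccos(np.clip(-ray @ view, -1.0, 1.0)) <= thresh
    return bool(in_fov and angle_ok)
```

Running this test over all cells at each shot and counting the points still marked uncovered yields the comparison curve reported in Fig. 8.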
We set the threshold value to π/16 and recorded the covered points for both the present controller and that of [10] at a shooting rate of 5 Hz. The recorded numbers of uncovered points are presented in Fig. 8. Since the controller of [10] cannot rotate the camera, its coverage was completed at around t = 130 s, after which the drones stopped moving. Due to the limited range of viewing angles, the number of uncovered points remains high even after finishing the coverage. Meanwhile, when we use the present controller with camera orientation control, the number decreases to about half of that of [10]. This result demonstrates the benefit of controlling the camera orientations as well as the drone positions.

Conclusion
In this paper, we presented a novel angle-aware coverage controller with camera rotational motion control. To this end, we presented a new problem formulation including a performance function that integrates the camera orientations. The real-time viability of the present QP-based controller was demonstrated with the help of JIT compilation and GPU computing. Moreover, we verified that the present controller achieves better coverage performance than the original algorithm without camera control [10].
Future work should be directed to the hardware experiments and the performance comparison in terms of the accuracy of the reconstructed 3D structure.

Figure 3 .
Figure 3. Illustration of the angle-aware coverage problem with camera orientation control.

Figure 5 .
Figure 5. Comparisons of CPU and GPU computational time.

Figure 6 .
Figure 6. Snapshots of the simulation.

Figure 7 .
Figure 7. Time evolution of the objective function.

Figure 8 .
Figure 8. Comparison of angle-aware coverage with and without camera rotational motion control.

Table 1 .
Average computation time (ms) of one step

Table 2 .
Parameter setting