Spatio-temporal-spectral-angular observation model that integrates observations from UAV and mobile mapping vehicle for better urban mapping

ABSTRACT In a complex urban scene, observation from a single sensor unavoidably leads to voids in observations, failing to describe urban objects in a comprehensive manner. In this paper, we propose a spatio-temporal-spectral-angular observation model to integrate observations from UAV and mobile mapping vehicle platform, realizing a joint, coordinated observation operation from both air and ground. We develop a multi-source remote sensing data acquisition system to effectively acquire multi-angle data of complex urban scenes. Multi-source data fusion solves the missing data problem caused by occlusion and achieves accurate, rapid, and complete collection of holographic spatial and temporal information in complex urban scenes. We carried out an experiment on Baisha Town, Chongqing, China and obtained multi-sensor, multi-angle data from UAV and mobile mapping vehicle. We first extracted the point cloud from UAV and then integrated the UAV and mobile mapping vehicle point cloud. The integrated results combined both the characteristics of UAV and mobile mapping vehicle point cloud, confirming the practicability of the proposed joint data acquisition platform and the effectiveness of spatio-temporal-spectral-angular observation model. Compared with the observation from UAV or mobile mapping vehicle alone, the integrated system provides an effective data acquisition solution toward comprehensive urban monitoring.


Introduction
With over half the population living in urban agglomerations (Hoole, Hincks, and Rae 2019), urban systems pose unique remote sensing challenges. As a branch of remote sensing technology, urban remote sensing takes the city as the observation object (Weng and Quattrochi 2018, Yang 2011, Shao et al. 2020, Wu, Gui, and Yang 2020, Huang and Wang 2020). Urban fabrics are complex, with various kinds of occlusion from vertical objects such as buildings and trees. Satellite and aerial imagery usually cannot recover the urban scene information lost to these occlusions. As a result, information about urban surfaces is difficult to extract accurately, which limits the utility of high-resolution remote sensing imagery in urban scenes. Therefore, there is a need to carry out both horizontal and vertical spatial observations to form a coordinated observation that involves multiple platforms.
With the rapid development of computational technology and the emergence of a new generation of surveying and mapping technology (e.g., satellite navigation and positioning technology (Ning, Yao, and Zhang 2013) and remote sensing and geographic information technology (Dar, Sankar, and Dar 2010, Li, Shao, and Zhang 2020)), traditional remote sensing surveying and mapping technology has undergone a fundamental change. With the continuous progress of Unmanned Aerial Vehicles (UAV) (Colomina and Molina 2014) and Mobile Mapping Systems (MMS) (Petrie 2010), the two platforms have opened a new avenue through which spatial information can be retrieved in a timely manner. Together, they play a very important role in the field of remote sensing surveying and mapping. Therefore, we argue that it is of great importance to investigate how to integrate UAV and MMS towards a coordinated urban mapping framework.
UAV, aiming to obtain spatial information within a targeted region, is an emerging remote sensing platform capable of carrying a variety of remote sensing sensors, such as high-resolution CCD digital cameras, light optical cameras, multispectral imagers, infrared scanners, laser scanners, hyperspectral imagers, and synthetic aperture radar, to name a few. UAV remote sensing technology (Changchun et al. 2010, Yao, Qin, and Chen 2019, Zhang et al. 2020, Ma et al. 2013, Xia et al. 2018, Shao et al. 2021) has received wide attention, given its several advantages. 1) UAV operations are flexible and efficient, as UAVs do not require a large space for taking off and landing. In addition, UAVs are less affected by weather conditions compared to satellite-borne instruments. 2) UAVs are able to obtain multi-scale imagery thanks to their adjustable flight height, allowing them to perform both large-scale and small-scale monitoring. 3) The resolution of UAV images can reach 0.1 meters or even finer, considerably finer than most satellite imagery. 4) The cost of a UAV remote sensing system is much lower than that of satellite remote sensing and aerial remote sensing in terms of platform construction, routine maintenance, and aerial photography. 5) UAV systems are highly integrated. A UAV system is generally equipped with a mission planning system and a data processing system with simple and flexible flight planning. Data processing can be completed shortly after data acquisition.
The mobile mapping system (Puente et al. 2013, Li 2006, Novak 1995, Marinelli et al. 2017, Sester 2020), born in the early 1990s, is one of the cutting-edge techniques of modern surveying and mapping. A mobile mapping system combines global satellite positioning, inertial navigation, image processing, photogrammetry, laser scanning, geographic information, and integrated control technology. It has many advantages, such as high flexibility, high precision, high resolution, real-time data acquisition, and multi-source data collection. A mobile mapping system can contribute to urban mapping by obtaining information on surfaces and, more importantly, information on vertical urban objects (e.g., buildings and trees). One specific category of mobile mapping system is the mobile mapping vehicle. Given its capability of capturing objects' facades, the mobile mapping vehicle has been widely used in 3D modeling, topographic map update, GIS database construction, urban survey and planning, mine survey, public security, and urban management.
Although UAV systems and mobile mapping vehicle systems are the current cutting-edge sensing technologies, observations based on a single sensor have insurmountable shortcomings in comprehensively describing urban environments, as a single sensor, whether on a UAV or a mobile mapping system, fails to capture complete three-dimensional spatial information. For example, a UAV can provide the spatial information and texture features of objects from the top view but lacks the geometric and texture details of the objects' facades. The mobile mapping system can obtain street view data with high position accuracy and high resolution, providing rich facade information and a better three-dimensional description of the scene. However, its 3D point cloud data are usually noisy and lack texture information from the top view (Haala et al. 2008, Hana et al. 2018). We argue that data collected from multiple platforms can be complementary. In this paper, we focus on multi-source data fusion methods, taking advantage of multi-sensor acquisition to obtain high-quality geospatial data.
Multi-sensor integration and fusion technology, which emerged in the military field in the early 1980s, has rapidly expanded to both military and non-military applications (Huang et al. 2010, Li and Fu 2018). Multi-sensor integration refers to the comprehensive use of information from various sensors obtained in different time spans to assist task completion, including the collection, transmission, analysis, and synthesis of useful information provided by various sensors. The purpose of multi-sensor integration is to take advantage of the resources from multiple sensors, especially complementary spatiotemporal information, to obtain a consistent interpretation and description of the objects being measured.
In a complex urban scene, observation from a single sensor unavoidably results in voids in observations, failing to describe urban objects in a comprehensive manner. In this study, we achieved the rapid acquisition and integration of three-dimensional seamless holographic spatial and temporal data of cities. We integrated observations from UAV and mobile mapping vehicle and proposed a spatio-temporal-spectral-angular observation model (Figure 1). The proposed model achieves fast observation through the integration of multi-platform, multi-angle observations from both the air and the ground. It obtains various data types such as images, point clouds, positions, and attitudes with a synchronized and unified geographic reference. Through multi-source data fusion, the spatiotemporal holographic information of complex urban scenes can be collected rapidly and comprehensively.

Methodology
In this section, we introduce the spatio-temporal-spectral-angular observation model at the pixel level (or voxel level) and the feature level. We then introduce 3D point cloud registration using data collected from UAV and mobile mapping vehicle.

1 Spatio-temporal-spectral-angular observation model
In this study, we proposed a spatio-temporal-spectral-angular observation model. Theoretically, both the UAV platform and the mobile mapping vehicle platform can be equipped with sensors with high temporal, high spatial, and high spectral resolution. The proposed spatio-temporal-spectral-angular observation model is able to handle the discrepancies in observation angles between these two platforms. Urban remote sensing observation is generally achieved by constructing appropriate models or algorithms based on the temporal, spatial, spectral, and angular image features. Therefore, we abstract the process as a spatio-temporal-spectral-angular model, where temporal, spatial, spectral, and angular features serve as model inputs. According to their outputs, urban remote sensing observation models can be generally divided into two categories: 1) data quality improvement models (pixel-level or voxel-level fusion), and 2) information extraction models (feature-level fusion).
1) The data quality improvement model refers to obtaining images with higher quality by fusing multi-source data. This process can be modeled using the following formula: Given the difference in sensing techniques, multi-source images tend to focus only on certain components. For example, high-resolution images have high spatial resolution but usually limited temporal and spectral resolution. To fuse these components, the constraint relationship among multi-source images on each component should be established when a spatio-temporal-spectral-angular observation model is constructed: where $F(\cdot)$ is the feature constraint function and $I$ is the fused image.
2) When information serves as the output of spatio-temporal-spectral-angular observation models, the information extraction model with a specific task can be abstracted as: Similarly, under the constraint of the task, features from four aspects (i.e., spatial, temporal, spectral, and angular) can be extracted and further fused, thereby outputting useful information that benefits numerous urban monitoring tasks.
This process can be expressed via the following formula: where $F(\cdot)$ is the feature constraint function and $O(\cdot)$ represents the information extraction function.
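One compact way to write the two model families, consistent with the roles of $O(\cdot)$ and $F(\cdot)$ described above, is sketched here. The subscripted inputs $I_1, \dots, I_n$, the feature symbols $\mathrm{Spa}$, $\mathrm{Tem}$, $\mathrm{Spe}$, $\mathrm{Ang}$ (spatial, temporal, spectral, angular), and the task symbol $\tau$ are notational assumptions made for this sketch and are not necessarily the original notation.

```latex
\begin{align}
  % Data quality improvement (pixel- or voxel-level fusion)
  I &= O(I_1, I_2, \dots, I_n) \\
  I &= O\big(F(\mathrm{Spa}),\, F(\mathrm{Tem}),\, F(\mathrm{Spe}),\, F(\mathrm{Ang})\big) \\
  % Information extraction (feature-level fusion) under an assumed task symbol \tau
  Y &= O(I_1, I_2, \dots, I_n \mid \tau) \\
  Y &= O\big(F(\mathrm{Spa}),\, F(\mathrm{Tem}),\, F(\mathrm{Spe}),\, F(\mathrm{Ang}) \mid \tau\big)
\end{align}
```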
The spatio-temporal-spectral-angular observation model is a general model for data fusion of similar types. The data quality improvement model performs pixel-level (or voxel-level) fusion, while the information extraction model performs feature-level fusion. In this study, we conduct voxel-level fusion based on 3D point cloud data with diversified spatial, temporal, spectral, and angular information.
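As an illustration of what voxel-level fusion of two already registered point clouds can look like, the following is a minimal sketch and not the procedure used in this study. The function name `voxel_fuse`, the 0.5 m voxel size, and the choice of averaging points per voxel are all assumptions made for the example.

```python
import numpy as np


def voxel_fuse(cloud_a, cloud_b, voxel_size=0.5):
    """Merge two registered N x 3 point clouds on a shared voxel grid.

    Returns one centroid per occupied voxel plus two boolean masks telling
    whether cloud A and/or cloud B contributed points to that voxel.
    """
    cloud_a = np.asarray(cloud_a, dtype=float)
    cloud_b = np.asarray(cloud_b, dtype=float)
    points = np.vstack([cloud_a, cloud_b])
    # 0 marks points from cloud A, 1 marks points from cloud B.
    source = np.concatenate([np.zeros(len(cloud_a)), np.ones(len(cloud_b))])

    # Integer voxel index of every point (hypothetical grid anchored at the origin).
    keys = np.floor(points / voxel_size).astype(np.int64)
    uniq, inverse = np.unique(keys, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)
    n_vox = len(uniq)

    # Average all points that fall into the same voxel.
    sums = np.zeros((n_vox, 3))
    np.add.at(sums, inverse, points)
    counts = np.bincount(inverse, minlength=n_vox)
    centroids = sums / counts[:, None]

    has_a = np.bincount(inverse, weights=(source == 0).astype(float), minlength=n_vox) > 0
    has_b = np.bincount(inverse, weights=(source == 1).astype(float), minlength=n_vox) > 0
    return centroids, has_a, has_b
```

Voxels where only one mask is true are the regions that a single platform observed, which is where fusion adds information.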

3D point cloud registration
To perform voxel-level fusion based on the 3D point cloud data from UAV and mobile mapping vehicle, point cloud registration is a vital preprocessing step. Point cloud registration refers to the process of transforming point clouds into the same coordinate system through a series of rotation and translation operations. It is a rigid transformation in which the target point $Q$ is transformed to the source point $P$ through rotation and translation, as represented by Equation 6:
$$P = RQ + T, \qquad M = \begin{bmatrix} R & T \\ 0 & 1 \end{bmatrix} \tag{6}$$
where $R$ and $T$ refer to the rotation matrix and the translation vector, respectively, and $M$ is the total 4×4 transformation matrix that describes the three-dimensional transformation.
Using the three axes X, Y, and Z of the coordinate system as the axes of rotation, $\alpha$, $\beta$, and $\gamma$ are the rotation angles about the X, Y, and Z axes, respectively. The three rotation matrices are obtained as follows:
$$R_X(\alpha) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha & -\sin\alpha \\ 0 & \sin\alpha & \cos\alpha \end{bmatrix}, \quad R_Y(\beta) = \begin{bmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{bmatrix}, \quad R_Z(\gamma) = \begin{bmatrix} \cos\gamma & -\sin\gamma & 0 \\ \sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
Assuming $(x, y, z, 1)$ and $(x', y', z', 1)$
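The rigid transformation described in this subsection can be sketched in code as follows. This is a minimal illustration assuming right-handed rotations and the composition order Z·Y·X, which the text does not specify; the function names are hypothetical.

```python
import numpy as np


def rotation_from_angles(alpha, beta, gamma):
    """Rotation matrix from angles (radians) about the X, Y, and Z axes.

    The composition order Rz @ Ry @ Rx is an assumption for this sketch.
    """
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cg, sg = np.cos(gamma), np.sin(gamma)
    rx = np.array([[1, 0, 0], [0, ca, -sa], [0, sa, ca]])
    ry = np.array([[cb, 0, sb], [0, 1, 0], [-sb, 0, cb]])
    rz = np.array([[cg, -sg, 0], [sg, cg, 0], [0, 0, 1]])
    return rz @ ry @ rx


def homogeneous_matrix(rotation, translation):
    """Assemble the 4 x 4 matrix M from a 3 x 3 rotation and a 3-vector translation."""
    m = np.eye(4)
    m[:3, :3] = rotation
    m[:3, 3] = translation
    return m


def apply_transform(m, points):
    """Apply M to an N x 3 array of points using homogeneous coordinates."""
    points = np.asarray(points, dtype=float)
    homog = np.hstack([points, np.ones((len(points), 1))])
    return (homog @ m.T)[:, :3]
```

For example, a 90° rotation about Z followed by a translation of one unit along X maps the point (1, 0, 0) to (1, 1, 0).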

Urban data acquisition system combining UAV and mobile mapping vehicle
In this study, we proposed an urban data acquisition system combining the observations from UAV and mobile mapping vehicle. The UAV is responsible for collecting aerial images, while the mobile mapping vehicle is responsible for collecting ground-view data. Their combination achieves fast acquisition of three-dimensional spatio-temporal information in complex urban environments.

UAV sub-system
Traditional platforms for obtaining images are manned aircraft or satellites (Woldai 2020). However, these remote sensing systems have several major disadvantages, such as low spatial and temporal resolutions, availability limited by weather conditions, and high costs (Xiang and Tian 2011). In comparison, UAVs are typically low-cost. Such lightweight, low-speed aircraft have shown great performance in remote sensing data collection. UAV-based sensing systems largely fill the gap between ground observations and remote observations from satellite platforms. The UAV Remote Sensing System (UAVRSS) is a new remote sensing system composed of an unmanned aerial platform, a Positioning and Orientation System (POS), remote sensing sensors, and an inertial stabilization platform. It can automatically, intelligently, and rapidly obtain imagery that covers targeted areas and can process, model, and analyze the obtained images (Li, Shan, and Gong 2009). The overall framework of the UAV system is shown in Figure 2. The POS is used to provide the position and attitude information and assist the high-resolution imaging. The main types of sensors used for UAV remote sensing include aerial cameras, airborne LiDAR (Horvat et al. 2016), hyperspectral imagers (Zhong et al. 2018), infrared thermal imagers, and small Synthetic Aperture Radar (SAR) (Guerreiro et al. 2017). During UAV flight, the inertial stabilization platform ensures the stability of the remote sensing sensors by absorbing and smoothing mechanical jitter.

1) Unmanned aerial platforms
The UAV is the flight platform of the photogrammetry and remote sensing system, which is used to carry remote sensing sensors, the positioning and attitude measurement system, the UAV power system, and other equipment related to remote sensing measurement. The most important function of the flight platform is to carry a variety of sensors on safe and stable flight tasks in order to ensure the acquisition of high-quality remote sensing data. Innovations in power technology, lightweight composite materials, and control methods have led to the emergence of a variety of UAVs.
UAVs can be generally divided into three categories based on their structure: 1) fixed-wing UAVs, 2) multi-rotor UAVs, and 3) helicopter UAVs, as shown in Figure 3. Fixed-wing UAVs are mainly used in military and civil fields, with the advantages of long endurance time and strong load capacity and the disadvantage of strict requirements for taking off and landing. The multi-rotor UAV, mainly used in the civil field, is able to rise and fall vertically and hover in the air.
Helicopter UAVs can also achieve vertical lifting and hovering in the air with high load capacity compared to multi-rotor UAVs.
Figure 3. Different types of UAVs.

2) POS
The POS, used to calculate the position and attitude parameters of the remote sensing sensor, is composed of a Global Navigation Satellite System (GNSS) and an Inertial Navigation System (INS). The POS mainly includes the POS integration module, an aerial anti-jamming antenna, a ground control station, and POS post-processing software. With the support of the GNSS antenna and the ground control stations, the POS receives satellite signals as well as the angular velocity and acceleration collected by the IMU sensor. The POS post-processing software is used for fusion calculation, aiming to obtain the three-dimensional coordinates and attitude of the carrier.

3) Remote sensing sensors
Depending on the task, the UAV carries the corresponding airborne remote sensing equipment, such as a high-resolution CCD digital camera, light optical camera, multispectral imager, infrared scanner, laser scanner, hyperspectral imager, or synthetic aperture radar (Colomina and Molina 2014). Remote sensing sensors should be small, lightweight, and highly precise, with large storage capacity.
The aerial camera shoots high-resolution optical images for aerial remote sensing. The hyperspectral imager combines imaging technology and spectral technology to obtain reflectance in continuous, narrow spectral bands.
According to the laser ranging principle, laser scanners acquire three-dimensional coordinates and texture information of a large number of dense points on the surface of objects and construct the three-dimensional model of the object and various map data such as line, plane, and volume. Infrared scanners sense the infrared radiation of the measured object and form an infrared image by combining the optical scanning and motion direction of the instrument.

4) Inertial stabilization platform
In general, UAVs are relatively small. Therefore, during flight, UAVs are vulnerable to interference from weather conditions such as crosswinds and eddy currents, potentially leading to IMU measurement errors and decreased imaging quality. The inertial stabilization platform supports and stabilizes the navigation and positioning sensors and remote sensing sensors. It effectively isolates the angular movement of the flight platform and the errors caused by various internal and external disturbances, maintaining the working stability of the POS and remote sensing sensors.
In this study, we use the light, small, and high-precision POS UAV remote sensing system independently developed by Leador Spatial Information Technology Corporation (Figure 4). Its unmanned aerial platform is a professional six-rotor UAV with excellent dynamic redundancy and wind resistance. It can be adapted to work in high-altitude areas and has an endurance time of more than 50 min, which makes it suitable for long-term operation. With the support of ground control stations, the UAV can be set to automatic cruise, automatic flight, automatic landing, and intelligent cruise. Thanks to its highly modular design and pluggable structure, the entire UAV can be stored in an aeronautical box for easy transportation.

Mobile mapping vehicle sub-system
The mobile mapping vehicle system is one of the most cutting-edge technologies of modern surveying and mapping. It integrates global satellite positioning, inertial navigation, image processing, photogrammetry, laser scanning, geographic information, and integrated control technology, and is characterized by flexibility, high precision, high resolution, and real-time multi-source 3D spatial data collection. Mobile mapping vehicles can obtain essential surface information of roads as well as of the vertical objects (e.g., buildings and trees) on both sides of the road in real time, even under fast driving conditions. This surface information reflects the structure, size, texture, and other properties of urban objects. Mobile mapping vehicle systems have obvious advantages in data acquisition: they are fast and accurate, with an automatic data processing workflow that adapts to various data forms.
The architecture of the acquisition, processing, and application of the mobile mapping vehicle system can be divided into four levels: equipment layer, data layer, outcome layer, and application layer (Figure 5). The equipment layer contains the on-board data acquisition hardware, including the POS that serves as the time and space reference, panoramic cameras used to obtain image data, and laser scanners that can directly obtain 3D information.
The data layer contains data collected directly by the devices or obtained after simple processing. Among them, the POS can recover a high-precision running track as well as position and attitude at a high sampling frequency. Positioning and attitude information can be used to mark the position and azimuth of panoramic images, serving as time and space references for 3D point clouds.
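To make the idea of using the POS trajectory as a time and space reference concrete, the sketch below tags image capture times with positions by linear interpolation along a sampled trajectory. It is an illustrative assumption, not the vendor's processing chain: the function name and inputs are hypothetical, and attitude interpolation, which would need proper rotation interpolation, is left out.

```python
import numpy as np


def interpolate_positions(traj_time, traj_xyz, query_time):
    """Linearly interpolate trajectory positions at the given timestamps.

    traj_time: increasing sample times of the trajectory (length N).
    traj_xyz: N x 3 positions at those times.
    query_time: timestamps of images or scan lines to be georeferenced.
    """
    traj_time = np.asarray(traj_time, dtype=float)
    traj_xyz = np.asarray(traj_xyz, dtype=float)
    query_time = np.asarray(query_time, dtype=float)
    # np.interp handles one coordinate at a time, so interpolate x, y, z separately.
    return np.column_stack(
        [np.interp(query_time, traj_time, traj_xyz[:, k]) for k in range(3)]
    )
```

Because the POS samples at a high frequency, linear interpolation between neighboring samples is a common simplifying choice; a production system would typically use the post-processed trajectory directly.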
The outcome layer represents the information of interest obtained from the data layer. For example, high-precision driving tracks can be used to update the road network in digital maps. The point cloud contains abundant information regarding the physical size of objects, which can be used to construct a high-resolution DEM of the road surface and roadside, three-dimensional building models, and the positions of power lines, poles, and other ancillary facilities.
The application layer serves as the application of the outcome layer. Different industries have different demands and specific applications. For example, panoramic images can be used for virtual roaming and navigation and for identifying the position of advertisements on the street, while the high-resolution DEM can be used to calculate earthwork volume for road diversion.
The mobile mapping vehicle system used in this study, shown in Figure 6, is the "Flash" system with a modular design developed by Leador Spatial Information Technology Corporation. Connected through aviation plugs, the whole system is composed of a data acquisition device, a monitoring device, and a power supply. The data acquisition device is composed of a high-precision optical fiber or laser Inertial Navigation System (INS), a panoramic camera, LiDAR, and a high-grade protective cover. The INS, panoramic camera, and LiDAR are fixed in place with mechanical devices. Indoor integrated checking and calibration are completed before delivery. The monitoring device is supported by an industrial computer, which is used for displaying and monitoring the system's working state, data storage, processing, etc.
Figure 6. The mobile mapping vehicle system developed by Leador Spatial Information Technology Corporation.
In this study, the UAV is equipped with five Sony cameras with three bands (Red, Green, Blue), obtaining images from diversified angles. The mobile mapping vehicle captures the ground data, including panoramic images, 3D point clouds, and trajectory data. We conduct voxel-level fusion that combines the 3D point cloud data from UAV and mobile mapping vehicle. The UAV 3D point cloud data is first extracted from UAV images with accurate pose and position information. We then register the 3D point cloud data from UAV and mobile mapping vehicle in three steps: 1) keypoint selection, 2) transformation matrix calculation, and 3) alignment. Finally, we implement the proposed spatio-temporal-spectral-angular observation model.
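The transformation matrix calculation and alignment steps can be sketched as follows, assuming matched keypoint pairs have already been selected. The text does not state which solver was used; the least-squares SVD solution (often called the Kabsch method) shown here is a swapped-in standard technique, and the function names are hypothetical.

```python
import numpy as np


def estimate_rigid_transform(moving_kp, fixed_kp):
    """Least-squares rotation R and translation t with fixed ≈ R @ moving + t.

    moving_kp, fixed_kp: N x 3 arrays of corresponding keypoints (N >= 3,
    not all collinear).
    """
    moving_kp = np.asarray(moving_kp, dtype=float)
    fixed_kp = np.asarray(fixed_kp, dtype=float)
    c_m = moving_kp.mean(axis=0)
    c_f = fixed_kp.mean(axis=0)

    # Cross-covariance of the centered keypoint sets.
    h = (moving_kp - c_m).T @ (fixed_kp - c_f)
    u, _, vt = np.linalg.svd(h)

    # Guard against a reflection so that det(R) = +1.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    translation = c_f - rotation @ c_m
    return rotation, translation


def align(points, rotation, translation):
    """Apply the estimated transform to a full N x 3 point cloud."""
    return np.asarray(points, dtype=float) @ rotation.T + translation
```

In practice the estimate from manually selected keypoints is often refined afterwards (for example with an iterative closest point step), which is outside the scope of this sketch.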

1 Study area
The study area is in Baisha Town, Chongqing, China, located at 106°7'1.55"E and 29°3'38.66"N. Baisha Town is characterized by complicated urban environments with occlusions caused by trees and buildings, where a single sensor often fails to obtain complete observations. Therefore, we conduct the spatio-temporal-spectral-angular observation based on the combination of UAV and mobile mapping vehicle system in this study area, as shown in Figure 8. One UAV and one mobile mapping vehicle system are utilized in our experiments. The red line indicates the data acquisition area where the mobile mapping vehicle obtained data along the road and the UAV collected data from the air.
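For readers who want to locate the study area in GIS software, the degree-minute-second coordinates above can be converted to decimal degrees with a small helper. The function name is hypothetical, and the sketch assumes the coordinates are given on a common geographic datum.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees, minutes, seconds plus a hemisphere letter to decimal degrees."""
    sign = -1.0 if hemisphere in ("S", "W") else 1.0
    return sign * (degrees + minutes / 60.0 + seconds / 3600.0)


# Study area coordinates as stated in the text.
longitude = dms_to_decimal(106, 7, 1.55, "E")   # about 106.1171
latitude = dms_to_decimal(29, 3, 38.66, "N")    # about 29.0607
```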

2 Data acquisition
In the study area, the UAV and mobile mapping vehicle collected data simultaneously along the prescribed routes, as shown in Figure 9. The UAV captured images from diversified angles while the mobile mapping vehicle obtained images and 3D point cloud data along the road. The size of each image is 6000×4000 pixels. Figure 10 shows some example images from our UAV. The positional information of each image was recorded; an example is shown in Table 1. The mobile mapping vehicle moved at a speed of 30 km/h and captured ground data that include panoramic images, 3D point clouds, and trajectory data. Figure 11 shows an example of data collected from our mobile mapping vehicle.

3 Data preprocessing and data fusion
In the data preprocessing step, we first extracted the 3D point cloud from UAV images according to the accurate pose and position information of each image. Figure 12 shows the extraction results. The UAV 3D point cloud has rich color information and spatial texture features of ground objects, as well as spatial location information, contributing to ground object identification and 3D model information extraction. Although rich information can be obtained from the UAV point cloud, some information is still missing, such as information from facades and occluded objects. ... caused by occlusion. Therefore, from the comparison results, we can conclude that information from a single sensor is inadequate to accurately obtain the 3D spatial information of ground objects in complex urban scenes. The integrated point cloud data contains more abundant and comprehensive information, suggesting that the data fusion strategy in our spatio-temporal-spectral-angular model is able to obtain 3D information for complex urban environments in a comprehensive manner.
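One simple way to put a number on how much more of the scene the integrated point cloud covers is to count occupied voxels before and after fusion. The sketch below is an illustrative assumption and was not part of the evaluation reported here; the function names and the 0.5 m voxel size are hypothetical.

```python
import numpy as np


def occupied_voxels(points, voxel_size):
    """Set of integer voxel indices that contain at least one point."""
    keys = np.floor(np.asarray(points, dtype=float) / voxel_size).astype(np.int64)
    return {tuple(k) for k in keys.tolist()}


def coverage_summary(uav_points, mms_points, voxel_size=0.5):
    """Count voxels seen by each registered cloud, by both, and by their union."""
    uav = occupied_voxels(uav_points, voxel_size)
    mms = occupied_voxels(mms_points, voxel_size)
    return {
        "uav": len(uav),
        "mms": len(mms),
        "shared": len(uav & mms),
        "fused": len(uav | mms),
    }
```

A fused count well above either single-platform count would indicate that the two platforms observe largely complementary parts of the scene.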

Discussion
As a conceptual model, the proposed spatio-temporal-spectral-angular observation model includes pixel-level (or voxel-level) fusion and feature-level fusion that enables the integration of similar data types. In this study, we conducted voxel-level fusion that aims to integrate UAV and MMS 3D point clouds. We acknowledge that certain improvements can be made in future works. In our experiments, the UAV 3D point cloud was extracted from UAV images instead of being acquired directly by a UAV equipped with a LiDAR sensor. Such a point cloud extraction method leads to relatively lower precision. In addition, the MMS point cloud we collected lacks color information. In light of these two limitations, we only conducted a qualitative evaluation of our data fusion strategy instead of a quantitative one. Despite these limitations, the spatio-temporal-spectral-angular observation model we proposed achieves the integration of multi-sensor, multi-angle data to better obtain three-dimensional information of complex urban environments. Our experimental results confirmed the effectiveness and contribution of our spatio-temporal-spectral-angular observation model. In the future, we plan to further

Conclusions
In this study, we propose a spatio-temporal-spectral-angular observation model to integrate observations from UAV and mobile mapping vehicle platforms, realizing a joint, coordinated observation operation. We develop a multi-source remote sensing data acquisition system to effectively acquire multi-angle data of complex urban scenes. Multi-source data fusion solves the missing data problem caused by occlusion and achieves accurate, rapid, and complete collection of holographic spatial and temporal information in complex urban scenes. We carried out an experiment in Baisha Town, Chongqing, China and obtained multi-sensor, multi-angle data from the UAV and mobile mapping vehicle system. We first extracted the point cloud from the UAV images and then integrated the UAV and mobile mapping vehicle 3D point clouds. The experimental results show that a single sensor is unable to accurately obtain the 3D spatial information of ground objects, while the combination of information from multiple sensors can address this issue. The integrated point cloud contains more abundant and comprehensive information, indicating that data fusion via our proposed spatio-temporal-spectral-angular model can better obtain 3D information for complex urban environments, providing an effective data acquisition solution towards comprehensive urban monitoring.

Figure 1. The spatio-temporal-spectral-angular observation model that combines observations from UAV and mobile

In the data quality improvement model, the inputs represent multi-source images, $O(\cdot)$ stands for the fusion model, and $I$ indicates the output of the model, i.e., images with improved quality. In general, remote sensing images mainly contain spatial, spectral, temporal, and angular features.

In the information extraction model, the inputs represent multi-source images, $O(\cdot)$ stands for the information extraction model, and $Y$ indicates the output of the model.

Figure 2. The overall framework of the UAV Remote Sensing System.

Figure 4. The professional six-rotor UAV developed by Leador Spatial Information Technology Corporation.

Figure 5. The architecture of the acquisition, processing, and application of a mobile mapping vehicle system.

Figure 7 presents how UAV and mobile mapping vehicle systems are integrated. UAV and mobile mapping vehicle

Figure 7. The combination of UAV and mobile mapping vehicle system.

Figure 8. The experimental area is in Baisha Town, Chongqing, China, with the red line indicating the data acquisition

Figure 9. The UAV and mobile mapping vehicle collected data simultaneously along prescribed routes.

Figure 10. The UAV images from five cameras. (a) Five-lens camera model, (b) The behind view, (c) The forward view, (d)

Figure 11. Data sample from our mobile mapping vehicle. (a) Panoramic image, (b) 3D point cloud.

Figure 13. Keypoint selection between the UAV 3D point cloud and the MMS 3D point cloud.

Figure 14. The integrated 3D point cloud that combines the UAV point cloud and the MMS point cloud.
investigate the performance of the proposed spatio-temporal-spectral-angular observation model, applying it to other scenarios and paying more attention to feature-level fusion and experiments based on UAV and MMS equipped with better LiDAR sensors.

Figure 15. The comparison results of 3D point cloud before and after fusion. (a) and (c) present the UAV point cloud. (b)

Table 1. The position information of part of the sample from UAV.