Intensifying the spatial resolution of 3D thermal models from aerial imagery using deep learning-based image super-resolution

Abstract
Nowadays, 3D thermal models can play an important role in buildings' energy management; however, acquiring the multisource data usually needed to generate a high-resolution 3D thermal model is challenging. In this article, a method for intensifying 3D thermal models using deep learning-based image super-resolution is presented. In the proposed method, first, the enhanced deep residual super-resolution (EDSR) network is re-trained on thermal aerial images. Second, the resolution of low-resolution thermal images is enhanced using the newly trained network. Finally, state-of-the-art structure from motion (SfM), semi-global matching (SGM) and space intersection are used to generate an intensified 3D thermal model from the resolution-enhanced thermal images. Spatial evaluations indicate a 5% increase in the edge-based image fusion metric (EFM) for the intensified 3D model. In addition, the evaluations show that the modulation transfer function (MTF) curves of the intensified 3D thermal model are closer to those of a reference model than the curves of the original 3D thermal model are.

Highlights
A 3D thermal model intensification solution using EDSR is proposed that is independent of hardware techniques and multisource data. Considering the importance of edge sharpness in the intensified 3D thermal model, edge quality is assessed using MTF curves and the EFM metric. Compared with the original 3D thermal model, the MTF curves of the intensified 3D thermal model are closer to the MTF curve of the high-resolution 3D model. The EFM metric shows higher values for the MTF curves of the intensified 3D thermal model than for those of the original 3D thermal model.


Introduction
Nowadays, 3D thermal models generated from aerial thermal images, onto which the digital number (DN) values of the thermal images are mapped, are increasingly common. These models are mainly used in building inspections, where they help detect heat losses, incomplete roof insulation, cracks, and air and moisture leakages. Detecting areas with energy leaks in order to reduce energy consumption also plays an important role in building energy management (Rakha et al. 2018). It is therefore necessary to have accurate and precise information about surface temperature and its spatial pattern (Mandanici et al. 2019). In this regard, 3D thermal models, which present thermal information together with 3D information about building roofs, can be used to detect, interpret and measure thermal anomalies in building and roof inspections (Borrmann et al. 2013).
One of the challenges of such 3D thermal models is their spatial resolution. Because thermal cameras need larger instantaneous fields of view (IFOVs) to ensure that enough energy reaches the detector, the spatial resolution of thermal images is usually fairly coarse. Consequently, a 3D thermal model generated only from thermal images has a low spatial resolution and few details, which may hinder the detection, interpretation and measurement of thermal anomalies (Khodaei et al. 2015).
Methods proposed for generating high-quality 3D thermal models can be divided into two main groups: multi-source and single-source methods. In multi-source methods, researchers use thermal images simultaneously with information from other sources, such as laser scanner data, RGB images or panchromatic images (Oreifej et al. 2014; Borrmann et al. 2013; Antón and Amaro-Mellado 2021). In single-source methods, by contrast, only thermal images are used, and the aim is to improve the quality of the final product from them alone.
Multi-source methods fall into two categories: those that focus on image space and those that focus on object space. Research that increases thermal image resolution by fusing thermal images with RGB or panchromatic images belongs to the first category (Ma et al. 2019). In methods focusing on object space, researchers have used fusion-based techniques to enhance the accuracy of 3D thermal models. For instance, in some studies the 3D model is generated from RGB images or laser scanner data, and the thermal information is then mapped onto it (Ham and Golparvar-Fard 2013; Yang et al. 2018; Javadnejad et al. 2020; Alba et al. 2011; Borrmann et al. 2013). In other studies, 3D models from thermal and RGB images have been generated separately and registered by different methods to enhance the resolution of the 3D thermal model (Maset et al. 2017; Javan and Savadkouhi 2019; Dahaghin et al. 2021).
Obviously, multi-source methods are not applicable when only thermal images are available. Acquiring multisource data and registering data from various sources are challenging because different sensors are designed for different operating ranges and environmental conditions. In addition, two points about the simultaneous use of visible and thermal sensors should be noted. First, dual sensors that capture thermal and RGB images at the same time are normally too expensive to be cost-effective for many projects. Second, thermal images should be taken at night, when there is no light reflection, whereas RGB images should be taken during the day, when there is enough light. Simultaneous capture of both image types is therefore used mainly for interpretation applications; in 3D information extraction projects, it is generally recommended not to capture RGB and thermal images at the same time.
In single-source methods, the aim is to increase 3D thermal model resolution solely by increasing thermal image resolution. One way to achieve this is to enhance the resolution of thermal images through 'hardware' techniques, which entail high costs and many limitations (Yue et al. 2016). Therefore, enhancing thermal image resolution independently of hardware techniques can be more practical for generating high-quality 3D thermal models.
As another single-source solution, super-resolution (SR) techniques, which use images from just one sensor, have become a promising way to obtain high-resolution images (Yue et al. 2016) and, in turn, high-quality 3D models. SR reconstructs a higher-resolution image or image sequence from the given low-resolution images (Yue et al. 2016; Dong et al. 2016).
Among the various methods of producing super-resolution images, single-image super-resolution (SISR) techniques, which use the limited low-resolution information in a single image to estimate the mapping from low-resolution to high-resolution space, have been used in many studies (Kim and Kwon 2010). SISR methods are classified into three groups: interpolation-based methods (for example, bi-cubic interpolation), which suffer from limited accuracy; reconstruction-based methods, which are usually time-consuming; and learning-based methods (Yang et al. 2019). Among learning-based methods, deep learning (DL) solutions, particularly convolutional neural networks (CNNs), are superior to other methods because they are able to enhance the data in an information-theoretical sense (Kansal and Nathan 2020; Liebel and Körner 2016).
Researchers have focused on designing networks that produce better high-resolution images than simple up-sampling methods such as bi-cubic interpolation. Some of these networks are CNN-based SR (SR-CNN) (Dong et al. 2016), very deep super-resolution (VDSR), enhanced deep residual super-resolution (EDSR) (Lim et al. 2017), the super-resolution network for multiple degradations (SRMD) (Zhang et al. 2018) and very deep residual channel attention networks (RCAN) (Dai et al. 2019). SR techniques have also been used for 3D information extraction from RGB images. For example, Zhang et al. (2019) generated high-quality DSMs by improving image quality with super-resolution methods; they applied various super-resolution methods to RGB images and compared the quality of the resulting DSMs. In another work, Burdziakowski (2020) used SR algorithms to increase the geometric and interpretative quality of final photogrammetric products, concluding that products generated from high-altitude images processed by the SR algorithm had a quality similar to that of reference products generated from low-altitude images and, in some cases, even better. In Pashaei et al. (2020), the ability of a DCNN-based SISR model, the enhanced super-resolution generative adversarial network (ESRGAN), to predict the spatial information degraded or lost in a hyper-spatial-resolution unmanned aerial system (UAS) RGB image set was studied. The results showed that interior and exterior imaging geometry could be accurately extracted from the super-resolved image set.
Although the super-resolution of RGB and panchromatic images has been studied extensively, DL-based enhancement of thermal images is a newer field of research. In some studies on thermal image enhancement, the network is trained on visible-spectrum images in different colour-space representations (Choi et al. 2016; Almasri and Debeir 2018). In contrast, a number of recent studies have focused on producing high-resolution thermal images by training networks on low-resolution thermal images (Rivadeneira et al. 2019; Kansal and Nathan 2020).
This research proposes generating a high-resolution 3D thermal model from lower-resolution aerial thermal imagery captured by a low-cost camera mounted on a lightweight drone. For this purpose, super-resolved thermal images are generated by training a deep network, even though aerial thermal images have lower quality and information content than RGB images. This research also investigates how much this enhancement improves the spatial resolution of the final 3D thermal model generated from low-quality aerial thermal images. To this end, a DL-based super-resolution network is trained and applied to generate an intensified 3D thermal model. To evaluate the proposed methodology, the effect of the SISR algorithm on 3D thermal model intensification is investigated using spatial resolution evaluation criteria, in order to answer whether this low-cost solution can improve the quality of 3D models generated from low-quality, low-cost images.
The remainder of this article is organized as follows: Section 2 describes the methodology, the quality assessment metrics for evaluating the intensified 3D thermal model, and the dataset used in this article. In Section 3, the results of generating a 3D thermal model with the proposed solution are presented and discussed in detail. Finally, Section 4 provides conclusions and future perspectives.

Methodology
Given the importance of producing high-quality 3D thermal models and the problems of conventional solutions, increasing the spatial resolution of aerial thermal images using DL-based SISR appears to be a practical way to produce intensified 3D thermal models. Figure 1 presents the proposed method for generating an intensified 3D thermal model. After the selected deep network is trained, the spatial resolution of each thermal image extracted from the thermal video is increased, and an intensified 3D model is generated from the resulting high-resolution aerial thermal images. The DN values of the high-resolution aerial thermal images are then mapped onto the intensified 3D model to produce the intensified 3D thermal model. It is worth mentioning that converting thermal image DN values into absolute temperature values is outside the scope of this article; if needed, radiometric calibration and conversion of thermal DN values into temperature could be implemented.

Pre-processing
This step includes two basic phases: extraction of thermal images from the captured video frames and thermal camera calibration. The thermal camera captures video, which is converted into images. The extracted image frames are 640 × 480 pixels.
One of the main photogrammetric tasks here is camera calibration, which aims to estimate the parameters of the camera model from 2D images (Peng and Li 2010). Camera calibration techniques are currently classified into two main groups: traditional pre-calibration and self-calibration strategies. Traditional calibration methods are performed before the photogrammetric procedure and solve for the camera parameters using accurate scene information, such as points or lines with precise coordinates; these methods therefore normally yield more accurate calibration (Yan et al. 2016). To this end, an appropriate test field must be designed, taking into account factors such as a simple structure, the material, the shape and the target dimensions. The features of the test field, as well as their coordinates, should also be easy to identify. The correspondence between the image locations of the features and their object coordinates is then used to derive the camera calibration parameters.
In a study by Usamentiaga et al. (2017), camera calibration was performed using both chessboard and circular patterns, and the circular pattern gave better accuracy. Consequently, for the proposed method, a planar circular test field is designed and images are captured from several directions and orientations following Zhang's method (Zhang 2000). The test field is a rectangular calibration board with 13 × 17 hollow circles. The circles have a diameter of 12 mm and a centre-to-centre spacing of 24 mm. In addition, six coded targets are embedded in the calibration board to identify the position of the targets in each image (Figure 2a). Because thermal cameras detect only thermal infrared radiation, the calibration board is heated for better detection and higher contrast and then imaged from multiple views (Figure 2b). After imaging, adaptive thresholding is used to generate binary images (Prakash and Karam 2012). Next, the geometric centres of the circles are identified (Ouellet and Hébert 2009) (Figure 2c). After the geometric centres of the targets have been determined in all images, each point is re-projected from object space to image space by the collinearity equations (Equations (1) and (2)) (Javan and Savadkouhi 2019). In these equations, x and y are image coordinates, c is the focal length, $X_0$, $Y_0$ and $Z_0$ are the coordinates of the projection centre, X, Y and Z are object coordinates, and r denotes the elements of the rotation matrix. Lens distortion parameters are determined using the Brown model equations (Equations (3) and (4)) (Brown 1971), where $x_0$ and $y_0$ are the distortion-free image coordinates, $k_i$ is the radial lens distortion parameter, $p_i$ is the decentring lens distortion factor, and r is the radial distance from the projection of the perspective centre on the image plane (the principal point).
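Because Equations (1) to (4) do not survive in this text, the following Python sketch illustrates what they compute under stated assumptions: a point is projected with one common form of the collinearity equations (camera-frame vector $R^T(X - X_0)$, image coordinates scaled by $-c$), and the standard Brown radial ($k_1$ to $k_3$) and decentring ($p_1$, $p_2$) terms are then applied to the ideal coordinates. The omega-phi-kappa rotation order, the axis signs, the ideal-to-distorted direction of the distortion model and all function names are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def rotation_matrix(omega, phi, kappa):
    """R = Rx(omega) @ Ry(phi) @ Rz(kappa), angles in radians.
    ASSUMPTION: one common photogrammetric convention; others exist."""
    co, so = np.cos(omega), np.sin(omega)
    cp, sp = np.cos(phi), np.sin(phi)
    ck, sk = np.cos(kappa), np.sin(kappa)
    rx = np.array([[1.0, 0.0, 0.0], [0.0, co, -so], [0.0, so, co]])
    ry = np.array([[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]])
    rz = np.array([[ck, -sk, 0.0], [sk, ck, 0.0], [0.0, 0.0, 1.0]])
    return rx @ ry @ rz

def collinearity(X, X0, R, c):
    """Project object point X into ideal image coordinates (x, y):
    x = -c (r11 dX + r21 dY + r31 dZ) / (r13 dX + r23 dY + r33 dZ), y likewise."""
    d = R.T @ (np.asarray(X, float) - np.asarray(X0, float))
    return -c * d[0] / d[2], -c * d[1] / d[2]

def brown_distort(x, y, k=(0.0, 0.0, 0.0), p=(0.0, 0.0)):
    """Apply Brown radial (k1..k3) and decentring (p1, p2) distortion to
    ideal coordinates measured from the principal point."""
    r2 = x * x + y * y
    radial = k[0] * r2 + k[1] * r2**2 + k[2] * r2**3
    dx = x * radial + p[0] * (r2 + 2 * x * x) + 2 * p[1] * x * y
    dy = y * radial + p[1] * (r2 + 2 * y * y) + 2 * p[0] * x * y
    return x + dx, y + dy
```

Comparing such re-projected positions with the measured circle centres gives the residuals from which the calibration parameters are estimated.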
Geometric calibration also computes the mathematical model relating the target coordinate system to the corresponding coordinates in image space, so the accuracy of the calibration can be estimated from the mean re-projection error. Figure 2d presents the mean re-projection error per image. As the figure shows, the average error is 0.315 pixels over 13 images and 221 calibration points.
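The mean re-projection error reported above can be computed as in this minimal sketch, which assumes that measured circle centres and re-projected centres are available as (x, y) pixel arrays for each image; the function names are hypothetical.

```python
import numpy as np

def reprojection_errors(observed, projected):
    """Euclidean distance (pixels) between each measured circle centre and
    the same centre re-projected with the estimated camera model."""
    diff = np.asarray(observed, float) - np.asarray(projected, float)
    return np.linalg.norm(diff, axis=-1)

def mean_reprojection_error(observed_per_image, projected_per_image):
    """Return the mean error of each image and the mean over all points."""
    errors = [reprojection_errors(o, p)
              for o, p in zip(observed_per_image, projected_per_image)]
    per_image = [float(e.mean()) for e in errors]
    overall = float(np.concatenate(errors).mean())
    return per_image, overall
```

When every image observes the same 221 points, the mean of the per-image means equals the mean over all points.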

3D thermal model intensification
As shown in Figure 1, this phase consists of two main steps. In the first step, a high-resolution thermal image is produced from a low-resolution one by training a DL-based SISR model. In the second step, the intensified 3D thermal model is produced from the outputs of the first step. Details of these two steps are given below.
2.1.2.1. Image resolution enhancement. To enhance the thermal image resolution, the EDSR network, a convolutional neural network (CNN), is used to perform SISR (Lim et al. 2017). Since the aim of this study is to show the efficiency of SISR methods in 3D thermal model intensification, the EDSR network is chosen for its simplicity of implementation and the good performance reported in recent research (Yang et al. 2019). The network structure of EDSR is presented in Figure 3. The EDSR network had not previously been trained on thermal aerial imagery, and its capability to produce super-resolved aerial thermal images is investigated here for the first time.
The main difference between low-resolution and high-resolution images lies in high-frequency details, and the EDSR network learns the mapping between the two. Because deep network training requires reference high-resolution images, in this step the original thermal images are down-sampled with the bi-cubic method to generate low-resolution thermal images as the network input (Lim et al. 2017). This makes it possible to compare the output of the EDSR network with the reference high-resolution image (i.e. the original thermal image).
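A minimal NumPy sketch of generating a training pair by bi-cubic down-sampling follows. It assumes the Keys cubic convolution kernel with a = -0.5, pixel-centre alignment and edge clamping, and applies no anti-aliasing prefilter; library implementations may differ in these details, so this is not necessarily the exact resampler used in the study. Function names are hypothetical.

```python
import numpy as np

def cubic_kernel(t, a=-0.5):
    """Keys cubic convolution kernel (a = -0.5 is the usual 'bicubic' choice)."""
    t = np.abs(t)
    near = (a + 2) * t**3 - (a + 3) * t**2 + 1
    far = a * t**3 - 5 * a * t**2 + 8 * a * t - 4 * a
    return np.where(t <= 1, near, np.where(t < 2, far, 0.0))

def resize_weights(n_in, n_out, a=-0.5):
    """Matrix W (n_out x n_in) such that W @ signal resamples a 1D signal.
    ASSUMPTION: pixel-centre alignment and clamped (replicated) borders."""
    scale = n_out / n_in
    x = (np.arange(n_out) + 0.5) / scale - 0.5  # output centres in input coordinates
    base = np.floor(x).astype(int)
    rows = np.arange(n_out)
    weights = np.zeros((n_out, n_in))
    for k in range(-1, 3):  # the four nearest input samples
        idx = base + k
        np.add.at(weights, (rows, np.clip(idx, 0, n_in - 1)), cubic_kernel(x - idx, a))
    return weights

def bicubic_resize(image, out_h, out_w):
    """Separable bicubic resampling of a single-band (H, W) image."""
    image = np.asarray(image, float)
    return resize_weights(image.shape[0], out_h) @ image @ resize_weights(image.shape[1], out_w).T

def make_training_pair(hr, factor=2):
    """Down-sample an original frame to create the (LR input, HR target) pair."""
    h, w = hr.shape
    return bicubic_resize(hr, h // factor, w // factor), hr
```

For a 640 × 480 frame and a factor of 2, the low-resolution input is 320 × 240 pixels and the original frame serves as the target.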
Let $Th_{LR}$ be a low-resolution thermal image, $Th_{HR}$ a high-resolution thermal image and $\widehat{Th}_{HR}$ the estimated high-resolution thermal image; the goal is to train a model f for which Equation (5) holds.
In other words, the EDSR network generates a resolution-enhanced image $\widehat{Th}_{HR}$ by minimizing the distance between $f(Th_{LR})$ and $Th_{HR}$. The EDSR network is trained with the mean absolute error loss function (L1 loss) instead of the L2 loss, because Lim et al. (2017) found that L1 loss gives better convergence than L2.
The L1 loss function to be minimized is presented in Equation (6). In this equation, m is the number of rows of the image, i is the row index, n is the number of columns of the image and j is the column index.
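For reference, Equations (5) and (6) can be written in the standard form below, using the symbols defined above; averaging over the m × n pixels of a single image (rather than over a batch of patches) is an assumption.

```latex
\widehat{Th}_{HR} = f\left(Th_{LR}\right) \approx Th_{HR}

L_1 = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n}
      \left| f\left(Th_{LR}\right)(i, j) - Th_{HR}(i, j) \right|
```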
In selecting images for training, an attempt is made to include a variety of features (such as buildings, roads, trees and land) in the training dataset. The extracted frames are first converted into low-resolution images by bi-cubic interpolation with a scale factor of 2; 70% of these images are then used for training and the remaining 30% for validating the accuracy of the training process. Training a deep network on down-sampled images and their originals is a common technique in this field, and research by Shermeyer and Van Etten (2019) and Lim et al. (2017) shows that the network learns to produce super-resolved images independently of the spatial resolution of the input image.
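The 70/30 split can be made reproducible with a fixed random seed, as in the sketch below; random shuffling, the seed and the function name are assumptions, since the text does not state how frames were assigned to each subset.

```python
import random

def split_frames(frame_ids, train_fraction=0.7, seed=0):
    """Shuffle frame identifiers reproducibly and split them into training
    and validation subsets (70/30 by default, as in the text).
    ASSUMPTION: random assignment with a fixed seed."""
    ids = list(frame_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]
```

With 620 frames, this yields the 434 training and 186 validation images reported in the Results.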
The low-resolution training images and their corresponding original thermal images are fed into the EDSR network for training. To make a thermal image resemble an RGB image so that it can be input to the EDSR network, the thermal DN values are repeated in three channels. The standard network parameters of the original paper (Lim et al. 2017) are used. A patch size of 48 × 48 pixels is chosen, and patches are augmented by random horizontal flips and 90° rotations. Using image patches and augmenting the data during training effectively increases the amount of available training data and helps produce a robust model. Moreover, the variety of features in the images selected for training contributes to the robustness of the trained model. Since the EDSR network has not been used before for high-resolution aerial thermal images, it is implemented in C++ for this study.
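The channel replication and augmentation described above might look like the following sketch. It assumes that the 48 × 48 patch is taken from the low-resolution image, with the aligned 96 × 96 patch taken from the high-resolution image for a scale factor of 2 (as described by Lim et al. 2017), and that the same flip and rotation are applied to both patches. Function names are hypothetical, and this is not the authors' C++ code.

```python
import numpy as np

def to_three_channels(thermal):
    """Repeat single-band DN values into an (H, W, 3) array."""
    return np.repeat(thermal[..., np.newaxis], 3, axis=-1)

def random_patch_pair(lr, hr, scale=2, lr_patch=48, rng=None):
    """Crop aligned LR/HR patches and apply one shared random augmentation.
    ASSUMPTION: patch size refers to the LR patch; the HR patch is scale times larger."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = lr.shape[:2]
    top = rng.integers(0, h - lr_patch + 1)
    left = rng.integers(0, w - lr_patch + 1)
    lr_p = lr[top:top + lr_patch, left:left + lr_patch]
    hr_p = hr[top * scale:(top + lr_patch) * scale,
              left * scale:(left + lr_patch) * scale]
    if rng.random() < 0.5:  # random horizontal flip
        lr_p, hr_p = lr_p[:, ::-1], hr_p[:, ::-1]
    k = int(rng.integers(0, 4))  # random number of 90-degree rotations
    return np.rot90(lr_p, k), np.rot90(hr_p, k)
```

Applying the identical transform to both patches keeps every low-resolution pixel aligned with its 2 × 2 high-resolution block.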
After training, the mapping from an input low-resolution image to a high-resolution image is determined, so any low-resolution image can be super-resolved. Each of the other original thermal images extracted from the video of the study area is therefore converted separately into a high-resolution thermal image using the trained model with a scale factor of 2. These enhanced images are used to generate the intensified 3D thermal model in the next steps.
2.1.2.2. Intensified 3D thermal model generation. To generate an intensified 3D thermal model from a set of high-resolution thermal images, a state-of-the-art SfM algorithm is first used to compute the exterior orientation parameters of the images. To this end, key points are detected and matched using the Scale Invariant Feature Transform (SIFT) method. The matched points are used in a sequential bundle adjustment to determine the exterior orientation of the input images and to generate a sparse 3D point cloud (Truong et al. 2017).
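Matches that feed a bundle adjustment are commonly filtered with Lowe's nearest-neighbour distance ratio test. The following sketch applies that test to precomputed descriptor arrays; the ratio of 0.8, the brute-force distance computation and the function name are assumptions, and SIFT keypoint detection itself is not shown.

```python
import numpy as np

def ratio_test_matches(desc_a, desc_b, ratio=0.8):
    """Match descriptors of image A to image B, keeping a match only when the
    nearest neighbour is clearly closer than the second nearest.
    Requires at least two descriptors in desc_b."""
    desc_a = np.asarray(desc_a, float)
    desc_b = np.asarray(desc_b, float)
    # Pairwise Euclidean distances, shape (len(desc_a), len(desc_b)).
    d = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=-1)
    order = np.argsort(d, axis=1)
    matches = []
    for i in range(d.shape[0]):
        best, second = order[i, 0], order[i, 1]
        if d[i, best] < ratio * d[i, second]:
            matches.append((i, int(best)))
    return matches
```

Ambiguous descriptors, such as those on repetitive roof textures, are rejected because their two nearest neighbours are almost equally distant.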
Second, the semi-global matching (SGM) algorithm is applied to the high-resolution thermal images to generate disparity maps (Hirschmüller 2011). SGM estimates a dense disparity map from a rectified stereo image pair. In this method, stereo matching is formulated as finding the disparity image D that minimizes the energy E(D) given in Equation (7).
where the first term is the sum of the matching costs of all pixels for the disparities in D. The second term adds a constant penalty $P_1$ for all pixels q in the neighbourhood $N_p$ of p whose disparity differs slightly (i.e. by 1 pixel). The third term adds a larger constant penalty $P_2$ for all larger disparity changes. The satisfactory results of SGM have made it well suited to dense stereo matching applications and have encouraged many researchers to use it (Hirschmüller 2011). A consistency check and peak removal are then used to eliminate errors and blunders in the generated disparity maps caused by conditions such as spectral discontinuities and occluded areas (Mohammadi et al. 2019). Nevertheless, some blunders remain in the filtered disparity maps because these two algorithms are not complete (Mohammadi et al. 2019). Afterwards, the disparity maps of all stereo pairs and the computed exterior orientation parameters are used to generate a dense point cloud of the scene by space intersection. Once the dense point cloud has been generated, the intensified 3D model is produced by data gridding.
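Equation (7) does not survive in this text; in Hirschmüller's standard formulation it reads E(D) = Σ_p [C(p, D_p) + Σ_{q∈N_p} P1·T[|D_p − D_q| = 1] + Σ_{q∈N_p} P2·T[|D_p − D_q| > 1]], where T[·] is 1 when its argument is true and 0 otherwise. SGM approximates the minimization by aggregating costs along several 1D paths. The sketch below shows aggregation along a single left-to-right path and a simple left-right consistency check; a full implementation sums 8 or 16 path directions before taking the per-pixel minimum, and the disparity sign convention, NaN marking of invalid pixels and function names are assumptions.

```python
import numpy as np

def aggregate_path(cost, p1, p2):
    """SGM cost aggregation along one scanline direction (left to right).

    cost: (W, D) matching costs C(p, d) for the pixels of one image row.
    L(p, d) = C(p, d) + min(L(p-r, d), L(p-r, d-1) + P1, L(p-r, d+1) + P1,
                            min_i L(p-r, i) + P2) - min_k L(p-r, k)
    """
    cost = np.asarray(cost, float)
    width, ndisp = cost.shape
    agg = np.empty_like(cost)
    agg[0] = cost[0]
    for x in range(1, width):
        prev = agg[x - 1]
        prev_min = prev.min()
        minus = np.full(ndisp, np.inf)
        minus[1:] = prev[:-1] + p1            # previous pixel had disparity d - 1
        plus = np.full(ndisp, np.inf)
        plus[:-1] = prev[1:] + p1             # previous pixel had disparity d + 1
        jump = np.full(ndisp, prev_min + p2)  # any larger disparity change
        best = np.minimum.reduce([prev, minus, plus, jump])
        agg[x] = cost[x] + best - prev_min    # subtracting prev_min keeps values bounded
    return agg

def left_right_check(disp_left, disp_right, max_diff=1):
    """Mark left disparities that disagree with the right-image map as NaN.
    ASSUMPTION: left pixel (y, x) with disparity d matches right pixel (y, x - d)."""
    out = np.asarray(disp_left, float).copy()
    h, w = out.shape
    for y in range(h):
        for x in range(w):
            d = int(disp_left[y][x])
            xr = x - d
            if xr < 0 or abs(disp_right[y][xr] - d) > max_diff:
                out[y, x] = np.nan
    return out
```

Taking the per-pixel argmin of the summed path costs gives the disparity map that is then checked for consistency.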
Finally, the thermal DN values of the high-resolution thermal images are mapped onto the intensified 3D model to produce the intensified 3D thermal model. Although the DNs in a thermal image do not correspond to absolute temperatures, their differences indicate relative temperature differences: darker areas indicate less thermal radiation than lighter areas.
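Data gridding and DN mapping could be combined as in this hedged sketch, which assumes that each dense point already carries the DN of the high-resolution thermal pixel it was triangulated from and simply averages Z and DN per grid cell. The cell size, mean aggregation and function name are assumptions; the study may instead project the model back into the images to sample DNs.

```python
import numpy as np

def grid_points(points, dn, cell, origin):
    """Average Z and thermal DN of the points that fall in each grid cell.

    points: (N, 3) array of X, Y, Z; dn: (N,) DN attached to each point
    (ASSUMPTION: taken from the source pixel of each triangulated point).
    origin: (X, Y) of the grid's lower-left corner; row 0 holds the smallest Y.
    Empty cells are NaN.
    """
    points = np.asarray(points, float)
    dn = np.asarray(dn, float)
    cols = np.floor((points[:, 0] - origin[0]) / cell).astype(int)
    rows = np.floor((points[:, 1] - origin[1]) / cell).astype(int)
    shape = (rows.max() + 1, cols.max() + 1)
    z_sum, dn_sum, count = np.zeros(shape), np.zeros(shape), np.zeros(shape)
    np.add.at(z_sum, (rows, cols), points[:, 2])
    np.add.at(dn_sum, (rows, cols), dn)
    np.add.at(count, (rows, cols), 1.0)
    with np.errstate(invalid="ignore"):  # 0 / 0 in empty cells becomes NaN
        return z_sum / count, dn_sum / count
```

With the cell size set to about 11 cm, this would produce a 3D model raster and a co-registered DN raster at the resolution reported for the intensified model.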

Quality assessment
Because the intensified 3D thermal model consists of two elements, the intensified 3D model and the texture (orthophoto) mapped onto it, the 3D model and the orthophoto are evaluated separately in this study.
First, the intensified 3D model is evaluated using two statistical criteria, i.e. the Root Mean Square Error (RMSE) and Mean Relative Error (MRE), to compare the low-resolution 3D model (3D model generated from original thermal images) and the intensified 3D model against the reference 3D model. Here, the reference 3D model is considered as R and the generated 3D model from thermal images as G. Equations (8) and (9) represent RMSE and MRE, respectively.
where N is the total number of pixels in the 3D model and (i, j) denotes the position at the i-th row and j-th column. Lower RMSE and MRE values indicate greater similarity between the produced 3D model and the reference 3D model. Because control points are used in the 3D model production process, the generated and reference 3D models are registered horizontally and vertically, and only the Z accuracy is checked with the RMSE and MRE criteria. However, criteria such as RMSE and MRE cannot assess the geometric quality of the produced 3D thermal models. Given the importance of edge information in 3D thermal model quality, especially at building boundaries, the Modulation Transfer Function (MTF), an edge-based quality metric, is used to evaluate the geometric quality of the generated 3D thermal model. The main idea of this method is to extract suitable edges in the reference and produced models and to compare their MTF curves, which are derived from the Line Spread Function (LSF).
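Equations (8) and (9) are not reproduced here. The sketch below uses the usual definitions, RMSE = sqrt(mean((R − G)²)) and MRE = mean(|R − G| / |R|), over all N pixels; treating MRE as relative to the reference heights is an assumption, and cells with zero reference height would need to be excluded.

```python
import numpy as np

def rmse(reference, generated):
    """Root mean square error between reference (R) and generated (G) heights."""
    r = np.asarray(reference, float)
    g = np.asarray(generated, float)
    return float(np.sqrt(np.mean((r - g) ** 2)))

def mre(reference, generated):
    """Mean relative error, ASSUMED to be mean(|R - G| / |R|).
    Undefined where the reference height is zero."""
    r = np.asarray(reference, float)
    g = np.asarray(generated, float)
    return float(np.mean(np.abs(r - g) / np.abs(r)))
```

Both functions accept 2D height rasters directly, since the mean is taken over every pixel.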
To calculate MTF, first, the high-contrast edges should be extracted to evaluate the spatial resolution of the produced models. Therefore, step edges, which are defined using Equation (10), are considered as appropriate targets for evaluation.
After the edge locations are extracted, edge profiles are extracted: for each point on the edge, straight lines are drawn perpendicular to and crossing the edge. The resulting Edge Spread Function (ESF) is passed to the LSF calculation after smoothing and consistency checks, which avoid noise in the final MTF and prevent the algorithm from selecting weak or unsteady edges (Javan et al. 2013). The LSF is calculated by differentiating the ESF profile (Equation (11)), and the generated LSF curve is then smoothed and denoised with a Gaussian function. The MTF is obtained by applying the discrete Fourier transform to the LSF (Equation (12)), and the normalized MTF is computed by dividing the absolute values of the transformed function by its first (zero-frequency) absolute value.
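The ESF-to-MTF chain described above can be sketched as follows for a single edge profile. The Gaussian width, the 3σ kernel truncation, the one-sided FFT and the function name are assumptions; the profile extraction and consistency checks of Javan et al. (2013) are not reproduced.

```python
import numpy as np

def mtf_from_esf(esf, sigma=1.0):
    """Normalized MTF of one edge spread function profile."""
    esf = np.asarray(esf, float)
    lsf = np.diff(esf)  # differentiate the ESF to obtain the LSF
    # Smooth the LSF with a normalized Gaussian kernel (ASSUMED width sigma).
    half = max(1, int(3 * sigma))
    t = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (t / sigma) ** 2)
    kernel /= kernel.sum()
    lsf = np.convolve(lsf, kernel, mode="same")
    spectrum = np.abs(np.fft.rfft(lsf))  # DFT magnitude of the LSF
    return spectrum / spectrum[0]        # normalize by the zero-frequency value
```

A blurred edge spreads the LSF, so its MTF falls off faster with spatial frequency than that of a sharp edge.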
After MTFs are calculated for the extracted edges in the reference and produced models, model quality is assessed by comparing the MTF curves. This assessment is based on the idea that any edge in the reference model should appear in the produced models with a similar MTF curve; less degradation of the MTF curve indicates better spatial quality of the produced model. For a numerical comparison of MTF curves, a statistical variance index is used: the distance between MTF curves is measured with the edge-based image fusion metric (EFM) introduced by Javan et al. (2013) (Equation (13)), where $MTF_i$ denotes the MTF value at spatial frequency i, $\bar{V}$ is the mean of the variable $V_i$, and N is the total number of spatial sample frequencies. For compatibility with other measures, the EFM is defined so that a higher value indicates less difference between the produced and reference models, i.e. higher spatial quality and greater similarity.

Dataset
The study is carried out in Charmshahr industrial area in the southern part of Tehran, Iran.

Sensor and platform
In this study, thermal videos are captured with a Keii HL-640S uncooled focal plane array camera mounted on a lightweight multi-rotor Unmanned Aerial Vehicle (UAV) with roll and pitch stabilization. This camera detects the Thermal InfraRed (TIR) region of the InfraRed (IR) spectrum, which covers the mid- and long-wave parts of the IR spectrum. In addition, a high-resolution 24 MP Sony a6000 RGB camera equipped with a 35 mm lens is used to collect RGB images for generating the reference 3D model. These RGB images are used only to generate the reference 3D model and to evaluate the enhanced spatial resolution of the intensified 3D model. Figure 4 shows the thermal and RGB cameras and the UAV platform used in this study.
More details about the sensors and platform are provided in Table 1.

Study area and flight plan
Flight planning for the Charmshahr region, with an area of approximately 123,000 m², is done using Mission Planner, a UAV ground control station software. Figure 5 shows the study area and the flight plan, and Table 2 provides information about the flight plan.

Results
As mentioned in the methodology, to generate an intensified 3D thermal model with the proposed algorithm, the spatial resolution of the thermal images is first enhanced.
In the training phase, 620 frames extracted from the thermal video are used. 70% of these images (434 images) are used for training, and the accuracy of the trained model is validated on the other 186 images using the peak signal-to-noise ratio (PSNR) … images using the trained model and scale factor 2. The original images are 640 × 480 pixels with a pixel size of 17 mm, and the enhanced images are 1280 × 960 pixels with a pixel size of 8.5 mm. These enhanced images are used to generate the intensified 3D thermal model in the next steps. Figure 6 gives an overall view of some samples for visual comparison between original images extracted from the thermal video and the corresponding high-resolution images produced by DL-based SISR.
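For reference, the PSNR used for validation can be computed as below. The peak value of 255 assumes 8-bit DNs in the extracted video frames, which the text does not state, and the function name is hypothetical.

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between an original frame and its
    super-resolved estimate. ASSUMPTION: 8-bit DNs, so peak = 255."""
    diff = np.asarray(reference, float) - np.asarray(estimate, float)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(peak ** 2 / mse))
```

Averaging this value over the 186 validation frames gives a single figure for comparing trained models.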
In Figure 7, for each sample of Figure 6, a magnified view of the original image and its corresponding high-resolution image is compared.
After the image resolution enhancement step, the resolution-enhanced images are used to generate an intensified 3D thermal model with a resolution of about 11 cm, whereas the low-resolution 3D thermal model (generated from the original thermal images) has a resolution of 22 cm. Figure 8 presents the low-resolution 3D model and the intensified 3D model, and Figure 9 shows 3D views of these two 3D models and their corresponding 3D thermal models.

Discussion
In this section, the quality of the low-resolution and intensified 3D thermal models is compared with that of the reference 3D model, which has a resolution of 4 cm and is generated from RGB images captured with a high-resolution RGB camera.
To compare the low-resolution and intensified 3D thermal models with the reference 3D model, the produced 3D thermal models are interpolated to the resolution of the reference 3D model by bi-cubic interpolation. Figure 10 shows a visual comparison of the low-resolution and intensified 3D models against the reference 3D model, together with their corresponding orthophotos. Figure 11 shows a visual comparison of 3D views of object edges in the low-resolution and intensified 3D models. The visual comparison shows that objects in the intensified 3D model have more detail than in the low-resolution 3D model, and that edges in the intensified 3D model are sharper.
In Figure 10, two objects in the 3D models and orthophotos are compared visually. As can be seen, the intensified 3D model has more details: in the highlighted regions, small objects are more distinct in the intensified 3D model than in the low-resolution 3D model, owing to the higher quality of the images spatially enhanced by SISR. On the other hand, the intensified 3D model still falls short of the reference 3D model in detail and edge quality. This is expected, since the reference 3D model has a spatial resolution about three times finer, and Shermeyer and Van Etten (2019) showed that even when super-resolved images are produced with a higher scale factor, SISR methods cannot match the quality of native images at the same spatial resolution. They also demonstrated that increasing the scale factor can degrade the results relative to lower scale factors and introduce artificial structures into the outputs, which is not acceptable.
In Figure 11, three objects are shown in 3D view to compare edge sharpness visually, with a magnified view of each object. The edges of the objects are clearly sharper in the intensified 3D model than in the low-resolution 3D model. However, as noted above, there are still shortcomings compared with the reference 3D model.
In addition to visual evaluation, quantitative assessment of the produced 3D models is particularly important. As mentioned before, the 3D thermal model is a 3D model onto which the texture (orthophoto) is mapped, so the spatial accuracy of the intensified 3D model and of the orthophoto is evaluated separately. To measure the effect of SISR on the quality of the intensified 3D model, the RMSE and MRE metrics are calculated once between the low-resolution 3D model and the reference 3D model and once between the intensified 3D model and the reference 3D model (Table 3).
In addition, given the importance of edge sharpness in 3D models, the quality of the edges is evaluated using MTF curves and the EFM metric. For this purpose, suitable edges are first extracted from the 3D models using the Canny edge detector. Line segments are then detected from the edges using the direct Hough transform (Mukhopadhyay and Chaudhuri 2015). Finally, segments that are too long, too short or slanted are eliminated based on their computed length and slope. Figure 12 shows the edges extracted from the 3D model that are suitable for computing the MTF curves and the EFM metric. The mean EFM values over all extracted edges are 0.928 and 0.979 for the low-resolution and intensified 3D models, respectively. The higher mean EFM value of the intensified 3D model indicates that it is closer to the reference 3D model than the low-resolution 3D model is. For further analysis, eight edges well distributed across the study area are selected for evaluation; an overview of these samples is provided in Figure 13.
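The final filtering step can be sketched as below. It assumes that line segments have already been produced by a Hough-transform stage, for instance OpenCV's `cv2.HoughLinesP` applied to a `cv2.Canny` edge map (a probabilistic Hough variant, not necessarily the direct Hough transform used here), and it interprets "slanted" as deviating from the image rows and columns. The function name and all thresholds are hypothetical placeholders.

```python
import math
from typing import List, Tuple

# A line segment as (x1, y1, x2, y2) in pixel coordinates.
Segment = Tuple[float, float, float, float]


def filter_segments(
    segments: List[Segment],
    min_len: float = 10.0,      # hypothetical lower length bound (pixels)
    max_len: float = 60.0,      # hypothetical upper length bound (pixels)
    max_tilt_deg: float = 5.0,  # hypothetical tolerance around 0/90 degrees
) -> List[Segment]:
    """Keep segments of moderate length that are close to horizontal or vertical."""
    kept: List[Segment] = []
    for x1, y1, x2, y2 in segments:
        length = math.hypot(x2 - x1, y2 - y1)
        if not (min_len <= length <= max_len):
            continue  # too short or too long
        # Orientation folded into [0, 180) degrees.
        angle = math.degrees(math.atan2(y2 - y1, x2 - x1)) % 180.0
        # Angular distance to the nearest image axis (0, 90 or 180 degrees).
        tilt = min(angle % 90.0, 90.0 - angle % 90.0)
        if tilt <= max_tilt_deg:
            kept.append((x1, y1, x2, y2))  # not slanted: keep
    return kept
```

For example, a 20-pixel horizontal segment is kept, while a 45-degree diagonal or a 3-pixel fragment is discarded.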
The edges in Figure 13 are magnified and compared in Figure 14. Each edge is shown in the reference, low-resolution and intensified 3D models, together with the corresponding MTF curves.
As can be seen in Figure 14, the MTF curves of the intensified 3D model are closer to those of the reference 3D model than the MTF curves of the low-resolution 3D model are, suggesting that the proposed method improves the geometric quality of the 3D model. However, although the edges are spatially enhanced in the intensified 3D model, some distortions remain along the edges.
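For readers unfamiliar with edge-based MTF estimation, the usual chain is: sample an intensity profile across the edge (the edge spread function, ESF), differentiate it to obtain the line spread function (LSF), and take the normalised magnitude of its Fourier transform. The sketch below implements that chain in plain Python for a single 1-D profile. It is an illustration, not the authors' implementation, and it omits the oversampling, windowing and edge-angle handling of full slanted-edge procedures such as ISO 12233.

```python
import cmath
from typing import List, Sequence, Tuple


def edge_mtf(esf: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Estimate an MTF curve from a 1-D edge spread function (ESF).

    Returns (frequencies in cycles/pixel, MTF values normalised to MTF(0) = 1).
    """
    # Line spread function (LSF): first difference of the ESF.
    lsf = [esf[i + 1] - esf[i] for i in range(len(esf) - 1)]
    n = len(lsf)
    if n < 2:
        raise ValueError("edge profile is too short")
    # Magnitude of the discrete Fourier transform of the LSF, from 0 up to
    # (at most) the Nyquist frequency of 0.5 cycles/pixel.
    mags = []
    for k in range(n // 2 + 1):
        coeff = sum(lsf[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        mags.append(abs(coeff))
    if mags[0] == 0:
        raise ValueError("profile has zero edge contrast")
    freqs = [k / n for k in range(n // 2 + 1)]
    return freqs, [m / mags[0] for m in mags]


# Usage: an ideal step edge versus the same edge blurred into a 4-pixel ramp.
sharp = [0.0] * 8 + [1.0] * 8
blurred = [0.0] * 6 + [0.25, 0.5, 0.75] + [1.0] * 7
_, mtf_sharp = edge_mtf(sharp)
_, mtf_blurred = edge_mtf(blurred)
```

The ideal step yields a flat MTF of 1 at every frequency, whereas the blurred edge decays towards the Nyquist frequency; a sharper edge in a 3D model therefore shows up as an MTF curve that stays higher for longer.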
Finally, the EFM metric is computed to compare MTF curves with respect to the reference 3D model (Table 4).
As can be seen in Table 4, the intensified 3D model generated by the proposed method has higher closeness values, indicating a higher spatial quality than the low-resolution 3D model. To evaluate the spatial accuracy of the orthophoto generated from the enhanced (high-resolution) thermal images, the MTF curve is also plotted for each selected edge of the 3D model in the low-resolution, enhanced and reference orthophotos, and the average MTF curve is then calculated for each orthophoto. As can be seen in Figure 15, the MTF curve of the enhanced orthophoto shows lower degradation; therefore, the spatial resolution of the enhanced orthophoto is better than that of the low-resolution orthophoto.
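Averaging per-edge MTF curves and summarising the result can be sketched as follows, assuming all curves are sampled at the same frequencies. MTF50 (the frequency at which the MTF falls to 0.5) is used here as one common scalar measure of how quickly a curve degrades; the comparison above is made graphically, so this summary and the function names are assumptions.

```python
from typing import List, Optional, Sequence


def average_curve(curves: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of MTF curves sampled at the same frequencies."""
    if not curves or len({len(c) for c in curves}) != 1:
        raise ValueError("curves must be non-empty and equally sampled")
    return [sum(values) / len(curves) for values in zip(*curves)]


def mtf50(freqs: Sequence[float], mtf: Sequence[float]) -> Optional[float]:
    """Frequency where the MTF first falls to 0.5, by linear interpolation.

    Returns None if the curve never drops to 0.5 within the sampled range.
    """
    for i in range(1, len(mtf)):
        m0, m1 = mtf[i - 1], mtf[i]
        if m0 > 0.5 >= m1:
            f0, f1 = freqs[i - 1], freqs[i]
            return f0 + (m0 - 0.5) * (f1 - f0) / (m0 - m1)
    return None


# Usage with illustrative curves (not data from this study):
freqs = [0.0, 0.125, 0.25, 0.375, 0.5]
enhanced = average_curve([[1.0, 0.9, 0.7, 0.5, 0.3], [1.0, 0.8, 0.6, 0.4, 0.2]])
low_res = average_curve([[1.0, 0.7, 0.4, 0.2, 0.1], [1.0, 0.6, 0.3, 0.2, 0.1]])
# A higher MTF50 means the curve degrades more slowly with frequency.
```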
It is noteworthy that if an interpolation-based SISR method (such as bicubic interpolation) is used instead of EDSR to enhance the resolution of the images, a dense 3D model is still produced, but its accuracy would be lower than that of the intensified 3D model generated by the proposed method. This is because, although interpolation is simple, it does not restore the high-frequency details of the image (Ooi and Ibrahim 2021).
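Bicubic interpolation is commonly implemented as separable cubic convolution with the Keys kernel (parameter a = -0.5). The 1-D sketch below, with hypothetical function names and border clamping chosen for simplicity, illustrates why such interpolation cannot restore lost detail: each output value is a fixed weighted sum of at most four neighbouring input samples, so it only blends existing values rather than inferring new high-frequency content, as a learned model such as EDSR attempts to do.

```python
import math
from typing import List, Sequence


def keys_kernel(x: float, a: float = -0.5) -> float:
    """Keys cubic convolution kernel; a = -0.5 is the usual bicubic choice."""
    x = abs(x)
    if x <= 1.0:
        return (a + 2.0) * x**3 - (a + 3.0) * x**2 + 1.0
    if x < 2.0:
        return a * x**3 - 5.0 * a * x**2 + 8.0 * a * x - 4.0 * a
    return 0.0


def cubic_upsample_1d(samples: Sequence[float], scale: int) -> List[float]:
    """Upsample a 1-D signal by an integer factor using cubic convolution.

    Output positions j / scale cover [0, len(samples) - 1]; indices outside the
    signal are clamped to the nearest border sample (a simplifying choice).
    """
    n = len(samples)
    out: List[float] = []
    for j in range((n - 1) * scale + 1):
        x = j / scale
        base = math.floor(x)
        value = 0.0
        # Each output is a weighted sum of the four nearest input samples.
        for k in range(base - 1, base + 3):
            idx = min(max(k, 0), n - 1)
            value += samples[idx] * keys_kernel(x - k)
        out.append(value)
    return out


# Usage: 3x upsampling of a linear ramp and of a step edge.
ramp = cubic_upsample_1d([0.0, 1.0, 2.0, 3.0, 4.0, 5.0], 3)
step = cubic_upsample_1d([0.0, 0.0, 0.0, 1.0, 1.0, 1.0], 3)
```

A 2-D bicubic upsampler applies the same kernel along rows and then along columns. The ramp is reproduced exactly away from the borders, but the step is only smoothed between its original samples.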
To illustrate this point, a dense 3D model is produced from bicubic-interpolated images (the bicubic dense 3D model) and compared with the intensified 3D model. To this end, the MTF curve is plotted for 20 edges in both the proposed intensified 3D model and the bicubic dense 3D model, and the average MTF curve is then calculated for each. As shown in Figure 16, the average MTF curve of the intensified 3D model shows lower degradation, indicating that the spatial resolution of the intensified 3D model is better than that of the bicubic dense 3D model.
In the MTF curve diagrams, the blue curve represents the MTF of the reference 3D model, the red curve the MTF of the low-resolution 3D model, and the green curve the MTF of the intensified 3D model.
Table 4. Closeness of the low-resolution and intensified 3D models to the reference 3D model.
Figure 15. The average MTF curve for selected edges in the low-resolution orthophoto (red curve), intensified orthophoto (green curve) and reference orthophoto (blue curve).
Figure 16. Comparison between the average MTF curves for some edges in the intensified 3D model (blue curve) and the bicubic dense 3D model (pink curve).

Conclusions
Nowadays, 3D thermal models that present thermal information together with 3D information on building roofs can play an important role in the energy management of urban areas. Considering the challenges of acquiring and registering multisource data, and of relying on hardware techniques to generate high-resolution 3D thermal models, this article presents a method for intensifying 3D thermal models using a SISR algorithm. The effect of SISR on 3D thermal model intensification is investigated using spatial resolution evaluation criteria. The efficiency of the proposed method was evaluated with two criteria, RMSE and MRE, by comparing the low-resolution and intensified 3D models against the reference 3D model. The evaluations indicate a 47% improvement in RMSE and a 0.03% improvement in MRE for the intensified 3D model, reflecting an increase in the quality of the result. For spatial evaluation, given the importance of edge information to 3D model quality, especially at building boundaries, the sharpness of the edges in the intensified 3D model is assessed using MTF curves and the EFM metric. The results show that the MTF curves of the intensified 3D model are closer to the MTF curve of the reference 3D model, and the EFM metric computed from the MTF curves yields higher values for the intensified 3D model than for the low-resolution 3D model. Likewise, the MTF curve of the enhanced orthophoto shows lower degradation, indicating a better spatial resolution than that of the low-resolution orthophoto. Although the evaluation demonstrates the efficiency of the proposed method in generating 3D thermal models with higher spatial resolution, further studies should focus on producing better results at object boundaries.
It is also necessary to investigate the achievable increase in spatial resolution and the effect of different DL-based SISR methods on the quality of 3D thermal models. Finally, to use the intensified 3D thermal model for interpretation purposes, radiometric calibration of the thermal sensor is recommended in order to determine absolute temperatures.