Spatio-temporal resolution enhancement for cloudy thermal sequences

ABSTRACT Many applications require remotely sensed brightness temperature (BT) data acquired with high temporal and spatial resolutions. In this regard, a viable strategy to overcome the physical limitations of space-borne sensors relies on fusing low temporal but high spatial resolution (HSR) data with high temporal but low spatial resolution data. The most promising methods rely on the fusion of spatially interpolated high temporal resolution data with temporally interpolated HSR data. However, the unavoidable presence of cloud masses in the acquired image sequences is often neglected, compromising the functionality and/or the effectiveness of most of these fusion algorithms. To overcome this problem, a framework combining temporal smoothing and spatial enhancement techniques is proposed to estimate surface BTs with high spatial and high temporal resolutions even when cloud masses corrupt the scene. Numerical results using real thermal data acquired by the SEVIRI sensor show the ability of the proposed approach to outperform techniques based on either interpolation alone or spatial sharpening alone, even when dealing with missing data due to the presence of cloud masses.


Introduction
Sensors on-board satellite platforms acquire thousands of images every day in order to continuously monitor huge geographical areas. This amount of data is usually processed to extract synthetic information (features) related to geophysical quantities. A product of broad interest, provided by the processing of brightness temperature (BT) data, is the land surface temperature (LST). It is indeed used as input in several applications (Running et al., 1994), such as forest fire monitoring (Eckmann, Roberts, & Still, 2008), urban, regional and global climate models (Meehl, 1994; Weng, 2009), crop growth modeling (de Wit & Van Diepen, 2008) and water resource management (Allen, Tasumi, & Trezza, 2007; IRRISAT Project).
In many monitoring applications, strict requirements about the spatial, spectral and temporal resolutions of the data are imposed. Very high resolutions in different domains are often required. Unfortunately, due to the physical constraints of modern sensors, these requirements cannot be met by acquiring data from a single source of information. Instead, the acquisition of data from multiple sensors, frequently mounted on-board multiple satellite platforms, represents the sole viable way to fulfill these strict requirements. Thus, data fusion is attracting increasing attention in the scientific community.
A successful instance of data fusion is the spatial enhancement (sharpening) of low spatial resolution (LSR) multispectral (MS) optical data using a high spatial resolution (HSR) single band optical (panchromatic) image. This data fusion approach is often called pansharpening, which stands for panchromatic sharpening.
This sharpening paradigm is also applied to the issue of fusing data acquired by sensors: i) on-board the same satellite platform, thus avoiding several problems such as image co-registration, or ii) on-board different satellite platforms (Zakšek & Oštir, 2012). A classical example of this second case is the sharpening of high temporal resolution (htr) but LSR thermal image sequences (or, vice versa, the temporal enhancement of HSR but low temporal resolution (ltr) thermal image sequences). This usually leverages geostationary data acquired with a htr but LSR (because of the distance from the Earth) and near-polar orbiting satellite data with complementary spatio-temporal features. Considering the temporal dimension of the acquired data, i.e. sequences instead of single images, the quality of the synthetic fused sequences can be improved. In this case, for some applications a real-time requirement could be imposed, i.e. only past observations can be considered in the data fusion framework, see e.g. the related authors' works in Addesso, Conte, Longo, Restaino, and Vivone (2012) and Addesso, Longo, Restaino, and Vivone (2013). For those applications where this requirement is not necessary (non-real-time or batch case), all the data collected in a given time interval can be considered by the data fusion approach (Addesso, Capodici, et al., 2013, 2015). In this work, the authors focus attention on the possibility of using smoothing techniques for enhancing the resolution of BT image series in non-real-time (batch) scenarios. The most promising methods rely on the fusion of spatially interpolated htr/LSR data with temporally interpolated ltr/HSR data. These techniques proved capable of capturing the temporal and spatial correlation exhibited by real data without incurring a heavy computational burden.
Hence, a framework combining spatio-temporally interpolated data with temporal smoothing and spatial enhancement techniques is proposed, thus leveraging the temporal correlation inside image sequences. In particular, a Bayesian smoother based on the Rauch-Tung-Striebel (RTS) algorithm and a pansharpening method belonging to the multi-resolution analysis family (an undecimated wavelet decomposition with a multiplicative injection scheme) are exploited for temporal smoothing and spatial sharpening, respectively. The former represents the most straightforward implementation within the class of equivalent optimal Bayesian smoothers (Brown & Hwang, 1992), while the latter has been selected since it does not require specific information regarding the acquisition sensors. Multi-resolution analysis approaches also have the capability of accurately preserving the spectral content of the LSR data, thus being suitable for future multichannel implementations of the proposed thermal image sharpening method.
When dealing with the fusion of real data sequences, the unavoidable presence of cloud masses in the acquired images has to be taken into account as well. This represents a serious issue for multitemporal techniques, such as the one proposed in this paper. Indeed, the incompleteness of image sequences compromises the functionality and/or the effectiveness of most fusion algorithms. To overcome this problem, the following steps are integrated in the proposed data fusion framework: i) a cloud detection approach to detect cloud masses in image sequences; ii) a module relying upon temporal interpolation (TI), whose aim is to fill the gaps in the image sequences due to cloud masses by estimating a value for the BT congruent with the available data.
The experiments are conducted on data acquired by the SEVIRI sensor, mounted on-board the Meteosat second generation geostationary platform, over the Iberian peninsula. More specifically, the IR 10.8 channel, characterized by the [9.80-11.80] μm spectral acquisition range, was used in the tests. A reduced-resolution assessment protocol is employed in this work. The purpose is to obtain a perfectly controlled scenario, suited for testing algorithms with any desired relationship among the characteristics of the fused data. To this aim, a single series of real SEVIRI images, acting as the sequence to estimate, or ground-truth (GT), is employed. In contrast, the fusing sequences are simulated by processing the GT according to Wald's protocol (Wald, Ranchin, & Mangolini, 1997). The experimental results show: i) the ability of the proposed approach to reach better performance than techniques based on either TI or spatial sharpening alone, and ii) the ability of the proposed technique to deal with missing (cloudy) data, also analyzing the robustness of the exploited TI strategies to cloud detection errors.
The work is organized as follows. Section 2 presents the formalization of the thermal image sequence enhancement problem and the proposed solution. Section 3 is devoted to the description of the cloud detection algorithm employed to assess the performance of the data fusion approach by varying cloud detection errors. Section 4 instead shows the assessment of the proposed approach. Finally, conclusions and future developments are drawn in Section 5.
A framework for performing spatio-temporal fusion in the presence of clouds
This section is devoted to the issue of fusing an htr/LSR thermal image sequence with a ltr/HSR thermal image sequence, producing a synthetic sequence with high resolution in both domains. The presence of cloud masses in the acquisition of these sequences further complicates this multi-sensor data fusion problem, and the estimation of the BT is surely a further challenging task to be addressed. In order to gain more insight, a specific setup is arranged. Nonetheless, the developed solution can be easily applied to many real-world problems.
In particular, the htr/LSR sequence is denoted as L = {L_k : k ∈ T_L} and the ltr/HSR sequence as H = {H_k : k ∈ T_H}. The goal is to get an estimate of the ideal htr/HSR sequence E = {E_k : k ∈ T_E}, denoted as Ê = {Ê_k : k ∈ T_E}, characterized by the same temporal resolution as L and the same spatial resolution as H. For the sake of simplicity, the support T_E of E is supposed to be the same as that of L, i.e. T_L.
This estimation problem is addressed by exploiting the Bayesian framework, whose suitability for inferring the physical characteristics of the ground surface from SEVIRI acquisitions has been recently demonstrated even in the presence of missing data (Masiello, Serio, et al., 2013, 2015). The sequence to be estimated, E, is modeled as a Markov process with states x_k, where k ∈ T_E = {1, …, N}, with N representing the maximum number of acquisitions from the htr/LSR sensor. x_k is the vectorial form of the acquired image E_k at time k, i.e. x_k = col(E_k), where col(·) arranges a matrix into a vector in a column-wise way. The observation vector, denoted as y_k, contains the acquired data organized in the same way. The time evolution of x_k and y_k is supposed to follow the linear dynamic models

x_{k+1} = A_k x_k + w_k, (1)
y_k = x_k + n_k, (2)

in which w_k ~ N(0, Q_k) is the process noise, assumed zero-mean Gaussian with covariance Q_k, and n_k ~ N(0, R_k) is the observation noise, assumed zero-mean Gaussian with covariance R_k.
The maximum a posteriori (MAP) solution of this problem, which is a key instance of Bayesian estimation (Van Trees, 2001), is searched for. In particular, the fixed-interval smoothing problem (Simon, 2006) is addressed; it consists in estimating the state at time k ∈ T_E by exploiting the whole observation sequence y_{1:N} = {y_1, …, y_N}. The MAP optimal estimate is given by

x̂_{k|N} = arg max_{x_k} p(x_k | y_{1:N}), (3)

where p(x_k | y_{1:N}) is the so-called smoothing distribution and x̂_{k|N} indicates the estimate at time k given the observations in the interval 1, …, N. The corresponding covariance matrix is denoted by P_{k|N}.
Unfortunately, in a real-world scenario, the ground surface can be masked by cloud masses. The above formalization does not take this issue into account. Indeed, the estimation problem assumes that all the observations are acquired in clear-sky conditions. All the pixels contaminated by clouds constitute missing information about the brightness and, thus, they cannot be considered as observations in the Bayesian framework. In order to deal with this further issue, the proposed strategy consists in i) detecting the corrupted pixels with a cloud detection algorithm and ii) roughly estimating the missing information by means of TI.
Hence, the general framework that allows one to get a consistent estimate of the complete htr/HSR sequence of BTs when cloud masses are present is depicted in Figure 1 and briefly outlined below:
(1) Temporal interpolation. This step is applied to both the htr/LSR sequence L and the ltr/HSR sequence H:
• the ltr/HSR sequence H = {H_k : k ∈ T_H} is interpolated onto the support T_L, with the cloudy pixels estimated via a TI algorithm, to obtain a sequence Ĥ = {Ĥ_k : k ∈ T_L}, see e.g. H_4 and Ĥ_4 in Figure 1;
• missing information represented by cloudy pixels in L = {L_k : k ∈ T_L} is approximated via TI to get a sequence L̂ = {L̂_k : k ∈ T_L}, see L_k and L̂_k for k ∈ {2, 3, 4} in Figure 1.
(2) Spatial interpolation (SI). The images in L̂ = {L̂_k : k ∈ T_L} are upsampled to the same scale as the images in H via SI to get the sequence L̃ = {L̃_k : k ∈ T_L}.
(3) Data fusion. The two pre-processed sequences L̃ and Ĥ are combined to get an estimate Ê of the htr/HSR sequence, where, for the sake of simplicity, T_E = T_L. In particular, L̃ and Ĥ are combined through a sharpening algorithm producing a first rough estimate of an htr/HSR sequence, denoted as S = {S_k : k ∈ T_L}. Afterwards, this sequence is used as input to a Bayesian smoother in order to refine the estimate of the BT series E.
In the next sections, more details about these three steps are provided.

Temporal interpolation
The TI of the ltr/HSR sequences can be addressed in several ways. Sophisticated approaches are based on the spatial correlation among neighboring pixels (Garcia, 2012). However, for computational reasons, a simpler approach is considered in this work; it consists in managing each pixel independently, thus applying deterministic one-dimensional interpolation algorithms. Superior performance could be achieved by taking into account a priori information such as the temporal evolution of temperatures during the day (Schädlich, Göttsche, & Olesen, 2012), described by diurnal temperature cycle (DTC) models (Duan, Li, Wang, Wua, & Tang, 2012). However, in real scenarios DTC models are not accurate enough due to the presence of clouds.
Hence, in view of applying this framework in non-ideal conditions, TI without any a priori information given by DTC models is exploited. The interpolation procedure is applied to: (1) The ltr/HSR BTs H = {H_k : k ∈ T_H}, in order to obtain a full sequence of interpolated temperatures Ĥ = {Ĥ_k : k ∈ T_L}.
(2) The L sequence, only for pixels affected by clouds (detected, for instance, by the cloud detection algorithm in Section 2.1.1), in order to get an estimate of these missing data (indeed, BTs are unavailable for cloudy pixels). This leads to the new sequence L̂ = {L̂_k : k ∈ T_L}.
The key issue characterizing the TI of real sequences is represented by the large non-uniform intervals among available samples. Thus, in practice, an optimal interpolation scheme is hard to define and the interpolation is commonly performed through heuristic block-wise procedures. In the numerical results of this work, two possible TI strategies are considered, pointing out their pros and cons in real-world scenarios. In particular, these strategies are: i) a block-wise cubic interpolation (CI), and ii) a polynomial fitting obtained via a minimum mean square error (MMSE) criterion.
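The two TI strategies can be sketched on a single pixel's time series as follows. This is a minimal illustration with hypothetical function names; the block-wise partitioning actually used in the experiments is not reproduced here.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def fill_gaps(t, bt, cloudy, method="ci", deg=4):
    """Fill the cloudy samples of one pixel's BT time series.

    t      : acquisition times (1-D array)
    bt     : brightness temperatures, unreliable where `cloudy` is True
    cloudy : boolean mask of cloud-contaminated samples
    method : "ci"   -> piecewise cubic interpolation through clear samples
             "mmse" -> least-squares (MMSE) polynomial fit of degree `deg`
    """
    t, bt = np.asarray(t, float), np.asarray(bt, float)
    clear = ~np.asarray(cloudy, bool)
    out = bt.copy()
    if method == "ci":
        # cubic interpolation passes exactly through every clear sample
        out[~clear] = CubicSpline(t[clear], bt[clear])(t[~clear])
    else:
        # the MMSE polynomial fit uses all clear samples jointly, which
        # averages out isolated cloud-detection errors
        coef = np.polyfit(t[clear], bt[clear], deg)
        out[~clear] = np.polyval(coef, t[~clear])
    return out
```

The CI variant reproduces the clear samples exactly, which is why isolated detection errors propagate into the fused product, while the MMSE fit trades fidelity for robustness, consistent with the behavior reported in the numerical results.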
The cloud detection algorithm
Cloud detection for remote sensing images is frequently performed by using MS information, as e.g. in De Ruyter de Wildt, Seiz, and Gruen (2007), in order to fully exploit the differences between clouds and other kinds of surface (e.g. snow cover). Moreover, it is possible to include a priori information of different nature, such as spatial and temporal information (see e.g. Li, 2009; Vivone, Addesso, Conte, Longo, & Restaino, 2014), within a Bayesian formalization. In the proposed framework, the choice of the proper cloud detection algorithm is left to the user and to the related literature. In this paper, a simple algorithm that allows one to assess the performance of data fusion in the presence of cloud detection errors is exploited, see Section 3 for details.

Spatial interpolation
SI of images has been the subject of many papers in the technical literature, see e.g. (Aiazzi, Baronti, Selva, & Alparone, 2013; Keys, 1981; Meijering, Niessen, & Viergever, 2001; Parker, Kenyon, & Troxel, 1983). A computationally efficient way to implement two-dimensional interpolation methods, without significantly reducing the quality of the final product, consists in handling the vertical and horizontal dimensions independently. Accordingly, in this paper the analysis is restricted to one-dimensional approaches and, in particular, a bi-CI is chosen. It is computed by convolving the image pixel values with the one-dimensional kernel

h(x) = (a + 2)|x|³ − (a + 3)|x|² + 1, for |x| ≤ 1;
h(x) = a|x|³ − 5a|x|² + 8a|x| − 4a, for 1 < |x| < 2;
h(x) = 0, otherwise,

in which a is a parameter and x is a generic variable indexing the vertical or the horizontal dimension, the operator being applied in sequence to the columns and to the rows of the image. Note that h(0) = 1 and h(n) = 0 for all n ∈ N⁺. The parameter a is set to −0.5 (which corresponds to the cubic Hermite spline), producing third-order convergence with respect to the sampling interval of the original function (Keys, 1981). Hence, starting from the temporally interpolated sequence L̂ = {L̂_k : k ∈ T_L}, the upsampled (spatially interpolated) sequence L̃ = {L̃_k : k ∈ T_L} is made available to be used for data fusion, see Section 2.3.
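For illustration, the cubic-convolution kernel and a one-dimensional upsampling pass can be sketched as follows. Function names are hypothetical, and the border handling by sample replication is an assumption, as the text does not specify it.

```python
import numpy as np

def keys_kernel(x, a=-0.5):
    """One-dimensional cubic-convolution kernel of Keys (1981)."""
    x = np.abs(np.asarray(x, float))
    out = np.zeros_like(x)
    near = x <= 1
    far = (x > 1) & (x < 2)
    out[near] = (a + 2) * x[near]**3 - (a + 3) * x[near]**2 + 1
    out[far] = a * (x[far]**3 - 5 * x[far]**2 + 8 * x[far] - 4)
    return out

def upsample_1d(signal, ratio, a=-0.5):
    """Upsample a 1-D signal by an integer ratio via cubic convolution.
    Applied first to the columns and then to the rows of an image, this
    yields the separable bi-cubic SI described in the text."""
    n = len(signal)
    # positions of the fine-grid samples in coarse-grid coordinates
    pos = np.arange(n * ratio) / ratio
    base = np.floor(pos).astype(int)
    out = np.zeros(n * ratio)
    for offset in (-1, 0, 1, 2):  # 4-tap support of the cubic kernel
        idx = np.clip(base + offset, 0, n - 1)  # replicate borders
        out += signal[idx] * keys_kernel(pos - (base + offset), a)
    return out
```

Because h(0) = 1 and h(n) = 0 for integer n ≠ 0, the original samples are reproduced exactly on the fine grid.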

Data fusion
In the linear Gaussian case, the optimal solution of (3) can be obtained through several methods (Bryson & Frazier, 1962; Fraser & Potter, 1969; Mayne, 1966; Rauch, Tung, & Striebel, 1965), which are characterized by different computational requirements and robustness properties (Crassidis & Junkins, 2012). Among these implementations, the RTS approach (Rauch et al., 1965) has been selected. The state-space model described by (1) and (2) plays a crucial role within the Bayesian smoothing problem (Ghahramani, 2001), since closed-form solutions are founded on its correctness. In the following sections, the linear dynamic model (1) adopted for describing the time evolution of the brightness and the linear observation model (2) for producing an observation sequence from the available data are described. Afterwards, some details are also provided about the adopted RTS Bayesian smoothing approach.
Linear dynamic model
Following the rationale explained in Addesso, Conte, Longo, Restaino, & Vivone (2015), the brightness daily variation of each pixel is computed independently, namely a diagonal state transition matrix

A_k = diag(col(L̃_{k+1}) ⊘ col(L̃_k))

is employed, where diag(·) builds a diagonal matrix from a vector and ⊘ is the component-wise division.
The covariance matrix of the process noise Q_k is defined as Q_k = σ_m² I for all k, where I is the identity matrix and σ_m is the standard deviation of the process noise. In particular, the information for modeling the time evolution of the brightness of the HSR pixels is extracted from the available LSR images. Although carried out at a coarser resolution, the estimation of the transition matrix A_k allows one to roughly take into account the physical phenomena that influence the brightness, e.g. the effect of the incoming solar radiation.
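As a minimal sketch, the diagonal of A_k can be estimated pixel-wise as the brightness ratio between two successive (spatially interpolated) LSR images; the clamping of the denominator is a numerical safeguard added here, not part of the text.

```python
import numpy as np

def transition_matrix_diag(l_now, l_next, eps=1e-6):
    """Diagonal of A_k estimated from two successive upsampled LSR
    images: each pixel's brightness ratio between times k+1 and k.
    Returned as a vector a_k; the full matrix is A_k = diag(a_k), so
    the propagation step reduces to an element-wise product."""
    a = l_next.ravel() / np.maximum(l_now.ravel(), eps)
    return a  # state propagation: x_{k+1|k} = a * x_{k|k}
```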

Linear observation model
The observation model (2) is a key element of this data fusion approach. Since the goal is to get an estimate of an ideal htr/HSR sequence that has the same spatial characteristics as the ltr/HSR sequence, the observation is the ltr/HSR image for k ∈ T_H, i.e. where the latter information is available. This can be easily formalized in the Bayesian framework by setting y_k = col(Ĥ_k) and the observation noise covariance to zero, i.e. R_k = 0.
Instead, for k ∈ T_E \ T_H, i.e. when HSR images are not available, the straightforward approach consists in using the LSR image as observation. However, this method requires an accurate model of the statistical relationship between the high and low spatial resolution representations of the illuminated scene, which constitutes a well-known weakness of Bayesian approaches (Fasbender, 2008). Accordingly, an approach that condenses all the available information into a single comprehensive observation is exploited. To this aim, the proposed method employs a sharpening procedure, which yields a first rough approximation of the desired htr/HSR image. Thus, the two intermediate sequences L̃ and Ĥ are combined through a deterministic fusion rule F(·,·), producing the rough estimate of the htr/HSR sequence S = {S_k : k ∈ T_L}. The employed fusion rule F(·,·) is based on the high-pass modulation (HPM) injection scheme (Schowengerdt, 2007), which is well credited in the pansharpening literature (Aiazzi, Alparone, Baronti, Pippi, & Selva, 2002; Vivone, Restaino, Dalla Mura, Licciardi, & Chanussot, 2014). Thus, the sharpened image is defined by the following fusion rule

S_k = L̃_k ∘ (Ĥ_k ⊘ Ĥ_k^LP),

where Ĥ_k^LP is a low-pass version of the HSR image Ĥ_k, and ∘ and ⊘ denote the element-wise product and division. Ĥ_k^LP is obtained by a multi-resolution analysis implemented through the undecimated "à trous" algorithm (Starck, Fadili, & Murtagh, 2007), choosing a B_3 cubic spline as scaling function (Strang & Nguyen, 1996). Thus, the use of an enhanced observation (via a sharpening approach) allows one to reduce the measurement uncertainty; in turn, this makes it reasonable to model the observation noise through a spatially uncorrelated Gaussian process with standard deviation σ_o.
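A sketch of the HPM fusion rule follows, with the low-pass image obtained through a few levels of the undecimated "à trous" decomposition with the B3 spline filter. The number of levels and the division safeguard are assumptions made here, not prescriptions of the text.

```python
import numpy as np
from scipy.ndimage import convolve1d

B3 = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0  # B3 cubic-spline filter

def atrous_lowpass(img, levels):
    """Low-pass version of `img` via `levels` stages of the undecimated
    "a trous" algorithm: at level j the filter taps are spaced 2**j
    pixels apart (the "holes"), with no decimation."""
    lp = img.astype(float)
    for j in range(levels):
        k = np.zeros(4 * 2**j + 1)
        k[::2**j] = B3  # insert 2**j - 1 zeros between the taps
        lp = convolve1d(lp, k, axis=0, mode='nearest')
        lp = convolve1d(lp, k, axis=1, mode='nearest')
    return lp

def hpm_sharpen(l_tilde, h_hat, levels=2, eps=1e-6):
    """High-pass-modulation rule: S_k = L_tilde * (H_hat / H_hat_LP)."""
    h_lp = atrous_lowpass(h_hat, levels)
    return l_tilde * (h_hat / np.maximum(h_lp, eps))
```

The multiplicative ratio Ĥ_k / Ĥ_k^LP injects the spatial detail of the HSR image while leaving flat regions of L̃_k untouched.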
Summarizing, the proposed observation model can be written as

y_k = col(Ĥ_k), with R_k = 0, for k ∈ T_H;
y_k = col(S_k), with R_k = σ_o² I, for k ∈ T_E \ T_H.

The RTS smoothing
The RTS smoothing approach (Rauch et al., 1965) is an efficient two-step algorithm for fixed-interval smoothing. In the following, the forward and backward steps composing the smoother are briefly detailed.
(1) Forward filter. The forward filter is a classical Kalman filter (KF) (Kalman, 1960). It represents the optimal solution for estimating the state x_k in the linear Gaussian case when only the observations acquired up to time k are available. It consists in recursively calculating the mean x̂_{k|k} and the covariance P_{k|k} of the state distribution p(x_k | y_{1:k}), which is thus completely specified thanks to the Gaussian hypothesis.
The KF starts from the initial estimates of the state x̂_{0|0} and of the error covariance P_{0|0} and proceeds through the following recursion, for k = 1, …, N:
• Propagation step. Computation of the a priori estimates
x̂_{k|k−1} = A_{k−1} x̂_{k−1|k−1},
P_{k|k−1} = A_{k−1} P_{k−1|k−1} A_{k−1}^T + Q_{k−1}.
• Update step. Computation of the posterior estimates
x̂_{k|k} = x̂_{k|k−1} + G_k (y_k − x̂_{k|k−1}),
P_{k|k} = (I − G_k) P_{k|k−1},
where G_k = P_{k|k−1} (P_{k|k−1} + R_k)^{−1} is the so-called Kalman gain.
(2) Backward filter. In the backward step, the smoothed state estimates x̂_{k|N} and covariances P_{k|N} are calculated. The computation starts from the (last) time N and proceeds backwards in time using the recursive equations

C_k = P_{k|k} A_k^T P_{k+1|k}^{−1},
x̂_{k|N} = x̂_{k|k} + C_k (x̂_{k+1|N} − x̂_{k+1|k}),
P_{k|N} = P_{k|k} + C_k (P_{k+1|N} − P_{k+1|k}) C_k^T,

where x̂_{k|k} and P_{k|k} are the posterior state estimate and covariance at time k, respectively, obtained by the update step of the forward KF; x̂_{k+1|k} and P_{k+1|k} are the a priori state estimate and covariance at time k+1, respectively, obtained by the propagation step of the forward KF; and x̂_{k|N} and P_{k|N} are the final posterior state estimate and covariance at time k for this fixed-interval smoothing approach.
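Since the model uses a diagonal A_k and pixel-wise noise, the smoother reduces to independent scalar recursions. A sketch follows, vectorized over pixels; the function name and array layout are choices made here.

```python
import numpy as np

def rts_smooth(y, a, r, q, x0, p0):
    """Per-pixel RTS smoother for the scalar model
        x_k = a_k * x_{k-1} + w_k,  w_k ~ N(0, q)
        y_k = x_k + n_k,            n_k ~ N(0, r_k)
    y : (N, P) observations, a : (N, P) transition gains (a[k] propagates
    the state from step k-1 to step k), r : (N, P) observation variances
    (0 when an HSR image is used). Returns the smoothed means, (N, P)."""
    N, P = y.shape
    xf = np.zeros((N, P)); pf = np.zeros((N, P))   # filtered
    xp = np.zeros((N, P)); pp = np.zeros((N, P))   # predicted
    x, p = x0, p0
    for k in range(N):
        x_pr = a[k] * x                    # propagation step
        p_pr = a[k]**2 * p + q
        xp[k], pp[k] = x_pr, p_pr
        g = p_pr / (p_pr + r[k])           # Kalman gain; r=0 forces the
        x = x_pr + g * (y[k] - x_pr)       # observation through exactly
        p = (1 - g) * p_pr
        xf[k], pf[k] = x, p
    xs = xf.copy()
    for k in range(N - 2, -1, -1):         # backward (smoothing) pass
        c = pf[k] * a[k + 1] / pp[k + 1]   # smoother gain C_k
        xs[k] = xf[k] + c * (xs[k + 1] - xp[k + 1])
    return xs
```

For each pixel this is exactly the forward KF followed by the backward RTS correction, with matrices collapsed to scalars.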

Cloud detection algorithm for data fusion performance evaluation
Cloud detection algorithms for remotely sensed images are devoted to discriminating, pixel-by-pixel, between the following two hypotheses: H_0: the pixel is non-cloudy; H_1: the pixel is cloudy.
The output is an estimated cloud mask, which identifies the cloudy pixels by means of (∀ k ∈ T_E) a binary image M̂_k with components in the set C = {non-cloudy, cloudy} = {0, 1}. The estimated cloud mask unavoidably differs from the actual one (say M_k). Thus, two different error probabilities can be considered: • the false cloud probability P_fc, also called Type I error probability, i.e. the probability of deciding H_1 (cloudy pixel) when H_0 is true (non-cloudy pixel); • the miss cloud probability P_mc, also called Type II error probability, i.e. the probability of deciding H_0 when H_1 is true.
The goal of this work is to assess data fusion performance when unavoidable misclassification errors arise from the cloud detector. Therefore, the cloud detection is carried out through a simple likelihood ratio test (LRT), which allows one to tune the error probabilities by simply changing the threshold value. It is worth pointing out that any other state-of-the-art cloud detection approach can be substituted for the LRT in real applications, where the goal is to get the best performance from the data fusion framework, achieved also via refinements of the cloud detection strategy. The exploited LRT approach can be formalized as follows: • a MS image acquired at time k is denoted as D_k, where D_k ∈ R^{B×N} has B bands and N pixels (each row contains the lexicographically ordered pixels of a given band); • the value at pixel n, i.e. d_k(n) = D_k(·, n), is modeled by a multivariate normal distribution under both hypotheses H_0 and H_1.
Therefore, ∀ k, one can write

p(d | H_i) = (2π)^{−B/2} |Σ_{k,i}|^{−1/2} exp(−(1/2) (d − μ_{k,i})^T Σ_{k,i}^{−1} (d − μ_{k,i})),

where d ∈ R^B is a generic pixel, μ_{k,i} ∈ R^B is the mean vector at time k for the i-th hypothesis (i ∈ {0, 1}), Σ_{k,i} ∈ R^{B×B} is the covariance matrix at time k for the i-th hypothesis, and (·)^T and (·)^{−1} are the transpose and inverse operators, respectively. The detection strategy, based on the log-likelihood ratio Λ_k(·), is: decide H_1 if

Λ_k(d) = log p(d | H_1) − log p(d | H_0) > γ,

and H_0 otherwise, where γ is a threshold that generates, for each k, a different couple (P_fc(γ, k), P_mc(γ, k)) of false cloud and miss cloud probabilities. The receiver operating characteristic (ROC) of the test can be defined in terms of the average values of P_fc(γ, k) and P_mc(γ, k) over the whole sequence, say P_fc(γ) and P_mc(γ), respectively. Finally, the estimated cloud mask, for each k, is given by

M̂_k(n) = H(Λ_k(d_k(n)) − γ),

where H(·) is the Heaviside step function.
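The detection rule above can be sketched as follows, vectorized over the N pixels of D_k; function names are hypothetical.

```python
import numpy as np

def log_likelihood_ratio(D, mu0, S0, mu1, S1):
    """Log-likelihood ratio log p(d|H1) - log p(d|H0) for each pixel.
    D: (B, N) image; mu_i: (B,) mean and S_i: (B, B) covariance of the
    Gaussian model under hypothesis H_i."""
    def loggauss(D, mu, S):
        d = D - mu[:, None]
        Si = np.linalg.inv(S)
        # Mahalanobis distance of every pixel, computed in one shot
        maha = np.einsum('bn,bc,cn->n', d, Si, d)
        _, logdet = np.linalg.slogdet(S)
        B = D.shape[0]
        return -0.5 * (maha + logdet + B * np.log(2 * np.pi))
    return loggauss(D, mu1, S1) - loggauss(D, mu0, S0)

def cloud_mask(D, mu0, S0, mu1, S1, gamma=0.0):
    """Estimated mask: 1 (cloudy) where the LRT exceeds the threshold."""
    return (log_likelihood_ratio(D, mu0, S0, mu1, S1) > gamma).astype(int)
```

Sweeping `gamma` trades P_fc against P_mc, which is how the ROC operating points used in the experiments are generated.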

Numerical results
This section is devoted to the assessment of the proposed Bayesian smoothing approach on real scenarios acquired by the SEVIRI sensor. A further test concerns the design of the TI operator under a criterion of robustness with respect to cloud masking errors.

Dataset description
As mentioned above, thermal image sequences acquired by the SEVIRI sensor (band IR 10.8) are employed. In particular, the reported outcomes are related to data collected on 16 August 2014 over the Iberian peninsula (latitude between 35.7 and 41.4 degrees North, longitude between 4.1 and 9.8 degrees West), hereafter the Spain dataset, characterized by a spatial resolution of about 6 km and a temporal rate of four images per hour. The protocol for accurately assessing the quality of the fused products is as follows. The original dataset plays the role of the htr/HSR sequence E to be estimated. H is simulated by selecting a subset of E with a temporal interval Δ_H = 8 between each couple of ltr/HSR images. L is simulated by generating a spatially degraded version of E, imposing a spatial resolution ratio R between E and L equal to 6. The spatial degradation is obtained by first applying a Gaussian filter matched to the sensor's modulation transfer function to the target image and then downsampling the outcome by a factor equal to R.
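The spatial degradation step can be sketched as follows. Since the MTF-matched filter is sensor-specific, a generic Gaussian whose width is tied to the scale ratio is assumed here in its place.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def degrade(E, R, sigma=None):
    """Simulate the LSR image per Wald's protocol: low-pass filter the
    GT image (a Gaussian stands in for the MTF-matched filter, which is
    sensor-specific) and decimate by the resolution ratio R."""
    if sigma is None:
        sigma = R / 2.0  # hypothetical width tied to the scale ratio
    blurred = gaussian_filter(np.asarray(E, float), sigma, mode='nearest')
    return blurred[::R, ::R]
```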
In order to assess the robustness of the framework with respect to cloud mask errors, real cloud patterns, say S_k, acquired on 17 June 2013 over the southern part of Italy are superimposed on the clear-sky image sequence described before. In order to merge these two real sequences, making available both the GT (original clear-sky sequence) and the observed cloudy sequence (clear-sky + real cloudy sequence), the following steps are performed:
• Computation of the cloud mask sequence M_k on the cloudy sequence S_k.
• Construction of a coefficient map sequence W_k from the cloud mask via morphological operators (Soille, 2003). Namely, a number N_E of successive erosions is performed ∀ k in order to get N_E eroded maps ε_B^{n_e}[M_k], where n_e is an integer such that n_e ∈ [1, N_E], ε_B[·] is the erosion operator with structuring element B (Soille, 2003) and ε_B^{n_e}[·] is obtained by performing n_e erosion steps; the coefficient map is then computed by the formula

W_k = (1/N_E) Σ_{n_e=1}^{N_E} ε_B^{n_e}[M_k].

• Addition of the clouds to the clear-sky sequence. For the sake of simplicity, only the sequence between two HSR images is changed, according to the following equation:

E_k' = W_k ∘ S_k + (1 − W_k) ∘ E_k,

in which 1 denotes a matrix with all entries equal to one and the operator "∘" indicates the element-wise Hadamard product. This procedure has been performed for the interval between and 16:00 UTC, resulting in the synthetic data sequence shown in Figure 2. The corresponding cloud cover fraction, computed for each k, is shown in Figure 3. The main advantage of this protocol is its accuracy, due to its ability to produce a GT for the assessment. Instead, the main drawbacks are the following:
• the suitability of the employed assessment procedure strictly depends on the accuracy of the scale invariance hypothesis;
• the reference (GT) temperature is supposed not to be affected by the presence of the fictitious clouds.
Actually, the last issue does not imply a significant loss of generality, because no physical model of the temperature is employed in this framework. In fact, since the proposed approach exploits only the information contained in the available images, it is highly reasonable to assume that the same behavior would be shown with different surface temperature trends.
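The coefficient-map construction and the cloud blending can be sketched as follows. The 3×3 structuring element and the averaging of the N_E successive erosions are one plausible reading of the (garbled) formula in the text, and the function names are hypothetical.

```python
import numpy as np
from scipy.ndimage import binary_erosion

def coefficient_map(M, n_e_max=3):
    """Soft blending map from a binary cloud mask M: averaging N_E
    successive erosions makes W fade from 1 (cloud core) to 0 outside,
    softening the cloud boundaries."""
    W = np.zeros(M.shape, float)
    eroded = M.astype(bool)
    for _ in range(n_e_max):
        eroded = binary_erosion(eroded)  # default cross structuring element
        W += eroded
    return W / n_e_max

def add_clouds(E, S, W):
    """Blend the cloudy scene S onto the clear-sky GT E via element-wise
    (Hadamard) products: E' = W * S + (1 - W) * E."""
    return W * S + (1.0 - W) * E
```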

Cloud detection impact on data fusion
The datasets simulated through the procedure described in the previous section allow us to assess the performance of the data fusion framework in real conditions (presence of cloud masses); the effectiveness of the approach is evaluated in the following two cases: • perfect cloud classification (PCC), in which perfect cloud/no-cloud pixel classification is assumed; • non-perfect cloud classification (NPCC), where the cloudy pixels are identified via the cloud detection algorithm described in Section 3 by using two thermal bands (namely IR 10.8 and IR 13.4, as used in De Ruyter de Wildt et al., 2007) for different values of P_fc and P_mc.
The data used to estimate the mean vectors μ_{k,0} and μ_{k,1} and the related covariance matrices Σ_{k,0} and Σ_{k,1} exploited by the cloud detection algorithm have been acquired over the southern part of Italy on 12, 13, 18 and 19 June 2013 and on 3, 5 and 7 August 2013. The accuracy is measured in terms of the root mean square error (RMSE), defined as sqrt(E[(I − J)²]), where I is the ground truth, J is the estimated image, and E[·] indicates the sample average over the pixels.
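The figure of merit is a direct transcription of the definition above:

```python
import numpy as np

def rmse(I, J):
    """Root mean square error between GT image I and estimate J,
    averaged over the pixels (values in kelvin)."""
    I, J = np.asarray(I, float), np.asarray(J, float)
    return float(np.sqrt(np.mean((I - J) ** 2)))
```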
The performance of the algorithm in terms of RMSE is summarized in Table 1 as a function of the cloud detection error probabilities P_fc and P_mc and further illustrated by the plots in Figure 4. As expected from a preliminary study, the performance of the CI-based scheme, which is the best one in a PCC scenario (RMSE_CI = 0.884 K), is strongly deteriorated when P_mc increases, whereas the MMSE-based scheme, which is less effective in a PCC scenario (RMSE_MMSE = 1.027 K), is more robust than CI to cloud detection errors, as shown in Figure 4 panel (b). Furthermore, note that the other error probability, i.e. P_fc, has a minor impact on the data fusion effectiveness, as shown in Figure 4 panel (a). These outcomes are also corroborated by Figure 5 in which, for both the PCC and the NPCC cases, the BT RMSEs over time are reported.
Finally, in Figure 6, the data fusion outcomes for different values of P_mc are depicted, focusing on time 9:30 UTC. This visual analysis points out that, even if the global RMSE is low, there are evident artifacts in the CI-based fused products even for low P_mc values. These are mainly due to errors in the detection of clouds (probably on cloud boundaries) that are propagated to the current time because of the CI TI. On the other hand, the MMSE scheme is much more robust to cloud errors, and some artifacts emerge only for high values of P_mc, as shown in Figure 6 panel (l).

Conclusions and future developments
High spatial and high temporal resolution thermal sequences are of great interest for monitoring wide rural and urban areas. Unfortunately, because of physical constraints, these sequences cannot be acquired by a single sensor. Instead, data fusion techniques are often required to fulfill this task. In this paper, a framework for fusing htr but LSR sequences with HSR but ltr sequences is proposed. It works in non-real time (batch mode) and is able to enhance thermal sequences even when missing data (clouds) affect the scene. The proposed framework consists of temporal smoothing and spatial sharpening techniques. In particular, a Bayesian smoother relying upon the RTS algorithm and a pansharpening method belonging to the multi-resolution analysis family (undecimated wavelet decomposition with HPM injection) are exploited together to get the synthetic htr/HSR sequence. The issue of missing data due to the presence of cloud masses is addressed by a cloud detection algorithm followed by a TI that estimates the landscape under the clouds, thus providing a complete thermal map sequence even in a cloudy environment.
The numerical results using real data acquired by the SEVIRI sensor (band IR 10.8) demonstrate the capability of the proposed framework to counteract, to a certain extent, the effects of cloud coverage and even of cloud detection errors. In particular, the MMSE-based scheme is much more robust to cloud detection errors than the CI-based one, thus being more appropriate for cloudy scenarios. Instead, CI-based approaches are suggested for cases where P_mc is very low, often achievable only in (almost) clear-sky conditions. Future investigations could be devoted to improvements of the data fusion framework, such as the introduction of a priori knowledge about the surface (e.g. a digital elevation model) and the integration of physical models for the incoming solar radiation, the surface energy balance and the DTCs.