A change detection framework by fusing threshold and clustering methods for optical medium resolution remote sensing images

ABSTRACT In change detection (CD) of medium-resolution remote sensing images, the threshold and clustering methods are two kinds of the most popular ones. It is found that the threshold method of the expectation maximization (EM) algorithm usually generates a CD map including many false alarms but almost detecting all changes, and the fuzzy local information c-means algorithm (FLICM) obtains a homogenous CD map but with some missed detections. Therefore, a framework is designed to improve CD results by fusing the advantages of the threshold and clustering methods. The CD map generated by the clustering method of FLICM is used to remove false alarms in the CD map obtained by EM threshold method by an overlap fusion. Then, the local Markov random field model is implemented to verify the potentially missed detections. Finally, a fused CD map with less false alarms and missed detections is achieved. Two experiments were carried out on two Landsat ETM+ data sets. The proposed method obtained the least errors (1.11% and 3.51%) and the highest kappa coefficient (0.9366 and 0.8834), respectively, when compared with five popular CD methods.


Introduction
Change detection (CD) from remote sensing images identifies changes by analyzing multitemporal images acquired over the same geographical area at different times, such as land use/land cover, damages due to earthquakes, floods and fires, changes of roads, cities and plants (Chen, Moriya, Sakai, Koyama, & Cao, 2014;Lu, Mausel, Brondizio, & Moran, 2004;Mari, Laneve, Cadau, & Porcasi, 2012). In the past three decades, lots of CD methods have been proposed to automatically achieve accurate CD results (Hao, Shi, Zhang, & Li, 2014;Moser, Angiati, & Serpico, 2011). All methods can be grouped into supervised and unsupervised types. The former detects changes and supply change types by comparing the classification maps of bitemporal images. However, it needs the ground reference, which limits its application. The latter identifies changes without needing the ground reference; therefore, the study in this paper focuses on the unsupervised ones.
Three steps are usually involved in CD: (1) preprocessing, (2) image comparison and (3) image analysis. In the first step, several corrections need implementing between bitemporal images to reduce the effects of light and atmospheric condition, such as co-registration, radiometric corrections (Mishra, Ghosh, & Ghosh, 2012;Ye & Shan, 2014). In the second step, the difference image is generated by pixel-by-pixel comparing between bitemporal images. Many methods have been used to generate the difference image, including the image differencing, image ratio, image correlation, image regression, log ratio for synthetic aperture radar (SAR) images and change vector analysis (CVA) (Shi, Shao, Hao, He, & Wang, 2016). In the third step, the difference image is analyzed and divided into changed and unchanged parts. In the beginning, visual analysis was applied to detect changes, which costs much time and limits the production efficiency (Leichtle, Geiss, Wurm, Lakes, & Taubenboeck, 2017). Afterward, a trial-and-error threshold method was implemented by changing the threshold value, and an empirical threshold method was developed to identify changes using x i > m þ T Á σ, where x i is the grey value of the ith pixel, T is a constant, m and σ are the mean and standard deviation of the difference image, respectively (Fung & LeDrew, 1988). In order to improve the efficiency, some automatic threshold methods were proposed (Bazi, Bruzzone, & Melgani, 2005;Im, Jensen, & Hodgson, 2008). One of the most popular threshold methods was proposed by Bruzzone and Prieto (2000), where the difference image is supposed as mixture Gaussian model and the Bayes rule for minimum error is adopted to calculate the threshold using expectation maximization (EM) algorithm. Due to much false alarms existing in the change map obtained by threshold, the spatial information was introduced by some advanced models, such as Markov random model (Hao, Shi, Deng, & Zhang, 2014;Subudhi, Ghosh, Nanda, & Ghosh, 2017), support vector machine (Bovolo, Bruzzone, & Marconcini, 2008;Nemmour & Chibani, 2006), artificial neural network (Wang, Shi, Atkinson, & Li, 2015;Xu, Ricci, Yan, Song, & Sebe, 2015), wavelet transform (Celik & Ma, 2011;Li, Shi, Zhang, & Hao, 2017), active contour model Li, Gong, & Liu, 2015), and fuzzy c-means clustering (FCM) algorithm (Ghosh, Mishra, & Ghosh, 2011;Krinidis & Chatzis, 2010;Mishra et al., 2012). The ranges of pixel grey values in the difference image belonging to the changed and unchanged clusters often have an overlap, and FCM has robust characteristics for ambiguity and provides an appropriate choice to identify them by using fuzzy set information (Ghosh et al., 2011). The improved ones mainly contain an improved local energy term of exploiting spatial information, where a robust fuzzy local information c-means (FLICM) was proposed by Krinidis and Chatzis (2010) for image segmentation. A novel fuzzy factor was added into FCM to guarantee noise insensitiveness, which usually smoothed detailed changes due to overuse of spatial information (Zhang, Wang, Shi, & Hao, 2017).
It is found that the threshold method, i.e. the EM algorithm, usually generates more false alarms but less missed detections than the clustering algorithm, such as active contour model and FLICM. The EM threshold method generates a CD map including many false alarms but almost detecting all changes. FLICM obtains a CD map including fewer false alarms (Hao, Zhang, Li, & Chen, 2017). Therefore, we aim to design a framework to improve CD results by fusing the advantages of threshold and clustering methods.

Methodology
Let X 1 and X 2 be two multispectral images acquired from the same geographical area at two different times. Assuming images have been co-registered and radiometrically corrected, the two images have the same size of M × N. The designed framework mainly includes three parts as shown in Figure 1. First, the difference image X is generated by the magnitude of CVA or image differencing, and consists of changed pixels W 1 and unchanged pixels W 2 . Second, EM threshold and FLICM CD methods are implemented to the difference image and two initial CD maps are yielded. Then, an advantage fusion strategy is proposed to fuse two initial CD maps by taking full use of their advantages and obtain the final CD map.

EM threshold method
In this study, it is assumed that the difference image X can be seen as a Gaussian mixture, as follows: where p(X), p(X/W 1 ) and p(X/W 2 ) are the probability density functions of the difference image X, changed pixels W 1 and unchanged pixels W 2 , and P(W 1 ) and P(W 2 ) are the a priori probabilities of changed pixels and unchanged pixels, respectively. The probability density functions p(X/W 1 ) and p(X/W 2 ) are Gaussian and can be written as follows: where k 2 ð1; 2Þ, μ k and σ k are the respective mean and variance of the corresponding pixels of class W k . EM can be performed to estimate the mean values μ k by the following three steps .
Step 1: Initialize the means μ k , covariance σ k and a priori probability P(W k ). A threshold d to the difference image was set to obtain the initial changed pixels and unchanged pixels. The threshold can be generated from an empirical equation written as follows: where R is a constant, μ X and σ X denote the respective mean and standard deviation of the difference image. The values of μ k , σ k and P(W k ) can then be computed from the classified pixels and regarded as the initial values to EM.
Step 2: Expectation step. The a posterior probability P(W k /X) was evaluated as follows: where 1 i MN and x i is the ith pixel of the difference image.
Step 3: Maximization step. Re-estimate the parameters using the following equations: where the superscripts t and t + 1 are the current and next iterations, respectively. The parameters are estimated by the steps above and then checked for convergence. If the convergence criterion is not satisfied, repeat steps 2 and 3 until convergence is achieved. Finally, the mean values are estimated.
Finally, the Bayes rule for minimum error is adopted to calculate the threshold T 0 according to the following equation: FLICM-based clustering method Dunn (1973) first developed FCM algorithm and later extended by Bezdek (1981). This clustering algorithm aims at producing and optimal c partition through an iterative clustering process. Suppose there are N pixels in the difference image X ¼ x 1 ; x 2 ; Á Á Á ; x N f g , and c is the number of the clusters. The FCM aims at obtaining the kth cluster by minimizing the objective function as follows: where u ki is the degree of membership value of pixel x i in the kth cluster, v k is the prototype of the center of cluster k, m is the weighing exponent in each fuzzy membership, and x i À v k k k 2 is the Euclidean distance between object x i and the cluster center v k .
To improve the robustness of the conventional FCM, local information has been introduced to extend it. Krinidis and Chatzis (2010) proposed a robust FLICM clustering algorithm to overcome the disadvantages of the absence of spatial information in the initial FCM algorithm. A fuzzy local similarity measure factor G was added into FCM, aiming to guarantee noise insensitiveness and image detail preservation (Krinidis & Chatzis, 2010), and the modified object function is written as follows: where the x i is the gray value of the ith pixel, N is the number of pixels in the difference image, v k is the prototype of the center of cluster k, u ki denotes the fuzzy membership of the i-th pixel with respect to cluster k, and for each pixel x i , the fuzzy membership satisfies the constraint that P c k¼1 u ki ¼ 1. Additionally, the added local factor is defined as follows: where the ith pixel is the center of the local window, the jth pixel represents the neighborhood pixels within the currently local window of the ith pixel, and d ij is the spatial Euclidean distance between pixels i and j. u kj denotes the fuzzy membership of the gray value j in terms of the k-th cluster, and v k represents the prototype of the center of cluster k.

Fusion of two kinds of method
In order to take full use of the advantages of threshold and clustering CD methods, a fusion strategy is designed to remove noise and retain detailed changes. As for CD methods incorporating contextual information, they usually miss changes around edges because of the overused contextual information, such as FLICM. The EM threshold method can detect almost all changes and obtain accurate shapes of changed regions. Therefore, the CD map generated by FLICM is adopted to refine the one obtained by EM threshold method using an overlap fusion and local Markov random field (LMRF). The flowchart of the proposed fusion strategy is shown in Figure 2, and the details can be found as follows.
Step 1: Labelling the EM CD map. For the EM CD map, the independent changed regions are labelled as clusters C ¼ C i ; 1 i n f g for the connected components with four-connected pixels, labelled from 1 to n, as shown in Figure 2.
Step 2: Overlap fusion. For the labelled cluster C i , the overlap ratio r i is calculated referencing the FLICM CD map using r i ¼ n 1 =n 2 , where n 1 is the pixel number in C i and n 2 is the number of changed pixels in the corresponding region of FLICM CD map. Set a threshold T to the overlap ratio r i . If r i > T, the cluster C i is remained as changes; in contrary, it is removed as false alarms. Therefore, an initial fusion CD map can be produced after this step.
Step 3: LMRF fusion. Compared with the initially fused CD map in the step 2, the potentially missed (nooverlap) pixels in FLICM CD map are detected (e.g. green pixels shown in LMRF fusion part of Figure 2). For all potentially missed pixels, LMRF is designed to the identified missed pixels by minimizing the following energy function Uðx i Þ with a pixel x i where β > 0 is a parameter with value fixed by the user to tuning the weight of spatial information, U s (x i ) describes the spectral energy function from the difference image, and U c (x i ) represents the energy term of contextual information computed from the local neighborhood N i of the center pixel x i . In the process of calculation, the U s (x i ) and U c (x i ) are normalized respectively. The detailed spectral energy term can be written as follows: where μ k and σ 2 k denote the mean and variance of class k, respectively, that can be calculated from the initial change map generated by FLICM.
The spatial energy of the pixel x i is defined as follows: where N i denotes the local neighborhood of the pixel x i (i‚N i ), l(x i ) and l(x j ) (j 2 N i ) represent the class label of the pixel x i and its local neighborhood, and the function I is defined as follows: Step 4: Output. The potentially missed pixels are relabeled by minimizing the energy of LMRF, and other pixels keep the original labels in the initially fused CD map.

Experimental results
In order to evaluate the effectiveness of the proposed method, several experiments were carried out on two remote sensing data sets. Additionally, five popular methods were implemented as a comparison, including EM threshold, fusion of EM and Markov random field (EMMRF), FLICM, superpixel-based active contour model (SACM) (Hao, Shi, Deng, & Feng, 2016), and undecimated discrete wavelet transform and active contour (UDWTAC) (Celik & Ma, 2011). All the experiments were implemented using Matlab 8.4 on a personal computer with 2.2 GHz CPU and 8.00 GB RAM. Four indices are used to quantitatively assess the results as follows. (1) Missed detections N m that indicate the number of actually changed pixels that were incorrectly classified as unchanged ones in the CD map. The ratio of missed detections P m is calculated by P m ¼ N m =N 0 Â 100%, where N 0 is the total number of changed pixels counted in the ground reference map; (2) False alarms N f indicate the number of actually unchanged pixels that were incorrectly determined as changed ones in the CD map. The ratio of false alarms P f is calculated with the ratio P f ¼ N f =N 1 Â 100%, where N 1 is the total number of unchanged pixels counted in the ground Figure 2. Flowchart of the spatial advantage fusion strategy. reference map; (3) Total errors N t indicate the total number of detection errors, including both missed and false detections. This total number refers to the sum of missed detections and false alarms. Hence, the ratio of total errors P t is calculated with P t ¼ ðN m þ N f Þ=ðN 0 þ N 1 Þ Â 100%; and (4) Kappa coefficient κ.

Experiments of data set 1
The first data sets used in experiments are two images acquired by the Landsat Enhanced Thematic Mapper Plus (ETM+) sensor of the Landsat-7 satellite in an area of Mexico in April 2000 and May 2002, and a section of 512 × 512 pixels was selected as test site. Image X 1 was registered to image X 2 using the linear transformation model, and the relative radiometric correction was implemented using the histogrammatching method (Coltuc, Bolon, & Chassery, 2006) by referencing image X 2 . The value of root mean square error on the geometric correction was 0.7. The changes were caused by a fire that burned a large part of the vegetation in the test region. Figure 3(a,b) show the band 4 of 2000 and 2002 images, respectively. The image differencing method was implemented to the band 4 of bi-temporal images to generate the difference image. A reference map was manually obtained by a detailed visual analysis of both the available multitemporal images and the difference image as shown in Figure 3(c).
In the proposed approach, the overlap threshold T and the weight parameter β of contextual information affect the CD accuracy. Figure 4(a) shows the change patterns of the CD accuracy indices with the changing threshold T, where the weight parameter was set to 0.5 and the threshold ranged from 0 to 0.8 with the steps 0.1. It can be seen that the ratios of missed detection and total errors go down until the threshold value of 0.5, then increase. The ratio of false alarms always decreases with the increasing threshold. Figure 4(b) shows the change patterns of the accuracy indices with the weight parameter β ranging from 0.1 to 0.9 with the steps 0.1, in which the threshold T was set to 0.5. The ratio of missed detections has a slight increase, and the ratio of false alarms has a slight decrease with the increasing weight parameter. But the ratio of total errors almost keeps stable with the increase of the weight parameter. Figure 5 shows the CD maps obtained by EM, EMMRF, FLICM and the proposed method, respectively. The weight parameter β in EMMRF controls the effect of spatial information, and a larger weight obtains a smoother CD map. It was set to 0.4 in this study. The segment scale Q determines the superpixel scale and the contour length μ controls the smooth of CD map in SACM. They were set to 128 and 0.2,  respectively. The contour length μ in UDWTAC was set to 0.1. All values of parameters in the above contrast experiments were manually set to obtain highly accurate results. The overlap threshold T in the proposed method was set to 0.5 by referencing the accuracy change patterns under different overlap thresholds. As can be seen in Figure 5(a), the change map of EM contains almost all changes, but many false alarms exist at the same time (e.g. circle region). EMMRF gives homogenous regions by using spatial context, but much detailed information (e.g. accurate shapes of changed regions) is removed and lots of false alarms still exist as shown in the circle region of Figure 5(b). This is explained by the factor that it depends on the initial CD map of EM and neglects the fuzzy spatial information. In contrast, FLICM and SACM obtain more homogenous CD maps than EM due to incorporating local information, but it misses some detailed changes due to the overused spatial information. The CD map obtained by UDWTAC was displayed in Figure 5(e), which contains many false alarms as shown by circle regions. However, the proposed method can remove false alarms and retain details more efficiently as shown in the circle regions of Figure 5(f), which generates the most similar CD map to the ground reference among all the methods in this study. The reason is that the proposed method reduces the false alarms existing in the EM-based results by taking advantage of the FLICM CD map and retains detailed changes through the LMRF. Table 1 presents the accuracy comparisons of missed detections, false alarms, total errors and kappa coefficient among these methods. EM results in the maximum false alarms of 9792 pixels, and EMMRF generates the minimum missed detections of 796 pixels and false alarms of 5076 pixels, which reduces missed detections and false alarms compared with EM by incorporating fuzzy spatial information. For FLICM, it reduces much false alarms compared with EM due to the introduction of local information, where the total errors are 4947 pixels. SACM and UDWTAC generate similarly accurate CD maps with FLICM. It is worth noting the proposed method obtains the most accurate CD results among all methods used in this study via the total errors and kappa coefficient, and the missed detections, false alarms and total errors are 1512, 1410 and 2922, respectively. More importantly, it results in less missed detections than FLICM and much less false alarms than EM. Experimental results have indicated that the proposed method is insensitive to the overlap ratio and weight parameters. The EM method costs the least time of 1.1 s, and the FLICM method needs 12.8 s which costs much computation time than other comparative methods. The proposed method is a little slower than FLICM but slower than other four methods because it combines EM and FLICM.

Experiments of data set 2
The second data set was acquired by Landsat-7 ETM+ sensor in August 2001 and August 2002 in the northeast region of China. A section (400 × 400 pixels) of the entire scene was selected as the test site. Figure 6(a,b) show the false color composite using the NIR band (R-4, G-3 and B-2), respectively. The image of 2001 was registered to the image of 2002 using the linear transformation model. The value of root mean square error on the geometric correction was 0.6. Then, the histogram-matching method was implemented to the former image for the relative radiometric correction by referencing the latter one. The difference image was generated by CVA based on all bands except for the thermal infrared band. A reference map was manually produced by a detailed visual analysis as shown in Figure 6(c). Figure 7(a) shows the change patterns of the CD accuracy indices with the increase of threshold T, in which the weight parameter was set to 0.5 and the threshold ranged from 0 to 0.8 with the steps 0.1. We can see that the ratio of false alarms always decreases with the increase of the threshold. As for the ratios of missed detections and total errors, they decline when the threshold ranges from 0 to 0.5, and then grow. As shown in Figure 7(b), different values of the weight parameter β have a slight influence on the accuracy indices. The ratio of missed detections slightly increases and the ratio of false alarms slightly decreases, respectively, with the increase of the weight parameter. But the ratio of total errors almost keeps stable.
CD maps of the data set 2 are presented in Figure 8, which are generated by EM, EMMRF, FLICM and the proposed method, respectively. The value of parameter β in EMMRF was set to 0.4, the segment scale Q and the μ value were set to 128 and 0.2, the μ value of UDWTAC was set to 0.1. All values of parameters in the above contrast experiments were manually set to obtain highly accurate results. The overlap threshold T in the proposed method was set to 0.5 by referencing Table 1. Quantitative CD results of the data set 1. P m , P f and P t are ratios of missed detections, false alarms and total errors, respectively, and κ represents the kappa coefficient.  the accuracy change patterns under different overlap thresholds. Figure 8(a) displays the change map produced by EM, which contains few missed detections but much false alarms (e.g. circle region). As can be seen from Figure 8(b), EMMRF generates a more homogenous change map than the one of EM, but  lots of false alarms still exist and some regions are over smoothed. FLICM and SACM produce more homogenous CD maps than the above-mentioned methods because of using the fuzzy and local spatial information, respectively. But some detailed changes are missed due to the overused spatial information. The result obtained by UDWTAC has much noise as shown in Figure 8(e). The proposed method not only reduces the missed details as shown in Figure 8(f), but also yields a very homogenous CD map, which is the most similar one with the ground reference map. Table 2 displays the accuracy of the CD maps generated by the above-mentioned methods. EM generates few missed detections and the maximum false alarms. Though EMMRF reduces the missed detections of EM, it still contains many false alarms. FLICM generates more missed detections but much less false alarms than EM and EMMRF, which reduces the false alarms to 1224 pixels by incorporating fuzzy local information. SACM and UDWTAC produce CD maps indicating similar accuracy with FLICM. It is worth noting that the proposed method produces the most accurate CD map by referencing the total errors and kappa coefficient, where both the false alarms and missed detections are significantly reduced compared with other methods used in this study. In total, the experiments indicate the effectiveness of the proposed method that removes false alarms and preserves detailed changes. The threshold method EM is very fast, which only needs 1.0 s, and the FLICM costs 10.0 s. Because the proposed method generates more accurate by combining EM and FLICM, it needs more computing time than other methods used in this paper.

Discussion
This paper presents a CD framework by fusing threshold and clustering methods for optical medium resolution remote sensing images. Automatic threshold algorithms usually generate a CD map containing almost all real changes but many false alarms. Clustering methods usually produce less false alarms but many missed detections. A fusion strategy consisting of overlap fusion and local Markov random field fusion was proposed to detect more real changes and fewer false alarms. The proposed framework was tested using two Landsat EMT+ data sets. In the first data set, the changes between two images were caused by a forest fire. In the second data set, different kinds of changes happened because of crop planting. It was shown that the proposed method can obtain more accurate CD maps compared with other five popular CD methods. Through the second experiments, the proposed framework also works well for regions with different kinds of changes.
The proposed method is influenced by the overlap threshold and the weight parameter β. Its total errors firstly decrease when the overlap threshold increases and begin to increase after the overlap threshold reaching around 0.5. The main reason is that the false alarms cannot be removed when the overlap threshold is small, because most detected changes are retained for a small overlap threshold. Much real changes will be removed under a large overlap threshold. Therefore, the proposed method produces an accurate result under the overlap threshold value around 0.5. As for the weight parameter β, the ratio of missed detections slightly increases and the ratio of false alarms slightly decreases with the increasing weight parameter, respectively. But the ratio of total errors almost keeps stable and reaches minimal values at the weight parameter value of 0.5. Given that, both the overlap threshold and the weight parameters are suggested to set to 0.5 in other studies.
The advantage of the proposed method is that it achieves a good trade-off between missed detections and false alarms. It reduced many false alarms compared with the EM threshold method and EMMRF, and missed detections were slightly increased. The proposed method reduced missed detections when compared with the popular CD methods incorporating spatial information, such as FLICM, SACM and UDWTAC. The proposed method usually generates more accurate CD maps considering the total errors, and achieves a better consistency with the ground map referencing the kappa coefficients. The disadvantage is that the proposed method needs more time than other used methods in this study. The threshold method EM is very fast, which costs about 1.0 s for both experiments. However, the FLICM incorporating spatial information costs much time, about 10.0 s. As for the proposed method, though it generates more accurate results, it needs more computation time than other methods because of by combining EM and FLICM. Hence, when we want to obtain more accurate results, it is better to use the proposed method. It is better to obtain the threshold method when the change maps need producing in a shorter time.
This method was developed for mediumresolution images. Therefore, it should work for Table 2. Quantitative CD results for the data set 2. P m , P f , and P t are ratios of missed detections, false alarms and total errors, respectively, and κ represents the kappa coefficient. other images having similar resolution with Landsat images, such as Sentinel-2. However, it cannot work well for very-high-resolution (VHR) images so far. More challenges are raised when VHR images are used as input data sources in CD, such as larger reflectance variability in each class and different acquisition characteristics (e.g. sensor viewing geometry and shadow effect). The increased variability often results in too many false changes, known as the "salt and pepper" effect in the pixel-based CD approaches (Hussain, Chen, Cheng, Wei, & Stanley, 2013). The object-based methods take an image object as the unit, which makes better use of texture, shape, and spatial relationships with neighboring objects than the pixel-based ones. In the future work, we will extend this method to superpixel or object level to test its effectiveness. In addition, the proposed method can be extended to CD maps obtained by other methods, where few missed detections and false alarms exist, respectively, such as some fine-and coarse-resolution CD results. In this study, only Landsat ETM+ data sets were used to assess the proposed method, therefore, the future work will be focused on the evaluation to other type data sets and methods.

Conclusion
A CD framework of fusing the threshold and clustering CD methods is proposed to remove false alarms and preserve detailed changes simultaneously. The CD map generated by the clustering method of FLICM is used to refine the one obtained by EM threshold method through the overlap fusion. Therefore, false alarms can be removed and some potentially missed detections are found. Then, LMRF is implemented to verify the potentially missed detections. Experiments were conducted on two Landsat EMT+ data sets to assess the performance of the proposed method. Compared with some popular methods (i.e. EM, EMMRF, FLICM, SACM, and UDWTAC), the proposed method achieves improvements in both accuracy and visual interpretation, and produces the most accurate change maps. Results indicate that the proposed method not only obtains more homogenous CD maps, but also preserves more detailed changes than other methods used in this paper. In total, the proposed method provides an effective and unsupervised CD method from remote sensing images.