Segmentation and classification of high spatial resolution images based on Hölder exponents and variance

Abstract Neither pixel-based nor texture-based classification techniques alone yield appropriate results for high spatial resolution remote sensing imagery, since such imagery comprises both textured and non-textured regions. In this study, Hölder exponents (HE) and variance (VAR) are used together to transform the image for measuring texture. A threshold is derived to segment the transformed image into textured and non-textured regions, and the original image is then split into textured and non-textured regions using this segmented image mask. The extracted textured region is classified using the ISODATA classification algorithm, considering the HE, VAR, and intensity values of each pixel. The extracted non-textured region is also classified using the ISODATA algorithm, but without the HE and VAR values of individual pixels, since significant textural variation is not found among its classes. Finally, the independently generated classified outputs of the non-textured and textured regions are merged to obtain the final classified image. IKONOS 1 m PAN images are classified using the proposed algorithm, and the classification accuracy is more than 88%.


Introduction
High spatial resolution imagery helps to acquire detailed, high-quality information about features of the earth's surface together with their geographical relationships. The spatial resolution specifies the pixel size of a remote sensing image covering the earth's surface. The internal variability within homogeneous land cover units increases with increasing resolution. This increased variability decreases the statistical separability of land cover classes in the spectral data space, which in turn tends to reduce the accuracy of pixel-based classification algorithms such as K-Means (Hartigan and Wong 1979), Fuzzy C-Means (Bezdek, Ehrlich, and Full 1984), and minimum distance classifiers (Richards 1995). These pixel-based classification techniques assign a pixel to a region according to the similarity of its spectral signature, considering only one pixel at a time (Chakraborty, Sen, and Hazra 2012). Spectral signatures are the specific combinations of emitted, reflected, or absorbed electromagnetic (EM) radiation at varying wavelengths that can uniquely identify an object (Chakraborty, Sen, and Hazra 2012).
The spectral resolution of high spatial resolution remote sensing images is relatively poor compared to that of sensors such as Landsat TM. Spectral resolution describes the ability of a sensor to define fine wavelength intervals: the finer the spectral resolution, the narrower the wavelength range for a particular channel or band. Thus, there is a trade-off between spatial and spectral resolution. This is particularly true for high spatial resolution panchromatic (PAN) images, such as CARTOSAT-II 1 m and IKONOS 1 m. Because of the large variation of spatial structure in these images, classifying them requires taking into account the spatial relations between pixel values, also known as the "texture" of the objects in the scene. Accordingly, a number of texture-based classification techniques, namely the gray level co-occurrence matrix (GLCM) (Clausi and Yue 2004; Haralick, Shanmugam, and Dinstein 1973; Tsai and Chou 2006; Tsai, Chou, and Wang 2005), the Markov random field (MRF) model (Clausi and Yue 2004), and gray scale rotation invariant operators (Klemas 2009), have been developed for classifying high spatial resolution images. However, these techniques are found to be applicable mainly in the textured regions of high spatial resolution images. A region is said to be textured where the intensity variation among neighborhood pixels is significant, and non-textured where it is insignificant (Chakraborty et al. 2013; Weng 2012). Texture-based classification techniques fail in the non-textured regions of high spatial resolution images, as much variation is not found in the spatial pattern of those regions (Chakraborty, Sen, and Hazra 2009). Thus, we can infer from earlier studies that classification of high spatial resolution imagery by either a pixel-based or a texture-based algorithm alone may not yield the desired results.
Some other techniques, namely the watershed approach (Mathivanan and Selvarajan 2012; Wang, Zhao, and Chen 2004), region growing (Carleer, Debeir, and Wolff 2005; Chakraborty, Sen, and Hazra 2012), mean shift (Chakraborty et al. 2008; Su et al. 2015), and region merging (Zhang et al. 2014), are also in use for classifying high spatial resolution remote sensing images. Application of these approaches, however, leads to either under-segmentation or over-segmentation (Chen et al. 2015; Wang et al. 2014). Structural image indexing (Xia et al. 2010), semi-supervised feature learning (Yang, Yin, and Xia 2015), and multi-scale SVM approaches (Huang and Zhang 2013) have also been used and found quite useful for classifying high spatial resolution remote sensing images. Since high spatial resolution imagery comprises both textured and non-textured regions, classification by either a pixel-based or a texture-based algorithm alone does not yield the desired result, and this type of study remains topical. A multi-circular local binary pattern (MCLBP) and variance (VAR) based algorithm (Chakraborty et al. 2013) has been used to classify the textured and non-textured regions of high-resolution images individually, with the MCLBP operator measuring the spatial structure of the image. The disadvantage of this approach is that the MCLBP operator is sensitive to noise, since it uses exactly the value of the central pixel of the moving window as the threshold for measuring the spatial structure around that pixel.
In the last few years, the Hölder exponent (HE) has been used for measuring the spatial structure of images (Lucieer, Stein, and Fisher 2005; Malladi, Kasilingam, and Costa 2003; Tahiri, Farssi, and Touzani 2005), and also for segmenting high-resolution images (Chakraborty, Sen, and Hazra 2009). HE gives an idea of the spatial structure of the image and is not very sensitive to noise. Besides spatial structure, the contrast of the local image is an important property for measuring the texture around a pixel. The present study is carried out with the specific objective of segmenting the textured and non-textured regions of high spatial resolution images using an HE- and VAR-based approach and classifying the textured and non-textured regions individually; VAR is used for measuring the contrast around the pixel. The proposed approach is implemented on IKONOS PAN imagery with a spatial resolution of 1 m.

Methods
The proposed approach for classifying the high spatial resolution image P has three main steps: (i) image transformation, (ii) segmentation and extraction, and (iii) classification. First, each pixel of the image is transformed into a value representing the degree of texture around it. Second, the transformed image is segmented, and the non-textured and textured areas are extracted from the original image using the segmented image mask. Third, these two regions are classified independently.

Image transformation
The HE and VAR are used together to transform the image for measuring texture. The HE measures the spatial structure around each pixel of P. Besides spatial structure, the contrast of the local image is an important property for measuring the texture around a pixel; therefore, VAR is used for measuring the contrast around the pixel.

Hölder exponent
HE has been used for texture analysis of high spatial resolution images (Chakraborty, Sen, and Hazra 2009). It measures the local regularity of the image. The advantages of using HE for high spatial resolution images are: (i) it can be used as a tool to measure the spatial structure around each pixel of the image, (ii) it does not require any prior information about the pixel intensity, and (iii) it is not very sensitive to noise (Chakraborty, Sen, and Hazra 2009).
Definition of HE (Bourissou, Pham, and Levy-Vehel 1994): let μ be a measure on a set Ω; for every x ∈ Ω, the HE is defined as α(x) = lim_{r→0} log μ(B(x, r)) / log r, where B(x, r) is the ball of radius r centered on x. A series of 15 values of the radius r (i.e., r = 1, 2, …, 15) centered on x is used as a scale parameter for calculating the HE value around each pixel x in the image (Chakraborty, Sen, and Hazra 2009), and the total number N of pixels intersected by the perimeters of this series of circles is used as a scale parameter for computing the VAR value around x (Chakraborty, Sen, and Hazra 2009). N is computed using Equation (1):

N = ∑_{r=1}^{t} m_r    (1)

where t is the total number of circles and m_r is the number of pixels intersected on the perimeter of the circle of radius r.
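As an illustration of how the HE can be estimated at a pixel, the sketch below fits the slope of log μ(B(x, r)) against log r over the 15 radii described above. Taking the sum of intensities over the disc of radius r as the measure μ, and the function name `holder_exponent`, are assumptions for this sketch, not the paper's exact implementation.

```python
import numpy as np

def holder_exponent(img, i, j, radii=range(1, 16)):
    """Estimate the Hölder exponent at pixel (i, j) as the slope of
    log mu(B(x, r)) versus log r, where mu sums the intensities over
    the disc of radius r centred on the pixel."""
    h, w = img.shape
    ii, jj = np.ogrid[:h, :w]
    dist2 = (ii - i) ** 2 + (jj - j) ** 2
    log_r, log_mu = [], []
    for r in radii:
        mu = img[dist2 <= r * r].sum()
        if mu > 0:
            log_r.append(np.log(r))
            log_mu.append(np.log(mu))
    # Least-squares slope of the log-log plot approximates alpha at (i, j).
    slope, _ = np.polyfit(log_r, log_mu, 1)
    return slope

# On a perfectly flat patch, mu(B(x, r)) grows roughly like r^2, so the
# estimated exponent is close to 2; textured patches deviate from this.
alpha_flat = holder_exponent(np.ones((64, 64)), 32, 32)
```

The log-log regression makes explicit why the estimate is robust to moderate noise: a few perturbed pixels barely change the summed measure at each radius.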

VAR for measuring contrast around each pixel of the image
The VAR (σ²) of the neighborhood of each pixel (x, y) over the whole image is computed to obtain the contrast value of (x, y). σ²(x, y) is obtained using Equation (2):

σ²(x, y) = (1/N) ∑_{r=1}^{t} ∑_{j=1}^{m_r} (a_{rj} − ā)²    (2)

where a_{rj} is the intensity value of the j-th pixel on the perimeter of the circle of radius r and ā = (1/N) ∑_{r=1}^{t} ∑_{j=1}^{m_r} a_{rj}. Thus α(x, y) and σ²(x, y) are obtained for each pixel (x, y) of the original image P. Subsequently, these values are used in Equation (3) to obtain the corresponding pixel value (x, y) in the transformed image T. Each pixel (x, y) of T represents the degree of texture around that pixel.
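A minimal sketch of the neighborhood statistics above, assuming a simple angular discretisation of the circle perimeters (the helper names `perimeter_offsets` and `local_variance` are illustrative, not from the paper):

```python
import numpy as np

def perimeter_offsets(radius):
    """Integer offsets on the perimeter of a circle of the given radius;
    the number of distinct offsets plays the role of m_r."""
    offs = set()
    for theta in np.linspace(0.0, 2.0 * np.pi, 8 * radius, endpoint=False):
        offs.add((int(round(radius * np.cos(theta))),
                  int(round(radius * np.sin(theta)))))
    return sorted(offs)

def local_variance(img, i, j, t=15):
    """sigma^2(x, y) per Equation (2): the variance of the N = sum m_r
    intensities a_rj intersected by the perimeters of circles of radius
    r = 1..t centred on pixel (i, j)."""
    h, w = img.shape
    samples = []
    for r in range(1, t + 1):
        for di, dj in perimeter_offsets(r):
            ii, jj = i + di, j + dj
            if 0 <= ii < h and 0 <= jj < w:
                samples.append(img[ii, jj])
    samples = np.asarray(samples, dtype=float)
    return samples.var()   # (1/N) * sum_r sum_j (a_rj - abar)^2

flat = np.full((64, 64), 100.0)                        # non-textured patch
noisy = np.random.default_rng(0).uniform(0.0, 255.0, (64, 64))
v_flat = local_variance(flat, 32, 32)
v_noisy = local_variance(noisy, 32, 32)
```

A flat patch yields zero variance while a noisy one yields a large value, which is exactly the contrast cue the transform relies on.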

Image segmentation and extraction
A threshold is used to segment T into textured and non-textured regions. Pixels of T with values below the threshold are assigned to the non-textured region, while pixels with values greater than or equal to the threshold are assigned to the textured region of the segmented image. Pixels identified as non-textured are marked as zero, and pixels identified as textured are marked as one in the segmented image mask:

Γ(x, y) = 1 if T(x, y) ≥ δ, and Γ(x, y) = 0 otherwise    (4)

where T(x, y) and Γ(x, y) represent the pixel values at position (x, y) of the two-dimensional transformed image and the segmented image, respectively, and δ is the threshold value computed from Equation (5) using T_min and T_max, the minimum and maximum pixel gray values in T, and a user-defined constant K.
To find the optimum K, an IKONOS PAN sensor image with a spatial resolution of 1 m and an image size of 256 × 256 pixels is used (shown in Figure 2(a)). The proposed classification approach is then implemented on this image for different K values.
Subsequently, the segmented image Γ is used to extract the textured and non-textured regions from the original image P:

R₁(x, y) = P(x, y) if Γ(x, y) = 0
R₂(x, y) = P(x, y) if Γ(x, y) = 1

where P, Γ, R₁, and R₂ represent the original image, the segmented image, the extracted non-textured region of P, and the extracted textured region of P, respectively. P, R₁, and R₂ satisfy the properties (i) R₁, R₂ ⊂ P, (ii) P = R₁ ∪ R₂, and (iii) R₁ ∩ R₂ = Φ.
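The segmentation and extraction steps can be sketched as follows. Since the exact form of Equation (5) is not reproduced in the source, the expression delta = T_min + (T_max − T_min) / K below is an illustrative assumption that uses the same quantities; the function name is also hypothetical.

```python
import numpy as np

def segment_and_extract(T, P, K=5):
    """Segment the transformed image T with threshold delta and split
    the original image P into non-textured (R1) and textured (R2)
    parts; non-extracted pixels are set to zero.

    delta = T_min + (T_max - T_min) / K is an assumed form of Eq. (5)."""
    t_min, t_max = float(T.min()), float(T.max())
    delta = t_min + (t_max - t_min) / K            # assumed form of Eq. (5)
    gamma = (T >= delta).astype(np.uint8)          # Eq. (4): 1 = textured
    R1 = np.where(gamma == 0, P, 0)                # extracted non-textured
    R2 = np.where(gamma == 1, P, 0)                # extracted textured
    return gamma, R1, R2
```

Because every pixel lands in exactly one of the two masks, R₁ and R₂ together recover P and share no pixels, matching properties (i)–(iii) above.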

Classification
Initially, the transformed image is segmented into textured and non-textured regions using a threshold. Using the segmented image mask, the original image is then split into textured and non-textured regions, which are classified independently. The extracted textured region (R₂) is classified using the unsupervised Iterative Self-Organizing Data Analysis Technique (ISODATA) clustering algorithm (Jain, Murty, and Flynn 1999), considering the HE, VAR, and intensity values of each pixel of the textured area. The ISODATA algorithm has several benefits: low computational cost, simplicity, and being unsupervised. The extracted non-textured region (R₁) is also classified using the ISODATA algorithm; in this case, the HE and VAR values of individual pixels are not considered, since significant textural variation is not found among its classes. Finally, the independently generated classified outputs of the non-textured and textured regions are merged to obtain the final classified image.
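The per-region clustering step can be sketched as below. Plain k-means is used here as a simplified stand-in for ISODATA (ISODATA additionally splits and merges clusters between iterations), and the function names are assumptions for this sketch.

```python
import numpy as np

def kmeans(features, k, iters=20):
    """Plain k-means clustering, a simplified stand-in for ISODATA.
    Initialisation picks k evenly spaced sample rows so the sketch
    stays deterministic."""
    features = np.asarray(features, dtype=float)
    centres = features[np.linspace(0, len(features) - 1, k).astype(int)].copy()
    for _ in range(iters):
        # Assign each sample to its nearest centre, then recompute centres.
        d = np.linalg.norm(features[:, None, :] - centres[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for c in range(k):
            if (labels == c).any():
                centres[c] = features[labels == c].mean(axis=0)
    return labels

def classify_textured(intensity, alpha, var, mask, k=4):
    """Cluster only the masked (textured) pixels on the feature vector
    [intensity, HE, VAR]; unmasked pixels are labelled -1. For the
    non-textured region, the approach clusters on intensity alone."""
    idx = np.flatnonzero(mask.ravel())
    feats = np.stack([intensity.ravel()[idx],
                      alpha.ravel()[idx],
                      var.ravel()[idx]], axis=1)
    labels = np.full(mask.size, -1)
    labels[idx] = kmeans(feats, k)
    return labels.reshape(mask.shape)
```

Merging the final output then amounts to writing the textured-region labels and the non-textured-region labels into one label image, since the two masks are disjoint.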
To demonstrate the robustness of the proposed method, an "HE-VAR and PAN"-based classification method and an "MCLBP and VAR"-based classification method are used in this study. The "HE-VAR and PAN"-based method classifies the whole image using the HE, VAR, and intensity values of each pixel of the IKONOS PAN image. The results of the proposed classification method are then compared with those of the "HE-VAR and PAN"- and "MCLBP and VAR"-based methods.

Results and discussion
The proposed classification approach envisages a threshold δ to obtain the segmented image mask from the transformed image, and a constant K is used to compute this threshold. In the present study, the proposed classification algorithm is implemented on a 1 m IKONOS PAN sensor image for different K values between 3 and 7, and the classification rate for each K is assessed using the ground truth data. Figure 1 shows the classification accuracy for the different K values. From Figure 1, we can infer that K substantially affects the accuracy of classifying high spatial resolution images; a suitable selection of K is therefore important for measuring texture. In this study, K = 5 achieved the best performance in high-resolution image classification. The optimum K is found based on the image in Figure 2(a); it is also applied in classifying Figure 2(e) as well as other images, and the resulting classification accuracy is more than 88%. Thus, we can infer from this study that the same K value is suitable for most of the images.

The "HE-VAR and PAN"-based method, the "MCLBP and VAR"-based method, and the proposed classification method have been applied to two different 1 m PAN (IKONOS) images (size: 256 × 256 pixels) containing (i) vegetation, (ii) built-up area, (iii) water bodies, and (iv) fallow (shown in Figure 2(a) and (e)). Texture is visible in both images. The results of the proposed method are then compared with the results obtained from the "HE-VAR and PAN"- and "MCLBP and VAR"-based analyses, respectively.

The classified outputs of the "HE-VAR and PAN", "MCLBP and VAR", and proposed classification methods applied to the two IKONOS images, identifying the features vegetation, built-up area, water bodies, and fallow, are presented in Figure 2(b)-(d) and (f)-(h). From the results, it is clearly seen that the "MCLBP and VAR"-based method gives less heterogeneous segments than the "HE-VAR and PAN"-based method, while the proposed classification method gives more homogeneous segments with distinct classes than the "MCLBP and VAR"-based method.

GPS equipment was used to collect ground truth data for the classes vegetation, built-up area, fallow, and water body, with sample sizes of 656, 519, 577, and 462 m², respectively. The collected ground truth data were converted into vector data using ArcGIS software. The ground truth data were overlaid separately on the outputs obtained from both IKONOS images (Figure 2(a) and (e)) by means of the "HE-VAR and PAN", "MCLBP and VAR", and proposed classification methods, and the classification accuracy of each approach is shown through a confusion matrix. The confusion matrices (Table 1) calculated for the classified IKONOS images (Figure 2(b)-(d)) show that the accuracies of classifying vegetation, built-up area, fallow, and water bodies are 73, 69, 59, and 87%, respectively, by the "HE-VAR and PAN"-based method and 79, 71, 68, and 89%, respectively, by the "MCLBP and VAR"-based method, whereas they are 91, 86, 85, and 94%, respectively, by the proposed classification method. The confusion matrices (Table 2) calculated for the classified IKONOS images (Figure 2(f)-(h)) show that the accuracies of classifying vegetation, built-up area, fallow, and water bodies are 73, 74, 66, and 88%, respectively, by the "HE-VAR and PAN"-based method and 78, 76, 68, and 89%, respectively, by the "MCLBP and VAR"-based method, whereas they are 90, 87, 86, and 93%, respectively, by the proposed classification method.

Table 1. The confusion matrices showing the classification accuracy obtained by applying the "HE-VAR and PAN", "MCLBP and VAR", and proposed methods separately on the IKONOS image shown in Figure 2(a).

Table 2. The confusion matrices showing the classification accuracy obtained by applying the "HE-VAR and PAN", "MCLBP and VAR", and proposed methods separately on the IKONOS image shown in Figure 2(e).

The texture pattern of water bodies and fallow areas does not show much difference, as is visible in Figure 2(a) and (e). Consequently, the "HE-VAR and PAN"-based method classifies these areas as a single class and under-segments the fallow areas in both input images (Figure 2(b) and (f)). This discrepancy decreases the classification accuracy of vegetation, fallow, water bodies, and built-up area, as shown in Tables 1 and 2. The "MCLBP and VAR"-based approach overcomes these discrepancies to some extent: the superposition of fallow, water body, and vegetation areas becomes less, as shown in Figure 2(c) and (g), and the decreased discrepancies increase the accuracy in classifying fallow, water body, and vegetation areas (Tables 1 and 2). The "MCLBP and VAR"-based method shows improvement in classifying the fallow areas and water bodies in Figure 2(g); in the case of Figure 2(a), however, this method could not extract the non-textured region properly from the image, since MCLBP is sensitive to noise, and thus could not properly distinguish fallow areas from water bodies, as visible in Figure 2(c). Since HE is less sensitive to noise, the proposed technique partitions the image into textured and non-textured regions distinctly, which in turn helps in classifying the fallow areas and water bodies, as shown in Figure 2(d). The proposed classification method thus mostly overcomes these discrepancies, as shown in Figures 2(d) and (h), and the corresponding improvement in classification accuracy is found in Tables 1 and 2.
Further, to show the robustness and validity of the proposed classification method in classifying land use areas, the method is applied on a 1 m PAN (IKONOS) image (Figure 3(a)) of (i) urban woodland, (ii) building, (iii) water bodies, and (iv) fallow. The output result (Figure 3(b)) shows that the method satisfactorily discriminates urban woodland, building, fallow, and water bodies. The method is also applied on two other different 1 m PAN (IKONOS) images: (i) Figure 4(a) of fallow, vegetation, built-up area, and bare soil, and (ii) Figure 4(c) of water bodies, vegetation, fallow, and built-up area. The output results (Figure 4(b) and (d)) show that the method satisfactorily discriminates vegetation, fallow, built-up area, bare soil, and water bodies.

Conclusions
In the present study, HE is used to compute the spatial structure of local image texture, and VAR is used to measure the contrast around the pixel. HE and VAR are used together to transform the image for measuring texture, and a threshold δ is applied to extract the non-textured and textured regions from the image. The extracted textured region is classified using the ISODATA classification algorithm considering the HE, VAR, and intensity values of the individual pixels of the textured area, while the extracted non-textured region is classified using the ISODATA algorithm without the HE and VAR values of individual pixels. From the results of the study, it is found that the proposed method is useful for classifying complex images containing both textured and non-textured regions. Moreover, it can be considered an intuitively appealing, unsupervised classification algorithm. As a result, the method is potentially useful for classifying high spatial resolution panchromatic images more efficiently. Calculation of textural features in a multi-scale manner using an SVM approach for classification of high-resolution images is the future scope of this work.