Textural segmentation of remotely sensed images using multiresolution analysis for slum area identification

ABSTRACT Many cities in developing countries are facing rapid growth of dynamic slum areas but often lack detailed information on and analysis of these informal settlements. Multiresolution analysis (MRA) has been successfully used in texture analysis. Texture analysis is widely discussed in the literature, but most methods that do not employ a multiresolution strategy cannot exploit the fact that texture occurs at various spatial scales. This paper proposes a texture-based segmentation scheme using newly developed multiresolution methods for slum area identification. The proposed method is tested on remotely sensed images, where textural information in terms of statistical moments and energy is extracted at various scales and in different directions with the help of curvelet and contourlet transforms. The results are compared with a wavelet-based MRA method of segmentation. Accuracy assessment is performed for the segmented images, and comparative analysis is carried out in terms of class-wise and overall accuracies. It is found that the proposed method shows better class-discriminating power than existing methods and achieves an overall classification accuracy of 91.4-95.4%.


Introduction
Today, urban growth is one of the most important issues in developing countries like India. According to the prediction of the United Nations, the population of India will be about 1.44 billion in 2024 and is expected to reach 1.66 billion inhabitants around 2050 (United Nations Population Division, 2017). Most of this rapid urbanization will take place on agricultural land, vegetation and other natural land cover. It has been shown that the rate of urban physical expansion is much faster than urban population growth in many cities (Seto, Sánchezrodríguez, & Fragkias, 2010). Present urban growth in Asia is due to both high natural growth rates and concurrent rural-urban migration (UN-Habitat, 2013). In general, urbanization can lead to the growth of both informal and formal urban settlements of low-, medium- and upper-class housing. Such a high rate of urbanization is often unaccompanied by adequate development of infrastructure, including housing and transport. Together with the large share of informal low-paid employment, this process largely results in the growth of informal settlements (UN, 2009). "Slum" has become a common term for areas characterized by insecure tenure, lack of access to safe water and proper sanitation, non-durable housing and overcrowding (UN, 2003).
Accurate detection and classification of informal settlements using remote sensing data pose real challenges to researchers. Distinguishing different urban structures is challenging because of the nature of the classes as opposed to the separation of standard land-cover classes. This task requires extraction of texture and spatial features. Unlike agricultural land or other natural vegetation types, urban structures lack unique and easily distinguishable spectral signatures (Baltsavias & Mason, 1997). In contrast, internal spatial characteristics of slums such as housing density, size and structure of individual dwelling units emerge as promising and efficient methods of slum detection.
A number of studies have addressed these issues, focussing on physical characteristics of slums that can be analysed from remotely sensed images. These methods include object-based image analysis (Kohli, Warwadekar, Kerle, Sliuzas, & Stein, 2013; Witharana & Lynch, 2016), visual image interpretation (Baud, Kuffer, Pfeffer, Sliuzas, & Karuppannan, 2010; Munyati & Motholo, 2014), pixel-based classification (Jain, 2007; Persello & Stein, 2017) and texture-based methods (Ciriza, Sola, Albizua, Álvarez-Mozos, & González-Audícana, 2017; Kit, Lüdeke, & Reckien, 2012). Texture is a commonly used feature in the analysis and interpretation of images. It is characterized by the spatial organization of grey-level variations in a local area and quantifies the local intensity variations in an image, capturing properties such as fineness, coarseness and evenness. Co-occurrence matrices are frequently used in texture analysis as they capture the spatial relatedness of pixel intensities in a neighbourhood within an image (Kuffer, Pfeffer, Sliuzas, & Baud, 2016; Murray, Lucieer, & Williams, 2010; Unser, 1986). These methods are constrained to analysing spatial arrangement over relatively small neighbourhoods at a given single scale. In this context, scale is the ratio of the size of the object in the image to the actual size of the object (Benz, Hofmann, Willhauck, Lingenfelder, & Heynen, 2004; Woodcock & Strahler, 1987). An object smaller than the spatial resolution of the sensor system cannot be identified. As a result, co-occurrence matrices are only suitable for micro-level textures (Unser, 1995).
Signal processing-based multiresolution techniques have also gained attention for texture analysis and segmentation. A multiresolution technique provides a coarse-to-fine and scale-invariant decomposition for interpreting image information. At different scales, the details of an image vary according to its content: the lower resolution provides a global view, while the higher resolution provides the finer details of the scene. Texture and MRA are therefore required in analysis and segmentation because: (1) it is difficult to analyse the information content of an image directly from the pixel intensities; (2) the local changes of intensity in an image are more important than the absolute grey-level intensities; and (3) an image contains important information over a wide range of scales, from large objects occupying a sizeable portion of the image to fine details occupying a few pixels.
Textured objects reveal different types of information as a function of the resolution of reference for analysis, which cannot be optimally observed at a single resolution (Chen, Luo, Zhou, & Pei, 2003; Mallat, 1989). From the feature recognition and extraction perspective, for example, a residential building differs from a large shopping mall so much in scale or size that the two are unlikely to be extracted simultaneously at a single level of segmentation. It is therefore better to extract object features from different segmentation levels.
Traditionally, wavelet transforms have been very popular for multiresolution analysis (MRA), but it is well documented that they are constrained in their ability to capture directional information beyond the horizontal, vertical and diagonal directions (Ansari & Buddhiraju, 2016; Welland, 2003). The wavelet transform uses a particular set of basis functions, defined by roughly isotropic functions present at all scales and locations; it is therefore more appropriate for isotropic or slightly anisotropic features. The theory of composite wavelets is used to construct shearlets, which provide an optimal approximation for singularities in 2D (Guo & Labate, 2007; Kutyniok & Lim, 2011). The shearlet transform optimally represents anisotropic features of multidimensional data (Kutyniok & Labate, 2012; Kutyniok, Shahram, & Zhuang, 2012). A fast implementation of shearlets can be found in Häuser and Steidl (2012). The importance of anisotropic shearlet features has been explored for image registration in remote sensing (Le Moigne, Campbell, & Cromp, 2002; Murphy & Le Moigne, 2015; Murphy, Le Moigne, & Harding, 2016; Zavorin & Le Moigne, 2005). To move beyond the wavelet transform, a range of other basis function sets have been used, with properties relating to alignments, elongations, edges and curvilinear features. Developments in MRA such as the curvelet (Candès & Donoho, 2000) and contourlet (Do & Vetterli, 2002) are shown to overcome these limitations. The use of curvelets and contourlets for texture-based image segmentation, particularly in the context of remotely sensed image analysis, is limited, and it is the aim of this paper to explore these relatively new multiresolution techniques using texture features for slum identification.

Data set
The study area comprises sections of Mumbai city having a dense mix of slums and formal built-up areas. The region is a rapidly growing urban area where slums provide housing and livelihood activity space for low-income groups. An IRS-1C panchromatic sensor image of part of Mumbai city (covering the slums of Dharavi and nearby areas) with a spatial resolution of 5.8 m × 5.8 m is considered, as shown in Figure 1. This resolution is well suited for texture analysis: 5.8 m × 5.8 m is not adequate to extract individual buildings or narrow roads, but groups of them render a visible checked pattern in dense urban areas. The second data set is a high-resolution Worldview-2 (2 m × 2 m) image of Mumbai city covering the slums of Cheeta Camp and nearby areas (Figure 2). Mumbai's slums are generally characterized by very high densities and clustering of small buildings. However, there are differences between slums: areas that have been rehabilitated near high-rise apartments and towers; long-established slum locations, which have regular small-scale shops and roof patterns along main roads; very densely congested areas with only small lanes in inner pockets (parts of the well-known slum of Dharavi, areas near Cheeta Camp and Mankhurd); inner-city slums covering large areas with high roof density and small lanes (parts of Shivaji Nagar and Mankhurd); slums located on hilltops (parts covering Antop Hill and Koliwada); and temporary structures housing very poor informal settlements.
The methodology consists of two phases: texture feature training and segmentation. The major steps in the proposed methodology are shown in

Wavelet transform
The wavelet transform is a mathematical tool used to describe images at multiple resolutions. Wavelets can be viewed as projections of the signal onto a specific set of scaling ϕ(t) and wavelet ψ(t) basis functions in the vector space; the wavelet coefficients obtained represent these projection values. The discrete wavelet transform is realized with the help of filter banks. The basis functions are expressed with the help of the dilation Equations (1) and (2) as (Mallat, 1989):

ϕ(t) = Σ_n h[n] ϕ(2t − n)    (1)

ψ(t) = Σ_n g[n] ϕ(2t − n)    (2)

where h[·] are low-pass filter coefficients and g[·] are high-pass filter coefficients of the filter bank; in the discrete decomposition, m is the scaling (dilation) index and n the translation index. The wavelet-based MRA method is very effective when dealing with one- and two-dimensional signals with linear discontinuities. By decomposing the image into a series of high-pass and low-pass filter bands, the wavelet transform extracts directional details in the horizontal, vertical and diagonal directions. However, these three linear directions are limiting and might not capture sufficient directional information in remotely sensed images. Furthermore, wavelet filter coefficients may not be the best and sparsest representation for describing the edge features in the image (Welland, 2003).
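To illustrate the filter-bank view, the following is a minimal one-level 2D decomposition in NumPy using the Haar pair as the simplest possible h[·]/g[·] (the paper itself uses the 9/7 biorthogonal wavelet; Haar is chosen here only for brevity). Because this transform is orthonormal, the total energy of the image is preserved across the four sub-bands:

```python
import numpy as np

def haar_dwt2(img):
    """One level of a separable 2D Haar wavelet transform.

    Returns the approximation (LL) and the three detail
    sub-bands (LH, HL, HH) obtained by filtering and
    downsampling along rows, then along columns.
    """
    x = img.astype(float)
    # filter + downsample along rows
    lo = (x[:, 0::2] + x[:, 1::2]) / np.sqrt(2)   # low-pass h[.]
    hi = (x[:, 0::2] - x[:, 1::2]) / np.sqrt(2)   # high-pass g[.]
    # filter + downsample along columns
    ll = (lo[0::2, :] + lo[1::2, :]) / np.sqrt(2)
    lh = (lo[0::2, :] - lo[1::2, :]) / np.sqrt(2)
    hl = (hi[0::2, :] + hi[1::2, :]) / np.sqrt(2)
    hh = (hi[0::2, :] - hi[1::2, :]) / np.sqrt(2)
    return ll, lh, hl, hh

img = np.arange(64, dtype=float).reshape(8, 8)
ll, lh, hl, hh = haar_dwt2(img)
# orthonormal transform preserves total energy (Parseval)
total = sum(np.sum(b ** 2) for b in (ll, lh, hl, hh))
assert np.isclose(total, np.sum(img ** 2))
```

Iterating the same step on the LL band yields the multi-level decomposition used for the texture features later in the paper.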

Curvelet transform
The curvelet transform is a multi-scale and multi-directional transform with wedge-shaped basis functions. Basis functions decompose the input signal and represent it in the transform domain. The basis functions of wavelets are isotropic, and thus they require a large number of coefficients to represent curve singularities (Welland, 2003). The curvelets at different scales and directions span the entire frequency space, and their basis functions can be regarded as wavelet basis functions grouped locally into linear structures, so that they capture curvilinear discontinuities more efficiently, as illustrated in Figure 4(a).
Curvelets partition the frequency spectrum into dyadic scales (2^j, where j is an integer describing scale) and sub-partition those into angular wedges showing a parabolic aspect ratio. The curvelet transform works in two dimensions with spatial variable x, frequency-domain variable ω and frequency-domain polar coordinates r and θ. It is defined by a pair of windows given in Equations (3)-(6): a radial window W(r) and an angular window V(t). A polar "wedge", represented by U_s, is supported by the radial window W(r) and the angular window V(t):

V(t) = 1 for |t| ≤ 1/3; cos[(π/2) ν(3|t| − 1)] for 1/3 ≤ |t| ≤ 2/3; 0 otherwise    (3)

W(r) = cos[(π/2) ν(5 − 6r)] for 2/3 ≤ r ≤ 5/6; 1 for 5/6 ≤ r ≤ 4/3; cos[(π/2) ν(3r − 4)] for 4/3 ≤ r ≤ 5/3; 0 otherwise    (4)

where ν is a smooth function satisfying

ν(x) = 0 for x ≤ 0, ν(x) = 1 for x ≥ 1, and ν(x) + ν(1 − x) = 1    (5)

In the frequency domain, these wedges are defined by Candès and Donoho (2000) as

U_s(r, θ) = 2^(−3s/4) W(2^(−s) r) V(2^⌊s/2⌋ θ / 2π)    (6)

where W(·) is the radial window, V(·) is the angular window, r is the radius, θ is the angle of orientation and s is the scale. In this paper, the wrapping-based fast discrete curvelet transform (Candès, Demanet, Donoho, & Ying, 2006) is used (Figure 4(b)).
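To make the role of the smooth function ν concrete, the sketch below checks the required endpoint and partition-of-unity properties numerically for one common polynomial choice (the C³ ramp used, for example, in the CurveLab implementation; this particular polynomial is an illustrative assumption, not prescribed by the text):

```python
import numpy as np

def nu(x):
    """A smooth ramp suitable for Meyer-type curvelet windows:
    nu(x) = 0 for x <= 0, nu(x) = 1 for x >= 1, and
    nu(x) + nu(1 - x) = 1 in between (C^3 polynomial choice)."""
    x = np.clip(x, 0.0, 1.0)
    return x**4 * (35 - 84*x + 70*x**2 - 20*x**3)

t = np.linspace(0.0, 1.0, 101)
assert np.allclose(nu(t) + nu(1 - t), 1.0)   # partition-of-unity property
assert nu(-1.0) == 0.0 and nu(2.0) == 1.0    # flat outside [0, 1]
```

The same property is what makes the squared windows W and V tile the frequency plane without gaps or double counting.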

Contourlet transform
The contourlet filter is an extension of wavelets to 2D, constructed using non-separable directional filter banks (DFBs). Its expansion is composed of basis images oriented at varying directions in multiple scales, with flexible aspect ratios. Do and Vetterli (2002) proposed a double filter bank structure combining the Laplacian pyramid (LP) with a DFB. The resultant structure, called the contourlet transform, is designed to satisfy the anisotropy scaling relation for curves, and thus offers a fast and structured curvelet-like decomposition of sampled signals. In the construction of contourlets, a sparse image expansion is obtained by first applying a multi-scale transform and then applying a local directional transform that gathers the nearby basis functions at the same scale into linear structures. In general, the contourlet construction allows any number of DFB decomposition levels l_j to be applied at each LP level j. To satisfy the anisotropy scaling relation as in the curvelet transform, the number of directions in the pyramidal directional filter bank (PDFB) is doubled at every other finer scale of the pyramid. By combining these two steps, the support size of the PDFB basis functions changes from one level to the next according to the curve scaling relation; each generation doubles the spatial as well as the angular resolution. Figure 5 shows a multi-scale and directional decomposition using a combination of LP and DFB. Bandpass images from the LP are fed to a DFB so that directional information can be captured, and the scheme can be iterated on the coarse image. The combined result is a double-iterated filter bank which decomposes images into directional sub-bands at multiple scales. Let x_0[n] be the input image. The output of the LP stage is J bandpass images b_j[n] (j = 1, 2, …, J) in fine-to-coarse order and a low-pass image x_J[n].
The jth level of the LP decomposes the image x_{j−1}[n] into a coarser image x_j[n] and a detail image b_j[n]. Each bandpass image b_j[n] is further decomposed by an l_j-level DFB into 2^{l_j} bandpass directional images c^{(l_j)}_{j,k}[n], k = 0, 1, …, 2^{l_j} − 1. These decompositions satisfy the perfect reconstruction (Equation (7)) and orthogonality (Equation (8)) properties.
Since the multi-scale and directional decomposition stages are decoupled in the discrete contourlet transform, we can have a different number of directions at different scales, thus providing a flexible multi-scale and directional expansion of remotely sensed images.
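A minimal sketch of the pyramid stage may clarify the structure: one Laplacian-pyramid-style level in NumPy, with a simple 2 × 2 block average standing in for the low-pass filter (the actual contourlet uses proper LP filters followed by a DFB, which is omitted here). Storing the prediction residual as the bandpass image guarantees perfect reconstruction by construction:

```python
import numpy as np

def lp_analysis(x):
    """One Laplacian-pyramid-style level: a coarse approximation
    plus a bandpass residual. A 2x2 block average stands in for
    the low-pass filter of a real LP."""
    coarse = (x[0::2, 0::2] + x[0::2, 1::2]
              + x[1::2, 0::2] + x[1::2, 1::2]) / 4.0
    pred = np.kron(coarse, np.ones((2, 2)))   # upsample by pixel replication
    band = x - pred                           # bandpass (detail) image
    return coarse, band

def lp_synthesis(coarse, band):
    """Invert lp_analysis exactly."""
    return np.kron(coarse, np.ones((2, 2))) + band

rng = np.random.default_rng(0)
x = rng.random((16, 16))
c, b = lp_analysis(x)
assert np.allclose(lp_synthesis(c, b), x)   # perfect reconstruction
```

In the full contourlet, each bandpass image b would then be passed through a directional filter bank, and the analysis would be iterated on the coarse image c.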

Feature extraction
Unser (1995) used co-occurrence matrices and concluded that second-order statistics may be best for segmentation of micro-textures. The use of a wavelet transform for texture discrimination by extracting features from different sub-bands is explored by Mallat (1989). Scheunders, Livens, Van de Wouwer, Vautrot, and Van Dyck (1998) discussed multiband moment features, which they used with three-band colour data for texture analysis.
In the case of the wavelet transform, it is found that wavelet coefficient distributions are long tailed and, in particular, non-Gaussian (or sometimes close to Gaussian but not exactly Gaussian) (Belge, Kilmer, & Miller, 2000; Buccigrossi & Simoncelli, 1999). For a general analysis in which multiresolution coefficients follow a mixture of distribution families, a natural way to proceed is to use higher-order moments of the multiresolution coefficients for the unknown underlying distributions (since the Gaussian distribution is completely characterized by its first- and second-order moments).
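This motivation can be checked numerically: for a sparse, long-tailed coefficient set the fourth-order moment departs strongly from its Gaussian value, which is why moments beyond second order carry discriminative information. A small NumPy sketch with simulated rather than real coefficients:

```python
import numpy as np

def excess_kurtosis(c):
    """Fourth standardized moment minus 3 (zero for a Gaussian)."""
    c = c - c.mean()
    return np.mean(c**4) / np.mean(c**2) ** 2 - 3.0

rng = np.random.default_rng(1)
gauss = rng.normal(size=10_000)
# sparse, long-tailed coefficients: mostly near zero with a few
# large values, mimicking wavelet detail sub-bands of natural images
sparse = rng.normal(size=10_000) * (rng.random(10_000) < 0.05)

# the sparse set is strongly leptokurtic; the Gaussian set is not
assert excess_kurtosis(sparse) > excess_kurtosis(gauss)
```

First and second moments alone cannot separate these two coefficient sets, since both are zero-mean; the higher-order moments do.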
Because wavelet basis functions are roughly isotropic at all scales and locations, they are more appropriate for isotropic or only slightly anisotropic features. Curvelet and contourlet transforms therefore target the detection and characterization of non-Gaussian, anisotropic signatures in the image. In Starck, Aghanim, and Forni (2004), the kurtosis is used to understand the nature of complex non-isotropic features in cosmology. Consequently, we are motivated to use lower- and higher-order moments in the context of Gaussian as well as long-tailed distributions of curvelet and contourlet coefficients.

Texture training
In order to extract reference texture features, four classes, viz. formal built-up area, slum, vegetation and water, are considered in the experimental set-up. Texturally homogeneous sample regions are initially cropped manually from 8 to 10 different places to obtain pure samples of each texture based on visual inspection. Curvelet, contourlet and wavelet transforms are applied to these 8-10 texture samples of each class, and a set of the first four moments and energy is extracted from the transform coefficients; an arithmetic mean calculated over these 8-10 feature vectors serves as the representative texture feature for the given texture class. These features are calculated for three levels of resolution. The curvelet matrix is based on a radial "wedge" consisting of 16 sub-bands and 2 approximation bands for a 3-level decomposition (16 detail + 2 approximation bands = 18). This results in a feature vector comprising 18 × 5 = 90 descriptors per pixel. The discrete curvelet transform can be calculated at various resolutions and angles. Two parameters are involved in the implementation of the curvelet transform: the number of resolutions and the number of angles at the coarsest level. The parameters are bound by two constraints: the maximum number of resolution levels depends on the original image size, and the number of angles at the second coarsest level must be at least eight and a multiple of four (Candès et al., 2006).
We considered pure classes of size 32 × 32 pixels; three levels of resolution could be extracted from this image size. The following numbers of angles are explored: 12, 16 and 20, resulting in 14, 18 and 22 sub-bands containing structural information at 3 levels of resolution, respectively. Sixteen angles are found to be a plausible choice for pure-class samples of size 32 × 32 pixels. The reference feature vector containing energy, mean, variance, skewness and kurtosis for each class is formed. The most fundamental first-order moment is the mean. Variance measures the irregularity within the selected window. Skewness measures the extent to which outliers favour one side of the main distribution. Kurtosis measures the peakedness, or the presence of outliers, and energy gives the strength of the signal in each sub-band. The detailed steps for training are explained in Algorithm 1.

Algorithm 1: The Training Algorithm
[Input:] 8-10 samples (small sub-regions) of each texture class
[Output:] the representative feature vector library for each texture class
(1) Given a sample obtained from a texture class, decompose it using:
a. Wavelet (9/7 biorthogonal), three levels (3 + 3 + 3 detail + 1 approximation = 10 bands)
b. Curvelet, three levels (16 detail + 2 approximation = 18 bands)
c. Contourlet, three levels (16 detail + 2 approximation = 18 bands)
(2) From the decomposed coefficients of 1a, 1b and 1c, extract the first 4 moments as in Equations (9)-(12) and energy (4 + 1 = 5 features), giving 50 features for wavelet (10 × 5 for 1a) and 90 features (18 × 5 for 1b and 1c) for curvelet and contourlet
(3) Repeat steps (1) and (2) for the next sample (8-10 samples in total for each texture class)
(4) Compute the arithmetic mean of the feature vectors from step (3) over the 8-10 samples to obtain the representative feature vector for the class
(5) Repeat steps (1)-(4) for all texture classes
(6) Use these representative feature vectors for all the classes to build the feature library

After computing the reference feature descriptor for each class, the image to be segmented is processed by considering a local window around each pixel. A neighbourhood of each pixel is taken into account since texture is defined not only at the micro-level but also at the macro-level. The feature at a pixel (x, y) is calculated within a local window of size 32 × 32, chosen after experimenting with various sizes w × w centred at the pixel. A bigger window captures macro-level textures effectively but degrades the accuracy of boundary localization in texture segmentation. The local window size must therefore be properly controlled for exact texture segmentation.
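Steps (1)-(6) of Algorithm 1 can be sketched as follows, assuming the sub-band decomposition is available from any of the three transforms (the function and variable names here are illustrative, not from the paper):

```python
import numpy as np

def subband_features(band):
    """Energy plus the first four moments of one sub-band."""
    c = band.ravel().astype(float)
    mean = c.mean()
    var = c.var()
    std = np.sqrt(var) if var > 0 else 1.0
    skew = np.mean(((c - mean) / std) ** 3)
    kurt = np.mean(((c - mean) / std) ** 4)
    energy = np.sum(c ** 2)
    return np.array([energy, mean, var, skew, kurt])

def texture_feature_vector(bands):
    """Concatenate the 5 descriptors over all sub-bands,
    e.g. 18 bands x 5 = 90 features for a 3-level curvelet."""
    return np.concatenate([subband_features(b) for b in bands])

def train_class_prototype(samples_bands):
    """Steps (3)-(4): average the feature vectors over the
    8-10 training samples of one texture class."""
    return np.mean([texture_feature_vector(s) for s in samples_bands],
                   axis=0)

# 9 synthetic "samples", each decomposed into 18 sub-bands of coefficients
rng = np.random.default_rng(2)
samples = [[rng.random((8, 8)) for _ in range(18)] for _ in range(9)]
proto = train_class_prototype(samples)
assert proto.shape == (18 * 5,)   # one 90-element prototype per class
```

Repeating `train_class_prototype` per class, as in step (5), yields the feature library used at classification time.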
The unknown texture is decomposed using the wavelet, curvelet and contourlet transforms, and similar features are extracted and compared with the corresponding reference feature descriptors stored in the feature library using the Euclidean distance given in Equation (13):

D(i) = sqrt( Σ_j [ f_j(x) − f_j(i) ]² )    (13)

where f_j(x) represents the feature descriptor of the unknown texture and f_j(i) the corresponding reference feature descriptor of the ith class in the feature library. The unknown texture is then classified as the ith texture class if the distance D(i) is minimum among the distances to all texture classes in the library. Algorithm 2 describes the segmentation steps in detail.
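A minimal sketch of the minimum-distance matching of Equation (13), with hypothetical three-element prototypes standing in for the 50- or 90-element descriptors used in the paper:

```python
import numpy as np

def classify_min_distance(feat, library):
    """Eq. (13): Euclidean distance to each class prototype;
    the window is assigned to the class at minimum distance."""
    dists = {name: np.sqrt(np.sum((feat - ref) ** 2))
             for name, ref in library.items()}
    return min(dists, key=dists.get), dists

# toy feature library (illustrative values only)
library = {
    "slum":     np.array([4.0, 1.0, 2.0]),
    "built-up": np.array([9.0, 3.0, 1.0]),
    "water":    np.array([0.5, 0.1, 0.2]),
}
label, dists = classify_min_distance(np.array([4.2, 1.1, 1.8]), library)
assert label == "slum"   # nearest prototype wins
```

Sliding this over the local windows of the image produces the segmentation map.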

Algorithm 2: The Classification Algorithm
[Input:] an unknown texture image and the representative feature vectors in the library
[Output:] the index of the texture class to which this unknown texture image is assigned
(1) Decompose the unknown texture image using:
a. Wavelet (9/7 biorthogonal), three levels (3 + 3 + 3 detail + 1 approximation = 10 bands)
b. Curvelet, three levels (16 detail + 2 approximation = 18 bands)
c. Contourlet, three levels (16 detail + 2 approximation = 18 bands)
(2) Extract the features as in step (2) of Algorithm 1
(3) Compute the distance between the feature vector of step (2) and all the representative feature vectors in the feature library using Equation (13)
(4) Classify the unknown texture as the ith texture class if the distance D(i) is minimum among the distances to all texture classes in the library

In order to compare the segmentation results of the proposed method, standard grey-level co-occurrence matrix (GLCM)-based features are also considered for segmentation. From the original image, co-occurrence matrices are computed for four directions (0°, 45°, 90° and 135°) at a set distance of one. The following six Haralick texture descriptors are then extracted from each co-occurrence matrix: energy, entropy, contrast, homogeneity, mean and variance (Haralick et al., 1973). Seven window sizes ranging from 5 × 5 to 17 × 17 are used to compute the GLCM for a fair comparison with the MRA approaches. The image is segmented with these GLCM-based features to compare the results with the proposed method. Similar multi-scale GLCM-based texture analysis is adopted by Coburn and Roberts (2004), Su et al. (2008) and Agüera, Aguilar, and Aguilar (2009) for satellite image classification. For comparative analysis, a support vector machine (SVM) classifier is also used; a comprehensive description of SVMs can be found in Cristianini and Shawe-Taylor (2000).
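For reference, a GLCM at one offset and a few of the Haralick descriptors can be computed directly in NumPy as below (a didactic sketch; a library such as scikit-image would normally be used, and only four of the six descriptors are shown):

```python
import numpy as np

def glcm(img, dx, dy, levels):
    """Grey-level co-occurrence matrix at pixel offset (dx, dy),
    normalized to a joint probability table."""
    h, w = img.shape
    m = np.zeros((levels, levels))
    for y in range(max(0, -dy), min(h, h - dy)):
        for x in range(max(0, -dx), min(w, w - dx)):
            m[img[y, x], img[y + dy, x + dx]] += 1
    return m / m.sum()

def haralick(p):
    """Four of the descriptors used in the comparison."""
    i, j = np.indices(p.shape)
    energy = np.sum(p ** 2)
    contrast = np.sum(p * (i - j) ** 2)
    homogeneity = np.sum(p / (1.0 + (i - j) ** 2))
    entropy = -np.sum(p[p > 0] * np.log(p[p > 0]))
    return energy, contrast, homogeneity, entropy

img = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 2, 2, 2],
                [2, 2, 3, 3]])
p = glcm(img, dx=1, dy=0, levels=4)   # 0 degrees, distance 1
assert np.isclose(p.sum(), 1.0)       # valid probability table
```

Note that, unlike the MRA features, this matrix is tied to a single offset and window, which is exactly the single-scale limitation discussed in the introduction.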
Training, classification and accuracy assessment are carried out using LIBSVM version 3.23 (Chang & Lin, 2011).

Results and discussion

Figure 6 shows slum identification results for the IRS-1C panchromatic data set. Figure 6(a,b) show the original image and the reference for slums, respectively. Figure 6(c) shows the result of the wavelet-based method, in which slum areas are poorly identified and erroneous segmentation is evident: slum areas and partially built-up areas get mixed. Figure 6(d,e) show results using curvelet- and contourlet-based features. Slum areas are identified with better accuracy and improved boundary continuity, and the slum class does not get mixed with partially or formally built-up areas as observed in the wavelet- and GLCM-based approaches.
Figure 6(f) shows that the segmentation obtained by GLCM is unable to distinguish between slum and built-up areas at multiple locations. This is because the curvelet and contourlet features capture the directional and anisotropic properties of textures.
Similar analysis is carried out for Worldview-2 image of Mumbai city covering Cheeta Camp slum.
Here it can be seen that the textures of vegetation, water, slum and built-up areas are quite different; texture measures should therefore prove effective in separating these classes. The segmentation results are shown in Figure 7. Curvelet and contourlet efficiently capture the curvilinear details and represent structures having various orientations and anisotropic characteristics. Figure 7(d,e) demonstrates the utility of the curvelet and contourlet features for oriented buildings, which manifest directional details in the image.
Figure 7(f) shows the segmentation by the GLCM method; it has difficulty distinguishing between slum and built-up areas, and at many places buildings are wrongly classified as slums. Further, boundaries between classes are not captured as well as with the proposed methods in Figure 7(d,e). The classification with the proposed method performs better, and the classes considered on the basis of visual inspection are classified correctly, with fine boundaries between classes.

Class separability
Various samples from different locations for all the classes are considered and feature vectors are extracted; the range of features is shown for the different classes. Figures 8 and 9 indicate the relative separation between classes in texture feature space for the IRS-1C and Worldview-2 data sets, respectively. It is observed that curvelet- and contourlet-based statistical feature descriptors show better separation than wavelet- and GLCM-based features for all class pairs. No two class pairs overlap in the curvelet and contourlet texture feature spaces, which results in proper separation of slum areas from the other classes. The slum-built-up and slum-vegetation class pairs show maximum separation for the proposed feature descriptors, in contrast to the wavelet- and GLCM-based descriptors, where the classes overlap, resulting in erroneous classification at many places as shown in Figures 6(c,f) and 7(c,f). As built-up areas contain more edges than slums and vegetation, and edges are very well captured by curvelet- and contourlet-based features, the classes are distinguished better, with a feature space separation of 1.2 units. Formal areas show structured patterns in different directions, which are successfully captured by curvelets and contourlets due to the directional property of the coefficients. Since the water class is more homogeneous than built-up areas, the discriminating power of curvelet and contourlet is not as high as for built-up areas, but it is still higher than that of the wavelet- and GLCM-based descriptors. Further, because curvelets and contourlets exhibit a strong ability to capture intrinsic geometrical details and directional selectivity, they can analyse and recognize very small features of different classes containing rich and dense detail information.
As it is observed from Figures 6(c) and 7(c), slum and built-up classes are mixing at many places. We investigated these erroneous classification results and found that wavelet-based features are not as separable as those obtained from curvelet and contourlets. Table 1 shows the distances for correctly classified and misclassified pixels using wavelet-based feature. Curvelet and contourlet feature distances are also provided for comparison. These results are in agreement with class separability measures as shown in Figures 8 and 9.
The water class is more homogeneous, which results in lower variance values as compared to the other classes. Similarly, it has lower reflectance, which in turn reduces the mean and energy in the approximation sub-bands. Therefore, mean and energy are the most descriptive features for the water class in the approximation sub-bands of MRA.
The variance in detail sub-bands differentiates well between slums and formal built-up areas, since high variance values indicate a stark variation in pixel values within the window for built-up areas. This occurs in particular at the edge of a building, when the surroundings of the building have a distinctly different reflectance. Thus, the variance mainly depicts building structures in formally developed areas, which have high variance values particularly in the detail sub-bands, while houses in slums are usually too small and clustered to have a clear separation from their surroundings. The curvelet and contourlet transforms decompose the image into a larger number of detail sub-bands in different orientations (16 here) compared to the wavelet decomposition, which has only 3 orientations. The variance feature in detail sub-bands is the most dominant descriptor for isolating slums from built-up areas.
The kurtosis value is found to be higher in the approximation and detail sub-bands for slum areas and built-up areas, respectively. Skewness is found to be in a similar range for the vegetation and slum classes, whereas this range is quite different for formal built-up areas. This allows the extraction of densely built-up areas with low GLCM variance values, which have a high probability of being slums. A detailed analysis of the decomposed detail and approximation sub-bands, towards a feature selection mechanism that identifies the most descriptive feature for each class, is a future prospect of this work.

F-ratio-based class separability
To further quantify the discriminating power of the features, a statistical parameter, viz. the F-ratio, is used here; it is a measure of inter-class to intra-class variance in feature space. Table 2 shows the discriminating power of the individual feature spaces. It is observed that curvelet- and contourlet-based energy and moment features have greater discriminating power than wavelet-based features for both images.
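One common form of the F-ratio, inter-class variance over mean intra-class variance for a single feature, can be sketched as follows (the exact normalization used in the paper may differ; the class values here are illustrative):

```python
import numpy as np

def f_ratio(groups):
    """Between-class to within-class variance of one feature:
    higher values mean the feature separates the classes better."""
    grand = np.mean(np.concatenate(groups))
    between = np.mean([(g.mean() - grand) ** 2 for g in groups])
    within = np.mean([g.var() for g in groups])
    return between / within

# toy per-class values of a single texture feature
slum_f    = np.array([2.0, 2.1, 1.9, 2.2])
builtup_f = np.array([5.0, 5.2, 4.8, 5.1])
water_f   = np.array([0.2, 0.25, 0.15, 0.2])

ratio = f_ratio([slum_f, builtup_f, water_f])
assert ratio > 1.0   # class means far apart relative to class spread
```

Computing this per feature, per transform, reproduces the kind of comparison summarized in Table 2.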

Accuracy assessment
The results from the proposed method are very encouraging as seen by the accuracy analysis. The detailed ground reference images are used, and classification analysis in terms of confusion matrix and kappa coefficient is presented.
Tables 3 and 4 describe the producer, user and overall accuracies for the IRS-1C and Worldview-2 images, respectively. The class-wise accuracies are in agreement with the class separability measures discussed in Figures 8 and 9. As observed from Figure 6(d,e), formal built-up and slum areas are properly segmented, and the corresponding producer's and user's accuracies are better for curvelet and contourlet than for wavelet-based features. The overall accuracy using curvelet and contourlet is 91.4% and 88.9%, respectively, as compared to 81.9% obtained from the wavelet-based approach. The highest user accuracy using the wavelet method is found for the water class (which is in general more homogeneous than the other classes and does not contain structural details); however, it is not as high as that obtained from the curvelet- and contourlet-based methods. Similarly, Table 4 shows the accuracies for the Worldview-2 image. The overall accuracies are found to be 92.3%, 91% and 82.3% for the curvelet-, contourlet- and wavelet-based methods, respectively, with the MDM classifier. For comparison, the user and producer accuracies are also computed for the SVM classifier, and it is found that SVM improves the accuracy for each class for both images (Tables 5 and 6).

In order to assess the effectiveness of different MRA features apart from visual interpretation, local and global consistency errors (GCEs) (Martin, Fowlkes, Tal, & Malik, 2001) are computed as quantitative measures to evaluate the degree of matching between the segmentation output and the reference site. Local consistency error (LCE) quantifies the consistency between image segmentations of differing granularities, allowing labelling refinement between segmentation and ground truth (Martin et al., 2001). GCE forces all local refinements to be in the same direction and assumes that one of the segmentations must be a refinement of the other.
LCE and GCE are computed as (Martin et al., 2001):

LCE(R, S) = (1/n) Σ_i min{ E(R, S, p_i), E(S, R, p_i) }

GCE(R, S) = (1/n) min{ Σ_i E(R, S, p_i), Σ_i E(S, R, p_i) }

where E(R, S, p) measures the degree to which the reference segmentation (R) and the segmented output (S) agree at pixel p, and n is the number of pixels in the image. Both are error measures with a score of 0 meaning no error and a score of 1 meaning maximum error. LCE and GCE quantify the degree of closeness between the segmentation results obtained from the different MRA methods and the reference segmentation generated from human visual interpretation. The lower values of LCE and GCE demonstrate a higher degree of matching for curvelet (0.08, 0.09) and contourlet (0.085, 0.089) features when compared to wavelet-based (0.39, 0.42) segmentation (Table 7).
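Both measures can be implemented directly from the definitions of Martin et al. (2001), where R(S, p) denotes the segment of S containing pixel p (a brute-force sketch, adequate for small windows):

```python
import numpy as np

def refinement_error(s1, s2):
    """Per-pixel error E(S1, S2, p) = |R(S1,p) minus R(S2,p)| / |R(S1,p)|,
    where R(S, p) is the segment of S containing pixel p."""
    e = np.zeros(s1.size)
    f1, f2 = s1.ravel(), s2.ravel()
    for idx in range(f1.size):
        seg1 = f1 == f1[idx]
        seg2 = f2 == f2[idx]
        e[idx] = np.sum(seg1 & ~seg2) / np.sum(seg1)
    return e

def lce_gce(ref, seg):
    """LCE takes the per-pixel minimum; GCE picks one refinement
    direction for the whole image."""
    e12 = refinement_error(ref, seg)
    e21 = refinement_error(seg, ref)
    n = ref.size
    lce = np.mean(np.minimum(e12, e21))
    gce = min(e12.sum(), e21.sum()) / n
    return lce, gce

ref = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1]])
lce, gce = lce_gce(ref, ref.copy())
assert lce == 0.0 and gce == 0.0   # identical segmentations: no error
```

Since GCE requires one segmentation to refine the other everywhere, GCE is always at least as large as LCE, as the reported pairs (0.08, 0.09), (0.085, 0.089) and (0.39, 0.42) illustrate.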

Conclusions
In this paper, a textural analysis method based on curvelet and contourlet MRA for slum identification is proposed. The proposed algorithm is tested on IRS-1C and Worldview-2 images of Mumbai city covering different slum areas of Dharavi and Cheeta Camp, respectively. The results are compared with an MRA-based wavelet method and a non-MRA-based GLCM method of segmentation. The performance is evaluated based on visual interpretation, class separability measures in terms of texture feature distance and F-ratio, kappa coefficients, LCE, GCE, and class-wise and overall accuracies. From the experimental results, it is found that the multi-scale curvilinear approach of curvelet and contourlet yielded better class discrimination for all class pairs for both images. This improvement in segmentation is due to the fact that the curvelet and contourlet transforms have good directional selectivity for linear as well as curvilinear discontinuities when compared with the traditional MRA technique based on wavelets.
The experimental results for the curvelet-and contourlet-based methods exhibited good performance in terms of both visual interpretation and feature discrimination and are sufficiently robust against random pixels while preserving spatial arrangement. An overall classification accuracy of 91.4-95.4% is achieved with proper boundary shapes using curvelet method. This would facilitate monitoring of slum dynamics by urban authorities to carry out advanced analysis for planning.

Disclosure statement
No potential conflict of interest was reported by the authors.