Automatic building footprint extraction from high-resolution satellite image using mathematical morphology

Automatic building extraction from High-Resolution Satellite (HRS) images has been an important field of research in remote sensing. Various radiometric, geometric, edge-detection-based and object-based techniques have already been discussed and used by researchers for building extraction. However, the faithfulness of extraction is highly dependent on user intervention. This study proposes a novel morphology-based automatic approach for extracting buildings from HRS images. Using this automatic approach, buildings of different sizes and shapes can be detected. The proposed technique integrates the morphological Top-hat filter and the K-means algorithm to extract buildings having bright and dark rooftops. The extracted bright- and dark-rooftop building segments are then combined to obtain the final output containing the extracted building segments. In order to eliminate falsely detected buildings, parameters such as area, eccentricity, and axis ratio (major/minor axis) have been used. The suitability of the technique has been judged using indicators such as completeness, correctness and quality.


Introduction
Automatic extraction of buildings has found applications in various areas, such as land use/land cover mapping, change detection, urban planning, disaster management and many other socio-economic activities. Remote sensing and GIS techniques play a major role in such applications. Moreover, the application areas of remote sensing have increased considerably with the availability of sub-metre resolution data from high-resolution earth observation satellites such as IKONOS, WorldView, and QUICKBIRD. HRS imagery facilitates the identification of various urban features, such as roads, trees, buildings and other natural and man-made structures present on the earth's surface. Since manual extraction of buildings from satellite imagery is a time-consuming and costly affair, automated methods for building extraction have been emerging as a time- and cost-effective solution with minimal user involvement.
Automatic extraction of buildings from satellite images has always been a difficult task for many reasons: building structure and shape may vary, and surrounding objects, such as trees and high-rise buildings, may pose obstacles. Further, the contrast between a building roof and the surrounding region may be low, which is an important criterion in segmentation, and varying roof materials reveal different spectral characteristics. Considering these difficulties, the filters used to extract edges from satellite imagery have been broadly classified into three groups, namely gradient-based, Laplacian-based and morphology-based (Katiyar & Arun, 2014). Since both gradient- and Laplacian-based filters are very sensitive to noise, a mathematical morphology-based technique has been used in this study to extract buildings from HRS imagery.
Building tops in an urban area do not have similar shape, size and texture. However, buildings share certain common characteristics, such as their bright appearance and high contrast to the surrounding features. Considering these common building characteristics, the objective of this study is to extract buildings from HRS imagery using the morphological Top-hat transformation with minimal human intervention. Moreover, the K-means algorithm has been used to determine the centroids of predefined classes, and the "salt and pepper" effect and impulse points have been removed from the segmented image using a median filter. In order to validate the performance of the proposed methodology, both object-based and pixel-based analyses have been carried out and the obtained results have been illustrated using a confusion matrix. The proposed methodology has been implemented in MATLAB 13, while ArcMap 10.2.1 has been used for digitization to prepare the reference building map.

Literature review
Literature reveals a great deal of applications of satellite imagery for feature extraction. A wide range of automatic and semi-automatic techniques for feature extraction have been proposed in different studies. In the following discussion, the efforts of such relevant studies have been highlighted. Lin and Nevatia (1998) used an edge detection-based technique to detect rectangular buildings with flat roofs using geometric and projective constraints from a single-intensity image. The technique further constructs a 3D model of verified buildings using 2D and 3D evidence and the shadows cast by buildings. However, this technique detects only rectilinear structures, which limits its utility to areas covering only rectangular buildings. Mayer (1999) carried out a comprehensive survey of aerial image-based building extraction techniques. The comparative assessment of different techniques has been done on the basis of certain criteria, such as assessment and its complexity, and data and its complexity. Further, considering the benefits and limitations of different techniques, a combined model and strategy for object extraction using aerial imagery has been proposed. The proposed model addresses the fundamental issues in object extraction, such as scale, context and 3D structure. Wei, Zhao, and Song (2004) proposed a supervised clustering and edge detection-based technique to extract large buildings with shadow evidence from HRS QUICKBIRD panchromatic imagery. However, the technique failed to extract small buildings having little or no shadow.
In order to extract small buildings, Jin & Davis (2005) used spectral information along with structural and contextual details using IKONOS satellite imagery of the city of Columbia, Missouri. In the proposed technique, structural and contextual details have been used to distinguish buildings from parking lots and other features having similar spectral information. The technique has been found efficient, extracting 72.7% of the building area with a quality percentage of 58.8%. However, it is observed that integrating the structural, contextual, and spectral results is an extremely complicated procedure.
Advanced morphological operators, such as the Hit-or-Miss transformation with structuring elements of varying size and shape, have been used by Lefèvre, Weber, and Sheeren (2007) to extract buildings from HRS QUICKBIRD panchromatic imagery. The accuracy of the methodology has been computed in terms of precision rate, which has been 88% with a kappa value of 63%. In order to improve the overall accuracy of the building extraction procedure, Huang and Zhang (2012) introduced the Morphological Shadow Index (MSI) and used it along with the Morphological Building Index (MBI) [Huang and Zhang (2011)] in an object-based framework. Further, Differential Morphological Shadow and Building operators have been used by Singh and Mehrotra (2014) to retrieve buildings from HRS GeoEye-1 imagery of the Washington DC Mall. The Overall Accuracy (OA) of the proposed methodology has been 95.12%. Huang, Lu, and Zhang (2014) proposed a multi-index learning (MIL) method for HRS images using a set of indices, such as MBI, MSI, and the Normalized Difference Vegetation Index (NDVI), to improve classification results over urban areas. Similarly, in order to improve classification accuracy using remote sensing images, Huang et al. (2016) introduced the Generalized Differential Morphological Profile (GDMP), which has been found advantageous over the traditional Differential Morphological Profile (DMP) [Pesaresi and Benediktsson (2001)]. Further, Benediktsson, Pesaresi, & Amason (2003) investigated methods to pre-process the DMPs, such as decision boundary feature extraction for neural networks, discriminant analysis feature extraction, and simple sorting feature selection, in order to reduce the computational load when DMPs are used for classification by neural networks. Singh, Maurya, Shukla, Sharma, and Gupta (2012) used an NDVI-based segmentation method to separate man-made and natural features, and morphological operators have been used to remove falsely detected regions.
However, their approach has been based on the assumption that buildings have rectangular shapes, which may not hold in a modern urban scenario.
Koc San and Turker (2010) used the Hough transformation and Support Vector Machine (SVM) classification to extract rectangular and circular buildings from HRS panchromatic and pan-sharpened IKONOS imagery. The results have been tested on identified industrial and residential areas in the imagery using parameters such as Building Detection Percentage (BDP) and Quality Percentage (QP). For industrial buildings, the BDP and QP values have been 93.45 and 79.51%, respectively; for the residential area containing rectangular and circular buildings, the BDP and QP values have been 95.34 and 79.05% for rectangular buildings and 78.74 and 66.81% for circular buildings. In order to improve classification accuracy, Huang and Zhang (2013) proposed SVM-enabled approaches (C-voting, P-fusion, and OBSA), which combine spectral, structural and semantic features for the classification of HRS images. The proposed approaches have been tested at both pixel and object levels, and it has been reported that the object-based C-voting and P-fusion techniques improve overall accuracy by 0.3-2.0% compared to their pixel-based versions. Dey et al. (2011) proposed a context-based multi-level segmentation method using pan-sharpened and Multi-Spectral (MS) GeoEye-1 imagery of the city of Hobart, Australia. The proposed methodology extracts homogeneous buildings using shadow-object geometry with an accuracy of 71%. Further, the building extraction accuracy using the pan-sharpened image was 5% greater than that of the MS image. However, this methodology cannot extract buildings having little or no shadow, even though these buildings cover a considerable area on the ground.
The evaluation of automatic building detection results is highly influenced by the evaluation method, since there is no unique definition of what constitutes a "correct" detection [Shufelt (1999)], and the comparison itself may be carried out in different ways, such as by visual inspection [Vögtle and Steinle (2003)] or automatically [Rottensteiner, Trinder, Clode, and Kubik (2005)]. In order to assess how the evaluation results are influenced by the evaluation method, Rutzinger, Rottensteiner, and Pfeifer (2009) carried out a critical overview of various existing pixel-based [Shufelt (1999)] and object-based [Matikainen, Hyyppä, and Hyyppä (2003); Zhan, Molenaar, Tempfli, and Shi (2005); Rottensteiner et al. (2005); Shan and Lee (2005)] methods for the evaluation of building detection algorithms and proposed a comprehensive evaluation strategy for building detection.

Study area
The performance of the proposed methodology has been tested on an IKONOS panchromatic image with 0.60-m resolution, captured over the region of Santa Ana, California, USA, Figure 1. The extent of the image covers the area between latitudes 33.69°N and 33.71°N and longitudes 117.91°W and 117.90°W, which is equivalent to an area of about 1.60 sq. km (1.10 km × 1.45 km). Santa Ana is situated in Southern California, adjacent to the Santa Ana River, within the second largest metropolitan area in the United States. It is located at latitude 33.75°N and longitude 117.87°W.

Methodology adopted
The methodology adopted for automatic building extraction has been shown in Figure 2. The proposed technique is based on Top-hat transformation and K-means algorithm.

Image acquisition (I PAN )
A scene including building tops of various sizes, shapes, and varying reflectance, captured over the region of Santa Ana, California, USA by the HRS IKONOS panchromatic sensor, has been selected as input. The selected scene includes various urban features, such as roads, vegetation, shadows, and parking lots.

Top-hat transformation (I WTH and I BTH )
The Top-hat transformation using morphological operations has been used to isolate dark and bright regions of an image (Serra, 1982). Moreover, morphological operations simplify image data, preserve essential shape characteristics, and eliminate irrelevancies (Haralick, Sternberg, & Zhuang, 1987). There are two variants of the Top-hat transformation, the white Top-hat and the black Top-hat, which are used to extract bright and dark features of an image, respectively. However, the selection of an appropriate size and shape of the structuring element (S) plays a key role in identifying different features of an image. In this study, the white and black Top-hat transformations have been applied to the input image (I_PAN) by selecting an appropriate size (8) and shape (disk) of the structuring element using MATLAB 13 functions, and the results have been used in the image enhancement process. The transformations are defined using the fundamental morphological operations as shown below.

White Top-hat transformation:

I_WTH = I_PAN − (I_PAN ∘ S)    (1)

Black Top-hat transformation:

I_BTH = (I_PAN • S) − I_PAN    (2)

where (I ∘ S) and (I • S) are the morphological opening and closing operations, respectively.
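The two transformations can be sketched in Python, with SciPy's greyscale morphology routines standing in for the MATLAB functions used in the study; the random image and the `disk` helper below are illustrative stand-ins, not the study's data or code:

```python
import numpy as np
from scipy import ndimage

def disk(radius):
    """Boolean disk-shaped structuring element (analogue of strel('disk', r))."""
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return x**2 + y**2 <= radius**2

# Synthetic stand-in for the panchromatic input I_PAN
rng = np.random.default_rng(0)
I_pan = rng.integers(0, 256, size=(64, 64)).astype(np.uint8)

S = disk(8)  # size 8, disk shape, as in the study

# White Top-hat: I_PAN minus its opening -> bright details smaller than S
I_wth = ndimage.white_tophat(I_pan, footprint=S)
# Black Top-hat: closing minus I_PAN -> dark details smaller than S
I_bth = ndimage.black_tophat(I_pan, footprint=S)
```

Since greyscale opening is anti-extensive and closing is extensive, both results are non-negative even for unsigned integer images.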

Image enhancement (I En )
Image pre-processing has been frequently used to remove noise, make image corrections, and enhance an image. Contrast enhancement has been an important technique, in which the dark and bright areas present in the image are separated to make the image better for human vision (Gupta & Kaur, 2014). Local contrast enhancement is a technique in which the original image is added to the difference between the Top-hat and bottom-hat transformed images (Ritika, Kaur, & Ritika, 2013). The following relation has been used to enhance the contrast of the image:

I_En = (I_PAN + I_WTH) − I_BTH    (3)

where I_WTH and I_BTH are the bright and dark image regions extracted using the white and black Top-hat transformations, respectively.
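The enhancement relation can be sketched as below; the `enhance_contrast` helper, the `int32` widening, and the clipping back to 8-bit range are assumptions made to avoid unsigned overflow, not part of the original MATLAB implementation:

```python
import numpy as np
from scipy import ndimage

def enhance_contrast(I_pan, footprint):
    """Local contrast enhancement: I_En = (I + white_tophat) - black_tophat."""
    I = I_pan.astype(np.int32)  # widen dtype so the sum/difference cannot wrap
    I_wth = ndimage.white_tophat(I, footprint=footprint)
    I_bth = ndimage.black_tophat(I, footprint=footprint)
    return np.clip(I + I_wth - I_bth, 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
I_pan = rng.integers(0, 256, size=(32, 32)).astype(np.uint8)
I_en = enhance_contrast(I_pan, np.ones((5, 5), dtype=bool))
```

On a perfectly flat image both top-hats are zero, so the enhancement leaves the image unchanged, which is a quick sanity check for the implementation.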

White top-hat transformation (I WTH )
The increased contrast of the enhanced image further facilitates differentiation between bright and dark features, which improves feature extraction during the Top-hat transformation process. In this step, the white Top-hat transformation is applied to the enhanced image (I_En) by selecting an appropriate size (30) and shape (disk) of the structuring element. Further, the K-means algorithm has been used to determine a threshold value from the obtained image to separate its bright and dark regions.
K-means clustering algorithm (I_tk)

The K-means algorithm (Lloyd, 1982) is an unsupervised clustering algorithm that classifies a given set of data into K clusters in two phases. In the first phase, it randomly selects K centroids, and in the second phase, it assigns each pixel to the centroid closest to that pixel, where K = 1, 2, . . ., n is a user-defined value indicating the number of predefined classes.
In a typical urban scenario, building rooftops have different colours, sizes and shapes. However, in a panchromatic image a building rooftop appears bright to dark (i.e., in different shades of grey, from 0 to 255 for an 8-bit image, or 0 to 65,535 for a 16-bit image) depending upon its rooftop (colour) reflectance. Considering this important property of building rooftops, in this study the parameter K in the K-means algorithm has been set to 3 (three), corresponding to bright, dark, and intermediate building rooftops. The K-means algorithm then determines the centroids of the three predefined building classes: dark (T_d), intermediate (T_i), and bright (T_b). In the next step, the centroid of the intermediate class has been used as a threshold value to divide the image into two separate images, dark (I_d) and bright (I_b). After thresholding, the following steps (Sections 4.6-4.8) have been carried out separately for both images (I_d and I_b), Figure 2.
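A minimal 1-D K-means sketch of this thresholding step is given below; the `kmeans_1d` helper, the deterministic quantile initialization, and the synthetic three-mode grey-level data are illustrative assumptions, not the study's implementation:

```python
import numpy as np

def kmeans_1d(values, k=3, iters=50):
    """Plain 1-D K-means (Lloyd's algorithm) with deterministic quantile init."""
    values = np.asarray(values, dtype=float)
    centroids = np.quantile(values, (np.arange(k) + 0.5) / k)
    for _ in range(iters):
        # Assign each grey level to its nearest centroid, then recompute means
        labels = np.argmin(np.abs(values[:, None] - centroids[None, :]), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = values[labels == j].mean()
    return np.sort(centroids)

# Synthetic grey levels with three rooftop populations: dark, intermediate, bright
rng = np.random.default_rng(1)
grey = np.concatenate([rng.normal(30, 5, 500),
                       rng.normal(120, 5, 500),
                       rng.normal(220, 5, 500)])

T_d, T_i, T_b = kmeans_1d(grey, k=3)  # centroids of the three classes
I_d_mask = grey < T_i                 # "dark" image I_d
I_b_mask = grey >= T_i                # "bright" image I_b
```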

Median filter
Image filtering is the process of emphasizing certain features or removing undesirable pixels (noise) from an image. Removing noise while restoring the original image has been one of the most important issues in image processing applications. Although different filters have been used in the literature to remove noise, the standard median filter (Tukey, 1977) has been frequently used in many image processing applications due to its simplicity, edge-preserving property and robustness to impulsive noise. Moreover, a median filter of size 3 × 3 has been found effective in eliminating the "salt and pepper" effect, which is commonly observed in image transmission (Zhu & Wang, 2012; Tejaswi, Rao, Nair, & Prasad, 2013). The nonlinear median filter runs a masking window across each pixel of an image and replaces the luminance value of the centre pixel of the filtering window with the median of the luminance values of the pixels contained within the window. Further, it has been observed that the median filter separates segments by eliminating thin lines connecting different segments. According to Morgan and Tempfli (2000), the elimination of these thin lines is important where 8-direction connectivity is used for segmentation. In addition, the median filter avoids the segment-merging problem and isolates the segments created.
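The 3 × 3 median filtering step can be sketched with SciPy; the tiny binary image with injected isolated "salt" pixels is a hypothetical example:

```python
import numpy as np
from scipy import ndimage

# Binary segment image corrupted with isolated salt-and-pepper-style pixels
img = np.zeros((20, 20), dtype=np.uint8)
img[5:15, 5:15] = 1          # one genuine square segment
img[2, 2] = 1                # isolated noise pixel
img[17, 17] = 1              # another isolated noise pixel

# 3 x 3 window, as in the study: isolated pixels are removed,
# while the interior of the large segment is preserved
filtered = ndimage.median_filter(img, size=3)
```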

Connected component labelling
Connected component labelling (Morgan & Tempfli, 2000) has been used to identify connected pixels (segments) and assign a unique label (C_k, k = 1, 2, . . ., n) to each extracted segment. The median filter has been applied before connected component labelling to reduce noise and misclassification. Usually, segmentation is performed after removing pixels of no interest; in this study, however, different parameters have been used to eliminate non-building segments (Section 4.8).
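This labelling step can be sketched with SciPy using 8-direction connectivity, consistent with the segmentation discussed above; the two-segment binary image is hypothetical:

```python
import numpy as np
from scipy import ndimage

binary = np.zeros((12, 12), dtype=bool)
binary[1:4, 1:4] = True      # segment 1 (3 x 3 = 9 pixels)
binary[6:10, 6:11] = True    # segment 2 (4 x 5 = 20 pixels)

# 3 x 3 all-ones structure gives 8-direction (diagonal) connectivity
structure = np.ones((3, 3), dtype=bool)
labels, n_segments = ndimage.label(binary, structure=structure)

# Pixel count of each labelled segment C_k
sizes = ndimage.sum(binary, labels, index=list(range(1, n_segments + 1)))
```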

Removal of false-detected building segments
After connected component labelling, it has been observed that the candidate building segments include non-building objects, such as roads, parking lots, and other urban features. During segmentation, these objects have been misclassified into the building class due to their similar reflectance properties. In order to filter out these false candidate building segments, all the extracted segments have first been numbered (B_n, where n = 1, 2, 3, . . ., N). In the next step, parameters such as axis ratio, eccentricity, and segment area have been used to eliminate falsely detected candidate building segments.

Segment axis ratio (s r )
Most buildings in an urban scenario are rectangular or square in shape. In order to extract such buildings, Singh and Mehrotra (2014) used the ratio of the area of the candidate building to the area of its Minimum Enclosing Rectangle (MER) to calculate a rectangular fit. In this study, however, the following relation has been used to extract rectangular buildings:

S_r = Major axis length / Minor axis length    (4)

A candidate building segment having an S_r value of "1" has been assigned a square shape, whereas an S_r value greater than "1" has been identified as a rectangular building. In order to obtain the threshold value for S_r, considering the general shape of buildings in an urban scenario, values in the range 1-10 have been tested on the input image and a suitable threshold value (7) has been obtained, such that the rectangular and square buildings in the region are extracted.

Segment eccentricity (S e )
Eccentricity returns a scalar that specifies the eccentricity of an ellipse. The eccentricity is the ratio of the distance between the foci of the ellipse to its major axis length, and its value lies between 0 and 1. An ellipse having an eccentricity of "0" (zero) is actually a circle, while an ellipse having an eccentricity of "1" is a line segment. In order to obtain the threshold value for eccentricity, ellipses of different shapes have been plotted with respect to eccentricity values in the range 0 to 1. By inspecting these plots, the most elongated elliptical shape that could still be a building rooftop has been identified, and the corresponding eccentricity value (0.85) has been selected as the threshold. Candidate building segments having eccentricity below the threshold value have been accepted as circular buildings, whereas candidate building segments having eccentricity greater than the threshold have been eliminated as misclassified building segments.

Segment area (S a )
The area of each candidate building segment is the number of pixels in the corresponding connected region (Singh & Mehrotra, 2014). In general, falsely detected non-building urban features, such as roads and barren land, are usually either elongated or cover a large area on the ground, while very small features, such as vehicles, cover a very small area. Buildings in an urban area, however, usually cover a considerable but not very large area on the ground. Considering the general building area in the study area, an appropriate threshold range (400-40,000 pixels) has been set, such that very small features (e.g., vehicles) and very large features (e.g., roads, barren land) are eliminated. Building candidates having a value within the threshold range (minimum to maximum) have been accepted as true buildings. However, the threshold range may vary depending on the resolution of the image and on whether a residential or industrial urban scenario is under consideration.
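The three shape parameters can be sketched from a segment's second central moments, the same ellipse fit that MATLAB's regionprops performs; the `shape_parameters` helper, the 4·sqrt(eigenvalue) axis-length convention, and the example rectangle are illustrative assumptions, not the study's code:

```python
import numpy as np

def shape_parameters(mask):
    """Axis ratio S_r, eccentricity S_e and area S_a of a 2-D binary segment,
    derived from the eigenvalues of its second central moments (ellipse fit).
    Assumes the segment is a genuine 2-D blob (non-degenerate covariance)."""
    ys, xs = np.nonzero(mask)
    area = xs.size
    lam2, lam1 = np.linalg.eigvalsh(np.cov(np.stack([xs, ys])))  # ascending
    major, minor = 4.0 * np.sqrt(lam1), 4.0 * np.sqrt(lam2)
    eccentricity = np.sqrt(1.0 - lam2 / lam1)
    return major / minor, eccentricity, area

# Hypothetical 20 x 30 pixel rectangular rooftop segment
mask = np.zeros((40, 60), dtype=bool)
mask[10:30, 10:40] = True

s_r, s_e, s_a = shape_parameters(mask)
# Apply the study's thresholds: S_r <= 7, S_e <= 0.85, 400 <= S_a <= 40,000
keep = (s_r <= 7) and (s_e <= 0.85) and (400 <= s_a <= 40000)
```

For this 2:3 rectangle the axis ratio is about 1.5 and the eccentricity about 0.75, so all three tests pass and the segment would be retained as a building candidate.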

Segment merging and segment edge detection
After removing misclassified candidate building segments using the above parameters, the dark and bright rooftop building segments have been obtained in two separate images, Figure 2. These separately extracted building segments have then been combined to obtain the final extracted building segment image. Moreover, small holes present in the extracted building segments have been filled, and the edges of all extracted building segments have been traced using the Boundary Tracing function in MATLAB 13. In the next step, these boundaries have been overlaid on the input image (I_PAN) to obtain the final output image (I_o).
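The merging, hole filling and boundary extraction steps can be sketched with SciPy; `binary_fill_holes` and the erosion-based boundary are stand-ins for the MATLAB functions used in the study, and the two small segment masks are hypothetical:

```python
import numpy as np
from scipy import ndimage

# Hypothetical dark- and bright-rooftop segment images
I_d_seg = np.zeros((15, 15), dtype=bool)
I_d_seg[2:6, 2:6] = True
I_b_seg = np.zeros((15, 15), dtype=bool)
I_b_seg[8:13, 8:13] = True

combined = I_d_seg | I_b_seg     # merge the two segment images
combined[9:12, 9:12] = False     # simulate a small hole in one segment

filled = ndimage.binary_fill_holes(combined)

# Boundary = segment minus its erosion (a simple stand-in for boundary tracing)
boundary = filled & ~ndimage.binary_erosion(filled)
```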

Quality assessment
The quality of the building extraction technique has been validated by examining quality measures such as Completeness, Correctness and Quality (Singh et al., 2012). Before applying these measures, the extracted candidate building segments have been categorized into three classes as explained below.

a. True Positives (TP) are the extracted buildings that are actually buildings.
b. False Positives (FP) are the extracted segments that do not correspond to actual buildings.
c. False Negatives (FN) are the actual buildings that have not been extracted.

Using these classes, the quality measures are defined as:

Completeness = TP / (TP + FN)    (5)

Correctness = TP / (TP + FP)    (6)

Quality = TP / (TP + FP + FN)    (7)
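The standard completeness, correctness and quality measures (Heipke et al., 1997), referred to elsewhere in this paper as Equations (5), (6) and (7), can be sketched directly; the counts below are hypothetical and for illustration only:

```python
def quality_measures(tp, fp, fn):
    """Completeness, correctness and quality from TP, FP, FN counts."""
    completeness = tp / (tp + fn)       # share of true buildings found
    correctness = tp / (tp + fp)        # share of detections that are real
    quality = tp / (tp + fp + fn)       # combined measure of both errors
    return completeness, correctness, quality

# Hypothetical counts for illustration
comp, corr, qual = quality_measures(tp=90, fp=10, fn=10)
```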
Object-based evaluation

The main objective of object-based evaluation is to identify the number of true building segments (TP), misclassified non-building segments (FP), and non-extracted true buildings (FN) by comparing the extracted building segments with the reference building map. The reference building map has been obtained by digitizing the input satellite image (I_PAN). In general, during object-based evaluation, the extracted building segments which have a certain minimum overlap, typically 50-70%, with buildings in the reference building map are accepted as TP [Matikainen et al. (2003); Zhan et al. (2005); Rottensteiner et al. (2005)]. In this study, the extracted building segments having an overlap greater than 60% (the average of the range 50-70%) have been identified as TP, while the remaining extracted segments have been considered FP. Further, the evaluation parameters completeness, correctness, and quality obtained by object-based analysis have been denoted by Completeness_obj, Correctness_obj, and Quality_obj, respectively.
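The 60% overlap criterion can be sketched as follows; the `is_true_positive` helper and the toy masks are illustrative assumptions, not the study's evaluation code:

```python
import numpy as np

def is_true_positive(extracted_mask, reference_mask, min_overlap=0.60):
    """Count an extracted segment as TP when more than min_overlap of its
    pixels coincide with a building in the reference map."""
    overlap = np.logical_and(extracted_mask, reference_mask).sum()
    return bool(overlap / extracted_mask.sum() > min_overlap)

ext = np.zeros((10, 10), dtype=bool)
ext[2:6, 2:6] = True                  # extracted segment: 16 pixels

ref_good = np.zeros((10, 10), dtype=bool)
ref_good[2:6, 2:5] = True             # 12 of 16 pixels overlap (75%)

ref_poor = np.zeros((10, 10), dtype=bool)
ref_poor[5:9, 5:9] = True             # only 1 of 16 pixels overlaps (~6%)
```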

Pixel-based evaluation
In the object-based evaluation strategy, there is no unique criterion to decide whether an object is a TP or not. Shufelt (1999) therefore considered pixel-based metrics, which are more objective than object-based metrics. During pixel-based analysis, the raster representation of the extracted results has been compared with the reference image [Rutzinger et al. (2009)]. The evaluation of the results has been presented as an image, which represents the spatial distribution of the TP, FP and FN pixels. Further, the evaluation parameters completeness, correctness, and quality obtained by pixel-based analysis have been denoted by Completeness_pix, Correctness_pix, and Quality_pix, respectively.

Result and discussion
The proposed automatic building extraction technique is based on the Top-hat transformation and the K-means algorithm. The Top-hat transformation has been used to separate dark and bright building segments, while the K-means algorithm determines the threshold value, which divides the dark and bright building segments into two separate images (I_d and I_b). The misclassification problem commonly observed in feature extraction has been addressed using the parameters axis ratio, eccentricity and segment area. Figure 3(a-c) shows an enlarged view of the results of removing misclassified buildings using these parameters. The extracted bright and dark rooftop building segments are shown in Figure 4(a) and Figure 4(b), respectively. These separately extracted building segments have been combined to obtain the final extracted building image (Figure 5(a)). The boundaries of all extracted building segments have been determined and overlaid on the input image (I_PAN) to obtain the final output (Figure 5(b)). Thereafter, the buildings in the original input image have been digitized using ArcMap 10.2.1 (Figure 5(c)) and used as the reference building map during the evaluation of results. The evaluation of the automatically extracted buildings has been carried out using both object-based and pixel-based analysis, as discussed in Sections 4.11 and 4.12.

Object-based evaluation
During object-based evaluation, the digitized reference building map, Figure 5(c), has been compared visually [Singh et al. (2012); Heipke, Mayer, Wiedemann, and Jamet (1997)] with the extracted final output image, Figure 5(b). The results of the visual comparison are shown in Table 1 and have been used in the analysis and assessment of quality. Falsely detected non-building regions and non-extracted true buildings are shown in Figure 6(a) and Figure 6(b), respectively.
Here, 14 non-building urban features have been misclassified into the building class due to their reflectance and structural similarity to true buildings, while 12 buildings, having reflectance values similar to those of roads and other surrounding urban features, have been missed. Further, the values of completeness, correctness and quality have been computed using Equations (5)-(7), and the obtained results are given in Table 2.
The computed values of completeness, correctness and quality together indicate the performance of the adopted methodology. The high completeness value (0.92) indicates that most of the true buildings have been extracted. Further, the high correctness value (0.90) shows that very few non-building urban features have been falsely detected as buildings. Quality is a measure of both completeness and correctness, indicating the total error of the building extraction methodology. In this case, the high quality value (0.83) confirms that the true buildings have been successfully extracted from the input image with minimal error.

Pixel-based evaluation
In order to critically analyse the performance of the proposed methodology, pixel-based evaluation [Rutzinger et al. (2009)] has been carried out. During pixel-based analysis, the raster representation of the extracted results has been compared with the reference image, and the extracted pixels have been classified into three categories: TPs, FPs and FNs. Figure 7 shows the pixels classified into the three respective categories (TPs: green, FPs: red, FNs: black) using pixel-based evaluation. Further, the values of completeness, correctness and quality have been computed using Equations (5)-(7), and the results obtained are given in Table 3.
The obtained values of the parameters completeness, correctness, and quality using pixel-based analysis have been 0.86, 0.89 and 0.78, respectively, which are acceptable when compared with Rutzinger et al. (2009). Furthermore, in order to validate the performance of the proposed methodology, it has also been tested on a scene selected from another IKONOS panchromatic image of 0.60-m resolution acquired over the region of Naples, Florida, USA, Figure 8(a). The selected region includes buildings of various sizes, shapes, and reflectance, which are morphologically different from the buildings in the first image, Figure 1. The proposed methodology shown in Figure 2 has been applied to the input image of Figure 8(a), and the final extracted building image is shown in Figure 8(b).
Thereafter, the evaluation of the obtained result has been carried out using both object-based and pixel-based analysis, and the results are shown in Tables 4, 5 and 6, respectively. Figure 9 shows the pixels classified into the three respective categories (TPs: green, FPs: red, FNs: black) using the pixel-based evaluation technique.
The values of completeness, correctness, and quality obtained using object-based and pixel-based analysis are acceptable when compared with Rutzinger et al. (2009).

Conclusion
Automatic building extraction has been an important area of research in remote sensing, contributing to various applications such as land-use/land-cover mapping, disaster management and many other socio-economic activities. The proposed methodology successfully extracts buildings of different sizes and shapes with minimal human intervention. Further, the misclassification problem has been addressed using the parameters axis ratio, eccentricity and segment area.
Interpretation of the obtained results shows that buildings with very bright and dark rooftops have been successfully extracted, and the obtained accuracy values are acceptable when compared with Rutzinger et al. (2009). The proposed automatic building extraction methodology is simple, fast, and effective; it does not require any additional information, such as training samples or a digital elevation model, and yields high accuracy. The proposed methodology can further be used for various applications, such as damage estimation, by identifying damaged and undamaged buildings, and the calculation of building density in a region.

Disclosure statement
No potential conflict of interest was reported by the authors.