Detection of illegal constructions in urban cities: comparing LIDAR data and stereo KOMPSAT-3 images with development plans

ABSTRACT Illegal building construction is a persistent problem in densely populated metropolitan cities, because it leads to distorted urbanization and an unbalanced urban structure. Development plans define maximum limits for building coverage and floor ratios, but in practice it is challenging to enforce the plans in partly unplanned cities undergoing urban transformation, such as Istanbul. Rapidly changing settlements make it crucial to track property statuses carefully, since there can be legal consequences. In this study, we use airborne Light Detection and Ranging (LIDAR) point cloud data, stereo KOMPSAT-3 images and the development plans of the Guzeltepe district, located in Uskudar/Istanbul, Turkey, to detect structures built in violation of the development plans. The plans were converted into a Geographical Information System (GIS) environment for ease of analysis. Digital elevation models (DEMs) and digital surface models (DSMs) were created from both the LIDAR and the stereo KOMPSAT-3 data to obtain normalized DSMs (nDSMs), from which the building heights were extracted. These heights were then compared with the planned limits, and the accuracies of the two datasets were analyzed. Our research may have direct implications for automatically building 3D plans and may eventually be instrumental in identifying illegal buildings.


Introduction
Sound urban planning and development is crucial for maintaining organized metropolitan life, especially in geographically challenging and overcrowded cities such as Istanbul. Population growth, wide job opportunities, new railroads and highways, and the desire to live in a better environment lead people to sprawl unevenly over the urban fringe of big cities or into agricultural districts. High demand for better living standards, together with social, economic, and administrative factors, distorts urban development; combined with a lack of planning regulations, these factors lead to unplanned and unwanted illegal settlements (Ioannidis, Psaltis, & Potsiou, 2009).
The right of ownership gives the title holder the authority to use the property as he or she wishes within the legal restrictions; it is therefore very important that proprietary rights be protected by law (Reisoglu, 1969). The increasing number of illegal buildings causes problems such as reduced regional and national revenues, because these buildings are not fully taxed and registered. Thus, it is important to detect and stop the problematic structures immediately. Several cities in Turkey have regions that lack technical infrastructure and reinforcement, are economically and socially weak, and have rising crime rates. Those regions need urban transformation. Since the 1980s, urban transformation has been gaining currency in Turkey.
Although there are regulations for this transformation, only rehabilitation works could be carried out; the urban transformations could not proceed to physical changes (Sisman & Kibaroglu, 2009). Methods for automatic building extraction, change detection, and classification are improving rapidly and attracting growing interest for urban planning purposes. Several studies have used high-resolution satellite images, LIDAR point clouds and topographic data to extract buildings, to generate digital surface models (DSMs), normalized DSMs (nDSMs) and 3D city models, and to perform 3D change detection, using traditional techniques or modern ones such as artificial neural networks and machine learning. LIDAR is an active remote sensing technology that emits pulses of laser light and computes the heights, distances and three-dimensional coordinates of points by measuring the time difference between the emitted and returned light beams, taking into account the incidence angle of the emitted beam and the absolute location of the sensor. The final products of these measurements are called point clouds, and they are mainly used for the production of Digital Elevation Models (DEMs) for different purposes (LIDAR 101, n.d.). In almost all of these studies, LIDAR point clouds and very high-resolution stereo images showed higher accuracy than the other methods. Overall, the studies showed that 3D modelling and 3D change detection with LIDAR point clouds and high-resolution images can be used for urban planning and cadastral applications (Bayburt, Buyuksalih, & Jacobsen, 2018; Bayburt, Kurtak, Buyuksalih, & Jacobsen, 2017; Benedek, Descombes, & Zerubia, 2010; Ioannidis et al., 2009; Lague, Brodu, & Leroux, 2013; Maltezos & Ioannidis, 2015; Moghadam, Delavar, & Hanachee, 2015; Pang, Hu, Wang, & Lu, 2014; Singhal & Radhika, 2014; Uzar & Yastikli, 2013; Yastikli & Cetin, 2017).
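The time-of-flight principle described above can be illustrated with a minimal sketch. It assumes a single pulse travelling in a vacuum and ignores the scan angle, atmospheric refraction and sensor position that a real system also has to account for; the function name and the 500 m example range are illustrative choices, not part of the processing chain used in this study.

```python
# Speed of light in vacuum (m/s); atmospheric effects are ignored in this sketch.
SPEED_OF_LIGHT = 299_792_458.0


def pulse_range(time_of_flight_s: float) -> float:
    """Return the one-way sensor-to-target distance for a two-way travel time."""
    if time_of_flight_s < 0:
        raise ValueError("time of flight cannot be negative")
    # The pulse travels to the target and back, so the path is halved.
    return SPEED_OF_LIGHT * time_of_flight_s / 2.0


# Illustrative value: the two-way travel time for a target 500 m away
# is about 3.336 microseconds.
two_way_time_s = 2 * 500.0 / SPEED_OF_LIGHT
print(f"range = {pulse_range(two_way_time_s):.3f} m")
```

Converting each range into a 3D coordinate additionally requires the beam direction and the sensor position and orientation at the moment of emission, as noted in the text.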
For rural and natural areas, InSAR has also long been an important technology for the generation of DSMs. With the development of very high resolution SAR sensors such as TerraSAR-X, the analysis of complex, densely built-up areas has also become an important research topic. Although airborne LIDAR gives the best solution in terms of accuracy, its cost is higher than that of the other techniques (Ioannidis et al., 2009).
In this study, we compared the data with the development plans at the level of individual buildings for the LIDAR data and of building blocks for the KOMPSAT-3 data in the neighborhood. In this way, the areas that need floor reduction or that allow additional floors are determined. The results provide information both about the areas built up against the law and about the development of the area in terms of new job opportunities, new settlements, new roads, and other needs. After these areas are identified, new regulations can be made in response to the changes. Because LIDAR data have very high accuracy and KOMPSAT-3 imagery has very high spatial resolution, and both offer fast acquisition, automatic processing, and coverage of large areas with a wide field of view, this paper aims to show that they can be used in urban planning as an alternative to field surveys.

Study area
The study area is the Guzeltepe neighborhood located in Uskudar/Istanbul, Turkey. The data are defined in UTM Zone 35, with WGS-84 used as the reference ellipsoid. This area was chosen because it is a rather problematic neighborhood with complicated building structures, illegal floor constructions, and ongoing legal proceedings. All these factors make the area a suitably challenging candidate for this study. In addition, the airborne LIDAR data acquisition had already been carried out, and the preprocessed point cloud data and development plans were available.

Data used
The airborne LIDAR data used in this study were acquired in 2013 with a Riegl Q680i laser scanner mounted on a Bell 206 Jet Ranger light utility helicopter. The point density is 16 pts/m² and the flight altitude was approximately 500 m above the ground. The development plans were obtained from Uskudar Municipality.
Stereo KOMPSAT-3 images were acquired on 24 March 2017 and used for DEM generation. The Korean KOMPSAT-3 satellite has a 0.7 m ground sampling distance (GSD) for panchromatic images and a 2.8 m GSD for multispectral images. The data are provided as a bundle (pan + 4 ms) or as a pan-sharpened Level 1R or Level 1G product. Radiometric and sensor distortions are corrected in the Level 1R product; Level 1G products include all Level 1R corrections, are additionally corrected for geometric distortions, and are projected to UTM coordinates. Using SRTM DSM 3-arcsec data, the optical distortions and terrain effects are corrected and an orthophoto is produced (KARI, 2013).
For all processing and analysis purposes, MicroStation, TerraSolid, ArcGIS, ERDAS, NETCAD, Geomatica software, and Google Earth geo-browser are used.

Methodology
LIDAR data characteristics, point classification, and height model generation
The 3D LIDAR point cloud used in this study covers eight adjacent 1/1000 scale base map sheets and the whole Guzeltepe neighborhood. The geometrically corrected LIDAR point clouds were classified using TerraSolid software: layered point clouds were obtained by classifying the .LAS data with TerraScan. The classified data include ground, building, vegetation, and other essential categories (Table 1). The main classes are shown in Figure 2.
The ground layer represents the bare terrain surface, while the vegetation classes correspond to trees within the height ranges shown in Table 1. The building layer corresponds to buildings; low points are faulty laser points located below the ground class; model key points are characteristic terrain points for orthophoto output; and noise corresponds to faulty points that do not belong to any category.
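The height-based vegetation classes can be illustrated with a small sketch that uses the bands listed in Table 1 (0.25-2.5 m for medium vegetation, above 2.5 m for high vegetation). The function name, the treatment of values below 0.25 m as "low vegetation", and the handling of the exact boundary values are assumptions made for illustration; the classification in this study was performed with TerraScan, whose routines are not reproduced here.

```python
def vegetation_class(height_above_ground_m: float) -> str:
    """Assign a vegetation class from a point's height above the ground surface."""
    if height_above_ground_m < 0:
        # Points below the ground surface would be treated as low points instead.
        raise ValueError("height above ground must be non-negative")
    if height_above_ground_m < 0.25:
        return "low vegetation"      # assumed band, not given in the recovered table rows
    if height_above_ground_m <= 2.5:
        return "medium vegetation"   # 0.25-2.5 m
    return "high vegetation"         # above 2.5 m


for h in (0.1, 1.0, 3.0):
    print(h, vegetation_class(h))
```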
DEM is a surface model produced from the ground class by filtering out detail points such as vegetation, electric poles and buildings. In this model, all points are defined by their x, y, and z values. Through rasterization, the data can be modelled at the desired grid spacing (Figure 3).
The LAS data density enables the production of a DEM with a 25 cm grid size. Additionally, another DEM with a 50 cm grid size, which was also required in the project, was produced. The DEM data, in .TIFF format and tiled into 500 m × 700 m tiles, were converted to .IMG format. DSM is a mathematical surface that includes vegetation, buildings, and the other visible objects on the ground (Figure 4). nDSM is a model obtained by removing the bare terrain surface, that is, by subtracting the DEM from the DSM, so that it contains the object heights of vegetation, trees and buildings:
nDSM = DSM - DEM (1)
In Figure 5, this model shows the height differences on the surface in shades of gray.
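The nDSM computation amounts to a cell-by-cell subtraction of two co-registered grids. The sketch below shows this on a tiny hand-made example, assuming both rasters share the same extent and grid size; the function name and the sample elevations are illustrative only, and real rasters would normally be handled with an array or GIS library rather than nested lists.

```python
def normalized_dsm(dsm, dem):
    """Return DSM - DEM cell by cell; both inputs are row-major grids of equal shape."""
    if len(dsm) != len(dem) or any(len(s) != len(g) for s, g in zip(dsm, dem)):
        raise ValueError("DSM and DEM must be defined on the same grid")
    return [[s - g for s, g in zip(s_row, g_row)] for s_row, g_row in zip(dsm, dem)]


# Illustrative elevations in metres: one cell holds a building roof,
# one a low object, and two are bare ground.
dsm = [[112.0, 124.5],
       [110.0, 111.0]]
dem = [[110.0, 110.5],
       [110.0, 111.0]]

ndsm = normalized_dsm(dsm, dem)
print(ndsm)
```

Cells where the two surfaces coincide yield zero, so only objects standing above the terrain remain in the result.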
3D city models are used for exploring, analyzing, and synthesizing city data and for making future projections. The most important feature of a 3D city model is that it allows different kinds of spatial information to be presented in a single display. Furthermore, it enables complex city models to be generated and managed. Urban spaces have to be considered, planned and studied in a 3D environment, so virtual 3D city models, which comprise terrain, building, vegetation, and road models, are used to display geographically referenced city data. The concept of scale for 3D buildings is expressed as level of detail (LOD). Each LOD indicates a degree of detail. Unlike 2D topographic maps, there is no generally accepted LOD for 3D buildings. The applicable LOD today is mostly determined by the data resolution, the content of semantic information and the application. Used with appropriate levels of detail, 3D city modelling contributes to obtaining 3D spatial data at minimum cost.
Table 1 (partial). Medium vegetation (0.25-2.5 m); 5 High vegetation (above 2.5 m); 6 Building; 7 Low point; 8 Model key points; 9 Error.
Figure 2. Classified LIDAR point clouds for Guzeltepe (bare earth, vegetation, and buildings are represented with colors orange, green, and red respectively).
There are no comprehensive established rules for 3D city modelling yet. In this regard, the first standards were created by the Open Geospatial Consortium (OGC) within the City Geography Markup Language (CityGML) to facilitate 3D building data sharing, as follows:
• CityGML for the 3D building models created with the LOD,
• 3D-Studio MAX and VRML file formats for 3D building models,
• ESRI Shapefile format for the 2D GIS data that include 2D parcel and building contours and building height information, and
• ESRI Shapefile format for the data that include the geometric relations between building contours (topology).
A Geographical Information System (GIS) is a group of tools and a computer model of geographic reality, formed of hardware, software, data and user components, that allows geographical information to be captured, stored, managed and analyzed to meet the needs of specific interests, and graphs and maps to be produced for representation (Pucha-Cofrep, Cánovas-García, Fries, & Oñate-Valdivieso, 2018). Apart from the spatial data above, 3D city models also include the usual 2D geographic raster reference data and vector data sources. These data can be added on top of the 3D DSM as raster or vector data layers. CityGML enables data storage in .XML format, data conversion, and data replacement through the GML language in 3D city models. CityGML was developed to set standards for the concept of level of detail. In CityGML, five levels of detail are defined: LOD-0, LOD-1, LOD-2, LOD-3, and LOD-4.
• LOD-0, the least detailed level, represents the 3D DSM. An aerial image or a map of the modelled area can be linked to the DEM.
• LOD-1 displays the terrain model and box models of the buildings, from the bottom of each building to its top. It is the simplest level that contains 3D building models. In this LOD, buildings are represented by rectangular prisms and roofs are flat.
• LOD-2 adds roof shapes to LOD-1. In LOD-2, building roof types, external details, and plants are displayed to a certain degree (Figure 6).
• LOD-3 adds exterior texture to LOD-2. This LOD is produced by using architectural models of elements such as balconies, wall details, and roofs. High-resolution images are integrated into the exterior surfaces. Additionally, detailed plants and movable objects are shown in this LOD.
• LOD-4 is the level at which interior information, such as rooms, stairs, interior walls, and furniture, is added to LOD-3 (Kolbe, 2009; Yucel & Selcuk, 2009).
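Because each level builds on the previous one, the five levels can be treated as an ordered scale. The following sketch encodes that ordering; the class and function names are hypothetical and are not part of the CityGML schema, and the comments only paraphrase the list above.

```python
from enum import IntEnum


class LevelOfDetail(IntEnum):
    """Ordered CityGML levels of detail, paraphrasing the list above."""
    LOD0 = 0  # terrain surface, optionally linked to an image or map
    LOD1 = 1  # flat-roofed box (prism) buildings
    LOD2 = 2  # LOD-1 plus roof shapes
    LOD3 = 3  # LOD-2 plus exterior texture and architectural detail
    LOD4 = 4  # LOD-3 plus interior elements


def has_roof_shapes(lod: LevelOfDetail) -> bool:
    """Roof geometry first appears at LOD-2 and is kept at higher levels."""
    return lod >= LevelOfDetail.LOD2


def lowest_level_with_roofs() -> LevelOfDetail:
    """Lowest level that still carries roof geometry."""
    return min(lod for lod in LevelOfDetail if has_roof_shapes(lod))


print(lowest_level_with_roofs().name)
```

An ordered type makes a requirement such as "at least LOD-2" a simple comparison.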
As part of a project conducted by the BIMTAS company, georeferenced LAS data, DEM, DSM, nDSM, orthoimages, and an LOD-2 3D city model covering an area of 5400 km² in Istanbul had already been generated, and these were used for this study (Figure 7). We specifically used the LOD-2 model to extract the building roof geometries, because buildings in Istanbul are not designed in a systematic way and have extremely different shapes and roof types with different slopes and annexes. LIDAR data acquisition can be difficult in these kinds of environments, and box shapes cannot be generated efficiently from the point cloud. We therefore used the LOD-2 model and an approximation of roof heights to obtain approximate building heights.
Before the production of the 3D city model at the desired level of detail, the LIDAR point cloud data are classified and then edited both automatically and manually.
Due to the dense housing and the slope differences in the topography, building and ground classes at lower points, and vegetation and building classes at higher points, could not be separated very effectively. Especially in areas where the terrain slope varies strongly, the ground and buildings close to the ground could not be separated, and building roofs were mixed up with ground points. Also, trees originally assigned to the high vegetation class but standing too close to buildings and taller than 2.30 m caused problems in the classification of the buildings. To correct these problems, the point clouds needed manual editing afterwards: in the regions where the building-terrain and building-vegetation classes were not separated accurately, sections were selected from the LIDAR data and corrected manually. In short, the point clouds were first classified automatically and then edited manually (Figures 8 and 9). After editing, the building vectors were created.
Figure 6. LOD-2 model of a part of the study area.
Figure 7. Part of the 3D city model of Istanbul.
In the process of creating solid building models, building vectors are created by using aerial images and topographic maps. The polygons of the outer boundaries of the buildings were used to create the outlines accurately. Aerial images are used where needed to correct missing and incorrect buildings. Figure 10 illustrates the LOD-2 city model with solid building structures.
The classification of the LIDAR point clouds is performed with the MicroStation-based TerraSolid software. Roof surfaces and building facades were created and extended to the terrain. The whole process is automatic, except for mosque domes, minarets, bridges, and viaducts. Because of the CAD format, the building models do not have a semantic structure. In addition, they contain problematic geometries due to intersection errors (snap errors) between the surfaces that form the roof geometry. To remove the errors in the building models and establish the semantic structure, automatic and manual editing were carried out. FME Workbench software, designed for use with GIS, CAD and raster graphics data, is used for automatic editing. For this project, it is used to solve data processing problems, such as transforming .DGN formatted CAD data to another format, data model configuration and transformation, and simple and complex transformations, without the need for any other software. Using FME Workbench in the transformations helps to minimize these errors. For example, 0.5-m snap errors can be corrected automatically and duplicate roof geometries can be erased.
To remove the topological errors in the building geometries and to enable semantic modelling, the .DGN data need to be processed in several steps (Figure 11). The first step in editing the building models is to convert the raw building geometries to the CityGRID .XML format.
The building models have all the objects semantically configured. Hierarchically, the building models are defined as building, roof, façade, roof detail, façade detail, and the roof and façade structure is separated according to the CityGML data structure (Figure 12).
The roof geometry is itself classified into Outer Boundary Eave, General Roof Line, Upper Break Edge, and Lower Break Edge. The Outer Boundary Eave, which encloses the roof, and the General Roof Line, which comprises the main lines such as ridges, are used in this study.

KOMPSAT-3 characteristics and height model generation
The KOMPSAT-3 stereo pair is used for DEM generation. For this purpose, Geomatica 2013 OrthoEngine software is used. Optical Satellite Modeling is selected as the mathematical modeling method. Epipolar images are created by using the DEM from stereo toolbar. By using the epipolar images and the RPC data included in the KOMPSAT product, the DEM is created (Figure 13).
Then, the DSM2DEM tool in Geomatica software is used to convert the DSM into a bare-earth DEM. This tool uses a user-defined kernel (filter) size to search for local minima. Then, the DEM is subtracted from the DSM to obtain the nDSM.
Figure 11. Workflow of the study.
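The idea of searching for local minima within a kernel can be illustrated with a plain moving-window minimum filter. This is a deliberately simplified stand-in and not the DSM2DEM algorithm itself, whose internals are not described here; the function name, the square window, the edge handling and the sample grid are all assumptions made for illustration.

```python
def local_minimum_filter(dsm, kernel_size):
    """Replace each cell by the minimum elevation inside a square window around it."""
    if kernel_size < 1 or kernel_size % 2 == 0:
        raise ValueError("kernel_size must be a positive odd integer")
    half = kernel_size // 2
    rows, cols = len(dsm), len(dsm[0])
    result = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            # The window is clipped at the raster edges.
            window = [
                dsm[rr][cc]
                for rr in range(max(0, r - half), min(rows, r + half + 1))
                for cc in range(max(0, c - half), min(cols, c + half + 1))
            ]
            out_row.append(min(window))
        result.append(out_row)
    return result


# Flat terrain at 10 m with a 2 x 2 cell "building" whose roof is at 22 m.
dsm = [
    [10.0, 10.0, 10.0, 10.0, 10.0],
    [10.0, 22.0, 22.0, 10.0, 10.0],
    [10.0, 22.0, 22.0, 10.0, 10.0],
    [10.0, 10.0, 10.0, 10.0, 10.0],
]
dem = local_minimum_filter(dsm, 5)
ndsm = [[s - g for s, g in zip(s_row, g_row)] for s_row, g_row in zip(dsm, dem)]
print(ndsm[1][1])
```

The kernel has to be wider than the largest building footprint, otherwise roof cells survive in the estimated terrain; on sloping ground a pure minimum filter also lowers the terrain, which is one reason why height estimates in hilly areas can be biased.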

GIS creation from development plans
The development plan was designed in NetCAD software. These data cannot be compared directly with the extracted models because the formats are different. To analyze the data, it is necessary to select the useful layers and create GIS-formatted data for further analysis. The building borders, floor numbers, and maximum allowed building heights were selected. Using the functions of ArcGIS, the neighborhood borders and building blocks were digitized into the geodatabase. The datum of the plan was transformed from ED50 to ITRF96. After the transformation, an orthophoto was used to check for any discrepancy. The borders of the building blocks were digitized using the editing tool of ArcMap. Then, the block borders and the information corresponding to them were imported, and the extracted nDSM, the CityGML data, the development plans, and the analysis attribute table were gathered in an ArcMap project. The GIS data were imported by using the "import CAD annotation" tool. To associate the plans with the height models, all information on the plan was transformed into points and polylines. The average heights corresponding to the building eaves for the LIDAR data and to the building block borders for the KOMPSAT-3 data were integrated into the project. The development plans define the maximum building height and the distances to neighboring buildings and to the road. For the height comparison, only the maximum height value, defined as "H max (Y encok)", is used. All data were combined by using the "join" tool. The average elevations from the elevation models were calculated with the editing tools of ArcMap and formulated for comparison. The differences between the height models and the development plans were entered into the analysis attribute table. The difference between the LIDAR CityGML data and the development plans is formulated in (2, 3), where H average/CityGML refers to the average height from CityGML and H max is the planned maximum height.
The difference between the KOMPSAT-3 data and the plans was formulated in the same way, with H average/building block referring to the average building block height. The problem is that, since Istanbul is a hilly and complex city, elevation differences occur even within a single floor. One side of a floor can be at ground level while the other side is more than 2 m above it. Thus, the height limits are often not defined from the bare-earth surface. The reference elevation is often defined as the average of the elevations of all corners of the building. However, this is the ideal case, and it can change with topography and surveying conditions. These variations make the interpretation harder. The Istanbul Municipality produced smart development plans to overcome this problem. For this study, some of the mentioned factors were ignored for the sake of a simplified analysis.
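The comparison described above can be sketched as a per-building height difference. The sketch assumes the difference is taken as the average model height minus H max, so that positive values indicate a building above the planned limit; the sign convention, the record layout, the building identifiers and the sample heights are all illustrative assumptions, not values from the attribute table of this study.

```python
from dataclasses import dataclass


@dataclass
class BuildingRecord:
    building_id: str         # hypothetical identifier
    average_height_m: float  # average height from the height model (CityGML or nDSM)
    h_max_m: float           # maximum height allowed by the development plan


def height_difference(record: BuildingRecord) -> float:
    """Average model height minus planned maximum (assumed sign convention)."""
    return record.average_height_m - record.h_max_m


def exceeds_plan(record: BuildingRecord, tolerance_m: float = 0.0) -> bool:
    """Flag a building whose height exceeds the limit by more than the tolerance."""
    return height_difference(record) > tolerance_m


# Invented sample records, using the 12.5 m limit reported for the neighborhood.
records = [
    BuildingRecord("A", 14.2, 12.5),
    BuildingRecord("B", 9.0, 12.5),
]
flagged = [r.building_id for r in records if exceeds_plan(r)]
print(flagged)
```

A non-zero tolerance could absorb part of the uncertainty introduced by sloping terrain, where the reference elevation of a building is itself an average.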

Comparison
The maximum height limits taken from the development plans are compared with the LIDAR CityGML data and the KOMPSAT-3 nDSM. The building heights are gathered in an attribute table that includes the KOMPSAT-derived building heights, the LIDAR CityGML-derived building heights, and their differences from the H max values in the development plans. Because two datasets with different resolutions and precisions are used, their interrelation is examined first, and then their differences from the maximum limits are compared. Buildings exceeding the limit are identified visually and statistically.

Results and discussion
The CityGML data provided the heights of the building eaves very precisely, and the nDSM generated from the KOMPSAT images provided the average heights within the building blocks. The area borders are taken from the city model. Because of the resolution differences, the two datasets yielded distinct results.
Overall, there were 2141 buildings in the neighborhood. The LIDAR data were very effective at separating buildings from other objects with the use of special height-based algorithms. The DEM generated from the LIDAR data was more detailed overall than the KOMPSAT elevation model. According to the development plan, the maximum height limit for the neighborhood was 12.5 m, corresponding to roughly four to five floors. When the two datasets were compared with the development plan, the KOMPSAT model could not effectively detect the problematic areas: its results indicated that only 1% of the buildings exceeded the limit, while the LIDAR CityGML data showed that 16% of the buildings were in violation of the plan (Figure 14).
According to the LIDAR results, 348 (16%) buildings were higher than the limit of 12.5 m given in the development plans for this area, and 1793 (84%) buildings were within the planned limits (Figure 15).
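The reported shares can be checked directly from the building counts given above. The short sketch below only repeats that arithmetic; the variable names are illustrative.

```python
total_buildings = 2141
exceeding = 348   # buildings higher than the 12.5 m limit (LIDAR CityGML data)
within = 1793     # buildings within the planned limit

# The two groups should account for every building in the neighborhood.
assert exceeding + within == total_buildings

share_exceeding = 100 * exceeding / total_buildings
share_within = 100 * within / total_buildings
print(f"{share_exceeding:.1f}% exceed the limit, {share_within:.1f}% are within it")
```

Rounded to whole percentages, these shares correspond to the 16% and 84% quoted in the text.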
This difference might be due to trees and other objects being mixed with the buildings in the KOMPSAT DEM; in addition, the use of an automatic algorithm for the generation of the nDSM can affect the accuracy. The triangulation process could produce a rough average far from the real values. Although KOMPSAT panchromatic images are considered high resolution, producing a DEM reduces the spatial resolution, and the parameters used for automatic DSM production can lower the level of detail of the data. LIDAR is a superior technique in terms of level of detail and accuracy. However, very high precision can lead to misclassification, such as cars and energy transmission lines being classified as buildings. Their height values can distort the real heights and lead to incorrect analysis if not corrected manually.

Conclusion
The aim of this study was to detect the structures built in violation of the development plans of the Guzeltepe district by using airborne LIDAR point clouds and by testing the new stereo KOMPSAT-3 images. First, point cloud classification was carried out using the airborne LIDAR data. After automatic and manual editing, the buildings were extracted. Building details at different levels of detail were obtained, and the Outer Boundary Eave and General Roof Line were used to estimate the building heights. Second, the DEM, DSM and nDSM were generated from the stereo KOMPSAT-3 images. The development plans were converted into GIS format. All the data were combined in a GIS project and the results were compared. Overall, the buildings that are higher than the H max limit were detected more easily with the LIDAR CityGML data, though the KOMPSAT data could also provide a rough estimate through the elevation models. The development and city planning disciplines are complex and involve more parameters than height information alone. This study shows that LIDAR technology can be used for building 3D city models, as it provides very high accuracy, and that KOMPSAT-3 can be used for approximate analysis. For a healthy and planned future for cities, governments are establishing new laws and regulations. Today, engineering and city planning have become inseparable. GIS integration can be used to improve inadequate infrastructure and transportation structures with full control. By tracking and comparing city conditions before and after changes, population can be estimated and new needs can be detected. In the future, when LIDAR and high-resolution satellite data become more widely available to governments and public establishments, more livable and well-ordered cities can be created. This is achievable, especially through the efforts of city planners and engineers.