A raster-based typification method for multiscale visualization of building features considering distribution patterns

ABSTRACT In map multiscale visualization, typification is the process of replacing original objects, such as buildings, with a smaller number of objects while maintaining the initial geometrical and distribution characteristics. During the past few decades, many vector-based methods for building typification have been developed, whereas raster-based methods have received less attention. In this paper, a new raster-based method for the typification of buildings with different distribution patterns, called superpixel building typification (SUBT), is developed. Using this method, buildings with different distribution patterns, such as linear, grid and irregular patterns, are first grouped by image connected component detection and superpixel analysis. Then, the new positions for building typification are determined by superpixel resegmentation. Finally, a new representation of the buildings is determined through analysis of the orientation and shape of the buildings in each superpixel. To test the proposed SUBT method, typification is performed on buildings from both cities and the countryside in China. The experimental results show that the proposed SUBT method can realize typification for buildings with linear, grid and irregular distributions while effectively maintaining the original distribution characteristics of the buildings.


Introduction
In map multiscale representation, building generalization aims to simplify the representations of buildings while considering the distribution, quantity and shape features, and this operation mainly includes building selection, simplification, displacement, aggregation and typification (Shen, Ai, and Li 2019b;Ai et al. 2017;Ai et al. 2015;Burghardt and Cecconi 2007;Li et al. 2004;Basaraner and Selcuk 2008). Building typification is the process of replacing original buildings using a smaller number of buildings while maintaining the initial geometrical and distribution characteristics (Regnauld 2001;Change software). In this process, the spatial position, geometrical shape, distribution pattern, and proximity relationship of buildings are usually considered and analysed (Wang and Burghardt 2019;Gong and Wu 2018;Burghardt and Cecconi 2007;Basaraner and Cetinkaya 2017;Cetinkaya, Basaraner, and Burghardt 2015).
As buildings typically possess human cultural characteristics, they tend to be orthogonal, with spatial distributions that are regular to a certain extent. For example, as shown in Figure 1, buildings along roads usually have linear distributions (Figure 1a). Additionally, in urban areas, buildings tend to display grid patterns (Figure 1b), while in some rural areas, buildings tend to be randomly and irregularly distributed (Figure 1c). Because of the multiple distribution patterns of buildings, it is difficult to design a typification method that simultaneously satisfies various distribution patterns. Thus, scholars have developed methods for the typification of buildings with a single distribution pattern combined with building pattern recognition (Wang and Burghardt 2019;Gong and Wu 2018). For example, Gong and Wu (2018) proposed a progressive method for the typification of buildings with linear distribution patterns by linear pattern detection, and Wang and Burghardt (2019) developed a mesh-based approach for the typification of buildings with grid distribution patterns.
Vectors and rasters are two basic types of spatial data representations. Vector data use x and y coordinates to represent spatial information, mainly as three types of objects: points, lines and polygons. For example, as shown in Figure 1d, a building can be represented using a start point O and a point set A-G. In contrast, a raster consists of a pixel matrix with regular columns and rows, and each pixel contains a value used to represent spatial information. As shown in Figure 1e, the same building can be represented as a set of regular pixels. Although many vector-based methods for building typification have been developed, raster-based methods have received less attention. Furthermore, with the development of artificial intelligence, an increasing number of scholars have taken raster data as the basic object during the multiscale representation of spatial data. These scholars have applied neural networks (Feng, Thiemann, and Sester 2019;Kang, Gao, and Roth 2019) and computer vision technologies (Shen, Ai, and He 2018a;Shen et al. 2018b) for map generalization and representation. Thus, in this article, we develop a superpixel-based typification method for raster buildings with different distribution patterns, such as linear, grid and irregular patterns, using related computer vision technologies.
This article is organized as follows. Section 2 presents the related work regarding building typification. Section 3 illustrates the typification methods for buildings with linear, grid and irregular patterns, which mainly include how to group, position and reconstruct buildings. In Section 4, typification is performed on buildings from both cities and the countryside in China, and the proposed superpixel building typification (SUBT) method is discussed and analysed. Section 5 describes the conclusions and future work.

Related work
In traditional building typification methods, scholars usually utilize vector data as basic processing objects. Some typical vector-based methods include mesh-based (Burghardt and Cecconi 2007;Gong and Wu 2018;Wang and Burghardt 2019), geometric (Bildirici and Aslan 2010;Bildirici et al. 2011), Gestalt theory (Regnauld 2001), self-organizing maps (Sester 2005), graph-based (Anders and Sester 2000;Anders 2005) and data matching (Li, Guo, and Liu 2005) methods. For example, Burghardt and Cecconi (2007) proposed a mesh-based method for building typification that achieves high-speed results that can be reproduced for real-time applications. In this method, building typification is divided into two steps: the positioning of buildings using Delaunay triangulation and the representation of buildings by orientation and size calculation. This mesh-based typification method was proven to effectively constrain the density of buildings and preserve basic geometrical characteristics, such as shape, semantics and direction. However, these investigators mainly used buildings with irregular patterns for experiments and did not consider buildings with linear and grid distribution patterns. Subsequently, Gong and Wu (2018) and Wang and Burghardt (2019) proposed typification methods for buildings with linear and grid patterns, respectively, based on Delaunay triangulation. Based on Gestalt theory, Regnauld (2001) considered different criteria, such as density, size and shape, for grouping buildings and typified buildings by finding the minimum spanning tree. This method can produce fewer, larger and better spaced buildings than the original, and it can effectively maintain the original distribution characteristics. However, in these experiments, buildings with grid patterns were not further analysed. Considering a web mapping environment, Li, Guo, and Liu (2005) proposed an on-the-fly typification method for buildings based on data matching. This method mainly includes three steps: calculation of the building number, representation of the building shape, and calculation of the building size. In their experiment, buildings with linear distribution patterns were not discussed. Raster-based methods for building typification have received little attention. In addition to buildings, typification methods for other map elements, such as ditches (Sandro, Massimo, and Matteo 2011), drainage (Zhang 2007), and façade structures (Shen et al. 2016), have also been proposed by various scholars.
Existing building typification methods usually need to detect the distribution patterns of buildings (Mesev 2005;Zhang et al. 2013;Du, Shu, and Feng 2016;He, Zhang, and Xin 2018). For example, Zhang et al. (2013) developed two algorithms for automatically identifying buildings with collinear and curvilinear patterns based on Delaunay triangulation and minimum spanning tree technologies. Du, Shu, and Feng (2016) proposed a relation-based method for detecting building patterns from three levels by defining 169 basic relations. He, Zhang, and Xin (2018) used machine learning and graph partitioning methods to recognize building patterns. This method was proven to have the ability to identify both regular and irregular patterns. In their method, the regular building pattern was divided into three types: rectangular, curvilinear and collinear patterns, and the irregular building pattern was divided into three types: high-density, H-shaped and L-shaped patterns.
The above analysis indicates that almost all methods for building typification are based on vector data, and it is rare to find raster-based methods for building typification. However, many kinds of raster map data exist, such as scanned maps and remote sensing image maps. When handling these raster maps, traditional vector-based generalization approaches usually require more human intervention, such as data format conversion and data quality correction after the vectorization of raster images, which is not conducive to preserving the data accuracy with respect to consistency and topological relations. Moreover, the vector data structure is complex, and there are numerous ways to categorize it, such as by entity and topology type, which is not convenient for data standardization, normalization and exchange. Thus, to ensure the accuracy and versatility of map data during generalization based on raster data, raster-based generalization approaches are necessary. In addition, existing typification methods can usually handle only buildings with a single distribution pattern; thus, pattern recognition (Mesev 2005;Zhang et al. 2013;Du, Shu, and Feng 2016;He, Zhang, and Xin 2018) must be applied before typification. In contrast, in this paper, we develop a raster-based method for building typification considering distribution patterns. This method does not require the identification of any building patterns before typification but can effectively preserve the original distribution patterns, such as linear, grid and irregular patterns.

Methodologies for building typification
Basically, building typification can be summarized as one problem: transferring m buildings to n buildings, where n < m, while maintaining the original characteristics as much as possible. During this process, three basic problems should be solved: (1) determining the buildings with relatively close distances in typification areas that should be grouped for further typification; (2) determining the number and position of buildings; and (3) determining the geometrical patterns of the buildings, including size, orientation and shape. Thus, in this paper, the methodologies for building typification are divided into three corresponding tasks: grouping buildings with connected component detection and superpixel analysis, positioning buildings using second segmentation and reconstructing buildings using multiple strategies. A flow diagram of the proposed SUBT algorithm can be found in Figure 2, in which the italic text marked in yellow represents the main calculation functions of the OpenCV software library used to realize the corresponding steps.

Grouping buildings with connected component detection and superpixel analysis
In raster data, building information is discrete. Thus, before grouping, the connected components corresponding to each building should be detected first. The detailed process for grouping buildings can be divided into the following three steps. The patterns of buildings can be divided into linear, grid and irregular (unstructured) patterns (Zhang et al. 2013). To better display the proposed method, a simple example is presented in Figure 3. The raster building data in this figure are generated from regular vector data, and we used the 'Polygon to Raster' toolbox in ArcGIS to convert vector data to raster data. The cell size used for rasterization is 0.3 meters per pixel. Figures 3(a, d and g) show the raster building data with linear, grid and irregular patterns. After the conversion of the data format, the original sizes of these three images are 1156 × 1156 pixels.

Detection of connected components
There are two types of connected components: four-connected and eight-connected components. If only the edges of pixels touch, then they are four-connected components. If the edges or corners of pixels touch, then they are eight-connected components. Many scholars have proposed various algorithms for connected component labeling (Di Stefano and Bulgarelli 1999;Suzuki, Horiba, and Sugie 2003;He et al. 2009;Grana, Borghesani, and Cucchiara 2010). In this paper, the typical method for four-connected component labeling proposed by Haralick and Shapiro (1992) is applied. Figures 3(b, e and h) show the labeled connected components of buildings using pseudocolour. Using connected component labeling, the original discrete pixels are organized into multiple objects.
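The four-connected labeling step can be illustrated with a minimal sketch. It uses a breadth-first flood fill rather than the two-pass algorithm of Haralick and Shapiro (1992); the function name `label_four_connected` and the toy raster are assumptions made for illustration only.

```python
from collections import deque

def label_four_connected(grid):
    """Label foreground pixels (value 1); only edge-adjacent pixels share a label."""
    rows, cols = len(grid), len(grid[0])
    labels = [[0] * cols for _ in range(rows)]
    current = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or labels[r][c] != 0:
                continue
            current += 1                      # start a new component
            labels[r][c] = current
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                # Four-connectivity: up, down, left, right (no diagonals).
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] == 1 and labels[ny][nx] == 0):
                        labels[ny][nx] = current
                        queue.append((ny, nx))
    return labels, current

# Toy raster (hypothetical): the pixels at (1, 3) and (2, 2) touch only at a corner.
raster = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
]
labels, count = label_four_connected(raster)
print(count)
```

Under four-connectivity the two corner-touching pixels remain separate objects; under eight-connectivity they would be merged into one.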

Grouping using superpixel analysis
In the field of map generalization, superpixel technology was first applied to simplify polygonal and linear features (Shen et al. 2018b), and the authors analysed the similarities and differences between superpixel segmentation and map generalization. Subsequently, superpixel technology has been applied to building aggregation (Shen et al. 2019a), building simplification (Shen, Ai, and Li 2019b) and collapse (Shen, Ai, and Yang 2019c). Following this direction, in this paper, superpixel technology is applied to another important generalization operator: typification. After performing connected component labeling, superpixel segmentation algorithms are used for grouping. To generate relatively regularly shaped superpixels without manually setting the compactness parameter, the zero version of the simple linear iterative clustering (SLICO) method (Achanta et al. 2012) is applied. Compactness, which is the ratio of the perimeter to the area of the object, is used to measure the shape of an object. As shown in Figures 3(b, e and h), the superpixel sizes used for the first segmentation are 20, 25 and 30 pixels, respectively. The generated superpixels tend to have a hexagonal shape in blank regions, and they present slight shape changes in building regions. When the boundary of a superpixel and the boundary of a building connected component overlap, this superpixel and building connected component are connected. When a superpixel (marked in yellow) generated by the first segmentation connects two or more connected components of buildings, these connected components will be grouped. In Figure 3, after grouping via superpixel analysis, the linear buildings are divided into one group, the grid buildings are divided into six groups, and the irregular buildings are divided into four groups.
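The grouping rule can be sketched as follows, assuming that a component label image and a superpixel label image of the same size are already available. As a simplification, a superpixel is treated as connected to a component when one of its pixels lies on or is edge-adjacent to a pixel of that component; this adjacency test, the function name `group_components` and the toy data are assumptions, not the boundary-overlap test of the SUBT implementation.

```python
def group_components(component_labels, superpixel_labels):
    """Group building components that are linked by a common superpixel."""
    rows, cols = len(component_labels), len(component_labels[0])
    touched = {}  # superpixel id -> set of component ids it touches
    for r in range(rows):
        for c in range(cols):
            sp = superpixel_labels[r][c]
            for dr, dc in ((0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and component_labels[nr][nc] > 0:
                    touched.setdefault(sp, set()).add(component_labels[nr][nc])

    parent = {}  # union-find forest over component ids

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Every superpixel that touches two or more components merges them.
    for comps in touched.values():
        ordered = sorted(comps)
        root = find(ordered[0])
        for other in ordered[1:]:
            parent[find(other)] = root

    groups = {}
    for comp in list(parent):
        groups.setdefault(find(comp), []).append(comp)
    return sorted(sorted(g) for g in groups.values())

# One-row toy example (hypothetical): superpixel 11 bridges buildings 1 and 2,
# while superpixel 14 is an empty gap separating building 3.
components = [[1, 1, 0, 0, 2, 2, 0, 0, 0, 3]]
superpixels = [[10, 10, 11, 11, 12, 12, 13, 14, 15, 16]]
groups = group_components(components, superpixels)
print(groups)
```

The superpixel size controls the grouping distance: a gap wider than one superpixel, as between buildings 2 and 3 here, is not bridged.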

Construction of the typification area
This step aims to establish the specific areas used for further typification. The new positions, number and shape of buildings will be calculated in these so-called typification areas. The typification areas are first made up of the connected components (marked in pseudocolour) and superpixels (marked in yellow and blue) used for grouping. Then, the edges of typification areas are smoothed to generate the final typification areas. Figures 3(c, f and i) show the final typification areas (outlined by the solid red lines) of buildings with linear, grid and irregular patterns.

Positioning buildings using second segmentation
In this step, the number and location of new typified buildings at different levels of detail will be determined by a second segmentation procedure. In this process, the typification areas (marked in light gray in Figure 4) will be segmented multiple times by larger superpixels to generate various levels of detail. For example, in Figure 4, for three types of buildings (linear, grid and irregular), the segmentation results at two levels of detail are generated. In Figures 4(a and b), the superpixel sizes that are used for the second segmentation of linear buildings are 150 and 230 pixels, respectively. In Figures 4(c and d), the superpixel sizes that are used for the second segmentation of grid buildings are 180 and 300 pixels, respectively. In Figures 4(e and f), the superpixel sizes that are used for the second segmentation of irregular buildings are 185 and 210 pixels, respectively. The number of new typified buildings at different levels of detail equals the number of superpixels generated by the second segmentation procedure.
Then, the center of mass of each superpixel generated from the second segmentation is taken as the new location of the building typified from the original buildings located inside this superpixel. In Figure 4, the blue points are the centers of mass of the superpixels, from which we can see that with increasing superpixel size, fewer centers of mass are generated, which means that in one superpixel, more buildings will be typified to one building. In addition, the distribution patterns of the buildings are still preserved. For example, the distribution of the centers of mass generated from linear buildings is still linear (Figures 4(a and b)), the distribution of the centers of mass generated from grid buildings is still a grid (Figures 4(c and d)), and the distribution of the centers of mass generated from irregular buildings is still irregular (Figures 4(e and f)).
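The positioning step reduces to computing one center of mass per superpixel. The sketch below assumes a label image in which 0 marks pixels outside the typification area; that convention and the function name `superpixel_centroids` are illustrative assumptions.

```python
def superpixel_centroids(superpixel_labels):
    """Return {superpixel id: (mean row, mean column)} for labels greater than 0."""
    sums = {}
    for r, row in enumerate(superpixel_labels):
        for c, sp in enumerate(row):
            if sp == 0:          # assumed marker for pixels outside the area
                continue
            total_r, total_c, n = sums.get(sp, (0, 0, 0))
            sums[sp] = (total_r + r, total_c + c, n + 1)
    return {sp: (tr / n, tc / n) for sp, (tr, tc, n) in sums.items()}

# Two 2 x 2 superpixels side by side (hypothetical labels).
segmentation = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
]
centroids = superpixel_centroids(segmentation)
print(centroids)
```

The number of returned centroids equals the number of typified buildings at that level of detail.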
In addition, two problems should be emphasized. First, we define a judgement rule R, namely, when a building falls into two or more adjacent superpixels, this building always belongs to the superpixel that has the most overlap with the building. Second, the typification areas defined in Section 3.1 should be separately extracted, and superpixel segmentation should be performed rather than segmenting all typification areas as a whole in one image during the second segmentation procedure.
For example, in Figure 4f, there are four typification areas marked with red boundaries. In this case, each typification area will be separately extracted, and superpixel segmentation will be performed.
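Rule R can be sketched as a pixel-count vote, assuming a component label image and a superpixel label image of the same typification area. The function name `assign_buildings` is hypothetical, and the tie-breaking behaviour (the first superpixel encountered wins) is an assumption, since the rule as stated does not cover exact ties.

```python
from collections import Counter

def assign_buildings(component_labels, superpixel_labels):
    """Assign each building to the superpixel that covers most of its pixels."""
    overlap = {}  # building id -> Counter of superpixel ids
    for comp_row, sp_row in zip(component_labels, superpixel_labels):
        for comp, sp in zip(comp_row, sp_row):
            if comp > 0:
                overlap.setdefault(comp, Counter())[sp] += 1
    return {comp: counts.most_common(1)[0][0] for comp, counts in overlap.items()}

# Building 1 straddles superpixels 7 and 8 but has two of its three pixels in 7.
buildings = [
    [1, 1, 1, 0],
    [0, 0, 2, 2],
]
second_segmentation = [
    [7, 7, 8, 8],
    [7, 7, 8, 8],
]
assignment = assign_buildings(buildings, second_segmentation)
print(assignment)
```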

Reconstructing buildings using multiple strategies
To reconstruct the typified buildings in the typification areas, three key parameters, namely, elongation, orientation and area, should be determined according to the following rules.
(1) Elongation. The elongation of new typified buildings can be calculated by the following equation: $E = L_{major} / L_{minor}$, where $L_{major}$ is the length of the major axis of the minimum bounding rectangle of the building with the maximum area and $L_{minor}$ is the length of the minor axis of the minimum bounding rectangle of the building with the maximum area.
(2) Orientation. The orientation of new typified buildings can be calculated by the following equation: $O = a_1 O_1 + a_2 O_2 + \cdots + a_n O_n$, with $a_1 + a_2 + \cdots + a_n = 1$, where $a_1, a_2, \ldots, a_n$ are the weighting coefficients of the building orientations $O_1, O_2, \ldots, O_n$, respectively, and the orientations are calculated using the image moment (Hu 1962). According to this equation, the multistrategy orientation calculation method is applied as follows.
Average orientation: when $a_1 = a_2 = \cdots = a_n$, the orientation $O$ is the average orientation of the original buildings located inside the corresponding superpixel generated from the typification areas.
Maximum orientation: when one of the weighting coefficients equals 1. For example, if the area of building $n$ is the maximum, then $a_n = 1$ and the orientation $O$ equals the orientation of building $n$ with the maximum area.
Weighted orientation: when the weighting coefficients are not in the above two cases.
(3) Area. The area of new typified buildings can be calculated by the following equation: $A = a_1 A_1 + a_2 A_2 + \cdots + a_n A_n$, with $a_1 + a_2 + \cdots + a_n = 1$, where $a_1, a_2, \ldots, a_n$ are the weighting coefficients of the building areas $A_1, A_2, \ldots, A_n$, respectively. According to this equation, the multistrategy area calculation method is applied as follows.
Average area: when $a_1 = a_2 = \cdots = a_n$, area $A$ is the average area of the original buildings located inside the corresponding superpixel generated from the typification areas.
Maximum area: when one of the weighting coefficients equals 1. For example, if the area of building $n$ is the maximum, then $a_n = 1$ and area $A$ equals the area of building $n$ with the maximum area.
Weighted area: when the weighting coefficients are not in the above two cases.
When area $A$ is determined using the multistrategy area calculation method, a readjustment of the area can be performed. The dilation operation can be used to uniformly increase the area. During the dilation operation, the buildings should first be rotated to the horizontal direction, and then the rectangular structure element should be applied. To avoid intersection, the maximum area after readjustment should be below the area of the corresponding superpixels generated from typification areas. In actual experiments, the related calculations, including the minimum bounding rectangle, image moment and basic geometric properties, and the drawing of typified results can be realized using the corresponding OpenCV functions.
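The three reconstruction rules can be sketched in a few functions. The orientation uses the standard second-order central moment formula, in image coordinates with the row axis pointing down. The names `orientation_from_moments`, `strategy_weights`, `combine` and `typified_area` are hypothetical, and the `"area_weighted"` option is only one possible choice for the weighted case, which the rules above leave open.

```python
import math

def orientation_from_moments(mask):
    """Orientation in radians of a binary mask from second-order central moments."""
    pixels = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    n = len(pixels)
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n
    mu20 = sum((x - cx) ** 2 for x, _ in pixels)
    mu02 = sum((y - cy) ** 2 for _, y in pixels)
    mu11 = sum((x - cx) * (y - cy) for x, y in pixels)
    return 0.5 * math.atan2(2 * mu11, mu20 - mu02)

def strategy_weights(areas, strategy):
    """Weights a_1..a_n (summing to 1) for the chosen strategy."""
    n = len(areas)
    if strategy == "average":
        return [1.0 / n] * n
    if strategy == "maximum":
        largest = areas.index(max(areas))
        return [1.0 if i == largest else 0.0 for i in range(n)]
    if strategy == "area_weighted":  # assumed example of the weighted case
        total = sum(areas)
        return [a / total for a in areas]
    raise ValueError(f"unknown strategy: {strategy}")

def combine(values, weights):
    """Weighted sum a_1 v_1 + ... + a_n v_n, with the weights summing to 1."""
    assert abs(sum(weights) - 1.0) < 1e-9
    return sum(w * v for w, v in zip(weights, values))

def typified_area(areas, strategy, superpixel_area):
    """Area of the typified building, capped by the superpixel area."""
    return min(combine(areas, strategy_weights(areas, strategy)), superpixel_area)

# Hypothetical buildings inside one superpixel.
areas = [100.0, 200.0, 300.0]
orientations = [0.0, 0.2, 0.4]  # radians
avg_o = combine(orientations, strategy_weights(areas, "average"))
max_o = combine(orientations, strategy_weights(areas, "maximum"))
capped = typified_area(areas, "maximum", superpixel_area=250.0)
print(avg_o, max_o, capped)
```

A linear average of angles is only meaningful when the orientations do not straddle the wrap-around of the angular range; a production implementation would probably need to handle that case explicitly.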

Experiments and evaluations
Since we need to test whether the proposed SUBT method can handle buildings with different distribution patterns, such as linear, grid and irregular patterns, the principle for selecting experimental areas is that they should contain enough buildings with these three types of distribution patterns. Thus, two different sets of building data from a city and the countryside in China are selected, in which the former mainly contains buildings with linear and grid patterns, while the latter mainly contains buildings with irregular patterns. Different typification strategies are applied to these two sets of building data. The original building data originate from the Gaode map of China. The Gaode map can provide tile map services. Thus, the original data can be downloaded from the internet for free according to the corresponding uniform resource locator (Zhou et al. 2019). To generate the final regular experimental data used to test the proposed SUBT method, we performed a series of automatic and manual data processing operations, including preliminary extraction according to the color values, building simplification and aggregation, and transformation of the building data format using ArcGIS software.

Typification of building data from the city
In this experiment, the city building data originate from the city of Shanghai in China. Because more attention is given to architecture planning in cities, the building distribution is more regular. The longitude of the experimental building data ranges from 121.360°E to 121.377°E, and the latitude ranges from 31.192°N to 31.202°N. Figure 6a shows the original building data. After data format conversion with a cell size of 0.5 meters per pixel, the size of the final raster data is 3948 × 2950 pixels. A total of 703 buildings are located in this experimental area. Most of these buildings form linear and grid patterns. Figure 7a shows the results of the grouping of buildings using superpixel analysis, and 30 building groups were generated. The corresponding construction result of 30 typification areas is shown in Figure 7b. The superpixel size of the first segmentation used for generating typification areas is $S_1 = 23$.
Since the number of SLIC superpixels depends on the shape and resolution of buildings, a specified number of superpixels cannot be generated by setting a fixed superpixel size. However, a target map scale within a threshold range can be calculated according to Töpfer's radical law (Töpfer and Pillewizer 1966): $S_{Tar} = S_{Ori} (N_{Ori} / N_{Tar})^2$. In this equation, $S_{Tar}$ and $S_{Ori}$ are the denominators of the target and original map scales, respectively, and $N_{Tar}$ and $N_{Ori}$ are the number of buildings at the target and original map scales, respectively. The original scale of the building data is 1:5000. According to the generated number of buildings in Table 1, the corresponding display scales of the typification results at two levels are 1:20000 and 1:60000. The superpixel sizes of the second segmentation used for positioning buildings at two levels are $S_2 = 90$ and $S_2 = 120$. During typification, the two special cases for large buildings should be handled separately.
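Töpfer's radical law, in its commonly cited form $N_{Tar} = N_{Ori} \sqrt{S_{Ori} / S_{Tar}}$, can be evaluated in either direction. The sketch below uses hypothetical function names; only the original scale of 1:5000 and the count of 703 buildings are taken from the experiment.

```python
import math

def target_building_count(s_ori, s_tar, n_ori):
    """Number of buildings suggested by the radical law at the target scale."""
    return n_ori * math.sqrt(s_ori / s_tar)

def target_scale_denominator(s_ori, n_ori, n_tar):
    """Scale denominator implied by a given reduction in the number of buildings."""
    return s_ori * (n_ori / n_tar) ** 2

# 703 buildings at 1:5000 displayed at 1:20000.
suggested = target_building_count(5000, 20000, 703)
# Halving the number of buildings corresponds to a fourfold scale denominator.
implied = target_scale_denominator(5000, 100, 50)
print(suggested, implied)
```

Because the superpixel count cannot be prescribed exactly, the implied denominator generally lands near, rather than exactly on, a standard map scale.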
The first problem is the handling of locally abnormally large buildings according to the following optional typification strategies, which are designed to preserve the global uniformity of the distribution of typified buildings.
(1) Strategy for merging superpixels. Assuming that $N_x$ is the number of buildings that belong to the same superpixel $x$ after the judgement of rule R defined in Section 3.2, then $N_x \geq 0$. When a single building falls into two different superpixels A and B and the total number of buildings contained in superpixels A and B equals 1, namely, $N_A + N_B = 1$, these two superpixels A and B can be merged into a single superpixel. An example can be found in Figure 8. The building marked in blue falls into two superpixels that do not contain any other buildings after the area calculation decides which buildings belong in the superpixels. Thus, these two superpixels can be merged into one superpixel, the corresponding number of typified buildings decreases by one, and the number of centers of mass is reduced from two (marked in red in Figure 8a) to one (marked in blue in Figure 8c). As shown in Figure 8b, when this merging strategy is not applied, the location of the typified building has a larger displacement that affects the uniformity of the global distribution of buildings. Using this strategy, the global uniformity of the distribution of typified buildings can be well preserved, as shown in Figure 8d.
(2) Strategy for building decomposition. When a single building falls into three or more different superpixels A, B, C, ..., and the total number of buildings contained in these superpixels equals 1, namely, $N_A + N_B + N_C + \cdots = 1$, this building can be divided into several parts by the superpixels. An example can be found in Figure 9. Figure 9a shows an example of the original building data. After a second segmentation procedure, the building marked in red falls into three superpixels, and no other buildings fall into these three superpixels. Thus, this building can be optionally divided into three parts (marked in green in Figure 9b) according to the boundaries of the three superpixels. Figures 9b and 9c show the typified results with and without the strategy of building decomposition, respectively. Using this typification strategy, the original building distribution pattern can be well preserved.
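The two optional strategies share the same trigger condition and differ only in how many superpixels the building spans, which can be sketched as a small decision function. The function name, the return labels and the example counts are hypothetical.

```python
def large_building_strategy(spanned_superpixels, counts):
    """Choose the optional strategy for one building.

    spanned_superpixels: ids of the superpixels the building falls into.
    counts: N_x for every superpixel, i.e. the number of buildings assigned
            to superpixel x after applying rule R.
    """
    total = sum(counts[sp] for sp in spanned_superpixels)
    if total != 1:               # other buildings are present: no special handling
        return "none"
    if len(spanned_superpixels) == 2:
        return "merge"           # merge the two superpixels into one
    if len(spanned_superpixels) >= 3:
        return "decompose"       # split the building along superpixel boundaries
    return "none"

# Hypothetical counts: the building is assigned to A; B and C hold nothing else.
counts = {"A": 1, "B": 0, "C": 0, "D": 2}
two = large_building_strategy(["A", "B"], counts)
three = large_building_strategy(["A", "B", "C"], counts)
crowded = large_building_strategy(["A", "D"], counts)
print(two, three, crowded)
```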
The second problem is the avoidance of intersections. Sometimes, adjacent typified buildings may intersect. In this case, the erosion operation can be applied. During the process of erosion, to maintain the orthogonal characteristics of the typified buildings, the buildings should first be rotated to the horizontal direction. Then, the rectangular structure element used for erosion should be applied. Finally, the buildings should be rotated back to the original direction. An example can be found in Figure 10. Figure 10a shows the original buildings and the red typification areas. Figure 10b shows the typified results with a building intersection. Figure 10c shows the typified buildings after the erosion operation, from which we can see that the building marked in orange is shrunk to the inside of the corresponding superpixel. For some long and narrow buildings, erosion operations may lead to the disappearance of buildings. In this case, the length and width of the buildings should be gradually adjusted while preserving the original area to avoid intersections.
Since most buildings in the experimental area form linear and grid patterns, the optional typification strategies, together with the average area and average orientation typification strategies, are applied to the building data from Shanghai. To evaluate the validity of the typification model, the inconsistency before and after typification (Gong and Wu 2018) can usually be measured by geometry changes. We analysed the influences of various numbers of dilation operations on the geometric changes of buildings during typification, which can be found in Table 1. According to Table 1, without the dilation operation for enlarging typified buildings, the reduction in the area (from 42.4% to 61.7%), perimeter (from 50.3% to 68.9%) and number of buildings (from 49.9% to 69.4%) increases as the level increases. When the dilation operation is applied to enlarge the typified buildings at level 1 and the number of dilations increases, the reduction in the area (from 35.2% to 28.8%) and the perimeter (from 48.8% to 46.7%) decreases. When the dilation operation is used to enlarge the typified buildings at level 2 and the number of dilations increases, the reduction in the area (from 57.6% to 53.3%) and the perimeter (from 67.6% to 66.5%) decreases. Compared with level 1, level 2 has a larger reduction in area and perimeter because of the reduction in the number of buildings.
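The erosion used to avoid intersections can be sketched for a building mask that has already been rotated to the horizontal direction. This is a plain binary erosion with a rectangular structure element written in pure Python; the function name `erode` and the toy masks are assumptions, and in practice an image library routine would probably be used instead.

```python
def erode(mask, half_height=1, half_width=1):
    """Binary erosion with a (2*half_height+1) x (2*half_width+1) rectangle.

    A pixel survives only if every pixel under the rectangle is foreground;
    pixels outside the image are treated as background.
    """
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            out[r][c] = int(all(
                0 <= r + dr < rows and 0 <= c + dc < cols
                and mask[r + dr][c + dc] == 1
                for dr in range(-half_height, half_height + 1)
                for dc in range(-half_width, half_width + 1)
            ))
    return out

# A 3 x 4 building shrinks to a 1 x 2 core under a 3 x 3 element.
building = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
# A one-pixel-wide building disappears entirely.
narrow = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
eroded = erode(building)
vanished = erode(narrow)
print(sum(map(sum, eroded)), sum(map(sum, vanished)))
```

The second case reproduces the problem noted for long and narrow buildings, which is why their length and width are adjusted rather than eroded.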
The relative distribution density of buildings calculated by overlaying circles with a certain area centered at each pixel point from the city before and after typification can be found in Figure 12. The yellow and blue parts represent high- and low-density areas, respectively. Compared with the original distribution density, the overall relative distribution density trends at the two typified levels are generally consistent. The high-density areas still tend to have a high density, and the low-density areas still tend to have a low density, as shown in the relatively high-density regions marked with red ellipses and the relatively low-density regions marked with black ellipses.
To quantitatively compare the relative density before and after typification, we calculated the average density of all typification areas in Figure 7 according to the density level of each building pixel. Assuming that d is the relative density value, we divided the average density values into three levels: high (0.7 ≤ d ≤ 1.0), middle (0.3 < d < 0.7) and low (0 ≤ d ≤ 0.3). We counted the numbers and percentages of typification areas that possessed the same density level at the original and typified scales to measure the changes in relative distribution density. The density consistency information can be found in Table 2, which shows that the percentages of consistent density at the two typification levels reach 86.7% and 90.0%, which means that the relative distribution density is well maintained during typification.
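The density consistency measure can be sketched as a three-level classification followed by a simple match count. The thresholds follow the levels defined above; the function names and the per-area density values are hypothetical and are not taken from Table 2.

```python
def density_level(d):
    """Classify a relative density value d in [0, 1] into high, middle or low."""
    if d >= 0.7:
        return "high"
    if d > 0.3:
        return "middle"
    return "low"

def consistency_percentage(original, typified):
    """Percentage of typification areas whose density level is unchanged."""
    same = sum(density_level(a) == density_level(b)
               for a, b in zip(original, typified))
    return 100.0 * same / len(original)

# Hypothetical average densities of four typification areas.
before = [0.8, 0.5, 0.2, 0.9]
after = [0.75, 0.6, 0.4, 0.95]
consistency = consistency_percentage(before, after)
print(consistency)
```

Here the third area moves from the low to the middle level, so three of the four areas remain consistent.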
Some typical examples of typification using various area and orientation strategies can be found in Figure 13. In Figure 13, the gray buildings are the original buildings, and the blue buildings are the typified buildings. Figures 13(a and b) show the typification results using the average area and the maximum orientation strategies and the average area and average orientation strategies, from which we can see that the area, number and distribution of typified buildings in these two subfigures are the same, while only the orientation is different, as marked with the red arrow. In cartography, when a building is very important, it may be emphatically displayed. For example, if the building marked in green in Figure 13c is Jin Mao Tower in Shanghai, which is a landmark building, then the typification strategy of the maximum area used in Figure 13c is better than the typification strategy of the average area used in Figure 13d. Using the multistrategy typification procedure, the semantic feature can be well considered, and cartography options can be flexibly and conveniently provided.

Typification of building data from the countryside
Figure 12. Density comparison before and after typification.
Figure 13. Typification using different strategies.
In this experiment, the countryside building data are from the suburbs of Wuhu City in China. Because less attention is usually given to architectural planning in the suburbs, the building distribution is more irregular. The experimental building data extend from 118.412 to 118.422 degrees east longitude and from 31.400 to 31.412 degrees north latitude. Figure 14 shows the original building data. After data format conversion with a cell size of 0.5 meters per pixel, the size of the final raster data is 2856 × 2368 pixels. There are a total of 142 buildings in this experimental area. Most of these buildings form irregular patterns. Figure 15a shows the results of grouping the buildings using superpixel analysis, and 6 building groups were generated. The corresponding construction result of the 6 typification areas is shown in Figure 15b. The superpixel size of the first segmentation used for generating typification areas is S 1 = 50. The original scale of the building data is 1:5000. According to Töpfer's radical law and the generated number of buildings in Table 3, the corresponding display scales of the typification results at the three levels are 1:20000, 1:60000 and 1:120000. The superpixel sizes of the second segmentation used for positioning buildings at the three levels are S 2 = 130, S 2 = 180 and S 2 = 230, respectively. Figures 16(a, b and c) show the typified buildings at these three levels using the maximum area and maximum orientation strategies, from which we can see that the proposed method can effectively realize the typification of buildings with irregular distribution patterns. As the level increases, the number of original groupings decreases, and the original irregular distribution patterns can be well preserved.
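The relation between display scale and building count can be illustrated with the basic form of Töpfer's radical law, n_T = n_O · sqrt(S_O / S_T), where S_O and S_T are the denominators of the original and target scales. The sketch below assumes this basic form without additional constants; the function name is hypothetical, and rounding to the nearest integer is an assumption.

```python
import math


def topfer_target_count(n_original, scale_original, scale_target):
    """Number of objects to keep at the target scale (basic radical law).

    scale_original, scale_target: scale denominators, e.g. 5000 for 1:5000.
    """
    return round(n_original * math.sqrt(scale_original / scale_target))


# 142 buildings at 1:5000, evaluated at the three display scales.
counts = [topfer_target_count(142, 5000, s) for s in (20000, 60000, 120000)]
```

Under these assumptions `counts` is `[71, 41, 29]`, which corresponds to reductions of about 50.0%, 71.1% and 79.6% in the number of buildings.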
The influence of the number of dilation operations on the geometric changes of buildings during typification is summarized in Table 3. Without dilation, the reductions in the area (from 21.4% to 44.8%), perimeter (from 46.1% to 71.9%) and number of buildings (from 50.0% to 79.6%) grow as the level increases. When the dilation operation is used to enlarge the typified buildings and the number of dilations increases, the reductions in the area and perimeter shrink: at level 1, from 13.0% to 6.5% for the area and from 45.4% to 44.1% for the perimeter; at level 2, from 26.8% to 17.6% and from 61.6% to 59.8%; and at level 3, from 30.9% to 23.6% and from 69.2% to 67.9%. With increasing levels, the total area and perimeter decrease because of the reduction in the number of buildings.
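The effect of repeated dilation on the area reduction can be reproduced on a toy raster. The sketch below is an illustration only: it assumes a binary mask and a 3 × 3 square structuring element, which the text does not specify, and all function names are hypothetical.

```python
def dilate(mask):
    """One binary dilation with a 3 x 3 square structuring element (assumed)."""
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c]:
                # Switch on the pixel and its eight neighbours inside the image.
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        rr, cc = r + dr, c + dc
                        if 0 <= rr < rows and 0 <= cc < cols:
                            out[rr][cc] = 1
    return out


def area(mask):
    """Area in pixels: the number of building pixels."""
    return sum(sum(row) for row in mask)


def area_reduction(original_area, typified_mask, n_dilations):
    """Percentage reduction in area after enlarging the typified buildings."""
    for _ in range(n_dilations):
        typified_mask = dilate(typified_mask)
    return 100.0 * (original_area - area(typified_mask)) / original_area
```

Each additional dilation adds a one-pixel ring around every typified building, so the reduction in total area shrinks as the number of dilations grows, which is the trend reported in Table 3.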
Compared with the experimental results for the city building data, the area change at typified level 1 for the building data from the suburbs of Wuhu (13.0% and 6.5%) is noticeably smaller than that for the city of Shanghai (35.2% and 28.8%), even though the same numbers of dilations were used. This is because the average area strategy was used for the building data of Shanghai, whereas the maximum area strategy was used for the building data of Wuhu. Figure 17 shows a subregion of the original data, marked with the red circle, in which the area differences between the individual buildings are relatively large. In such cases, the average area strategy leads to a relatively large reduction in the area, whereas the maximum area strategy leads to a relatively small reduction in the area.
The relative distribution density of buildings from the countryside before and after typification can be found in Figure 18. The yellow and blue parts represent high- and low-density areas, respectively. Compared with the original distribution density, the overall relative density distribution trends at the three typified levels are generally consistent, as shown in the relatively high-density regions marked with red ellipses and the relatively low-density regions marked with black ellipses. The density consistency information during typification can be found in Table 4, which shows that the percentages of consistent building density at the three typification levels are 100%, 83.3% and 83.3%. Although the density level of the buildings in one typification area is inconsistent at levels 2 and 3, the corresponding average density values are all approximately 0.3, and the density differences between the typified and original buildings in that typification area at both levels are less than 0.15, which means that the relative distribution density is well maintained during typification.

Comparison with the vector-based typification method
To show the advantages of the proposed raster-based SUBT method, we compared it with a vector-based generic building typification method proposed by Bildirici et al. (2011). Although other typification methods have been proposed in recent years, such as those of Gong and Wu (2018) and Wang and Burghardt (2019), they have strict requirements in terms of regular spatial patterns and are limited to certain fixed patterns, such as linear (Gong and Wu 2018) and grid (Wang and Burghardt 2019) patterns, making them less comparable to our method than the method proposed by Bildirici et al. (2011). In Bildirici's method, a length typification (LT) method and an angle typification (AT) method are developed for generic buildings, which can maintain the shape and coverage of building groups. Figure 19 compares the typification results of the vector-based LT and AT methods with those of the proposed raster-based SUBT method. The original buildings are displayed in Figures 19(a, e and i). Although the vector-based LT and AT methods can generate encouraging results in some cases, as shown in the regions marked with green boxes in Figures 19(b-c and f-g), they tend to produce gaps among building groups, such as the regions marked with red cross-shaped labels in Figures 19(b-c, f-g and j-k), and the original spatial distribution patterns, such as linear, grid and irregular patterns, cannot be well preserved. In contrast, the proposed raster-based SUBT method can effectively avoid generating gaps and maintain the original spatial distribution patterns, such as the linear, grid and irregular patterns in Figures 19(d, h and l). We also quantitatively compare and analyse the vector-based LT and AT methods and the raster-based SUBT method.
The evaluation indicator used to measure the inconsistency before and after typification (Gong and Wu 2018) can be calculated as follows:
$$R_x = \frac{\left| I_x^A - I_x^B \right|}{I_x^B}$$
where $I_x^B$ denotes the xth indicator value in indicator set I before typification and $I_x^A$ denotes the xth indicator value in indicator set I after typification. The lower the $R_x$ value is, the better the consistency and the better the typification results. In this study, set I contains three indicators, namely, sum area, global distribution density and range (Wang et al. 2017; Gong and Wu 2018).
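A small sketch of the inconsistency indicator follows. It assumes that the indicator is the relative change of an indicator value, that is, the absolute difference between the values after and before typification divided by the value before; this reading is consistent with the area reduction of 6.5% and the area change of 0.065 reported for the same SUBT result. The function name and the sample values are hypothetical.

```python
def inconsistency(before, after):
    """Relative change of one indicator (assumed form of the indicator R).

    before, after: indicator value (e.g. sum area) before and after
    typification. Lower results mean better consistency.
    """
    if before == 0:
        raise ValueError("indicator value before typification must be nonzero")
    return abs(after - before) / before


# Hypothetical sum areas in square meters before and after typification.
r_area = inconsistency(1000.0, 935.0)
```

An indicator that is unchanged by typification yields 0, and the value grows with the relative size of the change in either direction.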
The comparison results of these three indicators between the vector-based (LT and AT) and proposed raster-based SUBT methods can be found in Table 5, from which we can see the following. First, the area changes during typification generated by the SUBT method are significantly lower than those of the vector-based LT and AT methods. For example, at level 1 for the typification of buildings from the countryside, the area changes of LT and AT are 0.611 and 0.500, respectively, while the area change of the SUBT method is only 0.065. This is mainly because the SUBT method applies the dilation operation to reduce area changes. Second, the density changes of the vector-based (LT and AT) and raster-based SUBT methods differ only slightly, with the latter performing better. For example, at level 1 in the typification of buildings from the city, the density changes of LT and AT are 0.536 and 0.617, respectively, which is less than ten percent larger than that of the SUBT method. Finally, the range changes of the vector-based (LT and AT) and raster-based SUBT methods are close. In some cases, the range changes of SUBT are slightly higher than those of the vector-based methods; for example, at level 1 for the typification of buildings from the countryside, the range change of LT is 0, while the range change of the SUBT method is 0.002. In other cases, they are slightly lower; for example, at level 1 for the typification of buildings from the city, the range change of AT is 0.016, while the range change of the SUBT method is 0.001.

Parameter settings and strategy selection
Figure 19. Comparisons between the vector-based and proposed raster-based SUBT methods.
There are two main parameters in the proposed SUBT method: the superpixel size (S 1) of the first segmentation used for generating typification areas and the superpixel size (S 2) of the second segmentation used for positioning buildings. The parameter S 1 has a decisive influence on the result of building grouping. When S 1 is too small, sufficiently large typification areas cannot be generated, and the effect of typification cannot be achieved. For example, Figures 20(a and b) show the grouping results of the original buildings using a small value (S 1 = 13) and a large value (S 1 = 53), respectively, while the appropriate grouping results obtained with S 1 = 23 can be found in Figure 7a. Figures 20c-f contrast the typification results obtained with these three values. When S 1 is too small, such as S 1 = 13, some scattered buildings cannot be effectively typified, such as the buildings marked with blue boxes in Figure 20d, while this phenomenon can be avoided when S 1 = 23, as shown in Figure 20c. When S 1 is too large, typified buildings may overlap other geographical elements, such as main roads. For example, in Figure 20f, when S 1 = 53, the typified building marked with a blue arrow overlaps the main road, while a suitable value avoids such overlaps, as in the typified results in Figure 20e when S 1 = 23. For the parameter S 2, its size can be determined according to Töpfer's radical law described in Section 4.1. For a given target scale, a corresponding fixed superpixel size S 2 can be automatically determined to obtain the corresponding number of typified buildings. In actual applications, when setting the parameters S 1 and S 2, we recommend the following rules.
Figure 20. Impact of parameter settings on the typification results.
Assuming that w a is the width of the main road, w b is the width of a secondary road one grade below the main road, A t is the area of all typification regions, where w a , w b and A t are defined in units of pixels, S Tar and S Ori are the denominators of the target and original map scales, respectively, and N Ori is the number of buildings at the original map scale, the preliminary parameter values of S 1 and S 2 can be set as follows:
To obtain a satisfactory typification result, we can fine-tune these preliminary values of S 1 and S 2 . The rules for setting the values of S 1 and S 2 are not applicable in all cases, especially in the case of isolated buildings.
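One way to obtain a preliminary value for the second superpixel size can be sketched as follows. This is not the authors' rule; it is an illustration under two stated assumptions: each superpixel of the second segmentation yields one typified building, and the superpixels are roughly square with side S 2, as in grid-initialized superpixel methods where the grid interval is the square root of the pixel count divided by the number of superpixels. The target number of buildings is taken as given, for example from Töpfer's radical law. The function name and the sample values are hypothetical.

```python
import math


def preliminary_s2(typification_area_px, n_target):
    """Preliminary side length (pixels) of the second-pass superpixels.

    typification_area_px: total area of all typification regions in pixels.
    n_target: number of buildings wanted at the target scale.
    Assumes one typified building per roughly square superpixel.
    """
    if n_target <= 0:
        raise ValueError("n_target must be positive")
    return math.sqrt(typification_area_px / n_target)


# Hypothetical example: 710,000 pixels of typification area, 71 target buildings.
s2 = preliminary_s2(710_000, 71)
```

With these hypothetical inputs the sketch gives a side length of 100 pixels. A value obtained this way is only a starting point and would still need the fine-tuning mentioned above.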
In the proposed SUBT method, we designed different typification strategies. In the countryside, since the construction of buildings is not uniformly planned, building areas vary greatly. If the average area strategy is used, the result can differ visibly from the original: the building group contains obviously large buildings, but the typified building is very small, as in the typified results in Figure 17a. In this case, the maximum area strategy is selected. In cities, since the construction of buildings is usually uniformly planned, the variations in building area and orientation are small. In this case, the average strategy is a better choice because it represents the average area or orientation of a cluster of buildings while maintaining a good visual effect. However, distinguishing city buildings from countryside buildings is not a mandatory step, since the typification strategies are recommended mainly to obtain better typification results. According to this principle, we provide the following recommendations for selecting typification strategies when using the proposed SUBT method.
• If there are small area differences between buildings, the average area strategy is recommended.
• If there are small orientation differences between buildings, the average orientation strategy is recommended.
• If there are large area differences between buildings, the maximum area strategy is recommended.
• If there are large orientation differences between buildings, the maximum orientation strategy is recommended.
In addition, these strategies can be used flexibly according to different cartography scenarios. For example, the average area strategy may be a better choice if the maximum area strategy causes many overlaps with other map elements.
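The recommendations above can be turned into a simple decision helper. The sketch below is an illustration only: the text does not define what counts as a small or large difference, so the coefficient of variation of the areas, the spread of the orientations and both thresholds are assumptions, and orientation wrap-around is ignored for brevity.

```python
import statistics


def recommend_strategies(areas, orientations_deg,
                         area_cv_max=0.5, orientation_spread_max=20.0):
    """Suggest (area strategy, orientation strategy) for one building group.

    areas: building areas in the group (any consistent unit).
    orientations_deg: building orientations in degrees.
    Both thresholds are hypothetical and should be tuned per data set.
    """
    # Relative variation of areas: population standard deviation over mean.
    area_cv = statistics.pstdev(areas) / statistics.mean(areas)
    # Spread of orientations; wrap-around is not handled here.
    spread = max(orientations_deg) - min(orientations_deg)
    area_strategy = "average" if area_cv <= area_cv_max else "maximum"
    orientation_strategy = ("average" if spread <= orientation_spread_max
                            else "maximum")
    return area_strategy, orientation_strategy
```

For a uniformly planned group such as areas `[100, 110, 95, 105]` with orientations `[10, 12, 11, 9]`, the helper suggests the average strategies; for a group with one dominant building, such as areas `[50, 60, 900, 55]` with orientations `[5, 80, 40, 10]`, it suggests the maximum strategies. A cartographer can still override the suggestion, for example when the maximum area strategy causes overlaps with other map elements.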

Conclusions
This study proposes an innovative approach for building typification. Existing typification methods can usually handle only buildings with a single distribution pattern; thus, building pattern recognition is required. In contrast, our study focuses on a raster-based method for building typification using related computer vision technologies, in which different distribution characteristics can be well considered. In the proposed method, buildings with different distribution patterns, such as linear, grid and irregular patterns, are first grouped by image connected component detection and superpixel analysis. Then, the new positions of the buildings in the typification areas are calculated using a second segmentation procedure. Finally, a multistrategy representation of buildings is generated by the analysis of the orientation and shape of the buildings in the typification areas. The following conclusions can be drawn.
(1) The proposed method can effectively realize multistrategy typification, such as the average and maximum orientation, the average and maximum area, and the strategy of building decomposition. The geometric and semantic features of buildings can therefore be well considered, which provides flexible options for cartography.
(2) Compared with the vector-based LT and AT typification methods, the proposed SUBT method performs better in preserving the distribution patterns (linear, grid and irregular), density and geometric features of buildings, while the distribution range is not maintained as well as by the vector-based methods.
Some limitations and improvements of the proposed method should be mentioned. First, a second segmentation procedure is used in the proposed method, and many adjustments are required to determine the optimal segmentation parameters at different levels of detail. Thus, how to automatically determine the optimal segmentation parameters according to the size, direction and distribution of the original buildings is a further research direction. Second, during grouping, only geometric proximity is considered, and other aspects, such as semantic and directional features, should be considered. Moreover, scholars have also considered the shapes of typified buildings, such as L-shaped and U-shaped buildings (Stoter et al. 2010), during typification, while the proposed method cannot achieve these shapes. Finally, the topological relationships with roads, river systems and other elements are not fully considered in the proposed method. These relationships can be explored in the future.