Layout graph model for semantic façade reconstruction using laser point clouds

ABSTRACT Building façades can feature different patterns depending on the architectural style, functionality, and size of the buildings; therefore, reconstructing these façades can be complicated. In particular, when semantic façades are reconstructed from point cloud data, uneven point density and noise make it difficult to accurately determine the façade structure. When investigating façade layouts, Gestalt principles can be applied to cluster visually similar floors and façade elements, allowing for a more intuitive interpretation of façade structures. We propose a novel model for describing façade structures, namely the layout graph model, which involves a compound graph with two structure levels. In the proposed model, similar façade elements such as windows are first grouped into clusters. A down-layout graph is then formed using this cluster as a node and by combining intra- and inter-cluster spacings as the edges. Second, a top-layout graph is formed by clustering similar floors. By extracting relevant parameters from this model, we transform semantic façade reconstruction to an optimization strategy using simulated annealing coupled with Gibbs sampling. Multiple façade point cloud data with different features were selected from three datasets to verify the effectiveness of this method. The experimental results show that the proposed method achieves an average accuracy of 86.35%. Owing to its flexibility, the proposed layout graph model can deal with different types of façades and qualities of point cloud data, enabling a more robust and accurate reconstruction of façade models.


Introduction
Advanced three-dimensional (3D) building models, along with detailed and computable information, play an important role in urban planning, thermal performance evaluation, and virtual reality, among other applications (Li, Wang, and Jiang 2021). In such building models, semantic façade models are key constituents. Unlike 3D solid models, which are solely used for visualization, semantic façade models indicate the location and label of each façade element and help identify the geometric and topological relationships among these elements. In this regard, the task of reconstructing a semantic façade model is considerably more difficult than semantic segmentation, which automatically extracts semantic façade entities from images or laser point clouds but does not describe their relationships (Salas 2020; Shan et al. 2020). To establish a common standard of storage and exchange for this type of model, the Open Geospatial Consortium introduced CityGML (Gröger and Plümer 2012). Detailed façade information can be defined at Level of Detail 3 (LoD3), which includes the appearance, 3D geometry, and topology of façade entities.
The difficulty in automatically reconstructing semantic façade models can be ascribed to two factors. First, façade observations are often noisy, leading to coarse semantic entity extraction. Data-driven methods, such as dynamic programming (Cohen, Schwing, and Pollefeys 2014), conditional random fields (Gadde et al. 2017), and restricted Boltzmann machines (Fathalla and Vogiatzis 2017), cannot completely eliminate the interference of noise, so the detected façade elements have fuzzy boundaries and missing details. Even state-of-the-art deep learning methods are not reliably robust (Qi et al. 2017; Su et al. 2018; Zhang et al. 2019; Hensel, Goebbels, and Kada 2019). Recently, researchers have attempted to address these problems through multi-source data integration. Lin et al. (2019) used a thermal infrared imager to obtain the heat distribution of a façade and combined it with image features to develop a highly robust façade element extraction model. However, the use of thermal imaging equipment incurs substantial labor and time costs in large-scale modeling. By improving the distortion correction method for panoramic images, Zhu et al. (2020) extracted reliable information on building façades from panoramic photos. However, this method does not fundamentally solve the problem of noise in images. Another approach takes advantage of the regularized arrangement of façade elements in street-view images and transforms the alignment of façade elements into a binary integer programming problem, which optimizes the positional accuracy of façade elements and improves computational efficiency. However, façade element extraction in this approach still relies on a deep learning method, such as YOLO v3 (Redmon and Farhadi 2018), which cannot solve the occlusion problem. Wu et al. (2020) collected street-view images from search engines, social media, and mobile phones, and extracted façade information by constructing an image point cloud. This method lacks the ability to extract semantic information on façades.
The second factor is the complexity and diversity of the façade structures themselves, which makes it difficult to describe façade structures of different styles in a flexible manner. Existing methods, such as those based on façade grammar, are often unable to derive façade structures (Martinovic and Van Gool 2013; Weissenberg et al. 2013; Gadde, Marlet, and Paragios 2016; Dehbi et al. 2017). This is because these methods share the same understanding of façade structure, namely that it is a combination of simple geometric elements. In fact, whether from the perspective of architectural construction or architectural design, the structure of a façade is not determined by simple geometric or topological rules. It is affected by the design expectations of the client, the style of the architectural designer, and the environmental conditions. The extraction of structural characteristics should not be based on surface morphology alone, but should also consider the causes of the façade structure, the implied esthetics, and the laws of human cognition.
In this regard, we propose a novel façade structure description method, i.e. the façade layout graph model. The proposed approach is based on Gestalt principles (Koffka 1935; Schwartz and Krantz 2017) and the Principle of architectural form (Flemming 1990; Doersch et al. 2012; Jennath and Nidhish 2016). Based on Gestalt principles, visually similar elements (i.e. those with similar shapes, labels, and sizes) can be grouped as a cluster. This cluster then serves as a node of the graph. Moreover, the Principle of architectural form enables the theoretical interpretation of architectural knowledge and the esthetic design of façade structures. In this method, considering the application of topological graphs, we use a few parameters to represent the attributes of semantic entities (through nodes) and the topological relationships among them (through edges). Moreover, nodes formed using Gestalt principles can represent multiple entities by sharing attributes. Façades must be designed considering human suitability and environmental factors, such as lighting and energy saving, in order to ensure appropriate architectural functionality. Moreover, codes and directives have been stipulated to control the shape and size of façade elements in most countries or organizations (e.g. MHCLG 2019). We set the parameter domains based on these codes and directives.
As the primary contribution of this study, the proposed method deduces and reconstructs semantic façade models from low-quality data and complicated façade structures. By integrating the parameterized layout graph model with prior knowledge, we apply simulated annealing (Metropolis et al. 1953) to achieve non-convex optimization within a high-dimensional space. Moreover, Gibbs sampling coupled with simulated annealing (Geman and Geman 1984) is used to obtain a new candidate solution within the appropriate parameter domain.
The remainder of this paper is organized as follows. In Section 2, we introduce and discuss existing semantic façade reconstruction methods and related works. Subsequently, we present the detailed construction of the proposed layout graph model in Section 3.1, and the corresponding parameter configuration in Section 3.2. The algorithm for inferring the optimal layout from the point cloud is presented in Section 3.3. The experimental results and analyses are discussed in Section 4. Lastly, the conclusions of this study are presented in Section 5.

Related work
Several studies have focused on semantic façade reconstruction. Instead of using machine learning methods to improve semantic segmentation accuracy, we reconstruct façade structures by exploring the geometric and topological relationship of the façade structure.

Grammar designing
Methods based on façade grammar, which include split grammar (Wonka et al. 2003), CGA shape grammar (Müller et al. 2007), and developed formal grammar (Ripperda and Brenner 2006, 2009), are closely related to the proposed methodology. The advantage of façade grammar is that it can describe a façade structure by designing a set of production rules (Alegre and Dellaert 2004; Becker 2009), benefiting from the fact that façades, being man-made objects, have regular shapes (rectangular, circular, triangular, etc.) and repetitive patterns. Using façade grammar, a façade structure can be split into numerous basal tiles using a split line, and each basal tile contains the semantic label and shape of the façade element. Horizontal and vertical production rules are used to derive these split lines. For example, the rule $\text{floor} \rightarrow \text{window} \mid \text{remaining\_floor},\ \{\text{'v'}, (85, 30)\}$ implies splitting the left-hand non-terminal symbol "floor" into "window" and "remaining_floor" with a vertical split line "v" at the position (85, 30). Either the optimization method or the learned probability distribution should govern the selection of the production rules and splitting positions when reconstructing a semantic façade model. Ripperda and Brenner (2006) transformed façade grammar into the form of a parameter set and adopted the reversible-jump Markov chain Monte Carlo method to search for optimal combinations of the production rules from high-dimensional parameter spaces. This method is considerably time-consuming owing to the large parameter spaces involved. To reduce computational complexity, Koutsourakis et al. (2009) used a Markov random field to guide the optimization process. This method uses a sequence of three rules (an extrusion, a vertical split operator, and a horizontal split operator) to decompose formal grammar into graphical models. Thereafter, efficiency is improved through intrinsically decoupled optimization. Teboul et al. (2011) employed a hierarchical Markov decision process to formulate the 1D grammar parsing problem. Reinforcement learning (RL) can then be used sequentially to determine the optimal combination of rules. Riemenschneider et al. (2012) used irregular grids to parse complex shape grammar and a pixel-wise classifier to extract the position of label transitions; thus, an initial irregular grid could be determined. Using the Cocke-Younger-Kasami (CYK) algorithm, the rationality of attribute assignment for each grid can be ensured. Cao et al. (2017) improved the efficiency and accuracy of solving the Markov decision process using high-level topology optimization. The abovementioned optimization methods function efficiently under the premise of designing specific grammar rules to achieve precise reconstruction. However, the designed production rules need to be simple because the optimization process typically necessitates large computing times to ensure convergence.
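To make the effect of a single production rule concrete, the following minimal sketch applies a vertical split to an axis-aligned tile. The names `Tile` and `split_vertical` are hypothetical, and the split position is interpreted here simply as an x-coordinate inside the tile, which is an assumption made for illustration rather than the convention of the cited grammars.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tile:
    """Axis-aligned tile with a semantic label (hypothetical helper type)."""
    label: str
    x: float       # lower-left corner, horizontal coordinate
    y: float       # lower-left corner, vertical coordinate
    width: float
    height: float


def split_vertical(tile, x_split, left_label, right_label):
    """Split a tile into two tiles with a vertical line at x = x_split."""
    if not (tile.x < x_split < tile.x + tile.width):
        raise ValueError("split line must lie strictly inside the tile")
    left = Tile(left_label, tile.x, tile.y, x_split - tile.x, tile.height)
    right = Tile(right_label, x_split, tile.y,
                 tile.x + tile.width - x_split, tile.height)
    return left, right


# Illustrative numbers only: a floor tile split at x = 85.
floor = Tile("floor", 0.0, 30.0, 200.0, 40.0)
window, remaining_floor = split_vertical(floor, 85.0, "window", "remaining_floor")
```

Repeated application of such rules yields the basal tiles; choosing which rule to apply and where to split is the part that the optimization or learning methods discussed above have to solve.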

Learning grammar
The concept of learning grammar production probability was further developed to obtain more flexible grammar rules. This concept is extended from the field of natural language processing. As this method requires a treebank, it is necessary to convert façades with ground truth into parsing trees in order to form datasets; this is a heuristic method for façade parsing. Martinovic and Van Gool (2013) used the Bayesian model merging method to refine grammar parsing trees using labeled façade images. Repeated and redundant grammar rules are reorganized such that trees with a clear hierarchy can be created depending on different façade styles. Weissenberg et al. (2013) proposed a method for the automated learning of façade grammar; this method involved three steps: compression, comparison, and virtual façade synthesis. A minimum description length was used to ensure the validity of synthetic grammar. Gadde, Marlet, and Paragios (2016) used RL to analyze façade grammar, using which a less arbitrary and more systematic parsing tree could be generated. Dehbi et al. (2017) used statistical relational learning to automatically learn weighted attribute context-free grammar, which enforces the robustness of reconstructing complex façades. However, this method is limited by the façade style selected in the training set; as it is a supervised learning method, datasets containing different façade styles cannot be mixed.

Structure deducing
Approaches for deducing façade structures have also been proposed. For example, Zhang et al. (2013) reconstructed façade structures by leveraging their symmetry. Fan et al. (2015) proposed a layout grid for describing a façade structure, and Liu et al. (2019) used the Kronecker Product to model the repetitive patterns of façades. Recently, Li et al. (2020) used several rules to improve the geometric correctness of reconstructing façades from unstructured 3D point clouds. Apart from the abovementioned approaches, methods for detecting façade openings have also been proposed (Zolanvari, Laefer, and Natanzi 2018;Xia and Wang 2019). However, despite being able to achieve façade reconstruction, these approaches do not adequately account for the structural differences in façades; in these methods, façades are reconstructed solely from selected datasets.

Graph model
The proposed method focuses on the geometric and topological relationships among façade elements, which are represented using a compound graph. This approach is partially inspired by the attribute parsing graph (Han and Zhu 2005;Schmittwilken et al. 2009). In an attribute graph, each element is indexed by the spatial relations among surrounding elements. By defining bottom-up generation rules, the entire structure can be deduced from elements in a Bayesian framework. However, validated applications of such attribute graphs involve consistent shapes and structures, such as the detection of Manhattan world structure (Liu, Zhao, and Zhu 2018) and human pose estimation (Park, Nie, and Zhu 2017). For this task, façade structures need to be described in a more flexible manner due to their complexity and variety. Xiong, Elberink, and Vosselman (2014) and Xiong et al. (2015) introduced a flexible roof topological graph for reconstructing building roofs, which is similar to the method proposed herein. The proposed layout graph model, based on the abovementioned works, is introduced in Section 3.

Layout graphs for reconstructing semantic façade models
This study aims to reconstruct a semantic façade model using a point cloud. Given the unevenness of façade point clouds, we implement reconstruction using a novel framework (Figure 1). Under this framework, we first segment façade components using the RANdom SAmple Consensus (RANSAC) method (Fischler and Bolles 1981). Similar to the works of Tuttas and Stilla (2013) and Nguatem, Drauschke, and Mayer (2014), coarse outlines of windows and doors are extracted by detecting edges and the direction of inliers. In our experiment, the coarse outlines of balconies could also be extracted by setting the direction of the outliers. This was possible because windows are often indentations in the façade, whereas balconies protrude from the façade. To measure the coherence between components and the layout graph model, we project the refined façade structure onto a 2D plane. Depending on the size of the façade, three initial layout graph models of the window, balcony, and door are generated randomly. These models should conform with prior knowledge determined using the Principle of architectural form. By integrating these initial models with the extracted components, we can obtain the joint-prior probability and likelihood corresponding to the three components. On this basis, the problem of searching for an optimal configuration of the layout graph model can be formulated as a Maximum A Posteriori (MAP) estimate. Owing to the high-dimensional parameter space, instead of performing the MAP calculation directly, optimization is carried out through simulated annealing coupled with Gibbs sampling, which corresponds to optimization within the Bayesian framework.
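The optimization loop can be pictured with the following minimal sketch of simulated annealing in which one parameter is resampled at a time. It is not the authors' implementation: the function `anneal`, the toy objective `toy_score`, the cooling schedule, and the parameter domains are all assumptions, and the proposal step draws uniformly from a parameter's domain, which only imitates the coordinate-wise character of Gibbs sampling rather than sampling from exact conditional distributions.

```python
import math
import random


def anneal(score, domains, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Maximize `score` over box-constrained parameters (illustrative sketch)."""
    rng = random.Random(seed)
    current = {k: rng.uniform(lo, hi) for k, (lo, hi) in domains.items()}
    current_score = score(current)
    best, best_score = dict(current), current_score
    temperature = t0
    names = list(domains)
    for _ in range(steps):
        # Coordinate-wise proposal: resample a single parameter in its domain.
        name = rng.choice(names)
        lo, hi = domains[name]
        candidate = dict(current)
        candidate[name] = rng.uniform(lo, hi)
        candidate_score = score(candidate)
        delta = candidate_score - current_score
        # Metropolis rule: always accept improvements, sometimes accept losses.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_score = candidate, candidate_score
            if current_score > best_score:
                best, best_score = dict(current), current_score
        temperature *= cooling
    return best, best_score


def toy_score(theta):
    """Toy stand-in for a posterior: peaks at hf = 3.0, ww = 1.5."""
    return -((theta["hf"] - 3.0) ** 2) - ((theta["ww"] - 1.5) ** 2)


domains = {"hf": (2.2, 4.0), "ww": (1.2, 3.0)}  # hypothetical domains
best, best_score = anneal(toy_score, domains)
```

In the setting described above, `score` would be replaced by the posterior combining the joint-prior estimate and the likelihood, and the domains by those derived from building codes.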

Layout graph model
Owing to its simple and flexible characteristics, we use the graph model to describe façade structures. The classical graph model is defined as $G = \langle V, E \rangle$, where V denotes the vertex of the graph, which is composed of objects with similar attributes, and E is the edge formed by connecting two vertices, which represents the geometric and topological relationship between these vertices (Kilgour and Hipel 2005). From the perspective of building reconstruction, a few methods applying this graph model to describe repetitive building components have been proposed. For instance, by considering each facet on the roof to be identical, a topological roof graph can be created (Xiong et al. 2015). In addition, some studies (Fan et al. 2015; Li et al. 2020) have described façade structures using topological graphs, focusing solely on basic façade elements such as windows. However, deriving façade structures using these methods is significantly complicated, because several constraints such as horizontal alignment, vertical alignment, array arrangement, and symmetry need to be satisfied.
Façades feature hierarchical structures depending on the Principle of architectural form. For example, the floor has a higher level than windows or balconies when considered as a semantic entity. Thus, when specifying the location of a window, we typically first specify the floor on which it lies. In this regard, the conventional graph is inadequate in terms of depicting hierarchical structures in the façade. Thus, we introduce a hierarchical graph model representing large and complicated structures; such graphs are typically termed as compound graphs (Sugiyama and Misue 1991;Dogrusoz et al. 2004). The advantage of a compound graph is that it describes a graph model that employs a node to represent a sub-graph. We exploit a compound graph model to describe the hierarchical façade layout and call it the façade layout graph model.
For the façade in Figure 2, we have drawn a façade layout graph model with a two-level compound graph. The first-level graph is termed the Top Layout Graph (TLG), as it represents the floor layout. When elements on adjacent floors are similar, they can be grouped into a cluster according to the "Law of Proximity", included in Gestalt principles. For example, the red node in Figure 2 represents the first three floors with the same attributes, and the green node represents the last two similar floors. Thus, a TLG is formed, and we denote it as $G_t = \langle V_t, E_t \rangle$, where $V_t$ is a set of nodes; in this case, it contains two different nodes. The attributes of a node in $V_t$ can be "sf", "wf", "hf", "hgap", "wb", and "DLG"; detailed descriptions are presented in Figure 2. $E_t$ represents the geometric relationship among the nodes in $V_t$. In particular, it refers to the connection of different floors, which can be reflected in the 3D shape of a façade.
The second-level graph is called the Down Layout Graph (DLG), representing the layout of façade elements. Each node in this graph can be generated from an element cluster consisting of one or more façade elements. The determination of an element cluster can be based on the "Law of Proximity" or the "Law of Similarity", included in Gestalt principles. The edge in the DLG is represented by a collection of intra- and inter-cluster spacings. Accordingly, the DLG can be represented in the same node-and-edge form as the TLG, where "n" represents the number of nodes. The attributes of each node consist of a tuple of three items, such as $v_n = (\text{'sw'}, \text{'ww'}, \text{'hw'})$, as shown in Figure 2. An edge is composed of two items, namely "d_intra" and "d_inter", where "d_intra" takes one of two types of values, 0 or $d_{2(n-1)}$, and represents the intra-cluster spacing of the n-th node, and "d_inter" has one value, $d_{2n-1}$, which is the inter-cluster spacing between two nodes connected by an edge. It should be noted that each cluster has an intra-cluster spacing. When there is only one element in a node, its intra-cluster spacing is set to 0.
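The two-level structure described above can be sketched as a small set of data classes. This is only an illustration: the class names are hypothetical, and the meanings attached to the attributes ("sf" as the number of floors in a node, "sw" as the number of elements in a cluster, "wf"/"hf" as floor width and height, "ww"/"hw" as element width and height) are inferred from the surrounding text rather than taken from Figure 2. The element counts and dimensions in the example are invented.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DLGNode:
    sw: int                # assumed: number of elements in the cluster
    ww: float              # assumed: element width
    hw: float              # assumed: element height
    d_intra: float = 0.0   # spacing inside the cluster (0 for one element)


@dataclass
class DownLayoutGraph:
    nodes: List[DLGNode]
    # One inter-cluster spacing per pair of adjacent nodes.
    d_inter: List[float] = field(default_factory=list)

    def element_count(self) -> int:
        return sum(node.sw for node in self.nodes)


@dataclass
class TLGNode:
    sf: int                # assumed: number of floors sharing this layout
    wf: float              # assumed: floor width
    hf: float              # assumed: floor height
    hgap: float
    wb: float
    dlg: DownLayoutGraph   # sub-graph: this is what makes the graph compound


@dataclass
class TopLayoutGraph:
    nodes: List[TLGNode]

    def floor_count(self) -> int:
        return sum(node.sf for node in self.nodes)

    def element_count(self) -> int:
        return sum(node.sf * node.dlg.element_count() for node in self.nodes)


# Invented example mirroring the two-node case: three similar floors, then two.
lower = DownLayoutGraph([DLGNode(2, 1.5, 1.8, 0.5), DLGNode(3, 1.5, 1.8, 0.8)], [1.2])
upper = DownLayoutGraph([DLGNode(4, 1.2, 1.6, 1.0)])
facade = TopLayoutGraph([
    TLGNode(sf=3, wf=20.0, hf=3.0, hgap=0.0, wb=0.0, dlg=lower),
    TLGNode(sf=2, wf=20.0, hf=3.0, hgap=0.0, wb=0.0, dlg=upper),
])
```

Storing the intra-cluster spacing with the node is a design choice of this sketch; it follows from the statement that every cluster has exactly one intra-cluster spacing, whereas inter-cluster spacings belong to pairs of adjacent nodes.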
To describe a façade structure in the form of a layout graph model, we parse it using a top-down process, as shown in Figure 3. First, we segment the refined façade into three components: windows, balconies, and doors. Considering the intrinsic connection of windows and balconies, we analyze their layout based on a potential sequence, such as windows→balconies→doors. Accordingly, the layout of windows can be parsed first. In this example, we parse the layout via visual interpretation. Thereafter, a TLG with two nodes can be generated according to the two types of floors, which are surrounded by the red and green rectangles. Each node in the TLG possesses unique attributes, except for the geometric attributes (i.e. "wf", "hf", "wb"); the sub-graph, i.e. the DLG, is also unique. Therefore, we only need to analyze the elements of one floor in a node of the TLG and the others remain the same, similar to the DLG in Figure 3. The two different DLGs correspond to the two nodes in the TLG. The parsing of balconies builds on the parsed layout of the windows.
To ensure that each façade layout graph has a unique form, we construct the nodes in the graph by following Gestalt principles. Two necessary principles are formulated to automatically implement the construction of nodes based on Gestalt principles:
Principle 1: When all the elements in a DLG are similar to each other, each edge in this DLG must satisfy the condition that the intra-cluster spacing, "d_intra", is less than the adjacent inter-cluster spacing, "d_inter". Thus, elements in proximity can form a cluster, conforming to the "Law of Proximity." Two examples are shown in Figure 4(a) and 4(b).
Principle 2: Elements with different widths or heights are treated under another condition, where the shape factor takes precedence over the spacing factor when grouping elements, because only elements with the same shape can express their attributes through a node. Two typical examples are depicted in Figure 4(c) and 4(d).
To illustrate the difference between "d_intra" and "d_inter", we use polylines to represent the edges in the graph, where segments of different heights distinguish the two types of spacing. In Figure 4(a), $w_1$ and $w_2$ can be clustered to form $V_1$. This is because, when $d_0 < d_1$, $w_2$ tends to form a cluster with $w_1$ rather than $w_3$. The spacing between $w_3$ and $w_4$ is equal to that between $w_4$ and $w_5$, suggesting that the three elements $w_3$, $w_4$, and $w_5$ can be clustered to form $V_2$. In Figure 4(b), a special node $V_1$ is formed by a single element $w_1$ when $d_0 = 0$. Based on the edges in Figure 4(a) and 4(b), it is evident that inter-cluster spacings are always greater than nearby intra-cluster spacings when the elements are similar. Figures 4(c) and 4(d) illustrate a special case where the widths of the elements in one layer are different; thus, additional clusters are needed to record their attributes. The nodes $V_1$ and $V_3$ in Figure 4(c) and $V_1$ and $V_2$ in Figure 4(d) only contain one element; therefore, their intra-cluster spacings are 0.
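The two principles can be read as a consistency check on a candidate DLG, sketched below. The function `satisfies_principles` and the `Cluster` type are hypothetical. The check encodes one interpretation: a single-element cluster must have zero intra-cluster spacing, and the spacing condition of Principle 1 is enforced only between adjacent clusters of identical shape, because under Principle 2 a change in shape is by itself sufficient to separate clusters. The numeric values are invented.

```python
from typing import List, NamedTuple


class Cluster(NamedTuple):
    sw: int         # number of elements in the cluster (assumed meaning)
    ww: float       # element width
    hw: float       # element height
    d_intra: float  # spacing between elements inside the cluster


def satisfies_principles(clusters: List[Cluster], d_inter: List[float]) -> bool:
    """Check a row of clusters against Principles 1 and 2 (one interpretation)."""
    if len(d_inter) != max(len(clusters) - 1, 0):
        raise ValueError("expected one inter-cluster spacing per adjacent pair")
    for c in clusters:
        if c.sw < 1:
            return False
        # A single-element cluster has no internal spacing.
        if c.sw == 1 and c.d_intra != 0:
            return False
    for i, gap in enumerate(d_inter):
        left, right = clusters[i], clusters[i + 1]
        same_shape = (left.ww, left.hw) == (right.ww, right.hw)
        # Principle 1: between similar clusters, intra spacing < inter spacing.
        if same_shape and not (left.d_intra < gap and right.d_intra < gap):
            return False
        # Principle 2: different shapes justify separate clusters on their own.
    return True


# Pattern described for Figure 4(a): two clusters of similar elements.
case_a = [Cluster(2, 1.5, 1.8, 0.5), Cluster(3, 1.5, 1.8, 0.8)]
# Pattern described for Figure 4(b): the first cluster is a single element.
case_b = [Cluster(1, 1.5, 1.8, 0.0), Cluster(3, 1.5, 1.8, 0.8)]
# Pattern described for Figure 4(c): element widths differ between clusters.
case_c = [Cluster(1, 1.0, 1.8, 0.0), Cluster(2, 1.5, 1.8, 0.9), Cluster(1, 1.0, 1.8, 0.0)]
```

For example, `satisfies_principles(case_a, [1.2])` accepts the layout, whereas shrinking the inter-cluster spacing to 0.6 makes it smaller than the second cluster's internal spacing and the layout is rejected.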

Model parameters
The layout graph model is described by a parameter set Θ, which contains the attributes of the nodes and edges in the TLG and DLG; some configurations of the parameter set Θ are shown in Figure 2. To generate a new layout graph model, in addition to the parameters mentioned above, we also need parameters that can describe the graph structure, such as the number of nodes in the TLG and DLG. By randomly configuring the parameter set Θ, we can obtain a large number of façade layouts. However, some incorrect and unreasonable façade layouts can also be constructed, such as the façades in Figure 5(a) and 5(c), due to the absence of appropriate constraints. In façade reconstruction, many constraints can be represented by the topological relationships between elements.
Before describing the topological constraints, we first impose a constraint on the geometric properties of the façade elements. Conventionally, the design of residential buildings, composite buildings, and shopping malls needs to comply with certain codes and standards (Horton 2015; MTRTS 2016; European Union 2018; MHCLG 2019). For instance, to ensure that people are comfortable within the building space, the State of Victoria (2017) stipulates that the floor height must be greater than 2.5 m, and SNSW (2015) specifies that the floor height needs to be greater than 2.2 m. In multi-floor and multi-functional buildings, the height of the ground floor should be greater than that of the other floors to provide additional space for use, such as retail or commercial uses. SNSW (2015) stipulates that the ground floor height should be greater than 3.3 m. Buxton (2015) determined that the standard height for the ground floor of a shopping mall should be no less than 3.2 m. In addition to suitability, energy and daylight savings are also important factors to be considered. For instance, WAPC (2018) states that the height of a window should be greater than 1.6 m, and the State of Victoria (2017) stipulates that the minimum width of a window should be 1.2 m. We studied several codes and directives related to building design and energy saving, which provide a reference for setting the initial domain of the geometric attributes of façades; these are described in Table 1.
As façade elements are geometrically and topologically related, we cannot generate them arbitrarily. Under constraints, the process of generating a new façade layout can be outlined as follows:
Step 1. First, the initial floor height, i.e. "hf", is obtained randomly. The domain of "hf" is defined in Table 1. Accordingly, the total number of floors is then obtained by calculating $\lfloor H/hf \rfloor$, where "H" is the height of the façade and $\lfloor \cdot \rfloor$ represents the round-down function.
Step 2. The number of nodes (nns) in the TLG is calculated considering that each node represents at least one floor. The attribute "sf" of each node can be determined using a stochastic allocation algorithm based on the Banker algorithm (Louchard and Schott 1991). According to the Banker algorithm, when allocating the number of floors in one node, "sf" should be greater than 0. Lastly, the sum of "sf" for all the nodes must be equal to the total number of floors. The pseudocode is presented in Algorithm 1.
Step 3. The value of "wf" is determined based on the façade width "W", and the initial values of "wb" and "hgap" are set as 0 for each node in the TLG.
Step 4. To obtain the configuration of the DLG for one node in the TLG, a procedure similar to that used for forming the TLG is used. Thus, the heights "hw" and widths "ww" of the façade elements are randomly generated according to their domain, as defined in Table 1. As spacings exist between the elements on one floor, the value of "sp" needs to be generated randomly according to its domain, which is defined in Table 1. Therefore, the number of elements can be calculated as $\lfloor W/(ww + sp) \rfloor$.
Step 5. Similar to step 2, the number of nodes in the DLG can be generated randomly considering that each node contains at least one element. A stochastic allocation algorithm can be used to determine the attribute "sw".
Step 6. Finally, the remaining width on one floor can be calculated as $wf - ne \times ww$, where "ne" is the number of elements on the floor. Using Principle 1, the edge attribute of the DLG can be assigned.
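Steps 1 to 4 can be sketched as follows. This is a hedged illustration, not Algorithm 1: the allocation step is replaced by a plain uniform random composition (a split of the total into positive integers), which satisfies the two stated requirements ("sf" greater than 0, and the parts summing to the total) but is not claimed to be the Banker-based procedure. The function names and the numeric domain of "hf" are hypothetical, since Table 1 is not reproduced here.

```python
import math
import random


def random_composition(total, parts, rng):
    """Split `total` into `parts` positive integers that sum to `total`."""
    if not 1 <= parts <= total:
        raise ValueError("need 1 <= parts <= total")
    # Choose parts-1 distinct cut points between 1 and total-1.
    cuts = sorted(rng.sample(range(1, total), parts - 1))
    bounds = [0] + cuts + [total]
    return [b - a for a, b in zip(bounds, bounds[1:])]


def generate_tlg(H, W, hf_domain, rng):
    """Steps 1-3: draw hf, derive the floor count, allocate floors to nodes."""
    hf = rng.uniform(*hf_domain)
    n_floors = math.floor(H / hf)
    if n_floors < 1:
        raise ValueError("facade is lower than a single floor")
    nns = rng.randint(1, n_floors)            # each node holds >= 1 floor
    sf_values = random_composition(n_floors, nns, rng)
    return [{"sf": sf, "wf": W, "hf": hf, "hgap": 0.0, "wb": 0.0}
            for sf in sf_values]


def element_count(W, ww, sp):
    """Step 4: number of elements that fit on one floor."""
    return math.floor(W / (ww + sp))


rng = random.Random(1)
tlg = generate_tlg(H=15.0, W=20.0, hf_domain=(2.5, 4.0), rng=rng)
ne = element_count(20.0, 1.5, 1.0)
remaining_width = 20.0 - ne * 1.5             # Step 6: wf - ne * ww
```

The same `random_composition` helper can be reused in Step 5 to distribute the elements of one floor over the DLG nodes, giving the attribute "sw".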

Model formulation
In the proposed framework, the façade model can be generated using a parameter set. This procedure adopts the bottom-up approach. As shown in Figure 6, we consider the window structure as an example. At the beginning of reconstruction, two window layouts are reconstructed according to the DLGs. Thereafter, according to the attributes of the red nodes in the TLG, a three-layered structure surrounded by a red rectangle is reconstructed, where each layer contains the same window layout as the layer under it. The two-layered structure on the right, surrounded by a green rectangle, is reconstructed in a similar manner. By combining these two structures, we can reconstruct a façade with windows. The position of each element on one floor can be cumulatively acquired using the attributes "wf" and the edges in the DLG. Thus, the coordinates of elements in the i-th floor can be calculated using Equation (1).
Figure 6. Bottom-up reconstruction using the layout graph.
In Equation (1), "$nd_j$" denotes the j-th node in the DLG, "ed" denotes the edges in the DLG, "$nt_i$" refers to the nodes in the TLG, and "nm" is a counter. When j = 0, we have $hsp = H$, $wsp = nt_i(wb)$, and $nm = 0$.
By mapping the generated model to the given data, coherence can be measured. In this context, we transform the problem of determining the optimal model for the given data into a MAP strategy (Equation (2)). Posterior probability can be defined via a joint-prior estimator and the likelihood function. In Equation (2), Θ refers to the parameter set containing $\Theta_{window}$, $\Theta_{balcony}$, and $\Theta_{door}$; Pr(Θ) represents the joint-prior estimates of the reconstructed model; and L(Dt|Θ) represents the likelihood function of the reconstructed model and the given data Dt. The weight $\beta \in (0, 1)$ can be adjusted according to the quality of point cloud data. In most practical scenarios, the weight can be set as $\beta = 0.5$. According to our test results, the weight has an effect only under extreme conditions, i.e. when very few contours are extracted. This is because the acceptance of a model requires a high likelihood and also a reasonable prior estimate. In the following, we take the window as an example to introduce these functions.
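The cumulative positioning of elements on one floor, mentioned before the MAP formulation, can be sketched as below. Since Equation (1) is not reproduced here, this is an assumed reading rather than the paper's formula: spacings are taken as edge-to-edge gaps, accumulation starts at the horizontal offset "wb", and only the x-coordinate of each element's left edge is computed. The function name and the numbers are hypothetical.

```python
def element_x_positions(clusters, d_inter, wb=0.0):
    """Left-edge x-coordinates of all elements on one floor.

    clusters: list of (sw, ww, d_intra) tuples, ordered left to right.
    d_inter:  edge-to-edge gaps between consecutive clusters.
    wb:       horizontal offset at which accumulation starts (assumed role).
    """
    if len(d_inter) != max(len(clusters) - 1, 0):
        raise ValueError("expected one inter-cluster spacing per adjacent pair")
    xs = []
    x = wb
    for i, (sw, ww, d_intra) in enumerate(clusters):
        for k in range(sw):
            xs.append(x)
            x += ww                      # move past the element itself
            if k < sw - 1:
                x += d_intra             # gap to the next element in the cluster
        if i < len(clusters) - 1:
            x += d_inter[i]              # gap to the next cluster
    return xs


# Two clusters (2 and 3 elements of width 1.5) starting 1.0 from the left edge.
xs = element_x_positions([(2, 1.5, 0.5), (3, 1.5, 1.0)], [2.0], wb=1.0)
```

Repeating the resulting row "sf" times, shifted vertically by "hf" per floor, reproduces the layered structure that the bottom-up reconstruction describes.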

Joint-prior estimates
In building façades that are significantly crowded or sparse, the components typically do not conform to the Principle of architectural form or the lighting requirements of the building (Goia, Haase, and Perino 2013), as demonstrated in Figure 5(c). A rational control function is established to assess whether the generated model conforms to the preferences of the inhabitants. Three prior conditions are designed for this function.
Under the first condition, the total width of windows on one floor is constrained. A detailed introduction of this condition can be found in Wang, Fan, and Zhou (2020), and it is also expressed in Equation (3). The smaller the value of Pr 1 (Θ), the more reasonable is the façade layout; this ensures that windows on the façade are not excessively sparse or crowded.
In Equation (3), $nw/(2 \times nw - 1)$ is the ratio of the reference width of the windows to that of the façade. We derived this function by investigating several façades and verified its effectiveness experimentally.
Under the second condition, the ratio of the area of the window to that of the wall is constrained, considering the Window-to-Wall Ratio (WWR). Typically, the optimal WWR has minor variations in different regions or directions with respect to the sun. For example, Lee et al. (2013) reported that the optimal WWR in Asia was 0.25 in the south and 0.5 in the north. Goia (2016) investigated the optimal WWR for office buildings in Europe and reported it to be 0.3-0.45. Shaeri et al. (2019) analyzed the relationship between the WWR and the energy saving in urban buildings, reporting an optimal WWR value of 0.3-0.5 for Bushehr. In our experiments, we believe that the closer the WWR is to 0.3, the better is the façade layout.
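As a small worked example of the second condition, the sketch below computes the WWR of a generated layout and its distance from the 0.3 target. The function names are hypothetical, the wall area is taken as the gross façade area, and the absolute deviation is only an assumed stand-in for the paper's prior function, which is not reproduced here.

```python
def window_to_wall_ratio(windows, facade_width, facade_height):
    """WWR = total window area / gross facade area (assumed definition).

    windows: iterable of (width, height) pairs, one per window.
    """
    window_area = sum(w * h for w, h in windows)
    return window_area / (facade_width * facade_height)


def wwr_deviation(wwr, target=0.3):
    """Distance from the target WWR; smaller means a more plausible layout."""
    return abs(wwr - target)


# Ten windows of 1.5 m x 2.0 m on a 20 m x 5 m facade: 30 m^2 of 100 m^2.
windows = [(1.5, 2.0)] * 10
wwr = window_to_wall_ratio(windows, 20.0, 5.0)
deviation = wwr_deviation(wwr)
```

A layout with twice as many windows of the same size would reach a WWR of 0.6 and be penalized accordingly.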
The third condition pertains to the inherent topological relationship among façade elements. For example, balconies are always adjacent to windows. Therefore, we combine different components to determine whether the generated model is feasible. For each floor, the relationship is formulated in terms of $(x_{window}, y_{window})$ and $(x_{balcony}, y_{balcony})$, which represent the coordinates of the lower-left corner of a window and a balcony, respectively, and $h_{balcony}$, which denotes the height of a balcony.
The abovementioned three conditions are simultaneously implemented, and the following function is used to normalize the prior estimate:

Likelihood function
The likelihood indicates the degree of coherence between the generated model and the given data. It can be obtained by mapping the reconstruction model to an actual façade point cloud. The higher the degree of coherence, the better the parameter configuration. In the reconstructed models, windows are represented by a rectangular Bounding Box (BBox). After mapping these BBoxes to the refined façade structure, we can measure the degree of coherence by counting the number of laser points within the BBox area, denoted as $L_1$. However, using this function alone may favor overly large components, because the larger the area, the greater the number of points. Therefore, a point density calculation, denoted as $L_2$, is also required to ensure the reliability of the likelihood function.
(a) Point counting function
To reduce the computational complexity, we first project the façade plane onto a 2D plane, where the x-axis represents the width of the façade and the y-axis represents its height. Each generated rectangle in the model is aligned with the coordinate axes. The distance from each laser point to the center of the rectangle is calculated, where the distances along the x- and y-directions are expressed as $x_d$ and $y_d$, respectively. When $x_d < \mathit{ww}/2$ and $y_d < \mathit{hw}/2$ are satisfied simultaneously, the measured laser point is assigned to the window class. The total number of laser points corresponding to the windows is expressed by point_window($\Theta_{window}$). We denote the total number of laser points in the façade as length(Dt); then, the function can be built as follows:

(b) Point density function
Point density is defined by the number of laser points contained within a single rectangle. Using the previous function, $L_1$, point_window($\Theta_{window}$) was obtained. Thus, the point density can be calculated using Equation (9):
In this manner, the total area of all the rectangles in the generated model can be calculated easily.
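The point counting and point density computations described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the exact likelihood terms: the function names and the `(cx, cy, ww, hw)` rectangle tuple are hypothetical, the counting term is assumed to be the share of façade points inside the rectangles, and the density term is assumed to be the number of enclosed points per unit of total rectangle area.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]               # (x, y) on the projected façade plane
Rect = Tuple[float, float, float, float]  # (cx, cy, ww, hw): center, width, height


def count_points_in_rects(points: Sequence[Point], rects: Sequence[Rect]) -> int:
    """Count points that satisfy x_d < ww/2 and y_d < hw/2 for at least one rectangle."""
    count = 0
    for x, y in points:
        for cx, cy, ww, hw in rects:
            if abs(x - cx) < ww / 2 and abs(y - cy) < hw / 2:
                count += 1
                break  # count each point once even if rectangles overlap
    return count


def point_count_ratio(points: Sequence[Point], rects: Sequence[Rect]) -> float:
    """Assumed counting term: enclosed points divided by all façade points."""
    if not points:
        return 0.0
    return count_points_in_rects(points, rects) / len(points)


def point_density(points: Sequence[Point], rects: Sequence[Rect]) -> float:
    """Assumed density term: enclosed points per unit of total rectangle area."""
    total_area = sum(ww * hw for _, _, ww, hw in rects)
    if total_area <= 0:
        return 0.0
    return count_points_in_rects(points, rects) / total_area


cloud = [(1.0, 1.0), (1.4, 1.0), (1.5, 1.0), (3.0, 3.0)]
window = [(1.0, 1.0, 1.0, 1.0)]
print(point_count_ratio(cloud, window), point_density(cloud, window))
```

Enlarging a rectangle can only raise the counting term, whereas the density term drops once the added area contains few points, which is why the two terms are used together.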
Similar to the procedure for the window, the coherence of the balcony and door can also be calculated. Consequently, the total likelihood function can be obtained as follows:

Global optimization
We aim to calculate the value of Θ* according to the MAP criterion. Searching for the optimal model parameters is a non-convex optimization problem. Simulated annealing can be used to search for the global optimum within a high-dimensional space. Theoretically, given a sufficiently slow cooling schedule, simulated annealing can escape from local optima in a probabilistic manner and converge to the global optimum for any initial parameters (Lafarge et al. 2008).
The steps of the simulated annealing procedure used for this purpose are listed in Algorithm 2. In this task, we set the cooling factor $\alpha = 0.95$ and the initial temperature $T_0 = 100$. When the temperature decreases to $T_{min} = 10^{-6}$ or the current solution does not change after $iter_{max} = 50$ iterations, we assume that the algorithm has converged. These values were the best-performing configuration for simulated annealing in our experiments. When generating new hypothetical parameters, Gibbs sampling can be used to ensure that each generated solution is meaningful for a particular façade.
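The annealing loop with the settings stated above can be sketched as follows. This is a generic illustration rather than the exact procedure: the Metropolis acceptance rule, the `energy` callable (assumed to be a quantity to minimize, such as a negative log-posterior), and the `propose` callable (standing in for the Gibbs sampling step) are assumptions made for this example.

```python
import math
import random
from typing import Callable, TypeVar

S = TypeVar("S")


def simulated_annealing(initial: S,
                        energy: Callable[[S], float],
                        propose: Callable[[S, random.Random], S],
                        t0: float = 100.0,
                        alpha: float = 0.95,
                        t_min: float = 1e-6,
                        iter_max: int = 50,
                        seed: int = 0) -> S:
    """Minimize `energy` with geometric cooling, T_i = alpha**i * T_0."""
    rng = random.Random(seed)
    current, current_e = initial, energy(initial)
    best, best_e = current, current_e
    temperature = t0
    unchanged = 0  # consecutive iterations without a change of the current solution
    while temperature > t_min and unchanged < iter_max:
        candidate = propose(current, rng)
        candidate_e = energy(candidate)
        delta = candidate_e - current_e
        # Assumed Metropolis rule: always accept improvements, sometimes accept
        # worse candidates with probability exp(-delta / T).
        accept = delta <= 0 or rng.random() < math.exp(-delta / temperature)
        if accept and candidate != current:
            current, current_e = candidate, candidate_e
            unchanged = 0
            if current_e < best_e:
                best, best_e = current, current_e
        else:
            unchanged += 1
        temperature *= alpha
    return best


# Toy usage: a one-dimensional integer search with a random-walk proposal.
result = simulated_annealing(
    0,
    energy=lambda x: (x - 3) ** 2,
    propose=lambda x, rng: x + rng.choice([-1, 1]),
)
print(result)
```

With these settings the temperature falls below $10^{-6}$ after roughly 360 cooling steps, so the loop always terminates even when the stall criterion is never met.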

Gibbs sampling
When generating a new solution from the parameter space, two problems need to be addressed. First, owing to the existence of several non-independent parameters in the layout model, direct sampling from the parameter space is difficult. Second, random sampling from a high-dimensional space can yield invalid solutions. To address these issues, we adopt Gibbs sampling to generate candidate solutions, influenced by the geometric and topological relationships among the elements. To increase the rate of convergence, we also control the Gibbs sampling process by designing the proposition kernel $Q_m$.

Algorithm 2: Optimization
Step 1. Input: initial parameter Θ, initial temperature $T_0$, iteration = 0.
Step 2. Cooling schedule: $T_i \leftarrow \alpha^i \times T_0$. While $T_i > T_{min}$ and iteration < $iter_{max}$, do: generate a new parameter $\Theta_{new}$ from the parameter space. // Gibbs sampling
Step 3. $i \leftarrow i + 1$ and repeat Step 2.
Step 4. End and output Θ.

Proposition kernels
At different stages of the algorithm, we select different kernels for sampling with different probabilities. The proposed kernels reduce the dimensionality of the search at each stage, limiting the scope of finding the optimal solution. This narrowing allows the algorithm to produce as many meaningful parameter combinations as possible. Three kernels ($Q_1$, $Q_2$, and $Q_3$) are proposed in this work, and the corresponding changes are shown in Figure 7, where we consider the window as an example.
$Q_1$: TLG kernel. This kernel plays an important role during the initial stage of the algorithm. Under this kernel, the DLG is fixed to only one node, which prevents large changes in the horizontal direction. Thus, we can search for the optimal configuration of the TLG. The change in state from (a) to (b) in Figure 7 expresses the application of $Q_1$.
$Q_2$: DLG kernel. In this kernel, the number of windows and the width of the window are selected as the primary parameters when searching for the optimal solution. This can cause significant changes in the prior estimate and affect the area of windows; this is illustrated by the change in state from (b) to (c) in Figure 7.
$Q_3$: Optimization kernel. In this kernel, we only perturb the formation of nodes and edges in the TLG and DLG. Perturbing the formation is necessary for finding a true global optimum. In the iterations close to a better solution, some parameters may not help optimize the model. Several changes still occur in the layout model under the same conditions of window size and number of windows. The changes in inter-cluster and intra-cluster spacings of the DLG can generate different models, similar to the change in state from (c) to (d) in Figure 7.
In the simulated annealing process, the kernel is selected dynamically based on two aspects. The first is the posterior probability of the current solution. In some cases, a good solution is already found during the initial stage, and additional iterations with the current kernel are unnecessary. Therefore, a new kernel is assigned a greater probability to help the optimization enter the next stage. The second is the number of iterations: after a certain number of iterations, finding better solutions under some conditions can be difficult. In this case, a suitable kernel should be selected to escape from potential local optima. The selection probability of kernels is detailed in Table 2.
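The dynamic kernel selection can be illustrated with a weighted random choice whose weights depend on the current stage. The sketch below is only a schematic: the stage names, the `stalled` flag, and all numerical weights are hypothetical placeholders and are not the selection probabilities of Table 2.

```python
import random
from typing import Dict

# Hypothetical placeholder weights for illustration only (NOT the values of Table 2).
STAGE_WEIGHTS: Dict[str, Dict[str, float]] = {
    "initial": {"Q1": 0.6, "Q2": 0.3, "Q3": 0.1},
    "middle": {"Q1": 0.2, "Q2": 0.6, "Q3": 0.2},
    "final": {"Q1": 0.1, "Q2": 0.2, "Q3": 0.7},
}


def kernel_weights(stage: str, stalled: bool = False) -> Dict[str, float]:
    """Return the weights for a stage; fall back to uniform weights when stalled.

    The uniform fallback is an assumed way to help the search leave a local optimum.
    """
    weights = dict(STAGE_WEIGHTS[stage])
    if stalled:
        weights = {name: 1.0 for name in weights}
    return weights


def select_kernel(weights: Dict[str, float], rng: random.Random) -> str:
    """Draw one kernel name with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


rng = random.Random(0)
print(select_kernel(kernel_weights("initial"), rng))
print(select_kernel(kernel_weights("final", stalled=True), rng))
```

In practice, the stage would be advanced when the posterior of the current solution is already high, and the stalled flag would be raised after many iterations without improvement.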

Experiment and evaluation
In the experiments, we tested the ability of the façade layout graph model to reconstruct semantic façades. The Intersection over Union (IoU) was used as a criterion for the quantitative evaluation of reconstruction accuracy. Three datasets were used for the experiments.

Experimental data
Dataset-A was obtained through a single mobile laser scanner from a vehicle-based system. Each laser point records, in detail, the x-, y-, and z-axis coordinates; scanning time; and reflectivity. Dataset-B is part of the public dataset "Paris-Lille-3D" (Roynard, Deschaud, and Goulette 2018), where each point is characterized by (x, y, z, x_origin, y_origin, z_origin, GPS_time, reflectance, label, and class), and the point density is approximately 1000-2000 points per square meter. This dataset was acquired using a multi-beam LiDAR sensor from a vehicle-based system. Thus, the point density in most areas can be considered similar; however, an anisotropic pattern also exists. It should be noted that the Paris-Lille-3D dataset was collected on a street in Paris. The buildings in Paris feature many different architectural layouts; therefore, this dataset serves as a practical test for validating the proposed method. Dataset-C is a subset of the public dataset SEMANTIC3D.NET (Hackel et al. 2017), acquired via a static terrestrial laser scanner. Point clouds acquired via static scanners exhibit differences in point density at varying distances. Therefore, this dataset can be used to verify the robustness of the proposed method for different qualities of façade data. An overall description of these datasets is provided in Table 3.
We manually segmented façade and non-façade points. Notably, sloped roofs were included in the non-façade class; this is because a sloped roof can increase the height of the façade, consequently increasing the actual façade area and affecting the calculation of the rational control function.

Reconstruction results
In the experiments, we use the procedure shown in Figure 6 to reconstruct regular, complex, and low-quality façades. We selected Bld-1 to Bld-5 from Dataset-B to show the reconstruction of regular façades. Bld-6, also from Dataset-B, is used to show the reconstruction of a complex façade. Bld-7 and Bld-8 are low-quality point clouds from Dataset-C and are used to verify low-quality façade reconstruction. Moreover, the complete results for the three datasets can be found in Figure 12.

Regular façades
Regular façades have similar layouts, which reflect the shape of elements and the spacing between them. Thus, we can use the proposed optimization method with constraints to accurately infer the optimal façade model. Figure 8 shows the reconstruction results of regular façades; structural refining results are shown in the second row of this figure. Different components were separated using the RANSAC method; however, for the sake of presentation, they are combined in this figure. The final results obtained using the proposed method to search for optimal models are depicted in the third row of the figure. Similar to the second row, results of different components are combined. It is evident that some balconies are overlapped by windows, such as in the second façade. This is because some noise from the façade data is retained even after façade refinement, which affects the determination of the optimal model. In the optimization process, to obtain a better prior estimate, the size of the window is increased. From the results in the fourth row, we can verify that these overlaps are eliminated by using bottom-up generation.

Figure 8. Results of complete façade reconstruction.

Complex façades
The proposed method is also capable of reconstructing complex data. In a complex façade, the layout of each floor is usually different, and the geometric attributes of the façade elements are also different. When refining such a façade structure, it can be difficult to identify window contours, as shown in Figure 9(b). Thus, to reconstruct this type of façade, a higher weight of the prior estimation is required for the optimization.
To avoid large windows, we limit the upper boundary of the domain of window width (i.e. "ww") during the derivation of the façade, which helps obtain a more accurate result compared to unrestricted inference.

Low-quality façades
In the point cloud data acquired via a static laser scanner, low-quality façades often exist because of the small angle of incidence and the long distance between the façade and the scanner (Dong et al. 2020). Refining the structure is not necessary for such façades because only a few points are recorded. The probability of a window being recorded is much lower than that for a façade; therefore, holes in the façade can be considered as windows. Accordingly, we use a modified form of the likelihood function (Equation (10)), i.e. $L_{inverse}(Dt \mid \Theta) = 1 - L(Dt \mid \Theta)$, for the optimization. From the results in Figure 10, we can verify that holes are found and the semantic façade models are reconstructed correctly. It should be noted that the doors in Bld-7 are randomly generated according to the rule that a door is located at the bottom of the façade. Moreover, in Bld-8, the deduced façade layout divides the large holes on the left side of this façade into two windows with the same layout as the other parts. This partition reduces the complexity of the façade layout graph by reducing the number of nodes in the DLG.

Quality assessment
To perform a quantitative evaluation of the proposed method, the ground truth of the test data is required as a reference. In this experiment, we manually extracted elements from the tested façades and classified them via visual interpretation. We adopted the IoU as a criterion to evaluate the precision of the reconstructed façades. The IoU is defined as the ratio of the number of points present in both the ground truth and the reconstruction to the total number of points present in either of the two (Rezatofighi et al. 2019).
Figure 9. Results of reconstructing the complex façade of Bld-6.

Doors are often reconstructed using a stochastic value in the reconstruction process due to the missing points on the ground floor; this leads to a low IoU score. In our experiment, the position and size of the detected windows are essential for obtaining an accurate reconstructed model. Thus, we also introduce an Accuracy score, expressed as

$$\mathrm{Accuracy} = \frac{|\text{Ground-truth} \cap \text{Detected}|}{|\text{Ground-truth}|} \quad (12)$$
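Both scores can be computed directly on sets of point indices. The following minimal sketch assumes that the ground truth and the detection are each given as a set of identifiers of the laser points assigned to an element class; the function names and the toy data are illustrative only.

```python
from typing import Hashable, Set


def iou(ground_truth: Set[Hashable], detected: Set[Hashable]) -> float:
    """Intersection over Union of two point sets (0.0 if both sets are empty)."""
    union = ground_truth | detected
    if not union:
        return 0.0
    return len(ground_truth & detected) / len(union)


def accuracy(ground_truth: Set[Hashable], detected: Set[Hashable]) -> float:
    """Share of ground-truth points that are covered by the detection."""
    if not ground_truth:
        return 0.0
    return len(ground_truth & detected) / len(ground_truth)


# Toy case: the detected box covers the whole ground truth but is twice as large.
gt_points = {0, 1, 2, 3}
detected_points = {0, 1, 2, 3, 4, 5, 6, 7}
print(iou(gt_points, detected_points))       # 0.5
print(accuracy(gt_points, detected_points))  # 1.0
```

The toy case shows how a detection that is larger than the reference box can reach a high Accuracy score while its IoU remains moderate, because the IoU also penalizes the extra points.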
For Bld-1 to Bld-6, the Accuracy score was close to 1. This is because we derive their layouts from the refined façade structure, which contains points that belong to the façade elements; the higher the number of points included, the better the result. In contrast, the Accuracy score of Bld-7 and Bld-8 is close to 0, because we infer their layouts from as few points as possible. Table 4 indicates that the proposed method achieves good performance in terms of Accuracy; however, its IoU is low. This is because we did not consider the Principle of architectural form when creating the reference ground truth, which led to a smaller BBox than that derived. Additionally, in point cloud data, it is difficult to accurately determine edges. Therefore, the derived façade models represent the maximum fit to the given data rather than the most accurate result. From the perspective of Accuracy, the derived models fit the façade structure to the maximum extent possible, proving the effectiveness of the method. It is worth mentioning that better reconstruction results were obtained for the low-quality façades than for other façades. This is mainly because, in these façades, windows and doors are clearly represented as holes, making it easy to obtain accurate results using the designed $L_{inverse}(Dt \mid \Theta)$ function.

Figure 10. Results of low-quality façade reconstruction.

Figure 11. Layout graph of the window, balcony, and door for Bld-6. In the window layout, we refine the DLG. Nodes with identical attributes can be constructed using attribute inheritance, thereby reducing the number of parameters. In the balcony layout, we apply the same attribute inheritance method to the TLG to refine the structure of the layout graph.

Figure 12. Overall reconstruction results of the chosen datasets, where cyan denotes windows, pink denotes balconies, and blue denotes doors. Among these results, façades are generated using the CityGML LoD3 standard. Roofs in these buildings were generated using the Random3DCity engine (Biljecki, Ledoux, and Stoter 2016).

Limitations
Although the proposed layout graph model can describe different types of façade structures, as evidenced by the experimental results, it is not sufficiently refined, which is reflected by its descriptions of complex façade structures. We assign six attributes to each node in the TLG, including the DLG. The DLG contains nodes with three attributes and edges with two attributes. When the façade structure changes, it is necessary to increase the number of nodes in the TLG, which adds at least 11 attributes; for example, 96 parameters are used to represent the layout of Bld-6. However, this is still simpler than the grammar-based approach, which uses 153 rules to generate façade structures. Moreover, based on our observations, the models derived via the proposed approach can be stored and delivered in a more compact form than those from previous approaches. The process of simplifying the model can be guided by a combination of the rules used in grammar learning and Occam's razor, and the parameter set used in the proposed approach can be refined using inheritance rules. Through this method, we obtain a refined result with a reduction of 17 parameters (shown in Figure 11). However, the refining process could not be automated. In future work, we plan to explore methods to make the proposed model more refined.

Conclusions
From the experimental results, it is evident that the proposed façade layout graph model is suitable for reconstructing different types of façades. Among its advantages, the model features a clear hierarchy and enables the intuitive interpretation of façade structures. In particular, we use the same type of graph for both the TLG and the DLG. The derivation of these two levels of the graph model is based on the Principle of architectural form and Gestalt principles. Therefore, the façade layout model can be parsed and reconstructed in a simple manner, similar to the top-down and bottom-up strategies. For the derivation of the optimal model, we use structural knowledge of the building to design constraint conditions, thereby reducing the domain of parameters. Simultaneously, under these constraints, we use Gibbs sampling to obtain new candidate models.
The geometric and topological relationships among high-level façade structures are reflected by the clustering of floors in the TLG, while those among low-level façade elements are represented by the clustering of elements in the DLG. Based on the description of these relationships, we impart additional information to the semantic façade model, as compared to simply reconstructing it from semantic segmentation; this is expected to enable computers to understand façades in a more intelligent manner. Moreover, when the proposed method is used to describe façade structures, façades can be easily edited without affecting other floors, for example by adjusting the location and size of façade elements or by deleting or adding façade elements; this will be explored in our future work.