Data-driven estimation of building interior plans

ABSTRACT This work investigates constructing plans of building interiors using learned building measurements. In particular, we address the problem of accurately estimating dimensions of rooms when measurements of the interior space have not been captured. Our approach focuses on learning the geometry, orientation and occurrence of rooms from a corpus of real-world building plan data to form a predictive model. The trained predictive model may then be queried to generate estimates of room dimensions and orientations. These estimates are then integrated with the overall building footprint and iteratively improved using a two-stage optimisation process to form complete interior plans. The approach is presented as a semi-automatic method for constructing plans which can cope with a limited set of known information and constructs likely representations of building plans through modelling of soft and hard constraints. We evaluate the method in the context of estimating residential house plans and demonstrate that predictions can effectively be used for constructing plans given limited prior knowledge about the types of rooms and their topology.


Introduction
Measurement and modelling of interior spaces has received attention in a variety of disciplines. Within engineering contexts, approaches to tackle this issue often use laser scanners for making high-resolution interior models. However, despite considerable research into utilising these devices for modelling indoor spaces, their usage is still time-consuming, particularly in more confined areas. Mobile and human-mounted systems (such as Bosse et al. 2012) reduce the difficulty and time-consuming nature of this survey (Thomson et al. 2013). However, these systems are not available to non-professional users or those requiring a low-cost measurement solution. Low-cost systems using active-sensing technology have demonstrated reasonable accuracy, but these approaches still require the addition of hardware such as Kinect (Henry et al. 2012), projector (Kim et al. 2012) or laser range finders (Nguyen et al. 2013). Existing methods for constructing plans of interiors either require the prior establishment of a grammar that depends on building style, and focus on office rather than residential settings, or need an exhaustive capture of all wall surface dimensions. In this paper, we present an approach and experimental results of a method to construct building plans without needing to make explicit interior dimension measurements. We describe how measured data relating to the building exterior, together with just the room types, topology and the orientation of one interior room, can be used to estimate a 2D geometric plan of the internal layout.

Related work
Our approach is inspired by work in several disciplines including architectural design, mobile capture and grammar-based modelling. For example, modelling building interiors for aiding building design purposes has been a focus of research for many years. Arvin and House (2002) and Harada et al. (1995) both developed tools to aid in interior layout tasks. More recently, Liu et al. (2013) developed a system to generate cost-effective designs for buildings and their interiors composed of precast concrete slabs. Merrell et al. (2010) developed an approach to generate floor plans from a set of high-level specifications such as the building footprint, number of a particular room type (e.g. bedroom or bathroom), room types and adjacencies. However, all these approaches focus on automatic design, rather than on constructing as-built interior measurements as considered here.
For low-cost mobile measurement capture, Sankar and Seitz (2012) and Pintore and Gobbetti (2014) propose systems for floor plan creation using smart-phone sensors. However, their approaches aim at interactive, non-metric, floor plan creation at an arbitrary scale. Rosser et al. (2015), Pintore et al. (2016) and the mobile app MagicPlan (Sensopia 2014) target estimation of plan dimensions which have a high geometric accuracy; however, none of these approaches tackles estimation of unknown dimensions, thereby reducing measurement data capture, as is proposed in our work.
Other methods investigating exploitation of phone sensor data have proposed estimating plans by tracing movements using Wi-Fi signals, for example Shin et al. (2012) and Gao et al. (2014); however, these are office-mapping approaches that require a pre-existing and dense set of Wi-Fi base stations which are not necessarily available in a private residential building.
Recently, Loch-Dehbi et al. (2016) proposed an approach for automatically deriving floor plans using both logical and stochastic reasoning to determine the most likely internal geometry. The work describes an application of constraint logic programming and Bayesian networks trained on a database of floorplans to hypothesise wall positions. However, the authors only demonstrate their method for an office environment, rather than for residential plans as we describe here. Furthermore, Loch-Dehbi et al. (2016) do not describe the geometric accuracy of their method.
Use of formal grammars constitutes a procedural method for defining the composition of objects and offers a promising approach for modelling interior spaces. Peter et al. (2013) reconstruct a coarse building model from an evacuation plan and refine it using Inertial Measurement Unit data and a grammar to constrain hypothesised room representations. Becker et al. (2013) detail use of an indoor grammar that is automatically derived from observation data and thus is sufficiently flexible to be used for interior mapping. Although the authors note the advantage of using the grammar to hypothesise unobserved indoor spaces, the results are focused on mapping of office environments. Furthermore, a full accuracy assessment is not made, although the authors note an average room width error of around 2 m.
Grammar approaches have also been used to estimate interior plans based on the building's footprint and observable external features (e.g. windows, doorways and chimneys) (Yue et al. 2012). However, such an approach requires pre-existing specification of the grammar, thereby limiting it to a particular style of house. Grammars have also been demonstrated as the basis for creating as-built 3D building models, enabling dimensions to be initially guessed with the addition of measurements added by users at a later stage (Hohmann et al. 2010). However, appropriate estimates of dimensions are not inferred from a corpus of real plans and then automatically refined, as proposed in our work.
Although not specifically tackling modelling of indoor spaces, recent research on learning grammar rules using probabilistic methods is interesting to note. Specifically, predictive modelling approaches are used to learn grammar rules for reconstructing building facades (Martinovic and Van Gool 2013). Similarly, Fan and Wonka (2016) describe a parametric description of a building which uses a graph-based model to encode the joint probability of the building's attributes. An optimisation algorithm is then used to convert these attributes to a 3D geometric model.

System overview
The aim of the system is to support the construction of building plans when no interior measurements have been taken. Our approach handles the uncertainty arising from missing measurements probabilistically and refines predictions using an optimisation model. A predictive model encodes the occurrence of particular room types and their associated dimensions learned from a corpus of real-world residential building plans. Then, given knowledge from the user about the building exterior and rooms, this model is used to estimate the remaining unknown variables. The knowledge provided by the user includes total floor area and building footprint aspect ratio, as might be sourced from a topographic mapping database, and a list of categorical room types. Using this information, the system infers likely room measurements. These predicted measurements are then refined in an optimisation process to generate the most plausible building plan. Figure 1 illustrates the sequence of user interactions for specification of this input data. Initially, the user provides information regarding the room types and a shape category of rectangular or L-shape (step 1a). These room shapes impose a Manhattan world-like constraint on the house plan; however, this is a common form found in building interiors (Steadman 2007). In addition, the user provides an outline of the building in question (step 1b), which provides a basis for total area and footprint aspect values. Using this prior information, the system estimates the room dimensions and orientations (step 2). The user places these rooms at their approximate position within the building outline (step 3) and specifies the topology of the space as defined by walls and doorways (step 4). A two-stage optimisation process then generates the building plan estimate (steps 5 and 6).
Steps 5 and 6 simplify the optimisation undertaken by the system by separating the two styles of adjustments required to construct an accurate plan.
The remainder of this article is structured as follows: Section 4 describes the probabilistic model and its data, Section 5 introduces the optimisation approach, Section 6 reports on experimental results, Section 7 discusses the implications and assumptions and Section 8 presents the conclusions of the work.

Prediction model
To form the basis of the predictive model used in the system, an appropriate data-driven technique is required. Specifically, a method for encoding the variables that comprise building plans is needed to form a knowledge base of likely room dimensions and orientations. Bayesian networks provide an effective means for achieving this task. These graph-based models can be used to model distributions of plan geometry such as a building's total floor area, shape, existence of different room types and dimensions. We build upon techniques presented by Merrell et al. (2010), who investigated learning floor plans for architectural prototyping and visualisation. While Merrell et al. (2010) focused on architectural design of suitable plan topologies and layouts, the work presented in this paper focuses on supporting as-built survey of buildings and concentrates on room dimension and orientation estimation. The use of Bayesian networks was based on the following rationale:
(1) Bayesian networks allow manual specification of the probabilistic network structure. Although automatic learning of Bayesian network structures is possible, an understanding of the problem domain allows assertion of suitable conditions based on prior knowledge and does not rely on dependencies detected in a relatively small training sample.
(2) Once trained, Bayesian networks support estimation of variables when no existing information is available. However, if information is available, it may be included as evidence and may be taken into account when inferring the values of other variables, depending on the network structure. This is ideal for a building surveying problem where measurements may be added during the course of the survey process, enabling generation of refined predictions of unknown measurements elsewhere in the building.
(3) Bayesian networks show the degree of belief associated with a prediction and so provide an indication of confidence in an estimate. This information regarding confidence may be fed back to the user to guide the survey process or automatically prioritise the addition of measurements.
Implementation of Bayesian networks in this work was achieved using the Bayes Net Toolbox for Matlab (Murphy 2001).

Plan data
Foxtons, a London estate agent, was used as a source of real-world building plan data for the purposes of this work (Foxtons 2013). Although databases of house plan designs do exist, using plans based on actual buildings' measurements is preferable and more representative of real layouts than designed plans produced by architects. The precise accuracy of each plan is not known, but each is required to conform to a code of practice asserting that the accuracy is fit for purpose (RICS 2007). A corpus of 99 plans of residential houses in the South London area, chosen to represent a variety of suburban property types, was compiled. An automated scraper for downloading the data was developed, with further manual processing of each plan undertaken to check and add additional attributes. This time-consuming processing restricted the number of plans that could be sourced. The data captured included plan geometry and room attributes comprising the type, area (in m²) and the maximum length and maximum width found for each room (in m). Therefore, rooms with more complex shapes such as those with a protrusion (e.g. a bay window) include this additional length in the room measurements. To create a large and representative training dataset, the modelling focus was restricted to considering plans of the first floor of houses and concentrated on modelling of bedroom, bathroom and landing spaces. Where possible, houses were classified as terraced (55%), end-terrace (5%), semi-detached (25%) and detached (5%), with the remainder (9%) of unknown type.
For each plan, details of the overall building footprint, individual room types, and room area, shape and orientation were recorded. Specifically, to represent the overall building footprint, the total floor area (in m²) and aspect ratio were encoded. Then for each plan, the presence of particular room types was recorded. In addition, for each room type, the shape of the room, as represented by its aspect ratio and room area (in m²), was encoded. For all rooms, apart from the landing space, the room area value was taken as rectangular area as defined by the length and width dimensions. This assumption of rectangular spaces was deemed suitable for habitable rooms as everyday buildings conform to this specification (Steadman 2007). However, for the non-habitable circulation space of the landing, frequent deviations from a rectangular shape were observed. Therefore, the polygon area of these spaces was measured and encoded. In addition, for each room in a plan, the room type, area, aspect ratio and orientation with respect to the building footprint (i.e. orthogonal or parallel to the footprint's longest edge) were recorded.

Network design
Specification of a probabilistic network structure is required for representing conditional dependencies between variables in the corpus. These dependency links enable sampling and inference tasks to be carried out on the network nodes. Links between nodes were formed based on logical assumptions and assertions about residential houses, discussed in detail below.
To estimate both room dimensions and room orientation, two separate networks were designed, one for each task. Table 1 summarises the dependencies between variables and the rationale behind their specification, with additional details for each network described below.

Dimension prediction network
Here, the network design for modelling room dimensions, shown in Figure 2, is described. First, root nodes in the network, that is, those variables with no parents, were identified. For the purposes of constructing building plans, variables relating to the exterior building footprint, namely total floor area (FloorArea node) and footprint aspect (FloorAspect node), were identified as the dominant causes directly affecting room shape and the number of rooms within a building. These nodes are conditionally independent and are assigned prior distributions during parameter learning. As the number of usable rooms that can fit within a particular building footprint is defined by the space available, links between each room existence node (e.g. Bedroom1Exists, BathroomExists) and FloorArea were specified. For example, buildings with small footprints may only have first-floor space for two or three bedrooms and a bathroom. Historically, older nineteenth-century terrace houses were commonly built with a 'two-up, two-down' room specification but are often extended with a back-projecting annex to support an additional space on the first floor (Brown and Steadman 1987). Furthermore, the size of these rooms will be informed by both the existence of the room (a room must exist to have dimensions!) and the total area available. Thus, room area was made dependent on both room existence and floor area.

Table 1. Dependencies and rationale for probabilistic network structure.

Dependency | Rationale
Total floor area > existence of room | Larger total floor area will likely contain a larger number of rooms
Total floor area > room area | Larger total floor area will likely mean rooms of larger area
Room type > room area > room aspect | Room type affects sizes and dimensions of rooms
Footprint aspect > room aspect | Overall shape of the building affects shape of rooms
Footprint aspect > room orientation | Overall shape of the building affects how rooms are arranged
Existence of master bedroom > existence of second bedroom > existence of third bedroom | Existence of particular room types affects the existence of other room types
Master bedroom area > second bedroom area > third bedroom area | Size of particular rooms affects the sizes of similar room types
Inter-room dependencies were specified between bedrooms and the associated area variables for each type. The first bedroom forms the master bedroom and thus is likely to have a larger floor area than the other bedrooms. These rules also reflect the 'universal plan' of semi-detached houses which was commonly built across the United Kingdom and specified three bedrooms, two of which were large enough to accommodate a double bed, whereas the third was a considerably smaller 'box room' (Allen 1934 cited by Brown and Steadman 1987, p. 430). Although this assertion is not recent, we reason that dependency relationships between bedroom type areas are unlikely to have changed drastically given the age of the UK housing stock [over three quarters of the stock was built before 1980 (DCLG 2015)].
Room shape, represented as an aspect ratio, was deemed to be dependent on the overall shape of the building footprint. The assertion here is that long and narrow houses tend to encourage rooms lying along their axis to also be longer. As mentioned below in regard to orientation, this is particularly the case for the landing area (which includes the stairs) which, if it lies along the dominant axis of a long building (i.e. as a side staircase), is usually required to connect to all rooms on that floor level. Brown and Steadman (1987) note that the side staircase is particularly common in terraced houses.
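The dependencies discussed above can be written down as a directed acyclic graph of parent lists. The following is a minimal sketch only: FloorArea, FloorAspect, Bedroom1Exists and BathroomExists are node names used in the text, whereas the remaining node names and the exact set of links are illustrative assumptions rather than the structure of Figure 2.

```python
# Parent lists for a small slice of a dimension-prediction network.
# FloorArea, FloorAspect, Bedroom1Exists and BathroomExists are named in the
# text; all other node names are hypothetical placeholders.
PARENTS = {
    "FloorArea": [],
    "FloorAspect": [],
    "Bedroom1Exists": ["FloorArea"],
    "Bedroom2Exists": ["FloorArea", "Bedroom1Exists"],
    "BathroomExists": ["FloorArea"],
    "Bedroom1Area": ["FloorArea", "Bedroom1Exists"],
    "Bedroom2Area": ["FloorArea", "Bedroom2Exists", "Bedroom1Area"],
    "Bedroom1Aspect": ["FloorAspect", "Bedroom1Area"],
}


def topological_order(parents):
    """Return nodes ordered so that every parent precedes its children.

    Raises ValueError if the dependency graph contains a cycle.
    """
    order, state = [], {}  # state: 1 = being visited, 2 = finished

    def visit(node):
        if state.get(node) == 2:
            return
        if state.get(node) == 1:
            raise ValueError(f"cycle through {node}")
        state[node] = 1
        for parent in parents[node]:
            visit(parent)
        state[node] = 2
        order.append(node)

    for node in parents:
        visit(node)
    return order


ORDER = topological_order(PARENTS)
ROOTS = [node for node, ps in PARENTS.items() if not ps]
```

A valid ordering confirms that the manually specified links form an acyclic graph, which is a precondition for treating them as a Bayesian network.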

Room orientation prediction network
A separate network structure was designed for the modelling of room orientations, illustrated in Figure 3. Recall that the aspect ratio of each room in a plan is not relative to the orientation of the building footprint. Therefore, to provide a useful estimate of a building plan within a given footprint, knowledge of the orientation of the room is required. An alternative approach might be to model room aspect ratio, with respect to the global footprint. However, this creates a multi-modal distribution which is more difficult to apply in the plan optimisation step discussed in Section 5.
The same root nodes of FloorArea and FloorAspect as in the preceding network were used as the basis for the orientation prediction network. Floor aspect forms the main indicator of likely orientations of rooms. However, this orientation is also dependent on the room existing, which in turn is dependent on the floor area available. Hence, dependency links for those variables were also introduced. As was the case for room aspect ratio, narrower buildings were thought to be more likely to have rooms following the dominant axis. As mentioned by Brown and Steadman (1987), the greatest flexibility in a semi-detached house is achieved with the staircase (i.e. landing, in this work) against the inner (party) wall, since this maximises the number of windows for habitable rooms.

Parameter learning
The model parameters in both networks were implemented as discrete node types for all variables. For room existence and room orientation variables, the nodes were specified as binary types. The remaining raw data within the corpus were continuous and were binned into discrete sets of values using a linear spaced binning. The number of bins for each variable was set experimentally to balance information loss and the precision of the variable. Note that each variable tended to have a different number of data points (e.g. the WC room appeared infrequently), and therefore visual testing was used to identify an interval appropriate with consideration of the sample size and room type. For example, total floor area was discretised into 15 bins based on a linear spacing ranging from 15 to 100 m² (resulting in a bin width of 6.0714 m²). Figure 4 shows examples of the discretisation used for continuous variables in the corpus.
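The floor-area example can be reproduced with a short sketch. A spacing of 6.0714 m² equals 85/14, which is what 15 evenly spaced values between 15 and 100 produce; treating those values as bin centres and assigning each observation to its nearest centre are assumptions made here for illustration, as are the function names.

```python
def linspace(start, stop, num):
    """Return `num` evenly spaced values from start to stop inclusive."""
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


def nearest_bin(value, centres):
    """Index of the centre closest to `value`; out-of-range values clamp to the ends."""
    return min(range(len(centres)), key=lambda i: abs(centres[i] - value))


# 15 linearly spaced values between 15 and 100 m^2, as in the floor-area example.
FLOOR_AREA_CENTRES = linspace(15.0, 100.0, 15)
BIN_WIDTH = FLOOR_AREA_CENTRES[1] - FLOOR_AREA_CENTRES[0]

print(round(BIN_WIDTH, 4))                    # 6.0714
print(nearest_bin(62.0, FLOOR_AREA_CENTRES))  # 8
```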
For learning parameter values, the expectation-maximisation algorithm was used [as implemented in the toolbox (Murphy 2001)]. Once the Bayesian network is trained, a set of high-level evidence may be used to carry out inference with the model. For example, floor area, floor aspect and room existence nodes (e.g. bathroom exists) may be set as evidence. To compute probabilities of node values, the junction-tree algorithm (Lauritzen and Spiegelhalter 1990) is used [as implemented in the toolbox (Murphy 2001)].
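The effect of setting evidence can be illustrated on a toy network. The sketch below is not the toolbox implementation: it replaces the junction-tree algorithm with brute-force enumeration, which gives the same posterior on a network this small, and every probability value and state label in it is invented for illustration.

```python
from itertools import product

# Hypothetical conditional probability tables for a three-node network:
# FloorArea -> Bedroom3Exists and FloorArea -> Bedroom1Area.
P_FLOOR = {"small": 0.6, "large": 0.4}
P_EXISTS = {"small": {True: 0.2, False: 0.8}, "large": {True: 0.7, False: 0.3}}
P_AREA = {"small": {"low": 0.8, "high": 0.2}, "large": {"low": 0.3, "high": 0.7}}


def joint(floor, exists, area):
    """Joint probability from the chain-rule factorisation of the network."""
    return P_FLOOR[floor] * P_EXISTS[floor][exists] * P_AREA[floor][area]


def posterior_area(evidence):
    """P(Bedroom1Area | evidence), summing the joint over unobserved nodes."""
    scores = {"low": 0.0, "high": 0.0}
    for floor, exists, area in product(P_FLOOR, (True, False), ("low", "high")):
        assignment = {"FloorArea": floor, "Bedroom3Exists": exists}
        if all(assignment[key] == value for key, value in evidence.items()):
            scores[area] += joint(floor, exists, area)
    total = sum(scores.values())
    return {area: score / total for area, score in scores.items()}


PRIOR = posterior_area({})
GIVEN_THIRD_BEDROOM = posterior_area({"Bedroom3Exists": True})
```

With these made-up tables, observing that a third bedroom exists raises the probability of a large master bedroom from 0.40 to 0.55, because the evidence makes a large floor area more likely.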

Optimisation model
Once trained, the predictive model described above may be used to provide initial estimates of building dimensions. However, this information alone is insufficient to generate accurate and consistent building plans: the model generates predictions of room dimension and orientation but does not take account of the topology of the space. Furthermore, there is uncertainty in the predicted room dimensions and orientations, and the absolute positions of rooms are unknown. In particular, the predicted room dimensions, comprising area and aspect, are based on an interval (bin) centre with an associated probability. Similarly, the predicted orientation of each room is a binary value with associated probability. This uncertainty leads to gaps and overlaps between room shapes. Thus, further processing is required to identify the best fit of room measurements with respect to each other and the overall building footprint.
An optimisation model is used to resolve these conflicts (recall from Figure 1 that there is a sequence of prediction, layout optimisation and finally dimension optimisation). A bespoke simulated annealing approach is applied to a cost model defining the quality of a building plan. Preliminary experiments indicated poor results when optimisation of predicted dimension and orientation values was carried out simultaneously. The uncertainty and error in the predictions mean that identification of the most plausible building plan using optimisation is challenging. For example, uncertainty in the orientation estimations creates multiple minima of a similarly low value. This presents a problem to a stochastic process which may arrive at different solutions over repeated runs. To address this, a two-stage approach was adopted: plan layout optimisation and plan dimension optimisation. The aim of the plan layout stage is to identify the position of a room close to its true location and orient the room correctly within the footprint. The dimensions of rooms are then adjusted to refine measurements and form a final building plan.
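The two-stage idea can be sketched with a generic simulated annealing loop run twice, first over position and then over dimensions. This is an illustrative outline only: the cooling schedule, step count, toy cost functions and all names below are assumptions, not the bespoke annealer used in this work.

```python
import math
import random


def simulated_annealing(state, cost, propose, steps=2000,
                        t_start=1.0, t_end=1e-3, seed=0):
    """Minimise `cost`, accepting worse proposals with a temperature-dependent probability."""
    rng = random.Random(seed)
    current, current_cost = state, cost(state)
    best, best_cost = current, current_cost
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temperature = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        candidate = propose(current, rng)
        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
    return best, best_cost


# Toy one-dimensional stand-ins for the two stages: position first, then width.
def layout_cost(plan):
    return (plan["x"] - 2.0) ** 2


def dimension_cost(plan):
    return (plan["width"] - 3.5) ** 2


def move_x(plan, rng):
    return {**plan, "x": plan["x"] + rng.gauss(0.0, 0.05)}


def move_width(plan, rng):
    return {**plan, "width": plan["width"] + rng.gauss(0.0, 0.05)}


START = {"x": 0.0, "width": 3.0}
AFTER_LAYOUT, LAYOUT_COST = simulated_annealing(START, layout_cost, move_x)
FINAL, FINAL_COST = simulated_annealing(AFTER_LAYOUT, dimension_cost, move_width)
```

Separating the stages means each run explores only one kind of adjustment, so the second stage starts from a layout that is already close to its final position.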

Initial plan estimation
Prior to completing the optimisation, initial measurements of room dimensions are estimated using the predictive model. Given the building footprint area, footprint aspect and specification of room types, probabilities of room area, aspect and orientation may be computed, as discussed in Section 4. Taking the user-specified shape types (defined either as rectangular or L-shape), initial room models are generated according to the predicted values. The area and aspect values are then computed as the bin centre of the predicted discrete class. These predicted room shapes may be placed in approximate positions within the building footprint by the user. The topology of the plan is defined through specification of paired room edges.
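For a rectangular room, a predicted area and aspect ratio determine the two side lengths, and the predicted orientation decides which side runs along the footprint's longest edge. The sketch below assumes the aspect ratio is defined as long side divided by short side; the function and argument names are hypothetical.

```python
import math


def room_dimensions(area, aspect, parallel_to_footprint=True):
    """Return (extent_along, extent_across) relative to the footprint's longest edge.

    `aspect` is assumed to be long side / short side, so aspect >= 1.
    """
    long_side = math.sqrt(area * aspect)
    short_side = math.sqrt(area / aspect)
    if parallel_to_footprint:
        return long_side, short_side
    return short_side, long_side


# A 12 m^2 room with aspect 1.5, oriented orthogonally to the footprint's longest edge.
ALONG, ACROSS = room_dimensions(area=12.0, aspect=1.5, parallel_to_footprint=False)
```

Here the room measures roughly 2.83 m along the footprint's longest edge and 4.24 m across it, which preserves both the predicted area and the predicted aspect.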

Proposal moves
Different groups of proposal moves were chosen for each step in the two-stage optimisation pipeline. Figure 5 contains illustrations for each of the proposal moves. During plan layout, the following moves were used:

• Room displacement
A single room is permitted to move independently within the free space defined by the exterior building shell and other room elements. The move comprises a 2D translation whose magnitude and direction are drawn from a normal distribution. Thus, for room object $i$, its position, defined as coordinates $p_i$, may be updated as $p_i \to p_i + \delta p$, where $\delta p \sim (N_x(0,\sigma), N_y(0,\sigma))$ and $x$ and $y$ denote the two coordinate axes. An effective value for $\sigma$ was determined experimentally as 0.05 m.

• Interior plan displacement
All rooms in the plan may be shifted within the building footprint simultaneously. The move comprises a translation of all rooms where the magnitude and direction are drawn from a normal distribution. Thus, for all room objects, $R \to R + \delta p$, where $\delta p \sim (N_x(0,\sigma), N_y(0,\sigma))$. A value for $\sigma$ was determined experimentally as 0.05 m.

• Room orientation flipping
The orientation of a room without a locked orientation may be adjusted such that its aspect ratio is inverted. For rectangular rooms, this is analogous to a 90-degree rotation. For L-shapes, the edges defined about the interior angle must also be adjusted. To ensure that the room remains within the boundary of the building, the adjustment is applied to the edges closest to the centre of the building (defined by its centroid). Thus, the position of the room edge that lies along an exterior wall remains unaffected.

For the plan dimension optimisation step, aimed at refining individual measurements of rooms, the following moves were used:

• Room displacement (as above)

• Wall adjustment
The shape of a single room is adjusted by moving an individual wall. The wall is randomly chosen and translated orthogonal to its direction. The magnitude of the translation is drawn from a normal distribution $\delta \sim N(0,\sigma)$, where $\sigma$ was determined experimentally as 0.05 m.

• Shared wall adjustment
Shared wall adjustment randomly chooses a pair of adjacent (parallel) walls and slides both features by the same distance. The magnitude of the translation is drawn from a normal distribution $\delta \sim N(0,\sigma)$, where $\sigma$ was determined experimentally as 0.05 m.
In order to support fixing of particular dimensions during the survey process, rooms may have their measurements or their orientation locked. Where this is the case, proposal moves that modify the room's measurements or flip its orientation are not generated during optimisation.
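Three of the moves can be sketched for axis-aligned rectangular rooms as follows. The `Room` representation and function names are assumptions made for illustration, and the flip shown here simply swaps the two extents about the lower-left corner rather than keeping the exterior-wall edge fixed as described above.

```python
import random
from dataclasses import dataclass, replace

SIGMA = 0.05  # metres, the experimentally chosen value reported in the text


@dataclass(frozen=True)
class Room:
    """Axis-aligned rectangular room; field names are illustrative only."""
    x: float              # lower-left corner
    y: float
    width: float          # extent along x
    height: float         # extent along y
    locked: bool = False  # measurements and orientation fixed by the user


def displace(room, rng, sigma=SIGMA):
    """Room displacement: translate by independent normal offsets in x and y."""
    return replace(room, x=room.x + rng.gauss(0.0, sigma),
                   y=room.y + rng.gauss(0.0, sigma))


def flip_orientation(room):
    """Orientation flip for a rectangle: swap extents (a 90-degree rotation)."""
    if room.locked:
        return room
    return replace(room, width=room.height, height=room.width)


def adjust_wall(room, rng, sigma=SIGMA):
    """Wall adjustment: move one randomly chosen wall orthogonally to its direction."""
    if room.locked:
        return room
    delta = rng.gauss(0.0, sigma)
    wall = rng.choice(["left", "right", "bottom", "top"])
    if wall == "left":
        return replace(room, x=room.x + delta, width=room.width - delta)
    if wall == "right":
        return replace(room, width=room.width + delta)
    if wall == "bottom":
        return replace(room, y=room.y + delta, height=room.height - delta)
    return replace(room, height=room.height + delta)


rng = random.Random(1)
room = Room(0.0, 0.0, 4.0, 3.0)
moved = displace(room, rng)
flipped = flip_orientation(room)
reshaped = adjust_wall(room, rng)
```

Locked rooms are returned unchanged by the flip and wall moves but can still be displaced, matching the statement that only moves modifying measurements or orientation are suppressed.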
To limit the neighbourhood space and guide the optimisation towards a feasible and likely final building plan, the following hard constraints are employed:

• Topological constraints: Rooms in any configuration must be inside the area defined by the exterior shell of the building. During plan layout optimisation, intersections between rooms are permitted, whereas moves generating these conflicts during dimension optimisation are rejected.

• Shape constraints: User-provided initial data comprise definitions of room shapes which must be maintained, and thus reflected or inverted L-shapes are not permitted. Furthermore, in order to effectively evaluate room aspect cost during plan dimension optimisation, the aspect ratio may not become inverted (i.e. the longest edge becomes the shortest edge) following application of a proposal move.
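The hard constraints amount to an accept/reject test applied to each proposed move. The following sketch simplifies both rooms and footprint to axis-aligned rectangles given as (x_min, y_min, x_max, y_max) tuples, so it does not cover L-shapes; all names are hypothetical.

```python
def inside_footprint(room, footprint):
    """Topological constraint: the room must lie within the footprint rectangle."""
    return (room[0] >= footprint[0] and room[1] >= footprint[1]
            and room[2] <= footprint[2] and room[3] <= footprint[3])


def aspect_inverted(before, after):
    """Shape constraint: True if the longer axis of the room has swapped."""
    def wider_than_tall(r):
        return (r[2] - r[0]) >= (r[3] - r[1])
    return wider_than_tall(before) != wider_than_tall(after)


def overlaps(a, b):
    """True if two rectangles share interior area."""
    return (min(a[2], b[2]) > max(a[0], b[0])
            and min(a[3], b[3]) > max(a[1], b[1]))


def move_allowed(before, after, others, footprint, stage):
    """Apply the hard constraints; `stage` is "layout" or "dimension"."""
    if not inside_footprint(after, footprint):
        return False
    if stage == "dimension":
        if aspect_inverted(before, after):
            return False
        if any(overlaps(after, other) for other in others):
            return False
    return True


FOOTPRINT = (0.0, 0.0, 10.0, 6.0)
BEFORE = (0.0, 0.0, 4.0, 3.0)
AFTER = (0.5, 0.0, 4.5, 3.0)       # shifted so that it overlaps the neighbour
NEIGHBOUR = (4.2, 0.0, 8.0, 3.0)

LAYOUT_OK = move_allowed(BEFORE, AFTER, [NEIGHBOUR], FOOTPRINT, "layout")
DIMENSION_OK = move_allowed(BEFORE, AFTER, [NEIGHBOUR], FOOTPRINT, "dimension")
```

The same overlapping move is accepted during layout optimisation, where it is only penalised by the cost function, and rejected during dimension optimisation.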

Cost function
The cost function $C(x)$ is optimised for plan layout and dimension adjustment. The cost is defined as the weighted sum of the cost terms:

$$C(x) = \alpha_{wt} C_{wt}(x) + \alpha_{aw} C_{aw}(x) + \alpha_{ic} C_{ic}(x) + \alpha_{oc} C_{oc}(x) + \alpha_{pa} C_{pa}(x) + \alpha_{pma} C_{pma}(x) + \alpha_{pas} C_{pas}(x)$$

where $x$ is the building plan and $\alpha_{wt}$, $\alpha_{aw}$, $\alpha_{ic}$, $\alpha_{oc}$, $\alpha_{pa}$, $\alpha_{pma}$ and $\alpha_{pas}$ are weighting coefficients that define the relative importance of each corresponding cost term. In this work, all weights were set to 1 to prevent dramatic changes in the cost-surface topography which could make the optimisation process less stable. The cost terms, wall thickness ($C_{wt}$), wall adjacency ($C_{aw}$), room intersection ($C_{ic}$), room orientation ($C_{oc}$), room area ($C_{pa}$), room area maximum ($C_{pma}$) and room aspect ($C_{pas}$), are defined individually below.
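The weighted sum can be expressed compactly as below. This is a sketch of the aggregation only: the term keys mirror the subscripts used in the text, while the constant-valued placeholder terms exist purely to exercise the function and are not the cost terms of this work.

```python
TERMS = ("wt", "aw", "ic", "oc", "pa", "pma", "pas")
WEIGHTS = {name: 1.0 for name in TERMS}  # all weights set to 1, as in the text


def total_cost(plan, cost_terms, weights=WEIGHTS):
    """Weighted sum of the individual cost terms evaluated on `plan`."""
    return sum(weights[name] * cost_terms[name](plan) for name in TERMS)


# Placeholder terms returning the constants 0..6, purely for demonstration.
DUMMY_TERMS = {name: (lambda plan, value=i: float(value))
               for i, name in enumerate(TERMS)}
TOTAL = total_cost(plan=None, cost_terms=DUMMY_TERMS)  # 0+1+2+3+4+5+6 = 21.0
```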

Wall thickness
Walls need to exhibit a likely thickness within a plan. We adopt the same approach as Rosser et al. (2015) and use an objective function to evaluate wall thickness. Let $w = \{w_1, w_2, \ldots, w_k\}$ denote all walls in the plan, where $k$ is the number of contiguous walls. The wall-thickness cost for adjacent walls is expressed in terms of the distance operator $D(\cdot,\cdot)$, a pair of walls $w_i$ and $w_j$, an indicator $p_{w_i w_j}$ that equals 1 if $w_i$ and $w_j$ are adjacent and 0 otherwise, lower and upper thickness bounds $l$ and $u$, and a thickness objective function $t$. Values for $l$ and $u$ for interior-exterior walls were set as 0.21 and 0.30 m to support single-brick thickness and thicker cavity walls. Values for $l$ and $u$ for interior-interior walls were set at 0.15 and 0.2 m, respectively, based on single brick and typical internal partition thicknesses (Marshall et al. 2013).

Plan topology
The topological structure of a building, as defined by wall adjacencies and connecting doorways, must be maintained to ensure that a likely building plan is generated. To achieve this, the pairwise relationships between objects are utilised. For a pair of adjacent walls $f$ and $g$, the maximum thickness outside of the recommended upper limit $u$ is taken, where $p_{fg} = 1$ if $f$ and $g$ are paired walls and 0 otherwise.

Room intersection cost
During plan layout optimisation, uncertainty in the predicted dimensions results in plans with overlapping rooms. To account for this, intersections between rooms are permitted but are penalised. The intersection cost is defined using the area operator $A(\cdot)$ applied to pairs of rooms $R_i$ and $R_j$, where $n$ is the number of rooms.
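One plausible form of such a penalty is the total overlap area summed over every pair of rooms. The sketch below assumes that form and restricts rooms to axis-aligned rectangles given as (x_min, y_min, x_max, y_max); it is an illustration rather than the exact term used in this work.

```python
from itertools import combinations


def overlap_area(a, b):
    """Area shared by two axis-aligned rectangles (x_min, y_min, x_max, y_max)."""
    dx = min(a[2], b[2]) - max(a[0], b[0])
    dy = min(a[3], b[3]) - max(a[1], b[1])
    return dx * dy if dx > 0 and dy > 0 else 0.0


def intersection_cost(rooms):
    """Sum of overlap areas over every unordered pair of rooms."""
    return sum(overlap_area(a, b) for a, b in combinations(rooms, 2))


ROOMS = [(0, 0, 4, 3), (3.5, 0, 7, 3), (0, 3, 4, 6)]
COST = intersection_cost(ROOMS)  # first two rooms overlap by 0.5 m x 3 m = 1.5 m^2
```

Rooms that merely share an edge, such as the first and third here, contribute nothing, so a correctly abutting layout has zero intersection cost.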

Room orientation prediction
During plan layout optimisation, the orientation arrangement of rooms within the building footprint is evaluated. In these experiments, the existence of particular room types is assumed to be known, enabling the previously learnt Bayesian networks to provide probabilities for each room's orientation. The predicted orientation cost evaluates the plan using the conditional probability of room orientations, where $P(O \mid E)$ is the probability of the orientation $O$ given evidence $E$, $n$ is the number of rooms and $O(\cdot)$ returns the orientation of room $R_i$. The query set roomExistence is the set of room existence nodes and $c$ denotes the observed values of these variables.

Room area prediction
The relatively small training sample size and discretised data used in the predictive model result in a rough surface to optimise. To compensate for this, the area estimation function is approximated as a normal distribution defined by parameters derived from the predictive model. This allows for some flexibility in the room prediction area, which is defined according to an imprecise bin interval. The predicted area cost for the rooms is defined using the area operator $A(\cdot)$ and $P_{roomType}$, the probability associated with the corresponding room-type node for room $i$ in the trained Bayesian network.
That is, the most likely bin for the room type is used as the mean $\mu$, with half of the bin width $\beta_{area}$ as the standard deviation $\sigma$. Figure 6 illustrates an example prediction of master bedroom area taken from the trained Bayesian network together with the normal distribution computed in the cost function.
The use of a Gaussian function for modelling predicted area within the optimisation introduces an additional requirement on the cost function. In order to converge consistently to a low final value, the optimisation should ideally not encounter plateaus in the cost surface. Such regions are challenging because the optimiser cannot tell whether moving between nearby points lowers or raises the function value, and therefore convergence to a satisfactory solution will not occur. As the gradient of the Gaussian function lessens at the tails, a term is introduced to prevent the optimiser drifting too far from likely area predictions.
with μ, β and σ as defined in Equations 9 and 10.

Room aspect prediction
The predicted aspects of rooms within the building plan are maintained during optimisation in a similar way to room areas. A normal distribution based on the bin width of the room type is used. The predicted room aspect cost is defined using $Asp(\cdot)$, which computes the aspect ratio of the room, and $\beta_{aspect}$, the corresponding aspect bin width.

Experimental results
For experimental testing, the FloorArea and FloorAspect nodes were assumed to be variables that may be observed directly (i.e. set as evidence) during plan construction. This assumption was made based on the ease of acquiring accurate topographic mapping of residential buildings, from which these values can be calculated.

Predictive model cross validation
To assess the validity of the predictive model, leave-one-out cross-validation (LOOCV) was completed. Thus, the data were iteratively partitioned $k$ times into training and testing datasets, where $k$ is the number of observations, with each observation used once as a prediction ground truth. The advantage of this is to maximise the use of the relatively limited sample size in model training while minimising the degree of bias in estimated error statistics. Evaluation of the classification accuracy was completed for area, aspect and orientation predictions. In this assessment, the most likely predicted class of a particular dimension is drawn from the network and compared to its true class. For example, the prediction for a dimension, $d_{pred}$, for either area, aspect or orientation, is taken as the most likely class for the given evidence, that is, $d_{pred} = \arg\max P(node \mid planEvidence = c)$, where the planEvidence $c$ is constructed for each test case.

Due to the level of discretisation used in the predictive model, probabilities for $d_{pred}$ cannot always be inferred using the network. This occurs when the training set contains insufficient data points relating to the given evidence, so that no probable value can be calculated. In such cases no estimate is available, and these test cases are therefore excluded from the accuracy assessment. Note that this issue is the result of a relatively small experimental dataset and could be avoided in a practical deployment of the proposed methodology with a larger training sample. In the worst case, 16% of test cases had no estimate during the LOOCV testing.

Table 2 shows the LOOCV prediction accuracy for room area, aspect and orientation values. To provide a comparison to the proposed Bayesian network prediction method, results for a majority learner model are provided. The majority learner prediction is taken as the most frequently occurring interval in the training data.
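The leave-one-out procedure with the majority learner baseline can be sketched as below. This is an illustrative sketch under stated assumptions, not the evaluation code used for Table 2: classes are assumed to be encoded as ordered integer bin indices, the `tolerance` parameter (which counts a prediction within that many bins of the truth as a hit) is a hypothetical scoring option, and a predictor is assumed to signal "no estimate available" by returning `None`, in which case the test case is skipped.

```python
from collections import Counter


def majority_class(labels):
    """Most frequent class index in the training labels."""
    return Counter(labels).most_common(1)[0][0]


def loocv_accuracy(labels, predict=majority_class, tolerance=0):
    """Leave-one-out accuracy for a list of ordinal class indices.

    Each observation is held out once and predicted from the remainder.
    Test cases for which `predict` returns None are excluded from scoring.
    """
    hits = 0
    scored = 0
    for i, truth in enumerate(labels):
        training = labels[:i] + labels[i + 1:]
        predicted = predict(training)
        if predicted is None:
            continue  # no estimate available for this test case
        scored += 1
        if abs(predicted - truth) <= tolerance:
            hits += 1
    return hits / scored if scored else float("nan")


if __name__ == "__main__":
    # Hypothetical area-bin indices for one room type across seven plans.
    area_bins = [2, 2, 2, 3, 1, 2, 4]
    print(loocv_accuracy(area_bins))
    print(loocv_accuracy(area_bins, tolerance=1))
```

A Bayesian network predictor could be substituted for `majority_class` through the `predict` argument, provided it accepts the training labels (or a richer training record) and returns a class index or `None`.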
On average, the proposed Bayesian network approach predicts the correct area class or the adjacent interval for 86% of cases. The proposed method is demonstrably better than the majority learner at correctly predicting the first, second and third bedroom and landing area classes, with improvements of 14%, 7%, 15% and 13%, respectively. However, the majority learner achieves higher accuracy than the Bayesian network method for bathroom and WC area prediction.
When predicting aspect ratio, the accuracy is found to be slightly worse than when predicting area. On average, the correct or adjacent interval is predicted for 80% of cases using the Bayesian network. By way of comparison, the majority learner achieves an average of 73% across all room types.
Room orientation is a binary classification test with an average accuracy of 79% across all room types. For most room types, the Bayesian network attains equal prediction accuracy to that of the majority class. However, for the master bedroom orientation prediction, the Bayesian network shows a distinct advantage (84% vs. 53.25%).
It is worth noting that, for all variables, the lack of available data for the WC room type (only 16 records in the corpus) makes accurate prediction for this room type challenging.

Accuracy assessment
To evaluate the use of the predicted measurements for generation of as-built plans, experiments assessing the accuracy and stability of optimisation were carried out.
To generate building plans, four example cases were taken from the data. These examples were chosen to demonstrate a range of plan compositions with differing numbers of spaces, room types and shapes. Predictive models for each example were trained on all plans remaining after exclusion of the example case in order to maximise the training set size. Room shapes were generated using the estimated area, aspect and orientation dimensions. See Tables 3, 4 and 5 for the predicted area, aspect and orientation values for these examples. In these experiments, the orientation of the landing area is a known value, as access to this area is necessary to determine approximate room positions and topology. Five optimisation runs (each including both stages of optimisation) were completed, with the lowest-cost result chosen as the final plan.
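Selecting the lowest-cost result from several independent runs can be sketched as follows. The `optimise` callable is hypothetical: it stands in for one complete two-stage optimisation run and is assumed to return a `(cost, plan)` pair. The seeding scheme is likewise an assumption made so that the sketch is reproducible.

```python
import random


def best_of_runs(optimise, n_runs=5, seed=0):
    """Run a stochastic optimiser n_runs times and keep the lowest-cost result.

    `optimise` is a hypothetical callable that takes a random.Random
    instance and returns a (cost, plan) pair.
    """
    rng = random.Random(seed)
    results = [optimise(rng) for _ in range(n_runs)]
    return min(results, key=lambda result: result[0])


if __name__ == "__main__":
    # Stand-in optimiser: the "plan" is a single number and the cost is
    # its squared distance from 1.0.
    def toy_optimise(rng):
        plan = rng.uniform(0.0, 2.0)
        return (plan - 1.0) ** 2, plan

    cost, plan = best_of_runs(toy_optimise)
    print(cost, plan)
```

Repeating the run guards against a single poor local minimum, at the price of a proportional increase in computation.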

Plan layout and dimension optimisation
For each test case, the input plan generated from the estimated measurements and the results of both stages of optimisation are shown in Figures 7 to 10. Note that the two smallest rooms in the input data of Plan C, shown in Figure 9, have incorrect orientation predictions (i.e. both rooms are estimated to be parallel with the overall building orientation, whereas the ground truth shows this is not the case). Following layout optimisation, the orientations have been corrected. Referring back to Table 5, the prediction probabilities for these values are relatively low, and the high cost of intersection has encouraged flipping of their orientation during layout optimisation.

Accuracy assessment
Accuracy is assessed against ground-truth measurements using RMSE and area. Table 6 contains the RMSE for each complete building plan and the maximum RMSE for any room in the plan. To provide an indication of the accuracy improvement at each stage of optimisation, RMSE values for both layout and dimension adjustment are shown. The RMSE is computed from corresponding corner points as

\[ \mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{k=1}^{n} d_k^{2}} \]

where $d_k$ is the distance between the $k$-th pair of corresponding room corner points in the estimated plan and the actual data, and $n$ is the total number of pairs across the whole plan.

On average, the two-stage optimisation generates plans at 0.23 m RMSE. Plan D exhibits the lowest positional accuracy. This is likely due to the plan layout which, unlike Plans A and B, is rectangular and thus imposes less constraint on the adjustments. This also results in some inconsistency in the stability of optimisation results, discussed in Section 6.2.3. Plan C shows the highest level of accuracy, with an RMSE of 0.11 m. This plan is the most complex in terms of the number of measurements and room shapes, and the larger amount of information likely helps to constrain and guide the optimisation. Table 7 presents the area accuracy results. These are within 3% of the ground truth for all plans.
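The corner-point RMSE described above can be sketched in a few lines. The function name and the representation of corners as `(x, y)` tuples in metres are assumptions made for illustration; the correspondence between estimated and actual corners is taken as given by list order.

```python
import math


def corner_rmse(estimated, actual):
    """RMSE over corresponding corner points, each an (x, y) tuple in metres."""
    if len(estimated) != len(actual) or not estimated:
        raise ValueError("need equal-length, non-empty lists of corresponding corners")
    squared_distances = [
        (xe - xa) ** 2 + (ye - ya) ** 2
        for (xe, ye), (xa, ya) in zip(estimated, actual)
    ]
    return math.sqrt(sum(squared_distances) / len(squared_distances))


if __name__ == "__main__":
    # One room whose first corner is displaced by 0.5 m; the others match.
    estimated = [(0.3, 0.4), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
    actual = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
    print(corner_rmse(estimated, actual))
```

Passing the corners of a single room gives a per-room value, while concatenating the corners of all rooms gives the whole-plan value.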

Solution stability
To assess the stability of the two-stage optimisation, 50 trials were completed for each plan. Table 8 shows the mean and standard deviation results for cost and RMSE. On average, the plans exhibited a standard deviation of 0.05 m over the repeated trial runs. It became apparent that the poorer RMSE for Plan D noted during accuracy assessment was due to inconsistent orientation of the lower left room (a bathroom). A low-confidence prediction results in the room flipping to lower the wall-thickness cost during some optimisation runs, but not all.

Discussion
In this work, the generation of building plans given some limited observable evidence regarding building form is demonstrated. Using the overall building footprint in combination with some prior knowledge of the room shape and topology, together with probabilistic modelling of room measurements, is shown to provide a sufficient basis on which to construct plans with reasonable accuracy. In these experiments, measurements of room dimensions of the interior are not required, provided that the orientation of the landing area with respect to the building footprint is known. We focused on first-floor building mapping scenarios as these provide a challenging set of measurements for further adjustment. Although ground-floor plans can exhibit interesting room combinations for prediction purposes (e.g. presence of open-plan kitchen and dining areas), they tend to have fewer rooms and therefore fewer measurements offering any redundancy. However, further work could extend the approach to consider multiple floors simultaneously.
The accuracy of the layout and dimension estimation was worst when both the room shapes and the exterior walls provided little to constrain measurements, as for Plan D. In practical application, this might be improved by the system prompting the user to provide certain additional prior information. For example, further details of the room orientations might be provided where the uncertainty in the predicted value is high.
Increasing the size of the training data used in this work would likely improve the prediction accuracy and associated certainty. For example, the estimated dimensions generated by the predictive model, given as discrete intervals, are incorporated in the optimisation using a Gaussian model fitted to the most probable interval. This effectively leads to the optimisation overlooking other possible predictions (e.g. if the distribution is bimodal) which may be only very slightly less probable. Given a larger training set, a finer scheme of discretisation would be possible. This could then permit the predictions to be used directly within the optimisation as probability-based cost functions, rather than through their Gaussian approximation. However, this change may also lead to less consistent optimisation due to the added complexity in the cost surface.

Conclusion
This paper presented a modelling approach for constructing building plans using learned room dimensions, topology and orientations in combination with an optimisation model to generate metric-scale building plans. The use of a predictive model for computing the probable values of particular dimensions is shown to be a suitable framework for modelling room dimensions and orientations. Furthermore, when the model predictions are used in a two-stage constrained modelling procedure, the results illustrate that a relatively high degree of accuracy is achievable using an accurate external building footprint, even though no physical measurements of the interior spaces are made.