Assessing a fit-for-purpose urban building energy modelling framework with reference to Ahmedabad

Urban building energy models (UBEM) are driving sustainable design and operations of cities by combining urban datasets with energy simulations. UBEMs are developed from a range of inputs on the spatial and semantic details of the buildings, and the systems affecting their energy performance. Large geographical scales with finer spatio-temporal details increase the challenges of data processing for a reliable UBEM. Thus, it is essential to understand the impact of increasing the resolution of model inputs on the outputs to balance the efforts spent on model development, filling data gaps and maintaining the reliability of the results. This research introduces a Fit-for-Purpose modeling strategy and, with the proposed model characterization framework, extends the concept of Levels of Detail (LoD) used for 3D models to UBEM characteristics including occupancy, geometry, context, modeling methodology, and calibration. A case study based on a 0.3 km² area of Ahmedabad, India, is presented to demonstrate the framework. The results highlight a need for a higher LoD in occupancy modeling for the residential and educational buildings, whereas a higher LoD is more important for the commercial buildings’ envelope characteristics. These insights will enable a highly targeted supplementary data collection approach for the UBEM of the entire city.


Urban development in India
India's urbanization at a rate of 34% will bring 590 million urban residents by 2030 (McKinsey 2010). This will put immense pressure on the urban infrastructure systems, with an additional 20 billion m² of floor area required in the residential sector and 4 billion m² in the commercial sector (GBPN CEPT 2014; Kumar et al. 2010). Building operations account for 31% (AEEE 2018) of India's total energy consumption. Urban Local Bodies (ULBs) spend nearly 60% of their operational budget on the energy bills of municipal services including water supply (Iyer et al. 2020). With 70% of the building stock of 2030 yet to be constructed (GBPN CEPT 2014), development through contemporary practices will double the electricity consumed in buildings from 544 TWh to 1192 TWh (NITI 2015). Considering this, it is a daunting task to achieve India's Nationally Determined Contribution of reducing the emissions intensity of its GDP by 33 to 35% from 2005 levels, and the Sustainable Development Goals 7, 11 and 13. Energy efficiency codes and policies like the Energy Conservation Building Code (ECBC) (BEE 2017) and the National Energy Policy will help to guide this sustainable development (Khosla and Janda 2019). However, there is an enormous gap in implementing these policies because of a lack of assessment of their effect on urban building and infrastructure dynamics, and the limited data currently available on a city's building stock and energy consumption patterns.

The role of UBEMs in developing policy recommendations
UBEMs (Urban Building Energy Models) are emerging as powerful tools for addressing these challenges. UBEMs, as described by Hong, Chen, et al. (2020), refer to the computational modeling and simulation of the energy performance of a group of buildings in the urban context in order to account for the interactions between buildings, urban infrastructure and microclimate. The concept of UBEM stems from the classification of Urban Energy Models into top-down or bottom-up approaches by Swan and Ugursal (2009). Its definition in the literature ranges from only bottom-up physics-based simulation models of a group of buildings in the urban context to both bottom-up and top-down approaches that can be subdivided into statistical, data-driven or simulation-based models. While both top-down statistical and bottom-up data-driven UBEMs are widely used for estimating urban energy use, they have two important limitations: (1) reliance on the accessibility of historic and current energy use data and (2) inability to account for design modifications (Abbasabadi and Mehdi Ashayeri 2019). For applications in Indian cities, where such data are scarce (Shnapp and Laustsen 2013), these challenges are insurmountable. By combining the data generated in cities with energy simulations, bottom-up simulation-based UBEMs can help assess the impact of energy efficiency codes, amend them to suit the regional context and guide sustainable urban development and governance policies. During the last decade, bottom-up simulation-based UBEMs have been extensively researched for urban and regional analyses where integrated energy supply-demand scenarios are being investigated (Johari et al. 2020). Successful applications of UBEMs include energy benchmarking and retrofit strategies with lifecycle cost (Hong et al. 2016), developing energy performance based urban design (Bergerson et al. 2015), and designing district energy systems and municipal service networks (Fonseca et al. 2016).
This paper focuses on bottom-up, simulation-based UBEMs because of their growing popularity in research and practice for the intended application in sustainable urban development.
Developing a UBEM requires both geometric and semantic (non-geometric) data for the building stock. This includes: (a) building geometry, floor area, glazing area; (b) year built/refurbished; (c) geographical location, neighboring context and climate; (d) building use type and operational patterns; (e) HVAC, lighting, equipment loads; and (f) construction details. These models can also account for the energy consumption in municipal services required for building operations, like water-pumping, waste disposal and common lighting, provided that inputs for plug loads, capacity, and operational hours are available. Although bottom-up physics-based models are not solely dependent on metered energy consumption data, such data are useful for validating the simulation results and reducing model uncertainties.

The need to extend the LoD concept
Biljecki (2013) introduced the concept of Level of Detail (LoD) for geometric data, establishing a clearly defined framework of different levels of granularity in the detail provided about the geometric model. The LoD concept facilitates comparison between models and allows assessment of the impact of increasing detail on model results and accuracy. As with geometric models, various types and resolutions of semantic urban data and modeling approaches have also been proposed by academics and practitioners to develop a UBEM (Johari et al. 2020).
At present, there is a lack of consistency in the nomenclature of these modeling approaches and urban datasets, in identifying their resolution, and in understanding how they are selected with respect to the desired UBEM applications and outputs. While oversimplification of urban data and modeling approach might cause large inaccuracies, very detailed inputs are not always necessary to obtain consistent results from a UBEM, as demonstrated by Chen and Hong (2018), Monteiro et al. (2017), and Nouvel et al. (2017). Their work indicates that there is no direct relation between increasing the resolution and complexity of the model inputs and methodology and the model accuracy or desired outputs.
For contexts like India's, where the current data on building stock and energy use is unstructured with many gaps (Shnapp and Laustsen 2013), maintaining data quality, collecting additional data and managing UBEM complexity with the least uncertainty become a huge challenge. To make the best use of the available data and streamline the resources required to collect additional data, there is a need to assess how increasing detail in one aspect or characteristic of the UBEM affects overall model results.
This calls for a shift in focus toward a "fit-for-purpose" approach for developing a UBEM. Similar to the concept introduced by Gaetani, Hoes, and Hensen (2016) for occupant behavior modeling in building energy simulations, there is a need to identify the most suitable LoD of the model characteristics: the one that provides the most reliable results for the intended application of the UBEM and costs the least in terms of effort.
This approach will enable a suitable tradeoff between the LoD, and the efforts and resources spent on filling input data gaps and complexity of model development, without compromising the reliability of the intended UBEM outcomes.
This research aims to develop a Model characterization framework, leading to a Fit-for-Purpose UBEM approach. This framework extends the LoD concept introduced for 3D city models (Biljecki 2013) to a wider range of UBEM characteristics and uses a detailed literature review to identify the possible LoDs for each.
The "fitness" of the UBEM will vary contextually with the availability of data, resources, the intended application, and the objective of developing the UBEM for different projects. Therefore, along with identifying these LoDs in the model characteristics, this research introduces the concept of Application Case Attributes and Level of Effort (LoE) and ties all three of them in a broader perspective to approach Fit-for-Purpose UBEMs (Figure 1).
The Model characterization framework is then applied to a case study in the city of Ahmedabad. It assesses the impact of increasing the LoD of certain characteristics on the model outcomes against the effort required to simulate these. This case study highlights the utility of the framework and the Fit-for-Purpose approach and facilitates the development of targeted strategies for supplementary data collection.
1076 Science and Technology for the Built Environment

Developing the model characterization framework
This section describes the development of the model characterization framework and the identification of different LoD for each characteristic through the literature review.

Previous attempts to develop classification frameworks
The growing popularity and utility of UBEMs have led to a wide range of modeling approaches, and to several attempts to explain the differences between models. Nouvel et al. (2017) assessed the impact of data granularity on UBEM results by increasing details in one aspect of the model and comparing its results with a less detailed model for a case study of Ludwigsburg. The selection of input data for this comparison and the results obtained were highly specific to the case study and the information available to the researchers. However, their approach of categorizing different UBEM simulation inputs based on their priority or sensitivity motivated the development of the framework discussed in this paper. Ang, Berzolla, and Reinhart (2020) addressed a similar problem of selecting an appropriate modeling methodology for different applications by introducing the concept of a Minimum Viable UBEM. They identified four typical UBEM applications and, through a literature review, suggested suitable details for geometry, archetype templates, weather data, measured energy data and calibration methods for each application.
Ferrando, Causone, Hong, and Chen's study focused on the main bottom-up physics-based UBEM tools, comparing them from a user-oriented perspective. The tools were classified based on: (i) the required inputs, (ii) the reported outputs, (iii) the exploited workflow, (iv) the applicability of each tool, and (v) the potential users. Their review of the research and development potential in the field of UBEM indicates the presence of different modeling approaches in various aspects, such as datasets for building geometry, occupant behavior, accounting for microclimate, inter-building heat exchange and model calibration techniques. The review also indicated the impact these approaches had on the model outcomes in specific UBEM projects. They suggested that the mixed use of databases, dissimilar nomenclature of methodologies, and the lack of a structured framework and standardization for data collection and modeling methodologies make it difficult to choose an appropriate UBEM tool that balances the level of complexity, accuracy, usability, and computing needs. Fennell et al. (2019) aimed specifically to develop a framework which would allow the characteristics of the UBEM and the adequacy of the data to be assessed across different descriptors of the model characteristics for a regional context, especially where data is scarce. In their framework, eleven model characteristics were defined that represent a typical UBEM across four levels, i.e., User, Building, Environment and Methodology, as shown in Figure 2. For each model characteristic, a series of descriptors was established from the literature review to classify approaches of varying granularity.
Volume 27, Number 8, September 2021
While all these researchers have addressed a similar problem, the challenge persists in formalizing the definition of the model characteristics and their LoD that exist in theory and practice.
This study uses Fennell et al. (2019) as a starting point and includes an additional model calibration layer. The literature review undertaken by Fennell et al. (2019) is extended using a snowballing approach to encompass 85 UBEM studies. Figure 3 illustrates the geographic spread of the studies and helps to identify the most relevant case studies for new projects in those regions. Descriptions of each characteristic were recorded using Eppi Reviewer (ER4.5 2018) and were sorted into themes from which definitions for different LoD were derived.
The complete details of the classification of each study can be found in Appendix 1. The following sections detail the updated categories of approach for each of the 11 model characteristics identified in Fennell et al. (2019) and include a new characteristic: the level of application of calibration. We begin by examining the use case of the UBEM before moving on to the characteristic framework.

Use case of the UBEM
Figure 3 highlights the trend of models developed for specific cases before being repurposed for different contexts and datasets. Consequently, it is anticipated that, along with the required degree of accuracy and intended spatial scale, the use case of a particular UBEM will inform key decisions about the types and granularity of data required. Hong, Chen, et al. (2020) identified 4 distinct use cases: (1) energy use benchmarking, (2) existing building retrofits, (3) energy forecasting and performance-based design, and (4) district energy system design and optimization.
The review of the UBEM projects considered in this research found that 75% of the projects were developed for end-use energy benchmarking or for developing baseline energy models. 17% of the projects performed energy retrofit studies, followed by 6% that assessed future scenarios. Since data availability is a common challenge in most UBEM exercises (Johari et al. 2020), baseline model calibration is essential to minimize uncertainty in modeling assumptions across different model characteristics, which in turn is a prerequisite for application in retrofit and future scenario assessment. Apart from this, only 2% of the projects worked on district energy systems and municipal service integration. Cities with detailed urban datasets have been able to test advanced applications of UBEM with some confidence in the results after calibration (Romero, Fonseca, and Schlueter 2017).

The model characteristics
Occupancy
Happle, Fonseca, and Schlueter (2018) indicated that occupant behavior can be modeled as presence or activity that represents operation of equipment, lighting, and other systems. They categorized occupancy modeling methods as deterministic or stochastic, and as space-based or person-based. From these, only three combinations were practically observed: (1) deterministic space-based, (2) stochastic space-based, and (3) stochastic person-based approaches. Furthermore, these approaches may or may not include diversity. Diversity can again be space-based, introduced within the building archetypes, or person-based, accounted for by the kind of population, e.g., permanent vs. visitors. From this, 4 levels of detail were established:
Deterministic single profile: These models use a repeatable hourly schedule of a typical day in the year. The same schedule is applied to all buildings from the same archetype, e.g., T. Hong, Chen, Piette, et al. (2018), who use a DOE benchmark survey, or Heiple and Sailor (2008), who based occupancy schedules on the ASHRAE benchmark survey. This approach lacks diversity, so it is unsuitable for modeling peak loads, but it is easy to define from reference buildings.
Deterministic multi-profile: Extends the single profile approach to include a variety of profiles for the same archetype, for example by varying occupancy density (Quan and Li 2015), or to include diversity across archetypes. This approach is useful for creating seasonal diversity and diversity across buildings in an archetype.
Stochastic space-based: Profiles are derived from statistical distributions to predict the occurrence of certain actions. The probability of each action depends upon the previous action. Richardson, Thomson, and Infield (2008) use a Time of Use Survey (TUS) at 10-minute intervals for multiple spaces, occupants and buildings in an archetype, using several occupant surveys to derive stochastic profiles and create spatial and temporal variations.
Agent based (stochastic person-based): Models every individual's presence, activities and actions based on probability distributions. For example, Robinson et al. (2007) developed occupancy models based on TUS with sensor data for occupant presence, time of arrival and window opening to account for mixed mode operations. Barbour et al. (2019) proposed tracking the 3.5 million inhabitants of Boston from mobile phone usage using a statistical Time-Geo Framework. With this approach they observed occupancy 5 times lower than with the deterministic approach, which led to a difference of 15 to 21% in energy consumption.
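The stochastic space-based level, in which the probability of each action depends on the previous one, can be illustrated with a first-order Markov chain. The sketch below is only an illustration of that idea, not the method of any cited study: the two states, the transition probabilities and the function name `simulate_occupancy` are assumptions chosen for the example, and in practice the probabilities would be derived from time-of-use survey data.

```python
import random

# Hypothetical transition probabilities for one space at a 10-minute step.
# Keys are the current state; values give P(next state). Illustrative only.
TRANSITIONS = {
    "absent": {"absent": 0.9, "present": 0.1},
    "present": {"absent": 0.2, "present": 0.8},
}

def simulate_occupancy(steps, start="absent", seed=0):
    """Return a list of states generated by a first-order Markov chain."""
    rng = random.Random(seed)  # seeded so the profile is reproducible
    state = start
    profile = [state]
    for _ in range(steps - 1):
        probs = TRANSITIONS[state]
        # The next state depends only on the current state.
        state = rng.choices(list(probs), weights=list(probs.values()))[0]
        profile.append(state)
    return profile

# 144 ten-minute steps cover one day.
profile = simulate_occupancy(steps=144)
```

Running the function with different seeds yields different daily profiles for buildings of the same archetype, which is the diversity that the deterministic levels lack.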
Building geometry
3D city models are required to define a building's geometry, envelope thermal properties, glazing ratio, ventilation rates, inter-building heat exchange and mutual shading. This geometry can be made available in the form of semantic 3D city models. Traditionally, 3D city models can be developed with Levels of Detail ranging from 0 to 4 (Biljecki 2013). So far, no UBEMs have been developed with LoD4 because of its high complexity and unnecessary detailing. The commonly adopted geometry LoDs are:
2D geometry (CityGML LoD0): Buildings are represented as footprints or roof edge polygons; in the absence of building geometry, arbitrary box models are developed. Filogamo et al. (2014) used cuboid geometry based on typical floor area to generate a scaled dynamic UBEM. These approaches are suitable for statistical and data-driven calculation and are often used to explore solar insolation potential, although overshadowing may compromise results.
2.5D extrusion (CityGML LoD1): The 2D geometry of the building footprint is extruded to the building's height. This is the most widely used approach for UBEMs. Cerezo Davila, Reinhart, and Bemis (2016) combined a 2D GIS database with cadastral data for the number of floors to extrude the 2.5D model. They observed that differences between the conditioned and modeled volumes resulted in some errors when not combined with other semantic details.
3D geometry (CityGML LoD2): Represents the actual 3D geometry of the buildings, accounting for different roof shapes as opposed to the prismatic flat-roof LoD1 geometry. Monien et al. (2017) used LiDAR data and building reconstruction methods to develop such a model for Essen, Germany. They found an increase in accuracy of 10% compared with LoD1. Nouvel et al. (2017) also observed an increase of 15-20% in result accuracy for buildings with pitched roofs and attics when modeling a 3D geometry as compared with 2.5D extrusion.
3D with external features (CityGML LoD3): 3D models with detailed wall and roof structures including doors, windows and other external features. Saretta, Bonomo, and Frontini (2020) used these models for estimating the potential of BIPV in an urban district in Switzerland. They found that exact glazing ratios and shading elements improved model accuracy but resulted in a fourfold increase in computational effort.
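The 2.5D extrusion described above can be sketched in a few lines: a footprint polygon and a floor count give the block height, gross floor area and volume. This is a minimal illustration under stated assumptions, not a reproduction of any cited workflow; the function names and the default floor-to-floor height of 3.0 m are hypothetical.

```python
def footprint_area(vertices):
    """Area of a simple polygon by the shoelace formula; vertices are (x, y) in metres."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the polygon
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def extrude(vertices, floors, floor_height=3.0):
    """Return height, gross floor area and volume of a flat-roofed LoD1 block.

    floor_height is an assumed floor-to-floor height, not a value from the source.
    """
    area = footprint_area(vertices)
    height = floors * floor_height
    return {
        "height_m": height,
        "floor_area_m2": area * floors,
        "volume_m3": area * height,
    }

# A 10 m x 20 m footprint with 4 floors (taken, for example, from cadastral data).
block = extrude([(0, 0), (10, 0), (10, 20), (0, 20)], floors=4)
```

Because the whole extruded volume is treated as one prism, any unconditioned spaces are included in `volume_m3`, which is the kind of mismatch between conditioned and modeled volume noted above.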

Thermal zoning
To perform thermal simulation, zones are created to represent and group interior spaces in a building with similar exposure to outdoors, operational patterns, HVAC systems and thermal conditions (set-point temperature, ACH, etc.) and to report their average results. Four classes of approach were identified in the literature:
Single zone per building: Each building volume is a single thermal zone, e.g., Robinson et al. (2009). This approach is suitable when the entire building has similar use and construction characteristics.
Zone per floor/space: Separate thermal zones account for the ground floor and top floor having different adjacencies and exposure, e.g., Chen, Hong, and Piette (2017). This is the most popular approach for dynamic UBEMs and accounts for different uses in a single building. Chen and Hong (2018) used a variant approach in which floor multipliers are used to group similar floors, reducing simulation time by a factor of 3 with a marginal 2.6% increase in error.
Core-perimeter zoning: Accounts for the impact of different orientations. Perimeter zoning along the façade within 5 m depth is also prescribed by ASHRAE 90.1 Appendix G. Dogan, Reinhart, and Michalatos (2016) proposed an autozoning algorithm to implement this approach. Chen and Hong (2018) found that this zoning approach improved the accuracy of predicted cooling and heating loads by 7.5% and 16.9%, respectively.
Detailed internal zoning: Further divides interior spaces following the building's interior layout, e.g., Yi and Peng (2019) and Remmen et al. (2018). This approach requires detailed information on the internal layouts and is typically restricted to scaled dynamic and reduced order models.
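The floor-multiplier variant of zone-per-floor modeling can be sketched as follows. The grouping rule used here (ground, middle and top floors, with all middle floors assumed thermally similar) is an assumption made for illustration and is not taken from Chen and Hong (2018); the function name `floor_groups` is hypothetical.

```python
def floor_groups(n_floors):
    """Group floors into (label, multiplier) pairs so similar floors are simulated once.

    Assumes all middle floors behave alike; ground and top floors are kept
    separate because their adjacencies and exposure differ.
    """
    if n_floors < 1:
        raise ValueError("a building needs at least one floor")
    if n_floors == 1:
        return [("single", 1)]
    if n_floors == 2:
        return [("ground", 1), ("top", 1)]
    return [("ground", 1), ("middle", n_floors - 2), ("top", 1)]

# A 10-storey building is reduced from 10 simulated floors to 3.
groups = floor_groups(10)
simulated_floors = len(groups)
represented_floors = sum(multiplier for _, multiplier in groups)
```

The results of the single simulated middle floor are multiplied by its multiplier, which is where the saving in simulation time comes from.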

Building archetypes
Reinhart and Cerezo indicated that the development of archetypes requires segmentation and characterization. In segmentation, the building stock is classified according to the building's shape, age, use, energy consumption, systems, or other parameters. Characterization assigns a complete set of thermal properties, including construction assemblies, schedules and building systems, to the archetype. Johari et al. (2020) highlighted that most studies relied on simple archetype development from the available data. Clustering algorithms can also be used for classification based on multiple parameters. They indicated that archetype classification and characterization can be either deterministic or probabilistic. Typically, classification by building use type represents the operational profile, and the building age represents the construction materials and the systems. In this proposed framework, Building Archetypes only considers classification of archetypes. Characterization and calibration of the inputs is dealt with separately in the Treatment of Uncertainty. Three classes of approach were found:
Single criterion: Based on use type (e.g., residential, commercial, etc.) or vintage. Building use types are determined from literature or cadastral data. Archetype-based inputs (e.g., operating schedules, building fabric) are the same for all buildings, e.g., Robinson et al. (2009).
Multiple with a single criterion: A variety of different archetypes are included within a single criterion, such as building use type. However, each building within the archetype uses the same inputs. For example, residential buildings may be divided into types such as attached, detached, apartments, etc. The study conducted by Davila (2017) found that the error was reduced from 16% to 4% by having multiple residential archetypes based on age group.
Multiple with multiple parameters: Multiple combinations across different parameters, e.g., use type and age (apartment, 20 years old; detached, 40 years old; etc.), as in Monteiro et al. (2017) and Coffey et al. (2015). This approach requires detailed data on building type and class, age, roof shape, size, and neighborhood characteristics. The need to create multiple archetypes depends on the variability and the sensitivity of the parameters, i.e., on how important they are in representing energy use.
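The segmentation step can be sketched as a grouping of buildings by one or more criteria. This is a minimal illustration, assuming a building record with `id`, `use` and `year_built` fields; the field names, the vintage bands in `age_band` and the sample stock are hypothetical and not drawn from the case study.

```python
from collections import defaultdict

def age_band(year_built):
    """Map a construction year to a hypothetical vintage band."""
    if year_built < 1990:
        return "pre-1990"
    if year_built < 2010:
        return "1990-2009"
    return "2010 onward"

def segment(buildings, criteria):
    """Group building ids into archetypes keyed by the chosen criteria."""
    groups = defaultdict(list)
    for b in buildings:
        # "age" is derived from the construction year; other criteria are read directly.
        key = tuple(age_band(b["year_built"]) if c == "age" else b[c] for c in criteria)
        groups[key].append(b["id"])
    return dict(groups)

stock = [
    {"id": "A", "use": "residential", "year_built": 1985},
    {"id": "B", "use": "residential", "year_built": 2012},
    {"id": "C", "use": "commercial", "year_built": 2001},
]

by_use = segment(stock, ["use"])                 # single criterion
by_use_and_age = segment(stock, ["use", "age"])  # multiple parameters
```

Adding a criterion splits the residential group in two, so more archetypes must be characterized; whether that is worthwhile depends on how sensitive energy use is to the added parameter.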

Context
The effects of local shading, shared walls between buildings, radiation from the sky and neighboring buildings, and wind patterns can be accounted for in a UBEM. The following approaches were found in the reviewed projects:  (2018) used a pre-processing algorithm to remove blocked surfaces, only considering effective shading surfaces in the context. This improved simulation time by 70%.
Detailed contextual interaction: Includes long-wave radiant exchange between buildings, and microclimatic effects. This requires modeling surface properties of contextual structures, additional algorithms, and simulation tools. Hong, Ferrando, et al. (2020) used this method to account for longwave radiant exchange and waste heat from HVAC systems with the details of system type and COP. Palme and Salvati (2018) modeled anthropogenic heat transfer and evapotranspiration rate, with the urban microclimate co-simulated with the UBEM to account for dynamic inputs. This method increases the complexity of simulation and is typically adopted in reduced order models.

Climate
Usually, a single typical meteorological year (TMY) weather dataset, either for the location or for the closest city, is used. Only a few models account for micro-climate data (T. Hong, Chen, et al. 2020). The following approaches for addressing climate in UBEMs were found:
Steady state: Long-range average values for outdoor weather conditions using historic weather data, e.g., Gupta (2009).
Typical meteorological year (TMY): Weather data obtained from historic measurements, e.g., Reinhart et al. (2013). This approach is the most used in dynamic UBEMs.
Urban microclimate: Microclimate can either be measured from a local weather station or be generated in the simulation model or separately. This approach can account for the urban heat island effect when recorded over a long duration. Nouvel et al. (2017) observed that using a local weather station reduced the average error in simulations to 10% compared with TMY.  used simulated microclimate data and observed that simulated energy values were reduced by 2-11% compared with TMY, but with a 15% increase in simulation time.

Municipal services
Most of the reviewed projects do not account for energy consumption for municipal services. A UBEM can account for total municipal service energy derived from publicly available datasets or from calculations, with or without spatio-temporal simulations:
Included without spatial mapping: Energy use is calculated from total load, operational hours and benchmark values without spatially locating the services, e.g., Amado et al. (2018). This method can be used to estimate energy consumed by services like street lighting, water supply and waste-water treatment.
Included with spatial mapping: Energy use is calculated from mathematical models considering spatial configuration and layout, e.g., the studies conducted by Agugiaro, Robineau, and Rodrigues (2017) and Fonseca and Schlueter (2015). This approach requires metered data for individual units and network layouts. It is typically used with reduced order models and for calculating energy consumed by district heating or cooling and other renewable energy systems.
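The approach without spatial mapping reduces to a product of load, number of units and operating hours. The sketch below is an illustration of that arithmetic only; the function name and the street-lighting figures (500 lamps of 0.1 kW operating 11 hours a day) are assumed values, not data from the source.

```python
def annual_service_energy_kwh(unit_load_kw, units, hours_per_day, days=365):
    """Annual energy of a municipal service from its load and operating hours.

    No spatial information is used: every unit is assumed to run at the
    same load for the same number of hours.
    """
    return unit_load_kw * units * hours_per_day * days

# Hypothetical street-lighting stock for a district.
street_lighting_kwh = annual_service_energy_kwh(
    unit_load_kw=0.1, units=500, hours_per_day=11
)
```

The spatially mapped alternative would replace the uniform `units` count with a network layout and per-unit metered data, at a correspondingly higher data collection effort.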

Stock dynamics
Depending upon the nature of the investigation, the building stock can be modeled as either static or dynamic: Dynamic simulation: This involves simulating each building individually and allows spatial diversity and contextual interactions to be accounted for, e.g., Cerezo Davila, Reinhart, and Bemis (2016) and Chen, Hong, and Piette (2017). This approach offers improved accuracy but increases simulation time because of the complexity of the calculations, and may require high performance computing resources.

Temporal resolution
UBEMs can cover temporal scales from an hour to a day, a week, a month, a year, and one or multiple decades. Many projects report the results as annual energy use intensity (EUI) in kWh/m² (Chen, Hong, and Piette 2017; Fonseca et al. 2016; Perez, Kämpf, and Wilke 2011). Some report monthly or bi-monthly energy consumption values to establish trends (Krayem et al. 2019). Cerezo Davila, Reinhart, and Bemis (2016) reported hourly energy data to perform peak demand optimization scenarios.
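The difference between annual and hourly reporting can be illustrated with a single hourly series. This is a minimal sketch with made-up numbers: the function names, the flat 1 kWh per hour profile and the 100 m² floor area are assumptions for the example.

```python
def annual_eui(hourly_kwh, floor_area_m2):
    """Annual energy use intensity in kWh/m2 from an hourly energy series."""
    return sum(hourly_kwh) / floor_area_m2

def peak_demand_kw(hourly_kwh):
    """Peak hourly demand; energy in kWh over one hour equals mean power in kW."""
    return max(hourly_kwh)

# Hypothetical building: 1 kWh every hour of the year, with one 5 kWh peak hour.
hourly = [1.0] * 8760
hourly[100] = 5.0

eui = annual_eui(hourly, floor_area_m2=100.0)
peak = peak_demand_kw(hourly)
```

The single peak hour barely changes the annual EUI but defines the peak demand, which is why peak-demand studies need hourly outputs while benchmarking can rely on annual values.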

Treatment of uncertainty
To reduce the discrepancies between predicted energy demand and actual measurements, calibration methods are needed in building energy studies. Uncertainty in the input data is reduced through model calibration and, subsequently, the assumptions must be validated for consistency in the results obtained. The approaches for treating uncertainty were classed as:
Deterministic: An unstructured approach involving manual adjustment of a few uncertain parameters to a fixed defined value. It requires some metered energy data and the typically observed range of values of these uncertain parameters. Krayem et al. (2019) performed manual calibration and post-processed simulation results to improve accuracy. Berthou et al. (2019) followed an iterative process of calibrating the most sensitive parameters.
Probabilistic: Uncertain inputs are adjusted based on probability distributions of a range of values, e.g., Cerezo Davila et al. (2017) used this approach with the range of values for input parameters, their probability of occurrence and their sensitivity toward the results. Their approach leads to the creation of thousands of parametric combinations to simulate. The error was reduced to 4%, but with a 40-fold increase in simulation time.
Bayesian calibration: Combines probabilistic distributions of input parameters with prior knowledge of outputs.  found this to be the approach with the highest accuracy but with a 5-fold increase in simulation time as compared to the probabilistic approach. C. K. Wang et al. (2020) found significant improvements in the performance of reduced order models following Bayesian calibration, with less than 5% error in results.
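The idea behind Bayesian calibration, weighting prior beliefs about an uncertain input by how well each candidate value reproduces a measurement, can be shown with a small grid approximation. Everything in this sketch is an assumption made for illustration: `simulate_eui` is a hypothetical linear stand-in for an energy simulation, the uncertain input is taken to be equipment power density, and the Gaussian error model with `sigma` is a modeling choice rather than the method of any cited study.

```python
import math

def simulate_eui(epd_w_m2):
    """Hypothetical stand-in for an energy simulation: EUI (kWh/m2) as a
    linear function of equipment power density (W/m2)."""
    return 40.0 + 5.0 * epd_w_m2

def grid_posterior(candidates, prior, measured_eui, sigma):
    """Posterior probability of each candidate value on a discrete grid."""
    weights = []
    for theta, p in zip(candidates, prior):
        residual = measured_eui - simulate_eui(theta)
        # Gaussian likelihood of the measurement given this candidate.
        likelihood = math.exp(-0.5 * (residual / sigma) ** 2)
        weights.append(p * likelihood)
    total = sum(weights)
    return [w / total for w in weights]  # normalize so probabilities sum to 1

candidates = [4.0, 6.0, 8.0, 10.0, 12.0]   # W/m2, assumed plausible range
prior = [0.2] * len(candidates)             # uniform prior over the grid
posterior = grid_posterior(candidates, prior, measured_eui=80.0, sigma=5.0)
most_probable = candidates[posterior.index(max(posterior))]
```

Each candidate requires one simulation run, so the cost grows with the number of uncertain parameters and grid points, which is consistent with the longer simulation times reported above for probabilistic and Bayesian approaches.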

Model calibration level
To improve the reliability of simulation results, UBEMs are calibrated to match metered energy data. However, a large number of UBEM projects were found not to be calibrated due to a lack of available data (Ang, Berzolla, and Reinhart 2020). A UBEM can be calibrated at multiple temporal and spatial scales, e.g., annual, monthly, or hourly at the building, block, or district level. Calibration data for different UBEM projects ranges from a 10-minute time step at building level, through monthly data, to annual aggregate data (Berthou et al., 2019). The level must depend on the use case, the temporal resolution and, more importantly, the measured energy data available to the modeler (Davila 2017). The granularities in calibration level are:
Archetype level - UBEMs are calibrated to match the extremes, average and distribution of benchmark or representative energy consumption values, e.g., Cerezo Davila, Reinhart, and Bemis (2016), Berthou et al. This approach is the most frequently used in the absence of granular data. Typically, annual metered energy bills of sample buildings and national-level monitoring campaign data are used for calibration. Cerezo Davila, Reinhart, and Bemis (2016) observed that, due to variation in building-level results, the archetype values average out to give errors in the range of 1-20%. However, high inaccuracies in the range of 12-55% were observed at the regional scale and 5-99% at the urban scale.
Aggregate level - Single aggregate demand values are used to calibrate models of complete districts, typically measured annually for validation, e.g., Cerezo Davila, Reinhart, and Bemis (2016) used the total metered energy consumption of all buildings at ZIP code level and reported average errors of 5-20% at the annual level. Remmen et al. (2018) reported a 5.6% average discrepancy after calibration at the annual level against district-level energy consumption data.
Building level - Metered energy data for each building simulated in the UBEM is used for calibration, building by building.
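The difference between building-level and aggregate-level comparison can be seen in a small numerical sketch. The building IDs and energy values below are invented for illustration only; the point is that over- and under-predictions for individual buildings partly cancel when summed, so an aggregate error can look small even when building-level errors are large.

```python
def percentage_error(simulated, metered):
    """Signed percentage error of a simulated value against a metered one."""
    return (simulated - metered) / metered * 100.0

# Hypothetical annual energy use (kWh) for three buildings.
simulated = {"B1": 120.0, "B2": 80.0, "B3": 150.0}
metered = {"B1": 100.0, "B2": 100.0, "B3": 140.0}

# Building level: one error per building.
building_errors = {
    b: percentage_error(simulated[b], metered[b]) for b in simulated
}

# Aggregate level: a single error on the summed demand.
aggregate_error = percentage_error(sum(simulated.values()), sum(metered.values()))

print(building_errors)
print(aggregate_error)
```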

The model characterization framework
The LoDs for each model characteristic were defined based on the evaluation of the literature. Two more characteristics were added to the framework proposed by Fennell et al. (2019) in a new 'Model Calibration' layer, representing the level & data used for calibration, leading to 13 model characteristics. Based on increasing complexity, the approaches within each characteristic were assigned a LoD from 0 to "n", 0 being the simplest and "n" being the most complex. As shown in Table 1, the first, second & subsequent rows represent the layers, the characteristics & the LoDs for each characteristic, respectively. A UBEM can be defined by a combination of any of these LoDs across all model characteristics.
The section ahead will demonstrate how the different model characteristics relate to UBEM attributes in the literature.
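One way to picture the framework is as a mapping from each model characteristic to an integer LoD. The sketch below is illustrative only: it uses a hypothetical subset of characteristics, and the maximum LoD assumed for each is invented here rather than taken from Table 1.

```python
# Hypothetical subset of model characteristics with an ASSUMED maximum LoD
# each; the real framework has 13 characteristics with their own ranges.
MAX_LOD = {
    "occupancy": 3,
    "archetypes": 2,
    "treatment_of_uncertainty": 2,
    "calibration_level": 2,
}

def validate_model(model):
    """Check that every characteristic is known and its LoD lies in 0..n."""
    for name, lod in model.items():
        if name not in MAX_LOD:
            raise KeyError(f"unknown characteristic: {name}")
        if not 0 <= lod <= MAX_LOD[name]:
            raise ValueError(f"LoD {lod} out of range for {name}")
    return True

def changed_characteristics(model_a, model_b):
    """List the characteristics whose LoD differs between two UBEM definitions."""
    return sorted(k for k in model_a if model_a[k] != model_b.get(k))

baseline = {"occupancy": 0, "archetypes": 0,
            "treatment_of_uncertainty": 0, "calibration_level": 0}
iteration_2 = dict(baseline, occupancy=1)

print(validate_model(iteration_2), changed_characteristics(baseline, iteration_2))
```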

Patterns of application
After reviewing each study and assigning a Level of Detail for the approach to each characteristic, the results for all

Application case attributes
The most appropriate LoD for each model characteristic will depend on the specific application case of the UBEM project. These specifics of each project are termed "Application Case Attributes" in this research. These attributes include: i) Location and climate of the project, ii) Spatial scale, iii) Use case/application of the UBEM, iv) Source of urban datasets, v) Required/desired model accuracy with respect to measured data and vi) Modeling infrastructure available, including computation power and time, data collection equipment, meters, sensors etc.

Influence of application case attributes on model characteristics
Three Application Case Attributes were selected for further analysis based on their significant influence on the selection of LoDs, as observed in the literature review conducted for this research: 1) Scale of the project, which was found to influence the level of effort, model complexity and model accuracy (C. F. Reinhart and Cerezo Davila 2016); 2) Use case of the UBEM, which impacts the sensitivity of different model characteristics, as observed by Ang, Berzolla, and Reinhart (2020); and 3) Result accuracy desired or obtained, which is largely affected by the granularity or Level of Detail of the various inputs (Nouvel et al. 2017) and methodologies. Based on the learnings from different projects in the literature review, the most sensitive model characteristics were studied for their relationship with these application case attributes. This understanding can help formulate a relationship between the LoDs and the factors that govern their relevance, and guide the future development of new UBEM projects.

Scale of the project
The scale of each project in the literature review was recorded and compared. From these, four groups of UBEM scales were identified: i) Block level: up to 500 buildings, ii) Neighborhood level: 500 to 5000 buildings, iii) City level: 5000 to 100,000 buildings, iv) Regional/National level: beyond 100,000 buildings. These scales are further subdivided to distinctly represent different urban levels. It was observed that the choice of the form of calculation is greatly affected by the scale of the project. Figure 5 suggests that most projects are conducted at block level, on fewer than 100 buildings, with dynamic & reduced-order models. For scales   LoDs. 80% of future-scenario UBEMs work at the neighborhood scale. Only 2 projects have used the UBEM for District Energy systems: one on fewer than 100 buildings (Fonseca et al. 2016) and one at city scale (Agugiaro, Robineau, and Rodrigues 2017).
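The scale groups listed above can be expressed as a simple lookup. This is a minimal sketch: the function name is hypothetical, and treating each upper bound as inclusive is an assumption, since the text does not say which group a project with exactly 500 or 5000 buildings belongs to.

```python
def classify_scale(n_buildings):
    """Map a building count to a UBEM scale group.

    Upper bounds are treated as inclusive, which is an assumption.
    """
    if n_buildings <= 0:
        raise ValueError("building count must be positive")
    if n_buildings <= 500:
        return "block"
    if n_buildings <= 5000:
        return "neighborhood"
    if n_buildings <= 100000:
        return "city"
    return "regional/national"

# The Ahmedabad test case covers 250 buildings.
print(classify_scale(250))
```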

Model accuracy
The metrics used for reporting a model's accuracy were: Percentage Error, the KS test, Mean Absolute Percentage Error and Correlation Coefficient. Over 70% of projects reported percentage errors. Figure 7 shows the range of errors observed with different treatments of uncertainty, for only those projects that reported model accuracy as a percentage error. The mean error of the Deterministic method was observed to be higher than that of the Bayesian and probabilistic methods. However, this trend may also be affected by other model characteristics having different LoDs.
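For reference, three of the metrics named above can be computed as in the following sketch. The function names and the sample values are illustrative, and individual projects may define these metrics slightly differently (for example, signed versus absolute percentage error).

```python
import math

def percentage_error(simulated_total, measured_total):
    """Signed percentage error on a single (e.g. aggregate) value."""
    return (simulated_total - measured_total) / measured_total * 100.0

def mape(simulated, measured):
    """Mean Absolute Percentage Error over paired values."""
    return sum(abs(s - m) / abs(m) for s, m in zip(simulated, measured)) / len(measured) * 100.0

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical measured and simulated EUIs (kWh/m2); every simulated value is 10% high.
measured = [100.0, 80.0, 120.0]
simulated = [110.0, 88.0, 132.0]

print(percentage_error(sum(simulated), sum(measured)))
print(mape(simulated, measured))
print(pearson_r(simulated, measured))
```

In this example the correlation is perfect even though every value is off by 10%, which is one reason the metrics are not interchangeable.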

The concept of level of effort
The Fit-for-Purpose model will identify the most suitable LoD for each model characteristic that serves the intended application case with the least level of effort. The effort spent in performing a UBEM project can be conceptually divided into three stages, as indicated in Figure 8: i) effort in data collection, ii) effort in pre-processing the data, which includes model development, preparing simulation inputs and validating the assumptions, and iii) effort in post-processing, which includes reducing uncertainty and analyzing the results for decision making. This effort can be both manual (e.g., surveys, data collection) and computational. As observed from the literature review, the degree or Level of Effort (LoE) across these three stages can vary significantly for each UBEM project, based on the availability of resources, technical know-how and the application case attributes. Thus, for each UBEM project the modeler will have to assess their own LoE across these three stages and decide rationally on the usefulness of a higher LoD in the model characteristics.

Observations from literature review
An attempt was made to group similar projects based on their LoD in different characteristics using the K-modes clustering algorithm; however, no coherent groups were formed. The observations show that it is unlikely that a meaningful relation between the LoD and modeling objectives can be established through literature. At present, each research group makes its own assessment of the LoD required for its purposes, depending on the other UBEM attributes. This is often implicit within their work and may be driven by practical considerations of data availability, rather than the actual requirements of the study. Parallel research is focused on developing new tools and methods to reduce simulation time and complexity. This indicates that the framework is essential in guiding newer research to optimize its resources to achieve what is desired.

Science and Technology for the Built Environment

Application of the model characterisation framework to Ahmedabad
This section reviews the existing data and infrastructure available for Ahmedabad and applies the developed framework to study the impact of various LoDs, in order to develop a supplementary data collection approach for the UBEM of the entire city. The city of Ahmedabad in Gujarat has been one of the most important trade centers in western India and a major industrial and financial hub. Owing to rapid development at a Compounded Annual Growth Rate (CAGR) of 3.18 for commercial & 3.61 for residential stock, respectively (Rawal, Pandya, and Shukla 2018), robust planning through UBEM is required for achieving energy efficiency, analyzing the impact of implementing different policies and tailoring them to suit both the existing and the upcoming building stock of Ahmedabad.

Public datasets in Ahmedabad
CEPT Geomatics Lab used the administrative data along with satellite imagery to extract building footprints and develop a GIS database. The following challenges were observed in the available datasets: 1. The extracted building footprints did not match the plot boundaries and overestimated building area by including projections such as balconies, porches and temporary structures, and unoccupied areas such as lift/staircase cores, shafts & corridors. 2. Building heights were estimated from the maximum permissible FSI and not from the actual existing building. 3. For many properties, the address in the tax assessment data did not locate the property on the GIS map. 4. Archetype characteristics were assigned from literature studies due to a lack of actual surveyed data for Ahmedabad. 5. Energy bills were inaccessible due to data privacy regulations. Only the total energy sales in Ahmedabad were known, from reports published by the electricity Distribution Company (DISCOM).
To overcome these challenges and fill the data gaps, the model characterization framework is applied to identify the impact of a higher LoD in different characteristics on the simulation results. This will enable identifying the most sensitive parameters for different archetypes and inform where a higher LoD is beneficial. The test case presented here addresses the effort in the post-processing stage and uses simulation time as the measure of effort against which the impact of different LoDs is assessed.

Test case
A UBEM for a 0.3 km2 area of the Central Business District of Ahmedabad, comprising 250 buildings, was developed, as shown in Figure 9. The area is divided into three clusters of buildings: 1) Zone A has 87% residential use. 2) Zone B has a diverse mix of uses with 18% commercial office, 25% retail and 47% residential. 3) Zone C has 32% educational buildings. Building ages have been classified into three categories: less than 20 years, 20-50 years and older than 50 years. The age of a building determines its construction materials and the efficiency of its systems. Building heights affect mutual shading.
Volume 27, Number 8, September 2021

Simulation workflow
The UBEM is created through a customized Python script, SimStock (Korolija 2020), shown in Figure 10. The GIS data collected and developed by the Department of Planning at CEPT University contains shapefiles of the buildings with important data such as building use, number of floors and age, linked to a unique building ID. Archetype characterization with occupancy, equipment, lighting, and HVAC details is associated with the building use. The building shapes and number of floors are then used to extrude a LoD1 geometry, and glazing ratios are defined by use type to create thermal zones. After this, construction templates are assigned to the buildings based on archetypes or on age. Once all the input data for thermal simulations is linked with the GIS data, IDF files are generated for each building, with context buildings as shading objects. These IDF files are then batch simulated in EnergyPlus using the SimStock Python script.
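The extrusion step in this workflow can be sketched in a few lines. This is not SimStock's actual code or API: the function names, the dictionary keys and the 3.0 m floor-to-floor height are assumptions made for illustration. The idea is only that a LoD1 block is a footprint polygon raised to a height derived from the number of floors.

```python
def polygon_area(vertices):
    """Area of a simple polygon from its (x, y) vertices (shoelace formula)."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def extrude_lod1(footprint, n_floors, floor_height_m=3.0):
    """Derive LoD1 block properties; floor_height_m is an assumed default."""
    area = polygon_area(footprint)
    height = n_floors * floor_height_m
    return {
        "footprint_area_m2": area,
        "height_m": height,
        "gross_floor_area_m2": area * n_floors,
        "volume_m3": area * height,
    }

# Hypothetical 10 m x 20 m footprint with four floors.
block = extrude_lod1([(0, 0), (10, 0), (10, 20), (0, 20)], n_floors=4)
print(block)
```

The gross floor area computed here is the denominator normally used when results are reported as EUI in kWh/m2.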

Input data sources
The inputs for the UBEM were largely derived through literature review as cited in Table 2. Some sample electricity bills were collected for residential buildings in an online survey and a walkthrough survey of the site was performed to note the discrepancy in the GIS data and to collect information on construction materials, glazing ratios and window shading.

Assessing the impact of increasing levels of detail
A baseline UBEM, referred to as Iteration 1, was developed with the LoD for each characteristic set by the data that was available. Subsequent iterations (highlighted in green) were developed by incrementally changing the LoDs of three key characteristics, i.e., Occupancy, Archetypes, and treatment of uncertainty, as shown in Table 3, to study their impact on the UBEM results with respect to the additional simulation time.
The data required to develop the higher LoDs was sourced from field surveys and the modeler's assumptions. The results discuss the EUI (kWh/m2) values for Residential, Educational & Commercial buildings across these combinations.
Iterations 1 & 2 compare the difference between Deterministic single-profile and multi-profile occupancy schedules with temporal diversity only. In iteration 2, the seasonal operation of air conditioners and evaporative coolers is included for residential buildings. For educational buildings, vacations were considered, and for commercial buildings one working Saturday per month was considered.
Iterations 2 & 3 compare the impact of multiple archetypes based on age & building use. The construction envelope was changed based on the age of the building.
Iterations 3 & 4 compare the impact of probabilistic assignment of Window to Wall Ratio based on use types. Finally, iterations 5 & 6 compare Deterministic assignment of 0.5 m deep overhangs with Probabilistic assignment of overhangs from 0.3 to 0.6 m deep. In iteration 5 the overhangs were added to residential and educational buildings only, and in iteration 6 overhangs were also added to commercial buildings older than 50 years. The details of the input parameters varied across the different iterations are reported in Figure 11.
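The difference between the deterministic and probabilistic overhang assignments can be illustrated as follows. The text gives only the 0.5 m fixed depth and the 0.3 to 0.6 m range; drawing from a uniform distribution, the function name and the building IDs are assumptions made for this sketch.

```python
import random

def assign_overhangs(building_ids, probabilistic, seed=42):
    """Assign an overhang depth (m) to each building.

    Deterministic: every building gets 0.5 m.
    Probabilistic: depth drawn from 0.3-0.6 m; the uniform distribution
    is an assumption, as the source only states the range.
    """
    if not probabilistic:
        return {b: 0.5 for b in building_ids}
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    return {b: rng.uniform(0.3, 0.6) for b in building_ids}

ids = ["R001", "R002", "E001"]  # hypothetical building IDs
fixed_depths = assign_overhangs(ids, probabilistic=False)
sampled_depths = assign_overhangs(ids, probabilistic=True)
print(fixed_depths)
print(sampled_depths)
```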

Results
The simulated EUI (kWh/m2) of all the buildings in different zones were compared to a baseline reference value derived from the literature data noted in Table 3, to check  Figure 12 shows the distribution shape with the median & interquartile range shown as dashed lines. The colored dashed line shows the baseline EUI value for each use type.

Significance of differences between EUIs
The Kolmogorov-Smirnov (KS) test was used to assess the significance of differences between the results of each iteration for the different building classes. In the KS test, a p-value less than 0.05 means that the null hypothesis, that the two samples are drawn from the same distribution, is rejected. If the null hypothesis is not rejected, the less complex LoD would be adopted. Figure 13 suggests that iterations 1 & 2 have the maximum difference for residential buildings, due to the seasonal operation of ACs. At the neighborhood level there is a difference between iterations 2 & 3 due to the variation in building ages across the area. Other iterations do not differ significantly. Like residential buildings, educational buildings are affected most between iterations 1-2 & 2-3, due to occupancy profiles and construction details (Figure 14). Figure 15 suggests that, for commercial buildings, occupancy patterns do not vary significantly with season; thus iterations 1 & 2 show no significant difference. At the neighborhood level there is a difference between iterations 2 & 3 due to the variation in building ages across the area. This implies that differences in construction materials have more impact than other parameters.
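A two-sample KS comparison of the EUI distributions from two iterations could look like the sketch below. It is written in plain Python and uses the standard asymptotic approximation for the p-value, so it is an approximation rather than an exact test, and the paper does not state which implementation was used. The EUI samples are invented for illustration.

```python
import math
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Largest gap between the two empirical cumulative distributions."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for value in a + b:
        cdf_a = bisect_right(a, value) / len(a)
        cdf_b = bisect_right(b, value) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d

def ks_pvalue(d, n_a, n_b, terms=100):
    """Asymptotic two-sided p-value (approximation, not exact for small samples)."""
    n_eff = n_a * n_b / (n_a + n_b)
    lam = (math.sqrt(n_eff) + 0.12 + 0.11 / math.sqrt(n_eff)) * d
    if lam < 0.3:  # the series converges slowly here; the value is ~1
        return 1.0
    total = sum(
        2.0 * (-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
        for j in range(1, terms + 1)
    )
    return min(1.0, max(0.0, total))

# Hypothetical EUIs (kWh/m2) of the same 50 buildings under two iterations.
eui_iter_1 = [float(x) for x in range(100, 150)]
eui_iter_2 = [float(x) for x in range(40, 90)]

d = ks_statistic(eui_iter_1, eui_iter_2)
p = ks_pvalue(d, len(eui_iter_1), len(eui_iter_2))
print(d, p, "adopt higher LoD" if p < 0.05 else "keep simpler LoD")
```

The final line applies the decision rule from the text: the higher LoD is only adopted when the null hypothesis of identical distributions is rejected.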

Evaluating the fit-for-purpose iterations
The effective difference between two LoD combinations, represented by the p-value from the KS test, is reported against the difference in simulation time in Table 4. This comparison gives a suitable tradeoff between different LoDs and suggests the path to maximum benefit with the least effort for this test case. In the graph in Figure 16, the X-axis represents the difference in simulation time and the Y-axis represents the p-value. Iterations are distinguished by different shapes and colors across use types, as shown in Table 4. Based on the results, the graph can be divided into five segments: i) Preference 1, in the lower-left corner, represents an extremely significant change with the least effort; this change must be incorporated. ii) Preference 2, at the top left below the gray band, is an acceptable change that does not cost much effort. iii) Preference 3, in the bottom-right corner, is an extremely significant change that requires more effort and calls for a decision by the modeler. iv) Preference 4, the lowest priority, covers iterations with an acceptable change but extremely high simulation time. v) Finally, the values in the gray band are insignificant and may be avoided. Thus, the iterations highlighted in green in Table 4 are the ones selected for the different building types.
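The five segments described above amount to a two-way classification on p-value and added simulation time. The sketch below shows one possible encoding. Apart from the 0.05 significance level used for the KS test, every threshold is an assumption: the source gives no numerical boundary for "extremely significant" or for "more effort", nor the exact position of the gray band, so `p_extreme`, `time_limit_s` and the reading of the gray band as p > 0.05 are hypothetical.

```python
def classify_change(p_value, extra_time_s,
                    p_significant=0.05, p_extreme=0.01, time_limit_s=60.0):
    """Place an LoD increment into one of the five segments.

    p_extreme and time_limit_s are hypothetical thresholds; the gray band
    is assumed to cover p-values above p_significant.
    """
    if p_value > p_significant:
        return "insignificant"
    extreme = p_value <= p_extreme
    cheap = extra_time_s <= time_limit_s
    if extreme and cheap:
        return "preference 1"
    if cheap:
        return "preference 2"
    if extreme:
        return "preference 3"
    return "preference 4"

# Hypothetical (p-value, added simulation time in seconds) pairs.
for p_value, extra_time in [(0.001, 10), (0.03, 10), (0.001, 500), (0.03, 500), (0.2, 10)]:
    print(p_value, extra_time, classify_change(p_value, extra_time))
```

A modeler using a different effort metric, as the text allows, would replace the time threshold with a limit on that metric.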

Observations from test case simulation results
The simulation and data collection exercise for the pilot study in Ahmedabad indicates the need to focus on the following areas: 1. The impact of urban morphology in terms of density, height, age, & mixed use, which is heterogeneous in the city. Similar areas and building types need to be grouped, and the archetype definition must account for these differences. 2. Economic status is also vital while defining archetypes, as it relates to appliance ownership, air-conditioned area and overall energy use (Garg, Maheshwari, and Upadhyay, 2010). 3. When mixed-mode operation is not accounted for, the UBEM overestimates cooling energy use by 1.5-2 times in many buildings. Thus, data on the operation of ACs needs to be collected in more detail, especially in residential buildings. Core and perimeter zoning can separate conditioned and non-conditioned zones to account for spatial mixed mode.

Conclusion
The extensive literature analysis of 85 UBEM projects conducted in this research suggests: 1. There is no standardization of data requirements for developing UBEMs. Each research group works with its available resources and technical know-how. 2. A lack of standardized nomenclature and classification of model inputs, methodologies and modeling objectives, and of consistent metrics for reporting simulation results, makes it challenging to compare or correlate different UBEM projects. This makes it difficult for new users to find relevant case studies and adopt standardized workflows to develop a UBEM.
The model characterization framework proposed in this research takes a step toward bridging this gap. The application of this framework, as demonstrated in the case study for Ahmedabad, serves three purposes: 1. It enables the modeler to identify the LoD of the existing data. 2. Through a pilot study of a sample area, the modeler can identify the suitable LoD across different model characteristics required for different building types. 3. These pilot projects can help identify requirements for additional data, which can then be prioritized by the modeler or the research group.
The developed framework and its application through the case of Ahmedabad suggest that the choice of a higher LoD for different model characteristics may vary across building use types and urban morphology. Also, these model characteristics may not be consistently sensitive in all urban areas. Thus, it is important to conduct pilot projects with more granular data to understand this relationship in a particular urban context before scaling up the exercise. While different LoDs may be suitable for different building types and locations, there needs to be consistency in LoD across the overall UBEM of the city. Thus, the pilot projects will allow the variation to be understood and this problem to be addressed by selecting the LoD that is most predominantly observed, to maintain consistency.
This framework has a potential to be revised and updated with newer characteristics and LoD as processes change, data availability improves, and new methods are developed.
Appendix 2 gives a visual description of the framework and step-by-step application exercise, that can be used and scaled up to develop a fit-for-purpose UBEM for the entire city.

Application of the model characterization framework to various UBEM projects
The Model characterization framework and the approach toward Fit-for-purpose UBEMs presented in this research can be adopted by upcoming projects in the following manner:
1. Use the framework to study the most appropriate literature reference through its model characteristics and other attributes that match the application case attributes of your project.
2. Evaluate the available data through the framework to identify the LoDs that are achievable and the gaps in the data.
3. Study the city's urban morphology & use patterns to identify a few representative zones for conducting pilot/test projects. The zones are unique from each other but can represent other similar zones in the city.
4. A detailed analysis of these zones as pilot projects can help scale the modeling process for the entire city most effectively. These zones have distinct urban characteristics and must be dealt with individually.
5. For each zone, through the framework of achievable LoDs, perform urban simulations with the different iterations possible through all characteristics.
6. The difference between the energy use patterns from subsequent iterations can be studied with respect to the effort spent. This enables the selection of the iterations to be adopted, with a suitable tradeoff against the stages of effort, whichever is most significant for the user. In the test case presented in this research, for example, simulation time was used to assess the tradeoff. The modeler may select any other quantifiable metric to evaluate the level of effort.
7. For each representative zone, this detailed study helps in identifying the best LoD combinations through all characteristics for different building use types.
8. With the most suitable LoDs for each building type in each representative zone, the modeling exercise can now be scaled up.
9. Suitable LoDs for each representative zone can now streamline data collection and modeling efforts for other zones like them.
10. Scale up the study and develop a Fit-for-purpose UBEM of the entire city with the least effort for the maximum benefit.
Note (Table 4): The values highlighted in green represent a significant change in P-value; the iterations whose P-value cell is colored green are the ones selected.