A simulation approach of indoor temperature in existing buildings driven by short-term field measured data

ABSTRACT Simulation of indoor temperature provides an important reference for thermal environment assessment, not only for buildings at the design stage but also for existing buildings. Current thermal environment simulation software tools are suited to buildings at the design stage, but not to existing buildings. A model combining the Optimization multivariable grey prediction model (OGM(1,N)) and the Elman neural network is proposed to simulate indoor temperature. The proposed model is trained by short-term field measured data. A unit is assembled to measure and record thermal parameters in a naturally ventilated case building at half-hourly intervals between 7:00 May 29 and 6:30 June 2010. The proposed model and the referenced models are implemented in Matlab. The maximum mean deviation is 0.46°C and the maximum standard mean square deviation is 0.65°C. Three referenced indoor temperature simulation models, OGM(1,N), the Elman neural network and the Designer's Simulation Toolkit (DeST), are executed in the case building for comparison. Compared with the referenced models, the proposed model has higher accuracy and stronger robustness. This study is expected to provide a reference for thermal environment assessment in existing buildings using short-term field measured data.


Introduction
Indoor thermal environment simulation provides an important reference for building thermal environment design, assessment and energy efficiency (Andarini 2014). Simulation of indoor temperature is useful not only for buildings at the design stage but also for buildings in operation. Field measurement is a key basis for thermal assessment in existing buildings. Long-term field measurement provides a more complete reference for thermal environment assessment than short-term measurement. However, a field measurement of relatively short duration costs less and has less impact on the occupants of an existing building. On its own, short-term field measurement provides only a limited reference for thermal environment assessment, since the operation of a building varies and the weather fluctuates. Fortunately, prediction modelling can mine short-term field measurement data to provide more reference for thermal environment assessment. For example, a thermal dataset for the heating season was constructed by prediction from short-term measured data, which saved about 77% of the heating season measurement time (Sözer and Aldin 2019).
There are currently many physics-based thermal environment simulation software tools, such as DeST, EnergyPlus, Fluent Airpak and DesignBuilder. These tools are widely used and provide significant references for thermal environment design and assessment in the building industry and in academic research. They are suited to the building design stage; however, they have disadvantages when applied to an existing building. The indoor thermal environment is affected by many aspects, including local climate and weather, the thermal performance of the building envelope, building services and the behaviour of occupants. The interaction between the indoor thermal environment and these aspects is complicated and nonlinear. There is a gap between current indoor thermal environment tools and the practical operation of an existing building (Guo et al. 2014; Robinson). Occupant behaviour and the physical parameters of the indoor and outdoor environment cause gaps between on-site measurement and simulation by DeST (Chen and Liu 2017). Although the housing stock includes old houses affected by infiltration and condensation, which result in humid walls, software generally considers properly maintained buildings and assumes dry walls (Martínez-Ibernón et al. 2016). The current thermal environment simulation tools face a practical problem when used for an existing building, because they require detailed input information that is hardly available, such as the detailed structure of the building envelope. Besides the current software tools, many physics-based modelling methods have been studied as well. A dynamic model based on the thermal-electrical analogy is proposed for heat transmission simulation of a building (Giacomo, Grazia, and Cammarata 2017). A lattice Boltzmann method based 3D computational fluid dynamics technique has been implemented on the graphics processing unit for the purpose of simulating the indoor environment (Amirul et al. 2017).
The field measured data of the indoor thermal environment result from both the design and the operation of the building; therefore, they contain abundant potential information. Many data-driven prediction models have been investigated. Cornick and Kumaran (2014) proposed an empirical prediction model for indoor temperature by comparing experimental data in specific building samples. Machine learning has become an effective solution to thermal environment simulation in recent years because of its relatively high prediction accuracy. The artificial neural network (ANN) is widely used in the field of thermal environment prediction. An ANN is used to predict the hourly thermal performance data during the heating season based on short-term measured data (Sözer and Aldin 2019). An ANN is used to predict the indoor temperature of an institutional building in Australia (Afroz, Shafiullah, and Higgins 2017). Wang, Cai, and Wang (2015) develop a prediction model for indoor temperature through the support vector machine. These modelling methods offer a significant reference for the thermal environment; however, they generally require a large sample of training data. Sahar et al. (2021) combine a physics-based method and a data-driven method to predict the real-time transient temperature in a data centre; however, this hybrid prediction suffers from a quite complex structure.
"Evaluation standard for indoor thermal environment in civil buildings" (GBT50785-2012) (MOHURD 2012) is the current practical standard in China. According to standard GBT50785-2012, a set of thermal parameters should be field measured, covering outdoor and indoor temperature and outdoor and indoor relative humidity; the minimum required measurement period is 24 h. The short required measurement period reduces the disturbance to occupants and the cost of field measurement. However, a short-term field measurement provides only a limited reference for thermal environment assessment. Therefore, there is a practical and academic need for prediction of the indoor thermal environment from short-term field measured data. The grey prediction model was proposed by Deng (1982) and is widely used for simulation modelling in many fields (Lin and Liu 2000; Wu, Wang, and Yang 2012). The grey prediction model is good at uncertainty problems with small samples of training data, which are difficult to solve by traditional probabilistic and statistical theory (Zeng, Yin, and Meng 2018). It has excellent performance in prediction modelling with small data sets (Deng 1986; Zeng et al. 2016). Because of its low requirement on the size of data, the grey system theory has been widely adopted to estimate the behaviour of unknown systems (Bezuglov and Comert 2016; Wang, Wang, and Zhang 2020). The grey prediction model is selected to predict the indoor environment in this study since it is suited to prediction modelling with small data sets.
The single-variable grey prediction model GM(1,1) is used widely. Thananchai (2008) uses GM(1,1) to generate discrete time sequence data of the outdoor temperature based upon past known data of the outdoor temperature. The multivariable grey prediction model GM(1,N) is suited to indoor environment prediction since there is more than one explanatory variable. The Optimization multivariable grey prediction model (OGM(1,N)) was proposed to improve GM(1,N) by introducing a linear correction term and a grey action quantity term (Zeng et al. 2016). The linear correction term reflects the linear relations between the dependent variable and the explanatory variables. The grey action quantity term shows the degree of fluctuation in the dependent variable. OGM(1,N) achieves higher estimation accuracy than GM(1,N). In addition, OGM(1,N) improves the robustness of the traditional grey model since it is capable of modelling greater fluctuation in the dependent variable than GM(1,N).
The grey theory has many convincing applications in small data mining. However, the grey theory lacks self-adaptation and parallel computing ability; therefore, the grey system risks being oversensitive to input data. Though OGM(1,N) is an optimized model, a minor fluctuation of the input data leads to the recalculation of the entire system. Moreover, the simulation accuracy of grey prediction cannot be adjusted to a target value.
The Elman neural network was proposed in 1990 (Elman 1990). It is selected to predict the indoor environment in this study since it has high self-adaptation and strong fault tolerance. ANN is widely used in many fields since it has good robustness and fault tolerance (Jiao et al. 2016). The Elman neural network is one kind of feedback ANN, and it has been optimized and used widely for prediction modelling (Antoni and Macie 2016). The Elman neural network has strong self-adaptation and self-learning ability; therefore, a minor fault in the input data has a minor impact on the entire network. Moreover, the training process of the Elman neural network iterates until the target accuracy is achieved. However, the performance of the Elman neural network depends on a large set of training data. The Elman neural network is an effective way to overcome the weaknesses of the grey theory in modelling indoor temperature from small data sets. Therefore, OGM(1,N) and the Elman neural network are combined to simulate indoor temperature in this paper, since they compensate each other in accuracy and robustness.
This study aims to provide a reference for thermal environment assessment in existing buildings based on a relatively short period of field measurement. The proposed model is a combination of OGM(1,N) and the Elman neural network. Section 2 illustrates in detail the method of simulating indoor temperature using the proposed model. Section 3 illustrates the identification of the explanatory variables of the thermal environment and the construction of the field measured database in a case building. Section 4 demonstrates the training of the simulation model in the case building. Section 5 shows the simulation results. Section 6 is the discussion, in which three referenced indoor temperature simulation models are executed to provide a comparison with the proposed model. The three referenced models are the indoor temperature simulation model based on OGM(1,N), the indoor temperature simulation model based on the Elman neural network, and DeST (2022). Section 7 gives the conclusion and outlook of this study.

Training of the proposed model of indoor temperature prediction
The proposed model is based on the combination of OGM(1,N) and the Elman neural network. The structure and the training steps of the proposed model are mapped out in Figure 1. The training stage consists of the following five steps.

Step 1. Identifying explanatory variables
The indoor temperature is affected by many factors, including building design, building thermal physical parameters, local weather variables, building services and the behaviour of occupants. From the viewpoint of a data-driven simulation model, these factors and the indoor temperature form a complex nonlinear system, in which the indoor temperature is the dependent variable and these factors are the explanatory variables. Therefore, identifying the explanatory variables is the precondition for modelling indoor temperature prediction. The dependent variable is denoted by $X_1$. It is assumed that $N-1$ explanatory variables are identified; they are denoted by $X_2, X_3, \ldots, X_N$.
Step 2. Collecting data
A database covering the dependent variable and the explanatory variables is essential for the construction and training of the proposed model. Field measurement is a convincing way to collect data in indoor thermal environment assessment.
It is assumed that $X_1, X_2, X_3, \ldots, X_N$ are field measured over a period at equal time intervals and recorded simultaneously in an array of $N$ rows and $m$ columns:
$$X'^{(0)} = \begin{bmatrix} X_1'^{(0)}(1) & X_1'^{(0)}(2) & \cdots & X_1'^{(0)}(m) \\ X_2'^{(0)}(1) & X_2'^{(0)}(2) & \cdots & X_2'^{(0)}(m) \\ \vdots & \vdots & & \vdots \\ X_N'^{(0)}(1) & X_N'^{(0)}(2) & \cdots & X_N'^{(0)}(m) \end{bmatrix} \tag{1}$$
The elements of the field measured array are normalized into the range 0~1.0, so that variables with small numerical ranges are not swamped by variables with large numerical ranges. $X_i^{(0)}$ is the normalized sequence of $X_i'^{(0)}$. The normalization method is given in Formula (2):
$$X_i^{(0)}(e) = \frac{X_i'^{(0)}(e) - X_i'^{(0)}(\min)}{X_i'^{(0)}(\max) - X_i'^{(0)}(\min)} \tag{2}$$
where
$X_i^{(0)}(e)$: the $e$th element in the normalized field measured sequence of variable $X_i$;
$X_i'^{(0)}(e)$: the $e$th element in the field measured sequence of variable $X_i$;
$X_i'^{(0)}(\max)$: the maximum element in the field measured sequence of variable $X_i$;
$X_i'^{(0)}(\min)$: the minimum element in the field measured sequence of variable $X_i$.
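The normalization of Step 2 can be illustrated with a minimal Python sketch. It assumes min-max scaling applied to each variable (row) independently; the function names and the sample values are hypothetical, and the handling of a constant sequence is an assumption that the text does not address.

```python
def normalize_sequence(seq):
    """Min-max normalize one field measured sequence into [0, 1]."""
    lo, hi = min(seq), max(seq)
    if hi == lo:
        # Assumption: a constant sequence has no range, so map it to 0.0.
        return [0.0 for _ in seq]
    return [(value - lo) / (hi - lo) for value in seq]


def normalize_array(array):
    """Normalize every row (one variable per row) independently."""
    return [normalize_sequence(row) for row in array]


# Hypothetical measured values, for illustration only.
measured = [
    [24.0, 25.0, 26.0, 28.0],  # X1: indoor temperature
    [20.0, 22.0, 30.0, 25.0],  # X2: outdoor temperature
]
normalized = normalize_array(measured)
```

Each row is scaled with its own minimum and maximum, so every variable spans the same 0~1.0 range regardless of its unit.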

Step 3. Dividing data
The normalized field measured array is called the parent array in this paper. The parent array is divided into several child arrays, which has two advantages. Firstly, the division improves the quality of OGM(1,N), since it disperses the risk of abnormal data, which are unavoidable in measurement. Secondly, the division produces a larger amount of output data from the models based on OGM(1,N), which provides more input data for the Elman neural network in the next step.
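The division of the parent array can be sketched as follows. This is a minimal illustration that assumes consecutive, equal-size child arrays whose length is the integer part of the parent length divided by the number of child arrays; dropping any leftover trailing columns is an assumption, and the function name is hypothetical.

```python
def divide_parent_array(parent, ti):
    """Split a parent array (one variable per row) into ti consecutive child arrays."""
    m = len(parent[0])   # number of elements in each parent sequence
    M = m // ti          # number of elements in each child sequence
    children = []
    for c in range(ti):
        start = c * M
        # Assumption: columns beyond ti * M, if any, are discarded.
        children.append([row[start:start + M] for row in parent])
    return children


# Hypothetical parent array with 2 variables and 7 measurements.
parent = [
    [0, 1, 2, 3, 4, 5, 6],
    [10, 11, 12, 13, 14, 15, 16],
]
children = divide_parent_array(parent, 2)
```

With a two-day parent array of 96 columns and two child arrays, this scheme would give 48 columns per child array.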
It is supposed that the parent array is divided evenly into $ti$ child arrays and that there are $M$ elements in each child sequence, so $M$ equals the integer part of $m/ti$. The child sequences form $ti$ child arrays, each with $N$ rows and $M$ columns.
Step 4. Modelling of indoor temperature by OGM(1,N)
The modelling of indoor temperature by OGM(1,N) consists of two sub-steps.
Step 4.1 Construction of models based on OGM(1,N) using child arrays
Each child array is used to construct a model based on OGM(1,N) for indoor temperature simulation, and each constructed model is expressed by a vector of coefficients $A_c$ ($c = 1, 2, \ldots, ti$). The construction of the models is the implementation of Formula (3), Formula (4) and Formula (5). Since each child array yields one model, there are $ti$ vectors of coefficients in total. $X_i(c)(G)$ ($i = 1, 2, \ldots, N$; $c = 1, 2, 3, \ldots, ti$) is the accumulated sequence of $X_i(c)$ in terms of Formula (4); $A_c$ ($c = 1, 2, \ldots, ti$) is calculated with the least squares method by Formula (5). Once the coefficient sequence $A_c$ has been calculated as a constant sequence, the model based on OGM(1,N) is constructed.
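The accumulated sequence used in Step 4.1 can be illustrated with a short Python sketch. It assumes the standard first-order accumulated generating operation of grey theory (a running sum) and its inverse; whether Formula (4) has exactly this form is an assumption, and the function names are hypothetical.

```python
from itertools import accumulate


def accumulate_sequence(seq):
    """First-order accumulation: element k is the sum of the first k elements."""
    return list(accumulate(seq))


def restore_sequence(acc):
    """Inverse accumulation: recover the original sequence by differencing."""
    if not acc:
        return []
    return [acc[0]] + [acc[k] - acc[k - 1] for k in range(1, len(acc))]


# Hypothetical sequence, for illustration only.
original = [1, 2, 3, 4]
accumulated = accumulate_sequence(original)
restored = restore_sequence(accumulated)
```

Accumulation smooths the randomness of the raw sequence before the coefficients are fitted, and the inverse operation maps an accumulated simulation sequence back to the original scale.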
Step 4.2 Simulation of indoor temperature based on the constructed models of OGM(1,N)
The array of $X_i^{(0)}(e)$ ($i = 2, \ldots, N$; $e = 1, 2, \ldots, m$) is input into the constructed models of OGM(1,N) to yield the simulated indoor temperature sequences in terms of Formula (7) and Formula (8). $G\_X_1(c)(G)$ is the normalized accumulated simulation sequence of indoor temperature using the constructed model based on OGM(1,N) of child array $c$, calculated by Formula (6).
Step 5. Modelling of indoor temperature by Elman neural network
The structure of the Elman neural network is composed of an input layer, a hidden layer, a connection layer and an output layer, as mapped in Figure 1. Each layer contains one or more artificial neurons, and each artificial neuron corresponds to a sequence of a variable. Each connected pair of neurons located in different layers is assigned a random initial weight. The weights between neurons are adjusted by iterative training steps. Compared with traditional neural networks such as the Back Propagation network, the Elman neural network has an additional feedback layer, called the connection layer, which acts as a delay operator between the input layer and the hidden layer. The connection layer operates a backward feedback loop with the hidden layer. It remembers the past state and maps the dynamic features by storing its internal state. The output of the connection layer is used as a new input to the hidden layer at the next iterative training step. Therefore, the Elman neural network is stronger than traditional neural networks in robustness and fault tolerance because of the dynamic memory of the connection layer.
The parent array and the normalized simulation sequences of indoor temperature by OGM(1,N) are input into the Elman neural network to train it. The normalized sequence of field measured indoor temperature $X_1^{(0)}$ is used as the instructor value for neuron net3(1) in the output layer, so there is one artificial neuron in the output layer. The normalized sequences of the field measured explanatory variables $X_i^{(0)}$ ($i = 2, 3, \ldots, N$) relate to the input layer: $X_2^{(0)}$ relates to neuron net1(1), $X_3^{(0)}$ relates to neuron net1(2), and so forth, until $X_N^{(0)}$ relates to neuron net1(N-1). The normalized simulation sequences of indoor temperature by OGM(1,N) also relate to the input layer: $G\_X_1(1)$ relates to neuron net1(N), $G\_X_1(2)$ relates to neuron net1(N+1), and so forth, until $G\_X_1(ti)$ relates to neuron net1(p). It is supposed that there are $p$ artificial neurons in the input layer; therefore, $p$ equals $N-1+ti$.
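The mapping of sequences to input neurons can be sketched in Python as stacking the explanatory sequences and the OGM(1,N) simulation sequences into one input array. The function name and the sample values are hypothetical, and the row order simply follows the neuron order described above.

```python
def build_elman_input(explanatory_rows, ogm_rows):
    """Stack N-1 explanatory sequences and ti OGM(1,N) sequences, one per input neuron."""
    rows = [list(r) for r in explanatory_rows] + [list(r) for r in ogm_rows]
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all sequences must have the same length")
    return rows


# Hypothetical normalized values, for illustration only.
explanatory = [
    [0.1, 0.2],  # X2 -> net1(1)
    [0.3, 0.4],  # X3 -> net1(2)
    [0.5, 0.6],  # X4 -> net1(3)
    [0.7, 0.8],  # X5 -> net1(4)
]
ogm_outputs = [
    [0.20, 0.30],  # simulation by the model of child array 1 -> net1(5)
    [0.25, 0.35],  # simulation by the model of child array 2 -> net1(6)
]
elman_input = build_elman_input(explanatory, ogm_outputs)
p = len(elman_input)  # number of input neurons, N - 1 + ti
```

With five variables in total and two child arrays, the count is 5 - 1 + 2 = 6 input neurons.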
It is supposed that there are $n$ artificial neurons in the hidden layer and $n$ artificial neurons in the connection layer. An integer is assigned to $n$ in the training process; a bigger $n$ gives a more stable network but a heavier calculation load. The weight between neuron $i$ in the input layer and neuron $j$ in the hidden layer is denoted by $\omega2_{ij}$ ($i = 1, 2, \ldots, p$; $j = 1, 2, \ldots, n$). The weight between neuron $i$ in the connection layer and neuron $j$ in the hidden layer is denoted by $\omega r2_{ij}$ ($i = 1, 2, \ldots, n$; $j = 1, 2, \ldots, n$). The weight between neuron $i$ in the hidden layer and the neuron in the output layer is denoted by $\omega3_{i1}$ ($i = 1, 2, \ldots, n$).
The training process of the Elman neural network iterates until the simulation accuracy achieves the target value or the declared maximum number of steps is reached. Each training step is composed of a forward propagation and a backward propagation. In the forward propagation of the $t$th training step, $net1(i)_t$ ($i = 1, 2, \ldots, p$) is the output sequence of neuron $i$ in the input layer, $net2(i)_t$ ($i = 1, 2, \ldots, n$) is the output sequence of neuron $i$ in the hidden layer, and $Rnet2(i)_t$ ($i = 1, 2, \ldots, n$) is the output sequence of neuron $i$ in the connection layer. $net2(j)_t$ ($j = 1, 2, \ldots, n$) is calculated by Formula (10), where $f_2$ is the activation function of the hidden layer, usually a nonlinear function such as the Sigmoid function. $Rnet2(i)_t$ ($i = 1, 2, \ldots, n$) is calculated by Formula (11): the output of the connection layer in the $t$th forward propagation is the same as the output of the hidden layer in the $(t-1)$th forward propagation.
$$Rnet2(i)_t = net2(i)_{t-1}, \quad i = 1, 2, \ldots, n \tag{11}$$
The output sequence of the neuron in the output layer in the $t$th forward propagation is denoted by $net3(1)_t$ and calculated by Formula (12).
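A forward propagation of this kind can be sketched in plain Python. The sketch is illustrative only: it assumes that each hidden neuron applies a Sigmoid to the weighted sum of the input-layer and connection-layer outputs, that the output neuron is a linear weighted sum of the hidden outputs, that there are no bias terms, and that the connection layer starts at zero. The function names and argument layout are hypothetical.

```python
import math


def sigmoid(x):
    """Sigmoid activation, assumed here for the hidden layer."""
    return 1.0 / (1.0 + math.exp(-x))


def elman_forward(inputs, w2, wr2, w3, context):
    """One forward propagation.

    inputs:  p input-layer outputs
    w2:      p x n weights, input layer -> hidden layer
    wr2:     n x n weights, connection layer -> hidden layer
    w3:      n weights, hidden layer -> single output neuron
    context: n connection-layer outputs (hidden outputs of the previous step)
    """
    n = len(w3)
    hidden = []
    for j in range(n):
        total = sum(w2[i][j] * inputs[i] for i in range(len(inputs)))
        total += sum(wr2[i][j] * context[i] for i in range(n))
        hidden.append(sigmoid(total))
    output = sum(w3[i] * hidden[i] for i in range(n))  # linear output activation
    return output, hidden


def run_steps(samples, w2, wr2, w3):
    """Run consecutive forward propagations, feeding hidden outputs back as context."""
    context = [0.0] * len(w3)  # assumption: the connection layer starts at zero
    outputs = []
    for sample in samples:
        output, hidden = elman_forward(sample, w2, wr2, w3, context)
        outputs.append(output)
        context = hidden  # connection layer stores the previous hidden output
    return outputs


# Hypothetical tiny network: one input neuron, one hidden neuron.
outputs = run_steps([[1.0], [1.0]], w2=[[0.0]], wr2=[[1.0]], w3=[1.0])
```

In this toy run the input weight is zero, so the second output differs from the first only because the connection layer feeds the previous hidden output back into the hidden layer.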
$f_3$ is the activation function of the output layer, which can be defined as a linear function. The weights in the Elman neural network are adjusted in the backward propagation by the method of gradient descent. In the $t$th backward propagation, the weights of each pair of neurons are adjusted by Formula (13) and Formula (14).
The training process terminates when the target value of simulation accuracy or the fixed maximum number of steps is reached; each weight is then assigned a constant value. A set of constant weights indicates that the neural network is completely trained.
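For orientation, a generic gradient descent update is written out below. It is a sketch under stated assumptions rather than a reproduction of Formula (13) and Formula (14): the error $E_t$ is assumed to be half the squared difference between the instructor sequence and the network output, summed over the $m$ elements, $a$ stands for the learning rate, and $\omega$ stands for any one of the weights $\omega2_{ij}$, $\omega r2_{ij}$ or $\omega3_{i1}$.

```latex
E_t = \frac{1}{2} \sum_{e=1}^{m} \left( X_1^{(0)}(e) - net3(1)_t(e) \right)^2,
\qquad
\omega_{t+1} = \omega_t - a \, \frac{\partial E_t}{\partial \omega_t}
```

A smaller learning rate gives smaller weight changes per step, which tends to make training slower but steadier.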

Simulation of indoor temperature using the trained model combining OGM(1,N) and Elman neural network
For an existing building, the sequences of explanatory variables $X_2, X_3, \ldots, X_N$ are input into the trained model, and the output of the trained model is the simulation sequence of indoor temperature. Besides field measurement, there are several alternative ways to collect the sequences of explanatory variables, such as current weather databases and simulation with current software. It is supposed that there are $q$ elements in each sequence of explanatory variables. Since $q$ is bigger than $m$, the period of thermal environment assessment is extended.

Identifying explanatory variables
The indoor temperature is affected by many factors. From the viewpoint of a data-driven simulation model, these factors and the indoor temperature form a complex nonlinear system, in which the indoor temperature is the dependent variable and these factors are the explanatory variables. Identifying the explanatory variables is essential for data-driven modelling of indoor temperature.
The selection of explanatory variables for indoor temperature simulation follows three principles in this study. Firstly, the explanatory variable has a statistical association with indoor temperature, which is the foundation of data-driven simulation modelling. Secondly, the cost of collecting the values of the explanatory variable is acceptable; for instance, the variable is already held in a current database or can be field measured at a relatively low cost. Thirdly, the explanatory variable is consistent with the practice of building indoor environment assessment. "Evaluation standard for indoor thermal environment in civil buildings" (GBT50785-2012) (MOHURD 2012) is the current standard for thermal environment assessment in China. According to standard GBT50785-2012, a set of thermal parameters should be field measured, including outdoor and indoor temperature and outdoor and indoor relative humidity. As a result, the dependent variable is the indoor temperature, denoted by $X_1$, and the explanatory variables are identified as the outdoor temperature, denoted by $X_2$, the outdoor relative humidity, denoted by $X_3$, the outdoor wind velocity, denoted by $X_4$, and the corresponding recording time, denoted by $X_5$. These variables are included in the local weather databases integrated in current tools; for example, DeST contains typical year weather data covering more than two hundred cities in China.

Construction of field measured database in a case building
A field measurement is carried out in a case building, which is located on a university campus in Hangzhou, Zhejiang province, China. Hangzhou lies at 30.2° north latitude and 120.2° east longitude and belongs to the hot summer and cold winter climate zone of China. The architectural plan of the 3rd floor of the case building is mapped in Figure 2. A measurement system is developed to measure and record the identified variables and some occupant behaviours (Yang, Liu, and Ying 2015). A photo of the indoor measurement unit installed in Room 3 is shown in Figure 3. A photo of the outdoor measurement unit installed on the roof of a nearby building is shown in Figure 4. The measurement is carried out for a year at intervals of 30 min. The measurement data between 7:00 May 29 and 6:30 June 2010 are selected as the data set for the temperature simulation of the naturally ventilated Room 3. The measurement covers the variables $X_1$, $X_2$, $X_3$, $X_4$ and $X_5$. An array with 5 rows and 624 columns is collected by field measurement as the parent array, as in Formula (15). One data point contains one simultaneous measurement of $X_1$, $X_2$, $X_3$, $X_4$ and $X_5$. There are 624 data points in total, numbered No. 1~No. 624.

The association analysis between the indoor temperature and explanatory variables
The grey association analysis not only validates the selection of explanatory variables for indoor temperature simulation but also provides a reference for weighting the explanatory variables. Grey theory uses the Grey Association Coefficient (abbreviated as $gac_{1i}$) as the indicator of the association between indoor temperature and the explanatory variable $X_i$ ($i = 2, 3, \ldots, N$). The range of $gac_{1i}$ is [0, 1]. A greater $gac_{1i}$ indicates a closer association between the explanatory variable $X_i$ and indoor temperature. Many methods are used for weighting indicators, such as regression analysis, variance analysis and principal component analysis; however, these methods require large data sets and a specific probability distribution of the data. The grey association analysis is suited to small data sets; furthermore, it does not require a specific probability distribution of the data.
The grey association analysis consists of the following steps.
Step 1. Calculation of the normalized array $X_i$.
Step 3. Calculation of $gac_{1i}$ with Formula (19), where $\varepsilon$ is the distinguishing coefficient, usually set to the constant 0.5.
The Grey Association Coefficients calculated in the case building are: $gac_{12}$ of 0.743, $gac_{15}$ of 0.657, $gac_{14}$ of 0.613 and $gac_{13}$ of 0.586. The order of importance is "$X_2$, outdoor temperature" > "$X_5$, the corresponding recording time" > "$X_4$, outdoor wind velocity" > "$X_3$, outdoor humidity". The selected variables therefore have an important impact on the indoor temperature, and the selection of the explanatory variables is validated.
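The grey association analysis can be illustrated with a minimal Python sketch. It assumes Deng's classical grey relational formulation: absolute differences between the normalized reference sequence and each normalized comparison sequence, a pointwise coefficient with distinguishing coefficient 0.5, and an average over all points. Whether Formulas (16) to (19) take exactly this form is an assumption, and the function name and sample values are hypothetical.

```python
def grey_association(reference, comparisons, epsilon=0.5):
    """Return one association value in [0, 1] per comparison sequence."""
    # Absolute difference sequences between the reference and each comparison.
    deltas = [[abs(r - c) for r, c in zip(reference, seq)] for seq in comparisons]
    d_min = min(min(row) for row in deltas)  # global minimum difference
    d_max = max(max(row) for row in deltas)  # global maximum difference
    if d_max == 0:
        # All sequences coincide with the reference.
        return [1.0 for _ in comparisons]
    results = []
    for row in deltas:
        coefficients = [(d_min + epsilon * d_max) / (d + epsilon * d_max) for d in row]
        results.append(sum(coefficients) / len(coefficients))
    return results


# Hypothetical normalized sequences, for illustration only.
indoor = [0.0, 0.5, 1.0]
candidates = [
    [0.0, 0.5, 1.0],  # identical to the reference
    [1.0, 0.5, 0.0],  # mirrored reference
]
association = grey_association(indoor, candidates)
```

Ranking the returned values in descending order gives an order of importance of the kind reported for the case building.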

Selection of training data
The proposed model aims to simulate the indoor temperature with a small set of training data. An array covering two days of the field measured array is selected as the input data to train the proposed model, whereas the remaining data are used to validate the simulation of the proposed model. The training data account for about 15% of the field measured data, whereas the validation data account for about 85%. According to the "Evaluation standard for indoor thermal environment in civil buildings" (GBT50785-2012), 24 h is the minimum period for the measurement of thermal parameters. The proposed model can be used directly to extend the period of thermal environment assessment; therefore, it provides a reference for the current practice of thermal environment assessment in China. A selected array of two days is used as a parent array in the proposed model. Four parent arrays, Array A, Array B, Array C and Array D, are each selected to train the model in order to investigate the suitability of the proposed model. The ranges of data points of the four selected parent arrays are listed in Table 1.
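The quoted shares of training and validation data follow from the measurement interval of 30 min and the total of 624 data points, as the short worked calculation below shows.

```latex
\begin{aligned}
\text{data points per day} &= \frac{24\ \text{h}}{0.5\ \text{h}} = 48, \\
\text{training data points} &= 2 \times 48 = 96, \\
\text{training share} &= \frac{96}{624} \approx 15.4\%, \\
\text{validation share} &= \frac{624 - 96}{624} = \frac{528}{624} \approx 84.6\%.
\end{aligned}
```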

Programming
A program is developed to implement the modelling of indoor temperature by the proposed model. Matlab is selected as the platform since it provides sufficient tools for matrix computation and neural networks. The programming covers four parts: (1) definition of functions; (2) modelling of the indoor temperature based on OGM(1,N); (3) modelling of the indoor temperature based on Elman neural network; (4) modelling of the indoor temperature combining OGM(1,N) and Elman neural network.

Training of the proposed models
The four parent arrays are inputted into the proposed model to train it, respectively. The training process of Parent array A is demonstrated as an example.
The model is trained with parent array A by the following main steps.
Step 1. The parent array A is normalized using Formula (2), yielding an array of 5 rows and 96 columns in Formula (21).
Step 2. The normalized parent array A is divided evenly into two child arrays.
Step 3. The two child arrays are used to construct two models based on OGM(1,N). The two coefficient sequences ($A_1$, $A_2$) are listed in Table 2.
Step 5. [X] is input into the Elman neural network to train it, and $X_1^{(0)}$ is input into the Elman neural network as the tutor of the output values. The number of neurons in the input layer is 6. The number of neurons in the hidden layer is set to 15, the same as in the connection layer. The number of neurons in the output layer is 1. The learning rate $a$ is 0.001. The Elman neural network is trained with 1000 iterative steps and a target simulation accuracy of 0.0001. The weights of the trained Elman neural network are then fixed as constants.
The explanatory array is input into the trained models. The simulation results of the models trained by Array A, Array B, Array C and Array D are shown in Figures 5, 6, 7 and 8, respectively. The shapes of the curves in Figures 5-8 show the consistency between the simulated temperature and the field measured temperature. The simulated indoor temperature is highly consistent with the field measured data at the training stage. The mean and the standard deviation were used by Sözer and Aldin (2019) to evaluate the consistency between the simulated and the measured indoor temperature at the simulation stage in prediction with short-term measured data. The mean of the measured indoor temperature is denoted by MMean and calculated by Formula (22). The mean of the simulated indoor temperature is denoted by SMean and calculated by Formula (23). The standard deviation of the measured indoor temperature is calculated by Formula (24). The standard deviation of the simulated indoor temperature is calculated by Formula (25).
$X_1'^{(0)}$ is the measured indoor temperature value, and $SX_1'^{(0)}$ is the simulated indoor temperature value. The calculated means and standard deviations are listed in Table 3. Compared with the values in Sözer and Aldin (2019), both the simulated mean and the simulated standard deviation are quite close to the measured ones. Therefore, the simulation data are accepted as consistent with the field measured indoor temperature in both the simulation stage and the training stage.
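The comparison of means and standard deviations can be sketched with the Python standard library. The sketch assumes the population standard deviation (division by the number of elements); whether Formula (24) and Formula (25) use the population or the sample form is an assumption, and the temperature values are hypothetical.

```python
from statistics import mean, pstdev


def summarize(values):
    """Return the mean and the population standard deviation of a sequence."""
    return mean(values), pstdev(values)


# Hypothetical indoor temperatures in degrees Celsius, for illustration only.
measured = [24.0, 26.0, 25.0, 25.0]
simulated = [24.5, 25.5, 25.0, 25.0]

m_mean, m_std = summarize(measured)   # measured mean and standard deviation
s_mean, s_std = summarize(simulated)  # simulated mean and standard deviation
```

Close agreement of both pairs of values suggests that the simulation reproduces the level and the spread of the measured temperature, although it does not by itself show point-by-point agreement.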
The measured indoor temperature provides the basis for the analysis of simulation accuracy. RD is the relative deviation between the simulated and the field measured indoor temperature, calculated by Formula (26). MRD is the arithmetic mean of the relative deviation between the simulated and the field measured indoor temperature, calculated by Formula (27). MD is the mean deviation between the simulated and the field measured indoor temperature, calculated by Formula (28). SMSD is the standard mean square deviation between the simulated and the field measured indoor temperature, calculated by Formula (29). A smaller deviation means a higher simulation accuracy. The calculated simulation accuracy of the trained models is listed in Table 4.
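The four accuracy indicators can be sketched in Python as follows. The definitions are assumptions made for illustration: RD is taken as the absolute deviation divided by the measured value, MRD as the mean of RD, MD as the mean absolute deviation, and SMSD as the square root of the mean squared deviation. The exact forms of Formulas (26) to (29) may differ, and the function name and sample values are hypothetical.

```python
import math


def accuracy_indicators(measured, simulated):
    """Return assumed forms of maximum RD, MRD, MD and SMSD."""
    deviations = [s - m for s, m in zip(simulated, measured)]
    # Relative deviation of each point, in percent of the measured value.
    rd = [abs(d) / abs(m) * 100.0 for d, m in zip(deviations, measured)]
    count = len(deviations)
    return {
        "max_RD": max(rd),
        "MRD": sum(rd) / count,
        "MD": sum(abs(d) for d in deviations) / count,
        "SMSD": math.sqrt(sum(d * d for d in deviations) / count),
    }


# Hypothetical indoor temperatures in degrees Celsius, for illustration only.
indicators = accuracy_indicators(measured=[20.0, 25.0], simulated=[21.0, 24.0])
```

MD and SMSD are expressed in degrees Celsius, so they can be compared directly with an instrument accuracy such as ±0.5°C, whereas RD and MRD are percentages.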
In terms of Table 3, the maximum MRD is 1.76%, for the model trained by Array A, whereas the minimum MRD is 1.43%, for the model trained by Array D. The maximum RD is 6.32%. The required accuracy of temperature measurement instruments is ±0.5°C according to the "Evaluation standard for indoor thermal environment in civil buildings" (GBT50785-2012) (MOHURD 2012) in China. Although, to the authors' current knowledge, no exact accuracy requirement for indoor temperature simulation in civil buildings has been found, the required accuracy of temperature measurement instruments can serve as a reference benchmark for simulation. Table 4 shows that the MD is lower than 0.5°C and the SMSD is close to 0.5°C. Therefore, the simulation accuracy of the proposed model is satisfactory for the assessment of the indoor thermal environment in existing civil buildings.

Discussion
Two simulation models based on data mining are executed, respectively, to provide comparison for analysis of the performance of the proposed simulation model. The simulation model based on OGM(1,N) and the simulation model based on Elman neural network are executed, respectively. The proposed model in this paper is based on the combination of OGM(1,N) and Elman neural network. The four parent arrays of field measurement in the case building are inputted into model based on OGM(1,N) to train it, respectively. The four parent arrays are inputted into model based on Elman neural network to train it. The array of explanatory variables is inputted into the trained models to yield simulated indoor temperature. The comparison of simulation results using models trained by Array A is listed in Figure 9. The comparison of simulation results using models trained by Array B is listed in Figure 10. The comparison of simulation results using models trained by Array C is listed in Figure 11. The comparison of simulation results using models trained by Array D is listed in Figure 12.
The patterns in Figures 9-12 indicate two features. Firstly, among the models, the simulated values of the model based on the combination of OGM(1,N) and the Elman neural network are the most consistent with the field measured indoor temperature. Secondly, the combination model is more robust than the other two models: the selection of the parent array has a greater impact on the OGM(1,N) model and the Elman neural network model than on the combination model. Figure 9 shows that the Elman neural network model trained by Array A deviates most from the measurement. Figure 11 shows that both of the model of Elman neural network and the model of    N). The small size of the input data is the main reason for the weaker performance of the Elman neural network model in this study. The simulation performance of the combination model is strengthened, in both accuracy and robustness, because the grey system and the neural network compensate for each other.
The comparison of simulation accuracy is listed in Table 5. The simulation accuracy of the model based on the combination of OGM(1,N) and Elman neural network is the highest among the models.
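One possible way to put a number on the robustness argument is to look at how much an accuracy indicator changes when the parent array changes. The sketch below is an assumption about how this could be done and is not the procedure used in this paper: it takes the range (maximum minus minimum) of an indicator such as MD over the parent arrays and ranks the models by that range. The helper names, the model labels `model_x` and `model_y`, and the numeric values are hypothetical placeholders, not results from Table 5.

```python
def spread_across_arrays(metric_by_array):
    """Range (max - min) of one accuracy indicator over the parent arrays."""
    values = list(metric_by_array.values())
    if not values:
        raise ValueError("at least one parent array is required")
    return max(values) - min(values)


def rank_by_robustness(metric_by_model):
    """Model names ordered from smallest to largest spread across arrays."""
    return sorted(
        metric_by_model,
        key=lambda name: spread_across_arrays(metric_by_model[name]),
    )


# Placeholder MD values in °C per parent array; not taken from the paper.
md_by_model = {
    "model_x": {"A": 0.4, "B": 0.5},
    "model_y": {"A": 0.3, "B": 0.9},
}
ranking = rank_by_robustness(md_by_model)
print(ranking)
```

A smaller spread means the model is less sensitive to which parent array is used for training, which is the sense of robustness used in the discussion.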
The simulation tool DeST, a physics-based thermal simulation tool, is executed to provide a further comparison. The indoor temperatures simulated by DeST and by the simulation models in this paper are plotted in Figure 13. The indoor temperature simulated by DeST deviates greatly from the field measured value. One reason for the gap is that the outdoor climate embedded in DeST is typical-year data, which deviates from the field measured outdoor environment. Another reason is that the actual operation of the building differs from the design pattern embedded in DeST. Compared with the indoor temperature simulated by DeST, that of the model combining OGM(1,N) and the Elman neural network is highly consistent with the field measured indoor temperature. This indicates that the proposed model suits thermal environment simulation in existing buildings, whereas DeST suits the design stage of the building thermal environment.

Summary and outlook
Simulation of indoor temperature provides important references for the thermal environment, not only for building design but also for the assessment of existing buildings. Current thermal environment simulation software tools suit the building design stage but do not suit existing buildings. Field measurement is a convincing method for assessing the indoor thermal environment in existing buildings; however, long-period measurement is costly and causes more disturbance to occupants. Small-size field measured data are commonly available in building thermal assessment practice according to the "Evaluation standard for indoor thermal environment in civil buildings" (GB/T 50785-2012). Therefore, there is a need, in both practice and academic research, for prediction of the indoor thermal environment from short-term field measured data. This study aims to provide more reference for indoor thermal environment assessment and design. This research is summarized as follows:
(1) A model is proposed to simulate indoor temperature, trained by small-size field measured data from an existing building. The proposed model is based on the combination of OGM(1,N) and the Elman neural network. OGM(1,N) has convincing modelling ability in mining small-size data; however, it lacks self-adaptation and parallel computing ability. The Elman neural network has strong self-adaptability and fault tolerance in prediction; however, its high performance depends on a large training data set. Therefore, OGM(1,N) and the Elman neural network compensate for each other, giving a stable structure and good accuracy with small-size training data.
(2) A unit is designed and installed in a case building in Hangzhou, China, to measure and record thermal parameters. The field measurement is carried out for a year at half-hourly intervals, covering indoor temperature, outdoor temperature, outdoor relative humidity, outdoor wind velocity and the recording time. Twelve days of field measured data from the second season in the naturally ventilated building are selected as the database. Four selected 2-day sets of field measured data are each used to train the proposed model, and the remaining data are used to validate it. The training and validation of the proposed model are implemented by programming in Matlab. The proposed model is validated by analysing the consistency and accuracy of the simulated indoor temperature against the field measured data.
(3) Two simulation models driven by data mining, one based on OGM(1,N) and one based on the Elman neural network, are executed to provide a comparison for the proposed simulation model. The indoor temperature simulated by the model based on the combination of OGM(1,N) and the Elman neural network is the most consistent with the field measured indoor temperature, and its simulation accuracy is the highest among the three models. The combination model is also more robust than the other two models. The simulation tool DeST is executed to provide a further comparison. It is evident that the proposed model suits thermal environment simulation for existing buildings, whereas DeST suits the design stage of buildings.
The authors hope that the following issues will be investigated in the future to strengthen this study. (1) A relatively large gap is observed between the simulated indoor temperature and the field measurement around the peak values in the simulation stage in Figures 5-8. How the peak values of the explanatory variables in the extended period influence the model will be investigated. (2) This study was carried out in the second season. How the model works in other seasons, especially the third season, is worth further validation. In summer and winter, the cooling system and the heating system become the main explanatory variables; how the model should be revised to provide a reference for thermal environment assessment in those seasons will be investigated in the future.