Simulating urban growth through case-based reasoning

ABSTRACT Case-based reasoning (CBR) simplifies knowledge acquisition and is suitable for researching complex geographical problems. However, CBR analyses of land-use changes are difficult to apply in the study of urban growth due to shortcomings in the case structure and model algorithms. In response, this study proposes a three-step urban-growth simulation model based on CBR (UGSCBR). First, to adapt CBR to an urban-growth simulation process, the characteristics of regional differentiation in geographical spaces are determined. Second, a comprehensive retrieval method is developed that improves upon traditional case-retrieval methods by making full use of every component of the case. Third, a quantity demand constraint indirectly adds a time factor to resolve the blurriness of the traditional CBR-inference cycle. Taking Jixi city as the research area, we test the accuracy of the proposed model. The total accuracy of the simulation results is 95.4%, and the Kappa is 87.4%. The figure of merit and Matthews correlation coefficient are 0.151 and 0.23, respectively, indicating that the model can meet the application requirements. The results show that the UGSCBR model is flexible and simple, and that it provides an effective prediction method for urban growth.


Introduction
Urban growth is the process of cities spreading outward to occupy other types of land (Mookherjee & Hoerauf, 2015). It is the result of economic, social, political, legal, institutional, and environmental factors that interact in a parallel and sometimes conflicting manner (Campos et al., 2018). Spatiotemporal land-use and land-cover change simulations are effective and reproducible tools for assessing and analyzing the process of dynamic urban change (Liu et al., 2017). Scholars have used a variety of simulation models to study the expansion of different cities. For example, the Markov model (Ahmad et al., 2017; Coppedge et al., 2007) can predict changes in quantity over a certain period of time, but it is a non-spatial model. The system dynamics model (Z. Liu et al., 2019; Tan et al., 2018; Wu & Ning, 2018) can effectively explain the function of complex systems and their relationship with structure, but the spatial scale effect in earth science cannot be integrated into this model. The Bayesian network is an effective method for simulating land-use change in spatial regions, but it requires extensive assumptions, such as a complete dataset, no preferred selection, and discontinuous variables (Ouyang et al., 2019). Cellular automata (CA; Cheng & Masser, 2004; Firozjaei et al., 2019; Jafari et al., 2017; Shafizadeh-Moghadam et al., 2021) are a very useful tool for simulating complex systems and a common method of simulating urban growth; however, the process of acquiring their transformation rules is complex, and it is difficult to describe the geographical significance of complex rules. From the perspective of existing applications, these methods and models each have their own pros and cons (Du et al., 2010), making it necessary to explore new ideas that can solve the problems of urban growth simulation.
Case-based reasoning (CBR) is an important approach in the field of artificial intelligence that relies on having enough historical data to conduct quantitative analyses and predictions of phenomena without knowing the development mechanisms (Du et al., 2010; Mantaras et al., 2006). It is essentially memory-based reasoning, using specific case knowledge from previous experiences to solve similar problems; it therefore conforms to human cognitive processes (Aamodt & Plaza, 1994; Watson, 1999). It eliminates the need for explicit domain knowledge models, avoids the bottleneck of knowledge acquisition, and simplifies knowledge acquisition and accumulation while improving problem-solving efficiency and solution quality (Du et al., 2002).
Currently, in the field of GIScience, CBR is being applied to forest fire monitoring (W. Liu et al., 2015), urban disaster prevention and mitigation, water environment simulation (Liao et al., 2019), mapping (Dou et al., 2015; Du et al., 2013; Machado et al., 2019), and other research directions. Few studies, however, have used CBR to examine the spatial patterns and evolution of geographical phenomena. At present, such research falls into two types: CA simulations of land-use change in which static rules are replaced with CBR, and direct CBR reasoning about land-use change. The former is represented by the CBR-based CA model proposed by Li and Liu (2006), which showed that CBR can effectively replace static rules and handle the internal differences of complex systems that the traditional rule-based CA model could not. However, this research focused on the mining of transformation rules using CBR, without in-depth discussion of CBR's application in land-use change analysis. The latter is represented by Du et al. (2010), who proposed using CBR to analyze land-use change and showed that the CBR method is simpler and more flexible than the CA model (Du et al., 2012). However, with the approaches in the present literature, it is difficult to simulate the spatial pattern formation processes of a complex system, and the single-direction reasoning of previous case-retrieval results lacks comprehensive analysis based on historical experience. In addition, the lack of time-factor control and the blurry CBR-reasoning cycle make it difficult to determine the period to which the simulation result applies.
In summary, the CBR method can implicitly describe the rules of urban pattern transformations based on historical cases. It overcomes the rule-acquisition bottleneck of traditional models and performs better when simulating complex areas. However, the existing research employing CBR to study land-use change has shortcomings with respect to case structures and model algorithms, making the CBR reasoning incomplete. In addition, the differences between the research objectives of those studying land-use change and urban growth make it difficult to apply this method to land-urbanization research. Thus, this study aims to improve the existing CBR land-use change analysis method (Du et al., 2010) and establish a CBR urban-growth simulation model to analyze urban spatial pattern evolution. The following problems must be solved within the modeling process:
(1) Improve the existing representation modes for land-use change cases by proposing a set of case structures that adapt to the urban-growth simulation process.
(2) Develop a comprehensive retrieval approach to solve the problem of single-direction reasoning in traditional research.
(3) Introduce a time factor to solve the problem of blurry CBR-inference cycles.
Accordingly, this study proposes an urban-growth simulation model based on the CBR approach (UGSCBR) and applies it to an urban-growth simulation of Jixi city in China from 2005 to 2015.
The remainder of this article is organized as follows. Section 2 describes the proposed UGSCBR model in detail. Section 3 describes the study region, lists the data, and provides the experimental results. Section 4 discusses the results, and Section 5 concludes.

The urban-growth simulation case-based reasoning approach
The basic idea of the UGSCBR model is as follows (Figure 1): Taking the complete description of each land unit in time and space, the land-evolution type is inferred. The "new case", that is, the urban-growth result, is then obtained by retrieving the most similar previously known case. To address the three key problems mentioned in the introduction, the UGSCBR model includes three steps: "case organization", "case retrieval", and "case constraint". The model retrieves known cases that are similar to the unknown case and deduces the result of the unknown case through these three steps. Figure 2 provides a detailed illustration of the modeling process.

Case organization
The case represents the basic unit and essence of CBR (Holt, 1999), as its structure and representation mode determine how CBR works. The existing study (Du et al., 2010) on CBR land-use change takes land parcels as the case unit, making it difficult to simulate the spatial-pattern formation processes of a complex system. Moreover, the case representation structure of land-use change cannot be fully adapted to the simulation of urban growth. Therefore, it is necessary to develop a set of case structures that can better express the change process of complex spatial patterns and that can adapt to the research objectives of urban-growth scholarship.
Regular, discrete raster pixels are the most commonly used mode in the study of geographical problems. This representation mode can effectively simulate the formation process behind the spatial patterns of complex systems by generating global patterns through local, individual behaviors. Therefore, the UGSCBR model proposed in this study is represented using this mode.
This study proposes a case representation structure (initial state, geographic features, and result) based on the existing three-component mode of land-use change (Du et al., 2010). The meaning of each component is as follows: (1) the initial state describes the land-use type of the unit at the beginning of the period, (2) the geographic features include a set of spatial data indices that affect urban growth, and (3) the result records whether the unit is ultimately transformed into urban land. The mode can be regarded as the process of land-unit (case) change within a certain period. In other words, the land's initial state is affected by its geographical features, which together lead to the end result. Cases can be defined by the following equation:

$$Case_i = \{LU_i,\ Index_{1i},\ Index_{2i},\ \ldots,\ Index_{ni},\ Result_i\} \tag{1}$$

where $i$ is the case number, $LU_i$ is the case's initial state, and $Result_i$ is the result of case $i$, which is a Boolean variable with "1" indicating urbanization and "0" indicating non-urbanization. $Index_{1i}, Index_{2i}, \ldots, Index_{ni}$ represent the multidimensional geographical indices of case $i$, with a total of $n$. For example, in a case describing the change process of a land unit from 2005 to 2010 (Figure 3), the "initial state" ($LU$) is the land-use type of the unit in 2005, the "geographic features" ($Index_n$) are the geographic-feature indices of the unit in 2005, and the "result" is the urban-growth result of the unit in 2010. This model improves the adaptability of the CBR model's case structure to the study of urban growth by taking into account the effect of land-use type (initial state) and spatial environment (geographical features). It also simplifies the description of the "results" to urbanization and non-urbanization.
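The three-component case structure can be illustrated with a small data type. The following is a minimal sketch in Python 3, not the authors' implementation; the class name `Case`, its field names, and the index values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Case:
    """One land unit (raster cell) described over one period (hypothetical layout)."""
    case_id: int
    initial_state: str             # LU_i: land-use type at the start of the period
    indices: Tuple[float, ...]     # Index_1i ... Index_ni: geographic features
    result: Optional[int] = None   # 1 = urbanized, 0 = not urbanized, None = unknown


# A geographic case records a known historical outcome (for example 2005 to 2010).
geographic_case = Case(1, "arable land", (0.12, 0.40, 0.05), result=1)

# A simulated case has the same structure, but its result is still to be inferred.
simulated_case = Case(2, "arable land", (0.10, 0.38, 0.07))
```

Keeping the result optional lets geographic and simulated cases share one structure, which mirrors the null result used when the simulation case database is built.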
The CBR model is a process of inferring the results of a new period from a past one. To distinguish the role of cases in different periods, cases expressing historical changes in urban growth are regarded as "geographical cases", while those used to simulate urban growth are considered "simulated cases". By collecting grid cells in space, a "geographic case database" (GCDB) and "simulation case database" (SCDB) can each be constructed. Table 1 further illustrates the terms proposed in the UGSCBR model.

Case retrieval
The purpose of case retrieval is to find the most similar cases. We quantitatively calculate the similarity coefficient between cases and then deduce the simulated case result. The simulated case is deduced by finding the geographic case with the shortest distance in attribute space from the simulated case. The basic process of UGSCBR case retrieval is as follows: Each historical case is searched, and the result of the most similar geographical case is taken as the result of the simulated case. To make full use of every component of the case, this study proposes a comprehensive retrieval strategy, which uses the "initial state" to group the cases (induction), the "geographical features" to retrieve them, and the "results" to reason about the outcome.
Different land-use types have different urbanization conditions. To reflect the role of the initial state, this study proposes the idea of induction and then retrieval; that is, all cases are summarized and grouped according to their initial state, so that each group of simulated cases can only be retrieved from geographical cases in the same group. For example, a simulated case with an initial state of arable land only needs to be retrieved and calculated with the geographical cases in the GCDB that also have arable land as their initial state.
Case retrieval is realized through a similarity calculation on the cases' geographical features, for which the nearest neighbor method is used most often. The smaller the coefficient value calculated by this method, the higher the similarity between cases. The following equation describes this similarity:

$$SIM = \sqrt{\sum_{i=1}^{n} \omega_i \left(p_i - q_i\right)^2} \tag{2}$$

In this equation, $SIM$ represents the case's similarity coefficient value, $i$ is the index number, $p_i$ is the value of index $i$ for the geographic case, $q_i$ is the value of index $i$ for the simulated case, $n$ is the number of indices, and $\omega_i$ is the weight assigned to index $i$. While there are many ways to determine the weight, following relevant research (Li & Liu, 2006), the entropy weight method is adopted in this study. Before determining the weight of each index, the variables need to be normalized so that their values fall within [0,1]. The entropy $H$ of each index is then calculated over the samples, where $n$ here denotes the total sample number. The entropy falls within the range of [0,1]. The smallest value of 0 represents the maximum amount of information exhibited in the variable, and the largest value of 1 indicates the minimum amount of information. Thus, the amount of information is proportional to $1 - H$. A feature with more information is expected to have greater weight, so the entropy weight $\omega_i$ of the ith indicator increases with $1 - H_i$. The similarity coefficient between each simulated case and all geographical cases can be calculated by Eq. (2) to match the most similar geographical case. To make full use of both outcomes, urbanization and non-urbanization, this study proposes a retrieval method based on a retrieval measure, that is, a simultaneous retrieval of the most similar geographical cases from the two kinds of "results". The retrieval process for a certain simulated case j is as follows (Figure 4).
(1) Calculate the similarity coefficient $SIM_i$ between the simulated case and each geographic case in the same group (the same "initial state") using Eq. (2). The number of calculated $SIM$ values equals the number of geographic cases in the group.
(2) Record the similarity coefficient values of the most similar geographic cases among the urbanized (result is 1) and non-urbanized (result is 0) cases as $SIM_{min-1}$ and $SIM_{min-0}$, respectively. The smaller of the two ($SIM_{min}$) is defined as the "retrieval measure" of simulated case j.
(3) The result of simulated case j is deduced from the retrieval measure, and the reasoning formula is as follows:

$$Result_j = \begin{cases} 1, & SIM_{min} = SIM_{min-1} \\ 0, & SIM_{min} = SIM_{min-0} \end{cases}$$

Applying the above method to the retrieval process of each simulated case, the results of all the simulated cases can be obtained.
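The three retrieval steps above can be sketched in a few lines of Python 3. This is an illustrative sketch rather than the authors' code: it assumes the weighted Euclidean form of the nearest-neighbor distance, the tuple layout of the cases and the function names are hypothetical, and the handling of an exact tie between the two minima (assigned here to non-urbanization) is an assumption that the text does not specify.

```python
import math


def weighted_distance(p, q, weights):
    """Similarity coefficient: weighted Euclidean distance between two index vectors."""
    return math.sqrt(sum(w * (pi - qi) ** 2 for pi, qi, w in zip(p, q, weights)))


def retrieve(simulated, geographic_cases, weights):
    """Return (result, sim_min_1, sim_min_0) for one simulated case.

    simulated: (initial_state, indices)
    geographic_cases: iterable of (initial_state, indices, result), result in {0, 1}
    """
    state, q = simulated
    best = {0: math.inf, 1: math.inf}
    for g_state, p, result in geographic_cases:
        if g_state != state:          # induction: only compare within the same group
            continue
        d = weighted_distance(p, q, weights)
        if d < best[result]:          # keep the closest case for each kind of result
            best[result] = d
    sim_min_1, sim_min_0 = best[1], best[0]
    # Reasoning: the result follows whichever minimum is the retrieval measure.
    # Assumption: an exact tie is treated as non-urbanization.
    result = 1 if sim_min_1 < sim_min_0 else 0
    return result, sim_min_1, sim_min_0


# Hypothetical toy data: two arable-land cases and one woodland case.
gcdb = [
    ("arable land", (0.1, 0.2), 1),
    ("arable land", (0.8, 0.9), 0),
    ("woodland", (0.1, 0.2), 0),
]
weights = (0.5, 0.5)
result, sim_min_1, sim_min_0 = retrieve(("arable land", (0.2, 0.2)), gcdb, weights)
```

In this toy example the simulated case lies much closer to the urbanized arable-land case, so it is inferred to urbanize; the woodland case is never compared because it belongs to a different group.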

Case constraint
Within the actual geographical environment, different land-use types tend to have specific spatio-temporal change characteristics, and the similarities between these characteristics in the time domain are suitable for solving the issues with the case retrieval methods (Qian et al., 2007). To determine the period to which a simulated result applies, it is necessary to control the case change process in the time domain, that is, to guide the period of case change with a time factor. However, traditional CBR-reasoning processes (Du et al., 2010) lack time-factor control because they employ a blurry CBR-reasoning cycle, so the period of the simulation results is difficult to determine. In the UGSCBR model, although it is possible to simulate all cases through a complete case retrieval, it is not possible to determine the number of cases that have changed during the simulation period. This study proposes a method to control the number of case changes after case retrieval by calculating the quantity of urban growth (quantity demand [QD]) in the forecast year, which introduces the time factor indirectly and solves the above problem. First, we reclassify the simulated cases based on the retrieval measure. When $SIM_{min} = SIM_{min-1}$, we add the case to the quasi-urbanized case set, and when $SIM_{min} = SIM_{min-0}$, we add the case to the quasi-non-urbanized case set. Then, a Markov model (Blanchard et al., 2010), spatial overlay, or another method is used to calculate the urban-growth quantity demand ($QD_i$, where $i$ is the type of initial state), which is compared with the number of cases ($NC_i$) in the quasi-urbanized case set. The reasoning behind the three possible scenarios is as follows (Figure 5):
(1) When $QD_i = NC_i$, all cases in the quasi-urbanized case set of type $i$ are inferred to be transformed into urban land, while cases in the quasi-non-urbanized case set are not transformed. This situation has a low occurrence probability.
(2) When $QD_i < NC_i$, the cases in the quasi-urbanized case set are sorted in ascending order of $SIM_{min}$. The first $QD_i$ cases are inferred to be transformed into urban land, while the other cases in the set and those in the quasi-non-urbanized case set are not transformed.
(3) When $QD_i > NC_i$, all cases in the quasi-urbanized case set of type $i$ are inferred to be transformed into urban land, and the cases in the quasi-non-urbanized case set are sorted in ascending order of $SIM_{min-1}$. The first $QD_i - NC_i$ cases in this order are also transformed into urban land to meet the urban-growth demand, while the remaining cases are not transformed.
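The three scenarios can be expressed as one selection routine per initial-state group. The sketch below (Python 3) is an illustration under stated assumptions rather than the authors' implementation: the dictionary keys and the function name are hypothetical, an exact tie between the two minima is placed in the quasi-non-urbanized set, and ties within a ranking are broken by input order.

```python
def apply_quantity_demand(cases, qd):
    """Select the cases of one initial-state group that become urban land.

    cases: list of dicts with keys 'id', 'sim_min_1', 'sim_min_0'
    qd: quantity demand for this group, expressed as a number of cases
    Returns the set of ids inferred to be transformed into urban land.
    """
    quasi_urban = [c for c in cases if c["sim_min_1"] < c["sim_min_0"]]
    quasi_non_urban = [c for c in cases if c["sim_min_1"] >= c["sim_min_0"]]
    nc = len(quasi_urban)

    if qd <= nc:
        # Scenarios (1) and (2): keep the qd most similar quasi-urbanized cases.
        # For these cases SIM_min equals SIM_min-1, so sorting by sim_min_1 is equivalent.
        ranked = sorted(quasi_urban, key=lambda c: c["sim_min_1"])
        chosen = ranked[:qd]
    else:
        # Scenario (3): take every quasi-urbanized case, then fill the remaining
        # demand with the quasi-non-urbanized cases closest to an urbanized case.
        ranked = sorted(quasi_non_urban, key=lambda c: c["sim_min_1"])
        chosen = quasi_urban + ranked[: qd - nc]
    return {c["id"] for c in chosen}


# Hypothetical group of four simulated cases.
group = [
    {"id": "a", "sim_min_1": 0.1, "sim_min_0": 0.5},
    {"id": "b", "sim_min_1": 0.3, "sim_min_0": 0.4},
    {"id": "c", "sim_min_1": 0.6, "sim_min_0": 0.2},
    {"id": "d", "sim_min_1": 0.9, "sim_min_0": 0.1},
]
under_demand = apply_quantity_demand(group, 1)   # QD < NC
exact_demand = apply_quantity_demand(group, 2)   # QD = NC
over_demand = apply_quantity_demand(group, 3)    # QD > NC
```

Here cases "a" and "b" form the quasi-urbanized set, so a demand of three draws the additional case "c" from the quasi-non-urbanized set because it is the closest to an urbanized geographic case.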

Study area and data
The proposed UGSCBR model was applied to the simulation of urban growth in the main urban area of Jixi city. Jixi is a city in southeastern Heilongjiang Province, China (Figure 6). It has jurisdiction over nine county-level administrative regions, covering a total area of approximately 22,550 km², and has a total population of 1.72 million. This region has a cold temperate continental monsoon climate, and the terrain mainly includes mountains, hills, and plains. In 2018, its economic output was $7.7 billion. The spatial distribution of coal resources and geological conditions affects urban development in this region. Its urban spatial structure is dispersed, and the degree of aggregation is low. The built-up urban area only accounts for 37% of the total urban land area. As a mining resource-based city with strong resource support capacity, the study area continues to expand, and its evolution process is complicated. As such, the regularities of urban development are not obvious, and it is difficult to determine a set of ideal rules for urban space transformation. Thus, Jixi city serves as an ideal location to test the effectiveness of the UGSCBR method. The land-use data were divided into six types of land use (urban land, arable land, woodland, grassland, water, and unused land), and CGCS2000 was taken as the unified coordinate system. Referring to relevant literature (Aburas et al., 2017; Chen et al., 2016; Liang et al., 2018b), we standardized the land-use drivers and divided them into distance and terrain factors with the same resolution as the land-use data. Table 2 lists the data from this study. This study used ArcGIS 10.2 as the data processing and model operation software.

Model implementation
This study simulated the research area's urban growth in 2015. The experiment established the SCDB for 2010-2015, with a transition period of 5 years, based on the 2010 land-use condition data, and established a GCDB of historical cases from 2005-2010. According to the UGSCBR model's case structure, the simulated case is expressed as follows: the initial state describes the land-use type of the unit in 2010, the geographic features are the various indices in 2010 (Table 2), and the result is the urban-growth simulation result in 2015, which took a null value when building the database. For the geographic case, the initial state reports the land-use type of the unit in 2005, the geographic features are the various indices in 2005 (Table 2), and the result describes whether the unit had been transformed into urban land in 2010.
Table 2 (excerpt). Distance factors, each derived with the "Euclidean distance" function:
Distance to the nearest railway or highway: the distance of all grid cells to the nearest railway or highway.
Distance to city center ($D_{center}$): the distance between all cells and the city center.
Distance to city edge ($D_{edge}$): the distance of all grid cells to the nearest urban land.
Distance to water ($D_{water}$): the distance of all grid cells to the nearest body of water.

Geographic cases need to be grouped according to the initial state. We did not consider changes to water and urban land, and only constructed the GCDB with arable land, woodland, grassland, and unused land. Each group of the GCDB was built by randomly sampling 10,000 urbanized and 10,000 non-urbanized cases, or by collecting all available cases of a kind when fewer than 10,000 existed. Table 3 shows the statistics of each case group, and Table 4 presents the GCDB. The index weights (Table 5) of each group of geographic cases were calculated by the entropy weight method. According to the case retrieval process, Eq. (2) was used to calculate the similarity coefficients, $SIM_{min-1}$ and $SIM_{min-0}$, for the most similar urbanized and non-urbanized geographical cases. We then constructed the quasi-urbanized and quasi-non-urbanized case sets. The retrieval process was developed using Python 2.7.
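The index weights of a case group can be computed along the following lines. This sketch (Python 3) assumes the commonly used form of the entropy weight method, namely min-max normalization, Shannon entropy scaled by ln n, and weights proportional to 1 − H; these formulas are an assumption and may differ in detail from the authors' equations. The function names and sample values are hypothetical.

```python
import math


def min_max_normalize(column):
    """Rescale one index column to [0, 1]; a constant column maps to zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def entropy(column):
    """Shannon entropy of a non-negative column, scaled to [0, 1] by ln(n)."""
    n = len(column)
    total = sum(column)
    if n < 2 or total == 0:
        return 1.0                      # assumption: no spread means no information
    h = 0.0
    for v in column:
        if v > 0:                       # 0 * ln(0) is taken as 0
            p = v / total
            h -= p * math.log(p)
    return h / math.log(n)


def entropy_weights(columns):
    """One weight per index, proportional to the information content 1 - H."""
    info = [max(0.0, 1.0 - entropy(min_max_normalize(c))) for c in columns]
    total = sum(info)
    if total == 0:
        return [1.0 / len(columns)] * len(columns)
    return [v / total for v in info]


# Hypothetical sample of four cases described by two indices.
index_columns = [
    [0.0, 0.0, 0.0, 1.0],   # concentrated values: low entropy, more information
    [0.2, 0.4, 0.6, 0.8],   # evenly spread values: higher entropy, less information
]
weights = entropy_weights(index_columns)
```

The first index receives the larger weight because its values are concentrated in one sample, which gives it lower entropy and therefore more information under this formulation.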
Based on the spatial overlay analysis, the amount of each land type transformed into urban land from 2010 to 2015, that is, the urban-growth quantity demand ($QD_i$), was compared with the number of cases ($NC_i$) in the quasi-urbanized case set (Table 6). The grassland-type cases belong to the $QD_i > NC_i$ situation, and the other case types belong to the $QD_i < NC_i$ situation. The simulation results were then inferred according to the case constraints discussed in the "Case constraint" section.

Result
The UGSCBR must be validated when it is applied to the simulation of actual cities. The simplest validation method is to visually compare the simulated patterns with the actual ones. Therefore, visual inspections were conducted to compare the simulated urban areas for 2015 and the actual situations derived from land-use data. The results show that the actual situations (Figure 7) were similar to the simulated ones (Figure 7). A confusion matrix and an error map of the concordance between the simulated and actual situations were further obtained to conduct a quantitative analysis (Table 7 and Figure 8). This matrix was calculated based on a cell-by-cell spatial overlay of these two situations (total simulation accuracy: 95.4%). The simulation accuracy of the urbanized and non-urbanized cases was 90.4% and 97.0%, respectively. In addition, the kappa coefficient was 87.4%. Overall, the results show that the UGSCBR model has a high simulation accuracy.
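For reference, total accuracy and the kappa coefficient can be derived from a two-class confusion matrix as sketched below (Python 3). The function name and the cell counts are hypothetical and are not the values of Table 7.

```python
def accuracy_and_kappa(tp, fn, fp, tn):
    """Total accuracy and Cohen's kappa from a 2 x 2 confusion matrix.

    tp: urban in both maps          fn: urban in reality, non-urban in simulation
    fp: non-urban in reality, urban in simulation     tn: non-urban in both maps
    """
    total = tp + fn + fp + tn
    observed = (tp + tn) / total
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / total ** 2
    if expected == 1.0:
        return observed, 0.0            # degenerate matrix: kappa is undefined
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical counts of raster cells.
accuracy, kappa = accuracy_and_kappa(tp=40, fn=10, fp=5, tn=45)
```

With these illustrative counts the observed agreement is 0.85 and the chance agreement is 0.5, which gives a kappa of 0.7.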
The proportion of unchanged regions in the whole study area is often high in urban simulations. The total accuracy of the confusion matrix and the kappa coefficient include these unchanged parts in the calculation range, which leads to high values. Employing the "figure of merit" (FoM) indicator can help avoid this situation (Chen et al., 2013). The FoM index is better than the kappa in evaluating the accuracy of simulated changes and can be calculated by Eq. (7; Pontius et al., 2004). In addition, the Matthews correlation coefficient (MCC) is considered a balanced measure for evaluating the quality of binary classifiers or models and can be calculated by Eq. (8; Boughorbel et al., 2017; Kantakumar et al., 2019).

$$\text{Figure of merit} = \frac{\text{Hits}}{\text{Misses} + \text{Hits} + \text{False alarms}} \tag{7}$$

$$\text{MCC} = \frac{\text{Hits} \times \text{Correct rejections} - \text{False alarms} \times \text{Misses}}{\sqrt{(\text{Hits} + \text{False alarms})(\text{Hits} + \text{Misses})(\text{Correct rejections} + \text{False alarms})(\text{Correct rejections} + \text{Misses})}} \tag{8}$$

In these equations, Misses is the area of non-urban land that is transformed into urban land in the actual scenario but not in the simulation scenario. Hits is the area of non-urban land transformed into urban land in both scenarios (the correctly transformed area). False alarms is the area of non-urban land that is not transformed into urban land in the actual scenario but is in the simulation scenario. Correct rejections is the area of non-urban land that is unchanged in both scenarios. The FoM index and MCC were calculated as 0.151 and 0.23, respectively, reconfirming the UGSCBR model's high simulation accuracy and ability to meet the application requirements.
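Both indicators follow directly from the four change categories. The sketch below (Python 3) computes them from cell counts; the function names and the counts are hypothetical and do not reproduce the values reported for Jixi city.

```python
import math


def figure_of_merit(hits, misses, false_alarms):
    """Share of correctly simulated change among all observed or simulated change."""
    return hits / (hits + misses + false_alarms)


def matthews_corrcoef(hits, misses, false_alarms, correct_rejections):
    """Matthews correlation coefficient from the four change categories."""
    numerator = hits * correct_rejections - false_alarms * misses
    denominator = math.sqrt(
        (hits + false_alarms)
        * (hits + misses)
        * (correct_rejections + false_alarms)
        * (correct_rejections + misses)
    )
    # Convention: return 0 when any marginal total is zero.
    return numerator / denominator if denominator else 0.0


# Hypothetical cell counts for cells that were non-urban at the start of the period.
fom = figure_of_merit(hits=30, misses=20, false_alarms=50)
mcc = matthews_corrcoef(hits=30, misses=20, false_alarms=50, correct_rejections=900)
```

Because correct rejections do not enter the FoM, a large unchanged area cannot inflate it, which is the property that motivates its use alongside total accuracy and kappa.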

Influence of geographical case number on accuracy
The simulation accuracy of each land type's urbanization was further analyzed, and the FoM index of the arable land, woodland, grassland, and unused land simulations was calculated as 0.156, 0.129, 0.010, and 0.055, respectively. The results show that the simulation accuracy for the different land types varies greatly, and the key reason for this difference is the varying number of recorded historical experiences for each case type. After comparing the FoM with the number of cases (Figure 9), we found a significant positive relationship between the FoM index and the total number of geographic cases in each group. In addition, the shortage of geographic cases affected the simulation accuracy of grassland and unused land as well as the overall accuracy. Although the numbers of non-urbanized geographic cases for arable land and woodland were the same, the difference in the number of urbanized geographic cases led to the lower simulation accuracy for woodland. Therefore, increasing the number of geographic cases plays a positive role in improving simulation accuracy, and the number of geographic cases with each result in each initial state affects the simulation results.

Influence of case constraints on accuracy
The UGSCBR model's constraint process consists of two parts. On the one hand, the similarity coefficients ($SIM_{min-1}$ and $SIM_{min-0}$) of the urbanized and non-urbanized geographic cases that were most similar to the simulated cases were comprehensively retrieved, and $SIM_{min-0}$ was used to constrain the transformation sequence of the simulated cases. On the other hand, the quantity demand ($QD_i$) was used to control the number of cases transformed into urban land. The two constraint methods are tested below. To study the $SIM_{min-0}$ constraint, we analyzed the simulation accuracy of the cases affected by it. As shown in Table 8, we extracted the cases that could not be transformed into urban land due to the constraint during the simulation process: under the actual scenario in 2015, 284 of these cases were transformed and 1162 were not. We then extracted the supplementary cases that were transformed into urban land due to the constraint, among which 306 cases were transformed and 1140 cases were not, under the actual scenario. For the cases affected by the constraint, the simulation accuracy after the constraint improved by 1.6%, confirming the effectiveness of the constraint method.
The main principle of the quantity demand constraint is that the smaller the similarity coefficient, the stronger the correlation between cases. Therefore, the effect of the constraint can be analyzed by comparing the accuracy of different similarity coefficient intervals. The mean-standard deviation method is used to divide the similarity coefficient interval of the urbanized simulated cases into five grades (Table 9).
We counted the correct and incorrect cases in each interval. As shown in Figure 10(a), the high and medium similarity levels had the most correct simulated cases, followed by the higher similarity level. However, the number of incorrect cases in these three grades also increased as the grade dropped. In addition, the fewest correct simulations were in the low and lower similarity levels, with the most incorrect cases appearing in the lower level. Simulation accuracy and the FoM index show the relation between interval grade and accuracy more directly. According to Figure 10(b), urban accuracy, overall accuracy, and the FoM index all showed an obvious downward trend as the interval grade decreased. The similarity interval grade thus had an obvious positive correlation with the simulation accuracy: the smaller the similarity coefficient, the higher the similarity between cases and the higher the reliability of the simulation results. Therefore, the quantity demand method was proven to be effective.
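The interval grading used in this analysis can be sketched as follows (Python 3). The breakpoints of Table 9 are not given in the text, so the boundaries used here (the mean minus and plus 0.5 and 1.5 standard deviations) are an assumption made only for illustration, as are the function names and the sample coefficients; the grade labels follow those named in the text.

```python
import statistics

# Ordered from most similar (smallest coefficient) to least similar.
GRADES = ("high", "higher", "medium", "lower", "low")


def grade_boundaries(values):
    """Four breakpoints around the mean; the multipliers are assumed, not from Table 9."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [mean - 1.5 * std, mean - 0.5 * std, mean + 0.5 * std, mean + 1.5 * std]


def grade(value, boundaries):
    """Map one similarity coefficient to a similarity grade."""
    for label, upper in zip(GRADES, boundaries):
        if value < upper:
            return label
    return GRADES[-1]


# Hypothetical similarity coefficients of urbanized simulated cases.
coefficients = [1, 2, 3, 4, 5, 6, 7, 8, 9]
boundaries = grade_boundaries(coefficients)
grades = [grade(v, boundaries) for v in coefficients]
```

Counting correct and incorrect simulations per grade then amounts to grouping the cases by the label returned here and comparing each group against the actual 2015 map.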

Influence of other factors on accuracy
The research area observed in this study is a mining resource-based city. Compared with ordinary cities, the urban land is more scattered (Duncan et al., 2009; Schueler et al., 2011; Zhang et al., 2017) and the laws of urban development are not obvious. New coal mines, industrial parks, and changes in transformation policies lead to unpredictable development of the city (Ye et al., 2021). The results of this study were compared with those of similar studies that use multiple models to simulate the urban growth of Tehran, Mumbai, and Pune (Kantakumar et al., 2019; Shafizadeh-Moghadam et al., 2017), in which the FoM is above 0.40. According to the distribution and shape of urban land, most of Tehran's urban land is concentrated in the north, with an overall high degree of urban concentration, and the southern part of the city is the main direction of urban growth; Mumbai's urban-land connectivity is relatively high, and urban growth is mainly based on internal filling. However, the urban land in Jixi city is mainly scattered in the south, middle, and east of the study area, with more growth directions and more complicated urban growth patterns (Ye et al., 2021). In addition, there are significant differences in development between different countries and regions. In studies taking Chinese cities as experimental areas, the FoM values are mostly concentrated between 0.1 and 0.2 (Chen et al., 2016; Liang et al., 2018a). This is similar to the simulation accuracy observed in this study.

Simulation accuracy and efficiency
According to the experimental results (Figure 9), with the increase in the number of sample cases, the probability of retrieving more similar cases can be improved, and the reliability of the simulation results can be enhanced. However, this also means that the accuracy is limited by the number of historical cases. Specifically, the simulation accuracy is not satisfactory when the number of historical cases is small. At the same time, the number of cases also affects the operation speed of the model. If excessive historical cases are collected to improve the accuracy, this usually leads to longer simulation times, which benefits neither the analysis nor the exploration of the simulation process. Therefore, it is difficult to balance the model's simulation accuracy and efficiency under the influence of the number of historical cases, making a standard for sampling historical case numbers necessary. Future research can explore the influence of different historical periods on the simulation results to develop a method of increasing historical experience through multi-period data sampling and to explore the correlation between simulation accuracy and efficiency to perfect the model's reasoning in urban-growth research.

Conclusions
CBR simplifies knowledge acquisition and relies on sufficient historical data to realize quantitative analyses and predictions of phenomena. It can effectively overcome the difficulty of acquiring conversion rules that emerges in spatio-temporal land-use change simulation. This study presents the UGSCBR model for urban-growth simulation. By introducing a three-step process, this model addresses the shortcomings of the case structures and model algorithms traditionally used in research (Du et al., 2010). It also adapts CBR reasoning to the urban-growth simulation process and solves the problem of CBR not being sensitive to time scales. This study applied the model to an urban-growth simulation of Jixi city from 2010 to 2015, and the experiment showed the flexibility and simplicity of the UGSCBR model. This model provides a new direction for the study of urban growth.