Prediction of network public opinion features in urban planning based on geographical case-based reasoning

ABSTRACT As a significant part of sustainable urban development proposed by the United Nations, urban planning is related to the ecological environment and transportation, and especially affects quality of life and social well-being. In the process of urban planning, the public express their opinions on open network platforms, resulting in large quantities of network public opinion data, which have important implications for evaluating urban planning. Based on the idea of geographical case-based reasoning (CBR), this paper constructs an expression framework for urban planning cases in the form of a ‘case problem–case attribute–case result’ triad. On this basis, this paper proposes a similarity calculation method for urban planning cases that integrates case attributes. Finally, based on an improvement to the traditional k-nearest neighbors method, the proposed public opinion feature calculation model considers similarity weights (SWs), which allows us to predict network public opinion features, including the viewpoint-level emotional tendency and the concerned groups of urban planning cases. The experimental results show that the SWs model effectively improves prediction accuracy, with an average MIC-F1 score of more than 74%. Based on CBR, the proposed method can predict the development trends of public opinion in future planning cases, and provide scientific and reasonable decision support for urban planning.


Introduction
Urbanization is one of the most irreversible and influential human activities, which affects the sustainable development of cities and the living standard of residents (Li, Sun, and Fang 2018;Shao et al. 2021). The growth of urban population and changes in the type of urban land use have exacerbated the process of urban expansion in developing countries, posing significant demands and challenges to the rational planning of urban space and land resources (Sumari et al. 2020;Guo et al. 2022;Xu et al. 2019). With the rapid development of urbanization (Deng, Fu, and Sun 2018), urban planning has a direct impact on quality of life and well-being, and the public has an increasingly strong desire to participate in urban planning (Xing et al. 2011). Western countries have conducted extensive exploration and research on public participation in urban planning, which is of great significance for making scientific and people-friendly urban planning decisions (Arnstein 1969;Forester 1999;Healey 1992). Today's top-down planning approach may no longer be applicable to the current urban development process (Li et al. 2019), especially in consideration of the diversity of cities and the awakening of public consciousness in the era of economic prosperity. With the emergence of the Internet of Things and the increasing popularity of the Internet, social media has become an indispensable part of people's daily lives (Abdul-Rahman et al. 2021). With the help of open online platforms, people can participate in urban planning (Mahdavinejad and Amini 2011;Rall, Hansen, and Pauleit 2019), which results in the convergence of a large number of urban planning networks of public opinion that profoundly impact the rational preparation and smooth implementation of urban planning. 
Full consideration of the public's emotional attitude in public opinion will help to improve the rationality, democracy, and scientific soundness of urban planning, and achieve widespread participation (Hao, Zhu, and Zhong 2015). Urban planning is forward-looking to a certain extent, and it is necessary to determine the direction of future urban planning based on historical planning experience. A recent focus of investigation is the use of big data resources of public opinion in urban planning to predict the emotional tendencies and concerned groups in future planning cases based on historical planning cases (Holt 1999), and to provide guidance for urban planners.
Spanning a wide range of research content, including hotspot event prediction (Skoric, Liu, and Jaidka 2020; Chen, Duan, and Wang 2017; Ma et al. 2019), major event early warning (Forkan, Khalil, and Atiquzzaman 2017; Yu et al. 2008; Fu et al. 2016), and evolution trend prediction (Zhang et al. 2019; Li 2021; Fu and Zhao 2021), public opinion prediction is a hot research topic in many areas such as social politics, economy, and culture. Prediction models of network public opinion can be divided into two categories (You and Chen 2016). The first consists of prediction models based on traditional statistics, and the second consists of models based on intelligent machine algorithms. In terms of public opinion heat prediction, many methods, such as time series models (Hester and Gibson 2003), grey prediction methods (Wenjie et al. 2013; Tien 2012), and prediction models built by back-propagation neural networks (Chen, Liu, and Zhang 2015; Zeng et al. 2007), are used to predict and analyze the changing trends of public opinion heat in individual events. Popularity prediction algorithms combined with latent Dirichlet allocation (LDA) and k-nearest neighbors (KNN) (Berbague, El islem Karabadji, and Seridi 2018) can further improve the accuracy of heat prediction of public opinion. In terms of sentiment prediction of public opinion, multiple linear regression models (Chen, Duan, and Wang 2017), transfer learning (Tao and Fang 2020), and other methods are used to predict the emotional tendency of public opinion. Correlation vector machine models are also used to predict the emotional tendency of netizens (Xin et al. 2016; Rajendiran and Priyadarsini 2022), which can avoid the problem of local optimal solutions.
In the field of urban planning, research on public opinion prediction mainly focuses on the analysis and prediction of user satisfaction in planning. By selecting indicators related to factors such as the natural environment (Rafieian, Asgary, and Asgarizade 2009) and social environment (Wu and Jung 2016), it is possible to analyze the influencing factors of user satisfaction (Gruber and Shelton 1987; Currie and Thacker 1986), and construct satisfaction prediction models based on analytic hierarchy processes. The questionnaire survey method is used to obtain the opinions of stakeholders regarding planning and their corresponding demands (Fan 2015). The topic of public opinion can be extracted through the analysis of public feedback on specific planning projects (Baawain et al. 2020). At present, in the field of urban planning, most data are obtained from questionnaire surveys, and the big data resources of public opinion are not fully utilized. There are also limitations in the quantity and concentration of such data, which makes it difficult to reflect the overall trends of the development of public opinion. At the same time, there is a lack of multi-dimensional analyses of the characteristics of public opinion, and multi-dimensional attributes such as subjective social population factors and objective geographical environment factors are not fully considered. Therefore, it is difficult to predict emotional tendency at the viewpoint-level and the categories of concerned groups in these cases. Land use and land cover information is used for effective detection of urban land use patterns, which, combined with aerial and satellite remote sensing data, can improve understanding of changes in urban area functions (Liu et al. 2017; Ligate, Chen, and Wu 2018). In addition, the combination of remote sensing and GPS data improves the measurement and analysis accuracy of urban sprawl (Xu et al. 2019).
Social media data is considered a valuable resource for advancing urban research, bringing new perspectives to urban research. It is used to study the relationship between urban population distribution and urban function, the impact of human activities on urban environment, and public participation in urban planning (Martí, Serrano-Estrada, and Nolasco-Cirugeda 2019;Li, Shen, and Hao 2016).
Based on the spatiotemporal analysis of urban planning case data, it is possible to infer judgment of the state and possible results of an urban planning case, which in turn can be applied to the result prediction of urban planning problems and decision-making judgment of urban planning events (Ye and Shi 2001;Yang et al. 2008;Du et al. 2002). This paper proposes a public opinion feature prediction method based on urban planning case reasoning, which refers to calculating the public opinion features of a new urban planning case based on the public opinion information of historically similar urban planning cases. Based on the theory of geographical case-based reasoning (CBR), this paper proposes a method to calculate the similarity weights (SWs) of urban planning cases by integrating the temporal, spatial, and population attributes of cases, so as to realize the feature prediction of urban planning network public opinion. On the basis of urban planning CBR, a public opinion prediction model of urban planning cases is constructed that integrates subjective social population factors and objective geographical environment factors. According to the public opinion features of similar historical planning cases, the emotional tendency from different viewpoints and the categories of concerned groups of new planning cases are predicted to provide decision-making support for such planning. In Section 2.1 and Section 2.2, we detail the study area and the sources of data used in the study. In Section 2.3, we propose the prediction process of public opinion features of urban planning CBR, while in Section 2.3.1, we construct an expression framework for urban planning cases; In Section 2.3.2, we put forward a similarity calculation model of urban planning cases; and in Section 2.3.3, we focus on the calculation method of public opinion features based on SWs. In Section 3, we forecast public opinion features based on the planning data of Guangzhou, China. 
Finally, in Section 4 we present our conclusions.

Study area
The study is conducted in Guangzhou, located in the south of mainland China (latitude 22°26′-23°56′ N, longitude 112°57′-114°03′ E) (Figure 1). Guangzhou is one of the most highly developed megacities in China. As the capital of Guangdong Province, Guangzhou is the political, economic, scientific, technological, educational, and cultural center of Guangdong Province. In 2020, the gross domestic product (GDP) of Guangzhou was 250.19 billion RMB. By the end of 2020, the total population in Guangzhou was 18.6766 million, making it the fifth largest city in China. Guangzhou has 11 administrative districts, namely, Yuexiu District, Haizhu District, Liwan District, Tianhe District, Baiyun District, Whampoa District, Huadu District, Panyu District, Nansha District, Conghua District, and Zengcheng District.

Data source
(1) Public announcement data and online public feedback data of planning projects. The data used in this study are from the public announcements of urban planning projects on the website of the Guangzhou municipal planning and natural resources bureau (http://ghzyj.gz.gov.cn/) and the corresponding public feedback data. Public announcement data of urban planning projects can be browsed and downloaded on this website. People can freely comment on the corresponding planning projects on this website, and these public feedback data are collected for analysis. The collected data span from 22 November 2007 to 3 September 2019, involving 3434 planned projects and 102,825 public comments. The data cover all districts and county-level cities under the jurisdiction of Guangzhou city. Announcement data contain detailed information about urban planning projects, such as project ID, name, and location, while public feedback data are semi-structured, including fields such as release time, release IP, corresponding planning number, and body content. Feedback opinions and announcement data are linked by project ID, and the text-type body content is the data source for the extraction of online public opinion features of urban planning in this paper.
(2) Point of interest data. In this study, the Gaode Map's application programming interface (API) was used to obtain point of interest (POI) data for the whole research area, which yielded 893,924 data points. The coordinates of these data were converted to the WGS84 coordinate system. Some example POI data are shown in Table 1.
(3) Population data. According to the 2000 census data published on the official website of the Bureau of Statistics of Guangzhou city (http://tjj.gz.gov.cn/), the statistical data of all subdistrict administrative areas of the city were obtained, which contained the population ages, education levels, and main sources of living.

Urban planning CBR
The complexities of geographical environments lead to many factors affecting geographical problems, which makes it difficult to use mathematical models for accurate prediction and reasoning. However, phenomena or problems in complex geographic environments often show similar relationships (Wenjing et al. 2008). Thus, when performing similarity analyses between geographical problems, hidden information in the geographical environment can be obtained, avoiding the use of complex mathematical models for problem-solving. On the basis of the similarity calculation method, geographical CBR extracts similar geographical cases from a large amount of historical data to make predictions about complex geographical problems. Different from abstract simulations based on mathematical models, geographical CBR allows for the analysis of geographical environmental problems from an overall point of view and builds a model based on its own similarity (Holt 2000).
An urban planning case is an organization or description of an urban planning phenomenon in geographical space and its influence. It is an abstract description of urban planning geographic information and network public opinion information based on geographical cases. It contains both the geographical space-time information of traditional geographical cases and rich network public opinion information. Urban planning CBR refers to the application of geographical CBR to urban planning. After a new planning case is made, a case similarity measurement index is defined that considers the different attribute characteristics of the planning case. Similar planning cases are extracted from the historical planning case base, and a calculation strategy for public opinion features is constructed to predict the public opinion features that may be triggered by the new planning activity according to the public opinion features of historical cases. The prediction process of public opinion features in urban planning CBR mainly includes the following three parts, as shown in Figure 2.
(1) Expression of urban planning case. Considering the influence of geographical space, the attribute features and spatial features of urban planning cases are described and quantitatively expressed, and the problems to be solved in urban planning CBR are clarified.
(2) Calculation of the case similarity. According to the specific expression method of urban planning cases, the corresponding similarity calculation method of cases is determined. Historical cases similar to the case to be predicted are extracted, and the similarity between the geographical environments in which the cases are located is considered in the similarity calculation.
(3) Prediction of public opinion features. The rules and strategies for solving the case problem of urban planning case are constructed, and the case problem of urban planning case to be predicted will be solved according to the result states of similar historical cases.

Expression of urban planning case
The urban planning case is an abstract description of public feedback after the public announcement of an urban planning project. Based on the theory of geographic CBR, this section constructs a case description model of network public opinion in the form of a 'case problem-case attribute-case result' triad to organize the information extracted from network public opinion, and to describe the relationship between the predicted public opinion features in urban planning and the temporal and spatial attributes of urban planning schemes. The expression framework is shown in Figure 3. The case problem clarifies the expressed form of the prediction target of the urban planning network public opinion, and defines the multi-dimensional characteristics of the crowd, emotion, and viewpoint of the urban planning network public opinion to be predicted. Expression of the case attribute is achieved by quantifying the different information dimensions of the planning case, including the temporal information of the planning project's publicity, its spatial properties, location information, humanistic environment, and other attribute characteristics of the geographical environment where the project is located, which is the basis for the reasoning and solving of the case problem. Expression of the case result is a quantitative measurement of the state of online public opinion on urban planning, including its multi-granularity and multi-dimensional attributes, such as concerned groups, emotional tendency, and opinion categories.
2.3.1.1. Expression of the case attribute. There is a high spatial relationship between urban planning cases and their surrounding geographical environment; thus, the locations and types of planning cases are influenced by the locations and spatial relationship of existing geographical entities in the urban geographical environment. Therefore, in the attribute expression of urban planning cases, not only the attribute information of the case itself, but also the geographical environment of the city must be considered. Location is a key implicit element that affects the whole case (Du, Wen, and Cao 2009). In the quantitative measurement of case attribute, the location of the case should be taken as the core attribute, and influencing factors such as the spatial entity of the urban geographical environment, the spatial relationship between entities, and the urban residents connected to the case should be considered and combined. According to the categories of influencing factors of emotional tendency of public opinion, this paper divides case attribute into three parts: temporal, spatial, and population, which respectively represent 'time', 'place', and 'people' factors in the feature prediction of public opinion.
(1) Temporal attribute. The temporal attribute (TA) of urban planning cases is expressed through the starting time when an urban planning project becomes public knowledge. The TA not only represents the beginning time of the public announcement of the planning project, but also represents the time when online public opinion about the planning project begins to be expressed. The TA of planning case $C_i$ is defined as $TA_i = (TY_i, TM_i, TD_i, Th_i, Tm_i, Ts_i)$, where $TY_i$, $TM_i$, $TD_i$, $Th_i$, $Tm_i$, and $Ts_i$ represent the year, month, day, hour, minute, and second of the time information, respectively, and 0 is used to represent low-precision or default information.
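The six-field time record described above can be illustrated with a minimal Python sketch. The function name `temporal_attribute` and the `date_only` flag are hypothetical and are not taken from the paper; the example dates are the first and last collection dates reported in the data source section, and the time of day is invented for illustration.

```python
from datetime import datetime
from typing import Tuple

# (year, month, day, hour, minute, second)
TemporalAttribute = Tuple[int, int, int, int, int, int]


def temporal_attribute(announced: datetime, date_only: bool = False) -> TemporalAttribute:
    """Encode an announcement time as a six-field tuple.

    When only the date is known (low-precision information), the
    hour, minute, and second fields are set to 0, as the text describes.
    """
    if date_only:
        return (announced.year, announced.month, announced.day, 0, 0, 0)
    return (announced.year, announced.month, announced.day,
            announced.hour, announced.minute, announced.second)


# Illustrative values only: the time of day is made up.
ta_full = temporal_attribute(datetime(2019, 9, 3, 14, 30, 5))
ta_low = temporal_attribute(datetime(2007, 11, 22), date_only=True)
```

Using 0 as the default keeps every case in the same six-dimensional form, so later date-based comparisons can ignore the time-of-day fields when they are unavailable.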
(2) Spatial attribute. The spatial attribute (SA) of urban planning cases is composed of category semantics and location semantics, which respectively express the spatial functional features and spatial structural features of each case. The category semantics of urban planning cases express the functional characteristics of land use space in the region where the urban planning cases are located, and represent the urban construction function undertaken by urban space. According to the types of land use in urban planning, this paper classifies urban planning cases, extracts the semantic information of category words from planning news data, and maps the category words into a vector $SemVec^{Class}_i = (v^{Class}_{i1}, v^{Class}_{i2}, \cdots, v^{Class}_{iM})$ representing the category semantic space. The location semantics of urban planning cases are influenced by the spatial distribution characteristics and spatial relations of the geographical entities near the planning case, which reflect the structural characteristics of urban planning geographic space. For urban planning case $C_i$, the adjacent POI set of case $C_i$ is denoted as $NPP(C_i) = \{p_1, p_2, \ldots, p_k\}$, and the distance between any POI $p_j$ and $C_i$ is $dist(p_j, C_i)$. The category semantic vector corresponding to $p_j$ is denoted as $v_{p_j} = (w_{p_j,1}, w_{p_j,2}, \cdots, w_{p_j,r})$, where $r$ is the total number of dimensions and $j \in \{1, 2, \cdots, k\}$.
The location semantic vector corresponding to case $C_i$ is denoted as $SemVec^{Loc}_i = (w_{C_i,1}, w_{C_i,2}, \cdots, w_{C_i,r})$, where $w_{C_i,s}$ is the weight of the s-th dimension: $$w_{C_i,s} = \frac{\sum_{j=1}^{k} w_{p_j,s} / dist(p_j, C_i)}{\sum_{j=1}^{k} 1 / dist(p_j, C_i)}$$ That is, the location semantic vector of the planning case is the weighted average of the category semantic vectors of its adjacent POIs, and the weight is the reciprocal of the distance between each POI and the planning case.
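The inverse-distance weighted average described in this paragraph can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the input layout (a list of distance and vector pairs), and the example numbers are assumptions, and the weights are normalized by their sum so that the result is a true weighted average.

```python
from typing import List, Sequence, Tuple


def location_semantic_vector(
    neighbours: Sequence[Tuple[float, Sequence[float]]]
) -> List[float]:
    """Average the category semantic vectors of adjacent POIs.

    Each neighbour is (distance to the case, category semantic vector).
    The weight of a POI is the reciprocal of its distance to the case.
    """
    if not neighbours:
        raise ValueError("at least one adjacent POI is required")
    if any(d <= 0 for d, _ in neighbours):
        raise ValueError("distances must be strictly positive")
    r = len(neighbours[0][1])
    weights = [1.0 / d for d, _ in neighbours]
    total = sum(weights)
    return [
        sum(w * vec[s] for w, (_, vec) in zip(weights, neighbours)) / total
        for s in range(r)
    ]


# Two hypothetical POIs at 100 m and 300 m with two-dimensional vectors.
loc_vec = location_semantic_vector([(100.0, [1.0, 0.0]), (300.0, [0.0, 1.0])])
```

In this example the nearer POI receives three times the weight of the farther one, so the resulting vector is approximately (0.75, 0.25). A POI at distance zero would need special handling, which the sketch simply rejects.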
(3) Population attribute. The population attribute (PA) includes the age, education level, and main sources of living of the population in the subdistrict administrative area where the case is located. The age of the population is divided into $k$ equally spaced intervals, and the age eigenvector of case $C_i$ is defined as $PeoVec^{Age}_i = (v^{Age}_{i1}, v^{Age}_{i2}, \cdots, v^{Age}_{ik})$, where the vector dimension is $k$ and $v^{Age}_{il}$ is the eigenvalue of the l-th dimension, which is equal to the proportion of people whose ages are in the l-th interval. The level of education of the population is divided into $s$ categories, and the education eigenvector of $C_i$ is defined as $PeoVec^{Edu}_i = (v^{Edu}_{i1}, v^{Edu}_{i2}, \cdots, v^{Edu}_{is})$, where the vector dimension is $s$ and $v^{Edu}_{il}$ is the eigenvalue of the l-th dimension, which is equal to the proportion of people whose education level is in the l-th category. Similarly, the main sources of living of the population are divided into $q$ categories, and the main sources of living eigenvector of case $C_i$ is defined as $PeoVec^{Sol}_i = (v^{Sol}_{i1}, v^{Sol}_{i2}, \cdots, v^{Sol}_{iq})$, where the vector dimension is $q$ and $v^{Sol}_{il}$ is the eigenvalue of the l-th dimension, which is equal to the proportion of people whose main source of living is in the l-th category.
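Each population eigenvector is a vector of proportions, which can be sketched in a few lines of Python. The function name and the census counts below are hypothetical and serve only to show the normalization step.

```python
from typing import List, Sequence


def proportion_vector(counts: Sequence[float]) -> List[float]:
    """Turn per-interval (or per-category) head counts into proportions."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return [c / total for c in counts]


# Hypothetical subdistrict with three age intervals.
age_vec = proportion_vector([200, 500, 300])
```

The same helper would apply unchanged to education level and main source of living, since all three attributes are described as proportions over a fixed set of bins.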
2.3.1.2. Expression of the case problem and case result. The case result of an urban planning case refers to the corresponding public opinion information, which is complex and multi-dimensional. When people express opinions on specific planning cases, they not only express overall approval of or opposition to the planning case, but also express different views on different aspects of the project construction, such as engineering projects, planning and design, and environmental greening. Therefore, in the expression of a case result, it is necessary to quantify both the different viewpoints of urban residents and their emotional tendency toward different viewpoints. To quantitatively express the public opinion information of urban planning network public opinion cases, this paper presents the multi-granularity characteristics of urban planning network public opinion, which are divided into coarse-grained features and fine-grained features to realize a multi-level expression of the case result.
(1) Coarse-grained features of public opinion. The coarse-grained features of network public opinion in urban planning include the categories of concerned groups and emotional responses to each urban planning case. The categories of 'concerned people' refer to urban residents who are influenced by the construction of planning projects and who express their opinions and attitudes towards specific planning cases. Expressing the characteristics of different urban residents and extracting the types of urban residents from planning public opinion information helps to ascertain the demands of different urban residents on planning projects. The categories of emotional tendency refer to the general attitude of the public towards each planning case, which are usually divided into positive, negative, and neutral. This paper uses multi-dimensional eigenvectors of concerned people and emotional tendency to express the coarse-grained features of network public opinion in urban planning. The different dimensions of the concerned group's eigenvector represent different crowd categories, where a dimension value of 0 represents the crowd not being involved with the case, and a value of 1 represents the crowd being involved. The different dimension values of the case's emotional tendency eigenvector represent the proportions of different types of emotional tendency in the urban planning network public opinion data.
All the words in the category dictionary of concerned people constitute the concerned people category set, which is denoted as $PeoT_L = \{t_1, t_2, \cdots, t_L\}$, where $L$ is the total number of categories. For case $C_i$, a vector with $L$ dimensions is used to represent the eigenvector of the group of people concerned about the urban planning case, which is denoted as $PeoVec_i = (Peo_{i1}, Peo_{i2}, \cdots, Peo_{iL})$. Here, $Peo_{il}$, the l-th dimension of $PeoVec_i$, corresponds to the l-th category of concerned people, as shown in Equation (1), and $l \in \{1, 2, \cdots, L\}$. When the l-th category of concerned people is in case $C_i$, $Peo_{il} = 1$; otherwise, $Peo_{il} = 0$: $$Peo_{il} = \begin{cases} 1, & t_l \text{ is in case } C_i \\ 0, & \text{otherwise} \end{cases} \quad (1)$$
For case $C_i$, the set of public feedback texts is $D_i = \{d_{i1}, d_{i2}, \cdots, d_{im}\}$, and the list of emotional tendencies of these texts is $Pol_i = \{p_{i1}, p_{i2}, \cdots, p_{im}\}$, where $m$ represents the total number of feedback texts of case $C_i$. The collection of all public feedback sentiment categories is $PolT_K = \{t_1, t_2, \cdots, t_K\}$, where $K$ is the total number of categories. The eigenvector of emotional tendency is expressed as a K-dimensional vector, which is denoted as $PolVec_i = (Pol_{i1}, Pol_{i2}, \cdots, Pol_{iK})$. Here, $Pol_{ik}$, the k-th dimension of $PolVec_i$, corresponds to the k-th emotional tendency, $t_k$, where $t_k \in PolT_K$. The calculation method for $Pol_{ik}$ is shown in Equation (2), where $count(Pol_i, t_k)$ is the number of elements in list $Pol_i$ whose value is $t_k$: $$Pol_{ik} = \frac{count(Pol_i, t_k)}{m} \quad (2)$$
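The two coarse-grained eigenvectors can be sketched as below: a 0/1 indicator vector over concerned-people categories, and a proportion vector over emotional tendency categories. The function names and the group category names are hypothetical; the three sentiment categories (positive, negative, neutral) come from the text. The proportion is computed as a count divided by the number of feedback texts, following the description of the dimension values as proportions.

```python
from collections import Counter
from typing import Iterable, List, Sequence


def concerned_group_vector(categories: Sequence[str],
                           present: Iterable[str]) -> List[int]:
    """1 if a concerned-people category appears in the case, else 0."""
    present_set = set(present)
    return [1 if t in present_set else 0 for t in categories]


def sentiment_vector(categories: Sequence[str],
                     labels: Sequence[str]) -> List[float]:
    """Proportion of feedback texts carrying each emotional tendency."""
    if not labels:
        raise ValueError("a case needs at least one feedback text")
    counts = Counter(labels)
    m = len(labels)
    return [counts[t] / m for t in categories]


# Hypothetical group categories; illustrative labels for four feedback texts.
peo_vec = concerned_group_vector(
    ["students", "homeowners", "commuters"], {"commuters", "students"})
pol_vec = sentiment_vector(
    ["positive", "negative", "neutral"],
    ["positive", "negative", "negative", "neutral"])
```

Here `peo_vec` is `[1, 0, 1]` and `pol_vec` is `[0.25, 0.5, 0.25]`. `Counter` returns 0 for categories that never occur, so absent sentiments need no special case.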
(2) Fine-grained features of public opinion. The fine-grained features of network public opinion in urban planning include the viewpoint category and the emotional tendency category. The viewpoint category refers to viewpoints on the urban planning case in all aspects of the project's construction and residents' lives. The emotional tendency category at the viewpoint-level reflects the degrees of satisfaction of urban residents with different opinions. In this study, a multi-dimensional eigenvector of emotional tendency at the viewpoint-level is used to express the fine-grained emotional features of urban planning cases. The different dimensions of the vector represent different categories of viewpoints, and the dimension scalar value represents the affective tendency category corresponding to the opinion category.
The set of viewpoint vocabulary vectors obtained based on the LDA model is recorded as $w = \{w_k \mid k \in \{1, 2, \cdots, K\}\}$, $w_k = (w_{k1}, \cdots, w_{kq}, \cdots, w_{kQ})$, where $K$ is the total number of viewpoints and $Q$ is the total number of words. Firstly, a viewpoint cluster is defined for each viewpoint vocabulary vector, and then the viewpoint clusters are merged according to the minimum distance criterion, that is, the two viewpoint clusters with the minimum inter-class distance are merged until only one viewpoint cluster is left. The calculation method of the inter-class distance of opinion clusters $u_i$ and $u_j$ is shown in Equation (3), where $w_{im}$ and $w_{jn}$ represent the viewpoint vocabulary vectors belonging to the opinion clusters $u_i$ and $u_j$, respectively, and $dist(w_{im}, w_{jn})$ is the Euclidean distance between the vectors $w_{im}$ and $w_{jn}$. After obtaining the multi-level opinion clusters, the semantic information of the opinion clusters is obtained according to the opinion vocabulary vectors and, combined with the basic framework of the urban living environment quality evaluation index system, the opinion category system is determined and the names of the opinion categories are defined.
The emotional tendency feature at the viewpoint-level of network public opinion in urban planning is the total set of emotional tendency at the viewpoint-level of all public feedback opinion texts of the urban planning case. The eigenvector of emotional tendency at the viewpoint-level of case $C_i$ is recorded as $OpiVec_i = (Pol_{i1}, Pol_{i2}, \cdots, Pol_{iP})$, and the text set of public feedback for case $C_i$ is recorded as $D_i = \{d_{ij} \mid j \in \{1, 2, \cdots, J\}\}$, where $J$ represents the total number of public feedback responses about the case. Any text $d_{ij}$ in public feedback set $D_i$ corresponds to $P$ categories of viewpoint. The emotional tendency at the viewpoint-level of $d_{ij}$ is recorded as $OpiVec_{d_{ij}} = (Pol_{ij1}, Pol_{ij2}, \cdots, Pol_{ijp}, \cdots, Pol_{ijP})$, where $Pol_{ijp}$ represents the category of emotional tendency at the viewpoint-level. The calculation method of $Pol_{ip}$ for the emotional tendency of case $C_i$ is shown in Equation (4).
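The bottom-up merging of viewpoint clusters can be sketched in plain Python. The exact inter-class distance of Equation (3) is not recoverable from the text, so this sketch assumes single linkage (the smallest Euclidean distance between any pair of member vectors); that choice, the function names, and the toy vectors are assumptions made purely for illustration.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Cluster = Tuple[int, ...]  # indices of the vectors belonging to a cluster


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def single_linkage(c1: Cluster, c2: Cluster,
                   vectors: Sequence[Sequence[float]]) -> float:
    """ASSUMED inter-class distance: closest pair of member vectors."""
    return min(euclidean(vectors[i], vectors[j]) for i in c1 for j in c2)


def agglomerate(vectors: Sequence[Sequence[float]]
                ) -> List[Tuple[Cluster, Cluster, float]]:
    """Merge the two closest clusters until a single cluster remains.

    Returns the merge history as (cluster_a, cluster_b, distance).
    """
    clusters: List[Cluster] = [(i,) for i in range(len(vectors))]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: single_linkage(pair[0], pair[1], vectors))
        merges.append((a, b, single_linkage(a, b, vectors)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Three toy "viewpoint vocabulary vectors" in two dimensions.
history = agglomerate([[0.0, 0.0], [0.0, 1.0], [5.0, 0.0]])
```

The merge history plays the role of the multi-level opinion clusters: cutting it at a chosen distance yields a flat set of opinion categories. This quadratic-time loop is only suitable for the small number of topics an LDA model typically produces.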

Calculation of case similarity
This paper proposes a similarity calculation model of urban planning cases that integrates case attributes. According to the case expression framework, the attributes of urban planning cases are composed of TA, SA, and PA, and the case similarity is defined as the average similarity of the three attributes. Consider cases $C_i$ and $C_j$ in the historical case set $C$, where $i, j \in [1, n]$ and $n$ is the total number of cases. The attribute set of case $C_i$ is $A_i = \{TA_i, SA_i, PA_i\}$, and the attribute set of case $C_j$ is $A_j = \{TA_j, SA_j, PA_j\}$, so the similarity of cases $C_i$ and $C_j$ is calculated as follows: $$CaseSim(C_i, C_j) = \frac{ATSim(TA_i, TA_j) + ASSim(SA_i, SA_j) + APSim(PA_i, PA_j)}{3} \quad (5)$$ where ATSim, ASSim, and APSim are the similarities of TA, SA, and PA, respectively. The larger the CaseSim value, the greater the degree of similarity between cases.
The calculation process of case similarity is shown in Figure 4. Firstly, the eigenvectors of TA, SA, and PA are obtained. Secondly, different similarity calculation methods are used for the different attributes to calculate the local similarity between the different attribute features. Finally, the global similarity between cases is obtained according to Equation (5).
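The final aggregation step, in which the global case similarity is the average of the three local similarities, can be sketched as follows. The function name, the range check, and the example values are illustrative assumptions.

```python
def case_similarity(ta_sim: float, sa_sim: float, pa_sim: float) -> float:
    """Global similarity: unweighted mean of the temporal, spatial,
    and population similarities, each expected to lie in [0, 1]."""
    for name, value in (("ta_sim", ta_sim), ("sa_sim", sa_sim), ("pa_sim", pa_sim)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return (ta_sim + sa_sim + pa_sim) / 3.0


# Hypothetical local similarities for one pair of cases.
global_sim = case_similarity(0.9, 0.6, 0.3)
```

Because each local similarity lies in [0, 1], the mean does as well, so cases can be ranked directly by this value when retrieving similar historical cases.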
2.3.2.1. Similarity of TA. For cases $C_i$ and $C_j$, the time similarity is calculated as in Equation (6), where $TA_1$ and $TA_n$ are, respectively, the times of the first and last cases in the historical planning cases. TADis is the distance in time, calculated as in Equation (7), where $Num(TA_i)$ is the 8-digit representation of $TA_i$, calculated as $Num(TA_i) = 10{,}000 \times TY_i + 100 \times TM_i + TD_i$, and $TY_i$, $TM_i$, and $TD_i$, respectively, represent the year, month, and day of the start time of the planning case announcement.
The value range of ATSim is [0, 1], where larger ATSim values imply greater degrees of similarity of TA.
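The 8-digit date encoding is given explicitly in the text, but the exact forms of Equations (6) and (7) are not recoverable here. The sketch below therefore assumes that the time distance is the absolute difference of the 8-digit numbers and that the similarity is one minus that distance normalized by the distance between the first and last historical cases. Both formulas and all function names are assumptions, chosen only because they yield values in [0, 1] as the text requires.

```python
from typing import Tuple

Date = Tuple[int, int, int]  # (year, month, day)


def num(date: Date) -> int:
    """8-digit representation from the text: 10000*year + 100*month + day."""
    year, month, day = date
    return 10_000 * year + 100 * month + day


def temporal_distance(a: Date, b: Date) -> int:
    """ASSUMED form of the time distance: absolute difference of codes."""
    return abs(num(a) - num(b))


def temporal_similarity(a: Date, b: Date, first: Date, last: Date) -> float:
    """ASSUMED form of the time similarity, normalized by the span
    between the first and last historical cases."""
    span = temporal_distance(first, last)
    if span == 0:
        raise ValueError("first and last case must differ in date")
    return 1.0 - temporal_distance(a, b) / span


# First and last collection dates reported in the data source section.
first_case, last_case = (2007, 11, 22), (2019, 9, 3)
sim_same = temporal_similarity(first_case, first_case, first_case, last_case)
sim_far = temporal_similarity(first_case, last_case, first_case, last_case)
```

Note that differences between 8-digit codes are not proportional to elapsed days (for example, the step from 31 December to 1 January is large), so this encoding gives only a coarse ordering of how far apart two announcements are.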
2.3.2.2. Similarity of SA. This paper divides the SA of each case into category semantic features and location semantic features. Semantic features are n-dimensional vectors of the form $(w_1, w_2, \cdots, w_l, \cdots, w_n)$, where $w_l$ is the weight corresponding to semantic feature term $l$. In this paper, the cosine distance is used to calculate the similarity of semantic eigenvectors, and hence measure the consistency of SAs in different semantic dimensions. For cases $C_i$ and $C_j$, the similarity of SA is defined as the average of the category semantic similarity and the location semantic similarity, as shown in Equation (9): $$ASSim(SA_i, SA_j) = \frac{ClassSim(C_i, C_j) + LocSim(C_i, C_j)}{2} \quad (9)$$ where ClassSim represents the semantic similarity of case categories, and LocSim represents the semantic similarity of the case location. ClassSim and LocSim are calculated by Equations (10) and (11), respectively, where $SemVec^{Class}_i$ and $SemVec^{Class}_j$ in Equation (10) represent the category semantic vectors of cases $C_i$ and $C_j$, respectively, $SemVec^{Loc}_i$ and $SemVec^{Loc}_j$ in Equation (11) represent the location semantic vectors of cases $C_i$ and $C_j$, respectively, and CosDist is the cosine distance between the semantic vectors, used to measure the similarity between the directions of the vectors. For any two word vectors, e.g. $v_i = (w_{i1}, w_{i2}, \cdots, w_{il}, \cdots, w_{in})$ and $v_j = (w_{j1}, w_{j2}, \cdots, w_{jl}, \cdots, w_{jn})$, CosDist is calculated as in Equation (12): $$CosDist(v_i, v_j) = 1 - \frac{\sum_{l=1}^{n} w_{il} w_{jl}}{\sqrt{\sum_{l=1}^{n} w_{il}^2} \sqrt{\sum_{l=1}^{n} w_{jl}^2}} \quad (12)$$ The value range of CosDist is [0, 2]. If two vectors point in exactly the same direction, the cosine distance between them is 0. The value range of ASSim is [0, 1]; hence, larger ASSim values imply higher similarities between the SAs of the urban planning cases.
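The cosine distance and the averaging of category and location similarities can be sketched as follows. The cosine distance is the standard one with range [0, 2]. How the paper maps that distance to a similarity in [0, 1] is not recoverable from the text, so the linear mapping `1 - d / 2` used here is an assumption, as are all function names and example vectors.

```python
import math
from typing import Sequence


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 minus the cosine of the angle between u and v; range [0, 2]."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    dot = sum(x * y for x, y in zip(u, v))
    return 1.0 - dot / (norm_u * norm_v)


def distance_to_similarity(d: float) -> float:
    """ASSUMED mapping from a distance in [0, 2] to a similarity in [0, 1]."""
    return 1.0 - d / 2.0


def spatial_similarity(class_i: Sequence[float], class_j: Sequence[float],
                       loc_i: Sequence[float], loc_j: Sequence[float]) -> float:
    """Average of category semantic and location semantic similarity."""
    class_sim = distance_to_similarity(cosine_distance(class_i, class_j))
    loc_sim = distance_to_similarity(cosine_distance(loc_i, loc_j))
    return (class_sim + loc_sim) / 2.0


# Identical category vectors, orthogonal location vectors (toy values).
sa_sim = spatial_similarity([1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
```

With identical category vectors (similarity 1) and orthogonal location vectors (similarity 0.5 under the assumed mapping), the spatial similarity is 0.75. Cosine distance ignores vector length, which suits semantic vectors whose direction matters more than their magnitude.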
2.3.2.3. Similarity of PA. In this paper, three types of features of the subdistrict where the case is located, namely population age, education level, and main source of living, are represented as multi-dimensional eigenvectors. The population age is divided into $k$ equally spaced intervals, and the eigenvector of the population age in case $C_i$ is defined as $PeoVec^{Age}_i = (w^{Age}_1, w^{Age}_2, \cdots, w^{Age}_l, \cdots, w^{Age}_k)$, where the vector dimension is $k$, and $w^{Age}_l$ is the eigenvalue of the $l$-th dimension, equal to the proportion of people whose ages fall in the $l$-th interval. The education level is divided into $s$ categories, and the eigenvector of the education level in case $C_i$ is defined as $PeoVec^{Edu}_i = (w^{Edu}_1, w^{Edu}_2, \cdots, w^{Edu}_l, \cdots, w^{Edu}_s)$, where the vector dimension is $s$, and $w^{Edu}_l$ is the eigenvalue of the $l$-th dimension, equal to the proportion of the population with the $l$-th education level. The main sources of living are divided into $q$ categories, and the eigenvector of the main sources of living in case $C_i$ is defined as $PeoVec^{Sol}_i = (w^{Sol}_1, w^{Sol}_2, \cdots, w^{Sol}_l, \cdots, w^{Sol}_q)$, where the vector dimension is $q$, and $w^{Sol}_l$ is the eigenvalue of the $l$-th dimension, equal to the proportion of people whose main source of living falls in the $l$-th category.
The Euclidean distance is used to compare the population eigenvectors and thus measure the differences between the values of population features in different dimensions. For cases $C_i$ and $C_j$, the similarity of PA is defined as the average of the similarities of age, education level, and main source of living, as shown in Equation (13):

$$APSim(C_i, C_j) = \frac{AgeSim(C_i, C_j) + EduSim(C_i, C_j) + SolSim(C_i, C_j)}{3} \tag{13}$$

where AgeSim, EduSim, and SolSim are the similarity degrees of age, education level, and main source of living, respectively. Taking AgeSim as an example, it is calculated as in Equation (14) from $PeoVec^{Age}_i$ and $PeoVec^{Age}_j$, the population age eigenvectors of cases $C_i$ and $C_j$, using EucDist, the Euclidean distance between the two eigenvectors:

$$EucDist(PeoVec^{Age}_i, PeoVec^{Age}_j) = \sqrt{\sum_{l=1}^{k} \left(w^{Age}_{il} - w^{Age}_{jl}\right)^2} \tag{15}$$

where $w^{Age}_{il}$ and $w^{Age}_{jl}$ are, respectively, the eigenvalues of the population age eigenvectors of cases $C_i$ and $C_j$ in the $l$-th dimension. EduSim and SolSim are calculated in the same way as AgeSim. The value range of APSim is [0, 1]; hence, larger APSim values suggest higher degrees of similarity between the PAs of the planning cases.
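The PA similarity can be sketched as follows. The Euclidean distance and the three-way average follow the text. The conversion from distance to similarity is an assumption, since Equation (14) is not reproduced here: the sketch divides by $\sqrt{2}$, which is the largest possible Euclidean distance between two proportion vectors (non-negative entries summing to 1), so that the result lies in [0, 1]. Function names are hypothetical.

```python
import math
from typing import Sequence


def euc_dist(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two eigenvectors of equal dimension."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def prop_sim(u: Sequence[float], v: Sequence[float]) -> float:
    """ASSUMPTION: similarity = 1 - distance / sqrt(2) for proportion vectors."""
    return max(0.0, 1.0 - euc_dist(u, v) / math.sqrt(2.0))


def ap_sim(age_i, age_j, edu_i, edu_j, sol_i, sol_j) -> float:
    """Average of the age, education, and source-of-living similarities."""
    return (
        prop_sim(age_i, age_j) + prop_sim(edu_i, edu_j) + prop_sim(sol_i, sol_j)
    ) / 3.0


if __name__ == "__main__":
    age_a, age_b = [0.2, 0.5, 0.3], [0.1, 0.6, 0.3]
    edu_a, edu_b = [0.4, 0.6], [0.4, 0.6]
    sol_a, sol_b = [0.7, 0.3], [0.5, 0.5]
    print(ap_sim(age_a, age_b, edu_a, edu_b, sol_a, sol_b))
```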

Prediction of public opinion features
On the basis of the case similarity calculations, KNN is used to obtain the results of CBR (Guo et al. 2003). However, the traditional KNN method takes the category containing the largest number of samples among the K nearest samples as the classification result, without considering the influence of sample similarity on the result, and is therefore prone to misclassification when K is poorly chosen (M.-L. Zhang and Zhou 2007). This study improves the traditional KNN method by proposing a calculation method for public opinion features based on SWs. The method weights the public opinion features of historically similar cases, such as their emotional orientation, and assigns larger weights to more similar cases. This helps to avoid ties, in which the K nearest neighbors contain equal numbers of samples from different categories, and reduces the number of classification errors caused by the choice of K, thereby improving the robustness of the prediction model.

Calculation of SW.
Based on the case similarity calculation method, the K cases that are most similar to the case to be predicted are selected from the historical case set. The public opinion features of the case to be predicted are then obtained by integrating those of the K historical cases. Let $C_i$ and $C_j$ denote the case to be predicted and the $j$-th similar case, respectively; the similarity between $C_i$ and each of its K closest cases is calculated according to Equation (5). The weights of the different cases are then set according to their corresponding similarities, as shown in Equation (16): the higher the similarity, the greater the weight.
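The selection and weighting step might look as follows in Python. The text states only that more similar cases receive larger weights; the specific choice made here, normalizing the similarities of the K selected cases so that the weights sum to 1, is an assumption and may differ from Equation (16). The function name and the dictionary-based input are hypothetical.

```python
from typing import Dict, Hashable


def top_k_weights(similarities: Dict[Hashable, float], k: int) -> Dict[Hashable, float]:
    """Select the k most similar historical cases and weight them.

    ASSUMPTION: weight = similarity / sum of the k selected similarities.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    ranked = sorted(similarities.items(), key=lambda item: item[1], reverse=True)[:k]
    total = sum(sim for _, sim in ranked)
    if total == 0:
        # All selected cases have zero similarity: fall back to equal weights.
        return {case_id: 1.0 / len(ranked) for case_id, _ in ranked}
    return {case_id: sim / total for case_id, sim in ranked}


if __name__ == "__main__":
    sims = {"case_a": 0.9, "case_b": 0.6, "case_c": 0.1}
    print(top_k_weights(sims, k=2))
```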

Prediction of public opinion features.
(1) Viewpoint-level sentiment. The emotional tendency of the case to be predicted is calculated by integrating the emotional tendencies of the K cases. According to the definition of the case problem in this paper, the emotional tendency prediction result of case $C_i$ is expressed as an emotional tendency vector, $OpiVec_i = \{Pol_{i1}, Pol_{i2}, \cdots, Pol_{im}\}$, where $m$ represents the total number of viewpoints. The emotional tendency $Pol_{il}$ is calculated as in Equation (17), where $H$ is the total number of emotional tendency value categories, and $Pol^h_{jl}$ indicates the emotional tendency in the $l$-th dimension of the $j$-th similar case, $C_j$: when that tendency belongs to class $h$, $Pol^h_{jl} = 1$; otherwise, $Pol^h_{jl} = 0$.
(2) Categories of concerned people. In addition to predicting emotional tendency, CBR can also predict the categories of people concerned about the case. The predicted result is expressed as the vector $PeoVec_i = \{Peo_{i1}, Peo_{i2}, \cdots, Peo_{iL}\}$, and the vector of the categories of concerned people in the $j$-th similar case is recorded as $PeoVec_j = \{Peo_{j1}, Peo_{j2}, \cdots, Peo_{jL}\}$, where $L$ is the total number of categories of concerned people. For the $l$-th dimension $Peo_{il}$, with $l \in \{1, 2, \cdots, L\}$, the calculation method is given by Equation (18), in which case $C_j$ is a case similar to case $C_i$, and $w(C_i, C_j)$ is the weight of $C_j$ with respect to $C_i$. $T$ is the total number of values that each dimension of the categories of concerned people can take. In this study, the possible values for each dimension are {0, 1}; that is, $T = 2$. When case $C_j$ has a category of concerned people corresponding to the $l$-th dimension, $Peo^t_{jl} = 1$; otherwise, $Peo^t_{jl} = 0$. According to Equation (18), the value of the $l$-th dimension of the vector of the categories of concerned people in case $C_i$ is the value with the largest sum of weights over the similar cases.
In summary, the similarity measurement method based on the multiple attributes of each case yields the historical planning cases most similar to the case to be predicted. From the public opinion features of these historical cases, the SW method predicts the emotional tendency of public opinion and the eigenvector of concerned groups for the planning case.
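The similarity-weighted vote described above can be illustrated with a short Python sketch. It assumes that, in each dimension, the predicted value is the one with the largest summed weight among the K similar cases, which matches the verbal description of Equation (18) and is taken here to apply to Equation (17) as well. The tie-breaking rule (smallest label value) and the function name are assumptions.

```python
from collections import defaultdict
from typing import List, Sequence


def weighted_vote(neighbor_vectors: Sequence[Sequence[int]],
                  weights: Sequence[float]) -> List[int]:
    """Predict each dimension as the label with the largest summed weight.

    neighbor_vectors: one label vector per similar case (all the same length).
    weights: one similarity weight per similar case, in the same order.
    ASSUMPTION: exact ties are resolved in favor of the smallest label value.
    """
    if len(neighbor_vectors) != len(weights) or not neighbor_vectors:
        raise ValueError("need one weight per neighbor and at least one neighbor")
    n_dims = len(neighbor_vectors[0])
    prediction = []
    for dim in range(n_dims):
        score = defaultdict(float)
        for vector, weight in zip(neighbor_vectors, weights):
            score[vector[dim]] += weight
        # sorted() fixes the iteration order, so max() returns the smallest
        # label among those sharing the highest score.
        prediction.append(max(sorted(score), key=lambda label: score[label]))
    return prediction


if __name__ == "__main__":
    # Emotional tendency codes: 0 neutral, 1 positive, 2 negative, 3 unmentioned.
    neighbors = [[1, 3], [2, 3], [2, 0]]
    weights = [0.6, 0.25, 0.15]
    print(weighted_vote(neighbors, weights))
```

In this example, an unweighted majority vote would assign label 2 to the first dimension (two of three neighbors), whereas the weighted vote assigns label 1 because the single most similar case carries more weight than the other two combined.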

Data preprocessing
(1) Division of training data and semantic vectors of cases. The first 3334 planning cases, spanning from 19 October 2007 to 10 August 2019, were selected as the historical data set, and the remaining 100 planning cases, ranging from 11 August 2019 to 2 September 2019, were selected as validation data for the prediction model of public opinion features. The TA, the semantic vector of the case category, and the semantic vector of the case position were obtained by calculation.
(2) Population division by attribute and population eigenvectors. The population was divided into 22 age groups: <1 year, 1-4 years, 5-9 years, 10-14 years, 15-19 years, … , 95-99 years, and 100 years and above. Therefore, the dimension of population age in this case is 22. The educational level was divided into seven categories: 'never went to school', 'primary school', 'junior high school', 'senior high school', 'junior college', 'undergraduate', and 'graduate'; thus, the dimension of educational level in this case is 7. The main sources of living were also divided into seven categories: 'labor income', 'retirement pension', 'unemployment insurance', 'minimum living security', 'property income', 'support from family members', and 'others'. As before, the dimension of the main sources of living in the case is 7. The reverse geocoding API of Gaode Map was used to obtain the subdistrict administrative division corresponding to the case's coordinates, and the population feature vector of the subdistrict was taken as the population feature vector of the case. Some sample population feature vector data of the case are shown in Table 2.
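As an illustration of how a 22-dimensional age eigenvector could be assembled, the sketch below maps ages to the 22 groups listed above and converts group counts into proportions. It is a hypothetical reconstruction: the study takes subdistrict-level population statistics as input rather than individual ages, the last group is assumed to include age 100, and the function names are invented for this example.

```python
from typing import Iterable, List

N_AGE_GROUPS = 22  # <1, 1-4, 5-9, ..., 95-99, 100 and above


def age_group_index(age: int) -> int:
    """Map an age in whole years to one of the 22 group indices (0-based)."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if age < 1:
        return 0
    if age < 5:
        return 1
    if age >= 100:  # ASSUMPTION: the last group starts at 100
        return N_AGE_GROUPS - 1
    return 2 + (age - 5) // 5  # five-year groups from 5-9 up to 95-99


def age_eigenvector(ages: Iterable[int]) -> List[float]:
    """Proportion of people in each age group; the entries sum to 1."""
    counts = [0] * N_AGE_GROUPS
    for age in ages:
        counts[age_group_index(age)] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("at least one person is required")
    return [count / total for count in counts]


if __name__ == "__main__":
    print(age_eigenvector([0, 3, 7, 7]))
```

The education and source-of-living eigenvectors can be built in the same way from their seven category counts.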

Analysis of the overall characteristics of network public opinion
(1) Categories of population. In the text of public feedback opinions, the public usually uses specific words to describe the age, occupation, and other characteristics of the people concerned, such as 'old man', 'child', 'student', 'head of household', 'leader', and so on, which indicate the category of the concerned person in the network public opinion data. With Chinese word segmentation and part-of-speech tagging, 28 groups of concerned people in the feedback text were obtained: owners, residents, children, students, the elderly, leaders, masses, pedestrians, patients, teachers, adult children, security guards, tourists, passengers, merchants, civil servants, workers, doctors, managers, intermediaries, technicians, passengers, police, sanitation workers, researchers, hosts, audience, party members, and cadres. As there are 28 population categories in the category dictionary of concerned groups, the dimension of the feature vector is 28.
The text extraction results of public feedback were summarized to obtain the categories of concerned people for each planning case. The number of cases involving each category of concerned people is shown in Figure 5. The 'owner' category appears in the largest number of cases, accounting for 57% of the total, which suggests that the 'owner' group is the most affected by planning.
(2) Viewpoints of public opinion. According to the viewpoint recognition method described in Section 2.3.1.2, the semantic similarities and differences among viewpoints were calculated, and viewpoints with similar lexical probability distributions were grouped into one class to obtain viewpoint clusters with hierarchical relationships. From the results of the viewpoint clustering, a total of 6 viewpoint categories and 23 viewpoint subcategories were obtained by selecting clusters with clear meanings in the probability vector of the viewpoint vocabulary. Therefore, the dimension of the emotional tendency eigenvector is 6 for viewpoint categories and 23 for viewpoint subcategories. A two-layer classification system of viewpoints about public opinion was then constructed; its hierarchy is shown in Figure 6. The viewpoint-level emotional tendencies were divided into four categories: neutral, positive, negative, and unmentioned, which are represented by the numbers 0, 1, 2, and 3, respectively. The positive category indicates approval and support for the planning; the negative category indicates dissatisfaction with the planned project; and the neutral category indicates suggestions and expectations regarding details of the planning scheme.

Analysis of the influencing factors
To explore the influence of the case attributes on the prediction of public opinion features, we designed a prediction comparison experiment with different combinations of the case attributes TA, SA, and PA. Different attribute elements were considered when calculating the similarities of the planning cases based on Equation (4), so that public opinion features were predicted from each combination of attributes. We set the K value to 2% of the total number of historical cases, i.e. roughly 67 of the 3334 historical cases. The viewpoint-level emotional tendency of the public opinion in the case to be predicted was calculated based on Equation (17), and the categories of the concerned groups of the public opinion in the case were predicted based on Equation (18). The Hamming loss (HL), Average-Macro-F1 (MaF1), and Average-Micro-F1 (MiF1) scores were used for model verification. Table 3 shows the experimental results of the predicted emotional tendencies of the viewpoint categories and viewpoint subcategories, and the categories of concerned groups. Smaller HL values and larger MaF1 and MiF1 values indicate higher prediction accuracy.
In the prediction of public opinion based on the method proposed in this paper, the multi-attribute fusion (TA + SA + PA) prediction model of public opinion features had the highest prediction accuracy, higher than that obtained with any single attribute or any pair of attributes. The prediction results of a single attribute reflect, to a certain extent, the importance of the different attributes for the prediction of public opinion features. Here, we found that the prediction accuracy based on SA was the highest, which indicates that SA is a key factor affecting the prediction of public opinion features. Interestingly, SA is ignored in most public opinion predictions at present. Similarly, among the combinations of two attributes, the combination of SA and PA had the highest prediction accuracy. With SAs, TAs, and PAs all introduced into the case similarity calculation, the multi-attribute fusion (TA + SA + PA) model predicted both the sentiment tendency and the characteristics of concerned groups more accurately, which verifies the effectiveness of the prediction of public opinion characteristics based on the combination of multiple attributes proposed in this paper.
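For reference, the three evaluation measures can be computed as in the following Python sketch, written here for binary multi-label vectors such as the categories of concerned people. These are the standard definitions of Hamming loss, micro-averaged F1, and macro-averaged F1; how the study averages them over the multi-class emotional tendency dimensions is not specified in this section, so that aspect is not covered. Function names are hypothetical.

```python
from typing import Sequence, Tuple

Matrix = Sequence[Sequence[int]]  # rows = cases, columns = binary labels


def hamming_loss(y_true: Matrix, y_pred: Matrix) -> float:
    """Fraction of label positions that are predicted incorrectly."""
    cells = [(t, p) for rt, rp in zip(y_true, y_pred) for t, p in zip(rt, rp)]
    return sum(t != p for t, p in cells) / len(cells)


def label_counts(y_true: Matrix, y_pred: Matrix, label: int) -> Tuple[int, int, int]:
    """True positives, false positives, and false negatives for one label."""
    tp = fp = fn = 0
    for rt, rp in zip(y_true, y_pred):
        t, p = rt[label], rp[label]
        tp += int(t == 1 and p == 1)
        fp += int(t == 0 and p == 1)
        fn += int(t == 1 and p == 0)
    return tp, fp, fn


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 score; defined as 0 when there are no positives at all."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(y_true: Matrix, y_pred: Matrix) -> float:
    """F1 computed from counts pooled over all labels."""
    counts = [label_counts(y_true, y_pred, l) for l in range(len(y_true[0]))]
    return f1(*(sum(c[i] for c in counts) for i in range(3)))


def macro_f1(y_true: Matrix, y_pred: Matrix) -> float:
    """Unweighted mean of the per-label F1 scores."""
    n_labels = len(y_true[0])
    return sum(f1(*label_counts(y_true, y_pred, l)) for l in range(n_labels)) / n_labels


if __name__ == "__main__":
    truth = [[1, 0, 1], [0, 1, 0]]
    pred = [[1, 0, 0], [0, 1, 1]]
    print(hamming_loss(truth, pred), micro_f1(truth, pred), macro_f1(truth, pred))
```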

Comparison of the prediction methods
To verify the SW-based method of calculating case public opinion features proposed in this paper, a comparative experiment between the SW method, the traditional KNN method, and a linear support vector machine (SVM) was designed. The HL, MaF1, and MiF1 were used for model verification. Following the results summarized in Table 3, the optimal combination of case attributes, namely 'TA + SA + PA', was selected. The experimental results of predicting the emotional tendencies of viewpoint categories and subcategories, and the categories of concerned people, are shown in Table 4.
The results show that, compared with the linear SVM and KNN methods, the calculation strategy based on SWs improves the prediction accuracy of public opinion features. The SW method had a higher MaF1 value than KNN in predicting the emotional tendency of viewpoint categories, the emotional tendency of viewpoint subcategories, and the categories of concerned people. With the SW method, the MaF1 value increased by about 10% on average, the MiF1 value increased by about 6%, and the HL value decreased by about 16%, indicating that the introduction of the SW can significantly improve the prediction accuracy and that the model is applicable to the prediction of various public opinion characteristics. Unlike in the KNN model, cases with higher similarity had higher weights in the prediction process of the SW model; the higher accuracy obtained with the SWs therefore indicates that planning cases with more similar attributes also have more similar public opinion characteristics. For all three models, the prediction accuracy for the sentiment tendency of the finer opinion categories was higher than that for the coarser opinion categories, which further indicates that a finer classification of opinion categories helps the model to measure the characteristics and similarity of case opinions, and thus to predict the sentiment tendency of public opinion more accurately.

Conclusions
Previous studies have not conducted multidimensional analyses of urban planning cases, even as the public's desire to participate in urban planning grows stronger, and few studies have fully considered multidimensional attributes such as subjective social population factors and objective geographical environment factors. Thus, based on the theory of geographical CBR, this paper presented an expression framework of urban planning cases that integrates subjective humanistic factors and objective geographical environment factors. By combining multidimensional attributes and case SWs, a prediction model was constructed that can predict the emotional tendencies and concerned groups of new planning cases. On the basis of the expression framework of case attributes, the similarity between cases was measured using multidimensional attributes, namely TA, SA, and PA. The emotional tendency and the concerned groups of new planning cases could then be predicted from the public opinion features of similar historical planning cases.
The experimental results showed that the case similarity measurement method with multidimensional attributes helped to improve the prediction results. Compared with the traditional KNN method, the proposed prediction model based on SWs effectively improved the prediction accuracy, with the average MiF1 score reaching more than 74%. The prediction model proposed in this paper can predict the development trend of public opinion for future planning cases. The predicted emotional tendencies and concerned groups could help to improve construction schemes for future planning projects in a targeted way, thereby improving urban residents' satisfaction and providing reasonable decision support for urban planning.
Future research work could be carried out in the following two ways. First, more influence factors can be considered when predicting public opinion features, especially as the influencing factors of public opinion in urban planning cases are complex. This will allow us to construct a more reasonable method for measuring case similarity. Second, the weights of the different impact factors can be calculated. With different attributes in the case attribute having different influences on emotional tendency, calculating their weights may help to further improve the prediction accuracy.
Glossary
CBR: case-based reasoning
TA: temporal attribute
SA: spatial attribute
PA: population attribute

Disclosure statement
No potential conflict of interest was reported by the author(s).

Data availability
The data that support the findings of this study are available from the corresponding author, R. L, upon reasonable request.