Research on post occupancy evaluation of Oze National Park in Japan based on online reviews

ABSTRACT With the development of the internet, online reviews have become a major form of user-generated content, and data-mining techniques can extract meaning from them. This study used a Python crawler to collect reviews and conducted a post-occupancy evaluation of Oze National Park (ONP), a nature conservation area in Japan. Content analysis and semantic network analysis were adopted to study the high-frequency words and their relationships. Latent Dirichlet Allocation was used to uncover the dimensions of the comments. The conclusions were as follows: 1) Japanese reviewers focused on landscape, plants and nature, Chinese reviewers focused on facilities and experience, and English-language reviewers concentrated on hiking. 2) The Ozegahara Plateau route and the Lake Ozenuma route were the two main courses in ONP. 3) About 48% of the landscape concept of ONP was conveyed to the users. 4) Male tourists focused on location information and landscape, while female tourists were concerned with the process and experience. Single visitors paid attention to the road and boardwalk. Families concentrated on experience and time, while friends and couples tended to perceive action. 5) Lysichiton americanus had become a basic feature of ONP, and visitors were satisfied with the landscape and experience.


Introduction
According to the International Union for Conservation of Nature, national parks protect large-scale ecological areas together with their species and ecosystems (IUCN 2020), integrating protection and moderate tourism development in a specific area. In 1872, the US Congress established the world's first national park, Yellowstone National Park, which preserved a natural area for the benefit of people (Sellars 2009). The creation of Yellowstone subsequently sparked an upsurge in the establishment and protection of national parks. Unlike the North American idea of establishing national parks to protect vast unspoiled natural areas, Germany focuses on smaller conservation areas, and its first national park, Bavarian Forest, was created in 1970 (Mayer and Woltering 2018). The British Parliament passed the National Parks and Access to the Countryside Act 1949 to promote national parks, and seven national parks were established in upland England in the 1950s (Thompson, Garrod, and Raley 2015). South Africa has a well-established national park system to protect the environment and biodiversity, with Kruger National Park first designated in 1898 (Burgess 2012). In Asia, Japan established a sound national park system in 1931 with the promulgation of the National Parks Law (Hiwasaki 2005).
Japan has 34 national parks, characterized by a variety of environments such as mountains, forests, farmland and rural areas. As early as the beginning of the 20th century, Japan began to explore an approach suited to its national conditions. An application to recognize Nikko as an imperial park was submitted to the National Diet in 1911. Under the strong initiative of citizens, the National Parks Law was issued in 1931, and Setonaikai, Unzen and Kirishima were the first three national parks to be designated, in 1934 (Ministry Of The Environment 2020a). In 1957, Japan revised the National Parks Law. Under the revised act, the natural park classification used today consists of three parts: 34 national parks, 57 quasi-national parks, and 311 prefectural natural parks (Ministry Of The Environment 2020b).
With the booming tourism industry, national parks have become major tourist destinations worldwide. However, many are facing environmental damage due to excessive tourism development (Eagles and McCool 2002; Finnessey 2012), resulting in the pollution and depletion of natural resources. Wildlife and plants have suffered gravely from the pollution (Finnessey 2012). Attention has therefore been paid to the protection of national parks and the environment (Xu and Fox 2014). In Japan, the establishment of national parks was more closely related to tourism and recreation than to nature conservation, and it has been difficult for Japan to balance the dual tasks of protection and recreation (Hiwasaki 2005). The increasing number of visitors to national parks in the 1960s led to environmental pollution, overdevelopment of tourism and destruction of biodiversity (Ministry Of The Environment 2021a).
In the digital era, the widespread use of the internet and social media has produced numerous online reviews containing travel information (Gretzel and Yoo 2008; Ye, Zhang, and Law 2009) and has opened a new way to collect feedback. Compared with online surveys, traditional approaches to collecting customer feedback are constrained by time, cost and geography (Fleming and Bowden 2009). They usually receive only a small number of responses, which tend to show obvious bias and are insufficient for analysis (Udyapuram and Gavirneni 2019). Tourists now frequently post and share their experiences online (Wong and Qi 2017), and these reviews are often used in tourism research to study the perceived image of destinations (Choi, Lehto, and Morrison 2007; Stepchenkova and Morrison 2006), customer satisfaction (Ahani et al. 2019; Xiang et al. 2017) and attractions (Fang et al. 2016; Pearce and Wu 2018). Techniques such as automated data collection and data mining are adopted to uncover the latent information in online reviews (Luo et al. 2021; Xiang et al. 2017). Although online reviews are widely used in tourism and hotel management, applications of social media reviews in landscape and urban planning are still lacking (Song et al. 2021). Because national parks also serve as tourist destinations, it is possible to evaluate their landscape and planning through online reviews.
Research on Japanese national parks has been carried out mainly in two areas: the policy and management of national parks (Norihisa and Suzuki 2006), and the protection of the ecosystem (Yamamoto 2019) and landscape (Kunii and Furuya 2011; Wang et al. 1999). However, few papers have conducted post-occupancy evaluation (POE), which is users' feedback on and evaluation of the built environment (Zimring and Reizenstein 1980), on national parks from the perspective of online reviews and big data. This study was a pilot work that conducted a post-occupancy evaluation of ONP using review data and big data approaches. Most importantly, it proposed a new method for POE combined with big data to guide urban planning and design practice, and it provides a reference for the related fields of landscape and the built environment. This paper addressed four research objectives: (1) the basic perception and attitude of reviewers toward ONP in different language contexts, (2) whether the landscape concept was efficiently conveyed, (3) differences and preferences in perception under different demographic characteristics, and (4) the topics of the online reviews.

Oze National Park
Located about 140 km north of Tokyo, Oze National Park (ONP) was originally designated as part of Nikko National Park and was separated from it on 30 August 2007. ONP covers an area of 37,200 ha across Fukushima, Tochigi, Gunma and Niigata and consists of Ozegahara Plateau and Japan's largest highland moor Lake Ozenuma together with the surrounding mountains and marshes (Ministry Of The Environment 2020c). Figure 1 shows that ONP is divided into a special protection zone and a special zone (Ministry Of The Environment 2020d). In the 1950s, excessive exploitation caused vegetation damage and excessive garbage (Hiwasaki 2007). The beauty of ONP (Figure 2) today is owed to nature conservation activities such as the control of private cars (Figure 3) and vegetation restoration. Those initiatives originated in ONP and were later applied nationwide, which is why ONP is considered the origin of nature conservation in Japan. To the north of Oze lie Mt. Keizuru and Mt. Hiuchigatake (Figure 4), the highest peak in northern Japan, and to the west lie Mt. Shibutsu, Mt. Koshibutsu and Mt. Kasagatake. Recognized as an internationally important wetland under the Convention on Wetlands (Ramsar Convention), ONP has great diversity of flora and fauna, with more than 938 species confirmed to grow there (Ramsar 2020).

Data collection and pre-processing
First, we used automated techniques to collect online reviews and form the users' database. The terms "Oze National Park", "Ozegahara Plateau", "Lake Ozenuma", etc. were used to search for online reviews related to ONP on several major tourism websites (Figure 5), such as tripadvisor (https://www.tripadvisor.jp) and jalan (https://www.jalan.net). We developed a crawler program in Python that mimicked a user's access to the websites by working through a list of URLs and downloaded all the information on the review pages linked from the search results. Subsequently, we extracted the relevant information, including comments, titles, ratings, gender and so on, and deleted duplicate and blank entries. Finally, we saved the data in a CSV file. A total of 947 unique reviews posted from 25 October 2007 to 20 July 2020 were collected, including demographic information such as gender and tourism type. The reviews consisted of 878 comments in Japanese, 31 comments in Chinese and 38 comments in English. Meanwhile, the designers' and managers' database came from 23 official files published on the website of ONP, including "The 7th Oze National Park Conference", "Outline of the Oze National Park Conference", etc. (Ministry Of The Environment 2021b).
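The cleaning step described above (dropping blank and duplicate entries, then saving to CSV) can be illustrated with a minimal Python sketch. It is not the crawler used in this study: the column names in FIELDS and the helper names clean_reviews and to_csv are assumptions made for illustration, and the sketch starts from records that have already been downloaded.

```python
import csv
import io

# Hypothetical column names; the study lists comments, titles, ratings and gender.
FIELDS = ["title", "comment", "rating", "gender"]


def clean_reviews(records):
    """Drop records with a blank comment and exact duplicate comments.

    The first occurrence of each comment is kept.
    """
    seen = set()
    cleaned = []
    for rec in records:
        text = (rec.get("comment") or "").strip()
        if not text or text in seen:
            continue  # blank or already-seen comment
        seen.add(text)
        row = {field: rec.get(field, "") for field in FIELDS}
        row["comment"] = text
        cleaned.append(row)
    return cleaned


def to_csv(records):
    """Serialize cleaned records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


raw = [
    {"title": "a", "comment": "Nice walk", "rating": 5, "gender": "F"},
    {"title": "b", "comment": "Nice walk", "rating": 4, "gender": "M"},
    {"title": "c", "comment": "   ", "rating": 3, "gender": ""},
]
cleaned = clean_reviews(raw)
csv_text = to_csv(cleaned)
```

Here duplicates are detected by exact comment text only; a real pipeline might also compare reviewer and date, which the study does not specify.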

Data construction
Separate databases were constructed (Table 1) to analyze the difference between the most frequent words in the online reviews of ONP and those in the texts of the official files. The users' database was divided into three parts according to the language of the reviews. The 878 Japanese online comments were combined to form the Japanese database, with a total of 152,929 characters. Some reviewers did not disclose their demographic information (Table 2). Thirty-one reviews with 18,195 characters and 38 reviews with 3,964 characters made up the Chinese and English databases, respectively. The other was the managers' database, with 23 official files converted from PDF into text format.

Content analysis
Content analysis of the online reviews was used to analyze word frequency in the data (Choi, Lehto, and Morrison 2007; Stepchenkova and Morrison 2006). The process was divided into the following steps: text pre-processing, word segmentation, stop-word removal, word-frequency counting and visualization. The texts of the Japanese, Chinese and English reviews and the texts of the official files were read into Python separately. Because Japanese and Chinese text is not delimited by spaces, the MeCab module, a Japanese word segmentation system (Kudo 2006), and the jieba module for Chinese word segmentation (J. Sun 2012) were imported into Python to segment the sentences. The third step was to remove the stop words, including default stop words and a self-built vocabulary, and to count the occurrences of each word. The frequency was then calculated as the number of word occurrences divided by the number of comments and is shown in Table 3 in rank order. Finally, the Tagxedo tool was adopted to visualize the results. In addition, this paper adopted content analysis to investigate the different perceptions under different demographic characteristics. By analyzing the most frequent words, the information that the designers and managers wanted to convey was compared with that received by the users (Table 4) to judge whether the concept was effectively communicated (Zhao, Lin, and Zhang 2019).
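The frequency measure defined above, word occurrences divided by the number of comments, can be sketched in a few lines of Python. This is an illustrative sketch only: the regular-expression tokenizer is a stand-in for English text (the study segmented Japanese with MeCab and Chinese with jieba), and the function names and sample stop words are assumptions.

```python
import re
from collections import Counter


def tokenize(text):
    """Stand-in tokenizer for English text (lowercase alphabetic runs)."""
    return re.findall(r"[a-z]+", text.lower())


def word_frequency(comments, stop_words=frozenset()):
    """Return (word, percentage) pairs in rank order.

    percentage = 100 * occurrences of the word / number of comments,
    so a word used more than once per comment on average exceeds 100%.
    """
    if not comments:
        return []
    counts = Counter()
    for comment in comments:
        counts.update(t for t in tokenize(comment) if t not in stop_words)
    total = len(comments)
    return [(word, 100.0 * n / total) for word, n in counts.most_common()]


sample = ["Oze is great, Oze marsh", "We hike in Oze"]
ranking = word_frequency(sample, stop_words={"is", "we", "in"})
```

Because the denominator is the number of comments rather than the number of words, values above 100% (such as those reported for Oze) are expected under this definition.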

Semantic analysis
Although content analysis reflected the importance of words in the data well, it still had certain limitations. The relationships among keywords and their classification were ignored because the positional information of words was lacking (X. Sun and Ni 2018). Semantic network analysis was therefore adopted to make up for this shortcoming. It takes the most frequent words as nodes and the co-occurrence frequency of the words as the connections between nodes to draw meaning from high-frequency words (Hou et al. 2019). Python was used to find the occurrences of the top 100 most frequently mentioned words in each comment. Table 5 shows a sample of the results. The resulting data were then imported into SPSS 24.0 for correlation analysis. Word pairs with a correlation coefficient above 0.4 were considered strongly correlated and were extracted, and those below 0.4 were considered irrelevant and eliminated. Furthermore, high-frequency location words were extracted separately and subjected to semantic analysis to understand the courses and planning of ONP.
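The correlation step can be sketched in plain Python as follows. This is a hedged illustration, not the SPSS procedure used in the study: it assumes that each word is represented by its per-comment occurrence counts and that the correlation is Pearson's r, neither of which is stated explicitly in the text. The function names and sample data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant vector has no defined correlation
    return cov / sqrt(var_x * var_y)


def strong_pairs(tokenized_comments, vocabulary, threshold=0.4):
    """Return (word_a, word_b, r) for word pairs with r above the threshold."""
    # One vector per word: how often it occurs in each comment.
    vectors = {
        word: [tokens.count(word) for tokens in tokenized_comments]
        for word in vocabulary
    }
    edges = []
    for a, b in combinations(vocabulary, 2):
        r = pearson(vectors[a], vectors[b])
        if r > threshold:
            edges.append((a, b, r))
    return edges


comments = [["hut", "night"], ["hut", "night", "bus"], ["bus"], ["walk"]]
edges = strong_pairs(comments, ["hut", "night", "bus"])
```

The resulting triples can serve as a weighted edge list for a network drawing tool, with r mapped to line width.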

LDA model
Topic models are widely used to infer latent topics when mining online reviews. Latent Dirichlet allocation (LDA) is one of the most common methods for topic modelling (Blei, Ng, and Jordan 2003) and is used to extract dimensions and related words from online reviews (Guo, Barnes, and Jia 2017; Luo et al. 2021; Song et al. 2021). Each review was modeled as a distribution over latent topics, and each topic was represented as a distribution over words. The Gensim module was imported into Python to conduct the LDA analysis (Řehůřek and Sojka 2011), and Term Frequency-Inverse Document Frequency (TF-IDF) was applied to reweight the high-frequency words. Finally, we used the pyLDAvis module to visualize the LDA results (Sievert and Shirley 2014). To determine the number of topics, LDA was run with topic numbers from 2 to 50. We identified 22 topics as the most suitable number based on the perplexity curve (Figure 6) and the intertopic distance map (Figure 7). On the one hand, the lower the perplexity score, the better the generalization performance (Blei, Ng, and Jordan 2003). On the other hand, we took the intertopic distance map into consideration and subjectively compared the overlap of topics under different numbers of topics. For example, Figure 7 shows that the overlap among 22 topics compared favorably with that among 10 topics. After comprehensive
(4) Feelings: Think (42.03%), Enjoy (26.42%), Good (20.73%), Very (20.05%), etc.
Note: Because Japanese words inflect, variants of the same word were counted as one word ("見ま" was counted as "見る").
Words like Times (14.46%) and comments 1 and 2 indicated that reviewers thought the park was worth visiting again, suggesting that the experience and landscape were highly appreciated. Negative or neutral reviews, like comment 3, mainly focused on transportation, especially in summer when ONP was crowded with visitors. Physical Strength (ranked 87th) was mentioned in 7.18% of the comments, which indicated that hiking might be physically demanding, make tourists prone to fatigue (see comment 4), and be unfriendly to middle-aged and elderly tourists.
Comment 1: "This was my second time to visit here. The weather was fine and the Lysichiton americanus was blooming."
Comment 2: "I was healed by the wonderful Ozenuma. I just stayed for a long time. We definitely want to go again."
Comment 3: "There are few places to rest and it is very crowded. I no longer want to go to Oze."
Comment 4: "I went to see the boardwalk and Ozenuma that I admired. I just walked and used my physical strength quite a bit"

Perception on ONP based on Chinese reviews
In the content analysis of the Chinese reviews (Figure 12), the most frequent word was still Oze (154.84%). Unlike Japanese reviewers, Chinese reviewers were more sensitive to location information and facilities. About 96.77% of the comments mentioned Wetlands, 83.87% mentioned Hut and 80.65% mentioned Japan. Moreover, about 93.55%, 41.94%, 38.71% and 29.03% of the comments mentioned Rest, Night, Bus and Two Days, respectively, showing that visitors paid attention to time and facilities during the journey (see comments 5 and 6).
Comment 5: "I have been there twice, and both stayed for two days" (Chinese)
Comment 6: "The stars are so beautiful at night" (Chinese)
Comment 7: "Spent 2 days. We stayed the night at Miharashi" (English)
Comment 8: "It's difficult to arrange everything at first" (English)
Comment 9: "We could have been better prepared" (English)
Comment 10: "I thought that huts and lodges that take bookings could have helped by providing more information about local conditions, especially to foreigners."
Comment 11: "Me being a foreigner and a solo hiker, to say the truth it's difficult to arrange everything at first."
Comment 12: "We were not prepared. No proper shoes."

Perception on ONP based on English reviews
In the English content analysis (Figure 13), the most frequent word was Park (152.63%), while Oze (78.95%) ranked only 4th in the list, quite different from the Japanese and Chinese reviews. About 78.95% of the comments mentioned each of Walk, Oze and National, and 68.42% of the reviews mentioned Hike, showing that English-language reviewers were more likely to perceive hiking during the journey. In addition, words like Day (63.15%), Hut (47.37%), Stay (47.37%) and Night (28.95%), together with comment 7, suggested that the trip often involved an overnight stay. Interestingly, Lysichiton americanus ranked high in the Japanese and Chinese reviews but not in the English list. One possible reason was that the plant's name is written the same way in Japanese and Chinese, whereas its English name is a technical term. The English reviews complained about the unfriendly booking system and poor preparation at the beginning of the trip (see comments 8-12).
Comparing the content analysis of the Japanese, Chinese and English comments revealed different perceptions under different language contexts, making the research more comprehensive. Japanese reviewers were sensitive to the landscape, nature and plants. Chinese visitors paid attention to facilities and the tourism experience, while English-language reviewers concentrated more on hiking.

Communication of landscape concept
The high-frequency words from the texts of the design and management files were visualized (Figure 14) and could be divided into the following parts: (1) Environmental protection. Utilize (4313)
Table 4 shows the comparison of the top 100 high-frequency words in the three languages with those of the official files. About 48% of the landscape concept was conveyed to the users. One possible explanation was that ONP was a pioneer of the Japanese conservation movement and had experienced overexploitation and other obstacles. Management therefore emphasized protection of the park rather than the development of tourism, which was considered a subsidiary element in ONP.
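One way such a share could be computed is as the overlap between two high-frequency word lists. The sketch below is an assumption for illustration: the text does not state the exact formula behind the 48% figure, and the function name conveyed_share and the sample words are hypothetical.

```python
def conveyed_share(official_words, user_words):
    """Percentage of official high-frequency words that also appear
    in the users' high-frequency word list (assumed definition)."""
    official = set(official_words)
    if not official:
        return 0.0
    shared = official & set(user_words)
    return 100.0 * len(shared) / len(official)


# Toy lists, not data from the study.
official_top = ["protect", "wetland", "deer", "boardwalk"]
user_top = ["wetland", "boardwalk", "hut"]
share = conveyed_share(official_top, user_top)
```

Under this definition the denominator is the official list, so the figure reads as the fraction of the managers' vocabulary that reached the users; using the union of both lists instead would give a smaller, symmetric measure.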

Different perceptions on ONP in gender reviews
According to the characteristics of the Japanese reviews, the comments were classified by demographic characteristics. In terms of gender, there were 37 female reviews and 44 male reviews. Table 6 compares the top 20 high-frequency words of the male and female reviews. The top three words in the male reviews were Ozegahara Plateau, Oze and Wetland, with frequencies of 131.82%, 90.91% and 65.91%, respectively. From reviews like comment 13, it was inferred that male visitors were sensitive to location information. Female comments, however, emphasized the process and experience of the trip, as demonstrated by words like Go (83.78%), Road (81.08%), Think (70.27%) and Lysichiton americanus (70.27%). Another interesting observation was that male visitors had a greater tendency than female visitors to perceive landscapes and flowers. Among the top 20 most frequent words, half of the words in the male list were related to landscape and plants, compared with only 5 words in the female list.

Different perception on ONP in tourism type reviews
Table 7 compares the top 20 high-frequency words among friends, family, single and couple travel. Couple reviews accounted for the majority of the reviews with a stated travel type, indicating that couple travel was the most common in ONP. Words such as Oze, Lysichiton americanus and Go ranked high in all four groups of reviews. This reinforced the notion that Lysichiton americanus was the main feature of ONP. However, the ranks of Road and Tree differed considerably across the lists. For instance, Road ranked 1st in the single reviews, 11th in the couple reviews, 9th in the friends reviews and 17th in the family reviews. Each group emphasized different aspects of the trip. As comment 14 showed, single tourists focused on the road and boardwalk. Family travel emphasized the experience and the trip itself. For example, Month (45.16%) and comments 15 and 16 showed that families paid great attention to the time and period. Words like Go, See, Walk and Think were used most frequently in the friends and couple lists, indicating that these visitors tended to perceive action. In general, ONP was suitable for both individual and group travel.

Figure 15. Semantic network analysis of Japanese online reviews.

Potential connection of the words
This paper conducted semantic network analysis on the content of the Japanese reviews, using high-frequency words as nodes to examine the relationships among the top 100 most frequently mentioned words. The NetworkX module was imported into Python (Hagberg, Swart, and Schult 2008) to visualize the semantic network of the Japanese reviews (Figure 15). The width of the line between two words expressed the strength of their correlation. Based on the analysis of the semantic network graph, the following conclusions were drawn: (1) Hatomachi-Yamanohana, Ushikubi bunki-Yamanohana, Hatomachi-Ryugu, Yamanohana-Ryugu and Hatomachi-Tokura were word pairs about location (Figure 16). The lines between these keywords were prominent, indicating that the correlations between the paired sites were quite strong (see comment 17).  common course of Ozegahara Plateau. The second route began at Oshimizu and connected Lake Ozenuma and Wetland. Ozegahara Plateau ranked higher than Lake Ozenuma in Table 3, perhaps indicating that people preferred Ozegahara Plateau to Lake Ozenuma.
(2) The connections Hut-Night, Mountain Lodges-Night and Hut-Ryugu suggested that a trip through Oze could take long enough to require overnight accommodation in Oze (see comment 18).
(3) Toilet-Yen, Park-Yen, Bus-Trip and Park-Tokura also had strong correlations. The parking lots and toilets in Oze charged a fee for waste disposal. Moreover, private cars were restricted. Tourists also complained about the expensive parking fee and the long walking distance to the park.

Topic extraction
Twenty-two topics (Table 8) were identified by the LDA method, and the top five words and their relative weights are shown for each extracted topic. For instance, the first and highest-ranked topic, "Lysichiton americanus", was based on the words Summer, Lysichiton americanus, Month, Ozegahara Plateau and Time, with a weight of 0.8%, ranking high in the first topic. It was inferred that Lysichiton americanus had become a basic feature of ONP, especially in summer. The words in the Course topic showed that the boardwalks sometimes got slippery and that rain was common in ONP, which was also demonstrated by the strong correlations of End-Rain and Rain-Here in Figure 15. An interesting topic was Self-restraint, showing that ONP was affected by the spread of COVID-19. For example, original comments 19 and 20 described the policy as follows:
Comment 19: "Refrain from entering the mountain. Even in Oze, please refrain from entering the mountain for the time being."
Comment 20: "On 17 April 2020 a request was made to refrain from entering Oze."

Discussion
The main contribution of this paper was its methodology, which combined online reviews and big data to evaluate the landscape of ONP. Most reviewers acknowledged that ONP was worth visiting and were satisfied with the journey. The perception of ONP drawn from online reviews can provide a reference for evaluating the landscape and management of national parks and can directly guide landscape design practice. Suggestions for ONP management include the following: (1) As the high-frequency word "Physical Strength" in section 3.1.1 indicates, long and tiring hikes may cause excessive physical exertion and be unfriendly to middle-aged and elderly people. Consequently, more replenishment points should be set up at fixed scenic spots.
(2) Meanwhile, as the analysis in section 3.1.3 shows, the government should make use of the internet to promote pre-trip research and eco-tourism among local people and foreigners, which would help raise awareness of environmental protection. Ultimately, natural treasures could be preserved for future generations. (3) Auxiliary facilities are of great importance to ONP. For instance, the boardwalk needs to be repaired regularly to protect visitor safety and the ecology, especially since the boardwalk gets slippery on rainy days, as demonstrated in the topic analysis in section 3.5.1. In addition, safety measures and education should be implemented, since deer and bears often appear in ONP.
Despite these findings, this study has some limitations. First, the small samples of English and Chinese reviews made it difficult to gather precise perceptions of Oze and to conduct semantic and LDA analysis comparable to those for the Japanese reviews; this reflects the fact that ONP is located in Japan and domestic visitors account for the majority. Second, there was no comparison or reference case when studying whether the landscape concept was efficiently conveyed. Moreover, some deviations existed in the word-frequency calculations because of the inflectional variation of Japanese words.

Conclusion
In summary, this research proposed a new approach to post-occupancy evaluation of landscape based on online reviews and big data. The analysis led to the following conclusions: (1) The Japanese, Chinese and English word-frequency lists were compared to understand perceptions of the park. Japanese reviewers were sensitive to the landscape, nature and plants, and Chinese visitors paid attention to facilities and the tourism experience, while English-language reviewers concentrated more on hiking.
(2) There were two main courses in ONP: the Ozegahara Plateau route and the Lake Ozenuma route, each of which took at least one day. Judging from the frequencies of the two place names, people preferred to visit Ozegahara Plateau rather than Lake Ozenuma.
(3) Only 48% of the landscape concept of the park was conveyed to the users, because the management attached great importance to the protection of the national park and considered tourism a subsidiary element.
(4) Male tourists stressed geographical location information, while female tourists were more concerned with the process and experience. Male visitors tended to perceive the landscape more than female visitors. In terms of tourist type, single visitors paid attention to the road and boardwalk, family travelers placed emphasis on experience and time, and friends and couples tended to perceive action.
(5) Lysichiton americanus had become a basic feature of ONP, especially in summer. Almost all visitors thought highly of the scenery and landscape. Negative or neutral reviews mainly focused on transportation. With the rapid advance of the internet and big data, online reviews offer a new way of evaluating the landscape. This paper conducted a post-occupancy evaluation of ONP based on online reviews and big data and is expected to provide a reference for the design and management of national parks and for post-occupancy evaluation of landscape.