The use of user-generated content for business intelligence in tourism: insights from an analysis of Croatian hotels

Web-based peer review sites are gaining importance in travellers' decision-making and provide information for destinations' management. Textual reviews are especially important, but very extensive and hard to process. This article discusses the benefits of recent developments in computational linguistics and shows how they can be used, based on a study of 18,000 reviews of Croatian hotels. Results show that numerical evaluation rarely provides sufficient information, while textual reviews reveal details about facilities' competitive (dis)advantages. Being very extensive, the reviews are difficult to use. By applying computational linguistics, the study illustrates how the information can be summarised and used in decision-making. The study extends the application of computational linguistics methodology to tourism literature and provides the first extensive analysis of TripAdvisor data for Croatia.


Introduction
Understanding and measuring the nature of tourist destination, hotel, and service images, as well as assessing the impact of these images on the traveller decision-making process and on firms' performance, have become essential for both travellers and managers. Tourist destination image, for which accommodation often represents a major component, is 'the sum of beliefs, ideas and impressions that a person has of a destination' (Crompton, 1979, p. 18) and is built from 'associations, beliefs and attitudes about a company' in the minds of consumers (Barich & Kotler, 1991, in Foroudi et al., 2018). These images are constructed from many sources. Today, consumers are increasingly relying on the Internet. Internet 2.0 has changed consumer behaviour in many industries, including the hospitality industry, primarily through increasing the importance of on-line life, including blogs, forums, and different social platforms (Casaló, Flavián, Guinalíu, & Ekinci, 2015; Godnov & Redek, 2014; Kladou & Mavragani, 2015). The Internet is a medium for exchanging and sharing tourist experiences, 'travel-related comments, opinions and personal experiences, which then serve as information for others' (Tuominen, 2011, p. 3). Thereby, travellers co-create and co-influence the creation of a tourist destination, hotel or brand image (Simms & Gretzel, 2013; Yang, Park, & Hu, 2018). Peer travellers and their evaluations on on-line booking sites have become an essential source of information for other travellers as well as destination and hotel management staff, as they provide short numerical or star evaluations as well as extensive written evaluations, often supported with other materials (pictures, videos).
Since the majority of travellers rely on internet sources as support in their travel decisions, and even more travellers admit that online peer reviews affected their planning, the importance of studying online travel reviews in relation to destination or hotel image is clear (Filieri, Alguezaui, & McLeay, 2015; Jeong & Mindy Jeon, 2008; Lee, Law, & Murphy, 2011; Memarzadeh & Chang, 2015). Online peer reviews are very influential also because such reviews are considered to be objective and trustworthy in presenting real experiences ('The 2014 Traveler's Road to Decision,' 2014). This view is shared by a number of other researchers (Filieri et al., 2015; Gretzel & Yoo, 2008; Liu, Bi, & Fan, 2017; Petrescu, O'Leary, Goldring, & Ben Mrad, 2017; Yuan, Hong, & Pavlou, 2012); destination and hotel management should pay particular attention to the mixed and negative reviews, since these are considered even more trustworthy (Wu, Noorian, Vassileva, & Adaji, 2015).
For the destination and/or hotel management, tourists' 'impressions' shared through the web are important for several reasons. First, for a potential traveller the perception implies a decision for or against a specific hotel or destination, and it influences loyalty towards the destination or accommodation (Casaló et al., 2015; Foroudi et al., 2018; Rajesh, 2013). Following the consumer decision-making process, the experience and the created image will also be shared with other potential travellers and will consequently impact their final choice (Gruen, Osmonbekov, & Czaplewski, 2006; Serra Cantallops & Salvi, 2014; Torben, 2013; Zhao, Yang, Narayan, & Zhao, 2012). This is evidently also relevant to companies and managers in the hospitality industry, as it is an important factor of tourist destination competitiveness and business performance (Fan, Che, & Chen, 2017; Horwath Htl, 2016; Luca, 2016; Tuominen, 2011).
From the perspective of the tourists' destination, hotel or location image, the two most important questions are: what do the (potential) travellers see and how can this information be used efficiently to enhance business performance? To show how and why online reviews can provide relevant decision-making input to hospitality managers, this paper combines the most recent text-mining approaches (including aspect-based sentiment) with standard statistical methods. The study relies on 18,000 reviews of 87 Croatian hotels, for which the focus is primarily on the application and usefulness of methods in decision-making and not the evaluation of Croatian hotels.
This article contributes to the literature in the field of tourism in several aspects. First, from the methodological perspective, it provides an in-depth overview of computational linguistics methods for use in business intelligence that could be used in analysing online written content in tourism but also elsewhere. The article also discusses the practical application of each method and the possible use of each type of result as a source of business intelligence. The study aids in understanding the decision-making process of the consumer/guest as it identifies the most discussed and important aspects. In addition, the article provides insights about Croatian hotels, which could be used by the local hospitality sector. The results are discussed in order to identify prospective use by the destination management, which extends the existing studies of destinations' strategic behaviour and competitiveness. In addition, the study extends the existing literature on tourist destination image and image formation, its analysis and use.
The article is structured as follows. First, the materials and methods are discussed. Primarily different text mining approaches are discussed in this segment to present the methodology and the practical applicability of results obtained from a specific method for both destination management as well as potential travellers. The results, obtained using the methodology presented on a broad sample of reviews for Croatian hotels are then discussed. The article concludes with practical implications of the methodology.

Materials and methods: investigating destination or hotel image using textual reviews
The role of user-generated content in image formation and consumer decision-making
The Internet has changed consumer behaviour. Today, consumers are 'digital explorers' and influencers. If, in the past, consumers went from a stimulus (perhaps a TV commercial) to the shelf and then experienced the product, this simple three-stage consumer journey has today become more complex. While in the past, the first experience was usually obtained with the first use, consumers today read reviews, watch videos, ask their questions on social networks and then make a decision (a more informed one). Following their hands-on experience, consumers today much more significantly also become influencers with their user-generated content when they share videos, experience on social networks, share reviews (Torben, 2013). This builds the reputation of a product or a destination, hotel, location, etc. in tourism and thereby affects its importance directly.
In the hospitality industry, TripAdvisor and similar information sources became important in the first stage and remain so in the second. TripAdvisor, Booking and Expedia offer 'trustworthy' and easily accessible information. First, these pages offer summarised information about the perceived tourist destination quality because they offer numerical evaluations. However, they also offer descriptions. Moreover, written text exhibits emotions better than numbers do (Yus, 2005). It not only offers data and facts but also adds colour and tone to the text by adding emotions with the choice of words. Chung and Pennebaker (2007) claim that words are used to convey emotions and stress the potential of the textual analysis for other branches of science as well, not just linguistics (e.g. Cohn, Mehl, & Pennebaker, 2004).
User-generated content shapes the image of a certain producer or provider (in the hospitality industry), where these images, which are shaped by consumers, also impact their performance. Information provided by consumers can provide efficient business intelligence.
A typical review provides two basic types of information: (1) a numerical or star evaluation; and (2) textual evaluation (review). This information can also be accompanied by other materials, such as videos, photos. However, given that star and text evaluations represent the predominant source, the paper focuses on these two in the analysis. To obtain comprehensive information, which can be used in decision-making, both sources (even better all) should be used. Namely, despite the clarity and straightforwardness of star-evaluation, this information does not provide a good overview of the actual situation (Table 1 provides an illustration).
Studying the information provided, the article illustrates how the following decision-making information can be obtained:
1. Should the management closely follow the written reviews as well, or do the star (numerical) evaluations suffice for an overall analysis of trends as well as a comparative analysis (e.g., between hotels of different brands, same brand at a different location)?
2. If yes, can the written evaluation be summarised efficiently?
3. What kind of information can be obtained from the written evaluation that will help identify competitive advantages, disadvantages, and help monitor trends in the hotel?
4. Should specific reviews be monitored more closely? Which are these?

Methods
To extract valuable business intelligence from the reviews, the following methods were used (Table 2 summarises them): (1) sentiment analysis (two methods of document-based sentiment analysis and aspect-based sentiment analysis) with emotion extraction; and (2) content analysis (key-words and topic modelling). These text-mining approaches were combined with standard statistical analysis applied to numerical outputs of sentiment analysis and numerical (star) evaluations. The analysis was conducted in R and Rapidminer software with the Aylien A.P.I. Sentiment analysis is the first technique used to investigate attitudes, opinions and emotions of the travellers. Sentiment analysis can provide valuable additional information, primarily to tourist facility management, since it provides them with a deeper understanding of the satisfaction of their past clients. A potential traveller also benefits from sentiment analysis, but in this case the analysis is done 'individually' as the text is read and depends on the person.
Analytically, sentiment analysis has been present for a while. According to Liu (2012), sentiment analysis has also gained momentum for business purposes since the year 2000 due to the increased needs and interest in natural language processing as well as the availability of digital texts. According to Pang and Lee (2008), the early forerunners date to the 1980s (Carbonell, 1981; Wilks & Bien, 1984), but it was only recently that the methodology started developing fast due to the increasing processing power as well as Internet 2.0, which contributed to the emergence of part of 'big data', which relies on different user interaction and customer opinion (Facebook, Twitter, etc.) and is increasingly important in marketing.
Technically, sentiment analysis is based on text mining. 'Text mining is an interdisciplinary field of activity amongst data mining, linguistics, computational statistics, and computer science. Standard techniques are text classification, text clustering, ontology and taxonomy creation, document summarization and latent corpus analysis' (Feinerer, Hornik, & Meyer, 2008). Feinerer et al. (2008) prepared a thorough overview of methods of text mining (in R software). Methodologically, sentiment analysis can be conducted at different levels: document (entire text) level, sentence level or subject level (author), and entity level (e.g. a specific hotel or a specific service of a hotel). To conduct a sentiment analysis, opinion words are sought. These are indicators of positive or negative sentiments or emotions. For example, for a hotel, examples of positive words could be 'good', 'amazing', and 'wonderful', while 'dirty', 'bad', and 'horrible' could be examples of negative terms, but not all of them have the same power or the extent of positive or negative connotation. Sentiment lexicons have been developed and are used in statistical packages (e.g. Hu & Liu, 2004; AFINN; MPQA; SentiWordNet; and others; see Potts [2011] for an overview). In such a lexicon, each word is assigned a 'sentiment' value. The simplest method classifies words as negative (-1), neutral (0), or positive (1). A cumulative score for every sentence is then calculated based on the number of positive and negative words. The overall score could be 0, implying a neutral sentence; a negative score implies a negative sentence, and a positive score a positive sentence. However, lexicons differ in the ranges of word valuation; for example, the simplest only distinguishes three classes (negative, positive, neutral), while more recent ones expand the range, also based on the different meanings a set of words with the same root might have (Potts, 2011).
It should be noted that the results depend on the choice of lexicon; a more advanced lexicon is not guaranteed to be a better one, although it is expected to be. In addition, the lexicon could be adjusted to the specific need/topic of research or new lexicons can be created.
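The lexicon-based scoring described above can be sketched in a few lines. This is a minimal illustration in Python, not the code used in the study (which was carried out in R); the tiny word lists and the names `tokenize`, `score_text` and `polarity` are hypothetical stand-ins for a full opinion lexicon and its tooling.

```python
import re

# Hypothetical toy lexicon; a real analysis would load a full opinion lexicon.
POSITIVE = {"good", "amazing", "wonderful"}
NEGATIVE = {"dirty", "bad", "horrible"}

def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def score_text(text):
    """Add 1 for each positive word and subtract 1 for each negative word."""
    score = 0
    for word in tokenize(text):
        if word in POSITIVE:
            score += 1
        elif word in NEGATIVE:
            score -= 1
    return score

def polarity(score):
    """Map a cumulative score to a positive, negative or neutral label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# Invented example reviews, for illustration only.
reviews = [
    "Amazing pool and wonderful staff, but the bathroom was dirty.",
    "Bad food and a horrible room.",
    "We stayed for three nights.",
]
for review in reviews:
    total = score_text(review)
    print(total, polarity(total))
```

Swapping the two sets for a graded dictionary (word mapped to a value) would turn the same loop into a weighted score, which is how lexicons with a wider range differ from the three-class case.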
Modern sentiment analysis also studies selected aspects and the attitudes (sentiment) related to those aspects. To evaluate the satisfaction with specific elements that are important for tourists in their evaluation process, aspect-based sentiment analysis was also used. While sentiment analysis is generally based on a text segment (sentence, description, etc.), consumers are primarily interested in satisfaction or sentiment regarding specific characteristics (restaurants, staff, bed, etc.) and their aspects (food, friendliness, etc.). Saujanya and Satyendra (2018) explain that aspect-based sentiment analysis is a two-step procedure, which first extracts terms in a set of sentences that include a specific aspect (e.g. restaurant). In the second stage, the polarity of each aspect is studied at the sentence level. Given a set of pre-defined aspect categories (e.g. food, cleanliness, friendliness), the aspects in a specific sentence are found and then evaluated as positive, negative or neutral.
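The two-step procedure can be illustrated with a small rule-based sketch. It is an assumption-laden simplification: the aspect categories, trigger terms, opinion words and function names below are hypothetical, and the commercial extension used in the study is certainly more sophisticated than keyword matching.

```python
import re

# Hypothetical aspect categories and trigger terms (illustrative only).
ASPECT_TERMS = {
    "food": {"food", "breakfast", "restaurant"},
    "cleanliness": {"clean", "dirty", "spotless"},
    "staff": {"staff", "reception", "concierge"},
}
# Hypothetical toy opinion lexicon.
POSITIVE = {"great", "friendly", "clean", "spotless", "helpful"}
NEGATIVE = {"dirty", "poor", "rude", "expensive"}

def split_sentences(text):
    """Split on sentence-ending punctuation and drop empty fragments."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]

def sentence_polarity(words):
    """Label a set of distinct words by its balance of opinion words."""
    score = sum((w in POSITIVE) - (w in NEGATIVE) for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

def aspect_sentiment(review):
    """Step 1: find sentences mentioning an aspect. Step 2: label each one."""
    results = []
    for sentence in split_sentences(review):
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        for aspect, terms in ASPECT_TERMS.items():
            if words & terms:
                results.append((aspect, sentence_polarity(words)))
    return results

# Invented example review.
example = ("The staff were friendly and helpful. The room was dirty. "
           "Breakfast was served at seven.")
labels = aspect_sentiment(example)
print(labels)
```

Each sentence is scored as a whole, so a sentence that mentions two aspects assigns the same label to both; finer-grained approaches attach opinion words to the nearest aspect term instead.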
Three methods were used: first, two standard sentiment analysis approaches and then the aspect-based approach. The original Hu and Liu (2004) method was applied, which classifies the words into three classes. Words from a negative list obtain a value of -1, neutral words 0, and words from a positive list obtain a value of 1. This initial method is still widely used, but with advances in IT techniques as well as the increased use of sentiment analysis, the classification of words or the structure of 'dictionaries' have also improved. Thus, the recent AFINN lexicon (Nielsen, 2011; Sharma, Kumar Yadav, & Pal, 2016) was also used. The words are evaluated from -5, implying a very negative word (e.g. 'bastard'), to 5, implying a very positive word (e.g. 'outstanding'). In total, almost 2500 words are classified. Finally, an aspect-based approach is used, which is especially valuable in the hospitality industry as it allows the study of satisfaction with relevant aspects of the industry.
To identify the polarity of each review (positive, negative, neutral) the Wiebe and Mihalcea (2006) subjectivity lexicon based on a naive Bayes classifier was used.
Sentiment analysis can also be extended by classification of the evaluations using the prevailing exhibited emotion: for example positive, negative, neutral. These most recent developments rely on an extensive N.R.C. lexicon (Mohammad, 2015, 2018), which comprises a total of 14,000 words and distinguishes the emotional sentiment of words in great detail (Mohammad, 2015). It enables the classification of texts into several categories by the prevailing emotion (disgust, fear, anger, sadness, surprise, anticipation, trust, joy) and also allows the identification of the most common words associated with each emotion. The most significant advantage of the method is the very detailed capture of the emotions. For example, Mohammad (2015) gives an example of the word 'abandon'. This word has a negative connotation. However, if only 'abandon' is used, this is usually associated with the emotions of fear and sadness. In contrast, 'abandoned' is, in addition to fear and sadness, often also associated with anger (i.e. 'abandonment'); the three could also be related to surprise, but in a negative way.
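How a word-emotion lexicon yields a prevailing emotion can be shown with a short sketch. The dictionary below is a hypothetical excerpt: the entries for 'abandon' and 'abandoned' follow the example in the text, while the remaining entries and the function names are invented for illustration and are not taken from the actual lexicon.

```python
import re
from collections import Counter

# Hypothetical excerpt of a word-emotion lexicon. Only 'abandon' and
# 'abandoned' follow the example in the text; the rest are invented.
EMOTION_LEXICON = {
    "abandon": {"fear", "sadness"},
    "abandoned": {"anger", "fear", "sadness"},
    "lovely": {"joy"},
    "friendly": {"joy", "trust"},
    "reliable": {"trust"},
}

def emotion_counts(text):
    """Count how often each emotion is triggered by the words of a text."""
    counts = Counter()
    for word in re.findall(r"[a-z]+", text.lower()):
        counts.update(EMOTION_LEXICON.get(word, ()))
    return counts

def prevailing_emotion(text):
    """Return the most frequent emotion, or None if no word is in the lexicon."""
    counts = emotion_counts(text)
    if not counts:
        return None
    best = max(counts.values())
    # Ties are broken alphabetically so that the result is deterministic.
    return sorted(e for e, c in counts.items() if c == best)[0]

print(emotion_counts("We felt abandoned at the airport"))
print(prevailing_emotion("Lovely room and friendly staff"))
```

Aggregating `prevailing_emotion` over all reviews of a hotel gives the share of reviews per emotion, the kind of summary that can then be compared across hotels.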
In the analysis, each review was evaluated from the sentiment perspective using two methods: the Hu and Liu (2004) method and the recent AFINN method. Each review was evaluated separately using these two lexicons, and the total 'sentiment' of each review was calculated. Based on individual sentiments, average scores for regions, hotels or even reviewers could then be calculated. The textual review sentiment was next contrasted against the numerical evaluation that each reviewer also provides to examine whether there is a strong link between the two or not. Finally, the NRC method was used to distil the prevailing emotion.
Topic analysis was the next step. According to Brett (2012), it is a method for identifying patterns in a text (corpus). Words from a corpus are merged into topics. From the perspective of the hotel analysis, the topics discussed in reviews are highly significant for both potential travellers and managers (indicating the weaknesses and strengths). Keywords and keywords in context are the first step in identifying the points discussed most. Methodologically, words are simply counted (Feldman et al., 1998; Zhang et al., 2008). However, as emphasised by Hulth (2003) in Zhang et al. (2008), keyword extraction is the extraction of 'a small set of words, key phrases, keywords, or key segments from a document that can describe the meaning of the document'. Keywords by definition summarise the text, help classify the text, help detect topics and assist automatic clustering and are, as such, considered a key input or approach to automatic text processing.
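Counting keywords and showing them in context requires very little machinery. The sketch below is illustrative only: the stop-word list is a short hypothetical one, the example reviews are invented, and the function names are not taken from any package used in the study.

```python
import re
from collections import Counter

# Hypothetical short stop-word list; real lists are much longer.
STOP_WORDS = {"the", "a", "and", "to", "in", "was", "were", "is", "of"}

def tokens_of(document):
    """Lower-case a document and keep alphabetic tokens (drops punctuation)."""
    return re.findall(r"[a-z]+", document.lower())

def keyword_frequencies(documents, top_n=3):
    """Return (word, count, share of all words) for the most frequent keywords."""
    all_tokens = [t for doc in documents for t in tokens_of(doc)]
    counts = Counter(t for t in all_tokens if t not in STOP_WORDS)
    total = len(all_tokens)
    return [(w, c, c / total) for w, c in counts.most_common(top_n)]

def keyword_in_context(document, keyword, window=2):
    """Return each occurrence of keyword with `window` words on either side."""
    tokens = tokens_of(document)
    return [
        " ".join(tokens[max(0, i - window): i + window + 1])
        for i, t in enumerate(tokens)
        if t == keyword
    ]

# Invented example reviews.
documents = [
    "The room was clean and the staff were friendly.",
    "Friendly staff and a great pool.",
    "The room was small.",
]
top = keyword_frequencies(documents)
context = keyword_in_context(documents[0], "staff")
print(top)
print(context)
```

The relative frequency is computed against all words, including stop words, so that it can be read as a share of the whole corpus.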
In the next step, topic modelling is used. It helps us further identify the main topics the reviewers discuss and the key words associated with each topic. Posner, Wallace, and Borovsky (2012) define topic modelling as a 'method for finding and tracing clusters of words (called 'topics' in shorthand) in large bodies of texts'. Methodologically, it is based on probabilistic mathematical modelling (Blei, 2012), which helps identify the main themes that pertain to large corpora of otherwise unstructured text (the reviews are likewise treated as a corpus, a group of words). To conduct topic model analysis, a Latent Dirichlet allocation (L.D.A.) approach was used (Chen, 2011). L.D.A. approaches the corpus of words as a bag of words. The 'bag of words' is broken down into topics using an iterative procedure: key words of each topic and most common words of each topic are identified based on a probabilistic iterative procedure. The topics are also graphically presented with key nodes of each topic and the links to the most important words pertaining to each topic, as well as the strength of the relation. The 'web' of words depicts the topics very clearly and as such provides highly valuable input into the managerial decision-making processes. Table 1 summarises the primary methods used, the research output obtained and the implication of results pertaining to each method. The relevance of each method for the destination image analysis for both management as well as travellers is indicated.
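To make the iterative procedure concrete, the following is a minimal sketch of L.D.A. fitted by collapsed Gibbs sampling, one common way of estimating the model. The sampler, the hyperparameters `alpha` and `beta`, the toy corpus and all function names are assumptions made for illustration; the study itself relied on R, and its exact estimation settings are not reproduced here.

```python
import random

def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit L.D.A. by collapsed Gibbs sampling; docs is a list of token lists."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]      # tokens of doc d in topic k
    topic_word = [dict() for _ in range(num_topics)]  # count of word w in topic k
    topic_total = [0] * num_topics                    # all tokens in topic k
    assignments = []
    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] = topic_word[k].get(w, 0) + 1
            topic_total[k] += 1
        assignments.append(z_doc)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalised conditional probability of each topic.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t].get(w, 0) + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] = topic_word[k].get(w, 0) + 1
                topic_total[k] += 1
    return doc_topic, topic_word

def top_words(topic_word, n=3):
    """List the n most frequent words of each topic (ties broken alphabetically)."""
    return [
        sorted((w for w in tw if tw[w] > 0), key=lambda w: (-tw[w], w))[:n]
        for tw in topic_word
    ]

# Invented, already tokenised toy corpus.
docs = [
    ["room", "bed", "clean", "room"],
    ["staff", "friendly", "helpful", "staff"],
    ["room", "clean", "view"],
    ["staff", "helpful", "reception"],
]
doc_topic, topic_word = lda_gibbs(docs, num_topics=2)
print(top_words(topic_word))
```

The number of topics is chosen by the analyst, so in practice the model is refitted for several values and the resulting word lists are compared for interpretability.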

Data
In total, 18,288 textual reviews were collected (only in English to avoid problems with 'translation noise'; for more details, see Godnov & Redek, 2015), with a total of 3.4 million words written about 87 Croatian hotel tourist facilities: 73.3% were from Dalmatia, 19.9% from Istria, and 6.8% from Kvarner (islands) for all major tourist cities (Table 4 and Table A1 in the Appendix). To avoid problems with the differences in accommodation types and service types as well as the impact of the size of the facility on the number of reviews, only mainland hotels were targeted. The number of reviews per facility varies from 20 (Crikvenica, Njivice) to 6858 (Dubrovnik), but the number of reviews is well correlated with the size of the facility. On average, each evaluation had 186 words, with the shortest being only seven words long and the longest almost 3500 words long (Table 2). The authors of the current study would like to stress that the purpose of this analysis was not to compare or evaluate the hotels but to study the implications for business intelligence. The analysis was mainly conducted in the R statistical program with appropriate packages (e.g., tm), while aspect analysis was done in Rapidminer with the Aylien extension. Hotel reviews were entered into the sentiment analysis and topic models analysis in R. For the key-words and key-words in context analysis, 'stop-words' and punctuation were removed. Stop words are the most common words and terms used; they include 'the, a, and, to, in, etc.' (see also Godnov & Redek, 2014, 2016a). For aspect analysis, an Aylien extension in Rapidminer was used, which includes aspects for hotel analysis. Otherwise, the reviews were not cleaned to filter them out by any criterion based either on the reviewer, location, or numerical/textual evaluation.

Results
Do written reviews provide value added to numerical evaluations?
Usually, travel platforms offer some sort of numerical evaluations as well as the option of written evaluations, supported with photo, audio, and video materials. Although numerical evaluations would be expected to provide both an efficient and a sufficient summary, this is not the case. First, the numerical evaluation and the actual portrait of the experience in the text are often only weakly correlated. Second, the text reviews provide an abundance of additional information, which also allows the study of polarisation, prevailing emotion and content. Third, this deepens the analysis and allows a more efficient comparison between hotels/brands or monitoring of improvement for a specific brand/hotel.
Relationship between numerical and text evaluations. When customers search in on-line databases for a suitable hotel, the numerical rating of a hotel (on TripAdvisor 1 to 5, on Booking.com 1 to 10) provides an important signal to the potential traveller about the overall quality of the facility. It is, therefore, an important signal as well as a method of comparison also for hotel management. However, it should be noted, that the numerical evaluation provides only an inadequate summary of the textual evaluations and management should therefore not rely on it.
The 'story between the lines', which is conveyed with the choice of words, provides much richer information. For example, compare the two reviews in Table 3. Both travellers gave the hotel the same numerical grade (the highest grade, 5), but the quality of the hotel depicted in the text provides a much different feel. In the case of the first review, excitement is very apparent, while the second is much colder in attitude, even providing some criticism.
The differences in the quality of the evaluations can be studied with sentiment analysis and content analysis. As a direct numerical measure of the quality that is depicted in the text, sentiment values can be used. In the example (Table 3), despite the same numerical evaluation, the sentiment of the text was much different, only 1 compared to 124. To a reader (potential customer), the first review is much more convincing and portrays a better quality of the hotel. Therefore, management should not rely on numerical evaluations. In general, for the studied sample, on average, the hotels were evaluated very highly, with an average evaluation of 4.28. Two types of sentiment analyses were conducted; the AFINN and Hu and Liu methods were used. The AFINN sentiment was on average close to 20, while the Hu and Liu sentiment was 8.7. The average sentiment per word in the textual evaluation was 0.06 with the Hu and Liu method and 0.14 with the AFINN method. The relationship between the numerical and sentiment evaluation is weak (Table 5). In addition, the value of the numerical evaluation is negatively related to the length of the textual evaluations, implying that dissatisfied customers write more. Once the sentiment is corrected for the length of the review (sentiment per word), the correlation is higher, close to 0.5. Nonetheless, even this correlation is only of medium strength, which underlines the importance of also studying the content of the text.
Polarisation and analysis of the prevailing emotions provide additional comparative information. From the managerial perspective, the polarisation of the text in the reviews (positive, negative, neutral) as well as the prevailing emotion provide an interesting perspective. In our case, based on Wiebe and Mihalcea's (2006) naive Bayes classifier, the majority of reviews (almost 63%) are positive, while only 7.7% are negative. To identify the prevailing emotion, the N.R.C. method was applied to the entire population of comments.
These were, in general, rather positive (Figure 1). Taking that into consideration, it is not surprising that the prevailing emotion, that can be 'read between the lines' is one of joy and trust (in almost 26% of reviews), followed by anticipation (19.3%). Negative emotions of sadness (7%), anger (4.7%) and fear (4.4%) are much less common (Figure 2).
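The comparison between star ratings and text sentiment discussed above rests on a plain Pearson correlation, computed once for the total sentiment and once for sentiment per word. The sketch below shows the calculation on a handful of invented records; the numbers are not taken from the study and the function name `pearson` is simply a local helper.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)

# Invented toy records: (star rating, total sentiment, review length in words).
records = [(5, 12, 60), (5, 3, 150), (4, 6, 80), (3, 2, 200), (2, -4, 250), (1, -3, 120)]
stars = [r[0] for r in records]
sentiment = [r[1] for r in records]
length = [r[2] for r in records]
# Correct the sentiment for review length, as described in the text.
per_word = [s / n for s, n in zip(sentiment, length)]

print("stars vs total sentiment:", round(pearson(stars, sentiment), 2))
print("stars vs sentiment per word:", round(pearson(stars, per_word), 2))
print("stars vs review length:", round(pearson(stars, length), 2))
```

Dividing by review length removes the mechanical effect of long reviews accumulating more opinion words, which is why the per-word measure is the more comparable of the two.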
For comparative purposes (competitive analysis between hotels), analysis of polarisation and emotions between hotels can be conducted. Here, hotels are compared by numerical evaluation, since the purpose is not to compare specific hotels. Clearly (Figure 3), with increased quality (as evident from numerical evaluation) the share of reviews with positive polarisation also increases. Similarly, the share of reviews that could be classified as displaying 'joy' also increases with the evaluated hotel quality (star evaluation). Both methods can be efficiently used to complement numerical evaluation and sentiment, either in static comparative analysis between hotels or to monitor changes in performance (or comparative performance) over time.
The comments can also be analysed with regard to the strength of the language and the words travellers use to describe their experience. Stronger language, either positive or negative, will be more convincing. For the management, this provides additional information about the (comparative) level of satisfaction, although this information is summarised by the sentiment values. The AFINN dictionary (words evaluated from -5 to 5) was used to do that. Generally, mildly positive words (and primarily adjectives) prevail in terms of relative frequency, such as 'good', 'great', 'nice', 'clean', 'friendly', 'helpful', 'recommend', 'comfortable' (all with sentiment values of +3 and +2), followed by words such as 'free' (no charge), 'big' (usually related to the word 'room'), 'fresh', 'easy', and 'huge'. The most common negative word is 'problem' with a relative frequency of 0.07%. Interestingly, the users used none of the most negative words (valued at -5) that exist in the AFINN dictionary. However, since the overall evaluation is rather positive for the hotels, the result that travellers did not use extreme language is not surprising. Table 6 summarises the results.

The benefits of content analysis
For hotel management, it is essential to be aware of the details of the content of the texts in the reviews. While in many cases of smaller hotels, reading a few reviews is not problematic, larger hotels have many more reviews. The problem becomes even greater if management is interested in a comparative analysis within the hotel chain or with other hotels/chains. The text provides input:
1. regarding the actual services and accommodation characteristics and helps identify aspects of services, which are the most important to guests (or were the most problematic or the best);
2. combined with aspect-based sentiment analysis, content analysis also enables a comparative (or longitudinal) study of the quality of aspects which are most important to tourists (beds, food, room, cleanliness, etc.);
3. it allows extraction of information regarding what is good and what could be improved or what should be added to the services, which is very important input for strategic decision-making (regarding investment or training of staff).
Text can be summarised and analysed using several methods; in this paper key words, topic analysis (L.D.A.), identification of the cluster of words, and aspect-based sentiment are used.
The key-words analysis shows that the most frequently used nouns were 'hotel', 'room', 'staff', 'food', 'pool', and 'breakfast', each with an absolute frequency above 800 appearances in the studied reviews, which in total comprised 3.4 million words. Consequently, the most common words represented between 0.02 and 0.11% of all words. This signals that guests, when evaluating a specific hotel, appreciate most the quality of the room, friendliness of the staff, the hotel facilities (pool) and the food (Figure 4).
In this context, the aspect-based sentiment analysis provides additional input, since it allows a comparison between hotels or hotel chains based specifically on the satisfaction with those aspects that are most important to tourists. Aspect analysis evaluates each of the key aspects with a positive, negative, or neutral evaluation based on the text of the review. To allow easier comparison either in time or across hotels, the values were recoded into '1' for positive, '0' for neutral and '-1' for negative. Results show that for the entire sample, people were most satisfied with comfort, design, food/drinks and cleanliness, while the most problematic were payment, customer support, and wi-fi (Table 7).
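The recoding and averaging step is simple enough to show directly. The sketch below assumes that aspect labels are already available as (hotel, aspect, label) records; the records are invented, and the function name `mean_aspect_scores` is hypothetical.

```python
from collections import defaultdict

# Recoding used for comparison: positive = 1, neutral = 0, negative = -1.
RECODE = {"positive": 1, "neutral": 0, "negative": -1}

def mean_aspect_scores(records):
    """Average the recoded labels per hotel and aspect.

    records is an iterable of (hotel, aspect, label) tuples; the result maps
    each hotel to a dictionary of aspect -> mean score between -1 and 1.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for hotel, aspect, label in records:
        sums[hotel][aspect] += RECODE[label]
        counts[hotel][aspect] += 1
    return {
        hotel: {a: sums[hotel][a] / counts[hotel][a] for a in sums[hotel]}
        for hotel in sums
    }

# Invented aspect labels for two hypothetical hotels.
records = [
    ("hotel 1", "wi-fi", "negative"),
    ("hotel 1", "wi-fi", "negative"),
    ("hotel 1", "wi-fi", "neutral"),
    ("hotel 1", "food", "positive"),
    ("hotel 1", "food", "positive"),
    ("hotel 2", "customer support", "negative"),
    ("hotel 2", "food", "neutral"),
]
scores = mean_aspect_scores(records)
for hotel, aspects in scores.items():
    print(hotel, aspects)
```

A mean close to 1 indicates near-unanimous satisfaction with an aspect and a mean close to -1 the opposite, so the same table can be tracked over time or placed side by side for several hotels.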
In each of the reviews, where a specific aspect was identified as either good or bad, details can be found about the identified problems. For example, in one of the reviews, where the 'value aspect' was considered as negative, the following was said: The hotel is a nice resort hotel, with a great pool. The concierge, especially Tin, is very helpful. On the other side, the room is small for the price we paid (superior double) and was in a poor location (first floor facing trees). The hotel is pretty big, and it's lacking signs. Also, the insulation was poor, which made the room noisy. The food at the pool [name of restaurant deleted] is average and expensive.
Such analysis is quick and allows rapid categorisation of reviews both by aspect and by satisfaction with a specific aspect. For managers, this again provides valuable input for improvement strategy as well as comparative analysis. Figure 5 presents the results of a comparison of three hotels (all three had a numerical rating of 5). Clearly, there is much dissatisfaction with customer support in 'hotel 2', while in 'hotel 3' wi-fi is the most problematic. Such analysis can also easily be done between hotels from different chains, which is especially important if the hotels target a similar customer group or are closely located.
The content of the text can also be further studied through the analysis of word clusters. Figure 6 presents the results for the entire sample (the same could be done at hotel level or comparatively). Correlations between words are presented where the strength is at least 0.2; line thickness signals association strength. The users primarily discuss the hotel: the hotel amenities (pool, bar), the room (balcony and sea view), food, hotel location (walk to town, sea view) and staff and their qualities. The word staff is very strongly correlated with friendly and helpful, and hotel amenities are discussed together, as are aspects of room quality.
Source: Authors' calculations.
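A minimal sketch of how such word correlations might be computed is given below. The text does not state which correlation measure was used, so the sketch assumes the phi coefficient (the Pearson correlation of two binary indicators that record whether a word appears in a review); the function name, the whitespace tokenisation and the example reviews are illustrative assumptions. Only the 0.2 threshold comes from the text.

```python
from itertools import combinations
from math import sqrt


def word_correlations(reviews, threshold=0.2):
    """Phi coefficient between word-presence indicators across reviews.

    Returns {(word_a, word_b): phi} for pairs with phi >= threshold.
    The phi coefficient is an assumed measure, not confirmed by the source.
    """
    docs = [set(text.lower().split()) for text in reviews]
    n = len(docs)
    vocab = sorted(set().union(*docs))
    doc_freq = {w: sum(w in d for d in docs) for w in vocab}
    result = {}
    for a, b in combinations(vocab, 2):
        both = sum(a in d and b in d for d in docs)
        denom = sqrt(
            doc_freq[a] * (n - doc_freq[a]) * doc_freq[b] * (n - doc_freq[b])
        )
        if denom == 0:
            continue  # a word found in every review has no defined correlation
        phi = (n * both - doc_freq[a] * doc_freq[b]) / denom
        if phi >= threshold:
            result[(a, b)] = phi
    return result


# Made-up example reviews, not data from the study.
reviews = ["friendly staff", "friendly helpful staff", "nice pool", "pool bar"]
strong_pairs = word_correlations(reviews)
for pair, phi in sorted(strong_pairs.items(), key=lambda item: -item[1]):
    print(pair, round(phi, 2))
```

The retained pairs can then be drawn as a network in which each pair is an edge and the coefficient sets the line thickness, as in Figure 6.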
One might expect that the guests would speak about more topics: the hotel with its room, staff and facilities, but also the tourist destination in general and the location's facilities (town 'vibe', local food, etc.). To identify the actual topics that prevail in the text, L.D.A. was used. L.D.A. requires the user to set the number of topics to be sought in the text; an iterative procedure (5000 iterations were used) is then applied to identify the 'nodules' of specific topics and the terms that are most often associated with them. Several values were tried as the initial number of topics. However, the topics are very closely related, and no clear distinction can be made among them (Table A1 in Appendix).
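The iterative procedure can be illustrated with a small collapsed Gibbs sampler, which is one common way of fitting L.D.A. The text does not state which estimation method or software was used, so this is a sketch under that assumption; the function names, the hyperparameters `alpha` and `beta`, the reduced number of iterations and the toy documents are all illustrative.

```python
import random


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit L.D.A. on tokenised documents with a collapsed Gibbs sampler.

    docs: list of token lists. alpha, beta, n_iter and seed are
    illustrative defaults, not values reported in the study.
    Returns (topic_word counts, doc_topic counts).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    vocab_size = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assign = []
    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assign[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Conditional weight of each topic for this token.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return topic_word, doc_topic


def top_terms(topic_word, n=5):
    """Most frequent terms per topic, by assigned token counts."""
    return [sorted(c, key=c.get, reverse=True)[:n] for c in topic_word]


# Made-up tokenised reviews, not data from the study.
docs = [
    ["room", "clean", "room", "view", "balcony"],
    ["staff", "friendly", "helpful", "staff", "reception"],
    ["room", "view", "clean", "balcony"],
    ["staff", "helpful", "friendly", "reception"],
]
topic_word, doc_topic = lda_gibbs(docs, n_topics=2)
print(top_terms(topic_word, n=3))
```

Rerunning the sampler with different values of `n_topics` corresponds to trying several numbers of topics, as described above; when the resulting term lists overlap heavily, the topics cannot be clearly distinguished.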
However, since hotels differ, the topics were also studied by hotel quality. First, this article is interested in whether the topics that the guests discuss differ by the quality of the hotel (as designated by the numerical evaluation of the facility). This could be relevant for management as it would help them understand the specific demands of their target group. It could also be expected that the worse the hotel, the more guests complain about a specific topic (cleanliness, food, etc.). Reviews were first divided into groups by the numerical evaluation. Again, different numbers of topics were tested. Table 8 reveals that even in this case the topics are closely related and that only a slight distinction can be made between the two topics in the case of better hotels. While generally the first is very closely related to the hotel and room itself, the second topic often revolves around the staff and also slightly revolves around the general destination (town, location).
The results by numerical hotel evaluation again reveal that guests who visit the best-evaluated hotels and those who visit lower-ranked hotels have similar considerations; differences could again hardly be identified. This essentially confirms that guests care most about the basics (hotel, room, staff, and facilities). If these are good, the evaluation is higher, otherwise lower. However, the same topics are discussed; only the 'tone of discussion' (sentiment) differs. The relationship between sentiment and the numerical evaluation is not strong (the correlation coefficient is 0.36), but it is highly significant.

Importance of specific reviews
Given that consumers seek information, not all reviews might be equally important. PowerReviews (2014) suggests that consumers are often prone to seeking out negative reviews, as the 'negative reviews provide a baseline for the worst-case scenario consumers could have with a product' (see also Lackermair, Kailer, & Kanmaz, 2013; Luca, 2016). These reviews (both positive and negative) affect consumer behaviour as well as help build brand reliability. Namely, consumers are aware that products and services do have flaws, and some negative reviews are appreciated ('The Power of Reviews,' 2014). When the usefulness of reviews is studied, it is evident that travellers primarily appreciate longer reviews and those written by reviewers who have already provided more reviews (strongest correlation). This also indicates that the content of reviews changes as reviewers gain experience in writing them; more of the relevant aspects are mentioned. Figure 7 provides information about the correlations (all are significant at the 0.05 or 0.001 level).
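As an illustration of the kind of calculation behind such correlations, the sketch below computes a Pearson correlation between the number of 'useful' votes and two review characteristics. The text does not specify the correlation measure, so Pearson's coefficient is an assumption; the variable names and all numbers are made up for illustration, and significance testing is not shown.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Made-up values for five reviews, not data from the study.
useful_votes = [0, 1, 1, 3, 5]          # times the review was marked useful
review_words = [35, 60, 90, 150, 260]   # review length in words
reviewer_count = [1, 2, 8, 15, 40]      # reviews previously written by the author

r_length = pearson(useful_votes, review_words)
r_experience = pearson(useful_votes, reviewer_count)
print(f"usefulness vs length:     {r_length:.2f}")
print(f"usefulness vs experience: {r_experience:.2f}")
```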

Discussion of results
User reviews of tourist facilities in the most prominent tourist destinations in Croatia were analysed to illustrate what kind of information hotel management can distil from extensive written reviews by combining standard statistical methods with different text-mining approaches.
Four research aspects were studied: 1. Should the management use the written reviews? 2. If yes, can the written evaluation be summarised efficiently? 3. What kind of information can be obtained from the written evaluation that will help identify competitive advantages, disadvantages and help monitor trends in the hotel? 4. Should specific reviews be monitored more closely? Which are these? Table 9 summarises the results by research questions, highlights the methods used and points to the implications for the use in decision-making.

T1         T2         T1         T2         T1         T2         T1         T2         T1         T2
room       room       pool       room       food       room       good       room       staff      pool
staff      night      rooms      food       pool       good       room       good       room       good
beach      just       time       staff      rooms      staff      nice       town       great      just
like       staff      stay       good       like       breakfast  great      clean      ***        town
pool       like       even       reception  just       view       staff      staff      food       breakfast
service    stay       staff      area       good       nice       pool       breakfast  good       location
Notes: * City name replaced by ***. Source: Authors' calculations.
Textual reviews provide abundant information and valuable business intelligence for the hospitality industry. The purpose of this article was threefold: (1) to show the importance of studying written reviews (besides the numerical evaluations); (2) to show how and which business intelligence data can be obtained; and (3) to determine whether all reviews are equally important.
Both numerical evaluations and written evaluations provide essential business intelligence data. Numerical evaluations are very straightforward and allow quick and efficient comparison between hotels, as well as monitoring of improvements or changes over time, including quality control; however, they do not suffice.
Written reviews provide important information, which can be efficiently used to: (1) conduct competitive analysis, that is, an in-depth comparison between the characteristics and qualities of different hotels (of either the same or different chains); (2) monitor the performance of a specific hotel over time (which is especially relevant in quality control); or (3) understand more deeply the characteristics of consumers. The importance of studying written reviews was evident in our study from: (1) the weak relationship between the numerical evaluation and the sentiment analysis; and (2) the abundance of other useful information, which can be distilled from the text. Similarly, other authors have also argued in favour of studying written reviews (de Langhe, Fernbach, & Lichtenstein, 2014; Gémar & Jiménez-Quintero, 2015) to obtain deeper information. The information from the written reviews can also be used to understand the consumers better and improve the services to fit better with the expectations of a specific target group, which was also suggested by the literature ('The Power of Reviews,' 2014). Understanding the consumer well is especially important for ensuring the strong performance of hotels because travel and vacations are 'high-involvement products' (e.g. Kokemuller, 2016), that is, products and services with higher emotional involvement of consumers, to which consumers are especially sensitive both because of the importance of the vacation and because of its comparative financial value (in comparison with other goods and services consumers buy).

Table 9, research question 4: Yes, primarily those from experienced reviewers and longer reviews (which also contain more aspects). Experienced reviewers (those that have written more reviews) write reviews that are more often marked as useful, which indicates an impact on consumer decisions. Reading these reviews provides quality improvement and competitive intelligence data. Source: Authors' own and adaptation based on Godnov and Redek (2016b).
The literature on online consumer behaviour emphasises that consumers increasingly rely on written reviews and that they take time to read and study them, together with accompanying material (photos, videos), when choosing both their destination and their hotel, which shows the increasing impact of electronic word-of-mouth (eW.O.M.) (Anderson, 2014; Hennig-Thurau, Gwinner, Walsh, & Gremler, 2004; Synchrony Financial, 2014). With reviews, consumers also co-shape the corporate (hotel, destination) image; studying reviews and trying to improve based on the information in them is therefore extremely important. The fact that reviews affect the direct choice of hotel as well as shape the brand (hotel) image additionally supports the need to study text reviews.
The results also allow the identification of key aspects of hotel services and their infrastructure. The results show that tourists were primarily evaluating the 'basics' (room, food, staff, cleanliness). This was revealed both by key-word analysis and by topic models (see also Godnov & Redek, 2015). Text mining also enables monitoring of satisfaction with each of these specific aspects, which allows both competitive analysis and quality control. In the long run, the information that can be obtained from the texts can (if used wisely) significantly improve the performance of a hotel or a chain.
Results also suggest that not all reviews are equally important. Primarily, reviews written by experienced reviewers have the most impact. These reviews are especially critical in the comparative sense and provide valuable insight for quality management and investment and training policies. There is also an indication that longer reviews (these are usually also more negative) have more impact (since they are considered more useful). Other consumer product research shares similar views (Lackermair et al., 2013; Luca, 2016; 'The Power of Reviews,' 2014).
Overall, the study of online materials about the characteristics of products and services has become immensely important for the overall performance of companies, as it directly impacts the choices consumers make. As such, it provides valuable business intelligence input, primarily for the following purposes: competitive business intelligence, as it allows both quick and straightforward (star ratings) as well as extensive comparative analysis between different brands (for the use of the methodologies across different types of destinations and accompanying problems see also Godnov & Redek, 2018); quality management, since it helps identify good and bad aspects of performance, as well as longitudinal monitoring of the impact of changes of services on satisfaction and its link to business performance; marketing and sales, since it helps to better understand the target consumers and differences between different consumer groups; brand management, as it allows monitoring consumer-affected brand/hotel image and making appropriate changes to improve it; and investment policies and HRM and training policies (including staffing), as studying text reviews points to the deficiencies in the services and infrastructure.

Contributions
This article provides several contributions to the literature. First, it points to the benefits of text mining as a method for efficient study and summarisation of abundant written materials for business intelligence purposes. Thereby, it extends the literature that stresses the importance of the study of written material. In addition, the study extends the existing literature on tourist destination image and image formation, its analysis and use (e.g. Lackermair et al., 2013; Luca, 2016; 'The Power of Reviews,' 2014) by providing a framework for how to do it.
By studying the reviews, this article also contributes to the understanding of hotel customers or guests, extending the marketing, hospitality and consumer decision-making literature (e.g. also Cheung & Lee, 2012; Gruen et al., 2006).
This article also provides some information about Croatian hotels and the preferences of consumers, although the use of Croatian data was primarily intended as an illustrative case. As such, it extends the study of the Croatian hospitality sector (Grzinic, 2008; Mikulić & Prebežac, 2011; Pivcevic & Pranicevic, 2012; Plenkovi, Gali, & Ku, 2010; Razovi, 2014).
As a further practical implication, such analysis could be automated and the results offered on review web pages, or as a part of business consulting activity. Sentiment values and aspect-based sentiments could be used to improve the summary information that is now provided as star ratings. For example, each review could be evaluated based on emotional analysis, and a 'suitable smiley face' could accompany the review alongside the user-generated numerical evaluation. This would provide better summary information to other users and would increase the value of the information these web pages provide to interested clients. This could be done not just for the hospitality industry but also in general.

Limitations and challenges for future research
The paper also faced several limitations. The analysis would be deeper if the data also incorporated demographic information about the characteristics of customers, primarily their travel purpose, length of stay, previous travel experiences, income and age, as these determinants significantly affect consumer expectations. The same hotel might seem very good to one guest, but much worse to another.
The analysis would also be improved by merging these data with financial performance data, as this would allow a study of the actual impact of the reviews on hotel performance. Similarly, it would be better if the data were a panel data set, also merged with panel financial performance data.
This study relied only on English reviews to avoid the problems of introducing noise through translations (especially in emotional and polarity analysis). However, broader (non-English) review data would allow a deeper cross-cultural study, as well as provide additional data. With further improvements of machine translations, this will become possible.
This research also opens challenges for future research. From the perspective of the tourism literature, it would be interesting to further examine the relationship between the satisfaction of customers and the written review, primarily from the perspective of potential travellers: are they more inclined to read longer reviews, or longer negative reviews, and, if so, why? From the perspective of the reviewers, it would be interesting to study the different reviewers and see how they 'behave' or evolve over time as the number of their reviews increases. Methodologically, a tourism lexicon would be useful to provide a more reliable calculation of the sentiment.

Conclusion
With the emergence of Web 2.0, the internet became an interactive space, a community where people daily seek news, shop, join in chats, find friends and also seek information about their potential travel destinations. Booking.com, Tripadvisor.com, Expedia.com and many others have become a huge source of traveller data, which, through the opinions and actions of people, offer reliable and trustworthy information about different destinations. These pages co-create tourist destination images and should consequently be studied as such.
Computational linguistics is a tool that facilitates the aggregation and summarisation of diverse written information. The purpose of this article was to illustrate, with a practical case of Croatian hotel reviews, what kind of information can be distilled from the data and how this information can complement the readily available numerical evaluation for more efficient use in business intelligence.

Disclosure statement
No potential conflict of interest was reported by the authors.

Funding
This research was partially funded by research grant P5-0128 Izzivi vključujočega in trajnostnega razvoja v prevladujoči paradigmi ekonomskih in poslovnih znanosti.

Notes
1. Selected highlights of this research were published as a short research note in Godnov, U., and Redek, T. (2016). Application of text mining in tourism: case of Croatia. Annals of Tourism Research, vol. 58, pp. 162-166.
2. The research targeted only hotels in order to ensure compatibility in the type of the services provided. Self-catering apartments and (private) rooms differ significantly in both service types as well as facilities; they also receive significantly fewer reviews. The mainland was targeted only for a similar reason. The islands generally offer more private accommodation and have smaller hotels; consequently, they receive less attention in reviews.
3. Therefore, several data aspects important for comparative analysis were not considered.