Predicting tourism demand by A.R.I.M.A. models

Abstract The paper provides a short-run estimation of international tourism demand focusing on the case of F.Y.R. Macedonia. For this purpose, the Box–Jenkins methodology is applied and several alternative specifications are tested in the modelling of original time series and international tourist arrivals recorded in the period 1956–2013. Upon the outcomes of standard indicators for accuracy testing, the research identifies the model of A.R.I.M.A.(1,1,1) as most suitable for forecasting. According to the research findings, a 13.9% increase in international tourist arrivals is expected by 2018. The forecasted values of the chosen model can assist in mitigating any potential negative impacts, as well as in the preparation of a tourism development plan for the country.


Introduction
Tourism is one of the world's largest industries and thus represents a major area of interest, not just because of its size in terms of the enormous number of travellers, passengers, visitors and tourists, or the size of their consumption, but also because of its enormous impact on national economies and people's lives. Tourism has proven to be a surprisingly strong and resilient sector of economic activity and a fundamental contributor to economic recovery by generating billions of dollars in exports and creating millions of jobs.
Yet, this dynamic industry faces many challenges that affect its development. In order to cope with them, the planners and policy-makers apply the process of forecasting as the only way to furnish information, which permits them to reach decisions before the occurrence of certain events. In order to create a comprehensive tourism development plan as a base for formulating tourism policy, reliable estimates of future demand must be obtained. So, the main aim of introducing forecasting processes in tourism is to envisage the success of the destination by ensuring that visitors are hosted in a way that maximises the benefits to stakeholders with minimal negative effects, costs and impacts (Edgell, DelMastro Allen, Smith, & Swanson, 2008;Goeldner & Ritchie, 2006;Mason, 2015;Wilkinson, 1997). KEYWORDS

Box-jenkins methodology;
forecasting; short-run estimation; tourism demand The main aim of the current research is to identify a model that best describes and forecasts international tourism demand in F.Y.R. Macedonia. To this end, the time series of international tourist arrivals is considered as a significant indicator of tourism activity and tourism development of the country. The data sample covers the period between 1956 and 2013. The research is structured as follows: following the introduction, Section 2 provides a brief overview of the literature addressing the main topic of interest. The description of the methodology applied is presented in Section 3. Section 4 covers the analysis, main results and discussion. The forecasts and accuracy of the selected specifications are given in Section 5, while the conclusions and some future challenges are presented in the final section.
The study contributes to increasing the limited number of research articles in this area by enriching the poorly developed empirical academic work in F.Y.R. Macedonia. Some exceptions are noted in the work of Petrevska (Petrevska, 2012(Petrevska, , 2013, who argues for the need to apply forecasting methods in preparing the country for coping with the unexpected challenges surrounding international tourist arrivals.

Literature review
Forecasting tourism demand has attracted much interest in academia. A variety of econometric models have been applied. In this regard, according to the comprehensive study of Song and Li (2008), the forecasting methodology is very diverse since researchers employ both time series and econometric approaches in estimating tourism demand. Consequently, Wong, Song, Witt, and Wu (2007) and Song, Witt, Wong, and Wu (2008) advocate for the introduction of combined forecasts directed at obtaining more accurate models. On the other hand, combining forecasts is more beneficial for long-run estimations.
Although the basic indicator describing tourism demand has gradually been modified, tourist arrivals is still the most applicable indicator. It is noticeable that many authors break this variable down in several in-depth variables, such as holiday tourist arrivals, business tourist arrivals, tourist arrivals for visiting friends and relatives (Kulendran & Wong, 2005;Turner & Witt, 2001a, 2001b, tourist arrivals by air (Coshall, 2005;Rosselló, 2001), tourist expenditure in the destination (Li, Song, & Witt, 2004Li, Wong, Song, & Witt, 2006), tourist expenditure on particular tourism product categories as meal expenditure (Au & Law, 2002), sightseeing expenditure  and shopping . Other tourism demand variables used in the literature include tourism revenues (Akal, 2004), tourism employment (Witt, Song, & Wanhill, 2004) and tourism import and export (Smeral, 2004).

Methodology
The A.R.I.M.A. models applied in the current research follow the standard expression (1) of Zou and Yang (2004) and Fernandes, Teixeira, Ferreira, and Azevedo (2008): In expression (1), the A.R.I.M.A. model is (p,d,q), in which p is the order of the autoregressive (A.R.) process, d is the number of differences or integrations and q is the order of the moving average (M.A.) process. A.R.I.M.A. models can be used for short-run estimation based on annual, quarterly, monthly or even weekly, daily or hourly data.

Analysis, results and discussion
The main objective of this research is to make a short-run estimation of international tourism demand in F.Y.R. Macedonia in terms of international tourist arrivals by introducing several specifications of A.R.I.M.A. models. The research is based on the available sources of secondary data -in this case from the State Statistical Office -and these are processed by the software E-views version 6.0. The original time series is the number of international tourists per year, which is considered to be significant indicator of tourism development -in this case of F.Y.R. Macedonia. The data observed cover the period from 1956 to 2013 on an annual basis due to the unavailability of monthly data, so the sample consists of 58 observations. Figure 1 visually evaluates the potential stationarity as a first basic assumption for applying the A.R.I.M.A. modelling. One can observe two pronounced trends in the movement of the time series, suggesting that some of its characteristics imply non-stationarity. Specifically, it is obvious that the series has different average values and variances in various sub-periods of the sample.
However, for precautionary reasons, the above-noted intuitive conclusion is tested by a correlogram of the series, showing the autocorrelation function (A.C.F.) and autocorrelation coefficients (ρ k ) calculated for 24 lags (Table 1). Due to facts that the correlogram starts with a very high correlation coefficient of 0.94, that the ρ k slowly decays and that there is a highly expressed autocorrelation in the series even for several lags (ρ k at lag 6 is still high at almost 0.5), one may conclude that the series is non-stationary.
Furthermore, the research continues by checking the statistical significance of the sample autocorrelation coefficients in order to determine whether they really represent the true ρ k of the population. As stated in the statistical theory, if dealing with a random process, then the ρ k are approximately characterised by the normal distribution, with a zero mean and variance of 1/n, where n is the sample size (Gujarati, 1995, p. 717). Hence, the standard error of the ρ k can be calculated as: √1/58 = 0.131. According to the tabular values for the normal distribution, the 95% confidence interval for the ρ k is ±1.96 × 0.131 = ±0.257.
If the calculated ρ k lies within the confidence interval, this means that the null hypothesis that the true ρ k of the population is zero (H 0 : ρ k = 0) cannot be rejected. From Table 1, it can be seen that the first seven ρ k are statistically significant (i.e., different from zero), the coefficients from lag 8 to lag 13 lie within the confidence interval, and then these coefficients are again different from zero. The large number of statistically significant coefficients once again confirms that the series is non-stationary.
However, given the problems with the individual testing of the significance of ρ k , the joint hypothesis that all ρ k are equal to zero is tested by employing the Ljung-Box (L.B.) statistic. This tests the null hypothesis that there is no autocorrelation for all of the coefficients at certain numbers of time lags. Then, if the null hypothesis is true, the L.B. statistic asymptotically follows the χ 2 distribution, with degrees of freedom equal to the number of autocorrelation coefficients. The values of the L.B. statistic are presented in Table 1, showing that for all time lags, the L.B. statistic by far exceeds the critical values. For instance, even at lag 24, the L.B. statistic is statistically highly significant (476.3). In that respect, this test shows that the null hypothesis can be rejected, which is a proof that the analysed time series is non-stationary. Nevertheless, it is known that the L.B. statistic has low power because the significant coefficients can be neutralised by the insignificant ones. Hence, the evidence obtained by the L.B. statistic is additionally tested by employing two unit root tests: the augmented Dickey-Fuller (A.D.F.) and the Phillips-Perron (P.P.) test.
In the first row of Table 2, the values of the A.D.F. test are shown in its three variants, and in all cases, the null hypothesis for the presence of a unit root cannot be rejected. Consequently, this test suggests that the series is non-stationary. Yet, it is known that the A.D.F. test has low power and, hence, the results are additionally checked with the P.P. test. As shown in the second row of Table 2, all of the variants of the P.P. test show that the null hypothesis of a unit root cannot be rejected. Hence, this test, too, suggests that the series is non-stationary. As mentioned previously, if the time series is non-stationary, then the Box-Jenkins methodology cannot be applied. This means that it is necessary to transform the series in order to make it stationary, which is done by differencing the original series (Table 3).
When the series is differenced, one cannot observe a regular movement of the autocorrelation coefficients, which begin with low values, decreasing quickly to zero and then, up to lag 12, moving in a wave style (i.e., increasing and decreasing). In order to check the stationarity of the differenced series, the ρ k are individually tested with the confidence interval, which in this case is ±0.257. It was then shown that the null hypothesis that the true ρ k of the population are equal to zero cannot be rejected. Specifically, the value of L.B. statistic with 24 degrees of freedom is 34.919, which is not sufficient to reject the null hypothesis. The above results show that by differencing the original time series, stationarity is obtained. Yet, once again, in order to verify the results, the A.D.F. test and the P.P. test are used. From Table 3, it can be concluded that the values of the statistics are highly significant, so once again, it can be concluded that the first-differenced series is stationary.
After performing the additional tests, it can be concluded that the Box-Jenkins methodology can be applied. The first step is to identify the appropriate model that will explain the time series movement. Here, crucial instruments are the sample A.C.F. and the partial A.C.F. (P.A.C.F.). The detailed analysis of both functions (presented in Table 4) reveals that both A.C.F. and P.A.C.F. decay gradually up to lag 3, and then there is no regularity in the movement of the autocorrelation coefficients, from which the model could be identified. What the correlogram suggests is that we have a mixed process (i.e., a combination of A.R. and M.A. processes).
Given the unclear character of the time series, two alternative specifications were used in order to model the original series -A.R.I.M. A.(1,1,1) and A.R.I.M.A.(2,1,2) -which represent the original time series in an adequate manner. The models' estimates along with the accompanying diagnostics are shown in Tables 5 and 6, respectively. As can be seen, the A.R.I.M.A.(2,1,2) model has a slightly lower adjusted coefficient of determination comparing to the competing model. However, all of the parameters in this model are statistically significant in contrast to A.R.I.M.A. (1,1,1), in which the M.A. term is not significant even at 10%. In both models, the inverted A.R. roots have modulus values lower than one (i.e., they lie within the unit circle). This is true for the M.A. terms, too, whose roots are   imaginary, but lie within the unit circle with values much lower than one. These results confirm that the two models are covariance stationary and invertible, thus making them appropriate for forecasting. Finally, the values of the Akaike and the Schwarz information criteria are very similar, suggesting that both models perform well (i.e., we can barely discriminate between the two specifications based on the information criteria). Due to the fact that the primary purpose of building a forecasting model is to clearly discern the future of a phenomenon, the most important criterion is how accurately the model does this. This means attempting to disentangle how closely the estimations provided by the model conform to the actual events being forecasted. In this respect, we support the accuracy of the suggested model by within-sample forecasts. To this end, some of the standard indicators are employed, such as root mean squared error (R.M.S.E.), mean absolute error (M.A.E.), absolute percentage error (A.P.E.), mean absolute percentage error (M.A.P.E.) and the Theil inequality coefficient (T.I.C.). Table 8 presents the values of the evaluation of the suggested models. Generally, both models present good accuracy according to the M.A.P.E. classification for analysing the quality of the values forecast since the results lie within the interval 10-20% (Lewis, 1982). Based on the analysis of the error values (Table 8), it may be concluded that the models produce good and satisfactorily accurate forecasts for the observed period (Appendix A, When evaluating the forecast performance of econometric models, the issue of model stability arises naturally. As revealed in Figure 1, international tourist arrivals in F.Y.R. Macedonia show an upward trend until the end of the 1980s, followed by a steep decline during the 1990s. Then, since the 2000s, one can observe another reversal in the dynamics of the series. Naturally, this raises the issue of the presence of potential structural breaks during the sample period. Though some preliminary conclusions can be obtained on the basis of mere visual inspection of Figure 1, we provide the results of several formal stability tests, such as the Chow breakpoint test, the Chow forecast test and the Quandt-Andrews test for unknown breakpoint dates. Though we have run the stability tests for the two A.R.I.M.A. specifications, for the sake of preserving space, we report and discuss the results for the A.R.I.M.A.(1,1,1) model. Note that the results are very similar for the A.R.I.M.A.(2,1,2) model and they are available upon request.

Forecast results and evaluation of models
Appendix C presents the two variants of the Chow test for possible structural breaks dated in 1986, 1991 and 2002, as well as the Quandt-Andrews test for unknown breakpoint dates. The main findings from these tests are as follows: regarding the possible structural breaks occurring in 1986 and 1991, both the Chow breakpoint test and the Chow forecast test reject the null hypothesis at the 1% significance level. Only the first Chow test points to a possible structural break in 2002 at the 10% significance level. All of the test statistics associated with the Quandt-Andrews test confirm the presence of multiple structural breaks within the sample period. These results suggest that some caution is needed when applying the above A.R.I.M.A. models for forecasting. However, this qualification can be partially counterbalanced by the fact that the selected models consist of few A.R. and M.A. terms, which could be appropriate for tracking the dynamics of many economic time series characterised by considerable degrees of persistence.
The Box-Jenkins methodology is commonly applied on a regular basis when dealing with tourism demand estimations. In this case, the evaluated results provide good accuracy, highlighting that A.R.I.M.A.(1,1,1) is more suitable for modelling and forecasting due to  slightly better and satisfactory statistical and adjustment qualities. Furthermore, it should be pointed out that the anticipated values must be interpreted with caution, because the model does not indicate the reasons for these effects on the estimated results. This is very important, as these indicators have significant influence on identifying and implementing measures and activities in order to create appropriate tourism policy for the country.

Conclusion and future work
Tourism in F.Y.R. Macedonia plays a significant role in strengthening the economy, particularly in terms of job creation. International tourist arrivals have shown persistent increases during the past few years, due to some government support measures (subsidies, larger budgets for tourism promotion, raising awareness of the positive impacts of tourism, aggressive promotion in international regional markets, etc.). Hence, anticipating international tourism demand is necessary for creating favourable conditions and images of the country. The specification of the Box-Jenkins methodology, the A.R.I.M.A.(1,1,1) model, is constructed and proposed for analysing and short-run forecasting of international tourism demand in F.Y.R. Macedonia based on annual data. This research forecasts that the upward trend in the international tourism demand will continue in the near future, pointing to positive impulses and expectations of a continuous increase by 2018.
Additionally, the paper explains that the suggested model does not provide 'the solution' , but only assists in finding it. Even though the model results are essential elements in the preparation of well-coordinated policies, they cannot do the job all by themselves. The research outcomes may be presented only as a framework, while the rest needs to be achieved with a lot of common sense and knowledge of the details. So, the projected values cannot explain the factors underlying these trends, but on the other hand may serve as a solid base for mitigating any potential negative impacts and preparing the tourism development plan in F.Y.R. Macedonia.
Although the accuracy of the proposed A.R.I.M.A. model can be regarded as good, valid and satisfactory, and despite the fact that the projected values are classified as reliable forecasts, the model is still not highly accurate, probably due to the presence of several structural breaks during the sample period. This means that the suggested model has some difficulties in producing forecasts with minimal errors, implying that different models could be tested. Therefore, future aspects of this research may address the application and modelling of other models that may obtain forecasts that are closer to the actual time series and with greater accuracy, such as hybrid models, artificial neural network models, conditional volatility models, Holt-Winter additive models, Holt-Winter multiplicative models, vector autoregressive models, error correction models; Gaussian mixture model (G.M.M.), etc.
Even when an ideal forecasting model can be identified, it can only serve as an approximation for complex tourist behaviours, as it is possible that tourists' decisions could reflect changes in preferences, motivations or economic shocks. Hence, the planner should always be prepared to make revisions to the previously identified and defined model, adapting it to any newly created changes.

Disclosure statement
No potential conflict of interest was reported by the author.