Forecast of rainfall distribution based on fixed sliding window long short-term memory

ABSTRACT Data mining techniques that lack a sufficient memory component may increase uncertainty in rainfall forecasting. To solve this issue, in this research, a deep-learning-based long short-term memory (LSTM) model is developed for the first time for forecasting monthly rainfall data, and its capability is compared with a random forest (RF) data-driven model. To this end, monthly rainfall data for a period of 41 years (1980–2020) from two meteorological stations in Turkey, namely Rize and Konya, with different climatic conditions, are used. The analysis is carried out using optimum window sizes for determining the optimum lag times of rainfall time series. The performance of the models is evaluated using five statistical measures, namely root mean square error (RMSE), RMSE-observations standard deviation ratio (RSR), Legate and McCabe’s index (LMI), correlation coefficient (R) and Nash–Sutcliffe efficiency (NSE), and also using two visual means, namely Taylor and violin diagrams. The results reveal that the LSTM model, as a more efficient tool, outperforms the RF model in forecasting rainfall at both stations, with improved RMSE of 12.2–14.9%, RSR of 12.3–14.8%, R of 9.4–13.5% and NSE of 32.9–33.2%. The LSTM-based approach proposed herein could be adopted over any global climatic conditions to forecast the monthly rainfall with reasonable accuracy.


Problem statement
Rainfall is a dynamic meteorological variable, which affects different hydrological and agricultural processes both directly and indirectly. It behaves as a natural agent by governing the natural hydrological cycle, and, subsequently, affects the regional crop water requirements. Despite its significance in maintaining the hydrological cycle, sometimes excessive rainfall leads to flooding disasters, which, in turn, adversely affect societies and human civilizations. The situation may be exacerbated by the dynamic environmental and landscape conditions of a region. Hence, quantification of the possible rainfall magnitude over a region well in advance will have a useful impact on the decision-making process. In Turkey, a substantial amount of spatial variation in rainfall is noticed, in that a considerable part of Turkey is susceptible to dry conditions due to a lack of adequate rainfall. This leads to acute water scarcity problems during lean periods, adversely affecting agricultural operations. These inherent variabilities associated with Turkey's climatic pattern necessitate the accurate prediction of plausible rainfall scenarios over the concerned region well in advance.
Rainfall prediction has remained a major global concern within the scientific community for the past few decades. The major challenges faced in the process of rainfall prediction include its random nature and its frequency. Moreover, the unavailability of long-term historical data renders the rainfall prediction process more intricate and cumbersome. As far as the occurrence of rainfall is concerned, a given atmospheric condition may yield rainfall and sometimes it may not. Hence, a clear understanding of the atmospheric processes that play a crucial role in governing the occurrence of rainfall over a region is essential. Furthermore, rainfall is an interlinked phenomenon that is implicitly influenced by other meteorological variables such as minimum and maximum temperature, pressure, relative humidity and wind speed. Hence, the consideration of these associated meteorological parameters in the rainfall prediction process is of utmost importance. Since rainfall is a major variable that has a correlation with certain adverse natural phenomena, such as flood movements, avalanches, landslides and drought, the accurate prediction of rainfall can serve as an early warning system for adverse natural processes.
With advances in science and technology, numerous techniques such as data mining, artificial intelligence, deep learning and machine learning are employed in the field of rainfall prediction. These advanced techniques can address the inherent stochastic and nonlinear behaviors involved in the rainfall prediction mechanism (Nayak et al., 2013). Rainfall prediction follows two fundamental approaches, namely dynamic and empirical. The dynamic approach incorporates process-based equations by means of physical models to predict rainfall over a region. However, considering the cost, computational efficiency and expertise needed to operate these models, the generalized application of this approach has been limited in the rainfall prediction application domain (Dash et al., 2018). Under such a scenario, the empirical data-driven approach seems to be the best possible alternative, given its cost-effectiveness and its ability to work with limited input data.
The chaotic behavior of atmospheric conditions, coupled with the associated meteorological variables, hinders the application of the conventional empirical approach for rainfall prediction. A machine learning-based approach has proved to be the most feasible technique, by extracting hidden patterns from the historical rainfall data. Prior to the processing of rainfall and other meteorological variables using a suitable machine learning algorithm, the most common preprocessing operations, namely dusting and normalization, have to be performed to reduce the noise and bias present in the raw data. Commonly used data mining techniques in practice include artificial neural networks (ANNs), gene expression programming (GEP) and support vector machine (SVM). Most of these data mining approaches have had considerable application in hydrological modeling fields, including rainfall-runoff modeling (Hosseini & Mahjouri, 2016) and evapotranspiration modeling (Elbeltagi et al., 2020). The random forest (RF) technique has also emerged as a powerful machine learning algorithm owing to its inherent potential for quick training and high flexibility with classification algorithms, which can work well for all types of data, namely balanced data and unbalanced data (Breiman, 2001). However, the applicability of the RF algorithm to the field of rainfall prediction is yet to be evaluated.
Nevertheless, meteorological data sets are subject to a high degree of autocorrelation and the forecasting process may become complex with data mining approaches. Furthermore, the lack of a sufficient memory component in the conceptualization of the algorithm leads to the disappearance of gradients in the network, thereby increasing the predictive uncertainty in rainfall prediction. To solve this issue, deep learning methods are becoming widely used in complex hydrological problems such as wind prediction (Liu et al., 2018), evaporation prediction (Majhi et al., 2020) and streamflow prediction (Fu et al., 2020). In deep learning methods, which are a type of machine learning method, the machines think similarly to humans. Deep learning is a collection of algorithms, the primary goal of which is to model high-level concepts in data sets. Deep learning modeling is accomplished by the use of a deep graph composed of numerous processing layers that incorporate a variety of linear and nonlinear transformations.
The long short-term memory (LSTM) model is a deep learning model that has been successfully applied to rainfall-runoff simulation (Hu et al., 2018;Kratzert et al., 2018;Li et al., 2021). Nevertheless, there have been only a few studies on rainfall prediction. Poornima and Pushpalatha (2019) proposed an intensified LSTM model to predict rainfall in the Hyderabad region of India, and compared it with the LSTM, Holt-Winters, autoregressive integrated moving average (ARIMA), extreme learning machine (ELM) and recurrent neural network (RNN) models. The proposed method showed promising results. Haq et al. (2021) used an LSTM model and rainfall parameters, including El Niño and the Indian Ocean Dipole (IOD), to predict rainfall in Indonesia, and obtained accurate predictions, with a mean arctan absolute percentage error (MAAPE) value of 0.58.

Research aims
To the best of the authors' knowledge, there have been no studies on monthly rainfall forecasting using an LSTM model. Hence, in the present study, for the first time, an LSTM model is used for monthly rainfall forecasting. Moreover, as another innovation, a fixed sliding window is applied, by means of a computer program, to determine the best lag time for the LSTM model's inputs. Alternative methods that have been used in the literature to select the optimal lags of rainfall are the autocorrelation function (ACF), partial autocorrelation function (PACF), average mutual information (AMI), principal component analysis (PCA), RF and relief algorithm (Mohammadi et al., 2020; Sumi et al., 2012). The third innovation of the present research is that the performance of the fixed sliding window LSTM method is evaluated using Taylor diagrams. The chosen study area in this research is Turkey, which is characterized by different climatic conditions, from humid to semi-arid, experiencing varying degrees of rainfall behavior throughout the year. Hence, it is useful to evaluate the efficacy of machine learning-based and LSTM-based approaches in rainfall forecasting over an area with varying topographic and climatic conditions, and the obtained results could be used as a guiding tool for researchers, engineers and decision makers.
In light of the above discussions, the specific research gaps of these studies are outlined as follows: (1) in past rainfall forecasting studies, the classical data mining models were applied to model rainfall; and (2) the efficacy of different data mining approaches in rainfall forecasting studies is yet to be evaluated thoroughly. Considering these gaps, the present study has the following specific objectives: (1) to develop fixed sliding window machine learning-based LSTM and RF frameworks for forecasting rainfall; and (2) to compare the performances of the two developed approaches in reproducing rainfall characteristics in two locations in Turkey.

Long short-term memory (LSTM) neural network
The LSTM model, which is an RNN method, has the ability to learn long-term relationships and dependencies (Hochreiter & Schmidhuber, 1997). It can propagate errors without vanishing gradients by adding cell states with constant error flow. Unlike other RNNs, LSTM, instead of having one layer of neurons, has four layers of neurons that interact according to a specific structure. LSTM has quadruple the parameters and computational cost of RNN. It has three gates and a network for computing memory input. The input or update gate serves to check whether the information obtained from the current moment is worth storing in long-term memory. The output gate is used to transfer only the information that needs to be output. The role of the forget gate is to check the memory time in the memory cell and to forget unnecessary past information. Parameters to be determined using training data comprise the number of hidden layers and neurons, maximum epoch and learning rate. Batch size can be determined algorithmically or randomly. Functions in the gates are determined and computed based on the expected error and maximum epoch (Cai et al., 2020; Salman et al., 2018). The architecture of the LSTM layers is presented in Figure 1, where X is a time series having C channels with a length of S, ct represents the cell state at time step t, and ht is the output or hidden state (Mathworks, 2020). The first LSTM block uses the initial state of the network and the first time step of the series to compute the first output and the updated cell state. At time step t, the block uses the current state of the network (ct−1, ht−1) and the next time step of the sequence to compute the output (ht) and the updated cell state (ct).
Figure 1. Long short-term memory (LSTM) layer architecture (Mathworks, 2020).
The state of the layer includes the cell state and output (hidden) state. At time step t, the output state includes the output of the LSTM layer. The cell state has information on the previous steps. At any time step, the layer removes or adds information from or to the cell state. The gates control these updates. To control hidden and cell state, several ingredients of the LSTM architecture are implemented. The cell state update level is controlled by the input or update gate (i), and the cell state level, which is added to the output state, is controlled by the output gate (o). The cell candidate (g) adds information to the cell state, and the cell state reset level is checked by the forget gate (f). A schematic diagram showing the flow of data at time step t is shown in Figure 2. For further details, please refer to Cho et al. (2020).
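The gate interactions described above can be illustrated with a minimal sketch of a single LSTM time step in plain Python. It assumes the standard LSTM formulation (sigmoid activations for the gates i, f and o, hyperbolic tangent for the cell candidate g); the function name `lstm_step` and the dictionary layout of the weights `W`, `R` and `b` are hypothetical choices made for this illustration and are not taken from the authors' Mathematica implementation.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, W, R, b):
    """One LSTM time step (illustrative sketch, standard formulation assumed).

    x      : input at time t, list of length C
    h_prev : previous hidden (output) state, list of length H
    c_prev : previous cell state, list of length H
    W[k], R[k], b[k] for k in "ifgo": input weights (H x C),
    recurrent weights (H x H) and bias (H) of each component.
    """
    H = len(h_prev)

    def affine(k):
        # W_k x + R_k h_prev + b_k for component k
        return [
            sum(W[k][j][m] * x[m] for m in range(len(x)))
            + sum(R[k][j][n] * h_prev[n] for n in range(H))
            + b[k][j]
            for j in range(H)
        ]

    i = [sigmoid(z) for z in affine("i")]    # input (update) gate
    f = [sigmoid(z) for z in affine("f")]    # forget gate
    g = [math.tanh(z) for z in affine("g")]  # cell candidate
    o = [sigmoid(z) for z in affine("o")]    # output gate

    # The forget gate scales the old cell state; the input gate scales the candidate.
    c = [f[j] * c_prev[j] + i[j] * g[j] for j in range(H)]
    # The output gate controls how much of the cell state reaches the hidden state.
    h = [o[j] * math.tanh(c[j]) for j in range(H)]
    return h, c
```

With all weights and biases set to zero, every gate equals 0.5 and the cell candidate equals 0, so the step simply halves the previous cell state, which makes the role of the forget gate easy to see.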

Random forest (RF)
The RF model consists of a group of randomized regression trees: it grows a large number of regression trees and then aggregates them to obtain a single output (Figure 3). It produces a valid error estimate using out-of-bag (OOB) data, and estimates covariate importance by permuting the values of each covariate in the OOB sample and predicting the OOB samples using the permuted variable (Zhao et al., 2012). The change in the OOB error then provides an indication of the importance of that covariate in the data set. The efficiency of the RF model depends on two parameters, namely ntree (the number of trees in the forest) and mtry (the number of auxiliary data points in each random subset), which are optimized by increasing mtry from 1 to 10 (the total number of covariates) and ntree from 100 to 10,000 by increments of 100 (Ghorbani et al., 2020; Were et al., 2015).
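The permutation-based importance estimate described above can be sketched as follows. This is only an illustration under stated assumptions: `predict` is a hypothetical callable standing in for a fitted forest, `X_oob` and `y_oob` stand in for the out-of-bag sample, and mean squared error is assumed as the error measure. It is not the routine used in the study.

```python
import random


def mse(predict, X, y):
    """Mean squared error of `predict` on rows X with targets y."""
    return sum((predict(row) - t) ** 2 for row, t in zip(X, y)) / len(y)


def permutation_importance(predict, X_oob, y_oob, seed=0):
    """Increase in OOB error after shuffling each covariate in turn.

    `predict` is a stand-in for a fitted model (assumption); X_oob is a
    list of rows (lists of covariate values) and y_oob the targets.
    """
    rng = random.Random(seed)
    base_error = mse(predict, X_oob, y_oob)
    importances = []
    for j in range(len(X_oob[0])):
        column = [row[j] for row in X_oob]
        rng.shuffle(column)  # break the link between covariate j and the target
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X_oob, column)]
        importances.append(mse(predict, X_perm, y_oob) - base_error)
    return importances
```

A covariate that the model ignores leaves the error unchanged after shuffling, so its importance is zero, whereas shuffling an informative covariate increases the error.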

Study area and data used
This research uses the monthly rainfall data from two meteorological stations in Turkey. The locations of Turkey and these two stations (Rize and Konya) are shown in Figure 4. Rize Station, in north-east Turkey, has a humid climate, and Konya, in central Turkey, has a semi-arid climate. Rainfall data for 41 years, ranging from 1980 to 2020 (486 records), have been compiled by the Turkish State Meteorological Service for both stations. The observed monthly rainfall time series and heat maps of monthly rainfall data for Rize and Konya Stations are presented in Figure 5. It is clear from Figure 5(b) that the extreme value of rainfall between April and  September has decreased at Rize Station; however, it seems that there is no trend in rainfall values for Konya Station, and high rainfall values are seen between February and July. Statistical properties of the rainfall data set, along with training (77%) and testing (22%) data sets, are reported in Table 1. In comparison with the literature review, this data span is sufficient to construct a machine learning model and it is adequate to mimic the actual trend in the rainfall data set. The maximum and minimum records of rainfall at Rize and Konya Stations are 46.871 and 0 cm and 22.239 and 0 cm, respectively. The stochasticity of the data set at Rize Station is more complex as the variation between the maximum and minimum records is very high.

Error analysis
Different metrics have been used by researchers in the literature to analyze their models, and there is no standard for a unified parameter. In this study, to evaluate and compare results of the models, five metrics are used, as follows.
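Nash-Sutcliffe efficiency (NSE):
$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{N} [o(i) - f(i)]^{2}}{\sum_{i=1}^{N} [o(i) - \bar{o}]^{2}}$$
Root mean square error (RMSE):
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} [o(i) - f(i)]^{2}}$$
Coefficient of correlation (R):
$$R = \frac{\sum_{i=1}^{N} [o(i) - \bar{o}]\,[f(i) - \bar{f}]}{\sqrt{\sum_{i=1}^{N} [o(i) - \bar{o}]^{2} \, \sum_{i=1}^{N} [f(i) - \bar{f}]^{2}}}$$
RMSE-observations standard deviation ratio (RSR):
$$\mathrm{RSR} = \frac{\sqrt{\sum_{i=1}^{N} [o(i) - f(i)]^{2}}}{\sqrt{\sum_{i=1}^{N} [o(i) - \bar{o}]^{2}}}$$
Legate and McCabe's index (LMI):
$$\mathrm{LMI} = 1 - \frac{\sum_{i=1}^{N} |o(i) - f(i)|}{\sum_{i=1}^{N} |o(i) - \bar{o}|}$$
where f(i) and o(i) are forecast and observed rainfall values, respectively; $\bar{f}$ and $\bar{o}$ are averaged forecast and observed values, respectively; and N is the number of data points.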
The lower the values of RMSE and RSR, the more accurate the model's results are; conversely, the higher the values of NSE and R, the better the model's performance is. Table 2 presents the adopted performance ranking system based on four different ranges of NSE and RSR values.
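As a worked illustration, the five metrics can be computed as in the sketch below. It assumes the standard definitions of NSE, RMSE, R, RSR and LMI from the literature; the function name `metrics` and the dictionary it returns are hypothetical conveniences, not part of the original study.

```python
import math


def metrics(obs, fc):
    """NSE, RMSE, R, RSR and LMI for observed (obs) and forecast (fc) values.

    Standard textbook definitions are assumed.
    """
    n = len(obs)
    o_bar = sum(obs) / n
    f_bar = sum(fc) / n
    sse = sum((o - f) ** 2 for o, f in zip(obs, fc))   # squared forecast errors
    sso = sum((o - o_bar) ** 2 for o in obs)           # spread of the observations
    ssf = sum((f - f_bar) ** 2 for f in fc)            # spread of the forecasts
    cov = sum((o - o_bar) * (f - f_bar) for o, f in zip(obs, fc))
    sae = sum(abs(o - f) for o, f in zip(obs, fc))     # absolute forecast errors
    sao = sum(abs(o - o_bar) for o in obs)
    return {
        "NSE": 1.0 - sse / sso,
        "RMSE": math.sqrt(sse / n),
        "R": cov / math.sqrt(sso * ssf),
        "RSR": math.sqrt(sse) / math.sqrt(sso),
        "LMI": 1.0 - sae / sao,
    }
```

Under these definitions RSR equals the square root of (1 − NSE), so the two metrics always rank models in the same order; they differ only in scale.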
Besides the statistical metrics presented in Table 2, a Taylor diagram is employed to assess the accuracy of the models (Taylor, 2001). The Taylor method is basically a graphical representation of the modeled and observed data. In the Taylor diagram, the radial distance from the origin refers to the ratio of the normalized standard deviation of the simulation to that of the observation, and the azimuth angle represents the correlation coefficient between the modeled and observed data (Ghorbani et al., 2017).
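The position of a model on a Taylor diagram, as described above, can be computed with a short sketch such as the following. It assumes population standard deviations and the usual Taylor (2001) geometry, in which the angle is the arccosine of R and the normalized centred RMS difference follows from the law of cosines; the function name `taylor_point` is hypothetical.

```python
import math


def taylor_point(obs, fc):
    """Polar coordinates of a model on a normalized Taylor diagram.

    Returns (radius, angle, crmsd), where
      radius = std(fc) / std(obs),
      angle  = arccos(R) in radians,
      crmsd  = centred RMS difference divided by std(obs).
    Population standard deviations are assumed.
    """
    n = len(obs)
    o_bar = sum(obs) / n
    f_bar = sum(fc) / n
    sd_o = math.sqrt(sum((o - o_bar) ** 2 for o in obs) / n)
    sd_f = math.sqrt(sum((f - f_bar) ** 2 for f in fc) / n)
    cov = sum((o - o_bar) * (f - f_bar) for o, f in zip(obs, fc)) / n
    r = max(-1.0, min(1.0, cov / (sd_o * sd_f)))  # clamp against round-off
    radius = sd_f / sd_o
    angle = math.acos(r)
    # Law of cosines: E'^2 = 1 + radius^2 - 2 * radius * R (in units of std(obs))
    crmsd = math.sqrt(max(0.0, 1.0 + radius ** 2 - 2.0 * radius * r))
    return radius, angle, crmsd
```

A perfect forecast lands at radius 1 and angle 0, which is the observed reference point; the closer a model's point is to it, the smaller its centred RMS difference.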

Results and discussion
Owing to the massive advances in computer-aided models, the current research aims to develop a deep learning model and validate its predictability performance against the RF model in simulating the rainfall process in two stations with different climatic conditions. The sliding window method can generate data with the current time step. For example, if the data for a month are to be forecast, then the n-dimensional input can be generated from the data of the previous n months (Dong et al., 2020). In this study, the maximum window size is set to 8, which means that data from up to 8 months before the forecast month are used. Therefore, first, eight data sets are generated from a given data set by varying the window size from 1 to 8. Then, the LSTM model is trained on these eight data sets. The window size that yields the highest accuracy is selected as the optimal window size (lag time), which is used to make forecasts. Figure 6 shows that the sliding window size with the minimum error is 4 for both Rize and Konya Stations. In other words, the optimum lag time of the rainfall time series is 4 months at both Rize and Konya Stations. For rainfall modeling, the LSTM model is adopted as an automated feature selection for the correlated lag time in accordance with the mean squared error (MSE) metric. Figure 7 shows a sample of the training progress of the LSTM model in Mathematica software using the training data sets of Rize and Konya Stations. It shows that MSE values (loss) decrease with increasing rounds of training of the LSTM model for both training and validation data sets at both stations.
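The fixed sliding window procedure described above can be sketched as follows. This is a minimal illustration rather than the study's Mathematica program: `make_windows` and `select_window` are hypothetical names, the chronological 77% training split used for selection is an assumption, and `fit_oldest_lag` is a deliberately trivial placeholder model (it is not an LSTM) that only keeps the sketch self-contained.

```python
def make_windows(series, w):
    """Turn a series into (inputs, targets) using the previous w values."""
    X = [series[t - w:t] for t in range(w, len(series))]
    y = [series[t] for t in range(w, len(series))]
    return X, y


def select_window(series, fit, max_w=8, train_frac=0.77):
    """Try window sizes 1..max_w and return the one with the lowest MSE.

    `fit(X_train, y_train)` must return a `predict(row)` callable.
    The chronological split for selection is an assumption of this sketch.
    """
    errors = {}
    for w in range(1, max_w + 1):
        X, y = make_windows(series, w)
        split = int(len(X) * train_frac)
        predict = fit(X[:split], y[:split])
        held_out = list(zip(X[split:], y[split:]))
        errors[w] = sum((predict(r) - t) ** 2 for r, t in held_out) / len(held_out)
    best_w = min(errors, key=errors.get)  # smallest window wins ties
    return best_w, errors


def fit_oldest_lag(X_train, y_train):
    """Placeholder model, NOT an LSTM: forecast the oldest value in the window."""
    return lambda row: row[0]
```

For example, on a synthetic series that repeats every three steps, `select_window([0.0, 1.0, 2.0] * 40, fit_oldest_lag)` selects a window of 3, because the oldest value in a three-step window coincides with the value to be forecast. In practice `fit` would train the actual forecasting model on each windowed data set.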
Mathematica 12 software is used to run the LSTM and RF models. Optimal values of the training parameters of the models obtained in this research using a trial-and-error method are presented in Table 3. Figure 8 presents different graphical evaluations of LSTM model performance in the testing phase at Rize Station. The graphs show the satisfactory forecasting capability of the LSTM model, with relatively high values of R and NSE criteria (0.788 and 0.602, respectively) and relatively low values of RMSE, RSR and LMI criteria (2.12 cm, 0.627 and 0.389, respectively). According to the values of NSE and RSR, and the performance ranking in Table 2, the model shows satisfactory performance. Based on Figure 8(a), actual and forecast time series are mostly identical, especially in forecasting high rainfall values. Figure 8(b) shows the scatter diagram of the LSTM model, and Figure 8(c) shows the scatter diagram of the residuals. Figure 8(b) and (c) show that (1) all data points have a good correlation in all ranges, (2) errors in higher ranges are trivial, and (3) discordance in the low and medium ranges is strong (about ± 5 cm). In the scatter plots, blue to red colors indicate high scatter to low scatter of the data points, respectively. Figure 8(d) shows the probability density function (PDF) plot for LSTM residuals. According to the figure, the residuals are normally distributed and approximately symmetrical, with low values of the mean and standard error (0.45 and 2.08, respectively). Figure 9 shows different graphical evaluations of the RF model performance in the testing phase. The graphs show the satisfactory forecasting capability of the RF model, with a relatively high value of R (0.72) and a low NSE value (0.452), and relatively low values of RMSE, RSR and LMI criteria (2.49 cm, 0.736 and 0.243, respectively). According to the values of NSE and RSR, and the performance ranking in Table 2, the model shows unsatisfactory performance.
Moreover, based on Figure 9(a), actual and forecast time series are not close to each other; and the high rainfall values are underestimated and low values are overestimated. Figure 9(b) shows the scatter diagram, and Figure 9(c) shows the scatter diagram of the residuals. Figure 9(b) and (c) indicate that (1) all data points do not show a good correlation in all ranges, and (2) there is high discordance in all ranges (about ± 6 cm). Figure 9(d) shows the PDF plot for the RF residuals. According to the figure, the PDF distribution is normally distributed; the residuals are not symmetrical, with relatively high values of the mean and standard error (0.86 and 2.34, respectively). Figure 10(a) shows the evaluation metrics (correlation, RMSE and standard deviation) in the form of a Taylor diagram for the LSTM and RF models in the testing period. It can be found that the point of the LSTM model is closer to the observed point (in blue) compared to the RF model's point (2.248 and 2.512, respectively), and this indicates the higher ability of the LSTM model (with high and low values of R and RMSE, respectively) than the RF technique. Figure 10(b) and (c) show violin and box plots of the models, respectively. A box plot displays variation (such as minimum and maximum of data) in a data set. A violin plot is like a box plot; however, it presents the kernel probability density for different values of the modeled and actual data. Figure 10(b) and (c) show that the violin and box plots of the LSTM model are more similar to both plots of the actual data sets, compared with the RF model. This means that the statistical characteristics of the forecast rainfall values of the LSTM model are more similar to the statistical characteristics of the actual data. In other words, the LSTM model is more successful in rainfall modeling than the RF model at Rize Station. It should be noted that this conclusion is drawn by comparing the results of Figures 8 and 9. 
Figure 11 shows different graphical evaluations of LSTM model performance in the testing period. The graphs show a satisfactory forecasting capability of the LSTM model, with relatively high values of NSE and R criteria (0.545 and 0.748, respectively) and low values of RMSE, RSR and LMI criteria (1.92 cm, 0.670 and 0.326, respectively). According to the values of NSE and RSR, and the performance ranking in Table 2, the model again shows satisfactory performance. Based on Figure 11(a), the actual and forecast time series are very similar and close to each other, except for some peak values. Figure 11(b) shows the scatter diagram, and Figure 11(c) shows the scatter diagram of the residuals. Figure 11(b) and (c) show that (1) there is a very good correlation for all data points in all ranges, (2) most errors in the lower and higher ranges are trivial, and (3) discordance in the medium ranges is strong (about ± 6 cm). Figure 11(d) shows the PDF plot for LSTM residuals. According to the figure, the PDF distribution is normally distributed; the residuals are symmetrical, with very low values of the mean and standard error (0.079 and 1.927, respectively). Figure 12 shows different graphical evaluations of the RF model performance in the testing phase. The graphs do not show a high forecasting capability of the RF model, with low values of NSE and R criteria (0.41 and 0.659, respectively), and high values of RMSE, RSR and LMI criteria (2.187 cm, 0.764 and 0.265, respectively). According to the values of NSE and RSR, and the performance ranking in Table 2, the model again shows unsatisfactory performance. Moreover, based on Figure 12(a), the actual and forecast time series are not close to each other; and high rainfall values are very underestimated at some peaks. Figure 12(b) shows the scatter diagram,  and Figure 12(c) shows the scatter diagram of the residuals. 
Figure 12(b) and (c) indicate that (1) there is a good correlation for all data points displayed in the low and medium ranges, and (2) discordance in the higher ranges is strong (about −7 cm). Figure 12(d) shows the PDF plot for the RF residuals. According to the figure, the residuals are normally distributed and approximately symmetrical, with relatively high values of the mean and standard error (0.225 and 2.185, respectively). Figure 13(a) shows the evaluation metrics (correlation, RMSE and standard deviation) in the form of a Taylor diagram for the LSTM and RF models in the testing period. The point of the LSTM model is closer to the observed point than the RF model's point, and this indicates the higher ability of the LSTM model (with high and low values of R and RMSE, respectively) than the RF technique. Figure 13(b) and (c) show the violin and box plots of the models, respectively. According to these figures, it can be seen that the violin and box plots of the LSTM model are more similar to both plots of the actual data sets, compared with the RF model. This means that the statistical characteristics of the forecast rainfall values of the LSTM model are more similar to the statistical characteristics of the actual data. In other words, the LSTM model has a higher ability to simulate rainfall than the RF model at Konya Station. It should be noted that this conclusion is drawn by comparing the results of Figures 11 and 12.

Discussion
In general, the LSTM model, as a deep learning technique, shows less error and more satisfactory performance in rainfall forecasting than the RF model, with improved RMSE estimates of 12.2-14.9%, RSR estimates of 12.3-14.8%, R estimates of 9.4-13.5% and NSE estimates of 32.9-33.2% at both Rize and Konya Stations, with different climatic conditions. In other words, the  LSTM model is more able to forecast rainfall successfully in different climatic conditions. This can be attributed to the structure of the LSTM model, which is a type of RNN model and has a special memory ability compared to other traditional methods to remember and learn past long-term dependencies and relationships, such as trend, seasonality and cycles, over long periods of rainfall time series, and can make better forecasts (predictions) for the future according to the current information. However, climatic conditions have no significant impact on models' performances, and the models show similar and accurate results in different climatic conditions. The RF technique shows satisfactory results owing to its inherent potential for quick training and high flexibility with classification algorithms, which can work well for all types of data.
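The improvement percentages quoted above can be checked against the testing-phase metrics reported for the two stations. The sketch below assumes that improvement is measured relative to the RF value, as (RF − LSTM)/RF for error metrics and (LSTM − RF)/RF for skill metrics; the paper does not state the formula, so this is an inferred convention, and the names `improvement` and `reported` are hypothetical.

```python
def improvement(lstm, rf, lower_is_better):
    """Percentage improvement of LSTM over RF, relative to the RF value.

    The relative-to-RF convention is an assumption of this sketch.
    """
    gain = (rf - lstm) if lower_is_better else (lstm - rf)
    return round(100.0 * gain / rf, 1)


# (LSTM, RF) testing-phase values as reported in the results for each station
reported = {
    "Rize": {"RMSE": (2.12, 2.49), "RSR": (0.627, 0.736),
             "R": (0.788, 0.72), "NSE": (0.602, 0.452)},
    "Konya": {"RMSE": (1.92, 2.187), "RSR": (0.670, 0.764),
              "R": (0.748, 0.659), "NSE": (0.545, 0.41)},
}

table = {
    station: {
        name: improvement(lstm, rf, lower_is_better=name in ("RMSE", "RSR"))
        for name, (lstm, rf) in values.items()
    }
    for station, values in reported.items()
}

for station, row in table.items():
    print(station, row)
```

Under this convention the computed values fall within the ranges quoted in the text: 12.2% (Konya) to 14.9% (Rize) for RMSE, 12.3% to 14.8% for RSR, 9.4% (Rize) to 13.5% (Konya) for R, and 32.9% (Konya) to 33.2% (Rize) for NSE.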
The models' capabilities in forecasting extreme rainfall values (i.e. high and low values) are very important and so should also be evaluated. According to the scatter diagrams of the residuals, it can be seen that residual values of the LSTM model are lower than those of the RF model for both high and low rainfall values. Therefore, the LSTM technique forecasts the extreme values of rainfall more accurately than the RF model. The successful performance of the LSTM model in this research is in accordance with Poornima and Pushpalatha (2019) and Haq et al. (2021), who applied LSTM in rainfall prediction. Moreover, Yu et al. (2017) achieved good results by applying the RF model in rainfall forecasting; however, the RF model performed as the second best model compared with the SVM model.
It should be noted that the estimations of both models are more accurate in humid climatic condition (Rize Station) than in semi-arid conditions (Konya Station). This may be due to the fact that in humid and rainy areas, rainfalls with similar characteristics occur on most days of the year and in small time intervals, so a model can more accurately determine the relationship between its input variables (time-lagged rainfall) and output variable (current rainfall) and therefore it can predict rainfall more successfully. However, in semi-arid and low rainfall areas, owing to the relatively long time interval between rainfalls, the probability of two consecutive rainfall events occurring with different characteristics is high and, as a result, their modeling operation becomes a little more difficult.

Conclusion
The objectives of this study were to apply a new deep learning model (i.e. LSTM model) to simulate a monthly rainfall data set for the first time, and to evaluate its ability by comparing its results with the RF method at two stations (Rize and Konya Stations) with different climatic conditions in Turkey. A fixed sliding window method is utilized for determining the optimum lag times of monthly rainfall time series. The performance of the LSTM method for modeling rainfall data is evaluated using several statistical metrics (R, NSE, RMSE, RSR and LMI) and plots (actual and forecast time series, scatter, PDF, Taylor, violin and box plots). The results show that the LSTM method, having special memory ability, outperforms the RF method in forecasting rainfall data at both stations. Although the LSTM model has several advantages in time series modeling problems, such as learning much more quickly, having special memory ability, and solving complex, nonlinear and long time lag problems, it has some disadvantages in the complexity of high network training, high decoding delay and long training time (Gu et al., 2021). The results of the RF model in this study are satisfactory in rainfall forecasting in both climatic conditions. Although the RF model has advantages, such as solving both regression and classification problems, solving unsupervised machine learning problems, handling many input variables without variable selection, acting as a feature selection technique and taking care of missing data internally, this model has some disadvantages, such as requiring a long time for training, as it computes a lot of decision trees to determine the class, and making inaccurate predictions for data outside the training data range (Simi, 2019).
This pioneering research provides a valid reference for applying a fixed sliding window LSTM deep learning model as a powerful technique for rainfall forecasting in different climatic conditions. For future studies, it may be recommended to compare the performance of the fixed sliding window LSTM model with other fixed sliding window deep learning models (e.g. convolutional neural networks and temporal deep belief networks) or data-driven models (e.g. SVM and decision trees) at different time scales (e.g. daily and annual) and in different parts of the world with other climatic conditions (e.g. arid and tropical). Moreover, applying the dynamic sliding window LSTM for rainfall forecasting and analyzing its performance can be strongly recommended in future studies.