Optimal RCM and spatial interpolation methods for estimating future precipitation in the Republic of Korea

ABSTRACT Recent droughts and floods caused by climate change have considerably damaged the use of water resources. According to climate forecasts in several Intergovernmental Panel on Climate Change (IPCC) reports, the stable use of water resources remains vulnerable and can be heavily influenced by climate change. An adequate response to the climate change threat is thus growing more important for water resources management. This study aimed to select the optimal regional climate model (RCM) and spatial interpolation method for the practical application of future precipitation estimates produced under climate change scenarios. The results indicate that HadGEM3-RA (Hadley Centre Global Environment Model version 3 – Regional Atmospheric) is the most suitable of the five RCMs when its predicted values are compared with the monthly precipitation observed by the Korea Meteorological Administration (KMA) in Korea from 2006 to 2015. Among several spatial interpolation methods, universal kriging was chosen as the optimal method for generating accurate precipitation data. In Korea, the accuracy of precipitation prediction by the RCM is low on the coasts and islands and higher in the northern inland area.


Introduction
The recent severe heat, droughts, floods, hurricanes and wildfires resulting from climate change have put stress on human and natural systems (IPCC, 2014). Currently, many countries are experiencing the effects of climate change. Since 1999, Korea has implemented comprehensive measures to cope with climate change, and continues to study technology for climate change prediction, impact assessment and adaptation. Many studies in Korea are being conducted on the various areas affected by climate change, especially on floods, droughts, precipitation, water management, impact assessment, estimation of river runoff, evaporation and groundwater (Ahn et al., 2003; Choi et al., 2006; Kim, 1999; Kwon & Kim, 2012). In addition, research on climate impacts using various climate change scenarios has been applied to hydrology, hydraulics and river engineering around the world (Frigon et al., 2007; Milano et al., 2016; Soubeyroux et al., 2015; Viet Thang et al., 2018).
Since 2012, the return period of drought in Korea has fallen from 5-7 years to 2-3 years, according to a drought analysis covering the period from 1960 (Bae et al., 2013; Choi, 2015). As a result, the water resources at several multipurpose dams in the provinces of Gangwon and Chungcheong have grown scarce (Kim et al., 2015). Such drought risk can lead to a shortage of water resources, which could be an increasing problem for the water supply in Seoul and other metropolitan areas (Jang et al., 2016).
Response to climate change is becoming more important for the management of water resources. The Intergovernmental Panel on Climate Change (IPCC) has been working on responses and has presented CO2 concentration scenarios, which have a direct impact on climate change, in its first (1990), second (1995), third (2001), fourth (2007) and fifth (2013) reports. In the present study, data from the Representative Concentration Pathway (RCP) 8.5 scenario, the most widely used scenario in Korea before the sixth IPCC report, was used for climate change assessment.
Climate change scenarios are climate projections based on scientific evidence for use in the many areas impacted by climate change. Climate forecasts are highly uncertain, however, due to the variability of nature, various greenhouse gas (GHG) emission scenarios, and modelling errors (Cho et al., 2011).
Climate change studies based on predictions from climate models should thus take into account uncertainty in the research process and its consequences. Through the United Kingdom Climate Change Programme (UKCCP), which began in 1994, the United Kingdom has been conducting studies on climate change science and warming and the quantification of GHG emissions. Since 1997, UKCCP has been working on climate change impact assessment and adaptation strategies, starting with the United Kingdom Climate Impacts Programme (UKCIP). In addition, the United Kingdom Meteorological Agency Hadley Center is conducting research on general climate change studies, global and national climate trend monitoring, the development of coupled ocean-atmosphere models, climate change prediction and impact assessment. Furthermore, the Climate Model (CM) group of the Global Climate Model (GCM) group has developed a five-step coupled model comparison project (2009) in order to project climate change scenarios according to the scenario development strategy of the RCP Coupled Model Inter-comparison Project Phase 5 (CMIP5; Allen & Ingram, 2002; Kundzewicz et al., 2008; Taylor et al., 2009).
Under CMIP5, 14 countries including Korea participated to produce global climate change scenarios. Studies are also underway to reduce the uncertainty of climate change prediction by predicting data with more than 40 models as part of a multi-model ensemble method (Allen & Ingram, 2002; Kundzewicz et al., 2008; Taylor et al., 2009).
Precipitation data estimated from climate models have evolved through collaborative studies amongst many countries. However, concerns remain about the uncertainty of the models and data. There have been various studies on climate change in Korea, but the data first need to be improved to support the reliability of studies on the uncertainty and accuracy of climate change models and scenarios (Suh et al., 2012).
Therefore, this study examines the applicability of Korea's precipitation prediction data produced by regional climate models (RCMs) before downscaled climate change scenarios are used. RCP 8.5 is the most widely used climate change scenario in Korea and has been used for various future climate predictions, so the present analysis is based on this scenario as well. To improve the applicability of precipitation data forecast under a climate change scenario, the monthly precipitation results of the regional climate models and detailed data on Korea were compared under the RCP 8.5 scenario. The scope and main content of this paper are as follows:
1. The theoretical background for improving the applicability of climate change scenarios is described. The theory of spatial interpolation methods for data generation at specific points is examined. Among the various methods, inverse distance weighting (IDW), natural neighbour (NN) interpolation, ordinary kriging (OK) and universal kriging (UK) are summarised. In addition, the process for verifying the estimation results of the different interpolation methods is described.
2. The precipitation data generated by the RCMs and the observed precipitation data of the KMA are compared and analysed. To compare model results and observational data at the same point, spatial interpolation was applied to extract the RCM results at each observatory. Based on a comparison between the extracted results and the actual observation data, the optimal interpolation method was selected and used to identify which RCM produces the most accurate data.

Spatial interpolation methods for generating precipitation data from RCM
Methods of spatial interpolation are generally classified as global versus local, exact versus inexact, and stochastic versus deterministic. Typical methods include the deterministic IDW and NN methods and kriging as a statistical method. Spatial interpolation is used for making short- or long-range predictions determined by the distribution of values in the data used for estimation. A disadvantage, however, is that errors can occur when the interpolation method is applied over a long distance. Determining the suitability of the actual data used for interpolation is thus crucial (Germann & Joss, 2001; Smith et al., 1996; Skøien et al., 2003; Todini, 2001).
Kriging, one of the most widely used methods of spatial interpolation, uses mathematical and statistical techniques to estimate the value, reflecting the characteristics of field data by analysing the correlation of the base data to the value of the predicted point. It also expresses the regional trend of the data through a distribution map (Chun et al., 2005).
If the chosen interpolation method does not estimate an unknown value appropriately, however, the predicted value may contain a large error. Thus, selecting the optimal interpolation method among OK, UK, NN, IDW and other methods is crucial for predicting unknown values in a way that appropriately reflects spatial distribution characteristics (Kim et al., 2010).
In general, OK, the most commonly used form of kriging, employs a linear combination of surrounding known values to estimate the unknown value at an arbitrary point. The unknown value can be predicted as shown in Equation (1), and the error variance of OK is expressed by Equation (2),
where $z^{*}(x_0)$ is the estimated value at $x_0$, $z(x_i)$ is the known value at $x_i$, $\lambda_i$ is the weighting factor of $z(x_i)$, $\sigma^{2}_{OK}$ is the error variance of OK and $n(x)$ is the total number of data points used.
The kriging method requires that the error between the predicted and true values is minimised to determine weight and that the estimated value should be unbiased. Bias is defined as the difference between the mean of the population and that of the estimates to predict the population factor (Isaaks & Srivastava, 1989).
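For reference, a common textbook formulation of ordinary kriging in semi-variogram form, written with the symbols defined above, is sketched below. It is not necessarily the exact notation of Equations (1) and (2) in this paper; the semi-variogram $\gamma$ and the Lagrange multiplier $\mu$ are symbols introduced here for illustration only.

```latex
\begin{align}
z^{*}(x_0) &= \sum_{i=1}^{n(x)} \lambda_i \, z(x_i),
  \qquad \sum_{i=1}^{n(x)} \lambda_i = 1, \\
\sum_{j=1}^{n(x)} \lambda_j \, \gamma(x_i, x_j) + \mu &= \gamma(x_i, x_0),
  \qquad i = 1, \dots, n(x), \\
\sigma^{2}_{OK} &= \sum_{i=1}^{n(x)} \lambda_i \, \gamma(x_i, x_0) + \mu .
\end{align}
```

The constraint that the weights sum to one enforces the unbiasedness condition, and the linear system in the second line gives the weights that minimise the error variance.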
On the other hand, other methods also predict new values using weighted linear combinations of surrounding values; these include the polygonal, triangular, regional mean and IDW methods (Isaaks & Srivastava, 1989). A limitation of the regional mean method is that it assigns equal weight to all data values within the radius of influence. IDW instead assumes that the weight is inversely proportional to the magnitude of the distance. IDW is expressed by Equation (3), where $d_i$ is the distance between $x_0$ and $x_i$, $x_0$ is the estimation point, $x_i$ is a point of known value and $\alpha$ is the distance weighting coefficient. If $\alpha$ is close to 0, the result of the IDW method approaches the arithmetic mean. As $\alpha$ increases, the influence of nearby points can grow too large (Kim et al., 2010).
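The behaviour of the distance weighting coefficient can be illustrated with a minimal sketch of IDW in its usual form, where each known value is weighted by $d_i^{-\alpha}$ and the weights are normalised. This form is assumed here rather than taken from Equation (3), and the function name, coordinates and precipitation values are hypothetical.

```python
import math


def idw(points, values, target, alpha=2.0):
    """Inverse distance weighted estimate at `target`.

    points : list of (x, y) coordinates with known values
    values : known values z(x_i), in the same order as `points`
    alpha  : distance weighting coefficient
    """
    weights = []
    for (x, y), z in zip(points, values):
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return z  # target coincides with a known point
        weights.append(d ** -alpha)
    total = sum(weights)
    return sum(w * z for w, z in zip(weights, values)) / total


# Hypothetical grid-cell centres (km) and monthly precipitation (mm).
cells = [(0.0, 0.0), (50.0, 0.0), (0.0, 50.0), (50.0, 50.0)]
precip = [10.0, 20.0, 30.0, 40.0]
station = (10.0, 10.0)

print(idw(cells, precip, station, alpha=0.0))  # 25.0, the arithmetic mean
print(idw(cells, precip, station, alpha=2.0))  # pulled towards the nearest cell
```

With alpha set to 0 every weight equals one, so the estimate is the plain average; larger values concentrate the weight on the nearest points.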

Methods for verification of estimated values
To evaluate the accuracy of the OK, UK, IDW and NN methods used in this paper, error analysis was performed on the difference between actual and simulated values. In this way, the error of the values generated by spatial interpolation can be measured.
The verification methods used are mean absolute error (MAE); mean squared error (MSE), which evaluates the accuracy of the predicted value; percentage of bias (PBIAS), which evaluates the bias of the estimation result; and the g-index, which evaluates the prediction effectiveness. The g-index is derived from the Nash-Sutcliffe coefficient (Bucchignani et al., 2014).
The methods described above are applied as in Equations (4-7). The comparison between the actual and interpolated values can be evaluated more accurately through regression analysis.
where $\hat{z}(x_i)$ is the estimated value at point $i$ and $\bar{z}$ is the mean of all data values used. For MAE and MSE, the smaller the calculated value, the more accurate the estimate; the closer PBIAS is to 0, the less biased the estimation result. A g value of 100 is a perfect estimate. If the g value is negative, the estimate is less reliable than using the average of the data values as a predictor (Kim et al., 2010; Korea Meteorological Administration, 2017).
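The four measures can be sketched as follows. The formulas are the commonly used definitions and are an assumption here, since they are not copied from Equations (4-7); in particular, the PBIAS sign convention (positive for overestimation) and the scaling of the g-index to 100 are chosen to match the interpretation given in the text. The sample values are hypothetical.

```python
def mae(observed, estimated):
    """Mean absolute error."""
    return sum(abs(e - o) for o, e in zip(observed, estimated)) / len(observed)


def mse(observed, estimated):
    """Mean squared error."""
    return sum((e - o) ** 2 for o, e in zip(observed, estimated)) / len(observed)


def pbias(observed, estimated):
    """Percent bias; positive means overestimation (assumed convention)."""
    return 100.0 * sum(e - o for o, e in zip(observed, estimated)) / sum(observed)


def g_index(observed, estimated):
    """Nash-Sutcliffe-type effectiveness scaled so that 100 is a perfect fit."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 100.0 * (1.0 - sse / sst)


# Hypothetical monthly precipitation (mm) at one station.
observed = [20.0, 35.0, 60.0, 110.0]
estimated = [25.0, 30.0, 70.0, 90.0]

print(mae(observed, estimated))      # 10.0
print(mse(observed, estimated))      # 137.5
print(pbias(observed, estimated))    # negative: underestimation overall
print(g_index(observed, estimated))  # below 100; 0 would match the mean predictor
```

Under these definitions, an estimate equal to the observed mean at every point gives a g value of 0, which is why negative values indicate a result worse than using the mean.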

Provision of RCM for generating precipitation data
The RCP CO2 scenarios in the IPCC's Fifth Assessment Report are used as inputs to climate models and are provided through the IPCC's data distribution centre.
The development of scenarios of global climate change (CMIP5) produces global data based on RCP scenarios. Based on the global data, climate change scenarios generated by RCMs for each region are available in the local climate specification experiment (the Coordinated Regional Climate Downscaling Experiment or CORDEX). CORDEX calculates regional scenarios of climate change by dividing the world into 14 regions, including North America, Europe, Africa and East Asia, at a resolution of 0.44° (about 50 km). With regard to Korea, climate change forecasts by RCMs are available in CORDEX-EA for the East Asia region from HadGEM3-RA (Hadley Centre Global Environment Model version 3 - Regional Atmospheric), RegCM4 (Regional Climate Model version 4), SNU-MM5 (Seoul National University - Meso-scale Model version 5), SNU-WRF (Seoul National University - Weather Research and Forecasting) and YSU-RSM (Yonsei University Regional Spectral Model) (McCuen et al., 2006). In addition, the KMA's Climate Information Portal provides global data with a resolution of about 135 km from HadGEM2-AO, 12.5 km regional climate data from HadGEM3-RA, and 1 km high-resolution data downscaled by statistical analysis. The websites providing climate change scenarios differ in the models, data periods, variables and resolutions they offer. For this paper we collected data from the CORDEX-EA and Korea Meteorological Administration Global Atmosphere Watch (KMAGAW) (2016) websites, which provide the results of RCMs among several variables in the RCP climate change scenario.
The World Climate Research Programme (WCRP) initiated the CORDEX framework to produce an improved generation of regional climate change projections worldwide, and to provide a framework for better coordination of regional climate downscaling (Giorgi et al., 2009). The CORDEX experiments provide an opportunity to evaluate the relative and absolute performances of various RCMs over predefined regions. As one of the 14 branch domains within the framework of CORDEX, CORDEX-EA covers a large area of East Asia, with a horizontal resolution of approximately 50 km (Zhou et al., 2016). Table 1 summarises the status of the provision of precipitation data from the KMA and the CORDEX website (Bucchignani et al., 2014; McCuen et al., 2006). Five regional climate models provide monthly, six-hourly and three-hourly data from 2006 to 2050, at a spatial resolution of 50 km for the RCP 8.5 scenario. The KMA's HadGEM3-RA provides data from 2006 to 2100 at a resolution of 12.5 km (KMAGAW, 2016). CORDEX-EA provides RCP 8.5 RCM data generated by Gongju University, Seoul National University and Yonsei University. In CORDEX-EA, the time steps common to the RCM results are monthly and daily. For the SNU-WRF model, monthly data cover only 2006 to 2010, but the model can be used up to 2049 by converting daily data from 2011 onwards (Park, 2017). At the KGAWC (Korea Meteorological Administration, 2016), 1 km high-resolution data is available using the predicted precipitation data of five RCMs, but the data period is 2021 to 2100.

Collection of precipitation data from RCM
To determine the accuracy of climate forecasting by comparing RCM results with actual data, data was selected based on the availability of the climate change scenarios. The climate change scenarios collected are shown in Table 2. Data for the same period is required to compare climate change scenarios and observations. Thus, 50 km resolution precipitation data for 2006~2015, which overlaps with the observation period, was collected from the prediction data of HadGEM3-RA, RegCM4, SNU-WRF, SNU-MM5 and YSU-RSM in CORDEX-EA. In the case of SNU-WRF, monthly data was available only from 2006 to 2010, so daily data was collected and converted into monthly data for 2011 to 2015. Figure 1 shows the January 2006 precipitation forecast data for each CORDEX-EA RCM, displayed using the Panoply software (NASA, 2022). The five RCMs are targeted at East Asia. However, the HadGEM3-RA of the KMA has a different spatial range from the rest of the models. To check whether the inter-model grids coincide, Korea and its neighbouring regions were examined using ArcGIS software, as shown in Figure 2.
Table 1. RCP-regional climate model provided by CORDEX-EA & KMA (Park, 2017).
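The conversion of daily data into monthly data described above for SNU-WRF can be sketched as a simple sum per calendar month. This is a minimal illustration that assumes the daily values are already precipitation totals in mm/day; if the model output is stored as a rate, it would first need to be converted to a daily total. The function name and the sample values are hypothetical.

```python
from collections import defaultdict
from datetime import date


def daily_to_monthly(daily):
    """Sum daily precipitation totals (mm/day) into monthly totals (mm).

    daily : dict mapping datetime.date to the daily total for one grid cell
    """
    monthly = defaultdict(float)
    for day, amount in daily.items():
        monthly[(day.year, day.month)] += amount
    return dict(sorted(monthly.items()))


# Hypothetical daily values for one grid cell.
daily = {
    date(2011, 1, 30): 2.0,
    date(2011, 1, 31): 0.5,
    date(2011, 2, 1): 4.0,
}
print(daily_to_monthly(daily))  # {(2011, 1): 2.5, (2011, 2): 4.0}
```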

Collection of observation data for comparison with RCM
Precipitation data is available through the KMA website and KMAGAW (2016; Park, 2017). Ground observation data comes from the automated synoptic observation system (ASOS), automatic weather system (AWS) and automated agricultural observing system (AAOS). Data is provided in CSV and PDF formats.
ASOS comprises ground observations performed at the same time at all observatories in order to determine the atmospheric conditions at any given time. Air pressure, temperature, wind direction, wind speed, relative humidity, precipitation, solar radiation, daylight time, ground temperature and vertical temperature are all automatically observed. The observation spacing is about 36 km, and measurements are made in units of 1 minute. ASOS observations are currently made at more than 80 sites (Korea Meteorological Administration, 2016).
Observed precipitation data for 2006~2015, consistent with the RCM data period, was collected. Observation points were selected from the synoptic meteorological network, where no data was missing. No movement of precipitation observation equipment was seen over the measurement period. Location information on 66 selected precipitation data observation points was collected through the KGAWC and is shown in Figure 3.

Application of the spatial interpolation method by the RCMs
Comparisons were made with observational data to determine the accuracy of RCMs. As the grid locations of the five RCMs provided by CORDEX-EA differ, RCM precipitation data generated according to the spatial interpolation method is needed to compare with the same points of observations (Park, 2017).
Spatial interpolation methods were thus applied to 10 years (2006~2015) of monthly RCM precipitation data with a resolution of 50 km. In this study, precipitation data for each climate change scenario at the national synoptic observatory sites were generated by spatial interpolation methods.
To interpolate the precipitation data, ArcGIS 10.3 software was used (http://www.esri.com/news/arcnews/spring12articles/introducing-arcgis-101.html). The spatial interpolation methods applied in this study are IDW, NN interpolation, OK and UK. These four interpolation methods were used to generate the precipitation data of the missing area of Korea (Park, 2017). Input data for the climate change scenarios of HadGEM3-RA, RegCM4, SNU-WRF, SNU-MM5 and YSU-RSM for 2006~2015 were generated in ArcGIS format using IDW, NN, OK and UK interpolation.
In all four interpolation methods, the grid size was 0.009094°, or about 1 km. The search radius was set so that 12 sample points were selected in the same way for the IDW, OK and UK methods. In the NN method, spatial interpolation was performed without the grid size selection option. The distance weighting coefficient of IDW was 2, a general value. The semi-variogram model of kriging was spherical in the case of the OK method and linear with linear drift in the UK method. Figure 4 is a flowchart outlining the generation of precipitation data using the spatial interpolation method. RCM and observation precipitation data were collected from CORDEX and the KMA for 10 years (2006~2015), and data pre-processing, spatial interpolation and data extraction were performed using the ArcGIS model. Precipitation data at the actual observation points was extracted using the spatial interpolation method. The five RCMs and four methods of spatial interpolation were evaluated to assess which is most accurate. These processes are also considered references for spatial data analysis (Cressie, 1993; Price et al., 2000).
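The final evaluation step of this workflow, comparing every combination of RCM and interpolation method against observations, can be sketched as below. This is an illustrative outline only, not the ArcGIS procedure used in the study: the data layout, the function names and the choice of MAE averaged over stations as the ranking score are assumptions, and the numbers are made up.

```python
from itertools import product

RCMS = ["HadGEM3-RA", "RegCM4", "SNU-WRF", "SNU-MM5", "YSU-RSM"]
METHODS = ["IDW", "NN", "OK", "UK"]


def mean_absolute_error(observed, estimated):
    return sum(abs(e - o) for o, e in zip(observed, estimated)) / len(observed)


def rank_combinations(observed, extracted):
    """Rank (RCM, method) pairs by station-averaged MAE, best first.

    observed  : {station_id: [monthly precipitation, ...]}
    extracted : {(rcm, method): {station_id: [monthly precipitation, ...]}}
    """
    scores = {}
    for rcm, method in product(RCMS, METHODS):
        series = extracted.get((rcm, method))
        if series is None:
            continue  # combination not supplied
        per_station = [
            mean_absolute_error(observed[s], series[s]) for s in observed
        ]
        scores[(rcm, method)] = sum(per_station) / len(per_station)
    return sorted(scores.items(), key=lambda item: item[1])


# Made-up values for two hypothetical stations and two combinations.
observed = {"A": [10.0, 20.0], "B": [30.0, 40.0]}
extracted = {
    ("RegCM4", "IDW"): {"A": [15.0, 25.0], "B": [35.0, 45.0]},
    ("SNU-WRF", "NN"): {"A": [11.0, 19.0], "B": [29.0, 41.0]},
}
for combination, score in rank_combinations(observed, extracted):
    print(combination, score)
```

In practice the same loop could be repeated for MSE, PBIAS and the g-index, since a single score rarely settles the choice on its own.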

Comparison of RCM simulated and observed data using the spatial interpolation method
Before comparing the climate change scenarios of the RCM models at 66 stations in Korea, the results of each spatial interpolation method were analysed. To examine national trends in forecasting climate change scenarios by interpolation method, Park (2017) analysed results from January 2007, when the variance of national average precipitation between the observed values and scenarios was smallest during the analysis period (2006~2015), at 8.3 mm. Figure 5 shows the results of the spatial interpolation methods for HadGEM3-RA. As shown in the figure, IDW gives a larger weight to the values at nearby known points, so local differences within small regions are more apparent than with the other interpolation methods. The NN and OK interpolations showed similar patterns. In the UK method, the predicted values were analysed most densely among the four methods, and overall precipitation was estimated to be larger than with the other methods. Figure 6 is a graph of annual precipitation for the five RCM models. Figure 6(a) compares the actual precipitation with the model-simulated precipitation of the five RCMs, using the mean interpolated value of the 66 selected observation points. Figure 6(b-f) shows the results generated by each RCM through the four interpolation methods.
As shown in Figure 6(a), the total precipitation value of RegCM4 had a similar tendency to observation, but HadGEM3-RA and SNU-MM5 were underestimated. SNU-WRF and YSU-RSM showed a tendency to overestimate when compared to the measured precipitation. As shown in all the graphs, the difference in the precipitation values under different interpolation methods was not large, but that between the RCMs was significant. HadGEM3-RA and SNU-MM5 followed the time-series tendency of precipitation more closely than the other models. Yet in the case of SNU-MM5, annual precipitation was about 1000 mm higher than the actual measurement.
To evaluate the accuracy of the RCMs' monthly precipitation forecasts, four evaluation methods were applied (using Equations (4-7) in Section 2.2). The evaluation results are presented in Figure 7, where the results of the 66 stations in Korea are shown as an averaged value. In the accuracy assessment, higher MAE and MSE values indicate low-accuracy estimates. By these measures, the HadGEM3-RA model under the UK interpolation method was the optimal combination. PBIAS indicates the highest accuracy at a value of zero, overestimation when it is positive and underestimation when it is negative. PBIAS showed that SNU-WRF under the UK method had the least bias. RegCM4, HadGEM3-RA and SNU-MM5 were overestimated and YSU-RSM was underestimated.
A g value close to 100 means perfect estimation. The UK method of HadGEM3-RA had the highest accuracy among the RCMs. The other RCMs have a negative g-index and their accuracy was evaluated as low, meaning applying these RCMs in precipitation estimation was found to be ineffective.
In summary, the accuracy assessment found HadGEM3-RA to be the best among the RCM models. The difference between interpolation methods was not significant, but UK was found to be the most accurate among them. The results in Figure 7 are averages over all regions in Korea, so regional differences also need to be considered.
For the HadGEM3-RA model, which is considered the most accurate among the RCMs, the Pearson correlation coefficient (R) was used to analyse the correlation between the measured and predicted values. The result is shown in Figure 8. The Pearson correlation coefficient is mostly in the range of 0.4~0.6, indicating a positive correlation with the measured values, but certain points show a coefficient of less than 0.2. The points with low accuracy according to the Pearson coefficient were the islands of Ulleung (No. 9), Heuksan (No. 29) and Jeju (No. 32). These observatories are located along Korea's coasts and/or on islands. Figure 8 also shows that observatories located in the nation's southern region have lower accuracy than those in its northern parts.
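For a single station, the Pearson correlation coefficient between measured and predicted monthly precipitation can be computed as in the following minimal sketch, which uses the standard definition of R. The function name and the sample series are hypothetical and are not values from this study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical monthly precipitation (mm) at one station.
measured = [22.0, 35.0, 48.0, 90.0, 260.0, 310.0]
predicted = [30.0, 28.0, 60.0, 70.0, 180.0, 150.0]
print(round(pearson_r(measured, predicted), 2))
```

Because R measures only linear association, a station can show a high R while its summer peaks are still underestimated, which is why it is used here alongside the error measures.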
Monthly time-series graphs were prepared for the points with the highest and lowest accuracy in Figure 8; the islands of Jeju (No. 32) and Heuksan (No. 29) were selected as having the lowest accuracy.
Based on Figure 8, the two highest and two lowest accuracy points were selected as shown in Figure 9. Monthly precipitation results from observation and UK were also compared. The precipitation patterns of climate change scenarios are similar to those observed in the northern part of Korea, such as Cheolwon (No. 2) and Ganghwa (No. 35). They did not, however, reach the maximum precipitation of summer and instead showed a low level. Jeju (No. 32) and Cheolwon (No. 2) were not predicted to see extreme precipitation (1000 mm/month). Prediction estimation is more difficult for island areas because of the complexity of their seasonal and monthly patterns of precipitation.
A previous study that quantitatively compared the results of GCM and RCM found that regional climate simulations can generally be improved by using RCMs superimposed within coarser-resolution GCMs (Guo & Wang, 2016). Therefore, it makes sense to compare the simulation results obtained from very fine-resolution GCM and RCM dynamic downscaling. If the methodology proposed in this study is supplemented, it is considered a strategic method in climate prediction along with existing studies. The results of this study are also expected to be compared with a GCM in the future so that more accurate analysis of the techniques can be performed.

Conclusions
Predictability analysis of precipitation forecasting by climate change scenario was conducted with the goal to promote optimal water management. For this, precipitation results from five RCMs based on the RCP 8.5 scenario were compared and analysed using spatial interpolation methods. This study's conclusions are as follows.
The monthly precipitation forecast by five RCMs for 2006~2015 was compared with the observed precipitation data for Korea. The results show that the HadGEM3-RA model gave the most accurate prediction of actual precipitation. Except for coastal regions and islands, this model showed statistically significantly better results. However, since this result is based on only 10 years of observation data, it can be improved through analysis of observational data collected continuously in the future. Four deterministic methods of spatial interpolation were used to estimate precipitation from the RCMs. Although the difference between the interpolation methods was not great, the UK method was found to most accurately predict precipitation in Korea. Interpolation using the UK method is recommended when applying precipitation data generated from RCMs in Korea. In this study, precipitation data of RCMs with 50 km resolution was interpolated to the major observation stations of Korea. It is expected that future studies using high-resolution RCM data will allow a more suitable spatial interpolation method to be selected.
The accuracy of precipitation prediction in Korea via RCMs is considered low due to complicated rainfall patterns on the coasts because of the nation's peninsular topography. The southern area located at the tip of the peninsula, and islands far from the mainland such as Jeju and Ulleung, saw low accuracy. Accuracy improved on the mainland and inland areas located in the northern part of Korea. Based on this study, the accuracy of RCM precipitation prediction in coastal and southern areas is low. To solve this problem, it will be necessary to study the prediction of precipitation around the coastline of Korea.
The results of this study are based on the monthly precipitation data of RCMs. It is also necessary to analyse the daily precipitation forecasts of the HadGEM3-RA model, which was the optimal model for monthly precipitation prediction among the RCMs in Korea.
To effectively utilise water resources and design water facilities, calculation of probability in the area of precipitation requires analysis. Infrastructure planning based on the results of precipitation forecasting per expected climate change or for improving the accuracy of RCM is necessary for future water management.

Disclosure statement
No potential conflict of interest was reported by the authors.