GIS-based regional assessment of seismic site effects considering the spatial uncertainty of site-specific geotechnical characteristics in coastal and inland urban areas

ABSTRACT Earthquake-induced hazards are profoundly affected by site effects related to the amplification of ground motions, which are strongly influenced by site-specific geologic conditions such as soil thickness, bedrock depth and soil stiffness. Seismic disasters are often more severe in coastal or riverside locations than over stiff soils or rocks due to differences in local site effects. In this study, a recently developed geographic information system-based framework was applied in coastal and inland urban areas in Korea, and its applicability for regional assessments was evaluated using appropriate geostatistical zonation of site-specific seismic site effects. The proposed framework was composed of four functional components: multivariable statistical clustering, geostatistical optimization, geotechnical analysis, and local visualization. The framework was applied in the Seoul and Busan areas of Korea for consideration of site effects in inland and coastal urban areas. Such zones of thick soil, or with a deep depth to bedrock, are susceptible to ground motion amplification due to site effects during earthquakes. The earthquake losses associated with possible building damage can be estimated based on spatial zoning maps considering geological and topographical characteristics and by a comparison of the spatial correlations of seismic site classes between inland and coastal areas of Korea.


Introduction
The amplification of earthquake ground motions is affected by site-specific geotechnical characteristics. Current engineering seismic design code provisions (Elghazouli 2016; Wang et al. 2016) have incorporated amplification capabilities that depend on local geologic and soil conditions because of their importance in earthquake-induced hazard mitigation. Local site effects related to geologic conditions have been frequently observed in recent earthquake events, such as the 1985 Mexico City, 1989Loma Prieta, 1994Northridge, 1995Kobe, 1999Chi-Chi, 2005Kashimir, 2008WenChuan, 2010Haiti, 2011Tohoku, and 2016 Kumamoto earthquakes. In general, the term site amplification refers to the increase in the amplitude of seismic waves during their propagation through soft soil layers. Accounting for such effects is critically important in seismic regulations, land use planning, and the seismic design of critical facilities. Seismic zonation and the monitoring of local networks and alert systems may allow promotion of appropriate policies for the prevention and mitigation of earthquakes effects, particularly in regions with high seismic risk, such as Seoul and Busan in South Korea, where urban areas are characterized by high levels of seismic vulnerability (Castelli et al. 2017).
The Korean Peninsula is located within a region of moderate seismicity inside the Eurasian plate, in contrast to nearby regions (having high seismic vulnerability) that are located at the intersections of tectonic plates . Metropolitan areas in Korea have a low absolute seismic risk and have experienced few modern earthquake disasters. Nevertheless, the absolute earthquake risk potential is higher than in the country's mountainous areas because there are greater soft soil deposits in coastal and riverside locations in metropolitan areas (Luzi et al. 2005). Moreover, most urban areas are densely situated on plains surrounded by mountains; regional geology plays a role in site effects and earthquake-induced hazards in cities. Observations of recent destructive earthquakes have demonstrated that the extent of an earthquake disaster differs depending on site-specific effects, and can even vary within the same site. Thus, it is necessary to understand the local characteristics of site effects in respect to the geological or geotechnical conditions in Korea. The corresponding site-specific variation between coastal and inland urban areas should be considered prior to developing any seismic mitigation plan.
Seismic zonation can be generally developed using geotechnical investigation data. For spatial prediction of subsurface geotechnical conditions across an area of interest, such as a large metropolitan area, existing borehole drilling data in and near the area can be efficiently used as a fundamental resource, and geotechnical expert knowledge can also be used to enhance prediction reliability. Geotechnical datasets consist largely of stratigraphic profiles obtained by boring and characterized by variable degrees of accuracy; some are accompanied by in situ and/or laboratory tests. The database assembled for a project includes boring data with GIS locations, and the density distributions of these locations vary widely among sites (Castelli et al. 2016). In many cases, the borehole data are considered to represent the true values of geomaterials. However, the borehole data have uncertainties caused by the inherent soil variability, the measurement uncertainty, and the transformation uncertainty (Kulhawy et al. 1992). Thus, it is essential to reduce these uncertainties by confining the appropriate geostatistical models for each specific area of similar characteristics, with site-effect parameters derived from geotechnical datasets.
Geostatistics is a mathematical method used to develop efficient spatial networks based on discrete data transformed from continuous values, binary values, or categorical data (Howarth 1978;Isaaks and Srivastava 1989). It can be regarded as a collection of numerical techniques that deals with the characterization of spatial relationships, employing primarily random models (Borruso and Schoier 2004). Some previous studies have attempted to obtain a multi-dimensional distribution of soil properties by analysing a variety of one-dimensional borehole data with geostatistical techniques ( € Ozt€ urk and Nasuf 2002). Density estimation is a particularly useful method because it helps to identify precisely the location, spatial extent, and intensity of site-specific geotechnical or geological clustering zones (Borruso 2005;2008). It is also visually attractive, helping to invoke further enquiry and to explain the reason why the spatial distribution of seismic site-effect parameters is concentrated. The density surface can also reflect the distribution of seismic amplification characteristics against the natural geography and geotechnical properties of the area of interest, including representing natural boundaries, such as reservoirs and lakes, or an alignment that follows a particular street in which there is a high concentration of seismic hazard potential.
The authors previously established a geostatistical assessment method for the regional zonation of seismic site effects on the basis of an advanced geotechnical information system (GTIS) framework (Sun 2004;Sun et al. 2005;2008;2014;Sun and Kim 2016). The systemized GTIS comprises four functional components: a database, geostatistical analysis, geotechnical analysis, and visualization. However, the more reliable zonation method should be systemized as a clustering-based geostatistical analysis function (Hashemi and Alesheikh 2011), considering site-specific geotechnical characteristics in representative coastal and inland urban areas of Korea. Therefore, in this study, a geostatistical optimization of the interpolation method and clustering estimation was developed as a computational framework to construct the appropriate seismic zonation using a geo-spatial database composed of multi-variable geo-data. A multivariate zonation method based on inverse distance weighting using interpolated datasets was proposed for assigning values to unknown zonation datasets located at the centre point of a unit grid. Finally, a comparison of local site-effect parameters for Seoul and Busan in Korea was performed based on the proposed framework.

Geographic information system (GIS)-based frameworks for the geostatistical zonation of site-specific seismic site effects
In this study, a GIS-based framework was established to develop the geostatistical zonation of sitespecific seismic site effects, prior to considering the local characteristics of site effects depending on topographical or geological conditions in Korea. The proposed framework was composed of four functional techniques: multivariable statistical clustering, geostatistical optimization, geotechnical analysis, and local visualization ( Figure 1). First, the local geotechnical datasets were classified using kernel density and hot spot clustering analysis to ensure geostatistical clusters with a similar spatial correlation of geo-layer characteristics. Second, to optimize the conditions of random-field assumptions in kriging methods and incorporate appropriate interpolation and zonation, a possible geostatistical methodology was validated and determined using the cross-validation-based verification test (Rue and Follestad 2003). Third, the representative geotechnical characteristic parameters correlated with site coefficients were estimated as spatial grid information from optimized zonation. Finally, the multi-scale zonation of local seismic site effects was combined with a topographical map and geo-layers were visualized. According to the proposed framework (Figure 1), the previously constructed geo-spatial database was used as the backbone dataset for stage-by-stage procedures using GIS-based multi-layer information. To build the database, borehole datasets and geo-knowledge information, which provide data spanning the fields of geotechnical engineering, geology, and geomorphology, were collected and standardized. For more reliable prediction of geotechnical information in the area of interest, the authors acquired topological surface information from topographic maps, satellite images, surface geologies, and a digital elevation model (DEM).

Multivariable statistical clustering
Generally, there are natural variations of spatial density depending on the local status of geotechnical datasets. In the first stage, to identify the spatial pattern and correlations of the geo-spatial database in the target area, a grouping within similar geotechnical datasets is conducted based on multivariable statistical clustering. The geotechnical investigation datasets are spatially distributed and form linear or circular clusters focused on urban facility sites (roads, railways, buildings, pipelines, etc.) for engineering projects. Accordingly, there are some deviations of spatial interpolation depending on the density of the specific cluster in the target area. Thus, geotechnical datasets with similar spatial correlations are grouped together as spatial fields and used as sequential analysis categories to determine the appropriate zonation method considering the spatial correlations of unit clusters, using conventional geostatistical methods.
Kernel density calculates a magnitude per unit area from point or polyline features using a kernel function to fit a smoothly tapered surface to each point (Borruso and Schoier 2004;Borruso 2005;2008). A well-established method used to identify spatial patterns is kernel density, which calculates the density of events around each point, scaled by the distance from the point to each event. Kernel density describes a smooth and continuous surface map of risk targets because a discrete density surface is made continuously by interpolation (Flahaut 2003). Therefore, this method can compensate for a paucity of data. A general density estimation function is shown by (1) where x i is the value of the variable x at location i, n signifies the total number of locations, h denotes the bandwidth or smoothing parameter and K represents the kernel function, as presented in an earlier report. Gaussian kernels 1 ffiffiffiffi 2p p exp À 1 2 u 2 À Á are typically used to create a smooth, unimodal function with a peak at 0. A common method to choose the optimal h is to use the h value that minimizes the asymptotic mean integrated squared error (AMISE). According to Yu et al., previous studies have indicated that the kernel density function selection did not significantly affect the results. However, bandwidth h significantly affects the results. No perfect measure exists for determining the bandwidth (Silverman 1986).
Using the kernel function, the optimized hot spot analysis calculates the Getis-Ord statistic for each feature in a dataset to identify the location of local clusters for site-effect parameters (G€ okkaya 2016). The Getis-Ord Gi Ã of the target feature is included in the analysis, and shows where hot spots (clusters of high values) or cold spots (clusters of low values) exist in the area (Getis and Ord 1996;Prasannakumar et al. 2011). This method works by examining each feature within the context of neighbouring features. To be statistically significant, the hot or cold spot should have a high or low value and be surrounded by other features with high or low values. The local sum for a feature and its neighbours is compared proportionally to the sum of all features. A statistically significant Getis-Ord Gi Ã score results when the local sum is very different from the expected local sum, and the difference is too large to be the result of random chance. These statistics have a number of attributes that make them attractive for measuring the association in a spatially distributed variable. After the clustering of geotechnical datasets in accordance with the Getis-Ord Gi Ã score, the site-specific grouped datasets are separately interpolated based on each variogram model, and merged as spatial grid information.

Geostatistical optimization of the spatial interpolation method
Geostatistical interpolation can provide a reliable zonation of seismic response properties (Howarth 1978). However, its effectiveness relies on the accuracy of the interpolation method used to define the spatial variability of soil properties (Goovaerts 1998;1999;2001). The variogram is a mathematical description of the relationship (or structure) between the variance of pairs of observations (or data points) and the distance separating these observations (h) (Olea 1991). The fitted curve minimizes the variance of the errors. The variogram model is used to define the weights of the kriging function (Webster and Oliver 2001;Sun and Kim 2016) and the semivariance is an autocorrelation statistic defined as where g(h) is the semivariance for interval distance class or lag interval h, N(h) is the total number of sample couples or pairs of observations separated by a distance h, Z(x i ) is the measured sample value at point i and Z(x i + h) is the measured sample value at point i + h (Isaaks and Srivastava 1989;Azpurua and Dos-Ramos 2010). In this study, to consider the correlated distance within clusters and the corresponding weights of the kriging function, the individual experimental variogram was modelled for every clustered geotechnical dataset.
If there is a lack of data, the large error in the variogram will swell the prediction errors, without it being apparent in the calculated values. Therefore, the authors validated the results of the proposed step-by-step techniques with separate independent data. To validate the accuracy of interpolation methods, the existing datasets were cross-validated to evaluate the susceptibility of kriging or zonation models and to reduce the statistical uncertainty of the boring data (Guarascio et al. 1976;David 1976; Knudsen and Kim 1978;Isaaks and Srivastava 1989). The local reliability for each observation was evaluated based on the difference between measured and estimated values using the following procedure. To evaluate cross-validated residuals, an experimental variogram from the entire sample dataset was computed and a plausible model was fitted. After excluding the measured target value at that point, the sequential value at each sampling point was estimated using kriging. Then, the difference between the estimated and measured values at each sampling point was calculated. For comparison, the root mean square error (RMSE) from the cross-validation result was the square root of the average squared distance of a data point from the fitted line, calculated by the following equation: where y i andŷ i are the measured and estimated values, respectively, of the ith data point and n is the total number of data points. As the RMSE approaches zero, the estimate is more accurate. The coefficient of variation (CV) is the ratio of the RMSE to the mean of the dependent variable ( € Ozt€ urk and Nasuf 2002).
To determine reliable site-specific criteria for geotechnical layer classification considering local site effects, an optimized geostatistical method was developed and standardized based on cross-validation-based verification tests of geostatistical estimations. Sun et al. (2016) proposed a geostatistical analysis component for the optimum geostatistical estimation of soil conditions using conventional kriging methods. This computer-based framework comprises step-by-step and adaptive optimization techniques, with a separate independent geostatistical method (Deutsch and Journel 1992). First, to determine the optimum interpolation method, four representative interpolation methods [inverse distance method (IDW), simple kriging (SK), ordinary kriging (OK), and empirical Bayesian kriging (EBK)] are used in a cross-validation-based verification test. Second, to determine the accuracy of raster-based zonation, three decision-making tools are used to obtain the representative values (minimum, average, and maximum) among the datasets distributed at coincident locations (unit square grid-cells). In this grid-based zonation method, the multivariate zonation is applied, with a known scattered set of points at a specific unit grid. The values assigned to unknown zonation datasets located at the centre point of each unit grid are calculated with a weighted average of the values available at the known interpolated datasets, which are simultaneously distributed within a unit grid and circular influence radius ( Figure 2). Thus, the representative values can be considered to be inverse-distance-weighted values at the coincident locations (unit square grid-cells), estimated by the following equation: where d i is the distance between x i (interpolated datasets) and z i (zonation datasets). The authors tested the grid-size effect related to interpolation accuracy based on an unconditional multi-grid simulation, consisting of several stages. In the first stage, an unconditional simulation was generated on a coarse grid. In each of the subsequent stages, simulation was performed on a finer grid, covering the same area (or volume), and was conditioned to values simulated at the previous stage. This process was repeated until the final grid was completed. The optimum grid size (having the lowest RMSE versus grid size) was determined for the comparison of cross-validationbased RMSEs in each stage of the multi-grid simulation. Consequently, to determine the accuracy of the zonation type, the authors conducted a validation test by comparing the cross-validation-based RMSEs of grid-based zonation and borehole-based zonation methods. According to the proposed framework for optimum geostatistical estimation, grid-based geotechnical values (thickness of geotechnical layers) were calculated for the geotechnical analysis to calculate site parameters.

Geotechnical analysis
Site effects inducing an amplification of ground motion are directly related to geological site conditions and are associated with the passage of seismic waves through soil layers . The behaviour of site-specific seismic responses are predominantly controlled by differences in the shear wave velocity (V S ) between the soil layers and the underlying bedrock, which represent an impedance contrast, and by the thickness of the soil layers or the depth to bedrock (H). Site response analysis techniques have incorporated these concepts, particularly the phenomenon by which the largest amplification of earthquake ground motion at a nearly level site occurs at approximately the lowest natural frequency (Rodriguez-Marek et al. 2000). The period of vibration corresponding to the fundamental frequency is called the characteristic site period (T G ), and for the one-dimensional model of multi-layered soil is calculated as follows: where D i is the thickness of each soil layer above the bedrock (H = SD i ), V Si is the V S of each soil layer and n is the number of soil layers. The site period is a useful indicator of the period of vibration, during which the most significant amplification is expected. In addition, the depth to bedrock geometrically indicates the local seismic response patterns by assuming a similar stiffness in soil layers over the bedrock. If the spatial variations in the thickness and V S values of soil layers are known for an entire study area, the spatial variation of T G can be readily established and used for regional earthquake hazard estimations. For seismic design of structures in accordance with site conditions, correlations have been established between the mean V S of the upper 30 m (V S30 ) and site coefficients (or amplification factors) based on empirical and numerical studies of specific earthquakes, including the 1989 Loma Prieta earthquake (Borcherdt 1994;Borcherdt 2002). Accordingly, current seismic codes apply a site characterization for a site class based only on the top 30 m of the ground (Rodriguez-Marek et al. 2001;Sun et al. 2005). The site class is determined solely and unambiguously by one parameter: V S30 . For a profile consisting of n soil and/or rock layers, V S30 (in units of m/s) is given by where d i is the thickness of each soil and/or rock layer to a depth of 30 m (30 m = Sd i ).
To quantify site effects for use in structural design, correlations between site coefficients and several geotechnical characteristic parameters have been established based on empirical and numerical studies conducted in many countries (Borcherdt 1994;Dobry et al. 1999;Rodriguez-Marek et al. 2000;Sun et al. 2005). Geotechnical parameters have been used as criteria for categorizing site conditions according to the extent of ground motion amplification quantified by site coefficients. Representative parameters include the V S30 and T G . In most current seismic design codes, site conditions are classified into five categories (denoted by A to E) according to the V S30 values and an exceptional special category (denoted by F) (Dobry et al. 1999;Sun et al. 2005). The site coefficients are used to estimate the design response spectra that are dependent on both the site classes and the intensity of rock motions. F a and F v are the same (site class B) and increase as the soil becomes softer with decreasing V S30 or as the site class evolves through C, D and E. In addition, the site coefficients are generally higher for small rock outcropping motions than for large rock motions because of geomaterial nonlinearity (Dobry et al. 1999;Sun 2004).
The spatial grid information of T G values was computed using both the thickness and the V S of geotechnical layers (including weathered rock) over the bedrock. The thickness of soil layers had already been estimated across the study area within the geotechnical database. However, V S had not been determined for each testbed due to insufficient seismic testing. Thus, representative V S values of geotechnical layers in target areas were determined by compiling the results of previous in situ seismic tests that had obtained V S profiles at a number of sites in South Korea (Sun 2004;Sun et al. 2005). On the basis of these previous seismic testing results, the representative V S values were determined to be 190 m/s for fill, 280 m/s for alluvial soil, 350 m/s for weathered residual soil, 650 m/s for weathered rock, and 1,300 m/s for bedrock (Sun et al. 2014). In this study, the site classification scheme for Korea (Table 1) was adopted to demonstrate site classification based on the depth to bedrock, V S30 , and a T G zoning map. For further validation, the site-specific correlations between seismic site parameters and geotechnical testing results can be developed and redefined as site classification criteria for regions where adequate in-situ site investigations, such as downhole seismic tests, multichannel surface wave analysis, small scale microtremor measurement, and standard penetration tests, were conducted (Rahman et al. 2015).

Local visualization
In this study, the vertical scales in the three-dimensional figures were exaggerated five-fold, and surface cover data, such as administrative borders, rivers, and buildings, were overlaid on the ground surface for better visualization of surface and subsurface features. A DEM of sea level was also included under the surface cover to consider the geomorphological characteristics of coastal areas. The spatial information was built using units of meters on the Transverse Mercator (TM) coordinate system, on which X and Y represented the directions from west to east and south to north, respectively, and Z was the elevation (Sun et al. 2014). The thickness of the geotechnical layers and the depth to bedrock were expressed as zoning contour maps and were overlain with the spatial topographic surfaces of the study area to reflect reality better (Sun et al. 2016). Therefore, visualizations based on a GIS platform usually produce two-dimensional contour maps of the plane. The thickness of the geotechnical layers and the site-effect parameters are usually expressed as contours on corresponding contour maps, which can be overlain with three-dimensional topographic surfaces of the study area to reflect reality better.

Geotechnical database in coastal and inland urban areas of Korea
The authors constructed the geo-spatial database for representative inland and coastal urban areas of Korea and applied it to assess site-specific geotechnical values, specifically the thicknesses of the soil layers or the depth to bedrock. Testbeds were first separated into areas with a 100-m mesh, resulting in Seoul and Busan having 156,025 and 56,800 spatial grids, respectively. The component mesh-unit data were created for each spatial grid. The authors chose the target study area to be the entire territory surrounded by the administrative boundaries of the Seoul and Busan metropolitan areas as the administrative region. These target areas are the largest and second largest urban areas in Korea. The process involved gathering existing borehole data, with walk-over site visits conducted across the extended area to acquire surface geo-knowledge data. The subsurface soil layers identified from borehole data were classified into five categories: fill, alluvial soil, weathered soil, weathered rock, and bedrock. Kriging interpolation based on a geostatistical analysis component was expected to produce more reliable predictions of unknown geotechnical data from known geotechnical data than extrapolation in the spatial domain. The authors introduced the concept of an extended area encompassing the study area. Borehole datasets of the study area were insufficient because of the biased spatial distribution. Accordingly, site visits were conducted to acquire surface geo-knowledge data mainly in areas where borehole data were lacking. The surface geo-knowledge datasets (representatively, bedrock outcrop data) were determined by a geotechnical ground survey (using a simple cone test, GPS, etc.) with grid-type locations and by cross-checking with the geotechnical layers from neighbouring borehole data based on geotechnical engineering judgments. At the Seoul area, estimates of the spatial geotechnical layers across the extended area were collected from a total of about 22,300 existing borehole datasets and about 1,700 surface geo-knowledge datasets composed of the five categories of geotechnical layers. In contrast, the geotechnical database was constructed based on about 2,900 existing borehole datasets and about 200 surface geoknowledge datasets in the Busan area.
To estimate soil layers spatially for the testbed areas, the authors applied the proposed site-specific interpolation method to the extended Seoul area (39.0 km west to east, 34.0 km north to south) and the extended Busan area (57.0 km west to east, 48.0 km north to south). Figure 3 shows the geographic information for Seoul and Busan, and the corresponding selected areas (extended area and the study area of the territory of Busan city) with the DEM. In addition, Table 2 shows the typical ground stratification (additional compositional ordering and thickness of each layer) of Seoul and Busan with classifications of each soil stratum based on the borehole log.

Kernel density estimation
The authors used the kernel density estimation to visualize spatial trends for site-effect parameters. The authors found markedly different spatial distributions depending on the geotechnical datasets and geographic conditions, with the amount of geo-layer and hot spots varying. To identify the spatial pattern of geotechnical datasets in respect to the site-effect parameters, spatial density was evaluated based on the kernel density. Table 3 portrays the results after applying kernel density (bandwidth 250 m) to three site-effect parameters (depth to bedrock, V S30 , T G ) overlapped with the administrative boundary in Seoul and Busan. In the Seoul extended area, the central area of Seoul has a higher consistency of kernel density for site-effect characteristics (especially depth to bedrock) along the Han River. Therefore, the authors can see quite clearly that there are two strong concentrations of site effect induced by depth to bedrock, one spreading over a distance of several kilometres on the west central area and one on the east central side. On the other hand, the  kernel densities of site-effect parameters were estimated centrally along the coastline of Busan. Rather than choosing an arbitrary interval, it is useful to use the mean nearest neighbour distance for different orders of K, which can be calculated by ArcGIS toolsets as part of a nearest neighbour analysis. Thus, geostatistical estimation can be discriminately conducted for large-scale zones with similar spatial correlations according to the specific group (or cluster) and considering kernel density.

Optimized hot spot clustering with kernel density
The hot spot analysis method uses vector datasets to identify the locations of statistically significant hot spots and cold spots. To evaluate site-specific concentrations of dual variables (site-effect parameters), optimized hot spot clustering was conducted using the kernel function. In this study, the local point map of correlations between depth to bedrock and site period was created using a kernel density estimation map (Figure 4). The multivariable clusters were categorized as five groups: not significant, high-high cluster, high-low outlier, low-high outlier, and low-low cluster. To identify a strong positive relationship between depth to bedrock and site period, which would have the most significant amplification, the cluster with the highest site-effect characteristics and the highest kernel density was defined as the high-high cluster. The high-low outlier represented the spatial relationship of the highest depth to bedrock and the lowest site period. In contrast, the low-high outlier was defined as a point map with dual attributes consisting of the lowest depth to bedrock and the highest site period. These outlier clusters can be regarded as outlier datasets, which should be reduced or removed for reliable prediction of the seismic zonation representing local site effects. The low-low cluster was defined as the cluster with the lowest site-effect characteristics and the highest kernel density. The point map with a non-critical relationship among the site-effect parameters was determined to be the non-significant cluster.
In the Seoul area, there was a high potential for site effects determined in the western and eastern central urban areas. Clusters with less potential for site effects were evenly distributed across the entire area. In the Busan area, no hot spots with a high potential for site effects were identified. Nevertheless, the delta plain at the mouth of the Nakdong River and the eastern coastal area had a higher density of high-high clusters. Therefore, Seoul was classified as having six clusters: two highhigh clusters, one high-low cluster, one low-high cluster, and two low-low clusters. Busan was defined as having five groups: two high-high clusters, one high-low cluster, one low-high cluster, and one low-low cluster. Depending on the geotechnical datasets of each group, the construction of two-dimensional geo-layer information was performed separately according to the appropriate variogram method. Specifically, the variograms for each separate cluster were modelled for Seoul and Busan.

Spatial distribution of the thickness of the soil layer on the ground surface
Site amplification induced by strong ground motion plays an important role in site-specific seismic damage to structures. For convenience and effectiveness, empirical relationships or simple site classification schemes have been used to evaluate site-specific seismic responses at the regional scale. In this study, the authors used the depth to bedrock, V S30 , and T G to estimate the site effects for the entire administrative area of Seoul and Busan based on a geotechnical analysis. The site effects were presented on zoning maps that identified locations or zones of varying seismic hazard potential. To identify the relationship between the variance of observation pairs and the distance separating these observations for each clustered group in testbeds, the variogram was developed as shown in Figure 5. The variograms for the borehole datasets with exponential models were determined considering the spatial trend of a geo-layer. The corresponding correlation length (or range), which is the distance where the variogram reaches the sill, was determined for the thickness of the soil layer in each clustered group. Based on the variogram, phased verification tests were conducted to determine the optimal interpolations and zonation. Figure 6 presents a spatial zoning map showing the distribution of the thickness of the soil layer, with maximum values of about 46 and 69.8 m in Seoul and Busan, respectively. The alluvial sediment was generally composed of sandy and clayey soil, which are regarded as vulnerable layers in a seismic site response, and was deposited along several branches of the Han River in Seoul, and the mouth of the Nakdong River or coastal sites in Busan.

Determination of optimal interpolation method and unit grid zonation method
Using the database in the extended areas of Seoul and Busan, site-specific geotechnical spatial datasets were interpolated. Prior to that, geostatistical interpolation conditions were optimized and standardized stage by stage using cross-validation-based verification tests of the geostatistical estimation considering site conditions in the study area. The optimum interpolation method was determined by applying four representative interpolation methods: IDW, SK, OK, and EBK. At the same time, to verify the zonation methods at the coincident unit square grid-cells, the authors determined the minimum, average, and maximum values among the datasets distributed at coincident locations. Accordingly, cross-validation-based RMSEs were estimated for the depth to bedrock on the basis of a 100-m grid-cell size in target areas, as shown in Figure 7. Among the siteeffect parameters, OK had the lowest RMSE, which indicates that the technique was the most accurate geostatistical interpolation method for the two testbeds. The RMSE obtained when using the maximum value method was generally smaller than that obtained when using the minimum and average value methods in coincident unit square grid-cells, which suggests that OK using the maximum zonation at unit grid-cells was more appropriate in the extended areas of Seoul and Busan, as shown in Figure 7.
However, there was no significant variation of the correlations among the minimum, average, and maximum values at coincident unit square grid-cells and the corresponding RMSE. Furthermore, the proposed multivariate zonation method was then applied and verified to overcome the conventional zonation method without consideration of the spatial relationship in unit square gridcells. Thus, each cross-validation for specific geo-layers classified into five categories was estimated using OK (Figure 8). The cross-validation-based RMSE of the inverse-distance-weighted value using the multivariate zonation method was the lowest, even though there were greater or lesser differences among the geotechnical values. The multivariate zonation method for depth to bedrock, estimated by compiling the four geotechnical layers, had a much lower RMSE than cases using the minimum and average values, because of the accumulation of error variance in kriging.

Determination of optimal grid size
The appropriate subsequent stage (grid size) was determined from an unconditional multi-grid simulation. The unconditional multiple-grid simulation consisted of seven stages of grid-cell size: 5, 10, 20, 50, 100, 200, and 500 m. Figure 9 shows the cross-validation-based RMSE versus the seven stages of grid-cell size for the three site parameters (depth to bedrock, V S30 , and T G ). The optimum grid sizes for Seoul and Busan, with the lowest RMSE, were respectively determined as 20 and 50 m by the comparison of cross-validation-based RMSEs in each grid size shown in Figure 9. Therefore, the authors determined the optimum geostatistical interpolation method to be OK using the multivariate zonation method in unit grid-cells (20 £ 20 m for Seoul and 50 £ 50 m for Busan). Based on the proposed procedure, the site-specific site conditions and parameters were computed and presented on zoning maps by multiplying and averaging the grid-based geotechnical values into four size subunits. Table 4 shows the spatial distribution of the thickness of four geotechnical layers and three site parameters predicted by OK in accordance with the multivariate zonation method at unit gridcells with a grid size of 20 and 50 m for each testbed.

Determination of optimal zonation methods
To verify the reliability of the grid-based zonation method, the authors compared it with the conventional zonation method using point-based interpolation. This was performed using OK in unit grid-cells (20 £ 20 m for Seoul and 50 £ 50 m for Busan) after computing the three site parameters, as shown in Table 5, which described the relative difference depending on zonation methods for each grid-cell. In the Seoul area, the average differences for the three site parameters were 0.46 m (in the depth to bedrock case), 58.25 m/s (in the V S30 case), and 0.03 s (in the T G case). In the Busan area, the average differences for three site parameters were 0.51 m (in the depth to bedrock case), 112.05 m/s (in the V S30 case), and 0.05 s (in the T G case).
Accordingly, the RMSE of grid-based zonation was smaller than that of borehole-based zonation following cross-validation, which demonstrated that grid-based zonation was more appropriate for all indices of site effects in the extended Seoul and Busan areas (Figure 10). Although the difference in RMSE depending on zonation method was extremely small, there was a much larger spatial difference in V S30 , which is used for site classification in Korean seismic codes, with a focus on the mountain areas in the Busan area. In contrast, the thicker alluvial deposit layers are located at the riverside along the Han River, where there are more geotechnical datasets for geostatistical estimation than in the mountain areas around Seoul. Thus, in any case where the geotechnical datasets have multi-scale spatial correlations, the grid-based zonation method for producing a site-specific continuous zonation map is appropriate, considering seismic site characterization.  For efficient zonation based on T G over the study areas, the geotechnical thickness data interpolated by the geostatistical analysis component and the representative V S values were imported into the geotechnical analysis component. Then, T G and V S30 were calculated for the optimized grid at 20 and 50 m intervals based on Equations (5) and (6). The calculated depth to bedrock, V S30 , and T G values for the Seoul and Busan areas were spatially modelled, resulting in the seismic zoning maps presented in Figures 11 and 12. Three-dimensional spatial geotechnical datasets and their threedimensional visualizations provide appropriate site-specific seismic site effects, corresponding to the strata of geo-layers, based on a GIS platform.
In the river basin of the Seoul area, the alluvial soil is thicker (up to 70 m) and the depth to bedrock is deeper (up to about 85 m) than in the surrounding mountain areas (Figure 11(a)). Soil development in the river basin is mainly a result of fluvial landform processes (Figure 11(a)). Such zones of thick soil or the deep depth to bedrock are susceptible to ground motion amplification due to site effects during earthquakes. The V S30 ranged between about 240 and 320 m/s (Figure 11(b)) in part of the western river basin, which was deeper and smaller than for mountainous and hilly areas. For

Seoul
Busan T G (s) Table 5. Spatial difference between grid-based and borehole-based zonation methods for three site effect parameters.
Depth to bedrock (m) V S30 (m/s) T G (s) Seoul Busan efficient zonation based on T G values across the study area, the geotechnical thickness data interpolated in the geostatistical optimization component and the V S values were imported into the geotechnical analysis component. The representative T G values (Figure 11(c)) for the zone along the river with many buildings were generally greater than those for mountainous and hilly areas: values generally ranged from about 0.3 s to 0.5 s in the Seoul area. The spatial distribution of T G was particularly consistent with the distribution of bedrock depth, as depicted in Figure 11(c). This rigorous   zonation can serve as a fundamental resource for predicting seismically induced structural damage. All objects or structures have their own natural periods. The natural period of a building is generally accepted to be 0.1 times the number of its stories. Therefore, three-to five-story buildings located along the river would be relatively vulnerable to seismic damage caused by earthquake resonance.
In the Busan area, the depth to bedrock ranged between about 50 and 65 m (Figure 12(a)), and V S30 ranged between about 200 and 350 m/s in the western deltaic plains and parts of the eastern coastal slopes, which was deeper and smaller than for mountainous and hilly areas. In the same regions, T G was also longer in the western deltaic plains and parts of the eastern coastal slopes than for mountainous and hilly areas, ranging mainly between about 0.46 and 0.62 s (classified as D1 to D4), respectively. The spatial distribution of T G is particularly consistent with the distribution of bedrock depth presented in Figure 12(a). In Figure 12(c), the spatial data for building cover (presented as a grey line) are overlain on the T G distribution to examine their seismic vulnerability. In the western deltaic areas of Busan, buildings between about five and seven stories would be vulnerable to seismic damage caused by earthquake resonance. Zoning information based on T G (or depth to bedrock or V S30 ) can contribute to earthquake-related management strategies and also to rational land use, city planning, and development in the study area. Zoning information based on T G values can contribute to earthquake-related strategies and also to effective land use and city planning or development in the entire study area.

Site classes with administrative sub-units
Spatial zoning maps of site classes in the Seoul and Busan areas based on administrative sub-units were constructed. Site classes for all of the administrative sub-units of Seoul and Busan were estimated based on the average of three seismic site-effect parameters (bedrock, V S30 , and T G ) for each district and are shown in Figures 13 and 14. The short-and mid-period site coefficients (F a and F v ) according to depth to bedrock, V S30 , and T G for the seismic design of structures, described in the site classification system of Table 2, are presented in the legends of Figures 13 and 14.
In the Seoul area, most sub-units adjacent to the Han River and major creeks fall within site classes C2, C3, and C4, for the three parameters. However, three sub-units for V S30 (Figure 13(b)) and eight sub-units for T G (Figure 13(c)) in the south-western plain fall in site classes D1 and D2. The amplification potentials shown in Figure 13 were lower than those shown in Figure 11 because the site class for each sub-unit was determined by averaging the site classes, which can be particularly useful for official agencies when making earthquake-related decisions. As shown in Figure 14, the western deltaic plains and several coastal slope zones in Busan fall within site classes D (D1 to D4), representing the site conditions of significant ground motion amplification. These areas also correspond to seismically vulnerable areas identified by T G , as indicated in Figure 12(c). Most mountainous and hilly areas in Busan fall into site class B, having a value of 1.0 for both F a and F v , or site classes C1 and C2, with slightly amplified ground motion. This spatial zonation map of site classes can provide information useful for the preliminary seismic design of structures before the practical seismic design is implemented on site.
In a comparison of the spatial distribution of seismic site classes between inland and coastal areas of Korea, earthquake loss associated with possible building damage can be predicted based on the spatial zonation maps by considering the geological and topographical characteristics. Some threeto five-story buildings in the southwest area of the river basin (urban area) would be vulnerable to seismic activity due to resonance during earthquakes, according to the spatial zoning map of T G in the Seoul area. On the other hand, the T G map suggests that three-to seven-story buildings along the east central coast (urban area) or deltaic plains in the Busan area are more vulnerable to seismic activity than buildings in the Seoul area (land urban area).
The site class for seismic design and seismic performance evaluation can be determined unambiguously by three parameters (depth to bedrock, V S30 , and T G ). Thus, if the spatial variations of site conditions are known over the entire study area, the site coefficients according to these site classes  can be readily determined for any site in the study area by spatial seismic zonation. To assist with the conservative seismic decision-making for each administrative sub-unit in a metropolitan area, a seismic zoning map for site classification based on T G values is more appropriate than a map using the other site parameters because it can better classify deep soft soil in the same sub-unit.

Conclusion and discussion
In this study, a recently established GIS-based framework was applied to the coastal and inland urban areas of Korea and its applicability for regional assessment of the geostatistical zonation of site-specific seismic site effects was evaluated. The proposed framework was composed of four functional techniques: multivariable statistical clustering, geostatistical optimization, geotechnical analysis, and local visualization. In the first stage, to identify the spatial pattern and correlations of a geospatial database in the target area, a grouping within similar geotechnical datasets was conducted based on the multivariable statistical clustering method: kernel density estimation, and optimized hot spot clustering. Accordingly, to consider the correlated distance within clusters and the corresponding weightings of the kriging function, an individual experimental variogram was modelled for every clustered site-specific geotechnical dataset. Second, to optimize the assumption conditions for appropriate interpolation and zonation, the possible geostatistical methodology was validated and determined using a four-step cross-validation-based verification test: optimization of interpolation method, unit grid zonation method, grid size, and zonation method. Third, the representative geotechnical characteristic parameters (depth to bedrock, V S30 , and T G ) correlated with site coefficients were estimated as spatial grid information results using the optimized zonation method. Fourth, the multi-scale zonation of local seismic site effects combined with a topographical map and geo-layers was visualized.
The established framework was applied in the Seoul and Busan areas for consideration of sitespecific site effects in coastal and inland urban areas in Korea. The process involved gathering existing borehole data, with walk-over site visits conducted across the extended area to acquire surface geo-knowledge datasets. The six clusters in Seoul and five clusters in Busan were classified within subdivided geotechnical datasets that had similar geostatistical spatial trends based on the optimized clustering with kernel density. Spatial geotechnical layers and indices of site effects were predicted by OK in accordance with the zonation method in unit grid-cells, which were classified as multivariate zonation values with a grid size of 20 m in Seoul and 50 m in Busan. The RMSE of grid-based zonation was smaller than that of borehole-based zonation following cross-validation, which demonstrated that grid-based zonation was more appropriate for all indices of site effects in the extended Seoul and Busan areas.
In the river basin of the Seoul area, the alluvial soil was thicker (up to 70 m) and the depth to bedrock was deeper (up to about 85 m) than in the surrounding mountain areas. Soil development in the river basin was mainly a result of fluvial landform processes. Such zones of thick soil or deep depth to bedrock are susceptible to ground motion amplification due to site effects during earthquakes. The V S30 ranged between about 240 and 320 m/s in parts of the western river basin, which was deeper and smaller than for mountainous and hilly areas. The representative T G values for the zone along the river with many buildings were generally greater than those for mountainous and hilly areas, with values generally ranging from about 0.3 s to 0.5 s in the Seoul area. In the Busan area, the depth to bedrock ranged between about 50 and 65 m, and the V S30 ranged between about 200 and 350 m/s in the western deltaic plains and parts of the eastern coastal slopes, which was deeper and smaller than for mountainous and hilly areas. In the same regions, T G was also longer in the western deltaic plains and parts of eastern coastal slopes than for mountainous and hilly areas, ranging between about 0.46 and 0.62 s (classified as D1 to D4), respectively, in the Busan area.
The earthquake loss associated with possible building damage can be predicted based on spatial zonation maps by considering the geological and topographical characteristics in a comparison of the spatial distribution of seismic site classes between inland and coastal areas of Korea.
Furthermore, according to the geo-spatial grid information in coastal areas and river basins, earthquake vulnerability induced by site effects should be considered in more detailed seismic performance evaluations.