Improving global digital elevation models using space-borne GEDI and ICESat-2 LiDAR altimetry data

Open source Global Digital Elevation Models (GDEMs) serve as an important base for studies in geosciences. However, these models contain vertical errors from various sources. In this study, data from two satellite LiDAR altimetry systems, GEDI and ICESat-2, were used to improve the vertical accuracy of GDEMs. Three different machine learning methods, namely an Artificial Neural Network (ANN), Extreme Gradient Boosting (XGBoost), and a Convolutional Neural Network (CNN), were employed to improve existing DEM data with satellite LiDAR data. The methodology was tested in five areas with varying characteristics. Ground control data were selected from high-accuracy DEMs generated from airborne LiDAR and from GNSS data. The ANN method reduced the RMSE of SRTM data from 6.45 to 3.72 m in Test area-4. Similarly, the CNN method improved the vertical accuracy of bare-ground SRTM data from 3.4 to 0.6 m in Test area-4. In Test area-5, the ANN method improved the vertical accuracy of SRTM data on slopes between 30 and 60% from 3.8 to 0.5 m. Notably, the results show that GDEMs were improved in all test areas.


Introduction
A Digital Elevation Model (DEM) is a raster model that stores the elevation of each grid cell relative to a given datum. DEMs serve as an important base for almost everyone working in geosciences. Elevation data and directly derivable data such as slope, aspect, and roughness play an important role in flood and landslide analysis and in terrain visualization. For this reason, many DEMs have been produced with different extents, spatial resolutions, and data collection techniques (Mesa-Mingorance and Ariza-López 2020). Global DEMs (GDEMs) cover the entire Earth and are often freely available. These GDEMs are commonly used as input for studies that do not require high spatial resolution. At the same time, for hard-to-reach regions and countries with limited resources, GDEMs are often still the only elevation data available. Because of their wide applicability, the quality of different GDEM products and its improvement are important research topics. In this study, we consider the three most widespread GDEMs: ALOS World 3D (AW3D30), ASTER GDEM, and SRTM. AW3D30 is a terrain model with a resolution of approximately 30 m (1 arcsec) produced from stereo pairs of optical images acquired by the ALOS satellite. AW3D30 was developed by the Japan Aerospace Exploration Agency (JAXA), and its first version was released in 2015. AW3D30 covers the area between 60° North and 60° South (Tadono et al. 2014). ASTER GDEM is also a terrain model with a resolution of approximately 30 m (1 arcsec), derived from stereo images acquired by the ASTER sensor on board NASA's Terra satellite. ASTER GDEM was developed jointly by the National Aeronautics and Space Administration (NASA) and Japan's Ministry of Economy, Trade, and Industry (METI). Its first version was released in 2009. ASTER GDEM covers the area between 83° North and 83° South latitudes (Tachikawa et al. 2011). SRTM is a terrain model with a similar resolution of approximately 30 m (1 arcsec) produced from C-band radar imagery (Farr et al. 2007). SRTM was developed by NASA, and its first version was released in 2003. SRTM covers the area between 60° North and 56° South latitudes (Van Zyl 2001). We evaluated these GDEMs against different ground truth data at several locations. Some similar studies on these data are presented in Table 1.
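The "approximately 30 m (1 arcsec)" equivalence holds in the north-south direction; on a geographic (latitude/longitude) grid, the east-west width of a 1-arcsec cell shrinks with the cosine of latitude. The sketch below computes both spacings, assuming a spherical Earth with a mean radius of 6,371 km (a simplification; the GDEMs are referenced to WGS84). `arcsec_spacing_m` is a hypothetical helper, not part of any GDEM toolkit.

```python
import math

# Mean Earth radius in metres (spherical approximation, an assumption for illustration).
EARTH_RADIUS_M = 6_371_000.0

def arcsec_spacing_m(lat_deg: float, arcsec: float = 1.0) -> tuple[float, float]:
    """Return (north-south, east-west) ground spacing in metres of a grid step of
    `arcsec` arc-seconds at latitude `lat_deg`."""
    step_rad = math.radians(arcsec / 3600.0)
    ns = EARTH_RADIUS_M * step_rad                 # meridian arc length
    ew = ns * math.cos(math.radians(lat_deg))      # parallels shrink toward the poles
    return ns, ew

for lat in (0.0, 40.0, 60.0):
    ns, ew = arcsec_spacing_m(lat)
    print(f"lat {lat:4.1f}: N-S {ns:5.1f} m, E-W {ew:5.1f} m")
```

At 60° latitude the east-west spacing is half the north-south spacing, which matters when slope or distance is computed directly on such grids.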
In our study, spaceborne altimeter data from two LiDAR systems that started operating at the end of 2018 were used to improve GDEM data. The Global Ecosystem Dynamics Investigation (GEDI) system, mounted on the International Space Station (ISS), collects Earth elevation data using a full-waveform LiDAR system (Dubayah et al. 2020). The ATLAS instrument on the ICESat-2 satellite collects elevation data using a photon-counting LiDAR system (Neuenschwander and Magruder 2019). These two data sources have been used together or separately in various recent studies (Liu, Cheng, and Chen 2021; Narin and Gullu 2023; Quiros et al. 2021; Shang et al. 2022; Urbazaev et al. 2022). Urbazaev et al. (2022) compared the performance of GEDI and ICESat-2 data in determining terrain height for four different terrain classes and six different forest types. They reported that increasing slope degrades the accuracy of both data sources and that the accuracy varies with forest type. Liu, Cheng, and Chen (2021) investigated the performance of ICESat-2 and GEDI data in determining terrain height and canopy height at 40 different NEON sites. In addition, the influence of acquisition time (night vs. day) was examined in detail for both data sources. They obtained vertical accuracies between 2.24 and 4.03 m RMSE for middle and lower latitudes. They also stated that the accuracy of both data sources in determining terrain height decreased in areas with trees taller than 20 m and in areas where canopy cover exceeded 90%. They recommended using data acquired at night, which are more accurate because they are not affected by solar radiation. Quiros et al. (2021) compared airborne LiDAR data with GEDI data for 10 different areas in Spain. Comparing the GEDI RH100 metric with the LiDAR data, they obtained an average RMSE of 3.56 m. They stated that the accuracy of GEDI was strongly correlated with canopy density and slope. Since spaceborne LiDAR altimeter data are affected by many environmental factors, a new strategy has been introduced to identify high-quality ICESat-2 data (Shang et al. 2022). This study was conducted in two different regions of China and reported a terrain height error of 0.53 m RMSE for ICESat-2 data after applying an 8-step filtering methodology (Shang et al. 2022). These results suggest that spaceborne LiDAR data can be more accurate than collocated GDEM data.
The agreement between GDEM data and other elevation data varies considerably between areas. To improve the quality of GDEM data, studies recommend integrating it with other available elevation data. Yue et al. (2017) produced a new elevation model for the entire territory of China by merging ASTER, SRTM-1, and ICESat-1/GLAS points using an ANN method. The reported RMSE values for ASTER and SRTM-1 are 20.5 and 16.5 m, respectively, and the RMSE of the improved DEM decreased to 15.2 m (Yue et al. 2017). Chen, Yang, and Li (2020) co-registered SRTM with ICESat/GLAS data over Jiangxi province, China, using 146 GNSS control points. They employed multiple linear regression (MLR), a back-propagation neural network (BPNN), a generalized regression neural network (GRNN), and Random Forest (RF) for co-registration. According to the authors, RF yielded the best results among these methods, and GDEM data from AW3D30, ASTER GDEM, and TanDEM-X could be improved as a result. Lee and Hahn (2020) produced a new DEM by combining DEM data from the KOMPSAT-3 satellite with ICESat-1 altimeter data using point-to-surface matching. They reported that the DEM accuracy improved from 9 to 2 m RMSE. They used only GCPs on bare ground and excluded vegetation points and points from other surfaces that would otherwise introduce errors. Yamanokuchi, Doi, and Shibuya (2007) combined InSAR and ICESat/GLAS data over Antarctica, using GPS measurements from the 28th JARE (Japanese Antarctic Research Expedition) as terrestrial ground truth. They reported an RMSE of ±279.3 m between the InSAR DEM and the JARE measurements, whereas the corrected InSAR DEM reached ±39.5 m RMSE. Bagheri, Schmitt, and Zhu (2018) fused TanDEM-X and Cartosat-1 DEM data and improved accuracy over an urban area using an ANN. Girohi and Bhardwaj (2022) proposed a neural network-based fusion approach to improve InSAR-based DEM data, and the results showed substantial improvements in plain and hilly regions. Okolie, Mills, and Smit (2022) compared watersheds and drainage networks derived from Copernicus GLO-DEM, AW3D-DEM, and a fused version of these two data products. Their accuracy analysis using LiDAR reference data showed that the fused data were more accurate.
It is noteworthy that the number of articles on GDEM improvement is still limited. In addition, previous GDEM improvement studies relied on the older ICESat-1/GLAS data, which have low coverage. In our study, we used ICESat-2 and GEDI data together; both are more recent than ICESat-1/GLAS data and together provide much better coverage. In this way, we aimed to increase the density of high-quality points available for GDEM improvement. Various machine learning and deep learning methods have been used in related studies. Sahin (2020) used Extreme Gradient Boosting (XGBoost) and RF for landslide susceptibility mapping and obtained the best result with XGBoost. Panahi et al. (2020) compared the Convolutional Neural Network (CNN) and Support Vector Regression (SVR) methods to determine potential groundwater locations and stated that CNN gave better results. Abdikan et al. (2023) used Simple Linear Regression (SLR), Multiple Linear Regression (MLR), ANN, XGBoost, and CNN techniques for plant height estimation and obtained the best result with ANN, followed by CNN. These studies indicate that the best-performing method depends on the application. In our study, we therefore used the ANN, CNN, and XGBoost methods to improve GDEMs with spaceborne altimeter data. In addition to improving the GDEM data, we performed a detailed assessment of the data in our study areas. In summary, our objectives are as follows:
1. After filtering the ICESat-2 and GEDI data, analyze their vertical accuracy according to land use and slope.
2. Correct the GDEMs by incorporating spaceborne altimeter data with machine learning techniques.
3. Evaluate the results in terms of vertical accuracy, land use, and slope.
4. For each new model, evaluate the accuracy for all areas, different slope groups, and land use classes.

Study areas
We consider five test areas in our study. Test area-1 is in the southeast of Washington State, USA (Figure 1). It lies in the Umatilla National Forest, west of the Blue Mountains range, and 77% of the area is covered by forest. This area has steep slopes and deep canyons (U.S.D.A. n.d.). Test area-2 is located in Puerto Rico, an island in the Caribbean (Figure 1). The southern part of the study area includes the Cordillera Central mountain range and the highest point of the island, 'Cerro de Punta' (Balghin and Coleman 1965). The northern part is relatively flat and contains riverbeds that have formed two deep valleys. 90% of this area is covered by forest (Figure 1). Test area-3 is located on the South Island of New Zealand, near Christchurch, on the Banks Peninsula (Figure 1). Test area-4 is located in Ankara, the capital of Turkey (Figure 1). This area was selected for its abundance of buildings. Test area-5 is situated on the west side of Istanbul, the most populous city in Turkey (Figure 1), and is characterized by both dense settlements and forests. An overview of the study areas is given in Table 2.

Satellite altimetry
A satellite altimeter measures the time it takes for a radar or laser pulse emitted from a satellite to reach the surface and return, and converts this travel time into a range. Spaceborne LiDAR altimeters have been used in various Earth observation studies (Dandabathula 2022), including ocean topography (Morison et al. 2022), inland water level monitoring (Narin and Abdikan 2023), change detection (Zhang et al. 2023), canopy height estimation (Vatandaslar, Narin, and Abdikan 2023), terrain height estimation (Liu, Cheng, and Chen 2021), GDEM improvement (Li et al. 2023), and DEM production (Narin and Gullu 2023). In 2018, the GEDI and ICESat-2 LiDAR altimeter missions started operation. GEDI is a full-waveform LiDAR system mounted on the ISS. GEDI collects data in the WGS84 coordinate system between 51.6° North and South (Dubayah et al. 2020). In this study, the GEDI Level 2A Geolocated Elevation and Height Metrics product is used. The ATLAS instrument on ICESat-2 collects elevation data using the photon-counting principle. ICESat-2 has a revisit time of 3 months and collects data in the WGS84 coordinate system between 88° North and South (Neuenschwander and Magruder 2019). We used the ATL08 Land and Vegetation Height product, which contains ground and canopy height information (Neuenschwander et al. 2020). Note that both GEDI and ICESat-2 collect data in profile mode; that is, elevations are obtained along several orbit-parallel profiles rather than in swaths, as is common for many other spaceborne systems. The filtering procedure used to remove outliers from the GEDI and ICESat-2 data is described in Section 2.3.1.
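The ranging principle above can be sketched as follows: the one-way range is half the two-way travel time multiplied by the speed of light, and the surface height follows by subtracting that range from the satellite's altitude above the reference surface. This is a simplified nadir-looking model that ignores atmospheric path delay, off-nadir pointing, and orbit errors, all of which real altimeter processing must correct for; the function names and the numbers in the example are hypothetical.

```python
# Speed of light in vacuum (m/s). Atmospheric path delay is ignored in this sketch.
C = 299_792_458.0

def range_from_travel_time(two_way_time_s: float) -> float:
    """One-way range (m): the pulse travels to the surface and back, so halve the path."""
    return C * two_way_time_s / 2.0

def surface_height(sat_altitude_m: float, two_way_time_s: float) -> float:
    """Surface height above the reference surface for a nadir-pointing altimeter (simplified)."""
    return sat_altitude_m - range_from_travel_time(two_way_time_s)

# Hypothetical example: satellite 500 km above the reference surface, terrain at 1000 m.
t = 2 * 499_000.0 / C
print(f"travel time {t * 1e3:.4f} ms -> height {surface_height(500_000.0, t):.3f} m")
```

A timing error of 1 ns alone corresponds to about 15 cm of range, which is why precise timing is central to LiDAR altimetry.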

Global digital elevation models
A DEM is a digital data product organized as a regular raster, providing terrain height information for each raster pixel. DEM data are used in many disciplines because they describe the elevation of landforms such as mountains, valleys, and plains. For this reason, numerous DEM products have been produced at both local and global scales. In our study, we used three different GDEMs: AW3D30 (Version 3.2), ASTER GDEM (Version 3), and SRTM (Version 3.0) (Table 3).

Global land use/land cover
Land cover data represent the different types of physical cover on the Earth's surface in raster format. In our study, land cover data were used to investigate the effect of land cover type on the accuracy of our improved DEM products. For this purpose, we used the global Land Use/Land Cover (LULC) data developed by Impact Observatory, Microsoft, and Esri. These LULC data were derived from Sentinel-2 satellite images using deep learning. They have a resolution of 10 m, cover the years 2017 to 2021, and are available free of charge (Karra et al. 2021). In our study, we used the 2020 land cover data after resampling them to a 30 m resolution. The land cover classes in the study areas and their surface coverage rates are given in Table 4.
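The resampling method used to bring the 10 m land cover to 30 m is not specified. For categorical data, averaging class codes is meaningless, so one common choice is a majority (mode) rule over each 3 × 3 block of 10 m cells. The NumPy sketch below assumes that rule and a 30 m grid aligned with the 10 m grid; `majority_downsample` is a hypothetical helper and the class codes in the example are arbitrary.

```python
import numpy as np

def majority_downsample(classes: np.ndarray, factor: int = 3) -> np.ndarray:
    """Aggregate a categorical raster by taking the most frequent class in each
    factor x factor block. Incomplete blocks at the right/bottom edges are dropped."""
    rows, cols = classes.shape
    rows -= rows % factor
    cols -= cols % factor
    # Reshape to (block_row, block_col, factor * factor) so each block is one vector.
    blocks = classes[:rows, :cols].reshape(rows // factor, factor, cols // factor, factor)
    blocks = blocks.transpose(0, 2, 1, 3).reshape(rows // factor, cols // factor, factor * factor)
    out = np.empty(blocks.shape[:2], dtype=classes.dtype)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            values, counts = np.unique(blocks[i, j], return_counts=True)
            out[i, j] = values[np.argmax(counts)]  # ties resolve to the smallest class code
    return out

# Hypothetical 10 m tile (3 x 6 cells) -> 30 m tile (1 x 2 cells)
lc = np.array([[1, 1, 2, 5, 5, 5],
               [1, 2, 2, 5, 7, 7],
               [1, 1, 2, 7, 7, 7]])
print(majority_downsample(lc))
```

Nearest-neighbour resampling is the other common option; it is cheaper but keeps only one of the nine 10 m cells per 30 m cell.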

Methodology
In selecting the five test areas, we aimed to ensure a broad representation of slope regimes and land cover classes that could affect DEM quality before and after the correction of the GDEM data (Figure 2). Despite the significant differences between GEDI and ICESat-2, both aim to measure topography. Therefore, data from both systems were combined to maximize the number of spaceborne LiDAR altimeter points in the different test areas. AW3D30, ASTER GDEM, and SRTM data, being freely available and widely used in the literature, were chosen for improvement. Three different machine learning methods were applied to improve the GDEM data with spaceborne LiDAR altimeter data, as elaborated in the following sections. In this study, DEMs produced from airborne LiDAR were used as ground truth data for three test areas, while Global Navigation Satellite System (GNSS) data were used for the remaining two.
In addition to height comparisons across all these areas, detailed analyses were performed to show how errors vary with land use and slope (Figure 2).

Data filtering and preparation of training data
Before using the GEDI and ICESat-2 data as training data, we applied a series of filtering steps, described in the following sections, followed by a vertical datum correction. The GDEM data in this study use the EGM96 vertical datum, whereas the GEDI and ICESat-2 heights refer to the WGS84 ellipsoid. We therefore obtained point-based EGM96 geoid undulation values from the International Centre for Global Earth Models (ICGEM) for the satellite LiDAR altimeter points (Ince et al. 2019). We then reduced the ICESat-2 and GEDI heights to the same vertical datum as the GDEM data using Equation (1):

\[ H_{\mathrm{EGM96}} = h_{\mathrm{WGS84}} - N_{\mathrm{EGM96}} \qquad (1) \]

where \(h_{\mathrm{WGS84}}\) is the ellipsoidal height of a LiDAR point, \(N_{\mathrm{EGM96}}\) is the EGM96 geoid undulation at that point, and \(H_{\mathrm{EGM96}}\) is the resulting height relative to EGM96. The point densities of GEDI and ICESat-2 data for the different test areas are given in Table 5, and the spatial distribution of the spaceborne LiDAR data is shown in Figure 3. After filtering and preparing the data, the GDEM heights collocated with the GEDI and ICESat-2 points (i.e., extracted at the GEDI and ICESat-2 coordinates), together with the LiDAR heights, were used as training data for CNN, ANN, and XGBoost.
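The datum reduction and collocation steps above can be sketched as follows. The reduction uses the standard relation H = h − N (orthometric height equals ellipsoidal height minus geoid undulation). How the GDEM value was extracted at each LiDAR point is not specified, so the sketch assumes the value of the grid cell containing the point, on a north-up geographic grid whose cell edges start at (`north_lat`, `west_lon`). The feature layout (latitude, longitude, GDEM height as inputs; LiDAR height on EGM96 as target) and all function names are assumptions for illustration.

```python
import numpy as np

def ellipsoidal_to_egm96(h_wgs84, undulation):
    """H = h - N: ellipsoidal (WGS84) height minus EGM96 geoid undulation."""
    return np.asarray(h_wgs84, dtype=float) - np.asarray(undulation, dtype=float)

def sample_containing_cell(grid, lat, lon, north_lat, west_lon, step_deg):
    """Value of the grid cell containing each point (points outside the grid are not handled)."""
    rows = np.floor((north_lat - np.asarray(lat, dtype=float)) / step_deg).astype(int)
    cols = np.floor((np.asarray(lon, dtype=float) - west_lon) / step_deg).astype(int)
    return grid[rows, cols]

def build_training_rows(lat, lon, h_wgs84, undulation, gdem, north_lat, west_lon, step_deg):
    """Hypothetical layout: X = (lat, lon, GDEM height), y = LiDAR height on EGM96."""
    y = ellipsoidal_to_egm96(h_wgs84, undulation)
    gdem_h = sample_containing_cell(gdem, lat, lon, north_lat, west_lon, step_deg)
    X = np.column_stack([np.asarray(lat, dtype=float), np.asarray(lon, dtype=float), gdem_h])
    return X, y

# Toy 2 x 2 GDEM with 1-degree cells (real GDEMs use 1-arcsec cells).
gdem = np.array([[10.0, 20.0],
                 [30.0, 40.0]])
X, y = build_training_rows([40.5, 39.5], [29.5, 30.5], [100.0, 80.0], [36.5, 36.0],
                           gdem, north_lat=41.0, west_lon=29.0, step_deg=1.0)
print(X)
print(y)
```

Bilinear interpolation is a common alternative to the containing-cell lookup; it smooths the GDEM value but blends neighbouring cells.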
2.3.1.1. Filtering GEDI data. In this study, we downloaded GEDI Level 2A data and applied four filtering steps to remove outliers. The filtering follows the recommendations and thresholds in Chapter 6 of the GEDI User Guide (Dubayah et al. 2021). Even after applying these recommended steps, some outliers remained. Therefore, we implemented an additional DEM-based threshold (Shang et al. 2022). The filtering steps applied to the GEDI data are:
1. High-quality shots were selected from all downloaded GEDI data (Quality_flag = 1).
2. Only night-time data were kept to eliminate the negative effect of sunlight (solar_elevation < 0).
3. A sensitivity threshold was applied for densely forested and urban areas (Sensitivity > 0.9).
4. A DEM-based threshold was applied using SRTM data (−16 m < SRTM − GEDI < 16 m).

2.3.1.2. Filtering ICESat-2 data. In our study, we downloaded ICESat-2 ATL08 data from openaltimetry.org (Khalsa et al. 2022). The filtering steps applied to the ICESat-2 data are:
1. Only the strong beams of ICESat-2 were selected.
2. The same DEM-based threshold as for GEDI was applied using the SRTM data.
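The filtering rules above translate directly into boolean masks. The sketch below is a minimal NumPy version with hypothetical function names; it assumes the per-shot attributes have already been read into arrays, that the LiDAR and SRTM heights compared in the DEM-based threshold refer to the same vertical datum, and that the strong/weak beam status of each ICESat-2 point is available as a precomputed boolean (in ATL08, which beam of each pair is strong depends on the spacecraft orientation).

```python
import numpy as np

def gedi_mask(quality_flag, solar_elevation, sensitivity, h_gedi, h_srtm, max_diff=16.0):
    """True for GEDI shots that pass all four filtering steps."""
    diff = np.asarray(h_srtm, dtype=float) - np.asarray(h_gedi, dtype=float)
    return ((np.asarray(quality_flag) == 1)                  # step 1: high-quality shots
            & (np.asarray(solar_elevation, dtype=float) < 0)  # step 2: sun below horizon
            & (np.asarray(sensitivity, dtype=float) > 0.9)    # step 3: beam sensitivity
            & (np.abs(diff) < max_diff))                      # step 4: -16 m < SRTM - GEDI < 16 m

def icesat2_mask(is_strong_beam, h_icesat2, h_srtm, max_diff=16.0):
    """True for ICESat-2 ATL08 points on strong beams that pass the SRTM threshold."""
    diff = np.asarray(h_srtm, dtype=float) - np.asarray(h_icesat2, dtype=float)
    return np.asarray(is_strong_beam, dtype=bool) & (np.abs(diff) < max_diff)

# Hypothetical shots: only the first passes every GEDI test.
m_gedi = gedi_mask([1, 1, 0, 1], [-5.0, 3.0, -5.0, -5.0], [0.95] * 4,
                   [100.0] * 4, [110.0, 110.0, 110.0, 120.0])
m_is2 = icesat2_mask([True, False, True], [50.0] * 3, [55.0, 55.0, 70.0])
print(m_gedi, m_is2)
```

Keeping the thresholds as arguments makes it easy to test how sensitive the point density in Table 5 would be to the 16 m cut-off.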

Improvement analysis
Three different machine learning methods were used to improve the GDEMs. Machine learning methods learn the relationship between input and target data from training examples, without requiring the mathematical function to be specified in advance; neural networks in particular are loosely inspired by the way the human brain learns. One of the methods, XGBoost, is based on decision trees, while the other two, ANN and CNN, are neural network techniques. All analyses were performed in the R programming language (R Core Team 2020). The KERAS (Allaire and Chollet 2022) and CARET (Kuhn 2022) libraries were used to implement the CNN, ANN, and XGBoost methods. Latitude, longitude, and height information from the ICESat-2, GEDI, and GDEM data were used as input, and improved height values were obtained as output (improved GDEM). The parameters used for each method are described in the following sections.

Convolutional neural network (CNN).
We used a one-dimensional CNN architecture (DataTechNotes n.d.) to improve the GDEMs. The CNN architecture consists of an input layer, a 1D convolutional layer, a flatten layer, and an output layer. For the 1D convolutional layer, filters = 1024, kernel size = 1, and activation function = relu were selected. For training, epochs = 100, batch size = 16, and verbose = 0 were used. In addition, the loss curves were examined after training to check for overfitting or underfitting.
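To make the shape of this architecture concrete, the sketch below is a forward pass in NumPy (no training) of Input → Conv1D(filters = 1024, kernel size = 1, ReLU) → Flatten → Dense(1). In R Keras the layers would correspond to `layer_conv_1d(filters = 1024, kernel_size = 1, activation = "relu")`, `layer_flatten()` and `layer_dense(units = 1)`; the single-unit output layer and the input layout (three features, e.g. latitude, longitude, GDEM height, each treated as one position with one channel) are assumptions. With kernel size 1 the convolution applies the same 1024 weights and biases to each feature independently, so features are only mixed in the dense layer.

```python
import numpy as np

def conv1d_k1_relu(x: np.ndarray, w: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Conv1D with kernel_size=1 on one input channel.
    x: (batch, positions, 1); w, b: (filters,). Every position is multiplied by the
    same per-filter weight and shifted by the same bias; neighbours are not mixed.
    Returns (batch, positions, filters) after ReLU."""
    return np.maximum(0.0, x * w + b)

def cnn_forward(x, w_conv, b_conv, w_dense, b_dense):
    """Input -> Conv1D(kernel 1, ReLU) -> Flatten -> Dense(1)."""
    h = conv1d_k1_relu(x, w_conv, b_conv)
    flat = h.reshape(h.shape[0], -1)  # Flatten: (batch, positions * filters)
    return flat @ w_dense + b_dense   # single-unit output layer (assumed)

rng = np.random.default_rng(0)
n_features, filters = 3, 1024         # hypothetical inputs: lat, lon, GDEM height
w_conv = rng.normal(size=filters)
b_conv = np.zeros(filters)
w_dense = rng.normal(size=(n_features * filters, 1)) * 0.01
b_dense = np.zeros(1)

x = rng.normal(size=(16, n_features, 1))  # one batch of 16 standardized samples
y_hat = cnn_forward(x, w_conv, b_conv, w_dense, b_dense)
n_params = w_conv.size + b_conv.size + w_dense.size + b_dense.size
print(y_hat.shape, n_params)
```

Under these assumptions the model has 5,121 trainable parameters, most of them in the dense layer.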

Artificial neural network (ANN).
An ANN was used as a second method in our study (Geeksforgeeks 2020). The ANN architecture consists of an input layer, two hidden layers, and an output layer. Before training, the height values were normalized to the range 0-1 using min-max scaling. The first hidden layer has 3 neurons and the second hidden layer has 2 neurons. The sigmoid activation function was selected.
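A minimal NumPy sketch of this setup, assuming a single input feature for brevity, a sigmoid output unit, and untrained random weights (training is omitted). It illustrates two practical points: the min-max parameters must come from the training data and be reused to map predictions back to metres, and with a sigmoid output the de-normalized predictions cannot leave the [min, max] range of the training heights. Function names are hypothetical; this is not the authors' caret/Keras configuration.

```python
import numpy as np

def minmax_fit(values):
    """Return (min, max) of the training heights; reuse them for new data."""
    v = np.asarray(values, dtype=float)
    return float(v.min()), float(v.max())

def minmax_scale(values, lo, hi):
    return (np.asarray(values, dtype=float) - lo) / (hi - lo)

def minmax_inverse(scaled, lo, hi):
    return np.asarray(scaled, dtype=float) * (hi - lo) + lo

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ann_forward(x, params):
    """Input -> 3 sigmoid units -> 2 sigmoid units -> 1 sigmoid output in (0, 1)."""
    (w1, b1), (w2, b2), (w3, b3) = params
    a1 = sigmoid(x @ w1 + b1)
    a2 = sigmoid(a1 @ w2 + b2)
    return sigmoid(a2 @ w3 + b3)

rng = np.random.default_rng(1)
heights = np.array([850.0, 900.0, 1000.0, 940.0])   # hypothetical GDEM heights (m)
lo, hi = minmax_fit(heights)
x = minmax_scale(heights, lo, hi).reshape(-1, 1)
params = [(rng.normal(size=(1, 3)), np.zeros(3)),
          (rng.normal(size=(3, 2)), np.zeros(2)),
          (rng.normal(size=(2, 1)), np.zeros(1))]
pred_scaled = ann_forward(x, params)   # untrained weights: values are arbitrary
pred_m = minmax_inverse(pred_scaled, lo, hi)
print(pred_m.ravel())
```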

Extreme gradient boosting (XGBoosting).
XGBoost is a decision tree-based ensemble algorithm that builds trees sequentially, each new tree correcting the residual errors of the trees built before it (R-bloggers 2020). In our study, the XGBoost parameters were max depth = 3 and nrounds = 2500.
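XGBoost extends basic gradient boosting with regularization, second-order gradient information, and deeper trees (max depth = 3 here). The sketch below is not XGBoost: it is plain gradient boosting with squared-error loss and depth-1 trees (stumps) on a single feature, kept small to show how each round fits the residuals of the current ensemble and how a learning rate `eta` shrinks each step. Names, data, and parameter values are hypothetical.

```python
import numpy as np

def fit_stump(x, r):
    """Find the single threshold on x that best fits residuals r (squared error)."""
    best_sse, best = np.inf, (None, r.mean(), r.mean())
    for t in np.unique(x)[:-1]:              # candidate split points
        left = x <= t
        lv, rv = r[left].mean(), r[~left].mean()
        sse = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
        if sse < best_sse:
            best_sse, best = sse, (t, lv, rv)
    return best

def predict_stump(stump, x):
    t, lv, rv = stump
    if t is None:                            # no split possible (constant x)
        return np.full(x.shape, lv, dtype=float)
    return np.where(x <= t, lv, rv)

def boost_fit(x, y, n_rounds=100, eta=0.3):
    """Each stump fits the current residuals (negative gradient of squared error)."""
    base = float(y.mean())
    pred = np.full(y.shape, base)
    stumps = []
    for _ in range(n_rounds):
        stump = fit_stump(x, y - pred)
        pred = pred + eta * predict_stump(stump, x)
        stumps.append(stump)
    return {"base": base, "eta": eta, "stumps": stumps}

def boost_predict(model, x):
    pred = np.full(x.shape, model["base"])
    for stump in model["stumps"]:
        pred = pred + model["eta"] * predict_stump(stump, x)
    return pred

# Hypothetical 1-D example: one input feature, target with a step and a trend.
x = np.linspace(0.0, 10.0, 50)
y = np.where(x > 5.0, 3.0, 0.0) + 0.1 * x
model = boost_fit(x, y)
mse_before = float(np.mean((y - y.mean()) ** 2))
mse_after = float(np.mean((y - boost_predict(model, x)) ** 2))
print(mse_before, mse_after)
```

A large number of rounds with a small learning rate, as with nrounds = 2500 here, is a typical way to let many weak trees build up a smooth correction.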

Ground truth data
In our study, we used ground truth data from two different data collection techniques. DEMs generated from airborne laser scanning (ALS) are used for Test area-1, Test area-2, and Test area-3. Information on these open-access DEMs is given in Table 6.
The reference data for Test area-4 and Test area-5 are GNSS data. These data were collected by the Turkish General Directorate of Mapping (GDM) for ground gravity condensation studies. The points were measured at 2-3 km intervals, and their spatial distribution is shown in Figure 4. The data were measured with sub-centimeter horizontal accuracy and sub-decimeter vertical accuracy. Their height values have been converted to the EGM96 datum.

Accuracy assessment
In the accuracy analysis, ALS and GNSS data were used as references to evaluate the accuracy of the improved data derived from the GDEMs (AW3D30, ASTER GDEM, and SRTM). The accuracy of the ICESat-2 and GEDI data was also examined in detail for the first three test areas. The metrics used in the study are RMSE, NRMSE, MAE, and NMAD (Equations 2-5), where n is the number of ground truth points, g is the elevation provided by the ground truth data, and t is the evaluated elevation: the spaceborne LiDAR elevation, the raw GDEM data, or the GDEM data improved by GEDI and ICESat-2, depending on the scenario considered. The coefficient of determination (R²) is not reported because it consistently exceeded 0.99 and therefore provided no additional information.
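A sketch of these four metrics using common definitions, with g and t as defined above: RMSE, MAE, and NMAD (1.4826 times the median absolute deviation of the errors from their median, a robust spread estimate). Two choices are assumptions: the error is taken as t − g, and NRMSE is the RMSE divided by the range of the ground truth heights, in percent; normalization by the mean is also in use. `accuracy_metrics` is a hypothetical helper.

```python
import numpy as np

def accuracy_metrics(g, t):
    """Metrics of evaluated heights t against ground truth heights g (same length n)."""
    g = np.asarray(g, dtype=float)
    t = np.asarray(t, dtype=float)
    d = t - g                                          # sign convention is an assumption
    rmse = float(np.sqrt(np.mean(d ** 2)))
    mae = float(np.mean(np.abs(d)))
    nrmse = 100.0 * rmse / float(g.max() - g.min())    # assumed: range-normalized, in %
    nmad = float(1.4826 * np.median(np.abs(d - np.median(d))))
    return {"RMSE": rmse, "MAE": mae, "NRMSE": nrmse, "NMAD": nmad}

# Hypothetical reference heights g and evaluated heights t (m)
m = accuracy_metrics([0.0, 10.0, 20.0, 30.0], [1.0, 9.0, 22.0, 30.0])
print(m)
```

Unlike RMSE, NMAD is barely affected by a few large outliers, which is why it is often reported alongside RMSE for DEM errors.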

Accuracy assessment of spaceborne altimetry system data
First, we examined the accuracy of the ICESat-2 and GEDI data in the first three test areas using the ALS DEM data and various metrics. Because the ground truth data in Test area-4 and Test area-5 were GNSS points, these areas could not be examined in this way. We then produced error histograms for the data (Figure 5). In terms of RMSE, the best result was 6.48 m, obtained with ICESat-2 data in Test area-3, while the least favorable result was 11.29 m, obtained with GEDI data in Test area-3 (refer to Table S1-1 in Supplementary Material S1).
In terms of NRMSE, GEDI achieved both the highest accuracy, 2.3% in Test area-1, and the poorest, 11.29% in Test area-3 (refer to Table S1-1 in Supplementary Material S1). In terms of NMAD, GEDI produced the most favorable result in Test area-1, while ICESat-2 produced the least favorable result in Test area-3. Overall, the GEDI data consistently produced the least satisfactory results in Test area-3. We also compared the data across different slope groups. Consistent with the literature (Adam et al. 2020; Liu, Cheng, and Chen 2021; Quiros et al. 2021), errors increase as the slope steepens (Figure 6). Both GEDI and ICESat-2 data show similar sensitivity to slope (refer to Table S1-3 in Supplementary Material S1). Specifically, the 15-30% slope group and steeper groups (refer to Table S1-3 in Supplementary Material S1) perform worse than the overall accuracy (refer to Table S1-1 in Supplementary Material S1). In Test area-3, GEDI data show consistent results across all slope groups (refer to Table S1-3 in Supplementary Material S1).
When the results are examined by land cover type, water yielded the most accurate results for both ICESat-2 and GEDI data in Test area-1 (ICESat-2 = 1.4 m, GEDI = 2.4 m MAE), as shown in Figure 7. However, the water class comprises a limited number of points (refer to Table S1-2 in Supplementary Material S1) and therefore has minimal influence on the overall accuracy. In Test area-1, GEDI performed worst in the bare ground class (RMSE = 9.57 m), its poorest performance among all classes, whereas ICESat-2 performed worst in the rangeland class (RMSE = 8.01 m). In Test area-2, the crops class yielded the best result for both datasets (ICESat-2 = 2.23 m, GEDI = 3.93 m RMSE), as illustrated in Figure 7, whereas the trees class yielded the least favorable results (ICESat-2 = 7.47 m, GEDI = 8.5 m RMSE) (Figure 7). In Test area-3, the water class again yielded the best results for both datasets (ICESat-2 = 0.52 m, GEDI = 10.74 m RMSE), while the trees class yielded the least favorable results (ICESat-2 = 6.93 m, GEDI = 13.4 m RMSE) (Figure 7). We conclude that GEDI and ICESat-2 data are affected by land cover type in a similar manner (refer to Table S1-2 in Supplementary Material S1).
Compared with the other test areas, Test area-3 has a higher coverage of rangeland and bare ground (Figure 1). Notably, the forest class yielded less favorable results than the other classes, corroborating the findings of Liu, Cheng, and Chen (2021) and Urbazaev et al. (2022). Our study also agrees with Liu, Cheng, and Chen (2021) in that the crops class delivered the most favorable outcomes. By contrast, the results for the water class were poor compared with studies in the literature. This is because water level studies in the literature use dedicated methodologies (Narin and Abdikan 2023; Xiang et al. 2021).

Overall accuracy assessment of raw and improved GDEMs
In this section, we compare the downloaded GDEMs (Raw-GDEMs) and the improved GDEMs (Imp-GDEMs) in terms of height. The Raw-GDEMs for Test area-1 show a predominantly negative error distribution (Figure 8). Among the raw data, AW3D30 yielded the best result, with an RMSE of 11.7 m, while ASTER GDEM yielded the worst, with an RMSE of 16.89 m. Notably, in terms of overall accuracy, the RMSE values of the ICESat-2 and GEDI data (refer to Table S1-1 in Supplementary Material S1) are better than those of the raw data (refer to Table S1-4 in Supplementary Material S1). Moreover, the errors of the ICESat-2 and GEDI data are predominantly positive (Figure 5), in contrast to the negative errors of the Raw-GDEMs (Figure 8). Figure 8 shows that, after improvement, the errors of the GDEM data shifted from negative to positive. Chen, Yang, and Li (2020), who used only SRTM and ICESat/GLAS data, similarly observed a shift from a negative to a positive error distribution. In addition, the RMSE results show an increase in accuracy for all methods in Test area-1 (refer to Table S1-4 in Supplementary Material S1). The most accurate improved model was AW3D30 with the CNN method, whose RMSE decreased from 11.7 to 10.46 m. The largest improvement in Test area-1, however, was obtained for SRTM data with the CNN method, whose RMSE decreased from 14.55 to 11.81 m.
For Test area-2, the errors of the raw data are negative, as in Test area-1 (Figure 9). Among the Raw-GDEMs, the best result is obtained with AW3D30 data (RMSE = 13.44 m) and the worst with ASTER GDEM (RMSE = 17.61 m), as in Test area-1. The ICESat-2 and GEDI results in Test area-2 (refer to Table S1-1 in Supplementary Material S1) are better than those of the Raw-GDEMs (refer to Table S1-4 in Supplementary Material S1). Figure 9 shows that the errors of the Imp-GDEMs are more concentrated close to zero. The CNN method improves the AW3D30 data best in this area, reducing the RMSE from 13.44 m to 10.71 m (refer to Table S1-4 in Supplementary Material S1). The results for Test area-2 are generally similar to those for Test area-1, but the accuracy gains appear larger for each Raw-GDEM in Test area-2 (refer to Table S1-4 in Supplementary Material S1).
Except for ASTER GDEM, the errors of the Raw-GDEMs in Test area-3 are predominantly negative (Figure 10). SRTM produced the most favorable Raw-GDEM result in this area, with an RMSE of 6.63 m, whereas ASTER GDEM produced the least favorable, with an RMSE of 16.6 m. Among the Imp-GDEMs, only ASTER GDEM improved with all methods, and the CNN technique yielded the best result, reducing the RMSE from 16.6 to 12.25 m. No improvement was observed when processing the AW3D30 data with the ANN method or the SRTM data with the CNN method in this area (refer to Table S1-4 in Supplementary Material S1). Unlike the other four test areas, no consistent improvement was evident in this area. One contributing factor may be that the GEDI data in Test area-3 (refer to Table S1-1 in Supplementary Material S1) are less accurate than the Raw-GDEMs (refer to Table S1-4 in Supplementary Material S1) and therefore could not improve them. In addition, Test area-3 has the lowest point density (Table 5). Moreover, GEDI exhibits a negative error distribution in this area (Figure 5). Because the SRTM and AW3D30 errors are already concentrated close to zero, adjusting these data toward the GEDI heights shifts their errors to the negative side (Figure 10). In contrast, since the ASTER GDEM data exhibit a positive error distribution (Figure 10), they are improved. For these reasons, ASTER GDEM was the only GDEM that could be refined in this test area.
The errors in Test area-4 are predominantly negative, except for ASTER GDEM (Figure 11). ASTER GDEM yielded the best result among the Raw-GDEMs in Test area-4, with an RMSE of 5.64 m. After improvement, SRTM produced the most favorable results with the ANN method, with the RMSE decreasing from 6.45 to 3.72 m (refer to Table S1-4 in Supplementary Material S1). In addition, the errors became concentrated close to zero after improvement (Figure 11). The best results for AW3D30 data in this area were achieved with the XGBoost method (refer to Table S1-4 in Supplementary Material S1).
All data in Test area-5 show a negative error distribution (Figure 12). For the raw data in Test area-5, AW3D30 yielded the best result, with an RMSE of 5.47 m, while ASTER GDEM was the least favorable, with an RMSE of 7.74 m (refer to Table S1-4 in Supplementary Material S1). Alganci, Besol, and Sertel (2018) conducted a study on the west side of Istanbul, Turkey, and obtained results similar to ours. Consistent with the literature, AW3D30 produced the best results and ASTER GDEM the least favorable in our study. However, the error values are higher, owing to the distinct characteristics of the study areas (Alganci, Besol, and Sertel 2018; Apeh et al. 2019; Santillan and Makinano-Santillan 2016). Notably, the AW3D30 data showed the most significant improvement among the modeled data (refer to Table S1-4 in Supplementary Material S1).
Comparing the methods, the CNN method clearly outperforms the others in Test areas 1, 2, and 3 (Figure 13). In Test area-1, the SRTM data improved by 18.8% with the CNN method. In Test area-2, the AW3D30 data improved by 20.31%. In Test area-3, the ASTER GDEM data improved by 26.2%. In Test area-4, the SRTM data improved notably, by 42.3%, with the ANN method, and in Test area-5 the CNN method yielded a 14.08% improvement for the ASTER GDEM data.
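Assuming the percentages above are relative RMSE reductions, 100 × (RMSE_raw − RMSE_improved) / RMSE_raw, the raw and improved RMSE values reported earlier in this section reproduce them for Test areas 1 to 4 (the improved RMSE for Test area-5 is not listed here). `improvement_pct` is a hypothetical helper.

```python
def improvement_pct(rmse_raw: float, rmse_improved: float) -> float:
    """Relative RMSE reduction in percent (positive = improvement)."""
    return 100.0 * (rmse_raw - rmse_improved) / rmse_raw

# (raw RMSE, improved RMSE) in metres, as reported in this section
cases = {
    "Test area-1, SRTM, CNN": (14.55, 11.81),
    "Test area-2, AW3D30, CNN": (13.44, 10.71),
    "Test area-3, ASTER GDEM, CNN": (16.6, 12.25),
    "Test area-4, SRTM, ANN": (6.45, 3.72),
}
for name, (raw, imp) in cases.items():
    print(f"{name}: {improvement_pct(raw, imp):.2f}%")
```

Relative reductions make areas with very different error levels comparable: the 2.73 m gain in Test area-4 is a much larger share of its 6.45 m starting error than a similar absolute gain in Test area-2.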

Land cover based assessment
In this section, we compare the land cover results of the Raw-GDEMs and improved GDEMs in terms of RMSE. In Test area-1, which comprises 5 classes, the trees class covers the largest area, 76% (Table 4). The crops class had the lowest RMSE, while the worst results were observed in areas covered by trees (Figure 14). With the CNN method, the RMSE of AW3D30 data in the trees class decreased from 12.55 to 9.71 m (refer to Table S1-5 in Supplementary Material S1). In Test area-2, which consists of 6 classes with the trees class covering the largest area (90%, Table 4), the crops class had the lowest RMSE, while the poorest results were again in areas covered by trees (Figure 14). With the CNN method, the RMSE of the AW3D30 data for the trees class in Test area-2 decreased from 13.85 to 10.96 m (refer to Table S1-5 in Supplementary Material S1). In Test area-3, which features 5 classes with rangeland as the largest (42%, Table 4), the RMSE of ASTER GDEM data in the rangeland class decreased from 16.54 to 11.82 m with the CNN method (refer to Table S1-5 in Supplementary Material S1). In Test area-4, which comprises 6 classes with built area as the largest (59%, Table 4), the RMSE of SRTM data in the built area class decreased from 6.59 to 4.02 m with the ANN method (refer to Table S1-5 in Supplementary Material S1). Lastly, in Test area-5, which includes 7 classes with built area as the largest (44%, Table 4), the RMSE of ASTER GDEM data in the built area class decreased from 7.52 to 6.35 m with the CNN method (refer to Table S1-5 in Supplementary Material S1).
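Per-class figures such as those above come from grouping the point-wise height errors by the land cover class at each check point and computing the RMSE within each group. Reporting the point count next to each RMSE helps flag classes, such as water in Test area-1, where few points make the estimate unreliable. A minimal sketch with a hypothetical helper name and made-up data:

```python
import numpy as np

def rmse_by_class(errors, classes):
    """RMSE and point count of height errors for each land cover class."""
    errors = np.asarray(errors, dtype=float)
    classes = np.asarray(classes)
    result = {}
    for c in np.unique(classes):
        e = errors[classes == c]
        result[str(c)] = (float(np.sqrt(np.mean(e ** 2))), int(e.size))
    return result

# Hypothetical errors (m) at five check points and their land cover labels
errors = [1.0, -1.0, 3.0, -3.0, 2.0]
classes = ["Trees", "Trees", "Built area", "Built area", "Water"]
print(rmse_by_class(errors, classes))
```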
In the land cover-based comparison, the improvements are clearly larger in classes where the ICESat-2 and GEDI data are highly accurate. The water areas of Test area-2 also improved (see Table S1-5 in Supplementary Material S1), but our methodology still has limitations over water. A water surface is relatively flat and has nearly the same height at every point, whereas our method corrects each pixel of the raw GDEM individually, so the error can increase in water areas. The most notable improvements occurred in the trees and built area classes across all test areas. This is consistent with the literature, in which the forest class tends to yield less favorable results than other classes and therefore leaves the most room for improvement (Santillan and Makinano-Santillan 2016; Uuemaa et al. 2020). Consequently, our method, coupled with ICESat-2 and GEDI data, offers a viable means of terrain determination in forested and built areas.

Slope-based assessment
The error increases with slope across all study areas and data types (see Table S1-6 in Supplementary Material S1). The improvements become particularly noticeable from the 9-15% slope group upward (Figure 15). In Test area-1, the most significant improvement is observed in the SRTM data (see Table S1-6 in Supplementary Material S1), with substantial gains in areas with slopes of 9-15% and higher (Figure 15). Improvement trends are also evident across all slope groups and data types in Test area-2 (Figure 15). In the 15-30% slope category, the highest slope group in Test area-2, the RMSE of AW3D30 decreased from 13.87 to 10.79 m (see Table S1-6 in Supplementary Material S1). In Test area-3, the ASTER GDEM data improved significantly in all slope groups except 0-3% (see Table S1-6 in Supplementary Material S1). In the highest slope group, 15-30%, its RMSE decreased from 17.17 to 12.12 m. For Test area-4, considerable improvements are noted in all slope groups except the 30-60% group of the SRTM data (see Table S1-6 in Supplementary Material S1). Similarly, significant improvements are observed for all data in Test area-5 in every slope group except 30-60% (see Table S1-6 in Supplementary Material S1). In general, the slope-dependent errors of the Raw-GDEMs agree with findings from other studies (Shetty et al. 2022; Uuemaa et al. 2020), and the methodology employed in this study provided notable improvements.
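The slope groups above are expressed as percent slope, which is 100 times rise over run. The sketch below derives percent slope from a DEM grid using NumPy's central differences and then assigns each cell to a slope group. It rests on several assumptions. The text names only the 0-3, 9-15, 15-30 and 30-60% groups, so the 3-9% group is our assumption to close the gap. The cell size is also assumed, and the study may have computed slope with a different algorithm or GIS software. The function names are hypothetical.

```python
import numpy as np

# Slope-group edges in percent. The text names 0-3, 9-15, 15-30 and 30-60%;
# the 3-9% group is an assumption made to close the gap.
SLOPE_EDGES = [0.0, 3.0, 9.0, 15.0, 30.0, 60.0]
SLOPE_LABELS = ["0-3%", "3-9%", "9-15%", "15-30%", "30-60%"]


def percent_slope(dem, cell_size):
    """Percent slope (100 * rise / run) of a DEM grid with square cells of cell_size metres."""
    # np.gradient returns the derivative along rows (y) first, then columns (x)
    dz_dy, dz_dx = np.gradient(np.asarray(dem, dtype=float), cell_size)
    return 100.0 * np.hypot(dz_dx, dz_dy)


def slope_group(slope_pct):
    """Label each slope value with its group; values outside [0, 60) or NaN get None."""
    slope_pct = np.asarray(slope_pct, dtype=float)
    # digitize returns i such that edges[i-1] <= x < edges[i]
    idx = np.digitize(slope_pct, SLOPE_EDGES) - 1
    labels = np.full(slope_pct.shape, None, dtype=object)
    for i, label in enumerate(SLOPE_LABELS):
        labels[idx == i] = label
    return labels


# Hypothetical 30 m grid rising 10 m per 100 m towards the east (10% slope)
x = np.arange(5) * 30.0
dem = np.tile(0.10 * x, (4, 1))
slope = percent_slope(dem, 30.0)
groups = slope_group(slope)
print(slope.round(2))
print(groups[0])
```

Once every check point carries a slope-group label, per-group RMSEs of raw and improved GDEMs can be computed by grouping the elevation differences by these labels.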

Conclusion
In this study, we propose a method to enhance the vertical accuracy of GDEMs using new-generation LiDAR altimetry data from GEDI and ICESat-2. We evaluated the method in five areas using various accuracy metrics and ground truth data. Among the Raw-GDEMs, AW3D30 yields the best results in Test area-1 (RMSE = 11.7 m), Test area-2 (RMSE = 10.71 m), and Test area-5 (RMSE = 5.47 m). SRTM and ASTER GDEM give the best results in Test area-3 (RMSE = 6.63 m) and Test area-4 (RMSE = 5.64 m), respectively.
GEDI consistently provides more points than ICESat-2 in all areas. Integrating ICESat-2 and GEDI data improved all datasets except SRTM in Test area-3. Among the methods, CNN emerged as the most successful. The largest gains were observed in the trees class, and the method also performed better in areas with high slopes.
In the literature, GDEMs commonly show high errors in steep and forested areas. In this study, however, improvements were achieved in all study areas, including the high-slope groups and forested areas. Narin and Gullu (2023) stated that DEM production using GEDI and ICESat-2 data is not currently possible due to data scarcity and that improving existing DEMs is a better option. The findings of this study indicate that integrating ICESat-2 and GEDI data substantially improves GDEM accuracy. As the number and accuracy of LiDAR points increase, new models are expected to provide even better results. Future studies could carry out dedicated investigations of terrain determination in forested areas.

Figure 1. Left column: location of the five test areas; middle column: their relief; right column: land use maps.

Figure 4. Spatial distribution of the GNSS elevation data for Test area-4 and Test area-5.

Figure 7. RMSEs of ICESat-2 and GEDI data according to land cover types.

Figure 13. Improvement rate (%) of the methods in each test area.

Figure 14. RMSE of Raw-GDEMs and Imp-GDEMs data according to land cover classes.

Figure 15. RMSE of Raw-GDEMs and Imp-GDEMs data according to slope groups.

Table 2. Spatial information on the five test areas.

Table 3. Specifications of the GDEMs used in this study.

Table 4. Land cover information of the five test areas.

Figure 2. The workflow of GDEM improvement using GEDI and ICESat-2 data.

Table 5. Point densities of the ICESat-2 and GEDI datasets.

Table 6. Information on the ALS-derived DEMs used as reference data for Test areas 1-3.
*Above ground level.