Dataset of high temperature extremes over the major land areas of the Belt and Road for 1979-2018

ABSTRACT Hot temperature extremes exert strong negative socioeconomic and environmental impacts worldwide. In this paper, we focus on the major land areas of the Belt and Road (BR) and develop a dataset of high temperature extremes including hot days, hot nights and combined hot extremes using Climate Prediction Center (CPC) gridded daily temperature data at a resolution of 0.5° during 1979–2018. The constant thresholds of daily maximum surface air temperature (Tmax) include 28°C, 30°C, 32°C, 35°C, 38°C, and 40°C, while those of daily minimum surface air temperature (Tmin) consist of 20°C, 25°C, and 30°C. Data of hot days, hot nights and combined hot extremes are produced based on different constant Tmax and Tmin thresholds and their combinations. We also adopt the 90th and 95th percentile thresholds of Tmax and Tmin to generate high temperature extreme data. The validation with China Meteorological Administration (CMA) data over eastern China for the period of 1979–2011 indicates that our dataset can generally describe the spatial and temporal variations of high temperature extremes well. Potential applications of this dataset include analyses of the local-to-regional characteristics of high temperature extremes and their impacts on various sectors over the major BR land areas. The dataset is available at http://www.sciencedb.cn/dataSet/handle/904.


Introduction
Our planet is experiencing surface warming at a rapid pace. The mean land surface air temperature of 2006-2015 has increased by 1.53°C with respect to the pre-industrial level, much faster than the global mean warming rate (Intergovernmental Panel on Climate Change [IPCC], 2018, 2019). Under ongoing climate change, extreme heat events become more frequent and more intense (Intergovernmental Panel on Climate Change [IPCC], 2012; Perkins, Alexander, & Nairn, 2012; Seneviratne, Donat, Mueller, & Alexander, 2014). Many record-breaking high temperature events have been recorded worldwide during the last few years, and the rate at which new records are set is projected to rise further in the future (Lehner, Deser, & Sanderson, 2018; Power & Delage, 2019). High temperature extremes can cause extensive damage to both human society and the ecological environment (Obradovich, Migliorini, Mednick, & Fowler, 2017; Wang et al., 2019; Watts et al., 2018; Wernberg et al., 2012). They may cause loss of human life, agricultural yield reduction, water resource scarcity and more frequent wildfires, among many other impacts (Gasparrini et al., 2015; Mora et al., 2017; WMO, 2019). The global and regional historical datasets of extreme climate indices developed so far remain limited (Caesar, Alexander, & Vose, 2006). For example, Donat et al. (2013) generated the HadEX2 dataset of extreme temperature and precipitation indices at a resolution of 3.75° longitude × 2.5° latitude for the period of 1901-2010.
The Belt and Road Initiative focuses primarily on Asia, Europe and Africa, and welcomes the participation of all partners around the world (Office of the Leading Group for Promoting the Belt and Road Initiative, 2019). The major Belt and Road (BR) land areas have diverse climate conditions and fragile environments, and face increasing threats from climate-related disasters such as extreme heat under global warming (Guo, 2018; Zhang, Zhou, Zou, Zhang, & Chen, 2018). For example, the warming-induced retreat of glaciers on the Tibetan Plateau and other regions of the "third pole" threatens water supplies and ecological well-being throughout Asia (Gao, Yao, Masson-Delmotte, Steen-Larsen, & Wang, 2019). Extreme heat events such as the 2010 Russian heatwave have frequently taken place in different parts of the major BR land areas in recent years, causing considerable damage to human health, agricultural production, and other sectors (China Meteorological Administration, 2014; Dodla, Satyanarayana, & Desamsetti, 2017; Grumm, 2011; Valleron, 2007). The adverse impacts of weather and climate extremes such as hot events are projected to become more frequent and severe over the major BR land areas in the future (Intergovernmental Panel on Climate Change [IPCC], 2014; Nangombe et al., 2018; Zhang, Zhuang, et al., 2018, 2019a), and the people living in less developed areas are much more vulnerable to climate-related disasters (Pelling & Garschagen, 2019). High temperature extremes data provide a fundamental basis for understanding their spatial-temporal variations, impacts and risks at various scales.
In this study, we produce a dataset of high temperature extremes over the major BR land areas during 1979-2018 at a spatial resolution of 0.5°. Indices including hot days, hot nights and combined hot extremes are used to describe the frequencies of high temperature extremes. As the absolute temperature thresholds have been commonly used in both previous studies and practical applications (WMO, 2019; Zhang, Yang, Wu, & Yang, 2019b), we focus mainly on the production of high temperature extreme data based on constant temperature thresholds. Humans, animals, plants and so on can acclimatize to the local temperature and other climatic conditions, and therefore we take different constant thresholds of the daily maximum surface air temperature (Tmax), the daily minimum surface air temperature (Tmin), and their combination to define hot days, hot nights and combined hot extremes. In addition, we generate high temperature extreme data based on the 90th and 95th percentile thresholds of Tmax and Tmin. The users can select high temperature extremes with appropriate thresholds according to their specific research and application purposes from the dataset of high temperature extremes over the major BR land areas.

Data source
In this dataset, we define the major BR region as covering the areas of 12°S-66°N in latitude and 18°W-148°E in longitude. The major BR land areas in this study cover most of Asia and Europe, and northern Africa. The Tmax and Tmin data at a spatial resolution of 0.5° for the period of 1979-2018 used to produce the dataset of high temperature extremes were taken from the Climate Prediction Center (CPC) Global Gridded Temperature dataset developed by the National Oceanic and Atmospheric Administration (NOAA) of the United States of America (https://www.esrl.noaa.gov/psd/). To develop the CPC gridded data, the Tmax and Tmin data from 6000-7000 stations across the globe were interpolated through the Shepard algorithm with consideration of orographic effects (Fan & Van den Dool, 2008; Peterson & Vose, 1997; Ropelewski, Janowiak, & Halpert, 1984). The missing Tmax and Tmin values in the CPC dataset each account for less than 0.17% of the total records for the period of 1979-2018. Before producing the high temperature extremes dataset, we fill each missing Tmax or Tmin value in the CPC dataset with the average of the values on the same calendar day in those years of 1979-2018 for which data are available.
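The gap-filling step described above can be illustrated with a short sketch. It is not the authors' code: the function name, the (time, lat, lon) array layout and the use of NaN to mark missing values are assumptions made for illustration only.

```python
import numpy as np


def fill_with_calendar_day_mean(temps, day_of_year):
    """Fill missing values with the multi-year mean of the same calendar day.

    temps       : float array (time, lat, lon), NaN marks a missing value
                  (assumed layout, not taken from the paper)
    day_of_year : int array (time,), calendar-day label of each time step
    """
    filled = temps.copy()
    for doy in np.unique(day_of_year):
        idx = np.where(day_of_year == doy)[0]
        block = temps[idx]                       # (n_years, lat, lon)
        valid = ~np.isnan(block)
        count = valid.sum(axis=0)                # years with data per cell
        total = np.where(valid, block, 0.0).sum(axis=0)
        # Mean over the years that have data; stays NaN if no year has data.
        clim = np.where(count > 0, total / np.maximum(count, 1), np.nan)
        filled[idx] = np.where(valid, block, clim)
    return filled


# One grid cell, the same calendar day in three years, middle year missing.
tmax = np.array([[[30.0]], [[np.nan]], [[34.0]]])
doy = np.array([200, 200, 200])
filled = fill_with_calendar_day_mean(tmax, doy)
```

In this toy case the missing value is replaced by the mean of the two available years, while the observed values are left unchanged.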

Dataset of high temperature extremes
The dataset of high temperature extremes includes three sub-datasets of hot days, hot nights and combined hot extremes for the period of 1979-2018. The threshold at which the surface air temperature can cause hazardous conditions varies with location, and therefore we use different absolute temperature thresholds to describe the indices of high temperature extremes based on previous studies (e.g. Vaghefi et al., 2019; Zhang et al., 2019b; Zhang et al., 2011). Hot days are defined as days in a given year on which Tmax is at least 28°C, 30°C, 32°C, 35°C, 38°C, or 40°C. Hot nights are days on which Tmin is at least 20°C, 25°C, or 30°C. Combined hot extremes are defined as days on which Tmax and Tmin are at least 30°C and 20°C, 35°C and 25°C, or 40°C and 30°C, respectively. Figure 1 is a flow diagram showing the procedure used to produce this dataset of high temperature extremes based on the absolute temperature thresholds. We also develop data of hot days, hot nights and combined hot extremes with the 90th and 95th percentile temperature thresholds of the baseline period 1979-2008 using the missing values-filled CPC gridded data.
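As a minimal sketch of the definitions above, the annual counts for the absolute thresholds could be computed as follows. The threshold values come from the text; the function name, the dictionary keys and the (time, lat, lon) array layout are hypothetical and chosen only for illustration.

```python
import numpy as np

# Thresholds in °C as listed in the text.
TMAX_THRESHOLDS = (28, 30, 32, 35, 38, 40)
TMIN_THRESHOLDS = (20, 25, 30)
COMBINED_THRESHOLDS = ((30, 20), (35, 25), (40, 30))


def annual_counts(tmax, tmin, years):
    """Count hot days, hot nights and combined hot extremes per year.

    tmax, tmin : float arrays (time, lat, lon) in °C, gaps already filled
    years      : int array (time,), calendar year of each day
    Returns the sorted years and a dict of arrays (n_years, lat, lon).
    The dict keys ("hd30", "hn20", "che30_20", ...) are illustrative names.
    """
    year_axis = np.unique(years)

    def count(mask):
        # Number of days per year on which the condition holds.
        return np.stack([mask[years == y].sum(axis=0) for y in year_axis])

    counts = {}
    for t in TMAX_THRESHOLDS:
        counts[f"hd{t}"] = count(tmax >= t)
    for t in TMIN_THRESHOLDS:
        counts[f"hn{t}"] = count(tmin >= t)
    for tx, tn in COMBINED_THRESHOLDS:
        counts[f"che{tx}_{tn}"] = count((tmax >= tx) & (tmin >= tn))
    return year_axis, counts


# One grid cell, two days in each of two years.
tmax = np.array([31.0, 29.0, 36.0, 30.0]).reshape(4, 1, 1)
tmin = np.array([21.0, 19.0, 26.0, 20.0]).reshape(4, 1, 1)
years = np.array([2000, 2000, 2001, 2001])
year_axis, counts = annual_counts(tmax, tmin, years)
```

The comparison uses `>=`, matching the "exceeds or is equal to" wording, so a day with Tmax of exactly 30°C counts as a hot day for the 30°C threshold.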

Case analysis methods
We consider Southwest Asia (15°N-38°N, 40°E-61°E), which is severely influenced by extreme heat events, to conduct a case analysis. Empirical Orthogonal Function (EOF) analysis decomposes the variations of a variable field into paired spatial patterns and time series, of which the first few can reflect the major features (North, Bell, Cahalan, & Moeng, 1982). We adopt the EOF analysis to extract the main spatial and temporal components of hot days, hot nights and combined hot extremes defined using the highest constant temperature thresholds over Southwest Asia for the period of 1979-2018. The linear trends of high temperature extremes averaged over Southwest Asia during 1979-2018 are computed through the least squares regression method. The significance levels of the linear trends are judged based on Student's t-test with consideration of the decrease in degrees of freedom due to the autocorrelation of the high temperature extremes series.
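For readers unfamiliar with EOF analysis, the decomposition can be sketched with a singular value decomposition of the anomaly matrix. This is one common way to compute EOFs, not necessarily the procedure used here; in particular, the sketch applies no latitude (area) weighting, and the function name and the synthetic input are assumptions.

```python
import numpy as np


def eof_analysis(field, n_modes=2):
    """EOF decomposition of a (time, space) field via SVD.

    Returns the spatial patterns (n_modes, space), the time coefficients
    (time, n_modes) and the fraction of total variance of each mode.
    No area weighting is applied in this sketch.
    """
    anomalies = field - field.mean(axis=0)       # remove the time mean per cell
    u, s, vt = np.linalg.svd(anomalies, full_matrices=False)
    explained = s**2 / np.sum(s**2)              # variance fraction per mode
    pcs = u[:, :n_modes] * s[:n_modes]           # time coefficients
    return vt[:n_modes], pcs, explained[:n_modes]


# Synthetic example: 40 "years" at 30 grid cells, one dominant pattern plus noise.
rng = np.random.default_rng(0)
pattern = np.linspace(1.0, 2.0, 30)
field = np.outer(np.arange(40.0), pattern) + 0.1 * rng.standard_normal((40, 30))
eofs, pcs, explained = eof_analysis(field, n_modes=2)
```

The sign of each EOF and of its time series is arbitrary (both can be flipped together), so only their product is physically meaningful.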

Data records
The domain of the dataset covers the land areas of the major BR region, which ranges from 12°S to 66°N in latitude and from 18°W to 148°E in longitude, at a spatial resolution of 0.5°. The dataset of high temperature extremes produced at annual scales for 1979-2018 has 12 files in NetCDF format for the absolute temperature thresholds. Each file corresponds to one threshold or threshold combination. For example, the file 1979-2018.OBOR.30hd.nc refers to the sub-dataset of hot days with the Tmax threshold of 30°C, the file 1979-2018.OBOR.20hn.nc represents the data of hot nights with the Tmin threshold of 20°C, and the file 1979-2018.OBOR.3020.nc contains the data of combined hot extremes with the Tmax and Tmin thresholds of 30°C and 20°C, respectively. The size of each file is 7.91 MB. Figure 2 presents mean hot days with 6 thresholds, hot nights with 3 thresholds and combined hot extremes with 3 threshold combinations for the period of 1979-2018 over the major BR land areas. The high temperature extremes are clearly sensitive to both the threshold and the location.
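The 12 files for the absolute thresholds follow from 6 Tmax thresholds, 3 Tmin thresholds and 3 threshold combinations. The sketch below enumerates file names by extrapolating the pattern of the three names quoted above; only those three names are confirmed by the text, so the remaining nine are assumptions that should be checked against the repository listing.

```python
# File-name prefix taken from the three examples quoted in the text.
PREFIX = "1979-2018.OBOR"

TMAX_THRESHOLDS = (28, 30, 32, 35, 38, 40)
TMIN_THRESHOLDS = (20, 25, 30)
COMBINED_THRESHOLDS = ((30, 20), (35, 25), (40, 30))


def absolute_threshold_file_names():
    """List presumed file names for the absolute-threshold sub-datasets.

    The "<T>hd", "<T>hn" and "<Tmax><Tmin>" suffixes are extrapolated from
    the three documented examples and are not guaranteed for the other files.
    """
    names = [f"{PREFIX}.{t}hd.nc" for t in TMAX_THRESHOLDS]                # hot days
    names += [f"{PREFIX}.{t}hn.nc" for t in TMIN_THRESHOLDS]               # hot nights
    names += [f"{PREFIX}.{tx}{tn}.nc" for tx, tn in COMBINED_THRESHOLDS]   # combined
    return names


file_names = absolute_threshold_file_names()
```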
There are 6 files for high temperature extreme data based on the percentile thresholds. We define hot days using the thresholds of Tmax ≥ 90th percentile and Tmax ≥ 95th percentile, hot nights with the thresholds of Tmin ≥ 90th percentile and Tmin ≥ 95th percentile, and combined hot extremes with the thresholds of Tmax ≥ 90th percentile and Tmin ≥ 90th percentile, and Tmax ≥ 95th percentile and Tmin ≥ 95th percentile. The 90th and 95th percentile thresholds at every grid cell are computed for the baseline period 1979-2008 using the missing values-filled CPC gridded data. Figure 3 presents the distributions of mean hot days, hot nights and combined hot extremes based on the 90th and 95th percentile thresholds and their time series averaged over the major BR land areas for the period of 1979-2018. The high temperature extreme dataset produced in this study using both the absolute and percentile thresholds is publicly available at http://www.sciencedb.cn/dataSet/handle/904. The dataset is based on CPC global gridded temperature data during the period 1979-2018, and it can be updated in the future when new data become available.
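The percentile-based indices can be sketched in the same way as the absolute ones. The text does not state whether the percentiles are taken over all days of the baseline period or over a calendar-day or seasonal window; the sketch below assumes all baseline days at each grid cell, and the function name and array layout are likewise hypothetical.

```python
import numpy as np


def percentile_exceedance_counts(temps, years, q, baseline=(1979, 2008)):
    """Annual counts of days at or above a baseline percentile threshold.

    temps    : float array (time, lat, lon), gaps already filled
    years    : int array (time,), calendar year of each day
    q        : percentile, e.g. 90 or 95
    baseline : first and last year of the baseline period (inclusive)
    Assumes the percentile is computed over all baseline days per grid cell.
    """
    in_baseline = (years >= baseline[0]) & (years <= baseline[1])
    threshold = np.percentile(temps[in_baseline], q, axis=0)   # (lat, lon)
    exceed = temps >= threshold
    year_axis = np.unique(years)
    counts = np.stack([exceed[years == y].sum(axis=0) for y in year_axis])
    return threshold, counts


# One grid cell: ten baseline days (1979) and ten warmer days (2009).
temps = np.concatenate([np.arange(1.0, 11.0), np.arange(6.0, 16.0)]).reshape(20, 1, 1)
years = np.repeat([1979, 2009], 10)
threshold, counts = percentile_exceedance_counts(temps, years, q=90)
```

Because the threshold is fixed by the baseline period, a warmer later year yields more exceedance days than the baseline year itself.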

Case analysis
Southwest Asia is confronted with severe impacts of extreme high temperature events on human health, agriculture and food security, and the natural environment, and we take this region (15°N-38°N, 40°E-61°E in this study) to perform the case analysis (Chung et al., 2014; Dosio, Mentaschi, Fischer, & Wyser, 2018; Watts et al., 2018). The highest absolute thresholds of high temperature extremes are used: Tmax ≥ 40°C for hot days, Tmin ≥ 30°C for hot nights, and Tmax ≥ 40°C and Tmin ≥ 30°C for combined hot extremes. The spatial patterns of the first EOF modes of hot days, hot nights and combined hot extremes for the period of 1979-2018 demonstrate that high temperature extremes vary simultaneously over most areas of Southwest Asia (Figure 4). The corresponding time series of the first EOF modes of high temperature extremes all show an increasing trend. The first EOF modes explain 50.7%, 46.5% and 57.6% of the total variances of hot days, hot nights and combined hot extremes, respectively (Figure 4). The large interdecadal changes in high temperature extremes around 1997 may be related to the tropical sea surface temperature anomalies, the Atlantic Multidecadal Oscillation and other factors (Arblaster & Alexander, 2012; Gao, Yang, et al., 2019; Kenyon & Hegerl, 2008), which warrant further analysis in the future. For the second EOF modes of the high temperature extremes, which account for 9-22% of the total variances, the patterns show distinct spatial differences and the year-to-year variations of the time coefficients are all strong (Figure 5).
As shown in Figure 6, hot days, hot nights and combined hot extremes averaged over Southwest Asia (15°N-38°N, 40°E-61°E) show consistent and significant (> 99% confidence level) increases of 6.42 days/decade, 3.15 days/decade and 3.17 days/decade, respectively, for the period of 1979-2018. In line with previous studies, these results indicate that hot extremes have become more severe and are thus more likely to cause heat-related losses over this region (Perkins-Kirkpatrick, Fischer, Angélil, & Gibson, 2017; Perkins-Kirkpatrick & Gibson, 2017; Zhang, Zhou, et al., 2018).
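The trend estimate and its significance test can be sketched as follows. The text specifies a least squares trend and a t-test with reduced degrees of freedom but not the exact adjustment; the sketch assumes the widely used effective sample size n_eff = n(1 - r1)/(1 + r1), with r1 the lag-1 autocorrelation of the regression residuals. The function name and the synthetic series are illustrative only.

```python
import numpy as np


def trend_with_effective_dof(years, series):
    """Least squares trend with a t statistic adjusted for autocorrelation.

    Returns the slope per decade, the effective sample size and the
    t statistic. The adjustment n_eff = n * (1 - r1) / (1 + r1) is an
    assumed, commonly used choice, not one stated in the paper.
    """
    years = np.asarray(years, dtype=float)
    series = np.asarray(series, dtype=float)
    n = series.size
    x = years - years.mean()
    slope = np.sum(x * (series - series.mean())) / np.sum(x**2)
    intercept = series.mean() - slope * years.mean()
    resid = series - (intercept + slope * years)
    # Lag-1 autocorrelation of the residuals; negative values are not
    # used to inflate the sample size.
    r1 = max(np.sum(resid[:-1] * resid[1:]) / np.sum(resid**2), 0.0)
    n_eff = n * (1.0 - r1) / (1.0 + r1)
    stderr = np.sqrt(np.sum(resid**2) / (n_eff - 2.0) / np.sum(x**2))
    return slope * 10.0, n_eff, slope / stderr


# Synthetic series: a trend of 0.5 days/year plus a slowly varying component.
years = np.arange(1979, 2019)
series = 0.5 * (years - 1979) + np.sin(np.arange(years.size))
slope_per_decade, n_eff, t_stat = trend_with_effective_dof(years, series)
```

The t statistic would then be compared with the critical value of Student's t distribution with n_eff - 2 degrees of freedom at the chosen confidence level.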

Data validation
To perform data validation, surface air temperature data at a resolution of 0.5° over eastern China are taken from the dataset of Gridded Daily Surface Air Temperature over China Version 2.0 (CGDV2, http://data.cma.cn/data/cdcdetail/dataCode/SURF_CLI_CHN_TEM_DAY_GRID_0.5.html), which is provided by the China Meteorological Administration (CMA). The period of 1979-2011 is selected because the CGDV2 data have no missing values in this period. Tmax, Tmin and the daily mean surface air temperature (Tmean) from 2472 observational stations were interpolated through the Thin Plate Spline method combined with 3D geospatial information to produce the gridded CGDV2 dataset at a resolution of 0.5° (Hutchinson, 1991). The observational stations are more uniformly distributed and much denser in eastern China than in western China, and the deviation and root-mean-square error are smaller in eastern China. Many more stations over eastern China are adopted in the CGDV2 dataset than in the CPC dataset. To evaluate the performance of the developed dataset, we compare hot days with a Tmax threshold of 30°C, hot nights with a Tmin threshold of 20°C and combined hot extremes with Tmax and Tmin thresholds of 30°C and 20°C over eastern China from the dataset produced in this study based on the CPC temperature data with the corresponding indices from the CGDV2 dataset for the period of 1979-2011. The mean hot days, hot nights and combined hot extremes for the period of 1979-2011 from the dataset developed in this study agree well with those based on the CGDV2 dataset in their spatial distribution patterns, which generally exhibit a south-north gradient. The highest values of hot days, hot nights and combined hot extremes appear over Southeast China. The spatial correlation coefficients of the three high temperature extreme indices between the CGDV2 and CPC datasets are all higher than 0.92.
The magnitudes of high temperature extremes based on the two datasets differ to some degree. For example, hot nights based on the CPC dataset are generally more frequent than those based on the CGDV2 dataset over eastern China (Figure 7). We further calculate the high temperature extremes based on both the CGDV2 and CPC datasets averaged over eastern China for 1979-2011 (Figure 8). The variations in each index of high temperature extremes are consistent between the CGDV2 and CPC datasets, with very similar standard deviations. The correlation coefficients of the high temperature extreme time series based on the two datasets are all higher than 0.97 for 1979-2011. Meanwhile, some biases exist, especially for hot nights. In summary, the dataset of high temperature extremes can generally capture the spatial and temporal variations of the high temperature indices well over eastern China.
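The two validation statistics discussed above, the correlation between two maps and their mean difference, can be sketched as below. The sketch uses an unweighted Pearson correlation over the grid cells that are valid in both datasets; whether area weighting was applied in this study is not stated, and the function names and toy values are assumptions.

```python
import numpy as np


def pattern_correlation(map_a, map_b):
    """Unweighted Pearson correlation over cells valid in both maps."""
    a, b = np.ravel(map_a), np.ravel(map_b)
    valid = ~np.isnan(a) & ~np.isnan(b)
    return float(np.corrcoef(a[valid], b[valid])[0, 1])


def mean_bias(map_a, map_b):
    """Mean of (map_a - map_b) over cells valid in both maps."""
    diff = np.ravel(map_a) - np.ravel(map_b)
    return float(np.mean(diff[~np.isnan(diff)]))


# Toy 2 x 2 maps of mean annual hot nights; NaN marks a cell without data.
cpc_map = np.array([[10.0, 20.0], [30.0, np.nan]])
cma_map = np.array([[21.0, 41.0], [61.0, 5.0]])
r = pattern_correlation(cpc_map, cma_map)
bias = mean_bias(cpc_map, cma_map)
```

The same `pattern_correlation` function applies to two regional-mean time series, since it only requires two arrays of equal size. A high correlation can coexist with a nonzero bias, which is the situation described for hot nights.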

Usage notes
As stated earlier, high temperature extremes are sensitive to both the thresholds and locations. Users can select the indices of high temperature extremes with appropriate thresholds based on their research and application objectives. For example, we select hot days with the threshold of Tmax ≥ 30°C, hot nights with the threshold of Tmin ≥ 20°C, and combined hot extremes with the thresholds of Tmax ≥ 30°C and Tmin ≥ 20°C over eastern China to perform data validation. The high temperature extremes dataset produced in this study is based on the CPC gridded Tmax and Tmin data, which were developed using global observational data. Note that the quality of the CPC gridded data is largely determined by the availability of observational stations. In particular, when this dataset is applied at a local scale, validation with the available station observations is recommended. A recent study demonstrated that the CPC temperature dataset has the best performance among several global gridded datasets over the central north region of Egypt, where observational stations are sparse (Nashwan, Shahid, & Chung, 2019). The dataset produced in this study can be used to assess the characteristics and various impacts of high temperature extremes at local-to-regional scales in the major BR land areas. When new daily Tmax and Tmin data become available, this dataset can be updated accordingly.

Disclosure statement
No potential conflict of interest was reported by the authors.