Evaluation of the HadISST1 and NSIDC 1850 onward sea ice datasets with a focus on the Barents-Kara seas

In recent years, long-term continuous sea-ice datasets have been developed, and they cover the periods before and after the satellite era. How these datasets differ from one another before the sate...


Introduction
Sea-ice data are of primary importance for understanding climate variability and change. During the past several decades, Arctic warming has been at least twice the global average (Blunden and Arndt 2012). One crucial factor for this amplified Arctic warming is the positive feedback between sea-ice reduction and warming. Physically, sea ice not only blocks solar radiation from entering the upper ocean but also affects the exchange of energy and water vapor between the atmosphere and the ocean surface (Li et al. 2018). Furthermore, sea ice plays an important role in midlatitude weather and climate (Yang, Xie, and Huang 1994; Huang and Gao 1999; Wu, Su, and Zhang 2011; Li and Wang 2013; Guo et al. 2014; Gao et al. 2015; Zuo et al. 2016; Wu, Yang, and Francis 2016). However, there are still uncertainties regarding the effect of sea ice on climate, because the strong internal variability of the atmosphere at mid-to-high latitudes may obscure the effects of sea ice (Walsh 2014; Overland et al. 2015). To understand the impact of sea ice more thoroughly and reduce the uncertainty, a longer sea-ice dataset is necessary.
However, reliable sea-ice data were unavailable until 1979, when satellite observations began. Thus, various sea-ice datasets were extended back to the 19th century and continued to the 21st century. One of the most often used is the sea-ice concentration dataset from the UK Met Office's Hadley Centre (HadISST1; Rayner et al. 2003; hereafter, 'Hadley dataset'). It has a horizontal resolution of 1.0° × 1.0° and a time span from 1870 to the present day. HadISST2 is an updated version of HadISST1, constructed by Titchner and Rayner (2014). Another sea-ice dataset is the Gridded Monthly Sea Ice Extent and Concentration dataset (SIBT1850). This dataset is from the National Snow and Ice Data Center (NSIDC) (Walsh, Chapman, and Fetterer 2015). Although the NSIDC provides a great number of different datasets, in this letter, 'the NSIDC dataset' refers to the SIBT1850 dataset. In comparison with the Hadley dataset, the NSIDC dataset has a finer horizontal resolution of 0.25° × 0.25° and spans from 1850 to the end of 2013. Additionally, the dataset has more sources (14 in total), such as whaling ship reports. Each source is identified by a specific number, which also serves as its rank. The sources are merged by a ranking hierarchy: where several sources could provide a sea-ice concentration value at a particular location, the higher-numbered source outranks the lower-numbered ones. How these datasets differ from one another, and whether one is more reliable than the other, is important but unclear, because the sea-ice record prior to 1979 is sparse and not continuous. Evaluating the quality of these two datasets constitutes the primary aim of this study.
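The rank-based merge described above can be illustrated with a minimal sketch. It assumes that a source's number is also its rank and that a missing value is represented by None; the function name, the source numbers and the concentration values are hypothetical and are not taken from the NSIDC processing code.

```python
from typing import Dict, Optional

def merge_by_rank(candidates: Dict[int, Optional[float]]) -> Optional[float]:
    """Pick the value from the highest-numbered source that has data.

    `candidates` maps a source number (assumed to double as its rank)
    to a sea-ice concentration in [0, 1], or None where the source
    has no value for this grid cell and month.
    """
    available = {rank: value for rank, value in candidates.items()
                 if value is not None}
    if not available:
        return None  # no source covers this cell
    return available[max(available)]

# Hypothetical grid cell covered by two of three sources.
cell = {5: 0.80, 10: 0.65, 12: None}
merged = merge_by_rank(cell)  # source 10 outranks source 5
```

In this sketch the higher-numbered source always wins outright; no blending between sources is attempted.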
Evaluation will be conducted only for the period after 1958. This is because, prior to 1958 (the first international geophysical year), the atmospheric variables upon which the reconstructed sea-ice extent (SIE) is based lacked systematic in-situ observations in the polar regions. Particular attention is given to the two sub-periods before and after the start of the satellite era: 1958-78 and 1979-2013, respectively. Winter sea ice in the Barents and Kara seas (BKS) is studied, because the sea ice in this region is more active and more closely related to climate anomalies (Wu, Huang, and Gao 1999; Sorokina et al. 2016).

Preliminary comparison of sea-ice variability
Before evaluating the two datasets, we conduct a preliminary comparison of their sea-ice variability. Using a bilinear interpolation method, the data obtained from the NSIDC dataset are interpolated from 0.25° × 0.25° to 1° × 1°, which is the same resolution as the Hadley dataset. The SIE is defined as the total area of the grid cells in which the sea-ice concentration is above 15%. Winter refers to December through February; for example, the winter of 1979 refers to December 1978 through February 1979. Figure 1 displays the historical evolution of the winter mean BKS SIE. Here, the BKS region refers to the domain shown as the green polygon in Figure 2(e) (70.5°-81.5°N, 15.5°-90.5°E). For the period of 1958-2013, the two datasets are consistent overall (Figure 1), with a high correlation coefficient of 0.91 (Table 1). This consistency is also observed in their standard deviation (not shown). However, when the period is separated into two sub-periods, before and after 1979, there are obvious differences between the two datasets.
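As an illustration of the SIE definition above, the following minimal sketch sums the area of 1° × 1° cells whose concentration exceeds 15% within the BKS bounds. It assumes a regular latitude-longitude grid given by cell centers, a spherical Earth of radius 6371 km, and a simple latitude-longitude box in place of the polygon in Figure 2(e); the function names and the sample values are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed spherical Earth

def cell_area_km2(lat_center, dlat=1.0, dlon=1.0):
    """Area of a lat-lon grid cell on a sphere, in km^2."""
    lat_s = math.radians(lat_center - dlat / 2)
    lat_n = math.radians(lat_center + dlat / 2)
    return (EARTH_RADIUS_KM ** 2 * math.radians(dlon)
            * (math.sin(lat_n) - math.sin(lat_s)))

def sea_ice_extent_km2(conc, lats, lons, threshold=0.15,
                       lat_bounds=(70.5, 81.5), lon_bounds=(15.5, 90.5)):
    """Sum the area of cells inside the box with concentration > threshold.

    `conc[i][j]` is the concentration (0-1) at lats[i], lons[j];
    None marks land or missing data and is skipped.
    """
    total = 0.0
    for i, lat in enumerate(lats):
        if not lat_bounds[0] <= lat <= lat_bounds[1]:
            continue
        area = cell_area_km2(lat)
        for j, lon in enumerate(lons):
            if not lon_bounds[0] <= lon <= lon_bounds[1]:
                continue
            c = conc[i][j]
            if c is not None and c > threshold:
                total += area
    return total

# Hypothetical 2 x 2 patch: only two cells exceed the 15% threshold.
lats = [75.5, 76.5]
lons = [40.5, 41.5]
conc = [[0.90, 0.10],
        [0.20, None]]
extent = sea_ice_extent_km2(conc, lats, lons)
```

Each qualifying cell contributes its full area regardless of its concentration, which is what distinguishes extent from ice area.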
For the satellite era after 1979, the consistency between the two datasets is most evident. From the interannual evolution of BKS SIE (Figure 1), the two datasets share the same years with more (less) sea ice in 2006 and 2010 (2007 and 2012). This high consistency is emphasized by the high correlation coefficient (0.95, Table 1). The standard deviation (Figure 2(a,b)) displays similar maxima in sea-ice interannual variability in the BKS, Greenland Sea, Labrador Sea, Bering Sea and Okhotsk Sea.
For the period 1958-78, the consistency reduces substantially. First, the correlation coefficient in BKS SIE between the two datasets reduces to 0.64 from 0.95 (Table 1). Second, the standard deviation of the winter monthly sea ice in the Northern Hemisphere exhibits a visually distinct difference over the Okhotsk Sea. This observation suggests a difference and uncertainty between the two datasets for the period prior to 1979. To exclude the potential contribution from the linear trend, we recalculated the correlation of BKS SIE for the detrended data and found the same result. Thus, the difference between the two datasets during the period prior to 1979 is not a result of the different trends in the datasets.
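The detrending check described above can be sketched as follows. This is an illustrative example, not the authors' code: it assumes a linear trend is removed from each series by least squares against the time index before the Pearson correlation is computed, and the two series are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def detrend(series):
    """Remove the least-squares linear trend (against the index) from a series."""
    n = len(series)
    t = list(range(n))
    mt, ms = sum(t) / n, sum(series) / n
    slope = (sum((ti - mt) * (si - ms) for ti, si in zip(t, series))
             / sum((ti - mt) ** 2 for ti in t))
    return [si - (ms + slope * (ti - mt)) for ti, si in zip(t, series)]

# Hypothetical series: identical interannual anomalies, opposite trends.
anomalies = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0]
series_a = [v - 0.2 * k for k, v in enumerate(anomalies)]
series_b = [v + 0.2 * k for k, v in enumerate(anomalies)]

raw_r = pearson(series_a, series_b)
detrended_r = pearson(detrend(series_a), detrend(series_b))
```

In this constructed case the opposite trends mask the shared interannual signal in the raw correlation, and detrending recovers it. In the letter the detrended correlation gives the same result as the raw one, which is why the trends are ruled out as the cause of the disagreement.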

Proxy for BKS sea ice
In the above section we illustrate the inconsistency in BKS sea-ice variability prior to 1979 between the two datasets. Which dataset is more reliable is an important issue. Here, we develop a reconstructed SIE based on the idea that the sea-ice variation in BKS is not isolated but closely related to surface air temperature (SAT) at adjacent islands. Also, the SAT record over land has a much longer time span and greater reliability.
First, sea ice is affected by atmospheric circulation. It also feeds back on the atmosphere, inducing SAT anomalies over the adjacent regions (Sorteberg and Kvingedal 2006; Deser and Teng 2008; Zhang et al. 2008; Overland, Wood, and Wang 2011; Wu, Overland, and D'Arrigo 2012; Luo et al. 2016). In other words, a correlation exists between sea ice and the SAT anomaly in the regions of ice-atmosphere interaction. Second, oceanic flow processes can also cause a correlation between the sea ice and the overlying atmosphere. For example, the sea-ice anomaly can act on the overlying atmosphere in a larger domain because of its larger heat content and longer persistence relative to the atmosphere (Wu et al. 2013), and the oceanic heat transport influencing sea-ice variation usually leads to warmer SAT over adjacent lands (Schlichtholz 2011; Pavlova, Pavlov, and Gerland 2014). The correlation of sea ice with SAT in adjacent lands provides a physical basis for using adjacent land SAT as a proxy of sea ice.
In Figure 3, simultaneous correlations between the winter BKS SIE and some atmospheric variables after 1979 are shown. Here, the BKS SIE is derived from the NSIDC data. Also, the SAT is from the version 3. [...] which has a horizontal resolution of 0.5° × 0.5° and a time span of 1900 through 2015. The 10-m wind, sea level pressure (SLP) and 500-hPa geopotential height (Z500) at 1.25° horizontal resolution are from the JRA-55 dataset (Kobayashi et al. 2015). Figure 3(a) indicates that BKS SIE is substantially negatively correlated with the SAT at the adjacent islands, namely, Novaya Zemlya and Franz Josef Land (yellow polygon in Figure 2(e)). Two factors may explain this negative correlation. First, the SLP anomalies corresponding to increased BKS sea ice are negative (Figure 3(b)), and the anomalous northerly wind in the northwestern region of the anomalous low-pressure zone transports cold polar air southward and induces colder SAT over the islands. The negative Z500 anomaly overlaps with the anomalous low, suggesting a barotropic atmospheric circulation anomaly (Figure 3(c)). A similar connection between BKS sea ice and the Z500 has been observed in previous studies (Luo et al. 2016; Sorokina et al. 2016). Second, the anomalous northerly tends to reduce the climatological surface southerly and causes colder SAT (Figure 3(d)). Therefore, the coherence of BKS sea ice with SAT at the islands of Novaya Zemlya and Franz Josef Land may be physically reasonable. Table 2 shows the high correlation coefficients between BKS SIE and SAT over Novaya Zemlya and Franz Josef Land after 1979. The coherence of BKS sea ice with SAT over adjacent lands and the physical basis for this coherence are also seen in the historical runs from 21 coupled models (Table 3 [...] series by linking the individual model results into one single long series.
From Figure S1(a), BKS SIE is negatively correlated with the SAT over the Arctic and Eurasian high latitudes, similar to the observations (Figure 3(a)). The correlation of SLP also bears some similarity to the observations (cf. Figures S1(b) and 3(b)). There is an east-west dipole in the high-latitude region, with negative correlation in Eurasia but positive correlation in North America, although the negative center over the Eurasian high latitudes shifts somewhat southward. The islands mentioned above are governed by the anomalous northerly wind on the east side of the anomalous high extending from eastern North America to Greenland and causing a southward transport of cold polar air that results in a colder SAT. The Z500 anomalies at high latitudes (Figure S1(c)) also appear similar to the observations, although less significant over Eurasia.
The above analysis again suggests that the correlation between BKS sea ice and the SAT over Novaya Zemlya and Franz Josef Land is physically reasonable. Thus, the domain-averaged SAT over Novaya Zemlya and Franz Josef Land (70°-82°N, 44°-70°E) can be used as a proxy to reconstruct the BKS SIE. Below, we further verify this point by comparing the proxy with the SIE in the observed dataset and the CMIP5 historical runs.
The winter mean SAT over Novaya Zemlya and Franz Josef Land is calculated from the CRU's observational global land SAT dataset or the CMIP5 models. Similarly, the BKS SIE can easily be derived. A substantially negative correlation between the BKS SIE and the domain-averaged SAT is seen in the two ice datasets (Table 2) and nearly all of the models (Table 3). The correlation coefficients of the domain-averaged SAT with the BKS SIE in the two observational datasets are −0.74 and −0.78 during the satellite era. Additionally, the correlation coefficients in more than two-thirds of the CMIP5 models (16 of 21 models) are less than −0.6. For 19 models (all models except FGOALS-g2 and IPSL-CM5A-LR), the correlation coefficients are smaller than −0.32, meaning that the correlations in these models are significant above the 90% confidence level. When the analysis period for the models is extended backward to 1960, the significant negative correlation in most of the models remains.
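How a significance threshold such as 0.32 relates to sample size can be illustrated with a short sketch. It uses the Fisher z approximation for a two-sided test and assumes independent samples; the letter does not state which test or sample size was used, so the function name and the choice of 27 winters are assumptions made only for illustration.

```python
import math
from statistics import NormalDist

def critical_r(n, confidence=0.90):
    """Approximate two-sided critical |r| for n independent samples.

    Uses the Fisher z transform: atanh(r) is roughly normal with
    standard deviation 1 / sqrt(n - 3) under the null hypothesis r = 0.
    """
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return math.tanh(z / math.sqrt(n - 3))

# Assumed sample size, for illustration only (not stated in the letter).
r_27 = critical_r(27)
r_35 = critical_r(35)
```

Under these assumptions, 27 samples give a threshold close to 0.32, and a longer series lowers the threshold. Autocorrelation in the series would reduce the effective sample size and raise it.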
Thus, we used the proxy to establish a reconstructed BKS SIE by linear regression:
y = ax + b    (1)
Here, y is the reconstructed BKS SIE, and x is the domain-averaged SAT (proxy). The coefficients a and b are obtained by least-squares fitting of the observed BKS SIE to the proxy over the period 1979-2013, and have the values −0.035 and 0.456, respectively. As a validation, the SAT-based reconstructed BKS SIE from the regression model is compared with the observed sea ice for 1979-2013. The reconstructed SIE correlates well with the SIE in the two datasets, with correlation coefficients of 0.74 and 0.78 (Table 4), respectively. From Figure 4(b), the reconstructed SIE sufficiently captures the variation of the observed SIE. This agreement indicates that the SAT-based reconstructed SIE is an appropriate representation of the sea ice. Because the SAT data are more reliable and span a longer period than the sea-ice data prior to 1979, the proxy provides a valuable approach to evaluate the sea ice prior to the satellite era. In the next section, we use the reconstructed SIE as a benchmark to evaluate the sea ice prior to 1979.
Table 2. Correlation coefficients of BKS SIE in the Hadley and NSIDC datasets with the domain-averaged SAT over Novaya Zemlya and Franz Josef Land for three periods. Values in brackets are the results after detrending.
Figure 4 compares the BKS SIE in the two datasets with the reconstructed BKS SIE. Given the differences noted above, it is unsurprising that the evolution of the SIE prior to 1979 (i.e., 1958-78) differs between the two sea-ice datasets. The [...] Tables 2 and 4). This finding suggests that the quality of the sea-ice data from NSIDC is better than that of the data from Hadley.
Over the whole period from 1958 to 2013, the correlation of BKS SIE with the reconstructed SIE is also lower in the Hadley dataset (0.64) than in the NSIDC dataset (0.76), in agreement with this assessment. Thus, the interannual variability of BKS sea ice in the NSIDC data is relatively more reliable. The greater reliability of the NSIDC sea-ice data prior to 1979 is consistent with the standard deviation distribution. As mentioned in section 2, the standard deviation for the period before 1979 in the two datasets exhibits a substantial difference, particularly in the Okhotsk Sea (Figure 2(f)). When comparing the sea-ice standard deviation before and after 1979, the NSIDC data before 1979 bear a greater resemblance not only to the NSIDC data after 1979 but also to the Hadley data after 1979. To some extent, this result further verifies the greater reliability of the NSIDC data before 1979.
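The regression of Eq. (1) and the benchmark comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, all input series are synthetic, and the units of SAT and SIE are not specified here. The synthetic calibration data are built from the reported coefficients only so that the fit can be checked.

```python
import math

def fit_line(x, y):
    """Least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    a = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return a, my - a * mx

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    return sxy / math.sqrt(sum((p - mx) ** 2 for p in x)
                           * sum((q - my) ** 2 for q in y))

# Step 1: calibrate on the satellite-era period (synthetic values).
sat_calib = [-20.0, -18.0, -15.0, -12.0, -10.0]
sie_calib = [-0.035 * s + 0.456 for s in sat_calib]
a, b = fit_line(sat_calib, sie_calib)

# Step 2: reconstruct the pre-satellite SIE from the proxy (synthetic SAT).
sat_early = [-22.0, -19.0, -21.0, -17.0, -20.0]
sie_recon = [a * s + b for s in sat_early]

# Step 3: correlate each dataset's pre-satellite SIE with the benchmark.
dataset_a = [1.25, 1.10, 1.20, 1.05, 1.15]  # hypothetical dataset
dataset_b = [1.10, 1.20, 1.12, 1.18, 1.21]  # hypothetical dataset
r_a = pearson(sie_recon, dataset_a)
r_b = pearson(sie_recon, dataset_b)
```

The dataset with the higher correlation tracks the SAT-based benchmark more closely, which is the criterion applied to the Hadley and NSIDC data in this section.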

Conclusions and discussion
In this letter, the quality of sea ice in the BKS before 1979 in two datasets, one from the UK's Hadley Centre and the other from NSIDC, is investigated. The sea-ice proxy is the winter-mean (December-January-February) SAT averaged over the islands of Novaya Zemlya and Franz Josef Land. The sea ice reconstructed from this proxy is used as a benchmark. The results suggest that the winter BKS sea-ice quality in the NSIDC data is higher than that in the Hadley data for the period 1958-78, although both datasets are substantially consistent with each other and reasonable after 1979. Here, quality refers to how well the interannual variability of the sea ice is represented. The better quality of the dataset from NSIDC may be related to the data sources used and the analog method used to fill in temporal gaps. By checking the data sources used in the BKS, we found that both the Walsh and Johnson data (source No. 5) and the Russian Arctic and Antarctic Research Institute (AARI) data (source No. 10) had values. According to the ranking method described in the introduction, the AARI data are used instead of the Walsh and Johnson data, which are different from the Hadley data. The analog method is used to fill in temporal gaps in the NSIDC dataset. It is possible that the different methods used to fill temporal gaps may also lead to different results.
The impact of the sea ice in BKS on the atmosphere is still controversial and deserves further study (Wu, Su, and D'Arrigo 2015; Walsh 2014; Kelleher and Screen 2018). There is a need for a reliable sea-ice dataset that encompasses a long time period. The present study suggests that the sea ice from NSIDC is more appropriate for such studies. Semenov and Latif (2015) demonstrated that winter sea-ice concentrations in BKS show an obvious positive bias during 1966-1969 relative to those during 1971-2000 in the Hadley dataset (Figure S2(a)). When a similar comparison for the two periods using the NSIDC dataset is conducted, no evident positive bias is seen (Figure S2(b)). This result suggests that the bias may result from the Hadley dataset itself.
Here, we only choose the SAT at the islands of Novaya Zemlya and Franz Josef Land as the proxy. One may wonder why the SAT over Svalbard is not used, because the SAT there is similarly negatively correlated with the BKS sea ice (Figures 3(a) and S1(a)). There are two arguments for our choice. One is that the climatological southerly component around Novaya Zemlya and Franz Josef Land is stronger, and the SAT over the two islands is influenced more easily by local sea ice, even for the period prior to 1979 when the climatological sea-ice boundary is at a more southern location. The other is that the sea-ice anomaly east of Svalbard is smaller before 1979, and the SAT anomaly caused by the sea ice cannot be easily transported to Svalbard.