A hybrid model for the forecasting of sea surface water temperature using the information of air temperature: a case study of the Baltic Sea

ABSTRACT Sea surface temperature (SST) is an important indicator of the state of marine systems. In this study, the hybrid physically-statistically based air2water model was modified for the forecasting of SST. The hybrid model combines empiricism and theory, and balances complexity and accuracy between process-based physical models and statistical models. Daily observed SST data (2009–2019) from six stations in the Baltic Sea were used to evaluate model performance. Two metrics, the root mean squared error (RMSE) and the Nash-Sutcliffe efficiency coefficient (NSE), were used for model assessment. As air temperature increases, SST shows a clear warming trend (0.133–0.166°C/year), and air temperature warms faster than SST at the studied stations. The modelling results indicated that the model performs well for SST forecasting (in the validation period, the mean RMSE is 1.245°C and the mean NSE is 0.961). Cross-validation results showed that the model is transferable to unknown stations. However, the model performs slightly worse in the warm period due to the impact of the upwelling phenomenon. Overall, the model is a promising tool for the prediction of SST.


Introduction
Water temperature (WT) is a key variable controlling many processes in estuaries and oceans. Under climate warming, increasing WT will inevitably affect marine ecosystems. For example, Baird et al. (2019) found that the marine ecosystem (Sylt-Rømø Bight, Wadden Sea) becomes less organised and more dissipative with higher WT; Parravicini et al. (2010) found that thermal anomalies of up to 4°C above the climatological mean may lead to mortalities of benthic species in the NW Mediterranean during thermal events. In this regard, a good knowledge of thermal dynamics in marine systems is of great significance.
Sea surface temperature (SST) is an important index because the surface layer interacts directly with the atmosphere. Surface water temperature is a good indicator of the impact of climate warming and is therefore widely used in climate change studies (e.g. Pei et al., 2017; Piccolroaz et al., 2020, 2021). It also affects the energy balance of the other layers. To understand the thermal dynamics of marine systems comprehensively, it is important to have a good knowledge of SST variations. Mathematical models provide good tools for forecasting SST. In the past decades, different categories of models have been developed, including process-based deterministic models (e.g. Brassington, 2013) and statistical models (e.g. Jahanbakht et al., 2022; Landman & Mason, 2001; Wolff et al., 2020). Process-based deterministic models are based on the energy balance; they are complex because they couple the atmosphere with the ocean system, and they require large computational effort. Additionally, they need a lot of input data, such as a full set of meteorological variables, flow, and waves. Statistical models are simpler: by relating SST to other influencing factors (e.g. air temperature, solar radiation, wind, humidity), they obtain approximate estimates of SST. However, they lack a clear physical meaning. As a compromise between these two approaches, we explore whether a hybrid physically-statistically based model, which has a clear physical meaning and is easy to use, can be applied, especially for climate change projections.
Despite the complexity of the impact of various factors on SST (Shaltout, 2019), air temperature is generally a parameter of key importance. Galbraith et al. (2012), analysing SST in the Gulf of Saint Lawrence, pointed to a strong correlation between SST and air temperature, which is useful in predicting the response of water temperature to a changing climate. High positive correlations between SST and surface air temperature in the Arabian Sea and Indian Ocean were shown by Jaswal et al. (2012). An increase in water temperature in all regions of the Baltic Sea is a natural consequence of changes in air temperature and atmospheric circulation (Rukseniene et al., 2017). Moreover, it should be emphasised that air temperature data are recorded in great detail and are publicly available.
In this study, we modified the hybrid air2water model, which was initially developed for forecasting lake surface water temperatures (Piccolroaz et al., 2013). Since its first release, it has been widely used and validated in different regions (e.g. Heddam et al., 2020; Piccolroaz et al., 2018, 2021; Zhu et al., 2020, 2021). The model is easy to use and accurate, and it needs only air temperature as input (Heddam et al., 2020; Piccolroaz et al., 2021; Zhu et al., 2020, 2021). We tested the modified model using observed SST data from six gauge stations in the Baltic Sea. Water temperature modelling in the Baltic Sea is relatively limited, with several available studies employing very complex models, such as the coupled circulation and wave models in Alari et al. (2016). These studies mainly focused on the spatial variation of water temperatures (e.g. the vertical profile), so such models are more complex than necessary for predicting SST alone. For climate change projections, often only surface water temperatures are needed (e.g. Piccolroaz et al., 2020, 2021; Woolway et al., 2020). In this regard, investigating a simple method is of great significance, which is the aim of this study.
The rest of this paper is organised as follows: Section 2 describes the study area, the data used for model evaluation, and the hybrid model used in this study. Section 3 presents the results of the trend analysis and the model performance. Section 4 discusses the results. The primary conclusions are summarised in Section 5.

Study area and data
The Baltic Sea is an inland shelf sea in northern Europe, connected with the North Sea, and further with the World Ocean, through the system of narrow Danish Straits. The surface area of the Baltic Sea is about 415 thousand km², with a mean depth of 52 m and a water volume of 21 thousand km³. The Baltic Sea catchment has an area of 1.7 million km², accounting for 17% of Europe. A total of 250 rivers flow into the sea, supplying an annual average of 470 km³ of freshwater. The Baltic Sea is characterised by a low rate of water exchange with the World Ocean. Water exchange with the North Sea can be described as estuarine, i.e. with inflows of heavier near-bottom waters and outflow of lighter, fresher waters in the surface zone. The climate of the southern Baltic Sea is mostly determined by the Atlantic Ocean and the East European continent. Except to the north, the land surrounding the Baltic Sea allows free movement of air masses from all directions. As a result, the area is a zone of very frequent exchanges of air masses, resulting in exceptionally high weather variability.
The data concerning sea water temperature come from observations by the  Table 1).
Data from 2009–2016 were used for model calibration (around 70%), and the remaining data (2017–2019) were used for model validation (around 30%). To check the performance of the model at unknown stations, cross-validation was conducted by training the model using data from five stations (No. 1–5) and using data from the remaining station (No. 6) for validation.
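The temporal split described above can be sketched as follows. This is a minimal illustration, assuming that everything before 2017 is used for calibration (the text states only that 2017–2019 is held out for validation); the function name `split_by_year` and the placeholder values are hypothetical.

```python
from datetime import date, timedelta


def split_by_year(dates, values, last_calibration_year=2016):
    """Split a daily series into calibration and validation parts by year."""
    pairs = list(zip(dates, values))
    calib = [(d, v) for d, v in pairs if d.year <= last_calibration_year]
    valid = [(d, v) for d, v in pairs if d.year > last_calibration_year]
    return calib, valid


# Placeholder daily series covering 2009-2019 (values are dummies, not data).
start, end = date(2009, 1, 1), date(2019, 12, 31)
dates = [start + timedelta(days=i) for i in range((end - start).days + 1)]
values = [0.0] * len(dates)

calib, valid = split_by_year(dates, values)
validation_share = len(valid) / len(dates)
print(f"calibration days: {len(calib)}, validation days: {len(valid)}")
print(f"validation share: {validation_share:.2f}")
```

For a gap-free daily record the validation share computed this way is close to the "around 30%" quoted in the text; with missing observations the exact share would differ.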

The hybrid air2water model
The air2water model is a hybrid physically-statistically based model that was initially developed for forecasting lake surface water temperatures. It has been widely used in lakes worldwide, including some very large lakes such as the Great Lakes (Piccolroaz, 2016; Piccolroaz et al., 2013) and Lake Baikal (Toffolon et al., 2014).
It is based on the energy balance equation of the surface layer (here, we modified the equation for SST forecasting):

$$\rho_s c_p V_s \frac{d\mathrm{SST}}{dt} = A H_{net} \tag{1}$$

where $\rho_s$ is the density of sea water (1025 kg m⁻³), $c_p$ is the heat capacity of water (4186 J kg⁻¹ °C⁻¹), $A$ is the surface area (m²), $V_s$ is the surface volume (m³), $d\mathrm{SST}/dt$ is the rate of change of sea surface temperature (SST) in time ($t$, in days), and $H_{net}$ is the net heat flux at the air-water interface (J m⁻² day⁻¹). $H_{net}$ is the sum of several terms: net shortwave radiation, net longwave radiation, latent and sensible heat fluxes, etc. More detailed information about the term $H_{net}$ can be found in Piccolroaz (2016).
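As a rough order-of-magnitude illustration of this balance, the sketch below assumes the usual form ρ_s c_p V_s dSST/dt = A H_net, so that the warming rate is H_net divided by ρ_s c_p D, with D = V_s/A the thickness of the surface layer. The flux of 100 W m⁻² and the 10 m layer are hypothetical inputs chosen for illustration, not values from the study.

```python
RHO_S = 1025.0            # density of sea water, kg m^-3 (value from the text)
C_P = 4186.0              # heat capacity of water, J kg^-1 degC^-1 (from the text)
SECONDS_PER_DAY = 86400.0


def sst_rate(h_net_w_m2, layer_depth_m):
    """Return dSST/dt in degC per day for a net heat flux given in W m^-2.

    layer_depth_m stands for V_s / A, the thickness of the surface layer.
    """
    h_net_per_day = h_net_w_m2 * SECONDS_PER_DAY   # J m^-2 day^-1
    return h_net_per_day / (RHO_S * C_P * layer_depth_m)


# Hypothetical inputs: 100 W m^-2 of net heating over a 10 m surface layer.
rate = sst_rate(100.0, 10.0)
print(f"dSST/dt = {rate:.3f} degC/day")   # about 0.2 degC per day
```

The same flux acting on a thinner layer warms it proportionally faster, which is why the ratio between the surface volume and a reference volume matters in the simplified model.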
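By considering air temperature ($T_a$) as a proxy for the integrated effect of the external forcing, following Livingstone and Padisák (2007), and using a Taylor expansion to express $H_{net}$ as a linear function of $T_a$ and SST, a version with eight parameters can be deduced:

$$\frac{d\mathrm{SST}}{dt} = \frac{1}{\delta}\left\{a_1 + a_2 T_a - a_3\,\mathrm{SST} + a_5 \cos\left[2\pi\left(\frac{t}{t_y} - a_6\right)\right]\right\} \tag{2}$$

where $a_1$ to $a_8$ are parameters that can be determined in the calibration period using a trial and error method, with initial ranges determined using the method in Toffolon et al. (2014); $t_y$ is the duration of a year in the units of $t$; and $\delta$ is a dimensionless ratio that is a function of SST and a reference water temperature of the deep layer ($T_h$; see Equation (3)).

$$\delta = \begin{cases} \exp\left(-\dfrac{\mathrm{SST} - T_h}{a_4}\right), & \mathrm{SST} \ge T_h \\ \exp\left(-\dfrac{T_h - \mathrm{SST}}{a_7}\right) + \exp\left(-\dfrac{\mathrm{SST}}{a_8}\right), & \mathrm{SST} < T_h \end{cases} \tag{3}$$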
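A minimal sketch of how such an eight-parameter model can be stepped forward in time is given below. It assumes the standard air2water form published by Piccolroaz et al. (2013), with SST in place of lake surface temperature: a forcing term a1 + a2·T_a − a3·SST + a5·cos[2π(t/t_y − a6)] divided by a ratio δ that depends on SST and T_h through a4, a7 and a8. The explicit Euler scheme with sub-daily steps, the function names and all parameter values are assumptions made for illustration; they are not the calibrated values of this study.

```python
import math


def delta(sst, a4, a7, a8, t_h=5.0):
    """Dimensionless ratio delta as a function of SST and the reference T_h."""
    if sst >= t_h:
        return math.exp(-(sst - t_h) / a4)
    return math.exp(-(t_h - sst) / a7) + math.exp(-sst / a8)


def simulate_sst(air_temp, params, sst0, t_h=5.0, t_year=365.25, substeps=10):
    """Integrate daily SST from daily air temperature with explicit Euler steps."""
    a1, a2, a3, a4, a5, a6, a7, a8 = params
    dt = 1.0 / substeps                      # sub-daily step, in days
    sst = sst0
    series = []
    for day, ta in enumerate(air_temp):
        for k in range(substeps):
            t = day + k * dt
            seasonal = a5 * math.cos(2.0 * math.pi * (t / t_year - a6))
            forcing = a1 + a2 * ta - a3 * sst + seasonal
            sst += dt * forcing / delta(sst, a4, a7, a8, t_h)
        series.append(sst)                   # SST at the end of each day
    return series


# Hypothetical parameters and a synthetic sinusoidal air temperature (degC).
params = (0.05, 0.03, 0.04, 20.0, 0.05, 0.55, 10.0, 0.5)
air = [8.0 + 10.0 * math.cos(2.0 * math.pi * (d / 365.25 - 0.55)) for d in range(730)]
modelled = simulate_sst(air, params, sst0=5.0)
print(f"SST after two simulated years: {modelled[-1]:.2f} degC")
```

With a5 = 0 and constant air temperature, the sketch relaxes towards the equilibrium (a1 + a2·T_a)/a3, which is a quick sanity check on the signs of the terms.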
Here, $T_h$ can be taken as 5°C for deep ocean layers (for comparison, $T_h$ is taken as 4°C for dimictic lakes in the original version for forecasting lake surface water temperatures). More detailed information can be found in Piccolroaz (2016). Two metrics are used to evaluate model performance: the root mean squared error (RMSE) and the Nash-Sutcliffe efficiency coefficient (NSE). Detailed descriptions of these two metrics can be found in Graf et al. (2019).

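$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\mathrm{SST}_{O,i} - \mathrm{SST}_{M,i}\right)^2} \tag{4}$$

$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{N}\left(\mathrm{SST}_{O,i} - \mathrm{SST}_{M,i}\right)^2}{\sum_{i=1}^{N}\left(\mathrm{SST}_{O,i} - \mathrm{SST}_{AV}\right)^2} \tag{5}$$

where $N$ is the number of data samples, $\mathrm{SST}_{O,i}$ and $\mathrm{SST}_{M,i}$ are the observed and modelled sea water temperatures at time $i$, and $\mathrm{SST}_{AV}$ is the average value of $\mathrm{SST}_O$.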
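The two metrics can be computed in a few lines. The sketch below uses their standard definitions (RMSE as the square root of the mean squared difference, NSE as one minus the ratio of the squared-error sum to the variance sum of the observations); the function names and the four-point series are illustrative only.

```python
import math


def rmse(obs, mod):
    """Root mean squared error between observed and modelled series."""
    return math.sqrt(sum((o - m) ** 2 for o, m in zip(obs, mod)) / len(obs))


def nse(obs, mod):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(obs) / len(obs)
    sq_err = sum((o - m) ** 2 for o, m in zip(obs, mod))
    sq_dev = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sq_err / sq_dev


# Illustrative values only (degC), not data from the study.
observed = [1.0, 2.0, 3.0, 4.0]
modelled = [1.0, 2.0, 3.0, 6.0]
print(rmse(observed, modelled))   # 1.0
print(nse(observed, modelled))    # approximately 0.2
```

RMSE keeps the units of SST (°C), whereas NSE is dimensionless, so the two together describe both the absolute error and the fraction of observed variability that the model reproduces.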

Trend analysis
The trends of annual average SST and $T_a$ are shown in Figure 3. For the three meteorological stations, $T_a$ shows a clear warming trend, with warming rates ranging from 0.185°C/year to 0.195°C/year. As $T_a$ increases, SST shows a clear warming trend as well. For the six stations, the warming rate of SST varies between 0.133°C/year and 0.166°C/year. We also analysed the thermal sensitivity of SST to $T_a$, defined as the ratio of the warming rate of SST to that of $T_a$. For all the stations, the thermal sensitivity is below 1.0 and varies between 0.717 and 0.896. The result indicates that $T_a$ warms faster than SST, which is also clearly shown in Figure 3.
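The thermal sensitivity defined above can be sketched as the ratio of two linear trend slopes. The text does not state which trend estimator was used, so ordinary least squares is assumed here, and the annual series are synthetic straight lines built from the lowest reported rates purely for illustration.

```python
def linear_trend(years, values):
    """Ordinary least-squares slope, in units of `values` per year."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    sxx = sum((x - mean_x) ** 2 for x in years)
    return sxy / sxx


years = list(range(2009, 2020))
# Synthetic annual means (degC); the intercepts are arbitrary placeholders.
air_annual = [8.0 + 0.185 * (y - 2009) for y in years]
sst_annual = [9.0 + 0.133 * (y - 2009) for y in years]

sst_rate = linear_trend(years, sst_annual)
air_rate = linear_trend(years, air_annual)
sensitivity = sst_rate / air_rate          # below 1 when air warms faster
print(f"thermal sensitivity = {sensitivity:.3f}")
```

A value below 1.0 means the air warms faster than the sea surface, which is the situation reported for all six stations.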

Evaluation of model performance
The values of the eight parameters are summarised in Table 2. They were determined by trial and error, minimising the objective function (RMSE) with a particle swarm optimiser. These parameter values are within the ranges reported in previous studies (e.g. Piccolroaz et al., 2021; Toffolon et al., 2014). For each station, the modelling results (RMSE and NSE) are summarised in detail in Table 3. In the validation period, RMSE ranges from 0.874°C to 1.703°C (mean: 1.245°C) and NSE from 0.920 to 0.982 (mean: 0.961). Generally, the model performs well for SST forecasting, which can also be seen in the box-plots in Figure 4.
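The calibration idea, searching a bounded parameter space for the set that minimises RMSE, can be illustrated with a basic global-best particle swarm. This is a generic sketch, not the optimiser configuration used in the study: the swarm size, iteration count, inertia and acceleration coefficients are assumed values, and the objective is a toy two-parameter linear fit standing in for the eight-parameter model.

```python
import math
import random


def particle_swarm(objective, bounds, n_particles=30, n_iter=200, seed=0,
                   inertia=0.7, c_personal=1.5, c_global=1.5):
    """Minimise `objective` inside the box `bounds` with a global-best PSO."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * len(bounds) for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]               # personal best positions
    best_val = [objective(p) for p in pos]       # personal best scores
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]   # global best so far

    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * rng.random() * (best_pos[i][d] - pos[i][d])
                             + c_global * rng.random() * (g_pos[d] - pos[i][d]))
                # Keep the particle inside the admissible parameter range.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val


# Toy calibration: recover (intercept, slope) of a synthetic linear relation.
x = [float(i) for i in range(10)]
y = [2.0 + 0.5 * xi for xi in x]


def toy_rmse(p):
    return math.sqrt(sum((p[0] + p[1] * xi - yi) ** 2 for xi, yi in zip(x, y)) / len(x))


best_params, best_rmse = particle_swarm(toy_rmse, [(-5.0, 5.0), (-5.0, 5.0)])
print(best_params, best_rmse)
```

For the actual model, the objective would run the SST simulation over the calibration period and return its RMSE against the observations, with `bounds` set to the initial parameter ranges.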
We also evaluated the RMSE values for the whole year and for the warm period (June–September) separately; the detailed results are shown in Figure 5. For each station, RMSE values in the warm period are larger than the yearly values, indicating that the model performs slightly worse in the warm period.
To check the performance of the model at unknown stations, cross-validation was conducted by calibrating the model using data from five stations (No. 1–5) and using data from the remaining station (No. 6) for validation. RMSE values for the calibration and validation phases are 1.546°C and 1.178°C, respectively. The results indicated that the model is transferable to unknown stations.
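The comparison between yearly and warm-period errors amounts to computing RMSE once over all days and once over the days falling in June–September. The sketch below shows this with a synthetic year in which the modelled series is deliberately off by 2°C in the warm period and by 1°C otherwise; the function names and the numbers are illustrative assumptions.

```python
import math
from datetime import date, timedelta

WARM_MONTHS = {6, 7, 8, 9}   # June-September, as defined in the text


def rmse(obs, mod):
    """Root mean squared error between observed and modelled series."""
    return math.sqrt(sum((o - m) ** 2 for o, m in zip(obs, mod)) / len(obs))


def rmse_year_and_warm(dates, obs, mod):
    """Return (RMSE over all days, RMSE over warm-period days only)."""
    warm = [(o, m) for d, o, m in zip(dates, obs, mod) if d.month in WARM_MONTHS]
    warm_obs, warm_mod = zip(*warm)
    return rmse(obs, mod), rmse(warm_obs, warm_mod)


# Synthetic year: the modelled series is off by 2 degC in the warm period
# and by 1 degC during the rest of the year.
dates = [date(2017, 1, 1) + timedelta(days=i) for i in range(365)]
obs = [10.0] * 365
mod = [o + (2.0 if d.month in WARM_MONTHS else 1.0) for d, o in zip(dates, obs)]

rmse_year, rmse_warm = rmse_year_and_warm(dates, obs, mod)
print(f"yearly RMSE: {rmse_year:.3f} degC, warm-period RMSE: {rmse_warm:.3f} degC")
```

Because the warm period covers only about a third of the year, a markedly larger warm-season error raises the yearly RMSE only moderately.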

Discussion
The finding that $T_a$ warms faster than surface water temperatures has been reported in many studies (e.g. Levitus et al., 2000; Li et al., 2019; Peng et al., 2021; Piccolroaz et al., 2021). However, other studies have found that $T_a$ warms more slowly than surface water temperatures (e.g. Austin & Colman, 2007; Shearman & Lentz, 2010). For example, Shearman and Lentz (2010) found that water temperatures along the northeastern U.S. coast warmed at rates 1.8–2.5 times that of the regional atmospheric temperature. The thermal sensitivity of SST to $T_a$ therefore depends on the studied region and the local climate.
The methodological approaches to the precise determination of SST have evolved continuously over the years. Recently, research going beyond trend analyses (El-Geziry, 2021; Grbec et al., 2018; Yu et al., 2019) has addressed the modelling of its course (Li, 2021; Patil & Deo, 2017; Yang et al., 2018). The present paper is in line with this direction. Compared with the results of Alari et al. (2016), who used a complex model and obtained RMSE values between 1.75°C and 2.53°C for three stations, the model proposed in this study offers a simple yet precise way of modelling SST in the Baltic Sea. By calibrating and validating the model for each available station of the studied region, we can summarise the results and obtain the overall SST patterns. Cross-validation results showed that the model can be transferred to other stations in the studied region by obtaining the relationship between air temperature and SST through model training, which is particularly useful for regions with limited observed data.
The temperature of surface waters along the southern coast of the Baltic Sea depends primarily on solar radiation and the thermal properties of air masses, as well as on water circulation. Mean seasonal air temperatures have a considerable effect on seasonal water temperatures, and the correlation coefficient is the highest in winter and spring (Girjatowicz & Świątek, 2019). Stramska and Białogrodzka (2015) highlighted considerable interannual SST variability, which is significantly correlated with the interannual variability of air temperature. In the warm season, solar radiation has a dominant effect on the thermal conditions of water (Miętus, 1999). In this period, another factor appears that reduces the dependency between SST and air temperature, namely water circulation related to upwelling. Upwelling may be recorded over a period of approximately four months, i.e. during thermal stratification. The phenomenon occurs when strong winds push warmer surface water away, which is then replaced by cooler near-bottom water from a depth of approximately a dozen metres. In the southern Baltic region, the number of days with upwelling varies along the coastline, and in particular seasons the phenomenon may not occur at all (Bednorz et al., 2019). Czuchaj et al. (2020), analysing SST for five stations on the southern coast, determined that the highest number of rapid daily decreases in water temperature (at least 5°C) occurred in Kołobrzeg (38) and Władysławowo (26). Krezel et al. (2005) indicated that the Kołobrzeg upwelling region had the largest spatial range (up to 5000 km²). In this context, the analysis of RMSE for the warm season against the yearly values (Figure 5) showed a certain spatial variability.
The general division resulted from the location of particular stations. The highest values were recorded at those located towards the open sea, i.e. Kołobrzeg, Łeba, and Władysławowo. The remaining stations comprise two in the sheltered Puck Bay (Puck and Gdynia) and the station in Hel, which is the most distinctive because it is located on a peninsula. In the case of the latter, there is a possibility of water inflow from deep marine zones, and of exchange with the warmer waters of the shallow Puck Bay, whose thermal regime, according to Klekot (1980), changes very rapidly in response to air temperature. The physical properties of the sea in the coastal zone depend on various factors, including freshwater inflow, as mentioned by Fernández-Nóvoa et al. (2021). In this context, the role of the Vistula River may be important (the second largest river in the basin, with an average discharge of 1000 m³ s⁻¹). Its temperature strongly depends on air temperature (Ptak et al., 2022). This may affect SST in the eastern part of the analysed region (stations: Hel, Puck, Gdynia), further differentiating the circulation of water masses with different thermal properties in the region. In windless weather or with wind from the southern sector, its waters spread northwards, perpendicular to the coast. With winds from the western or eastern sector, the discharged river waters are distributed along the coast, in accordance with the direction of flow of the littoral current. In such situations, depending on the season, river waters can warm up or cool down marine waters in the coastal zone. River inflow may contribute to both an increase in SST (Park et al., 2011) and a cooling of SST (White & Toumi, 2014). The effect is region-specific.
In the case of the Vistula River (station Świbno), in the analysed period, the average annual temperature of water flowing into the Baltic Sea was more than 1°C higher than the SST at Puck and Gdynia, and 1.6°C higher than at Hel. For particular months, the difference is even greater: in July the river water can be more than 5°C warmer on average, and in January 1.5°C colder.

Conclusions
In this study, the hybrid physically-statistically based air2water model was modified to be used for SST forecasting, and daily observed SST data from six stations in the Baltic Sea in the period 2009–2019 were used to evaluate model performance. The results lead to the following conclusions:
(1) As air temperature increases, SST shows a clear warming trend (0.133–0.166°C/year), and air temperature warms faster than SST at the studied stations.
(2) The model performs well for SST forecasting. In the validation period, RMSE values range from 0.874°C to 1.703°C (average value: 1.245°C), and NSE values vary between 0.920 and 0.982 (average value: 0.961). Cross-validation results showed that the model is transferable to unknown stations.
(3) The model performs slightly worse in the warm period due to the impact of the upwelling phenomenon in the studied region. Overall, it is a promising tool for the prediction of SST.
New approaches for determining the course and scope of changes in SST are needed because ecosystems are being transformed by co-occurring natural and anthropogenic factors. Although the model shows good performance for SST forecasting using data from the Baltic Sea, it still needs to be validated using data from other regions and compared with other models, which will be the next step of this work.