Validating predictability of a sea level rise at a tide gauge station

ABSTRACT Predicting coastal sea level rise during the 21st century is essential for risk assessments. Predicting sea level rise at coastal regions from historical tide gauge measurements has therefore been a central theme of numerous studies. However, these investigations rarely examined whether such an extrapolation of sea level rise at a coastal region is viable, offering no supporting evidence. This study proposes a broadly framed three-tier statistical validation approach for kinematic models that use tide gauge measurements in sea level predictions. It consists of a concurrent (contemporaneous) validation, a retrospective (hindcast) validation and a prospective (forecast) validation. The modus operandi for each stage is exemplified using sea level measurements at the Brest tide gauge station since 1800. The proposed three-tiered validation process provides statistical information about the predictions to be made at this station.


Introduction
Sea level predictions and projections are usually used interchangeably in coastal sea level studies. Nonetheless, projections are scenarios, 'what if' exercises based on a wide range of assumptions (Bray & von Storch, 2009), whereas a prediction is an estimate of future sea level variations at the coast using solely past tide gauge (TG) records. In this context, a prediction at a tide gauge station based on its past records is also a projection. It is inherently more precise and encapsulates only a single scenario, which is essential for risk assessment of coastal sea level rise. A successful prediction, however, is contingent on the kinematics of the sea level variations not changing drastically despite the climate conditions in the near horizon, most of which will affect sea level in known ways. Operationally, what makes predictions inaccurate is failing to model a multitude of potential in-situ low-frequency sea level variations whose origins are multicausal, as recognised as early as the 1990s (e.g. Sezginer & Ghil, 1995). Since then, despite the recognised complexities, almost all the investigations carried out to model and predict sea level variations at the coast using TG measurements have assumed, without evidence, that such sea level extrapolations are viable and reliable. Therefore, the aim and scope of this study is to broadly define a framework for a statistical validation process for kinematic models to investigate the viability of in-situ sea level predictions. The proposed approach consists of a concurrent (contemporaneous) validation, a retrospective (hindcast) validation and a prospective (forecast) validation. In the following sections, each stage of the validation is defined in detail, and the modus operandi for each stage is exemplified using sea level measurements carried out at the Brest tide gauge station. The outcome provides statistical evidence for the feasibility of predicting sea level rise during the 21st century at this station.

Definitions
The following nomenclature was inspired by the taxonomy of medical science (FDA, 2020) but is redefined and framed broadly to suit the aims of coastal sea level studies.
As stated before, the proposed sea level rise validation process is intended to establish consistent statistical outcomes that ensure the reliability of kinematic models for conducting sea level predictions using pertinent records. Three stages of the sea level validation process are identified: concurrent (contemporaneous) validation, retrospective (hindcast) validation and prospective (forecast) validation. Concurrent (contemporaneous) validation provides statistical evidence that models do what they are intended to do using all the available information. A concurrent model is the scaffolding for predicting future and past sea level variations that are not included in the TG records. In this study, a kinematic model previously investigated by this author is employed (İz et al., 2018). The model is evaluated for concurrent validation and then used for the retrospective and prospective validations.
Retrospective (hindcast) validation uses the model elucidated at the concurrent validation stage to produce statistical evidence by assessing the discrepancies between predicted and historically observed TG data, namely observed minus predicted (O − P) values for data not included in the model solution, to validate the selected model. The outcomes of a retrospective validation are also used to compare and contrast with the results of the prospective validation for further inferences.
Prospective (forecast) validation is like the retrospective validation process, except that it withholds TG records at the end of the series for the O − P assessments. A successful prospective validation is evidence for the model's predictive power, whereas a failing prospective validation, in the presence of a successful retrospective validation, is indicative of changing conditions that are not captured by the concurrent model or by current and historical records, i.e. of the predictability of future sea level variations at a particular locality.

Sea level variations at a TG station and their proxies
Sea level variations experienced at TG stations are multi-causal. Some of the effects are well known, such as local subsidence, vertical crustal movements caused by regional tectonics, and glacial isostatic adjustment. Others are of oceanographic and meteorological origin: changes in atmospheric pressure, wind, river discharge, ocean circulation, changes in water density and added water volume due to the melting of ice. In shallow areas, bathymetry becomes relevant. Steric sea level changes, changes in ocean circulation, and the melting of glaciers or the ice sheets on Greenland and Antarctica all contribute to sea level changes. The impact of external forcings, such as astronomical tides of lunar or solar origin and the thermosteric effects of warming oceans, appears in sea level anomalies. These may be secular, episodic or transient in nature, or periodic at semi-annual, annual, interannual, decadal or longer time scales, all contributing to sea level changes. Among the periodic changes, annual sea level variations are dominant (Pugh, 1996); the others have been detected and quantified by numerous studies (e.g. Iz, 2014, 2015). At some coastal areas, the observed trend could simply be a portion of a very low-frequency, global-scale natural variability, perceived as a trend only because the data records are too short. Many authors have recognised the presence of low-frequency signals, but the removal of their impact on the calculated trends has not generally been done exhaustively. For example, in a highly cited study by Douglas (1992), low-frequency sea level variations are explicitly recognised, but a quadratic model is used to investigate sea level accelerations with the expectation that the use of long series will ameliorate the problem. Another extensively debated study, published by Houston and Dean (2011), also used a quadratic model, but no attempt was made to account for low-frequency sea level changes.
Note that both studies' results were dubious because of the omission of the effect of first-order autoregressive disturbances. Another example is the study by Wahl et al. (2013), which is exclusively dedicated to long-term mean sea level changes, but again no attempt was made to model them. The results were again imprecise due to the omission of the AR(1) effect, which has the propensity to generate random excursions giving the impression of episodic changes in sea level (İz & Shum, 2020b).
The reason for the omission of these effects in modelling is simply that all these low-frequency anomalies materialise differently at different TG locations because of their compounding. Sometimes the needed corrections, such as IB effects, wind pressure, etc., are not readily available. There is, however, a shortcut to control their impact effectively in the analyses of in-situ sea level variations. Keeling and Whorf (1997) conjectured that random beats in sea level variations of various origins interact with forced periodic changes of astronomical origin (solar radiation or nodal tides), generating sub- and super-harmonics. Subsequently, Munk et al. (2002) proposed the compounding of natural broadband sea level variations with forcings of astronomical origin acting as carrier frequencies. The lunar node cycle, P = 18.613 y, compounds with natural variations at TG stations to produce subharmonics with periods including 2 × P = 37.226 y, 3 × P = 55.839 y and 4 × P = 74.452 y, and super-harmonics with periods P/2 = 9.306 y and P/3 = 6.204 y. Others are caused by solar radiation with a period of P = 11.1 y, with subharmonics of 2 × P = 22.2 y and longer (Iz, 2014). Table 1 lists the periodicities of the effects of luni-solar origin generated through the compounding of natural sea level changes. The Iz (2014) study confirmed their presence through a meta-analysis of their estimated amplitudes at globally distributed TG stations with long records. This author has demonstrated their statistically significant presence in several studies (Iz, 2006, 2014, 2015, 2016a, 2016b; İz et al., 2012; İz & Ng, 2011; İz & Shum, 2020a, 2022; İz et al., 2020).
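The compounding periods above can be enumerated directly from the two carrier cycles; the following is a minimal sketch in which the base periods and harmonic multiples are taken from the text, while the helper names are hypothetical:

```python
# Sub- and super-harmonic periods produced by compounding with the two
# astronomical carrier cycles quoted above: the lunar node cycle
# (P = 18.613 y) and the solar radiation cycle (P = 11.1 y).
P_NODE = 18.613   # lunar nodal period, years
P_SOLAR = 11.1    # solar radiation period, years

def subharmonics(p, multiples):
    """Sub-harmonic periods k * P (longer than the carrier period)."""
    return [k * p for k in multiples]

def superharmonics(p, divisors):
    """Super-harmonic periods P / k (shorter than the carrier period)."""
    return [p / k for k in divisors]

nodal_sub = subharmonics(P_NODE, (2, 3, 4))    # 37.226, 55.839, 74.452 y
nodal_super = superharmonics(P_NODE, (2, 3))   # 9.306, 6.204 y
solar_sub = subharmonics(P_SOLAR, (2,))        # 22.2 y
```

The full set of compounded periods used in the model is the one listed in Table 1.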
Although the observed amplitudes of the 18.6-year nodal constituent are small, empirically within the range of 15 to 35 mm, the compounding of the nodal tide with natural sea level variations can bias sea level trend and acceleration estimates as a confounder for short TG time series, as evaluated analytically by Iz (2006). Moreover, their effects may also be mistakenly interpreted as an accelerated sea level rise if they are not incorporated into the models used for analysing short as well as longer series (Iz, 2014; İz et al., 2020). A follow-up study by Iz (2015) demonstrated that once these effects are modelled and the corresponding model parameters are estimated, spectral analyses of the TG residuals reveal additional statistically significant sea level variations at decadal scale, due to ocean surface wind forcing and periodic changes in atmospheric pressure along the coastlines of some of the TG stations (İz et al., 2018). Hence, these effects can also be incorporated into models. Iz (2016a, 2016b) has shown that one of the two major compounders of sea level variations in tandem with luni-solar effects has been the thermosteric effect of the warming oceans, which also exhibits low-frequency sea level changes (Iz, 2016a). The other is atmospheric pressure, through the inverted barometric effects (İz, 2018a). The periods of these compounding effects are listed in Table 1. In the following section, the proposed three-tiered verification process will be exemplified using the TG records at Brest, France, with a model already constructed by İz and Shum (2021) that makes use of these periodic effects.
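The trend bias mechanism analysed by Iz (2006) can be illustrated numerically. The following sketch fits a datum-plus-trend model to a synthetic short record with zero true trend and an unmodelled 25 mm nodal cycle; all values are illustrative assumptions, not estimates from the Brest record:

```python
import numpy as np

# A short synthetic monthly record: zero true trend plus an unmodelled
# 18.613-y cycle with a 25 mm amplitude (within the 15-35 mm range quoted
# above) and white noise. Fitting only datum + trend picks up a spurious
# trend of a few mm/y, an artefact of the truncated cycle.
rng = np.random.default_rng(42)
P = 18.613                              # nodal period, years
t = np.arange(0, 15, 1 / 12)            # 15-year record, monthly epochs
h = 25.0 * np.sin(2 * np.pi * t / P) + rng.normal(0, 5, t.size)  # mm

A = np.column_stack([np.ones_like(t), t])          # datum + trend only
(datum, trend), *_ = np.linalg.lstsq(A, h, rcond=None)
# 'trend' comes out clearly non-zero although the true trend is zero;
# adding cos/sin terms at period P to the design matrix removes the bias.
```

This is why, for records shorter than a few nodal cycles, the periodic terms must be carried in the model rather than ignored.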
Note that the underlying model for the verifications does not have to be the one promoted in this study. The outcome of the verification depends on how successfully the model used captures the properties of the sea level variations at a TG station.

Tide gauge records at Brest
This study uses the records of the Brest, France, tide gauge (TG) station, which were meticulously studied by Wöppelmann et al. (2006, 2008), to exemplify the proposed validation processes. The station is located on the eastern part of the Atlantic coast and has one of the longest records in Europe, spanning 212 years. Several studies have investigated the kinematics of the sea level rise at this TG station, such as Gornitz and Solow (1991), Boretti (2020) and, very recently, İz and Shum (2021).
The Brest monthly TG time series displayed in Figure 1 was downloaded from the PSMSL repository in September 2020 (PSMSL, 2020). All records are referenced to the Revised Local Reference (RLR). No corrections, including post-glacial rebound (PGR) or inverted barometric (IB) effects, were applied to the data. The omission of the PGR correction makes the trend estimates relative in nature but does not impact the validation stages. Modelling the effects of the natural forcings and their variations at this locality is an integral part of the model discussed in the following section. Note that the TG records at Brest require a more sophisticated modelling effort to account for a statistically significant trend change at the end of the 19th century (İz & Shum, 2021).

A broken trend model with two phases for validations
As noted, the parameters of the validation process cannot be prescriptive; they vary from station to station as a function of a multitude of factors experienced at the TG stations on monthly, interannual and longer time scales. The following kinematic model emphasises a two-trend representation in two phases, denoted Phase I and Phase II, before and after 1900, as can be seen graphically in Figure 1. This critical period was first reported by Gornitz and Solow (1991) and confirmed by İz and Shum (2021). This date is also the common initial epoch of the records, used for allocating the reference datum h_1900 in the current study. In this representation, the monthly averaged sea level anomaly at an epoch t is denoted by h_t, and t_PHI and t_PHII denote the monthly epochs during the Phase I and Phase II periods, respectively, so that

h_t = h_1900 + b_I t_PHI + b_II t_PHII + Σ_{k=1}^{17} [α_k cos(2πt/P_k) + γ_k sin(2πt/P_k)] + ε_t,   (1)

where b_I and b_II are the Phase I and Phase II trends and P_k are the modelled periods. The periodic parameters of this model are consequential for the adopted top-down approach for the statistical testing of the model parameters (concurrent verification) and for the prospective and retrospective verifications, as will be exemplified in this study. The periodicities run throughout the series and include a mix of 17 sub- and super-harmonics attributed to the nodal tides, solar radiation, and annual and sub-annual constituents, as shown in Table 1. Each period introduces two parameters, α_k and γ_k, for the cosine and sine components, from which the amplitude a_k and the phase angle of each periodic term are determined. In total, the extended kinematic model includes 34 parameters for the coefficients of the 17 periodicities listed in Table 1. Together with the Phase I and Phase II trends and the datum parameter, the model consists of 37 unknown parameters. The statistical properties of the model assume that the disturbances of the monthly sea level anomalies are randomly distributed, i.e. ε_t ~ (0, σ_ε²), where σ_ε² is the variance of the disturbances. The square root of its estimate is denoted as the standard error of the solution, σ_ε, or simply SE.
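The assembly of the design matrix for a broken-trend-plus-harmonics model of this kind can be sketched as follows. This is a minimal illustration, not the authors' implementation: the period list is a small subset of Table 1 and the data are synthetic; with all 17 periods the matrix would carry 2 × 17 + 3 = 37 columns:

```python
import numpy as np

# Design matrix for a two-phase (broken) trend with cosine/sine pairs per
# modelled period. Epochs are in years relative to 1900 (negative = Phase I).
PERIODS_Y = [1.0, 0.5, 18.613, 9.306, 11.1, 22.2]   # illustrative subset

def design_matrix(t_years):
    t = np.asarray(t_years, float)
    cols = [np.ones_like(t),              # datum h_1900
            np.where(t < 0, t, 0.0),      # Phase I trend
            np.where(t >= 0, t, 0.0)]     # Phase II trend
    for P in PERIODS_Y:
        w = 2 * np.pi / P
        cols += [np.cos(w * t), np.sin(w * t)]
    return np.column_stack(cols)

# Synthetic check: recover a known Phase II trend from simulated anomalies.
rng = np.random.default_rng(0)
t = np.arange(-93.0, 119.0, 1 / 12)      # monthly epochs, roughly 1807-2019
A = design_matrix(t)
truth = np.zeros(A.shape[1]); truth[2] = 1.5    # 1.5 mm/y after 1900
h = A @ truth + rng.normal(0, 75, t.size)       # 75 mm noise, as in the text
est, *_ = np.linalg.lstsq(A, h, rcond=None)
```

Ordinary least squares on such a matrix recovers the phase trends alongside the α_k, γ_k coefficients in a single solution.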
First-order autocorrelation, AR(1), which is present with varying magnitudes at globally distributed tide gauge stations (İz et al., 2012), is about 0.1 for the Brest record provided that the low-frequency sea level variations are modelled as in this study. Therefore, the effect of AR(1) can be safely ignored. The concurrent validity of the proposed kinematic model is exemplified in the following section.
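The AR(1) magnitude quoted above can be checked from the model residuals with a lag-1 autocorrelation estimate. A sketch with synthetic residual series (the function name is hypothetical):

```python
import numpy as np

# Lag-1 autocorrelation estimator, used to judge whether AR(1) disturbances
# can safely be ignored (here, roughly: |rho| of about 0.1 or less).
def lag1_autocorr(res):
    r = np.asarray(res, float)
    r = r - r.mean()
    return float((r[:-1] * r[1:]).sum() / (r * r).sum())

rng = np.random.default_rng(1)
white = rng.normal(0.0, 75.0, 2000)       # stand-in for well-modelled residuals
rho_white = lag1_autocorr(white)          # near zero

phi = 0.5                                 # an AR(1) process for contrast
ar = np.zeros(2000)
eps = rng.normal(0.0, 1.0, 2000)
for i in range(1, 2000):
    ar[i] = phi * ar[i - 1] + eps[i]
rho_ar = lag1_autocorr(ar)                # close to phi
```

When the low-frequency periodicities are left out of the model, they leak into the residuals and inflate this statistic, which is one more reason to model them explicitly.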

Concurrent validation
The statistically significant (α = 0.05) model parameters of eqn. (1) were estimated using the stepwise ordinary least squares (OLS) method (Uotila, 1967) and are tabulated in Table 2. The model explains 51% (Adj. R²) of the variation in the observed monthly TG records, with SE = 75 mm. All the listed estimated parameters, except for the Phase I trend, are statistically significant. The absence of a statistically significant trend during Phase I is evidence for a non-uniform acceleration over the total time span of the records, as elucidated in İz and Shum (2021). Because of the length of the records, the impact of the periodicities on the estimated trend parameters is minimal. Nonetheless, if the low-frequency variations are not modelled, the Adj. R² value decreases, and the SE of the disturbances, and thereby the SE of the trend estimates, increases. This outcome adversely impacts the null hypothesis testing of the statistical significance of the estimates in the following validation steps.
Although the Adj. R² = 51% of the solution is not impressive, the model given by eqn. (1) captures the systematic sea level variations effectively. Figure 2 shows that the residuals are normally distributed, and they are free from any pronounced systematic unmodelled effects, as revealed by their plot against the adjusted records in Figure 3. Consequently, the solution statistics for the concurrent validation justify the use of this model for the following retrospective and prospective validations.

Retrospective model solution
As described earlier, the retrospective (hindcast) validation is intended to use the model validated in the previous section by assessing the discrepancies between predicted and historically observed TG data, O − P, that are not included in the model solution. The length of the withheld records can be altered depending on the total length of the series and on the prediction period to be tested. In this example, the initial 20-year segment of the TG record is allocated for the retrospective validation. Because the time span of the records is long, this choice was made simply to conform with the 19 years of TG data available for the 21st century. Nonetheless, a better practice would be to test several time spans to determine the effective length of the prediction. The OLS solution statistics to be used for the retrospective validation are listed in Table 3. The amplitudes of the periodicities did not vary significantly, yet some of them are no longer significant because of the shortened series. The Adj. R² remains essentially unchanged, and the SE of the solution indicates a comparable fit for the time span under consideration.
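The allocation of a 20-year hindcast segment can be sketched as follows, using a deliberately simplified stand-in model (datum + trend + annual pair, not the full 37-parameter eqn. (1)) and synthetic data:

```python
import numpy as np

# Hindcast split: withhold the first `holdout_years` of a monthly series,
# fit the model to the remaining record, then form the observed-minus-
# predicted (O - P) discrepancies over the withheld span.
def fit_and_hindcast(t, h, holdout_years=20.0):
    fit = t >= t[0] + holdout_years           # fit excludes the first 20 y

    def design(tt):
        w = 2.0 * np.pi                       # annual frequency (t in years)
        return np.column_stack([np.ones_like(tt), tt,
                                np.cos(w * tt), np.sin(w * tt)])

    x, *_ = np.linalg.lstsq(design(t[fit]), h[fit], rcond=None)
    o_minus_p = h[~fit] - design(t[~fit]) @ x
    return x, o_minus_p

# Synthetic 212-year monthly record: 1.2 mm/y trend, 40 mm annual cycle,
# 75 mm noise (all values illustrative, not the Brest estimates).
rng = np.random.default_rng(2)
t = np.arange(0.0, 212.0, 1 / 12)
h = 1.2 * t + 40.0 * np.cos(2 * np.pi * t) + rng.normal(0, 75, t.size)
x_hat, omp = fit_and_hindcast(t, h)
```

The O − P series produced this way is the input to the bias assessment described in the next subsection.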
The residuals shown in Figure 4 and Figure 5 exhibit distributions similar to those of the concurrent validation residuals (Figure 2 and Figure 3). The Phase I trend is now significant because the high-leverage (Note 1) TG records at the beginning of the series are removed, which may have an impact on the retrospective prediction.

Retrospective model validation
With this information in hand, the retrospective prediction for the 20-year period was carried out using the well-known LS prediction method known as best linear uniformly unbiased prediction (BLUUP) (Goldberger, 1962; Iz, 1992; İz et al., 2018). The corresponding SE of the differences is 80.8 mm (Table 4) and is compatible with the SE of the retrospective model solution, 75.2 mm. These results broadly validate the model retrospectively. However, more precise information can be obtained if the O − P values are used as dependent variables with random disturbances ε_tp and regressed over the prediction time period, with monthly epochs t_p, a mean shift, δ, and a trend bias, β, as follows:

(O − P)_tp = δ + β t_p + ε_tp.   (2)

To ensure that the prediction is successful retrospectively (or prospectively), the OLS solution of the above model should disclose no statistically significant trend or intercept estimates in the O − P values. Otherwise, there is a trend bias and/or an intercept bias (a constant bias) in the prediction. The resulting estimates are 0.55 ± 0.90 mm/y for the trend bias and 75.4 ± 75.2 mm for the mean shift during this period (Table 4). Neither estimate rejects the null hypothesis. A mean shift can be interpreted as a constant unmodelled uplift or depression in the mean sea level, attributable to a persistent lingering IB effect or a sudden subsidence experienced at the station.
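The mean-shift/trend-bias regression on the O − P values can be sketched as below; the O − P series is synthetic, and a crude two-sigma rule stands in for a formal t-test:

```python
import numpy as np

# Regress O - P on the prediction epochs t_p: (O-P) = delta + beta*t_p + eps.
# The prediction passes when neither the mean shift delta nor the trend
# bias beta exceeds roughly twice its standard error.
def bias_regression(t_p, o_minus_p):
    A = np.column_stack([np.ones_like(t_p), t_p])
    x, *_ = np.linalg.lstsq(A, o_minus_p, rcond=None)
    res = o_minus_p - A @ x
    s2 = res @ res / (t_p.size - 2)            # residual variance
    se = np.sqrt(np.diag(s2 * np.linalg.inv(A.T @ A)))
    return x, se                               # (delta, beta) and their SEs

rng = np.random.default_rng(3)
t_p = np.arange(0.0, 20.0, 1 / 12)             # 20-year prediction span
omp_ok = rng.normal(0.0, 80.0, t_p.size)       # unbiased O - P, 80 mm noise
x_ok, se_ok = bias_regression(t_p, omp_ok)

omp_bad = omp_ok + 5.0 * t_p                   # inject a 5 mm/y trend bias
x_bad, se_bad = bias_regression(t_p, omp_bad)  # beta now clearly significant
```

With ~80 mm noise over a 240-month span, the SE of β is close to 1 mm/y, which is consistent with the 0.55 ± 0.90 mm/y estimate reported above.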
Nonetheless, the trend estimate is not well defined because of the noisy sea level variations during the Phase I period. This estimate is the potential bias in the predicted trend, but it is vague in the sense that the null hypothesis for its statistical significance cannot be rejected.
The results validate the concurrent model's potential statistical power for retrospective prediction. The outcome, however, does not mean that the model is also capable of extrapolating sea level anomalies into the future; that requires the prospective validation, which is the topic of the following section.

Prospective model solution
The prospective validation process follows the same narrative and the same steps as the retrospective validation, with the only difference being that the final 20 years of the TG record are allocated for the validation. The broken trend model solution's statistics (Table 5) agree with the concurrent model solution results (Table 2).
The model explains 43.4% (Adj. R²) of the variation in the sea level at this station during 1807-2000. The residuals shown in Figures 8, 9, 10 and 11 also confirm that the model effectively represents the systematic portion of the sea level variations, which makes it suitable for the prospective validation of the next section.

Prospective model validation
The estimated statistically significant trend parameters and periodicities are once again used to predict monthly sea level anomalies, this time prospectively. The observed and predicted sea level anomalies are again dominated by the annual and sub-annual sea level variations (Figure 10). Figure 11, based on the O − P differences, does not reveal a statistically significant mean shift or trend bias (Table 6) during the prediction period. The SE of the regression solution is still in agreement with the SE of the prospective model solution. Although the outcomes validate the prediction for the 20-year period statistically, the magnitudes of the estimated bias parameters are large, and their standard errors are too large to ensure precise predictions.

Conclusion
The utility of kinematic models for predicting coastal sea level rise over the globe during the 21st century relies on their predictive ability and on the availability of historical records long enough to capture low-frequency sea level variations as well as the ambient noise. The proposed and exemplified three-tiered statistical validation approach is informative for assessing their reliability. The scope of the validation is broadly defined here and needs further improvement and adoption by the sea level community as a standard. In this process, the length of the records needed to allocate sufficient data for the retrospective and prospective predictions, which are scarcely available at TG stations, is prohibitive. The prediction of sea level rise at the Brest station during the 21st century is validated. Although the models used for the predictions are coherent and effectively capture the low-frequency sea level variations at this station, the results show noisy sea level residual anomalies overlapping with potential prediction biases, hampering precise predictions at this station. In any case, the proposed three-tiered validation is needed at TG stations before any predictions are made, to establish the reliability of those predictions. The proposed validation approach is not prescriptive. Stakeholders must make their own assessments according to their needs, their models and their prediction horizon, i.e. the number of future intervals, to find the optimal length of the predictions, since the outcomes will vary at different localities for different time spans.

Note 1: The difference in the Phase I trend estimate in Table 3 compared to the one in Table 2 is due to the location of the initial epoch of the Phase I series, which is at the end of that series. Data located at the end or the beginning of a series always have a higher impact on the trend estimates, known as having high leverage (İz & Shum, 2000). That is why the initial epoch of a series should be located in the middle of the series, which was the reason for choosing 1900 as the initial epoch for the whole series, 1807-2019. To conform with this selection, the same epoch was kept for Phase I and Phase II, resulting in the different estimates.