Creation of a global tide analysis dataset: Application of NEMO and an offline objective analysis scheme

ABSTRACT The accurate prediction of tides is vital for the operation of many industries, early warning of coastal flooding and scientific understanding of ocean processes. In this paper, we describe the creation method of a global dataset of tidal harmonics using NEMO (Nucleus for European Modelling of the Ocean) for the first time and an offline objective analysis scheme. Data are assimilated as part of a post-processing step, reducing the computational resources required. A reduced ensemble of tidal harmonics is generated, where each member is run for a shorter period of time than a central background state. This ensemble is used to estimate a single background covariance state, which is used for analysis. Output is validated using an ensemble of objective analyses. For each ensemble member, random selections of observations are omitted and validation is performed at these locations. Improvements in both Mean Absolute Error (MAE) and correlation coefficients ( ) are seen across all 6 of the largest diurnal and semi-diurnal constituents. MAEs in amplitude and phase are reduced by up to and , respectively, and correlations by as much as 0.14. In addition, the majority of locations (between 70 and 80%) see significant improvement.


Introduction
Tides in the ocean are periodic movements due to the interactions of the Earth, Sun and Moon system (Pugh and Woodworth, 2014), often manifesting at the coast as a once or twice daily approach and recession of the ocean. Their prediction has a long history and many applications, both practical and scientific. They are vital for port operations, where ships and cargo must safely manoeuvre waterways and harbours. Forecasts of storm surges and resulting coastal inundation, such as those made by agencies such as the UK Met Office, rely heavily upon them, especially in regions with large tidal ranges. Whether or not the tide is high or low can make the difference between unnoticed changes in sea level and life-threatening coastal flooding. Commercial ocean operations such as oil drilling and coastal engineering also require accurate predictions as part of their working procedure.
Where there are long time series of observations available, such as at a tide gauge, tidal predictions may be created with relative ease using a harmonic analysis (Foreman and Neufeld, 1991;Foreman, 1996;Pugh and Woodworth, 2014). This procedure takes advantage of the strong periodicity in the tidal signal and decomposes it into the amplitudes and phases of its sinusoidal components, known as tidal constituents. Amplitudes and phases can then be used to reconstruct a tidal time series into the future with high accuracy. However, such methods are constrained by the availability of observations, both in time and space. Tidal predictions may only be made at the same location where the tidal signal was originally observed and in many regions, appropriate observations are sparse or none existent. Additionally, a long enough observed time series, at the appropriate frequency, is required in order to adequately resolve some constituents. In areas where observations are lacking, numerical models can instead be used to generate harmonics and subsequent predictions. Such models may be prone to errors, however. More accurate 'analysis' datasets can be developed through the use of data assimilation techniques to combine model data with observations. The creation of global datasets of tidal harmonics using numerical models, data assimilation and inverse methods has been well studied (Pekeris and Accad, 1969;Schwiderski, 1979;Parke and Hendershott, 1980;Ray, 1999;Egbert and Bennett, 1996;Arbic et al., 2010). Prominent examples of existing up to date databases include TPXO9 (Egbert and Erofeeva, 2002), FES2014 (Carrere et al., 2016) and HAM12 . Stammer et al. (2014) give a comprehensive review of global tide datasets and their accuracy. These datasets contain varying numbers of harmonics based on the assimilation of a variety of observations into a numerical model.
In this paper, we investigate a lightweight methodology for the generation of a new assimilative global dataset of tidal harmonics at a 1 12 -degree resolution. The dataset is created via the combination of harmonic output from a numerical model and observations from bottom pressure recorders and tide gauges. For the first time, the NEMO (Nucleus for European Modelling of the Ocean (Madec, 2008)) model is used for this purpose. Data assimilation is performed as a post-processing routine rather than during the model simulation. Such a method is flexible enough to allow for the dataset to be updated whenever new harmonic data becomes available, without the need to run new simulations. A reduced ensemble approach is used, wherein an ensemble of model harmonics is used to generate a single covariance matrix, which is subsequently used for assimilation of observations into a single, long model run. This is a quicker and more lightweight approach than in standard ensemble routines.
In the following sections, the generation and validation of the dataset are detailed and discussed. In Section 2, all aspects of the generation of our data are described, including the numerical model configuration, the analysis scheme used, selection of observations and harmonic analyses. Following on from this in Section 3, an attempt is made to validate the analysis through the creation of a pseudo-analysis. Such a validation is necessary to give us confidence that the analysis scheme works. In this section, our dataset is also compared visually to another prominent global tide dataset.

Overview
The dataset discussed in this study is generated by combining harmonic output from a numerical model and observations using data assimilation techniques. The order of operations used to do this is as follows: (1) Run a numerical global ocean model to obtain fields of sea surface height (SSH).
(2) Analyse model output to obtain fields of harmonic amplitudes and phases.
(3) Select and analyse observations to obtain harmonic amplitudes and phases. (4) Combine the observed harmonics and model harmonics using an ensemble optimal interpolation scheme (OI).
In this section, details on each aspect of the dataset generation are given: the choice of numerical model, the data used, the assimilation scheme and decisions on harmonic analysis. There are number of datasets discussed in this study. For later reference, their names and descriptions may be found in Table 1.

Numerical model
Background fields (see Section 2.4) are generated using a harmonic analysis of sea surface height (SSH) output from a numerical model, specifically NEMO (Nucleus for European Modelling of the Ocean) (Madec, 2008). The configuration used is similar to GO6 (Storkey et al., 2018), with a 1/12-degree ORCA Arakawa-C grid (Madec and Imbard, 1996) over a global domain. The grid is curvilinear and tripolar, allowing for the whole globe to be modelled. Physics are modelled using a finite difference integration scheme over 10 terrain-following vertical levels to resolve depth varying currents. Self-attraction and loading (see Farrell (1973) and Ray (1998)) are modelled using a scalar approximation scheme such that suggested by Accad and Pekeris (1978), with b = 0.94. Tidal forcing is applied barotropically to the ocean as a pressure-like force. This forcing is determined from an estimation of the equilibrium tide amplitude and phase for each constituent, based Blending together of linearly adjusted model and all observations using data assimilation techniques. Also referred to in this paper as GTM Validation analysis Harmonic dataset derived from an ensemble of analyses. Each member is an analysis for which a random 1 3 of observations is omitted from the assimilation procedure. Dataset is created by taking the ensemble average, but only when locations were omitted.

Observations
Harmonic observations from tide gauges and bottom pressure recorders used for assimilation and validation.
on work by Cartwright and Taylor (1971). A set of 31 constituents are used for this purpose (henceforth 'forcing' constituents). In all model runs, the ocean-free surface and currents begin at rest. Temperature and salinity are included in the model but are initially the same everywhere. All temperature and salinity fluxes into the ocean are set to zero for the duration of each run, keeping these variables constant. This homogeneity and constant density means that internal waves will not be well (or at all) represented by the model. However, the dissipation of barotropic tidal energy due to internal wave drag has been shown to be important in the open ocean (Egbert and Ray, 2000) and therefore must be parameterised in our configuration. This dissipation is modelled using the scale relation suggested by Jayne and Laurent (2001). The NEMO code was modified for this study to allow for this scheme. The modelling of ice is omitted from this version of the dataset, which will undoubtedly lead to some inaccuracies towards polar regions. This must be kept in mind when validating and assimilating observations in these areas.
A number of input datasets are used. Bathymetry is developed from the GEBCO dataset (GEBCO Compilation Group, 2020), for which the minimum depth is constrained to 23 m. This is not ideal, especially in shallower coastal areas and for nonlinear constituents, but is done to ensure long-term model stability.
Atmospheric forcing is applied using a bulk formulation as outlined by Large and Yeager (2004). The forcing files themselves are hourly averages generated using 30 years of ERA5 atmospheric data. This will allow for the influence of the atmosphere on some constituents whilst smoothing out extreme events such as storm surges. As previously mentioned, all surface heat and salinity fluxes are set to zeroonly wind and atmospheric pressure forcing is applied at the ocean surface to preserve constant density.

Observations
For assimilation and validation, harmonic data from bottom pressure recorder and tide gauges are used. These harmonics are derived from analyses of bottom pressure and sea surface height time series. Data from bottom pressure recorders are obtained from the IAPSO (Cartwright and Zetler, 1979), GLOUP [CITE] and PSMSL (Permanent Service for Mean Sea Level, 2019; Holgate et al., 2013) databases. Tide gauge records are obtained from the GESLA (Woodworth et al., 2017) database. The locations of these observations can be seen in Figure 2. Spatial coverage is good in many coastal areas, including the Northwest European Shelf and the coastlines of the USA and Japan. However, observations are sparse in polar regions, especially in the Arctic Ocean and South Pacific. The treatment and harmonic analysis of these observations is described in Section 2.5.

Assimilation scheme
2.4.1. Overview Data assimilation techniques are used to blend together SSH harmonics from the model and observed harmonics (discussed further in Section 2.4.2). Henceforth, the dataset of combined observations and model harmonics will be referred to as the 'analysis dataset', which is distinct from the harmonic analysis discussed in Section 2.5. Amplitudes and phases are not directly used in this scheme due to the periodicity of phase and strictly positive nature of amplitude. These characteristics create difficulties when applying modifications to the model (e.g. adjusted amplitudes must never be negative) and in ensuring that errors are normally distributed: a necessity for many assimilation schemes. Instead, z 1 and z 2 are used, defined here as: where a and g are the amplitude and phase, respectively, of a given constituent. These are the cartesian (or complex) analogue of amplitude and phase, and both lie in the set of real numbers, R. This also aligns well with comments made by Xu (2018) on the interpolation of harmonic information. It is important to note that only the assimilation of SSH is discussed in this study. Options for extending this to currents are discussed in Section 4.

Ensemble optimal interpolation
To create the analysis dataset, with observations assimilated into the model simulation dataset (hereafter the 'analysis'), Ensemble Optimal Interpolation (EnOI) is used (Oke et al., 2002;Evensen, 2003). As the name suggests, this is an ensemble version of Optimal Interpolation (Lorenc, 1986;Daley, 1991), where the analysis x a is obtained by solving the linear system: where x b is the background variable, y is a vector of observations, K is the gain matrix and H is a linear operator which transforms variables in the background space to the observation space (often just interpolation). For this study, x b is harmonic analysis output for modelled SSH, linearly adjusted such that errors have no offset or trend (henceforth 'linearly adjusted model'). The y is a vector of tide gauge and bottom pressure analyses as detailed in Section 2.3. The term y − Hx b is the innovation; a measure of the difference between the model and the observations. The K matrix describes a set of weights designed to minimise the error variance in the analysis and is determined using: where B is the background error covariance matrix and R is the observation error covariance matrix. For this study, the background error covariance matrix B is estimated as the Schur product of a correlation function ρ and a covariance matrix B ′ (Houtekamer and Mitchell, 2001): where B ′ is the covariance matrix obtained from the ensemble of model-like variables x ′ i : The variables x ′ i are model anomalies obtained by subtracting the ensemble mean from an ensemble, A, of model states x i : where, For this study, the correlation function ρ is intended to 'localize' the assimilation increment, i.e. limit the spatial extent of an observations influence. If ρ is chosen correctly then the product in Equation (5) retains the required properties of a covariance matrix. Gaspari and Cohn (1999) suggested a number of functions suitable for this purpose. We have chosen a simple singlevariable Gaussian type function: where r is a great arc distance between two points and σ is a constant to be chosen. The function is isotropic and decreases with distance with a rate dictated by σ, which is defined globally and tuned independently for each constituent by minimising mean absolute errors in z 1 and z 2 . This means that the localisation is homogeneous, which may not completely represent different length scales in the open ocean and coastal regions. However, it does preserve the required symmetry and positive semidefinite property of the covariance matrix (Gaspari and Cohn, 1999).
EnOI is closely related to the popular Ensemble Kalman Filter (EnKF) method (Evensen, 1994;Burgers et al., 1998), which has been used extensively in oceanic and atmospheric applications. It uses a covariance matrix which is stationary in time, applying a derived increment to a single deterministic model state. On the other hand, EnKF updates the covariance model by updating each member of the covariance ensemble at each application. As a result, EnKF is approximately N times more computationally expensive than EnOI. For our purposes, EnOI is sufficient, seeing as assimilation is done only into a single harmonic state, negating the need for an updated error covariance matrix (Oke et al., 2010).

Implementation
The ensemble used is comprised of 18 members, each being a harmonic analysis of a different model run. Bottom friction and β (used to model self-attraction and loading, see Section 2.2) are varied in each member in order to generate an appropriate spread in the resulting ensemble. More specifically, bottom friction is varied linearly 5% either side of the central value and β between 0.085 and 0.103. These two variables are highly influential in the generation and propagation of global tides in a barotropic model, and therefore the correct choice of value represents a significant portion of the error. For this study, the ensemble used is not expected to be optimal and there will be able to be improved. We discuss this further in Section 4.
In many ensemble assimilation methods (e.g. EnKF), the result is an ensemble of analyses. However, in this study, the ensemble is used only to generate an estimate of the background error covariance matrix B. Each member of the ensemble is generated from a model run of 3 months. Observations are assimilated into a single 'central' background state, which is generated from a longer harmonic analysis of 1-year. The values used for bottom friction and β in this model run are those that lie in the centre of the ensemble values. This methodology allows us to estimate the background error covariance matrix with less expense than performing the full-length run for each ensemble member. It also allows us to perform the assimilation step just once.
For assimilation purposes, errors in z 1 and z 2 are assumed to be independent of each other (zero correlation). Though a strong assumption, preliminary tests showed it gave better results than allowing for crossvariable correlation. Assimilation is then done independently for each harmonic constituent.
The linear problem detailed in Equation (4) becomes very large for a global domain. A parallel routine was developed to overcome this problem. The routine works by splitting the global domain into smaller subdomains and an analysis is created independently for each. All available innovations from the global domain are used for each subdomain to ensure solutions are the same as the full problem. The background covariance matrix B is never constructed explicitly. Instead, only BH T and HBH T , as required for K (Equation (4)), are calculated, consuming less memory.

Harmonic analysis
For both validation and the assimilation procedure, it is imperative that the harmonics from the model and observations are comparable. When using an assimilation scheme, the background and observed variables must represent the same quantity as best as possible. Therefore, a number of decisions and assumptions have been made in choosing a model harmonic analysis and selecting data for assimilation.
Model harmonics are obtained from a harmonic analysis of hourly sea surface height data. The diaharm module, packaged with NEMO, is used for this purpose. One year of model data is analysed for the final dataset, enough to resolve most major constituents, included those with longer periods. All forcing constituents are included in the analysis along with five nonlinear constituents. The choices of harmonic analyses of observed SSH varies per dataset and, in the case of bottom pressure recorders especially, per location. Observations to be included in the analysis are chosen based primarily upon the analysis length and available constituents. Specifically, only locations where K2 can be resolved (approximately six months or more (Foreman, 1996)) are chosen.
The period of time used for harmonic analysis of model data is different from that used for the observed data (which also varies from location to location). Therefore, by using this type of data, an assumption is made that the tidal harmonics are not changing significantly over the timescale of 20-30 years. It is likely that the tides are changing due to climate change (Pickering et al., 2017) ; however, these changes are likely to be small enough such that this assumption is not important in this context.

Validation of raw model
Before examining the analysis dataset, we take a look at raw model output (no assimilation or adjustment) to understand how well it performs. In this section, we perform a validation of the central background state described in Section 2.4.3. We focus on three diurnal and three semi-diurnal constituents. Q1, O1, K1 and M2, S2 and K2. All available observations are used to evaluate the raw modelmore than used in the actual assimilation or to validate the analysis in Section 3.2. Figure 1 shows the observed amplitudes and phases plotted directly against the modelled amplitudes and phases at then nearest model grid point. The results of a linear regression are also shown, along with the R 2 of the fit. Ocean and shelf points have been defined as a location where the bathymetric depth is deeper/shallower than 200 m, respectively. These points are shown by different markers in the figures. For all constituents, the gradient of the data is good for both phase and amplitude. The largest deviations of gradients from unity are seen for Q1 and O1 amplitudes, reaching 1.07 and 1.06, respectively. Offsets are also generally small, especially for amplitudes. R 2 values are generally good, exceeding 0.85 in most cases ; however, there is some significant spread in some of the constituents. In many cases, larger outliers are seen inbut not limited topoints on the shelf. Larger errors in shallower/ coastal regions are to be expected due to the relatively low resolution of the model and restrictions on depth (see Section 2.2). Overall, however, the model performs well, even in the absence of assimilation.
These errors can also be studied spatially. Figure 2 shows amplitude and phase errors on a geographic plot at observations locations. Spatial patterns in these errors vary considerably between the two constituents. In general, the magnitudes of amplitude errors are larger in regions where the amplitude is also larger, which is not surprising (see Figures 5 and 6 for amplitude magnitudes). Many of these areas are also coastal and/or have complex coastal geometry. For example the Bay of Fundy (northeast coastline of the North America) and some areas of the Northwest European Continental Shelf. Strong regional patterns are present in some areas. For instance, M2 amplitude is generally overestimated in the eastern Atlantic but underestimated in the Western Atlantic. The inverse of this is true in some regions for K1 amplitude, e.g. the eastern Atlantic and west coast of both Americas are underestimated. An area which stands out as having particularly large errors and also less of a consistent spatial pattern is the Western Pacific. Here, many areas are shallow and the complex island/coastal geometries are likely not well resolved.
Consistent spatial patterns are also seen in phase. At most locations, errors in M2 phase are below 30 • . Larger errors are typically seen along coastlines or in more enclosed areas such as the Baltic Sea and Gulf of Mexico. Again, a different pattern of errors is seen for K1 phase, with the largest errors being clustered in regions such as

Validation of analysis scheme
Validation can only be done at observation locations. However, The analysis dataset cannot be directly compared to observations as they have been incorporated into the data, giving high accuracy at observation locations. Instead of using the final analysis dataset, we take an ensemble approach to validation. For each constituent, a 120 member ensemble is created where in each member an analysis is generated as before but one third of observations, randomly chosen, are not included in the assimilation procedure. This means that in each ensemble member, we have a set of observations that can be used for validation. For this approach, observations are first thinned such that the minimum distance between any two points is 300 km. This is to reduce the effect of areas of relatively dense observations skewing the analysis.
Using the above ensemble, a new dataset is constructed by taking an ensemble at each observation location, but only when data was not included in the ensemble member. Henceforth, this dataset will be referred to as the 'validation analysis'. Figures 3 and  4 show results from this process, which are discussed further below. Error statistics for the validation analysis are compared to those from model data that has not seen data assimilation. For this comparison, the linearly adjusted model output, as described in Section 2.4 and Table 1 is used. Tables 2 and 3 summarise the results.  open ocean (depth exceeding 200 m). In almost all cases, amplitude and phase MAE are larger in the shallower areas for both the linearly adjusted model and validation analysis. The exception is O1, for which the shelf phase error is slightly smaller than that in the open ocean. In all cases, MAEs are at least 30% lower for the validation analysis than the linearly adjusted model suggesting that, on average, errors are indeed being reduced. As can be seen in Tables 2 and 3, this reduction is as much as 58% for amplitude and 71% for phase (both for O1). Figure 4 shows how well the linear adjusted model and validation analysis correlate with observations using the Spearman's Rank correlation coefficient R 2 . Similarly to Figure 3, amplitude and phase values are plotted on separate axes. In this case, a movement towards (1, 1) shows where the validation analysis had a higher correlation coefficient than the linearly adjusted model. Mirroring the MAE case, Correlations are generally lower in shelf regions but still high in most cases. For every constituent, the validation analysis improves correlations for both amplitude and phase. For some constituents, this improvement is modest, for example, M2. Others see significant improvement such as Q1 and O1 phase. Improvements in MAE and correlations at locations not included in the analysis show that, on the whole, the analysis scheme is spreading innovations into the domain realistically. The scheme is not perfect, however, which is demonstrated by the number of points which do not see improvement and MAE/correlations have room for further improvement. However, these results give confidence in the analysis schemes ability to blend observed data into the modelled harmonics. This is especially true for the final analysis, which incorporates many more observations.

Analysis dataset
In this section, we visually compare the analysis datasets generated for this study to another prominent tidal dataset: FES2014 (Carrere et al., 2016). This comparison allows for an inspection of large scale regions where the datasets vary and to reason why. These comparisons are shown in Figures 5 and 6 for M2 and K1, respectively. Other constituents are not discussed here ; however, the conclusions remain the same. For this section we refer to our dataset as GTM (Global Tide Model). It is important to note that in this section we do not intend to make a direct comparison with FES2014 or to improve upon it ; however, we choose to use this dataset as a broad check that our scheme works over large scales. Figure 5 shows amplitude and phase comparisons for M2. The datasets compare well for both variables, especially away from the poles. The locations of many major amphidromes (identified as areas of strong phase gradient and zero amplitude) are similar in many regions, for example in the North Atlantic, Northeast Pacific and Northwest Pacific. Even in some shallower more coastally complex regions, such as the North Sea and Caribbean Sea, these systems compare well. There are also areas of notable difference. An amphidrome in the FES2014 Weddell sea is missing (or perhaps degenerate) in the GTM dataset. This is likely due to differences in domain extent, as the domain used for GTM does not extend as far south as the FES2014 domain. Another notable area of difference is the Southeast Pacific. This  is an area where tide gauge and bottom pressure observations are sparse (see Figure 2). Figure 6 shows comparisons for K1. Again the two datasets compare well; however, the Southern Ocean shows some large differences in both amplitude and phase, especially in the South Pacific which, as already noted, is an area of limited tide gauge and bottom pressure observations. There are also some smaller scale amplitude features seen in the FES2014 Wedell and Ross Seas that are not present in the GTM dataset. This is again most likely due to domain constraints and the lack of ice modelling in our model. Overall, these results suggest that our method works wellwhere observations are available.

Discussion
In this study, the generation of a global dataset of tidal harmonics was described and evaluated. The method used data assimilation techniques to combine modelled and observed sea surface height harmonics. A full year of model data (from a NEMO configuration) was used to obtain harmonic information which was in turn combined with harmonic data from tide gauges and bottom pressure recorders. The assimilation procedure is completely offline, i.e. data assimilation is performed in a post-processing step rather than on a time step basis within the model. In addition, an ensemble is used only for covariance estimation, with each member being a truncated, shorter version of the full assimilative run. As a result, computation is relatively inexpensive compared to online assimilation or methods using an ensemble of full-length model runs, and the system is flexible, allowing for fast updates to the dataset whenever new simulated or observed data becomes available.
The assimilation scheme was validated using a specially constructed validation dataset, generated from an ensemble of analyses. In each member of this ensemble, one third of the observations were randomly omitted and it is at these locations that the validation was performed. This approach suggested that the assimilation scheme was able to reduce MAEs and increase correlation coefficients for both amplitude and phase in regions where observations were sparse. When averaged across all locations, improvements were seen in all constituents. However, improvements were not seen at all locations. Our validation analysis shows that in all cases, a percentage of points saw no significant improvementapproximately 20-30% for both amplitude and phase. There are numerous reasons why this may be the case, which we discuss in the following paragraphs.
The assimilation of z 1 = a sin g and z 2 = a cos g means that only harmonic data may be used and there are fundamental limitations associated with the offline assimilation of harmonics, limiting the available observations to locations where time series are sufficiently long. Increments are also constrained to regions near observations. The lack of forward model in the assimilation procedure means that increments will not propagate dynamically from areas with observations to those without, as would be the case for online assimilation of sea surface height. Although spatial propagation of information will be captured by the ensemble covariance method, regions which have sparse observations may see little improvement or gain a noisy characteristic, as seen in Section 3.3 for the South Pacific. This may partially explain why some regions do not see any significant improvement.
Changes to the numerical model could result in better background data and a more representative ensemble used to generate the background error covariance. For example, higher resolution in coastal areas, a relaxed restriction on bathymetric depth and the inclusion of ice into the model. Physical parameterisations could be further tuned, improved or even replaced. For example, in this study the scalar approximation method was used to parameterise self-attraction and loading; however, other methods with better accuracy are available (Ray, 1998;Stepanov and Hughes, 2004) such as iterative methods (Accad and Pekeris, 1978) and Green's Function methods (Hendershott, 1972;Farrell, 1973). These methods can be computationally costly, however, and were beyond the scope of this work. The dissipation of barotropic tidal energy in the open ocean is also very important for global tide modelling (Jayne and Laurent, 2001;Simmons et al., 2004). The parametrisation used for this work is a simple scale relation and could be improved. Full modelling of internal physics could also be done if vertical density profiles were not homogeneous (and a higher resolution model was used). The ensemble used for estimation of the background covariance matrix in this paper is not optimal and could be improved. Future work is required to iterate upon it and improve its representation of the model error, leading to innovations being spread from the observation point into the domain in a more realistic fashion. The ensemble used in this paper is not optimal and This may mean improving the physical model or the statistical decisions made when designing the ensemble itself. The ensemble used for this study was small (18 members) and limited. Despite this, however, our scheme worked well and improved the model at the majority of locations.
The construction of the covariance matrix itself may also benefit from further iteration. In this paper, it is constructed from the combination of the ensemble covariance and a Gaussian localisation function. Although this worked for our purposes, this could be improved. For example, the localisation function uses haversine distance as a parameter, which may not be the best choice in the presence of complex coastlines and islands. Signals propagate around coastal boundaries, not through them as implied by such a distance metric. For future work, other methods are available, such as shortestpath methods (see Byrne et al. (2021)), which calculate distances around coastal boundaries, although these methods may be costly on a global domain. The Gaussian function used also assumed spatial homogeneity and isotropy. Although these assumptions simplify the analysis procedure, investigations into using different lengths scales in different regions (e.g. open ocean and on the shelf) may improve the methodology further.
An important limitation to note is that of the variance in harmonic analyses used for both the observations and model. It was mentioned in Section 2.5 that the observed and modelled tidal harmonics were all derived from different time periods of data. In addition to this, however, these were also calculated using different analysis parameters: analysis length, analysis period, constituent sets and software choice. This may result in some representativity error when comparing model harmonics with the observations, which will subsequently find its way into the analysis dataset. The alternative is a vastly reduced observation dataset due to the lack of original time series and the need to run the model for a limited period. For this study, we accept this uncertainty but it will require further study in the future.
Finally, creating dynamically consistent barotropic currents can be a challenge, although options are available. SSH harmonics can be inverted using a method such as that outlined by Ray (2001). Such a method, however, means having to solve very large systems of equations on a global domain. Alternatively, SSH-current covariances can be determined from an ensemble and used to adjust currents within a multivariable assimilation scheme such as those by Lorenc (1981), Courtier et al. (1998), andLorenc et al. (2000).
The limitations of the simulation physics provide a greater challenge for the offline assimilation scheme that is presented here. Nevertheless the scheme is demonstrated to improve Mean Absolute Errors and correlation coefficients across the largest diurnal and semi-diurnal constituents.