## ABSTRACT

The old principle of parsimonious modelling of natural processes has regained its importance in the last few years. The inevitability of uncertainty and risk, and the value of stochastic modelling in dealing with them, are also again appreciated, after a period of growing hopes for radical reduction of uncertainty. Yet, in stochastic modelling of natural processes several families of models are used that are often non-parsimonious, unnatural or artificial, theoretically unjustified and, eventually, unnecessary. Here we develop a general methodology for more theoretically justified stochastic processes, which evolve in continuous time and stem from maximum entropy production considerations. The discrete-time properties thereof are theoretically derived from the continuous-time ones and a general simulation methodology in discrete time is built, which explicitly handles the effects of discretization and truncation. Some additional modelling issues are discussed with a focus on model identification and fitting, which are often made using inappropriate methods.

**EDITOR** Z.W. Kundzewicz **ASSOCIATE EDITOR** S. Grimaldi

## 1 Introduction

The principle of parsimony in explaining and modelling natural phenomena is very oldFootnote^{1}
but in the past three decades its importance may have been neglected. The power of computers encouraged many to seek inflationary detailed modelling approaches, hoping that these would provide exact descriptions of real-world systems by representing any detail thereof and would radically reduce uncertainty. After a period of growing hopes for dramatic reduction of uncertainty in modelling of natural, and particularly hydrological, processes, the inevitability of uncertainty and risk is again recognized. At the same time, the value of stochastic modelling in dealing with uncertainty and risk is also again appreciated (Koutsoyiannis *et al*. Citation2009, Koutsoyiannis Citation2010, Montanari and Koutsoyiannis Citation2012, Montanari *et al*. Citation2013). Stochastic modelling aims to make probabilistic predictions of future events, conditional on past observations or not. As analytical calculation of probabilities becomes cumbersome or even infeasible in many cases, stochastic modelling is typically combined with Monte Carlo simulation, which relies on generation of synthetic time series for the processes of interest.

The earlier departure from the principle of parsimony may have also affected stochastic modelling of natural systems. The literature offers a variety of stochastic models that are widely used in simulation and prediction of hydrological and geophysical time series. The dominant approach in stochastic modelling is to choose and fit a model from a repertoire offered in books on time-series analysis. This includes stylized families of models mostly known by their acronyms, such as AR (for autoregressive), MA (for moving average), ARMA (for autoregressive moving average), ARIMA (for autoregressive integrated moving average), ARFIMA (for autoregressive fractionally integrated moving average), and so on. Despite being so common, most of the models are too artificial (in the sense that they were constructed by taking an initially simple model and expanding it mathematically using additional terms), while the typical modelling approach is characterized by the following problems:

These models refer, by definition (or by construction), to discrete time while natural processes evolve in continuous time. A model defined in discrete time may not necessarily correspond to a continuous-time process.

With the exception of a few cases, such as AR(1) or ARMA(1,1), the models are over-parameterized and thus not parsimonious. Moreover, the stochastic structure of the processes is tightly associated with the number of parameters (e.g. to make an autoregressive process with different structure from that of the AR(1) process, we need to use a number of additional autoregression parameters, thus constructing an AR(

*q*) process with*q*> 1). This proves to be a problem in many cases (Koutsoyiannis Citation2000) as ideally, in parsimonious modelling, the alteration of the stochastic structure of a process should not require additional parameters. In cases where the stochastic model is coupled with a deterministic process-based model (see Montanari and Koutsoyiannis Citation2012) the parameters of the stochastic process are additional to those of the deterministic model and thus the lack of parsimony in the stochastic model also affects the entire modelling scheme.Identification of the models is typically based on estimation of the autocorrelation function from data. However, this is characterized by negative bias and uncertainty. While in model identification much importance is typically given to the autocorrelation coefficients for a number of lags, the bias is effortlessly neglected. However, the bias may be very high, particularly when the process is characterized by long-term persistence (LTP), so high that it may even hide the presence of LTP (Koutsoyiannis Citation2003, Koutsoyiannis and Montanari Citation2007, Papalexiou

*et al*. Citation2011; see also Section 6.1 below).

It is not too difficult to get rid of all these model families and modelling practices, which are in fact unnecessary, and to construct a general method to generate synthetic series easily from processes with other autocorrelation structures. Koutsoyiannis (Citation2000) proposed a methodology for the generation of time series from a process with any arbitrary autocorrelation function, provided that it is mathematically feasible (that is, it is positive definite, or, in other words, corresponds to a non-negative power spectrum). A specific application of this generic methodology for the Hurst-Kolmogorov process (also known as fractional Gaussian noise; see below) has been given in Koutsoyiannis (Citation2002).

This paper reports some essential developments on several issues that have been made in the last decade and remain unpublished, in the hope that the rather simple, parsimonious and general framework the paper provides suffices for most applications of stationary processes, with an emphasis on those with LTP, in hydrology and beyond, single-variate or multivariate. The case of cyclostationary processes is also briefly discussed. In brief, the advancements reported next are the following:

The entire framework is based on the concept of a continuous-time stochastic process (Section 2). Inevitably, the generation of time series (similar to the observation of a natural process) requires discretization of time and this can be done either by instantaneously sampling from the continuous-time process or by aggregating or averaging it at specified time intervals. However, instead of formulating stochastic properties for discrete time, it is possible (and much better; see Section 3) to derive the properties of the latter from the continuous-time process.

The specific processes examined and presented as examples (Section 3) are parameter parsimonious as, in addition to the marginal moments of the process and a time-scale parameter, they contain no more than two (shape) parameters. While the simulation methodology discussed can be applied with as many parameters as one wishes, over-parameterization should be avoided at any rate, particularly in stochastics (e.g. Lombardo

*et al*. Citation2014). As far as the dependence structure of the process is concerned, only the parameters that suffice to determine the type of this structure (e.g. short- or long-range dependence or both) should (and usually can) actually be determined.The framework has a tight connection with entropic considerations, which provide some guidance about the expected type of dependence and make the models more physically plausible.

The framework puts special emphasis on the climacogram, which is defined to be the variance (or the standard deviation) of the time-averaged process as a function of time scale of averaging (Koutsoyiannis Citation2010). This tool has some convenient theoretical properties but, above all, it is best suited for model identification and parameter estimation. While, like other stochastic tools, it also involves estimation bias, this can be determined analytically and included in the model identification and fitting.

The framework is complete and stand-alone as it incorporates a general procedure to simulate from any stationary stochastic model.

The framework is illustrated with several examples (Section 5.2). These are not real-world case studies but rather designed to underline the simplicity and effectiveness of the procedures in simulation.

## 2 Definitions and basic concepts

Let * x*(

*t*) be a stationary stochastic process quantifying the instantaneous quantity of a certain hydrological or other physical process at (continuous) time

*t*. The quantity

*is assumed to be a (continuous) random variable and this is reflected in the notation by underlining it, adopting the so-called Dutch convention (Hemelrijk Citation1966). Regular variables such as the time*

__x__*t*or realizations of

*are denoted by non-underlined symbols. Since the process is stationary, the marginal distribution function of*

__x__*(*

__x__*t*),

*F*(

*x*) :=

*P*{

*(*

__x__*t*) ≤

*x*}, where

*P*denotes probability, as well as the marginal probability density function

*f*(

*x*) := d

*F*(

*x*)/d

*x*are real-valued functions of the real (regular) variable

*x*, not functions of time. (Throughout the text, definitions are denoted by the symbol “:=” meaning “is defined as”).

While natural processes evolve in continuous time, observation and generation of synthetic time series are only made in discrete time and therefore we always need to consider derived discrete-time processes and to take into account the effect of discretization in the properties thereof. As shown in , from the continuous time process, we may define two interesting discrete-time processes (with the discrete time denoted as *i*), for which we usually observe or generate time series. By sampling at spacing *Δ* we obtain the sampled instantaneous process

By averaging at time scale *Δ* we obtain the averaged process

where

is the cumulative process (see , upper panel), which obviously is nonstationary. Both the sampled instantaneous process and the averaged process are stationary. From them we can construct the nonstationary cumulative processes at discrete time *i*: namely, that of is precisely * X*(

*iΔ*)

*/Δ*, while that of is different, i.e.:

Without loss of generality we assume that the mean of the process * x*(

*t*) is zero (E[

*(*

__x__*t*)] = 0) and we denote

its variance, which is also the variance of the sampled process, i.e. . Let

be the variances of the cumulative and the averaged process, respectively (note that *Γ*(0) = 0 and *γ*(0) = *γ*_{0}). We call *γ*(*Δ*) and *Γ*(*t*), as functions of time scale *Δ* or time *t*, the *climacogram* and the *cumulative climacogram* of the process, respectively (see Koutsoyiannis (Citation2010) for the justification of the term and note that sometimes it has been used to describe the standard deviation instead of the variance). We observe that, while *Γ*(*t*) is the variance of the nonstationary process * X*(

*t*),

*γ*(

*Δ*) is the variance of the stationary process , i.e. and thus it is not a function of time

*t*.

Once the climacogram (or the cumulative climacogram) is known, every second-order property of the process, such as the autocovariance, autocorrelation, power spectrum, variogram or structure function, can be obtained from this (Koutsoyiannis Citation2013b). For example, the autocovariance

where *τ* is the lag time, is related to the climacogram by

while the inverse transformation is

Also, the power spectrum *s*(*w*), where *w* is frequency, is, by definition, the Fourier transform of the autocovariance, i.e.:

(note that *c*(*τ*) is an even function), while the inverse transformation is:

From (11), we observe that the entire area under the spectrum for frequencies from 0 to ∞ equals the variance of the process (*c*(0) = *γ*_{0}); for that reason the multiplier 4 was used in the definition of the power spectrum in (10).

The structure function (also known as the semi-variogram or variogram) of the process is defined as:

i.e. it is directly connected to the autocovariance and hence can also be estimated from the climacogram. However, by analogy we can define a climacogram-based structure function (CBSF) as:

which has the same asymptotic properties as the standard structure function *h*(*τ*). Combining the climacogram and the CBSF we can also derive a climacogram-based spectrum (CBS), defined as

where is frequency (as in the standard power spectrum). In most cases, the asymptotic behaviour of the CBS is the same as that of the standard power spectrum, and thus the CBS constitutes an easily calculated (i.e. without performing a Fourier transform) substitute for the standard power spectrum, particularly useful when handling data series of small length.

The variance (hence the climacogram) is more naturally than any other of the above tools associated with the uncertainty, and hence the entropy, of the process. We recall that the (Boltzmann-Gibbs-Shannon) entropy of the cumulative process * X*(

*t*) with probability density function

*f*(

*X; t*) is a dimensionless quantity defined as:

where *h*(*Χ*) is the density of a background measure. Entropy acquires its importance from the *principle of maximum entropy* (Jaynes Citation1957), which postulates that the entropy of a random variable should be at maximum, under some conditions, formulated as constraints, which incorporate the information that is given about this variable. Its physical counterpart, the tendency of entropy to become maximal (second law of thermodynamics) is the driving force of natural change.

Assuming that the background measure is the Lebesgue measure, with constant density , and using simple constraints related to the preservation of the mean and variance, the principle of maximum entropy yields that the probability distribution of * X* will be Gaussian. For a Gaussian process, the entropy depends on its variance

*Γ*(

*t*) only and is given as (Papoulis Citation1991):

Koutsoyiannis (Citation2011) defined entropy production as the time derivative of entropy:

and the entropy production in logarithmic time (EPLT) as the derivative in logarithmic time:

For a Gaussian process, by virtue of (16), the EPLT is:

This expresses in dimensionless terms the rate of uncertainty production at time *t* > 0 when nothing is known about the past (*t* < 0) and the present (*t* = 0). When the past and the present are observed, instead of the unconditional variance *Γ*(*t*) we should use a variance *Γ*_{C}(*t*) conditional on the past and present. For a Markovian process (see Section 3), in which only the present (the latest known value) matters in conditioning, it is relatively easy to calculate the conditional variance. Using detailed results from Koutsoyiannis (Citation2011) for a Markovian process, we can verify that the following relationship holds:

while the derivative is:

so that the conditional EPLT is:

Equations (20)–(22) are strictly valid for a Markovian Gaussian process. However, any persistent (positively correlated) process (under some rather general conditions) can be decomposed into a sum of Markovian process (Koutsoyiannis Citation2002, Citation2011), and therefore (20) and (21), by virtue of their linear form (subtraction of two functions), can provide approximations for any persistent Gaussian process. In fact equation (20) provides a lower bound of the conditional variance for a non-Markovian process: it would be exact only if, by knowing the entire past of * x*(

*t*), we were able to decompose

*(0) into the components corresponding to the constituents of the sum of Markovian process. This is not actually the case as the actual conditioning information is lower than assumed to derive (20). An alternative numerical method to evaluate an upper bound of*

__x__*Γ*

_{C}(

*t*) and consequently can be based on the methodology presented in Koutsoyiannis (Citation2005b) and Koutsoyiannis

*et al*. (Citation2007) for discretized processes. Nonetheless, if EPLT is to be calculated, equation (22) may provide a sufficient approximation because, by taking the ratio of the two quantities, the errors in the numerator and the denominator will tend to cancel each other.

## 3 Effect of discretization

As explained in the Introduction, observation and generation of synthetic time series are made in discrete time and therefore we always need to take into account the effect of discretization in the properties of the discrete-time processes. Once the continuous-time climacogram is known, the discrete-time autocovariance function of the averaged process can be determined from the following relationship, analogous to (8), which is exact and generic (Koutsoyiannis Citation2013b):

where is the second-order central difference operator for spacing *Δ* and is the second finite derivative. This is valid for *j* > 0, but it can be written in the following manner so that it is valid for any integer *j*:

It is easily seen that for *j* = 0 this results in , while , as it should.

For the sampled process, assuming that the discrete-time cumulative climacogram

is known for any positive integer *k*, the following relationship, similar to (23), gives its autocovariance function for any positive integer lag *j*:

This holds also for any type of discrete-time process, even if it is not linked to a continuous-time one.

The inverse relationships, which give the (sampled) discrete-time climacograms from the autocovariance function and also hold for any type of discrete-time process, are

where is the variance of the discrete-time process. The following recursive relationship facilitates calculation of the climacogram:

Specifically, it is easily verified that recursive application of (29) results in (27).

The direct calculation of the discrete-time power spectrum from that in continuous time is feasible but somewhat involved. Specifically, discretization makes the power spectrum a periodic function with period *Δ* and since it is, by definition, an even function it suffices to determine it for frequencies *w* in the interval (0, 1/2*Δ*) only, where the upper bound of this interval (1/2*Δ*) is known as the Nyquist frequency. Usually, in discrete time we use a dimensionless frequency *ω* and we also standardize the power spectrum so that its area remains equal to the variance, i.e.:

Thus, for the sampled process the following relationship connects with that in continuous time, *s*(*w*) (see Papoulis Citation1991, p. 336):

To find the relationship of of the averaged process with *s*(*w*), we utilize (31) after we have applied a moving average operation to the process. The power spectrum of the process obtained as the moving average of at time window *Δ*, i.e. (an operation known as a low-pass filtering; see Papoulis Citation1991, p. 325, also noticing the different notational and other conventions), is:

where . By sampling at intervals *Δ*, we find:

Alternatively, the power spectrum of a discretized process can be calculated directly from the autocovariance function of the discretized process. Thus:

for the sampled and the averaged process, respectively. The former expression can also be applied for a discrete-time process that is not linked to a continuous-time process. Note that the first term in each of the right-hand parts of (34) equals the respective variance, i.e. in the case that the discrete-time process is sampled from a continuous-time process) and . The inverse transformation is:

Equations (34) and (35) are the discrete-time analogues of the continuous-time equations (10) and (11). Notice that, even in the discrete-time case, the inverse transformation is an integral, not a sum.

## 4 Example processes

Three example processes have been chosen to analyse here on the basis that they have some properties that make them physically plausible and thus candidates to be used in hydrological and other applications. Specifically, they correspond to extremization of entropy production—in particular extremization of EPLT, which as shown in Koutsoyiannis (Citation2011) is equivalent to extremization of entropy production. The first two, the Markov process and the Hurst-Kolmogorov (HK) process, are well-known processes which Koutsoyiannis (Citation2011) found to correspond to EPLT extremization for asymptotic times, *t* → 0 and *t* → ∞; the third process, the hybrid Hurst-Kolmogorov process (HHK), is introduced here as a physically plausible modification and generalization of the HK process.

While the three processes will be described in the following subsections, it is instructive to overview at this point their entropic properties. These are illustrated in , which depicts their climacograms and their EPLTs *vs* time. At time scale *Δ* = 1 all three processes have the same variance, *γ*(1) = 1, and the same autocovariance for lag 1, = 0.5. These two values are used as constraints in the extremization of EPLT, as detailed in Koutsoyiannis (Citation2011). As seen in , the Markov process has the highest EPLT for *t* → 0 (*φ*(0) = 1, *φ*_{C}(0) = 1.5) and the lowest for *t* → ∞ (*φ*(∞) = *φ*_{C}(∞) = 0.5). Conversely, the HK process has constant EPLT, *φ*(*t*) = *φ*_{C}(*t*) = *H* = 0.7925, where *H* is known as the Hurst coefficient and its value 0.7925 is fully determined from the above constraints (see Section 4.2). This EPLT of the HK process is lowest for *t* → 0 and highest for *t* → ∞. The HHK process has the highest EPLT both for *t* → 0 (*φ*(0) = 1, *φ*_{C}(0) = 1.5 as in the Markov case) and for *t* → ∞ (*φ*(∞) = *φ*_{C}(∞) = *H* = 0.7925, as in the HK case).

The subsections below provide all the details of the three processes, while additional interesting continuous-time processes are given in Gneiting *et al*. (Citation2012) and Koutsoyiannis (Citation2013b).

### 4.1 The Markov process

By definition, a Markov process is one in which the future does not depend on the past when the present is known (Papoulis Citation1991). As a consequence of this, the autocovariance function of the process is a pure exponential function (equation (T1.4) in ), in which *λ* (dimension [*x*]^{2}) is a state-scale parameter and *α* (dimension [*t*]) is a time-scale parameter, while all other second-order properties of the process are determined using the framework of Sections 2 and 3 and are given in .

The process in continuous time is also known as the Ornstein-Uhlenbeck process. By inspecting and comparing with known equations in the time-series literature, it is seen that in discrete time the process is identical to an AR(1) process if it is sampled at equidistant times, or a special case of an ARMA(1,1) process if it is averaged at time windows with constant length.

Major advantages of this process are its simplicity (all its second-order properties are given by simple equations in ) and parsimony (it contains only one parameter, *α*, in addition to those describing its marginal distribution). However, the Markovian property, which is an ideal case not verified in nature, does not make it quite plausible for natural phenomena.

### 4.2 The Hurst-Kolmogorov process

The HK process can be defined based on its climacogram, which is a power function of time scale (Equation (T2.2) in ). The autocovariance function and hence the power spectrum are then easily determined (see ).

The process in continuous time is also known as fractional Gaussian noise (FGN) due to Mandelbrot and van Ness (Citation1968), although these authors used a more complicated approach to define it. In essence, though, the mathematical process had been earlier proposed by Kolmogorov (Citation1940), while Hurst (Citation1951) pioneered the detection in geophysical time series of the behaviour described by this process; hence the name we use for this process. Because this process has infinite instantaneous variance, the sampled process in discrete time is not meaningful (many properties take infinite values). However, the averaged process is well behaved, with all of its properties (including its variance) finite, which makes it quite useful in applications.

The HK process is almost equally as simple and parsimonious as the Markov process; again it contains only one parameter, *H*, in addition to those describing its marginal distribution. Notice that the process variance is controlled by the product *λ α*^{2–2H}, so that *λ* and *α* are not in fact separate parameters. Despite that, we prefer the formulation shown in with three nominal parameters for dimensional consistency: *α* and *λ* are scale parameters with dimensions [*t*] and [*x*]^{2}, respectively, while *H*, the Hurst coefficient, is dimensionless in the interval (0,1).

For *H* = 0.5 the process reduces to pure white noise. For 0.5 < *H* < 1 the process exhibits LTP (also called long-range dependence). For 0 < *H* < 0.5 it is called an antipersistent process. Most of the expressions shown in are valid in the all cases. However, the autocovariance *c*(*τ*) has different expressions in the three cases, as shown in . Specifically, for *H* < 0.5, the autocovariance *c*(*τ*) is negative for any lag *τ* > 0, tending to −∞ as *τ *→ 0. However, at *τ* = 0, *c*(0) = +∞, because this is the variance of a process that cannot be negative; thus, there is a discontinuity at *τ* = 0. Consequently, the averaged process has positive variance and all covariances negative. Such a process is not physically realistic because real-world events at near times are always positively correlated, which means that for small *τ, c*(*τ*) should be positive. Also, the infinite variance cannot appear in nature (it would be associated with infinite energy); even the white noise, whose autocovariance is a Dirac delta function (corresponding to infinite variance, see (T2.3) for *H* = 1/2), cannot describe a natural process. Thus, the HK process can describe natural phenomena only for 0.5 < *H* < 1 and for time scales not too small. Furthermore, values of *H* > 1 that sometimes are reported in the literature are mathematically invalid (Koutsoyiannis Citation2013b) and are the results of inconsistent algorithms.

Most of the functions in describe simple scaling laws, i.e. are depicted as straight lines if plotted in double-logarithmic plots. More formally, these slopes are represented by log-log derivatives, such as:

and likewise for , , etc. It can be seen that for any *Δ* ≥ 0, for any *τ* > 0—and also for *τ* = 0 in the persistent process. The slope of the power spectrum is also constant, i.e., , which is negative for the persistent process, positive for the antipersistent process and zero for the purely random process. As footnoted in , the power spectrum of the sampled process diverges while that of the averaged process exists but does not have a closed expression. Numerically the latter can be evaluated from (34) and generally approaches the continuous-time power spectrum whose expression is given in (T2.5). For small frequencies the approximation is perfect, but as *w* approaches 1/2*Δ* (the Nyquist frequency), the approximation worsens until the slope of becomes exactly zero at *w* = 1/2*Δ*. Interestingly, in the antipersistent process there is a range of intermediate scales (close to ~1/5*Δ*) where the log-log slope becomes much steeper than , close to +2. However, by choosing a higher *Δ*, this range moves to higher frequencies, so that as *Δ* → 0, this high-slope area disappears.

Notably, the HK process is the only one in which the log-log derivatives of all second order properties are constant for any *Δ, τ* and *w*, thereby defining simple scaling laws. In all other processes these log-log derivatives vary, yet scaling laws are defined asymptotically, i.e., for *Δ, τ* or *w* tending to 0 or tending to ∞. In any process other than the HK the respective asymptotic values of the log-log derivatives are different in the two asymptotic cases, 0 and ∞. Interestingly, combining (36) with (9) and (19) we conclude that for Gaussian processes the EPLT is related to the log-log derivative of the climacogram by

This makes EPLT maximization equivalent to maximization of *γ*^{#}(*t*). The above results confirm that in the HK process the EPLT is constant, *φ*(*t*) = *H*, and also enable generalization of the Hurst coefficient for any process by

### 4.3 The hybrid Hurst-Kolmogorov process

This process is introduced here in terms of its climacogram, described by the Cauchy-type relationship

Here *α* and *λ* are scale parameters with dimensions [*t*] and [*x*]^{2}, respectively, *H* is the Hurst coefficient as in the HK process, a dimensionless scaling parameter in the interval (0,1), and *κ* is a second scaling parameter, again dimensionless and in the interval (0,1), which will be called the fractal parameter. As we will see, *H* determines the global properties of the process (as *t* → ∞) and *κ* determines the local properties (as *t* → 0).

This process remedies both inconsistencies of the HK process discussed above, while retaining the persistence or antipersistence properties. Specifically, the variance of the instantaneous process is always finite (*γ*_{0} = *γ*(0) = *λ*), while even for 0 < *H* < 0.5 the initial part of the autocovariance function for small lags is positive for all variants of the process (continuous time, discrete time, either sampled or averaged, for a small time interval *Δ*).

A similar process was proposed by Gneiting and Schlather (Citation2004), which has a Cauchy-type autocorrelation function rather than climacogram. However, the climacogram-based definition in (39) is superior, because it allows for antipersistent processes for 0 < *H* < 0.5, which have positive autocovariance for small lags and negative for large lags. (In the process with a Cauchy-type autocorrelation function, autocovariances are either all positive or all negative). The name “hybrid” for the process defined in (39) reflects the fact that it incorporates both the Markov and the HK processes. Specifically, in the special case with *H* = *κ* = 0.5, HHK is practically indistinguishable from a Markov process (even though not precisely identical, as can be seen by comparing (39) with (T1.2)). Furthermore, as *α* → 0, the process tends to a pure HK process with the same Hurst coefficient *H*. Also, for any specified parameter set, HHK exhibits Markov behaviour for small time scales (if *κ* = 0.5, or similar to Markov if *κ* ≠ 0.5) and Hurst behaviour for large time scales, as seen in .

The physical realism of the process is its strongest advantage. This is expressed by the fact that it maximizes entropy production for both small (*t* → 0) and large (*t* → ∞) times, as well as by the properties discussed above in comparison with the HK process. Empirical testing of the consistency of the process with nature is not quite so easy, as the availability of observed time series of natural processes that have both short observation scale and large sample size is quite limited. Note that, if the available time series had short observation scale and small sample size, the process would look Markovian, while if it had large sample size but at large discretization scale, HHK would look indistinguishable from the pure HK. There is one exception offered by laboratory measurements of turbulence, which can have high temporal resolution as well as very large sample size. And indeed the behaviour observed from time series of turbulence is consistent with that of an HHK model (see Koutsoyiannis Citation2013a, who fitted a special case of the HHK model to turbulence data).

The expressions for most of the HHK second-order characteristics are too complex, but the framework provided in Sections 2 and 3 suffices to evaluate numerically all of its properties based on (39). As an exception, the autocovariance in continuous time has a rather simple expression:

A further important feature of this process is that it allows explicit control of the asymptotic behaviour of all properties (*γ*(*Δ*), *c*(*τ*), s(*w*), *ψ*(*w*)) at both ends (for *Δ, τ, w* → 0 or ∞), which are different at each end, opposite to the HK process, which implies simple scaling laws as described above. Despite the expressions of these properties being involved, the asymptotic properties are easy to calculate, i.e., , , , . From the last two values we conclude that the fractal properties of the process, which are local properties expressing the asymptotic behaviour for small scales or large frequencies, depend on the parameter *κ* only (with the fractal dimension being equal to 2 − *κ*; cf. Gneiting and Schlather Citation2004). In contrast, the behaviour for large scales or small frequencies depends on the Hurst coefficient *H* only.

## 5 Methodology for stochastic simulation

### 5.1 The general algorithm

The so-called symmetric moving average (SMA) method (Koutsoyiannis Citation2000) can directly generate time series with any arbitrary autocorrelation function provided that it is mathematically feasible. The SMA method consists of the following generation equation:

where *a _{j}* are coefficients calculated from the autocovariance function and

*is white noise averaged in discrete time (and not necessarily Gaussian; see Section 6.4). Assuming that we work for the sampled discrete-time process with autocovariance and power spectrum (the procedure is the same for the averaged process), it has been shown (Koutsoyiannis Citation2000) that the Fourier transform of the*

__v___{i}*a*series of coefficients is related to the power spectrum of the discrete-time process by

_{l}Thus, to calculate *a _{l}* we first determine from the power spectrum of the process and then, using an equation analogous to (35), we estimate all coefficients

*a*. In other words, the generation of time series from any possible autocorrelation function is a matter of applying the Fourier transform a couple of times to estimate some coefficients. Note that the coefficients are internal constants of the model, not model parameters.

_{l}It is expected that the coefficients *a _{l}* will decrease with increasing

*l*and will be negligible beyond some

*q*(

*l*>

*q*), so that we can truncate (41) to read

This would introduce some truncation error, as the resulting autocovariance function

would not be identical to . In processes with short-range dependence the error is small even for a relatively small truncation limit *q* (see Section 5.2.3). However, if the dependence is of long-range type with high *H*, then there may be a problem. While in hydrology the *H* values are rather moderate, recent studies have estimated values of *H* > 0.9 for some geophysical series, e.g. Earth’s temperature (Koutsoyiannis and Montanari Citation2007, Markonis and Koutsoyiannis Citation2013, Koutsoyiannis Citation2013a). In such cases *q* should be high to achieve good approximation of the autocovariance function and even of the variance . For example, Koutsoyiannis (Citation2002) suggested *q* = 250 000 if *H* = 0.9. This entails high computational burden.

However, a small *q* is possible if we restrict our interest to the preservation of a moderate number of autocovariance terms , say, two to three orders of magnitude. Our desideratum in this case is to restrict *q* in the same order of magnitude as that of the required number of autocovariance terms. Koutsoyiannis (Citation2000) provided an iterative numerical algorithm for that problem, which does not make use of the Fourier transform, but uses an optimization procedure. While that algorithm is general enough, an optimization for, say, 1000 terms *a _{l}*, which become the control variables in the optimization problem, may be cumbersome.

Here we study a different procedure, which is direct and explicitly considers the truncation in the Fourier analysis. Specifically, instead of deriving *a _{l}* from , we consider a modified spectrum of

*a*which should give for

_{l}*j*>

*q*, while retaining close to the actual

*c*for small lags. We thus estimate as the sum of two terms:

_{j}where *ε* is a constant. The rationale of the first term is that it represents a high-pass filtered process (Papoulis Citation1991, p. 325; cf. (32), noticing that this is a low-pass filter and that is proportional to the square root of ), which suppresses low frequencies or high time scales (>*q*). The second term, whose inverse Fourier transform is the rectangular function (so that, with respect to equation (34), it results in coefficients equal to *ε*/(4*q* + 2) for lags ≤ *q* and 0 for lags > *q*), replaces the truncated part with a constant component for the lags of interest. In conclusion, both terms result in virtually zero coefficients for lags > *q*.

Let be the coefficients *a _{l}* corresponding to the first term of (45), while those corresponding to the second term are equal to each other for all

*l*,. Their sum

will give the final coefficients *a _{l}* to be applied for the generation. The coefficients can be calculated from (35) by using . The constant

*a″*requires the weight

*ε*to be determined. Applying (44) for

*j*= 0, so as to estimate (and preserve exactly) the variance, , we obtain

Solving for *a″*, after algebraic manipulations we find

where and

This method of truncation with adjustment will be referred to as Method 1. A further approximation, which constitutes a simpler calculation denoted as Method 2, is to use directly (rather than ) to estimate (by taking the inverse transform of as in (35) but using in place of ) only for *j* = 0 to *q*, set all terms beyond *q* equal to 0, and then estimate *a″* again from (48).

In both methods, it is expected that . If due to numerical imperfections it happens that , then it is advisable not to use equations (45)–(48), but rather modify by proportional adjustment, i.e.:

This adjustment will be referred to as Method 3.

### 5.2 Application

The preservation of the autocovariance function of any process by the SMA algorithm is theoretically guaranteed and therefore there is no need to test it through applications (although some applications can be found in Koutsoyiannis Citation2000). What is more meaningful to examine is the influence of truncation and the ability of the proposed methods of adjustment to deal with the truncation error. This is done in the following subsections, where we use the three models described above with parameter values as listed in the caption of (in some cases additional parameter sets are tested). It is recalled that, in the case illustrated, the Hurst coefficient for the HK and the HHK process is relatively high, around 0.8.

#### 5.2.1 HHK process

(a) presents the autocovariance function of the HHK process with and without truncation at *q* = 32 in comparison with the theoretical one. The 32 terms used are too few for a process with LTP but serve well as illustration. If we make truncation without any adaptation, then even the first 32 autocovariance terms are not well approximated and there is a 3.3% underprediction of the variance (not shown in figure). However, truncation methods 1 and 2 remedy this problem, with method 1 performing somewhat better than Method 2. (c) shows that to correct the autocovariance function the sequence of *a _{l}* needs to be raised in its right tail. (b, d) shows that this rise adds some periodicity to the power spectrum as well as to the Fourier transform of the sequence of

*a*. Generally, the results are quite satisfactory and obviously they would be more satisfactory if more

_{l}*a*terms were used (

_{l}*q*> 32).

#### 5.2.2 HK process

The application of the same method to the HK process gave quite similar results, so similar that they would be indistinguishable from those in and thus not necessary to reproduce here. The reason that they look similar is that for the time scales and frequencies that can be modelled with *q* = 32 terms, the HK process is indistinguishable from the HHK.

However, the simplicity of the HK process allows the presentation of a more systematic comparison. Moreover, this simplicity allows an analytical determination of the sequence of the untruncated sequence of *a _{l}*. Specifically, as shown in Koutsoyiannis (Citation2002), the coefficients

*a*can be approximated as:

_{l}where the multiplying term in the right-hand side was found after a numerical investigation, so that the sum of all equals *γ*(*Δ*), as it should. Here, by refining the approximation using the same method, the following improvement was obtained:

For *H* > 0.5, the proximity of the power spectrum of the averaged process to that of the continuous-time process (equation (T2.5) in ) allows the theoretical derivation of a more consistent equation, i.e.:

For *H* < 0.5 the proximity is not good and thus equation (52) does not perform well. However, equation (51) is a very good approximation and is also valid for *H* < 0.5.

Equation (51) is readily applied with adjustment method 2 or 3, without any need to calculate a Fourier transform. Results of the application are shown in for a persistent (*H* = 0.9) and an antipersistent (*H* = 0.2) case for a varying number of terms *q*. As shown in the figure, the method performs quite well, giving a good approximation of the autocorrelation function for lags approaching *q* and it outperforms the case where no adjustments are used, even if in the latter case a very high number of terms are used.

#### 5.2.3 Markov process

Coming again to the application designated in but now with the Markov model, some results are depicted in . The results support the conclusion that, due to the exponential decay of autocovariance, in this model only a few terms *a _{l}* are required (in this case, even

*q*= 8 is too much, given that autocovariance values of the order of 10

^{-3}can be neglected) and that no adjustment due to the truncation is actually needed. While the results in are for the averaged process, the application was also made for the sampled process with quite similar results, which are aimless to reproduce here.

One may ask whether this framework offers anything for the Markov case in comparison to the ARMA(1,1) model (for the averaged process) or the AR(1) model (for the sampled process). It should be made clear that these two classical models are useful in simulation and need not be replaced by the current framework. The strength of the current framework is its applicability to *any* stochastic process in the same form. In addition, the SMA scheme is simpler and more convenient for any other stochastic process except these two. To see this, we consider two examples, the AR(2) model and a bivariate AR(1) model. In each of these models the generation expression will give the variable * x_{i}* as a weighted sum of two past variables (for two and one time steps before, respectively) and some white noise variables, while the SMA model in the form (41) or (43) will be a weighted sum of white noise variables only. Now, if we try to reproduce the fourth moment (kurtosis) of the process and thus raise the generation expression to the fourth power, in the AR(2) model and the bivariate AR(1) model some joint fourth-order moments will emerge, which cannot be handled. In the SMA model no such terms will appear.

## 6 Final notes

### 6.1 Model identification and fitting

The typical procedure to choose an appropriate stochastic model, based on an observed time series, is to construct the empirical autocorrelogram of the time series and estimate which stochastic process (e.g. of AR or ARMA type) is suitable and how many autocorrelation terms should be preserved. It is not too difficult to illustrate that this technique can completely distort the underlying process. (a) depicts the autocorrelogram of a time series with length 100, which does not seem to have any relationship with the theoretical autocorrelation function of the process from which was constructed. That process is the HHK one with parameters as in the caption of . Clearly, the empirical autocorrelation does not give any hint that the time series stems from a process with LTP. With that autocorrelogram one would conclude that an AR(1) model with a lag-1 autocorrelation of about 0.4 would be appropriate. Note that, considering the annual scale, which cancels out seasonality, a length 100 of a time series is unusually high in hydrology and in other geophysical fields. For shorter time series the results would be even more disappointing.

There are two reasons for the failure of the autocorrelogram to capture the real behaviour of the process. First, from (8) and (23) it is seen that the autocorrelation (autocovariance standardized by variance) is by nature the second derivative of the climacogram. Estimation of the second derivative from data is too uncertain and makes a very rough graph. The second reason is that autocorrelation is biased, as explained in Koutsoyiannis (Citation2003).

It is thus more natural to directly use the climacogram instead of the autocorrelogram for model identification. For our example time series, this is illustrated in (b), which indicates that the long-term persistence is well captured by the empirical climacogram and the parameter *H* is correctly estimated (*H* = 0.79, based on the method presented in Koutsoyiannis Citation2003, and Tyralis and Koutsoyiannis Citation2011).

Here it should be noted that in stochastic processes, particularly ones with LTP, almost all classical statistical estimators are biased and the climacogram is not excepted from this rule. However, the bias of the climacogram is easy to calculate *a priori* and actually this has been done in the plot of (b), where the empirical climacogram is not compared with the theoretical one but with the expected one considering bias. The big distance between the two climacograms in (b), theoretical and adjusted for bias, indicates how big the bias can be and suggests the importance of considering the bias even for model identification, let alone model fitting.

Specifically, the bias in the climacogram can be considered as follows. As shown in Koutsoyiannis (Citation2011), assuming that we have observations of the averaged process , where the observation period *T* is an integer multiple of *Δ*, the expected value of the empirical (sample) climacogram is:

where is given by the standard statistical estimator, i.e.:

whereas is the standard (unbiased) statistical estimator of the common mean *μ* of the instantaneous process * x*(

*t*) as well as of the discrete processes

*and , i.e.:*

__x___{i}If the observed process is the sampled () at time steps *Δ*, rather than the averaged one (), then (53) becomes (Koutsoyiannis Citation2013b):

where

Notice that in this case .

It becomes clear from the above equations that direct estimation of the variance , let alone that of the instantaneous process *γ*(0), is not possible merely from the data. We need to know the ratio and thus we should assume a stochastic model that evidently influences the estimation of . Once the model is assumed and its parameters estimated based on the data, we can expand our calculations to estimate the variance for any time scale, including the instantaneous one *γ*(0). This applies to both cases of discretization, sampled and averaged, and therefore the need to assume a model in order to estimate the process statistics is imperative, irrespective of the type of observations, instantaneous or averaged. One may counter-argue that the standard statistical estimation offered in statistics books is free of model assumptions in estimating the true (population) parameters from data. However, this is not correct: a model is always assumed. In classical statistics the assumption is the independence in time and hence the model is the white noise process. In fact, such independence can hardly be a realistic assumption for natural processes.

In other words, model identification and fitting cannot be separated from each other. The climacogram, thanks to its analytical expression of bias and its smoother shape, in comparison to that of the autocorrelogram, offers a basis to perform both tasks simultaneously. Once the model is identified, at a later step its parameters can be fine tuned by using additional methods, for example the maximum likelihood method. Tyralis and Koutsoyiannis (Citation2011) have streamlined the maximum likelihood method for the HK process, in which the autocovariance matrix introduced in calculations is unavoidably huge. The same method can be adapted for the HHK process without difficulty. In addition, Tyralis and Koutsoyiannis (Citation2011) compared several parameter estimation methods and concluded that the maximum likelihood is generally superior, while other methods they examined performed much worse, except two climacogram-based methods. These performed only slightly worse than the maximum likelihood method but have the advantage of combining model identification and parameter estimation.

The maximum likelihood method (in most cases approximated by a least squares objective function) has also been in common use in ARIMA-based stochastic modelling. Obviously, this is a much more reliable practice in comparison to fitting the model through empirical estimates of autocorrelations, based on the Yule-Walker equations. Yet in standard modelling practice, the empirical autocorrelation function remains the primary tool for model identification and, as explained above, this is not good modelling practice. If the model identified is inappropriate, then seeking the proper values of its parameters is less important and perhaps misleading. In particular, if the actual process is characterized by LTP, then a time series may contain rich patterns even if the underlying process is quite simple and parsimonious (Papalexiou *et al*. Citation2011). However, if the model identification failed to detect the LTP, then it is quite likely that it would need a complicated model with many parameters to describe these rich patterns.

A more thorough investigation of problems related to model identification and parameter estimation has been made by Dimitriadis and Koutsoyiannis (Citation2015), who compared the climacogram, autocovariance and power spectrum and concluded that the first is more advantageous.

### 6.2 Multivariate modelling

Most applications of hydrological simulation require multivariate setting as it is usually important to simulate cross-correlation coefficients among different processes. While the above discourse is for single-variate setting, the multivariate extension is direct and easier than in other modelling approaches, as already indicated in Section 5.2.3. The details required for this extension are given in Koutsoyiannis (Citation2000).

### 6.3 Prediction

Whenever future prediction, rather than unconditional simulation, is required, while past observations are available, then those observations can be used to condition the distribution of the future realizations of the process * x*(

*t*). A general method to handle this case, which has been described in Koutsoyiannis (Citation2000), can easily be implemented within a Monte Carlo prediction framework (Efstratiadis

*et al*. Citation2014). Such a prediction is vital in water management applications (Koutsoyiannis

*et al*. Citation2003b). Conditioning on past data within a Monte Carlo framework is particularly important when LTP is present, as clarified by Koutsoyiannis

*et al*. (Citation2007) using a classical statistical framework and by Tyralis and Koutsoyiannis (Citation2014) using a Bayesian framework.

### 6.4 Non-Gaussian distributions

If the marginal distribution of * x*(

*t*) is Gaussian, then no consideration in addition to that already exposed is required. However, non-Gaussian distributions (mostly positively skewed and possibly heavy tailed) are very common in hydrology and their emergence has some theoretical justification again based on entropic considerations (Koutsoyiannis Citation2005a, Citation2014). For non-Gaussian distributions two strategies are commonly used. The first is to convert the process, using a nonlinear transformation, to a Gaussian process (e.g. Koutsoyiannis

*et al*. Citation2008). This is not easy as the transformation cannot be invariant with respect to the time scale (Lombardo

*et al*. Citation2012). The second strategy is to explicitly preserve, at the time scales of interest, higher-order marginal moments, in particular the third moment or, equivalently, the skewness coefficient. The SMA model allows easy handling of high-order moments, as already indicated in Section 5.2.3. The details of the latter technique have been studied in Koutsoyiannis (Citation1999, Citation2000). It should be emphasized, though, that high-order moments cannot be reliably estimated from data (Lombardo

*et al*. Citation2014) and hence a model for the marginal distribution is again required to better specify the values of the moments to be preserved.

### 6.5 Intermittency

Hydrological processes are often intermittent at small time scales (e.g. rainfall). Intermittency is not easy to handle. Some techniques have been implemented in Koutsoyiannis *et al*. (Citation2003a) and Efstratiadis *et al*. (Citation2014). However, they contain some *ad hoc* elements and therefore further research is needed to develop a fully consistent theoretical approach.

### 6.6 Cyclostationarity

At sub-annual time scales, hydrological and other geophysical processes exhibit periodicity in their statistical properties. This behaviour is also known as seasonality or cyclostationarity. A popular technique to deal with this problem is to “stationarize” the time series by standardizing by the mean and standard deviation of each separate “season” (e.g. month or day). However, it should be kept in mind that such “stationarization” applies only to the marginal distribution (in particular, only to the first two moments of the marginal distribution if this popular linear transformation procedure is used). It has no effect on the dependence structure, which may also be characterized by cyclostationarity. Two different techniques can deal more effectively with this problem. One (Langousis and Koutsoyiannis Citation2006) generates times series of each “season” separately, thus accounting for long-term (multiyear) persistence, and then modifies the separate time series to establish proper autocorrelations between consecutive months. The other technique, more widely implemented so far in hydrology, uses a disaggregation approach. This first generates the annual time series, thus accounting for LTP, and then disaggregates them into sub-annual (e.g. monthly or daily) quantities (Efstratiadis *et al*. Citation2014, Koutsoyiannis Citation2001).

## 7 Conclusions

It is more natural and better from many aspects to formulate the stochastic processes that are used in modelling of hydrological (and other geophysical) processes in continuous time rather than using the recipes of discrete-time processes (e.g. AR, ARMA, etc.) of the literature. The theoretical framework provided here allows derivation, with easily applicable equations, of the stochastic properties in discrete time, whether these correspond to sampling of a continuous-time process at equidistant times or averaging it at time intervals with equal length.

Among continuous-time processes, the HHK process, introduced and studied here, proves to have properties that make it a good candidate for several hydrological applications. First of all, it is associated with maximum entropy production at both small and large time scales, thus combining Markovian behaviour at small scales and Hurst-Kolmogorov behaviour at large scales. It also incorporates both these processes as special cases. Second, it is parameter parsimonious as, in addition to marginal distributional properties, it uses one time-scale parameter and up to two dimensionless parameters to define the dependence structure. One of the latter (*κ*) determines its behaviour on small scales (the fractal properties) and the other one (*H*) the long-term persistence (the HK properties). Third, it is free of some unnatural properties of the HK process *per se*, namely the infinite variance of the instantaneous process and the discontinuity of its autocovariance function in the case of antipersistence (*H* < 1/2), which results in negative autocorrelation even for arbitrarily short lags.

Despite those shortcomings, the HK process, which is also thoroughly studied here, provides a perfect approximation of the HHK process for intermediate and large time scales, being even more parsimonious than the HHK process and enabling analytical expressions of its properties in continuous time and of most of those in discrete time (for the averaged process).

Whichever of the three above processes is used, or even any other process, the general simulation methodology (SMA) allows easy application and also correction of possible truncation errors. The results of the case studies examined show that these errors can be almost eliminated by the adjustment methodologies studied.

As regards model identification and fitting, an example is provided that illustrates the inability to specify the appropriate model if the common practice based on the empirical autocorrelogram is used. The framework supports the use of the climacogram for this purpose. While, like other stochastic tools, the climacogram also involves estimation bias, its bias can be determined analytically and included in the model identification and fitting. Furthermore, it is shown that the last two tasks are closely related to each other. Even fundamental statistical estimators, such as the sample variance, cannot be applied without specifying a model type.

Overall, the methodology developed is not demanding in terms of computational means. For instance, all the examples presented here have been prepared in common spreadsheet software. However, it is planned that all the presented advances be implemented in the next version of the stochastic simulation framework *Castalia* (Efstratiadis *et al*. Citation2014).

### Acknowledgements

I gratefully appreciate discussions about several issues covered here with Yannis Dialynas, Panayiotis Dimitriadis, Federico Lombardo, Hristos Tyralis, Andreas Efstratiadis, Panagiotis Kossieris, Andreas Langousis, Yannis Markonis and Simon Papalexiou. I particularly thank the first four for detailed comments and corrections on an earlier version of this paper. I am grateful to the reviewers Robin Clarke and Alberto Montanari, Associate Editor Salvatore Grimaldi and Editor Zbigniew Kundzewicz for the positive evaluation of the paper and for the review comments that helped to substantially improve the paper.

## Disclosure statement

No potential conflict of interest was reported by the author.

## Additional information

### Funding

## Notes

^{1} It was first expressed by Aristotle (384–322 BC) in the following manner:

*Ἒστω γὰρ αὕτη ἡ ἀπόδειξις βελτίων τῶν ἄλλων τῶν αὐτῶν ὑπαρχόντων, ἡ ἐξ ἐλαττόνων αἰτημάτων ἢ υποθέσεων ἢ προτάσεων*.*We may assume the superiority, other things being equal, of the demonstration which derives from fewer postulates or hypotheses or propositions*. [Posterior Analytics, I, 25]*Φανερὸν ὅτι μακρῷ βέλτιον πεπερασμένας ποιεῖν τὰς ἀρχὰς, καὶ ταύτας ὡς ἐλαχίστας πάντων γε τῶν αὐτῶν μελλόντων δείκνυσθαι, καθάπερ ἀξιοῦσι καὶ οἱ ἐν τοῖς μαθήμασιν*.*Obviously, it is much better to assume a finite number of principles, as few as possible yet sufficient to prove what has to be proved, like in what mathematicians demand*. [On the Heavens, ΙΙΙ, 4]

*c.*1168–1253), Thomas Aquinas (

*c*. 1225–1274) and William of Ockham (

*c*. 1285–1347), whose statement

*Plurality is not to be posited without necessity*.has given the principle the alternative name

*Ockham’s razor*(see also Gauch Citation2003).

## References

- Dimitriadis, P. and Koutsoyiannis, D., 2015. Climacogram versus autocovariance and power spectrum in stochastic modelling for Markovian and Hurst–Kolmogorov processes.
*Stochastic Environmental Research and Risk Assessment*, 29 (6), 1649–1669. doi:10.1007/s00477-015-1023-7. - Efstratiadis, A., et al., 2014. A multivariate stochastic model for the generation of synthetic time series at multiple time scales reproducing long-term persistence.
*Environmental Modelling and Software*, 62, 139–152. doi:10.1016/j.envsoft.2014.08.017. - Gauch Jr., H.G., 2003.
*Scientific method in practice*. Cambridge: Cambridge University Press. - Gneiting, T. and Schlather, M., 2004. Stochastic models that separate fractal dimension and the Hurst effect.
*Society for Industrial and Applied Mathematics Review*, 46 (2), 269–282. - Gneiting, T., Ševčíková, H., and Percival, D.B., 2012. Estimators of fractal dimension: assessing the roughness of time series and spatial data.
*Statistical Science*, 27 (2), 247–277. doi:10.1214/11-STS370 - Hemelrijk, J., 1966. Underlining random variables.
*Statistica Neerlandica*, 20 (1), 1–7. doi:10.1111/j.1467-9574.1966.tb00488.x - Hurst, H.E., 1951. Long term storage capacities of reservoirs.
*Transactions of the American Society of Civil Engineers*, 116, 776–808. - Jaynes, E.T., 1957. Information theory and statistical mechanics.
*Physical Review*, 106 (4), 620–630. doi:10.1103/PhysRev.106.620 - Kolmogorov, A.N., 1940. Wienersche Spiralen und einige andere interessante Kurven im Hilberschen Raum.
*Dokl. Akad. Nauk SSSR*, 26, 115–118, (English translation Wiener spirals and some other interesting curves in a Hilbert space. In: V.M. Tikhomirov, ed., 1991. Selected works of A.N. Kolmogorov, Volume I: mathematics and mechanics. Berlin: Springer, 324–326). - Koutsoyiannis, D., 1999. Optimal decomposition of covariance matrices for multivariate stochastic models in hydrology.
*Water Resources Research*, 35 (4), 1219–1229. doi:10.1029/1998WR900093 - Koutsoyiannis, D., 2000. A generalized mathematical framework for stochastic simulation and forecast of hydrologic time series.
*Water Resources Research*, 36 (6), 1519–1533. doi:10.1029/2000WR900044 - Koutsoyiannis, D., 2001. Coupling stochastic models of different time scales.
*Water Resources Research*, 37 (2), 379–391. doi:10.1029/2000WR900200 - Koutsoyiannis, D., 2002. The Hurst phenomenon and fractional Gaussian noise made easy.
*Hydrological Sciences Journal*, 47 (4), 573–595. doi:10.1080/02626660209492961 - Koutsoyiannis, D., 2003. Climate change, the Hurst phenomenon, and hydrological statistics.
*Hydrological Sciences Journal*, 48 (1), 3–24. doi:10.1623/hysj.48.1.3.43481 - Koutsoyiannis, D., 2005a. Uncertainty, entropy, scaling and hydrological stochastics. 1. Marginal distributional properties of hydrological processes and state scaling.
*Hydrological Sciences Journal*, 50 (3), 381–404. doi:10.1623/hysj.50.3.381.65031 - Koutsoyiannis, D., 2005b. Uncertainty, entropy, scaling and hydrological stochastics. 2. Time dependence of hydrological processes and time scaling.
*Hydrological Sciences Journal*, 50 (3), 405–426. doi:10.1623/hysj.50.3.405.65028 - Koutsoyiannis, D., et al., 2009. HESS Opinions: “climate, hydrology, energy, water: recognizing uncertainty and seeking sustainability”.
*Hydrology and Earth System Sciences*, 13, 247–257. doi:10.5194/hess-13-247-2009 - Koutsoyiannis, D., 2010. HESS Opinions: “a random walk on water”.
*Hydrology and Earth System Sciences*, 14, 585–601. doi:10.5194/hess-14-585-2010 - Koutsoyiannis, D., 2011. Hurst-Kolmogorov dynamics as a result of extremal entropy production.
*Physica A: Statistical Mechanics and its Applications*, 390 (8), 1424–1432. doi:10.1016/j.physa.2010.12.035 - Koutsoyiannis, D., 2013a. Hydrology and Change.
*Hydrological Sciences Journal*, 58 (6), 1177–1197. doi:10.1080/02626667.2013.804626 - Koutsoyiannis, D., 2013b.
*Encolpion of stochastics: fundamentals of stochastic processes*. Athens: Department of Water Resources and Environmental Engineering – National Technical University of Athens. Available from: http://www.itia.ntua.gr/1317/ [Accessed 19 September 2015]. - Koutsoyiannis, D., 2014. Entropy: from thermodynamics to hydrology.
*Entropy*, 16 (3), 1287–1314. doi:10.3390/e16031287 - Koutsoyiannis, D., Efstratiadis, A., and Georgakakos, K., 2007. Uncertainty assessment of future hydroclimatic predictions: a comparison of probabilistic and scenario-based approaches.
*Journal of Hydrometeorology*, 8 (3), 261–281. doi:10.1175/JHM576.1 - Koutsoyiannis, D. and Montanari, A., 2007. Statistical analysis of hydroclimatic time series: uncertainty and insights.
*Water Resources Research*, 43 (5), W05429. doi:10.1029/2006WR005592. - Koutsoyiannis, D., Onof, C., and Wheater, H.S., 2003a. Multivariate rainfall disaggregation at a fine timescale.
*Water Resources Research*, 39 (7), 1173. doi:10.1029/2002WR001600 - Koutsoyiannis, D., Yao, H., and Georgakakos, A., 2008. Medium-range flow prediction for the Nile: a comparison of stochastic and deterministic methods.
*Hydrological Sciences Journal*, 53 (1), 142–164. doi:10.1623/hysj.53.1.142 - Koutsoyiannis, D., et al., 2003b. A decision support system for the management of the water resource system of Athens.
*Physics and Chemistry of the Earth, Parts A/B/C*, 28 (14–15), 599–609. doi:10.1016/S1474-7065(03)00106-2 - Langousis, A. and Koutsoyiannis, D., 2006. A stochastic methodology for generation of seasonal time series reproducing overyear scaling behaviour.
*Journal of Hydrology*, 322, 138–154. doi:10.1016/j.jhydrol.2005.02.037 - Lombardo, F., Volpi, E., and Koutsoyiannis, D., 2012. Rainfall downscaling in time: theoretical and empirical comparison between multifractal and Hurst-Kolmogorov discrete random cascades.
*Hydrological Sciences Journal*, 57 (6), 1052–1066. doi:10.1080/02626667.2012.695872 - Lombardo, F., et al., 2014. Just two moments! a cautionary note against use of high-order moments in multifractal models in hydrology.
*Hydrology and Earth System Sciences*, 18, 243–255. doi:10.5194/hess-18-243-2014 - Mandelbrot, B.B. and van Ness, J.W., 1968. Fractional Brownian motions, fractional noises and applications.
*SIAM Review*, 10 (4), 422–437. doi:10.1137/1010093 - Markonis, Y. and Koutsoyiannis, D., 2013. Climatic variability over time scales spanning nine orders of magnitude: connecting Milankovitch cycles with Hurst–Kolmogorov dynamics.
*Surveys in Geophysics*, 34 (2), 181–207. doi:10.1007/s10712-012-9208-9 - Montanari, A. and Koutsoyiannis, D., 2012. A blueprint for process-based modeling of uncertain hydrological systems.
*Water Resources Research*, 48, W09555. doi:10.1029/2011WR011412. - Montanari, A., et al., 2013. “Panta Rhei—Everything Flows”: change in hydrology and society—The IAHS Scientific Decade 2013–2022.
*Hydrological Sciences Journal*, 58 (6), 1256–1275. doi:10.1080/02626667.2013.809088 - Papalexiou, S.-M., Koutsoyiannis, D., and Montanari, A., 2011. Can a simple stochastic model generate rich patterns of rainfall events?
*Journal of Hydrology*, 411 (3–4), 279–289. doi:10.1016/j.jhydrol.2011.10.008 - Papoulis, A., 1991.
*Probability, random variables and stochastic processes*. 3rd ed. New York: McGraw-Hill. - Tyralis, H. and Koutsoyiannis, D., 2011. Simultaneous estimation of the parameters of the Hurst-Kolmogorov stochastic process.
*Stochastic Environmental Research & Risk Assessment*, 25 (1), 21–33. doi:10.1007/s00477-010-0408-x - Tyralis, H. and Koutsoyiannis, D., 2014. A Bayesian statistical model for deriving the predictive distribution of hydroclimatic variables.
*Climate Dynamics*, 42 (11–12), 2867–2883. doi:10.1007/s00382-013-1804-y.