A Process of Dependent Quantile Pyramids

Despite the practicality of quantile regression (QR), simultaneous estimation of multiple QR curves continues to be challenging. We address this problem by proposing a Bayesian nonparametric framework that generalizes the quantile pyramid by replacing each scalar variate in the quantile pyramid with a stochastic process on a covariate space. We propose a novel approach to show the existence of a quantile pyramid for all quantiles. The process of dependent quantile pyramids allows for non-linear QR and automatically ensures non-crossing of QR curves on the covariate space. Simulation studies document the performance and robustness of our approach. An application to cyclone intensity data is presented.


Quantile Regression
Quantile regression (QR) has drawn increased attention as an alternative to mean regression.QR was motivated by the realization that extreme quantiles often have a different relationship with covariates than do the centers of the response distributions.QR can target quantiles in the tail of the distribution and is more robust to outliers than is mean regression.The advantages of QR can be substantial and have led to its use in many areas, including econometrics, finance, medicine, and climatology.
The seminal work of Koenker & Bassett (1978) extended median regression, which dates back at least as far as Edgeworth (1888), to QR by allowing asymmetry in the objective function defining the regression.That is, when one is interested in the τ th quantile (0 < τ < 1) for the response y i and the covariate x i ∈ R p+1 , i = 1, . . ., n, and assuming independent responses, the estimated QR surface is x T i b * , where b * = arg min b∈R P +1 n i=1 ρ τ (Y i − x T i b), and ρ τ (u) = u(τ − 1 (u<0) ) is the (asymmetric) check loss function.This method is implemented in the R package 'quantreg' (Koenker 2005) and has led to a wide range of developments.A recent overview of the area is provided in Koenker (2017).
The Bayesian counterpart of quantile regression was introduced by Yu & Moyeed (2001) who used the asymmetric Laplace distribution (ALD) for the sampling density of Y |x as a device to focus on the QR for the τ th quantile.The ALD has density f (u) = τ (1 − τ ) exp(−ρ τ (u)) which can be seen as a scaled and exponentiated check loss function.This substitution of a loss function for the log-density is an early example of the generalized Bayes technology developed in Bissiri et al. (2016).Kozumi & Kobayashi (2011) made use of the reparametrization of the ALD illustrated in Kotz et al. (2001) and Tsionas (2003) to create an efficient Gibbs sampling algorithm.The method is implemented in the R package 'bayesQR' (Benoit & Van den Poel 2017).Many authors have followed the approach of Yu & Moyeed (2001), describing properties of the method.See, for example, Geraci & Bottai (2007), Reich et al. (2010), Waldmann et al. (2013), Lum & Gelfand (2012).Some have appealed to a semi-or nonparametric approach.Kottas & Krnjajić (2009), for instance, proposed two approaches to model the error distribution nonparametrically in QR, using a Dirichlet process (DP) mixture of uniform densities and a dependent DP mixture of normal densities.Chen & Yu (2009) developed a QR function in a nonparametric fashion using piecewise polynomials.

Crossing quantiles
When more than one quantile level is considered, however, fitting a QR curve for each level by itself does not correspond to an encompassing model, may not respect the monotonicity of the quantile function, and can result in crossing quantiles.Researchers have suggested various approaches to handle this issue.Rodrigues & Fan (2017) constructed a likelihood inspired by Yu & Moyeed's approach, while ensuring monotonocity of quantiles with an additional adjustment step.Semi-or non-parametric Bayesian approaches to simultaneous QR include Taddy & Kottas (2010) who suggested an approach to estimate the entire joint density of (x, y) and then extract the QR from this density.This ensures monotonicity of quantiles since the inference of quantiles is based on a single density.Reich et al. (2011) and Reich (2012) model the entire quantile process using Bernstein polynomial basis functions in spatial and spatiotemporal settings.In both papers, the prior is specified to satisfy the monotonicity constraint on the quantile function.Kadane & Tokdar (2012) developed a characterization of the quantile function that induces monotonicity in the joint estimation of linear QR models for a univariate covariate.Yang & Tokdar (2017) extended this to any bounded covariate space in R P via reparameterization.Chen & Tokdar (2021) generalized this to incorporate spatial dependence.Hjort & Walker (2009) proposed the quantile pyramid (QP) for nonparametric inference for a single distribution and briefly mentioned a possible extension to QR.Most similar to our approach, Rodrigues et al. (2019a) used the QP for QR.In their work, independent QPs are used to specify the prior distribution for quantiles at (p + 1) pivotal locations in a bounded p dimensional covariate space.For each quantile, a linear QR is then constructed as the hyperplane passing through the specified quantile at each of the pivotal locations.Rodrigues et al. (2019b) adapted this idea to a spline regression setting.

Our contribution
We generalize the construction of Hjort & Walker (2009) by incorporating dependence in the QPs across the covariate space and by allowing for non-binary splits in the pyramids.
Our approach allows direct and flexible modeling of the quantiles over covariate spaces and, by construction, naturally respects the monotonicity of QR curves.
Our contribution is twofold: (1) a novel approach to show the existence of a single QP, and (2) extension of the QP from a model for a single distribution to a model for a collection of distributions that vary with the covariate.The first point is a stepping stone to generalize the idea of QP.With an eye to the second point, it also involves expansion of the mathematical framework to move from a single QP to a process of QPs, with greater attention to mappings between the interval [0,1] and the real line.
The rest of this article is organized as follows.In Section 2, we introduce the idea of a process of dependent quantile pyramids (DQPs) and a canonical construction of the model.
Section 3 provides theoretical results.Section 4 describes prior specification and posterior inference.Simulation studies appear in Section 5 and application to real data appears in In this section, we briefly recap the QP of Hjort & Walker (2009) and introduce a DQP.
The following remark comes from Parzen (2004).
Remark 1.For a random variable Y whose distribution function is F (•), its quantile function is defined as a left-continuous function In other words, if we define the quantile function, there exists a random variable with the corresponding distribution function.This fact is useful to understand how constructing quantile functions can lead to a distribution over distribution functions.

Binary Quantile Pyramid
Hjort & Walker (2009) created the QP, a Bayesian nonparametric model that focuses on the quantiles of a distribution.The QP provides a distribution over distribution functions, and so it is suited to use as a prior distribution for an unknown distribution function.The quantile function, Q(•), on the unit interval [0, 1], is at the heart of the QP.Q(0) ≡ 0 and The pyramid is built in levels for dyadic quantile levels, as a binary tree.The 0 th level is the unit interval [0, 1].At the first level, the median of the unit interval is drawn from some density, which divides the interval into two subintervals.Intervals are recursively split into smaller subintervals, doubling the number of subintervals with each new level.Thus, at level m, we have specified the The new quantiles at the m th level can be expressed, for j = 1, 3, . . ., 2 m − 1, as where V m,j , j = 1, 3, . . ., 2 m − 1, are a set of mutually independent random variables with support [0, 1]. Figure 1 (a) contains a visualization of this idea with three quantiles.For τ ∈ (0, 1), less the specified quantile levels, Q(τ ) is filled in by linear interpolation.
There is scope for a wide variety of choices for the distribution of the conditional medians of the subintervals (the V m,j ).If the V m,j are assumed to have mean 1/2, for example, then Q m (τ ) forms a martingale sequence and has a limit almost surely by Doob's martingale convergence theorem.Moreover, if V m,j are chosen so that max Hjort & Walker (2009) showed that there exists a continuous limiting quantile process to which Q m converges.
While Hjort & Walker (2009) primarily focused on quantiles at the dyadic levels, we take inspiration from the oblique pyramid construction method introduced by Rodrigues et al. (2019a).In the oblique pyramid, quantile levels in a binary pyramid are not limited to dyadics.

General Quantile Pyramid
Rodrigues et al. (2019a) focus on finite quantile pyramids, where a pyramid has only M levels.They begin with scenarios where 2 m−1 quantiles are generated at the m th level of a pyramid, ensuring that each subinterval contains one specified quantile.This constraint leads to exactly 2 M − 1 quantiles in an M -level pyramid.They then proceed to the oblique pyramid, a pyramid in which some subintervals contain a quantile but others do not.An M level pyramid may be unbalanced, specifying fewer than 2 M − 1 quantiles.Rodrigues et al. (2019a) specify a rule for how the pyramid is constructed.We provide a brief example and refer the reader to their paper for full details.Consider the case of four quantiles, denoted as Q(τ 1 ) < . . .< Q(τ 4 ).Since there are an even number of quantiles, there are two quantiles positioned in the middle.Following the rule of Rodrigues et al. (2019a), the smaller quantile serves as the middle quantile level.Thus, Q(τ 2 ) is specified at the first level of the pyramid.Moving on to the second level, we have two subintervals.
In the left subinterval, we specify Q(τ 1 ), while the right subinterval contains two quantiles.
The smaller quantile, Q(τ 3 ), is specified at this level of the pyramid.In the third level, there are four subintervals: (0, ).In this level, we specify the remaining quantile, Q(τ 4 ), in the last subinterval, while the other three subintervals remain empty.This example is illustrated in Figure 1 (b).
We introduce additional flexibility by permitting non-binary splits.This yields the general quantile pyramid, and it empowers the user to fully customize the pyramid according to their preference.For instance, in the case of four quantiles, one can create a pyramid with Q(τ 2 ) at the first level and the remaining three quantiles at the second level, distributing one quantile to the left subinterval and two to the right subinterval.This concept is visually represented in Figure 1 (c).The move from a binary pyramid to a general pyramid requires novel notation which is introduced in the next section.

Dependent Quantile Pyramids
In the sequel, we extend the QP from a single distribution to a collection of distributions, creating a process of dependent quantile pyramids (DQPs).To do so, we construct a QP at each value of x in some index set, X .We replace each scalar V m,j in equation ( 1) with an appropriate stochastic process, say {V x,m,j , x ∈ X }.This leads to a collection of QPs that may exhibit dependence across X .Formally, we construct a distribution-valued stochastic process.For modeling purposes, the index set of the process is identified with the covariate space, as in QR.Alternatively, the index set may be described as containing values of predictors, spatial locations, constructed features, or time, to name a few possibilities.The important part is that, conditional on x, we have a model for a QP.We construct QPs on the unit interval [0, 1] following Hjort & Walker (2009) and use a general quantile pyramid scheme of Section 2.2. Figure 1 (d) contains the visualization of an example of a process of DQPs.

Notation
We write Q x (τ ) for the τ -quantile at x.For any x, {Q x (τ ), τ ∈ (0, 1)} forms a complete quantile pyramid.For any τ , {Q x (τ ), x ∈ X } provides the τ -quantile surface over the covariate space.We refer to this as a QR curve.The DQP is constructed sequentially.To provide intuition behind the DQP and to facilitate proofs of its existence, we establish two distinct sets of notation.
The first set of notation matches the recursive construction of the DQP.We begin with a DQP with m − 1 levels, where the quantile surfaces have been specified for τ ∈ T m−1 .
That is, for x ∈ X and τ ∈ T m−1 , the function {Q x (τ ), x ∈ X , τ ∈ T m−1 } has already been determined in the sequential construction.The ordered quantile levels are At the next level, these quantile levels remain.The corresponding quantiles define a set of bands within which the quantiles newly specified at level m of the pyramid must lie (at a given x, the band is merely an interval).For example, the QR curve for a quantile τ ∈ (τ * 1 , τ * 2 ) that is newly specifed at level m must lie in the band with lower boundary {Q x (τ * 1 ), x ∈ X } and upper boundary {Q x (τ * 2 ), x ∈ X }.At level m, the specified quantiles are renumbered to form T m and preserve the ordering of the τ * i .
The second set of notation is preferred when describing the probability space that underlies the DQP.For this, we need to have each variable (with subscripts) refer to a single random element in the construction of the DQP.These variables appear in different positions in the pyramid.We introduce the 'ϵ-notation' to capture the position in the pyramid and to handle the potentially non-binary splits in the general quantile pyramid.
Consider a pyramid with m levels.The sequential construction of the pyramid will place the quantile Q x,ϵ 1 at level 1, Q x,ϵ 1 ϵ 2 at level 2, and so on.The length of the sequence indicates the level at which the quantile was specfied.The notation also conveys the structural relationship between quantiles.The parent quantiles of ) .If there are K quantiles specified in the interval between the parent quantiles, ϵ m takes a value in the set 1, . . ., K. The left and right parent quantiles have ϵ m = 0 and ϵ m = K +1, respectively.The quantiles are increasing in ϵ m .This results in two distinct sequences ϵ 1 . . .ϵ m for a quantile that serves as both the left endpoint and the right endpoint of adjacent subintervals.As the construction proceeds, we use the left-endpoint sequence for which ϵ m = 0.For convenience, a short form ϵ ≡ ϵ 1 • • • ϵ m−1 will be used for the (m − 1) length sequence throughout the paper.Figure 2 illustrates both notations for a three-level DQP with a given value of x.
The oblique DQP is a special case in which all subintervals are split in two.In this case, K = 0 or 1 for all subintervals.For a dyadic, binary DQP, K = 1 for all subintervals and the quantile

Definition
DQPs come in two main varieties.The first is the process of finite dependent quantile pyramids (FDQP) while the second, arrived at from a countable sequence of FDQPs, is termed the limit process of dependent quantile pyramids (LDQP).
We begin with the FDQP.The FDQP focuses on a finite number of quantiles.As with the QP, the FDQP is defined sequentially.We focus on the quantiles in a single interval (e.g., the center interval at level 2 in Figure 2).The description applies to all such intervals.
Definition 1.We call Q M = {Q M x , x ∈ X } a process of Finite Dependent Quantile Pyramids (FDQP) with M levels if there exists an M -level QP valued stochastic process whose distribution Q M follows.That is, for each x ∈ X , Q M x is an M -level QP on [0, 1], and, for each n and distinct x 1 , . . ., x n ∈ X , Kolmogorov's permutation and marginalization conditions are satisfied.
Definition 2. We call induced by a process of FDQP with M levels, if for every x ∈ X , F M x is a distribution function and The FDQP with M levels is given by its quantile functions, {Q M x (τ ), x ∈ X , τ ∈ [0, 1]}.It is defined sequentially, beginning with level 1 of the pyramid.Assume that K quantiles are to be drawn at level m, for m = 1, . . ., M , in a subinterval that comes from level m − 1.
The quantiles for the K specified quantile levels in the subinterval are A finite number of quantiles, that are not specified directly in the pyramid are given by linear interpolation.For all τ ∈ (τ * t , τ * t+1 ), t = 0, . . ., T , we linearly interpolate Q M x (τ ), i.e.
The sequential construction and interpolation together define the conditional quantiles given x for all quantile levels τ ∈ (0, 1).
In practice, when determining the number of quantiles, T , of an FDQP, we recognize that there is a tradeoff between the number of quantiles considered and computational efficiency.Our primary recommendation is to focus on the key quantiles of interest.A secondary, generic recommendation is to prioritize symmetric quantiles such as the three quartiles or the nine deciles.
The construction of the FDQP leads to its existence, which is formally proven in Lemma 3 in Section 3.2.At each value x, the distribution is defined by a finite collection of real valued random variates.The use of stochastic processes that are well-defined ensures that the QP construction for a single x extends to X .We restrict attention to cases defined on a single probability space where the FDQP with M levels is the marginal distribution for each of the FDQPs with more than M levels, with appropriate renumbering of the quantiles in ∪ M m=1 T m , where T m is the set of quantile levels specified at level m.
With some sufficient conditions, the limit may exist, in which case we have an infinite pyramid, leading to the LDQP.Our concern is with cases where a set of quantiles that is dense in [0, 1] is determined in the limit, and so we have no need for interpolation.The existence of the LDQP is established in Section 3.2.When the limit of a sequence of FDQPs exists as m → ∞, the limit process of dependent quantile pyramids (LDQP) is defined.
Definition 3. We call Q = {Q x , x ∈ X } a limit process of Dependent Quantile Pyramids (LDQP) if there exists a QP valued stochastic process whose distribution Q follows.That is, for each x ∈ X , Q x is a QP on [0, 1], and, for each n and distinct x 1 , . . ., x n ∈ X , Kolmogorov's permutation and marginalization conditions are satisfied.
Definition 4. We call F = {F x , x ∈ X } a set of conditional distribution functions induced by a process of LDQP, if for every x ∈ X , F x is a distribution function and Q x (τ ) = inf{y :

Canonical Construction
In this section, we provide a canonical construction of the quantiles in an arbitrary subinterval at level m of a process of DQPs.To do so, we construct U -processes and V -processes and then make use of equation ( 2) in Section 2.3.We begin under the assumption that the choice of quantiles has been made and that this choice is consistent, matching the correct ordering of the quantiles.By repeatedly sampling U -processes and V -processes, we can proceed to sequential construction of the entire process of pyramids.

U-processes induced from Gaussian processes
Suppose that we are interested in obtaining K quantiles at level m for a subinterval For each sequence ϵk for k = 1, . . ., K + 1, we consider a Gaussian process (GP) {Z x,ϵk , x ∈ X } with zero mean, unit variance, and some correlation function λ(x, x ′ ), so that Z x,ϵk ∼ GP(0, λ(x, x ′ )).The correlation function governs the interdependence of quantiles across the covariate space.We note that the correlation function can differ for different ϵk.
Once V -processes are generated, we can proceed to the quantiles Q x,ϵ1 , . . ., Q x,ϵK in the subinterval, using equation ( 2).Given the left and right endpoints of the subinterval, the conditional quantiles at the same quantile level can be expressed as a simple linear transformation of the V -processes.This observation also implies the preservation of the dependence structure.In other words, for any x, x ′ ∈ X and ϵ, and for all k = 1, . . ., K. When K = 1 and the construction in this section is used, the random variable V x,ϵ1 follows a beta distribution.It is a continuous, monotonic transformation of U x,ϵ1 .For measures of dependence that are invariant under such a transformation, the (conditional) dependence of V x,ϵ1 and V x ′ ,ϵ1 matches that of U x,ϵ1 and U x ′ ,ϵ1 .

Martingale construction
One construction of the dyadic QP in Hjort & Walker (2009) relies on a martingale.At each level of the QP, each interval is split in half.In this canonical construction, K = 1 and the martingale property requires that α x,ϵ1 = α x,ϵ2 for a beta distribution.Allowing for splits that do not bisect the interval, this becomes, at level m of the pyramid, α x,ϵ1 = c x,m (τ ϵ1 −τ ϵ0 ) and α x,ϵ2 = c x,m (τ ϵ2 − τ ϵ1 ) for some c x,m > 0.
For a non-binary split of an interval, the values of α x,ϵk need to be mapped to the relevant subintervals.Assume that a subinterval at the m th level contains quantiles associated with K specified quantile levels τ ϵ1 < . . .< τ ϵK .Let γ ϵk for k = 1, . . ., K denote the scaled quantile levels in the subinterval.The quantile levels of left and right endpoints of the subinterval are denoted by τ ϵ0 and τ ϵ(K+1) , respectively.Then From this, we can derive the values of α x,ϵk satisfying the martingale condition.That is, ) for k = 1, . . ., K + 1 and some c x,m > 0. We note that the chosen quantile levels τ ϵk do not depend on x.

Quantile Mapping for Transitioning to the Response Space
Assume that the DQP has been constructed on (0, 1) and denote the QP at x by Q x (τ ).
As briefly mentioned in Hjort & Walker (2009), we can transform the scale as for a proper response space, say, a real line using the inverse normal cdf transformation.Defining trend parameters µ x and scale parameters σ x , we have the canonical DQP regression model, centered on a normal theory regression: The trend parameter µ x is shared across the quantiles and controls the overall trend of quantiles throughout the covariate space.The scale parameter controls the dispersion of the distributions while the realized QPs determine the departures from normality.Together, the scale and QP determine the departure (or local fluctuation) from a set of constant variance normal models with centers given by µ x .
Various choices can be made for the trend and scale.If one believes that there is no significant global trend, µ x can be set to a constant.Alternatively, a linear regression model, µ x = x ⊤ β, may provide an effective choice.The model is compatible with more complicated models for the trend.Similarly, models for σ x can range from a simple constant scale to more complex forms.The trend and scale parameters can be incorporated into Markov chain Monte Carlo (MCMC) procedures used to fit the model.Alternatively, to save computational effort, estimates can be plugged in and the parameters treated as fixed.

Theoretical Results
The novel notation used in this section is formally defined in Appendix A while proofs of the results appear in Appendix B.

A Novel Approach to Existence of Quantile Pyramid
Hjort & Walker ( 2009) provide two different proofs for the existence of the QP.The following results establish the existence of the QP under slightly weaker conditions than those of Hjort & Walker (2009).They also apply to the oblique pyramid of Rodrigues et al. (2019a).
The QP is a probability measure for a distribution-valued random element.To show its existence, we wish to show that the sequence of probability measures that define the finite quantile pyramids (FQPs) converges to a limiting probability measure.Our argument relies on two key facts.First, the space of probability measures over distribution functions with support contained in [0, 1], when equipped with the Prokhorov metric, is compact (Parthasarathy 1967).Second, and yet to be established, the sequence of probability measures forms a Cauchy sequence.Together, these facts lead to the conclusion that the sequence of FQPs converges to a limit and that the limit is a probability measure on distribution-valued elements.
Lemma 1. Define ϵ = max t=1,...,T +1 (τ * t − τ * t−1 ).Assume that F and G are two distribution functions such that there exist y 1 , . . ., y T for which F (y t ) = G(y t ) = τ * t for t = 1, . . ., T . Then, The construction of the QP proceeds through a sequence of FQPs, indexed by m, the number of levels in the pyramid.These FQPs are defined on a single probability space (Ω, B, µ).Each ω ∈ Ω defines a sequence of FQPs with m = 1, 2, . . .levels.The distributions in the sequence all have support contained in [0, 1] and share values of the quantile function at certain specified quantile levels, as described in Section 2.2.In particular, the A Cauchy sequence on a compact set converges to a limit in the set.In this case, the sequence of measures µ m converges to a limit measure µ.The limit measure is a probability measure on distribution-valued elements where the distributions have support contained in the interval [0, 1].This reasoning leads to the existence of the QP, stated formally in the next theorem.We note that the dyadic construction of the QP satisfies the denseness condition in Lemma 2.
Then, the QP constructed as in Section 2.1 or 2.2 is a random element whose distribution is determined by (Ω, B, µ).

Existence of a Process of Dependent Quantile Pyramids
The existence of a process of DQPs follows from consideration of the joint distributions of the QPs at finite sets of indices in the index set.The construction of the FDQP in Section 2.3 ensures the existence of the joint distribution of the corresponding sequence of FQPs.From here, the argument parallels that of the previous section, with the Lévy metric and Prokhorov metric replaced with their suprema over the finite set of indices.
This ensures the existence of a probability space on which the limiting DQPs satisfy the requisite permutation and marginalization conditions.
The DQP relies on a countable collection of real-valued stochastic processes, V x,ϵk , all of which have index set X .Each of these processes satisfies Komogorov's consistency axioms.
The processes are defined on a single probability space.Selecting a finite set of distinct indices, say, x 1 , . . ., x n ∈ X , the sequence of FDQPs at these indices is generated by a countable collection of real-valued random variables.The permutation and marginalization conditions follow immediately.This establishes the following lemma.
Lemma 3. (Existence of FDQP) Under the conditions in the previous paragraph, for each m ∈ N, there exists a distribution-valued stochastic process, F m = {F m x , x ∈ X }, with the specified finite dimensional distributions.
Define the suprema of the Lévy metric and Prokhorov metric over a set S to be Lemma 6 appearing in Appendix B shows that d Lu and d Pu are metrics.We first reprise Lemma 1.
Lemma 4. Define ϵ = max t=1,...,T +1 (τ * t − τ * t−1 ).Assume that F x and G x , x ∈ S, are two sets of distribution functions such that, for each x ∈ S, there exist y x,1 , . . ., y x,T for which The construction of the FDQPs ensures that, for all ω, for each x ∈ {x 1 , . . ., x n } the Finally, noting that the set of distributions with support contained in [0, 1] n is compact under d Lu , by Tychonoff's Theorem, we know that µ m converges to some probability measure µ.This ensures the existence of the limiting DQP.
Theorem 2. Suppose that ∪ ∞ m=1 T m is dense in [0, 1].Then, for any S = {x 1 , . . ., x n } ∈ X , the DQP for x ∈ S constructed as in Section 2.3 is a random element whose distribution is determined by (Ω, B, µ).Furthermore, there exists a stochastic process defined for all x ∈ X whose finite dimensional joint distributions on the QPs are exactly these.

Posterior Inference
In this section, we discuss how to specify the DQP prior and the likelihood under the canonical construction.Suppose we have T quantiles of interest, τ * 1 < • • • < τ * T , to be specified on the DQP.For simplicity, we work with the binary pyramid.A similar formulation can be laid out in the general case.Recall that with the canonical construction of the DQP in Section 2.4 and the canonical transformation in Section 2.5, quantiles are generated in the uniform scale and are transformed to the real line.For notational simplicity, we omit the superscript R from the quantile in (3) and let Q x (τ ) denote the quantiles in the real line scale in this section.
Let the quantiles of interest be denoted by Q.For convenience, we view Q as a T × n matrix.The t th row of Q (denoted by Q τ * t ) is the vector of τ * t quantiles at x 1 , . . ., x n for t = 1, . . ., T .The τ * t quantile at x i is denoted by Q x i ,τ * t for i = 1, . . ., n.The parent quantiles that defines the left and right endpoints of the interval on which t as their elements, respectively.Write µ x for the vector of trend transformation parameters and σ x for the vector of scale transformation parameters, where µ x i and σ x i denote the element of µ x and σ x , respectively, corresponding to the value of x i .
For a GP with zero mean given the input points (x 1 , . . ., x n ), denoted by Z, let Λ = [λ x i ,x j ] n i,j=1 be the correlation matrix, where λ x i ,x j = λ(x i , x j ) is the correlation between x i and x j and λ x i ,x i = 1.Let Ψ(•; a, b) denote the cdf of the beta(a, b) distribution and Φ(•) and Φ(•; η, ν) denote the cdfs of the N (0, 1) and the N (η, ν 2 ) distribution, respectively.
Given the parameters Λ, µ x , σ x , the joint conditional density of the finite dimensional distribution of the specified quantile levels of the DQP at (x 1 , . . ., where ϕ n (•; η, V ) denotes the probability distribution function (pdf) of the n-variate Normal, N n (η, V ), the back-transformed quantiles to the Z-scale and the determinant of the Jacobian matrix is of the form where ϕ(•) and ψ(•; a, b) denote the pdfs of the N (0, 1) and the beta(a, b) distribution, respectively.The beta distribution parameters a τ and b τ can be established using the canonical choice outlined in Section 2.4.Given that we are working with the binary pyramid in this section where one quantile is estimated per subinterval, we can denote τ * t as τ ϵ1 using the ϵ-notation with an appropriate ϵ value.For τ = τ x,ϵ1 , we set a τ = α x,ϵ1 and b τ = 1−α x,ϵ1 .

Updating
Since we use the normal distribution for the transformation from [0, 1] to R, the conditional density of the response variable is piecewise-normal.Given the values of the covariate ⊤ and the parameters Q, µ x , and σ x , the pdf of y is , where I A (x) is an indicator function whose value is one when x ∈ A and zero otherwise.
Utilizing the information present in the data through the likelihood, we can update the prior in Section 4.1 via a MCMC procedure with a Metropolis-Hastings step.More details can be found in Appendix C. If desired, variations on the basic algorithm can be used to improve the mixing of the Markov chain.

Inference for the DQP
Posterior inference can be made based on the DQP prior and likelihood using the MCMC procedure in Appendix C.An estimate of a set of QR curves under squared error loss is obtained by taking the posterior mean of the sample of quantiles at each x.
The dependence across the covariate space is fundamental to the DQP framework.
The conditional quantiles are intrinsically dependent on one another through both the GP covariance structure, Σ, and the pyramid structure.Typically, increasing the level of dependence results in smoother QR curves, while reducing dependence leads to more flexible and bumpier QR curves.

Linearized Inference under the DQP
Bayesian methods allow one to distinguish between beliefs about a quantity, say a QR curve, and inference about the quantity.One strategy that has proven successful is to perform modeling in a large space and to impose parsimony through a restriction on the inference (e.g., MacEachern (2001), Hahn & Carvalho (2015)).In contrast, practitioners of classical statistics generally impose parsimony by working in a smaller model space or through model selection.
Much of the literature on QR, both classical and Bayesian, focuses on linear QR.The DQP model naturally produces a posterior distribution that assigns probability one to nonlinear QR curves.To linearize inference, one needs a distribution over the covariate and a set of draws from the posterior distribution of the QR curve.
As a criterion, we minimize the integrated squared difference between the nonlinear QR curves and a linear QR curve.That is, with x following a distribution G and a quantile level of interest τ , and presuming the following integrals are finite, We can obtain a posterior predictive distribution of quantiles for a new x value, denoted as x * .This is achieved by following a two-step algorithm.First, we run an MCMC algorithm to sample mapping parameters and DQPs from their joint posterior distribution at the covariate locations where data points exist, namely x 1 , . . ., x n .Next, for a given MCMC iterate, we sample the DQP at the new value x * from its conditional posterior distribution.
We empasize one important point.In the event that there is no existing data point at x * , the conditional posterior distribution is just the conditional prior distribution.By collecting the sampled quantiles at the new value x * , we are able to construct the marginal posterior predictive distribution of quantiles for that specific location.For details on how to perform the posterior sampling of quantiles and mapping parameters from their respective posterior distributions, please refer to Section 4.2.
The covariate is x ij = i and the response is Y ij for i = 1, . . ., 10 and j = 1, . . ., r.Let N (0, 1) denote the standard normal distribution and t df a t-distribution with df degrees of freedom.We considered the following three scenarios for the simulation study.
For each setting, we used r = 10 and r = 30 replicates at each value of x, corresponding to overall sample sizes of n = 100 and n = 300, respectively.In each of our 32 scenario-T -sample size combinations, we generated N = 100 data sets, fit QRs, and calculated the empirical mean-squared error (MSE) for each quantile τ at x ∈ X : , where Qs x (τ ) is an estimated quantile for τ level with s th simulated data set at x.We further computed the MSE for the quantile, averaged over the distribution of x as MSE(τ ) = 10 x=1 0.1 MSE x (τ ), which again is averaged over τ as We fit a binary FDQP with two levels for the T = 3 case and with three levels for the T = 7 case based on the normal inverse cdf transformation from (3), centering the FDQP on a linear regression model with µ x = x ⊤ β.Regarding the dependence of the pyramids between locations x and x ′ in the covariate space, we used the canonical construction for the Gaussian processes with zero mean vector and Gaussian correlation function λ(x, x ′ ) = exp(−∥x − x ′ ∥ 2 /ϕ) with ϕ = 5.For the beta distribution at level m in the construction of the binary pyramid prior, we used α ϵ1 = c m (τ ϵ1 − τ ϵ0 ) and α ϵ2 = c m (τ ϵ2 − τ ϵ1 ) with c m = (m + 5) 2 .The prior distribution for β is bivariate normal with mean µ 0 = (5, 0) ⊤ and variance matrix Σ 0 = diag(3, 3).For the scale transformation parameter σ x , we used the sample standard deviation at x as a plug-in estimator.
We compare our method to four alternatives: a classical QR approach by Koenker & Bassett (1978)  For the MCMC runs of bayesQR, qrjoint, JSQR-GP, and DQP, we used a warmup of 1, 000 iterates followed by 100, 000 draws for estimation, thinned at a rate of 1 in 100.
The AMSE values are summarized in Figure 3, while the corresponding values and standard errors can be found in Table 1 and Table 2 in Appendix D. Examining Figure 3, the linearized DQP ('DQP-lm') demonstrated competitive performance relative to the alternative approaches in Scenario 1.It either had the lowest AMSE or had an AMSE within a two standard error range of the lowest value for all of the cases of Scenarios 1-1 and 1-2 except for the T = 7 and n = 100 case.This suggests that when the underlying relationship between the response and covariates is believed to be linear, the DQP can be effectively used for linear inference, yielding reasonable results.In Scenario 1, qrjoint had the lowest AMSE in most cases, primarily because its linearity assumption in the model closely aligns the 0.7 quantile.Lower quantiles had slopes closer to zero.Kadane & Tokdar (2012) analyzed the same data set and came to a different conclusion.They developed a Bayesian approach to simultaneously fit a collection of linear QRs.
They found that the increasing trend in the cyclone intensity was significant for almost all quantiles, not just the upper quantiles.
We analyzed the data set (https://myweb.fsu.edu/jelsner/temp/Data.html) from Elsner et al. (2008).The data consist of the lifetime maximum wind speeds of 291 North Atlantic tropical cyclones derived from satellite imagery.The wind speeds are the response (Y ) and range from 29.8 to 159.5 ms −1 .The year is the covariate (x).
For the purpose of comparison, we also applied the alternative simultaneous QR methods discussed in Section 5 to the same dataset.
Figure 4 displays the analysis results from the simultaneous QR methods, including our approach.In panel (a), the grey lines provide the posterior means at each year and the black lines the linearized fit.The lines are overlaid on the data points.The 95% credible intervals for the slopes of the linearized fit are shown in the right panel of (a).The slopes tend to be greater for greater quantiles, implying that stronger cyclones have been getting stronger more quickly.Indeed, the 95% credible interval for the slope of the 0.95 quantile is observed to be well above that of the 0.50 quantile.This means that the most intense cyclones are increasing in strength at a significantly faster rate than the moderately strong cyclones.The 95% credible intervals for the slope of the quantiles below 0.50 include zero.
The slopes do not significantly differ from zero for lower quantiles.Overall, our method applied to these data shows the intensity of the upper half of the cyclones to be increasing, with more powerful cyclones intensifying at a more rapid rate.
On the other hand, the outcome obtained from qrjoint in panel (b) reveals that all the 95% credible intervals lie above zero while overlapping with one another.This suggests that cyclone intensities are increasing across all levels, although the rate of increase may not significantly differ among cyclones with varying power.Meanwhile, the findings from JSQR-GP in panel (c) indicate that only the slopes for the strongest cyclones have values greater than zero, with their respective intervals also overlapping.While we lack knowledge of the absolute ground truth, it becomes evident that all three models concur on one point: the intensity of the strongest cyclones is increasing.

Discussion
We propose a nonparametric Bayesian approach to quantile regression, based on a process of DQPs.The DQP generalizes the QP of Hjort & Walker (2009) to allow dependence across a predictor space.The flexibility of the model allows us to depart from linearity and to handle regressions for multiple quantiles in unbounded predictor spaces without quantile crossing.
The canonical construction can be adapted to account for a various features of the data.
As examples, the standard normal distribution used in the inverse cdf transformation can be replaced with a distribution with different tails to handle thick or thin tailed data; skewness in the quantiles can be handled by replacing the symmetric normal distribution with an asymmetric distribution; and positively valued responses can be handled by basing the transformation on a distribution supported on the half line.The linear regression model for the mean can be replaced with a nonlinear form, resulting in large-scale nonlinearity of the QRs.Simple choices for this include the use of a deterministic form such as a fixed-knot spline or a stochastic form such as a Gaussian process.Replacement of the linear form for the scale factor (x ⊤ γ) with a form that ensures positivity, for example, exp(x ⊤ γ), relieves concerns about the implications of unbounded X .
In the simulation examples, we employed the sample standard deviation at the location x as a plug-in estimator for the scale transformation parameter σ x .An alternative is to use a pooled sample standard deviation in situations where heterogeneity is not suspected, such as Scenario 1.The results of this alternative will be included in our future work.The simulations we report above rely on a single predictor.The stochastic processes used for the FDQP are Gaussian processes that are then passed through transformations to arrive at the FDQP.Extension to the case of multiple predictors is straightforward through the use of Gaussian processes with a multivariate index.The Gaussian processes can be replaced with other processes.Turning to the probability measures on the FQPs with m and n levels, we note that This holds for all Borel A, and so d P (µ m , µ n ) ≤ ϵ.Thus, for each ϵ > 0, there is an M such that, for all m, n ≥ M , Proof of Theorem 1.By Lemma 2, the sequence {µ m } ∞ m=1 is Cauchy.Moreover, the space P equipped with the Prokhorov metric is compact, and thus complete, since the space D equipped with Lévy metric is compact (see Theorem 2.6.4 in Parthasarathy (1967)).Therefore, the sequence {µ m } ∞ m=1 is convergent.That is, there exists µ to which µ m converges and that µ provides a probability distribution on lim m→∞ F m .Thus, a limit of QP exists.Kolmogorov's permutation condition is satisfied at each step in the sequence of FDQPs since each of the V x processes satisfies the condition.His marginalization condition is also satisfied.Since {x 1 , . . ., x n } was arbitrary, this ensures the existence of a stochastic process with the specified limiting distributions (e.g., Billingsley (1968), chapter 7).■ 10.1 MCMC Algorithms

■
(iii) Accept the new proposal Q p whose t th row is replaced with Q p τt over the current value Q c with probability .
(iv) Repeat (i) -(iii) until the bottom of the pyramid.
(ii) Sample µ x ) and accept with probability .

Sample σ
x given Q and µ x following a similar step to 2.
Note that the sampling of Q can be further broken down to sampling of smaller blocks as in step 2. Doing so impacts the acceptance rate of the proposals and the mixing of the Markov chain.

Linearized Inference
Throughout this derivation, we assume that the integrals are finite.This ensures the nontrivial existence of a minimizing β * .
where P (•) denotes the posterior distribution function of Q x,τ , G(•) denotes the distribution function of x, and E x [•] denotes the expectation of the quantity at each x.Moreover, if G(x) is the empirical cdf of x, then the problem above becomes which is equivalent to the least squares linear regression problem.Each node in a tree structure corresponds to a subinterval.The first node is the unit interval.The rhombi represent quantiles pinned down in each subinterval.
• Figure 2: An example of a DQP with covariate x with three levels and 13 quantiles.
and F m+k (ω) are identical for all ω and for any positive integer k.B is the Borel σ-field generated by the open sets under the Lévy metric.The next lemma shows that the sequence of probability measures arising from the FQPs is Cauchy.The measure µ m provides the probability distribution on F m .The distance between µ m and µ n is measured by the Prokhorov metric, d P (µ m , µ n ).Lemma 2. Suppose that ∪ ∞ m=1 T m is dense in [0, 1].Then, under the Prokhorov metric d P , the sequence {µ m } ∞ m=1 is a Cauchy sequence.
ω) and F m+k x (ω) are identical for every positive integer k.This lets us apply the lemma.With a suitable choice of quantile levels, the sequence of probability measures for the set of n FQPs is Cauchy.Lemma 5. Suppose that ∪ ∞ m=1 T m is dense in [0, 1].Then, with S = {x 1 , . . ., x n }, under the supremum Prokhorov metric d Pu , the sequence {µ m } ∞ m=1 is a Cauchy sequence.
where P (•) denotes a posterior distribution function of Q x,τ and E x [•] denotes the expectation of the quantity at each x.In practice, if lacking a distribution G, we use the empirical cdf of x for G and the posterior mean of the conditional quantiles to estimate E x [Q x,τ ].With this particular choice, this minimization is obtained as a least squares fit on the posterior mean of the conditional quantiles.The same β L (τ ) can be obtained in a second fashion.First, map the nonlinear QR curve at each MCMC iterate to the best fitting linear QR curve with a least squares fit.Then map this collection of linear QR curves values to the overall best linear QR curve via a second least squares regression of the least squares fits on x.More details are contained in Appendix C. 4.3.2Quantile Prediction for a New x Value ('quantreg'), Yu & Moyeed's Bayesian QR approach implemented by Benoit & Van den Poel (2017) ('bayesQR'), a simultaneous linear QR by Yang & Tokdar (2017) ('qrjoint'), and its generalized variant incorporating spatial dependence using a Gaussian process by Chen & Tokdar (2021) ('JSQR-GP').The estimators are evaluated by AMSE.

Figure 1 :
Figure 1: (a) A binary QP following Hjort & Walker (2009) with three quartiles; (b) An oblique QP with four quantiles following Rodrigues et al. (2019a); (c) A general QP with four quantiles; (d) A process of DQPs, where there is a QP at each value of x.Each node in a tree structure corresponds to a subinterval.The first node is the unit interval.The rhombi represent quantiles pinned down in each subinterval.

Figure 2 :
Figure 2: An example of a DQP with covariate x with three levels and 13 quantiles.Both τ *and ϵ notation is shown.At level 0, the interval at each x is [0, 1].At level 1, K = 2 quantiles are specified, creating three subintervals.At level 2, four quantiles are newly specified, leading to a total of 7 subintervals.Notation for the newly specified quantiles is shown.At level 3, one quantile is specified within each subinterval.Both notations are shown for all quantiles in (0, 1).Collecting the quantiles across x gives the QR curves.The corresponding tree structure is the same across all values of x.It is shown in the right panel.Dots and rhombi represent subintervals and quantiles, respectively.

Figure 4 :
Figure 4: Left column: 15 estimated quantiles of cyclone intensity with grey data points.Right column: Estimated slopes of the quantile lines with 95% empirical credible intervals.In (a), grey lines are the posterior means of the quantiles and black lines are the linearized fit of the posterior mean.

Figure captions •
Figure captions Both τ * and ϵ notation is shown.At level 0, the interval at each x is [0, 1].At level 1, K = 2 quantiles are specified, creating three subintervals.At level 2, four quantiles are newly specified, leading to a total of 7 subintervals.Notation for the newly specified quantiles is shown.At level 3, one quantile is specified within each subinterval.Both notations are shown for all quantiles in (0, 1).Collecting the quantiles across x gives the QR curves.The corresponding tree structure is the same across all values of x.It is shown in the right panel.Dots and rhombi represent subintervals and quantiles, respectively.• Figure 3: AMSE values for each combination of scenario-T -sample size • Figure 4: Left column: 15 estimated quantiles of cyclone intensity with grey data points.Right column: Estimated slopes of the quantile lines with 95% empirical credible intervals.In (e), grey lines are the posterior means of the quantiles and black lines are the linearized fit of the posterior mean.
Lemma 6. d Lu as defined in Section 3.2 is a metric on the space of distributions.dPu is a metric on the space of probability measures over distributions.The set S need not have finite cardinality.Proof.Symmetry, non-negativity, the triangle inequality, and the zero property follow from straightforward calculation.With a nod to the St. Petersburg paradox, we must showthat d Lu (F m , F n ) < ∞.Since d L (F m x , F x ) ≤ 1 for all x ∈ S, we have that d Lu (F m , F n ) ≤ 1.The argument for d Pu is established in the same way.■ Proof of Lemma 4. From Lemma 1, we have d L (F x , G x ) ≤ ϵ for all x ∈ S. Thus d Lu (F, G) ≤ ϵ. ■ Proof of Lemma 5. Replace d L with d Lu , d P with d Pu , and B ′ with B ′ S in the proof of Lemma 2. ■ Proof of Theorem 2. The proof of the first part of the theorem follows that of Theorem 1.

Table 1 :
AMSE values over 100 simulated datasets with standard errors in parentheses when n = 100.

Table 2 :
AMSE values over 100 simulated datasets with standard errors in parentheses when n = 300.