Aspects of Convergence of Random Walks on Finite Volume Homogeneous Spaces

We investigate three aspects of weak* convergence of the $n$-step distributions of random walks on finite volume homogeneous spaces $G/\Gamma$ of semisimple real Lie groups. First, we look into the obvious obstruction to the upgrade from Cesàro to non-averaged convergence: periodicity. We give examples where it occurs and conditions under which it does not. In a second part, we prove convergence towards Haar measure with exponential speed from almost every starting point. Finally, we establish a strong uniformity property for the Cesàro convergence towards Haar measure for uniquely ergodic random walks.


Introduction
Let $G$ be a real Lie group and $\Gamma$ a lattice in $G$, that is, a discrete subgroup of $G$ such that the homogeneous space $X = G/\Gamma$ admits a $G$-invariant Borel probability measure $m_X$. This measure $m_X$ is unique and we refer to it as the (normalized) Haar measure on $X$. A good example to have in mind is $G = \mathrm{SL}_d(\mathbb{R})$ and $\Gamma = \mathrm{SL}_d(\mathbb{Z})$.
The objects of study in this paper are random walks on $X$, given by probability measures $\mu$ on $G$: a step corresponds to randomly choosing a group element $g \in G$ according to $\mu$ and then moving from the current location $x \in X$ to $gx$. Starting at $x_0 \in X$, the distribution of the location after $n$ steps is given by the convolution
$$\mu^{*n} * \delta_{x_0}, \tag{1.1}$$
which is the push-forward of the product measure $\mu^{\otimes n} \otimes \delta_{x_0}$ under the multiplication map $G^n \times X \ni (g_n, \dots, g_1, x) \mapsto g_n \cdots g_1 x \in X$.

The broader context in which the study of these random walks originated is that of subgroup actions on homogeneous spaces. After Ratner's treatment of the rigidity and asymptotic properties of unipotent actions in her celebrated series of articles [21,22,23,24], a new approach was needed to understand the dynamics of non-unipotent actions. Passing from a deterministic to a probabilistic point of view turned out to be a particularly fruitful angle. Still, understanding the long-term behavior of random walks on homogeneous spaces and the limiting behavior of the $n$-step distributions (1.1) is a notoriously difficult problem. Major contributions to this line of study were made, e.g., by Eskin-Margulis in their work on non-divergence [15], and by Benoist-Quint in their breakthrough series of articles [4,6,7,8]. We reproduce one of the main results of [8] as a motivating example. For the statement, recall that a probability measure $\nu$ on $X$ is called homogeneous if there exist a closed subgroup $H$ of $G$ and a point $x \in X$ such that $\operatorname{supp}(\nu) = Hx$ is a closed orbit and $\nu$ is $H$-invariant.

Theorem 1.1 ([8]). Let $\mu$ be a compactly supported probability measure on $G$. Denote by $S$ and $G_\mu$ the closed subsemigroup and subgroup of $G$ generated by $\operatorname{supp}(\mu)$, respectively, and suppose that the Zariski closure of $\mathrm{Ad}(G_\mu)$ in $\mathrm{Aut}(\mathfrak{g})$ is Zariski connected, semisimple, and has no compact factors. Then for every $x_0 \in X$ there is a homogeneous probability measure $\nu_{x_0}$ on $X$ with $\operatorname{supp}(\nu_{x_0}) = \overline{Sx_0} = \overline{G_\mu x_0}$ and such that
$$\frac{1}{n} \sum_{k=0}^{n-1} \mu^{*k} * \delta_{x_0} \longrightarrow \nu_{x_0} \tag{1.2}$$
as $n \to \infty$ in the weak* topology.
Here the weak* convergence (1.2) more explicitly means that for every compactly supported continuous function $f \in C_c(X)$ we have
$$\frac{1}{n} \sum_{k=0}^{n-1} \int_X f \, d(\mu^{*k} * \delta_{x_0}) \longrightarrow \int_X f \, d\nu_{x_0}$$
as $n \to \infty$. Recently, it was shown by Bénard-de Saxcé [3] that the compact support assumption on $\mu$ in Theorem 1.1 can be relaxed to a finite first moment assumption; see Remark 2.7. Another recent generalization of the theorem above, in joint work of the author with C. Sert and R. Shi [19], replaces the algebraic assumption on the support of $\mu$ by a certain expansion condition, which allows for cases in which $\mu$ is, e.g., supported on a parabolic subgroup of a semisimple group.
Some questions left open by Theorem 1.1 are listed by Benoist-Quint at the end of their survey [5]. A major one is the following.

Question 1.2. In the setting of Theorem 1.1, is it also true that
$$\mu^{*n} * \delta_{x_0} \longrightarrow \nu_{x_0} \tag{1.3}$$
as $n \to \infty$?
Answers are available only in special cases: Breuillard [11] established (1.3) for certain measures $\mu$ supported on unipotent subgroups, Buenger [12] proved it for some sparse solvable measures, and in previous work the author dealt with the case of spread out measures [18]. Very recently, Bénard [2] observed that (1.3) holds for aperiodic measures $\mu$ under the assumption that $\mu$ has two convolution powers which are not mutually singular.
The purpose of this article is to discuss three (largely independent) aspects of random walk convergence related to Theorem 1.1 and Question 1.2, mainly having in mind the case that $G$ is a semisimple real Lie group. We are going to use the following terminology.

Definition 1.3. Let $\nu$ be a probability measure on $X$ and $x_0 \in X$. We say that the random walk on $X$ given by $\mu$ converges to $\nu$ on average (resp. converges to $\nu$) from the starting point $x_0$ if
$$\frac{1}{n} \sum_{k=0}^{n-1} \mu^{*k} * \delta_{x_0} \longrightarrow \nu \qquad (\text{resp. } \mu^{*n} * \delta_{x_0} \longrightarrow \nu)$$
as $n \to \infty$ in the weak* topology. Convergence on average is also commonly referred to as Cesàro convergence. We use the two terms interchangeably.
The article is organized as follows.
In §2, we look into the obvious obstruction to the upgrade from Cesàro convergence to (non-averaged) convergence: periodicity. We show in Example 2.1 how (1.3) can fail when $x_0$ has a finite orbit under $S$. Using a product construction, we can also produce a counterexample in which the orbit closure $\overline{Sx_0}$ has positive dimension (Example 2.2). In both cases, the periodic behavior occurs at the level of the connected components of the orbit closure. As it turns out, this is no coincidence: if, in the setting of Theorem 1.1, the orbit closure $\overline{Sx_0}$ is connected, there can be no periodicity (Theorem 2.5), and we can show that the Cesàro convergence (1.2) also holds along arithmetic progressions (Corollary 2.8).
In §3, we establish effective convergence of random walks to the normalized Haar measure $m_X$ for typical starting points $x_0$: when $\operatorname{supp}(\mu)$ generates a Zariski dense subgroup of a semisimple real Lie group $G$ without compact factors, then for any fixed $f \in C_c(X)$ the convergence
$$\int_X f \, d(\mu^{*n} * \delta_{x_0}) \longrightarrow \int_X f \, dm_X$$
not only holds but is in fact exponentially fast for $m_X$-almost every $x_0 \in X$ (Theorem 3.2, Proposition 3.4). The proof relies on an $L^2$-spectral gap of the convolution operator acting on measurable functions on $X$. Taking into account regularity of the function $f$, the above can be further strengthened to the statement that almost every $x \in X$ is exponentially generic (Definition 3.12): up to a constant factor depending on derivatives of $f$, the exponential speed of convergence holds uniformly over all compactly supported smooth functions (Theorem 3.13). Key to this upgrade are the definition of suitable Sobolev norms and a functional analytic argument involving relative traces, first exploited in a dynamical context by Einsiedler-Margulis-Venkatesh [13].
Finally, in §4 we prove that convergence on average to $m_X$ happens locally uniformly in $x_0$ in a strong way when the random walk is uniquely ergodic and admits a Lyapunov function (Theorem 4.13). For example, this is the case when $G$ is a connected semisimple real algebraic group and $\operatorname{supp}(\mu)$ generates a non-discrete Zariski dense subgroup, and also in the setup of Simmons-Weiss [27], which has connections to Diophantine approximation problems on fractals. To this end, we introduce the new concept of $(K_n)_n$-uniform recurrence (Definition 4.10), which refines recurrence properties of random walks previously studied in [6,15].

Standing Assumptions & Notation.
As many of our arguments work in greater generality, in the remainder of the article we will relax the assumptions stated at the beginning of this introduction. The following setup shall be in place whenever nothing else is specified: $G$ is a locally compact $\sigma$-compact metrizable group acting ergodically on a locally compact $\sigma$-compact metrizable space $X$ endowed with a $G$-invariant probability measure $m_X$; and $\mu$ is a Borel probability measure on $G$.
Acknowledgments. The author would like to express his gratitude to Andreas Wieser for valuable comments on preliminary versions of the article, and to Manfred Einsiedler for explaining how relative traces can be used to make separability effective. Thanks also go to HIM Bonn and the organizers of the trimester program "Dynamics: Topology and Numbers", in the course of which parts of this manuscript were completed, for hospitality and providing an excellent working environment. Finally, the author is grateful to the anonymous referee for pointing out a simple way to establish a better speed of convergence in Theorems 3.2 and 3.13.

Periodicity
In this section, we start with two simple counterexamples to (1.3), which illustrate ways in which a random walk may exhibit periodic behavior (§2.1). Analyzing these examples for their common feature, we are led to a simple condition ensuring aperiodicity, stated and proved in §2.2.
Being the kernel of the reduction homomorphism from $\mathrm{SL}_2(\mathbb{Z})$ to $\mathrm{SL}_2(\mathbb{Z}/2\mathbb{Z})$, the group $\Gamma(2)$ is a finite-index normal subgroup of $\mathrm{SL}_2(\mathbb{Z})$. In particular, $\Gamma(2)$ is a lattice in $G$.
Then the closed subgroup G generated by supp(µ x 0 , with transitions as shown in the following diagram:

Consequently, we see that the random walk with starting point $x_0$ alternates between two sets of orbit points: at even times it lies in the set containing $x_0$, at odd times in the other one. The 2-step random walks on these sets constitute irreducible, aperiodic, finite state Markov chains, so that the distributions $\mu^{*2n} * \delta_{x_0}$ and $\mu^{*(2n+1)} * \delta_{x_0}$ converge to the normalized counting measures on these two sets, respectively, as $n \to \infty$ in the weak* topology. ⋄

In the example above, the support of $\mu$ generates a Zariski dense subgroup of $G$ and the lattice $\Gamma$ in $G$ is irreducible. (Recall that, loosely speaking, "irreducibility" of $\Gamma$ means that it does not arise from a product construction, cf. [20, Definition 5.20].) By the work of Benoist-Quint ([8, Corollary 1.8]), these properties force any orbit closure $\overline{Sx_0}$ to be either finite or all of $X$. As soon as intermediate orbit closures are possible, however, one can also construct examples with periodic behavior on non-discrete orbit closures.
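The alternation in Example 2.1 can be reproduced by an exact computation on the finite orbit. The Python sketch below assumes, purely for illustration, that $\mu$ is the uniform measure on $h_1 = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$ and $h_2 = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}$ and that $x_0 = \Gamma(2)$; these choices are our own and are not taken from the text. Since $\Gamma(2)$ is the kernel of reduction mod 2, the coset $g\Gamma(2)$ of $g \in \mathrm{SL}_2(\mathbb{Z})$ is determined by $g \bmod 2$, so the orbit of $x_0$ can be modeled by $\mathrm{SL}_2(\mathbb{Z}/2\mathbb{Z})$ with the step $x \mapsto hx$.

```python
from fractions import Fraction

# Matrices over Z/2Z are stored as tuples (a11, a12, a21, a22).
def mul_mod2(a, b):
    """Product of two 2x2 matrices, reduced mod 2."""
    return ((a[0] * b[0] + a[1] * b[2]) % 2, (a[0] * b[1] + a[1] * b[3]) % 2,
            (a[2] * b[0] + a[3] * b[2]) % 2, (a[2] * b[1] + a[3] * b[3]) % 2)

IDENTITY = (1, 0, 0, 1)  # x_0 = Gamma(2) corresponds to the identity coset
H1 = (1, 1, 0, 1)        # hypothetical h_1 = [[1, 1], [0, 1]], reduced mod 2
H2 = (1, 0, 1, 1)        # hypothetical h_2 = [[1, 0], [1, 1]], reduced mod 2
MU = {H1: Fraction(1, 2), H2: Fraction(1, 2)}  # assumed uniform step distribution

def step(dist):
    """Push a distribution on the orbit forward by one step x -> h x, h ~ MU."""
    new = {}
    for x, p in dist.items():
        for h, q in MU.items():
            y = mul_mod2(h, x)
            new[y] = new.get(y, 0) + p * q
    return new

def n_step_distributions(n_max):
    """Exact distributions mu^{*n} * delta_{x_0} for n = 0, ..., n_max."""
    dists = [{IDENTITY: Fraction(1)}]
    for _ in range(n_max):
        dists.append(step(dists[-1]))
    return dists

dists = n_step_distributions(40)
even = set().union(*(d.keys() for d in dists[0::2]))  # points charged at even times
odd = set().union(*(d.keys() for d in dists[1::2]))   # points charged at odd times
print(sorted(even), sorted(odd), even & odd)
```

The two sets come out disjoint with three points each, and the even-time (resp. odd-time) distributions approach the uniform distribution on the corresponding set, matching the behavior described in the example.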
Example 2.2. Let $G$, $\Gamma$, $X = G/\Gamma$, $h_1$, $h_2$, $x_0$ and $G_\mu$ be as in Example 2.1 and choose a diagonal matrix $a \in \mathrm{SL}_2(\mathbb{R})$ such that the diagonal entries of $a^2$ are irrational. We are going to consider the random walk on the product space $X \times X$ given by the probability measure $\mu$

The (closed) subgroup generated by the support of this measure $\mu$ is given by $G_\mu \times aG_\mu a^{-1} = \mathrm{SL}_2(\mathbb{Z}) \times a\,\mathrm{SL}_2(\mathbb{Z})a^{-1}$. Indeed, the correct entry in the second factor can be arranged using a finite product of $g_1^{\pm 1}$, $g_3^{\pm 1}$, and then the entry in the first factor can be corrected using $g_2^{\pm 1}$, $g_4^{\pm 1}$. By Theorem 1.1 we thus know that for the starting point $(x_0, x_0) \in X \times X$ we have the weak* convergence
$$\frac{1}{n} \sum_{k=0}^{n-1} \mu^{*k} * \delta_{(x_0,x_0)} \longrightarrow \nu_{(x_0,x_0)}$$
as $n \to \infty$, where $\nu_{(x_0,x_0)}$ is the homogeneous probability measure on the closure of the $G_\mu \times aG_\mu a^{-1}$-orbit of $(x_0, x_0)$. (Recall that it makes no difference for the closure whether one considers the orbit under the generated subgroup or subsemigroup.) Let us identify this orbit closure. In the first copy of $X$, we recognize the finite orbit $O$ from Example 2.1. In the second copy, we see the action of irrational conjugates of $h_1$, $h_2$. As the acting group has product structure, the orbit closure in question is the product of these two orbit closures in the components:
$$\overline{(G_\mu \times aG_\mu a^{-1})(x_0, x_0)} = O \times \overline{aG_\mu a^{-1}x_0}.$$
Since the orbit $aG_\mu a^{-1}x_0$ is infinite by our choice of the matrix $a$, it follows from [8, Corollary 1.8] that $\overline{aG_\mu a^{-1}x_0} = X$, so that
$$\nu_{(x_0,x_0)} = m_O \otimes m_X$$
for the normalized counting measure $m_O$ on $O$ and the normalized Haar measure $m_X$ on $X$. However, in analogy to Example 2.1, the random walk is found to alternate between two disjoint sets, one charged at even times and one at odd times, for all $n \in \mathbb{N}$. Hence, we conclude that the random walk starting from $(x_0, x_0)$ does not converge to $\nu_{(x_0,x_0)}$. ⋄

Remark 2.3. The same behavior as in the previous example can be arranged inside a homogeneous space in $G' = \mathrm{SL}_4(\mathbb{R})$ and the diagonal embeddings

We therefore see that Example 2.2, i.e. periodic behavior on a non-discrete orbit closure, can be realized inside $X' = G'/\Gamma'$. Of course, after applying this embedding, the subgroup generated by the support of $\mu$ will no longer be Zariski dense in $G'$. ⋄

2.2. An Aperiodicity Criterion

Inspecting the examples above, one may notice that their common salient feature is that the orbit closure $\overline{Sx_0}$ is disconnected. This naturally raises the question whether periodic behavior can also occur when this orbit closure is connected. In what follows, we answer this question in the negative. We shall use the following formalization of periodicity.
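Definition 2.4. Assume that the random walk on $X$ given by $\mu$ converges on average to a probability measure $\nu$ on $X$ from the starting point $x_0 \in X$. We say that this convergence is periodic if there exist an integer $d \geq 2$ and pairwise disjoint measurable subsets $D_0, \dots, D_{d-1} \subset X$ with $\nu(\partial D_i) = 0$ for all $0 \leq i < d$ such that
$$(\mu^{*n} * \delta_{x_0})(D_{n \bmod d}) = 1 \quad \text{for all } n \in \mathbb{N}.$$
Otherwise, we call the convergence aperiodic.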
The requirement on the boundaries of the sets $D_i$ is needed to ensure that the cyclic behavior is witnessed by the limit measure $\nu$. Without a condition of this sort, one could try to artificially define $D_i$ as the set of all points in $X$ that can be reached from $x_0$ in precisely $n \equiv i \bmod d$ steps. Indeed, this construction is possible, for example, when $\mu$ is finitely supported with the property that its support freely generates a discrete subsemigroup $S$ of $G$ and the starting point $x_0 \in X$ has a free $S$-orbit. The latter is the case, e.g., for $X = \mathrm{SL}_2(\mathbb{R})/\mathrm{SL}_2(\mathbb{Z})$, $\operatorname{supp}(\mu) = \{h_1, h_2\}$ with $h_1 = \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}$ and $h_2 = \begin{pmatrix} 1 & 0 \\ 2 & 1 \end{pmatrix}$, and $x_0 = a\,\mathrm{SL}_2(\mathbb{Z})$ for a diagonal matrix $a \in \mathrm{SL}_2(\mathbb{R})$ such that the diagonal entries of $a^2$ are irrational.
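The freeness underlying this construction can be sanity-checked numerically. The Python sketch below assumes $h_1 = \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}$ and $h_2 = \begin{pmatrix} 1 & 0 \\ 2 & 1 \end{pmatrix}$ (the classical pair from Sanov's theorem, which generates a free group), enumerates all words of length at most 10 in these two letters, and confirms that distinct words give distinct matrices. For a free orbit this means that each reachable point determines the number of steps, so the sets of points reachable in $n \equiv i \bmod d$ steps are pairwise disjoint. This is a finite check, not a proof.

```python
from itertools import product

H1 = ((1, 2), (0, 1))  # assumed h_1 (Sanov's pair)
H2 = ((1, 0), (2, 1))  # h_2

def matmul(a, b):
    """Product of two 2x2 integer matrices given as nested tuples."""
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))

def lengths_by_element(max_len):
    """Map each product of a word of length 1..max_len to the set of lengths producing it."""
    table, n_words = {}, 0
    for n in range(1, max_len + 1):
        for word in product((H1, H2), repeat=n):
            g = ((1, 0), (0, 1))
            for h in word:
                g = matmul(h, g)  # apply the letters in order: g -> h g
            table.setdefault(g, set()).add(n)
            n_words += 1
    return table, n_words

table, n_words = lengths_by_element(10)
print(n_words, len(table))  # equal counts: no two words give the same element
```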
We are now ready to state the announced aperiodicity theorem.
Theorem 2.5. Retain the notation and assumptions from Theorem 1.1 and let $x_0 \in X$ be such that the orbit closure $\overline{Sx_0}$ is connected. Then the Cesàro convergence to $\nu_{x_0}$ of the random walk on $X$ given by $\mu$ starting from $x_0$ is aperiodic.
For the proof we need the following simple lemma.
Lemma 2.6. Let $H$ be a Zariski connected real algebraic group and $S$ a subset of $H$ generating a Zariski dense subsemigroup. Then for every $d \in \mathbb{N}$, also the $d$-fold product set $S^d = \{g_1 \cdots g_d : g_1, \dots, g_d \in S\}$ generates a Zariski dense subsemigroup of $H$. In particular, if $\operatorname{supp}(\mu)$ generates a Zariski dense subsemigroup for some probability measure $\mu$ on $H$, the same is true for $\operatorname{supp}(\mu^{*d})$.
Proof. Let $U \subset H$ be a non-empty Zariski open subset and consider the map $\varphi \colon H \to H$, $g \mapsto g^d$. Since $\varphi$ is a polynomial map, the preimage $\varphi^{-1}(U)$ is Zariski open. Moreover, this preimage is non-empty because $U$ is dense in the Lie group topology and $\varphi$ is a diffeomorphism near the identity. By the assumption that $S$ generates a Zariski dense subsemigroup, we can thus find an element $g \in \varphi^{-1}(U)$ that is the product of finitely many elements of $S$. It follows that $\varphi(g) = g^d$ lies in the intersection of $U$ with the subsemigroup generated by $S^d$. The second claim involving $\mu$ immediately follows from the above together with the inclusion $\operatorname{supp}(\mu^{*d}) \supset \operatorname{supp}(\mu)^d$. □

Proof of Theorem 2.5. Suppose $d \in \mathbb{N}$ is an integer such that there are pairwise disjoint $D_0, \dots, D_{d-1} \subset X$ with $\nu_{x_0}(\partial D_i) = 0$ for all $0 \leq i < d$ and such that $(\mu^{*n} * \delta_{x_0})(D_{n \bmod d}) = 1$ for all $n \in \mathbb{N}$, as in the definition of periodicity. We have to show that $d = 1$.
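First note that from Theorem 1.1 and the properties of the sets $D_i$ it follows that
$$\nu_{x_0}(D_0) = \lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} (\mu^{*k} * \delta_{x_0})(D_0) = \frac{1}{d}, \tag{2.1}$$
where the application of weak* convergence to the set $D_0$ is justified since it has negligible boundary with respect to the limit measure $\nu_{x_0}$. In view of Lemma 2.6, Theorem 1.1 also applies to the $d$-step random walk given by $\mu^{*d}$. Assuming for the moment that the limit measure for this $d$-step random walk starting from $x_0$ coincides with $\nu_{x_0}$, we deduce that
$$\nu_{x_0}(D_0) = \lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} (\mu^{*dk} * \delta_{x_0})(D_0) = 1. \tag{2.2}$$
Together, (2.1) and (2.2) imply $d = 1$, the desired conclusion.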
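It thus remains to show that the $d$-step random walk starting from $x_0$ does indeed have the same limit measure as the 1-step random walk. Denoting by $S$ and $S_d$ the closed subsemigroups of $G$ generated by $\operatorname{supp}(\mu)$ and $\operatorname{supp}(\mu^{*d})$, respectively, this statement is equivalent to the equality $\overline{Sx_0} = \overline{S_d x_0}$ of orbit closures. To prove this, let $g \in \operatorname{supp}(\mu)$ be arbitrary. We claim that
$$\overline{Sx_0} = \bigcup_{k=0}^{d-1} g^{-k}\, \overline{S_d x_0}.$$
Indeed, since $\overline{Sx_0}$ is homogeneous, it is invariant under the group generated by $S$. As $\overline{Sx_0}$ clearly contains $\overline{S_d x_0}$, the inclusion "⊃" follows. For the reverse inclusion, note that the right-hand side is closed and that the semigroup generated by $\operatorname{supp}(\mu)$ is dense in $S$, so it suffices to consider $s = g_n \cdots g_1$ with $g_1, \dots, g_n \in \operatorname{supp}(\mu)$. Choosing $0 \leq k < d$ with $n + k \equiv 0 \bmod d$, the element $g^k s$ is a product of elements of $\operatorname{supp}(\mu)^d \subset \operatorname{supp}(\mu^{*d})$, hence lies in $S_d$, and therefore $sx_0 = g^{-k}(g^k s x_0) \in g^{-k} S_d x_0$. We already noted that Theorem 1.1 applies to $\mu^{*d}$. In particular, the orbit closure $\overline{S_d x_0}$ and its translates by $g^{-k}$, $0 \leq k < d$, are submanifolds of $\overline{Sx_0}$. Necessarily, all these translates have the same dimension, and since together they make up $\overline{Sx_0}$ by the claim above, their shared dimension coincides with that of $\overline{Sx_0}$. This implies that $\overline{S_d x_0}$ is open in $\overline{Sx_0}$. However, it is also closed, so that the assumed connectedness of $\overline{Sx_0}$ forces $\overline{Sx_0} = \overline{S_d x_0}$. This completes the proof. □

Remark 2.7. It was recently shown by Bénard-de Saxcé [3] that the compact support assumption on $\mu$ in Theorem 1.1 can be relaxed. Indeed, their [3, Theorem C] establishes the same conclusion under the substantially weaker assumption that $\mu$ has a finite first moment. Relying on this stronger result, also our Theorem 2.5 above and Corollary 2.8 below are seen to hold under a finite first moment assumption on $\mu$, instead of requiring compact support as in Theorem 1.1. ⋄

We end this section by recording a corollary of the proof above.

Corollary 2.8. In the setting of Theorem 2.5, let $d \in \mathbb{N}$ and denote by $S_d$ the closed subsemigroup of $G$ generated by $\operatorname{supp}(\mu^{*d})$. Then $\overline{Sx_0} = \overline{S_d x_0}$, and for the homogeneous probability measure $\nu_{x_0}$ on this orbit closure we have for arbitrary $r \in \mathbb{N}_0$ that
$$\frac{1}{n} \sum_{k=0}^{n-1} \mu^{*(dk+r)} * \delta_{x_0} \longrightarrow \nu_{x_0} \tag{2.3}$$
as $n \to \infty$ in the weak* topology.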
Proof. The statement about orbit closures was established as part of the proof of Theorem 2.5. From Theorem 1.1 we thus get the weak* convergence
$$\frac{1}{n} \sum_{k=0}^{n-1} \mu^{*dk} * \delta_{x_0} \longrightarrow \nu_{x_0}, \tag{2.4}$$
which is (2.3) for $r = 0$. Given $f \in C_c(X)$, the general case follows by applying (2.4) to the compactly supported continuous function $f_r$ defined by
$$f_r(x) = \int_G f(gx) \, d\mu^{*r}(g), \qquad x \in X,$$
for which $\int_X f \, d(\mu^{*(dk+r)} * \delta_{x_0}) = \int_X f_r \, d(\mu^{*dk} * \delta_{x_0})$ and $\int_X f_r \, d\nu_{x_0} = \int_X f \, d\nu_{x_0}$. □

This corollary sharpens the convergence statement in Theorem 1.1 in the case of a connected orbit closure: the Cesàro convergence to $\nu_{x_0}$ holds along arbitrary arithmetic progressions. Although this does not provide an answer to Question 1.2, it at least allows the following conclusion to be drawn: if $(n_i)_i$ is a sequence of indices such that $\mu^{*n_i} * \delta_{x_0}$ converges to a weak* limit different from $\nu_{x_0}$ as $i \to \infty$, then $(n_i)_i$ cannot contain a density 1 subset of an infinite arithmetic progression.

Spectral Gap
In this section, we will explain how a spectral gap of the convolution operator $\pi(\mu)$ associated to a random walk entails the convergence of $\mu^{*n} * \delta_x$ towards $m_X$ for $m_X$-a.e. $x \in X$. In its simplest form, the involved argument works in great generality and also produces an exponential rate of convergence from almost every starting point when the test function $f$ is fixed. This is done in §3.1. The following subsections §3.2–§3.4 are dedicated to a substantial refinement of this spectral gap argument for random walks on homogeneous spaces of real Lie groups, making the exponentially fast convergence uniform over smooth test functions.

Generic Points
Recall that the convolution operator $\pi(\mu)$ is defined by
$$\pi(\mu)f(x) = \int_G f(gx) \, d\mu(g)$$
for bounded measurable functions $f$ on $X$ and $x \in X$, and that it extends to a continuous contraction on each $L^p$-space (see [9, Corollary 2.2]). We shall study its behavior on $L^2(X, m_X)$. By ergodicity, the $G$-fixed functions are the constant functions, so we restrict our attention to their orthogonal complement $L^2_0(X, m_X)$ of $L^2$-functions with mean 0.

Definition 3.1. We say that $\mu$ has a spectral gap on $X$ if the associated convolution operator $\pi(\mu)$ restricted to $L^2_0(X, m_X)$ has spectral radius strictly less than 1.

We are going to use the notation $\rho(T)$ to denote the spectral radius of an operator $T$. Then by the spectral radius formula, $\mu$ having a spectral gap on $X$ can be reformulated as the requirement that
$$\rho\big(\pi(\mu)|_{L^2_0}\big) = \lim_{n \to \infty} \big\|\pi(\mu)^n|_{L^2_0}\big\|_{\mathrm{op}}^{1/n} < 1.$$
Given the existence of a spectral gap, we obtain an almost everywhere convergence result in a quite general setup.

Theorem 3.2. Suppose that $\mu$ has a spectral gap on $X$. Then $m_X$-a.e. $x \in X$ is generic for the random walk on $X$ given by $\mu$, meaning that
$$\mu^{*n} * \delta_x \longrightarrow m_X$$
as $n \to \infty$ in the weak* topology. This convergence is exponentially fast in the sense that for every fixed $f \in L^2(X, m_X)$ we have
$$\limsup_{n \to \infty} \Big| \int_X f \, d(\mu^{*n} * \delta_x) - \int_X f \, dm_X \Big|^{1/n} \leq \rho\big(\pi(\mu)|_{L^2_0}\big) \tag{3.1}$$
for $m_X$-a.e. $x \in X$.
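Proof. By separability of $C_c(X)$, for the statement about weak* convergence it suffices to prove $m_X$-a.s. convergence for each fixed function $f \in C_c(X)$. Consequently, it is enough to prove the second assertion of the theorem. To this end, fix a function $f \in L^2(X, m_X)$ and a rational number $\rho(\pi(\mu)|_{L^2_0}) < \alpha < 1$, and consider the function $f_0 = f - \int_X f \, dm_X \in L^2_0(X, m_X)$. Then in view of the spectral radius formula we have
$$\|\pi(\mu)^n f_0\|_2 \leq \alpha^n \|f_0\|_2$$
for sufficiently large $n \in \mathbb{N}$. Fix in addition a rational number $\varepsilon \in (0, 1)$. By Chebyshev's inequality, the above implies that for large $n$ we have
$$m_X\big(\{x \in X : |\pi(\mu)^n f_0(x)| > \alpha^{(1-\varepsilon)n}\}\big) \leq \alpha^{-2(1-\varepsilon)n} \|\pi(\mu)^n f_0\|_2^2 \leq \alpha^{2\varepsilon n} \|f_0\|_2^2.$$
By Borel-Cantelli it follows that for all $x$ in a full measure set $A_{\alpha,\varepsilon}$, the inequality $|\pi(\mu)^n f_0(x)| > \alpha^{(1-\varepsilon)n}$ holds only for finitely many $n \in \mathbb{N}$. Since $\pi(\mu)^n f(x) = \int_X f \, d(\mu^{*n} * \delta_x)$, and hence $\pi(\mu)^n f_0(x) = \int_X f \, d(\mu^{*n} * \delta_x) - \int_X f \, dm_X$, we conclude that (3.1) holds for all $x$ in the countable intersection of the sets $A_{\alpha,\varepsilon}$ over rational numbers $\alpha$ approaching $\rho(\pi(\mu)|_{L^2_0})$ and $\varepsilon$ approaching 0 from above. □

Remark 3.3. In the second conclusion of Theorem 3.2, how long it takes for the exponential rate of convergence to kick in depends on the point $x$. However, the measure of sets on which one has to wait for a long time can be controlled as follows: let $\alpha$ and $f_0$ be as in the proof above, and let $N \in \mathbb{N}$ be such that $\|\pi(\mu)^n|_{L^2_0}\|_{\mathrm{op}} \leq \alpha^n$ for all $n \geq N$. Then if we additionally take $\varepsilon \in (0, 1)$ and denote
$$B_n = \{x \in X : |\pi(\mu)^k f_0(x)| > \alpha^{(1-\varepsilon)k} \text{ for some } k \geq n\},$$
the argument above shows that
$$m_X(B_n) \leq \sum_{k \geq n} \alpha^{2\varepsilon k} \|f_0\|_2^2 = \frac{\alpha^{2\varepsilon n}}{1 - \alpha^{2\varepsilon}} \|f_0\|_2^2$$
for every $n \geq N$. In particular, the measure of the set on which the exponential convergence does not start during the first $n$ steps decays exponentially in $n$. ⋄

We now demonstrate that the previous result covers the case announced in §1.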
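Before doing so, the mechanism behind Theorem 3.2 can be made concrete in a toy model that satisfies the standing assumptions: the finite group $\mathbb{Z}/m\mathbb{Z}$ acting on itself by translations, with the normalized counting measure as $m_X$. The Python sketch below (the values $m = 12$, the step distribution $\mu$ uniform on $\{-1, 0, 1\}$ and the test function are assumptions chosen for illustration) computes $\rho(\pi(\mu)|_{L^2_0})$ from the characters of $\mathbb{Z}/m\mathbb{Z}$ and checks that the discrepancy decays at least like $\rho^n$. On a finite space the pointwise bound follows directly from $|h(x)| \leq \sqrt{m}\,\|h\|_2$, so no Chebyshev or Borel-Cantelli argument is needed; it is precisely this passage from $L^2$ to pointwise control that requires those tools on a general $X$.

```python
import cmath
import math

def apply_pi(mu, f):
    """(pi(mu) f)(x) = sum_g mu(g) f(g + x) on X = Z/mZ, with f given as a list."""
    m = len(f)
    return [sum(p * f[(g + x) % m] for g, p in mu.items()) for x in range(m)]

def spectral_radius_mean_zero(mu, m):
    """Spectral radius of pi(mu) on mean-zero functions on Z/mZ.

    The characters chi_k(x) = exp(2 pi i k x / m) are eigenfunctions of pi(mu)
    with eigenvalue sum_g mu(g) chi_k(g); k = 0 gives the constants and is skipped.
    """
    return max(abs(sum(p * cmath.exp(2j * math.pi * k * g / m) for g, p in mu.items()))
               for k in range(1, m))

def discrepancies(mu, f, n_max):
    """sup_x |pi(mu)^n f(x) - mean(f)| for n = 1, ..., n_max."""
    mean = sum(f) / len(f)
    h, out = list(f), []
    for _ in range(n_max):
        h = apply_pi(mu, h)
        out.append(max(abs(v - mean) for v in h))
    return out

m = 12
mu = {-1: 1 / 3, 0: 1 / 3, 1: 1 / 3}          # assumed lazy nearest-neighbour step
f = [1.0 if x < 3 else 0.0 for x in range(m)]  # indicator of {0, 1, 2}
rho = spectral_radius_mean_zero(mu, m)
mean = sum(f) / m
norm_f0 = math.sqrt(sum((v - mean) ** 2 for v in f) / m)  # L^2 norm of f_0 (normalized measure)
disc = discrepancies(mu, f, 200)
# Spectral gap bound: |h(x)| <= sqrt(m) ||h||_2 and ||pi(mu)^n f_0||_2 <= rho^n ||f_0||_2.
bounds = [math.sqrt(m) * rho ** n * norm_f0 for n in range(1, 201)]
print(rho, disc[-1], bounds[-1])
```

Here $\pi(\mu)$ is diagonal in the orthogonal basis of characters, so its operator norm on mean-zero functions equals its spectral radius; for general $X$ only the asymptotic statement from the spectral radius formula is available.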
Proposition 3.4. Let $G$ be a connected semisimple real Lie group without compact factors and with finite center, $\Gamma \subset G$ a lattice, and $X$ the homogeneous space $G/\Gamma$ endowed with the Haar measure $m_X$. Suppose that the closed subsemigroup $S$ generated by $\operatorname{supp}(\mu)$ has the property that $\mathrm{Ad}(S)$ is Zariski dense in $\mathrm{Ad}(G)$. Then $\mu$ has a spectral gap on $X$.
Proof. Consider the regular representation of $G$ on $L^2_0(X, m_X)$. By [1, Lemma 3] it does not weakly contain the trivial representation. From this, in view of [25, Theorem C], the result follows if we can argue that the projection of $\mu$ to any simple factor of $G$ is not supported on a closed amenable subgroup. However, since amenability passes to the Zariski closure (see e.g. [28, Theorem 4.1.15]), the latter would imply that one of the simple factors of $\mathrm{Ad}(G)$ is amenable, hence compact by a classical result of Furstenberg (see e.g. [28, Proposition 4.1.8]). □

3.2. Good Height Functions

Inspecting the proof of Theorem 3.2, one observes that every step is effective, with explicit bounds and good control over the measure of exceptional sets, except for the very first one: separability of the space $C_c(X)$ of compactly supported continuous functions. In the remainder of this section, we aim to make this step effective as well, the goal being exponentially fast convergence $\mu^{*n} * \delta_x \to m_X$ from almost every starting point, uniformly over functions $f$ on $X$. As merely continuous functions can behave arbitrarily badly (with respect to the convergence problem at hand), there is no hope of achieving this for all $f \in C_c(X)$. We shall therefore restrict our attention to smooth functions of compact support, and take into account their regularity by considering not just their $L^2$ norms, but also certain Sobolev norms. Built into the definition of these norms will be what we call a good height function, the concept of which is introduced in this subsection.
Our setup is as follows: let $G$ be a real Lie group with Lie algebra $\mathfrak{g}$. We endow $\mathfrak{g}$ with a scalar product, which we use to define a right-invariant metric $d_G$ on $G$. Given a lattice $\Gamma \subset G$, this metric descends to a metric $d_X$ on $X = G/\Gamma$ such that the projection $G \to X$ is locally an isometry. Moreover, we fix an orthonormal basis of $\mathfrak{g}$, using which we will identify $\mathfrak{g}$ with $\mathbb{R}^{\dim \mathfrak{g}}$. Here is the crucial definition.

Definition 3.5. We call a measurable function $\mathrm{ht} \colon X \to (0, \infty)$ a good height function if there exist $0 < R \leq 1$ and a function $r \colon X \to (0, R]$ with the following properties:
(i) The restriction of the exponential map $\exp \colon (-R, R)^{\dim \mathfrak{g}} \to G$ is a diffeomorphism onto its image and we have $\exp((-r/2, r/2)^{\dim \mathfrak{g}}) \subset B^G_r(e)$ for all $r \leq R$, where $B^G_r(e)$ denotes the open ball of radius $r$ around the identity $e \in G$ with respect to the metric $d_G$ on $G$.
(ii) For all $x \in X$, the projection $G \supset B^G_{r(x)}(e) \to X$, $g \mapsto gx$, is injective.
(iii) There exist constants $c, \kappa > 0$ such that $r(x) \geq c\,\mathrm{ht}(x)^{-\kappa}$ for all $x \in X$.
(iv) There exists a constant $\sigma > 1$ such that $\mathrm{ht}(x) \leq \sigma\,\mathrm{ht}(gx)$ for all $x \in X$ and all $g \in B^G_{r(x)}(e)$.

The definition suggests thinking of a good height function as the reciprocal of the injectivity radius. And indeed, this viewpoint allows their construction on any homogeneous space $X = G/\Gamma$.

Proposition 3.6. Let $G$ be a real Lie group and $\Gamma$ a lattice in $G$. Then $X = G/\Gamma$ admits a good height function.
Proof. Choose $0 < R \leq 1$ such that condition (i) of the definition is satisfied and set $r(x) = \min\{R, r_{\mathrm{inj}}(x)\}$, where $r_{\mathrm{inj}}(x)$ is the injectivity radius at $x \in X$, i.e. the maximal radius such that (ii) holds at $x$. Define $\mathrm{ht}(x) = r(x)^{-1}$.
Then (ii) holds by construction and (iii) holds with $c = \kappa = 1$, so the only thing that needs to be verified is the validity of (iv). We claim that it holds with $\sigma = 2$. This will follow (as then $r(gx) \leq 2r(x)$) if we can show that
$$r_{\mathrm{inj}}(gx) \leq 2\, r_{\mathrm{inj}}(x) \tag{3.2}$$
whenever $g \in B^G_{r(x)}(e)$. To this end, let $r > r_{\mathrm{inj}}(x)$. Then by definition, there are distinct $g_1, g_2 \in B^G_r(e)$ such that $g_1 x = g_2 x$. As $g \in B^G_{r(x)}(e)$, right-invariance of the metric implies
$$d_G(g_i g^{-1}, e) = d_G(g_i, g) \leq d_G(g_i, e) + d_G(g, e) < r + r(x) < 2r$$
for $i = 1, 2$, and we also have $(g_1 g^{-1})gx = (g_2 g^{-1})gx$. This shows that $r_{\mathrm{inj}}(gx) \leq 2r$, and as $r > r_{\mathrm{inj}}(x)$ was arbitrary, we see that (3.2) holds. □

Often, however, one might want to work with different, naturally occurring height functions. The flexibility in our definition of a good height function accommodates this possibility.
In the examples below, we denote by $\lambda_1(\Lambda)$ the length of a shortest non-zero vector in a lattice $\Lambda \subset \mathbb{R}^d$.
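For $d = 2$, the quantity $\lambda_1$ can be computed exactly by Lagrange-Gauss reduction. The Python sketch below (the restriction to $d = 2$ and all function names are our own choices for illustration) evaluates $\mathrm{ht} = \lambda_1^{-1}$ on the space of unimodular lattices $g\mathbb{Z}^2$, representing a lattice by the columns of $g$, and tests on random elements $h \in \mathrm{SL}_2(\mathbb{R})$ the inequality $\lambda_1(hx) \leq \|h\|\,\lambda_1(x)$, with $\|\cdot\|$ the operator norm, which drives property (iv) in Example 3.7 below.

```python
import math
import random

def lambda_1(b1, b2):
    """Length of a shortest non-zero vector of Z b1 + Z b2 in R^2 (Lagrange-Gauss reduction)."""
    def dot(u, v):
        return u[0] * v[0] + u[1] * v[1]
    if dot(b1, b1) < dot(b2, b2):
        b1, b2 = b2, b1                                  # keep b2 as the shorter vector
    while True:
        k = round(dot(b1, b2) / dot(b2, b2))
        b1 = (b1[0] - k * b2[0], b1[1] - k * b2[1])      # size-reduce b1 against b2
        if dot(b1, b1) >= dot(b2, b2):
            return math.sqrt(dot(b2, b2))                # reduced: b2 is a shortest vector
        b1, b2 = b2, b1

def height(g):
    """ht(g Z^2) = 1 / lambda_1(g Z^2); the columns of g form a basis of the lattice."""
    (a, b), (c, d) = g
    return 1.0 / lambda_1((a, c), (b, d))

def op_norm(h):
    """Operator norm of a 2x2 matrix: square root of the top eigenvalue of h^T h."""
    (a, b), (c, d) = h
    p, q, r = a * a + c * c, a * b + c * d, b * b + d * d
    return math.sqrt((p + r) / 2 + math.sqrt(((p - r) / 2) ** 2 + q * q))

def matmul(a, b):
    """Product of two 2x2 matrices given as nested tuples."""
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))

t = 3.0
g = ((t, 0.0), (0.0, 1 / t))  # the lattice diag(t, 1/t) Z^2, with lambda_1 = 1/t
random.seed(0)
worst = 0.0
for _ in range(1000):
    a, b, c = random.uniform(0.5, 2.0), random.uniform(-2, 2), random.uniform(-2, 2)
    h = ((a, b), (c, (1 + b * c) / a))  # random element of SL_2(R)
    ratio = (1 / height(matmul(h, g))) / (op_norm(h) / height(g))
    worst = max(worst, ratio)
print(height(g), worst)  # worst ratio stays <= 1 up to rounding
```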
Example 3.7. Let $G = \mathrm{SL}_d(\mathbb{R})$ and $\Gamma = \mathrm{SL}_d(\mathbb{Z})$. Then $X = G/\Gamma$ can be identified with the space of lattices in $\mathbb{R}^d$ with covolume 1 via
$$g\Gamma \mapsto g\mathbb{Z}^d.$$
Then the function $\mathrm{ht} = \lambda_1^{-1}$, defined on $X$ via the above identification, is a good height function. Indeed, one can first choose $R > 0$ such that (i) is satisfied, and then set $r(x) = \min\{R, r_{\mathrm{inj}}(x)\}$ as in the proof of Proposition 3.6. Then (ii) is automatically satisfied, and (iv) is valid for a suitable choice of $\sigma$ due to the inequality $\lambda_1(gx) \leq \|g\| \lambda_1(x)$ for $g \in G$ and $x \in X$, where $\|\cdot\|$ denotes the operator norm. To see that also (iii) holds, let $x = g\Gamma$ and suppose that $hx = x$ for some $h \in G$ with $h \neq e$. Then for all $\gamma \in \mathrm{SL}_d(\mathbb{Z})$, the matrix $(g\gamma)^{-1} h (g\gamma)$ fixes the lattice $\mathbb{Z}^d$ but is not the identity, so that
$$\|h - e\| \geq c_1 \|g\gamma\|^{-\kappa_1}$$
for some constants $c_1, \kappa_1 > 0$. For a basis change $\gamma \in \mathrm{SL}_d(\mathbb{Z})$ such that $g\gamma$ consists of a reduced basis of the lattice $x$ we have $\|g\gamma\| \leq c_2 \lambda_1(x)^{-\kappa_2}$ for some $c_2, \kappa_2 > 0$ (cf. e.g. [26, Chapter III]). With this choice, the above inequality implies $\|h - e\| \geq c\,\lambda_1(x)^{\kappa}$ for $c = c_1 c_2^{-\kappa_1}$ and $\kappa = \kappa_1 \kappa_2$. Since near the identity, the metric $d_G$ on $G$ is Lipschitz-equivalent to the distance induced by $\|\cdot\|$, this establishes (iii). ⋄

A similar construction is possible in a more general context.

Example 3.8 ([13]). Let $G = \mathbf{G}(\mathbb{R})$ be the group of real points of a semisimple $\mathbb{Q}$-group $\mathbf{G}$ and $\Gamma$ an arithmetic lattice in $G$. Choose a rational $\mathrm{Ad}(\Gamma)$-stable lattice $\mathfrak{g}_{\mathbb{Z}} \subset \mathfrak{g}$. Then, using similar reasoning as in the previous example, the function $\mathrm{ht}$ on $X = G/\Gamma$ defined by $\mathrm{ht}(x) = \lambda_1(\mathrm{Ad}(g)\mathfrak{g}_{\mathbb{Z}})^{-1}$ for $x = g\Gamma \in X$ is seen to be a good height function (cf. [13, §3.6]). ⋄

3.3. Sobolev Norms

Given a good height function $\mathrm{ht}$ on $X$, the associated Sobolev norm of degree $\ell \geq 0$ of a compactly supported smooth function $f \in C_c^\infty(X)$ is defined by
$$S_\ell(f)^2 = \sum_D \big\| \mathrm{ht}^\ell \, Df \big\|_{L^2(X, m_X)}^2,$$
where the sum runs over differential operators $D$ given by monomials of degree at most $\ell$ in elements of the fixed orthonormal basis of $\mathfrak{g}$ in the universal enveloping algebra.
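In other words, the differential operators $D$ appearing above are the compositions $D = \partial_{Z_1} \circ \cdots \circ \partial_{Z_k}$ with $0 \leq k \leq \ell$ and $Z_1, \dots, Z_k$ in the fixed orthonormal basis of $\mathfrak{g}$, where $\partial_Z f(x) = \frac{d}{dt}\big|_{t=0} f(\exp(tZ)x)$ denotes the derivative in direction $Z$. Here are two immediate observations.

Lemma 3.9. Let $\mathrm{ht}$ be a good height function on $X$ and $S_\ell$ the associated Sobolev norms.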
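(i) The norms $S_\ell$ are induced by inner products
$$\langle f_1, f_2 \rangle_\ell = \sum_D \int_X \mathrm{ht}^{2\ell}\, Df_1\, \overline{Df_2} \, dm_X.$$
(ii) For all integers $0 \leq \ell \leq \ell'$ there is a constant $C > 0$ such that $S_\ell(f) \leq C\, S_{\ell'}(f)$ for all $f \in C_c^\infty(X)$.

Proof. Part (i) is clear. Part (ii) is also immediate from the definition of the Sobolev norms, once we know that a good height function must be bounded away from 0. The latter, however, follows directly from property (iii) in the definition of a good height function, as the function $r$ appearing there is assumed to be bounded. □

The proof of our convergence result in §3.4 will depend on the following proposition.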

Proposition 3.10 ([13]). For the Sobolev norms associated to a good height function on $X$, there exist a non-negative integer $\ell_0 \geq 0$ and a constant $C > 0$ with the following properties:
(i) (Sobolev embedding estimate, [13, (3.9)]) $\|f\|_\infty \leq C\, S_{\ell_0}(f)$ for all $f \in C_c^\infty(X)$.
(ii) (Finite relative traces, [13, (3.10)]) For all integers $\ell \geq 0$ the relative trace $\operatorname{Tr}(S_\ell^2 | S_{\ell+\ell_0}^2)$ is finite, meaning that for any orthogonal basis $(e^{(k)})_k$ in the completion of $C_c^\infty(X)$ with respect to $S_{\ell+\ell_0}$ we have
$$\operatorname{Tr}(S_\ell^2 | S_{\ell+\ell_0}^2) = \sum_k \frac{S_\ell(e^{(k)})^2}{S_{\ell+\ell_0}(e^{(k)})^2} < \infty.$$

We refer to Bernstein-Reznikov [10] for a systematic treatment of relative traces. In particular, it is proved in this reference that the above expression is independent of the choice of orthogonal basis.
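In finite dimensions, the relative trace is an ordinary trace, which makes its basis independence transparent: if Gram matrices $A$ and $B$ represent $\langle \cdot, \cdot \rangle_{\ell+\ell_0}$ and $\langle \cdot, \cdot \rangle_\ell$, then for any $A$-orthogonal basis $(e_k)_k$ one has $\sum_k B(e_k, e_k)/A(e_k, e_k) = \operatorname{tr}(A^{-1}B)$, the trace of the operator $T$ with $B(v, w) = A(Tv, w)$. The Python sketch below checks this with exact rational arithmetic for two arbitrarily chosen positive definite $3 \times 3$ matrices (a toy stand-in, not actual Sobolev norms) and two different orthogonal bases produced by Gram-Schmidt.

```python
from fractions import Fraction

def form(M, u, v):
    """The bilinear form u^T M v."""
    n = len(M)
    return sum(u[i] * M[i][j] * v[j] for i in range(n) for j in range(n))

def orthogonalize(A, vectors):
    """Gram-Schmidt: turn a basis into an A-orthogonal basis (exact arithmetic)."""
    basis = []
    for v in vectors:
        w = [Fraction(x) for x in v]
        for e in basis:
            c = form(A, w, e) / form(A, e, e)
            w = [wi - c * ei for wi, ei in zip(w, e)]
        basis.append(w)
    return basis

def relative_trace(B, A, basis):
    """Tr(B | A) = sum_k B(e_k, e_k) / A(e_k, e_k) over an A-orthogonal basis."""
    return sum(form(B, e, e) / form(A, e, e) for e in basis)

def trace_of_T(A, B):
    """trace(A^{-1} B), the trace of T with B(v, w) = A(T v, w), via Gauss-Jordan."""
    n = len(A)
    M = [[Fraction(x) for x in A[i]] + [Fraction(x) for x in B[i]] for i in range(n)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return sum(M[i][n + i] for i in range(n))  # M is now [I | A^{-1} B]

A = [[4, 1, 0], [1, 3, 1], [0, 1, 2]]  # toy stand-in for <.,.>_{l + l_0}
B = [[2, 0, 1], [0, 1, 0], [1, 0, 2]]  # toy stand-in for <.,.>_l
basis_1 = orthogonalize(A, [[1, 0, 0], [0, 1, 0], [0, 0, 1]])
basis_2 = orthogonalize(A, [[1, 1, 0], [0, 1, 1], [1, 0, 1]])
print(relative_trace(B, A, basis_1), relative_trace(B, A, basis_2), trace_of_T(A, B))
```

The same operator $T$ reappears in the proof of Theorem 3.13, where finiteness of the relative trace is translated into $T$ being trace class.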
The proofs in [13] of the statements in the above proposition are given for the height function from Example 3.8. However, the only properties used are those in our definition of a good height function. In fact, the arguments only depend on the validity of the second statement in [13, Lemma 5.1], which holds in our context, as we demonstrate below.

Lemma 3.11. Let $\mathrm{ht}$ be a good height function on $X$. Then there exist a non-negative integer $\ell_0 \geq 0$ and a constant $C > 0$ such that for every non-negative integer $\ell \geq 0$ and every differential operator $D$ given by a monomial of degree at most $\ell$ in elements of the fixed basis of $\mathfrak{g}$ we have
$$|Df(x)| \leq C\, S_{\ell+\ell_0}(f)$$
for every $f \in C_c^\infty(X)$ and $x \in X$.

Proof. We inspect the function $F = Df$ in a chart around $x$ given by the exponential map: we set $\varepsilon = r(x)/2$, where $r \colon X \to (0, R]$ is the function from the definition of a good height function, $d = \dim \mathfrak{g}$, and consider
$$F \colon (-\varepsilon, \varepsilon)^d \to \mathbb{C}, \qquad F(v) = Df(\exp(v)x).$$
Then by the first statement of [13, Lemma 5.1], which is simply a Sobolev embedding estimate on $\mathbb{R}^d$, we know
$$|F(0)| \leq C_1 \varepsilon^{-d}\, S_{d,\varepsilon}(F), \tag{3.3}$$
where $C_1 > 0$ is a constant depending only on the dimension $d$ of $\mathfrak{g}$ and $S_{d,\varepsilon}$ is the standard degree $d$ Sobolev norm on the open subset $(-\varepsilon, \varepsilon)^d$,
$$S_{d,\varepsilon}(F)^2 = \sum_{|\alpha| \leq d} \int_{(-\varepsilon,\varepsilon)^d} |\partial^\alpha F(v)|^2 \, dv,$$
where the sum is over all multi-indices $\alpha$ of degree at most $d$ and $\partial^\alpha F$ is the corresponding standard partial derivative of $F$. Using property (iii) in the definition of a good height function, (3.3) implies that
$$|Df(x)| = |F(0)| \leq C_2\, \mathrm{ht}(x)^{\ell_0}\, S_{d,\varepsilon}(F), \tag{3.4}$$
where $C_2 > 0$ is another constant and we used that $\mathrm{ht}$ is bounded away from 0 to replace $\kappa d$ appearing in the exponent by $\ell_0 = \max\{\lceil \kappa d \rceil, d\}$. Using properties (i) and (ii) in the definition of a good height function, we find $C_3 > 0$ such that
$$S_{d,\varepsilon}(F)^2 \leq C_3 \sum_{D'} \int_{B^G_{r(x)}(e)x} |D'Df|^2 \, dm_X, \tag{3.5}$$
where the sum runs over the monomials $D'$ of degree at most $d$ in elements of the fixed basis of $\mathfrak{g}$. To see this, one needs to note two things: firstly, that by the chain rule the partial derivatives of $F$ at a point $v \in (-\varepsilon, \varepsilon)^d$ in the chart can be expressed as linear combinations of the derivatives $D'Df$ appearing on the right-hand side in (3.5) evaluated at the corresponding point $x' = \exp(v)x$, with fixed coefficient functions depending only on finitely many derivatives of the exponential map on $(-\varepsilon, \varepsilon)^d$; and secondly, that the Haar measure $m_X$ is a smooth measure, meaning that it has a smooth and nowhere vanishing density w.r.t. Lebesgue measure in the chart. Combining (3.4), (3.5), condition (iv) in the definition of a good height function, and plugging back in the definition of $F$, we finally arrive at
$$|Df(x)| \leq C_4\, S_{\ell+\ell_0}(f)$$
for yet another constant $C_4 > 0$, which is the one appearing in the lemma. □

3.4. Exponentially Generic Points

Now we are ready to define the notion of effective genericity we wish to establish, and to prove the main convergence result of this section.
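Until the end of this section, we fix a good height function $\mathrm{ht}$ on $X$. Moreover, given a bounded measurable function $f$ on $X$ and $n \in \mathbb{N}$ we will use the notation
$$D_n(f)(x) = \Big| \int_X f \, d(\mu^{*n} * \delta_x) - \int_X f \, dm_X \Big|$$
for $x \in X$. We refer to $D_n(f)$ as the time $n$ discrepancy for the function $f$.

Definition 3.12. We say that a point $x \in X$ is $(\ell, \beta)$-exponentially generic if $\ell \geq 0$ is a non-negative integer and $\beta$ a real number in $(0, 1)$ satisfying
$$\limsup_{n \to \infty} \Big( \sup_{f} \frac{D_n(f)(x)}{S_\ell(f)} \Big)^{1/n} \leq \beta,$$
where the supremum is over all non-zero $f \in C_c^\infty(X)$ and $S_\ell$ is the degree $\ell$ Sobolev norm associated to $\mathrm{ht}$.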
With this terminology, we have the following result, which quantifies the dependence on the function $f$ in the effective part of Theorem 3.2.

Theorem 3.13. Let $G$ be a real Lie group, $\Gamma \subset G$ a lattice and $X = G/\Gamma$ endowed with the Haar measure $m_X$. Suppose that $\mu$ has a spectral gap on $X$. Then there exists a non-negative integer $\ell_1$ such that $m_X$-a.e. $x \in X$ is $\big(\ell_1, \rho(\pi(\mu)|_{L^2_0})\big)$-exponentially generic.
Our argument uses ideas from the proof of [13, Proposition 9.2]. Recall that $\langle \cdot, \cdot \rangle_\ell$ denotes the inner product associated to the Sobolev norm $S_\ell$.
Proof. Set $\ell_1 = 2\ell_0$ with $\ell_0$ from Proposition 3.10. We denote by $H$ the completion of $C_c^\infty(X)$ with respect to $S_{\ell_1}$. The first step of the proof is to argue that $H$ admits an orthonormal basis $(e^{(k)})_k$ with respect to $S_{\ell_1}$ that is also orthogonal with respect to $S_{\ell_0}$. To this end, let us endow $H$ with the scalar product $\langle\cdot,\cdot\rangle_{\ell_1}$ associated to $S_{\ell_1}$. This makes $H$ into a Hilbert space. As a consequence of Lemma 3.9(ii), $\langle\cdot,\cdot\rangle_{\ell_0}$ defines a bounded positive definite Hermitian form on $(H, \langle\cdot,\cdot\rangle_{\ell_1})$. By the Riesz representation theorem, there is a bounded positive self-adjoint operator $T$ on $(H, \langle\cdot,\cdot\rangle_{\ell_1})$ such that $\langle v, w\rangle_{\ell_0} = \langle Tv, w\rangle_{\ell_1}$ for all $v, w \in H$. Finiteness of the relative trace $\operatorname{Tr}(S_{\ell_0}^2 | S_{\ell_1}^2)$ from Proposition 3.10(ii) then translates into the statement that $T$ is a trace-class operator on $(H, \langle\cdot,\cdot\rangle_{\ell_1})$ (cf. [14, Proposition 6.44]); in particular, the operator $T$ is compact (cf. [14, Proposition 6.42]). By the spectral theorem for compact self-adjoint operators, $T$ is thus diagonalizable. Hence, an orthonormal basis $(e^{(k)})_k$ of $(H, \langle\cdot,\cdot\rangle_{\ell_1})$ consisting of eigenvectors of $T$ is a basis with the desired properties: if $Te^{(k)} = \lambda_k e^{(k)}$, then $\langle e^{(k)}, e^{(j)}\rangle_{\ell_0} = \lambda_k\langle e^{(k)}, e^{(j)}\rangle_{\ell_1} = 0$ for $k \neq j$.
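In finite dimensions the construction in this step reduces to a generalized symmetric eigenvalue problem. The sketch below is an illustration with hypothetical Gram matrices `A` and `B` standing in for $\langle\cdot,\cdot\rangle_{\ell_1}$ and $\langle\cdot,\cdot\rangle_{\ell_0}$: it forms the matrix of $T = A^{-1}B$ in an $A$-orthonormal frame via a Cholesky factor, diagonalizes that symmetric matrix by cyclic Jacobi rotations, and returns a basis that is $A$-orthonormal and $B$-orthogonal. The sum of the eigenvalues is the finite-dimensional counterpart of the relative trace. Only the standard library is used; in practice one would rather call a routine such as `scipy.linalg.eigh(B, A)`.

```python
import math


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def transpose(x):
    return [list(row) for row in zip(*x)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def cholesky(a):
    """Lower-triangular L with a = L L^T, for a symmetric positive definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def lower_inverse(low):
    """Invert a lower-triangular matrix column by column (forward substitution)."""
    n = len(low)
    inv = [[0.0] * n for _ in range(n)]
    for col in range(n):
        for i in range(n):
            rhs = (1.0 if i == col else 0.0) - sum(low[i][k] * inv[k][col] for k in range(i))
            inv[i][col] = rhs / low[i][i]
    return inv


def jacobi_eigh(c, sweeps=30):
    """Cyclic Jacobi rotations: returns (eigenvalues, V) with V orthogonal, V^T c V diagonal."""
    n = len(c)
    a, v = [row[:] for row in c], identity(n)
    for _ in range(sweeps):
        for p in range(n):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-14:
                    continue
                # Angle chosen so that the (p, q) entry of J^T a J vanishes.
                theta = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])
                cs, sn = math.cos(theta), math.sin(theta)
                j = identity(n)
                j[p][p], j[q][p], j[p][q], j[q][q] = cs, -sn, sn, cs
                a = matmul(transpose(j), matmul(a, j))
                v = matmul(v, j)
    return [a[i][i] for i in range(n)], v


def simultaneous_basis(gram_1, gram_0):
    """Basis (as columns) orthonormal for gram_1 and orthogonal for gram_0, together
    with the eigenvalues of T = gram_1^{-1} gram_0."""
    l_inv = lower_inverse(cholesky(gram_1))
    c = matmul(l_inv, matmul(gram_0, transpose(l_inv)))  # matrix of T in a gram_1-orthonormal frame
    eigenvalues, q = jacobi_eigh(c)
    return eigenvalues, matmul(transpose(l_inv), q)      # e_k = L^{-T} q_k


A = [[4.0, 1.0, 0.0], [1.0, 3.0, 0.5], [0.0, 0.5, 2.0]]  # hypothetical Gram matrix for <.,.>_{l_1}
B = [[2.0, 0.3, 0.1], [0.3, 1.0, 0.2], [0.1, 0.2, 1.5]]  # hypothetical Gram matrix for <.,.>_{l_0}
lams, E = simultaneous_basis(A, B)
relative_trace = sum(lams)  # finite-dimensional analogue of Tr(S_{l_0}^2 | S_{l_1}^2)
```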
Next, fix rational numbers $\alpha$ and $\varepsilon$ with $\rho_{\pi(\mu)|_{L^2_0}} < \alpha < 1$ and $\varepsilon \in (0, 1)$. As in the proof of Theorem 3.2, using Chebyshev's inequality we find that for every $k \geq 0$ and large enough $n$ we have the estimate (3.6) for the centred functions $e^{(k)} - \int_X e^{(k)}\,dm_X$. Since the relative trace $\operatorname{Tr}(S_0^2 | S_{\ell_0}^2)$ is finite by Proposition 3.10, the terms on the right-hand side of (3.6) are summable over $k, n \geq 0$. The Borel–Cantelli lemma thus implies that the limit superior of the corresponding events is a null set. Let $A_{\alpha,\varepsilon}$ be the complement of this null set. We claim that any $x \in A_{\alpha,\varepsilon}$ is $(\ell_1, \alpha^{1-\varepsilon})$-exponentially generic. Fix such a point $x$. Then we know that there are only finitely many pairs $(k, n)$ for which $x$ lies in the corresponding event. Thus, there exists $n_0$ such that for $n \geq n_0$ the inequality for the expansion of $f$ in terms of the orthonormal basis $(e^{(k)})_k$. Then, using the triangle inequality, we can estimate the time $n$ discrepancy for $f$ as follows: The exchange of integral and summation involved in the above estimate is justified by part (i) of Proposition 3.10: it ensures that the functions $e^{(k)}$ are defined pointwise and that the series expansion of $f$ converges uniformly. Next, for $n \geq n_0$ an application of the Cauchy–Schwarz inequality implies that the right-hand side of (3.7) is strictly less than Again by Proposition 3.10, the relative trace $\operatorname{Tr}(S_{\ell_0}^2 | S_{\ell_1}^2)$ is finite. Hence, in view of our definition of exponential genericity and the fact that $n_0$ does not depend on $f$, combining (3.7) and (3.8) establishes the claim. It follows that all $x$ in the countable intersection of the sets $A_{\alpha,\varepsilon}$ over rational numbers $\alpha$ decreasing to $\rho_{\pi(\mu)|_{L^2_0}}$ and $\varepsilon$ decreasing to $0$ are $(\ell_1, \rho_{\pi(\mu)|_{L^2_0}})$-exponentially generic, giving the theorem. □

Remark 3.14. In analogy to Remark 3.3, we can control the measure of the set of points where exponentially generic behavior is not observed for a given number of steps: If we define, as the proof of Theorem 3.13 demonstrates. Thus, again, the measure of the set of “bad points”, on which exponential genericity takes more than $n$ steps to manifest, is itself exponentially small in $n$. ⋄
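The Chebyshev and Borel–Cantelli step can be checked by hand in a finite toy model (an assumption, not the paper's setting): the lazy symmetric walk on $\mathbb{Z}/7\mathbb{Z}$ with the uniform measure in the role of $m_X$. With a threshold $t_n = \alpha^{(1-\varepsilon)n}$, chosen here for illustration, Chebyshev's inequality bounds the measure of the set of bad points where $|\pi(\mu)^n f_0| > t_n$ by $\|\pi(\mu)^n f_0\|_{L^2}^2/t_n^2 \leq (\rho/\alpha^{1-\varepsilon})^{2n}\|f_0\|_{L^2}^2$, which is summable in $n$ as soon as $\rho < \alpha^{1-\varepsilon}$. The names `bad_measure`, `chebyshev`, and `geometric` are hypothetical.

```python
import math

# Toy model (an assumption): lazy symmetric walk on Z/NZ, uniform measure as m_X.
N = 7
STEPS = {-1: 1 / 3, 0: 1 / 3, 1: 1 / 3}
rho = max(abs((1 + 2 * math.cos(2 * math.pi * k / N)) / 3) for k in range(1, N))
alpha, eps = 0.8, 0.1          # illustrative choice with rho < alpha ** (1 - eps) < 1


def step(g):
    """Apply pi(mu) to a function g on Z/NZ."""
    return [sum(p * g[(x + s) % N] for s, p in STEPS.items()) for x in range(N)]


def l2_sq(g):
    """Squared L^2 norm with respect to the uniform probability measure."""
    return sum(v * v for v in g) / N


f = [math.cos(2 * math.pi * x / N) + (x % 2) for x in range(N)]
mean = sum(f) / N
f0 = [v - mean for v in f]     # centred function
g = list(f0)                   # holds pi(mu)^n f0 inside the loop
bad_measure, chebyshev, geometric = [], [], []
for n in range(60):
    t = alpha ** ((1 - eps) * n)                                       # threshold at time n
    bad_measure.append(sum(1 for v in g if abs(v) > t) / N)            # m{x : |pi^n f0(x)| > t}
    chebyshev.append(l2_sq(g) / t ** 2)                                # Chebyshev's inequality
    geometric.append((rho / alpha ** (1 - eps)) ** (2 * n) * l2_sq(f0))  # spectral-gap bound
    g = step(g)

total_bound = sum(geometric)   # finite: this summability is what Borel-Cantelli needs
```

In this toy case the bad sets are eventually empty, so the exponentially small bound is far from sharp; the Chebyshev estimate is used because it requires only $L^2$ information, which is exactly what a spectral gap provides.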

Uniform Cesàro Convergence
In this last section, we explore the situation where the only possible limit in Theorem 1.1 is the normalized Haar measure $m_X$. In this setting, by analogy with the case of unique ergodicity in classical ergodic theory, it is reasonable to expect the Cesàro convergence (1.2) to hold (locally) uniformly in the starting point $x_0$. We shall prove in §4.1 below that this indeed holds true. In §4.2, we conclude the article by showing that in many naturally occurring situations something even stronger than locally uniform convergence can be achieved.
Before continuing with the pertinent definitions, let us recall that even though the setup of Theorem 1.1 is our motivation and useful to have in mind, formally we are working with the assumptions stated at the end of §1: (X, m X ) is merely required to be a space with a G-action for which m X is invariant and ergodic.Definition 4.1.A probability measure ν on X is called µ-stationary if µ * ν = ν.The random walk on X induced by µ is called uniquely ergodic if m X is the unique µ-stationary probability measure on X.
In particular, for a random walk to be uniquely ergodic, there must be no finite $G$-orbits in $X$, where $G$ denotes the closed subgroup of $G$ generated by $\mu$. In the case that $X = G/\Gamma$ for a lattice $\Gamma$ in $G$, this happens if and only if $G$ is not virtually contained in a conjugate of $\Gamma$. (Recall that a subgroup $H$ of $G$ is said to be virtually contained in a subgroup $L$ of $G$ if $H \cap L$ has finite index in $H$.) In fact, in many cases of interest, finite orbits are the only obstruction to unique ergodicity: for example, this is true when $G$ is a connected semisimple Lie group without compact factors, $\Gamma$ is an irreducible lattice, $X = G/\Gamma$, and $\mathrm{Ad}(S)$ is Zariski dense in $\mathrm{Ad}(G)$ (see [8, Corollary 1.8]); it is also true in the setting of [27], a special case of which is reproduced below as Example 4.8.

4.1. Locally Uniform Convergence. The notion of unique ergodicity introduced above coincides with the classical property of unique ergodicity of the Markov operator $\pi(\mu)$. When the space $X$ is compact, this is enough to guarantee that the Cesàro convergence (1.2) holds uniformly in the starting point (see [16, §5.1]). Without compactness, we also need to assume a form of recurrence.

Definition 4.2. We say that the random walk on $X$ given by $\mu$ is locally uniformly recurrent if for every compact subset $K \subset X$ and $\varepsilon > 0$ there exist $n_0 \in \mathbb{N}$ and a compact subset $M \subset X$ with $\mu^{*n}*\delta_x(M) \geq 1 - \varepsilon$ for all $n \geq n_0$ and $x \in K$. It is called locally uniformly recurrent on average if the above holds with the Cesàro averages $\frac{1}{n}\sum_{k=0}^{n-1}\mu^{*k}*\delta_x$ in place of $\mu^{*n}*\delta_x$. It is a simple exercise to check that locally uniform recurrence implies locally uniform recurrence on average. In concrete examples, recurrence properties such as these are typically established by constructing a Lyapunov function; see §4.2 below.
The following well-known fact explains why these properties are referred to as “non-escape of mass”.

Lemma 4.3. Let the sequence $(x_n)_n$ of points in $X$ be relatively compact and suppose that the random walk on $X$ is locally uniformly recurrent (resp. locally uniformly recurrent on average). Then every weak* limit of the sequence $(\mu^{*n}*\delta_{x_n})_n$ (resp. of the sequence $(\frac{1}{n}\sum_{k=0}^{n-1}\mu^{*k}*\delta_{x_n})_n$) is a probability measure. □

The proof is immediate and left to the reader. We are now ready to state and prove our first result on locally uniform Cesàro convergence.
Theorem 4.4.Suppose that the random walk on X induced by µ is uniquely ergodic and locally uniformly recurrent on average.Then for every f ∈ C c (X), every compact K ⊂ X, and every ε > 0, there exists n 0 ∈ N such that for every n ≥ n 0 and x ∈ K we have Equivalently, considering the space of probability measures on X as endowed with the weak* topology, the sequence of functions converges to m X uniformly on compact subsets of X as n → ∞.
Proof.The equivalence of the two formulations is due to the definition of neighborhoods in the weak* topology by finitely many test functions in C c (X).
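To prove the statement for individual functions, we proceed by contradiction. If the conclusion is false, then for some $f \in C_c(X)$, some compact $K \subset X$, and some $\varepsilon > 0$ there exist indices $n(j) \to \infty$ and points $x_j \in K$ such that
$$\left|\frac{1}{n(j)}\sum_{k=0}^{n(j)-1}\int_X f\,d(\mu^{*k}*\delta_{x_j}) - \int_X f\,dm_X\right| \geq \varepsilon$$
for all $j \in \mathbb{N}$. Let $\nu$ be a weak* limit point of the sequence $\bigl(\frac{1}{n(j)}\sum_{k=0}^{n(j)-1}\mu^{*k}*\delta_{x_j}\bigr)_j$. Then $\nu$ is $\mu$-stationary, and it is a probability measure because of our recurrence assumption and the fact that all $x_j$ lie in the fixed compact set $K$ (Lemma 4.3). But by unique ergodicity this forces $\nu = m_X$; since $f \in C_c(X)$, passing to the corresponding subsequence in the above inequality yields $|\int_X f\,d\nu - \int_X f\,dm_X| \geq \varepsilon$, a contradiction. □

Here, $\pi(\mu)$ is the convolution operator associated to $\mu$ introduced in §3. The inequality in the second condition of Definition 4.5 is referred to as the contraction property of $V$.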
Allowing Lyapunov functions to take the value $\infty$ is conceptually important for the proofs of results such as Theorem 1.1, in order to show that the random walk does not accumulate near a lower-dimensional homogeneous subspace. Also, allowing Lyapunov functions that are not continuous is crucial in recent constructions given in the literature [6, 19]. For the purposes of the discussion in this section, however, it is no serious restriction to keep in mind the case of a continuous Lyapunov function that is finite on all of $X$.

Remark 4.6. Let us collect some immediate observations about Lyapunov functions.
(i) If $V$ is a Lyapunov function, then so are $cV$ and $V + c$ for any constant $c > 0$. In particular, one may impose an arbitrary positive lower bound on $V$, so that it is no restriction to assume that a Lyapunov function takes values $\geq 1$, say.
(ii) Given a Lyapunov function $V'\colon X \to [0, \infty]$ for the $n_0$-step random walk (induced by the convolution power $\mu^{*n_0}$), one can construct a Lyapunov function $V$ for the random walk given by $\mu$ itself by setting
(iii) By enlarging $\alpha$ and using properness, the contraction property in the definition of a Lyapunov function $V$ may be replaced by

Example 4.7 ([15]). Identify $X = \mathrm{SL}_2(\mathbb{R})/\mathrm{SL}_2(\mathbb{Z})$ with the space of unimodular lattices in $\mathbb{R}^2$ as in Example 3.7 and recall that we denote by $\lambda_1(x)$ the length of a shortest non-zero vector in $x \in X$. Then for every compactly supported probability measure $\mu$ on $G$ whose support generates a Zariski dense subgroup there exist $\varepsilon, \delta > 0$ such that $V' = 1 + \varepsilon\lambda_1^{-\delta}$ is a finite continuous Lyapunov function for the $n_0$-step random walk on $X$ induced by $\mu^{*n_0}$ for some $n_0 \in \mathbb{N}$. This construction can be generalized to higher dimensions by taking into account the higher successive minima $\lambda_2, \dots, \lambda_d$ of lattices in $\mathbb{R}^d$. A more advanced construction also ensures the existence of Lyapunov functions for Zariski dense probability measures with finite exponential moments when $G = \mathbf{G}(\mathbb{R})$ is the group of real points of a Zariski connected semisimple algebraic group $\mathbf{G}$ defined over $\mathbb{R}$ such that $G$ has no compact factors. ⋄

Then for any choice of $p_0, \dots, p_m > 0$ with $\sum_{i=0}^m p_i = 1$, the measure $\mu = \sum_{i=0}^m p_i\delta_{g_i}$ defines a uniquely ergodic random walk on $X$ admitting a finite continuous Lyapunov function. ⋄

It is well known that the existence of a Lyapunov function implies recurrence properties of the random walk.

Lemma 4.9 ([15, Lemma 3.1]). Suppose the random walk on $X$ given by $\mu$ admits a finite continuous Lyapunov function $V$. Then this random walk is locally uniformly recurrent.
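For Example 4.7, the quantity $\lambda_1(x)$ of a lattice $x = g\mathbb{Z}^2$ can be computed exactly by Lagrange–Gauss reduction. The sketch below evaluates $\lambda_1$ and the function $V' = 1 + \varepsilon\lambda_1^{-\delta}$; the default values of `eps` and `delta` are hypothetical placeholders, since the example only asserts that suitable ones exist, and the sketch does not attempt to verify the contraction property, which would require averaging over $\mu^{*n_0}$.

```python
import math


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]


def gauss_reduce(b1, b2):
    """Lagrange-Gauss reduction of a basis (b1, b2) of a lattice in R^2.
    The first vector of the returned basis is a shortest non-zero lattice vector."""
    if dot(b1, b1) > dot(b2, b2):
        b1, b2 = b2, b1
    while True:
        m = round(dot(b1, b2) / dot(b1, b1))        # nearest-integer projection coefficient
        b2 = (b2[0] - m * b1[0], b2[1] - m * b1[1])
        if dot(b2, b2) >= dot(b1, b1):
            return b1, b2
        b1, b2 = b2, b1


def lambda_1(g):
    """Length of a shortest non-zero vector of the lattice g Z^2, g = ((a, b), (c, d))."""
    (a, b), (c, d) = g
    shortest, _ = gauss_reduce((a, c), (b, d))       # the columns of g span g Z^2
    return math.hypot(*shortest)


def v_prime(g, eps=0.5, delta=0.5):
    """Candidate V'(x) = 1 + eps * lambda_1(x)^(-delta); eps and delta are placeholders."""
    return 1 + eps * lambda_1(g) ** (-delta)
```

Properness of $V'$ corresponds to Mahler's compactness criterion: a set of unimodular lattices is relatively compact if and only if $\lambda_1$ is bounded away from $0$ on it, that is, if and only if $V'$ is bounded on it.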
The intuitive reason for this behavior is simple: the contraction property means that after a step of the random walk, the value of the Lyapunov function $V$ on average gets smaller by a constant factor, at least when starting outside some compact set $K$ (cf. Remark 4.6(iii) above), which one can think of as the “center” of the space. The set $K$ can be chosen as (the closure of) a sublevel set of $V$. By the contraction property, the number of steps required to reach it is uniform over starting points $x$ in any given sublevel set of $V$, or in any given compact subset of $X$ in the case that $V$ is finite and continuous. This suggests that we might even let the starting points diverge, as long as this divergence is outcompeted by the geometric rate of contraction of $V$. We are led to the following notion of recurrence.

Definition 4.10. Let $(K_n)_n$ be a sequence of subsets of $X$. We say that the random walk on $X$ given by $\mu$ is $(K_n)_n$-uniformly recurrent if for every $\varepsilon > 0$ there exist $n_0 \in \mathbb{N}$ and a compact subset $M \subset X$ with $\mu^{*n}*\delta_x(M) \geq 1 - \varepsilon$ for all $n \geq n_0$ and $x \in K_n$. It is called $(K_n)_n$-uniformly recurrent on average if the above holds with the Cesàro averages $\frac{1}{n}\sum_{k=0}^{n-1}\mu^{*k}*\delta_x$ in place of $\mu^{*n}*\delta_x$.

Remark 4.11. We point out that, contrary to the locally uniform situation, for the two versions of this property (with/without average) it is generally not clear whether one implies the other. ⋄

We are now going to establish such recurrence properties for certain families $(K_n)_n$ of sublevel sets of Lyapunov functions, which can be chosen to be increasing and to exhaust the part of $X$ where the Lyapunov function is finite. Recall that the Lyapunov exponent of a function $\varphi\colon \mathbb{N} \to [1, \infty)$ is the exponential growth rate $\lambda(\varphi) = \limsup_{n\to\infty}\frac{1}{n}\log\varphi(n)$. If $\lambda(\varphi) = 0$, we say that $\varphi$ has sub-exponential growth.

Proposition 4.12. Let $\varphi\colon \mathbb{N} \to [1, \infty)$ be a function. Suppose that the random walk on $X$ induced by $\mu$ admits a Lyapunov function $V$ with contraction factor $\alpha < 1$ and set $K_n = V^{-1}([0, \varphi(n)])$.
(i) If $\varphi$ has Lyapunov exponent $\lambda(\varphi) < \log(\alpha^{-1})$, then the random walk on $X$ given by $\mu$ is $(K_n)_n$-uniformly recurrent. The number $n_0$ in the definition can be chosen independently of $\varepsilon$.
(ii) If $\varphi$ has sub-exponential growth, then the random walk on $X$ given by $\mu$ is $(K_n)_n$-uniformly recurrent on average.
The proof is a refinement of the methods in [6,15].
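Proof. Let $\alpha, \beta$ be the constants from the contraction property of $V$ and define $B = \beta/(1-\alpha)$. We are going to use the same set $M$ for both parts of the proposition, namely the closure $M$ of $V^{-1}([0, 2B/\varepsilon])$, which is compact since $V$ is proper. Then for $n \in \mathbb{N}$ and $x \in K_n$ we find, by repeatedly using the contraction property of $V$ together with Markov's inequality,
$$\mu^{*n}*\delta_x(M^c) \leq \frac{\varepsilon}{2B}\int_X V\,d(\mu^{*n}*\delta_x) \leq \frac{\varepsilon}{2B}\bigl(\alpha^n V(x) + B\bigr) \leq \frac{\varepsilon}{2B}\bigl(\alpha^n\varphi(n) + B\bigr).$$
When the exponential growth rate of $\varphi$ is less than $\log(\alpha^{-1})$, we have $\alpha^n\varphi(n) \to 0$, so for some $n_0 \in \mathbb{N}$, which does not depend on $\varepsilon$, we have $\alpha^n\varphi(n) \leq B$ and hence $\mu^{*n}*\delta_x(M^c) \leq \varepsilon$ for all $n \geq n_0$ and $x \in K_n$. This proves (i).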
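The mechanism of the proof of (i) can be checked exactly on a toy Markov chain (our own illustration, not a random walk on a homogeneous space): the walk on $\{0, 1, 2, \dots\}$ that moves up with probability $\frac14$ and down with probability $\frac34$, staying at $0$ instead of moving below it. For $V(x) = 2^x$ one computes $\pi(\mu)V \leq \frac78 V + \frac38$, so $B = 3$. The sketch propagates the distribution and compares the mass outside $M = \{V \leq 2B/\varepsilon\}$ with the Markov-inequality bound $\frac{\varepsilon}{2B}(\alpha^n V(x_0) + B)$, for the worst starting points in the growing sets $K_n = \{V \leq 2^{n/8}\}$. All names are hypothetical.

```python
P_UP = 0.25                      # toy walk: up with prob 1/4, down (or stay at 0) otherwise
ALPHA, BETA = 0.875, 0.375       # pi(mu)V <= ALPHA * V + BETA for V(x) = 2**x
B = BETA / (1 - ALPHA)           # = 3.0


def V(x):
    return 2.0 ** x


def step(dist):
    """One step of the reflected walk on {0, 1, 2, ...}; dist maps states to probabilities."""
    new = {}
    for x, p in dist.items():
        for y, q in ((x + 1, P_UP), (max(x - 1, 0), 1 - P_UP)):
            new[y] = new.get(y, 0.0) + q * p
    return new


def contraction_holds(x_max=50):
    """Check pi(mu)V(x) <= ALPHA * V(x) + BETA for x = 0, ..., x_max."""
    return all(P_UP * V(x + 1) + (1 - P_UP) * V(max(x - 1, 0)) <= ALPHA * V(x) + BETA + 1e-9
               for x in range(x_max + 1))


def tail_and_bound(x0, n, eps):
    """Mass outside M = {V <= 2B/eps} after n steps from x0 (by propagating the
    distribution), and the bound (eps / 2B) * (ALPHA^n V(x0) + B)."""
    dist = {x0: 1.0}
    for _ in range(n):
        dist = step(dist)
    outside = sum(p for x, p in dist.items() if V(x) > 2 * B / eps)
    return outside, eps / (2 * B) * (ALPHA ** n * V(x0) + B)


# Starting sets K_n = {V <= phi(n)} with phi(n) = 2**(n/8); x0 = n // 8 is their top state.
eps = 0.1
results = [tail_and_bound(n // 8, n, eps) for n in (16, 40, 80, 160)]
```

The growth rate $\log 2/8$ of $\varphi(n) = 2^{n/8}$ is below $\log(8/7)$, as part (i) requires; with $\varphi(n) = 2^{n/4}$ the product $\alpha^n\varphi(n)$ would grow and the bound would become useless.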
In order to prove (ii) we use a similar estimate, but have to ensure that the values µ * k * δ x (M c ) are small for a sufficiently large proportion of 0 ≤ k < n.For x ∈ K n we find, as above, Using straightforward manipulations, we further see the right-hand side of which tends to 0 as n → ∞ by sub-exponential growth of φ.Hence, with k(n) = ⌊εn/4⌋, we may choose n 0 large enough to ensure the above inequality holds for all k ≥ k(n) for n ≥ n 0 .For such n we conclude, using (4.2), which ends the proof of (ii).□ Theorem 4.4 can now be strengthened in the following way.

2.1. Examples. The first example with periodicity is on finite periodic orbits.

Example 2.1. In the following, for $d \geq 2$ we denote by $1_d$ the $d \times d$ identity matrix. Consider the principal congruence lattice

Corollary 2.8. Retain the notation and assumptions from Theorem 1.1 and suppose that $Sx_0$ is connected. Let $d \in \mathbb{N}$ and denote by $S_d$ the closed subsemigroup of $G$ generated by $\operatorname{supp}(\mu^{*d})$.

and to exhaust the part of $X$ where the Lyapunov function is finite. Recall that the Lyapunov exponent of a function $\varphi\colon \mathbb{N} \to [1, \infty)$ is the exponential growth rate $\lambda(\varphi) = \limsup_{n\to\infty}\frac{1}{n}\log\varphi(n)$.

Theorem 4.13. In addition to the assumptions of Theorem 4.4, suppose that the random walk on $X$ induced by $\mu$ admits a Lyapunov function $V$. Let $\varphi\colon \mathbb{N} \to [1, \infty)$ have sub-exponential growth. Then for every $f \in C_c(X)$ we have
$$\lim_{n\to\infty}\,\sup_{V(x)\leq\varphi(n)}\left|\frac{1}{n}\sum_{k=0}^{n-1}\int_X f\,d(\mu^{*k}*\delta_x) - \int_X f\,dm_X\right| = 0.$$
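Theorem 4.13 can be illustrated on a toy reflected walk on $\{0, 1, 2, \dots\}$ (our own example, not a homogeneous space: up with probability $\frac14$, down with probability $\frac34$, staying at $0$ instead of moving below it), with $V(x) = 2^x$ and the sub-exponential choice $\varphi(n) = n$. The sketch computes $\sup_{V(x)\leq\varphi(n)}$ of the Cesàro discrepancy for $f$ the indicator of $\{0\}$, whose stationary mass is $\frac23$. It evaluates $x \mapsto \mathbb{P}_x(X_k = 0)$ for all relevant $x$ at once by iterating the Markov operator on functions over a truncated range chosen large enough that the truncation does not affect the starting points considered. Names are hypothetical.

```python
import math

P_UP = 0.25      # toy reflected walk on {0, 1, 2, ...}, with V(x) = 2**x as Lyapunov function
PI0 = 2 / 3      # its stationary measure is pi(x) = (2/3) * (1/3)**x, so pi({0}) = 2/3


def sup_cesaro_error(n, phi):
    """sup over x with V(x) = 2**x <= phi(n) of |(1/n) sum_{k<n} P_x(X_k = 0) - PI0|."""
    x_max = math.floor(math.log2(phi(n)))
    top = x_max + n + 1                       # g_k(x) below is exact for x <= top - k
    g = [1.0 if x == 0 else 0.0 for x in range(top + 1)]   # g_0 = indicator of {0}
    total = [0.0] * (x_max + 1)
    for _ in range(n):
        for x in range(x_max + 1):
            total[x] += g[x]                  # accumulate g_k(x) = P_x(X_k = 0)
        # g_{k+1}(x) = P_UP * g_k(x + 1) + (1 - P_UP) * g_k(max(x - 1, 0)); the value at
        # the top index is truncated, which only corrupts states above top - k.
        g = [P_UP * g[min(x + 1, top)] + (1 - P_UP) * g[max(x - 1, 0)]
             for x in range(top + 1)]
    return max(abs(t / n - PI0) for t in total)


def phi(n):
    return float(n)  # sub-exponential growth: the starting sets {V <= n} grow with n


errors = [sup_cesaro_error(n, phi) for n in (100, 1000)]
```

Starting points at height $x$ need about $2x$ steps to reach $0$, and since $x \leq \log_2 n$ grows only logarithmically here, this delay is washed out by the averaging.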
Lyapunov functions are functions enjoying certain contraction properties with respect to the random walk, to the effect that (on average) its dynamics are directed towards the “center” of the space, where the function takes values below some threshold. They were introduced into the study of random walks on homogeneous spaces by Eskin-Margulis [15], whose ideas were further developed by Benoist-Quint [6].
Definition 4.5. A measurable function $V\colon X \to [0, \infty]$ is called a Lyapunov function for the random walk on $X$ induced by $\mu$ if
1. it is proper, in the sense that the sublevel sets $V^{-1}([0, L])$ are relatively compact for $L \in [0, \infty)$, and
2. there exist constants $\alpha$