A variational approach to first order kinetic Mean Field Games with local couplings

First order kinetic mean field games formally describe the Nash equilibria of deterministic differential games where agents control their acceleration, asymptotically in the limit as the number of agents tends to infinity. The known results for the well-posedness theory of mean field games with control on the acceleration assume either that the running and final costs are regularizing functionals of the density variable, or the presence of noise, i.e. a second-order system. In this article we construct global in time weak solutions to a first order mean field games system involving kinetic transport operators, where the costs are local (hence non-regularizing) functions of the density variable with polynomial growth. We show the uniqueness of these solutions on the support of the agent density. This is achieved by characterizing solutions through two convex optimization problems in duality. As part of our approach, we develop tools for the analysis of mean field games on a non-compact domain by variational methods. We introduce a notion of `reachable set', built from the initial measure, that allows us to work with initial measures with or without compact support. In this way we are able to obtain crucial estimates on minimizing sequences for merely bounded and continuous initial measures. These are then carefully combined with $L^1$-type averaging lemmas from kinetic theory to obtain pre-compactness for the minimizing sequence. Finally, under stronger convexity and monotonicity assumptions on the data, we prove higher order Sobolev estimates of the solutions.


Introduction
The aim of the theory of mean field games (MFG for short) is to characterize limits of Nash equilibria of stochastic or deterministic differential games when the number of agents tends to infinity. Such models were first proposed about 15 years ago, simultaneously by Lasry-Lions ([47,48]) and Huang-Malhamé-Caines ([42]).
This theory turned out to be extremely rich in applications and it provided excellent mathematical questions. Its literature has witnessed a huge increase in the last decade. From the theoretical viewpoint, there are two main approaches to the study of MFG. One is based on analytical and PDE techniques, while the other is a probabilistic approach. The first approach goes back to the original works of Lasry-Lions and has been extended in a great variety of directions in the subsequent years by many authors. If a non-degenerate idiosyncratic noise is present in the models, this typically yields a parabolic structure for the corresponding PDEs and one can expect (strong) classical solutions or a suitable regularity for weak solutions to the corresponding PDE systems, even when the corresponding Lagrangians are local functions of the density variable. For a non-exhaustive list of works in this direction we refer the reader to [5,6,22,23,33,34,35,57]. The probabilistic approach proved to be equally successful for problems involving Lagrangians that are nonlocal functions of the measure variable. This approach seems to be very powerful for handling different kinds of noises in combination with the non-degenerate idiosyncratic one, such as the common noise. For a non-exhaustive collection of works in this direction we refer to [18,21,19,20].
When the model lacks a non-degenerate idiosyncratic noise, this clearly poses technical difficulties in the analysis. Typically, it means that additional structural assumptions need to be imposed on the data to be able to hope for (weak) solutions. Such conditions are, for instance, suitable notions of convexity/monotonicity (cf. [1,51]), or the presence of a suitable variational structure, as in the case of potential games ( [13,14,15,38,39,37]). In the case of local couplings, it was pointed out by Lions in [49] that the MFG system (including the planning problem) can be transformed into a degenerate elliptic system in space-time with oblique boundary conditions. Relying on this idea, in a quite general setting, under suitable assumptions on the data (such as strict monotonicity and strong convexity of the Hamiltonians in the measure and momentum variables, respectively; regularity and positivity conditions on the initial data), it has been proven recently in [53,54] that the corresponding first order MFG systems have smooth classical solutions.
For an excellent, relatively complete account of the subject and a summary of results to date, we refer the reader to the collection [1].
In this manuscript we study a class of first order kinetic MFG systems, involving Lagrangians that are local functions of the density variable and that possess a variational structure, in the sense of [13,14,15].
In our setting, the MFG system is formally written as system (1.1). Under suitable assumptions on the data, we obtain the global in time existence, uniqueness and Sobolev regularity of weak solutions to (1.1), relying on two convex optimisation problems in duality. One of these problems can be seen as an optimal control problem for the Hamilton-Jacobi equation, while its dual is an optimal control problem for the continuity equation (cf. [13,14,15]).
Review of the literature in connection to our work.
MFG systems of type (1.1) were introduced to model situations in which agents control their acceleration. Such a model can seemingly be traced back to the work [55] (in the engineering community). In the mathematical community, the first works in this framework seem to be [3,8,12]. These works consider Hamiltonians (in our notation, $H - f$) and final cost functions that are nonlocal regularizing functions of the measure variable. Moreover, the Hamiltonians need to be either purely quadratic or have quadratic growth in the momentum variable. In addition, in [3,8] further conditions on the initial measure $m_0$ are imposed: in [3] $m_0$ is taken to be compactly supported and Hölder continuous, while in [8] $m_0$ is taken to be compactly supported. These two works construct weak solutions to the corresponding MFG system in the sense that the Hamilton-Jacobi equation has to be understood in the viscosity sense, while the continuity equation is understood in the sense of distributions. In [12] the initial measure $m_0$ can be quite general and the corresponding Hamiltonian does not need to have the so-called 'separable structure' which was assumed in [3,8] and is also assumed in this manuscript. These more general hypotheses come at the price of obtaining a weaker notion of solution to the MFG system: the so-called mild solutions. However, the authors show that, under the additional separability assumption on the Hamiltonian, mild solutions become more standard weak solutions in the sense described above.
Several interesting new works are built on the models introduced in [3,8,12]. In [16] the authors study the ergodic behaviour of MFG systems, for the case of Hamiltonians that are purely quadratic in the momentum variable and nonlocal regularizing coupling functions f, g, with additional growth assumption on f in the v variable. In [4] the authors obtain mild solutions to MFG under acceleration control and state constraints, under assumptions similar to the ones in [3] on the Hamiltonians, with the possibility to consider Hamiltonians that are power-like functions in the momentum variable. Lastly, in [50] the author studies a perturbation problem associated to MFG under acceleration control, where the (Lagrangian) cost associated to the acceleration vanishes.
MFG models with degenerate diffusion share some common features with kinetic type problems. In this context we can mention several works. In [25] and [28] the authors study time independent MFG systems with purely quadratic Hamiltonians and nonlocal regularizing coupling functions, where the diffusion operator is hypoelliptic or satisfies a suitable Hörmander condition. It is also worth mentioning that our system (1.1) shares some similarities with MFG models where agents interact also through their velocities. In this direction we refer to the works [2,36,40,44,60].
Finally, a second order MFG system of type (1.1) has been recently studied in [52]. In this work the author obtains weak and renormalized solutions (in the spirit of [57]) to a MFG system that involves a non-degenerate diffusion in the $v$ direction. This seems to be the only work in the context of kinetic type MFG models where the coupling functions $f$ and $g$ are taken to be local functions of the density variable $m$. Here the Hamiltonian $H$ is assumed to depend only on the momentum variable and either to be globally Lipschitz continuous or to have quadratic/sub-quadratic growth. There are several summability properties and moment bounds imposed on the initial density $m_0$. In the case of Lipschitz continuous Hamiltonians, the coupling functions $f$, $g$ are supposed to fulfill several further assumptions: a strong uniform increasing property in the $m$ variable, and their derivatives in the $(x, v)$ variable must satisfy a linear growth condition in the $m$ variable.
In [52] the presence of the diffusion in the $v$ direction allows the author to use suitable De Giorgi type arguments to show that the solution to the Fokker-Planck equation is bounded and has fractional Sobolev regularity. These estimates seem to be instrumental to set up a fixed point scheme and to show that the MFG system has a weak solution. Furthermore, the presence of this diffusion makes it possible to obtain second order Sobolev estimates for the MFG system.
Description of our results.
As highlighted above, in this work we are inspired by [13,14,15] and we obtain existence and uniqueness of weak solutions to (1.1) (in the sense of Definition 2.3) via two convex optimisation problems in duality (Problem 3.1 and Problem 3.2). Compared to these works, several major differences arise which require new ideas. A first obvious difference is that in our setting (in contrast to the compact setting of the flat torus which is considered in the mentioned references) the velocity variable $v$ lives in the non-compact space $\mathbb{R}^d$. This clearly introduces technical issues in the analysis.
To prove our main results, the general outline of our programme is the same as the one of [13,14,15]: prove the duality for Problem 3.1 and Problem 3.2; suitably relax Problem 3.1 (this will be Problem 3.7) and show that the value of this is the same as the original one; show existence of optimizers for the relaxed problem and apply the duality result again to obtain existence of solutions in a suitable weak sense. In this manuscript H is supposed to have a superlinear growth in the momentum variable, and f and g are supposed to have polynomial growth in their last variables. The growth of f, g may be taken independently of the growth of the Hamiltonian (we refer to the next section for the precise assumptions).
To show that the value of the relaxed problem is the same as the original one, a standard approach used in [13,14,15] is to test the Hamilton-Jacobi inequality of any competitor by competitors of the dual problem (i.e. solutions to the continuity equation). To justify this computation a mollification argument was applied for solutions to the continuity equation. In our case, this mollification alone is not enough because of the non-compact setting. Therefore a delicate cut-off argument has to be also implemented.
The most delicate part, however, is to obtain existence of optimizers to the relaxed problem and in particular to obtain proper compactness results for the minimizing sequences. First, in our case the time trace of the solutions to the Hamilton-Jacobi inequality constraint in Problem 3.7 is quite weak: $u(t, \cdot)$ has to be understood as a locally finite signed Radon measure. Since in this work $m_0$ may have non-compact support, it takes additional effort to give a meaning to $\int_{M \times \mathbb{R}^d} m_0 \, u_0(dx\,dv)$ (a term that appears in the objective functional present in Problem 3.7). Our construction, although completely different, has some similarities in spirit with the one in [56], to define similar time boundary traces.
In order to obtain suitable estimates for the minimizing sequence of the relaxed problem, in [13,14,15] a typical trick was to test the Hamilton-Jacobi inequality constraint by the initial measure $m_0$. For this reason, it was necessary to impose enough regularity, and more importantly a uniform positive lower bound of this density everywhere. Because of this, estimates on the quantity $\int_{\mathbb{T}^d} m_0 u_0 \, dx$ would readily yield summability estimates on $u_0$ alone. We emphasize that in this manuscript we assume that $m_0$ is merely a bounded and continuous probability density and so we take a completely different route when obtaining such estimates. We introduce the reachable set $U_{m_0}$, a set of points in time, space and velocity that can be reached from $\mathrm{spt}(m_0)$ with arbitrary smooth admissible controls (cf. Definition 2.2). In fact, by the controllability of the underlying ODE system, which satisfies the Kalman rank condition, we have $U_{m_0} = (\{0\} \times \mathrm{spt}(m_0)) \cup ((0, T) \times M \times \mathbb{R}^d)$. In order to obtain our crucial estimates on the corresponding minimizing sequence we use well chosen test functions that are supported in $U_{m_0}$. This construction seems to be new in the literature on variational MFG and we believe that it could be instrumental also in other settings, to possibly relax regularity, positivity or compact support assumptions on $m_0$.
As there is no Hopf-Lax type representation formula available for solutions to our Hamilton-Jacobi equations (which was the case in [13,14]), first, we obtain estimates on truncations of the solutions. These are similar in flavour to the corresponding estimates in [15], and such ideas date back to [61]. As our terminal data typically have merely local summability, this will be the source of additional technical issues (in contrast to [15], where the terminal data was taken to be regular enough).
Let us underline that the ideas and constructions described so far allow us to obtain summability estimates on $u$ and $D_v u$, using the structure of the problem. This is not sufficient to yield weak precompactness for minimizing sequences, due to the lack of regularity estimates in $x$. To recover the necessary compactness we make use of averaging lemmas available in kinetic theory. Averaging lemmas go back to the works [29,30] and provide improved regularity and compactness properties for velocity averages of solutions of kinetic transport equations (see Subsection 6.1 for the precise definitions). For more details and a survey of results we refer the reader to the review [43] and the references cited therein. When regularity with respect to $v$ is additionally available, similar properties can be deduced for the full density function: we refer for instance to [11] for regularity results in the $L^p$ case for $1 < p < +\infty$. We carefully tailor this approach to our setting, combining our estimates on $D_v u$ with $L^1$ averaging lemmas [31,32,41] to deduce precompactness for minimizing sequences. In this way we prove Theorem 6.8 on the existence of a minimizer of Problem 3.7. This in turn implies Theorem 2.4, that system (1.1) has a (unique) weak solution. As was similarly obtained in [13,14,15], we show the uniqueness of $m$ and the uniqueness of $u$ on $\{m > 0\}$.
A natural question that arises in the context of variational MFG is whether the variational structure and further strong monotonicity and convexity assumptions on the data would yield higher order Sobolev estimates on weak solutions. Such estimates were recently obtained in more classical frameworks in [17,38,39,40,58,59]. In this manuscript we pursue similar Sobolev estimates, under stronger assumptions on the data. In comparison with the works [38,39], in our setting we need to work with a considerably weaker notion of time trace of $u$, which is not stable under perturbations of the initial measure $m_0$. Therefore, our Sobolev estimates remain local in time on $(0, T]$. Another delicate difference is due to the presence of the kinetic transport term. Because of this, the perturbations must be chosen carefully, so that they take into account the kinetic nature of the problem. As a result, interestingly, we first obtain estimates on differential operators of the form $(tD_x + D_v)$ applied to $m$ and $D_v u$. For the precise results in this direction we refer to Theorem 8.2, Corollary 8.4 and Corollary 8.5.
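The appearance of the operator $tD_x + D_v$ can be motivated by a short computation. The following is only a sketch, assuming a smooth test function $\varphi = \varphi(t,x,v)$ and a fixed shift $h \in \mathbb{R}^d$ (both symbols are introduced here purely for illustration): the operator commutes with the free kinetic transport operator $\partial_t + v \cdot D_x$, because it generates the shifts $(x,v) \mapsto (x + th, v + h)$.

```latex
% Sketch: tD_x + D_v commutes with the free transport operator.
% Assumption: \varphi = \varphi(t,x,v) is smooth; h \in \mathbb{R}^d is a fixed shift.
\begin{align*}
(\partial_t + v\cdot D_x)\,(tD_x + D_v)\varphi
 &= D_x\varphi + t\,D_x\partial_t\varphi + D_v\partial_t\varphi
    + t\,(v\cdot D_x)D_x\varphi + (v\cdot D_x)D_v\varphi, \\
(tD_x + D_v)\,(\partial_t + v\cdot D_x)\varphi
 &= t\,D_x\partial_t\varphi + D_v\partial_t\varphi
    + t\,(v\cdot D_x)D_x\varphi + D_x\varphi + (v\cdot D_x)D_v\varphi .
\end{align*}
% The lone D_x\varphi term comes from \partial_t(t) = 1 in the first line
% and from D_v(v \cdot D_x\varphi) in the second, so the two lines agree:
\[
[\,\partial_t + v\cdot D_x,\; tD_x + D_v\,] = 0 .
\]
% Equivalently, if \partial_t\varphi + v\cdot D_x\varphi = 0, then the same holds for
% \varphi^h(t,x,v) := \varphi(t, x + th, v + h), for every fixed h.
```

Perturbations along these shifts therefore preserve the transport part of the system, which is one way to read the phrase "take into account the kinetic nature of the problem".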
The structure of the manuscript is as follows. In Section 2 we state our standing assumptions and main results. In Section 3 we present the two variational problems in duality along with the relaxed problem of the primal problem. In Section 4 we collect some preliminary estimates on weak solutions of the Hamilton-Jacobi inequality obtained on the reachable set $U_{m_0}$. In Section 5 we show that the relaxed problem has the same value as the primal problem and hence the duality result holds. Section 6 contains the existence result of a solution to the relaxed problem. Here we rely on the combination of the estimates derived in the previous sections and suitably tailored averaging lemmas from kinetic theory, applied in our context for distributional subsolutions to kinetic Hamilton-Jacobi equations. In Section 7 we show that optimizers of the variational problems in duality provide weak solutions to the MFG system and, conversely, weak solutions are also optimizers of the variational problems. Furthermore, strong convexity yields (partial) uniqueness of these solutions. Section 8 is devoted to the derivation of higher order Sobolev estimates for the weak solutions. These require further assumptions on the data.
We end the paper with two appendix sections. In Appendix A we discuss the time regularity of distributional subsolutions to kinetic Hamilton-Jacobi equations which allow us to construct suitable notions of time traces. Finally, in Appendix B we show that truncations and maxima of distributional subsolutions to kinetic Hamilton-Jacobi equations remain distributional subsolutions to suitably modified equations.

Standing Assumptions and Main Results
In this section we state our main results on the existence, uniqueness and Sobolev regularity of solutions to the MFG system.
We define $F$ and $G$ to be the anti-derivatives of the coupling functions $f$ and $g$ with respect to $m$:
Throughout, we make the following assumptions on the Hamiltonian and coupling functions.
Assumption 1.
(H1) The Hamiltonian $H$ is continuous in all variables, and convex and differentiable with respect to $p$. Furthermore, for some $r > 1$, $H$ satisfies bounds of the form for all $(x, v, p) \in M \times \mathbb{R}^d \times \mathbb{R}^d$ and some constants $c > 0$ and $C_H \ge 0$. Finally, the function with respect to the uniform norm.
(H2) $F(x, v, m)$ is continuous in all variables and strictly convex and differentiable with respect to $m$ for $m > 0$. Moreover, it satisfies the growth condition For $m < 0$, we set $F(x, v, m) = +\infty$.
(H3) $G(m)$ is continuous and strictly convex. Moreover, it satisfies the growth condition
(H4) The initial datum $m_0$ is a probability density with finite first moment in velocity:
We emphasize that here we impose growth conditions on $F$, $G$ rather than on $f$, $g$.
Example 2.1. For any $q > 1$ and continuous bounded function $c \ge 0$, the function satisfies the given assumptions.
Definition 2.2 (Reachable set). It will be useful to define the set $U_{m_0} \subseteq [0, T] \times M \times \mathbb{R}^d$ to be the set of points potentially reachable by a collection of agents initially distributed according to $m_0$ and evolving according to the control system $\dot{x} = v$, $\dot{v} = a$ for some control $a \in C([0, T]; \mathbb{R}^d)$. Observe that the previous control system satisfies the classical Kalman rank condition, and so we have $U_{m_0} = (\{0\} \times \mathrm{spt}(m_0)) \cup ((0, T) \times M \times \mathbb{R}^d)$.
Under these standing assumptions, we define the following notion of weak solution to the MFG system.
Definition 2.3. We say that $(u, m)$ is a weak solution to (1.1), if the following are fulfilled: (vii) The following energy equality holds:
2.1. Existence and Uniqueness. The first of our main results is the existence and uniqueness of these weak solutions.
Theorem 2.4. Let Assumption 1 hold. Then there exists a weak solution (u, m) of the mean field game system (1.1) in the sense of Definition 2.3. This solution is unique, in the sense that if (u 1 , m 1 ) and (u 2 , m 2 ) are both weak solutions in the sense of Definition 2.3, then m 1 = m 2 almost everywhere and u 1 = u 2 almost everywhere on the set {m 1 > 0}.
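The Kalman rank condition invoked in Definition 2.2 can be checked by hand. The following is a minimal sketch, assuming that the control system is the double integrator $\dot{x} = v$, $\dot{v} = a$ (agents controlling their acceleration). The matrices $A$, $B$ and the interpolating curve $\gamma$ are notation introduced here for illustration only, and the explicit steering control is written for the case $M = \mathbb{R}^d$.

```latex
% Assumed control system: \dot{x} = v, \dot{v} = a, with state z = (x, v) \in \mathbb{R}^{2d}.
\[
\dot z = A z + B a, \qquad
A = \begin{pmatrix} 0 & I_d \\ 0 & 0 \end{pmatrix}, \qquad
B = \begin{pmatrix} 0 \\ I_d \end{pmatrix}.
\]
% Since A^2 = 0, only the first two blocks of the Kalman matrix are non-trivial:
\[
AB = \begin{pmatrix} I_d \\ 0 \end{pmatrix}, \qquad
\operatorname{rank}\begin{pmatrix} B & AB \end{pmatrix}
 = \operatorname{rank}\begin{pmatrix} 0 & I_d \\ I_d & 0 \end{pmatrix} = 2d .
\]
% Explicit steering (case M = \mathbb{R}^d): to join (x_0, v_0) at time 0
% to (x_1, v_1) at time t > 0, take the cubic Hermite curve, with s = \tau / t,
\[
\gamma(\tau) = (2s^3 - 3s^2 + 1)\,x_0 + t\,(s^3 - 2s^2 + s)\,v_0
             + (-2s^3 + 3s^2)\,x_1 + t\,(s^3 - s^2)\,v_1 ,
\]
% which satisfies \gamma(0) = x_0, \dot\gamma(0) = v_0, \gamma(t) = x_1, \dot\gamma(t) = v_1.
% The control a := \ddot\gamma is affine in \tau, hence continuous.
```

Full rank $2d$ means that any state $(x_1, v_1)$ can be reached from any initial state in any positive time, which is consistent with the description of $U_{m_0}$ as containing every point at positive times.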

2.2. Regularity. Our second main result is Sobolev regularity for weak solutions of the mean field games system (1.1). For this result we assume quadratic growth of the Hamiltonian ($r = 2$) and stronger convexity and regularity hypotheses on the data, as follows.

Assumption 2.
(H5) (Conditions on the coupling functions) There exists C > 0 such that the functions f, g satisfy Moreover, there exists c f , c g > 0 such that In the above assumptions, if q < 2 or s < 2 one should interpret 0 q−2 and 0 s−2 as +∞. In this way, whenm = 0, for instance, (2.7) reduces to f (x, v, m)m ≥ c f m q , as in the more regular case q ≥ 2. Similar comments can be made for (2.8).
(H6) (Quadratic growth and strong coercivity assumption on H) Suppose that r = 2 and there exist j 1 , j 2 : R d → R d and c H > 0 such that In particular, and in light of our restriction (2.1), we assume that j 1 and j 2 have linear growth.
Under these additional assumptions, we prove the following result. The proof is carried out in Section 8.
Then, there exists $C > 0$ such that
Remark 2.6. The estimates appearing in this statement are informal; we in fact obtain uniform $L^2$-type summability of differential quotients (see estimate (8.8) below). The corresponding Sobolev estimates, however, are more delicate to obtain, because these would need to be understood in the sense of weighted Sobolev spaces or more generally in the sense of Sobolev spaces with respect to measures. Their precise versions would need to involve tangent spaces with respect to the measure $m$, but these are beyond the scope of the current manuscript. We refer to [9] on this topic.

Variational problems in duality
We will prove existence of a solution to the MFG system (1.1) through a variational characterisation. In this section we set up the variational problems used to obtain solutions. We recall that here and throughout the rest of the manuscript, we will work under Assumption 1.

3.1. Optimal control of the Hamilton-Jacobi equation: smooth setting. We define the Fenchel conjugates of $F$ and $G$ respectively by Under our assumptions on $F$, we have the bounds where $q' = q/(q - 1)$ denotes the Hölder conjugate exponent of $q$. Note also that $F^*$ is non-decreasing. Similar observations hold for $G^*$.
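To illustrate the shape of such bounds, consider a model power nonlinearity. This is an illustrative special case chosen here, not necessarily the function of Example 2.1; the convention $F = +\infty$ for $m < 0$ is the one stated in (H2).

```latex
% Model case (an assumption for illustration):
% F(m) = m^q / q for m \ge 0, F(m) = +\infty for m < 0, with q > 1.
\[
F^*(a) = \sup_{m \ge 0}\Big\{ a\,m - \frac{m^q}{q} \Big\}.
\]
% If a \le 0 the supremum is attained at m = 0. If a > 0 the maximiser solves
% a = m^{q-1}, i.e. m = a^{1/(q-1)}, so that a\,m = m^q = a^{q'} with q' = q/(q-1).
\[
F^*(a) = \Big(1 - \frac{1}{q}\Big)\,(a_+)^{q'} = \frac{(a_+)^{q'}}{q'},
\qquad a_+ := \max\{a, 0\}.
\]
% In particular F^* is non-decreasing, vanishes on (-\infty, 0],
% and grows like a^{q'} as a \to +\infty.
```

In this model case the conjugate exponent $q'$ and the monotonicity of $F^*$ are visible directly from the formula.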
Using this, we define the following functional: whenever the integrals are meaningful, and set $A(u) = +\infty$ otherwise. We define a first variational problem associated to this functional.
3.2. Optimal control of the continuity equation. To state the dual problem we define the Lagrangian $L : M \times \mathbb{R}^{2d} \to \mathbb{R}$, which is the Fenchel conjugate of the Hamiltonian $H$ in the last variable. In other words, for any $(x, v, \alpha) \in M \times \mathbb{R}^{2d}$, we define Note that $L$ then satisfies upper and lower bounds of the form with the convention that We then define a second variational problem, (formally) dual to the first.
such that $m$ has a finite first moment in velocity: subject to $(m, w)$ satisfying the following continuity equation: $\partial_t m + \mathrm{div}_x(vm) + \mathrm{div}_v w = 0$, and $m|_{t=0} = m_0$ in the sense of a weak trace.
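The Lagrangian appearing in the dual problem can be computed explicitly in a model case. The sketch below assumes a purely power-type Hamiltonian, independent of $(x, v)$; this choice is made here for illustration and is not taken from the text.

```latex
% Model case (an assumption for illustration): H(p) = |p|^r / r with r > 1.
\[
L(\alpha) = \sup_{p \in \mathbb{R}^d}\Big\{ \alpha\cdot p - \frac{|p|^r}{r} \Big\}
          = \frac{|\alpha|^{r'}}{r'}, \qquad r' = \frac{r}{r-1},
\]
% with the supremum attained at p = |\alpha|^{r'-2}\alpha (and at p = 0 when \alpha = 0).
% This is the equality case of Young's inequality
\[
\alpha\cdot p \le \frac{|p|^r}{r} + \frac{|\alpha|^{r'}}{r'} .
\]
```

Growth of order $r$ for $H$ thus corresponds to growth of order $r'$ for $L$, which is why the exponent $r'$ is the natural integrability exponent for the velocity field of competitors with finite energy.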
Remark 3.3. Let us comment on the weak trace of $m$ with respect to the time variable. Since we are interested in competitors $(m, w)$ for which $B(m, w)$ is finite, there must exist $V \in L^{r'}(m \, dx\,dv\,dt)$ such that $w = V \cdot m$ (i.e. $V$ is the density of $w$ with respect to $m$). So, we notice that the previous equation can be written as $\partial_t m + \mathrm{div}_x(vm) + \mathrm{div}_v(Vm) = 0$. The previous arguments and the assumption on $m$ yield that Therefore, [7, Lemma 8.1.2] yields that $m$ has a narrowly continuous representative $[0, T] \ni t \mapsto m_t \in P(M \times \mathbb{R}^d)$. We refer to this representative when considering the previous problem, so that in particular $m|_{t=0}$ and $m|_{t=T}$ are meaningful.
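The integrability step behind Remark 3.3 can be sketched as follows. This is a plausible reading rather than a reproduction of the omitted display: it assumes that $m_t$ has unit mass for almost every $t$, that $V \in L^{r'}(m\,dx\,dv\,dt)$, and that the first moment in velocity of $m$ is finite on $[0,T] \times M \times \mathbb{R}^d$.

```latex
% The transport field on M \times \mathbb{R}^d is (v, V). By Hölder's inequality
% with exponents r' and r, with respect to the measure m\,dx\,dv\,dt:
\[
\int_0^T\!\!\int_{M\times\mathbb{R}^d} |V| \, m \, dx\,dv\,dt
\le \Big( \int_0^T\!\!\int_{M\times\mathbb{R}^d} |V|^{r'} m \, dx\,dv\,dt \Big)^{1/r'}
    \Big( \int_0^T\!\!\int_{M\times\mathbb{R}^d} m \, dx\,dv\,dt \Big)^{1/r}
= T^{1/r}\, \|V\|_{L^{r'}(m\,dx\,dv\,dt)} .
\]
% Combined with the moment assumption on m, this gives
\[
\int_0^T\!\!\int_{M\times\mathbb{R}^d} \big( |v| + |V| \big)\, m \, dx\,dv\,dt < +\infty ,
\]
% which is the type of integrability hypothesis on the velocity field under which
% a narrowly continuous representative of t \mapsto m_t is available.
```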

3.3. Duality.
Lemma 3.4. We have the following duality:
Proof. This is an application of the classical Fenchel-Rockafellar duality theorem. We define the space Then let $E_1$ be defined by which is a Banach space when equipped with the norm $E_1$ is then a Banach space when equipped with the norm On these spaces we define the respective functionals Note that these functionals are continuous and convex. We also define the bounded linear map $\Lambda :$ Then We wish to apply Fenchel-Rockafellar duality. In order to do this we must verify the existence of $u \in C^1_b$ such that $A_0(u), A_1(\Lambda u) < +\infty$ and $A_1$ is continuous at $\Lambda u$. For example, we may take $u$ to be of the form where $C_H$ denotes the constant from the bounds on the Hamiltonian (2.1). We then take $\zeta \in C^1_b(M \times \mathbb{R}^d)$ non-negative to have sufficiently strong decay at infinity so that Explicitly, for the case $M = \mathbb{T}^d$ we may take for example For the case $M = \mathbb{R}^d$ we may take Then and thus by the bounds on $F^*$ (3.1) In addition, we can readily check that $A_1$ is continuous at $\Lambda u$, with respect to the topology on $E_1$.
Finally, since $u_0 = \zeta - C_H T$ and $m_0$ is a probability density, It remains to check that $A$ is bounded below on $E_0$. Let $u \in E_0$ and set $\beta :$ Then, using the growth assumptions on $F^*$ and $G^*$, similarly to the inequality (4.2) below, we have where $c_0$ is a large positive constant depending only on $m_0$, $T$, $C_H$, $C_F$, $C_G$. Therefore, we are in a position to apply the Fenchel-Rockafellar duality theorem (cf. [27, Chapter 3, Here $E_1'$ denotes the dual space of $E_1$. By [26, IV.6] the dual space of $C^0_b$ may be identified with the space of bounded, regular, finitely additive set functions. We further note that $E_2$ is isometrically isomorphic to $C^0_b$ via the map $\alpha :$ The dual space $E_2'$ thus consists of precisely those functionals $m$ of the form , In what follows, we are going to show that the above maximisation problem actually admits solutions in a better space than $E_1'$. So, we have max where the set $\tilde{E}_1'$ stands for pairs $(m, w)$ such that $m$ is a finite Radon measure on $[0, T] \times M \times \mathbb{R}^d$ with finite first moment in velocity: and $w$ is a finite vector-valued Radon measure on $[0, T] \times M \times \mathbb{R}^d$ taking values in $\mathbb{R}^d$. The proof of this is postponed to Lemma 3.5 below.
Then, by arguing as in [13,Section 3.3], we may identify that where the maximum is taken over (m, w) Lemma 3.5. Using the notations and assumptions from Lemma 3.4, we have Proof. Observe that any pair (m, w) ∈ E ′ 1 induces functionals on C 0 c and (C 0 c ) d . Therefore, there exist a signed Radon measurem with finite total variation and a finite vector-valued measurew which coincide with, respectively, m and w on (the closure of) C 0 c and (C 0 c ) d . Note also that |m| has finite first moment in velocity. Then By considering functions of the form φ = lχ + H 0 , for H 0 (x, v) := H(x, v, 0) (note that our assumptions on H imply in particular that H 0 ∈ C b ) and any non-negative χ ∈ C 0 b and l > 0, and ψ = 0, we find that A * 1 (−(m, w)) = +∞ unless m is a positive functional. Indeed, note that and sup l>0 −m, lχ = +∞ if m, χ < 0. Next, by taking the supremum over the smaller set (φ, ψ) Let us underline that the assumption on H 0 plays a crucial role, otherwise the integral of F * might not be finite for compactly supported test functions.
Since H is convex, for any and in particular we can compare the positive parts: Since F * is non-decreasing, where φ R = φχ R , ψ R = ψχ R for some continuous 0 ≤ χ R ≤ 1 converging pointwise to the constant function 1 as R tends to positive infinity. We conclude that, for anym,w (respectively signed, vector-valued) Radon measures with finite total variation, where we have used that H 0 is also a C 0 b function in order to relabel φ. We have thus proved that is a positive functional with Radon measure partm, then m −m is also a positive functional: given 0 ≤ φ ∈ C 0 b , let 0 ≤ χ R ≤ 1 be a sequence of continuous functions, non-decreasing with R and converging pointwise to the constant function 1 as R tends to positive infinity. Then, since 0 ≤ φχ R ≤ φ, by dominated convergence and the positivity of m, Then, taking supremum over the smaller set u ∈ C 1 c , we have We show that the right hand side is in fact equal to A * 0 (Λ * (m,w)): given u ∈ C 1 , let χ R ∈ C 1 c be a sequence of cut-off functions such that 0 ≤ χ R ≤ 1 and ∇χ R ≤ C for some constant C > 0 independent of R. Suppose also that χ R → 1 and ∇χ R → 0 pointwise as R → +∞. Let u R := uχ R . Then we have m, 1 + |v| < +∞. Moreover u(0, ·) is bounded and therefore integrable with respect to m 0 . Finally, note that we may apply the dominated convergence theorem to find that This completes the proof that the suprema over C 1 b and C 1 c are equal for the Radon measure parts. We conclude that All of the above inequalities are therefore equalities. Moreover, since attains the supremum then the same is true of the Radon measure part (m,w). Thus, without loss of generality, the optimizer is given by some (m, w) ∈Ẽ ′ 1 , i.e. a finite (vector-valued) Radon measure such thatm has a finite first moment in velocity m, 1 + |v| .
Remark 3.6. Let us notice that the minimizer of $B(m, w)$ is unique (by the convexity of $F$, $G$ and $L$ in their last variables). Moreover, the growth conditions on $F$, $G$ and $L$ imply that $m \in$ These arguments are similar to the ones in [14, Theorem 2.1] and [13, Lemma 2]. Furthermore, the equation satisfied by $m$ conserves mass, so that $m \in L^\infty_t L^1_{x,v}$, and in fact $\|m_t\|_{L^1} = 1$.
3.4. The relaxed problem. The third problem we define is a relaxation of Problem 3.1. Consider the functional , and subject to (3.3), understood in the sense of Definition 3.8.
Definition 3.8. We say that a triple $(u, \beta, \beta_T)$ that belongs to the spaces from Problem 3.7 is a weak distributional solution to (ii) $u_0$ is similarly understood as a certain notion of a trace at $t = 0$ in a weak sense. In particular, the term which appears in the definition of $A$ is to be understood as in Definition A.10. Moreover, we underline that this quantity is set to be $+\infty$ if there exists a nonnegative $\varphi \in C^1_c(\{m_0 > 0\})$ such that $\langle u_0, \varphi \rangle = -\infty$.

The Hamilton-Jacobi Equation
In this section, we analyse the equation (3.3). We take the assumptions appropriate to the minimisation problem we will consider. Therefore, we suppose throughout that $(u, \beta, \beta_T) \in K_A$ is such that $A(u, \beta, \beta_T) < +\infty$. From the finiteness of the energy we deduce in particular that

4.1. Upper bounds. We prove upper bounds on $u$. First, we observe that for any constant $l \in \mathbb{R}$ the function $(u - l)_+ := \max\{u - l, 0\}$ satisfies (see Lemma B.1) We use the notation $L^\infty + L^{q'}$ to denote the set of functions which becomes a Banach space when equipped with the norm We also use the notation $L^1 \cap L^q$ to denote the intersection of $L^1$ and $L^q$ made into a Banach space under the norm Note that the dual space is given by
Lemma 4.1. Let $l \in \mathbb{R}$ be given and let $(u, \beta, \beta_T) \in K_A$ satisfy (4.1).
Then $\zeta$ is smooth and compactly supported and satisfies Recall that when we write $(u - l)_+(t, \cdot, \cdot)$, we are always referring to the version of $u$ that is weakly right continuous with respect to time (cf. Appendix A, Lemma A.1). Since We compute and This extends by density to all non-negative $\psi \in (L^1 \cap L^q)(M \times \mathbb{R}^d)$, and general $\psi \in (L^1 \cap L^q)(M \times \mathbb{R}^d)$ by non-negativity of $(u - l)_+$. We conclude by the fact that ( The result follows.
(ii) By the definition of $\tilde{A}$ and the assumptions on the data one has where in the last inequality we used the estimate from (i). This further yields the claim in (ii).
Proof. We notice that (i) is a simple consequence of Lemma 4.1(i)-(ii), by setting l = 0 and t = 0 (in the sense of weak trace, given in Definition A.5).

4.2. Local $L^1$ bounds. Next, we prove bounds on the negative parts of $u$ and $\beta$. We will obtain $L^1_{\mathrm{loc}}(U_{m_0})$ bounds, by use of a duality argument involving a certain class of test functions which satisfy the continuity equation associated to the control system.
Then, for any (u, β, β T ) ∈ K A such that A(u, β, β T ) < +∞, the following hold: x,v (φ T ). • The following estimate holds: Proof. Note the following properties of φ: By Lemma 4.1, For the negative part we make use of the equation. A density argument shows that φ is admissible as a test function in the weak form of the Hamilton-Jacobi inequality satisfied by u.
We apply this in the case $s = 0$, $t \in (0, T]$. Using the fact that $\varphi$ satisfies the continuity equation in a pointwise sense, Here, let us notice that we have used the existence of weak traces in the sense of Lemma A.1. In particular the integral $\int_{M \times \mathbb{R}^d} u_0 \varphi_0 \, dx\, dv$ is meaningful and finite, since $\operatorname{spt}(\varphi_0) \subseteq \operatorname{spt}(m_0)$ (Definition A.10).
Since D v u ∈ L r loc (U m0 ) and aφ ∈ C 1 has compact support contained in U m0 , we may integrate by parts to obtain Using the lower bounds on the Hamiltonian H, rearranging terms and using Young's inequality for products (with a small parameter), we obtain Notice that by setting t = T , (4.4) and the fact that u T ≤ β T (together with the bounds that we already have on (β T ) + ) readily yield also that ( This completes the proof.
where φ ∈ C 1 denotes the solution of the continuity equation By compactness of K, there exist finitely many φ i , i = 1, . . . , k such that The function max i φ i is continuous and so Then By Lemma 4.3, this leads to the estimate where C = C K, A(u, β, β T ) . We now claim that This follows from the controllability of the ODE system . Consider the solution φ of (4.5) for the control a found above and with this choice of φ 0 . It follows Finally, we notice that by the structure of the set U m0 , we have the bound (β T ) − ∈ L 1 loc (M × R d ).

Duality for the Relaxed Problem
Theorem 5.1. Problems 3.2 and 3.7 are in duality: By the duality result of Lemma 3.4, It therefore remains only to prove the reverse inequality. This follows from Lemma 5.2 below, which states that for all (u, β, β T ) ∈ K A and (m, w) ∈ K B , Taking the infimum over (u, β, β T ) ∈ K A and supremum over (m, w) ∈ K B gives inf (u,β,βT )∈KA In the proof of this lemma we require the following observation regarding the commutator between the operator v · D x and the operator given by convolution with a fixed function.
Proof of Lemma 5.2. The overall idea of the proof is to use m as a test function in the weak form of the inequality To make this valid, we must first introduce an approximation procedure. First, we introduce a lower cut-off on u and β. Let l ≤ 0 and define u l := max{u, l}. Similarly, for k ≤ 0, let β k := max{β, k}. Then by Lemma B.1 we obtain We emphasize that k and l are taken to be possibly independent at this point.
Next, we approximate $m$ by a function in $C^1_c$, which is then an admissible test function for the Hamilton-Jacobi equation (5.1). We regularize $m$ by convolution with a mollifier. For ease of presentation, it will be convenient to work with the time, space and velocity variables separately. Fix For the space variable, consider $\psi \in C^\infty_c(\mathbb{R}^d)$ and for $\delta > 0$ let For the time variable, fix $\theta \in C^\infty_c(\mathbb{R})$ and for $\eta > 0$ let We then define the full mollifier $\phi$ by Then define the smooth functions Notice that for the convolution in time, $(m, w)$ needs to be extended. We choose the following extensions. We set $w(t, \cdot, \cdot) = 0$ for $t < 0$ and $t > T$. Then, if $t < 0$, we set $m(t, \cdot, \cdot)$ to be the solution to the problem Similarly, for $t > T$ we set $m(t, \cdot, \cdot)$ to be the solution to where $m_T$ is the trace of $m$ in time at $t = T$. As the final step in the approximation, we localize $m$. As localizers we consider smooth functions We then define $m^{(R)} := \zeta_R m$ and $w^{(R)} := \zeta_R w$.
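The displayed formulas for the mollifiers are not reproduced in the paragraph above. As a minimal sketch only, assuming the standard rescaling and a hypothetical velocity kernel $\chi$ with parameter $\varepsilon$ (the name $\chi$, the unit-mass normalisation and the product form are assumptions for illustration, not taken from the text), a mollifier of this type could be written as follows.

```latex
% Assumptions (illustrative): \psi, \chi \in C_c^\infty(\mathbb{R}^d) and
% \theta \in C_c^\infty(\mathbb{R}) are nonnegative with unit integral;
% \chi is a hypothetical name for the velocity kernel.
\begin{align*}
\psi_\delta(x) &:= \delta^{-d}\, \psi\!\left(\tfrac{x}{\delta}\right), &
\chi_\varepsilon(v) &:= \varepsilon^{-d}\, \chi\!\left(\tfrac{v}{\varepsilon}\right), &
\theta_\eta(t) &:= \eta^{-1}\, \theta\!\left(\tfrac{t}{\eta}\right), \\
\phi(t,x,v) &:= \theta_\eta(t)\, \psi_\delta(x)\, \chi_\varepsilon(v), &
&\int \phi \, dt \, dx \, dv = 1 .
\end{align*}
```

With a product kernel of this kind, the three parameters $\eta, \delta, \varepsilon$ can be sent to zero at different rates, which is what the regime $\varepsilon = \varepsilon(\delta)$, $\eta = \eta(\delta)$ used later requires.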
Then m (R) satisfies the equation where the error term is given by Here, we use the standard commutator notation [Λ 1 , Convergence of the Error Term. We show that the error term E η,δ,ε,R defined by (5.2) converges to zero in the space L 1 t (L 1 + L q ) x,v , as R → +∞ and η, ε, δ → 0, under a certain relationship between these parameters.
For the first term, either for p = 1 or p = q, using the explicit formula for the commutator we estimate x,v , where we have used Lemma 5.3 in the third inequality. Thus by choosing ε = ε(δ) sufficiently small with respect to δ, we may ensure that For the second term, observe that for all R > 0, |v · D x ζ R | ≤ CR −1 and thus For the third term, for either p = 1 or p = q we have x,v . Taking ε = ε(δ) as above, we can then ensure this term converges to zero by choosing R = R(δ) sufficiently large with respect to δ and ε(δ). Thus, for this choice of ε(δ), R(δ), we have Altogether, we have found that there exists a regime R = R(δ) and ε = ε(δ) such that

Testing the Equation.
Using $m^{(R)}$ as a test function in the weak form of the equation for $u_l$, one obtains Using the equation satisfied by $m^{(R)}$, we have Next, note that $D_v u \in L^r_{\mathrm{loc}}(U_{m_0})$. By the chain rule for Lipschitz functions composed with Sobolev-regular functions, Thus, using the definition of distributional derivative we may integrate by parts to obtain Since $B(m, w)$ is finite, $w$ is absolutely continuous with respect to $m$. It follows that there exists Substituting this, we obtain We have shown above that there exists a regime $R = R(\delta)$ and $\varepsilon = \varepsilon(\delta)$ such that the final term converges to zero uniformly in $\eta$ as $\delta$ tends to zero, since $u_l \in L^\infty_t (L^\infty + L^{q'})_{x,v}$. We now discuss the convergence of the other terms.
Boundary Terms. We consider the boundary terms at t = 0, T . Note that Lemma 4.1 and Corollary 4.2 yield u l (0, ·) ∈ L q ′ + L ∞ (since (u 0 ) + ∈ L ∞ + L q ′ and u l (0, ·) is bounded below), while by Lemma 4.1 and Corollary 4.4 we have (β T ) l ∈ L s ′ + L ∞ We first show that m converges to m T in L 1 ∩ L q , in the limit as δ tends to zero for a certain regime η = η(δ) and R = R(δ), ε = ε(δ) according to the regime already found above.
For t = 0, T , we write We first note that m 0 ∈ (L 1 ∩ L q ) x,v by the assumption that it is a bounded probability density, while m T ∈ L s x,v since the energy B(m, w) is finite, and m T ∈ L 1 x,v since the continuity equation conserves mass.
We apply this in the case f = f i for i = 1, 2, where Consequently, for t = 0, T ,ˆM as δ → 0 with η, R, ε chosen to depend on δ in the manner specified. Finally, we take the limit l → −∞. For the term, t = 0, convergence holds by monotonicity, and the limit is finite since The second term on the right hand side converges due to the assumption on G (2.3), since the integrand is dominated by Term involving βm. Since m ∈ L 1 ∩ L q , by standard results on approximation by mollification in L p spaces we have lim (η,δ,ε)→0,R→+∞ m (R) − m L 1 ∩L q = 0, and thus the same limit holds with η, ε, R chosen to depend on δ as described above. Then, since β k ∈ L ∞ + L q ′ , we deduce that By the definition of Fenchel conjugate, We then take the limit l → −∞. Note F and F * are both lower bounded by integrable functions (conditions (2.2) and (3.1)). Then, by monotonicity, Since the lower bound is integrable and F (·, ·, m) has finite integral by finiteness of the energy, both of these limits are finite. Thus A similar argument shows that where the right hand side is finite.
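The inequality invoked "by the definition of Fenchel conjugate" is lost in the text above. A hedged sketch of the standard statement, assuming that $F^*(x,v,\cdot)$ denotes the convex conjugate of $F(x,v,\cdot)$ taken over nonnegative densities (this convention is an assumption here), reads:

```latex
% Assumed convention: conjugate taken in the last variable, over m >= 0.
\[
F^*(x,v,\beta) := \sup_{m \ge 0} \bigl\{ \beta\, m - F(x,v,m) \bigr\}
\quad \Longrightarrow \quad
\beta_k\, m \;\le\; F(x,v,m) + F^*(x,v,\beta_k)
\quad \text{for all } m \ge 0, \; k \le 0 .
\]
```

Integrating this pointwise bound is what allows the term involving $\beta_k m$ to be controlled by the two energies.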
Finally, we consider k → −∞. Note that sup β≤0 F * (x, v, β) ≤ C F (x, v) ∈ L 1 by assumption (see (3.1)). Since F * (β) is a continuous non-decreasing function of β, as k decreases to negative infinity F * (·, ·, β k ) is decreasing and converges almost everywhere to F * (·, ·, β). Thus we deduce the convergence By the bounds (3.1), the right hand side is finite. Moreover, for any k ≤ 0, Thus we conclude that Lagrangian Term. For the term involving the Lagrangian, we use a similar argument as was used in [14]. This argument is based on the joint convexity of L(x, v, −w/m)m as a function of (m, w). In our case we must additionally account for the convergence of the localizer ζ R . By convexity, for all (t, x, v), the integrand satisfies the inequality Then, note that .
It follows that if we take the regime R = R(δ), ε = ε(δ), η = η(δ) established above, then We stay with this regime and consider the remaining term The integrand converges to zero almost everywhere: note then that The right hand side converges in L 1 to Cm + |w| r ′ m 1−r ′ as δ tends to zero. Thus by dominated convergence we conclude that The reverse inequality follows from Fatou's lemma. Thus Finally, we take the limit l → −∞. Since L x, v, − w m m ∈ L 1 , in the limit we obtain Conclusion. From the discussion above, we have obtained where all terms are finite. Rearranging this inequality, we obtain the statement A(u, β) + B(m, w) ≥ 0.
Corollary 5.4. Let (u, β, β T ) ∈ K A and (m, w) ∈ K B be such that A(u, β, β T ) < +∞ and B(m, w) < +∞. Then In particular, Proof. This is a consequence of the proof of Lemma 5.2 and in particular the inequality (5.4). First, let us show the first part of the statement, i.e. that β − m ∈ L 1 ((0, As in the mentioned proof, let us first pass to the limit with η, δ, ε, R in the inequality (5.4). Then, we pass to the limit as l → −∞ and k → −∞ the remaining terms.
All the terms, except the ones involving $((\beta_T)_-)_l\, m_T$ and $(\beta_-)_k\, m\, \mathbb{1}_{\{u>l\}}$, pass to the limit as in that proof. After rearranging, we also find that both $\int_{M \times \mathbb{R}^d} ((\beta_T)_-)_l\, m_T \, dx\, dv$ and $\int_0^T \int_{M \times \mathbb{R}^d} (\beta_-)_k\, m\, \mathbb{1}_{\{u>l\}} \, dx\, dv\, dt$ are bounded uniformly in $l$ and $k$. Therefore, the monotone convergence theorem yields The summability results follow, and so does (5.7).
For the case of general $t \in [0, T]$, we begin by testing the equation to find that, for example, We note that we are referring to the version of $u_l$ that is weakly right continuous in time (see Appendix A). The only term that requires attention is For almost all $t \in [0, T]$, $u_l(t) \in (L^{q'} + L^\infty)(M \times \mathbb{R}^d)$ and $m_t \in L^1 \cap L^q$. Thus the arguments for the boundary terms in Lemma 5.2 show that The limit $l \to -\infty$ is then taken by monotone convergence, noting that $(u_t)_+ m_t \in L^1(M \times \mathbb{R}^d)$. Note that the argument for the case (5.6) shows that the limit is not negative infinity, since all other terms have finite limits.
Corollary 5.5. Let $(u, \beta, \beta_T) \in K_A$ and $(m, w) \in K_B$ be such that $A(u, \beta, \beta_T) < +\infty$ and $B(m, w) < +\infty$. Then , by a constant that depends only on the data and $A(u, \beta, \beta_T)$ and $B(m, w)$. (2) The following estimate holds: Proof. This is a consequence of (5.3). Using the same notation as in the proof of Lemma 5.2, we rewrite (5.3) as Let us observe that, for a parameter $\theta > 0$ to be chosen later, Young's inequality yields We notice that $-\frac{r'}{r} = 1 - r'$. Thus, by using the growth condition (2.1) on the Hamiltonian and choosing $\theta$ appropriately, we can conclude that there exists a constant $C > 0$ (independent of the parameters $l, k, R, \eta, \varepsilon, \delta$), such that after passing to the limit with $R, \eta, \varepsilon, \delta$, as in the proof of Lemma 5.2, we obtain Since the right hand side of this inequality is uniformly bounded, independently of $l$ (by Lemma 4.1(ii), Corollary 4.2(ii) and Remark 3.6), the result follows by Fatou's lemma by sending $l \to -\infty$.
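The explicit form of the Young inequality used in this proof is missing from the text. One plausible version, given here as a sketch consistent with the identity $-r'/r = 1 - r'$ noted in the proof (the precise splitting and constants used by the authors are not recoverable and are assumed), is:

```latex
% Young's inequality with parameter \theta > 0 and 1/r + 1/r' = 1,
% applied to a = |D_v u| m^{1/r} and b = |w| m^{-1/r} on the set {m > 0}.
\begin{align*}
|D_v u|\,|w|
 &= \bigl(\theta\, |D_v u|\, m^{1/r}\bigr)\,\bigl(\theta^{-1} |w|\, m^{-1/r}\bigr) \\
 &\le \frac{\theta^{r}}{r}\, |D_v u|^{r}\, m
   + \frac{\theta^{-r'}}{r'}\, |w|^{r'}\, m^{-r'/r}
  = \frac{\theta^{r}}{r}\, |D_v u|^{r}\, m
   + \frac{\theta^{-r'}}{r'}\, |w|^{r'}\, m^{1-r'} .
\end{align*}
```

Choosing $\theta$ small makes the first term absorbable by the coercivity of the Hamiltonian, while the second term is of the same form as the kinetic energy appearing in $B(m,w)$.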

Using (5.3), we have
Passing to the limit with R, η, ε, δ, as in the proof of Lemma 5.2, by Fatou's lemma we obtain Finally, taking the limit l, k → −∞ as in the proofs of Lemma 5.2 and Corollary 5.4, we obtain

Existence of a Solution of the Relaxed Problem
In this section we prove the existence of a solution for the relaxed problem. Consider a minimizing sequence (u n , β n , (β T ) n ) n . We will extract a convergent subsequence, and show that the limit constitutes a minimizer of the objective functional.
Proposition 6.1. Assume that:
• The family $(u_n)_n$ is uniformly bounded in $L^\infty_t L^1_{x,v,\mathrm{loc}}(U_{m_0})$.
• The family $(D_v u_n)_n$ is uniformly bounded in $L^r_{\mathrm{loc}}(U_{m_0})$.
• The family $(\beta_n)_n$ is uniformly bounded in $L^1_{\mathrm{loc}}(U_{m_0})$.
Then there exists a subsequence $(u_{n_j})_j$ that is strongly convergent in $L^1_{\mathrm{loc}}(U_{m_0})$. The proof of this result will be a consequence of some intermediate results that we detail below.
Remark 6.2. Let us notice that the assumptions of Proposition 6.1 hold true by Corollary 4.4 and Lemma 4.1.

Proposition 6.1 is proved by treating the Hamilton-Jacobi equation as a kinetic transport equation with right hand side bounded in
A form of compactness for the solutions can be obtained by using an averaging lemma. Averaging lemmas are results in kinetic theory showing that, for L p -bounded families of solutions to the kinetic transport equation, with L p -bounded source terms, the velocity averages enjoy additional fractional Sobolev regularity and/or strong L p -compactness. In our case we are in the setting p = 1, and we use an L 1 averaging result from [31]. It is necessary to assume a certain equi-integrability condition on the solution u n . This condition is defined below.
The required averaging lemma is quoted below. This result was proved in [31] for the stationary case, i.e. the equation v · D x u = β. The result can be adapted to the time dependent equation by standard techniques; see [32] or [41] for statements in the time dependent setting.
, the family of averages (ρ φ [u λ ]) λ∈Λ is relatively (strongly) compact in L 1 loc ([0, T ] × M). In our setting we expect to have local summability estimates in U m0 rather than [0, T ]× M× R d . To deal with this technicality we make use of a localisation procedure: given a compact set K, consider a smooth bump function ζ supported in K. If u λ satisfies the kinetic transport equation (6.2), then u λ ζ satisfies The right hand side of the above equation is bounded in , uniformly in λ, as long as u λ and β λ are bounded in L 1 loc (U m0 ). We wish to apply this to the solutions of the Hamilton-Jacobi equation. To do this, we verify the equi-integrability condition. To prove equi-integrability, we make use of the L r estimates available for the v-derivative D v u. Lemma 6.5. Let (u λ ) λ∈Λ be a bounded family in L ∞ t L 1 x,v,loc (U m0 ). Assume that (D v u λ ) λ∈Λ is a bounded family in L r loc (U m0 ). Then: Proof. We obtain higher integrability in the velocity variable by using Sobolev embedding. We first apply a localisation procedure. Given a compact set K ⊂ U m0 , let ζ K denote a smooth bump function with compact support contained in U m0 , such that ζ K takes values contained in [0, 1] and ζ K ≡ 1 on K. Then Let K ′ denote the support of ζ K . Then Moreover it is compactly supported.
We then apply Sobolev embedding in the v variable. Letting 1 * := d d−1 , we have Thus (u λ ) λ is uniformly bounded in L 1 t,x L 1 * v,loc . We now apply a bootstrap argument: it follows from the above that D v (u λ ζ K ) is bounded in This process can be repeated until we obtain that (u λ ) λ is bounded in L 1 t,x L r * v,loc , for 1/r * = 1/r − 1/d, if r < d; otherwise we may obtain L 1 t,x L α v,loc for any α < +∞. We now prove local equi-integrability. Let (A t,x ) (t,x)∈[0,T ]×M be a measurable family of measurable subsets of R d , such that |A t,x | < η for all t, x. Then where ′ denotes a Hölder conjugate exponent. From the condition on the measure of A t,x , we havê which proves equi-integrability.
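The Hölder step behind the equi-integrability claim is not displayed above. A minimal sketch, assuming the localised family $(u_\lambda \zeta_K)_\lambda$ is bounded in $L^1_{t,x} L^\alpha_v$ for some exponent $\alpha > 1$ (for instance $\alpha = r^*$ when $r < d$), could read:

```latex
% \alpha' denotes the Hölder conjugate of \alpha; |A_{t,x}| < \eta for all (t,x).
\begin{align*}
\int_0^T \!\! \int_{M} \int_{A_{t,x}} |u_\lambda \zeta_K| \, dv \, dx \, dt
 &\le \int_0^T \!\! \int_{M} |A_{t,x}|^{1/\alpha'} \,
      \| (u_\lambda \zeta_K)(t,x,\cdot) \|_{L^\alpha_v} \, dx \, dt \\
 &\le \eta^{1/\alpha'} \, \| u_\lambda \zeta_K \|_{L^1_{t,x} L^\alpha_v} .
\end{align*}
```

Since the norm on the right is bounded uniformly in $\lambda$, the left hand side tends to zero as $\eta \to 0$, uniformly over the family.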
It follows that, under the assumptions of Proposition 6.1, Theorem 6.4 can be applied to (ζ K u n ) n for any ζ K ∈ C ∞ c (U m0 ). We deduce strong L 1 loc -compactness for the averages (ρ φ [u n ζ K ]) n . We now use this to prove strong compactness for the full solutions u n . Lemma 6.6. Assume that the family (u n ) n satisfies the following: • (u n ) n share the same compact support K.
, the family of averages (ρ φ [u n ]) n is relatively (strongly) compact in L 1 loc . Then the family (u n ) n is relatively (strongly) compact in L 1 .
Proof. First, note that the first two assumptions imply the weak L 1 sequential compactness of (u n ) n . We pass (without relabelling) to a weakly convergent subsequence u n , and let u denote the weak limit. In the remainder of the proof we improve the mode of convergence of u n to u to strong convergence in L 1 .
Step 1: Approximation by smoothing in v. We approximate u n by a function that is smooth with respect to the v variable. Fix φ ∈ C ∞ c (R d ) and define, for ε > 0, Step 2: Compactness for the approximations. Fix v * ∈ R d . For any ψ ∈ L ∞ t,x with compact support we consider testing the sequence (u n ) n against the test function we deduce that u n,ε (·, ·, v * ) converges weakly in L 1 t,x,loc to u * v φ ε (·, ·, v * ) as n → +∞. Note moreover for each fixed v ∈ R d , u n,ε (t, x, v) is a velocity average with respect to the test function φ ε (v − ·). Therefore, by Theorem 6.4 the convergence in fact holds strongly in L 1 t,x,loc for each v ∈ R d .
Furthermore, for fixed ε > 0, the family (u n,ε ) n is equi- x,v |h|. By an Arzelà-Ascoli argument the convergence therefore holds locally uniformly in v, with respect to the strong topology on L 1 t,x,loc : that is, for all compact sets Consequently, the convergence holds in L 1 loc ; in fact, since u n,ε − u * φ ε is supported for all n in K + B Cε (0), the convergence holds in L 1 .
Step 3: Removing the approximation. The bound on D v u n implies that, for any h ∈ R d , Indeed, by definition of u n,ε , x, ·) L r v , then, one integrates in t, x and takes supremum.
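The displayed estimates of Step 3 are missing above. As a hedged sketch, assuming $u_{n,\varepsilon} = u_n *_v \varphi_\varepsilon$ with $\varphi_\varepsilon(v) = \varepsilon^{-d} \varphi(v/\varepsilon)$, $\varphi \ge 0$, $\int \varphi = 1$ and $\operatorname{spt} \varphi \subset B_C(0)$ (these normalisations are assumptions), the standard argument runs as follows.

```latex
% Translation estimate in v for W^{1,r} functions, then Minkowski's inequality.
\begin{align*}
\| u_n(t,x,\cdot + h) - u_n(t,x,\cdot) \|_{L^r_v}
 &\le |h| \, \| D_v u_n(t,x,\cdot) \|_{L^r_v}, \\
u_{n,\varepsilon}(t,x,v) - u_n(t,x,v)
 &= \int_{\mathbb{R}^d} \varphi_\varepsilon(h)\,
    \bigl( u_n(t,x,v-h) - u_n(t,x,v) \bigr) \, dh, \\
\| u_{n,\varepsilon}(t,x,\cdot) - u_n(t,x,\cdot) \|_{L^r_v}
 &\le \sup_{|h| \le C\varepsilon}
    \| u_n(t,x,\cdot - h) - u_n(t,x,\cdot) \|_{L^r_v}
  \le C \varepsilon \, \| D_v u_n(t,x,\cdot) \|_{L^r_v} .
\end{align*}
```

The error introduced by smoothing in $v$ is therefore of order $\varepsilon$, uniformly in $n$, once the bound on $D_v u_n$ is used.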
Proof of Proposition 6.1. The proposition follows by applying Lemma 6.6 to $(\zeta_K u_n)_n$.
It remains to obtain the necessary convergence of (β n ) n and (β T,n ) n .
Lemma 6.7. Let (u n , β n , β T,n ) be a minimizing sequence for Problem 3.7. There exists a modification (u n ,β n ,β T,n ) of this sequence that is also minimizing such that (β n ) n is weakly precompact in L 1 loc (U m0 ) and (β T,n ) n is weakly precompact in L 1 loc (M × R d ).
Proof. We replace $\beta_n$ with some $\tilde\beta_n \ge \beta_n$ and $(\beta_{T,n})_n$ with $\tilde\beta_{T,n} \ge \beta_{T,n}$ such that $(\tilde\beta_n, \tilde\beta_{T,n})$ is uniformly integrable, and $(u_n, \tilde\beta_n, \tilde\beta_{T,n})$ is still a minimizing sequence. We do this in a similar manner to [14]: since $(\beta_n)_-$ is bounded in $L^1_{\mathrm{loc}}(U_{m_0})$, using a compact exhaustion of $U_{m_0}$ and a diagonal argument, by [45] it is possible to pass to a subsequence such that the following holds for some $J_n \in \mathbb{R}$. We define $\tilde\beta_n$ by Then it is possible to choose $J_n$ in such a way that:
• For each compact set $K \subset U_{m_0}$, the sequence $(\tilde\beta_n)_- \mathbb{1}_K$ is uniformly integrable.
• The measure of the set {(β n ) − > J n } ∩ K converges to zero as n tends to infinity.
We use exactly the same construction for $\tilde\beta_{T,n}$ and obtain the same properties (now taking $K \subset M \times \mathbb{R}^d$).
We notice that, by construction, the constraints and $u_{T,n} \le \tilde\beta_{T,n}$ are still satisfied. Finally, By the estimate (3.1), the integrand on the right hand side is dominated by $2 C_F \in L^1$, and thus the right hand side converges to zero as $n$ tends to infinity. The same arguments apply to $G^*$ and $\tilde\beta_{T,n}$. Thus $(u_n, \tilde\beta_n, \tilde\beta_{T,n})$ is a minimizing sequence. Moreover, there exists $(u, \beta, \beta_T)$ such that, up to passing to a subsequence, $(u_n)_n$ converges to $u$ strongly in $L^1_{\mathrm{loc}}(U_{m_0})$, $(\tilde\beta_n)_n$ converges weakly to $\beta$ in $L^1_{\mathrm{loc}}(U_{m_0})$ and $(\tilde\beta_{T,n})_n$ converges to $\beta_T$ weakly in $L^1_{\mathrm{loc}}(M \times \mathbb{R}^d)$.

6.2. Existence of a minimizer of $A$ over $K_A$.
In this subsection, we prove that there exists a minimizer $(u, \beta, \beta_T)$ by passing to the limit in the functional.
Theorem 6.8. Under our standing assumptions, the functional $A$ admits a minimizer over $K_A$.
Proof. Let (u n , β n , β T,n ) n∈N be a minimizing sequence. Without loss of generality, for example by considering u n ∈ C 1 , we may assume equality in the Hamilton-Jacobi equation: For this minimizing sequence we have, for some constant C > 0, sup n A(u n , β n , β T,n ) ≤ C.
We have discussed that this implies uniform in n bounds on the following quantities: . To get the uniform integrability on (β n ) − and (β T,n ) − , we perform the surgery argument as in Lemma 6.7. So, let (u n ,β n ,β T,n ) n∈N be the modification of the minimizing sequence (which will still have uniformly bounded energy). By Proposition 6.1 we know that (u n ) n∈N is strongly precompact in L 1 loc (U m0 ), while Lemma 6.7 yields that (β n ) n∈N and (β T,n ) n∈N are weakly precompact in L 1 loc (U m0 ) and L 1 loc (M × R d ), respectively. In particular, after passing to a subsequence let us denote by u the strong L 1 loc (U m0 ) limit of (u n ) n . In what follows, to ease the notation, we drop the tilde symbol, but whenever we write β n and β T,n , we mean the corresponding modified versions.
Passing to further subsequences (that we do not relabel), there exist limit functions so that we may also assume the following weak convergences: , as n → +∞. With these convergences in hand, we are ready to pass to the limit in the Hamilton-Jacobi inequality constraint and the functional. Note that the weak form of the inequality (Definition 3.8) implies that, for all n and all test functions Note again that φ is compactly supported in U m0 . By the weak convergence of D v u n in L r loc (U 0 ) and the convexity of H it follows that All the other convergences stated above are sufficient to guarantee convergence against φ. So, we obtain that the limit (u, β, β T ) satisfies (3.4).
Next, we consider the convergence in the functional. In addition to the previous convergences, along the previously chosen subsequence, we have The convergence of the sequence $((u_{0,n})_- m_0)_n$ requires special attention. The boundedness of this sequence in $L^1(M \times \mathbb{R}^d)$ lets us conclude that there exists a nonnegative Radon measure $\nu$ such that after passing to a subsequence (that we do not relabel) This means in particular that for all $\varphi \in C_c(M \times \mathbb{R}^d)$, we have Since the sequence $((u_{0,n})_- m_0)_{n \in \mathbb{N}}$ is supported in the open set $\{m_0 > 0\}$, we get that $\operatorname{spt}(\nu) \subseteq \operatorname{spt}(m_0)$. Now, let us take $\varphi \in C_c(\{m_0 > 0\})$ arbitrary and define $\psi := \varphi / m_0$. Since $m_0 \in C(M \times \mathbb{R}^d)$ by assumption, we have that $\psi \in C_c(\{m_0 > 0\})$ and so Thus, as $n \to +\infty$, $(u_{0,n})_-$ converges weakly-$*$ to the nonnegative Radon measure $(u_0)_- := \frac{1}{m_0} \cdot \nu$, i.e. $(u_0)_-$ has density $\frac{1}{m_0}$ with respect to $\nu$. We notice that this means that $(u_0)_-$ is absolutely continuous with respect to $\nu$. In fact, we also have that $\nu$ is absolutely continuous with respect to $(u_0)_-$, and so we can write $\nu = m_0 \cdot (u_0)_-$.
Let us take now φ ∈ C ∞ c (U m0 ), and test the inequalities satisfied by (u n , β n , β T,n ), similarly to (6.3), to obtain T 0ˆM×R d Incorporating also the previously described convergence of (u 0,n ) n , we can pass to the limit along the chosen subsequence and obtain T 0ˆM×R d where u 0 := (u 0 ) + − (u 0 ) − . We notice that u 0 is a signed Radon measure, supported in spt(m 0 ).
Having in hand this last inequality, we readily check that the assumptions of Lemma A.11 are fulfilled with the choice of β 0 = u 0 and β T as before. This means in particular that u satisfies where when writing the traces u 0 and u T , we are referring to the right continuous version of u in time. Since by construction, u 0 , m 0 =´M ×R d (u 0 ) + m 0 dxdv − (u 0 ) − , m 0 is finite, we have that u 0 , m 0 is meaningful and finite, with − u 0 , m 0 ≤ − u 0 , m 0 .
Lower semicontinuity of the term involving $-\int_{M \times \mathbb{R}^d} m_0 \, u_0(dx\, dv)$.
Proof of Claim. First, notice that since u 0 − u 0 is a positive distribution, it can be represented by a Radon measure. We may therefore write, for some ν 0 ∈ M + (M × R d ), such that spt(ν 0 ) ⊆ spt(m 0 ) and It follows that the Hahn-Jordan decomposition of u 0 satisfies

Now consider any compactly supported function
Then take a non-decreasing sequence of functions ζ k such that ζ k converges pointwise to the indicator function of the set {m 0 > 0} as k tends to infinity: consider for example functions such that This is always possible since m 0 is continuous. Then, by monotone convergence, we indeed havê as desired and the claim follows.
By the weak star convergence of (u 0,n ) + to (u 0 ) + in (L ∞ + L q ′ )(M × R d ), we also have that, for u 0 , the positive part (u 0,n ) + m 0 converges to (u 0 ) + m 0 strongly in L 1 (M × R d ). Since −u 0 ≤ −u 0 as signed measures, we deduce that The term involving G * .
For the term involving G * , we notice that by convexity First note that, for functions β such that β + ∈ L q ′ ([0, T ] × M × R d ) and β − ∈ L 1 loc ([0, T ] × M × R d ), by (3.1) the following inequality holds: Thus (6.6)ˆT The integrand is bounded by the L 1 function |F * (x, v, β)| and converges to zero almost everywhere as δ tends to zero. Thus, taking δ → 0 we obtain (6.6). A similar equality holds for all β n . Therefore, by the convexity of F * (and by arguments similar to the one for G * ), we conclude that Thus, collecting all the previous arguments, one deduces that The thesis of the theorem follows. Corollary 6.9. In the setting and notation of the previous theorem, in fact u 0 = u 0 on {m 0 > 0}.
Proof. Since (u, β, β T ) is a minimizer, where in the last equality we have used that (u n , β n , β T,n ) is a minimizing sequence.
All the above inequalities are therefore equalities. From the inequalities (6.4), (6.5) and (6.7) for each of the terms, we deduce that It follows that u 0 = u 0 as signed measures on {m 0 > 0}. Indeed, first note that u 0 ≥ u 0 as signed measures, or in other words u 0 − u 0 is a nonnegative measure. For any non-negative test function φ 0 ∈ C c ({m 0 > 0}) we have m 0 ≥ ε > 0 on the support of φ 0 , for some ε > 0. Thus there exists a constant C such that φ 0 ≤ Cm 0 . Thus Thus u 0 = u 0 as signed measures on {m 0 > 0}.

Existence and uniqueness of a solution to the MFG system
In this section we prove Theorem 2.4. First, we show that the minimizers of Problems 3.2 and 3.7 that we have obtained in the previous sections provide weak solutions (u, m) of the MFG.
Theorem 7.1. Let (u, β, β T ) be a minimizer of A over K A and let (m, w) be a minimizer of B over As a consequence, (u, m) is a weak solution to (1.1) in the sense of Definition 2.3.
Proof. By Theorem 5.1, Substituting the definitions of the functionals, we obtain Fenchel's inequality then implies that By Corollary 5.4, the left hand side is non-negative, and therefore equality holds: Moreover, equality also holds almost everywhere in the applications of Fenchel's inequality. Thus the following hold almost everywhere in [0, T ] × M × R d : By (7.1) and Corollary 5.5, By Fenchel's inequality, the integrand on the right hand side is non-negative; we deduce that equality holds in the above estimate and thus the integrand is equal almost everywhere to zero. It follows that almost everywhere on the support of m. Moreover, The energy equality then follows from substituting (7.2) and (7.3) into (7.1).
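The pointwise identities obtained from equality in Fenchel's inequality are not displayed above. The general principle can be sketched as follows, assuming that $F(x,v,\cdot)$ is convex and differentiable on $(0,+\infty)$ with $\partial_m F = f$ (this relation between $F$ and the coupling $f$ is an assumption here, suggested by the later identification $\beta = f(\cdot,\cdot,m)$):

```latex
% Equality case in the Fenchel--Young inequality, in the last variable.
\[
F(x,v,m) + F^*(x,v,\beta) = \beta\, m
\quad \Longleftrightarrow \quad
\beta \in \partial_m F(x,v,m),
\]
\[
\text{so that, on } \{ m > 0 \}, \qquad \beta = f(x,v,m)
\quad \text{almost everywhere.}
\]
```

The same reasoning applied to $G$ and $G^*$ at the final time would give the corresponding identification of $\beta_T$ on $\{m_T > 0\}$.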
Again, by the summability of the right hand side, we find that $m f(\cdot, \cdot, m)_- \in L^1([0, T] \times M \times \mathbb{R}^d)$. Using exactly the same arguments for $G^*$, we find similarly that $(\beta_T)_+ \in L^{s'}(M \times \mathbb{R}^d)$ and $m_T (\beta_T)_- \in L^1(M \times \mathbb{R}^d)$.
Moreover, we have that D v u ∈ L r loc (U m0 ), m ∈ L 1 ([0, T ]× M× R d ) and w ∈ L 1 ([0, T ]× M× R d ; R d ), so (m, w) and (u, β, β T ) are admissible competitors for the two optimisation problems. Now, take (u, β, β T ) as an admissible competitor for the problem involving the functional A. By the convexity and differentiability of F * and G * in their last variable we have where we have used the fact that mf (·, ·, m) ∈ L 1 ([0, T ] × M × R d ) and m T g(·, ·, m T ) ∈ L 1 (M × R d ) (by the arguments at the beginning of this proof). Moreover,
Using the very same ideas and the convexity of F and G, we can conclude similarly that (m, w) must be a minimizer in Problem 3.2.
Finally, we show that solutions in the sense of Definition 2.3 are unique, again following similar ideas as the corresponding ones from [15]. One major difference, however, is that we develop a suitable comparison principle for the distributional solutions to the corresponding Hamilton-Jacobi inequalities. This completes the proof of Theorem 2.4.
Proof of Theorem 2.4. The existence of a weak solution $(u, m)$ follows from combining Theorem 6.8 (existence of a minimizer for $\tilde{A}$), Theorem 5.1 (duality, and the fact that the infimum for $\tilde{B}$ is attained) and Theorem 7.1 (minimizers are weak solutions in the sense of Definition 2.3).
To show that $u_1 = u_2$ almost everywhere on the set $\{m > 0\}$, we first define $u = \max\{u_1, u_2\}$. By Lemma B.2, $u$ also satisfies the Hamilton-Jacobi inequality, with $\beta = f(\cdot, \cdot, m)$ and $\beta_T = g(\cdot, \cdot, m_T)$. Since $u_i \le u$ for $i = 1, 2$, we have and thus $A(u, \beta, \beta_T) \le A(u_i, \beta, \beta_T)$. Since $u_i$ is a minimizer, equality holds. By duality, equality then holds in the energy inequalities of Corollary 5.4 for $u$ and $m$, with $\beta, \beta_T, w$ as defined previously. Thus, for almost all $t \in [0, T]$, The same is true replacing $u$ by $u_i$, and so Thus, since also $u_i \le u$, we deduce that $u_i = u$ almost everywhere on the set $\{m > 0\}$.
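The last implication can be spelled out as a short sketch. It assumes that the two lost displays give equality of the pairings of $u$ and of $u_i$ with $m_t$; this reading is inferred from the surrounding sentences and is not confirmed by the text.

```latex
% For almost every t, with u = \max\{u_1, u_2\} \ge u_i and m_t \ge 0:
\[
\int_{M \times \mathbb{R}^d} \bigl( u(t) - u_i(t) \bigr)\, m_t \, dx \, dv = 0,
\qquad \bigl( u(t) - u_i(t) \bigr)\, m_t \ge 0
\quad \Longrightarrow \quad
u(t) = u_i(t) \ \text{ a.e. on } \{ m_t > 0 \}, \quad i = 1, 2,
\]
\[
\text{and hence } u_1 = u = u_2 \ \text{ almost everywhere on } \{ m > 0 \}.
\]
```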

Sobolev estimates on the solutions
In this section, we obtain Sobolev estimates on the optimizers of the variational problems, and hence on weak solutions for the MFG system (1.1). The general idea is to "compare" the optimality of the optimizers in the variational problems with their carefully chosen translates. Then using strong convexity of the data one can deduce differential quotient estimates.
These results are inspired by [38,39]. However, because of the kinetic nature of the model we need completely new ideas when we consider perturbations. So, the estimates that we obtain are on suitable kinetic differential operators applied to the solutions. Another crucial difference between our results and the ones in [38,39] is that our Sobolev estimates in the x and v variables are local in time on (0, T ]. The main reason behind this is that we have a weaker notion of trace for u 0 , that we cannot ensure to be stable under perturbations. This imposed further technical complications that require us to work in the case of r = 2. We emphasize that these estimates are consequences of the stronger convexity and regularity assumptions on the data stated in Assumption 2. For competitors (m, w) in Problem 3.2, without loss of generality one might assume the representation w = V m, for a suitable vector field V . Let δ ∈ R d with |δ| ≪ 1 and define We use the notation w δ := V δ m δ .
Similarly, for competitors (u, β, β T ) in Problem 3.7 we define When computing the Legendre transforms of these functions in their last variables we obtain Let us notice that H δ satisfies in particular the hypotheses imposed in Assumptions 1. Correspondingly, we define the functionals A δ and B δ and the constraint sets K δ A and K δ B , using the shifted versions of the data functions. By construction, as a consequence of a change of variable formula, the proof of the following lemma is immediate.
We provide the proof of only one of the statements; the others follow similar steps. Suppose that $(m^\delta, w^\delta)$ is an optimizer of $B^\delta$ over $K^\delta_B$. This means in particular the minimality of the quantity where in the last equality we have used the change of variables $(x, v) \mapsto (x - \eta(t)\delta, v - \zeta(t)\delta)$. Thus the minimality of $(m^\delta, w^\delta)$ yields, after a change of variables, the minimality of $(m, w)$.
Remark 8.3. As for Theorem 2.5, this is an informal statement: the result we obtain is on suitable difference quotients as in estimate (8.8) below.
Proof of Theorem 8.2. Let (u n , β n , β T,n ) n∈N be a minimizing sequence for Problem 3.7 such that u n ∈ Let us recall that after passing to a subsequence, that we do not relabel, as a consequence of Proposition 6.1, Lemma 6.7 and by Claim 2 in the proof of Theorem 7.1, we have that Notice that the previous arguments imply also that the subsequence can be chosen such that for all M < 0 Furthermore, by Theorem 7.1, we have that β = f (·, ·, m) and β T = g(·, ·, m T ). Let w = mD pv H(·, ·, D v u).
Fix $\delta \in \mathbb{R}^d$ such that $|\delta| \ll 1$ and $\zeta : [0, T] \to \mathbb{R}$ as described at the beginning of this subsection. The main idea is to use $u^\delta_n$ as a test function in the weak formulation of the equation satisfied by $(m, w)$, and $u_n$ as a test function in the weak form of the equation satisfied by $(m^\delta, w^\delta)$. Then we combine these inequalities with the energy equality (2.4) written for $(m, w)$ and $(m^\delta, w^\delta)$, respectively, and rely on the strong convexity and regularity properties of the data to deduce a differential quotient estimate.
Following these steps, we obtain
We combine this with the energy equality (2.4) for $(m, w)$ to get
Similarly, using $u_n$ as a test function in the weak form of the equation for $(m^\delta, w^\delta)$ and combining with (2.4) for $(m^\delta, w^\delta)$,
Adding (8.2) and (8.3), after some changes of variables (translations) and a Taylor expansion of $L$, we deduce
where we have also used the fact that, by the choice of $\eta$ and $\zeta$, we have $u^\delta_{0,n} = u_{0,n}$, $u^\delta_0 = u_0$ and $m^\delta_0 = m_0$. Our aim now is to pass to the limit $n \to +\infty$ in (8.4) and derive a difference quotient estimate. For this, we consider each of the terms separately.
Step 1. First, we notice that by (H7) and by the fact that $|w|^{r'}$
Step 2. Second, let us notice that by the fact that $m \in (L^1 \cap L^q)([0, T] \times M \times \mathbb{R}^d)$ and by (8.1), for any $M < 0$ we have
where we have used the fact that $f(m)m$, $(f^\delta(m^\delta))_+$, $(f^{-\delta}(m^{-\delta}))_+ \in L^1$, so that the integrand is bounded above by an $L^1$ function, which allows us to apply the monotone convergence theorem. Since the left hand side of inequality (8.4) is bounded from below by zero, it follows that the right hand side of (8.5) is not negative infinity.
By the very same arguments one can conclude that lim sup
Step 3. By Young's inequality, we have
where $c > 0$ is an arbitrary constant, and $C = C(c, T, \zeta') > 0$.
Step 4. By the previous steps we can conclude that
is uniformly bounded above, independently of $n \in \mathbb{N}$. Let us recall that $|w|^2$
Using the growth condition on $H$, by choosing $c > 0$ small enough in our application of Young's inequality we deduce that $D_v u^{\pm\delta}$. By a change of variable, one can similarly deduce that $D_v u_n$ is uniformly bounded in $L^2_{m^{\pm\delta}}((0, T) \times M \times \mathbb{R}^d; \mathbb{R}^d)$.
Claim. After passing to a subsequence that we do not relabel, we have
Proof of Claim. By the uniform boundedness of the sequence, we know that there exist a subsequence (that we do not relabel) and a weak limit $\xi \in L^2_m((0, T) \times M \times \mathbb{R}^d; \mathbb{R}^d)$, i.e.
Thus, we aim to show that $\xi = D_v u^{\pm\delta}$. As $D_v u^{\pm\delta}_n \rightharpoonup D_v u^{\pm\delta}$ weakly in $L^2_{\mathrm{loc}}(U_{m_0})$ as $n \to +\infty$, we can argue as in the proof of Claim 2 in the proof of Theorem 7.1 to deduce the claim.
Step 5. By summarizing, (8.4)
Using the additional assumption (2.10) and the inequality $|a + b|^2 \le 2(a^2 + b^2)$, for $c > 0$ sufficiently small, one can conclude that there exists $c_0 > 0$ depending only on the data, such that
Now, our aim is to pass to the limit as $n \to +\infty$ first in (8.6). For this we take $\liminf_{n \to +\infty}$ of the left hand side and $\limsup_{n \to +\infty}$ of the right hand side. We notice that the term $-\int_{M \times \mathbb{R}^d} 2 u_{0,n} m_0 \,dx\,dv$ needs special attention, since we do not have upper semicontinuity of it. Because of this, we add to both sides of (8.6) the quantity $\int_{M \times \mathbb{R}^d} G^*(\beta_{T,n}) \,dx\,dv$ before passing to the limit. Thus we obtain
All the arguments in the previous steps allow us to pass to the limit. By the fact that $(u_n, \beta_n, \beta_{T,n})$ is a minimizing sequence, we get that
So, after simplification, one obtains
Now, using (2.5) and (2.7), the very same arguments as in [38, computation (4.25)] yield
Similarly, (2.6) and (2.8) yield
Combining these estimates with (8.7), we get
Dividing by $|\delta|^2$ and letting $\delta \to 0$, we easily obtain the result.
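For the reader's convenience, the elementary inequality $|a + b|^2 \le 2(a^2 + b^2)$ used in Step 5 can be checked in one line. The computation below is written for real numbers $a, b$; for vectors one may apply it to $|a|$ and $|b|$ after the triangle inequality.
\[
|a + b|^2 = a^2 + 2ab + b^2 \le a^2 + (a^2 + b^2) + b^2 = 2(a^2 + b^2),
\qquad \text{since } a^2 + b^2 - 2ab = (a - b)^2 \ge 0 .
\]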
8.1.1. Recovering estimates on the operator $(tD_x + D_v)$ applied to solutions. By choosing a specific structure for the cut-off function $\zeta$, we can recover estimates on more particular differential operators. Suppose that $\zeta(t) = 0$ for $t \in [0, t_0/2]$ and $\zeta(t) = 1$ for $t \in (t_0, T]$, for some $t_0 \in (0, T)$ (which can be chosen arbitrarily), in such a way that also $\eta(t_0) = t_0$. Then in Theorem 8.2, the operator $(\eta D_x + \zeta D_v)$ becomes $(tD_x + D_v)$ for $t > t_0$. So, one can state the following local in time corollary.
Corollary 8.4. Suppose that the assumptions of Theorem 8.2 hold. Then there exists $C > 0$ such that

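The form of the operator appearing in Corollary 8.4 can be checked by integrating the relation between the two cut-off functions. The following short computation is only a sketch; it assumes, as in the construction at the beginning of this subsection, that $\zeta = \eta'$ on $[0, T]$, together with the normalization $\eta(t_0) = t_0$ and $\zeta \equiv 1$ on $(t_0, T]$ chosen above.
\[
\eta(t) = \eta(t_0) + \int_{t_0}^{t} \zeta(s)\,ds = t_0 + (t - t_0) = t
\qquad \text{for } t \in (t_0, T],
\]
so that $\eta(t) D_x + \zeta(t) D_v = t D_x + D_v$ on $(t_0, T]$.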
8.1.2. Recovering estimates on the operator $D_x$ applied to solutions. Now suppose that $\eta(t) = 0$ for $t \in [0, t_0/2]$ and $\eta(t) = 1$ for $t \in (t_0, T]$ (where $t_0 \in (0, T)$ can be chosen arbitrarily). We still require that $\zeta := \eta'$. With this choice of cut-off functions $\eta, \zeta$, we can formulate the following result as a corollary of Theorem 8.2.
for any h Sobolev function.
Appendix A. Time Regularity
In this appendix, we collect some facts about the regularity with respect to time of solutions $u$ of
By this we mean that, for any non-negative test function
What we discuss is close to the standard theory of distributional solutions. However, in our case technical difficulties arise since, firstly, (A.1) is an inequality and, secondly, we wish to work on the atypical domain $U_{m_0}$. We therefore found it useful to clarify several points. Our main goal is to give a precise sense to the specification of boundary data for this problem at time $t = T$, and to give a meaning to $u_0$ (the 'value of $u$ at time $t = 0$'), which appears in the functional $A$ defining the variational problem. Throughout this appendix we impose the following summability conditions on the pair $(u, \beta) \in L^1_{\mathrm{loc}}(U_{m_0}) \times L^1_{\mathrm{loc}}(U_{m_0})$ and that $H$ satisfies (2.1).
Such a construction is classical, but because of the lack of a precise reference in our context, we sketch the main ideas here. Take a countable dense set $Z \subset C^1_c(M \times \mathbb{R}^d)$; there is a full measure set $E \subset (0, T)$ such that $\langle \psi, u(t) \rangle = f_\psi(t)$ for all $t \in E$ and all $\psi \in Z$, and moreover $u(t) \in L^1_{\mathrm{loc}}(M \times \mathbb{R}^d)$ (the latter is true for almost all $t$ since $u$ is $L^1_{\mathrm{loc}}$). Then $\langle \tilde u(t), \psi \rangle := f_\psi(t)$ defines a bounded linear functional on $Z$, for all $t \in E$. The estimate (A.5) can be used to show that this is in fact true for all $t \in (0, T)$. The resulting functional $\tilde u(t)$ extends by density to a continuous linear functional on $C^1_c(M \times \mathbb{R}^d)$. Then the estimate (A.5) can be used to prove that $\langle \tilde u(t), \psi \rangle$ is right continuous for all $\psi \in C^1_c(M \times \mathbb{R}^d)$, not just for $\psi \in Z$.
Next, we construct the extension of $\tilde u$ to the boundaries $t = 0, T$.
Remark A.3 (Group property). For any $s, t \in \mathbb{R}$, $T_s T_t = T_{s+t}$.
Lemma A.4. Let $u$ be a solution to (A.1) and let $\tilde u$ be its right continuous representative, obtained in Lemma A.1. Let $\psi \in C^1_c(M \times \mathbb{R}^d)$ be non-negative. Consider the function $(0, T) \ni t \mapsto \langle T_t \psi, \tilde u(t) \rangle$. Then
(1) As $t$ tends to $T-$, $\langle T_t \psi, \tilde u(t) \rangle$ either tends to a finite limit or to positive infinity.
(2) As $t$ tends to $0+$, $\langle T_t \psi, \tilde u(t) \rangle$ either tends to a finite limit or to negative infinity.
Proof. Observe that
It follows that $\frac{d}{dt}$
Then the negative part of the time derivative satisfies
Thus $\langle T_t \psi, \tilde u(t) \rangle$ can be written as the difference of monotone functions, where the decreasing part is absolutely continuous on $(0, T)$ and can be extended to finite limits at the endpoints. By monotonicity, the increasing part either has a finite limit at $t = T$, or tends to positive infinity; similarly, at $t = 0$ it either has a finite limit or tends to negative infinity.
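The decomposition invoked at the end of this proof can be made explicit as follows. This is only a sketch: it assumes that the lower bound on the time derivative takes the form quoted in the proof of Lemma A.9, that $C_H \ge 0$, and that the function $g$ defined below is integrable on $(0, T)$. The names $F$, $g$, $D$ and $I$ are introduced here purely for illustration.
\[
F(t) := \langle T_t \psi, \tilde u(t) \rangle, \qquad
g(t) := \langle T_t \psi, C_H + \beta_+(t) \rangle \ge 0, \qquad
D(t) := \int_0^t g(s)\,ds, \qquad
I(t) := F(t) + D(t).
\]
\[
\frac{d}{dt} F \ge -g \ \text{ in } \mathcal{D}'((0, T))
\quad \Longrightarrow \quad
\frac{d}{dt} I \ge 0,
\qquad F = I - D .
\]
Under these assumptions $I$ is non-decreasing, while $D$ is absolutely continuous, non-decreasing and bounded by $\|g\|_{L^1(0,T)}$, so $-D$ is the decreasing part and has finite limits at both endpoints. Monotonicity of $I$ then gives $\lim_{t \to T-} I(t) \in (-\infty, +\infty]$ and $\lim_{t \to 0+} I(t) \in [-\infty, +\infty)$, which is the dichotomy stated in Lemma A.4.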
These define linear maps from $C^1_c(M \times \mathbb{R}^d)$ to $\mathbb{R} \cup \{-\infty\}$ in the case of $u_0$, and to $\mathbb{R} \cup \{+\infty\}$ in the case of $u_T$.
We now suppose that, in addition to the weak Hamilton-Jacobi inequality in the interior (A.2), $u$ satisfies the following: for all non-negative test
In our setting, we will have that $\beta_T \in L^1_{\mathrm{loc}}(M \times \mathbb{R}^d)$ is a given function whose positive part satisfies $(\beta_T)_+ \in L^{s'}(M \times \mathbb{R}^d)$. In this case, we show below that the time trace $u_T$ enjoys some more properties.
Since $\beta_T - u_T$ is a positive linear functional on $C^1_c(M \times \mathbb{R}^d)$, it is bounded, and therefore $u_T$ is also a bounded linear functional on $C^1_c(M \times \mathbb{R}^d)$.
We now discuss the trace of $u$ at $t = 0$, namely $u_0$ as defined in Definition A.6. The pairing $\langle \psi, u_0 \rangle$ is defined for all $\psi \in C^1_c(M \times \mathbb{R}^d)$. Our aim is to give a meaning to the quantity $\langle m_0, u_0 \rangle$, which appears in the definition of the functional $A$. In the case where $m_0 \in C^1_c(M \times \mathbb{R}^d)$ this is straightforward, noting that we allow the possible value $-\infty$. We now consider the more general case where $m_0 \in C(M \times \mathbb{R}^d)$.
Lemma A.9. Assume that, for all $\phi \in C^1_c(\{m_0 > 0\})$, $\langle \phi, u_0 \rangle \neq -\infty$. Then $u_0$ is represented by a Radon measure on $\{m_0 > 0\}$. Furthermore, the positive part $(u_0)_+$ has the property that
Proof. Let $\phi \in C^1_c(M \times \mathbb{R}^d)$ be non-negative. Since $\frac{d}{dt} \langle T_t \phi, \tilde u(t) \rangle \ge -\langle T_t \phi, C_H + \beta_+ \rangle$, we have
\[
\langle \phi, u_0 \rangle \le \int_0^T \langle T_t \phi, C_H + \beta_+ \rangle \,dt + \langle T_T \phi, \beta_T \rangle =: S\phi .
\]
The right hand side is linear in $\phi$ and satisfies
\[
\left| \int_0^T \langle T_t \phi, C_H + \beta_+ \rangle \,dt + \langle T_T \phi, \beta_T \rangle \right|
\le \|\phi\|_{L^\infty} \left( \|C_H + \beta_+\|_{L^1(K_{[0,T]})} + \|\beta_T\|_{L^1(K_T)} \right);
\]
here $K_T$ denotes the set
where $K$ is the support of $\phi$, and $K_{[0,T]}$ is the set
Thus $S$ defines a bounded linear functional on $C_c(M \times \mathbb{R}^d)$. In particular it is a distribution; moreover, it is represented by a signed Radon measure. Observe next that $S - u_0$ is a positive linear functional on $C^1_c(\{m_0 > 0\})$, and thus bounded and a distribution. By positivity it is given by a Radon measure $\nu_0$ on $\{m_0 > 0\}$. We deduce that
That is, $u_0$ is a signed Radon measure.
Moreover, from the definition of the Hahn-Jordan decomposition we have the following estimate for the positive part:
Let $\phi_n \in C_c(M \times \mathbb{R}^d)$ be an increasing sequence of functions such that $\phi_n$ converges to $m_0$ as $n \to \infty$. Since
\[
\sup_n \langle \phi_n, (u_0)_+ \rangle
\le \int_0^T \langle T_t m_0, C_H + \beta_+ \rangle \,dt + \langle T_T m_0, (\beta_T)_+ \rangle
\le C_H T \|m_0\|_{L^1} + T^{1/q} \|\beta_+\|_{L^{q'}} \|m_0\|_{L^q} + \|(\beta_T)_+\|_{L^{s'}} \|m_0\|_{L^s},
\]
we conclude that $\langle m_0, (u_0)_+ \rangle$ is finite.
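The middle term in the last bound comes from two applications of Hölder's inequality. The computation below is a sketch under the assumption, not restated in this appendix, that the transport maps $T_t$ preserve Lebesgue measure on $M \times \mathbb{R}^d$, so that $\|T_t m_0\|_{L^q} = \|m_0\|_{L^q}$ for every $t$.
\[
\int_0^T \langle T_t m_0, \beta_+(t) \rangle \,dt
\le \int_0^T \|T_t m_0\|_{L^q} \, \|\beta_+(t)\|_{L^{q'}} \,dt
= \|m_0\|_{L^q} \int_0^T \|\beta_+(t)\|_{L^{q'}} \,dt
\le T^{1/q} \, \|\beta_+\|_{L^{q'}((0,T) \times M \times \mathbb{R}^d)} \, \|m_0\|_{L^q} .
\]
The first inequality is Hölder's inequality in $(x, v)$ with exponents $q$ and $q'$, and the last is Hölder's inequality in time applied to the product $1 \cdot \|\beta_+(t)\|_{L^{q'}}$. The term involving $C_H$ is handled in the same way, using that $C_H$ is a constant and $m_0 \ge 0$, and the final term follows from Hölder's inequality with exponents $s$ and $s'$.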
Based on the previous lemma, we make the following definition.
This is well-defined (allowing for the possible value +∞) by Lemma A.9.
Lemma A.11. Suppose that the assumptions of Lemma A.6 hold and suppose in addition that
holds for all $\phi \in C^1_c(U_{m_0})$, where $\beta_0 \in \mathcal{M}(\{m_0 > 0\})$ is also given. Then for the trace $u_0$ of the right continuous version of $u$ we have $\beta_0 \le u_0$ in $\mathcal{D}'(\{m_0 > 0\})$, and in particular $\langle u_0, \psi \rangle \neq -\infty$ for any $\psi \in C^1_c(\{m_0 > 0\})$. If in addition we suppose that $\beta_0$ is such that $\langle \beta_0, m_0 \rangle$ is meaningful and finite, then $\langle u_0, m_0 \rangle$ is finite and $\langle \beta_0, m_0 \rangle \le \langle u_0, m_0 \rangle$.
that applies when $h$ is a Lipschitz function and $\partial_t u + v \cdot D_x u \in L^1_{\mathrm{loc}}$. However, since in our case $\partial_t u + v \cdot D_x u$ may only be a measure, we are not able to use this result directly, or indeed to prove a chain rule with equality as in equation (B.2). Nevertheless, the ideas of the proofs in [10,39] can be used to obtain the inequality that is sufficient for our case.
B.6. Limits. We now take the limit as the smoothing parameters tend to zero in the previous inequalities. This procedure yields the proofs of Lemmas B.1 and B.2. We continue to choose $\varepsilon = \delta^2$ to ensure convergence of the commutator. We detail the procedure in the case of the truncation (B.3); the case of the maximum function is similar.
A similar argument is used for the term involving β T .
For the remaining term, use that $|\gamma'_\alpha| \le 1$ (the bound being uniform in $\alpha > 0$), and that both $\beta$ and $H(x, v, D_v u)$ are in $L^1_{\mathrm{loc}}((0, T] \times M \times \mathbb{R}^d)$ by assumption. Then
Finally, taking a sequence of vector fields $a$ converging in $L^{r'}((0, T) \times M \times \mathbb{R}^d)$ to $-D_p H(x, v, D_v u_+)$, we conclude that
that is, the following holds in the sense of distributions: