Angular adaptivity in P0 space and reduced tolerance solves for Boltzmann transport

Previously we developed an adaptive method in angle, based on solving in Haar wavelet space with a matrix-free multigrid for Boltzmann transport problems. This method scalably mapped to the underlying P$^0$ space during every matrix-free matrix-vector product; however, the multigrid method itself was not scalable in the streaming limit. To tackle this we recently built an iterative method based on an ideal restriction multigrid with frozen GMRES polynomials (AIRG) for Boltzmann transport that showed scalable work with uniform P$^0$ angle in the streaming and scattering limits. This paper details the practical requirements of using this new iterative method with angular adaptivity. Hence we modify our angular adaptivity to occur directly in P$^0$ space, rather than the Haar space. We then develop a modified stabilisation term for our FEM method that results in scalable growth in the number of non-zeros in the streaming operator with P$^0$ adaptivity. We can therefore combine the use of this iterative method with P$^0$ angular adaptivity to solve problems in both the scattering and streaming limits, with close to fixed work and memory use. We also present a CF splitting for multigrid methods based on element agglomeration combined with angular adaptivity, which can produce a semi-coarsening in the streaming limit without access to the matrix entries. The equivalence between our adapted P$^0$ and Haar wavelet spaces also allows us to introduce a robust convergence test for our iterative method when using regular adaptivity. This allows the early termination of the solve in each adapt step, reducing the cost of producing an adapted angular discretisation.


Introduction
In this work we consider the mono-energetic steady-state form of the Boltzmann Transport Equation (BTE), written as

Ω · ∇ψ(r, Ω) + σ_t ψ(r, Ω) − ∫ σ_s(r, Ω′ → Ω) ψ(r, Ω′) dΩ′ = S_e(r, Ω),   (1)

where the number of particles moving in direction Ω, at spatial position r, is given by the angular flux, ψ(r, Ω). The macroscopic total and scattering cross-sections are given by σ_t and σ_s, respectively, with an external source of S_e. Solving (1) can be difficult given the dimensionality of the problem; previously we presented methods for building adapted angular discretisations for the BTE [1,2,3]. This allowed angular resolution to be focused where important in space/energy. [1] used anisotropic adaptivity on the sphere in a Haar wavelet space, which was built on top of an underlying nested P 0 space. The angular matrices in Haar wavelet space cannot be formed scalably, as the number of non-zeros (nnzs) grows non-linearly with angular refinement. As such, we built a matrix-free multigrid method to solve the adapted problems, which mapped the solution into the underlying P 0 space in O(n), performed a matrix-vector product with the P 0 angular matrices (which is scalable), and mapped back to Haar space. This method resulted in a reduction in time to solve and memory use when compared to uniform discretisations. We showed that our adaptive process was scalable, enabling the use of up to 15 levels of hierarchical refinement in angle; unfortunately, in the streaming limit the matrix-free multigrid could not solve the linear systems with fixed work. This is because the BTE becomes hyperbolic in the limit of no scattering, and we deliberately chose not to use Gauss-Seidel/sweep-based smoothers, in an effort to build an iterative method that could scale well on unstructured grids in parallel.
Recently we showed [4] that by solving in P 0 space directly, we could build an iterative method without sweeps with excellent performance in the streaming limit of the BTE. The work in this paper is therefore based on building adapted angular discretisations in P 0 space and combining them with our new iterative method, enabling the efficient use of adapted angular discretisations in both streaming and scattering problems.
Here we present four contributions: the first is a sparsity-controlled adaptive P 0 discretisation for the BTE; the second is the use of an adapted P 0 discretisation with our AIRG-based iterative method; the third is a method for selecting coarse (C) and fine (F) points for a multigrid hierarchy that can be combined with angular adaptivity to provide semi-coarsenings without requiring matrix entries; and finally a convergence test for iterative methods that decreases the cost of forming an adaptive angular discretisation with regular adaptivity.

Discretisations
The spatial and angular discretisations used in this work are based on those we used previously; we describe them below [5,6,7,8,1,4].

Angular discretisation
We use a P 0 DG FEM in angle, with the lowest resolution given by one P 0 element per octant (similar to an S 2 discretisation). Subsequent levels of refinement come from dividing each element in four, at the halfway points of the azimuth angle and the cosine of the polar angle. All the elements at a given level of refinement therefore have constant area, and we normalise our constant basis functions so the uniform (and hence diagonal) angular mass matrix is the identity. This P 0 discretisation is similar to an S n product quadrature and features elements that cluster around the poles with uniform refinement. This is not a desirable feature for a uniform discretisation of the BTE, but the nested nature of the refinement means it is simple to build an adapted angular discretisation, and hence refinement around the poles only occurs if it is required.
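As a concrete sketch of this nested refinement, the following illustrative Python treats each angular element as a rectangle in (azimuth, μ) coordinates, with μ the cosine of the polar angle; splitting at the halfway points then gives four children of equal area. The function names and data layout are our own, not the paper's implementation.

```python
# Sketch of the nested P0 refinement on one octant, with each angular element
# stored as ((az0, az1), (mu0, mu1)), mu = cos(polar angle). Illustrative only.
import math

def refine(elem):
    """Split an element in four at the halfway points of azimuth and mu."""
    (p0, p1), (m0, m1) = elem
    pm, mm = 0.5 * (p0 + p1), 0.5 * (m0 + m1)
    return [((p0, pm), (m0, mm)), ((pm, p1), (m0, mm)),
            ((p0, pm), (mm, m1)), ((pm, p1), (mm, m1))]

def area(elem):
    """Area on the unit sphere: dA = d(azimuth) * d(mu)."""
    (p0, p1), (m0, m1) = elem
    return (p1 - p0) * (m1 - m0)

# Level 1: one element per octant, e.g. azimuth in [0, pi/2], mu in [0, 1]
octant = ((0.0, math.pi / 2), (0.0, 1.0))
level2 = refine(octant)
areas = [area(e) for e in level2]
```

Because the area element on the unit sphere is dA = dφ dμ, splitting at the midpoints in both coordinates preserves the equal-area property at every level, which is what allows the normalised constant basis to give an identity angular mass matrix.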
We can build a Haar wavelet discretisation on top of this P 0 space, with the hierarchy in the nested elements replaced with a hierarchy in the wavelet basis functions. These two discretisations are exactly equivalent. As in [4] we solve in P 0 space, as we can form an assembled copy of the streaming/removal matrix that has fixed sparsity with angular refinement (see Section 2.2). The equivalence between the P 0 and Haar spaces is still very useful, as it allows us to easily tag which angular elements require refinement/de-refinement and hence enables the reduced tolerance solves discussed below.

Figure 1: An adapted P 0 angular discretisation at a single spatial point in a 2D simulation. The P 0 angular discretisation is on the r = 1 sphere, but has been projected onto faceted polyhedra for ease of visualisation. The camera is pointed in the −z direction.
We noted in [1] that solving in P 0 space can easily be used as part of the adaptive process we described previously. Fig. 1 shows an example of an adapted P 0 discretisation at a single node point with several levels of refinement, defined a priori in the +y direction. In this work we only show regular adaptivity (based on achieving uniform global accuracy in the solution), given our iterative methods are agnostic to how the adapted discretisations are formed; regular adaptivity is not an optimal method for many problems (particularly in the streaming limit), so please see [1,3] for examples using a priori and goal-based adaptivity. A number of other authors have also used adaptivity in angle in Boltzmann applications; these include sparse grid methods [9], S n methods [10,11,12,13,14,15], FEM methods [16,17,18,19,20,21], and specifically finite element/wavelet methods [22,23,24,25,26].
Our adaptive process starts with a uniform angular discretisation (typically level 1) and performs a solve in P 0 space. The solution is then mapped into Haar wavelet space in O(n) and an error metric is formed. Regular adaptivity results in a discretisation that minimises the global error in the problem and is simple to perform. In wavelet space, the size of the wavelet coefficients is influenced by both the size of the flux and the smoothness of the underlying function; thresholding the wavelet coefficients with a fixed value is therefore sufficient to drive regular adaptivity. As such, we take the angular flux in Haar space and scale it by a thresholding coefficient. This coefficient is input by the user and drives refinement as it is decreased (towards a uniform discretisation). We also scale by the maximum of the angular flux, in an attempt to make the thresholding coefficients somewhat problem agnostic. If a resulting wavelet coefficient is greater than 1.0, we tag it for refinement. If it is less than 0.01, we tag it for de-refinement. We therefore know which angular elements in P 0 space to refine/de-refine, as they are given by the elements which are in the support of each wavelet; see [1]. This is performed at each spatial point separately. The initial condition for the P 0 solve on the adapted discretisation can then be taken from the previous step. The hierarchical nature of the wavelets makes this simple; it is equivalent to interpolating the P 0 solution onto the newly adapted P 0 discretisation. We then continue this adapt process up to a maximum level of refinement, or number of adapt steps.
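A minimal sketch of this tagging step, assuming the Haar coefficients are available as a flat list; the 1.0/0.01 bounds follow the text, while the function name and data layout are assumptions:

```python
# Illustrative regular-adaptivity tagging: each Haar coefficient is scaled by
# the user threshold and the maximum angular flux, then tagged for refinement
# (scaled value > 1.0) or de-refinement (scaled value < 0.01).
def tag_wavelets(coeffs, threshold, max_flux):
    tags = {}
    for idx, c in enumerate(coeffs):
        scaled = abs(c) / (threshold * max_flux)
        if scaled > 1.0:
            tags[idx] = "refine"       # add the P0 elements in this support
        elif scaled < 0.01:
            tags[idx] = "derefine"     # candidate for removal
    return tags
```

Decreasing `threshold` makes the scaled coefficients larger and so drives the discretisation towards uniform refinement, matching the behaviour described above.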
The only difference in this adaptivity process compared with [1] is that we enforce that if a single wavelet is added, all the wavelets on the same level of refinement that share the same support on the sphere (a maximum of two other wavelets) are also added. Similarly, removal requires that all wavelets with the same support are below the threshold coefficient, and all are removed at once. This ensures that when an angular element is refined (or de-refined), we always have 4 P 0 elements present, which makes the P 0 implementation simpler. In comparison to the matrix-free method in [1], we must also introduce further controls on the sparsity of our discretisation given we solve in P 0 space; this is detailed below.
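The sibling rule above can be sketched as follows, assuming wavelets sharing a support have been grouped beforehand; the grouping and names are illustrative:

```python
# Illustrative sibling rule: if any wavelet in a shared-support group is tagged
# for refinement, refine the whole group; de-refinement only proceeds if every
# sibling is tagged, otherwise the group's tags are cleared.
def apply_sibling_rule(tags, siblings):
    """tags: {wavelet index: "refine"/"derefine"}; siblings: groups of indices."""
    out = dict(tags)
    for group in siblings:
        if any(tags.get(i) == "refine" for i in group):
            for i in group:
                out[i] = "refine"
        elif all(tags.get(i) == "derefine" for i in group):
            for i in group:
                out[i] = "derefine"
        else:
            for i in group:
                out.pop(i, None)   # partial de-refinement is not allowed
    return out
```

This guarantees the refined patch always holds 4 P 0 elements, as the text requires.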

Spatial discretisation
Our spatial discretisation is a sub-grid scale FEM, which represents the angular flux as ψ = φ + θ, where φ is the solution on a "coarse" scale and θ is the solution on a "fine" scale. We perform separate finite element expansions on the fine and coarse scales, with continuous linear basis functions on the coarse scale and discontinuous linear basis functions on the fine scale. We then use the P 0 (constant) basis functions in angle and enforce that the coarse and fine scales have the same angular expansion on co-located nodes. Our discretised form of (1) can then be written as

[ A  B ] [ Φ ]   [ S_Φ ]
[ C  D ] [ Θ ] = [ S_Θ ],   (2)

where the number of unknowns in the coarse scale discretised vector, Φ, is NCDOFs and the number of unknowns in the fine scale discretised vector, Θ, is NDDOFs. S_Φ and S_Θ are the discretised source terms for both scales. (2) is built using standard FEM theory and as such A and D are the standard continuous Galerkin (CG) and discontinuous Galerkin (DG) FEM matrices that result from discretising (1). A Schur complement of block D allows us to solve for the coarse scale variable, Φ, given by

(A − B D^{-1} C) Φ = S_Φ − B D^{-1} S_Θ,   (3)

with the fine scale solution then computed with

Θ = D^{-1} (S_Θ − C Φ).   (4)

The addition of the two solutions, Ψ = Φ + Θ (where the coarse solution Φ has been projected onto the fine space), then gives us our discrete solution. Solving the sub-grid scale equations as posed would be more expensive than solving with a DG FEM discretisation, so we sparsify D (see also [8,27,28,25,29,30,1,2]). We replace D^{-1} in (3) and (4) with D̃^{-1}, where D̃ is the streaming operator with removal and self-scatter, with vacuum conditions applied on each DG element. This removes the jump terms, resulting in element blocks in D̃, making the computation of D̃^{-1} tractable. With a uniform P 0 angular discretisation (as in [4]), this is sufficient to ensure fixed sparsity with either spatial or angular refinement, as D̃ (and hence D̃^{-1}) has diagonal blocks. Unfortunately this is not sufficient when we have adapted our P 0 angular discretisation, with differing angular resolution at each spatial point. If we denote the sparsity pattern of the streaming component of D̃ as S_D = {(i, j) : D̃_{i,j} ≠ 0}, then we enforce (D̃^{-1})_{i,j} = 0 for (i, j) ∉ S_D. This is equivalent to using an ILU(0) to invert our sparsified approximation.
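A dense sketch of the Schur-complement solve for the coarse and fine scales, i.e. equations (3) and (4), with arbitrary well-conditioned random blocks standing in for the FEM matrices:

```python
# Dense illustration of the block solve: eliminate the fine scale via the
# Schur complement of D, solve for the coarse scale, then back-substitute.
import numpy as np

rng = np.random.default_rng(0)
n_c, n_f = 3, 4                       # coarse (CG) and fine (DG) DOF counts
A = rng.standard_normal((n_c, n_c)) + 5 * np.eye(n_c)
B = rng.standard_normal((n_c, n_f))
C = rng.standard_normal((n_f, n_c))
D = rng.standard_normal((n_f, n_f)) + 5 * np.eye(n_f)
S_phi, S_theta = rng.standard_normal(n_c), rng.standard_normal(n_f)

Dinv = np.linalg.inv(D)
# Coarse-scale solve: (A - B D^-1 C) phi = S_phi - B D^-1 S_theta
phi = np.linalg.solve(A - B @ Dinv @ C, S_phi - B @ Dinv @ S_theta)
# Fine-scale back-substitution: theta = D^-1 (S_theta - C phi)
theta = Dinv @ (S_theta - C @ phi)
```

Replacing `Dinv` with the sparsified D̃^{-1} described in the text changes only how the inverse is approximated, not the structure of this solve.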
The ILU(0) is only necessary to ensure a fixed sparsity with our adapted P 0 space; for example, if we have spatial nodes with differently adapted P 0 discretisations, the construction of our angular discretisation means that the nnzs in each of the blocks of our element matrices depends on how each adapted angle overlaps the others (i.e., the angular mass matrix is no longer diagonal). Fig. 2a shows this on an example element, where the element streaming operator is no longer made up of diagonal blocks. If we inverted this without the sparsity control of an incomplete LU factorisation, we would significantly increase the nnzs in an adapted P 0 space, as shown in Fig. 2b; this is in contrast to a uniform P 0 space, where the nnzs in the inverse are the same as in the streaming operator. For our adaptivity to be useful, we want the cost of our matvec to scale with the number of adapted unknowns. We can imagine a pathological case where, in one dimension for example, the angular resolution varies across the domain such that each (linear) spatial element has one node at the lowest resolution possible and one at the highest. The streaming operator would then have the same nnzs as a uniform P 0 discretisation at the highest resolution, given the effect of overlapping angular elements in the angular matrices described above. Thankfully, in our experience this pathology does not occur in practice; we show this in Section 5.
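The sparsity control can be illustrated with a minimal dense ILU(0): the factorisation updates only entries inside the original pattern and drops all fill-in. This is a textbook ILU(0) sketch, not the paper's implementation:

```python
# Minimal dense ILU(0): factorise only within the sparsity pattern of the
# input matrix, discarding all fill-in, mirroring how the sparsified inverse
# is restricted to the streaming operator's pattern. Illustrative only.
import numpy as np

def ilu0(A):
    n = A.shape[0]
    pattern = A != 0.0           # entries allowed in the factors
    LU = A.copy()                # unit-lower L and upper U stored together
    for i in range(1, n):
        for k in range(i):
            if not pattern[i, k]:
                continue
            LU[i, k] /= LU[k, k]
            for j in range(k + 1, n):
                if pattern[i, j]:   # update only existing non-zeros: no fill-in
                    LU[i, j] -= LU[i, k] * LU[k, j]
    return LU

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 4.0, 1.0],
              [0.0, 1.0, 4.0]])
LU = ilu0(A)   # tridiagonal: the exact LU generates no fill, so ILU(0) is exact
```

For a tridiagonal matrix the exact LU factors generate no fill, so ILU(0) reproduces them exactly; for blocks with the overlapping-angle coupling of Fig. 2a it would instead discard the fill of Fig. 2b.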

Iterative method
The iterative method we use in this work comes from [4] and is briefly detailed here. We solve a right-preconditioned version of (2); taking the Schur complement with the sparsified D̃, this is

(A − B D̃^{-1} C) M^{-1} u = S_Φ − B D̃^{-1} S_Θ,   with Φ = M^{-1} u.

We use GMRES(30) to solve this equation and a matrix-free matvec to compute the action of (A − B D̃^{-1} C). The preconditioner, M^{-1}, uses the additive combination of a CG diffusion-synthetic-acceleration (DSA) method and a sparsified form of our streaming/removal operator, which we denote as M_Ω. The CG DSA preconditioner is based on a CG FEM discretisation of a diffusion equation, D_diff, with R_angle and P_angle the restriction/prolongation of the angular flux to the constant moment, and hence

M^{-1} = M_Ω^{-1} + P_angle D_diff^{-1} R_angle.

We can rewrite (3) as the sum of a streaming/removal component (denoted with a subscript Ω) and a scattering component (denoted with a subscript S), where y = D̃^{-1} C_S Φ is the scattering contribution to the fine scale. In [4] we used the streaming/removal operator

M_Ω = A_Ω − B_Ω D̃^{-1} C_Ω,   (8)

but here our adapted P 0 space in angle requires further sparsification. If we denote computing a matrix product with no fill-in (i.e., restricted to the existing sparsity pattern) by [·], then a sparsified streaming operator is given by

M_Ω = A_Ω − [B_Ω [D̃^{-1} C_Ω]].   (9)

With uniform P 0 angle, (9) is exactly (8); with adapted P 0 angle it is equivalent to computing the matrix product D̃^{-1} C_Ω with no fill-in, and then using this result to compute B_Ω D̃^{-1} C_Ω, again with no fill-in. This is necessary as, again, the overlap of adapted angular elements means the product of two (element) matrices with the sparsity shown in Fig. 2a results in the sparsity given in Fig. 2b, which is unacceptable. When computing the action of the matrix triple-product term-by-term as part of a matrix-free matvec for the outer GMRES, the extra non-zeros are not a concern; it is only when we wish to form an assembled version of A_Ω − B_Ω D̃^{-1} C_Ω to precondition with that we must take care to ensure no extra fill-in occurs. Practically, we store an assembled copy of (9) to apply the preconditioner, using it as a replacement for the non-sparsified (8). The difference between these two approaches is simply a modified stabilisation term when we have adapted.
We now require a method to apply the inverses in our preconditioner. In [4] we developed a multigrid method known as AIRG, based on a reduction-style multigrid [31,32]. We used a single V-cycle of AIRG per application of the preconditioner to apply M_Ω^{-1}. The diffusion operator was applied with a single V-cycle of BoomerAMG from hypre. We use the same approach here. If we consider a general linear system Ax = b, a coarse/fine (CF) splitting of the unknowns gives the block system

[ A_ff  A_fc ] [ x_f ]   [ b_f ]
[ A_cf  A_cc ] [ x_c ] = [ b_c ].

We discuss the CF splitting further in Section 4. Ideal prolongation and restriction operators are given by

P = [ −A_ff^{-1} A_fc ]        R = [ −A_cf A_ff^{-1}   I ],
    [        I        ],

with the coarse-grid matrix computed with A_coarse = RAP. Repeating this process on the coarse grid then builds a multigrid hierarchy. AIRG forms approximations to A_ff^{-1}, namely Â_ff^{-1}, by using fixed-order GMRES polynomials. This results in a stationary multigrid method that can be applied with just matrix-vector products (i.e., no dot products). One key feature of AIRG is that it doesn't rely on any block or lower-triangular structure in our matrix. This is essential given that our spatial discretisation results in a CG stencil in (3), and hence our linear system does not feature lower-triangular blocks in the streaming limit, unlike a traditional DG FEM. If we use angular adaptivity in P 0 space, different angles in an octant can be coupled across space in the streaming limit, and hence we no longer have independent angle blocks in our matrix. Instead our matrix has at most 4 (in 2D, or 8 in 3D) angular blocks given by the octant coupling. We found in [4] that with uniform angle, in both the streaming and scattering limits, our iterative method with AIRG used close to fixed work in both the setup and solve, with constant memory consumption. We would hope that the block-independent nature of AIRG gives the same performance when we have adapted in angle; we examine this in Section 5.
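The two-level reduction setup above can be sketched with dense blocks, assuming the F points are ordered first; the exact A_ff^{-1} is used here purely for illustration, where AIRG would substitute a fixed-order GMRES polynomial approximation:

```python
# Two-level ideal-restriction setup from a CF splitting (F points first):
#   R = [-A_cf A_ff^-1  I],  P = [[-A_ff^-1 A_fc], [I]],  A_coarse = R A P.
import numpy as np

rng = np.random.default_rng(1)
n_f, n_c = 4, 2
A = rng.standard_normal((n_f + n_c, n_f + n_c)) + 6 * np.eye(n_f + n_c)
Aff, Afc = A[:n_f, :n_f], A[:n_f, n_f:]
Acf, Acc = A[n_f:, :n_f], A[n_f:, n_f:]

Aff_inv = np.linalg.inv(Aff)          # AIRG: replaced by a GMRES polynomial
R = np.hstack([-Acf @ Aff_inv, np.eye(n_c)])
P = np.vstack([-Aff_inv @ Afc, np.eye(n_c)])
A_coarse = R @ A @ P
```

With the exact inverse, R A P reduces to the Schur complement A_cc − A_cf A_ff^{-1} A_fc; AIRG's polynomial approximation trades this exactness for a cheap, matvec-only application.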
To measure the amount of work required to solve (3), (4) and form Ψ, we use the metrics defined in [4]. This is given in terms of the number of "Work Units", which is a FLOP count scaled by the number of FLOPs required to compute a matrix-free matvec of (8). In order to make comparisons with traditional DG FEM/source iteration methods easier, we also present the FLOP count scaled by the number of FLOPs required to compute a matrix-free matvec with a DG FEM discretisation. Please see [4] for a more detailed definition of these quantities.

CF splitting
In [4] we used the Falgout-CLJP coarsening algorithm implemented in hypre to determine the CF splitting required by our AIRG multigrid method (see [33] for some general strategies related to CF splittings). In an effort to build cheaper CF splitting algorithms, in this work we also use the element agglomeration algorithms from [1,34]. These methods were vital to our previous work, as we needed to build and apply spatial tables on coarse elements in order to compute matrix-free matvecs on lower spatial grids. In this work we don't require the coarse elements these methods provide, but we can still use their spatial CF points to pick which of our space/angle DOFs are fine and coarse. Given these algorithms only depend on the spatial grid, they can be cheaper to run than algorithms which require access to the matrix entries; because of this, however, we would expect them to perform less well, as the same spatial CF points would be applied to each angle, ignoring directionality. For a reduction-based multigrid, the selection of "good" CF points results in a well-conditioned A_ff, though as we demonstrate in Section 5 a poorer splitting can be ameliorated through the use of strong approximations to A_ff^{-1}. In this paper, when using uniform angle (i.e., for the first solve in our adapt process), our element agglomeration CF splitting has no directionality. As mentioned in Section 2.1, however, the P 0 angular discretisation used in this work can adapt, refining in "important" directions. The adapt process solves a linear system with a uniformly refined angular discretisation in the first step, followed by angular refinement at each spatial point. From there, each subsequent adapt step continues to build anisotropically adapted angular discretisations. For the linear solves after the first step, we can use the directional information contained within the adapted angular discretisations to determine a spatially dependent "coarsening direction" and hence a set of CF points with directionality. There are many ways we could define a "coarsening direction" at a given spatial point: we could pick the direction of the most refined angular element, compute an average direction on the sphere across all the angular elements, compare the distribution of angles or levels of refinement across the sphere, etc. For simplicity, in this work we set the coarsening direction at each spatial node point to be the direction given by the centre of the most refined angular element at that point; if a priori angular refinement is used, the direction of that refinement is used instead.
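This choice of coarsening direction can be sketched as follows, assuming each angular element at a node is stored with its refinement level and its (azimuth, μ) bounds (an assumed layout, not the paper's data structure):

```python
# Illustrative coarsening direction: take the centre of the most refined
# angular element at a node and convert its (azimuth, mu) midpoint to a
# unit vector on the sphere, mu = cos(polar angle).
import math

def coarsening_direction(elements):
    """elements: list of (level, (az0, az1), (mu0, mu1)) per angular element."""
    level, (az0, az1), (mu0, mu1) = max(elements, key=lambda e: e[0])
    az, mu = 0.5 * (az0 + az1), 0.5 * (mu0 + mu1)
    s = math.sqrt(1.0 - mu * mu)          # sin(polar angle)
    return (s * math.cos(az), s * math.sin(az), mu)
```

Averaging over all elements, or weighting by refinement level, are the alternative definitions mentioned in the text; only the simplest option is sketched here.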
We then have a set of coarsening directions at each spatial node point, which is used to guide our element agglomeration. The spatial coarse points are the vertices of the coarse agglomerates. If a spatial node is designated coarse, all the angular DOFs on that spatial node are also tagged as coarse. The key to this process is that it must result in a CF splitting that reflects the underlying streaming or scattering limits. One of the benefits of using element agglomeration in this fashion is that after several adapt steps, we could consider the "coarsening direction" to be (mostly) converged and hence freeze the element agglomeration. The spatial CF points are then frozen, and any additional angular DOFs added in subsequent adapt steps can be cheaply tagged based on the frozen spatial CF points. This could make it cheaper to form the CF splitting with many adapt steps, as we would only need to perform the element agglomeration a small number of times.
For many problems in the streaming limit, the angular adaptivity will pick out a limited number of key directions and hence the coarsening needs to occur in those directions. We should note that the coarsest level of refinement on our angular domain is given by a single basis function per octant. If we have a single important direction, as in Fig. 1, and our coarsening direction has been computed as, say, +y, there will always be some angles (the unrefined elements) that are not well represented by the coarsening direction. In the limit of angular refinement, however, given the AMR-style nested refinement of our angular adaptivity, the majority of angles at that node point will be well represented by the coarsening direction.
In scattering regions, however, the angular adaptivity tends towards uniform refinement, and hence the coarsening needs to form well-shaped circular/spherical agglomerates (as there is no "important" direction), with agglomerates that could have a different number of fine elements compared to the streaming case, given possible differences in optimal coarsening rates. The combination of spatially dependent coarsening directions formed through angular adaptivity and the scattering cross-sections should therefore provide the information required for a directional element agglomeration algorithm that gives a "good" CF splitting for the majority of unknowns in our angularly adapted problems.
There are a few related problems with this scheme. For example, both the linear system in the first adapt step and problems where uniform refinement is triggered far from the scattering limit won't have the required directional information. The first of these is not a major concern, as the number of unknowns in the first adapt step is small compared to the number of unknowns at all the other adapt steps. In both cases we must return to using the directionless agglomeration algorithms described in [34]. We can also have the case where we have adapted with the angular DOFs concentrated in small regions of the spatial mesh, but the element agglomeration proceeds with a constant agglomerate size. This can result in many DOFs in those refined regions not being tagged as "coarse"; the non-uniformity of the distribution of angular DOFs across the spatial mesh therefore persists on the coarse grid. This means our multigrid may have very different operator complexity when compared to uniform angle. These problems may harm the performance of our multigrid methods; we examine this further below, but note that the traditional coarsening algorithms that rely on matrix entries can always be used.

Directional element agglomeration method
Previously we presented 7 different element agglomeration methods from the literature and compared their performance on scattering/absorbing problems [34] with the matrix-free multigrid from [1]. Here we present one of those algorithms, modified to use directional information; all of the algorithms mentioned in [34] can be modified in the same way (and perform similarly).
Our element agglomeration has been modified with very simple heuristics. If the average scattering cross-section is greater than 1 (in this work we use the actual cross-section value on each element; ideally this would be defined as the mean free path in an element, or an equivalent dimensionless quantity), or the number of angles on a spatial node is "close" to that of a uniform discretisation at that level of refinement (we define "close" as having > 63% of the uniform angles; this is equivalent to a level 2 discretisation having more than 2 octants refined), then agglomeration proceeds without directional information. If we do proceed with directional coarsening, the tangent vector to each face on the spatial mesh is compared with the average coarsening direction computed on that face (i.e., an average of the coarsening direction across all the spatial nodes on that face). If those two vectors are close to parallel, then agglomeration across that face is discouraged; we define close to parallel as the smallest angle between the vectors being less than π/4. The cleanup routine described in [34] has also been modified to use directional information. Any elements which require cleanup (e.g., unused elements, elements within completely closed agglomerates, etc.) are added to the agglomerate closest to the coarsening direction with the smallest number of elements.
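The directional face-weight heuristic can be sketched as below; the weight convention (dropping the face weight to 1 to discourage agglomeration across that face) follows the text, while the function name and 2D vectors are illustrative assumptions:

```python
# Illustrative directional face weighting: faces whose tangent is close to
# parallel (angle < pi/4) to the average coarsening direction get weight 1,
# so the partitioner is encouraged to cut across them.
import math

def face_weight(tangent, avg_direction, base_weight):
    nt = math.hypot(*tangent)
    nd = math.hypot(*avg_direction)
    if nd == 0.0:                      # no directional information on this face
        return base_weight
    dot = sum(t * d for t, d in zip(tangent, avg_direction))
    cosang = abs(dot) / (nt * nd)      # abs: orientation of the tangent is arbitrary
    angle = math.acos(min(1.0, cosang))
    return 1 if angle < math.pi / 4 else base_weight
```

This reproduces the semi-coarsening behaviour described above: agglomerates grow along the coarsening direction because faces perpendicular to it keep their full weight.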
Algorithm 1 uses METIS and requires setting a desired agglomerate size to control the coarsening rate. As mentioned above, typically we would consider the ideal coarsening rate to be dependent on the cross-section. Previously [34] we examined the ideal agglomerate size with our matrix-free multigrid applied to scattering/absorbing problems. The simple grid transfer operators we used in that work meant that the multigrid method could be considered a geometric multigrid, and as such we found that very large numbers of elements per agglomerate were required, in both two and three dimensions, to keep the grid complexity low and hence maintain good performance. The same is not the case in this work, where our multigrid methods are much closer to AMG/AMGe-style methods and we require high-quality interpolation/restriction for good performance across all parameter regimes. This requires much smaller numbers of elements in an agglomerate and hence higher grid complexities, although we do use the same coarsening rate in streaming and scattering problems in this work.
As in [34], once top-grid agglomeration has occurred on an unstructured grid, the lower grids have fundamentally different spatial connectivity and hence come to resemble structured grids, where the agglomerate size can easily be set to a constant across levels. As such, we set the agglomerate size to 6 in 2D and 12 in 3D on the top grid, whereas on lower grids we set it to 2 in 2D and 4 in 3D (i.e., we perform aggressive coarsening on the top grid).
Algorithm 1 METIS 5.1 [35] directional coarsening algorithm
1: Construct the dual-graph of the mesh
2: Construct a coarsening direction, a, at every node, by using the centre of the most refined angular element
3: Compute s to be the average scattering cross-section at every node from each of the connected elements
4: Get the number of angles on every node, n, and the number of angles with a uniform discretisation, n_uniform
5: if s > 1 or n ≈ n_uniform then
6:   Set a = 0 on the node
7: end if
8: Build integer area/volume weights from the dual-graph by scaling relative face areas and volumes to between 1 and 100
9: while we haven't visited every face (denote the current face as f) do
10:   Compute a_avg to be the average coarsening direction on f from each of the vertices on this face
11:   Compute ϑ to be the angle between a vector parallel to f and a_avg
12:   if ϑ < π/4 and a_avg ≠ 0 then
13:     Set the integer area weight on this face to be 1

Results
Outlined below are two example problems taken from [4], in the streaming and scattering limits, that we use to test our P 0 angular adaptivity. We solve our linear systems with GMRES(30) to a relative tolerance of 1 × 10⁻¹⁰, with an absolute tolerance of 1 × 10⁻⁵⁰, and use an initial guess of zero unless otherwise stated. We use the additive preconditioners defined in [4] and all the same parameters as that work. The zero initial guess is used even when we have angular adaptivity and hence non-zero initial guesses are available from previous adapt steps. We do this to ensure fair comparisons across different material regimes at different levels of angular refinement, and to show that the convergence of our method is not dependent on a "good" initial condition in some material regimes. In the scattering limit the non-zero guesses from coarser angular discretisations are helpful, but in the streaming limit ray-effects mean that coarse angular discretisations often do not provide good approximations to refined angular discretisations. If using the element agglomeration CF splitting, we rerun it at each adapt step, rather than freezing it after a set number of steps. All tabulations of memory use are scaled by the total NDOFs in ψ in (2), i.e., NDOFs = NCDOFs + NDDOFs.

Angular adaptivity
Section 2.1 described how our P 0 angular discretisation can adapt anisotropically on the sphere, allowing different angular refinement at different spatial points. As such, we examine the use of AIRG and our additive iterative method on these adapted systems, along with the directional element agglomeration method described in Section 4.1. To see the performance on the same problems with uniform angle, please see [4]. In particular, [4] showed that with uniform angular refinement, in both the streaming and scattering limits, we can solve our problem with close to fixed work. The first step of our adapt solves at a (low) uniform level of angular refinement; this information (through some error metric) is then used to trigger one level of refinement where required, followed by subsequent solves/refinements. We therefore expect our subsequent solves to be cheaper than a uniform equivalent, given they should have fewer DOFs and, given the sparsity control in Section 2.2, fewer nnzs in our matrices.

Pure streaming problem
Table 1 shows the results from using a regular angle adapt with fixed spatial resolution (this is the third refined mesh from [4]) in the streaming limit, with directional element agglomeration used to compute the CF splitting. Fig. 4 shows the scalar flux and the distribution of our angular resolution throughout space. Our adapt process focuses resolution in the directions away from the source in order to resolve the ray effects in this problem. Using a regular adapt process in this problem is not necessary, given the optimal angular resolution can be easily determined a priori (refining in directions looking away from the source), but we wished to show the effect of solving our adapted discretisations at different adapt steps. Fig. 4 also shows that after the first adapt step, the directional element agglomeration is gluing together elements in the "important" direction at each spatial node, given by the refined angular flux in Fig. 5. We see this results in an iteration count and overall work that is close to flat, with approximately 15% growth after 5 adapt steps. Interestingly, if we force the element agglomeration to be directionless at every adapt step, we see very similar cycle and operator complexities, with 15, 16, 17, 17 and 18 iterations. This implies that the directional element agglomeration is unnecessary. Further investigation revealed that it is the high GMRES polynomial order in AIRG (m = 4, i.e., a third order polynomial) that is compensating for the directionless CF splitting; reducing the polynomial order to 1 with the directional CF splitting gives 17, 31, 44, 53 and 65 iterations, compared with directionless at 17, 34, 40, 56 and 81 iterations. Our strong approximations to the ideal operators and F-point smoothers are compensating for the poorer CF splitting. With higher spatial resolution (not pictured), the adapt can also remove the angular resolution in the regions between the rays that was added in the initial adapt steps; the coarse spatial resolution used in Fig. 4 means that numerical diffusion keeps some of the angular resolution above the removal threshold. Even with this excess resolution, we have 2.8× and 4.8× fewer NDOFs for the adapted solves in steps 2 and 3 than in uniform level 2 and 3 discretisations of this problem. Similarly, there are 2.75× and 4.6× fewer nnzs in the adapted matrices compared to the uniform. This shows that the pathology described in Section 2.2, where the nnzs grow considerably with the adapt, is not seen in this problem and that the sparsity control in Section 3 is effective; the NDOFs grow 4.74× from adapt step 1 to 5, with the nnzs growing 4.88×, an increase of around 3% above the NDOFs.
Table 2 shows the results from using the Falgout-CLJP CF splitting instead of our directional element agglomeration algorithm, and we see a much lower iteration count with similarly flat work at each adapt step, and fixed memory use of 10-11 copies of the angular flux. This shows that accounting for the directionality of all the angles (not just the "important" angles) can improve the convergence in the streaming limit. Both the directional element agglomeration and Falgout-CLJP CF splitting however result in close to fixed work in this streaming problem when adapting. This shows the power of combining our iterative method and P 0 angular adaptivity, and is in contrast to the results in [1], where we saw that after 3 angular adapt steps in a streaming problem, solving in Haar space with a matrix-free multigrid resulted in an iteration count that grew from 25 to 91.
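To make the directional agglomeration concrete, the following is a minimal Python sketch of a greedy CF splitting by element agglomeration along a locally "important" direction. The `neighbours` and `important_dir` structures are hypothetical stand-ins for the mesh connectivity and the per-node dominant angle taken from the adapted angular discretisation; the actual algorithm is given in Section 4.1, and this sketch only illustrates the idea of gluing elements together downstream of the dominant direction.

```python
import numpy as np

def directional_agglomerate(elements, neighbours, important_dir, max_agg_size=4):
    """Greedy element agglomeration along a locally 'important' direction.
    elements:      iterable of element ids
    neighbours:    dict element id -> list of (nbr_id, shared_face_normal)
    important_dir: dict element id -> unit vector of the dominant angle
    Returns a dict mapping element id -> agglomerate id."""
    agg_of = {}
    next_agg = 0
    for e in elements:
        if e in agg_of:
            continue
        agg_of[e] = next_agg
        size, current = 1, e
        # Walk downstream: pick the unvisited neighbour whose shared-face
        # normal best aligns with the important direction at this element.
        while size < max_agg_size:
            candidates = [(np.dot(important_dir[current], n), nbr)
                          for nbr, n in neighbours[current] if nbr not in agg_of]
            if not candidates:
                break
            score, nxt = max(candidates, key=lambda t: t[0])
            if score <= 0.0:  # nothing lies downstream of this element
                break
            agg_of[nxt] = next_agg
            size, current = size + 1, nxt
        next_agg += 1
    return agg_of
```

In a streaming-dominated region this glues chains of elements together along the dominant angle (a semi-coarsening), while with no directional information it degenerates to a simple directionless agglomeration.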

In general with our adapted matrices, we might expect similar spectra to the uniform matrices (i.e., when comparing a uniform discretisation to an adapted one with the same maximum level of refinement) and therefore a similar (or smaller) number of iterations to converge. Typically this is the case. Table 2 however shows that with Falgout-CLJP CF splitting, the iteration count starts at 9, given the adapt process starts with a uniform level 1 solve, while the second and third steps take 11 and 12 iterations. The results in [4] show that with uniform level 2 and 3 discretisations this problem took 10 and 11 iterations to solve, respectively. The uniform level 3 matrix on this mesh has a condition number of approximately 560, compared with around 1780 for the adapted discretisation in step 3 pictured in Fig. 4g, resulting in the increase in iteration count. Fig. 3 shows that the field of values for the adapted discretisation is closer to the origin than that of the uniform, and the convergence of our GMRES polynomials is affected by this [36]. As noted, the adapted discretisation has fewer DOFs (and nnzs) so it is still much cheaper to solve than the uniform discretisation, but it is worth noting that solving the matrices generated by our adapt steps is not always equivalent to solving a uniform discretisation.

Table 3: Results from using additive preconditioning on a pure scattering problem with total and scattering cross-section of 10.0 in 2D, with regular angular adaptivity with a refinement tolerance of 0.001, a maximum of 3 levels of angular refinement and 3 adapt steps. The WUs listed are scaled by the nnzs in the adapted solve at each step. The cycle and operator complexity listed are for AIRG on M Ω with CF splitting by element agglomeration.
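As an aside, field of values plots like Fig. 3 can be produced with Johnson's rotation algorithm: for each rotation angle, the extreme eigenvector of the Hermitian part of the rotated matrix gives a boundary point of the numerical range. The sketch below assumes a small dense NumPy matrix and is illustrative only; the actual streaming operators are large and sparse, for which sparse extreme-eigenvalue solvers would be substituted.

```python
import numpy as np

def field_of_values_boundary(A, n_angles=180):
    """Boundary points of the field of values W(A) = {x* A x : ||x|| = 1}
    via Johnson's rotation algorithm. For each angle t, the eigenvector of
    the largest eigenvalue of the Hermitian part of e^{-it} A gives a
    boundary point z = x* A x."""
    pts = []
    for t in np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False):
        R = np.exp(-1j * t) * A
        H = (R + R.conj().T) / 2.0          # Hermitian part of the rotated matrix
        w, V = np.linalg.eigh(H)            # eigenvalues in ascending order
        x = V[:, -1]                        # eigenvector of the largest eigenvalue
        pts.append(x.conj() @ A @ x)        # boundary point of W(A)
    return np.array(pts)
```

Plotting these points for the uniform and adapted streaming operators shows how close each field of values comes to the origin, which is what governs the quality of the GMRES polynomials [36].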
We now examine the performance of our additively preconditioned iterative method with adaptivity in the scattering limit. We only require three adapt steps to resolve the required resolution in this problem; Fig. 6 shows this adapt process and we can see the majority of the angular resolution in the problem is focused around the source. Fig. 6 also shows that given the high scattering cross-section, the element agglomeration has proceeded uniformly, with no directional information after the first adapt step. Table 3 shows that with the CF splitting by element agglomeration, we see a plateau in the iteration count and work. In contrast to the pure streaming problem in Section 5.1.1, performing the CF splitting with Falgout-CLJP results in an identical iteration count (see Table 4), with slightly higher work due to the higher cycle complexity. This confirms that although the element agglomeration results in poorer CF splittings, the streaming/removal operator is easier to invert, and in problems with a large removal term a simple CF splitting algorithm without access to the matrix entries can be sufficient to ensure scalability.
For both methods, the iteration count for each of the adapt steps is lower than or equal to that of the uniform angular refinement shown in [4], with the work and memory use constant at around 38 WUs and 17 copies of the angular flux, respectively. The NDOFs grow 4.96× from adapt step 1 to 3, with the growth in the nnzs of the streaming/removal operator at 5.07×, an increase of around 2%.

Table 4: Results from using additive preconditioning on a pure scattering problem with total and scattering cross-section of 10.0 in 2D, with regular angular adaptivity with a refinement tolerance of 0.001, a maximum of 3 levels of angular refinement and 3 adapt steps. The WUs listed are scaled by the nnzs in the adapted solve at each step. The cycle and operator complexity listed are for AIRG on M Ω with CF splitting by Falgout-CLJP.

Reduced tolerance adapts
In the previous sections we have shown that AIRG and our additively preconditioned iterative method can solve adapted P 0 problems in both the streaming and scattering limits with close to fixed work, with a zero initial guess used at each adapt step. In practice however we would like to use the solution from the previous adapt step as an initial guess to reduce the number of iterations; in streaming problems where the ray effects between different adapt steps do not align, we find this doesn't change the convergence. In scattering problems however it can help reduce the iteration count.
Furthermore, our previous work has used reduced tolerance solves [1,2,3] to decrease the cost of our adaptive process, where the linear system at each adapt step, except the last, is solved to a looser tolerance, typically 1×10⁻³ in the 2-norm; this is not very robust and requires problem-specific tuning. We noted in [1] however that we only need to converge sufficiently that the error in each of our wavelets (this is also true for any hierarchical angular discretisation, such as P n or FP n) has converged to within a relative tolerance of 0.1, so that we can determine whether the error is greater than the refinement threshold of 1.0 defined in Section 2.1 (and equivalently for de-refinement with a threshold of 0.01). The convergence criterion for the iterative method during each adapt step, except the last, can therefore be set as the infinity norm of the relative error being less than 0.1.
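This stopping test can be sketched as follows; `w_new` and `w_old` stand for the wavelet coefficient vectors from consecutive iterations, and the `abs_floor` guard is an assumption of ours to avoid division by near-zero coefficients (the paper does not specify how such coefficients are handled).

```python
import numpy as np

def wavelets_converged(w_new, w_old, rel_tol=0.1, abs_floor=1e-14):
    """Reduced-tolerance convergence test: stop the solve at an adapt step
    once every wavelet coefficient has converged to within a relative
    tolerance, i.e. the infinity norm of the per-coefficient relative
    change is below rel_tol."""
    denom = np.maximum(np.abs(w_new), abs_floor)
    rel_change = np.abs(w_new - w_old) / denom
    return np.max(rel_change) < rel_tol
```

With this test, each coefficient is resolved well enough to compare against the refinement and de-refinement thresholds, even though the linear system as a whole is far from converged in the 2-norm.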
With goal-based adaptivity this requires that the error metric used has a very good effectivity index, and we are forced to compute the goal-based error metric at each iteration, which may be expensive. For regular adaptivity however this is very simple, as the error is given by a scaled version of our P 0 solution mapped to wavelet space. As such, with our P 0 discretisation we need only map to the equivalent wavelet space (at the cost of an O(n) mapping) and compute the relative change of each wavelet coefficient; if all the wavelet coefficients have converged to a relative tolerance of 0.1, our iterative method has converged sufficiently for this adapt step and we can exactly determine the refinement/de-refinement required by the next adapt step. This is made more difficult by our sub-grid scale discretisation, as the error calculation and thresholding after each adapt step uses Ψ, and hence requires computing Θ with (4) at every iteration, which is costly given the matrix multiplications required. Instead, we map the "coarse" variable Φ to wavelet space at each iteration and check the convergence of the wavelet coefficients on the continuous mesh. This doesn't necessarily produce the exact adapted discretisation that would otherwise have been built, but it is very close; there are typically only a handful of wavelets that would otherwise be refined/de-refined at any single adapt step, and those can be picked up by subsequent steps. Fig. 8 shows the results from using this reduced tolerance in a streaming problem and we can see that after 5 adapt steps, this results in a scalar flux and adapted discretisation almost identical to those without the reduced tolerance solve, shown in Figures 4m and 4k. We can see in Fig. 8 that there is a small region at around x = 1.5, y = 0 where refinement has not been triggered in the same way, but overall we find this is a very robust way to reduce the cost of our adaptive process.
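The O(n) mapping itself has the familiar Haar averaging/differencing structure. A one-dimensional sketch is given below; the actual mapping operates over the nested P 0 hierarchy on the sphere, but the recursive structure, and the O(n) cost, are the same.

```python
import numpy as np

def p0_to_haar(v):
    """O(n) map from nested P0 (piecewise-constant) values to Haar wavelet
    coefficients in 1D; v has length 2**L. Each pass averages and
    differences pairs, halving the number of values."""
    v = np.asarray(v, dtype=float).copy()
    coeffs = []
    while len(v) > 1:
        avg = (v[0::2] + v[1::2]) / 2.0
        det = (v[0::2] - v[1::2]) / 2.0
        coeffs.append(det)
        v = avg
    coeffs.append(v)  # the single scaling coefficient
    return np.concatenate(coeffs[::-1])

def haar_to_p0(w):
    """Inverse O(n) map: reconstruct the P0 values from Haar coefficients."""
    w = np.asarray(w, dtype=float)
    v = w[:1].copy()
    k = 1
    while k < len(w):
        det = w[k:2 * k]
        out = np.empty(2 * k)
        out[0::2] = v + det
        out[1::2] = v - det
        v, k = out, 2 * k
    return v
```

At each solver iteration the coarse variable Φ would be pushed through a map of this kind, and the resulting coefficients fed to the per-coefficient convergence test described above.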
Table 5 shows the results from using both a non-zero initial guess (which has very little effect) and the reduced tolerance solves in a streaming problem. We should note we have included in the WUs the extra cost that comes from mapping to wavelet space and computing the relative change in each wavelet coefficient at every iteration. Compared to Table 2 we can see the cost of the first four adapt steps has been reduced considerably. If we compare the total cost of all five adapt steps and scale by the nnzs in a uniform level 3 discretisation, we find the total work has reduced from 72 WUs to 54. [4] showed that it costs 70 WUs to solve a uniform level 3 discretisation, so our adapt process with a reduced tolerance solve beats the uniform equivalent. This cost saving only increases as the number of adapt steps is increased; previously in [1,2,3] we typically needed a higher number of adapt steps than five in order to beat a uniform discretisation. We see similar results with scattering, with Table 6 showing a substantial reduction in the number of iterations for the first two adapt steps and Fig. 9 showing that the resulting adapted discretisation is almost identical to that in Fig. 6h. If we compare the total cost of all three adapt steps and scale by the FLOPs required to compute the matrix-free matvec for the uniform level 3 angular discretisation, the total work is reduced from 26 WUs to 17 with the reduced tolerance solves. In both cases this is less than that required to solve the uniform level 3 angular discretisation, as [4] showed this costs 41 WUs.

Conclusions
In this work we presented an adaptive angular discretisation, with nested hierarchical P 0 angular elements that can be mapped to the Haar wavelet space we presented previously [1]. Once adapted, the angular matrices required between two nodes with different angular discretisations are no longer block diagonal, and hence we introduced a modified stabilisation term for use with our low-memory sub-grid scale FEM discretisation that ensured the nnzs in our adapted matrices didn't grow considerably with adaptivity. We found in both pure streaming and scattering problems that the number of nnzs grew 2-3% above that expected from the number of adapted DOFs. This meant we could form the streaming/removal operator scalably with P 0 angular adaptivity and hence use AIRG and the additively preconditioned iterative method we developed [4]. The results from this showed that we can solve our adapted problems in both streaming and scattering limits with very close to fixed work and memory. Our methods don't rely on Gauss-Seidel methods/sweeps, block-diagonal or lower-triangular structure, or diagonal/block scaling of our matrices, and can be applied to many different discretisations.
Given our P 0 discretisation is equivalent to a Haar wavelet discretisation, with adaptivity we mapped our solution to Haar space (in O(n)) and hence tagged the angular elements that require refinement/de-refinement. As such, we introduced the ability to robustly build up our adapted discretisation with regular adaptivity at reduced cost. We achieve this by mapping the coarse solution, Φ, to Haar space and testing the relative convergence of each wavelet coefficient at each iteration of our iterative methods. This resulted in an adapted discretisation very similar to that which would normally be produced and a large reduction in the iteration count of every adapt step prior to solving the final adapted discretisation. In a simple box test problem this reduced the cross-over point at which our adaptive process beats a uniform discretisation, down to only 5 steps with streaming or 3 with scattering. This shows the benefit of forming a P 0 discretisation hierarchically so that it can be mapped to an equivalent wavelet space, even when the solve occurs in P 0 space; our refinement/de-refinement is simple and we can use this to robustly reduce the cost of our adapt process with regular adaptivity.
We also presented a CF splitting algorithm based on element agglomeration that can use the adapted angular discretisation at each spatial point to determine "important" directions and hence produce a semi-coarsening in the streaming limit without needing the matrix entries. We found this performed worse than typical CF splitting algorithms like Falgout-CLJP when used to invert the streaming operator, but when used with AIRG to invert the streaming/removal operator with a large total cross-section, this method scaled similarly to Falgout-CLJP, though with approximately twice the work in the solve. Given it is simple to freeze the element agglomeration after a few adapt steps, it may work out cheaper overall to use an element-agglomeration CF splitting in scattering problems with many adapt steps, but we leave examining this for future work.
Overall, the combination of Falgout-CLJP CF splitting, AIRG and our additively preconditioned iterative method resulted in close to scalable work in the solve in both the streaming and scattering limits with angular adaptivity for the Boltzmann transport problem. In previous work [1] we found that in the streaming limit the iteration count of a matrix-free multigrid method solving in Haar space tripled after only 3 adapt steps in angle; here we found the iteration count of our new method only went from 9 to 12 after 3 adapt steps. We believe this makes our iterative method an attractive choice for solving adapted discretisations of the Boltzmann transport equation. Future work will involve building an optimised version of this method in order to compare the cross-over point where our P 0 adaptivity results in a lower runtime than a uniform discretisation, re-using components built in prior adapt steps in order to reduce setup times, and examining the performance in parallel with load-balancing.

Figure 1: Angular domains showing angular adaptivity focused around a single "important" direction, namely µ ∈ [0, 1] and ω ∈ [1.47976, 1.661832], in a 2D simulation. The P 0 angular discretisation is on the r = 1 sphere, but has been projected onto faceted polyhedra for ease of visualisation. The camera is pointed in the −z direction.

Figure 2: Sparsity of an element streaming matrix from D given adapted P0 angle in 2D. The three DG nodes have 10, 7 and 10 angles present, respectively; all three nodes start with three octants at level 1 and one octant at level 2, and the nodes with 10 angles have one of the level 2 patches further refined to level 3.

Figure 3: Field of values of the streaming operators on the third spatial grid. Blue is uniform level 3 angular refinement, red is an adapted angular discretisation with a maximum level of refinement of 3.

Figure 4: Adapt results for a 2D pure streaming problem with a small source at the centre of the domain, allowing a maximum of 3 levels of regular angular refinement (giving a maximum number of angles of 64). The first column shows the scalar flux and the second the number of angles across space. The third column shows the resulting element agglomeration on the second spatial grid if the directional algorithm in Section 4.1 is used. The rows show consecutive regular adapt steps.

Figure 9: Adapt results after 3 regular adapt steps in a 2D pure scattering problem with total and scattering cross-section of 10.0 and a small source at the centre of the domain, allowing a maximum of 3 levels of regular angular refinement, with adapt steps prior to the final step solved with a wavelet-based reduced tolerance to determine convergence.

Table 1: Results from using AIRG on a pure streaming problem in 2D with CF splitting by directional element agglomeration, drop tolerances on A of 0.0075 and R of 0.025, with regular angular adaptivity with a refinement tolerance of 0.001, a maximum of 3 levels of angular refinement and 5 adapt steps. The WUs listed are scaled by the nnzs in the adapted solve at each step.

Table 2: Results from using AIRG on a pure streaming problem in 2D with CF splitting by the hypre implementation of Falgout-CLJP with a strong threshold of 0.2, drop tolerances on A of 0.0075 and R of 0.025, with regular angular adaptivity with a refinement tolerance of 0.001, a maximum of 3 levels of angular refinement and 5 adapt steps. The WUs listed are scaled by the nnzs in the adapted solve at each step. Columns: adapt step, NDOFs, n its, CC, Op Complx, WUs full, WUs DG, Memory.

Table 5: Results from using AIRG on a pure streaming problem in 2D with CF splitting by the hypre implementation of Falgout-CLJP with a strong threshold of 0.2, drop tolerances on A of 0.0075 and R of 0.025, with regular angular adaptivity with a refinement tolerance of 0.001, a maximum of 3 levels of angular refinement, 5 adapt steps, and with adapt steps prior to the final step solved with a wavelet-based reduced tolerance to determine convergence. The WUs listed are scaled by the nnzs in the adapted solve at each step.

Table 6: Results from using additive preconditioning on a pure scattering problem with total and scattering cross-section of 10.0 in 2D, with regular angular adaptivity with a refinement tolerance of 0.001, a maximum of 3 levels of angular refinement and 3 adapt steps. The WUs listed are scaled by the nnzs in the adapted solve at each step. The cycle and operator complexity listed are for AIRG on M Ω with CF splitting by Falgout-CLJP.