An Adjoint Method for High-Resolution EPMA Based on the Spherical Harmonics (PN) Model of Electron Transport

Abstract Electron Probe Microanalysis (EPMA) is a nondestructive technique to determine the chemical composition of material samples in the micro- to nanometer range. Based on intensity measurements of x-radiation, the reconstruction of the material composition and structure poses an inverse problem. The reconstruction methods currently applied are based on models that assume a homogeneous or a layered structure of the studied material. To increase the spatial resolution of reconstruction in EPMA, a more sophisticated reconstruction method, based on a model that allows complex material structures, must be combined with multiple measurements at varying beam configurations. We present a deterministic k-ratio model based on the PN model, an approximation of the radiative transfer equation for electron transport. Our goal is to approximate a maximum likelihood solution of the inverse problem using gradient-based optimization. We detail the application of the model in the context of algorithmic differentiation, in particular by deriving its continuous adjoint formulation. Algorithmic differentiation provides the flexibility to adapt the reconstruction method to various material parametrizations and thus to regularize and take prior knowledge into account. Through examples, we verify our implementation and demonstrate the flexibility of the reconstruction/differentiation method.


Introduction
Electron Probe Microanalysis (EPMA) (Heinrich and Newbury 1991; Reimer 1998) is an imaging technique used for the quantitative analysis of the composition of solid material samples at the micro- to nanometer scale.
The sample is excited by a focused beam of electrons, which induces multiple relaxation processes inside the sample. In EPMA, the emission of characteristic x-rays is of special interest. If an electron induced by the beam strikes a bound electron that occupies an atomic shell of an atom inside the specimen, the bound electron is ejected from its shell and the atom is left with a vacancy. Outer-shell electrons fill this vacancy by emitting a quantized x-ray with an energy corresponding to the energy-level difference of the excited (initial) and the relaxed (final) level. The energy levels of electron shells are characteristic of a specific atom, hence the energy of the emitted x-ray provides information about the composition of the material sample. Reconstructing the chemical composition from the intensities of emitted x-radiation (normalized into k-ratios $(k_a)_{a \in A}$, where $A$ is the set of measurement setups) forms the inverse problem in EPMA. In Figure 1, we outline the main physical processes that occur during an experiment.
The crucial ingredient of the inverse problem of reconstruction is the definition of a model $k \circ P : \mathcal{P} \to \mathbb{R}^{|A|}$ that maps parameters $p \in \mathcal{P} \subseteq \mathbb{R}^{n_p}$ ($\mathcal{P}$: parameter space, $n_p$: number of parameters) to k-ratios $(k \circ P)(p) \in \mathbb{R}^{|A|}$ ($|A|$: number of measurements). The reconstruction can then be formalized as the following inverse problem: find the set of parameters $p^\ast$ such that the modeled k-ratios $(k \circ P)(p^\ast)$ reproduce the experimental k-ratios $k^{\mathrm{exp}}$ as well as possible.
The applications of EPMA are manifold (Llovet et al. 2021; Moy, Fournelle, and von der Handt 2019; Pinard et al. 2013). From geology and material science to electronics, researchers rely on EPMA to determine the material composition and structure of samples. Every application brings with it different prior knowledge and objectives that must be considered in the reconstruction. We subdivide our model into a k-ratio model $k : (G \to \mathbb{R}^{n_e}) \to \mathbb{R}^{|A|}$, $\rho \mapsto k[\rho]$, and a material parametrization $P : \mathcal{P} \to (G \to \mathbb{R}^{n_e})$, $p \mapsto P(p) = \rho$. Thereby $\rho_i(x)$ is the mass concentration of element $i$ in a compound with $n_e$ constituents at a point $x \in G$. The sample domain $G \subset \mathbb{R}^3$ will be specified later. While the k-ratio model $k$ can handle general materials, the reconstruction can be tailored to specific applications via the material parametrization $P$ (e.g., using prior knowledge). In particular, our definition includes the reconstruction approaches currently applied in EPMA, which are, however, very restrictive regarding the material parametrization $P$: ZAF models assume homogeneity of the material inside the interaction volume of each beam. Effectively, this assumption limits the spatial resolution of EPMA to the size of the interaction volume (Buse and Kearns 2020; Carpenter and Jolliff 2015; Moy and Fournelle 2017). Reconstruction methods based on the integration of $\phi(\rho z)$-curves currently only allow for samples that are layered in depth (Moy and Fournelle 2020). We aim to leverage a more sophisticated k-ratio model and the combination of k-ratio measurements from various beam positions and energies (resp. measurement setups $A$) to obtain a high-resolution spatial image of the material composition.

[Figure 1: Beam electrons scatter inside the sample and strike bound electrons, which leaves a vacancy (blue circle). Outer-shell electrons (blue discs) fill this vacancy and release an x-ray of characteristic energy (red wavy line). The x-ray travels through the sample and is counted by a detector. Beam electrons only excite a certain volume of the sample, the interaction volume (gray ellipses), which scales with the beam energy $\mu_{\varepsilon}$.]
In this article, we present our model for k-ratios (taken from Bünger, Richter, and Torrilhon 2022), with a focus on the dependence on the mass concentrations $\rho$. The main part is dedicated to the derivation of an efficient and modular approach to compute the derivatives of our model, facilitating its integration into differentiable programming (Dorigo et al. 2023). We derive our differentiation approach based on the adjoint mode of algorithmic differentiation (Griewank 2003; Naumann 2011) and the continuous adjoint method (Plessix 2006). The derivative forms the crucial part in the implementation of gradient-based optimization methods that find a local minimum of an objective function as a candidate solution of the inverse problem. We show the agreement of our gradient computation with finite differences and demonstrate the advantages of the adjoint method in terms of computation time. We also provide reconstruction experiments that compare different parametrizations in 1D and show a 2D reconstruction example.

The inverse problem of reconstruction
In theory (Stuart 2010; Tarantola 2005), the solution of the inverse problem is defined as knowledge of the posterior uncertainty of the parameters, based on the likelihood of the data (which describes the uncertainty of the model and the measurement) and the prior uncertainty of the parameters.
So far, we consider a Bayesian inversion of the presented problem as intractable. However, in statistical terms, our approach can be interpreted as finding the maximum likelihood estimate under the assumption of an isotropic Gaussian likelihood. We define the reconstruction problem as the following optimization problem:

$$\min_{p \in \mathcal{P}} \underbrace{\sum_{a \in A} \big( (k_a \circ P)(p) - k_a^{\mathrm{exp}} \big)^2}_{= (J \circ k \circ P)(p)} \qquad (1)$$

Assuming that the objective function $J \circ k \circ P$ is differentiable w.r.t. $p$, gradient-based optimization methods (steepest descent, conjugate gradient, L-BFGS) (Nocedal and Wright 2006) can be utilized to iteratively approximate a local minimum $p^\ast \in \mathcal{P}$. For an efficient application of gradient-based methods, the efficient computation of the gradient is crucial. Additionally, we require our code to remain modular, such that different parts of the model and, more importantly, the parametrization of the mass concentrations $P$ are exchangeable. We remark that the computation of maximum likelihood estimates can be ambiguous and ill-posed depending on the number of available measurements and the chosen parametrization. Additional prior information about the material would then be necessary, which could enter our minimization problem (1) as an additional regularization term $R(\rho(\cdot), p)$. Regarding the ill-posedness of the inverse problem, we refer to further research and note that the addition of a differentiable regularization term is possible.
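The minimization of (1) by a gradient-based method can be sketched as follows; the forward model here is a hypothetical two-parameter toy function standing in for $k \circ P$, and the step size and iteration count are arbitrary choices, not the paper's actual settings:

```python
# Minimal sketch of the reconstruction loop in Eq. (1). All quantities here
# (the toy forward model `k_model`, the parameters, the "experimental" data)
# are hypothetical stand-ins, not the paper's PN-based model.

def k_model(p):
    # Toy differentiable forward model: 2 parameters -> 3 k-ratios.
    p1, p2 = p
    return [p1 * p2, p1 + 0.5 * p2, p1 ** 2]

def k_model_grad(p):
    # Jacobian of the toy model, rows = k-ratios, cols = parameters.
    p1, p2 = p
    return [[p2, p1], [1.0, 0.5], [2.0 * p1, 0.0]]

def objective_and_gradient(p, k_exp):
    # J(p) = sum_a (k_a(p) - k_a^exp)^2 and dJ/dp via the chain rule.
    k = k_model(p)
    J = sum((ka - ke) ** 2 for ka, ke in zip(k, k_exp))
    Jac = k_model_grad(p)
    grad = [sum(2.0 * (k[a] - k_exp[a]) * Jac[a][j] for a in range(3))
            for j in range(2)]
    return J, grad

# Steepest descent with a fixed step size (for illustration only).
k_exp = k_model([1.2, 0.8])          # synthetic "experimental" k-ratios
p = [1.0, 1.0]
for _ in range(2000):
    J, g = objective_and_gradient(p, k_exp)
    p = [pi - 0.05 * gi for pi, gi in zip(p, g)]

print(J)  # close to 0: the toy parameters are recovered
```

In the paper's setting, the analytic Jacobian above is replaced by the adjoint computation derived in the differentiation section.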

The forward model
The characteristic x-ray intensity model used in this article combines a model for electron transport, based on the continuous slowing down approximation (CSD) of the linear Boltzmann equation (radiative transfer), with a subsequently applied model for x-ray generation and attenuation. In Bünger (2021), Bünger, Sarna, and Torrilhon (2022), and Bünger, Richter, and Torrilhon (2022), the model is derived in more detail. Here we derive and summarize the main equations and address the dependence on the mass concentrations $\rho$ in detail.

K-ratios
In EPMA, x-ray intensity is measured by counting all characteristic x-rays whose emission is induced by the electron beam. Because of multiple uncertain quantities concerning the experimental device, the x-ray intensity $I_a$ is normalized into a k-ratio $k_a$ using standard intensities $I_a^{\mathrm{std}}$ measured from a known reference sample.
Therefore, the normalization eliminates uncertain multiplicative factors that influence the intensity, e.g., the detector efficiency.
An experiment yields multiple k-ratios. Each k-ratio corresponds to a specific x-ray transition and a specific experimental setup, e.g., the beam position, energy, or angle. To distinguish between k-ratios, we use $a = (a_L, a_B) \in A$ to denote a multi-index that contains all information about the transition $a_L$ and the beam setup $a_B$.
$$a = \Big( \underbrace{Z_i \in \{Z_1, Z_2, \dots, Z_{n_e}\},\ I \in \{K, L1, \dots, M1, \dots\},\ F \in \{L1, \dots, M1, \dots\}}_{a_L},\ \underbrace{(\mu_{x,\mathrm{beam}}, \Sigma_{x,\mathrm{beam}}),\ (\mu_{\varepsilon,\mathrm{beam}}, \Sigma_{\varepsilon,\mathrm{beam}}),\ (\mu_{\Omega,\mathrm{beam}}, \kappa_{\Omega,\mathrm{beam}})}_{a_B} \Big)$$

The individual symbols are the atomic number $Z_i \in a_L$, the initial (excited) $I \in a_L$ and the final (relaxed) $F \in a_L$ x-ray level of the electron transition,¹ the mean and variance of the beam's position $(\mu_{x,\mathrm{beam}}, \Sigma_{x,\mathrm{beam}}) \in a_B$ and of the beam's energy $(\mu_{\varepsilon,\mathrm{beam}}, \Sigma_{\varepsilon,\mathrm{beam}}) \in a_B$, and the mean and concentration parameter of the beam's direction $(\mu_{\Omega,\mathrm{beam}}, \kappa_{\Omega,\mathrm{beam}}) \in a_B$.

Characteristic X-ray emission model
Based on mass concentrations $\rho$, we aim for a generic model for k-ratios $k_a$, which, according to the usual definition in EPMA, are computed as the ratio of a measured intensity $I_a$ and a standard intensity $I_a^{\mathrm{std}}$. Both intensities can be computed from the same model; only the underlying mass concentrations, $\rho$ resp. $\rho_a^{\mathrm{std}}$, change. Neglecting multiplicative factors that cancel due to the normalization with standard intensities $I_a^{\mathrm{std}}$, we formulate the intensity of characteristic x-rays $I_a$ as the integral of the attenuation field $A_a$, the number of atoms $N_a$, and the generation field of characteristic x-rays $\mathcal{I}_a$ over a domain $G \subset \mathbb{R}^3$ of the sample, which is chosen such that $G$ covers the support of $\mathcal{I}_a$, i.e., $\mathcal{I}_a(x) = 0\ \forall x \in \mathbb{R}^3 \setminus G$.

¹ In IUPAC notation (Jenkins et al. 1991), an x-ray level corresponds to an electron configuration that is typically described by the removal of an electron from the configuration of the neutral ground state (e.g., K, L$_2$). A characteristic x-ray is emitted by transitioning from an initial to a final x-ray level, namely the x-ray transition (e.g., K–L$_2$).
The attenuation field $A_a$ describes the intensity reduction of generated x-rays while traveling through the sample toward the detector. Attenuation is quantified by the Beer–Lambert law, where the attenuation coefficient $\mu_a$ is the weighted sum of mass concentrations $\rho_i$ and mass attenuation coefficients $\tau_{a,i}$.² In (6), the integration domain is the straight line $l(x, x_d) \subset G$ from a point $x \in G$ toward the detector position $x_d \in \partial G$, such that the line $l(x, x_d)$ lies inside the domain $G$. Neglecting the part outside the domain $G$ assumes that x-rays travel unattenuated in the vacuum surrounding the material sample.
The number of atoms $N_a$ (per unit volume) is given by dividing the mass concentration field $\rho_i$ by the atomic mass $A_a$ of the element $Z_i \in a$. The third factor that influences the x-ray intensity $I_a$ is the field of characteristic x-ray generation $\mathcal{I}_a$.
It contains the electron fluence $\psi_a(x, \varepsilon, \Omega) = |v(\varepsilon)| f(x, \varepsilon, \Omega)$ ($v(\varepsilon)$: electron velocity, $f(x, \varepsilon, \Omega)$: number density of electrons) of beam electrons at $x \in G$ with energy $\varepsilon \in E = [\varepsilon_{\mathrm{cut}}, \varepsilon_{\mathrm{max}}]$ traveling into direction $\Omega \in S^2$ inside the material probe, weighted by the x-ray production cross-section $\sigma_a$.³ The energy interval $E$ is defined by the cutoff energy $\varepsilon_{\mathrm{cut}}$, which is chosen so small that no further ionization can occur, and $\varepsilon_{\mathrm{max}}$, which is chosen sufficiently large to capture all beam electrons.
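Collecting the three factors, the emission model described above can be summarized as follows; this is a compact restatement consistent with the definitions in the text, not the verbatim display equations (constant prefactors that cancel in the k-ratio are omitted, and $A_a$ in $N_a$ denotes the atomic mass of $Z_i \in a$):

```latex
k_a = \frac{I_a}{I_a^{\mathrm{std}}}, \qquad
I_a = \int_G A_a(x)\, N_a(x)\, \mathcal{I}_a(x)\, \mathrm{d}x, \qquad
N_a(x) = \frac{\rho_i(x)}{A_a},
\\[4pt]
A_a(x) = \exp\!\Big(-\int_{l(x,x_d)} \mu_a[\rho](y)\, \mathrm{d}y\Big), \qquad
\mu_a[\rho](x) = \sum_{i=1}^{n_e} \tau_{a,i}\, \rho_i(x),
\\[4pt]
\mathcal{I}_a(x) = \int_E \int_{S^2} \sigma_a(\varepsilon)\,
\psi_a(x, \varepsilon, \Omega)\, \mathrm{d}\Omega\, \mathrm{d}\varepsilon .
```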

² In the literature, the mass attenuation coefficient is commonly denoted by $(\mu/\rho)_{a,i}$. It depends on the medium $i$ and the x-ray wavelength $a$. For brevity of notation, we write $\tau_{a,i}$.

³ For x-ray transitions $a_L$ where the initial state is K, the x-ray production cross-section can be replaced by the ionization cross-section of the K shell (the k-ratio normalization eliminates the fluorescence yield). For other transitions, e.g., L or M as the final state, the x-ray production cross-section should include the ionization cross-sections of all lower shells weighted by transition probabilities.
Describing the electron fluence $\psi_a$ forms the most difficult part of modeling the emission of characteristic x-rays for inhomogeneous samples. We detail our model for the electron fluence $\psi_a$ in the following section.

Deterministic electron transport model
The dynamics of the beam electrons are described by a linear Boltzmann equation, an integro-differential equation describing the evolution of the electron number density in phase space (position–velocity space), whereby the physical processes governing the evolution of the electron number density are free-flight phases interrupted by elastic and inelastic collisions with atoms of the material probe. Since the timescale of the physical processes is very small in comparison to the duration of the experimental measurements of x-ray intensities, it is valid to neglect the time dependence and assume a steady-state distribution of beam electrons immediately after switching on the electron beam. Furthermore, as inelastic scattering cross-sections are highly peaked around small energy losses, i.e., electrons most probably lose a significant amount of energy in a sequence of small energy losses, it is common in EPMA to model the energy loss of projectile electrons as a continuous deceleration along the trajectory. These simplifications lead to modeling the beam electrons by the Boltzmann equation of electron transport in continuous slowing down (CSD, resp. BCSD) approximation, an evolution equation in energy (Larsen et al. 1997), where the energy loss is governed by the stopping power $S$ and the scattering operator $Q$ accounts for elastic and inelastic collisions (CSD average). Both the stopping power $S$ and the scattering operator $Q$ are derived from cross-sections for compounds, which are defined by means of the additivity approximation. We use the cross-sections from Salvat et al. (2007) that are also used in Bünger (2021) and Olbrant (2012). Assuming a well-defined atomic mass, we can write both as sums of atomic quantities $S_i$ and $Q_i$⁴ weighted by mass concentrations $\rho_i$:

$$S[\rho](x, \varepsilon) = \sum_{i=1}^{n_e} \rho_i(x)\, S_i(\varepsilon) \qquad (10)$$

$$Q[\rho](x, \varepsilon) = \sum_{i=1}^{n_e} \rho_i(x)\, Q_i(\varepsilon) \qquad (11)$$

⁴ For brevity of notation, we divide both quantities by the respective atomic masses.
The BCSD model describes the evolution of the electron fluence $\psi_a$ in energy and consequently needs to be equipped with initial and boundary conditions. As the evolution is prescribed from high to low energies, an initial condition defines the electron fluence at the maximal energy $\varepsilon_{\mathrm{max}}$:

$$\psi_a(x, \varepsilon = \varepsilon_{\mathrm{max}}, \Omega) = \psi_{a,0}(x, \Omega) = 0. \qquad (12)$$

We initialize the electron fluence at an energy $\varepsilon_{\mathrm{max}}$ sufficiently larger than the mean beam energy $\mu_{\varepsilon,\mathrm{beam}}$ by $\psi_{a,0}(x, \Omega) = 0$ and model the electron beam by boundary conditions. The electron transport model can be equipped with Dirichlet-type boundary conditions by prescribing the incoming half of the electron fluence at the boundary, where $n$ denotes the outward-pointing normal vector at $x \in \partial G$ and $\psi_{a,\mathrm{in}}$ is a given distribution of incoming particles. We model electron beams by a product of Gaussian distributions in space and energy together with a von Mises–Fisher distribution in direction. An electron beam that is aligned with the outward-pointing normal $n$ of the polished material surface and focused onto the point $\mu_{x,\mathrm{beam}} \in \partial G$ is modeled by the product $N_{a,x}(x)\, N_{a,\varepsilon}(\varepsilon)\, F_{a,\Omega}(\Omega)$, where $N_{a,x}(x)$ is the probability density function of a Gaussian with mean $\mu_{x,\mathrm{beam}} \in a$ and covariance $\Sigma_{x,\mathrm{beam}} \in a$. Respectively, the distribution in energy $N_{a,\varepsilon}(\varepsilon)$ (Gaussian with mean $\mu_{\varepsilon,\mathrm{beam}} \in a$ and variance $\Sigma_{\varepsilon,\mathrm{beam}} \in a$) and the distribution in direction $F_{a,\Omega}(\Omega)$ (von Mises–Fisher with mean $\mu_{\Omega,\mathrm{beam}} \in a$ and concentration $\kappa_{\Omega,\mathrm{beam}} \in a$) are defined.
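As a numerical sketch, the incoming beam distribution can be evaluated as the product of the three densities; all parameter values below are illustrative placeholders, and the von Mises–Fisher normalization on $S^2$ is $\kappa / (4\pi \sinh \kappa)$:

```python
# Hedged sketch of the beam model: a product of Gaussians in position and
# energy and a von Mises-Fisher density in direction. All parameter values
# are made up for illustration, not taken from the paper.
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def vmf_pdf(omega, mu, kappa):
    # von Mises-Fisher density on the unit sphere S^2:
    # f(omega) = kappa / (4 pi sinh(kappa)) * exp(kappa mu . omega)
    dot = sum(a * b for a, b in zip(mu, omega))
    return kappa / (4 * math.pi * math.sinh(kappa)) * math.exp(kappa * dot)

def beam_density(x, eps, omega,
                 mu_x=0.0, var_x=0.01,       # beam position (1D surface coord)
                 mu_eps=10.0, var_eps=0.04,  # beam energy [keV]
                 mu_omega=(0.0, 0.0, -1.0),  # nominal direction: into sample
                 kappa=50.0):                # directional concentration
    return (gaussian_pdf(x, mu_x, var_x)
            * gaussian_pdf(eps, mu_eps, var_eps)
            * vmf_pdf(omega, mu_omega, kappa))

# The density is maximal for the nominal beam configuration:
d0 = beam_density(0.0, 10.0, (0.0, 0.0, -1.0))
d1 = beam_density(0.05, 9.8, (0.0, 0.1, -math.sqrt(1 - 0.01)))
print(d0 > d1)  # True
```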

Spherical harmonic approximation
A numerical solution of the electron transport model in Equation (9) is expensive due to the high dimension of the phase space of $\psi_a(x, \varepsilon, \Omega)$ and the complexity of the scattering operator $Q$. Therefore, we employ the method of moments to derive a modal approximation of the linear Boltzmann equation. We reduce the phase space by replacing the direction $\Omega$ as an independent variable by a small set of moments that model the electron fluence $\psi$. Here we consider the spherical harmonic ($P_N$) moment approximation (Case and Zweifel 1967; Mark 1944; Mark 1945; Seibold and Frank 2014), i.e., we approximate the electron fluence $\psi$ by its series expansion in spherical harmonics $Y_l^k : S^2 \to \mathbb{R}$, truncated after some finite degree $l \le N \in \mathbb{N}$:

$$\psi_a^{P_N}(x, \varepsilon, \Omega) = \sum_{l=0}^{N} \sum_{k=-l}^{l} u_a^{l,k}(x, \varepsilon)\, Y_l^k(\Omega), \qquad (15)$$

where the expansion coefficients are the spherical harmonic moments of the angular distribution. We model the fluence through the system of evolution equations of its moments $u_a^{l,k}$, which we refer to as the $P_N$ equations. The $P_N$ equations are obtained by weakly enforcing Equation (9) for $\psi_a^{P_N}$ on the test space spanned by spherical harmonics up to degree $N$.
By gathering the coefficients into a moment vector $u_a$, the equation for $u_a$ can be written as

$$F_a(u_a, \rho) = 0, \qquad (17)$$

with suitable definitions of the transport matrices $A^{(d)}$ and the scattering matrix $Q[\rho]$. Using the method of moments, we reduced the phase space by two dimensions at the cost of replacing a scalar equation with a system of $(N+1)^2$ equations. Additionally, $A^{(d)}$ and $Q$ exhibit structures that can be exploited to develop efficient numerical solvers. $Q[\rho](x, \varepsilon)$ is diagonal, as spherical harmonics are eigenfunctions of the collision operator $Q[\rho]$, such that the computational cost of the scattering operation grows linearly in the number of moments.
The transport matrices $A^{(d)}$ inherit a sparsity pattern from the recursion relation of spherical harmonics when the spherical harmonics that are odd (even) in Cartesian direction $d$ are ordered into vector functions. Hence, the even (odd) moments $u_a^{[e,d]}$ couple with the odd (even) moments $u_a^{[o,d]}$ only through their spatial derivatives in direction $d$, which can be exploited by central finite difference schemes on staggered grids (Seibold and Frank 2014).
We employ the boundary conditions introduced in Bünger, Sarna, and Torrilhon (2022). The boundary conditions follow from taking half moments of the Boltzmann boundary condition (13) with spherical harmonics that are odd in direction $d \in \{1, 2, 3\}$ as weighting functions, together with a stabilization step that is similar to the truncation of the series expansion in (15). For a given boundary point $x \in \partial G$ with outward-pointing boundary normal $e_d$, we formulate boundary conditions for the moment vector $u_a$. These boundary conditions are compatible with the characteristics of the $P_N$ equations (17) and allow us to ensure energy stability; see Bünger, Sarna, and Torrilhon (2022) for details. From (12) we derive the corresponding initial condition for the moments $u_a$. The $P_N$ equations (17) describe the evolution of all $(N+1)^2$ moments $u_a$ of $\psi_a$, whereas the intensity model (8) only requires the angular average of $\psi_a$, because we assumed isotropic emission of x-rays. The spherical harmonic $Y_0^0(\Omega)$ is constant, so we identify the moment $u_a^{0,0} \propto \int_{S^2} \psi_a(x, \varepsilon, \Omega)\, \mathrm{d}\Omega$ as the part of our x-ray emission model and simplify the integral (8) accordingly.

Mass concentration model
Typically, the quantities of interest in EPMA are the weight fractions

$$\omega_i(x) = \frac{M_i(\mathrm{d}x)}{M(\mathrm{d}x)},$$

which form a partition of unity, $\sum_{i=1}^{n_e} \omega_i(x) = 1$. In order to determine mass concentrations $\rho(x)$ from weight fractions $\omega(x)$, additional assumptions are necessary. We neglect the influence of different molecular or crystal structures and assume that the compound $\{\omega_1, \dots, \omega_{n_e}\}$ is formed in a volume-preserving manner. Then the total volume is

$$V(\mathrm{d}x) = \sum_{i=1}^{n_e} V_i(\mathrm{d}x) = \sum_{i=1}^{n_e} \frac{\omega_i(x)\, M(\mathrm{d}x)}{\rho_i^{\mathrm{pure}}},$$

and we derive the following relation between mass fractions and mass concentrations:

$$\rho_i(x) = \frac{\omega_i(x)}{\sum_{j=1}^{n_e} \omega_j(x) / \rho_j^{\mathrm{pure}}}.$$

With the additional constraints that $p^{\mathrm{pc}}_{i,k} \ge 0$ and that the weight fractions sum to one, (26) and (27) form the piecewise-constant material parametrization $P^{\mathrm{pc}} : \mathcal{P} \to (G \to \mathbb{R}^{n_e})$. In the examples, we will replace the weight fraction parametrization $\omega^{\mathrm{pc}}(x)$ to investigate other parametrizations $P(p)$.
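The volume-preserving mixing relation can be sketched numerically; the pure-element densities below are illustrative placeholders, not tabulated values:

```python
# Sketch of the volume-preserving mixing relation: mass concentrations
# rho_i from weight fractions omega_i and pure-element densities
# rho_i^pure. All numbers are made up for the example.

def mass_concentrations(omega, rho_pure):
    # rho_i = omega_i / sum_j (omega_j / rho_j^pure)
    assert abs(sum(omega) - 1.0) < 1e-12, "weight fractions must sum to 1"
    specific_volume = sum(w / r for w, r in zip(omega, rho_pure))
    return [w / specific_volume for w in omega]

# Hypothetical binary compound: 70/30 by weight, with pure densities
# 8.9 and 2.7 g/cm^3 (illustrative values).
rho = mass_concentrations([0.7, 0.3], [8.9, 2.7])
print(rho)       # element-wise mass concentrations
print(sum(rho))  # total density of the mixture
```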

Differentiation of the forward model
As described in the introduction, we consider the inverse problem as a minimization problem and apply iterative gradient-based optimization methods to find a candidate solution $p^\ast$. Gradient-based optimization methods require the gradient of the objective function, $\frac{\mathrm{d}J}{\mathrm{d}p}$. An efficient implementation of the gradient of our model that simultaneously remains extensible and modular (particularly regarding the parametrization) is a challenge. While automatic algorithmic differentiation tools have the potential to transform any numerical code, the current versions of AD tools in Julia (Zygote.jl, ReverseDiff.jl) cannot be applied to our model due to memory limitations and lack of performance.
In the following, we present a method that adopts the structure of derivative code for numerical programs from algorithmic differentiation (AD). We focus on the adjoint mode of AD, since we are mainly interested in the gradient of the scalar objective function. The advantage of the adjoint mode is that its complexity scales with the number of outputs, whereas the complexity of numerical differentiation (finite differences) or the tangent mode of AD scales with the number of inputs. Also, the analytical derivation of our method provides a clear understanding that facilitates its memory-efficient implementation.
The core idea of AD is to base the derivative computation on local function transformations that are automatically composed.This also achieves modularity, i.e., if a particular function is modified, only the corresponding transformation needs to be adjusted.

Adjoint algorithmic differentiation
Algorithmic differentiation (AD) (Burghardt et al. 2022; Hüser 2022; Innes 2018; Kreyszig 1991; Naumann 2011) defines a framework to compute derivatives of numerical program code. In contrast to symbolic or numerical differentiation (finite differences), algorithmic differentiation is free of approximation error and modular, so the code remains extensible. Moreover, the automatic differentiation of numerical code is possible. We review the basic definitions of AD with an (unusual) focus on high-/infinite-dimensional spaces. Alongside a motivating example, we describe the systematic composition into code that is able to compute derivatives.
Consider a differentiable function $f : U \subseteq X \to Y$, $x \mapsto f(x) = y$, where $X$ and $Y$ are Hilbert spaces and $U$ is the open domain of $f$. Given an input $x \in U$, we define the tangent operator $\frac{\partial f(x)}{\partial x}[\cdot] : X \to Y$ as the bounded linear operator that satisfies the linearization property of the derivative. Additionally, given a tangent $\dot{x} \in X$ of the input, we define the tangent of the output by

$$\dot{y} = \frac{\partial f(x)}{\partial x}[\dot{x}].$$

Given the tangent operator $\frac{\partial f(x)}{\partial x}[\cdot]$, the corresponding adjoint operator $\frac{\partial f(x)}{\partial x}^\ast[\cdot] : Y \to X$ is defined by

$$\Big\langle \bar{y}, \frac{\partial f(x)}{\partial x}[\dot{x}] \Big\rangle_Y = \Big\langle \frac{\partial f(x)}{\partial x}^{\!\ast}[\bar{y}], \dot{x} \Big\rangle_X,$$

where $\langle \cdot, \cdot \rangle_X$ and $\langle \cdot, \cdot \rangle_Y$ are the respective inner products of $X$ and $Y$.
Similarly to the tangent, given an adjoint $\bar{y} \in Y$ of the output, we define the adjoint of the input by

$$\bar{x} = \frac{\partial f(x)}{\partial x}^{\!\ast}[\bar{y}].$$

Consequently, the identity $\langle \bar{y}, \dot{y} \rangle_Y = \langle \bar{x}, \dot{x} \rangle_X$ holds for all $\dot{x} \in X$ and $\bar{y} \in Y$. In real, finite-dimensional spaces (e.g., $X = \mathbb{R}^n$, $Y = \mathbb{R}^m$), the tangent operator can be represented by the Jacobian $Df(x) \in \mathbb{R}^{m \times n}$ and the adjoint operator by its transpose $Df(x)^T \in \mathbb{R}^{n \times m}$. For examples in infinite-dimensional (function) spaces, we refer to Sec. 3.2.
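In $\mathbb{R}^n$, this pairing can be checked numerically: for a made-up map $f : \mathbb{R}^2 \to \mathbb{R}^3$, the tangent is a Jacobian-vector product, the adjoint a transposed-Jacobian-vector product, and the inner product identity holds to machine precision:

```python
# Numerical check of the tangent/adjoint pairing in R^n: the tangent
# operator is the Jacobian Df(x), the adjoint is its transpose, so
# <ybar, Df(x) xdot> = <Df(x)^T ybar, xdot>. The map f is invented.
import numpy as np

def f(x):
    return np.array([x[0] * x[1], np.sin(x[0]), x[1] ** 2])

def jacobian(x):
    return np.array([[x[1], x[0]],
                     [np.cos(x[0]), 0.0],
                     [0.0, 2.0 * x[1]]])

rng = np.random.default_rng(0)
x = rng.standard_normal(2)
xdot = rng.standard_normal(2)   # input tangent
ybar = rng.standard_normal(3)   # output adjoint

J = jacobian(x)
ydot = J @ xdot                 # tangent mode: push forward
xbar = J.T @ ybar               # adjoint mode: pull back

# Inner product identity <ybar, ydot> == <xbar, xdot>
print(np.dot(ybar, ydot), np.dot(xbar, xdot))
```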
The core idea of AD is to subdivide a function into individual subfunctions (with individual tangent and adjoint operators) and intermediate variables (with intermediate tangents and adjoints). As an example, consider the following decomposition of $f : U \to Y$, $x \mapsto f(x) = y$ into two subfunctions $g : U \to V$ and $h : V \times U \to Y$, where we use an intermediate variable $v \in V$ ($V$ is another Hilbert space):

$$y = h(v, x), \qquad v = g(x). \qquad (36)$$
Given $x \in U$, $v \in V$, and $y \in Y$ as well as an input tangent $\dot{x} \in X$, the tangent $\dot{y}$ of the output is given by

$$\dot{v} = \frac{\partial g(x)}{\partial x}[\dot{x}], \qquad \dot{y} = \frac{\partial h(v,x)}{\partial v}[\dot{v}] + \frac{\partial h(v,x)}{\partial x}[\dot{x}],$$

where we employ the chain rule to split the tangent operator of $f$ into a combination of tangent operators of $g$ and $h$. Additionally, we identify the tangent $\dot{v}$ of the intermediate variable. To find the corresponding adjoint operator, we successively use the adjoint operators of the subfunctions $g$ and $h$ to isolate $\dot{x}$ in the inner product identity. To summarize, given the adjoint $\bar{y} \in Y$, the adjoints $\bar{v}$ and $\bar{x}$ are

$$\bar{v} = \frac{\partial h(v,x)}{\partial v}^{\!\ast}[\bar{y}], \qquad \bar{x} = \frac{\partial g(x)}{\partial x}^{\!\ast}[\bar{v}] + \frac{\partial h(v,x)}{\partial x}^{\!\ast}[\bar{y}].$$

Based on the chain rule, the composition of the individual tangent and adjoint operators of a larger decomposition into an algorithm that computes the derivative is very systematic and can be implemented automatically. We describe the methodology using an arbitrary decomposition of a function $f : x \mapsto f(x) = y$ with $n$ inputs and $m$ outputs into subfunctions $\phi_{i=1,\dots,I}$. With $(v_j)_{j \prec i}$ we denote all predecessors of $v_i$, i.e., all intermediate variables $v_j$ that $v_i$ directly depends on, and with $(v_j)_{j \succ i}$ we denote all successors of $v_i$. Tangent mode AD is the successive computation of tangents

$$\dot{v}_i = \sum_{j \prec i} \frac{\partial \phi_i}{\partial v_j}[\dot{v}_j]. \qquad (38)$$

Given an input tangent $\dot{x}$, a tangent $\dot{v}_i$ represents the directional derivative of $v_i$ in direction $\dot{x}$. Note that the tangent mode follows the sequence of operations in the evaluation of the primal. In adjoint mode AD, the sequence of operations is reversed. To compute an adjoint $\bar{v}_i$, we consider the adjoints $(\bar{v}_k)_{k \succ i}$ of all intermediate variables that depend on $v_i$:

$$\bar{v}_i = \sum_{k \succ i} \frac{\partial \phi_k}{\partial v_i}^{\!\ast}[\bar{v}_k]. \qquad (39)$$
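A scalar toy decomposition (functions invented for illustration) makes the two propagation orders concrete; with $\dot{x} = \bar{y} = 1$, both modes return the same total derivative $\mathrm{d}f/\mathrm{d}x$:

```python
# Sketch of tangent vs. adjoint propagation for the decomposition
# y = h(v, x) with v = g(x), using made-up scalar functions.
import math

g = lambda x: x * x                   # v = g(x)
h = lambda v, x: math.sin(v) + 3.0 * x

# Local tangent/adjoint operators are multiplications by partials.
dg_dx = lambda x: 2.0 * x
dh_dv = lambda v, x: math.cos(v)
dh_dx = lambda v, x: 3.0

x = 0.7
v = g(x)
y = h(v, x)

# Tangent mode: follows the primal order.
xdot = 1.0
vdot = dg_dx(x) * xdot
ydot = dh_dv(v, x) * vdot + dh_dx(v, x) * xdot

# Adjoint mode: reversed order; adjoints accumulate over outgoing edges.
ybar = 1.0
vbar = dh_dv(v, x) * ybar
xbar = dg_dx(x) * vbar + dh_dx(v, x) * ybar

print(ydot, xbar)  # both equal df/dx since xdot = ybar = 1
```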
Usually, in the AD literature, single-assignment code is considered, where all intermediate variables $v_i \in \mathbb{R}$ are scalar and the tangent operators (i.e., multiplications by a scalar) are self-adjoint; the $\ast$ can then be omitted. For real multivariate intermediate variables, the adjoint is the transpose, $\ast = T$. We briefly comment on some aspects of AD:

Computational Complexity: Implementations of tangent and adjoint mode AD differ in algorithmic complexity. The Jacobian of a function $f : \mathbb{R}^n \to \mathbb{R}^m$ is obtained using tangent mode AD by $j = 1, \dots, n$ successive executions of (38), where the input tangents are seeded with unit vectors $\dot{x} = e_j$. Elements of the Jacobian are then retrieved from $(Df(x))_{ij} = \dot{y}_i |_{\dot{x} = e_j}$. In contrast, using adjoint mode AD, the Jacobian is obtained by $i = 1, \dots, m$ successive applications of (39), where the output adjoints are seeded with unit vectors $\bar{y} = e_i$. Elements of the Jacobian are then retrieved from $(Df(x))_{ij} = \bar{x}_j |_{\bar{y} = e_i}$. Hence, to compute the full Jacobian, an implementation of the tangent mode scales in the number of input variables, while the adjoint mode scales in the number of output variables.

Modularity and Extensibility: An implementation of either tangent or adjoint mode AD is modular and extensible. It consists of a systematic composition of individual tangent and adjoint operators. For example, changing one subfunction in (37) only affects its corresponding tangent and adjoint operator in (38) resp. (39); the implementations of the other operators remain.

Computational Graph: Both modes of AD can be visualized using a computational graph consisting of vertices for each variable (input, output, intermediate) and directed edges for the direct dependence of variables. Assume a labeling of the edges with their corresponding tangent/adjoint operator, applied to the tail (tangent) resp. head (adjoint) of the edge. The tangent is then the sum of all incoming edge tangent operators, while the adjoint is the sum of all outgoing edge adjoint operators (cf. Equations (38) and (39)).

Checkpointing and Adjoint Memory Consumption: While the tangents can be computed alongside the execution of the primal function, when the intermediate primal variables $(x, v)$ are still available, these variables must be kept in memory for the computation of the adjoints. Storing all intermediate variables is expensive. Checkpointing remedies the high memory consumption by recomputing certain intermediate variables while computing the adjoint operators. In the example (36), the primal $v = g(x)$ would be recomputed while evaluating the adjoint of $h$.
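Checkpointing can be sketched in a few lines: the forward sweep stores only the input as a checkpoint, and the adjoint sweep recomputes the discarded intermediate (toy scalar functions, invented for illustration):

```python
# Toy illustration of checkpointing in adjoint mode: instead of storing
# the intermediate v = g(x) during the forward sweep, we store only the
# input ("checkpoint") and recompute v when the adjoint of h needs it.
import math

g = lambda x: math.exp(x)
h = lambda v, x: v * x      # so f(x) = x * exp(x)

def forward_with_checkpoint(x):
    # Forward sweep: keep only the checkpoint x, discard v.
    y = h(g(x), x)
    return y, {"x": x}

def adjoint_from_checkpoint(ybar, checkpoint):
    x = checkpoint["x"]
    v = g(x)                               # recomputation instead of storage
    vbar = x * ybar                        # dh/dv = x
    xbar = v * ybar + math.exp(x) * vbar   # dh/dx = v, dg/dx = exp(x)
    return xbar

x = 0.3
y, cp = forward_with_checkpoint(x)
xbar = adjoint_from_checkpoint(1.0, cp)
# Reference: d/dx [x * exp(x)] = exp(x) * (1 + x)
print(xbar, math.exp(x) * (1 + x))
```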
Our model includes the special case that the mapping from $\rho(x)$ to $u_a(x, \varepsilon)$ is given implicitly by the PDE in Equation (17). We state the two basic steps of calculating adjoints in implicit relations and refer to Appendix A for a more detailed derivation. Consider the implicit definition $F(y = g(x), x) = 0$ (with differentiable $F(y, x)$), where we assume that it uniquely defines the differentiable mapping $y = g(x)$. We introduce an additional intermediate adjoint variable $\bar{F}$. Then, given the adjoint $\bar{y}$ of the output, the adjoint $\bar{x}$ of the input can be computed from

$$\frac{\partial F(y,x)}{\partial y}^{\!\ast}[\bar{F}] = \bar{y}, \qquad \bar{x} = -\frac{\partial F(y,x)}{\partial x}^{\!\ast}[\bar{F}], \qquad (40)$$

without an explicit representation of $\bar{x} = \frac{\partial g(x)}{\partial x}^{\!\ast}[\bar{y}]$.
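The two steps in (40) can be checked on a linear test problem where the exact adjoint is known; the operator $F$, the matrix $A$, and all vectors below are hypothetical test data:

```python
# Numerical sketch of the implicit adjoint in two steps: for the linear
# test problem F(y, x) = A y - x = 0 we have y = g(x) = A^{-1} x, so the
# exact adjoint of g is A^{-T}. The adjoint is recovered by
#   (1) solve dF/dy^* Fbar = ybar,   (2) xbar = -dF/dx^* Fbar.
# A is an arbitrary invertible test matrix.
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
x = np.array([1.0, -2.0])
y = np.linalg.solve(A, x)        # primal: F(y, x) = A y - x = 0

ybar = np.array([0.5, 1.5])      # given output adjoint

# Step 1: solve the adjoint system dF/dy^T Fbar = ybar (here dF/dy = A).
Fbar = np.linalg.solve(A.T, ybar)
# Step 2: xbar = -dF/dx^T Fbar, with dF/dx = -I for F = A y - x.
xbar = Fbar

# Reference: xbar should equal A^{-T} ybar, the adjoint of y = A^{-1} x.
print(xbar, np.linalg.inv(A).T @ ybar)
```

In the paper's setting, step 1 corresponds to solving the continuous adjoint $P_N$ equation instead of a small linear system.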

Differentiation method for our model
In the following, we successively define adjoint operators for the individual parts of the model described in Sec. 2. All derivations follow the same scheme: first, we define the tangent operator using the directional derivative; second, we derive the adjoint operator from the inner product identity. We elaborate on complex derivations and only state the adjoint operator for simple ones. In this paper, the only relevant derivatives are those with respect to the mass concentrations $\rho$, hence all quantities that do not depend on $\rho$ are treated as constants and their adjoints are omitted.
Compared to the forward computation, adjoints are computed in reversed order. We follow this order while deriving the adjoint operators for the individual dependencies. For brevity of notation, we omit the accumulation of adjoints (tangents) and only consider the operators that describe the direct propagation. The accumulation becomes apparent in the high-level computational graph in Figure 2: if multiple edges leave a node, the adjoints of all outgoing edges have to be summed up.
$\bar{k}_a = \frac{\partial J}{\partial k_a}^{\!\ast}[\bar{J}]$: The final operator (1) of our model computes the squared error $J$ from a set of simulated k-ratios $k_a$, $a \in \{a_1, \dots\}$. Given the adjoint $\bar{J}$, which for the computation of derivatives $\frac{\partial J}{\partial \cdot}$ must be chosen as $\bar{J} = 1$, the adjoint operator of the squared error is

$$\bar{k}_a = 2\big(k_a - k_a^{\mathrm{exp}}\big)\, \bar{J}.$$

The adjoint operator for the normalization with standard intensities in Equation (4) is also derived by simple means.
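The first two adjoint operators of the reverse sweep are one-liners; the sketch below uses synthetic numbers and treats the standard intensities as constants:

```python
# Sketch of the first two adjoint operators in the reverse sweep: the
# squared-error objective and the k-ratio normalization k_a = I_a / I_a^std.
# All numbers are synthetic.

def adjoint_squared_error(k, k_exp, Jbar=1.0):
    # J = sum_a (k_a - k_a^exp)^2  =>  kbar_a = 2 (k_a - k_a^exp) * Jbar
    return [2.0 * (ka - ke) * Jbar for ka, ke in zip(k, k_exp)]

def adjoint_normalization(kbar, I_std):
    # k_a = I_a / I_a^std  =>  Ibar_a = kbar_a / I_a^std
    # (I_a^std is treated as a constant here, since we only differentiate
    # with respect to the unknown sample's mass concentrations.)
    return [kb / Is for kb, Is in zip(kbar, I_std)]

k = [0.42, 0.55]
k_exp = [0.40, 0.60]
I_std = [2.0, 4.0]

kbar = adjoint_squared_error(k, k_exp)
Ibar = adjoint_normalization(kbar, I_std)
print(kbar, Ibar)
```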
We elaborate on the adjoint operators derived from Equation (5). Linearity of the integral in (5) directly yields the tangent $\dot I_a$. Using the standard definitions of the inner products in $\mathbb{R}$, $\langle a, b\rangle_{\mathbb{R}} = ab$, and in $L^2(G)$, $\langle a, b\rangle_{L^2(G)} = \int_G a^T(x)\,b(x)\,dx$, we derive the adjoints $\bar A_a$, $\bar N_a$ and $\bar I_a$. We detail $\bar A_a$ below; the remaining two read
$$\bar N_a(x) = A_a(x)\,I_a(x)\,\bar I_a \quad\text{and}\quad \bar I_a(x) = N_a(x)\,A_a(x)\,\bar I_a. \tag{44}$$
$\bar\rho = \frac{\partial N_a}{\partial\rho}^\ast[\bar N_a]$: From Equation (7), we find the adjoint operator.
$\bar\rho = \frac{\partial A_a}{\partial\rho}^\ast[\bar A_a]$: Deriving the adjoint of the attenuation given in (6) requires more analysis. The tangent $\dot A_a[\rho](x)$ of the absorption operator is given by the directional derivative in the direction $\dot\rho(x)$. Inserting the tangent $\dot A_a$ into the definition of the adjoint (31) yields the adjoint of the attenuation operator. In the following derivation, we rewrite the line integral from (46) as an integral over the whole space $G$ by introducing the Dirac delta $\delta(d(x, x_d, y))$, which indicates the line segment $l(x, x_d)$. Here $d(x, x_d, y)$ denotes the distance of $y$ to the line segment $l(x, x_d)$ between $x$ and $x_d$.
By swapping the order of integration from $dy\,dx$ to $dx\,dy$, we are able to separate $\dot\rho_i$, but the interpretation of $\delta(d(x, x_d, y))$ changes. For a given $y$, $\delta(d(x, x_d, y))$ now indicates the points $x$ of all lines $l(x, x_d)$ that contain $y$. In other words, $\delta(d(x, x_d, y))$ indicates the line $l(y, x_d^\ast)$ from $y$ to the point reflection $x_d^\ast \in \partial G$ of $x_d \in \partial G$ at $x \in G$. We illustrate $l(x, x_d)$, $l(x, x_d^\ast)$ and $d(x, x_d, y)$ in Figure 3. From (47), we identify the adjoint operator of (6).
Here the integration domain $l(x, x_d^\ast)$ is the line between $x$ and the reflection $x_d^\ast$ of $x_d$ at $x$. By (25), the only relevant moment for the ionization field $I_a$ is $u_a^{(0,0)}$, hence the adjoint $\bar u_a$.

$\bar\rho = \frac{\partial u_a}{\partial\rho}^\ast[\bar u_a]$: The relation between $\rho$ and $u_a$ is given implicitly by the $P_N$ equations and their boundary conditions, i.e., by the implicit PDE operator $F_a(u_a, \rho) = 0$ (17). We describe the derivation of the adjoint operators (40) for the $P_N$ equations; for a detailed derivation of (40) we refer to Appendix A. We introduce the auxiliary adjoint $\bar F_a$ and derive the operators $\frac{\partial F_a}{\partial u_a}^\ast[\cdot]$ and $\frac{\partial F_a}{\partial \rho}^\ast[\cdot]$ in the following; together, both operators form the adjoint $\frac{\partial u_a}{\partial\rho}^\ast[\cdot]$. In particular, we elaborate on the identities (50).

$\frac{\partial F_a}{\partial u_a}^\ast[\bar F_a] = \bar u_a$: The adjoint operator $\frac{\partial F_a}{\partial u_a}^\ast$ applied to $\bar F_a$ describes the continuous adjoint equation.

In analogy to the reversed application of the adjoint operators, the adjoint equation (51), with leading (stopping-power) term $S\rho(x,\varepsilon)\,\partial_\varepsilon \bar F_a(x,\varepsilon)$, describes the evolution of the adjoint $\bar F_a$ in the reversed direction with respect to the energy $\varepsilon$. Initial and boundary conditions follow from the derivation.
Derivation: Linearity of $F_a$ in $u_a$ directly yields the tangent $\frac{\partial F_a}{\partial u_a}\dot u_a$. Using integration by parts and the symmetry of $A^{(d)}$ and $Q$, we obtain (54). The last two terms in (54) vanish due to the initial and boundary conditions imposed on $\dot u_a$ and $\bar F_a$: the former term is zero because of the initial conditions on $\dot u_a$, and the integrand in (56) is zero because $L^{(d)} = L^{(d)T}$, defined in (23), is symmetric.

$\bar\rho = -\frac{\partial F_a}{\partial\rho}^\ast[\bar F_a]$: The tangent $\frac{\partial F_a}{\partial\rho}\dot\rho$ follows directly from the linearity of $F_a$ (and of $S$ and $Q$) in $\rho$, from which we identify the adjoint. The adjoint of the mass concentration model (26) can again be derived by simple means:
$$\bar\omega^{pc}_i(x) = \rho_{\mathrm{tot}}(x)\,\bar\rho_i(x) - \frac{\rho_{\mathrm{tot}}^2(x)}{\rho_i^{\mathrm{pure}}} \sum_{k=1}^{n_e} \omega_k(x)\,\bar\rho_k(x). \tag{60}$$
$\bar p^{pc}$: The indicator functions in (27) lead to a partitioning of the integration domain $G$ into the subdomains $G_k$ for the $\bar p_{i,k}$. This completes the derivation of adjoint formulations for all parts of the model presented in Section 2.
Note that for nonlinear operators the primal result is required to calculate the adjoint and must therefore be saved during the forward execution or recalculated; this is analogous to discrete adjoint tools. Furthermore, all derived operators are modular and only need to be assembled according to the computational graph. When replacing parts of the model, only the corresponding adjoint operator needs to be adjusted. In the examples in Sections 4.3 and 4.4, we use this modularity and exchange the parametrizations $P$.

Computational effort
The motivation for deriving and implementing the adjoint formulation of our model is its efficiency advantage over finite differences and over the tangent mode of AD. Our implementation inherits the runtime complexity of the adjoint mode: the runtime of a Jacobian calculation scales with the number of output variables of the primal. The price to pay for the faster execution is a higher memory requirement, since all primal variables have to be stored during the forward execution or recalculated later.
Leveraging the convenience of automated AD tools to derive the discrete adjoint code of our implementation was not possible with current tools in Julia. However, our approach mimics the development strategy of tools such as Zygote.jl or ChainRulesCore.jl: instead of implementing only tangent and adjoint versions of low-level operations, they aim for efficient implementations of high-level methods. We adopt this approach, derive an adjoint version in the continuous framework, and implement discretized versions of the operators in code.
The high-level view offers further insight and one more possibility to reduce the computational effort for the gradient. Note that the naive adjoint implementation requires $|A|$ "solves" of the adjoint PDE (the same number as the forward model). We separate the multi-index into $a = (a_L, a_B) \in A_L \times A_B$, where $a_L$ describes the k-ratio line (material, transition) and $a_B$ the beam setup (position, energy). Revisiting Equations (44), (49) and (51), we realize that the only dependence of these equations on $a_B$ is through the scalar $\bar I_a$. Instead of computing $\bar I_a$, $\bar u_a$ and $\bar F_a$ for every $a$, we consider $\tilde I_{a_L}$ and thereby eliminate the dependence on the beam setup $a_B$: the operators for $\tilde I_{a_L}$, $\tilde u_{a_L}$ and $\tilde F_{a_L}$ do not depend on $a_B$. The accumulation of the adjoint $\bar\rho_i$ (cf. Equation (59), where we sum over all incoming edges in the computational graph of Figure 2) can then be rewritten: instead of solving the adjoint PDE for every $\bar F_a$, it is sufficient to compute $\tilde F_{a_L}$ for every $a_L$. The number of expensive adjoint PDE solves is thus reduced from $|A|$ to $|A_L|$.
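The saving rests only on linearity: because the beam setup enters the adjoint chain solely through the scalars $\bar I_a$, these scalars can be accumulated over $a_B$ before the expensive adjoint solve, which then runs once per line $a_L$. A toy Python sketch with a generic linear system standing in for the adjoint $P_N$ solve (matrices, sizes and right-hand sides are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
n_L, n_B = 3, 4                                    # lines a_L and beam setups a_B
M = rng.normal(size=(n, n)) + 5.0 * np.eye(n)      # stand-in for the adjoint PDE operator
r = [rng.normal(size=n) for _ in range(n_L)]       # per-line source term (independent of a_B)
Ibar = rng.normal(size=(n_L, n_B))                 # scalar adjoints of the intensities

# naive: one adjoint solve per pair (a_L, a_B) -> |A| = n_L * n_B solves
naive = np.zeros(n)
for aL in range(n_L):
    for aB in range(n_B):
        naive += Ibar[aL, aB] * np.linalg.solve(M.T, r[aL])

# reordered: accumulate the scalars first -> only |A_L| = n_L solves
reordered = np.zeros(n)
for aL in range(n_L):
    reordered += Ibar[aL].sum() * np.linalg.solve(M.T, r[aL])

print(np.allclose(naive, reordered))   # True
```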

Numerical examples
Preliminary results for the examples presented in Sections 4.3 and 4.4 can be found in (Achuda et al. 2023).

Implementation and discretization
Conceptually, the discretizations of the $P_N$ equation (17) and of the adjoint $P_N$ equation (51) are the most challenging parts. For the numerical computation of the moments $u_a$ and $\bar F_a$ we implemented a 3D staggered-grid finite-difference method based on the method StaRMAP presented in (Seibold and Frank 2014). StaRMAP is specifically designed to exploit the structure of the $P_N$ equations: the individual sets of moments $u_a$ are discretized on staggered grids in such a way that the sparsity of the transport matrices $A^{(d)}$ decouples the system and enables second-order central differences to approximate the spatial derivatives on the respective displaced grids. Second-order integration in energy is achieved using a splitting of the moments and half steps for each part of the solution.
The boundary conditions in (22) and (53) are used to compute moments on ghost nodes located outside the computational domain $G$. The values of the moments at the ghost nodes are set such that the boundary condition holds for the interpolated moments on the boundary $\partial G$. For details see Bünger (2021).
For the integrations in energy and space, as well as for the line integral of the absorption, we use the trapezoidal rule.
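For instance, a Beer-Lambert-type attenuation factor $\exp(-\int_{l(x,x_d)} \mu\,ds)$ along the segment toward the detector can be approximated with the trapezoidal rule as follows (a self-contained Python sketch; the function names and the constant stand-in for the attenuation coefficient are ours, not the paper's code):

```python
import numpy as np

def attenuation(x, x_d, mu, n_q=201):
    """Trapezoidal approximation of exp(-integral of mu(y) ds) along the
    straight segment l(x, x_d) from x to the detector position x_d."""
    x, x_d = np.asarray(x, float), np.asarray(x_d, float)
    t = np.linspace(0.0, 1.0, n_q)
    pts = x[None, :] + t[:, None] * (x_d - x)[None, :]     # quadrature points on the segment
    vals = np.array([mu(p) for p in pts])
    ds = np.linalg.norm(x_d - x) / (n_q - 1)               # uniform step length
    integral = ds * (0.5 * vals[0] + vals[1:-1].sum() + 0.5 * vals[-1])
    return np.exp(-integral)

# constant coefficient: closed form exp(-mu * |x_d - x|) for comparison
A = attenuation([0.0, 0.0], [3.0, 4.0], mu=lambda p: 0.2)
print(A, np.exp(-0.2 * 5.0))   # the two values agree
```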

Comparison: sensitivities of a 1D material with finite differences
We consider a binary material consisting of copper (Cu) and nickel (Ni) and compare sensitivities $\partial I_a/\partial p$ computed using our implementation (i.e., $\bar p$, where $\bar I_a$ is seeded with unit vectors) with sensitivities computed by central finite differences. This comparison addresses several questions at once:
- By introducing a second parametrization, we demonstrate the modularity of our implementation.
- We provide insight into the nature of the model. The sensitivity $\partial I_a/\partial p$ is the linear approximation of the intensities with respect to the model parameters, hence a large sensitivity relates to a strong dependency and a small sensitivity to a weak dependency. This is particularly interesting for the inverse problem because it unveils possible difficulties in the reconstruction.
- Agreement with finite differences validates our implementation.
- We demonstrate the claimed superiority of our method over finite differences in terms of computation time.
We introduce an alternative parametrization (63) in which $K_k(x)$, $k \in \{1, \dots, n_k\}$, are triangular (hat) functions covering the domain $G$. The partition of unity $\sum_{i=1}^{n_e} \omega_i(x) = 1$ is (again) guaranteed by $p^{pl}_{i,k} \geq 0$ and $\sum_{j=1}^{n_e - 1} p^{pl}_{j,k} \leq 1$. We refer to this parametrization as piecewise-linear.
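A minimal Python sketch of such a hat-function parametrization and its adjoint (a plain linear map, so the adjoint is the transpose); the helper names and node layout are illustrative, not the paper's implementation:

```python
import numpy as np

def hat_matrix(x, nodes):
    # K[i, k] = K_k(x_i): hat function that is 1 at nodes[k] and 0 at its neighbors
    x = np.asarray(x, float)
    return np.column_stack(
        [np.interp(x, nodes, np.eye(len(nodes))[k]) for k in range(len(nodes))]
    )

def omega_pl(x, nodes, p):
    # piecewise-linear weight fraction: omega(x) = sum_k p_k K_k(x)
    return hat_matrix(x, nodes) @ p

def omega_pl_adjoint(x, nodes, omega_bar):
    # adjoint of the linear parametrization: pbar = K^T omega_bar
    return hat_matrix(x, nodes).T @ omega_bar

nodes = np.linspace(0.0, 1.0, 5)
p = np.array([0.1, 0.4, 0.9, 0.4, 0.1])   # p_k in [0, 1] preserves 0 <= omega <= 1
print(omega_pl([0.125], nodes, p))        # halfway between 0.1 and 0.4 -> [0.25]
print(omega_pl_adjoint([0.125], nodes, np.array([1.0])))
```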
The adjoint $\bar p^{pl}$ of the piecewise-linear parametrization is derived analogously. Modularity shows in the fact that only the parametrization and its adjoint have to be implemented; the rest of the code remains unchanged. Both parametrizations, the piecewise-constant (with $n_k = 20$ parameters) and the piecewise-linear (also with $n_k = 20$ parameters), are employed to compute the sensitivities $\partial I_a/\partial p^{pc}$ and $\partial I_a/\partial p^{pl}$, respectively. The physical setup and the settings of the $P_N$ approximation for this example are given in Table 1. The parameter values of the parametrizations are sampled independently from a uniform distribution on $[0, 1]$. We visualize the sensitivities by plotting $\omega^{pc}_1(x)|_{p = \partial I_a/\partial p^{pc}}$ and $\omega^{pl}_1(x)|_{p = \partial I_a/\partial p^{pl}}$ in Figure 4. Using the parametrizations $\omega(x)$ in this way is a gentle abuse of notation, but since the parameters in $\omega^{pc}$ and $\omega^{pl}$ are related to points in space, an interpretation of the plot is possible. Increasing a parameter increases the mass concentration of Cu and simultaneously decreases the mass concentration of Ni, hence the positive values for Cu x-rays and the negative values for Ni x-rays. With depth the sensitivity decreases, because at greater depth the generation of x-rays decreases and the absorption increases. Figure 4 also compares the sensitivities computed by our method (colored, solid lines) with the sensitivities computed by central finite differences (black, dashed lines); the curves coincide. Additionally, we compare the relative errors of the sensitivities in Table 2. Derivatives computed by algorithmic differentiation are exact: AD computes the derivative of the numerical approximation up to machine precision (Naumann 2011). Both our method and finite differences, however, only provide approximations of the derivative of the continuous model: for our method, the underlying approximation of the adjoint state variable introduces errors; for finite differences, the approximation error originates from the perturbation $h$.
In Table 3, the superiority of our method over finite differences in terms of computation time becomes apparent. We compare the runtime of a central finite-difference method (second order) with our implementation for two different numbers of parameters ($n_p = 20$ and $n_p = 40$). The runtime of the forward model is approximately $0.5\,\mathrm{s}$. The computation of the derivative with finite differences clearly scales with the number of parameters $n_p$, whereas the runtime of our implementation scales with the number of outputs (here $|A| = 4$) and is independent of $n_p$. Note the minor overhead caused by the backward execution.
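The scaling can be summarized in a back-of-the-envelope cost model (illustrative only; the backward-to-forward cost ratio `c` is an assumed constant, not a measured value from the paper):

```python
def fd_cost(n_p, t_f=0.5):
    # central differences: two perturbed primal runs per parameter
    return 2 * n_p * t_f

def adjoint_cost(n_outputs, t_f=0.5, c=1.5):
    # one primal run plus one backward pass per output (k-ratio)
    return t_f + n_outputs * c * t_f

print(fd_cost(20), fd_cost(40))    # 20.0 40.0 -> grows with n_p
print(adjoint_cost(4))             # 3.5       -> independent of n_p
```

The crossover in favor of the adjoint method occurs as soon as the number of parameters exceeds the (small) number of measured outputs, which is the regime of high-resolution reconstruction.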

1D reconstruction of a sharp and a diffusive interface
We present the reconstruction of a coated material consisting of iron Fe and nickel Ni with a sharp (discontinuous) and a diffusive (continuous) interface between the layers.
- We show the necessity of choosing a suitable parametrization.
- We compare the previously defined parametrizations ((27) and (63)) to an additional non-linear parametrization (66).
The non-linear parametrization (66) is based on the structure of a small neural network, with the position $x$ as input and $\omega^{nl}$ as output. The activation function of the single hidden layer is $\tanh$, and that of the output layer is the sigmoid $\mathrm{sig}$, which enforces $0 \leq \omega^{nl}_i \leq 1$. The parametrization uses $n_p = 10$ parameters, which we call $p^{nl} = (a_1, a_2, a_3, b_1, b_2, b_3, \hat a_1, \hat a_2, \hat a_3, \hat b)^T$, so that $\omega^{nl}_1(x) = \mathrm{sig}(\hat w(x))$ with the hidden-layer output $\hat w(x)$ (66b). The adjoints for the non-linear parametrization are derived in the same manner.
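The network structure can be sketched in a few lines of Python. Since the exact wiring of (66) is partly garbled in the source, this is one plausible reading with a width-3 hidden layer and $n_p = 10$ parameters; treat the weight layout as an assumption:

```python
import numpy as np

def sig(z):
    # logistic sigmoid, maps to (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def omega_nl(x, p):
    """Non-linear parametrization: one tanh hidden layer of width 3 and a
    sigmoid output enforcing 0 <= omega <= 1.
    p = (a1, a2, a3, b1, b2, b3, ah1, ah2, ah3, bh), n_p = 10."""
    a, b, ah, bh = p[0:3], p[3:6], p[6:9], p[9]
    w = np.tanh(a * x + b)      # hidden layer
    return sig(ah @ w + bh)     # output layer, in (0, 1)

p = np.array([5.0, 1.0, 0.5, -2.5, 0.0, 0.1, 3.0, 0.2, -0.1, 0.0])
vals = [omega_nl(x, p) for x in np.linspace(0.0, 1.0, 5)]
print(all(0.0 < v < 1.0 for v in vals))   # bounds hold by construction
```

With a large hidden weight, the $\tanh$ unit produces a steep but smooth transition, which is why this parametrization can localize a sharp interface while remaining differentiable.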
For a fair comparison, the number of parameters in the other parametrizations is also chosen to be 10. For the binary material, the scalar $\omega_1(x)$ completely describes the material, because $\omega_2(x) = 1 - \omega_1(x)$. The artificial measurements for the reconstruction are k-ratios from multiple beam energies between 9 keV and 15 keV. To compute the measurements, we assume the reference material to be given. Additional $P_N$ model settings and the considered x-rays are tabulated in Table 4. We choose the BFGS method implemented in Optim.jl (Mogensen and Riseth 2018) as the optimization method, with additional box constraints for the piecewise-constant and piecewise-linear parametrizations and no constraints for the non-linear parametrization.
In Figure 5, the reconstructed density of the material with a sharp interface is visualized for the different parametrizations. From the initial configurations (piecewise-constant, piecewise-linear: 0.5 Fe and 0.5 Ni; non-linear: random), the parametrizations iterate toward the reference density (black line) during the optimization. We visualize the initial, the first, the fifth and the 100th iteration. The inability of the piecewise-linear parametrization to represent discontinuities becomes apparent. The piecewise-constant parametrization approximates the reference perfectly, but only because one of its subdomain interfaces happens to match the interface of the reference material. The non-linear parametrization, on the other hand, is flexible enough to identify the location of the interface and to approximate the discontinuity.
In Figure 6, the reconstructed density of the material with a diffusive interface is visualized. Except for the reference material, exactly the same settings as for the reconstruction of the sharp interface are used. Again, the parametrizations iterate toward the reference density (black line). For the diffusive interface, none of the parametrizations can reconstruct the interface perfectly, but the piecewise-linear and the non-linear parametrizations perform better than the piecewise-constant parametrization. In Figure 7, we visualize the normalized errors during the iterations of the optimization: the value of the objective function $\|k(p) - k^{\exp}\|^2$, the error of the mass concentrations $\|\rho(x) - \rho^{\mathrm{true}}\|^2$ and the error of additional k-ratio measurements $\|k_+(p) - k^{\exp}_+\|^2$ (with beam energies 11 keV and 14 keV). The kinks in the objective-function error are an artifact of the Optim.jl implementation, whose default combines a line-search method (Hager and Zhang 2005) with the BFGS algorithm presented in (Nocedal and Wright 2006).
Differences in the performance of the parametrizations are clearly visible. For the sharp interface, the error in the mass concentrations is by far the smallest for the piecewise-constant parametrization, owing to the possibility of a perfect reconstruction. Only the non-linear parametrization performs similarly well in both examples.
The errors in the mass concentrations for the piecewise-constant and piecewise-linear parametrizations differ between the sharp and the diffusive interface. A correlation between the error in the mass concentrations and the error of the additional k-ratios would be beneficial but is hard to establish visually. Ideally, both would behave similarly, because then we could propose the error in additional k-ratios as a suitable reconstruction quality measure. However, the evaluation of reconstruction quality measures is beyond the scope of this work and is deferred to future research.

2D reconstruction of an ellipsoidal inclusion
We describe the use case of reconstructing an ellipsoidal copper (Cu) inclusion in an iron (Fe) substrate from k-ratios obtained with a 12 keV electron beam at multiple beam positions. In Figure 8, we plot k-ratios obtained from an artificial line scan of the inclusion. Only the ten k-ratios retrieved from the five beam positions $l_y \in \{-50, -25, 0, 25, 50\}\,\mathrm{nm}$, marked by black crosses, are used for the reconstruction. To compute the k-ratios, the reference material is assumed to be given. We visualize the total density $\rho_{\mathrm{tot}}$ of the reference material in Figure 9 and additionally show the ionization distributions of (Cu, $K$-$L_2$) and (Fe, $K$-$L_2$), which define the size of the interaction volume. The ellipsoidal Cu inclusion is clearly smaller than the interaction volumes. Additional settings of the k-ratio model are given in Table 5.
The profile shows an increase in the (Cu, $K$-$L_2$) k-ratio for beam positions close to $\pm 25\,\mathrm{nm}$. We assume an elliptical Cu inclusion, which we parametrize by $p = (l_1, l_2, a, b, r, \hat a, \hat b)^T$, where $l_1$ and $l_2$ specify the position and the remaining parameters the shape of the ellipse; the corresponding adjoints are derived analogously. For the reconstruction, we use the L-BFGS algorithm implemented in Optim.jl (Mogensen and Riseth 2018). The total density $\rho_{\mathrm{tot}}$ of the material during the iteration steps is visualized in Figure 10. From the initial guess (an educated guess: higher copper concentration at $x \approx 30\,\mathrm{nm}$, see Figure 10(a)), the reconstruction converges to the reference material (see Figure 9(a)). Alongside the total density of the reconstructed material, we visualize the k-ratio profiles of (Cu, $K$-$L_2$) and (Fe, $K$-$L_2$) in Figure 11. The shapes of the curves quickly coincide with the measured k-ratios (black crosses). At iteration 30 the measured k-ratios are already well approximated; the total density, however, is not. For a sufficient reconstruction of the ellipsoidal structure, the error in the k-ratios (the objective function) must be reduced further. In Figure 12, we show the value of the objective function $\|k(p) - k^{\exp}\|^2$ and the error in the mass concentrations $\|\rho(p) - \rho^{\mathrm{true}}\|^2$ during the iterations of the optimization. For the first 50 iterations, the value of the objective function decreases, whereas the error in the mass concentrations remains close to the initial error. Only after the first 50 iterations does the error in the mass concentrations also decrease. The same effect is observed in Figure 10, where only after iteration 90 does the shape of the reference ellipse start to form.
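Gradient-based reconstruction of such geometric parameters requires the material field to depend smoothly on them. One common device (our illustrative choice, not necessarily the paper's exact formula) is a sigmoid of the ellipse level set, which yields a differentiable indicator of the inclusion:

```python
import numpy as np

def ellipse_indicator(x1, x2, p, sharpness=50.0):
    """Smooth indicator of an ellipse with center (l1, l2), semi-axes (a, b)
    and rotation r. A sigmoid of the level set keeps the field differentiable
    in all parameters. Illustrative sketch, not the paper's parametrization."""
    l1, l2, a, b, r = p
    c, s = np.cos(r), np.sin(r)
    u = c * (x1 - l1) + s * (x2 - l2)       # rotate into ellipse coordinates
    v = -s * (x1 - l1) + c * (x2 - l2)
    level = (u / a) ** 2 + (v / b) ** 2 - 1.0   # < 0 inside, > 0 outside
    return 1.0 / (1.0 + np.exp(sharpness * level))

p = (0.0, -30.0, 20.0, 10.0, 0.0)            # nm; center, semi-axes, rotation
inside = ellipse_indicator(0.0, -30.0, p)    # close to 1 at the center
outside = ellipse_indicator(30.0, -30.0, p)  # close to 0 outside the ellipse
print(inside, outside)
```

The `sharpness` constant controls how closely the smooth indicator approximates a discontinuous inclusion boundary; a larger value gives a sharper interface at the cost of steeper gradients.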

Conclusion
In this paper, we derive an adjoint approach to compute derivatives of a deterministic k-ratio model, to be applied to reconstruction in EPMA. The k-ratio model is based on the $P_N$ model, a moment expansion of the continuous slowing-down approximation of the linear Boltzmann equation, pioneered for EPMA in Bünger, Richter, and Torrilhon (2022). We formulate the model with a focus on the inverse problem and describe its application within an adjoint algorithmic differentiation framework. This work thus represents a building block in the development of a high-resolution reconstruction method that uses gradient-based optimization techniques. By replacing material parametrizations, experiments can be realized that reconstruct different material parameters and take different prior knowledge into account. We conclude the paper with a validation of our differentiation method and with reconstruction experiments. The experiments show the influence of the material parametrization on the reconstruction and illustrate our idea of an application of our method: the combination of measurements with prior knowledge makes high-resolution reconstruction problems tractable with a reasonable amount of measurement data. At the same time, we demonstrate the extensibility of our implementation to various material structures. High-resolution imaging, i.e., the reconstruction of material structures that are smaller than the interaction volume, is possible with sufficient data or sufficient material assumptions. There will, however, always be a trade-off between the accuracy of the measurements and of the model on the one hand and the accuracy of the reconstruction on the other. Especially for high resolution, the analysis of this relation is key: to achieve a reliable reconstruction result, the quantification and propagation of measurement and model uncertainties are necessary.

Appendix A. Adjoint continuous implicit differentiation
Consider the implicit definition: given $x \in X$, find $y \in Y$ such that $F(y = g(x), x) = 0$. We refer to the implicit function as $y = g(x)$ and assume that it is unique at $x$. Given a (tangent) direction $\dot x$, we observe that the directional derivative of $F$ in the direction $\dot x$ vanishes, since $F$ remains zero along the constraint. Employing the chain rule, we find the following.

$$\frac{\partial}{\partial h}\,F\big(g(x + h\dot x),\, x + h\dot x\big)\Big|_{h=0} = \frac{\partial F(y, x)}{\partial y}\,\frac{\partial g(x)}{\partial x}\,\dot x + \frac{\partial F(y, x)}{\partial x}\,\dot x = 0$$

We consider the previous statement in a weak sense, meaning we test the constraint with $\bar F \in T$, where $T$ is an appropriate space.

Figure 1 .
Figure 1. A sketch of the physical processes in EPMA. The sample $G$ is rastered by electron beams (blue) with different positions $l_x$. Electrons e$^-$ from the electron beam (blue dotted line) scatter inside the sample and strike bound electrons, which leave a vacancy (blue circle). Outer-shell electrons (blue discs) fill this vacancy and release an x-ray of characteristic energy (red wavy line). The x-ray travels through the sample and is counted by a detector. Beam electrons only excite a certain volume of the sample, the interaction volume (gray ellipses), which scales with the beam energy $l_\varepsilon$.

As a first possibility, we model the weight fractions $\omega^{pc}_i(x)$ as piecewise-constant functions (26). We subdivide the spatial domain $G$ into $n_k \in \mathbb{N}$ disjoint subdomains $G_k$ and introduce the indicator functions $1_{G_k}(x)$. To guarantee the partition of unity, we only use $n_p = n_k \cdot (n_e - 1)$ parameters $p^{pc}_{i,k}$ and define the parametrization accordingly. From
$$\int_G \bar I_a\,\dot A_a(x)\,dx = \Big\langle \underbrace{N_a(\cdot)\,I_a(\cdot)\,\bar I_a}_{=:\,\bar A_a(\cdot)},\ \dot A_a(\cdot) \Big\rangle_{L^2(G)} \tag{43}$$
the adjoint $\bar A_a(\cdot)$ is identified; the adjoints $\bar N_a(\cdot)$ and $\bar I_a(\cdot)$ are derived accordingly.

Figure 2 .
Figure 2. The abstract computational graph of our model. Straight edges (bottom to top) correspond to the primal function evaluation. Bent edges (top to bottom) correspond to the adjoint evaluation and are labeled with the respective adjoint operators. Dashed boxes highlight the individual model evaluation for every experimental setup $a \in A$. Note that multiple adjoint edges entering a node correspond to the accumulation of the respective adjoint operators; this is not explicitly addressed in the derivation in Section 3.2.

Figure 3 .
Figure 3. The absorption path $l(x, x_d) = \{y \mid d(x, x_d, y) = 0\}$ from $x$ toward the detector position $x_d$ (solid line) and the path used for the adjoint, $l(x, x_d^\ast) = \{y \mid d(y, x_d, x) = 0\}$ (dotted line). $G$ is the computational domain.
Figure 4. A visualization of the sensitivities computed using our method and finite differences. We abuse the parametrization function of each parametrization and plot $\omega_1(x)|_{p = \partial I_a/\partial p}$. The curves computed using our method (solid lines) coincide with the curves computed using finite differences (black dashed lines). A quantitative comparison of the sensitivity values is given in Table 2.

Figure 5 .
Figure 5. The reconstructed density of a material with a sharp interface between an iron (Fe) layer covering a Ni substrate. We use the piecewise-constant, the piecewise-linear and the non-linear parametrization for the reconstruction. For the piecewise-constant and the piecewise-linear parametrization, the geometry (the interfaces) is visualized by gray vertical lines. The black line shows the reference density. All parametrizations converge. The piecewise-constant parametrization has a clear advantage because the material interface aligns with an interface of the parametrization. The piecewise-linear parametrization cannot approximate the discontinuous interface, hence it converges to the presented smoothed representation. The non-linear parametrization is flexible enough to identify the location of the interface and also approximates the discontinuity using a strong gradient.

Figure 6 .
Figure 6.The reconstructed density of a material with a diffusive interface between an iron Fe layer covering an Ni substrate.We use the piecewise-constant, the piecewise-linear and the non-linear parametrization for the reconstruction.For the piecewise-constant and the piecewise-linear parametrization, the geometry (interfaces) are visualized by gray vertical lines.The black line shows the reference density.All parametrizations converge.None of the parametrizations can represent the interface perfectly.The piecewise-linear and the non-linear parametrization perform better for this example, because of their flexibility to approximate continuous functions.

Figure 7 .
Figure 7. The value of the objective function and additional reconstruction quality measures during the iterations of the optimization for all parametrizations. The objective function $\|k(p) - k^{\exp}\|^2$ is visualized as a solid line. Also plotted are the error in the mass concentrations $\|\rho(x) - \rho^{\mathrm{true}}(x)\|^2$ (dashed line) and the error of the k-ratios $\|k_+(p) - k^{\exp}_+\|^2$ (dotted line) from additional beam energies (11 keV and 14 keV).

Figure 8 .
Figure 8.The k-ratio profile of the elliptical Cu inclusion in the Fe substrate computed using a 12 keV electron beam.Dashed lines show the k-ratio profiles of ðCu, K − L 2 Þ (blue) and ðFe, K − L 2 Þ (orange) with a high beam resolution.Black markers show the k-ratios which are used for the reconstruction.While not directly visible, the shape and height of the k-ratio curves encode information about the shape and location of the inclusion and the structure of the interface between inclusion and substrate.

Figure 9 .
Figure 9.The total density and two ionization distribution fields of the reference material.The ionization distribution curves determine the size of the interaction volume.The size of the ellipsoidal Cu inclusion is smaller than the interaction volume.

Figure 10 .
Figure 10.The reconstructed density q tot of the Fe material with a Cu inclusion during the iterations of the L-BFGS optimization.

Figure 11 .
Figure 11. The k-ratio profiles of (Cu, $K$-$L_2$) and (Fe, $K$-$L_2$) during the iterations of the reconstruction. Black crosses mark the measured k-ratios that are considered in the objective function $\|k(p) - k^{\exp}\|^2$ (cf. Figure 8). The visualized iterations are the same as in Figure 10. Note that the k-ratios quickly converge to the measured values, although the shape of the ellipse does not yet coincide with the shape of the ellipse in the true material (Figure 9).

Figure 12 .
Figure 12. The value of the objective function $\|k(p) - k^{\exp}\|^2$ and the error of the mass concentrations $\|\rho(p) - \rho^{\mathrm{true}}\|^2$ during the iterations of the L-BFGS optimization. Although the objective function decreases significantly during the first 50 iterations, the error of the mass concentrations remains close to the initial error. The same behavior is observed in Figures 10 and 11.

Table 1 .
Model settings used for the comparison of 1D sensitivities.

Table 2 .
Relative errors in the sensitivities/derivatives of the intensities for multiple x-rays $a_L$ and parametrizations $\omega(x)$. Sensitivities/derivatives are computed using our implementation and using a central finite-difference approximation of order 2.

Table 3 .
Runtimes of the computation of sensitivities/derivatives using our implementation and a central finite difference approximation of order 2 (FiniteDiff.jl).

Table 4 .
Model settings used for the reconstruction of the 1D interfaces.

Table 5 .
Model settings used for the 2D reconstruction of the ellipsoidal Cu inclusion.