On machine learning force fields for metallic nanoparticles

ABSTRACT Machine learning algorithms have recently emerged as a tool to generate force fields which display accuracies approaching the ones of the ab-initio calculations they are trained on, but are much faster to compute. The enhanced computational speed of machine learning force fields results key for modelling metallic nanoparticles, as their fluxionality and multi-funneled energy landscape needs to be sampled over long time scales. In this review, we first formally introduce the most commonly used machine learning algorithms for force field generation, briefly outlining their structure and properties. We then address the core issue of training database selection, reporting methodologies both already used and yet unused in literature. We finally report and discuss the recent literature regarding machine learning force fields to sample the energy landscape and study the catalytic activity of metallic nanoparticles. Graphical abstract


I. INTRODUCTION
Metallic nanoparticles (MNPs) are finite objects, containing up to several thousands of atoms.2][3] MNPs' chemical and physical propertiesdifferent from their single atom or bulk counterparts -depend strongly on the MNPs' architecture (i.e., their size, shape, chemical composition and chemical ordering).][6][7][8] In the past two decades many efforts have been carried out to sample the multi-funnelled energy landscape of MNPs, mainly to identify MNPs' putative global minima. 9To this end, various sampling methods have been employed, such as Monte Carlo schemes 10 , basin hopping [11][12][13][14][15] , hyperspatial optimization 16 , and evolutionary algorithms [17][18][19][20][21] to name a few.While prediction from global minimum or archetypical structures may be helpful in deciphering structure-property relationship, only the accurate estimate of the finite-temperature population of relevant isomer enables to obtain accurate predictions of the ensemble properties.From a numerical point of view, molecular dynamics, path sampling 22,23 , metadynamics [24][25][26][27][28] , and finite temperature partition function evaluation approaches [29][30][31][32][33] offer a dramatically effective way to account for structural changes and fluxional behaviours, and thus to estimate faithful population distributions.The evaluation of converged results is however expensive, i.e., it necessitates of thousands if not millions of energy and force calculations. 27,31n the recent years the advent of machine learning (ML) techniques paved the way towards establishing a novel paradigm in the computational studies of nanoparticles' structural and chemophysical properties.Together with big-data techniques that easily and better characterize nanoparticles' geometries during simulations [34][35][36] and experiments, [37][38][39] machine learning force fields (ML-FFs) represent the most ground-breaking application of artificial intelligence models to further the breath and scope of numerical prediction and design of MNPs.Machine learning force fields' (FFs) accuracy is comparable to the one of the electronic structure methods they are trained on, but their speed can approach the one of semi-empirical methods.Their use in algorithms to sample minima and transition states in nanoparticles' energy landscapes thus promises to reduce the computational cost of sampling schemes and in turn to establish them as routine, comprehensive and exhaustive procedures that enable to bridge simulations and real systems used in experimental investigations.The aim of this review is to offer a representative and pedagogical overview of the state-of-theart of ML-FFs for MNPs, while also presenting a discussion on the open questions related to the subject.The review is organized as follows: in section II the local atomic environment representations most commonly employed in ML-FF generation will be discussed and presented to the reader.In section III three supervised ML algorithms for ML-FF generation will be formally introduced and briefly explained: linear regression, artificial neural networks (ANN), and Gaussian process regression (GPR).Section IV will discuss the issue of training database selection, presenting some methods that have been already used in literature together with algorithms that have not already been used in the field.Subsequently, section V will present examples from literature on the application of ML to the study of nanoparticles, together with figures extracted from selected works.Finally, the conclusion will summarise the earlier presented discussion.

II. DESCRIPTORS OF LOCAL ATOMIC ENVIRONMENTS
A force field establishes a mapping between atomic coordinates and the corresponding total energy of the system.The construction of a force field can therefore be framed as a supervised learning problem where the objective is to find a function that yields a total energy E (and/or forces F i ) given the coordinates and species of atoms in a system R.1][42] Therefore, in order to improve the computational scaling, and to construct force fields which can be used on systems with arbitrary number of atoms, the total energy E(R) of a system can be approximated as a sum of local energy contributions (ρ i ), each pertaining to an atom i surrounded by its local atomic environment ρ i : 43 E(R) where the local atomic environment ρ i refers to a vector or higher dimensional object that contains information regarding the positions and atomic species of atoms contained in a sphere of radius r c centred on atom i.The approximation of Eq. 1 is based on the near-sightedness principle, i.e. on the fact that forces acting on atoms are not strongly dependant on the positions and species of atoms very far away. 44The near-sightedness assumption does not hold for interactions which are inherently long-ranged, i.e. electrostatic interactions; and these have to be treated separately.
The force vectors F i can be calculated using the definition: where r i is the vector containing the Cartesian coordinates of atom i.
for supervised learning problems.A fraction of such data set is used to train the model (training set D tr ), and the remaining part to test its accuracy (validation set D val ).
We will now assume the learning problem to be focused on predicting local energies (ρ i ), since total energies and forces can be derived from local energies (Eqs. 1, 2).This assumption is made mainly for educational purposes, as machine learning force fields can be trained using total energies, forces, or a combination of the two.
In general, a local energy (ρ i ) of a given atom i is not learned directly as a function of the atomic Cartesian coordinates.On the contrary, the coordinates of the atoms in the vicinity of atom i (within a chosen cutoff r c ) are usually first transformed to a vector q i , commonly called a descriptor, and then a regression model is used to learn the map (q i ) from the descriptor to the sought local energy function.This procedure is highly beneficial to the transferability of the generated force field as well as to the learning speed of the ML algorithm used, provided that the chosen descriptor possesses some key properties.

A. Descriptor's properties
A suitable representation of the local environment around an atom is encoded in a descriptor, which must possess the following key properties.Firstly, in order to strictly constraint the learned force field to respect fundamental physical invariances, any descriptor should be invariant upon rigid translations, rotations and reflections of the local environment, as well as invariant upon any permutation of atoms of the same chemical species.Arguably, invariance properties could be learned automatically by any sufficiently flexible algorithm when provided with enough data.6][47][48] Secondly, it is fundamental for a descriptor to be differentiable with respect to the atomic coordinates, as this is required for an analytic computation of the atomic forces after the interpolation of an energy function.Thirdly, a descriptor should contain enough information to well capture the relevant physics of the system.As an illustrative example, the total mass of a local atomic environment ρ i is a perfectly invariant function, but can hardly provide enough information for constructing a useful interaction model.Moreover, it would be desirable for a descriptor not only to contain enough information, but also to capture, to some extent, important features of the sought energy or force function as this will make learning process faster and resulting force field potentially more accurate.Finally, in addition to all other properties a descriptor should also be computationally inexpensive w.r.t. the reference method (e.g.DFT).Indeed, as an extreme example, the density functional theory (DFT) force predicted on a given atom fulfils all the conditions given above while still being a useless descriptor in practice.The requirements listed above leave a large freedom of choosing a single descriptor among countless options.For this reason, many local atomic environment descriptors have been developed and applied to build ML-FFs [49][50][51] , and in some occurrences such descriptors have been left for a ML algorithm to learn and optimise 52 In the following two paragraphs, we focus on the two descriptors that have been used the most for building atomistic force fields for nanoclusters: atomic symmetry functions 43 and power spectrum 53 .Afterwards, we will point out the common features possessed by these successful descriptors.Their remarkable similarity will then naturally bring to the exploration of another class of descriptors, simply encompassing explicit n-body degrees of freedom, which have recently been used successfully to build force fields for nanoclusters 54 .

B. Atomic symmetry functions
The first descriptor used in the context of ML-FF fitting consists in a set of functions of the local environment which are invariant by construction over the physical symmetries mentioned earlier.These functions are called atomic symmetry functions (ASF) and, since they were introduced by Behler and Parrinello in their pioneer work on neural networks force fields 43 , they are also often referred to as Behler-Parrinello functions.ASFs form a basis set for the expansion of 2-body (radial) and 3-body (angular) distribution functions. 43Over the years, variations on the structure of ASFs have been developed, e.g. using Chebyshev polynomials to improve their spatial resolution's efficiency 55 .A brief description of their initial formulation 43 follows.Given a local atomic environment ρ i defined by the positions {r ij } M j=1 of M atoms within a cutoff radius r c of a central atom located at the origin of the reference frame, the 2 and 3-body ASFs take the form, respectively: where f c (r) is a cutoff function smoothly approaching zero as r ij approaches the cutoff radius r c .Its presence in the symmetry functions ensures that the descriptor smoothly varies when atoms transit through the radial cutoff, thus avoiding any discontinuity in the learned energy or force function.Both G i 2 and G i 3 depend on some parameters, namely θ 2 = (η, r s ) and θ 3 = (η, ζ, λ) (with λ ∈ {−1, 1}), and the descriptor vector q i ASF is built by evaluating these functions (Eqs.3, 4) for a large number of these parameters.The ASF descriptor, therefore, requires a selection of relevant parameters for the θ 2 and θ 3 ; this can be achieved via fingerprint optimisations as recently discussed by Imbalzano et al. 56 .The function G i 2 can be seen to provide information about all the distances from the central atom to its M neighbours.On the other hand, G i 3 encompasses information on every triplet of atoms which include the central atom, meaning that they capture respectively 2 and 3-body information about the local environment ρ i .It is important to note that using a 2 or 3-body descriptor does not limit the generated ML-FF to be low order, since an arbitrary non linear function of such a descriptor will be able to model higher order interactions as well. 57[60][61][62][63][64][65][66][67][68][69]

C. Spherical harmonics power spectrum
The power spectrum of a spherical harmonics (SH) expansion is another popular choice for atomic descriptors.It was first introduced to fit energies with GPR 53,70 , but has later found applications also for ANN FFs 71,72 .The power spectrum descriptor has been proven to be equivalent to the widely used "Smooth Overlap of Atomic Orbitals" (SOAP) representation when using a dot product kernel within GPR. 53 To build a power spectrum descriptor, it is customary to first write the local environment as a sum of Gaussian functions, each centred on one of the M atoms within the local environment ρ i defined by the cutoff radius r c : (5) Then the key idea is that of expanding the angular part the above function in a SH basis {Y lm (r)}, in order to easily build a rotationally invariant descriptor as the power spectrum of this expansion. 53Since SH can only retain the angular information of the above function (Eq.5), the radial part is expanded in another basis, g n (r).The specific choice of the radial basis is not crucial to the descriptor, common choices are polynomials or equispaced Gaussian functions.All together, the function ρ i (r) can hence be formally rewritten as: The coefficients c i nlm of the above SH expansion are not rotationally invariant.However, the power spectrum of the expansion can easily be written down in terms of the SH coefficients as: The power spectrum descriptor q i P S is defined as the vector containing the power spectrum coefficients described above.It is rotationally invariant and can hence be effectively used to represent atomic environments.The power spectrum is more expensive to compute than the ASF descriptor, but it requires the user to choose only the width σ of the Gaussian functions in Eq. 5 and where to truncate the expansion in Eq. 6, instead of the full range of ASF parameters.Recently, a more efficient variant of the power spectrum descriptor which reduces its computational cost has been proposed.The interested reader is referred to Thompson et al. 73 for more detailed information on such variant.

D. Explicit n-body features
In spite of the apparent differences between the mathematical expressions of the two descriptors discussed above, the power spectrum (Eq.7) and the ASF representation (Eqs.3 and 4) are very similar at a fundamental level.Indeed, both can be considered as symmetric representations of the 2 or 3-body information present in the atomic environment.This is particularly clear in the case of the ASFs (Eq.4), which are seen to depend rather explicitly on distances and angles.The n-body nature of the power spectrum descriptor is instead less obvious from its mathematical formulation (Eqs.6 and 7), but it follows from the analysis reported in Ref. 57 -later formulated also in Ref. 74 -that the power spectrum is a 3-body descriptor, meaning that any linear regression built on such a descriptor will give rise to a 3-body force field.This fact can be intuitively understood considering that the Gaussian expansion in Eq. 5 is made of contributions coming from pairs of atoms, and can thus be seen as a 2-body descriptors.The power spectrum is then constructed by mixing rotationally invariant information coming from pairs of such 2-body descriptors (Eq.7), making the final descriptor 3-body.It is not a surprising fact that both of the descriptors mentioned so far (which are indeed among the most used in practical applications) are constructed using 2 and/or 3-body features.Low order descriptors can in fact be expected to capture well the ionic and covalent nature of chemical bonds, thus representing a rather natural choice.Moreover, while the absence of any angular information in a 2-body descriptor will always prevent a correct characterization of higher order interactions, a 3-body descriptor does not present the same problem, meaning that a nonlinear function of a 3-body descriptor (e.g., a neural network or a Gaussian process model) can in principle capture well also higher order interactions.Building on these ideas, one can think of explicitly using the 2 and 3-body degrees of freedom present in a given environment directly as descriptors.For instance, a 2-body descriptor can be simply given by the unordered set of distances from the central atom to all other atoms within the configurations, while a 3-body descriptor can be given by the unordered set of triplets of distances between a central atom and any two neighbors The above descriptors contain by construction the full 2-and 3-body information about the local environment.They are moreover computationally efficient, simple to interpret, and do not require any choice of parameters or truncation approximation.These advantages come at a cost.Indeed, the requirement that the above sets are unordered is strictly needed to preserve permutational invariance, and such a condition must be imposed to the ML algorithm processing the inputs.This can be done rather straightforwardly when using a Gaussian process regression model (see Section III C), and in fact explicit n-body features as the one provided above have been extensively used in this framework 57,75,76 , also for building force fields for nanoclusters 54 .

III. MACHINE LEARNING ALGORITHMS TO GENERATE FORCE FIELDS
The three most commonly used methods to learn the local energy function (q i ) are linear regression, artificial neural networks and Gaussian process regression.These methods are schematically compared in Figure 1.In the following, these three algorithms are briefly presented.

A. Linear Regression
A straightforward way to model the local energy function i is via linear regression: depicted in a similar manner so to ease the recognition of parallellisms and differences between the methods.
Gaussian processes can be imagined to be equivalent to a fully connected ANN with a single infinite layer, this analogy has been proven rigorously in Refs.Neal 77 and Rasmussen 78 .
where w are the weights, ˆ (q i ) indicates the modelled energy function and φ is a function chosen a priori (e.g.polynomial function, sinusoidal function, etc.).The weights that minimize the squared error loss L 2 : can be found analytically.In Eq. 11 the predicted and real total energies Ên , E n of each system are calculated per Eq. 1, and N tr indicates the number of training points used.In the linear regression case, the solution to the learning problem is fast to compute, and the predictions the model yields are computationally cheap.Despite the simplicity of the learning model, a variation of this method has been proven to yield interesting results, for example in aiding the prediction of adsorption energies for RhAu nanoparticles. 79,80onetheless, in most applications linear regression models lack the complexity needed to construct an accurate force field.

B. Artificial Neural Networks
a. Structure ANNs are composed of nodes organised in layers which connect the input q i to an output (q i ).Networks with few hidden layers are called "shallow", whereas the term "deep" is used to indicate networks with a high number of hidden layers.Nodes are connected by weights w which are optimized during training, and biases b, also optimized, are added at every layer of the ANN.ANNs are universal approximators 81,82 , meaning that in the limit of an infinite hidden layer they can approximate any continuous function to arbitrary precision.ANNs have recently been used to predict total energies of MNP 63,71,72 , study their thermodynamic stability 62 , explore their phase diagram 63 , aid the search for minima structures 60,67,69 , and run molecular dynamics (MD) simulations 59,61,83 .The price to pay for such flexibility and expressive power is that ANNs are very data-hungry, and typically require orders of magnitude more training points than linear regression or GPR methods to reach the desired accuracy.In the specific case of an ANN with two hidden layers, the prediction for a local energy (q n ) is given by: where φ is an activation function (sigmoid, hyperbolic tangent, etc.), the j l indices run over all nodes of layer l (here l = 1, 2), the w l,l 's are the weights connecting the nodes of layer l to the nodes of layer l , b l are the biases added to the nodes of layer l.The structure of the artificial neural network just described can be visualised in Figure 1 and it is easy to see how Eq. 12 generalises to networks containing more than two hidden layers.

b. Training
The training of an ANN consists in the search of weights w and biases b which minimize a loss function L on a training set containing N tr points.Typically the squared error loss (Eq.11) is chosen for regression problems such as energy and/or force fit .The parameter space {w, b} contains from thousands to millions of parameters, and the optimization task is not trivial as the loss function is highly non-convex, presenting many local minima.For this reason, non-stochastic gradient descent on the loss landscape induced by the full training set would be non-optimal.Therefore, batch training is typically used to introduce stochasticity into the optimization: a subset of training points is selected and a number of gradient descent steps is taken; this process is then iterated.Typically, the training process is stopped once the error on the validation set starts increasing, indicating that the neural network has reached the point when it is starting to over-fit on the training data.c.Beyond feed-forward artificial neural networks While only feed-forward ANNs are covered in this section, a plethora of variations on neural networks have been developed in recent years: recurrent neural networks, convolutional neural networks, variational autoencoders, and generative adversarial networks to name a few. 84These methods share the same principles of weight and bias optimization via batch training, but contain differences (e.g. in layer structure) which substantially modify the tasks each method excels at.To the authors' knowledge, there is no existing literature where beyond-feed-forward neural networks are applied to the generation of force fields for MNPs specifically, but they have been employed for other systems. 52,85. Gaussian Process Regression a. Formalism GPR is a Bayesian technique used for supervised learning tasks.Typically, GPR can require orders of magnitude less data points than ANNs to train, which makes it a suitable choice when data is expensive to generate and/or scarce.Another advantage of GPR techniques is that, being a fully Bayesian approach to regression, it is possible to estimate an uncertainty associated to every prediction.As for the two previously described algorithms, here too the local energy function is learnt (instead of the total energy), and then summed according to Eq. 1 to obtain the total energy.For educational purposes, in the following we assume that a database of local energies pertaining to local atomic environments is available.This is of course not the case in reality, and the interested reader is referred to Bartók and Csányi 70 and Glielmo et al. 86 for more details on how a local energy can be learnt using a database of total energies and forces.GPR assumes that outputs are distributed like a Gaussian stochastic process.Given a training data set containing N tr local atomic environment descriptors {q i }, i = 1, . . ., N tr , and a local energy kernel function k(q i , q j ), the prediction yielded by the trained GPR on a new local atomic environment descriptor q * is: where α i are the weights computed during training via a straightforward matrix inversion.As already mentioned, we can also compute the uncertainty VAR(q * ) associated with a prediction ˆ (q * ): where k indicates the vector containing the local energy kernel function evaluated between q * and q i for i = 1, • • • , N tr , K is the Gram matrix containing the kernel function evaluated between every pair of inputs (q i , q j ), with i, j = 1, • • • , N tr , and σ n is a hyperparameter that governs the noise associated with the training data.This uncertainty (Eq.14) is such that the error incurred by our GPR when predicting the total energy (q * ) is expected to be normally distributed with mean zero and standard deviation VAR(q * ).It is possible to learn from atomic forces instead of, or in conjunction with, total energies; this requires the construction of a kernel function for forces.In order to preserve energy conservation of the force field, such a force kernel function can be obtained by differentiation of an energy kernel function: where r i , r j indicate the Cartesian coordinates of the central atom in the local atomic environments i and j respectively.
b.The kernel function The structure of the kernel function k is of great importance in GPR since its properties directly affect the properties of the force field it generates.The kernel function acts as a similarity function between pairs of descriptors of local atomic environments (q i , q j ).Kernel functions also allow for the use of descriptors which do not inherently posses all the required invariance properties, since these can be imposed on the kernel function itself.For instance, invariance over the rotations can be enforced by performing an Haar integration over the O(3) group 47,57 , invariance over translations can be enforced by integrating over R 3 , 74 and invariance over permutations can be enforced by direct sum over the permutation group 57,76 .Some kernels commonly used in literature for ML-FF generation will now be described.
The SOAP energy-energy kernel k SOAP can be straightforwardly built from the power spectrum descriptor derived in Eq. 7: the dot product between normalized power spectrum descriptors of two local atomic environments (q n P S , qm P S ) is elevated to a power to obtain: where ζ is a positive integer. 53e n-body energy-energy kernel for any finite n can be built from the n-body explicit descriptors (Eqs.8, 9) by, for example, taking the Gaussian difference of the descriptors summed over the relevant permutation groups and over each dimension of the descriptor.In the case of the 2-body kernel this results in: where σ is the characteristic lengthscale of the kernel.The 3-body kernel instead reads: where P c (|P c | = 6) is the set of permutations of three elements.
c. Mapping force fields One of the main drawbacks of GPR FFs is that, despite being computationally faster than ab-initio methods to evaluate, they scale linearly with the number of training points, N tr (see Eq. 13).This burden can be avoided for a particular sub-class of kernel functions, and the computational cost of the so-derived mapped force fields (MFFs) is comparable to classical tabulated force fields. 54The constraint that must be imposed on the kernel function in order for the FF to be mappable is that its order must be finite 57 or, in other words, that the kernel function is not many-body.The mapping procedure starts with the identification of the maximum order of the kernel function used, for example 2-body for kernels which only depend on interatomic distances from central atoms, 3-body for kernels that use angles, 4-body for kernels that use torsion angles, and so on.Subsequently, the local energy prediction made by the GP is calculated and stored on an array of values.For example, for a 2-body kernel, the bond energy can be calculated and stored on an array of distances between 1 Å and the cutoff radius r c .This process yields a tabulated force field of the same order of the kernel.Predictions during MD runs can then be obtained by using an interpolator on the stored values for every pair, triplet, or quadruplet (when using, respectively, 2, 3 or 4-body kernels) of atoms in the system.This method has been shown to increase computational speed by a factor 10 4 with respect to traditional GPR without incurring in any additional accuracy loss. 54,57The interested reader is directed to MFF, a Python package that implements the above method within the framework of GPR FFs: https://github.com/kcl-tscm/mff. 87

IV. DATABASE SELECTION
The accuracy of a ML-FF depends on whether the predictions are made in an extrapolation or interpolation region.Without giving a formal definition, it can be helpful to think of an interpolation region as that region of the descriptor space which is "close" to the training database.Generally speaking, accuracy is higher, and predicted variance (in the case of GPR) is lower, when predictions are made in an interpolation regime.Therefore, it is preferable to always work in interpolation and, if possible, detect when and where the algorithm is extrapolating.Enforcing the force field to work in an interpolation regime becomes crucial in the case of nanoparticles and nanoalloys, where the number of competing isomers, even at small sizes, is very large.Due to this inherent complexity of the configuration landscape, it would be desirable to include in the initial training set a diverse array of isomers which yield accurate predictions also on never-seen-before geometries and/or chemical orderings.Two example cases extracted from the literature may help set this problem into practical scenarios.In a study on Ni 19 , it was shown that force prediction error was substantially lower if training and testing on the same morphology rather than training and testing on two different MNP morphologies. 54Furthermore, training on structurally heterogeneous databases was shown to lead to a balanced trade-off between versatility and overall accuracy.By the same token, training on low-symmetry or defected structures, which present very different local atomic environments, resulted in a more accurate machine learning model.This is caused by the higher variety of local atomic environments present is low-symmetry morphologies, which allow the convex hull spanned by the training inputs to encompass a larger phase space.In figure 2 it can be seen how training on low-symmetry structures (4HCP) yields force fields that are more accurate on a target morphology w.r.t.training on high-symmetry structures (3HCP, DIH, BIP).In the case of a work focusing on AuRh nanoparticles, it was also reported that regression models predict binding energies of N, O, C molecules correctly or not depending on the size of the nanoparticles and surfaces in the training set. 79Figure 3 shows how training sets which contained only single crystal training points incur in high errors when predicting binding energies on nanoparticles, and viceversa.On the other hand, training sets which contain both the nanoparticle and the single crystal morphologies consistently incur in low binding energy mean absolute errors for MNPs ranging from diameter 0.5 nm to bulk.
A diverse set of strategies has therefore been developed in the machine learning community to build non-biased and automatically comprehensive training databases.These databases must be such that the machine learning force field generates predictions in an interpolation regime for the study of interest, and must assure that redundant, costly, information is not gathered for small regions of the phase space with respect to the investigation of interest.Training database selection algorithms can be broadly classified into two categories: either the database needs to be generated ad-hoc (active learning), or it must be sub-sampled from a larger, already existent database (sparsification).In the following sections we introduce the aforementioned concepts and extract some examples from literature to better inform the reader.

A. Active learning
If there is no sufficient pre-existent ab-initio training database for the task of interest, one convenient way to minimize the computational effort in building it is active learning.Active learning indicates a framework where the ML algorithm is iteratively trained on structures where the algorithm's predictions are deemed uncertain.If this is the case, forces and energies are computed with ab-initio methods on the structures where the ML predictions were not accurate enough, and finally inserted into the training database.While GPR can naturally return a predicted variance (Eq.14), in the case of ANNs an uncertainty estimate can be obtained as the variance of the predictions made by multiple ANNs trained for the same task. 58,88n example use of this approach can be found in conjunction with a genetic algorithm and a basin hopping scheme in Jennings et al. 89 .Here, ML-FFs are used as a computationally inexpensive energy predictors so to fast-screen for energetically relevant isomers and thus facilitating the quick convergence of the optimization algorithms driving the energetic landscape exploration as well as the surrogate machine learning model training.In a similar fashion, surrogate machine learning force fields can be also used to first quickly explore energetically relevant portions of the conformational space and rearrangement pathways via molecular dynamics and gather diverse ensembles of nanoparticle structures for which electronic structure calculations are executed.These are then fed as novel energy and force training points to the machine learning model in an iterative fashion until convergence of the model in the region of interest. 64,67

B. Database sparsification
If a database which is considered comprehensive for the system of interest is available, it can be useful to sub-sample a smaller training set from the full dataset, in order to reduce the computational cost of training and using a ML algorithm.This is especially beneficial in the case of GPR, where the training computational cost scales as O(N 3 tr ) and the prediction cost scales with O(N tr ), where N tr is the number of training points.Many database sparsification methods exist, but up to now only few applications relative to the construction of ML-FFs can be found in literature.Nonetheless, we will briefly present some database sparsification methods for ML-FFs, namely: farthest point sampling, measured error sampling, CUR decomposition, and descriptor-space sampling.
a. Farthest point sampling Farthest point sampling (FPS) is a database selection method where points are iteratively added to a training set based on their distance, according to a pre-selected metric, from the points already present in the training set.At each iteration, distances between the points inside and the points outside of the training set are computed, and the point furthest away from the training set, according to the chosen distance metric, is included.The computational cost of this method is O(N tr • N ), where N is the total size of the database to sub-sample from.FPS sampling has been employed in literature to reduce training set size for Gaussian Process regression methods. 90,91While using GPR, predicted variance can be used instead of distance from training set as a way to measure diversity.In this framework, points are iteratively added to the training set based on the value yielded by Eq. 14.
b. Measured error sampling Another class of iterative methods, which can be used independently of the ML framework employed, is based on the measured error incurred by the ML algorithm.The starting training database contains a small number of randomly sampled data points, and is then progressively expanded by inclusion of data points on which the measured incurred error is maximum.This method should be effective in reducing the maximum error incurred on the global database at the cost of an increased computational complexity w.r.t.other similar methods, e.g.FPS.
c. CUR decomposition CUR decomposition is a matrix-approximation technique that decomposes a matrix A into a product of three matrices C, U, R. 92 The method is similar to a low-rank single value decomposition approximation, but is built so that C is composed of columns of the original A, and R from rows of the same A. CUR decomposition can be applied to GPR by first building the Gram matrix K between all the points in the full dataset, and then reducing its rank by selecting the N columns (or rows) that compose C (or R).The method has a computational scaling of O(N 2 ); this can be reduced by subdividing the entire datasets into batches of size N batch N , thus introducing an approximation but reducing the scaling to O(N 2 batch • B) where B is the number of batches.Note that this method is not only helpful in identifying a sparse but information-rich database, but also in determining non redundant sets of descriptors. 56Descriptor-space sampling The following sampling method is inspired by the database sparsification algorithm used in Bernstein, Csányi, and Deringer 93 , but more general in its formulation.First, a physical descriptor must be chosen e.g.distance from, or angle w.r.t., the central atom of local atomic environments.Secondly, a set of discrete bins spanning the possible values the chosen descriptor can take must be built (e.g. a 1-D histogram containing distances from 1.5 Å to 5 Å binned every 0.05 Å). Loal atomic environments in the total database are then shuffled and accepted into the training set if they contain at least one occurrence of a binned descriptor's value which is not already present in the training data set.This simple method assures that the final training set contains environments which are as diverse at possible in the descriptor's space.This method is very computationally efficient, as only a single pass over the database is required.Fig. 4 shows a comparison of the mean absolute error on force vectors of three of the aforementioned sampling methods when building a ML-FF using GPR on a database containing a total of 13000 (PtCu) 13 structures, it is evident how descriptor-space sampling is the best performing algorithm for this system.For tests done with the measured error, predicted variance and CUR database selection methods, the total database was subdivided into batches and training sets were built by grouping the points selected in each of these batches.This was done in order to reduce the computational cost of such methods.It is interesting to note how, for this particular system, GPR algorithms trained using databases selected by methods other than descriptor-space sampling incur in errors similar to the ones yielded by randomly chosen training datasets.Such poor performance could be caused by the batch approximation used.In any case, the descriptor-space sampling method has a better scaling than any other method presented, is overall faster to compute even on small datasets, and shows a good performance on the analysed dataset.

V. APPLICATIONS OF ML-FFS FOR METALLIC NANOPARTICLES
After having introduced the formalisms behind the machine learning algorithms for generating fast and accurate energy predictions, we focus on their applications to understand structural properties of metallic nanoparticles.This section is divided into two parts.The first subsection reports applications where ML-FFs were used to supplement or substitute ab initio calculations for energy calculations, e.g. for global minima structure search or estimations and mapping of activation energies.The second subsection focuses on applications of ML-FFs to finite temperature simulations over several tens of ns, carried out e.g. to estimate phase diagrams and/or thermal properties of metallic nanoparticles.

A. ML-FFs for energy calculations
Machine learning force fields can be used as a surrogate for DFT when exploring the configurational space of nanoparticles.These ML-FFs must exhibit good accuracy especially when predicting energies, and can be trained either a priori or on-the-fly, depending on the size of the available database.In this section, we briefly resume the state-of-the-art on energy prediction yielded by ML-FFs to sample local minima, including chemical re-ordering, and to estimate adsorption energies.a. Monometallic nanoparticles The first application of machine learning to speed up the sampling of monometallic nanoparticles' energetic landscape consisted in the use of an ANN employing ASFs (Eqs.3, 4) as descriptors to search for global minima of gold nanoclusters within the basin-hopping method by Ouyang, Xie and Jiang.A new putative global minimum which has a core-shell structure of Au 10 @Au 48 and C4 symmetry was found, highlighting the benefit of comprehensive and fast exploration powered by neural network force fields. 65ubsequently, Zhai and Alexandrova employed a GPU accelerated ANN for sampling the potential energy surface of Pt 9 and Pt 13 .The authors used a 4-body descriptor as input to the ANN, and carried out global minima searches where the ML-FF was used to complement the DFT calculations and speed up the process.Finite temperature effects were also included a posteriori to probe the population distributions of such systems.This comprehensive study showed that the ensemble-averaged vertical ionization potential of the sytems under investigation changes when temperature increases, and that the catalytc property under operando conditions can be different from that evaluated at the global minimum structure. 94ore recently, Sun and Sautet presented an application of a genetic algorithm coupled with a high dimensional neural network potential using ASFs as descriptors to accelerate the comprehensive search for low energy metastable structures of Pt 13 under different H pressures.The presence and influence of these Pt 13 structures during catalysis was discussed for hydrogen  evolution reaction and methane activation.Although the ensemble of accessible metastable structures is relatively small under reaction condition, these structures can strongly influence the experimentally observed activity. 68   b.Nanoalloys The computational speed-up offered by ML-FFs to global minima search is key in the study of nanoalloys, as the extra degree of freedom given by the number of possible homotops increases the dimensionality of the space that has to be explored.Jennings and co-workers employed GPR force fields using a built-for-purpose local atomic environment descriptor to search for stable, compositionally variant, geometrically similar PtAu nanoalloys.The machine learning approach yielded a 50-fold reduction in the number of required energy calculations compared to a traditional brute force genetic algorithm. 89Figure 7 shows how the ML-based genetic algorithm was able to faithfully reproduce the convex hull for the excess energy of a (PtAu) 147 nanoparticle (left) while reducing by orders of magnitude the number of energy calculations required (right).One of the latest applications of machine learning force fields for nanoalloys involved the study of a trimetallic system, Cu-Pd-Ag.The fast and accurate ANN force field, powered by a stratified training scheme, was used to define a force field which, together with multi-tribe evolutionary searches, improved the efficiency in identifying stable elemental (30-80 atoms), binary (50, 55, and 80 atoms), and ternary (50, 55, and 80 atoms) Cu-Pd-Ag clusters.The best candidate structures identified with the neural network model, which used ASFs as local atomic environment descriptor, were showcased to have consistently lower energy at the density functional theory level compared with those found with searches employing an initial layer of inter-atomic potentials search. 69Nanocatalysts Machine learning energy predictions not only aid the search for the ensemble of energetically relevant nanoparticles' isomers, but also greatly enhance the scope for high throughput probing of active sites properties for catalytic reactions.One of the first applications of machine learning algorithms to speed up the high throughput characterization of the adsoprtion properties of available sites in a nanoparticle was developed by Ulissi et al.. Active sites for every stable low-index facet of a NiGa bimetallic crystal were enumerated and catalogued while the activity of these sites with respect to CO 2 adsorption was explored using a neural network-based surrogate model.This approach, which used ASFs as the local atomic environment descriptor of choice, reduced the number of explicit DFT calculations required for activity estimates by an order of magnitude.While most facets had similar activity to Ni surfaces, a few exposed Ni sites showed a very favorable on-top CO configuration.64  ticles.The employed method efficiently and accurately predicted the energetics of catalytic reactions on nanoparticles while providing information on structure-property relationships when combined with kinetic analysis.79 The same authors later predicted the binding energies of N, O, and NO with RhAu surfaces and particles using the same approach.Kinetic analyses of the direct decomposition of NO on RhAu nanoparticles were carried on to demonstrate that catalytic activity increases with a decrease in the particle diameter to 2.0 nm.Below a diameter of 1.5 nm a drop in the catalytic activity is registered and rationalized in terms of the disappearance of active alloyed corner sites on the small nanoparticles.80 AuCu nanoalloys' adsorption properties have also been investigated, as reported by Jäger et al.. Potential energy scans of hydrogen on AuCu clusters and on MoS 2 surfaces were conducted to compare and assess the accuracy of the Smooth Overlap of Atomic Positions, Many-Body Tensor Representation 95 , Couloumb matrix 96 and Atom-Centered Symmetry Functions in a kernel ridge regression framework.97

B. ML-FFs for finite temperature simulations
Beyond sampling local minima in the energetic landscape of MNPs, MD-based studies of kinetic rearrangements and thermodynamic stability of MNPs are also of great interest. 1With this in mind, employing ML-FFs to run fast MD simulations becomes more and more enticing.At the nanoscale, this allows to predict the dynamical and thermodynamical properties of MNPs.In this section we discuss the state-of-the-art of ML-FFs for MD simulations of MNPs and nanoalloys.a. Monometallic nanoparticles Artrith et al. investigated the structural and energetic properties of copper clusters when supported on zinc oxide in the first ever application of neural network force fields for metallic nanoparticles.In their seminal work, the authors assess the accuracy of ANN potentials employing the ASF descriptors.The manuscript builds on top of a previous paper by Artrith and Behler where a defected Cu surface was used as a benchmark for ANN force fields 58 providing an estimation of the accuracy and speed possible when using ML-FFs for metallic nanoparticles.The training of heterogenous ensembles of structures was demonstrated to be transferable to complex large-scale simulations of several defected stepped  surfaces and nanoparticles, such as the one displayed in Figure 9. 59 ANN force fields developed in the group of Bulusu, which used the ASF descriptor, were employed to sample local and global minima in the potential energy surface of Na nanoparticles of size 16-40, where transitions were also thoroughly probed using a Monte Carlo scheme.The accuracy of the force field, and the timescale probed, allowed to establish the presence of a characteristic premelting peak in the heat capacity curve, preciding a main melting peak, for clusters in the 20-40 atoms size range, 62 , corroborating the observations of stepwise melting in small Na clusters first reported by Aguado and coworkers 98 .The same group later studied Au nanoclusters' energetic and thermodynamic properties.ANN force fields were employed to probe the potential energy landscape and thermodyanmical behaviour of Au 17 , Au 34 , Au 58 .Here too, ASFs were the local atomic descriptors of choice to generate inputs for the ANN.Canonical and microcanonical molecular dynamics sampling was performed for a total simulation time of around 3 ns for each nanoparticle.The study used such data to demonstrate the presence of a dynamical coexistence of solid-like and liquid-like phases near melting transition.The investigation further encompassed the estimation of the probability at finite temperatures for a set of isomers lying less than 0.5 eV from the global minimum structure.For Au 34 , in particular, the global minimum structure resulted far from being the most dominant structure, even at low temperatures. 63ater, ANN force fields trained by first-principles density functional theory total energies were applied to search for global minima of gold nanoclusters within the basin-hopping method.In this case, the same authors decided to employ a descriptor based on the spherical harmonics expansion (Eq.6), reporting its increased efficiency w.r.t.ASFs.A study on the fluxionality in Au 147 was performed, and it was concluded that the system presents a dynamic surface.Such observation was concluded to be highly relevant in understanding reaction dynamics catalysed by Au nanoparticles.The putative global minimum of Au 147 found by the authors using an ANN force field is reported in Fig. 10, alongside the perfect icosahedron structure. 71he use of Gaussian process regression with 2-body, 3-body, and many-body descriptors (Eqs.8, 9) and kernel functions (Eqs.17, 18) was instead reported for the first time by Zeni et al., which modelled interatomic forces in Ni 19 nanoclusters.Thermodynamical properties as the melting point were probed thoroughly as the cost of the simulations carried out was 100000 times lower than DFT calculations.Fluxionality at temperature below melting was observed along such timescales. 54Nanoalloys Also for the case of nanoalloys, machine learning force fields represent a highly helpful technology enabling thorough assessment of energetic and thermodynamic properties.The first application of neural networks to build force fields of nanoalloys was carried on by Arthrith and Behler.They employed shallow artificial neural networks with Behler-Parrinello symmetry function descriptors (Eq.4) for the prediction of the composition and atomic ordering equilibrium architecture of AuCu alloy nanoparticles.Site-based Monte Carlo simulations were used to sample the composition space while molecular dynamics simulations simultaneously enabled to sample the structure space.An extensive set of equilibrium properties for many temperatures and chemical potentials were thus assessed.Consistent with previous studies, the most stable structures were found to exhibit Cu(core)Au(shell) configurations.Temperature dependent favourable alloy arrangement was also observed, with enhanced Au concentration at the particle core for increasing temperatures.61 Other alloys of Au where investigated by the groups of Bulusu, by means of neural network force fieldswith ASF as descriptors to predict global minimum structures of (AgAu) 55 nanoalloys across different compositions.Pure Au and Au rich compositions minima resulted lower in energy compared to previous reports.Thermodynamical and energetic properties were also thoroughly assessed (c-T phase diagram, surface area, surface charge, probability of isomers, and Landau free energies) to rationalize the enhancement of the catalytic property of AgAu nanoalloys by incorporation of Ag up to 24 by composition in AgAu nanoparticles.This result was found to match previous experimental data.66 The development of novel methods for fast and accurate force and energy calculations is of even greater importance for the study of nanoparticles in conditions closer to the operando one.High-dimensional neural network force fields using the ASFs were also incorporated with Monte Carlo and MD simulations by Kang et al. to identify not only active, but also electrochemically stable PtNiCu nanocatalysts for oxigen reduction reaction in acidic solution.The computationally efficient and precise approach proposed a promising oxigen reduction reaction candidate: a 2.6 nm diameter icosahedron comprising a 60 percent of Pt and a remaining equal mixture of Ni and Cu.83

VI. CONCLUSIONS
In this review, we showcase how machine learning force fields offer the possibility to predict energies and forces with accuracy close to ab-initio methods, but at a much lower computational cost.The approaches to building machine learning force fields are multiple, and this review presents the major algorithms used so far in literature: linear regression, artificial neural networks and Gaussian process regression.Their advent and exploitation allows to tackle the complexity of the energy landscape of metallic nanoparticles, even in the case of operando conditions, for example when an oxide substrate is included and explicitely modelled.Accurate but expensive algorithms to sample the free and potential energy landscapes of metallic nanoparticles (e.g.molecular dynamics, metadynamics, transition path sampling, basin hopping, harmonic approaches, nested sampling) will all greatly benefit from the deployment of machine learning-derived fast and accurate force fields.Particular care must be placed on training database selection to ensure that the relevant parts of phase space are included without redundancy.For this reason, a section of this review focuses entirely on describing the major algorithms for database selection.The core of this review is, however, to provide the state-of-the-art of machine learning force fields to model metallic nanoparticles and nanoalloys.Inspired by the on-going process of designing optimal metallic nanoparticles for target applications, and moving towards a numerical driven search, accurate estimates of how structural changes affect metallic nanoparticles' properties are in high need.To achieve this result, fast and accurate force fields, that allow the exploration of long time scales without the caveat of fitted parameters, are required.

FIG. 1 :
FIG. 1: Schematic comparison of linear regression, 2-layer ANN, and GPR for the local energy prediction task.The symbols in the figure mirror the ones used in the main text.All three algorithms have been

FIG. 2 :FIG. 3 :
FIG. 2: Mean absolute error on the DFT force vector incurred by a 3-body GPR algorithm trained on five isomers of Ni 19 and tested on the defected double icosahedron (dDIH) isomer.Figure adapted with permission from Zeni et al. 54 .

FIG. 4 :
FIG. 4: Log mean absolute error on force vectors incurred by a 2-body (left) and a 3-body (right) GPR algorithm as a function of the log number of training points N tr when trained and tested on a total database of (PtCu) 13 local atomic environments, with N = 13000.The training points have been selected according to the three algorithms described above, and also with random sampling.The black dashed line indicates an estimate of the convergence error, obtained by training the algorithm on 2000 randomly selected training points.The error bars display the standard deviation of the error, obtained by repeating the process five times.

FIG. 5 :
FIG. 5: (a), (b), (c) Relative energies (eV) of the 20 Pt 13 H 18 isomers in the low energy metastable ensemble (zero represents the energy of the global minimum) and structures of the three most stable ones.Grey spheres indicate Pt atoms while red ones indicate H atoms.(d) Thermodynamics stability of Pt 13 Hx clusters (x = 0, 18, 26) as a function of temperature and hydrogen pressure.Figure adapted with permission from Sun and Figure adapted with permission from Sun and Sautet 68 .Copyright (2018) American Chemical Society.

FIG. 6 :
FIG. 6: (a) The atomistic model of a Pt 3 Ag 3 nanoparticle supported on a Pt(111).Ag and Pt atoms colored green and blue, respectively, the slab Pt atoms are white.(b) (c) The success rate of locating the global minimum as a function of the number of candidates evaluated and the number of needed parent calculations, respectively.(d) Graph highlighting the significant reduction in the average parent calculations needed.

Figure
Figure reprinted with permission from Kolsbjerg, Peterson, and Hammer 67 .Copyright 2018 by the American Physical Society .

Figure 5
reports the lowest energy isomers found for Pt 13 in the presence of H, together with the thermodynamic stability diagram of Pt 13 H x clusters.The same year, Kolsbjerg et al. looked at putative global minimum structures for Pt 13 on a MgO support.The authors used an on-the-fly trained ANN force field to relax structures to local minima during evolutionary algorithm searches; here, ASFs were the descriptors of choice.The computational speed-up inherent to the framework most importantly enabled the screening of hundreds of kinetic rearrangement pathways connecting different low-energy conformers.

Figure 6
is shown to highlight the decrease in the number of ab-initio calculations performed thanks to the use of an ANN during the minima structure search.

FIG. 7 :
FIG. 7: (a) Excess energy of a (PtAu) 147 nanoparticle as a function of its chemical composition, located with a ML-accelerated genetic algorithm employing effective-medium theory calculations.(b) Number of energy calculations as a function of the nanoparticle's composition.The four lines correspond to traditional genetic algorithms (GA), machine learning accelerated GA (MLaGA), serialized MLaGA and MLaGA utilising uncertainty (uMLaGA); average and standard deviation over five searches are shown.Figure reprinted with permission from Jennings et al. 89 .
FIG. 8: a Scheme used for training and use of the model.New training data is acquired via DFT single-point calculations.b Scatter plot for three iterations of the convergence system, starting from very poor predictions and converging to more accurate predictions of adsorption energy.c Convergence of the accuracy of the CO adsorption energies with respect to the training set size.Points possess some inherent noise due to the stochastic nature of the neural network training algorithms.Figure adapted with permission from Ulissi et al. 64 .Copyright (2017) American Chemical Society.

FIG. 9 :
FIG. 9: (a) Bottom-view of a snapshot of a MD simulation at 1000 K of a Cu 612 cluster at the ZnO(1010) surface; five copper atoms have been selected to compare the ANN-predicted forces with the DFT forces (in d).(b) Side view of the cluster.(c) A top view of the ZnO(1010) surface is shown.Five oxygen and five zinc atoms have been chosen for a closer investigation of the forces (in (d)).(d) Comparison of the force modulus of two DFT force evaluations using atoms within 6 Å and 9 Å from the central one, and the neural network force field on the whole slab for the atoms highlighted in (a) and (c).Figure adapted with permission from Figure adapted with permission from Artrith, Hiller, and Behler 59 .Copyright 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

FIG. 10 :
FIG. 10: (a) Geometry of an Au 147 icosahedron.(b) Geometry of the putative ground minimum for Au 147 found using an ANN force field.Figure (b) is color coded so to highlight the three shells of Au atoms.Figure Reprinted from Jindal, Chiriki, and Bulusu 71 , with the permission of AIP Publishing.