Transferable atoms: an intra-atomic perspective through the study of homogeneous oligopeptides

ABSTRACT Quantifying an atom's transferability, in a force field context, demands a quantitative understanding of how an atom ‘experiences’ its surrounding environment, both intra-atomically and inter-atomically. Here we investigate the intra-atomic ($E_{\text{intra}}^{A}$) viewpoint through the study of the atoms Cα, Hα, N, O and S in a series of ‘mono’-, tri- and penta-peptides. The remaining inter-atomic viewpoint consists of electrostatic (via multipole moments), exchange and correlation components, of which the electrostatic component has been reported previously. Together these four energy components, as calculated from the Interacting Quantum Atoms (IQA) partitioning approach, form the foundation of the Quantum Chemical Topological Force Field (QCTFF). For a force field to be transferable, smaller sample systems must be calculated and shown to be representative of larger target systems. The Cα, Hα, N, O and S atoms in a tri-peptide are energetically comparable to those in their penta-peptide configurations, within 2.1 kJ/mol in absolute value (1 exception). Across all five elements, this energy difference is on average ∼0.3 kJ/mol. On average, the tri-peptide sample systems represent a ∼8.2 Å atomic horizon around the central atoms of interest. Thus, the previously established ∼10.3 Å horizon sphere and ∼0.4 kJ/mol error associated with the electrostatic multipole moments, together with the present results, determine how two of the four key QCTFF energy components are affected by an atom's molecular environment.


Introduction
Here we push computational boundaries to help answer an important but rather under-documented question. This question is of interest to anyone concerned [1] with Quantum Chemical Topology (QCT) [2][3][4][5] and focuses on the energetic transferability of a topological atom, now within the context of Interacting Quantum Atoms (IQA) [6]. An answer to the question of energetic transferability is pivotal in the development of a force field because transferability is its zeroth cornerstone [7]. All structure and dynamics predicted by a force field depend on the energy predictions it makes, and therefore we need to know the size of the atomic environment that still influences the energy of a central atom of interest.
A force field is asked to make predictions on a target system, which can be a protein in a large box of water, or a sizeable piece of RNA. Inevitably, target systems are large, and of course too large to serve as a sampling system for a force field. Even if this force field parameterisation were possible, it would defeat the object because it does not make sense to parameterise a dedicated force field for each target system. Instead, one parameterises a much smaller set, which is the sampling system. The question is then how small this sampling system can be. In the case of a protein as a target system, can the sampling system be a single amino acid? Or should it be a protein fragment consisting of three amino acids? Transferability studies can answer this question although perfect transferability is impossible: one cannot take a topological atom from a sampling system and literally transfer it to the target system without introducing an energetic error. So, really, transferability is not binary but a sliding scale, and we will discuss our data below keeping this in mind.
In the past, we have reported that 'the true predictive power of a force field depends on the reliability of the information transfer of small molecules (or molecular clusters) to large molecules. Only if this transferability is high, a force field will make reliable predictions' [8]. The expression 'information transfer' is deliberately kept general and open in order to welcome various atomic properties as gauges of transferability. For example, in a study [9] on the transferability of methylene and methyl fragments in alkylethers, atomic volumes (and even bond critical point properties) were invoked to quantify transferability. Because we believe that energy is the ultimate arbiter in force field transferability, we have studied properties more directly related to energy than volume, or intra-atomic energy itself, as we do in this paper. Given that interatomic electrostatic energy, at sufficiently long range, can be exactly represented by a multipole expansion [10], it makes sense to study the transferability of atomic multipole moments. This was done in the context of computing atom types [11], while the next study [12] on transferability came one step closer to energy by studying the atomic electrostatic potential. Although useful, this type of assessment demands the construction of a grid at which the potential is evaluated. The necessity for such a grid is a vulnerability, which can be circumvented by going beyond the potential and investigating the interatomic electrostatic energy itself. However, this type of assessment introduces one or more atoms tasked to probe the central atom of interest. Such an investigation [13] was done some time ago for a water trimer and a microhydrated serine. Amongst other findings, this study showed that the atoms of serine are more transferable, in going from the isolated serine to the serine···(H2O)5 supermolecule, than the atoms in the water cluster.
Other research groups have also investigated the issue of transferability [14][15][16][17][18][19][20]. The transferability question does not solely focus on the role of atomic multipole moments in the literature. The highly transferable nature of common structural properties such as bond lengths and angles has also been reported [21].

General background
We strive towards the completion of a topological force field known as QCTFF [22]. QCTFF will be a novel atomistic protein force field that builds on the principle of QCT, which is using the gradient vector field as a (minimal) means to let a quantum function partition itself in space. If this quantum function is the electron density then the product of the partitioning is a set of topological atoms. This was the first result of QCT, better known under the name Quantum Theory of Atoms in Molecules (QTAIM) [3]. An example of topological atoms is shown in Figure 1. Topological atoms are boxes of finite volume, with a peculiar shape that precisely reflects the whole quantum system they are part of. In other words, the whole system imprints its presence (at least in principle, not necessarily with much 'numerical power') onto each of the (topological) atoms within the system. We also note that there are no gaps [23] between the atoms. This means that each little bit of electron density in a potentially large 'pocket' belongs to one atom or another. This means that the electrostatic potential this piece of electron density generates also belongs to an atom, always. As a further consequence, one can assert that all energy contributions (invariably associated with electron density) are always assigned to one atom or a pair of atoms. In short, all intraatomic or interatomic energies are accounted for.
QCTFF is built on the precise constellation of four resolutions [22]. The first resolution is summarised by adopting the topological atom as the carrier of chemical information. Indeed, the total energy of a sampling system, and then ultimately a target system, can be predicted from the information that the atoms carry. The second resolution stipulates that the total energy is partitioned into four types of energy contributions, each of which has a chemical meaning: (1) intra-atomic self-energy, and interatomic (2) Coulomb, (3) exchange and (4) correlation energy. A short mathematical description of the IQA partitioning is provided in Section 2.3. This scheme is inspired by IQA, which in turn was inspired by the early establishment [24] of a six-dimensional integration over two topological atoms simultaneously. This type of calculation returns the potential energy between two atoms, for any molecular geometry. This achievement makes the calculation of the intra-atomic energy independent of the atomic virial theorem [25]. Third, any 1/r type of interaction can be expanded, thereby introducing spherical harmonic (atomic) multipole moments. This expansion is of great practical use for the Coulomb energy, provided that the multipole expansion converges [26,27]. Finally, a variation in nuclear configuration causes a change in a given atom's energies and its multipole moments. The machine learning method Kriging [28,29] has the capacity to capture the mapping between the coordinates of the atom's surrounding nuclei (input) and this atom's energies and multipole moments (output). The input is cast into a number of so-called features, the details of which are described elsewhere [29]. It suffices to state here that features are essentially internal geometrical coordinates that allow a given atom to describe its own atomic environment (that is, basically by means of nuclear positions). Note that a Kriging model is trained on a data set of sample systems.
Figure 1. A representation of the topological atoms in Leucine, which is capped both at the N-terminus and the C-terminus by a peptide bond. The nuclear configuration is taken from a (Leu)5 conformer geometry-optimised at the HF/6-31+G(d,p) level of theory.
When each of the four aforementioned energies undergoes Kriging, then a complete molecular energy Kriging model is generated. Compiling these models, along with other comparable models from other molecules into a database, results in the ultimate formation of QCTFF. At short range, the electrostatic energy (or Coulomb energy without being pedantic) between two atoms can only be calculated directly, that is, without using multipole moments. These interatomic energies then serve as output for a Kriging model. However, at long-range, a convergent multipole expansion is used and each Kriging model takes up the multipole moment as its output. So far, most work has been done [30][31][32][33][34][35] on multipole moments, up to hexadecapole in fact.
The exchange energy can also be expanded into so-called exchange (multipole) moments, as first demonstrated [36] in 2007. However, these moments carry imprints of the molecular orbitals themselves, a difficulty that has not been overcome. This is why we instead follow the route of unexpanded exchange energies. Fortunately, for saturated systems [37] (and many non-metallic condensed matter systems are saturated) these energies drop off very quickly with distance. The local nature of the exchange energies makes it feasible to krige them as energies. Locality means that a given atom does not have to be aware of too deep an environment. Because only the immediate neighbours suffice, the number of possible nuclear configurations around the given atom is restricted. Indeed, atoms that are covalently attached to a given atom cannot move around as much compared with atoms further away (that are hence less covalently attached).
Dynamical correlation energies were first [38] calculated through coupled cluster theory, and can also be offered to a Kriging engine to then be mapped onto the coordinates of the surrounding atoms. Work is in progress to do this for inter-atomic correlation energies from Møller-Plesset wave functions. However, the first non-electrostatic energy that was ever kriged is the atomic kinetic energy [39]. Building on this success more non-electrostatic energy components have been kriged in our lab and will be published in due course. However, the subject of this paper is not Kriging but the transferability of a non-electrostatic energy, namely the intra-atomic energy or self-energy, for short. This paper will report the results seen for the intra-atomic self-energy in a series of oligopeptides of varying length.
The intra-atomic energy represents the energy that an atom possesses inside a molecular system. Understanding the energetic cost of transferring an atom from a smaller sampling system into a larger target system reveals whether the sampling system is suitable to be used as a sample for this target. If the sampling system is too small then one observes a large change in self-energy between the given atom in the sampling system as compared with that atom in the target system. An assessment of the atom in the sample system and then in the target system results in a quantitative measure of their similarity. This similarity measure enables the computation of atom types, an achievement that was realised some time ago [11] but based on atomic properties such as multipole moments, virial-based atomic energy and volume. In future work, the intra-atomic energy of an atom could add value and play a role in the classification and computation of atom types, which is under-researched.

Horizon sphere, atomic horizon and compounds studied
The case studies chosen for the analysis focus on homogeneous oligopeptides of three possible sizes: mono- (n = 1), tri- (n = 3) and penta-peptide (n = 5), where n is the number of amino acids in the peptide chain. Eight systems are analysed in total, seven of which (alanine, glycine, isoleucine, leucine, serine, threonine and valine) will be used to investigate the transferability of the central α-carbon, α-hydrogen, amino-oxygen and amino-nitrogen atoms. The eighth system (cysteine) is used to investigate the transferability of a sulphur sidechain atom. This investigation is analogous to that in a previous paper featuring the study of the same five elements (C, H, O, N and S) in the small and naturally occurring protein crambin [40]. That study introduced the concept of a horizon sphere. This concept addresses the following simple question: for a selected central atom, how do its neighbours influence its multipole moments and at which distance can their influence be ignored?
This horizon sphere was presented as a metaphorical sphere which, when centred on a single atom, correlates the energetic change in an atom's multipole moment with the sphere's radius. This allows the intuitive mapping of two commonly known physical parameters (interaction energy and distance). Operationally, the horizon sphere incrementally increases its radius in steps of 0.1 Å and observes the growing number of other atoms appearing within its volume. At every step, new atoms may enter the sphere. If not, the horizon sphere grows by another step. As such, a set of nested atomic configurations appears, each containing the central atom, and one can observe how the multipole moments of the central atom change with increasing configuration size. In other words, the horizon sphere allows a chemically meaningful measurement of how far out the central atom still experiences the presence of its atomic environment. In short, how far does the given atom 'feel'?
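As an illustration only, the stepwise growth of the horizon sphere described above can be sketched in Python. The function name, the hypothetical coordinates and the 12 Å cap on the radius are assumptions made for this example; this is not the code used in the crambin study, and the evaluation of multipole moments for each nested configuration is left out.

```python
import math

def nested_configurations(central, others, step=0.1, r_max=12.0):
    """Grow a sphere around `central` in fixed steps and record every radius
    at which at least one new atom has entered the sphere."""
    # Distances from the central nucleus to every other nucleus (angstrom).
    dists = [math.dist(central, xyz) for xyz in others]
    configs = []      # list of (radius, indices of the atoms inside the sphere)
    n_inside = 0
    n_steps = int(round(r_max / step))
    for k in range(1, n_steps + 1):
        radius = k * step
        # Small tolerance guards against floating-point noise in k * step.
        inside = [i for i, d in enumerate(dists) if d <= radius + 1e-9]
        if len(inside) > n_inside:
            # New atoms entered: this is a new nested configuration.
            configs.append((round(radius, 1), inside))
            n_inside = len(inside)
        # Otherwise the sphere simply grows by another step.
    return configs

# Hypothetical nuclear positions (angstrom), chosen only for illustration.
central_atom = (0.0, 0.0, 0.0)
neighbours = [(1.0, 0.0, 0.0), (0.0, 1.5, 0.0), (0.0, 0.0, 3.0), (9.0, 0.0, 0.0)]
for radius, atoms in nested_configurations(central_atom, neighbours):
    print(f"radius {radius:4.1f} A -> atoms {atoms}")
```

Each returned configuration contains all atoms of the previous one, which is what makes the set nested; in a real study each configuration would be followed by a wave function calculation.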
In crambin, the largest structure considered had a radius of 12 Å, and was taken as the reference structure. It consisted of 294 atoms in total, while four atoms (one of each of the elements C, H, O and N) were selected as 'probing atoms'. The multipole moments of the latter always remained invariant. These multipole moments were combined with the varying multipole moments of the central atom in the horizon sphere to yield the electrostatic interaction energy between that atom and a probing atom.
The conclusion from the crambin work was that each element had its own horizon sphere radius (Cα = 10.5 Å, Hα = 7.7 Å, O = 11.0 Å, N = 11.3 Å and S = 10.8 Å) beyond which each atom's multipole moments are influenced by the presence of other atoms by no more than 0.4 kJ/mol. The current study expands and complements the crambin study from the perspective of atomic self-energy. The latter produces its own horizon sphere radii, as will become clear later. In the presence of two types of radii, the question is then how to quantify transferability. One way would be to take the larger radius of the two because the larger radius is the most 'demanding' in terms of transferability. However, one could also argue that ultimately a force field adds the various energy contributions that it uses to describe a system, and hence the sum of the energies needs to be screened for transferability. In this paper, we will not follow either option but focus on the transferability of the self-energy itself. We know from unpublished work (on different systems, i.e. large water clusters) that the exchange energy generates smaller horizon radii. This is not surprising in the light of previously published work [37], which shows how quickly exchange energy drops off with distance (for saturated systems).
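The five element-specific radii quoted above can be averaged to recover the ∼10.3 Å horizon sphere mentioned in the abstract. The short Python sketch below does only that arithmetic; the dictionary keys are labels chosen here for illustration.

```python
from statistics import mean

# Horizon sphere radii (angstrom) reported for crambin, as quoted in the text.
crambin_radii = {"C_alpha": 10.5, "H_alpha": 7.7, "O": 11.0, "N": 11.3, "S": 10.8}

mean_radius = mean(crambin_radii.values())
most_demanding = max(crambin_radii, key=crambin_radii.get)

print(f"mean horizon sphere radius: {mean_radius:.2f} A")
print(f"element with the largest radius: {most_demanding}")
```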
In retrospect, the horizon sphere would have been better called atomic horizon thereby allowing it to be not spherical, in general. In this work we do not construct a horizon sphere but control the size of the sample system by varying the length of an oligopeptide. This size control occurs 'linearly' in that the chain length of the oligopeptide is varied, which is reminiscent of the primary structure of a protein. However, the oligopeptide may very well curl up, reminding us of the importance of secondary structure. The largest size oligopeptide then represents a globular environment with respect to which a given atom is studied. The edge of this environment is more accurately called an atomic horizon. One can then introduce a 'pseudo'-radius for this atomic horizon by measuring the distance between the nuclear positions of the atom of interest and the atom furthest away from it.
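The pseudo-radius defined above is simply the largest internuclear distance from the atom of interest. A minimal Python sketch follows; the function name and the coordinates are hypothetical and serve only to illustrate the definition.

```python
import math

def pseudo_radius(coords, centre_index):
    """Distance (same unit as coords) between the atom of interest and the
    nucleus furthest away from it."""
    centre = coords[centre_index]
    return max(
        math.dist(centre, xyz)
        for i, xyz in enumerate(coords)
        if i != centre_index
    )

# Hypothetical nuclear positions in angstrom; atom 0 is the atom of interest.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 3.0, 4.0)]
print(pseudo_radius(coords, 0))
```

Because the furthest nucleus depends on the conformation, the same oligopeptide gives a smaller pseudo-radius when curled up than when extended.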

Interacting quantum atoms (IQA): some relevant formulae
The IQA partitioning quantitatively describes the energetics of topological atoms, that is, their interaction as well as their internal energy, through a combination of kinetic and potential energies [6]. These energies consist of an intra-atomic or an interatomic contribution. At an unrefined level, the IQA formalism partitions the molecule's energy according to the following equation:

$$E = \sum_{A} E_{\text{intra}}^{A} + \frac{1}{2} \sum_{A} \sum_{B \neq A} V_{\text{inter}}^{AB} \tag{1}$$

The energy term $E_{\text{intra}}^{A}$ can be broken down in three contributions:

$$E_{\text{intra}}^{A} = T^{A} + V_{ee}^{AA} + V_{en}^{AA} \tag{2}$$

where $T^{A}$ is the kinetic energy of the electrons, $V_{ee}^{AA}$ is the electron-electron repulsive potential energy and $V_{en}^{AA}$ is the attractive electron-nuclear potential energy, all within atom A. Note that the kinetic energy is well defined for the topological atom, which would not be true for an arbitrary (atomic) subspace. Together, these three energy contributions comprise the self-energy possessed by a single atom. This energy is the central quantity investigated in the present work. For completeness, the remaining interatomic energy between atom A and another atom B is defined in the following equation:

$$V_{\text{inter}}^{AB} = V_{nn}^{AB} + V_{en}^{AB} + V_{ne}^{AB} + V_{ee}^{AB} \tag{3}$$

where $V_{en}^{AB}$, $V_{ne}^{AB}$ and $V_{ee}^{AB}$ are as described above but with respect to both A and B. The quantity $V_{nn}^{AB}$ is the repulsive nuclear-nuclear potential energy, which is totally classical within the Born-Oppenheimer approximation. For the sake of completeness, the $V_{ee}^{AB}$ contribution can be specified further,

$$V_{ee}^{AB} = V_{\text{Coul}}^{AB} + V_{\text{X}}^{AB} + V_{\text{corr}}^{AB} \tag{4}$$

where the first term on the right-hand side embodies the Coulomb interaction between electrons in atoms A and B, the second term is the electronic exchange energy (between A and B) while the third term is the most challenging term to calculate, which is associated with dynamic correlation or dispersion.
A further rearrangement of the energies takes place in Equation (5),

$$V_{\text{elec}}^{AB} = V_{nn}^{AB} + V_{en}^{AB} + V_{ne}^{AB} + V_{\text{Coul}}^{AB} \tag{5}$$

and following this, a new expression for the complete interaction energy between two atoms, denoted $V_{\text{inter}}^{AB}$, can then be formed, as

$$V_{\text{inter}}^{AB} = V_{\text{elec}}^{AB} + V_{\text{X}}^{AB} + V_{\text{corr}}^{AB} \tag{6}$$

Here, $V_{\text{elec}}^{AB}$ represents the complete electrostatic interaction energy between two atoms A and B, now including the interaction with the respective nuclei. This quantity (rather than $V_{\text{Coul}}^{AB}$) is the energy that has been expanded as a multipolar series on many occasions [26,27,40-44] in the past.
For a more exhaustive description of the partitioning scheme including additional formulae and previous applications, the reader is directed to the original literature by Blanco et al. [6,38,45-48]. For the purpose of this paper, we will only present the transferability assessment from the point-of-view of the intra-atomic energy $E_{\text{intra}}^{A}$ as defined in Equation (2).
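To make the bookkeeping of the partitioning concrete, the following Python sketch adds the three intra-atomic terms named in the text (electronic kinetic energy, electron-electron repulsion and electron-nuclear attraction within one atom) and then assembles a molecular energy. It assumes the usual IQA form in which the total energy is the sum of all intra-atomic energies plus every pairwise interatomic energy counted once. All class names, function names and numerical values are hypothetical and are not taken from AIMAll output.

```python
from dataclasses import dataclass

@dataclass
class IntraTerms:
    """Intra-atomic energy components of one topological atom (hypothetical values)."""
    T: float      # kinetic energy of the electrons within the atom
    V_ee: float   # electron-electron repulsion within the atom
    V_en: float   # electron-nuclear attraction within the atom

    def self_energy(self) -> float:
        # Self-energy = sum of the three intra-atomic contributions.
        return self.T + self.V_ee + self.V_en

def molecular_energy(intra, v_inter):
    """Sum of all self-energies plus each unordered pair interaction once.

    intra   : dict mapping atom label -> IntraTerms
    v_inter : dict mapping (label_A, label_B) -> interatomic energy, one key per pair
    """
    return sum(t.self_energy() for t in intra.values()) + sum(v_inter.values())

# Hypothetical two-atom system in arbitrary energy units.
intra = {
    "A": IntraTerms(T=10.0, V_ee=4.0, V_en=-24.0),
    "B": IntraTerms(T=1.0, V_ee=0.5, V_en=-2.0),
}
v_inter = {("A", "B"): -0.25}
print(molecular_energy(intra, v_inter))
```

Storing each pair once is equivalent to halving a double sum over all ordered pairs; only the self-energy part is the subject of the transferability assessment in this paper.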

Computational details
The penta-peptide geometries were the result of a geometry optimisation at the HF/6-31+G(d,p) level of theory using the GAUSSIAN09 program [49]. No frequency calculations were carried out because confirming that the optimised geometries are true energy minima is not essential in reaching the conclusion of this paper, and the cost of these extra calculations is huge, given their size. A randomly generated penta-glycine (Penta-gly) was created from scratch by the program Gaussview and its sidechain changed according to the amino acid to be analysed. For each of the eight oligopeptides studied, the penta-peptide always provided the exact geometry of the tri-peptide and the single amino acid, that is, for the atoms the penta-peptide (n = 5) has in common with its derivatives (n = 3 or n = 1). Hence, the tri-peptide and the single amino acid were not geometry-optimised. Thus, the current transferability study freezes out any geometry changes while comparing the penta-peptide with the tri-peptide, for example. Note that neither di-peptides nor tetra-peptides were included in this study (i.e. n = 2 or n = 4). They are excluded to ensure that the radius decreases approximately symmetrically at either side of the central amino acid under study.
Following the optimisation of the penta-peptide, this system is first trimmed to form the corresponding tri-peptide and then again to form the single amino acid. When trimming the N-terminus, the CH3NHC(=O)- group is removed and replaced by a hydrogen atom. This means that the first α-carbon of the original penta-peptide now becomes a methyl group. This methyl group caps the emerging tri-peptide in the same way as the now removed methyl capped the original penta-peptide.
Similarly, trimming at the C-terminus means removing the CH3C(=O)NH- group and replacing it by a hydrogen. Again, this substitution generates a terminal methyl group, which is part of the newly formed but familiar CH3NHC(=O)- capping group. The geometrical positions of hydrogen atoms in these methyl caps are predetermined by previous atomic positions in the larger chain. Their bond lengths are standardised to be 1.07 Å in accordance with the standard parameters in Gaussview.
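One plausible reading of this capping step is that each new hydrogen is placed along the bond that previously connected the retained carbon to the first atom of the removed group, at the standardised length of 1.07 Å. The Python sketch below implements that reading only; the function name and coordinates are hypothetical, and the actual placement performed by Gaussview may differ.

```python
import math

def cap_with_hydrogen(anchor, removed, bond_length=1.07):
    """Return the position of a capping hydrogen.

    anchor      : position of the retained atom (e.g. the carbon becoming a methyl)
    removed     : position of the first atom of the group that was cut away
    bond_length : standardised anchor-H distance in angstrom
    """
    direction = [r - a for a, r in zip(anchor, removed)]
    norm = math.sqrt(sum(c * c for c in direction))
    if norm == 0.0:
        raise ValueError("anchor and removed atom coincide")
    # Keep the original bond direction, rescale to the standard bond length.
    return tuple(a + bond_length * c / norm for a, c in zip(anchor, direction))

# Hypothetical positions (angstrom): a C-N bond of 1.5 A along the x axis.
carbon = (0.0, 0.0, 0.0)
nitrogen = (1.5, 0.0, 0.0)
hydrogen = cap_with_hydrogen(carbon, nitrogen)
print(hydrogen, math.dist(carbon, hydrogen))
```

Keeping the bond direction fixed means that the retained atoms never move, which is consistent with the statement that geometry changes are frozen out of the comparison.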
The IQA energy partitioning was carried out by the program AIMAll [50][51][52]. Default inputs were used. The keyword 'encomp = 3' was included in the input, which is the short-hand input name referring to 'Energy Components'. Some poorly calculated atoms were recomputed with a specified outer angular quadrature ('sky-high_lebedev') for the atom under study, instead of the default 'auto', in an attempt to obtain a more accurate calculation. This improved some atomic integration errors but worsened others. The best results from both sets of runs were selected for further analysis. Typical CPU times for oxygen atoms (which took the most time compared with other elements) in penta-peptides amounted to up to 24 hours on 32 cores, highlighting the compute-intensive nature of the current study. Table S1 in the Supplementary Information gives an impression of the general atomic integration accuracy obtained from the $L(\Omega)$ value [53], which settles at about 0.2-0.5 kJ/mol for all elements except carbon, where L(C) = 2.0 kJ/mol. The latter error is worrisome but could not be improved in spite of several attempts.
With regard to visualisation, Figure 1 was generated with in-house software called IRIS, which is based on earlier work [54,55]. The nuclear configuration is taken from a (Leu)5 conformer geometry-optimised at the HF/6-31+G(d,p) level of theory (51 molecular orbitals and 542 Gaussian primitives). This figure represents mono-Leucine whose geometry was taken directly from the optimised penta-Leucine (the central amino group '3') and capped by the familiar CH3NH- group at the C-terminus and CH3C(=O)- group at the N-terminus. The wave function was again calculated at the HF/6-31+G(d,p) level. IRIS's default settings were employed, other than using wireframe for the surface and altering the transparency. Default element colours were used. The images of each of the eight penta-peptides (geometry-optimised at the HF/6-31+G(d,p) level) shown in Figure 2 were created by AIMStudio [50]. Each penta-peptide is capped at both termini by the same groups as in Figure 1. Default settings were used other than changing the electron density cutoff to $1 \times 10^{-6}$ a.u. in order to ensure that there are no 'gaps' in the non-covalent interaction lines. Each molecule was viewed individually and captured as a screenshot. The eight screenshots were combined to give the final image in Figure 2.

Results and discussion
Here we monitor how the $E_{\text{intra}}^{A}$ energies of each of the five elements occurring in naturally occurring amino acids change with peptide size. The hypothesis for the overall analysis is that the central atoms of the penta-peptides will be closest in energy to the corresponding central atoms in the tri-peptides. If the hypothesis is true, within a suitable energy margin, then the tri-peptide atoms can sufficiently accurately represent (or model) the atoms in the (penta-peptide) target system. Figure 2 shows the precise configurations of each of the eight penta-peptides investigated. The geometries of these peptide chains are not constrained or biased to any conformation during the ab initio geometry optimisation and are all initialised in a consistent and random way. Hence, no optimisation is directed towards a predominant favourable growth pattern, e.g. 'linear' in one direction. Linear oligopeptides are like open chains, i.e. extended in one dimension. On average, the internuclear distances in such open configurations are larger, as opposed to a 'curled' configuration where the central atom is surrounded by atoms that are closer by. One expects more distant atoms (in the environment) to have less influence on the central atom. Therefore, the central atom will be more readily transferable, since most of its environment in the penta-peptide does not matter much. Hence, a transferability test on an open (i.e. linear, extended) configuration is less severe than one on a curly (i.e. globular) configuration. The majority of the configurations are curly, so our transferability tests are severe. Finally we note that, although not directly investigated here, it is well known that the number of local energy minima present in oligopeptide conformational space is vast [56]. Based on the data presented here, we cannot be sure that the fixed configurations studied are representative of configurations found in the Protein Data Bank, for example. However, the uniform treatment and the lack of bias give some comfort that the results may be universal.
Across the eight geometry-optimised penta-peptides, the penta-Gly system showed a preference for helical growth (in the peptide chain). However, the penta-Thr system showed no obvious preference in adopting a pronounced secondary structure, while the remaining six penta-peptides curl up to form conformations resembling β-turns. In all penta-peptides, numerous intramolecular interactions were observed, marked by a complex network of bond and ring critical points. These intramolecular interactions also increase the possibility for a network of interactions that connects the atoms of the central amino acid to the termini of the oligopeptide. This phenomenon is prevalent in water clusters and known to influence atomic energies [57,58]. This effect adds another dimension of complexity to the study.

The differences in self-energies ($E_{\text{intra}}^{A}$, see Equation (2)) for each element across each peptide system are summarised in Table 1 (for nitrogen), Table 2 (for oxygen), Table 3 (for carbon), Table 4 (for hydrogen) and Table 5 (for sulphur). Each table details the difference in intra-atomic energies, denoted $\Delta E$, for the given atom across the sequence of three oligopeptides ('penta', 'tri' and 'mono'), one entry for each of the eight amino acids except cysteine. The data of the cysteine oligopeptides are only used to study the element sulphur. The energy differences listed in all the tables are very much smaller than the typical magnitude of the $E_{\text{intra}}^{A}$ energies for the five possible elements, which are huge: nitrogen ∼−140,000 kJ/mol, oxygen ∼−195,000 kJ/mol, carbon ∼−100,000 kJ/mol, hydrogen ∼−1200 kJ/mol and sulphur ∼−1,042,000 kJ/mol, respectively. First, the nitrogen and oxygen results confirm the hypothesis stated above, that the energy difference between the tri- and penta-peptide (i.e. $\Delta E_2$) is the smallest of the three energy differences. The maximum energy difference (in absolute value) between the penta-peptide and tri-peptide self-energy is 3.3 kJ/mol for N and 0.8 kJ/mol for O. Overall, combining all entries of oxygen and nitrogen, in 11 of the 14 (= 2 × 7) cases, energy differences are smaller than 1 kJ/mol. This is a very pleasing result, considering the magnitude of the intra-atomic energies of these atoms. However, this result also highlights the accuracy required for a study of this nature. The average integration errors ($L(\Omega)$, given in the fifth column of each table) validate this conclusion. The reported errors are an average of the integration error calculated for the atom in each of the three chain lengths. Overall, the total intra-atomic energy differences across the oligopeptides are below 0.0016% and 0.0012%, respectively.
Table 1. Energy differences ($\Delta E$) in peptidic nitrogen (see Figure ) intra-atomic energies. All energies are in kJ/mol.
Table footnote a: The numbers marked in bold and italics are where the hypothesis fails. The hypothesis fails as a result of the mean error being too large to observe the smaller (more sensitive) differences ($\Delta E$) observed between the carbon atoms in the peptide chains. This convention is also used in Table .
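The hypothesis test applied to each table entry can be written down in a few lines of Python. In this sketch the first difference is taken as tri minus mono and the second as penta minus tri, as stated later in the text; taking the third as penta minus mono is an assumption made here. The three self-energies are hypothetical numbers of roughly the magnitude quoted for nitrogen, not values from the tables.

```python
def energy_differences(e_mono, e_tri, e_penta):
    """Differences in self-energy (kJ/mol) between the three chain lengths."""
    return {
        "dE1": e_tri - e_mono,     # tri minus mono
        "dE2": e_penta - e_tri,    # penta minus tri
        "dE3": e_penta - e_mono,   # penta minus mono (assumed definition)
    }

def hypothesis_holds(diffs):
    """True when the tri/penta difference is the smallest in absolute value."""
    return abs(diffs["dE2"]) < min(abs(diffs["dE1"]), abs(diffs["dE3"]))

# Hypothetical self-energies in kJ/mol for one atom in the three systems.
diffs = energy_differences(e_mono=-140000.0, e_tri=-140007.0, e_penta=-140008.5)
print(diffs, hypothesis_holds(diffs))
```

The example also shows why high integration accuracy is needed: differences of a few kJ/mol are obtained by subtracting numbers that are five orders of magnitude larger.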
Observing the trends for the α-carbon and its bonded α-hydrogen atom, the message is less pleasing. For the carbon atoms the hypothesis only holds for 4 (Ala, Thr, Val and Ile) of the 7 amino acids. The same is true for the α-hydrogens (Ala, Thr, Gly and Leu). However, for both Cα and Hα, much smaller total energy differences are observed across all oligopeptides, in general. For carbon and hydrogen, absolute energy differences fall under 3.9 kJ/mol (two exceptions: Ala and Ile) and 0.65 kJ/mol (one exception: Ala), respectively. These small energy differences, combined with the relatively much larger averaged atomic integration errors $L(\Omega)$, make for less convincing results. It appears that, when the averaged integration error has a magnitude similar to that of the energy difference, the hypothesis does not hold. When the average integration error is sufficiently smaller than the energy difference, the hypothesis holds. Perhaps fortuitously, some hydrogen energy differences still confirm the hypothesis, despite the integration errors being relatively large in comparison to each of the energy differences (e.g. Thr $\Delta E_2$ = 0.14 (±0.4) kJ/mol and Gly $\Delta E_2$ = −0.01 (±0.3) kJ/mol). However, there are still three cases (Ser, Val and Ile) where the integration error is the accuracy-limiting factor. The integration errors observed throughout the hydrogen atom analysis are smaller than 0.4 kJ/mol in absolute value, and would normally be considered very accurate. However, for these atoms, the difference in the $E_{\text{intra}}^{A}$ energy is very low and in most cases is smaller than the mean error. The same observations can be made for the α-carbons, which are known to be more difficult to integrate accurately because of the increased complexity arising from their tetrahedral hybridisation [59]. The hybridisation of the atom generally appears to correlate well with the accuracy of the calculation (see Table S1). The atomic integration errors are not a problem for the oxygen atoms and have only a minor effect for the nitrogen atoms. Overall, these atoms have the optimal balance of good atomic integration errors and large $E_{\text{intra}}^{A}$ energetic differences (>5 kJ/mol in absolute value).
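The comparison between an energy difference and its averaged integration error can be expressed as a simple check. The Python sketch below uses the two hydrogen examples quoted in the text (Thr and Gly); the function name and the strict 'larger than the error' criterion are assumptions made for illustration rather than a criterion defined in this work.

```python
def is_resolved(delta_e, integration_error):
    """True when the energy difference exceeds the averaged integration error
    in absolute value, i.e. when the difference is numerically meaningful."""
    return abs(delta_e) > abs(integration_error)

# (energy difference between penta- and tri-peptide, averaged error), kJ/mol,
# for the alpha-hydrogen, as quoted in the text.
cases = {"Thr": (0.14, 0.4), "Gly": (-0.01, 0.3)}

for residue, (d_e, err) in cases.items():
    print(residue, "resolved" if is_resolved(d_e, err) else "within integration error")
```

Both examples fall within the integration error, which is why their agreement with the hypothesis is described as perhaps fortuitous.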
Finally, Table 5 shows that the sulphur atom in the Cys oligopeptide chains also conforms to the expected trend. In absolute value, the energy difference between the tri- and penta-peptide chains (ΔE2) is 1.67 kJ/mol, compared with 7.10 kJ/mol and 8.78 kJ/mol for ΔE1 and ΔE3, respectively. Tables S2-S6 of the Supplementary Information summarise the pseudo-radii of the atomic horizons of each of the elements in each of the oligopeptides studied. Previous work on crambin [40] showed that the electrostatic multipole moments generate a unique horizon sphere radius for each element. Table 6 lists these radii alongside the atomic horizon pseudo-radii obtained here for the averaged E_intra^A values of the tri-peptides. It is clear that, for each element, E_intra^A leads to a smaller atomic horizon than the horizon sphere of the multipole moments. The one exception to this conclusion is Hα, for which the multipolar horizon sphere radius is smaller, by 0.6 Å. However, as we have observed the Hα atoms to show very little energetic change across all mono-, tri- and penta-peptide chain lengths (|ΔE| below approximately 0.3 kJ/mol), we believe it is fair to treat the 'mono-peptide' (i.e. single amino acid) as a suitable sample system size for these atoms. Hence, a new atomic horizon pseudo-radius can be calculated for Hα only, based on the mono-peptide spheres. The mono-peptide averaged atomic horizon pseudo-radius is 5.2 Å for the Hα atoms, which brings Hα in line with the other elements: the E_intra^A pseudo-radius is then smaller than the multipolar horizon sphere radius.
The energy differences associated with E_intra^A are smaller than those associated with the multipole moments. In general, an energy difference (whether from multipole moments or E_intra^A) becomes progressively smaller as the size of the sample system increases. Once this energy difference reaches zero, one can conclude that convergence has occurred. Comparing ΔE1 (TRI-MONO) and ΔE2 (PENTA-TRI) gives an impression of the speed of convergence. Indeed, if ΔE2 (PENTA-TRI) << ΔE1 (TRI-MONO) then the convergence is fast. From the respective tables it is clear that the N, O and S atoms show fast convergence. However, the Cα and Hα atoms converge more slowly. We note that their ΔE1 (TRI-MONO) values were already quite small in the first place. Returning to sulphur, it is regarded as fast converging despite a large ΔE2 (PENTA-TRI). This can be rationalised by the larger number of electrons that a sulphur atom owns and, thus, the greater complexity and sensitivity of its E_intra^A energy. In general, the convergence is faster for E_intra^A than for the multipolar energies (studied in crambin [40]), as can be seen from Table 6. In addition, the atomic horizons of the various elements are also smaller for E_intra^A than for the multipolar energies.
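As a rough illustration of this convergence argument, the sketch below computes the ratio of the PENTA-TRI to the TRI-MONO energy difference for sulphur, using the three absolute values quoted from Table 5. The function name `convergence_ratio` is hypothetical. The additivity check rests on two assumptions not stated explicitly here: that the third difference is PENTA-MONO, and that the first two differences share the same sign.

```python
def convergence_ratio(delta_e1, delta_e2):
    """Ratio |PENTA-TRI| / |TRI-MONO|; values well below 1 suggest fast convergence."""
    return abs(delta_e2) / abs(delta_e1)


# Sulphur (Cys) absolute energy differences quoted in the text, in kJ/mol.
sulphur = {"dE1": 7.10, "dE2": 1.67, "dE3": 8.78}

ratio = convergence_ratio(sulphur["dE1"], sulphur["dE2"])

# Assumption: dE3 is the PENTA-MONO difference, so dE1 + dE2 should
# reproduce it up to rounding when dE1 and dE2 have the same sign.
residual = abs(sulphur["dE1"] + sulphur["dE2"] - sulphur["dE3"])

print(f"ratio = {ratio:.2f}, additivity residual = {residual:.2f} kJ/mol")
```

A ratio well below 1 is consistent with describing sulphur as fast converging even though its PENTA-TRI difference is large in absolute terms.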

Conclusion
The atomic self-energy, denoted E_intra^A, is studied as a gauge of energetic transferability for five elements (H, C, N, O and S) occurring in homogeneous oligopeptides (of increasing length) of eight amino acids. The self-energy of a given atom is systematically monitored as a function of a chemical environment that grows in size while the geometry of the central amino acid is kept frozen.
The central hypothesis of the current work is that E_intra^A of an atom in a tri-peptide is quantitatively close to E_intra^A of the corresponding atom in the penta-peptide. This hypothesis proves unreservedly correct for the oxygen, nitrogen and sulphur atoms. However, for the α-carbon atoms the hypothesis is harder to prove because the atomic integration errors are larger than for the other four elements. The α-hydrogen atoms show very small differences in E_intra^A across all three sizes of (oligo)peptide. Overall, for α-hydrogen and α-carbon, the central hypothesis is true in 8 out of 14 (= 2×7) cases. For the remaining six cases, the atomic integration errors are too large for the results to be conclusive.
We have learned that, for the elements studied, the tri-peptide is a sufficient sample system to accurately predict the energy (on average to within ∼0.32 kJ/mol) of the corresponding element in the target penta-peptide. For hydrogen, the sample size can be reduced further to a single amino acid because the energy differences are small (|ΔE| below approximately 0.3 kJ/mol) across all oligopeptides.
The convergence of the intra-atomic energy is generally faster than that of the multipole moments. In addition, the atomic horizon pseudo-radii are smaller than the radii of the multipolar electrostatic horizon spheres.
Atomic integration errors will continue to decrease as algorithms improve and so, too, will the accuracy of the partitioned energies of a molecule. Future work will involve the study of the E_IQA^A component for a complete description of the atomic horizon. This will provide further resources and knowledge for the development of QCTFF.