Categorization of inorganic crystal structures by Delaunay tetrahedralization

ABSTRACT Crystal structure prediction is demanding in two senses: this is a difficult problem, both experimentally and computationally, yet we want to find new materials with new properties that take unknown crystal structures. Using statistical information on the local environment of already reported stable crystal structures would be helpful when, for instance, finding new crystal structures through visually adding atoms sequentially to a cluster of atoms, which is an intuitive procedure that does not require substantial computational resources. This study conducted hierarchical classification of frequently appearing structure types in the ICSD database by comparing the occurrence ratios of H, B, C, N, O, F, and transition metals (TM) in binary, ternary, and quaternary crystals. The bond angle and solid angle distributions of Delaunay polyhedra with atoms as vertices were considered as descriptors of the local environment for each structure type. Hierarchical clustering of structure types based on bond angle or solid angle distributions revealed that the structure types can be broadly categorized into those favoring TM oxides, having low TM occurrence, and having high TM occurrence. Our results indicate that the anion, or lack of, plays a key role in determining the crystal structure type, and structure types strongly preferred by TM oxides can accommodate a broad range of angles. GRAPHICAL ABSTRACT


Introduction
Prediction of crystal structures from the constituent atoms and their ratios is still a big frontier of solid-state inorganic chemistry. The diversity of crystal structures results in a huge uncertainty in the chemists' guesses of the choices of possible structures from information on constituent elements only. There also remains many unreported crystal structures unknown to us.
Oganov et al. conceived a fully-automated method to predict crystal structures named USPEX [1][2][3], to attack this frontier. This method employs evolutionary algorithms to generate candidate structures and searches the most stable structure by relaxing and calculating their formation energies, for example by the embedded atom model (EAM) or even density functional theory (DFT) calculations. Such efforts are computationally expensive because formation energies of many crystal structures need to be evaluated, which could reach the order of 10 3 to 10 4 . Necessary parameters, such as the number of individuals per generation and number of generations, must be obtained from experience. As a result, this technique cannot be casually used to search reasonable crystal structures for arbitrary materials. As another effort, Wang et al. developed the CALYPSO code that uses particle swarm optimization for determination of energetically stable/metastable crystal structures of materials at given chemical compositions and external conditions such as pressure [4,5]. The CrySPY code can use Bayesian optimization and Look Ahead based on Quadratic Approximation in addition to random search and evolutionary algorithm for crystal structure prediction [6].
We believe that there are needs for alternative tools to guess and identify reasonable inorganic structures that are way less computationally intensive, thus less time-consuming, and easily accessible by a broader group of inorganic chemists. Guidance based on statistical information on the local environment of already reported stable crystal structures would be helpful to avoid absurd structures. Real-time display of such information allows visual addition of atoms, one by one, to build potentially stable crystals while ensuring that all bond lengths and angles are in their reasonable ranges.
Several descriptors have been developed to describe the local environments in inorganic crystal structures. The most classical such descriptor is the element radius. Representative radii values are derived by statistically analyzing the interatomic distances within a selected group of inorganic crystals. Examples are atomic [7,8], ionic [9,10], covalent [11], metallic [12], and van der Waals radii [9,13,14]. These enabled many inorganic chemists to exclude implausible candidates of stable structures where interatomic distances are highly inconsistent with reported element radii. An intensive analysis of local crystal structures has been carried out by Villars et al [15]. They examined the topologies of local coordination polyhedra in ~ 5000 binary compounds and categorized them based on their topologies. They classified 147 classical structure types into 97 coordination types, and expressed their occurrences on a two-dimensional grid map, where the two axes are the constituent elements arranged by the Mendeleev number. However, their analysis of coordination polyhedra was not applicable to ternary compounds, thus they employed a different classification scheme that uses the concentration-dependent sums of the valence-electron numbers VE, differences in pseudopotential radii R, and the differences in electronegativity X.
In this study, we employed Delaunay polyhedra as basic structural units for structural classification of multinary compounds. The Delaunay tetrahedralization technique is a three-dimensional version of Delaunay triangulation [16]. In general, Delaunay tetrahedra can be uniquely determined such that all four vertices of a tetrahedron lie on a sphere where no other vertices are inside. It divides the space into a set of tetrahedra from a given set of vertices such that the resultant set of tetrahedra are most isotropic. One counterexample is a simple cubic structure, where Delaunay polyhedra are cubes and their eight vertices lie on a sphere without any vertices inside. While Delaunay tetrahedralization is a basic technique in three-dimensional computer graphics, a few studies have applied this on inorganic crystal structures. The conjugate technique, Voronoi tessellation, is more commonly used for the analysis of crystal structures, although the unit polygons are quite complex. [17][18][19][20] In particular, the area of a Voronoi polyhedron centered at an atom is reported to be proportional to the strength of an atomic interaction of a given type [21,22].
Visualization of Delaunay tetrahedra can be useful in a crystal structure building tool because the tetrahedron is a primitive unit structure that resembles the real process of crystal formation: an atom wanders on the surface of a cluster of atoms and settles in a stable position. However, chemical bonding could cause preference of some forms of Delaunay polyhedra compared to others in a similar manner to preference of some interatomic distances and bond angles based on the chemical bonding. For instance, sp 3 bonding in the diamond structure results in bond angles of 109°, while the bond angles between nearest neighbors are 90° and 180° in the rocksalt structure. Delaunay polyhedra cannot be reasonable building units for crystal structure building, based on constituent elements and their ratios, without knowing the bonding-type dependences of the chemistry.
We attempted a preliminary analysis of this 'Delaunay chemistry' in this study. We carried out Delaunay tetrahedralization on the 100 most common structure prototypes in the Inorganic Crystal Structure Database (ICSD) [23,24]. We then hierarchically classified these prototypes based on the shapes of the constituent Delaunay polyhedra, to investigate if the basic concept of inorganic chemistry appears in the local geometry of the Delaunay polyhedra. The shape of each Delaunay polyhedron was described by the set of bond angles and solid angles at the vertices.

Selection of the common crystal structure prototypes
Crystal structure files were downloaded from the ICSD as 242,848 CIF files in June 2021. The entries with the same 'StructureFormula', FormulaUnitsPerCell" and 'HMS' (Hermann-Mauguin symbol of the space group) were removed as duplicates. Note that this procedure does not remove the duplicates with identical space group number but different HMSs, such as Pnma and Pbnm in the space group number 62. The HMS and space group number information in the ICSD database is not curated well and contains many mistakes such as non-consistent HMS and space group number relations, hence we refrained from stripping too much information. From the 179,602 non-duplicate structures, we removed structures with partial occupancies or any errors during the conversion into VASP POSCAR format. Out of the remaining 85,403 structure files, we selected the compounds composed of two to four elements and without noble gases (He-Rn) and actinides (Ac-Lr). The resultant 61,256 structure files were classified by the names of the structure types as recorded in the ICSD. The 100 most frequent structure types were selected for analysis.
The occurrence ratio of H, B, C, N, O, F, and transition metals (TM, defined as Sc-Cu, Y-Ag, La, Ce, and Hf-Au) were identified for each structure type. These ratios (0-100%) were used as elements of seven-dimensional feature vectors for hierarchical classification using the Euclidean metric and Ward's method. We focused on occurrence ratios of these elements to detect ionic compounds with popular anions (H, B, C, N, O, F) and to distinguish compounds consisting of main-group elements only. TM metals were included in 63.5% of the 61,256 structure files. O and F were the most and least popular anion with occurrence ratio 38.2% and 5.3%, respectively.

Bond angles and solid angles of a Delaunay polyhedron
Positions of all atoms that lie on a sphere where no other atoms are inside were considered as vertices of a Delaunay polyhedron in this study. Delaunay polyhedra are generally tetrahedra, but may have more than four facets in exceptional cases. We use polyhedra that are not tetrahedra in these degenerate cases; not forcing tetrahedralization avoids making a choice among equally valid options. Bond angles and the solid angle of Delaunay polyhedra of an arbitrary atom A were considered for use as descriptors of a crystal. The solid angle is important because this value reflects the directional relations between four atoms (three bonds) rather than three for a bond angle (two bonds). For example, the solid angle is small when three bonds point close to each other, thus atoms with many bonds tend to have small solid angles. Atoms with distorted coordination environments can have breadth in the distribution of solid angles compared with atoms at highly symmetric sites.
Delaunay polyhedron embraces a void, so the size of a face of the Delaunay polyhedron could be considered as a measure of the channel between this void and a neighboring void. The solid angle distribution therefore acts as a fingerprint of a crystal incorporating geometrical meaning.
Angular information describes the crystal from a viewpoint completely different from bond lengths. Angles are, in one sense, normalized by bond lengths. Bond angles of, for example, rocksalt LiF and RbI are identical even though the bond lengths of the latter are almost 2 times that of the former. Normalizing by the shortest bond length may appear as one solution to scale the geometry crystals. However, how to scale is not an easy problem in anisotropic crystals such as hexagonal wurtzite; the c/a ratio varies from 1.60 in ZnO to 1.64 in ZnS. The problem is even more complicated in more complex crystals. This study chose to focuses on angular information as descriptors of a crystal.
The positions of atoms in the crystal other than A are denoted as P i (=P 1 , P 2 , . . .). The bond angles and solid angles of a Delaunay polyhedron are determined as follows.
The qhull code [25] was used in this study. The vertices of a Delaunay polyhedron, other than A, are labeled as Facets where A is not included as a vertex are not investigated hereon. • All angles ffR km AR kn of facet F k are enumerated.
The largest angle is the bond angle of O for facet F k . • The set of vertices of Q j that share a ridge with O are denoted as S n (S 1 , , and the projection of S n on a unit sphere centered at A is labeled as S' n . • The convex hull of points A and S n is obtained.
The facets of the convex hull must be triangulated; any choice can be used if there are multiple choices of triangulation. • The sum of solid angles of apex A, corresponding to triangles where A is not a vertex, is calculated. For a triangle with vertices S 1 , S 2 , and S 3 , the solid angle is [26,27,28] 2arccos 1 þ cos ffS 1 AS 2 ð Þ þ cos ffS 2 AS 3 ð Þ þ cos ffS 3 AS 1 ð Þ ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffiffi The total of such solid angles is the solid angle of this Delaunay polyhedron for apex A. Figure 1 shows an example of solid angle determination in a bcc crystal. The tetrahedron with atoms shown in red is discussed here. The above algorithm is almost trivial in tetrahedra; S 1 , S 2 , and S 3 are simply three atoms other than the apex, and cosines can easily be obtained using the Cartesian coordinates of the atoms.
Things become much more complicated with the Delaunay polyhedron is not a tetrahedron. There are multiple, equally reasonable ways to split such a Delaunay non-tetrahedron into tetrahedra, and we want to avoid dealing with this degeneracy issue. The aforementioned algorithm is demonstrated when the Delaunay polyhedron is a cube (Figure 2), which appears in the simple cubic and related structures such as rocksalt, but is valid for all possible polyhedron shapes. The same atom (vertex) is denoted with various symbols depending on the role of the atom in the analysis. There are eight vertices, A and Q 1 to Q 7 , and six facets, F 1 to F 6 , in Figure 2(a). Facet F 1 is shown in Figure 2 There are three angles of the form ffR 1m AR 1n (orange arcs), and the largest angle, ffR 11 AR 13 ¼ 90 � , is the bond angle of A for this facet. There are three vertices, S 1 to S 3 , that are connected by a ridge of the Delaunay polyhedron to A (Figure 2(c)). These vertices are projected to the unit sphere (projected vertices are S' 1 to S' 3 ) centered at A, and the area of the spherical triangle with vertices S' 1 to S' 3 (spherical excess) is π/2. Therefore, the solid angle of A is π/2 sr, or 1/8 of the solid angle of a sphere (4π sr). Table S1 shows the 100 most frequent structure types together with occurrence ratios of H, B, C, N, O, F, and TMs. The most frequent structure type was Perovskite#CaTiO3 with 747 entries, while the least  frequent was MnP#FeAs and PbCl2 with 60 entries each. Figs. S1-S2 show the hierarchical classification result. Structure types were classified into three broad (A-C) groups and nine refined (A1-C9) subgroups, and data in Table S1 is sorted by group. Table 1 shows a summary of these groups (the group of each structure type is also shown in Table S1). Group A, which has a high occurrence ratio of O and TM, branches off first. The remaining structures are then separated into groups B and C, which are distinguished mostly by the TM occurrence ratio. Groups A and B can each be split into two subgroups according to the O occurrence ratio. Boride-and fluoridefavoring cluster types (subgroups C5 and C6) split off first in group C, followed by structures with very high TM and low anion-forming element (H, B, C, N, O, and F) occurrences (subgroup C7). The remaining structure types are separated into two clusters (subgroups C8 and C9) based on the occurrence ratio of C, N, and O. The hierarchical classification of structure types well reflects the nature of bonding, for instance ionic and metallic, as anticipated from the type of constituent elements.

Relation between structure types and favored bond angles
The relation between structure type and favored bond angle provides a measure of reasonableness on the estimated crystal structure for given constituent elements. For example, one can speculate whether a crystal structure is likely or not likely to be accomplished. The distribution of bond angles can be regarded as a descriptor of the crystal structure. The bond angles in a crystal can be summarized as a histogram with a certain bin size, which is available as a feature vector. The element x i of the feature vector of the bond angle distribution, x=(x 0 , x 1 , x 2 , ., x i. , .), is the ratio of bond angles between [ib, (i + 1)b), where the bin size b was chosen to be 4.5 degrees in this study, and the histogram for a structure type was obtained using all bond angles in all crystals within the structure type. The total number of bond angle records for an entire structure type varied from 9552 in the CdI 2 -type structure (hP3) (61 crystals) to 1315200 in the Th 6 Mn 23 -type structure (167 crystals). The sum of all elements of the feature vector x was normalized to unity. Figures S3-S4 show the hierarchical classification of bond angles, and the feature vector x for each structure type is shown in Figs. S5-S9. The 100 structure types can be categorized into three groups, I to III, and class II is further refined into two subgroups, II2 and II3. Group I (subgroup I1) is the first branch and contains 32 structure types. Subgroups II2 and II3 have 31 and 32 structure types, respectively. The average in each subgroup of the standard deviation, over all bins, of feature vector elements (fraction of bonds in each bin) for constituent structure types is .086, .056, and .048 in subgroups I1, II2, and II3, respectively. The standard deviation is zero in the hypothetical case where the fraction of bond angles in all bins are equal while the standard deviation is maximized when all bond angles are entirely in one bin. This means that the bond angle distribution is relatively diverse in subgroup II3. Group III (subgroup III4) has six structure types, which all have no degrees of freedom in internal coordinates due to symmetry. Possible bond angles there are 45°, 70.5°, and 90°.
The matrix of number of structure types shown with structure type groups A to C and bond angle subgroups I1 to III4 is given in Table 2. The ratio of structure type groups in each bond angle subgroup are visualized in Figure 3. We find a striking difference Table 1. Summary and description of groups identified from hierarchical classification of structure types. The examples shown are the two most frequently occurring structure type for each subgroup (three in B3 because of a tie in the number of crystals and one in C6 because there is only one structure type). between subgroup II3 and the others: 17 out of 18 group A structure types are found in subgroup II3 containing 31 out of 100 structure types. This means that structure types favored by TM oxides can accommodate a wider range of bond angles compared to other chemistries.

Relation between structure types and favored Delaunay solid angles
The bond angle reflects the relation between the atom of interest O and two nearby atoms. In contrast, the solid angle of the Delaunay polyhedron around the atom O is a measure of the spatial distribution of atoms. As with bond angles, the solid angle distributions were obtained and histograms were derived by structure type. The element y i of the feature vector of the bond angle distribution, y=(y 0 , y 1 , y 2 , .., y i. , ..), is the ratio of bond angles between [is, (i + 1)s), where a bin size s of .03π sr was used in this study. Figures S10-S11 show the hierarchical classifications for solid angles, and the feature vector y for each structure type is shown in Figs. S12-S16. The 100 structure types are categorized into three groups, i to iii, and group iii is further refined into five subgroups, iii3 to iii7. Group i (subgroup i1) is the first branch and contains six structure types with no degrees of freedom in internal coordinates. These are characterized by a large fraction of π/6 sr solid angles. Group ii (ii2) is the second branch with 30 structures. Group iii splits into subgroups iii3 to iii7. The average solid angle among the 100 structure types is the lowest in the four structure types in subgroup iii3. Subgroup iii7 has no degrees of freedom in internal coordinates, and the variety of solid angles are also limited; their fractions of the π/6 sr solid angle is lower than in group i.
The average in each subgroup of the standard deviation, over all solid angle bins, of feature vector elements (fraction of bonds in each of the considered 37 bins) of constituent structure types in subgroups ii2, iii3, iii4, iii5, and iii6 is 0.073, 0.063, 0.065, 0.060, and 0.066, respectively. Therefore, subgroup ii2 (group ii) has a relatively concentrated distribution of solid angles. The distribution of the difference of solid angle fractions, [0.09π, 0.12π) -[0.12π, 0.15π) sr, among subgroups iii4, iii5, iii6 can be expressed using (mean, standard deviation) = (−0.22, 0.13), (−0.03, 0.10), and (0.14, 0.11), respectively. In other words, the fraction of solid angles in the range [0.09π, 0.12π) is smaller, almost the same, and larger than [0.12π, 0.15π) sr in subgroups iii4, iii5, iii6, respectively. Group ii can also be distinguished from subgroups iii4, iii5, and iii6 by comparing the fraction of bonds in the range [0.06π, 0.12π) and [0.15π, 0.21π) sr. The latter is larger than the former by 0.12 or more in all structure types in group ii. On the other hand, this relation is satisfied by only three of the 57 structure types in subgroups iii4, iii5, and iii6 (the maximum among the three is 0.15).  I1  II2  II3  III4  Sum  A  1  17  18  B  5  6  5  2  18  C  26  24  10  4  64  Sum  31  31  32  6  A matrix of number of structure types, shown with structure type groups A to C and solid angle subgroups i1 to iii7 as the axes, is given in Table 3, and the ratio of structure type classes in each solid angle subgroup is visualized in Figure 4. Again, group A structure types are concentrated in a solid angle subgroup; 15 out of 18 group A structure types are found in subgroup iii5 with 30 out of 100 structure types. There is a large group, ii2, with 30 structure types that do not contain any group A structure type. The relatively concentrated solid angle distributions as in subgroup ii2 is also not favored in group A TM oxides. These results suggest that, as in bond angles, TM oxides strongly prefer some subgroups of solid angle distributions over others. Relatively concentrated solid angle distributions as in subgroups i1, ii2, and iii7 are not favored in group A TM oxides. The average solid angle in group A structure types is 0.18π sr (~4π/22), which is much larger than what is allowed in subgroup iii3 (average 0.15π sr). Table 4 shows the matrix of number of structure types shown with bond angle subgroups I1 to III4 and solid angle subgroups i1 to iii7 as axes. The matrix is far from sparse; when subgroups with less than 10 structure types are removed (III4, i1, iii3, and iii7), 10 out of 12 matrix elements are occupied. This means that the bond angle and solid angle subgroups are quite independent. Sizeable matrix elements in Table 4 are I1-ii2 and II3-iii5 with 23 structure types each. The former contains two group B and 21 group C structure types, indicating a high concentration of TM-rich non-oxides. On the other hand, 15 out of 18 groups A structure types are concentrated in the latter 23 out of 100 structure types.

Additional remarks
One may consider performing crystal structure prediction solely based on bond angles and solid angles and finding new crystals. Unfortunately, this is a very difficult task. Construction of relative atom positions from angular information only is very subtle compared to the inverse problem of obtaining angular information from atomic positions. In one example, angular information does not change when a simple cubic structure is gradually elongated in the <100> direction. Information on the scale (lattice parameter size) is lacking because of normalization. The labels of atoms are removed, thus how to arrange elements at atom positions, even if identified, is not prescribed. Despite these drawbacks, angular information provides an additional viewpoint to Table 3. Matrix of number of structure types based on the structure type and solid angle subgroup. Solid angle   i1  ii2  iii3  iii4  iii5  iii6  iii7  Sum  A  2  15  1  18  B  1  4  1  3  6  1  2  18  C  5  26  3  10  9  10  1  64  Sum  6  30  4  15  30  12  3 Solid angle   i1  ii2  iii3  iii4  iii5  iii6  iii7  Sum  I1  1  23  3  2  2  31  II2  6  8  7  10  31  II3  1  1  5  23  2  32  III4  5  1  6  Sum  6  30  4  15  30  12 3 100 discuss crystal structures that complement existing studies. One application is checking whether a predicted crystal structure is reasonable given its stoichiometry.

Summary
This study categorized occurrence ratios of H, B, C, N, O, F, and TM in frequently appearing structure types in binary, ternary, and quaternary compounds in the ICSD database. The structure types can be broadly categorized into those favoring TM oxides, low TM occurrence, and high TM occurrence. The bond angle and solid angle distributions of Delaunay polyhedra with atoms as vertices were calculated as descriptors of the local environment for each structure type, and hierarchical categorization was conducted to categorize the distributions. TM oxides preferred a limited number of bond angle and solid angle subgroups. Use of bond angle and solid angle (sub)groups together found a particular sizeable group favored by TM oxide structure types and another by high TM occurrence structure types. Our results indicate that the anion, or lack of, plays a key role in determining the crystal structure, and structures preferred by TM oxides can accommodate a broad range of bond angles and solid angles.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Funding
This study was funded by grants (No. JPMJCR17J3 and JPMJCR19J1) from CREST of the Japan Science and Technology Agency (JST). The VESTA code [28] was used to draw Figs. 1 and 2.