Can G4-like composite Ab Initio methods accurately predict vibrational harmonic frequencies?

Minimally empirical G4-like composite wavefunction theories [E. Semidalas and J. M. L. Martin, J. Chem. Theory Comput. 16, 4238–4255 and 7507–7524 (2020)] trained against the large and chemically diverse GMTKN55 benchmark suite have demonstrated both accuracy and cost-effectiveness in predicting thermochemistry, barrier heights, and noncovalent interaction energies. Here, we assess the spectroscopic accuracy of top-performing methods: G4-n, cc-G4-n, and G4-n-F12, and validate them against explicitly correlated coupled-cluster CCSD(T*)(F12*) harmonic vibrational frequencies and experimental data from the HFREQ2014 dataset, of small first- and second-row polyatomics. G4-T is three times more accurate than plain CCSD(T)/def2-TZVP, while G4-T $ _{\rm ano} $ ano is two times superior to CCSD(T)/ano-pVTZ. Combining CCSD(T)/ano-pVTZ with MP2-F12 in a parameter-free composite scheme results to a root-mean-square deviation of 5 cm $ ^{-1} $ −1 relative to experiment, comparable to CCSD(T) at the complete basis set limit. Application to the harmonic frequencies of benzene reveals a significant advantage of composites with ANO basis sets – MP2/ano-pVmZ and [CCSD(T)-MP2]/ano-pVTZ (m = Q or 5) – over similar protocols based on CCSD(T)/def2-TZVP. Overall, G4-type composite energy schemes, particularly when combined with ANO basis sets in CCSD(T), are accurate and comparatively inexpensive tools for computational vibrational spectroscopy. GRAPHICAL ABSTRACT


Introduction
In the realm of computational spectroscopy, [1] accurate prediction of molecular vibrations has long been a valuable tool in chemistry, biochemistry, and materials science.[2][3][4][5][6][7] To achieve spectroscopic accuracy, defined as an error of less than 1 cm −1 from gas-phase vibrations, [8,9] it may be necessary to venture beyond the 'gold-standard' CCSD(T) method, [10] i.e. coupled-cluster with single, double, and perturbative triple excitations, and employ higher-order correlation approaches that include quadruple excitations, such as in the CCSDT(Q) method.[11] Additionally, the slower basis set convergence [12][13][14] of the correlation energy can quickly lead to computational constraints, especially for the electronic structure of larger molecules.
In 2011, Radom and co-workers introduced the G4(MP2)-6X protocol, [33] an improved G4(MP2) variant, featuring six empirical parameters for correlation energies and another six for the high-level correction.Building on this, Chan et al. [34] shifted from Pople-to Karlsruhe-type basis sets in their G4(MP2)-XK approach.Inspired by their work, we presented an hierarchy of G4-type cWFTs [35,36], validated against the energetics of the chemically large GMTKN55 benchmark suite.[37] (general main-group thermochemistry, kinetics, and noncovalent interactions, 55 subsets) Another avenue for improving cost-effectiveness and accuracy in cWFTs is using explicitly correlated theory, [38][39][40][41] where r 12 terms that depend on the interelectronic distance, e.g. of the form [1-exp(γ/r 12 )]/γ, are added in the wave function.This inclusion accelerates basis set convergence, [42,43] with R12/F12 methods typically requiring 2-3 additional basis set cardinal numbers or "zetas" compared to conventional calculations.[38,40,41] Presently, numerous explicitly correlated composite thermochemical protocols have been reported, including the W4-F12, [44] ccCA-F12, [45] G4-m-F12, [36] and SVECV-f12 theories.[46] For harmonic vibrational frequencies it is generally accepted that valence-only CCSD(T) suffices, particularly in systems with low static correlation that are dominated by a single reference determinant.[8,47] This arises from a fortuitous error compensation between the approximate treatment of the triples term, the missing core-valence correlation, and the neglect of higher order excitations.
There has been a fair amount of work on post-CCSD(T) cWFT methods in the context of vibrational spectroscopy.We note, inter alia, the 2005 work of Heckert et al. [59,60] and Puzzarini et al. [61] on accurate geometries viz.rotational constants.Ruden et al. [62] considered quadruples and quintuples terms in CCSD(T)-based composite schemes for harmonic frequencies of HF, N 2 , F 2 , and CO, while Karton and Martin [9] applied pointwise W4 theory (and truncations thereof, as well as the enhanced W4.3 theory) to spectroscopic constants and electric properties of 28 first-and second-row diatomics, as well as several polyatomics.[9] The spectroscopic constants of formaldehyde

Computational details
All calculations were performed on the Chemfarm HPC cluster of the Faculty of Chemistry at the Weizmann Institute, mostly using the MOLPRO 2022.3 [77] electronic structure program system.Built on top of the ALASKA integral derivative package, [78] canonical MP2 [79] analytical derivatives (Ref.[80] and references therein; see Ref. [81] for the specific MOLPRO implementation) and canonical CCSD(T) [10,82] analytical first derivatives (Ref.[50] and references therein) were evaluated with nondegenerate symmetry enabled, while force constant matrices (Hessians) were evaluated semi-numerically using central differences of gradients.For verification purposes, MP2 Hessians in the same basis set were also calculated analytically [83] using Gaussian 16; [84] we found harmonic frequencies from the analytical and semi-numerical Hessians to differ by only on the order of 0.03 cm −1 , which is negligible in the context of this work.
In our study, we employed various basis set families, including the Weigend-Ahlrichs def2 family [92]: def2-SVP, def2-nZVP, and def2-nZVPP, (n = T and Q), along with their augmented alternatives with diffuse functions def2-SVPD and def2-nZVPPD (n = T and Q).[93] The combination of def2-nZVPP on hydrogen atoms and def2-nZVPPD on main group elements is denoted as def2-nZVPPD'.Among the atomic natural orbital (ANOs) basis sets pioneered by Almlöf and Taylor, [94] we chose the ano-pVnZ (n = D,T,Q,5) of Neese and Valeev,[95] as well as their aug-ano-pVnZ (diffuse function augmented) and saug-ano-pVnZ (minimally augmented) variants from the same reference.Table 1 provides a list of abbreviations for methods and basis sets used in this study.
Among the correlation consistent basis set family, we consider the cc-pVnZ and aug-cc-pVnZ basis sets (n=D,T,Q,5) [96][97][98] on hydrogen and the first row, and the (aug-)-cc-pV(n+d)Z basis sets [99] on second-row elements, which include an additional tight d function as was previously found [100,101] to be important when these elements are in high oxidation states.(In this work, the largest impact is seen for SO 2 .)Additionally, for calculations including inner-shell correlation, we employed the core-valence weighted aug-cc-pwCVnZ (n = T and Q) basis sets.[102] The shorthand haVnZ+d refers in this paper to the combination of cc-pVnZ on hydrogen with aug-cc-pVnZ on first-row atoms and aug-cc-pV(n+d)Z on second-row atoms.
Aside from the orbital basis set (OBS) employed in a standard explicitly correlated calculation with densityfitting, there are three additional auxiliary basis sets (ABS): the 'JKFit' basis set for the Coulomb and exchange integrals, the 'MP2Fit' basis set for density fitting in MP2, and the 'CABS' also known as complementary auxiliary basis set.[103,104] We utilized the cc-pVnZ-F12 [105] (n = T and Q) basis sets as OBS, along with the default cc-pVnZ-F12/JKFit and cc-pVnZ-F12/MP2Fit in MOLPRO as JKFit and MP2Fit ABSs, respectively.For CABS, we used Yousaf and Peterson's cc-pVnZ-F12/OptRI.[106] Slater-type geminal terms of the F12 form [1-exp (γ r 12 )]/γ were used with a β geminal exponent of 1.0 for both triple-and quadruple-ζ OBS, as recommended in Table V of Ref. [90] In the text below, "VnZ-F12" signifies the cc-pVnZ-F12 basis sets.
To validate the accuracy of composite schemes, we used the HFREQ2014 dataset [75] of harmonic frequencies for small molecules (Table 2).Error statistics were estimated relative to CCSD(T*)(F12*)/VQZ-F12 calculations (reference) and experimental values from ref. [75] and references therein.On a related note, Mehta et al. [110] considered the same CCSD(T*)(F12*)/VQZ-F12 reference for HFREQ2014 in their study on the performance of double-hybrid density functional theory for molecular vibrations; they also carried out CCSD(T) calculations there for comparison, but excluded some of the HFREQ2014 species such as F 2 , HNO, and CF 2 .Also, Head-Gordon and coworkers [111] recently introduced analytical second derivatives of VV10 dispersion corrected [112] containing density functionals and evaluated their predictive accuracy for harmonic frequencies across various molecular systems including those in the HFREQ2014 dataset.They concluded that while the VV10-enhanced DFT functionals offered no advantage for small-molecule vibrational spectra, but a significant improvement was seen in vibrational spectra of noncovalent complexes.
Geometries were optimized using the total electronic energy as the target function for each cWFT method, employing the numerical gradient.That was accomplished through the optg procedure [113] in MOLPRO.The optimizations were completed once the maximum gradient component was less than 10 −5 hartree/bohr, the optimization step was less than 10 −5 bohr, and the change in total energy from the previous iteration was less than 10 −11 hartree.In numerical gradients and Hessians, the default stepsize of 0.01 a.u. was used, unless otherwise noted (see Table 3 below for cases where a 0.005 a.u.value was used).The cutoffs of two-electron integrals were set to 10 −20 for screening and 10 −18 for the prefactor test.The total energy in subsequent calculations of force constant calculations was converged 4 G4-n G4-like theory with CCSD(T) contributions from def2-SVPD or def2-TZVP basis sets (n=D,T) [35] cc-G4-n G4-like theory with core-valence correlation at the MP2 Level (n=D,T) [36] G4-n-F12 explicitly correlated RI-MP2-F12-based G4-like theory [36] def2-nZVPP Weigend-Ahlrichs def2 basis set family (n=T,Q) [92] cc-pVnZ correlation-consistent basis set family for valence correlation (n=D,T,Q) [96][97][98] cc-pVnZ-F12 correlation-consistent basis set family for use in explicitly correlated calculations (n=D,T,Q) [105] ano-pVnZ atomic natural orbital basis sets (n=D,T,Q,5) [95] GMTKN55 general main-group thermochemistry, kinetics, and noncovalent interactions, 55 subsets [37] WTMAD2 weighted mean absolute deviation (type 2) [37] We provide an implementation of hfreq for automated geometry optimization and calculation of harmonic vibrational frequencies with composite energy schemes at the following Github link 'https://github.com/msemidalas/hfreq'.
hfreq is written in Python and the Git repository contains several sample files to reproduce the results of this work.In A reviewer highlighted a very recent paper by Jensen, [115] in which three methods for extrapolating vibrational frequencies are discussed.The first, 'Opt-xpol,' involves geometry optimization by minimizing the basis setextrapolated energy, followed by frequency calculation from the extrapolated Hessian at the optimized geometry: this parallels our approach in the present work and in the hfreq code.The second, 'v-xpol,' directly extrapolates vibrational frequencies from optimized geometries using two different basis sets, an approach explored earlier by Varandas [116] and Broda and colleagues.[117][118][119] The 'H-xpol' approach directly extrapolates optimized Hessians, regardless of reference geometries.Jensen's findings show that all three approaches yield similar results for small molecules using double-triple ζ extrapolation in cc-pVnZ basis sets at wB97X-D [120] and MP2 levels.However, for H-bonded complexes, 'H-xpol' yields unsatisfactory results with extrapolation from pcseg-0 and pcseg-1 basis sets, [121] as well as from pcseg-1 and pcseg-2, likely due to poor reference geometries.
3 Results and discussion

Performance of CCSD(T) for harmonic frequencies
Assessing CCSD(T) for harmonic frequencies is the first step to estimate the accuracy of composite energy schemes.
Kesharwani and Martin reported that valence-only CCSD(T) at the complete basis set limit is about 5 cm −1 as accurate as the experimental harmonic frequencies for the HFREQ2014 dataset.[75] (For the avoidance of doubt, we should stress that these experimental data are truly harmonic, obtained from fitting series expansion in the vibrational quantum numbers to many vibrational band origins.)Explicitly-correlated CCSD(T*)(F12*) with VQZ-F12 achieves a root-mean-square deviation (RMSD) of 4.7 cm −1 compared to experiment; T* denotes pointwise Marchetti-Werner scaling [108] of the triples.In a recent DFT study of harmonic frequencies, [110]  CBS refers to the CCSD(T) complete basis set limit a Using a step size of 0.005 a.u.during numerical differentiation leads to minor increases in RMSDs by 0.01-0.03cm −1 .b 33.84 and 32.15 cm −1 , respectively, if the acetylene bending frequencies are excluded.
with fairly large basis sets, such as VQZ-F12, between either the point-wise scaling T* or scaling the triples term by a constant factor (T s ) [122].
Table 3 presents the error statistics of CCSD(T) with different classes of basis sets compared to reference calculated harmonic frequencies and experimental values for HFREQ2014; the error distribution is also depicted as a 'box and whiskers plot' in Figure 2.
First of all, concerning the reference, we could have made two basically equivalent choices, as CCSD(T*)(F12*)/VQZ-F12 and pointwise CCSD(T)/haV{Q,5}Z+d extrapolation differ from each other by just 1.0 cm −1 RMS.We have selected the former throughout.(For the avoidance of doubt, doubly and triply degenerate frequencies are assigned weights of 2 and 3, respectively, in the statistics.) Second, the addition of tight d functions to the second-row aug-cc-pVnZ basis sets has a very significant effect in SO 2 for smaller n.At first sight, no similar phenomenon is seen for the ANO basis sets; however, the primitive d functions that make up the d symmetry ANOs have exponents 5.0755, 2.1833, 0.9392, 0.404, and 0.1738; hence, the high-exponent space is already adequately covered in the primitives.
Third, in both ANO and correlation consistent families, augmented basis sets have better statistics than unaugmented ones (with the exception of cc-pVDZ+d).
Fourth, among the ANO family, the fully augmented aug-ano-pVnZ basis sets have slightly better error statistics than the more economical 'minimally augmented' saug-ano-pVnZ basis sets.
Quadruple-ζ quality basis sets from all families outperform n = D and T members.The correlation-consistent haVQZ+d and the ANO set saug-ano-pVQZ yield similar RMSDs of 4.9 and 5.0 cm −1 compared to experiment.For triple-ζ and especially double-ζ, the ANO basis sets are markedly superior.Somewhat surprisingly, def2-TZVPP and def2-QZVP appear to be superior to their cc-pVnZ counterparts, and similar error reductions occur as more zetas are added, with n = Q having a small RMSD exp (5.8 cm −1 ).(We note that def2-QZVP and def2-QZVPP are equivalent for the molecules considered here, and that the step-size choice for numerical differentiation, 0.01 or 0.005 a.u., has no significant effect on the calculated frequencies.)CCSD(T) with def2-TZVP is in nearly thrice worse agreement with experiment than the complete basis set limit.
Given that def2-TZVP is used in the high-level correction [CCSD(T)-MP2] because of its lower cost in the G4-type cWFTs, any improvements of the error statistics over 'pure' CCSD(T)/def2-TZVP would make these cWFTs useful for spectroscopy.
ANO basis sets outperform similarly-sized correlation consistent basis sets with notable improvements in RMSD CBS , which are 7.2, 1.7, and 1 cm −1 over VnZ with n = D, T, and Q, respectively.The lowest errors are obtained for ano-pV5Z in CCSD(T) with an RMSD of 2.8 cm −1 relative to reference.Another possibility is that the ANO basis sets -which are better equipped than small-medium sized correlation consistent basis sets for predicting accurately harmonic frequencies [76,123,124] -may offer advantages in composite wave function schemes as discussed below.
The hypersensitivity to the basis set of the acetylene bending frequencies was first noted by Lee and coworkers [125] and analyzed in detail in Refs.[126,127] as an intramolecular BSSE (basis set superposition error) problem.Another frequency that exhibits basis set hypersensitivity is the umbrella mode of ammonia -which is a conspicuous outlier even for ano-pV5Z, less so for saug-ano-pV5Z.

Composite wave function theory approaches
Composite wave function theory has paved the way for cost-effective computations without significantly compromising accuracy.Various classes of additivity schemes have been studied previously, successfully predicting geometrical parameters, [59,60] rotational constants, [61] and vibrational frequencies.[58,62,63,70,71,128] The cost-effectiveness in these methods is achieved by combining various levels of electron correlation treatments using additive approximations, basis set extrapolations, and, when applicable, empirical corrections.By way of illustration, consider the following expression: Now, if we express MP2 as HF + E2 and CCSD(T) as HF + E2 + HLC, we can simplify it further, as E = HF/LARGE + E2/LARGE + HF/SMALL + E2/SMALL + HLC/SMALL -HF/SMALL -MP2/SMALL where SMALL and LARGE correspond to two different basis set sizes.Simplifying this equation, we get E = HF/LARGE + E2/LARGE + HLC/SMALL.Therefore, the basis set for HF is effectively the same as LARGE for the MP2 correlation contribution, ensuring there is no mismatch as the other HF contributions cancel out.
Table 4 details error statistics for harmonic frequencies in the HFREQ2014 species, comparing them to calculated and experimental data.Detailed equations of our G4-type composite schemes have been previously provided in Refs.
[ 35,36] and the top performing G4-n, cc-G4-n, and G4-n-F12 methods in prediction of reaction energetics are now validated for harmonic frequencies.Some of these approaches are parameter-free while others achieve optimal results with a maximum of two fitted parameters.In addition, we consider the performance of W1 val and W2 val theories, i.e., Weizmann-n theories [25][26][27] including only valence correlation.
It can be rightly argued that, beyond small molecules where spectral inversion is comparatively easy, fundamental frequencies are more relevant for practical applications than harmonic frequencies.However, as shown in Table 4 of Ref. [129], even for CCSD(T)/CBS simple scaling of harmonic frequencies carries an intrinsic error of about 25 cm−1, comparable to the uncertainty in hybrid DFT harmonic frequencies.The use of dual or multiple scaling factors for different frequency ranges at semi-arbitrary cutoff points (e.g., [130][131][132][133]) is a half-measure at best; second-order rotation-vibration perturbation theory (VPT2) [107] will require a semidiagonal quartic force field.Schneider and Thiel [134] as far back as 1989 (in the context of semiempirical MO theory) pointed out that all the required force constants can be obtained by finite differences (in normal coordinates) of analytical second derivatives: this may be a viable approach for the present cWFT methods, or cWFT harmonic frequencies may be combined with DFT anharmonic force fields as demonstrated by Boese and Martin for the azabenzenes.

G4-type cWFTs based on CCSD(T)/def2-TZVP
The most accurate result, using a def2 basis set for the CCSD(T) part, materializes for G4-T with RMSD CCSD(T)/CBS of 4.7 cm −1 and RMSD exp of 5.4 cm −1 .In other words, not materially different from CCSD(T) at the valence CBS limit.Fundamentally, the same result is obtained if the force constants are obtained fully numerically at CCSD(T) and analytically at MP2.The lower-cost cost approach, G4-D, based on def2-SVP in CCSD(T), is three times worse in accuracy, while in G4-D-v2 with def2-SVPD, the error statistics are cut in half.The largest errors occur in the π g and π u degenerate bending modes of acetylene, which G4-D and G4-D-v2 underestimate by 26.9 and 39.9 cm −1 , while the accurate G4-T falls short by only 0.7 cm −1 .
An important technical note for wave function calculations using basis sets augmented with diffuse functions, particularly for linear molecules, is that employing such basis sets, e.g.def2-nZVPPD (n=T, Q) for acetylene, may result in an overlap matrix S with very small eigenvalues.Gaussian implements a form of SVD (singular value decomposition) in which the eigenvectors of S with eigenvalues below a cutoff (default value: 10 −6 ) are discarded.This leads to truncation of the virtual orbital space, which may not be consistent across the surface -or even along a single normal mode displacement -and hence may cause erratic harmonic frequencies.In the present work, this occurred for the bending frequencies of acetylenes.To address this, we disabled the 'SVD screening' in Gaussian using IOp(3/32)=2; MOLPRO has no such screening in the first place, but for acetylene, def2-nZVPPD led to S eigenvalues below the default THROVL=10 −8 , and lowering THROVL brought on numerical issues that required severely tightening the integral evaluation cutoffs.No such problems were seen if the 'D' functions were retained on the heavy atoms but not on H, which we denote def2-nZVPPD ′ and applied for acetylene.
Substituting E 2/def2−{T,Q}ZVPD ′ in G4 type approaches, where diffuse functions are omitted on hydrogen atoms, has no significant effect on error statistics when compared to reference or experiment.G4-T' stands out as the most accurate with an RMSD CBS of 4.7 4 cm −1 , a mere 0.11 cm −1 above regular G4-T.Moreover, opting for def2-T,QZVPD' in G4-type approaches effectively addresses concerns related to small eigenvalues in the overlap matrix, particularly for linear molecules, without compromising accuracy.
In our initial studies on G4-like cWFTs [35,36] we did not explore a {D,T} extrapolation in MP2, like E2/def2-{SVSP,TZVPPD} +[CCSD(T)-MP2]/def2-SVSP, where def2-SVSP means no p functions on hydrogen atoms, assuming it would be less accurate than other cWFTs.We trained such a G4-D-v0 composite on the GMTKN55 dataset and found a high WTMAD2 (weighted mean absolute deviation) value of 4.77 kcal/mol compared to CCSD(T)/CBS data, extracted from the ACCDB database, [135] or higher level from our earlier work.The RMSD values for harmonic frequencies are unacceptably high, reaching 26.07 and 27.29 cm −1 compared to CCSD(T)/CBS and experimental values: this is on par with (much cheaper) hybrid DFT functionals such as B3LYP and TPSS0.[129] However, standard CCSD(T)/def2-SVSP is far less accurate than G4-D-v0, with corresponding RMSDs of 61.37 and 60.46 cm −1 .Clearly, there is no advantage in a {D,T} extrapolation and if 10 cm −1 errors are acceptable then empirical spin-scaled double-hybrid functionals, such as DSD-PBEP86-D3(BJ) [136] and revDSD-PBEP86-D4 [137], represent a much more cost-effective alternate.
What about the effects of core-core and core-valence correlation?To answer that, we consider the correlation consistent augmented aug-cc-pwCVnZ basis sets that provide the necessary radial and angular flexibility in the corevalence region.In 2018, Sylvetsky and Martin [138] showed that a awCV{T,Q}Z basis set extrapolation at CCSD(T) proved sufficient and captured the most significant part of electron correlation.Here, we follow a two-step approach to assess those effects.Firstly, we check the effects of increased basis set radial flexibility for valence correlation by replacing def2-{T,Q}ZVPPD with awCV{T,Q}Z at the MP2 level, while only correlating the valence electrons.In doing so, no material gains in accuracy are seen, but rather the reverse.The RMSD CCSD(T)/CBS increases by 2 cm −1 for cc-G4(FC)-D, 0.9 cm −1 for cc-G4(FC)-D-v2, and 0.6 cm −1 for cc-G4-T, each compared to the corresponding G4-n.
Secondly, we correlate all electrons in MP2 and the results showed the CV correlation improves the RMSD by 2.0 cm −1 in cc-G4-D relative to cc-G4(FC)-D, with the former achieving identical accuracy to G4-D.Further improvement of 2.2 cm −1 occurs with def2-SVPD in CCSD(T).The lowest RMSD of 4.25 cm −1 is found for cc-G4-T, this result represents a 1.0 cm −1 amelioration over valence-only cc-G4(FC)-T, and even surpassing G4-T by 0.3 cm −1 .
For higher accuracy regimes, we refer the reader to our recent study [58] on ground-state spectroscopic constants of diatomic molecules from post-CCSD(T) up to CCSDTQ56.We showed there that 2 cm −1 accuracy is achievable on a semi-routine basis (see Table 5 in ref. [58]), but that this requires both post-CCSD(T) valence correlation correction at least at the CCSDT(Q) Λ level and core-valence correlation corrections at the CCSD(T) level.(Including each on its own will actually make agreement worse, as valence CCSD(T) benefits from a felicitous error compensation.)We have repeated the Dunham analyses [139] from Ref. [58] without the scalar relativistic correction.The largest individual difference in ω e is seen for HCl (-4.3 cm −1 ) followed by -2.8 cm −1 for HF, but most effects are on the order of 1 cm −1 or less.Thus, the RMSD on ω e increased only mildly, from 2.10 to 2.55 cm −1 , while the effect on other spectroscopic constants was negligible: obviously, compared to 5-10 cm −1 RMSDs for more approximate methods, such an increase is entirely negligible.Needless to say, this will no longer be the case for heavy p-block compounds.
We attempted to add diagonal Born-Oppenheimer corrections at the CCSD/aug-cc-pVTZ level using the implementation [140] in CFOUR [141].While corrections may exceed 1 cm −1 for H 2 , BH, and the like, for heavier diatomics they are negligible compared to other remaining error sources.

ANO basis sets for G4-type cWFTs
ano-pVnZ basis sets are now considered in the CCSD(T) part of cWFTs for the harmonic frequencies in HFREQ2014.
The energy expressions of these cWFTs are simply derived from the sum of the total MP2 energy with a larger basis set and the higher level [CCSD(T)-MP2]/ano-pVTZ correction, without introducing empirical parameters.Using E 2/ano−pV QZ in G4-T ano -v1, we find an RMSD of 6.6 cm −1 w.r.t both reference and experiment (see Table 4 and Fig. 3).Closer agreement to CCSD(T) at the complete basis set limit is achieved with ano-pV5Z in MP2, leading to the G4-T ano -v2 variant, which shows better RMSD exp than G4-T' and G4-T by by approximately 0.3 cm −1 .
In our earlier work, [36] we found that MP2-F12-based methods, such as the parameter-free cc-G4-F12-T, yielded the lowest WTMAD2 values (∼1.0 kcal/mol) for the energetics of the GMTKN55 benchmark suite.[37] That prompts the question whether the predicted harmonic frequencies can become more accurate by substituting explicitly correlated MP2-F12 for conventional MP2 in composite energy schemes.Among the tested MP2-F12-based variants, G4-T ano -F12-v2 is the most accurate with an RMSD of 3.7 cm −1 relative to reference calculated frequencies, compared to 7 cm −1 for G4-D ano -F12-v2.Reducing the basis set size to n = T from n = Q in MP2-F12/VnZ-F12 worsens RMSD CCSD(T)/CBS by 1.2 cm −1 for G4-T ano -F12 and by 0.9 5 cm −1 for G4-D ano -F12.Clearly, combining MP2-F12 correlation with CCSD(T)/ano-pVnZ presents an attractive option for accurate vibrational frequencies in parameter-free cWFTs.The calculated frequencies can be inspected in the Supporting Information.
Finally, we note that the Weizmann-n methods, W1 and W2, lead to the lowest RMSDs relative to the calculated reference harmonic frequencies, at 1.96 and 1.15 cm −1 , respectively.There is no systematic improvement in RMSD exp values over the most accurate ANO-based method, indicating that the convergence towards CCSD(T)/CBS has been achieved.

Harmonic frequencies of benzene: Successes and limitations of cWFTs
In 1997, Martin, Taylor, and Lee [76] computed the CCSD(T) geometry and harmonic force field of C 6 H 6 , and noted that the two out-of-plane ring modes ω 4 and ω 5 exhibit a more pronounced form of the same hypersensitivity as seen for the acetylene bending frequencies [125] and traced to intramolecular BSSE in Ref. [126].(Moran et al. [127] later extended this discussion to benzene.)Limitations of available computers at the time precluded going to larger basis sets such as haVQZ, but since ANOs minimize the BSSE for a given contracted size, ANO4321 was attempted and found to be resilient than cc-pVTZ.
As extracting a full set of experimental harmonic frequencies and anharmonicity constants for such a large molecule would require a staggering number of vibrational band origins, the available 'experimental' harmonic frequencies of Miani et al. [142] are in truth semi-experimental (a term introduced in the spectroscopic realm by Jean Demaison [143]), namely, from combining experimental fundamentals with a DFT calculated quartic force field.(We note in passing that results of a 'blind challenge' on the ground-state correlation energy of benzene were recently reported.[144].) Hence benzene would appear to be a good 'proof of concept' for the application of composite WFTs to harmonic frequencies of not-so-small molecules.Owing to the high symmetry, we can actually carry out CCSD(T) and CCSD(T*)(F12*) calculations close to the basis set limit, giving us a realistic reference.Any cWFT that would reproduce the harmonic frequencies well might be a good candidate for an anharmonic force field.
Table 5 showcases the calculated harmonic frequencies using 'pure' coupled-cluster methods and their corresponding cWFTs.For the reference, we consider CCSD(T)/ano-pV5Z harmonic frequencies.Geometry optimizations were carried out in D 2h point group symmetry and the Hessian was obtained through the method of finite differences.

Timing comparison of cWFT and DFT methods
The reviewers requested a timing comparison of the present and alternative approaches.Fig. 4 depicts 'wall clock' times of selected cWFT and DFT methods for semi-numerical evaluation of harmonic frequencies obtained using MOLPRO from non-stationary geometries.All calculations ran on identical architecture nodes using 16 physical cores of an Intel Xeon Gold 5320 CPU with a maximum memory of 46.4 GB per core.
A notable speedup occurs with increasing molecular size when using cWFTs compared to 'brute force' CCSD(T)    in the largest basis set of the composite scheme, the latter achieves speedups for cyclobutadiene by a factor of 53 for G4-D' and 27 for G4-T'.For ethylene, those ratios decrease to 29 and 21, respectively.
The MP2-F12-based method G4-D ano -F12-v1 is on par with G4-D'.However, the former exploits the accelerated basis set convergence at the MP2-F12 level, resulting to a 3.55 cm −1 improvement in RMSD for HFREQ2014.

Conclusions
In this study, we have thoroughly examined the performance of various coupled-cluster composite wave function approaches (cWFT) for harmonic frequencies.Our investigation involved the development of extrapolation formulas for force constants, while also enabling geometry optimization and harmonic frequency calculations through an implementation we provide.
We have validated the top-performing composite energy schemes, based on previous evaluations for reaction energetics in Refs.[35,36] using the large GMTKN55 test suite, against the harmonic frequencies in the HFREQ2014 dataset from CCSD(T*)(F12*)/VQZ-F12 calculations and experimental data.G4-T is three times more accurate than plain CCSD(T)/def2-TZVP, while G4-T ano is twice as accurate as CCSD(T)/ano-pVTZ.Notably, ANO basis sets combined with explicitly correlated MP2-F12, such as G4-T ano -F12, show promising performance, achieving accuracy of 5 cm −1 compared to the experiment, and they are on par with the accuracy of CCSD(T) at the complete basis set limit.
Following closely was our standard G4-T approach, built upon def2-TZVP for the high level correction.Additionally, the Weizmann-n theories, W1 and W2, delivered the most accurate results when compared to the calculated reference harmonic vibrational frequencies.
The addition of diffuse functions on hydrogen does not materially help performance for neutral molecules, and in fact causes significant near-linear-dependence issues.In codes that eliminate 'near-singular' eigenvectors of the overlap matrix (i.e., those for which the eigenvalue drops below a threshold), adding superfluous basis functions in general -and diffuse functions where they are unneeded in particular -can cause discontinuities on a correlated potential energy surface as orbitals drop in and out of the virtual space.When carrying out (semi)numerical frequency calculations, this can cause erratic results, as we observed here for acetylene.
In summary, we recommend the following: • If an accuracy of 20-30 cm −1 is sufficient, or if anharmonicity's deviation from a simple scaling factor exceeds that level (and an anharmonic force field is not a practical option), then consider a DFT option such as ωB97M-V [151] or the even more economical B97M-V [152].
• If 10 cm −1 is satisfactory, an empirical double hybrid like DSD-PBEP86 or revDSD-PBEP86 may be the right choice.
• For higher accuracy within the range of 1-2 cm −1 , it is important to extend beyond CCSD(T) as well as consider relativistic effects and the impact of diagonal Born-Oppenheimer corrections, especially in the case of hydrides.

1 )FIGURE 2 .
FIGURE 2. Box-and-whisker plot for the deviations of harmonic vibrational frequencies at CCSD(T) with various basis sets from CCSD(T*)(F12*)/VQZ-F12 for the HFREQ2014 dataset.The interquartile range (IQR) is the difference between the third and first quartiles (IQR = Q 3 -Q 1 ).The upper whisker extends up to Q 3 + 1.5*IQR, while the lower whisker extends down to Q 1 -1.5*IQR, and outliers are shown outside the whiskers.The median is indicated by a red line, while a green dotted line represents the mean.The blue band indicates ±10 cm −1 .

TABLE 4 .
Root-mean-square deviations and mean absolute deviations (cm −1 ) of calculated harmonic frequencies with various composite wave function schemes from computed CCSD(T*)(F12*)/VQZ-F12 calculations (referred to as CCSD(T)/CBS) and experiment for the HFREQ2014 , H: Hessian, and E: Energy is doubly differentiated numerically.b c 1 is the extrapolation coefficient of MP2 correlation energies.

1 )FIGURE 3 .
FIGURE 3. Box-and-whisker plot for harmonic frequency deviations of composite wave function schemes from CCSD(T*)(F12*)/VQZ-F12 for the HFREQ2014 dataset.Plot description details are the same as for Fig.2.

FIGURE 4 .
FIGURE 4. Visual representation of wall clock times for selected cWFT and DFT methods.Note that the y axis is logarithmic.

TABLE 1 .
Abbreviations for methods, basis sets, and other terms in the present work

TABLE 5 .
Calculated and experimentally derived harmonic frequencies (in cm −1 ) for the benzene molecule