General embedded cluster protocol for accurate modeling of oxygen vacancies in metal-oxides

The O vacancy (Ov) formation energy, $E_\textrm{Ov}$, is an important property of a metal-oxide, governing its performance in applications such as fuel cells or heterogeneous catalysis. These defects are routinely studied with density functional theory (DFT). However, it is well-recognized that standard DFT formulations (e.g. the generalized gradient approximation) are insufficient for modeling the Ov, requiring higher levels of theory. The embedded cluster method offers a promising approach to compute $E_\textrm{Ov}$ accurately, giving access to all electronic structure methods. Central to this approach is the construction of quantum(-mechanically treated) clusters placed within suitable embedding environments. Unfortunately, current approaches to constructing the quantum clusters either require large system sizes, preventing application of high-level methods, or require significant manual input, preventing investigations of multiple systems simultaneously. In this work, we present a systematic and general quantum cluster design protocol that can determine small converged quantum clusters for studying the Ov in metal-oxides with accurate methods such as local coupled cluster with single, double and perturbative triple excitations [CCSD(T)]. We apply this protocol to study the Ov in the bulk and surface planes of rutile TiO2 and rocksalt MgO, producing the first accurate and well-converged determinations of $E_\textrm{Ov}$ with this method. These reference values are used to benchmark exchange-correlation functionals in DFT and we find that all studied functionals underestimate $E_\textrm{Ov}$, with the average error decreasing along the rungs of Jacob's ladder. This protocol is automatable for high-throughput calculations and can be generalized to study other point defects or adsorbates.


I. INTRODUCTION
Metal-oxides are a class of material with wide applications in fuel 1 and solar cells 2 , high-k dielectrics 3 , and the catalysis industry 4,5 .As the most prevalent defect in metal-oxides, controlling the concentration of O vacancies (Ovs) in these systems underpins much of the major advances to their applications.The dominant quantity determining the Ov concentration is the Ov formation energy, E Ov .
It is pivotal that E Ov can be determined accurately.The Ov concentration can change by several orders of magnitude with small (∼ 0.1 eV) changes in the E Ov at a given temperature 6 , resulting in drastic changes in thermodynamic, electronic and optical properties of the metal-oxide.For example, such changes in the Ov concentration can change an insulating oxide into a photocatalyst 7 or metal 8 .E Ov is also a measure of the reducibility of a metal-oxide system, with (thermal) reduction being a vital step in the thermochemical cycles for H 2 O and CO 2 splitting 9,10 .Similarly, E Ov has also been shown to correlate well with key catalytic properties such as the adsorption and bond activation energies of various molecules [11][12][13] on metal-oxide surfaces.
The reliable experimental determination of E Ov is very challenging as it depends sensitively on many factors 14 (e.g.Ov concentration, presence of dopants, and crystallite size).As such, E Ov has been mostly computed through electronic structure modeling, particularly with density functional theory (DFT).However, E Ov is highly sensitive to the approximation of the exchange-correlation (XC) functional in DFT, oftentimes leading to large disagreements in their predictions.For example, in the (110) rutile TiO 2 surface 15 , E Ov can vary by more than 1.5 eV -around 50% of the absolute E Ov -with similar discrepancies observed in MgO 16 and other metaloxide systems 15,17 .Additionally, DFT with the generalized gradient approximation (GGA) XC functional has been shown to be inadequate for modeling Ovs in transition metal oxides such as rutile TiO 2 , predicting that the unpaired electrons produced during Ov formation are delocalized 18 , whilst hybrid functionals which incorporate exact exchange predict localized electrons on adjacent Ti sites 19 .There is a clear need for high-accuracy and well-converged reference values of the E Ov for these systems.
The (electrostatic) embedded cluster approach 20 offers the potential to efficiently apply accurate methods to study the Ov.

arXiv:2202.04633v2 [cond-mat.mtrl-sci] 31 Mar 2022
It limits explicit quantum-mechanical calculations to only a finite-sized quantum cluster, with the electrostatic interactions from the rest of the system approximated by point charges.Over the past few decades, there have been numerous applications of this approach to the Ov in metal-oxides 21 .These studies have ranged from applying the basic electrostatic embedding that has been described [22][23][24] , to more involved setups which utilize polarizable environments [25][26][27][28] (of varying degrees of sophistication) or couples the quantum cluster to the environment self-consistently via the "perturbed cluster" approach 29 .Regardless of the approach taken, the outstanding challenge is finding a quantum cluster that is sufficiently small, or even the smallest, which allows for inexpensive modeling at the reference level of theory whilst minimizing finite-size errors to converge results to the bulk (i.e.infinite size) limit.This process is difficult because the quantum cluster can take any chemical formula (i.e.size) and for each chemical formula, there can be many possible shapes, with widely differing convergence behaviors 30,31 .
To circumvent the complexities with searching the entire size and shape space, converged clusters are normally selected from a set of clusters generated from chosen design rubrics 24,32 .The most common rubrics are to keep quantum clusters stoichiometric and spherical 16,33 .Identifying suitable quantum clusters which follow these design principles require significant time investment and manual input, paired with lots of trial-and-error.As such, only a handful of clusters are normally created, making it difficult to affirm the quality of a cluster as well as study multiple crystal systems simultaneously.Additionally, recent work 32,34 has shown that these rubrics lead to poor convergence with cluster size.
There is growing interest in approaches which can lessen the manual labor involved.These approaches range from building clusters using layers (based on unit cell multipoles 35 or coordination spheres 36 ) to using building blocks of either the unit cell 31,34,37,38 or fully-coordinated ions 37,39 .Whilst the quantum cluster series generated from these approaches have been shown to converge properties of an ideal crystal (e.g.NMR constants 34 , bandgaps 37 and optical spectra 39,40 ), their extension to the study of surfaces or point defects such as the Ov is not clear and they still suffer from important deficiencies.For example, the positioning of the building blocks in the building block approach still requires manual definition.With the layered approach, the number of atoms within each layer increases significantly compared to the previous, providing little granularity in the sizes sampled.This leads to large converged quantum clusters that are not conducive for high accuracy calculations, which typically exhibit steep system size scaling.
To summarize, a systematic and general approach for designing a quantum cluster set which provides good granularity in the sizes and shapes being sampled whilst ensuring rapid convergence is currently lacking.In this work, we propose a quantum cluster design protocol (named SKZCAM after the authors' initials) which achieves these qualities for computing accurate Ov formation energies in metal-oxides.The core enabling development is to put the control of the shape and size of a cluster into a robust and flexible framework using the radial distribution function (RDF) to divide metal cations into "shells".The O anion positions for each cluster arise naturally from the metal cation positions based on the criteria that all dangling bonds are removed from the metal cations.The result is a process that requires no manual intervention to generate clusters of high granularity for virtually any metaloxide crystal system and surface termination, whilst converging rapidly with size.
This protocol is used to obtain small converged clusters for studying the E Ov in the bulk and common surface planes of rocksalt MgO and rutile TiO 2 , two technologically relevant metal-oxide systems with properties that depend sensitively on the Ov.These clusters (involving fewer than 600 correlated electrons for all systems) enable the accurate local natural orbital (LNO-)CCSD(T) [41][42][43][44][45] method to be applied to compute E Ov .We use our computed values to benchmark common DFT XC functionals, ranging from GGAs up to doublehybrids, and find that all studied XC functionals underestimate E Ov , with this error decreasing, on average, along the rungs of Jacob's ladder.

II. COMPUTATIONAL DETAILS
We describe the embedded cluster and supercell approaches used to generate the necessary structures to evaluate E Ov for bulk rocksalt MgO and its (100) surface as well as bulk rutile TiO 2 and its (110) surface in this section.This quantity is calculated as: where The definition given in Eq. 1 computes the unrelaxed O vacancy formation energy E Ov .Whilst the use of unrelaxed Ov structures precludes direct comparison with experiment when relaxation effects are significant, it provides an upper bound to the relaxed E Ov and a valid reference when comparing DFT values to high accuracy methods at the same geometry.In any case, relaxation effects have been found to be negligible for rocksalt MgO 46,47 .Additionally, whilst this effect is significant for rutile TiO 2 48 , it is hard to quantify accurately due to a strong dependence on the spin-state (see Sec. II C) and chosen DFT XC functional (see Sec. S2 of the supplementary material), making it most appropriate to evaluate the unrelaxed E Ov for the purposes of this work.

A. Electrostatic embedded cluster calculations
The electrostatic embedded cluster approach, illustrated in Fig. 1 (a), used in this study features a quantum(-mechanical) cluster centred around the O vacancy.To model the longrange electrostatic potential from the rest of the material, this cluster is surrounded by a sphere and hemisphere of point charges of radius 30 and 40 Å for the bulk and surface respectively; formal point charges have been placed at the metal and O ion crystallographic positions.In the vicinity of the quantum cluster (< 7 Å), the positive point charges are "capped" with the effective core potential (ECP) of the corresponding metal ion, taken from the Stuttgart/Cologne group 49,50 , to avoid electron leakage from the bonded O ions at the boundary of the quantum cluster.The chosen radii of the different regions (see Sec. S3 of the supplementary material) can converge E Ov to 0.01 eV.The placement of the point charges and ECPs alongside the atoms of the quantum cluster were constructed using py-ChemShell 51 .
DFT calculations were performed in ORCA 35 version 5.0 and MRCC 52 2020, with the latter interfaced to libXC 53 .The def2-SVP, def2-TZVPP and def2-QZVPP Weigend-Ahlrichs 54 basis sets were used throughout this paper, with the standard def2-JK 55,56 fitting basis set used for Coulomb and exchange integrals.Convergence tests indicate that the def2-SVP and def2-TZVPP basis sets exhibit errors of 0.4 and 0.02 eV w.r.t. the def2-QZVPP basis set (see Sec. S4 of the supplementary material).
Localized orbital correlated wave-function theory (cWFT) calculations were performed with the LNO-CCSD(T) and local Møller Plesset perturbation theory (LMP2) implementations of Nagy et al. 44,57 in MRCC, using the "Normal" LNO thresholds.The MP2 contribution to the B2PLYP 58 XC functional was also evaluated using the LMP2 implementation of MRCC.Complete basis set (CBS) extrapolation parameters for the def2-TZVPP and def2-QZVPP pair, CBS(TZVPP/QZVPP), taken from Neese and Valeev 59 , were used for the Hartree-Fock (HF) and correlation energy components of the cWFT total energy.Oxygen basis functions were placed at the Ov site and only valence electrons were correlated.Convergence tests indicate that these settings can give accuracy to within 0.1 eV (see Sec. S4 B of the supplementary material).Deficiencies due to frozen-core or basis set size were accounted for through a correction computed on small tractable clusters featuring a "reduced frozen-core" 60 (e.g.Ne for Ti and He for Mg) and basis set involving cc-pwCVnZ 61,62 and aug-cc-pVnZ 63 on the metal and O ions respectively (see Sec. S5 of the supplementary material), which has been extrapolated 59 for n = T and Q.The def2-nZVPP-RI auxiliary basis sets 64,65 were used for the local cWFT calculations with the def2 basis sets, whilst the automatic auxiliary basis functions of Stoychev et al. 66,67 were generated for the correlationconsistent (cc) basis sets.

B. Periodic supercell calculations
Periodic supercell calculations with DFT were performed in the Vienna Ab initio Simulation Package (VASP) 68,69 .These calculations serve to define the positions for constructing embedded cluster calculations as well as to provide reference E Ov values.All systems used structures optimized at the R2SCAN 70 DFT level as these agree well with experimental lattice parameter values (see Sec. S6 of the supplementary material).The bulk rutile TiO 2 calculations employed a 192 atom (2 √ 2 × 2 √ 2 × 4) supercell and the bulk rocksalt MgO calculations used a 64-atom (2 × 2 × 2) supercell, both with a (2 × 2 × 2) Γ-centred Monkhorst-Pack k-point sampling.The (001) MgO surface and (110) TiO 2 surface calculations both employed an asymmetric five-layer slab with the top two layers allowed to relax to form the pristine surface.A p(2 × 4) and (2 × 2) supercell was used for the TiO 2 and MgO surfaces respectively, each computed with a (2 × 2 × 1) Monkhorst-Pack mesh and 12 Å of vacuum.For the TiO 2 surface, a correction to the p(2 × 6) supercell size, at the PBE 71 level, was applied for the PBE0 72 hybrid DFT calculation (see Sec. S7 of the supplementary material).An energy cutoff of 500 eV was used for all four systems, with small core projector augmented wave (PAW) potentials on the metal cations, leaving 12 and 10 valence electrons for Ti and Mg respectively.The standard PAW potential was used on the O anion.
When constructing embedded cluster systems, the optimized bulk unit cells were repeated into supercells with dimensions larger than the embedded clusters to allow for them to be cleaved out.For the surface systems, the five-layer supercell slab was concatenated (along the surface normal direction) with an additional 25 layers (taken from the bulk) before being repeated into supercells for embedded cluster calculations.

C. Spin-state of the Ov in Rutile TiO 2
The unrelaxed Ov systems of rocksalt MgO and rutile TiO 2 , in both their bulk and common surface planes, all feature a well-controlled closed-shell singlet state 26,46,47,73 .Whilst this spin state is appropriate for the MgO systems (since relaxation effects are negligible), there are significant discrepancies regarding the spin-state of the relaxed Ov structures of rutile TiO 2 within the literature, both from experimental and computational studies.
Experimental results from electron paramagnetic resonance (EPR) and infrared spectroscopy have detected that the excess electrons, believed to originate from Ovs, localize on Ti ions of the system, forming a deep band gap state [74][75][76] and pointing towards an open-shell spin state 77,78 .However these experiments also conflict with the high mobilities predicted by electrical measurements 79 , suggesting a shallow n-type donor 80 which favors delocalized electrons.
Computational studies on the Ov of rutile TiO 2 have indicated various possible spin-states, ranging from the closed-shell singlet 73,81 , to the open-shell triplet 26,82 and singlet 48,83,84 states.For the open-shell states, the formation of polarons 85 (i.e.localized electrons on Ti ions) brings additional complications.Depending on the localization site of the polarons and degree of polaronic distortion, there can be a wide range of formation energies 19,84,86 .
A recent benchmark study using CCSD(T) by Chen et al. 81 has found that the closed-shell singlet state is still the most stable state with the inclusion of relaxation effects.The validity of CCSD(T) for studying the Ov in TiO 2 was confirmed from preliminary full configuration interaction quantum Monte Carlo (FCIQMC) calculations, which reveal the single-reference character of the Ov state.However it should be noted that Chen et al. did not directly model polarons due to the large cluster sizes required 32 , which could potentially change the conclusions.We expect that the advances detailed in this work may provide the possibility to resolve these open questions/discrepancies for the TiO 2 system in future studies.

A. Quantum Cluster Design Protocol
Designing a systematic and general set of quantum clusters for metal-oxides is a complex process; we have identified and summarized the key challenges involved in Fig. 1 (b).Two highly important and interdependent factors are the size and shape of the quantum cluster.For metal-oxides, these two factors are predominantly controlled by the metal cations in the quantum cluster since these systems can be considered as formed from polyhedrons (e.g.TiO 6 octahedron for TiO 2 ) centered around the metal cations -highlighted in light blue for bulk rutile TiO 2 in Fig. 1 (b).The choice of studied quantum cluster sizes is normally a manual process, often involving large arbitrary changes in sizes along the series.It is important that this process can be made systematic, whilst sampling clusters in small increments, so that smalleven the smallest -converged clusters can be found for the property being studied, particularly when applying expensive cWFT methods.For a given size, the metal cations can take up virtually any spatial arrangement (i.e.shape) and E Ov can vary significantly depending on the chosen shape, as shown in Sec.S8 A of the supplementary material.Under the electrostatic embedding scheme, the quantum cluster can be chosen to take up any charge.Thus, for a given metal cation configuration (i.e.size and shape), there can be virtually any number of O anions; if the ratio of O to Ti ions is more (less) than the stoichiometric ratio, then a negatively (positively) charged quantum cluster is formed.Here, the same challenges with deciding the number and spatial arrangement of O anions exist as the metal cations.If any of the factors discussed above are performed in an inconsistent manner, the convergence of the quantum cluster series becomes non-monotonic as shown by the red line of the schematic graph in Fig. 1 (b).In this work, we propose the SKZCAM approach for constructing a set of quantum clusters which shows fast and systematic convergence towards the bulk limit (blue line in the schematic graph of Fig. 1 (b)).We define a rigorous framework for deciding the metal cation configurations in the quantum cluster series, with the O anion configuration naturally arising based on a separate robust rubric.
The RDF of the number of metal cations as a function of distance from the Ov is used to generate the metal cation configurations in the quantum cluster series.Using bulk rutile TiO 2 as an example in Fig. 2 (a), we see that the metal Ti cations arrange as shells (with the first three given distinct colors) of symmetry-related equidistant cations around the Ov, denoted by the gray sphere within a black circle.In an RDF plot, given at the bottom panel of Fig. 2 (b), these Ti cation shells will appear as distinct peaks.Starting from the first metal cation shell/peak found in the RDF plot, metal cation configurations of systematically increasing size can be created by adding subsequent metal cation shells, with the first six metal cation configurations for this series visualized in Fig. 2 (b).The RDF shells/peaks are completely controlled by the crystal structure, point defect site and surface termination of the system, requiring no manual input.Furthermore, it provides good granularity in the size (i.e.number) of metal cations sampled in the quantum cluster series.At the same time, a variety of shapes are studied, ranging from spherical to cuboidal, all whilst ensuring symmetry about the point defect is maintained.
The metal cations serve as the base for the subsequent O anion configuration in each cluster.It is most physical to consider only O sites which ensure at least one bond to a nearby metal cation, as shown as translucent red spheres for the quantum clusters in Fig. 2 (b).In principle, any number of these O anion sites can be used; we take the unambiguous approach of placing O anions in all shown positions, ensuring no dangling bonds on the metal cations.This is the most chemically intuitive approach because it is equivalent to fully-coordinating all of the metal cations for bulk systems.For surface systems, some metal cations at the surface are not fully-coordinated due to the nature of the surface termination.As a result, the O anion configuration is completely determined by the metal cation configuration, which is in turn controlled by the RDF of metal cations around the point defect.We note that because the ratio of O anions to metal cations exceeds the stoichiometric ratio with this choice, the resulting quantum clusters are all negatively charged.
Beyond being fully systematic and general whilst providing good granularity, this SKZCAM approach also converges rapidly with cluster size.As shown by the blue markers in Fig. 2 (c), E Ov becomes converged to within 0.05 eV of the bulk limit (illustrated by the gray error bars) for the 22 Ti ion cluster (consisting of the first ten RDF shells/peaks), with all subsequent clusters staying converged.As comparison, we have also constructed stoichiometric neutral clusters for a subset of the Ti cation configurations as shown by the red markers.These clusters are the most common type within the literature and we see a poor non-monotonic convergence towards the bulk limit, requiring clusters larger than the studied range of sizes.Notwithstanding the slow convergence, there is also significant ambiguity in constructing stoichiometric clusters for a given metal cation configuration since there are more possible O anion sites than allowed by stoichiometry.We give the E Ov for 6 possible O anion configurations in the 3 Ti ion FIG. 2. Illustration of the SKZCAM approach for designing the quantum clusters described in this work, using bulk rutile TiO 2 as an example.Around the gray O vacancy (Ov) outlined by a black circle, the Ti cations arrange as "shells" consisting of equidistant, symmetry related cations; we highlight the first three shells in (a) with unique colors.These shells appear as peaks in the radial distribution function (RDF) 4πr 2 g(r) plot of Ti cations around the Ov in (b).Starting from the first shell, clusters of systematically increasing size can be generated by incorporating subsequent shells.We give the example of the first six quantum clusters generated in (b).In our approach, we choose O anions, illustrated by translucent red spheres, such that all dangling bonds around the Ti cations are removed.This combination leads to a systematic convergence of E Ov at the PBE-DFT level towards the bulk limit with quantum cluster size, as indicated by the blue markers in (c).The red markers correspond to stoichiometric clusters created to have the same Ti cations as a subset of the SKZCAM approach clusters, but involving fewer O anions to meet the stoichiometric ratio.We observe poor non-monotonic convergence towards the bulk limit with this choice.The bulk limit in (c) was calculated from a supercell calculation described in Sec.II B, whilst embedded cluster calculations were performed with the def2-TZVPP basis set.quantum cluster (visualized in Sec.S8 B) in Fig. 2 (c) -there is a wide range in E Ov of 1.0 eV.For larger (stoichiometric) cluster sizes, there will be more possible O anion configurations; we have calculated E Ov for only one such O anion configuration at larger cluster sizes based on including O anions closest to the Ov.
Whilst the converged 22 Ti ion cluster found is already quite small, it may be beneficial to search for smaller converged clusters, particularly for performing expensive cWFT calculations.The SKZCAM approach provides a robust framework for defining the shape of a quantum cluster based on metal cation shells, which serve as the building blocks of the cluster.The 22 Ti ion cluster consists of the first ten metal cation RDF shells/peak, with the removal of the furthest (tenth) shell leading to an unconverged 20 Ti ion cluster.By systematically removing closer RDF shells from the 22 Ti ion cluster (see Sec. S9 of the supplementary material), we identify a smaller 18 Ti ion cluster formed from removing the seventh metal cation shell (see green marker in Fig. 2 (c)).Removal of any subsequent shells from this 18 Ti ion cluster leads to large changes in E Ov .

B. Converged Clusters for the Ov
The SKZCAM approach outlined in the previous section is applied to create systematic sets of quantum clusters to study E Ov at the DFT level and beyond for bulk rocksalt MgO and its (001) surface, as well as bulk rutile TiO 2 and its (110) surface in Fig. 3.
For the MgO systems and TiO 2 bulk, there is systematic convergence towards the bulk limit, approximated from a supercell calculation.We consider convergence reached at the point when E Ov starts to plateau, with the cluster located at this point being the smallest converged cluster (SCC).We find an SCC with 38 Mg ions and 17 Mg ions for MgO bulk and surface respectively.Beyond these SCC sizes, the E Ov of larger clusters are less than 0.02 eV from the bulk limit, as illustrated by the gray error bars.Although the 6 Mg ion quantum cluster for bulk MgO is within 0.02 eV of the bulk limit, this agreement is fortuitous at the DFT level because at higher levels of theory (LMP2), it differs by > 0.3 eV from larger converged cluster sizes (see Sec. S11 A of the supplementary material).For TiO 2 bulk, the SCC with 18 Ti ions shows an error of 0.05 eV w.r.t. the bulk limit, with this error decreasing slowly with cluster size subsequently.This slow convergence could arise because the electrostatic embedding setup used here does not explicitly include (long-range) polariza-FIG.3. The deviation from the bulk limit of the O vacancy formation energy (E Ov ) as a function of quantum cluster size -generated via the SKZCAM approach -for (a) MgO bulk, (b) MgO surface, (c) TiO 2 bulk and (d) TiO 2 surface.The smallest converged cluster is marked with a red circle for each system and illustrated in each panel, where the light blue, orange and translucent red spheres correspond to Ti, Mg and O ions respectively.For the MgO systems, embedded cluster calculations were performed at the def2-SVP PBE-DFT level, with a correction to the def2-TZVPP basis to enable comparison to the bulk limit (see Sec. S10 of the supplementary material).The TiO 2 systems feature explicit def2-TZVPP calculations (to ensure smoother convergence with cluster size), with PBE and PBE0 used for the bulk and surface respectively.The bulk limit results were calculated using the supercell approach with the corresponding DFT level in each panel.tion effects; more sophisticated setups 87-89 including polarizable force fields beyond the quantum cluster could potentially improve this convergence.
The series of quantum clusters for the MgO systems were generated using the approach described in Section III of adding RDF shells of increasing distance from the Ov.We do not expect for there to be smaller converged clusters (in terms of number of Mg cations) beyond the sampled series for these systems due to their high degree of ionic bonding and cubic symmetry.For rutile TiO 2 , where there is a degree of directional covalent bonding 90 and anisotropy, there may be even smaller clusters.As discussed in Sec.III A, the SKZCAM approach provides the flexibility to find these smaller converged clusters, as has been done to find the 18 Ti ion SCC in TiO 2 bulk.
The (110) rutile TiO 2 surface requires a separate discus-sion due to its complex electronic structure.As seen in Sec.S12 of the supplementary material, the "noisy" nature of its convergence arises from well-behaved odd-even oscillations in E Ov when Ti ions are added along specific crystallographic directions of the surface.Such odd-even size oscillations commonly appear in rutile TiO 2 surface [91][92][93] calculations.We find that the amplitude of these oscillations are correlated to the degree of self-interaction error in the electronic structure method, being weaker in PBE0 compared to PBE, and completely absent in methods (e.g.HF and LMP2) that do not suffer from self-interaction error.The 21 Ti ion cluster was selected as the SCC on the basis of good agreement with the bulk limit at the PBE and PBE0 levels (< 0.03 eV for the latter) as well as having reached the convergence plateau for HF and MP2 (see Sec. S11 B of the supplementary material).The gray 0.05 eV error bar in Fig. 3 (d) indicates the average error of the clusters larger than (and including) the chosen SCC.
We find evidence that the SCC determined at an appropriate DFT level also leads to converged clusters -not necessarily the smallest possible -at other levels of theory, including cWFT methods.As seen in Fig. 4 for bulk rutile TiO 2 , the SCC (predicted from PBE calculations) shows small changes that are less than 0.03 eV (indicated by the gray error bars) in E Ov compared to larger quantum clusters at all studied levels of theory, from HF to PBE0 and LMP2.We observe the same behavior in the MgO systems and TiO 2 surface (see Sec. S11 of the supplementary material).
Whilst there is good agreement between all levels of theory in the quantum clusters larger than (and including) the SCC from the DFT calculation, their behavior can vary significantly for smaller unconverged clusters.To bypass the steep scaling of cWFT methods, it is common to employ the ∆ HL LL approach within the literature 16,94,95 to produce reference quantities.Here, the difference between a high-level (HL) and a low-level (LL) theory (e.g.DFT) for a small tractable quantum cluster is added as a correction to the same low level theory computed at the bulk and basis set limits.The implicit underlying assumption is that the difference between the HL and LL methods stays the same regardless of cluster size.The differing size convergence of the various levels of theory suggests that such an assumption could exhibit uncontrolled errors if performed for arbitrarily small crystals.

C. Reference Ov Formation Energy
Recent advances in localized orbital variants of CCSD(T) (e.g.LNO-CCSD(T) 45 , DLPNO-CCSD(T) 96 , PNO-LCCSD(T) 97 , etc.) have extended the remit of coupled cluster methods.To put these advances into perspective, for the def2-QZVPP basis set on a node equipped with 72 CPU cores, a PBE and PBE0 single-point calculation on the 17 Mg ion SCC for MgO surface took 1.5 and 2.5 hours respectively, whilst this time increased to only 7 hours for LNO-CCSD(T).Such a calculation would be far outside the reach of canonical CCSD(T), with the largest studied being a cluster with 6 Mg ions and 9 O anions 16 .These developments, alongside the identification of relatively small and converged clusters, enable us to compute Ov at the local CCSD(T) level with large basis sets.The O vacancy formation energy, E Ov , computed with LNO-CCSD(T) in the O-rich limit for the four systems are 7.68 ± 0.15 eV (MgO bulk), 7.18 ± 0.15 eV (MgO (001) surface), 6.39 ± 0.15 eV (TiO 2 bulk) and 5.55 ± 0.15 eV (TiO 2 (110) surface).The decrease in E Ov moving from bulk to surface can be expected due to the lowered coordination around the O anion sites on the surface.
Error bars have been added to the LNO-CCSD(T) values reported above.These have been conservatively estimated at 0.15 eV.This estimate comprises of errors arising due to: (i) basis set size and frozen core treatment; (ii) local approximation thresholds; and (iii) quantum cluster finite size errors.Tests on smaller clusters (see Sec. S4 B of the supplementary material) shows that the (i) CBS(TZVPP/QZVPP) basis and frozen core treatment (Ar on Ti and Ne on Mg) chosen give good agreement (∼ 0.1 eV) w.r.t. a larger basis set and small frozen core (Ne on Ti and He on Mg).Based on the deviations observed in small clusters, our best estimates of E Ov were corrected for the bias due to the basis set and frozen core treatment (see Sec. S5 of the supplementary material).The (ii) LNO threshold settings chosen have been validated against canonical CCSD(T) to give small errors of < 0.04 eV (see Sec. S13 of supplementary).Finite size errors are expected to be less than 0.05 eV -the typical error found for DFT w.r.t. the bulk limit (in Fig. 3) as well as from LMP2 calculations against larger clusters (see Sec. S11 of the supplementary material).Given the fast basis set convergence and all-electron nature of the embedded cluster DFT calculations, finite size errors (0.05 eV) are expected to be the dominant source of error.

D. Comparison of DFT Functionals
On top of enabling the accurate LNO-CCSD(T) method to be applied, the embedded cluster approach allows virtually all DFT XC functionals to be applied at low cost, including double-hybrids -normally computationally inaccessible for solid-state periodic DFT codes.We use our obtained LNO-CCSD(T) reference values to evaluate the performance for a range of DFT XC functionals, with their deviation from LNO-CCSD(T) E Ov values plotted in Fig. 5.These errors, alongside the computed E Ov , for all of the studied methods are summarized in Table I.For all 4 systems, the studied XC functionals underestimate E Ov w.r.t.LNO-CCSD(T).This error decreases, on average, when using XC functionals on higher rungs, with meta-GGAs, hybrids and double-hybrids (DH) showing consistent improvement over the GGAs.The ob- served trends and variations of the XC functionals are quite consistent between the bulk and surface of the same system, but differs from one material to the next.Out of the studied DFT XC functionals, the double-hybrid B2PLYP functional shows the best agreement to the LNO-CCSD(T) reference (mean absolute error, MAE, of 0.14 eV), with all points lying within the combined DFT and LNO-CCSD(T) error bars in Fig. 5.The ωB97X 98 hybrid and SCAN 99 meta-GGA XC functionals give the next best performance, both with an MAE of 0.23 eV.In the hybrid functionals, B3LYP 100 also shows good (MAE of 0.32 eV) and consistent performance across the 4 systems.On the other hand, PBE0 as well as HSE06 101 -two common hybrid functionals for metal-oxide systems -give variable performance, having large errors (of around 0.7 eV) for the MgO systems, which lowers (to around 0.2 eV) in the TiO 2 systems.The GGAs all severely underestimate E Ov , with MAE's all exceeding 0.6 eV.The XC functionals in Fig. 5 are arranged based on decreasing MAE within their respective Jacob's ladder rungs.XC functionals which incorporate the Becke-88 (B) exchange and Lee-Yang-Parr (LYP) correlation functionals, such as BLYP and B3LYP, are one of the top performers in their respective rungs, with B2PLYP being the best overall studied XC functional, as discussed previously.
Lower-level cWFT methods such as LMP2 and LNO-CCSD are automatically generated in any LNO-CCSD(T) cal-culation and they are also compared in Fig. 5.Both LMP2 and LNO-CCSD show excellent agreement to LNO-CCSD(T), with MAEs of 0.11 and 0.19 eV respectively, and only the B2PLYP DFT XC functional has errors of a similar (small) size.As the errors of these three methods are all close to or within the error bars of the LNO-CCSD(T) values, it is not possible to ascertain which method performs better.We do find, however, that the good agreement of LMP2 appears to arise from fortuitous error cancellations, as elaborated in Sec.IV A.

IV. DISCUSSION
The two key developments of this work are: (i) a protocol for obtaining small converged clusters for performing reference calculations of the Ov at high levels of theory; and (ii) the assessment of the accuracy of XC functionals for studying the E Ov .It is important that both these developments are properly contextualized, either from physical theory or comparison to the literature.For the latter development, we will try to rationalize the observed trends in performances of the XC functionals in Sec.IV A. For the former, we will compare our LNO-CCSD(T) reference values to those in the literature in Sec.IV B.

A. Origin of DFT Underestimation
Seeking to understand the relative performance of DFT XC functionals in complex systems, such as metal-oxides, is not straightforward.In particular, E Ov is a quantity that depends on many factors.Nonetheless, we believe our results can reveal some useful insights on this property.
Systematic studies on a series of metal-oxides have shown that there is a correlation between the band gap and E Ov in these systems 48,102 , as is confirmed by the larger E Ov in MgO (with a PBE band gap of 4.83 eV in the bulk 47 ) over TiO 2 (1.88 eV PBE band gap 103 ) in our results.Physically, this correlation arises because the removal of an O atom leaves two electrons (originally occupying the O 2p band), which must redistribute by occupying an empty band from the conduction band 10 .This redistribution energy, and in turn E Ov , would then be correlated with the size of the band gap of the crystal system.
Within the same crystal system, we find that the predicted DFT band gaps provides a rational basis for understanding the underestimation of E Ov in most DFT XC functionals as well as the relative trends between the Jacob's ladder rungs.More specifically, the errors in E Ov w.r.t.reference methods should be related to the deviations of the XC functionals from the experimental or reference method band gap.Due to the presence of self-interaction error 104 (SIE), semilocal functionals (meta-GGAs and below) will underestimate the band gap, hence underestimating E Ov , with meta-GGAs normally giving improved band gaps over GGAs 105 .Hybrid functionals correct for some of the SIE via the incorporation of exact exchange, with the possibility that band gaps are overestimated 99,106 .For TABLE I. E Ov (in eV) estimates at various levels of DFT XC functional approximations for the smallest converged clusters (SCCs) of rocksalt MgO and rutile TiO 2 , in their bulk and common surface planes.The final estimate of the B2PLYP, LMP2, LNO-CCSD and LNO-CCSD(T) methods are also given.These values are calculated using the O 2 binding energy of the corresponding electronic structure level (see Sec. S1 of the supplementary material) in Eq. 1. Errors are given w.r.t. the final LNO-CCSD(T) estimate.The computational details for these calculations are provided in the text.The band gap can only serve as a general guide for the relative performance of DFT XC functionals and there are functionals which do not follow this trend, suggesting that there are other important factors which influence E Ov .The key example is the PBE0 functional in rutile TiO 2 , which underestimates E Ov despite predicting a band gap 103 of 4.05 eV that overestimates experimental electronic band gaps from photoemission experiments, typically in the range of 3.3-4.0eV [108][109][110][111] .Additionally, despite predicting band gaps that are 0.6 and 1.0 eV higher than HSE06 for bulk MgO 112,113 and rutile TiO 2 73,103 respectively, these two functionals have very similar performances (both with MAE close to 0.5 eV) across the range of systems.Despite predicting worse band gaps 114 , the improved overall performance of the SCAN functional (with MAE of 0.23 eV) over many hybrid functionals is another example.

MgO bulk
In GGAs, on top of the underestimated band gap due to the SIE, a major contribution to its errors also arises from a poor description of the binding energy of the O 2 moleculethey tend to predict strong overbinding (e.g.0.50 eV/atom for PBE).It is common within the literature to correct for this error by replacing the GGA O 2 binding energy in Eq. 1 with the experimental binding energy of 5.22 eV 115 .With this change, the underestimation is decreased significantly, with MAE decreases of the range of 0.3-0.5 eV for the GGAs (see Sec. S14 of the supplementary material).XC functionals on higher rungs are less impacted, with improvements in the MAE of less than 0.2 eV for the meta-GGAs and hybrids.We also find that the better performance of LMP2 over (the more accurate) LNO-CCSD method is fortuitous, arising from its overbinding of 0.22 eV/atom (in Sec.S1 of the supplementary material) and it exhibits a larger MAE of 0.34 eV relative to the 0.11 eV of LNO-CCSD when using the experimental binding

B. Comparison to Previous Work for MgO Bulk
As the prototypical system for studying the Ov in metaloxides, bulk MgO has been subject to several studies involving high level reference methods 16,29,47,116,117 and out of the 4 studied systems, it is the only system, to our knowledge, where E Ov has been experimentally determined 118 .Hence, bulk MgO makes for a good system to assess the accuracy of our obtained reference LNO-CCSD(T) values.
Experimental determination of E Ov can be challenging due to the many experimental factors that can influence it.For MgO bulk, the single available value of 9.29 eV was obtained from additive coloring experiments by Kappers et al. 16,118 .
These experiments involve heating MgO crystals in Mg vapor under high temperatures and pressures.According to Smakula's formula 119 , the measured optical absorption spectra at these temperatures allows for the determination of E Ov from the relative Mg vapor and Ov concentrations.As seen in Table II, this value differs by > 1.5 eV from our LNO-CCSD(T) values and other high-level methods.This discrepancy is likely attributed to the uncertainties arising in the experiment.For example, the oscillator strength obtained by Kappers et al. differs significantly (> 70 %) compared to a previous experiment 120 .Richter et al. 16 have also attributed this discrepancy to thermal equilibrium not being reached in the experiment.Additionally, some of the assumptions in Smakula's formula can be questionable for ionic solids such as MgO 119 .Beyond experimental uncertainties, some of this discrepancy can also arise due to the neglect of temperature effects in the static LNO-CCSD(T) calculations.
Given the above considerations, it is more appropriate to compare high-level references available for the E Ov in MgO bulk.Ertekin and Grossman 47 have evaluated E Ov with DMC for MgO bulk, finding the DMC value to be 0.5 eV higher than PBE.These E Ov values were computed under Mg-rich conditions, where E Ov is defined as: with E[MgO] and E[Mg] being the total energy of bulk MgO and Mg per formula unit respectively.Compared to the Orich limit in Eq. 1, this definition will not suffer from the poor O 2 binding description in E bind .If the O 2 overbinding (0.50 eV/atom) is corrected in the PBE E Ov , our LNO-CCSD(T) value also becomes 0.52 eV higher than PBE, agreeing with DMC values.∆ CCSD(T) PBE embedded cluster calculations by Richter et al. 16 have obtained an E Ov value of 6.85 eV, which is 0.83 eV smaller than our LNO-CCSD(T) estimate.In Sec.S15 of the supplementary material, we show that 0.2 eV of this difference can be attributed to differing lattice parameters and inclusion of structural relaxation around the Ov by performing ∆ LNO-CCSD(T) PBE calculations on the same quantum clusters as Richter et al.. Remaining differences could arise from the use of differing embedding environments, which our work (Fig. 4 and Sec.S11 A of the supplementary material), alongside others 81 , have shown cWFT methods to be highly sensitive to, requiring careful convergence.The reference E Ov value from the ∆ CCSD(T) PBE E Ov study was also found to be smaller than the PBE (with overbinding corrected) value by 0.09 eV, which goes against trends observed from high-level calculations on other metal-oxide systems 121,122 as well as the DMC study of Ertekin et al.. Explicit MP2 calculations on a converged cluster 29 give a E Ov of 7.13 eV, closer to our LNO-CCSD(T) value.The difference w.r.t.our LNO-CCSD(T) or LMP2 values arise due to the use of a small (double-zeta quality) basis set; larger basis sets or CBS extrapolations can increase E Ov significantly by > 0.5 eV w.r.t. a double-zeta basis set for MgO bulk (see Sec. S4 of the supplementary material).
To our knowledge, the only available high level reference calculation in the TiO 2 system comes from a DFT+GW study 123 , with the large number of electrons in Ti making ap-plication of methods such as canonical CCSD(T) highly expensive 81 .Additionally, DMC has not been applied to study the Ov in TiO 2 potentially because the inclusion of a 3d transition metal brings additional complications with the starting trial wave-function 124 and the need to validate its pseudopotentials 125,126 .

V. CONCLUSION
In this work, we discuss a systematic and general approach (named SKZCAM after the authors' initials) for designing small converged quantum clusters for studying the O vacancy with the electrostatic embedded cluster method.When combined with localized orbital correlated wave-function theory methods such as LNO-CCSD(T), this approach allows for accurate determination of the Ov formation energy (E Ov ) at a computationally tractable cost.We applied this approach to compute E Ov values for the bulk and common surface planes of rocksalt MgO and rutile TiO 2 systems.Comparison of these reference values to common DFT XC functionals shows that the studied XC functionals underestimate E Ov for all studied systems.We observe general improvements in the XC functional errors as Jacob's ladder is ascended, which we find can be correlated to the improvements in the predicted band gaps of the XC functionals.Of the XC functionals studied, the double-hybrid B2PLYP functional gives the best performance, with a mean absolute error within the error bars of the reference calculation.Other BLYP-based functionals, such as BLYP and B3LYP are also found to perform well within their respective Jacob's ladder rungs, alongside the meta-GGA SCAN and the hybrid ωB97X functionals.
Although this work focuses on the Ov formation energy, the simple and intuitive nature of the outlined protocol should allow its application to other chemical problems, including molecular adsorption, spectroscopic quantities or complex (ternary) metal-oxide systems.Furthermore, the approach can be automated, making it amenable for integration into existing high-throughput calculation frameworks.Such highthroughput frameworks 127 for point defects or molecular adsorbates are highly desirable.For example, it can be used to screen for target applications in catalysis 11,102 or to produce large reference databases to validate current XC functionals 128 .Beyond the applications, we have defined a rigorous and well-founded framework for controlling the shape of the embedded quantum clusters.This framework can be valuable in not only defining the quantum cluster in electrostatic embedded cluster approaches, but also in other embedding approaches 20,[129][130][131][132] , where multiple clusters, corresponding to different levels of theory, may have to be defined simultaneously.

SUPPLEMENTARY MATERIAL
See the supplementary material for a detailed compilation of the obtained results as well as further data and analysis to support the points made throughout the text.

S1. XC FUNCTIONAL DEPENDENCE OF O 2 BINDING ENERGIES
The O 2 binding energies quoted in Table S1 were calculated using: where   bulk and its (110) surface.The relaxation energy, E rel., is defined as: where E[D-MO] and E[r-D-MO] are the energies of the unrelaxed and relaxed defected structures, containing an Ov, respectively.We observe a wide range of around 0.45 and 0.22 eV in the relaxation energies for the bulk and surface just at the semilocal XC functional level.

S3. CONVERGENCE OF POINT CHARGE AND EFFECTIVE CORE POTENTIAL BOUNDARY LENGTH
Table S3 quotes the change in E Ov for a embedded cluster of the TiO 2 surface as the point charge (PC) and effective core potential (ECP) regions are changed in length.TiO 2 surface is used as the example as it represents the system that is most difficult to converge.We see that the changes in E Ov are all small (∼0.01 eV) for substantial changes in both the PC and ECP regions, suggesting our chosen parameters of 7 Å and 40 Å are sufficient for this system.

B. LNO-CCSD(T)
During our basis set tests, we tested three commonly used basis sets: def2, aug'-cc-pVXZ (A'VXZ) and aug'-cc-pwC'VXZ (A'C'VXZ) basis set families.For the latter two basis sets, only the O ions have been augmented with diffuse functions, with the cc-pVXZ and cc-pwCVXZ basis sets placed on the metal cation for the A'VXZ and A'C'VXZ basis sets respectively.In Figure S1, the basis set convergence of these three basis set families at the LNO-CCSD(T) level are plotted for a MgO bulk (with 6 Mg ions) and surface (with 5 Mg ions) quantum cluster -small quantum clusters obtained through the SKZCAM approach.We have also compared E Ov for small He or large Ne frozen core on the Mg cation.As shown in these tests, the use of the CBS(TZVPP/QZVPP) basis set extrapolation with the def2 basis set with a large Ne frozen core shows excellent agreement (to less than 0.1 eV) compared to the most accurate calculation: CBS(QZ/5Z) with the A'C'VXZ basis set family using a small frozen core.We have performed a similar comparison for the TiO 2 systems in Fig. S2 and obtain good agreement of around 0.1 eV as well.The quantum clusters used for the bulk and surface consisted of 3 and 2 Ti ions respectively, all obtained from the SKZCAM approach.surface for reasons discussed in Sec.III B of the main text.These calculations are expensive and to circumvent some of these computational costs, a smaller (2 × 4) supercell slab was used for the PBE0 calculations with finite size correction to the (2 × 6) supercell slab approximated by PBE.
The difference of these two numbers for PBE is added as a correction to the (2 × 4) supercell calculation of PBE0 to approximate its results at the bulk limit.

B. Ambiguity in the O anion spatial arrangement for stoichiometric clusters
For the 3 Ti ion quantum cluster in Fig. 2 of the main text, we have computed E Ov for several quantum clusters.There is the singular red point corresponding to the cluster created by the SKZCAM approach.There is no ambiguity in the O anion positions since we have used the robust rubric of removing all dangling bonds on the Ti cations.On the other hand, for a stoichiometric quantum cluster of the same size, there can be several O anion configurations for the same Ti cations and we show six such possibilities in Fig. S4.These clusters have a wide range of E Ov spanning over 1.0 eV in Fig. 2 (c) of the main text, with no intuition and easy way to find the "optimal" O anion configuration.The number of possible configurations is expected to become even larger as we go to larger quantum clusters.

S9. FINDING SMALLER CONVERGED CLUSTERS WITH THE SKZCAM APPROACH
In Sec.III A of the main text, we have described the SKZCAM approach to find small converged quantum clusters.Based on the radial parameter (i.e.including RDF shells based on distance from the Ov), we find a 22 Ti ion quantum cluster, consisting of the first ten RDF shells.From this quantum cluster, smaller converged clusters can be found by considering the removal of RDF shells closer to the Ov.In Fig. S5, we show the change in E Ov when shells, numbered by distance from Ov, are removed from the quantum cluster.We find that removal of the 7 th RDF shell leads to little change in E Ov at a substantial decrease in size to a cluster with 18 Ti ions -the smallest converged cluster described in the main text.

S10. BASIS SET CORRECTION FOR RDF SIZE CONVERGENCE IN MGO
To aid the speed of calculations and because larger basis sets suffer from linear dependencies at larger cluster sizes, we have performed size convergence calculations in MgO using the def2-SVP basis set.To allow comparison to the bulk limit, we have shifted the def2-SVP E Ov values for all of the studied quantum clusters by a correction to the def2-TZVPP level computed from their average difference for a few clusters beyond the SCC, as shown in Table S8.TABLE S8.The difference in the def2-SVP and def2-TZVPP O vacancy formation energy (E Ov ) for a few quantum clusters larger than the smallest converged cluster for MgO bulk and surface.A correction is computed from the average difference, which we then apply to the def2-SVP E Ov of all the studied clusters for the respective system in Fig. 3

S12. ODD-EVEN OSCILLATIONS IN TIO 2 SURFACE
The poor convergence of the TiO 2 surface with cluster size (Fig. 3 (d) in the main text) is a result of the presence of odd-even oscillations in E Ov as Ti ions are appended along the two crystallographic axes of the TiO 2 surface.These odd-even oscillations in E Ov are illustrated in Fig. S7.Starting from a base quantum cluster (e.g.Structure 1), we have appended additional Ti ions along the [110], [001] and [ 110] directions of the cluster to create larger clusters along those directions, as seen in Fig. S7 (a).The change in E Ov for increasing cluster size along these three directions is depicted in Fig. S7 (b).We observe large oscillation amplitudes in E Ov in the range of 0.4 and 0.2 eV for the [ 110] and [001] directions respectively, with no oscillations in the [110]   direction.

METHOD
In this section, we have performed our own calculations to understand the observed discrepancy of 0.83 eV between our LNO-CCSD(T) E Ov of 7.68 eV for MgO bulk with the value of 6.85 eV by Richter et al. 5  In Table S13, we have recomputed the E Ov from the ∆ CCSD(T) PBE approach and compared each of FIG.1.(a) Schematic of the electrostatic embedding approach for bulk rutile TiO 2 .The quantum cluster (in the orange region) is treated with the electronic structure theory of choice.It is surrounded by a spherical (hemispherical) field of point charges in the green and blue regions for a bulk (surface) system.In the vicinity (green region) of the quantum cluster, the point charges are replaced with effective core potentials to prevent spurious charge leakage out of the quantum cluster.Normally, converged quantum clusters are selected from a series of clusters constructed through chosen design rubrics.Panel (b) highlights the key challenges with designing such a series of quantum clusters for metal-oxides: (i) deciding what sizes (in terms of number of metal cations, visualized as light blue spheres) to sample; (ii) deciding how to arrange these metal cations (i.e.shape); (iii) and choosing the charge of each quantum cluster, which is largely controlled by the number of O anions (visualized as red spheres).Depending on the chosen rubrics for these three factors, convergence of the O vacancy formation energy E Ov can be either systematic or inconsistent, as indicated by the blue and red lines in the schematic graph.The points representing the 'Inconsistent approach' were taken from calculations of stoichiometric, spherical-shaped quantum clusters with inconsistent changes in the number of Ti ions (i.e.size) and spatial arrangement of O anions (i.e.charge).

FIG. 4 .
FIG.4.The variation in the O vacancy formation energy (E Ov ) with cluster size around the smallest converged cluster (SCC), containing 18 Ti ions, for bulk rutile TiO 2 at the PBE, PBE0, HF and LMP2 levels of theory.The gray 0.03 eV error bar indicates the maximum level of deviation observed across the three levels of theory for sizes beyond the SCC.All calculations were performed with the def2-TZVPP basis set.

FIG. 5 .
FIG. 5. Comparison of various DFT XC functionals, ranging from GGAs to double-hybrids, as well as LMP2 and LNO-CCSD to LNO-CCSD(T) reference calculations of the O vacancy formation energy (E Ov ) in MgO bulk, MgO surface, TiO 2 bulk and TiO 2 surface.DFT calculations were performed at the def2-QZVPP level with the cWFT and double-hybrid (DH) calculations following the same procedure (discussed in the text) as the LNO-CCSD(T) calculations.The gray error bars are included to indicate the combined 0.2 eV error of the DFT (0.05 eV from finite size errors) and LNO-CCSD(T) (0.15 eV as discussed in Sec.III C) values.

MgOTiO2
FIG. S1.Convergence of the O vacancy formation energy (E Ov ) with basis set size in (a) bulk and (b) surface MgO for the def2, A'VXZ and A'C'VXZ basis set families.A He and Ne frozen core on Mg is tested for both sets.Two point basis set extrapolations for DZ/TZ, TZ/QZ and QZ/5Z combos are also plotted for the respective basis set families.These calculations were all performed with "Tight" LNO thresholds.
FIG. S3.Illustrating the range of O vacancy formation energy (E Ov ) that can be found for different shapes for a given size of quantum cluster for both rutile TiO 2 bulk and surface systems.
FIG. S4.The quantum cluster, created through the SKZCAM approach, involving 3 Ti ions for bulk rutile TiO 2 is visualized in the red box.In the blue box, we give six possible stoichiometric clusters involving the same 3 Ti ions, but with different O anion spatial arrangements.
FIG. S5.Change in O vacancy formation energy (E Ov ) when Ti cation shells corresponding to different RDF peaks are removed from the 22 Ti ion cluster.

Figure
Figure S6 plots the deviation of E Ov w.r.t. the 21 Ti ion SCC for TiO 2 surface.For clusters larger than the SCC, there is little change in the HF, LMP2 and PBE0 levels, all to within the gray FIG. S6.Deviation in the O vacancy formation energy (E Ov ) from the 21 Ti ion SCC for quantum clusters larger and smaller than the SCC of TiO 2 surface at the PBE, PBE0, HF and LMP2 levels of theory.

[
FIG. S8.The (a) Mg 6 O 9 and (b) Mg 14 O 19 quantum clusters used to study the Ov in the work of Richter et al. 5 .
the individual terms in Eq. 3 to assess where differences may arise.The E PBE Ov [Mg 14 O 19 ] system used by Richter et al. allowed for relaxation of the Mg ions close the Ov, whilst in our work, we have not allowed for this relaxation.If no relaxation around the Ov is used, as we have done in this work, Richter et al. would obtain a E Ov of 6.97 eV compared to 6.85 eV for the relaxed cluster.Thus, relaxation effects in the Mg 14 O 19 cluster accounted for 0.12 eV in the difference w.r.t.our LNO-CCSD(T) estimate.Our calculations with LNO-CCSD(T) with "Tight" LNO thresholds, using the same A'C'VXZ basis set and frozen core treatment as Richter et al. produces a ∆ LNO-CCSD(T) PBE value of 7.73 eV, which is ∼ 0.1 eV higher than our best E Ov estimate using a large 38 Mg ion cluster in Table S12.Thus, there is still a large difference w.r.t. the values obtained by Richter et al. even if the ∆ CCSD(T) PBE method is reproduced.Considering the individual terms of Eq. 3 in both our calculation and that of Richter et al. in Table S13, we find that the the PBE E Ov value obtained from the Mg 6 O 9 and Mg 14 O 19 clusters are similar between the two studies.There is a increase of 0.08 eV for the two clusters in our study due to the use of different lattice parameters (R2SCAN in this work whilst PBE was used by Richter et al.).The main culprit for the different E Ov values between the two studies lies within the E Ov value computed with CCSD(T) or LNO-CCSD(T) on the Mg 6 O 9 quantum cluster.

TABLE II .
The O vacancy formation energy (E Ov ) in bulk MgO obtained through high-level theory methods and experiment from the literature and this study.The experiment and MP2 values were modified from their quoted values in the original papers to ensure the same definition of E Ov as this paper.The difference between the reference method and PBE is also shown: E Method E[O] and E[O 2 ] are the total energies of the O atom and O 2 molecule, both in the unrestricted triplet state.We have used the same O 2 geometry obtained from R2SCAN structural optimizations, with the O atoms separated by 1.207 Å.

TABLE S1 .
O 2 binding energy predicted by DFT XC functionals as well as at the B2PLYP, LMP2, LNO-CCSD and LNO-CCSD(T) correlated wave-function theory (cWFT) levels.DFT calculations were performed at the def2-QZVPP level.cWFT methods (including B2PLYP) were computed through a CBS(TZVPP/QZVPP) extrapolation with a 1s frozen core on the O atoms.
Table S2 shows the change in relaxation energy for different semilocal XC functionals in TiO 2

TABLE S2 .
Relaxation energy of the Ov defect in the closed-shell singlet state in TiO 2 bulk and its(110)

TABLE S3 .
Change in O vacancy formation energy (E Ov ) as point charge (PC) and effective core potential (ECP) regions of an embedded cluster are changed.Values are computed at the PBE-DFT level with a def2-SVP basis set.The quantum cluster region consists of 31 Ti ions, taken from the quantum cluster series Our basis set tests on the smallest converged clusters (SCCs) of the four studied systems in TableS4shows that at the PBE0 level, def2-TZVPP and def2-QZVPP E Ov values are within 0.02 eV, indicating that they are both converged.def2-SVP shows large differences up to 0.3 eV from the def2-QZVPP basis set.

TABLE S4 .
Convergence of the O vacancy formation energy (E Ov ) with basis set size in the SCCs of rocksalt MgO and rutile TiO 2 , in their bulk and surface, for the def2-SVP, def2-TZVPP and def2-QZVPP basis sets.Calculations were performed at the PBE0 level.

TABLE S5 .
E Ov difference (in eV) at the LNO-CCSD(T) level moving from CBS(TZVPP/QZVPP) with large frozen core (Ar for Ti and Ne for Mg) to CBS(A'C'VTZ/A'C'VQZ) with small frozen core (Ne for Ti and He for Mg) for small MgO and TiO 2 quantum clusters.For some clusters, such as those consisting of the first RDF shell in MgO surface and TiO 2 bulk, we have not computed this difference because these

TABLE S6 .
Lattice parameters for the conventional unit cells of rutile TiO 2 (a, c and u) and rocksalt MgO (a) predicted by LDA, PBE, R2SCAN and HSE06 exchange-correlation functionals in DFT compared to experiment.Calculations were performed in VASP with an 800 eV energy cutoff with 9 × 9 × 14 and 9 × 9 × 9 k-point meshes for TiO 2 and MgO respectively.R2SCAN gives the best agreement of the lattice parameter out of all the studied functionals.

Table
S7 lists the bulk limit values used in Fig.3of the main text.Other than TiO 2 surface, the other systems only required PBE bulk limit values.PBE0 calculations were performed for TiO 2

TABLE S7 .
O vacancy formation energy (E Ov ) computed for rocksalt MgO and rutile TiO 2 , for both their bulk and surface forms.The E Ov values were obtained using the corresponding DFT O 2 molecular binding energy from TableS1.
2, both in its bulk and common surface plane is plotted.All of the studied quantum clusters are negatively charged with O anions placed to ensure no dangling bonds on the Ti cations, with the only difference being different spatial arrangements of the Ti cations.These E Ov are computed at the def2-SVP PBE-DFT level and we find that there can be a wide range of over 0.3 and 0.8 eV for the bulk and surface respectively.
of the main text.Ov at the LMP2 level as the quantum cluster size is increased for MgO bulk and surface respectively.For both of these systems, the smallest converged cluster (SCC) obtained at the PBE-DFT level, highlighted in bold, has errors w.r.t. the largest studied cluster of less than 0.05 eV.LNO-CCSD(T) results are also shown for the MgO surface in TableS10and we observe a similar size convergence behavior as at the LMP2 level.

TABLE S9 .
Change in O vacancy formation energy (E Ov ) with cluster size -generated using the SKZ-CAM approach -at the LMP2 level for bulk MgO.Errors were calculated w.r.t. the largest computationally tractable cluster.Calculations were performed in MRCC 2020 at the def2-TZVPP level with "Normal"

TABLE S10 .
Change in O vacancy formation energy (E Ov ) with cluster size -generated using the SKZ-CAM approach -at the LMP2 and LNO-CCSD(T) level for the (001) MgO surface.Calculations were performed in MRCC 2020 at the def2-TZVPP level with "Normal" LNO threshold settings.

TABLE S13 .
The ∆ CCSD(T) PBE computed values of E Ov from the supplementary material of Richter et al. as well as explicit calculations on the same quantum clusters which we performed in this study.We have shown above that 0.2 out of the 0.8 eV difference between our best LNO-CCSD(T) estimate and the ∆ CCSD(T) PBE estimate of Richter et al. arises due to differences in lattice parameter and inclusion of relaxation effects by Richter et al..The remaining 0.6 eV gap could arise from differing electrostatic embedding environments (e.g. point charge and ECP regions) between the two studies.Our work and others 6 have indicated that cWFT methods are highly sensitive to the electrostatic embedding environment compared to DFT and this requires careful convergence, such as those in TableS9or Fig.4of the main text.