Ultrafast estimation of electronic couplings for electron transfer between pi-conjugated organic molecules. II

The development of highly efficient methods for the calculation of electronic coupling matrix elements between electron donor and acceptor is an important goal in theoretical organic semiconductor research. In previous work we introduced the analytic overlap method (AOM) for this purpose, which is an ultrafast electronic coupling estimator parametrized to and orders of magnitude faster than density functional theory (DFT) calculations at a reasonably small loss in accuracy. In this work we reparametrize and extend AOM to molecules containing nitrogen, oxygen, fluorine, and sulfur heteroatoms using 921 dimer configurations from the recently introduced HAB79 dataset. We find again a very good linear correlation between the frontier orbital overlap, calculated ultrafast in an optimized minimum Slater basis, and DFT reference electronic couplings. The new parametrization scheme is shown to be transferable to sulfur-containing polyaromatic hydrocarbons in experimentally resolved dimeric configurations. Our extension of AOM enables high-throughput screening of very large databases of chemically diverse organic crystal structures and the application of computationally intense non-adiabatic molecular dynamics methods to charge transport in state-of-the-art organic semiconductors, e.g., non-fullerene acceptors.

For the modelling of charge carrier transport in soft condensed media (e.g., organic and biological semiconductors) it is important to properly account for the strong thermal fluctuations of the molecules and several methods have been developed to do that, e.g. Kinetic Monte Carlo [24][25][26][27], transient localization theory [28,29] or non-adiabatic molecular dynamics (MD) simulations [30][31]. A common feature of these methods is the very large number of transfer integrals that need to be evaluated to converge charge mobilities. As a result, one needs to select an appropriate electronic coupling calculation scheme as to strike balance between computational efficiency and chemical accuracy.
The analytic overlap method for electronic coupling calculation was designed with these requirements in mind [23]. The idea behind this method is that the computationally expensive DFT calculation of charge transfer integrals can be effectively substituted by very rapid estimations of frontier molecular orbital (FMO) overlap integrals, multiplied by an appropriate scaling coefficient. The FMOs are reconstructed very efficiently in an optimized minimum Slater type orbital (STO) basis set allowing the ultrafast analytic calculation of FMO overlaps and electronic couplings for a large number of dimer configurations [32].
The parametrization of AOM relies on the compilation of a molecular training set and a series of single molecule DFT calculation for the determination of the FMOs and dimer electronic coupling calculations using an appropriate level of theory as to establish a linear approximation between overlap and charge transfer integrals. The original version of the AOM relied on planewave DFT calculations for both single molecule and dimeric configurations using a limited molecular training set comprised of small pi-conjugated molecules and C60 Accepted to J. Chem. Phys. 10.1063/5.0076555 3 fullerenes, primarily focusing on carbon-rich compounds with minimal heteroatom contribution to the FMO electronic density [23].
From a materials point of view, the AOM has been successfully utilized for the modeling of charge transfer mechanisms for homocyclic pristine and functionalized polyaromatic hydrocarbons (PAHs) for organic electronics [33][34][35][36]. Furthermore, from a theoretical and method development perspective, the AOM is at the heart of the fragment orbital-based surface hopping non-adiabatic MD simulation methodology for charge transport modeling [34], while simultaneously constituting a robust electronic couplings estimator for pi-conjugated systems [7].
Two issues limit the applicability of our previous implementation of the AOM [23].
First, reference electronic couplings were obtained with planewave DFT calculations in the gas phase, which generated a significant computational overhead for large pi-conjugated organic molecules compared to atomic orbital-based DFT calculations. Second, the chemical diversity of the training molecular dataset with regard to types of PAH heteroatoms and dimer structural variability was very limited. The focus of this work is to effectively address both of these issues. With regard to the first issue, we take advantage of the recently implemented [16] POD method [15] for reference electronic coupling calculations from DFT with atom-centered Gaussian-type orbitals (GTO). AOM is recast here to support the DFT FMO orbitals in a GTO basis as input for FMO reconstruction in a minimum Slater basis. As regards to the diversity of the chemical space covered by the AOM, we utilize the newly introduced HAB79 dataset [37] containing nitrogen, oxygen, fluorine, and sulfur heteroatoms. This database contains 79 single molecules and 921 dimer configurations, providing a wealth of data for the reparametrization of key parameters of AOM (that is, Slater decay coefficients), hence enabling ultrafast electronic coupling calculation for a plethora of organic semiconductors found in state-of-the-art organic electronics and photovoltaics applications (e.g. non-fullerene acceptors).
The revised AOM framework provides reparametrized optimal Slater decay coefficients for carbon and nitrogen chemical species and new parameters for oxygen, fluorine, and sulfur (oxygen and sulfur parameters in the original AOM implementation relied on unoptimized Clementi-Raimondi (C-R) reference values [38] and fluorine was not included).
Using the new parametrization, a universal scaling constant satisfying the linear relationship between electronic coupling and overlap integrals is established and successfully tested on a molecular dataset taken from experimentally resolved molecular crystal structures. Accepted to J. Chem. Phys. 10.1063/5.0076555 4 The layout of the paper is organized follows: all computational details are outlined in Sec. II, followed by the results in Sec. III. The conclusions of this study are presented in Sec.

II. Computational details
The AOM asserts that the electronic coupling ab between two non-orthogonal diabatic charge transfer electronic states, Ψ a and Ψ b , is, to a good approximation, proportional to the overlap integral ab between the two states, thus satisfying the linear relationship We note that a similar relation is used in Hückel theory to approximate the resonance integral.
The difference is that in the latter the resonance integral refers to the electronic interaction of atomic orbitals centered at neighboring atoms on the same molecule, whereas in AOM the coupling matrix element is between two diabatic wavefunctions centered on neighboring molecules.
In most methods for coupling matrix element calculation, including POD, the two is a primitive GTO in its cartesian form and GTO the number of GTO basis functions. Similar expansions are made for the HOMO of the acceptor or for the LUMO of donor and acceptor in case of excess electron transfer. The FMO is subsequently projected into a STO basis set only with s-and p-type angular momentum contributions: where STO the number of STO basis functions and with ; the radial part of the STO orbital: and , an orthonormal real spherical harmonic: The GTO-to-STO FMO projection is effectively reduced to the solution of the following linear system of equations For the sake of computational simplicity and efficiency, STO-GTO overlap integrals are calculated using the Gaussian expansion method by O-ohata et al. [39] using expansions of 10, 6, and 8 GTOs for 1s and 2s, 2p, 3s and 3p STO orbitals, respectively.
Defining the projection completeness as the set of STO exponential coefficients per chemical species and angular momentum channel is chosen so as to maximize 1. In order to achieve an augmented level of chemical diversity, the projection completeness optimization procedure with respect to is carried out considering an ensemble of mol different molecules. To this end, we define a GTOto-STO projection goodness metric function carried out by means of a derivative-free Nelder-Mead simplex optimization algorithm [40].
Using the optimized Slater decay coefficients, we can arrive at a set of expansion coefficients for every isolated molecule under study, thus concluding the GTO-to-STO projection procedure.
The expansion of the FMOs, Eq. 5, can be simplified further. The FMOs of piconjugated organic molecules are usually formed of linear combinations of p orbitals pointing in the direction perpendicular to the plane of pi-conjugation. Hence, we project the Slater p orbitals of each atom i participating in pi-conjugation on p-orbitals orthogonal to the plane, , , and on two p-orbitals parallel to the plane, , and , . As shown before, the expansion coefficients of the orbitals and s orbitals can be neglected to a very good approximation [23]. Hence the expansion Eq. 5 simplifies to, A similar projection is carried out for the HOMO of the acceptor. After renormalization, these orbitals are then used to calculate the overlap in an optimal, minimalistic representation of the FMOs, The AOM estimate for electronic coupling is then, where ̅ is a constant of proportion obtained by a linear fit of | ̅ ab | to | ab | values from POD DFT reference calculations. Given a set of dimeric configurations, in order to obtain optimal linear correlation between reference electronic coupling values | ab | and AOM calculated overlap integrals | ̅ ab |, a further tuning of the 2p and 3p STO exponential coefficients ̅ per chemical species is required. This optimization is carried out using the Nelder-Mead simplex algorithm, aiming to minimize the sum of squared residuals during a least-squares linear fit between | ab | and | ̅ ab | in a logarithmic scale, effectively calculating the intercept ′ log ̅ and minimizing the residual in equation As regards technical details of DFT single molecule calculations, FMOs of isolated molecules are calculated at the PBE/DZVP-GTH minimal level of theory [41,42]. Electronic couplings from scaled POD (sPOD) calculations at the PBE/DZVP-GTH level of theory were taken from the HAB79 dataset [37]. The POD couplings were scaled by a factor of 1.282, as dictated by the correlation analysis using reference ab-initio values at the minimal active space CASSCF/NEVPT2 level of theory, corresponding to a mean relative signed error of -3.3% and a mean relative unsigned error of 17.9% [37]. All DFT calculations are carried out using the CP2K package [43].

III. Results and discussion
The training of the revised AOM scheme in this work is carried out using the newly introduced HAB79 dataset: a selection of planar heterocyclic PAHs with non-degenerate FMOs, containing nitrogen, oxygen, fluorine, and sulfur heteroatoms with significant contribution to the electronic density, and capable of either n-or p-type semiconducting behavior [37].
The first step in the training procedure is dedicated to isolated molecules and aims at the determination of the optimal STO decay exponents for atomic species and orbital type and molecule-specific expansion coefficients , both optimizing the statistical distribution of the GTO-to-STO projection completeness value via the metric described in Eq. 11, with the STO decay exponents explicitly treated during the optimization procedure and the expansion coefficients derived via the solution of Eq. 9 for every set. We have carried out two types of computational procedures for . In the first procedure we group the HAB79 molecules in subsets that contain the same atom types actively participating to the FMO density and examine the landscape of the goodness metric of Eq. 11 by means of 2D grid evaluations. Doing so, we validate the global minima resolved by the Nelder-Mead simplex optimization on selected lower dimensional 2D grids. In the second procedure we carry out a simultaneous multi-dimensional optimization of for all molecules of HAB79.
In the first procedure, the HAB79 dataset is divided into 5 subsets, as depicted in  The contour diagrams of the projection goodness metric, along with projection completeness histograms utilizing the C-R and optimized STO exponential coefficients for the aforementioned HAB79 subsets are depicted in Fig. 4 and Fig. 5. As a reminder, the C-R coefficients were obtained by Clementi and Raimondi [38] by means of SCF single atom calculations for the expression of atomic orbitals using a STO basis set -hence may not be optimal for our purpose. In our previous work [23], the C-R coefficients were used for oxygen and sulfur as the optimization of these coefficients was not possible due to a lack of molecules in the previously used training set with significant contributions to the FMO density on these atom types. The optimization procedure clearly shifts all projection completeness distributions towards unity, and reduces the standard deviation relative to C-R.
In the second procedure, a simultaneous minimization using the complete HAB79 dataset (molecules 1-79) is carried out by means of a Nelder-Mead simplex, resulting in the optimized values reported in Table I.
The performance of the Nelder-Mead derivative-free optimization procedure in minimizing the projection goodness metric objective function is illustrated in Fig. 6.     After the calculation of the STO decay coefficients reported in Table I that optimize the projection procedure, we arrive at a set of STO-based FMO expansion coefficients for every molecule in the dataset. Previous work has shown that the set of coefficients derived from the projection part of this particular methodology cannot fully account for a sufficient linear relationship between overlap and electronic coupling values [23]. This is also the case for the current work, as demonstrated by the correlation plots depicted in Fig. 7.  Table I corresponds to the global minimum attained by the simplex optimization procedure.
Contour plots of the residual objective function calculated on two-dimensional grids of STO decay coefficients for HAB79 subsets are illustrated in Fig. 8, along with the simplex-derived global minimum position. The optimal linear correlation using the set ̅ between overlap and charge coupling is shown in Fig. 9 with a universal scaling constant ̅ equal to 9463 meV.
We find that that the linear correlation is excellent for large coupling values (> 100 meV) and still rather good for values down to about 10 meV, while we notice a significant spread for smaller coupling values.    Table II using scaled POD and ab-initio values for the HAB79 dataset. The MRUE of AOM relative to sPOD is reasonable considering that this is an average error over electronic couplings that span 5-6 orders of magnitude. A further breakdown of the AOM error for each order of magnitude of reference sPOD coupling is summarized in Table   III. As apparent from Fig. 7, the MRUE for couplings larger than 100 meV is very small, 8.5%, and still reasonably small for couplings between 10-100 meV, 24.7%. For couplings smaller than 10 meV, AOM appears to be less reliable. However, we would like to point out that also for DFT electronic structure calculations it becomes increasingly challenging to accurately calculate small electronic coupling values and converge them with respect to basis set and grid size, so some of the error could derive from inaccuracies of the reference calculations.
Fortunately, the electronic couplings that determine transport in organic (opto-)electronic materials are typically on the order of 10-200 meV, which is the regime that is very well captured by the present AOM parametrization. The fact that the MRUE of AOM relative to ab-initio couplings is smaller than relative to sPOD is most likely a coincidence. The small statistical deviations reported for the sulfur-based PAH testing dataset constitute a promising feature as regards the predictive capabilities of AOM calculations.

IV. Conclusions
In this work we have reparametrized the analytic overlap method (AOM) for ultrafast electronic coupling calculation, we extended the parametrization to heteroatoms commonly found in semiconducting organic molecules, namely nitrogen, oxygen, fluorine and sulfur, and we made AOM compatible with DFT electronic coupling calculations that employ Gaussiantype basis sets. The original AOM implementation relied on unoptimized values for oxygen and sulfur, taken directly from the work of Clementi and Raimondi [38], and a parametrized value for nitrogen based only on the pyrrole molecule; hence the need for (re)parametrization in this work. The good linear correlation between frontier orbital overlap and DFT electronic coupling that we reported in our first paper [23] is retained for the much larger and more diverse database investigated in this work. The couplings of the vast majority of the 921 dimers investigated here can be well described by a single linear fit, i.e. by a single constant ̅ . The transferability of the revised parametrization was tested on a set of application relevant sulfurcontaining PAHs that was not included in the parametrization. The error was not larger (in fact, smaller) than for the training set, which makes us confident that transferability is a general feature of AOM.
The major advantage of AOM is the sheer speed with which electronic couplings can be estimated as it requires only the analytic evaluation of overlap integrals using a small number of Slater type orbitals. One AOM coupling calculation on a dimer of a 152-atom molecule (e.g. a porphyrin dimer) typically takes 14 milliseconds on a single compute core, whereas the explicit POD electronic coupling calculations takes 527 seconds on 144 cores (Intel Xeon CPU E5-2630 v3 @ 2.40GHz), a speed-up of a factor of 10 4 -10 5 and a saving in computing resources of a factor of 100.
The error of AOM, while reasonably small given the large chemical space and range of coupling values covered, can be reduced by parametrizing the linear relation, i.e. the ̅ constant, specifically for a given molecule in question. This usually requires only a small number of dimer configurations, typically not more than a few tens [34,35]. Another approach to reduce the error, which could be investigated in future work, is to use state-of-the-art machine learning methods (ML) to estimate and correct the error between AOM and DFT Accepted to J. Chem. Phys. 10.1063/5.0076555 20 coupling estimates (ΔML). Compared to ML methods, AOM has the advantage that it already knows the complicated nodal shape of the FMOs, which determines electronic coupling, so this important feature no longer needs to learned. Hence, we expect that ΔML will need a much smaller number of reference data than ML models alone.
The AOM reparametrization procedure employed here is based on a well-defined derivative-free optimization scheme, the Nelder-Mead simplex method, and an appropriate choice of the objective functions. As a result, the method can be easily extended to cover additional application-relevant heteroatoms, such as other peripheral halogen atoms or PAH core boron, phosphorus, and heavy chalcogen additions -provided the existence and curation of appropriate training molecular datasets.
Finally, the new code for GTO-to-STO projection and for calculation of AOM electronic couplings is publicly available at https://github.com/oziogos/pyAOMlite, along with all reference single molecule and dimer information needed to validate the results presented in this work.

Supplementary material
Refer to the supplementary material for a full list of CCDC deposition numbers for all molecules belonging to the AOM sulfur-based PAH testing set.