TBMaLT, a flexible toolkit for combining tight-binding and machine learning

Tight-binding approaches, especially the Density Functional Tight-Binding (DFTB) and the extended tight-binding schemes, allow for efficient quantum mechanical simulations of large systems and long-time scales. They are derived from ab initio density functional theory using pragmatic approximations and some empirical terms, ensuring a fine balance between speed and accuracy. Their accuracy can be improved by tuning the empirical parameters using machine learning techniques, especially when information about the local environment of the atoms is incorporated. As the significant quantum mechanical contributions are still provided by the tight-binding models


I. INTRODUCTION
Many recent technological advancements can be attributed, at least in part, to the discovery and development of novel functional materials. Computational simulation methods have become an important tool in the field of materials research in which the ability to explore configurational space and to accurately predict material properties is of paramount importance. Ab initio methods, such as Density Functional Theory (DFT), are commonplace as they offer a comparably high degree of accuracy and provide access to the electronic structure. However, their high computational cost makes them ill-suited to tackle problems requiring large time and size scales, such as molecular overlayer formation. While classical force field methods are able to reach the required scales at a reasonable cost, they do not provide access to electronic structure and are commonly unable to capture the full complexity of the potential energy surface; particularly when metals are involved. Semi-empirical methods are commonly referred to as "bridging" The Journal of Chemical Physics ARTICLE scitation.org/journal/jcp methods due to their ability to reach size and time scales that are off-limits to traditional ab initio methods while retaining electronic structure. Unfortunately, the parameter sets upon which such methods commonly rely require a non-trivial degree of effort to create and might suffer from limited transferability for complex systems. However, by leveraging the power of machine learning (ML), one could augment existing bridging methods to improve their transferability, reduce the complexity of creating parameter sets, and even introduce reactivity. The field of machine learning has experienced steady growth over the past two decades and has recently exploded in popularity due to the accessibility afforded by frameworks, such as TensorFlow 1 and PyTorch. 2 Machine learning has been used with great success to tackle problems in a variety of fields, with chemistry, physics, and materials science being no exception. Several groups demonstrated successful and promising approaches for combining various levels of materials simulation methods with machine learning, such as in Refs. 3-9. We will omit an in-depth discussion of machine learning and its successes within the field of materials science in this paper, as detailed reviews 10-12 on the subject already exist, and the purpose of this publication is not to demonstrate any novel methods but to provide an overview about the current capabilities of the Tight-Binding Machine Learning Toolkit (TBMaLT) software package.
Applications of machine learning to materials modeling can be loosely placed into one of three categories: (i) assistive, (ii) augmentative, and (iii) substitutive. Assistive implementations are those which do not directly modify the method to which they are applied but rather fulfill an ancillary role, such as aiding the creation of new parameters. 13,14 Those which directly modify or replace parts of or add new components to their parent method are deemed to be augmentative; some examples being those replacing traditional parameterizations with neural networks or those offering alternative routes to the chemical Hamiltonian. 13,15,16 Novel ML-based approaches that are not templated off of any standard simulation method are labeled as substitutive as they are used in place of traditional methods; methods that directly predict specific properties, or groups thereof, are also included in this category. While such categorization is entirely artificial and somewhat subjective, it does highlight the notion that many approaches are based on established methods. Thus, assistive and augmentative approaches could, in theory, make use of existing codes to reduce the work required to implement them. When working with non-ML-based augmentations, one may directly modify existing codes to add new functionality and methods. However, this is not always viable for ML-based augmentations.
While novel ML-augmented methods are an attractive prospect, they can require a disproportionately large degree of effort to fully implement when compared to their non-ML counterparts. This is because the majority of production-level simulation packages were never designed to accommodate machine learning, and thus lack associated features, such as batch operability and automatic gradients. Therefore, existing code cannot easily be reused to reduce the work required, and one cannot simply generate and test a new method without first having to create and test a framework within which the new method can be used. With the Tight-Binding Machine Learning Toolkit (TBMaLT), 17 we would like to introduce an ML-capable framework that offers tested and documented, ready-to-use components to facilitate the rapid prototyping and development of new tight-binding models. It is hoped that the presence of such a framework will not only reduce the work required to implement, validate, and share new models but also aid in reducing duplicated efforts.
To keep the scope of the project to a reasonable size, the decision was made to limit the focus of TBMaLT exclusively to tightbinding type methods. Tight-binding theory [such as in Density Functional Tight-Binding (DFTB) 18,19 or in Extended Tight-Binding (xTB) 20,21 ] was chosen because it lends itself well to the ideas of machine learning as it tends to focus on speed and scalability without giving up access to the electronic band structure. It is important to note that the purpose of this body of work is not to present a novel tight-binding model but rather to highlight the existence and capabilities of the TBMaLT framework.

II. STRUCTURE AND DESIGN
The main driving force behind TBMaLT's development was the common need for a testbed within which novel, commonly machine learning-based, tight-binding methods could be implemented without having to repeatedly write and test all the associated boilerplate code. Such a framework could, in addition to helping to reduce a potential source of duplicated effort, be employed to reduce the complexity associated with parameterizing currently existing models by automating much of the fitting process. The primary design philosophy behind this project emphasizes flexibility and usability over raw computational efficiency. TBMaLT's intended use case deviates from traditional modeling packages in that it is intended to facilitate the rapid prototyping, development, and validation of new models rather than the production-level use of existing ones.
The TBMaLT framework was written in Python as it is a highly accessible language, which lends itself well to rapid prototyping, which is prolifically used in the field of machine learning. 22 Critical machine learning functionality, such as automated analytical gradients, was provided through the use of the PyTorch package. 2 PyTorch was selected over other common machine learning frameworks due to its ease of use, simple pythonic syntax, and flexible nature. TBMaLT has been written in a modular and plug-and-play manner. This reduces the complexity associated with implementing new models and keeps users from having to modify the source code to do so. This has the added benefit of allowing new components to be used beyond their initial scope as components from different models could be arbitrarily combined to create new ones.
TBMaLT supports batch operability through the use of packed padding, which involves expanding a set of n rank k arrays to a common size and concatenating them into a single rank (k + 1) array. The padding values are commonly set to zero or one, depending on usage. This allows for operations to be vectorized and optimized for graphics processing unit (GPU)/single-instruction multiple data (SIMD) execution. The low memory efficiency of this method can commonly be offset via sparse tensors. However, it should be noted that the undeveloped nature of sparse tensors in PyTorch, at the time of writing, prevents their current use. Lorentzian and conditional broadening were used to alleviate the gradient stability issues with performing eigendecompositions on systems with degenerate The Journal of Chemical Physics ARTICLE scitation.org/journal/jcp eigenspectra, such as an equilibrated CH 4 molecule. 23,24 Conversion from the generalized to the standard eigenvalue problem was achieved via Cholesky decomposition as it produced more stable gradients than Löwdin orthogonalization. Self-consistency cycles, such as the self-consistent-charge (SCC) cycle of DFTB and xTB, are then performed (if applicable to the method), during which systems are sequentially culled from the batch as they converge to prevent over-convergence. The default behavior in TBMaLT is to track the gradients during the SCC cycles using PyTorch's autograd engine, resulting in accurate gradients with respect to the input features and accurate forces. All results presented in the main paper were obtained this way. Depending on the system sizes, however, the overhead caused by the gradient tracking might become substantial. Different strategies can be applied in TBMaLT to reduce this overhead. One might calculate the converged charges (at a given value of the input feature vector) for each system outside of the purview of PyTorch's autograd engine and use those charges as input when building the charge dependent Hamiltonians (with gradient tracking) and omit any further SCC cycles. Keeping those input charges constant for several epochs, as first implemented by Yaron and co-workers, 13 helps to reduce the computational complexity associated with the gradient computation. Nevertheless, the input charges must be updated regularly (e.g., every tenth epoch as in Ref. 13) to take their change due to the changes in the input feature vector into account. Alternatively, in systems where one expects the converged charges to have only a weak dependence on the input feature vector (e.g., in mostly homogeneous systems), one might also carry out the external SCC cycles without the autograd feature for each input feature vector. In contrast to the previous approach, this yields continuous error functions without jumps, but the obtained gradients will always deviate from the exact ones due to the missing derivative of the converged charges with respect to the input feature vector. If the deviation is small, the optimization might, nevertheless, converge to a solution close to the one obtained with exact gradients (as demonstrated for rattled bulk silicon structures in Fig. S1).
In a typical workflow, a base Calculator is selected and initialized by supplying it with the requisite Feed objects. The Feed objects are optimizable entities that are responsible for supplying the necessary inputs to a given calculation. The Calculator is then tasked with interrogating these feeds and using the resulting data to compute various properties as and when they are requested. During initialization, the Calculator may also be provided with calculatorspecific settings, such as those that may pertain to charge mixing, finite temperature, and so on. Once initialized, the Calculator is given a target system, or batch thereof, and is then itself interrogated to retrieve the desired properties. These properties can then be compared to those computed via the reference method, and a loss can be calculated. The PyTorch autograd engine can then be invoked, and the resulting gradients are used to update the properties contained within some or all of the feeds. A schematic depiction of this process is provided for the SCC-DFTB case in Fig. 1. The method can be modified by selecting a different choice of Calculator or Feed objects. For example, the SCC-DFTB implementation displayed in Fig. 1 can be morphed into xTB by swapping out the Hamiltonian and overlap Feed objects. The modular nature of TBMaLT means that in many cases, the Feed objects can themselves be modified or otherwise augmented to change the nature of the method.

III. METHODOLOGY
The TBMaLT framework currently offers two different tightbinding approaches: the DFTB and the xTB methods. In the following, we give a concise summary of these two formalisms in order to supply the necessary theoretical background for the applications described in Sec. IV.

A. DFTB
The density functional tight-binding method 18,19 is an approximation to density functional theory (for a review, see Ref. 25). It starts from the Kohn-Sham ansatz, where the electron density ρ(r) is represented as a sum of independent particle-densities, where the sum runs over all one-electron states and fi is the occupation number of state ψ i . Writing the density as a deviation with respect to a reference density, ρ(r) = ρ 0 (r) + δρ(r), the total energy is expanded up to second (or sometimes third) order and is traditionally collected into three energy terms, the repulsive energy Erep, the band energy E band , and the charge transfer related second order energy E 2 . The reference density is composed as a sum of the (compressed) atomic densities ρ 0 (r) = ∑ A ρ A 0 (r) from all atoms A in the system. The last two terms in Eq. (2) are explicitly calculated with several approximations, while the repulsive energy is used to compensate for the accuracy loss due to those approximations. In the traditional DFTB approach, the repulsive energy is written as a sum of pairwise interactions, where the sum runs over all atom pairs in the system, and the two-body contributions E sp(A) sp(B) rep depend on the species of the two atoms A and B, respectively, and their distances RAB. The two-body contributions are usually obtained by a fitting procedure, requiring DFTB total energies and/or forces to reproduce corresponding reference quantities (typically originating from ab initio calculations).
The band-structure energy can be written as where the one-electron Hamiltonian H = − 1 2 ∇ 2 + V eff contains the kinetic energy and the Kohn-Sham effective potential V eff calculated at the reference density ρ 0 (r). The one-electron wavefunctions are expanded into a linear combination of atomic orbitals which leads to the generalized eigenvalue problem with Hμν = ⟨ϕ μ |H|ϕ ν ⟩ and Sμν = ⟨ϕ μ |ϕ ν ⟩ being the Hamiltonian and overlap matrices, respectively, in the basis of the atomic orbitals. The Hamiltonian matrix element Hμ A ν B between the atomic orbitals ϕ μ on atom A and ϕ ν on atom B is approximated as where εμ is the electron eigenenergy (onsite energy) belonging to orbital ϕ μ in a free atom. Due to the two-center approximation in the matrix elements between different atoms (A ≠ B), the actual Hamiltonian (and overlap) matrices can be instantaneously constructed from pre-computed distance-dependent integrals for specific orbital orientations (ss σ , sp σ , pp σ , pp π , etc.) by applying the Slater-Koster transformations. 26 One typically uses a minimal basis with maximal one polarization orbital in DFTB. In order to improve the accuracy, the basis functions and the atomic densities are obtained by solving the atomic Kohn-Sham equation for an atom in a confinement potential V conf , Often, the confinement potential is chosen to be a harmonic potential V conf (r) = (r/r 0 ) 2 and different values for the compression radius r 0 are used when calculating the compressed atomic orbitals ϕ λ and the compressed atomic density ρ 0 I (in separate calculations) for a given atom type I.
Finally, the charge fluctuation-dependent energy term is written as a Coulomb-like interaction of the gross Mulliken monopole charges on the atoms where the coupling between the charges q A and q B on atoms A and B depends on the distance RAB between the atoms and on the second derivative of the energy of the free DFT atom with respect to the charge, UI = ∂ 2 E atom I /∂q 2 I (chemical hardness). The second term in Eq. (10) is short-ranged. The Coulomb term is long-ranged, though, and needs special summation techniques (e.g., Ewald summation 27 ) when calculated for periodic systems. The charge fluctuations also contribute to the Hamiltonian as and consequently also to the resulting electronic structure. Since the Mulliken population depends on the solution of Eq. (6), a self-consistent procedure is needed to obtain the final quantities of a calculation, similar to the self-consistent field (SCF) in DFT calculations.

B. xTB
The extended tight-binding (xTB) method is rooted in the same energy expansion as DFTB but contains different energy expressions and approximations and employs a mostly atom-wise parameterization strategy. Note that the following expressions are restricted to the first iteration of the xTB method (GFN1-xTB) as TBMaLT currently implements the GFN1-xTB Hamiltonian.
The GFN1-xTB total energy reads as whereby the zeroth order energy expansion is given by a repulsion term Erep approximated by an atom-pairwise screened Coulomb potential, 28 and a dispersion term E disp given by the established DFT-D3(BJ) dispersion correction scheme. 29,30 For the first order energy, an extended Hückel-type (EHT) term E EHT is employed, where H 0 and P denote the core Hamiltonian and the density matrix, respectively. This energy term is the analog of E band in Eq. (2).

ARTICLE scitation.org/journal/jcp
Interatomic electrostatic interactions E el originate from the second and third order energy expansion and are expressed with orbitalresolved Mulliken charges q μ similar to DFTB. In fact, E 2 el closely resembles Eq. (9), replacing the Coulomb interaction with the analytical Mataga-Nishimoto-Ohno-Klopman kernel 31-33 where favg denotes the (harmonic) average of the atomic Hubbard parameters U A,l and U B,l ′ , and g is a free parameter. The third order term E 3 el includes the diagonal elements of the charge derivative of the Hubbard parameter Γμ, improving the description of highly charged systems. 34 A classical halogen bond correction E XB is added to remedy the poor description of halogen bonds due to the isotropic point charge approximation. Finally, the electronic free energy G Fermi is added. 35 This term arises from fractional occupations generated by Fermi smearing in order to handle static correlation cases. Variational minimization of the total energy expression yields the tight-binding Hamiltonian matrix elements Hμν for the generalized eigenvalue problem [Eq. (6)]. The Hamiltonian can be separated in a core Hamiltonian H 0 stemming from E EHT and a charge-dependent Hamiltonian H 1 derived from E el : H = H 0 + H 1 .
Similar to DFTB, the extended tight-binding (xTB) Hamiltonian is defined in a valence-only but partially polarized minimal basis set of spherical Gaussian-type orbitals (GTO). The GTO contraction coefficients are obtained by expanding Slater-type orbitals (STO) using the STO-nG scheme. 36 The core Hamiltonian matrix elements are obtained in an EHT-like fashion by where Sμν denotes the overlap matrix elements, X EN an electronegativity-dependent term, and F scale an elaborate scaling function. The latter depends on the angular momenta of the shells and the distance of the atomic centers and consumes the majority of the xTB parameters. The diagonal elements H μμ/νν are computed from parameterized, shell-resolved atomic level energies hμ and their local environment described by the coordination number CNμ as with the scaling factor k CN μ being an adjustable parameter. The charge-dependent Hamiltonian takes a similar form as DFTB's charge fluctuation Hamiltonian [Eq. (11)] but extends it by the third order term Correspondingly, and similar to DFTB, xTB also requires a selfconsistent procedure. The original implementation of xTB, including a more detailed description of the energy expressions, can be found in Refs. 20 and 21.

IV. RESULTS
In this section, we present the first proof-of-concept showcases demonstrating the current capabilities of the TBMaLT framework. We include both, DFTB and xTB applications, in order to emphasize the Hamiltonian-type agnostic structure of the framework. Since the applications described below are meant to serve as simple showcases only, we have chosen rather small training set sizes (which we have found to be suitable for those cases 37 ), but omitted a detailed systematic study on the influence of the training set choice on the final results.

Global and local parameter optimization
The traditional parameterization technique in DFTB involves an initial choice of atomic orbitals obtained from confined atomic calculations, which are then used to calculate the two-center approximated Hamiltonian integrals and the overlap integrals, as shown in Eqs. (5), (7), and (8). Within the TBMaLT framework, we have tested three different optimization strategies that have the potential to improve this procedure.
In the first approach, we carry out a global optimization of the compression radii of the species-dependent confinement potentials [V conf in Eq. (8)] and the onsite energies [εμ in Eq. (7)]. This way, we obtain a "global" basis set for every species adapted to the training set. This approach is similar to the traditional parameterization approach but allows for an optimized choice of the basis orbitals and the onsite energies.
The second approach is similar, but we represent the distancedependent two-center integrals of Eq. (7) and the similar overlap integrals with splines and optimize the spline coefficients directly. This method is similar to the approach followed by Yaron and co-workers 13 and yields globally optimized integral tables for the corresponding species pairs. Unfortunately, by optimizing the twocenter diatomic integrals directly, one loses the concept of a welldefined basis. This inevitably leads to inconsistencies between the obtained Hamiltonian and overlap matrices, although those can be limited by starting the optimization from a good and consistent initial guess. Furthermore, one cannot represent calculated quantities (such as electron density or eigenstates) as a function of space anymore.
In our third approach, we reintroduce the concept of the welldefined basis by predicting individual confinement radii V conf and onsite energies εμ for each atom based on its local environment. This way, the basis becomes atom-dependent instead of atom-type dependent, as in the traditional approach. The Hamiltonian and the overlap matrices can be consistently calculated within this basis, and all obtained quantities have a well-defined real space representation.

ARTICLE scitation.org/journal/jcp
In all three approaches, the optimization is driven by using the loss function where N describes the number of systems in the training dataset, and m is the number of physical properties taken into account. P DFT and P DFTB are the target physical property values from the reference DFT and the DFTB calculations, respectively, and ωj is the weight associated with a given physical property.
For the training procedure, we used a subset of the ANAKIN-ME dataset, 38 where every molecule contained one heavy atom only (ANI-1 1 ). From this subset, we have randomly selected 1000 and 400 molecules for the training and test sets, respectively, for all three optimization methods. In order to reduce the bias in our training, we have carried out three independent training runs for each optimization method, selecting the molecules of the training and test sets randomly from the ANI-1 1 dataset each time. The reference DFT data were calculated using the FHI-aims code. 39 We have used the Perdew-Burke-Ernzerhof (PBE) functional 40 with the "tight" basis set. For the local basis optimization, we have used atomcentered symmetry functions (ACSFs) 41 to represent the chemical environment of the atoms in the ML input and the random forest method as the ML algorithm. 42 The ACSFs have been implemented in TBMaLT, while Scikit-learn 43 has been used to perform the random forest predictions. (Further details about the ML settings can be found in the supplementary material.) Figure 2 shows the result of the optimizations when using the dipole moment as the only target property in the loss function (m = 1). The reported mean absolute errors (MAEs) are the averages of the MAEs calculated over the test sets in each of the three training runs. For comparison, we also report the performance when using the mio-1-1 parameter set. 19 All three optimization techniques show a significant improvement, with the local basis optimization model performing best.
A more detailed discussion of the three different optimization techniques and further use cases can be found in Ref. 37.

Training on the density of states for periodic systems
The TBMaLT framework can also treat periodic systems and allows for their inclusion into optimization and training scenarios (currently with the DFTB Hamiltonian only). This should be demonstrated by the following showcase, where DFTB parameters were trained on bulk silicon structures in order to improve the density of states (DOS) description. Unlike discrete properties, such as Mulliken charges or dipole moments, the DOS is represented by a continuous distribution in a certain energy range. Therefore, the Hellinger distance 44 was employed as a loss function to evaluate the difference between pairs of distributions as it enables learning not only on the absolute differences but also on the trends of the curves. By using a similar strategy as discussed in Sec. IV A 1, the diatomic two-center integrals were represented by splines, which were optimized in order to obtain the Hamiltonian and the overlap matrices yielding an improved DOS for the training dataset.

FIG. 2.
Mean absolute error of dipole moments from DFTB calculations with respect to reference DFT calculations. The Hamiltonian and overlap matrices of the DFTB calculations have been constructed using the mio − 1 − 1 parameter set (blue), the globally optimized basis model (yellow), the environment dependently optimized basis model (green), and the globally optimized splines of the diatomic two-center integrals (red). The values with the corresponding error bars represent averages over the test sets of three independent training runs, with the training and test set sizes of 1000 and 400, respectively.
First, we generated a database containing 50 randomly rattled silicon bulk systems with supercells of 64 atoms [an example is shown in Fig. 3(a)]. DFT calculations employing the HSE06 hybrid functional 45 with a screening parameter of 0.11 a −1 0 were carried out with the FHI-aims code 39 using the "tight" parameter set. During the training process, the siband-1-1 parameter set 46 was used as the starting point for the optimization of the spline representations of the DFTB diatomic Hamiltonian and overlap integrals, which were optimized simultaneously. In order to ensure that the DFT and DFTB DOS results are comparable, the DOS was calculated by applying identical Gaussian broadening to the eigenvalues in both cases. The same energy ranges (from −3.3 eV to +1.6 eV relative to the Fermi levels) were used to sample the DOS curves and to calculate the loss between the DFT and the DFTB results. To reduce the influence of sampling on the training data, 30 systems were randomly selected and partitioned into three distinct training sets, yielding three independent training runs. The remaining 20 unseen systems were then used to test the model. Figures 3(b) and 3(c) compare the DOS results with the DFT references on the test dataset in the three training runs before and after the ML optimization of the DFTB parameters. The ML optimization leads to more accurate results, reducing the average Hellinger and MSE losses from 10.8 and 19.4 to 6.2 and 5.4, respectively. It is also clear that the results after the ML optimization fit the DFT references better than the original DFTB calculations using the siband-1-1 parameter set.
Additional tests were also performed to evaluate the transferability of the model; i.e., whether it makes reasonable predictions for larger systems after having been trained on small ones. This was done by training models exclusively on the small 64-atom systems and evaluating them on larger 512-atom systems. In order to reduce the computational efforts, training and testing references were obtained by using the GGA-PBE functional 40 in this case. As shown in Fig. 3(d) shift of the DOS curve below the Fermi level and enhance the accuracy above the Fermi level, making a total improvement of 34% and 66% as measured by the Hellinger loss and the MSE loss, respectively.

B. xTB
This section illustrates the possibility of using other tightbinding schemes within the TBMaLT framework. For this purpose, we employ the GFN1-xTB Hamiltonian to improve its parameterization for applications on specific chemical spaces. The Hamiltonian is integrated into the TBMaLT workflow employing a GFN1-xTB adapter. 47 Prior to further analysis, it is verified that the xTB Hamiltonian reproduces the number of electrons as well as the orbital occupancies within TBMaLT.
The optimization procedure employs gradient descent with the Adam optimizer 48 and an MSE loss function. The atomic charges obtained from SCC-DFTB calculations are used as the target quantity. The corresponding DFT reference charges are obtained as described in Sec. IV A. Using the ANAKIN-ME subset ANI-1 1 with a single heavy atom (C, N, and O), Hamiltonian-dependent parameters (scaling of electronegativities k EN and shell parameters) as well as element-specific scaling parameters (elemental pair parameters k pair ) are optimized. For training purposes, we utilize 50 samples and test on 1000 samples, both containing an equal ratio of molecular species. Despite the minimal split, the chemical space of the dataset is well represented due to the choice of only a single heavy atom. Moreover, it facilitates only minor deviations from the original general-purpose GFN1-xTB parameterization and demonstrates the general robustness of the fit, being able to transfer to a multitude of samples not included in the fit. However, since the study is intended as a proof-of-principle, one should notice that the chemical variety in the employed dataset is rather limited. Figure 4 shows the performance (MAE) of the default and various reoptimized GFN1-xTB parameterizations with respect to the DFT reference values on the ANI-1 1 dataset. For training   FIG. 4. Comparison of reparameterizations of different parameters on ANI-1 1 data. Hereby, solely atomic charges are utilized as the fit target. Furthermore, the change in nominal value of dipole moments for the new parameterizations is given to demonstrate the transferability of our approach. The deviation is given by the MAE to the reference method. The optimized parameters comprise the element pair-specific scaling parameter k H− H pair (yellow), the electronegativity scaling parameter k EN (red), as well as all global Hamiltonian-related parameters of GFN1-xTB including k EN (green). In particular, the reparameterization based on electronegativities k EN shows a significant improvement over the existing default parameterization.

ARTICLE
scitation.org/journal/jcp on the atomic charges (Fig. 4, left), reparameterizations based on Hamiltonian-(red, green) and element-specific (yellow) parameters lead to a significant improvement over the initial parameterization (blue). As refitting all Hamiltonian-based parameters grants the most flexibility to the optimization, the resulting reparameterization expectedly achieves the largest improvement of ∼44%. Considering the only slightly worse performance (∼40% error decrease) of the single-parameter fit of the electronegativity scaling k EN (red), however, it becomes apparent that this parameter is most important and mainly drives the Hamiltonian fit. This is due to the fact that the original parameterization is designed to give reliable results over a wide range of chemical space. Since this study targets small molecules with limited binding motifs and simple structures, the electronic properties are optimized for individual heavy atoms. Furthermore, it becomes evident that the reparameterization based on the atomic charges also has a beneficial effect on other properties, as exemplified by the dipole moments (Fig. 4, right). This demonstrates the possibility of improving the overall accuracy for specific use cases by means of a GFN1-xTB reparameterization.

V. SUMMARY
Combining simple tight-binding models with machine learning is a very promising approach for materials simulations. Machine learning can improve the accuracy of such models significantly while having a well-defined physical model provides the advantage of an easier interpretation of the simulation results as compared to approaches, where physical quantities are predicted "directly" via machine learning without an underlying classical or quantum mechanical model. Also, the training procedure can be expected to be less involved when the physics is provided by the tight-binding model. Derivation of the ground state and excited state quantities not considered during training is straightforward in these approaches, and their transferability should be considerably better as of the direct approaches, provided the physical behavior is qualitatively well described by the chosen model.
We have started the development of the open-source software package TBMaLT, 17 which provides a toolkit for combining machine learning approaches with atomistic simulation models. Our main goal is to provide a framework for the rapid prototyping of such hybrid approaches and for testing their capabilities and limits. The toolkit has been written using the PyTorch framework, providing backpropagated gradients out of the box. TBMaLT's overall structure is designed to be physical model agnostic, opening the possibility for implementing and testing arbitrary (atom-based) approaches. Although the development only started recently, the package already offers the DFTB model (as a built-in module) and the xTB model (via a connector to an external PyTorch-based xTB implementation). Further models can be implemented or integrated easily. TBMaLT comes with extensive unit tests and a well-documented application programming interface.
We have presented a few proof-of-concept showcases, which demonstrate the viability of such hybrid approaches as well as the current capabilities of the TBMaLT framework. Our project's development follows the usual open-source development workflow and is open to external contributors. We hope that the software package will be useful for the scientific community and will help to catalyze the development of faster and more accurate materials simulation methods in the near future.

SUPPLEMENTARY MATERIAL
The supplementary material contains further details to the comparison of the approximative external SCC cycle technique with the exact gradient calculation (as discussed in Sec. II) for a rattled silicon structure and additional details about the machine learning settings used in Sec. IV A 1. Additionally, a dataset containing the geometries and the ab initio reference data used for the show case applications in Sec. IV is provided.