Accurate, Affordable, and Generalisable Machine Learning Simulations of Transition Metal X-ray Absorption Spectra using the XANESNET Deep Neural Network

The affordable, accurate, and reliable prediction of spectroscopic observables plays a key role in the analysis of increasingly-complex experiments. In this Article, we develop and deploy a deep neural network (DNN) – XANESNET – for predicting the lineshape of first-row transition metal K-edge X-ray absorption near-edge structure (XANES) spectra. XANESNET predicts the spectral intensities using only information about the local coordination geometry of the transition metal complexes encoded in a feature vector of weighted atom-centred symmetry functions (wACSF). We address in detail the calibration of the feature vector for the particularities of the problem at hand, and we explore the individual feature importances to reveal the physical insight that XANESNET obtains at the Fe K-edge. XANESNET relies on only a few judiciously-selected features – radial information on the first and second coordination shells suffices, along with angular information sufficient to separate satisfactorily key coordination geometries. The feature importance is found to reflect the XANES spectral window under consideration and is consistent with the expected underlying physics. We subsequently apply XANESNET at nine first-row transition metal (Ti–Zn) K-edges. It can be optimised in as little as a minute, predicts instantaneously, and provides K-edge XANES spectra with an average accuracy of ca. ± 2–4% in which the positions of prominent peaks are matched with a > 90% hit rate to sub-eV (ca. 0.8 eV) error.


I. INTRODUCTION
Wherever there are valuable data to be predicted, processed, labelled, or mined, one is guaranteed to find machine learning models working autonomously and leveraging recent advances in the accessibility of hardware and software optimised for the task at hand. [46][47][48][49][50][51][52][53][54][55] It ought to be of no great surprise that spectroscopy – already in renaissance following fast-paced developments in methodology and instrumentation, especially at high-brilliance light sources [56][57][58][59][60] – should also be simultaneously transformed by machine learning. 61 Indeed, the two are a natural pairing; machine learning is similarly grounded in linear mathematics (e.g. least-squares regression) and probability (e.g. maximum-likelihood parametric estimation) – concepts that are familiar to experimental spectroscopists. With the popularity of emergent spectroscopies on an upward trajectory, resulting increasingly in situations where new methods and new users are brought together, machine learning offers a route to affordable and accurate "out-of-the-box", "limited-expertise-required" analyses.

a) Electronic mail: conor.rankine@ncl.ac.uk
b) Electronic mail: tom.penfold@ncl.ac.uk
The prediction of spectroscopic observables – a paradigmatic "forward" mapping – is a central objective of computational chemistry for spectroscopists as it serves as a conduit between experiment and theory. Achieving a detailed understanding of the properties of a molecule/material on the atomic level via simulations is often the key to understanding and explaining experimentally-observed phenomena; ultimately, it is also the key to harnessing them in practical applications. The challenge lies in making the calculations capable of capturing satisfactorily the complexity of the phenomena while simultaneously accurate, affordable, and generally-applicable enough to appeal to users. It transpires – unsurprisingly – that this is a tall order indeed! 4]104,105 The latter approach, which we adopt in this Article and elsewhere where we have worked with machine learning models for XAS in theoretical 71 and practical 73,74 settings, circumvents the formidable challenge of predicting the huge number of resonances around the X-ray absorption edge. 106 Sitting alongside the well-developed theory for XAS (e.g. multiple scattering theory, multiplet theory, and Bethe-Salpeter k-space approaches, plus extensions of popular ab initio quantum chemical strategies), 106 machine learning models for fast "forward" XAS mappings are well placed to unlock affordable analyses in particularly challenging cases, e.g. 0][121][122][123][124] In these cases, many configurations need to be sampled to simulate XAS with even qualitative accuracy, but the time- and resource-intensiveness of the individual calculations presently makes such treatments challenging. 106

In this Article, we build on our earlier proof-of-principle work in Ref. 71 to develop and deploy a deep neural network (DNN) 125 – XANESNET (Fig. 1) – for predicting the lineshape of first-row transition metal K-edge X-ray absorption near-edge structure (XANES) spectra. XANESNET predicts the K-edge XANES spectral intensities using only information about the local coordination geometry of the transition metal complexes. We address in detail the calibration of the feature vector that encodes this information for the particularities of the problem at hand, and we explore the individual feature importances to reveal the physical insight that XANESNET provides at the Fe K-edge. We subsequently transfer XANESNET to nine first-row transition metal (Ti–Zn) K-edges, where we benchmark predictive power and performance.

A. Datasets
FIG. 1. A schematic of the XANESNET DNN and workflow detailed in this Article. The local geometries around first-row transition metal X-ray absorption sites (I; "samples", Section II A) are inputs, and the corresponding theoretically-calculated K-edge XANES spectra (II; "labels", Section II C) are outputs. The samples are encoded as descriptive feature vectors (III; Section II B 2) and associated with their labels to construct reference datasets from which the DNN (IV; Section II B 1) discovers a "forward" structure-to-spectrum mapping via iterative optimisation of the internal weights (V). We start in familiar territory at the Fe K-edge, and then extend the DNN across the first row of transition metals (Ti–Zn; VI).

Our reference datasets comprise X-ray absorption site geometries ("samples") of first-row transition metal (Ti–Zn) complexes harvested from the transition metal Quantum Machine (tmQM) dataset. 126,127 K-edge XANES spectra ("labels") for these structures were calculated using multiple scattering theory (MST) as implemented in the FDMNES 128,129 package (Section II C). We have developed nine independent reference datasets, one for each first-row transition metal (Ti–Zn) X-ray absorption edge; the number of samples contained in the reference datasets ranges from as few as ca. 1100 (V) to ca. 8660 (Ni). A summary of the number of samples contained in the reference datasets is given in the SI (Table S1).
We have made the reference datasets publicly available (see our Data Availability Statement for details).
250 samples from each reference dataset were isolated at random to form "held-out" testing datasets (evaluated post-optimisation only; Section III D). The remaining samples comprised the training and validation datasets used during optimisation (Sections III A–III C). The training and validation subsets were constructed "on-the-fly" throughout via repeated K-fold cross-validation with five repeats and five folds, i.e. a five-times-repeated 80:20 split.
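The five-times-repeated, five-fold split described above can be sketched as follows. This is a minimal illustration in plain Python rather than the pipeline used in this Article; the function name `repeated_kfold`, the seed, and the dataset size of 1000 samples are assumptions made for the example.

```python
import random


def repeated_kfold(n_samples, n_folds=5, n_repeats=5, seed=0):
    """Yield (training, validation) index lists for repeated K-fold cross-validation."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a fresh shuffle for every repeat
        # Fold f takes every n_folds-th shuffled index, starting at position f.
        folds = [indices[f::n_folds] for f in range(n_folds)]
        for f in range(n_folds):
            validation = folds[f]
            training = [i for g in range(n_folds) if g != f for i in folds[g]]
            yield training, validation


# Hypothetical dataset size: 1000 samples left once the held-out samples are removed.
splits = list(repeated_kfold(1000))
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # prints: 25 800 200
```

Each of the 25 splits trains on 80% of the samples and validates on the remaining 20%, and within any one repeat every sample is used for validation exactly once.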

1. Architecture
The architecture of the XANESNET DNN used in this Article is based on the deep multilayer perceptron (MLP) model and comprises an input layer, two hidden layers, and an output layer. All layers are dense, i.e. fully connected, and each hidden layer performs a nonlinear transformation using the rectified linear unit (relu) activation function. The input layer comprises N neurons (to accept a feature vector of length N encoding the local environment around an X-ray absorption site; Section II B 2), the hidden layers each comprise 512 neurons, and the output layer comprises 226 neurons from which the discretised K-edge XANES spectrum is retrieved after regression, i.e. XANESNET is a multi-output MLP with each output neuron corresponding to the spectral intensity at a given energy gridpoint.

The internal weights, W, are optimised via iterative feedforward and backpropagation cycles to minimise the empirical loss, J(W), defined here as the mean-squared error (MSE) between the predicted, $\mu_{\mathrm{predict}}$, and target, $\mu_{\mathrm{target}}$, K-edge XANES spectra over the reference dataset, i.e. an optimal set of internal weights, $W^{*}$, is sought that satisfies $W^{*} = \arg\min_{W} J(W)$.
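To make the feedforward pass and the MSE loss concrete, the sketch below implements a tiny dense network in plain Python. It is an illustration under stated assumptions, not the XANESNET code: the layer sizes are a toy stand-in for [N, 512, 512, 226], the initialisation is a placeholder, and the output layer is assumed to be linear.

```python
import random


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (placeholder initialisation for this sketch)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_out)] for _ in range(n_in)]
    return weights, [0.0] * n_out


def dense(x, weights, biases, relu):
    """Fully connected layer: out_j = sum_i x_i * W_ij + b_j, optionally followed by relu."""
    out = [sum(x[i] * weights[i][j] for i in range(len(x))) + biases[j]
           for j in range(len(biases))]
    return [max(0.0, o) for o in out] if relu else out


def forward(x, layers):
    """Apply relu on the hidden layers; the output layer is left linear (an assumption)."""
    for k, (weights, biases) in enumerate(layers):
        x = dense(x, weights, biases, relu=(k < len(layers) - 1))
    return x


def mse(predicted, target):
    """Mean-squared error between a predicted and a target spectrum."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


rng = random.Random(0)
sizes = [8, 16, 16, 5]  # toy stand-in for [N, 512, 512, 226]
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
features = [rng.random() for _ in range(sizes[0])]
spectrum = forward(features, layers)  # one intensity per output neuron
```

Each element of `spectrum` plays the role of the intensity at one energy gridpoint, and `mse` is the quantity that the optimisation drives down over the reference dataset.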
Gradients of the empirical loss with respect to the internal weights, $\partial J(W)/\partial W$, were estimated over minibatches of 32 samples, and the internal weights were updated iteratively according to the Adaptive Moment Estimation (ADAM) 130 algorithm. The learning rate for the ADAM algorithm was set to $1 \times 10^{-4}$. The internal weights were initially set according to the He 131 uniform distribution. Unless explicitly stated otherwise in this Article, optimisation was carried out over 512 iterative epochs.
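For readers unfamiliar with ADAM, a single update step can be sketched in plain Python as below. Only the learning rate of 1 × 10⁻⁴ comes from the text; the values of `beta1`, `beta2`, and `eps` are assumed (commonly used defaults), and the one-parameter quadratic loss is a toy example.

```python
import math


def adam_step(w, grad, m, v, t, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-7):
    """One ADAM update for a list of weights; t is the 1-based step counter."""
    # Exponential moving averages of the gradient and of the squared gradient.
    m = [beta1 * mi + (1.0 - beta1) * gi for mi, gi in zip(m, grad)]
    v = [beta2 * vi + (1.0 - beta2) * gi * gi for vi, gi in zip(v, grad)]
    # Bias correction compensates for the zero initialisation of m and v.
    m_hat = [mi / (1.0 - beta1 ** t) for mi in m]
    v_hat = [vi / (1.0 - beta2 ** t) for vi in v]
    w = [wi - lr * mh / (math.sqrt(vh) + eps) for wi, mh, vh in zip(w, m_hat, v_hat)]
    return w, m, v


# Toy problem: minimise J(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w, m, v = [0.0], [0.0], [0.0]
w_first, m, v = adam_step(w, [2.0 * (w[0] - 3.0)], m, v, t=1)

w = w_first
for t in range(2, 1001):
    w, m, v = adam_step(w, [2.0 * (w[0] - 3.0)], m, v, t)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the weight moves by roughly one learning rate towards the minimum regardless of the gradient magnitude.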
Regularisation was implemented to minimise the propensity for overfitting; batch standardisation and dropout were applied at each hidden layer. The probability, p, of dropout was set to 0.25.
The XANESNET DNN is programmed in Python 3 with the TensorFlow 132 / Keras 133 API and integrated into a Scikit-Learn 134 (sklearn) data pre- and post-processing pipeline via the KerasRegressor wrapper for Scikit-Learn. The Atomic Simulation Environment 135 (ase) API is used to handle and manipulate molecular structures. The code is publicly available under the GNU General Public License (GPLv3) on GitLab. 136

2. Featurisation
The local environments around X-ray absorption sites are encoded via dimensionality reduction using the weighted atom-centred symmetry function (wACSF) descriptor of Gastegger and Marquetand et al., 137 which builds on the generalised ACSF descriptor introduced by Behler 138,139 to overcome the unfavourable scaling as the number of atom types in the dataset grows. The recent review by Behler in Ref. 140 is strongly recommended to the unfamiliar reader. The wACSF descriptor (or "feature vector", G i) for an arbitrary absorption site, i, is constructed via concatenation of a "global" (G 1) wACSF, n radial (G 2; two-body) wACSF, and m angular (G 4; three-body) wACSF, i.e. it takes the form:

where n and m are chosen to cover satisfactorily the radial and angular space of the reference dataset and discriminate different atomic environments.
The G 1, G 2, and G 4 wACSF each take the forms:

where i, j, and k index atomic sites, $Z_i$ is the atomic number of the atom at site i, $r_{ij}$ is the interatomic distance between sites i and j, and $\theta_{jik}$ is the interatomic angle between sites j, i, and k. $f_c$ is a radial cutoff function (the cutoff set at some radial distance, $r_c$) that ensures that the wACSF vary smoothly and, ultimately, go to zero where $r_{ij} \geq r_c$; it takes the form:

The radial distance, $r_c$, supplied to $f_c$ has to be sufficiently large to include an appropriate number of nearest neighbours. From the perspective of an absorbing atom in X-ray spectroscopy, $r_c$ has to reflect the "field of view" (i.e. the maximum cutoff distance to which XANES is sensitive); for this reason, $r_c = 6.0$ Å throughout.
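As an illustration of how a cutoff function and radial wACSF of this kind behave, the sketch below uses functional forms that are common in the ACSF/wACSF literature. These forms are assumptions for the example and are not taken from this Article: a cosine cutoff, a G 1 term that sums the cutoff values, and a G 2 term that weights each Gaussian by the neighbour's atomic number. The neighbour list is hypothetical.

```python
import math

R_C = 6.0  # cutoff radius in Å, as stated in the text


def f_c(r, r_c=R_C):
    """Assumed cosine cutoff: decays smoothly and is exactly zero for r >= r_c."""
    if r >= r_c:
        return 0.0
    return 0.5 * (math.cos(math.pi * r / r_c) + 1.0)


def g1(distances):
    """Assumed 'global' term: the sum of cutoff values over all neighbours."""
    return sum(f_c(r) for r in distances)


def g2(distances, atomic_numbers, eta, mu):
    """Assumed radial term: Z-weighted Gaussians centred at mu, damped by the cutoff."""
    return sum(z * math.exp(-eta * (r - mu) ** 2) * f_c(r)
               for r, z in zip(distances, atomic_numbers))


# Hypothetical absorption site: six N neighbours at 2.0 Å plus one O atom at 7.0 Å.
distances = [2.0] * 6 + [7.0]
atomic_numbers = [7] * 6 + [8]

print(g1(distances))
print(g2(distances, atomic_numbers, eta=4.0, mu=2.0))
```

The atom at 7.0 Å lies beyond the cutoff and contributes nothing to either function, which is the "field of view" behaviour described above.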
η, µ, λ, and ζ are parameters that have to be calibrated. The effects of η and µ on the radial resolution and extent, and of λ and ζ on the angular resolution and extent, are illustrated in Fig. 2. The calibration of these parameters can be achieved manually or automatically – in the latter case, e.g., via an intelligent sampling/Bayesian approach, decomposition, or principal component analysis (PCA), 141 or using a genetic algorithm. 137 An alternative approach designed to work "out-of-the-box" is given by the suggested parameterisation strategy of Gastegger and Marquetand et al., described in Ref. 137. Here, one first defines an auxiliary radial grid, R, as a linearly-interpolated space of k points, r, between $r_{\mathrm{min}}$ and $r_{\mathrm{max}}$, and then obtains either "centred" (Fig. 2; upper panel) pairs of η and µ parameters via setting µ to zero in all cases and setting η as:

or "shifted" (Fig. 2; centre panel) pairs of η and µ parameters via setting µ to each point on the auxiliary radial grid and setting η as:

In the former case (Eq. 6), the wACSF are centred at the X-ray absorption site and differ in their radial extent. In the latter case (Eq. 7), their radial extent is constant and their centre shifts away from the X-ray absorption site, profiling the local environment in a series of concentric "shells".
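The two parameterisation schemes can be sketched as below. The expressions for η are assumptions based on the commonly quoted form of the wACSF strategy (η = 1/(2r²) for "centred" functions and η = 1/(2Δr²), with Δr the grid spacing, for "shifted" functions), and the grid limits of 0.5 to 5.5 Å with 32 points are illustrative guesses rather than values from this Article. With these limits, two of the shifted centres fall near the µ = 1.47 and 1.63 Å values quoted in Section III C.

```python
def linspace(r_min, r_max, k):
    """Auxiliary radial grid of k evenly spaced points between r_min and r_max."""
    step = (r_max - r_min) / (k - 1)
    return [r_min + i * step for i in range(k)]


def centred_parameters(grid):
    """(eta, mu) pairs with mu = 0; eta is set from each grid point (assumed form)."""
    return [(1.0 / (2.0 * r ** 2), 0.0) for r in grid]


def shifted_parameters(grid):
    """(eta, mu) pairs with mu on the grid; one shared eta from the spacing (assumed form)."""
    spacing = grid[1] - grid[0]
    eta = 1.0 / (2.0 * spacing ** 2)
    return [(eta, r) for r in grid]


grid = linspace(0.5, 5.5, 32)  # hypothetical limits in Å
centred = centred_parameters(grid)
shifted = shifted_parameters(grid)

print(round(shifted[6][1], 2), round(shifted[7][1], 2))  # prints: 1.47 1.63
```

In the centred set the Gaussians widen with increasing grid radius, whereas in the shifted set the width is constant and only the centre moves, matching the qualitative description above.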
G 4 wACSF additionally need to have λ and ζ parameters defined. Every pair of η and µ parameters is typically repeated for λ = ±1.0 to obtain a full 360° angular view, and each triple of η, µ, and λ parameters can optionally be repeated for a series of values of ζ to refine the angular resolution (Fig. 2; lower panel).
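The roles of λ and ζ can be illustrated with the angular prefactor that is standard in the ACSF literature, 2^(1−ζ) (1 + λ cos θ)^ζ. Its use here is an assumption for the example, since the G 4 expression itself is not reproduced in this text.

```python
import math


def angular_factor(theta_deg, lam, zeta):
    """Assumed angular prefactor: 2^(1 - zeta) * (1 + lam * cos(theta))^zeta."""
    cos_theta = math.cos(math.radians(theta_deg))
    return 2.0 ** (1.0 - zeta) * (1.0 + lam * cos_theta) ** zeta


# With lam = -1 the factor peaks at 180 degrees; larger zeta narrows the peak.
for zeta in (1, 2, 4, 8):
    at_180 = angular_factor(180.0, -1.0, zeta)
    at_90 = angular_factor(90.0, -1.0, zeta)
    print(zeta, round(at_180, 6), round(at_90, 6))
```

Under this assumed form, λ = −1 places the maximum at 180° and λ = +1 places it at 0°, so the pair covers the full angular range. Increasing ζ leaves the maximum unchanged while suppressing the response away from it (by a factor of two per unit of ζ at 90°).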
Unless explicitly stated otherwise in this Article, all G 2 wACSF were constructed according to the "shifted" scheme and all G 4 wACSF were constructed according to the "centred" scheme.

C. XANES Simulation
All first-row transition metal (Ti–Zn) K-edge XANES spectra were calculated using MST as implemented in the FDMNES 128,129 package. The spectral windows were set between −15.0 and +60.0 eV (relative to the X-ray absorption edges; see Table S1), and the absorption cross-sections were calculated in steps of 0.2 eV (i.e. 376 points). A self-consistent muffin-tin potential with a cutoff radius of 6.0 Å around the X-ray absorption site was used. The interaction with the X-ray field was described by the electric quadrupole approximation, and scalar relativistic effects were included.
The calculated absorption cross-sections were preprocessed via convolution with a fixed-width Lorentzian function (the width, $\Gamma_i$, depending on the X-ray absorption edge; see Table S1) and resampled via interpolation into 226 points.
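The gridding and preprocessing steps above can be sketched in plain Python as follows. The sketch assumes that the width is a full width at half maximum, that the interpolation is linear, and that the 226-point grid spans the same −15.0 to +60.0 eV window; the width of 1.0 eV and the single-line "spectrum" are hypothetical.

```python
import math


def linear_grid(start, stop, n_points):
    step = (stop - start) / (n_points - 1)
    return [start + i * step for i in range(n_points)]


def lorentzian_broaden(energies, intensities, width):
    """Discrete convolution with an area-normalised Lorentzian of FWHM `width` (eV)."""
    half = 0.5 * width
    step = energies[1] - energies[0]
    return [
        sum(mu * (half / math.pi) / ((e - e0) ** 2 + half ** 2) * step
            for e0, mu in zip(energies, intensities))
        for e in energies
    ]


def resample(x_old, y_old, x_new):
    """Linear interpolation onto x_new; x_old must be uniform and ascending."""
    step = x_old[1] - x_old[0]
    y_new = []
    for x in x_new:
        i = min(max(int((x - x_old[0]) / step), 0), len(x_old) - 2)
        t = (x - x_old[i]) / (x_old[i + 1] - x_old[i])
        y_new.append(y_old[i] + t * (y_old[i + 1] - y_old[i]))
    return y_new


n_fine = round((60.0 - (-15.0)) / 0.2) + 1  # 376 points at 0.2 eV spacing
fine = linear_grid(-15.0, 60.0, n_fine)
coarse = linear_grid(-15.0, 60.0, 226)      # 226 gridpoints, one per output neuron

raw = [0.0] * n_fine
raw[100] = 1.0                              # toy "spectrum": a single sharp line
broadened = lorentzian_broaden(fine, raw, width=1.0)  # hypothetical width
resampled = resample(fine, broadened, coarse)
```

The count of 376 points follows directly from a 75 eV window sampled every 0.2 eV, with both end points included.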

III. RESULTS AND DISCUSSION
We turn to the Results and Discussion here, which are broken down as follows. In the first place, we parameterise a suitable G i feature vector (Section III A) and, subsequently, explore elements of the data preprocessing pipeline (Section III B), assessing the performance of the XANESNET DNN at the Fe K-edge. In the second place, we explore what the XANESNET DNN takes into consideration when predicting Fe K-edge XANES spectra (i.e. which features matter, and to what extent; Section III C). We subsequently generalise the XANESNET DNN across all of the first-row transition metal (Ti–Zn) K-edges (Section III D) and benchmark performance.

A. Featurisation and Parameterisation
In this Section, we address the way in which the local environments around the transition metal X-ray absorption sites are introduced into the XANESNET DNN, i.e. the encoding, or "featurisation", of the Cartesian coordinates as parameterised G i vectors (Section II B 2). We initially focus on the Fe K-edge reference dataset; results for the other eight reference datasets are, however, included in the SI.
In the first instance, we assess the performance of the "centred" and "shifted" parameterisation schemes (Section II B 2) for the G 2 and G 4 wACSF. Fig. 3 displays the relative performance of the XANESNET DNN at the Fe K-edge where the local environments around the X-ray absorption sites are featurised as G i vectors of length 97, i.e. containing a single G 1 wACSF and either 96 G 2 (Fig. 3; left panel) or 96 G 4 (Fig. 3; right panel) wACSF.
Reflecting the results presented in Ref. 137, we verify that the G 2 and G 4 wACSF benefit from a "shifted" and "centred" parameterisation scheme, respectively. However, the performance penalty for following the less-suitable of the two parameterisation schemes is much greater for the G 4 wACSF in this work (−225%) compared to Ref. 137 (−20%). In contrast, the performance penalty for the G 2 wACSF in this work (−100%) is in line with the aforementioned results (−75%). Acknowledging differences in the G i vector length and machine-learning model architecture, this result nonetheless evidences that the extent to which the G 4 wACSF are parameterised optimally is of comparably greater importance in this work as they communicate comparably more information in the context of the present problem. This reflects either i) a more 'direct' physical relationship between the inputs and outputs {i.e. a stronger link between the local (angular) environment and the transition metal K-edge XANES spectrum (cf. enthalpies in Ref. 137), which could be expected as resonances in the post-edge are, after all, geometric in origin} or ii) the greater importance of the G 4 wACSF, generally, in discriminating between the diverse coordination geometries of the transition metal complexes in the reference dataset(s). We return to the latter point throughout this Article.
Performance is predictably improved via mixing G 2 and G 4 wACSF. Fig. 4 displays the relative performance of the XANESNET DNN at the Fe K-edge as a function of the G 2:G 4 composition of the (length 97) G i vector. These data are displayed for the other eight transition metal K-edge reference datasets in the SI (Fig. S1) and exhibit similar trends to those shown in Fig. 4. Performance is optimal with 32 G 2 and 64 G 4 wACSF and displays a heavy skew towards the inclusion of angular information in a 2:1 G 4:G 2 ratio.

Performance is modestly improved further via the inclusion of higher values of ζ into the G 4 wACSF. In order to keep the length and composition (32 G 2 and 64 G 4 wACSF) of the G i vector constant, and considering that each triple of η, µ, and λ parameters is repeated for each additional value of ζ by construction, sets of one {1}, two {1,2}, four {1,2,4,8}, and eight {1,2,4,8,16,32,64,128} values of ζ were trialled. Fig. 5 displays the relative performance of the XANESNET DNN at the Fe K-edge as a function of the greatest value of ζ, $\zeta_{\mathrm{max}}$, included. These data are displayed for the other eight transition metal K-edge reference datasets in the SI (Fig. S2). Fig. 5 shows an improvement in performance up to $\zeta_{\mathrm{max}} = 128$ compared to $\zeta_{\mathrm{max}} = 1$ (−10%).
The inclusion of higher values of ζ focuses the angular extent of the G 4 wACSF around 180° (Fig. 2). This perhaps has limited utility in machine learning applications using popular databases of small organic systems (e.g. QM7, QM9) where linear and right-angled triples of atoms are infrequently encountered, but is of considerable utility here, where it apparently improves the ability of the XANESNET DNN to discriminate between local transition metal coordination environments as these angles are commonplace in canonical coordination geometries, e.g. octahedral, square-planar, and square-base- and trigonal-(bi)pyramidal.
We will consequently carry forward a (length 97) G i vector comprising the G 1 wACSF and 32 G 2 and 64 G 4 wACSF, with G 4 wACSF up to $\zeta_{\mathrm{max}} = 8$, to balance the performance gain attainable by adding higher values of ζ against the cost of sacrificing pairs of µ and η parameters and, consequently, limiting flexibility.
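The trade-off between angular resolution and radial flexibility can be made explicit with a short calculation. It assumes that every (η, µ) pair is repeated for both values of λ and for every value of ζ in the set, as described in Section II B 2, so that a fixed budget of 64 G 4 wACSF leaves fewer distinct (η, µ) pairs as the ζ set grows.

```python
N_G4 = 64              # number of G4 wACSF carried forward in the text
LAMBDAS = (-1.0, 1.0)  # each (eta, mu) pair is repeated for lambda = -1 and +1

zeta_sets = [
    (1,),
    (1, 2),
    (1, 2, 4, 8),
    (1, 2, 4, 8, 16, 32, 64, 128),
]

# Distinct (eta, mu) pairs that fit into 64 G4 functions, keyed by the largest zeta.
pairs_available = {
    max(zetas): N_G4 // (len(LAMBDAS) * len(zetas)) for zetas in zeta_sets
}
print(pairs_available)  # prints: {1: 32, 2: 16, 8: 8, 128: 4}
```

Under these assumptions, the largest ζ set would leave only four (η, µ) pairs, whereas stopping at a largest ζ of 8 retains eight.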

B. Optimisation and Performance
The G i vector parameterised in Section III A now delivers strong performance at the Fe K-edge, yet it is still – in a sense – suboptimal, as it is likely to contain low-variance features and feature-to-feature correlations as a byproduct of its construction that are (in the best case) redundant or (in the worst case) an obstacle to noise-free learning. Using variance and correlation threshold filters in the data preprocessing pipeline, redundant (low-variance and/or highly correlated) features in the G i vectors can be eliminated.

Fig. 6 displays the relative performance of the XANESNET DNN at the Fe K-edge as a function of the percentage of features eliminated via action of a variance threshold filter. These data are displayed for the other eight transition metal K-edge reference datasets in the SI (Fig. S3). It is possible to eliminate up to 25% of features (performance penalty < −1%) from the G i vector without consequence and, potentially, up to 50% of features without incurring a wholly unacceptable performance penalty (−10%), should exceptionally compact G i vectors be required.
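A variance filter of the kind described above can be sketched as follows. This is a plain-Python illustration, not the preprocessing pipeline used in this Article: the filter is expressed as "drop the lowest-variance fraction of columns", which is one way of choosing a variance threshold, and the small feature matrix is hypothetical.

```python
from statistics import pvariance


def variance_filter(rows, fraction_to_drop):
    """Drop the given fraction of feature columns with the lowest variance."""
    n_features = len(rows[0])
    variances = [pvariance([row[j] for row in rows]) for j in range(n_features)]
    n_drop = int(fraction_to_drop * n_features)
    ranked = sorted(range(n_features), key=lambda j: variances[j])
    keep = sorted(ranked[n_drop:])  # surviving columns, in their original order
    return keep, [[row[j] for j in keep] for row in rows]


# Hypothetical feature matrix: column 0 is constant and column 2 barely varies.
rows = [
    [1.0, 0.0, 5.0, 2.0],
    [1.0, 1.0, 5.1, 4.0],
    [1.0, 2.0, 4.9, 6.0],
    [1.0, 3.0, 5.0, 8.0],
]
kept, filtered = variance_filter(rows, fraction_to_drop=0.5)
print(kept)  # prints: [1, 3]
```

A correlation filter would follow the same pattern, removing one feature from each pair whose correlation coefficient exceeds a chosen threshold.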
Erring on the side of caution and eliminating 25% of features from the G i vector yields a truncated G i vector of length 71 (with the G 1 wACSF retained, and otherwise comprising 28 G 2 and 42 G 4 wACSF). The reduced dimensions of the truncated G i vector coupled with the compact [N × 512 × 512 × 226] architecture (Sections II B 1 and II B 2) reduce the number of internal weights in the XANESNET DNN to 414,208 (cf. >3,000,000 in our earlier work; Ref. 71), lowering the propensity for overfitting, accelerating optimisation, and opening up the opportunity to investigate computationally-intensive feature selection algorithms (Section III C).

Fig. 7 displays the relative performance of the XANESNET DNN at the Fe K-edge as a function of the number of feedforward/backpropagation epochs and the elapsed time in seconds taken to carry out the optimisation. These data are displayed for the other eight transition metal K-edge reference datasets in the SI (Fig. S4). With the reference datasets used in this Article, the XANESNET DNN takes advantage of its simple and compact MLP architecture; it can be optimised to convergence in ca. 512–1024 feedforward/backpropagation epochs – a process that can be completed in as little as a minute using an off-the-shelf commercial-grade CPU (AMD Ryzen Threadripper 3970X; 3.7–4.5 GHz) or GPU (NVIDIA RTX 3070, 5888 CUDA cores; 1.5–1.7 GHz).
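The weight count quoted above can be checked with a few lines of arithmetic. The calculation assumes that only the connection weights between consecutive dense layers are counted (biases and any batch-standardisation parameters are excluded), which is the convention that reproduces the quoted figure.

```python
def count_weights(layer_sizes):
    """Number of connection weights between consecutive dense layers (biases excluded)."""
    return sum(n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


n_features = 1 + 28 + 42  # G1 + G2 + G4 wACSF after filtering
n_weights = count_weights([n_features, 512, 512, 226])
print(n_features, n_weights)  # prints: 71 414208
```

The three contributions are 71 × 512 = 36,352, 512 × 512 = 262,144, and 512 × 226 = 115,712, so the hidden-to-hidden block dominates and the input length has only a modest effect on the total.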

C. Feature Importance and Selection
In this Section, we carry forward the G i vector parameterised in Section III A with 25% of the features eliminated through the action of the variance filter as in Section III B. We turn our attention towards addressing a different question: what is the XANESNET DNN taking into consideration when predicting K-edge XANES spectra (i.e. which features matter, and to what extent?) and can it be considered physical?
The relative feature importance at inference time of each of the features comprising the G i vector has been assessed via scrambling the values of the G i vectors featurewise over the reference dataset and assessing the performance penalty in each instance. The objective of this feature importance experiment is to identify how reliant the XANESNET DNN is on each feature for the purpose of producing accurate predictions: the greater the performance penalty when the feature is scrambled, the greater the reliance of the model on that feature. Fig. 8 displays the results of the feature importance experiment on the XANESNET DNN at the Fe K-edge. The feature importance of each of the G 2 (Fig. 8; centre panel) and G 4 (Fig. 8; lower panel) wACSF, using the relative performance as a proxy, is plotted relative to the optimal baseline performance. These data are displayed for the other eight transition metal K-edge reference datasets in the SI {Figs. S5 (G 2) and S6 (G 4)}.
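The scrambling experiment described above is a form of permutation feature importance, and can be sketched in plain Python as below. The single-output `toy_model`, the number of repeats, and the random data are assumptions for the example; in practice the model would be the optimised multi-output DNN and the loss would be evaluated over whole spectra.

```python
import random


def mean_squared_error(model, samples, targets):
    return sum((model(x) - y) ** 2 for x, y in zip(samples, targets)) / len(samples)


def permutation_importance(model, samples, targets, n_repeats=5, seed=0):
    """Mean increase in loss when each feature column is shuffled across the dataset."""
    rng = random.Random(seed)
    baseline = mean_squared_error(model, samples, targets)
    importances = []
    for j in range(len(samples[0])):
        penalties = []
        for _ in range(n_repeats):
            column = [x[j] for x in samples]
            rng.shuffle(column)  # scramble feature j only
            scrambled = [x[:j] + [c] + x[j + 1:] for x, c in zip(samples, column)]
            penalties.append(mean_squared_error(model, scrambled, targets) - baseline)
        importances.append(sum(penalties) / n_repeats)
    return importances


def toy_model(x):
    """Stand-in for a trained network: it depends on feature 0 only."""
    return 3.0 * x[0]


rng = random.Random(1)
samples = [[rng.random(), rng.random()] for _ in range(50)]
targets = [toy_model(x) for x in samples]
importances = permutation_importance(toy_model, samples, targets)
```

Scrambling feature 0 degrades the toy predictions, whereas scrambling feature 1 has no effect, so the penalty isolates the features on which the model actually relies. No re-optimisation is needed, which keeps the experiment cheap.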
In the first place, we focus on the feature importance of the G 2 wACSF (Fig. 8; centre panel); these mirror the radial distribution of atomic sites around the X-ray absorption site (Fig. 8; upper panel). The greatest feature importance is found for the first coordination shell around the X-ray absorption site {windows I, II (coordination with light, first-row elements, e.g. C, N, O, F), and III (coordination with heavier, second-row-and-above elements, e.g. Si, P, S, Cl, Br, I), Fig. 8; upper panel} with decreasing feature importance found for the second (windows IV and V) and third (window VI and beyond) coordination shells. The feature importance approximately reflects the density of atomic sites at the distance at which the G 2 wACSF is centred on the radial distribution, i.e. at the associated value of the µ parameter (Section II B 2), although this is not without exception. For example, the G 2 wACSF centred around 1.5–1.6 Å (µ = 1.47 and 1.63 Å) have among the highest feature importance in the G i vector, yet there are very few atomic sites located at this distance in the radial distribution (window I). Leakage of feature importance from the most important G 2 wACSF (µ = 1.8 Å; window II, which encodes the first coordination shell) is a contributing factor as the Gaussians centred here overlap on account of their full-widths-at-half-maxima (FWHM ≈ 0.3 Å) and, if one feature is scrambled, the radial information lost can be recovered partially from neighbouring features. However, the values of the G 2 wACSF centred around 1.5–1.6 Å are also strongly indicative of a particular class of coordination complex in the reference dataset – the transition metal hydride – as no other atomic sites are as close to the X-ray absorption site as H in these coordination complexes. In this sense, these G 2 wACSF act as useful yet rudimentary 'classifiers' and are allocated a higher feature importance than one would otherwise expect given the low density of atomic sites at this distance in the radial distribution.
In the second place, we focus on the feature importance of the G 4 wACSF (Fig. 8; lower panel). Each white/shaded block represents G 4 wACSF constructed with a fixed value of ζ (Section II B 2) from the set employed ({1,2,4,8}; Section III A) and the trend of increasing feature importance (i.e. increasing performance) with increasing value(s) of ζ supports our earlier results. Within each white/shaded block, the same trend, or pattern, recurs. There are two peaks in feature importance that appear as if merged into a single peak where ζ = 1.0 and that separate as ζ is increased and the angular resolution is refined (Fig. 2). These correspond to the two key types of local angular environment around X-ray absorption sites: the linear (180°) and right-angled (90°) coordination geometries, e.g. octahedral and square-planar, among others, and the tetrahedral (105°–115°) coordination geometries. It is interesting to note that, while the feature importance of the G 4 wACSF for the other eight transition metal K-edge reference datasets (Fig. S6) shows similar trends, Ni and Zn have comparably greater G 4 feature importance than one would otherwise expect. We associate this with the greater number of four-coordinate transition metal complexes contained in the Ni and Zn reference datasets 127 – in particular, the prevalence of tetrahedral and square-planar coordination geometries – and the utility of the G 4 wACSF for discriminating between them.
In Fig. 9, we alternatively assess the feature importance of the G 2 wACSF in two different regions of the XANES spectrum: a lower-energy region in the neighbourhood of the X-ray absorption edge spanning −3.0 → +3.0 eV and a higher-energy region in the post-edge spanning +50.0 → +56.0 eV (relative to the X-ray absorption edge). Fig. 9 displays the difference feature importance obtained by subtracting the relative feature importance in the latter from the former. The first coordination shell is of approximately equal importance to the accurate prediction of the XANES spectrum in each of the two regions. However, G 2 wACSF with lower and higher values of µ (encoding atomic sites closer to, and further from, respectively, the X-ray absorption site) are relatively more and less important, respectively, in the higher-energy region. Fig. 9 indicates a shift from a balanced reliance on all of the G 2 wACSF in the lower-energy region near the X-ray absorption edge to increased reliance on only those G 2 wACSF with lower values of µ that encode atomic sites in the first coordination shell as the energy is increased. Importantly, this mirrors the expected physics: core photoelectrons excited close to the X-ray absorption edge (i.e. in the lower-energy region) have low kinetic energy and, by extension, longer wavelengths – consequently, this region of the X-ray absorption spectrum is more sensitive to structure further away from the X-ray absorption site. However, in the higher-energy region, the greater kinetic energy of the core photoelectrons – which, consequently, have shorter wavelengths – results in a reduced "field of view", limiting the structural sensitivity to the immediate locality of the X-ray absorption site. Indeed, resonances with energy > 50 eV above the X-ray absorption edge are usually classified as belonging to the extended X-ray absorption fine structure (EXAFS) region which is well understood to exhibit structural sensitivity only to the first coordination shell around the X-ray absorption site. 142

Armed with what we now know about feature importance, we can use the carried-forward G i vector to construct a further-truncated G i vector from the ground up including only the most important features, i.e. following a "select-from-model" strategy. Fig. 10 displays the performance of the XANESNET DNN as a function of the percentage of features included in this further-truncated G i vector. Only about 60% of the features from the original carried-forward G i vector are required to obtain performance that converges to the baseline. Including only these features yields a compact G i vector of length 43 containing only the most important information: the G 1 wACSF, and 12 G 2 and 30 G 4 wACSF. The composition is displayed pictorially in the inset pie chart on Fig. 10 – again, the G 4 wACSF are overweighted compared to the G 2 wACSF (an approximate 1:2 G 2:G 4 ratio), indicative of their importance in discriminating between the diverse coordination geometries of the transition metal complexes in the reference dataset.
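The "select-from-model" construction described above amounts to ranking the features by importance and keeping the top fraction. A minimal sketch follows; the importance values are hypothetical placeholders, the function name `select_from_model` is chosen for the example, and the sketch does not treat the G 1 wACSF specially.

```python
def select_from_model(importances, fraction):
    """Indices of the most important features, keeping the given fraction of them."""
    n_keep = round(fraction * len(importances))
    ranked = sorted(range(len(importances)), key=lambda j: importances[j], reverse=True)
    return sorted(ranked[:n_keep])  # report the survivors in their original order


# Hypothetical importances for the 71 carried-forward features (distinct values 0..70).
importances = [(37 * j) % 71 for j in range(71)]
selected = select_from_model(importances, fraction=0.6)
print(len(selected))  # prints: 43
```

Keeping 60% of 71 features gives 43 after rounding, which matches the vector length quoted above.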
To demonstrate that this ground-up construction based on feature importance is not biased by including only the features with high evaluated feature importance when taken together, i.e. from the feature importance experiment with the whole carried-forward G i vector exposed to the XANESNET DNN, we have also carried out another ground-up construction and top-down deconstruction using "forward" and "backward" sequential feature selection (SFS), respectively. The SFS experiment involves adding (in the "forward" formulation) or eliminating (in the "backward" formulation) features sequentially to/from the G i vector; the choice of feature to add or eliminate from the pool of available features is made to maximise the performance of the machine-learning model, and each feature addition or elimination is trialled independently. SFS is consequently a computationally-intensive feature selection algorithm and can require hundreds to thousands of iterations for a DNN, depending on the target length of the desired G i vector.
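The "forward" formulation of SFS can be sketched as a greedy loop. In the sketch below, `toy_score` is a hypothetical stand-in for optimising and validating the DNN on a candidate feature subset, and the evaluation count assumes one model evaluation per candidate per step, before any cross-validation repeats.

```python
def forward_sfs(n_features, score, n_select):
    """Greedy forward selection: add the feature that maximises the score at each step."""
    selected = []
    remaining = list(range(n_features))
    while len(selected) < n_select:
        best = max(remaining, key=lambda j: score(selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected


def n_evaluations(n_features, n_select):
    """Number of score evaluations made by forward_sfs (one per candidate per step)."""
    return sum(n_features - step for step in range(n_select))


GROUP = [0, 1, 1, 2]              # features 1 and 2 carry the same (redundant) information
VALUE = {0: 0.5, 1: 0.4, 2: 0.1}  # hypothetical contribution of each piece of information


def toy_score(subset):
    """Rewards each distinct piece of information once, so redundancy earns nothing."""
    return sum(VALUE[g] for g in {GROUP[j] for j in subset})


chosen = forward_sfs(4, toy_score, n_select=3)
print(chosen, n_evaluations(71, 33))  # prints: [0, 1, 3] 1815
```

The redundant feature 2 is skipped in favour of feature 3 even though its standalone value is higher, which is the behaviour that distinguishes SFS from ranking by individual importance. Selecting 33 features from a pool of 71 in this way takes 1815 evaluations under the stated assumption, consistent with the cost noted above.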
The plots displaying the feature importance of the G 2 (Fig. 8; centre panel) and G 4 (Fig. 8; lower panel) wACSF are decorated with triangular markers above the features that were selected via "forward" SFS (the "backward" SFS result was not materially different) to obtain a further-truncated G i vector of length 33.All of the G 2 wACSF covering the first coordination shell (windows I, II, and III, Fig. 8; upper panel) were selected, as were G 2 wACSF with high feature importance in the second coordination shell (windows IV and V).Of the G 4 wACSF, those with highest feature importance were not all selected, although high-importance features were still selected more often than not, and more features were selected from high-ζ blocks.
The G i vector constructed via "forward" SFS comprised the G 1 wACSF, 10 G 2 wACSF, and 22 G 4 wACSF, i.e. it converged towards a similar composition and, incidentally, towards similar performance by comparison with the longer G i vector constructed via the "select-from-model" strategy.

D. Extension to Transition Metal K-Edges
The XANESNET DNN demonstrably needs very little judiciously-selected information to deliver accurate and affordable predictions of Fe K-edge XANES spectra for arbitrary Fe X-ray absorption sites; radial information on the first (and to a lesser extent, the second) coordination shells suffices with angular information sufficient to separate satisfactorily key coordination geometries (Section III C). Although the exact composition of the G i vector is dataset-dependent (one of the themes we have explored in this Article with respect to the coordination complexes in the tmQM dataset and the particularities of the problem at hand), the calibration carried out here is extensible across the first-row transition metal (Ti-Zn) reference datasets as coordination distances are not greatly different on average and canonical coordination geometries are found consistently. In this Section, we demonstrate the performance of the XANESNET DNN at predicting the K-edge XANES spectra of the nine "held-out" transition metal test datasets (Ti-Zn, 250 samples each; Section II A).
Fig. 11 displays histograms of the median percentage error, ∆µ, between target, µ target , and predicted, µ predict , first-row transition metal K-edge XANES spectra; key properties of these distributions (medians, upper and lower quartiles, and skewness coefficients) are tabulated in Table I. Across the nine first-row transition metal reference datasets, the median ∆µ is typically sub-5% (ca. 4.3%, on average) with the lower and upper quartiles situated symmetrically ca. 2-3% below and above, respectively, presenting a tight interquartile range of ca. 3-5% that testifies to the balanced performance of the XANESNET DNN. Coupled with the high positive skewness coefficients (> 1.0) across the reference datasets that place predictions squarely towards the higher-performance end of these figures, we are confident that the XANESNET DNN delivers accurate and affordable predictions that generalise well across this block of the periodic table.
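The distribution statistics quoted above can be computed as in the following minimal sketch. Two assumptions are made here that are not restated in this Section: ∆µ for a single spectrum is taken as the median over energy points of the absolute percentage error, and the skewness is the Fisher-Pearson moment coefficient. The function names are hypothetical.

```python
import statistics
from typing import Sequence, Tuple

def percentage_error(target: Sequence[float], predict: Sequence[float]) -> float:
    """Median over energy points of the absolute percentage error
    (assumed definition; target intensities must be non-zero)."""
    errs = [abs(p - t) / abs(t) * 100.0 for t, p in zip(target, predict)]
    return statistics.median(errs)

def summarise(errors: Sequence[float]) -> Tuple[float, float, float, float]:
    """Return (lower quartile, median, upper quartile, skewness) of a
    distribution of per-spectrum errors."""
    q1, q2, q3 = statistics.quantiles(errors, n=4, method="inclusive")
    mean = statistics.fmean(errors)
    m2 = sum((e - mean) ** 2 for e in errors) / len(errors)
    m3 = sum((e - mean) ** 3 for e in errors) / len(errors)
    skew = m3 / m2 ** 1.5  # Fisher-Pearson moment coefficient (assumption)
    return q1, q2, q3, skew

# Made-up per-spectrum errors with a long tail towards large errors.
q1, q2, q3, skew = summarise([1.0, 2.0, 3.0, 4.0, 10.0])
print(q1, q2, q3, skew > 0.0)
```

A positive skewness coefficient indicates that most of the spectra sit at the low-error end of the histogram, with a tail of less well-predicted outliers.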
The predicted K-edge XANES spectra can optionally be broadened via an additional postprocessing step to account for diverse effects on the spectral resolution including, although not limited to, core-hole lifetime broadening, instrument response, and many-body effects, e.g. inelastic losses. If this postprocessing step is carried out (as is routine, and typically with an energy-dependent arctangent function; see Eq. 2 in Ref. 71), performance is improved appreciably (see the values in parentheses in Table I; arctangent broadening parameters are tabulated in Table S1). Across the nine first-row transition metal reference datasets, the median ∆µ is reduced to ca. 3% (2.8%, on average) and the interquartile range tightens further to ca. 2-3% post-broadening, with the greatest improvements in the finely-structured edge region of the K-edge XANES spectra.
Fig. 12 displays parity plots of the error in energy, ∆E, between target, E target , and predicted, E predict , peak positions in the first-row transition metal K-edge XANES spectra (a key metric for the experimental spectroscopist); key properties (means, maxima, standard deviations, and R 2 coefficients) are tabulated in Table II. The XANESNET DNN consistently predicts the positions of prominent peaks in the target K-edge XANES spectra to sub-eV (ca. 0.80 eV, on average) accuracy across the nine first-row transition metal reference datasets, reproducing > 90% of identified targets. The coefficients of determination, R 2 , which are > 0.99 for all reference datasets, evidence encouragingly strong linear relationships between E target and E predict .
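The peak-position metrics can be illustrated with the sketch below. The peak-identification and matching procedure is not specified in this Section, so everything here is an assumption: peaks are strict local maxima, a target peak counts as a "hit" if a predicted peak lies within a tolerance `tol`, and R 2 is computed as the squared Pearson correlation. All function names and numbers are hypothetical.

```python
from typing import List, Sequence, Tuple

def find_peaks(energy: Sequence[float], mu: Sequence[float]) -> List[float]:
    """Energies of strict local maxima (a deliberately simple peak finder)."""
    return [energy[i] for i in range(1, len(mu) - 1) if mu[i - 1] < mu[i] > mu[i + 1]]

def match_peaks(
    target: Sequence[float], predict: Sequence[float], tol: float
) -> List[Tuple[float, float]]:
    """Pair each target peak with its nearest predicted peak if within tol (eV)."""
    pairs = []
    for e_t in target:
        nearest = min(predict, key=lambda e_p: abs(e_p - e_t), default=None)
        if nearest is not None and abs(nearest - e_t) <= tol:
            pairs.append((e_t, nearest))
    return pairs

def r_squared(pairs: Sequence[Tuple[float, float]]) -> float:
    """Squared Pearson correlation between target and predicted positions."""
    xs = [t for t, _ in pairs]
    ys = [p for _, p in pairs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)

# Toy peak positions in eV relative to the edge (made-up numbers).
e_target = [0.0, 13.0, 28.0]
e_predict = [0.5, 14.0, 38.0]
pairs = match_peaks(e_target, e_predict, tol=2.0)
hit_rate = len(pairs) / len(e_target)
mean_error = sum(abs(p - t) for t, p in pairs) / len(pairs)
print(pairs, hit_rate, mean_error)
```

Note that this simple scheme allows one predicted peak to be matched to more than one target peak; a stricter one-to-one assignment would be needed to rule that out.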

IV. CONCLUSION
In this Article, we have built on our earlier proof-of-principle work in Ref. 71 and practical applications in Refs. 72 and 74 to develop and deploy a new compact neural network, the XANESNET DNN, for predicting the lineshape of transition metal K-edge XANES spectra. The XANESNET DNN is > 80% smaller, an order of magnitude faster to optimise, and yet nonetheless displays improved predictive power and an encouraging potential for generality across the periodic table. We have extended the scope of our study beyond the familiar Fe K-edge to the nine first-row transition metal (Ti-Zn) K-edges and assessed the predictive power and generality of the XANESNET DNN here. Our model is able to predict K-edge XANES spectral intensities with an average accuracy of ca. ± 2-4% across the selected spectral windows (−15.0 → +60 eV relative to each X-ray absorption edge), and to predict the positions of prominent peaks with a > 90% hit rate and sub-eV (ca. 0.80 eV) accuracy.
We have addressed in detail the calibration of the feature vector (G i ) that encodes the information on the local environment around the X-ray absorption site, and carried out an assessment of the relative importance of the individual features, particularly the radial (G 2 ) and angular (G 4 ) components. We found that very little judiciously-selected geometric information is actually needed or, indeed, used to map feature vectors onto the lineshape of the corresponding K-edge XANES spectrum; radial information on the first (and to a lesser extent, the second) coordination shells suffices alongside a quantity of angular information sufficient to separate satisfactorily key classes of coordination geometry. We found, in addition, that the relative importance of the individual features differs depending on the spectral window under consideration. In low-energy windows near the X-ray absorption edge, all features are taken into account in a balanced way, while in higher-energy windows in the post-edge, features encoding radial information closer to the X-ray absorption site are ascribed higher importance, mirroring the expected physics in the shift from multiple scattering to single scattering with increasing energy.
Although the exact composition of the feature vector is dataset-dependent (one of the themes we have explored in this Article with respect to the coordination complexes in the tmQM dataset and the particularities of our problem), the calibration carried out here has nonetheless proved extensible across our first-row transition metal (Ti-Zn) reference datasets to great effect.
While accuracy, affordability, and generality (with respect to the identity of the absorption site) are no longer cardinal challenges, there are, of course, new challenges to tackle and opportunities to embrace which, most pressingly, include i) the incorporation of electronic information and ii) dataset curation. On the topic of i), the XANESNET DNN currently considers only the local geometric environment around the X-ray absorption site of interest; consequently, its ability to describe charge-state-dependent spectral features remains uncertain. The key question here is "can electronic effects be reproduced by the XANESNET DNN with a sufficiently large reference dataset (i.e. to what extent is the electronic information implicit?), or do we need to input electronic information explicitly?" It is true that the energetic position of an X-ray absorption edge depends on the electron density at the X-ray absorption site, e.g. a reduction in electron density will shift the X-ray absorption edge towards a higher energy as it is consequently harder to remove the core electrons. However, such a shift can also be associated with structural change (expressed empirically via Natoli's Rule: 143 the energetic position of an X-ray absorption edge is in proportion to the average coordination distance). In fact, as changes in the charge state and local coordination geometry around the X-ray absorption site are often strongly coupled in coordination complexes, disentangling the extent of the competition between geometric and electronic effects presents a considerable challenge.5][146][147][148][149]. On the topic of ii), there are two key questions: "how can massive coordination complex datasets (rivalling popular molecular organic datasets) be curated/constructed?" and "is it necessary to construct bespoke molecular coordination complex datasets for machine learning in X-ray spectroscopy?" There is potential for intelligent (guided) and/or combinatorial strategies, and advances in high-throughput computing will be well-leveraged here. 150

FIG. 2. Schematic of the effect of the η, µ, λ, and ζ parameters on the symmetry function forms. Upper Panel: a "centred" parameterisation scheme where µ = 0.0 and η is varied; lighter-coloured lines correspond to higher values of η. Centre Panel: a "shifted" parameterisation scheme where η is fixed and µ is varied; lighter-coloured lines correspond to higher values of µ. Lower Panel: the effect of the λ and ζ parameters on the angular component of a G 4 symmetry function; the solid and dashed lines correspond to λ = +1.0 and λ = −1.0, respectively, and lighter-coloured lines correspond to higher values of ζ.
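The roles of the four parameters can be made concrete with a short sketch. It assumes the standard Behler-type functional forms, namely a Gaussian radial term exp(−η(r − µ)²) and an angular term 2^(1−ζ)(1 + λ cos θ)^ζ; the element-dependent weighting and the cutoff function of the full wACSF are deliberately omitted, and the function names are hypothetical.

```python
import math

def radial_term(r: float, eta: float, mu: float = 0.0) -> float:
    """Gaussian radial term: eta sets the width, mu shifts the centre."""
    return math.exp(-eta * (r - mu) ** 2)

def angular_term(theta: float, lam: float, zeta: float) -> float:
    """Angular term: lam = +1.0 peaks at 0 rad, lam = -1.0 peaks at pi rad;
    larger zeta narrows the peak."""
    return 2.0 ** (1.0 - zeta) * (1.0 + lam * math.cos(theta)) ** zeta

# "Centred" scheme: mu = 0.0, larger eta decays faster with distance.
print(radial_term(1.0, eta=1.0), radial_term(1.0, eta=2.0))

# "Shifted" scheme: fixed eta, the maximum sits at r = mu.
print(radial_term(2.0, eta=1.0, mu=2.0))

# Angular term at 90 degrees shrinks as zeta grows (narrower peak).
print(angular_term(math.pi / 2, lam=1.0, zeta=1.0),
      angular_term(math.pi / 2, lam=1.0, zeta=4.0))
```

With these forms the angular term reaches its maximum value of 2 at θ = 0 for λ = +1.0 and at θ = π for λ = −1.0, independent of ζ.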

FIG. 3. Performance at the Fe K-edge for the "centred" and "shifted" parameterisation schemes. Performance is plotted relative (in %) to the best performance in the panel. Validation results; five-times-repeated five-fold cross-validation. Left Panel: 96 G 2 wACSF. Right Panel: 96 G 4 wACSF.

FIG. 6. Performance at the Fe K-edge as a function of the percentage of features eliminated via action of a variance threshold filter. Performance is plotted relative (in %) to the best performance in the panel. Validation results; five-times-repeated five-fold cross-validation. 32 G 2 wACSF and 64 G 4 wACSF.

FIG. 7. Performance at the Fe K-edge as a function of the number of feedforward/backpropagation epochs and the elapsed time in seconds (optimised using an NVIDIA RTX 3070). Performance is plotted relative (in %) to the best performance in the panel. Validation results; five-times-repeated five-fold cross-validation. 28 G 2 wACSF and 42 G 4 wACSF.

FIG. 8. Feature importance for G 2 and G 4 wACSF at the Fe K-edge. Upper Panel: histogram of the radial distribution of atomic sites around the X-ray absorption site in the Fe K-edge reference dataset. Centre Panel: feature importance for G 2 wACSF. Performance is plotted relative (in %) to the baseline. Triangular markers indicate G 2 wACSF selected via sequential feature selection (SFS). Lower Panel: feature importance for G 4 wACSF. Performance is plotted relative (in %) to the baseline. Triangular markers indicate G 4 wACSF selected via SFS. 28 G 2 wACSF and 42 G 4 wACSF.

FIG. 10. Performance at the Fe K-edge as a function of the percentage of features included via a "select-from-model" strategy targeting high feature importance. Performance is plotted relative (in %) to the baseline. Validation results; five-times-repeated five-fold cross-validation. 28 G 2 wACSF and 42 G 4 wACSF.

FIG. 4. Performance at the Fe K-edge as a function of the G 2 :G 4 composition of the G i vector. Performance is plotted relative (in %) to the best performance in the panel. Validation results; five-times-repeated five-fold cross-validation. 96 G 2/4 wACSF.

TABLE I. Summary a of the median percentage errors, ∆µ median (%), upper and lower quartiles, and skewness coefficients for the ∆µ distribution histograms (Fig. 11).
a Values in parentheses are after arctangent broadening; Table S1.

TABLE II. Summary of the mean peak position errors, ∆E mean (eV), maximum peak position errors, ∆E max (eV), standard deviations, σ (eV), and R 2 coefficients for the peak position parity plots (Fig. 12).