Machine Learning on Neutron and X-Ray Scattering

Neutron and X-ray scattering represent two state-of-the-art materials characterization techniques that measure materials' structural and dynamical properties with high precision. These techniques play critical roles in understanding a wide variety of materials systems, from catalysis to polymers, nanomaterials to macromolecules, and energy materials to quantum materials. In recent years, neutron and X-ray scattering have received a significant boost due to the development and increased application of machine learning to materials problems. This article reviews the recent progress in applying machine learning techniques to augment various neutron and X-ray scattering techniques. We highlight the integration of machine learning methods into the typical workflow of scattering experiments. We focus on scattering problems that were challenging for traditional methods but are addressable using machine learning, such as leveraging the knowledge of simple materials to model more complicated systems, learning with limited data or incomplete labels, identifying meaningful spectra and materials' representations for learning tasks, mitigating spectral noise, and many others. We present an outlook on a few emerging roles machine learning may play in broad types of scattering and spectroscopic problems in the foreseeable future.


I.1. Neutron and X-ray scattering in the data era
Neutron and X-ray scattering are two closely related and complementary techniques that can be used to measure a wide variety of materials' structural and dynamical properties, from atomic to mesoscopic scales 1,2 . Representing two state-of-the-art materials characterization techniques, neutron and X-ray scattering have witnessed significant advancement in the past several decades. As the average neutron flux reached a plateau of about 10^15 n/cm^2/s for reactor-based neutron generation, accelerator-based neutron generation has improved steadily (Figure 1a) 3 . The planned Second Target Station (STS) at Oak Ridge National Lab (ORNL) promises a 25x enhancement in brightness and a factor of 10-1000 capability enhancement in instruments compared to other neutron sources in the US. For X-ray scattering, the peak brightness of synchrotron sources has increased drastically across a broad range of X-ray photon energies (Figure 1b) 4 . In fact, the improvement in peak brightness of synchrotron X-ray sources even exceeds the rate of Moore's law (Figure 1c) 5,6 , with a few major facility upgrades, such as APS-U, ESRF-EBS, and PETRA-IV, bringing significant capability boosts. A direct consequence of the enhanced capability is the high efficiency of data collection, enabling the measurement of more diverse types of materials.
In addition to increased data availability for a broader materials composition space, the higher brightness further opens up the possibility of higher-dimensional data collection for a single material type or within one scattering experiment. Spectroscopies like time-of-flight inelastic neutron scattering measure the dynamical structure factor in four-dimensional (4D) momentum-energy (Q, ω) space, while X-ray photon correlation spectroscopy measures the intensity auto-correlation in 4D momentum-time (Q, t) space 7 .
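To make the data-volume argument concrete, a back-of-the-envelope sketch of the memory footprint of a 4D dynamical structure factor S(Q, ω) dataset; the grid sizes below are illustrative assumptions, not the binning of any particular instrument:

```python
import numpy as np

# Illustrative grid: three momentum axes and one energy axis for S(Q, w).
# These sizes are assumptions for the estimate, not real instrument binning.
shape = (200, 200, 200, 100)
nbytes = int(np.prod(shape)) * np.dtype(np.float32).itemsize
gib = nbytes / 2**30   # roughly 3 GiB for one 4D dataset in single precision
```

Even at this modest resolution, a single measurement already occupies gigabytes, and multimodal parameter scans multiply this further.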
The emerging frontier of multimodal scattering, which simultaneously measures samples with multiple probes, or in in-situ environments such as extreme temperature or pressure, elastic strain, or applied electrical and magnetic fields, introduces additional dimensions to the measured parameter space. Alongside the intrinsic momentum Q, energy ω, and time t dimensions, multimodality leads to an even higher overall data dimension and adds inevitable complexities to data analysis.
Lastly, the discovery of new functional and quantum materials, often accompanied by novel or unexpected emergent properties, poses a significant challenge to materials analysis. In many scattering experiments with a given measurable signal S_exp(Q, ω, t, ...), there exist associated theoretical models S_model(Q, ω, t, ...) that describe the measured signal only approximately. In short, large data volume, high data dimension combined with multimodality, and new classes of quantum and functional materials with emergent properties that go beyond approximate models all call for a revolutionary approach to learning materials properties from neutron and X-ray scattering data. Machine learning 8,9 , especially emerging techniques that incorporate physical insights [10][11][12][13][14][15] or respect symmetries and physical constraints of atomic, crystalline, and molecular structures [16][17][18][19][20][21][22][23][24][25][26] , appears to be a promising and powerful tool to extract useful information from large, high-dimensional datasets, going far beyond approximate models. The past few years have witnessed a surge in machine learning research with scattering and spectroscopic applications. Even so, we foresee that machine learning, if properly implemented, has the potential not only to serve as a powerful tool for data analysis, but also to generate new knowledge and physical insights into materials, which can assist experimental design and accelerate materials discovery.

I.2. Integrating machine learning into the scattering setup
Machine learning has already been widely applied to materials science in many aspects, especially in directly predicting or facilitating predictions of various materials properties from structural information, including but not limited to mechanical properties 24,26-28 , thermodynamic properties 27,[29][30][31] , and electronic properties 24,[32][33][34][35][36][37][38] . The strong predictive power and representation learning ability of machine learning models can lead to much lower computational cost than expensive numerical methods like first-principles calculations, but with comparable accuracy. This feature greatly accelerates materials discovery and design [39][40][41][42][43][44] . Machine learning models can also be trained to learn interatomic force fields and potential energy surfaces [45][46][47][48][49][50] , where the accurate yet computationally cheap access to atomic potentials has proven successful in simulating the transitions in a disordered silicon system with 100,000 atoms 51 . Machine learning models have already initiated a paradigm shift in the way people study materials science and physics [52][53][54][55][56][57][58] .
To see how machine learning can be applied to neutron and X-ray scattering, we show a simple scattering setup in Figure 2a. A beam of neutrons or X-ray photons is generated at the source. After passing through the beam optics that prepare the incident beam state, the beam impinges on the sample with a set of incident parameters (such as momentum k_i, energy E_i, and polarization ε_i). In this source-sample-detector tripartite scheme, the possible application scope of machine learning can be seen clearly: at the "source" stage, machine learning can be used to optimize beam optics; at the "sample" stage, machine learning can be used to better learn materials properties; and at the "detector" stage, machine learning can be used to improve data quality, such as realizing super-resolution. Setting aside the "source" and "detector" stages, which will be introduced in Section IV, we focus on the "sample" stage, particularly the application of machine learning to relate materials' spectra and their properties.
To further illustrate the general relationship between machine learning and scattering spectra, we consider the scattering data as one component in a typical machine learning architecture. In the case of supervised machine learning, the scattering spectral data can serve either as input to predict other materials properties (Figure 2b), or as output generated from known or accessible materials parameters, such as atomic structures and other materials representations (Figure 2c). Alternately, unsupervised machine learning can be used to identify underlying patterns in spectral data through dimensionality reduction and clustering, which can be useful for data exploration or identification of key descriptors in the data (Figure 2d).
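As a minimal illustration of the unsupervised route (Figure 2d), the sketch below clusters synthetic single-peak spectra with PCA and k-means; the spectra and the two "phases" are invented for illustration, not taken from any dataset in this review:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
energy = np.linspace(0, 10, 200)

def spectrum(center, width):
    """Toy spectrum: one Gaussian peak plus detector-like noise."""
    return np.exp(-((energy - center) / width) ** 2) + 0.02 * rng.standard_normal(energy.size)

# Two hypothetical "phases", distinguished only by peak position.
X = np.array([spectrum(3.0, 0.5) for _ in range(30)]
             + [spectrum(7.0, 0.5) for _ in range(30)])

# Reduce the 200-dimensional spectra to 2 principal components, then cluster.
Z = PCA(n_components=2).fit_transform(X)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(Z)
```

With no labels provided, the two groups of spectra separate cleanly in the reduced space, mirroring how clustering can expose phase boundaries in real spectral datasets.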

I.3. Machine learning architectures for scattering data
With the various roles machine learning may play in a scattering experiment pipeline, one may ask what particular machine learning architecture should be used for a certain task.
Given the no free lunch theorem for optimization 59 , no single architecture is best for every task; the appropriate choice depends on the problem at hand and on how materials and spectra are represented.
Materials representation. Commonly used materials representations include descriptors that respect the symmetries of atomic structures 16 , the atom-centered symmetry functions, which contain both radial and angular information 62 , and the smooth overlap of atomic positions (SOAP) power spectrum 63 .
A review on materials representation can be found in Ref 54 .
Representation of scattering data. Paired with materials representation is the representation of scattering data. The scattering intensity can be stored as a high-dimensional array indexed by momentum k, energy ω, and polarization ε. Such data structures are naturally compatible with convolutional neural networks (CNNs), which have been widely applied in image processing. Moreover, atomic structures can also be interpreted as images by computing density fields on 3D real-space grids based on atomic species and positions, which enables them to work with convolutional filters 43,64 . Architectures beyond the CNN also exist, such as the deep U-Net, which decreases the spatial feature size while increasing the number of feature channels, then performs the inverse operation with skip connections between corresponding levels (Figure 3a) 65 .
Autoencoder and generator. Another useful architecture is the variational autoencoder (VAE) 66 , which compresses the input into a distributed region of a lower-dimensional latent space (encoding), followed by the optimized recovery of the input from the low-dimensional representation (decoding). The latent space is thus a "compressed" continuous representation of the training samples, which can be very useful for learning materials representations (Figure 3b). For example, a VAE can be combined with a CNN to learn latent representations of atomic structures 64 . Moreover, the stability of crystal structures can be easily inferred from latent space clustering 43 , and similar ideas can also be applied to analyze scattering data, such as X-ray absorption spectroscopy 38 or neutron diffuse scattering 67 . Another use of the VAE is as a generative model to facilitate materials design, such as generating new structures by sampling and exploring the latent space 43 .
The generative adversarial network (GAN) is another popular generative framework that is composed of a generator and a discriminator (Figure 3c) 68 . The generator is a neural network that converts latent space representations to desired objects such as crystal structures 44 , while the discriminator is another network that aims to discern "fake" (generated) from "realistic" (training) samples. The main goal of the generator is to create high-fidelity objects that can pass through the discriminator test.
Graph neural networks. Graph neural networks with nodes and edges are naturally suited to represent atomic structures, where atoms can be represented as nodes and their bonds correspond to edges in a graph. In graphs, information at each node is updated with filtered information from its neighboring nodes, mimicking the local chemical environment where an atom is most influenced by neighboring atoms. The crystal graph CNN (Figure 3d) 24 and the Euclidean neural network (E3NN) 19 are two representative graph-based architectures.
Non-parametric learning algorithms. The aforementioned machine learning architectures contain parameters that are learned during the training process, yet there exist plenty of unsupervised or non-parametric algorithms that contain no learnable parameters and are more procedural. For instance, k-means clustering and Gaussian mixture models (GMMs) can be applied to data clustering; decision-tree methods such as gradient boosted trees (GB Trees) can be applied to classification and regression (Figure 3f); and principal component analysis (PCA) can be used for data dimension reduction. One particularly interesting method is non-negative matrix factorization (NMF), which decomposes a matrix into lower dimensions while maintaining an intuitive, parts-based representation (Figure 3e) 71,72 . Conceptually, NMF resembles the widely used dimension reduction algorithm PCA, but with additional non-negativity constraints. Such non-negative matrix descriptions are extremely powerful when interpreting physical signals like music spectrograms 73 .
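A minimal sketch of the NMF idea on synthetic data, decomposing a composition series of mixed diffraction-like patterns back into non-negative end-members and weights with scikit-learn; the two "phases" and their peak positions are invented for illustration:

```python
import numpy as np
from sklearn.decomposition import NMF

q = np.linspace(0.5, 5.0, 300)

# Two hypothetical end-member patterns (non-negative by construction).
phase_a = np.exp(-((q - 1.5) / 0.1) ** 2) + np.exp(-((q - 3.1) / 0.1) ** 2)
phase_b = np.exp(-((q - 2.2) / 0.1) ** 2) + np.exp(-((q - 4.0) / 0.1) ** 2)

# Synthetic composition series: each "measured" pattern mixes the two phases.
fractions = np.linspace(0, 1, 20)
V = np.array([f * phase_a + (1 - f) * phase_b for f in fractions])

# NMF factorizes V ~ W @ H with W, H >= 0: H holds end-member spectra and W the
# per-sample weights; unlike PCA, the components cannot go negative.
model = NMF(n_components=2, init="nndsvda", max_iter=2000, random_state=0)
W = model.fit_transform(V)
H = model.components_

reconstruction_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

Because the synthetic series is exactly rank-2 and non-negative, the two recovered components closely match the physical end-members, which is precisely the interpretability that makes NMF attractive for phase mapping.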

II.1. Diffraction with machine learning
To see how machine learning can benefit neutron and X-ray diffraction, we follow the taxonomy in Figure 2: in supervised learning, diffraction can serve as an input to predict materials properties, or to predict the structure itself, while in unsupervised learning, diffraction can be used to perform classification without additional data labels. The reverse direction, such as using structures or other physical properties to predict diffraction patterns, either belongs to physics-based forward modeling or holds less value for machine learning studies, and will be left out of this discussion.
Diffraction or structure as input, property as output. Since the most straightforward information extractable from diffraction is the atomic structure, whose variation is directly associated with mechanical properties, we start by discussing an example that shows how diffraction can be used to predict elastic constants in complicated materials, taking high-entropy alloys as an example. High-entropy alloys have received tremendous attention in the past decade due to their extraordinary strength-to-weight ratios and chemical stability.
However, given their complex atomic configurations, direct property calculation has been challenging. To enable efficient prediction of elastic constants in high-entropy alloys, Kim et al. conducted a combined neutron diffraction, ab initio calculation, and machine learning study 74 . In this study, an in-situ diffraction experiment and high-quality ab initio calculations supply a small but accurate dataset, while inputs x_j^E and labels y_j^E form another set that is easier to obtain, such as from simple crystalline solids or from efficient forward computation. The key step is to build a predictive model using the large, "easy" set {x_j^E, y_j^E} and then refine it with the small, accurate set.
Diffraction as input, structure as output. In addition to the direct structure-to-property prediction, given the close relationship between diffraction and structure, another type of machine learning problem is to perform diffraction-to-structure prediction.
Unsupervised learning of diffraction. Besides supervised learning on XRD or PDF spectra, another boost for scattering data lies in unsupervised learning, which seeks the internal categorical structure of the data. Since conventional fitting and refinement methods have matured for identifying phases among different crystallographic structures, one key application for unsupervised learning is phase identification in a complex compositional phase space where multiple phases coexist. One key milestone in unsupervised learning for XRD analysis is NMF, which can decompose the spectra into simpler constituent patterns.
Beyond learning materials properties and seeking structure-property relations, machine learning has also been applied to empower the analysis of the diffraction patterns themselves [82][83][84][85][86] . Since the focus here is to explore materials properties, we leave those examples to Section IV.2 as part of the section on the data analysis process.
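The "easy set first, accurate set second" strategy can be sketched generically with scikit-learn's partial_fit: pretrain on abundant but biased labels, then refine on a few accurate ones. The data here are synthetic stand-ins; Kim et al. used neutron diffraction and ab initio labels, not this toy linear model:

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])

# Large "easy" set: cheap but systematically biased labels.
X_easy = rng.normal(size=(2000, 5))
y_easy = X_easy @ (0.9 * true_w) + 0.05 * rng.standard_normal(2000)

# Small "accurate" set: few expensive, trustworthy labels.
X_acc = rng.normal(size=(40, 5))
y_acc = X_acc @ true_w + 0.05 * rng.standard_normal(40)

X_test = rng.normal(size=(500, 5))
y_test = X_test @ true_w

scaler = StandardScaler().fit(X_easy)
model = SGDRegressor(learning_rate="constant", eta0=0.01, max_iter=1000,
                     tol=1e-4, random_state=0)
model.fit(scaler.transform(X_easy), y_easy)        # pretrain on the easy set

rmse = lambda: float(np.sqrt(np.mean(
    (model.predict(scaler.transform(X_test)) - y_test) ** 2)))
err_pretrain = rmse()

for _ in range(200):                               # refine on the accurate set
    model.partial_fit(scaler.transform(X_acc), y_acc)
err_refined = rmse()
```

The refinement step removes most of the bias inherited from the cheap labels, which is the essence of leveraging simple or simulated systems to model harder ones.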

II.2. Small-angle neutron and X-ray scattering
Small-angle scattering (SAS), including small-angle neutron and X-ray scattering (SANS and SAXS), probes structures at mesoscopic length scales and is widely applied to soft matter and biological systems.
Spectra as input, structure as output. Since the original goal of SAS is to learn structural information, we start by introducing one example that predicts structural properties. Franke et al. provide such a machine-learning-based structure predictor for bio-macromolecular solutions 102 . For a given geometrical object, although the form factor and the corresponding SAS spectra are directly computable (Figure 6a), the effect of disorder must be considered to generate data that are close to reality. To consider the disorder effect, an ensemble optimization method is implemented to generate SAS patterns with random chains, followed by averaging to simulate mixtures (Figure 6b), which augment the original geometrical data (Figure 6e, orange block). The first task is to classify the shape of macromolecules from SAS. Each SAS curve is reduced to three features, one of which builds on the radius of gyration R_g, to construct a three-dimensional shape space. It can be seen directly that different basic shapes separate well in this 3D parameter space (Figure 6b). By performing non-parametric k-nearest neighbor classification, shapes with mixtures and disorder can also be classified from SAS curves.
To perform structural parameter prediction, a separate set of atomistic structure data from the protein database (PDB) is used to compute both SAS patterns and structural parameters, from which a predictive machine-learning model is built, showing good transferability when applied to an experimental database (Figure 6d). A summarized workflow is shown in Figure 6e, highlighting the different focuses in obtaining shape features and structural parameters, respectively. Shape classification and structural parameter prediction, with different synthetic data augmentation for the respective tasks, represent an active area of machine learning on SAS 106,111 , which has been applied to systems like RNA 108 and 3D protein structures 110 . Machine learning also enables direct analysis of 2D SAS data 105,112 , where traditional analysis frequently requires a data reduction to 1D for further analysis.
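A schematic version of the shape-classification step: once each curve is reduced to a small feature vector, a k-nearest neighbor classifier assigns an unknown measurement to the closest labeled cluster. The three features and cluster centers below are synthetic stand-ins, not the actual features of Franke et al.:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(3)

# Hypothetical 3-feature summaries of SAS curves; one cluster per basic shape.
def simulate_features(center, n):
    return center + 0.05 * rng.standard_normal((n, 3))

shapes = {"sphere": [0.2, 0.8, 0.1], "cylinder": [0.7, 0.3, 0.5], "disc": [0.4, 0.1, 0.9]}
X = np.vstack([simulate_features(np.array(c), 50) for c in shapes.values()])
y = np.repeat(list(shapes.keys()), 50)

knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)

# An unseen "measurement" whose features fall near the cylinder cluster:
pred = knn.predict([[0.68, 0.32, 0.48]])[0]
```

Because k-NN simply votes among nearby labeled points, the method needs no training beyond storing the synthetic library, which matches the procedural character of the published workflow.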
Another example of machine learning applied to SAS is micromagnetic structure determination from SANS. As in the soft matter cases, real-space structural information is encoded in 2D maps of the neutron scattering cross-section. As noted previously, a strong benefit of magnetic SANS is that the structure factor and cross-section are relatively straightforward to calculate from a theoretical model, often a micromagnetic continuum model. SAS data can also be used to refine simulations directly, as in the modified ForceBalance-SAS algorithm 103 , a feedback loop "force-field parameters → molecular dynamics (MD) simulations → SAS calculations → force-field parameters" (Figure 6f). Such refined force-field parameters improve the agreement between simulated SAS and certain experimental data.

II.3. Imaging and tomography
Neutron 113,114 and X-ray 115 imaging encompass a variety of modalities and have become essential techniques to unravel multidimensional and multiscale information in materials systems. As the complexity and size of imaging data grow, machine learning has also been applied to solve a variety of imaging-related computational tasks, including tomography and phase-contrast imaging, two types of high-dimensional imaging concerning the beam absorption and the phase shift associated with sample rotation, respectively. We restrict the further discussion to materials science and refer the readers to other reviews for applications in biomedical imaging 116,117 . Despite the variety of imaging modalities, the major data processing steps generally include image reconstruction and image segmentation.
In image reconstruction, one recovers the real-space information (usually the amplitude and phase of the imaged object) from data obtained at different sample positions. Neural network-based reconstruction algorithms have been shown to improve the reconstruction speed and quality 118 , as demonstrated in a neutron tomography experiment 119 (Figure 8a). Neural networks have also been trained to directly solve the phase retrieval problem in a particular type of coherent diffractive imaging (CDI) called ptychography, where the sample is scanned through overlapping beam positions 134 .
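For context, the classical iterative phase retrieval that such networks learn to replace or initialize can be sketched in 1D with the error-reduction scheme: alternate between enforcing the measured Fourier magnitudes and the known real-space support. This is a generic textbook algorithm under invented data, not the cited implementations:

```python
import numpy as np

rng = np.random.default_rng(4)

# Ground truth: a non-negative object confined to a known support region.
n = 256
support = np.zeros(n, dtype=bool)
support[96:160] = True
obj = np.zeros(n)
obj[support] = rng.random(support.sum())

measured_mag = np.abs(np.fft.fft(obj))   # the detector records magnitudes only

def fourier_err(x):
    return np.linalg.norm(np.abs(np.fft.fft(x)) - measured_mag) / np.linalg.norm(measured_mag)

x = rng.random(n) * support              # random start inside the support
err_start = fourier_err(x)
for _ in range(500):
    X = np.fft.fft(x)
    X = measured_mag * np.exp(1j * np.angle(X))        # enforce measured magnitudes
    x = np.fft.ifft(X).real
    x = np.where(support, np.clip(x, 0.0, None), 0.0)  # enforce support + positivity
err_end = fourier_err(x)
```

Each iteration projects between the two constraint sets, so the Fourier-domain error decreases; learned approaches aim to reach comparable reconstructions in a single forward pass.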

III.2. Photoemission spectroscopies
Contrary to XAS, which is generally bulk sensitive, photoelectron or photoemission spectroscopy (PES) is a surface-sensitive, photon-in, electron-out technique, measured with light sources from the hard X-ray to the extreme ultraviolet (UV) energy range. PES provides direct access to a material's electronic structure 155,156 . The high sensitivity of X-ray photoelectron spectroscopy (XPS) to the chemical environment makes it an essential tool for composition quantification. In this regard, machine learning-based spectrum fitting may be used to disentangle complex overlapping spectra. Aarva et al. use fingerprint spectra, calculated with bonding motifs obtained from an unsupervised clustering algorithm, to fit X-ray photoelectron spectra 157 . Drera et al. use simulated spectra to train a CNN to predict chemical composition directly from multicomponent X-ray photoelectron spectra from a survey spectra library 158 . Their approach obviates the need to fit these complex spectra directly, while showing robustness against contaminant signals within the survey spectra.
Apart from chemical quantification, modern PES with momentum-resolving detectors is capable of mapping the entire electronic structure of materials through multidimensional detection of photoelectron energy and momentum distributions 155,159 . The resulting 4D intensity data in energy-momentum space from PES share the same data structure as the inelastic scattering data for vibrational spectra. While this analogy implies that the machine learning approaches developed for inelastic scattering (to be discussed in Section III.3) are transferable, the relation between PES observables and microscopic quantities is significantly more complex due to the quantum nature of the electronic states and the multiple prefactors that effectively modulate the intensity values in a momentum-dependent manner 160 . One approach reconstructs band dispersions from such multidimensional data; remarkably, it does not require training, only a reasonably good prior guess as a starting point. Its reasonable computational scaling allows the reconstruction of multiband dispersions within the entire Brillouin zone, as demonstrated in the 2D material tungsten diselenide (WSe2).

III.3. Inelastic scattering
One of the major triumphs of neutron and X-ray scattering is inelastic scattering, which measures the dynamical properties of materials. As an example, a symmetry-aware neural network has been trained to predict the phonon density of states (DOS) directly from atomic structures (Figure 9a). The inherent symmetry effectively augments data without increasing its volume. Intuitively, the physical constraint imposed in the neural network acts like regularization, while encoding symmetry into the neural network resembles data augmentation of the input by symmetry operations. The predicted phonon DOS is shown in Figure 9b, with each of the four rows representing an error quartile. For lower-error predictions (first three rows in Figure 9b), the fine shape of the DOS is well captured; for high-error predictions (fourth row in Figure 9b), coarse features such as the bandwidth and DOS gap can still largely be predicted.
With such a predictive model available, the computational cost for phonon DOS is significantly reduced, and prediction in alloy systems becomes feasible.

IV.1. Instrument and beam
Thus far, the discussion has focused on using machine learning-augmented elastic and inelastic scattering and spectroscopies to better elucidate materials properties. Given the central role of beamline infrastructure in a successful scattering experiment, machine learning has also been applied to optimize instrument operation [189][190][191][192] . One example is predicting the electron beam size at a synchrotron source, which varies (Figure 11c, top) as the insertion device gaps change (Figure 11c, bottom). By constructing a neural-network-based supervised learning model with x = insertion device gaps or phase configurations and y = beam size, it can be shown that the neural network outperforms simple regression models and better captures the beam sizes (Figure 11d, top) with less error (Figure 11d, bottom). It is worth mentioning that the chosen fully-connected artificial neural network contains 3 hidden layers and many more parameters, which may also contribute to its superior performance compared to polynomial regression models.
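The neural-network-versus-polynomial comparison can be sketched with scikit-learn; the gap-to-beam-size response below is an invented nonlinear function, not the published machine data:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(5)

# Invented response: "beam size" depends nonlinearly on two "insertion device gaps".
gaps = rng.uniform(5, 40, size=(1500, 2))
beam = 50 + 5 * np.sin(gaps[:, 0] / 3) + 0.1 * gaps[:, 1] \
       + 0.1 * rng.standard_normal(1500)
train, test = slice(0, 1000), slice(1000, None)

# Baseline: quadratic polynomial regression.
poly = make_pipeline(PolynomialFeatures(2), LinearRegression())
poly.fit(gaps[train], beam[train])

# Fully-connected network with 3 hidden layers, as in the cited setup.
mlp = make_pipeline(StandardScaler(),
                    MLPRegressor(hidden_layer_sizes=(64, 64, 64),
                                 max_iter=5000, random_state=0))
mlp.fit(gaps[train], beam[train])

rmse = lambda m: float(np.sqrt(np.mean((m.predict(gaps[test]) - beam[test]) ** 2)))
```

Because the oscillatory dependence cannot be captured by a degree-2 polynomial, the network's held-out error is substantially lower, illustrating (not proving) why the flexible model wins on such response surfaces.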

IV.2. Data collection and processing
Machine learning can also greatly facilitate the scattering data collection and processing.
Here, by "processing" we mean procedures like data refinement, denoising, and automatic signal-background segmentation, but not the extraction of further materials information. Given the precious beamtime resources, the central question is how to extract the same amount of information with reduced beamtime. For diffractometry, one typical problem is diffraction peak-background segmentation, which usually requires finely sampled data when handled by traditional methods.
Measurements can also be accelerated by reducing the necessary sampling points in parameter space with guidance from machine learning. Kanazawa et al. propose a workflow that optimizes automatic sequential Q-sampling, suggesting the next Q-point based on uncertainties estimated from previously measured data (Figure 12c) 194 .
Noack et al. 195,196 have used kriging 197,198 , a Gaussian process regression method, to design the experimental sampling strategy in spatially-resolved SAXS measurements of block copolymer thin films. Compared to a complete set of SAXS measurements sampled on a regular grid, as is done traditionally, the authors show that kriging and the variants they developed required only a fraction of the sampled spatial coordinates, while arriving at a reconstruction with detail comparable to the outcome of the grid scan.
Their closed-loop approach highlights the potential for experimental automation to improve the efficiency in data acquisition and to maximize the information gathered from fragile samples.
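The uncertainty-driven sampling loop shared by these approaches can be sketched with a Gaussian process: fit the measurements so far, then measure next where the predictive uncertainty is largest. The "measurement" below is a synthetic 1D signal standing in for a slow scattering scan, not SAXS data:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(6)

def measure(x):
    """Hypothetical stand-in for a slow point-by-point measurement."""
    return np.sin(3 * x) + 0.5 * np.sin(7 * x)

grid = np.linspace(0.0, 3.0, 300)[:, None]
X = list(rng.uniform(0.0, 3.0, 5)[:, None])   # a few seed positions
y = [measure(x[0]) for x in X]

gp = GaussianProcessRegressor(kernel=RBF(0.3) + WhiteKernel(1e-4))
for _ in range(25):
    gp.fit(np.vstack(X), np.array(y))
    mean, std = gp.predict(grid, return_std=True)
    x_next = grid[np.argmax(std)]             # sample where the model is least certain
    X.append(x_next)
    y.append(measure(x_next[0]))

gp.fit(np.vstack(X), np.array(y))
final_rmse = float(np.sqrt(np.mean((gp.predict(grid) - measure(grid[:, 0])) ** 2)))
```

Thirty adaptively chosen points recover the signal over the whole interval, whereas a regular grid of the same density would spend many points in regions the model already predicts well.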
Chang et al. address a similar challenge by applying a CNN to SANS spectral data to reach super-resolution 107 . Even for anisotropic scattering, the CNN-based super-resolution reconstruction achieves better agreement with the ground truth than the conventional bicubic algorithm (Figure 12d).
Finally, machine learning can also be applied in problems that improve other data collection processes, such as calibrating the rotation axis for X-ray tomography 199 , improving the phase contrast-spatial resolution contradiction in phase-contrast imaging 200 , optimizing data segmentation in transmission X-ray microscopy (TXM) 201 , data visualization in neutron scattering data 202 , and achieving super-resolution in X-ray tomography 203 .

V.1. Machine learning on time-resolved spectroscopies
A wide variety of machine learning models are available to study the dynamics of physical systems, for example, the recurrent neural network (RNN) and RNN-based architectures.
They can be used for metamodeling of structural dynamics 204 , inferring the quantum evolution of superconducting qubits 205 , and modeling quantum many-body systems on large lattices 206 . RNN-based models have been applied to study spectra such as nonlinear tomographic absorption spectroscopy 207 and optical spectra of optoelectronic polymers 208 .
However, their applications to time-resolved neutron or X-ray scattering are still scarce. In the context of scattering measurements, extra challenges exist given that physical processes are now reflected through neutron or photon counts on detector arrays, accompanied by noise and loss of phase information. Fortunately, neural networks are good at denoising 209 , solving phase retrieval problems 210,211 , and dealing with missing information in time series 212 . Thus, RNN-based models can serve as promising techniques to extract deeper insight from time-resolved neutron and X-ray spectra.
Neural ordinary differential equations (Neural ODEs) are an alternative framework for learning from time-series data 213 . Because this framework can be intimately related to physical models, it can extrapolate well with limited training data, and it has found applications in quantum phenomena 214,215 . It will be interesting to see how neural networks combined with physical models can be trained on scattering data with Neural ODEs to extract new physical understanding. Another approach for learning complex nonlinear dynamics is deep Koopman operators 216,217 , where an autoencoder-like structure connects observed states with intrinsic states represented by learned Koopman coordinates, and the intrinsic states evolve under learned dynamics within the latent space. Such an architecture maps analogously to physical observables, i.e., scattering data, and the intrinsic quantum states of measured specimens, and thus can also serve as a promising approach to interpret time-resolved scattering data.
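As a classical precursor to the Neural ODE idea, one can already learn the parameters of an assumed differential equation from a noisy time trace with scipy; the damped oscillator here is an invented stand-in for a relaxing time-resolved signal, not a neural model:

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import least_squares

rng = np.random.default_rng(7)

# Damped oscillator standing in for a relaxing time-resolved signal.
def rhs(t, y, gamma, omega):
    pos, vel = y
    return [vel, -2.0 * gamma * vel - omega**2 * pos]

t_obs = np.linspace(0.0, 5.0, 60)
true_gamma, true_omega = 0.4, 3.0
truth = solve_ivp(rhs, (0.0, 5.0), [1.0, 0.0], t_eval=t_obs,
                  args=(true_gamma, true_omega))
data = truth.y[0] + 0.02 * rng.standard_normal(t_obs.size)

# Fit the ODE parameters by matching the simulated trace to the noisy data.
def residual(p):
    sim = solve_ivp(rhs, (0.0, 5.0), [1.0, 0.0], t_eval=t_obs, args=tuple(p))
    return sim.y[0] - data

fit = least_squares(residual, x0=[0.3, 2.8])   # start from a rough prior guess
gamma_fit, omega_fit = fit.x
```

A Neural ODE replaces the hand-written right-hand side with a trained network, but the fitting loop, integrating a dynamical model and matching it to observed time series, is the same in spirit.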

V.2. Leveraging information in real and reciprocal spaces
Frameworks that employ the principles of symmetry and Fourier transforms could efficiently learn models of complex physical systems and help us efficiently harness scattering data, in either real space, reciprocal space, or both.
Symmetry and Fourier transforms are two of the most valuable and commonly used computational tools for tackling complex physics problems. These tools encode much of the domain knowledge we have about arbitrary scientific data in 3D space: 1) The properties of physical systems (geometry and geometric tensor fields) transform predictably under rotations, translations, and inversion (i.e., the 3D Euclidean symmetry).
2) While physical systems can be described with equal accuracy in both real (position) space and reciprocal (momentum) space, some patterns/operations (e.g., convolutions, derivatives) are much simpler to identify/evaluate in one space than the other. The beauty of symmetry and Fourier transforms is that they make no assumptions about the incoming data (only that it exists in 3D Euclidean space); this generality is also an opportunity for improvement. The strength of machine learning is the ability to build efficient algorithms by leveraging the context contained in a given dataset to forgo expensive computation.
A constant theme in scattering is acquiring data in reciprocal space and using those data to inform something traditionally represented in real space. While there are models that operate on these domains separately, extending these methods to simultaneously operate on and exchange information between both spaces would be a valuable and natural direction. This would also allow the user to input and output data in whichever space is more convenient and intuitive, and would directly support methods like diffraction imaging, which contain information in both spaces.
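The real/reciprocal duality, and the phase loss that makes inversion hard, can be seen in a few lines with a toy 1D "crystal" (the density below is invented for illustration):

```python
import numpy as np

x = np.arange(256)
# Toy 1D "crystal": periodic density with period 16 grid points, shifted by 5.
density = 0.5 + 0.5 * np.cos(2 * np.pi * 16 * (x - 5) / 256)

F = np.fft.fft(density)          # reciprocal-space amplitudes (complex)
intensity = np.abs(F) ** 2       # a detector measures this: the phase is lost

# Bragg-like peaks sit at k = 0 and k = +/-16 (i.e., FFT bins 0, 16, 256 - 16).
peaks = sorted(np.flatnonzero(intensity > 1e-6 * intensity.max()).tolist())

# Inverting the magnitudes alone recovers a density with the right periodicity
# but the wrong origin: the positional information carried by the phase is gone.
recon = np.fft.ifft(np.abs(F)).real
shift_lost = not np.allclose(recon, density)
```

Models that pass information between both representations must therefore handle exactly this asymmetry: peak positions survive the measurement, while phases do not.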

V.4. High-performance computing for quantum materials
Increasingly, studies in functional materials underscore quantum phenomena emergent from entanglement. These quantum phenomena, such as quantum spin liquids, unconventional superconductivity, and many-body localization, go beyond a purely structural description. However, the associated correlations are encoded in inelastic scattering spectroscopies through their energy-momentum resolution, which motivates corresponding theoretical predictions. Due to quantum entanglement, semiclassical theories like mean-field theory, linear spin-wave theory, or even DFT become insufficient due to the absence of static or dynamic electron correlations. Thus, the machine learning and cross-validation of spectroscopies associated with these materials require sophisticated computational methods.
To sufficiently include the quantum entanglement in spectral calculations, two promising routes have been widely attempted. The first route is the correction of DFT by embedding other methods. Beyond the elementary DFT+U corrections for total energy, the GW method allows a self-consistent correction of the Green's function using the screened Coulomb interaction in the random-phase approximation (RPA) form 229 . A more sophisticated correction for strong correlation effects is the DFT + DMFT (dynamical mean-field theory) method 230 . By mapping the self-energy into a single-site impurity problem, DMFT further includes local high-order correlations in the spectral calculations 231 . These corrections on top of DFT enable spectral calculations for materials with substantial quantum entanglement. However, as the corrections are usually biased, the accuracy of the results is sometimes not well-controlled. The DFT+DMFT method has been widely used to simulate the single-particle Green's function relevant for photoemission experiments 232 . Its numerical complexity increases dramatically when extended to two-particle or four-particle correlation functions, which are required to evaluate inelastic scattering cross-sections. Implemented using the Bethe-Salpeter equation   Spectral data serving as input to a supervised machine learning model for a materials' classification or property prediction task. c. Materials structural or property data serving as input to a supervised machine learning model for direct and efficient prediction of the full scattering spectra. d. Spectral data as part of an unsupervised machine learning task that can identify inherent patterns or clusters within the dataset that may correspond to meaningful physical parameters.    and disordered chains to three features to construct a geometric shape space, with each datapoint labeled according to its associated geometric class. 
Similarly, a model for structural parameter prediction is obtained by reducing the simulated SAS patterns of asymmetric units and biological assemblies available in the PDB to the same three features. Each datapoint in this structural parameter space is associated with a value for each structural parameter of interest, such as the maximal extent (Dmax) and the molecular mass. The geometry and structural parameters of an entity in an unknown biomolecular solution can then be determined by computing the three features from its experimental SAS spectrum, mapping to the corresponding coordinate in the geometric or structural parameter space, and weighting the contributions of its k nearest neighbors. f. Flowchart depicting the modified ForceBalance-SAS algorithm. Adapted from Demerdash et al. 103 .

[Figure caption fragment] The full bar height represents the score for models trained on the full XANES feature space, while the gray bars represent the results using only the pre-edge region. Adapted from Carbone et al. 143 . b. Representative RDC for an arbitrary system at nine different values of α, depicting the increasing resolution of the RDC with increasing α. Adapted from Madkhali et al. 145 . c. Schematic illustration of a message-passing neural network (MPNN). d. Time evolution of the operando Cr K-edge XANES spectra of a Cr VI /SiO2 catalyst during reduction with ethylene (from black to red) and during ethylene polymerization (from red to bold orange). Adapted from Guda et al. 151 .

[Figure caption fragment] b. PtychoNN takes diffraction patterns at different spatial scan points (row A) and retrieves the real-space amplitude and phase (rows C and E), which agree well with a conventional phase-retrieval approach (rows B and D). Adapted from Cherukara et al. 134 . c. An adaptive machine learning framework that recovers real-space electron densities from diffraction patterns, taking the output of a 3D CNN as the initial condition of an extremum-seeking algorithm. Adapted from Scheinker and Pokharel 135 .
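The nearest-neighbor mapping described in the SAS caption above can be sketched as follows. This is a minimal illustration only: the three summary features, the toy Guinier-like curves, and all parameter values are assumptions for demonstration, not the published method's actual features or training data.

```python
import numpy as np

def sas_features(intensity):
    """Reduce a 1D SAS intensity profile to three summary features.
    Placeholder stand-ins for the three features of the actual method."""
    logI = np.log(np.clip(intensity, 1e-12, None))
    return np.array([logI.mean(), logI.std(), np.gradient(logI).mean()])

# Synthetic "training library": toy Guinier-like SAS curves with known Dmax.
rng = np.random.default_rng(0)
q = np.linspace(0.01, 0.5, 128)                          # momentum transfer grid
dmax_train = rng.uniform(20.0, 120.0, 200)               # maximal extent (angstrom)
curves = np.exp(-np.outer(dmax_train**2 / 300.0, q**2))  # toy scattering curves
X_train = np.array([sas_features(c) for c in curves])

def predict_dmax(curve, k=5):
    """Map an 'experimental' curve into the 3D feature space and return the
    distance-weighted average Dmax of its k nearest neighbors."""
    f = sas_features(curve)
    dists = np.linalg.norm(X_train - f, axis=1)
    nearest = np.argsort(dists)[:k]
    w = 1.0 / np.maximum(dists[nearest], 1e-12)
    return float(np.sum(w * dmax_train[nearest]) / np.sum(w))

# A curve generated with Dmax = 60 should map back close to 60.
test_curve = np.exp(-(60.0**2 / 300.0) * q**2)
dmax_est = predict_dmax(test_curve)
```

The same lookup works for any scalar label attached to the training points (molecular mass, geometric class by majority vote), which is what makes the shared three-feature space convenient.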

Figure 9. Direct prediction of the phonon density of states (DOS) with the Euclidean neural network (E(3)NN). a.
Crystal structures are encoded as graphs, with atoms as nodes carrying feature vectors and bonds as edges. The graph is then passed into the E(3)NN, which preserves the crystallographic symmetry. b. The predicted phonon DOS is displayed in four rows, each representing an error quartile. Fine details of the DOS are well captured for samples in the first three rows (lower-error predictions), while coarse features such as the bandwidth and the DOS gap are still largely predicted for the last row (higher-error predictions). Reproduced from Chen, Andrejevic, and Smidt et al. 185 .

[Figure caption fragment] ...189 . c. Variation in the electron beam size (top) arises from the insertion-device gaps (bottom). d. A neural network is trained to accurately predict beam sizes from the insertion gaps; the comparison of vertical beam sizes (top) and the deviations of different model predictions (bottom) clearly shows that the neural network can outperform regression models. Subfigures c and d are reproduced from Leeman et al. 190 .

Figure 12. Machine learning applications in collecting and processing scattering data. a. A deep U-Net provides better peak masks for more accurate Bragg peak integration. Reproduced from Sullivan et al. 84 . b. SANS patterns predicted by traditional kernel density estimation (KDE) and by a Gaussian mixture model with a B-spline-based prior (BSGMM); the latter yields smoother, and hence better, predictions. Reproduced from Asahara et al. 193 . c. Data-driven sequential measurement for SANS, which proposes the next sampling Q points based on previously measured data. Reproduced from Kanazawa et al. 194 . d. Comparison of super-resolution for SANS data by a CNN-based method and a baseline bicubic up-sampling algorithm; the CNN-based method yields a better reconstruction of the high-resolution data. Reproduced from Chang et al. 107 .
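The graph encoding of Figure 9a can be illustrated with a minimal sketch: atoms become nodes with feature vectors, near-neighbor pairs (including periodic images) become edges, and one generic message-passing update mixes neighbor information into each node. The toy CsCl-like structure, the cutoff, and the plain distance-weighted update are illustrative assumptions; the actual E(3)NN uses symmetry-equivariant convolutions rather than this simple rule.

```python
import numpy as np

# Toy "crystal": fractional coordinates and atomic numbers for a CsCl-like cell.
# Hypothetical example data, not taken from the reviewed work.
positions = np.array([[0.0, 0.0, 0.0], [0.5, 0.5, 0.5]])
atomic_numbers = [55, 17]          # Cs, Cl
cell = 5.64 * np.eye(3)            # cubic lattice parameter in angstroms

# Nodes: per-atom feature vectors (here simply the atomic number).
node_feats = np.array([[float(z)] for z in atomic_numbers])

# Edges: connect atom pairs within a cutoff, scanning neighboring periodic images.
cutoff = 5.0
edges = []
for i, ri in enumerate(positions):
    for j, rj in enumerate(positions):
        for shift in np.ndindex(3, 3, 3):
            s = np.array(shift) - 1                   # image offsets in {-1, 0, 1}
            d = np.linalg.norm((rj + s - ri) @ cell)  # cartesian distance
            if 0.0 < d < cutoff:
                edges.append((i, j, d))

# One plain message-passing step: each node accumulates distance-weighted
# neighbor features and adds the average to its own feature vector.
msgs = np.zeros_like(node_feats)
counts = np.zeros(len(node_feats))
for i, j, d in edges:
    msgs[i] += node_feats[j] / d
    counts[i] += 1
updated = node_feats + msgs / np.maximum(counts, 1)[:, None]
```

Stacking several such updates and pooling the node features into a fixed-length vector yields a graph-level representation that a readout network can map to a spectral target such as the phonon DOS.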