Elucidating proximity magnetism through polarized neutron reflectometry and machine learning

Polarized neutron reflectometry is a powerful technique to interrogate the structures of multilayered magnetic materials with depth sensitivity and nanometer resolution. However, reflectometry profiles often inhabit a complicated objective function landscape using traditional fitting methods, posing a significant challenge to parameter retrieval. In this work, we develop a data-driven framework to recover the sample parameters from polarized neutron reflectometry data with minimal user intervention. We train a variational autoencoder to map reflectometry profiles with moderate experimental noise to an interpretable, low-dimensional space from which sample parameters can be extracted with high resolution. We apply our method to recover the scattering length density profiles of the topological insulator-ferromagnetic insulator heterostructure Bi$_2$Se$_3$/EuS exhibiting proximity magnetism, in good agreement with the results of conventional fitting. We further analyze a more challenging reflectometry profile of the topological insulator-antiferromagnet heterostructure (Bi,Sb)$_2$Te$_3$/Cr$_2$O$_3$ and identify possible interfacial proximity magnetism in this material. We anticipate the framework developed here can be applied to resolve hidden interfacial phenomena in a broad range of layered systems.

However, subtle interfacial phenomena such as proximity magnetism can be difficult to single out from bulk contributions to the PNR signature.The composition and magnetic profiles of a sample are typically expressed in terms of the nuclear and magnetic scattering length densities (SLD), which can be recovered from reflectometry measurements by fitting the data to a candidate model.This traditionally involves building a theoretical model of the experimental system in terms of structural parameters -density, thickness, interface roughness, and magnetization -of the constituent layers, and simulating the associated reflectometry profile using the methods of Parratt recursion [55] or the Abeles matrix formalism [56].However, due to information loss about the phase of the reflected neutrons, different SLD profiles can gener-ate highly similar reflectivities, leading to a complicated cost landscape between the theoretical and experimental profiles with potentially many local minima.Thus, parameter refinement often demands expert insight to identify a suitable starting point and adequately constrain the parameter space of the model.Methods to resolve the phase ambiguity through carefully designed experiments [57][58][59][60][61] have also been proposed.More generally, additional insights from X-ray diffraction, transmission electron microscopy, and/or bulk magnetometry are required, as well as selection of appropriate models [62,63] and optimization methods, which can play a critical role in steering the refinement process.For example, many existing neutron and X-ray reflectivity refinement programs (e.g.GenX [64], Refl1D [65], StochFit [66]), implement stochastic optimization methods such as differential evolution, simulated annealing, or stochastic tunneling, to better manage multiple local minima.More recently, machine learning-guided fitting approaches have been introduced to both improve and automate parameter retrieval from neutron reflectometry data with promising results [67][68][69][70][71].However, elucidating subtle interfacial phenomena such as magnetic proximity from PNR signatures remains a significant challenge.
In this work, we develop an alternate, data-driven framework to retrieve the sample parameters of candidate proximity-coupled systems from their PNR profiles with minimal user intervention.Using a variational autoencoder (VAE), we map reflectometry profiles simulated from a broad range of candidate physical parameters to a low-dimensional latent space from which the true sample parameters can be readily obtained.The decoded profiles directly inform the suitability of the parameter search space through the reconstruction quality and are robust to moderate perturbation of the input reflectivities emulating experimental noise.Importantly, we find that the latent mapping naturally bypasses the issue of multiple local minima and is both well-organized and visually interpretable in terms of the physical parameters.Thus, the latent representation can further be used to automatically refine the parameter search space for poorly reconstructed profiles.We evaluate our model on its ability to recover the sample parameters of a well-studied TI-ferromagnetic (FM) insulator heterostructure, Bi 2 Se 3 /EuS, exhibiting proximity magnetism.Our model predictions are found to be consistent with the results of traditional fitting methods and at the same time require no expert insight for parameter initialization or refinement.We conclude by applying our model to analyze a more challenging PNR profile of the TI-antiferromagnet (AFM) heterostructure, (Bi,Sb) 2 Te 3 /Cr 2 O 3 , which points to possible proximity magnetism at the resolution limit.

BACKGROUND
Polarized neutron reflectometry measures the spin-dependent specular reflection of an incident neutron beam from the surface of a magnetic thin film.The reflectometry profile, R(Q), is a function of the wave vector transfer Q = 4π sin θ/λ, where θ is the angle of reflection and λ is the wavelength of the neutron.In the first Born approximation, the reflectivities of the neutron spin nonflip channels, R ++ (Q) and R −− (Q), are related to the nuclear and magnetic SLDs according to [13], where ρ N and ρ M are, respectively, the nuclear and magnetic scattering length densities; φ is the angle between the magnetization and the neutron polarization; and the superscript + and − denote the neutron spin up and down states, respectively.The coordinate z measures the depth perpendicular to the sample surface.To develop the VAE-based approach to recover the SLD profiles of candidate proximity-coupled systems, we consider the specific thin film system consisting of a Bi 2 Se 3 /EuS heterostructure atop a sapphire (α−Al 2 O 3 ) substrate with an amorphous a-Al 2 O 3 capping layer, as shown in Fig. 1a.Proximity-induced magnetism has been reported at the interface between the TI Bi 2 Se 3 and EuS, a ferromagnet with a Curie temperature of approximately 16.6 K [24,72].The reflectometry profile shown in Fig. 1b consists of two curves, R ++ and R −− , corresponding to the two neutron spin non-flip channels aligned parallel and antiparallel, respectively, to an in-plane external magnetic field H ext = 1 T. Note that R ++ and R −− have been normalized to a maximum value of 1.In a typical parameter refinement program, the theoretical model is fit simultaneously to both spin channels to obtain the SLD profile of the sample (Fig. 1c).However, due to the phase ambiguity and large number of fitting parameters, different SLD profiles can produce excellent fits to the measured data.For instance, the fit obtained in Fig. 1b corresponds to the SLD profile shown in Fig. 1c, which proposes a high interface roughness between the Bi 2 Se 3 and EuS films but no discernible proximity effect.Fig. S1 shows three additional fits to the data obtained using the GenX parameter refinement program with different initial populations, which produce mixed results for the relevant parameters.The objective of the data-driven approach is to retrieve the optimal physical parameters of a target sample from its PNR profile under moderate experimental noise, and to compute a reliable SLD profile from learned sample parameters with minimal influence from common issues in iterative optimization algorithms, such as sensitivity to parameter initialization and stagnation.We find this approach can further inform the suitability of the entire parameter search space, not just the predicted parameters.
Our approach is based on the VAE [73], an unsupervised deep generative model which is trained to recover an input from a low-dimensional encoding by minimizing the associated reconstruction error.The VAE comprises an encoder and decoder network, as shown in Fig. 1d.The encoder network outputs parameters to a probability density, q θ (z | x), parameterized by a set of trainable weights θ, from which the latent features z are sampled.The decoder network outputs the parameters to the probability distribution of the data, p φ (x | z), parameterized by a set of weights φ, using the sampled latent features z.In contrast to a simple autoencoder, the VAE assumes that the encoded -or latent -vector elements are drawn from a prior distribution p(z), which is enforced by an additional regularization term in the loss function, where D KL denotes the Kullback-Leibler (KL) divergence, computed between the returned distribution q θ (z|x) of the latent vector z and the prior distribution p(z), and β is a hyperparameter regulating the degree of entanglement between the learned latent channels [74].The prior distributions are typically modeled as independent unit Gaussians, i.e., p(z i ) ∼ N (0, 1), and the approximate posterior q θ (z | x) as a Gaussian with mean and variance estimated by the encoder.Additional details of our specific implementation are provided in the Supplementary Information.By encoding the input as a distribution rather than as a single point, the VAE compels the latent space to be smooth and continuous, with nearby points corresponding to similar reconstructions of the input.Thus, in the context of PNR, the VAE can be considered as a way to map PNR profiles into a well-organized and informative low-dimensional space, as they naturally evolve as a function of a few well-defined structural parameters (Fig. S3).The potential advantages of a VAE-based approach to parameter retrieval are further exemplified through a toy example described in the Supplementary Information.

Network architecture
Like the conventional fitting programs, the VAE treats the R ++ and R −− channels jointly using a convolutional neural network (CNN) encoder with a combination of one-and two-dimensional kernels (Fig. 1d).The convolutional and pooling layers are followed by a set of fullyconnected layers operating on the flattened CNN output, returning the predicted means µ and standard deviations σ of the normal distributions from which the latent vector z is sampled.When the latent representation is conditioned on the sample parameters, we can interpret each latent channel as effectively returning a distribution over one parameter's value, illustrated schematically in Fig. 1d.The mean values of the latent distributions are fed to a simple regressor consisting of a single hidden and activation layer that predicts the physical parameter values.A ReLU activation is used to restrict the predicted values to be non-negative in accordance with the physical parameters.At the same time, the sampled vector z is passed to the decoder, which returns the reconstructed profiles R ++ and R −− .All three networks -encoder, decoder, and regressor -are trained end-to-end by minimizing the total loss function, where v and v denote the true and predicted parameter values, respectively, and λ is a hyperparameter weighting the contribution of parameter regression to the total loss.Details regarding the implementation of the loss function and the selected hyperparameters are provided in the Supplementary Information.

Data preparation
To generate the training and development datasets for the neural network model, we used the GenX neutron reflectivity modeling code [64] to simulate the PNR profiles of 200, 000 candidate systems of the Bi 2 Se 3 /EuS heterostructure.For each example, the constituent layers are parameterized by their density, thickness, roughness, and magnetization, which are sampled uniformly at random over a range of experimentally feasible values (Fig. S4).Importantly, these parameter ranges can be quite broad around the set of nominal parameter values, and can differ in size for different quantities depending on their level of uncertainty.For example, the parameter ranges for the amorphous capping layer are intentionally broader compared to the TI and FM layer thicknesses that are carefully controlled during growth.Density and magnetization are expressed in terms of formula units, which are compatible with the GenX simulation software; however, the final results are converted to conventional units before plotting.The proximity effect is modeled as a thin interfacial layer between the Bi 2 Se 3 and EuS films with a sampled thickness, roughness, and magnetization, and sharing the density value of the neighboring TI film.The proximity layer magnetization is constrained not to exceed the sampled value of the EuS magnetization for any given example.Additionally, the minimum possible thickness of the proximity layer is set to 2 Å representing a target spatial resolution threshold.Note that the neutron wavelengths for the PNR measurements conducted in this work are on the order of 5 Å.Examples for which the sampled proximity layer thickness falls below the threshold are simulated without an interfacial layer and are designated as non-proximity-coupled.The PNR profiles are simulated over the experimentally-accessible Q-range from 0.1 to 1.3 nm −1 , and the intensities normalized to a maximum value of 1.To simulate experimental noise, the generated PNR profiles are randomly perturbed at each Q point by sampling a Gaussian distribution with standard deviation estimated using the errorbars of the corresponding experimental reflectometry profile.Specific details regarding noise estimation, including selection of the standard deviation and dependence of the results on different noise levels, are provided in the Supplementary Information.
Additionally, the instrument resolution and background are sampled uniformly at random between 0.001 and 0.01 nm −1 , and 10 −8 and 10 −4 on a logarithmic scale, respectively.Since the reflectivity spans nearly eight decades, the base-10 logarithm of the profiles is used as input (output) of the encoder (decoder) to more equitably treat the intensity values.To similarly place the sample parameters on equal footing, the output of the regressor is taken to be the standardized values of the physical parameters.In particular, the regressor is trained to predict the density, thickness, and roughness of each thin film layer, as well as the magnetization of the ferromagnetic and proximity layers.The substrate density is also predicted by the regressor, but substrate thickness and roughness are excluded from fitting as the substrate is considered macroscopically thick with a relatively uniform surface roughness (approximately 3 Å for the sapphire substrate used in this system).While the instrument resolution and background are not predicted explicitly by the regressor, variations in the data as a function of these parameters can still be captured by the unregularized latent dimensions.They can thus be regarded as underlying degrees of freedom which may be more complex functions of the latent space.The freedom to choose the number of output quantities, even as the training data reflect variations in the full set of sample and instrument parameters, is one advantage of the machine learning-based approach: It allows one to output only the most relevant quantities, reducing the needed training data volume and neural network size.Additionally, machine learning makes it possible to seek hidden relationships between the data and parameters that may not be captured in an approximate theoretical model.Finally, the generated data are subdivided into training, validation, and test sets according to a 70/10/20% split.The data generation, model implementation, and analysis codes are provided in the linked GitHub repository [75].

RESULTS
We evaluate the trained VAE on its ability to recover the sample parameters from the experimental PNR profiles of the Bi 2 Se 3 /EuS system.The loss trajectories of the training and validation sets are shown in Fig. S8.We first compare the measured and decoded reflectometry profiles corresponding to four PNR experiments taken at different temperatures between 5 K and 300 K in Fig. 2a.These experimental PNR data are reproduced from Ref. 24.The left panel of Fig. 2a shows the reconstructed reflectometry profiles for both spin channels for the measurements at 5 K.The right panel shows the spin asymmetry (R ++ −R −− )/(R ++ +R −− ) calculated for both the measured and decoded profiles of the four experimental reflectivities at 5, 50, 75 and 300 K. Representative reconstructions of the test dataset in each error quartile are also shown in Fig. S10a.The four decoded experimental profiles are all found to be inliers of the distribution of reconstruction errors (Fig. S10b).This suggests that the chosen parameter ranges are likely suitable for the data under consideration.Next, using the parameter values predicted by the regressor, we calculate the SLD profiles for the measurements at each temperature (Fig. 2b).Note that the SLD profiles are computed directly using predicted parameter values and are not derived from the reconstructed PNR profiles.The nuclear (NSLD) and absorption (ASLD) scattering length density profiles appear largely consistent for the measurements at different temperatures, suggesting that the predicted values of the temperature-independent parameters, such as the thickness and density of each layer, are physically plausible.However, we do observe a change in the NSLD of the bulk FM layer at 300K that is worth mentioning.By examining the underlying parameters generating each SLD profile, we identify that the bulk FM thickness increases slightly with temperature, which appears to coincide with slight reductions in the thickness of the TI and proximity layers, and a more significant reduction in the interface roughness.At this stage, the exact origin of this temperature dependence is not well understood; the roughness values predicted for the FM and capping layers do not appear to follow a clear temperature-dependent trend and are likely prone to bigger uncertainties than the bulk parameters, but a possibility is that higher temperatures contribute to smoothing the buried interfaces, such as those between the TI and FM, and FM and capping layers, which could partially explain the NSLD fluctuation.The magnetic scattering length density (MSLD) profile is maximal at the EuS layer and exhibits a slight shoulder near the TI interface at 5 K, corresponding to the proximity layer.The MSLD magnitude drops progressively as the temperature is increased and disappears at 300K.These observations can be further traced back to the latent representations of the four experimental examples.In Fig. 2c, we visualize the latent space by projecting the encoded test dataset along the two dimensions with the largest local gradients for a given parameter value, e.g.substrate density ρ sub .Specifically, the horizontal and vertical axes of each subplot correspond to the latent dimensions with the largest and second largest gradient of the target parameter, respectively.The local gradients ∂v i /∂z j , where v i denotes the i-th parameter and z j denotes the j-th latent channel, are estimated using the 32 nearest-neighbors of each scattered point.Visualizations of all predicted parameters are given in Fig. S12.The scattered points, each corresponding to one profile of the test dataset, are colored according to the true value of the parameter viewed in each subplot.We find that the latent space is well-organized according to these parameter values, including the thickness t prox and magnetization m prox of the proximity layer.The latent representations of the four experimental PNR profiles are indicated by the outlined circles in the projection plots and are colored by the corresponding temperature.Notably, for temperature-independent quantities such as the TI density ρ TI , the experimental points at different temperatures are generally insensitive to the gradient direction of the underlying parameter value, while for those like the EuS magnetization m FM and m prox , the points at different temperatures follow the gradient direction of the parameter values closely.This corroborates our observations that the trained VAE learns a sensible and interpretable latent representation of PNR profiles from which the physical parameter values may be estimated.
The degree of parameter entanglement can be inferred from Fig. 2d, which shows the magnitudes of the average gradients of each sample parameter with respect to each latent channel.The first 13 latent dimensions are conditioned to vary with the values of the corresponding parameter, while the remaining channels are not explicitly linked to one specific physical parameter, i.e. they are regularized by the conventional standard normal prior distribution.However, we observe that these "free" channels sometimes participate in relating two or more parameters to one another, as observed for t prox , t TI , and t FM in Fig. 2d.
Next, we assess the overall regression accuracy of our trained model on the test dataset for each of the sample parameters visualized in Fig. 2c.In each subplot of Fig. 3a, the test data points are histogrammed according to the true and predicted values of a given sample parameter.Corresponding plots for the complete set of predicted parameters are shown in Fig. S14a.We also include the histogram for the parameter mt prox , defined as the product of proximity layer thickness and magnetization, in the last panel of Fig. 3a.The values of the bulk layer properties appear very well reproduced by the regressor, while t prox and m prox , which exhibit much weaker signatures and tend to be expressed most in the noisier, high-Q region of the PNR profiles, are somewhat underestimated at large values and overestimated for nonproximity-coupled samples.Note that the sharp discontinuity in the t prox histogram corresponds to the resolution threshold of 2 Å imposed on the generated data.
To assess the reproducibility of the regression results, we trained ten identical models with different initial weights and collected statistics of the resulting predictions.Fig. 3b shows the predictions of these models for the values of t prox , m prox , and m FM .We can optimize the trade-off between the true (tpr) and false (fpr) positive rates of correctly classifying proximity magnetism to obtain classification thresholds of thickness and magnetization that best separate the data points between the two classes.These allow us to estimate the resolution threshold of the trained model to correctly distinguish samples with and without proximity magnetism within a certain confidence interval.The method of threshold determination is described in detail in the Supplementary Information and yields the average classification thresholds of thickness and magnetization across the ten models indicated by the dashed gray lines in Fig. 3b.The thresholds are found at 5.1 Å and 16 emu cm −3 (1 emu = 1 A•m 2 ), corresponding to recalls of 85% for t prox , and 80% for m prox , for both the positive and negative classes (Fig. S14b-c).The optimal thickness threshold is slightly higher than the resolution threshold of the generated data but corresponds well to the neutron wavelength of the reflectivity simulation, 4.75 Å.While a small spread in the predicted values across the ten models is observed, the overall trend in the predictions is as expected.Notably, all ten models predict t prox and m prox values above their respective thresholds at 5 K.The values of m prox decay with increasing temperature while t prox remains relatively constant until a slight drop at 300K.Similarly, m FM drops rapidly beyond its Curie temperature.We note that the predicted values of m prox at intermediate temperatures are still often nonzero.This may be attributable to strong magnetic fluctuations above the EuS Curie temperature stabilizing a weak proximity effect below the resolution threshold of our model [47,76].Note that if weak proximity magnetism persists at high temperatures, it must be below the resolution threshold of our current model.A tailored network trained on a narrow range of parameters can potentially be devised to clarify even weaker signatures of proximity magnetism that may be expected at higher temperatures; however, the current model is highly suitable for surveying the evolution of proximity magnetism over a broad experimental parameter space, such as a wide temperature range.The predicted values obtained across the ten models for the remaining sample parameters, which are expected to be temperatureindependent, are plotted in Fig. S16.Here, we observe similar consistency among most parameters, with the exception of fluctuations in the roughnesses and FM layer thickness noted in the previous discussion.

Resolving interfacial antiferromagnetic coupling
Lastly, we apply our approach to elucidate proximity magnetism from a more challenging PNR profile of an intrinsic TI (Bi,Sb) 2 Te 3 interfaced with the AFM Cr 2 O 3 , shown schematically in Fig. 4a.Bulk Cr 2 O 3 is a well-known antiferromagnetic insulator with a Néel temperature of 307 K.At the interface between a TI and AFM, magnetic atoms on the AFM surface have been shown to induce interfacial ferromagnetic order in the TI which can survive at much higher temperatures than that produced by doping or interfacing with a FM film, owing to the typically higher Néel temperatures [25,28,29,33,77,78].However, magnetic proximity coupling between an AFM and TI is comparatively weaker and thereby more challenging to isolate experimentally.Fig. 4b shows the experimental PNR profile of the intrinsic (Bi,Sb) 2 Te 3 /Cr 2 O 3 system measured at the Magnetism Reflectometer at the Spallation Neutron Source (SNS) at Oak Ridge National Laboratory [79].Data were collected at a temperature of 5 K with an in-plane external magnetic field of 1 T. The TI has a nominal composition of (Bi 0.2 Sb 0.8 ) 2 Te 3 .Additional experimental details are described in the Methods section.Subtle evidence of spin splitting is observed in the associated plot of the spin asymmetry, (R ++ − R −− )/(R ++ + R −− ), shown in Fig. 4c.The experimental data in Figs.4b-c are superimposed with the corresponding best fit obtained using the GenX parameter refinement program [64].However, a major challenge encountered during conventional fitting is that repeated refinement with different initial populations fails to reproducibly predict the proximity effect, as the weak spin splitting observed in the PNR profiles could be attributed to either a small net magnetization at the AFM surface, or to proximity magnetism in the interfacial TI layer.To address this challenge, we train our VAE on a set of synthetic PNR profiles of the heterostructure shown in Fig. 4a.Similar to the case of Bi 2 Se 3 /EuS, we simulate the PNR profiles of 200, 000 candidate systems parameterized by the density, thickness, and roughness of each layer, which included the TI, AFM, sapphire substrate, Te capping layer, and a possible TeO 2 surface film.In a subset of these examples, we model the presence of an interfacial FM layer on either the AFM or TI surface, or both, parameterized by thickness, roughness, and magnetization, and sharing the density value of the corresponding bulk layer.Note that the predefined range for the magnetization of the interfacial layers is equivalent in terms of formula units but inequivalent in units of emu cm −3 due to the very different densities of the TI and AFM materials (Fig. S5).The theoretical thickness resolution is likewise set to 2 Å for both possible interfacial layers.PNR profiles are simulated over the experimentally-measured Q-range from 0.1 to 1.72 nm −1 and normalized to a maximum value of 1.The instrument background is sampled uniformly at random between 10 −8 and 10 −4 on a logarithmic scale.The remaining data preparation steps are conducted as for the Bi 2 Se 3 /EuS example.The reconstructed PNR profile, shown in Fig. 4e, matches the experimental data closely and is in the top reconstruction error quartile (Fig. S11).The corresponding SLD profile obtained using the predicted sample parameters is shown in Fig. 4d, where an interfacial FM layer is evidenced by the peak in the MSLD profile at the TI surface.We also plot the spin asymmetry in Fig. 4f, which displays a weak non-zero signature near 0.5 nm −1 .Most significantly, the trained VAE yields a latent representation for the test dataset and experimental example as shown in Fig. 4g.The points in each subplot are colored according to the true values of the thickness or magnetization of the interfacial FM layer on either the AFM surface, denoted t iAFM and m iAFM , or the TI surface, denoted t prox and m prox , re-spectively.The remaining sample parameters predicted by the model are plotted in Fig. S13a.We see that the experimental PNR profile is mapped unambiguously to a region with t iAFM ≈ 0 and m iAFM ≈ 0, while t prox and m prox are predicted to be approximately 8.4 Å and 12 emu cm −3 , respectively.The regression accuracy for each predicted parameter is plotted in Fig. S15a.To test the robustness of these predictions, we train another ten identical models with different initial weights.Figs.4h  and i show the predictions of the ten models for the values of t iAFM and m iAFM , and t prox and m prox , respectively.The gray dashed lines delimiting proximity-coupled examples are similarly obtained by optimizing the trade-off between the true and false positive rates of correctly classifying proximity magnetism in the validation set for each model.The average threshold values for the proximity layer are found at 4.4 Å and 8.2 emu cm −3 , corresponding to recalls of 76 and 72% in both classes, respectively (Fig. S15b-c).Since the values of m iAFM are roughly three times as large as those of m prox in conventional units, the resolution threshold for m iAFM is computed separately following the same procedure and found to be 20.6 emu cm −3 .Using these thresholds, we find that only half of the models predict proximity magnetism is present when considering the predicted m prox values, though t prox is well-resolved by most models; however, almost all models unambiguously predict t iAFM and m iAFM well below their respective thresholds.Thus, although the VAE approach pushes the boundary for resolving subtle magnetic signatures, it is possible that a very weak proximity effect is present at or slightly below the resolution threshold we could achieve with the current model.Nonetheless, the predictions could be used as a valuable screening tool before conducting finer measurements with either longer acquisition times or at higher Q to more clearly resolve the spin splitting, which could potentially benefit experimental planning and optimize the use of scientific user facilities.

DISCUSSION
Machine learning methods are valuable means of uncovering hidden patterns in materials data and elucidating the relationships between structural descriptors and measured quantities.However, it is often desirable to balance the flexibility of "black box" neural networks with a degree of interpretability in terms of physical parameters.We accomplish this in our VAE-based framework by conditioning the latent channels to emulate the behavior of the original sample parameters, which enables direct, visual inspection of encoded profiles in terms of meaningful physical quantities.However, our assessment of parameter entanglement in Fig. 2d reveals underlying correlations between sample parameters; for example, proximity layer thickness and magnetization are deeply entwined, since finite proximity magnetization implies finite thickness of the interfacial layer, and vice versa.This suggests that the prior assumption that all latent dimensions are sampled from independent normal distributions does not perfectly describe a latent space that is conditioned to vary directly with certain correlated physical parameters.A possible improvement to the existing approach would be to describe the latent space in terms of several joint distributions of a few strongly-correlated parameters, which can be tuned to balance the number of necessary network parameters.We note on a few additional considerations for future work.In particular, density fluctuations may be present in certain samples and can require fitting a number of distinct sub-layers of each material.Additionally, a more comprehensive study of the effects of noise on the VAE outcomes would be relevant to determine the effectiveness of such a model to screen candidate systems.These and other specialized features can be readily integrated into the framework presented in this work.Finally, it is important to acknowledge that while the present work aims to reduce reliance on expert insight for PNR parameter retrieval, domain knowledge is still needed to construct a layered representation of the system that considers all relevant features for generating the training data, as well as in the interpretation of the final results.Machine learning-driven discovery in this domain might entail more sophisticated models that not only determine the suitability of a particular hypothesis, such as the presence or absence of proximity magnetism, but rather discover the plausible mechanisms underlying a given observation.

CONCLUSION
A quantitative understanding of structural and magnetic information encoded in PNR measurements is often critical to resolving important interfacial phenomena, but experimental factors and lack of adequate fitting constraints can impede parameter retrieval without expert insight.In this work, we construct a data-driven framework for PNR parameter retrieval by training a conditioned VAE to map reflectometry profiles with moderate experimental noise to a well-organized, low-dimensional space from which sample parameters can be readily obtained.We balance the flexibility and interpretability of our model through latent space engineering, enabling indepth analysis of the resulting predictions.Compared to traditional fitting methods, our framework involves minimal user intervention overall, requiring no expert insight for parameter initialization or refinement, yet is capable of resolving parameter values near the experimental resolution limit.It further enables evaluation of the entire parameter search space by readily identifying outliers of the chosen domain.A possible extension of the framework is suggested to account for intrinsic correlations between conditioning variables.We apply our method to recover the SLD profiles of two proximity-coupled systems at subnanometer resolution, and we envision its potential application to a broader context of elusive phases expressed through weak experimental signatures, such as the axion insulator and topological superconducting phases.We anticipate that the methodology developed in this work can facilitate the development of comprehensive and fully-automated analysis routines for PNR parameter retrieval of a broad range of materials systems, as well as inform the wide spectrum of spectroscopic analysis workflows requiring parameter refinement.

Training details
Training data were generated using a Python implementation of the GenX neutron reflectivity modeling code [64].Simulation of the 200, 000 PNR profiles took approximately 48 sec.using 25 parallel processes on Intel(R) Xeon(R) Gold 5218 processors.Neural network models were implemented in Python using the PyTorch [80] libraries and trained on a Quadro RTX 6000 graphics processing unit (GPU) with 24 GB of random access memory (RAM).For the architecture used in this work, training a single epoch took ∼45 sec., and each model was trained over 100 epochs.Additional details of the network architecture and final hyperparameters are provided in the Supplementary Information.

Experimental details
Intrinsic (Bi,Sb) 2 Te 3 consisting of a ∼15 quintuplelayer (QL) of (Bi,Sb) 2 Te 3 with nominal composition (Bi 0.2 Sb 0.8 ) 2 Te 3 was grown by molecular beam epitaxy (MBE) on a thin film of the AFM insulator Cr 2 O 3 with a nominal thickness of 20 nm.The Bi:Sb ratio was optimized to locate the Fermi level near the Dirac node of the electronic surface states.The AFM Cr 2 O 3 film was grown on a sapphire substrate in a pulsed laser deposition chamber with a base pressure of 2×10 −8 mbar.A ∼10 nm amorphous Te capping layer was deposited on top of the (Bi,Sb) 2 Te 3 film to protect from degradation.PNR experiments were carried out at the Magnetism Reflectometer at the Spallation Neutron Source (SNS) at Oak Ridge National Laboratory at a temperature of 5 K with a 1 T in-plane magnetic field.

SUPPLEMENTARY MATERIAL
See supplemental material for additional details regarding data preparation and noise estimation; the VAE architecture and training history; the complete set of latent space visualizations; and the full set of regressor performance results, including the method of threshold determination.

PARAMETER RETRIEVAL WITH GENX
We illustrate some of the challenges that may be encountered using conventional parameter refinement programs when the sample parameters cannot be adequately constrained using additional characterization methods.Figs.S1a-d show the fitted reflectivity and corresponding SLD profiles of the 5 K PNR measurement of Bi 2 Se 3 /EuS obtained through the GenX parameter refinement program using different initial populations.While all four fits are of comparable quality (Table I) and generate similar SLD profiles, the underlying sample parameters differ considerably, as shown in Fig. S1e.In particular, while the interface roughness between the Bi 2 Se 3 and EuS films and the FM film thickness are comparable for all four fits, the thickness and magnetization of the proximity layer at the interface vary dramatically; in fact, the fits shown in Figs.S1a-b correspond to little to no proximity magnetism, in contrast to the experimentally-determined result.This motivates the development of an alternate approach to resolve potential ambiguities resulting from insufficient parameter constraints or expert supervision during fitting.

MOTIVATING EXAMPLE FOR VAE-BASED PARAMETER RETRIEVAL
To demonstrate the potential advantages of VAE-based parameter retrieval, we consider the simplified example illustrated in Fig. S2.The objective is to determine the sample parameters y 1 and y 2 that best reproduce the comparatively high-dimensional target signature x target , denoted in red in Fig. S2a.In this simple example, the landscape of a mean squared error (MSE) loss L = x − x target 2 contains one global and one local minimum in (y 1 , y 2 )-space; thus, rather than optimize directly in this space, we train the VAE to recover the input from a low (3)-dimensional latent representation, z (Fig. S2b).Since the latent space is organized such that nearby points correspond to similar reconstructions of the input, the loss function viewed in this space has only a single, global minimum in the region of interest (Fig. S2d).We note that the dimension of z must be higher than that of (y 1 , y 2 ), i.e., greater than 2, but is termed low-dimensional as it is substantially smaller than the dimension of x.This is particularly true for the target application of PNR, where measured reflectometry profiles contain more than 100 points which must be correlated to only a dozen or so underlying sample parameters.Additionally, if the target signature were instead an outlier of the chosen domain for parameters y 1 and y 2 (teal point in Fig. S2a), we could readily single it out by its large reconstruction error, shown in Fig. S2c.This would signify a need to expand or modify the parameter space, thus informing the suitability of the chosen parameter ranges.For example, the latent vector corresponding to the outlier in Fig. S2c is seen to sit off the manifold generated by mapping the test set to the latent space, shown in Fig. S2e, indicating it is not generated by parameters in the chosen range.Thus far, the considered framework is fully unsupervised, with no direct correspondence between the intermediate features and the original parameters y 1 and y 2 .However, as we expect the output to evolve predictably with the sample parameters, we can supervise the VAE training by conditioning a subset of the latent features z to vary directly with y 1 and y 2 following the formulation of Ref. 81, as shown in Fig. S2e.This can make the latent space more informative and interpretable by encouraging each selected dimension to organize according to a specific physical parameter, thereby facilitating quantitative prediction of its value.

DATA PREPARATION AND NOISE ESTIMATION
As described in the main text, training data are generated using the GenX neutron reflectivity modeling code [64], which simulates the PNR profiles using a layered representation of the candidate heterostructures parameterized by the density, thickness, roughness, and magnetization of each layer.The variable parameters are sampled uniformly at random over a range of experimentally feasible values, shown in Figs.S4 and S5 for the Bi 2 Se 3 /EuS and (Bi,Sb) 2 Te 3 /Cr 2 O 3 systems, respectively.Fig. S3 shows the representative evolution of the PNR profiles with respect to selected parameters for the Bi 2 Se 3 /EuS system.For noise estimation, it is reasonable to assume that the uncertainties in the measured reflectometry profiles follow the Poisson distribution, i.e. the uncertainty in where N 0 is the maximum count rate, and hence, As noted in the main text, each profile is simulated with a randomly sampled instrument background, i.e.R(Q) = R n (Q) + R bkg , where we define the neutron counts where δ ≡ 1/ N bkg is empirically determined such that ∆R(Q)/R(Q) approximates the ratio between experimental errorbars and data points.The simulated data are perturbed at each Q by randomly sampling a Gaussian distribution with standard deviation σ(Q) = ∆R(Q)/R(Q), and computing the perturbed profile as R(Q) = R(Q)(1+ (Q)), where (Q) denotes the sampled value.Independent perturbation of the profile at each Q leads to a highly oscillatory profile, particularly at high Q that does not emulate experiment well.Thus, the perturbed profiles are lightly smoothed by a uniform moving average over R(Q) as a function of 1/Q, which we find creates synthetic profiles that resemble experimental ones.Due to dimensionality reduction of the input profile by the VAE encoder, the VAE returns predicted profiles that are typically less noisy than the inputs (e.g.see Fig. S10).Fig. S6a-c compares the simulated ∆R(Q)/R(Q) for different choices of δ, 0.1, 0.5, and 0.9, against the experimental ∆R(Q)/R(Q) of the Bi 2 Se 3 /EuS sample.These results establish δ = 0.5 as a suitable choice to best match the given experimental statistics.In Fig. S6d, we also plot the total loss, reconstruction loss, and label loss achieved by identical VAEs trained on synthetic PNR profiles perturbed with each noise level.Finally, we also evaluate the noise dependence of the conventional and VAE-based approaches.Fig. S7a shows fitted reflectivity profiles obtained using the GenX parameter refinement program for a simulated profile perturbed by different levels of noise.The case with δ = 0 should in theory be perfectly recoverable as it is both simulated and fitted by the same program.We show the same profiles in Fig. S7b alongside reconstructions obtained by the VAE framework for δ > 0. Table II lists the MSEs between the fitted or predicted profile and the corresponding input obtained with each method for each noise level.While the MSEs of the VAE-predicted profiles are generally higher than those fitted by GenX (Table II), the discrepancy is reduced at higher δ, and at δ = 0.9, the profiles predicted by the VAE are in fact a better match to the noise-free profiles than those obtained by GenX.Moreover, the parameter values predicted by the VAE almost always correspond much more closely with the true parameter values than those obtained from the GenX fit, as shown in Fig. S7c, which is particularly evident for larger values of δ.Thus, the VAE may be particularly useful when working with data acquired in a short time, which often includes large statistical noise.

VAE TRAINING DETAILS
We train the VAE by minimizing the loss, The first term represents the reconstruction loss, or the expected negative log-likelihood of x given z; the second term regularizes the distribution of latent features; and the third gives the error between the (standardized) true and predicted parameter values of the regressor.The latent channels which are conditioned on specific sample parameters are drawn from Gaussian prior distributions with mean v and variance 1, i.e., p(z i ) ∼ N (v i , 1) for each parameter i, while the so-called free latent channels adopt the conventional Gaussian prior with zero mean, i.e., p(z i ) ∼ N (0, 1).The approximate posterior q θ (z | x) and likelihood p φ (x | z) are taken to be Gaussian-distributed; for this choice of likelihood, minimizing the negative log-likelihood of x with respect to θ is equivalent to minimizing the mean squared error between x and x , where x represents the reconstructed input x.Thus, the first term is directly implemented as x − x 2 .The encoder, decoder, and regressor networks were optimized jointly using the Adam optimizer; the hyperparameters used for the final model are listed in Table III

VAE REGRESSOR PERFORMANCE
The complete sets of histograms visualizing the VAE regressor performance for the Bi 2 Se 3 /EuS and (Bi,Sb) 2 Te 3 /Cr 2 O 3 systems are shown in Figs.S14a and S15a, respectively.Additionally, Figs.S14b-c and S15b-c illustrate the method of threshold determination used to classify the presence or absence of interfacial TI and AFM layers.In particular, the resolution limits of the trained model in recovering the values of t prox and m prox (and t iAFM and m iAFM for the (Bi,Sb) 2 Te 3 /Cr 2 O 3 system), are obtained by the following procedure.In each case, we consider a pair of thresholds, θ true and θ pred , (θ pred > th true ), corresponding to the true and predicted parameter values, respectively.The true threshold is the theoretical resolution limit set by, for instance, the parameter ranges used to generate the data or the experimental resolution, while the predicted threshold is the resolution limit set by the trained model, i.e. it incorporates the errors of imperfect prediction by the VAE regressor.When displayed on the two-dimensional histograms shown in the left panels of Figs.S14b-c   The objective of the above is to obtain the lowest possible fpr with the highest possible tpr while minimizing the discrepancy between tpr and (1 − fpr), also known as the true negative rate, ensuring that recovery of both positive and negative classes is comparably accurate.We define the average θ * pred obtained over 10 instances of the trained model with different initial weights as the resolution limit of the VAE approach for each of t prox , m prox , t iAFM , and m iAFM , corresponding to the gray dashed lines plotted in Figs.S16 and S17 as described in the main text.

REPRODUCIBILITY OF VAE PREDICTIONS
We show the predictions for the full set of sample parameters produced by ten trained models with different initial weights for the Bi 2 Se 3 /EuS and (Bi,Sb) 2 Te 3 /Cr 2 O 3 systems in Figs.S16 and S17, respectively.In both cases, the predicted values for most parameters are similar across different models, particularly for bulk properties, but a few parameters exhibit greater uncertainty (e.g.σ cap in Fig. S17.Additionally, while the temperature dependence of the quantities in Fig. S16 is mostly as expected, a few exceptions are worth mentioning.First, the values of most bulk temperature-independent quantities, such as materials' densities and thicknesses, are predicted consistently across different temperatures.However, there is a slight increase in the thickness of the FM layer at 300 K which appears to coincide with slight reductions in t TI and t prox , and a more significant reduction in σ prox .In fact, the predicted roughnesses for all interfaces and surfaces is lowest at 300 K, but reasons for this cannot be fully understood from the present model.Additionally, while σ prox generally decreases with increased temperature, the roughness values for the FM and capping layers do not appear to follow a clear temperature-dependent trend.

FIG. 1 .
FIG. 1. VAE with regression for PNR data analysis.a. Schematic illustration of the proximity-coupled Bi2Se3/EuS system.A depiction of the spin-polarized neutron reflectometry experiment under an externally applied magnetic field is shown at right.b.Reflectometry profile of the heterostructure in a measured at T = 5 K. Solid lines correspond to a representative fit obtained using the GenX paramter refinement program.Error-bars represent ±1 standard deviation.c.Nuclear, magnetic, and absorption SLD profiles corresponding to the fit in b. d.Schematic illustration of the VAE architecture used for PNR parameter retrieval.

FIG. 2 .
FIG. 2. VAE performance on experimental data.a.At left, the decoded PNR profile of the Bi2Se3/EuS heterostructure measured at 5 K.For each channel, points represent experimental data and solid lines are the corresponding reconstruction.At right, the spin asymmetry (R ++ − R −− )/(R ++ + R −− ) calculated from the data (points) and decoded profiles (solid lines) of four PNR measurements taken at different temperatures.Error-bars represent ±1 standard deviation.b.Nuclear (NSLD), magnetic (MSLD), and absorption (ASLD) scattering length density profiles obtained from the regressor predictions for the measurements at the four temperatures in a.The MSLD contribution from the proximity layer is shown in dark green.c.Projections of the latent encoding of the test dataset along the latent dimensions with the largest gradients for different sample parameters (density ρ, thickness t, magnetization m), colored by their true values.Outlined points show the latent encoding of the four experimental measurements from a. Specifically, the horizontal and vertical axes correspond to the latent dimensions with the largest and second largest gradient of the target parameter, respectively.d.Parameter entanglement inferred from gradients along each latent channel.Heatmap indicates the relative magnitudes of the gradients of a given parameter with respect to each latent channel.Namely, each cell is colored by the normalized value of ∂vi/∂zj for the i-th parameter and j-th latent channel.Normalization maps gradients in each row to lie between 0 and 1.

FIG. 3 .
FIG. 3. Regressor performance and predictions on experimental data.a. Histograms of predicted versus true values for different sample parameters of the test dataset.b.The predictions of proximity layer thickness tprox and magnetization mprox, and FM layer magnetization mFM obtained from 10 instances of the VAE trained with different initial weights, shown as a function of the measurement temperature of the corresponding experiment.Gray dashed lines indicate the optimal thresholds obtained for proximity classification.Scattered points above (below) the determined threshold are colored yellow (blue).Violin plots of the predicted values for experiments with majority predictions above (below) the threshold are shaded yellow (blue).

FIG. 4 .
FIG. 4. Proximity magnetism in the TI-AFM system.a. Schematic illustration of the (Bi,Sb)2Te3/Cr2O3 system.b.Experimental PNR profile at 5 K and best fit obtained with the GenX parameter refinement program.An expanded view of the splitting between the two channels at high Q is shown in the inset.Error-bars represent ±1 standard deviation.c.The spin asymmetry (R ++ − R −− )/(R ++ + R −− ) calculated from the measured (points) and best fit (solid line) R ++ and R −− profiles shown in b. d.Nuclear (NSLD), magnetic (MSLD), and absorption (ASLD) scattering length density profiles obtained from the regressor predictions for the experimental measurement shown in b.The MSLD contribution from the proximity layer is shown in dark green.e. Decoded profiles of the PNR measurement in b.An expanded view of the profiles at high Q is shown in the inset.f.Spin asymmetry calculated from the measured (points) and predicted (solid line) R ++ and R −− profiles shown in b. g.Projections of the latent encoding of the test dataset along the latent dimensions with the largest gradients for the thickness and magnetization of the interfacial AFM and TI layers.Red points show the latent encoding of the PNR profile in b. h-i.The predictions of h.interfacial AFM layer thickness and magnetization, and i. proximity layer thickness and magnetization obtained from 10 instances of the VAE trained with different initial weights.Gray dashed lines indicate the optimal thresholds for proximity classification.Scattered points above (below) the threshold are colored yellow (blue).

FIG. S1 .
FIG. S1.Representative GenX fits with varied initialization.a. (Left) Reflectometry profile of the Bi2Se3/EuS system measured at 5 K. Solid lines correspond to the fit obtained using the GenX paramter refinement program with a particular initial parameter population.Error-bars represent ±1 standard deviation.(Right) Nuclear, magnetic, and absorption SLD profiles corresponding to the fit in the left panel.b-c.Analogous to a but obtained using different initial populations for the GenX fit.e. Fitted values for selected parameters obtained from the fits displayed in a-d.
FIG. S2.Advantages of VAE-based parameter retrieval.a. Toy model of a signature x generated by two parameters, y1 and y2.The loss L denoting the MSE between an arbitrary x and a target signature xtarget contains one global and one local minimum in (y1, y2)-space.b.Schematic illustration of the VAE architecture.c.Norms of the predicted versus true signatures of the test dataset.Each point is colored according to the MSE between the associated x pred and xtrue.The signature xtarget (red point) generated by (y1, y2) within the domain of training samples (yellow patch in a) is reconstructed with comparatively low MSE, while the signature x outlier outside the domain (teal point) is an outlier, reflected also in the adjacent box plot.d.Density plot of the loss L as a function of the three-dimensional latent space z showing a single minimum in the region of interest.A two dimensional slice through the z3 = 0 plane is shown at the right.The latent representations of the target and outlier signatures from a projected onto the z3 = 0 plane are indicated by the red and teal points, respectively.The shaded disk and dashed circle denote one and three standard deviations away from the latent coordinate of the target signature, respectively.The "×" denotes the minimum of the loss L. e. Visualization of the latent representation of the test dataset.Each point is colored according to the associated true values of y1 (left) and y2 (right), showing that z1 and z2 vary jointly with y1 and y2, respectively.The latent representations of the target and outlier signatures are indicated by the red and teal points, respectively.

FIG. S3 .
FIG. S3.PNR profile evolution with sample parameters.Representative examples for the evolution of PNR profiles with respect to a continuous change in a. TI layer thickness, b.TI layer density, c. top surface roughness of the interfacial proximity layer, and d. magnetization strength of the FM layer.

FIG. S4 .
FIG. S4.Parameter ranges for Bi2Se3/EuS data generation.Distribution of parameter values for the density, thickness, interface roughness, and magnetization of each layer in the generated data.Distributions are split between samples with (solid) and without (hatched) an interfacial proximity layer, termed positive and negative counts in the legend, respectively.Density and magnetization are sampled using their values in terms of formula units (middle column), which are compatible with the GenX simulation software.The corresponding values in conventional units are shown in the rightmost column (1 emu = 1 A•m 2 ).

FIG. S7 .
FIG. S7.Noise dependence of conventional and VAE-based methods.a. Simulated (points) and fitted (solid lines) reflectivity profiles with different levels of noise, indicated by the parameter δ, using GenX parameter refinement.The simulated curve is one of the Bi2Se3/EuS training examples.b.Simulated (points) and reconstructed (solid lines) reflectivity profiles with different levels of noise, indicated by the parameter δ, using the VAE framework.c.Fitted or predicted parameter values corresponding to the profiles in a and b obtained using GenX (blue) and the VAE framework (yellow).

FIG. S8 .
FIG. S8.Training history for the Bi2Se3/EuS system.Trajectories of the total loss and individual contributions from reconstruction loss, KL divergence (KLD) loss, and label loss for the training and validation sets.The mean squared errors (MSEs) between the decoded and true PNR profiles (Reconstruction MSE) and between predicted and true sample parameters (Label MSE) are also shown.
FIG. S14.Regressor performance for the Bi2Se3/EuS system.a. Histograms of predicted versus true values for different sample parameters of the test dataset.b-c.Threshold determination for proximity layer b. thickness and c. magnetization.In the left panels of b and c, black and red solid lines denote optimal thresholds on the true and predicted values, respectively, which subdivide the histogram into four rectangular quadrants delimiting the false positive (upper left), true positive (upper right), false negative (lower right), and true negative (lower left) examples, where the positive class exhibits proximity magnetism.The right panels of b and c show the true and false positive rates corresponding to different choices of threshold for both the true and predicted parameter values, with each point colored by the predicted threshold value (always the larger of the two).The optimal thresholds are selected to balance the trade-off between true and false positive rates, corresponding to the outlined point in the right panels.

FIG. S15 .
FIG. S15.Regressor performance for the (Bi,Sb)2Te3/Cr2O3 system.a. Histograms of predicted versus true values for different sample parameters of the test dataset.b-c.Threshold determination for proximity layer b. thickness and c. magnetization.In the left panels of b and c, black and red solid lines denote optimal thresholds on the true and predicted values, respectively, which subdivide the histogram into four rectangular quadrants delimiting the false positive (upper left), true positive (upper right), false negative (lower right), and true negative (lower left) examples, where the positive class exhibits proximity magnetism.The right panels of b and c show the true and false positive rates corresponding to different choices of threshold for both the true and predicted parameter values, with each point colored by the predicted threshold value (always the larger of the two).The optimal thresholds are selected to balance the trade-off between true and false positive rates, corresponding to the outlined point in the right panels.

FIG. S16 .
FIG. S16.Statistics of predicted parameter values for the Bi2Se3/EuS system.The predictions of each sample parameter obtained from 10 instances of the VAE trained with different initial weights, shown as a function of the measurement temperature of the corresponding experiment.Gray dashed lines indicate the optimal threshold obtained for proximity classification.Scattered points above (below) the determined threshold are colored yellow (blue).Violin plots of the predicted values for experiments with majority predictions above (below) the threshold are shaded yellow (blue).

TABLE I .
GenX fit quality with different initial populations.
* The GenX FoM is the MSE between the logarithms of the true and fitted reflectivity profiles, added over the spin channels.

TABLE II .
Noise-dependence of MSE.

TABLE III .
Final network hyperparameters.