Featureless adaptive optimization accelerates functional electronic materials design

Forward and inverse design of equilibrium materials properties have been accelerated with data-driven methods. Electronic materials exhibiting phase transitions between metastable states, such as metal-insulator transition (MIT) materials with abrupt electrical resistivity transformations, however, are challenging to decode. Conventional machine learning methods display limited predictive capability for MIT materials, because data scarcity and the absence of features impede model training. Here we demonstrate a discovery strategy based on multi-objective Bayesian optimization to directly circumvent these bottlenecks by utilizing latent variable Gaussian processes combined with high-fidelity electronic structure calculations for validation in the chalcogenide lacunar spinel family. We directly and simultaneously learn phase stability and band gap tunability from chemical composition alone to efficiently discover 12 superior compositions on the Pareto front. Previously unidentified electronic transitions also emerge from our featureless adaptive optimization engine (AOE). Our methodology readily generalizes to optimization of multiple properties, enabling co-design of complex multifunctional materials especially where prior data is sparse.

Upon traversing a critical temperature, the electrical resistivity of a metal-insulator transition (MIT) material can change by orders of magnitude. Athermal approaches may also trigger the electronic transitions, including (chemical) pressure, variable carrier-densities, and applied electromagnetic fields. The transformations can be used to encode, store, and process information for beyond von-Neumann microelectronics and overcome performance limits of conventional field-effect transistors 9 for advanced logic/memory technologies. 10 Because macroscopic MITs occur in materials with diverse chemistries and structures (Fig. 1a), various microscopic mechanisms -electron-lattice interactions, electron-electron interactions, or a combination thereof -lead to large variations in critical temperatures and accessible resistivity changes. This diversity exacerbates the efficient discovery and optimization challenge of achieving multiple property requirements to outperform silicon-based devices, 11 including stability, large reversible resistivity changes (≈10 5 ), and above room-temperature operation.
The complex lacunar spinel family AM a M b 3 Q 8 with trivalent main group A, transition metal M, and chalcogenide Q ions demonstrate the complexity active in MIT materials design. The structure comprises transition-metal clusters (TMC) with M a and M b cations at the apical and basal positions of the tetrahedra (Fig. 1b inset). Although there are hundreds of possible elemental combinations on the four lattice sites in the crystal structure (Fig.1a), only tens of the lacunar spinels have been experimentally reported. 12,13 For example, GaV 4 S 8 (M a = M b = V) exhibits a MIT, 14 exotic spin textures, 15 and multiferroism 16 while GaVTi 3 S 8 shows negative magnetore-sistance and half-metallic ferromagnetism. 17 Most lacunar spinels are narrow-bandwidth semiconductors in their ground states; 12,18 these electronic properties are governed by distortions of the local TMC from the ideal T d geometry, 19 which manifest as low-frequency phonons as shown for GaMo 4 S 8 (Fig.1b, blue curve). Jahn-Teller-type distortions, which correspond to elongation along the [111] direction alter the TMC geometry, are particularly important; they transform the insulating GaMo 4 S 8 ground state into a metastable metallic phase (Fig. 1c). The MIT arises from a redistribution of electrons among the structure-driven orbital hierarchy (Fig. 1c insets). Furthermore, these phases host low energy electronic structures, discernible from the projected density of states (pDOS) in Fig.1c, that arise from the different M a and M b sites. This capability to exhibit distinct and tunable electronic phases poses a challenge in the design of lacunar spinels from physicsbased models while also making them an ideal system for MIT performance optimization.
We seek lacunar spinels that exhibit high stability and large resistivity-switching ratios, which we formulate as design objectives for MIT-materials discovery. We reduce the approximately O(10 3 ) compositional space to 270 candidates that maintain a 1 M a to 3 M b ratio. (AM a 2 M b 2 Q 8 compositions are excluded as they remove the C 3v symmetry fundamental to the MIT; Cr is also excluded from occupying the M b site, because it destabilizes 20 the cluster.) We define the first design objective as the decomposition enthalpy change (∆H d , Fig.1d), and use density functional theory (DFT) simulations to evaluate formation energies. Materials with larger ∆H d are expected to be more synthesizable 21 and stable during operation, making it a useful filter to prioritize compounds for subsequent theoretical analysis and synthetic processing. The second design objective is the ground state band gap (E g ). We use it as a proxy for the resistivity-switching ratio since E g is positively correlated with the resistivity change between different electronic states (Fig.1d). A larger E g also allows for greater band-gap tunability through control over the C 3v distortion, which is a desirable feature for programmable electronics. Importantly, because E g is small for most MIT materials, stability is expected to be lower and more difficult to achieve than that of nonpolymorphous compounds with majority ionic or covalent bonding. 22 The nonlinear responses of both design objectives bring severe challenges to compound optimization beyond those amplified by chemical combinatorics using data-driven models. Machine learning (ML) methods for materials design show promising results when sufficient training data is available for learning: 8,23 The predictive performance (error and efficiency) is limited by the quality and quantity of the data, typically > O(10 2 ) (Fig. 2), and features whose selection may be informed by domain knowledge or trial-and-error approaches. Thus, conventional ML models cannot accommodate the data dearth posed by the limited number of lacunar spinels undergoing thermal MITs. Furthermore, the lack of microscopic understanding in how different compositions influence the MIT lead to ambiguity in whether any features can be suitably formulated for discovery of MIT materials from structure and composition alone rather than through effective Hamiltonians. 24 We overcome these obstacles by implementing a cyclic adaptive optimization engine (AOE) shown in Fig.2, which consists of four interative tasks (vide infra): property evaluation, aggregation of data (in a repository), featureless learning, and composition optimization. Beyond returning a predictive model capable of predicting properties from compositions alone, our iterative AOE leverages earlier approaches [25][26][27] to deliver materials with superior performance by design of composition-based solutions on the Pareto front spanned by the multiple objectives. The AOE has the important advantage of bypassing the feature engineering procedure as in conventional ML methods; it learns properties directly from the chemical composition at each site (i.e., A, M a , M b , Q). We achieve featureless learning by transforming discrete categorical variables (i.e., elemental compositions) into a continuous numerical space and then perform composition optimization under the multiple objectives through latent variable Gaussian processes (LVGP, see Methods).
We start the MIT-materials AOE for the lacunar spinel family through an initial design of experiment (DoE) consisting of four experimentally known compounds within the family and eight new compositions. The eight new compositions are obtained using Latin Hypercube Design to ensure each element is present in the dataset (see Methods and Supplementary Fig. 1). Next, we use high-fidelity DFT simulations to evaluate ∆H d and E g (see Methods). This is the most resource-intensive step among the four tasks; therefore, it is desirable to iterate through the AOE (property evaluation) loop as few times as possible. Then, we create a data repository that contains entries for both composition and the evaluated properties. Unlike other ML methods, we do not rely on a large number of existing data at either the onset or later in the learning process.
Next, we construct a LVGP model by mapping the elemental compositions (e.g., Al, Ga, In) into a two-dimensional (2D) latent space (Fig.2, lower right inset) such that relative positions in this space are obtained using maximum likelihood estimation (MLE). This latent space representation enables us to construct a Gaussian process surrogate model of the unknown underlying Conventional machine learning models typically require a large amount of raw data > O(10 2 ) from existing databases. Feature engineering is an essential step to extract effective materials descriptors based on chemical intuition. Then, machine learning models are constructed after some preprocessing procedure (e.g. train-test-split, data standardization). The validation step checks the model performance and helps with hyper-parameter tuning. Finally, the predictive model is ready for production and can be applied to unobserved input data. Lower panel, The adaptive materials discovery scheme starts from an initial set of design of experiments (DoE), where system variables, design objectives, and design space are first defined for the problem, providing a few O(10 1 ) candidate materials to initialize the discovery procedure. Property evaluation, The target material properties (design objectives) are evaluated either by experimental measurement or theoretical simulations. Candidate composition and its evaluated properties are then added to a Data repository, which initially may either be empty or only contains entries for existing materials within the design space. Its size grows as more candidate materials are evaluated during the adaptive optimization process. Featureless learning involves directly learning from the chemical composition of materials comprising the data repository by mapping each compositional variable into a two-dimensional latent space (spanned by z 1 and z 2 ) using maximum likelihood estimation, which enables the construction of a latent variable Gaussian process (LVGP) surrogate model. One surrogate model is constructed for each design objective using all currently available knowledge within the data repository. Composition optimization, Multi-objective Bayesian optimization is then performed with the LVGP models to obtain the next candidate material composition with the highest expected maximin improvement (EMI) value. The model accounts for uncertainty with the 95% confidence interval shown as the shadowed area around the new compositions (the green symbols). In the lower left inset, the green star composition outperforms the green circle composition, and will be passed to the next property evaluation procedure. The iterative optimization step continues until all compounds satisfying the objectives are discovered, forming the Pareto front, or computational resources expire. design objectives, ∆H d and E g , as a function of composition. The multi-objective Bayesian optimization (MOBO) step then begins and we use the LVGP models to predict ∆H d and E g of the unexplored compositions in our design space; we choose the next candidate composition for evaluation using the expected maximin improvement (EMI, see Methods) as the acquisition function, which quantitatively describes the performance gain compared against the compositions at the current Pareto front. This acquisition function considers both exploration of compositions with high uncertainty (Fig. 2, shaded ellipses, lower left inset) as well as exploitation of candidates with high performance gain. The composition with highest EMI is then selected for DFT simulation (property evaluation), at which point another AOE cycle commences.
The aforementioned iterative optimization procedure progresses and explores the available design space. One new lacunar spinel composition is evaluated and added to repository after each AOE iteration. The LVGP models are also updated in each iteration as more knowledge becomes available. Owing to the high computational cost of the property evaluation process, we terminate the optimization process after searching through 1/3 of the entire design space. In order to validate the effectiveness of this method, we ultimately evaluated ∆H d and E g with DFT calculations of all 270 compositions within the design space by expending approximately 3 × 10 6 CPU hours. Fig. 3a displays the results of the AOE. We successfully identify all 12 materials at the true Pareto front within 53 iterations (red asterisks, upper panel)-compositions and objectiverelated properties are enumerated in Table 1. Combined with the 12 compounds from our initial DoE, we explored less than 25% of the entire design space before identifying all lacunar spinels on the Pareto front. Interestingly, Pareto-front compositions are mostly found with high EMI values, showing that our model makes beneficial recommendations on which composition to evaluate next. High prediction uncertainty likely explains why a Pareto-front composition is not identified for some iterations with a large EMI. The EMI values reduce to nearly zero after all Pareto front compositions are identified (blue, upper panel) since all candidates not sampled are dominated by the Pareto front compounds. We also show the absolute error in the LVGP-predicted ∆H d (pink) and E g (orange) values of the evaluated composition at each iteration to further demonstrate the effectiveness of our model (Fig.3a). We find a general decreasing trend in error and therefore better model predictability as it becomes aware of more compositionproperty knowledge. The aforementioned performance is robust as revealed by our multi-trial results ( Supplementary Fig. 2), where we find the AOE successfully identifies 90 % of the true Pareto-front compositions by exploring 30 % of design space with different initial DoE sets. Fig. 3b shows the history of composition explored by the AOE for the first 60 iterations. The detailed optimization history is shown in Supplementary Fig. 3. The initial DoE sets are relatively scarcely distributed away from the Pareto front, yet the model explores regions far from that covered by the DoE sets and is able to identify 75% of Pareto front compositions within the first 40 iterations. First, we begin to understand this performance by examining the distribution of elements sampled by the MOBO (Fig.3c). Our model does not exhibit much compositional bias upon sampling elements for the A site; however, it shows clear preferences for choosing certain elements on other sites. V and Mo are sampled more frequently on the basal M b site, while Nb and Ta are less favored on the apical M a site. Se is also preferred over S and Te for the Q site. Second, we examine the 2D latent space representations for both design objectives obtained after 60 iterations of AOE ( Fig.3d and e). The relative positioning of elements in the latent space reflects correlations in their influence on properties; elements in close proximity exhibit similar impact. Interestingly, different transition metals exhibit distinct correlation patterns across various sites and objective properties. This variation leads us to conclude that (i) the transition metals contribute to stability and band gap in different and unexpected ways, and (ii) the lack of any resemblance in element positioning in the site-dependent latent spaces, except for the M a site, to the periodic table indicates that chemical-intuition-based MIT design within the lacunar spinels is highly nontrivial. For example, chromium is located far from the other elements in the M a latent space, indicating that its influence on properties is distinct. Indeed, Cr-containing compounds have significantly lower E g and higher ∆H d (Supplementary Fig. 4).
Third, we use DFT simulations to examine the properties of the identified Pareto-front compositions, focusing on ∆H d , E g , and the Jahn-Teller active phonon ν JT involved in the MIT (Table 1). We find most Pareto-front compositions consist of two different cations on the M a and M b site, only three have M a = M b , with 75 % of the optimized materials being selenides. GaV 4 Se 8 is the only Pareto front compound previously synthesized. All compounds exhibit R3m symmetry and are dynamically stable in their ground state (ν JT > 0). The phonon frequencies of the selenides, including ν JT are lower than those of the sulfides. All of the designed lacunar spinels also exhibit semiconducting gaps with semilocal exchange-correlation and static Coulomb interactions and exhibit nonzero electric polarizations. Compositions with larger band gaps tend to have lower stability as determined by ∆H d : 2/3 are stable (∆H d > 0, indicating decomposition is endothermic), whereas four of the 12 compounds comprising Mo have small values of ∆H d < 0, which could nonetheless be stable and synthesizable. 21,28 Chemical intuition-based interpretations of the observed ∆H d -E g trade-offs among these compounds, however, are difficult to formulate and reinforces the effectiveness of the AOE.
Although the ground states of these materials are all semiconducting, we find two different electronic transitions upon traversing the ideal TMC geometry (θ m = 60 • ): the expected (Type I) metal-to-insulator transition and an unexpected (Type II) semiconductor-to-insulator transition (SIT). Fig.4a shows the changes to the electronic structure for the MIT lacunar spinels AlTaV 3 Se 8 and InWMo 3 Se 8 with the insulating state (lower panel) always lower in energy than the metastable metallic phase (upper panel) after the Jahn-Teller-type distortion (θ m 60 • , Table 1). The pDOS of these compounds show that the metallic state in the Type I transition arises from cluster distortion-triggered orbital ordering and occupancy changes, similar to the mechanism depicted in Fig. 1b. However, the metallic states are different owing to the chemistry of the metals comprising the TMCs. We also find that the basal M b site plays a more decisive role near the Fermi level with minor contribution from the apical M a site. The M a site on the other hand, plays an active role in the Jahn-Teller-active phonon owing to differences in atomic mass ( Table 1). The remaining lacunar spinels in Fig.4a, InNbMo 3 Se 8 , InTaMo 3 Se 8 , InCrV 3 S 8 , and InWV 3 S 8 , exhibit a Type II transition. The lower and upper panel show their ground and metastable state pDOS, respectively. Interestingly, some compounds undergo singlet formation and transform into a nonmagnetic phase (e.g., InNbMo 3 Se 8 ) while others remain ferromagnetic after the cluster distortion (e.g., InCrV 3 S 8 ) owing to competition between spin-pairing and magnetic interactions. Last, we model the switching process and resistivity upon structural distortion for InWMo 3 Se 8 (Type I) and InTaMo 3 Se 8 (Type II) by modulating the amplitude of the ν JT atomic displacements for each material in both the (insulating) ground and (metallic or semiconducting) metastable states. The DFTsimulated energy and corresponding band gap at different cluster angles (θ m ) are shown in Fig. 4b. Both compounds show first-order transitions. Owing to the small changes in the TMC geometry required for switching, readily available external stimuli could be used to trigger the transitions. 14,29,30 The simulated DC resistivity of InWMo 3 Se 8 and InTaMo 3 Se 8 clearly shows the promising functionality of these newly discovered compositions in the lacunar spinel family (Fig.4c). Since we successfully identify all 12 Pareto-front compositions by searching through less than 25% of the design space, our work demonstrates the efficiency of featureless adaptive materials discovery for electronic materials design. The AOE is particularly useful when data availability and physical understanding of the target materials system is limited at either the atomic or microstructural scale.

Online Content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at URL to be inserted by Publisher.

Density functional calculations.
We perform DFT simulations as implemented in the Vienna Ab initio Simulation Package (VASP). 31,32 The projector augmented-wave (PAW) potentials 33 are used for all elements in our calculations with the following valence electron configurations: Al (3s 2 3p 1 ), Ga (3d 10 4s 2 4p 1 ), In (4d 10 5s 2 5p 1 ), V (3s 2 3p 6 3d 4 4s 1 ), Nb (4s 2 4p 6 4d 4 5s 1 ), Ta (5p 6 5d 4 6s 1 ), Cr (3s 2 3p 6 3d 5 4s 1 ), Mo (4s 2 4p 6 4d 5 5s 1 ), W(5s 2 5p 6 5d 5 6s 1 ), S (3s 2 3p 4 ), Se (4s 2 4p 4 ), and Te(5s 2 5p 4 ). We use exchange-correlation potentials (V xc ) as implemented by Perdew-Burke-Ernzerhof (PBE). 34 The effect of on-site Coulomb interactions (PBE+U) is considered with a U value of 2.0 eV for all 6 transition metals, as suggested by previous studies. 35,36 Numerous spin configurations are evaluated to ensure the global ground state is achieved and that those states are consistent with available experimental magnetic data. 37 Spin-orbit interactions (SOI) are not considered in our calculations. Although it has been shown that SOI leads to interesting molecular j eff states, 36 this order does not strongly affect the size of the ground state electronic band gaps, even 5d transition metals lacunar spinels. 35 A Γ-centered 6 × 6 × 6 k-point mesh with a 500 eV kinetic energy cutoff is used. We employ Gaussian smearing with a small 0.05 eV width. For density-of-state calculations, we use the tetrahedron method with Blöchl corrections. 38 Electric polarizations along the [111] direction are simulated using the Berry phase method. 39 The crystal structures of the existing lacunar spinels are obtained from our previous DFT studies, 40 structures of new compositions are obtained by replacing the elements on the corresponding crystallographic sites from existing structures. We perform full lattice relaxations until the residual forces on each individual atom are less than 1.0 meVÅ −1 . The DFTrelaxed crystal structures of the Pareto front compositions are available at Ref. 41. We initialize the relaxation with various magnetic moment configurations, the converged configuration with the lowest energy is reported as the DFT ground state. Zone center (k = 0) phonon frequencies and eigendisplacements are obtained using the frozen-phonon method with preand post-processing performed with the Phonopy package. 42 The decomposition pathways are automatically generated using Grand Canonical Linear Programming 43 from the Open Quantum Materials Database. 44 Resistivity simulations are performed using electronic structures computed from VASP as previously described, but with an increased 24 × 24 × 24 k-point mesh and the BoltzTrap2 package. 45 We also assume that all M a sites have the same orientation within the crystal. In order to validate this model, we simulated a 2 × 2 × 2 supercell of InNbMo 3 Se 8 with one Nb atom oriented in a different direction from the other seven. We find that the ground state E g as well as ∆H d exhibit negligible changes from the homogeneous description. We also compared the change in properties with the anti-ferromagnetic spin configuration using a doubled simulation cell with the ferromagnetic ground state. As before, we find there are no significant changes in the aforementioned properties. These results are reasonable because the local structure of the TMC dictates the low-energy band structure near the Fermi level.

Adaptive Optimization Engine (AOE). Dataset initialization.
A dataset of 12 compounds is used to initiate MOBO. This set includes four known lacunar spinel compositions (GaMo 4 S 8 , GaV 4 S 8 , GaNb 4 Se 8 , and GaTa 4 Se 8 ) and eight new compositions generated by a discretized Latin Hypercube Design (LHD 46 ) as shown in Supplementary Fig. 1.
Fitting the latent variable Gaussian process model. Conventional Gaussian process (GP) modelling has been developed for only quantitative design variables and the associated correlation functions cannot handle categorical variables. To overcome this limitation, LVGP maps each categorical variable to a 2D Cartesian latent space, 47,48 establishing a numerical representation for different categories. With this mapping, the covariance model over categorical design variables can be any standard GP covariance model for quantitative variables, e.g., the Gaussian correlation function. In the AOE, two independent LVGP models with Gaussian correlation function are fit at each iteration to predict E g and ∆H d , respectively. In each LVGP model, categorical variables A, M a , M b and Q are represented by a 2D numerical latent variable vector to evaluate their correlation. Note that each categorical variable resides in its unique latent space. For the LVGP model predicting E g , let z A = [z A 1 , z A 2 ] denote the latent variable for the A site. Similarly, z M a , z M b , and z Q denote the latent variables for M a , M b and Q site, respectively. Then, the Gaussian correlation (ρ) between E g of two compounds, e.g. GaMoV 3 S 8 and AlNbW 3 Se 8 , is: where . 2 represents the Euclidean 2-norm. This procedure is used to compute the correlation matrix for properties of all ) in a similar manner. Multiobjective Bayesian optimization. Consider the lacunar spinel family AM a M b 3 Q 8 with A ∈ {Al, Ga, In}, M a ∈ {V, Nb, Ta, Cr, Mo, W}, M b ∈ {V, Nb, Ta, Mo, W} and Q ∈ {S, Se, Te}. The design space (C) comprises 270 compounds, each compound is represented by four design variables A, M a , M b and Q with three, six, five, and three choices, respectively. Our objective is to maximize E g and ∆H d , which is represented in standard optimization formulation as: Starting from the initial dataset, the AOE evaluates new candidate compounds by gauging their improvement in the design objectives. Here, we use the expected maximin improvement (EMI) metric 49 to guide the adaptive sampling framework. The Maximin Improvement (I M ) for compound c is: where C PF is the current set of Pareto front compositions. To facilitate the comparison in Equation 3, we scale the value of each design objective P using the scheme P(·) = (P(·) − P min )/(P max − P min ) where P max and P min are the maximum and minimum value of property observed so far. The EMI of compound c is defined as the expected value of I M : We evaluate the EMI through Monte Carlo sampling with 500 trials. An alternative approach to EMI calculations is based on its geometrical interpretation. 50 At each AOE iteration, the EMI is calculated for all compositions that are not yet present in the data repository. The composition with largest EMI will be sampled next in property evaluation and then added to the data repository.

Data Availability
The simulation output files are available upon reasonable request. They are not publicly available due to the very large file sizes. Parameters of the input files are described in the computational methods. Atomic structures for the simulation cells are available upon reasonable request.

Code Availability
The Vienna Ab initio Simulation Package (VASP) is a proprietary software available for purchase at https://www.vasp.at/. Data processing scripts written to process output files and create figures are available upon request. The R-package âĂĲLVGPâĂİ contains the code for fitting LVGP models to general mixed-variable datasets and is available from the Comprehensive R Archive Network (CRAN) at http://CRAN.Rproject.org/package=LVGP. The MOBO codes used in the current study are available from the corresponding authors upon reasonable request.