A recommendation engine for suggesting unexpected thermoelectric chemistries

The experimental search for new thermoelectric materials remains largely confined to a limited set of successful chemical and structural families, such as chalcogenides, skutterudites, and Zintl phases. In principle, computational tools such as density functional theory (DFT) offer the possibility of rationally guiding experimental synthesis efforts toward very different chemistries. However, in practice, predicting thermoelectric properties from first principles remains a challenging endeavor, and experimental researchers generally do not directly use computation to drive their own synthesis efforts. To bridge this practical gap between experimental needs and computational tools, we report an open machine learning-based recommendation engine (http://thermoelectrics.citrination.com) for materials researchers that suggests promising new thermoelectric compositions, and evaluates the feasibility of user-designed compounds. We show that this engine can identify interesting chemistries very different from known thermoelectrics. Specifically, we describe the experimental characterization of one example set of compounds derived from our engine, RE12Co5Bi (RE = Gd, Er), which exhibits surprising thermoelectric performance given its unprecedentedly high loading with metallic d and f block elements, and warrants further investigation as a new thermoelectric material platform.


I. INTRODUCTION
The experimental search for new thermoelectric materials remains largely confined to a limited set of successful chemical and structural families, such as chalcogenides, skutterudites, and Zintl phases. [1][2][3] In principle, computational tools such as density functional theory (DFT) offer the possibility of rationally guiding experimental synthesis efforts toward very different chemistries. However, in practice, predicting thermoelectric properties from first principles remains a challenging endeavor, [4] and experimental researchers generally do not directly use computation to drive their own synthesis efforts. To bridge this practical gap between experimental needs and computational tools, we report an open machine learning-based recommendation engine (http://thermoelectrics.citrination.com) for materials researchers that suggests promising new thermoelectric compositions, and evaluates the feasibility of user-designed compounds. We show that this engine can identify interesting chemistries very different from known thermoelectrics. Specifically, we describe the experimental characterization of one example set of compounds derived from our engine, RE12Co5Bi (RE = Gd, Er), which exhibits surprising thermoelectric performance given its unprecedentedly high loading with metallic d and f block elements, and warrants further investigation as a new thermoelectric material platform.
For any materials problem, breaking out of "local optima" in composition space to discover entirely new chemistries remains a notoriously difficult challenge. [5] Many of the most notable materials classes under investigation today, from NaxCoO2-derived thermoelectrics [6] to iron arsenide superconductors, [7] were discovered fortuitously. As a result, experimental efforts often gravitate toward incrementally improving known chemistries (via doping, nanostructuring, etc.), as these efforts are more likely to bear fruit than high-risk searches through chemical whitespace for entirely new materials.
a) Electronic mail: mgaultois@mrl.ucsb.edu
b) Electronic mail: bryce@citrine.io
The consequence of research communities' focus on further exploitation of known chemistries rather than exploration of unknown chemistries is that much of composition space simply remains uncharacterized. In Fig. 1a, we illustrate the remarkable chemical homogeneity of most thermoelectric materials investigated to date. We plot each material from the thermoelectric database of Gaultois et al. [8] on the periodic table based on the composition-weighted average of the positions of elements in the material. The tight cluster of previously investigated chemistries is, as expected, dominated by chalcogenides and p-block elements such as Sn and Sb. In contrast, we also show the positions of Gd12Co5Bi and Er12Co5Bi, materials derived from our recommendation engine, which we characterize as a new class of thermoelectrics in this work. These materials are almost pure intermetallics, in sharp contrast to thermoelectric compounds investigated to date (Fig. 1b). The objective of our recommendation engine is to directly enable experimental researchers to rapidly identify new materials, such as RE12Co5Bi, that are very distinct from known compound classes, and worthy of further study.
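The composition-weighted average position described above can be sketched as follows. This is a minimal illustration, not the procedure used to produce Fig. 1: the `POSITIONS` lookup and the function name are hypothetical, positions are taken as (group, period) on the standard 18-column table, and only a few p-block elements are listed because placing f-block elements such as Gd or Er requires a layout choice that the text does not specify.

```python
from typing import Dict, Tuple

# Hypothetical lookup (assumption): element -> (group, period) on the
# standard 18-column periodic table. Only a few p-block elements are listed.
POSITIONS: Dict[str, Tuple[int, int]] = {
    "Sn": (14, 5),
    "Pb": (14, 6),
    "Sb": (15, 5),
    "Bi": (15, 6),
    "Se": (16, 4),
    "Te": (16, 5),
}


def weighted_position(composition: Dict[str, float]) -> Tuple[float, float]:
    """Return the composition-weighted mean (group, period) of a compound.

    `composition` maps element symbols to stoichiometric amounts,
    e.g. {"Bi": 2, "Te": 3} for Bi2Te3.
    """
    total = sum(composition.values())
    if total <= 0:
        raise ValueError("composition must contain a positive amount of atoms")
    group = sum(n * POSITIONS[el][0] for el, n in composition.items()) / total
    period = sum(n * POSITIONS[el][1] for el, n in composition.items()) / total
    return group, period


# Bi2Te3: group = (2*15 + 3*16)/5 = 15.6, period = (2*6 + 3*5)/5 = 5.4
print(weighted_position({"Bi": 2, "Te": 3}))
```

A compound made mostly of rare-earth and transition-metal atoms would land far from this p-block region under any reasonable layout, which is the contrast Fig. 1 is meant to convey.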

II. A MATERIALS RECOMMENDATION ENGINE
Our recommendation engine is a machine learning-based approach [9,10] for efficiently driving synthetic efforts toward promising new chemistries. We have trained a machine learning model to make a binary true/false prediction of whether the (1) Seebeck coefficient, (2) electrical resistivity, (3) thermal conductivity, and (4) band gap of input materials are within acceptable ranges for thermoelectric applications. We define these ranges as follows: (1) |S| > 100 µV K⁻¹; (2) ρ < 10⁻² Ω cm; (3) κ < 10 W m⁻¹ K⁻¹; and (4) E_g > 0 eV. We would classify any material for which the answer to all these questions is "yes" as a potentially promising thermoelectric that may warrant further study. The purpose of our recommendation engine is thus neither to make quantitative predictions of these thermoelectric properties, nor to definitively identify record-setting compounds; these remain open challenges for future work. Rather, the engine is intended to greatly augment the chemical intuition of experimental researchers working on materials discovery. In particular, we have found that our model's ability to screen vast numbers of possible compositions and shortlist interesting candidates can inspire materials syntheses that would not have been obvious a priori.
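The four acceptance windows defined above can be written down directly. The sketch below only encodes those labeling criteria for a material whose property values are already known; it is not the trained model, which predicts the labels from composition. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Properties:
    """Property values for one material (names are illustrative only)."""
    seebeck_uV_per_K: float       # Seebeck coefficient S, in µV/K
    resistivity_ohm_cm: float     # electrical resistivity, in Ω cm
    thermal_cond_W_per_mK: float  # thermal conductivity, in W/(m K)
    band_gap_eV: float            # band gap, in eV


def within_windows(p: Properties) -> Dict[str, bool]:
    """Evaluate each of the four true/false criteria stated in the text."""
    return {
        "seebeck": abs(p.seebeck_uV_per_K) > 100.0,
        "resistivity": p.resistivity_ohm_cm < 1e-2,
        "thermal_conductivity": p.thermal_cond_W_per_mK < 10.0,
        "band_gap": p.band_gap_eV > 0.0,
    }


def is_promising(p: Properties) -> bool:
    """A material is flagged only if all four answers are 'yes'."""
    return all(within_windows(p).values())


# Placeholder values, not measurements of any real compound.
example = Properties(-150.0, 1e-3, 2.0, 0.3)
print(within_windows(example), is_promising(example))
```

Note that the Seebeck criterion uses the absolute value, so both n-type and p-type materials can pass.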

Model validation
The training set for our recommendation engine comprises a large body of both experimental thermoelectric characterization data [8] and first principles-derived electronic structure data. [5,11] Our model uses these input data to learn interesting chemical trends that could be exploited to design new materials. We visualize the accuracy of our recommendation engine's predictions in Fig. 2, which represents the results of leave-one-out cross-validation (LOOCV) on our training data (in the case of the band gap data, we performed LOOCV on a subset of the extremely large training set). In the LOOCV procedure, if we have n total measurements of a particular property such as thermal conductivity, we train our machine learning model on n − 1 of these values and predict the nth (left out) value. We perform one training step and prediction for each property value, and present the error distribution for all n values in Fig. 2. The error distribution then provides us with a sense of how we may expect the model to perform on new materials of which we have no prior knowledge.
Fig. 2 indicates that our engine generally makes very reliable assessments of thermoelectric materials properties. The modes of the error distributions are in each case close to 0. For each property, the engine's errors skew toward false negatives (resistivity, band gap, thermal conductivity) or false positives (Seebeck), which reflects the fact that the underlying training data do not contain equal fractions of positive and negative examples. Seebeck coefficients prove most difficult to assess (i.e., the error distribution for that property has the largest standard deviation), likely because strikingly different mechanisms underpin the values in different classes of materials, for example, strongly correlated oxides as opposed to degenerate semiconductors.
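The LOOCV procedure described above can be sketched in a few lines. The classifier here is a simple k-nearest-neighbor stand-in chosen only to keep the example self-contained; it is an assumption and not the model behind the engine, and the feature vectors are toy values. Each error is computed as the true label minus the predicted confidence, so that values near +1 correspond to false negatives and values near −1 to false positives.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (feature vector, label: 1 = in window)


def knn_confidence(train: List[Sample], x: Sequence[float], k: int = 3) -> float:
    """Stand-in model (assumption): fraction of the k nearest training
    points that carry label 1, used as a confidence between 0 and 1."""
    def dist2(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    nearest = sorted(train, key=lambda s: dist2(s[0], x))[:k]
    return sum(label for _, label in nearest) / len(nearest)


def loocv_errors(data: List[Sample], k: int = 3) -> List[float]:
    """Leave one sample out, fit on the other n - 1, predict the one left out.

    Returns one error (true label minus predicted confidence) per sample.
    """
    errors = []
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]  # all samples except the i-th
        errors.append(y - knn_confidence(train, x, k))
    return errors


# Toy one-dimensional data: two well-separated groups.
toy = [((0.0,), 0), ((0.1,), 0), ((0.2,), 0),
       ((1.0,), 1), ((1.1,), 1), ((1.2,), 1)]
print(loocv_errors(toy, k=1))
```

A histogram of the returned list is the kind of error distribution that the text attributes to Fig. 2; with imbalanced labels, such a distribution would be expected to skew toward one sign.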

III. DISCUSSION
In this work, we are interested not only in developing a model that gives accurate predictions of materials properties, but also in making it immediately accessible and useful for experimental researchers. To that end, we have published our recommendation engine as a web app at http://thermoelectrics.citrination.com, where researchers may explore a pre-computed list of around 25,000 known compounds, and also use our model to evaluate their own candidate materials in real time. In this way, we hope that the app serves as a rapid triage tool for ideas for potential new thermoelectric materials. Our pre-computed list may be arranged according to the probabilities associated with any one of the four properties we are modeling, and is sorted by default according to a composite score that takes all four properties into account. Furthermore, the user may specify cutoff thresholds for any of the properties, and thereby greatly reduce the size of the list.
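The filtering and sorting behavior described above can be illustrated with a short sketch. The text does not state how the composite score is formed, so the product of the four probabilities used here is purely an assumption, as are the function names, the candidate labels, and the scores.

```python
from math import prod
from typing import Dict, List

Candidate = Dict[str, object]  # {"name": str, "scores": {property: probability}}


def composite(scores: Dict[str, float]) -> float:
    """Assumed composite score: product of the four per-property probabilities."""
    return prod(scores.values())


def shortlist(candidates: List[Candidate], cutoffs: Dict[str, float]) -> List[Candidate]:
    """Keep candidates meeting every user cutoff, sorted by composite score."""
    kept = [
        c for c in candidates
        if all(c["scores"][prop] >= threshold for prop, threshold in cutoffs.items())
    ]
    return sorted(kept, key=lambda c: composite(c["scores"]), reverse=True)


# Placeholder candidates with made-up probabilities (not engine output).
candidates = [
    {"name": "A", "scores": {"seebeck": 0.9, "resistivity": 0.9,
                             "thermal_conductivity": 0.9, "band_gap": 0.9}},
    {"name": "B", "scores": {"seebeck": 0.95, "resistivity": 0.5,
                             "thermal_conductivity": 0.5, "band_gap": 0.5}},
    {"name": "C", "scores": {"seebeck": 0.2, "resistivity": 0.99,
                             "thermal_conductivity": 0.99, "band_gap": 0.99}},
]

for c in shortlist(candidates, {"seebeck": 0.5}):
    print(c["name"], round(composite(c["scores"]), 4))
```

A product penalizes any single weak property strongly; a different combination rule, such as a weighted mean, would rank the same candidates differently.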
As we believe our extensive pre-computed list contains some interesting and heretofore uncharacterized candidate thermoelectric materials, we now comment on a select set of high-ranking compounds. Several of these compounds are given in Table I. TaVO5 and TaPO5 occur in an analogous crystal structure to the phosphate tungsten bronzes. [12,13] These materials can be expected to have good thermoelectric performance given the heavy atoms, the potential for low electrical resistivity provided by the repeating ReO3-type structural network that is highly connected in three dimensions, and the intrinsic crystallographic shear provided by the crystal structure. Although the phosphate tungsten bronzes themselves are not highly rated, their metallic electrical transport properties are encouraging for structural analogues. [14] Moreover, TaVO5 has a negative coefficient of thermal expansion and a structural transition at 600 °C. [15] This structural transition may lead to softening of phonon modes and anharmonic scattering, which may lead to low thermal conductivity. The second material of interest we present is Tl9SbTe6. Though this compound was not included in the thermoelectric database, it scores highly within the recommendation engine, and good thermoelectric performance has been subsequently demonstrated in recent work. [16] The suggestion of TaAlO4, SrCrO3, TaSbO4 and other oxides expected to be insulators can be understood because the recommendation engine uses as training data references where stoichiometric formulas were primarily reported rather than doping details. [17,18] Nevertheless, with doping through substitution or reduction, these compounds may exhibit moderate electrical performance. Further, these materials all feature extended structures that are highly connected in three dimensions, an important feature for low electrical resistivity.
Moreover, the large mass contrast on the cation sublattice in TaAlO4 (edge shared TaO6 and AlO6 octahedra) could lead to low thermal conductivity, and previous reports have shown that SrCrO3 is metallic when synthesized under pressure. [19] Many of the high-ranking candidate materials are interesting because of their highly connected extended structures, even though the recommendation engine does not use features of crystal structure to make its suggestions. The chief disadvantage to training prediction algorithms using crystal structure is that structure then becomes a required input for making predictions, and yet structure is by definition not available for uncharacterized materials. However, the absence of crystal structure does cause our engine difficulty where changes in crystal structure with similar elemental compositions cause large changes in physical properties. For example, both DyPO4 and LaPO4 are predicted to have low thermal conductivity. However, LaPO4 is monazite, a corner edge-shared structure, whereas DyPO4 is xenotime, [20] an edge-shared structure leading to inherently higher thermal conductivity. [21]

New materials and their properties
Our final and most important task in this work is to demonstrate that our recommendation engine can indeed guide researchers toward interesting experimental discoveries. Among the set of high-scoring candidate materials, we selected Er12Co5Bi and Gd12Co5Bi to characterize as thermoelectric materials due to their facile synthesis through arc melting, and due to the fact they are chemically quite distinct from known thermoelectrics (Fig. 1). While the RE12Co5Bi (RE = rare earth) family of compounds has only been sparsely studied in the literature, their crystal structure and initial low-temperature electrical and magnetic properties have been reported by Mar and coworkers. [22] The crystal structure of RE12Co5Bi is shown in Figure 3. Interestingly, the crystal structure of our candidate thermoelectric exhibits notable similarity to the structures of known thermoelectrics, in spite of the fact that crystal structure was not an input feature for our recommendation engine. Ho12Co5Bi is the eponymous structure prototype (orthorhombic, space group Immm) adopted by a series of rare-earth intermetallics RE12Co5Bi (RE = Y, Gd, ..., Tm).
[Figure caption] For each material in our training set and each property, the recommendation engine gives a confidence score between 0 and 1 that the property value falls within the ideal windows we have defined for thermoelectric applications. Errors approaching +1 represent false negatives (our engine thought the material would be poor for that property, but it is actually good); and an error of −1 is a false positive (our engine thought the material would be good for that property, but it is actually poor).
In this structure, the Ho12Bi icosahedra play an analogous role to the LaP12 icosahedra in the filled skutterudite prototype LaFe4P12; rare-earth atoms "rattling" within their 12-fold coordinated cages is the idiosyncratic feature of filled skutterudites that imparts the low thermal conductivity so prized in thermoelectric materials. In fact, if the transition metal atoms, which occupy different sites in these structures, are disregarded, the Ho12Bi framework is an antitype to the LaP12 framework, with the roles of the rare-earth and group 15 elements reversed. We hypothesize that this crystallographic similarity to skutterudite could be partly responsible for the thermoelectric behavior of RE12Co5Bi (RE = Gd, Er).
We give a full thermoelectric characterization of Er12Co5Bi and Gd12Co5Bi in Fig. 4. Based on these results, we report the discovery of a new thermoelectric class, which remains a completely unoptimized, pure bulk material and thus lends itself to further study. Notably, the material falls far outside the usual search space for thermoelectrics (Fig. 1), and was neither the result of simple interpolation between known compounds nor obvious from a strict chemical intuition standpoint. The electrical resistivity is commensurate with other high-performing materials such as chalcogenides, although the Seebeck coefficient is too low for the material to be competitive with the best-known thermoelectrics. Furthermore, the thermal conductivity is relatively high, but the filled cage structure lends itself to the kind of substitution that has successfully reduced thermal conductivity in the skutterudite systems. [3,23] In RE12Co5Bi (RE = Gd, Er), the thermal conductivity from 300 K to 800 K ranges from 4 W m⁻¹ K⁻¹ to 8 W m⁻¹ K⁻¹, comparable to the half-Heuslers. [24,25] The electrical performance figure of merit κzT is around 0.03 at 400 K, which is actually higher than that of nearly 30% of the thermoelectrics in the Gaultois et al. thermoelectrics database; [8] of course, the database is a highly self-selected set of materials, consisting of literature-reported thermoelectrics, and would skew toward much higher κzT values than would a random subset of all crystalline materials. We also note that the zT of several other thermoelectric materials can be significantly improved through carrier concentration tuning and microstructural engineering. For example, undoped polycrystalline Si has a 60-fold increase in performance after optimization, going from zT < 0.01 to 0.6 at 300 K. [26] Another observation from Fig. 4 illustrates the scientific boon of studying entirely new classes of materials.
Unexpectedly, RE12Co5Bi (RE = Gd, Er) exhibits increasing thermal conductivity with temperature. The increasing electrical resistivity with temperature indicates metallic electrical transport, so the electrical contribution to the total thermal conductivity should therefore decrease with increasing temperature. Additionally, the phonon contribution to thermal conductivity should also decrease with increasing temperature due to more phonon-phonon (Umklapp) scattering. [27] Thermal conductivity is calculated from the following relation: κ = α ρ C_p, where α is thermal diffusivity, C_p is heat capacity, and ρ is mass density (not the electrical resistivity, which is also denoted ρ elsewhere in this work). Normally, thermal diffusivity has a negative temperature dependence whereas heat capacity and density both have positive temperature dependence. However, for this compound we observe a positive temperature dependence for the thermal diffusivity even after multiple measurements, the origin of which is not presently understood. Materials with increasing thermal conductivity with temperature are rare, though not unprecedented, [28,29] and further studies on this class of compounds to shed light on this anomaly could thus lead to new strategies for thermoelectric materials optimization.
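The relation κ = α ρ C_p above, and the κzT quantity quoted earlier, reduce to simple unit-aware arithmetic. The sketch below assumes common laboratory units for the inputs (these unit choices and the function names are assumptions) and uses the standard definition zT = S²T/(ρ_e κ), so that κzT = S²T/ρ_e with ρ_e the electrical resistivity. The diffusivity and heat capacity in the usage lines are placeholders, not measured values; only the density of 8.6 g/cm³ is the value reported for Gd12Co5Bi in this work.

```python
def thermal_conductivity(alpha_mm2_per_s: float,
                         density_g_per_cm3: float,
                         cp_J_per_gK: float) -> float:
    """kappa = alpha * density * C_p, returned in W m^-1 K^-1."""
    alpha = alpha_mm2_per_s * 1e-6     # mm^2/s  -> m^2/s
    density = density_g_per_cm3 * 1e3  # g/cm^3  -> kg/m^3
    cp = cp_J_per_gK * 1e3             # J/(g K) -> J/(kg K)
    return alpha * density * cp


def kappa_zT(seebeck_uV_per_K: float,
             resistivity_ohm_cm: float,
             temperature_K: float) -> float:
    """kappa*zT = S^2 T / rho_e, returned in W m^-1 K^-1."""
    s = seebeck_uV_per_K * 1e-6        # µV/K  -> V/K
    rho_e = resistivity_ohm_cm * 1e-2  # Ω cm  -> Ω m
    return s * s * temperature_K / rho_e


# Placeholder diffusivity and heat capacity (assumptions); density from the text.
kappa = thermal_conductivity(alpha_mm2_per_s=2.0,
                             density_g_per_cm3=8.6,
                             cp_J_per_gK=0.25)
print(f"kappa = {kappa:.2f} W/(m K)")

# Dividing kappa*zT by kappa recovers the dimensionless zT.
print(f"zT = {kappa_zT(100.0, 1e-3, 300.0) / kappa:.3f}")
```

Because κ is a product of three measured quantities, a positive temperature dependence in the diffusivity carries directly into κ unless it is offset by the other two factors.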

IV. CONCLUSIONS
This initial experimental validation of our recommendation engine is encouraging. The present work represents the first time that machine learning has been used to suggest an experimentally viable new compound from true chemical white space, where no prior characterization had hinted at promising chemistries. The implication is that our approach, wherein a data-driven computational tool directly augments experimental capabilities and intuition, is a semi-rational way to discover new materials families that may have desirable properties. We suggest that such a paradigm could eventually replace trial-and-error and fortuity in the search for new materials across a wide variety of application areas.

ACKNOWLEDGMENTS
We thank Ram Seshadri for helpful discussions and insight. We thank the National Science Foundation for support of this research through NSF-DMR 1121053, as well as the Natural Sciences and Engineering Research Council of Canada (NSERC

EXPERIMENTAL DETAILS
RE12Co5Bi (RE = Gd, Er) samples were made by arc-melting freshly filed Er or Gd pieces (99.9%, Hefa), Co powder (99.8%, Cerac), and Bi powder (99.999%, Alfa Aesar). Stoichiometric mixtures (0.5 g total mass) with 5-7% excess of bismuth were pressed into pellets and melted twice in an arc-melting furnace under argon atmosphere (Edmund Bühler Compact Arc Melter MAM-1). The total mass loss after melting was < 1%. The samples were sealed in silica tubes and annealed at 1070 K for one week, then quenched in cold water. To produce enough material for physical property measurement, ∼70 samples of each compound were prepared, and pure samples were combined by melting into a single ingot of ∼5 g, which was sanded to yield the appropriate geometry (either a rectangular bar or a cylinder). Density was measured using Archimedes' method; the final pellets had densities equal to 100% of the single-crystal values (ρ(Gd12Co5Bi) = 8.6 g/cm³, ρ(Er12Co5Bi) = 9.9 g/cm³).
Powder X-ray diffraction patterns were collected using an INEL CPS 120 diffractometer with Cu Kα1 radiation at room temperature. Rietveld refinement was used to confirm the structure and phase purity.
High-temperature thermoelectric properties (electrical resistivity and Seebeck coefficient) were measured with an ULVAC Technologies ZEM-3. Sample bars had approximate dimensions of 9 mm × 4 mm × 4 mm. Measurements were performed with a helium under-pressure, and data were collected from 300 K to 800 K through three heating and cooling cycles over 18 hours to ensure sample stability and reproducibility.