Perspective: NanoMine: A material genome approach for polymer nanocomposites analysis and design

Polymer nanocomposites are a designer class of materials where nanoscale particles, functional chemistry, and polymer resin combine to provide materials with unprecedented combinations of physical properties. In this paper, we introduce NanoMine, a data-driven web-based platform for analysis and design of polymer nanocomposite systems under the material genome concept. This open data resource strives to curate experimental and computational data on nanocomposite processing, structure, and properties, as well as to provide analysis and modeling tools that leverage curated data for material property prediction and design. With a continuously expanding dataset and toolkit, NanoMine encourages community feedback and input to construct a sustainable infrastructure that benefits nanocomposite material research and development.


I. INTRODUCTION
Research efforts in the past decades on nano-reinforced polymeric materials provide numerous examples of property enhancement across multiple physical property domains including thermomechanical, dielectric, optical, and other properties. 1-6 Results from these works have generated a tremendous amount of experimental and simulation data covering a wide range of polymer, particle, and chemistry combinations. While some mechanisms that lead to property changes in nanocomposites are gradually being revealed, our ability to comprehensively understand the underlying physics and principles across the whole nanocomposite domain is still quite limited, and the design and development of new nanocomposite materials remain largely dependent on Edisonian, trial-and-error iterations.
With the enormous amount of research data scattered throughout the publicly available literature as well as in private spreadsheets and files, it is impossible at present to perform a comprehensive search of the data, thereby making the design of new functional materials an increasingly inefficient process. Most research data are reported and represented without a unified standard of data representation in terms of terminology and data format, as well as required degree of detail, completion, and accuracy. Consequently, conventional keyword-based web engine searching provides search results that are far from complete for effective material design or even exploratory fact searching. Even a simple deterministic query on single or multiple material parameters can be challenging.
a) Author to whom correspondence should be addressed. Electronic mail: cbrinson@northwestern.edu
In the past decade, initiatives in academic and industrial domains have been launched to construct web-based data-driven platforms for design of new materials. MatNavi, developed by the National Institute for Materials Science (NIMS) of Japan, is one of the largest open materials databases. 7 As part of MatNavi, PolyInfo provides an online database of structure and properties for pure polymer materials with property data from more than 14 000 literature sources. 8 Outside of academia, examples of online material databases include MatWeb, a search engine of materials data cataloged from manufacturers and suppliers, 9 and Granta Material Intelligence (pay for service), a commercial enterprise-scale, solution-based, material data management infrastructure for a broad variety of industrial applications. 10 Since the inauguration of the Materials Genome Initiative (MGI) in 2011, 11 many efforts have been focused on integrating material science, informatics, and information technology to develop new infrastructure for materials discovery and design. Several existing online platforms have utilized results from computational techniques coupled with data analytics for screening and discovery of promising material systems with enhanced properties, such as the Materials Project 12 and the Open Quantum Materials Database (OQMD), 13 both of which take advantage of high-throughput density functional theory (DFT) to generate large datasets and apply predictive analytics methods for material design. In the private sector, Citrine Informatics has been providing a cloud-based platform with a fast-expanding material database containing datasets from multiple sources as well as data-driven material design tools. 14 However, among all current platforms, polymer nanocomposites are a largely untouched space.
Instead of well cataloged material properties or calculated results from large-scale numerical simulation, most reports on novel nanocomposite materials come from the literature of experimental investigations. In contrast to directly reading data from pre-defined, well-established formats, extraction of data from the literature requires a thorough curation process. Additionally, the complexity of hierarchical structures in composites along with infinite possibilities of polymer, particle, and surface chemistry combinations leads to a considerably less developed data and design space for nanocomposites. 15 Moreover, small changes in processing conditions and surface chemistry result in dramatic changes to filler-matrix interphase characteristics and microstructure, both of which are critical to the composite properties. 16 Finally, the phase diagram based tools (e.g., CALPHAD 17 ) developed for exploration of metallic alloy materials are not applicable to this materials domain. A comprehensive methodology to fully account for processing, structure, and property (p-s-p) information for nanocomposite materials has not been established.
In this work, we present the NanoMine 18 framework, a data-driven web-based infrastructure that combines a database, data-driven analysis tools, and physics-based modeling for polymer nanocomposite materials analysis and design. NanoMine aims to apply, and customize, the material genome approach for nanocomposites in order to facilitate efficient material selection and design. This paper describes current and ongoing efforts in each major component of NanoMine, followed by an example of a design workflow that takes advantage of NanoMine tools for composite property prediction. We conclude by summarizing current capabilities and plans for future development.

II. COMPONENTS OF NANOMINE
The underlying principle of NanoMine is to create a living, open-source data resource for nanocomposites which provides data archiving and exchange, statistical analysis, and physics-based modeling for property prediction and materials design. Fig. 1 provides an overview of the three major thrusts within the NanoMine framework: material database, analysis tools, and simulations. Within the NanoMine data resource, we aim to capture, in a standardized format and terminology, the physical properties reported in the literature and by individual research labs, the nano- and microstructures of the corresponding experimental samples, and the material processing conditions. With sufficient data accumulated in each of the p-s-p domains, statistical correlations are developed to link processing conditions, quantified microstructure information, and macroscopic property response through parameterized formulations, coupled with image analysis techniques and physics-based simulations.

A. Database
The current development version of NanoMine is implemented with the Materials Data Curation System (MDCS), developed at the National Institute of Standards and Technology (NIST), as the database and interface infrastructure. 19 The beta version of the system contains basic functionality for data curation (data entry with a pre-defined XML schema) and data exploration (selecting from existing materials and looking up data by query). A customized template is constructed to archive raw p-s-p parameters from data sources, so that users can query the database to look up and retrieve data, and can process the data using the analysis and modeling features in NanoMine to obtain microstructure information and simulated material response.
A well-defined data structure, or schema, is vital to effectively collect and archive materials data and to enable accurate data retrieval and comparisons. 20 The terminology used in various data sources referring to an identical quantity needs to be unified, and the data types used to represent and store the entities should also be well defined and self-explanatory. 21 To find the set of most commonly recurring parameters and construct the XML schema, we surveyed 30 representative papers on polymer nanocomposites published within the past decade. Based on this initial literature survey, we developed a data template to serve as an initial, coherent view of available data and terminology standards containing all key parameters associated with processing, structure, and properties along with metadata from the data source. A summary of attributes in the template along with their descriptions is listed in Table I. The three data types that are used to store the attributes are shown in Fig. 2.
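To make the idea of a processing-structure-property template concrete, the following is a minimal sketch of how one curated sample could be held in memory and serialized to XML. The class name, field names, and XML element names are illustrative assumptions and do not reproduce the actual NanoMine schema or the attributes of Table I.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET


@dataclass
class Sample:
    """One curated sample. All field names are hypothetical, not the NanoMine schema."""
    source_doi: str                      # metadata identifying the data source
    matrix: str                          # polymer matrix name
    filler: str                          # nanofiller name
    filler_volume_fraction: float        # composition, between 0 and 1
    processing_steps: list = field(default_factory=list)  # ordered step names
    properties: dict = field(default_factory=dict)        # name -> (value, unit)


def to_xml(sample: Sample) -> ET.Element:
    """Serialize a sample into a small XML tree with assumed element names."""
    root = ET.Element("PolymerNanocomposite")
    ET.SubElement(root, "DataSource").text = sample.source_doi

    materials = ET.SubElement(root, "Materials")
    ET.SubElement(materials, "Matrix").text = sample.matrix
    filler = ET.SubElement(materials, "Filler")
    ET.SubElement(filler, "Name").text = sample.filler
    ET.SubElement(filler, "VolumeFraction").text = str(sample.filler_volume_fraction)

    processing = ET.SubElement(root, "Processing")
    for order, step in enumerate(sample.processing_steps, start=1):
        ET.SubElement(processing, "Step", order=str(order)).text = step

    props = ET.SubElement(root, "Properties")
    for name, (value, unit) in sample.properties.items():
        ET.SubElement(props, "Property", name=name, unit=unit).text = str(value)
    return root


def example_sample() -> Sample:
    """Placeholder record; the DOI and property value are made up for illustration."""
    return Sample(
        source_doi="10.0000/placeholder",
        matrix="epoxy",
        filler="silica",
        filler_volume_fraction=0.07,
        processing_steps=["solution mixing", "curing"],
        properties={"dielectric_permittivity": (4.0, "dimensionless")},
    )
```

Calling `ET.tostring(to_xml(example_sample()), encoding="unicode")` yields a single XML string in which terminology is fixed by the element names rather than by each data source, which is the point of a unified template.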
Since nanocomposites span a very broad range of materials and property targets, the initial curation process was concentrated on samples with surface-treated spherical inorganic nanofillers, where the publications contained corresponding micrographs with explicit nanophase dispersion, well-documented processing and characterization procedures, and clearly plotted functional data of viscoelastic and/or dielectric properties. This preliminary work serves to test our data-driven approach using a subset of polymer nanocomposite materials. NanoMine currently contains data on more than 300 distinct material samples from the literature, each with specific composition, synthesis, processing, and measurement conditions (e.g., PGMA-functionalized nanosilica with 7 vol. % loading in epoxy matrix measured at room temperature) and each associated with one or more reported properties, data on the measurement techniques, and micrograph images, depending on the available information from the data source. The data available in NanoMine will keep growing as curation continues. Future efforts will expand the current data resource to include more types of nanocomposite materials, for example, fillers with higher aspect ratios (such as carbon nanotubes, nanofibers, and nanoplates) with anisotropic nanophase dispersion, as well as extended property domains (thermal, optical, and structural properties).
NanoMine provides multiple channels for users to access curated data. Simple Search provides a graphical user interface to select composition and/or properties of nanocomposites in order to find archived samples that match the search criteria. In Advanced Search, users can select any attribute used during data curation to query the database. For example, users can limit the journal(s) and year(s) of publication or choose only samples that were synthesized using a particular processing method.
Upon request, users can also access the Representational State Transfer (REST) application programming interface (API) scripts that come with the MDCS distribution to insert, modify, and retrieve data. These methods provide flexible channels to exchange data between users and the data resource.
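The attribute-based querying described above can be sketched as a filter over curated records. The example below is a local, in-memory illustration only: it does not call the MDCS REST API, whose endpoints are not described here, and the record keys (`journal`, `year`, `processing_method`) are hypothetical.

```python
def advanced_search(records, **criteria):
    """Return the records that satisfy every criterion.

    Each criterion value may be:
      - a (low, high) tuple: inclusive numeric range,
      - a list or set: any of the listed values is accepted,
      - anything else: exact match.
    Records missing a queried attribute are excluded.
    """
    def matches(record, key, wanted):
        value = record.get(key)
        if value is None:
            return False
        if isinstance(wanted, tuple) and len(wanted) == 2:
            low, high = wanted
            return low <= value <= high
        if isinstance(wanted, (list, set, frozenset)):
            return value in wanted
        return value == wanted

    return [
        record for record in records
        if all(matches(record, key, wanted) for key, wanted in criteria.items())
    ]


# Placeholder records; journals, years, and methods are invented for illustration.
RECORDS = [
    {"id": "S1", "journal": "Journal A", "year": 2012, "processing_method": "extrusion"},
    {"id": "S2", "journal": "Journal B", "year": 2015, "processing_method": "extrusion"},
    {"id": "S3", "journal": "Journal A", "year": 2014, "processing_method": "solution mixing"},
]
```

For instance, `advanced_search(RECORDS, year=(2013, 2016), processing_method="extrusion")` restricts both the publication years and the processing method, and returns only the record `S2`.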

B. Analysis tools
In addition to the database, NanoMine also provides analysis tools for quantitative investigation of curated data to assist with material selection and property prediction. NanoMine aims to provide a practical suite of toolkits tailored to and integrated with curated data on polymer nanocomposites. In the beta site, we have implemented three tools related to microstructure analysis as open, web-based modules and are continuing to develop and publish new methodology for mining and analysis of data. Fig. 3 shows features of the three implemented tools for microstructure analysis. One tool, Niblack Binarization, adopts a dynamic local thresholding algorithm to convert input grayscale micrographs into a binary image with separated nanofiller and matrix phases. 52 As the threshold value that separates filler from matrix phase is only applied locally within fixed size windows, this local binarization algorithm ensures that the background noise and influence from uneven brightness are eliminated. Using this binary image, the Descriptor Characterization tool quantifies nanophase dispersion into statistical descriptors that capture the composition, geometry, and dispersion of nanofillers. 53 Sample descriptors for each category include volume fraction (composition), major and minor axes (geometry), and nearest center distance (dispersion). With higher-order statistics included, the descriptors are capable of describing the quantitative dispersion of the entire population of fillers. With the calculated descriptors, the 3D Reconstruction tool generates a statistically equivalent 3D microstructure from the 2D binary image. 54 Assuming ellipsoidal nanoparticle clusters, the reconstruction algorithm generates coordinates, size, and orientation of the clusters by matching the descriptors between 2D and 3D domains. Since most available experimental microstructure data are in 2D space, this algorithm provides a fast and reliable conversion from 2D to 3D domain. Along with constituent property assignment, the generated microstructure can then serve as input to 3D physics modeling to simulate composite properties.
FIG. 3. Microstructure characterization and reconstruction workflow. Input grayscale micrograph (a) is first converted to binary image (b) consisting of polymer and particle phases using dynamic local threshold algorithms (Niblack Binarization). Dispersion from microstructure is then quantified into statistical descriptors (c) (Descriptor Characterization). Using pixel moving method, a statistically equivalent 3D microstructure (d) is constructed that exhibits the same dispersion state as in original 2D domain (3D Reconstruction). The 3D morphology can then be used for deterministic analysis and modeling, such as finite element modeling of macroscopic response.
In addition to microstructure analysis, data analytics tools are being implemented to analyze existing data in NanoMine. For example, our work has shown that mixing energy associated with the processing steps can be statistically correlated with microstructure dispersion. 52 Given the nanocomposite constituents and mixing energy calculated from the extrusion procedures, microstructure descriptors can be obtained through matrix-dependent analytical expressions based on statistical learning. This process is informed by the predicted surface energy of the constituents. 55 This work is currently being refined into a tool to incorporate into NanoMine. The suite of data and tools can be used to assist in microstructure design to optimize properties for specific nanocomposite systems as illustrated in our previous paper, in which the descriptor-based methodology was applied to characterize and reconstruct a nanodielectric microstructure and understand the impact of dispersion on properties. 56 In future work, we will expand the capabilities of the toolkit to construct statistically meaningful correlations among a broader set of p-s-p parameters by exploring curated data in NanoMine to achieve more efficient material selection and design.
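The first two steps of the microstructure workflow can be sketched in a few lines. The code below is not the NanoMine implementation: it is a plain-Python illustration of the standard Niblack rule (a pixel counts as filler when it exceeds the local mean plus k times the local standard deviation) followed by two simple descriptors. The window size, the value of `k`, the assumption that filler appears bright (a dark-particle micrograph would be inverted first), and the use of 2D area fraction as a stand-in for volume fraction are all assumptions.

```python
import math


def niblack_binarize(image, window=3, k=0.2):
    """Binarize a grayscale image (list of rows) with a local Niblack threshold."""
    rows, cols = len(image), len(image[0])
    half = window // 2
    binary = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Local window around (r, c), clipped at the image border.
            patch = [image[i][j]
                     for i in range(max(0, r - half), min(rows, r + half + 1))
                     for j in range(max(0, c - half), min(cols, c + half + 1))]
            mean = sum(patch) / len(patch)
            std = math.sqrt(sum((p - mean) ** 2 for p in patch) / len(patch))
            binary[r][c] = int(image[r][c] > mean + k * std)
    return binary


def label_clusters(binary):
    """Group filler pixels into 4-connected clusters (lists of (row, col))."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    clusters = []
    for r in range(rows):
        for c in range(cols):
            if not binary[r][c] or seen[r][c]:
                continue
            stack, pixels = [(r, c)], []
            seen[r][c] = True
            while stack:
                i, j = stack.pop()
                pixels.append((i, j))
                for ni, nj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                    if (0 <= ni < rows and 0 <= nj < cols
                            and binary[ni][nj] and not seen[ni][nj]):
                        seen[ni][nj] = True
                        stack.append((ni, nj))
            clusters.append(pixels)
    return clusters


def descriptors(binary):
    """Area fraction, cluster count, and mean nearest-center distance (in pixels)."""
    rows, cols = len(binary), len(binary[0])
    clusters = label_clusters(binary)
    centers = [(sum(i for i, _ in p) / len(p), sum(j for _, j in p) / len(p))
               for p in clusters]
    nearest = []
    for a, (ya, xa) in enumerate(centers):
        others = [math.hypot(ya - yb, xa - xb)
                  for b, (yb, xb) in enumerate(centers) if b != a]
        if others:
            nearest.append(min(others))
    return {
        "area_fraction": sum(len(p) for p in clusters) / (rows * cols),
        "cluster_count": len(clusters),
        "mean_nearest_center_distance": sum(nearest) / len(nearest) if nearest else None,
    }
```

Because the threshold is recomputed inside each window, a slow brightness gradient across the micrograph shifts the local mean along with the pixel values. One known limitation of this rule is that the window should be larger than the features of interest; otherwise the uniform interior of a large cluster has zero local contrast and is labeled as matrix. Geometry descriptors such as major and minor axes are omitted from this sketch.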

C. Simulations
The third pillar of NanoMine incorporates physics-based continuum modeling for prediction of macroscopic material response in nanocomposites. As a widely adopted technique in composite material simulation, finite element analysis (FEA) has been demonstrated as an effective method for prediction of macroscopic composite properties as it includes both explicit input of microstructure geometry and distinction of phases among different constituents. 57-59 As illustrated in previous work on the development of FEA models, 55,60,61 available data from NanoMine can be taken as input to the FEA models in order to simulate a macroscopic composite response.
NanoMine currently has implemented two FEA models as web-tools for prediction of viscoelastic and dielectric properties. Fig. 4 shows the workflow of the web-tools after integration with the data resource and analysis tools. With interphase properties and representative microstructure determined from micrographs or predicted from surface energy and mixing energy parameters, a 3D FEA model is built with commercial software (COMSOL/Abaqus) using API and subroutine scripts. An automatically generated finite element mesh is determined based on the aggregation and shapes of clusters. After being fully configured, the FEA model is submitted to a job scheduler on a remote workstation. The simulation takes around 30 min for a typical nanodielectric system (well dispersed with about 50 nanoparticle clusters in a representative volume element (RVE) square with side of 1 µm), while the time between job submission and delivery of the final result may vary depending on the wait list of the job scheduler. Users are able to check the status and retrieve final results of jobs using unique job identification numbers allocated at job submission. Implemented FEA web-tools are able to simulate thermomechanical (glass transition temperature, storage/loss moduli) and dielectric (permittivity, loss tangent) properties as well as to accommodate interphase property prediction for known combinations of polymer, particle, and surface treatments.
FIG. 4. Simulation workflow from material selection to macroscopic property prediction. Molecular structures of constituents (a) are used to derive energetic terms (b) that represent constituent interactions. Interphase properties (c) and microstructure dispersion (d) are then predicted from interfacial energies under statistical correlations. At the final step, FEA model (e) takes input from microstructure and material properties (polymer, particle, and the interphase) to simulate continuum composite response.
The simulation modules are being expanded to cover more physical properties and phenomena, such as dielectric breakdown strength. The interphase property prediction tool for general nanocomposite materials is still under development; however, initial algorithms for interphase property approximations can be found in publications. 55,61 Integrated with materials data and microstructure analysis, the FEA web-tools provide predicted composite performance for user-defined material constituents and will be refined and extended with more functionality in ongoing development.
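The submit, wait, and retrieve pattern described for the FEA web-tools can be illustrated with a small in-memory queue. This is a sketch of the pattern only, assuming a first-in, first-out wait list: the class `JobQueue`, its method names, and the `solver` callable are hypothetical, and no COMSOL or Abaqus interface is used.

```python
import uuid
from collections import deque


class JobQueue:
    """Toy job scheduler: jobs wait in FIFO order and are tracked by a unique ID."""

    def __init__(self):
        self._pending = deque()   # jobs waiting to run, oldest first
        self._jobs = {}           # job_id -> {"status": ..., "result": ...}

    def submit(self, solver, config):
        """Queue a job and return the identifier used to track it later."""
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = {"status": "queued", "result": None}
        self._pending.append((job_id, solver, config))
        return job_id

    def run_next(self):
        """Run the oldest queued job; return its ID, or None if nothing is waiting."""
        if not self._pending:
            return None
        job_id, solver, config = self._pending.popleft()
        job = self._jobs[job_id]
        job["status"] = "running"
        try:
            job["result"] = solver(config)
            job["status"] = "finished"
        except Exception as exc:  # keep the failure reason available to the user
            job["result"] = str(exc)
            job["status"] = "failed"
        return job_id

    def status(self, job_id):
        """Report 'queued', 'running', 'finished', or 'failed' for a known job."""
        return self._jobs[job_id]["status"]

    def result(self, job_id):
        """Return the stored result, but only once the job has finished."""
        job = self._jobs[job_id]
        if job["status"] != "finished":
            raise RuntimeError(f"job {job_id} is {job['status']}, no result yet")
        return job["result"]
```

In this sketch the delay a user experiences is the solver run time plus the time spent behind earlier entries in the queue, which mirrors the distinction drawn above between simulation time and time to final delivery.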

III. PROPERTY PREDICTION AND DESIGN WORKFLOW
A material data resource and associated analysis and simulation components should ultimately serve to design new materials. Using the database and analysis tools provided by NanoMine, a user can predict properties for given nanocomposite constituent/processing combinations and will be able to perform material design to obtain desired nanocomposite properties. A sample workflow is illustrated in Fig. 5, which demonstrates a typical scenario for prediction or design of a nanodielectric system. Suppose the user request is to predict dielectric permittivity for an epoxy-based, nanosilica-filled nanocomposite material. Leveraging curated data, the first step is to query the database for existing data of polymer and particle properties as material property input to the FEA model and to use the embedded heuristic tools of Materials Quantitative Structure-Property Relationship (MQSPR) to predict relevant surface energies. 55 Two sub-processes are then carried out to process the data: (1) microstructure and (2) interphase. Using statistical correlations, processing parameters from the extrusion processes 52 and the constituent surface energies 55 are related to descriptors, and then the 3D Reconstruction tool is employed to generate a 3D morphology that serves as input to the FEA model. For the interphase input, an energy-interphase correlation (Interphase Tool) is applied to derive local interphase properties, which are then loaded into the FEA model. The FEA simulation is then carried out to compute the dielectric spectroscopy response that corresponds to the selected polymer, particle, and surface treatments.
FIG. 5. Examples of using NanoMine data for prediction and design. Top direction: material property prediction from material selection to simulated composite dielectric properties. From material constituents and processing information, descriptors are determined from the surface energies and used to reconstruct 3D microstructure for FEA simulation. Constituent characteristics also predict interphase properties that serve as input to FEA simulation. Bottom process: design from target material performance to necessary constituents and chemistry. Using optimization algorithm and metamodeling, target performance is mapped with candidate microstructure and interphase, which can back track necessary constituents and processing using data mining models.
A higher order user request will be to design a composite with targeted properties. For exploratory searching, users can browse the database and look up curated data using a simple query. For example, users can find nanocomposites with modulus and dielectric permittivity confined by ranges as illustrated in Fig. 6. A more challenging type of query is to look for ideal material compositions and processing steps in order to synthesize a nanocomposite sample that can achieve desired properties, as shown in the reverse direction in Fig. 5. NanoMine is being developed to tackle exactly this challenge. The existing tools are coupled with a design optimization algorithm and metamodeling to choose the microstructure, constituent, and interphase properties to attain the goal and then the necessary constituents and chemistry to achieve it. A comprehensive predictive framework with data mining models trained from the entire set of the structured p-s-p data will be built so that the input target performance can be associated with best matched samples and corresponding p-s-p parameters. This design task is currently the focus of ongoing development.
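The idea of associating a target performance with the best matched curated samples can be illustrated with a simple nearest-match lookup. This sketch deliberately substitutes a normalized Euclidean distance for the optimization, metamodeling, and data mining models described above, which are still under development. The record layout and all property values are placeholders.

```python
import math


def best_matches(samples, target, top_n=3):
    """Rank samples by closeness to the target properties.

    samples: list of dicts with a "properties" mapping (name -> number)
    target:  mapping of property name -> desired value
    Each property is scaled by its spread across the candidates so that
    quantities with different units contribute comparably.
    """
    keys = list(target)
    candidates = [s for s in samples if all(k in s["properties"] for k in keys)]
    if not candidates:
        return []

    spans = {}
    for key in keys:
        values = [s["properties"][key] for s in candidates]
        span = max(values) - min(values)
        spans[key] = span if span > 0 else 1.0  # avoid division by zero

    def distance(sample):
        return math.sqrt(sum(
            ((sample["properties"][k] - target[k]) / spans[k]) ** 2 for k in keys
        ))

    return sorted(candidates, key=distance)[:top_n]


# Placeholder records; the numbers are invented and carry no physical meaning.
SAMPLES = [
    {"id": "A", "properties": {"modulus_GPa": 2.0, "permittivity": 3.5}},
    {"id": "B", "properties": {"modulus_GPa": 3.1, "permittivity": 4.1}},
    {"id": "C", "properties": {"modulus_GPa": 5.0, "permittivity": 6.0}},
    {"id": "D", "properties": {"modulus_GPa": 3.0}},  # lacks permittivity, skipped
]
```

With a target of `{"modulus_GPa": 3.0, "permittivity": 4.0}`, the ranking is B, A, C. The processing and structure parameters stored with the top-ranked samples would then serve as starting points for a design, whereas a full inverse-design framework would interpolate between samples instead of only retrieving existing ones.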

IV. CONCLUSION
Research efforts on nanocomposites in the past decades have accumulated an enormous yet scattered population of data that can be potentially reused and analyzed to guide new material design. Given the existing resource and the unappreciated value behind the large amount of data, the material genome concept provides an alternative data-driven principle to the future of material science research. Similar to other classes of materials with established conventions of material data representation and systematic methodology, material genome analysis and prediction of polymer nanocomposites also require a valid, working framework that combines collected data from past research as well as advanced analysis and modeling techniques.
In this paper, we demonstrate the NanoMine data-centric platform for design and analysis of new functional nanocomposite materials. The three major components in this platform (database, analysis tools, and simulation modules) are separately outlined, with a sample design workflow that integrates the features from all three components. We have established a working template that stores processing-structure-property data of nanocomposites, curated a selected set of nanocomposite literature that meets specific criteria within one application domain, integrated the database with statistical analysis tools previously developed in-house, and demonstrated the linkage among the components that leads to prediction of macroscopic composite properties. Based on preliminary work with the development system, future efforts on NanoMine will focus on building a production web infrastructure that can benefit the research community. Along with continuously populating the database with literature and lab data, we will extend data curation to additional data sources, such as experimental characterization equipment, user-uploaded data files obtained through crowd-sourcing, and high-throughput computations. Additional design optimization workflows will also be integrated with the data resource, so that users can take advantage of the available tools for robust design of processing, composition, and expected performance of new materials. We encourage community input of datasets as well as analysis methodology to expand the prediction capability of the current system, and we envision NanoMine as a sustainable public infrastructure for research and development of new nanocomposite materials.