Image-driven discriminative and generative machine learning algorithms for establishing microstructure-processing relationships

We investigate methods of microstructure representation for the purpose of predicting processing condition from microstructure image data. A binary alloy (uranium-molybdenum) that is currently under development as a nuclear fuel was studied for the purpose of developing an improved machine learning approach to image recognition, characterization, and building predictive capabilities linking microstructure to processing conditions. Here, we test different microstructure representations and evaluate model performance based on the F1 score. A F1 score of 95.1% was achieved for distinguishing between micrographs corresponding to ten different thermo-mechanical material processing conditions. We find that our newly developed microstructure representation describes image data well, and the traditional approach of utilizing area fractions of different phases is insufficient for distinguishing between multiple classes using a relatively small, imbalanced original data set of 272 images. To explore the applicability of generative methods for supplementing such limited data sets, generative adversarial networks were trained to generate artificial microstructure images. Two different generative networks were trained and tested to assess performance. Challenges and best practices associated with applying machine learning to limited microstructure image data sets is also discussed. Our work has implications for quantitative microstructure analysis, and development of microstructure-processing relationships in limited data sets typical of metallurgical process design studies.


I. INTRODUCTION
Microstructure image data is rich in information regarding morphology and implied composition of constituent phases, and can provide unique insight into the pathways leading to microstructure formation, and mechanisms responsible for material behavior and performance.Thus, the analysis of micrographs (i.e.microstructure image data) is central to several materials science studies establishing processing-structureproperty relationships, and for the design of new material systems.Despite the ubiquity of micrographs in material science research, significant challenges exist related to consistent and accurate recognition and analysis of image data.Such challenges arise from the domain knowledge and skill required to obtain micrographs, the diverse types of image data possible (e.g.optical and electron microscopy), domain-specific challenges to image analysis techniques, and more.With the advancing application of artificial intelligence (AI) (i.e. machine learning) in a wide range of fields, we find the application of established AI methods to microstructure recognition and analysis opens up an opportunity for computationally-guided experiments, and objective, repeatable analysis of image data.To address the need for improved microstructure quantification via image-driven machine learning using small, imbalanced data sets, we investigate microstructure-processing relationships in a model binary uranium-molybdenum (U-Mo) alloy.The U-Mo system is of particular interest due to the alloy's applicability as a nuclear fuel for research reactors, a) Electronic mail: elizabeth.kautz@pnnl.govb) Electronic mail: lewisd2@rpi.eduand the need to understand microstructure-processing relationships for improved fabrication design and fuel qualification.
Uranium (U) alloyed with 10 weight percent (wt%) molybdenum (Mo), referred to here as U-10Mo, is currently under development as a new metallic nuclear fuel for application in research and radioisotope production facilities.U-10Mo is a candidate for low enriched U (LEU) fuel, designed to replace currently used highly enriched U (HEU) fuels with the aim of reducing proliferation and safety risks associated with HEU handling and operation [1][2][3][4][5] .A monolithic, platetype design for U-10Mo has been selected due to the high U densities achievable while meeting the low enrichment specification, where the fuel must have ≤ 20% 235 U relative to all U isotopes.In order to fabricate fuel plates to meet dimensional requirements, the U-10Mo alloy must be subjected to several thermo-mechanical processing steps, leading to microstructural evolution during fabrication (e.g.hot rolling, hot isostatic pressing).To design a fuel with microstructure that meets performance requirements, and to enable future materials processing design, the microstructure-processing relationship must be well-established.
The equilibrium phase of pure U at room temperature (α-U) has an orthorhombic crystal structure.α-U is known to experience non-uniform thermal expansion in a high temperature, irradiation environment, thus 10 wt% Mo is added to stabilize the high-temperature BCC γ-U phase at room temperature.During processing, U-10Mo is exposed to temperatures below 560-575°C during hot isostatic pressing (HIP).Below these temperatures, a eutectoid decomposition of the metastable γ-UMo matrix phase into α-U and γ (U 2 Mo) is expected, based on the equilibrium binary phase diagram 6 .Prior work has demonstrated that this eutectoid decomposition occurs via a discontinuous precipitation (DP) mechanism.The decomposition involves the γ-UMo matrix phase transforming to α-U and Mo-enriched γ-UMo products with lamellar morphology.Our previous work showed that this transformation was initiated primarily at grain boundaries, and interphase interfaces, where Si segregation was observed [7][8][9][10] .
Significant prior work has been performed using machine learning for a range of materials science applications [11][12][13][14][15][16][17][18][19][20] .A rapidly growing area in machine learning in materials science is in image data quantification.Previous studies demonstrated success of convolutional neural networks (CNNs) in microstructure recognition tasks without significant development time, and state-of-the-art performance for a wide range of microstructures (e.g.forged titanium, perlitic steel, metal powder, ceramics) [12][13][14][15][16]21,22 . Additonally, the application of machine learning methods to large image data sets, such as those available through ImageNet is routinely done 23,24 .The ImageNet database includes over 14 million natural images that can be used for training and testing of machine learning models.Application of machine learning methods to limited data sets is still a frontier in the machine learning community 17,21,[25][26][27] , and is of interest to materials science data analysis problems, where only limited, unbalanced, or historic data sets are available, and the cost/time associated with obtaining very large data sets is prohibitive.
The present study explores the applicability of imagedriven machine learning methods 13 to developing microstructure-processing relationships.Specifically, we seek to understand the role of several thermomechanical processing steps on the microstructure evolution observed in the U-10Mo system.An improved approach to determining microstructure-processing relationships is developed and presented here, involving feature extraction, segmentation, and classification using a random forest model.Microstructure image data is segmented to identify microstructural features of interest and quantify area fraction of these features, including the γ-UMo matrix, uranium carbide, and DP reaction transformation products.The application of generative adversarial networks (GANs) 28,29 is also discussed as an emerging method for microstructure image generation.Our work has broad implications for machine learning applications in microstructure image analysis, and the development of quantitative microstructure-processing relationships in a wide range of alloy systems.

A. Image Data
Image data used in this work is from two scanning electron microscopes (SEMs): a FEI Quanta dual beam Focused Ion Beam/Scanning Electron Microscope (FIB/SEM), and a JEOL JSM-7600F Field Emission SEM.The backscatter electron detector was used for improved atomic number (Z) contrast.Two different microscope operators took the images.Thus, the image data analyzed here was diverse in terms of resolution, contrast, focus, and magnifications selected.The idea in using a variety of images taken by different operators using different microscopes (but all of the same samples) was to develop a more robust model that can distinguish between different material processing conditions.
All images used in this work are of a depleted U-10Mo alloy fabricated and prepared according to details presented elsewhere 30,31 .Images were taken over a range of magnifications from 250x to 500x.Image data was labeled based on processing condition, detailed in Figure 1.Ten different image classes were studied, where each class corresponds to a different processing history that generates a unique microstructure.The processing conditions detailed in Figure 1 include two different homogenization annealing treatments (900°C-48hr and 1000°C-16hr) and several thermo-mechanical processing steps such as cold and hot rolling.Each image class is therefore labeled by the homogenization treatment (HT) and processing condition (C) numbers, where 900°C-48hr is referred to as HT1, and 1000°C-16hr is HT2.Processing conditions are indicated by C followed by the number in the list of all possible conditions in Figure 1.For example, HT1-C1 is a U-10Mo sample that is in the as-cast and homogenized condition, where homogenization was done at 900°C for 48 hours.
Original images vary in size.Different image sizes used in this work (in pixels) include: 2048 by 2560, 1448 by 2048, 1428 by 2048, and 1024 by 1280.All images included a scale bar region which was removed prior to training and testing by cropping the image.The data set analyzed here consists of a total of 272 original images from 10 classes.Bilateral filters were applied to each image for noise removal while keeping edges sharp.In our study, we chose the diameter of each pixel neighborhood as 15 while keeping all other default parameters.

Feature Extraction
In order to determine how to best quantify microstructure image data on the U-10Mo system (as a function of thermomechanical processing parameters), different methods of feature extraction were developed and tested.Here, each microstructure image is described by a feature vector, and how that feature vector is derived either depends on area fraction of different regions, or spatial relationships between microstructural features of interest.These two types of features are referred to here as area and spatial features, respectively.These two different feature types are extracted from each image after segmentation.Area features are simply the area fractions of each phase or region (γ-UMo matrix, UC, and lamellar transformation products).U-10Mo microstructures have been described by the area fractions of these regions in prior work. 9,30,32Spatial features are computed by first measuring the following for each region (matrix, carbide, lamellar transformation products): the x and y coordinate of the centroid, area (in square pixels), and the ratio of area of the region to the area of its bounding box.The spatial feature is simply a  concatenation the following measures: the number of regions, the mean and standard deviation of the areas, the standard deviation of the centroid coordinates, and the mean and standard deviation of area ratios.

C. Approach and Machine Learning Model
All experimentation was carried out with Python version 3.6.9with the help of various open-source libraries.The opencv, scipy, skimage, numpy, and sklearn packages (compatible with Python version 3.6.9)were used for training, testing, and validation.All relevant model parameters are summarized in Table I.
The approach to image recognition and characterization de-veloped here is schematically described in Figure 3(a), and involves the following steps: (1) image segmentation, (2) extracting interpretable features from the image data, and (3) classifying microstructures from different classes based on extracted features.
The image segmentation algorithm used here is based on our prior work where k-means was applied to classify image pixels based on the grayscale values 32 .This method is built upon the assumption that pixels correspond to different grayscale values and the differences between clusters are significant.However, in our data set not every image comes with three different phases (dictated by processing condition), which leaves us with the question whether k is 2 or 3 for each image.While there are some well-known methods for choosing k, such as the elbow method and the silhouette method, they do not work well in our experiments.A reason for why these methods do not work here is that the grayscale shades are typically spread out evenly on a γ-compressed nonlinear scale, which means there are insignificant differences in grayscale values, even though they are noticeable to the human eye.Thus, for the image data used here, a specific k value needs to be hard coded for each class.However, this hard coding requires ground truth knowledge about the image processing condition (i.e.class label).
To overcome the limitations associated with applying kmeans clustering to our image data, we developed a two-stage segmentation method which combines the k-means clustering and the image morphology.This approach is schematically  described in Figure 3 (b).In the first stage, we apply k-means clustering based on the grayscale values of each pixel with k = 2.The purpose of this step is to segment the γ-UMo matrix phase from the rest of the image.In the second stage, we apply morphological opening and closing (i.e.dilation and erosion) 34 , to remove the fine-scale lamellae in the transformed region (so that this region is considered as a single grayscale value), and smooth the border of UC inclusions.These morphological operations aide in improving segmentation results via k-means.It is noted that for the purpose of this work, it is desirable that transformation products that appear as fine lamellae are treated as one region, where distinguishing between lamellae is not needed.

III. RESULTS AND DISCUSSION
A. Developing microstructure-processing relationships using discriminative learning methods Developing understanding of microstructure-processing relationships and improving predictive capability becomes more difficult as processing complexity increases.In our case study on the U-10Mo system, several steps are performed during fuel fabrication, and the ability to recognize what processing parameters lead to a given microstructure can allow for improved process design and quality control.However, the question of how to quantitatively describe microstructure image data in order to predict processing condition from microstructure images remains unanswered.Significant prior work has been performed in which area fraction of different phases with varying gray scale serves as a proxy for volume fraction, and is thus used as the primary quantitative microstruc-ture descriptor 9,30,32 .Yet, the choice of area fraction may not be the best metric when several phenomena are changing with varying processing condition, such as extent of phase transformations, distribution or fragmentation of inclusions, and change in grain size and morphology.To measure how accurately different features can represent microstructures, we use features (area and spatial, described above) as inputs to train a Random Forest model to predict the corresponding processing history.Images are segmented and area and spatial features are extracted.In addition, we collect other texture features, such as the Haralick features and the local binary patterns (LBP), which have previously been demonstrated to represent microstructure image data well. 13,32The following four experiments were considered to explore metrics of microstructure representation: 1. Characterization of micrographs using area features only.For the two tasks below, we train two separate Random Forest models for classification and 5-fold cross-validation is applied to evaluate the model performance: (a) A 10-class classification to predict microstructure processing history (HT1-C1, HT1-C2, HT2-C1, etc.) (b) A binary classification to predict the homogenization temperatures (900C-48hr or 1000C-16hr) 2. Characterization of micrographs based on area, and spatial features, in an effort to increase predictive power of our model.Similar to Experiment 1 (above), we train two Random Forest models for the two tasks listed in item 1a-b, above.
3. Characterization of microgrpahs using area, spatial, and texture features.All features are concatenated as a single feature vector to represent a microstructure image.
4. Binary classification for each possible pair of processing histories (HT1-C1, HT1-C2, HT2-C1, etc.) based on area features only.This experiment provides a detailed investigation of how well area features represent micrographs.
Training results from these three experiments are summarized in Table II.The model performance are measured in F1 score, defined as follows:  features available (both interpretable area and spatial features, and texture features), and reach an F1 score of 95.1% for the 10-class classification task.This result serves as a benchmark for this data set and allows us to evaluate the predictive power of other models.While the area features have long been regarded as a strong indication of the microstructure processing history, from the microstructure representation experiments detailed here, we find that the predictive power of area features is actually very limited.Based on a trained Random Forest model from Experiment 3, the feature importance of the area feature corresponding to UC is 0.09, which is higher than the other 40 features.However, there are many other features from spatial and texture features with a feature importance of approximately 0.06.This conclusion can also be verified in Experiment 4, where we find that for binary classifications between two specific processing histories, the area features do not always result in high F1 scores.This finding is highlighted by very poor classification performance listed in the matrix, for example a F1 score of 61% for the following two conditions which were both homogenized at 900C-48hr: HT1-C4 (cold rolled to 0.025 inches) and HT1-C6 (cold rolled to 0.008 inches and annealed at 700C-1hr).
B. Synthetic microstructure generation using Generative Adversarial Networks Generative adversarial networks (GANs) have been proven successful for many image synthesis and unsupervised learning tasks 29 .It is a popular framework for representation learning, such as disentangle pose from lighting in 3D rendered images 35 , and image completion, where large missing regions are synthesized utilizing the surrounding image features 36 .Variants of GANs have surpassed many other generative models in the quality of samples as well as their underlying representation.Recently, GANs have emerged as a promising methodology for application in computational materials design, for the purpose of developing structure-property and structure-performance relations via physical simulations. 37,38ANs are implemented by deep neural networks, and thus are able to capture complex microstructural characteristics.Hence, we investigate different GAN architectures here for the specific material system of U-10Mo, and the task of generating realistic artificial micrographs that could be useful in supplementing real data sets or used in an effort to predict microstructure from processing parameters.
A GAN framework consists of a generator, G, that generates samples from a noise variable, z, and a discriminator,D, that aims to distinguish between samples from the real data distribution and those from the synthetic data distribution (from the generator).The training of a GAN can be summarized as a two-player minimax game: where p data is the underlying distribution of real images and p z is some noise distribution.
Although the objective of the training is straightforward, the actual training can be quite unstable because of the nonconvex cost functions and the high-dimensional parameter space. 39In practice, the model could encounter many problems, such as vanishing gradients, where the discriminator gets too good and the generator fails to make progress, and the mode collapses (i.e. the generator collapses to a state where it always outputs the same sample).Especially in cases where we want the GAN to learn a disentangled representation of the training data and output high-quality samples, the training can be extremely hard to converge.
In this work, we make use of multiple variants of GANs to generate artificial microstructure images and demonstrate how GANs can be used to synthesize realistic images with varying resolution.In this small case study, the same set of original SEM-BSE U-10Mo micrographs described previously are used as the training set.Images are cropped into 1024 by 1024 and resized to 512 by 512 square pixels.We choose 512 by 512 as the size of training images and output samples for several reasons, including: 1. High-resolution images are needed for characterization.
Phases such as the lamellar transformation products may not be represented well if images are too low in resolution.
2. The microstructural area contained within the image should be large enough to reflect the processing history.This would help to keep the variance of the training images small enough so that the GAN synthesized images represent the different classess well.
3. The most recent GAN technology is capable of higher resolution images up to 1024 squared and in this study we wished to explore this higher resolution capability.
Two different GAN architectures were trained and compared on the basis of artificial image quality: (1) a progressively growing GAN 40 , where samples are generated from latent noise in original micrographs, and (2) a Pix2Pix Generative Model 41 , where a segmentation label map is provided as input.

Progressive Growing GAN
Progressively growing GAN (pg-GAN) is an adversarial network variant that helps to stabilize the training of a highresolution GAN.The generator is initialized with low resolution images, over which new layers are added progressively to capture finer spatial details.Each of the new layers is treated as a residual block that smoothly blends into the network when the resolution of the GAN is doubled.
Data augmentation is applied to the original set of 272 SEM-BSE images in order to increase number of images available for training.The data augmentation utilized here involves cropping original images into smaller squares with a horizontal shift, rotation by 90 degrees and horizontal/vertical flipping.Finally, images are resized to 512 by 512 square pixels.Using these methods a total of 10,880 images were available for training.
We follow the model specification from the original paper 40 with Python 3.6.9and TensorFlow-GPU 1.13.1.We use the Adam optimizer with the default learning rate scheduling algorithm.Training images and output samples are both 512 by 512 square pixels and the training length is set to 1,000,000 images.To measure model performance, we sampled 1,000 images generated by the model, with examples given in Figure 7.The images sampled from the generated set are qualitatively close to the real data distribution although some images contain visual artifacts, such as image (f) in Figure 7 (the boundary between the microstructure and the background should be a straight line).
Lastly, the approximated distribution is entangled, meaning that the image data are encoded in a complicated manner and the input noise variables are not interpretable (see Figure 6).Although the synthetic images are visually nice, we cannot interpret the role of the input noise vectors in the generation of synthetic images.This blocks us from understanding how samples from different processing histories are distributed in the learnt space of microstructure images or possibly revealing  their underlying connections.Additionally, from visual examination of the Figure 7 artificial images, we find some spatial patterns that are not visible in training images.Visually, we see the texture of artificial images is different, and such texture anomalies are discussed in further detail in Section III B 4.
To better assess the GAN results, we turn to automated methods such as the sliced Wasserstein distance (SWD) 42 to measure how similar artificial images are to the training set over different scales.We measure the SWD with the checkpoint images during training and plot the distances with respect to the number of training images fed into the model, with results given in Figure 8.We find that the SWD at different resolutions generally decreases as the training proceeds.The model converges after approximately 8,000 thousand training images.Even with limited micrographs for training, the progressive growing GAN can still learn the real data distribution well, as demonstrated by the realistic synthetic images shown in Figure 7(a)-(f).

Pix2Pix Generative Model
Unlike the Progressive Growing GAN, Pix2Pix GAN is a conditional GAN 43 variant that learns a mapping from some extra given information y ("condition") and input noise z to output images.The objective of a conditional GAN is given by: min The Pix2Pix model is widely used for image-to-image translation, such as image style transfer, labels to Facade, and edges to photo.In this work, we use the labels generated from the segmentation algorithm detailed above as style A and realistic microstructure images as style B. Overall, the Pix2Pix generative model takes a labeled image as input and generates a realistic microstructure image, as shown in Figure 9.We note here that the segmented image given as the input includes some noise (due to charging from the sample in the SEM) that was segmented as a separate phase.Although from a segmentation point of view, this is not an accurate representation of the microstructure (i.e. the noise is not an important microstructure feature of the image), it does mean that the charging artefact in the original image is accurately captured in the synthetic image, thereby making the synthetic image realistic when compared to the original image data.The same set of 272 original images is used, after removing the scale bar and cropping each original image into squares.These images are then used for training, and are referred to as real B, which serve as the ground-truth images (style B).Prior to training, real image labels are prepared, where the real images are referred to as real A (the model input).Image segmentation, such as the algorithm suggested in 32 or the one described here in Section II B 1, can be applied so that for each image in real B, we have a label image in real A.
We use the default model specification described in the Pix2Pix model paper 41 .After the training is finished, 50 synthetic images are randomly sampled and some of them are displayed in Figure 10.From the sampled images, we can tell that the synthetic images generated by the Pix2Pix model are visually close to the ground-truth.With sufficient information from the label images, they are more realistic than those sampled from the Progressive Growing GAN.While spatial patterns are not visible in these sampled images, we apply the same measurements as we did on the Progressive Growing GAN as a comparison, which can be found in Section III B 4.  While the outputs from the Pix2Pix model are visually more realistic, they require additional information as inputs.They ignore the distribution of microstructures in the image and focus on the learning and simulating the textures in real images.Training a high-resolution GAN with interpretable conditions remains a challenge.For future studies, training a high-resolution GAN could be split into two steps in which we first focus on synthesizing the underlying representation of the microstructures (such as the label image), and then in a second step, adding texture to the image.

Analysis of Microstructure Distribution Learnt by the pg-GAN
In this section, we measure the differences between the microstructure represented by the real images and synthetic images.We apply the characterization pipeline introduced in Section II C. Considering the visual differences (image "sharpness" and local patterns) between real images and synthetic images, and that original images are resized to a smaller size before being used for GAN training, we use a slightly different parameter setting (given in Table III) from the previously described to ensure the quality of image segmentation.
We generate 300 synthetic images by random sampling from the trained pg-GAN model.Synthetic images are seg-  11, along with the predicted processing histories.It can be found that the area features collected from real images and area features collected from synthetic images come from similar distributions.
To quantitatively measure whether the synthetic image are from the same distribution as the real images, or how well the pg-GAN has learnt to represent the microstructure, we carry out the two experiments below.
From the previous experiments, we have collected area features, spatial features, and texture features from the 272 original images and the 300 synthetic images.Two models are trained to classify whether a specific features is collected from a real image or a synthetic image.The first model uses the area features as the only input, while the second model takes all the features (area, spatial, texture) as the input.Both models use the Gaussian Process Classifier (GPC) to learn two distributions for the features from real images and synthetic images.With 5-fold cross-validation, the accuracies for the two models are 52.6% and 52.4% respectively.The fact that both models fail to distinguish between features collected from real images and synthetic images supports our assumption that the pg-GAN managed to learn the underlying distribution of microstructure well.
Here, we consider the area features collected from the 272 original images and the 300 synthetic images.For each pair of processing histories (e.g.HT1-C1 and HT1-C2), we train a binary classification model to classify between the area features collected from real HT1-C1 images and the area features collected from synthetic HT1-C2 images.Again, the Gaussian Process Classifier (GPC) is used as the model classifier and the classification results are reported in Figure 12 as F1 scores with 5-fold cross-validation.It is noted here that since there are no synthetic images predicted as HT2-C1 in the 300 synthetic images, the column corresponding to HT2-C1 is left empty in the matrix.
It can be found that on the main diagonal matrix, where the real area features and synthetic area features are from the same processing histories, the F1 scores are generally low.This finding is consistent with our assumption that the pg-GAN managed to learn the underlying distribution of microstruc-ture well.Also, the classification performance in other cells off the main diagonal are relatively high with a few exceptions.For those off-diagonal cells with low F1 scores, such as the classification between HT2-C3 and HT1-C3, they are quite consistent with the classification results reported in Figure 5.The poor classification performance for this particular example of HT2-C3 and HT1-C3 may be attributed to the very similar microstructure generated after few processing steps, where the only difference between these two conditions is the homogenization temperature and time.
By evaluating classification performance between features from real and synthetic images, and between area features for different processing histories, we can compare the area features, spatial features, and texture features collected from real images and synthetic images.In previous experiments, we have shown that these features are quite good at characterizing the microstructure.Especially for the Random Forest model that takes all features as the input, processing history of microstucture can be predicted with an F1 score of 95.1%.However, these features are not explicitly considered during the GAN training.From the fact that features collected from real or synthetic images are not distinguishable indicates that the pg-GAN model managed to learn the underlying distribution of microstructure well.Although the results from binary classification are highly based on the processing histories that are predicted by a Random Forest model, and not originally given by the pg-GAN, the results can serve as a comparison with the experiment results in Section III and suggest the underlying representation learnt by the pg-GAN is similar to the microstructure representation given by the real images.

Texture Differences in Real Versus Synthetic Micrographs
Sub-regions of images from real microstructures and images generated by the GAN were subjected to the Discrete Fourier Transform (DFT) operation.This was performed with the objective of studying the spatial patterns exhibited by these two classes of images.
DFT samples a discrete set of frequencies corresponding to the size of the image in the spatial domain.In the Fourier domain, the intensity at frequency point (k,l) is calculated by: where f(i,j) is the pixel intensity at position (i,j) in the real space.Images exhibiting geometric or spatial patterns tend to amplify specific frequencies in the Fourier domain, and hence the Fourier transform of an image can be used to highlight spatial patterns present in the image.For example, when an image exhibits horizontal patterns seperated by a pixel distance of WIDTH/2, its Fourier Transform exhibits a local maxima at (k,l) = (2,0).
In this work, DFT was implemented in Python using the OpenCV 44 and Numpy 45 libraries.The 'dft' method in OpenCV was used to perform the transform, and the 'fftshift'   Figure 13 shows a characteristic example of an image of a real microstructure and the Fourier transforms performed on the different sub-regions of the image.The sub-regions selected for this analysis are all 200px × 200px in size.It can be observed that the horizontal texture manifests itself in thin vertical frequency lines on the transforms.
Figure 14 shows a characteristic example of a synthetic image generated from an adversarial network and the Fourier transforms performed on the different sub-regions of the image.It can be seen that the synthetic images are characterized by strong patterns, which manifest as local maximas at several frequency points on the transformed images.The Fourier transforms of a synthetic image exhibit more local maximas than the corresponding transforms of a typical image of a real microstructure.
In order to quantitatively analyze how the magnitudes of Discrete Fourier Transform can be used to characterize the texture in real and synthetic images, as well as microstructure from different processing histories.We collect the magnitudes of the transform as a feature vector, referred to as DFT features.More specifically, after the DFT process, we take the magnitudes from the 9x9 region in the center and flatten them into a 1-dimensional feature vector.We collect the DFT features from the 272 real images and the 300 synthetic images, which are prepared in Section III B 3. Two experiments are carried out: 1. Using the DFT features only, we train a Gaussian Process Classifier model to predict whether a DFT feature vector is collected from real microstructure images or synthetic microstructure images.With 5-fold crossvalidation, the accuracy of the model is 99.4%.
2. For each pair of processing histories (e.g.HT1-C1 and HT1-C2), we train a Gaussian Process Classifier model to predict if a DFT feature vector is collected from real HT1-C1 images and real HT1-C2 images.Similar experiments are conducted between real HT1-C1 images and synthetic HT1-C2 images, and between synthetic HT1-C1 images and synthetic HT1-C2 images.Model performance are measured in F1 scores, as reported in Figure 15.
It can be concluded that as a feature vector, the DFT magnitudes can discriminate between real and synthetic images well, but fail to characterize microstructure from different processing histories.Considering the synthetic images generated from the pg-GAN model, the DFT features are good indicators that the synthetic images are not perfect.
It should be noted that the synthetic image used for the analysis above was generated after the adversarial network had reached a steady state (i.e., at a point when the network can no longer distinguish between a real and a synthetic image).Thus, it is reasonable to conclude that though the images generated by GANs exhibit spatial patterns that are not present in the training images, the presence of these spurious patterns is not significant enough to bias the discriminator.The likelihood of the occurrence of such patterns must be taken into consideration for classification problems such as texture detection in a microstructure dataset.

IV. CHALLENGES AND BEST PRACTICES
Several challenges associated with recognition and quantification of microstructure image data exist, due to various limitations inherent in materials science studies.The two major challenges are limited size of datasets and imbalanced datasets.Many machine learning (particularly deep learning) models require large datasets for training.A deep learning model generalizes to test datasets better as the size of training set is increased.In a typical environment for microstructure imaging, generating high quality micrographs in large numbers is dependent upon metallographic sample preparation, microscope operator skill, facilities, and time.For objectives such as multi-class classification, training from real data sets could lead to a situation where few classes have a dispropor- tionate number of images, highlighted in prior work 22 , and in the U-10Mo data set studied here.Imbalanced datasets can result in a reduced classification accuracy for the class with the disproportionately lower number of images, even if the overall accuracy is within acceptable tolerance.With respect to the case study of GAN seen earlier in the study, an imbalanced dataset may also result in a reduced significance for the spatial patterns present in the disproportionate class among the patterns exhibited by the synthetically generated images.Given these challenges, the authors suggest several best practices, in addition to the current work, to produce meaningful results from an image driven machine learning approach to microstructure recognition and quantification: 1. Shallow learning and conventional learning techniques: Convolutional Neural Networks and other algorithms such as SVMs or ensemble classifiers are capable of performance within an acceptable tolerance for simple classification problems.
2. Semantic Segmentation: DeCost et al 14 demonstrated the use of the semantic segmentation algorithm to isolate objects in the microstructure dataset 3. Serialization of techniques through a task pipeline: Prior work has demonstrated a method for quantitative feature extraction by automating the algorithm selection process through a task pipeline. 22 Dataset augmentation: The user may artificially populate the training dataset by methods such as cropping, rotating, and adding uncorrelated noise to the original images in order to ensure that the training process generalizes reasonably well to test datasets.
5. Automated algorithm selection and hyperparameter optimization in machine learning: Given the size and scale of datasets in material science, automatic methods for selection of algorithms and hyperparameters [46][47][48][49][50][51] can prove to be a viable option for fine tuning the parameters of algorithms and improve generalization performance as much as possible given the scarcity of data to learn from.

V. CONCLUSIONS
In this work, we perform multi-class classification for the purpose of linking microstructure to processing condition.The original data set consists of micrographs for ten different thermo-mechanical processing conditions of a U-10Mo alloy.We evaluate the classification model performance for different microstructure reperesentations, and results reveal that area, spatial, and texture information are needed for accurately describing image data.Using this newly developed microstructure representation, an F1 score of 95.1% was achieved for distinguishing between micrographs corresponding to ten different thermo-mechanical material processing conditions.Generative adversarial networks were also explored to better understand if synthetic image data could be used to supplement small, imbalanced original image data sets.Two different networks were trained and tested to assess performance: Progressive Growing GAN and Pix2Pix GAN.We find that the Progressive Growing GAN introduces spatial patterns that are not present in original image data.Texture detection in a microstructure dataset, might be adversely affected by the presence of such spurious patterns.Our work highlights that semantically meaningful segmentation alone may be insufficient in representing image data, particularly as the complexity of material processing and resultant microstructure increase.Hence, the need for predictive or generative methods is a frontier in materials science and engineering, and should be leveraged in future studies to accelerate the materials design and characterization process.

Figure 1 .
Figure 1.(a) Schematic of U-10Mo fuel fabrication, where the steps shown in the bracket were used to generate microstructure imaged and analyzed in this work.(b) Sample matrix of sample processing conditions analyzed.For each condition, SEM micrographs were analyzed.Two different homogenization temperatures were utilized, and thus samples are grouped based on the homogenization treatment performed.Sample conditions are indicated by homogenization treatment (HT) number, where HT-1 refers to 900°C-48hr, and HT-2 refers to 1000°C-16hr.The processing condition (C) the micrographs represent are indicated by C followed by a number that corresponds to the number processing condition (e.g.C1 is a sample in the as-cast condition).

Figure 2 .
Figure 2. Example micrographs from different classes used and micrographs from selected class labels used in this work.A total of ten classes were assigned as part of this effort.In the area of classification problems this is a large number of class labels and furthermore the availability of microstructures and the subtle differences between structures further complicates the problem of successful classification.

Figure 3 .
Figure 3. (a) Schematic describing the approach developed for the 10 class classification experiment performed in this work.A two-stage segmentation approach was used, and is schematically described in (b).The two stage segmentation is described as follows: (1) stage 1 involved the use of multiple features, and (2) stage 2 utilized areabased features.A random forest model was then used for classification and five-fold cross validation was used to validate results.

Figure 4 .
Figure 4. Visualization of area feature for each phase.

Figure 5 .
Figure 5. Model performance of binary classification for each pair of processing histories for the U-10Mo microstructure.Here, the F1 scores are reported on a 0 to 1 scale, where 1 indicates 100% correct predictions.

Figure 6 .
Figure 6.Schematic of an entangled representation (left) and disentangled representation (right).In the entangled representation, data are encoded in a complicated manner.In the disentangled representation, the independent variables are interpretable.

Figure 7 .
Figure 7. Example synthetic images generated by the trained Progressive Growing GAN.Images given in (a)-(f) show varying microstructural features, specifically different extent of lamellar transformation products, and distribution of carbides.The micrograph in (f) shows the edge of a fuel plate.

Figure 8 .
Figure 8.We use the sliced Wasserstein distance (SWD) between the training images and generated images to evaluate the model performance.This figure shows the SWD under different resolutions as the training progress.

Figure 9 .
Figure 9. Pix2Pix Generative Model takes a label image as input and outputs a synthetic microstructure image.In this example, the Real A image is generated from a sample with the HT1-C1 processing history.

Figure 10 .
Figure 10.Randomly sampled images from the trained Pix2Pix model.Each row is a different sample and from left to right are the images from real A, fake B (model output), real B (ground-truth).

Figure 11 .
Figure 11.Comparison between area features from real images and synthetic images.The processing histories of the synthetic images are predicted by a Random Forest model trained on features from real images.

Figure 12 .
Figure 12. Results of binary classification between area features collected from real class A images and area features collected from synthetic class B images.Model performance are measure in F1 scores with 5-fold cross-validation.Notice that there are no synthetic images predicted as HT2-C1 in the 300 synthetic images.
method in Numpy was used to shift the zero-frequency (DC frequency) to the center of the transform.The insets on Figures 13 and 14 are magnitudes of the transforms that have been subjected to a log filter, in order to visualize the local maximas effectively.

Figure 13 .
Figure 13.(a.)A characteristic image of a real microstructure, showing visible texture in the horizontal direction.(b.) Fourier Transforms of different sub-regions of size 200 x 200, superimposed upon the original image at the corresponding region.The background pattern is reflected in the thin vertical lines on the transform plots.

Figure 14 .
Figure 14.(a.)A characteristic image of a microstructure generated by pg-GAN.(b.) Fourier Transforms of different sub-regions of size 200x200, superimposed upon the original image at the corresponding region.Though these synthetic images show strong background features, the presence of these features does not bias the discriminator.

Figure 15 .
Figure 15.Binary classification between microstructure images from different processing histories based on DFT features.The model performance are measured in F1 scores.From left to right, the matrices measure the model performance between: (a) real images and real images, (b) real images and synthetic images, and (c) synthetic images and synthetic images.

Table I .
Parameters selected for model specification, compilation, and cross validation for U-10Mo image analysis.

Table II .
Summary of experiments, features used to represent microstructure image data, metric, and performance.
features to help improve model classification results.The performance of models is improved significantly to 78.9% and 65.1% for Experiment 2a and 2b, respectively.This increased performance indicates that spatial features are correlated with the processing histories.In Experiment 3, we used all the

Table III .
Parameters selected for microstructure characterization of synthetic images.