A data-driven approach to construct a quantitative relationship between microstructural features of fatigue cracks and contact acoustic nonlinearity

This study demonstrates the feasibility of a data-driven approach to construct a quantitative relationship between nonlinear acoustic parameters and microstructural features of contact interfaces. The near-surface nonlinearity is measured using dynamic acousto-elastic testing (DAET) with a surface wave probe, while the microstructural features are extracted from scanning electron microscopy (SEM) images of fatigue cracks. Four aluminum alloy samples, each with a fatigue crack, are prepared. Six local nonlinearity parameters are measured at different locations along the crack propagation direction. A total of 40 local measurements are acquired. A principal component analysis (PCA) reveals that all six nonlinearity parameters are correlated and hence can be replaced by one principal component (PC). Fifteen crack micro-geometrical features at each measurement point are extracted from the SEM images. Regression analysis is used to relate the PC of the nonlinearity parameters to the microstructural features at the crack interface. We compare three regression models that take variable selection into account: stepwise multiple linear regression (MLR), stepwise principal component regression (PCR), and least absolute shrinkage and selection operator (LASSO). Despite having different principles, the three predictive models identify two features as the most significant in predicting the interface nonlinearity: the crack aperture (opening) distribution and the distance to the crack tip. The differences between the three models and the physical interpretation of the data-driven predictions are discussed.


I. INTRODUCTION
Nonlinear acoustic/ultrasonic techniques for non-destructive evaluation (NDE) based on the contact acoustic nonlinearity (CAN) 1 have shown great potential in detecting defects in the form of tight contact interfaces such as closed cracks, 2-4 imperfectly bonded interfaces, 5,6 and loosening bolted joints, 7 which are difficult or impossible to detect using linear acoustic/ultrasonic techniques. However, the qualitative nature of the diagnostics and the uncertainties in the interpretation of the results are impediments to a widespread adoption of these techniques. For example, some studies have reported that the nonlinearity parameter increases monotonically with crack density, 8,9 but in other studies, the nonlinearity parameter reaches a plateau, 10 after which it may even decrease. 11,12 One possible explanation is the changing micro-geometry of cracks over the course of damage. 13 Previous studies have shown that crack aperture and interface roughness have a significant influence on the magnitudes of nonlinearity parameters (NPs) obtained from higher harmonic generation tests 3,14 and dynamic acousto-elastic testing (DAET). 2 Understanding the relationship between geometric features of cracks or contact interfaces and CAN is necessary for making sense of the nonlinear acoustic testing results when these techniques are used to detect "invisible" cracks. Moreover, this relationship is essential when the nonlinear acoustic signatures are used to characterize the contact interface. Beyond NDE, this knowledge has important implications in other domains such as geomechanics and geophysics, where the properties of fractured rocks are of interest. 15
Developing cross-scale physical models to describe the relation between the interface micro-geometric features and nonlinear acoustic signatures is a challenging task. Most existing analytical models are based on the principles of contact mechanics, i.e., Greenwood and Williamson (GW) theory. 16 Within the GW framework, the rough contact can be modeled as a spring with an equivalent contact stiffness. The stiffness-separation relationship is obtained from the distribution of asperity heights, where each asperity summit is assumed to be spherical and not interacting with neighboring asperities. The resulting nonlinear stiffness-separation relationship is then used to predict the amplitude of generated higher harmonics. 17,18 Although good qualitative agreement with highly idealized experimental results is reported, 17,18 these models are not validated on real cracks or fractures. Possible reasons are the unavailability of data and difficulties in incorporating real micro-geometric features into these models. In addition, some of the assumptions used in the analytical models are not realistic, which may compromise the accuracy of model predictions. For example, GW theory assumes that: 1) asperities are far apart and do not interact with each other and 2) the deformation occurs only at asperities; bulk deformation is not taken into account. Numerical simulations such as finite element methods provide an alternative way to predict nonlinearity at contact interfaces. 19-26 However, the numerical models do not include the roughness or other interfacial microstructural features because of the prohibitively high computational cost associated with multi-scale simulations.
Considering the complexity and limitations of physical models, in this study, we investigate the feasibility of using data-driven modeling to relate NPs and microstructural features of real fatigue cracks. To the best of our knowledge, this is the first study to use a data-driven approach to illuminate the physical mechanisms behind CAN. Fifteen geometric features of fatigue cracks on four aluminum samples are extracted from scanning electron microscopy (SEM) images. The NPs are measured using DAET at different locations along each crack. A total of 40 local DAET measurements are obtained. We compare three regression models for correlating crack features and NPs: stepwise multiple linear regression (MLR), stepwise principal component regression (PCR), and least absolute shrinkage and selection operator (LASSO). The three methods have different working principles. The stepwise MLR allows selection of the optimal crack features as predictor variables. The PCR is used to accommodate the potential correlation among predictor variables (multicollinearity), which may lead to instability of the predictor coefficient estimates. 27 We implement a stepwise PCR to choose the optimal principal components (PCs) for regression. 28 Finally, we use a regularized regression technique, LASSO, to perform variable selection in order to prevent over-fitting. The three regression models are compared and discussed in light of the available physical models.
The paper is organized as follows. Section II describes the test samples and experimental protocol. In Section III, we present the data analysis methods, including image processing steps and regression analyses. The results and discussion are given in Section IV followed by Summary and Conclusions in Section V.

A. Sample preparation
The data pertaining to four A7075 aluminum samples (30 × 40 × 170 mm³) are presented in this study: Samples #1 through #4. A 60-degree, 3 mm triangular notch was machined at the midlength (85 mm) of each sample in preparation for a three-point bending fatigue test. During the fatigue loading, a fatigue crack initiated from the notch. The crack propagated up to about halfway through the sample width, but with a different crack growth rate for each sample, controlled by the choice of stress intensity factors Kmin and Kmax. For one of the samples (Sample #4), the crack was induced progressively in 5 steps; the fatigue loading was interrupted when the crack length reached ∼6, 9, 11, 14, and 17 mm. All samples have similar alloy compositions and mechanical properties (A7075-T7351) except for one sample (Sample #2), which has undergone a different age-hardening process: A7075-T6. 2,29-31 Table I lists the details of these four test samples.

B. Dynamic acousto-elastic testing (DAET) with a Rayleigh wave probe
DAET 31-37 is a pump-probe scheme that uses the coupling between a low frequency (LF) pump and a high frequency (HF) probe. An LF source (pump) makes the test sample vibrate at a moderate strain level (10⁻⁶ to 10⁻⁵) while a pair of HF ultrasound transducers (probe) simultaneously probe the pump-induced changes in ultrasound pulse velocity and attenuation. The strain dependencies of wave velocity and attenuation are analyzed to obtain the NPs. In this study, we use a piezoceramic disk to make the specimen resonate at its compressional mode (∼7.3 kHz) to promote opening and closing ("clapping") motion at the crack interface, and a pair of Rayleigh surface wave transducers to probe (2 MHz) the near-surface nonlinearity at several locations along each fatigue crack. 2 This test setup allows a direct measurement of CAN at crack interfaces. In addition, this choice of probe provides an unprecedented opportunity to directly relate the extracted NPs and the micro-geometrical features of the fatigue cracks on the surface captured in SEM images. Fig. 1(a) shows the schematic test setup. The HF probes scan along the crack propagation direction to obtain the local NPs at several locations along each crack (Fig. 1(b)). Collectively, a total of 40 local nonlinear measurements are acquired on both sides (Side A and Side B) of all four samples. More details on the experimental setup and procedure are provided in Jin et al. 1 We extract the NPs by determining the strain-induced changes in HF pulse velocity:
$$\frac{\Delta c}{c_{ref}} = O + \beta\,\varepsilon + \delta\,\varepsilon^{2},$$
where ε is the LF strain, c_ref is the reference wave velocity at the unperturbed state, O is the offset that describes the conditioning effects, 39 while β and δ are the quadratic and cubic nonlinearity parameters, respectively. Similarly, the nonlinear attenuation described by the relative transmission loss TL can be expressed as:
$$TL = O_{TL} + \beta_{TL}\,\varepsilon + \delta_{TL}\,\varepsilon^{2},$$
where O_TL, β_TL, and δ_TL are the corresponding NPs. The schematic representations of the extracted NPs are shown in Fig. 1.
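The extraction of the offset, slope, and curvature parameters can be illustrated with a short least-squares sketch. It assumes the common DAET form in which the relative velocity change is a second-order polynomial in the LF strain; the function names, the strain rescaling step, and the synthetic values are illustrative assumptions and are not taken from the study.

```python
import math


def solve_linear(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [u - factor * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_nonlinearity(strain, dc_over_c):
    """Least-squares fit of dc/c_ref = O + beta*eps + delta*eps**2."""
    scale = max(abs(e) for e in strain)  # rescale strain to [-1, 1] for conditioning
    basis = [[(e / scale) ** k for k in range(3)] for e in strain]
    ata = [[sum(r[i] * r[j] for r in basis) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * y for r, y in zip(basis, dc_over_c)) for i in range(3)]
    c0, c1, c2 = solve_linear(ata, aty)
    return c0, c1 / scale, c2 / scale ** 2  # (O, beta, delta) in original units


# Synthetic example (made-up values): one LF cycle at 1e-5 strain amplitude.
strain = [1e-5 * math.sin(2 * math.pi * i / 50) for i in range(50)]
true_O, true_beta, true_delta = -1e-4, -50.0, -2e6
signal = [true_O + true_beta * e + true_delta * e * e for e in strain]
O, beta, delta = fit_nonlinearity(strain, signal)
```

The same fit applied to the relative transmission loss would give the three attenuation-based parameters.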

ARTICLE scitation.org/journal/adv
Microscope in backscattered mode with ×2000 magnification. The SEM image series are stitched together to build images of the entire length of each crack.

D. Test protocol
After preparing each fatigued sample (Section II A), DAET with a Rayleigh surface wave probe was used to measure the crack interface nonlinearity near the surface at four predefined locations ( Fig. 1(b)). In addition, the microscopic features of each crack were captured using SEM. In case of the progressively damaged Sample #4, both DAET and SEM measurements were repeated at each damage step. Fig. 2 depicts the measurement sequence.

III. DATA ANALYSIS
The data analysis includes: (1) SEM image processing to obtain the micro-geometrical features of the crack interface and (2) regression analyses. The details of DAET data analysis including the extraction of NPs can be found in Jin et al. 2 and are not repeated here.

A. Image processing
The analysis of SEM images consists of a series of preprocessing steps followed by feature extraction from the processed images. 2

1. Pre-processing
The flowchart in Fig. 3 summarizes the pre-processing steps including two exemplary images at intermediate stages of analysis. The analysis was performed using various functions available in MATLAB R2018b image processing toolbox (www.mathworks.com). 41 We first transform the grayscale SEM images into binary images (Original Binary Images in Fig. 3) by thresholding based on the overall grayscale of all the pixels using Otsu's method. 42 Next, a closing morphological operation (dilation followed by erosion) with a radius of 20 pixels was applied to remove small pixel clusters inside the crack to facilitate the subsequent analyses. The resulting binary images (Processed Binary Images in Fig. 3) were then used to obtain skeleton images by performing a thinning morphological operation. All the parameters related to crack features are extracted from the processed binary and skeleton images.
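The thresholding step can be sketched in a few lines. This is a minimal standard-library illustration of Otsu's method on a flat list of 8-bit gray levels, not the MATLAB toolbox routine used in the study; it assumes that crack pixels are darker than the surrounding material, and the closing and thinning operations are not shown.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes the between-class variance."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_between = 0, -1.0
    w0 = 0    # number of pixels with gray level <= t
    sum0 = 0  # sum of gray levels of those pixels
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, variance undefined
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2  # proportional to between-class variance
        if between > best_between:
            best_t, best_between = t, between
    return best_t


def binarize(pixels, threshold):
    """Mark pixels at or below the threshold as crack (1), others as background (0)."""
    return [1 if p <= threshold else 0 for p in pixels]


# Toy example: three dark (crack) pixels and three bright (matrix) pixels.
gray = [20, 25, 30, 200, 210, 220]
t = otsu_threshold(gray)
mask = binarize(gray, t)
```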

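2. Feature extraction
We extracted the following 15 features from the SEM images: crack length features (LO, LC, LT, and LO/LC), skeleton roughness features (LT/LP, Rq, and Ra), crack aperture distribution features (w, σw, wO, σwO, k, θ, and Rsk), and the distance to the crack tip (d). A description of each feature is given below.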
a. Crack length features (LO, LC, LT). Crack length is calculated from the skeleton images. Since many segments of cracks are closed (aperture is zero), the original skeleton is discontinuous. The open crack length LO is approximated as the perimeter of the original skeleton divided by 2. To calculate the total crack length LT, the discontinuities in the skeleton caused by the closed crack segments are approximated using a shape-preserving piecewise cubic interpolation. 42 The closed crack length LC is calculated from the difference between LT and LO.
We obtain the open/closed crack ratio LO/LC by simply dividing LO by LC.
b. Skeleton roughness features (LT/LP, Rq, Ra). Projected length LP is the apparent crack length that falls within the probing area (the HF transducer area). Therefore, feature LT/LP is a measure of the crack path waviness and tortuosity. To calculate Rq and Ra, the profile is first filtered; a fifth-order Butterworth high-pass filter with a 0.293 mm spatial cutoff is used to filter out the low-frequency waviness of the skeleton. These parameters are then calculated from the filtered profile as:
$$R_q = \sqrt{\frac{1}{N}\sum_{n=1}^{N} z_n^{2}}, \qquad R_a = \frac{1}{N}\sum_{n=1}^{N} \left| z_n \right|,$$
where zn is the profile of the filtered skeleton obtained by using the built-in MATLAB function 'bwmorph' with the 'thin' option, and N is the number of pixels on the skeleton. A higher Rq or Ra value indicates higher roughness. 43
c. Crack aperture distribution features (w, σw, wO, σwO, k, θ, Rsk). Crack aperture distribution quantifies the spatial variation in crack aperture along its length. The aperture is estimated from the processed binary images as the product of the crack pixel count in the vertical direction and the approximate size of each pixel (0.1465 μm). The closed crack refers to the crack segments where the aperture is estimated as 0. We understand that this approach does not provide the exact crack aperture but its vertical projection, because the crack does not always run horizontally across the image. However, for the purpose of this study, the vertical projection can be viewed as the "effective aperture". This is because the LF pumping is vertically aligned and therefore activates the crack opening-closing motion in the vertical direction. The mean w and standard deviation σw of the crack aperture are calculated from the effective aperture distribution. In addition, we calculate the corresponding values wO and σwO for the open crack portion by excluding the closed crack segments (aperture w = 0) from the distribution. We find that the distributions for open crack segments can be well described by Gamma probability density functions:
$$f(w; k, \theta) = \frac{w^{k-1} e^{-w/\theta}}{\Gamma(k)\,\theta^{k}},$$
where Γ(·) is the Gamma function, and k and θ are the distribution shape and scale parameters, respectively. The extracted values for k and θ are included as crack features. Finally, the skewness (Rsk) is a measure of the symmetry of the profile about its average. 43 An Rsk value of 0 corresponds to a symmetric distribution, and positive (or negative) values correspond to a distribution skewed towards high (low) aperture regions.
d. Distance to crack tip (d). This parameter is defined as the distance between the crack tip and the center of the HF ultrasonic probe at each measurement point.
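Several of these features reduce to simple statistics, which the following sketch illustrates. It uses the standard definitions of RMS and mean-absolute roughness, and it estimates the Gamma shape and scale by the method of moments, which is an assumption made here for brevity (the study does not state how the Gamma fit was performed). The function names, the use of population standard deviations, and the toy inputs are hypothetical.

```python
import math
import statistics


def roughness(z):
    """Return (Rq, Ra): RMS and mean-absolute roughness of a filtered profile."""
    n = len(z)
    rq = math.sqrt(sum(v * v for v in z) / n)
    ra = sum(abs(v) for v in z) / n
    return rq, ra


def aperture_features(pixel_counts, pixel_size_um=0.1465):
    """Aperture statistics from per-column crack pixel counts (vertical direction)."""
    w = [c * pixel_size_um for c in pixel_counts]  # effective aperture in micrometers
    w_open = [v for v in w if v > 0]               # exclude closed segments
    mean_open = statistics.mean(w_open)
    var_open = statistics.pvariance(w_open)
    return {
        "w_mean": statistics.mean(w),
        "w_std": statistics.pstdev(w),
        "wO_mean": mean_open,
        "wO_std": math.sqrt(var_open),
        "k": mean_open ** 2 / var_open,   # Gamma shape, method of moments (assumption)
        "theta": var_open / mean_open,    # Gamma scale, method of moments (assumption)
    }


# Toy inputs: a four-point profile and five image columns (two of them closed).
rq, ra = roughness([0.0, 2.0, 0.0, -2.0])
features = aperture_features([0, 0, 10, 20, 30])
```

For a Gamma distribution the mean is kθ and the variance is kθ², which is why θ tracks the spread of the open-crack aperture and correlates with its mean.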

B. Statistical analysis
We use three different regression methods namely, stepwise MLR, stepwise PCR, and LASSO, to relate the measured NPs and extracted crack microstructural features. In the regression analysis, we treat NPs as dependent or prediction variables while the 15 crack features are considered independent or predictor variables. This section provides concise descriptions of each regression method. Although the three methods share similarities, they use different strategies for dimension reduction and over-fitting prevention. A comparison of the resulting regression models is given in the following section (Section IV). A physically meaningful regression is expected to be independent of the choice of regression analysis utilized.

1. Stepwise multiple linear regression (MLR)
The MLR model for NP prediction is of the form:
$$y = b_0 + b_1 x_1 + \dots + b_n x_n + e,$$
where b0, b1, ..., bn represent the regression coefficients, and e is the corresponding error. Measured NPs and the features extracted from the SEM images are treated as dependent (y) and independent (x1, ..., xn) variables, respectively. 27 Both dependent and independent variables are normalized by standardization. The coefficients are estimated using an ordinary least squares fit. Since including all the predictor variables for model prediction may lead to over-fitting, a subset of predictor variables should be selected. To choose the most informative predictor variables or predictors, a stepwise procedure is implemented. 27 Stepwise MLR allows selection of the best predictors following a systematic iterative procedure, where at each iteration, a predictor variable is added or removed based on a predefined criterion. 27 In this study, the criterion for adding or removing predictors is based on the p-value. The modified regression model takes the following form:
$$y = b_0 + w_1 b_1 x_1 + \dots + w_n b_n x_n + e,$$
where w1, ..., wn are binary weights (0 or 1). The predictor selection procedure starts with setting all the binary weights equal to zero: w1 = ... = wn = 0. At each of the following steps, the predictor with the smallest p-value is added. The iterative process continues until there is no predictor left that has a p-value less than a prescribed significance threshold, which is set to 0.05 here. This activates a predictor elimination procedure to remove the predictors with p-values greater than a predefined significance threshold, which is set to 0.1 in this study. The selection and elimination procedures are repeated alternately until no predictor may be added or removed.
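The add/remove logic can be sketched as follows. To stay within the Python standard library, this sketch swaps the p-value criterion for fixed cutoffs on the absolute t-statistic (2.03 to enter, 1.69 to remove), which roughly correspond to p = 0.05 and p = 0.1 only for about 35 residual degrees of freedom; a faithful implementation would compute p-values from the t distribution. It also performs one addition and one removal check per pass. All names and the synthetic data are hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [u - factor * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols_t_stats(columns, y):
    """OLS fit with intercept; returns (coefficients, t-statistics)."""
    m = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(m)]
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    coef = solve_linear(xtx, xty)
    resid = [yi - sum(c * v for c, v in zip(coef, r)) for r, yi in zip(design, y)]
    sigma2 = sum(e * e for e in resid) / (m - p)  # residual variance
    t_stats = []
    for i in range(p):
        unit = [1.0 if j == i else 0.0 for j in range(p)]
        inv_ii = solve_linear(xtx, unit)[i]       # i-th diagonal of (X^T X)^-1
        t_stats.append(coef[i] / math.sqrt(sigma2 * inv_ii))
    return coef, t_stats


def stepwise_select(features, y, t_enter=2.03, t_remove=1.69):
    """Alternate forward addition and backward elimination on |t| cutoffs."""
    selected = []
    for _ in range(2 * len(features)):            # hard cap guards against cycling
        changed = False
        best_name, best_t = None, t_enter
        for name in features:                     # forward step
            if name in selected:
                continue
            _, t = ols_t_stats([features[k] for k in selected + [name]], y)
            if abs(t[-1]) > best_t:
                best_name, best_t = name, abs(t[-1])
        if best_name is not None:
            selected.append(best_name)
            changed = True
        if selected:                              # backward step
            _, t = ols_t_stats([features[k] for k in selected], y)
            weakest = min(range(len(selected)), key=lambda i: abs(t[i + 1]))
            if abs(t[weakest + 1]) < t_remove:
                del selected[weakest]
                changed = True
        if not changed:
            break
    return selected


# Synthetic data: y depends on x1 and x2 only; x3 is irrelevant.
x1 = [1, 1, 1, 1, -1, -1, -1, -1]
x2 = [1, 1, -1, -1, 1, 1, -1, -1]
x3 = [1, -1, 1, -1, 1, -1, 1, -1]
noise = [0.1 * v for v in [1, -1, -1, 1, -1, 1, 1, -1]]
y = [2 * a + b + e for a, b, e in zip(x1, x2, noise)]
selected = stepwise_select({"x1": x1, "x2": x2, "x3": x3}, y)
```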

2. Stepwise principal components regression (PCR)
PCR is a regression technique based on principal components analysis (PCA). 44 PCA is a statistical tool that linearly transforms an arbitrary set of variables into an orthogonal set. The new orthogonal variables are known as principal components (PCs); each PC is a linear combination of the original variables. PCA is implemented through calculating the eigenvectors and eigenvalues of the covariance matrix of the original variables. The procedure to calculate PCs is as follows. The normalized predictor matrix X is organized as N variables (e.g., crack microstructural features in this study), each with M observations, in an M × N matrix. The corresponding covariance matrix CX is given as:
$$C_X = \frac{1}{M-1} X^{T} X.$$
The PCs are calculated through an eigenvalue decomposition of matrix CX:
$$C_X = P \Lambda P^{T},$$
where the columns of matrix P are the eigenvectors of CX and the diagonal elements of the diagonal matrix Λ are the corresponding eigenvalues. The eigenvectors of CX are the PCs. The PCs are arranged in descending order according to their corresponding eigenvalues. The larger the eigenvalue, the more significant is the PC. The PC with the largest eigenvalue is called the first PC (PC1) and captures the largest variation in the data, followed by the 2nd, 3rd, ... PCs (PC2, PC3, ...). The dimension reduction is achieved by ignoring the less significant components. It is customary to retain only the PCs with the largest eigenvalues and exclude those corresponding to the small eigenvalues. As will be shown in the following section, we use this approach to reduce the dimensionality of the dependent or prediction variables, i.e., NPs. However, when applied to predictor variables, this approach may increase the risk of discarding useful information while still including insignificant variables for regression analysis, because the PCs corresponding to the highest eigenvalues do not necessarily have more predictive power than the discarded ones. 45 As such, we use all PCs as independent variables in the linear regression model. 28 The optimal PC selection is achieved by adopting the stepwise procedure described in Section III B 1. The modified regression model has the form:
$$y = \tilde{b}_0 + w_1 \tilde{b}_1 p_1 + \dots + w_n \tilde{b}_n p_n + e,$$
where p1, ..., pn are the PCs of the predictor variables, and b̃0, b̃1, ..., b̃n are the regression coefficients. Having determined the b̃i, the corresponding regression coefficients bi for the predictors are obtained through the following transformation:
$$b_i = P_i \circ \tilde{b},$$
where Pi is the i-th row of P, b̃ = (w1 b̃1, ..., wn b̃n) collects the coefficients of the retained PCs, and the operator ○ denotes inner product.
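The covariance, eigen-decomposition, and back-transformation steps can be illustrated with a reduced sketch that keeps only the first PC. It uses power iteration in place of a full eigenvalue decomposition (an assumption made to avoid external libraries), centers the columns without rescaling them, and assumes the variables are positively correlated so that the uniform starting vector is not orthogonal to the leading eigenvector. The function name and data are hypothetical.

```python
import math


def pc1_regression(x, y, iterations=200):
    """Regress y on the first principal component of x and map back to x-space."""
    m, n = len(x), len(x[0])
    means = [sum(row[j] for row in x) / m for j in range(n)]
    xc = [[row[j] - means[j] for j in range(n)] for row in x]  # centered data
    cov = [[sum(r[i] * r[j] for r in xc) / (m - 1) for j in range(n)]
           for i in range(n)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(n) for j in range(n))
    explained = eigenvalue / sum(cov[i][i] for i in range(n))  # variance fraction
    scores = [sum(r[j] * v[j] for j in range(n)) for r in xc]  # PC1 scores
    y_mean = sum(y) / m
    b_tilde = (sum(s * (yi - y_mean) for s, yi in zip(scores, y))
               / sum(s * s for s in scores))                   # slope on PC1
    b = [b_tilde * vj for vj in v]  # coefficients for the original predictors
    return v, explained, b_tilde, b, xc, scores


# Two strongly correlated synthetic predictors and a synthetic response.
x = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.8]]
y = [1.0, 2.1, 2.9, 4.0]
v, explained, b_tilde, b, xc, scores = pc1_regression(x, y)
```

Because each PC is a linear combination of the original variables, a prediction made with the PC coefficient on the scores equals the prediction made with the back-transformed coefficients on the centered predictors.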

3. Least absolute shrinkage and selection operator (LASSO)
LASSO offers an alternative way of variable selection in order to improve prediction accuracy and to reduce the risk of overfitting. 46 LASSO is essentially a regularized least-squares regression scheme, where the objective function is augmented by an L1-norm penalty term scaled by a regularization parameter λ ≥ 0:
$$\min_{b_0,\,b} \left\{ \frac{1}{2M} \sum_{i=1}^{M} \Big( y_i - b_0 - \sum_{j=1}^{N} b_j x_{ij} \Big)^{2} + \lambda \sum_{j=1}^{N} \left| b_j \right| \right\}.$$
Inclusion of the L1-norm (∑ⱼ |bj|, j = 1, ..., N) forces certain coefficients to become zero. Therefore, it yields a simplified model that excludes less significant variables. The larger the λ, the more coefficients are reduced to zero. The optimal value of λ can be determined by cross-validation, as will be shown in the following section.
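The way the L1 penalty drives coefficients to exactly zero can be seen in a minimal coordinate-descent sketch. It assumes the objective is the residual sum of squares scaled by 1/(2M) plus λ times the L1-norm, with centered data and no intercept; the solver, its name, and the synthetic data are illustrative and are not the routine used in the study.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values within [-gamma, gamma] become 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(x, y, lam, sweeps=200):
    """Coordinate descent for (1/(2M)) * sum((y - X b)^2) + lam * sum(|b_j|)."""
    m, n = len(x), len(x[0])
    b = [0.0] * n
    resid = list(y)  # residual y - X b for the current b
    col_sq = [sum(row[j] ** 2 for row in x) / m for j in range(n)]
    for _ in range(sweeps):
        for j in range(n):
            # Correlation of column j with the residual that excludes feature j.
            rho = sum(x[i][j] * (resid[i] + x[i][j] * b[j]) for i in range(m)) / m
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(m):
                    resid[i] -= x[i][j] * delta
                b[j] = new_bj
    return b


# Synthetic, mutually orthogonal predictors; only the first two carry signal.
h1 = [1, 1, 1, 1, -1, -1, -1, -1]
h2 = [1, 1, -1, -1, 1, 1, -1, -1]
h3 = [1, -1, 1, -1, 1, -1, 1, -1]
noise = [0.1 * v for v in [1, -1, -1, 1, -1, 1, 1, -1]]
x = [[float(a), float(c), float(d)] for a, c, d in zip(h1, h2, h3)]
y = [2.0 * a + 0.5 * c + e for a, c, e in zip(h1, h2, noise)]

b_weak = lasso_cd(x, y, 0.1)    # light penalty keeps both informative predictors
b_mid = lasso_cd(x, y, 0.6)     # stronger penalty removes the weaker predictor
b_strong = lasso_cd(x, y, 3.0)  # very strong penalty gives the trivial model
```

With orthogonal unit-variance columns each coefficient is simply the soft-thresholded correlation, so increasing λ removes predictors in order of their correlation with the response.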

IV. RESULTS AND DISCUSSION

A. PCA on prediction variables
As noted in Section II C, we can extract a total of six NPs from DAET results, namely O, β, δ, O_TL, β_TL, and δ_TL. The covariance analysis shows that these NPs (after normalization) are moderately to strongly correlated (Table II), suggesting that the dimensionality of the problem may be reduced using PCA. Table III shows the coefficients of the PCs of nonlinearity parameters, or (PC) NP, together with the corresponding eigenvalues and cumulative explained variances. We observe that (PC1) NP explains the majority of the variance in the data (84.3%) and that all six NPs contribute to (PC1) NP with similar absolute weights (0.4±0.04). This observation suggests that a single variable, (PC1) NP, can adequately represent all six NPs without losing much variance (15.7%). Therefore, we use (PC1) NP to represent NPs, or NP (PC1), as the dependent variable in the following regression analyses.

B. Stepwise MLR
Table IV shows the statistics pertaining to the stepwise MLR model used to predict the dependent variable, NP (PC1). Out of 15 predictors, only four are preserved for NP (PC1) prediction after stepwise variable selection: d, shape parameter k, scale parameter θ, and wO. Among these four features, scale parameter θ has the largest coefficient and wO has a negative coefficient. We also report the corresponding t-statistics and p-values to demonstrate the significance of the regression coefficients. The t-statistic is the ratio of the estimated coefficient to its standard error (SE), $t = b_i / \mathrm{SE}(b_i)$; the larger the absolute value of the t-statistic, the more significant is the corresponding regression coefficient. The p-value for each coefficient tests the null hypothesis that this coefficient is equal to zero. A coefficient with a low p-value (typically <0.05) is likely to be significant. Fig. 4 shows that the predicted and observed values for NP (PC1) are in good agreement with an R-squared value of 0.911.

C. Stepwise PCR
In order to construct the PCR model, 15 PCs are obtained from the 15 crack features (Fig. 5) and then used to predict NP (PC1). To differentiate the PCs of nonlinearity parameters or (PCs) NP , the PCs of crack features are hereinafter referred to as (PCs) feature . At the predictor selection stage, we keep the predictors with a p-value < 0.05 and remove those with a p-value > 0.1. Table V lists the retained (PCs) feature and presents the corresponding regression analysis statistics. Although the regression analysis shows a relatively high R-square value of 0.924, there are indications that the model may be over-fitting the data. Specifically, the coefficient for (PC14) feature is one or two orders of magnitude larger than the other coefficients (Table V). To resolve this issue, we resort to a stricter predictor selection criterion: keeping the predictors with a p-value   Fig. 6. The overall performance of the model is good and has a slightly lower R-squared value of 0.774 compared to the stepwise MLR.

D. LASSO
The LASSO regression analysis results are shown in Fig. 7 and Fig. 8. Fig. 7(a) shows how the model predictive performance, quantified by the mean squared error (MSE), varies with the choice of λ. The MSE values are calculated from 5-fold cross-validation. 46 As λ increases, the MSE decreases until reaching a minimum at λmin = 0.016, after which it increases. For very small λ (∼0), the model includes little regularization and is therefore very similar to the stepwise MLR model. However, the high MSE values suggest overfitting. With increasing λ, the MSE decreases because some of the insignificant crack features are excluded from the model. Ideally,   47 The trace plot in Fig. 7(b) shows how the coefficient of each predictor variable varies with the L1-norm. At an extremely high regularization (very large λ), the value of the L1-norm reduces to 0, resulting in a trivial model with all coefficients equal to 0. With decreasing λ, the L1-norm increases, and the predictor variables start to appear in the predictive model as their coefficients assume non-zero values. The selected model (λ1SE = 0.293) is controlled by three crack features: d, scale parameter θ, and wO, as shown in Fig. 7(b). Among the three, scale parameter θ has the largest coefficient and thus a greater contribution (Table VIII). Fig. 8 shows that the predicted and observed NP (PC1) agree reasonably well with an R-squared value of 0.819.

E. Discussion: Comparison of regression results and physical interpretation
Despite the differences in their principles, all three models have attributed the nonlinearity of the cracks tested in this study to d and scale parameter θ. The agreement among the models is an indication that the regression models are physically meaningful, and the relations do not depend on the choice of the regression analysis. It is not surprising that d has a positive contribution to NPs. Our laser interferometry measurements show that the "clapping" amplitude at crack interfaces during LF oscillation decreases as the measurement point moves away from the notch towards the crack tip. In other words, the nonlinearity is expected to be higher at locations farther from the crack tip (larger d). The other significant parameter with a positive regression coefficient is identified as the scale parameter θ. A higher θ is an indication of a more spread-out aperture distribution (see Fig. 10). Consequently, the value of θ is strongly correlated with the average aperture of the open crack wO, with a correlation coefficient of 0.94 (Fig. 9); a higher θ suggests a crack with a wider aperture. In other words, the nonlinearity measures higher at locations where the crack aperture is larger. This may seem contradictory to what some other researchers have reported, i.e., that the nonlinearity measures larger at closed cracks than at open cracks. 30,49 This apparent discrepancy is due to the fact that the apertures of all our cracks are very small (<5 microns) and they all fall in the class of 'closed cracks'. Our observation is in accord with previous experimental studies on the nonlinearity of contact interfaces 50 and real fatigue cracks, 3 where the reported CAN measured based on the generated second harmonic amplitudes decreases with increasing nominal contact pressure, which in turn decreases the interface aperture. Physical models developed by Biwa et al. 51 and Pecorari 17 have also predicted similar trends of nonlinearity as a function of contact pressure (smaller contact pressure suggests a higher aperture).
In sum, the regression model predictions align well with the physics-based expectations. They also agree with our corresponding analytical modeling results. 48 Although the results from the three models used in this study agree, each offers certain advantages and limitations. Both stepwise MLR and LASSO output very different coefficients for two highly correlated predictors: scale parameter θ and wO. This is because neither stepwise MLR nor LASSO can address multicollinearity. Although multicollinearity among predictor variables does not impede a reasonable prediction, it complicates the physical interpretation of the influence of each predictor variable on the dependent variable because the predictor coefficients become unstable. 27 PCR, on the other hand, provides an effective way to address multicollinearity since all (PCs) feature are linearly independent. For example, the corresponding stepwise PCR coefficients of scale parameter θ and wO are very similar, as given in Table VII. We also observe negative PCR regression coefficients for skeleton roughness Rq and Ra. This finding is in accord with our previous observations 2 that skeleton-roughness-induced crack closure inhibits clapping and thus reduces the interface nonlinearity. However, the overall predictability of stepwise PCR is the lowest among the studied models (R² = 0.774). Fig. 10 summarizes the knowledge gained from the regression analyses.
Finally, we have found that all the nonlinearity parameters O, β, δ, O_TL, β_TL, and δ_TL for the fatigue-cracked aluminum samples are somewhat correlated. This observation is at odds with the previous experimental results on disparate rocks, where only the offset (O and O_TL) and curvature (δ and δ_TL) were found to be correlated. 40 The experimental work on rocks suggests that different physical mechanisms contribute to the offset and curvature than to the slope.
The slope is a measure of classical nonlinearity due to acoustoelastic behavior, while the offset and curvature are associated with transient elastic softening or conditioning, 33,38 as well as asymmetry between tension and compression phases. The mechanism of acoustic nonlinear behaviors at contact interfaces is somewhat different. Intuitively, the slope β is a result of the stiffness variation at the contact interfaces during clapping; but unlike in granular media, little softening/conditioning leading to offset is observed. However, offset and curvature parameters could be the result of asymmetry between the tension and compression phases (positive and negative strain in Fig. 1(c) and (d) respectively). This would explain why all six parameters are correlated. Further investigation would be needed to test this hypothesis.

V. SUMMARY AND CONCLUSIONS
Considering the complexity of the physical mechanisms giving rise to acoustic nonlinearity at rough contact interfaces, we attempt a data-driven approach to find the quantitative relationship between CAN and microstructural features of contact interfaces. We use data corresponding to four fatigued aluminum alloy samples. Local acoustic nonlinearity is measured using DAET with a Rayleigh surface wave probe at various positions along the crack propagation direction. Using a surface wave probe enables measuring the local nonlinearity of the crack within 1-1.5 mm from the surface. This allows us to directly relate the locally measured near-surface nonlinearities to the microstructural features of the cracks visible on the surface.
A total of six nonlinearity parameters are obtained from DAET. However, these parameters are correlated, thus are combined into a single parameter using PCA. This composite parameter is then treated as the dependent variable in the subsequent regression analyses. A total of 15 crack micro-geometric features at each measurement location are extracted from the analysis of SEM images. We use three regression models that incorporate variable selection (dimension reduction): stepwise MLR, stepwise PCR, and LASSO. All three models show good predictability performance with R-squared values greater than 0.75. Despite their different principles, all three regression models agree that crack aperture (opening) distribution and the distance to crack tip play the most important role in predicting the nonlinearity parameters from DAET. This agreement suggests that the model predictions are physically meaningful. The advantages and limitations of each model are discussed. This study demonstrates the feasibility of using data-driven models to relate nonlinearity parameters and micro-geometric features of contact interfaces. The results of statistical models agree well with our physical interpretation. We acknowledge that the dataset used in this study is relatively small and may not include sufficient variations. A larger dataset with more variability in crack aperture sizes and interface roughness characteristics is expected to result in better predictability and more versatile modeling. Nevertheless, this study is the first to show the utility of a data-driven approach and use of high-resolution imagery to illuminate complex physical mechanisms. This approach complements our parallel efforts on developing a physics-based model for our experimental observation. 2,48