Machine learning depinning of dislocation pileups

We study a one-dimensional model of a dislocation pileup driven by an external stress and interacting with random quenched disorder, focusing on predictability of the plastic deformation process. Upon quasistatically ramping up the externally applied stress from zero the system responds by exhibiting an irregular stress--strain curve consisting of a sequence of strain bursts, i.e., critical-like dislocation avalanches. The strain bursts are power-law distributed up to a cutoff scale which increases with the stress level up to a critical flow stress value. There, the system undergoes a depinning phase transition and the dislocations start moving indefinitely, i.e., the strain burst size diverges. Using sample-specific information about the pinning landscape as well as the initial dislocation configuration as input, we employ predictive models such as linear regression, simple neural networks and convolutional neural networks to study the predictability of the simulated stress--strain curves of individual samples. Our results show that the response of the system -- including the flow stress value -- can be predicted quite well, with the correlation coefficient between predicted and actual stress exhibiting a non-monotonic dependence on strain. We also discuss our attempts to predict the individual strain bursts.

(Dated: 1 July 2020) We study a one-dimensional model of a dislocation pileup driven by an external stress and interacting with random quenched disorder, focusing on predictability of the plastic deformation process. Upon quasistatically ramping up the externally applied stress from zero the system responds by exhibiting an irregular stress-strain curve consisting of a sequence of strain bursts, i.e., critical-like dislocation avalanches. The strain bursts are power-law distributed up to a cutoff scale which increases with the stress level up to a critical flow stress value. There, the system undergoes a depinning phase transition and the dislocations start moving indefinitely, i.e., the strain burst size diverges. Using sample-specific information about the pinning landscape as well as the initial dislocation configuration as input, we employ predictive models such as linear regression, simple neural networks and convolutional neural networks to study the predictability of the simulated stress-strain curves of individual samples. Our results show that the response of the system -including the flow stress value -can be predicted quite well, with the correlation coefficient between predicted and actual stress exhibiting a non-monotonic dependence on strain. We also discuss our attempts to predict the individual strain bursts.

I. INTRODUCTION
One of the key ideas of many disciplines including materials science in particular is that structure and properties of materials tend to be related. 1,2 For instance, much of metallurgy is about tuning materials microstructure to design materials with desired mechanical properties. 2 Typically the relation between, say, the precipitate content of a precipitationhardened alloy 3 and its yield strength is understood as an average property, and indeed in the case of macroscopic bulk samples the sample-to-sample variations in the yield stress of identically prepared samples tend to be small. However, this changes when dealing with samples with sizes down to the micrometer range and below: Recent micropillar compression experiments have revealed the fluctuating, irregular nature of small-scale crystal plasticity, 4,5 originating from the collective, critical-like dynamics of interacting dislocations mediating the deformation process. The fact that plasticity of small crystals proceeds via an irregular sequence of strain bursts with a broad size distribution implies also that sampleto-sample variations of the plastic response or the stress-strain curves might be considerable, 6 even if the samples have been prepared and the deformation experiments performed using the same protocol.
This raises the important general question of what can be said about the relation between initial structure and mechanical properties for individual samples of small crystals, when random variations in the initial (micro)structure become important. One way of framing the issue is to consider what we refer to as deformation predictability: Given a small crystalline sample with a specific arrangement of pre-existing dislocations and possibly of some other defects interfering with dislocation motion, how well can one use such information to predict the plastic response of that sample? A key challenge here is the high dimensionality of any reasonable description a) Electronic mail: lasse.laurson@tuni.fi of the disordered initial state, combined with the possibility of non-linearities in the mapping from the initial state properties to the ensuing response. This likely implies that simple empirical laws relating sample properties to its plastic response may be difficult to formulate. Challenges of this type are a key factor behind the recent emergence of machine learning (ML) algorithms as an important part of the toolbox of scientists in a wide range of fields including also physics and materials science. 7 ML has been demonstrated to be an efficient approach to address a wide spectrum of problems including materials property prediction, discovery of novel materials, etc. [8][9][10][11][12][13][14] These developments have lead to the emergence of a novel research field referred to as materials informatics, 15 where informatics methods -including ML -are used to determine material properties that are hard to measure or compute using traditional methods.
In general, many supervised ML algorithms including neural networks are capable of learning non-linear mappings from a high-dimensional feature vector to a desired output, and hence these methods should be useful tools when addressing the question of deformation predictability. An important opening in this direction was recently achieved by Salmenjoki et al., 16 who applied ML to study deformation predictability in the case of simple two-dimensional discrete dislocation dynamics (DDD) simulations. The key idea of Ref. 16 was that exploiting ML algorithms provides a useful set of methods to quantify predictability of complex systems such as plastically deforming crystals. This was achieved by training ML models including neural networks and support vector machines to infer the mapping from the initial dislocation microstructure to the response of the system to applied stresses, characterized by the stress-strain curve. Following Ref. 16 , ML has been further applied to predicting stress-strain curves of plastically deforming crystals, 17 and learning the interaction kernel of dislocations. 18 Recently ML has also been applied to the closely related problems of, e.g., predicting the local yielding dynamics of dry foams, 19 the creep failure time of disordered materials, 20 as well as the occurrence times of "laboratory earthquakes". 21 By its nature, the problem of predicting the plastic deformation process of crystalline samples depends on details such as whether the crystal only contains pre-existing glissile dislocations (this was the case in Ref. 16 ), or if other defect populations interacting with the dislocations are present. The latter could include solute atoms, precipitates, voids, etc., or even grain boundaries in the case of polycrystals. If present, a description of these additional defects needs to be included in the initial state of the system used as input for the predictive ML models. The presence of such static defects within the crystal may also change the response of the crystal to applied stresses and thus the process to be predicted in a fundamental manner. It has been shown that the deformation dynamics of "pure" dislocation systems are governed by dislocation jamming, 22,23 resulting in glassy dislocation dynamics characterized by "extended criticality", with the cutoff scale of the size distribution of dislocation avalanches diverging with the system size at any stress level. [24][25][26] This can be contrasted with systems where significant pinning of dislocations due to other defects within the crystal may instead induce a depinning phase transition of the dislocation assembly, resulting in critical-like dislocation dynamics only in the immediate proximity of the critical point (stress) of the depinning transition. 27,28 In this paper, we study deformation predictability in a onedimensional periodic model of a dislocation pileup, interacting with a quenched (frozen) random pinning landscape. The model is perhaps the simplest possible system including both interacting mobile dislocations and a quenched random pinning field interfering with dislocation motion, and therefore it serves as a useful playground to explore the ideas of MLbased deformation predictability discussed above. It is known to exhibit a depinning phase transition at a critical flow stress value σ flow , separating pinned and moving phases of the dislocation system. 29 We generate a large database of stress-strain curves, each corresponding to a unique randomly generated initial microstructure, by simulating the model with a quasistatically increasing applied stress for different system sizes (or different numbers of dislocations, N). First, the statistical properties of the stress-strain curves and the strain bursts are analyzed, followed by training and testing of various predictive models ranging from linear regression to convolutional neural networks (CNNs) to establish mappings from the initial random microstructure (defined by both the static pinning landscape and the initial configuration of the dislocations) to the response of the system to applied stresses (i.e., the stressstrain curve).
Our results show that the different predictive models employed are capable of learning the relation between the input and the stress-strain curves quite well, as measured by the correlation coefficient between the predicted and simulated stress value at a given strain. In particular, the sample flow stress (i.e., the sample-dependent finite size critical point of the depinning transition) can be predicted surprisingly well, with the correlation coefficient reaching values as high as 0.89 for a regularized CNN. We also explore the predictability of individual strain bursts taking place during the deformation process. Critical avalanches are expected to be inherently unpredictable, and indeed we find approximately zero predictability for most of the strain bursts belonging to the power law scaling regime of their size distribution. Interestingly, we also find that strain bursts belonging to the cutoffs of the distributions as well as those taking place very close to the flow stress exhibit non-vanishing predictability.

II. THE MODEL: EDGE DISLOCATION PILEUP WITH QUENCHED DISORDER
The dislocation pileup model we study here is similar to the one in Refs. 29,30 . It describes a system of N straight, parallel edge dislocations subject to an external shear stress σ gliding along the direction set by their Burgers vector (here, the x direction) within a given glide plane. By neglecting roughness of the dislocation lines, the system may be described by a set of point dislocations with coordinates x i representing cross sections of the dislocation lines moving in one dimension. The dislocations interact repulsively with each other via longrange stress fields with the stress field magnitude inversely proportional to their mutual distance. 29 In addition, the dislocations are taken to interact via short-range forces with a set of N p randomly positioned Gaussian pinning centres, playing here the role of quenched disorder. In real crystals these pinning centres could be solute atoms, or immobile forest dislocations threading the plane of the pileup. The overdamped equations of motion of the N point dislocations thus read where χ is the effective viscosity, x i the position of the ith dislocation, µ the shear modulus, b the Burgers vector magnitude, and F is the pinning force landscape. We choose a pinning force consisting of Gaussian potential wells centered at pinning sites x p , given by where x p is the position of the pth pinning point, E p a pinning energy scale, and s p the standard deviation of the Gaussian. For simplicity, we set χ = µ = b = 1, and E p = s 2 p = 0.5. Periodic boundary conditions within a system of size L are employed by performing an infinite sum over the stress fields of the periodic images of each dislocation, 29 The average spacing between dislocations is set to L/N = 16 and that between pinning sites to L/N p = 2. The sum over pinning sites is truncated so that only pinning sites that are closer than a cut-off distance of 8 are included in the sum.
The simulations are carried out by first choosing the positions of the N p pinning sites at random from a uniform distribution along the one-dimensional system of length L. N dislocations are then placed in the system with the constant where m σ = 10 −4 is the maximum stress rate, v th = 2 · 10 −4 is a threshold velocity value,v(t) is the spatially averaged dislocation velocity, and r = 100 is a shape parameter. In practice this amounts to a continuous, smooth approximation of the step function, with dσ /dt ≈ m σ forv < v th and dσ /dt ≈ 0 forv > v th , such that σ is increased only in between strain bursts and kept constant during them. Strain ε is accumulated by dislocation motion, The simulations are run until the dislocations start to flow indefinitely, i.e., the strain burst size diverges, which happens at a sample-dependent flow stress σ flow . The stress-strain curves σ (ε) are recorded by storing σ (t) and ε(t) every 20 dimensionless time unit during the simulations. Example stressstrain curves are shown in Fig. 1(b) for three different system sizes. Notice the characteristic irregular staircase-like struc-ture of the curves, as well as the larger size of strain bursts for smaller systems. The resulting stress-strain curves, each corresponding to a different realization of the random pinning landscape and relaxed initial dislocation positions, are stored in a database consisting of 10 000 stress-strain curves for each system size, to be used as training and testing data for the predictive models. For the largest systems (N = 64 . . . 512), we also generate a separate dataset of 1 000 stress-strain curves stored at a finer time resolution of ∆t = 2, in order to better detect individual strain bursts. This latter data set is used to analyze the statistical properties of the strain bursts, as well as to train and test ML models for predicting strain bursts.

III. PREDICTIVE MODELS: FROM LINEAR REGRESSION TO CONVOLUTIONAL NEURAL NETWORKS
In what follows we will introduce the predictive models used in this study. These encompass both linear and nonlinear models using hand-picked features as input, as well as a CNN that requires less feature engineering as it uses a complete description of the system's initial state as input. Using these different models allows us to assess the dependence of our results on any particular model.
Before training, each input feature (each input channel in the CNN case) is standardized by subtracting the mean and dividing by the standard deviation, and the dataset is divided randomly into a training set containing 80% of the data points, and a testing set containing the remaining 20%. Predictability is measured by calculating the correlation coefficient between predicted and target outputs of the testing set and averaging the result over 5 training instances. Predictive models are implemented using the Python libraries scikit-learn 31 for LASSO and Keras 32 for simple neural networks and CNNs.
The prediction models work by minimizing the loss, which mainly consists of the mean squared error. In addition, most models are L 1 -regularized, which adds a penalty term to the loss to prevent overlearning. L 1 regularization often improves the result and encourages the model to focus on the most useful/promising input features while discarding the unimportant, and as a byproduct it allows us to determine which input features are used by the model.

A. Linear regression: LASSO
A linear regression model performs an affine transformation, where input features are first linearly combined by multiplication with a (weight) matrix and then biased by adding a translation vector. If the model is optimized using L 1 -regularization for the weight matrix, it is commonly called LASSO (linear absolute shrinkage and selection operator 33 ). In this study, this model is applied by using sklearn.linear_model.Lasso, which has a built-in optimizer and only requires the user to choose the regularization parameter α. We found that values of α in the range 10 −3 α 10 −2 gave good results, and chose α = 10 −2 .

B. Simple neural networks
When two linear regression models are stacked in such a way that the intermediate result is activated by applying a non-linear activation function, such as the rectified linear unit (ReLU), the combined model becomes a simple neural network which can, to some extent, learn general non-linear mappings. An intermediate result of such model is called a hidden layer, and elements of the intermediate result are referred to as hidden units. L 1 -regularization is applied to each layer, with a parameter α n describing the strength of regularization at each layer.
Tested variants for stress-strain curve prediction differ by the number of hidden units, hidden layers and regularization. Most have one hidden layer with 64 hidden units and use α 1 = 10 −2 for the first weight matrix and α n = 10 −5 for the rest. All models use ReLU-activations for the hidden layers. One variant has less regularization, meaning that α 1 = 10 −3 instead of 10 −2 . Theoretically, one hidden layer is enough for reproducing any continuous mapping given enough hidden units, but a model with 3 hidden layers is employed here as well for comparison to other studies. The models are trained using the Adam optimizer 34 with learning rate 10 −3 for 100 epochs.
The predictability of strain bursts is studied using a simple neural network having one hidden layer with 64 hidden units. Model output is in this case very big, consisting of a vectorized (one-dimensional) version of an avalanche map that is originally two-dimensional (see the Results section below for details). We found that all parameters collapsed to 0 if α is too high, so a value of 10 −5 is used for all weight matrices. A lower α parameter makes the model susceptible to overlearning. Therefore, models are trained only for 10 epochs but with a higher learning rate of 10 −2 .

Selecting features for prediction
The LASSO and simple neural network models require hand-picked features for predicting stress-strain curves and strain bursts. It turns out that good, information-rich features for flow stress prediction are given by quantiles of distributions derived from the sample-specific pinning landscapes. Fig. 2(a) shows the correlation of the flow stress with quantiles of the pinning energy E, the pinning force F, as well as In a non-interacting system, the flow stress would be controlled by the most negative pinning force F, which provides the strongest obstacle for dislocation motion. In an interacting system, it is possible for the dislocations to push on each other and overcome the strongest pinning force at a smaller applied stress. As a result, the correlation between the most negative value of F [given by the 0% quantile of F in Fig. 2(a)] and the flow stress is quite small. Interestingly, the 5% quantile of F exhibits a much larger negative correlation of approximately −0.8 with σ flow , independently of system size [ Fig. 2(b)]. Other quantiles of the pinning force as well as of the other quantities listed in Fig. 2(a) also appear to contain useful information for flow stress prediction. We therefore give the entire quantile curve (sampled on 64 uniformly chosen points) of each field as input features.
In addition to these input features describing the pinning landscape, we also give input features that describe the initial relaxed configuration of dislocations {x j } N j=1 . In particular, we use the values E(x j ), F(x j ), DF(x j ) and D 2 F(x j ), as well as the distances between successive dislocations, Dx j = x j+1 −x j (with x N+1 = x 1 +L). These fields are given as sorted lists, so they can be interpreted as quantiles.
Finally, we choose a set of features given by local extrema of E and F, as well as the successive differences in energy extrema D(E extrema), also given as 64 uniformly spaced quantiles.

C. Convolutional neural networks
These models take discretized fields as input and have the ability to find out if the target output depends on the spatial variations within the input fields. The CNN structure used in this study is adapted for periodic input signals by applying a convolution-activation-pooling scheme that continues to downsample the input array down to a single spatial element. Initially, an input field has 1 024 = 2 10 spatial elements, so there are 10 convolutional layers (each layer halves the number of spatial elements). This makes the result effectively independent of the particular choice of origin of the input fields.
All convolutional layers employed here use a window size The CNN is trained using the same Adam optimizer as for other neural network models, with a learning rate of 10 −3 , learning rate decay of 10 −5 and 50 training epochs. We found that even weak L 1 -regularization caused the parameters to collapse to 0 in most cases, so we do not regularize these models. However, in the case of predicting flow stress only, a kernel regularization value of 5 · 10 −4 for the convolutional layers is found to work.
The input to the CNNs consists of the pinning landscape fields E, F, DF and D 2 F (although we found that giving one pinning field, such as E or F, suffices because the CNN can compute the higher derivatives by differentiation), as well as a field J defined as a sum of Gaussians centered on the relaxed dislocation positions x i , This field J serves to describe the dislocation positions in terms of an input field with similar structure as the pinning landscape, and which is independent of the spatial resolution [see Fig. 7(a) for examples of these input fields]. Notice that by contrast to the case of hand-picked features, these fields are given in order rather than sorted. The right panel of (d) shows that the highest stress bin data in the largest system size exhibits a power-law part consistent with an exponent τ ≈ 1.0 [see Eq. (7)].

IV. RESULTS
In this section, we first consider the statistical properties of our dataset of stress-strain curves. Then we report our results for training the predictive models, and employ them to explore the predictability of the deformation dynamics of the pileup model. Finally, we also discuss our results on the problem of predicting individual strain bursts.

A. Statistical properties of the stress-strain curves
We start by quantifying the system size dependent sampleto-sample variations in the plastic response of the system. Figs. 3(a-c) show bivariate histograms of the stress-strain curves for three system sizes (with N = 16, 64 and 256, respectively), illustrating that the sample-to-sample variability   in the plastic response is more pronounced in smaller systems. The same applies to the probability densities of the sample flow stresses σ flow , shown in Fig. 3(d): The distributions are quite broad for small systems and get narrower upon increasing the system size, presumably approaching a delta peak in the thermodynamic limit. The average magnitude of the σ flowvalues is in agreement with previous results. 29 As the stress-strain curves consist of a sequence of discrete strain bursts or dislocation avalanches ∆ε, we consider also their stress-dependent size distributions P(∆ε, σ ). Quite generally for systems exhibiting a depinning phase transition, one expects them to obey the scaling form where τ is the critical exponent of the avalanche size distribution, f (x) is a scaling function that obeys f (x) = const. for x 1 and f (x) → 0 for x 1, and ∆ε * (σ ) is a stressdependent cutoff avalanche size. Left panels of Figs. 4(ad) show the bivariate histograms of the avalanche count as a function of σ and avalanche size ∆ε for different system sizes (with N = 64, 128, 256 and 512, respectively). These illustrate that indeed the maximum strain burst size increases upon approaching the flow stress from below, and the increase close to the flow stress is more clear-cut for larger systems. Notice also that these histograms show an excess of strain bursts around σ ≈ 0.4. This feature is likely related to the characteristic pinning force magnitude due to an isolated pinning point given by Eq. (2). Right panels of Figs. 4(a-d) show the corresponding stress-resolved strain burst size distributions P(∆ε, σ ), considering three different stress bins below the flow stress for each system size. For the largest system sizes considered, the data appears to be consistent with a power-law part characterized by an exponent τ ≈ 1.0 [see right panels of Figs. 4(c) and (d)]. This value is somewhat lower than the expectation of τ ≈ 1.25 based on the scaling of the elastic energy of the pileup in Fourier space, 29 suggesting that the pileup model would be in the same universality class with depinning of contact lines 35 and planar cracks. 36,37 As we focus here on studying rather small systems, this difference in likely due to finite size effects.
Overall, the statistical properties of the plastic response of the pileup model shows two key features relevant for deformation predictability: (i) There are sample-to-sample variations in the response (that we will try to predict using machine learning), and (ii) the distribution of strain bursts ∆ε displays characteristics of a critical system, suggesting that individual strain bursts may be inherently unpredictable. In what follows we will use our trained predictive models to explore how well (i) the sample stress-strain curves and (ii) individual strain bursts can be predicted.   5 shows the mean correlation coefficient between the predicted and actual stress values as a function of strain for N = 64, using linear regression (LASSO) and different variants of simple neural networks. First, we note that the flow stress prediction is only slightly better than the magnitude of the correlation with the 5% quantile of the pinning force landscape found above (∼ 0.8, see Fig. 2), with the correlation coefficient assuming values in the range of 0.825-0.84 depending on the model used (the large-strain plateau in Fig. 5). Second, the correlation clearly depends on strain, exhibiting a rather pronounced dip reaching values as low as 0.65 or so for intermediate strains, along with two additional, lesser dips for smaller strains, before reaching a value close to 1 as the strain goes to zero. The latter is true especially for LASSO, while the neural network models seem to struggle a bit in capturing the deformation dynamics for very small strains. The reduced predictability for intermediate strains could be related to the importance of inherently unpredictable dislocation avalanches, in analogy with results of Ref. 16 .
It is interesting to analyze which features the predictive models end up using for prediction, and how that might depend on strain. Since the models utilize heavy L 1 -regularization, analysis of the learned weights becomes straightforward. Features that contribute to predictability are allowed to have high regularization penalty in exchange for decreasing loss, while unimportant features will not decrease loss sufficiently to justify a high regularization penalty. Fig. 6 shows how much penalty is generated by the weights corresponding to each feature category for the LASSO model. At the start, the pinning force derivative at the relaxed dislocation positions, DF(x j ), is the most active feature, but also the pinning energy at the initial dislocation positions, E(x j ), is used. This is reasonable as such quantities could be used to estimate the initial linear response of the dislocations. Interestingly, and perhaps somewhat counter-intuitively, the pinning energy at the initial dislocation positions becomes even more important for intermediate strains, in addition to a larger diversity of features used. Flow stress prediction uses mainly the quantiles of the pinning force F, as well as its local extrema points. This dependence of the important features on strain indicates that predicting the response of the system to infinitesimal stresses is of a different nature compared to predicting the flow stress.

Predictability of stress-strain curves: convolutional neural networks
We then proceed to employ CNNs as predictive models. Fig. 7(b) shows the mean correlation as a function of strain as obtained using three different types of CNNs, along with the LASSO result for reference. Here, the system size N = 16 is considered. First, we consider a CNN which predicts the entire stress-strain curve at once [a "generalist" CNN, shown as a red line in Fig. 7(b)]. This model actually underperforms the simple linear regression model for majority of the strain values. This could be due to the fundamentally different nature of the problem for different strain values, as illustrated by Fig. 6, showing that different features are important for different strains. Hence, the "generalist" CNN might get exhausted and not be able to optimally learn the entire stress-strain curve at once.
Thus, we consider also a "specialist" CNN which is trained separately for each strain value to learn the corresponding value of stress. The result is shown as yellow circles in Fig. 7(b). Indeed, as compared to the generalist model, the specialist model is able to learn the deformation dynamics of the pileup model much better, especially for intermediate strains, where the generalist CNN exhibits a deep dip in the correlation coefficient, presumably due to the importance of hard to predict dislocation avalanches for those strains. It is interesting that the specialist model is able to reach a correlation coefficient exceeding 0.8 also in this regime.
Notably, neither of the models discussed above is able to predict the flow stress much better than by just considering the correlation with the 5% quantile of the pinning force F (see Fig. 2). We therefore try to further improve the specialist CNN by switching on L 1 -regularization and using only the pinning force F as the input field (this was shown above to be the most informative quantity for flow stress prediction). The individual purple point at a large strain value in Fig. 7(b) is the resulting flow stress prediction, with a correlation coefficient of 0.89, i.e., significantly better correlation than using any of the other models considered.

C. Predictability of strain bursts
Finally, we explore the possibility of predicting individual strain bursts along the stress-strain curves. First, we need to formulate a problem that is suitable for analysis using the predictive models at hand. To this end, we generalize the bivariate avalanche histograms (left panels of Fig. 4) to the case of individual samples. As such, these would be rather sparse sets of points in the space spanned by avalanche size and stress. We therefore smooth out these maps by convolving with a Gaussian with a standard deviation of 0.03 units of stress in the vertical direction, and of 0.15 units of log 10 (avalanche size) in the horizontal direction; both correspond to 3 times the bin size used in Figs. 4 and 8. The studied region is limited to bins where at least one avalanche appears within the dataset, so that the Gaussian smoothing does not cause distribution boundaries to extend to impossible areas. Lastly, the maps are standardized by removing the mean and dividing by the standard deviation at each bin. We then employ a simple neural network with 64 hidden units to find a mapping from the hand-picked input features to the smoothed sample-specific bivariate avalanche histograms.
Three different examples of smoothed avalanche maps are shown in the left column of Fig. 8, with corresponding predicted avalanche maps in the right column. While the predictions are clearly not perfect, the algorithm is able to capture some key features of these avalanche maps. Fig. 9 shows correlation maps between the predicted and actual avalanche densities at 4 different system sizes, averaged over 5 training instances. These show a number of interesting features: (i) Large parts of the sample-specific avalanche histograms appear to be completely unpredictable, with the correlation coefficient assuming a value very close to zero (shown in black in Fig. 9). This is especially true for most of the regions corresponding to the power-law scaling regime of the avalanche size distribution, in agreement with the idea that critical avalanches should be intrinsically hard to predict. 38 (ii) Avalanches with the largest size for each stress value (i.e., those belonging to the cutoff of the stress-resolved avalanche size distribution) appear to exhibit some degree of predictability, although the related correlation coefficient values are somewhat lower than those found for predictability of the stress-strain curves. This is agreement with the idea that avalanches in the cutoff are not critical and hence somewhat predictable. (iii) Surprisingly, we also find non-vanishing predictability of avalanches of any size occurring in the immediate proximity of the flow stress (bright horizontal segments in the upper parts of the panels of Fig. 9). This is surprising because those events are expected to be critical and hence should be unpredictable.
In order to shed some light on the origin of the key features of avalanche predictability discussed above, we finish by considering the correlation between the avalanche count and flow stress. The logic here is that since we have demonstrated above that the sample-specific flow stress can be predicted quite well, if the avalanche maps are correlated with the flow stress, these should be predictable as well. Fig. 10 reveals two stand-out features: (i) The avalanches taking place in the immediate proximity of the flow stress exhibit clear positive correlation with the flow stress, and (ii) the largest avalanches for stresses smaller than the flow stress (i.e., avalanches belonging to the stress-dependent cut-off of the avalanche size distribution) are negatively correlated with the flow stress. This means that both a higher than average number of avalanches very close to the flow stress and a lower than average number of large avalanches at smaller stresses imply a higher than average flow stress value. The good level of predictability of the flow stress demonstrated above thus translates into reasonable predictability of these features of the sample-specific bivariate avalanche histograms. Notice that the widths of the predictable bands in the avalanche maps ( Fig. 9) appear to become thinner with increasing system size, so we cannot exclude the possibility that this avalanche predictability would be a finite size effect.

V. DISCUSSION AND CONCLUSIONS
We have established that predictive models ranging from linear regression to CNNs can be used to predict the stressstrain curves of the pileup model with a good accuracy. While the different models employed give rise to somewhat different results, a practical conclusion is that the model giving the highest correlation coefficient for a given strain provides a lower limit for deformation predictability of the pileup system. Notably the "specialist" CNN discussed above results in correlation coefficients exceeding 0.8 over the entire range of strains considered, and adding regularization further improves the correlation coefficient of the flow stress prediction to 0.89. These should be interpreted as lower limits of deformation predictability as we obviously cannot exclude the possibility that a hypothetical predictive model not considered here might outperform our models.
While the predictability of the stress-strain curves turns out to be quite good, it is not perfect. This is to be expected because the stress-strain curves consist of critical-like strain bursts which are expected to be inherently hard to predict. Indeed, our attempts to predict the sample-level bivariate avalanche count resulted in essentially zero predictability for most of the "critical" dislocation avalanches belonging to the power-law distributed part of the avalanche size distribution. On the other hand, these sample-level avalanche distributions also showed an interesting regularity: A larger than average flow stress value indicates lower than average number of the largest avalanches for stresses below the flow stress, and larger than average number of avalanches in the immediate vicinity of the flow stress. It would be interesting to extend such analysis from our stress-controlled simulations to strain-controlled ones.
The pileup model studied here has the advantage that the dislocation system will sample the entire one-dimensional random energy landscape, and hence extracting relevant descriptors of the sample-specific quenched disorder is straightforward. This is in contrast to other systems exhibiting a depinning phase transition such as magnetic domain walls in disordered ferromagnetic thin films, 39 planar crack fronts in disordered solids, 36,37,40 or individual dislocation lines interacting with point defects. 41 In such systems, driven by an external force in a direction perpendicular to the average elastic line direction, the disorder sampled by a given realization is not known a priori. Nevertheless, it would be interesting to extend the present study to consider the predictability of depinning dynamics in such systems. Finally, we point out two crucial outstanding questions related to deformation predictability: The problem remains to be addressed in realistic three-dimensional DDD simulations, with 28,42 or without 25 additional defects interfering with dislocation motion, as well as in experiments where descriptors of the initial sample microstructure could be extracted, e.g., by X-ray measurement techniques. [43][44][45]