Learning-based approach to plasticity in athermal sheared amorphous packings: Improving softness

The plasticity of amorphous solids undergoing shear is characterized by quasi-localized rearrangements of particles. While many models of plasticity exist, the precise relationship between plastic dynamics and the structure of a particle's local environment remains an open question. Previously, machine learning was used to identify a structural predictor of rearrangements, called "softness." Although softness has been shown to predict which particles will rearrange with high accuracy, the method can be difficult to implement in experiments where data is limited, and the combinations of descriptors it identifies are often difficult to interpret physically. Here we address both of these weaknesses, presenting two major improvements to the standard softness method. First, we present a natural representation of each particle's observed mobility, allowing for the use of statistical models which are both simpler and provide greater accuracy in limited data sets. Second, we employ persistent homology as a systematic means of identifying simple, topologically-informed, structural quantities that are easy to interpret and measure experimentally. We test our methods on two-dimensional athermal packings of soft spheres under quasi-static shear. We find that the same structural information which predicts small variations in the response is also predictive of where plastic events will localize. We also find that excellent accuracy is achieved in athermal sheared packings using simply a particle's species and the number of nearest neighbor contacts.


I. INTRODUCTION
Machine learning has proven effective in identifying a structural quantity, softness, which predicts plastic rearrangements in solids [1-3]. In crystals, defects in crystalline order such as dislocations and grain boundaries - around which rearrangements are known to localize [4,5] - are typically characterized by high softness [3]. In disordered solids, softness succeeds remarkably well at addressing the longstanding challenge of identifying a structural indicator of a particle's propensity to rearrange. In particular, in supercooled liquids, the probability that a particle will rearrange depends approximately exponentially on the particle's softness, spanning several orders of magnitude [2]. Although softness is highly predictive in a wide range of systems studied in both simulations and experiments [1,2,6], it still suffers from some significant drawbacks.
The first drawback is a practical one. Calculation of softness is significantly constrained by the need for training examples of rearranging particles, which constitute a very small fraction of the total number of particles in the system [2]. This substantially increases the number of independent configurations needed, reducing the method's practical use for analyzing limited experimental data.
The second drawback is a scientific one. Although softness yields insight into the underlying physics of glassy systems and can be considered a quantification of the old idea of a cage [2], its meaning in terms of the local structure can be difficult to interpret because it is defined in terms of a large number of local parameters. This diminishes the insight that it can provide for the development of a structural theory of plasticity from first principles.
To calculate softness, a support vector machine (SVM) is trained to sort particles into one of two classes based on their current local structure: particles that are likely to participate in a rearrangement in the future (those with "high softness" environments) and particles that are not likely to participate in a rearrangement (those with "highly negative softness" environments). To train the classifier, examples of rearranging and non-rearranging particles (which we will refer to as "rearrangers" and "non-rearrangers", respectively) must first be identified by observing a series of configurations undergoing rearrangement events in either simulation or experiment. The SVM then attempts to find a hyperplane which best separates the two classes of particles within a high-dimensional space of structural descriptors. These descriptors are derived from each particle's local pair correlation function. Softness is computed as the signed distance from the hyperplane: particles located far from the hyperplane in the positive direction are softer, and therefore more likely to rearrange, while particles located far from the hyperplane in the negative direction are considered harder and less likely to rearrange.
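The classification step above can be sketched in a few lines. This is a minimal illustration, not the pipeline used in the paper: the descriptor matrix here is a synthetic stand-in for the Behler-Parrinello structure functions, and the labels are generated from a hypothetical linear rule plus noise.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Synthetic stand-in for structural descriptors: one row per particle,
# one column per structure function. Labels: +1 rearranger, -1 non-rearranger.
X = rng.normal(size=(1000, 20))
y = np.where(X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=1000) > 0, 1, -1)

scaler = StandardScaler().fit(X)
svm = LinearSVC(C=1.0, max_iter=10000).fit(scaler.transform(X), y)

# Softness is the signed distance from the separating hyperplane.
softness = svm.decision_function(scaler.transform(X)) / np.linalg.norm(svm.coef_)
```

Particles with large positive softness lie deep on the "rearranger" side of the hyperplane; large negative softness indicates a hard environment.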
Here we propose two refinements to the softness calculation, addressing both drawbacks of the previous approach. First, we consider local variations in the displacement far from a plastic event to define a dynamical quantity which effectively characterizes a particle's susceptibility to rearrangements, or mobility. Since we can calculate this quantity for each particle in a system, we avoid the problem of having to choose examples of relatively mobile and immobile particles, converting the softness problem into one of regression rather than classification. This greatly increases the amount of data available from a single configuration and thereby improves performance when applying this technique to experimental systems or simulations with limited data.
Second, we use persistent homology, a form of topological data analysis, to systematically define a set of simple local structural parameters in a physically meaningful way, eliminating much of the guesswork. We demonstrate how to combine persistent homology with a machine-learning-based approach to identify a new version of softness that captures correlations between the dynamics and the local topological structure for each particle. We compare both aspects of our new approach with current methods to compute softness and demonstrate that it is just as effective. We find that the same structural information which predicts local fluctuations in the displacement field is also predictive of rearrangements. Furthermore, we find that excellent accuracy can be achieved with very few structural descriptors, in this case simply a particle's species and the number of nearest-neighbor contacts.
This article is organized as follows: In Sec. II, we describe our process for generating configurations of particles at the onset of rearrangement. In Sec. III, we describe how particle dynamics are characterized to define the original softness, and introduce our new measure of effective particle mobility. We explain how the choice of characterization determines which type of statistical model is appropriate, which in turn affects the accuracy of the softness method when data is limited. In Sec. IV, we apply persistent homology to our configurations and interpret the results of the procedure. Next, Sec. V shows how to find correlations between the dynamical and structural characterizations we have developed and uses the resulting insight to define a new set of structural descriptors. Finally, Sec. VI describes the results of our analysis, with further discussion in Sec. VII.

II. ONSET OF PLASTIC REARRANGEMENTS
To analyze the relationship between structure and dynamics which underlies plasticity, we generate an ensemble of configurations of particles undergoing plastic rearrangement.We first create a collection of jammed packings of soft spheres and then athermally shear each configuration until the onset of rearrangement.
Prior to shearing, we prepare each configuration using standard methods from the study of jamming [7]. We start by placing N = 2^14 particles at random positions in a square periodic box in d = 2 dimensions. To reduce the probability of creating packings with crystalline structures, we choose a 50%:50% bidisperse mixture of particles with radii of σ and 1.4σ, where σ is the radius of the smaller particles. The size of the box is chosen to achieve an average packing fraction of φ = 0.95, well above the jamming density for this ratio of radii. Particles interact according to a finite-ranged soft pairwise (Hertzian) potential defined by

U(r_ij) = (2ε/5) [1 - r_ij/(R_i + R_j)]^(5/2) for r_ij < R_i + R_j, and U(r_ij) = 0 otherwise,

where R_i is the radius of particle i, r_ij is the distance between particles i and j, and ε sets the energy scale. Next, we apply the FIRE algorithm to find a local minimum of the energy for the configuration, i.e. a mechanically stable state [8]. We generated approximately 400 independent configurations in this way, each from a different set of random initial conditions. Next, we examine each of these configurations under athermal quasistatic shear, broken up into a sequence of small strain steps. At each step, we apply a simple shear strain of 10^-5 by changing the shape of the simulation cell and re-minimize the energy to find a new stable state. Before energy minimization takes place, we make an educated guess as to where the particles will move up to linear order. During the first step, particles are displaced affinely according to the applied global strain, while in subsequent steps, the particles are instead moved along the non-affine displacement field produced by the previous strain step. Since the elastic response of this model is almost piecewise linear, this produces a state closer to the new energy minimum and reduces the time spent on minimization.

Figure 1. (a) A typical stress-strain curve for a packing in our simulations. Elastic branches are broken up by sudden plastic events where the shear stress drops; it is the initial rearrangement during these events which we study. (b) The lowest 5 normal mode frequencies of the dynamical matrix near one of these plastic events, showing that a single mode frequency goes to zero. The set of displacements described by this critical mode is the initial rearrangement for this plastic event.
As the system is strained, the shear stress and energy rise; after a sufficient amount of shear strain, the two drop suddenly, as shown in Fig. 1. These stress drops correspond to plastic (irreversible) events in which particles often change neighbors [9]. When such an event is detected, we back up to the configuration prior to the event, and approach it a second time with a smaller strain step size. This process is repeated until the event is passed through with a strain step size of 10^-12. At this very small strain step, the main source of numerical error becomes the finite error of the minimization algorithm, rather than the finite strain step. We note that similar algorithms have been used in previous work [10,11].
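The backtracking logic of this event-location procedure can be sketched as follows. This is a schematic only: `stress_at` is a hypothetical stand-in for "apply strain γ, minimize the energy (e.g. with FIRE), and measure the shear stress"; here it is replaced by a toy stress-strain curve with a single drop.

```python
def locate_event(stress_at, gamma0, dgamma=1e-5, dgamma_min=1e-12):
    """Backtracking search for the strain at which the stress first drops.

    Advance along the elastic branch with step dgamma; when a stress drop
    is detected, back up and approach the event again with half the step,
    until the step size reaches dgamma_min.
    """
    gamma = gamma0
    s = stress_at(gamma)
    while dgamma >= dgamma_min:
        s_next = stress_at(gamma + dgamma)
        if s_next < s:          # stress drop: plastic event inside this step
            dgamma /= 2.0       # back up and retry with a smaller step
        else:                   # still elastic: advance
            gamma += dgamma
            s = s_next
    return gamma                # strain just before the event, to ~dgamma_min


# Toy model: stress rises linearly, then drops at a critical strain.
gamma_c = 0.0123456789
gamma_event = locate_event(lambda g: g if g < gamma_c else g - 0.5, 0.0)
```

The returned strain sits just below the event, within roughly the final step size of the true critical strain.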
Our goal is to identify which particles are structurally predisposed to rearrange during one of these plastic events. However, these events can often involve a sequence of many smaller rearrangements, as the movements of some particles can induce the movements of others, resulting in an avalanche of rearrangements [9]. A particle with a relatively hard structure may become softer during this sequence of rearrangements; thus, the structure at the beginning of the event should not be expected to strongly predict motion towards the end of the event.
Therefore, we specifically try to identify structures which correlate with the initial motion at the beginning of an event, following earlier work [12]. At the onset of a plastic event, the system becomes linearly unstable toward a single direction in the dN-dimensional space of particle coordinates, denoted x_i for particle i; this direction corresponds to the onset of the initial rearrangement [9]. We identify this direction by diagonalizing the Hessian (i.e. the matrix of second derivatives of the energy, or dynamical matrix) and identifying the eigenvector whose eigenvalue goes to zero at the onset of instability, as shown in Fig. 1. We denote this "critical mode" u, with the displacement of particle i denoted u_i. An example of such a critical mode is shown in Fig. 2. During each shear trajectory, we record the first 10 particle rearrangement events. We utilize the first event in each trajectory for our analyses and the remaining events to identify examples of particles that are unlikely to rearrange in the future for the classification-based approach (see next section).
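Extracting the lowest-frequency modes of the dynamical matrix is a standard shift-invert eigenvalue computation. The sketch below uses a random symmetric positive-definite matrix as a stand-in for the true dN x dN Hessian of a packing (an assumption; building the real Hessian requires the pair potential and contact network).

```python
import numpy as np
from scipy.sparse.linalg import eigsh

rng = np.random.default_rng(1)

# Stand-in for the dN x dN dynamical matrix of a 2D packing.
A = rng.normal(size=(200, 200))
H = A @ A.T + 1e-3 * np.eye(200)

# Shift-invert about zero finds the eigenvalues closest to zero, i.e. the
# lowest-frequency modes; the critical mode is the one whose eigenvalue
# vanishes at the instability.
vals, vecs = eigsh(H, k=5, sigma=0, which='LM')
order = np.argsort(vals)
critical_mode = vecs[:, order[0]]
frequencies = np.sqrt(np.sort(vals))   # normal mode frequencies omega
```

Near a plastic event, the smallest of these frequencies would be observed to approach zero, as in Fig. 1(b).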

III. DYNAMICAL CHARACTERIZATION AND SUPERVISED LEARNING STRATEGY
The softness method relies on measuring each particle's mobility by observing configurations of particles undergoing rearrangements in either simulated or experimental systems. To accomplish this, the method requires a means to quantify the amount by which each particle participates in a given rearrangement. This choice of dynamical characterization determines the type of statistical model that is most appropriate to identify correlations between particle dynamics and local structure. In this section, we describe the characterization of particle dynamics used by the original softness method and how it leads to a classification-based approach. We then show how the original approach may be generalized to allow for a simpler regression-based model, greatly improving its power when applied to limited data. By choosing a more natural characterization, we are better able to take advantage of available data.

A. Classification via Dynamical Outliers
In order to define an observable proxy for particle mobility, we start with the critical mode u describing the onset of a rearrangement, as defined in the previous section. Since rearrangements are characterized by the relative motion within local neighborhoods of particles, we use a measure of the local non-affine motion in each particle's environment, commonly referred to as D²_min [13]. This ensures that particles which move together as rigid clusters are assigned small measures of motion, even if they display large displacements. We define this quantity for a particle i in the standard way,

D²_min(i) = (1/|N_i(ℓ)|) Σ_{j ∈ N_i(ℓ)} |Δu_ij - (F - I) Δx_ij|²,

where F is the deformation gradient matrix of size d × d calculated to minimize D²_min(i). The sum iterates over all particles in the neighborhood of particle i (excluding itself), represented by the set N_i(ℓ) with size |N_i(ℓ)|. The size of the neighborhood is set by a discrete cutoff distance ℓ, which we calculate using the minimum path length between pairs of particles in the Delaunay triangulation of the configuration (typically equivalent to a Euclidean distance of 2-3 particle diameters; see Sec. B 1 for a comprehensive discussion). The vector Δx_ij = x_j - x_i is the position of particle j relative to i and Δu_ij = u_j - u_i is the relative displacement between the two particles calculated from the critical mode.
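The minimization over F above has a closed-form least-squares solution, so D²_min can be computed per particle as sketched below (a minimal illustration of the Falk-Langer construction; the neighbor bookkeeping via the Delaunay graph is assumed to happen elsewhere).

```python
import numpy as np

def d2min(dx, du):
    """Non-affine motion D2_min for one particle.

    dx: (n, d) relative positions x_j - x_i of the n neighbors.
    du: (n, d) relative displacements u_j - u_i from the critical mode.
    Returns (D2_min normalized by the number of neighbors, best-fit F).
    """
    # Minimize sum_j |du_j - (F - I) dx_j|^2 over the d x d matrix F:
    # a linear least-squares problem for E = (F - I)^T.
    E, _, _, _ = np.linalg.lstsq(dx, du, rcond=None)
    residual = du - dx @ E
    return np.sum(residual**2) / len(dx), np.eye(dx.shape[1]) + E.T
```

As a sanity check, a neighborhood that deforms purely affinely (du = (F - I) dx for some fixed F) gives D²_min = 0 and recovers F exactly.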
Fig. 2(a) depicts D²_min for each particle in a configuration at the onset of a rearrangement. The inset shows the local neighborhood at the "source" of the rearrangement (defined as the particle with the largest value of D²_min) with arrows depicting the critical mode u, indicating the initial particle movements. We expect u to decay like r^(1-d), where r is the distance from the source of the event and d is the dimension [14]. This suggests a power-law dependence of r^(-d) for the non-affine motion as measured by D²_min, which depends on the difference in motion between adjacent particles and thus scales like the strain.
Once D²_min has been calculated, the naive approach would be to perform a simple linear regression to determine the correlation between each particle's observed D²_min and a measure of its local structure. However, the power-law dependence of D²_min poses a practical problem: it is both system-size dependent and ranges over many orders of magnitude for a given rearrangement (e.g. almost 10 orders of magnitude in the example depicted in Fig. 2(a)). The result is that any correlations between D²_min and the structural quantities of interest are weakened. Indeed, we find that a linear model based on D²_min and the various structural quantities we consider in this work accounts for only a small percentage of the observed variance.
To avoid this issue, the original softness method converts the problem into one of classification, defining two classes of particles: rearrangers, which will rearrange in the near future, and non-rearrangers, which have not rearranged for a long time (or strain, in this case). To train the classifier, examples of particles from both classes are identified by imposing strict cutoff thresholds on observed values of D²_min. In each configuration, examples of rearrangers are found by choosing particles such that D²_min is greater than a cutoff q_r. Similarly, examples of non-rearrangers are chosen by identifying particles with low observed D²_min within a relatively long window of time into the future. In this work, we set an upper threshold q_nr on the maximum D²_min experienced by a particle within a window of 10 rearrangement events into the future (including the current event).
This strategy reduces noise by limiting training to dynamical outliers, or particles in the tails of the distribution of D²_min, as it is much easier to distinguish between particles in these two regimes. The primary problem with this approach is that there is not always a natural choice for the thresholds q_r and q_nr. As we will demonstrate, the choice of these cutoffs can dramatically affect the resulting classification accuracy of the model. To maximize the accuracy, these thresholds must be made very strict, greatly reducing the number of particles that can be utilized from each individual configuration. As a result, achieving adequate accuracy with such a strategy would require a prohibitively large number of samples. While loosening the strictness of the thresholds increases the number of samples, it also introduces noise, as some particles may have large values of D²_min simply due to their proximity to the rearrangement, even though structurally they are indistinguishable from non-rearranging particles. Similarly, particles with small values of D²_min may simply be far away from the source of the rearrangement, but still have local structures with particularly low stability.
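The threshold-based labeling can be sketched as below. The function and its argument names are illustrative, not the paper's code; the point is that every particle outside the two tails receives no label and is discarded from training.

```python
import numpy as np

def label_outliers(d2_now, d2_future_max, q_r, q_nr):
    """Select training examples from the tails of the D2_min distribution.

    d2_now: D2_min in the current event.
    d2_future_max: each particle's maximum D2_min over the next 10 events.
    q_r, q_nr: rearranger / non-rearranger cutoffs (their values are the
    judgment call discussed above).
    Returns labels: +1 rearranger, -1 non-rearranger, 0 unused.
    """
    labels = np.zeros(len(d2_now), dtype=int)
    labels[d2_now > q_r] = 1                                # tail of large motion
    labels[(d2_future_max < q_nr) & (labels == 0)] = -1     # persistently quiet
    return labels
```

Tightening q_r and q_nr shrinks both labeled sets, which is exactly the data-efficiency problem the regression approach of the next section avoids.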

B. Regression via Locally Rescaled Motion
To remedy these problems, we note that previous studies have determined that a typical jammed packing can have many soft spots, or local regions with particularly low energy barriers to rearrangement, with the source of the rearrangement contained within just one (or a small number) of them [15]. Since a rearrangement in a homogeneous system would consist of a local plastic event surrounded by a decaying strain field described by the Eshelby kernel [14], we hypothesize that the related critical mode will consist of such a field interacting with the underlying structural softness field. The result of this interaction will manifest as small-scale variations on top of the continuum response. In this view, soft spots are not associated simply with particles with relatively large values of D²_min, but rather with particles with large observed motion relative to particles within their local environment. Therefore, we would like to normalize D²_min so that it is independent of position relative to the source of the rearrangement, but still captures particle-level variations in the response.
To accomplish this, we introduce a simple modification to the D²_min field. For each particle within the ith particle's neighborhood, including itself, we measure D²_min and calculate the neighborhood average ⟨D²_min⟩_{N_i(ℓ)}. We then rescale D²_min(i) by this average value to obtain

Δ²_min(i) = D²_min(i) / ⟨D²_min⟩_{N_i(ℓ)}.

For simplicity we consider the same neighborhood of particles for both the original calculation of D²_min and this locally rescaled version, but in principle, each could be chosen separately. We choose the discrete cutoff distance ℓ such that it is large enough to capture local variations in D²_min, but not so large as to lose information about potential soft spots. We find that a distance of ℓ = 2 is as small as possible while still capturing local variations in the critical mode, commensurate with known length scales of D²_min spatial correlations [6]. Fig. 2(b) depicts this locally rescaled measure of the non-affine deformation for the rearrangement in Fig. 2(a). We see that Δ²_min still captures local fluctuations in the response, but eliminates distance and angular dependencies without having to fit a functional form or explicitly address finite-size effects. We posit that this measurement of a particle's participation in a rearrangement is a proxy for the particle's mobility. Since Δ²_min no longer varies over many orders of magnitude with distance from the rearrangement source, standard linear regression now becomes practical. This change of training strategy allows us to avoid choosing cutoffs to identify training examples (rearrangers and non-rearrangers). Instead, we can utilize all of the particles in the system. We will see that this dramatically improves predictive accuracy when data is limited.
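The rescaling itself is a one-pass operation over the neighbor graph, sketched below (the neighbor lists from the Delaunay graph at graph distance ℓ are assumed given).

```python
import numpy as np

def rescale_d2min(d2, neighbors):
    """Locally rescaled motion: Delta2_min(i) = D2_min(i) / <D2_min>_{N_i}.

    d2: length-N array of D2_min values.
    neighbors: list where neighbors[i] holds the indices of particles within
    graph distance ell of particle i (excluding i itself).
    """
    delta2 = np.empty_like(d2)
    for i, nbrs in enumerate(neighbors):
        local = np.append(d2[nbrs], d2[i])  # neighborhood including i itself
        delta2[i] = d2[i] / local.mean()
    return delta2
```

A particle moving exactly as much as its neighborhood average gets Δ²_min = 1; soft spots show up as values well above 1 regardless of distance from the event source.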

IV. STRUCTURAL CHARACTERIZATION OF LOCAL PARTICLE ENVIRONMENT
Now that we have defined a dynamical quantity that locally quantifies each particle's participation in a rearrangement, the next step is to characterize each particle's local structure. Typically, a choice must be made as to which aspects of the local structure to measure. The original softness method uses structural descriptors derived from a particle's local pair correlation function [1]. These structural descriptors were originally proposed by Behler and Parrinello as a means to parameterize potential energy surfaces for use in density-functional theory [16]. In effect, the Behler-Parrinello (BP) descriptors form an arbitrary basis which provides an over-determined representation of local structure (see Appendix B 1 a). Because they are not specialized for any particular system, the number of necessary descriptors can be very large, even after many redundant or non-informative features have been eliminated by the training process. This means that the resulting form of softness, composed of a linear combination of these parameters, can be difficult to interpret.
Rather than choose an arbitrary basis of descriptors, here we turn to persistent homology, a method in topological data analysis, to systematically identify a natural set of descriptors. This procedure minimizes guesswork and provides descriptors that are both tailored to the system of interest and easier to interpret. In the past, the persistence algorithm has been used to study various topological aspects of configurations of particles in two dimensions and higher [17-19]. In this section, we outline the procedure for applying the persistence algorithm to jammed packings of particles. We then measure the statistical properties of the topological features within such systems and explain their physical interpretations.

A. Persistent Homology
Persistent homology is a technique that detects and characterizes topological features contained within geometrically and/or topologically structured data [20,21]. In this case, we use it to characterize each two-dimensional configuration of jammed soft spheres at the onset of rearrangement. For each particle i we know its position x_i and its interaction radius R_i. In general, the types of topological features the persistence algorithm can detect include connected components, loops, voids, etc. While the first two types of features are relevant in two dimensions, we primarily focus on loops in this study (or one-dimensional cycles, which we refer to simply as cycles from now on).
To perform the persistence algorithm on a configuration of particles, we perform a filtration of its weighted Delaunay triangulation [20]. To apply this filtration, we start by calculating the weighted Delaunay triangulation of the configuration of particles, using the squared radius R²_i of each particle as its weight. Fig. 3(a) depicts a configuration and its associated weighted Delaunay triangulation. This triangulation has the property that each contact between a pair of particles corresponds to an edge in the triangulation (although the converse is not always true). This ensures that it encodes the particle contact topology and therefore the mathematical constraints of the system. In two dimensions, this Delaunay triangulation is composed of three different types of simplices: vertices, edges, and triangles (in three dimensions we would also have tetrahedra). The filtration assigns an ordering to each of these elements from which we can build up the triangulation piece by piece. The subsets of the triangulation we observe at each step are called alpha complexes, providing representations of the configuration at different length scales until we achieve the full Delaunay triangulation.
To find the ordering of simplices, we place a disc (or d-dimensional ball in d dimensions) of radius √(R²_i + α) at the center of each particle i, and use the control parameter α to gradually increase the size of these discs. At the value α = 0, each disc has the same radius as its corresponding particle in the configuration, while for α < 0 (> 0) each disc is smaller (larger) than its corresponding particle. Initially, we start with a value of α = -σ², such that none of the discs overlap, where σ is the minimum interaction radius of all the particles. As shown in Fig. 3(b), particles with radii of σ initially appear as points, while particles with larger radii are finite discs. Each point or disc represents a separate connected component corresponding to a single vertex in the Delaunay triangulation.
At this point, the alpha complex consists of all of the vertices, but none of the edges or triangles. As α increases, we consider the union of the discs; when a pair of discs starts to overlap and there exists an edge between the corresponding vertices in the full Delaunay triangulation, we add the edge to the alpha complex. Similarly, at the instant that a triplet of discs starts to overlap at a single point and there exists a triangle composed of the associated vertices in the Delaunay triangulation, we add the triangle to the alpha complex.
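For a single edge, the α value at which the two growing discs first touch can be found in closed form: with disc radii √(R² + α), the condition √(R_i² + α) + √(R_j² + α) = r_ij reduces to a linear equation in α. The sketch below covers this generic case only (edges whose touching point is blocked by a third particle's power cell enter the alpha complex slightly later, a subtlety we ignore here).

```python
import numpy as np

def edge_alpha(r_ij, R_i, R_j):
    """Alpha at which the discs of two particles first touch.

    Solves sqrt(R_i**2 + a) + sqrt(R_j**2 + a) = r_ij for a:
    squaring twice gives a = ((r**2 - R_i**2 - R_j**2)**2 / 4 - R_i**2 R_j**2) / r**2.
    """
    a, b = R_i**2, R_j**2
    c = r_ij**2 - a - b
    return (c**2 / 4.0 - a * b) / r_ij**2

# A contact (particles overlapping at alpha = 0) is born at negative alpha,
# a gap at positive alpha.
alpha_contact = edge_alpha(1.9, 1.0, 1.0)
alpha_gap = edge_alpha(2.1, 1.0, 1.0)
```

Note that two particles exactly in contact (r_ij = R_i + R_j) give α = 0, consistent with the convention that the original configuration corresponds to α = 0.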
Once enough edges have been added, a cycle of edges may appear surrounding a hole in the union of discs. When this occurs, we say that the cycle is "born" and record the value of α at that instant, α_b. Fig. 3(c) shows the birth of a new cycle at α_b = -0.014σ², highlighted in red with the participating discs in blue. As α further increases, the hole that the cycle surrounds can break up into smaller holes as edges are added, and shrink as triangles are placed in the alpha complex. Eventually, when a hole is completely filled in, we say the corresponding cycle has "died" and again record the value of α for this event, α_d. Fig. 3(d) shows the death of the red cycle at α_d = 0.47σ². At this instant, the triangle highlighted in green is placed into the alpha complex, plugging the hole that the cycle surrounds.
The value α_b for each cycle measures the length of the largest edge comprising that cycle, while α_d measures the overall scale of the cycle. We continue increasing α until the discs fill all of space and the Delaunay triangulation is complete. In this way, each cycle that appears during the filtration is assigned a birth-death pair (α_b, α_d) encoding its inherent length scales. We plot this birth-death pair on a persistence diagram as demonstrated in Fig. 3(e). The collection of all birth-death pairs encodes the complete topological information at all length scales contained within the configuration. For a more detailed mathematical explanation of the persistence algorithm, we refer the reader to Ref. 20. We generate weighted Delaunay triangulations using CGAL [22]. We also note that CGAL can be used to compute α-values and the associated filtrations, although we used our own implementation.
All the characteristics of the persistence diagrams we have noted correspond to different types of prominent features present in the particle configurations - in this case one-dimensional cycles. In order to interpret the precise meanings of these features, we identify and classify each cycle according to both its size and composition. The size is determined by the number of particles, or equivalently, the number of edges in the underlying Delaunay triangulation. Composition is determined by the types (radii) of the particles involved, along with the types of the edges. In the original configuration (corresponding to α = 0 in the filtration, where each disc has the same radius as its particle), an edge corresponds to two particles which either overlap or do not overlap, which we call contacts and gaps, respectively. Cycles can be composed of any combination of contacts and gaps. If a cycle is born with α_b < 0, then its largest edge corresponds to a contact and the cycle must therefore be composed entirely of contacts. Conversely, if a cycle is born with α_b > 0, its largest edge is a gap, but it may also contain some contacts. Finally, if a cycle has more than three edges, we can assess whether it contains any gaps in its interior. In two dimensions, each cycle is the boundary of a two-dimensional surface composed of triangles. An interior edge is one contained in this surface, but not located on its boundary.
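The contact/gap labeling of edges at α = 0 is a simple geometric test, sketched below (function names and the edge-list format are illustrative; in practice the edges come from the weighted Delaunay triangulation).

```python
import numpy as np

def classify_edges(positions, radii, edges):
    """Label each Delaunay edge as a contact (overlap at alpha = 0) or a gap.

    positions: (N, d) particle positions; radii: length-N radii;
    edges: iterable of index pairs (i, j).
    """
    labels = {}
    for i, j in edges:
        r_ij = np.linalg.norm(positions[j] - positions[i])
        labels[(i, j)] = 'contact' if r_ij < radii[i] + radii[j] else 'gap'
    return labels

def cycle_has_gap(cycle_edges, labels):
    """A cycle born at alpha_b > 0 contains at least one gap."""
    return any(labels[e] == 'gap' for e in cycle_edges)
```

Combined with the birth value, this reproduces the classification above: α_b < 0 implies all-contact cycles, while α_b > 0 implies at least one gap.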
Since the persistence algorithm does not provide a unique representation of the cycles it detects, we choose a particular representation of each cycle when it is born. We explain our procedure for identifying these birth cycles in Appendix A 1; the exact choice we make does not affect the overall results. Once we have associated each point in the persistence diagram with a particular cycle, we see that specific types of cycles have births and deaths in different regions of the persistence diagram. In Figs. 4(b-i)-(b-iv), we sort cycles according to size and composition in terms of contacts and gaps. Inset within each panel is a representative example of the type of cycle observed in that region of the persistence diagram. As shown in Figs. 4(b-i) and (b-ii), cycles with more than three edges are located all throughout the vertical band. Cycles which contain at least one gap in their interior are located in the upper part, while those without interior gaps concentrate in the lower part. Fig. 4(b-iii) shows that the lower part of the band also contains triangular cycles, containing exactly three contacts. Since the vertical band is located at α_b < 0, all of these cycles are composed solely of contacts. Depicted in Fig. 4(b-iv), the band running along the diagonal also contains triangular cycles composed of exactly three edges, coinciding with and extending out from the lower part of the vertical band. When α_b > 0, these triangular cycles will always contain one or more gaps.
The decomposition of the persistence diagram can be taken one step further to understand the effects of particle size and position on the triangular cycles. In Figs. 4(c-i)-(c-iv), we have sorted the triangular cycles into groups based on the combinations of particle sizes. Again, inset within each panel is a representative example of a triangular cycle in that region of the persistence diagram. Since there are two possible particle radii, we observe four different combinations of three particles: (c-i) three small particles, (c-ii) two small particles and one large particle, (c-iii) one small particle and two large particles, and (c-iv) three large particles.
For each of these cases, we can analytically calculate a birth-death curve which approximates the sub-band, highlighted by the orange curves. Starting with three particles arranged into a triangle with three contacts, we calculate α_b and α_d as we continuously open up one of the contacts into a gap. We explain this calculation in more detail in Appendix A 2. As the gap opens up and the triangle becomes more elongated and less regular, both α_b and α_d increase while the difference between them decreases. Eventually, the triangle elongates so much that α_b = α_d and the curve ends at its intersection with the diagonal line. Each of the different combinations of particles yields a separate curve. In the cases with one large particle and two small particles, or one small particle and two large particles, there are two places where the gap can be placed: between particles of the same type or between particles of differing types. This means we can calculate two different curves for each of these cases. However, these curves are so close together that it is difficult to distinguish between them within the corresponding sub-bands. In addition, contacts in these cycles can have varying amounts of overlap, and sometimes cycles can have more than one gap. Both of these effects contribute to the widths of the sub-bands.
In summary, calculating composite persistence diagrams for our jammed configurations provides a rigorous statistical representation of local topological structures. By identifying the cycle corresponding to each point and then decomposing the persistence diagrams accordingly, we can fully understand how different types of cycles correspond to features we observe in the composite persistence diagram for all cycles.

V. CONNECTING DYNAMICS AND STRUCTURE
Now that we have established quantitative descriptions of both a particle's dynamics during a rearrangement and its local structure, we search for correlations between the two. To do this, we color each pixel in the composite persistence diagrams according to the average amount of motion undergone by the cycles present in that pixel. For each cycle, we measure the value of either D²_min or Δ²_min averaged across all particles in that cycle, which we denote ⟨D²_min⟩ and ⟨Δ²_min⟩, respectively. Next, we average each cycle-defined measure of dynamics for each pixel in the persistence diagrams across all cycles present in that pixel. Fig. 5(a) shows the result for ⟨D²_min⟩; we observe almost no correlation between a cycle's location in the persistence diagram and ⟨D²_min⟩. The only exception we observe is a very narrow vertical band at α_b = 0 which contains slightly larger measures of motion. This indicates that particles that participate more in rearrangements tend to have contacts with such small numerical values of overlap that they are almost gaps. However, this signal is very weak.
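The pixel-wise averaging described here can be sketched as a 2D binned mean over the persistence diagram. The function and variable names (`pixel_average`, `cycle_motion`) and the bin count and diagram extent are illustrative assumptions, not the authors' code.

```python
import numpy as np

def pixel_average(alpha_b, alpha_d, cycle_motion, bins=50, extent=(-1, 1, -1, 1)):
    """Average a per-cycle measure of motion (e.g. <D^2_min> or <Delta^2_min>)
    over pixels of the composite persistence diagram (alpha_b, alpha_d)."""
    rng = [[extent[0], extent[1]], [extent[2], extent[3]]]
    # Sum of the motion measure and number of cycles falling in each pixel
    sums, _, _ = np.histogram2d(alpha_b, alpha_d, bins=bins, range=rng,
                                weights=cycle_motion)
    counts, _, _ = np.histogram2d(alpha_b, alpha_d, bins=bins, range=rng)
    # Empty pixels are marked NaN so they can be left uncolored in a plot
    return np.where(counts > 0, sums / np.maximum(counts, 1), np.nan)
```

Plotting the returned array with a diverging colormap reproduces the style of coloring used for Fig. 5.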
In contrast, Fig. 5(b) depicts the persistence diagram correlated with Δ²_min, the normalized measure of motion.
Here we observe very strong correlations between motion and cycle type: cycles located in the lower left-hand corner typically sit in neighborhoods with relatively low amounts of motion relative to their surroundings, while cycles located in either of the two bands tend to participate more strongly in rearrangements. This contrast is especially strong in the diagonal band, with an almost step-like jump in ⟨Δ²_min⟩ occurring across the α_b = 0 line.
In Figs. 5(c-i)-(c-iv), we correlate Δ²_min with the persistence diagrams of the four classes of cycles. Insets depict examples of the types of cycles represented in each panel. We see in Figs. 5(c-i) and (c-iii) that cycles with more than three edges and no interior gaps, along with triangles with no gaps, typically have low Δ²_min, colored in red. On the other hand, Figs. 5(c-ii) and (c-iv) show that cycles with more than three edges that contain interior gaps, along with triangles that contain at least one gap, typically have high Δ²_min, colored in blue. This correspondence between gaps and Δ²_min is also present in the sub-bands comprising the full diagonal band. The sub-band curves we show in Figs. 4(c-i)-(c-iv) correspond to triangles with exactly one gap. If a triangle has more than one gap, it will have a larger α_d, moving the cycle upwards in the persistence diagram away from the associated curve. We see in Fig. 5(b) that the regions of the persistence diagrams corresponding to triangles with more than one gap are a darker blue than those with one gap, indicating larger values of Δ²_min. From all of these observations, we posit that a particle's participation in a rearrangement relative to its local environment, as measured by Δ²_min, is determined by the presence or absence of gaps, or conversely, the number of contacts. The more gaps, or fewer contacts, a particle shares with its nearest neighbors, the larger its participation in a given rearrangement will be relative to its local neighborhood.

A. Topologically-Informed Structural Descriptors
Based on these observations, we use the numbers of gaps and contacts in a particle's local environment to construct a set of local structural descriptors. In order to allow for the possibility that a particle is affected by more than just its immediate nearest neighbor structure, we allow structural descriptors to be defined at a range of distances from a particle of interest. Since gaps and contacts are defined in terms of the Delaunay triangulation, which captures a configuration's contact structure, we define all distances in terms of this triangulation. We start by defining the distance d_ij as the minimum path length in the triangulation between particles i and j, counted in terms of the number of edges (e.g., nearest neighbors are distance one, next nearest neighbors are distance two, etc.). Fig. 6(a) shows an example of these distances for a neighborhood around a specific particle shown in blue. This measure of distance has the nice property that it is defined in a way which takes into account the contact topology, along with differences in particle radii. We use this discrete distance in all aspects of our softness procedure, including the cutoff distance used to calculate D²_min and Δ²_min described previously. In those cases, we consider all particles to be in the neighborhood of a particle i if d_ij is less than or equal to the chosen cutoff distance.
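A minimal sketch of this graph distance, assuming the triangulation is available as an adjacency list; a breadth-first search returns d_ij counted in edges.

```python
from collections import deque

def graph_distances(adjacency, i):
    """BFS distances d_ij from particle i, counted in edges of the
    Delaunay triangulation. adjacency: dict index -> neighbor indices."""
    dist = {i: 0}
    queue = deque([i])
    while queue:
        j = queue.popleft()
        for k in adjacency[j]:
            if k not in dist:
                dist[k] = dist[j] + 1
                queue.append(k)
    return dist
```

Nearest neighbors of `i` get distance 1, next nearest neighbors distance 2, and so on, exactly as in the definition above.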
Next, we assign a measure of distance between a particle i and a particular edge (j, k), defined in terms of its pair of vertices j and k, as the sum of the distances of particles j and k from i. As depicted in Fig. 6(b), this definition of distance separates the edges in the triangulation into "layers" at different distances. Edges that are incident with particle i are assigned distance d_{i,(j,k)} = 1, while those that are incident with two nearest neighbors of i are at distance d_{i,(j,k)} = 2, etc.
Using this definition of distance, we can simply count the number of gaps and contacts present in each layer of the Delaunay triangulation. For a particle i, we denote the number of gaps and contacts located at a distance d_{i,(j,k)} = m as g_i^m and c_i^m, respectively. We also include the particle species p_i, where p_i = 0 if the radius R_i = σ and p_i = 1 otherwise. The result is a list of structural descriptors

x_i = (p_i, g_i^1, c_i^1, ..., g_i^{m_max}, c_i^{m_max}),

where m_max is the maximum distance considered. In Fig. 7, we plot the distributions of Δ²_min for particles with different numbers of (a) gaps and (b) contacts in their nearest neighbor environments, g_i^1 and c_i^1, respectively. Already, we see that the larger the number of gaps a particle has, and the lower its number of contacts, the larger the value of Δ²_min it will have on average. To fully characterize this correlation, we perform linear regression with the structural descriptors x_i acting as our independent variables and the locally rescaled non-affine deformation Δ²_min acting as our dependent variable. The result is a new definition of softness, composed of a linear combination of gaps and contacts at different discrete distances,

S_i = Σ_μ w_μ x_{iμ},

where μ is an index for the components of x_i in Eq. 6 and the weights w_μ are determined by the regression. In the next section, we compare this new formulation of softness with the previous version of the method.
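The descriptor construction and regression can be sketched as follows. The data layout (an edge list with "gap"/"contact" labels and the BFS graph distances from particle i) and all names are illustrative assumptions; an ordinary least-squares fit stands in for whatever regression implementation is actually used.

```python
import numpy as np

def descriptors(i, species, edges, edge_labels, dist, m_max=2):
    """Descriptor vector x_i = (p_i, g^1, c^1, ..., g^mmax, c^mmax).
    edges: list of (j, k) vertex pairs; edge_labels: parallel 'gap'/'contact'
    labels; dist: dict of graph distances d_ij from particle i."""
    x = [species[i]]
    for m in range(1, m_max + 1):
        g = c = 0
        for (j, k), lab in zip(edges, edge_labels):
            # Edge distance d_{i,(j,k)} is the sum of its vertex distances
            if dist.get(j, 10**9) + dist.get(k, 10**9) == m:
                if lab == "gap":
                    g += 1
                else:
                    c += 1
        x += [g, c]
    return np.array(x, dtype=float)

def fit_softness(X, delta2min):
    """Least-squares fit S_i = sum_mu w_mu x_i,mu + w_0 (intercept appended)."""
    A = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(A, delta2min, rcond=None)
    return w
```

Stacking the descriptor vectors of all particles into `X` and regressing against their Δ²_min values yields the weights defining softness.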

VI. RESULTS
We separately compare each aspect of our new method with the previous version of softness. We test four different combinations of dynamical characterization (D²_min or Δ²_min) and structural descriptors (BP descriptors or gaps/contacts). The accuracy of each combination of methods is reported in Table I. Results for jammed packings in higher spatial dimensions and for a variety of pressures are reported in Ref. 23. When performing classification, we have chosen our cutoffs to identify examples of non-rearrangers and rearrangers, q_nr and q_r, as strictly as possible, limiting ourselves to one particle per class in each configuration. These two particles exhibit the largest and smallest D²_min in each configuration. As we will demonstrate, this results in the highest possible classification accuracies when we use the SVM approach.
In all cases, we report a metric of accuracy appropriate to the type of model used. For the classification models, we report the binary classification accuracy, the percentage of particles that are correctly classified as having positive or negative softness. For the regression-based models, we report R², the fraction of the variance in the dynamics explained by the model.
To ensure that we do not overfit our models, we perform cross-validation, training on one set of trajectories and computing test scores on another, independent set of trajectories. When cross-validation demonstrates no significant difference between the training and testing accuracy, we report the mean and variance of the accuracy obtained via bootstrapping. Otherwise, we report the mean and variance of the cross-validated test accuracy. We have also chosen our model hyperparameters via cross-validation in order to maximize the accuracy. We refer the reader to the Appendix, Sec. B 4, for a complete description of our training procedures and choices of hyperparameters.
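A minimal sketch of this style of cross-validation, assuming a generic fit/predict pair; the folds here split on index rather than on true independent trajectories, which is a simplification.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Fraction of variance in the dynamics explained by the model."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

def cross_validate(X, y, fit, predict, n_folds=2):
    """Train on one subset, score R^2 on the held-out subset, for each fold."""
    idx = np.arange(len(y))
    folds = np.array_split(idx, n_folds)
    scores = []
    for f in folds:
        train = np.setdiff1d(idx, f)
        model = fit(X[train], y[train])
        scores.append(r2_score(y[f], predict(model, X[f])))
    return np.mean(scores), np.var(scores)
```

Comparing the mean training and test scores returned here is what flags overfitting.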
For the classification-based models using D²_min, we see that the structural descriptors based on gaps and contacts perform slightly better than the BP descriptors. However, both sets of descriptors perform well, with accuracies greater than 90%. We see similar results for the regression-based schemes using Δ²_min, with gaps and contacts performing slightly better, but both sets of descriptors resulting in R² values around 20%.
We also see that a classification scheme based on Δ²_min performs well for both sets of descriptors. In contrast, performing regression on D²_min performs very poorly, resulting in R² values of only a couple of percent. This shows that Δ²_min is a more robust measure of dynamics, performing well independent of the choice of statistical model.
We note that the classification accuracies in Table I tend to be much higher than the corresponding regression accuracies. This is because classifiers only attempt to sort particles into binary classes and also only consider particles that could be considered outliers in the distribution of D²_min or Δ²_min. This means that it is not appropriate to directly compare classification and regression accuracies. Since a sample may contain many "soft spots," only one (or a few) of which will rearrange in a particular event, a good criterion to measure success is whether or not the rearrangement always localizes around a soft particle. In order to compare both classes of models with this criterion, we follow the approach of Ref. 15. We identify the particle i_max in each configuration with the largest D²_min, the global maximum. This particle can be considered, in effect, the "source" of the rearrangement. Next, we record the percentile of the global maximum's computed value of softness S_imax within the distribution of all particles in its respective configuration. This is equivalent to evaluating the cumulative distribution function of softness at S_imax, which we denote CDF(S_imax). If a model were to perfectly predict which particle is most likely to rearrange within a configuration, then the global maximum in D²_min would coincide with the maximum value of softness in that configuration and we would obtain CDF(S_imax) = 1. If the model failed completely so that a random particle is chosen, then we would obtain CDF(S_imax) = 0.5 on average. We report ⟨CDF(S_imax)⟩, the average percentile of the global maxima in D²_min in each configuration, for all models in Table I. We find that all combinations of methods perform comparably well, consistently placing the global maxima in D²_min in at least the 86th percentile of softness.

One major benefit to using Δ²_min with a regression-based scheme is the small amount of data needed to obtain high accuracy. Fig. 8 shows the accuracy of all four methods as a function of the number of configurations used in training. In order to calculate error bars via bootstrapping or cross-validation, the minimum number of trajectories needed is two. In Fig. 8(a), we see that the classification accuracy is dramatically affected by the amount of training data for both sets of descriptors, and does not begin to level off until one has several hundred configurations. In contrast, we see in Fig. 8(b) that the regression accuracy is already at its maximum when using the minimum number of configurations, namely two. In fact, a single rearrangement configuration would likely be sufficient to attain the maximum regression accuracy. In Fig. 8(c), we plot ⟨CDF(S_imax)⟩ for the four different schemes. Again, we see that in the case of classification, ⟨CDF(S_imax)⟩ is strongly dependent on the amount of training data, while in the case of regression it is high even for two configurations. We note that while the variance of ⟨CDF(S_imax)⟩ is large for regression with small numbers of trajectories, it decreases very quickly with additional data and the mean score remains high.
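The CDF(S_imax) score described above can be computed one configuration at a time; this sketch uses an illustrative function name, and ties are counted as "less soft or equal," one of several reasonable conventions.

```python
import numpy as np

def cdf_at_max(softness, d2min):
    """Percentile CDF(S_imax): the fraction of particles in a configuration
    whose softness does not exceed that of the particle with largest D^2_min."""
    i_max = np.argmax(d2min)
    return np.mean(softness <= softness[i_max])
```

Averaging this quantity over configurations gives the ⟨CDF(S_imax)⟩ scores reported in Table I: 1.0 for a perfect predictor, 0.5 on average for a random one.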
Another benefit to using regression in concert with Δ²_min is that training examples of specific particles are not needed. In Fig. 9, we investigate the effect of the thresholds used to choose training examples for classification in combination with D²_min. We parameterize these thresholds, q_r and q_nr, in terms of the percentiles of the D²_min distribution within each configuration. For simplicity, we set these thresholds equal so that q = q_r = q_nr. In Fig. 9(a), we report the classification accuracy as a function of q. We see that the accuracy is greatly affected by the choice of q. In Fig. 9(b), we observe that while ⟨CDF(S_imax)⟩ is far less sensitive, it is still affected by the choice of q. For reference, we have also provided the corresponding regression accuracies using Δ²_min for both sets of descriptors, shown as dashed horizontal lines. We see that ⟨CDF(S_imax)⟩ converges to these values at very strict thresholds.

Finally, we compare the dependence of the four schemes on the number of included structural descriptors and the size of the local environment that they encompass. For both sets of descriptors, we sort each descriptor by its average Euclidean distance from the particle of interest (see Appendix, Sec. B 1 for details). Starting with the complete set of descriptors, we iteratively remove the descriptor at the largest distance, retraining our models at each step and measuring the new accuracies. Fig. 10 shows the training accuracies for our different models as a function of the maximum descriptor distance r_max. In Figs. 10(a) and (b), we plot the accuracies for our classification and regression models, respectively, using both sets of descriptors. In Fig. 10(c), we compare ⟨CDF(S_imax)⟩ for all four methods. We see that in all four cases the accuracy rapidly increases up to a distance of about 0.5 to 1.0 particle radii for both sets of descriptors, with marginal improvements at larger separations. For gaps and contacts, we find that only two descriptors are sufficient for a particle i: the particle species p_i and the number of contacts of the particle with its nearest neighbors, c_i^1. In contrast, the BP descriptors require at least around 20 descriptors to describe this environment. In principle, a different choice of parameters in the definitions of the descriptors could reduce this number, but it is not clear what these parameters should be a priori. We conclude that gaps and contacts provide a more concise description of the local structure.

VII. DISCUSSION
In summary, we have introduced two major improvements to the softness method. First, we have defined a locally rescaled version of D²_min, denoted Δ²_min, which captures local variations in the relative motion of particles during rearrangements, converting the softness problem from one of classification to regression. The result is a more natural characterization which avoids the need to define classes of particles that are more or less likely to rearrange. This allows us to take advantage of all particles in a given data set, rather than just statistical outliers in observed mobility, greatly reducing the number of configurations needed to compute softness. In our case, this improvement leads to a several-hundredfold decrease in the amount of data needed to achieve an accuracy of CDF(S_imax) = 0.86. This is clearly a major advantage when there is limited data available for training, as is often the case in laboratory experiments. Second, we have demonstrated a procedure for characterizing the local structure of particle configurations. This procedure, based on persistent homology, allows for the systematic development of topologically-informed structural descriptors that can be specialized to a system of interest. This results in a more concise and interpretable set of descriptors which further decreases the amount of data required in training (see Fig. 8). The simplicity of the resulting descriptors also allows features to be included at larger distances from each particle, allowing for the possibility of capturing structural correlations beyond each particle's immediate proximity. Furthermore, as opposed to the traditional BP descriptors, features based on gaps and contacts do not contain any extra parameters in their definitions, alleviating the need to fine-tune descriptor hyperparameters for each system of interest.
In the case of two-dimensional jammed packings of soft particles undergoing quasi-static shear, we find that Δ²_min correlates strongly with a particle's susceptibility to rearrangements, providing an indirect measure of each particle's mobility. We also find that the nearest neighbor environment (a particle's species and the number of contacts with its neighbors) contains most of the local structural information captured by softness in two dimensions.
For the physical problem of interest in this paper, we evaluate Δ²_min for critical vibrational modes whose frequency vanishes at stress drops during athermal quasi-static shear. The success of softness trained on Δ²_min for predicting plastic events in this context suggests that local variations in the response to a rearrangement are closely related to particle mobility. This suggests that Δ²_min could be applied to a variety of related physical systems as a means of providing insight into particle dynamics. For example, Δ²_min could be evaluated for any low-frequency vibrational modes, not just the critical mode associated with an instability, as a highly efficient way of extracting soft spots 12 . It could even be evaluated for the relative displacements between different configurations (for example, the difference between two configurations separated by a strain step) to provide a potentially more useful measure of mobility than D²_min in systems with inhomogeneous loads. It could also be used to study the response of configurations with force dipoles applied in numerical simulations or even experiments. Furthermore, one could use softness trained on Δ²_min to predict rearrangements in athermal systems experiencing other types of loading, such as uniform compression or expansion, or in thermal systems that are either quiescent or under load.
Although softness is highly useful as a structural predictor of mobility, there are situations where such a predictor is not needed and it may suffice to characterize the mobility. Elastoplasticity models are based on the premise that the coupling between mobility and elasticity is key to understanding the deformation and flow of disordered solids. Characterization of the interplay between Δ²_min and elasticity, and how this varies from system to system, could lead to new elastoplasticity models based on relevant microscopic information.
In this study, we used the persistence analysis as a systematic means of identifying a set of local structural variables relevant to dynamics in jammed packings. The analysis can readily be extended to particles with more complicated sets of interactions. While it is always possible to form an unweighted Delaunay triangulation given just particle positions, a cell complex (i.e., a generalization of a triangulation, see Ref. 20) which corresponds more closely to the actual constraints or interactions in the system may provide cleaner results. For example, one could imagine developing a generalization of the alpha-shape filtration, and associated triangulation, for non-spherical particles. In lieu of a rigorous mathematical formulation, it would also be possible to pixelate the underlying space into a cubical complex and then evaluate the total potential energy on the vertices between pixels (or voxels) 24,25 . One could then perform a filtration of the potential energy function on this cell complex. If the particles have well-defined boundaries, a Euclidean distance transform on the cubical complex could also suffice. In all cases, once an appropriate filtration is chosen, the persistence algorithm will provide a complete characterization of the topological structure.
We have shown that Δ²_min is a better dynamical quantity than D²_min for determining softness from linear regression. For classification, Δ²_min and D²_min are equally effective. However, we note that for classification one could use local minima and maxima in Δ²_min (or even D²_min directly) as natural classes of non-rearrangers and rearrangers, respectively, instead of placing stringent thresholds on D²_min. This uses data more efficiently so that fewer snapshots are needed, although not as efficiently as linear regression.
Our result that gaps and contacts provide a concise and predictive characterization of local structure dovetails nicely with our understanding of jammed systems, where the contact number is a key quantity. Here we find that for predicting mobility, it is not only contacts that are important but also gaps in the Voronoi cell. It would be interesting to determine whether gaps are themselves important or are a signature of some other underlying aspect of local structure that is more closely related to the contacts.
While we find that we are able to achieve high percentiles of softness for the particles which experience the most motion, we still do not achieve a perfect 100% accuracy. The fact that we observe R² values of only about 20% provides a strong indication that our method still does not capture all of the relevant structural information. The fact that we still achieve high accuracy for predicting the sources of rearrangements implies that these particles tend to be outliers in the distributions of local structures.
One important aspect of the local structure we neglect is the contact stresses between particles. We note that although it is not utilized here, the persistence analysis should capture this information in principle. In fact, we observe a slight correlation of Δ²_min with α_b and α_d in Figs. 5(c-i) and (c-iii) for both triangles with no gaps and cycles with more than three edges that contain no interior gaps. The farther a feature is from the vertical α_b = 0 line, the smaller its average value of Δ²_min seems to be. That is, particles in environments with larger contact overlaps, i.e., larger stress, seem to be less mobile. In addition, the externally applied shear strain provides a natural anisotropy to the system, which has been shown in other studies to be important in fully capturing which particles are likely to rearrange 15,26 . However, we do not include any orientational information in our descriptors, and the persistence algorithm we have demonstrated does not take orientation into account. It would be useful to develop a way to either include this information in the persistent homology framework or at least find a means to correlate it with the features found by the standard algorithm. The addition of information about stresses and orientation into our analysis would help to provide an upper limit on the value of local structural information for the prediction of plastic rearrangements.
In order to decompose the persistence diagrams in Figs. 4 and 5, we sort cycles according to size (number of edges), edge types (gaps vs. overlaps), and particle species (small vs. large radii). However, the persistence algorithm does not provide a unique representation of the cycles it detects. Instead, it only indicates when classes of homologous cycles appear and disappear during the course of the filtration. This means that for any hole that appears in the union of discs, there may be multiple cycles of edges in the triangulation that encircle that hole, resulting in the same birth-death pair. Consequently, it is necessary to choose a particular representation of each cycle in order to classify them according to type. For this study, the exact choice we make does not affect the overall results, so we choose a representation of each cycle in a way that arises naturally from the persistence algorithm.
To perform the persistence algorithm, one starts by constructing the boundary matrix ∂, an operator which maps simplices (vertices, edges, triangles, etc.) in a cell complex to their boundaries. Each row and column in ∂ represents a simplex, sorted in order of appearance during the filtration (for simplicity, we will refer to the simplex represented by row or column i as simplex i). The jth column of ∂ is defined such that the element in the ith row is one if simplex i is a boundary face of simplex j and zero otherwise. Performing the persistence algorithm in our case amounts to transforming ∂ into a reduced column form via column additions modulo 2. In the resulting reduced matrix R, each nonzero column has a different pivot, the maximal row index of the nonzero column entries. If a column j has nonzero entries and its pivot is row index i, this means a feature was born upon the introduction of simplex i and died with simplex j (see Ref. 20 for a more detailed explanation).
The nonzero elements of a column j in R are a linear combination of simplices which form a cycle. Since each of these nonzero elements has index less than or equal to the pivot, each of the cycle's constituent simplices was present at the time that feature was born. Therefore, this cycle forms a representation of the topological feature at the time of its birth, a birth cycle. Furthermore, the boundary of simplex j forms a unique representation of the feature right before its death, its death cycle. This death cycle is homologous to the birth cycle we have described; in other words, at the time right before death, both the birth and death cycles surround the same hole in the triangulation.
Therefore, given a feature that is born with simplex i and dies with simplex j, we choose the jth column of the reduced boundary matrix R as a representative cycle. It is a cycle that both describes the feature when it is born and is homologous to the cycle at the time of death. Furthermore, it is easy to compute, as it can simply be read off of R when performing the persistence algorithm with no modification. We acknowledge that various methods exist to identify representative cycles. For example, one could calculate optimal cycles, choosing the smallest representation of each cycle. However, this method can be cumbersome, requiring the implementation of integer programming techniques 27 . A basis of birth cycles can also be found efficiently by finding the matrix V such that R = ∂V and reading off the columns corresponding to simplices at which features are born. In both cases, the resulting cycles are not always guaranteed to be homologous to a feature at the time of death (or even birth, for optimal cycles). Our method of simply reading off the columns of R does not suffer from these drawbacks.
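The standard mod-2 column reduction, with the representative (birth) cycle read off the death column, can be sketched as follows. This is a minimal textbook version rather than the authors' implementation; each column is stored as the set of its nonzero row indices.

```python
def reduce_boundary(columns):
    """Standard persistence reduction of a mod-2 boundary matrix.
    columns[j]: set of row indices with nonzero entries in column j, with
    simplices sorted by filtration order. Returns the reduced columns R and
    a dict pairing each death column j with its birth row i."""
    R = [set(col) for col in columns]
    pivot_of = {}   # pivot row index -> column that owns it
    pairs = {}      # death column j -> birth row i
    for j in range(len(R)):
        while R[j] and max(R[j]) in pivot_of:
            R[j] ^= R[pivot_of[max(R[j])]]  # column addition modulo 2
        if R[j]:
            i = max(R[j])
            pivot_of[i] = j
            pairs[j] = i
    return R, pairs
```

For a feature that dies with simplex j, the representative birth cycle is simply the set of simplices in the reduced column `R[j]`, exactly as described above.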

Triangular Cycle Persistence Curves
In this section, we derive the analytic forms of the persistence curves for the four types of triangles shown in Figs. 4(c-i)-(c-iv). First, we introduce a general formalism for calculating α-values for simplices embedded in d-dimensional space. Next, we derive the solutions for the birth and death of triangular cycles with a single gap as a function of the size of the gap.

a. Calculating α-values
Suppose we have a simplex consisting of n points in d dimensions (n ≤ d + 1) with positions v_i, where i = 1, ..., n, with each point assigned a weight w_i. In the context of soft interacting spheres, we can write each weight as

w_i = R_i² + α,

where R_i is the interaction radius of particle i and α is a scale factor used to control the radii of the balls when performing the filtration of the weighted Delaunay triangulation.
Next, we define the weighted squared distance, or power, of a point x from v_i as

π_i(x) = |x − v_i|² − w_i.

Note that π_i(x) = 0 is the equation of a sphere centered at v_i with radius √w_i. The point x is said to be orthogonal to v_i if the power between the two points is zero. We define the power sphere of a set of points as the d-dimensional sphere centered at a point x that is orthogonal to each point in the set.
During a filtration on a Delaunay triangulation, the value of α at which a simplex comes into existence, i.e., that at which its balls all come into contact, is equivalent to that of the power sphere of its vertices. The position of the power sphere is then the point at which the balls come into contact. Therefore, our goal is to find a position a and a scale factor α which define a power sphere for our n points. In the case where all points are equally weighted, this problem is equivalent to calculating the radius and position of a circumscribing d-sphere.
If n = d + 1, the radius is unique, but if n < d + 1, we will choose the unique sphere with minimum radius.
First, for each point, we write down its orthogonality condition,

|a − v_i|² = w_i.

Expanding the square, we obtain

|a|² − 2 a · v_i + |v_i|² = R_i² + α.

We note that this is a nonlinear equation for a. To linearize, we define

q = α − |a|², (A5)

so that the orthogonality conditions become

2 a · v_i + q = |v_i|² − R_i², (A6)

which is a set of n linear equations in d + 1 unknowns, a and q.
In the general case, we need an additional n − (d + 1) constraints to determine a unique solution. If we choose the minimum-radius circumsphere, then its center will be coplanar with our n points. Thus, we write the center of the sphere as a parametric function of the points,

a = v_1 + Σ_{i=2}^{n} s_i (v_i − v_1), (A7)

where we have introduced an additional n − 1 free parameters s_i, i = 2, ..., n. We now have a total of d + 1 + n − 1 = d + n free parameters (a, {s_i}, q), along with an additional d equations from Eq. A7, giving us a total of d + n equations, which means we have enough information to solve for the circumscribing power sphere. We then solve for a and q using the system of linear equations represented by Eqs. A6 and A7, and obtain the scale factor using Eq. A5.
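A sketch of this solve for the special case n = d + 1, where the coplanarity constraint is not needed. It relies on the reading that the weights take the form w_i = R_i² + α and that the linearizing variable is q = α − |a|²; both are assumptions about equations elided in this text, so treat the code as illustrative.

```python
import numpy as np

def simplex_alpha(V, R):
    """Scale factor alpha at which a full-dimensional simplex appears.
    Solves 2 a.v_i + q = |v_i|^2 - R_i^2 for the center a and the variable
    q = alpha - |a|^2, then recovers alpha (n = d + 1 points assumed)."""
    V = np.asarray(V, float)
    R = np.asarray(R, float)
    A = np.hstack([2 * V, np.ones((len(V), 1))])   # unknowns: (a, q)
    b = np.sum(V**2, axis=1) - R**2
    sol = np.linalg.solve(A, b)
    a, q = sol[:-1], sol[-1]
    return q + a @ a                               # alpha = q + |a|^2
```

For an equilateral triangle of side 2 with unit radii, the balls of squared radius 1 + α meet at the circumcenter when α = (2/√3)² − 1 = 1/3, which the solver reproduces.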
We note that if n = d + 1, as is the case for a triangle in two dimensions or a tetrahedron in three dimensions, then we can omit Eq. A7.

We wish to calculate α_b and α_d as a function of the triangle shape as one of its sides opens up into a gap. Initially, when all three particles are in contact, we assume each pair of particles i and j overlaps by an amount δ_ij > 0, which can depend on particle species (see the next section for more details). We place the gap between particles 2 and 3, parameterized by a parameter ε such that at the minimum value ε = 0 all particle pairs overlap by δ_ij. All together, we parameterize the pairwise distances between the particles accordingly; Fig. 11 depicts a schematic of such a triangle. From here, we calculate α_b and α_d as functions of the gap parameter ε.

First, to derive the birth of the triangle, we calculate α for each edge in the triangle (n = 2). We use Eqs. A6 and A7, along with Eq. A5, to solve for α_ij for an edge between particles i and j. The resulting expression can be shown to be symmetric in i and j. The triangle is born when all three possible pairs of particles start to overlap; consequently, we take the maximum value of α_ij over all three pairs.

Next, we derive the value of α at which the cycle defined by the triangle dies during the filtration. For a triangle in two dimensions, n = d + 1, and Eqs. A6 and A5 are sufficient to solve for α. The persistence curve is then defined parametrically as a function of ε by following the path of the point (α_b(ε), α_d(ε)) for a particular combination of particle sizes. The gap parameter ε starts at a value of ε_min = 0. As ε increases and the triangle is stretched out, the birth and death α-values of the triangle approach one another. If there is a point at which they coincide, then the maximum valid value of the gap parameter, ε_max, is the solution of the corresponding equation. If a solution to this equation does not exist, then ε_max is the point at which the two values are closest together.

This derivation can easily be extended to higher-dimensional simplices such as tetrahedra, and also to higher embedding dimensions. In addition, one could choose different values of the initial overlap between particles, or explore the area swept out in the persistence diagrams by adding more than one gap to a simplex.
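The α-value of a single edge has a simple closed form. As a hedged reconstruction (the paper's own expression for α_ij is not reproduced in this text), under the standard weighted alpha-complex convention in which a particle of radius R_i carries weight R_i^2, minimizing the power distance along the segment joining the two particle centers gives

```latex
\alpha_{ij} \;=\; \frac{\left[\,d_{ij}^2 - (R_i + R_j)^2\,\right]\left[\,d_{ij}^2 - (R_i - R_j)^2\,\right]}{4\,d_{ij}^2},
```

which is manifestly symmetric in i and j, is negative for overlapping particles, and vanishes exactly at contact, d_ij = R_i + R_j, consistent with contacts being born at α = 0 in the filtration.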

c. Estimating Contact Overlap
In the previous section, we introduced the contact overlap δ_ij for a pair of particles i and j. In general, the average value of this overlap will depend on the details of the

To construct our training set for classification, we first calculate D^2_min for each configuration and sort the particles in increasing order. Next, we convert this ordering to the quantile of each particle i within that configuration, denoted q_i, which ranges from 0 to 1. Examples of soft particles are then chosen as particles whose q_i is greater than q_r, the upper quantile threshold.
To select hard particles, we use a slightly different approach. For each particle in a particular rearrangement configuration, we record the maximum value of D^2_min experienced by that particle within a window of 10 future rearrangements (including the current rearrangement). Next, we again convert this quantity to a quantile representation q_i and choose all particles with q_i less than q_nr as examples of hard particles. In this work, we always choose q_r = q_nr.
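The quantile-based labeling above can be sketched as follows. This is a minimal illustration only: the function name and the top/bottom-fraction convention are assumptions, and the windowing over future rearrangements used for hard particles is omitted for brevity.

```python
import numpy as np

def label_particles(d2min, q):
    """Hedged sketch of quantile-based labeling for one configuration.

    d2min : per-particle D^2_min values (or the windowed maximum, for
            selecting hard particles)
    q     : quantile threshold, with q_r = q_nr = q
    """
    n = len(d2min)
    # Quantile q_i of each particle within the configuration, in [0, 1].
    ranks = np.argsort(np.argsort(d2min))
    quantiles = ranks / (n - 1)
    soft = quantiles > 1.0 - q  # top fraction q: rearranging examples
    hard = quantiles < q        # bottom fraction q: non-rearranging examples
    return soft, hard
```

In practice the hard-particle labels would be computed from the windowed maximum of D^2_min rather than the instantaneous values, as described above.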

b. Model
To perform classification, we utilize a support vector machine (SVM). In this framework, each particle i has a label y_i, where y_i = −1 for soft particles and y_i = +1 for hard particles, along with a vector of features x_i, which may be BP descriptors or our descriptors based on gaps and contacts. We define N to be the number of particles used in training. Training the classifier then equates to solving the following optimization problem for the vector of weights w, intercept b, and slack variables ζ_i:

min_{w, b, ζ} (1/2)|w|^2 + C Σ_{i=1}^{N} ζ_i,  subject to  y_i (w · x_i − b) ≥ 1 − ζ_i,  ζ_i ≥ 0.

The hyperparameter C controls regularization. This formulation equates to finding a hyperplane with normal vector w which best separates the two classes of particles in the space of features. We then calculate the softness S_i for each particle as a weighted sum of features, S_i = w · x_i − b.
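As a minimal sketch of this step using scikit-learn, the following fits a linear soft-margin SVM and evaluates the signed distance to the hyperplane as a softness-like quantity. The feature values are synthetic stand-ins for the gap/contact descriptors, and the cluster locations are arbitrary illustrative choices, not the paper's data.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Synthetic two-class training data standing in for particle descriptors.
rng = np.random.default_rng(0)
X_soft = rng.normal(loc=-1.0, size=(200, 2))  # features of soft examples
X_hard = rng.normal(loc=+1.0, size=(200, 2))  # features of hard examples
X = np.vstack([X_soft, X_hard])
y = np.concatenate([-np.ones(200), np.ones(200)])  # y = -1 soft, y = +1 hard

# Linear soft-margin SVM; C is the regularization hyperparameter.
clf = LinearSVC(C=1e-2)
clf.fit(X, y)

# Softness as a signed distance from the separating hyperplane,
# i.e. a weighted sum of features plus an intercept.
S = clf.decision_function(X)
```

Note that the sign convention of the resulting softness depends on the label convention chosen above and may be flipped relative to the paper's.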

Regression
The formulation for regression we use is standard ridge regression. For each particle in the training set, we have an independent value y_i given by ∆^2_min(i), along with a vector of features x_i, which may be BP descriptors or our descriptors based on gaps and contacts. Defining N as the number of particles in training, we solve the following optimization problem for the vector of weights w and intercept b:

min_{w, b} Σ_{i=1}^{N} (y_i − w · x_i − b)^2 + α|w|^2.

The hyperparameter α controls regularization. We then calculate the softness S_i for each particle as a weighted sum of features,

S_i = w · x_i + b. (B8)
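A minimal sketch of this step with scikit-learn follows. The feature matrix and weights are synthetic stand-ins, not the paper's descriptors or fitted values.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic regression data standing in for descriptor vectors x_i and
# target values y_i = Delta^2_min(i).
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
w_true = np.array([0.5, -0.2, 0.1])            # illustrative weights
y = X @ w_true + 0.05 * rng.normal(size=500)   # stand-in for Delta^2_min

# Ridge regression; alpha is the regularization hyperparameter.
reg = Ridge(alpha=1.0)
reg.fit(X, y)

# Softness as the learned weighted sum of features, S_i = w . x_i + b.
S = X @ reg.coef_ + reg.intercept_
```

Here the softness of a particle is simply the model's prediction of its rescaled non-affine deformation.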

Machine Learning Protocol
We perform most of our machine learning tasks using the scikit-learn Python package 30. Our learning protocol consists of the following three steps:

1. Independently standardize each structural descriptor.

2. Select the optimal model hyperparameter via a cross-validated search.

3. Fit the model and use cross-validation or bootstrapping to calculate the mean and standard deviation of the test accuracy.
We elaborate on these steps in the following sections.

a. Standardization

Before training either of our models, we independently standardize each structural descriptor. To standardize a descriptor, we subtract its observed median value and divide by the interquartile range (IQR), the difference between the 25th and 75th percentiles. We determine the median and IQR considering only data in the training set within a particular cross-validation or bootstrapping split, so as to avoid over-fitting. We use the median and IQR, rather than the mean and standard deviation, to reduce the potential influence of outliers.
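The median/IQR standardization can be sketched as below; the function name is illustrative, and the statistics are deliberately computed from the training split only.

```python
import numpy as np

def robust_standardize(train_col, test_col):
    """Median/IQR standardization of one descriptor column.

    Statistics are computed from the training split only, so that no
    information leaks from the test set into the rescaling.
    """
    med = np.median(train_col)
    q25, q75 = np.percentile(train_col, [25, 75])
    iqr = q75 - q25
    return (train_col - med) / iqr, (test_col - med) / iqr
```

This is equivalent in spirit to scikit-learn's RobustScaler fit on the training split.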

b. Cross-validation
To avoid overfitting, we perform repeated two-fold cross-validation in which we randomly sort our trajectories into two sets of configurations, a training set and a test set, both of equal size. We then fit our model using the training set data and evaluate accuracy on the test set. We repeat this process 32 times and measure the mean and variance of the resulting accuracies.
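The repeated two-fold scheme can be sketched as follows. The splitting is over whole configurations, as described above; `fit_and_score` is a hypothetical user-supplied callable (not a library API) that fits the model on the training split and returns the test accuracy.

```python
import numpy as np

def repeated_two_fold(configs, fit_and_score, n_repeats=32, seed=0):
    """Repeated two-fold cross-validation over whole configurations."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_repeats):
        order = rng.permutation(len(configs))
        half = len(configs) // 2
        train = [configs[i] for i in order[:half]]
        test = [configs[i] for i in order[half:]]
        scores.append(fit_and_score(train, test))
    return np.mean(scores), np.var(scores)
```

Splitting at the configuration level, rather than the particle level, prevents particles from the same frame from appearing in both splits.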

c. Hyperparameter Search
For each of the two model types, we optimize one hyperparameter: C for classification or α for regression. To determine the best value for a hyperparameter, we scan through a range of values spaced on a log scale and calculate cross-validated train and test accuracies at each value. We then choose the parameter which results in the largest test accuracy. Fig. 12 shows the range of values scanned for the primary models reported in the main text. For regression, we chose α = 10^3 when using gaps and contacts and α = 10^−5 when using BP descriptors. For classification, we find that the choice of q does not have a significant effect on this optimal value. In Figs. 12(a) and (b), we show results for q = 10^−4.5, corresponding to one particle of each class per configuration. For this model type, we chose C = 10^−2 for both types of descriptors. In principle, the cross-validation could also be performed over the discrete radius of the neighborhood used to calculate D^2_min and, separately, the discrete neighborhood radius used to rescale D^2_min to calculate ∆^2_min. In addition, there are many choices for the hyperparameters that define the BP descriptors.
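The log-scale scan can be sketched as below. `cv_score` is a hypothetical callable standing in for the cross-validated test accuracy at a given hyperparameter value; the grid limits are illustrative.

```python
import numpy as np

def scan_hyperparameter(values, cv_score):
    """Evaluate cv_score on a grid and return the best value and scores."""
    test_scores = [cv_score(v) for v in values]
    best = values[int(np.argmax(test_scores))]
    return best, test_scores

# Candidate C or alpha values spaced on a log scale, e.g. 10^-5 ... 10^3.
grid = np.logspace(-5, 3, num=9)
```

In practice one would also record the train accuracies at each grid point to diagnose over-fitting, as done in Fig. 12.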

d. Computing Model Accuracy
After performing our hyperparameter search, we note whether there is a significant difference between the training and test accuracies. If there is a significant difference for a particular model, we use cross-validation when we later evaluate the success of that model, reporting the relevant accuracy and CDF(S_imax) values as measured on the test set. If there is no significant difference, then we perform bootstrapping to evaluate success, as it requires significantly less computational power. When performing bootstrapping, we randomly sample configurations with replacement, then fit the model to this resampled data set and evaluate accuracy and CDF(S_imax) on the same data set. In both cases, we always resample our data set 32 times. In Figs. 12(a) and (b), we see that for classification the test accuracy is generally less than the train accuracy; we therefore always use cross-validation to evaluate success for classification-based schemes. In contrast, we see in Figs. 12(c) and (d) that the two accuracies do not differ significantly for regression, so we use bootstrapping to evaluate success when performing regression.
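The bootstrapping variant can be sketched as follows; as above, `fit_and_score` is a hypothetical callable that fits and evaluates the model, here called with the same resampled set for both fitting and evaluation.

```python
import numpy as np

def bootstrap_scores(configs, fit_and_score, n_resamples=32, seed=0):
    """Bootstrap estimate of model accuracy: resample configurations with
    replacement, then fit and evaluate on the same resampled set."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_resamples):
        idx = rng.integers(0, len(configs), size=len(configs))
        sample = [configs[i] for i in idx]
        scores.append(fit_and_score(sample, sample))
    return np.mean(scores), np.std(scores)
```

Because fitting and evaluation share the same resampled data, this is only appropriate when train and test accuracies have been verified to agree, as argued above.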

Figure 2. The onset of a rearrangement within a two-dimensional configuration of jammed soft particles undergoing shear. (a) The non-affine deformation D^2_min for each particle in the configuration, calculated from the critical mode u describing the onset of the rearrangement. (Inset) Zoomed-in view of the neighborhood within the red box located near the particle with the largest value of D^2_min, defining the source of the rearrangement. Arrows indicate each particle's motion within the critical mode. (b) The locally rescaled non-affine deformation ∆^2_min calculated for each particle and (Inset) the corresponding zoomed-in view of the rearrangement source.

Figure 3. (a) Weighted Delaunay triangulation of a packing of particles, consisting of vertices at the center of each particle, edges between neighboring particles, and triangles between triplets of mutually adjacent particles. Some edges in the triangulation correspond to particle contacts, while others do not. (b) Initial configuration encountered during the filtration at α_min = −σ^2. Particles with radii σ first appear as points, while particles with larger radii begin as finite-sized discs. The corresponding alpha complex, a subset of the Delaunay triangulation, consists of a single point at the center of every particle. (c) Birth of the cycle consisting of the red edges at α_b = −0.014σ^2, representing the overlaps between the discs highlighted in blue. The current alpha complex consists of the vertices and edges shown as black lines. (d) Death of the cycle from (c) at α_d = −0.014σ^2, when the green triangle is placed in the triangulation, representing the mutual overlap of the discs at its corners. The alpha complex has additional edges compared to (c), along with triangles wherever three discs overlap (triangles not shown). (e) Resulting persistence diagram quantifying all cycles encountered in the configuration. The cycle that is born in (c) and dies in (d) is highlighted in red.

Figure 4. (a) Average persistence diagram calculated from configurations at the onset of rearranging. Each pixel represents the number of cycles observed at that particular (α_b, α_d) divided by the total number of configurations examined. The black vertical line highlights α_b = 0, while the black dashed line highlights α_b = α_d. (b) Decomposition of the persistence diagram into cycles of different sizes and edge types. The vertical band is composed of (b-i) cycles with more than three edges and no interior gaps and (b-ii) cycles with more than three edges and at least one interior gap, while the diagonal band is composed of (b-iii) cycles with exactly three edges and no gaps and (b-iv) cycles with three edges and at least one gap. (c) Decomposition of the diagonal band consisting of cycles with three edges according to constituent particle species. The sub-bands are composed of triangular cycles with (c-i) three small particles, (c-ii) two small particles and one large particle, (c-iii) one small particle and two large particles, and (c-iv) three large particles. Analytical predictions of the four sub-bands are shown as orange curves (see Appendix, Sec. A 2). Examples of each cycle type are shown in the insets, highlighted by the red edges and blue discs. Contacts are depicted as solid lines, while gaps are dashed lines.

Figure 5. (a) Correlation between cycles detected by the persistence algorithm and the non-affine motion of the particles comprising each cycle. Each pixel is colored according to the value of D^2_min averaged across all cycles in that bin. There is no significant correlation between the persistence diagram and D^2_min besides a small excess of motion for cycles exactly at α_b = 0. The black vertical line highlights α_b = 0, while the black dashed line highlights α_b = α_d. (b) Correlation between cycles and their normalized motion ∆^2_min averaged across all cycles within each bin. There are significant correlations between regions of the persistence diagram and ∆^2_min. (c) Decomposition of the persistence diagram according to number of edges and edge types, correlated with ∆^2_min. Cycles with small values of ∆^2_min (shown in red) tend to consist of (c-i) more than three edges and no interior gaps or (c-iii) exactly three edges with no gaps. Cycles with large values of ∆^2_min (shown in blue) tend to consist of (c-ii) more than three edges and at least one interior gap or (c-iv) exactly three edges with at least one gap. Examples of each cycle type are shown in the insets, highlighted by the red edges and blue discs. Contacts are depicted as solid lines, while gaps are dashed lines.

Figure 6. Discrete distances defined in terms of a configuration's weighted Delaunay triangulation. (a) Distance d_i,j of each particle j from the central particle i, shown in blue. Particle distances are taken as the minimum path length along the edges of the triangulation between the two particles. (b) Distance d_i,(j,k) of each edge (j, k), composed of particles j and k, from the central particle i. Edge distances are calculated as the sum of the distances of their respective vertices from the particle of interest.

Figure 7. Distributions of the locally rescaled non-affine deformation ∆^2_min for (a-i) small (R_i = σ) and (a-ii) large (R_i = 1.4σ) particles with different numbers of gaps g^1_i in the first layer of their local Delaunay triangulation. Similarly, the distributions of ∆^2_min for (b-i) small and (b-ii) large particles with different numbers of contacts c^1_i in their first layer. The full distributions including all particles are shown as black dashed curves. Particles with more gaps or fewer contacts tend to have larger values of ∆^2_min on average.

Figure 8. Model accuracies as a function of the number of independent configurations (one rearrangement configuration per trajectory) included in the training set. (a) Classification accuracy using D^2_min with gaps and contacts (blue) and BP descriptors (red). (b) Regression accuracy R^2 using ∆^2_min with gaps and contacts (green) and BP descriptors (magenta). (c) Average percentile of the particle with largest D^2_min within each frame for the four models, colored according to (a) and (b). Error bars correspond to the variance of the accuracies computed via cross-validation or bootstrapping.

Figure 9. Accuracies as a function of the quantile threshold q = q_r = q_nr used to identify training examples of rearranging or non-rearranging particles for classification. A smaller value of q indicates a stricter threshold and a smaller training set. (a) Classification accuracy for a classifier trained using gaps and contacts (blue) and BP descriptors (red). (b) Average percentile of the particle with largest D^2_min within each frame, CDF(S_imax). The results for the classification models are shown using solid lines, while the corresponding accuracies for the regression models using ∆^2_min are shown as dashed lines for gaps and contacts (green) and BP descriptors (magenta). Error bars correspond to the variance of the accuracies computed via cross-validation or bootstrapping. Similarly, transparent bands surrounding the dashed lines represent variance for the regression models.

Figure 10. Accuracies as a function of the maximum Euclidean descriptor distance r_max for included structural descriptors. The distance for each descriptor is averaged over all instances of that feature and measured in units of σ, the minimum particle radius. (a) Classification accuracies for the classification models using gaps and contacts (blue) and BP descriptors (red). (b) Regression accuracies R^2 for the regression models using gaps and contacts (green) and BP descriptors (red). (c) Average percentile of the particle with largest D^2_min within each frame, CDF(S_imax), for all four models. Transparent bands surrounding the lines indicate the variance of training accuracies computed via cross-validation.

Figure 12. Search for optimal hyperparameters for the four different models. In all cases, we show the training and test accuracies (solid and dashed lines, respectively) relevant to the model type. Classification accuracy is shown as a function of the regularization parameter C for (a) descriptors based on gaps and contacts and (b) BP descriptors. Similarly, regression accuracy is shown as a function of the regularization parameter α for (c) gaps and contacts and (d) BP descriptors.

Table I.
Comparison of four different combinations of statistical model, dynamical measure, and structural descriptors. The accuracy metric is determined by model type: binary classification accuracy is used to measure classification success, and R^2 is used to measure regression accuracy. To compare all softness schemes, we report the percentile of the softness value of the particle with largest D^2_min, averaged over each configuration, CDF(S_imax).