Analyzing collective motion with machine learning and topology

We use topological data analysis and machine learning to study a seminal model of collective motion in biology [M. R. D’Orsogna et al., Phys. Rev. Lett. 96, 104302 (2006)]. This model describes agents interacting nonlinearly via attractive-repulsive social forces and gives rise to collective behaviors such as flocking and milling. To classify the emergent collective motion in a large library of numerical simulations and to recover model parameters from the simulation data, we apply machine learning techniques to two different types of input. First, we input time series of order parameters traditionally used in studies of collective motion. Second, we input measures based on topology that summarize the time-varying persistent homology of simulation data over multiple scales. This topological approach does not require prior knowledge of the expected patterns. For both unsupervised and supervised machine learning methods, the topological approach outperforms the one that is based on traditional order parameters.


I. INTRODUCTION
Fundamental to many nonlinear systems is the link between local rules and global behaviors. For instance, what in uence does coupling between oscillators have on the ability of a network to synchronize? How does policing a ect hotspots of crime in urban areas? Why do the movement decisions that sh make lead to schooling structures? In this paper, we consider the relationship between Chaos ARTICLE scitation.org/journal/cha local rules and global behavior in service of two tasks of interest in nonlinear science, namely, classi cation of dynamics and parameter recovery. We conduct this study in the context of one of the aforementioned applications: collective motion in biology. Animal groups such as ocks, herds, schools, and swarms are ubiquitous in nature, 1,2 inspire numerous mathematical models [3][4][5][6] and motivate biomimetic approaches to engineering and computer science. 7,8 In animal groups, two levels of behavior come into play. First, at the individual level, organisms make decisions about how to move through space. It is well-established in the biological literature that social interactions between organisms play a key role. Second, at the group level, collective motion may occur, where animals coordinate and produce emergent patterns. Mathematical models can help link these two levels of behavior. This type of linkage is fundamental to innumerable nonlinear systems displaying collective behavior.
While there is a vast literature on mathematical models of collective motion, we focus on the in uential model of D'Orsogna et al. 9 This model describes agents whose movement is determined by selfpropulsion, drag, and social attraction-repulsion, forces frequently used in collective motion studies. 10,11 The model produces paradigmatic behaviors such as rotating rings, vortices, disorganized swarms, and traveling groups; these mimic sh schools, swarms of midges, bird ocks, and more. [12][13][14] Studies of collective motion models often focus on a forward problem: given a particular model, what dynamics are observed for di erent parameters? In our present work, however, we study an inverse problem: given observed data, what parameters could have produced it? Also, given observed data, what paradigmatic dynamics are being exhibited? 15,16 There are numerous strategies to infer parameters from data with simple di erential equation models, including Bayesian inference 17 and frequentist approaches. 18 Our primary contribution here is to solve an inverse problem with an agent-based model using tools from topological data analysis and machine learning. Though we focus on collective motion in the D'Orsogna model, our protocol is applicable in many other settings.
Machine learning and mathematical modeling have traditionally been viewed as separate ways of understanding data. On one hand, machine learning can extract predictions of complex relations within large data sets. On the other hand, modeling can be used to hypothesize how mechanisms lead to an observed behavior. There is a growing understanding, however, that modeling and machine learning can be used in synergy. 19 For example, Lu et al. use nonparametric estimators to learn the rules governing the observed output of agent-based models. 20 The method is applied to fundamental interacting particle systems, 21 models of social in uence, 22 predator-swarm systems, 23 and phototactic bacteria. 24 Our study hinges on the use of topological data analysis (TDA), a set of tools that "help the data analyst summarize and visualize complex datasets." 25 TDA has played a pivotal role in studies of breast cancer, spinal cord injury, contagion, and other biological systems. [26][27][28]45 A primary tool in the TDA toolbox is persistent homology. Homology has to do with calculating certain topological characteristics, while persistence refers to examining which of these are maintained across multiple scales in the data. In its fundamental form, persistent homology provides a framework for describing the topology of a static data set. However, ideas from persistent homology have been extended to time-varying data. One approach is the Contour Realization Of Computed k-dimensional hole Evolution in the Rips complex, known more simply as a crocker. 29 A crocker shows contours of quantities called Betti numbers as a function of time and of persistence scale, providing a topological summary of time-varying point clouds of data. Recent work has shown how crockers can be used for exploratory data analysis of collective motion and for judging the tness of potential mathematical models of experimental data. 29,30 Our present work brings together mathematical modeling, machine learning, and TDA to study collective motion in the D'Orsogna model. More speci cally, we construct crockers from numerical simulations of the model and use them as inputs to machine learning clustering and classi cation algorithms in order to identify di erent paradigmatic patterns and the model parameters that produce them. We compare this approach to a more traditional one, which uses order parameters commonly used in studies of collective motion. While the traditional order parameters are typically chosen using prior knowledge of the system, the TDA tools can be used with no prior knowledge and are problem-independent. In our methodological study, we use data simulated from models so that we can compare inferred parameters to those actually used to generate the data. Once trained, the machine learning algorithms can be used to infer model parameters from experimental data.
The rest of this paper is organized as follows. Section II describes our methods, namely, numerical simulation of the D'Orsogna model, computation of traditional order parameters and crockers, and machine learning techniques. Section III presents our results, including our primary nding: the topological approach outperforms the traditional one. We conclude in Sec. IV.

A. D'Orsogna model and numerical simulations
The D'Orsogna model 9,31 describes the motion of N interacting agents of mass m. Each agent is characterized by its position and velocity x i , v i ∈ R d , i = 1, . . . , N. We focus on the two-dimensional case, d = 2, throughout this study, though the case for d = 3 has been considered in Ref. 32. Position and velocity obey the coupled, nonlinear ordinary di erential equations, where ∇ i is the gradient with respect to x i . Equation (1a) states that the time derivative of position is velocity. Equation (1b) is Newton's law, with the right hand side describing three forces acting on each agent: self-propulsion with strength α, nonlinear drag with strength β, and social interactions. These social interactions are attractiverepulsive, as speci ed by the Morse-type potential in (1c). The rst term inside the brackets describes social repulsion of overall strength C r > 0. Repulsion decays exponentially in space, with characteristic length scale of decay r > 0. The exponential decay re ects that organisms' ability to sense each other through sight, sound, or smell Chaos ARTICLE scitation.org/journal/cha decays over distance. Restated, an organism will be more heavily in uenced by near individuals than far ones. The second term is signed oppositely, and hence describes social attraction. The summation in (1c) means that a given organism interacts with all other organisms, albeit with in uence decaying exponentially in space. Biological modeling studies typically assume that repulsion is strong but operates over a short length scale, while attraction is weaker but operates over a longer scale. For (1), this would mean C r > C a and r < a . In this case, the Morse potential U has a wellde ned minimum, representing the distance at which attraction and repulsion balance for two organisms in isolation. However, this distance is not the separation observed between individuals in a group because of the nonlinear all-to-all coupling. Regardless, as in Ref. 9, we do not enforce the restriction C r > C a and r < a .
For our study, we set N = 200. After nondimensionalizing (1), we are left with four dimensionless parameters: α, β, C = C r /C a , and l = l r /l a . We set α = 1.5, and β = 0.5, corresponding to a base case of parameters from Ref. 9 and explore the remaining parameter space by varying the ratios C = C r /C a and = r / a . Both C and will take on values in {0.1, 0.5, 0.9, 2.0, 3.0}, resulting in 25 different possible parameter combinations. For each combination, we perform 100 simulations using MATLAB's ode45 function. Because the D'Orsogna model is typically independent of initial conditions, we draw x i (0) and v i (0) each from a uniform distribution on [−1, 1] d . We simulate until t = 100, allowing the swarm to attain a dynamic equilibrium state. From the computed trajectories, we sample the positions and velocities every 0.05 time units. Thus, the nal output .05 for each of our 2500 simulations.
These simulations produce paradigmatic collective motion including single mills, double mills, double rings (referred to simply as rings in Ref. 9), collective swarms, and group escape, which we split into three distinct classes. 9 For the remainder of this paper, we refer to paradigmatic collective behaviors as phenotypes. The rst column of Fig. 1 shows a representative snapshot of each major phenotype, and Fig. 2 shows the three distinct escape types. Table I lists the values of (C, ) that produce each phenotype in our library of simulations. We describe these phenotypes in more detail at the end of Sec. II C.

B. Order parameters
Investigators often use order parameters to summarize the output of collective motion experiments or simulations. 9,11,29 Summaries are necessary because it is impractical or impossible to manually inspect large amounts of raw output data. The order parameters are intended to suggest if and when certain types of group behavior emerge in a population, for instance, when many individuals move in the same direction or rotate with the same orientation. Typical order parameters used for (1) include polarization, angular momentum, absolute angular momentum, and the mean distance to the nearest neighbor. 29 We use these four in our present study, plotted for each phenotype in the second column of Fig. 1.
Group polarization P(t) measures the degree of alignment between agents and is given by with P = 1 signifying that all agents have the same direction of motion. All phenotypes in Fig. 1 exhibit low P(t), suggesting no translational ocking. Angular momentum M ang (t) can detect rotational motion and is given by where , and x cm (t) refers to the center of mass of the agents. A group with M ang = 1 would have individuals sharing perfectly rotational motion. Following Ref. 31, we also consider the absolute angular momentum M abs (t), given by Discrepancies between angular momentum and absolute angular momentum can distinguish a single mill from a double mill, in which counter-rotating agents cancel out each other's angular momentum.
For example, we observe in Fig. 1 that M abs (t) ≈ M ang (t) for the single mill, whereas M abs (t) > M ang (t) for the double mill. Finally, we consider the mean distance to nearest neighbor, D NN (t), which may distinguish group escape behavior from other phenotypes. 33 D NN (t) becomes very large for the escape phenotype over time in Fig. 1 as the particles act repulsively.
In calculating time series of these order parameters, we downsample by a factor of 23 (chosen since it is a divisor of 2001, the original number of simulation frames), resulting in M = 87 time points. While some information is lost with downsampling, it makes subsequent computations faster, while maintaining a high level of classi cation accuracy with the downsampling rate we have chosen.

C. Persistent homology and crockers
While order parameters can be useful summaries of collective motion data, they are typically designed in a problem-speci c manner and with some knowledge of the expected dynamics. We compare an order parameter approach to one that measures the topology of the data. This approach is arguably more agnostic and less applicationspeci c. We now review relevant ideas from topology. To make this review broadly accessible, we keep it conceptual. For some technical details, see, e.g., Refs. 29, 34, and 35.
We begin our explanation of persistent homology and crockers by focusing on agents' positions during one time step of a simulation (or experiment). These data constitute a point cloud, made up of N points in R d . To study the topology of a point cloud, we transform it into an object called a simplicial complex. While there are many ways to construct a simplicial complex, we use the Vietoris-Rips (VR) complex, a common choice in TDA because it is e cient to compute.
To build a VR complex, we select a distance and draw a ball of diameter around each point. If two balls intersect, we connect them with an edge. If three balls all pairwise intersect, we connect all three edges and ll in the resulting triangle. If four balls all pairwise intersect, we connect all six edges, ll in each of the four triangles bounded by those edges, and ll in the solid tetrahedron bounded by the four triangles. Points, edges, lled triangles, and solid tetrahedra are called 0-, 1-, 2-, and 3-simplices, and more generally, k-simplices for any k + 1 points with -balls that pairwise intersect. With a simplicial complex built from our point cloud, we now measure its topology by calculating its Betti numbers. Betti numbers b k are topological invariants, meaning that they are unchanged under continuous deformations of the object such as stretching, compressing, warping, and bending. Thus, they measure something fundamental about the shape of the object. More speci cally, Betti numbers enumerate the number of distinct holes in the complex that have a k-dimensional boundary, that is, a hole surrounded by k-simplices. For instance, b 0 counts the number of connected components in the simplicial complex. Similarly, b 1 counts the number of topological loops that bound a 2D void. Betti number b 2 counts the number of trapped 3D volumes and so on as dimension increases. Algebraic topology tells us how to encode the calculation of Betti numbers as a linear algebra problem; see standard topology texts or Ref. 36 for a tutorial. The calculation is a homology computation because b k is the rank of an algebraic object called a homology group.
In the discussion above, no value of was speci ed. Persistent homology constructs simplicial complexes and calculates Betti numbers for a range of values. There exist powerful software packages that automate this process. 35 We use the Ripser package. 37 The TABLE I. Collective motion phenotypes and corresponding social interaction potential parameters in our library of simulations of the D'Orsogna model (1). Here, C = C r /C a , = r / a , and N = 200. We fix the remaining parameters α = 1.5 and β = 0.5 as in Ref. outputs of these computations are birth and death values of , that is, the values of for which the various features enumerated by b k appear and disappear. The word persistence refers to the ranges of over which features persist. For example, features that persist over large ranges of might be interpreted as signals rather than topological noise. There exist many ways to organize the birth and death information, with the most common being objects called barcodes and persistence diagrams. 35 Additionally, for a given value of k, one could construct a vector in which each entry gives the value of b k for a speci c value of (say, on a grid). This information, b k ( ), is a Betti curve.
Small perturbations in data produce small perturbations in persistence diagrams; that is to say, persistence diagrams are stable to noise near the data. 38 However, persistence computations are not stable with respect to outliers. In practice, a codensity measure can be used to lter outliers. Further work in this area develops notions of distance that are robust to large quantities of empirical noise and outliers. [39][40][41] A clustering approach could also be used to limit the e ects of this type of noise. 42 Thus far, we have discussed topological analysis of static point clouds. If we allow our agents' positions to evolve dynamically in time, t, then we can construct a Betti curve for each frame of the simulation or experiment, and concatenate these into a matrix. We let time t vary along columns and vary along rows. Each entry speci es the Betti number b k for a speci c pair (t, ). The matrix is a topological signature of the time-varying data of a simulation and once vectorized can serve as input to machine learning algorithms. Equivalently, for visualization, one could take this matrix and construct a contour plot of b k (t, ). Such a plot is the Contour Realization Of Computed k-dimensional hole Evolution in the Rips complex, or crocker, de ned in Ref. 29.
As mentioned in Sec. II A, our data consist of numerical solutions of (1). While the D'Orsogna model tracks agents' positions and velocities, we restrict ourselves to using position data in our topological analysis. This approach has two advantages. First, it renders our techniques applicable to experimental data, where position is the most easily observed quantity. Second, it circumvents a potential scaling disparity between numerical values of position and velocity when performing TDA on the data, as exhibited in Fig. 3. In contrast, three out of four order parameters described above require knowledge of the velocity.
To regain some of the information lost by excluding velocity, we incorporate time-delayed position information into some of our  Figure 5 and accompanying text in Appendix A describe the 4D data for single mills, double mills, and double rings. We will also consider position-only crockers {computed on a point cloud consisting of [x i (t j )], where the sampling of j is the same as that discussed for the order parameters in Sec. II B}. Still, challenges remain with the normalization of our topological data. With escape phenotypes, interagent distances can approach in nity, whereas they remain bounded for other phenotypes. To circumvent this challenge, when performing our topological analyses, we take any agent whose distance from the origin crosses the threshold x i ∞ = 10 and edit the simulation data to hold it xed at this position.
Even after enforcing this bound, phenotypes occur on a range of scales. While agent coordinate positions are capped at 10 for group escape, they are as small as 10 −3 for collective swarms. We normalize position data across all simulations with a global normalization constant to ensure −1 ≤ x i (t) ∞ ≤ 1. With this scaling, the smallest normalized phenotypes have typical distances of 10 −4 . Thus, we compute persistent homology with varying logarithmically between 10 −4 and 1 with 200 grid points such that q = 10 −4+q , = 4/200, q = 1, . . . , 200.
The third and fourth columns of Fig. 1 show crockers (b 0 and b 1 ) for ve example simulations. The crockers for single and double mills di er markedly. In (b), the b 1 crocker for the double mill contains small islands corresponding to two loops (b 1 = 2) within a large area of topological noise (b 1 > 2). In (a), the b 1 crocker for the single mill lacks this signature. In (c), a very strong signature of two loops (b 1 = 2) appears for the double ring simulation. Appendix A provides an explanation for the presence of two loops for double mills and rings. In (d), multiple agents form tight clumps in the collective swarm simulation, with each clump su ciently dense that it appears as a single dot in the gure. The time scale at which clumps form manifests as the disappearance of high-valued regions in the b 0 contour. On a macroscopic scale, we notice that each clump eventually travels with rotational motion, consistent with b 1 = 1 over a range of scales at later times. In (e), agents escape to in nity in a radially expanding circular arrangement. The strong signature of b 1 = 1 occurs at larger scales as time increases, consistent with an expanding circle.

D. Unsupervised learning
We use the k-medoids algorithm to cluster numerical simulations. Each simulation is characterized by a feature vector constructed either from traditional order parameters or from topology. The order parameter feature vectors consist of time series P(t), M ang (t), M abs (t), D NN (t), or the concatenation of all four. The topological feature Chaos ARTICLE scitation.org/journal/cha vectors consist of vectorized crocker matrices for b 0 , b 1 , or the concatenation of the two, calculated from agent position (in some cases, augmented with time-delayed position as described in Sec. II C). The k-medoids algorithm divides the ensemble of simulations into k clusters, each of which is de ned by one member of the ensemble that serves as the medoid. The algorithm chooses medoids to minimize the sum of pairwise distances within each cluster, and each simulation is assigned to the cluster containing its closest medoid. We use the R software function pam to cluster our simulations into k = 25 groups, since there are 25 distinct parameter choices (C, ). This is an unsupervised approach, as it does not require labeled training data.

E. Supervised learning
As an alternative approach, we use a multiclass linear support vector machine (SVM) to infer parameters. 43 Our use of SVMs is supervised because we train them on a subset of our simulations, each labeled with its true (C, ) values. A linear SVM takes this training data and nds hyperplanes that maximally divide the simulations according to parameter values. To classify a simulation not included in the training set, one identi es the intrahyperplane region in which it falls and reads o the appropriate label, i.e., the parameter values.
We use Matlab's fitcecoc function to build and train our SVMs using a one-vs-one approach and 5-fold cross-validation. That is, for each round of cross-validation, we withhold 20% of the data (20 simulations from each parameter combination) for the testing data set. We then train the linear SVM on the remaining data and compute the out-of-sample accuracy for simulations in the testing set.
Order parameter and topological feature vectors are not of the same dimension. The time series of each order parameter is 87-dimensional, and the time series of all four concatenated is 4 × 87 = 348-dimensional. On the other hand, each position crocker is 200 × 87 = 17 400-dimensional, and the concatenation of b 0 and b 1 is double that. To make a fair comparison between the order parameter and topological approaches, we use principal component analysis 44 (PCA) to reduce the dimensionality of our input feature vectors. In one case, we reduce crockers and the concatenated order paramers to 87 dimensions in order to compare them directly to the individual order parameter time series. In a second case, we reduce all feature vectors to three dimensions to investigate performance at low dimensionality. Figure 4 recapitulates the entire analysis pipeline for our study. We summarize the dynamics of (1) using two approaches. The order parameter-based approach uses problem-speci c quantities designed to distinguish between observed dynamics. The topologybased approach does not require this a priori knowledge and is instead based purely on the shape of the data. We then construct feature vectors from each type of summary and input them into machine learning algorithms to identify parameters and a phenotypic pattern for each model simulation. In Secs. III A and III B, we compare how accurately these di erent approaches can recover simulation parameters and identify dynamic phenotypes.

FIG. 4.
Our analysis pipeline. We summarize the dynamics of the D'Orsogna model (1) using problem-specific order parameters (left) and a problem-independent description based on topology (right). We construct feature vectors from each summary and input them into machine learning algorithms to identify parameters and phenotype for each model simulation. Table II summarizes results for k-medoids clustering with k = 25. Columns 1 and 2 specify the feature vector used, while columns 3 and 4 give the accuracies obtained for parameter recovery and phenotype identi cation. For the parameter recovery task, using the concatenation of all four order parameters yields 49.9% accuracy. For the concatenation of b 0 and b 1 based only on position data, we obtain 76.6% accuracy. Finally, for b 0 and b 1 based on time-delayed position, we have 71.3% accuracy. In Appendix B, Fig. 7 displays confusion matrices. These reveal that regardless of feature vector type, the collective swarming and single mills are the most frequently misclassi ed phenotypes.

A. Unsupervised learning results
For all feature vector types, phenotype identi cation is signi cantly more accurate than parameter recovery. The confusion TABLE II. Unsupervised classification accuracy for parameter values using k-medoid clustering with various input feature vectors and k = 25. The third column displays the accuracy when simulations are classified by parameter vector (C, ), and the fourth column displays the accuracy when simulations are classified by phenotype.

Summary
Feature Parameter (%) Phenotype (%) Chaos ARTICLE scitation.org/journal/cha matrices reveal that most cases of parameter recovery failure nonetheless place a simulation in its appropriate phenotypic regime. Using topological feature vectors based on position, we classify nearly every simulation correctly by phenotype, and this approach slightly outperforms concatenation of all order parameters. The slight improvement in classi cation accuracy when using TDA instead of order parameters may not be su cient to justify the increased computational time for phenotype classi cation tasks. However, for parameter recovery tasks, TDA signi cantly improves classi cation accuracy when using k-medoids, and, as discussed shortly, when using a supervised classi cation method.

B. Supervised learning results
Table III summarizes supervised classi cation results for linear SVMs. For topological feature vectors based on position, b 0 does best with 97.0% accuracy. Similarly, for delayed positions, b 0 also does best, with 99.6% accuracy. Finally, for order parameter feature  vectors, D NN (t) does best with 91.1% accuracy, while concatenating all four order parameters yields 89.2% accuracy. For any feature vectors with a dimension greater than 87, we also include results obtained after reducing the dimensionality to 87 via a principal component analysis, allowing for a more fair comparison. After dimension reduction, the time-delayed topological information for b 1 achieves the highest classi cation accuracy at 99.9%, followed by the position-only topological information for b 0 at 96.2%. The classi cation accuracy for all four concatenated order parameters drops to 69.6%. Thus, a fourfold reduction in the dimension of the concatenated order parameters results in a 19.6% loss of accuracy, whereas a 200-fold reduction for time-delayed topological information leads merely to a 0.8% drop in accuracy for position-only information and a 0.1% increase for time-delayed topological information. These results suggest that even with dimensionality reduction, the topological feature vectors still carry more discriminative information.
To examine the limit of low-dimensional data, we calculate accuracies obtained after reducing all feature vectors to three dimensions. In this case, for topological feature vectors based on position data, the concatenation of b 0 and b 1 does best, with an accuracy of 93.1%. For delayed position data, b 0 does best, achieving 87.7% accuracy, and for order parameters, D NN (t) does best, yielding 81.5% accuracy. Figure 6 in Appendix B shows the three-dimensional representations of the b 0 and b 1 crockers. We observe a strong separation of the di erent phenotypes, which explains the high out-of-sample classi cation accuracy.
Also in Appendix B, Fig. 8 visualizes the classi cation results. These results suggest that, similar to the unsupervised case, collective swarms are the most di cult phenotype to classify. Still, overall, using topological data rather than order parameter data can signi cantly improve parameter recovery.

IV. CONCLUSIONS AND DISCUSSION
We have combined mathematical modeling, topological data analysis, and machine learning to study nonlinear dynamics and parameter inference in the D'Orsogna model of collective motion. More speci cally, we simulated (1), summarized the data using traditional and topological descriptors, and input these summaries into unsupervised and supervised machine learning algorithms in order to recover model parameters and classify pattern phenotypes.
Our machine learning classi ers achieved higher accuracy when using topological feature vectors (namely, crockers) than when using feature vectors based on traditional order parameters. Since the crocker feature vectors have higher dimensionality than the order parameter ones, we sought a fair comparison by reducing them via PCA. In this case, the crocker approach still achieved better classication accuracy using a supervised approach. In fact, b 0 crockers generated from time-delayed position data produced a nearly perfect classi cation. The time-delayed crockers encode information on particle velocity, which appears explicitly in Eq. (1) and may thus aid the algorithm. The addition of b 1 information serves primarily to increase the dimensionality of the feature space and results in reduced, though still quite high, classi cation accuracy.
One limitation of the topological approach is the computational cost required to produce crockers. While an order parameter Chaos ARTICLE scitation.org/journal/cha is scalar-valued at each time, a crocker is vector-valued. However, recent software improvements, including the development of the Ripser package, have led to a signi cant reduction in cost. A major advantage of using topological data summaries is that they do not require prior knowledge about the patterns resulting from model simulation. Order parameters, on the other hand, are typically developed to capture speci c features of previously observed model behavior. We found that for the D'Orsogna model, topological approaches to phenotype classi cation and parameter recovery achieved higher accuracy than order parameters even though they do not incorporate knowledge of the model or its dynamics.
In future work, we would like to apply this approach to data from biological experiments or eld observations. There is a scarcity of publicly available data describing real biological aggregation dynamics, so for the present, we have demonstrated our method on simulation data. Furthermore, we would like to extend our work to more complex settings, e.g., to the D'Orsogna model posed in three dimensions, in which dynamical transitions occur between distinct phenotypic regimes. 32 Finally, it would be useful to augment the model with noise and assess its e ect by using our topological methods.

ACKNOWLEDGMENTS
This material is based upon work supported by the National Science Foundation (NSF) under Grant No. DMS 1641020. We wish to acknowledge the American Mathematical Society's Mathematical Research Community, which brought our collaboration together and supported our work. L.Z. is partially supported by the NSF grant Chaos ARTICLE scitation.org/journal/cha and double ring phenotypes. Since the data are 4D, as described in Sec. II C, we show various 2D projections. Panel (a) shows a single mill. We see a single looplike structure that arises from the arrangement of particles in an annulus all traveling with the same orientation. This structure produces b 1 = 1 in Fig. 1(a). Panel (b) shows a double mill. Especially in the last two columns, we see a signal of two loops. The two loops correspond to the two counter-rotating swarms of the double mill. However, the signal is quite noisy. This noisy signal manifests as the transient islands of b 1 = 2 in Fig. 1(b). Panel (c) shows a double ring. Agents occupy a well-de ned circle, with some rotating clockwise and others counterclockwise. This phenotype gives rise to two loops in the 4D space of our data (see last two columns) and produces a clear signal of b 1 = 2 in Fig. 1(c). Figure 6 displays the representation of all 2500 time-delayed b 0 and b 1 crockers reduced to three-dimensional space using PCA. Here, we see a strong separation of the single mill, double mill, collective swarm, escape, and double ring phenotypes. This separation explains why a linear SVM trained on this information has a fairly high out-of-sample classi cation accuracy, as shown in Table III. In panel (a), we cluster on the concatenation of all four order parameters. Most of the simulations with misidenti ed model parameters are those exhibiting single mill behavior or collective swarm behavior. However, for these incorrect cases, most were clustered with cases sharing the same phenotype. In panel (b), we use the concatenation of b 0 and b 1 crockers derived from 2D position data. In this case, parameter recovery is more accurate for single mill simulations than in panel (a) but collective swarms remain challenging. Panel (c) is similar to (b), but it is based on 4D data incorporating position and time-delayed position. This approach yields results similar to those in (b), with a slight increase in misclassi cation among the three escape phenotypes. Figure 8 depicts the out-of-sample parameter classi cation results from linear SVMs, as described in Sec. III B. Note that in this gure, we are depicting the classi cation of each individual simulation and not binning these classi cations together as we do in the confusion matrices of Fig. 7. Because of the high supervised classi cation accuracies, depicting the individual classi cations is more informative than the summary confusion matrix. In panel (a), the linear SVM is trained on feature vectors comprised of D NN (t) without any dimensionality reduction. We observe a high misclassi cation from parameters (C, ) = (0.1, 0.5) and (0.5, 3.0) as each other, as well as simulations from (C, ) = (0.1, 0.9), (0.1, 2.0), (0.1, 3.0). All of these parameter choices produce the collective swarm phenotype (see Table I), suggesting that this is the most di cult phenotype for parameter recovery. In panel (b), the linear SVM is trained on feature vectors comprised of the concatenation of b 0 and b 1 crockers derived from 2D position data with dimensionality reduction down to 87 [to match the D NN (t) dimensionality]. Here, we observe a marked reduction in the misclassi cation of the collective swarm parameter values as compared to Panel (a). In panel (c), the linear SVM is trained on feature vectors comprised of b 0 and b 1 crockers derived from 4D time-delayed data with dimensionality reduction down to 87. Here, we observe very accurate classi cations, with only 7 simulations being misclassi ed out of 2500.