Anatomy of Leadership in Collective Behaviour

Understanding the mechanics behind the coordinated movement of mobile animal groups (collective motion) provides key insights into their biology and ecology, while also yielding algorithms for bio-inspired technologies and autonomous systems. It is becoming increasingly clear that many mobile animal groups are composed of heterogeneous individuals with differential levels and types of influence over group behaviors. The ability to infer this differential influence, or leadership, is critical to understanding group functioning in these collective animal systems. Due to the broad interpretation of leadership, many different measures and mathematical tools are used to describe and infer "leadership", e.g., position, causality, influence, information flow. But a key question remains: which, if any, of these concepts actually describes leadership? We argue that instead of asserting a single definition or notion of leadership, the complex interaction rules and dynamics typical of a group imply that leadership itself is not merely a binary classification (leader or follower), but rather a complex combination of many different components. In this paper we develop an anatomy of leadership, identify several principal components, and provide a general mathematical framework for discussing leadership. With the intricacies of this taxonomy in mind, we present a set of leadership-oriented toy models that should be used as a proving ground for leadership inference methods going forward. We believe this multifaceted approach to leadership will enable a broader understanding of leadership and its inference from data in mobile animal groups and beyond.

When observing the collective motion of animal groups (e.g., schooling, herding, or flocking), an immediate question is: what is the leadership structure? Who (if anyone) is in charge and who is following, and does such structure stay the same or change over time?
Recent technological advances in image processing and animal-mounted sensors make it possible to record the simultaneous movement trajectories of every animal in a group. Such abundance of data makes the present a promising time to progress in understanding leadership structure in mobile animal groups. Despite the availability of data and the central importance of understanding leadership in collective motion, there is surprisingly little explicit mathematical description or even a consistent and well-defined approach to this subject. Here, as a first step toward addressing this deficiency, we construct a framework for inferring leadership in collective motion. We review various sources and characteristics of leadership to provide an anatomy and a language for describing the multifaceted aspects of leadership across a variety of animal societies. We then present a suite of leadership-focused toy models, which can be used as a proving ground for any proposed leadership inference method before it is naively applied to (empirical) data. Together, this lays the groundwork for a principled exploration of a perennial question: how is control of a collective system distributed? Such understanding will not only contribute to the ecology and conservation of group-traveling species, but will also aid in the design of control algorithms for emerging distributed technologies.

a) Electronic mail: Joshua@santafe.edu

I. OVERVIEW
Mobile animal groups (e.g., flocks, herds, schools, swarms) are ubiquitous in nature. In such collective systems, the interactions between individuals may be as important as the characteristics of the individuals themselves 1. Insight into these interactions and their impact on the group dynamics is of fundamental importance for our understanding of both the ecology of these systems 2 and the design and control principles underlying general complex systems 3.
A key challenge in the study of collective animal behavior is understanding how groups of organisms make decisions as a whole 4, for example about where 5 or when 6,7 to go. Group decision-making processes range from despotic to shared 8, although even in systems with shared or distributed decision making there are likely inter-individual differences (e.g., sex, rank, personality, size, nutritional state, informational state) that produce asymmetry in influence. Models suggest that such heterogeneity is potentially important to group-level dynamics 9,10, but inferring differential influence and leadership from empirical data, though often attempted, is an open challenge. As we elaborate in some detail in this paper, a key step toward tackling this challenge lies in the recognition that the notion of leadership is not merely a simple, unidimensional concept. Instead, a rich palette of different types and forms of leadership often coexists, even within the very same system. Thus, we argue that a precursor step to the "correct" inference of leadership is the clarification of what (type of) leadership is sought. Without such clarification, any inferred leadership can potentially be deemed inappropriate.
The need to distinguish between the definition and inference of leadership stands out as a central problem partly because of the acceleration of technical progress that has enabled the collection of "big" data. For example, new technologies to collect the simultaneous trajectories of all members of a mobile animal group 11, along with increases in computing power, make the near future a fruitful time to meet this challenge. Will having a large amount of real-world data alone be sufficient to address questions about leadership, or do we (still) need conceptual advances? As recently reviewed by Strandburg-Peshkin et al. 12, most efforts to infer leadership have used position within a group 13-16 (e.g., leaders are assumed to be at the front), initiator-follower dynamics 17,18, or time-delayed directional correlations 19-23. Information theoretic measures provide additional, potentially more powerful and less subjective, tools to infer leadership and influence 24-26. However, a central viewpoint of this paper is that any measurement of leadership needs to start by clarifying the particular type or form of leadership one is after. Without such clarification, the "leadership" resulting from the application of any inference method can be subject to misinterpretation and, perhaps more seriously, lead to fundamentally flawed conclusions about the interaction mechanisms of an animal system.
To illustrate the many facets of leadership, and thus the need to distinguish between its definition and measurement, consider the case of migrating caribou. Older, more experienced individuals are thought to guide the migration-scale movements 27; however, pregnant or nursing females might have increased nutritional requirements 28 and thus guide movements along that path towards habitat with better forage opportunities 29. Therefore, who is leading depends on the time and length scales of the movements considered. Additionally, for some populations fall migration coincides with the rut, so mating behaviors drive social interactions: a dominant male may attempt to herd females or drive other males away. Such a male is certainly influential, but perhaps should not always be considered a leader, at least in the context of the migration. Finally, whether or not an individual is a leader might depend on who (or which group) one is considering as a potential follower. A nursing (and thus infertile) female might be ignored by the libidinous male, but will be closely followed by her calf 30. Because there are many scales and types of influence/leadership, we argue that one should begin such explorations with a clear question and select analytical methods to match.
The central goal of this paper is to develop a formal language and multifaceted framework for defining and (potentially) inferring the many aspects of leadership. In addition, we aim to provide a set of leadership-oriented toy models to serve as a proving ground for leadership inference methods. Thus our work here offers a practical language and set of tools for researchers hoping to match questions about leadership with the appropriate methods while avoiding potential pitfalls. We hope that the combination of mathematical rigor and biological intuition, together with several real and synthetic examples, will make our framework accessible and interesting to both biologists and applied mathematicians.

II. GENERAL MATHEMATICAL FRAMEWORK
To capture various forms of leadership, consider the dynamics of individuals (with potential interactions among them via a network) together with the dynamics of the group determined by the individuals, modeled by ODEs of the general form

$$\dot{x}_i(t) = f\big(x_i(t), \{x_j(t)\}_{j \neq i}, S(t), \mu_i(t), \xi_i(t)\big), \qquad y(t) = h\big(x_1(t), \dots, x_n(t)\big). \tag{1}$$

In this general model class, $x_i(t)$ represents the state of the $i$-th individual at time $t$ ($i = 1, \dots, n$), and $S = [S_{ij}(t)]_{n \times n}$ is the (time-dependent) adjacency matrix (also known as the sociality matrix) of a network encoding the structure of interactions, where $S_{ij} \neq 0$ if it is possible for $j$ to (directly) impact the state of $i$. Furthermore, $\mu_i(t)$ denotes the parameter (vector) associated with $i$, and $\xi_i(t)$ is noise. Here a "parameter" can be anything that describes the heterogeneity of the individuals in the group. For example, in the Vicsek model 31, the parameter $\mu_i$ can represent the preferred direction of an individual, the speed of an individual (which might differ from one individual to another), or both, by associating a parameter vector with each individual. The function $f$ models how the dynamics of each individual depend on its own state and parameter(s), the state of others in the network, and noise. Finally, the state of the group, $y(t)$, is determined by the state of the individuals through the function $h$; for example, taking $h(x_1(t), \dots, x_n(t)) = \frac{1}{n} \sum_{i=1}^{n} |x_i(t)|$ defines the group state as the average of the individuals' states.
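As a rough illustration of the model class in Eq. (1), the following sketch integrates one hypothetical instance with the Euler-Maruyama scheme. The specific choice of f (each individual relaxes toward its own preferred state and toward the states of those who can influence it), the plain-mean choice of h, and the names `simulate`, `group_state`, `coupling`, and `noise` are all assumptions made for this example; they are not taken from the paper.

```python
import math
import random


def simulate(S, mu, x0, steps=200, dt=0.05, coupling=1.0, noise=0.0, seed=0):
    """Euler-Maruyama integration of a toy instance of the general model.

    Hypothetical f: individual i relaxes toward its preferred state mu[i]
    and toward the state of every j with S[i][j] != 0 (j can influence i).
    """
    rng = random.Random(seed)
    n = len(x0)
    x = list(x0)
    trajectory = [list(x)]
    for _ in range(steps):
        new_x = []
        for i in range(n):
            # Social term: weighted pull toward the individuals that can lead i.
            social = sum(S[i][j] * (x[j] - x[i]) for j in range(n))
            drift = (mu[i] - x[i]) + coupling * social
            xi = rng.gauss(0.0, 1.0)  # noise term xi_i(t)
            new_x.append(x[i] + dt * drift + noise * math.sqrt(dt) * xi)
        x = new_x
        trajectory.append(list(x))
    return trajectory


def group_state(x):
    """One possible choice of h: the plain mean of the individual states."""
    return sum(x) / len(x)


# Individual 0 can influence individuals 1 and 2, and nobody influences 0.
S = [[0, 0, 0],
     [1, 0, 0],
     [1, 0, 0]]
trajectory = simulate(S, mu=[1.0, 0.0, 0.0], x0=[0.0, 0.0, 0.0], steps=2000)
final = trajectory[-1]
print(final, group_state(final))
```

In this noise-free run, individual 0 settles at its own preferred state, while the two followers settle halfway between their own preference and the state of individual 0, so the asymmetry encoded in S shows up directly in the group state.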
A separate and complementary perspective is to model/represent the individual and group dynamics as a multivariate stochastic process, focusing on stationary variables $X_i(t)$ and $Y(t)$. From this perspective, the relationship between the group variable and the individual variables is encoded in the conditional distribution function

$$p\big(Y(t) \mid X_1(t^-), \dots, X_n(t^-)\big), \tag{2}$$

where $t^- = (t - \tau, t)$ denotes the time history of the system, taking into account a time lag of $\tau \in (0, \infty)$. We point out that there is an intimate connection between a dynamical system [such as one defined by Eq. (1)] and a stochastic process, generally through an underlying (ergodic) measure 32, where the uncertainty associated with the state of the variables is generally related to the distribution of initial conditions and noise in addition to the coupled dynamics. For a deterministic system, the randomness originates exclusively from the (experimental) imperfection of choosing and determining the initial condition, and the evolution of uncertainty can be treated as a stochastic process. Thus entropy methods are naturally associated even with otherwise deterministic dynamical systems of the form of Eq. (1), in terms of the associated stochastic process.
From the stochastic representation (2) of the dynamics, we can define an individual's (observed) influence on the group using various forms of conditional mutual information. For example, the (unconditioned) mutual information (MI)

$$I\big(x_i(t^-); y(t)\big) \tag{3}$$

measures the apparent influence of $i$ on the group, aggregated over both direct and indirect factors. On the other hand, after factoring out indirect factors, the "net" influence of $i$ on the group can be measured by the conditional mutual information (CMI)

$$I\big(x_i(t^-); y(t) \mid x_{\bar{i}}(t^-)\big), \tag{4}$$

where $\bar{i} = \{1, \dots, i-1, i+1, \dots, n\}$. As suggested recently by James et al. 33, Eq. (4) may not capture influence entirely; therefore care should be taken when quantifying net influence in this way. Note that Eq. (1) itself does not uniquely determine the distribution in Eq. (2), because the system can follow different states/trajectories depending on initial conditions, parameters, and other factors; unique ergodicity and fixed parameters are possible assumptions if we wish to discuss uniqueness. Equation (1) can be interpreted as modeling the possible interactions among the individuals, although these interactions may or may not be realized in a particular setting, depending on the states the system operates in. On the other hand, the PDF in Eq. (2) encodes (intrinsic) dependence between the group variable and the individual variables without necessarily matching the structural information in Eq. (1), even if such dependence comes from the dynamics of Eq. (1).
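To make the distinction between apparent influence (mutual information) and net influence (conditional mutual information) concrete, here is a minimal sketch using plug-in entropy estimates on discrete symbols. It assumes the past values and the present group state have already been aligned into equal-length lists; the function names and the tiny hand-built data set are hypothetical, and a plug-in estimator of this kind is only one of several possible estimators.

```python
import math
from collections import Counter


def entropy(samples):
    """Plug-in Shannon entropy (in bits) of a list of hashable symbols."""
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in Counter(samples).values())


def mutual_information(a, b):
    """I(A;B) = H(A) + H(B) - H(A,B)."""
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))


def conditional_mutual_information(a, b, c):
    """I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C)."""
    return (entropy(list(zip(a, c))) + entropy(list(zip(b, c)))
            - entropy(list(zip(a, b, c))) - entropy(c))


# Toy data: the group state y copies the past of individual 1, while
# individual 2 is a noisy copy of individual 1 (one flip in every four samples).
x1_past = [0, 0, 0, 0, 1, 1, 1, 1]
flips = [0, 0, 0, 1, 0, 0, 0, 1]
x2_past = [a ^ f for a, f in zip(x1_past, flips)]
y_now = list(x1_past)

apparent_2 = mutual_information(x2_past, y_now)
net_2 = conditional_mutual_information(x2_past, y_now, x1_past)
net_1 = conditional_mutual_information(x1_past, y_now, x2_past)
print(apparent_2, net_2, net_1)
```

In this toy data set individual 2 has positive apparent influence on the group, but its net influence vanishes once individual 1 is conditioned on, whereas individual 1 retains positive net influence. As the caution attributed to James et al. suggests, conditioning does not always separate direct from indirect influence this cleanly.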
Next, we distinguish between intrinsic states of the system and observed states, because a key aspect in the mathematical interpretation of any process, including group roles of leadership, is the measurement of observables from the underlying process. In fact, leadership and information flow can be dramatically obscured depending on the details of the observables (extrinsic variables) relative to the underlying system (intrinsic variables). We use $\hat{x}_i(t)$ to represent the observed state regarding $x_i(t)$ and, similarly, $\hat{y}(t)$ for the observed state regarding $y(t)$. We represent the observations over a finite time window, producing observational data $\{\hat{x}_i(t), \hat{y}(t)\}$ over that window.

Proper characterization and interpretation of leadership requires the (subjective) identification of a "reference frame", namely, choosing the (observable) variables, groups, and time and spatial scales. That is, we argue that defining such a frame needs to include making at least the following three choices:

1. Variables (e.g., position, velocity, acceleration, direction of motion, or some combination of these). Depending on the choice of variables, different types of leadership can be defined and (potentially) identified.

2. Temporal resolution and time lag. What is the temporal resolution of the actions of interest (e.g., seconds, days, or years)? Additionally, there is the issue of time lag: how far into the future is an action thought to have potential impact? If the time lag is larger than the time scale of the typical response to an individual's action, then each individual will appear to have a similar random influence on the others. On the other hand, too small a time lag might prevent detection of the (time-delayed) dynamics of the group in response to an individual's actions.

3. Definition of a group and what it represents. For example, a group can contain everyone within a spatial domain, or can be a certain class of individuals based on age, gender, etc.

III. PRINCIPAL COMPONENTS OF LEADERSHIP
In broad terms, we define leadership as an individual having asymmetric potential to impact the trajectory of agents in the group. As we explore below, this asymmetric impact or influence may stem from group structure or individual information, or may emerge from social interaction rules alone. Further, the distribution and the time and length scales of the resulting leadership may vary considerably. In this section we construct a series of informative classifications, which we will refer to as the components of leadership. We further divide these components into sources and characteristics of leadership.

A. Sources of leadership
Structural Leadership. Structural leadership encompasses a wide range of leadership that fundamentally relies on the structure of the animal society. This structure could be an explicit dominance hierarchy or, more subtly, unequal social influence due to semi-persistent traits (e.g., age, gender, reproductive status). Depending on the particular taxa, the driving mechanism for such asymmetric interactions differs, and deriving such a mechanism is not the purpose of this manuscript. For simplicity, we assume all of this rich societal structure has been pre-encoded in the sociality matrix defined in Eq. (1). In particular, $S_{ij} \neq 0$ if and only if $j$ has the capacity to lead $i$ directly, where "capacity to lead" is defined by the particular society.
To formalize this component of leadership, let $G$ be the directed graph associated with the sociality matrix $S$, where there exists an edge from $j$ to $i$ if $S_{ij} \neq 0$. For each node $\ell \in G$, denote the reachability set of node $\ell$ as $F_\ell$. In particular, node $k$ is a member of $F_\ell$ if there exists a directed path from $\ell$ to $k$ in $G$. If $F_\ell \neq \emptyset$, then $\ell$ is defined to have capacity for structural leadership. We define the set of individuals with non-zero capacity to exhibit structural leadership (i.e., with a nonempty reachability set on the sociality matrix) as $L$. Of course, the degree to which an individual is a structural leader exists on a continuum. Quantifying the strength of such leadership is a highly non-trivial and potentially system-specific task (e.g., 34-36). However, to first order, individuals with many individuals downstream of them and fewer individuals upstream of them in the sociality matrix will tend to play a stronger leadership role, or at least have the potential to do so.
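The reachability-set construction above can be sketched with a breadth-first search over the sociality matrix. This is a minimal illustration, assuming S is given as a list of lists with S[i][j] != 0 meaning that j has the capacity to lead i; the function names are hypothetical, and ignoring diagonal entries (so that nobody counts as leading themselves through a self-loop) is an assumption of this sketch.

```python
from collections import deque


def reachability_set(S, leader):
    """Return every k reachable from `leader` along edges j -> i with S[i][j] != 0."""
    n = len(S)
    seen = set()
    queue = deque([leader])
    while queue:
        j = queue.popleft()
        for i in range(n):
            # Diagonal entries are skipped: an assumption of this sketch.
            if i != j and S[i][j] != 0 and i not in seen:
                seen.add(i)
                queue.append(i)
    return seen


def structural_leaders(S):
    """Individuals with a nonempty reachability set."""
    return {l for l in range(len(S)) if reachability_set(S, l)}


# Chain 0 -> 1 -> 2: individual 0 can lead 1, and individual 1 can lead 2.
S = [[0, 0, 0],
     [1, 0, 0],
     [0, 1, 0]]
print(reachability_set(S, 0))   # {1, 2}
print(reachability_set(S, 2))   # set()
print(structural_leaders(S))    # {0, 1}
```

Note that if the graph contains a cycle, an individual can appear in its own reachability set under the path-based definition; the sketch keeps that behavior. The size of each reachability set gives only a crude first-order proxy for the strength of structural leadership.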
In our caribou example from the Introduction, we might expect to find strong hierarchical relationships between males during the rut. With these hierarchies encoded in the sociality matrix, the dominant males would be labeled as strong structural leaders and the weaker males would be members of various reachability sets. In the same example, if a nursing calf closely followed its mother, then the mother would exhibit structural leadership over her calf. Finally, note that while the mother is a structural leader to the calf, she may be influenced by a dominant male, making this mother a structural leader and a follower simultaneously and making the male an indirect structural leader of the calf. Therefore a binary classification of 'leader vs. follower' is generally not appropriate.
To further illustrate this point, consider the canonical example of hierarchical dynamics in pigeon flocks from Nagy et al. 19 depicted in Figure 1. In this example, nodes C and J have no structural leadership capacity, as they have empty reachability sets. All other nodes, however, have the capacity to lead at least one other individual and thus all have some degree of structural leadership capacity. Notice that, with the exception of node A, each of the remaining individuals both leads and follows, i.e., they have non-empty reachability sets and are also members of others' reachability sets. The strength of their structural leadership would roughly mirror their vertical position in Figure 1.

Figure 1: Here each directed edge represents the capacity for an individual to lead as defined by Nagy et al. 19. For example, L has the ability to lead J, and J has the ability to lead no one.

Structural leadership is simply the capacity for a member of an animal society to lead other members of that society as dictated by the society's rules. In this sense, structural leadership should really be seen as a necessary but not sufficient condition for leadership to occur within a mobile animal group. In practice, however, this component of leadership is quite important because it encodes the potentially important heterogeneity in interactions between specific pairs of individuals and, more generally, any hierarchies in the group.
Informed Leadership. Informed leadership arises when a subset of the group is differentially informed and motivated to act on that information, e.g., a subset of the group senses a resource 37,38 or has information about a migration route 5,39. Such leaders may be anonymous 9, or may indicate that they have information, for example by changing speed 40 or signaling 41.
In the case of our migrating caribou, both the experienced individuals leading the long-scale migration movement, and the individuals responding to local food and predation cues provide complementary examples of informed leadership.
Informed leadership generally arises from some underlying intent or motivation e.g., hunger or fear. For this reason, while the concept of informed leadership is intuitively sensible, from a mathematical standpoint it is both difficult to define and perhaps impossible to accurately infer without additional knowledge of the system.
Target-Driven Leadership. Target-driven leadership is a specific subset of informed leadership. A target-driven leader is an informed leader ("informed by target") that uses a series of deliberate control inputs, such as calls or explicit motions, to guide a group toward a particular target state or set of target states. However, not all informed leadership is target-driven. For example, when an individual from a group of animals detects a predator, that individual becomes "informed" and tries to move away, and such an abrupt change of motion may cause the rest of the group to follow. In this case, the first-reacting individual exhibits informed leadership, but its sole "target", if any, is to move away from the predator rather than to lead the entire group away from the predator.
To be more precise, we characterize a target-driven leader as an individual that not only influences the group, but deliberately controls the group toward some target state. In addition, the removal of such an individual should result in the group not going towards the target state. Mathematically, we define this component as follows. Given that $A$ is a set of target states, individual $i$ is a target-driven leader (with respect to $A$) if the net influence of $i$ on the group (see Eq. 4) is nonzero and the group state $y(t)$ progresses toward $A$. That is, the individual directly influences the group as a whole, and that influence results in the group progressing toward the target states. An example of a target-driven leader is a sheep dog. Such a dog runs behind a group of sheep and, through an intentional series of signals such as barking, eye contact, and body posture, deliberately controls the sheep herd toward a given target state such as a barn or field.

Emergent Leadership. Asymmetrical influence, and thus leadership, may arise from social interaction rules alone, in the absence of social structure or differential information; we term this emergent leadership. This would be the case if animals used anisotropic social interaction rules. For example, when individuals are more influenced by other individuals that are in front of them, individuals in more frontal positions of the group are more influential, even if they have no additional information, motivation, or status. Such emergent leadership has recently been shown to be the case in our migratory caribou example 30.
Alternately, if individuals are more influenced by faster-moving group-mates 42 , then those faster-moving individuals will have more influence. If those individuals are moving more quickly in response to information, or to signal dominance, then this would be informational or structural leadership, respectively, but if the increased speed is purely a function of the group dynamics, this would be an example of emergent leadership.

B. Characteristics of Leadership
Distribution of Leadership. In animal groups, decisions range from fully distributed among all group members ('democratic') to dominated by a single or a few individuals ('despotic') 8,12. It can be informative to quantify the number of individuals involved in a leadership role within the group. Similar to Ref. 12, we refer to this as the distribution of leadership, which we define on a continuum that lies between centralized and distributed leadership.
At the scale of the entire herd, we might expect our migrating caribou to fall somewhere on this spectrum, bookended by primate societies with an alpha individual on one end and leaderless fission-fusion fish schools on the other. If we consider the mother-calf pairs as subgroups, we would expect the mother to be a centralized leader. However, in a larger group containing many such pairs, we would expect distributed leadership shared between the mothers. The pigeon example in Figure 1 illustrates that many systems fall somewhere between these two extremes. In this example nearly all of the individuals have some influence, yet it has a clear hierarchy so it is not fully decentralized; it therefore lies somewhere between centralized and distributed.
Temporal Scale of Leadership. A leader may not be actively influencing the motion of other agents at all times, and it is thus useful to quantify and understand the time scales for which a leader qualifies as a leader under any of the components of leadership. Here, we consider two notions of time scales: consistency and granularity. For the following discussion, consider dynamics of individuals represented by discrete-time observations $\{x_i(t)\}_{t=0}^{T}$. Consistency of leadership is simply defined as the proportion of the observation window for which a leader qualifies as a leader. More specifically, we classify a leader as persistent over the observation window if it is identified as a leader for the entire time window. Conversely, we classify a leader as ephemeral if it only qualifies as a leader for some small time window $[t, t + \tau]$, with $\tau \ll T$. A similar temporal leadership scale is presented in Ref. 12; it ranges from variable to consistent but attempts to capture the same notion.
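Consistency as defined above can be estimated by scanning sub-windows of the observation record and counting the fraction in which a candidate qualifies as a leader. The sketch below is only an illustration: the function name `consistency`, the sliding-window scheme, and the placeholder criterion (the candidate occupies the front position in more than half of a window) are assumptions of this example, and any leadership test could be substituted for the placeholder.

```python
def consistency(series, is_leader, window):
    """Fraction of sliding windows of length `window` in which `is_leader` holds.

    `series` is any per-time-step record, and `is_leader` is a user-supplied
    test applied to each window (a placeholder criterion in this sketch).
    Requires len(series) >= window.
    """
    starts = range(0, len(series) - window + 1)
    flags = [bool(is_leader(series[s:s + window])) for s in starts]
    return sum(flags) / len(flags)


# Hypothetical record of which individual is at the front at each time step.
front = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1]


def mostly_front_0(window):
    """Placeholder test: individual 0 is at the front for over half the window."""
    return window.count(0) > len(window) / 2


score = consistency(front, mostly_front_0, window=4)
print(score)  # 3 of the 9 windows qualify
```

A score of 1 would correspond to a persistent leader over the record, while a score close to 0 with a few qualifying windows would point to ephemeral leadership. The window length is itself a choice that should be matched to the time scale of interest.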
The granularity of leadership concerns the resolution of time steps for which an individual acts as a leader. For example, a leader for daily activities might be different from a leader for seasonal activities. We can check for granularity by altering the time step at which we examine the dynamics. In particular, we quantify leadership using only the observations $\{x_i(kt)\}_{t=0}^{T/k}$ ($k > 1$) for a large range of $k$. If a leader only acts on a coarse basis, then they may not register as a leader for small $k$ but may register as a leader for some larger $k$. In contrast, a fine-scaled leader may register for many $k$.
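The downsampling check described above can be sketched as follows. The leadership score used here (the fraction of steps at which the follower's next state equals the leader's current state) is a deliberately simple placeholder, and the names `downsample`, `lagged_agreement`, and `granularity_profile` are hypothetical; any inference method could be plugged in instead.

```python
def downsample(series, k):
    """Keep every k-th observation, i.e. x(0), x(k), x(2k), ..."""
    return series[::k]


def lagged_agreement(leader, follower, lag=1):
    """Placeholder score: fraction of steps where follower(t + lag) == leader(t)."""
    pairs = list(zip(leader[:-lag], follower[lag:]))
    return sum(a == b for a, b in pairs) / len(pairs)


def granularity_profile(leader, follower, ks):
    """Score the same pair of series at several downsampling factors k."""
    return {k: lagged_agreement(downsample(leader, k), downsample(follower, k))
            for k in ks}


# Toy data: the follower copies the leader with a delay of 4 time steps.
T = 48
leader = [(t // 3) % 2 for t in range(T)]
follower = [-1] * 4 + leader[:-4]   # -1 marks "no response yet"

profile = granularity_profile(leader, follower, ks=[1, 4])
print(profile)
```

In this toy data set the one-step score is zero at the original resolution, while at k = 4 the downsampled step matches the response delay and the score is one. This is the situation described above, in which a coarse-grained leader fails to register at small k but registers at larger k.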
In our migrating caribou example, the experienced individuals leading the broad migration path exhibit leadership that is persistent, but perhaps has coarse granularity. In contrast, the leadership of those animals responding to resources or predation threats along the way is ephemeral and has fine granularity.
Time scales present several challenges when attempting to infer leadership roles from a time series. If the granularity or observation window length does not match the natural time scales of leadership, then leadership events may be completely missed or misclassified. For example, consider a structural leader with the property that $I(x_i(t^-); y(t) \mid x_{\bar{i}}(t^-)) = 0$, i.e., a structural leader that does not directly influence the group, although it has the potential to. Regardless of the inference method, such a potential leader will always be misclassified. Similarly, consider an informational leader that only leads when it is within some radius of a known resource, and suppose that this event only occurs for a very short time window $[t, t + \tau]$, with $\tau \ll T$. If one only considers leaders that lead for the entire observation window, most aggregate measures will wash out such an ephemeral leadership event. For these reasons, by carefully considering both consistency (studying subsamples of the data set) and granularity (downsampling the data and retesting), one will be able to obtain a much clearer picture of the leaders that are present in a mobile animal group.

Reach of Leadership. The reach of a leader quantifies the members of the group that the leader has potential influence over, directly and indirectly through subsequent interactions. Formally, we define the reach of a leader as the members of that leader's reachability set on a graph associated with a particular source of leadership. In particular, let $G$ be a graph where there exists a directed edge from node $j$ to node $i$ if $j$ has the capacity to lead $i$, where capacity to lead may be structural, emergent, or informed leadership. Then the reach of agent $i$ is the reachability set of $i$ on $G$.
Consider Figure 2, where the graph represents the potential for structural leadership. In this example, individual G has a reachability set of {D, B, H, L, I, C, J}, and thus those 7 agents are within the reach of structural leader G. Reach naturally lies on a continuum between local and global. If an agent exemplifies some form of leadership over all individuals, this would be global reach; if an individual only leads some small subset of the group, then this leader is considered local. In Figure 2, agent A has global reach and agent I has local reach.
In the case of our migrating caribou, the experienced migrants leading the entire herd on its broad migration path, would have global reach, while the mother leading her calf on a finer-scale would have local reach.
Observability of Leadership. When we observe an animal society we do so imperfectly, mainly in two ways. First, any observed quantity is subject to noise and measurement errors. Second, and perhaps more importantly, there may be elements of the society that go unobserved. Such hidden variables and states may in turn affect our interpretation of leadership. In fact, the strongest leadership might not be detectable if the data are not appropriate. Across various taxa, leaders may use vocal cues 43,44, gestures 45, or movements that are too fine to be picked up by GPS (e.g., pre-flight flapping 46) to initiate or control movement. If the resulting movement is synchronized, leadership inference based on trajectories will fail. Worse, if the least dominant individuals respond first to the cues in the resulting movement, it could appear as though those individuals are leading.
In the case of our migrating caribou, lead animals may stand up to signal departure, or motivate others to start moving. This would not be captured by GPS tags and so would be hidden to inference methods based on trajectories alone.
Quantification of hidden leadership in practice is quite difficult by definition: if leadership has been detected, then it was observed. Defining it in theory, however, is straightforward. As in Section II, we define the full system dynamics via $x$, $y$, $S$ (and/or some mix of these). When the system is being observed, the observed variables, denoted $\hat{x}$, $\hat{y}$, and $\hat{S}$, can differ from the true ones. We term an individual's leadership role 'hidden' if it exhibits leadership defined in terms of the intrinsic variables $(x, y, S)$ but does not appear to do so given the observed variables $(\hat{x}, \hat{y}, \hat{S})$; a leader that is not hidden is then called an observable leader.
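In a synthetic setting where the intrinsic system is known, the definition of a hidden leader reduces to a set difference. The sketch below restricts attention to structural leadership capacity and compares a true sociality matrix with an observed one; the function names and both matrices are hypothetical, and with empirical data the intrinsic matrix would of course not be available.

```python
def has_leadership_capacity(S, l):
    """True if l can directly lead someone else, i.e. S[i][l] != 0 for some i != l.

    This is equivalent to l having a nonempty reachability set when
    diagonal entries are ignored.
    """
    return any(S[i][l] != 0 for i in range(len(S)) if i != l)


def hidden_leaders(S_true, S_obs):
    """Leaders under the intrinsic matrix that do not appear as leaders when observed."""
    n = len(S_true)
    intrinsic = {l for l in range(n) if has_leadership_capacity(S_true, l)}
    observed = {l for l in range(n) if has_leadership_capacity(S_obs, l)}
    return intrinsic - observed


# Intrinsic chain 0 -> 1 -> 2; the observation misses the edge 0 -> 1
# (for instance because the cue used by individual 0 is not recorded).
S_true = [[0, 0, 0],
          [1, 0, 0],
          [0, 1, 0]]
S_obs = [[0, 0, 0],
         [0, 0, 0],
         [0, 1, 0]]
print(hidden_leaders(S_true, S_obs))  # {0}
```

This kind of comparison is mainly useful with toy models, where the ground truth is available and one can test how much leadership a given observation scheme conceals.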

C. Real World Animal Behavior and the Anatomy of Leadership
Here we discuss real-world animal interactions in a manner that emphasizes the terminology of our anatomy-of-leadership taxonomy.
We expect to find structural leadership in relatively stable animal groups, often having complex social hierarchies, such as cetaceans, wolves, wild dogs, elephants and primates 15,47-50. The canonical example is the so-called 'alpha' individual in a primate society, who has some level of control over an entire group over a long period of time (assuming the society is stable) 51. In our taxonomy, this dominant individual would be a persistent, centralized, structural leader with a large reach. It is important to note that in such societies, structural leadership may be well correlated with informational leadership. For example, a matriarch elephant may have better information about rarely visited water holes, as well as greater per-capita influence to lead her group to them.
We expect informational leadership to dominate in animal groups composed of unrelated individuals with unstable membership (i.e., fission-fusion dynamics), such as fish schools and bird flocks. A single arbitrary member of a fish school may perceive and respond to a threat, causing those around it to also startle, or the entire group to make an evasive maneuver 52. This is an example of centralized, ephemeral, informational leadership with a limited or global reach, depending on how much of the group responded. Similarly, some fraction of the same school might have information about where or when a food resource might occur and lead the entire school to that time-space location 37,38,53. In our terms those fish are distributed, ephemeral, informational leaders with global reach.
Informational leadership is also common for movement at long length scales. In flocks of pigeons, better informed individuals act as leaders during homing flights 54. (However, it should be noted that pigeons also exhibit a structural hierarchy 19.) During migratory movements, older, more experienced birds guide groups on efficient migration routes 5,39. In both of these examples the informed birds are centralized, persistent, target-driven, informational leaders with global reach.
In migrating white storks some individuals actively seek thermal updrafts, which are necessary for them to get efficient lift to complete the migration, while others tend to copy by moving towards individuals who are already in thermals 55. This is a specific example of a general phenomenon, emergent sensing 5, in which a group spans an environmental gradient and individuals at the 'preferred' end of the gradient alter their behaviour (purposefully or not) in a way that causes the entire group to climb the gradient 41,56. In general such leadership would be distributed, ephemeral (although it could be persistent if, as in the storks, the same individuals always find the thermals), informational leadership with global reach.

IV. A MODEL SANDBOX FOR VALIDATING LEADERSHIP-INFERENCE METHODS
Ultimately one would like to develop methods to infer and classify leadership from empirical data. This is a long-standing and non-trivial challenge, and a pragmatic approach is to first test inference methods on simulated data where the leadership type and distribution are known because they are programmed in explicitly. For mobile animal groups an obvious starting point is to modify classic flocking/schooling/herding models (e.g., 31,57,58) to include known leadership structures. In this section we first describe a canonical model of collective motion, the so-called zonal model 9,58. Following that, we modify the model to incorporate the various leadership sources and characteristics described in this paper.

[Figure: zones of the zonal model (caption only partially recovered). The outer ring is the attraction zone A, and the focal individual attempts to get closer to the agents in it (the green triangles in the picture). The resulting desired direction is then the sum of the green and blue vectors.]

A. Basic Collective Motion Model
The first zone to consider is called the repulsion zone and is denoted by $R$. This zone ensures that 'personal space' is maintained for each agent. If any other agent is in the repulsion zone $R$ of the focal individual, then the desired direction in the next time step is defined by Eq. 7. This desired direction ensures that a collision will not occur at time $t + \Delta t$. However, if $R = \emptyset$ for the focal individual, then it attempts to get closer to agents in its attraction zone $A$ and to orient with agents in its orientation zone $O$. This is accomplished by choosing the desired direction at time $t + \Delta t$ according to Eq. 8, where $\alpha$ is a parameter that controls the relative strength of attraction and alignment. For example, a flock of geese, dominated by alignment, would have a relatively low $\alpha$, while a swarm of insects, dominated by attraction, would have a relatively high $\alpha$.
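The repulsion and attraction/orientation rules above can be sketched in a few lines. This is an illustration under stated assumptions rather than the paper's exact equations: it works in 2D, treats the zones as concentric rings with hypothetical radii `r_rep < r_ori < r_att`, and assumes that attraction and alignment are combined as a convex combination with weights $\alpha$ and $1 - \alpha$ (consistent with low $\alpha$ meaning alignment-dominated motion, but not confirmed by the text).

```python
import math

def desired_direction(i, pos, head, r_rep, r_ori, r_att, alpha):
    """Unnormalized desired direction of agent i in a 2D zonal model (sketch)."""
    rep = [0.0, 0.0]
    att = [0.0, 0.0]
    ori = [0.0, 0.0]
    repelled = False
    for j in range(len(pos)):
        if j == i:
            continue
        dx = pos[j][0] - pos[i][0]
        dy = pos[j][1] - pos[i][1]
        dist = math.hypot(dx, dy)
        if dist == 0.0:
            continue  # coincident agents define no direction
        if dist < r_rep:
            # Repulsion zone: move directly away from the neighbor.
            repelled = True
            rep[0] -= dx / dist
            rep[1] -= dy / dist
        elif dist < r_ori:
            # Orientation zone: align with the neighbor's unit heading.
            speed = math.hypot(head[j][0], head[j][1])
            if speed > 0.0:
                ori[0] += head[j][0] / speed
                ori[1] += head[j][1] / speed
        elif dist < r_att:
            # Attraction zone: move towards the neighbor.
            att[0] += dx / dist
            att[1] += dy / dist
    if repelled:
        # Repulsion takes priority over attraction and alignment.
        return (rep[0], rep[1])
    # Assumed weighting: alpha on attraction, (1 - alpha) on alignment.
    return (alpha * att[0] + (1 - alpha) * ori[0],
            alpha * att[1] + (1 - alpha) * ori[1])

# Neighbor inside the repulsion zone: the focal agent turns away from it.
away = desired_direction(0, [(0, 0), (0.5, 0), (3, 0)], [(1, 0)] * 3, 1, 2, 5, 0.5)
# Neighbor in the orientation zone, alpha = 0: pure alignment.
aligned = desired_direction(0, [(0, 0), (1.5, 0)], [(1, 0), (0, 2)], 1, 2, 5, 0.0)
# Neighbor in the attraction zone, alpha = 1: pure attraction.
attracted = desired_direction(0, [(0, 0), (3, 0)], [(1, 0), (0, 1)], 1, 2, 5, 1.0)
```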
The desired direction vector, $d_i$, is normalized to a unit vector, $\hat{d}_i(t + \Delta t) = d_i(t + \Delta t) / |d_i(t + \Delta t)|$. Next, to represent uncertainty stemming from limitations of sensory and cognitive abilities, this unit vector is transformed into a noisy desired direction by rotating it by a small angle drawn from a circular wrapped Gaussian distribution centered at zero. Finally, it is assumed that individuals can turn at a maximum rate of $\theta$ radians per unit time. Therefore, if the angle between an individual's current direction, $v_i(t)$, and its desired direction for the next time step, $d_i(t + \Delta t)$, is less than $\theta \Delta t$, then the desired direction is achieved and $v_i(t + \Delta t) = d_i(t + \Delta t)$. Otherwise, that individual's direction $v_i(t + \Delta t)$ is the result of rotating $v_i(t)$ by $\theta \Delta t$ radians towards its desired direction $d_i(t + \Delta t)$.
After the heading is assigned, the position at $t + \Delta t$ is computed by Eq. 9, where $s_i$ is the speed of individual $i$.
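The noise, turning-rate limit, and position update can be sketched for a single agent as follows. This is a 2D illustration in which headings are represented as angles. The names `update_agent`, `theta_max` and `sigma` are hypothetical, and the position rule used here (new position = old position + speed × new heading × time step) is an assumption about the form of the update, not a quotation of it.

```python
import math
import random

def angle_diff(a, b):
    """Signed smallest rotation (radians) that takes angle a to angle b."""
    d = (b - a) % (2 * math.pi)
    if d > math.pi:
        d -= 2 * math.pi
    return d

def update_agent(pos, heading, desired, speed, theta_max, dt, sigma=0.0, rng=random):
    """One time step: perturb the desired angle, limit the turn, then move."""
    # Adding Gaussian noise to an angle and wrapping it gives a wrapped Gaussian.
    noisy = desired + rng.gauss(0.0, sigma)
    turn = angle_diff(heading, noisy)
    max_turn = theta_max * dt
    if abs(turn) > max_turn:
        # Cannot reach the desired direction: rotate by the maximum amount towards it.
        turn = math.copysign(max_turn, turn)
    new_heading = (heading + turn) % (2 * math.pi)
    # Assumed position update: step of length speed * dt along the new heading.
    new_pos = (pos[0] + speed * math.cos(new_heading) * dt,
               pos[1] + speed * math.sin(new_heading) * dt)
    return new_pos, new_heading

# Desired turn of pi/2 exceeds the 0.5 rad limit, so the agent turns by 0.5 only.
p_limited, a_limited = update_agent((0.0, 0.0), 0.0, math.pi / 2, 1.0, 0.5, 1.0)
# Desired turn of 0.2 rad is within the limit, so it is achieved exactly.
p_free, a_free = update_agent((0.0, 0.0), 0.0, 0.2, 2.0, 0.5, 1.0)
```

Setting `sigma=0.0` (the default here) switches the noise off, which is convenient when checking an inference method against a deterministic baseline.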

B. Explicitly Adding Sources of Leadership
While this base model captures a wide variety of flocking, swarming and schooling behavior, it does not account for leadership explicitly. In order to explicitly test leadership inference methods, it is helpful to make a few simple modifications to this base model: (1) add a sociality matrix 59 (structural leadership), (2) add "informed" individuals to the group 9 (informed leadership), and (3) make interaction rules anisotropic 58 (emergent leadership).

Structural Leadership
To add structural leadership we introduce a sociality matrix $S = [S_{ij}]_{N \times N}$, where $S_{ij} \neq 0$ if agent $i$ can be influenced by agent $j$. More generally, $S_{ij}$ is a continuous value that gives the relative influence of individual $j$ on individual $i$. To take this into account, the desired-direction computation is modified to weight the influence of each neighbor by $S_{ij}$ (rather than weighting everyone in $A$ and $O$ equally), as in Eq. 10. Adding this sociality matrix to the base model allows structural leadership to be built in explicitly. This is an advantage, as one can then check post facto whether the structural leadership placed in the model can be extracted by a candidate inference method.

Informed Leadership
To simulate informed leadership, a subset of the agents are given knowledge of a preferred direction $g$ (more generally, each informed agent is given its own, not necessarily equal, preferred direction $g_i$) 9. This preferred direction may be part of a migration route or the direction of prey or a known resource. Non-informed group members have no knowledge of $g$ and may or may not know which individuals are informed. Following Couzin et al. 9, to integrate this into the model the informed individuals balance the social interactions against their preferred direction with a weighting term $\omega$, which yields a modified desired direction for each informed individual. If $\omega = 0$ the preferred direction is completely ignored and only social interactions are followed. As $\omega$ increases toward 1 the influence of the preferred direction is balanced with the influence of the social interactions. With $\omega > 1$ the preferred direction is favored over social interactions.
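Both modifications can be sketched compactly. The code below is illustrative only. The function names and the explicit neighbor lists `att_zone` and `ori_zone` are hypothetical, and the convex $\alpha$ / $(1 - \alpha)$ weighting is an assumption. The informed-agent rule normalizes the sum of the unit social direction and $\omega$ times the unit preferred direction, which is our reading of the weighting described for Couzin et al. 9 and reproduces the stated behaviour at $\omega = 0$, $\omega = 1$ and $\omega > 1$.

```python
import math

def unit(v):
    """Return v scaled to unit length (or the zero vector if v is zero)."""
    n = math.hypot(v[0], v[1])
    return (v[0] / n, v[1] / n) if n > 0.0 else (0.0, 0.0)

def social_direction(i, pos, head, S, att_zone, ori_zone, alpha):
    """Attraction/alignment direction of agent i, each neighbor j weighted by S[i][j]."""
    ax = ay = ox = oy = 0.0
    for j in att_zone:
        u = unit((pos[j][0] - pos[i][0], pos[j][1] - pos[i][1]))
        ax += S[i][j] * u[0]
        ay += S[i][j] * u[1]
    for j in ori_zone:
        h = unit(head[j])
        ox += S[i][j] * h[0]
        oy += S[i][j] * h[1]
    # Assumed weighting: alpha on attraction, (1 - alpha) on alignment.
    return (alpha * ax + (1 - alpha) * ox, alpha * ay + (1 - alpha) * oy)

def informed_direction(d_social, g, omega):
    """Blend the unit social direction with the unit preferred direction g."""
    d_hat = unit(d_social)
    g_hat = unit(g)
    return unit((d_hat[0] + omega * g_hat[0], d_hat[1] + omega * g_hat[1]))

# Agent 0 is influenced by agent 1 only (S[0][1] = 1, S[0][2] = 0).
pos = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
head = [(1.0, 0.0)] * 3
S = [[0, 1, 0], [0, 0, 0], [0, 0, 0]]
d0 = social_direction(0, pos, head, S, att_zone=[1, 2], ori_zone=[], alpha=1.0)

ignored = informed_direction(d0, (0.0, 1.0), omega=0.0)   # purely social
balanced = informed_direction(d0, (0.0, 1.0), omega=1.0)  # equal weighting
```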
Emergent Leadership
One way to make a test case for inferring emergent leadership is to make interactions spatially asymmetric. In particular, one can simply add 'blind zones' 58 to the model described in Eqs. 7-9. In this case the zones $A$ and $O$ are missing wedges behind the focal individual, and individuals in those wedges are ignored. If these blind zones are large enough, individuals are more influenced by those in front of them 30.
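A blind zone amounts to a visibility filter applied to neighbors before the attraction and orientation sums are formed. The sketch below is a 2D illustration; the function name `visible` and the parameterization of the wedge by its total angular width `blind_angle` are assumptions made for this example.

```python
import math

def visible(pos_i, heading_i, pos_j, blind_angle):
    """True if agent j lies outside the rear blind wedge of agent i.

    blind_angle is the total width (radians) of the wedge centered directly
    behind agent i; blind_angle = 0 means no blind zone.
    """
    dx = pos_j[0] - pos_i[0]
    dy = pos_j[1] - pos_i[1]
    dist = math.hypot(dx, dy)
    speed = math.hypot(heading_i[0], heading_i[1])
    if dist == 0.0 or speed == 0.0:
        return False  # bearing is undefined
    cos_bearing = (dx * heading_i[0] + dy * heading_i[1]) / (dist * speed)
    cos_bearing = max(-1.0, min(1.0, cos_bearing))  # guard against rounding
    bearing = math.acos(cos_bearing)  # 0 = dead ahead, pi = directly behind
    return bearing <= math.pi - blind_angle / 2

ahead = visible((0, 0), (1, 0), (1, 0), math.pi / 2)
behind = visible((0, 0), (1, 0), (-1, 0), math.pi / 2)
beside = visible((0, 0), (1, 0), (0, 1), math.pi / 2)
```

Only neighbors for which `visible` returns `True` would then be allowed into the zones $A$ and $O$.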

Distribution of Leadership
Using the framework presented in the previous sections one can explore a variety of distributions of leadership, ranging from centralized to distributed 12. For structural leadership, the spectrum could range from a sociality matrix with a hub structure (centralized) to one with random dense connectivity, or even full connectivity (decentralized). For informed leadership, the fraction of the group having a non-zero value of $\omega$ would roughly span the spectrum of the distribution of leadership. We note that the distributions of structural and informed leadership are potentially orthogonal. For example, a group could have highly centralized structural leadership in tandem with completely distributed informational leadership, or vice versa.
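The two ends of the structural spectrum can be written down directly as sociality matrices. The helper names below are hypothetical, and the column sum is used only as a simple, assumed summary of how concentrated the potential for influence is. Entry `S[i][j]` is the influence of agent j on agent i.

```python
def hub_sociality(n, hub=0):
    """Centralized: every non-hub agent is influenced only by `hub`."""
    return [[1.0 if (j == hub and i != hub) else 0.0 for j in range(n)]
            for i in range(n)]

def full_sociality(n):
    """Decentralized: every agent is influenced equally by every other agent."""
    return [[0.0 if i == j else 1.0 for j in range(n)] for i in range(n)]

def outgoing_influence(S):
    """Column sums: total influence each agent j has on the rest of the group."""
    n = len(S)
    return [sum(S[i][j] for i in range(n)) for j in range(n)]

hub_profile = outgoing_influence(hub_sociality(4))    # all influence on agent 0
full_profile = outgoing_influence(full_sociality(4))  # influence spread evenly
```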
Temporal Scale of Leadership
Temporal consistency and granularity of leadership can be built into this leadership model by making the model parameters associated with leadership time dependent, e.g., $[S_{ij}(t)]$, $\omega(t)$ and $g(t)$. For example, one could remove or change the preferred direction at regular time intervals by defining a time-varying $\omega(t)$ and then see if an inference algorithm could detect this change.
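As one simple instance of a time-varying $\omega(t)$, the sketch below alternates between informed and uninformed intervals of equal length. The square-wave form, the function name and the parameter names are illustrative assumptions; any other schedule could be used in the same way.

```python
def omega_schedule(t, period, omega_on):
    """Piecewise-constant omega(t): omega_on for one interval, 0 for the next."""
    return omega_on if int(t // period) % 2 == 0 else 0.0

# With period = 10, the preferred direction is used on [0, 10), ignored on [10, 20), ...
schedule = [omega_schedule(t, 10, 0.8) for t in (0, 9.9, 10, 19.9, 25)]
```

The switching times are known by construction, so they provide ground truth against which a change-detecting inference method can be scored.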
Reach of Leadership
By choosing specific forms of the sociality matrix one can experiment with a variety of leadership reach scenarios and test the ability of various inference measures to recover them.

Observability of Leadership
There is a vast set of variations that could be made to the framework presented here to encode the potential for leadership to be driven by non-trajectory-based cues or signals 43-46,60. One obvious example (which is also ubiquitous in nature) is auditory signalling, which could provide long-range interactions 41.

D. A Potential Pitfall: Influence vs. leadership
Consider a mobile-animal group where each member is governed by Eq. 9 and the direction is decided by Eqs. 7 & 10. Furthermore, define $S_{i(i+1)} = 1$ for $i \in \{1, \dots, N-1\}$ and $S_{ij} = 0$ otherwise, and let $\alpha \in [0, 1]$. This describes a simple chain topology, where each individual has the capacity for structural leadership over at most one other agent. In particular, each agent orients and attracts to (follows) at most one other agent in the group. However, it is important to note that every agent avoids collisions with all other agents (the sociality matrix applies to Eq. 10, but not Eq. 7).
In this example the incidental social interactions, such as those caused by repulsion, cause a real problem for the majority of influence/causal inference algorithms. For example, if one blindly applied optimal causation entropy 26 or transfer entropy 61 to infer who leads whom, then these algorithms would conclude an all-to-all leadership graph. By construction, however, we know this is incorrect and that the underlying influence graph is a simple chain. The issue here is that these measures 26,61, and causal inference from information in general, are not explicitly measuring leadership but reductions in uncertainty about a particular variable. In this example, the minor local repulsion interactions cause enough "information flow" over time to trigger these algorithms/measures. As discussed in Appendix A, distinguishing influence, information flow, causality and leadership is a non-trivial challenge, which is nicely highlighted by the present example.
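The mismatch in this example can be made concrete by comparing the number of structural (following) links in the chain with the number of agent pairs that can interact through repulsion. The sketch below uses 0-based indices, so that `S[i][i + 1] = 1` plays the role of $S_{i(i+1)} = 1$; the helper names are hypothetical.

```python
def chain_sociality(n):
    """Chain topology: agent i follows agent i + 1 only (0-based indices)."""
    S = [[0] * n for _ in range(n)]
    for i in range(n - 1):
        S[i][i + 1] = 1
    return S

def structural_edges(S):
    """Directed (leader, follower) pairs encoded in the sociality matrix."""
    n = len(S)
    return [(j, i) for i in range(n) for j in range(n) if S[i][j] != 0]

def repulsion_pairs(n):
    """Ordered pairs that may interact through collision avoidance: all of them."""
    return [(j, i) for i in range(n) for j in range(n) if i != j]

N = 5
S_chain = chain_sociality(N)
true_edges = structural_edges(S_chain)   # N - 1 links forming the chain
candidate_edges = repulsion_pairs(N)     # N * (N - 1) possible incidental links
```

An inference method that reports any pair with detectable information flow can, in the worst case, return all of `candidate_edges`, whereas the leadership structure that was programmed in is `true_edges`.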

V. AFTERWORD
Traditional approaches to leadership inference have focused on a single defining characteristic, e.g., position within a group, social hierarchy, information flow or influence. We believe that, in general, none of these concepts alone fully captures leadership. In this manuscript we have begun to show that a multifaceted approach, in which multiple axes of leadership are analyzed, provides a more complete classification of the leadership structure. This formalism should serve to link questions about empirical systems with the appropriate analytical tools to address those questions. While the taxonomy we have provided is surely not complete, we hope that this effort will serve as a starting point for formalizing a multifaceted approach to leadership inference.
Multiple technological advances in sensors and computer vision have led to the availability of more high-resolution collective motion data than ever before 11. As such, the near future is an opportune time to make meaningful advances in leadership inference. Causal inference and information theory show a lot of promise in this arena but, as we have shown throughout this manuscript, leadership is a highly intricate and multifaceted subject, and neither causal inference nor information theory may be up to the task alone. We hope that as new inference algorithms are developed, the formal language and toy models presented here will serve as a proving ground. We believe that being able to carefully classify the components of leadership being inferred will be invaluable for practitioners and theorists as they begin to tackle the high-resolution data that are becoming available.

Information theory provides sophisticated measures for rigorously quantifying concepts like "the reduction in uncertainty about the present state of X given past states of Y." As such, these measures are often associated with concepts like information flow, causality, influence and even leadership, and often all of these terms are used interchangeably. These measures are often viewed as less subjective inference methods because almost no assumptions need to be made about the structure of the system being observed. As a result, information theory has become a popular tool for inferring leadership from time series 19-26,62. However, while influence, information flow and causality are all closely related to the notion of leadership, these concepts are inherently different and therefore are not readily interchangeable. Furthermore, recent work has begun to show that these information measures fail to capture even information flow 33, let alone leadership.
The following appendix discusses information flow, causality and influence and provides motivation for why we do not believe any of these alone fully quantifies leadership.
Information flow and entropy, as we have argued in previous mathematical works 24,32,63, are fundamental concepts in coupled (dynamical) systems and the associated stochastic processes. Information theory, as formulated upon Shannon entropy and its variants, describes the average "surprise" one should attribute to observing a specific value or state of a random variable. More formally, such quantification of surprise or (un)predictability is referred to as "entropy" and can be defined rigorously as a function of the underlying probability distributions. When the time evolution of multiple variables is considered, the state of a variable often depends on the history of a set of related variables, and such inter-variable dependencies can be viewed as "information flow". Explicit characterization of information flow in coupled systems can be done by quantifying how surprising measured observations are when conditioned on given previous observations, giving rise to commonly used measures such as transfer entropy 61 and causation entropy 26,64,65. In other words, information flow describes the reduction in uncertainty of forecasts that results from conditioning on the past in various combinations. Thus, whether by Granger causality 66, transfer entropy 61, causation entropy 26,64,65,67, or some other method, the idea is to ask whether there is a reduction in uncertainty given knowledge of the past of a possibly coupled variable. Clearly, this question is relevant across a wide range of fields in science and mathematics. However, part of the theme of this paper is that these information flow concepts are neither sufficient for nor equivalent to leadership.
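To make "reduction in uncertainty from conditioning on the past" concrete, the sketch below computes a plug-in (frequency-count) estimate of transfer entropy between two discrete time series with a history length of one step. It is a minimal illustration, not the estimator used in the cited works: the function name, the history length and the synthetic data are assumptions, and plug-in estimates are biased for short series.

```python
import math
import random
from collections import Counter

def transfer_entropy(source, target):
    """Plug-in estimate (bits) of transfer entropy source -> target, history length 1.

    TE = sum over (x1, x0, y0) of p(x1, x0, y0) * log2( p(x1 | x0, y0) / p(x1 | x0) ),
    where x1 = target[t + 1], x0 = target[t] and y0 = source[t].
    """
    triples = list(zip(target[1:], target[:-1], source[:-1]))
    n = len(triples)
    c_full = Counter(triples)
    c_hist = Counter((x0, y0) for _, x0, y0 in triples)
    c_pair = Counter((x1, x0) for x1, x0, _ in triples)
    c_self = Counter(x0 for _, x0, _ in triples)
    te = 0.0
    for (x1, x0, y0), c in c_full.items():
        p_given_both = c / c_hist[(x0, y0)]
        p_given_self = c_pair[(x1, x0)] / c_self[x0]
        te += (c / n) * math.log2(p_given_both / p_given_self)
    return te

# Synthetic data (assumption): the target copies the source with a one-step delay.
rng = random.Random(0)
src = [rng.randint(0, 1) for _ in range(2000)]
tgt = [0] + src[:-1]

te_forward = transfer_entropy(src, tgt)   # close to 1 bit: source predicts target
te_backward = transfer_entropy(tgt, src)  # close to 0 bits: no flow in this direction
```

The asymmetry between the two directions is what such measures are designed to detect. As argued in this paper, it quantifies predictive information and should not by itself be read as leadership.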
Causation is a concept related, but not identical, to information flow. The notion of causality has many interpretations, depending on the context, from philosophical 68-70, to statistical 71-75, to dynamical 61,64,66,76. Here we will avoid the philosophical direction entirely, but note that some of these interpretations do coincide with the others. Statistical perspectives are sometimes relevant to a stochastic process, especially those from the influential work of Pearl 71-73, associated with a calculus for understanding interventions, but they are not always relevant to our context. We are more interested in understanding interpretations of causal influence in a free-running system, that is, a system that is passively observed rather than actively probed. As such, this relates closely, almost synonymously, to the concept of information flow in a stochastic process, but not quite identically. We take the same perspective as Granger in the line of reasoning that eventually led to the 2003 award of the Nobel Prize in Economics; Granger's fundamental principles were that 1) cause happens before effect, and 2) a cause necessarily contains unique information concerning future states of its effect 66. In detail, the so-called Granger causality is a specific computation that assumes a linear stochastic process, and as such, it was shown 76 to be entirely equivalent to transfer entropy computed by other means (in information-theoretic terms, by the Kullback-Leibler divergence appropriately conditioned) in the special case of a linear stochastic process with Gaussian noise. That said, while the underlying principles of Granger are the same, the details of computation may differ.
Influence can now be described within this formalized framework as related to, but somewhat distinct from, leadership, depending on whether we relate interactions between agents in terms of information theory, reduction of uncertainty, or some other underlying principle, including the potential goal of controlling the system. Consider that some agents in a group may be leaders, under the various interpretations of this term discussed below. A measure of leadership may be associated with information flow, for example, or serve as a proxy for causal influence in which leaders change state before other agents, analogous to cause preceding effect. An influential member of a group is not necessarily a leader, although influence is a de facto kind of leadership in the sense that it amounts to the ability to cause others to change their behavior (dynamics).
What, then, is the difference between influence, causation, and leadership from the perspective of information flow? In some interpretations, influence or causation over others and leadership are almost synonymous, but with important distinctions. When leadership is viewed through the lens of reduction of uncertainty (and is thus measurable by causation inference and information flow), causation and influence become synonyms for leadership. Therefore, if a leadership action is active and observable, then causation and information flow are relevant concepts that enable one to define and empirically score the leadership. However, there are other notions of leadership that are clearly beyond the scope of information flow. Herein, by using a taxonomy of leadership, we expand beyond the typical causation and information flow concepts 24,25,37 to allow for features that may be missed through the narrow interpretation of entropy, including structure, the degree to which agents are informed, distribution, time and space scales, and target-drive.