A Hierarchical Exact Accelerated Stochastic Simulation Algorithm

A new algorithm,"HiER-leap", is derived which improves on the computational properties of the ER-leap algorithm for exact accelerated simulation of stochastic chemical kinetics. Unlike ER-leap, HiER-leap utilizes a hierarchical or divide-and-conquer organization of reaction channels into tightly coupled"blocks"and is thereby able to speed up systems with many reaction channels. Like ER-leap, HiER-leap is based on the use of upper and lower bounds on the reaction propensities to define a rejection sampling algorithm with inexpensive early rejection and acceptance steps. But in HiER-leap, large portions of intra-block sampling may be done in parallel. An accept/reject step is used to synchronize across blocks. This method scales well when many reaction channels are present and has desirable asymptotic properties. The algorithm is exact, parallelizable and achieves a significant speedup over SSA and ER-leap on certain problems. This algorithm offers a potentially important step towards efficient in silico modeling of entire organisms.


Introduction
Computational biology is moving toward ever more complex, comprehensive and detailed biological models. It is becoming increasingly important to simulate and understand these models computationally. The Stochastic Simulation Algorithm [11] (SSA) was introduced to exactly sample the Chemical Master Equation and has seen widespread adoption.
The original SSA iteratively samples reaction events in a way that requires O(R) computational steps per sampled reaction event, where R is the number of reaction channels. This can be prohibitively slow when there are a large number of reaction channels or reaction events.
This fact together with the importance of the SSA has inspired a slew of SSA acceleration techniques [10,4,22,13,16,3,2,33,25]. The work of Gillespie [12] and its recent variants [7,5,6] reduces the total number of reaction events that need to be sampled but does so at the cost of accuracy. Additionally, the work of Gibson and Bruck [10] reduces the amount of work per simulated reaction event to log R. The work of Slepoy et al. [29] ups the ante further by finding the next reaction event to sample in O(1) time using rejection sampling under assumptions reasonable for biochemical networks.
There have been recent advances in consumer level multi-core CPU technology. There are indications that next-generation CPU technology is moving from maximizing single-core speed to increasing the number of cores by orders of magnitude. There has been work on the parallelization of SSA via GPUs [21,16,19] and multicore CPUs [9]. However, the parallelization was used to speed up sampling of many trajectories rather than speeding up each trajectory in a large system. Multicore GPUs and CPUs have not been effectively used to speed up the sampling of a single Chemical Master Equation trajectory exactly. Arguably this becomes the dominant problem when extremely large systems are being studied. For example, the E. coli genome has been estimated to have about 4400 gene products [28]. This fact suggests that tens of thousands of molecular species will needed to be present if an E. coli specimen is to ever be comprehensively modeled in silico.
Relatively little work has succeeded in reducing the number of reaction events sampled without introducing bias. While the work of Riedel and Bruck [26] is able to skip over cyclic states (eg loops), this method of reducing work does not apply to reaction networks with little state cycling. The previous work of the present authors (ER-leap) [25] is a leaping algorithm and was the first known general method to effectively reduce the number of SSA iterations sampled without sacrificing accuracy. This method scales well when reducing the number of SSA iterations. However, this method does not scale well when many reaction channels are present.
The currently proposed work describes a new SSA-equivalent algorithm that can take advantage of parallel hardware, and additionally provides an algorithmic speedup for systems with many reaction channels. Like ER-leap, this "HiER-leap" (Hierarchical Exact Reaction-Leaping) algorithm achieves these advances without the loss of accuracy. The HiER-leap algorithm uses a divide-andconquer strategy to independently sample sparsely connected submodules of the reaction network, in a way somewhat similar to ER-leap. HiER-leap then performs a network-wide synchronization using rejection sampling. As will be shown, this synchronization step is efficient for "reasonably" independent submodules. The acceptance probability associated with synchronization is asymptotically equal to one as the number of reaction channels goes to infinity. This implies that the majority of the work will take place during submodule sampling, which may be performed in parallel.
This work therefore presents a potentially important step towards organism-scale simulation.

Background Theory
The new HiER-leap algorithm begins its derivation from the state transition distribution defined by the Chemical Master Equation after L > 0 reaction events. We then algebraically manipulate the CME until a distribution suitable for parallel sampling and synchronization is found. In many ways this derivation closely follows the derivation found in ER-leap. Therefore, this section is dedicated to recalling the notation and key equations from ER-leap [25] that will serve as a starting point for the algorithm derivation in section 3.

Notation
We define reaction channels, indexed by r, as a set of input and output species, C a , with corresponding input (m (ra) ) and output (m ′ (ra) ) stoichiometries and the net stoichiometry for a given species and reaction channel as Later, we will show the probabilities of state transitions after L "reaction events" occur. Under the Chemical Master Equation it is assumed that each reaction channel has a small probability of firing during a small time interval dt with probability equal to a r (n)dt. The vector n, possibly indexed by α for species type α, represents the quantities of the constituent species in terms of raw counts. This a r (n) term is also called the propensity or rate of reaction channel r and is defined as Note that superscripts involving "r" and related variables that index reaction numbers occur here and numerous times in the following. These are enclosed in parentheses "(r)" throughout, to indicate they are not powers but rather indexes.
In this work, it will be notationally convenient for us to keep the propensity term factored out into ρ r and F (r) n . As a brief aside, the upcoming derivations in this paper may work with other forms for F (r) n . For example, the "umbral transformation" of a Hill function, , may work as propensity function, where the falling factorial n (k) ≡ n! (n−k)! replaces n k in Hill(n; K) (or more generally in a rational function) for each power of any integer-valued molecule number n. This functional form has the advantage of being monotonic and equal to zero for n < k, as required for a stochastic version of the Hill function with discrete integer numbers of molecules.
Furthermore, we define the total propensity D I for some reaction to occur in state I as which is equivalent to equation (2). Bounds for F (r) and D, after L reaction events, are computed by bounding species counts after L reaction events. For each species identifier (ID) a, we bound the number of molecules present after these reaction events, n ′ a , by: n a + L min r ∆m (r) a n ′ a n a + L max r ∆m (r) If we introduce the notation that a tilde superscript or subscript,x or x , represents upper or lower bounding values respectively then, we can re-write the above as: n a ≤ n ′ a ≤ñ. The corresponding propensities calculated from using the upper and lower bounding states, after L − 1 reaction events, respectively are written as for any state K.

Markov Process
In the ER-leap paper [25] it was shown that the probability of starting at state I 0 and ending up in state I L after τ time elapses is for every ordered vector {R k |k = 1 . . . L−1} of reaction channel events. Each state may be uniquely transformed by a reaction as I 0 → I 1 (R, I 0 ). Furthermore, it was shown in ER-leap [25] that for any function e(r) summing over all possible orderings of L reaction events is equivalent to summing over all possible counts of reactions (eg a multinomial with L draws) and then permuting each of these draws for all unequal reactions, yielding which, when combined with equation (6), and introducing the previously defined bounds, separating out terms in e(r) which are permutation invariant, and after some algebra results in This expression can be interpreted as a rejection-sampling algorithm (last line) that corrects a multinomial approximate sampling algorithm (first two lines).

Rejection Sampling
Through the lens of rejection sampling, equation (8) represents an algorithm. Briefly, rejection sampling is a method to sample x from some distribution, x ∼ P (x), by means of an approximate distribution P ′ (x). This can be expressed algebraically since P (x) can be rewritten as assuming M ≥ 1. Equation (9) can be viewed as a mixture distribution with the probability of sampling from P ′ (x) being the "acceptance" A(x): for some constant M such that ∀ x A(x) ≤ 1.
It is now possible to see the ER-leap algorithm represented in equation (8). If in equation (8) we recognize P (x) = P ′ (x)M A(x) (equivalent to equation (9)), with M = D I 0 L−1 L / D I 0 L−1 L , then we will implicitly define a P ′ (x). This P ′ (x) has the next L reaction events sampled from a multinomial with the probability p r of choosing the r th reaction channel being equal to Furthermore, our P ′ (x) samples τ from an Erlang distribution (equivalent to a Gamma distribution with integer "shape" parameter) with rate parameter being D I 0 ,L and shape parameter being L. Finally, if needed when calculating A(x), a random permutation σ is drawn uniformly and {τ k |τ = L−1 k=0 τ k } is sampled from an L-simplex. The work in section 3 will similarly arrive at an equation representing an efficient and exact leaping algorithm for sampling L reaction events from an SSA equivalent distribution.

Hierarchical Notation
The HiER-leap algorithm uses a divide-and-conquer strategy to accelerate SSA. Evidence suggests that protein-protein interaction (PPI) networks tend to be modular [24]. These networks contain submodule clusters that interact heavily inside the cluster. Interactions with other clusters of proteins are less common. Although still an active area of research, evidence [14] suggests that similar modularity may exist in genetic regulatory networks as well. Additionally, when modeling spatial interactions [23,27,31,15,17,20], events spatially distant must interact through sparse intermediate diffusion reaction channels. In this way, it is probably common that many reaction channels are weakly coupled to the majority of other channels. This observation suggests a potential avenue towards algorithm acceleration and parallelization for large biological networks.
Notation is introduced below to describe a hierarchical organization of reaction channels. Table  (1) provides a comprehensive guide to notation used throughout the following sections. Next, following and generalizing the strategy of section 2, we will derive bounds on propensities and species. The bounds will be essential for deriving an algorithm for exact speedup of SSA for systems amenable to hierarchical organization.
Reaction channels must belong to exactly one block. A block is defined as a set of reaction channels. If reactions are "connected" by shared reactants, it is preferred that reactions should be more strongly connected within than between blocks. For this work, a two level hierarchy of reactions and blocks is used. However, it is straightforward to apply this method repeatedly to multiple levels.
Each reaction channel is indexed by its block ID r 1 , and its within-block ID r 2 , and will be designated as R = (r 1 r 2 ) for r 1 ∈ {1 . . . b} and r 2 ∈ {1 . . . b r 1 }. The "block propensity" for block r 1 and state I, denoted D (r 1 ) I is the sum of propensities of constituent reaction channels. Specifically, similar to equation (4) this means Furthermore, we denote the number of reaction events occuring within block r 1 as u r 1 . Finally, the number of events for the reaction channel indexed by R = (r 1 r 2 ) is denoted by v r 1 r 2 .

Bounds on Propensities and Species Counts
Similar to equation (5), we now develop bounds on species counts and propensities. This enables us to derive a two-scale rejection sampling algorithm in many ways analogous to ER-leap at each scale. For reasons that will become evident in section 3.3, we first derive bounds on the block propensities given L and I 0 . Afterwards, bounds will be developed on the species molecule counts and reaction channel propensities given u.
First, recall that in equation (5) we found bounds on species and propensities after L reaction events. Note that similar to equation (5) we can definẽ and a similar definition for D (r 1 ) K,L−1 .

Optimized Block Level Bounds
If it is the case that we only need bounds on the block propensities, and not individual reaction channels, then we can take advantage of "reaction event exclusion". This means that we only need to consider the sequence of at most length L reaction events which will result in the most extreme value for the sum of propensities in block r 1 . Therefore, we no longer need to assume that all species counts are at the most extreme value possible after L reaction events.
We want to find a bound closer to the optimal block propensity Unfortunately, naïvely solving this exactly for r 1 requires enumerating (b r 1 ) L possible choices for v r 1 upon every iteration. Fortunately the bound we seek, D (r 1 ) (I 0 L) , is not required to be exactly optimal. Instead we only require that Symbol Meaning x Upper bounding value for x after L − 1 reaction events. Calculated by assuming each species type will be maximal.
x Upper bounding value for x after L − 1 reaction events.
May not depend on bounding all species values and therefore may be tighter thanx.
x Upper bounding value given u.
Will often involve inner block calculations.
x , x , x Lower bounding versions of the above definitions.
The optimal value of x with respect to some objective function.
We demonstrate this falls between the requisite values and has 'nice' asymptotic properties that will be discussed later.
Derivation The idea is to find the maximum ∆ D (r 1 ) (I 0 L) possible resulting from one reaction channel firing sometime during the next L reaction events. If we determine this value, we can upper bound D (r 1 ) where ∆ D Note how this is an upper bound on D (r 1 ) is the largest amount that the block propensity may change for any of the upcoming possible (L − 1) reaction events in r 1 . Since there are (L − 1) reaction events, and the most any of them may increase D (r 1 ) (14) will always bound D (r 1 ) (I 0 ,L) . This method improves upon our previous methods, which found the maximumñ a for all species and then calculates the block propensity. Each within-block reaction channel propensity will be larger whenñ a rather than n a is used to calculate the propensity. Therefore, using the increased bound will result in a block's propensity being O(b r 1 * L) larger than D (r 1 ) I 0 . However, by calculating using equation (14) the bound will be just O(L) larger than D (r 1 ) Again, naïvely solving for ∆ D I k . If any species increases to n ′ a ≥ n a we know that D na . Therefore, if we find the reaction channel that increases the block propensity the most whenñ I 0 ,L is used for positive ∆m (r 1 r 2 ) a , we are guaranteed that there does not exist a larger ∆ D (r 1 ) where q(n, r 2 ) a = n a + ∆m as our final equation for ∆ D (r 1 ) (I 0 L) . A proof that this bounds the maximum delta possible can be found in the appendix.
This tighter bound will result in a greater acceptance ratio. The basic reason for this improvement is that we need not overestimate every propensity in r 1 by O(L), and add the overestimates up, since only L and not b 1 L reactions will occur.
Naïvely finding the reaction channel R = (r 1 r 2 ) that will increase D (r 1 ) (ñ I 0 ,L ) by a maximal amount will cost O(|R r 1 |) steps to compute. To accelerate this step further, blockwise priority queues (PQ) are used to find this R efficiently. Nodes in the PQ are reaction channels and values are the ∆D (r 1 ) (I(ñ,br 1 )) caused by each reaction channel firing. Upon acceptance of L reaction events we must update the priority queue for each block. Only nodes that interact with species which have changed, need to be adjusted. This, at worst, will be O(log b r 1 ) work for each node, although in practice the order rarely needs to change.

Propensity Bounds Given u
If we know u, the number of reaction events for r 1 and adjacent blocks, we can derive even tighter bounds on the reaction channels R r 1 * . In fact, these tighter bounds help us to efficiently increase L when larger systems are considered, as will be demonstrated in section 3.3.
We determineF (r 1 * ) by finding bounds on species counts given u. In other words, we want to find n A (r, n 0 , k) ≤n which is the maximum possible value of n Ar 1 prior to the last event in r 1 occurring. In this way n A (r, n 0 , k) will never exceed the propensity calculated fromn Ar 1 (u, n 0 ). Finding the optimal value forn Ar 1 (u, n 0 ) is straightforward. We first need to consider blocks other than r 1 which may change n Ar 1 . Since the order of reactions is unknown, we must assume that all reaction events in blocks except r 1 , written as u\{u r 1 }, occur prior to those in u r 1 . It is desired that the number of neighbors relative to the total number of blocks will be small. This will decrease n A (r, n 0 , k) and ultimately lead to a more efficient algorithm. Secondly, we need to consider reactions in r 1 . In calculating the bound, it is assumed that all (u r 1 − 1) reaction events chosen will behave adversarially. This is analogous to the method considered in section 3.2, with the modification that we will consider a subset of reaction channels. Thus will bound each n Ar 1 with respect to r 1 and u. The Kronecker delta function δ(a, b) or δ(a − b) is as usual: Finally, the propensities of reaction channels inside of block r 1 are bound as, F (r 1 r 2 ) (u, n 0 ) ≡ F (r 1 r 2 ) (u,n Ar 1 ).
As in ER-leap, lower-bounding the propensities and species is done with the same techniques as that used for upper-bounding with the restriction that propensities and species molecule counts cannot go below zero. These derived bounds are used in the following sections.

Equivalent Markov Process
Similar to section 2.2, we want to algebraically manipulate the distribution represented by the Chemical Master Equation (a special case of the Kolmogorov-Chapman equation [30]) into a form suitable for parallelization and acceleration. The hierarchical description from section 3.1 will aid us in this transformation. First, note it is possible to rewrite equation (7) into a hierarchical version with u's and v's strictly ordered such that By taking an average of e(σ(R)) and weighting by the number of ways the selection may occur we get and analogous to the way shuffling a deck of cards is the same as shuffling by suit and then, maintaining that order, shuffling by value independently for each suit, we may write and arrive at a useful form for our distribution, which is already suggestive of a block-parallel algorithm: To go further, we need to re-examine e(. . .).

Introduction of Probability Bounds
We now make use of our previously derived propensity bounds to derive a parallel algorithm. From equation (6) we have with inclusion of derived bounds, If we separate out terms based on independence of σ 1 , σ 2 and v, then We now substitute the expression for e(σ 1 (σ 2 (R))) σ 2 σ 1 into equation (17), combining terms where appropriate: AcceptBlock(v r 1 , σ 2 ; u) × AcceptF ine(σ 1 ; u, v, I 0 , σ 2 ) (18) The acceptance probabilities are as follows: Furthermore, prior to turning these equations into an algorithm, we note that we can lowerbound these acceptance probabilities. This will enable us to do an early acceptance or rejection without always doing all of the work to calculate these values exactly.

Lower Bounding Acceptance Probabilities
We begin by lower-bounding AcceptF ine(. . .). This probability requires the most work to calculate and as we will see may be bound fairly tightly. The bound only requires that τ has been sampled.
The lower bound AcceptF ine (. . .) is sought such that  (14)) in the limit of many non-zero propensity reaction channels which implies that both AcceptF ine (. . .) and AcceptF ine(. . .) tend to unity as the number of reaction channels increases.
Next, we set out to lower-bound AcceptBlock(. . .). This acceptance probability depends on σ 2 . Therefore, work will be saved if we can calculate the lower bound without sampling σ 2 . This can be accomplished by noting that AcceptBlock(. . .) is a product of fractions. If we have a numerator and denominator that are independent of σ 2 we can re-write this equation in terms of r 2 . Specifically, using F (r 1 r 2 ) u,I 0 allows us to lower-bound the equation.

Algorithm
The above equations, along with rejection sampling, allow us to create an efficient algorithm that will allow much of the work be done in parallel. From equation (18) observe there are two probability mass function expressions for a multinomial distribution. Specifically, is the multinomial distribution for sampling u. And for each r 1 the vector v r 1 is sampled as which is interesting and implies v r 1 is independent of all other block's v r ′ 1 given u. The multinomials v r 1 may be sampled independently for each block, however it may be the case that equation (20) needs to be computed by considering multiple blocks simultaneously. Specifically, computing I k (R, I 0 ) for block r 1 may require knowledge about any neighboring blocks, r ′ 1 , changing the chemical species counts for reaction channels v r 1 reacting on the same species. Since equation (23) is independent of I k (. . .), this "joint" acceptance probability only needs to be calculated when (a) blocks r 1 and r ′ 1 share a chemically reacting species in their respective v r 1 and v r ′ 1 sampled reaction channels and (b) block r 1 or r ′ 1 does not pass early block-acceptance. Computing equation (20) jointly involves computing a σ 2 for reaction events in r 1 and r ′ 1 . Then, I k (. . .) may be computed properly. It should be noted that in the pseudocode below some possible optimizations (eg independent early block accept and some parallel execution) are not shown for clarity. Instead "connected components", block groups which may need to be sampled jointly if conditions (a) and (b) are met, are sampled entirely in "joint" form.
We now present the HiER-leap algorithm, which is a realization of the aforementioned equations, in pseudocode. First, note that if we have an early global acceptance, then most of the computational effort will be put into line 12 of the following pseudocode. The subroutine from this line will be shown later. Notice that this function is independent for all blocks, with the exception that computing equation (20) may need to be done jointly for neighboring blocks, and needs to be done for all blocks with at least one reaction event. This is an ideal scheme for parallelization and is done so with good efficacy as will be shown. Furthermore, for the tests in in section 4. ComputeD's ⊲ May be done in parallel.
⊲ Blocks share an edge iff for each u r 1 ≥ 1 and they share a species. function SampleConnectedComponents(c, u, I 0 ) , u r 1 ) for r 2 ∈ r 1 and v (r 1 r 2 ) ≥ 1 do pEarly ← pEarly × F (r 1 r 2 ) (u,I 0 ) ⊲ Calculating I k takes the most work.

CaliBayes Validation
We check the HiER-leap algorithm correctness numerically with the CaliBayes test suite similar to the work in ER-leap [25]. If is possible to solve analytically for P (X|t), this allows us to compare many simulated trajectories to the true distribution defined by the CME. Since HiER-leap reduces to ER-leap when the number of blocks goes to one, and it has already been show that ER-leap samples the correct distribution, we wish to test across a variety of reaction channel quantities and organization structure.
The reaction networks in CaliBayes for which we know the analytical solution involve at most two species types. However, simulating many replicates of these networks on a grid, not connected with diffusion, will allow us to treat each block as an independent sample. We can then treat the simulation of many network replicates as many sampled trajectories of a single network.
We perform tests over a number of network replicates m = 2 . . . 1000. The number of blocks range from b = 1 . . . m. The leap is in the range L = 3 . . . 18, where the leap used depends on the specific reaction network, m and b, but is held constant throughout the simulation.
CaliBayes models 1-01, 1-03, 1-04, 2-01, 2-02, 3-01 and 3-02 [8] are tested, on the spaced defined by the Cartesian product of the possible values for the m, b and L parameters as described above, for parameters which result in an acceptance probability greater than about 0.05. These tests pass on these cases using the criteria of Evans et al. [8] We now turn to a large, spatially coupled system.

Acceleration
As an exact algorithm, the key performance metric of relevance to HiER-leap is the amount of acceleration achievable. As discussed earlier, in principle adding more reaction channels and processors should increase the relative speedup over SSA. We can see this trend experimentally in figure 1 and figure 2.  We test using a spatially coupled version of the Williamowski-Rössler model [32] defined as The following tests are all run on an Apple Macintosh Pro with a Quad-Core Intel Xeon processes running a total of 8 cores at 2.26 GHz and 13 GB of RAM using OS X 10.6.8. The algorithms are coded in C++ and Boost.T hread [18] and the Intel Threading Building Blocks [1] are used for multithreading. Connected components were found using the depth-first search algorithm. We compiled the code using the LLVM compiler 1.0.2. The HiER-leap code may be found at http://computableplant.ics.uci.edu/hierleap/.
Results are shown in figures 1 and 2. They show a substantial speedup of HiER-leap over SSA and ER-leap, around 100x and 10x respectively, as we increase the number of reaction channels to around 190,000 . The spatial nature of this experiment means that blocks are neighbors with relatively few other blocks. This leads to a greater "coarse-scale" acceptance probability and therefore increased efficiency.
Additionally, we see that the slopes of the log-log runtime plots for SSA and ER-leap become nearly equal as the number of reaction channels increase. This is expected, since ER-leap finds bounds on individual reaction channels after L reaction events, and this bound is independent of the number of reaction channels. HiER-leap however does not have this shortcoming and has a lower slope (eg better asymptotic behavior) as a result.

HiER-leap Properties
The algorithm parameters, such as leap size and hierarchical organization, require optimization before the fastest possible execution time is achieved. To find the ideal methods with which to optimize our algorithm, we explore various trade-offs here.
In figure 3 we observe that the optimal b and L are interdependent for a given network. However, it is interesting to note that for this experiment there is a relatively large plateau of nearly equivalent optimal running times. This means that the range of reasonably good parameters is large. Furthermore, the contour plot of figure 3 indicates that there is only one global optimum. This seemingly convex behavior indicates that finding the optimum requires only a simple hill climbing algorithm. As leap increases the amount of work per iteration goes up but the acceptance ratio goes down. Furthermore, if there are many reaction channels per block the total acceptance probability of the system goes down. However, in this situation the inner-block acceptance probability goes up. When the number of reaction channels per block goes down, the opposite trends occur. In this way the chosen leap and block organization will determine the total execution time.
Thus, the results from figure 3 indicate that finding the optimal L and hierarchical organization for a spatially distributed system is an easy optimization problem. These results, and those from ER-leap, suggest that L will generally have a local optimum that is also a global optimum. However, the optimal configuration of the blocks and reaction channels for networks not specifically representing a spatially distributed reaction network remains an open problem.

Summary
We have presented a novel accelerated stochastic simulation algorithm which has demonstrated an ability to sample from the CME without a loss of accuracy. Due to its hierarchical design, this method (a) scales very well with the number of reaction channels and simultaneously (b) takes advantage of parallel hardware for single trajectory samples. As far as we are aware, this is the first exact accelerated algorithm with either property (a) or (b), and is therefore of potential significance to the computational biology community.
Open questions and future work abound. For example, it is not know how well this method works on 'real networks' of substantial complexity taken from biological modeling practice. We believe that modular structure in biological networks will make the method particularly useful. Additionally, it is unknown how substantial increases in the parallel architectures of future computers will increase performance.
But we will show that this is impossible for anyñ a > n ′ a ≥ 0. Note that we do not need to consider ∆m (r 1 r 2 ) a ≤ 0 because F is monotonic, the LHS will be decreased and the RHS will not change as per the definition of ∆ D (r 1 ) (I 0 L) (negative ∆m (r 1 r 2 ) a are ignored). Before proceeding we will introduce the forward difference operator, ∆ F (i) , such that (26) for any function f (z). Furthermore, F (r 1 r 2 ) (n) can be decomposed by species into terms including chemical species C a and those which do not. Following from equation (3), this allows us to rewrite F (r 1 r 2 ) (n) as F (r 1 r ′ 2 ) (n) = G (r 1 r ′ 2 ) (n\{n a }) × (n a ) k for some constant G (r 1 r 2 ) (n\{n a }) ≥ 0 which does not depend on n a , where k = m (r 1 r ′ 2 ) a is the input stoichiometry for reaction r ′ 2 and species C a , and (n) k ≡ n! (n − k)! .
For equation (25) to be true there must exist a species C a such that is true. All of the above F (r 1 r 2 ) are calculated using n b = n ′ \{n a } and n a ∈ {n ′ a ,ñ a }. When we show that n ′ a will not result in a greater delta than that offered by usingñ a instead, this implies that equation (25) may never be true.
However, because n ′ a <ñ a , if it is shown that ∆ F (m) (n) k is monotonic in n then this will imply equation 28 is false.