Accelerated Bayesian inference of gene expression models from snapshots of single-cell transcripts

Understanding how stochastic gene expression is regulated in biological systems using snapshots of single-cell transcripts requires state-of-the-art methods of computational analysis and statistical inference. A Bayesian approach to statistical inference is the most complete method for model selection and uncertainty quantification of kinetic parameters from single-cell data. This approach is impractical because current numerical algorithms are too slow to handle typical models of gene expression. To solve this problem, we first show that time-dependent mRNA distributions of discrete-state models of gene expression are dynamic Poisson mixtures, whose mixing kernels are characterized by a piece-wise deterministic Markov process. We combined this analytical result with a kinetic Monte Carlo algorithm to create a hybrid numerical method that accelerates the calculation of time-dependent mRNA distributions by 1000-fold compared to current methods. We then integrated the hybrid algorithm into an existing Monte Carlo sampler to estimate the Bayesian posterior distribution of many different, competing models in a reasonable amount of time. We validated our method of accelerated Bayesian inference on several synthetic data sets. Our results show that kinetic parameters can be reasonably constrained for modestly sampled data sets, if the model is known \textit{a priori}. If the model is unknown,the Bayesian evidence can be used to rigorously quantify the likelihood of a model relative to other models from the data. We demonstrate that Bayesian evidence selects the true model and outperforms approximate metrics, e.g., Bayesian Information Criterion (BIC) or Akaike Information Criterion (AIC), often used for model selection.


I. INTRODUCTION
Gene expression is a biochemical process driven by the chance collisions of molecules, which can result in strong stochastic signatures and cell-to-cell variability in gene dynamics.Advances in single-cell and single-molecule technologies have provided unprecedented resolution on the stochastic dynamics of gene expression [1].Dynamic assays measure gene expression in living cells either directly via transcript tagging [2][3][4][5] or indirectly via fluorescent or luminescent proteins [6][7][8][9].Static assays measure transcript levels in fixed cells either using a cocktail of fluorescently-labelled DNA oligos that bind specific transcripts [10,11] or via single-cell RNA sequencing [12,13].Static assays are popular because they do not require genetic modifications and are easily multiplexed.The disadvantage is that static assays only provide population snapshots of transcripts levels and cannot follow the dynamics of transcription in a single cell through time.
To this end, static assays have relied upon mathematical models to infer dynamic properties of gene expression in single cells from the measured snapshot of transcript levels; see [14] for a review.Inference requires (1) appropriate models of stochastic gene expression, (2) numerical methods to calculate the time-dependent mRNA distribution in a population of cells given any underlying model and associated parameters, and (3) calculating the likelihood that measured data were sampled from the calculated distribution.We recently developed a Bayesian approach (BayFISH) that uses this likelihood to infer best-fitting parameters from single cell data and quantify their uncertainty using the posterior distribution [15].Although Bayesian inference is the most complete and rigorous approach, it requires significantly more computation than other approximate methods, e.g.maximum likelihood.
We have developed a hybrid numerical method that accelerates the calculation of time-dependent mRNA distributions by 1000-fold compared to standard methods.We integrated this method into BayFISH and, for the first time, one can estimate the Bayesian posterior distribution of many competing models in a reasonable amount of time.The Bayesian evidence rigorously quantifies the likelihood of a model relative to other models given the data, and we show that HME selects the true model and outperforms approximate metrics, e.g., BIC or AIC, typically used for model selection.Our accelerated Bayesian inference represents a significant advance over existing methods used for inferring gene expression models from snapshots of single cell transcripts.

II. RESULTS
Our inference method uses data from single-molecule RNA Fluorescence In Situ Hybridization (smFISH), but could include single cell data from other static assays.The smFISH technique labels transcripts with fluorescent DNA oligos and measures both the number of ma-ture mRNAs (m) and the number of gene loci with highactivity transcription sites (T S); see Figure 1a.A typical smFISH data set is a histogram h = h( ω) where ω ∈ Ω is the set of all possible states (m, T S) that can be measured in a cell.

A. Connecting models of gene expression to single cell data
A broad spectrum of measured gene expression profiles in bacteria and eukaryotes is well-explained by discrete state gene expression models [16,17], summarized by the following reactions: ( In this article, we adopt a two-allele, 3-state model (Figure 1b) as a case study for modelling stochastic gene expression in eukaryotes and for testing our method of accelerated Bayesian inference.We further focus on dynamic smFISH experiments that perturb gene expression (e.g.induction) and then measure mRNA distributions at different times after induction to infer dynamics and kinetic parameters.Induction can change one or more of the model parameters (Figure 1c).The smFISH data from an induction experiment consist of a joint histogram h = h( ω, t ), where t are independent observations made at different times before and after induction.
If the changed parameters are unknown a priori, then one should evaluate all possible induction models, which leads to a combinatorial explosion in model space.For example, there are 2 8 = 256 candidate induction models for the 3-state model shown in Figure 1c.
A likelihood approach is used to connect mathematical models of stochastic gene expression to single cell data.Formally, the likelihood L is the probability that a candidate model M and its associated parameter set θ would generate a given set of data (h).The number of parameters (i.e., dimension of θ) is determined by the model structure M. Mathematically, the likelihood L is a function of the joint probability distribution P( ω, t | θ, M) of a candidate model M and its associated parameters θ at discrete observation times: where Φ is the set of observation times and M is the Multinomial coefficient associated with each h( ω, t ) that arises because the data were not ordered.
In our Bayesian inference work flow (Figure 1d), each candidate model M in the class of possible models {M} will require a large number (≥ 10 6 ) of Monte Carlo steps where, at each step, numerical simulations calculate the time-dependent mRNA distributions and evaluate the likelihood that different parameter sets θ for that model generated the observed data.Our previous software [15] took days to perform the likelihood calculations for one model, which highlights the challenge of using Bayesian inference to evaluate hundreds of models and perform model selection.Below, we develop a hybrid method that both accelerates numerical simulation and likelihood calculations, and (in contrast to standard methods) scales with the number of multi-core processors, thus, allowing for efficient parallelization.

B. A novel hybrid method to calculate the time evolution of discrete-state models
While exact time-dependent solutions exist for twostate models [18][19][20], it is hard to generalize this analysis to models with more than two states.It is therefore necessary to solve the general time-dependent problem using numerical simulations.There are two classes of numerical procedures to solve the time evolution of a discrete-state model for a given set of parameters.The first class forward-evolves the chemical master equations (CME), which are a system of infinitely many coupled ordinary differential equations (ODEs) that describe the joint probability distributions P( ω, t) as a function of time [21,22].To be numerically feasible, a truncation scheme (e.g.only consider mRNA levels below a maximum M ) is used to reduce the infinite size of the dynamical system.While this approach delivers accurate estimates of the temporal evolution of the truncated system, there are two shortcomings.First, the number of ODEs scale as S 2 M where S is the number of genetic states for each allele.The ODE system becomes unwieldy for mammalian cells where the number of observed mRNAs per cell can be O( 103 ) [23][24][25][26].Second, the forward integration of the CME requires stiff ODE solvers, which can place demands on memory resources and hinder parallel processing.The second class of numerical procedures utilize kinetic Monte Carlo methods (e.g., continuous time Markov chain simulation [27,28]) to sample the temporal evolution of the joint probability distribution P( ω, t).While this approach is computationally less expensive, it comes at the cost of having to sample over many runs to achieve equivalent accuracy to CME.
In this article, we propose a hybrid simulation method (Poisson Mixture with Piecewise Deterministic Markov Switching Rates, or PM-PDMSR) which leverages analytical results and the efficiency of the kinetic Monte Carlo method.The key result is that the mRNA distribution can be exactly calculated for any realization (trajectory) of the genetic state, s(t); see Methods.Once transient, initial conditions have burned off (t δ −1 ), where δ is the mRNA degradation rate, the mRNA (N mRNA ) distribution is always Poisson, P(N mRNA = m) = λ m (t)e −λ(t) /m! with a dynamic rate λ(t) satisfying the following piecewise ODE: with an initial condition λ(0) = 0. Given any trajectory s(t), we can exactly compute the mRNA distribution P(m|s(t)); see Figure 2a-b  Hybrid simulation method, PM-PDMSR.For simplicity, we illustrate the principle of PM-PDMSR for a single allele, 3-state model (M = 00110000).The gene is induced at t = 10.
Model parameters: before stimulation: κ01, κ12, κ U 21 , κ U 10 , β0, β1, β2, δ = (0.5, 0.5, 5, 5, 20, 150, 300, 1); after stimulation, κ dimensional) are efficiently generated using standard kinetic Monte Carlo methods.After accumulating a large number of sample paths N s generated by the underlying model, the mixture of the Poisson distributions recovers the mRNA distribution: (c) FIG. 3. Efficiency of the hybrid method relative to the CME method for a single processor.We measured the time for each method to complete one step of Bayesian inference, i.e. calculate joint distribution and evaluate the likelihood.This comparison was performed for increasingly complex model classes: where where λ k (t) is the solution of (3) subject to the k th sample path of genetic switching trajectory s k (t) and δ i,j is the Kronecker delta (see Figure 2c-e).
A detailed description of the hybrid simulator is given in the Methods section.We evaluated the efficiency of the hybrid simulator relative to CME in performing a single step of the Bayesian inference work flow, i.e. simulate the joint distribution P( ω, t) and calculate the likelihood L that this joint distribution produced a given data set (h).We bench-marked the simulators on diverse classes of discrete-state models, parameter sets, and data sets; see Figure 3.The hybrid simulator is up to 10 3 more efficient for models with increased genetic states, S = 3 and 4. The efficiency gain of the hybrid simulator originates from the fact that P(m|s(t)) is solved exactly in mRNA space (and is independent of the size of M ) and that P(s, t) is sampled efficiently in genetic-state space via kinetic Monte Carlo techniques.Last, we tested the scaling of efficiency of different simulators on a modern machine, which can execute 32 parallel threads.The hybrid method scaled linearly with the number of processors, i.e. 32 processors in parallel ran just as fast as one processor.Surprisingly, the simultaneous execution of 32 stiff ODE solvers for the CME simulator took 16 times longer per processor than one processor alone; see Methods.

C. Bayesian inference and uncertainty quantification of model parameters
Equipped with an efficient simulator of the timedependent joint probability distribution and likelihood calculation for any model and parameter set, we first turned our attention to uncertainty quantification of model parameters θ for a fixed model M. Given a likelihood, Bayesian inference uses Bayes formula to update any prior beliefs P( θ|M) and calculate the posterior distribution P( θ|h, M) of parameters θ given the data h and a fixed model M: As done previously, we resorted to Markov chain Monte Carlo (MCMC) with a Metropolis-Hastings (MH) sampler to estimate the posterior distribution P( θ|h, M); see Methods and [15].We assumed that the prior P( θ|M) is log-uniform.At each MCMC step, the MH sampler randomly proposes a nearby parameter set and computes the ratio of the posterior probability P( θ|h, M) relative to that of the current parameter set, and probabilistically accepts or rejects the proposal with a prescribed criterion that only depends on the ratio of the likelihood values.The denominator P(h|M) in ( 5) is identical for any parameter set θ and cancels during the calculation of the ratio.
We bench-marked our approach on two synthetic data sets that were generated by sampling (N = 100 or 1000 cells per time point for a total of 4 time points) from a two-allele, 3-state induction model, where the induction stimulus decreased the downward transition rates; see Methods.Here, the model was known a priori and our goal was to infer the kinetic parameters and perform uncertainty quantification by comparing their posterior distributions (Figure 4a-b) to the ground truth (GT) parameters used to generate the sampled synthetic data set (Figure 4c-d).Our method constrained the posterior parameter distribution around the ground truth, and a 10fold increase in the number of sampled cells dramatically reduced uncertainty in the inferred parameters.This observation holds true for a synthetic data set generated by a different two-allele, 3-state induction model; see Supplemental Materials.

D. Model selection using the full Bayesian framework
Knowing that our method of accelerated Bayesian inference can reliably constrain the kinetic parameters for a given model, we turned our attention to the harder problem of model selection.The goal was to identify the correct model from 64 possible types of two-allele, 3state induction models given the same synthetic data set in Fig. 4, which was sampled from a ground-truth model and its parameters.Bayesian analysis naturally provides a quantitative measure of the likelihood of any model M, i.e., the probability of the model to reproduce the experimentally observed data h.The measure, referred to as the marginalized likelihood or evidence [29,30], is the denominator of (5): The evidence is simply the probability that a model M produced data h and is equal to the sum of the probabil- ities of the model (i.e.likelihood) over all sets of parameters that could have produced the data.The evidence is a convolution of the likelihood with the prior P( θ|M), which quantifies the belief regarding the initial parameter distributions.The dimensionality of θ does not have to be identical for two different models and this prior inherently penalizes models with too many parameters; see Discussion.
The evidence for a model M is not calculated during the MCMC sampling of the posterior distribution and has to be computed separately.Computing the evidence is a sophisticated problem [31][32][33][34] and we adopted an Importance Sampler of the Harmonic Mean Estimator (IS-HME) proposed Robert and Wraith [35], which re-samples the posterior distribution estimated by the MCMC to compute the evidence of each model; see Methods.We first carried out the MCMC calculations of posterior distributions for each of the 64 possible types of two-allele, 3-state induction models for the data sets described in Fig. 4. We then used IS-HME to compute the evidence of each model given the underlying data set.We compared the IS-HME evidence to maximum likelihood metrics used for model selection, such as the Bayesian Information Criterion (BIC) and Akaike Information Crite-rion (AIC) [15].Both BIC and AIC are approximations to the Bayesian evidence and become equivalent in the limit of large sample sizes; see Discussion.
Our results demonstrate that the IS-HME evidence P(h|M) of the ground truth model dominates over other models (≥ 95%) when using Bayesian inference on the larger data set (N = 1000 cells sampled per time point); see Figure 5.The BIC approximation also selected the ground-truth model (although incorrect models exhibited significant probabilities, e.g.> 5%) whereas the AIC failed to select the correct model.When the sample size dropped to N = 100 cells per time point, even IS-HME evidence could not reliably select the ground-truth model with this under-powered data set.

III. DISCUSSION
Piecewise-deterministic Markov processes (PDMP) have become a useful, coarse-grained description of stochastic gene dynamics, where the underlying discrete variable s(t) captures the stochastic dynamics of gene states and the continuous variable λ(t) captures the first moment of downstream gene products [36][37][38][39][40][41][42].The key insight of our manuscript was proving that the timedependent mRNA distribution of any underlying s(t) is asymptotically a Poisson distribution with a rate λ(t), and that the time-dependent joint probability distributions of discrete-state models are dynamic Poisson mixtures, whose mixing kernels are characterized by a PDMP.This significantly expands upon a related framework, which only considered the stationary distribution of discrete-state models [43].More generally, our analysis helps bridge a gap between mechanistic discrete-state models and statistical models used in single cell analysis.For example, Wang et al. recently proposed an statistical model of gene expression, which postulated that mRNA distributions are Poisson mixtures [44], and our work justifies this assumption.
We used our insight to develop a hybrid method that calculates the time-dependent joint distribution more efficiently than standard numerical methods that forwardintegrate the Chemical Master Equation (CME).The efficiency arises because our method analytically solves the mRNA distribution and rapidly samples many path s(t) of discrete-switching events using a kinetic Monte Carlo algorithm.We benchmarked the hybrid method and showed that it is O 10 3 more efficient than previous methods that directly integrate the CME.Furthermore, the hybrid method runs as efficiently in parallel on a multi-core processor than it does on a single processor.The stiff CME integrators ran more slowly in parallel and this sublinear scaling persisted for different integrators.We suspect that the slow-down arises from the competing memory demands of stiff CME integrators running on a multi-core processor.While there is room to improve stiff integration and parallelization, current approaches are limited compared to the hybrid method because they must integrate the CME for a large number of mRNA states, e.g.0 to 1000 mRNAs per cell.
We incorporated the hybrid algorithm into BayFISH and were able, for the first time, to use a full Bayesian framework for model selection and uncertainty quantification of parameters from single-cell smFISH data.We adopted the Bayesian framework for model selection because it naturally quantifies "Occams factor" [29,32] and, thus, avoids over-fitting.For example, the top models based on Bayesian evidence are not the most complex models with the largest number of parameters that change upon induction, e.g.M = 11110110; see Figure 5.The evidence resists over-fitting because when the dimensionality of parameter space is high, the value of a uniform probability density of the prior parameter distribution P θ|M in ( 6) is small due to normalization.Thus, Bayesian evidence will favor a model that is complex enough to have a large likelihood but not too complex to decrease the prior parameter density.We note that when the data sample size is large, such that the posterior distributions P θ|h, M can be approximated by a multivariate normal distribution, the logarithm of the evidence converges asymptotically to the Schwartz index (commonly known as the Bayesian Information Criterion, BIC) [30,32].Here, we bench-marked the ability of Bayesian evidence and the BIC metric to select the correct model from synthetic data sets generated by a ground-truth model and parameters.Figure 5 shows that while the BIC (but not AIC) ranked models similarly to Bayesian evidence for the larger data set (N =1000 cells per time point), BIC requires an even larger sample size to confidently converge to the correct model.This is an important issue because most biology labs are ill-equipped to generate and analyze large smFISH data sets, and their sample sizes are typically N = 100 -1000 cells per time point.Our work demonstrates why Bayesian inference should be used for modestly sampled data sets.We show that N =100 cells per time point is sufficient for parameter inference and uncertainty quantification if one has high confidence in the underlying model; see Figure 4.However, if the goal of the smFISH experiments is model selection, then these smaller data sets are under-powered and the experimentalist needs to increase data sampling by atleast 10-fold; see Figure 5a and 5c.Here, we only considered one round of experiments followed by Bayesian inference, but multiple cycles of data collection and analysis are the norm.Our framework quantifies certainty in both models and parameters using Bayesian evidence and posterior distributions.Future work can complete the data collection and analysis cycle by using the evidence and posterior distributions to rationally dictate the next round of experiments, i.e. different sampling times and densities, that are most informative for constraining models and/or parameters.
In this article, we adopted a Markov chain Monte Carlo algorithm with Metropolis-Hastings sampling (i.e., MCMC-MH) to compute the posterior distributions of the model parameters [15,45].However, there is room to further improve the speed of Bayesian inference.First, Hamiltonian Monte Carlo (HMC) algorithms are more efficient at sampling posterior distributions in high dimensional parameter spaces because they uses local sensitivity, i.e., the partial derivatives of the likelihoods with respect to the model parameters [46][47][48].Second, although PM-PDMRS is efficient at generating sample paths in the space (s, λ), evaluating the convolution to calculate the joint distribution ( 4) is the rate-limiting step in the likelihood calculation.Thus, transforming the experimental data h into the mixing kernel of the Poisson mixtures ρ s (λ) would accelerate Bayesian inference.Last, one could use low-order moments of PM-PDMSR and experimental data to formulate a sufficient statistics for likelihood-free Approximate Bayesian Computation [49], thus, replacing the explicit calculation of the likelihood L.

A. Poisson mixture with piecewise deterministic
Markov switching rates We illustrate our central theoretical results using a single-allele model.However, the results generalize to multiple-allele models because the states of a multipleallele model can be relabeled as internal states of a single-allele model.
Central theoretical result I: Given a trajectory of genetic state s(t), the total number of mRNA, N mRNA (t), is the sum of two variables N initial mRNA (t) and N new mRNA (t).N initial mRNA (t) describes the number of initial mRNAs that remain at time t.
The probability distribution of N initial mRNA (t) is a Binomial mixture weighted by the initial mRNA distribution P m,0 := P N initial mRNA (t = 0) = n : Here, Θ (•) is the Heaviside step function.N new mRNA (t) describes the number of new mRNAs that were synthesized after t > 0 and still remain at time t.The probability distribution of N new mRNA (t) is a Poisson distribution with rate λ (t) where λ(t) satisfies equation λ (t) = β s(t) − δλ (t) with an initial condition λ (0) = 0.
Proof.We denote the probability of the total number of mRNA N mRNA at time t by P m (t).For a given trajectory of the genetic state s(t), the temporal evolution of satisfies the chemical master equation (CME) for all m ∈ Z ≥0 and with a boundary condition P −1 = 0. We prove the central theoretical result I by using the probability generating function defined by where We first solve for G (z, t) over an interval of time when s (t) is constant.We will then extend our analysis to include piecewise intervals of time with different value of constant s, similar to the s (t) generated by a genetic state model.To begin, we apply the operator ∂ t to G (z, t) and use (9) to obtain the partial differential equation (PDE): (11) This linear PDE can be solved using the method of characteristics [50] and the general solution is where P m,0 := P m,0 (t = 0) is the initial mRNA distribution of the system.For reasons that will become apparent below, we label the initial-distribution-dependent part by G init (z, t) and the rest of the terms by G new (z, t): The above solution applies to a constant s (t).We now consider a piecewise-constant trajectory for any genetic state s (t).To specify the discrete state and the switching events, we label s (t) by the ordered pairs (t i , s i ) for i = 0, 1, . . ., N before an observation time t.The gene starts with a state s 0 at t 0 := 0, switches to s 1 at t 1 ,. . .until the final switching event to s N at time t N .Our aim is to compute the generating function G (z, t) after N switching events (t ≥ t N ).The solution G (z, t, |t ≤ t 1 ) is identical to ( 12) with s = s 0 before the first switching event.At t = t 1 , the generating function is Note that after t 1 and before t 2 , the genetic state changes to s 1 and only the transcription rate in (9) changes from β s0 to β s1 .The initial condition of the generating function of this period (t 1 ≤ t ≤ t 2 ) is precisely G (z, t 1 ) in the above equation.Matching the "initial condition" for G (z, t 1 ), we arrive at We iterate this procedure for each piecewise "episode" and for t ≥ t N , The total solution G (z, t|t ≥ t N ) is factorized into two terms, G init (z, t) and G new (z, t), for any N and t.The probability generating function of the sum of two independent random variables is the product of the generating functions of the random variables.This hints that we can define two random variables, X 1 and X 2 , which have generating functions G init (z, t) and G new (z, t) respectively.
Our next task is to identify the variables X 1 and X 2 and their probability distributions.For X 1 , we expand G init (z, t) to arrive at Recall that the generating function of a Binomial (m, p) distribution is The probability distribution of X 1 is therefore identified to be a binomial mixure with a temporally decaying parameter p = exp (−δt) and a mixing kernel defined by the initial distribution P m,0 .The physical meaning of X 1 (t) is the number of initial mRNA molecules that remain at time t, i.e., N initial mRNA (t).These mRNA can only degrade with the decay rate δ.Each of the mRNA decays independently and, at time t, there is a probability exp (−δt) that a specific mRNA has not degraded.Importantly, when t 1/δ, this distribution will be concentrated at m = 0 (see Corollary I below).
The total mRNA N (t) = X 1 (t) + X 2 (t) = N initial mRNA (t) + X 2 (t), so X 2 (t) is identified to be the number of new mRNA molecules that were synthesized after t = 0 but which have not degraded at time t.We refer to this variable as N new mRNA (t).The squarer bracket of G new (z, t) in ( 17) is the piecewise solution λ (t) of the following ODE for a given genetic trajectory s (t): We now expand G new (z, t): where q m (λ (t)) is the probability density function of a Poisson distribution with rate λ (t), as in (8).
Corollary I.The transient timescale for the initial distribution is O (1/δ).When t O (1/δ), the mRNA distribution converges to a Poisson with a dynamic rate parameter λ (t).
Proof.Physically, the timescale of degradation of each initially populated mRNA is 1/δ, so for a timescale which is much longer than this, the initial distribution will be fully degraded.Mathematically, the probability that the initial mRNA molecules have not fully decayed is In the asymptotic limit t 1/δ, exp (−δt) 1 so Here, N initial mRNA (0) is the first moment of the initial distribution.P N initial mRNA (t) > 0 decays exponentially fast, and we can bound this probability to be smaller than ε when t > δ −1 log N initial mRNA (0) /ε .
Central theoretical result II: At long times t 1/δ, the mRNA distribution asymptotically converges to a Poisson mixture regardless of the initial mRNA distribution and genetic switching trajectory s (t) where the joint probability density ρ i (λ, t) satisfies the forward Kolmogorov equation The initial condition for ρ i (λ, t = 0) are defined by ρ i (λ, t = 0) := δ (λ) P (s (t = 0) = i) , (26) where δ (λ) is the Dirac delta distribution at λ = 0.
Proof.The solution of (20) subject to random switching events of s (t) is a random process.Formally, λ (t) and the discrete switching states s (t) jointly comprise a piecewise deterministic Markov process (PDMP) [51][52][53].
The forward Kolmogorov equation describing the temporal evolution of the joint probability distribution is (25) [39,41].Therefore, We then use the central theoretical result I and corollary I to show that P (N mRNA (t) = m|λ (t) = , s (t) = i) = m e − /m! asymptotically when t 1/δ to complete the proof.
Efficient numerical method for sampling ρ i (λ, t).Because λ is continuous, solving the forward Kolmogorov equation ( 25) is as complex as solving the full CME, both of which are infinite dimensional systems.Instead, we used an efficient kinetic Monte Carlo simulation [36,41] to generate a large number of sample paths to estimate the asymptotic joint distribution P (N mRNA (t) = m|λ (t) = , s (t) = i) when t 1/δ using (24).The pseudo code of this algorithm is shown in Algorithm 1.
Computing the joint distribution from sample paths.To compute the time-dependent joint distribution of genetic states and mRNAs, we generated N s sample paths {λ k (t) , s k (t)} Ns k=1 with Algorithm I. We then used (24) to estimate ρ i (λ) where δ i,j is the Kronecker delta function which is equal to 1 if i = j, and 0 otherwise.We determined that N s ≡ 10 5 is a sufficient number of sample paths to estimate the same joint distribution obtained by forward-integrating the CME.We refer to our method as the Poisson Mixture with a Piecewise Deterministic Markov Switching Rate (PM-PDMSR).
The goal was to compute the joint distribution of genetic states and mRNAs before and after induction.Similar to the situation in many experimental systems [54,55], the joint distribution is at stationarity before induction.Upon induction, we assume some model parameters are changed, which results in the time-evolution of the joint probability distribution P (N mRNA (t) = m, s (t) = i) towards a new stationary state.We label the kinetic rates (κ ij and β k ) before and after induction by U (Unstimulated) and S (Stimulated).To use PM-PDMSR to estimate the stationary distribution before induction, we first solved for the marginal stationary distribution of the genetic state p * i , where 0 = j κ U ij − κ U ji p * i , i = 1, 2, . . ., S for an S-state model.We initiated N s p * i sample paths in PM-PDMSR at λ = β i /δ and state s = i and ran for t = 10/δ so that the Poisson mixture relaxes to stationarity.Upon induction at t = 10/δ, we changed model parameters κ , . . .S} and continued simulating the temporal evolution of the joint probability distribution after induction using PM-PDMSR.This is valid because our previous proofs and arguments for (24) apply even when the kinetic rate constants change upon induction.

B. Testing the speed and accuracy of the simulators
We bench-marked the efficiency of PM-PDMSR versus the "gold-standard" simulator, i.e., forward-integration of a truncated CME [21,22].Both simulators were embedded into BayFISH and evaluated on their ability Require: Initial state λ (t = 0) = 0 and s (t) = s0.Kinetic rate κij (switching rates from discrete state i → j), β k (transcription rates), and δ (degradation rate).N discrete observation times T := {t } N =1 .Ensure: An exact sample path of the random process (λ (t) , s (t)) at N discrete times T .end while 21: Output the system state (λ, s) at the observation time t observation 22: end for Algorithm I: An efficient kinetic Monte Carlo algorithm which generates exact sample paths of the piecewise deterministic Markov process (λ (t) , s (t)).
to perform a single Monte Carlo step, i.e., simulate the time-dependent joint distribution of a model and its parameters and to calculate the likelihood of a synthetic data set generated by the same model and parameters.This single-step bench-marking was performed for 1024 diverse models and parameters across two-allele, discrete-state models of increasing complexity (2-state, 3-state, and 4-state induction models).
Generating diverse models, parameters, and synthetic data sets.Each model had a randomly chosen subset of parameters that were one value κ U ij , β U k before induction from t = 0 to t = 20 and with different parameters κ S ij , β S k after induction from t = 20 to t = 22.The genetic switching rates of parameters κ ij ranged between 10 −2 to 10 2 and the transcription rates β k range between 0 and 200.For each instance of a model, we randomly generated the switching rate constants in the logarithmic space (log 10 κ ij ∼ Unif (−2, 2) if |i−j| = 1) and transcription rate constants in linear space (β k ∼ Unif (0, 200)).Note that not all parameters were random or changed upon induction: We fixed β 0 = 0 and the mRNA degradation rate δ = 1, and we constrained β i ≤ β j if i < j.For the purpose of bench-marking algorithm speed and efficiency, we considered the most complex induction model M for each discrete model class.The corresponding induction model is M = 11010 (2state), M = 11110110 (3-state), and M = 11111101110 (4-state).
The synthetic data of each model with its parameters were generated by running 1000 independent trajectories of standard continuous-time Markov chain We collected the statistics of the trajectories at four discrete times t = 20, 20.5, 21, and 22 (N = 1000 cells per time point).The measured allele activity state T S was marginalized: we define T S = 1 when its internal state s > 0. The synthetic data set therefore consists of a histogram at discrete times t , h (m, T S, t ), which is the number of trajectories with m mRNAs and T S active transcription sites (which can be 0, 1, or 2 for a two-allele system) at time t .For each model class (2-, 3-, and 4-state models), we repeated the process 1024 times to test diverse parameter combinations.
CME simulators.Given an induction model and associated parameters, we forward propagated the CME using the same parameter values used to create the synthetic data sets in the previous section.We truncated the number of mRNA at 500 (i.e., there are no transcription event once the system reaches N mRNA = 500) with an absolute error tolerance of 10 −5 .The truncation number M was motivated by data sets in animals, showing that mRNA populations can be as large as O (500) [23][24][25][26].We tested different software platforms, including Matlab, Python (SciPy), and a research software ACME [22] to forward integrate the same stiff CME.Pythons stiff integrator (using backward differentiation formula, BDF) outperformed other integrators and software platforms.Thus, Python (with SciPy) was chosen to be the platform for direct Gene state i κij − − → Gene state j, Gene state i βi − → Gene State i + mRNA, mRNA δ − → .

FIG. 1 .
FIG. 1. Single cell data and models of gene expression.(a) The single-molecule RNA FISH (smFISH) technique provides information on the localization and numbers of mature mR-NAs (m) in single cells, including clusters of nascent transcripts produced at transcription sites (TS) at active genetic loci.(b) A diploid, two-allele 3-state genetic model where κij is the transition rate between genetic states, βi is the mRNA synthesis rate of each state, and δ is the mRNA degradation rate.(c) Induction changes one or more parameters from an unstimulated (U) to a stimulated value (S).Here, we show one of many possible induction models M, labelled in binary (00110000) (d) Schematic of the Bayesian inference work flow.
FIG.3.Efficiency of the hybrid method relative to the CME method for a single processor.We measured the time for each method to complete one step of Bayesian inference, i.e. calculate joint distribution and evaluate the likelihood.This comparison was performed for increasingly complex model classes: (a) 2-state, (b) 3-state, (c) 4-state models of gene expression.Each model class was evaluated for 1024 different parameters along with associated data sets; see Methods for details.

FIG. 4 .
FIG. 4. Parameter inference and uncertainty quantification using the Bayesian posterior distribution.We bench-marked the hybrid method by running Bayesian inference on a synthetic data set sampled (N cells at 4 different time points) from a known model M = 00110000 and "ground-truth" (GT) parameter set.Posterior distributions (a-b) and joint distribution of best-fit parameters (c-d) for N = 100 and 1000 cells per time point, respectively.

FIG. 5 .
FIG. 5. Model selection using Bayesian evidence.We plot the IS-HME, BIC, and AIC evidence metrics of the top 10 models, ordered by decreasing IS-HME score.Model selection was performed on data sampled at two densities (N = 100 and 1000 cells per time point at 4 different time points) for two different ground-truth models (M = 0011000 and M = 1100000).