Maximum likelihood estimates of diffusion coefficients from single-particle tracking experiments

Single-molecule localization microscopy allows practitioners to locate and track labeled molecules in biological systems. When extracting diffusion coefficients from the resulting trajectories, it is common practice to perform a linear fit on mean-square-displacement curves. However, this strategy is suboptimal and prone to errors. Recently, it was shown that the increments between observed positions provide a good estimate for the diffusion coefficient, and their statistics are well-suited for likelihood-based analysis methods. Here, we revisit the problem of extracting diffusion coefficients from single-particle tracking experiments subject to static and dynamic noise using the principle of maximum likelihood. Taking advantage of an efficient real-space formulation, we extend the model to mixtures of subpopulations differing in their diffusion coefficients, which we estimate with the help of the expectation-maximization algorithm. This formulation naturally leads to a probabilistic assignment of trajectories to subpopulations. We employ the theory to analyze experimental tracking data that cannot be explained with a single diffusion coefficient. We test how well a dataset conforms to the assumptions of a diffusion model and determine the optimal number of subpopulations with the help of a quality factor of known analytical distribution. To facilitate use by practitioners, we provide a fast open-source implementation of the theory for the efficient analysis of multiple trajectories in arbitrary dimensions simultaneously.


I. INTRODUCTION
Single-particle tracking methods are routinely used to monitor the erratic motion of labeled macromolecules in their native environment, such as molecular motors moving along the cytoskeletal network 1,2 , transcription factors binding to DNA 3 , or receptor proteins diffusing in cell membranes 4 .The mode of transport and associated transport coefficients are inferred from the measured trajectories using microscopic models mimicking the globally observed behavior, which can range from ordinary to anomalous and confined diffusion, as well as active transport 5 .
The simplest mode of transport in a dense fluid medium is free diffusion, which is fully characterized by a diffusion coefficient D. However, single-particle tracking experiments are plagued by static localization errors and dynamic motion blur, which have to be properly accounted for when estimating diffusion coefficients.Static errors originate from various noise sources in the experimental setup, such as spatial resolution of the instrument, and noise in the detection and processing electronics 6 , and are commonly modeled via additive Gaussian noise [7][8][9] .Motion blur arises from the camera's finite frame integration time, during which the particle position is smeared out.This effect depends on the illumination profile of the shutter, which in most cases is uniform 8 , but can, in general, be represented by any non-negative function s(t) that integrates to unity over the frame integration time 9 .a) Electronic mail: gerhard.hummer@biophys.mpg.de   Traditionally, the estimation of diffusion coefficients has relied on the fact that the slope of the mean squared displacement (MSD) is directly proportional to D 10 .This procedure, however, has some serious drawbacks.For instance, the most common estimator for the MSD of finite time series introduces correlations between the MSD values at different time lags, which, in turn, cause the estimate to suffer if too many MSD values are used for the fit 11,12 .To remedy these shortcomings, one can either consider the optimal number of MSD values for the fit 12 , or explicitly account for the above-mentioned correlations in the fit procedure 13 .Departing from the idea that the MSD is needed to determine D, one arrives at the highly efficient covariance-based estimator (CVE) 14 , which is based on the much simpler statistics of the position increments between two observations.Even before, Berglund 9 had already used the sparse covariance matrix of the increments to construct an approximate maximum likelihood estimator (MLE) operating in discrete Fourier space.This estimator asymptotically approaches the exact MLE in the limit of infinitely long trajectories.It was later shown 15 that replacing the Fourier transform with a discrete sine transform results in an orthogonal basis in which the MLE of Ref. 9 is exact.
Despite the fact that the CVE is orders of magnitude faster than likelihood-based methods 14 , the latter can still be beneficial, e.g., to incorporate prior knowledge of the parameters in a Bayesian manner or to globally analyze trajectories of different lengths.The latter issue is common in single-particle tracking, because individual molecules stochastically disappear as a result of photobleaching.Importantly, MLEs are minimally affected by blinking events and other interruptions of the trajectory recording, in the sense that the estimates and their uncertainty depend almost exclusively on the number of observed particle positions, and not the number of trajectories.We note, though, that the CVE can also overcome missing positions in the trajectory by lumping together increments of the resulting trajectory segments 16 .However, correlations in longer trajectories may reveal deviations from regular diffusion.
Here, we revisit the problem of minimizing the negative log-likelihood, but instead of transforming the data we work in real space, and exploit the symmetries of the increment covariance matrix to develop a fast and reliable numerical solution scheme.We develop MLEs of the mean-squared positional uncertainty a 2 in single-particle tracking and of the diffusional spread σ 2 during a single time step ∆t, from which one obtains the MLE of the diffusion coefficient as D = σ 2 /(2∆t), irrespective of the dimension d.The MLEs can also be used to analyze molecular dynamics simulation trajectories by setting the motion-blur coefficient to B = 0.The coefficient a 2 then accounts for fast non-diffusional spread in the particle position.
We extend the likelihood formulation to a mixture model, which assumes that the trajectories to be analyzed originate from different subpopulations, each characterized by a distinct diffusion coefficient.In comparison to some established models, such as those that account for diffusion in inhomogeneous media [17][18][19][20][21][22][23][24] or those allowing for multiple diffusive regimes within a single trajectory [25][26][27][28][29] , our approach may seem somewhat restrictive.However, what it lacks in generality it makes up with speed, efficiency, and compactness.Furthermore, we provide rigorous statistical tests to assess whether the data comply with the theory assumptions or if more demanding models are needed to explain the data.
The paper is structured as follows.In Sec.II A, we review the minimal stochastic model of Ref. 9 for diffusive trajectories smeared out by static localization noise and dynamic motion blur.Section II B is dedicated to the negative log-likelihood of the trajectory increments and the efficient O(N )-algorithm used to numerically minimize it.To test the assumptions of the underlying diffusion model, we introduce a quality factor in Sec.II C that can either be inspected visually or via a non-parametric test, such as the Kolmogorov-Smirnov test or Kuiper's test.In Sec.II D, we generalize the likelihood function to a mixture of subpopulations, each with a distinct diffusion coefficient.We minimize this joint likelihood with an expectation-maximization (EM) algorithm.A novel selection criterion for the optimal number of subpopulations, based on quality factor statistics, is introduced and tested on synthetic data in Sec.II E. Section III A explores ways to determine whether a small sample of short trajectories is governed by diffusive dynamics or not.In Sec.III B, we use the mixture MLEs to analyze single-molecule tracking experiments 30 reporting on the diffusive dynamics of human MET receptor tyrosine kinase in live cells.An outlook on possibilities to incorporate elements of Bayesian analysis in the theory is presented in Sec.IV.Our findings are summarized in Sec.V and implemented in a data analysis package written in Julia 31 .Detailed derivations and background information on the theory can be found in the Appendix.

A. Microscopic model
A detailed stochastic model capturing the effective dynamics observed in single-molecule tracking experiments was introduced by Savin and Doyle 8 for a uniform illumination profile, and later generalized to arbitrary profiles by Berglund 9 .In what follows, we review the model by Berglund, on which we then build the MLE in Sec.II B.
Consider a freely diffusing particle in one dimension with diffusion coefficient D, whose dynamics is described with a time-continuous Wiener process Y , satisfying Here, a dot indicates the time derivative and ξ(t) denotes Gaussian white noise fully characterized by ξ(t) = 0 and ξ(t)ξ(t ) = δ(t − t ) with δ(t) being the Dirac δ-function.
The particle motion is captured by a camera with frame integration time ∆t, during which the camera shutter may be fully or partially open.This gives rise to the shutter function s(t) ≥ 0 that smears out the particle position over the integration time ∆t, resulting in the process for i = 1, 2, . . . .Furthermore, we assume that each frame is affected by additional measurement noise, which we model as Gaussian with variance a 2 /2.The observed motion of the particle is therefore described by the following stochastic process, where R is a normal distributed random variable with R i = 0 and R i R j = δ i,j , and δ i,j is the Kronecker delta that evaluates to one if i = j and zero otherwise.Note that due to linearity, the Gaussian nature of ξ and R is inherited by the processes X, Y and Z.According to Eqs. ( 2)-( 4), the observed mean-squared displacement (MSD) is given by which is, in comparison to [Y (i∆t) − Y (0)] 2 = iσ 2 , corrupted by the static error a 2 and the dynamic error 2σ 2 B. The latter is characterized by the motion blur coefficient where S(t) gives the relative amount of exposure up to the time t, namely For uniform illumination, we have s(t) = ∆t −1 , which results in a motion blur coefficient of B = 1/6.Generally 0 ≤ B ≤ 1/4 must hold.In the case of (near-)perfect time resolution, e.g., when analyzing molecular dynamics trajectories, one has B = 0.

B. Maximum likelihood estimation
The probability to observe a one-dimensional time series X = (X 0 , X 1 , X 2 , . . ., X N ) T is given by the Gaussian joint probability distribution function p ), which can be reinterpreted as a likelihood function for the parameters a 2 and σ 2 .These enter the dense covariance matrix Σ X whose inversion requires O(N 3 ) operations, and therefore makes the variation of a 2 and σ 2 , when maximizing the likelihood, computationally expensive.
The likelihood function can be expressed more economically in terms of the position increments ∆ i = X i − X i−1 with i = 1, 2, . . ., N .Like the X i they are also Gaussian distributed, their expectation values read ∆ i = 0, and the elements of the corresponding covariance matrix Σ are given by i.e., Σ is a linear combination of the constant tridiagonal matrices Σ and Σ .The likelihood of observing the increments ∆ = (∆ 1 , ∆ 2 , . . ., ∆ N ) T in one dimension thus has the form which results in the following negative log-likelihood (up to a negligible constant), Because Σ is tridiagonal, Eq. ( 11) can be evaluated in an efficient manner.For example, the Thomas algorithm 32 and a three-term recurrence relation 33 can be used to calculate Σ −1 ∆ and |Σ|, respectively, in O(N ) operations.Yet, due to the fact that the covariance matrix here is also a symmetric Toeplitz matrix, the recurrence relation can be solved analytically (see Appendix A), giving Here, Equations ( 12) and ( 13) can be modified to account for non-symmetric tridiagonal Toeplitz matrices, as demonstrated in Ref. 34.
A global analysis of M d-dimensional trajectories of different lengths N m + 1 is realized by minimizing with respect to the same one-dimensional parameters a 2 and σ 2 as before, due to the assumption of isotropic motion.Here, the quantities ∆ m,n and Σ m are defined as before, except that they vary with the trajectory (m = 1, . . ., M ) and dimension (n = 1, . . ., d), respectively.In general, the minimization has to be conducted numerically, but thanks to the linear dependence of Σ on the parameters it becomes analytically tractable on the boundaries, where either a 2 or σ 2 becomes zero.The remaining parameter can then be estimated as follows, where N M = M m=1 N m denotes the total number of considered increments and N m is the number of entries in the vector ∆ m,n , independent of n.Note that the vectormatrix products are all strictly positive, because the matrices Σ , Σ and Σ, as well as their inverses, are all positive definite (see further Appendix B).If both a 2 and σ 2 are greater than zero, Eq. ( 14) can effectively be reduced to a one-dimensional optimization problem in the vein of Ref. 14, where the sine-transformed counterpart of Eq. ( 11) was considered.We thereby introduce the new parameters a 2 and φ = σ 2 /a 2 , which make the covariance matrix and the log-likelihood function separable, i.e., Σ m = a 2 Σm (φ) ∀m and The determinants ln(| Σm |) can be evaluated via Eqs.( 12) and ( 13) by replacing α and β with α = 1 + φ(1 − 2B m ) and β = −1/2 + φB m , respectively.Equation ( 17) gets minimized with respect to a 2 for and therefore reduces to except for an additive constant dN M /2−dN M ln(dN M )/2, which can be neglected because it is independent of a 2 and σ 2 .Minimizing Eq. ( 19) with respect to φ is a nonlinear one-dimensional optimization problem, which can be tackled conveniently using robust derivative-free algorithms, such as Brent's method 35 or golden-section search 36 .The Julia data analysis package 31 makes use of an implementation of Brent's method provided by the Optim package 37 .
To assess the uncertainty of the estimates, we turn to the standard errors δθ = var(θ) of the model parameters θ ∈ {a 2 , σ 2 }.These are bounded from below by the Cramér-Rao bounds, which are computed from the Fisher information corresponding to the likelihood as The matrix elements I i,j of the Fisher information matrix I(a 2 , φ) are specified in Appendix C along with a detailed derivation of the above equations.The bounds in Eqs.(20) become tight in the limit N M → ∞, because the estimators are asymptotically unbiased (see Fig. 1).For extremely sparse datasets, where the lower bounds of Eqs.(20) vastly underestimate the standard errors, other Figure 1.Estimating relative bias in the diffusion coefficient MLE with respect to the trajectory length.We analyzed M simulated trajectories of equal lengths N + 1, either separately or collectively, using Eqs.( 15), ( 16), ( 18) and (19).We considered two cases: a signal-to-noise ratio of (a) σ 2 /a 2 = 1/2 and (b) σ 2 /a 2 = 2.The resulting average diffusion coefficient (blue solid line), which was determined from the single-trajectory estimates and compared to the simulation input value Dexact, was either over or underestimated, respectively, for N 20.As indicated by the guides to the eye (solid black lines), the bias is O(N −3/2 ), which becomes O(N ) for the global MLE estimates (dashed red line for M = 10, dashed-dotted green line for M = 100).This is why the latter are virtually unbiased for almost all trajectory lengths.All results were averaged over multiple realizations to reduce noise.
methods have to be employed to estimate the uncertainty, such as bootstrapping.
It is worth mentioning that the formalism of Refs. 9 and 15, which treats the problem in discrete sine space (albeit only for M = d = 1, but the extension to d and M different from one is straightforward), results in identical numerical values for the parameter estimates and standard errors.However, our approach has the advantage that we obtain closed-form expressions for the standard errors [Eqs.(20)], whereas these same quantities involve finite sums that grow with the trajectory length in discrete sine space.Working in real space can therefore be beneficial under certain circumstances, e.g., in the context of Bayesian inference involving priors based on the Fisher information matrix (see further Sec.IV).
Finally, note that the estimators a 2 and φ (or, equivalently, a 2 and σ 2 ) are slightly correlated, as seen by the non-vanishing off-diagonal elements I 1,2 = I 2,1 of the Fisher information matrix.Figure 1 explores the bias in estimates of σ 2 and the associated diffusion coefficient as a function of the trajectory length N + 1, using twodimensional trajectories generated via Brownian dynamics simulations.The simulations were conducted using discretized versions of Eqs. ( 2)-(4) (see Appendix D 1), and analyzed both on a single-trajectory level and col-lectively.Our numerical results demonstrate that a bias of O(N −3/2 ) affects single-trajectory estimates of short trajectories, but is virtually non-existent in global estimates, where the bias scales like O(N −3/2 M ), i.e., with the total number of increments.This is in good agreement with analytical results obtained in Ref. 14, which show that the estimators in discrete sine space are unbiased up to O(N −1 ).However, Ref. 14 claims that next to said bias, which originates from the asymmetry of the likelihood function, there should be an additional O(N −1/2 ) bias coming from the fact that the maximum likelihood approach requires a 2 and σ 2 to be positive.The diffusion coefficient estimator in real space does not exhibit this second bias, at least not for the signal-to-noise values considered here.

C. Quality factor analysis
How can we test that we are actually observing free diffusion?According to Eq. ( 10), for a diffusive process the elements of Σ −1/2 m ∆ m,n for a fixed n are independent, uncorrelated normal random variables with zero mean and unit variance.The sum of their squares, given by should therefore be distributed according to a χ 2distribution with dN m degrees of freedom.Note that we do not correct dN m by the number of model parameters, because the degrees of freedom associated with a single trajectory are just a tiny fraction of the overall number of unconstrained degrees of freedom, dN M − 2.
In principle, we could test whether a sample of M trajectories adheres to the diffusion model in Eqs. ( 2)-( 4) by verifying the distribution of the corresponding χ 2 m -values, but only if they are all of equal length.For trajectories that differ in their lengths N m +1, the quadratic forms χ 2 m follow different χ 2 -distributions.To simplify the analysis, we focus on their corresponding cumulative distribution functions, which are all uniform on the interval [0, 1).The associated statistic is given by where γ(a, z) and Γ(a) denote the lower incomplete and ordinary Γ-functions, respectively, that are defined as follows, This is reminiscent of the discussion in Ref. 13, where a quantity similar to the one in Eq. ( 22) is referred to as the quality factor Q, because its values are uniformly distributed whenever the elements of ∆ m,n are truly independent.The main difference between Eq. ( 22) and traditional quality factors, such as the one in Ref. 13, is that the latter consider χ 2 -statistics computed by summing over the weighted squared differences between data and model predictions.Here, the model makes predictions about the distribution of χ 2 m of individual trajectories m and, in turn, of Ϙ m , into which the parameters a 2 and σ 2 enter via the covariance matrix.To highlight this distinction, we make use of the archaic greek letter qoppa, instead of Q, to denote this (somewhat unorthodox) quality factor.
To test whether a set {Ϙ m } m=1,...,M of quality factor values follows the uniform distribution on [0, 1) expected for diffusive processes, we employ a variation of the Kolmogorov-Smirnov statistic called the Kuiper statistic 38,39 .Kuiper's variant measures the largest vertical deviations of the empirical distribution function above and below the cumulative distribution function, and is defined as their sum.In our case, it thus reads for a list of Ϙ m -values, m = 1, 2, . . ., M , sorted in ascending order.Ideally, we have κ ≈ 1, but significantly larger κ-values arise when the underlying assumption of ∆ m,n being normal distributed is violated.This is, e.g., the case for heterogeneous data arising from subpopulations with differing diffusion coefficients, as discussed in Sec.II D.
Finally, it should be mentioned that one can construct a right-tailed p-value from the sampling distribution of κ.Asymptotically, it takes the form where the sum has to be truncated at some large value, e.g., at m = 100 or m = 1000, if used in practice.The associated probability p can be helpful to determine whether a κ-value is reasonable or not.For sparse datasets, tables can be used to look up the p-value 40 .

D. Analysis of mixtures of subpopulations
In situations where a single diffusion coefficient cannot describe the observed dynamics in a sample of trajectories, the single-population analysis of Sec.II B breaks down.Here, we extend the model by introducing K distinct subpopulations, each characterized by a set of parameters {a 2 k , σ 2 k }, k = 1, 2, . . ., K. Every tracked particle is assumed to belong to one of these subpopulations, but the exact assignment is not known a priori.Particles are not allowed to switch between populations and therefore have a fixed diffusion coefficient, depending on which subpopulation k they belong to.The task of parameter fitting now becomes two-fold: The parameters of each subpopulation have to be varied to find their optimal values, while, at the same time, the particle trajectories have to be assigned to the subpopulations they most likely belong to.Problems like these can be treated with the help of the EM algorithm 41 , as outlined below.
To analyze data from a mixture of K subpopulations with distinct diffusive dynamics, we consider the following likelihood function with latent mixing fractions P k , The one-dimensional likelihoods k ( ∆ m,n | a 2 k , σ 2 k ) are given by Eq. (10), where the subscript k indicates that the covariance matrix Σ = Σ m is evaluated using the subpopulation parameters a 2 k and σ 2 k .Now, due to the sum appearing in , its negative log-likelihood remains intractable.This is where the EM algorithm comes in: Instead of minimizing − ln( ) explicitly, we consider the upper bound which follows from Jensen's inequality.In principle, such upper bounds can be constructed for arbitrary coefficients T k,m > 0, whose interpretation becomes clearer in what follows.If we were able to choose the T k,m such that equality holds in Eq. ( 26) for all parameter tuples {P k , a 2 k , σ 2 k } k=1,...,K , then it would not matter if we minimized − ln( ) or L, given a known set of T k,m .This is exactly the case whenever so by requiring that the T k,m are also normalized, we finally obtain It now becomes apparent that T k,m corresponds to the probability of trajectory m belonging to the k-th subpopulation.
The EM algorithm is a two-step algorithm, where in the first step the current values of {P k , a 2 k , σ 2 k } are used to update the classification coefficients T k,m via Eq.( 27).
In the second step, the latter are plugged into the upper bound [Eq.( 26)], which can be rewritten as Here, L( ∆ m,n | a 2 k , σ 2 k ) is defined in Eq. ( 11) for Σ = Σ m .Minimizing Eq. ( 28) with respect to {P k , a 2 k , σ 2 k } updates the estimate for the parameters, and the first step is repeated.Said minimization can mostly be done analytically, because Eq. ( 28) is separable, and therefore susceptible to the methods applied in Sec.II B. For the mixing fractions, we get On the boundary, the solutions read Using the parameter representation (a 2 k , φ k ) with φ k = σ 2 k /a 2 k , Eq. ( 28) is minimized for The high-dimensional minimization problem therefore reduces to a series of analytic expressions [Eqs.( 29), ( 30), ( 31) and ( 32)], and a handful of independent onedimensional optimization problems [Eq.(33)], which can be solved numerically in an efficient way.The algorithm is implemented in a Julia data analysis package 31 .Finally, we would like to mention that one could also consider a continuum of a 2 and σ 2 values, as an alternative to discrete subpopulations.If only a 2 is allowed to vary between trajectories, a latent variable model could be considered.In general, a Bayesian formulation would naturally lend itself to a continuous-parameter treatment, in which a prior distribution would be updated into a posterior distribution in light of the observed data.

E. Selection criterion for optimal number of subpopulations
As the number of subpopulations K is increased, the fit to heterogeneous data gradually gets better.Because the regularity conditions for conventional criteria, such as the Bayesian information criterion (BIC), do not hold for finite mixture models 42 , we propose to rely on a quality factor analysis for model selection.This is done as follows: After repeated fitting of a dataset via the EM algorithm for a fixed K, the classification coefficients T k,m that (along with the optimal parameter values {P k , a 2 k , σ 2 k }) minimize Eq. ( 28) are used to assign each trajectory m to the subpopulation k it most likely belongs to, according to The associated quality factors {Ϙ (k) m } are then determined using the subpopulation-specific parameters a 2 k and σ 2 k , resulting in a set {Ϙ (1) m }, {Ϙ (2) m }, . . ., {Ϙ (K) m } that is finally plugged into Eq.( 23) to evaluate the Kuiper statistic κ. Figure 2(a) serves as a proof of concept for this selection procedure, where κ is plotted as a function of K for a simulated heterogeneous dataset with three distinct subpopulations.The estimates for K = 3 are in excellent agreement with the input parameter values of our simulations for three subpopulations, and result in κ ≈ 1.13.Note that the slight increase in κ for K > 3 is due to stochasticity in the EM optimization, which becomes more pronounced as the number of subpopulations is increased (for details on our implementation of the EM algorithm, see Appendix D 2).Also note that we are not interested in the global minimum of κ with respect to K, but in the smallest K where κ ≈ 1.In practice, this can be realized by a threshold value for κ, below which κ is considered sufficiently close to one to accept the associated K as the optimal number of subpopulations.We recommend practitioners to choose a threshold somewhere between κ ≈ 1.75 and 1.42 (corresponding to the p-values 0.05 and 0.25, respectively) that reflects their confidence in the quality of the data.
For comparison, we show predictions of two established selection criteria, namely the BIC and the integrated completed likelihood (ICL) criterion 42 , in Fig. 2(a).The former can be evaluated for our model as follows, where the likelihood is given by Eq. ( 28).The ICL is formulated in an almost identical manner as the BIC, except that the classification coefficients T k,m in front of the square brackets in Eq. (28)    k=1,2,...,9 = 0.08, 0.0005, 0.009, 0.02, 0.18, 0.32, 0.10, 0.34, 0.72) and following the same length distribution as in (a).The data were analyzed for K = 1, 2, . . ., 15, and the predictions of BIC (red) and ICL (green) were determined by the positions of their minima on said interval.In contrast, the novel selection criterion (blue) relied on the first instance where κ went below a threshold value of either κ ≈ 1.75 (p = 0.05) or κ ≈ 1.42 (p = 0.25).In the event of the threshold not being reached, which happened more frequently for the lower threshold, the position where κ got minimized was chosen.
Whereas the BIC overestimates the number of subpopulations, the ICL and the Kuiper statistics criterion produce reliable results for the mixture of three subpopulations.However, for more complex mixtures, also the ICL fails.Figure 2(b) visualizes the prediction statistics of BIC, ICL and the Kuiper statistics criterion when applied to datasets composed of trajectories sampled from nine distinct subpopulations.Although the distributions of predicted K-values are fairly broad, probably due to the trajectories being so short and few, the biases of BIC and ICL are clear, thus confirming the claim that traditional selection criteria cannot be relied on in our particular case.

III. APPLICATION TO EMPIRICAL DATA
To illustrate the use and effectiveness of the above theory, we applied it to simulations and experimental data.Comparisons to the former are primarily intended to gauge the limits of the theoretical framework, while the analysis of experimental data is meant to demonstrate its use in practice.

A. Simulated diffusion data
As mentioned earlier, there are multiple modes of transport possible in microbiological systems, and a diffusion coefficient estimate is only reliable when the data satisfy the assumptions of the model.In this regard, we generated one-dimensional synthetic trajectories imitating fractional Brownian motion (FBM) 43 and diffusion in inhomogeneous media 44 , two processes which deviate from regular diffusion distinctly, and tested whether the theory was able to detect their non-diffusive behavior.This was realized by analyzing three well-defined quantities: the distribution of quality factors, the distribution of singletrajectory σ 2 -estimates, and the distribution of statistics resulting from a two-sample Kolmogorov-Smirnov (KS) test, which estimates how likely it is for early parts of a trajectory to be governed by the same diffusion dynamics as its later parts.Said distributions were extracted from the non-diffusive data and compared visually to their diffusive counterparts.
FBM is a Gaussian process with stationary increments, similar to the Wiener process in Eq. ( 2).However, the increments of FBM need not be independent, and therefore give rise to a MSD that grows proportional to t 2H with H ∈ (0, 1) being the so-called Hurst index.For H = 1/2 we retrieve the Wiener process, whereas the cases H < 1/2 and H > 1/2 result in sub-and superdiffusion, respectively, due to negative and positive correlations.We generated realizations of FBM via the circulant method 45 , and added static and dynamic noise by replacing Y (t) in Eqs. ( 2)-(4) (or their discretized counterparts, see Appendix D 1) with said realizations.The resulting trajectories were analyzed both individually and globally using Eqs.( 15), ( 16), ( 18) and (19), where the solution that resulted in the lowest negative log-likelihood value was chosen in each case.We used the global estimates for a 2 and σ 2 , on the one hand, to calculate the corresponding Ϙ m -values [Eq.(22)] and, on the other hand, as input parameters for simulations of regular diffusion, i.e.Eqs. ( 2)-( 4), which we considered as reference data.For further details on the Brownian dynamics simulations, see Appendix D 1.The above-mentioned KS test was achieved by splitting Σ −1/2 m ∆ m in half (dropping the last entry if the number of elements in ∆ m was uneven), and forming two empirical distribution functions that were then compared.Note that d = 1, so we have dropped the index n in ∆ m,n .The covariance matrices Σ m were evaluated using the global parameter estimates.The above procedures gave rise to the samples {σ 2 m }, {Ϙ m } and {S m } for m = 1, 2, . . ., M of single-trajectory σ 2estimates, quality factors and KS statistics, respectively, whose distributions are plotted in Fig. 3.Here and in the following, reference simulations were sampled extensively such that the statistical scatter in the reference histograms is negligible.
Overall, the data sparsity and strong static noise, which in most cases lead to a low effective signal-to-noise ratio σ 2 /a 2 , made the non-diffusive nature of FBM somewhat difficult to detect.For example, in the case of H = 1/4, the only significant discrepancy between the data and reference was found in the distribution of singletrajectory estimates of σ 2 .Yet, this turned out to be a clear indication for non-diffusive dynamics in light of the fact that the other two quantities did not deviate from their diffusive counterparts.For H = 3/4, where discrepancies were found in all considered quantities, the situation was more definitive.A possible reason for the vastly different outcomes for H = 1/4 and H = 3/4 is that the latter case results in a slower decay of correlations between increments than H = 1/4.The two cases are therefore by no means symmetric, as one might naively assume.
The second non-diffusive process we considered allows for a position-dependent diffusion (PDD) profile D(z) = σ(z) 2 /2∆t.One thereby has to replace Eq. ( 2) with the following stochastic differential equation, which is to be interpreted in the sense of Itô.Here, f (z) denotes the derivative of f (z) with respect to z.The resulting process is generally not Gaussian, due to the presence of multiplicative noise.For our simulations, we chose σ(z) = σ 0 [1 − 0.9e −(z/z0) 2 /2 ] to mimic the diffusion around a diffusivity well, and the initial positions Y 0 were drawn from a uniform distribution on the interval [−5z 0 , 5z 0 ].Analogous to the FBM data, the lengths of the trajectories were uniformly distributed on [4, 101] and a total of 1000 non-diffusive trajectories were considered.
Our results for a 2 = σ 2 0 = 1 and z 0 = 3 are given in Fig. 4(a).Again, we compared our results to refer- , where σ 2 is the variance of the normal random variable used as input for the circulant method, we generated 1000 trajectories with lengths uniformly distributed on [4,101] governed by subdiffusive (left panels), diffusive (center panels) and superdiffusive (right panels) dynamics, and compared them to reference data made up of 10 5 diffusive trajectories.We considered the (a) distributions of single-trajectory σ 2 -estimates, distributions of quality factors calculated for the FBM data (insets), and (b) distributions of KS statistics, which test for varying diffusivity within the trajectories.The gray histograms are associated with the FBM data, whereas the colored histograms belong to the reference data.The arrows at the bottom of the plots in (a) indicate the global σ 2 -estimates.For the superdiffusive data, we saw clear deviations in all three distributions, thus confirming that the process under scrutiny was non-diffusive.However, for H = 1/4 the results were more subtle: we only detected clearly noticeable deviations in the distribution of σ 2 , which, in combination with the fact that the other two distributions fit the reference data, hinted at non-diffusive behavior.
ence data simulated using the global MLE estimates of a 2 and σ 2 obtained from an analysis of the PDD data.The fact that the quality factor distribution is anything but uniform clearly indicates that the process cannot be described by a single diffusion coefficient.This is also mirrored in the lack of overlap between the PDD data and the reference seen in the distribution of single-trajectory σ 2 -estimates.Only the KS statistics seemed consistent between the two datasets.We also fitted the non-diffusive data with mixture models and found a decent match (κ ≈ 1.40) for K = 3 and the parameters a 2 k=1,2,3 ≈ 0.97, 1.03, 1.03, σ 2 k=1,2,3 ≈ 0.29, 0.86, 0.03, and P k=1,2,3 ≈ 0.17, 0.71, 0.12.As a test for consistency and to see if diffusion in inhomogeneous media can be distinguished from heterogeneous mixtures using samples of short trajectories, we then used these parameters to generate new synthetic data.These data, drawn from the mixture model best describing the PDD, are compared to the PDD data in Fig. 4(b).Overall, the data from the mixture model and PDD process are fairly similar, thus making it impossible to distinguish between the processes, at least for the PDD profile considered here.
In conclusion, it can be challenging to distinguish between diffusive and non-diffusive processes when the samples are made up of extremely short trajectories.Also the presence of static and dynamic noise muddies the waters, especially when the (effective) signal-to-noise ratio is poor.We recommend that practitioners investigate the three above-mentioned distributions, compare their results to Brownian dynamics simulations, and even consider further tests, such as the goodness-of-fit test proposed in Ref. 14.

B. Experimental data
We also applied the above theory to single-molecule tracking data for human MET receptor tyrosine kinase, bound to labeled 3H3-Fab antibody fragments within the plasma membrane of HeLa cells 30 .In the original study, tracking in two dimensions was conducted for multiple cells using the super-resolution imaging protocol uPAINT 46 and imaging fluorescence correlation spectroscopy 47 .Next to "resting" MET, realized with the above-mentioned Fab antibodies, the authors of Ref. 30 also considered internalin B-bound MET.Here, we limit our analysis to the Fab-data recorded via uPAINT for 10 randomly chosen cells.To account for the fact that each cell might be influenced by its local environment, we refrained from pooling together data measured in different cells.The temporal resolution of the experiment was ∆t = 0.02 s, and we assumed a blurring coefficient of B = 1/6 for all trajectories.
Each trajectory of a given cell was first analyzed separately, analogous to the simulation data in Sec.III A. This gave rise to M diffusion coefficients per cell (M = 1280, 1943, 1567, 1709, 1840, 874, 1165, 1582, 1178, 1357 for cells 1 through 10), i.e., one for each trajectory, and allowed us to sieve out the trajectories of essentially immobile particles, which we defined as those having D < 1 × 10 −6 μm 2 s −1 .This is in line with the original analysis of the data 30 but, alternatively, one can explicitly account for the immobile fraction in the modeling 48 .We then analyzed globally the remaining trajectories to estimate a 2 and σ 2 for each cell.The normalized histograms of single-trajectory diffusion coefficients for two distinct cells are depicted in Fig. 5, next to Brownian dynamics simulation results for a single-population model.The simulations were conducted using discretized versions of Eqs. ( 2)-( 4) (see Appendix D 1), where a 2 and σ 2 were set equal to the global cell parameter estimates, and the length of each trajectory was drawn from the observed length distribution of the respective cell.Figure 5 clearly illustrates that the experimentally determined distributions are too broad to be explained by a single set of diffusion parameters.To verify this conclusion, we computed the corresponding quality factors [Eq.( 22)], one for each considered trajectory, and inspected visually, as well as with the help of the Kuiper statistic [Eq.( 23)], whether they were uniformly distributed on the interval [0, 1).Unsurprisingly, this was not the case, as seen in the insets of Fig. 5.We therefore proceeded to fit the data with a mixture of K = 1, 2, . . ., 15 subpopulations via the EM algorithm of Sec.II D, starting from multiple different random initial values for {a 2 k , σ 2 k } k=1,...,K , and uniform mixing fractions P k = 1/K ∀k.This was done to increase the chance of finding the global minimum of Eq. (28).Further details on our implementation of the EM algorithm can be found in Appendix D 2.
Although our results for the live-cell data were not entirely unambiguous, the general trend of κ decreasing with increasing K was still observed [see Fig. 6(a)].Depending on the choice of threshold, we either ended up with six or three cells, for which the null hypothesis that their single-trajectory diffusion coefficient distributions originated from a finite mixture of fast and slowly diffusing particles could not be rejected.Furthermore, the three cells that crossed the more stringent threshold of κ ≈ 1.42 all gave κ-values with associated p-values [Eq.(24)] well above 0.5.For cells 6 and 9, the same optimal K-values were predicted for both thresholds, whereas the optimum shifted somewhat upwards for cell 8.
The results of the lower threshold suggest that the trajectories of cells 6, 8 and 9 should be sorted into 11 to 13 distinct subpopulations [see Figs.6(b-d)], whose relevant parameters are tabulated in Table I.However, it should be noted that some of the populations can be lumped together if we solely focus on the diffusion coefficient.Also, the last two subpopulations of cell 8 are extremely underrepresented and can, in principle, be neglected.While this may not be apparent from the corresponding P k -values, it becomes clear when the classification coefficients T k,m are inspected: It turns out that for the parameters presented here only 9 and 6 trajectories out of a total of 1349 mobile ones get assigned to the 12th and 13th subpopulations, respectively.This might be an indication of an overly stringent threshold, and that cell 8 is better described with K = 11 subpopulations.In conclusion, the effective number of subpopulations within each cell might therefore be somewhat smaller than the output of the EM algorithm seems to imply, but certainly not two or three.
For completeness, in Fig. 7 we plot the KS distributions for cells 6, 8 and 9, as well as for the multi-population simulation data of Fig. 6.In comparison to single-population reference data, we can see that both the cell data and multi-population simulation data are shifted and have a slightly broader tail than the reference.The qualitative agreement between the cell data and the simulation data strengthens our working hypothesis that the measured trajectories originate from a heterogeneous mixture of fast and slowly diffusing particles.However, in light of our results in Sec.III A, we cannot exclude the possibility that the experimental data originate from an underlying intricate PDD profile.23)] for each cell as a function of the number K of subpopulations in the mixture.The black dashed lines mark the thresholds κ ≈ 1.42 and κ ≈ 1.75, below which there is a > 25% and > 5% chance, respectively, to obtain quality factor results at least as extreme as actually observed.Three cells clearly cross the lower threshold, and a fourth one narrowly misses it.(b-d) Comparison between experimental single-trajectory diffusion coefficient distributions (grey histograms) and results from Brownian dynamics simulations.The 10 5 synthetic diffusion coefficients making up each of the colored histograms were obtained in almost the same way as in Fig. 5, except that different parameters a 2 k and σ 2 k were used for each mixture component, and the mixing fractions P k determined their frequency of occurrence.The arrows indicate the diffusion coefficients behind the subpopulations and their relative opacities the respective mixing fractions, both of which are tabulated in Table I.Insets: Distributions of quality factors, after sorting the trajectories according to the subpopulations they most likely belong to.
Note that there appears to be a weak negative correlation between the average trajectory length and the diffusion coefficient associated with each subpopulation.This means that the trajectories of fast diffusing populations are generally shorter than the trajectories of slow diffusing populations.We can rule out effects due to the trajectory-length bias of the MLEs (see Fig. 1), because said bias is virtually non-existent in global estimates of trajectories of length N + 1 ≥ 5 and the experimental datasets contain exclusively trajectories of length 8 or greater.The correlation is possibly explained by the fact that tracking algorithms have difficulties stitching together trajectories of fast diffusing particles with large displacements between frames, and are therefore more likely to split them up into shorter trajectories.
Finally, we want to demonstrate how the information encoded in the classification coefficients T k,m can be em-ployed, such as specifically picking out trajectories belonging to a certain subpopulation for further analysis.Figure 8(a-c) visualizes the trajectories of mobile particles in cells 6, 8 and 9.We indicate increasing receptor diffusivity with a color code from blue to red.The resulting map of confined blue and spread-out red trajectories is reminiscent of the diffusivity landscapes mapped out by Gaussian propagators in Refs.20 and 23.Motivated by this observation, we applied the TRamWAy open-source software platform for analyzing single biomolecule dynamics 23 to the tracking data of our best-behaving cells.We thereby relied on the diffusion-only (standard.d)inference mode with the hyperparameter of the diffusivity prior (diffusivity prior) and the minimum number of assigned (trans-)locations (min location count) both set to 1, and a nearest neighbor assignment by count (from nearest neighbors) of 10.The localization preci-Table I. Fit parameters for specific cells obtained via the EM algorithm.The number of trajectories M , mean trajectory length N + 1 and standard deviation (SD) of the trajectory lengths are listed for each cell.The units of the diffusion coefficients D k and static noise amplitudes a 2 k are μm 2 s −1 and μm 2 , respectively.(2.0 ± 0.      I).We considered 10 5 trajectories with lengths distributed in the same way as in the experiments.
diffusion 49 , we leave a more definitive analysis of the spatial patterning of cell-surface receptor diffusion for future studies.27)], we assign each trajectory to an appropriate subpopulation.Depending on their diffusion coefficients (see also Table I), the subpopulations have been color-coded from blue (slow) to red (fast).As in Fig. 6, we focus on the three cells whose diffusion coefficient distributions are well described with the mixture model, namely (a) cell 6, (b) cell 8, and (c) cell 9. (d-f) Diffusivity landscapes inferred from mobile and immobile particles using the software platform TRamWAy.The localization uncertainty was estimated as a ≈ 0.05 μm from the values in Table I, and the results seemed independent of the value chosen for the regularization factor.Note that the resolution of the diffusivity landscape is lower than the trajectory representation in (a-c), which also results in the more moderate diffusion coefficient values.

IV. OUTLOOK: TOWARDS BAYESIAN INFERENCE
We want to point out that both the single-population log-likelihood [Eq.( 14)] and the mixture log-likelihood [Eq.(28)] can be paired with priors Π(a 2 , σ 2 ) to construct posterior distributions via Bayes' theorem.This not only affects the numerical values of the parameter estimates, but also their interpretation, because in a Bayesian setting the model parameters are assumed to be randomly distributed.A Bayesian prior can, e.g., be used to allow for variations in the localization uncertainty a 2 for each recorded particle individually.
The priors can either be chosen as (weakly) informative, where definite information about a parameter is taken into consideration, or uninformative.We expect informative prior information (if any) to exist for a 2 , which can often be estimated empirically, e.g., via auxiliary experiments involving immobilized labeled particles.Its estimated mean a 2 emp and uncertainty δa 2 emp can be incorporated into an informative prior, such as , which then enters the joint prior Π(a 2 , σ 2 ) as follows, The remaining prior Π(σ 2 | a 2 ) should be chosen as uninformative if no substantial information on σ 2 is available.The Jeffreys prior 50 is a classic uninformative prior, which is proportional to the square root of (the determinant of) the Fisher information and therefore invariant under coordinate transformations.Using the coordinates a 2 and φ, the Jeffreys prior for a fixed a 2 corresponds to the square root of the I 2,2 -element of the matrix I(a 2 , φ) and is, in fact, independent of a 2 .Alternatively, one can assume a uniform prior for φ, i.e.
If no information is available on any of the two parameters, the priors can either be chosen to differ between parameters or be of the same type.The two-parameter Jeffreys prior is given by Π(a 2 , φ) = det I(a 2 , φ) , where an analytic expression for the Fisher information matrix I(a 2 , φ) can be found in Appendix C. While it is usually not recommended to use Jeffreys prior on anything other than single-parameter models, it turns out that for the present problem the preferred alternative, the so-called reference prior 51,52 , coincides with the bivariate Jeffreys prior.This might be due to the fact that the Jeffreys prior becomes separable when treated in the (a 2 , φ)-coordinates, or because both parameters are scale parameters.
Bayesian estimates are, in comparison to MLEs, often more sharply distributed and thus have smaller uncertainties.However, poorly chosen priors introduce biases.For uninformative priors, the bias vanishes as the sample size goes to infinity, but different priors result in different biases.Also, the specific choice of the estimator affects the size of the bias and the way it scales with the sample size.We therefore recommend that practitioners gauge which uninformative priors work best with their estimator of choice.

V. CONCLUSIONS
In this paper, we have derived a numerically efficient maximum likelihood estimator to extract diffusion coefficients from single-molecule tracking experiments subject to static and dynamic noise.To estimate diffusion coefficients from molecular dynamics simulations, one sets the blur coefficient to B = 0.The estimator is based on the statistics of trajectory increments in real space and therefore complements the theory presented in Refs.9 and 15, where an estimator operating in discrete sine space was developed.Fourier representations are computationally efficient and naturally lead, e.g., to a power spectral analysis of the diffusion process 14 .Our approach has the benefit of delivering closed-form expressions for some quantities, such as the accompanying Fisher information matrix.Compared to other diffusion coefficient estimators, the maximum likelihood estimator has several advantages, e.g., it allows for the inclusion of prior knowledge in a Bayesian manner, makes the error analysis for trajectories of different lengths straightforward, and leads naturally to a quality factor analysis, which helps us to assess whether the data satisfy the underlying assumptions of the diffusion model or not.
In case of heterogeneous data, the theory can be extended to a mixture model, where the overall numerical efficiency is retained for each subpopulation.Minimizing the corresponding negative log-likelihood is then achieved with the help of the expectation-maximization algorithm.To demonstrate its applicability and effectiveness, we first considered sparse quantities of non-diffusive synthetic data to test whether the theory was able to detect any deviations from regular diffusion.Our results highlight the difficulty to distinguish between diffusive and nondiffusive processes, especially when the trajectories are overwhelmingly short and corrupted by static and dynamic noise.We then used the framework to analyze single-molecule tracking data for "resting" MET receptor tyrosine kinase, recorded in ten different cells 30 .As the fit gradually got better with increasing number of subpopulations K, we relied on the distribution of quality factors, which are a measure of how well a dataset conforms to the assumptions of the underlying diffusion model, to determine the optimal value of K.Although our results varied significantly between the cells, we found solutions for four of the ten cells, where the null hypothesis that the tracking data originates from a mixture model could not be rejected.All these fits involved fairly large K-values, so we ruled out models containing only a few subpopulations.However, the effective number of subpopulations can often be reduced, e.g., by neglecting sparsely represented subpopulations or by lumping together subpopulations with similar diffusion coefficients.Overall qualitative agreement between the experimental data and simulations suggests that the broad distributions of diffusion coefficients observed in the experiment are a result of heterogeneous mixtures, although position-dependent diffusion as an underlying process cannot be ruled out.In fact, for short trajectories that probe only local regions, position dependence and heterogeneity in diffusion are intertwined.
We believe that our results provide practitioners of single-molecule tracking techniques with valuable tools to analyze their data.To facilitate their use, we have implemented the theoretical formulations in a Julia data analysis package 31 .The general formulation and efficient evaluation of the likelihood functions should also pave the way for more elaborate analysis methods in the future, e.g., involving Bayesian statistics.

SUPPLEMENTARY MATERIAL
A supplementary file contains detailed information on the trajectory-length distribution for the ten cells analyzed, the model parameters and their uncertainties from fits with different numbers K of subpopulations, and the results of the quality factor analysis in terms of the Kuiper statistic κ and the associated p-value.

DATA AVAILABILITY
The data that support the findings of this study are available from the corresponding author upon reasonable request.Requests for the tracking data should be directed to Prof. Dr. Mike Heilemann, corresponding author of the original publication.
Considering first the scalar case, we have Σ = θ Σ θ with Σ a 2 = Σ and Σ σ 2 = Σ , and therefore Ensemble averages only affect terms dependent on ∆, which are in this case all of quadratic form and can therefore be computed using ∆ i ∆ j = Σ i,j = θ (Σ θ ) i,j .When evaluating the scalar Fisher information, we encounter a sum of terms of the form δ i,j = θN , which simply reduce I(θ) to and thus give rise to the following lower bound for the standard error δθ = var(θ), The general case makes use of the Hessian 2 × 2 matrix, whose components are given by according to the identity dA −1 /dθ = −A −1 (dA/dθ)A −1 .The first diagonal element of the Fisher matrix is calculated in an analogous way to the scalar Fisher information above, giving For the summands of the non-diagonal elements, we find Σ−1 i,j Σ j,i = a 2 tr( Σ−1 Σ ) .The last diagonal element of the Fisher matrix can be computed with the help of In the general case, the standard errors emerge from the diagonal elements of the inverse Fisher information matrix I −1 (a 2 , σ 2 ).This inverse is related to I −1 (a 2 , φ), which is simply given by .
A coordinate transformation then gives us with an inverse Jacobian This gives rise to Eqs. (20) of the main text.

Figure 3 .
Figure 3. Testing for diffusivity in FBM subject to static and dynamic noise.Using B = 1/6 and a 2 = σ 2 = 1, where σ 2 is the variance of the normal random variable used as input for the circulant method, we generated 1000 trajectories with lengths uniformly distributed on[4,101] governed by subdiffusive (left panels), diffusive (center panels) and superdiffusive (right panels) dynamics, and compared them to reference data made up of 10 5 diffusive trajectories.We considered the (a) distributions of single-trajectory σ 2 -estimates, distributions of quality factors calculated for the FBM data (insets), and (b) distributions of KS statistics, which test for varying diffusivity within the trajectories.The gray histograms are associated with the FBM data, whereas the colored histograms belong to the reference data.The arrows at the bottom of the plots in (a) indicate the global σ 2 -estimates.For the superdiffusive data, we saw clear deviations in all three distributions, thus confirming that the process under scrutiny was non-diffusive.However, for H = 1/4 the results were more subtle: we only detected clearly noticeable deviations in the distribution of σ 2 , which, in combination with the fact that the other two distributions fit the reference data, hinted at non-diffusive behavior.
statistic S m probability density p(S m ) position-dependent diffusion mixture model, K = 3

Figure 4 .
Figure 4. Comparing position-dependent diffusion to mixture models.(a) Analogous to Fig.3, we compared PDD data (gray histograms) to diffusive reference data (blue histograms) in terms of their distributions of single-trajectory σ 2 -estimates (left panel), quality factors (inset) and KS statistics (right panel).The PDD data was generated via a stochastic process X satisfying Eqs.(36), (3) and (4), resulting in 1000 non-diffusive trajectories with lengths uniformly distributed on[4,101].The reference data consisted of 10 5 trajectories following the same length distribution as the non-diffusive data, and were generated using B = 1/6, and a 2 and σ 2 extracted from a global analysis of the PDD data (arrow indicates global σ 2 -estimate).The non-uniform quality factor distribution and strong deviations between the σ 2 -distributions clearly indicate a non-diffusive behavior in the PDD data.(b) After analyzing the non-diffusive data via mixture models, we used the parameters for the minimal model best describing the data in (a) (arrows indicate the σ 2 -estimates, and their relative opacities the respective mixing fractions) to generated the 10 5 trajectories behind the red histograms.Insets: Quality factor distributions for the PDD data.The associated Ϙ-values [Eq.(22)] were calculated using (a) the parameters of a single-population fit, and (b) the fit parameters of a mixture model with K = 3.

Figure 5 .
Figure 5. Assessment of model with a single diffusion coefficient against single-molecule localization microscopy experiments.Shown are the distributions of single-trajectory diffusion coefficients estimated from experiment (gray histograms) and from synthetic trajectories generated via a single-population model (colored histograms).(a) The global fit parameters a 2 = (3.51± 0.03) × 10 −3 μm 2 and D = (1.21± 0.01) × 10 −1 μm 2 s −1 (red arrow) were used to simulate 10 5 independent trajectories following the same length distribution as the experimental data for cell 2. The distribution of single-trajectory diffusion coefficients (red histogram) turned out to be too narrow to explain the experimental observations.(b) Same as in (a), except that for cell 5 the parameters underlying the brown histogram are a 2 = (3.09± 0.03) × 10 −3 μm 2 and D = (1.062± 0.009) × 10 −1 μm 2 (brown arrow).Insets:Distributions of quality factors corresponding to the experimentally observed trajectories, evaluated using the respective global single-population fit parameters.Their horned shapes strongly deviate from the expected uniform distribution on [0, 1) (black dashed lines), which highlights the fact that the experimental data cannot be properly described by a singlepopulation model.

Figure 6 .
Figure 6.Multi-mixture models compared to experimental (a) The Kuiper statistic [Eq.(23)] for each cell as a function of the number K of subpopulations in the mixture.The black dashed lines mark the thresholds κ ≈ 1.42 and κ ≈ 1.75, below which there is a > 25% and > 5% chance, respectively, to obtain quality factor results at least as extreme as actually observed.Three cells clearly cross the lower threshold, and a fourth one narrowly misses it.(b-d) Comparison between experimental single-trajectory diffusion coefficient distributions (grey histograms) and results from Brownian dynamics simulations.The 10 5 synthetic diffusion coefficients making up each of the colored histograms were obtained in almost the same way as in Fig.5, except that different parameters a 2 k and σ 2 k were used for each mixture component, and the mixing fractions P k determined their frequency of occurrence.The arrows indicate the diffusion coefficients behind the subpopulations and their relative opacities the respective mixing fractions, both of which are tabulated in TableI.Insets: Distributions of quality factors, after sorting the trajectories according to the subpopulations they most likely belong to.

Fig. 8
also in the context of corralled

Figure 7 .
Figure 7. Distributions of Kolmogorov-Smirnov statistics for the cells analyzed in Fig.6.Using global single-population estimates for a 2 and σ 2 , we estimated how likely it is for early parts of a trajectory to be governed by the same diffusion dynamics as its later parts via two-sample KS tests.The results for (a) cell 6, (b) cell 8 and (c) cell 9 (gray histograms) are consistent and show shifted weights toward smaller Sm-values and broader tails than the corresponding reference distributions (colored histograms).Insets: Analogous plots for Brownian dynamics simulation data generated using the mixture model parameters behind Figs.6(b-d) (see also TableI).We considered 10 5 trajectories with lengths distributed in the same way as in the experiments.

2 s − 1 ]Figure 8 .
Figure 8. Spatial distribution of fast and slowly diffusing MET receptor tyrosine kinases.Using the classification coefficients [Eq.(27)], we assign each trajectory to an appropriate subpopulation.Depending on their diffusion coefficients (see also TableI), the subpopulations have been color-coded from blue (slow) to red (fast).As in Fig.6, we focus on the three cells whose diffusion coefficient distributions are well described with the mixture model, namely (a) cell 6, (b) cell 8, and (c) cell 9. (d-f) Diffusivity landscapes inferred from mobile and immobile particles using the software platform TRamWAy.The localization uncertainty was estimated as a ≈ 0.05 μm from the values in TableI, and the results seemed independent of the value chosen for the regularization factor.Note that the resolution of the diffusivity landscape is lower than the trajectory representation in (a-c), which also results in the more moderate diffusion coefficient values.
With the help of Σ = d Σ/dφ and Jacobi's formula, which states that d|A| dθ = |A| tr A −1 dA dθ must hold for an invertible matrix A, we can rewrite the trace as follows, tr( Σ−1 Σ ) = tr Σ−1 d