A combinatorial view of stochastic processes: White noise

White noise is a fundamental and fairly well understood stochastic process that forms the conceptual basis for many other processes, as well as for the modeling of time series. Here we offer a fresh perspective on white noise that, grounded in combinatorial considerations, contributes new insights for both modelling and theoretical purposes. To this end, we adopt the ordinal pattern analysis approach, which allows us to abstract a time series as a sequence of patterns and their associated permutations, and introduce a simple functional over permutations that partitions them into classes encoding their level of asymmetry. We compute the exact probability mass function (p.m.f.) of this functional over the symmetric group of degree $n$, thus providing the description for the case of an infinite white noise realization. This p.m.f. can be conveniently approximated by a continuous probability density from an exponential family, the Gaussian, hence providing natural sufficient statistics that render a convenient and simple statistical analysis through ordinal patterns. We exemplify such an analysis on experimental data for the spatial increments from tracks of gold nanoparticles undergoing 3D diffusion.


INTRODUCTION
The incorporation of stochastic ingredients in models describing phenomena across all disciplines is now standard scientific practice. White noise is one of the most important such ingredients. Although tools for identifying white and other types of noise exist [1,2], there is a permanent demand for reliable and robust statistical methods for analyzing data, both to distinguish noise and filter it from signals in experiments, and, in hypothesis testing, to assess the plausibility that the outcome of an experiment is the result of randomness rather than a significant, controllable effect. Due to its ubiquity in experiments and its mathematical simplicity, white noise is very often the most convenient stochastic component for adding realism to a dynamic model, commonly regarded as the noise polluting the observations. It can be continuous or discrete both in time and in distribution, so it applies to many scenarios. It is a stationary, independent and identically distributed process, all relatively simple properties for a stochastic process. Here we present a combinatorial perspective for studying white noise, inspired by the concept of ordinal patterns. An ordinal pattern of length n is the diagrammatic representation of the inequality fulfilled by a subsequence of n points x_1, . . . , x_n in a time series {x_t}_{t∈I}. We discuss ordinal patterns in detail in Sec. II. This concept was introduced in 2002 by Bandt and Pompe [3] to build a measure of complexity for time series named Permutation Entropy (PE). PE has proven its value not only in applications, where it has been used to analyze time series from a great variety of phenomena [4,5], but also in theory, since it is equivalent to the Kolmogorov-Sinai entropy for a large class of piecewise continuous maps of the interval [6,7].
The procedure for computing PE consists of first choosing the size n of the window that will be used in the analysis, corresponding to the embedding dimension in a Takens embedding [2]. We then slide the window through the series (equivalent to constructing the lagged or delay vectors [2]) to find the frequencies π_j, j = 1, . . . , n!, with which the ordinal patterns occur, and compute the associated Shannon entropy H_π = −Σ_j π_j log π_j. For white noise, in the long time series limit all patterns are equally likely, so their frequencies follow a discrete uniform probability mass function (p.m.f.) with support on the integers 1, . . . , n!, which is equivalent to the uniform probability measure over the symmetric group of degree n, S_n. Despite its relevance and wide range of applications, there are few rigorous studies on the properties of PE for use in statistical inference. To the best of the author's knowledge, this is addressed only in works such as [8], where the authors investigate the expected value and variance of PE for finite time series of white noise; the same authors later address the effect of ordinal pattern selection on the variance of PE [9]. In the customary PE approach every permutation is, in a sense, a class of its own, since the count of every single permutation matters. As a consequence, the empirical distributions obtained from finite-length observations are very sensitive to relatively minor changes in the proportions of each observed pattern [8]. This lack of robustness is a liability when trying to distinguish noise from structure.
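As a concrete illustration of the procedure just described, the PE computation can be sketched in a few lines of Python (the function name and interface are ours, chosen for this sketch, not from the paper):

```python
import math
from collections import Counter

def permutation_entropy(series, n=3, lag=1):
    """Shannon entropy (natural log) of the ordinal-pattern frequencies."""
    counts = Counter()
    for k in range(len(series) - (n - 1) * lag):
        window = [series[k + i * lag] for i in range(n)]
        # The ordinal pattern: window positions sorted by amplitude.
        counts[tuple(sorted(range(n), key=lambda i: window[i]))] += 1
    total = sum(counts.values())
    return -sum(c / total * math.log(c / total) for c in counts.values())

# A monotone series displays a single pattern, so its entropy is zero;
# for white noise the entropy approaches the maximum log(n!) as L grows.
print(permutation_entropy(list(range(100)), n=3))
```

For a long white noise realization all n! patterns occur with frequency close to 1/n!, so the value returned approaches log(n!) (log 6 ≈ 1.79 for n = 3).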
Another problem is the factorial growth of the number of classes, which enlarges the discrete support correspondingly and makes the analysis both impractical and meaningless for values of the embedding dimension beyond low or moderate n (∼ 10), since the length required for a time series to have a chance of displaying roughly one representative member of each class is of the same order as n! (already around 3 million observations for n = 10).
In this work, we address these limitations by introducing a new statistic for permutations in Sec. III. This statistic is a functional over the symmetric group S_n that can be interpreted as a measure of asymmetry for ordinal patterns. The functional divides the symmetric group into classes corresponding to a coarse notion of levels of overall increasing or decreasing behaviour of a pattern. In turn, this transforms the original discrete uniform probability measure of the patterns over S_n into a new probability measure that concentrates around its expected value, as we show in Sec. IV. This has practical and conceptual consequences, such as the ability to perform a suitably modified version of ordinal pattern analysis for very large embedding dimensions, since the number of classes of patterns to be tracked is reduced from O(n!) to merely O(n²). The probability mass functions corresponding to our functional can be approximated by a Gaussian distribution, which belongs to an exponential family. This guarantees the existence of natural sufficient statistics for the estimation of our statistic, as we explain in Sec. IV. We open Sec. V by illustrating our framework on white noise from different source distributions, and by discussing its potential for distinguishing the deterministic signature of a chaotic orbit of a discrete map through inspection of the empirical p.m.f. obtained from our modified pattern analysis. We then take experimental 3D tracks of gold nanoparticles and design a test for the statistical independence of the spatial increments along each coordinate in a plane of observation. We finish by discussing our results and making additional remarks on the advantages and drawbacks of the presented framework.

II. ORDINAL PATTERNS AND THEIR SYMMETRIES
A permutation τ is a bijection τ : S → S. If we take the set S to be the sequence of integers S = {1, 2, 3, . . . , n}, then τ maps this sequence into itself, τ(i) = τ_i = k for i, k = 1, 2, 3, . . . , n. Due to its bijective nature, we also call the arrangement τ = τ_1 τ_2 · · · τ_n a permutation of length n, or alternatively a word produced by τ on n symbols. The set S_n = {τ : τ(i) = k, i, k = 1, 2, 3, . . . , n}, together with the operation of functional composition, is the symmetric group on n symbols, of cardinality |S_n| = n! [10]. We denote the set of symbols by [n] := {1, 2, . . . , n}, and an interval of symbols by [i, j] ⊆ [n]. From a simplified perspective, the elements τ ∈ S_n can all be regarded as equivalent a priori. The corresponding discrete measure over S_n is then U(1, n!), assigning each permutation a weight of 1/n!. Under this measure, permutations can be enumerated in lexicographic order, which ranks the words according to their size as integers. The lexicographic order in S_3 is shown in Table I following the index i, along with other statistics for permutations, such as the inversion number, the major index and the runs (maximal ascending segments) [10, 11]. In the last column, we include the corresponding values of the functional introduced in Sec. III.

A. Reflections of diagrams
It is instructive to explore the symmetries of the ordinal patterns, since they are reflected in the statistical properties of the permutations, as we explain in Sec. III when inspecting the properties of the distribution of the statistic introduced there.

TABLE III. Permutations τ, the functional α(τ) (defined in Sec. III), and the associated ordinal patterns of length n = 4. Reflecting the patterns in the first and second columns across a horizontal axis yields the diagrams in the penultimate and last columns, respectively, whose values of the characteristic have the same absolute value but negative sign. This is a basic property of α(τ) that is also reflected statistically.
Consider the vertical reflection v, which maps each point (i, τ_i) of a diagram to (n + 1 − i, τ_i), i.e. it leaves the second coordinate invariant. The action of v over [n] is explicit,

v(1) = n, v(2) = n − 1, . . . , v(n − 1) = 2, v(n) = 1, (2)

meaning that v simply mirrors the symbols across a vertical axis of reflection to the right of the permutation, as illustrated in Fig. 1. For even permutation length n, the effect of v(τ) as a reflection with respect to an external vertical axis is the same as an internal reflection of the symbols of the word, v(τ_1 τ_2 · · · τ_{n−1} τ_n) = τ_n τ_{n−1} · · · τ_2 τ_1, along an internal axis splitting the permutation into equal parts of n/2 symbols. For odd n, the permutation is split into equal parts of m ≡ ⌊n/2⌋ symbols, but the symbol τ_{m+1} is left invariant since it becomes the internal axis of reflection itself. As a simple illustration, let us take a diagram of length n = 6, as shown schematically in Fig. 1(a). Ranking the points from smallest to largest gives the ranking permutation τ = τ_1 τ_2 · · · τ_n, corresponding to the permuted indices of the vector of ranks r(x_{t_1}, x_{t_2}, . . . , x_{t_n}) = (x_{τ_1}, x_{τ_2}, . . . , x_{τ_n}) relative to the original positions t_1, t_2, . . . , t_n. Hence, in the example of Fig. 1(a), v(⟨1, 2, 5 : 4, 6, 3⟩) = ⟨3, 6, 4 : 5, 2, 1⟩, where we write a colon instead of a comma in the middle only to highlight the internal axis of symmetry of the diagram.

III. A FUNCTIONAL OVER PERMUTATIONS
All of the permutation statistics displayed in Table I reflect different symmetries in S_n. The sign of the permutation is arguably the most basic statistic, telling explicitly whether the number of pairwise flips of symbols needed to bring a permutation τ to the identity τ_id = 123 · · · n is even (sgn(τ) = +1) or odd (sgn(τ) = −1). The inversion number counts the pairs (i, j) with i < j for which τ(i) > τ(j). Summing the positions i at which a descent occurs, i.e. where τ(i) > τ(i + 1), yields the major index maj(τ) = Σ_{i : τ(i)>τ(i+1)} i.
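These three classical statistics can be computed directly from their definitions; a minimal sketch (function names ours) is:

```python
def inversions(perm):
    """Number of pairs (i, j), i < j, with perm[i] > perm[j]."""
    n = len(perm)
    return sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])

def sign(perm):
    """+1 for an even number of inversions, -1 for an odd number."""
    return 1 if inversions(perm) % 2 == 0 else -1

def major_index(perm):
    """Sum of the (1-based) descent positions i where perm[i] > perm[i+1]."""
    return sum(i + 1 for i in range(len(perm) - 1) if perm[i] > perm[i + 1])

print(inversions((3, 1, 2)), sign((3, 1, 2)), major_index((3, 1, 2)))  # → 2 1 1
```

The quadratic inversion count is fine for the small n used in ordinal pattern analysis; a merge-sort based count would bring it to O(n log n) if ever needed.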
Below we introduce a new statistic of permutations that accounts for an imbalance in weight when the symbols [n] composing each permutation τ ∈ S_n are considered as a collection of weights. The objective of defining and characterizing this new statistic is twofold. On the one hand, there is the intrinsic interest in new insights into the study of the symmetric group and their implications for other areas such as dynamical systems and stochastic processes. On the other hand, there is a practical interest in the analysis of time series, specifically in connection with ordinal pattern analysis.
The construction of the set of ordinal patterns from a realization of any process X_t consists of mapping portions of the time series {x_t}_{t∈I} of that process (with I = [L] = {1, 2, . . . , L} an index set) to the diagrams of Sec. II by sliding a window of size n over {x_t}_{t∈I}. We can inspect larger portions of the series without increasing n by skipping l − 1 points between consecutive points of each window; the number l is correspondingly known as the lag. For instance, a lag of l = 1 means that we do not skip any points and slide the window without gaps. The windows of n points are called delay or lagged vectors, and the process of constructing these vectors is known as an embedding [2]. Correspondingly, the natural number n indicating the window size is called the embedding dimension. This terminology comes from the embedding theorems that form the basis of state space reconstruction methods for scalar time series, known in general as time delay embeddings [2]. Hence, for the ordinal pattern analysis, we first embed the time series with a chosen dimension n and lag l. This yields a total of N = L − (n − 1)l lagged vectors. Then we rank the amplitudes {x_1, x_2, . . . , x_n} of every lagged vector according to their magnitude, obtaining the ranked vectors {x_{τ(1)}, x_{τ(2)}, . . . , x_{τ(n)}}. The sequences of indices of these ranked vectors are the ranking permutations τ = τ_1 τ_2 · · · τ_n, where τ_i = τ(i). Finally, these permutations have associated ordinal patterns, as we saw in Sec. II. In this way the local information on the relative ordering is preserved, and as a consequence so is the relevant information on the correlation structure of the series. However, a simple but careful inspection of this mapping makes it evident that the relative ordering is not the only information preserved.
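The embedding and ranking steps just described can be sketched as follows (a minimal illustration with our own function name; τ_i is the position of the i-th smallest amplitude in the window, consistent with the ranked vectors above):

```python
def ranking_permutations(series, n, lag=1):
    """Map each lagged vector of length n to its ranking permutation (1-based)."""
    perms = []
    N = len(series) - (n - 1) * lag  # number of lagged vectors
    for k in range(N):
        window = [series[k + i * lag] for i in range(n)]
        # tau[i] = position (1-based) in the window of the i-th smallest amplitude
        tau = tuple(i + 1 for i in sorted(range(n), key=lambda i: window[i]))
        perms.append(tau)
    return perms

print(ranking_permutations([0.3, 0.1, 0.7, 0.5], n=3, lag=1))
# → [(2, 1, 3), (1, 3, 2)]
```

For the first window (0.3, 0.1, 0.7) the smallest amplitude sits at position 2, then 1, then 3, giving τ = 213, and similarly for the second window.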
In fact, if we interpret the ranks as abstract weights assigned to the amplitudes of the process, then it is clear that information on the overall variation within these windows of n data points is also preserved. Questions about the weight accumulated in sub-intervals of each bin of size n arise naturally from this observation. An equally natural choice of sub-interval for analysis within the n-window is half the bin, so as to build a measure that estimates the tendency of the process towards locally increasing, decreasing or approximately constant behavior. This is in close analogy with the concept of derivative, but discarding the information on the specific amplitudes that the process takes. In order to study this asymmetry in the total weight concentrated on each half of a window of length n, we provide the following

Definition 1 (Functional α(τ)). For every permutation τ ∈ S_n, n a non-negative integer, define the functional

α(τ) = Σ_{i=n−m+1}^{n} τ(i) − Σ_{i=1}^{m} τ(i), (5)

where m ≡ ⌊n/2⌋.
The functional thus defined is invariant under a shift of the sequence [n] by an integer. It is also invariant under monotonic transformations of the process X_t, since the relative ordering is not affected. The case of odd n gives mixed statistical behavior, since the middle permutation symbol τ_{m+1} is ignored in the computation of α(τ). This implies that, for a given permutation length n, when the ignored symbol happens to be τ_{m+1} = n, there are (n−1)! permutations that display the same statistics as in the full problem for permutation length n − 1. By the invariance of α(τ) under a shift of {1, 2, . . . , n} by an integer, when the middle symbol is τ_{m+1} = 1 we have the same situation as before. Other ignored symbols produce different and more complicated effects. Therefore, we restrict ourselves here to the case of even n, which reduces Eq. (5) to

α(τ) = Σ_{i=m+1}^{n} τ(i) − Σ_{i=1}^{m} τ(i). (6)

Hence, α(τ) splits τ into equally sized intervals L = [τ(1), τ(m)] and R = [τ(m+1), τ(n)] with partial sums s_l = Σ_{i=1}^{m} τ(i) and s_r = Σ_{i=m+1}^{n} τ(i), respectively. We get α(τ) by summing the symbols in each half and subtracting: α = s_r − s_l. Notice that s_{l,min} = m(m+1)/2 and s_{l,max} = n(n+1)/2 − m(m+1)/2. This means that α_max ≡ α_M = m², as is the case, for instance, for the identity permutation: α(τ_id) = α_M. By symmetry, using the reflections in Eqs. (2) and (4), the minimum value of α is α(h(τ_id)) = α(v(τ_id)) = −α_M. Hence α(τ) is a bounded variable for finite n.
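For even n, Eq. (6) is a one-line computation; the following sketch (our own naming) also checks the bound α_M = m² and the shift invariance noted above:

```python
def alpha(tau):
    """alpha(tau) = sum of the right half minus sum of the left half (even n)."""
    n = len(tau)
    assert n % 2 == 0, "restricted to even permutation length"
    m = n // 2
    return sum(tau[m:]) - sum(tau[:m])

n = 6
m = n // 2
identity = tuple(range(1, n + 1))
print(alpha(identity), m * m)     # → 9 9  (identity attains alpha_M = m^2)
print(alpha(identity[::-1]))      # → -9  (reversal attains -alpha_M)
print(alpha(tuple(range(2, 8))))  # → 9   (shifting all symbols leaves alpha unchanged)
```

The shift invariance is immediate from the definition: adding a constant c to every symbol adds mc to both partial sums, leaving the difference s_r − s_l unchanged.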
The definition of α(τ) allows for an obvious source of degeneracy or multiplicity, since the order of the summands in each half s_l, s_r is irrelevant. Let us denote by φ(α) the number of permutations that share the same value of α(τ). For every distinct value of α(τ) there are (by shuffling the terms in each sum s_l, s_r) at least |L|! · |R|! = (m!)² ≡ η reorderings sharing that value, hence φ(α) ≥ η. For instance, for τ_id we get α(τ_id) = α_M as above, which has the least degeneracy, φ(α_M) = η. The next possible value of α(τ) can be obtained from τ_id = 123 · · · n by switching m, the largest symbol in L, with the smallest symbol in R, which is m + 1. This increases the left partial sum by one unit, s_l → s_l + 1, while decreasing the right partial sum by one unit, s_r → s_r − 1, with the net effect of decreasing α by two units, α(τ_id) → α(τ_id) − 2; hence the next value is α = α_M − 2. Proceeding in this way, we get the set of all possible α-values over S_n (for even n),

A_n = {α_M − 2i : i = 0, 1, . . . , m²} = {−m², −m² + 2, . . . , m² − 2, m²}. (7)

If m² ≡ 1 mod 2, then min_τ{|α(τ)| : τ ∈ S_n} = 1, while min_τ{|α(τ)| : τ ∈ S_n} = 0 for m² ≡ 0 mod 2, a difference that becomes irrelevant as n → ∞. It is not difficult to see that the degeneracies come in multiples of η,

φ(α_i) = a_i η, a_i ∈ ℕ, (8)

where α_i = α_M − 2i. The coefficients a_i in Eq. (8) were first found by direct numerical computation, but in the following we obtain them analytically from the observations made in the construction of the set A_n and the definition of α(τ) in Eq. (6).
It is clear that

Σ_i a_i = (n choose m). (9)

Thus, the central binomial coefficient (n choose m) counts the number of ways of computing α(τ) in a nontrivial way. This is because (n choose m) also counts the number of different ways to assign a + sign to m = n/2 out of n symbols and a − sign to the remaining n − m = n/2 symbols, which is precisely the problem of computing α(τ) without counting the shuffling of the terms in the sums. Yet, Eq. (9) tells us only the total number of different ways to compute α(τ), not the values of the individual coefficients a_i. To find the nontrivial multiplicities a_i, we notice the equivalence between our problem and the combinatorial problem of sums over partitions of sets. The multiplicities a_i correspond to the number of permutations τ ∈ S_n such that α(τ) = α_i. This is equivalent to the problem of counting the subsets of m = n/2 elements out of n total elements such that the sum of the m-element subset is fixed; this, in turn, fixes the sum of the n − m remaining elements. These sums are exactly the s_l, s_r discussed before. Since fixing either of the sums fixes the value of α(τ), we have thus rephrased the computation of our functional as a combinatorial problem with an elegant solution, stated in the form of the following

Theorem (Bóna 2012 [10]). Let n and k be fixed non-negative integers with k ≤ n. Let b_i denote the number of k-element subsets of [n] whose elements have sum (k+1 choose 2) + i, that is, i larger than the minimum. Then we have

Σ_i b_i q^i = [n choose k]_q. (10)

In other words, [n choose k]_q is the ordinary generating function of the k-element subsets of [n] according to the sum of their elements.
The object in Eq. (10) belongs to a special kind of polynomials known as Gaussian polynomials or Gaussian binomial coefficients [10], regarded as a generalization of the binomial coefficients and defined as

[n choose k]_q = [n]! / ([k]! [n−k]!), (11)

which are polynomials of the form in Eq. (10), of degree k(n − k) and with symmetric coefficients b_i = b_{k(n−k)−i}. The symbol [k] is itself defined as a polynomial,

[k] = 1 + q + q² + · · · + q^{k−1},

which reduces to [k] = k in the limit q → 1. In the same limit, we also recover (n choose k) from Eq. (11). The polynomial [k] in q is called a q-analog of k, a natural generalization of the representation of the non-negative integers k, parametrized by q. The generalization of the factorial as a q-analog is the q-factorial

[k]! = [1][2] · · · [k] = (1)(1 + q)(1 + q + q²) · · · (1 + q + q² + · · · + q^{k−1}).
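The coefficients b_i of the Gaussian binomial can be computed without any polynomial algebra by counting k-element subsets of [n] by their sum, exactly as the Theorem states. A small dynamic-programming sketch (our own implementation, not from the paper):

```python
from math import comb

def gaussian_binomial(n, k):
    """Coefficient list b_0 .. b_{k(n-k)} of the Gaussian binomial [n choose k]_q."""
    # b[j][s] = number of j-element subsets of [n] with element sum s
    max_sum = sum(range(n - k + 1, n + 1))
    b = [[0] * (max_sum + 1) for _ in range(k + 1)]
    b[0][0] = 1
    for x in range(1, n + 1):          # add each symbol x at most once
        for j in range(min(x, k), 0, -1):
            for s in range(max_sum, x - 1, -1):
                b[j][s] += b[j - 1][s - x]
    offset = k * (k + 1) // 2          # minimum possible subset sum
    return b[k][offset : offset + k * (n - k) + 1]

coeffs = gaussian_binomial(4, 2)
print(coeffs)                    # → [1, 1, 2, 1, 1]
print(sum(coeffs), comb(4, 2))   # → 6 6
```

The output reproduces [4 choose 2]_q = 1 + q + 2q² + q³ + q⁴, with the symmetric coefficients and the q → 1 limit (n choose k) noted above.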
Our main result, the computation of the numbers a_i(n) in Eq. (8) that constitute the nontrivial contribution to φ(α_i) = a_i η, reduces to a corollary of the Theorem in Eq. (10) for the particular choice k = m = n/2. In fact, with k = m = n/2 we recover a generating function G_n(q) for our sequence a_i, with G_n(1) = (n choose m). Therefore

a_i(n) = [q^i] G_n(q),

where the notation [q^i]P(q) indicates the coefficient of the i-th power of the polynomial P(q). We can compute statistical properties of a sequence from its generating function with relative ease [12]. If we sample a permutation τ ∈ S_n at random and assign the random variable χ to the associated value of α(τ), the probability of observing the value α_i is given by the ratio

P_{χ_n}(α_i) = φ(α_i)/n! = a_i η / n! = a_i / (n choose m); (16)

hence P_{χ_n}(α) is a probability mass function with support in A_n (Eq. (7)), and Eq. (9) is the normalization condition Σ_i P_χ(α_i) = 1. The associated probability generating function (p.g.f.) is

G_χ(q) = Σ_i p_i q^{α_i}, (19)

where p_i = P_{χ_n}(α_i). From the p.g.f. in Eq. (19) we can, at least in principle, compute any desired moment of χ by differentiating G_χ(q) with respect to q. However, the derivatives of G_χ(q) are more cumbersome than illuminating and we will not present them here. By applying the functional α(τ) defined in Eq. (6) to the set of permutations obtained from embedding a stochastic process X_t as explained previously, we get an associated stochastic process in terms of α. The functional α(τ) can be applied to any process, but here we limit its use to the analysis of white noise processes.
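The relation φ(α_i) = a_i η and the p.m.f. of Eq. (16) can be verified by brute force for small n. The following sketch enumerates all of S_6 and recovers the coefficients a_i of the Gaussian binomial [6 choose 3]_q:

```python
from itertools import permutations
from math import comb, factorial
from collections import Counter

n = 6
m = n // 2
eta = factorial(m) ** 2  # trivial multiplicity from shuffling each half

# phi(alpha) for every attainable alpha value, by exhaustive enumeration
counts = Counter(sum(p[m:]) - sum(p[:m]) for p in permutations(range(1, n + 1)))

assert all(c % eta == 0 for c in counts.values())  # degeneracies come in multiples of eta
support = sorted(counts)
print(support[0], support[-1])                     # → -9 9   (m^2 = 9)
print([counts[9 - 2 * i] // eta for i in range(10)])
# → [1, 1, 2, 3, 3, 3, 3, 2, 1, 1]  (the coefficients a_i of [6 choose 3]_q)
print(counts[9] / factorial(n), 1 / comb(n, m))    # P(alpha_M) = 1/20, as in Eq. (16)
```

Note that for n = 6 we have m² = 9, which is odd, so the support contains ±1 but not 0, in agreement with the parity remark below Eq. (7).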
A white noise process X_t is a continuous- or discrete-time stochastic process with the following properties:

E[X_t] = 0 and Var[X_t] = σ², for all t ∈ Ω, (20)
E[X_t X_s] = σ² δ(t − s), for all s, t ∈ Ω, (21)

where 0 < σ² < ∞ and Ω is the set in which s, t take values, for instance Ω = ℝ for a continuous process. Of these properties, the zero-mean and finite-variance conditions are irrelevant for the application of α(τ), since they merely carry information about the scale of the process, to which our functional is blind. The values of α over an embedding of a process are well defined as long as the source process X_t has continuous support (later we will see that this condition can sometimes be relaxed), so that there is zero probability of observing repeated amplitudes of the process X_t; this means that the ranking permutations are well defined. Of course, real observations have finite resolution but, even considering this, repetition of amplitudes from a stochastic process with continuous support is expected to be highly unlikely, at least over a finite time. Considering the former facts, we arrive at the following

Definition 2 (Induced χ-process and α-series). For every choice of non-negative integers n, m and l (with n = 2m) and every white noise process X_t, the functional in Eq. (6) induces the discrete-time process χ_k with discrete support A_n (7). Correspondingly, we denote a realization or time series of the process χ_k as

{α_k}_{k∈[N]}, (22)

where [N] = {1, 2, . . . , N}, L is the length of an observed realization of X_t, and N = L − (n − 1)l is the number of delay vectors in an embedding with fixed values of n and l.
In general, for arbitrary values of n and l, the process χ_k and, consequently, {α_k}_{k∈[N]} are correlated due to the overlap of the embedding vectors, which is in turn reflected in a sequence of permutations that are not independent. Nevertheless, the process can become effectively uncorrelated for some values of l, or at large values of both n and l. In order to visualize how the p.m.f. P_{χ_n}(α_i) changes as n increases, let us plot Eq. (16) for different values of n. To facilitate comparisons, we make use of the standardized variable; since µ_χ = 0, this is the simple quotient

Z_α = χ/σ_χ. (23)

We introduce the notation Z_α for the standardized χ random variable, denote its realizations by α_z, and write the corresponding p.m.f. as P_{Z_α}(α_z), or simply P(α_z), with moments µ_{Z_α} ≡ E[Z_α] and σ²_{Z_α} ≡ Var[Z_α], and associated α-process {α_{z,k}}_{k∈[N]} (see Eq. (22)). We plot P(α_z) for n = 4, 8, 16, 32 in Fig. 2. Let us come back briefly to the reflections of diagrams introduced in Sec. II to add understanding of P_χ(α). The compositions of the horizontal and vertical reflections are illustrated in Fig. 1, and we will denote them simply by hv(·) and vh(·) in the following. They leave the value of the functional invariant,

α(hv(d)) = α(vh(d)) = α(d). (24)

Since in general hv(d) = vh(d) ≠ d, this composition is the source of the degeneracy φ(α) that is not accounted for by simple permutation of the terms of the partial sums s_l, s_r. An illustration of Eq. (24) is seen in Fig. 1: we get diagrams that differ in a nontrivial permutational way but have the same value of α(τ) via the composition hv, as illustrated there by going from d_1 = ⟨1, 2, 5, 4, 6, 3⟩ to d_2 = ⟨4, 1, 3, 2, 5, 6⟩. The symmetry of the coefficients a_i is thus equivalent to the reflection symmetry in Eq. (24).
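The reflection properties above are easy to check computationally. A minimal sketch (our own naming: v reverses the word as in Eq. (2), h complements the symbols as a reflection across a horizontal axis):

```python
def v(d):
    """Vertical reflection: reverse the order of the symbols (Eq. (2))."""
    return d[::-1]

def h(d):
    """Horizontal reflection: complement each symbol, x -> n + 1 - x."""
    n = len(d)
    return tuple(n + 1 - x for x in d)

def alpha(d):
    m = len(d) // 2
    return sum(d[m:]) - sum(d[:m])

d1 = (1, 2, 5, 4, 6, 3)
print(alpha(d1), alpha(v(d1)), alpha(h(d1)))  # → 5 -5 -5
print(h(v(d1)), alpha(h(v(d1))))              # → (4, 1, 3, 2, 5, 6) 5
```

Each single reflection flips the sign of α, so the composition hv preserves it, reproducing exactly the example d_1 → d_2 from the text.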

IV. CONTINUOUS APPROXIMATIONS AND SUFFICIENT STATISTICS
By direct computation of the variance σ²_{χ_n} = Σ_{0≤i≤m²} α_i² p_i for successive values of n, we arrive at the formula

σ²_{χ_n} = m²(2m + 1)/3, (25)

which for m → ∞ behaves as

σ²_{χ_n} → (2/3) m³. (26)

As previously noticed, the probability mass function P_χ(α) converges to a Gaussian as n increases. However, as can be noticed in Fig. 2, for low values of n the shape of the distribution differs from a Gaussian, especially around its center. We discuss this case in Appendix A. The one-parameter exponential (or Darmois-Koopman-Pitman) families, such as the Normal distribution, arise naturally from the optimization of the Shannon entropy of P(χ) under constraints on normalization and on the first and second moments. Using the normalization condition together with µ_χ = 0, we get

P_{χ_n}(α_i) = e^{−β α_i²} / Σ_j e^{−β α_j²} = a_i / Σ_j a_j,

with β the Lagrange multiplier associated with the second-moment constraint. We find β by direct identification with the Gaussian p.d.f.
with σ = σ_{χ_n} from formula (26), µ = µ_{χ_n} = 0 and Σ_j P_{χ_n}(α_j) = 1 to fix the correct normalization factor. This yields β = (3/4) m^{−3} and thus an explicit continuous approximation of our p.m.f. for large n,

P_{χ_n}(α_i) ≈ 2 √(β/π) e^{−β α_i²}

(the factor 2 accounting for the spacing of the support A_n), where α_i² = 4i² − 4m²i + m⁴ and m = n/2. A convenient form for finding the natural sufficient statistics of our distribution is obtained by dropping the index i, i.e. α_i → α(τ), so that χ is now seen as a continuous variable. Allowing the mean back into the expression, we get

f_χ(α; µ_χ, σ_χ) = (1/(σ_χ √(2π))) exp(−(α − µ_χ)²/(2σ_χ²)); (31)

thus f_χ(α; µ_χ, σ_χ) is a probability density function that approximates P_{χ_n}(α_i) for large n and allows for the possibility of a change in location and scale. More importantly, Eq. (31) indicates directly that the natural sufficient statistics for estimating the mean and variance from a sample χ_1, χ_2, . . . , χ_N are the sample mean and variance,

ᾱ = (1/N) Σ_{τ∈T_L} α(τ), s²_α = (1/N) Σ_{τ∈T_L} (α(τ) − ᾱ)², (34)

where T_L is the set of permutations corresponding to the ordinal patterns observed in a time series of length L, and |T_L| = N = L − (n − 1)l is the number of patterns observed in that realization. The sample mean ᾱ is of special interest for statistical analysis, as we show in the next section.
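The closed form σ²_{χ_n} = m²(2m + 1)/3 can be checked against exhaustive enumeration for small even n (a verification sketch in our own code; the mean is zero by the reflection symmetry, so the variance is just the mean of α²):

```python
from itertools import permutations
from math import factorial

def var_alpha(n):
    """Exact variance of alpha over S_n, by enumeration (mean is zero by symmetry)."""
    m = n // 2
    vals = [sum(p[m:]) - sum(p[:m]) for p in permutations(range(1, n + 1))]
    return sum(v * v for v in vals) / factorial(n)

for n in (2, 4, 6):
    m = n // 2
    print(n, var_alpha(n), m * m * (2 * m + 1) / 3)  # enumeration vs. closed form
```

For n = 2, 4, 6 the enumeration gives 1, 20/3 and 21, matching the closed form; the asymptotic (2/3)m³ then follows by keeping the leading term.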

V. TIME SERIES ANALYSIS
As an illustration, let us apply our framework, as explained at the beginning of Sec. III, to white noise time series of length L = 10⁵ generated from different distributions: Standard Normal (unbounded support), Cauchy (unbounded support, heavy tails), Exponential (asymmetric distribution, unbounded support), Poisson (discrete unbounded support), Continuous Uniform (bounded support), and the special case of deterministic chaotic trajectories generated from the logistic map x_{t+1} = r x_t(1 − x_t) at control parameter value r = 4. At this value of r, the logistic map is ergodic and its invariant density has the closed form ρ(x) = 1/(π√(x(1 − x))), which is the density of Beta(1/2, 1/2). Furthermore, its orbits display exponential decay of correlations [14] and are thus effectively random in the long run. Therefore we can regard long time series obtained from the logistic map at r = 4 as white noise generated from a Beta distribution, i.e., X ∼ Beta(1/2, 1/2). As discussed in Sec. III, the validity of the theory developed for our functional requires the process X_t to have a source distribution with continuous support, since a total order is needed to obtain well-defined ranking permutations; this is not the case for the Poisson distribution. Nevertheless, here we relax this condition to see that, in a practical situation where the discrete support of X_t is large enough to make the probability of repeated neighbouring points in time very low, our method can be expected to apply to a good approximation.
For the sake of comparison, we use the standardized variable Z_α, whose associated process is {α_{z,k}}_{k∈[N]}. Let us choose an embedding dimension (sliding window length) of n = 32 and a lag of l = 1. With this choice we ensure that the shape of P_χ(α) is well approximated by a Gaussian (see Fig. 2(d)).
In Fig. 3 we can confirm empirically that the only requirements for obtaining a Gaussian distribution for α(τ) are statistical independence of the observations and continuity of the support of X_t. These conditions can be relaxed to include processes with sufficiently rapid (i.e. exponential) decay of correlations, as in the case of the deterministic chaotic trajectory, or discrete processes with sufficiently large support.
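The whole pipeline (embed, rank, apply α, standardize with the exact σ_{χ_n}) can be reproduced on simulated white noise; the following self-contained sketch (our own code, with a smaller L than in the text to keep it fast) checks that the standardized α-series has mean close to 0 and variance close to 1:

```python
import math
import random

def alpha_series(series, n, lag=1):
    """Standardized alpha values from sliding windows of even length n."""
    m = n // 2
    sigma = math.sqrt(m * m * (2 * m + 1) / 3)  # exact std of alpha under white noise
    out = []
    for k in range(len(series) - (n - 1) * lag):
        w = [series[k + i * lag] for i in range(n)]
        tau = sorted(range(1, n + 1), key=lambda i: w[i - 1])  # ranking permutation
        out.append((sum(tau[m:]) - sum(tau[:m])) / sigma)
    return out

random.seed(1)
noise = [random.gauss(0, 1) for _ in range(20000)]
az = alpha_series(noise, n=32)
mean = sum(az) / len(az)
var = sum((a - mean) ** 2 for a in az) / len(az)
print(round(mean, 2), round(var, 2))  # close to 0 and 1 for white noise
```

Note that with l = 1 the consecutive α values are strongly correlated through the overlapping windows, but the marginal distribution is still the standardized p.m.f. of Sec. IV.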
The usefulness of our analysis is not limited to large values of the embedding dimension. In Fig. 4 we show the distributions P(Z_α) for embedding dimensions n = 2, 4, 8, obtained from a chaotic trajectory of length L = 10⁴ from the logistic map with r = 4 and initial condition x_0 = 0.84291157 . . . . The discrepancies between the distributions for the chaotic trajectory and for realizations of uniform white noise come from the impossibility of the logistic map displaying some types of patterns, known as forbidden patterns [15]. For instance, in [15] it is shown that the pattern ⟨3, 2, 1⟩ and, more generally, patterns of the form ⟨∗, 3+k, ∗, 2+k, ∗, 1+k, ∗⟩, called outgrowth patterns (where ∗ indicates any other symbol in the pattern), cannot be displayed by orbits of the logistic map with r = 4. Although one of the drawbacks of our method is the restriction to even values of n, we can still see the effect of the forbidden patterns in the asymmetry of the distributions in Fig. 4, which have less mass on the negative side of the support. This is because the forbidden pattern has a negative value of our functional, α(321) = −2, and thus patterns that are negative in the α sense are more likely to belong to the outgrowth set of ⟨3, 2, 1⟩. Correspondingly, an excess of positive patterns is observed, yielding an asymmetric distribution. This makes our analysis potentially useful for detecting signatures of determinism in an observed process by direct comparison with white noise. As can be seen in Fig. 4, this effect is lost for larger values of the lag, due to the increased scale of observation and the consequent loss of correlations.

A. 3D Diffusion of Gold Nanoparticles
Now let us analyze experimental data gathered using a recent and powerful technique for the direct observation of the 3D dynamics of nanoparticles (NPs), known as liquid-cell scanning transmission electron microscopy (LC-STEM). Although it provides very sharp resolution, this technique has been reported to yield observations of NP dynamics that are 3 to 8 orders of magnitude slower than theoretical predictions [16]. This discrepancy can be attributed to the damping effect of the strong electron beam, the viscosity of the media, and interactions of the particles with the boundaries of the experimental cell [16]. In [16], Welling et al. address the problem of the observed slowed-down diffusion by tuning the electron beam to a low dose rate and using high-viscosity media, such as glycerol, for the NP diffusion. With those modifications, they track the 3D diffusion of charge-neutral 77 nm gold nanoparticles (Au-NPs) in glycerol, as well as charged 350 nm titania particles in glycerol carbonate.
The independence of the spatial increments is one of the defining properties of Brownian motion. In the following, we show how to use our transformed ordinal pattern framework to test this independence in the experimental particle tracks from the set of Au-NPs in Ref. [16]. This data set contains more than 200 NP tracks observed in the x-y plane, whose original experimental labels are kept here for identification. We analyzed all trajectories of length L ≥ 100 points (with a maximum of L = 359), for a total of M = 37 tracks (see Fig. 5).
2) Choose an embedding dimension n and lag l in order to apply the α-analysis to ∆x, ∆y. This yields a pair of vectors, denoted for simplicity of notation by α∘∆x and α∘∆y. The notation α∘u is to be interpreted as first embedding the series u = {u_1, u_2, . . . , u_L} with the chosen n, l values and then applying the functional α(τ) to the corresponding collection of ranking permutations.

4) We account for the correlations among the α_z values that are introduced by construction by computing the effective length of the vectors α_z∘∆_j through the correction [1]

N_eff = N / (1 + 2 Σ_{i=1}^{N−1} R_α(i)),   (35)

where R_α(i) is the autocorrelation function (ACF) of the {α_{z,k}} series from the previous step, at time lag i.

5) Finally, since the variance of Z_α is known (σ_{Z_α} = 1), we can perform a one-sample Z-test on the {α^{(j)}_{z,k}}_{k∈[N]}, j = x, y series, under the assumption that the increments ∆x, ∆y are independent, and using the N_eff computed in the previous step as the sample size. Therefore, our null hypothesis is simple: the mean value of the standardized variable Z_α is zero, µ_{Z_α} = 0. In other words, we want to test

H_0: µ_{Z_α} = 0  vs.  H_1: µ_{Z_α} ≠ 0.   (36)

The robustness of this test is guaranteed by the fact that the quantities ᾱ and s²_α (Eqs. (34)) obtained in Sec. IV are sufficient statistics for χ. Therefore, we can rely on the statistic ᾱ_z as the unbiased estimator of the standardized mean µ_{Z_α}. For each spatial dimension of the diffusion, we have the estimators ᾱ^{(j)}_z = E[α_z∘∆_j], j = x, y. We will follow common practice and choose a confidence level of 95%, corresponding to a type-I error (false-positive) rate of 0.05, denoted here by ε_1. This error rate is customarily denoted by "α", conflicting with the notation for the main object in the paper; the author hopes that this change from the standard notation does not affect the reading. Correspondingly, we will denote the rate of a type-II error (false-negative) by ε_2, customarily denoted by β, and will require 80% power, or ε_2 = 0.2.
We choose an embedding dimension of n = 32, since in that case we can consider P(α_z) to be well approximated by a Gaussian (see Fig. 2-(d)). For the lag we choose l = 1, because it will give the maximum series length N = L − (n − 1)l. The correlations introduced by this choice of lag are accounted for by Eq. (35), but, before proceeding with the next steps, let us estimate the minimum value of N_eff that complies with the chosen ε_1 and ε_2. This estimate for N_eff is given from the usual two-sided Z-test by [17]

N_eff ≥ (z_{1−ε_1/2} + z_{1−ε_2})² σ²_{Z_α} / c²,   (37)

where σ²_{Z_α} = Var[Z_α] (see Eq. (23)), z_{1−ε_1/2} and z_{1−ε_2} are the quantiles of the standard normal distribution at 1 − ε_1/2 and 1 − ε_2, respectively, and c is the desired detection threshold for deviations from the mean, usually written as proportional to the standard deviation, so c ∼ σ.
For ε_1 = 0.05, ε_2 = 0.2, and c = 1.96σ, we get N_eff ≥ 30 from Eq. (37). All of the trajectories considered have a corresponding N_eff > 30, therefore we can reliably apply our test. A summary of the results of the test is provided in Table IV, where we show the tracks for which the null hypothesis is rejected (p-value lower than 0.05).
In contrast with the analysis displayed in Fig. 6, where only 3 x-trajectories are detected as outliers (labels 11, 94, 144), there are 4 trajectories rejected by the single-trajectory hypothesis test (see Table IV), with labels 11, 94, 144 (x-tracks) and 159 (y-track).
Nevertheless, the agreement between the single-trajectory Z-tests and the independent analysis through the box plot is reasonably good, suggesting that the correction in Eq. (35) is a sensible approximation that accounts for the autocorrelation in the α^{(j)}_{z,k} processes.

VI. DISCUSSION
We have illustrated the main advantages of avoiding working with the direct statistics of patterns by, instead, first dividing the symmetric group into classes through the functional α(τ) defined by Eq. (6), so that the problem is reduced to analyzing these classes. However, this procedure does not come without drawbacks, and below we discuss these, as well as other positive points, in more detail. An interesting conceptual consequence of the presented view of white noise is that a sense of typicality emerges in terms of the functional α(τ), due to its concentration around zero. Therefore, the stationarity of white noise acquires a combinatorial character arising from statistical constraints.
In Sec. V, we have shown a successful use case of our framework: testing for independence in the spatial increments of particles diffusing in 3 dimensions, whose motion was recorded in a 2-dimensional plane. Although computing the ensemble-averaged mean squared displacement (MSD) is the customary check for diffusive behavior, it does not provide the single-trajectory detail and statistical power of our method. Indeed, our method can be implemented for a single particle if that is the only information available, and still be reliable provided a minimum effective trajectory length is achieved (as seen in Sec. V A), which is a sensible requirement.
The available time series were of relatively short length. Yet, notably, the test is able to handle these short time series. None of the trajectories were rejected by our test in both dimensions at the same time. This can be interpreted as a good indication that the observation technique used in [16] performed well enough to preserve the 3D Brownian diffusion overall, by keeping the introduction of correlations in the motion through the observation scheme at a minimum. After discussion with one of the authors of [16], we can explain the higher rejection rate for the x-coordinate tracks by the observation procedure: the LCSTEM probe progressively scans line by line along the x-dimension, thereby potentially introducing weak correlations in the particles' motion in that direction.
The former is supported by the theory presented here and by the reasonably good agreement between our analysis and the quartile analysis in Fig. 6. Furthermore, the quartile analysis did not flag any track whose y-component was rejected by our test, suggesting that our method is more sensitive for a given confidence level.
For all the examples of white noise processes considered in Sec. V, the customary ordinal pattern analysis for the PE computation would be practically impossible for the choice of n = 32 used here, since we would have to keep track of 32! patterns. The analysis and graphical representation of the final distribution would also be impossible without a further coarsening of the support, something that would strip the analysis of the very detail that characterizes it in the first place. Instead, in the present framework the size of the support of our distribution is |A_n| = m² + 1 (see Sec. II). This is an important simplification, while still keeping relevant information about the patterns both in terms of correlations and even rough information about the variation of the amplitudes in the form of weights. A remarkable aspect of our framework when comparing white noise from different sources is the robustness of the empirical distributions. The empirical distribution over S_n in the customary PE approach approximates a discrete uniform with support {1, 2, . . . , n!}, implying that for moderate to large n it would display strong variations when estimated from a finite sample. In the considered case with n = 32, l = 1 and series length L = 10^5, the number of observations N = L − (n − 1)l ≈ 10^5 falls extremely short of having at least one representative out of the 32! ≈ 2.6 × 10^35 possible patterns. Therefore the obtained empirical density would be composed mostly of void regions and uneven peaks. In order to prevent this effect, in our current example of time series of length L = 10^5 one should limit the analysis to n = 8 (8! ≈ 4 × 10^4), and we could still get an uneven empirical density with gaps despite the rather large time series. In contrast, in our approach there are just m² + 1 = 257 classes to keep track of, as in the illustrative examples considered in Fig. 3. The choice n = 32 was made mainly to guarantee a good approximation of P(Z_α) by a Gaussian p.d.f.
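The count of m² + 1 classes can be verified by brute force over S_n for small even n. The functional below is a reconstruction consistent with the stated properties of α(τ) (the exact Eq. (6) lies outside this excerpt), so this is an illustrative check rather than the paper's code:

```python
from itertools import permutations

def alpha(tau):
    """Hedged reconstruction of the functional of Eq. (6): sum of rank
    differences over mirrored pairs (consistent with alpha(321) = -2)."""
    n = len(tau)
    return sum(tau[n - 1 - i] - tau[i] for i in range(n // 2))

# Brute-force check: the number of distinct alpha-classes over S_n is
# (n/2)**2 + 1 for even n (hence 257 for n = 32), versus n! raw patterns.
for n in (2, 4, 6, 8):
    classes = {alpha(t) for t in permutations(range(1, n + 1))}
    m = n // 2
    print(n, len(classes) == m * m + 1)  # True for each n
```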
Nevertheless, we can think of alternative analyses for lower n values, exploiting the fact that we know the specific form of P(Z_α) for hypothesis testing. Now, let us address the autocorrelations introduced by the embedding procedure. We already mentioned in Sec. III that the sequence of α values obtained from a time series is correlated due to the overlap among the embedding vectors, which in turn translates into overlapping ranking permutations τ ∈ S_n and, finally, correlated α values. This is especially so for low values of the lag l. As illustrated in Sec. V, for large enough values of l the effect of the correlations in the original process can be significantly diminished, but so can the overlap between the patterns be diminished or eliminated (see Fig. 4). Nevertheless, even for low l, the autocorrelations in the series {α_k}_{k∈[N]} do not affect the Gaussian character of P(α), since this characteristic comes from the fact that, as n → ∞, the partial sums s_l discussed in Sec. III are composed of integers from the uniform distribution that become effectively independently sampled as n grows, and thus the Central Limit Theorem applies. This is the same effect as the effective loss of statistical dependence when drawing without replacement from a very large pool. Thus, despite the correlations introduced by construction in the method, the cost is not so high that it ruins the statistical power of the analysis, especially for large n, large l, or both.
The absence or over-representation of patterns that is expected to occur due to finite sampling is lessened by the fact that a pattern in the same or a similar class will very likely take the place of a missing one. Vice versa, over-represented patterns in a sample will induce over-representation of patterns in neighboring classes for low lag values, especially l = 1, when the windows have the greatest overlap. The overall effect is a perturbation of the shape of the Gaussian around the over-represented pattern; the situation is similar for absent patterns. An example of this can be seen for the chaotic trajectory of the logistic map in Fig. 3-(f), where a region in the center receives a slightly higher probability density than it would for a white noise process. This is explained by the missing forbidden patterns, as discussed in Sec. V. The over-representation of patterns with values of α around zero appears more likely for processes with bounded support, as is the case of uniform white noise and the deterministic chaotic trajectory (see Fig. 3(e)-(f)). This could be explained by the boundedness of the noise and the finiteness of the trajectory, which make it less likely that increasing (decreasing) sequences appear, corresponding to positive (negative) values of α far from the average around zero. Thus, values α ≈ 0 become over-represented.
A major drawback for the applicability of our framework is that an even embedding dimension must be used. Nevertheless, a workaround is to perform the analysis for the even values adjacent to the desired odd n, if the actual odd structure of the patterns is not relevant. Furthermore, the odd-n analysis is in principle also possible through Definition 1 (Eq. (5)), since we can obtain the exact distribution of the corresponding α functional from the general expression (11) describing the statistics of the degeneracy of α for any choice of n and m. These degeneracies can be computed numerically from expression (11) by means of a recursion for the Gaussian binomial coefficients [10]. The distributions of α(τ) for odd n thus obtained are skewed, but still bell-shaped. We plan a general analysis of this case, together with its possible practical implications, in a future work.
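As a sketch of the computation mentioned above, the standard recursion [n, m]_q = [n−1, m−1]_q + q^m [n−1, m]_q yields the coefficient list of the Gaussian binomial, whose entries count m-subsets of {1, . . . , n} by their sum. This is an illustration of the recursion cited from [10], not the paper's implementation:

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def gaussian_binomial(n, m):
    """Coefficient list of the Gaussian binomial [n, m]_q, computed via
    the recursion [n, m]_q = [n-1, m-1]_q + q^m * [n-1, m]_q."""
    if m < 0 or m > n:
        return [0]
    if m == 0 or m == n:
        return [1]
    a = gaussian_binomial(n - 1, m - 1)
    b = gaussian_binomial(n - 1, m)
    out = [0] * (m * (n - m) + 1)   # degree of [n, m]_q is m*(n-m)
    for i, c in enumerate(a):
        out[i] += c
    for i, c in enumerate(b):
        out[i + m] += c             # the q^m shift
    return out

# Sanity checks: at q = 1 the coefficients sum to C(n, m), and the
# coefficient list is palindromic.
print(gaussian_binomial(4, 2))                     # [1, 1, 2, 1, 1]
print(sum(gaussian_binomial(8, 3)) == comb(8, 3))  # True
```

Consistently with the class count quoted above, the list for n = 32, m = 16 has length m² + 1 = 257 (assuming the degeneracy-to-coefficient correspondence of expression (11)).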
To finish, we stress that an important message conveyed by this contribution is that the full detail of the customary ordinal pattern analysis (prior to the PE computation) is not needed and in fact hinders its statistical applications, while, on the other hand, computing the Shannon entropy directly from the ordinal pattern statistics washes away statistically valuable information. The approach presented here represents a middle ground, with several extra benefits and relatively minor drawbacks.