Simulating the Photon Statistics of Multimode Gaussian States by Automatic Differentiation of Generating Functions

Advances in photonics require photon-number resolved simulations of quantum optical experiments with Gaussian states. We demonstrate a simple and versatile method to simulate the photon statistics of general multimode Gaussian states. The derived generating functions enable simulations of the photon number distribution, cumulative probabilities, moments, and factorial moments of the photon statistics of Gaussian states as well as of multimode photon-added and photon-subtracted Gaussian states. Numerical results are obtained by automatic differentiation of these generating functions by employing the software framework PyTorch. Our approach is particularly well suited for practical simulations of the photon statistics of quantum optical experiments in realistic scenarios with low photon numbers, in which various sources of imperfections have to be taken into account. As an example, we calculate the detection probabilities for a recent multipartite time-bin coding quantum key distribution setup and compare them with the corresponding experimental values.


I. INTRODUCTION
Recent progress in the generation, manipulation, and detection of photonic quantum states has led to new applications in the field of photonic quantum information processing and sensing. An example is photonic quantum computing, which can be realized by using only single-photon sources, beam splitters, phase shifters, and photon detectors 1 . Other applications such as quantum key distribution (QKD) or quantum imaging have gained considerable attention over the last years and become commercially relevant, requiring the development of sophisticated optical setups working at the singlephoton level 2,3 . Photon-number resolved (PNR) detection of quantum states opens new pathways to experiments and applications requiring the simulation of such experiments.
The strength of nonlinear optical interactions between light and matter typically decreases rapidly with the order of the nonlinear effect. Therefore, many common photonic quantum states are described by Hamiltonians that are at most quadratic in the creation and annihilation operators. States with such Hamiltonians are called Gaussian states (GSs) and include e.g. vacuum, coherent states, squeezed states, thermal states, or states generated by spontaneous parametric downconversion (SPDC). Transformations by optical setups described by such Hamiltonians, introduced e.g. by phase shifters or beam splitters, map GSs to other GSs. Hence, the capability to simulate the photon statistics of GSs is of great practical relevance. The covariance formalism describes GSs by a covariance matrix and a displacement vector and allows the modeling of many common effects on GSs in experiments, such as losses, phase shifts, and interference at beam splitters, by relatively simple matrix transformations of the covariance matrix and displacement vector [4][5][6][7] . Therefore, it lends itself to the implementation of simulations of quantum optical experiments. Importantly, the covariance formalism, described briefly in appendix A 1, carries the full information about the photon statistics of the state.
While non-PNR detectors such as single-photon avalanche photodiodes are common, a variety of detector types capable of PNR detection have been developed as well 8 . Transition edge sensors combine high detection efficiencies and PNR capabilities, but can only be operated at comparatively low count rates 9 . Superconducting nanowire single-photon detectors (SNSPDs) can be operated at higher count rates and allow to realize PNR detection by evaluating the electronic output pulse shape depending on the number of incident photons 10 or by subdividing the light-sensitive area into multiple pixels [11][12][13][14] . SNSPD detectors achieving photon-number resolution with eight pixels are commercially available 15 . Not only SNSPDs but also single-photon avalanche detectors have been multiplexed, in time 16,17 or space 18 , to realize PNR detectors. In such multiplexing setups, multiple photons can hit the same sub-detector so that the photons are not resolved. But, with an increasing number of detectors, the counting statistics converge to the actual PND 18,19 .
The task to find the detection probability p(n) = p(n 1 , . . . , n S ) for n 1 photons in mode 1, n 2 photons in mode 2 etc. of a GS with S modes, i.e. its photon number distribution (PND), is known as the Gaussian boson sampling (GBS) problem 20 . Experimentally, GBS has been realized e.g. using networks of beam splitters on photonic chips 21,22 . Large-scale GBS experiments have been realized for example in the context of quantum computing and to pursue the demonstration of the computational advantage of quantum computers over classical computers [23][24][25] . Ref. 26 discusses possible applications for GBS-based quantum computing. The computational complexity of simulating GBS has been investigated as a benchmark for optical quantum computing comparing it to GBS simulations on classical computers 20,[27][28][29][30][31][32][33][34] . The fact that GBS is investigated in the context of computational quantum supremacy indicates that calculating the solution to GBS problems can require substantial computational resources. The PND can be calculated by evaluating expressions involving matrix functions called the Hafnian and loop Hafnian for PNR detection as well as of functions called the Torontonian and loop Torontonian for non-PNR detection 20,30,33,35,36 . The operation count for the evaluation of these functions scales exponentially with the number of detected photons and optimization of the algorithms is an active field of research 34 .
The previously mentioned methods for GBS computations have been designed for calculating PNDs with many photons efficiently, but do not offer much flexibility. Typically, quantum optical setups not designed for the particular task of GBS quantum computation are operated at low photon numbers. Examples are applications in quantum ghost imaging and QKD, where coincidences involving only a relatively small number of detectors are relevant. If such a setup is to be simulated, minimizing the required computational resources is not necessarily the main concern. However, often a simulation method is required that is flexible enough to include imperfections inevitably present in real setups. In this paper, we present such a simulation method. The purpose of our method is not to compete with specialized GBS algorithms in terms of performance, but rather to provide a flexible and simple way to compute the photon statistics for imperfect setups.
Two relevant effects in experimental setups are the simultaneous detection of multiple modes by the same detector and noise in the detection process. These effects require the convolution of probability distributions which can conveniently be expressed by the multiplication of the corresponding probability-generating functions. Therefore, we derive generating functions of the photon statistics in terms of the covariance matrix and displacement vector in section II. Our generating-function approach is an alternative to the established expressions for the PND involving Hafnian-type functions. This different perspective on the GBS problem allows for deriving several related expressions for the generating functions of cumulative probabilities, raw or central moments, and rising or falling factorial moments of the detection statistics in section II B, for which so far no systematic method existed. Furthermore, our method allows us to calculate the same quantities for certain non-Gaussian states called photon-added and photon-subtracted GSs as shown in section II C considerably extending the range of possible applications.
Generating functions need to be differentiated repeatedly to retrieve probabilities or moments. We show that these derivatives can be evaluated numerically by automatic differentiation (AD) without much effort. An advantage of AD is that it provides accurate numerical results while hiding the whole complexity of the calculation from the user. We discuss the usage of AD for our application in section II D and provide an implementation example in Ref. 37.
Our simulation method consists of two steps, connected by the generating function. First, the quantum state and the optical setup are modeled in the covariance formalism. Second, the photon statistics are obtained by AD of the generating function expressed in terms of the covariance matrix and displacement vector. Both steps can be easily implemented with widely available software, making the method very practical. However, it is expected that optimized algorithms for the evaluation of Hafnian-type functions will outperform general-purpose AD algorithms for the particular task of calculating the PND. To show that our method can nevertheless be used to simulate various aspects of Gaussian photon statistics efficiently for small photon numbers, we apply it to multiple examples with common GSs in section III.
As a more complex application, we present in section IV a simulation of an entanglement-based QKD system we presented recently in Ref. 38. It demonstrates the strengths of our method to readily incorporate relevant imperfections such as noise, detection efficiencies, and simultaneous detection of multiple modes. Although the experimental setup is complex, the simulation is straightforward because it can be broken down into elementary operations described by basic Gaussian state transformations. The simulation results are in very good agreement with the experimental values. Furthermore, we show that the contribution of multi-photon pair emission to the quantum bit error rate of the QKD system can be estimated by applying Bayes' theorem to PNR simulation results.

II. GENERATING FUNCTIONS FOR THE DETECTION STATISTICS OF GAUSSIAN STATES
Our method to simulate the photon statistics of GSs uses generating functions for the photon statistics in terms of the covariance matrix Γ and the displacement vector d. They connect the covariance formalism to the photon statistics and are therefore essential for our simulation method. In appendix A 2 we briefly summarize the relevant properties of generating functions for probability distributions. In the following section II A, we briefly recap the formulation of the PND in terms of a generating operator and extend it to generating operators for the moments and factorial moments.
A. Generating operators for photon detection probabilities and moments Assuming a photon detection process where photons in a single mode are detected independently of each other with an efficiency η, the probability to detect n photons from a quantum state ̺ is given by [39][40][41][42] p(n) = : (ηN ) n n! e −ηN : .
Here, normal order is indicated by : : and the photon number operator isN =â †â . By inserting eq. (1) into the defining equation of a probability generating function h(y) (cf. eq. (A14)), the PND can be obtained from the expectation value of a generating operatorĥ(y) [39][40][41][42] . This method has recently been used by Nauth to express the PND of biphoton states in terms of the covariance matrix 43 . We extend this procedure to obtain similar generating operators for the moment generating function M (µ, y) and rising factorial moment generating function R(y) as described in appendix A 2: These operators can be modified to take into account noise in the detection process. For that, according to the multiplication rule for generating functions (cf. appendix A 2), the generating operator is simply multiplied by the generating function of the noise process. As an example, we consider noise with Poissonian statistics p noise (n) = e −ν ν n /n! and noise parameter ν. The probability generating function of the noise is given by h noise (y) = e ν(y−1) so that the generating operators from eqs. (2) to (4) including noise read e ν(y−1)ĥ (y), exp ν(e y − 1) M (µ, y) and e νy/(1−y)R (y).
Similarly, noise with different statistics could be taken into account by multiplying the generating operators from eqs. (2) to (4) with the generating function of the respective noise process. Another option is to include noise in the covariance formalism. Thermal noise can be modeled by coupling in a thermal state with a beam splitter or a matrix can be added to the covariance matrix to represent e.g. classical Gaussian noise or noise from amplification 44 .

B. Generating functions in terms of the covariance matrix and displacement vector
In experiments, often multiple modes, e.g. different polarization directions or frequency modes, enter the same detector. For example, calculating the simultaneous detection of multiple modes is required for modeling imperfect interference due to a mode mismatch with the model from Ref. 47, which we use in section IV. Furthermore, often joint detection probabilities between multiple detectors such as coincidence probabilities are of interest. Therefore, we generalize the calculation to multi-mode states.
In appendix B 2 we show that for multiple modes with vectors u, v, and w, G(u, v, w) = ĝ(u, v, w) can be expressed in terms of the covariance matrix Γ and displacement vector d by with the diagonal matrix W = diag(w) ⊕ diag(w), and z = d + ζ with ζ as defined in eq. (B9). Equation (6) is the generating function for the multivariate photon statistics from which the probabilities and moments are retrieved by repeated differentiation w.r.t. D parameters y 1 . . . y D for the detectors d = 1 . . . D, detecting M d modes, with additional Poissonian noise 1 ν d . The total number of modes is S = d M d and the index s runs over all mode indices, i.e. enumerates m d = 1 d . . . M d for all D detectors in the order 1 1 , 2 1 , . . . , M 1 , 1 2 , . . . M 2 , . . . . . . M D . Abbreviating G(w) = G(0, 0, w) and employing a multi-index notation 2 we obtain: 1 Similarly, the generating function of noise processes with a different, non-Poissonian statistics could be taken into account by multiplying with the corresponding generating function. 2 For tuples x and k we write We also use n! = d n d ! and ∂ n y = d ∂ n d /∂y Equations (6) to (11) connect the covariance formalism and the photon statistics by repeated derivatives and are therefore crucial for our simulation method. Alternative expressions to calculate p(n), restricted to the noiseless case, can be found in the literature and are discussed and compared to eq. (7) in section V. Our generating-function approach additionally yields eqs. (8) to (11) for the cumulative probabilities, raw moments (for µ = 0), central moments (for µ = N ), and rising and falling factorial moments directly incorporating the detection efficiency and noise. For general multi-mode GSs, to the best of our knowledge, expressions for these quantities have so far not been presented in the literature.
The cumulative probabilities are useful when ranges of photon numbers are to be analyzed jointly, especially when detection events in multiple detectors are considered. For example, the calculation of the coincidence probability to detect ≤ n 1 photons in one detector and ≤ n 2 in a second detector from the PND requires the evaluation of the probabilities for (n 1 + 1)(n 2 + 1) combinations of photon numbers, but only one evaluation of eq. (8). Mean and variance are two of the most important moments of the PND and can be directly calculated from eq. (9). The factorial moments are useful for the calculation of photon statistics of photon-added and photon-subtracted states (cf. section II C).
By using a formulation with detection opera-torsΠ, more complex detection events involving multiple detectors can be calculated.
For example, the probability for the detection of n A photons in detector A and n B photons in detector B is given by Π NA=nB ,NB=nB = Π NA=nAΠNB =nB and the probability for n 1 or n 2 photons in the same detector with n 1 = n 2 is given by Operators for complementary events can be defined as well, enabling the calculation of the probability to detect any photon number except for n or for more than n photons witĥ Π N =n =1 1 −Π N =n andΠ N >n =1 1 −Π N ≤n . The detection operators for different modes are then joined before the generating function is evaluated. This procedure was used for example in Ref. 47 to relate the detection probabilities for non-PNR detectors to the detection of vacuum by considering operatorsΠ N >0 =1 1 −Π N =0 . We extend this method to PNR detection. As an example, consider the probability to not detect n A photons in detector A and to detect more than n B photons in detector B: The last term, for example, is given by where the modes for A and B are kept in Λ and d.
Finally, we derive multivariate expressions for matrix elements of̺. In appendix B 1 we present a simple derivation of the matrix elements in terms of g(u, v, w) = ĝ(u, v, w) . We generalize the single-mode expressions from eqs. (B3) and (B5) to the multimode case by using eq. (6): In eq. (15) we define l by l s = min(n s , m s ) as well as ∆n = n − l and ∆m = m − l. An equation similar to eq. (15) with l = 0, ∆n = n and ∆m = m can be derived differently 48 and can be related to an expression involving the loop Hafnian function. Compared to this expression, our eq. (15) has the advantage that for each mode the number of derivatives to be evaluated is reduced from n s + m s to max(n s , m s ). When the derivatives are evaluated numerically, for example by automatic differentiation (AD), our expression is especially advantageous for close-by values of n s and m s .

C. Photon statistics of non-Gaussian states derived from Gaussian states
The range of possible applications for our simulation method of the photon statistics can be extended by adapting it to apply to certain non-Gaussian states.
Here, our eqs. (10) and (11) for factorial moments are helpful because they allow to calculate n (k) and n (k) directly. Photon addition and photon subtraction can have nontrivial effects on the PND of quantum states. An example is photon subtraction from a thermal state, which increases the mean photon number of the state 49,50 . Photon-subtracted states can be approximately realized by inserting a beam splitter with high transmission into the beam and by conditioning the detection of the transmitted quantum state on the detection of a reflected photon [49][50][51] . Photon addition has been realized by seeding an SPDC process with low gain so that the SPDC signal modes are emitted into the mode of the incident state. The generation of the photon-added state is then conditioned on the detection of an SPDC idler photon 52,53 . Photon addition and subtraction can be used for example to enhance the signal-to-noise ratio in quantum ghost imaging 54 .
The matrix elements of photon-added and photonsubtracted GSs can be directly calculated by usinĝ a k |l = l!/(l − k)! |l − k , given that l ≥ k, and a †k |l = (k + l)!/l |k + l : The expressions for n + k|̺|l + k and n − k|̺|l − k can be calculated from the GS's matrix elements by using eq. (15).
To derive explicit expressions for the photon statistics, we first consider the single mode case of̺ −k . The operatorâ †kĝâk is in normal order and we obtain from the optical equivalence theorem in eq. (B1) By comparison to the phase-space representation ofĝ from eq. (B2), the expectation value ofĝ with respect to̺ −k can be expressed in terms of the generating function of the underlying GS̺: For the photon-added GS, we consider for simplicity only G(w), as u and v are essentially only required for the calculation of the matrix elements discussed above. In appendix B 3, we derive the expression for the singlemode case which generalizes to This means that all the quantities of the photon statistics for which we have derived expressions above can also be calculated for multi-mode photon-added and photonsubtracted GSs. In this context, G(w) = tr(̺ĝ(w)) in eqs. (7) to (11) is replaced by the expressions for tr(̺ −kĝ (w)) and tr(̺ +kĝ (w)) from eqs. (21) and (22).
To our knowledge, expressions for the PND of photonadded and photon-subtracted GSs have so far only been reported for single-mode states 55-59 . Our expressions enable the calculation of the photon statistics for the general multi-mode case and additionally enable the calculation of moments, factorial moments, and matrix elements. Moreover, the expressions are evaluated by repeated differentiation and can therefore be calculated with the same tools as the expressions for GSs.
Certain non-Gaussian states̺ ′ , such as NOON states and cat states, can be generated from larger GSs̺ by PNR detection 48,60-62 . The effects of the optical setup on such states can be modeled using the covariance formalism as described in Ref. 61. Consider a GS̺ with (M ′ + M ) modes, of which M modes are detected with PNR detectors so that a state̺ ′ with M ′ modes remains. By conditioning the further analysis of̺ ′ on the detected PND, the non-Gaussian states are produced probabilistically 61 . Unitary transformations U ′ acting on̺ ′ can simply be modeled by applying the unitary transformation U = U ′ ⊗1 1 M×M to̺ 61 . For the simulation, the required PNR detection pattern on the M modes is fixed and the photon detection pattern of interest is applied to the remaining M ′ modes. Unitary transformations that are linear or quadratic inâ andâ † can be applied to̺ in the covariance matrix formalism before the detection 61 . By using this procedure from Ref. 61, our method can be directly applied to simulate states that can be obtained from GSs via PNR detection.
To summarize, we note that our method to simulate the photon statistics of GSs can also be applied to photon-added and photon-subtracted GSs as well as to non-Gaussian states derived from larger GSs by PNR detection. The class of states that can be simulated and the range of possible applications of the simulation method are thereby greatly extended.

D. Automatic differentiation for retrieval of probabilities and moments
Multiple options exist to evaluate the derivatives of the generating functions. One option used in Ref. 63 is to use finite-difference approximations, but this method can accumulate numerical inaccuracies. Another option to retrieve probabilities from the generating function is to approximate Cauchy's integral formula on a circle around the origin in the complex plane. In Ref. 64 this method has been applied to probability generating functions and error bounds for the approximations have been given. It is extensively discussed in Ref. 65 and expressions for moments and multi-dimensional distributions are given in Refs. 66-68. A versatile method to evaluate derivatives of functions numerically is automatic differentiation (AD) 69,70 . AD breaks down a function to be differentiated into elementary operations such as addition, multiplication, or sin(x). The differentiation rules for these operations are then applied and combined according to the chain rule to the derivative of the overall function, which is finally evaluated. For example, instead of approximating the derivative of sin(x) numerically, AD uses the fact that the derivative is given by cos(x) and evaluates the cosine function. Hence, in contrast to finite-difference approximations, the derivatives obtained are accurate to working precision.
Applying AD to the generating functions allows us to readily calculate all the quantities we have derived. Equations (7) to (11) as well as eqs. (21) and (22) only involve derivatives and basic functions as well as linear algebra that are easily implemented. With modern software tools, multivariate higher-order derivatives can be conveniently obtained with a few lines of code. From a practical point of view, this is convenient because the user is not confronted with the complexity of the calculation, which is hidden in the repeated derivatives. AD thereby greatly improves the practical usability of the generating functions we derived. However, as mentioned in the introduction, the computational complexity to calculate the PND of GBS problems with state-of-the-art algorithms scales exponentially with the number of photons, so that the numerical evaluation of the derivatives of the generating function via AD will become infeasible for large photon numbers.
A variety of AD software exists and an overview of available tools is provided in Ref. 71. AD is widely used in machine learning, for example for training artificial neural networks 72 . Therefore, AD functionalities are part of popular machine learning libraries such as PyTorch 73,74 and TensorFlow 75 . Machine learning has gained considerable attention during the past years and the range of its applications is consistently growing. Consequently, software for machine learning will be further developed and it can therefore be expected that AD software will gain further capabilities during the next years, which of course is beneficial for our application. State-of-the-art machine learning libraries provide numerous possibilities to optimize their performance, for example, using parallel computing on graphic processing units (GPUs).
Throughout this paper, we use the autograd function from the machine learning framework PyTorch 1.11.0 for AD. We use PyTorch in a very basic configuration, i.e. we do not use the option for acceleration by GPU computations and the only setting we change is the default precision which we increase from float32 to float64. All simulations are run on a regular desktop computer to show that for small photon numbers, the generating functions can be evaluated without much effort, demonstrating the practicability of our approach. In Ref. 37, we present a straightforward code example to demonstrate the use of PyTorch for computing the PND.

III. APPLICATION TO COMMON GAUSSIAN STATES
In this section, we apply our simulation method to two common GSs with well-known photon statistics: the single-mode displaced squeezed thermal state and the two-mode squeezed vacuum with multiple squeezers.
These practical examples demonstrate that our simulation method can easily take into account effects such as non-unity detection efficiencies, noise, and multiple modes entering the same detector.

sinh 2r and (26)
While an explicit expression for the PND can be derived from this generating function, we do not present it here as it can be found in Ref. 57 where it was derived by using a different approach. The photon statistics of the single-mode squeezed thermal state, displaced squeezed thermal state, and squeezed displaced thermal state have been studied in Refs. 76 and 77.
As a simplification, we now consider a single-mode displaced squeezed state |ψ = D(α)S(χ) |0 , i.e. with µ th = 0, for which the PND is given by 57 where H n is the n-th Hermite polynomial. In fig. 1, we verify that deriving the PND of the state from the generating function eqs. (7) and (25) for η 1/2 = 1 and ν 1/2 = 0 by AD yields the same results as the analytic expression from eq. (28) within a tolerance of few units of least precision (ULP) of the analytical value. The maximum absolute value of the deviation from eqs. (7) and (25) to the analytical result from eq. (28) is 118 ULP, i.e. the relative error of the numerical results is less than 2 × 10 −14 for both methods. This is remarkable because for n = 11, computing the result via eq. (7) already took a few minutes and required multiple GB of RAM, indicating a considerable number of numerical operations. The fact that the result is nevertheless numerically accurate underlines that the AD capabilities of PyTorch well suit our simulation method.
The analytic formula has the advantage that it can be quickly evaluated even for high photon numbers. But, an advantage of our formulation with a generating function in terms of Λ and d is that effects such as noise and detection efficiencies can easily be taken into account (c.f. fig. 1).

B. Multi-mode spontaneous parametric down-conversion
A strength of our approach is that the detection statistics of states with multiple modes entering the same detector can be easily evaluated. For example, the spectral degree of freedom can be incorporated into a simulation by discretizing the frequency spectrum and treating each 0 1 2 3 4 5 6 7 8 9 10  frequency as a separate mode. For the simulation in section IV we want to take into account the photon statistics, but we do not resolve the frequency spectrum of the SPDC process. Because the spectrum and the photon statistics are related to each other, we recap some basic properties of the multi-mode SPDC photon statistics.
SPDC with one signal mode s and one idler mode i generates the two-mode squeezed vacuum (TMSV) state with mean photon number µ = sinh 2 r which can be written as 4 The photon statistics of either the signal or idler mode alone is therefore given by p(n) = sech 2 (r) tanh 2n r, i.e. resembles that of a thermal state (cf. eq. (24)), with the probability generating function h(y) = sech 2 (r)/(1 − y tanh 2 r) 79 . In practice, the bandwidths of the parametric process and the pump light result in two-mode squeezing between a number M of pairs of orthogonal Schmidt modes determined by the Schmidt decomposition of the joint spectral amplitude of signal and idler photons 63,79 . The resulting state is produced by applying M independent two-mode squeezers to vacuum, each with thermal statistics of photon pairs. Hence, the generating function for the number of photon pairs is given by 79 with r i e iϑi = C √ λ i /(i ). The constant C is determined by the properties of the nonlinear medium and the pump light and √ λ i is the i-th Schmidt coeffcient. For an increasing number of equally strong squeezers, the PND changes from thermal to Poissonian statistics 79 as shown in fig. 2.
As a further example for our simulation method, we consider the bivariate photon statistics of SPDC while taking into account noise and detection efficiencies. First, we calculate the two-dimensional joint distributions p(n A , n B ) for different parameters of the detection efficiencies and noise as well as the cumulative distribution. The results obtained via our approach using the covariance matrix are shown in fig. 3. It can be seen that as expected, decreasing the detection efficiency and adding noise blurs the line of delta functions that are observed for the ideal state along the diagonal n A = n B .
Second, we use eq. (9) to analyze mixed moments of the PND of a TMSV state in fig. 4. For comparison, analytic values for the mean values and the first mixed moment are shown. However, if the photon statistics are more complicated e.g. due to loss, simultaneous detection of multiple modes, or noise, analytic expressions for the moments may not be easy to find, but can be easily calculated by applying AD to the moment-generating function eq. (9). In fig. 4, we show the influence of loss and noise on the moments.
To investigate the scaling of our method, we have measured the run time for the PND computation from fig. 3(c) as a function of n 2 with n 1 = 0 and for different numbers of modes. As expected for a GBS computation method, we observed an approximately exponential increase in the run time with the photon number. On average, each additional photon prolonged the run time by a factor of 2.8 for 256 two-mode squeezers and 3.5 for four two-mode squeezers. In comparison, Hafnian-based algorithms achieve a scaling of O(N 3 2 N/2 ) for N photons 33 and even faster methods have recently been demonstrated 34 . It is important to note that the run time of our simulation method also grows with the number of modes. However, computing p(n 1 = 0, n 2 = 6) for a 1024×1024 covariance matrix representing 256 twomode squeezers still took less than 13 s.

IV. DETECTION PROBABILITIES FOR ENTANGLEMENT-BASED QUANTUM KEY DISTRIBUTION
For states and setups more complex than the examples considered above, analytical solutions can become infeasible and numerical simulations are required. In this section, we consider an application demonstrating that by employing our method the photon statistics can be simulated in a relatively simple way, even for complex optical setups. We model a multi-user QKD system recently published based on the BBM92 protocol 38 . Although the system uses non-PNR detectors, the photon statistics are relevant as multi-photon pair emission is an important effect limiting the achievable quantum key rates. Detection efficiencies and noise are taken into account. Imperfect interference is modeled by using the mode mismatch model from Ref. 47. All these effects are readily included by using our simulation method.
A. Simulated setup for quantum key distribution QKD enables the distribution of secure keys between users based on information-theoretic principles from quantum physics 2,80,81 . The entanglement-based QKD system for four users named Alice (A), Bob (B), Charlie (C), and Diana (D) 38 implements the BBM92 timebin protocol [82][83][84][85][86][87] . Figure 5(a) shows the setup for the users Alice and Bob. The setup for Charlie and Diana is identical, but with different values for parameters such as insertion losses and coupling ratios as well as different detector properties.
The central component of the QKD system is a source of time-bin entangled photon pairs. Laser pulses are sent through an imbalanced interferometer, resulting in double pulses with a time and phase difference given by the interferometer's path length imbalance. The double pulses produce photon pairs by type-0 SPDC in a nonlinear crystal. The mean photon pair number per double pulse is µ ≈ 0.034. The spectral width of the pump is much narrower than the photon frequency spectrum so that the photons are frequency-correlated: for a pump frequency 2ω 0 , energy conservation requires that the frequencies of the signal and idler photon sum up to ω s + ω i = 2ω 0 . The photons are separated by wavelength demultiplexing (WDM). The phase matching function is almost constant over several dozen nanometers so that photons are produced in multiple pairs of 100 GHz-wide WDM channels. A pair of users obtain entangled photons only if their wavelength channels are symmetrically arranged around the center frequency ω 0 . Multiple user pairs can thereby exchange quantum keys simultaneously and independently and the user combinations can be reconfigured by changing the WDM channel assignments.
In the receiver station of each user, the photons pass an interferometer with a path length difference matched to that of the source's interferometer. They are therefore detected in one of three different time bins, depending on the path combination of the laser pulse and the pho- However, various imperfections of the real setup lead to bit errors. For example, the interferometer delays may not be perfectly matched, the detectors have a non-zero dark count probability and the beam splitters are not ideal 50/50 splitters. The ratio of the number of bit errors to the number of measured bits is called quantum bit error rate (QBER). The higher the QBER, the lower the secure key rate that can be extracted. A simulation of the system can help to quantify the contribution of the different effects to the QBER and to optimize the setup. A higher value of µ, for example, yields a higher emission probability for photon pairs per laser pulse, resulting in a higher bit rate. However, due to the statistical nature of SPDC, increasing µ increases the probability of multi-pair emission and thereby leads to a higher QBER. Therefore, a simulation inevitably needs to take into account the effect of multi-pair emission.
For the simulation, we treat the three time bins as separate modes. The setup from fig. 5(a) can thus be unfolded as depicted in fig. 5(b). It is divided into four parts: state generation, propagation in fibers and interferometers, imperfect interference with mode-mismatch at the beam splitters A 2 and B 2 , and detection in three time bins. The state after the source interferometer is modeled by initializing two TMSV states with covariance matrices Γ TMSV (χ S ) and Γ TMSV (χ L ), where the beam splitter asymmetries and losses in the arms of the source interferometer are absorbed into the squeezing parameters The coefficient r 0 is chosen such that both states together have a certain mean photon pair number µ. The propagation in the fibers and interferometers is represented by the state transformations for beam splitters, phase shifts, and losses and the interference described by the mode-mismatch model from 47 is also modeled by beam splitters. Although the setup is complex, the construction of the final covariance matrix is straightforward: it is a combination of only four basic building blocks, namely the covariance matrix of a TMSV state and the Gaussian state transformations for beam splitters, phase shifts, and losses. Combining the building blocks is simply achieved by multiplying the matrices representing them. The representations of the most important Gaussian states and their transformations can be found e.g. in Refs. [4][5][6][7]47 so that constructing a setup's model becomes a relatively simple task.
In fig. 2, we have shown the effect of the number of twomode squeezers on the photon statistics. To obtain results representing the correct photon statistics, we would have to replicate the setup from fig. 5(b) in the sim-ulation for the correct number of two-mode squeezers, each with its individual mean photon pair number. The Schmidt number K can be used to estimate the number of two-mode squeezers based on the shape of the joint spectral amplitude of the SPDC process 79,89,90 . For the simulation, we roughly estimate that K is in the order of magnitude of 100 and we assume equally strong two-mode squeezers resulting in almost Poissonian photon pair statistics. As the beam splitters and losses are applied to all Schmidt modes, the covariance matrix is block-diagonal with K identical blocks. Instead of directly calculating the determinant of this large matrix, we apply the rule for block determinants and raise the determinant of one block to the power of K = 100.
In the experiment, we use single-photon avalanche detectors thoroughly analyzed in Ref. 91. Relevant parameters are the dead time τ dead ≈ 10 μs, the efficiency η ≈ 20 %, the after-pulse probability of a few percent and the dark count rates in the range from 300 cps to more than 3000 cps depending on the detector. The flexibility of our simulation method allows us to include all these effects in the model as described in appendix C.

B. Comparison of simulated key rates and error rates to measurements
For the simulation, we have measured the values of all relevant experimental parameters, i.e. for the beam splitter transmissions, propagation losses in the fiber links, insertion losses of the components, and interferometer mode mismatches. Then, we used these values to determine the parameters of the simulation model in fig. 5(b). For the detection efficiencies, dark count rates, and afterpulse probabilities of our detectors we have included values obtained from tomographic measurements 91 in the detection operators as described in appendix C.
We have simulated the sifted key rates and quantum bit error rates by multiplying the source repetition rate with the detection probabilities for key bits and quantum bit errors in the time basis and phase basis. We compare the results with the measurements we have reported in Ref. 38 to check the validity of our simulation. For these experiments, the mean photon pair number had been set to µ = 0.034. For comparison, we consider the time basis QBER instead of the overall QBER because in the experiment the interferometer phases fluctuate and they are readjusted by tuning the interferometer temperatures as to minimize the phase basis QBER. Therefore, the phase basis QBER is not necessarily always at the minimum. Although it is possible to adjust the phases in the simulation to mimic phase fluctuations, we have chosen the phases for optimal interference and compare only the time QBER, which is unaffected by the phase fluctuations.
A comparison between the measured and simulated sifted key rates and QBERs in the time basis is shown in fig. 6. The simulation represents both the dependence of the key rates and of the time basis QBER on the user combination and distance very well. The key rate decreases with increasing distance between the participants due to the increasing propagation losses as expected. Since for the distances involved and the chosen pulse durations, dispersion is not significant, the time QBER is almost independent of the transmission distance and transmission losses. However, the QBER variations can be attributed to the individual dark count rates and afterpulse probabilities of the detectors. Multiple effects lead to quantum bit errors in the time basis: uncorrelated clicks from two noise counts can lead to coincidence. Or, one photon from a pair may be lost and the other photon can be detected in coincidence with a noise count or a photon from a different pair. Investigating the different contributions helps to determine the effects limiting the secure key rate and optimizing the setup. Thus, PNR simulations enable a detailed analysis of the different effects contributing to the QBER, as we demonstrate in section IV C.

C. Retrodictive analysis of the contribution of two-pair emission to coincidences
The process to derive the initial quantum state from the measurement outcome by applying Bayes' theorem is known as quantum retrodiction [92][93][94][95] and has been applied for example to imperfect detectors with non-unity quantum efficiency and dark counts 96 as well as to beam splitters and amplifiers 97 . Bayes' theorem states that the conditional probability p(E 1 |E 2 ) for event E 1 , given that event E 2 occurred, can be calculated from The ability to simulate the photon number statistics enables such retrodictive analysis of the quantum state, i.e. it enables us to answer the question: Given that a certain measurement result was observed, what is the probability that it was produced by an initial quantum state with a specific photon number?
PNR simulations with GSs as presented above can be used to obtain p(E 1 |E 2 ) when the probabilities p(E 1 ) and p(E 2 ) can be obtained via the simulation and p(E 2 |E 1 ) is also accessible. Such retrodictive analysis enables the investigation of effects that may not be easily accessible by measurements. For example, the influence of multi-photon pair emission on the QBER can be compared to the contribution from noise counts. Instead of simulating the probability for multi-photon pair production, it can be calculated for the photon statistics of the pair source or it can be estimated by measuring the heralded g (2) auto-correlation function of the source 98 . Not all events in which multiple photon pairs are produced lead to quantum bit errors and conversely, not all quantum bit errors originate from multi-photon pair emission. Using PNR simulation, these effects can be included systematically. However, for example, the quantum bit errors originating from dark counts could also be estimated without setting up such a simulation.
When each of the two pump pulses produces a photon pair, Alice can detect a photon of the first pair in the early time bin and Bob can detect a photon of the second pair in the late time bin, causing a quantum bit error in the time basis. Here, we only want to briefly illustrate the method and for simplicity, we neglect the dependency of the time bins in the same repetition cycle due to the dead time among each other. Furthermore, we only consider the contribution from up to one photon pair produced by each pump pulse as well as raw coincidences between the detectors A 0 and B 0 instead of all exclusive coincidences that would lead to error bits in the time basis.
Two photon pairs can only lead to an error in the time basis when one pair was produced by the first pump pulse happening with probability p(f ), and the other one is produced by the second pulse occurring with probability p(s). Both values are obtained by simulating the probability to obtain one photon pair directly after the SPDC process. From Bayes' theorem in eq. (33), the probability can be calculated that when a coincidence between A 0,E and B 0,L is measured, actually f and s pairs were produced by the first and second pulse, respectively: The emission events are independent of each other, so that p(f, s) = p(f )p(s). The detections are independent as well, so that in eq. (34) we can factorize is the probability that detector A 0 is triggered in the early time bin, given that f pairs have been produced in the first pump pulse. For f = 1 it can be simply obtained by tracing the path of the photon through the setup to calculate the total photon transmission probability T A0,E to the detector including the beam splitter transmissions, propagation losses, and the detector efficiency. The value for p(B 0,L |s) is similarly obtained: For f = 0 and s = 0, the detection probability is given by the probability for a noise count, which is equivalent to setting T = 0 in eqs. (35) and (36). Finally, the raw time error coincidence probability p(A 0,E , B 0,L ) is obtained from the simulation as well. Thereby, all probabilities can be calculated that are required to obtain p(f, s|A 0,E , B 0,L ) from eq. (34). In fig. 7 we show p(f, s|A 0,E , B 0,L ) as a function of the produced total mean number of photon pairs per repetition cycle µ and for different combinations of f and s.
For small values of µ below 1 × 10 −3 , the largest contribution to the time basis error coincidence probability p(A 0,E , B 0,L ) comes from coincidences between dark counts. Coincidences involving one noise count and one photon pair are relevant for a wide range of µ-values from µ < 1 × 10 −4 to µ > 1. The difference between the cases f, s = 1, 0 and f, s = 0, 1 is mainly due to different transmission losses from the fiber lengths of 26.9 km to Alice  fig. 7 shows that in the range around µ ≈ 0.1 and above, effects from more than one photon pair become relevant. In the range of µ = 0.034 where the QKD setup is operated, a majority of 53 % of the A 0,E , B 0,L -coincidences occur when each pump pulse produces one photon pair. This underpins the fact that a simulation of the QKD system should take into account effects from multiple photon pairs to produce accurate results for the QBERs.

V. DISCUSSION
In the previous sections, we have shown the versatile applications of our expressions for the calculation of the photon statistics of GSs. To the best of our knowledge, we are the first to derive expressions for cumulative probabilities, moments, and factorial moments for general multi-mode GSs. For the PND p(n), alternative calculation methods can be found in the literature. One approach requires 2n derivatives to evaluate the photon number n. For that, Γ and d are expressed in the complex (â,â † )-basis instead of the real (x,p)-basis via σ Q = Ω † Γ Ω/2 + 1 1/2 and τ = Ω † d. In our notation, the expression reads 20,31 . The exponential to be differentiated has been recognized as the generating function for multivariate Hermite polynomials and therefore, the PND can be expressed in terms of these polynomials [99][100][101] . However, these polynomials are given by complex recursion formulas and the expressions are therefore considered difficult to evaluate for higher numbers of photons [102][103][104] .
For GSs with d = 0, eq. (37) can be rewritten in terms of the Hafnian function as where A S is derived from A by repeating rows and columns, depending on the number of photons to be detected in a particular mode 20,30,31,33 . For states with d = 0, a similar expression involves the loop Hafnian function 32,33,35,48 . For non-PNR detection, the probabilities involve the Torontonian and loop Torontonian function 30,36 . The software library The Walrus 105 provides software related to GBS and has implemented algorithms to evaluate these functions. In contrast to such highly specialized algorithms, we use the general-purpose tool of AD for our computations. Generating operators for the photon statistics are treated in textbooks such as Refs. 41 and 106. In Refs. 79 and 89, Mauerer et al. have applied AD to the generating function from eq. (30) to obtain the PND resulting from multiple two-mode squeezers. In Ref. 43, a probabilitygenerating function for TMSV states in terms of the covariance matrix has been presented. However, generating functions in combination with AD, have rarely been applied for practical calculations in the context of photon statistics. A possible reason is that the evaluation of higher-order derivatives is in general a resourceintensive computational task so that analytical expressions are preferred. But, for the calculation of the photon statistics, this is not an a-priori disadvantage, as the required computational resources for GBS computations scale exponentially even with optimized state-of-the-art algorithms 33,34 . It can nevertheless be expected that specially tailored algorithms yield performance benefits becoming relevant for higher photon numbers.
Often only the lowest few moments of probability distributions, such as to order 3 or 4 are considered, which are easily evaluated by our method. For the PND, we have been able to calculate photon numbers up to 10 to 12 without any optimizations. The simulations took a few minutes and required multiple GB of RAM. By tuning the parameters of PyTorch, by using different AD software or more powerful hardware, even higher numbers of photons can be simulated. In the field of GBS complexity research, PNDs for considerably higher numbers of photons have been computed on much more powerful computers. For example, GBS probabilities have been computed on a workstation with 96 CPUs for 50 photons in Ref. 32 or for up to 92 photons on a 100,000-core supercomputer in Ref. 34.
However, quantum optics applications often consider states with low mean photon numbers, for example when coincidences are considered.
An example is entanglement-based QKD where µ ≪ 1 is required to avoid quantum bit errors from coincidences between photons from different pairs. The QKD system we have modeled in section IV is operated with a mean photon pair number per pulse around µ = 0.034 (cf. Ref. 38). For such low mean photon numbers, only the first few photon numbers are of practical relevance as the probabilities of finding higher photon numbers decrease rapidly. Thus, for many applications, a computational limitation to low photon numbers is not a significant restriction.
Another method to calculate the PND of GSs has recently been presented in Ref. 63, where one of the major results written in our notation is the expression with Y = diag(w)⊕diag(w) and w s = √ y j . The authors of Ref. 63 derive it by applying a fanning-out transformation to infinitely many virtual non-PNR detectors, motivated by the intuition that in this limiting case, the probability that any of these virtual detectors receives more than one photon vanishes. Therefore, this approach formalizes the experimental approach to realize PNR detection by multiplexing multiple non-PNR detectors mentioned in section I. The expresssion from Ref. 63, given in eq. (39), can be rewritten as a special case of our eq. (7), which generalizes the result from Ref. 63 in multiple ways: it already incorporates detection efficiencies and noise counts and it can be applied to states with non-zero displacement vector d. Besides eq. (7), our derivation yields generating functions that enable the calculation of the moments, cumulative probabilities, density matrix elements, and factorial moments of the PND via eqs. (8) to (11) and eqs. (14) to (15). Furthermore, our expressions require, in contrast to eq. (37), only one derivative per photon number facilitating the numerical evaluation. While analytic expressions for the PND of single-mode photon-added and photon-subtracted GSs have been reported in Ref. 55-57, we have presented the more general eqs. (21) and (22) which cover the multi-mode case and which provide, besides the PND, also the cumulative probabilities, moments, and factorial moments.
Conceptually, our generating-function approach provides a new point of view on photon statistics that is different from the Hafnian approach closely related to graph theory, and also different from the approach using the fanning-out transformation. Therefore, it can be a starting point for further theoretical investigation. For example, the analytic generating functions could also be analyzed for the complexity of their computation. Another example is the analytical generating function we have presented for the single-mode displaced squeezed thermal state.
We have shown that the flexibility of our approach enables us to easily incorporate imperfections such as noise of arbitrary statistics, provided its generating functions are known. In Ref. 63 it is noted that in comparison to other formulas for PNR detection, such as eq. (37) or eq. (38), an advantage of eq. (39) is that modes entering the same detector are treated by the same derivative, whereas the alternative formulas treat all modes separately so that terms for all the possible distribution patterns of the photons over the modes in a detector have to be calculated separately. In Ref. 63, this fact has been used to incorporate the spectral modes into calculations of the Hong-Ou-Mandel visibility of SPDC sources. The same argument holds for our eqs. (7) to (11), which involve one differentiation variable per detector, independent of the number of modes entering it. This is because our formulas obey the usual rules for generating functions for probabilities and moments and, therefore, the convolution of probability distributions is simply represented by the multiplication of the generating functions.

VI. CONCLUSIONS
We have presented various generating functions for the photon statistics of multi-mode GSs in terms of the covariance matrix and displacement vector, from which the probabilities, moments, and factorial moments are retrieved by repeated AD. Common effects in experiments, such as noise, non-unity detection efficiency, and the joint detection of multiple modes can be easily taken into account. Furthermore, expressions for the generating functions of multi-mode photon-added and photonsubtracted GSs are derived, relating them to the generating functions of the underlying GSs, thus greatly extending the range of its possible applications.
We have implemented AD by using the software framework PyTorch to retrieve probabilities for photon numbers up to values of 12 on a desktop computer effortlessly, without any optimization for performance. The code for a simple example demonstrating the calculation of a PND using PyTorch can be found in our technical report Ref. 37.
As an application, we have modeled an entanglementbased system for QKD which we have recently developed and we have observed very good agreement between the simulated and measured sifted key rates and quantum bit error rates. Finally, we have used our PNR simulation of the setup to analyze the fraction of quantum bit errors due to the production of up to two photon pairs in the same repetition cycle.
Our expressions for the generating functions in eqs. (7) to (11) are formulated in terms of simple operations on the covariance matrix and displacement vector of GSs. Together with AD for the differentiation of the generating functions, this enables practical, easy-to-implement, versatile simulations of quantum-optical setups.
The data that support the findings of this study are available from the corresponding author upon request. with 1 1 denoting the S × S identity matrix.
The characteristic function of an operatorÔ, has 2S arguments that can be collected into the vector ξ T = (ξ T x , ξ T p ) = ξ x1 . . . ξ xS ξ p1 . . . ξ pS . The expectation value Ô w.r.t. a quantum state̺ can be written as 4,47 GSs are the states described by a Gaussian characteristic function [4][5][6]47 with a real, symmetric 2S × 2S covariance matrix Γ and a displacement vector d defined by 4 Hamiltonians linear or quadratic inâ andâ † map GSs to GSs 6 and can be expressed in matrix notation aŝ Here, Y = Y T as well as X = X † ensure thatĤ is Hermitian 7 . A unitary operationÛ 1,2 = e −iĤ1,2 transformsq according tô The characteristic function becomes χ̺′ (ξ) = tr Û̺Û † exp(iξ Tq ) = tr ̺ exp(iξ TÛ †qÛ ) , (A11) resulting in transformations of Γ and d 4-7 : Examples of such transformations are the phase shift of a single mode and the coupling of two modes by a lossless beam splitter.

Generating functions for probabilities and moments
The probability distribution of a non-negative integervalued random variable N can be encoded by a probability generating function as 107 h(y) = y N = ∞ n=0 p(N = n)y n (A14) 4 We use the convention from Refs. 4,7,47 , as our notation benefits from grouping x and p components together and the covariance of the vacuum states simply becomes Γ = 1 1. The expressions we derive will depend on Γ − 1 1. Other common conventions are to arrange x and p in alternating order 5,6 , to scale Γ by a factor of 1/2 6,20,31 , or to define Γ with complex elements with respect toâ instead ofq 7,20,31 . so that the probabilities p(n) and the mean N can be retrieved by The series in eq. (A14) converges at least for y on the unit disk because all p(n) are non-negative and n p(n) = 1. The cumulative probabilities p(N ≤ n) are encoded by a closely related generating function 107 : The moments M(k, µ) = (N − µ) k of the probability distribution can be encoded in a moment generating function M (µ, y) = exp[y(N − µ)] 107 : For µ = 0 or µ = N , M (µ, y) generates the raw moments N k or the central moments (N − N ) k .
The falling factorial moments n (k) = N (k) = N (N − 1) · · · (N − k + 1) are generated by (1 + y) N : Similarly, a generating function for the rising factorial moments n (k) = N (k) = (N + 1) · · · (N + k) can be defined 41,49 : In eq. (A23) the following relation is used 108 : In the derivation of the generating functions for multimode GSs, we use two important properties of such generating functions. First, multivariate probability distributions are encoded by multivariate generating functions. For example, the bivariate distribution p(n 1 , n 2 ) is generated by h(y 1 , y 2 ) = n1,n2 p(n 1 , n 2 )y n1 1 y n2 2 . Second, the probability-generating or momentgenerating function for the sum of random variables is the product of the probability-generating or momentgenerating functions of the individual random variables 107 . For example, the probability p(n) to detect n photons in total, distributed over several modes can be calculated from the product of the generating functions h m (y) of all individual modes, sharing the same parameter y: Note that when multiplying the generating functions for cumulative probabilities and falling factorial moments, the power series in N is multiplied by the prefactor (1 − y) −1 only once, not for each factor individually.
Appendix B: Derivation of generating functions for the matrix elements and photon statistics 1. Derivation of the relation between g(u,v,w) and matrix elements In this section we derive expressions relating the matrix elements of̺ to g(u, v, w) = ĝ(u, v, w) from eq. (5) for g(u, v, w) = :exp(uâ+vâ † −wâ †â ):. We expand the quantum state in the overcomplete basis of coherent states |α as̺ = C P (α) |α α| d 2 α with the Glauber-Sudarshan P-function P (α) of the state. The expectation value of a normally ordered function of creation and annihilation operators F (â † ,â) can be calculated by using the optical equivalence theorem 41,106,109 tr(̺F (â † ,â)) = C P (α)F (α * , α) d 2 α . (B1) Applying this result to the expectation value ofĝ yields We obtain the matrix elements of̺ in the basis of coherent states and number states: Our goal is to evaluate the derivatives numerically. For m = n + ∆ m and ∆ m > 0, the number of derivatives can be reduced from n + m to m by using which is again directly evident from the phase space integral. Similar to eq. (B5), m + ∆ n |̺|m can be calculated for n > m by swapping the roles of u and v. Equation (B5) is especially useful for matrix elements close to the diagonal as it then considerably reduces the number of derivatives.
We insert the expression as a coefficient into a generating function, apply the relation from eq. (A25) and insert n|̺|n = :exp(−N )N n /n!: from eq. (1): In general, a detector can be described by a positive operator-valued measure (POVM), i.e. a set of positive operators {Π i } with iΠ i =1 1 so that the probability to obtain result i is p(i) = Π i . Our detectors are non-PNR detectors, i.e. they can only yield two detection results: "click" and "no-click". In our QKD setup, we use the BBM92 protocol with time-bin entanglement. The detection events of the entangled photons are therefore cast in an early, central, and late time-bin (cf. section IV). Due to the dead time being much longer than the time bins and the time separating them, the detector cannot click in multiple time bins of the same repetition cycle. For our detectors, we have therefore four mutually exclusive results: Detection in the early (E), central (C), and late (L) time bin as well as no click (no). The probability for a detector click in a particular interval is the probability that non-zero clicks are registered, multiplied by the probability p on that the detector is not in the dead time. Hence, for the POVM elements we obtain 1 1 =Π E +Π C +Π L +Π no witĥ Here, we abbreviate the index N E,C,L = N , indicating all time bins together. The probability that the detector is live, i.e. not during its dead time τ dead at a given point in time, is given by p on = 1 − p off . The operatorsΠ NE=0 in eq. (C2) andΠ NE,C=0 in eq. (C3) take into account that due to the dead time, a click switches the detector off in the subsequent time bins of the same repetition cycle. To include effects from afterpulses and dark counts occurring with the dark count rate r dark and to calculate p on , we first consider the simple case of a detector showing some Poissonian noise ν but exhibiting no dead time. We define the noise rate r noise = r dark + p ap r, click rate r = p click f rep and click probability with ν = r noise /f rep . Here, f rep is the repetition rate of the photon pair source, i.e. ν and p click are defined for a time interval of one repetition cycle. The afterpulse probability p ap is the probability that a click triggers a subsequent uncorrelated click. Here, only modes of the covariance matrix are kept in Λ which enter the detector, while the columns and rows corresponding to all other modes are removed. This means that on the one hand r noise depends on the click rate and on the other hand the click rate also depends on the noise. A selfconsistent value for r noise can be obtained by solving for r noise . Linearizing the exponential for r noise /f rep ≪ 1 yields This approximation is justified because the values for r noise /f rep are in the order of 1 × 10 −5 for our setup operated at values of µ ≈ 0.034. From eq. (C5), we obtain the click rate r including afterpulses. The noise parameter for detection in a time bin of width ∆T = 1 ns is then given by ν time bin = r noise ∆T .
Finally, we consider the effect of the dead time τ dead . Thus, we introduce the measured click rate r m blocking a fraction p off = r m τ dead of the acquisition time. We can now calculate p on = 1 − r m τ dead . Furthermore, the measured click rate is simply the product r m = p on r leading to p on = 1/(1 + rτ dead ), which can be directly calculated from the click rate.
For coincidences between detectors, we distinguish between raw and exclusive coincidences. By coincidences, we mean events where at least two detectors click in the same repetition cycle. While raw coincidences only consider that two detectors click, exclusive coincidences also require that the two other detectors do not click. For QKD, we use only those events for which an unambiguous bit value can be derived, i.e. only exclusive coincidences. Note that for the security of the key exchange, it is recommended that the participant observing clicks in both detectors randomly assigns one of the values 2,110 , but for the sake of simplicity of the simulation, we work with exclusive coincidences which exclude such double-clicks. The exclusive coincidence probabilities, for example between detectors A 0 in time bin i and B 0 in time bin j with i, j ∈ {E, C, L}, are given by p A0,i,B0,j , excl. = Π A0,iΠB0,jΠA1,noΠB1,no . (C8) As each detection operator consists of two terms (cf. eq. (C5)), expanding eq. (C8) yields 16 terms.
To simplify the comparison of the simulation with the measured key rates, we approximate eq. (C4): This approximation is valid when p off ≪ 1 or when Π N =0 ≈1 1, which is the case because the click probability is low due to the detection efficiency of η = 20 % and due to the transmission losses. Using this approximation reduces the number of terms from 16 to 4. As an example, consider exclusive coincidences between detector A 0 in the early time bin and B 0 in the late time bin.
By using eq. (C9), the exclusive coincidence probability becomes ≈ p A0,on p B0,on exp(−ν B0,E,C − ν A1 − ν B1 ) In the experiment, the photons have been separated by WDM with an arrayed-waveguide grating, for which we have measured the wavelengthdependent transmission τ (ω) of Alice's and Bob's channels C A and C B with channel widths ∆. In the simulation, we have included the averaged transmissions τ A/B = ∆ −1 C A/B τ (ω) dω into the losses L A and L B . Due to the photon frequency correlation, the average transmission probability for a photon pair is not τ A τ B but τ pair = ∆ −1 CA τ (ω)τ (2ω 0 − ω) dω. We, therefore, multiply the key rate by the correction factor τ pair /(τ A τ B ) ≈ 1.5 to take the spectral dependence of the channel loss into account.