Quasi-factorization and Multiplicative Comparison of Subalgebra-Relative Entropy

Purely multiplicative comparisons of quantum relative entropy are desirable but challenging to prove. We show such comparisons for relative entropies between comparable densities, including the relative entropy of a density with respect to its subalgebraic restriction. These inequalities are asymptotically tight, approaching known, tight inequalities as the perturbation size approaches zero. Based on these results, we obtain a kind of inequality known as quasi-factorization or approximate tensorization of relative entropy. Quasi-factorization lower bounds the sum of a density's relative entropies to several subalgebraic restrictions in terms of its relative entropy to their intersection's subalgebraic restriction. As applications, quasi-factorization implies uncertainty-like relations, and with an iteration trick, it yields decay estimates of optimal asymptotic order on mixing processes described by finite, connected, undirected graphs.


Introduction
As relative entropy is at the core of quantum information theory, bounds and comparisons often yield results across many operational contexts. General bounds on relative entropy are however difficult to prove, as this quantity can be infinite and involves nonlinear functions of noncommuting operators. A motivation for such inequalities is to derive a strong form of decay estimate known as a modified logarithmic Sobolev inequality (MLSI) for quantum Markov semigroups. Decay inequalities characterize decoherence, noise, thermal relaxation, and related processes. Proving MLSI, however, requires inequalities that are non-trivial for arbitrarily small relative entropies: any density-independent, additive constant that holds in general will cause the inequality to reduce to positivity when the involved relative entropies approach zero. In this paper, we derive asymptotically tight relative entropy inequalities without additive constants. We use these inequalities to prove a tightening of the entropic uncertainty principle and asymptotically optimal MLSI inequalities for processes described by finite graphs.
The strong subadditivity (SSA) of von Neumann entropy is a quintessential entropy inequality. SSA's impacts range from quantum Shannon theory [1] to holographic spacetime [2]. Lieb and Ruskai proved SSA in 1973 [3]. A later form by Petz in 1991 [4] generalizes from subsystems to subalgebras. Let M be a von Neumann algebra and N a subalgebra. Associated with N is a unique conditional expectation E_N that projects an operator in M onto N. In tracial algebras, we denote by E_N the unique conditional expectation from M to N that is self-adjoint with respect to the trace. We call subalgebras S, T ⊆ M a commuting square if E_S E_T = E_T E_S = E_{S∩T}, where E_{S∩T} is the conditional expectation onto their intersection. The Umegaki relative entropy for matrices is defined by D(ρ‖σ) = tr(ρ(ln ρ − ln σ)) when supp(ρ) ⊆ supp(σ) and is infinite otherwise. In terms of Umegaki's relative entropy, SSA becomes: Theorem 1.1 (Petz's Conditional Expectation SSA). Let E_S, E_T be conditional expectations to subalgebras S, T ⊆ M. If S and T form a commuting square, then for all densities ρ, D(ρ‖E_S(ρ)) + D(ρ‖E_T(ρ)) ≥ D(ρ‖E_{S∩T}(ρ)). As noted in [5], Theorem 1.1 implies SSA and an uncertainty relation for mutually unbiased bases, connecting these concepts. In [5], it is shown that the condition [E_S, E_T] = 0 is necessary as well as sufficient for SSA in finite dimensions. An additively relaxed version, as in equation (2), can hold for all ρ when [E_S, E_T] ≠ 0 only if its constant c > 0. When conditional expectations don't commute, there are nonetheless inequalities approximating strong subadditivity. Perturbations of entropy and its inequalities often take additive forms [6,7,8,9]. If entropies are sufficiently small, then equation (2) becomes trivial and weaker than the statement that the left hand side is non-negative. Multiplicative bounds on relative entropy exist [10,11,12], but these are constrained by the infinite divergence of entropy relative to states with smaller support. The special case of D(ρ‖1/d), however, has range [0, ln d] on a system of dimension d.
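The definitions above are easy to probe numerically. The following sketch (assuming NumPy; `mlog`, `D`, and `ptrace` are hypothetical helper names) computes the Umegaki relative entropy and checks Theorem 1.1 for the commuting square formed by the two tensor-factor subalgebras of a two-qubit system, whose intersection conditional expectation sends every density to the complete mixture 1/4.

```python
import numpy as np

def mlog(A):
    # Matrix logarithm of a strictly positive Hermitian matrix via eigendecomposition.
    w, V = np.linalg.eigh(A)
    return V @ np.diag(np.log(w)) @ V.conj().T

def D(rho, sigma):
    # Umegaki relative entropy D(rho || sigma) = tr(rho (ln rho - ln sigma));
    # assumes full-rank inputs so that supp(rho) <= supp(sigma).
    return float(np.real(np.trace(rho @ (mlog(rho) - mlog(sigma)))))

def ptrace(rho, keep):
    # Partial trace of a 4x4 two-qubit density; keep = 0 or 1 selects the surviving factor.
    r = rho.reshape(2, 2, 2, 2)
    return np.trace(r, axis1=1, axis2=3) if keep == 0 else np.trace(r, axis1=0, axis2=2)

# E_S, E_T: conditional expectations onto M_2 (x) 1 and 1 (x) M_2. They commute,
# and their intersection algebra is the scalars: E_{S cap T}(rho) = tr(rho) 1/4.
E_S = lambda rho: np.kron(ptrace(rho, 0), np.eye(2) / 2)
E_T = lambda rho: np.kron(np.eye(2) / 2, ptrace(rho, 1))

# A full-rank two-qubit density: a Bell state mixed with a non-uniform diagonal.
phi = np.zeros((4, 1)); phi[0, 0] = phi[3, 0] = 1 / np.sqrt(2)
rho = 0.9 * (phi @ phi.T) + 0.1 * np.diag([0.4, 0.3, 0.2, 0.1])

lhs = D(rho, E_S(rho)) + D(rho, E_T(rho))
rhs = D(rho, np.eye(4) / 4)
```

In this commuting-square case the SSA statement reduces to subadditivity H(A) + H(B) ≥ H(AB), so lhs ≥ rhs.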
More generally, for a doubly stochastic conditional expectation E, we refer to the function on a density ρ given by D(ρ‖E(ρ)) as subalgebra-relative entropy. Having bounded range, subalgebra-relative entropy may support much stronger inequalities than general relative entropy. D(ρ‖1/d) is a special case of subalgebra-relative entropy, in which the algebra is C1, that of the complex scalars. Subalgebra-relative entropy appears as a measure of resources such as quantum coherence [13] and reference frame asymmetry [14,15,16]. One may write the conditional mutual information as D(ρ‖E_S(ρ)) + D(ρ‖E_T(ρ)) − D(ρ‖E_{S∩T}(ρ)) for subsystem-restricted algebras S and T, which then applies to derived entanglement measures [17,18,19]. Subalgebra-relative entropy is a natural measure of decoherence from processes that are self-adjoint with respect to the Hilbert-Schmidt inner product [20,21]. The maximum subalgebra-relative entropy for a given conditional expectation connects closely to the theory of subalgebra indices [22]. Hence subalgebra-relative entropy is fundamental to quantum information, motivating inequalities on this form.
More broadly, relative entropy should be comparable between densities that are comparable up to constants in the Loewner order, where these constants determine the strength of such comparisons. Again, there are simple ways to obtain bounds with additive corrections of this form. Multiplicative comparisons are more challenging but more powerful in many settings, especially when all of the relative entropies involved could be arbitrarily small. As an example in Section 4.1, we show an uncertainty-like relation for incompatible projective measurements that remains non-trivial even for states approaching complete mixture. We contrast this with conventional entropic uncertainty relations, which usually reduce to positivity of relative entropy in these circumstances.
The notion of quasi-factorization for classical entropies was introduced and shown in [23], yielding a multiplicative generalization of strong subadditivity for non-commuting conditional expectations. Several works consider quantum generalizations or related properties in a variety of settings [24,25,26]. This form of inequality is also known as approximate tensorization [27,28]*, including a "strong" form that is fully multiplicative and a "weak" form that includes an additive correction term. In this paper, we refer primarily to the multiplicative form, which we generalize to any finite number of subalgebras: Definition 1.2 (Multiplicative Quasi-factorization). Let {E_j : j ∈ 1...J}, J ∈ N, be a set of conditional expectations and E the conditional expectation to their intersection algebra. We say that this set satisfies a strong quasi-factorization (SQF, or specifically (α_j)-SQF) if ∑_j α_j D(ρ‖E_j(ρ)) ≥ D(ρ‖E(ρ))
for some (α_j > 0)_{j=1}^J. We say that it satisfies complete, strong quasi-factorization (CSQF) if {E_j ⊗ 1} has SQF for any finite-dimensional extension by an auxiliary system, where 1 acts as the identity on that auxiliary system. When α_j = α_l for all l, j ∈ 1...J, we may write α-(C)SQF.
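As a concrete instance of Definition 1.2, the pinchings onto the computational and Hadamard bases of a qubit have scalar intersection algebra and do not commute; the entropic uncertainty relation for mutually unbiased bases noted in [5] suggests they satisfy SQF with α = 1. A minimal numerical check (assuming NumPy; this is an illustration, not the paper's proof technique):

```python
import numpy as np

def mlog(A):
    # Matrix logarithm of a strictly positive Hermitian matrix.
    w, V = np.linalg.eigh(A)
    return V @ np.diag(np.log(w)) @ V.conj().T

def D(rho, sigma):
    # Umegaki relative entropy for full-rank densities.
    return float(np.real(np.trace(rho @ (mlog(rho) - mlog(sigma)))))

def pinch(rho, U):
    # Conditional expectation onto the algebra diagonal in the basis given by U's columns.
    r = U.conj().T @ rho @ U
    return U @ np.diag(np.diag(r)) @ U.conj().T

H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)  # Hadamard basis, unbiased to computational

rng = np.random.default_rng(7)
worst = np.inf
for _ in range(200):
    X = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
    rho = X @ X.conj().T
    rho /= np.trace(rho).real
    # Slack in the candidate 1-SQF inequality: D_1 + D_2 - D(rho || 1/2).
    slack = (D(rho, pinch(rho, np.eye(2))) + D(rho, pinch(rho, H))
             - D(rho, np.eye(2) / 2))
    worst = min(worst, slack)
```

A non-negative `worst` over many random densities is consistent with 1-SQF for this mutually unbiased pair.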
The original version of this paper, [29, v1], showed such a bound for subalgebras with scalar intersection. In that version, it was left as a conjecture that the dimension could be replaced by a subalgebra index, the result generalized to arbitrary sets of subalgebras, and the inequality made tensor-stable.
Later, a new technique developed by Gao and Rouzé [30] showed general, tensor-stable, multiplicative quasi-factorization with constant determined by a subalgebra index C, rather than the system's dimension. Their result shows the existence of quasi-factorization for all finite-dimensional quantum systems. Incorporating one of the techniques of that result, we find a strengthened quasi-factorization that is asymptotically tight in the following sense: for a pair of conditional expectations E_1, E_2 with intersection conditional expectation E such that E_1 E_2 − E → 0, our bound approaches strong subadditivity. Our results have an asymptotic α ∼ O(ln C) dependence on the index for large C, improving on the asymptotic dependence from preceding versions of [30]. We find a strong, complete quasi-factorization like that in [30], which also has asymptotic tightness like in [28] or [29, v1], and logarithmic index scaling like that in [29, v1]. Furthermore, we explicitly show this for any finite number of conditional expectations.
* We thank the authors of [28] for access to an early draft that considered such an inequality in parallel with the writing of this manuscript. Their present version derives a comparable, multiplicative form in some cases.
Decay and decoherence are some of the most vexing challenges to quantum technology. We say that a semigroup (Φ_t)_{t=0}^∞ with fixed point conditional expectation Φ_∞ has MLSI with constant λ (λ-MLSI) if for all t ∈ R_+ and all densities ρ, D(Φ_t(ρ)‖Φ_∞(ρ)) ≤ e^{−λt} D(ρ‖Φ_∞(ρ)). MLSIs were introduced for classical systems in [31,32] and for quantum systems in [33], then recalled in [34]. MLSI was inspired by the earlier notion of the logarithmic Sobolev inequality [35,36], which does not hold as generally [20]. As defined in [21], a semigroup has λ-CMLSI if the same decay inequality holds for Φ_t ⊗ 1_B, for all extensions by an auxiliary system B and joint densities ρ on the original system and B. As a primary application, (C)SQF allows us to derive concrete (C)MLSI constants for quantum Markov semigroups. These do not follow from additive perturbations of strong subadditivity such as Equation (2). It is shown in [37] that CMLSI upper bounds capacities of quantum channels, which are famously difficult to calculate due to superadditivity and hardness of numerics for high-dimensional quantum entropies. Furthermore, we note in that work that CMLSI implies tensor-stable decoherence time estimates, an important problem for quantum computing and memory.
The culminating result of this paper, Theorem 1.8, derives CMLSI for semigroups described by finite, undirected graphs as represented on a basis in Hilbert space. This Theorem addresses an open problem [21, Remark 7.5]. This example illustrates a broader principle known as transference, in which bounds on mixing rates of classical channels imply relative entropy decay rate bounds for quantum channels with related structure. Transference was used previously in [21,37]. The current work extends the idea to imply tensor-stable relative entropy inequalities based on order inequalities from classical vector spaces. The same principle applies to channels constructed from finite subgroups of the unitary group.

Primary Contributions.
A quantum channel is a completely positive, trace-preserving map. By E we may denote a channel or a conditional expectation. By E_σ, E_{σ*} we respectively denote a conditional expectation weighted by state σ as in Section 3 and its predual with respect to the trace. We write E_{N,σ} and E_{N,σ*} for a weighted conditional expectation to subalgebra N in order to explicitly emphasize the subalgebra. By D(·‖·) we denote the relative entropy and by H(·) the von Neumann entropy. By 1 we denote the identity matrix. For systems A, B, C, ... or von Neumann algebras M, N, ... we denote by |A| or |M| the dimension. The subsystem entropy is denoted H(A)_ρ := H(ρ_A), where ρ_A denotes the restriction to subsystem A of a multipartite state ρ on A ⊗ B ⊗ C ⊗ .... The state 1/|A| or 1/|M| is the respective complete mixture on A or M. For a pair of densities ρ, σ, we use ρ ≥ σ (respectively ≤, >, <) to denote the Loewner order. For a pair of channels Φ, Ψ, we use the analogous ordering of outputs for all input densities, including inputs extended via an auxiliary system B. All results of this paper assume finite-dimensional densities. Entropies should be read as using the natural logarithm, though when an inequality multiplicatively relates entropies to other entropies and logarithm-containing quantities, the inequality holds as long as the same base is taken for all logarithms.
As Proposition 2.2, we prove that for any densities ρ, σ of the same dimension d such that ρ ≻ σ (ρ majorizes σ) and any ζ ∈ [0, 1], D(ρ‖(1 − ζ)1/d + ζρ) ≤ D(ρ‖(1 − ζ)1/d + ζσ). Combining Proposition 2.2 with estimates of the derivatives of relative entropy with respect to complete mixture and the iteration technique of Section 3, we obtain a multiplicative bound on relative entropy to complete mixture as Theorem 2.5. This result is asymptotically tight in that we may take a → 1 as ζ → 0.
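The majorization hypothesis of Proposition 2.2 can be probed numerically on commuting (classical) densities. The sketch below assumes the reading ρ_ζ = (1 − ζ)1/d + ζρ suggested by the flattening argument of Subsection 2.1, and tests pairs p ≻ q obtained by mixing p toward uniform (assuming NumPy):

```python
import numpy as np

def Dc(p, q):
    # Classical relative entropy D(p || q); supp(p) <= supp(q) assumed.
    m = p > 0
    return float(np.sum(p[m] * (np.log(p[m]) - np.log(q[m]))))

rng = np.random.default_rng(3)
n = 8
u = np.ones(n) / n
worst = np.inf
for _ in range(200):
    p = rng.random(n); p /= p.sum()
    s = rng.random()
    q = (1 - s) * p + s * u          # q is majorized by p by construction
    zeta = rng.random()
    p_z = (1 - zeta) * u + zeta * p  # p_zeta
    q_z = (1 - zeta) * u + zeta * q  # q_zeta, playing the role of sigma_zeta
    # Proposition 2.2 (under the assumed reading): D(p || p_zeta) <= D(p || q_zeta).
    worst = min(worst, Dc(p, q_z) - Dc(p, p_z))
```

A non-negative `worst` over random trials is consistent with the stated comparison.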
In Section 2.2, we use the functional calculus as in [30] to generalize the relative entropy comparisons of Section 2.1 from complete mixture to arbitrary conditional expectations. Rather than directly using the integral form of relative entropy for desired inequalities as in [30], we use similar techniques to derive a perturbation result comparing relative entropies of related densities. This perturbative result, Theorem 1.3, is reminiscent of the triangle inequality for norms, allowing one to upper bound D(ρ‖(1 − ζ)σ + ζη) in terms of D(ρ‖σ) and D(ρ‖η). Indeed, the primary idea of this proof is that by comparing the relative entropy to a weighted 2-norm, we may transfer the triangle inequality from the norm to entropy up to some constant factors. Theorem 1.3 (Triangle-like Relative Entropy Comparison). Let ρ, σ, ω be densities such that (1 − ζ)σ ≤ ω ≤ (1 + ζ(c − 1))σ for constants ζ ∈ (0, 1) and c ≥ 1. Let η := (ω − (1 − ζ)σ)/ζ, so that ω = ζη + (1 − ζ)σ. Assume that supp(ρ) ⊆ supp(σ). Then η is a density, and D(ρ‖ω) is bounded above and below in terms of D(ρ‖σ) and D(ρ‖η). Theorem 1.3 is asymptotically tight in approaching the equality D(ρ‖σ) = D(ρ‖ω) as ζ → 0 when D(ρ‖σ) ≥ D(η‖σ) and the Theorem's conditions are satisfied. Theorem 1.3's connection to quasi-factorization and subalgebra-relative entropy is apparent via Corollary 2.15: for any density ρ, quantum channels E, Φ such that ΦE = E, and constants ζ ∈ (0, 1) and c ≥ 1 for which Φ(ρ) satisfies the Loewner bounds of Theorem 1.3 relative to E(ρ), it holds that D(ρ‖E(ρ)) ≤ β_{c,ζ} D(ρ‖Φ(ρ)). As with the Theorem, Corollary 2.15 is asymptotically tight in that β_{c,ζ} → 1 as ζ → 0. The more commonly stated criterion, that (1 − ε)E(ρ) ≤ Φ(ρ) ≤ (1 + ε)E(ρ), implies the conditions of Corollary 2.15 with ζ = ε, c = 2. Conversely, we show as Proposition 2.14 that with additional assumptions, D(ρ‖Φ(ρ)) can be upper bounded in terms of D(ρ‖E(ρ)). Combining Corollary 2.15 with the iterative technique of Section 3 forms the base of a quasi-factorization result: Theorem 1.4. Let {(N_j, E_{j*}) : j ∈ 1...J}, J ∈ N, be a set of von Neumann algebras and associated (predual) conditional expectations within von Neumann algebra M and weighted respectively by densities (σ_j).
Let E be a channel such that EE_{j*} = E_{j*}E = E for each E_{j*}.
Let S = ∪_{m∈N} {1...J}^⊗m be the set of finite sequences of indices. For any s ∈ S, let E_s denote the composition E_{j_1*}...E_{j_m*} for s = (j_1, ..., j_m). Let µ : S → [0, 1] be a probability measure on S and k_{j,s} upper bound the number of times E_{j*} appears in each sequence s. If µ and (k_{j,s}) satisfy the Theorem's weighting conditions, then E is a projection, and for β_{c,ζ} given in Corollary 2.15 and all input densities ρ (including those with arbitrary extensions to auxiliary systems), the stated quasi-factorization bound holds. Recall the subalgebra indices as considered in [22,30] and originally by Pimsner and Popa [38] as a finite-dimensional analog of the Jones index [39]. When E is the (doubly stochastic) conditional expectation from M to N, and Φ is a channel that leaves N invariant, then for any density ρ, Φ(ρ) ≤ C(M : N)E(ρ), and the bound holds up to arbitrary extensions with C_cb(M : N) replacing C(M : N). As Corollary 3.5, we show explicit index-based bounds following Theorem 1.4. Furthermore, this Corollary shows how to obtain an α-(C)SQF constant scaling logarithmically with the index. Though c can be upper bounded by the index as in Corollary 3.5, sometimes there is a better upper bound based on specific knowledge of the channels involved. As explained in Section 4.2, Theorems 1.3 and 1.4 are naturally strong in contexts reminiscent of transference, an idea present in [8,21,40,37]. Transference may compare quantum channels through analogous classical channels.
† After a version of Theorem 1.4 appeared in v3 of this paper, [30] added comparable results (Theorems 5.3 & 5.4, Corollary 5.5 in that paper). Nonetheless, the techniques of our Section 3 originally appeared in v1 of this paper. It is these techniques that yield both the logarithmic dependence on c described in Remark 1.5 and the extension from two to many conditional expectations. Furthermore, Theorem 1.4 has the advantage of approaching strong subadditivity in the appropriate commuting square limits while yielding a logarithmic (or no) index dependence otherwise.
When several quantum channels are weighted averages of the same unitary conjugations, we may often derive Loewner order inequalities by studying how operations and compositions affect the weights. These inequalities naturally map to the conditions of Theorems 1.3 and 1.4.
Though it might not be obvious that all sets of subalgebras satisfy equation (4) for some finite k and ζ > 0, Proposition 2.16 shows that when the operator norm distance between a unital channel Φ and a conditional expectation E acting on half a Bell pair is sufficiently small, and EΦ = E, Φ must be a convex combination of E with another channel Ψ such that EΨ = E. As shown in [41], it is always possible to find a convex combination of chains of conditional expectations from a finite set that approaches the conditional expectation to their intersection. Hence: quasi-factorization is asymptotically tight. In particular, consider a set of continuously parameterized (predual) conditional expectations (E_{j*}^θ) whose compositions approach E(ρ) in diamond norm for all input densities ρ as θ → 0; then {E_j^θ} has α_θ-(C)SQF with α_θ → 1. When J = 2, such an arrangement approaches strong subadditivity as the conditional expectations approach a commuting square. Proposition 2.16 and Theorem 1.4 yield a concrete continuity bound on the convergence rate.
As a primary application, quasi-factorization allows us to combine MLSI estimates. Let Φ^t be a family of quantum channels in dimension d parameterized by t ∈ R_+, such that Φ^s ∘ Φ^t = Φ^{s+t} for all s, t ∈ R_+. This family of channels is thereby a semigroup under composition, and there exists a Lindbladian generator given by L = lim_{t→0}(1 − Φ^t)/t such that Φ^t = e^{−tL}. As introduced above, we say that L has λ-MLSI if the decay estimate holds for any input density ρ. A Lindbladian L has λ-CMLSI (complete, modified logarithmic Sobolev inequality) if the same holds for any bipartite density ρ_AB including a finite-dimensional auxiliary system. We (re)prove: Proposition 1.6. Let {Φ_j^t : j ∈ 1...J}, J ∈ N, be self-adjoint quantum Markov semigroups such that Φ_j^t = exp(−L_j t) with fixed point conditional expectation E_{j*} = lim_{t→∞} Φ_j^t for each j, weighted respectively by (σ_j). Let E_{σ*} be the weighted intersection fixed point conditional expectation, assuming the E_{j*} are compatibly weighted so that it exists. Let Φ^t be the semigroup generated by ∑_j L_j. If {E_j} has {α_j}-(C)SQF, and Φ_j^t has λ-(C)MLSI for each j, then Φ^t has λ-(C)MLSI. Proposition 1.6 is not surprising, and the historical use of quasi-factorization in proving modified logarithmic Sobolev inequalities relies on essentially equivalent results. A simple proof of results like Proposition 1.6 with Φ^0 = 1 emerges from the Fisher information formulation of MLSI detailed in [21], as the Fisher information of a sum of Lindbladians is equal to the sum of their respective Fisher informations. An alternate form of proof appears in Appendix A. Proposition 1.6 can also be useful when conditional expectations do commute, in which case quasi-factorization reduces to SSA. When multiple subsets of constituent conditional expectations lead to the same intersection algebra, α < 1 is possible. The use of bipartite quasi-factorization to prove MLSI appears in [28] and [42], so similar methods exist in the literature. Remark 1.7.
As shown in [28, Section 3.2], one can convert (C)SQF inequalities from the doubly stochastic setting to non-trivially weighted conditional expectations. The results of [43] compare (C)MLSI constants of Lindbladians with non-tracial invariant states to those with tracial invariant states, and the methods therein underlie the comparison in [28, Section 3.2]. One may also use Proposition 2.14 together with Theorem 2.5 or 1.3 similarly.
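For intuition on the decay form of λ-MLSI defined above, consider the completely depolarizing semigroup Φ^t(ρ) = e^{−t}ρ + (1 − e^{−t})1/d, whose fixed point is the complete mixture. Joint convexity alone gives D(Φ^t(ρ)‖1/d) ≤ e^{−t}D(ρ‖1/d), i.e. 1-MLSI in decay form; this toy case is far simpler than the graph semigroups treated later. A sketch assuming NumPy:

```python
import numpy as np

def mlog(A):
    # Matrix logarithm of a strictly positive Hermitian matrix.
    w, V = np.linalg.eigh(A)
    return V @ np.diag(np.log(w)) @ V.conj().T

def D(rho, sigma):
    # Umegaki relative entropy for full-rank densities.
    return float(np.real(np.trace(rho @ (mlog(rho) - mlog(sigma)))))

d = 3
rng = np.random.default_rng(1)
X = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = X @ X.conj().T
rho /= np.trace(rho).real
mix = np.eye(d) / d

# Depolarizing semigroup: Phi_t(rho) = e^{-t} rho + (1 - e^{-t}) 1/d.
decay_ok = all(
    D(np.exp(-t) * rho + (1 - np.exp(-t)) * mix, mix)
    <= np.exp(-t) * D(rho, mix) + 1e-9
    for t in (0.1, 0.5, 1.0, 2.0)
)
```

Here `decay_ok` confirms the exponential relative entropy decay at the sampled times.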
We demonstrate uses of quasi-factorization in two primary examples. First, we show asymptotically tight, uncertainty-like relations for pairs of measurement bases. In particular, when A is a d-dimensional subsystem of bipartite system AB with respective matrix algebras A and B, and S, T ⊆ A correspond respectively to measurement bases {|i_S⟩ : i = 1...d} and {|i_T⟩ : i = 1...d} such that ξ = min_{i,j} |⟨i_S|j_T⟩|² > 0, quasi-factorization implies an uncertainty-like inequality for any ε ≤ 1 − dξ, where β_{d,ε} is as in Corollary 2.15. This form of inequality, detailed in Subsection 4.1, strengthens the usual entropic uncertainty principle for highly mixed states. More broadly, quasi-factorization yields uncertainty-like inequalities between relative entropies to the invariant subalgebras of finite groups. Second, we use quasi-factorization to show new entropy inequalities and decay estimates for mixing channels described by finite groups and graphs. As detailed in Subsection 4.3, a finite, undirected graph G with n vertices can be represented on n-dimensional densities by conditional expectations given by E_{i,j}(ρ) = (1/2)(|i⟩⟨i|ρ|i⟩⟨i| + |j⟩⟨j|ρ|j⟩⟨j| + |i⟩⟨j|ρ|j⟩⟨i| + |j⟩⟨i|ρ|i⟩⟨j|) (8) for each pair (i, j) in G's edges. Using big-Ω notation to denote asymptotic order: Theorem 1.8. Let an m-regular, connected graph G with n vertices have subleading normalized adjacency matrix eigenvalue (also known as spectral gap) γ as defined in Theorem 4.3. Consider the conditional expectations of Equation (8), possibly in tensor product with an arbitrary, finite-dimensional auxiliary system. Let E_G denote the conditional expectation to the invariant subspace of these for all i, j. Then the relative entropy to E_G decays with a rate of asymptotic order determined by γ and ln n. This inequality is stable under tensor extensions by auxiliary systems. The Lindbladian has CMLSI with the same constant.
The technical version of Theorem 1.8 appears as Theorem 4.8.
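Equation (8), as printed, records the action of E_{i,j} within the {i, j} block. Taken verbatim, that block action is an idempotent, Hilbert-Schmidt self-adjoint map, which the sketch below checks numerically (assuming NumPy; `E_edge`, `ket`, and `hs` are hypothetical helper names):

```python
import numpy as np

def ket(i, n):
    # |i> as an n x 1 column vector.
    v = np.zeros((n, 1), dtype=complex)
    v[i, 0] = 1.0
    return v

def E_edge(rho, i, j):
    # Literal transcription of Equation (8):
    # (1/2)(|i><i| rho |i><i| + |j><j| rho |j><j| + |i><j| rho |j><i| + |j><i| rho |i><j|).
    n = rho.shape[0]
    ii = ket(i, n) @ ket(i, n).conj().T
    jj = ket(j, n) @ ket(j, n).conj().T
    ij = ket(i, n) @ ket(j, n).conj().T
    return 0.5 * (ii @ rho @ ii + jj @ rho @ jj
                  + ij @ rho @ ij.conj().T + ij.conj().T @ rho @ ij)

def hs(A, B):
    # Hilbert-Schmidt inner product <A, B> = tr(A^dag B).
    return np.trace(A.conj().T @ B)

rng = np.random.default_rng(2)
R = rng.normal(size=(5, 5)) + 1j * rng.normal(size=(5, 5))
Y = rng.normal(size=(5, 5)) + 1j * rng.normal(size=(5, 5))

# Idempotence (projection property) and self-adjointness w.r.t. Hilbert-Schmidt.
proj_ok = np.allclose(E_edge(E_edge(R, 1, 3), 1, 3), E_edge(R, 1, 3))
sa_ok = abs(hs(E_edge(R, 1, 3), Y) - hs(R, E_edge(Y, 1, 3))) < 1e-10
```

Both checks pass for the block action: it averages the (i, i) and (j, j) entries, mirroring the classical edge mixing that transference lifts to the quantum side.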
Remark 1.9. Let L_G be the degree-normalized Laplacian matrix corresponding to a finite, connected, undirected graph G with n vertices, and let 𝓛_G be the Lindbladian as constructed in Theorem 1.8. L_G generates a semigroup on l_1^n. For any probability vector p ∈ l_1^n, let density ρ ∈ S_1^n be such that p = diag(ρ); then L_G(p) = diag(𝓛_G(ρ)), where diag : S_1^n → l_1^n denotes restriction to the diagonal. Furthermore, if a density ρ̃ ∈ S_1^n ⊗ B has the partially diagonal form ρ̃ = ∑_{x∈1...n} |x⟩⟨x| ⊗ ρ_x, so will (𝓛_G ⊗ 1_B)(ρ̃) for any finite-dimensional extension B, so restriction to the diagonal remains bijective. In this way, Theorem 1.8 bounds CMLSI constants for finite graphs in a sense compatible with that of [44,21,40]. A similar argument holds for finite groups. Theorem 1.8 resolves the open problem of [21, Remark 7.5], showing that the fastest, regular expander graphs have CMLSI with constant no worse than one over logarithmic in n, in line with expectations based on classical mixing times of expanders and classical MLSI [45]. It also yields expected decay times for cyclic graphs. For any graph with γ(n) constant in n, Theorem 1.8 shows convergence in O(log_{γ(n)}(1/n)) time. This convergence time is believed to be of optimal asymptotic order in n, matching the best classical bounds in known cases. Subsection 2.1 proves Theorem 2.5, a special case of the more general Theorem 1.3 proven in Subsection 2.2. Section 3 proves Theorem 1.4. Section 4 describes applications that use quasi-factorization to tighten entropic uncertainty relations and to derive new inequalities on graphs and groups. Section 5 concludes with some open problems.
In Subsection 2.1, we show that when ω = 1/d, and ρ majorizes σ, there is a multiplicatively adjusted form of concavity-like relation. Recall that ρ majorizes σ (ρ ≻ σ) if ∑_{i=1}^k ρ_i ≥ ∑_{i=1}^k σ_i for every k, where (ρ_i) and (σ_i) are the respective eigenvalues of ρ and σ in non-increasing order. In Subsection 2.2, we show an analogous bound when ω = E(ρ) and σ = Φ(ρ) for channels Φ and E under certain conditions. These conditions are satisfied when E is a conditional expectation to an invariant subspace of Φ. We may for instance take σ pure, and ρ = (1 − ζ)1/d + ζσ. The final condition we add is that ρ majorizes σ (ρ ≻ σ), in which case we obtain Theorem 2.5.
The results of this subsection precede [30] and use a different approach from that of section 2.2. Many of the results herein are subsumed by results of that section at least up to constants. Nonetheless, we include this subsection as illustrative of a more computationally inspired line of proof. Furthermore, this method yields intermediate results of potentially independent interest and gives more intuition for the subsequent generalization.
Let ω = ρ_ζ. We alter ω via a cascading probability redistribution procedure consisting of the following steps, which transform it into a copy of σ_ζ: (1) Start with the index i set to 1.
Add δ to ω_j and subtract it from ∆. (c) If ∆ = 0 (which must happen at or before j = d for normalized densities), go to step (3). Otherwise, increment j → j + 1, and return to substep (2b). (3) If i < d, increment i → i + 1 and return to step (2). Otherwise, the procedure is done. See Figure 1. Since this procedure only subtracts from larger eigenvalues and adds to smaller ones, we apply Lemma 2.1 at each step that transfers probability mass from one index to another. If ρ_i ≥ ρ_j, then ρ_i/(ρ_ζ)_i ≥ ρ_j/(ρ_ζ)_j. Furthermore, if ρ_i ≥ ρ_j for any i and j, then it is always the case that ω_i ≤ (ρ_ζ)_i even as ω changes throughout the algorithm, since we move probability mass out of ω_i and into ω_j ≥ (ρ_ζ)_j. Hence ρ_i/ρ_j ≥ ω_i/ω_j for all i and j such that ρ_i ≥ ρ_j. Since each step of the flattening algorithm can only increase the relative entropy via Lemma 2.1, D(ρ‖ρ_ζ) = D(ρ̃‖ρ_ζ) ≤ D(ρ̃‖σ_ζ). Finally, using the simultaneous diagonalizability of ρ and σ, it is easy to see that D(ρ̃‖σ_ζ) ≤ D(ρ‖σ_ζ).
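Lemma 2.1 is not reproduced in full above, but the monotonicity it appears to assert, that transferring second-argument mass δ from an index i to an index j with ρ_i/ω_i ≥ ρ_j/ω_j cannot decrease D(ρ‖ω), can be checked directly on classical vectors. The sketch below assumes the reading ω = ρ_ζ = (1 − ζ)1/d + ζρ and NumPy:

```python
import numpy as np

def Dc(p, q):
    # Classical relative entropy with strictly positive arguments.
    return float(np.sum(p * (np.log(p) - np.log(q))))

rng = np.random.default_rng(5)
n, zeta = 6, 0.5
u = np.ones(n) / n
worst = np.inf
for _ in range(200):
    p = np.sort(rng.random(n))[::-1]
    p /= p.sum()                       # eigenvalues in non-increasing order
    omega = (1 - zeta) * u + zeta * p  # omega = rho_zeta; p_i >= p_j gives p_i/omega_i >= p_j/omega_j
    i = int(rng.integers(0, n - 1))
    j = int(rng.integers(i + 1, n))    # i < j, so p_i >= p_j
    before = Dc(p, omega)
    delta = 0.5 * omega[i]
    w = omega.copy()
    w[i] -= delta                      # subtract from the larger eigenvalue's entry
    w[j] += delta                      # add to the smaller one's
    worst = min(worst, Dc(p, w) - before)
```

A non-negative `worst` supports the claim that each transfer step of the flattening algorithm can only increase the relative entropy.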
Proof. We exponentiate both sides and solve, and we then estimate the resulting expression, where n is the dimension of the system. Let a, β ∈ (0, 1) and i, j ∈ 1...n be such that ρ_i ≥ 1/n ≥ ρ_j, and let ζ ∈ R_+ satisfy the stated bound. If ρ_j = 0, then let ρ̃ = ρ − εî + εĵ for sufficiently small ε, where î and ĵ denote the rank 1 unit densities according to the respective ith and jth basis vectors in the chosen diagonal basis of ρ.
Since the proof of Lemma 2.4 is technical and the Lemma a more specific alternative to methods in section 2.2, we defer proof to Appendix A.
Hence it is sufficient to prove the stated inequality in its expanded form. The main insight behind this proof is Lemma 2.4. If ρ = 1/n, then both terms are 0, and the proof is trivially complete. If ρ ≠ 1/n, then the total probability mass above 1/n must equal that below 1/n to maintain normalization. Hence we apply Lemma 2.4 to successive pairs of i, j such that ρ_i ≥ 1/n ≥ ρ_j, flattening ρ until we transform the second argument of the relative entropy in Equation (9) from (1 − ζ)1 + ζnρ to 1 without increasing the relative entropy.
We may optimize a and b in Theorem 2.5 for given values of d and ζ. If we wish to avoid optimization, b = 1/2 is a reasonable value, and one may use the calculated bound of Corollary 2.6. We further see from this Corollary that Theorem 2.5 is asymptotically tight: as ζ → 0 for fixed d, we may choose a and b such that (1 − a) → 1. Theorem 2.5 relies on a comparison using telescopic relative entropy as introduced in [46]. Corollary 2.6. Given a ∈ [0, 1] and two densities ρ, σ in dimension d such that ρ ≻ σ (ρ majorizes σ), the stated bound holds. With ζ → 0 as d is held fixed, we see that this expression is asymptotically tight. We may choose a = 1 − 1/d and a corresponding b. The proof of this Corollary is contained in Appendix A; it is essentially a basic calculation with linear approximations.
where e_x is a classical basis vector. One can thereby expand D(ρ‖E(ρ)) = ∑_{x∈X} p_x D(ρ_x‖E(ρ_x)). Hence Theorem 2.5 may include classical auxiliary systems. 2.2. Perturbation of Relative Entropy to a Subalgebra. The primary result of this section is the proof of Theorem 1.3, generalizing and strengthening Theorem 2.5 using the methods of [30]. For this proof, we recall four useful results of [30] and preceding works. First, as noted in [21] or inferred from the form of weighted inner product constructed in [47,48]: ‖X‖_{ρ^{-1}} is a norm for strictly positive ρ on spaces with finite trace and Hilbert-Schmidt norm.
Though knowing how ‖X‖_{ρ^{-1}} is induced by an inner product is in principle sufficient to deduce geometrically that it must be a norm, we here give an elementary proof. Proof. Let Γ^{-α}_{ρ,r}(X) = (ρ + r)^{-α} X (ρ + r)^{-α} as a more parameterized version of Γ^{-1} as in [30]. Then as in [30], via the cyclic property of the trace, the squared weighted norm is an integral over r of ‖Γ^{-1/2}_{ρ,r}(X)‖_2², where ‖·‖_2 is the usual Schatten or Hilbert-Schmidt 2-norm. This is already enough to show positivity. By inspecting the form of Γ^{-1/2}_{ρ,r}(X), we can also see that ‖aX‖_{ρ^{-1}} = |a| ‖X‖_{ρ^{-1}} for all a ∈ C, and that ‖X‖²_{ρ^{-1}} = 0 ⟺ X = 0, the zero matrix. Expanding and using the triangle inequality for the Schatten 2-norm, proving the triangle inequality for the weighted norm reduces to a comparison of the cross terms. The Hilbert-Schmidt norms involved are obviously positive, and they are square integrable in r for strictly positive ρ. Hence we may interpret these norms as the absolute values of complex-valued, strictly positive functions of r. The Cauchy-Schwarz inequality finishes the proof.
Shown explicitly as [30, Lemma 1] and implicit from Lemma 2.8 or from the methods of [43] is a comparison property for inverse-weighted norms: Lemma 2.9. For any positive operator ρ, strictly positive σ, and c ∈ R_+ such that ρ ≤ cσ, ‖X‖²_{σ^{-1}} ≤ c ‖X‖²_{ρ^{-1}}. Though the following Lemma appears in [30], it follows directly from taking a well-known integral representation of the second derivative of relative entropy D((ρ, σ)_t‖σ) with respect to t, where (ρ, σ)_t is defined in the Lemma. We also use the following "key" Lemma of [30]: Lemma 2.11 ([30]). Let ρ ≤ cσ for c > 0 and strictly positive densities ρ and σ. Then the key estimate of [30] holds. We make the simple observation: for any pair of bounded, non-negative, Riemann integrable scalar functions f(t), g(t) and any s > 0, ∫ f(t)g(t) dt ≤ (s/2) ∫ f(t)² dt + (1/2s) ∫ g(t)² dt. This inequality follows from the inequality of arithmetic and geometric means.
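In finite dimensions, the integral defining ‖X‖²_{ρ^{-1}} = ∫₀^∞ tr(X†(ρ+r)^{-1}X(ρ+r)^{-1}) dr has a closed form in the eigenbasis of ρ, which makes the comparison property of Lemma 2.9 easy to probe numerically. A sketch assuming NumPy (the stated conclusion ‖X‖²_{σ^{-1}} ≤ c‖X‖²_{ρ^{-1}} is our reading of the missing display):

```python
import numpy as np

def weighted_norm_sq(X, rho):
    # ||X||^2_{rho^{-1}} = int_0^inf tr(X^dag (rho+r)^{-1} X (rho+r)^{-1}) dr,
    # evaluated in closed form: sum_{ij} |X~_ij|^2 k(w_i, w_j), with
    # k(a, b) = (ln a - ln b)/(a - b) off-diagonal and k(a, a) = 1/a.
    w, V = np.linalg.eigh(rho)
    Xt = V.conj().T @ X @ V
    a, b = w[:, None], w[None, :]
    with np.errstate(divide="ignore", invalid="ignore"):
        K = np.log(a / b) / (a - b)
    K = np.where(np.isclose(a, b), 1.0 / np.broadcast_to(a, K.shape), K)
    return float(np.real(np.sum(np.abs(Xt) ** 2 * K)))

rng = np.random.default_rng(9)
n, worst = 4, np.inf
for _ in range(50):
    A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    B = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    rho = A @ A.conj().T; rho /= np.trace(rho).real
    sig = B @ B.conj().T; sig /= np.trace(sig).real
    # Smallest c with rho <= c * sigma: top eigenvalue of sigma^{-1/2} rho sigma^{-1/2}.
    ws, Vs = np.linalg.eigh(sig)
    sinv = Vs @ np.diag(ws ** -0.5) @ Vs.conj().T
    c = float(np.linalg.eigvalsh(sinv @ rho @ sinv)[-1])
    X = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    worst = min(worst, c * weighted_norm_sq(X, rho) - weighted_norm_sq(X, sig))
```

A non-negative `worst` is consistent with the inverse-weighted norm comparison.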
Finally, we will often use a well-known continuity argument to bypass the assumption of strict positivity: Remark 2.13. For all densities ρ and σ, D(ρ σ) = D(P ρ ρP ρ P ρ σP ρ ), where P ρ is the projection to the support of ρ. Hence without loss of generality, we may restrict to the support of ρ, on which ρ is strictly positive. Since D(ρ σ) is finite if and only if the support of ρ is contained in that of σ, we may assume it is finite if and only if P ρ ρP ρ and P ρ σP ρ are both strictly positive.
Using these known results, we derive the new results in the rest of this Subsection.
Proposition 2.14. Let E, Φ be quantum channels and ρ be a density satisfying suitable support and Loewner comparison conditions. Then D(ρ‖Φ(ρ)) ≤ a D(ρ‖E(ρ)) for a constant a. In general, a ≤ 4.
For cases in which σ is not strictly positive, we refer to Remark 2.13.
Proof of Theorem 1.3. By the assumptions of the Theorem, σ, ω, and η all have the same support. By the assumptions of the inequality and Remark 2.13, we may assume that ρ, σ, ω, and η are all strictly positive on this support.
If ω ≥ (1 − ζ)σ, then η := (ω − (1 − ζ)σ)/ζ is a positive semidefinite matrix. Furthermore, when ω, σ are densities, tr(η) = (tr(ω) − (1 − ζ)tr(σ))/ζ = (1 − (1 − ζ))/ζ = 1. Hence η is positive semidefinite with trace 1, the conditions for it to be a density matrix. It also then holds that ω = (1 − ζ)σ + ζη. This convex combination form will be useful in understanding the proof.
Proof. We use the lower inequality of Theorem 1.3 with σ = E(ρ), and ω = (1 − ζ)E(ρ) + ζΦ(ρ). By the assumptions of the Theorem, Φ(ρ) is strictly positive on and has the same support as E(ρ), so we may assume strict positivity of both by Remark 2.13. The Corollary follows from the data processing inequality when ΦE = E.
For any conditional expectation E_σ, there is a basis for which E_σ(ρ) = ⊕_l tr_{B_l}(P_l ρ P_l) ⊗ σ_{B_l}, (14) where σ_{B_l} is a ρ-independent density on the subsystem B_l, and P_l is a projector to the lth diagonal block. In particular, ⊕_l P_l ρ P_l is a block diagonal matrix with entries from ρ, effectively removing all coherence between blocks. We may subsequently interpret each such block as a bipartite system A_l ⊗ B_l, then trace out subsystem B_l and replace it by the fixed state σ_{B_l}. This block diagonal form is applied commonly in operator algebras; see [5] for discussion of the doubly stochastic case and [30] in general. It follows from the fundamental result of von Neumann [49] that every von Neumann algebra is decomposable as a direct integral of factors, which are von Neumann algebras in which only the identity commutes with all elements. In infinite dimensions, "⊕_l" may take the form of a direct integral rather than a sum.
Proposition 2.16. Let E_σ* be a stochastic conditional expectation weighted by a normal, faithful density σ, and let Φ be a quantum channel such that ΦE_σ* = E_σ*Φ = E_σ*. Let ρ be the maximally entangled state built from Σ_i |i⟩ ⊗ |i⟩ in the computational basis, which we may assume without loss of generality to be compatible with the block diagonal form of E_σ* as in (14). Let d_l denote the dimension of the lth diagonal block of 1 ⊗ E_σ*(ρ), and let m_l denote the dimension of the traced subsystem in that block. Then Φ = (1 − ζ)E_σ* + ζΦ̃ for some channel Φ̃ such that Φ̃E_σ* = E_σ*Φ̃ = E_σ* whenever ζ is sufficiently large relative to the distance between Φ(ρ) and E_σ*(ρ). If E_σ* = E as a unital conditional expectation weighted by the trace, then the above condition simplifies, and Φ̃ is assured to be unital. If E(ρ) = 1/d for all ρ, then we may replace the condition on ζ by one in which the supremum is over normalized densities.
For a pair of Hermitian d × d matrices X, Y and any k ∈ 1...d, Weyl's inequality [50] states that λ_k(X) + λ_min(Y) ≤ λ_k(X + Y) ≤ λ_k(X) + λ_max(Y). Via Weyl's inequality, for each value of k, setting X = E_σ*(ρ) and Y = Φ(ρ) − E_σ*(ρ) yields |λ_k(Φ(ρ)) − λ_k(E_σ*(ρ))| ≤ max{|λ_min(Y)|, |λ_max(Y)|}. The absolute value of the right hand side is upper-bounded by the ∞-norm distance ‖Φ(ρ) − E_σ*(ρ)‖_∞.
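Weyl's inequality in the eigenvalue-perturbation form used here is easy to check numerically (our own sketch; the dimension and random matrices are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
d = 6

def random_hermitian(n):
    g = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (g + g.conj().T) / 2

X, Y = random_hermitian(d), random_hermitian(d)
lam_sum = np.linalg.eigvalsh(X + Y)              # ascending eigenvalues
lam_X = np.linalg.eigvalsh(X)
op_norm_Y = np.abs(np.linalg.eigvalsh(Y)).max()  # infinity-norm of Y

# Weyl: each eigenvalue of X + Y stays within ||Y||_inf of the matching one of X.
print(np.all(np.abs(lam_sum - lam_X) <= op_norm_Y + 1e-9))
```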
Re-arranging and combining terms and invoking Choi's theorem on completely positive maps, we recall the assumption that ρ is a d × d Bell pair that we may express in a basis compatible with the block diagonal form of Equation (14). We estimate λ_k(E_σ*(ρ)). Let us consider E_σ* to act on the outer-indexed subsystem of the Bell pair. We may decompose E_σ* = Ẽ_σ* E_bl, where E_bl(ρ) = ⊕_l P_l ρ P_l, and Ẽ_σ* traces out and replaces the B_l subsystem within each block by σ_{B_l}. Each block in E_bl(ρ) is then itself a maximally entangled state in dimension d_l × d_l times a factor of d_l/d. We then apply a partial trace with replacement by the complete mixture of an m_l-dimensional subsystem from each lth block, which because of the maximal entanglement yields m_l² distinct eigenvalues, each of magnitude at least d_l λ_min(σ_{B_l})/(d m_l). To ensure positivity of the right hand side of Equation (16), we may choose any ζ ≥ max_l m_l d ‖Φ(ρ) − E_σ*(ρ)‖_∞/(d_l λ_min(σ_{B_l})). In the unital case, σ_{B_l} = 1/m_l, so λ_min(σ_{B_l}) = 1/m_l. The map given by Φ − (1 − ζ)E_σ* is then completely positive by Choi's theorem. Since both Φ and E_σ* are trace-preserving, tr((Φ − (1 − ζ)E_σ*)(X)) = ζ tr(X) for any matrix X. Hence Φ̃ = (Φ − (1 − ζ)E_σ*)/ζ is a quantum channel. By linearity and the assumption that E_σ*Φ = ΦE_σ* = E_σ*, it follows that E_σ*Φ̃ = Φ̃E_σ* = E_σ*. We may then rewrite Φ = (1 − ζ)E_σ* + ζΦ̃ as a convex combination of channels. To see that Φ̃ is unital when σ = 1/d and E_σ* = E: if Φ̃(1) were not 1, then Φ(1) could not be 1, which would contradict Φ(1) = ΦE(1) = E(1) = 1 for unital E. When E(η) = 1/d for any η, we can simplify the above argument by noting that Φ(η) and 1/d are always simultaneously diagonal.
Combining terms and re-arranging yields the final part of the Proposition.
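The convex-combination mechanism of the proof can be illustrated with a toy example (our own sketch; the depolarizing channel Φ and dimension are hypothetical stand-ins, not the paper's general setting). We verify via the Choi matrix that Φ̃ = (Φ − (1 − ζ)E)/ζ is completely positive and trace-preserving:

```python
import numpy as np

d, zeta = 2, 0.5

def choi(Phi):
    """Choi matrix J(Phi) = sum_{ij} |i><j| (x) Phi(|i><j|)."""
    J = np.zeros((d * d, d * d), dtype=complex)
    for i in range(d):
        for j in range(d):
            Eij = np.zeros((d, d), dtype=complex); Eij[i, j] = 1.0
            J += np.kron(Eij, Phi(Eij))
    return J

E = lambda r: np.trace(r) * np.eye(d) / d          # completely depolarizing
Phi = lambda r: (1 - zeta) * E(r) + zeta * r       # partially depolarizing channel

# Phi - (1 - zeta) E has a PSD Choi matrix, so by Choi's theorem the rescaled
# map tilde_Phi = (Phi - (1 - zeta) E) / zeta is completely positive; trace
# preservation of Phi and E then makes it a quantum channel.
tilde = lambda r: (Phi(r) - (1 - zeta) * E(r)) / zeta
print(np.linalg.eigvalsh(choi(tilde)).min() >= -1e-9)
print(np.isclose(np.trace(tilde(np.eye(d) / d)).real, 1.0))
```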
Even though Proposition 2.16 depends on the dimension, this is the minimal dimension on which the conditional expectation may act, not including any extra, untouched subsystems. Hence the same bound applies for E ⊗ 1_B, Φ ⊗ 1_B independently of B's dimension, and any unitary embedding of such an extension preserves the constants. Proposition 2.16 is similar to [37, Lemma VI.8], which is attributed in that work to Li Gao, who is not an author of that paper.

Combining Conditional Expectations
We start this section by recalling basic facts about conditional expectations. A doubly stochastic conditional expectation E_N is a projector onto the matrices in an algebra N that is self-adjoint under the trace. For this reason, we do not distinguish between E_N and E_N^* or E_{N*}, which respectively denote the dual and predual of E_N. For a subalgebra N, the doubly stochastic conditional expectation E_N is unique. We also consider weighted conditional expectations with block diagonal decompositions of the form in Equation (14), which we rewrite equivalently in terms of some probability distribution (p_i). One could specify an overall weighting density. Given an arbitrary σ not necessarily in this form, we recall the self-adjoint conditional expectation to the commutant algebra. For any σ, E_{N′}(σ) yields each σ_{B_i} in the block decomposition of Equation (17). Given a weighting state σ and a set of conditional expectations {E_j}_{j=1}^J, we may thereby construct the set {E_{j,σ*} = E_{j,E_{N′}(σ)*}} unambiguously.
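The two defining properties of a doubly stochastic conditional expectation (projection, self-adjointness under the trace) can be seen concretely for the simplest example, pinching to the diagonal subalgebra (our own sketch; the dimension is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(3)
d = 5

def E_diag(X):
    """Doubly stochastic conditional expectation onto the diagonal subalgebra."""
    return np.diag(np.diag(X))

X = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
Y = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))

# Self-adjoint under the trace inner product: tr(E(X)^† Y) = tr(X^† E(Y)).
lhs = np.trace(E_diag(X).conj().T @ Y)
rhs = np.trace(X.conj().T @ E_diag(Y))
print(np.isclose(lhs, rhs))
print(np.allclose(E_diag(E_diag(X)), E_diag(X)))   # idempotent projection
```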
The projections denoted E_{N,σ} and E_{N,σ*} are respective adjoints under the trace. Let σ̃_N be the unnormalized density in finite dimension d given in Equation (19). Corresponding identities then hold for any density ρ and any operator X. Via its block diagonal form, we see that E_{N,σ*} is idempotent, as is its adjoint. Finally, E_N = E_{N,τ*} = E_{N,τ}, where τ is the normalized identity or trace. It is simple to observe using the block diagonal forms that if E E_j = E for unweighted conditional expectations E_j and E, then E_σ* E_{j,σ*} = E_σ* for the σ-weighted versions. Let ω be a density and E_σ be a conditional expectation such that E_σ*(ω) = ω. Then for any density ρ, D(ρ‖ω) = D(ρ‖E_σ*(ρ)) + D(E_σ*(ρ)‖ω). This equality is well-known. Nonetheless, we include a simple proof for finite dimensions. Here the logarithm denoted "log" is taken with respect to an arbitrary base.
Proof. Let E denote the doubly stochastic conditional expectation acting on an operator X as above, where σ̃ is the unnormalized density as in Equation (19). Examining the logarithm of the block diagonal form in Equation (17) yields an operator η_ρ. Since η_ρ has no dependence on ρ, we define η_ω analogously, so that η_ω = η_ρ. Comparing to Equation (20), the claimed equality follows from η_ω − η_ρ = 0.
Lemma 3.2. Let E_σ be a conditional expectation and ρ, ω be densities. Then D(ρ‖E_σ*(ω)) ≤ D(ρ‖E_σ*(ρ)) + D(ρ‖ω). Proof. By data processing on the 2nd term, D(E_σ*(ρ)‖E_σ*(ω)) ≤ D(ρ‖ω). By Lemma 3.1 and the idempotence of conditional expectations, we obtain the Lemma.
In the rest of this Section, we may sometimes drop the explicit subscript of the weighting state, e.g. writing E j * for a weighted conditional expectation. This is to reduce the verbosity of notation when considering sets of potentially weighted conditional expectations.
Proof. For each s ∈ S, we apply Lemma 3.2 iteratively along the sequence s. By convexity, we then move the weighted average over s inside the relative entropy, completing the Lemma.
then E is a projection, and for β_{c,ζ} given in Corollary 2.15 and all input densities ρ (including those with arbitrary extensions to auxiliary systems), Σ_{s∈S} µ(s) Σ_j k_{s,j} D(ρ‖E_{j*}(ρ)) ≥ β_{c,ζ} D(ρ‖E(ρ)).

Proof of Theorem 1.4. Note that Equation (4) in the Theorem implies that
for some channel Φ and constant c such that Φ(ρ) ≤ cE(ρ).
We see that E is a projection by writing it as lim_{m→∞} (Σ_{s∈S} µ(s) E_s)^m. Via the assumption of Equation (4) and the relations E_j E = E E_j = E, this limit is equal to E and is clearly idempotent.
We have by the assumptions of the Theorem and Lemma 3.3 the corresponding combined bound. We then note that it must hold that EΦ = E and, similarly, ΦE = E. We then use Corollary 2.15.
One may substitute Equation (21) for Equation (4) for particular states; in some cases, doing so yields better constants than would the full cp-order inequality. The Theorem is nonetheless stated in cp-order form for its easier interpretability and connection to other results in this paper. Proof. First, E projects to a fixed point subspace of E_j for each j, so for any sequence s of k-many j indices, E also projects to a fixed point subspace of the composition E_{j_1} ∘ ... ∘ E_{j_k}. For all densities ρ, it must then hold that EΦ(ρ) = ΦE(ρ) = E(ρ), so Φ also has N as a fixed point subalgebra.
For convenience of notation, we denote C := C(M : N) and C_cb := C_cb(M : N). It follows from the definitions of C and C_cb that the corresponding comparison holds. Recall the chain rule (Lemma 3.1). A single term of D(ρ‖E_j(ρ)) cancels the first term on the right hand side of the above, so the bound holds for any α ≥ 1. By Equation (23) and an additional round of the data processing inequality, we obtain the analogous bound. Since E_j Φ^κ E = E E_j Φ^κ = E, we may use Corollary 2.15 as before. Hence we may replace the index of N in M by that of N in N_j at the cost of taking k_{s,j} → k_{s,j} + 1 for each s with non-zero measure.
Proof of Remark 1.5. Proposition 2.16 shows that for a channel Φ and conditional expectation E_σ* such that ΦE_σ* = E_σ*Φ = E_σ*, with Φ(ρ) sufficiently close to E_σ*(ρ) in operator norm distance, Φ is a non-trivial convex combination of E_σ* with another channel Φ̃ such that Φ̃E_σ* = E_σ*Φ̃ = E_σ*. Via [41, Theorem 6.7], an iterated product of doubly stochastic conditional expectations (E_1 ⋯ E_J)^n converges to the intersection algebra's conditional expectation as n → ∞. Together, these suffice to show that a quasi-factorization inequality with some constant holds. Corollary 2.15 shows a correction factor scaling as 1 + O(cζ) for small ζ. Hence as ζ → 0 with constant c, the correction factor approaches 1. This shows asymptotic tightness.
The caveat to Remark 1.5 is that as ζ → 0, E must not change. For example, as discussed in Subsection 4.1, if we take two incompatible measurement bases at arbitrarily small angle to each other, they converge to the same basis, though for any finite angle E remains a full trace of the subsystem.

Applications
Well-known uses of quasi-factorization, powered by Proposition 1.6 and similar results, include strengthening quantum uncertainty principles and estimating decay rates to thermal equilibrium in many-body systems [28, 42]. MLSI and similar estimates also bound decoherence times and quantum capacities [37, 22]. Taking this idea a step further, quasi-factorization gives strong bounds on certain combinations of quantum resources, such as the relative entropy of coherence in incompatible bases [13] or the asymmetry with respect to overlapping symmetry groups [14, 15, 16]. Similarly, combining MLSI is a powerful way to estimate decoherence or resource decay for systems undergoing several noise processes simultaneously. In Subsection 4.1 we show how the improved quasi-factorization yields strong, asymptotically tight, uncertainty-like bounds for mixtures.

Uncertainty Relations. Let S and T correspond to bases of subsystem A within
A ⊗ B such that |A| = d, not necessarily mutually unbiased. Let {|i_S⟩ : i = 1...d} and {|i_T⟩ : i = 1...d} be the states of these bases. Corresponding to each basis is a conditional expectation that is the pinching map for that basis, acting on any input density ρ with suitable B-system densities (ρ^B_{i,S}) and (ρ^B_{i,T}). These forms easily extend to bipartite ρ_{AB} with |A| = d and conditional expectations acting on A. Hence, if S and T correspond to mutually unbiased bases, then E_S E_T(ρ) = 1/d ⊗ ρ_B. When the bases are not mutually unbiased, E_S E_T ≠ E. In full generality, E_S and E_T might leave mutual subspaces invariant. We will however assume that 0 < |⟨i_S|j_T⟩| < 1 for all i, j ∈ 1...d, excluding cases that leave subspaces of A invariant. Let ξ = min_{i,j} |⟨i_S|j_T⟩|², which by our assumptions is larger than zero. Note that ξ ∈ (0, 1/d]. Repeated applications of E_S E_T increasingly replace a density by one that is completely mixed on A. Hence E_S E_T ≥_cp (dξ)E, where E is a completely depolarizing channel on the A subsystem.
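The cp-order comparison E_S E_T ≥_cp (dξ)E can be verified numerically via Choi matrices (our own sketch; the dimension, random basis U, and |B| = 1 are illustrative assumptions):

```python
import numpy as np

d = 3
rng = np.random.default_rng(4)
# Basis S: computational; basis T: a generic random rotation of it.
U = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))[0]

def pinch(rho, V):
    """Pinching map for the basis given by the columns of V."""
    out = np.zeros_like(rho)
    for i in range(d):
        v = V[:, i:i + 1]
        P = v @ v.conj().T
        out = out + P @ rho @ P
    return out

ES = lambda r: pinch(r, np.eye(d))
ET = lambda r: pinch(r, U)
E = lambda r: np.trace(r) * np.eye(d) / d          # completely depolarizing

def choi(Phi):
    J = np.zeros((d * d, d * d), dtype=complex)
    for i in range(d):
        for j in range(d):
            Eij = np.zeros((d, d), dtype=complex); Eij[i, j] = 1.0
            J += np.kron(Eij, Phi(Eij))
    return J

xi = (np.abs(U) ** 2).min()                        # min_{i,j} |<i_S|j_T>|^2
gap = choi(lambda r: ES(ET(r))) - d * xi * choi(E)
print(np.linalg.eigvalsh(gap).min() >= -1e-8)      # E_S E_T >=_cp (d xi) E
```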
Since |⟨i_S|j_T⟩|² ≤ 1, we obtain using the Choi matrix that E_S E_T ≤_cp dE. For any input ρ, if |B| = 1, then E_S E_T(ρ) ≤ dE(ρ). These observations will allow us to derive entropic uncertainty-like bounds from quasi-factorization. First, we recall some known bounds for comparison. The conventional Maassen-Uffink uncertainty relation [51] states that H(S)_ρ + H(T)_ρ ≥ − log max_{i,j} |⟨i_S|j_T⟩|², which may be extended by an auxiliary system [6]. When ρ ≈ 1/d and/or when there is high overlap between bases, the right hand side of the conventional uncertainty relation becomes negative, and the bound becomes trivial. In contrast, α-(C)SQF still gives a positive, non-trivial bound on the sum of basis entropies. We also recall the result of [28, Corollary 2]. This bound remains non-trivial as ρ → 1/d but fails if d max_{i,j} |⟨i_S|j_T⟩|² − 1/d is too large. The quasi-factorization from [30] also yields an uncertainty-like bound, in this case stable under extension by auxiliary systems. Now we apply quasi-factorization to derive bounds that are tighter in some circumstances. Letting ζ = 1 − dξ in Corollary 3.5, we obtain a bound for any ϵ < ζ, where Ẽ = E ⊗ 1_B, and Ẽ_S, Ẽ_T are defined analogously. Expanding the relative entropies in terms of von Neumann entropies, this is equivalent to a form in which H(S ⊗ B)_ρ, H(T ⊗ B)_ρ, and H(B)_ρ are defined respectively as the entropies of the outputs of E_S, E_T, and E on ρ.
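For comparison, the Maassen-Uffink bound recalled above is easy to check numerically (our own sketch; the dimension, random basis, and random pure state are illustrative assumptions):

```python
import numpy as np

d = 3
rng = np.random.default_rng(5)
U = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))[0]

# A random pure state; S is the computational basis, T the columns of U.
psi = rng.normal(size=d) + 1j * rng.normal(size=d)
psi /= np.linalg.norm(psi)

pS = np.abs(psi) ** 2                       # outcome distribution in basis S
pT = np.abs(U.conj().T @ psi) ** 2          # outcome distribution in basis T

def H(p):
    p = p[p > 1e-15]
    return -(p * np.log(p)).sum()

c = (np.abs(U) ** 2).max()                  # max_{i,j} |<i_S|j_T>|^2
print(H(pS) + H(pT) >= -np.log(c) - 1e-9)   # Maassen-Uffink bound holds
```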
As the two bases approach mutual unbias, ζ approaches 0, and both forms of the inequality approach the entropic uncertainty relation implied by strong subadditivity (see Petz's version as Theorem 1.1). Finally, we consider two bases whose overlaps are parameterized by θ ∈ (0, 1]. This situation may arise, for instance, when taking a partial rotation into a Fourier transform of the original basis. As θ → 1, the bases approach mutual unbias, and CSQF approaches the bound given by Petz's subalgebra SSA as in Equation (1). As θ → 0, ξ approaches 0. In this latter regime, 1 − dξ is close to 1, so to achieve a sufficiently small ϵ for the inequality to be non-trivial, log_{1−dξ}(ϵ) must be large. Unlike Equation (25), Theorem 1.4 does not become completely trivial until the bases become the same basis. When E_S = E_T, the intersection conditional expectation ceases to be E, instead becoming E_S = E_T. As noted in Corollary 1.5, the bases approach a different intersection algebra. SSA for two copies of the same basis reduces to the trivial statement that 2D(ρ‖E_S(ρ)) ≥ D(ρ‖E_S(ρ)). Meanwhile, when ξ ≪ 1/d but is still finite, quasi-factorization still compares to the same intersection algebra. If we take for instance the pure state |0_S⟩⟨0_S| as a test density, we see that D(|0_S⟩⟨0_S| ‖ E_T(|0_S⟩⟨0_S|)) → 0 as θ → 0. In contrast, D(|0_S⟩⟨0_S| ‖ E(|0_S⟩⟨0_S|)) = D(|0_S⟩⟨0_S| ‖ 1/d) = log d for arbitrarily small but finite θ. Hence we should not expect to find α-(C)SQF without α → ∞ as θ → 0.

Finite Groups and Transference. A common form of conditional expectation is
where G is a finite group, π(G) is a unitary representation in some Hilbert space, and ad u (X) := uXu † for any unitary u and matrix X. Since this conditional expectation is self-adjoint with respect to the trace, we need not distinguish between E G and its adjoint.
Groups may induce collective channels as considered in [37]. We build on a simplified version of some ideas from that paper and [21].
Let the map Φ_p be given by Φ_p(ρ) = Σ_{g∈G} p_g ad_{u(g)}(ρ) for any probability vector p ∈ l_1(G), where u(g) is an element of a particular unitary representation of the group G. We may think of Φ_p as a quantum channel parameterized by p.
We may also think of Φ as a map that takes a probability vector p as input and outputs a quantum channel that applies a correspondingly weighted convex combination of unitary conjugations. There is an analogous classical channel Ψ_p : l_1(G) → l_1(G) given in the left regular representation by Ψ_p(q) = Σ_{g∈G} p_g g(q), where h(f) = δ_{h,f} for any f ∈ G. In this formulation, g(·) denotes the action of G promoted to probability distributions in l_1(G), and p_g denotes the probability of group element g as given by p. Here l_1(G) is the 1-normed vector space on group elements.
A key insight for quasi-factorization of finite groups (and graphs as considered in Subsection 4.3) is that of transference. Let u(g) be a unitary representation of a finite group G on a |G|-dimensional Hilbert space with basis {|g⟩ : g ∈ G} given by u_g|h⟩ = |gh⟩. This is a Hilbert space version of the left regular representation of G on itself. For any pair of probability distributions p, q ∈ l_1(G) and any input density ρ, Φ_p(Φ_q(ρ)) = Φ_{Ψ_p(q)}(ρ). Given a sequence of composed quantum channels Φ_{p^(1)} ∘ ... ∘ Φ_{p^(k)} for some k ∈ N, we can calculate the final unitary weights via Ψ_{p^(1)} ∘ ... ∘ Ψ_{p^(k)}. In many cases, the latter will be easier to handle, as it is a composition of classical rather than quantum channels. Furthermore, there are many circumstances in which mixing processes on groups or graphs have strong, known results that were previously not known to extend to quantum analogs. The principle of transference was used in [21, 37]. Following the relative entropy inequalities established in this paper, we are able to extend the technique to yield tensor-stable relative entropy comparisons. Let 1/|G| denote the classical probability vector weighting each finite group element equally. If p = (1 − ζ)1/|G| + ζq for some probability distribution q and ζ ∈ (0, 1), then Φ_p = (1 − ζ)E_G + ζΦ_q. The main idea of this Section is to transfer bounds of the form in Equation (32) to those on the corresponding quantum channel Φ_p. In doing so, we derive Loewner order bounds on quantum representations from classical vector order bounds that are often simpler to calculate. From these order inequalities, the techniques of this paper yield relative entropy inequalities. Also, the convex combination form of Equation (29) allows us to upper bound the constant c in Theorem 1.3 by |G|.
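Transference can be demonstrated on a small example (our own sketch; the cyclic group Z_n and its left regular representation are illustrative assumptions). On diagonal densities, the quantum channel Φ_p acts exactly as the classical channel Ψ_p acts on probability vectors:

```python
import numpy as np

n = 5
rng = np.random.default_rng(6)

# Left regular representation of the cyclic group Z_n: u(g)|h> = |g + h mod n>.
def u(g):
    U = np.zeros((n, n))
    for h in range(n):
        U[(g + h) % n, h] = 1.0
    return U

p = rng.random(n); p /= p.sum()

def Phi_p(rho):
    """Quantum channel: p-weighted mixture of conjugations u(g) . u(g)^†."""
    return sum(p[g] * u(g) @ rho @ u(g).T for g in range(n))

def Psi_p(x):
    """Transferred classical channel on l_1(Z_n): the same p-weighted shifts."""
    return sum(p[g] * np.roll(x, g) for g in range(n))

x = rng.random(n); x /= x.sum()
# On diagonal densities, Phi_p agrees with Psi_p under v_i <-> |i><i|.
print(np.allclose(np.diag(Phi_p(np.diag(x))), Psi_p(x)))
```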
A nice case of quasi-factorization with transference involves subgroups G_1, ..., G_J ⊆ G. If ∪_{j=1}^J G_j contains generators of the entire group, then we may conclude that at least some chain of conditional expectations from the set {G_j}_{j=1}^J of length |G| or shorter will include E_G in convex combination. (C)SQF follows with good constants depending on the specific structure of the group. The highlighted example appears in the next section, where we consider transference analogs on conditional expectations and semigroups derived from finite graphs.
Though we highlight a particular group representation for clarity, the techniques of this Section apply to other representations. One may for instance consider both left and right regular representations, in which case nearly equivalent results hold. It is not always valid, however, to mix one classical representation with a different quantum representation. It is also not assured that distinct representations will always yield the same optimal constants. Example 4.1 (Symmetric Group of Degree 3). The group of permutations on 3 indices, known both as the symmetric group of degree 3 (S_3) and as the dihedral group of degree 3, is the smallest non-abelian group. It contains 6 elements, which we may represent on 6-dimensional probability space or on 6-dimensional Hilbert space as above. We may describe convex combinations of unitaries from this group by Φ_p = Σ_{j=1}^6 p_j ad_{u_j}, where p = (p_1, ..., p_6) is a probability vector, and each ad_{u_j} is conjugation by a unitary u_j representing a distinct element of S_3. When p_1 = ... = p_6 = 1/6, the channel becomes a conditional expectation to the invariant subalgebra of the group. Repeated applications of two non-redundant generators of S_3 will eventually generate every element of the entire group. Hence a sufficiently long chain of applications of Φ_p approaches the fixed point conditional expectation as long as p is non-zero on at least two non-redundant generators. Mathematically, such a channel Φ_p will have that lim_{k→∞} Φ_p^k = E_{S_3}. As in Equation (30), we may construct the channel Ψ_p(·) : l_1(S_3) → l_1(S_3), where l_1(S_3) is the 6-dimensional, 1-normed vector space on elements of S_3. Via Equation (31), we may track the weights applied to group elements, where 1 denotes the probability vector corresponding to the group's identity element (not to be confused with the uniform vector 1/6, which equally weights all elements). Hence we may determine ζ and c as in Corollary 2.15 precisely for any p by finding the distribution induced by k repeated applications of Ψ_p as a discrete classical Markov chain to a probability vector.
This process may allow us to bypass the potentially harder problem of calculating Φ_p^k for arbitrary quantum states. Applying Theorem 1.4, for sufficiently large k and when p has non-zero weight on at least some pair of non-redundant generators, we obtain quasi-factorization with constants determined by the minimum element of Ψ_p^k(1). We see via the convex combination form of E_{S_3} that c ≤ 6.
Finally, we may consider extensions of the Hilbert space by an arbitrary, finite-dimensional, auxiliary system that is untouched by Φ_s for any s ∈ l_1(S_3). The same transference as above nearly holds, but we note the possibility of defining a maximally entangled (pure) state on the two-copy space.
Remark 4.2. Let G be a finite group of cardinality |G|. For any unitary representation, there is a natural conditional expectation given by Equation (28). Since the representation of the identity element is the identity unitary, E_G(ρ) contains (1/|G|)ρ as a term. For any input density ρ, ρ = |G|(1/|G|)ρ ≤ |G|E_G(ρ), so the first Pimsner-Popa index as in Equation (5) is upper-bounded by |G|. In general, Corollary 3.5 yields quasi-factorization for subgroups' conditional expectations with constants depending on the group structure but not otherwise on the Hilbert space dimension of the representation.
The relative entropy of frameness or asymmetry of a density ρ takes the form D(ρ E G * (ρ)), where E G * is the conditional expectation to the invariant subspace of some group in some representation [14,15,16]. In many of these cases, however, the group is compact but not finite. Nonetheless, replacing the sum in Equation (29) by an integral, it may still be possible to apply transference and related techniques. See [37] for examples.

4.3. Finite, Connected, Undirected Graphs. In this section, we will use the symbol G = (V, E) to denote a graph with vertex set V and edge set E ⊆ V × V. We reserve G for a group.
The group and graph scenarios relate closely. An undirected, n-vertex graph G will have a naturally corresponding group G with action on 1...n in which each edge (i, j) ∈ V × V corresponds to the swap operation i ↔ j. This association is not unique - we may for instance identify a cyclic graph with a one-generator cyclic group, or with a multi-generator group of self-inverse swaps. Conversely, a finite group G has a corresponding Cayley graph.
For simplicity and to facilitate concrete calculations, we here define a representation. In the computational basis {|i⟩ : i ∈ 1...n}, let the unitary representation of undirected edge (l, j) ∈ E be given on a bipartite system A ⊗ B by u_θ(l, j) := (e^{iθ}|l⟩⟨j| + e^{−iθ}|j⟩⟨l| + Σ_{r≠l,j} |r⟩⟨r|) ⊗ 1_B.
Note the extra parameter θ: in addition to swapping the lth and jth basis states, such a representation may apply a relative phase to the switched elements. Here the graph representation acts on a system A of dimension n, which we may extend by an arbitrary, finite-dimensional system B on which the graph acts as the identity. Correspondingly, one may naturally define rank 1 basis vectors {v_j : j ∈ 1...n} on the classical probability space l_1^n. We will use the classical representation of an edge (l, j) given by the exchange v_l ↔ v_j. Under the identification v_j ↔ |j⟩⟨j| between l_1^n and the diagonal densities in dimension n, u_θ(l, j) acts as this swap operation regardless of θ. In the literature [44, 21, 40], it is actually common to restrict to the diagonal or matrix-valued probability space in which A ≅ l_1^n, while B is a space of densities. Indeed, the parameter θ suggests an ambiguity in the representation and complicates the analogy with the classical space. To restore this analogy, we work with channels Φ_{l,j} of a modified form. The channel Φ_{l,j} dephases the lth and jth matrix elements as well as swapping them. Hence it is non-unitary but in some ways more closely analogous to the classical swap. Acting on l_1^n ⊗ B, Φ_{l,j} is equivalent to u_θ(l, j) for any θ.
In [40], it was shown that connected graphs as represented above on l_1^n ⊗ B have CMLSI with constant at least 1/O(n²). This result arises by comparing all graphs to the broken cycle, the slowest-decaying of connected graphs. Similarly, [52, 30] showed general CMLSI, resolving at least the existence of CMLSI for graphs. In [21], complete graphs including two-vertex, single-edge graphs were shown to have CMLSI with constant of O(1), as the channels given by these graphs are already convex combinations that include E_G. Missing so far have been explicit estimates of the CMLSI constants for expanders and similar graphs, which are often expected to be stronger than Ω(1/n²) but worse than O(1).
We define the diagonal projection conditional expectation for each vertex. For each edge there is a natural conditional expectation E_{l,j}, which projects any density ρ to the invariant subspace of Φ_{l,j} and of u_θ(l, j) for all θ. We note that E_{l,j} = E_{l,j} E_{diag(j)} = E_{diag(j)} E_{l,j}, and similarly with E_{diag(l)}. Also, let E_diag denote the conditional expectation to the computational basis. There is also a natural conditional expectation E_G, given in Equation (36), to the invariant subspace of {Φ_{i,j} : (i, j) ∈ E}. Any k-fold composition Φ_{i_1,j_1} ∘ ... ∘ Φ_{i_k,j_k} can be written as a sum of ketbra conjugations ρ → |i⟩⟨j| ρ |j⟩⟨i| for i, j ∈ 1...n. Furthermore, as no Φ_{i,j} escapes the diagonal basis, neither does any product of them, nor any convex combination of such products. Hence any quantum channel Φ_p that is a convex combination of composed Φ_{i,j} following or followed by E_diag has the form Φ_p(ρ) = Σ_{i,j=1}^n p_{i,j} |i⟩⟨j| ρ |j⟩⟨i| for a probability vector p ∈ l_1(n) ⊗ l_1(n) that weights transitions between basis states. The form of channel defined here allows a greater range of processes than expressed by symmetric graphs, such as those in which not every transition is balanced by its inverse with equal weight. Φ_p is not as general as an arbitrary combination of |i⟩⟨j| terms, as properties such as normalization are implicit in its structure. Hence not all values of p are valid. We need not explicitly check these constraints as long as the weights arise from a composition of valid physical processes.
We say that an undirected graph G is m-regular when each vertex has m incident edges. We denote by A the normalized adjacency matrix of a graph G, which for an m-regular graph is the adjacency matrix divided by m. We recall an important result on classical graphs: Theorem 4.3 (Theorem 3.3 from [53]). Let G be a connected, undirected, m-regular graph with n vertices. Let the eigenvalues of G's normalized adjacency matrix A be denoted 1 = λ_1 ≥ ... ≥ λ_n, and let γ = max{|λ_n|, |λ_2|}. Then for any normalized probability vector x ∈ l_1^n and t ∈ N, ‖A^t x − 1/n‖_2 ≤ γ^t. From this point on, we focus on m-regular, undirected graphs. Theorem 4.3 is a form of spectral gap condition analogous to that considered in [30, Lemma 2.6]. In the graph literature, however, 1 − λ_2 is often referred to as the spectral gap for a graph's Laplacian or normalized adjacency matrix. To avoid confusion, we define γ explicitly rather than refer to it as the spectral gap.
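This spectral mixing bound can be checked on a small example (our own sketch; the 5-cycle, an odd cycle so that γ < 1, is an illustrative assumption):

```python
import numpy as np

# Cycle graph C_5: connected, undirected, 2-regular (odd length, so non-bipartite).
n, m = 5, 2
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[i, (i - 1) % n] = 1.0
A /= m                                      # normalized adjacency matrix

lam = np.sort(np.linalg.eigvalsh(A))        # ascending eigenvalues
gamma = max(abs(lam[0]), abs(lam[-2]))      # gamma = max(|lambda_n|, |lambda_2|)

x = np.zeros(n); x[0] = 1.0                 # walker concentrated at one vertex
uniform = np.ones(n) / n
for t in range(1, 30):
    x = A @ x                               # after this line, x = A^t x_0
    assert np.linalg.norm(x - uniform) <= gamma ** t + 1e-12
print("||A^t x - 1/n||_2 <= gamma^t verified; gamma =", round(gamma, 4))
```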
This suggests transference to quantum channels as in Equation (32). Remark 4.4 yields a similar result to Proposition 2.16, but the argument is simplified in the classical setting.
The following Lemma formalizes the notion of transference for a graph: Lemma 4.5. Let G be a graph with edge set E having cardinality |E|. Let Φ_p, Φ_q be quantum channels of the form in Equation (37) for probability vectors p, q ∈ l_1(E). Let Ψ_p, Ψ_q be the respectively corresponding classical channels with the same weighting, defined by applying the quantum channel on diagonal densities with the identification v_i ↔ |i⟩⟨i| for vertex basis vector v_i. Then for any ζ ∈ (0, 1), the quantum comparison of Equation (39) holds if and only if the classical comparison of Equation (40) holds for all input vectors x.
Proof. That Equation (39) implies Equation (40) follows immediately from diagonality preservation and the classical-quantum identification.
To transfer the result of Theorem 4.3 to the setting of Remark 4.4, we compare repeated applications of single edge conditional expectations as in Equation (35) to a random walk. One step of a random walk on a graph G with edge set E is represented by the channel Φ_RW(G) defined in Equation (41). The corresponding classical channel Ψ_RW(G), via the identification of Lemma 4.5, is equivalent to left multiplication by the normalized adjacency matrix of G as in Theorem 4.3.
Lemma 4.6. Let G be an m-regular graph with n vertices. Then for any k ∈ N, t < km/2, ρ diagonal in the basis of G, and Φ_RW(G) defined by Equation (41), the stated bound holds, where ν = 2t/(km) and Φ^{(≥t)} = Σ_s μ(s) Φ_RW(G)^s for some probability distribution μ on t...|E|k. Here D(p‖q) is defined for p, q ∈ [0, 1] as the relative entropy of the two-outcome probability distribution (p, 1 − p) to (q, 1 − q).
Proof. Let |E| = mn/2 denote the number of edges in G. Via Lemma 3.2 and the convexity of relative entropy, we obtain a bound with weight 1/|E| for any channel Θ from the input space to itself. Iterating |E|k times starting with Θ = 1 yields the claimed decomposition. Each edge conditional expectation E_{i,j} has probability 1/2 to apply Φ_{i,j}. Let Φ_G := (1/|E|) Σ_{(i,j)∈E} E_{i,j}. As in Lemma 4.5, we define a corresponding classical channel Ψ_G : l_1(n) → l_1(n). For the rest of the proof, we analyze Ψ_G^k. Let v_i denote the ith element in the canonical basis of l_1^n, which has probability fully concentrated at the ith vertex. Since Ψ_G applies the conditional expectation corresponding to each edge with equal probability, and each edge touches two nodes, it has a probability of 2/n to apply a conditional expectation that would affect v_i. If it does so, then with probability 1/2 it applies Ψ_{i,j} for some (i, j) ∈ E, which we may regard as one step in a random walk. Since the action on {|i⟩⟨i|} defines the action on diagonal vectors, Ψ_G applies Ψ_RW(G) with probability 1/n and the identity otherwise.
Let t denote the number of steps in a random walk. Since each application of Ψ_G is equivalent to applying at least one step in a random walk with probability at least 1/n, we obtain a binomial distribution for taking t steps in k|E| trials. By the Chernoff bound (see Theorem 1 in [54]), for any ν ∈ [0, 1), p(t ≤ νk|E|/n) ≤ exp(−k|E| D(ν/n ‖ 1/n)).
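The Chernoff tail bound in this form can be checked directly against the exact binomial tail (our own sketch; the particular values of n, the trial count, and ν are illustrative assumptions):

```python
import math

def binom_tail_le(N, q, T):
    """Exact P(X <= T) for X ~ Binomial(N, q)."""
    return sum(math.comb(N, s) * q**s * (1 - q)**(N - s) for s in range(T + 1))

def D(p, q):
    """Binary relative entropy of (p, 1-p) to (q, 1-q)."""
    def term(a, b):
        return 0.0 if a == 0 else a * math.log(a / b)
    return term(p, q) + term(1 - p, 1 - q)

# Success probability 1/n per trial, N trials, threshold nu * N / n.
n, N, nu = 8, 200, 0.5
q = 1 / n
T = int(nu * N / n)
bound = math.exp(-N * D(nu / n, 1 / n))     # Chernoff lower-tail bound
print(binom_tail_le(N, q, T) <= bound)      # exact tail obeys the bound
```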
Since the relative entropy is positive and convex, the bound holds for any t ≤ νk|E|/n = νkm/2. Combining the above with Equation (43) completes the first inequality of the Lemma. For the 2nd inequality, we expand and estimate the binary relative entropy.
To complete the Lemma, we substitute in Equation (44).
Via transference, we obtain relative entropy bounds on the quantum representation of a random walk on the partly diagonal algebra l_1^n ⊗ S_1^n: Lemma 4.7. For any t ∈ N and m-regular graph G with γ as defined in Theorem 4.3, E_G as in Equation (36), and Φ_RW(G) as in Equation (41), the stated comparison holds. Proof. Let Ψ_RW(G) be the transferred classical analog of Φ_RW(G) from Equation (41). We apply Theorem 4.3 to Ψ_RW(G) with γ as defined therein, recalling that Ψ_RW(G) is equivalent to left multiplication by G's normalized adjacency matrix. Since the 2-norm is an upper bound for the ∞-norm, the supremum over entries obeys the same bound. Via Remark 4.4, Ψ^{(≥t)}_{RW(G)} admits a corresponding decomposition for any input probability vector x and some classical channel Θ. By Lemma 4.5 and the properties of the cp-order, the analogous comparison holds for any input density ρ.
A possible next step would be to apply Corollary 2.15 with ζ = nγ^t and c = n. Instead, we obtain a better estimate of the constant c. Returning to Equation (45), the maximum element of Ψ^{(≥t)}_{RW(G)}(x) is at most γ^t + 1/n for any x. Recalling Equation (46) and applying Lemma 4.5 again, Θ(x) ≤ 2 × 1/n, so Φ̃(ρ) ≤ 2E_G(ρ). We conclude that c ≤ 2. Using Corollary 2.15 with ζ = nγ^t and c ≤ 2 completes the Lemma.
Combining results of this Subsection, we obtain the technical version of Theorem 1.8: Theorem 4.8. Let G be an m-regular graph with n vertices and γ as defined in Theorem 4.3. Then for any k, t ∈ N such that max{⌈log_γ(1/n)⌉, 2/m} ≤ t ≤ (1 − 1/2^{m−1})k, the stated quasi-factorization holds, where D(p‖q) is defined for p, q ∈ [0, 1] as the relative entropy of the two-outcome probability distribution (p, 1 − p) to (q, 1 − q). Hence the corresponding Lindbladian has CMLSI with the same constant.
Proof. To use Lemmas involving Φ_p, we must first reduce the desired results to those on densities that are diagonal on the subsystem on which the graph acts, invoking the chain rule of relative entropy and the data processing inequality. To estimate D(ρ‖E_G(ρ)) in terms of Σ_{l,j} D(ρ‖E_{l,j}(ρ)), we first handle the relative entropy to the diagonal subalgebra, then handle the diagonal case.
To estimate D(ρ‖E_diag(ρ)) in terms of Σ_l D(ρ‖E_{diag(l)}(ρ)), we use Lemma 3.2 to combine diagonalizing conditional expectations. Since each vertex appears m times as an endpoint of an edge, but we only use one vertex in each edge, we double the sum over edges to count each vertex exactly m times. Since [E_{diag(l)}, E_{diag(j)}] = 0 for all l, j ∈ 1...n, the resulting combination holds for any coefficient 1 ≥ α > 0. Without loss of generality we may henceforth assume that ρ = E_diag(ρ). Lemma 4.6 then applies, and Lemma 4.7 implies (assuming that ρ = E_diag(ρ)) the corresponding comparison. A re-arrangement of this Equation yields the quasi-factorization part of the Theorem.
For the simplified Theorem 1.8 as in the introduction, we replace the binary entropy by its estimate from Lemma 4.6. To simplify the expression, we choose t = k/2 and then k = 2⌈log_γ(1/n)⌉. Doing so, we arrive at the stated estimate. One may confirm, by calculating the derivative and the value at m = 2, that (ln m + 1)/m < 1, so for sufficiently large n, the subtracted exponential asymptotes to zero. This yields the simplified Theorem 1.8. Theorem 4.8 is especially powerful when γ is bounded away from 1 independently of n, in which case it recovers the expected logarithmic mixing time for Ramanujan and similar graphs. Such graphs are commonly known as fast expanders. These graphs have the fastest mixing times possible at fixed degree.
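The claim about (ln m + 1)/m can be verified mechanically: the value at m = 2 is below 1, and the derivative −ln(m)/m² is negative for m > 1, so the function stays below 1 on the integers m ≥ 2. A quick numerical confirmation:

```python
import math

f = lambda m: (math.log(m) + 1) / m
# f(2) < 1, and f'(m) = -ln(m)/m^2 < 0 for m > 1, so f(m) < 1 for all m >= 2
assert f(2) < 1
assert all(f(m + 1) < f(m) for m in range(2, 1000))
```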
In contrast to graphs with γ bounded away from 1 stands the cyclic graph, which is 2-regular and has the slowest mixing time, up to constants, of any connected, regular graph. In [40], it was shown that Lindbladians corresponding to cyclic graphs have CMLSI with constant O(1/n²). Based on classical mixing times, we expect this dependence to be optimal. It is well known that the eigenvalues of a cycle's normalized adjacency matrix take the values cos(2πl/n) for l ∈ 0, …, n − 1, so γ = cos(2π/n) ≈ 1 − 2π²/n². Hence ⌈log_γ(1/n)⌉ = O(n² ln n), which is larger than the expected O(n²) time for a random walk to converge on a cyclic graph. We may however obtain better estimates through a more case-specific, fine-grained analysis:

Example 4.9 (Cyclic Graph). Here we will not go through Theorem 4.3 but will instead directly calculate bounds on the minimum and maximum probability after O(n²) steps. For any a ∈ ℕ, a random walker on an infinite, one-dimensional lattice has probability of landing s steps away from its original position after an² steps given by the binomial distribution C(an², an²/2 + s) 2^{−an²}, which is approximated within multiplicative error exp(1/(144an² + 12n)). The approximation follows from Robbins's precise form of Stirling's approximation [55]. Any location's probability on the cycle will be at least as large as its corresponding probability on the infinite lattice, because all of the possibilities for the walker to escape the cycle instead return and contribute positively. For a lower bound on the probability that the walker is at its furthest, and therefore least likely, point on the cycle, s = n/2, we start with the a = 1 case. For sufficiently large n, we coarsely lower bound this probability by 1/(√(πa) n), bounding the s-dependent factors in Equation (48) below by 1/√2. Hence at a = 1, the walk decomposes through some classical channel Ψ. Since 1/n is invariant under the application of random walks, we then have for any a ∈ ℕ a corresponding decomposition through some classical channel Ψ_a.
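The spectral claims for the cycle can be checked numerically: the normalized adjacency of the n-cycle is circulant with eigenvalues cos(2πl/n), and the second-largest eigenvalue cos(2π/n) behaves as 1 − 2π²/n² + O(1/n⁴). A minimal sketch (NumPy assumed):

```python
import numpy as np

n = 500
# circulant normalized adjacency of the n-cycle
P = 0.5 * (np.roll(np.eye(n), 1, axis=0) + np.roll(np.eye(n), -1, axis=0))
eigs = np.sort(np.linalg.eigvalsh(P))[::-1]
predicted = np.sort(np.cos(2 * np.pi * np.arange(n) / n))[::-1]
assert np.allclose(eigs, predicted, atol=1e-8)
gamma = eigs[1]                     # second-largest eigenvalue = cos(2*pi/n)
# quadratic Taylor estimate of the spectral gap
assert abs(gamma - (1 - 2 * np.pi**2 / n**2)) < 1e-7
```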
Hence the minimum location probability is lower bounded as above. For an upper bound on the largest possible probability, we must account for the chance that the walker loops around and returns to its original position: we must count the contributions of an²/2 + rn for many values of r ∈ ℤ. The r = 0 term is bounded by √(2/(πa))/n times a factor that can be made arbitrarily close to 1 for large enough n, so we may use the simplified overestimate 2/(n√(πa)). On an infinite line, the walker's probability of landing on any location at least n steps away from its original location is at most that of landing exactly n steps away, which is upper bounded by 1/(2n). Since left and right paths lead to the same location on the cycle, the probability of r = ±1 is upper bounded by 1/n. To simplify this problem, we use Hoeffding's inequality [56], obtaining p(|s| ≥ |r|n) ≤ 2 exp(−2an²(r/(an))²) = 2 exp(−2r²/a), where p(|s| ≥ |r|n) is the probability of taking a net |r|n steps in either direction. The total probability of being at any location declines with distance from the origin, and there are n total positions. The probability to get past r = 1 is in total less than 2 exp(−2/a), bounding the total probability for r between 2 and 3. Hence the least likely s ∈ [n, 2n) has probability at most 2 exp(−2/a)/n, which upper bounds the probability of r = ±2 by 2 exp(−2/a)/n. Iterating, the total probability of r ≥ 2 is bounded by the series Σ_{l=1}^∞ exp(−l²/a)/n. This series is not easily expressed in closed form via elementary functions, but it is upper bounded by the geometric series Σ_l exp(−l/a) = 1/(e^{1/a} − 1). Since e^{1/a} ≥ 1 + 1/a, the series's total is upper bounded by a, yielding a contribution of a/n. Adding the r = 0, r = 1, and r > 1 cases, we find a total of p_max(a) ≤ 2/(n√(πa)) + 1/n + a/n.
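The tail-series comparison used above, Σ_{l≥1} exp(−l²/a) ≤ Σ_{l≥1} exp(−l/a) = 1/(e^{1/a} − 1) ≤ a, is easy to confirm numerically; a minimal sketch:

```python
import math

def tail_bounds(a, N=5000):
    """Compare the Gaussian-type tail series with its geometric majorant."""
    tail = sum(math.exp(-l * l / a) for l in range(1, N + 1))
    geom = 1.0 / (math.exp(1.0 / a) - 1.0)  # closed form of sum_l exp(-l/a)
    return tail, geom

for a in (0.5, 1.0, 5.0, 20.0):
    tail, geom = tail_bounds(a)
    # exp(-l^2/a) <= exp(-l/a) for l >= 1, and e^{1/a} >= 1 + 1/a gives geom <= a
    assert tail <= geom <= a
```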
Using the transference technique from Lemma 4.5 and a simple calculation, we obtain ζ = 1 − n p_min(a) and Θ ≤_cp n(p_min + p_max/(1 − n p_min)) E_G. For large enough a that is constant in n, and sufficiently large n, a non-trivial bound follows from Corollary 2.15. Using Lemma 4.6, we obtain a bound of order κ/(bn²) for a large enough constant b, with κ lower bounded above zero independently of n.
Hence the cycle has quasi-factorization constant O(n²). Via Theorem 1.4, the corresponding Lindbladian has O(1/n²)-CMLSI. This Example shows that for the cycle, quasi-factorization and multiplicative relative entropy comparison suffice to obtain bounds of the expected best asymptotic order.
Another graph, very unlike the expander or cycle, is the complete graph, which is n-regular. CMLSI for complete graphs was studied in [21]. For a complete graph, the edge-averaged conditional expectation (1/|E|) Σ_{(i,j)∈E} E_{i,j} takes a simple form. Using the convexity of relative entropy, Σ_{i,j} D(ρ‖E_{i,j}(ρ)) ≥ |E| D(ρ‖(1/|E|) Σ_{(i,j)∈E} E_{i,j}(ρ)). The right hand side of Equation (49) is already a convex combination of E_G with another channel, so we simply use convexity of relative entropy. With iteration, we find a quasi-factorization constant of O(1/n). That the dependence is o(1) in n arises because we have not normalized by the degree: were we to construct such a jump process, the probability of making any jump in an infinitesimal time interval would grow proportionally to n. If we normalize by 1/m = 1/n, this leads to an O(1) CMLSI constant. As the Lindbladian constructed from a complete graph (again up to normalizing factors) already generates a convex combination including E_G, earlier notions and methods [34,20,21] already suffice to show CMLSI for complete graphs. Finally, we note again that there are many ways to represent a graph or construct its corresponding Lindbladian. For example, [40] uses a single unitary u such that u^n = 1 to generate the edge transitions of a cyclic group, corresponding to a cyclic graph. In many of these cases, we can still use Theorem 4.3 and the same methods of analysis as in Theorem 4.8 to similar effect, and via the classical Laplacian construction of Remark 1.9, the results are in many cases comparable.
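In the classical picture, modeling E_{i,j} as the map that averages entries i and j of a probability vector (our simplifying assumption for illustration), the uniform average over all edges of the complete graph is exactly a convex combination of the identity and the uniform projection E_G, with weight 1/(n − 1) on E_G; a sketch:

```python
import numpy as np
from itertools import combinations

n = 20
rng = np.random.default_rng(0)
x = rng.random(n)
x /= x.sum()                               # probability vector

def E_pair(x, i, j):
    """Classical model of E_{i,j}: average entries i and j."""
    y = x.copy()
    y[i] = y[j] = (x[i] + x[j]) / 2
    return y

pairs = list(combinations(range(n), 2))    # edges of the complete graph
avg = sum(E_pair(x, i, j) for i, j in pairs) / len(pairs)
# convex combination: identity with weight 1 - 1/(n-1), E_G with weight 1/(n-1)
predicted = (1 - 1 / (n - 1)) * x + (1 / (n - 1)) * np.full(n, x.mean())
assert np.allclose(avg, predicted)
```

The explicit 1/(n − 1) weight on E_G is consistent with the O(1/n) quasi-factorization constant found above.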

Conclusions and Outlook
The underpinnings of this paper's results combine the entropy-geometry links from [48,21] that have continued in [40,52], functional calculus as in [30], and iterative, computation-like techniques as detailed in Sections 2.1, 3, and 4.3. Using these methods, we derive comparisons between relative entropies of particular relevance to scenarios involving subalgebra restrictions or decay processes. This combination of techniques may be useful in future studies.
It remains open what the best possible quasi-factorization constants are and whether they are calculable in a simple, closed-form expression. In [30], a two-sided bound is shown in terms of an L_2 → L_2 norm difference between a composition of conditional expectations and their intersection, but the bound is not necessarily tight on either side.
A still open question is to what extent quasi-factorization and related inequalities hold for infinite dimensions. Proposition 2.16 breaks down if the minimum dimension of a conditional expectation E is infinite. Theorem 1.3 is expected to hold beyond finite dimensions and even in non-tracial von Neumann algebras, but it remains to check that assumptions and cited results hold. Once this is verified, most results of this paper will probably carry through to infinite-dimensional settings.

Acknowledgments
NL is supported by IBM as a Postdoctoral Scholar at UChicago & the Chicago Quantum Exchange. NL was previously supported by the Department of Physics at the University of Illinois at Urbana-Champaign.
I thank Marius Junge for early feedback on these results. I also acknowledge interactions with Li Gao as helping to motivate this project and with Ángela Capel as inspiring some improvements.

Author Declarations
The author has no conflicts to disclose.
This will allow us to deal with the two terms in Equation (A.1) individually. First, we handle the logarithm term, ln((1 − ζ) + ζnρ_k), by finding ζ such that the desired comparison holds. We rewrite the right hand side as ln(1 + ζnδ/((1 − ζ) + ζnρ_j)).
Proof of 2.6. Given that ζ ≤ min{a(1 − b)/(n + a(1 − b) + 1), b/((1 − ab)n + ab + 1)}, we seek: (1) a value of b at which both expressions are equal, so that the minimum is maximized; (2) the maximum possible constraint on ζ; (3) for a given ζ, the corresponding best achievable a; (4) a formula for ζ given a reasonable a.
The optimal solution may use numerics. Here we derive an approximation.
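If the constraint is read as ζ ≤ min{a(1 − b)/(n + a(1 − b) + 1), b/((1 − ab)n + ab + 1)} (our reconstruction of the garbled display, so an assumption), then step (1) amounts to equalizing a decreasing and an increasing function of b, which a simple bisection handles; a hypothetical numerical sketch:

```python
def best_b(a, n, tol=1e-12):
    """Bisection sketch for step (1): maximize min(f1, f2) over b.
    f1 decreases and f2 increases in b, so the minimum peaks where they
    cross (or at the right endpoint if they never cross).
    The forms of f1 and f2 are reconstructed assumptions."""
    f1 = lambda b: a * (1 - b) / (n + a * (1 - b) + 1)
    f2 = lambda b: b / ((1 - a * b) * n + a * b + 1)
    lo, hi = 0.0, min(1.0, 1.0 / a)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f1(mid) > f2(mid):
            lo = mid
        else:
            hi = mid
    b = (lo + hi) / 2
    return b, min(f1(b), f2(b))   # equalizing b and the resulting zeta bound
```

Because min(f1, f2) is unimodal under these monotonicity assumptions, the bisection returns the maximizer whether or not the two curves cross in the interior.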
Here we show a proof of Proposition 1.6 that does not rely on Fisher information. While the proof is more involved, this version gives some intuition that might be useful in future work. To facilitate this proof, we recall a continuity bound for subalgebra-relative entropy and its weighted generalizations. When σ = 1/d in dimension d, κ = D(M‖N) = sup_ρ D(E_M(ρ)‖E_N(ρ)) as described in [22] and applied to estimate wsb in Proposition 3.7 therein. In general, Lemma 7 in [7] implies that

|D(ρ‖E_{N,σ*}(ρ)) − D(ω‖E_{N,σ*}(ω))| ≤ wsb(‖ρ − ω‖_1, N ⊆ M, σ) . (A.7)

As long as σ is faithful, κ is finite.
Proposition A.3 (Restatement of 1.6). Let {Φ_j^t : j ∈ 1, …, J}, J ∈ ℕ, be self-adjoint quantum Markov semigroups such that Φ_j^t = exp(−L_j t), with fixed-point conditional expectation E_{j*} = lim_{t→∞} Φ_j^t for each j, weighted respectively by (σ_j). Let E_{σ*} be the weighted intersection fixed-point conditional expectation, assuming the E_{j*} are compatibly weighted so that it exists. Let Φ^t be the semigroup generated by L = Σ_j α_j L_j + L_0, where L_0 generates Φ_0^t such that Φ_0^t E_{σ*} = E_{σ*} Φ_0^t = E_{σ*}. If {Φ_j^t} has {α_j}-(C)SQF, and Φ_j^t has λ-(C)MLSI for each j, then Φ^t has λ-(C)MLSI.