Contextuality and the fundamental theorems of quantum mechanics

Contextuality is a key feature of quantum mechanics, as was first brought to light by Bohr and later realised more technically by Kochen and Specker. Isham and Butterfield put contextuality at the heart of their topos-based formalism and gave a reformulation of the Kochen-Specker theorem in the language of presheaves. Here, we broaden this perspective considerably (partly drawing on existing, but scattered results) and show that apart from the Kochen-Specker theorem, also Wigner's theorem, Gleason's theorem, and Bell's theorem relate fundamentally to contextuality. We provide reformulations of the theorems using the language of presheaves over contexts and give general versions valid for von Neumann algebras. This shows that a very substantial part of the structure of quantum theory is encoded by contextuality.


Introduction
Structural theorems of quantum theory. There is a small number of key theorems in the foundations of quantum theory which throw the differences between classical and quantum into sharp relief. The first and oldest of these is Wigner's theorem [69] from 1931, which shows that each transformation on the set of pure states of a quantum mechanical system that preserves transition probabilities is given by conjugation with a unitary or anti-unitary operator. Hence, the pure state space of a quantum system has a very specific structure.
In 1957, Gleason [35] proved that any assignment of probabilities to projection operators such that probabilities assigned to orthogonal projections add up must be given by a quantum state already. Since projections represent propositions about the values of physical quantities by the spectral theorem, Gleason's result justified the use of the Born rule (30 years after its introduction) when calculating expectation values in quantum mechanics.
Bell [11] proved in 1964 that in local hidden variable theories, there is an upper bound on correlations that can exist between spatially separated subsystems of a composite system. Quantum theory violates this upper bound and hence cannot be (or be replaced by) a local hidden variable theory. There are a number of fine interpretational points, but it is largely accepted today that the violation of Bell's inequality has been confirmed experimentally.
Finally, in 1967 Kochen and Specker [49] showed that under mild and natural conditions, it is mathematically impossible to assign values to all physical quantities simultaneously. Usually, this is phrased as saying that there are no non-contextual value assignments. Since in classical physics, states do assign values to all physical quantities at once, this is a rather strong result and a severe obstacle to any realist interpretation of the quantum formalism.
Each of these landmark theorems singles out a central aspect of quantum theory that distinguishes it from classical physics. The contents of the theorems are very distinct; each one concerns a different structural aspect of quantum theory. Yet, in this article we will show that in fact all these theorems have a common source, which is contextuality. This strongly suggests that contextuality is at the heart of quantum theory and is largely responsible for the structural differences between classical and quantum.
Contextuality and its mathematical formalisation. Contextuality, which was introduced by Bohr [13], is a deep concept, and like many other deep concepts in physics, it has taken on a range of meanings and interpretations in the literature [36]. This has led to a certain danger of talking vaguely and at cross-purposes. In order to avoid this, we define precisely what we mean by contextuality and we give a rather minimal and conservative definition that comes with little interpretational baggage.
We say that a physical system has physical contextuality if it has some incompatible physical quantities, that is, quantities that cannot be measured simultaneously in an arbitrary state. For all we know today, physical contextuality in this sense is a characteristic feature of all quantum systems, but not of classical systems, and the restrictions on co-measurability are fundamental and not just due to a lack of experimenters' (or theoreticians') finesse.
Even in a physical system with physical contextuality, there are families of compatible, co-measurable physical quantities. (In an extreme case, such a family could consist of a single physical quantity, although this does not occur for quantum systems.) A family of compatible physical quantities is called a physical context. We remark that
• Our definition of physical contextuality does not refer to actual measurement setups, but we could, as is often done, identify a measurement setup with a physical context, i.e., with the family of physical quantities that is measured by the setup.
• Our definition does not refer to values measured and/or possessed by physical quantities, nor to probabilities. This is not necessary for our purposes, and it avoids many interpretational issues that often cloud the discussion. In particular, we can consider whether non-contextual assignments of values, probabilities etc. are possible in our kind of contextual theory.
• In order to be compatible according to our definition, two physical quantities must be co-measurable in all states. Hence, two physical quantities that are co-measurable in some states can still be incompatible and lie in different contexts. We are not concerned with weak measurements, weak values etc.
The definition of physical contextuality is given in a somewhat intuitive manner, since no precise mathematical formalisation of 'physical quantities' and 'states' is provided so far. We remark, following Kochen and Specker [49] and especially Conway and Kochen [19], that much less than the full Hilbert space formalism needs to be given: as long as the physical system under consideration (or some subsystem of it) has the physical quantity usually called 'spin-1', which can be measured in different directions in space, the system is contextual, since measurements in different directions cannot be performed simultaneously. The mathematical formalism required to describe this situation is a modest part of projective geometry in 3 dimensions.
Yet, following standard practice, we will assume the usual Hilbert space formalism in which the set of (bounded) physical quantities is mathematically represented by the set $\mathcal{B}(\mathcal{H})_{sa}$ of bounded self-adjoint operators on the Hilbert space $\mathcal{H}$ of the system. The self-adjoint operators form the real part of the complex, noncommutative algebra of bounded operators on the Hilbert space of the system. A context is mathematically formalised as a commutative subalgebra of this noncommutative algebra. This can be generalised to von Neumann algebras of physical quantities. We will quickly recall the necessary mathematical background in Sec. 2. Moreover, we will introduce the context category and some minimal background on presheaves in order to make this article largely self-contained.
Structural theorems and contextuality. We will show that each of the key theorems mentioned above has an equivalent reformulation in the language of presheaves over contexts, thus showing the close and sometimes surprising connections between these theorems and contextuality. The prototype of such results is found in the work by Isham, Butterfield, and Hamilton [41,37] on the Kochen-Specker theorem. We will extend their results to the other fundamental theorems by Wigner, Gleason, and Bell and will provide a bigger, more coherent picture of the role of contextuality in the foundations of quantum mechanics.
In Sec. 3, Wigner's theorem is treated. The relevant presheaf is trivial, since Wigner's theorem is based on the mere order of contexts, as we will show. The Kochen-Specker theorem and its reformulation are presented in Sec. 4. Here, the so-called spectral presheaf plays the key role. It can be seen as a generalised state space for a quantum system, and the Kochen-Specker theorem is equivalent to the fact that this space has no points in a suitable sense. Gleason's theorem is treated in Sec. 5. Its reformulation is based on the so-called probabilistic presheaf, which does have points, i.e., global sections, and these correspond exactly with quantum states. Finally, we consider Bell's theorem and its relations to contextuality in Sec. 6. The relevant presheaf is a bipartite (or multipartite) version of the probabilistic presheaf. This presheaf is based on a simple way of composition via contexts, yet it is rich enough to encode all quantum correlations (and not more). In fact, by adding a notion of time orientation in subsystems it singles out quantum states unambiguously. Sec. 7 concludes.
2 Mathematical preliminaries
2.1 Algebras of physical quantities
Throughout, we will take the perspective of algebraic quantum mechanics, that is, we emphasise the role of the physical quantities, or observables, and the algebra they form. This means no departure from standard textbook quantum mechanics, just a slightly different perspective that allows for substantial generalisations. We will assume that the physical quantities generate a von Neumann algebra (more details below; standard references are e.g. [47,48,65]). This allows one to encode symmetries of the quantum system directly at the algebraic level by picking a von Neumann algebra that has a non-trivial commutant. Superselection rules can be modelled algebraically by algebras with non-trivial center.
The algebra of bounded operators, weak operator topology and norm topology. Let $\mathcal{H}$ be the complex Hilbert space of the quantum system under consideration. $\mathcal{H}$ may be finite- or infinite-dimensional. The set of all bounded linear operators on $\mathcal{H}$ is denoted $\mathcal{B}(\mathcal{H})$. Linear operators can be added, multiplied (by composition), and multiplied by complex numbers; $\mathcal{B}(\mathcal{H})$ is an algebra over $\mathbb{C}$. Of course, multiplication of operators is not commutative in general.
If $\mathcal{H}$ is of finite dimension $n$, then $\mathcal{H} = \mathbb{C}^n$ and $\mathcal{B}(\mathcal{H}) = M_n(\mathbb{C})$, the algebra of all $n \times n$ matrices with complex entries. If $\mathcal{H}$ is infinite-dimensional, then $\mathcal{B}(\mathcal{H})$ carries several interesting topologies (which all coincide in finite dimensions). We will only consider the weak operator topology and the norm topology. For details and many more results on topologies on $\mathcal{B}(\mathcal{H})$ and how they relate to each other, see e.g. [65].
Let $\langle\psi, \eta\rangle$ denote the inner product of $\psi, \eta \in \mathcal{H}$. Let $(a_i)_{i \in I}$ be a family of bounded operators on $\mathcal{H}$, labeled by a directed set $I$. The weak operator topology on $\mathcal{B}(\mathcal{H})$ is the topology of pointwise weak convergence, i.e., $a_i \to a$ weakly if $\langle\psi, a_i\eta\rangle \to \langle\psi, a\eta\rangle$ for all $\psi, \eta \in \mathcal{H}$. We will simply say weak topology from now on.
Let $a \in \mathcal{B}(\mathcal{H})$. The norm of $a$ is defined as
$$\|a\| := \sup_{\|\psi\| = 1} \|a\psi\|,$$
where $\|\psi\| = \sqrt{\langle\psi, \psi\rangle}$ is the norm of $\psi$. The topology on $\mathcal{B}(\mathcal{H})$ defined by the operator norm is called the norm topology (or uniform topology).
Physical quantities, self-adjoint operators and von Neumann algebras. Let $\mathcal{H}$ be the Hilbert space of the quantum system under consideration. The physical quantities of the quantum system are represented by the bounded self-adjoint operators on $\mathcal{H}$. The fact that we only consider bounded operators is not a severe restriction, since in the operator-algebraic framework there are ways of dealing with unbounded operators (by affiliating them to the von Neumann algebra of physical quantities, see e.g. section 5.6 in vol. 1 of [47,48]).
We assume as usual that the self-adjoint operators representing physical quantities form a real vector space under addition, denoted $\mathcal{O}_{sa}$. Multiplication (composition) of self-adjoint operators is not commutative in general, $ab \neq ba$, and the product $ab$ is self-adjoint if and only if $a$ and $b$ commute. The complexification of the set of self-adjoint operators,
$$\mathcal{O}_{sa} + i\,\mathcal{O}_{sa} = \{a + ib \mid a, b \in \mathcal{O}_{sa}\},$$
is a complex vector space that is closed under multiplication; moreover, it is a complex algebra. By construction, it is a subalgebra of $\mathcal{B}(\mathcal{H})$, the algebra of all bounded operators on the Hilbert space $\mathcal{H}$. We will always assume that this complex algebra is closed in the weak topology, i.e., the algebra is a von Neumann algebra (see e.g. [47]). We denote this algebra of physical quantities by $\mathcal{N}$. The real vector space of self-adjoint operators in $\mathcal{N}$ is denoted $\mathcal{N}_{sa}$.
Projections and the spectral theorem. Recall that a projection operator is a self-adjoint operator $p$ such that $p^2 = p$. The rank of a projection $p$ is the dimension of the subspace that $p$ projects onto (which may be infinite). By the spectral theorem, each self-adjoint operator $a$ is the norm limit of finite real linear combinations of projections, i.e., $a$ can be approximated by operators of the form $\sum_{i=1}^{n} A_i p_i$, where the $A_i$ are real numbers and the $p_i$ are mutually orthogonal projections, that is, $p_i p_j = \delta_{ij} p_i$. In finite dimensions, $a$ is exactly of the form $a = \sum_{i=1}^{n} A_i p_i$, where the projection $p_i$ projects onto the eigenspace of the eigenvalue $A_i$ of $a$. In infinite dimensions, $a$ may fail to have any eigenvalues, but it has a non-empty spectrum.
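As a small worked illustration of the finite-dimensional statement, consider a Hermitian $2 \times 2$ matrix. The matrix and its eigenvalues $A_1 = 1$, $A_2 = 3$ are chosen here purely for illustration and do not appear elsewhere in the text.

```latex
a = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}
  = 1 \cdot p_1 + 3 \cdot p_2 ,
\qquad
p_1 = \tfrac{1}{2}\begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix},
\quad
p_2 = \tfrac{1}{2}\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix},
\qquad
p_i p_j = \delta_{ij}\, p_i , \quad p_1 + p_2 = 1 .
```

Here $p_1$ and $p_2$ project onto the eigenspaces spanned by $(1,-1)$ and $(1,1)$, respectively, so the decomposition is exact, as expected in finite dimensions.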
We had assumed that the algebra $\mathcal{N}$ is weakly closed. This guarantees that $\mathcal{N}$ is generated by the projection operators that it contains in the following sense: let $a$ be a self-adjoint operator in $\mathcal{N}$. All the projection operators $p_i$ in the approximations to $a$ of the form $\sum_{i=1}^{n} A_i p_i$ can be chosen from $\mathcal{N}$. Moreover, any operator $b$ in a von Neumann algebra $\mathcal{N}$ can be decomposed uniquely as $b = a_1 + i a_2$, where $a_1, a_2$ are self-adjoint operators in $\mathcal{N}$. Since both $a_1$ and $a_2$ can be approximated by real linear combinations of projections in $\mathcal{N}$, also $b$ can be approximated by (complex) linear combinations of projections in $\mathcal{N}$.
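The double commutant and weak closure. There is another, more algebraic way of expressing weak closure. Let $S$ be a family of bounded operators on $\mathcal{H}$. The commutant of $S$, denoted $S'$, is the set of all operators in $\mathcal{B}(\mathcal{H})$ that commute with all operators in $S$,
$$S' := \{b \in \mathcal{B}(\mathcal{H}) \mid ab = ba \text{ for all } a \in S\}.$$
Von Neumann's double commutant theorem (see e.g. [47]) shows that a unital $*$-subalgebra $\mathcal{N}$ of $\mathcal{B}(\mathcal{H})$ is weakly closed, i.e., a von Neumann algebra, if and only if $\mathcal{N} = \mathcal{N}''$. Here, $\mathcal{N}'' := (\mathcal{N}')'$ is the double commutant of $\mathcal{N}$. In particular, $\mathcal{B}(\mathcal{H})$ itself is a von Neumann algebra. Any other von Neumann algebra $\mathcal{N}$ formed by bounded operators on $\mathcal{H}$ is a weakly closed subalgebra of $\mathcal{B}(\mathcal{H})$.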
If $\mathcal{H}$ is finite-dimensional, $\mathcal{H} = \mathbb{C}^n$, the algebra $\mathcal{B}(\mathcal{H})$ is simply the algebra of all $n \times n$ matrices. Self-adjoint operators are given by Hermitian matrices. In finite dimensions, every operator is bounded and every subalgebra of $\mathcal{B}(\mathcal{H})$ is weakly closed.
The lattice of projections. The projections in a von Neumann algebra $\mathcal{N}$ form a complete orthomodular lattice, denoted $\mathcal{P}(\mathcal{N})$. The lattice operations can most easily be understood in geometric terms: let $S_p$ denote the closed subspace of the Hilbert space $\mathcal{H}$ onto which $p$ projects. (Clearly, there is a one-to-one correspondence between projection operators and closed subspaces of $\mathcal{H}$.) The meet $p \wedge q$ of two projections is the projection onto the subspace $S_p \cap S_q$, and the join $p \vee q$ is the projection onto the closure of the subspace generated by $S_p$ and $S_q$. The largest projection is $1$, the identity operator, which projects onto the whole of $\mathcal{H}$. The smallest projection is $0$, the zero projection, which projects onto the null subspace. Note that $\mathcal{P}(\mathcal{H})$ is not distributive, that is, in general we have
$$p \wedge (q \vee r) \neq (p \wedge q) \vee (p \wedge r).$$
It is easy to find examples in any Hilbert space of dimension 2 or greater: just take $p, q, r$ to be projections onto three distinct one-dimensional subspaces lying in a plane.
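To make the failure of distributivity concrete, here is one such choice in $\mathbb{C}^2$. The specific projections are picked for illustration only. Any two distinct rank-1 projections in $\mathbb{C}^2$ have meet $0$ and join $1$, which is all that is used.

```latex
p = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \quad
q = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}, \quad
r = \tfrac{1}{2}\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}
\quad\Longrightarrow\quad
p \wedge (q \vee r) = p \wedge 1 = p
\;\neq\;
0 = 0 \vee 0 = (p \wedge q) \vee (p \wedge r).
```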
Every family of projections $(p_i)_{i \in I}$ has a least upper bound (join) in $\mathcal{P}(\mathcal{N})$, denoted $\bigvee_{i \in I} p_i$, and also has a greatest lower bound (meet) in $\mathcal{P}(\mathcal{N})$, denoted $\bigwedge_{i \in I} p_i$, so $\mathcal{P}(\mathcal{N})$ is a complete lattice. Moreover, there is a complement defined on $\mathcal{P}(\mathcal{N})$, given by $1 - p$ for $p \in \mathcal{P}(\mathcal{N})$. Geometrically, $1 - p$ is the projection onto the orthogonal complement of the subspace $S_p$ that $p$ projects onto.
In quantum logic, the projection operators are interpreted as propositions about the values of physical quantities. Let $a$ be a self-adjoint operator representing some physical quantity. For simplicity, assume that $a = \sum_{i=1}^{n} A_i p_i$. Then the projection $p_i$ represents the proposition "the physical quantity (represented by) $a$ has the value $A_i$". The lattice operations $\wedge$, $\vee$ and the complement are interpreted logically as And, Or, and Not. Non-distributivity and other conceptual problems make it difficult to interpret quantum logic in any straightforward way, though.
States on a von Neumann algebra. A positive operator $a$ in a von Neumann algebra $\mathcal{N}$ is an operator that is of the form $a = b^2$ for some self-adjoint operator $b \in \mathcal{N}_{sa}$. Clearly, positive operators are self-adjoint. A state on a von Neumann algebra $\mathcal{N}$ is a linear map $\rho : \mathcal{N} \to \mathbb{C}$ such that
(a) $\rho(1) = 1$,
(b) $\rho(a) \geq 0$ for all positive operators $a \in \mathcal{N}$.
Hence, a state is a positive linear functional of norm 1. The states on a von Neumann algebra $\mathcal{N}$ form a convex set $\mathcal{S}(\mathcal{N})$. The extreme points of $\mathcal{S}(\mathcal{N})$ are called pure states. In the case of $\mathcal{N} = \mathcal{B}(\mathcal{H})$, the pure states are the familiar vector states.
A state ρ is called multiplicative if ρ(ab) = ρ(a)ρ(b) for all a, b ∈ N sa . This is a strong condition: multiplicative states exist only on commutative von Neumann algebras. For these, they are exactly the pure states.
A state $\rho$ is called normal if $\rho(a_i) \to \rho(a)$ for every monotone increasing net of operators $a_i \in \mathcal{N}$ with least upper bound $a$. Equivalently, $\rho$ is normal if it is completely additive, $\rho(\sum_{i \in I} p_i) = \sum_{i \in I} \rho(p_i)$, for every orthogonal family $(p_i)_{i \in I}$ of projections in $\mathcal{P}(\mathcal{N})$. Normal states correspond to density matrices (or, more precisely, to positive operators which have trace 1), that is, every normal state $\rho : \mathcal{N} \to \mathbb{C}$ is of the form
$$\rho(a) = \mathrm{tr}(\varrho a) \quad \text{for all } a \in \mathcal{N},$$
where $\varrho$ is a density matrix (in finite dimensions), or more generally a positive operator of trace 1. We will often use $\rho$ for both the state and its corresponding positive operator. Note that in finite dimensions, every self-adjoint operator has a finite trace, but in infinite dimensions, having a finite trace is a proper condition. Those operators that have a finite trace are called trace-class.
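The trace formula for normal states can be sketched numerically for a single qubit. This is a minimal illustration in plain Python. The density matrix, the choice of Pauli matrices as physical quantities, and all function names are assumptions made for the example and are not taken from the text.

```python
def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def trace(a):
    """Sum of the diagonal entries."""
    return sum(a[i][i] for i in range(len(a)))


def expectation(rho, a):
    """Value rho(a) = tr(rho a) of the normal state with density matrix rho."""
    return trace(matmul(rho, a))


# Hypothetical qubit data: a diagonal density matrix (positive, trace 1).
rho = [[0.75, 0.0], [0.0, 0.25]]
identity = [[1.0, 0.0], [0.0, 1.0]]
sigma_z = [[1.0, 0.0], [0.0, -1.0]]
sigma_x = [[0.0, 1.0], [1.0, 0.0]]
p_up = [[1.0, 0.0], [0.0, 0.0]]    # proposition "sigma_z has the value +1"
p_down = [[0.0, 0.0], [0.0, 1.0]]  # proposition "sigma_z has the value -1"

norm = expectation(rho, identity)   # condition (a): rho(1) = 1
mean_z = expectation(rho, sigma_z)  # expectation value of sigma_z
mean_x = expectation(rho, sigma_x)  # expectation value of sigma_x
prob_up = expectation(rho, p_up)    # probability assigned to p_up
prob_down = expectation(rho, p_down)
```

Since `p_up` and `p_down` are orthogonal and sum to the identity, their probabilities add up to `norm`, which is the additivity on orthogonal projections referred to above.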
Just as in standard quantum mechanics, in algebraic quantum theory (mathematical) states on a von Neumann algebra are interpreted as physical states of the quantum system, assigning expectation values to physical quantities.
Jordan algebras and associativity. Instead of the usual multiplication (by composition) of self-adjoint operators, one can also use another product, defined by
$$a \cdot b := \frac{1}{2}(ab + ba).$$
This is the Jordan product, given by the anti-commutator of $a$ and $b$ up to the conventional factor $\frac{1}{2}$. In contrast to the usual product $ab$, the Jordan product $a \cdot b$ is always self-adjoint, even if $a$ and $b$ do not commute. Hence, there is a real Jordan algebra $(\mathcal{B}(\mathcal{H})_{sa}, \cdot)$. Its complexification is $(\mathcal{B}(\mathcal{H}), \cdot)$.
Clearly, the Jordan product is commutative, $a \cdot b = b \cdot a$ for all $a, b \in \mathcal{B}(\mathcal{H})_{sa}$. To every von Neumann algebra $\mathcal{N}$, we can associate a weakly closed Jordan algebra $\mathcal{J}(\mathcal{N})$ by replacing the generally noncommutative product in $\mathcal{N}$ by the commutative Jordan product. There is a 'shadow' of noncommutativity left: it is easy to show that $\mathcal{J}(\mathcal{N})$ is associative if and only if $\mathcal{N}$ is commutative.
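The properties of the Jordan product stated above can be checked on the smallest noncommutative example, $M_2(\mathbb{C})$. The following is a sketch in plain Python. The use of the Pauli matrices $\sigma_x$, $\sigma_z$ and all helper names are choices made here for illustration.

```python
def matmul(a, b):
    """Ordinary (composition) product of two square matrices as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def jordan(a, b):
    """Jordan product a . b = (ab + ba) / 2."""
    ab, ba = matmul(a, b), matmul(b, a)
    n = len(a)
    return [[(ab[i][j] + ba[i][j]) / 2 for j in range(n)] for i in range(n)]


def adjoint(a):
    """Conjugate transpose of a square matrix."""
    n = len(a)
    return [[complex(a[j][i]).conjugate() for j in range(n)] for i in range(n)]


def equal(a, b, tol=1e-12):
    """Entrywise comparison up to a small numerical tolerance."""
    n = len(a)
    return all(abs(a[i][j] - b[i][j]) <= tol for i in range(n) for j in range(n))


# Two non-commuting self-adjoint operators (illustrative choice).
sigma_x = [[0, 1], [1, 0]]
sigma_z = [[1, 0], [0, -1]]

xz = matmul(sigma_x, sigma_z)
product_is_self_adjoint = equal(xz, adjoint(xz))

j = jordan(sigma_x, sigma_z)
jordan_is_self_adjoint = equal(j, adjoint(j))
jordan_is_commutative = equal(j, jordan(sigma_z, sigma_x))

# Associativity test: (x . x) . z versus x . (x . z)
left = jordan(jordan(sigma_x, sigma_x), sigma_z)   # equals sigma_z
right = jordan(sigma_x, jordan(sigma_x, sigma_z))  # equals the zero matrix
jordan_is_associative_here = equal(left, right)
```

The ordinary product of the two matrices fails to be self-adjoint, while their Jordan product is self-adjoint and symmetric in its arguments. The last comparison exhibits the 'shadow' of noncommutativity: the two bracketings differ.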

Contexts and the context category
Mathematical contexts. We begin with the basic definitions of a (mathematical) context and the partially ordered set of all contexts of a quantum system: a context of a quantum system with von Neumann algebra $\mathcal{N}$ of physical quantities is a commutative von Neumann subalgebra $V \subseteq \mathcal{N}$. This definition of a (mathematical) context simply encodes the idea that within a (physical) context, which can be identified with a chosen experimental setup, all the physical quantities are compatible, co-measurable, and hence are represented mathematically by commuting self-adjoint operators. We denote contexts by $V$, $\tilde{V}$, etc.
The context category. There is a natural partial order on contexts: some contexts are maximal, that is, one cannot add any further self-adjoint operators to them without destroying commutativity. Other contexts are non-maximal; they are commutative von Neumann subalgebras that are properly contained in a larger context. In fact, if $V$ is a non-maximal context, there are many different maximal contexts containing $V$. This can already be seen in finite dimensions:
Example 2. Consider a spin-1 system with Hilbert space $\mathcal{H} = \mathbb{C}^3$ and algebra of physical quantities $\mathcal{B}(\mathcal{H}) = M_3(\mathbb{C})$. Recall that the physical quantities of the system are represented by self-adjoint operators in $\mathcal{B}(\mathcal{H})$. Let $p_1$ be a projection onto a one-dimensional subspace, and let
$$V_1 := \{p_1\}'' = \mathbb{C}p_1 + \mathbb{C}(1 - p_1)$$
be the non-maximal context generated by $p_1$. The projections in $V_1$ are $0$, $p_1$, $1 - p_1$ and $1$. Let $p_2, p_3$ be two mutually orthogonal rank-1 projections such that $p_2 + p_3 = 1 - p_1$ (that is, $p_2$ and $p_3$ are also orthogonal to $p_1$). Then the context $V_{123} := \{p_1, p_2, p_3\}'' = \mathbb{C}p_1 + \mathbb{C}p_2 + \mathbb{C}p_3$ is maximal and contains $V_1$. In fact, the context $V_1$ is contained in continuously many maximal contexts: there are infinitely many pairs of mutually orthogonal rank-1 projections $p_2', p_3'$ such that $p_2' + p_3' = 1 - p_1$. Geometrically, one rotates in the plane that $1 - p_1$ projects onto to obtain continuously many pairs $p_2', p_3'$.
Hence, any quantum system has many different contexts, with smaller ones contained in larger ones. The key idea for all that follows is that one should not just consider a single context of a quantum system, or a small number, but all of them simultaneously. This idea goes back to Chris Isham [41] and has become a fruitful perspective and powerful tool over the last 20 years. The topos approach to quantum theory and many subsequent developments are based on this idea; for an introduction see e.g. [27].
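The rotation argument of Example 2 can be sketched numerically. The code below is an illustration under stated assumptions: $p_1$ is taken to project onto the first standard basis vector of $\mathbb{C}^3$ (real entries suffice), and the names `rotated_pair` and `generates_context_containing_p1` are hypothetical helpers introduced only here.

```python
import math


def outer(v):
    """Rank-1 projection onto the span of a real unit vector v."""
    return [[x * y for y in v] for x in v]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def close(a, b, tol=1e-12):
    return all(abs(x - y) <= tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def commute(a, b):
    return close(matmul(a, b), matmul(b, a))


def rotated_pair(theta):
    """Orthogonal rank-1 projections p2', p3' in the plane orthogonal to e1."""
    c, s = math.cos(theta), math.sin(theta)
    return outer([0.0, c, s]), outer([0.0, -s, c])


p1 = outer([1.0, 0.0, 0.0])
one_minus_p1 = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


def generates_context_containing_p1(theta):
    """Check that p1, p2', p3' commute pairwise and p2' + p3' = 1 - p1."""
    p2, p3 = rotated_pair(theta)
    return (commute(p1, p2) and commute(p1, p3) and commute(p2, p3)
            and close(add(p2, p3), one_minus_p1))


angles = [0.0, 0.3, math.pi / 4, 1.0]
all_angles_work = all(generates_context_containing_p1(t) for t in angles)

# Projections belonging to two different rotation angles are incompatible.
p2_a, _ = rotated_pair(0.0)
p2_b, _ = rotated_pair(math.pi / 4)
rotated_projections_commute = commute(p2_a, p2_b)
```

Every angle yields a commuting triple and hence a maximal context containing $V_1$, while projections taken from two different angles generally do not commute, so the corresponding maximal contexts are distinct.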
The key definition is: the context category $\mathcal{V}(\mathcal{N})$ of a von Neumann algebra $\mathcal{N}$ is the set of all contexts of $\mathcal{N}$, that is, of its commutative von Neumann subalgebras, partially ordered by inclusion.
The name context category comes from the fact that every partially ordered set (or poset, for short) can also be regarded as a category [51]. The objects are the elements of the poset, and there is an arrow $a \to b$ if and only if $a \leq b$. Hence, in a poset seen as a category, arrows express the order, so there is at most one arrow between any two objects. We will actually need very little category theory in the following, but we will feel free to use some simple and well-established categorical notions where appropriate. The reader not familiar with category theory can equally well read 'context poset' for 'context category' throughout. If a context $\tilde{V}$ is contained in another context $V$, we will write $i_{\tilde{V}V} : \tilde{V} \hookrightarrow V$ for the inclusion map. Alternatively, one could just write $\tilde{V} \subset V$.
The following powerful result is due to Harding and Navara [38]: Let $\mathcal{N}$ be a von Neumann algebra not isomorphic to $\mathbb{C} \oplus \mathbb{C}$ or to $M_2(\mathbb{C})$.
Then the context category $\mathcal{V}(\mathcal{N})$ of $\mathcal{N}$ determines the projection lattice $\mathcal{P}(\mathcal{N})$ as an orthomodular lattice up to isomorphism. Conversely, the projection lattice $\mathcal{P}(\mathcal{N})$ determines the poset $\mathcal{V}(\mathcal{N})$ up to isomorphism.
In fact, Harding and Navara's proof holds more generally for orthomodular lattices with no maximal Boolean sublattices with only 4 elements (this is why we exclude the trivial cases N = C ⊕ C and N = M 2 (C)). The result shows that the context category, i.e., the set of contexts together with the information of how contexts are contained within each other, encodes exactly the same amount of information as the projection lattice. In this sense,

Contextuality determines quantum logic and vice versa.
Note that the context category V(N ) is just a poset. Its elements, the contexts V ⊆ N , are just 'points' within V(N ), without inner structure. In particular, from the perspective of V(N ), we do not have access to the commutative von Neumann subalgebras, much less to the operators contained within each context. All the structure of V(N ) lies in the order, that is, in the information of how some contexts are contained within others. This makes the Harding-Navara result quite remarkable, since the mere order structure on contexts determines the full structure of the projection lattice.
Contexts without (non)commutativity. The context category V(N ) of a von Neumann algebra N can be defined without any reference to (non)commutativity: every weakly closed associative Jordan subalgebra of N is a commutative von Neumann subalgebra and vice versa. Hence, we can regard N as a weakly closed Jordan algebra and consider the set of its weakly closed associative Jordan subalgebras, partially ordered by inclusion. This poset is (isomorphic to) V(N ).

Presheaves over the context category
The concept of a presheaf: local data glued together. We saw in the previous subsection that the context category $\mathcal{V}(\mathcal{N})$ already encodes a lot of information about a quantum system. Now we will build further structures upon the context category in order to make it an even more useful tool. Concretely, given the context category $\mathcal{V}(\mathcal{N})$, we are interested in assigning data to each context. Moreover, since $\mathcal{V}(\mathcal{N})$ is a poset, we want to relate the data assigned to a context $V$ to the data assigned to a smaller context $\tilde{V} \subset V$.
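For example, for each context $V \in \mathcal{V}(\mathcal{N})$, one may consider the set $\Sigma(V)$ of all pure states of $V$. If $\tilde{V} \subset V$, then every pure state of $V$ gives a pure state of $\tilde{V}$ simply by restriction,
$$\Sigma(V) \to \Sigma(\tilde{V}), \quad \lambda \mapsto \lambda|_{\tilde{V}}.$$
(Pure states of commutative von Neumann algebras are traditionally denoted $\lambda$.) In this way, we obtain a natural map from the pure states of $V$ to the pure states of $\tilde{V}$, hence relating the data assigned to $V$ to the data assigned to $\tilde{V}$. This is an example of a general construction, viz. a presheaf. This naming is traditional and has no particular meaning for us.
The general definition of a presheaf over the context category $\mathcal{V}(\mathcal{N})$ is: a presheaf $P$ over $\mathcal{V}(\mathcal{N})$ is a contravariant functor from $\mathcal{V}(\mathcal{N})$ to the category of sets, that is, it is given
(a) on objects: for every context $V \in \mathcal{V}(\mathcal{N})$, by a set $P_V$,
(b) on arrows: for every inclusion $i_{\tilde{V}V} : \tilde{V} \hookrightarrow V$, by a map $P(i_{\tilde{V}V}) : P_V \to P_{\tilde{V}}$, called restriction map, such that $P(i_{VV})$ is the identity on $P_V$ and $P(i_{V''V'}) \circ P(i_{V'V}) = P(i_{V''V})$ whenever $V'' \subset V' \subset V$.
Of course, this is a very general notion. The idea is that the 'local' data assigned to each context can vary from context to context, but there are 'connecting maps' relating the local data at $V$ and $\tilde{V}$ whenever $\tilde{V} \subset V$. The mathematical language of presheaves is a convenient tool for book-keeping.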
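The restriction of pure states can be made concrete in a toy setting. The sketch below assumes a four-level system and considers only contexts contained in one fixed maximal context, generated by the projections onto a fixed orthonormal basis. Such a context can be encoded as a partition of the basis labels $\{0,1,2,3\}$, each block standing for a minimal projection of the context, and a pure state of the context corresponds to the choice of one block. This encoding and all names are assumptions made for illustration. The contexts of the full algebra, which involve many different bases, are not modelled.

```python
def block(*labels):
    """A minimal projection of a context, encoded by the basis labels it sums."""
    return frozenset(labels)


# Three nested contexts, V_min < V_mid < V_max, encoded as partitions.
V_max = {block(0), block(1), block(2), block(3)}
V_mid = {block(0), block(1), block(2, 3)}
V_min = {block(0, 1), block(2, 3)}


def restriction(larger, smaller):
    """Restriction map from pure states of `larger` to pure states of `smaller`.

    A pure state picks one minimal projection; restricting it picks the
    unique minimal projection of the smaller context lying above it.
    """
    return {b: next(c for c in smaller if b <= c) for b in larger}


def compose(g, f):
    """Composition 'g after f' of maps given as dictionaries."""
    return {x: g[f[x]] for x in f}


r_max_mid = restriction(V_max, V_mid)
r_mid_min = restriction(V_mid, V_min)
r_max_min = restriction(V_max, V_min)

# The two conditions that make the assignment a presheaf (contravariant functor):
identity_ok = restriction(V_max, V_max) == {b: b for b in V_max}
functorial = compose(r_mid_min, r_max_mid) == r_max_min
```

Restricting in two steps agrees with restricting in one step, which is the compatibility of the 'connecting maps' required of a presheaf.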
Example 3. The trivial presheaf. The simplest (non-empty) presheaf over $\mathcal{V}(\mathcal{N})$ is the trivial presheaf $1$, which is given
(a) on objects: for all $V \in \mathcal{V}(\mathcal{N})$, by the one-element set $1_V := \{*\}$,
(b) on arrows: for all inclusions $i_{\tilde{V}V}$, by the unique map $\{*\} \to \{*\}$.
We will make use of the trivial presheaf when we consider Wigner's theorem in Sec. 3. For the treatment of the Kochen-Specker theorem in Sec. 4, we will use the so-called spectral presheaf (already sketched above), for Gleason's theorem (in Sec. 5), we will use the so-called probabilistic presheaf, and for Bell's theorem (in Sec. 6), the so-called Bell presheaf, a version of the probabilistic presheaf adapted to bipartite (or multipartite) systems. In fact, each of these presheaves is tailor-made for reformulating the respective theorem. The spectral presheaf is built from pure states in each context, and the probabilistic presheaf is built from mixed states.
Contravariance and coarse-graining. One may wonder why we are using contravariant functors rather than covariant ones. If $V, \tilde{V}$ are contexts such that $\tilde{V} \subset V$, why not map the local data assigned to $\tilde{V}$ into the local data assigned to $V$? Covariant functors do have their place in the bigger scheme [39], but, as it turns out, for our purposes we only need contravariant functors. Generically, the idea is that the data assigned to a larger context $V$ is richer, more informative, and can be coarse-grained or restricted to the data assigned to a smaller context $\tilde{V}$. Conversely, there is often no canonical way to 'fine-grain' or extend data assigned to $\tilde{V}$ to data assigned to $V$. It is always possible to discard information, but it is often impossible to create information, at least not in a unique way. E.g. every pure state on $V$ gives a pure state on $\tilde{V}$ by restriction, but a pure state on $\tilde{V}$ can usually be extended in many different ways to a pure state of $V$. (The problem here is that there is no canonical way of extending.)
Mapping a presheaf into itself. In order to relate Wigner's theorem to the trivial presheaf in Sec. 3, we have to consider automorphisms of the trivial presheaf, that is, reversible mappings of $1$ into itself. There are a number of possible definitions and conventions, but we will focus on a very simple notion of automorphism that suits our purposes.
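Let $P$ be a presheaf over the context category $\mathcal{V}(\mathcal{N})$, with restriction maps $P(i_{\tilde{V}V}) : P_V \to P_{\tilde{V}}$ for $\tilde{V} \subset V$. Roughly speaking, we can map $P$ to itself by first shifting the components around and then mapping each (shifted) component into itself in a way that is compatible with the restriction maps (natural transformation). More precisely, the shifting around of components is achieved by a morphism of the base category, which is $\mathcal{V}(\mathcal{N})$ in our case, acting by pullback. This means that if $\tilde{\varphi} : \mathcal{V}(\mathcal{N}) \to \mathcal{V}(\mathcal{N})$ is a morphism of the base category, then it acts as follows: $P \circ \tilde{\varphi}$ is the presheaf over $\mathcal{V}(\mathcal{N})$ with components
$$(P \circ \tilde{\varphi})_V := P_{\tilde{\varphi}(V)} \quad \text{for all } V \in \mathcal{V}(\mathcal{N}).$$
The restriction maps of $P \circ \tilde{\varphi}$ are given in the obvious way by
$$(P \circ \tilde{\varphi})(i_{\tilde{V}V}) := P(i_{\tilde{\varphi}(\tilde{V})\tilde{\varphi}(V)}).$$
An automorphism $\Theta : P \to P$ of a presheaf $P$ over $\mathcal{V}(\mathcal{N})$ then consists of
(a) an automorphism $\tilde{\varphi} : \mathcal{V}(\mathcal{N}) \to \mathcal{V}(\mathcal{N})$ of the base category acting by pullback, thus mapping $P$ to $P \circ \tilde{\varphi}$,
(b) a natural isomorphism from $P \circ \tilde{\varphi}$ to $P$, that is, a family of bijections $P_{\tilde{\varphi}(V)} \to P_V$, one for each context $V$, that is compatible with the restriction maps.
Local and global sections of a presheaf. Presheaves over the context category $\mathcal{V}(\mathcal{H})$ (or $\mathcal{V}(\mathcal{N})$) are not just sets, but collections of sets (one for each context), which are interconnected by the restriction maps. Hence, the notion of an 'element' of a presheaf must be defined suitably. Let $P$ be a presheaf over $\mathcal{V}(\mathcal{N})$ and let $D$ be a downward closed subset of $\mathcal{V}(\mathcal{N})$, i.e., if $V \in D$ and $\tilde{V} \subset V$, then $\tilde{V} \in D$. A local section $\gamma$ of $P$ over $D$ consists of a choice of one element $\gamma_V \in P_V$ from the component $P_V$ for each $V \in D$ such that, whenever $\tilde{V} \subset V$,
$$P(i_{\tilde{V}V})(\gamma_V) = \gamma_{\tilde{V}}.$$
This condition means that the elements $\gamma_V$ that we pick from the sets $P_V$ (where $V \in D$) fit together under the restriction maps of the presheaf $P$.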
One should think of a local section γ of P over D as a 'partial' element of P . If one has a local section over D = V(N ), then γ is called a global section (or global element) of the presheaf P . This is the analogue of an element of a set or a point of a space.
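The difference between local and global sections can be illustrated with a deliberately small toy presheaf. Everything in the sketch below is hypothetical: the poset has three 'maximal contexts' A, B, C and three pairwise 'intersections' AB, BC, CA, every component is the two-element set $\{0, 1\}$, and one restriction map is chosen to flip its argument. This is not the spectral presheaf of any quantum system. It only shows how compatible local choices can fail to fit together globally.

```python
from itertools import product

# Components of the toy presheaf: the set {0, 1} at every context.
components = {V: (0, 1) for V in ("A", "B", "C", "AB", "BC", "CA")}


def keep(x):
    return x


def flip(x):
    return 1 - x


# Restriction maps, indexed by (larger context, smaller context).
restriction = {
    ("A", "AB"): keep, ("B", "AB"): keep,
    ("B", "BC"): keep, ("C", "BC"): keep,
    ("C", "CA"): keep, ("A", "CA"): flip,
}


def sections(D):
    """All sections over a downward closed set D of contexts (brute force)."""
    D = list(D)
    found = []
    for choice in product(*(components[V] for V in D)):
        gamma = dict(zip(D, choice))
        compatible = all(
            r(gamma[big]) == gamma[small]
            for (big, small), r in restriction.items()
            if big in gamma and small in gamma
        )
        if compatible:
            found.append(gamma)
    return found


down_A = ["A", "AB", "CA"]               # A together with everything below it
local_sections = sections(down_A)        # sections over a small downward closed set
global_sections = sections(components)   # sections over the whole poset
```

Over the downward closed set generated by A there are two local sections, but going once around the cycle A, B, C forces a value to equal its own flip, so the list of global sections is empty.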
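For a given presheaf $P$, global sections may or may not exist (while local sections always exist, just make $D$ small enough). In fact, finding a global section amounts to fitting specified local data, one element from each component of $P$, together into a whole. We will see that the presheaf reformulations of the Kochen-Specker theorem (following Isham, Butterfield, and Hamilton), Gleason's theorem, and also Bell's theorem are statements about the existence or nonexistence of global sections of certain presheaves.
3 Wigner's theorem, contextuality, and Jordan algebra structure
Let $\mathcal{P}_1(\mathcal{H})$ denote the set of rank-1 projections on $\mathcal{H}$, which represent the pure states of the system. Wigner's theorem states that every bijective map $\varphi : \mathcal{P}_1(\mathcal{H}) \to \mathcal{P}_1(\mathcal{H})$ such that $\mathrm{tr}(\varphi(p)\varphi(q)) = \mathrm{tr}(pq)$ for all $p, q \in \mathcal{P}_1(\mathcal{H})$ (that is, transition probabilities are preserved) is implemented by conjugation with a unitary or anti-unitary operator $u$,
$$\varphi(p) = u p u^* \quad \text{for all } p \in \mathcal{P}_1(\mathcal{H}).$$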
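Various nice proofs can be found in the literature; for a modern perspective see e.g. Cassinelli et al. [15]. These authors also prove the following: let $\mathrm{Aut}(\mathcal{P}_1(\mathcal{H}))$ denote the group of automorphisms of $\mathcal{P}_1(\mathcal{H})$ (i.e., bijective maps $\varphi : \mathcal{P}_1(\mathcal{H}) \to \mathcal{P}_1(\mathcal{H})$ that preserve transition probabilities). If $\dim(\mathcal{H}) \geq 3$, then $\mathrm{Aut}(\mathcal{P}_1(\mathcal{H}))$ is isomorphic to the group $\mathrm{Aut}(\mathcal{P}(\mathcal{H}))$ of automorphisms of the projection lattice $\mathcal{P}(\mathcal{H})$, i.e., maps $\varphi : \mathcal{P}(\mathcal{H}) \to \mathcal{P}(\mathcal{H})$ that
1. are bijective,
2. preserve complements, $\forall p \in \mathcal{P}(\mathcal{H}) : \varphi(1 - p) = 1 - \varphi(p)$,
3. preserve and reflect order, $\forall p, q \in \mathcal{P}(\mathcal{H}) : (p \leq q) \Leftrightarrow (\varphi(p) \leq \varphi(q))$.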
Geometrically, p ≤ q means that the closed subspace that p projects onto is contained in the closed subspace that q projects onto. Algebraically, (p ≤ q) ⇔ (pq = p). Since an automorphism φ ∈ Aut(P(H)) preserves the order, it also preserves all meets (greatest lower bounds) and all joins (least upper bounds) in P(H). Since φ also preserves complements, it is an automorphism of the complete orthomodular lattice P(H).
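Hence, if the Hilbert space is at least three-dimensional, Wigner's theorem is equivalent to the fact that every automorphism of the projection lattice $\mathcal{P}(\mathcal{H})$ is implemented by conjugation with a unitary or anti-unitary operator,
$$\varphi(p) = u p u^* \quad \text{for all } p \in \mathcal{P}(\mathcal{H}).$$
There is a generalisation of Wigner's theorem to von Neumann algebras, which is closer to the formulation with automorphisms of $\mathcal{P}(\mathcal{H})$ than automorphisms of $\mathcal{P}_1(\mathcal{H})$. This is Dye's theorem [28]. Before we formulate the theorem, we recall that given a von Neumann algebra $\mathcal{N}$, we can form the associated Jordan algebra $(\mathcal{N}, \cdot)$, which has the same elements and linear structure as $\mathcal{N}$, and Jordan product given by
$$a \cdot b = \frac{1}{2}(ab + ba).$$
Also recall that the Jordan product is commutative, but only associative if the von Neumann algebra $\mathcal{N}$ is commutative. A Jordan $*$-automorphism of $(\mathcal{N}, \cdot)$ is a bijective linear map $\Phi : (\mathcal{N}, \cdot) \to (\mathcal{N}, \cdot)$ such that both $\Phi$ and $\Phi^{-1}$ preserve the Jordan product and the involution $(\;)^*$, that is,
$$\Phi(a \cdot b) = \Phi(a) \cdot \Phi(b), \qquad \Phi(a^*) = \Phi(a)^* \qquad \text{for all } a, b \in \mathcal{N},$$
and analogously for $\Phi^{-1}$. We can now formulate Dye's theorem: let $\mathcal{N}$ be a von Neumann algebra without type $I_2$ summand. Then every automorphism $\varphi : \mathcal{P}(\mathcal{N}) \to \mathcal{P}(\mathcal{N})$ of the projection lattice extends to a unique Jordan $*$-automorphism $\Phi : (\mathcal{N}, \cdot) \to (\mathcal{N}, \cdot)$.
It is easy to see that the Jordan $*$-automorphism $\Phi$ induced by an automorphism $\varphi : \mathcal{P}(\mathcal{N}) \to \mathcal{P}(\mathcal{N})$ is ultraweakly continuous (or normal), i.e., it preserves (countable) joins of projections [25]. Conversely, every ultraweakly continuous Jordan $*$-automorphism $\Phi : (\mathcal{N}, \cdot) \to (\mathcal{N}, \cdot)$ induces an automorphism $\varphi$ of the complete orthomodular lattice $\mathcal{P}(\mathcal{N})$ by $\varphi := \Phi|_{\mathcal{P}(\mathcal{N})}$.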
The ultraweakly continuous Jordan * -automorphisms of (N , ⋅) form a group denoted Aut(N , ⋅). Dye's theorem hence shows that, provided N has no type I 2 summand, there is a group isomorphism between the group of automorphisms of the projection lattice and the group of ultraweakly continuous Jordan * -automorphisms of N .
One may wonder how Dye's theorem and Jordan * -automorphisms relate to unitary and anti-unitary operators as in Wigner's theorem. To see this, we first need the following well-known result (see e.g. [5]): Proposition 4. Every Jordan * -automorphism Φ ∶ N → N of a von Neumann algebra N can be decomposed as the sum of a * -isomorphism and a * -anti-isomorphism.
More concretely, there are projections p, q in the center of N such that N is unitarily [...]
We see that Wigner's theorem is a special case of Dye's theorem (depending on some special features of B(H)) and we can rephrase Wigner's theorem as follows: [...]
This shows that, contrary to the usual hand-waving arguments, there is a good mathematical reason to consider both unitary and anti-unitary operators in Wigner's theorem, since we actually have a statement about the structure of B(H) as a Jordan algebra. The Jordan structure is preserved by the action of both unitary and anti-unitary operators.
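The role of the Jordan structure can be illustrated numerically on 3 × 3 matrices. The sketch below assumes NumPy and the standard Jordan product a ⋅ b = (ab + ba)/2; all function names are hypothetical. The anti-unitary case is modelled by the linear map a ↦ u a T u * , which agrees on self-adjoint operators with conjugation by the anti-unitary operator uK, where K is complex conjugation in the standard basis (an assumption of this illustration, not a construction taken from the text).

```python
import numpy as np

rng = np.random.default_rng(0)

def random_matrix(n):
    return rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))

def jordan(a, b):
    """Jordan product a . b = (ab + ba) / 2."""
    return (a @ b + b @ a) / 2

a, b, c = (random_matrix(3) for _ in range(3))
u, _ = np.linalg.qr(random_matrix(3))        # a unitary matrix

def conj_unitary(x):
    """x -> u x u*, a *-automorphism."""
    return u @ x @ u.conj().T

def conj_antiunitary(x):
    """x -> u x^T u*, a *-anti-automorphism (linear, reverses products)."""
    return u @ x.T @ u.conj().T

commutative = bool(np.allclose(jordan(a, b), jordan(b, a)))
associative = bool(np.allclose(jordan(jordan(a, b), c), jordan(a, jordan(b, c))))

# Both maps preserve the Jordan product and the involution ...
jordan_preserved = all(
    np.allclose(f(jordan(a, b)), jordan(f(a), f(b)))
    and np.allclose(f(a.conj().T), f(a).conj().T)
    for f in (conj_unitary, conj_antiunitary)
)
# ... but only the unitary one preserves the ordinary product.
unitary_multiplicative = bool(np.allclose(conj_unitary(a @ b), conj_unitary(a) @ conj_unitary(b)))
anti_reverses = bool(np.allclose(conj_antiunitary(a @ b), conj_antiunitary(b) @ conj_antiunitary(a)))
anti_multiplicative = bool(np.allclose(conj_antiunitary(a @ b), conj_antiunitary(a) @ conj_antiunitary(b)))
```

For generic matrices the Jordan product is commutative but not associative, and the transpose-based map reverses the order of ordinary products while still preserving the Jordan product.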

The Döring-Harding result, contextuality and Wigner's theorem
So far, all this does not relate to contexts in any obvious way. The result that connects Wigner's theorem (and Dye's theorem) with contextuality is the following:
Theorem 7. (Döring, Harding [26]) Let N be a von Neumann algebra not isomorphic to C ⊕ C or to M 2 (C). For every order automorphism φ̃ ∶ V(N ) → V(N ), there is a unique Jordan * -automorphism Φ ∶ (N , ⋅) → (N , ⋅) such that φ̃(V ) = Φ[V ] for all V ∈ V(N ), where Φ[V ] denotes the image of V under Φ.
This shows that the mere order structure of contexts determines the algebra of observables as a Jordan algebra up to isomorphism. The proof proceeds in two steps: first, using the result by Harding and Navara [38] already mentioned in Sec. 2.2, one shows that an order automorphism φ̃ ∶ V(N ) → V(N ) induces a unique automorphism φ ∶ P(N ) → P(N ) of the projection lattice; second, by Dye's theorem, this gives a Jordan * -automorphism Φ ∶ (N , ⋅) → (N , ⋅). As a shorthand: contextuality determines the Jordan algebra of physical quantities and vice versa.
As remarked in [25], it is easy to see that the Jordan * -automorphism Φ induced by an order automorphism φ̃ ∶ V(N ) → V(N ) is ultraweakly continuous. Hence, there is a group isomorphism between the group of order automorphisms of V(N ) and the group Aut(N , ⋅) of ultraweakly continuous Jordan * -automorphisms. The Döring-Harding result can be regarded as a reformulation of Dye's theorem (Thm. 3), explicitly showing that it is contextuality which determines the Jordan algebra structure of von Neumann algebras (and vice versa). Specialising to the algebra N = B(H) and using Prop. 5, we have:

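Theorem 8. Let H be a Hilbert space with dim(H) ≥ 3. For every order automorphism φ̃ ∶ V(H) → V(H), there is a unitary or anti-unitary operator u on H such that φ̃(V ) = uV u * for all V ∈ V(H). Conversely, every unitary or anti-unitary operator u induces an order automorphism of the context category V(H) by conjugation.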
This is our first reformulation of Wigner's theorem. Remarkably, any bijective map φ that preserves the order on the collection of contexts must be implemented by a unitary or anti-unitary operator.

Wigner's theorem and the trivial presheaf over the context category
Obviously, the structure of the context category V(H) is sufficient to reformulate Wigner's theorem. Extra information, as may be provided by presheaves over V(H), is not necessary. In this sense, Wigner's theorem is the simplest of the theorems that we consider.
Yet, there is a reformulation of Wigner's theorem, and more generally of Dye's theorem, that does use a presheaf. Since we need exactly the information provided by the context category V(N ), which is a partially ordered set, the presheaf must mirror this partial order (and nothing more). The trivial presheaf 1 over V(N ) does this: the component at V ∈ V(N ) is the one-element set 1 V = { * }, and for every inclusion Ṽ ⊆ V , the restriction map 1 V → 1 Ṽ sends * to * . Hence, both the elements of the poset V(N ) and the order relation are encoded by 1. 3
As discussed in Sec. 2.2, an automorphism Θ of a presheaf P consists of two things (according to our convention): a shifting around of components, induced by an automorphism φ̃ of the base category acting by pullback, followed by an isomorphism of presheaves. In our case, an order automorphism φ̃ ∶ V(N ) → V(N ) acts by pullback on the trivial presheaf 1 over V(N ) in the following way: the component of the pulled-back presheaf at V ∈ V(N ) is the component of 1 at φ̃(V ), that is, 1 φ̃(V ) = { * }.
Since each component 1 V of the trivial presheaf 1 is just a one-element set, the only possible isomorphism between components is the identity. Hence, an automorphism of the trivial presheaf 1 over V(N ) is simply given by a shifting around of components, induced by an (order) automorphism of the base category V(N ). We have shown:
Lemma 9. Let N be a von Neumann algebra. There is a bijective correspondence between automorphisms Θ of the trivial presheaf 1 over V(N ) and order automorphisms φ̃ of the context category V(N ).
From this and Thm. 7, it follows:
Corollary 10. (Dye's theorem in presheaf form.) Let N be a von Neumann algebra not isomorphic to C ⊕ C or to M 2 (C). For every automorphism Θ of the trivial presheaf 1 over the context category V(N ), there is a unique (ultraweakly continuous) Jordan * -automorphism Φ ∶ (N , ⋅) → (N , ⋅) such that φ̃(V ) = Φ[V ] for all V ∈ V(N ), where φ̃ is the automorphism of the context category corresponding to Θ.
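Finally, from Lm. 9 and Thm. 8 we obtain:
Corollary 11. (Wigner's theorem in presheaf form.) Let H be a Hilbert space with dim(H) ≥ 3. For every automorphism Θ of the trivial presheaf 1 over V(H), there is a unitary or anti-unitary operator u on H such that φ̃(V ) = uV u * for all V ∈ V(H), where φ̃ is the automorphism of the context category corresponding to Θ.
Note that the trivial presheaf contains exactly the right amount of information: every automorphism of 1 gives a unitary or anti-unitary u and vice versa. More physically speaking, every 'rearrangement' of contexts that preserves the order (i.e., preserves how contexts are contained within each other) determines a unitary or anti-unitary operator and vice versa.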

The Kochen-Specker theorem and the spectral presheaf
The Kochen-Specker theorem [49] is deeply connected to contextuality. The usual interpretation of the theorem amounts to a negative statement of the kind 'there are no non-contextual assignments of values to physical quantities'. 4 Kochen and Specker's result excludes certain state space models for quantum theory: a hypothetical state space Σ on which each physical quantity a is represented by a real-valued function f a ∶ Σ → R would imply the existence of valuation functions (i.e., assignments of values to physical quantities such that the spectrum rule and the functional composition principle hold, see below). In fact, each point s of Σ would provide a valuation function v s , given by evaluation at that point, that is, the value of a physical quantity a would simply be v s (a) = f a (s). By showing that no valuation functions exist, Kochen and Specker exclude the existence of such a state space Σ.
Precisely because state space models are excluded, it is difficult to interpret the Kochen-Specker theorem in geometric terms. Also, it is not straightforward to see the exact nature of the connection between contextuality in our sense (commutative subalgebras of compatible physical quantities, arranged into a poset) and the nonexistence of valuation functions.
Both aspects were clarified by Isham and Butterfield in a beautiful series of papers [41,42,37,43], with J. Hamilton as a co-author of the third paper. In fact, the context category first shows up in these papers, and so does the spectral presheaf Σ, which will be defined below. The latter plays a central role in the topos approach to quantum theory [27] and serves as a generalised state space for a quantum system, notwithstanding the Kochen-Specker theorem. In fact, the Kochen-Specker theorem is equivalent to the fact that the quantum state space Σ has no points (technically, it has no global sections). This reformulation serves as the prototype for the reformulations of the other fundamental theorems of quantum theory in this article.
In Sec. 4.1, we will give a quick overview of the Kochen-Specker theorem and its background. Then, in Sec. 4.2, we make some connections with contextuality and present the presheaf reformulation of the Kochen-Specker theorem by Isham, Butterfield, and Hamilton. Finally, we extend their results to von Neumann algebras.

Valuation functions, the Kochen-Specker theorem, and contextuality
In their seminal paper [49], Kochen and Specker consider valuation functions, i.e., functions v ∶ B(H) sa → R on the self-adjoint operators such that
(a) the spectrum rule holds: v(a) ∈ sp(a) for all a ∈ B(H) sa ,
(b) the functional composition principle holds: v(f (a)) = f (v(a)) for all a ∈ B(H) sa and every Borel function f ∶ R → R.
Theorem 12. (Kochen-Specker [49]) If dim(H) ≥ 3, there are no valuation functions v ∶ B(H) sa → R.
In the proof, a certain family of rays, i.e., rank-1 projections, is considered. Each of these must be assigned either 0 or 1 according to the spectrum rule, and in every orthogonal triple p 1 , p 2 , p 3 of rank-1 projections, exactly one projection is assigned 1 and the others are assigned 0. By carefully choosing the family of projections, Kochen and Specker construct an explicit counterexample: they show that no consistent assignment of values 0, 1 to the projections in their family is possible. The original proof used a configuration of 117 rays in H = R 3 (the real Euclidean space), which could later be reduced to 31, and even fewer in C 4 . The proof of the result in real, three-dimensional Hilbert space implies the result in higher-dimensional, real and complex Hilbert spaces.
It is noteworthy that the proof of the Kochen-Specker theorem relies only on some basic projective geometry in R 3 . As soon as one accepts that there is a physical quantity, usually called 'spin-1', which can be measured in all spatial directions and which takes three different values, 5 the Kochen-Specker theorem holds unless one is willing to give up either the spectrum rule or the functional composition principle. Much less than the full mathematical apparatus of quantum mechanics is required to prove the theorem and very little physical interpretation, or metaphysical baggage, underlies the proof.
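The combinatorial nature of such proofs can be made concrete by a brute-force search. The sketch below does not reproduce Kochen and Specker's configuration of rays in R 3 ; as a swapped-in illustration it uses the well-known Peres-Mermin square of nine two-qubit observables on C 4 , whose rows and columns form six contexts. It assumes NumPy, and all names are hypothetical. A valuation satisfying the spectrum rule assigns ±1 to each observable, and the functional composition principle forces the product of the three values in a context to equal the sign s for which the operator product is s times the identity.

```python
import itertools
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
k = np.kron

# Peres-Mermin square: each row and each column consists of commuting observables.
square = [
    [k(X, I2), k(I2, X), k(X, X)],
    [k(I2, Y), k(Y, I2), k(Y, Y)],
    [k(X, Y),  k(Y, X),  k(Z, Z)],
]
rows = [[(r, c) for c in range(3)] for r in range(3)]
cols = [[(r, c) for r in range(3)] for c in range(3)]
contexts = rows + cols

def context_sign(ctx):
    """Return s in {+1, -1} such that the product of the context's operators is s * identity."""
    prod = np.eye(4, dtype=complex)
    for r, c in ctx:
        prod = prod @ square[r][c]
    s = prod[0, 0].real
    assert np.allclose(prod, s * np.eye(4))
    return int(round(s))

mutually_commuting = all(
    np.allclose(square[r1][c1] @ square[r2][c2], square[r2][c2] @ square[r1][c1])
    for ctx in contexts for (r1, c1) in ctx for (r2, c2) in ctx
)
signs = [context_sign(ctx) for ctx in contexts]

# Search all 2^9 assignments of values +1/-1 that respect every context.
count = 0
for values in itertools.product([1, -1], repeat=9):
    v = {(r, c): values[3 * r + c] for r in range(3) for c in range(3)}
    if all(v[ctx[0]] * v[ctx[1]] * v[ctx[2]] == s for ctx, s in zip(contexts, signs)):
        count += 1
```

The search finds no admissible assignment: multiplying the three row constraints gives +1 for the product of all nine values, while multiplying the three column constraints gives −1.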
Bell provided a proof of the same result, i.e., that there are no non-contextual assignments of values to physical quantities, in [12]. His proof uses a continuity argument and Gleason's theorem and hence is not 'discrete', unlike Kochen and Specker's proof. The relations induced by Kochen-Specker triples led Isham and Butterfield to introduce the spectral presheaf. In [41] the self-adjoint operators themselves served as 'stages', and in [37], the step to commutative subalgebras of B(H) and the context category was taken. We will focus on the latter.

The spectral presheaf and reformulation of the Kochen-Specker theorem
First, consider a single commutative subalgebra V of B(H). We assume for the moment that V is closed in the so-called norm topology, hence V is a C * -algebra. 6 Not surprisingly, there are valuation functions on V sa , the self-adjoint operators in V : every character, that is, every multiplicative linear functional of norm 1, fulfils both the spectrum rule and the functional composition principle. Conversely, every valuation function is a character of V . Recall from Sec. 2.1 that multiplicative linear functionals of norm 1 on a commutative von Neumann algebra V are exactly the pure states of V . Hence, we consider the set of characters (or pure states) of V ,
Σ(V ) ∶= {λ ∶ V → C | λ is a character of V },
traditionally called the Gelfand spectrum of V . 7 In physical terms, Σ(V ) is the (pure) state space of the physical system described by the physical quantities in V . As expected, the points of the state space Σ(V ) correspond exactly to valuation functions on V sa .
If Ṽ ⊂ V is a unital C * -subalgebra, then every character of Ṽ arises as the restriction of some character of V , that is, there is a surjective map
Σ(V ) → Σ(Ṽ ), λ ↦ λ| Ṽ .
Hence, there is a canonical map from the state space of the bigger algebra V to the state space of the smaller algebra Ṽ . Every valuation function on V can be restricted to a valuation function on Ṽ .
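In finite dimensions the restriction of characters is entirely combinatorial. As a minimal sketch (plain Python, all names hypothetical), we model V as the algebra of diagonal 4 × 4 matrices, i.e., functions on four points, one for each minimal projection p 0 , ..., p 3 . The subalgebra Ṽ is assumed to be generated by the coarser projections p 0 + p 1 , p 2 , p 3 , so its elements are the functions that are constant on the blocks {0, 1}, {2}, {3}.

```python
# Gelfand spectrum of V: one character (evaluation at a point) per minimal projection.
points_V = [0, 1, 2, 3]

# Minimal projections of the subalgebra, encoded as blocks of points of V.
blocks = [frozenset({0, 1}), frozenset({2}), frozenset({3})]

def character(i):
    """Character of V: evaluate a diagonal matrix (given as a tuple) at entry i."""
    return lambda a: a[i]

def restrict(i):
    """Image of the character at point i under the restriction map: the block containing i."""
    return next(b for b in blocks if i in b)

# Every character of the subalgebra is the restriction of some character of V.
surjective = {restrict(i) for i in points_V} == set(blocks)

# An element of the subalgebra is constant on blocks ...
a = (5.0, 5.0, -1.0, 2.0)
b = (1.0, 1.0, 3.0, 0.5)
ab = tuple(x * y for x, y in zip(a, b))

# ... so characters 0 and 1 of V restrict to the same character and assign the same value.
same_restriction = restrict(0) == restrict(1)
same_value = character(0)(a) == character(1)(a)

# Characters are multiplicative, as required of valuation functions within a context.
multiplicative = all(character(i)(ab) == character(i)(a) * character(i)(b) for i in points_V)
```

The restriction map is surjective but not injective here, reflecting that the smaller context carries less information.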
In infinite dimensions, it is useful to work with commutative von Neumann subalgebras instead of the more general C * -subalgebras, and we will do so from now on. This guarantees the existence of sufficiently many projection operators within our algebras and simplifies some arguments. In finite dimensions, there is no difference.
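Isham, Butterfield, and Hamilton's key idea [41,37] was to combine all the state spaces for commutative subalgebras of a quantum system into one global object. This is the spectral presheaf:
Definition 4. The spectral presheaf Σ of B(H) over V(H) is the presheaf given
(i) on objects: for all V ∈ V(H), let Σ V ∶= Σ(V ) be the Gelfand spectrum of V ,
(ii) on arrows: for all inclusions Ṽ ⊆ V , let Σ V → Σ Ṽ be the restriction map λ ↦ λ| Ṽ .
It is clear by construction that the spectral presheaf Σ is a kind of state space for the quantum system, built from all the state spaces Σ(V ) of the commuting, compatible parts V ∈ V(H) of the noncommutative algebra B(H) of physical quantities.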
As we saw in Sec. 2.2, for a presheaf, the analogue of a point is a global section. What would a global section of the spectral presheaf Σ be? For every context V ∈ V(H), we have to pick one element λ V ∈ Σ V , the Gelfand spectrum of V . λ V is a valuation function for the physical quantities in V , i.e., it assigns a value λ V (a) to all a ∈ V sa such that the spectrum rule and functional composition hold.
Moreover, if V ′ is another commutative subalgebra that contains a, then a is also contained in Ṽ ∶= V ∩ V ′ . The value we assign to a in V is λ V (a) and the value we assign to a in V ′ is λ V ′ (a). The value we assign to a in Ṽ = V ∩ V ′ is
λ Ṽ (a) = λ V | Ṽ (a) = λ V (a)
and also
λ Ṽ (a) = λ V ′ | Ṽ (a) = λ V ′ (a).
The structure of a global section therefore guarantees that the value assigned to a physical quantity, represented by the self-adjoint operator a, is the same, independent of the context in which it lies. Since also the spectrum rule and the functional composition principle hold, every global section of Σ would provide a valuation function on all of B(H). Conversely, a valuation function would give a global section of Σ.
Since the Kochen-Specker theorem shows that there are no valuation functions, i.e., no non-contextual value assignments, Isham, Butterfield, and Hamilton could give the following reformulation:

Theorem 13. The Kochen-Specker theorem is equivalent to the fact that the spectral presheaf Σ(H) has no global sections whenever dim(H) ≥ 3.
In more physical terms, the Kochen-Specker theorem is equivalent to the fact that the quantum state space Σ has no points. This does not mean, however, that Σ is 'empty'; it still has plenty of subobjects (which are the presheaf analogue of subsets). One just cannot 'focus down' to points, which would be (nonexistent) microstates.
We note that the nonexistence of points/global sections is not just a consequence of Kochen-Specker, but is exactly equivalent. This shows that the context category and the spectral presheaf contain just the right amount of information and that the Kochen-Specker theorem indeed is encoded by our notion of contextuality.
The Kochen-Specker theorem was generalised to von Neumann algebras in [23]:

Theorem 14. (Generalised Kochen-Specker theorem.) Let N be a von Neumann algebra with no direct summand of type I 2 . There are no valuation functions v ∶ N sa → R.
The condition that N has no summand of type I 2 generalises the condition that dim H ≥ 3 in the original proof of the theorem.
Since we can easily define the spectral presheaf of a von Neumann algebra N (simply replace B(H) by N and V(H) by V(N ) in Def. 4), we also have the following reformulation of the generalised Kochen-Specker theorem:
Corollary 15. Let N be a von Neumann algebra with no direct summand of type I 2 , let V(N ) be the context category of N , and let Σ be its spectral presheaf. The generalised Kochen-Specker theorem is equivalent to the fact that Σ has no global sections.

Gleason's theorem and the probabilistic presheaf

Gleason's theorem
Gleason's theorem, proven in 1957 [35], showed that the Born rule follows from very modest assumptions. Let H be a Hilbert space of dimension 3 or greater. Assume that there is a function µ ∶ P(H) → [0, 1] assigning probabilities to projection operators such that
(a) µ(1) = 1,
(b) if p, q ∈ P(H) are orthogonal, i.e., pq = 0, then µ(p + q) = µ(p) + µ(q).
Condition (a) is the obvious normalisation condition and (b) is finite additivity on mutually orthogonal projections. Clearly, if one aims to have any probabilistic formalism relating to projections (representing propositions about a quantum system), having such a function µ that assigns probabilities to projections is the minimal and natural requirement. There is a built-in non-contextuality condition: every projection p lies in many different contexts, but µ assigns just one probability to p, independently of contexts.
There is an obvious strengthening of finite additivity to infinite families of mutually orthogonal projections, called complete additivity: (b') for any family (p i ) i∈I of mutually orthogonal projections (i.e., p i p j = δ ij p i for all i, j ∈ I), it holds that µ(⋁ i∈I p i ) = ∑ i∈I µ(p i ).
Note that if the underlying Hilbert space H is separable, then the index set I is at most countable. 8 Gleason showed the following, partly answering an earlier problem posed by Mackey:
Theorem 16. (Gleason [35]) Let H be a Hilbert space with dim H ≥ 3, and let µ ∶ P(H) → [0, 1] be a completely additive probability measure on the projections on H. There exists a unique positive trace-class operator ρ µ of trace 1 such that ∀p ∈ P(H) ∶ µ(p) = tr(ρ µ p).
In finite dimensions ρ µ is nothing but a density matrix. As mentioned in Sec. 2.1, this means that every completely additive probability measure on projections determines a unique normal state of B(H). Conversely, every normal state, equivalently every positive trace-class operator of trace 1 (or, in finite dimensions, every density matrix) ρ determines a unique completely additive probability measure by
µ ρ ∶ P(H) → [0, 1], p ↦ tr(ρp).
Obviously, the assignments µ ↦ ρ µ and ρ ↦ µ ρ are mutually inverse. Hence, Gleason's theorem justifies the use of density matrices and the Born rule in quantum mechanics. The condition that the Hilbert space is at least three-dimensional is essential and we will assume dim H ≥ 3 from now on.
In order to understand the power of Gleason's theorem, note that the definition of a completely additive probability measure µ ∶ P(H) → [0, 1] only poses conditions on mutually orthogonal, hence, commuting projections. 9 For simplicity, let us assume the Hilbert space H is finite-dimensional, H = C n . Let V be a context of B(H) = M n (C), the complex n × n-matrix algebra. Let {p 1 , ..., p m } denote the unique set of minimal projections in V . Then the p i are mutually orthogonal and V is generated by them, V = {p 1 , ..., p m } ′′ . Every self-adjoint operator a ∈ V sa is a unique real linear combination of the p i , that is, a = A 1 p 1 + ... + A m p m . We extend the finitely additive probability measure µ to a function µ ∶ V sa → R by
µ(a) ∶= A 1 µ(p 1 ) + ... + A m µ(p m ).
This implies directly that if r ∈ R, then µ(ra) = rµ(a) and if a, b ∈ V sa , then µ(a + b) = µ(a) + µ(b). Hence, µ ∶ V sa → R is a real-linear function. Importantly, if a and b do not commute, it is not obvious at all whether µ(a + b) = µ(a) + µ(b) holds or not. A finitely additive probability measure µ ∶ P(H) → [0, 1] gives a function µ ∶ V sa → R that is linear in every context V in a straightforward way, but it is not clear initially why this function should also be linear across contexts, i.e., on noncommuting operators.

Traditionally, a function that is linear on commuting operators is called quasi-linear.
Gleason's result shows that there always exists a density matrix ρ µ such that µ(p) = tr(ρ µ p) and clearly, we also have ∀a ∈ V sa ∶ µ(a) = tr(ρ µ a) due to linearity of the trace. Crucially, the map a ↦ tr(ρ µ a) is linear on all of B(H) sa , which implies that µ is also linear on all operators. Hence, the quasi-linear function µ is in fact linear. In this way, Gleason's theorem solves a local-to-global problem, where 'local' here means 'on commuting operators' (or 'within contexts') and 'global' means 'on all operators'.
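The easy direction of this correspondence can be verified numerically. The following sketch (assuming NumPy; the random density matrix, the context and all names are hypothetical choices for illustration) starts from a density matrix ρ on C 3 , defines µ(a) = tr(ρa), and checks normalisation, additivity on orthogonal projections, linearity within a context, and linearity across contexts. It does not, of course, reproduce the hard direction of Gleason's theorem, which shows that every such µ arises in this way.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3

def random_matrix():
    return rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))

# A density matrix: positive and of trace 1.
m = random_matrix()
rho = m @ m.conj().T
rho = rho / np.trace(rho).real

def mu(a):
    """Born rule: mu(a) = tr(rho a)."""
    return np.trace(rho @ a).real

# A context, given by the minimal projections of an orthonormal basis.
u, _ = np.linalg.qr(random_matrix())
projections = [np.outer(u[:, i], u[:, i].conj()) for i in range(n)]

normalised = bool(np.isclose(mu(np.eye(n)), 1.0))
additive = bool(np.isclose(mu(projections[0] + projections[1]),
                           mu(projections[0]) + mu(projections[1])))

# Within the context: a = sum_i A_i p_i and mu(a) = sum_i A_i mu(p_i).
coeffs = [2.0, -1.0, 0.5]
a = sum(A * p for A, p in zip(coeffs, projections))
linear_in_context = bool(np.isclose(mu(a), sum(A * mu(p) for A, p in zip(coeffs, projections))))

# Across contexts: b is self-adjoint and does not commute with a.
h = random_matrix()
b = h + h.conj().T
noncommuting = not np.allclose(a @ b, b @ a)
linear_across_contexts = bool(np.isclose(mu(a + b), mu(a) + mu(b)))
```

Linearity on the noncommuting pair a, b is automatic here because µ is given by a trace; the content of Gleason's theorem is that no other quasi-linear µ exists when dim H ≥ 3.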
By the efforts of many people, Gleason's theorem has been generalised to von Neumann algebras (see [52] and references therein):

Theorem 17. (Generalised Gleason's theorem.) Let N be a von Neumann algebra with no direct summand of type I 2 , and let µ ∶ P(N ) → [0, 1] be a finitely additive probability measure on the projections of N . There exists a unique state ρ µ of N such that ∀p ∈ P(N ) ∶ µ(p) = ρ µ (p) .
Note that here ρ µ ∶ N → C denotes the state itself (i.e., a positive linear functional of norm 1), while before the state was denoted tr(ρ µ ⋅ ) ∶ B(H) → C and ρ µ was just the positive trace-class operator (or density matrix). The reason is that the state ρ µ need not be normal and hence there may be no density matrix. In fact, ρ µ is normal if and only if the probability measure µ is completely additive.

The probabilistic presheaf and reformulation of Gleason's theorem
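In order to relate Gleason's theorem (in its generalised form), Thm. 17, more explicitly to contextuality, we consider a certain presheaf that encodes probability assignments to projections. The obvious definition is:
Definition 5. Let N be a von Neumann algebra with context category V(N ). The probabilistic presheaf Π ∶ V(N ) op → Set of N over V(N ) is the presheaf given
(i) on objects: for all V ∈ V(N ), let Π V ∶= {µ V ∶ P(V ) → [0, 1] | µ V is a finitely additive probability measure},
(ii) on arrows: for all inclusions Ṽ ⊆ V , let Π V → Π Ṽ be the restriction map µ V ↦ µ V | Ṽ .
Here, the restriction µ V | Ṽ of the function µ V ∶ P(V ) → [0, 1] to P(Ṽ ) ⊂ P(V ) is simply marginalisation.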
Note that this is the simplest possible definition of a presheaf built from finitely additive probability measures (FAPMs) on contexts. An element µ V ∈ Π V is a FAPM for the projections in V , so it only assigns probabilities to projections in V , not to all projections (unlike the FAPM µ ∶ P(N ) → [0, 1] in the generalised Gleason's theorem, Thm. 17, which assigns probabilities to all projections in N ).
The probabilistic presheaf Π can be seen as a generalisation of the spectral presheaf Σ in the following way: at each context V ∈ V(N ), the component Σ V of Σ is the set of pure states λ ∶ V → C, see Def. 4. In the probabilistic presheaf Π, on the other hand, the component Π V is given by finitely additive probability measures µ V ∶ P(V ) → [0, 1]. The latter are positive linear functionals on V of norm 1, i.e., convex linear combinations of elements in Σ V . In other words, the elements of Π V correspond to mixed states of V , while the elements of Σ V correspond to pure states of V , equivalently, extreme points of Π V .
What about global sections of the probabilistic presheaf? Prima facie, we do not know whether global sections exist or not, but we now show that every quantum state ρ ∶ N → C gives a global section γ ρ of Π. Define
∀V ∈ V(N ) ∶ γ ρ (V ) ∶= ρ| P(V ) .
Here, ρ| P(V ) is the restriction of the quantum state to the projections in the context V . Since ρ is linear, ρ| P(V ) is a finitely additive probability measure on the projections in V . If a projection p is contained in a context V and a subcontext Ṽ ⊂ V , then
ρ| P(Ṽ ) (p) = ρ(p) = ρ| P(V ) (p),
so γ ρ is compatible with the restriction maps and hence is a global section.
Conversely, let γ be a global section of the probabilistic presheaf Π. In every context V ∈ V(N ), γ picks a finitely additive probability measure γ V ∈ Π V . The restriction maps Π(i Ṽ V ) guarantee that, whenever a context Ṽ is contained in another context V , a projection p is assigned the same probability, no matter whether we regard p as a projection in V or in Ṽ . Hence, a global section γ of the probabilistic presheaf Π gives a finitely additive probability measure µ on all projections in N . By Gleason's theorem for von Neumann algebras, this determines a unique state ρ γ of the algebra N , provided N has no type I 2 summand. 10
Before we state Gleason's theorem in its contextual reformulation, we discuss the following slight variation of Def. 5. Note that we may interpret the probability measures µ V ∈ Π V as positive operator-valued measures. In fact, by Gelfand duality every commutative von Neumann algebra V ∈ V(N ) corresponds with an (extremally disconnected) compact Hausdorff space, whose σ-algebra of clopen (i.e., open and closed) sets corresponds with the projection lattice P(V ). Identifying R with (real-valued) (1 × 1)-matrices, µ V (trivially) becomes a positive operator-valued measure, and by Naimark's theorem [54], we can find a dilation of the form
µ V = (v V ) * ϕ V v V ,
where v V ∶ C → K is a bounded linear map into some Hilbert space K (equivalently, v V ∈ K under scalar multiplication), and ϕ V ∶ P(V ) → P(K) is an embedding (or spectral measure).
In this reading, we obtain a finitely additive probability measure µ V by setting v V | Ṽ = v Ṽ and ϕ V | Ṽ = ϕ Ṽ whenever Ṽ ⊂ V : first, by Dye's theorem [28,45], the latter defines an orthomorphism ϕ ∶ P(N ) → P(K), which lifts to a unique Jordan * -homomorphism Φ ∶ N → B(K); second, by an extended version of Gleason's theorem in [44], (v V ) V ∈V(N ) corresponds with a unique vector v ∈ K. Note that in contrast to the marginalisation constraints in the probabilistic presheaf, it is crucial to restrict the dilations µ V = (v V ) * ϕ V v V with respect to both v V and ϕ V , since there is (at least) a freedom in choosing complex phases v V → e iα v V , α ∈ R in every context, which leaves the measures µ V invariant, but obscures linearity of v ∈ K. Alternatively, we may choose v ∈ K fixed and only consider global sections arising under restrictions with respect to ϕ V along context inclusion. This is the approach taken in Def. 6 below.
Importantly, collections of dilations over contexts (µ V = v * ϕ V v) V ∈V(N ) still correspond with quantum states ρ = v * Φv. In finite dimensions, the latter is easily recognised as a purification of ρ. In particular, restricting to pure states, we may choose K = H and ϕ ∶ P(H) → P(H) the identity map such that |v⟩ ∈ H is the pure state corresponding to ρ(p) = ⟨v| p |v⟩ = tr(|v⟩⟨v| p) for all p ∈ P(N ). As always, mixed states correspond to convex combinations of pure states; in this sense, applying Naimark's theorem in contexts amounts to a type of intrinsic convexity condition with respect to the set of pure states. Taking the latter into account, we refine Def. 5 as follows.
Definition 6. Let N be a von Neumann algebra with context category V(N ) and K a Hilbert space. The dilated probabilistic presheaf Π ∶ V(N ) op → Set of N over V(N ) is the presheaf given (i) on objects: for all V ∈ V(N ), let [...]
Note that, in a slight abuse of notation, we denote the dilated probabilistic presheaf Π of N over V(N ) by the same symbol as the probabilistic presheaf Π of N over V(N ). The reason is that, by the preceding discussion, both share the same set of global sections; in fact, we have shown the following.

Theorem 18. (Generalised Gleason's theorem in contextual form.) Let N be a von Neumann algebra with no direct summand of type I 2 . There is a bijective correspondence between quantum states, that is, states on N , and global sections of the (dilated) probabilistic presheaf Π over V(N ).
This is our reformulation of Gleason's theorem, which connects it explicitly with contextuality. In contrast to the spectral presheaf, the (dilated) probabilistic presheaf does have global sections and they correspond exactly with quantum states. It is remarkable that the very simple definition of the probabilistic presheaf Π, with FAPMs in every context, connected by the obvious restriction maps in the form of marginalisation, suffices to guarantee this (for single systems). 11 As usual, the power of the construction lies in the restriction maps (and, of course, Gleason's theorem). In particular, no further local or global data is needed. In physical terms, there is no need for hidden variables. More importantly, there is no room for hidden variables: as soon as a theory assigns probabilities to all projections in dimension 3 or greater in the obvious way, i.e., finitely additively on orthogonal projections, there exists a quantum state that provides this assignment of probabilities.
There are no other (finitely additive) assignments of probabilities to projections apart from those given by quantum states.
Any hidden variables or other extra data could at best give further restrictions. 12 It is worthwhile mentioning the case N = B(H) explicitly.

Corollary 19. (Gleason's theorem in contextual form.) Let B(H) be the algebra of all bounded operators on a Hilbert space H. If dim H ≥ 3, then there is a bijective correspondence between quantum states, that is, states on B(H), and global sections of the (dilated) probabilistic presheaf Π over V(H).
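The direction from states to global sections can be illustrated in B(C 3 ). The sketch below (assuming NumPy; the state, the two contexts and all names are hypothetical) takes two maximal contexts V and W that share the projection p 3 , so that V ∩ W is generated by p 3 and 1 − p 3 . It restricts a density matrix ρ to both contexts and checks that marginalising either measure to the common subcontext gives the same result, as required of a global section.

```python
import numpy as np

e = np.eye(3)

def proj(v):
    """Rank-1 projection onto the line spanned by v."""
    v = np.asarray(v, dtype=complex)
    v = v / np.linalg.norm(v)
    return np.outer(v, v.conj())

# Two maximal contexts, each described by its minimal projections; both contain p3.
ctx_V = {"p1": proj(e[0]), "p2": proj(e[1]), "p3": proj(e[2])}
ctx_W = {"q1": proj(e[0] + e[1]), "q2": proj(e[0] - e[1]), "p3": proj(e[2])}

# A (mixed) quantum state, chosen arbitrarily for illustration.
rho = 0.5 * proj([1, 1j, 2]) + 0.5 * proj([0, 1, 1])

def section(ctx):
    """Component of the global section at a context: the restriction of rho to its minimal projections."""
    return {name: float(np.trace(rho @ p).real) for name, p in ctx.items()}

mu_V, mu_W = section(ctx_V), section(ctx_W)

# Marginalisation to the common subcontext generated by p3 and 1 - p3.
from_V = {"p3": mu_V["p3"], "1-p3": mu_V["p1"] + mu_V["p2"]}
from_W = {"p3": mu_W["p3"], "1-p3": mu_W["q1"] + mu_W["q2"]}

consistent = all(np.isclose(from_V[key], from_W[key]) for key in from_V)
normalised = bool(np.isclose(sum(mu_V.values()), 1.0) and np.isclose(sum(mu_W.values()), 1.0))
```

Both restrictions agree on the subcontext because p 1 + p 2 = 1 − p 3 = q 1 + q 2 and ρ is linear; the nontrivial statement of Cor. 19 is that every compatible family of this kind comes from a state.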
Moreover, one can consider a presheaf Π̄ that is closely related to the (dilated) probabilistic presheaf Π, but has as component at V ∈ V(N ) only completely additive probability measures. It is easy to check that if a von Neumann algebra N has no type I 2 summand, there is a bijective correspondence between normal states on N and global sections of the normal (dilated) probabilistic presheaf Π̄ over V(N ). As a special case, if dim H ≥ 3, there is a bijective correspondence between normal states on B(H) and global sections of Π̄ over V(H).
The fact that Gleason's theorem is closely linked with contextuality in this manner was first observed in [23], made more explicit by de Groote [20], and in a form very similar to the one above in [24].

Bell's theorem and contextuality
Bell's seminal paper [11] responds to a long-standing conjecture by Einstein, Podolsky and Rosen (EPR) [30], who claim that quantum theory is only a statistical version of a more fundamental theory, similar to the relation between thermodynamics and statistical mechanics. Besides the probabilistic nature of quantum theory, this idea is motivated by certain nonlocal features present in the quantum formalism, believed to be resolved within the more fundamental theory. As a response to EPR's famous thought experiment, Bell formalises EPR's assumption of an underlying space of hidden variables and derives a constraint on the maximal amount of correlations possible in such theories under the additional assumption of locality [11,10]. However, some quantum mechanically predicted and experimentally verified correlations [7,34,62] do not obey these constraints and thus cannot be reproduced by any local hidden variable model. We show that, as with the other theorems discussed in this article, the essence of Bell's theorem is naturally encoded in a partial order of contexts, and we discuss the relation between contextuality and locality in this setting. The connection between these concepts has been highlighted before [3]; here we extend these results in several ways and, in particular, stress the importance of composition.
We first recall the derivation of Bell's theorem in Sec. 6.1 emphasising the assumption of an underlying single-context state space [29], i.e., with trivial physical contextuality. In Sec. 6.2 we show that factorisability is closely linked with composition of (single-context) state spaces. We then consider two further notions of composition, one via contexts and the standard composition in quantum theory, via tensor products. Although standard composition results in a much richer context structure than our notion of composition via contexts, we show that it suffices to consider the dilated probabilistic presheaf over the smaller set of product contexts (to be defined as the Bell presheaf below), in order to uniquely single out quantum correlations. This is interpreted as a contextual form of Bell's theorem.
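As a concrete instance of such a constraint and its violation, one may consider the CHSH form of Bell's inequality, which is not derived in this section and is used here only as a standard illustration. The sketch below assumes NumPy; the measurement angles and all names are hypothetical choices. It enumerates all local deterministic assignments of outcomes ±1 to two settings per party, which bounds the CHSH expression by 2, and compares this with the value obtained from spin measurements on the singlet state.

```python
import itertools
import numpy as np

# Local deterministic models: each party pre-assigns outcomes +1/-1 to both of its settings.
classical_max = max(
    abs(a0 * b0 + a0 * b1 + a1 * b0 - a1 * b1)
    for a0, a1, b0, b1 in itertools.product([1, -1], repeat=4)
)

# Quantum model: spin measurements in the x-z plane on the two-qubit singlet state.
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])

def spin(theta):
    """Spin observable along the direction at angle theta in the x-z plane."""
    return np.cos(theta) * Z + np.sin(theta) * X

singlet = np.array([0.0, 1.0, -1.0, 0.0]) / np.sqrt(2)

def correlation(theta_a, theta_b):
    """Expectation value of the product of outcomes for the given settings."""
    return float(singlet @ np.kron(spin(theta_a), spin(theta_b)) @ singlet)

a0, a1 = 0.0, np.pi / 2
b0, b1 = np.pi / 4, -np.pi / 4
chsh = (correlation(a0, b0) + correlation(a0, b1)
        + correlation(a1, b0) - correlation(a1, b1))
quantum_value = abs(chsh)
```

Probabilistic local hidden variable models are convex combinations of the deterministic ones and therefore obey the same bound of 2, whereas the singlet state reaches 2√2 for these settings.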

Classical state spaces
We first give an account of what we mean by a classical theory. For our purposes it will be enough to consider the kinematics and so we start with a set (soon to be upgraded to an algebra) of observables O. We take as a defining property of a classical theory that all its observables are simultaneously measurable; from the perspective of physical contextuality we are thus considering the trivial case of a single context [29]. 13 Observables a ∈ O in classical theories are mathematically represented by measurable 14 functions f a ∶ Σ → R from some measure space (Σ, σ, ds) to the real numbers. Σ is called the (single-context) state space of the theory and every microstate s ∈ Σ assigns truth values to propositions of the form 'a ∈ ∆' (read 'the physical quantity a has a value within the Borel subset ∆ ⊂ R'):
Θ('a ∈ ∆', s) ∶= 1 if f a (s) ∈ ∆, and Θ('a ∈ ∆', s) ∶= 0 otherwise. (1)
We can therefore speak of the value of an observable v s (a) given the state s ∈ Σ in the intuitive sense, i.e., through evaluation of the corresponding measurable function,
v s (a) ∶= f a (s) . (2)
The valuation functions v s ∶ O → R in Eq. (2), which we already discussed in connection with the Kochen-Specker theorem in Sec. 4, are defined on all observables; in other words, every observable has an intrinsic (sharp) value in every state. 15 The observation that all observables simultaneously take deterministic values justifies modelling physical states by points in some space Σ and observables by (measurable) functions f a ∶ Σ → R in the first place. Of course, this inductive reasoning has to be revisited for nonclassical theories, that is, theories with non-trivial physical contextuality. In so doing, we attribute a fundamental role to observables, whereas states appear as a secondary concept. This perspective will become important in Sec. 6.2.1, when we go from single to multiple-context state spaces.
In (classical) physics it is natural to equip the set of observables O with the structure of an algebra. In fact, by modelling observables as functions we are automatically given a vector space structure as well as a product by pointwise multiplication of functions. 16 It is straightforward to extend the definition of valuation functions in Eq. (2) to this algebraic structure, namely, for all a, b ∈ O, r ∈ R and s ∈ Σ we set
v s (a + b) ∶= v s (a) + v s (b), v s (ra) ∶= r v s (a), v s (ab) ∶= v s (a) v s (b). (3)
In other words, classical states s ∈ Σ correspond to algebra homomorphisms v s ∶ O → R. Note that in the presence of physical contextuality this suggests considering generalised classical states to be valuation functions, that is, partial algebra homomorphisms for which Eq. (3) only holds within (sub)algebras of simultaneously measurable observables, or contexts. In the setting of von Neumann algebras, which we will adopt again in later sections, Eq. (3) holds as a consequence of the functional composition principle,
v s (f (a)) = f (v s (a)) , 17 (4)
whenever O is a commutative von Neumann algebra. Yet, as the Kochen-Specker theorem shows, such valuation functions cannot exist under mild and natural conditions.
We will see that Bell's theorem admits a similar reformulation as a no-go result for such (generalised) classical states, based on the additional assumption of composition.
In the remainder of this section we give a derivation of factorisability and thus Bell's theorem for classical, single-context theories with composition described by the canonical product of state spaces. Given two subsystems with measure spaces $(\Sigma_1, \sigma_1, ds_1)$ and $(\Sigma_2, \sigma_2, ds_2)$, the composite state space is defined as the cartesian product $\Sigma_{1\&2} := \Sigma_1 \times \Sigma_2$ with product $\sigma$-algebra $\sigma_{1\&2}$ generated by elements $B_1 \times B_2$, $B_1 \in \sigma_1$, $B_2 \in \sigma_2$, and product measure $ds_{1\&2} := ds_1 \times ds_2$ satisfying the condition

$$ds_{1\&2}(B_1 \times B_2) = ds_1(B_1)\, ds_2(B_2) \quad \text{for all } B_1 \in \sigma_1,\ B_2 \in \sigma_2. \tag{5}$$

In a similar way we obtain composite state spaces with multiple subsystems. Correspondingly, composite observables $a \in \mathcal{O}$ are represented by measurable functions $f_a : \Sigma \to \mathbb{R}^n$ on the composite state space $\Sigma = \Sigma_1 \times \cdots \times \Sigma_n$. The algebra $\mathcal{O}$ of observables of the composite system is generated by the algebras of its subsystems by taking real linear combinations and products (and suitable limits if we consider topologically closed algebras). Clearly, evaluation on elements $s \in \Sigma$ still yields algebra homomorphisms similarly to Eq. (3); hence, we obtain composite valuation functions $v_s : \mathcal{O} \to \mathbb{R}^n$ from the obvious generalisation of Eq. (2) to composite observables.
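In order to obtain a generalisation of the truth values in Eq. (1) it is thus enough to consider tuples $a = (a_1, \dots, a_n) \in \mathcal{O}$ with $a_i \in \mathcal{O}_i$ for $i \in \{1, \dots, n\}$ as well as measurable functions $f_a : \Sigma \to \mathbb{R}^n$, $f_a(s) := (f_{a_1}(s_1), \dots, f_{a_n}(s_n))$ with $s = (s_1, \dots, s_n) \in \Sigma = \Sigma_1 \times \cdots \times \Sigma_n$. Namely, we define the truth value of the proposition $'a \in \Delta'$ with Borel set $\Delta := \Delta_1 \times \cdots \times \Delta_n$ as follows:

$$\Theta('a \in \Delta', s) := \begin{cases} 1 & \text{if } f_a(s) \in \Delta, \\ 0 & \text{otherwise} \end{cases} \;=\; \prod_{i=1}^{n} \Theta('a_i \in \Delta_i', s_i). \tag{6}$$

Note that in the last step we have used the indicator function $\Theta('a \in \Delta', s)$ in Eq. (1). For instance, the probability for obtaining a particular outcome $A$ corresponds to the Borel set $\Delta_A := \{A\}$. Analogously, for joint probability distributions on a bipartite system we have with Eq. (6):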

Statistical mixtures and joint probability distributions
Crucially, for classical systems the support of the joint probability distribution splits according to Eq. (6). Since furthermore $p(s_1, s_2 \mid \lambda) = p(s_1 \mid \lambda)\, p(s_2 \mid \lambda)$ over 'patches' 19 we may define the effective parameter space $\Lambda$ 20 as a set of such patches covering $\Sigma$, yielding the standard form of factorisability,

$$p(A, B \mid a, b) = \int_{\Lambda} d\lambda\; p(\lambda)\, p(A \mid a, \lambda)\, p(B \mid b, \lambda). \tag{9}$$
In this reading, the splitting of classical joint probability distributions according to factorisability fundamentally stems from the splitting of supports of indicator functions $\Theta('a \in \Delta', s)$, which follows from the existence of local (single-context) state spaces with composition defined by the cartesian product. We have thus derived Eq. (9) from essentially two assumptions: (a) trivial physical contextuality, i.e., just a single context (in each subsystem) and (b) the cartesian product of state spaces as the state space of the composite system.
Since condition (b) is entirely natural for single-context state spaces, Eq. (9) can also be read as a consequence of just trivial physical contextuality.
The locality constraint in factorisable distributions is simply the condition that the joint probability distribution is a statistical average over products of local distributions, which depend on local data (observables and outcomes) only. By modelling the composite system via the cartesian product of state spaces this is automatic: neither the choice nor the outcome of an observable affects the other factor in the product. Factorisability thus corresponds to composition given by the cartesian product and, by the above argument, to (trivial) physical contextuality. This argument then suggests an intimate relationship between the concepts of contextuality, composition and locality. Of course, at this point a relation only exists in the very special case of trivial physical contextuality in classical systems. Nevertheless, in Sec. 6.2.1 we will see how these concepts are closely related also in the multiple-context setting.
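The consequence of composing single-context state spaces by the cartesian product can be checked by brute force in the simplest bipartite scenario. The sketch below is an illustration only: it assumes two observables per subsystem with outcomes ±1 and uses the CHSH combination as one representative Bell inequality (neither choice is fixed by the text). Each local microstate predetermines both local outcomes, and every pair of microstates in the product space gives a CHSH value of magnitude 2.

```python
from itertools import product

def chsh(E):
    """CHSH combination E(0,0) + E(0,1) + E(1,0) - E(1,1) of correlators."""
    return E[(0, 0)] + E[(0, 1)] + E[(1, 0)] - E[(1, 1)]

def deterministic_correlators(s1, s2):
    """Correlators for a point (s1, s2) of the product state space.

    s1 = (A_0, A_1) and s2 = (B_0, B_1) are the predetermined +/-1 values
    of the two local observables on each side.
    """
    return {(x, y): s1[x] * s2[y] for x in (0, 1) for y in (0, 1)}

# Local state spaces: all assignments of +/-1 values to two observables.
local_states = list(product([-1, 1], repeat=2))

chsh_values = [
    chsh(deterministic_correlators(s1, s2))
    for s1 in local_states
    for s2 in local_states
]
max_classical = max(abs(v) for v in chsh_values)
print(max_classical)  # 2
```

Since a factorisable distribution is a statistical average over such product points, and the CHSH expression is linear in the correlators, the bound of 2 carries over to all mixtures.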

Contextuality, composition and locality
Note that the derivation in Sec. 6.1.2 crucially depends on the assumption of underlying classical state spaces with composition defined in terms of the cartesian product. In such systems all observables are simultaneously measurable, which means they are trivial from the perspective of physical contextuality. Clearly, this is not the situation we are facing in quantum theory, where the Kochen-Specker theorem, Thm. 12, rules out a classical state space picture.
Therefore, shifting perspective from states to observables, in Sec. 6.2.1 we will discuss alternative ways to define composition; in particular, we motivate composition of systems based on observables and the order of contexts. In Sec. 6.2.2 we study the implications of this kind of composition, based on context structure, for the Bell presheaf, i.e., the dilated probabilistic presheaf over product contexts.

Locality constraints and composition
Composition via cartesian products of state spaces. We consider the notion of composition in more detail. Recall that we defined composition of classical systems in terms of their state spaces, namely via the product of the corresponding measure spaces. On the other hand, observables in classical theories are represented by measurable functions and every measurable function on the composite state space can be approximated by suitable limits of linear combinations of indicator functions (cf. Eq. (6)). In this sense, it does not matter whether we define composition in terms of states or observables for classical systems. In fact, if we take classical systems to be given by commutative von Neumann algebras $\mathcal{N}_i$, $i = 1, 2$, with corresponding state spaces given by Gelfand spectra $\Sigma_i = \Sigma(\mathcal{N}_i) \simeq \Gamma(\Sigma(\mathcal{V}(\mathcal{N}_i)))$, 21 this equivalence reads, Here, the final equality refers to the context product in Eq. (13) below. By the Kochen-Specker theorem in contextual form, $\Gamma(\Sigma(\mathcal{V}(\mathcal{N})))$ is empty whenever $\mathcal{N}$ is a (noncommutative) von Neumann algebra (not of type $I_2$). The equivalence in Eq. (10) thus breaks down for such algebras. Nevertheless, composition in terms of state spaces can be carried over to quantum systems if we define the state space of the composite system in terms of convex combinations of elements in the cartesian product of global sections of the probabilistic presheaves of subsystems instead: 22

$$\Gamma_{1\&2} := \mathrm{conv}\big(\Gamma(\Pi(\mathcal{V}(\mathcal{N}_1))) \times \Gamma(\Pi(\mathcal{V}(\mathcal{N}_2)))\big). \tag{11}$$

Note that $\Gamma_{1\&2}$ is the set of separable states of the corresponding tensor product quantum system. It is thus easy to show that factorisability holds for any system, with or without local physical contextuality, as long as composition is defined in this way.
Proposition 20. Let $\mathcal{N}_1, \mathcal{N}_2$ be possibly noncommutative von Neumann algebras and let the set of states on the composite system, $\Gamma_{1\&2}$, be defined according to Eq. (11). Then all states in $\Gamma_{1\&2}$ are factorisable and satisfy the Bell inequalities.

21 Note that the category of commutative von Neumann algebras is equivalent to the category of localizable measurable spaces, that is, measurable spaces for which the Boolean algebra of equivalence classes modulo sets of measure zero is complete [61].

22 Recall from Thm. 18 that global sections of the probabilistic presheaf bijectively correspond with quantum states.
Clearly, this argument is not restricted to states on von Neumann algebras (density matrices in finite dimensions), but holds for arbitrary locally stochastic models with composition defined by the cartesian product similar to Eq. (11). Every stochastic, factorisable model thus satisfies the Bell inequalities (cf. [18]). Conversely, it is interesting to note that by [31] the latter is equivalent to the existence of a deterministic local hidden variable model for the composite system. In this sense stochastic, factorisable models such as those with local physical contextuality, yet composition defined via Eq. (11), still correspond to single-context state spaces.
Succinctly, factorisability is a direct consequence of composition defined in terms of the cartesian product of state spaces.
Composition via contexts. It is clear from Prop. 20 and the above argument on stochastic, factorisable models (cf. [31]) that we cannot use Eq. (11) to define a suitable notion of composition for quantum systems. Shifting our focus from states to observables and their physical contextuality as encoded in the partial order of contexts, we can distill a second, different notion of composition from Eq. (10) as follows:

$$\mathcal{V}_{1\&2} := \mathcal{V}(\mathcal{N}_1) \times \mathcal{V}(\mathcal{N}_2), \qquad (\tilde{V}_1, \tilde{V}_2) \subset_{1\&2} (V_1, V_2) \;:\Longleftrightarrow\; \tilde{V}_1 \subset V_1 \text{ and } \tilde{V}_2 \subset V_2. \tag{13}$$

This is simply the product in the category of partial orders with composite contexts given by the cartesian product of local contexts and their natural product order $\subset_{1\&2}$. With $\mathcal{V}_{1\&2}$ as base category we can build the probabilistic presheaf $\Pi(\mathcal{V}_{1\&2})$. Each component $\Pi_{(V_1, V_2)}$ consists of the finitely additive probability measures $\mu : \mathcal{P}(V_1 \otimes V_2) \to [0, 1]$ together with the obvious restriction maps. More importantly, we define the Bell presheaf as the dilated probabilistic presheaf $\Pi(\mathcal{V}_{1\&2})$ according to Def. 6.

Definition 7. Let $\mathcal{N}_1, \mathcal{N}_2$ be von Neumann algebras with context categories $\mathcal{V}(\mathcal{N}_1)$, $\mathcal{V}(\mathcal{N}_2)$, respectively. Then we call the dilated probabilistic presheaf $\Pi(\mathcal{V}_{1\&2})$ over the product context category $\mathcal{V}_{1\&2} := \mathcal{V}(\mathcal{N}_1) \times \mathcal{V}(\mathcal{N}_2)$ the Bell presheaf of $\mathcal{N}_1$ and $\mathcal{N}_2$.
Clearly, there are versions of the Bell presheaf also for multipartite systems. We remark that composition via contexts is the most basic construction when considering contexts of the component systems (and a straightforward generalisation of Eq. (10)). We will have to show that this construction has a meaningful physical interpretation.
Composition via tensor products. Before we explore the consequences of composition defined via contexts for the Bell presheaf, we end this section by mentioning a possible third way of defining composition, which in fact is the standard composition in quantum theory. There, the pure state space $S(\mathcal{H})$ is the projective space corresponding to the Hilbert space $\mathcal{H}$. Given component systems with Hilbert spaces $\mathcal{H}_1, \mathcal{H}_2$, the Hilbert space of the composite system is $\mathcal{H}_1 \otimes \mathcal{H}_2$; hence, the pure state space of the composite system is $S(\mathcal{H}_1 \otimes \mathcal{H}_2)$. Note that there are many more contexts for this kind of composition than for composition via contexts described above: the poset $\mathcal{V}(\mathcal{H}_1 \otimes \mathcal{H}_2)$ contains many contexts that are not of the form $V_1 \otimes V_2$, which are the only contexts available in the poset $\mathcal{V}_{1\&2}$. There is a functor $\mathcal{V}_{1\&2} \to \mathcal{V}(\mathcal{H}_1 \otimes \mathcal{H}_2)$, $(V_1, V_2) \mapsto V_1 \otimes V_2$, that is fully faithful, but not surjective on objects. We say that $\mathcal{V}_{1\&2}$ contains only product (or twisted product) contexts (cf. [32]).

The Bell presheaf and reformulation of Bell's theorem
As we saw in the previous section, composition via contexts gives a much smaller poset of contexts, $\mathcal{V}_{1\&2}$, than the usual composition via tensor products, which leads to $\mathcal{V}(\mathcal{H}_1 \otimes \mathcal{H}_2)$. From Gleason's theorem in contextual form, Thm. 18, we know that quantum states of the composite system described by $\mathcal{H}_1 \otimes \mathcal{H}_2$ correspond bijectively with global sections of the probabilistic presheaf $\Pi(\mathcal{V}(\mathcal{H}_1 \otimes \mathcal{H}_2))$. Since $\mathcal{V}(\mathcal{H}_1 \otimes \mathcal{H}_2)$ is a much richer poset than $\mathcal{V}_{1\&2}$, there are many more restriction maps in the (dilated) probabilistic presheaf $\Pi(\mathcal{V}(\mathcal{H}_1 \otimes \mathcal{H}_2))$ than in the Bell presheaf $\Pi(\mathcal{V}_{1\&2})$. Each global section of $\Pi(\mathcal{V}(\mathcal{H}_1 \otimes \mathcal{H}_2))$, that is, each quantum state, induces a global section of $\Pi(\mathcal{V}_{1\&2})$, but it is not clear a priori whether the converse holds. The Bell presheaf $\Pi(\mathcal{V}_{1\&2})$ could potentially have many more global sections than those corresponding with quantum states. Remarkably, this is not the case. In order to see this, the following lemma is crucial; for details we refer to [32]. For simplicity, we restrict the presentation to the case $\mathcal{N} = \mathcal{B}(\mathcal{H})$ for finite-dimensional Hilbert spaces $\mathcal{H}$; we leave the general case (of arbitrary von Neumann algebras) for future work.
A related result by Wallach [68] shows that for finite-dimensional systems, frame functions over 'unentangled' 23 bases uniquely correspond with self-adjoint operators and thus almost correspond with quantum states (in the form of density operators). In particular, there also exists a unique positive linear map $\phi_\gamma : \mathcal{B}(\mathcal{H}_1) \to \mathcal{B}(\mathcal{H}_2)$, yet generally not of the form in Lm. 21. The difference is that [68] considers global sections of the undilated probabilistic presheaf $\Pi(\mathcal{V}_{1\&2})$ in Def. 5 (for details, see [32]). In contrast, it is crucial to restrict to the Bell presheaf in Lm. 21. In fact, by means of the latter the connection with states can be made precise as follows. In finite dimensions, every state corresponds with a density matrix. By Choi's theorem [16], every density matrix on the composite system $\mathcal{H}_1 \otimes \mathcal{H}_2$ corresponds with a completely positive, trace-preserving map $\phi : \mathcal{B}(\mathcal{H}_1) \to \mathcal{B}(\mathcal{H}_2)$. By Stinespring's theorem [64], every such completely positive map $\phi$ is of the form $\phi = v^* \Phi v$ with $v : \mathcal{H}_2 \to \mathcal{K}$ a linear map and $\Phi : \mathcal{B}(\mathcal{H}_1) \to \mathcal{B}(\mathcal{K})$ a *-homomorphism. In other words, a global section corresponds with a quantum state if and only if the Jordan *-homomorphism in Lm. 21 lifts to a *-homomorphism.

23 'Unentangled' here refers to observables, which group into product contexts, and not to states.
Not every Jordan *-homomorphism is also a *-homomorphism, but it almost is. It turns out that for the special case of $\mathcal{N} = \mathcal{B}(\mathcal{H})$ there are exactly two ways to lift a Jordan algebra to a von Neumann algebra: by augmenting the symmetric product (anticommutator) to an associative product $a \circ b = \frac{1}{2}\{a, b\} \pm \frac{1}{2}[a, b]$ (cf. [46]). The sign in front of the commutator of the augmented Jordan algebra can be interpreted as picking out a forward time direction on the corresponding physical system. This can be made precise in the form of time orientations on the context category. (For more details we refer to [32] and specifically [5,25].) For our purposes the following notion will be sufficient.

Note that by Thm. 8, every order automorphism on $\mathcal{V}(\mathcal{H})$ corresponds to conjugation by a unitary or anti-unitary operator. Since every anti-unitary is composed of the time-reversal operator and a unique unitary operator $e^{ita}$ (as in Def. 8), the former effectively causes a sign change in the parameter $t \in \mathbb{R}$, which is therefore naturally interpreted as the time parameter. Furthermore, it is straightforward to see that (infinitesimally) this corresponds to a sign change in the commutator (of the associative algebra). By the previous discussion, this corresponds exactly with the two different ways of extending a Jordan algebra of the form $J(\mathcal{B}(\mathcal{H}))$ to a von Neumann algebra. We remark that for general Jordan algebras there are more ways to lift them to von Neumann algebras and thus also more possible time orientations on $\mathcal{N}$. 24

Succinctly, time orientations encode the forward time direction in a quantum system.
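The two lifts can be made concrete for $2 \times 2$ matrices. The following sketch assumes the augmented product has the form $a \circ b = \frac{1}{2}\{a, b\} \pm \frac{1}{2}[a, b]$, which is our reading of the construction described above; the helper names are hypothetical. With the plus sign one recovers the ordinary matrix product $ab$, with the minus sign the opposite product $ba$, while the symmetric (Jordan) part is the same in both cases.

```python
def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def lincomb(alpha, a, beta, b):
    """Entrywise linear combination alpha * a + beta * b."""
    return [[alpha * x + beta * y for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]

def anticommutator(a, b):
    return lincomb(1, matmul(a, b), 1, matmul(b, a))

def commutator(a, b):
    return lincomb(1, matmul(a, b), -1, matmul(b, a))

def augmented_product(a, b, sign):
    """Assumed form: (1/2){a, b} + sign * (1/2)[a, b], with sign = +1 or -1."""
    return lincomb(0.5, anticommutator(a, b), 0.5 * sign, commutator(a, b))

# Pauli matrices X and Z as a non-commuting test pair.
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]

forward = augmented_product(X, Z, +1)    # equals the product X Z
backward = augmented_product(X, Z, -1)   # equals the opposite product Z X

print(forward == matmul(X, Z), backward == matmul(Z, X))
```

Both products are associative, since they are the matrix product and its opposite; they differ only by the sign of the commutator, which is the datum the text interprets as a choice of time orientation.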
We also need the following definition of orientation-preserving global sections.
where $\Phi_\gamma$ is the Jordan *-homomorphism in Lm. 21. The set of orientation-preserving global sections with respect to $\psi = (\psi_1, \psi_2)$ is denoted, .

24 There are corresponding notions of time-oriented presheaves over the context category as well (cf. [25]).

Finally, we give our contextual reformulation of Bell's theorem.

V(B(H i )))) the corresponding state spaces. Then the state space of the composite system is given by Moreover, for $\mathcal{B}(\mathcal{H}_i)$ commutative with pure state spaces $\Sigma_i \simeq \Gamma(\Sigma(\mathcal{V}(\mathcal{B}(\mathcal{H}_i))))$ one has,

Thm. 22 classifies state spaces of physical theories in terms of contextuality and context composition. First, it reproduces the known bound on classical correlations in terms of factorisability. As remarked in Sec. 6.1, the latter is a consequence of composition defined on state spaces, which for single-context theories is equivalent to context composition. What is more, Thm. 22 rules out stronger, non-physical correlations such as PR-boxes, which are only (seemingly) allowed if few contexts are considered. In turn, this is a consequence of the order relations underlying composition defined via contexts in Eq. (13), which implies no-signalling. Conversely, the latter relates contexts (and probability measures over them) as follows: It is straightforward to see that these conditions coincide with those in Eq. (13) if we also demand transitivity. No-signalling together with the fact that each local subsystem possesses a quantum state space by Gleason's theorem (cf. 'local quantumness' in [8]) therefore suffices to fix the correlations on the composite system, given by global sections of the Bell presheaf $\Pi(\mathcal{V}_{1\&2})$, to be quantum realisable. In fact, global sections $\Gamma(\Pi(\mathcal{V}_{1\&2}))$ exactly correspond with quantum states in algebras with different time orientations. Since the surrounding arguments are of independent interest, we refer to our article [32] for the proof of Thm. 22 and many more details. Thm. 22 is our reformulation of Bell's theorem in contextual form.
Instead of only providing an upper bound on the amount of correlations that can exist in local hidden variable theories as, e.g., in the formulation in [18], it also shows that quantum correlations are singled out in a natural way by composition of systems via contexts rather than states. Note that this clearly implies that quantum correlations are bounded by Tsirelson's bound [17].
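The gap between quantum correlations and more general non-signalling correlations can be illustrated numerically. The sketch below is an illustration under stated assumptions, not part of the argument in [32]: it uses the CHSH combination, a two-qubit singlet state with a standard choice of measurement directions, and the usual definition of the PR-box, $p(a, b \mid x, y) = \frac{1}{2}$ if $a \oplus b = xy$ and $0$ otherwise. All function names are hypothetical.

```python
import math

R = 1 / math.sqrt(2)

def kron(a, b):
    """Kronecker product of two square matrices given as nested lists."""
    return [[a[i][j] * b[k][l] for j in range(len(a)) for l in range(len(b))]
            for i in range(len(a)) for k in range(len(b))]

def expectation(psi, m):
    """<psi| m |psi> for a real vector psi and a real matrix m."""
    return sum(psi[i] * m[i][j] * psi[j]
               for i in range(len(psi)) for j in range(len(psi)))

def chsh(E):
    """CHSH combination E(0,0) + E(0,1) + E(1,0) - E(1,1)."""
    return E[(0, 0)] + E[(0, 1)] + E[(1, 0)] - E[(1, 1)]

# Quantum case: singlet state and +/-1-valued local observables.
X = [[0.0, 1.0], [1.0, 0.0]]
Z = [[1.0, 0.0], [0.0, -1.0]]
A = [Z, X]
B = [[[-R, -R], [-R, R]],    # -(Z + X) / sqrt(2)
     [[-R, R], [R, R]]]      # (X - Z) / sqrt(2)
SINGLET = [0.0, R, -R, 0.0]  # (|01> - |10>) / sqrt(2)

E_quantum = {(x, y): expectation(SINGLET, kron(A[x], B[y]))
             for x in (0, 1) for y in (0, 1)}
chsh_quantum = chsh(E_quantum)

# PR-box: outcomes a, b in {0, 1}, perfectly (anti)correlated so that a XOR b = x AND y.
def pr_box(a, b, x, y):
    return 0.5 if (a ^ b) == (x & y) else 0.0

def correlator(p, x, y):
    return sum((-1) ** (a + b) * p(a, b, x, y) for a in (0, 1) for b in (0, 1))

def is_no_signalling(p):
    """Each party's marginal must not depend on the other party's setting."""
    for out in (0, 1):
        for setting in (0, 1):
            alice = [sum(p(out, b, setting, y) for b in (0, 1)) for y in (0, 1)]
            bob = [sum(p(a, out, x, setting) for a in (0, 1)) for x in (0, 1)]
            if alice[0] != alice[1] or bob[0] != bob[1]:
                return False
    return True

E_pr = {(x, y): correlator(pr_box, x, y) for x in (0, 1) for y in (0, 1)}
chsh_pr = chsh(E_pr)

print(chsh_quantum)                       # approximately 2.828, i.e. 2 * sqrt(2)
print(chsh_pr, is_no_signalling(pr_box))  # 4.0 True
```

The singlet reaches the value $2\sqrt{2}$, whereas the PR-box attains the algebraic maximum 4 while still satisfying no-signalling; in this four-context scenario no-signalling alone therefore does not exclude it.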
Finally, we note that Thm. 22 provides a (first step towards a) well-defined notion of composition in the topos approach to quantum theory. All states of the composite system arise as global sections of the Bell presheaf (over the oriented composite context category). Hence, in order to describe the state space of the composite system, we do not need all contexts of the composite system described in terms of the tensor product algebra, but merely product contexts, which arguably are the only ones operationally accessible. In turn, one may wonder whether knowing the state space of the composite system is sufficient to determine the poset of all contexts in the tensor product algebra. We will pursue this line of research elsewhere.

Conclusion and Outlook
In this article, we have shown that important structural theorems of quantum theory (Wigner's theorem, Gleason's theorem, the Kochen-Specker theorem, and Bell's theorem) fundamentally relate to contextuality. This might come as little surprise in the case of the Kochen-Specker theorem, yet other theorems had not been explicitly connected with contextuality before (to the best of our knowledge).
Wigner's theorem can be rephrased in terms of automorphisms on the partially ordered set of contexts, that is, maps preserving the context order, which are implemented by conjugation with unitary or anti-unitary operators. Hence, instead of demanding that transition probabilities between pure states be preserved, one can equivalently demand that the order on contexts be preserved.
The Kochen-Specker theorem is equivalent to the fact that the spectral presheaf has no global sections: given a 'local' pure state in every context (each such state assigns sharp values to all observables in its context), there is no way of fitting these together in a consistent way. In other words, there are no dispersion-free quantum states. The 'fitting together' here refers to the non-contextuality condition asserting that if an observable is contained in different contexts, then the value assigned to it by the different pure states must be the same.
Gleason's theorem answers a similar local-to-global problem, yet instead of valuation functions, it considers measures on the physical quantities in a quantum system. From the perspective of presheaves, this is easily achieved by replacing pure states with mixed states locally, that is, in every context, and by extending the restriction maps (from the spectral presheaf) to all probability measures, which thus become marginalisation constraints. Again, one asks for global sections, that is, probability assignments that are consistent globally, or across contexts. In contrast to the spectral presheaf, global sections do exist in the case of the probabilistic presheaf, and by Gleason's theorem bijectively correspond with quantum states (density matrices in finite dimensions). Gleason's theorem therefore lifts quasi-linearity of probability measures in contexts to linearity on states.
Finally, Bell's theorem attains a reformulation over contexts. Here, the crucial insight is the strong connection with composition of subsystems. In classical theories, composition is defined on the level of state spaces and thus in terms of the cartesian product. Bell's original theorem can be read as a constraint on correlations in theories with state spaces composed in this way. However, from the perspective of physical contextuality, this is only justified for single-context systems. There, composition of states and composition of observables coincide. For multiple-context systems this is no longer the case. Shifting perspective from states to observables, one defines composition on the level of the context order instead. The corresponding Bell presheaf contains far fewer contexts and thus also far fewer constraints between local probability measures, which prima facie might allow for global sections that do not correspond to quantum states. Nevertheless, by combining several deep results including a generalised version of Gleason's theorem, and adding the crucial notion of time orientation in local subsystems, this turns out not to be the case: all global sections of the Bell presheaf over the oriented product context category bijectively correspond with quantum states already. In other words, no-signalling and our consistency condition on time orientations in subsystems can be seen as the physical principles that rule out more general non-signalling correlations, thus replacing factorisability (stemming from the cartesian product construction on state spaces) in single-context systems.
Our contextual reformulation of Bell's theorem, with composition defined on the level of contexts, thus unifies the classical and quantum case: it derives constraints on the correlations of both. For the former we obtain the famous Bell inequalities; for the latter we obtain exactly the correlations realised in quantum theory, ruling out more general non-signalling correlations beyond Tsirelson's bound.
In recent years, contextuality has been recognised more and more as a central feature of quantum theory [41,63,50,3]. It has been argued that contextuality is a resource for quantum computation. In particular, for the quantum computing architecture known as measurement-based quantum computation (MBQC) it has been shown that contextuality allows one to outperform certain non-contextual settings [6,58,21,33]. Moreover, in [40] it is proven that contextuality is essential for magic state injection in the stabiliser formalism. Many connections to other resources exist, such as entanglement and negativity of the Wigner function [66,22]. For MBQC, a classification of contextuality in terms of group cohomology has been given in [59,55]. Similar connections between contextuality and cohomology have also been obtained in [60] and within the sheaf theoretic formalism in [4,9,14]. The latter framework provides a precise threshold for contextual computation by means of the contextual fraction [1,2,56]. These and similar results uncover the importance of contextuality for quantum computation, hinting at a universal classification and quantification of contextuality, which could pave the way for developments in future quantum computers.
Our work contributes to this research by substantially extending the scope of contextuality through the unified perspective it attains in the form of presheaves over the partial order of contexts. As we show, contextuality in this form underlies multiple and seemingly unrelated aspects in quantum theory.