Thermodynamics of Encoding and Encoders

Non-isolated systems have diverse coupling relations with the external environment. These relations generate complex thermodynamics and information transmission between the system and its environment. The framework presented in this research examines the critical role of the internal orders inside the non-isolated system in shaping the information thermodynamics coupling. We characterize the coupling as a generalized encoding process, where the system acts as an information thermodynamics encoder to encode the external information based on thermodynamics. We formalize the encoding process in the context of the nonequilibrium second law of thermodynamics, revealing an intrinsic difference in information thermodynamics characteristics between information thermodynamics encoders with and without internal correlations. During the information encoding process of an external source $\mathsf{Y}$, specific sub-systems in an encoder $\mathsf{X}$ with internal correlations can exceed the information thermodynamics bound on $\left(\mathsf{X},\mathsf{Y}\right)$ and encode more information than system $\mathsf{X}$ does as a whole. We computationally verify this theoretical finding in an Ising model with a random external field and a neural data set of the human brain during visual perception and recognition. Our analysis demonstrates that a stronger internal correlation inside these systems implies a higher possibility for specific sub-systems to encode more information than the global one. These findings may suggest a new perspective in studying information thermodynamics in diverse physical and biological systems.

Perhaps the connection between thermodynamics and information is one of the most intriguing relations discovered in physics. It is impressive to see the ubiquitous and critical roles of information thermodynamics links in diverse physical, chemical, and biological systems. Nowadays, emerging interests have arisen in studying information thermodynamics in complex systems (e.g., the human brain), whose complexity originates from heterogeneous elements and intricate internal correlations. An open challenge relates to the elusive roles of internal orders inside a complex system in influencing information thermodynamics. This research attempts to lay a cornerstone of this rising direction. We present a theory of the thermodynamics of encoding, where a non-isolated system X acts as an information thermodynamics encoder to encode the information of an external source Y through thermodynamics coupling. This theory reveals an intrinsic difference between the non-isolated system with intra-system coupling and the non-isolated system of independent elements in information thermodynamics characteristics. It demonstrates that a stronger internal correlation inside system X implies a higher possibility for specific sub-systems of X to encode more information of Y than X itself. We

Introduction
As you are reading this sentence, synergistic neural dynamics emerges in your brain (a non-isolated system of neurons) to encode the text information [19,23], changing the thermodynamic state of your brain continuously [21]. This is an instance of the coupling between a non-isolated system and its external environment. Similar instances are common in classic [85,55,97], quantum [97,15,12], biological [52,73,39], ecological [50,94,75], and social [75,26,82] systems. The pursuit of understanding this coupling plays a pivotal role in statistical physics. Distinguished from the macroscopic systems studied by classic thermodynamics [14], the microscopic and mesoscopic systems in stochastic thermodynamics [86,88] usually feature non-negligible system-environment coupling [49]. Over decades, the study of the thermodynamic attributes of these systems has become a rapidly emerging direction [47,32,27,71,87,70,96].
Apart from the above progress, a frequently neglected perspective is that the coupling can also be a process for the system to encode the external environment information. At first glance, this information-theoretical aspect seems to focus on quantifying the information processed by a population of elements about external sources (e.g., with multiple mutual information [64], total correlation [101], or connected information [83]), irrelevant to the thermodynamics analysis. However, as suggested by Szilard's [95] and Landauer's [57,56] classical works, information is physical. Historically, the information-thermodynamics connection was first discovered in the paradox of the Maxwell demon [59]. Recently, more fundamental connections have been identified in reset [59,77,76], measurement [42,77,33,80,36,28], and feedback [42,80,77,2] processes and have been verified experimentally [7,8,51,99]. The idea of considering thermodynamics from the information aspect (or vice versa) has proven effective in identifying potential links between physics and information quantities [53,78,90,45,44,69,36,28]. Meanwhile, these intriguing connections have inspired widespread explorations of the possibility for the information-carrying capacity of a memory device to function as a thermodynamic fuel [11,10,91,5,41,100,65], leading to the remarkable development of designing nanoscale autonomous machines that are fueled by information (referred to as information engines) [24,63,62,60,17]. These recent advances in physics have further stimulated growing attention from other fields such as biology [16,21,4,48,6,81] and computer science [43].
Till now, it is unclear how internal orders inside a non-isolated system affect information thermodynamics. The necessity to study this problem emerges especially for complex systems (e.g., the brain), whose complexity originates from heterogeneous elements and intricate internal correlations. Our primary objective is to seek a systematic framework that formalizes the information thermodynamics coupling between a non-isolated system X and an external source Y and helps understand the critical roles of the existing order inside the system X on this coupling relation.
Another objective is to explore potential diversities of information thermodynamics characteristics determined by intra-system factors and verify these diversities in concrete non-isolated systems.
The rest of our paper is organized as follows. Sec. 2 depicts the theoretical framework of our research. We suggest an original perspective to characterize the non-isolated system X as an information thermodynamics encoder, which encodes the information of a coupled external source Y through thermodynamics. After contextualizing this perspective with existing theories, we formalize the definition of encoding in the context of the nonequilibrium second law of thermodynamics. Based on this formalization, Sec. 3 examines the information thermodynamics of different non-isolated systems in depth. We unveil how the nature of order inside the non-isolated system determines the information thermodynamics characteristics. An intrinsic difference in information thermodynamics is revealed between the encoders with and without intra-system coupling: unlike those with independent elements, an encoder X with intra-system coupling allows the encoded information of Y in its sub-systems to exceed the information thermodynamics bound on the joint system (X, Y). In other words, every time a particular amount of irreversible work is extracted or dispensed from the joint system (X, Y), specific sub-systems of X may be able to encode more information of Y than system X itself. This possibility originates from the internal correlation inside encoder X. In Sec. 4, we computationally verify our theory in an Ising model with random external field and a real data set of the human brain during the perception process.
2 Information thermodynamics encoder and encoding

Definition of information thermodynamics encoder
Let us consider a non-isolated system $X = (\{X_i\}, R)$, which we represent as a graph. The set $\{X_i\}$ includes $n$ elements (vertices), where each $X_i$ has $m$ states $\Omega(X_i) = \{X_i^j\}$ (here $\Omega(\cdot)$ denotes the sample space). The set $R \subseteq \{X_i\} \times \{X_i\}$ describes the intra-system coupling relations (edges). Note that the coupling is an equivalence relation, so every connected component of the graph is a clique (see Fig. 1). To function as an information processor, system X must have multiple distinguishable states for information storage [69]. Therefore, we always consider the cases where X has more than one distinguishable state. We refer to X as an information thermodynamics encoder when it couples with an external source $Y = \{Y_i\}$ (see Fig. 1 for an instance). Following the idea in [33,28], the joint system (X, Y) is in contact with a heat bath HB and the total system [(X, Y), HB] is assumed to be isolated.
On the one hand, the coupling between X and Y allows the transfer of heat, energy, or substances. On the other hand, this coupling ensures the existence of $P(X \mid Y)$ (here $P(\cdot)$ denotes the probability), establishing an information channel between X and Y. Such a channel supports the information-theoretical analysis of the coupling relation, in the sense that X encodes the information of Y. Taken together, the dual attributes of this coupling allow us to implement a unified analysis of information and thermodynamics.
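The graph picture of the encoder can be made concrete with a small sketch. It assumes the coupling relation is given as a list of undirected edges between element indices (the example edges and the helper names `connected_components` and `is_clique` are hypothetical, not taken from the paper) and checks the stated property that every connected component is a clique.

```python
from itertools import combinations


def connected_components(n, edges):
    """Group vertices 0..n-1 into connected components with union-find."""
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for a, b in edges:
        parent[find(a)] = find(b)

    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())


def is_clique(component, edges):
    """True if every pair of distinct vertices in the component is coupled."""
    edge_set = {frozenset(e) for e in edges}
    return all(frozenset(pair) in edge_set for pair in combinations(component, 2))


# Hypothetical 5-element encoder: {0, 1, 2} mutually coupled, {3, 4} coupled.
edges = [(0, 1), (1, 2), (0, 2), (3, 4)]
components = connected_components(5, edges)
all_cliques = all(is_clique(c, edges) for c in components)
```

Reflexive pairs are left implicit here; only couplings between distinct elements are stored as edges.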

The second law of thermodynamics
To embed our analysis with solid physics backgrounds, we begin with a discussion on the second law of thermodynamics.
Let us consider a general case where X will change if it interacts with Y during an interval $[0, t]$. The joint system (X, Y) varies due to this interaction and the heat bath HB accounts for supplying necessary heat. In [80,79], the time evolution of X under the influence of Y has been previously used for deriving the nonequilibrium second law of thermodynamics from the information perspective.
Here we offer a more fundamental derivation from the thermodynamics perspective. System (X, Y) varies during $[0, t]$, accompanied by a nonequilibrium entropy change $\Delta S$ and a nonequilibrium free energy change $\Delta F$. By decomposing $\Delta S$ as $\Delta S = \Delta_e S + \Delta_i S$, one can distinguish between the reversible contribution $\Delta_e S = Q T^{-1}$ due to the heat flow (heat $Q$ comes from the heat bath HB and $T$ is the temperature) and the irreversible contribution $\Delta_i S \geq 0$. As demonstrated in [28], an equivalent formulation of the nonequilibrium second law can be obtained by $T \Delta_i S = W - \Delta F \geq 0$, where $W$ is the work performed on system (X, Y).
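Based on the knowledge of free energy [25,72], the actual free energy $F(\tau)$ at moment $\tau \in [0, t]$ is associated with the equilibrium free energy $\hat{F}(\tau)$ as well as the relative entropy between the actual state $P(X(\tau), Y(\tau))$ and the equilibrium state $\hat{P}(X(\tau), Y(\tau))$ (the hat marks equilibrium quantities and $k$ is the Boltzmann constant):
$$F(\tau) = \hat{F}(\tau) + kT\, D\left[P(X(\tau), Y(\tau)) \,\|\, \hat{P}(X(\tau), Y(\tau))\right].$$
Figure 1: A schematic diagram of the coupled system (X, Y) and the illustration of equality (1).
Therefore, the nonequilibrium second law during $[0, t]$ can be specified as
$$T \Delta_i S = W - \Delta \hat{F} - kT \left\{ D\left[P(X(t), Y(t)) \,\|\, \hat{P}(X(t), Y(t))\right] - D\left[P(X(0), Y(0)) \,\|\, \hat{P}(X(0), Y(0))\right] \right\} \geq 0,$$
where the term $W - \Delta \hat{F}$ is referred to as the irreversible work $W_{irr}$ [28]. Taking these derivations together, the nonequilibrium second law can be reformulated as follows [28] (please see Fig. 1 for an illustration):
$$W_{irr} = T \Delta_i S + kT \left\{ D\left[P(X(t), Y(t)) \,\|\, \hat{P}(X(t), Y(t))\right] - D\left[P(X(0), Y(0)) \,\|\, \hat{P}(X(0), Y(0))\right] \right\}. \quad (1)$$
Here the term $D\left[P(X(t), Y(t)) \,\|\, \hat{P}(X(t), Y(t))\right] - D\left[P(X(0), Y(0)) \,\|\, \hat{P}(X(0), Y(0))\right]$ can be treated as information gain if it is positive; otherwise it is information reduction [28]. Accordingly, the irreversible work $W_{irr}$ in (1) can be either negative, when the information reduction is greater than the entropy production, or non-negative in other cases [28].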
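The link between the actual free energy, the equilibrium free energy, and the relative entropy can be checked numerically on a toy system. The sketch below is illustrative only: the three energy levels, the value of `kT`, and the nonequilibrium distribution `p` are arbitrary assumptions, and the nonequilibrium free energy is taken as $\langle E \rangle - TS$ with the Shannon form of the entropy.

```python
import math


def relative_entropy(p, q):
    """D(p || q) in nats for discrete distributions on a common support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def boltzmann(energies, kT):
    """Equilibrium distribution and partition function for given energy levels."""
    weights = [math.exp(-e / kT) for e in energies]
    z = sum(weights)
    return [w / z for w in weights], z


def nonequilibrium_free_energy(p, energies, kT):
    """F = <E> - T S, with T S = -kT * sum(p ln p)."""
    mean_energy = sum(pi * e for pi, e in zip(p, energies))
    entropy_over_k = -sum(pi * math.log(pi) for pi in p if pi > 0)
    return mean_energy - kT * entropy_over_k


# Arbitrary toy values (assumptions for illustration only).
energies = [0.0, 1.0, 2.5]
kT = 1.3
p_eq, z = boltzmann(energies, kT)
f_eq = -kT * math.log(z)          # equilibrium free energy
p = [0.6, 0.3, 0.1]               # some nonequilibrium state
gap = nonequilibrium_free_energy(p, energies, kT) - f_eq
expected = kT * relative_entropy(p, p_eq)
```

In this toy case `gap` and `expected` agree up to floating-point error, and both vanish when `p` equals `p_eq`.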

Thermodynamics and information
Given the second law of thermodynamics in (1), one might be curious about its connection to the information-theoretical perspective mentioned above. Let us consider the amount of information of Y that can be encoded in X [22]. This aspect is studied under the term measurement in statistical physics [69,42,77,33,80,36,28], and substantial progress has been made in its application in neuroscience [21,23,102].
To organize our derivation, we divide the timeline into 3 sections, respectively corresponding to the period before, during, and after the measurement process.
• Before the measurement (before moment 0), there is no restriction on X and Y. They can be either independent or coupled. Specially, we refer to the case where X and Y are originally independent until the measurement begins as the UCI Condition (Uncoupled initialization condition), namely that $P(X(0), Y(0)) = P(X(0)) P(Y(0))$ in (1). Moreover, for the special case where the joint system (X, Y) is at equilibrium when the measurement begins, we refer to it as the EI Condition (Equilibrium initialization condition), namely that $P(X(0), Y(0)) = \hat{P}(X(0), Y(0))$, where $\hat{P}$ denotes the equilibrium distribution;
• During the measurement (during $[0, t]$), it is necessary for X and Y to be coupled with each other. Specially, for the case where the total system $(X(\tau), Y(\tau), HB(\tau))$ (here $\tau \in [0, t]$) always features a state where system (X, Y) and HB are independent and HB is at equilibrium (there is no restriction on the state of system (X, Y)), we refer to it as the PE Condition (Partial equilibrium condition), namely that $P(X(\tau), Y(\tau), HB(\tau)) = P(X(\tau), Y(\tau))\, \hat{P}(HB(\tau))$ holds for any $\tau \in [0, t]$;
• After the measurement (after moment $t$), there is no restriction on X and Y as well. For the case where the coupling between X and Y is turned off after the measurement ends, we refer to it as the UCE Condition (Uncoupled ending condition). Under this condition, we have $\hat{P}(X(t), Y(t)) = P(X(t)) P(Y(t))$ in (1).
As suggested by [28,27,46], the PE Condition is necessary for defining the entropy production in (1), where the entropy production is defined as the irreversible contribution to the entropy change of system (X, Y) when it is coupled with HB. For the PE Condition and (2), one can see their detailed derivation in [27] and a summary in [28]. Moreover, an equivalent version of the PE Condition can be seen in [33].
When the UCE Condition is satisfied in (1), we know that $D\left[P(X(t), Y(t)) \,\|\, \hat{P}(X(t), Y(t))\right] = D\left[P(X(t), Y(t)) \,\|\, P(X(t)) P(Y(t))\right]$, where $\hat{P}$ denotes the equilibrium distribution. In previous studies, the term $D\left[P(X(t), Y(t)) \,\|\, P(X(t)) P(Y(t))\right]$ is treated as the mutual information $I(X(t); Y(t))$ between systems X and Y (e.g., see [28]). Considering the non-negative entropy production $\Delta_i S(t) \geq 0$, one can derive the following inequality
$$W_{irr} \geq kT \left\{ I(X(t); Y(t)) - D\left[P(X(0), Y(0)) \,\|\, \hat{P}(X(0), Y(0))\right] \right\}. \quad (3)$$
In the meanwhile, one can obtain a special case if the EI Condition is satisfied as well:
$$W_{irr} \geq kT\, I(X(t); Y(t)). \quad (4)$$
Inequality (4) is the usual form of the thermodynamics cost of the measurement process [33,36,69]. Although the UCI Condition is not necessarily met in the above derivation, the energy definition of (X, Y) can be calculated in a simple form under this condition [28]. In [33], this condition is set for the measurement process as well. In summary, the definition of measurement essentially requires the above four conditions to be met (see Fig. 2).
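The identification of mutual information with a relative entropy between the joint distribution and the product of its marginals can be sketched directly. The snippet below is a minimal illustration, assuming a joint table `joint[x][y]` over discrete states and natural logarithms; the helper `min_irreversible_work` is a hypothetical name for the measurement-type lower bound $kT\, I$ that applies when the equilibrium-initialization term vanishes.

```python
import math


def mutual_information(joint):
    """I(X;Y) in nats from joint[x][y], computed as D(P(X,Y) || P(X)P(Y))."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return sum(
        pxy * math.log(pxy / (px[i] * py[j]))
        for i, row in enumerate(joint)
        for j, pxy in enumerate(row)
        if pxy > 0
    )


def min_irreversible_work(joint_final, kT):
    """Lower bound kT * I(X(t);Y(t)); assumes the initial relative-entropy term is zero."""
    return kT * mutual_information(joint_final)


# Toy joint distribution at moment t (values are illustrative assumptions).
joint_t = [[0.4, 0.1],
           [0.1, 0.4]]
bound = min_irreversible_work(joint_t, kT=1.0)
```

For an independent pair the bound is zero, and for two perfectly correlated binary variables it equals $kT \ln 2$.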

Thermodynamics of encoding
Given the summary above, the connection between the second law in (1) and information has been revealed from the aspect of measurement. Next, we show how to generalize the concept of measurement to encoding.
For encoding, we only require the UCE Condition to be satisfied. Therefore, the measurement process studied in previous work [69,42,77,33,80,36,28] is a particular case of encoding. In Fig. 2, we summarize the difference between the measurement process and encoding.
Given (6), we can formalize the thermodynamics of encoding in the form of (3). Under the UCE Condition, the irreversible work $W_{irr}$ that can be extracted or dispensed from system (X, Y) satisfies the bound in (7)(8), where the equality is satisfied only when P X (t) P Y (t) = P X (t) P Y (t) . Moreover, we prove that the UCE Condition is necessary for deriving (6) in appendix B. Therefore, the thermodynamics of encoding described in (7)(8) cannot be independent of the UCE Condition.
Before we move on, let us revisit the UCE Condition. A possible question concerns the physical ground underlying its definition: $\hat{P}(X(t), Y(t)) = P(X(t)) P(Y(t))$, where $\hat{P}$ denotes the equilibrium distribution. Here we emphasize that any thermodynamics variation costs time. Although the coupling between X and Y is turned off at moment $t$, one cannot expect $P(X(t), Y(t))$ to jump to the independent state $P(X(t)) P(Y(t))$ without time delay. Therefore, the approach of $(X(t), Y(t))$ to independence can only be treated as an internal tendency at moment $t$.
3 Information thermodynamics during encoding

Problem setup
Different from the classic ideas, our research concentrates on a new aspect of encoding. We notice that X consists of multiple sub-systems, and every sub-system is involved in encoding as well. An exploration of the relative role of each sub-system (or individual element) in information encoding (system coupling) helps reveal the intrinsic order of the system [83]. The significance of such an analysis has been discussed in information theory [101], statistical physics [83], and neuroscience [68,1,84]. Combining this new aspect with the classic ideas, an important question is whether (8) can be generalized to these sub-systems. Specifically, we wonder if the encoded information of Y in an arbitrary sub-system of X is bounded by the irreversible work $W_{irr}$ from the joint system (X, Y) following (8) as well.
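Consider any sub-system $X'(t)$ of $X(t)$ and its complement $\bar{X}'(t)$. Then, we can rewrite the whole system as a tuple $X(t) = \left(X'(t), \bar{X}'(t), R\left(X'(t), \bar{X}'(t)\right)\right)$, where $R\left(X'(t), \bar{X}'(t)\right)$ contains the coupling relations between the two sub-systems. Depending on the nature of $R\left(X'(t), \bar{X}'(t)\right)$, we can distinguish between two cases of system X where $R\left(X'(t), \bar{X}'(t)\right) \equiv \emptyset$ (one can see case 3 in Fig. 4) or $R\left(X'(t), \bar{X}'(t)\right) \neq \emptyset$ (cases 1, 2, and 4 in Fig. 4). These definitions allow reformulating (1) under the UCE Condition as
$$W_{irr} = T \Delta_i S + kT \left\{ I\left(X'(t), \bar{X}'(t); Y(t)\right) - D\left[P(X(0), Y(0)) \,\|\, \hat{P}(X(0), Y(0))\right] \right\}, \quad (9)$$
where $\hat{P}$ denotes the equilibrium distribution. Given (9), a natural thought for solving our question on the generalization of (8) is to compare $I\left(X'(t); Y(t)\right)$ with $I\left(X'(t), \bar{X}'(t); Y(t)\right)$. Below, we demonstrate that the solution to this question can be developed based on fundamental information theory tools and further leads to an interesting view regarding non-isolated systems.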

Encoding without intra-system coupling
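We start the comparison in a simple case where $R\left(X'(t), \bar{X}'(t)\right) \equiv \emptyset$, namely that X can always be divided into two uncorrelated sub-systems, $X'(t)$ and its complement $\bar{X}'(t)$ (see the left panel of Fig. 6). In this case, we know that $I\left(X'(t); \bar{X}'(t)\right) \equiv 0$ is ensured by the independence between $X'(t)$ and $\bar{X}'(t)$. Moreover, we can find the item $I\left(X'(t), \bar{X}'(t); Y(t)\right)$ in the 3-order multiple mutual information [64,83]
$$I\left(X'(t); \bar{X}'(t); Y(t)\right) = I\left(X'(t); Y(t)\right) + I\left(\bar{X}'(t); Y(t)\right) - I\left(X'(t), \bar{X}'(t); Y(t)\right). \quad (10)$$
Such an equality helps connect $I\left(X'(t); Y(t)\right)$ and $I\left(X'(t), \bar{X}'(t); Y(t)\right)$ (see Fig. 5). We need to remind that equality (4) (or an equivalent formulation of it proposed by [98]) cannot be calculated directly with our general settings. An estimation for it is necessary. By applying Yeung's inequality [103] on (10), we can derive
$$I\left(X'(t); \bar{X}'(t); Y(t)\right) \leq \min\left\{ I\left(X'(t); \bar{X}'(t)\right), I\left(X'(t); Y(t)\right), I\left(\bar{X}'(t); Y(t)\right) \right\} = 0. \quad (11)$$
Based on (11), we know that neither $I\left(X'(t); Y(t)\right)$ nor $I\left(\bar{X}'(t); Y(t)\right)$ exceeds $I\left(X'(t), \bar{X}'(t); Y(t)\right)$ (see Fig. 5 and the left panel of Fig. 6). Therefore, one can immediately obtain the corresponding bound (12), proving that (8) holds for sub-systems in this case.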

Figure 5: A schematic diagram of the 3-order mutual information defined in (10) and our idea underlying inequality (11), suggesting the necessity to verify if the 3-order mutual information can be positive.
Furthermore, one can obtain more knowledge about the above discovery after applying the theory of information synergy and redundancy [64,83]. Let $X'(t)$ be a sub-system of X and $\bar{X}'(t)$ its complement. There is synergy between $X'(t)$ and $\bar{X}'(t)$ if (10) is negative, namely that $X'(t)$ and $\bar{X}'(t)$ taken together (denoted by $X'(t), \bar{X}'(t)$ in (10)) can encode more information of $Y(t)$ than when they are taken separately. If (10) is positive, then the pair $X'(t), \bar{X}'(t)$ is redundant in the information encoding about $Y(t)$. Based on (11), one can find that there is no information redundancy in the system of independent elements. Moreover, one can easily prove that the equality in (11) can be satisfied if and only if $I\left(X'(t); \bar{X}'(t) \mid Y(t)\right) = 0$ [22] and $I\left(X'(t); \bar{X}'(t)\right) = 0$. Applying the definition of conditional mutual information [22], it can be known that
$$P\left(X'(t), \bar{X}'(t) \mid Y(t)\right) = P\left(X'(t) \mid Y(t)\right) P\left(\bar{X}'(t) \mid Y(t)\right). \quad (13)$$
There exists information synergy in system X unless (13) is satisfied. Considering that (13) is a strong condition, one can generally treat it as a rare case for a system of independent elements to have no information synergy (see the top panel of Fig. 7a).
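The sign convention for synergy and redundancy can be illustrated with a small computation of the 3-order mutual information, written here as the sum of the two sub-system informations minus the joint information. The sketch assumes a joint distribution stored as a dictionary over triples (first sub-system, second sub-system, source); the two example distributions are textbook toy cases and are not taken from the paper.

```python
import math
from collections import defaultdict
from itertools import product


def entropy(dist):
    """Shannon entropy in bits of a dict mapping outcomes to probabilities."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def marginal(joint, axes):
    """Marginalize a joint dict over outcome tuples onto the given axes."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[a] for a in axes)] += p
    return out


def mutual_info(joint, a, b):
    """I(A;B) in bits, where a and b are tuples of axis indices."""
    return (entropy(marginal(joint, a)) + entropy(marginal(joint, b))
            - entropy(marginal(joint, a + b)))


def co_information(joint):
    """I(A;Y) + I(B;Y) - I(A,B;Y) with axes 0 = A, 1 = B, 2 = Y."""
    return (mutual_info(joint, (0,), (2,)) + mutual_info(joint, (1,), (2,))
            - mutual_info(joint, (0, 1), (2,)))


# Independent sub-systems with Y = A XOR B: purely synergistic encoding.
xor_joint = {(a, b, a ^ b): 0.25 for a, b in product((0, 1), repeat=2)}
# Fully correlated sub-systems that both copy Y: redundant encoding.
copy_joint = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}

synergy_value = co_information(xor_joint)      # negative
redundancy_value = co_information(copy_joint)  # positive
```

The XOR case has independent sub-systems and a negative value, in line with the statement that independent elements show no redundancy; the copy case needs correlated sub-systems to reach a positive value.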

Encoding with intra-system coupling
We turn to the comparison in a more complex case where $R\left(X'(t), \bar{X}'(t)\right) \neq \emptyset$, namely that X is a system with non-negligible coupling between a sub-system $X'(t)$ and its complement $\bar{X}'(t)$ (the right panel of Fig. 6). In this case, we can see $I\left(X'(t); \bar{X}'(t)\right) \geq 0$. Following the same idea that has been shown in (10)(11)(12), we know that
$$I\left(X'(t); \bar{X}'(t); Y(t)\right) \leq \min\left\{ I\left(X'(t); \bar{X}'(t)\right), I\left(X'(t); Y(t)\right), I\left(\bar{X}'(t); Y(t)\right) \right\}$$
is ensured by Yeung's inequality [103]. Note that the right side of the inequality is either zero or positive in this case. Therefore, the upper bound in Yeung's inequality fails in determining the sign of the item $I\left(X'(t); \bar{X}'(t); Y(t)\right)$. Combining this finding with (9), one can prove that the bound in (8) does not always hold for sub-systems $X'$ and $\bar{X}'$. Although the encoded information of Y in system X is bounded by the irreversible work $W_{irr}$ from the joint system (X, Y) following (8), an arbitrary sub-system $X'$ (or $\bar{X}'$) might disobey this bound by encoding more information than X (see the right panel of Fig. 6). We should emphasize that this does not disobey the second law of thermodynamics. The encoded information in sub-system $X'$ is still bounded by the irreversible work $W'_{irr}$ from $(X', Y)$, following the sub-system counterpart of (8). Moreover, one can measure the maximum extent to which an arbitrary sub-system $X'$ can disobey the bound offered by the irreversible work $W_{irr}$. An elementary bound can be derived from Shannon entropy $S$ (the right panel of Fig. 6). Nevertheless, the non-trivial bound remains elusive and requires more exploration in the future.
One can see that the encoding by a system with non-negligible intra-system coupling might involve both information synergy and redundancy. The synergy case accords with our common sense of the relation between a system and its sub-systems. The redundancy case (where the bound offered by the irreversible work $W_{irr}$ is disobeyed by sub-systems) corresponds to the situation where an arbitrary sub-system encodes more information than the whole system. Such a case results from the disharmony between sub-systems in encoding (see the bottom panel of Fig. 7a).
To this point, we have proposed our answer to the question on the generalization of (8). We demonstrate that the irreversible work $W_{irr}$ from (X, Y) can bound the encoded information of Y in any sub-system of X if there is no intra-system coupling. Otherwise, this bound might be disobeyed by the sub-systems of X (see Fig. 7). We have discussed these findings from the aspect of information synergy and redundancy [64,83]. These discussions lead to an interesting view: a non-isolated system without intra-system coupling frequently involves information synergy, whereas a system with intra-system coupling can involve either information synergy or redundancy (see Fig. 7).
Rather than limiting ourselves to treating the redundancy as a specific information inefficiency, we suggest a new perspective that the redundancy means a possibility to hide information in a non-isolated system. By observing the irreversible work $W_{irr}$ from a joint system (X, Y), one might underestimate the information of Y encoded in X (underestimate the inter-system coupling strength) if X consists of correlated elements. In such a case, more information of Y can be hidden in the sub-systems of X (see an illustration in Fig. 7b).
Below, we implement our analysis of this perspective by applying our theoretical framework to concrete non-isolated systems. These systems come from statistical physics and neuroscience, helping demonstrate the ubiquity of the phenomenon discussed above.
4 Computational verification in non-isolated systems

Ising model with random external field
We begin our demonstration with the Ising model [20], one of the significant themes in physics. Specifically, we consider a 1-dimensional Ising model with random external field, whose Hamiltonian is determined by the spin specification σ, the coupling strength $J$, and the external field $h$. Here σ is a shorthand for a specification of the spin $\sigma_i$ on each site. The notation $J$ denotes the coupling strength between spins (the intra-system coupling). The external field $h \in (0, m] \cap \mathbb{Z}$ is defined as a discrete random variable (the inter-system coupling). To this point, a spin system σ and an external source $h$ have been characterized, functioning as the systems X and Y in our previous discussion, respectively. Here we define a random external field $h$ rather than a fixed one since it is more natural to consider a random variable $h$ in our subsequent encoding analysis. One can find other similar models in [3,104].
Given an $h$, we know that system σ always has a unique equilibrium Boltzmann distribution at a constant temperature $T$ [74,58]
$$\hat{P}(\sigma \mid h) = \frac{1}{Z(h)} \exp\left(-\frac{H(\sigma, h)}{kT}\right),$$
where $H(\sigma, h)$ is the Hamiltonian, $k$ is the Boltzmann constant, and $Z(h)$ denotes the normalization term (partition function)
$$Z(h) = \sum_{\sigma} \exp\left(-\frac{H(\sigma, h)}{kT}\right),$$
where the sum runs over all specifications of the $n$ spins and $n$ is the number of spins. Assuming (16) is the equilibrium distribution $\hat{P}(h)$ of $h$, we can calculate the equilibrium distribution of the joint system (σ, h)
$$\hat{P}(\sigma, h) = \hat{P}(\sigma \mid h)\, \hat{P}(h). \quad (19)$$
Figure 7: (a) The differences between the encoding processes in systems with/without intra-system coupling. (b) The illustration of hiding information in a non-isolated system with intra-system coupling.
Now, let us consider the thermodynamics of encoding in the joint system (σ, h) during $[0, t]$ (e.g., try to calculate (1)). We treat system σ as an information thermodynamics encoder to encode the information of $h$. Given the above review, we can calculate the equilibrium distribution $\hat{P}(\sigma(\tau), h(\tau))$ at any moment $\tau \in [0, t]$. When σ is coupled with an external field $h \in (0, m]$, the equilibrium distribution of (σ, h) can be worked out following (19). When the UCE Condition is met at moment $t$, one can see that $P(\sigma(t)) = P(\sigma(t) \mid 0)$, where $P(\sigma(t) \mid 0)$ measures the equilibrium distribution of σ independent from the external field $h$. As for the work $W$ performed on system (σ, h), it can be calculated under the framework of stochastic thermodynamics [86], following (20). For convenience, we implement the experiment in a special case where $h(t) = h(0)$ and the UCI Condition is satisfied.
Under this condition, one can derive $W_{irr} = W$. Taken together, most of (1) has been calculated to this point.
However, one can immediately find it non-trivial to define the time-dependent distribution $P(\sigma(\tau), h(\tau))$ analytically.
To overcome this challenge, we turn to the computational implementation of (1).
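For a small spin system, the equilibrium quantities described above can be obtained by exhaustive enumeration. The sketch below is not the paper's code: it assumes a nearest-neighbour Hamiltonian with periodic boundaries, $H = -J \sum_i \sigma_i \sigma_{i+1} - h \sum_i \sigma_i$, spins in $\{-1, +1\}$, units with $k = 1$, and a uniform distribution of $h$ over $\{1, \ldots, m\}$. All of these are illustrative assumptions, as are the function names.

```python
import math
from itertools import product


def energy(spins, J, h):
    """Assumed Hamiltonian: nearest-neighbour coupling on a ring plus a field term."""
    n = len(spins)
    coupling = sum(spins[i] * spins[(i + 1) % n] for i in range(n))
    return -J * coupling - h * sum(spins)


def boltzmann_given_field(n, J, h, kT):
    """Equilibrium distribution of the spins for a fixed field h, by enumeration."""
    states = list(product((-1, 1), repeat=n))
    weights = [math.exp(-energy(s, J, h) / kT) for s in states]
    z = sum(weights)  # partition function Z(h)
    return {s: w / z for s, w in zip(states, weights)}


def joint_equilibrium(n, J, kT, m):
    """Joint equilibrium of (spins, h), assuming h is uniform on {1, ..., m}."""
    joint = {}
    for h in range(1, m + 1):
        conditional = boltzmann_given_field(n, J, h, kT)
        for s, p in conditional.items():
            joint[(s, h)] = p / m
    return joint


joint = joint_equilibrium(n=4, J=1.0, kT=2.0, m=3)
```

Enumeration costs $2^n$ terms per field value, which is one reason a small system size makes exhaustive treatment feasible.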
Computationally, we generate $m \times k_J \times k_T \times k_\theta$ random sequences of the external field, where each random sequence $h(0), \ldots, h(t)$ corresponds to an experiment condition. The condition includes a possible variation process of the external field that begins and ends with a unique $h(0) = h(t) \in (0, m] \cap \mathbb{Z}$. This sequence features a certain transition rate $\theta^{-1}$ (there are $k_\theta$ transition rates that quantify the variation speed of the external field). Meanwhile, each condition corresponds to a coupling strength $J$ and temperature $T$ (there are $k_J$ and $k_T$ kinds of coupling strength and temperature quantities, respectively). Under each experiment condition, the simulation experiment is implemented as follows:
• We randomly initialize a specification $\sigma(0)$ at moment 0;
• We perform the continuous-time Monte Carlo simulation utilizing the Metropolis algorithm, $\sigma(0), \ldots, \sigma(t) = \mathrm{Metropolis}\left(\sigma(0), h(0), \ldots, h(t)\right)$ [9]. The simulation lasts for a duration $[0, t]$. During the simulation, we use the update of the external field $h$ to drive the Metropolis algorithm. The work is initialized as $W = 0$. Every time the spin specification changes from $\sigma(\tau)$ to $\sigma'(\tau + 1)$, we add the corresponding increment to $W$ following (20).
As for $P(\sigma(0), h(0))$, we assume it is uniformly distributed on the sample space $\Omega(\sigma) \times \Omega(h)$. Given this assumption, we can calculate the relative entropy term at moment 0 in (1) under the UCI Condition. Finally, we can directly obtain the entropy production $\Delta_i S(t)$ since the other parts of (1) have been given. We implement the computational experiment using the parameter settings in appendix C. The spin system σ is set with a small size, allowing us to do a relatively exhaustive sampling.
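The simulation loop can be sketched as follows. This is a simplified, discrete-step stand-in for the continuous-time procedure, under loudly stated assumptions: the Hamiltonian is the nearest-neighbour ring form $H = -J \sum_i \sigma_i \sigma_{i+1} - h \sum_i \sigma_i$, units have $k = 1$, and work is accumulated as the energy change caused by switching the field at fixed spins, which is one common stochastic-thermodynamics convention and may differ from the paper's (20). The parameter `sweeps_per_field` is a hypothetical stand-in for the relaxation time between field updates.

```python
import math
import random


def energy(spins, J, h):
    """Assumed Hamiltonian on a ring: -J * sum(s_i s_{i+1}) - h * sum(s_i)."""
    n = len(spins)
    return -J * sum(spins[i] * spins[(i + 1) % n] for i in range(n)) - h * sum(spins)


def flip_cost(spins, i, J, h):
    """Energy change caused by flipping spin i (valid for rings with n >= 3)."""
    n = len(spins)
    return 2 * spins[i] * (J * (spins[i - 1] + spins[(i + 1) % n]) + h)


def metropolis_with_field(spins, field_sequence, J, kT, sweeps_per_field, rng):
    """Drive single-spin-flip Metropolis dynamics with a sequence of field values."""
    spins = list(spins)
    work = 0.0
    h_prev = field_sequence[0]
    trajectory = []
    for h in field_sequence:
        # Assumed work convention: energy change from the field switch at fixed spins.
        work += energy(spins, J, h) - energy(spins, J, h_prev)
        h_prev = h
        for _ in range(sweeps_per_field * len(spins)):
            i = rng.randrange(len(spins))
            delta = flip_cost(spins, i, J, h)
            if delta <= 0 or rng.random() < math.exp(-delta / kT):
                spins[i] = -spins[i]
        trajectory.append(tuple(spins))
    return trajectory, work


rng = random.Random(0)
trajectory, work = metropolis_with_field(
    [1, -1, 1, -1], [1, 3, 2, 1], J=1.0, kT=2.0, sweeps_per_field=5, rng=rng
)
```

With a constant field sequence the accumulated work is zero under this convention, since only field switches contribute.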
Fig. 8a shows the variation trends of every term of (1) along with increasing $J$, $T$, or $\theta$. The corresponding mutual information calculated based on (5) is illustrated as a function of $J$, $T$, or $\theta$ in Fig. 8b. One can see that the mutual information (mean value) features a positive correlation with the inverse transition rate $\theta$ (logarithmic value). The correlation is strong, with a Pearson coefficient $R = 0.9429$, and significant at $p < 10^{-5}$, suggesting that the spin system σ encodes more information about the external field $h$ when it has more relaxation time between any two consecutive variations of the external field (larger $\theta$). Apart from that, more complex variation trends of the mutual information can be found along with the coupling strength $J$ and the temperature $T$.
Featuring intra-system coupling relations (the interactions between different spins), the spin system σ is expected to allow the encoded information by its sub-systems to exceed the encoded information by itself. To verify this speculation, we randomly select sub-systems $\sigma'$ (and the corresponding complements $\bar{\sigma}'$) 100 times under each experiment condition. As shown by the examples in Fig. 8c, one can see that the information quantity differences $I\left(\sigma'(t); h(t)\right) - I\left(\sigma(t); h(t)\right)$ and $I\left(\bar{\sigma}'(t); h(t)\right) - I\left(\sigma(t); h(t)\right)$ can be either non-positive (meaning that the bound (8) is followed by sub-systems) or positive (the bound (8) may be disobeyed by sub-systems). Moreover, one can find the corresponding positive 3-order mutual information $I\left(\sigma'(t); \bar{\sigma}'(t); h(t)\right)$ in Fig. 8d, which suggests the existence of information redundancy or synergy.
Figure 8 (partial caption): The probability distributions of the cases where $I\left(\sigma'(t); h(t)\right) \geq I\left(\sigma(t); h(t)\right)$, $I\left(\bar{\sigma}'(t); h(t)\right) \geq I\left(\sigma(t); h(t)\right)$, or $I\left(\sigma'(t); \bar{\sigma}'(t); h(t)\right) \geq 0$ are shown as functions of $J$, $T$, and $\theta$ (upper line). We also compare the probability variability of these cases in each parameter direction (middle line). Then, we analyze the probability distributions of these cases in the directions of $J$ and $T$ due to the high variability (bottom line).
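The sub-system sampling step can be sketched with a plug-in estimate of mutual information from paired samples. This is an assumption about how such quantities might be estimated, not the paper's stated estimator; the helper names are hypothetical. Plug-in estimates are biased for small samples and large state spaces, so comparisons between sub-system and whole-system values are best made on the same set of samples.

```python
import math
import random
from collections import Counter


def plugin_mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired samples xs[i], ys[i]."""
    n = len(xs)
    cxy = Counter(zip(xs, ys))
    cx = Counter(xs)
    cy = Counter(ys)
    return sum(
        (c / n) * math.log2(c * n / (cx[x] * cy[y]))
        for (x, y), c in cxy.items()
    )


def random_subsystem(n_spins, rng):
    """Pick a non-empty proper subset of site indices and its complement."""
    size = rng.randrange(1, n_spins)
    chosen = sorted(rng.sample(range(n_spins), size))
    rest = [i for i in range(n_spins) if i not in chosen]
    return chosen, rest


def restrict(samples, sites):
    """Project each sampled spin specification onto the given sites."""
    return [tuple(s[i] for i in sites) for s in samples]


# Toy usage: spin samples paired with field samples (illustrative data only).
rng = random.Random(1)
spin_samples = [(1, 1, -1), (-1, -1, 1), (1, 1, 1), (-1, -1, -1)] * 25
field_samples = [1, 2, 1, 2] * 25
chosen, rest = random_subsystem(3, rng)
info_sub = plugin_mutual_information(restrict(spin_samples, chosen), field_samples)
info_all = plugin_mutual_information(spin_samples, field_samples)
```

Repeating the `random_subsystem` draw many times per condition gives the empirical frequency with which a sub-system estimate reaches or exceeds the whole-system estimate.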
Although the experiment condition $(J, T, \theta)$ has significant effects on $I\left(\sigma'(t); h(t)\right) - I\left(\sigma(t); h(t)\right)$, $I\left(\bar{\sigma}'(t); h(t)\right) - I\left(\sigma(t); h(t)\right)$, and $I\left(\sigma'(t); \bar{\sigma}'(t); h(t)\right)$ in Fig. 8c-8d (here $\bar{\sigma}'$ denotes the complement of sub-system $\sigma'$), these effects do not show clear trends if they are attributed to a single parameter among $J$, $T$, and $\theta$ in Fig. 8e. Therefore, we turn to search for underlying patterns from the probabilistic perspective and implement a unified analysis. In Fig. 8f, we investigate the probability distributions of $I\left(\sigma'(t); h(t)\right) \geq I\left(\sigma(t); h(t)\right)$, $I\left(\bar{\sigma}'(t); h(t)\right) \geq I\left(\sigma(t); h(t)\right)$, and $I\left(\sigma'(t); \bar{\sigma}'(t); h(t)\right) \geq 0$ as functions of $(J, T, \theta)$. We then measure the mean variability of these probability distributions in each parameter direction. For example, the probability variability in the direction of $J$ at $(J, T, \theta)$ is defined as the variance $\mathrm{Var}_J\left[P(J, T, \theta)\right]$, and the mean variability is quantified as $\mathbb{E}_{T, \theta}\left\{\mathrm{Var}_J\left[P(J, T, \theta)\right]\right\}$. We confirm that the variability of these probability distributions can be better explained by $(J, T)$ than by $\theta$. This finding inspires us to concentrate on the effects of $(J, T)$ on the probability distributions. The bottom panel of Fig. 8f demonstrates that the probability densities of $I\left(\sigma'(t); h(t)\right) \geq I\left(\sigma(t); h(t)\right)$, $I\left(\bar{\sigma}'(t); h(t)\right) \geq I\left(\sigma(t); h(t)\right)$, and $I\left(\sigma'(t); \bar{\sigma}'(t); h(t)\right) \geq 0$ concentrate on the sub-region where $J \sim T$ (or equivalently, $\min\left\{J/T, T/J\right\} \sim 1$). Because the temperature $T$ is not an inherent characteristic of the spin system σ, our research focuses on the coupling strength $J$. As the temperature $T$ increases, the coupling strength $J$ where these probability densities concentrate will also increase. In other words, the cases where the encoded information of $h$ by a sub-system $\sigma'$ (or $\bar{\sigma}'$) exceeds $I(\sigma; h)$ and the cases with information redundancy frequently emerge when the coupling strength $J$ (the intra-system coupling) is roughly proportional to the temperature $T$.
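The mean variability along one parameter direction can be computed in a few lines. The sketch below assumes the probabilities are stored as a nested list `prob[j][t][q]` indexed by the grid positions of $J$, $T$, and $\theta$, and uses the population variance; whether the paper uses the population or the sample variance is not stated, so that choice is an assumption.

```python
from statistics import mean, pvariance


def mean_variability_along_J(prob):
    """E_{T,theta}{ Var_J[ P(J, T, theta) ] } for prob[j][t][q] on a full grid."""
    n_j = len(prob)
    n_t = len(prob[0])
    n_q = len(prob[0][0])
    variances = [
        pvariance([prob[j][t][q] for j in range(n_j)])  # variance across J
        for t in range(n_t)
        for q in range(n_q)
    ]
    return mean(variances)  # average over all (T, theta) grid points


# Toy 2 x 2 x 2 grid (illustrative numbers only): the probability varies
# with J at the first temperature and is constant in J at the second.
prob = [
    [[0.1, 0.1], [0.3, 0.3]],
    [[0.5, 0.5], [0.3, 0.3]],
]
variability_J = mean_variability_along_J(prob)
```

The same pattern applies to the other directions by swapping which index is varied inside `pvariance` and which indices are averaged over.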
Suppose that an observer seeks to estimate the maximum encoded information of $h$ in $\sigma$. Given our findings above, the encoded information might be underestimated if this observer implements the observation outside the system $(\sigma, h)$ (e.g., the observer estimates the encoded information utilizing the irreversible work $W_{\mathrm{irr}}$ from system $(\sigma, h)$ and (8)). When the encoded information of $h$ by a sub-system $\sigma'$ (or its complement $\overline{\sigma}'$) exceeds $I(\sigma; h)$, a certain amount of encoded information (measured as $I(\sigma'; h) - I(\sigma; h)$ or $I(\overline{\sigma}'; h) - I(\sigma; h)$) is hidden from the observer.

Real data of the human brain during the perception process
Perhaps the brain is the most natural example of an information thermodynamics encoder with intra-system coupling. We choose the brain as our second demonstration since neuroscience studies can offer vast amounts of data for $\mathsf{X}$ (neural data) and a clear characterization for $\mathsf{Y}$ (stimulus data). Nevertheless, we need to note that the temperature $T$ and the irreversible work $W_{\mathrm{irr}}$ are not well-defined concepts for the brain and cannot be calculated directly. Therefore, the analysis can only be performed indirectly. Specifically, we implement our analysis on an open-source functional magnetic resonance imaging (fMRI) data set obtained from a visual object recognition experiment, where the random stimulus sequence consists of 8 kinds of objects [38]. The fMRI technique measures neural activities through the blood-oxygen-level-dependent (BOLD) contrast in the magnetic field [40]. This data set includes the fMRI signals of 6 subjects from a 3 Tesla scanner, covering a high-resolution whole-brain region of 163840 voxels (small neural clusters) for each subject. Each voxel corresponds to a time series of the activities of a neural cluster. In the experiment, 12 pairs of (neural data, stimulus data) are obtained from each subject, and we arrange these pairs into a single matrix. Thus, every subject corresponds to a joint system $(\mathsf{X}, \mathsf{Y})$. The validity of the data has been verified previously [37,35,67], ensuring the reproducibility of our analysis. Please see Appendix D for more details.
To measure information quantities in the real data set, we estimate the mutual information $I$ following the computational approach proposed by Gao et al. [30], which is expressed through the digamma function $\psi(\cdot)$. The estimation is based on the k-nearest neighbor (KNN) method, where each $\tau \in [0, t]$ corresponds to a sample (here $t$ is the length of the data). Let $\epsilon(\tau)$ be the distance from $(\mathsf{X}(\tau), \mathsf{Y}(\tau))$ to its $k$-th nearest neighbor for a given parameter $k$ (in our research, we set $k = 5$). In the sub-space of $\mathsf{X}$, one counts the number $C_{\mathsf{X}}(\tau)$ of samples whose distance from $\mathsf{X}(\tau)$ is no more than $\epsilon(\tau)$. Similarly, one measures $C_{\mathsf{Y}}(\tau)$ in the sub-space of $\mathsf{Y}$. In [30], the distance is defined utilizing the maximum norm. Unlike the classic estimator designed with the 3H-principle $I(\mathsf{X}, \mathsf{Y}) = H(\mathsf{X}) + H(\mathsf{Y}) - H(\mathsf{X}, \mathsf{Y})$ (e.g., see [54]), this estimator is more effective in controlling systematic errors and in dealing with mixture spaces [30].
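A minimal sketch of a KNN estimator of this kind is given below. It follows our reading of the algorithm in [30]: each sample contributes $\psi(\tilde{k}) + \log t - \log(C_{\mathsf{X}}(\tau) + 1) - \log(C_{\mathsf{Y}}(\tau) + 1)$, where $\tilde{k} = k$ unless the $k$-th neighbor distance is zero, in which case $\tilde{k}$ is the number of samples at zero distance. The counting conventions (the sample itself is excluded from all counts), the brute-force neighbor search, and all function names are assumptions made for illustration, so this should not be read as the exact implementation used in our analysis.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def max_norm(a, b):
    """Maximum-norm distance between two equal-length tuples."""
    return max(abs(u - v) for u, v in zip(a, b))


def knn_mutual_information(xs, ys, k=5):
    """Estimate I(X;Y) in nats from paired samples (lists of tuples)."""
    n = len(xs)
    total = 0.0
    for i in range(n):
        # Distances from sample i to every other sample (assumption: self excluded).
        dx = [max_norm(xs[i], xs[j]) for j in range(n) if j != i]
        dy = [max_norm(ys[i], ys[j]) for j in range(n) if j != i]
        dxy = [max(a, b) for a, b in zip(dx, dy)]
        eps = sorted(dxy)[k - 1]  # distance to the k-th nearest neighbor
        # With ties at zero distance (discrete data), use the tie count instead of k.
        k_eff = sum(1 for d in dxy if d == 0.0) if eps == 0.0 else k
        c_x = sum(1 for d in dx if d <= eps)
        c_y = sum(1 for d in dy if d <= eps)
        total += (digamma_int(k_eff) + math.log(n)
                  - math.log(c_x + 1) - math.log(c_y + 1))
    return total / n


if __name__ == "__main__":
    rng = random.Random(0)
    x = [(rng.gauss(0, 1),) for _ in range(150)]
    y_dependent = [(v[0] + 0.05 * rng.gauss(0, 1),) for v in x]
    y_independent = [(rng.gauss(0, 1),) for _ in range(150)]
    print("dependent:  ", knn_mutual_information(x, y_dependent))
    print("independent:", knn_mutual_information(x, y_independent))
```

The brute-force search costs $O(t^2)$ distance evaluations, so a tree-based neighbor search would be preferable for data of realistic size.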
It is computationally costly to calculate information quantities on the whole brain because of the large number of voxels. Therefore, we implement our analysis utilizing a reverse sequence. Specifically, we randomly initialize a system of $z$ voxels in each brain ($z \in [400, 600]$). Then, we randomly add $r$ voxels into this system in each iteration ($r \in [50, 150]$, and every sequence lasts for 100 iterations). We repeat this random reverse sequence generation 10 times in each subject. Given these settings, the previous system and the newly added voxel set are complementary sub-systems of the current system (they can be referred to as $\mathsf{X}'$ and $\overline{\mathsf{X}}'$). Fig. 9a illustrates the variation trends of the estimated mutual information in these random sequences, suggesting that the system in the previous iteration (which includes fewer voxels) might encode more stimulus information ($\mathsf{Y}$) than the current system (which includes more voxels). The corresponding differences between the encoded information quantities in previous and current systems are shown in Fig. 9b. In Fig. 9c, one can see that $I(\mathsf{X}'(t); \overline{\mathsf{X}}'(t); \mathsf{Y}(t)) > 0$ always holds across all iterations, revealing the existence of information redundancy. Here we need to emphasize that the iteration is implemented with randomization and therefore frequently adds disharmonious voxels into the system to create redundancy. This result does not deny the possibility of information synergy.
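The random sequence generation described above can be sketched as follows. The sketch assumes that voxels are identified by integer indices and that new voxels are drawn without replacement from the voxels not yet in the system. These details and the function name are assumptions made for illustration.

```python
import random


def random_growth_sequence(n_voxels, rng, z_range=(400, 600),
                           r_range=(50, 150), n_iter=100):
    """Return a list of (previous_system, newly_added) voxel-index pairs.

    The system starts with z random voxels and gains r random new voxels in
    each iteration. n_voxels should be at least
    z_range[1] + n_iter * r_range[1] so that the pool never runs out.
    """
    order = list(range(n_voxels))
    rng.shuffle(order)  # a random order gives sampling without replacement
    z = rng.randint(*z_range)
    current, cursor = order[:z], z
    steps = []
    for _ in range(n_iter):
        r = rng.randint(*r_range)
        added = order[cursor:cursor + r]
        cursor += r
        # previous system and added set: complementary parts of the new system
        steps.append((list(current), added))
        current = current + added
    return steps


if __name__ == "__main__":
    sequence = random_growth_sequence(163840, random.Random(0))
    previous, added = sequence[-1]
    print(len(sequence), len(previous), len(added))
```

Each pair in the returned list can then be passed to a mutual information estimator to compare the previous system with the enlarged one.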
To relate the above findings to the internal orders of the human brain, we quantify the mean Pearson correlation coefficient between the voxels that previously exist ($\mathsf{X}'$) and the newly added voxels ($\overline{\mathsf{X}}'$). The correlation coefficients between voxels in functional states, reflecting the relations between neural clusters during neural information processing, usually vary across different cognitive processes and individuals. Moreover, the neural correlation usually concentrates on specific sub-intervals rather than pervading the whole interval of $[-1, 1]$. Therefore, we further normalize the mean Pearson correlation to ensure the generality of our analysis (here, we normalize the coefficients based on their absolute values and maintain their signs). In Fig. 9d, we visualize the relations between the estimated information quantities and the normalized mean correlation coefficients (NMC). When NMC approaches 0, it can be seen that the cases where $I(\mathsf{X}'(t); \mathsf{Y}(t)) \leq I(\mathsf{X}(t), \mathsf{Y}(t))$ frequently occur and the 3-order mutual information drops sharply. This phenomenon inspires us to investigate the variation trend of $I(\mathsf{X}'(t); \mathsf{Y}(t)) - I(\mathsf{X}(t), \mathsf{Y}(t))$ as a function of the absolute value of NMC. We implement binning on $|\mathrm{NMC}|$ following the Freedman-Diaconis approach [29], which maintains relative robustness on non-smooth data. Then, we exclude the bins that contain no more than 5 samples to obtain the filtered data (which covers 99.48% of the raw data and excludes 0.52% of the raw data as outliers). Our results suggest that, for the random variable $I(\mathsf{X}'(t); \mathsf{Y}(t)) - I(\mathsf{X}(t), \mathsf{Y}(t))$, the probability $P(\cdot \geq \rho)$ follows $\alpha|\mathrm{NMC}| + \beta$ (here $\alpha \in [0.2507, 0.3470]$ and $\beta \in [0.3019, 0.3989]$) in Fig. 9f. The fitting of these probability distributions obtains reasonable accuracy (please see Fig. 9e-9f). In Fig. 9f, we illustrate a representative example of the fitted probability distributions $P(\cdot \leq \nu)$ and $P(\cdot \geq \rho)$. One can see that the probability $P(\cdot \leq \nu)$ ($\nu < 0$) significantly decreases with increasing intra-system coupling strength (quantified by $|\mathrm{NMC}|$), while the probability $P(\cdot \geq \rho)$ ($\rho \geq 0$) shows the opposite pattern. This finding is consistent with our theory that a system with stronger intra-system coupling is more likely to allow specific sub-systems to encode more information than the system itself.
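The binning and linear fitting steps can be sketched as follows. The sketch assumes the common form of the Freedman-Diaconis rule (bin width $2\,\mathrm{IQR}/n^{1/3}$), an inclusive quartile convention, and an ordinary least-squares fit. These choices and all function names are assumptions made for illustration, and the toy data are synthetic.

```python
import math
from statistics import quantiles


def fd_bin_width(values):
    """Freedman-Diaconis bin width: 2 * IQR / n^(1/3)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return 2.0 * (q3 - q1) / len(values) ** (1.0 / 3.0)


def binned_exceedance(abs_nmc, diffs, rho=0.0, min_count=5):
    """For each |NMC| bin holding more than min_count samples, return
    (bin centre, fraction of samples whose difference is >= rho)."""
    lo, hi = min(abs_nmc), max(abs_nmc)
    width = fd_bin_width(abs_nmc)  # assumes a non-zero interquartile range
    n_bins = max(1, math.ceil((hi - lo) / width))
    bins = [[] for _ in range(n_bins)]
    for x, d in zip(abs_nmc, diffs):
        idx = min(int((x - lo) / width), n_bins - 1)
        bins[idx].append(d)
    points = []
    for b, members in enumerate(bins):
        if len(members) > min_count:  # drop bins with no more than min_count samples
            centre = lo + (b + 0.5) * width
            points.append((centre, sum(d >= rho for d in members) / len(members)))
    return points


def fit_line(points):
    """Ordinary least squares for y = alpha * x + beta."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    alpha = sxy / sxx
    return alpha, mean_y - alpha * mean_x


if __name__ == "__main__":
    # Synthetic data: the difference grows with |NMC| (illustration only).
    abs_nmc = [i / 99 for i in range(100)]
    diffs = [x - 0.5 for x in abs_nmc]
    alpha, beta = fit_line(binned_exceedance(abs_nmc, diffs, rho=0.0))
    print(alpha, beta)
```

Repeating the fit for several values of the baseline $\rho$ would give a range of $(\alpha, \beta)$ values of the kind reported above.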
To this point, we have demonstrated our theoretical results in the brain, a non-isolated system of neurons that has complex intra-system coupling and is coupled with external stimuli. Although we cannot calculate (1) directly, the indirect analysis demonstrates that the encoded information in a sub-system of neural tissues is not necessarily bounded by the irreversible work of the whole system. In the brain, the possibility for a sub-system to encode more stimulus information than the whole system is modulated by the neural correlation (the intra-system coupling strength). If we consider the observer perspective discussed in Subsec. 4.1, a similar conclusion can be drawn that a specific amount of encoded information may be hidden in sub-systems.

Discussion
The current research seeks to explore the fundamental role of the order inside a non-isolated system in shaping the information thermodynamics coupling between this system and the environment. We begin with a thermodynamic perspective to characterize an arbitrary non-isolated system $\mathsf{X}$ as an information thermodynamics encoder when this system is coupled with an external source $\mathsf{Y}$. The information thermodynamics encoder encodes the information of the external source utilizing thermodynamics, providing a unified angle to analyze the inter-system coupling between $\mathsf{X}$ and $\mathsf{Y}$ in terms of information and thermodynamics. Rather than standing alone, our idea is rooted in extensive explorations of the physical nature of information (please see [69] for a systematic review). Furthermore, similarities and differences coexist between the proposed information thermodynamics encoder and information engines [24,63,62,60,17]. Although these two kinds of systems both bridge information and thermodynamics, the information thermodynamics encoder mainly focuses on the thermodynamic costs of perceiving external information rather than the potential of information to function as a specific thermodynamic fuel [11,10,91,5,41,100,65].
To formalize the encoding process with solid physics backgrounds, we try to find an appropriate way to derive the thermodynamics of encoding based on the nonequilibrium second law of thermodynamics. One can find a reverse derivation process from information quantities to the nonequilibrium second law of thermodynamics and fluctuation theorem [80,79]. Although our proposed encoding process is suggested as a possible generalization of the measurement process [69,42,77,33,80,36,28], we need to emphasize that there exists essential consistency between these two concepts. Given the thermodynamic definition of encoding, a fundamental law is suggested that the mutual information $I(\mathsf{X}, \mathsf{Y})$ (the encoded information of $\mathsf{Y}$ by system $\mathsf{X}$ in its entirety) is bounded by the irreversible work $W_{\mathrm{irr}}$ from system $(\mathsf{X}, \mathsf{Y})$ based on (8). This upper bound is implied intrinsically by the nonequilibrium second law of thermodynamics [28]. Built on the above foundations, we turn to verify whether this bound is followed by the encoded information of $\mathsf{Y}$ in an arbitrary sub-system of $\mathsf{X}$ as well. The necessity of studying this problem lies in that it provides an opportunity to glance at the effects of internal orders inside $\mathsf{X}$ on the information thermodynamics link between $\mathsf{X}$ and $\mathsf{Y}$. Despite the substantial progress in exploring the physics of information [69], the critical roles of these internal orders remain elusive. Exploring this problem may help understand information thermodynamics in complex systems, where diverse information thermodynamics characteristics might emerge on multiple scales due to heterogeneous elements and intricate internal correlations. As a starting point, the present study is limited to a qualitative level. Combining the 3-order multiple mutual information and the theory of information synergy and redundancy [64,83], the proposed thermodynamics of encoding suggests an intrinsic difference between the non-isolated system with internal correlations and the non-isolated system of independent elements when they act as information thermodynamics encoders. Unlike those with independent elements, an encoder $\mathsf{X}$ with internal correlations allows the encoded information of $\mathsf{Y}$ in its sub-systems to exceed the information thermodynamics bound on the joint system $(\mathsf{X}, \mathsf{Y})$. More specifically, the encoded information of $\mathsf{Y}$ in an arbitrary sub-system $\mathsf{X}'$ is not necessarily bounded by the irreversible work $W_{\mathrm{irr}}$ from system $(\mathsf{X}, \mathsf{Y})$ following (8). This difference may originate from the nature of order inside the system $\mathsf{X}$. There is frequently information synergy in the non-isolated system of independent elements; in comparison, there can be either information synergy or information redundancy in the non-isolated system with intra-system coupling. These theoretical findings can be mathematically derived utilizing Yeung's inequality [103]. Furthermore, we have computationally verified them in an Ising model with a random external field and a real data set of the human brain during the perception process. Our analysis demonstrates that a stronger internal correlation inside these systems can create a higher possibility for their sub-systems to encode more information than the global one.
The essential idea that guides our research is to analyze the thermodynamic coupling relation between a non-isolated system and an external source from the information-theoretical perspective. Similar ideas pervade various physics disciplines and have become the foundation of information thermodynamics thanks to Szilard's [95] and Landauer's [57,56] inspiring works.
Recently, this idea has received increasing attention in neuroscience. Neuroscientists have identified the fundamental role of information thermodynamics in supporting brain functions [21,92,93,89,81,66]. Historically, the physics mechanisms underlying brain functions used to be elusive [23]. The brain is a system with complex topology [61], geometry [105], and dynamics [18], which creates numerous obstacles to studying how the brain processes information. The application of information thermodynamics in neuroscience may help overcome these obstacles, suggesting a potential direction to understand neural information processing at a physically fundamental level [21]. Based on previous studies, our research moves one step further to explore the information thermodynamics nature of the brain. As a non-isolated system of neurons, the brain is demonstrated to act as an information thermodynamics encoder with intra-system coupling. By deriving the thermodynamics of encoding based on the nonequilibrium second law of thermodynamics, our theory helps reveal the physics foundation of how the brain encodes the information of an external source. This framework might contribute to understanding the intrinsic relations between cognition and the physical brain [21,69]. Moreover, we combine information thermodynamics with the theory of information synergy and redundancy to implement a unified analysis. We demonstrate that the intra-system coupling within a system allows the encoded information in specific sub-systems to exceed the information thermodynamics bound on the whole system. In other words, more information of the external source can be encoded (or hidden) in certain sub-systems in comparison with the whole system. The emergence probability of this phenomenon is positively affected by the internal correlation strength inside the brain. We suggest that this finding may provide insights into the synergy and labor division among neurons and cortices during neural information processing. Classically, the decomposition of the information encoded by a system of elements is implemented utilizing information synergy and redundancy. The possible significance of synergy and redundancy in the neural system has been discussed for decades, discovering various effects of redundant or synergistic collective dynamics on neural information processing [83,31,13,68]. Our research further connects these previous theoretical findings with the thermodynamics of encoding, revealing the thermodynamic foundation of information synergy and redundancy during the encoding process. This finding may offer a possible explanation for the stimulus-dependent superiority of specific neural or cortical tissues in encoding efficiency compared with the whole brain. For instance, the superiority of the visual cortex during visual perception [34] not only means the relative redundancy of other sensory cortices but also corresponds to the larger encoded information quantity that is not restricted by the information thermodynamics bound of the whole brain. In sum, the theory depicted here features the potential to be further applied in neuroscience and related fields, which may help deepen our understanding of the physics foundation of neural information processing.
When we turn to a more general perspective about thermodynamics and information, one might find that limitations and possible insights coexist in our theory. Under the classical framework of information thermodynamics, our theory helps generalize the measurement process [69,42,77,33,80,36,28], one of the basic concepts that bridge information and thermodynamics, to the encoding process. By doing so, we can analyze the origin of mutual information from the nonequilibrium second law of thermodynamics in a more general aspect. Built on this framework, a new perspective is suggested to study the information thermodynamics connection that originates from the coupling between a non-isolated system and its environment. This perspective helps reveal how the intra-system attributes of a non-isolated system shape the information thermodynamics coupling between this system and the environment. The information thermodynamics difference discovered between the non-isolated system with internal correlations and the non-isolated system of independent elements is independent of detailed system definitions and free to be applied to any real system. However, we need to emphasize that our current work remains at a qualitative level. Although the 3-order multiple mutual information enables us to analyze the internal orders of a non-isolated system in terms of information synergy and redundancy [64,83], it fails to offer a more precise and quantitative characterization of these orders. This deficiency remains an open challenge for future studies. By overcoming this challenge, our suggested perspective may help explore more information thermodynamics characteristics determined by the nature of order inside the non-isolated system or the inter-system coupling relations. Last but not least, an idea that may interest both physicists and computer scientists relates to the characterized information thermodynamics encoder in our research, which functions as a kind of generalized computer or memory system [69]. Given the arbitrariness of non-isolated system selection for defining the encoding process, it is possible for any system with multiple distinguishable states to function as a generalized computer (see similar ideas in thermodynamic computing [43,69,90]).
To conclude, the information thermodynamics link that originates from system coupling is explored in our research. We pursue a thermodynamics theory of encoding and propose a novel idea to analyze an arbitrary non-isolated system as an information thermodynamics encoder. This perspective indicates the critical role of the nature of order inside the non-isolated system in shaping information thermodynamics coupling. The presented theoretical findings might lay a cornerstone for the future exploration of various related topics.

Acknowledgements
The authors are grateful to Prof. Oren Raz for his inspiring suggestion on the research and valuable comments on the manuscript. Y.T. appreciates the support of the YutChun Program. Moreover, the authors appreciate Mr. Ben Hou and Mr. Tianyi Wang for their proofreading of the mathematics.
To conclude, we can ensure that the term in (24) is non-negative. This result further implies that $D[P(\mathsf{X}(t), \mathsf{Y}(t)) \,\|\, P(\mathsf{X}(t)) P(\mathsf{Y}(t))] \geq I(\mathsf{X}(t); \mathsf{Y}(t))$ (please see (8) in our main text).

B The UCE Condition and the mutual information
Assume that the thermodynamics of encoding defined in (7-8) is independent of the UCE Condition, then D P X (τ ) , Y (τ ) P X (τ ) , Y (τ ) is not necessarily equivalent to D P X (τ ) , Y (τ ) P X (τ ) P Y (τ ) .The derivations of D P X (t) , Y (t) P X (t) P Y (t) ≥ I X (t) ; Y (t) do not necessarily hold under this condition, remaining for further verification.
Following the idea in appendix B, we know that for each τ D P X (τ ) , Y (τ ) P X (τ ) , Y (τ ) =I X (τ ) ; Y (τ ) − Ω(X)×Ω(Y) P X (τ ) , Y (τ ) log P X (τ ) , Y (τ ) P X (τ ) P Y (τ ) , where the second term of (38) is equivalent to Here (41) can be derived utilizing the Jensen inequality [22]. Combining (38) and (42), what we can prove is rather than the inequality (8) in our theory. The inequality (43) is trivial since the non-negativity of the relative entropy is already known. To derive the stronger conclusion in (8), the UCE Condition should be included in our theory.

C Simulation setting for Ising model
The parameters used in our simulation for system $(\sigma, h)$ are as follows. Specifically, the coupling strength $J$, temperature $T$, and transition rate $\theta^{-1}$ conditions are set as listed in Table 1. Please note that the parameter $\theta$ measures the interval between two consecutive transitions of the external field. Its inverse $\theta^{-1}$ quantifies the transition speed.
In our experiment, we simulate the Ising model under each combination $(J, T, \theta^{-1})$ to obtain the results. Moreover, we can treat $J$, $T$, and $\theta^{-1}$ as three parameter directions. By averaging the experimental results over any two parameter directions, we can further analyze these results as functions of the remaining parameter direction.
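A parameter sweep of this kind can be sketched as follows. The sketch makes several assumptions that are not specified in this appendix: a one-dimensional ring of spins, Metropolis dynamics with $k_B = 1$, the energy $E = -J \sum_i \sigma_i \sigma_{i+1} - h \sum_i \sigma_i$, and an external field $h \in \{-1, +1\}$ that is redrawn at random every $\theta$ sweeps. The function names and the parameter values in the usage example are hypothetical placeholders rather than our actual settings.

```python
import itertools
import math
import random


def simulate_ising_ring(n_spins, J, T, theta, n_steps, rng):
    """Metropolis dynamics on a ring; returns a list of (spins, h) snapshots."""
    spins = [rng.choice((-1, 1)) for _ in range(n_spins)]
    h = rng.choice((-1, 1))
    history = []
    for step in range(n_steps):
        if step % theta == 0:
            h = rng.choice((-1, 1))  # field transition every theta sweeps
        for _ in range(n_spins):  # one sweep = n_spins single-spin proposals
            i = rng.randrange(n_spins)
            neighbours = spins[i - 1] + spins[(i + 1) % n_spins]
            # Energy change caused by flipping spin i.
            delta_e = 2 * spins[i] * (J * neighbours + h)
            if delta_e <= 0 or rng.random() < math.exp(-delta_e / T):
                spins[i] = -spins[i]
        history.append((tuple(spins), h))
    return history


def sweep(J_values, T_values, theta_values, n_spins=16, n_steps=50, seed=0):
    """Run one simulation for every combination (J, T, theta)."""
    results = {}
    for J, T, theta in itertools.product(J_values, T_values, theta_values):
        rng = random.Random(seed)
        results[(J, T, theta)] = simulate_ising_ring(n_spins, J, T, theta, n_steps, rng)
    return results


if __name__ == "__main__":
    # Placeholder grids, for illustration only.
    runs = sweep([0.5, 1.0], [1.0, 2.0], [2, 4])
    print(len(runs))
```

The snapshots returned for each combination can then be used to estimate information quantities, and the estimates can be averaged over any two parameter directions.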

D fMRI data acquisition and pre-processing
The fMRI data set used in the present study is acquired from [38], which has been used in several neuroscience studies [37,35,67]. Neural responses, reflected by the fMRI BOLD signals, are measured from 6 subjects (5 females and 1 male) with gradient echo echo-planar imaging on a GE 3T scanner (General Electric, Milwaukee, WI). Repetition time (TR) is 2500 ms, echo time (TE) is 30 ms, the flip angle is 90°, and the field of view (FOV) is 240 mm. In each repetition, 40 slices of 3.5-mm-thick sagittal images are obtained. T1-weighted spoiled gradient recall (SPGR) signals (124 slices of 1.2-mm-thick sagittal images) are obtained as the high-resolution information of detailed anatomy (please note that our analysis is implemented on the BOLD signals rather than the T1 signals). Stimuli are gray-scale images of 8 types of visual objects (faces, houses, cats, bottles, scissors, shoes, chairs, and nonsense patterns that are generated as the phase-scrambled images of intact objects). More details about the stimulus sequences in visual perception experiments can be seen in [38]. In our analysis, these stimuli are represented by the indices of their types.
Pre-processing of the BOLD signals is performed based on SPM12 (a MATLAB toolbox designed for brain imaging data analysis; see https://www.fil.ion.ucl.ac.uk/spm/ for details), including slice-timing correction and motion correction. These corrections are standard workflows in BOLD signal pre-processing and have no significant influence on the observed phenomena in Sec. 4.2. As for the T1 signals, we have not implemented further pre-processing on them since the anatomy information is not necessary for our study.
In general, our analysis in Sec. 4.2 does not rely critically on the data set and pre-processing techniques. One can implement the same analysis on any neural data set involving the perception of multiple stimuli.

Figure 2: The difference between measurement and encoding across time sections.

Figure 7: Summary of theoretical findings. (a) The differences between the encoding processes in systems with/without intra-system coupling. (b) The illustration of hiding information in a non-isolated system without intra-system coupling.

Figure 8: Information thermodynamics in the Ising model. (a) The calculated terms (mean value and probability distributions) of (1) are shown as functions of $J$, $T$, and $\theta$. While analyzing each parameter direction among $J$, $T$, and $\theta$, the other two directions are averaged. (b) The calculated mutual information quantities (raw data distributions, mean value, and probability distributions) are shown as functions of $J$, $T$, and $\theta$. (c-d) Two representative examples of the mutual information comparison and their corresponding 3-order mutual information. (e) The mutual information comparison result and the corresponding 3-order mutual information are analyzed as functions of $J$, $T$, and $\theta$. (f) The probability distributions of the cases where $I(\sigma'(t); h(t)) \geq I(\sigma(t); h(t))$, $I(\overline{\sigma}'(t); h(t)) \geq I(\sigma(t); h(t))$, or $I(\sigma'(t); \overline{\sigma}'(t); h(t)) \geq 0$ (with $\overline{\sigma}'$ denoting the complement of the sub-system $\sigma'$) are shown as functions of $J$, $T$, and $\theta$ (upper line). We also compare the probability variability of these cases in each parameter direction (middle line). Then, we analyze the probability distributions of these cases in the directions of $J$ and $T$ due to the high variability (bottom line).

Figure 9: Information thermodynamics in the human brain. (a-c) The estimated information quantities and their probability distributions. (d) The relations between information quantities and the normalized mean correlation (NMC) are shown, where the sizes and colors of data points scale according to NMC. (e) We analyze the information quantity $I(\mathsf{X}'(t); \mathsf{Y}(t)) - I(\mathsf{X}(t), \mathsf{Y}(t))$ as a function of the absolute value of NMC (left). By adjusting the baseline $\nu$, we can quantify the probability distribution of the case where $I(\mathsf{X}'(t); \mathsf{Y}(t)) - I(\mathsf{X}(t), \mathsf{Y}(t)) \leq \nu$ under different standards. Our results suggest that, irrespective of the selected baseline, the probability $P[\cdot \leq \nu]$ follows a power regression model of $|\mathrm{NMC}|$ with reasonable fitting accuracy (middle and right). (f) Correspondingly, we set different baselines $\rho$ to identify the case where $I(\mathsf{X}'(t); \mathsf{Y}(t)) - I(\mathsf{X}(t), \mathsf{Y}(t)) \geq \rho$. The probability $P[\cdot \geq \rho]$ and $|\mathrm{NMC}|$ feature a linear regression relation with reasonable fitting accuracy, relatively independent of the baseline settings (left and middle). Finally, we illustrate a representative instance of the probability distributions $P[\cdot \leq \nu]$ and $P[\cdot \geq \rho]$ where $(\nu, \rho) = (-0.0545, 0)$ (right).
This research is supported by the Artificial and General Intelligence Research Program of Guo Qiang Research Institute at Tsinghua University (2020GQG1017) as well as the Tsinghua University Initiative Scientific Research Program.

Table 1: Parameter settings in simulation