Compressibility of the protein-water interface

The compressibility of a protein relates to its stability, flexibility, and hydrophobic interactions, but the measurement, interpretation, and computation of this important thermodynamic parameter present technical and conceptual challenges. Here, we present a theoretical analysis of protein compressibility and apply it to molecular dynamics simulations of four globular proteins. Using additively weighted Voronoi tessellation, we decompose the solution compressibility into contributions from the protein and its hydration shells. We find that positively cross-correlated protein-water volume fluctuations account for more than half of the protein compressibility that governs the protein’s pressure response, while the self correlations correspond to small (∼0.7%) fluctuations of the protein volume. The self compressibility is nearly the same as for ice, whereas the total protein compressibility, including cross correlations, is∼45% of the bulk-water value. Taking the inhomogeneous solvent density into account, we decompose the experimentally accessible protein partial compressibility into intrinsic, hydration, and molecular exchange contributions and show how they can be computed with good statistical accuracy despite the dominant bulk-water contribution. The exchange contribution describes how the protein solution responds to an applied pressure by redistributing water molecules from lower to higher density; it is negligibly small for native proteins, but potentially important for non-native states. Because the hydration shell is an open system, the conventional closed-system compressibility definitions yield a pseudo-compressibility. We define an intrinsic shell compressibility, unaffected by occupation number fluctuations, and show that it approaches the bulk-water value exponentially with a decay “length” of one shell, less than the bulk-water compressibility correlation length. In the first hydration shell, the intrinsic compressibility is 25%–30% lower than in bulk water, whereas its self part is 15%–20% lower. These large reductions are caused mainly by the proximity to the more rigid protein and are not a consequence of the perturbed water structure.© 2018 Author(s). All article content, except where otherwise noted, is licensed under a Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). https://doi.org/10.1063/1.5026774

First, because volume fluctuations associated with compressibility are small, statistical accuracy is an issue, especially if the simulated protein solution is dilute. Second, because the protein-water interface is extensive, cross-correlated proteinwater volume fluctuations are important. As previously noted, 27,38,39 the decisive role of cross correlations has not always been appreciated. 8,30,35 This applies also to the interpretation of the experimentally derived protein partial compressibility. [13][14][15][16][17][18][19][20][21][22][23] Third, because geometry and interactions cannot be rigorously disentangled at the molecular level, 28 a protein-water dividing surface must be imposed and this choice affects the compressibility. Arguably, the most realistic dividing surface is obtained by the so-called additively weighted Voronoi tessellation, 40,41 but so far only primitive Voronoi tessellation has been used to compute component compressibilities. 27,28,[32][33][34] Fourth, because of the asymmetric interactions in the interfacial region, the water density is higher near the protein than further away. [42][43][44][45][46] This is a complication because the usual compressibility definitions are not valid for an inhomogeneous solvent. A continuously varying local compressibility can be defined and computed (in the grand canonical ensemble) for an inhomogeneous fluid bounded by a smooth planar wall, 47,48 but this approach is not readily extended to the structurally complex protein-water interface.
Computational studies of the inhomogeneous proteinwater interface usually adopt a coarse-grained description at the level of hydration (or water coordination) shells. The first and higher hydration shells can be defined in different ways, based on spatial or topological proximity, 46 but they provide a natural and physically intuitive distance metric. Several simulation studies have attempted to compute the hydration shell compressibility, 27,[32][33][34][35][36][37]49 but only for the first shell. Unlike the protein, the hydration shell is an open system, containing a fluctuating number of water molecules. 46 But the fluctuation formula that relates the compressibility to the mean-square volume fluctuation is only valid for a closed system. Consequently, when this formula is applied to a hydration shell, one obtains a pseudo-compressibility that differs greatly from the true (intrinsic) shell compressibility because it includes a (negative) contribution from molecular exchange between regions of different density. While this problem was recognized early on, 50 it has not, to our knowledge, been discussed or analyzed in subsequent work.
We refer to the shell compressibility as a pseudocompressibility because it is obtained from a theoretical expression-either the volume fluctuation formula or the usual thermodynamic definition of compressibility in terms of the pressure derivative of the volume-that is valid only for a closed system. In other words, the molecular exchange contribution to the pseudo-compressibility is an artifact of misapplied theory. But the sum of these artificial shell contributions is not artificial, provided that the system contains a fixed number of water molecules (as for simulations in the usual NpT, NVT, or NVE ensembles). This additional contribution to the total solvent compressibility arises because an inhomogeneous solvent can respond to an applied pressure by redistributing molecules from lower-density to higher-density regions.
The outline of this paper is as follows. In Sec. II, we describe the simulation methodology and data analysis. In Sec. III, we use a bulk-water simulation to determine a compressibility correlation length, to be compared with other compressibility length scales emerging from our analysis of protein solutions. In Sec. IV, we decompose the solution compressibility into its contributions from the two components: protein and water. The focus here is on the protein component compressibility and its self and cross parts. We also consider the experimentally accessible protein partial compressibility and how it is related to the component compressibilities. In Sec. V, we go one step further by subdividing the solvent into hydration shells in order to examine the spatial variation of the solvent compressibility. This analysis prompts us to introduce the pseudo-compressibility and the intrinsic compressibility associated with a given shell. We focus on the latter, analyzing its constituent self and cross parts and their dependence on the shell index (or the distance from the protein surface). Finally, in Sec. VI, we compare our results with the findings of previous computational studies. Because the results from the four proteins and three water models used here are very similar, some figures include data from only one of the simulated systems; the complete results from all six protein solutions can be found in the supplementary material.

A. MD simulations
Molecular dynamics simulations were performed on four globular proteins in dilute aqueous solution at 300.0 K and 1 bar (Table I): a triple mutant of the immunoglobulin-binding  domain B1 of protein G (GB1), 51 bovine pancreatic trypsin inhibitor (BPTI), 52 mammalian ubiquitin (UBQ) without the three C-terminal residues, 53 and isoform 2-14 of the β-helical antifreeze protein from Tenebrio molitor (AFP). 54 Ionizable groups were modeled in their predominant state at neutral pH. UBQ and AFP are then electroneutral, whereas GB1 and BPTI carry a net charge of −6 and +6, respectively. To avoid the complicating influence of counterions, GB1 and BPTI were neutralized by adding a small charge to each protein atom.
MD simulations were performed with the AMBER 16 package 55 using a 2 fs integration time step and periodic boundary conditions. The proteins were modeled with the Amber ff14SB force field 56 and were solvated in a truncated octahedral cell with the TIP4P-Ew water model. 57 As a control, we also performed simulations of GB1 with the threepoint water models SPCE 58 and TIP3P; 59 these systems are referred to as GB1SPCE and GB1TIP3P. A cutoff of 16 Å was used for short-range interactions and Coulomb interactions were treated with the particle mesh Ewald method. After ∼25 ns equilibration, a ∼100 ns NpT trajectory was propagated for each system (Table I), with coordinates saved every 100 fs. The C α root-mean-square deviation (RMSD) indicated no long-term drift or major conformational transition during the analyzed production runs. Further system and simulation details have been reported. 46

Hydration shells
The hydration (or coordination) shell provides a natural distance metric for a spatially resolved analysis of compressibility. 46 The first hydration shell comprises all water molecules with the oxygen atom within 5.0 Å of a carbon atom. This includes a few (3.3-6.4) more or less deeply buried water molecules. For the second and higher shells, a water molecule belongs to shell n if its oxygen atom is within 4.0 Å of at least one water oxygen in shell n − 1. These simple criteria produce a monolayered first hydration shell that covers 98.5% of the protein surface and higher shells with a high degree of coverage and a uniform thickness of ∼2.8 Å. 46

Volumes and compressibilities
The isothermal compressibility κ describes how the volume V of a closed system, with fixed particle numbers {N}, responds to an isothermal pressure change, The compressibility is also related to the isothermal volume fluctuation δV ≡ V − V of a closed system, 60 demonstrating that κ > 0, as required for mechanical stability. We used Eq. (2) to compute κ from the fluctuating system volume (or its parts) in an NpT molecular dynamics trajectory.
To decompose the compressibility of the protein solution into protein and water contributions and further into hydration shell contributions, we allocated space to (non-H) atoms by means of Voronoi tessellation. For water molecules in the second shell and beyond, and for bulk water, molecular volumes were obtained from the standard (primitive) Voronoi tessellation using MATLAB's delaunayTriangulation class. For protein atoms and water molecules at the interface, the volumes produced by Voronoi tessellation are not unique since they depend on where the protein-water dividing surface is placed. Arguably, the most physically realistic dividing surface is, at every point, equally distant from the vdW surfaces of the neighboring atoms. This "democratic" allocation of free space is accomplished by the so-called additively weighted (aw) Voronoi tessellation. 40 The volume of the protein and of first-shell water molecules was computed with aw tessellation using the algorithm Voronota 41 with atomic radii according to Shrake and Rupley. 61 The same atomic radii, and a 1.4 Å probe radius, were used to compute the solvent-accessible surface area of the protein with the program GETAREA. 62 For bulk water (and the higher hydration shells), aw tessellation should produce the same Voronoi volume as primitive tessellation, but we find that the Voronota algorithm underestimates the water Voronoi volume by ∼0.1% (Table A1 of the supplementary material). To correct for this systematic error, we multiplied all aw Voronoi volumes by a factor 1.001. All results presented in this paper were obtained using (corrected) aw tessellation for the protein and first hydration shell and primitive tessellation for the second and higher shells. For completeness, we also present, in the supplementary material, compressibility results obtained by using primitive tessellation throughout. As compared to aw tessellation, primitive tessellation transfers ∼6% of the protein volume to the first hydration shell (Table A1 of the supplementary material) mainly because aw tessellation allocates more space to (united) carbon atoms. 46 Furthermore, the protein and first-shell compressibilities (κ P and κ 1 ) are both ∼5% larger for primitive tessellation.

Statistics
For each of the six protein solutions (Table I), we computed the system volume V, the protein Voronoi volume V P , the occupation number N n , and shell-averaged water Voronoi volume v n for hydration shells n = 1-9. All Voronoi volumes were computed with both aw and primitive tessellation (Sec. II B 2). Mean values of these quantities and their combinations were computed from a sample of ∼20 000 trajectory frames, spaced 5 ps apart along the trajectory. The standard error of the various compressibilities was estimated by dividing the analyzed frames into ten blocks. The standard error was found to be independent of the number of blocks over a wide range, indicating that the blocks are uncorrelated. 63 The statistical correlation between variables X and Y was assessed via the usual correlation coefficient ρ(X, Y ) = δX δY /[σ(X) σ(Y )], with δX(t) = X(t) − X and σ(X) = (δX) 2 1/2 . The correlation coefficient was computed from the raw data, but the normalized fluctuation δX(t)/σ(X) in Fig. 3 was smoothed by applying a 1 ns moving-average filter.

Cluster analysis
For the bulk-water cluster analysis in Sec. III, 100 reference molecules were randomly selected among the 5000 molecules closest to the system centroid of each frame of the trajectory, again sampled every 5 ps. The statistical averages in Eq. (3) pertain to a "cluster ensemble," that is, a collection of (N + 1)-molecule clusters including the N molecules closest to a randomly selected reference molecule. The cluster ensemble is extracted from the simulated NpT ensemble, to which it becomes equivalent in the thermodynamic limit. But the two ensembles do not converge to this limit in the same way because the cluster ensemble is biased by the selection of the N nearest neighbors. As compared to the NpT ensemble, the cluster ensemble is too compact. As a result, the cluster-averaged molecular Voronoi volume v N and the cluster compressibility κ N are smaller than v bulk and κ bulk , respectively, and converge slowly to these asymptotic values. For example, even for N = 200, κ N is 8% below κ bulk . But our estimate of the correlation length ξ κ is hardly affected by this selection bias.
Alternatively, a correlation length might be defined with the aid of the compressibility equation, which links κ to the spatial integral of the total pair correlation function g(r) − 1 in the grand canonical ensemble. 64 A local compressibility can be defined via a finite upper integration limit, but it approaches the thermodynamic limit in an irregular oscillatory way, 65 making it difficult to extract a correlation length.
FIG. 1. Relative variation of the local compressibility κ N with cluster size, defined as the mean separation r N between the reference molecule and the outermost molecule in the compact cluster. Also shown are the constructions used to determine the correlation length ξ κ (blue dashed) and the pair correlation function g(r) for a cluster with N = 200 (green, right-hand axis).
The compressibility is a collective property, involving correlated volume fluctuations among many water molecules. To estimate the spatial range of these correlations, we consider a compact cluster consisting of a reference molecule and its N nearest neighbors. According to Eq. (2), the compressibility of such a cluster, embedded in the simulated bulk-water system, is where v N is the mean Voronoi volume of the N + 1 molecules in the cluster. Figure 1 shows how the local compressibility κ N approaches its asymptotic value κ bulk . A correlation length ξ κ that can be compared with the exponential decay lengths considered in Sec. V is defined as the distance where 1 − κ N /κ 200 has decayed to 1/e. The resulting correlation length, ξ κ = 4.7 Å, corresponds to a cluster of 16 water molecules, including most of the first two coordination shells ( Fig. 1). This value is comparable to the bulk-water densitydensity correlation length, typically estimated to be in the range 2-7 Å. 67-69

A. Component compressibility
The fluctuation formula in Eq. (2) allows us to compute the protein and water contributions to the compressibility of a protein solution. But this decomposition is not unique because it depends on where we choose to place the protein-water dividing surface. Arguably, the most realistic dividing surface is obtained by aw Voronoi tessellation, 40,41 as used here (Sec. II B 2). Once the dividing surface has been defined, we can partition the system volume into component volumes as (1) and (2), this implies that the solution compressibility can be decomposed as 32,33,50 where where we have identified the contributions from self-correlated and cross-correlated volume fluctuations. The water component compressibility κ W is obtained by interchanging P and W everywhere in Eq. (5). The protein component compressibility and its self and cross parts are presented in Table II and Fig. 2 for the six simulated protein solutions. As expected, the proteins are more rigid than the solvent; κ P being a factor 2-3 smaller than κ bulk .
The self compressibility of GB1, BPTI, and UBQ, κ self P = 0.11 ± 0.01 GPa −1 (mean ± standard deviation), is similar to the (isotropically averaged) isothermal compressibility of ice Ih at 273 K, 70 κ ice = 0.12 GPa −1 . For AFP, the self compressibility is 26% lower than for the other three proteins, consistent with the exceptionally compact structure of this protein, with eight disulfide bonds and a mass density of 1.40 g cm −3 (versus 1.25-1.30 g cm −3 for the other three proteins). Using the self compressibility κ self P in Table II and the mean protein volume V P in Table I, we find that the relative root-mean-square volume fluctuation (δV P ) 2 1/2 / V P is less than 1% (Table II).
Equation (5) tells us that the self compressibility κ self P determines protein volume fluctuations, whereas the component compressibility κ P , including the cross compressibility κ cross P , governs the protein's response to an applied pressure (and the protein's susceptibility to pressure denaturation) and contributes to the intrinsic part of the protein partial compressibility (Sec. IV B). This distinction has not always been recognized.
Protein-water cross correlations are large and positive: κ cross P accounts for 53% ± 4% of κ P (mean ± standard deviation for the four proteins in TIP4P-Ew water). These correlations have a finite range and only involve the first few hydration shells (Sec. V). For GB1 in SPCE or TIP3P water, the cross compressibility κ cross P is significantly smaller (41% of κ P ) than in TIP4P-Ew water.
For the highly dilute protein solutions simulated here, with φ P ≈ 0.01 (Table I), the protein contributes very little to the solution κ in Eq. (4). In fact, κ and κ W do not differ significantly from κ bulk , and κ cross W ≈ φ P κ cross P is comparable to the standard error in κ W .

B. Partial compressibility
The experimentally measured protein compressibility is not the component compressibility κ P , but the partial compressibility κ P . The compressibility is an intensive thermodynamic variable, so it cannot be decomposed in the same way as the extensive volume. Instead, 71 with the partial volume fractions given by is the mean system volume and N W is the number of water molecules per protein molecule (Table I). The protein partial compressibility is defined as 14,17 and κ W is defined analogously, with v W in place of V P in Eq. (7). Some authors 12,17,19,20,[22][23][24] deal not with the compressibility κ per se, but with the extensive quantity K ≡ V κ, which may be called volume-compressibility. 72 (The compressibility field presents the reader with a menagerie of terms, symbols, and units. 73 ) The partial volume-compressibility is defined, in analogy with the partial volume, as K P ≡ (∂K/∂N P ) N W ,T ,p and is related to the partial compressibility in Eq. (7) as K P = V P κ P .
In the "infinite dilution" limit, which applies to the solutions simulated here, the protein partial volume V P can be obtained as Combination of Eqs. (1), (7), and (8) yields 32 with φ 0 W ≡ N W v bulk / V . As seen from Eq. (9), the protein partial volume can be computed without specifying a protein-water dividing surface. But, in practice, κ P cannot be computed with acceptable accuracy in this way because, under the dilute conditions where Eq. (9) is valid, φ 0 W ≈ 1 and κ ≈ κ bulk .
The large statistical error in κ P resulting from Eq. (9) might be reduced somewhat by performing a much longer simulation or simply by simulating a more concentrated protein solution, while keeping N W large enough so that the cell includes some unperturbed water. In our simulations, N W is excessively large because we want to examine the variation of the compressibility, and other properties, 46,74,75 over many hydration shells (Sec. V). Alternatively, as advocated by Ploetz and Smith, 38 the statistical error can be reduced by performing simulations at several protein concentrations. Combination of Eqs. (8) and (9) yields V κ = V P κ P + N W v bulk κ bulk , so κ P and V P can be obtained from linear fits to V κ and V versus N W . 38 While formally exact, Eq. (9) does not answer the following three questions. (1) How is the partial compressibility κ P related to the component compressibility κ P ? (2) How can the experimentally accessible, but obscure, quantity κ P be decomposed into physically transparent contributions? (3) How can the dominant bulk-water contribution to κ be eliminated so that κ P be accurately computed (from simulation data at a single protein concentration)? To answer these questions, we decompose the system volume as where V P is the protein Voronoi volume and v n is the average Voronoi volume for the N n water molecules in shell n. By combining Eqs. (1), (5), (7), and (10), we can decompose the protein partial compressibility into three physically distinct terms, κ P = κ P,int + κ P,hyd + κ P,ex .
The intrinsic protein contribution is and the hydration contribution is where κ n is the intrinsic compressibility of hydration shell n (Sec. V) and the second equality follows from the excellent (Sec. V) approximation N n v n = N n v n . The exchange contribution κ P,ex arises from occupation number fluctuations in the presence of an inhomogeneous density. It is given explicitly by Eq. (20) in Sec. V and is further discussed below. Here, we simply note that, under the present conditions, κ P,ex is negligibly small: 0.009-0.012 GPa −1 (standard error 0.001) for the four proteins. Equations (11)-(13) represent an exact decomposition of the protein partial compressibility. Similar, but approximate, decompositions have been used previously, 15,27,32,50 based on two solvent regions (a hydration shell H and a bulkwater region) and the somewhat inconsistent approximation of neglecting the density difference between these regions, while allowing for different compressibilities. In this approximation, V P = V P , so κ P,int = κ P . Furthermore, κ P,ex = 0 and κ P, Table III presents our results for the protein partial volume, computed from Eq. (8), and the protein partial compressibility and its two dominant parts, computed from Eqs. (11)- (13). The partial volume V P is ∼10% smaller than the intrinsic volume V P (Table I), so κ P,int is ∼10% larger than κ P . We note that κ P,int includes a large contribution (more than half for our proteins) from cross-correlated protein-water volume fluctuations, which are usually ignored in the interpretation of experimental κ P data. 1,2,13,14,19,20,[22][23][24] The negative hydration contribution κ P,hyd shows considerable variation among the four proteins (Table III). Compared to κ P,int , it is of similar magnitude or larger, so that κ P is close to zero or slightly negative (Table III).
Experimentally, κ P is usually determined from measurements of ultrasound velocity. 11,12 This yields the isentropic ("adiabatic") compressibility, but the isothermal compressibility can be obtained by adding a small correction (0.03-0.04 GPa −1 ), 13 involving the thermal expansivity and the isobaric heat capacity. 71,73 In this way, κ P values in the rather wide range 0.03-0.15 GPa −1 have been obtained for two dozen globular proteins. 13,14,19 Apparently, experimental results are not available for proteins of the small size considered here. Surface-to-volume scaling arguments suggest that the negative hydration contribution should be more important for small proteins. 24 We therefore consider the κ P values in Table III to be consistent with the available experimental database.
By using Eqs. (8) and (9) to analyze simulations at multiple protein concentrations, Ploetz and Smith 38 obtained κ P = −0.25±0.32 GPa −1 for BPTI, or 0.05 ± 0.02 GPa −1 when κ was determined more accurately from simulations at multiple pressures. However, their simulations used the rigid-bond Kirkwood-Buff force field. 76 Comparing this force field with Amber ff99SB (similar to the one used here), they found, for another small protein, substantially more negative κ P values. 38 All considered, there is no obvious inconsistency between their results and our BPTI result. The decomposition of κ P is sometimes taken to include a third so-called relaxation term, 11,15,22,77,78 although this term is generally considered to be negligibly small for globular proteins at neutral pH. Experimental κ P values describe the response of the hydrated protein to pressure perturbations at the frequency, ω US ≈ 10 7 s −1 , of the applied ultrasound wave. 11,12 Processes that affect the protein volume, such as large-scale conformational transitions and protonation equilibria, contribute to the compressibility, but only if they occur on time scales shorter than 1/ω US . 11,15,22,77,78 Our simulations extend over 100 ns, so they sample approximately the same part of the fluctuation spectrum as ultrasound experiments. Shell occupation numbers fluctuate on a picosecond time scale, so the molecular exchange effect contributes fully to computed and experimental compressibilities alike. Although the exchange effect can be regarded as a relaxation process, it is associated with the inhomogeneous solvent density rather than with different protein structures, as previously considered. 11,15,22,77,78

V. HYDRATION-SHELL DECOMPOSITION
For two reasons, we expect the solvent compressibility to depend on the distance from the protein surface. First, because water-protein interactions differ from water-water interactions, the physical properties of the first hydration shell differ from bulk water and this perturbation propagates to some extent to higher shells. This applies to the compressibility as well as to more local properties, such as the packing density or inverse molecular Voronoi volume. 46 Second, because compressibility is a collective property reflecting coupled volume fluctuations among several molecules (Sec. III), it is also affected by the mere fact that the protein is more rigid than the water it displaces.
Our objective in this section is to resolve the inhomogeneous solvent compressibility at the level of individual hydration shells. But a decomposition of the solvent compressibility into shell compressibilities is not as straight-forward as the two-component decomposition discussed in Sec. IV. Unlike the protein and total solvent subsystems, the hydration shell, as usually defined, 46 is an open system, whereas the compressibility definitions in Eqs. (1) and (2) apply to closed systems.

A. Pseudo-compressibility
For each configuration, the system volume V can be rigorously decomposed into additive regional Voronoi volumes as in Eq. (10). We shall not attempt to decompose the first shell further on the basis of local chemical or structural surface properties because cross-correlated volume fluctuations limit the extent to which the compressibility can be meaningfully resolved.
With the system volume subdivided as in Eq. (10), the compressibility in Eq. (1) or (2) decomposes as where the first term is the same as in Eq. (4) and φ n = V n / V is the volume fraction of shell n. The quantity is the pseudo-compressibility of shell n, so called and so denoted to remind us that it is not a true compressibility since the shell is not a closed system (N n is not constant). The factorization V n = N n v n in Eq. (10) makes explicit the two physically distinct ways whereby the shell volume V n can fluctuate (or respond to applied pressure): via fluctuations δv n in the shell-averaged molecular volume (intrinsic compression) and via fluctuations δN n in the shell occupation number (molecular exchange). Fluctuations in the shell volume V n are almost entirely due to fluctuations in the occupation number N n . In other words: whereas V n is very strongly correlated with N n , it is only weakly correlated with v n ( Table IV, Table  A2 of the supplementary material). Consequently, N n and v n are only weakly correlated [ Fig. 3(a)], implying that, to an excellent approximation, In fact, for the six simulated protein solutions, the relative error incurred by these approximations is of order 10 −4 or less, which is negligible considering the relatively large standard error in the compressibilities. The approximations in Eq. (16) allow us to split the pseudo-compressibility κ n in two independent parts, associated exclusively with fluctuations in v n and N n , The intrinsic compressibility κ n only reflects fluctuations in v n and cross-correlated fluctuations in v n and v q (in two different shells) or in v n and V P , The exchange compressibility η n reflects fluctuations in the shell occupation number N n and cross-correlated fluctuations in N n and N q (in two different shells), where the prime indicates that we have used the identity δN B = − q δN q to eliminate the last term in the q sum, associated with the boundary region B between the highest explicitly considered hydration shell and the boundary of the simulation cell. The exchange compressibility clearly vanishes for a homogeneous solvent (v q = v B ) or for fixed occupation numbers (δN n = 0). Summing over all solvent regions, we obtain the exchange contribution to the solution compressibility, where we have also indicated the relation of the exchange contribution κ P,ex to the protein partial compressibility in Eq. (11). The shell occupation number N n fluctuates mainly as a result of fluctuations in the solvent-accessible surface area A S of the protein. Not only is N 1 strongly correlated with A S [Fig. 3(b), Table IV, and Table A2 of the supplementary material] but N 1 is also strongly correlated with the occupation numbers (and volumes) of all higher shells (Fig. 4, Fig.  B2 of the supplementary material). These correlations are a consequence of the proximity criterion used to construct the hydration shells. 46 Whereas the occupation numbers of any two shells are positively correlated, the occupation number N B of the boundary region is negatively correlated with any shell, since the total number N W of water molecules is fixed (Fig. 4,  Fig. B2 of the supplementary material). This is also true for the corresponding regional volumes (Fig. 4, Fig. B2 of the supplementary material). As a result, the exchange compressibility η n   FIG. 3. (a) Fluctuation in the first-shell occupation number N 1 (blue) and shellaveraged molecular volume v 1 (red, scaled by a factor 5). (b) Fluctuation of the occupation number N n of the first (red), second (blue), and sixth (green) hydration shell and of the boundary region (gray). Also shown is the fluctuation of the solvent-accessible surface area of the protein (cyan, scaled by a factor 0.75). All data refer to the GB1 simulation and the fluctuations are normalized by the standard deviation.
FIG. 4. Correlation coefficient ρ(X 1 , X n ) for variable X in the first hydration shell of GB1 and in spatial region n (P and B refer to protein and boundary region, respectively). The variable X is occupation number N n (red), shell volume V n (blue), or shell-averaged molecular volume v n (green). The curve resulted from an exponential fit to ρ(v 1 , v n ). and the pseudo-compressibility κ n both become more negative with increasing shells index n (Table A3 of the supplementary material), but are positive for the boundary region. Because a negative compressibility implies that the system is mechanically unstable, it is clear that κ n and η n cannot be interpreted as compressibilities.
However, the total exchange compressibility η, which contributes (slightly) to the solution compressibility κ and to the partial compressibility κ P , does reflect a real physical phenomenon. For simplicity, consider a solvent with a stepfunction density profile, where the solvent density exceeds the bulk value only in a hydration region (H). According to Eq. (20), we then have The linear response interpretation of this fluctuation formula is as follows: when pressure is applied, the system responds by transferring some molecules from the lower-density bulk region to the higher-density hydration shell. For the native proteins examined here, the exchange effect is negligibly small, but η (and κ P,ex ) may be much larger under non-native conditions, 1,2,18,21,22,79 where the protein conformation, and thus N H , fluctuates more.

B. Intrinsic compressibility
The intrinsic compressibility κ n in Eq. (18) can be decomposed into self and cross terms as with Similarly, the protein cross compressibility in Eq. (5) can be decomposed into its contributions from different hydration shells, κ cross P = n N n δv n δV P k B T V P ≡ n κ cross Pn . (24) In contrast to the shell volume V n , the shell-averaged molecular volume v n has a modest correlation with adjacent regions (shell or protein) and this correlation decays to zero with increasing separation between the shells (Fig. 4, Figs. B3 and B4 of the supplementary material, Table IV, Table  A2 of the supplementary material). The decay is approximately exponential, ρ(v 1 , v n ) = exp[−(n−1)/n * ], with a decay "length" n * = 1.1 shells for all six protein solutions (Table A2 of the supplementary material).
The individual contributions κ self n , κ cross nP , and κ cross  Fig. 4, the decay is approximately exponential, e.g., κ cross nq ∼ exp[−(q − n)/n * ] with a decay "length" n * ≈ 1.1 for the first shell (as for the correlation coefficient) and ∼1.35 for the following three shells (Table  A4 of the supplementary material). With a shell thickness of 2.8 Å, 46 this translates into a correlation length of 3.1-3.8 Å, slightly less than the bulk-water correlation length ξ κ = 4.7 Å (Sec. III).
The asymmetry of the bar graph in Fig. 5 shows that κ cross n has a larger contribution from the region outside shell n than from the region inside shell n. Three factors are responsible for this asymmetry. First, there are fewer shells on the inside. Second, the protein, being more rigid than water and smaller in volume than even the first hydration shell, 46 contributes less to κ cross n than a hydration shell. Third, the occupation number N n increases with shell index n so there are more FIG. 5. Contributions to κ P (cyan), κ 1 (red), κ 2 (blue), κ 3 (black), and κ 4 (green) from different hydration shells and the protein (indexed as shell 0) for GB1. Curves in matching color are exponential fits.
water molecules to correlate with in shell n + q than in shell n − q.
The intrinsic compressibility κ n of the first five hydration shells and its self and cross parts are reported in Table V  (see also Tables A5 and A6 of the supplementary material). The four proteins in TIP4P-Ew water have a similar firstshell compressibility, κ 1 = 0.348 ± 0.007 GPa −1 (mean ± standard deviation), ∼80% above κ P but ∼30% below κ bulk (Sec. III). While AFP has 26% lower κ self P than the other three proteins (Sec. IV A), κ 1 is only 3.5% lower than the average for the other three proteins. Cross-correlated volume fluctuations account for 51%-53% of κ 1 (as compared to 53% for κ P ) and 57%-60% of κ n for the next four shells. The effect of the water model is shown in Table A6 of the supplementary material. For GB1 in SPCE water, the bulk-like κ 5 is 6.9% lower than in TIP4P-Ew water, the difference diminishing to 3.4% in the first shell. For GB1 in TIP3P water, κ 5 is 20.7% higher than in TIP4P-Ew water, and the difference remains at 19%-22% in the other four shells.
The variation of the intrinsic compressibility κ n with shell index n is shown in Fig. 6 for the six examined protein solutions. (The corresponding results obtained with primitive Voronoi tessellation are shown in Fig. B5 of the supplementary material.) Here, and in Table V, we only include results up to the fifth shell since significant cross contributions extend out to shell n ± 4 and we have performed the Voronoi tessellation for nine shells. Even though the molecular exchange term δN n δN q has been eliminated from κ n , the fluctuating shell occupation number introduces a slight bias in the averaging  6. (a) Relative variation of the intrinsic compressibility κ n with shell index n. The curve is a joint exponential fit to all six data sets. (b) Relative variation of the intrinsic self compressibility κ self n with shell index n. The curves are exponential fits to the GB1TIP3P data (orange) and a joint fit to the other five data sets (magenta). Color code in both panels: GB1 (red), BPTI (blue), UBQ (black), AFP (green), GB1SPCE (cyan), and GB1TIP3P (orange). over molecular Voronoi volumes in Eq. (18). As a result, κ 5 , which should be unaffected by the protein, is 2%-5% below κ bulk . (The statistical error in κ 5 is 1%-2%.) For the purpose of representing the variation of κ n with n in Fig. 6, we therefore use as the asymptotic large-n value κ ref the average of κ 4 and κ 5 (the two highest shells for which we can compute κ n reliably), rather than κ bulk .
The intrinsic compressibility of the first hydration shell is 25%-30% below κ ref , but already in the fourth shell, the deviation is less than the standard error [ Fig. 6(a), Table V]. The magenta curve in Fig. 6(a) resulted from a joint fit of the exponential function exp(−n/n * ) to all six data sets, yielding n * = 0.95 ± 0.06 (Table A7 of the supplementary material). The difference between the first hydration shell and bulk water is larger for the intrinsic compressibility than for any other static property, but it is largely a trivial effect of the non-local character of κ n : the proximity to a more rigid material (the protein) suppresses volume fluctuations in the first hydration shell, thereby reducing κ cross n as well as κ self n . The self compressibility κ self n only reports on volume fluctuations within shell n, but these fluctuations depend on the compressibility of the adjacent regions. Although not an observable quantity, κ self n is well-defined and computable. No bulk water reference value is available, so we use the large-n asymptote of κ self n , defined here as the parameter b in the fitting function a exp(−n/n * ) + b, which accurately describes the approach of κ self n to its asymptotic value. The relative reduction in the first shell is somewhat smaller for the self compressibility, but the decay "length" is slightly longer: n * = 1.43 ± 0.16 for GB1TIP3P and n * = 1.28 ± 0.08 for the other five systems (Table A7 of the supplementary material).

VI. DISCUSSION
Considering the subsequent force field refinements and orders-of-magnitude increase of computing power, it is perhaps not surprising that the earliest simulation studies 25,26 of the protein and hydration shell compressibility did not produce accurate results. Indeed, it was erroneously concluded that hydration water of BPTI has a higher compressibility than bulk water, whereas we find it to be 30% lower. These studies were also adversely affected by performing the simulations in the NVT ensemble, for which Eqs. (2) and (5) predict that both the solution compressibility κ and the component compressibilities κ P and κ W vanish. The protein component compressibility was instead estimated from a finite-difference approximation to the pressure derivative in Eq. (5), but at the high pressures used (up to 10 kbar) to obtain statistically significant effects, the pressure response is nonlinear (the compressibility depends on pressure). The protein compressibility, estimated from the mean-square volume fluctuation (which only yields the self part) was found to be ∼10% of κ bulk , 25 whereas we find that κ P and κ self P are 46% and 20%, respectively, of κ bulk .
In the first protein compressibility study by simulation in the more appropriate NpT ensemble, Paci and Marchi used (primitive) Voronoi tessellation to partition volume and stressed the importance of protein-water cross fluctuations. 27 While they did not present any κ P values, the computed relative root-mean-square protein Voronoi volume fluctuations (related to κ self P ) of 0.27% and 0.37% for hen lysozyme and superoxide dismutase, respectively, are somewhat less than our values of 0.57%-0.75% (Table II). By contrast, another NpT simulation study found a relative fluctuation of 1.27% in the solvent excluded volume of human lysozyme. 29 Since both these simulations were rather short by today's standards, insufficient statistical sampling may have been an issue.
Dadarlat and Post performed NpT simulations of BPTI and four other proteins, 30 computing the protein compressibility by means of the fluctuation formula in Eq. (1) applied to the protein volume (estimated by grid point analysis), thereby missing the important protein-water cross fluctuations. Their result for BPTI, 0.066 GPa −1 , can be compared to our κ self P = 0.099 ± 0.002 GPa −1 (Table II) but should not be compared to either κ P = 0.23 ± 0.02 GPa −1 or κ P = −0.11 ± 0.04 GPa −1 (Table III). Regarding the hydration shell, they concluded that its compressibility does not differ greatly from bulk water, but the analysis on which this statement is based suffers from the omission of cross fluctuations. In a subsequent study with more focus on the hydration shell, 35 Dardarlat and Post again ignored cross-correlated protein-water volume fluctuations, leading them to conclude that, for trypsin and ribonuclease A, the hydration shell has a higher compressibility than bulk water.
In the first simulation study to report component compressibilities including cross fluctuations, Marchi computed the protein and hydration-shell compressibilities as well as the partial compressibility for BPTI and two other proteins. 32 Despite some notable differences with regard to the system (tenfold higher protein concentration), Voronoi tessellation method (primitive versus aw), and simulation protocol (different force field, shorter trajectory), Marchi's BPTI results are surprisingly (and perhaps fortuitously) similar to ours: κ P = 0.24 ± 0.03 GPa −1 (versus 0.23 ± 0.02 GPa −1 ; Table II), κ H = 0.39 ± 0.03 GPa −1 (versus 0.35 ± 0.01 GPa −1 ; Table V), and κ P = 0.10 ± 0.05 GPa −1 (versus −0.02 ± 0.05 GPa −1 ; Table III). Although Marchi's equation (8) would suggest that the (negative) pseudo-compressibility κ H of the hydration shell was computed, the results suggest that he actually computed the intrinsic first-shell compressibility κ H using v H rather than V H .
The partial compressibility was obtained from the relation κ P = κ P + (φ H /φ P )(κ H − κ bulk ), which results from Eqs. (11)- (13) under the assumption that hydration water has the same density as bulk water, also implying that V P = V P . Marchi used primitive Voronoi tessellation, for which we also find that the first-shell density is very close (0.3% higher) to bulk water, 46 so his analysis is consistent. But for the more realistic aw tessellation, we find that the first-shell density is 5.8% higher than in bulk water. 46 Another difference is the hydration shell definition, where Marchi used the topological criterion of shared Voronoi cell faces, whereas we use a criterion based on spatial proximity (Sec. II B 1). Marchi reported N H = 251, but we obtain N H = 430 for the same shell definition and N H = 369 for our cutoff-based shell definition. 46 The origin of this 70% discrepancy is unclear, but we note that if the larger N H value had been used, Marchi would have obtained κ P = 0.00 ± 0.05 GPa −1 , in complete agreement with our result.
Another simulation study 33 ostensively used the same analysis protocol as Marchi, 32 but obtained rather different results for the hydration shell. For example, the first-shell compressibility of cytochrome c was found to be 67% lower than in bulk water, 33 as compared to 20%-30% found here and by Marchi. 32 Moreover, κ H was found to consist of a larger-thanbulk self term and an almost equally large, but negative, cross contribution from the water molecules outside the first shell. Because all cross terms in the intrinsic shell compressibility κ H are positive (Fig. 5, Table V), this finding suggests, as do the equations given in the study, 33 that they actually computed the first-shell pseudo-compressibility κ H , which, indeed, is composed of a large positive self term and a large negative cross term (Table A3 of the supplementary material).
A different approach to hydration-shell compressibility has been adopted by Garde's group. 49 Performing several NpT simulations at different pressures, they compute a hydration shell compressibility from the mean number N H of water molecules in the shell as χ H = N H −1 (∂ N H /∂p) T . While the compressibility can be related to molecule-number fluctuations in the grand canonical (µVT ) ensemble, 60 it is not clear that χ H can be interpreted as a compressibility. One might argue that the µVT -formula should be approximately valid for a fixed subvolume that is small compared to the simulation cell. But it appears that Garde et al. define their hydration shell by a uniform 5 Å cutoff, 80 in which case the hydration shell volume is not fixed. 46 These misgivings are reinforced by the results: a 70% higher-than-bulk compressibility in the first hydration shell of hen lysozyme and similar results for two other proteins. 49 Tables A1-A7) and Appendix B (supporting Figs. B1-B5).

ACKNOWLEDGMENTS
This work was supported by the Swedish Research Council. Computing resources at Lund University Center for Scientific and Technical Computing (LUNARC) were allocated by the Swedish National Infrastructure for Computing (SNIC). We thank Pär Söderhjelm for stimulating discussions.  Table A2. Correlation coefficient ρ(X, Y ) between variables X and Y .    Figure B1: Normalized distribution f (V ) of system volume for bulk TIP4P-Ew water. Histogram with bin width 100Å 3 (red circles) and Gaussian fit (blue curve).  Figure B2: Correlation coefficient ρ(X 1 , X n ) for variable X between the first shell and spatial region n. The variable X is occupation number N n (red), shell volume V n (blue), or shell-averaged molecular volume v n (green). Index P and B refer to protein and boundary region, respectively. The green curves resulted from exponential fits to the correlation coefficient ρ(v 1 , v n ) for n ≥ 1.  Figure B3: Contributions κ nq to the regional intrinsic compressibility κ n for protein (n = 0, magenta) and the first (n = 1, red), second (n = 2, blue), third (n = 3, black) and fourth (n = 4, green) hydration shells, based on primitive tessellation. The curves with matching colors resulted from exponential fits to the contributions κ nq for the four hydration shells.

APPENDIX A: SUPPLEMENTARY TABLES
S9  Figure B4: Contributions κ nq to the regional intrinsic compressibility κ n for protein (n = 0, magenta) and the first (n = 1, red), second (n = 2, blue), third (n = 3, black) and fourth (n = 4, green) hydration shells, based on aw tessellation. The curves with matching colors resulted from exponential fits to the contributions κ nq for the four hydration shells.  Figure B5: (a) Percent difference from asymptotic value (average of shells 6 -8) of the self compressibility κ self n for hydration shells 1 -8. (b) Percent difference from asymptotic value (average of shells 4 -5) of the intrinsic shell compressibility κ n for hydration shells 1 -5. The magenta curve resulted from a joint exponential fit to all six data sets. Color code in both panels: GB1 (red), BPTI (blue), UBQ (black), AFP (green), GB1SPCE (cyan), and GB1TIP3P (orange). Compressibility based on primitive tessellation.