SSAGES: Software Suite for Advanced General Ensemble Simulations

Molecular simulation has emerged as an essential tool for modern-day research, but obtaining proper results and making reliable conclusions from simulations requires adequate sampling of the sys-tem under consideration. To this end, a variety of methods exist in the literature that can enhance sampling considerably, and increasingly sophisticated, effective algorithms continue to be developed at a rapid pace. Implementation of these techniques, however, can be challenging for experts and non-experts alike. There is a clear need for software that provides rapid, reliable, and easy access to a wide range of advanced sampling methods and that facilitates implementation of new techniques as they emerge. Here we present SSAGES, a publicly available Software Suite for Advanced General Ensemble Simulations designed to interface with multiple widely used molecular dynamics simulations packages. SSAGES allows facile application of a variety of enhanced sampling techniques—includingadaptivebiasingforce,stringmethods,andforwardﬂuxsampling—thatextract meaningfulfreeenergyandtransitionpathdatafromall-atomandcoarse-grainedsimulations.A noteworthyfeatureofSSAGESisauser-friendlyframeworkthatfacilitatesfurtherdevelopment andimplementationofnewmethodsandcollectivevariables.Inthiswork,theuseofSSAGES isillustratedinthecontextofsimplerepresentativeapplicationsinvolvingdistinctmethodsand differentcollectivevariablesthatareavailableinthecurrentreleaseofthesuite.Thecodemay befoundat:https://github.com/MICCoM/SSAGES-public. ' 2018 Author(s). All article content, except where otherwise noted, is licensed under a Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). https://doi.org/10.1063/1.5008853


INTRODUCTION
The past several decades have seen molecular simulations emerge as an essential tool for molecular research. 1Through the formalism provided by statistical mechanics, molecular simulations are able to relate phenomena at small length scales to experimental observables such as temperature, pressure, and other thermodynamic quantities.The validity of such calculations relies on proper sampling of the system under consideration, which is often achieved by running a simulation over a sufficiently long period of time.For many systems of interest, however, the time necessary to explore the phase space is prohibitively long, and one must resort to advanced sampling techniques.
Conventional molecular dynamics (MD) or Monte Carlo (MC) simulations generally encounter severe limitations in a) H. Sidky and Y. J. Colón contributed equally to this work.systems that exhibit rugged energy landscapes, which are characterized by multiple metastable states that are separated by large energy barriers.Examples include proteins, [2][3][4][5][6][7][8] small molecules in viscous solvents or membranes, [9][10][11][12][13][14] and phase transitions, [15][16][17][18][19][20] to name a few of the classes of simulation scenarios that one may encounter in a variety of disciplines. 21,22nhanced sampling techniques seek to eliminate or circumvent such barriers, thereby providing a means to recover thermodynamic quantities for systems that would be difficult to study through a direct MD approach.
A second limitation of traditional MD or MC simulations is that they only provide direct access to structure and "mechanical" properties such as the temperature or pressure.Thermodynamic quantities such as the entropy or the free energy can only be determined by relying on external methods that involve integrals of mechanical quantities over specific thermodynamic variables and pathways.The essence of an enhanced sampling method is to introduce an external bias, usually in the form of an external potential or an external force, which promotes the exploration of an activated process, described with a set of collective variables (CVs).Such collective variables are functions of the atomic coordinates of the system.Given a set of CVs and an enhanced sampling method, one can usually recover thermodynamic and kinetic quantities in the space that the CVs define, such as the free energy, entropy, or reaction rates.As explained in what follows, advanced biasing techniques and CVs provide information relating to the entropy or free energy as a byproduct of the simulation, thereby providing a strong incentive to implement such methods in order to fully exploit a simulation.
A variety of techniques have been developed to address these challenges.1][42][43][44][45] Similarly, typical CVs on which such techniques may be used include separation distance, dihedral angles, and coordination number. 46Implementing these methods and CVs can be challenging and requires access to quantities defined in the MD software that are usually inaccessible to common users, e.g., Cartesian coordinates, momenta, forces, or energies.As a result, a majority of simulations to date have relied on standard, unbiased molecular dynamics simulations or have used the simplest (and easiest to implement) biasing techniques and CVs.Recent efforts have sought to remedy this situation by providing generalized, open-source software that can interface with multiple molecular dynamics engines and bias a trajectory through the use of various enhanced sampling methods and CVs.Examples include plug-ins such as PLUMED, 47,48 PyRETIS, 49 open path sampling, 50 and various collective variables modules. 46,51While collectively these codes offer a wide variety of sampling algorithms, individual codes generally implement a few select methods.Path sampling and adaptive bias methods, for example, are offered separately.There is a need for a unified framework that allows for a seamless transition between types of methods and techniques, thereby enabling comparison between them.This capability will in turn allow users to make informed decisions about what sampling approach is most effective for a particular system of interest and facilitate both free energy and path sampling calculations.
In this work, we present SSAGES, a Software Suite for Advanced General Ensemble Simulations.SSAGES is a free, open-source software package written in C++ that allows users to easily apply enhanced sampling techniques to any molecular system of interest.SSAGES currently interfaces with LAMMPS, 52 GROMACS, 53,54 OpenMD, 55 Qbox, 56 and DASH. 57SSAGES includes enhanced sampling techniques such as: umbrella sampling, steered MD, metadynamics, forward flux sampling (FFS), nudged elastic band (NEB), finite temperature string (FTS), swarms of trajectories (SoT), adaptive biasing force, basis function sampling, and artificial neural network (ANN) sampling.The collective variables available in SSAGES include: angle, dihedral, distance between centers of mass, Rouse modes, secondary structure, box volume, gyration tensor, and coordination number.Importantly, SSAGES has been created in a highly modular manner, allowing for easy implementation of new methods and collective variables or modification of existing ones.Herein we describe the overall structure of the code, as well as the methods that have been implemented and tested within the package.We also provide guidelines to implement new methods and collective variables using the SSAGES framework.

DESCRIPTION OF THE CODE
Figure 1 summarizes the workflow used in SSAGES to perform an enhanced sampling calculation.Input files are defined using the JSON (JavaScript Object Notation) syntax, which is a lightweight human and machine-readable file format with native support across many programming and scripting languages.This facilitates easy scripting and automated workflows without the burden of implementing custom parsers by the end user.Communication between SSAGES and the engine takes place through a lightweight adapter, termed "hook," which bridges engine-agnostic methods and CVs implemented within SSAGES and the integration of the equations of motion by the underlying MD engine.
Once a simulation is initialized by the MD engine of choice, the relevant atomic coordinates, forces and momenta, the box virial tensor, and other thermodynamic quantities are copied by the hook into a SSAGES "snapshot" object, which is then exposed to the SSAGES framework.The collective variables of interest are calculated, and then a user specified method makes the necessary changes to the snapshot based FIG. 1. Flowchart of running a simulation using SSAGES.A JSON input file defines collective variables and sampling methods to be used.SSAGES then starts the simulation with a given engine and updates quantities according to the method until convergence or a prescribed number of iterations are met.
on the state of the system and the collective variables.Other modules declared by the user that "observe" the simulation are also invoked.These may include logging or histogram modules to aid the user in post-processing or collect additional useful information.The final modified snapshot is then used by the hook to update the appropriate data within the engine, after which a simulation step is completed.This iterative process is repeated until a predefined number of steps are executed or a method converges.
As described above and illustrated in Fig. 1, SSAGES routines are engine-agnostic.The only engine-specific code that needs to be written is the hook, which must integrate into engine components that expose the necessary data structures.Strict compartmentalization ensures that the only objects and functions exposed to the remainder of the SSAGES code are independent of engine implementation.This setup allows for new methods to be developed using a single unified programming interface, fully decoupled from the intricacies of any particular engine.Furthermore, exposing all the methods available within SSAGES to a new engine simply requires the development of a single additional hook.This is particularly useful for users who may want to rely on their own engines.The open source nature of this project welcomes such public contributions to the base code, which we envision will rapidly accelerate the utilization of enhanced sampling techniques in the field.This construct also simplifies the task of maintaining support with new releases of the existing MD codes since any changes are reflected only within the hook.
Another important aspect of SSAGES is the straightforward implementation of new methods and collective variables (Fig. 2).The addition of a new method requires that a new class, which derives from the base "Method" class, be programmed.There are four functions that every method must implement: "PreSimulation," which is invoked after the construction of the simulation engine but before integration begins, "PostIntegration," the primary function call that is invoked every step FIG. 2. Required functions for a method (left) and a collective variable (right).The pre-simulation section is used to initialize internal variables, and the postsimulation section is used for post-processing and to close any files that may have been used during the simulation.The post-integration section is where the heart of the method is located.This section is called after every step and is responsible for the method calculations.Both methods and CVs are constructed from JSON objects in the build call.
after forces are calculated by the engine-but before atomic coordinates are updated, "PostSimulation," invoked after all steps have been completed prior to the shutdown of the engine, and "Build," which constructs a new method object from a parsed JSON input.The first three functions are provided with a pointer to a snapshot that may be manipulated and a reference to a CV manager that provides access to initialized CVs with selection rules.
Although there are no strict requirements for what is contained within each of the aforementioned functions, it is customary that output file setup, memory allocation, and methodspecific data structures be initialized during pre-simulation.The biasing or other system manipulation occurs during postintegration.This typically involves the addition of force to atoms based on CVs or the direct manipulation of atomic coordinates and summing the virial tensor contribution for constant pressure simulations.For methods that require additional postprocessing, this task is usually reserved for the PostSimulation function.
Since different methods often rely on similar building blocks, SSAGES provides convenient access to implementations of various reusable components.For example, many methods either directly depend upon or are accelerated by grids that discretize CV space.SSAGES offers two efficient grid implementations that are templated to support arbitrary data types.One is tailored for use as a histogram, offering overflow bins, and the other is more general and better suited for storing forces.Both automatically handle periodic coordinates, support multidimensional interpolation, provide custom iterators, and include functions for saving/loading the grids to/from disk.Another example is related to the wide range of published string methods, which rely on slightly differing algorithms; a base string method class is available that implements a large portion of the necessary inter-process communication, greatly simplifying the implementation of other string techniques.Similarly, a reference forward flux sampling implementation is available that can be extended to support different "flavors" of forward flux sampling.
Like any method, a CV must derive from a base class and implement required functions.Because CVs are simply functions of system configurations, they are unable to manipulate snapshots and are given read-only access.Doing so limits the responsibility of a CV and in turn reduces the possibility of introducing undesirable errors.The "Initialize" function (Fig. 2) is used by a CV to ensure that all the participating atoms are available at runtime-this is the final part of a multitiered input validation process described later on."Evaluate" is responsible for calculating the CV in addition to its gradient with respect to the 3N atomic Cartesian coordinates and 6 box degrees of freedom.For some CVs, calculating the latter term may be difficult and is only necessary if proper treatment in the isobaric ensemble is desired.The calculated CV value and gradient are passed on to the methods which may propagate forces along the CVs to the atoms using the chain rule.A CV may also optionally override a "GetDifference" function which allows a method to properly measure distances along periodic CVs.
An important consideration for the development of both methods and CVs is the parallel nature of modern molecular dynamics codes.To support the large-scale simulation of thousands or millions of atoms, many engines rely on distributed computing strategies spanning multiple processors through domain and force decomposition.The necessary inter-process communication is often implemented using the Message Passing Interface (MPI) programming interface, which SSAGES handles seamlessly.Multi-threading using graphics processing units (GPUs) or OpenMP do not affect the availability of atomic information and do not alter the operation of SSAGES.In situations involving multiple processors, whether for a single system or multi-walker simulation, which SSAGES supports, global and intra-walker communicators are provided to methods and CVs.An additional MPI wrapper is also available which greatly simplifies inter-process communication.
To minimize the need for the tedious MPI calls that are necessary to gather information distributed across multiple processors, the snapshot provides helper functions which map global atomic indices to local processor-specific IDs, gather the center of mass of an atom group across the necessary processors, and collect all atomic coordinates into a single array.
Performance is of utmost importance when performing advanced sampling simulations.Some amount of overhead is unavoidable, even with the most basic methods, simply due to the additional calculations that must be performed at every step.Additionally, elaborate CVs may involve computationally expensive gradients or require significant interprocess communication.While an arbitrary implementation of a method or CV is not guaranteed to be efficient, SSAGES was designed in such a way so as to minimize the fixed overhead and offer tools to accelerate calculations.The SSAGES hook and synchronization mechanism is very lightweight and does not rely on complex polymorphism or abstractions, thereby resulting in a measured overhead of 1%-2% across all tested engines.Thus, pairing a simple method and CV will not introduce a significant slowdown compared to an unbiased simulation.Furthermore, numerical arrays are built upon Eigen, a C++ numerical library, which provides linear algebra operations in a natural syntax and takes advantage of single instruction multiple data (SIMD) extensions on modern processors, leading to a substantial speedup over serial loops. 58CV and method developers are encouraged to take advantage of SIMD parallelism when developing new classes.
To initialize a method or CV at runtime requires parsing of a JSON input file.SSAGES adopts a multi-tiered approach to validate the input.The integrated JSON parser is responsible for validating the overall structure and syntax of the input file.Gross syntax errors such as unclosed braces or unterminated quotes are caught immediately, and the user is informed.Basic data validation is provided for CVs and methods using the JSON schema draft 4 specification.This entails the specification of a JSON schema file which contains a "blueprint" or set of expectations by the developer for their object.Validation at this level includes data types, upper and lower bounds, string regular expression matching, required and optional fields, and enumerations.Once the schema is validated against the input file by the SSAGES engine, a C++ object containing the data is provided to the appropriate build function where a developer can perform semantic validation using advanced logic.However, the use of a schema greatly simplifies developers' efforts since they will have necessarily received the appropriate data and types.The object can be initialized using these data, and the final runtime validation, such as checks for the presence of atoms, can be performed at the pre-simulation stage.
Figure 3 shows a JSON input file to use a method (umbrella sampling) and a CV (distance between centers of mass).The input file specifies two distance CVs to be calculated.The first is the distance between the center of mass of particle 1 and 2 and the center of mass of particle 3 and 4. The second is the same but with particles 5 and 6 and particles 7 and 8. Then the input file identifies the method: umbrella sampling.It specifies that the method will operate on both CVs.The first will have a spring constant of 100 and a center of 0.5 while the second will have a spring constant of 200 and a center of 2. The units of these quantities will be consistent with the internal units of the engine of choice.Further instructions on how to implement new methods and CVs can be found in the documentation of the software.

OVERVIEW OF METHODS
In what follows, we only provide a brief description of the methods that are currently included in SSAGES.For more detailed explanations of the techniques, we refer the reader to the references contained within this section.We also note that the list of methods available in SSAGES continues to grow, and users interested in the latest release are referred to the code's website (https://github.com/MICCoM/SSAGES-public).
A wide range of techniques have been developed to improve sampling efficiency and allow the extraction of quantities of interest: free energies, transition pathways, and/or rates.One technique implemented in SSAGES is umbrella sampling.It helps to overcome energetic barriers and improve sampling by applying a bias along a specified collective variable. 59The bias often takes the form of a harmonic potential and is typically constant throughout a simulation.Usually, a series of umbrella-sampled simulations are performed and analyzed together using the weighted histogram analysis method (WHAM) 60,61 or umbrella integration. 62][65][66] To calculate free energies, SSAGES makes use of flathistogram or density-of-states methods.These are powerful methods that calculate free energies by developing uniform distributions along expanded ensemble partition functions.More details on these methods can be found in a recent review by Singh, Chopra, and de Pablo. 30Of the flat-histogram methods, SSAGES currently includes metadynamics, 67 basis function sampling (BFS), 38 and the recently proposed artificial neural network (ANN) sampling. 68The main difference between these methods is how the bias is applied to achieve a uniform distribution.Within metadynamics, a history-dependent bias is accrued through the sequential addition of Gaussian biases to the collective variables (CVs) of interest, which acts to push the system away from previously visited states.As the bias accrues to the height of nearby features in the free energy surface (FES), the system's trajectory can escape a local free energy minimum and begin to explore other regions of CV space.After a sufficient number of biases have been applied, the total applied bias within a given region begins to oscillate around the negative of the free energy in that region.There are many variants of metadynamics: well-tempered, flux-tempered, transition-tempered, and adaptive Gaussians.Currently, SSAGES includes only standard metadynamics with fixed-shape hills.
Instead of dropping Gaussians, basis function sampling (BFS) relies on the summation of Kronecker deltas to bias a simulation.The Kronecker delta is approximated by the projection of a locally biased histogram to a truncated set of orthogonal basis functions.By projecting a basis set relative to a weight function, the method resolves the same properties as the Kronecker deltas but in a continuous and differentiable manner that lends itself to MD simulations.SSAGES currently constructs the basis set using Legendre and Chebyshev polynomials and Fourier series.Convergence of the method is based on the updates of the basis set coefficients.ANN sampling uses Bayesian-regularized neural networks to develop an ideal bias that is free from ringing and boundary artifacts associated with basis functions and metadynamics. 68It is capable of rapidly conforming to varied free energy landscapes by interpolating undersampled regions and extrapolating bias into new unexplored areas.
A variant of the flat-histogram methods is adaptive biasing force, or ABF, which also seeks to sample uniformly over CV space.Unlike other methods, however, ABF does not obtain a direct estimate of the free energy surface but rather the derivative of the free energy in the CV direction: the generalized force on that CV by the system.In practice, this translates to binning coordinates in CV space with an instantaneous estimation of the force.This instantaneous estimate fluctuates around the true, global free energy derivative at that point, but the average quickly converges to the real value.Then, the free energy derivatives can be integrated to determine the free energy surface.More detailed explanations of the method can be found in Refs.69 and 70.SSAGES follows the implementation of Darve and co-workers. 69n addition to free energy methods, SSAGES provides methods to identify transition pathways between metastable states.These chain-of-states methods discretize a proposed pathway that connects two states into images or nodes.As sampling proceeds, the pathway is iteratively evolved by adjusting the position of these nodes.There are many variants, and SSAGES currently includes three: nudged elastic band (NEB), finite temperature string (FTS), and swarm of trajectories (SoT).2][73] The "nudged" nature refers to a force projection that ensures the images do not all fall into the metastable states as they evolve.This projection is accomplished by using the parallel portion of the spring force and the perpendicular portion of the true force.In this way, the spring forces act similarly to reparameterization schemes common to string methods.When the images are in Cartesian space, NEB converges to the minimum energy path (MEP).When they are in CV space, which requires the introduction of an M-tensor, NEB converges to the minimum free energy path (MFEP).FTS converges to the principal curve, which intersects each of the perpendicular hyperplanes that it passes through at the expected value of each hyperplane.Instead of sampling along each hyperplane belonging to each node along the string, SSAGES employs the Voronoi approximation introduced by Vanden-Eijnden and Venturoli. 74Each image is free to explore CV space within its own Voronoi cell: the region in CV space where any point is closer to its node than any other node along the string.The string is then evolved toward the running averages in CV space for each image along the string.When the string is updated, the nodes are redistributed such that they are an equal arc length apart and a cubic spline is used to ensure that the curve is smooth.Finally, SoT launches a large number (a swarm) of unrestrained trajectories from each image on the string and estimates the average drift of the collective variables over the swarm.The images along the string are then evolved along the average drift.As a result, SoT yields the most probable transition pathway (MPTP).Similar to FTS, the string is reparameterized such that the nodes along the string are an equal arc length apart.Additional details on SoT may be found in the literature. 75astly, another type of transition pathway method available in SSAGES is forward flux sampling (FFS).FFS produces as output the rate and the trajectories for a system that goes from an initial state A to a final state B. This is done by choosing several intermediate states or interfaces between the states such that the energy barriers between adjacent interfaces are readily surmountable in a typical simulation.Using configurations at each interface, attempts are made to arrive at the next interface in the forward direction.A large number of trajectories are launched at each interface, and successful trials are stored and used as checkpoints in the next interface.From successful and failed trials, conditional probabilities of interface crossing may be calculated and hence the transition rate from

OVERVIEW OF COLLECTIVE VARIABLES
Collective variables (CVs) are functions of the atomic coordinates in a molecular simulation.Table II shows the CVs included in SSAGES along with a short description.We have implemented several CVs highlighted by Fiorin and co-workers 46 and added others relevant to multiple research endeavors.Although SSAGES currently does not include as many collective variables as other sampling packages (e.g., PLUMED and Colvars), it still includes the most commonly used CVs 46 and provides the capability to utilize these CVs with the range of methods in Table I.Importantly, as described above, the implementation of new CVs is straightforward, and the project invites user contributions that will help expand the list of options.

REPRESENTATIVE EXAMPLES: METHODS
To highlight the current capabilities of SSAGES, several representative examples are provided in what follows.A feature of SSAGES is the ability to easily change between methods to study the same system.We compare the results obtained using ABF, BFS, metadynamics (MTD), and ANN sampling for the dissociation of NaCl in explicit water.NaCl was modeled using parameters reported in the literature 84,85 and extended simple point charge (SPC/E) 86 water model was used.An NaCl pair along with 1498 water molecules was Calculate pairwise distances between two groups of particles.The distances are then passed to a Gaussian or switching function.The resulting quantities can be used to distinguish between clusters 79 or calculate coordination number. 80

Particle coordinate
Cartesian coordinates of a group of particles (x, y, or z).

Particle position
Distance of a group of particles from a specified Cartesian coordinate (x, y, or z).

Particle separation
Distance between centers of mass of two groups of particles.

Rouse mode
Rouse mode of a polymer chain.

Alpha helix
Calculate alpha-helix character over a range of residues as the number of six-residue segments which match an ideal alpha-helical configuration. 81

Parallel beta-sheet
Calculate parallel beta-sheet character over a range of residues as the number of six-residue segments which match an ideal parallel beta-sheet configuration. 81

Anti-parallel
Calculate anti-parallel beta-sheet character over a range of beta-sheet residues as the number of six-residue segments which match an ideal anti-parallel beta-sheet configuration. 81

Torsional
Proper dihedral between four atoms. 82,83IG. 4. Free energy profile for NaCl dissociation in water using ABF, BFS, MTD, and ANN sampling.All three methods produce very comparable results, which serves as a means to validate the methods within SSAGES.
simulated at a constant temperature and pressure of 298.15K and 1.0 bar using the stochastic velocity rescale thermostat and Parrinello-Rahman barostat, respectively.Electrostatic interactions were calculated using particle-mesh Ewald method with a 10 4 accuracy.The simulations were run for 30 ns with a 2 fs time step using GROMACS 2016.3.Hydrogen bonds were held fixed using the LINCS algorithm. 87Methodspecific parameters were chosen optimally to achieve rapid convergence.
As can be seen from Fig. 4, all three methods yield very similar free energy estimates.Metadynamics, in particular, displays oscillations not present in the other methods.This is a known shortcoming of non-tempered metadynamics and need not be discussed here.The summed hills were averaged once a level bias began accruing with a stride of 100.Longer simulation times are certain to yield a smoother curve.However, the main point that we wish to emphasize is that having access to software that offers distinct methods for sampling and calculation of free energies allows one to compare the performance of different algorithms for a specific application and identify an optimal choice.
Similarly, we calculate the free energy surface (FES) for the conformational energies of alanine dipeptide (ADP) in vacuum and compare the results of ABF, BFS, metadynamics, and ANN sampling (Fig. 5).As with NaCl, we used GROMACS for the simulations.ADP was modeled using the Amber99sb forcefield 88 and hydrogen bonds were held fixed using the LINCS algorithm. 87Simulations were run in the NVT ensemble at 298.15 K using the stochastic velocity rescale thermostat.Electrostatic interactions were calculated using particle-mesh Ewald method with a 10 4 accuracy.The simulations were run for 100 ns with a 2 fs time step using GROMACS 2016.3.The free energy surfaces were calculated using two torsional angles along the backbone as collective variables.We find the three configurations with the lowest free energy at approximately (2.5, 2.7), (1.4,1.1), and (1, 1.1).These structures have also been found and reported previously in the literature. 47,67ext, we present a comparison of the nudged elastic band and the string methods implemented in SSAGES.As previously discussed, three methods are available: the nudged elastic band method, the finite temperature string method, and the string method with swarms of trajectories.In Fig. 6, an initial chain of 22 images is overlaid on a free energy surface for the alanine dipeptide configurations, obtained via ABF.The three converged pathways obtained from the three chain-of-states methods using this initial chain are also plotted over the surface in Fig. 6.All simulations, including ABF, were performed using LAMMPS with the same force field without holding hydrogen bonds rigid.
FIG. 5. Free energy surface of ADP configurations in vacuum calculated with ABF (a), BFS (b), Metadynamics (c), and ANN sampling (d).All methods find the same minima and identify the same barriers, though the explicit height and shape of high free energy barriers varies.FIG. 6. Free energy landscape with chain-of-states results for ADP calculated using nudged elastic band (diamonds), finite temperature string (circles), and swarm of trajectories (stars).The contour map was obtained using ABF.We observe excellent agreement between nudged elastic band and the string methods, as well as with ABF.
The estimated location of the transition barrier is in excellent agreement across all three methods, as well as the location of the two minima.However, the three paths differ slightly from one another outside the vicinity of the transition barrier and the two minima.This is likely a manifestation of several effects, including the fact that each of these three methods theoretically converges to a different type of pathway: NEB to the minimum free energy pathway (MFEP), FTS to the principal curve, and SoT to the most probable transition pathway (MPTP).Additionally, as discussed above, the metric tensor is assumed here to be identity for NEB and FTS for computational efficiency, which is not necessarily always the case.

REPRESENTATIVE EXAMPLES: COLLECTIVE VARIABLE
We next illustrate the application of SSAGES with a new CV.We consider the free energy profile for the Rouse modes of a polymer chain, which can describe both the internal structure and dynamics of various polymer systems. 89,90For a polymer comprised of N segments, the Rouse mode coordinates are computed as follows: where p = 0, . .., N 1 indicates the mode index, R i is the position of the ith segment in the polymer chain, and c p is a constant equal to 1 for p = 0 and equal to 2 for all other modes.In Eq. (1), the zeroth mode roughly maps to the polymer's center of mass, and the remaining modes can be thought as collective variables corresponding to sub-chains of (N 1)/p segments.For example, the relaxation at long times of the endto-end vector R = R N 1 R 0 corresponds to the relaxation of X 1 . 91ere, we compute the free energy profile along X 1 = √ X 1 • X 1 of an ideal, Gaussian chain with SSAGES using both umbrella sampling and ABF, with unbiased, bruteforce molecular dynamics and with the exact result given by FIG. 7. Free energy profiles along the first Rouse mode amplitude X 1 , using reduced units, for an ideal Gaussian chain obtained from unbiased, bruteforce MD simulation (black line with gray error region), the ABF method in SSAGES (red circles), the umbrella sampling method in SSAGES (blue diamonds), and Eq. ( 2 where b 2 is the mean-squared bond length between adjacent segments of the polymer chain.Figure 7 compares the results of the different methods for F(X 1 ) using reduced units with b = 1 and k B T = 2/3.All MD simulations employed the LAMMPS simulation engine.The figure shows that both umbrella sampling and ABF as implemented in SSAGES capably reconstruct the correct free energy profile.For this toy example, brute-force, unbiased MD also generates the correct free energy profile via Boltzmann inversion, albeit over a more limited range (there is no sampling for X 1 > 16.50) and with additional statistical error.Although the results for this system are trivially obtained, this first application, accelerated by the SSAGES framework, demonstrates an intriguing strategy to use specific Rouse modes as biasing for more novel future applications.
The umbrella sampling results were obtained using 21 simulation windows with harmonic biasing potentials given by 1 2 k X 1 − X (l) 1 2 , with k = 1.0 and X (l) 1 = l, for l = 0.0, . .., 20.0.Each simulation ran for 2.1 × 10 6 time steps with X 1 sampled every 50 time steps after the first 10 5 time steps used as equilibration; the free energy profile was reconstructed using WHAM.The ABF results were obtained using four independent simulations, and each simulation ran for 4 × 10 6 time steps.The unbiased, brute-force MD results were obtained from a single simulation of 4.1 × 10 6 time steps.

OUTLOOK
We present an open-source, free, and modular software for the general use of enhanced sampling techniques: SSAGES (Software Suite for Advanced General Ensemble Simulations).The software currently interfaces with LAMMPS, GROMACS, OpenMD, QBox, and DASH and includes the following methods: umbrella sampling, metadynamics, forward flux sampling, nudged elastic band, finite temperature string, swarms of trajectories, adaptive biasing force, artificial neural network sampling, and basis function sampling.We demonstrate SSAGES' versatility by comparing various systems across different methods and collective variables and showing their agreement.Further, SSAGES provides a flexible framework for the modification and implementation of new methods and CVs.We believe that SSAGES will be used and developed by the scientific community, promoting the development and exchange of new methods and CVs.The code is open-source and can be downloaded from GitHub at https://github.com/MICCoM/SSAGES-public.

FIG. 3 .
FIG. 3. JSON input file for performing umbrella sampling on two distance CVs.
FIG.7.Free energy profiles along the first Rouse mode amplitude X 1 , using reduced units, for an ideal Gaussian chain obtained from unbiased, bruteforce MD simulation (black line with gray error region), the ABF method in SSAGES (red circles), the umbrella sampling method in SSAGES (blue diamonds), and Eq.(2) (orange line).Insets are three representative configurations of the Gaussian chain for X 1 ≈ 1.0, 5.0, and 15.0 from left to right.Error bars and/or error regions represent the standard error of the mean.

F
X p = −k B T

TABLE I .
Summary of methods available in SSAGES and other open-source software.

TABLE II .
Collective variables available in SSAGES.