Practical rare event sampling for extreme mesoscale weather

Extreme mesoscale weather, including tropical cyclones, squall lines, and floods, can be enormously damaging and yet challenging to simulate; hence, there is a pressing need for more efficient simulation strategies. Here we present a new rare event sampling algorithm called Quantile Diffusion Monte Carlo (Quantile DMC). Quantile DMC is a simple-to-use algorithm that can sample extreme tail behavior for a wide class of processes. We demonstrate the advantages of Quantile DMC compared to other sampling methods and discuss practical aspects of implementing Quantile DMC. To test the feasibility of Quantile DMC for extreme mesoscale weather, we sample extremely intense realizations of two historical tropical cyclones, 2010 Hurricane Earl and 2015 Hurricane Joaquin. Our results demonstrate Quantile DMC's potential to provide low-variance extreme weather statistics while highlighting the work that is necessary for Quantile DMC to attain greater efficiency in future applications.

When rare events are studied using simulation, it can take a long time to gather sufficient data through direct sampling. As an alternative to direct sampling, specialized rare event sampling algorithms provide data more quickly, thus reducing computational costs. Here, we present a new rare event sampling method, Quantile DMC, that is simple to use. Quantile DMC performs extremely well on a one-dimensional test case, accurately estimating rare event probabilities with less than one thousandth the computational cost of direct sampling. Quantile DMC could potentially be of use in complex rare event simulations, for example, simulating the frequency of tropical cyclones, mesoscale convective systems, or floods under different climate conditions. When we apply Quantile DMC to simulate intense tropical cyclones, we obtain promising results: storms at high intensities are more reliably simulated using Quantile DMC compared to direct sampling.

I. INTRODUCTION
A common strategy for estimating rare event probabilities is direct sampling [1][2][3] . The direct sampling approach is to repeatedly simulate data from a model and then calculate the frequency of a rare event over all the simulated data. This approach can be effective in some contexts but can also be computationally expensive. For the rarest probabilities, an exorbitant amount of computational effort might be required before the event occurs even once in the simulations. Responding to these concerns, researchers as early as the 1950s 4,5 developed specialized a) Electronic mail: rw2515@nyu.edu rare event sampling algorithms to improve computational efficiency.
Today, a diverse community of scientists uses rare event sampling and analysis tools to study processes that take place infrequently, are too complex to be described analytically, and can be simulated on a computer. For example, the following extraordinary events have all been simulated using rare event sampling: a high-energy particle penetrating a nuclear shield 6 , a life-sustaining protein-protein reaction 7 , and an extreme loss of portfolio value 8 .
A burgeoning field of research explicitly links rare event simulation and analysis tools with geophysical applications [9][10][11][12][13][14] . In a recent paper, Ragone, Wouters and Bouchet 12 showed that rare event sampling methods can be used to study the probability of extreme weather occurring. They sampled intense 90-day heat waves over Western Europe at a fraction of the computational expense of direct sampling. Their simulations led to the surprising insight that extreme heat waves over Western Europe are associated with a stationary wavenumber 3 anomaly in the jet stream.
The work of Ragone and coauthors concerns synoptic scale weather, weather that occurs on a length scale of 1000km or greater 1 . A more challenging question is how to apply rare event sampling techniques to mesoscale weather, which occurs on a smaller length scale of 10-1000km. Extreme mesoscale weather, including tropical cyclones and floods, accounts for many of the world's most destructive natural disasters 15,16 . Yet simulations of mesoscale weather can demand enormous computational resources due to the need for high spatial resolution 17,18 . Mesoscale weather simulation presents a unique challenge for rare event sampling and analysis, where the need for efficient simulation strategies is great and yet sample size is highly limited due to computational expense.
In Plotkin et al. 14 , we present a rare event analysis strategy for potential use in extreme mesoscale weather simulations. Using a computationally efficient algorithm, we identify maximum likelihood perturbations that lead to the occurrence of an intense tropical cyclone in a high-resolution weather model. In particular, we identify key changes in wind, temperature, and relative humidity fields that help explain the rapid intensification process in modeled tropical cyclones.
In contrast to Plotkin et al. 14 , which analyzes the single most likely path toward extreme mesoscale weather, the present paper analyzes statistics of extreme mesoscale weather. Accurate estimation of statistics can require sampling numerous possible paths. To achieve this goal, therefore, it is appropriate to use a rare event sampling algorithm.
Here we present a new rare event sampling algorithm called Quantile Diffusion Monte Carlo (Quantile DMC) that is suited for complex real-world applications such as extreme weather simulations. The algorithm is simple to implement, yet suitable for a large class of nonlinear processes. When we apply Quantile DMC to a simple example, the algorithm is more efficient than direct sampling by a factor of more than a thousand.
Quantile DMC is a "splitting" algorithm, an algorithm in which some simulations are split into multiple replicas to promote progress toward the rare event of interest and other simulations are "killed". A key advantage of splitting algorithms is that they are practical to implement. Our simulations with Quantile DMC are simple to code and require the same computational cost as direct sampling from the dynamical model. In contrast, alternative rare event sampling approaches can be more challenging to implement, because they require modifying the underlying dynamical model or frequently starting and stopping the dynamics. For mesoscale weather models like the one simulated in the current paper, these manipulations would require extensive code development or substantial added computational cost.
Quantile DMC is inspired by a previous splitting algorithm called Diffusion Monte Carlo 4,5,[19][20][21][22] , but it incorporates two new features. First, in Diffusion Monte Carlo, splitting is typically uniform in time, but in Quantile DMC the intensity and frequency of splitting increase over time, improving efficiency. Second, Quantile DMC adaptively makes use of data from simulations, so that the algorithm requires less tuning compared to DMC.
We envision that Quantile DMC could be used to study the frequency of extreme mesoscale weather under different climate conditions. While a full application of rare event sampling techniques to study extreme weather is beyond the scope of the current paper, we test the feasibility of our approach using a high-resolution tropical cyclone model. In simulations, we use Quantile DMC to study the upper tail of the intensity distribution for numerical simulations of two historical tropical cyclones: 2010 Hurricane Earl and 2015 Hurricane Joaquin. Using an ensemble of N = 100 simulations, Quantile DMC produces more than seven times as many high-intensity Category 5 realizations for both of the storms compared to direct sampling. Moreover, the variance of important rare event statistics is improved by a factor of two to ten. Building on this success, we anticipate that we can improve the performance of this method in the future. This paper is organized into two major sections. Section II presents various approaches to estimating rare event probabilities: direct sampling, Diffusion Monte Carlo, and the new method Quantile DMC. Section III examines the potential role of rare event sampling in extreme weather and presents tropical cyclone simulations.

II. ESTIMATING RARE EVENT PROBABILITIES
In this section, we introduce several approaches to estimating rare event probabilities for potential use in climate and weather applications. First, we discuss the well-known method of direct sampling. Then, we describe a rare event sampling algorithm, called Diffusion Monte Carlo (DMC). We illustrate the advantages and disadvantages of DMC on a simple example. Then, we introduce Quantile DMC and explain why it gives more robust performance compared to standard DMC. Lastly, we discuss the implementation of Quantile DMC in practical settings.

A. Direct sampling
While direct sampling is a very useful tool for studying the typical behavior of complex or high-dimensional systems, it is not an ideal approach for investigating unlikely or infrequent phenomena. In particular, as we will demonstrate, direct sampling can give very high error when estimating statistics of rare events.
Direct sampling uses a straightforward approach to estimate probabilities. We assume X is a random process, A is an important event (e.g., the occurrence of an intense tropical cyclone), and we can draw independent samples of X, labeled ξ (1) , ξ (2) , ξ (3) , . . . , ξ (N ) . To estimate the probability p = P {X ∈ A}, direct sampling useŝ This estimator is called the sample average. To assess the error in direct sampling, we calculate the mean and variance ofp: On the surface, the variance of the sample averagep would appear to be quite good. In particular, the variance depends only on p and not on the process X. The process X can have millions of dimensions or even infinite dimensions and still the variance ofp converges to zero at a 1/N rate as N → ∞.
Surprisingly then, direct sampling estimatesp can have unacceptably high error in rare event calculations. For example, suppose p = 0.01 is the probability of a rare event A, and N = 100 is the sample size of simulations. Then,p may take the valuep = 0 with probability 0.37, the valuep = 0.01 with probability 0.37, the valuê p = 0.02 with probability 0.18, the valuep = 0.03 with probability 0.06, and higher values with probability 0.02. The error inp is overwhelming. If the estimatep is used for risk analysis, then the error inp might have harmful practical consequences.
How can direct sampling produce estimates that are simultaneously so good and so bad? In most applications, what is important is not absolute errorp − p, but rather relative error (p − p) /p. To assess the relative error in direct sampling, we calculate the mean and variance of p/p: To estimate a probability p with even one digit of precision, Var [p/p] must be many times smaller than 1, and this requires a sample size N that is many times larger than 1/p. When models are expensive to run and probabilities p are small, obtaining these large sample sizes is not a practical option.
Rare event sampling methods can help address the deficiency of direct sampling in estimating rare probabilities. For example, in carefully designed rare event splitting algorithms similar to Diffusion Monte Carlo and Quantile DMC it can suffice to increase sample size N as slowly as N ∼ log (1/p) as p → 0 to achieve fixed relative error 23 . This is an exponential improvement compared to direct sampling, where it is necessary to increase sample size N at a rate of N ∼ 1/p to achieve fixed relative error. The exponential improvement due to rare event sampling methods can make possible very precise calculations of rare event probabilities even with a limited ensemble size N .

B. Diffusion Monte Carlo
Diffusion Monte Carlo (DMC) is a sampling algorithm that causes simulations to explore regions of state space that would rarely be accessed under typical conditions. The earliest antecedents of DMC were splitting algorithms invented in the 1950s 4,5 . In the 1960s, DMC was popularized in the quantum chemistry community where researchers used DMC to obtain information about the ground state energy of the Schrödinger equation for chemical systems 19,20 . In the 2000s, the tools of DMC were increasingly applied to rare event sampling, and the algorithm became known in some circles as "genealogical particle analysis" 24,25 . Recently, DMC has been the subject of a series of mathematical analyses, which describe the convergence and asymptotic behavior of DMC as the ensemble size N approaches infinity 26,27 . Hairer and Weare 22 provide a more detailed history of DMC.
For a simple example of DMC, assume (X t ) t≥0 is a Markov process in R d , and ξ are simulations of X t , which are called "particles". To estimate the probability of a rare, important event A, the DMC algorithm iterates the following steps: forward from time t to a later time t .
2. Using a consistent set of rules, randomly "split" particles ξ (i) t that have moved much closer to A and randomly "kill" particles ξ (i) t that have moved much farther from A, making sure that the total number of particles N remains unchanged.
DMC uses splitting and killing to cause a greater number of particles to reach the rare event state A, compared to direct sampling.
In greater generality, the DMC algorithm is guided by one-dimensional coordinate θ : R d → R that is high in some regions of the state space R d and low in other regions of the state space. Where θ is high, DMC exhibits a greater propensity toward splitting. Where θ is low, DMC exhibits a greater propensity toward killing. The coordinate θ is often known as an order parameter or reaction coordinate. The particular choice of reaction coordinate can be crucial to the efficiency of DMC for computing rare event statistics.
A basic schematic of DMC is given in Figure 1. In this schematic, the reaction coordinate is the position θ (x) = x. Therefore, splitting and killing of simulations drives the process toward high values of x. To implement DMC on a computer, a sequence of actions are required. The user defines a series of resampling times 0 = t 0 < t 1 < t 2 < · · · . For each resampling time t k , the user specifies V k (x), a splitting function that increases with the reaction coordinate θ (x). DMC begins with an initialization step and then iterates over reweighting, resampling, and mutation steps according to the following definition: Definition II.1 (Diffusion Monte Carlo).
1. Initialization: Independently sample initial particles ξ (a) Reweighting: If k = 0, define initial weights (b) Resampling: By splitting and killing parti- , create an ensemble of up- The intialization and mutation steps in DMC are straightforward, but the reweighting and resampling steps require further elaboration. In the reweighting step, splitting/killing weights w are defined by means of the splitting functions V k . The simplest example of a splitting function is V k (x) = Cθ (x), where C > 0 is a splitting parameter that controls how many times a single particle can be split to create new copies.
In the resampling step, particles are split and killed according to the weights w In this formula, w (i) k is divided by w k is to ensure that the total number of particles satisfies E To define the particular distribution for the random numbers N , we use a low-variance resampling scheme called sorted stratified resampling 27,28 , the details of which we describe in the appendix. While a good choice of resampling scheme can slighly reduce DMC error, other factors determine a greater share of the error in DMC estimates. The splitting functions V k are the most important parameters for determining the dynamics of DMC.
The validity of DMC estimates is supported by mathematical analyses 26,27 . DMC estimates are unbiased and converge as the number of particles N tends to infinity under mild integrability conditions. These theoretical results are very general, holding true for systems with arbitrarily high dimension d. Any quantity that can be estimated by direct sampling can also be estimated by DMC. Estimates can include functions that depend on the entire path from time 0 until a later time t k .
Analysis of DMC supports the conclusion that DMC oversamples regions where the reaction coordinate θ is large and undersamples regions where θ is small. In particular, the distribution of particles 1 N N j=1 δ ξ (j) k converges weakly as N → ∞ to the distribution of X t k weighted by a likelihood ratio of Since V k (x) increases with the reaction coordinate θ (x), more particles occupy regions where θ is high, compared to direct sampling. In summary, we have defined Diffusion Monte Carlo and presented two key facts. First, DMC provides unbiased, convergent estimates. Second, at each resampling step t k , DMC moves particles to regions where the reaction coordinate θ is large.

C. Strengths and weaknesses of DMC: the Ornstein-Uhlenbeck example
The Ornstein-Uhlenbeck (OU) process is a onedimensional, linear process which we use to illustrate the strengths and weaknesses of DMC. On the one hand, DMC can effectively sample rare extreme deviations of the OU process. On the other hand, when sampling transformations of the OU process, DMC can require delicate tuning, which limits the practical effectiveness of the algorithm.
The OU process evolves under the dynamics where α > 0 is a constant. We present three properties of the OU process for later reference: 1. From any starting distribution, the OU process converges geometrically to an equilibrium distribution of N (0, 1). Therefore, it is a rare event for the OU process to reach a position much larger than the standard deviation of 1 at a large time T .
2. From any starting position X 0 = x, the mean of the OU process converges geometrically to 0 at a timescale of 1/α, that is, E [X t ] = xe −αt .
3. From any starting position X 0 = x, the variance of the OU process converges geometrically to 1 at a timescale of 1/ (2α), that is, Var We consider using DMC to estimate the probability that the OU process starting from X 0 = 0 exhibits a rare extreme deviation X 1 ≥ U . To set up the DMC algorithm, a natural choice of reaction coordinate is position θ (x) = x. We must also choose a series of resampling times 0 < t 1 < · · · < t K−1 < t K = 1 and splitting functions V k . The simplest choice is to set t k = k/K for k = 1, 2, . . . , K and V k (x) = Cθ (x), where C > 0 is a positive number. We refer to this strategy as "timehomogeneous" DMC, since the splitting intensity and splitting frequency are uniform in time.
As an alternative to time-homogeneous DMC, we also consider a "time-heterogeneous" resampling strategy. Since the OU process loses its memory exponentially quickly, we observe that random motion of particles at early times is not as important as random motion of particles at later times for determining final locations at time 1. Motivated by this observation, we can define resam- We can also define splitting functions V k (x) = Ce α(t k −1) θ (x). In this time-heterogeneous resampling strategy, strength and frequency of splitting increase exponentially with time. Splitting strength increases at the 1/α timescale with which the OU process mean reverts to zero. Splitting frequency increases at the 1/ (2α) timescale with which the OU process variance reverts to 1. We note that a similar suggestion to increase the strength of splitting appears in the work of Wouters and Bouchet 25 . The suggestion to increase the frequency of splitting is newly presented here. DMC can be used to estimate the probability p = P {X 1 ≥ U } that the Ornstein-Uhlenbeck process exceeds a threshold U at time 1. Following Definition II.1, estimates take the form We simulated rare extreme deviations of the Ornstein-Uhlenbeck process ten thousand times using DMC with a splitting intensity of C = 2.5. We then computed estimatesp and assessed error using the relative standard deviation Var [p/p]. Results are shown in Table I. The table shows for a range of U values that time-heterogeneous DMC is more accurate than time-homogeneous DMC. Moreover, when studying the most extreme rare events, time-heterogeneous DMC is more than fifty times more accurate than direct sampling. Thus, with direct sampling it would be necessary to increase the sample size by a factor of more than a thousand to obtain comparable error to time-heterogeneous DMC.
Having discussed the strengths of Diffusion Monte Carlo, we now turn to a discussion of the method's shortcomings. Under the best of conditions, Diffusion Monte Carlo is a highly effective rare event sampling strategy. However, to sample a transformation of the OU process, DMC can require delicate tuning, which limits the method's practical appeal.
Consider using DMC to sample extreme deviations of the process The process Y t is a nonlinear transformation of the OU process, with Y t = 4 exp {X t /4}. Figure 3 illustrates what can happen when DMC is used to sample Y t without a careful tuning of parameters. We apply DMC with a reaction coordinate θ (y) = y and splitting functions V k (y) = 2.5e α(t k −1) θ (y). The resulting DMC scheme performs well at the first resampling time, but as soon as the first particles reach positions y > 10, the algorithm becomes unbalanced. Particles with the highest positions y are split into dozens or hundreds of replicas. Thus, particle positions y become highly correlated, leading to volatile and error-prone estimates for rare event probabilities. While DMC can potentially be tuned to efficiently sample the process Y t , the tuning process requires great care and flexibility. In particular, no splitting function of the form V k (y) = Ce α(t k −1) θ (y) efficiently samples extreme deviations of Y t . A splitting function of a different parametric form is required. Moreover, tuning becomes much more difficult when sampling a process with unknown dynamics. Large amounts of data are necessary to tune DMC, and gathering the necessary data from a complex model can be computationally expensive.
We draw two essential observations from the Ornstein-Uhlenbeck example. First, DMC is most effective when the strength and frequency of splitting increase over time. Second, DMC is quite sensitive to the particular splitting functions V k that are used. This sensitivity potentially compromises the real-world performance of DMC and prompts the development of a more robust version of DMC.

D. Quantile DMC
Quantile DMC is an elaboration of the DMC algorithm algorithm, with additional adaptation steps that make the scheme more robust. The current section describes how adaptation steps are performed and explores the specific theoretical properties that explain the robustness of Quantile DMC.
The key difference between Quantile DMC and standard DMC is that Quantile DMC adaptively rescales the reaction coordinate θ to match a target distribution ν k . It is the rescaled reaction coordinate θ k that is used for splitting and killing of simulations.
To perform Quantile DMC, first define a series of resampling times 0 = t 0 < t 1 < t 2 < · · · . For each resampling time t k , specify a target distribution ν k for the rescaled reaction coordinate θ k and specify V k (x), a splitting function that increases with θ k (x). Quantile DMC begins with an initialization step and then iterates over adaptation, reweighting, resampling, and mutation steps according to the following definition: Definition II.2 (Quantile DMC).
1. Initialization: Independently sample initial particles ξ to ν k . Define the rescaled reaction coordinate Define the average weight w k = 1 k . (c) Resampling: By splitting and killing parti- , create an ensemble of up- Quantile DMC is distinguished from standard DMC by an adaptation step. The adaptation step begins by estimating the distribution of θ (X t k ) using data from simulations. At time t 0 , the distribution of θ (X 0 ) is es- . At later times, the distribution of θ (X t k ) is estimated using After estimating the distribution of θ (X t k ), Quantile DMC builds a transformation θ k = γ k (θ) so that the distribution of θ k (X t k ) approximates a target distribution ν k . In particular, Quantile DMC builds a transport function 29 from η k to ν k of the form Here, F η k is a distribution function for η k , defined by and F −1 ν k is a quantile function for ν k , defined by Since the transport function γ k maps the quantiles of η k to the quantiles of ν k , we call this algorithm Quantile DMC.
An explicit example of Quantile DMC helps illustrate the main features of this new algorithm. Consider using Quantile DMC to estimate the probability of extreme deviations of the process This is the same nonlinear transformation of the OU process that was responsible for a catastrophic failure of DMC in Section II C. To sample extreme deviations of the process Y t , we define a reaction coordinate θ (y) = y and target distributions ν k = N (0, 1). We use splitting functions Figure 4 presents results of these Quantile DMC simulations. The behavior of Quantile DMC is highly stable. Particles are nudged gently but forcibly in the direction of high θ values. Explicit error calculations confirm that Quantile DMC is just as effective at computing tail probabilities for this nonlinear transformation of the OU process as for the OU process itself. The main advantage of Quantile DMC, compared to DMC, is that Quantile DMC requires less tuning when it is applied to a wide class of nonlinear processes. This robustness is due to the fact that Quantile DMC estimates have the same distribution if the reaction coordinate θ is replaced with any other reaction coordinateθ that is a monotonic, one-to-one transformation of θ. Thus, for example, Quantile DMC is equally effective when sampling from the OU process or from a nonlinear, montonic transformation of the OU process. We note that this property of invariance under monotonic transformations is also shared by Adaptive Multilevel Splitting 30 and Steered Transition Path Sampling 31 and thus appears to be an important property underlying the success of a variety of effective rare event sampling methods. We refer the reader to the appendix for a proof of this theoretical property and further discussion of convergence properties for Quantile DMC estimates.

E. Implementation of Quantile DMC
Quantile DMC provides a method to estimate rare event probabilities with much reduced computational effort compared to direct sampling. For example, Figure 5 compares tail probabilities estimated using Quantile DMC with N = 1, 000 particles to the same tail probabilities estimated using direct sampling with N = 1, 000, 000 particles. When sampling from the OU process, Quantile DMC achieves better accuracy than direct sampling but uses one thousand times less computational power.
FIG. 5. Extreme tail probabilities obtained using Quantile DMC with N = 1, 000 particles are more accurate than tail probabilities obtained using direct sampling with N = 1, 000, 000 particles.
Quantile DMC is straightforward to implement. Whereas some rare event sampling algorithms can require additional simulation time, additional storage, or additional manipulations of the underlying dynamical model compared to direct sampling, this is not the case with Quantile DMC. The only additional cost of using Quantile DMC is the cost of resampling, and the cost of resampling is often negligible compared to the cost of running complex simulations forward in time. For the tropical cyclone simulations presented in Part III, resampling was completed with ten lines of code and a few seconds of processing time.
To maximize the efficiency of Quantile DMC, parameters should be adjusted depending on the particular rare event sampling problem being investigated. To illustrate this tuning process, we investigate optimal parameter choices for sampling from the OU process with resam-pling times, target distributions, and splitting functions With this resampling schedule and this approach to splitting and killing particles, four parameters can impact the quality of Quantile DMC estimates: the timescale parameter α, the tilting constant C, the number of resampling times K, and the number of particles N .
The simplest parameter to analyze is the timescale parameter α. Ideally, the parameter α should be the same timescale as the underlying Ornstein-Uhlenbeck process. However, if the parameter α is over-or under-estimated by a factor of two, we found in our experiments that error increases by less than 20%.
To ensure Quantile DMC's effectiveness, the tilting constant C must be adjusted depending on the rareness of the probabilities being investigated. When estimating a tail probability P {X 1 ≥ U } for the OU process, the optimal tilting constant C lies within one or two units of U . This numerical result is consistent with the fact that particles at time t = 1 are approximately normally distributed with mean C and variance 1, and the N (C, 1) distribution is a suitable importance sampling distribution for estimating P {X 1 ≥ U } when U lies close to C.
Parameters K and N should also be increased as probabilities being investigated become rarer. For example, when estimating P {X 1 ≥ 2}, 100 particles and 10 resampling times yield near-maximal efficiency. When estimating P {X 1 ≥ 4}, 1000 particles and 100 resampling times are required for near-maximal efficiency. In the second situation, the probability being estimated is rarer and consequently more computational power is required.
The efficiency gains from using Quantile DMC instead of direct sampling become most dramatic when the number of particles N exceeds a critical threshold. Figure 6 shows how Quantile DMC error decays quickly, at a faster than N −1/2 rate as N is increased from 10 to 1000. Once the ensemble size reaches N = 1000 particles, then error decreases less quickly, at an asymptotic N −1/2 rate.
The tuning of parameters α, C, K and N must be coupled with a careful selection of reaction coordinate θ to guarantee the effectiveness of Quantile DMC. The goal of Quantile DMC is to emphasize paths leading to a rare event A via splitting and to deemphasize paths leading away from A through random killing. Thus, the ideal reaction coordinate θ should anticipate what paths lead to the rare event A. One specific reaction coordinate that is appropriate for this goal is therefore the conditional probability function Here, the reaction coordinate θ k changes at each resampling time t k . This reaction coordinate is proven to be optimal for a splitting algorithm similar to Quantile DMC 23 . With Quantile DMC, identifying a reaction coordinate can be easier than with DMC because any monotonic, time-dependent transformation of P ( A| X t k = x) serves equally well as a reaction coordinate. However, in a complex, high-dimensional system, it can be challenging even to approximate a monotonic transformation of P ( A| X t k = x). In our analysis of tropical cyclone simulations in Part III, we conclude that identifying an appropriate reaction coordinate requires careful data analysis and scientific insight into the model being simulated.
With an imperfect choice of reaction coordinate Quantile DMC users should take care not to resample too often. In our experiments with the OU process, we can resample more than K = 100 times without any adverse effects because the process of splitting/killing particles is carefully tuned to the underlying dynamics. However, for more complicated problems in which an ideal choice of θ is not available, resampling should be performed as little as possible, while sustaining the necessary particle dynamics 32 . When testing a new reaction coordinate, it is a good strategy to start with just a few resampling times and increase resampling frequency once the coordinate is proven to be effective.
In a complex, high-dimensional system, where the reaction coordinate θ is imperfect, it is essential to gauge the quality of DMC estimates by providing error bars. For estimates it is possible to estimate variance σ 2 using 33 Here, anc ξ K back to an initial particle ξ (i) 0 , anc ξ (j) K is the index i for the initial particle. We note that the variance estimatorσ 2 is most accurate when the sample size N is very high and when the number of resampling times K is small, which is not always the case in practical simulations. Properties of the variance estimatorσ 2 are discussed at more length in the appendix.
In summary, Quantile DMC is straightforward to implement and becomes increasingly effective relative to other sampling methods as parameters of splitting/killing are tuned and as the number of particles N increases. However, Quantile DMC can perform poorly when the reaction coordinate θ fails to anticipate paths leading to rare event states. Lastly, we have shown how to provide rough error bars that assess the accuracy of Quantile DMC estimates.

III. EXTREME MESOSCALE WEATHER
In this section we discuss why it is difficult to estimate the frequency of extreme mesoscale weather and how rare event sampling methods like Quantile DMC can potentially assist in calculations. Then we focus attention on the frequency of intense tropical cyclones and present simulations which illustrate both the potential benefits of rare event sampling and the work that remains to be done.

A. Frequency of extreme weather
While extreme weather events such as heat waves, floods, and tropical cyclones are rare, they can cause immense damage and fatalities 15,16,34 . Understanding the frequency of extreme weather is therefore an essential task, both for real-world disaster preparedness and for assessing weather's impact on society. The study of weather extremes is more relevant than ever, since evidence points to changing frequencies of extreme weather events with climate change [34][35][36] .
To understand the frequency of extreme weather, observations provide the most fundamental data source, but this data has important limitations. For many potential extreme weather events no historical analogue exists. Storms can occur in surprising places. Droughts can afflict new areas. Even when historical data are available, measurements can be sparse and sometimes corrupted 37 . Most critically, as the climate changes, the frequency and intensity of tropical cyclones, of heat waves, and of flooding are expected to change, so that historical measurements will become less relevant [34][35][36] . For all these reasons, climate simulations provide essential additional insight into extreme weather in historical, current, and potential future climates.
When modeling extreme weather, there is a tradeoff between bias and variance. While inexpensive models can be run many times, leading to low-variance estimates, these estimates can be highly biased. More computationally intensive models potentially provide a less biased climatology of extreme weather 17,18,36 yet these models cannot be run for as long or with as many ensemble members, due to limited computational resources. Because of these practical limitations, estimates can have high variance, particularly for the rarest, most extreme weather events.
Mesoscale extreme weather, such as floods and tropical cyclones, provides a stark example of the need for increased statistical accuracy without sacrificing model fidelity. Because mesoscale extreme weather occurs on a smaller spatial scale (10-1000km) at which simplifying assumptions for thermodynamics and dynamics begin to fail, running accurate mesoscale weather models can be enormously expensive. It is here that the computational burden is the greatest. It is here, therefore, that rare event sampling methods stand to provide the greatest benefit.
In extreme mesoscale weather simulations where rare event sampling could potentially provide a benefit, a process known as "dynamical downscaling" 38 is now common. First a Global Climate Model (GCM) simulates a coarse-resolution version of an extreme weather event. Using initial and boundary conditions from the GCM, a high-resolution regional model with more complex physics then enhances the GCM output, simulating the local details of the extreme weather event.
The dynamical downscaling approach is necessary because current GCMs cannot simulate the details of mesoscale weather that are essential for damage assessment. For example, the peak winds of a tropical cyclone are underestimated by a GCM 2 and a GCM cannot resolve the overflowing riverbeds that lead to flooding 18,36 . Versions of dynamical downscaling have become standard in storm surge modeling 3 , flood modeling 36 , and tropical cyclone modeling 2 .
A simple probabilistic interpretation helps clarify how dynamical downscaling can estimate the probability of an extreme weather event occurring. Let A denote an extreme event and let B denote the coarse-scale meteorological conditions that are necessary for event A to occur. We can then write, A GCM is used to evaluate the first probability P {conditions B}; whereas, a regional model is used to evaluate the second probability P { event A | conditions B} using output from the GCM. For example, Bender and coauthors 2 identified protocyclones in a GCM and then used a high-resolution regional model to simulate the intensity evolution of proto-cyclones into full-fledged tropical cyclones.
Applying rare event sampling in a dynamical downscaling context is a multi-tiered process. A GCM can be run either directly or with a splitting method such as Quantile DMC. Then, starting from the intial conditions selected from GCM output, a regional weather model can be run either directly or with a splitting method such as Quantile DMC. At multiple stages of the dynamical downscaling process, statistics can be potentially improved by a judicious application of rare event sampling.
In summary, there is a pressing need for rare event sampling of mesoscale extreme weather, since mesoscale simulations are enormously expensive. The incorporation of rare event sampling into extreme weather calculations can enable higher-resolution, more computationally intensive models, ultimately leading to more accurate extreme weather risk assessment.
B. Tropical cyclone test case

Motivation for simulations
Tropical cyclones rank among the deadliest natural disasters in human history. Approximately 300,000 died in the 1970 Bhola cyclone, and an estimated 138,000 died in the 2008 cyclone Nargis 39,40 . While high-intensity tropical cyclones (TCs) are rare, they are the most destructive and fatal TCs 16,41 . Moreover, the frequency of the most intense storms is expected to increase with climate change, the precise rate of change being an open area of research 35 . Understanding the upper tail of intensity for TCs is therefore of paramount societal concern.
Reducing computational cost is a central priority for the TC modeling community. TCs are most accurately simulated using high-resolution weather models with 1-10km horizontal resolution 17 . High spatial resolution is required to resolve the storm eyewall, where winds are the strongest. Increasing horizontal resolution can lead to more accurate simulations; however, increased resolution comes at a steep computational cost. Doubling horizontal resolution requires an eight-fold increase in computational expense because resolution must be doubled in the zonal and meridional directions, and the timestep must be cut in half to ensure numerical stability.
Helping to alleviate the computational burden of TC modeling, rare event sampling potentially provides a means to accurately estimate TC statistics with a reduced sample size of high-resolution simulations. Here, as a proof of concept, we apply Quantile DMC to estimate statistics for two high-intensity TCs. For initial and boundary conditions, we use reanalysis data for two storms that achieved Category 4 status in the real world but did not achieve the highest intensity level, Category 5 status. We model these storms using a stochastic model that predicts a range of possible intensities, different from the real-world intensities of Hurricane Earl and Hurricane Joaquin. Starting from coarse-scale proto-cyclones, we nudge the evolution of storms toward high intensities using the Quantile DMC algorithm and then estimate the probability of high-intensity manifestations of these storms. We then compare the efficiency of Quantile DMC to the efficiency of direct sampling for this estimation problem.
These simulations are envisioned as a first step toward the goal of using Quantile DMC to study the probability of intense TCs in historical, present, and future climates. In our simulations, we use Quantile DMC to provide statistics for just two storms. In the future, however, as Quantile DMC is applied to study the frequency of intense TCs in different climates, it will be necessary to start simulations from an ensemble of hundreds of different proto-cyclones. For these initial investigations, we use a small sample size N = 100, whereas a sample size of N = 1000 would be more appropriate for a full implementation of Quantile DMC in the future.

Simulation details
The 2010 storm Earl was a long-lived hurricane of tropical origin that came very close to the Eastern seaboard of the United States but ultimately did not make landfall, passing 150km off the coast of Massachusetts. The 2015 storm Joaquin was a hurricane of extratropical origin that intensified more rapidly than expected, leading to the worst U.S. maritime disaster in decades, the sinking of the cargo ship El Faro with all 33 sailors aboard 42 .
We model Hurricanes Earl and Joaquin using the Advanced Research Weather Research and Forecasting Model 43 (ARW, version 3.9.1.1), which has been applied extensively for hurricane research in the past 17 . ARW is a finite-difference model whose governing equations include energy conservation, mass conservation, a thermodynamic law, and an equation of state. ARW incorporates parametrized physics schemes for precipitation processes, heat/moisture fluxes over land, radiation, and mixing in atmospheric columns. ARW uses artificial dissipation and filters to achieve numerical stability.
We have previously used the ARW model to simulate intense tropical cyclones 14 ; however, our previous simulations did not incorporate the Quantile DMC algorithm. In Plotkin et al. 14 , we used an optimization strategy to identify a maximum likelihood pathway for a simulated tropical cyclone to achieve a high intensity at a terminal time. While the optimization approach is useful for identifying factors that can cause tropical cyclones to intensify, it is less suitable for computing tropical cyclone statistics. Statistics of TCs can depend on myriad possible paths, and sampling these paths is a necessary requirement for accurate estimation of statistics. Thus, in the current work, we present Quantile DMC as an additional tool, uniquely suited to low-variance calculation of tropical cyclone statistics.
Among weather models, ARW has the advantage that it supports vortex-following nested domains. Three domains of different resolutions are often used in TC sim-ulations. The outer domain is static, while the inner domains follow a local minimum in the 500hPa geopotential height field, indicating a TC's location. Vortexfollowing domains enable high resolution around the eye of the storm without the computational expense of high resolution across the entire storm path.
Our simulations use a timestep of 6.7s and horizontal resolution of 2km in the inner domain, a timestep of 20s and resolution of 6km in the middle domain, and a timestep of 60s and resolution of 18km in the outer domain. The inner domain stretches 468km x 468km, the middle domain 1404km x 1404km, and the outer domain 5382km x 5382km. All domains use 40 vertical levels. The physics parametrizations are the same as used by Judt, Chen, and Berner 44 . Convection is explicitly simulated in the two inner domains whereas convection is parametrized in the outermost domain.
Simulations are randomly perturbed using the Stochastic Kinetic Energy Backscatter (SKEB) scheme 45 . Perturbations from this physics scheme are smooth in space and time, but they change rapidly, modeling the effects of small-scale turbulent processes. Potential temperature and the non-divergent component of horizontal wind are both independently perturbed with forcing terms where H j,k are the Fourier modes for the domain and C j,k,z are constants that produce a westward phase tilt in the perturbation field. F j,k terms evolve randomly, according to a complex-valued Ornstein-Uhlenbeck process with decorrelation timescale α −1 ≈ 0.5hrs, that is, where W j,k denote independent complex-valued Brownian motions and the noise amplitude σ 2 j,k decreases as a power law of j 2 + k 2 . The SKEB scheme is implemented as a physics module within the ARW software. The paper of Berner et al. 45 describes how this physics module discretizes the underlying OU process dynamics as a first-order autoregressive process. At every ARW timestep, the perturbation field is updated, and perturbations provide a series of small, frequent changes influencing model dynamics.
In our previous simulations of tropical cyclones 14 , we used an alternative probabilistic model for ARW perturbations, different from the SKEB scheme. In Plotkin et al. 14 , a Gaussian perturbation is applied once per hour to the ARW model. The optimization strategy used in Plotkin et al. 14 would not have been practical to apply to the SKEB scheme. The SKEB scheme perturbs simulations many times per minute, making a derivative-based optimization challenging, but leading to small and physically realistic perturbations.
We use direct sampling and Quantile DMC to sample extreme intensities of Earl and Joaquin with an ensemble size of N = 100 simulations. Direct sampling runs are seven-day forward runs of the ARW model with SKEB. Quantile DMC runs are seven-day forward runs, which incorporate splitting and killing of trajectories. Since the ARW model with SKEB is a Markovian model, Quantile DMC provides unbiased estimates of a wide range of statistics of tropical cyclones, including pathways, characteristics, and frequencies. For simplicity, however, this paper discusses only the probability of intense tropical cyclones occurring.

Earl
Joaquin t0 start August 27 00:00 September 29 00:00 t1 resample 1 August 30 00:00 October 1 00:00 t2 resample 2 August 31 00:00 October 2 00:00 t3 resample 3 August 31 12:00 October 2 12:00 t4 resample 4 September 1 00:00 October 3 00:00 t5 end September 3 00:00 October 6 00:00 Table II describes the specific start and end times for the simulations. Start times are selected near the beginning of the hurricane life cycle. End times are selected seven days after start times, which gives hurricanes sufficient time to reach peak intensity and then recede for at least two days. For the Quantile DMC runs, we select a time T when we expect each storm to achieve peak intensity. We resample at times t 1 = T − 48hr, t 2 = T − 24hr, t 3 = T − 12hr and t 4 = T . Thus, resampling increases in frequency in an attempt to maximize Quantile DMC efficiency. Resampling stops once hurricanes are expected to achieve peak intensity, since resampling too often can increase the variance of Quantile DMC estimates. The specific time T for each storm is identified using official best track historical data 46 .
The parameters for our Quantile DMC simulations are defined as follows: 1. The reaction coordinate θ is the deviation of sea surface pressure from a hydrostatically-balanced reference state 43 at the storm core.
2. We assume the reaction coordinate θ can be modeled as a monotonic, time-dependent transformation of an OU process. Since N (0, 1) distributions work well as target distributions when sampling from the OU process, our target distribution is N (0, 1) at each adaptation step.
3. Splitting functions take the form V k = Ce α(t−T ) θ k . The decorrelation timescale α −1 = 3d is the appropriate timescale for large-scale differences to emerge in TC development in the ARW model 44 . The splitting constant C = 1 is appropriate for estimating intensity quantiles up to the 99.9th percentile of intensity. Moreover, when sampling an OU process with a splitting constant of C = 1, four rounds of resampling and a sample size of N = 100 simulations are sufficient to ensure the effectiveness of Quantile DMC compared to direct sampling.
For both Quantile DMC and direct sampling, equivalent computing resources are required on the University of Chicago Research Computing Center highperformance cluster: 100 nodes ran continuously for 2 days with 28 CPUs per node and 2 gigabytes of RAM per CPU.

Simulation results
In Figure 7, we present intensity trajectories for direct sampling and Quantile DMC runs for Hurricane Earl and Hurricane Joaquin. Direct sampling trajectories typically occupy the middle quantiles of intensity, whereas Quantile DMC trajectories are more likely to occupy the upper quantiles of intensity. For example, direct sampling produces zero Category 5 realizations of Earl and only four of Joaquin. In contrast, Quantile DMC produces three Category 5 realizations of Earl and 22 of Joaquin. Therefore Quantile DMC is successful at simulating more intense storms compared to direct sampling.
When presenting results, we measure TC intensity using minimum sea surface pressure. While it is also common to see TC intensity reported using maximum wind speed, econometric analysis finds that minimum sea surface pressure is a better predictor of TC damage and fatalities than is wind speed 41 . Pressure combines information on wind speed and storm size 47 , thereby giving a more holistic indication of TC damage 48 . We note that historically the Saffir-Simpson hurricane scale combined maximum wind speed, minimum pressure and maximum storm surge information to classify TCs into Categories 1 (least intense) to 5 (most intense). More recently, the scale was renamed the Saffir-Simpson hurricane wind scale, and Categories 1 -5 are defined by maximum wind speeds alone 49 . When presenting results, we use the historical Saffir-Simpson hurricane scale to define Categories 1 -5 in terms of minimum sea surface pressure.
We can use data from Quantile DMC and direct sampling runs to estimate a range of statistics associated with storm intensity. In particular, we estimate cumulative distribution functions for random variables P T and P lif e . P T is TC intensity at the particular time T when each storm is expected to reach maximum strength, namely, September 1 00:00 UTC for Earl and October 3 00:00 UTC for Joaquin. P lif e is the strongest TC intensity over the entire seven days of simulated time. The cumulative distribution function for P T is defined by F (U ) = P {P T ≤ U }, where U ranges over all possible pressures. To estimate P {P T ≤ U } from Quantile DMC data, we use To estimate P {P T ≤ U } from direct sampling data, we use We apply analogous formulas when estimating the cumulative distribution function for P lif e . Figure 8 provides side-by-side comparisons between the direct sampling and Quantile DMC estimates. Intensity estimates from Quantile DMC are statistically consistent with intensity estimates from direct sampling. For example, the estimated probability for Earl to reach Category 5 status is 0% from direct sampling and 0.2% from Quantile DMC. The estimated probability for Joaquin to reach Category 5 status is 4% from direct sampling and 6% from Quantile DMC. The disagreement between estimates falls well within the range of random error, particularly if one or the other estimate has high relative variance. The cumulative distribution functions estimated from direct sampling exhibit sharper jump discontinuities compared to the relatively smooth behavior of the CDFs from Quantile DMC. This jumpy behavior may reflect a greater degree of error in the direct sampling estimates if it can be assumed that the distribution of hurricane pressures is smooth.
To check the hypothesis that Quantile DMC estimates are more accurate, we can use data from Quantile DMC and direct sampling runs to gauge the variance in our estimates. For Quantile DMC estimates of P {P T ≤ U }, we assess variance usinĝ For direct sampling estimates of P {P T ≤ U }, we assess variance usingσ 2 = 1 Np (1 −p). Figure 9 provides side-by-side comparisons between direct sampling and Quantile DMC relative variancesσ 2 / p 2 . For many important rare event sampling estimates, Quantile DMC provides substantial variance reduction compared to direct sampling. When estimating the P T distribution, Quantile DMC gives reduced variance for all pressures lower than 925hPa for both Earl and Joaquin. When estimating the P lif e distribution, Quantile DMC gives reduced variance for all pressures lower than 925hPa for Earl and 916hPa for Joaquin. At the lowest pressures, the variance of Quantile DMC is two to ten times lower than the variance of direct sampling.
An important reason for presenting the P T distribu- tion is to show that Quantile DMC is highly effective at sampling intense storms at the reference time T . More generally, Quantile DMC is effective at sampling intense storms in the twelve hours leading up to the reference time T and the twelve hours following the reference time T . Outside this window of time, Quantile DMC may be less effective at sampling intense storms. It is therefore of central importance to make sure the reference time T aligns with the time of maximum intensity. To achieve this goal, it may be necessary to run the model in advance of Quantile DMC simulations or at least predict based on initial and boundary conditions when the model will achieve peak intensity. The one anomalous result in the pattern of variance reduction due to Quantile DMC is Quantile DMC's limited benefit when estimating the P lif e distribution for Hurricane Joaquin. To shine light on the limitations of Quantile DMC for this particular estimation problem, we can examine the family weights contributing to the variance estimatorσ 2 . When estimating the probability for Joaquin to reach Category 5 status, the eight nonzero family weights W (i) are 0.2, 0.2, 0.3, 0.3, 0.5, 0.6, 0.9, and 3.1. The largest family weight of W (i) = 3.1 accounts for 85% of the varianceσ 2 . Without this particular W (i) value,σ 2 would be 3.5 times smaller for Quantile DMC compared to direct sampling. Because of this particular W (i) value,σ 2 is instead 1.9 times larger for Quantile DMC compared to direct sampling.
The largest weight of W (i) = 3.1 belongs to a family of late-developing storms. The initial simulation in this family exhibited low intensity at the first resampling time, but it was not killed during the resampling step due to random chance. Over time, the simulation increased in intensity. The single particle was eventually split into six family members, and ultimately all six family members achieved Category 5 status. The high variance of Joaquin P lif e estimates can be wholly attributed to this single family of late-developing storms.
The story of the largest family weight W (i) = 3.1 illustrates key areas in which our design of Quantile DMC simulations of tropical cyclones was not optimal. First, the reaction coordinate θ failed to anticipate which trajectories would lead to high intensities at later times. At the first resampling time, the value of θ ξ would go on to achieve Category 5 status. This rapid intensification was not predictable using sea surface pressure as our reaction coordinate. However, an improved reaction coordinate θ could predict lifetime intensity based on additional variables such as steering flow, vertical wind shear, and relative humidity. With an improved reaction coordinate, future intensity could be more accurately identified, thereby reducing Quantile DMC variance.
A second shortcoming in the design of simulations was the poor identification of the time T for Joaquin to reach peak intensity. For Earl, the reference time T was correctly identified as September 1 00:00 UTC, the approximate time when storms achieved peak intensity in direct sampling and Quantile DMC runs. For Joaquin, on the other hand, the reference time T was incorrectly identified as October 3 00:00 UTC, nearly twenty-four hours before peak intensity occurred in direct sampling and Quantile DMC runs. With a later reference time T , latedeveloping storms could be more appropriately sampled, reducing the variance of P lif e estimates for Hurricane Joaquin.
We report three takeaway messages for future applications of Quantile DMC to study intense tropical cyclones.
First, with an increased ensemble of N = 1000 simulations, we are optimistic that Quantile DMC can provide low-error estimates of extreme tail probabilities. Already with a small sample of N = 100 simulations, we see signs of reduced variance using Quantile DMC. But with increased sample size N , we expect the error of Quantile DMC to shrink with a faster-than-N −1/2 scaling, enhancing Quantile DMC's advantages over direct sampling. When estimating the most extreme quantiles of intensity, a splitting method such as Quantile DMC truly excels.
Second, when simulating cyclones, Quantile DMC is a more convenient algorithm to use than standard DMC. Using Quantile DMC, the decision to select a splitting constant of C = 1 is straightforward. In contrast, using DMC, similar results can only be obtained with the foresight to select a splitting constant C with C −1 being the standard deviation of intensity, namely 7hPa to 8hPa. Moreover, in future simulations of tropical cyclones, there is the possibility of highly skewed or bimodal intensity distributions 50 . In these contexts, the additional robustness of Quantile DMC over DMC may provide a further benefit.
Lastly, to maximize the potential benefit of Quantile DMC in future simulations, it is of paramount importance to improve the reaction coordinate θ and estimates of the time of maximum intensity . In our simulations, the poor identification of these parameters led to added variance in some Quantile DMC calculations. Improved parameters θ and T would better predict lifetime storm intensities, alleviating the problem of variance inflation.
The search for improved predictions of lifetime intensity is challenging in part due to a limited understanding of the precursors and dynamics of rapid intensification 51 . In complementary work, we develop a rare event analysis tool that offers insight into the physics of TC rapid intensification 14 . Such a technique could potentially identify a more suitable and predictive reaction coordinate for future Quantile DMC applications. By incorporating careful data analysis and scientific insight into the model being simulated, future work can potentially improve the efficiency of Quantile DMC in tropical cyclone simulations.

IV. CONCLUSION
Efficient sampling of extreme mesoscale weather remains one of the outstanding computational challenges of the 21st century. Extreme weather, such as tropical cyclones, squall lines, and floods, has a tremendous impact on human society, yet assessing the frequency of extreme weather in past, current, and projected future climates is extremely difficult. Responding to this challenge, we have introduced a new rare event sampling algorithm, Quantile DMC. Combining Quantile DMC with dynamical downscaling provides a new paradigm for calculating extreme weather statistics. This approach potentially enables high-accuracy, computationally-intensive models to be run with reduced computational cost, raising the quality of extreme weather statistics.
In Sections II D and II E, we have provided a practical guide to using Quantile DMC. In particular, we offer specific recommendations for the parameters to be used in the algorithm. When computing tail probabilities for the Ornstein-Uhlenbeck process, Quantile DMC is over a thousand times more efficient than direct sampling and is more stable than Diffusion Monte Carlo. When computing tail probabilities for intense tropical cyclones, Quantile DMC is two to ten times more efficient than direct sampling, with the possibility for greater efficiency in future simulations.
There remain important challenges in applying Quantile DMC to simulate extreme weather events. In our simulations of tropical cyclones, we observe that Quantile DMC's performance depends on a reaction coordinate, a one-dimensional coordinate that anticipates the occurrence of high-intensity weather. The reaction coordinate that we used in our simulations was not optimal, and we anticipate using an improved reaction coordinate in future TC sampling. Fortunately, even with an imperfect choice of reaction coordinate, it is possible for Quantile DMC to provide a reduction of variance.
We acknowledge two issues that affect the future of rare event sampling of extreme weather. First, splitting methods like Quantile DMC can only be successful if extreme weather is simulated stochastically. However, stochastic models are easily available and increasingly used in many state-of-the-art geophysical computations 52 , so this does not present a major limitation in practice. Second, rare event sampling is unnecessary if there exist models that are both accurate and inexpensive to run. While we acknowledge some statistical models and simplified physics models perform remarkably well in today's extreme weather calculations 38,53 , our outlook is to the future. We believe high-resolution three-dimensional models will eventually provide the greatest accuracy in all areas of extreme weather inference. Developing rare event sampling methods is therefore essential preparation for the future as computationally intensive ensem-bles become the authoritative source of insight in extreme weather inference.

ACKNOWLEDGMENTS
We acknowledge support from that National Science Foundation under NSF award number 1623064. RW and JW are supported by the Advanced Scientific Computing Research Program within the DOE Office of Science through award de-sc0014205. RW was supported by NSF RTG award number 1547396 at the University of Chicago and by a MacCracken Fellowship at New York University. DP was supported by the Department of Energy Computational Science Graduate Fellowship Program of the Office of Science and National Nuclear Security. MON was partially supported by the T.C. Chamberlin Postdoctoral Fellowship at the University of Chicago. The work was completed with resources provided by the University of Chicago Research Computing Center. RW acknowledges Alicia Zhao for her gracious and patient editorial assistance.

A. Resampling schemes
There are many possible resampling schemes that can be used during DMC's resampling step. One scheme that works well in practice is sorted stratified resampling 27,28 . The scheme first sorts particles ξ using stratified resampling.

Sorting:
Reindex the particles and weights w as follows: Select updated particlesξ  It can be checked that each particle ξ (i) k is duplicated an expected number of w (i) k /w k times. Therefore, sorted stratified resampling is a valid resampling scheme.

B. Invariance under monotonic transformations
Quantile DMC is unchanged if the reaction coordinate θ is replaced with a reaction coordinateθ that is a monotonic, one-to-one transformation of θ. To show this we provide an alternate description of Quantile DMC's process for splitting/killing particles that makes clear the property of invariance under monotonic transformations.
First, we introduce an order relation x ≺ y which indicates θ (x) < θ (y) and an equivalence relation x ∼ y which indicates θ (x) = θ (y). We observe that the order relation x ≺ y and equivalence relation x ∼ y remain unchanged if θ is replaced byθ.
Second, define the quantiles p where The quantitities p (i) k are approximate quantiles for the distribution θ (X t k ). Thus, if p (i) k = .9 there is an approximate one-in-ten chance that θ (X t k ) takes a value as high as θ ξ (i) k . Third, the splitting function takes the values In this description Quantile DMC only relies on the reaction coordinate θ through the order relation x ≺ y and equivalence relation x ∼ y. These two relations are unchanged under monotonic, one-to-one transformations of the reaction coordinate.
In addition to the property of invariance under monotonic transformations, Quantile DMC also has the property that estimates are unbiased, which can be established following standard martingale arguments 26,27 . Numerical evidence indicates that estimates converge with an asymptotic 1/ √ N error rate, as N → ∞. However, a rigorous mathematical analysis of Quantile DMC's error remains a task for future research.

C. Variance estimation for Quantile DMC
The variance estimator (28) for Quantile DMC was originally developed assuming a different resampling scheme called Bernoulli resampling is used 33 . Indeed, when the DMC algorithm is performed using Bernoulli resampling, the variance estimator is asympotically consistent as N → ∞. We have chosen to use a different resampling scheme which gives better performance than the Bernoulli resampling scheme. Consequently the variance estimatorσ 2 is biased toward overestimating the variance. Numerical experiments with the Ornstein-Uhlenbeck process suggest that the variance is inflated by less than 10%, at least for K ≤ 10 resampling times. Another potential concern is the natural variability in the estimatorσ 2 . We find thatσ 2 is most reliable when family weights are nonzero for many indices i. Consequently,σ 2 is most reliable when sample size N is high and the number of resampling times K is low. We find that the variance estimatorσ 2 provides a valuable tool for gauging the quality of Quantile DMC estimates. We caution the reader, however, there may be situations outside the current context where the variance estimatorσ 2 behaves poorly.