Can we use linear response theory to assess geoengineering strategies? Can we use linear response theory to assess geoengineering strategies?

Geoengineering can control only some climatic variables but not others, resulting in side-effects. We investigate in an intermediate-complexity climate model the applicability of linear response theory (LRT) to the assessment of a geoengineering method. This application of LRT is twofold. First, our objective (O1) is to assess only the best possible geoengineering scenario by looking for a suitable modulation of solar forcing that can cancel out or otherwise modulate a climate change signal that would result from a rise in carbon dioxide concentration [CO 2 ] alone. Here we consider only the cancellation of the expected global mean surface air temperature ∆ (cid:104) [ T s ] (cid:105) . It is in fact a straightforward inverse problem for this solar forcing, and, considering an inﬁnite time period, we use LRT to provide the solution in the frequency domain in closed form as f s ( ω ) = ( ∆ (cid:104) [ T s ] (cid:105) ( ω ) − χ g ( ω ) f g ( ω )) / χ s ( ω ) , where the χ ’s are linear susceptibilities. We provide procedures suitable for numerical implementation that apply to ﬁnite time periods too. Second, to be able to utilize LRT to quantify side-effects, the response with respect to uncontrolled observables, such as regional averages (cid:104) T s (cid:105) , must be approximately linear . Therefore, our objective (O2) here is to assess the linearity of the response. We ﬁnd that under geoengineering in the sense of

Geoengineering strategies with the aim of mitigating climate change are receiving increasing attention, 1-10 not only because of their potential to solve one of the greatest challenges faced by modern society, but also because of the great risk that such an unprecedented endeavor entails. Here we would like to advocate that the study of climate change in general, and geoengineering in particular, would benefit from response theory 11,12 and the theory of nonautonomous dynamical systems. [13][14][15][16][17][18][19][20] These mathematical tools were introduced into climate science many years ago, [21][22][23] but only recently have they started to really gain traction. [24][25][26][27][28][29][30][31][32][33][34][35][36][37] The first application of response theory to the study and efficient assessment of geoengineering in particular was by Kravitz and MacMartin. 38 They assessed the linearity of the response, but regarding global averages only. However, regional temperature responses to radiative forcing can be nonlinear, 32,[39][40][41] and there has been an indication 39 that they can be nonlinear in the case of geoengineering too. We show that it is possible to describe in a concise and general way the response of the climate system to two or more forcings with given timedependent modulations. In particular-and this is the case of interest in geoengineering-if a forcing is given, one can arrange the time modulation of N other forcings in such a) Electronic mail: bodai@pusan.ac.kr a way as to achieve a desired time-dependent change for N climatic observables of interest. The pitfall of this approach is that (a) the response of any other observable is, in principle, uncontrolled and (b) nonlinearities can become more and more relevant as forcings are added to the system. This indicates that there are some fundamental caveats in the setup of geoengineering strategies.

I. INTRODUCTION: USING RESPONSE THEORY TO FORMULATE GEOENGINEERING STRATEGIES AND OUTCOMES
First we summarize briefly the existing mathematical tools (Sec. I A) that will provide us with the framework to describe the geoengineering problem as an inverse problem (O1) (Sec. I B). We will then elucidate the utility of this inverse problem approach and compare it with the alternative control problem and other approaches (Sec. I C), and we will subsequently provide the context for the need to assess geoengineering strategies (O2) (Sec. I D).

A. Elements of response theory
In nonautonomous dissipative dynamical systems, like the climate system, given in the forṁ the response of the system to an external forcing g(x,t) can be unambiguously defined in terms of the so-called snapshot attractor 15 of the system, and the natural probability distribution or the measure µ(x,t) supported by it. This applies also to chaotic systems, when the snapshot attractor is a fractal object. Both the attractor and the measure are unique objects; they are defined by an ensemble of trajectories initialized in the infinite past. The time dependence of the snapshot attractor, also called a pullback attractor, 16,18,42 and its measure give what is often termed the "forced response," and their geometrical details at any instant describe (statistical aspects of) the internal variability in a conceptually sound sense. 36 For a scalar observable Ψ(x) too, the (forced) response is uniquely given by a projection of the measure. Response theory 12,43,44 asserts that the most basic ensemble-based statistics, the mean Ψ (t) = dx Ψ(x)µ(x)(t) can be decomposed into linear ( j = 1) and nonlinear ( j > 1) contributions: where the first-order, i.e., linear, term can be obtained as whereμ(x) is the natural invariant measure of the autonomous system (g = 0), and the operators are defined as L F µ = −div(F µ) and L gμ = −div(gμ). In (2), Ψ 0 is the unperturbed (ε = 0) expectation, and the series converges only if the forcing εg(x,t) is small enough. If the forcing depends on time in a multiplicative fashion, g(x,t) = g(x) f (t), then we can write Ψ (1) (t) = G (1) where the Green's function is implied by Eqs. (3) and (4) to be G (1) Note that the higher-order terms Ψ ( j) can be expressed as multiple convolution integrals involving multitime Green's functions. 32 Taking the Fourier transform (FT) of Eq. (4), we have, via the convolution theorem, 45 a response formula in the frequency domain: where χ (1) is called the linear susceptibility.

B. The geoengineering problem
It has been proposed 3 that the effect of greenhouse forcing can be mitigated by applying another external forcing to the Earth system by some geoengineering means that has, in a way, an opposing effect. There are various types of forcing that can achieve this, but here we will consider those-generically referred to as "solar-radiation management" (SRM) 46,47 -that can be modeled by a modulation of the solar constant. We will call this simply the "solar forcing." Clearly, these are means that modulate the shortwave incoming radiation. Readily proposed geoengineering methods include a fleet of reflective satellites of large Sun-facing surface area put into orbit around the Earth, aerosols sprayed into the atmosphere, artificially generated clouds, etc. A modulated solar constant model represents these geoengineering scenarios with various degree of approximation, not necessarily a good approximation. 5 Formally, the geoengineering problem involves a forced/nonautonomous system, where at least two terms contribute to the forcing. For simplicity, to start with, we consider the case of only two forcing terms, both of which are additive; that is, the dynamical system of interest iṡ x = F(x) + ε(g g (x) f g (t) + g s (x) f s (t)), where the subscripts already indicate the physical meanings of the forcings, g for "greenhouse" and s for "solar." Except for the need for a convergent series in Eq. (2), an arbitrary value can be assigned to the "small" parameter ε, and in order to obtain a result in the uncomplicated form of Eq. (10), we choose the same ε for both forcing components. Equation (4) implies that the first-order contribution Ψ Σ (1) (t) to the total response ∆ Ψ Σ under combined forcing, i.e., geoengineering (where the subscript in Ψ Σ is to indicate the presence of multiple forcings), can be written as the superposition of first-order contributions to respective responses to the two forcings in two separate scenarios when these forcings are acting alone. Formally, this is expressed as (1) (t) = G (1) Ψ,g (t) * f g (t) + G (1) The FT of this equation is Note that the nonlinear response is more complicated with multiple forcings present: it is not just a sum of multiple convolution integrals 32 as in the case of a single forcing scenario. If the "forward" problem is the prediction of the response to a given forcing, then the inverse problem of "predicting" the necessary forcing for a desired response, being our objective (O1), seems to be well defined in view of the above equations. To a linear approximation, the necessary or required forcing is In the above, ε = 1 is taken. The case of two forcings, one greenhouse and one geoengineering, can be generalized to N + 1 forcings when we desire to control N climate observables Ψ T = (Ψ 1 , . . . , Ψ N ) by modulating N geoengineering forcings f T s = ( f s1 , . . . , f sN ). With these, generalizing Eq. (9), we can write in matrix form 48 This equation can be inverted to give the vector of geoengineering forcings f s : provided the inverse of the matrix χ Ψ,s exists. The problem of static response is considered by Lu et al., 49 who take the N geoengineering forcings as those acting at N different gridpoints of a climate model and look for forcing fields to which the climate system is most susceptible.
C. The utility of the inverse problem and its alternatives In Appendix C, we provide a procedure to solve the inverse problem with N = 2 using discrete and finite time series. That situation can be interpreted as a control problem, which is in fact a rather special type of optimal control. This way, the required forcing can be predetermined and need not be updated during its application. In a different approach in, for example, Refs. 46 and 47, the solar forcing was constructed on the basis of some models of how much radiative forcing a sudden change of some greenhouse gas concentration or the stratospheric optical depth would yield. In addition, these authors created a scenario ensemble of SRMs, and selected the most effective SRMs. The latter assessment strategy is clearly rather inefficient and inaccurate, and that would still be the case had the ensemble been generated using response theory. We note that MacMartin et al. 50 proposed for the first time to solve an inverse problem as a "design problem" for geoengineering. They invert the analytical solution of a conceptual model for the global average surface temperature, the parameters of which conceptual model are inferred via fitting it to Earth System Model simulation data. Our method is more generic in that it does not require analytic inversion or the use of a simplified model, and it can also consider any observable to be controlled, not just the global average temperature.
The inverse problem would have a direct practical relevance were we to have f g (t) a given, as assumed. However, this is clearly not the case: predicting greenhouse gas emissions is an extremely complicated task, since it is determined among others by social processes, for which we do not have good models. Nevertheless, efforts are underway 51,52 (see also https://crescendoproject.eu/ research/theme-4/). The current standard practice to tackle this challenge, as reflected by the IPCC reports, 1 is to consider half a dozen so-called "methodologically constructed" twenty-first century emission scenarios. This way, instead of climate predictions, one produces so-called climate projections belonging to hypothetical future emission scenarios. Therefore, the solution to our inverse problem has a rather indirect practical relevance; the inverse problem approach would allow us to carry out scenario analyzes. The reader can find elsewhere 50,53-55 descriptions and analysis of a feedback control problem of direct practical relevance, when the solar forcing is being determined in real time with the use of some controller, adapting to a progressing greenhouse forcing, trying to realize the desired response. Note that with feedback control, in a scenario analysis setting, a new simulation needs to be run for each emission scenario, making it very inefficient for an extensive assessment exercise. Note also that for feedback control, what can be observed as a reference (the basis of feedback) is not the forced response in terms of an ensemble, but only a single realization. Therefore, what the feedback and open-loop control strategies would respectively realize-the climatology in terms of say ensemble means, on the one hand, and the internal variability, on the other-could be very different. They are expected to be more significantly different the stronger the internal variability exhibited by the observable Ψ chosen to be controlled, e.g., in the case of local vs global average quantities. In an extreme case, one can consider a geoengineering method designed 7 to extinguish the oscillatory El-Niño phenomenon, presumably in order to prevent floods or droughts that are part of the internal variability of Earth's climate.
We point out that in, for example, Eq. (10), we write Ψ to denote a generic observable. This means that we can choose a particular (scalar) observable that we desire to evolve in a particular way. With reference to the classic term "global warming," in contrast to "climate change," we will attempt to enforce cancellation of the global average surface air temperature (Sec. III A). With the increasingly wide-ranging analyzes of climate change scenarios, however, it is clear that "climate change" should have a comprehensive meaning, and not just be a synonym for "global warming". 56 In fact, physical quantities other than temperature could have a greater social or ecological impact. 1 Therefore, unlike in the present work, in practice, we might want to choose some variable other than global temperature to control. Beside the physical type of the observable quantity, we can make arbitrary choices with respect to the spatial scale of the quantity, such as local or regional averages (Sec. III A 2), zonal averages (Appendix E), global averages (Sec. III A 1), etc.

D. This paper
We now turn to motivating the need for comprehensive assessment of geoengineering scenarios. Once an observable Ψ is chosen to evolve in a particular way, which determines f s (t) according to Eq. (10), the evolution of any other observable Φ will be a given-the solution of a forward problem formally identical to (9): yet with an f s given by Eq. (10).
, which is the generic case. Regarding the desire for cancellation, ∆ Ψ Σ = 0, we can frame 57 geoengineeringconsidering for simplicity only quasistatically slow changes of f g (t)-as a confinement to the 0 isoline of ∆ Ψ Σ over the plane of f g and f s . In general, this isoline is different for different observables Φ = Ψ; i.e., under a linear response, these straight isolines fan out from the origin of the f g -f s plane. This is illustrated in Fig. 1, where the curvature of the isolines for larger values of f g and f s reflect also the more general situation of nonlinear responses. It is a straightforward implication that when the system is confined to one isoline, it cannot be confined to the different isolines of other variables Φ i ; i.e., (unwanted) changes ∆ Φ i,Σ = 0 will ensue. In other words: the proposed geoengineering method will provide just a partial solution at best. While one aspect of the problem is solved, other aspects can be neglected, or even changed to the worse, possibly with catastrophic consequences. 58 A long list of studies have to date addressed the issue of side effects; see, e.g., Refs. 5,38,46,47,50,[59][60][61]. This possibility is the main motivation of our present investigation too, concerning in particular the question (O2) of whether linear response theory can provide an efficient tool to map out and quantify accurately the various side effects of a variety of scenarios. A comprehensive assessment would consider a variety of geoengineering scenarios, emission scenarios, Earth System Models, and possibly other things, for which the efficiency of computation is crucial. In this study, having enforced (approximately, to various degrees) a cancellation of global average surface air temperature, ∆ Ψ Σ = ∆ [T s,Σ ] ≈ 0, we will diagnose unwanted changes, i.e., the total response, in terms of • Φ = T s : regional averages of the surface temperature (Sec. III A 2); • Φ = T tr : regional averages of the temperature near the tropopause (Sec. III A 2); • Φ = [P y ] and P y : the annual precipitation (Sec. III B).
Note that we denote spatial averaging by square brackets, subscripted by the spatial variable(s) with respect to which we average over its whole range, e.g., longitudes λ for zonal averages. However, for global averaging we drop the subscripting altogether (instead of writing, e.g., [T s ] λ ,µ ). Some of these observables have been considered in a number of studies, 5,38,46,47,60,61 and our results are mostly consistent with the published ones; however, as our novel contribution (O2), we will also investigate carefully whether these responses can be predicted by linear response theory.
The premise of our objective as set out above is that of Robock, 62 recently quoted in a blog by Kravitz,7 in which blog he asks the questions whether "there is only one thermostat" and whether "the climate can be optimized regionally." Regarding the second of these, Ban-Weiss and Caldeira 63 and MacMartin et al. 64 applied different spatial patterns to reduce side effects to surface temperature. This idea is actually already covered by our framework with N = 2, in which the function g s (x) in Eq. (7) needs to be specified accordingly. Regarding the first question, Kravitz et al. 55,65 proposed that one names multiple objectives and looks accordingly for multiple suitable "control knobs" for the climate. They employed a feedback control. The alternative-the inverse problem approach (O1) that we propose here-is shown above to be straightforward to generalize to the multiple-objectivemultiple-control-knob situation, simply through the possibility of solving the linear matrix equation (11) by inverting for f s . With regard to side effects, however, whatever way we construct the geoengineering forcing, the situation is hardly different from the single-objective-single-control-knob situation: there will be objectives that we could inadvertently miss from the list, or objectives that are not convenient to include, and then we need to assess the side effects in terms of the corresponding uncontrolled observables-or, indeed, assess the climatology as comprehensively as possible or as is desired. Regarding the assessability of side effects using response theory (O2), even if nonlinear response formulae are available, feasibility might be hampered by an increasing number of objectives or control forcings.
We point out that in the Planet Simulator intermediatecomplexity GCM (PlaSim in short), 66 the greenhouse and solar forcings have been found to be approximately equivalent in terms of the stationary response of the global average surface air temperature 67 insomuch that its isolines are parallel straight lines (even if there is a curvature of the surface). This was found to be the case in rather extensive ranges of the forcings, 90-2880 ppm and 1200-1500 W m −2 , respectively. That is, any curvature of the blue line as shown in Fig. 1 occurs outside of the said ranges. However, in the context of geoengineering, the concern is whether these forcings are equivalent in the same sense in terms of other variables (Φ) too, as discussed. We will demonstrate in PlaSim that with regard to regional averages T s , the correspondence of forcings is still remarkable, but there is nevertheless a residual response with a nontrivial pattern under geoengineering. Furthermore, our analysis hints that (O2) this residual response might not be so linear, and less so for precipitation, which goes beyond the findings of MacMartin and Kravitz, 38 who demonstrate clearly the linearity only of the global average response under geoengineering, but average the linear predictions of spatial patterns over nine models. On the other hand, Cao et al. 39 do indicate that the local response for several observables under geoengineering might be nonlinear by comparing the prediction of a linear regression model with simulation results for the HadCM3L model. However, we point out that it is possible that the local susceptibilities represented in the simple linear model were inaccurately estimated from model simulation data, and so further analysis would be needed to attribute the said mismatch to nonlinearity. This is what we attempt here.
This work follows Ragone et al. 31 and Lucarini et al. 32 The latter demonstrated that it is convenient for response theory to predict spatial patterns too, which, as outlined above, form the basis for one of the types of diagnostics that we pursue in order to assess the success of the geoengineering method. In both of these papers, the authors used PlaSim 66 to demonstrate the power of their methodology, but with slightly differing setups of the model. Here we adopt the setup of Lucarini et al. 32 featuring meridional ocean heat transport. The present work also builds on Gritsun and Lucarini 68 in adopting a simple technique to obtain a better estimate of the linear susceptibility, which was independently discovered by Liu et al. 69 Clearly, a better susceptibility estimate would be useful in making a linear prediction only if the actual response were linear. Under [CO 2 ]-doubling, Ragone et al. 31 and Lucarini et al. 32 found a nonlinear response of the global average ∆ [T s ] , and so no linear prediction would be productive in that case; however, under geoengineering, the total response is aimed to be much smaller, and so in principle the response may be linear. This is found to be approximately the case in PlaSim, and, so, as one of the main contributions of this work (O1), by improving the susceptibility estimates, we can improve greatly on our prediction of a solar forcing f s (t) required for cancellation, ∆ Ψ Σ (t) = 0.
The structure of the remainder of this paper is as follows. Next, in Sec. II, we detail our methodology, listing a set of experiments performed to establish the response characteristics of the climate model and to assess nonlinearities, among other things. Then, in Sec. III, we provide results, first pertaining to objective (O1) about the success of the primary objective of geoengineering, namely, the cancellation (Sec. III A 1), and then our diagnostics of other observables (Secs. III A 2 and III B). Finally, in Sec. IV, in terms of the stationary climate only (O1), we outline an improved method for obtaining the required solar forcing for cancellation, and also (O2) analyze our improved diagnostics with respect to the linearity of the response. In Sec. V we summarize our results and give our perspective for worthwhile future work. In a series of appendices we give details of the notation and algorithm for spectral analysis in discrete time (Appendix A), the way we obtain the Green's functions (Appendix B), our novel solution method to the discrete-and finite-time inverse problem for a required solar forcing (Appendix C), and the circular convolution theorem (Appendix D), and we relegate the diagnostics of zonal averages to Appendix E.

II. METHODOLOGY: FORCING SCENARIOS
The form of the forcing signal f g due to changes in the carbon dioxide concentration [CO 2 ], for which we want to solve the geoengineering inverse problem, is a ramp that was used by Lucarini et al. 32 This is a standard forcing type, also used for the CMIP6 DECK (Diagnostic, Evaluation and Characterization of Klima) protocols. 70 More precisely, it is not a time-continuous ramp, for the reason detailed in Appendix B, but [CO 2 ], and so f g , is kept constant for one year after each incremental increase. The [ , and therefore increasing in a superlinear fashion with time [n], but, owing to the logarithmic dependence of the radiative forcing on the [CO 2 ] concentration, 71 it realizes a linear radiative forcing signal 72 f g [n] (see Appendix A for the notation for a discrete time series), i.e., a constant-in-time (n) radiative forcing incre- . Hence the name "ramp." Such a form of the forcing signal is useful in diagnosing or interpreting results. For example, if the response characteristic to solar forcing f s is similar to that of f g , then the required solar forcing to cancel global change would also be approximately ramp-like.
Note, however, that a linearity of the response characteristic to any forcing is usually checked by a comparison of the linear prediction with the truth in terms of a model simulation subject to the same forcing. The latter we will refer to as "reference" in the following, instead of "truth." Beside the nonlinearity, another factor that gives rise to a discrepancy is a statistical error due to the finite ensemble size. However, the latter has a very distinct feature that can be visually told apart easily from the contribution of nonlinearity. We reiterate that by applying a staircase-like forcing, we guarantee that the said discrepancy has no contribution due to calculations being performed in discrete time.
We point out that, at asymptotic times, there is no discrepancy at all, because of the way we estimate the Green's function (Appendix B): the discrepancy emerges transiently only. Its all-time maximum is a useful intuitive measure of nonlinearity in the examined regime. In any case, the larger the response, the greater is the nonlinear contribution to it, and so-in the context of system identification-the more inaccurate our estimates of the susceptibilities (Appendix B) be-TABLE I. Sets of simulation data specified by the forcing. Each data set is codenamed by a three-character code, the first character coding the quantity in which the forcing is presented (C for [CO 2 ] and S for solar irradiance), the second character coding the "form" of the forcing signal (S for step, R for ramp, and Q for slow ramp), and the third character coding the plateau level of the (correspondingsee text) greenhouse forcing (2 for [CO 2 ] ∞ /[CO 2 ] 0 = 2 and 1 for . The CS2 and CR2 data sets are preexisting to the present study, 32 consisting of 200 ensemble members. All new data sets listed here consist of 20 ensemble members each, except for CQ2, which consists of 10.

Form:
Step Ramp Slow ramp come. Therefore, beside our base scenario of (overall) doubling [CO 2 ], we will also check if we can obtain a more accurate (and so useful for the geoengineering problem) estimate of the Green's function using a weaker identification forcing. In particular, we apply a [CO 2 ] change that results in half of the (overall) radiative forcing change of that by dou- according to the above-mentioned logarithmic law. 71 ) Note that in the case of this weaker forcing, irrespective of the different plateau level, the increments of the [CO 2 ] changes realize the same 1%/yr relative change.
We refer the reader to Table I for an overview of the various identification and test forcing scenarios that we used in the present study. Among them, we have CQ2 defined by 0.1%/yr relative changes, which makes it a much slower change than the base scenario. The response to such a slow ramp forcing should be ramp-like as long as the linear term in Eq. (2) dominates over the nonlinear ones. This forcing scenario will therefore provide us another reference in interpreting other results with respect to linearity.
In Table I, we have not indicated the plateau level of the solar forcing f s used in conjunction with f g . We chose this level such that the response asymptotically in terms of the global average surface air temperature is the same but of opposite sign as that due to the corresponding f g . This level can be easily determined to a good approximation by an iterative procedure. Beside those in Table I, we will introduce a few more forcing scenarios in Sec. IV: some forcing scenarios that will give improved results, and other forcing scenarios that aid the interpretation of our results. The abrupt stepwise forcing scenarios, CS1, SS1, CS2, SS2, are employed to numerically determine the Green's functions as detailed in Appendix B.

III. RESULTS
A. Surface air temperature

Global average
The global average surface air temperature is the variable with respect to which we prescribe the cancellation. We do not consider any other variable in this role throughout the present study. Having predicted the solar forcings (SR1, SR2) required to produce no total response used in combination with prescribed [CO 2 ] forcings (CR1, CR2) adopting the methodology described in Appendix C, we plot the predicted linear responses in Fig. 2(a). Clearly, these predictions can be viewed either as components of the predicted total response (BR1, BR2), or the predicted response in separate scenarios (CR1, CR2, SR1, SR2). Alongside these predictions, we plot the true response in the scenarios when the forcings are applied separately, i.e., the responses evaluated by direct numerical simulations (CR1, CR2, SR1, SR2). The comparison of prediction and reference reveals that (i) the response to stronger forcing is more nonlinear in the case of greenhouse forcing (CR2) in comparison with solar forcing (SR2) and (ii) with a weaker identification (CS1, SS1) and test forcing (CR1, SR1), the linear prediction for CR1 is much better than that for CR2, while SR1 is seemingly as good as SR2. For the scenarios of combined forcing (BR1, BR2), only the true response is nontrivial if nonlinear, which is displayed in Fig. 2(b). Indeed, because of the nonlinearity, the total asymptotic response is nonzero. (Note that the fluctuations at asymptotic time are due to the finite ensemble size.) It is visibly nonzero even with the weaker forcings. However, it is just about 10% of that with greenhouse forcing solely, even in the case of the stronger forcings.
The pronounced nonlinearity [point (i) above] shows up also in other experiments. With a very slow forcing CQ2, we registered the response as shown in Fig. 3. Although the rate of forcing is unchanged throughout the time period of almost 700 years, the response switches to a slower rate between 400 and 500 years, or, between 3 and 4 K changes in the temperature. 73 The placement of this change in the rate, compared with the asymptotic temperature change of almost 3 K upon the weaker CR1 forcing seen in Fig. 2(a), is in good agreement with the observation of a much more closely linear response to that weaker forcing as compared with CR2. A crude indicator of non/linearity can be extracted from the CQ2 experiment, but also from comparing the asymptotic/stationary responses (denoted by a subscript ∞) in the XX1 and XX2 experiments, as the following ratio: (Note that we write an "X" in place of one of the possible characters in the scenario identification code when it does not matter which of the possible characters is written there.) This value is ρ = 0.99 with solar forcing and 0.85 with greenhouse forcing, in agreement with what the comparison of predicted and true responses, seen in Fig. 2(a), allowed us to conclude above. Note that in the keys, e.g. in "CRX", "X" can stand for either "1" or "2", and, intuitively, these scenarios can be told apart in the diagram by the greater/smaller response levels. Note also that in (a), the two yellow curves perfectly cover the corresponding light blue ones, because f s is calculated to cancel global warming at all times.
FIG. 3. True response of the global mean surface temperature under a very slow ramp forcing, CQ2.

Spatial pattern
We continue with the diagnostics of the geoengineering method in view of uncontrolled observables. We begin by looking at the observable of the same physical quantity, the surface air temperature, but of a spatial scale other than the global average. We would like to map out the spatial variation of the total response. A comprehensive view of the spatial variations of the response is given by the distribution over the 2D surface, computing the response at each gridpoint separately, as done by Lucarini et al. 32 Similarly to zonal averages (Appendix E), the response patterns to greenhouse and solar forcings are very similar in the stationary climate regimes; see Unsurprisingly, large predicted residual total responses occur where the response is large to either greenhouse or solar forcing alone. However, the predicted total response turns out to be grossly erroneous; the reference regarding the surface air temperature, shown in Figs. 4(d) for BR2 and 4(f) for BR1, shows that significant cancellation is achieved even locally. [We note that the overwhelmingly red and blue colors in Figs. 4(d) and 4(f), respectively, are consistent with the signs of the true residual total global change shown in Fig. 2

B. Annual precipitation
Here we present results for another diagnostic observable, the annual precipitation P y . In terms of the spatial patterns of the response, very similar conclusions can be drawn for the precipitation as for the surface air temperature, and these conclusions are supported by the set of diagrams in Fig. 6. The difference is that the largest responses are observed at equatorial regions, and it is not clear what mechanism causes this. Most importantly, significant cancellation is actually achieved as opposed to the much less favorable linear prediction. This is so even if the solar forcing used is the same as before, i.e., FIG. 4. Spatial variation of the stationary climate in terms of the surface air temperature belonging to different forcing levels specified by plateaus of forcings collected in Table I one determined with the aim of canceling global warming (not wettening; in the same spirit as Fig. 4 of Ref. 38). The significant cancellation clearly suggests that the response characteristics of P y to greenhouse and to solar forcing, say in terms of the respective Green's functions, are very similar, as with the corresponding Green's functions of T s . Nevertheless, there are differences too, as seen in Fig. 7, such as the more obvious nonmonotonicity of the evolution, despite a monotonic forcing. Furthermore, the linear prediction for the total response in the stationary climate is nonzero, which is clearly because we are looking at an uncontrolled observable. This linear prediction, however, is quite "unreliable," as can be expected from the mismatch of the true and predicted spatial patterns. Otherwise, both the predicted and the true total global mean responses to combined forcing look rather negligible compared with the responses to the greenhouse or solar forcings acting separately; see also to CR1, SR1, and SR2. We note that equatorial drying under a similar geoengineering scenario has also been reported by others. 5,38 However, in these studies a quadrupling of [CO 2 ] was considered. We point out that it does seem to matter what levels of change we consider: under [CO 2 ]-doubling, we actually find less drying than in the case of the √ 2-fold [CO 2 ] increase. This finding can, however, have different reasons. One candidate is that the response under combined forcing is nonlinear; another is that (assuming that the response under combined forcing is approximately linear) the required solar forcing was determined inaccurately [which already resulted in a residual response as seen in Fig. 2(b)]. Note that in Refs. 5 and 38, an exact cancellation of global mean surface temperature was achieved in the stationary climate, like, for example, in the G1 GeoMIP experiment. Given this, Fig. 4 of Ref. 38 indicates that the response of the global mean is approximately linear in most CMIP5 models considered, at least up to a certain forcing level that was actually lower than [CO 2 ]-doubling. In the following, we indicate that both of the said effects play a role, i.e., nonlinearity is likely also present in our case; however, it might not This subsection pertains to our objective (O1). The very close resemblance of the patterns seen in Figs. 4(a) and 4(b) hints that the effect of a changing [CO 2 ] on the radiative forcing shaping the surface air temperature is very similar to that due to a changing solar strength. However, with these data, we are not properly informed about just how similar, because, for example, the CR2 and SR2 forcings act in opposite directions and, owing to nonlinearities, they do not have to have the same effect even if the effects due to forcings acting in the same direction are indistinguishable. Therefore, we produced just that missing simulation: complementing SS2, for which the applied solar forcing is a step of equal magnitude but opposite sign. For this forcing, the stationary climate is shown in Fig. 8(a), to be referred to as SS2I. It is virtually indistinguishable from the pattern resulting for CS2, seen in Fig. 4(a), including a lack of such misalignment as the comparison of Figs. 4(a) and 4(b) revealed. This goes beyond the report on the (approximate) "equivalence" of greenhouse and solar forcings with respect to (asymptotic in time) global average surface temperature; 67 this is extended now to regional averages, i.e., spatial patterns, of that variable with a remarkable degree of approximation. [Just how close this equivalence is will be indicated by Fig. 9(a).] The superposition of the stationary climates for SS2 and SS2I, displayed in Fig. 8(b), is in turn almost indistinguishable from the asymptotic total response to combined BR2 forcing, seen in Fig. 4(c). By inspection of Eq. (2), this pattern turns out to be created by even-order nonlinear perturbative terms of the response. The selection of the even-order terms takes exactly the superposition of the responses from two experiments where the forcings are equal but of opposite sign: ε 1 = −ε 2 .
Instead of eliminating the even-order terms by superposition, we can retain only the odd-order terms by subtraction. We proceed in this direction, assuming that the third-order and higher odd-order terms make a negligible contribution. This way, we attempt to improve on the results for the linear susceptibility-and so ultimately on our prediction of the required solar forcing needed for canceling global warming. This is done with the aim of making an advance regarding our objective (O1). We can then apply this forcing in a new experiment coded as BR2C ("C" for "cancel"). For this experiment, we can utilize (although we will not examine the transient 75 ) our finding that the response characteristics to greenhouse and solar, i.e., short-wave and long-wave radiative, forcings are very similar, which would allow the application of a solar forcing that is a simple straight ramp, just like log([CO 2 ]/[CO 2 ] 0 )(t), having the same length before the plateau. (This should be the rationale behind the G2 experiments of GeoMIP.) That is, what we improve on here is only the level of the plateau. It is rather straightforward to obtain the following equations for this level f ∞,BR2C,s : The subscripts ∞ refer to the asymptotic, stationary climate regime; other subscripts refer to the experiment, i.e., forcing scenario. Observe that we need data from a new experiment, CS2I, where the "I" indicates an experiment related to CS2 . Having | f ∞,BR2C,g | = | f ∞,CS2 |, we can express the sought-for forcing in relative terms based on the temperature data only, such as In fact, we carried out the BR2C experiment independently: iteratively determining a solar forcing that cancels to a very good approximation the total response (similarly to how the level for, e.g., SS2 was determined observing the result of CS2). This forcing in the above relative terms was found to be 1.11, agreeing well with our prediction of 1.08.
Given that our prediction is smaller than the forcing actually needed for cancellation, we can predict an upper bound on the actual total response to our predicted forcing by substituting the actually needed value | f ∞,BR2C,s |/| f ∞,SS2 | = 1.11 into Eq. (17) (assuming that the response under combined forcing is linear). This gives ∆ [T s ] ∞,BR2C < 0.134 K. Considering that the total residual response with the original naive methodology (Appendices B and C) was 0.6 K, this means that with the improved methodology we managed to reduce the total response almost to one-fifth or even less of the said first result. (Of course, the exact reduction can be easily determined by an extra simulation, which we have not run.) In fact, some residual total response even with the improved method could be expected, since the simple measure of nonlinearity (14) indicated that linearity is much more violated by an increasing radiative forcing as opposed to a decreasing one. This suggests that the third-order odd perturbative term is not very small relative to the second-order one-contrary to the assumption of our improved methodology. Another source of error could be a nonlinear component of the response under combined forcing. This is what we examine next.

B. Uncontrolled response and its (non)linearity
This subsection pertains to our objective (O2). Even if we managed to achieve a perfect cancellation in terms of the global averages, amounting to a success in terms of our objective (O1), it is still important to examine the total response in terms of any other observables regarding which the cancellation is not enforced, to see whether there is any unwanted residual. To this end, we look at the BR2C data. In particular, in Fig. 9, we show the spatial variations of the stationary climate in terms of (a) the surface air temperature and (b) the annual precipitation. The former looks to be a result of interpolating between the maps of Figs. 4(d) and 4(f), and the latter looks to have the same relationship with the maps seen in Figs. 6(d) and 6(f). This implies that the variances with respect to space for BR2C (exact cancellation), both for temperature and for precipitation, are about the same as those for BR2 (approximate cancellation employing a naive method), and are much larger than the residual total responses in terms of the respective global averages for BR2. The reason for this must be that typically the respective (not necessarily linear) local response characteristics to greenhouse and solar forcing are somewhat different. Furthermore, comparing the BR2 and BR2C scenarios, we see that the difference in terms of the climatic surface air temperature could be as much as 2 K, which is about 10% of the maximum response under the corresponding greenhouse forcing alone. This justifies very well the application of the improved methodology as described in Sec. IV A. We note that the cooling tropics and warming arctic under global average surface temperature cancellation are in agreement with the findings of other studies using state-ofthe-art models, including the first study examining the sideeffects of geoengineering by Govindasamy and Caldeira. 59 The improved methodology to estimate susceptibilities applies of course to regional averages too. What remains to be seen now is whether linear response theory can predict the residual total responses seen in Figs. 9 (a) and 9(b) (O2). The corresponding linear predictions are shown in Figs. 9(c) and 9(d), respectively. These predictions show a dramatic improvement on the first results shown in Figs. 4(d) and Fig. 6(d), respectively. Quantitatively, however, the prediction is not perfect. We can quantify this by, for example, the Pear- TABLE III. Measures of overall nonlinearity of the response in terms of the local temperature and precipitation. C is the Pearson correlation coefficient between the reference ∆ Ψ and the linear prediction Ψ (1) , and ρ is defined by Eq. (21). Note that to calculate std(ρ), values of ρ larger in modulus than 5 are discarded. The last column is devoted to the global averages. son correlation coefficient C between the reference ∆ Ψ and the linear prediction Ψ (1) , the results for which are shown in Table III. (Note that no weighting of the data points with the area represented by gridpoints is done.) This shows that the prediction skill is better for the temperature than the precipitation.
Whether the imperfection of the linear prediction is due to nonlinearity-as a small error E = ∆ Ψ − Ψ (1) would normally suggest-is not clear, because it is possible that the response ∆ Ψ is linear but errors in the susceptibility estimates determining Ψ (1) (or rather its estimator) remain. We should therefore find a way to check linearity without relying on the linear prediction. This can be done in a naive way similarly to what we did for single-forcing scenarios, evaluating ρ as defined by Eq. (14). However, this time, we have not one but two forcings present. Because of this, it turns out that a check of linearity requires not two but three data points at least. In fact, we are readily endowed by three data set candidates resulting from the BR1, BR2, and BR2C experiments. In each scenario, if the response is linear, the asymptotic climate would be given by an equation like where i = 1, 2, 3 stand for, say, BR1, BR2, BR2C, in that order. One can express χ Ψ,s from the equation for i = 3, substitute into the equations for i = 1, 2, and from these latter express χ Ψ,g . Under linearity, the ratio of these expressions, would be unity, meaning that Eqs. (20) are in fact satisfied. We have evaluated ρ for all gridpoints and display the results in Fig. 10. This suggests that we do have nonlinearity for both the temperature and precipitation. However, this conclusion can be called into question by noticing that the three data points could be too close to one another, resulting possibly in an inaccurate estimation of the ratio ρ, thereby falsely suggesting nonlinearity. Instead of evaluating ρ, faced with finite data set size to evaluate climatic means, a test statistic would be a proper way of indicating nonlinearity at individual gridpoints. 76 Nevertheless, based on the available data still, one idea to indicate that deviations from unity of both the correlation coefficient C and ρ are due to nonlinearity would be to demonstrate a correlation between the local errors E of the linear prediction and ρ. We have checked the scatterplots of these quantities for both the temperature and precipitation and found no sign of correlations. This, however, does not mean that the response is linear: some unidentified effect could destroy the correlation. Our final idea is that if two situations feature different levels of nonlinearity, even if the two gridpointwise quantifiers of nonlinearity, E and ρ, have random errors, typically they should indicate in a coordinated way a stronger deviation from linearity in the case when nonlinearity is actually stronger. We propose to capture this typical behavior by the correlation coefficient C on the one hand, and the standard deviation std(ρ) over the gridpoints on the other. We note that even if linearity is typical, a smaller std(ρ) would indicate that it is more typical. We have already given the correlation coefficient in Table III, where we also display std(ρ). We do indeed see that both quantities suggest that the response of precipitation is more nonlinear.
In the last column of Table III,  , a degree of nonlinearity still seems very likely. The figures indicate that the response of the global average precipitation, unlike the local values/regional averages, is not significantly more nonlinear than the response of temperature under geoengineering. These results caution us about the reliability of linear predictions of side-effects as part of an assessment exercise: • linear predictions of regional responses are less reliable than the global response; • some quantities can respond more nonlinearly than others.

V. SUMMARY AND OUTLOOK
As our objective (O1), we defined and solved an inverse problem for finding a solar forcing that can cancel global warming when applied in conjunction with a given greenhouse forcing. Our novel inverse problem approach is generic in two respects. First, we can allow for different choices of the scalar observable to keep under control-different either with respect to the physical quantity, or considering, for example, local variables. Second, we can prescribe an arbitrary time evolution of the chosen observable, assessing perhaps less stringent requirements than total cancellation. 8 The inverse problem itself was derived in the framework of linear response theory. We have then further generalized the problem to the case when we want to control N climatic observables by including N auxiliary forcings.
Because of a degree of nonlinearity of the response characteristics of interest, the degree of approximation of the solution of the inverse problem, specifically for the cancellation of global average surface air temperature, depended on the accuracy of determining the linear susceptibilities (or Green's functions) belonging to the different forcings. The inaccurate determination of the linear susceptibilities stems from the fact that for the estimation of the Green's functions we used finite-magnitude external system identification forcings (see Appendix B), in which case the nonlinearity of the response is already felt, while for the cancellation, i.e., zero total response, we would need the linear susceptibilities exactly-assuming the response is linear under combined forcing. An inaccurately predicted required solar forcing leads to a nonzero residual true total response. By a simple method, however, also used by Gritsun and Lucarini 68 and independently by Liu et al., 69 here, for determining the susceptibilities, we eliminate even-order nonlinearities from the response in the system identification experiments. The price of this is having to run twice as many ensemble simulations for system identification. In the scenario of doubling CO 2 concentration, by this method we were able to achieve a fivefold reduction of the unwanted actual total response arising instead of cancellation. Furthermore, the linear prediction of spatial patterns using the improved local susceptibilities improved dramatically.
Nevertheless, the prediction is not perfect, and, pursuing our objective (O2), we indicated that the response under combined forcing should be somewhat nonlinear, and the degree of nonlinearity could be typically stronger for some quantities. In particular, we have seen evidence suggesting that in PlaSim the response of precipitation is more nonlinear than that of the surface temperature. This casts a shadow over the use of linear response theory for an efficient assessment. Perhaps there would still be value in this method, since larger-scale quantities are expected to be better predictable. Results reported by Cao et al. 39 seem to indicate that in complex models this nonlinearity is not more modest. Therefore, it would be desirable in the future to work out a method of predicting the nonlinear response in geoengineering scenarios.
We note that the nonlinearity seen in the system identification experiments is not a local property of the unforced system. The nonlinearity starts to be manifested at a certain level of the response. This is the case only for the positive radiative forcing, but not for the negative one. This is why-despite the nonlocality of nonlinearity-our improved methodology in Sec. IV A still worked so well: the wrong susceptibility estimated from the positive forcing was averaged by the more correct susceptibility estimated from the negative forcing. That is, the error was mitigated.
Ours is the first such analysis of the linearity of regional response under geoengineering; previous studies 38,39 only compared the linear prediction and the true model response, and, as we have argued, such a comparison has to be considered inconclusive regarding the nonlinearity of the response. There is a question as to whether our findings on the predictive power of linear response theory in PlaSim carry over to state-ofthe-art Earth System Models, because these do respond more weakly. The question certainly seems valid, however, since CMIP5 models also feature nonlinear regional responses under [CO 2 ] forcing alone. 40,41 The responses of global average surface air temperature and precipitation have been reported FIG. 10. Non/linearity of the response in terms of (a) temperature and (b) precipitation, measured by ρ given by the expression (21). Any values of ρ lying outside the range of the color bar are represented by the limiting red and blue colors.
by MacMartin and Kravitz 38 to be approximately linear in some CMIP5 models, seemingly more so than in PlaSim, but weaker forcing than [CO 2 ]-doubling was considered, and the linearity of regional responses was not analyzed in detail. In apparent contradiction, Cao et al. 39 reported a clear and not insignificant mismatch of the linear prediction and model simulation for local responses. However, only a single state-ofthe-art model was considered, the HadCM3L model, and so it was not indicated whether this is typical behavior of state-ofthe-art models. Furthermore, to reiterate, regarding the novel contribution (O2) of our work, we claim that, based on these previous results reported, it was not correct to conclude that the mismatch was due to nonlinearity. As the latest result on the linearity of the local precipitation response, however, Ref. 77, analyzing the Geoengineering Large Ensemble of the CESM1 model, states that "While a rather extreme geoengineering scenario has been considered, many, but not all, of the precipitation features scale linearly with the offset global warming." In particular, see their Figs. 16 (c), (g), (k). Furthermore, we learn from Figs. 9 and 10 of Ref. 78 that different ESMs feature quite different patterns of local temperature and precipitation response under geoengineering.
We pointed out also that instead of stepwise system identification forcing, it is better to use a Kronecker delta forcing to achieve a better signal-to-noise ratio. As another gain from using a Kronecker delta forcing, the response would be much more modest in magnitude, and hence it would stay farther away from regimes with more significant contributions of nonlinear terms, and so the linear susceptibilities could be estimated more accurately, even by the naive method [see Appendix B and Eq. (A2)].
We note that the presented method for predicting a required solar forcing is based on Green's functions that are determined by externally forcing the system of interest. This is clearly not a method that could be put in practice in the case of the Earth system, and so we are restricted to using climate models. Therefore, this is another reason, beside the unpredictability of twenty-first century greenhouse forcing, why the method is suitable only for scenario analyses. However, it might be possible to estimate the Green's functions without externally forcing the system, just from an observation of unforced fluctuations. The crucial question in this regard is whether the fluctuation-dissipation theorem 11,21 is applicable.

ACKNOWLEDGMENTS
We would like to express our gratitude to an anonymous referee of a previous submission of an earlier version of our manuscript for their very thorough feedback and many suggestions to improve the quality of our work, and also for providing references to the relevant literature. We would like to acknowledge also Ken Caldeira for drawing our attention to his paper 39 relevant to our contribution (O2), and the then editor Ben Kravitz for informing us about his publication. 79 We would also like to thank an anonymous referee and the editor Georg Gottwald for their helpful suggestions to improve this paper. We also acknowledge an anonymous contractor commissioned by AIP's Author Services who edited this manuscript. TB would like to thank Jian Lu for inspiring discussions on geoengineering, and sharing their manuscript. 49 This work was part of the EU Horizon 2020 project CRESCENDO (under Grant No. 641816); the financial support is gratefully acknowledged. It also received support from the EU Blue Action project (under Grant No. 727852) and from the Institute for Basic Science (IBS), Republic of Korea, under IBS-R028-D1. V.L. acknowledges support from the DFG Sfb/Transregio TRR181 project.

Appendix A: Computing the response in time and frequency domains
To be able to carry out (approximate) calculations involving spectral transforms, we need to clarify the formulae and algorithms applicable to discrete-time and finite-size data. We can approximate the time-continuous forcing f (t) [appearing in Eq. (4)] by a staircase-like forcing that is defined by a uniform sampling of f (t), called a sample-and-hold approximation. It can be represented by a discrete sequence f [n] = f (t = nT ), n = . . . , −1, 0, 1, . . . , with T being the uniform sampling interval, in which sequence the data points provide the levels of the "stairs." That is, for an actual staircase-like forcing signal, f (t = (n + ν)T ) = f [n] for all ν ∈ [0, 1], where the noninte-ger ν can be viewed as a phase variable-the phase where the sample is taken within the interval where the forcing is constant. For such staircase-like forcings, sample values of the response with the sampling Ψ[n] = Ψ(t = (n + ν)T ) of the time-continuous response at any phase ν ∈ [0, 1] obey where the discrete-time (DT) impulse response or DT Green's function h Ψ [n] is, clearly, the response Ψ ⊥ (1) to a Kronecker delta function forcing: f [n] = δ [n] = 1 if n = 0 and 0 otherwise. 80 Note that we make a distinction in our notation with regard to the special forcing such that we distinguishΨ from Ψ; however, for simplicity, we did not subscriptΨ by ν, despite its dependence on the phase. Note also that the DT impulse response function cannot be obtained by a straightforward sampling of the Green's function; that is, in general, (orΨ[n]) is defined with, or with any other fixed ν, for all n. If the sampling frequency is not adequate regarding some typical time scales of the forcing, then the calculated discrete response will be also an inadequate approximation. We note further that-unlike the Dirac delta in the continuous-time case-the Kronecker delta can be realized for numerical purposes. It is equivalent to applying a step forcing and taking the difference: This method was used by Lucarini et al. 32 Such external forcings we will refer to as (system) identification forcing. First, to predict the response (to first order), we need to obtain for example the (first-order) Green's function. As Eq. (5) suggests, it is fully determined by the autonomous system. A direct evaluation of this formula is, however, prone to failure. 32 Second, we note that in practice we can study only a discrete-time version of the system. This suggests that for a direct way of determining the Green's function, instead of Eq. (4), we have to use Eq. (A1) [leading to Eq. (A2)].
It also means that we cannot infer the response of the system just by observing its autonomous dynamics, but we need to force it externally in a suitable way. Third, an ensemble of experiments (appropriately initialized) is needed to obtain the expected value Ψ [with the notation introduced in Appendix A, first appearing in Eq. (A1)]. This was acknowledged also by MacMartin and Kravitz. 38 However, it is feasible to run only a finite number of experiments, so we obtain an approximation of Ψ , where the error is some correlated noise process. This correlation can be made negligible by using a suitably infrequent sampling, allowed by, say, the application of a slow forcing. We use the same data as Lucarini et al., 32 which consist of some ensembles of 200 members, and we have produced new data belonging to new forcing scenarios, as described in Sec. II, that consist of ensembles of 20 members.
As already spelled out in Appendix A, two identification forcing types are particularly suitable to determine the Green's function: one is a step forcing, and the other is the Kronecker delta. When a random statistical error is present due to the finite ensemble size, represented say by a Gaussian random variable ξ , it is actually better to use a Kronecker delta forcing for the following reason. Using the step forcing instead, one needs to take the difference of consecutive valueswhat is sometimes called "differencing"-of the response sequence (A2). This way, at any time, the variance of the error is that of the difference of two random variables, ξ 1 and ξ 2 , both distributed identically to the original random variable ξ . For Gaussian variables, it is straightforward to show that Var[ξ 1 −ξ 2 ] = 2Var [ξ ]. Note that we assume that ξ is the same random variable to a good approximation under the delta and step forcings. Despite the advantage of the Kronecker delta forcing, we apply a step forcing also in our new experiments, so that we are able to make use of data produced for the study by Lucarini et al. 32 in a consistent manner. Examples of the response to step forcings are displayed in Fig. 11. The similarity of the responses to greenhouse and solar forcings here, and so the Green's functions, is consistent with the findings of others [82][83][84] and the design of the G2 GeoMIP experiment. 6 It is important to appreciate the following trade-off. For a better signal-to-noise ratio (SNR), one can apply a more powerful identification forcing. However, in the presence of nonlinearities, the more powerful the forcing signal, the larger is the error in estimating the Green's function belonging to the base state Ψ 0 (even without noise). MacMartin and Kravitz 38 applied a [CO 2 ]-quadrupling (and this is a standard forcing level for geoengineering studies 5,6 ); however, they did not determine the solar forcing for cancellation via Green's functions (Appendix C), and they checked the linearity of the response only up to a forcing level lower than [CO 2 ]-doubling. Their motivation for applying the high forcing level seems to have been only to be able to determine the Green's function with a better SNR given that no ensemble data are available from the GeoMIP experiments.
We make here two more comments about the issue with noise. First, instead of instantaneous samples of the observable Ψ and the corresponding Green's function, we will consider, like Lucarini et al., 32 annual averages,Ψ[n] = 1 0 dνΨ((n + ν)T ). This is sensible given the slow rate of change that the applied forcing represents, and it also greatly reduces the noise level. In this regard, we point out that annual averages too obey Eq. (A1) exactly if the forcing is constant over a year, because the order of summations can be interchanged, whereby a well-defined DT Green's function belonging to the annual average emerges. We will use only annually constant staircase-like forcings in our experiments (Sec. II), so that it will be clear that a linear prediction of the response has an error not because Eq. (A1) does not apply exactly, but because of the missing higher-order perturbative terms appearing in the expression (2). Second, the said enhancement of noise by differencing in Eq. (A2) cannot be overcome by working in the frequency domain. The Green's function, via the frequency domain and applying Eq. (D1) of Appendix D, is expressed as The latter is the very factor arising in the DTFT of a differenced sequence. As far as we are aware, the only way to avoid the differencing and thereby reduce the noise is by using a Kronecker delta identification forcing as argued above. 85 We also note that the application of a stepwise identification forcing implies a specific frequency dependence of the variance of the estimator of the susceptibility. See Ref. 79 for other system identification techniques that imply different such frequency dependences.  Table I. After a subtraction of the limit value and displaying the response on linear-log scales in (b), it is revealed that the high-dimensional system behaves very much like a noise-driven linear two-box model, also called a vector autoregressive (VAR) model, in view of the considered global scale variable, as also recognized by MacMynowski (MacMartin) et al. 84 and Caldeira and Myhrvold. 83 The two time scales of the VAR models fitted to the CS2 and SS2 data are about 5 and 40 years. The second time scale is in disagreement with MacMynowski (MacMartin) et al., 84 and it is not clear whether a more complex model is more reliable in this respect. Note: the angle brackets denoting ensemble average are dropped from diagram annotations throughout this paper. that in stage 1, there is no need to replace any values by zeros, only in stage 2). One can use deconv in MATLAB to perform the deconvolution. We find in simple examples studied (results not shown) that without noise the procedure in the time domain leads to the very same solution as the procedure in the frequency domain. However, this is not the case with additive noise, which means that the deconvolution/inverse problem is ill-posed in this case. Nevertheless, the weaker the noise, the closer the outcome is to the true solution, either in the time or the frequency domain. We find that in the time domain, the procedure always converges to some solution; however, with increasing noise strength, this solution features oscillations of increasing amplitude as time advances. Nevertheless, for a certain noise strength when the frequency-domain procedure also converges, we find that the solution in the time domain is smoother and so closer to the true solution earlier in time. This is also what we find considering the PlaSim data, as shown in Fig. 12. We conclude, therefore, that it is preferable to work in the time domain using Eq. (C4) to produce numerical results. Nevertheless, we will carry out our calculations in the frequency domain, using for example the forcing signals shown in Fig. 12(a), in order to make the point that even a rough forcing signal convolved with a rough Green's function produces a not so rough response, as we see in Sec. III A 1.
We emphasize that, in our case, multiple iterations are not needed, because h [T s ],g and h [T s ],s are very similar, as indicated by Fig. 11. The straightforward application of Eq. (C2) is sufficient, substituting an all-zero sequence for ∆ Ψ Σ . Ours is obviously a special case, however, and the generic iterative procedure might be needed for other geoengineering scenarios of practical relevance.
We note that an anonymous referee of a previous submis- = 0 for n = 1. We have checked (results not shown) that it gives exactly the same result as our iterative procedure, reproducing the time series pattern due to a particular noise realization in a simple example system.

Appendix D: The circular convolution theorem and its application
Taking the discrete-time Fourier transform (DTFT) of Eq. (A1), we have, via the convolution theorem for discrete sequences 45 , a formally analogous version of Eq. (6), with the individual Fourier transforms approximated by Fourier series: where, for example, . We indicate in the keys the data sets from Table I to which the forcings belong. We note that in either case, we neglected the iteration, skipping stages 1 and 2 and setting ∆ Ψ = 0 for all l straightaway in stage 3, the validity of which is suggested by the very similar Green's functions h [T s ],g and h [T s ],s , as indicated by Fig. 11. Correspondingly, the required f s is very similar to the given f g . A small gap between the red and blue ramps that can be resolved only with a smooth estimate [i.e., in (b) but not in (a)] informs us that the system responds slightly faster to the greenhouse forcing. This characteristic was already suggested by Fig. 11(b) and the exact results of the parameter estimation by fitting. Results presented in Sec. IV B suggest that it is likely to do with nonlinearity, which makes the responses to negative and positive anomalies "asymmetric," resulting also in different spatial patterns, while the time scales associated with different locales are quite varied (results not shown).
function of the frequency f , is often sampled at f = k/(NT ), k = 0, . . . , N − 1: for any n 0 , which yields the discrete Fourier transform (DFT) of the finite sequence f N [n], n = n 0 , . . . , n 0 + N, where the full infinite sequence f N [n], n ∈ R, turns out to be N-periodic, since it has to be, for the equivalence of the two sums in Eq. (D2), in the so-called periodic summation form Therefore, when f [n] is actually N-periodic, its DTFT is nonzero only at f = k/(NT ), k ∈ R, and also periodic, such that the DFT of a single cycle of f [n] is able to represent its DTFT. For such periodic sequences, to be denoted distinctively using a subscript as f N [n], it can be proved 45 N f + N y , N), . . . , N + min(1 + N y , N f + N y , N − 1). Therefore, when faced with the practical situation of having finite time series, f [l] and h Ψ [l], l = 0, . . . , L − 1, Eq. (D5) can be used to determine the response h Ψ * f [l], l = 0, . . . , L − 1 (whose usefulness comes from an efficient algorithm for evaluating the DFT, called the fast Fourier transform algorithm, FFT). In particular, if the two sequences are to be padded in front by a number N f = N h = N 0 of zeros equally (so that the circular convolution (D5) is well defined), then the reconstructed length of the linear convolution h Ψ * f (the response of a causal system coinciding with the forcing) is 1 + N 0 − max(N 0 − L + 1, 0). This is a linear function of N 0 saturating at N 0 = L − 1 reaching the full length L. Therefore, for simplicity one can pad by N 0 = L − 1 zeros, 81 and we will denote these padded sequences by, for example,f [l], l = 0, . . . , 2(L − 1). Note that padding with fewer or no zeros results in a circular convolution that better approximates either the useful or the not useful part of the linear convolution, which approximation is better the more zeros are used. In the extreme case of no padding, very little of the useful part can be approximated well. The key to the applicability of Eq. (D4) is that it does not matter how the forcing f [n]-and with it the response Ψ [n]-continue after our experiment, and so they can be thought of as periodic.
Appendix E: Zonal average surface temperature For the diagnostics of any residual total response, a good starting point is to look at the zonally averaged fields of the surface air temperature. As a reminder, any observable other than the one for which a desired evolution has been enforced (the global average surface air temperature in our case; see Sec. III A 1) can evolve in an uncontrolled fashion. This is something that is not intended and should be checked. First, we show results with the √ 2-fold [CO 2 ] increase (CR1, BR1). Following Lucarini et al. 32 (where only the case of [CO 2 ]doubling was treated), treating zonal means in a similar fashion to global means informs us that the response to either greenhouse or solar forcing is strongest at high-latitude/polar regions; see Figs. 13(a) and 13(c). This is where the response is most nonlinear, as indicated by Figs. 13(b) and 13(d), showing the difference between reference and prediction. This nonlinearity should be due to albedo saturation and/or nonlinear characteristics of radiation physics, as discussed by others. 32,40,41,86 We note that in Figs. 13(b) and 13(d), nonzero values under the stationary climate (after a transient) represent finite-ensemble-size statistical errors only. As a consequence of the said nonlinearities, in the high-latitude regions, linear response theory performs very poorly in predicting the total response to combined forcing, as it does also in the regime of stationary climate; compare Figs. 14(a) and 14(b) showing the prediction and reference, respectively.
In addition to such a visual comparison, it is customary to quantify the discrepancy by measuring the error of the prediction relative to the true value. However, the true value can be zero at certain latitudes, and this naive relative error measure then lacks an obvious meaning. In these situations, it is customary 87 to analyze the following relative error: This takes values in the interval [0,1] for all values of ∆ Ψ BRX and Ψ (1) BRX ; and a larger value should be considered worse. Clearly, e 1 (µ) as a function of latitude would facilitate the comparison of the predictive skill of linear response theory at different latitudes. We note that in Eq. (E1), Ψ BRX is meant to be an estimator of the actual quantity, and this estimator is biased, but to keep things simple, we do not introduce a separate symbol for the estimator. Another possibility in our situation is that we measure the error of prediction of the response to combined forcing relative to the response to one of the forcings: We evaluate e 1 and e 2 only with respect to the stationary climate, in which case the estimation is very accurate since we can take an average also with respect to time. Figure 15(a) shows the result in the case of the weaker forcing (CR1, BR1). Both e 1 and e 2 indicate with good agreement that the prediction is poorest at some high-latitude regions.
With [CO 2 ]-doubling (CR2, BR2), as shown by the results in Fig. 15(b), the performance has different characteristics in comparison with the case of weak forcing. Both e 1 and e 2 are highest at both equatorial and some high-latitude regions, and somewhat less at polar and some Southern Hemisphere midlatitude regions. FIG. 13. Response of the zonally averaged surface air temperature to ramp forcings. The first column shows the true responses in the model and the second the errors in the linear predictions. The first and second rows belong to the CR1 and SR1 forcing scenarios, respectively. Similar diagrams as in the first row but for CR2 are shown in Fig. 6 of Ref. 32.
FIG. 14. Predicted (a) and true (b) total responses of the zonally averaged surface air temperature to combined ramp forcings (BR1).