Empirical Model Reduction (EMR)

Modern climate dynamics uses a two-fisted approach in attacking and solving the problems of atmospheric and oceanic flows. The two fists are: (i) observational analyses; and (ii) simulations of the geofluids, including the coupled atmosphere-ocean system, using a hierarchy of dynamical models. These models represent interactions between many processes that act on a broad range of spatial and time scales, from a few kilometers to tens of thousands of kilometers, and from diurnal to multi-decadal, respectively. The evolution of virtual climates simulated by the most detailed and realistic models in the hierarchy is typically as difficult to interpret as that of the actual climate system, as known from the available observations. Highly simplified models of weather and climate, in contrast, help gain a deeper understanding of a few isolated processes, and provide clues on how the interaction between these processes and the rest of the climate system may shape climate variability. Finally, models of intermediate complexity, which resolve a subset of the climate system well and parameterize the remaining processes or scales of motion, serve as a conduit between the models at the two ends of the hierarchy.

We developed a methodology for constructing intermediate models based almost entirely on the observed evolution of selected climate fields, without reference to dynamical equations that may govern this evolution; these models parameterize unresolved processes as multivariate stochastic forcing. This methodology may be applied with equal success to actual observational data sets and to data sets resulting from a high-end model simulation. It has been successfully applied to: (i) observed and simulated low-frequency variability of atmospheric flows in the Northern Hemisphere; (ii) observed evolution of tropical sea-surface temperatures; and (iii) observed air-sea interaction in the Southern Ocean. In each case, the reduced stochastic model represents surprisingly well a variety of linear and nonlinear statistical properties of the resolved fields. Our methodology thus provides an efficient means of constructing reduced, numerically inexpensive climate models. These models can be thought of as stochastic-dynamic prototypes of more complex deterministic models, as in example (i), but work just as well when the actual governing equations are poorly known, as in (ii) and (iii). These models can serve as competitive prediction tools, as in (ii), or be included as stochastic parameterizations of certain processes within more complex climate models, as in (iii).

ENSO

We apply multiple polynomial regression to sea-surface temperature anomalies (SSTA) to obtain both linear and nonlinear stochastically forced models of ENSO. The 1950-2002 Kaplan extended SSTA dataset (IRI/LDEO Climate Data Library, January 1950-December 2002), covering 60N-30S, 30E-110W, is used for both model training and validation: the models were estimated on 1950-1995 data and verified on 1996-2002 data, which include the strong 1997-1998 warm ENSO event (Fig.1).

Figure 1. Time series of the observed NINO-3 SSTA from January 1950 to December 2002 and of the same dataset with a low-frequency trend removed.

Our stochastic ENSO model is governed by a set of ordinary differential equations (ODEs) that are derived by applying multiple polynomial regression to a few leading empirical orthogonal functions (EOFs) of the observational data. A multi-layer regression procedure is employed, in which the regression residual at a given level is modeled by an ODE in the predictor variables of the current and all preceding levels. The number of levels is determined so that the lag-0 covariance of the regression residual converges to a constant matrix, while its lag-1 covariance vanishes. Thus, on the last level, the regression residual is modeled as spatially correlated white noise, which represents the stochastic forcing in the model. The regression models are obtained via a partial least-squares (PLS) cross-validation procedure; PLS finds the so-called factors, or latent variables, that both capture the maximum variance in the predictor variables and achieve high correlation with the response variables. On the first regression level, the principal components (PCs) corresponding to the leading EOFs are the predictor variables, while the response variables are the PCs’ time derivatives. On the second and following levels, the response variables are the time derivatives of the regression residuals of the preceding level.
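The multi-level procedure described above can be illustrated with a minimal Python sketch. It is not the implementation behind the results reported here: ordinary least squares stands in for the PLS cross-validation procedure, time derivatives are approximated by forward differences, and the function names (quadratic_features, fit_multilevel, lag_covariance) are hypothetical. The first level is quadratic in the PCs and the additional levels are linear, which is one plausible reading of the description.

```python
import numpy as np

def quadratic_features(x):
    """Return [1, x_i, x_i * x_j (i <= j)] for each row of x (time x modes)."""
    n_t, n = x.shape
    iu = np.triu_indices(n)
    quad = (x[:, :, None] * x[:, None, :])[:, iu[0], iu[1]]
    return np.hstack([np.ones((n_t, 1)), x, quad])

def fit_multilevel(x, n_levels=2, dt=1.0):
    """Fit a quadratic first level and linear additional levels.

    Assumption: ordinary least squares replaces the PLS procedure.
    Returns the coefficient matrices and the last-level residual.
    """
    coeffs = []
    # Level 1: forward-difference derivative of the PCs regressed on
    # quadratic functions of the PCs.
    response = np.diff(x, axis=0) / dt
    state = x[:-1]
    design = quadratic_features(state)
    for level in range(n_levels):
        beta, *_ = np.linalg.lstsq(design, response, rcond=None)
        resid = response - design @ beta
        coeffs.append(beta)
        if level == n_levels - 1:
            break
        # Next level: derivative of the current residual regressed linearly
        # on the variables of the current and all preceding levels.
        response = np.diff(resid, axis=0) / dt
        state = np.hstack([state[:-1], resid[:-1]])
        design = np.hstack([np.ones((state.shape[0], 1)), state])
    return coeffs, resid

def lag_covariance(r, lag):
    """Lagged covariance matrix of a residual time series (time x modes)."""
    r = r - r.mean(axis=0)
    n = r.shape[0] - lag
    return r[lag:].T @ r[:n] / n
```

Under these assumptions, one would add levels until lag_covariance(resid, 1) is close to zero and lag_covariance(resid, 0) stops changing, and then treat the last residual as spatially correlated white noise.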

A main feature of the nonlinear model is its ability to simulate a non-Gaussian distribution of ENSO events, with warm events having larger amplitude than cold events. This property cannot be captured by a linear model, as demonstrated by the Nino-3 ensemble statistics in Fig.2. Each ensemble member is a stochastic realisation of the Nino-3 index of the same length as the observed data, obtained by running a model with initial conditions for January 1950. The main features of the data’s non-normal PDF, such as skewness and a fatter positive tail, though smoothed by the stochastic realisations, are reproduced quite well by the nonlinear model but not by the linear model.
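The asymmetry discussed above can be quantified by the sample skewness of the Nino-3 index and of each ensemble member. The sketch below is only an illustration: the function names are hypothetical, and the moment-based (biased) estimator is an assumed choice, not necessarily the statistic used for Fig.2.

```python
import numpy as np

def sample_skewness(x):
    """Third standardized moment; positive when the warm tail is heavier."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return np.mean(d**3) / np.mean(d**2) ** 1.5

def ensemble_skewness(ensemble):
    """Skewness of each member; ensemble has shape (members, time)."""
    return np.array([sample_skewness(member) for member in ensemble])
```

Comparing the observed value with the spread of ensemble_skewness over many realisations gives a simple check of whether a model reproduces the observed positive skewness.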

Figure 2. The histogram (bars) and superimposed fitted normal density (solid line) of the detrended observed Nino-3 index, and of the Nino-3 stochastic ensembles of both the linear and nonlinear models. The nonlinear model captures the positive skewness of the observed time series distribution, while the linear model does not.

The seasonal dependence of ENSO is demonstrated in data and model simulations in Fig.3a and Fig.3b. In both models, the locking to the seasonal cycle is achieved by adding a linear oscillatory forcing with a one-year period on the first regression level.
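One simple way to include a forcing with a one-year period in a regression on monthly data is to append sine and cosine predictors to the first-level design matrix. The sketch below assumes this additive form and uses hypothetical function names; the exact form of the seasonal terms used in the models is not specified here.

```python
import numpy as np

def seasonal_predictors(n_months, start_month=0):
    """Sine and cosine of the annual cycle for monthly data (12-month period)."""
    t = np.arange(n_months) + start_month
    phase = 2.0 * np.pi * t / 12.0
    return np.column_stack([np.sin(phase), np.cos(phase)])

def design_with_seasonal_cycle(x):
    """Build the design matrix [1, x, sin, cos] for monthly predictors x."""
    n_t = x.shape[0]
    return np.hstack([np.ones((n_t, 1)), x, seasonal_predictors(n_t)])
```

The regression coefficients of the two extra columns then set the amplitude and phase of the annual forcing.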

Figure 3a. Boxplots for each month of the observed Nino-3 SSTA (detrended), and of the Nino-3 stochastic ensemble for the linear and nonlinear models. Points beyond the whiskers represent outliers of the distribution; both the nonlinear model and the observed time series exhibit a similar asymmetric, positively skewed outlier population, while that of the linear model is clearly symmetric. Other observed features captured by the nonlinear model include stronger interannual spread and higher outlier values in winter.

Figure 3b. Seasonal dependence of ENSO extreme events.

The extreme-event distributions for data and model simulations are presented in Fig.3b. Extreme events are defined here as Nino-3 index values exceeding one standard deviation in the standardized time series. Both models seem to capture the behaviour of the data. However, this measure does not take into account the absolute value of the event, which the boxplot does. The boxplot statistics for each month of the year are presented in Fig.3a.
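The extreme-event count per calendar month can be sketched as follows. The function name is hypothetical, and the sketch assumes that both warm and cold departures larger than one standard deviation count as extreme events and that the series is monthly and starts in January.

```python
import numpy as np

def extreme_event_counts(index, threshold=1.0, first_month=1):
    """Count, per calendar month, standardized values beyond the threshold.

    Assumption: events of either sign count as extreme.
    """
    index = np.asarray(index, dtype=float)
    z = (index - index.mean()) / index.std()
    months = (np.arange(index.size) + first_month - 1) % 12  # 0 = January
    extreme = np.abs(z) > threshold
    return np.bincount(months[extreme], minlength=12)
```

Because the series is standardized first, the counts say how often extremes occur in each month but not how large they are, which is the limitation noted above.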

For the nonlinear model, the boxplot ensemble statistics are much closer to those of the real data. Both the nonlinear model and the observed time series exhibit a similar positively skewed outlier distribution, while that of the linear model is clearly symmetric. Other observed features captured by both models are stronger interannual spread and higher outlier values in winter.

Model verification

We perform cross-validated forecasts by training our regression models on a reduced data set that leaves out several-year-long segments of SST evolution which we subsequently predict. We choose these segments to span the whole warm and/or cold ENSO phases, and thus divide the time series into intervals that start and end in January of the following years: 1950, 1955, 1960, 1964, 1968, 1972, 1975, 1979, 1982, 1986, 1990, 1994, 1997 and 2002.
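The leave-one-segment-out split can be sketched in Python as below. The boundary years are those listed above; the helper names are hypothetical, and the sketch assumes monthly data starting in January 1950 and half-open segments, each running from January of one boundary year up to, but not including, January of the next.

```python
import numpy as np

BOUNDARY_YEARS = [1950, 1955, 1960, 1964, 1968, 1972, 1975,
                  1979, 1982, 1986, 1990, 1994, 1997, 2002]

def month_index(year, start_year=1950):
    """Index of January of the given year in a monthly series."""
    return (year - start_year) * 12

def cross_validation_masks(boundary_years, n_months, start_year=1950):
    """Yield (train_mask, test_mask) pairs, one per withheld segment."""
    for y0, y1 in zip(boundary_years[:-1], boundary_years[1:]):
        test = np.zeros(n_months, dtype=bool)
        test[month_index(y0, start_year):month_index(y1, start_year)] = True
        yield ~test, test
```

For each pair, the regression model would be trained on the months selected by train_mask and used to predict those selected by test_mask. Under the half-open assumption, the months of 2002 are never withheld.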

Figure 4. Comparison of the one-level (1-L) and two-level (2-L) nonlinear models' performance. Anomaly correlation at month 9 (9-month-lead forecast) with cross-validation: (a) 1-L model; (b) 2-L model; (c) Nino-3 SST anomaly forecast skill as a function of lead time for the 1-L model, the 2-L model, and the persistence forecast; lines are defined in the legend. The Nino-3 SST anomaly is defined as the area average over the box shown in panels (a) and (b).

The 1-L and 2-L nonlinear (quadratic) models are compared in terms of their respective forecast skills in Fig.4. The spatial distribution of the SST anomaly correlation for the 9-month-lead forecast of the 1-L and 2-L models is shown in panels (a) and (b), respectively. The forecast time series is the 100-member ensemble mean of each model’s stochastic realizations, and it has been cross-validated over the whole time series as described above. Although both models have similar skill patterns, the 2-L model’s forecast skill is higher than that of its one-level counterpart, especially south of the Nino-3 region, shown as the black box in panels (a) and (b).
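An ensemble-mean forecast of this kind can be sketched for a generic one-level stochastic model. Everything in the sketch is an illustrative assumption rather than the scheme actually used: the function name, the forward-step update x + dt * (drift + noise), the drift argument (a callable returning the deterministic tendency), and noise_chol (a Cholesky factor of the residual covariance).

```python
import numpy as np

def ensemble_mean_forecast(x0, drift, noise_chol, n_steps,
                           n_members=100, dt=1.0, seed=0):
    """Integrate n_members stochastic realizations and return their mean path.

    Assumption: one-level model, x_next = x + dt * (drift(x) + noise).
    """
    rng = np.random.default_rng(seed)
    x = np.tile(np.asarray(x0, dtype=float), (n_members, 1))
    mean_path = np.empty((n_steps, x.shape[1]))
    for k in range(n_steps):
        # Spatially correlated white noise, drawn independently per member.
        noise = rng.standard_normal(x.shape) @ noise_chol.T
        x = x + dt * (drift(x) + noise)
        mean_path[k] = x.mean(axis=0)
    return mean_path
```

A call such as ensemble_mean_forecast(x0, lambda x: x @ B.T, chol, n_steps=9), with a hypothetical linear operator B, would return the mean state at leads of 1 to 9 steps.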

The correlation between the observed and predicted area-averaged Nino-3 SST anomalies is plotted in panel (c): the 2-L model performs significantly better than the 1-L model at lead times longer than 4 months, and both the 1-L and 2-L models beat the persistence forecast.

Figure 5. Anomaly correlation and normalized error variance of the cross-validated Nino-3 SST forecasts: (a) and (b) for the 100-member ensemble mean, (c) and (d) for the extreme event forecast series.

Next we compare the 2-L linear and nonlinear models. The anomaly correlation coefficient and normalized prediction error variance are shown in Fig.5 for the cross-validated Nino-3 SST forecasts: (a) and (b) for the 100-member ensemble mean, (c) and (d) for the extreme event forecast series, which is defined as the weighted sum of the top and bottom 20% of the ensemble forecasts. Such a time series produced by the quadratic model has a smaller rms error than that of its linear counterpart [panel (d)], with both models’ anomaly correlation being essentially the same as for the ensemble mean forecast [compare panels (c) and (a)]. This improvement of the quadratic over the linear forecasts comes from the ability of the nonlinear model to capture the non-Gaussian PDF characteristic of ENSO, with positive anomalies having larger amplitude than negative anomalies, while the PDF of the linear inverse model time series is normal. The linear model tends, therefore, to underestimate the magnitude of El Nino events and, likewise, to overestimate that of La Nina events.
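The two skill measures, together with a persistence baseline, can be sketched as follows. The function names are hypothetical, and normalizing the mean squared error by the variance of the observations is an assumed definition of the normalized error variance.

```python
import numpy as np

def anomaly_correlation(forecast, observed):
    """Correlation between forecast and observed anomalies."""
    f = forecast - forecast.mean()
    o = observed - observed.mean()
    return np.sum(f * o) / np.sqrt(np.sum(f**2) * np.sum(o**2))

def normalized_error_variance(forecast, observed):
    """Mean squared forecast error divided by the observed variance."""
    return np.mean((forecast - observed) ** 2) / np.var(observed)

def persistence_forecast(observed, lead):
    """Return (forecast, verification): the value at time t predicts t + lead."""
    return observed[:-lead], observed[lead:]
```

With this normalization, a perfect forecast scores 0 and a forecast that always equals the observed mean scores 1.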

References

  • Kaplan, A., M. Cane, Y. Kushnir, A. Clement, M. Blumenthal, and B. Rajagopalan, 1998: Analyses of global sea-surface temperature 1856-1991. J. Geophys. Res., 103, 18 567-18 589.
  • Kravtsov, S., D. Kondrashov, and M. Ghil, 2005: Multilevel regression modeling of nonlinear processes: Derivation and applications to climatic variability. J. Climate, 18 (21), 4404-4424.
  • Kondrashov, D., S. Kravtsov, A. W. Robertson, and M. Ghil, 2005: A hierarchy of data-based ENSO models. J. Climate, 18 (21), 4425-4444.
  • Kravtsov, S., D. Kondrashov, and M. Ghil, 2009: Empirical model reduction and the modelling hierarchy in climate dynamics and the geosciences. In Stochastic Physics and Climate Modelling, Cambridge University Press, Cambridge, 35-72.