Statistical methods

Groth A, Ghil M. Monte Carlo Singular Spectrum Analysis (SSA) revisited: Detecting oscillator clusters in multivariate datasets. Journal of Climate. 2015;28 (19) :7873–7893.

Singular spectrum analysis (SSA) along with its multivariate extension (M-SSA) provides an efficient way to identify weak oscillatory behavior in high-dimensional data. To prevent the misinterpretation of stochastic fluctuations in short time series as oscillations, Monte Carlo (MC)–type hypothesis tests provide objective criteria for the statistical significance of the oscillatory behavior. Procrustes target rotation is introduced here as a key method for refining previously available MC tests. The proposed modification helps reduce the risk of type-I errors, and it is shown to improve the test’s discriminating power. The reliability of the proposed methodology is examined in an idealized setting for a cluster of harmonic oscillators immersed in red noise. Furthermore, the common method of data compression into a few leading principal components, prior to M-SSA, is reexamined, and its possibly negative effects are discussed. Finally, the generalized Procrustes test is applied to the analysis of interannual variability in the North Atlantic’s sea surface temperature and sea level pressure fields. The results of this analysis provide further evidence for shared mechanisms of variability between the Gulf Stream and the North Atlantic Oscillation in the interannual frequency band.
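The surrogate-data idea behind such Monte Carlo tests can be illustrated with a much simpler stand-in. The sketch below is not the Procrustes-based M-SSA test of the paper: it assumes a univariate series, uses an AR(1) red-noise null hypothesis, and takes periodogram power at a single trial frequency as the test statistic instead of projections onto SSA eigenvectors. All function names are hypothetical.

```python
import math
import random


def fit_ar1(x):
    """Estimate lag-1 autocorrelation and variance for an AR(1) null model."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    c0 = sum(v * v for v in d) / n                           # lag-0 covariance
    c1 = sum(d[i] * d[i + 1] for i in range(n - 1)) / n      # lag-1 covariance
    return c1 / c0, c0


def power_at(x, freq):
    """Periodogram power of x at a trial frequency (cycles per sample)."""
    n = len(x)
    mean = sum(x) / n
    re = sum((v - mean) * math.cos(2 * math.pi * freq * i) for i, v in enumerate(x))
    im = sum((v - mean) * math.sin(2 * math.pi * freq * i) for i, v in enumerate(x))
    return (re * re + im * im) / n


def red_noise_p_value(x, freq, n_surrogates=200, seed=0):
    """Fraction of AR(1) surrogates with at least as much power at freq as the data."""
    rng = random.Random(seed)
    gamma, c0 = fit_ar1(x)
    innovation_std = math.sqrt(c0 * (1.0 - gamma * gamma))
    observed = power_at(x, freq)
    exceed = 0
    for _ in range(n_surrogates):
        # Start from the stationary distribution, then iterate the AR(1) recursion.
        s = [rng.gauss(0.0, math.sqrt(c0))]
        for _step in range(len(x) - 1):
            s.append(gamma * s[-1] + rng.gauss(0.0, innovation_std))
        if power_at(s, freq) >= observed:
            exceed += 1
    return exceed / n_surrogates
```

A small returned fraction suggests that red noise alone is unlikely to produce the observed power at the trial frequency. Testing many frequencies at once would require a multiple-comparison correction that this sketch does not attempt.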

Plaut G, Ghil M, Vautard R. Interannual and Interdecadal Variability in 335 Years of Central England Temperatures. Science. 1995;268 (5211) :710–713.

Understanding the natural variability of climate is important for predicting its near-term evolution. Models of the oceans' thermohaline and wind-driven circulation show low-frequency oscillations. Long instrumental records can help validate the oscillatory behavior of these models. Singular spectrum analysis applied to the 335-year-long central England temperature (CET) record has identified climate oscillations with interannual (7- to 8-year) and interdecadal (15- and 25-year) periods, probably related to the North Atlantic's wind-driven and thermohaline circulation, respectively. Statistical prediction of oscillatory variability shows CETs decreasing toward the end of this decade and rising again into the middle of the next.

Edeline E, Groth A, Cazelles B, Claessen D, Winfield IJ, Ohlberger J, Asbjørn Vøllestad L, Stenseth NC, Ghil M. Pathogens trigger top-down climate forcing on ecosystem dynamics. Oecologia. 2016 :1–14.

Evaluating the effects of climate variation on ecosystems is of paramount importance for our ability to forecast and mitigate the consequences of global change. However, the ways in which complex food webs respond to climate variations remain poorly understood. Here, we use long-term time series to investigate the effects of temperature variation on the intraguild-predation (IGP) system of Windermere (UK), a lake where pike (Esox lucius, top predator) feed on small-sized perch (Perca fluviatilis) but compete with large-sized perch for the same food sources. Spectral analyses of time series reveal that pike recruitment dynamics are temperature controlled. In 1976, expansion of a size-truncating perch pathogen into the lake severely impacted large perch and favoured pike as the IGP-dominant species. This pathogen-induced regime shift to a pike-dominated IGP apparently triggered a temperature-controlled trophic cascade passing through pike down to dissolved nutrients. In simple food chains, warming is predicted to strengthen top–down control by accelerating metabolic rates in ectothermic consumers, while pathogens of top consumers are predicted to dampen this top–down control. In contrast, the local IGP structure in Windermere made warming and pathogens synergistic in their top–down effects on ecosystem functioning. More generally, our results point to top predators as major mediators of community response to global change, and show that size-selective agents (e.g. pathogens, fishers or hunters) may change the topological architecture of food webs and alter whole ecosystem sensitivity to climate variation.

Kondrashov D, Kravtsov S, Robertson AW, Ghil M. A hierarchy of data-based ENSO models. Journal of Climate. 2005;18 (21) :4425–4444.

Global sea surface temperature (SST) evolution is analyzed by constructing predictive models that best describe the dataset’s statistics. These inverse models assume that the system’s variability is driven by spatially coherent, additive noise that is white in time and are constructed in the phase space of the dataset’s leading empirical orthogonal functions. Multiple linear regression has been widely used to obtain inverse stochastic models; it is generalized here in two ways. First, the dynamics is allowed to be nonlinear by using polynomial regression. Second, a multilevel extension of classic regression allows the additive noise to be correlated in time; to do so, the residual stochastic forcing at a given level is modeled as a function of variables at this level and the preceding ones. The number of variables, as well as the order of nonlinearity, is determined by optimizing model performance. The two-level linear and quadratic models have a better El Niño–Southern Oscillation (ENSO) hindcast skill than their one-level counterparts. Estimates of skewness and kurtosis of the models’ simulated Niño-3 index reveal that the quadratic model reproduces better the observed asymmetry between the positive El Niño and negative La Niña events. The benefits of the quadratic model are less clear in terms of its overall, cross-validated hindcast skill; this model outperforms, however, the linear one in predicting the magnitude of extreme SST anomalies. Seasonal ENSO dependence is captured by incorporating additive, as well as multiplicative forcing with a 12-month period into the first level of each model. The quasi-quadrennial ENSO oscillatory mode is robustly simulated by all models. The “spring barrier” of ENSO forecast skill is explained by Floquet and singular vector analysis, which show that the leading ENSO mode becomes strongly damped in summer, while nonnormal optimum growth has a strong peak in December.

Groth A, Ghil M, Hallegatte S, Dumas P. The Role of Oscillatory Modes in U.S. Business Cycles. OECD Journal: Journal of Business Cycle Measurement and Analysis. 2015;(2015/1) :63–81.

We apply multivariate singular spectrum analysis to the study of U.S. business cycle dynamics. This method provides a robust way to identify and reconstruct oscillations, whether intermittent or modulated. We show such oscillations to be associated with comovements across the entire economy. The problem of spurious cycles generated by the use of detrending filters is addressed and we present a Monte Carlo test to extract significant oscillations. The behavior of the U.S. economy is shown to change significantly from one phase of the business cycle to another: the recession phase is dominated by a five-year mode, while the expansion phase exhibits more complex dynamics, with higher-frequency modes coming into play. We show that the variations so identified cannot be generated by random shocks alone, as assumed in ‘real’ business-cycle models, and that endogenous, deterministically generated variability has to be involved.

Sella L, Vivaldo G, Groth A, Ghil M. Economic Cycles and their Synchronization: A spectral survey. Fondazione Eni Enrico Mattei (FEEM) [Internet]. 2013;105 (105) :1.

The present work applies several advanced spectral methods to the analysis of macroeconomic fluctuations in three countries of the European Union: Italy, the Netherlands, and the United Kingdom. We focus here in particular on singular-spectrum analysis (SSA), which provides valuable spatial and frequency information on multivariate data and goes far beyond a pure analysis in the time domain. The spectral methods discussed here are well established in the geosciences and life sciences, but not yet widespread in quantitative economics. In particular, they enable one to identify and describe nonlinear trends and dominant cycles (including seasonal and interannual components) that characterize the deterministic behavior of each time series. These tools have already proven their robustness when applied to short and noisy data, and we demonstrate their usefulness in the analysis of the macroeconomic indicators of these three countries. We explore several fundamental indicators of the countries' real aggregate economy in a univariate, as well as a multivariate setting. Starting with individual single-channel analysis, we are able to identify similar spectral components among the analyzed indicators. Next, we consider combinations of indicators and countries, in order to take different effects of comovements into account. Since business cycles are cross-national phenomena, which show common characteristics across countries, our aim is to uncover hidden global behavior across the European economies. Results are compared with previous findings on the U.S. indicators (Groth et al., 2012). Finally, the analysis is extended to include several indicators from the U.S. economy, in order to examine its influence on the European market.

Feliks Y, Groth A, Robertson AW, Ghil M. Oscillatory Climate Modes in the Indian Monsoon, North Atlantic and Tropical Pacific. Journal of Climate. 2013;26 :9528–9544.

This paper explores the three-way interactions between the Indian monsoon, the North Atlantic and the Tropical Pacific. Four climate records were analyzed: the monsoon rainfall in two Indian regions, the Southern Oscillation Index for the Tropical Pacific, and the NAO index for the North Atlantic. The individual records exhibit highly significant oscillatory modes with spectral peaks at 7–8 yr and in the quasi-biennial and quasi-quadrennial bands. The interactions between the three regions were investigated in the light of the synchronization theory of chaotic oscillators. The theory was applied here by combining multichannel singular-spectrum analysis (M-SSA) with a recently introduced varimax rotation of the M-SSA eigenvectors. A key result is that the 7–8-yr and 2.7-yr oscillatory modes in all three regions are synchronized, at least in part. The energy-ratio analysis, as well as time-lag results, suggest that the NAO plays a leading role in the 7–8-yr mode. It was found therewith that the South Asian monsoon is not slaved to forcing from the equatorial Pacific, although it does interact strongly with it. The time-lag analysis pinpointed this to be the case in particular for the quasi-biennial oscillatory modes. Overall, these results confirm that the approach of synchronized oscillators, combined with varimax-rotated M-SSA, is a powerful tool in studying teleconnections between regional climate modes and that it helps identify the mechanisms that operate in various frequency bands. This approach should be readily applicable to ocean modes of variability and to the problems of air-sea interaction as well.

Kravtsov S, Kondrashov D, Ghil M. Multilevel regression modeling of nonlinear processes: Derivation and applications to climatic variability. Journal of Climate. 2005;18 (21) :4404–4424.

Predictive models are constructed to best describe an observed field’s statistics within a given class of nonlinear dynamics driven by a spatially coherent noise that is white in time. For linear dynamics, such inverse stochastic models are obtained by multiple linear regression (MLR). Nonlinear dynamics, when more appropriate, is accommodated by applying multiple polynomial regression (MPR) instead; the resulting model uses polynomial predictors, but the dependence on the regression parameters is linear in both MPR and MLR. The basic concepts are illustrated using the Lorenz convection model, the classical double-well problem, and a three-well problem in two space dimensions. Given a data sample that is long enough, MPR successfully reconstructs the model coefficients in the former two cases, while the resulting inverse model captures the three-regime structure of the system’s probability density function (PDF) in the latter case. A novel multilevel generalization of the classic regression procedure is introduced next. In this generalization, the residual stochastic forcing at a given level is subsequently modeled as a function of variables at this level and all the preceding ones. The number of levels is determined so that the lag-0 covariance of the residual forcing converges to a constant matrix, while its lag-1 covariance vanishes. This method has been applied to the output of a three-layer, quasigeostrophic model and to the analysis of Northern Hemisphere wintertime geopotential height anomalies. In both cases, the inverse model simulations reproduce well the multiregime structure of the PDF constructed in the subspace spanned by the dataset’s leading empirical orthogonal functions, as well as the detailed spectrum of the dataset’s temporal evolution. These encouraging results are interpreted in terms of the modeled low-frequency flow’s feedback on the statistics of the subgrid-scale processes.
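The multilevel construction described in this abstract can be sketched for the simplest possible case. The code below is an illustration under stated assumptions, not the authors' implementation: it handles a single scalar variable, uses linear predictors only (the MLR case, without the polynomial terms of MPR), fits each level by ordinary least squares without an intercept, and leaves the choice of the number of levels to the caller. All function names are hypothetical.

```python
def solve_linear_system(A, b):
    """Solve A c = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    c = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][k] * c[k] for k in range(i + 1, n))
        c[i] = (M[i][n] - s) / M[i][i]
    return c


def least_squares(columns, target):
    """Least-squares coefficients of target on predictor columns (no intercept)."""
    gram = [[sum(p * q for p, q in zip(ci, cj)) for cj in columns] for ci in columns]
    rhs = [sum(p * t for p, t in zip(ci, target)) for ci in columns]
    return solve_linear_system(gram, rhs)


def multilevel_fit(x, n_levels):
    """Fit a scalar multilevel linear model.

    Level 0 regresses the increment of x on x. Each further level regresses
    the increment of the previous residual on x and all residuals so far.
    Returns (list of coefficient lists, residual of the last level).
    """
    predictors = [list(x)]
    coeffs = []
    residual = None
    for _ in range(n_levels):
        last = predictors[-1]
        m = len(last) - 1                       # samples left after differencing
        cols = [p[:m] for p in predictors]
        target = [last[n + 1] - last[n] for n in range(m)]
        c = least_squares(cols, target)
        residual = [target[n] - sum(ck * col[n] for ck, col in zip(c, cols))
                    for n in range(m)]
        coeffs.append(c)
        predictors.append(residual)
    return coeffs, residual
```

In the paper, levels are added until the lag-0 covariance of the residual converges to a constant matrix and its lag-1 covariance vanishes; a caller of this sketch could monitor the lag-1 autocorrelation of the returned residual for the same purpose.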

Chen C, Cane MA, Henderson N, Lee DE, Chapman D, Kondrashov D, Chekroun MD. Diversity, nonlinearity, seasonality and memory effect in ENSO simulation and prediction using empirical model reduction. Journal of Climate. 2016;29 (5) :1809-1830.

A suite of empirical model experiments under the empirical model reduction framework are conducted to advance the understanding of ENSO diversity, nonlinearity, seasonality, and the memory effect in the simulation and prediction of tropical Pacific sea surface temperature (SST) anomalies. The model training and evaluation are carried out using 4000-yr preindustrial control simulation data from the coupled model GFDL CM2.1. The results show that multivariate models with tropical Pacific subsurface information and multilevel models with SST history information both improve the prediction skill dramatically. These two types of models represent the ENSO memory effect based on either the recharge oscillator or the time-delayed oscillator viewpoint. Multilevel SST models are a bit more efficient, requiring fewer model coefficients. Nonlinearity is found necessary to reproduce the ENSO diversity feature for extreme events. The nonlinear models reconstruct the skewed probability density function of SST anomalies and improve the prediction of the skewed amplitude, though the role of nonlinearity may be slightly overestimated given the strong nonlinear ENSO in GFDL CM2.1. The models with periodic terms reproduce the SST seasonal phase locking but do not improve the prediction appreciably. The models with multiple ingredients capture several ENSO characteristics simultaneously and exhibit overall better prediction skill for more diverse target patterns. In particular, they alleviate the spring/autumn prediction barrier and reduce the tendency for predicted values to lag the target month value.

Kondrashov D, Chekroun MD, Ghil M. Data-driven non-Markovian closure models. Physica D: Nonlinear Phenomena. 2015;297 :33–55.

This paper has two interrelated foci: (i) obtaining stable and efficient data-driven closure models by using a multivariate time series of partial observations from a large-dimensional system; and (ii) comparing these closure models with the optimal closures predicted by the Mori–Zwanzig (MZ) formalism of statistical physics. Multilayer stochastic models (MSMs) are introduced as both a generalization and a time-continuous limit of existing multilevel, regression-based approaches to closure in a data-driven setting; these approaches include empirical model reduction (EMR), as well as more recent multi-layer modeling. It is shown that the multilayer structure of MSMs can provide a natural Markov approximation to the generalized Langevin equation (GLE) of the MZ formalism. A simple correlation-based stopping criterion for an EMR–MSM model is derived to assess how well it approximates the GLE solution. Sufficient conditions are derived on the structure of the nonlinear cross-interactions between the constitutive layers of a given MSM to guarantee the existence of a global random attractor. This existence ensures that no blow-up can occur for a broad class of MSM applications, a class that includes non-polynomial predictors and nonlinearities that do not necessarily preserve quadratic energy invariants. The EMR–MSM methodology is first applied to a conceptual, nonlinear, stochastic climate model of coupled slow and fast variables, in which only slow variables are observed. It is shown that the resulting closure model with energy-conserving nonlinearities efficiently captures the main statistical features of the slow variables, even when there is no formal scale separation and the fast variables are quite energetic. Second, an MSM is shown to successfully reproduce the statistics of a partially observed, generalized Lotka–Volterra model of population dynamics in its chaotic regime. 
The challenges here include the rarity of strange attractors in the model’s parameter space and the existence of multiple attractor basins with fractal boundaries. The positivity constraint on the solutions’ components replaces here the quadratic-energy–preserving constraint of fluid-flow problems and it successfully prevents blow-up.

Unal YS, Ghil M. Interannual and interdecadal oscillation patterns in sea level. Climate Dynamics. 1995;11 (5) :255–278.

Relative sea-level height (RSLH) data at 213 tide-gauge stations have been analyzed on a monthly and an annual basis to study interannual and interdecadal oscillations, respectively. The main tools of the study are singular spectrum analysis (SSA) and multi-channel SSA (M-SSA). Very-low-frequency variability of RSLH was filtered by SSA to estimate the linear trend at each station. Global sea-level rise, after postglacial rebound corrections, has been found to equal 1.62±0.38 mm/y, by averaging over 175 stations which have a trend consistent with the neighboring ones. We have identified two dominant time scales of El Niño-Southern Oscillation (ENSO) variability, quasi-biennial and low-frequency, in the RSLH data at almost all stations. However, the amplitudes of both ENSO signals are higher in the equatorial Pacific and along the west coast of North America. RSLH data were interpolated along ocean coasts by latitudinal intervals of 5 or 10 degrees, depending on station density. Interannual variability was then examined by M-SSA in five regions: eastern Pacific (25°S–55°N at 10° resolution), western Pacific (35°S–45°N at 10°), equatorial Pacific (123°E–169°W, 6 stations), eastern Atlantic (30°S, 0°, and 30°N–70°N at 5°) and western Atlantic (50°S–50°N at 10°). Throughout the Pacific, we have found three dominant spatio-temporal oscillatory patterns, associated with time scales of ENSO variability; their periods are 2, 2.5–3 and 4–6 y. In the eastern Pacific, the biennial mode and the 6-y low-frequency mode propagate poleward. There is a southward propagation of low-frequency modes in the western Pacific RSLH, between 35°N and 5°S, but no clear propagation in the latitudes further south. However, equatorward propagation of the biennial signal is very clear in the Southern Hemisphere. In the equatorial Pacific, both the quasi-quadrennial and quasi-biennial modes at 10°N propagate westward. 
Strong and weak El Niño years are evident in the sea-level time series reconstructed from the quasi-biennial and low-frequency modes. Interannual variability with periods of 3 and 4–8 y is detected in the Atlantic RSLH data. In the eastern Atlantic region, we have found slow propagation of both modes northward and southward, away from 40–45°N. Interdecadal oscillations were studied using 81 stations with sufficiently long and continuous records. Most of these have variability at 9–13 and some at 18 y. Two significant eigenmode pairs, corresponding to periods of 11.6 and 12.8 y, are found in the eastern and western Atlantic ocean at latitudes 40°N–70°N and 10°N–50°N, respectively.

Kravtsov S, Kondrashov D, Kamenkovich I, Ghil M. An empirical stochastic model of sea-surface temperatures and surface winds over the Southern Ocean. Ocean Science [Internet]. 2011;7 (6) :755–770.

This study employs NASA's recent satellite measurements of sea-surface temperatures (SSTs) and sea-level winds (SLWs), with missing data filled in by Singular Spectrum Analysis (SSA), to construct empirical models that capture both intrinsic and SST-dependent aspects of SLW variability. The model construction methodology uses a number of algorithmic innovations that are essential in providing stable estimates of the model's propagator. The best model tested herein is able to faithfully represent the time scales and spatial patterns of anomalies associated with a number of distinct processes. These processes range from the daily synoptic variability to interannual signals presumably associated with oceanic or coupled dynamics. Comparing the simulations of an SLW model forced by the observed SST anomalies with the simulations of an SLW-only model provides preliminary evidence for the ocean driving the atmosphere in the Southern Ocean region.
