Stochastic Modeling of Multiscale Datasets

Tantet, Alexis, Mickaël D. Chekroun, Henk A. Dijkstra, and J. David Neelin. 2020. “Ruelle-Pollicott Resonances of Stochastic Systems in Reduced State Space. Part II: Stochastic Hopf Bifurcation.” Journal of Statistical Physics, 1-46.

The spectrum of the generator (Kolmogorov operator) of a diffusion process, referred to as the Ruelle-Pollicott (RP) spectrum, provides a detailed characterization of correlation functions and power spectra of stochastic systems via decomposition formulas in terms of RP resonances; see Part I of this contribution (Chekroun et al. in J Stat Phys., 2020). Stochastic analysis techniques relying on the theory of Markov semigroups for the study of the RP spectrum, together with a rigorous reduction method, are presented in Part I (Chekroun et al. 2020). This framework is here applied to study a stochastic Hopf bifurcation in view of characterizing the statistical properties of nonlinear oscillators perturbed by noise, depending on their stability. In light of the Hörmander theorem, it is first shown that the geometry of the unperturbed limit cycle, in particular its isochrons, i.e., the leaves of the stable manifold of the limit cycle generalizing the notion of phase, is essential to understand the effect of the noise and the phenomenon of phase diffusion. In addition, it is shown that the RP spectrum has a spectral gap, even at the bifurcation point, and that correlations decay exponentially fast. Explicit small-noise expansions of the RP eigenvalues and eigenfunctions are then obtained, away from the bifurcation point, based on the knowledge of the linearized deterministic dynamics and the characteristics of the noise. These formulas allow one to understand how the interaction of the noise with the deterministic dynamics affects the decay of correlations. Numerical results complement the study of the RP spectrum at the bifurcation point, revealing useful scaling laws. The analysis of the Markov semigroup for stochastic bifurcations is thus promising in providing a complementary approach to the more geometric random dynamical system (RDS) approach. This approach is not limited to low-dimensional systems, and the reduction method presented in Chekroun et al. (2020) is applied to a stochastic model relevant to climate dynamics in the third part of this contribution (Tantet et al. in J Stat Phys., 2019).
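To make the setting concrete, the following is a minimal sketch of a noisy oscillator of the kind studied here: the Hopf normal form driven by additive white noise, integrated with the Euler-Maruyama scheme. The function name, parameter names (`delta`, `omega`, `sigma`), the choice of additive isotropic noise, and all numerical values are assumptions made for this illustration and are not taken from the paper.

```python
import math
import random


def simulate_hopf(delta, omega, sigma, dt=0.01, n_steps=5000,
                  x0=0.1, y0=0.0, seed=0):
    """Euler-Maruyama integration of a Hopf normal form with additive noise.

    dx = (delta*x - omega*y - (x^2 + y^2)*x) dt + sigma dW_x
    dy = (omega*x + delta*y - (x^2 + y^2)*y) dt + sigma dW_y

    All names and values are illustrative assumptions.
    """
    rng = random.Random(seed)
    sqrt_dt = math.sqrt(dt)
    x, y = x0, y0
    path = [(x, y)]
    for _ in range(n_steps):
        r2 = x * x + y * y
        # Deterministic drift plus independent Gaussian increments.
        dx = (delta * x - omega * y - r2 * x) * dt \
            + sigma * sqrt_dt * rng.gauss(0.0, 1.0)
        dy = (omega * x + delta * y - r2 * y) * dt \
            + sigma * sqrt_dt * rng.gauss(0.0, 1.0)
        x, y = x + dx, y + dy
        path.append((x, y))
    return path


# Without noise and for delta > 0, the trajectory settles near the limit
# cycle of radius sqrt(delta); with noise, the phase along the cycle diffuses.
deterministic = simulate_hopf(delta=1.0, omega=1.0, sigma=0.0)
noisy = simulate_hopf(delta=1.0, omega=1.0, sigma=0.3, seed=1)
```

Comparing the autocorrelation of `noisy` for several values of `delta` gives a rough numerical feel for how the decay of correlations depends on the distance to the bifurcation point.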

Chekroun, Mickaël D., A. Tantet, Henk A. Dijkstra, and J. David Neelin. 2020. “Ruelle-Pollicott Resonances of Stochastic Systems in Reduced State Space. Part I: Theory.” Journal of Statistical Physics, 1-37.

A theory of Ruelle–Pollicott (RP) resonances for stochastic differential systems is presented. These resonances are defined as the eigenvalues of the generator (Kolmogorov operator) of a given stochastic system. By relying on the theory of Markov semigroups, decomposition formulas of correlation functions and power spectral densities (PSDs) in terms of RP resonances are then derived. These formulas describe, for a broad class of stochastic differential equations (SDEs), how the RP resonances characterize the decay of correlations as well as the signal’s oscillatory components manifested by peaks in the PSD. It is then shown that a notion of reduced RP resonances can be rigorously defined, as soon as the dynamics is partially observed within a reduced state space V. These reduced resonances are obtained from the spectral elements of reduced Markov operators acting on functions of the state space V, and can be estimated from time series. They inform us about the spectral elements of some coarse-grained version of the SDE generator. When the time-lag at which the transitions are collected from partial observations in V is either sufficiently small or large, it is shown that the reduced RP resonances approximate the (weak) RP resonances of the generator of the conditional expectation in V, i.e. the optimal reduced system in V obtained by averaging out the contribution of the unobserved variables. The approach is illustrated on a stochastic slow-fast system for which it is shown that the reduced RP resonances allow for a good reconstruction of the correlation functions and PSDs, even when the time-scale separation is weak. The companion articles, Part II and Part III, deal with further practical aspects of the theory presented in this contribution. One important byproduct is the diagnostic usefulness of RP resonances for stochastic dynamics. This is illustrated in the case of a stochastic Hopf bifurcation in Part II. There, it is shown that such a bifurcation has a clear manifestation in terms of a geometric organization of the RP resonances along discrete parabolas in the left half plane. Such geometric features formed by (reduced) RP resonances are extractable from time series and thus provide an unambiguous “signature” of nonlinear oscillations embedded within a stochastic background. By relying then on the theory of reduced RP resonances presented in this contribution, Part III addresses the question of detection and characterization of such oscillations in a high-dimensional stochastic system, namely the Cane–Zebiak model of El Niño–Southern Oscillation subject to noise modeling fast atmospheric fluctuations.
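The decomposition of correlation functions described in this abstract can be summarized schematically as follows. This is a sketch under simplifying assumptions made here for readability (a discrete RP spectrum with simple eigenvalues \lambda_j, an invariant measure \mu, and a Markov semigroup P_t); the weights w_j(f, g) are a shorthand introduced for this illustration and stand for projections of the observables f and g onto the corresponding eigenfunctions.

```latex
C_{f,g}(t) = \int f \, (P_t g) \, \mathrm{d}\mu - \int f \, \mathrm{d}\mu \int g \, \mathrm{d}\mu
           = \sum_{j \ge 1} e^{\lambda_j t} \, w_j(f, g),
\qquad t \ge 0, \quad \operatorname{Re} \lambda_j < 0 .
```

Each term decays at rate |Re \lambda_j| and oscillates at angular frequency Im \lambda_j, so resonances lying close to the imaginary axis with nonzero imaginary part correspond to narrow PSD peaks near \omega = Im \lambda_j.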


The response of a low-frequency mode of climate variability, El Niño–Southern Oscillation, to stochastic forcing is studied in a high-dimensional model of intermediate complexity, the fully-coupled Cane–Zebiak model (Zebiak and Cane 1987), from the spectral analysis of Markov operators governing the decay of correlations and resonances in the power spectrum. Noise-induced oscillations excited before a supercritical Hopf bifurcation are examined by means of complex resonances, the reduced Ruelle–Pollicott (RP) resonances, via a numerical application of the reduction approach of the first part of this contribution (Chekroun et al. 2019) to model simulations. The oscillations manifest themselves as peaks in the power spectrum which are associated with RP resonances organized along parabolas, as the bifurcation is neared. These resonances and the associated eigenvectors are furthermore well described by the small-noise expansion formulas obtained by Gaspard (2002) and made explicit in the second part of this contribution (Tantet et al. 2019). Beyond the bifurcation, the spectral gap between the imaginary axis and the real part of the leading resonances quantifies the diffusion of phase of the noise-induced oscillations and can be computed from the linearization of the model and from the diffusion matrix of the noise. In this model, the phase diffusion coefficient thus gives a measure of the predictability of oscillatory events representing ENSO. ENSO events being known to be locked to the seasonal cycle, these results should be extended to the non-autonomous case. More generally, the reduction approach theorized in Chekroun et al. (2019), complemented by our understanding of the spectral properties of reference systems such as the stochastic Hopf bifurcation, provides a promising methodology for the analysis of low-frequency variability in high-dimensional stochastic systems.

Kondrashov, Dmitri, Mickaël D. Chekroun, and Michael Ghil. 2018. “Data-adaptive harmonic decomposition and prediction of Arctic sea ice extent.” Dynamics and Statistics of the Climate System 3 (1): 1.

Decline in the Arctic sea ice extent (SIE) is an area of active scientific research with profound socio-economic implications. Of particular interest are reliable methods for SIE forecasting on subseasonal time scales, in particular from early summer into fall, when sea ice coverage in the Arctic reaches its minimum. Here, we apply the recent data-adaptive harmonic (DAH) technique of Chekroun and Kondrashov (2017, Chaos, 27) for the description, modeling and prediction of the Multisensor Analyzed Sea Ice Extent (MASIE, 2006–2016) data set. The DAH decomposition of MASIE identifies narrowband, spatio-temporal data-adaptive modes over four key Arctic regions. The time evolution of the DAH coefficients of these modes can be modelled and predicted by using a set of coupled Stuart–Landau stochastic differential equations that capture the modes’ frequencies and amplitude modulation in time. Retrospective forecasts show that our resulting multilayer Stuart–Landau model (MSLM) is quite skilful in predicting September SIE compared to year-to-year persistence; moreover, the DAH–MSLM approach provided accurate real-time prediction that was highly competitive for the 2016–2017 Sea Ice Outlook.

Kondrashov, Dmitri, Mickaël D. Chekroun, and Pavel Berloff. 2018. “Multiscale Stuart-Landau Emulators: Application to Wind-Driven Ocean Gyres.” Fluids 3 (1): 21.

The multiscale variability of the ocean circulation due to its nonlinear dynamics remains a big challenge for theoretical understanding and practical ocean modeling. This paper demonstrates how the data-adaptive harmonic (DAH) decomposition and inverse stochastic modeling techniques introduced in Chekroun and Kondrashov (2017, Chaos, 27) allow for reproducing with high fidelity the main statistical properties of multiscale variability in a coarse-grained eddy-resolving ocean flow. This fully data-driven approach relies on extraction of frequency-ranked time-dependent coefficients describing the evolution of spatio-temporal DAH modes (DAHMs) in the oceanic flow data. In turn, the time series of these coefficients are efficiently modeled by a family of low-order stochastic differential equations (SDEs) stacked per frequency, involving a fixed set of predictor functions and a small number of model coefficients. These SDEs take the form of stochastic oscillators, identified as multilayer Stuart–Landau models (MSLMs), and their use is justified by relying on the theory of Ruelle–Pollicott resonances. The good modeling skills shown by the resulting DAH-MSLM emulators demonstrate the feasibility of using a network of stochastic oscillators for the modeling of geophysical turbulence. In a certain sense, the original quasiperiodic Landau view of turbulence, with the amendment of the inclusion of stochasticity, may be well suited to describe turbulence.

[Figure: Decadal DAH mode]


Kondrashov, Dmitri, and Mickaël D. Chekroun. 2018. “Data-adaptive harmonic analysis and modeling of solar wind-magnetosphere coupling.” Journal of Atmospheric and Solar-Terrestrial Physics 177: 179-189.

The solar wind-magnetosphere coupling is studied using a new data-adaptive harmonic (DAH) decomposition approach for the spectral analysis and inverse modeling of multivariate time observations of complex nonlinear dynamical systems. DAH identifies frequency-based modes of interactions in the combined dataset of Auroral Electrojet (AE) index and solar wind forcing. The time evolution of these modes can be very efficiently simulated by using systems of stochastic differential equations (SDEs) that are stacked per frequency and formed by coupled Stuart-Landau oscillators. These systems of SDEs capture the modes’ frequencies as well as their amplitude modulations, and yield, in turn, an accurate modeling of the AE index’s statistical properties.


Kondrashov, Dmitri, Mickaël D. Chekroun, Xiaojun Yuan, and Michael Ghil. 2018. “Data-adaptive harmonic decomposition and stochastic modeling of Arctic sea ice.” Advances in Nonlinear Geosciences, edited by A. Tsonis, 179-205. Springer.

We present and apply a novel method of describing and modeling complex multivariate datasets in the geosciences and elsewhere. Data-adaptive harmonic (DAH) decomposition identifies narrow-banded, spatio-temporal modes (DAHMs) whose frequencies are not necessarily integer multiples of each other. The evolution in time of the DAH coefficients (DAHCs) of these modes can be modeled using a set of coupled Stuart-Landau stochastic differential equations that capture the modes’ frequencies and amplitude modulation in time and space. This methodology is applied first to a challenging synthetic dataset and then to Arctic sea ice concentration (SIC) data from the US National Snow and Ice Data Center (NSIDC). The 36-year (1979–2014) dataset is parsimoniously and accurately described by our DAHMs. Preliminary results indicate that simulations using our multilayer Stuart-Landau model (MSLM) of SICs are stable for much longer time intervals, beyond the end of the twenty-first century, and exhibit interdecadal variability consistent with past historical records. They also indicate that this MSLM is quite skillful in predicting September sea ice extent.

Chekroun, Mickaël D., and Dmitri Kondrashov. 2017. “Data-adaptive harmonic spectra and multilayer Stuart-Landau models.” Chaos: An Interdisciplinary Journal of Nonlinear Science 27 (9): 093110.

Harmonic decompositions of multivariate time series are considered for which we adopt an integral operator approach with periodic semigroup kernels. Spectral decomposition theorems are derived that cover the important cases of two-time statistics drawn from a mixing invariant measure.

The corresponding eigenvalues can be grouped per Fourier frequency, and are actually given, at each frequency, as the singular values of a cross-spectral matrix depending on the data. These eigenvalues furthermore obey a variational principle that allows us to define naturally a multidimensional power spectrum. The eigenmodes, for their part, exhibit a data-adaptive character manifested in their phase, which allows us in turn to define a multidimensional phase spectrum.

The resulting data-adaptive harmonic (DAH) modes allow for reducing the data-driven modeling effort to elemental models stacked per frequency, only coupled at different frequencies by the same noise realization. In particular, the DAH decomposition extracts time-dependent coefficients stacked by Fourier frequency which can be efficiently modeled, provided the decay of temporal correlations is sufficiently well-resolved, within a class of multilayer stochastic models (MSMs) tailored here on stochastic Stuart-Landau oscillators.

Applications to the Lorenz 96 model and to a stochastic heat equation driven by a space-time white noise are considered. In both cases, the DAH decomposition allows for an extraction of spatio-temporal modes revealing key features of the dynamics in the embedded phase space. The multilayer Stuart-Landau models (MSLMs) are shown to successfully model the typical patterns of the corresponding time-evolving fields, as well as their statistics of occurrence.
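The idea of elemental models stacked per frequency and coupled only through a shared noise realization can be sketched as below. This is a deliberately stripped-down illustration and not the MSLM of the paper: each frequency gets a single complex Stuart-Landau oscillator with no hidden layers, and the function name, the parameters (`mu`, `sigma`), and the frequency values are all assumptions chosen for the example.

```python
import cmath
import math
import random


def simulate_sl_bank(freqs, mu=-0.1, sigma=0.2, dt=0.01,
                     n_steps=2000, seed=0):
    """Bank of Stuart-Landau oscillators, one per frequency, same noise.

    dz_k = [(mu + 2*pi*i*f_k) z_k - |z_k|^2 z_k] dt + sigma dW

    The complex increment dW is drawn once per step and reused for every
    frequency, which is the only coupling between oscillators here.
    Simplified, hypothetical stand-in for a multilayer model.
    """
    rng = random.Random(seed)
    sqrt_dt = math.sqrt(dt)
    z = [0j for _ in freqs]
    history = [list(z)]
    for _ in range(n_steps):
        # Shared complex Gaussian increment for this time step.
        dw = complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) * sqrt_dt
        z = [
            zk + ((mu + 2j * math.pi * f) * zk - abs(zk) ** 2 * zk) * dt
            + sigma * dw
            for zk, f in zip(z, freqs)
        ]
        history.append(list(z))
    return history


# Two oscillators at 0.1 and 0.25 cycles per time unit. The step dt is
# kept small relative to both periods so that the explicit Euler scheme
# does not artificially destabilize the damped (mu < 0) oscillators.
bank = simulate_sl_bank((0.1, 0.25))
```

Because every oscillator sees the same increments, the resulting coefficients are statistically dependent across frequencies even though no deterministic coupling term links them.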

Boers, N., M. D. Chekroun, H. Liu, D. Kondrashov, D.-D. Rousseau, A. Svensson, M. Bigler, and M. Ghil. 2017. “Inverse stochastic-dynamic models for high-resolution Greenland ice-core records.” Earth System Dynamics 8: 1171–1190.

Proxy records from Greenland ice cores have been studied for several decades, yet many open questions remain regarding the climate variability encoded therein. Here, we use a Bayesian framework for inferring inverse, stochastic-dynamic models from δ18O and dust records of unprecedented, subdecadal temporal resolution. The records stem from the North Greenland Ice Core Project (NGRIP) and we focus on the time interval 59 ka–22 ka b2k. Our model reproduces the dynamical characteristics of both the δ18O and dust proxy records, including the millennial-scale Dansgaard–Oeschger variability, as well as statistical properties such as probability density functions, waiting times and power spectra, with no need for any external forcing. The crucial ingredients for capturing these properties are (i) high-resolution training data; (ii) cubic drift terms; (iii) nonlinear coupling terms between the δ18O and dust time series; and (iv) non-Markovian contributions that represent short-term memory effects.

Chen, C., M. Cane, N. Henderson, D. Lee, D. Chapman, D. Kondrashov, and M. D. Chekroun. 2016. “Diversity, nonlinearity, seasonality and memory effect in ENSO simulation and prediction using empirical model reduction.” Journal of Climate 29 (5): 1809–1830.

A suite of empirical model experiments under the empirical model reduction framework are conducted to advance the understanding of ENSO diversity, nonlinearity, seasonality, and the memory effect in the simulation and prediction of tropical Pacific sea surface temperature (SST) anomalies. The model training and evaluation are carried out using 4000-yr preindustrial control simulation data from the coupled model GFDL CM2.1. The results show that multivariate models with tropical Pacific subsurface information and multilevel models with SST history information both improve the prediction skill dramatically. These two types of models represent the ENSO memory effect based on either the recharge oscillator or the time-delayed oscillator viewpoint. Multilevel SST models are a bit more efficient, requiring fewer model coefficients. Nonlinearity is found necessary to reproduce the ENSO diversity feature for extreme events. The nonlinear models reconstruct the skewed probability density function of SST anomalies and improve the prediction of the skewed amplitude, though the role of nonlinearity may be slightly overestimated given the strong nonlinear ENSO in GFDL CM2.1. The models with periodic terms reproduce the SST seasonal phase locking but do not improve the prediction appreciably. The models with multiple ingredients capture several ENSO characteristics simultaneously and exhibit overall better prediction skill for more diverse target patterns. In particular, they alleviate the spring/autumn prediction barrier and reduce the tendency for predicted values to lag the target month value.

Kondrashov, Dmitri, Mickaël D. Chekroun, and Michael Ghil. 2015. “Data-driven non-Markovian closure models.” Physica D: Nonlinear Phenomena 297: 33-55.

This paper has two interrelated foci: (i) obtaining stable and efficient data-driven closure models by using a multivariate time series of partial observations from a large-dimensional system; and (ii) comparing these closure models with the optimal closures predicted by the Mori–Zwanzig (MZ) formalism of statistical physics. Multilayer stochastic models (MSMs) are introduced as both a generalization and a time-continuous limit of existing multilevel, regression-based approaches to closure in a data-driven setting; these approaches include empirical model reduction (EMR), as well as more recent multi-layer modeling. It is shown that the multilayer structure of MSMs can provide a natural Markov approximation to the generalized Langevin equation (GLE) of the MZ formalism. A simple correlation-based stopping criterion for an EMR–MSM model is derived to assess how well it approximates the GLE solution. Sufficient conditions are derived on the structure of the nonlinear cross-interactions between the constitutive layers of a given MSM to guarantee the existence of a global random attractor. This existence ensures that no blow-up can occur for a broad class of MSM applications, a class that includes non-polynomial predictors and nonlinearities that do not necessarily preserve quadratic energy invariants. The EMR–MSM methodology is first applied to a conceptual, nonlinear, stochastic climate model of coupled slow and fast variables, in which only slow variables are observed. It is shown that the resulting closure model with energy-conserving nonlinearities efficiently captures the main statistical features of the slow variables, even when there is no formal scale separation and the fast variables are quite energetic. Second, an MSM is shown to successfully reproduce the statistics of a partially observed, generalized Lotka–Volterra model of population dynamics in its chaotic regime. The challenges here include the rarity of strange attractors in the model’s parameter space and the existence of multiple attractor basins with fractal boundaries. The positivity constraint on the solutions’ components replaces here the quadratic-energy–preserving constraint of fluid-flow problems and it successfully prevents blow-up.

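How a multilayer structure can stand in for a memory term may be easier to see on a toy case. The sketch below is a hypothetical two-layer linear model, not one taken from the paper: the observed variable `x` is forced by a hidden variable `r`, and only `r` is driven by white noise. Solving the second equation for `r` expresses it as an exponentially weighted integral of the past of `x` plus a colored noise, which is the form of the memory and noise terms in a generalized Langevin equation. The coefficients `a`, `b`, `c`, `sigma` are illustrative assumptions chosen so that the linear system is stable.

```python
import math
import random


def simulate_msm(a=-0.5, b=-1.0, c=-2.0, sigma=1.0, dt=0.01,
                 n_steps=20000, seed=0):
    """Two-layer linear multilayer stochastic model (toy example).

    Main layer:   dx = (a*x + r) dt
    Hidden layer: dr = (b*x + c*r) dt + sigma dW

    Coefficients are hypothetical; they satisfy a + c < 0 and
    a*c - b > 0, so the deterministic part is linearly stable.
    """
    rng = random.Random(seed)
    sqrt_dt = math.sqrt(dt)
    x, r = 0.0, 0.0
    xs, rs = [x], [r]
    for _ in range(n_steps):
        x_new = x + (a * x + r) * dt
        r_new = r + (b * x + c * r) * dt \
            + sigma * sqrt_dt * rng.gauss(0.0, 1.0)
        x, r = x_new, r_new
        xs.append(x)
        rs.append(r)
    return xs, rs


def lag1_autocorr(series):
    """Sample autocorrelation of a series at lag one time step."""
    n = len(series)
    mean = sum(series) / n
    var = sum((s - mean) ** 2 for s in series)
    cov = sum((series[i] - mean) * (series[i + 1] - mean)
              for i in range(n - 1))
    return cov / var


xs, rs = simulate_msm()
# The hidden layer is serially correlated (red) noise rather than white
# noise, and the observed variable it forces is smoother still.
print(lag1_autocorr(rs), lag1_autocorr(xs))
```

Adding further layers in the same pattern refines the approximation of the memory kernel; the stopping criterion mentioned in the abstract addresses when to stop adding them.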
Chekroun, M. D., J. D. Neelin, D. Kondrashov, J. C. McWilliams, and M. Ghil. 2014. “Rough parameter dependence in climate models and the role of Ruelle-Pollicott resonances.” Proceedings of the National Academy of Sciences 111 (5): 1684–1690.

Despite the importance of uncertainties encountered in climate model simulations, the fundamental mechanisms at the origin of sensitive behavior of long-term model statistics remain unclear. Variability of turbulent flows in the atmosphere and oceans exhibits recurrent large-scale patterns. These patterns, while evolving irregularly in time, manifest characteristic frequencies across a large range of time scales, from intraseasonal through interdecadal. Based on modern spectral theory of chaotic and dissipative dynamical systems, the associated low-frequency variability may be formulated in terms of Ruelle-Pollicott (RP) resonances. RP resonances encode information on the nonlinear dynamics of the system, and an approach for estimating them, as filtered through an observable of the system, is proposed. This approach relies on an appropriate Markov representation of the dynamics associated with a given observable. It is shown that, within this representation, the spectral gap, defined as the distance between the subdominant RP resonance and the unit circle, plays a major role in the roughness of parameter dependences. The model statistics are the most sensitive for the smallest spectral gaps; such small gaps turn out to correspond to regimes where the low-frequency variability is more pronounced, whereas autocorrelations decay more slowly. The present approach is applied to analyze the rough parameter dependence encountered in key statistics of an El Niño–Southern Oscillation model of intermediate complexity. Theoretical arguments, however, strongly suggest that such links between model sensitivity and the decay of correlation properties are not limited to this particular model and could hold much more generally.
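The notion of spectral gap used here, the distance between the subdominant eigenvalue of a Markov representation and the unit circle, can be illustrated on the smallest possible case. The sketch below is a toy under stated assumptions, not the estimation procedure of the paper: an AR(1) series stands in for the observable, the state space is coarse-grained into just two cells (negative and non-negative values), and the function names are hypothetical.

```python
import random


def ar1_series(phi, n, seed=0):
    """AR(1) series x_{t+1} = phi * x_t + Gaussian noise (toy observable)."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


def two_state_spectral_gap(series):
    """Spectral gap of a two-cell Markov chain estimated from a series.

    The series is coarse-grained onto the cells {x < 0} and {x >= 0},
    the 2x2 transition matrix is estimated by counting transitions,
    and the gap 1 - |lambda_2| is returned.
    """
    states = [1 if v >= 0.0 else 0 for v in series]
    counts = [[0, 0], [0, 0]]
    for s, t in zip(states, states[1:]):
        counts[s][t] += 1
    p00 = counts[0][0] / (counts[0][0] + counts[0][1])
    p11 = counts[1][1] / (counts[1][0] + counts[1][1])
    # A 2x2 row-stochastic matrix has eigenvalues 1 and trace - 1.
    lambda2 = p00 + p11 - 1.0
    return 1.0 - abs(lambda2)


gap_fast = two_state_spectral_gap(ar1_series(0.5, 20000))
gap_slow = two_state_spectral_gap(ar1_series(0.95, 20000, seed=1))
# The more persistent series (slower decay of correlations) yields the
# smaller spectral gap.
print(gap_fast, gap_slow)
```

A realistic application would use a much finer partition of the reduced state space and a numerical eigenvalue solver; the two-cell case is used only because its second eigenvalue has a closed form.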


Kondrashov, D., M. D. Chekroun, A. W. Robertson, and M. Ghil. 2013. “Low-order stochastic model and “past-noise forecasting” of the Madden-Julian oscillation.” Geophysical Research Letters 40 (19): 5303–5310.

This paper presents a predictability study of the Madden-Julian Oscillation (MJO) that relies on combining empirical model reduction (EMR) with the “past-noise forecasting” (PNF) method. EMR is a data-driven methodology for constructing stochastic low-dimensional models that account for nonlinearity, seasonality and serial correlation in the estimated noise, while PNF constructs an ensemble of forecasts that accounts for interactions between (i) high-frequency variability (noise), estimated here by EMR, and (ii) the low-frequency mode of MJO, as captured by singular spectrum analysis (SSA). A key result is that, compared to an EMR ensemble driven by generic white noise, PNF is able to considerably improve prediction of MJO phase. When forecasts are initiated from weak MJO conditions, the useful skill extends up to 30 days. PNF also significantly improves MJO prediction skill for forecasts that start over the Indian Ocean.

Chekroun, M. D., D. Kondrashov, and M. Ghil. 2011. “Predicting stochastic systems by noise sampling, and application to the El Niño-Southern Oscillation.” Proceedings of the National Academy of Sciences 108 (29): 11766–11771.

Interannual and interdecadal prediction are major challenges of climate dynamics. In this article we develop a prediction method for climate processes that exhibit low-frequency variability (LFV). The method constructs a nonlinear stochastic model from past observations and estimates a path of the “weather” noise that drives this model over previous finite-time windows. The method has two steps: (i) select noise samples, or “snippets,” from the past noise, which have forced the system during short-time intervals that resemble the LFV phase just preceding the currently observed state; and (ii) use these snippets to drive the system from the current state into the future. The method is placed in the framework of pathwise linear-response theory and is then applied to an El Niño–Southern Oscillation (ENSO) model derived by the empirical model reduction (EMR) methodology; this nonlinear model has 40 coupled, slow, and fast variables. The domain of validity of this forecasting procedure depends on the nature of the system’s pathwise response; it is shown numerically that the ENSO model’s response is linear on interannual time scales. As a result, the method’s skill at a 6- to 16-month lead is highly competitive when compared with currently used dynamical and statistical prediction methods for the Niño-3 index and the global sea surface temperature field.
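The two steps of the method can be sketched mechanically on a scalar toy problem. The code below is a heavily simplified illustration and not the procedure of the paper: the fitted model is a hypothetical one-variable linear map (`model_step`), the past noise is taken to be its one-step residuals, and closeness of the state value stands in for the matching of LFV phase. All function and parameter names are assumptions introduced for the example.

```python
import random


def model_step(x, a=0.9):
    """Hypothetical deterministic part of a fitted one-variable model."""
    return a * x


def past_noise_forecast(history, horizon, n_snippets=5):
    """Toy past-noise forecast: ensemble mean over reused noise snippets."""
    # Noise that drove the fitted model along the observed history:
    # noise[t] is the residual of the transition from t to t + 1.
    noise = [history[t + 1] - model_step(history[t])
             for t in range(len(history) - 1)]
    x_now = history[-1]
    # Step (i): pick past times whose state resembles the current one
    # and whose snippet of length `horizon` lies entirely in the past.
    candidates = range(len(noise) - horizon + 1)
    analogs = sorted(candidates,
                     key=lambda t: abs(history[t] - x_now))[:n_snippets]
    # Step (ii): drive the model from the current state with each snippet.
    ensemble = []
    for t0 in analogs:
        x, path = x_now, []
        for xi in noise[t0:t0 + horizon]:
            x = model_step(x) + xi
            path.append(x)
        ensemble.append(path)
    # Ensemble mean at each lead time.
    return [sum(member[k] for member in ensemble) / len(ensemble)
            for k in range(horizon)]


# Synthetic history generated by the same toy model, then a 10-step forecast.
rng = random.Random(0)
history = [0.0]
for _ in range(500):
    history.append(model_step(history[-1]) + rng.gauss(0.0, 1.0))
forecast = past_noise_forecast(history, horizon=10)
```

In this linear toy the noise is independent of the state, so conditioning the snippets brings no skill; the example only shows the bookkeeping. The benefit reported in the abstract relies on a nonlinear model in which the noise path and the low-frequency phase interact.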