Statistical methods

Kondrashov D, Chekroun MD, Ghil M. Data-driven non-Markovian closure models. Physica D: Nonlinear Phenomena. 2015;297:33–55.

This paper has two interrelated foci: (i) obtaining stable and efficient data-driven closure models by using a multivariate time series of partial observations from a large-dimensional system; and (ii) comparing these closure models with the optimal closures predicted by the Mori–Zwanzig (MZ) formalism of statistical physics. Multilayer stochastic models (MSMs) are introduced as both a generalization and a time-continuous limit of existing multilevel, regression-based approaches to closure in a data-driven setting; these approaches include empirical model reduction (EMR), as well as more recent multi-layer modeling. It is shown that the multilayer structure of MSMs can provide a natural Markov approximation to the generalized Langevin equation (GLE) of the MZ formalism. A simple correlation-based stopping criterion for an EMR–MSM model is derived to assess how well it approximates the GLE solution. Sufficient conditions are derived on the structure of the nonlinear cross-interactions between the constitutive layers of a given MSM to guarantee the existence of a global random attractor. This existence ensures that no blow-up can occur for a broad class of MSM applications, a class that includes non-polynomial predictors and nonlinearities that do not necessarily preserve quadratic energy invariants. The EMR–MSM methodology is first applied to a conceptual, nonlinear, stochastic climate model of coupled slow and fast variables, in which only slow variables are observed. It is shown that the resulting closure model with energy-conserving nonlinearities efficiently captures the main statistical features of the slow variables, even when there is no formal scale separation and the fast variables are quite energetic. Second, an MSM is shown to successfully reproduce the statistics of a partially observed, generalized Lotka–Volterra model of population dynamics in its chaotic regime. 
The challenges here include the rarity of strange attractors in the model’s parameter space and the existence of multiple attractor basins with fractal boundaries. The positivity constraint on the solutions’ components replaces here the quadratic-energy–preserving constraint of fluid-flow problems and it successfully prevents blow-up.
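The multilevel, regression-based idea behind EMR can be illustrated with a minimal sketch. It is not the authors' implementation: it uses purely linear levels, a unit time step, and a stopping rule based on the lag-1 autocorrelation of the last residual, which is only an assumed stand-in for the correlation-based criterion derived in the paper. The names `fit_multilevel`, `lag1_autocorr`, `max_levels`, and `tol` are hypothetical.

```python
import numpy as np

def lag1_autocorr(r):
    """Largest absolute lag-1 autocorrelation across the columns of r."""
    r = r - r.mean(axis=0)
    num = np.sum(r[1:] * r[:-1], axis=0)
    den = np.sum(r * r, axis=0)
    return np.max(np.abs(num / den))

def fit_multilevel(x, max_levels=5, tol=0.1):
    """Fit linear regression levels until the last residual looks roughly white.

    x: array of shape (n_samples, n_vars), sampled with unit time step.
    Returns the list of coefficient matrices and the final residual.
    Assumption: linear predictors only; the paper allows nonlinear ones.
    """
    predictors = x[:-1]          # state at time t
    target = np.diff(x, axis=0)  # increment x[t+1] - x[t]
    levels = []
    for _ in range(max_levels):
        coef, *_ = np.linalg.lstsq(predictors, target, rcond=None)
        resid = target - predictors @ coef
        levels.append(coef)
        # Assumed stopping rule: stop once the residual is nearly uncorrelated in time.
        if lag1_autocorr(resid) < tol:
            break
        # Next level: regress the residual increment on all variables so far.
        predictors = np.hstack([predictors[:-1], resid[:-1]])
        target = np.diff(resid, axis=0)
    return levels, resid
```

Each added level treats the previous residual as a hidden variable with its own regression, so the stacked system is Markovian in the extended state even though the observed variables alone are not.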

Unal YS, Ghil M. Interannual and interdecadal oscillation patterns in sea level. Climate Dynamics. 1995;11(5):255–278.

Relative sea-level height (RSLH) data at 213 tide-gauge stations have been analyzed on a monthly and an annual basis to study interannual and interdecadal oscillations, respectively. The main tools of the study are singular spectrum analysis (SSA) and multi-channel SSA (M-SSA). Very-low-frequency variability of RSLH was filtered by SSA to estimate the linear trend at each station. Global sea-level rise, after postglacial rebound corrections, has been found to equal 1.62±0.38 mm/y, by averaging over 175 stations which have a trend consistent with the neighboring ones. We have identified two dominant time scales of El Niño-Southern Oscillation (ENSO) variability, quasi-biennial and low-frequency, in the RSLH data at almost all stations. However, the amplitudes of both ENSO signals are higher in the equatorial Pacific and along the west coast of North America. RSLH data were interpolated along ocean coasts by latitudinal intervals of 5 or 10 degrees, depending on station density. Interannual variability was then examined by M-SSA in five regions: eastern Pacific (25°S–55°N at 10° resolution), western Pacific (35°S–45°N at 10°), equatorial Pacific (123°E–169°W, 6 stations), eastern Atlantic (30°S, 0°, and 30°N–70°N at 5°) and western Atlantic (50°S–50°N at 10°). Throughout the Pacific, we have found three dominant spatio-temporal oscillatory patterns, associated with time scales of ENSO variability; their periods are 2, 2.5–3 and 4–6 y. In the eastern Pacific, the biennial mode and the 6-y low-frequency mode propagate poleward. There is a southward propagation of low-frequency modes in the western Pacific RSLH, between 35°N and 5°S, but no clear propagation in the latitudes further south. However, equatorward propagation of the biennial signal is very clear in the Southern Hemisphere. In the equatorial Pacific, both the quasi-quadrennial and quasi-biennial modes at 10°N propagate westward. 
Strong and weak El Niño years are evident in the sea-level time series reconstructed from the quasi-biennial and low-frequency modes. Interannual variability with periods of 3 and 4–8 y is detected in the Atlantic RSLH data. In the eastern Atlantic region, we have found slow propagation of both modes northward and southward, away from 40–45°N. Interdecadal oscillations were studied using 81 stations with sufficiently long and continuous records. Most of these have variability at 9–13 and some at 18 y. Two significant eigenmode pairs, corresponding to periods of 11.6 and 12.8 y, are found in the eastern and western Atlantic Ocean at latitudes 40°N–70°N and 10°N–50°N, respectively.

Kravtsov S, Kondrashov D, Kamenkovich I, Ghil M. An empirical stochastic model of sea-surface temperatures and surface winds over the Southern Ocean. Ocean Science [Internet]. 2011;7(6):755–770.

This study employs NASA's recent satellite measurements of sea-surface temperatures (SSTs) and sea-level winds (SLWs), with missing data filled in by Singular Spectrum Analysis (SSA), to construct empirical models that capture both intrinsic and SST-dependent aspects of SLW variability. The model construction methodology uses a number of algorithmic innovations that are essential in providing stable estimates of the model's propagator. The best model tested herein is able to faithfully represent the time scales and spatial patterns of anomalies associated with a number of distinct processes. These processes range from the daily synoptic variability to interannual signals presumably associated with oceanic or coupled dynamics. Comparing the simulations of an SLW model forced by the observed SST anomalies with the simulations of an SLW-only model provides preliminary evidence for the ocean driving the atmosphere in the Southern Ocean region.

Kondrashov D, Ghil M. Spatio-temporal filling of missing points in geophysical data sets. Nonlinear Processes in Geophysics. 2006;13(2):151–159.

The majority of data sets in the geosciences are obtained from observations and measurements of natural systems, rather than in the laboratory. These data sets are often full of gaps, due to the conditions under which the measurements are made. Missing data give rise to various problems, for example in spectral estimation or in specifying boundary conditions for numerical models. Here we use Singular Spectrum Analysis (SSA) to fill the gaps in several types of data sets. For a univariate record, our procedure uses only temporal correlations in the data to fill in the missing points. For a multivariate record, multi-channel SSA (M-SSA) takes advantage of both spatial and temporal correlations. We iteratively produce estimates of missing data points, which are then used to compute a self-consistent lag-covariance matrix; cross-validation allows us to optimize the window width and number of dominant SSA or M-SSA modes to fill the gaps. The optimal parameters of our procedure depend on the distribution in time (and space) of the missing data, as well as on the variance distribution between oscillatory modes and noise. The algorithm is demonstrated on synthetic examples, as well as on data sets from oceanography, hydrology, atmospheric sciences, and space physics: global sea-surface temperature, flood-water records of the Nile River, the Southern Oscillation Index (SOI), and satellite observations of relativistic electrons.
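The iterative procedure described in this abstract can be sketched for the univariate case as follows. This is a simplified illustration under stated assumptions, not the published algorithm: the window width and number of modes are fixed by hand rather than chosen by cross-validation, all retained modes are updated together in a fixed number of iterations, and the function names `ssa_reconstruct` and `ssa_fill` are hypothetical.

```python
import numpy as np

def ssa_reconstruct(x, window, n_modes):
    """Reconstruct a gap-free series x from its leading n_modes SSA components."""
    n = len(x)
    k = n - window + 1
    # Trajectory matrix: each row is one lagged window of the series.
    traj = np.array([x[i:i + window] for i in range(k)])
    # Lag-covariance matrix and its eigenvectors (the SSA modes).
    cov = traj.T @ traj / k
    eigvals, eigvecs = np.linalg.eigh(cov)
    lead = eigvecs[:, np.argsort(eigvals)[::-1][:n_modes]]
    # Project onto the leading modes, then average over anti-diagonals.
    approx = traj @ lead @ lead.T
    recon = np.zeros(n)
    counts = np.zeros(n)
    for i in range(k):
        recon[i:i + window] += approx[i]
        counts[i:i + window] += 1
    return recon / counts

def ssa_fill(x, window, n_modes, n_iter=50):
    """Fill NaN gaps in x by iterating SSA reconstruction on the missing points."""
    x = np.asarray(x, dtype=float)
    missing = np.isnan(x)
    mean = np.nanmean(x)
    # Center the record and start the gaps at zero, i.e. at the observed mean.
    filled = np.where(missing, 0.0, x - mean)
    for _ in range(n_iter):
        recon = ssa_reconstruct(filled, window, n_modes)
        # Only the missing points are updated; observed values stay fixed.
        filled[missing] = recon[missing]
    return filled + mean
```

Because the lag-covariance matrix is recomputed from the current filled series at every pass, the gap estimates and the covariance estimate become mutually consistent as the iteration proceeds. In practice, `window` and `n_modes` would be selected by withholding some observed points and comparing the fill against them.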