Statistical methods

Kravtsov S, Kondrashov D, Ghil M. Multilevel regression modeling of nonlinear processes: Derivation and applications to climatic variability. Journal of Climate. 2005;18 (21) :4404–4424.

Predictive models are constructed to best describe an observed field’s statistics within a given class of nonlinear dynamics driven by a spatially coherent noise that is white in time. For linear dynamics, such inverse stochastic models are obtained by multiple linear regression (MLR). Nonlinear dynamics, when more appropriate, is accommodated by applying multiple polynomial regression (MPR) instead; the resulting model uses polynomial predictors, but the dependence on the regression parameters is linear in both MPR and MLR. The basic concepts are illustrated using the Lorenz convection model, the classical double-well problem, and a three-well problem in two space dimensions. Given a data sample that is long enough, MPR successfully reconstructs the model coefficients in the former two cases, while the resulting inverse model captures the three-regime structure of the system’s probability density function (PDF) in the latter case. A novel multilevel generalization of the classic regression procedure is introduced next. In this generalization, the residual stochastic forcing at a given level is subsequently modeled as a function of variables at this level and all the preceding ones. The number of levels is determined so that the lag-0 covariance of the residual forcing converges to a constant matrix, while its lag-1 covariance vanishes. This method has been applied to the output of a three-layer, quasigeostrophic model and to the analysis of Northern Hemisphere wintertime geopotential height anomalies. In both cases, the inverse model simulations reproduce well the multiregime structure of the PDF constructed in the subspace spanned by the dataset’s leading empirical orthogonal functions, as well as the detailed spectrum of the dataset’s temporal evolution. These encouraging results are interpreted in terms of the modeled low-frequency flow’s feedback on the statistics of the subgrid-scale processes.
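
As an illustration of the polynomial-regression step described in this abstract, here is a minimal sketch for the double-well problem. It assumes the simplest form dx = (x - x^3) dt + sigma dW, an Euler-Maruyama simulation, and ordinary least squares on cubic predictors; the function names, step size, noise level, and sample length are illustrative choices, not the authors' setup.

```python
import numpy as np


def simulate_double_well(n_steps, dt=0.01, sigma=0.5, seed=0):
    """Euler-Maruyama sample path of dx = (x - x^3) dt + sigma dW (assumed test system)."""
    rng = np.random.default_rng(seed)
    x = np.empty(n_steps)
    x[0] = 1.0
    noise = sigma * np.sqrt(dt) * rng.standard_normal(n_steps - 1)
    for n in range(n_steps - 1):
        x[n + 1] = x[n] + (x[n] - x[n] ** 3) * dt + noise[n]
    return x


def fit_polynomial_drift(x, dt, degree=3):
    """Regress the finite-difference tendency on polynomial predictors 1, x, ..., x^degree."""
    tendency = np.diff(x) / dt
    # The model is nonlinear in x but linear in the regression coefficients.
    predictors = np.vander(x[:-1], degree + 1, increasing=True)
    coeffs, *_ = np.linalg.lstsq(predictors, tendency, rcond=None)
    residual = tendency - predictors @ coeffs
    return coeffs, residual


dt = 0.01
x = simulate_double_well(200_000, dt=dt)
coeffs, residual = fit_polynomial_drift(x, dt)

# Lag-1 autocorrelation of the residual forcing; a value near zero suggests
# that no further regression level is needed for this toy problem.
lag1 = np.corrcoef(residual[:-1], residual[1:])[0, 1]
# Noise amplitude implied by the residual (should be close to sigma).
sigma_hat = residual.std() * np.sqrt(dt)
```

With a long enough sample, `coeffs` should approach (0, 1, 0, -1), the drift coefficients used in the simulation. In a multilevel setting, a residual with non-negligible lag-1 correlation would itself be regressed on the state and the residual at the next level.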

Chen C, Cane MA, Henderson N, Lee DE, Chapman D, Kondrashov D, Chekroun MD. Diversity, nonlinearity, seasonality and memory effect in ENSO simulation and prediction using empirical model reduction. Journal of Climate. 2016;29 (5) :1809–1830.

A suite of empirical model experiments under the empirical model reduction framework are conducted to advance the understanding of ENSO diversity, nonlinearity, seasonality, and the memory effect in the simulation and prediction of tropical Pacific sea surface temperature (SST) anomalies. The model training and evaluation are carried out using 4000-yr preindustrial control simulation data from the coupled model GFDL CM2.1. The results show that multivariate models with tropical Pacific subsurface information and multilevel models with SST history information both improve the prediction skill dramatically. These two types of models represent the ENSO memory effect based on either the recharge oscillator or the time-delayed oscillator viewpoint. Multilevel SST models are a bit more efficient, requiring fewer model coefficients. Nonlinearity is found necessary to reproduce the ENSO diversity feature for extreme events. The nonlinear models reconstruct the skewed probability density function of SST anomalies and improve the prediction of the skewed amplitude, though the role of nonlinearity may be slightly overestimated given the strong nonlinear ENSO in GFDL CM2.1. The models with periodic terms reproduce the SST seasonal phase locking but do not improve the prediction appreciably. The models with multiple ingredients capture several ENSO characteristics simultaneously and exhibit overall better prediction skill for more diverse target patterns. In particular, they alleviate the spring/autumn prediction barrier and reduce the tendency for predicted values to lag the target month value.

Kondrashov D, Chekroun MD, Ghil M. Data-driven non-Markovian closure models. Physica D: Nonlinear Phenomena. 2015;297 :33–55.

This paper has two interrelated foci: (i) obtaining stable and efficient data-driven closure models by using a multivariate time series of partial observations from a large-dimensional system; and (ii) comparing these closure models with the optimal closures predicted by the Mori–Zwanzig (MZ) formalism of statistical physics. Multilayer stochastic models (MSMs) are introduced as both a generalization and a time-continuous limit of existing multilevel, regression-based approaches to closure in a data-driven setting; these approaches include empirical model reduction (EMR), as well as more recent multi-layer modeling. It is shown that the multilayer structure of MSMs can provide a natural Markov approximation to the generalized Langevin equation (GLE) of the MZ formalism. A simple correlation-based stopping criterion for an EMR–MSM model is derived to assess how well it approximates the GLE solution. Sufficient conditions are derived on the structure of the nonlinear cross-interactions between the constitutive layers of a given MSM to guarantee the existence of a global random attractor. This existence ensures that no blow-up can occur for a broad class of MSM applications, a class that includes non-polynomial predictors and nonlinearities that do not necessarily preserve quadratic energy invariants. The EMR–MSM methodology is first applied to a conceptual, nonlinear, stochastic climate model of coupled slow and fast variables, in which only slow variables are observed. It is shown that the resulting closure model with energy-conserving nonlinearities efficiently captures the main statistical features of the slow variables, even when there is no formal scale separation and the fast variables are quite energetic. Second, an MSM is shown to successfully reproduce the statistics of a partially observed, generalized Lotka–Volterra model of population dynamics in its chaotic regime. 
The challenges here include the rarity of strange attractors in the model’s parameter space and the existence of multiple attractor basins with fractal boundaries. The positivity constraint on the solutions’ components replaces here the quadratic-energy–preserving constraint of fluid-flow problems and it successfully prevents blow-up.

Unal YS, Ghil M. Interannual and interdecadal oscillation patterns in sea level. Climate Dynamics. 1995;11 (5) :255–278.

Relative sea-level height (RSLH) data at 213 tide-gauge stations have been analyzed on a monthly and an annual basis to study interannual and interdecadal oscillations, respectively. The main tools of the study are singular spectrum analysis (SSA) and multi-channel SSA (M-SSA). Very-low-frequency variability of RSLH was filtered by SSA to estimate the linear trend at each station. Global sea-level rise, after postglacial rebound corrections, has been found to equal 1.62±0.38 mm/y, by averaging over 175 stations which have a trend consistent with the neighboring ones. We have identified two dominant time scales of El Niño-Southern Oscillation (ENSO) variability, quasi-biennial and low-frequency, in the RSLH data at almost all stations. However, the amplitudes of both ENSO signals are higher in the equatorial Pacific and along the west coast of North America. RSLH data were interpolated along ocean coasts by latitudinal intervals of 5 or 10 degrees, depending on station density. Interannual variability was then examined by M-SSA in five regions: eastern Pacific (25°S–55°N at 10° resolution), western Pacific (35°S–45°N at 10°), equatorial Pacific (123°E–169°W, 6 stations), eastern Atlantic (30°S, 0°, and 30°N–70°N at 5°) and western Atlantic (50°S–50°N at 10°). Throughout the Pacific, we have found three dominant spatio-temporal oscillatory patterns, associated with time scales of ENSO variability; their periods are 2, 2.5–3 and 4–6 y. In the eastern Pacific, the biennial mode and the 6-y low-frequency mode propagate poleward. There is a southward propagation of low-frequency modes in the western Pacific RSLH, between 35°N and 5°S, but no clear propagation in the latitudes further south. However, equatorward propagation of the biennial signal is very clear in the Southern Hemisphere. In the equatorial Pacific, both the quasi-quadrennial and quasi-biennial modes at 10°N propagate westward. 
Strong and weak El Niño years are evident in the sea-level time series reconstructed from the quasi-biennial and low-frequency modes. Interannual variability with periods of 3 and 4–8 y is detected in the Atlantic RSLH data. In the eastern Atlantic region, we have found slow propagation of both modes northward and southward, away from 40–45°N. Interdecadal oscillations were studied using 81 stations with sufficiently long and continuous records. Most of these have variability at 9–13 and some at 18 y. Two significant eigenmode pairs, corresponding to periods of 11.6 and 12.8 y, are found in the eastern and western Atlantic ocean at latitudes 40°N–70°N and 10°N–50°N, respectively.

Kravtsov S, Kondrashov D, Kamenkovich I, Ghil M. An empirical stochastic model of sea-surface temperatures and surface winds over the Southern Ocean. Ocean Science [Internet]. 2011;7 (6) :755–770.

This study employs NASA's recent satellite measurements of sea-surface temperatures (SSTs) and sea-level winds (SLWs) with missing data filled-in by Singular Spectrum Analysis (SSA), to construct empirical models that capture both intrinsic and SST-dependent aspects of SLW variability. The model construction methodology uses a number of algorithmic innovations that are essential in providing stable estimates of the model's propagator. The best model tested herein is able to faithfully represent the time scales and spatial patterns of anomalies associated with a number of distinct processes. These processes range from the daily synoptic variability to interannual signals presumably associated with oceanic or coupled dynamics. Comparing the simulations of an SLW model forced by the observed SST anomalies with the simulations of an SLW-only model provides preliminary evidence for the ocean driving the atmosphere in the Southern Ocean region.

Kondrashov D, Ghil M. Spatio-temporal filling of missing points in geophysical data sets. Nonlinear Processes in Geophysics. 2006;13 (2) :151–159.

The majority of data sets in the geosciences are obtained from observations and measurements of natural systems, rather than in the laboratory. These data sets are often full of gaps, due to the conditions under which the measurements are made. Missing data give rise to various problems, for example in spectral estimation or in specifying boundary conditions for numerical models. Here we use Singular Spectrum Analysis (SSA) to fill the gaps in several types of data sets. For a univariate record, our procedure uses only temporal correlations in the data to fill in the missing points. For a multivariate record, multi-channel SSA (M-SSA) takes advantage of both spatial and temporal correlations. We iteratively produce estimates of missing data points, which are then used to compute a self-consistent lag-covariance matrix; cross-validation allows us to optimize the window width and number of dominant SSA or M-SSA modes to fill the gaps. The optimal parameters of our procedure depend on the distribution in time (and space) of the missing data, as well as on the variance distribution between oscillatory modes and noise. The algorithm is demonstrated on synthetic examples, as well as on data sets from oceanography, hydrology, atmospheric sciences, and space physics: global sea-surface temperature, flood-water records of the Nile River, the Southern Oscillation Index (SOI), and satellite observations of relativistic electrons.
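
The iterative idea in this abstract can be sketched for a univariate record as follows. This is a simplified variant under stated assumptions: it uses an SVD of the trajectory matrix rather than an explicit lag-covariance matrix, keeps the window width and number of modes fixed instead of choosing them by cross-validation, and centers the series only once. The names `ssa_reconstruct` and `ssa_fill` are hypothetical.

```python
import numpy as np


def ssa_reconstruct(x, window, n_modes):
    """Reconstruct a 1-D series from its leading n_modes SSA components."""
    n = len(x)
    k = n - window + 1
    # Trajectory matrix: column j holds the lagged segment x[j : j + window].
    traj = np.column_stack([x[j:j + window] for j in range(k)])
    u, s, vt = np.linalg.svd(traj, full_matrices=False)
    low_rank = (u[:, :n_modes] * s[:n_modes]) @ vt[:n_modes]
    # Diagonal averaging maps the low-rank matrix back to a time series.
    recon = np.zeros(n)
    counts = np.zeros(n)
    for j in range(k):
        recon[j:j + window] += low_rank[:, j]
        counts[j:j + window] += 1
    return recon / counts


def ssa_fill(x, window, n_modes, n_iter=200, tol=1e-8):
    """Fill NaN gaps by iterating SSA reconstructions until the gap values settle."""
    x = np.asarray(x, dtype=float)
    missing = np.isnan(x)
    if not missing.any():
        return x.copy()
    mean = x[~missing].mean()
    # Start with the gaps set to the mean of the centered record (zero).
    filled = np.where(missing, 0.0, x - mean)
    for _ in range(n_iter):
        recon = ssa_reconstruct(filled, window, n_modes)
        change = np.max(np.abs(recon[missing] - filled[missing]))
        filled[missing] = recon[missing]  # observed points are never altered
        if change < tol:
            break
    return filled + mean


# Synthetic example: a sine wave with two short gaps.
t = np.arange(200)
truth = np.sin(2 * np.pi * t / 20)
gappy = truth.copy()
gappy[50:55] = np.nan
gappy[120:124] = np.nan
filled = ssa_fill(gappy, window=40, n_modes=2)
gap_error = np.max(np.abs(filled - truth)[np.isnan(gappy)])
```

In practice the window width and number of modes would be tuned, for example by withholding some observed points and comparing the filled values against them, as the abstract's cross-validation step suggests.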

Groth A, Ghil M. Multivariate singular spectrum analysis and the road to phase synchronization. Physical Review E. 2011;84 :036206.

We show that multivariate singular spectrum analysis (M-SSA) greatly helps study phase synchronization in a large system of coupled oscillators and in the presence of high observational noise levels. With no need for detailed knowledge of individual subsystems nor any a priori phase definition for each of them, we demonstrate that M-SSA can automatically identify multiple oscillatory modes and detect whether these modes are shared by clusters of phase- and frequency-locked oscillators. As an essential modification of M-SSA, here we introduce variance-maximization (varimax) rotation of the M-SSA eigenvectors to optimally identify synchronized-oscillator clustering.

Walwer D, Calais E, Ghil M. Data-Adaptive Detection of Transient Deformation in Geodetic Networks. Journal of Geophysical Research: Solid Earth. 2016;121 (3) :2129–2152.

The recent development of dense and continuously operating Global Navigation Satellite System (GNSS) networks worldwide has led to a significant increase in geodetic data sets that sometimes capture transient-deformation signals. It is challenging, however, to extract such transients of geophysical origin from the background noise inherent to GNSS time series and, even more so, to separate them from other signals, such as seasonal redistributions of geophysical fluid mass loads. In addition, because of the very large number of continuously recording GNSS stations now available, it has become impossible to systematically inspect each time series and visually compare them at all neighboring sites. Here we show that Multichannel Singular Spectrum Analysis (M-SSA), a method derived from the analysis of dynamical systems, can be used to extract transient deformations, seasonal oscillations, and background noise present in GNSS time series. M-SSA is a multivariate, nonparametric, statistical method that simultaneously exploits the spatial and temporal correlations of geophysical fields. The method allows for the extraction of common modes of variability, such as trends with nonconstant slopes and oscillations shared across time series, without a priori hypotheses about their spatiotemporal structure or their noise characteristics. We illustrate this method using synthetic examples and show applications to actual GPS data from Alaska to detect seasonal signals and microdeformation at the Akutan active volcano. The geophysically coherent spatiotemporal patterns of uplift and subsidence thus detected are compared to the results of an idealized model of such processes in the presence of a magma chamber source.

Kondrashov D, Kravtsov S, Ghil M. Empirical mode reduction in a model of extratropical low-frequency variability. Journal of the Atmospheric Sciences. 2006;63 (7) :1859–1877.

This paper constructs and analyzes a reduced nonlinear stochastic model of extratropical low-frequency variability. To do so, it applies multilevel quadratic regression to the output of a long simulation of a global baroclinic, quasigeostrophic, three-level (QG3) model with topography; the model's phase space has a dimension of O(10^4). The reduced model has 45 variables and captures well the non-Gaussian features of the QG3 model's probability density function (PDF). In particular, the reduced model's PDF shares with the QG3 model its four anomalously persistent flow patterns, which correspond to opposite phases of the Arctic Oscillation and the North Atlantic Oscillation, as well as the Markov chain of transitions between these regimes. In addition, multichannel singular spectrum analysis identifies intraseasonal oscillations with a period of 35–37 days and of 20 days in the data generated by both the QG3 model and its low-dimensional analog. An analytical and numerical study of the reduced model starts with the fixed points and oscillatory eigenmodes of the model's deterministic part and uses systematically an increasing noise parameter to connect these with the behavior of the full, stochastically forced model version. The results of this study point to the origin of the QG3 model's multiple regimes and intraseasonal oscillations and identify the connections between the two types of behavior.
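
For orientation, a multilevel quadratic regression model of the kind named in this abstract is often written schematically as below. The notation is an assumption made for illustration (x for the retained variables, r^{(l)} for the residual at level l, and A_i, b_i^{(l)}, c_i^{(0)} for fitted coefficients); it is not a transcription of the paper's equations.

```latex
\begin{aligned}
dx_i &= \left( \mathbf{x}^{T} A_i\, \mathbf{x} + \mathbf{b}_i^{(0)} \mathbf{x} + c_i^{(0)} \right) dt + r_i^{(0)}\, dt, \\
dr_i^{(0)} &= \mathbf{b}_i^{(1)} \left[ \mathbf{x}, \mathbf{r}^{(0)} \right] dt + r_i^{(1)}\, dt, \\
&\;\;\vdots \\
dr_i^{(L-1)} &= \mathbf{b}_i^{(L)} \left[ \mathbf{x}, \mathbf{r}^{(0)}, \dots, \mathbf{r}^{(L-1)} \right] dt + r_i^{(L)}\, dt .
\end{aligned}
```

Only the first level is quadratic in x; each further level is a linear regression of the previous residual's increment on the state and all residuals so far. In this reading, levels are added until the last residual r^{(L)} behaves approximately like white noise in time, at which point it is replaced by a spatially correlated stochastic forcing.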

Feliks Y, Ghil M, Ziona I, Dynamique M. Long-range forecasting and the scientific background in Joseph's interpretation to Pharaoh’s dreams, in Proc. 16th Conf. Research Judaea & Samaria. in press ; 2006.

Long-range forecasting is today a major area of climate research. Such forecasts affect socioeconomic planning in many fields of activity. There are essentially two approaches to long-range forecasting: one is based on solving the equations that govern atmospheric and ocean dynamics, the other on the statistical properties of past climate records. The present talk is based on the latter, statistical approach. Joseph’s interpretation of Pharaoh’s dreams provides a striking example of long-range planning based on a climate forecast. Joseph interpreted the two dreams as a forecast for seven years of plenty, followed by seven of famine. Based on this forecast, he proposed to Pharaoh a plan for running the agriculture and economy of Egypt. It is not clear from the Biblical story why Pharaoh trusted Joseph’s forecast and appointed him to implement the plan. Our answer to this question is based on ancient and medieval Egypt’s being entirely dependent on the Nile River’s seasonal flooding: when the highest water levels did not cover the arable areas of the river valley, crops were insufficient to feed the population. When successive years of hunger weakened the economy and the state, change of rulers could, and sometimes did, ensue. Extreme examples were the fall of the Old Kingdom in 2185 B.C. and the Fatimid conquest of Egypt in 969 A.D. Hence the Egyptians measured the high-water mark of the Nile River for over 5000 years, using different tools. The most advanced of these tools was the nilometer; typical nilometers appear in several mosaics from the Roman and Byzantine period around the Mediterranean, such as the “Nile Festival” mosaic in Zippori (Upper Galilee), Fig. 1. The measurements had a twofold purpose: first to set the annual taxes, which were a function of the high-water mark, for obvious reasons; and second, to provide information for water management, with a view to reducing drought damage. Our analysis of high- and low-water levels for 622–1922 A.D. shows that oscillations with a period of several years occur, with a 7-year oscillation being dominant. We suspect that the origin of this 7-year swing lies in the same periodicity being present in the North Atlantic’s sea-surface temperatures and sea-level pressures. This North Atlantic Oscillation affects the climate of Europe, North America and the Middle East, and might be the ultimate reason for Joseph’s successful climate forecast.

Groth A, Ghil M, Hallegatte S, Dumas P. The Role of Oscillatory Modes in U.S. Business Cycles. Fondazione Eni Enrico Mattei (FEEM) [Internet]. 2012;26 :1.

We apply the advanced time-and-frequency-domain method of singular spectrum analysis to study business cycle dynamics in a set of nine U.S. macroeconomic indicators. This method provides a robust way to identify and reconstruct shared oscillations, whether intermittent or modulated. We address the problem of spurious cycles generated by the use of detrending filters and present a Monte Carlo test to extract significant oscillations. Finally, we demonstrate that the behavior of the U.S. economy changes significantly between episodes of growth and recession; these variations cannot be generated by random shocks alone, in the absence of endogenous variability.