Statistical methods

Kondrashov D, Ghil M. Spatio-temporal filling of missing points in geophysical data sets. Nonlinear Processes in Geophysics. 2006;13 (2) :151–159.

The majority of data sets in the geosciences are obtained from observations and measurements of natural systems, rather than in the laboratory. These data sets are often full of gaps, due to the conditions under which the measurements are made. Missing data give rise to various problems, for example in spectral estimation or in specifying boundary conditions for numerical models. Here we use Singular Spectrum Analysis (SSA) to fill the gaps in several types of data sets. For a univariate record, our procedure uses only temporal correlations in the data to fill in the missing points. For a multivariate record, multi-channel SSA (M-SSA) takes advantage of both spatial and temporal correlations. We iteratively produce estimates of missing data points, which are then used to compute a self-consistent lag-covariance matrix; cross-validation allows us to optimize the window width and number of dominant SSA or M-SSA modes to fill the gaps. The optimal parameters of our procedure depend on the distribution in time (and space) of the missing data, as well as on the variance distribution between oscillatory modes and noise. The algorithm is demonstrated on synthetic examples, as well as on data sets from oceanography, hydrology, atmospheric sciences, and space physics: global sea-surface temperature, flood-water records of the Nile River, the Southern Oscillation Index (SOI), and satellite observations of relativistic electrons.
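
The iterative idea in this abstract can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' procedure: it takes an SVD of the trajectory matrix instead of computing a lag-covariance matrix, it keeps the window width and number of modes fixed rather than choosing them by cross-validation, and the names `ssa_reconstruct` and `ssa_fill` are invented for this example.

```python
import numpy as np

def ssa_reconstruct(x, window, n_modes):
    """Reconstruct a complete 1-D series from its leading `n_modes` SSA modes."""
    n = len(x)
    k = n - window + 1
    # Trajectory matrix: column i holds the lagged segment x[i : i + window].
    traj = np.column_stack([x[i:i + window] for i in range(k)])
    u, s, vt = np.linalg.svd(traj, full_matrices=False)
    approx = (u[:, :n_modes] * s[:n_modes]) @ vt[:n_modes]
    # Average over anti-diagonals to turn the matrix back into a series.
    recon = np.zeros(n)
    counts = np.zeros(n)
    for i in range(k):
        recon[i:i + window] += approx[:, i]
        counts[i:i + window] += 1
    return recon / counts

def ssa_fill(x, window, n_modes, n_iter=100):
    """Fill NaN gaps by repeatedly replacing them with the SSA reconstruction."""
    x = np.asarray(x, dtype=float)
    missing = np.isnan(x)
    # First guess: the mean of the observed points.
    filled = np.where(missing, np.nanmean(x), x)
    for _ in range(n_iter):
        recon = ssa_reconstruct(filled, window, n_modes)
        # Only the missing points are updated; observations stay untouched.
        filled[missing] = recon[missing]
    return filled

# Synthetic test series (an assumption for illustration): one pure oscillation.
t = np.arange(200)
truth = np.sin(2 * np.pi * t / 20)
gappy = truth.copy()
gappy[60:70] = np.nan
gappy[120:128] = np.nan
filled = ssa_fill(gappy, window=40, n_modes=2)
```

On real records the window width and mode count would have to be tuned, for instance by withholding some observed points and comparing them with their filled values, which is the role cross-validation plays in the abstract.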

Groth A, Ghil M. Multivariate singular spectrum analysis and the road to phase synchronization. Physical Review E. 2011;84 :036206.

We show that multivariate singular spectrum analysis (M-SSA) greatly helps study phase synchronization in a large system of coupled oscillators and in the presence of high observational noise levels. With no need for detailed knowledge of individual subsystems nor any a priori phase definition for each of them, we demonstrate that M-SSA can automatically identify multiple oscillatory modes and detect whether these modes are shared by clusters of phase- and frequency-locked oscillators. As an essential modification of M-SSA, here we introduce variance-maximization (varimax) rotation of the M-SSA eigenvectors to optimally identify synchronized-oscillator clustering.

Walwer D, Calais E, Ghil M. Data-Adaptive Detection of Transient Deformation in Geodetic Networks. Journal of Geophysical Research: Solid Earth. 2016;121 (3) :2129–2152.

The recent development of dense and continuously operating Global Navigation Satellite System (GNSS) networks worldwide has led to a significant increase in geodetic data sets that sometimes capture transient-deformation signals. It is challenging, however, to extract such transients of geophysical origin from the background noise inherent to GNSS time series and, even more so, to separate them from other signals, such as seasonal redistributions of geophysical fluid mass loads. In addition, because of the very large number of continuously recording GNSS stations now available, it has become impossible to systematically inspect each time series and visually compare them at all neighboring sites. Here we show that Multichannel Singular Spectrum Analysis (M-SSA), a method derived from the analysis of dynamical systems, can be used to extract transient deformations, seasonal oscillations, and background noise present in GNSS time series. M-SSA is a multivariate, nonparametric, statistical method that simultaneously exploits the spatial and temporal correlations of geophysical fields. The method allows for the extraction of common modes of variability, such as trends with nonconstant slopes and oscillations shared across time series, without a priori hypotheses about their spatiotemporal structure or their noise characteristics. We illustrate this method using synthetic examples and show applications to actual GPS data from Alaska to detect seasonal signals and microdeformation at the Akutan active volcano. The geophysically coherent spatiotemporal patterns of uplift and subsidence thus detected are compared to the results of an idealized model of such processes in the presence of a magma chamber source.
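
As a rough illustration of how M-SSA combines spatial and temporal correlations, the Python sketch below places lagged copies of every channel side by side in one trajectory matrix and keeps its leading modes. It is a minimal example under assumptions, not the processing chain of the paper: the function name `mssa_reconstruct`, the two synthetic channels, and the choice of an SVD-based formulation are all made up for this illustration.

```python
import numpy as np

def mssa_reconstruct(data, window, n_modes):
    """Reconstruct a (n_time, n_channels) array from its leading M-SSA modes."""
    data = np.asarray(data, dtype=float)
    n_time, n_chan = data.shape
    k = n_time - window + 1
    # One block of lagged copies per channel; column j of a block is the
    # channel shifted by j time steps. Blocks sit side by side, so each row
    # of the full matrix sees every channel over the same time window.
    blocks = [
        np.column_stack([data[j:j + k, c] for j in range(window)])
        for c in range(n_chan)
    ]
    traj = np.hstack(blocks)  # shape (k, n_chan * window)
    u, s, vt = np.linalg.svd(traj, full_matrices=False)
    approx = (u[:, :n_modes] * s[:n_modes]) @ vt[:n_modes]
    # Map each block back to a series by averaging entries that share a time.
    recon = np.zeros_like(data)
    counts = np.zeros(n_time)
    for j in range(window):
        counts[j:j + k] += 1
    for c in range(n_chan):
        block = approx[:, c * window:(c + 1) * window]
        for j in range(window):
            recon[j:j + k, c] += block[:, j]
    return recon / counts[:, None]

# Two hypothetical stations sharing one oscillation with different
# amplitude and phase.
t = np.arange(120)
stations = np.column_stack([
    np.sin(2 * np.pi * t / 12),
    0.5 * np.cos(2 * np.pi * t / 12),
])
shared = mssa_reconstruct(stations, window=24, n_modes=2)
```

Because both synthetic channels carry the same oscillation, two modes are enough to reproduce them; with noisy station data the leading modes would instead isolate whatever variability the channels have in common.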

Kondrashov D, Kravtsov S, Ghil M. Empirical mode reduction in a model of extratropical low-frequency variability. Journal of the Atmospheric Sciences. 2006;63 (7) :1859–1877.

This paper constructs and analyzes a reduced nonlinear stochastic model of extratropical low-frequency variability. To do so, it applies multilevel quadratic regression to the output of a long simulation of a global baroclinic, quasigeostrophic, three-level (QG3) model with topography; the model's phase space has a dimension of O(10⁴). The reduced model has 45 variables and captures well the non-Gaussian features of the QG3 model's probability density function (PDF). In particular, the reduced model's PDF shares with the QG3 model its four anomalously persistent flow patterns, which correspond to opposite phases of the Arctic Oscillation and the North Atlantic Oscillation, as well as the Markov chain of transitions between these regimes. In addition, multichannel singular spectrum analysis identifies intraseasonal oscillations with a period of 35–37 days and of 20 days in the data generated by both the QG3 model and its low-dimensional analog. An analytical and numerical study of the reduced model starts with the fixed points and oscillatory eigenmodes of the model's deterministic part and uses systematically an increasing noise parameter to connect these with the behavior of the full, stochastically forced model version. The results of this study point to the origin of the QG3 model's multiple regimes and intraseasonal oscillations and identify the connections between the two types of behavior.
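
The regression step mentioned in this abstract can be illustrated with a small Python sketch. It covers only a single level, fitting the tendencies of a state vector as a quadratic function of the state by least squares; the further levels of a multilevel scheme, which would model the residual returned here, are omitted. The function names, the two-variable system, and its coefficients are hypothetical and chosen only to make the example checkable.

```python
import numpy as np

def quadratic_features(x):
    """Columns: constant, each x_i, then every product x_i * x_j with i <= j."""
    n_samples, n_vars = x.shape
    cols = [np.ones(n_samples)]
    cols += [x[:, i] for i in range(n_vars)]
    cols += [x[:, i] * x[:, j] for i in range(n_vars) for j in range(i, n_vars)]
    return np.column_stack(cols)

def fit_quadratic_tendency(x, dxdt):
    """Least-squares fit of dxdt as a quadratic polynomial in the state x."""
    features = quadratic_features(x)
    coeffs, *_ = np.linalg.lstsq(features, dxdt, rcond=None)
    residual = dxdt - features @ coeffs
    return coeffs, residual

# Hypothetical noise-free system with known coefficients (rows follow the
# feature order: 1, x0, x1, x0*x0, x0*x1, x1*x1; columns are the two tendencies).
rng = np.random.default_rng(0)
states = rng.normal(size=(500, 2))
true_coeffs = np.array([
    [0.1, -0.2],
    [-1.0, 0.5],
    [-0.5, -1.0],
    [0.0, 0.3],
    [0.2, 0.0],
    [-0.3, 0.1],
])
tendencies = quadratic_features(states) @ true_coeffs
coeffs, residual = fit_quadratic_tendency(states, tendencies)
```

With noise-free synthetic tendencies the fit recovers the coefficients and leaves essentially no residual; with model output, the residual would carry the unresolved variability that the stochastic part of a reduced model has to represent.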

Feliks Y, Ghil M, Ziona I, Dynamique M. Long-range forecasting and the scientific background in Joseph's interpretation to Pharaoh’s dreams, in Proc. 16th Conf. Research Judaea & Samaria. in press ; 2006.

Long-range forecasting is today a major area of climate research. Such forecasts affect socioeconomic planning in many fields of activity. There are essentially two approaches to long-range forecasting: one is based on solving the equations that govern atmospheric and ocean dynamics, the other on the statistical properties of past climate records. The present talk is based on the latter, statistical approach. Joseph’s interpretation of Pharaoh’s dreams provides a striking example of long-range planning based on a climate forecast. Joseph interpreted the two dreams as a forecast for seven years of plenty, followed by seven of famine. Based on this forecast, he proposed to Pharaoh a plan for running the agriculture and economy of Egypt. It is not clear from the Biblical story why Pharaoh trusted Joseph’s forecast and appointed him to implement the plan. Our answer to this question is based on ancient and medieval Egypt’s being entirely dependent on the Nile River’s seasonal flooding: when the highest water levels did not cover the arable areas of the river valley, crops were insufficient to feed the population. When successive years of hunger weakened the economy and the state, change of rulers could, and sometimes did, ensue. Extreme examples were the fall of the Old Kingdom in 2185 B.C. and the Fatimid conquest of Egypt in 969 A.D. Hence the Egyptians measured the high-water mark of the Nile River for over 5000 years, using different tools. The most advanced of these tools was the nilometer; typical nilometers appear in several mosaics from the Roman and Byzantine period around the Mediterranean, such as the “Nile Festival” mosaic in Zippori (Upper Galilee), Fig. 1. The measurements had a twofold purpose: first, to set the annual taxes, which were a function of the high-water mark, for obvious reasons; and second, to provide information for water management, with a view to reducing drought damage. Our analysis of high- and low-water levels for 622–1922 A.D. shows that oscillations with a period of several years occur, with a 7-year oscillation being dominant. We suspect that the origin of this 7-year swing lies in the same periodicity being present in the North Atlantic’s sea-surface temperatures and sea-level pressures. This North Atlantic Oscillation affects the climate of Europe, North America and the Middle East, and might be the ultimate reason for Joseph’s successful climate forecast.

Groth A, Ghil M, Hallegatte S, Dumas P. The Role of Oscillatory Modes in U.S. Business Cycles. Fondazione Eni Enrico Mattei (FEEM) [Internet]. 2012;26 :1.

We apply the advanced time-and-frequency-domain method of singular spectrum analysis to study business cycle dynamics in a set of nine U.S. macroeconomic indicators. This method provides a robust way to identify and reconstruct shared oscillations, whether intermittent or modulated. We address the problem of spurious cycles generated by the use of detrending filters and present a Monte Carlo test to extract significant oscillations. Finally, we demonstrate that the behavior of the U.S. economy changes significantly between episodes of growth and recession; these variations cannot be generated by random shocks alone, in the absence of endogenous variability.

Feliks Y, Ghil M, Robertson AW. Oscillatory Climate Modes in the Eastern Mediterranean and Their Synchronization with the North Atlantic Oscillation. Journal of Climate. 2010;23 (15) :4060–4079.

Oscillatory climatic modes over the North Atlantic, Ethiopian Plateau, and eastern Mediterranean were examined in instrumental and proxy records from these regions. Aside from the well-known North Atlantic Oscillation (NAO) index and the Nile River water-level records, the authors study for the first time an instrumental rainfall record from Jerusalem and a tree-ring record from the Golan Heights. The teleconnections between the regions were studied in terms of synchronization of chaotic oscillators. Standard methods for studying synchronization among such oscillators are modified by combining them with advanced spectral methods, including singular spectrum analysis. The resulting cross-spectral analysis quantifies the strength of the coupling together with the degree of synchronization. A prominent oscillatory mode with a 7–8-yr period is present in all the climatic indices studied here and is completely synchronized with the North Atlantic Oscillation. An energy analysis of the synchronization raises the possibility that this mode originates in the North Atlantic. Evidence is discussed for this mode being induced by the 7–8-yr oscillation in the position of the Gulf Stream front. A mechanism for the teleconnections between the North Atlantic, Ethiopian Plateau, and eastern Mediterranean is proposed, and implications for interannual-to-decadal climate prediction are discussed.

Kondrashov D, Shprits Y, Ghil M. Gap Filling of Solar Wind Data by Singular Spectrum Analysis. Geophysical Research Letters. 2010;37 :L15101.

Observational data sets in space physics often contain instrumental and sampling errors, as well as large gaps. This is both an obstacle and an incentive for research, since continuous data sets are typically needed for model formulation and validation. For example, the latest global empirical models of Earth's magnetic field are crucial for many space weather applications, and require time-continuous solar wind and interplanetary magnetic field (IMF) data; both of these data sets have large gaps before 1994. Singular spectrum analysis (SSA) reconstructs missing data by using an iteratively inferred, smooth “signal” that captures coherent modes, while “noise” is discarded. In this study, we apply SSA to fill in large gaps in solar wind and IMF data, by combining it with geomagnetic indices that are time-continuous, and generalizing it to multivariate geophysical data consisting of gappy “driver” and continuous “response” records. The reconstruction error estimates provide information on the physics of covariability between particular solar wind parameters and geomagnetic indices.

Jiang N, Neelin JD, Ghil M. Quasi-quadrennial and quasi-biennial variability in the equatorial Pacific. Climate Dynamics. 1995;12 :101–112.

Evaluation of competing El Niño/Southern Oscillation (ENSO) theories requires one to identify separate spectral peaks in equatorial wind and sea-surface temperature (SST) time series. To sharpen this identification, we examine the seasonal-to-interannual variability of these fields by the data-adaptive method of multi-channel singular spectrum analysis (M-SSA). M-SSA is applied to the equatorial band (4°N–4°S), using 1950–1990 data from the Comprehensive Ocean and Atmosphere Data Set. Two major interannual oscillations are found in the equatorial SST and surface zonal wind fields, U. The main peak is centered at about 52 months; we refer to it as the quasi-quadrennial (QQ) mode. Quasi-biennial (QB) variability is split between two modes, with periods near 28 months and 24 months. A faster, 15-month oscillation has smaller amplitude. The QQ mode dominates the variance and has the most distinct spectral peak. In time-longitude reconstructions of this mode, the SST has the form of a standing oscillation in the eastern equatorial Pacific, while the U-field is dominated by a standing oscillation pattern in the western Pacific and exhibits also slight eastward propagation in the central and western Pacific. The locations of maximum anomalies in both QB modes are similar to those of the QQ mode. Slight westward migration in SST, across the eastern and central, and eastward propagation of U, across the western and central Pacific, are found. The significant wind anomaly covers a smaller region than for the QQ. The QQ and QB modes together represent the ENSO variability well and interfere constructively during major events. The sharper definition of the QQ spectral peak and its dominance are consistent with the “devil's staircase” interaction mechanism between the annual cycle and ENSO.

Strounine K, Kravtsov S, Kondrashov D, Ghil M. Reduced models of atmospheric low-frequency variability: Parameter estimation and comparative performance. Physica D: Nonlinear Phenomena. 2010;239 (3) :145–166.

Low-frequency variability (LFV) of the atmosphere refers to its behavior on time scales of 10–100 days, longer than the life cycle of a mid-latitude cyclone but shorter than a season. This behavior is still poorly understood and hard to predict. The present study compares various model reduction strategies that help in deriving simplified models of LFV. Three distinct strategies are applied here to reduce a fairly realistic, high-dimensional, quasi-geostrophic, 3-level (QG3) atmospheric model to lower dimensions: (i) an empirical–dynamical method, which retains only a few components in the projection of the full QG3 model equations onto a specified basis, and finds the linear deterministic and the stochastic corrections empirically as in Selten (1995) [5]; (ii) a purely dynamics-based technique, employing the stochastic mode reduction strategy of Majda et al. (2001) [62]; and (iii) a purely empirical, multi-level regression procedure, which specifies the functional form of the reduced model and finds the model coefficients by multiple polynomial regression as in Kravtsov et al. (2005) [3]. The empirical–dynamical and dynamical reduced models were further improved by sequential parameter estimation and benchmarked against multi-level regression models; the extended Kalman filter was used for the parameter estimation. Overall, the reduced models perform better when more statistical information is used in the model construction. Thus, the purely empirical stochastic models with quadratic nonlinearity and additive noise reproduce very well the linear properties of the full QG3 model’s LFV, i.e. its autocorrelations and spectra, as well as the nonlinear properties, i.e. the persistent flow regimes that induce non-Gaussian features in the model’s probability density function. 
The empirical–dynamical models capture the basic statistical properties of the full model’s LFV, such as the variance and integral correlation time scales of the leading LFV modes, as well as some of the regime behavior features, but fail to reproduce the detailed structure of autocorrelations and distort the statistics of the regimes. Dynamical models that use data assimilation corrections do capture the linear statistics to a degree comparable with that of empirical–dynamical models, but do much less well on the full QG3 model’s nonlinear dynamics. These results are discussed in terms of their implications for a better understanding and prediction of LFV.

Ghil M, Vautard R. Interdecadal oscillations and the warming trend in global temperature time series. Nature. 1991;350 (6316) :324–327.

The ability to distinguish a warming trend from natural variability is critical for an understanding of the climatic response to increasing greenhouse-gas concentrations. Here we use singular spectrum analysis [1] to analyse the time series of global surface air temperatures for the past 135 years [2], allowing a secular warming trend and a small number of oscillatory modes to be separated from the noise. The trend is flat until 1910, with an increase of 0.4 °C since then. The oscillations exhibit interdecadal periods of 21 and 16 years, and interannual periods of 6 and 5 years. The interannual oscillations are probably related to global aspects of the El Niño-Southern Oscillation (ENSO) phenomenon [3]. The interdecadal oscillations could be associated with changes in the extratropical ocean circulation [4]. The oscillatory components have combined (peak-to-peak) amplitudes of >0.2 °C, and therefore limit our ability to predict whether the inferred secular warming trend of 0.005 °C yr⁻¹ will continue. This could postpone incontrovertible detection of the greenhouse warming signal for one or two decades.
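
The two trend figures quoted in this abstract can be checked against each other with a line of arithmetic. The Python snippet below assumes that the record ends around 1990; the abstract does not state the end year, so `record_end_year` is a guess based on the 1991 publication date.

```python
trend_start_year = 1910   # from the abstract: the trend is flat until 1910
record_end_year = 1990    # assumption: approximate end of the record
total_warming_c = 0.4     # from the abstract: increase since 1910, in °C

# Mean warming rate over the assumed interval, in °C per year.
mean_rate = total_warming_c / (record_end_year - trend_start_year)
print(f"{mean_rate:.3f} °C per year")
```

Under that assumption the mean rate agrees with the quoted secular trend of 0.005 °C per year; shifting the assumed end year by a couple of years changes the result only in the fourth decimal place.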
