Statistical methods

2007
Camargo, Suzana J., Andrew W. Robertson, Scott J. Gaffney, Padhraic Smyth, and Michael Ghil. “Cluster analysis of typhoon tracks. Part I: General properties.” Journal of Climate 20, no. 14 (2007): 3635–3653.
PDF
Camargo, Suzana J., Andrew W. Robertson, Scott J. Gaffney, Padhraic Smyth, and Michael Ghil. “Cluster analysis of typhoon tracks. Part II: Large-scale circulation and ENSO.” Journal of Climate 20, no. 14 (2007): 3654–3676.
PDF
Kondrashov, Dmitri, Jie Shen, Richard Berk, Fabio D'Andrea, and Michael Ghil. “Predicting weather regime transitions in Northern Hemisphere datasets.” Climate Dynamics 29, no. 5 (2007): 535–551.
PDF
Gaffney, Scott J., Andrew W. Robertson, Padhraic Smyth, Suzana J. Camargo, and Michael Ghil. “Probabilistic clustering of extratropical cyclones using regression mixture models.” Climate Dynamics 29, no. 4 (2007): 423–440.
PDF
Kondrashov, Dmitri, and Michael Ghil. “Reply to T. Schneider's comment on "Spatio-temporal filling of missing points in geophysical data sets".” Nonlinear Processes in Geophysics 14, no. 1 (2007): 3–4.
PDF
Deloncle, Axel, Richard Berk, Fabio D'Andrea, and Michael Ghil. “Weather regime prediction using statistical learning.” Journal of the Atmospheric Sciences 64, no. 5 (2007): 1619–1635.
PDF
2006
Kondrashov, Dmitri, S Kravtsov, and M Ghil. “Empirical mode reduction in a model of extratropical low-frequency variability.” Journal of the Atmospheric Sciences 63, no. 7 (2006): 1859–1877. Abstract

This paper constructs and analyzes a reduced nonlinear stochastic model of extratropical low-frequency variability. To do so, it applies multilevel quadratic regression to the output of a long simulation of a global baroclinic, quasigeostrophic, three-level (QG3) model with topography; the model's phase space has a dimension of O(10^4). The reduced model has 45 variables and captures well the non-Gaussian features of the QG3 model's probability density function (PDF). In particular, the reduced model's PDF shares with the QG3 model its four anomalously persistent flow patterns, which correspond to opposite phases of the Arctic Oscillation and the North Atlantic Oscillation, as well as the Markov chain of transitions between these regimes. In addition, multichannel singular spectrum analysis identifies intraseasonal oscillations with a period of 35–37 days and of 20 days in the data generated by both the QG3 model and its low-dimensional analog. An analytical and numerical study of the reduced model starts with the fixed points and oscillatory eigenmodes of the model's deterministic part and uses systematically an increasing noise parameter to connect these with the behavior of the full, stochastically forced model version. The results of this study point to the origin of the QG3 model's multiple regimes and intraseasonal oscillations and identify the connections between the two types of behavior.

PDF
Feliks, Yizhak, and Michael Ghil. “Long-range forecasting and the scientific background in Joseph’s interpretation to Pharaoh’s dreams.” In Proc. 16th Conf. Research Judaea & Samaria. in press, 2006. Abstract

Long-range forecasting is today a major area of climate research. Such forecasts affect socioeconomic planning in many fields of activity. There are essentially two approaches to long-range forecasting: one is based on solving the equations that govern atmospheric and ocean dynamics, the other on the statistical properties of past climate records. The present talk is based on the latter, statistical approach. Joseph’s interpretation of Pharaoh’s dreams provides a striking example of long-range planning based on a climate forecast. Joseph interpreted the two dreams as a forecast for seven years of plenty, followed by seven of famine. Based on this forecast, he proposed to Pharaoh a plan for running the agriculture and economy of Egypt. It is not clear from the Biblical story why Pharaoh trusted Joseph’s forecast and appointed him to implement the plan. Our answer to this question is based on ancient and medieval Egypt’s being entirely dependent on the Nile River’s seasonal flooding: when the highest water levels did not cover the arable areas of the river valley, crops were insufficient to feed the population. When successive years of hunger weakened the economy and the state, change of rulers could, and sometimes did, ensue. Extreme examples were the fall of the Old Kingdom in 2185 B.C. and the Fatimid conquest of Egypt in 969 A.D. Hence the Egyptians measured the high-water mark of the Nile River for over 5000 years, using different tools. The most advanced of these tools was the nilometer; typical nilometers appear in several mosaics from the Roman and Byzantine period around the Mediterranean, such as the “Nile Festival” mosaic in Zippori (Upper Galilee), Fig. 1. The measurements had a twofold purpose: first, to set the annual taxes, which were a function of the high-water mark, for obvious reasons; and second, to provide information for water management, with a view to reducing drought damage. Our analysis of high- and low-water levels for 622–1922 A.D. shows that oscillations with a period of several years occur, with a 7-year oscillation being dominant. We suspect that the origin of this 7-year swing lies in the same periodicity being present in the North Atlantic’s sea-surface temperatures and sea-level pressures. This North Atlantic Oscillation affects the climate of Europe, North America and the Middle East, and might be the ultimate reason for Joseph’s successful climate forecast.

PDF (in Hebrew) PDF (English abstract)
Kondrashov, Dmitri, and Michael Ghil. “Spatio-temporal filling of missing points in geophysical data sets.” Nonlinear Processes in Geophysics 13, no. 2 (2006): 151–159. Abstract

The majority of data sets in the geosciences are obtained from observations and measurements of natural systems, rather than in the laboratory. These data sets are often full of gaps, due to the conditions under which the measurements are made. Missing data give rise to various problems, for example in spectral estimation or in specifying boundary conditions for numerical models. Here we use Singular Spectrum Analysis (SSA) to fill the gaps in several types of data sets. For a univariate record, our procedure uses only temporal correlations in the data to fill in the missing points. For a multivariate record, multi-channel SSA (M-SSA) takes advantage of both spatial and temporal correlations. We iteratively produce estimates of missing data points, which are then used to compute a self-consistent lag-covariance matrix; cross-validation allows us to optimize the window width and number of dominant SSA or M-SSA modes to fill the gaps. The optimal parameters of our procedure depend on the distribution in time (and space) of the missing data, as well as on the variance distribution between oscillatory modes and noise. The algorithm is demonstrated on synthetic examples, as well as on data sets from oceanography, hydrology, atmospheric sciences, and space physics: global sea-surface temperature, flood-water records of the Nile River, the Southern Oscillation Index (SOI), and satellite observations of relativistic electrons.
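
The iterative idea in this abstract can be illustrated with a simplified univariate sketch. It is not the authors' implementation: it uses an SVD of the trajectory matrix rather than an explicit lag-covariance matrix, keeps a fixed number of modes instead of adding them one at a time, and omits the cross-validation step. The function names, the synthetic sine record, and the gap pattern are all assumptions made for illustration.

```python
import numpy as np

def ssa_reconstruct(x, window, n_modes):
    """Rank-n_modes SSA reconstruction of a 1-D series (SVD of the trajectory matrix)."""
    n = len(x)
    k = n - window + 1
    # Trajectory matrix: column j holds the lagged segment x[j : j + window]
    traj = np.column_stack([x[j:j + window] for j in range(k)])
    u, s, vt = np.linalg.svd(traj, full_matrices=False)
    approx = (u[:, :n_modes] * s[:n_modes]) @ vt[:n_modes]
    # Diagonal averaging maps the low-rank matrix back to a time series
    recon = np.zeros(n)
    counts = np.zeros(n)
    for j in range(k):
        recon[j:j + window] += approx[:, j]
        counts[j:j + window] += 1
    return recon / counts

def ssa_fill(x, window, n_modes, n_iter=200, tol=1e-8):
    """Fill NaN gaps by repeatedly replacing them with the SSA reconstruction."""
    x = np.asarray(x, dtype=float)
    missing = np.isnan(x)
    mean = x[~missing].mean()
    filled = np.where(missing, 0.0, x - mean)  # gaps start at the centered mean (zero)
    for _ in range(n_iter):
        recon = ssa_reconstruct(filled, window, n_modes)
        change = np.max(np.abs(recon[missing] - filled[missing])) if missing.any() else 0.0
        filled[missing] = recon[missing]  # only the gaps are updated
        if change < tol:
            break
    return filled + mean

# Hypothetical test record: a pure sine with every 10th sample removed
t = np.arange(200)
truth = np.sin(2 * np.pi * t / 20)
gappy = truth.copy()
gappy[5::10] = np.nan
filled = ssa_fill(gappy, window=40, n_modes=2)
print(np.max(np.abs(filled - truth)))
```

Observed samples are never modified; only the missing points are re-estimated on each pass, so the decomposition becomes self-consistent with the filled values as the iteration converges.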

PDF
2005
Kondrashov, Dmitri, S Kravtsov, Andrew W. Robertson, and Michael Ghil. “A hierarchy of data-based ENSO models.” Journal of Climate 18, no. 21 (2005): 4425–4444. Abstract

Global sea surface temperature (SST) evolution is analyzed by constructing predictive models that best describe the dataset’s statistics. These inverse models assume that the system’s variability is driven by spatially coherent, additive noise that is white in time and are constructed in the phase space of the dataset’s leading empirical orthogonal functions. Multiple linear regression has been widely used to obtain inverse stochastic models; it is generalized here in two ways. First, the dynamics is allowed to be nonlinear by using polynomial regression. Second, a multilevel extension of classic regression allows the additive noise to be correlated in time; to do so, the residual stochastic forcing at a given level is modeled as a function of variables at this level and the preceding ones. The number of variables, as well as the order of nonlinearity, is determined by optimizing model performance. The two-level linear and quadratic models have a better El Niño–Southern Oscillation (ENSO) hindcast skill than their one-level counterparts. Estimates of skewness and kurtosis of the models’ simulated Niño-3 index reveal that the quadratic model reproduces better the observed asymmetry between the positive El Niño and negative La Niña events. The benefits of the quadratic model are less clear in terms of its overall, cross-validated hindcast skill; this model outperforms, however, the linear one in predicting the magnitude of extreme SST anomalies. Seasonal ENSO dependence is captured by incorporating additive, as well as multiplicative forcing with a 12-month period into the first level of each model. The quasi-quadrennial ENSO oscillatory mode is robustly simulated by all models. The “spring barrier” of ENSO forecast skill is explained by Floquet and singular vector analysis, which show that the leading ENSO mode becomes strongly damped in summer, while nonnormal optimum growth has a strong peak in December.
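
The multilevel idea, in which the residual of one regression level is itself modeled as a function of the variables at that level and the preceding ones, can be sketched on a toy scalar system. Everything below is an assumption for illustration (the synthetic red-noise-driven process, its coefficients, the helper names); it is a linear two-level sketch, not the SST model of the paper.

```python
import numpy as np

def lag1_autocorr(v):
    """Lag-1 autocorrelation of a series."""
    v = v - v.mean()
    return float(v[:-1] @ v[1:] / (v @ v))

def fit_level(predictors, target):
    """Least-squares fit target ~ predictors @ coef; returns (coef, residual)."""
    coef, *_ = np.linalg.lstsq(predictors, target, rcond=None)
    return coef, target - predictors @ coef

# Hypothetical data: x is damped and forced by r, which is red (time-correlated) noise
rng = np.random.default_rng(0)
n = 20_000
x = np.zeros(n)
r = np.zeros(n)
for t in range(n - 1):
    r[t + 1] = 0.8 * r[t] + rng.standard_normal()
    x[t + 1] = x[t] + (-0.2 * x[t] + r[t])

# Level 1: regress the increment of x on x; the residual is still correlated in time
dx = np.diff(x)
_, res1 = fit_level(x[:-1, None], dx)

# Level 2: regress the increment of the level-1 residual on (x, res1)
dres1 = np.diff(res1)
predictors2 = np.column_stack([x[:-2], res1[:-1]])
_, res2 = fit_level(predictors2, dres1)

print("lag-1 autocorrelation, level-1 residual:", lag1_autocorr(res1))
print("lag-1 autocorrelation, level-2 residual:", lag1_autocorr(res2))
```

In this toy setting one extra level is enough for the residual to become approximately white; for real data, further levels would be added in the same way until the residual shows no remaining lag correlation.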

PDF
Kravtsov, S, Dmitri Kondrashov, and M Ghil. “Multilevel regression modeling of nonlinear processes: Derivation and applications to climatic variability.” Journal of Climate 18, no. 21 (2005): 4404–4424. Abstract

Predictive models are constructed to best describe an observed field’s statistics within a given class of nonlinear dynamics driven by a spatially coherent noise that is white in time. For linear dynamics, such inverse stochastic models are obtained by multiple linear regression (MLR). Nonlinear dynamics, when more appropriate, is accommodated by applying multiple polynomial regression (MPR) instead; the resulting model uses polynomial predictors, but the dependence on the regression parameters is linear in both MPR and MLR. The basic concepts are illustrated using the Lorenz convection model, the classical double-well problem, and a three-well problem in two space dimensions. Given a data sample that is long enough, MPR successfully reconstructs the model coefficients in the former two cases, while the resulting inverse model captures the three-regime structure of the system’s probability density function (PDF) in the latter case. A novel multilevel generalization of the classic regression procedure is introduced next. In this generalization, the residual stochastic forcing at a given level is subsequently modeled as a function of variables at this level and all the preceding ones. The number of levels is determined so that the lag-0 covariance of the residual forcing converges to a constant matrix, while its lag-1 covariance vanishes. This method has been applied to the output of a three-layer, quasigeostrophic model and to the analysis of Northern Hemisphere wintertime geopotential height anomalies. In both cases, the inverse model simulations reproduce well the multiregime structure of the PDF constructed in the subspace spanned by the dataset’s leading empirical orthogonal functions, as well as the detailed spectrum of the dataset’s temporal evolution. These encouraging results are interpreted in terms of the modeled low-frequency flow’s feedback on the statistics of the subgrid-scale processes.
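
As a rough illustration of multiple polynomial regression on the double-well problem, the sketch below simulates dx = (x - x^3) dt + sigma dW and then regresses the observed tendency on the predictors 1, x, x^2, x^3. The drift, noise amplitude, time step, and sample length are assumptions chosen for the example, not the settings used in the paper. The model is nonlinear in x but linear in the regression coefficients, so ordinary least squares suffices.

```python
import numpy as np

rng = np.random.default_rng(0)
dt, sigma, n_steps = 0.01, 0.5, 200_000  # assumed values, for illustration only

# Simulate the double-well process with the Euler-Maruyama scheme
noise = rng.standard_normal(n_steps - 1) * sigma * np.sqrt(dt)
x = np.empty(n_steps)
x[0] = 1.0
for i in range(n_steps - 1):
    x[i + 1] = x[i] + (x[i] - x[i] ** 3) * dt + noise[i]

# Regress the tendency dx/dt on polynomial predictors of x
tendency = np.diff(x) / dt
predictors = np.vander(x[:-1], N=4, increasing=True)  # columns: 1, x, x^2, x^3
coef, *_ = np.linalg.lstsq(predictors, tendency, rcond=None)

print("estimated coefficients of 1, x, x^2, x^3:", coef)
print("coefficients used in the simulation:      [0, 1, 0, -1]")
```

The estimates approach the simulated drift only when the sample is long enough for the trajectory to visit both wells repeatedly, which is the "data sample that is long enough" condition mentioned in the abstract.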

PDF
Kondrashov, Dmitri, Yizhak Feliks, and Michael Ghil. “Oscillatory modes of extended Nile River records (A.D. 622–1922).” Geophysical Research Letters 32, no. 10 (2005): L10702. Abstract

The historical records of the low- and high-water levels of the Nile River are among the longest climatic records that have near-annual resolution. There are few gaps in the first part of the records (A.D. 622-1470) and larger gaps later (A.D. 1471-1922). We apply advanced spectral methods, Singular-Spectrum Analysis (SSA) and the Multi-Taper Method (MTM), to fill the gaps and to locate interannual and interdecadal periodicities. The gap filling uses a novel, iterative version of SSA. Our analysis reveals several statistically significant features of the records: a nonlinear, data-adaptive trend that includes a 256-year cycle, a quasi-quadriennial (4.2-year) and a quasi-biennial (2.2-year) mode, as well as additional periodicities of 64, 19, 12, and, most strikingly, 7 years. The quasi-quadriennial and quasi-biennial modes support the long-established connection between the Nile River discharge and the El-Niño/Southern Oscillation (ENSO) phenomenon in the Indo-Pacific Ocean. The longest periods might be of astronomical origin. The 7-year periodicity, possibly related to the biblical cycle of lean and fat years, seems to be due to North Atlantic influences.

PDF
2000
Yiou, Pascal, Didier Sornette, and Michael Ghil. “Data-adaptive wavelets and multi-scale singular-spectrum analysis.” Physica D 142, no. 3-4 (2000): 254–290. Abstract

Using multi-scale ideas from wavelet analysis, we extend singular-spectrum analysis (SSA) to the study of nonstationary time series, including the case where intermittency gives rise to the divergence of their variance. The wavelet transform resembles a local Fourier transform within a finite moving window whose width W, proportional to the major period of interest, is varied to explore a broad range of such periods. SSA, on the other hand, relies on the construction of the lag-correlation matrix C on M lagged copies of the time series over a fixed window width W to detect the regular part of the variability in that window in terms of the minimal number of oscillatory components; here W = MΔt, with Δt as the time step. The proposed multi-scale SSA is a local SSA analysis within a moving window of width M ≤ W ≤ N, where N is the length of the time series. Multi-scale SSA varies W, while keeping a fixed W/M ratio, and uses the eigenvectors of the corresponding lag-correlation matrix C(M) as data-adaptive wavelets; successive eigenvectors of C(M) correspond approximately to successive derivatives of the first mother wavelet in standard wavelet analysis. Multi-scale SSA thus solves objectively the delicate problem of optimizing the analyzing wavelet in the time-frequency domain by a suitable localization of the signal's correlation matrix. We present several examples of application to synthetic signals with fractal or power-law behavior which mimic selected features of certain climatic or geophysical time series. The method is applied next to the monthly values of the Southern Oscillation Index (SOI) for 1933-1996; the SOI time series is widely believed to capture major features of the El Niño/Southern Oscillation (ENSO) in the Tropical Pacific. Our methodology highlights an abrupt periodicity shift in the SOI near 1960. This abrupt shift between 5 and 3 years supports the Devil's staircase scenario for the ENSO phenomenon (preliminary results of this study were presented at the XXII General Assembly of the European Geophysical Society, Vienna, May 1997, and at the Fall Meeting of the American Geophysical Union, San Francisco, December 1997).
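
The fixed-window building block described above, a lag-covariance matrix built from M lagged copies of the series and then diagonalized, can be sketched as follows. This shows ordinary single-window SSA only; the moving-window, multi-scale extension of the paper is not implemented. The synthetic noisy sine, the choice M = 60, and the function name are assumptions for illustration.

```python
import numpy as np

def lag_covariance(x, m):
    """Lag-covariance matrix estimated from m lagged copies of a centered series."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    # Row j is the series shifted by j time steps
    emb = np.stack([x[j:n - m + 1 + j] for j in range(m)])
    return emb @ emb.T / emb.shape[1]

# Hypothetical record: a period-30 oscillation plus white noise
rng = np.random.default_rng(1)
t = np.arange(600)
series = np.sin(2 * np.pi * t / 30) + 0.3 * rng.standard_normal(t.size)

m = 60
c = lag_covariance(series, m)
eigvals, eigvecs = np.linalg.eigh(c)          # eigh returns ascending eigenvalues
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

# An oscillation shows up as a pair of nearly equal leading eigenvalues,
# and the leading eigenvector oscillates with the period of the signal
spectrum = np.abs(np.fft.rfft(eigvecs[:, 0]))
period = m / (np.argmax(spectrum[1:]) + 1)    # skip the zero-frequency bin
print("leading eigenvalues:", eigvals[:4])
print("period of leading eigenvector (time steps):", period)
```

Multi-scale SSA would repeat this decomposition inside a moving window while varying the window width at a fixed W/M ratio, using the resulting eigenvectors as data-adaptive wavelets.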

PDF
1999
Wang, J., Dmitri Kondrashov, P. C. Liewer, and S. R. Karmesin. “Three-dimensional deformable-grid electromagnetic particle-in-cell for parallel computers.” Journal of Plasma Physics 61, no. 3 (1999): 367–389. Publisher's Version Abstract

We describe a new parallel, non-orthogonal-grid, three-dimensional electromagnetic particle-in-cell (EMPIC) code based on a finite-volume formulation. This code uses a logically Cartesian grid of deformable hexahedral cells, a discrete surface integral (DSI) algorithm to calculate the electromagnetic field, and a hybrid logical–physical space algorithm to push particles. We investigate the numerical instability of the DSI algorithm for non-orthogonal grids, analyse the accuracy for EMPIC simulations on non-orthogonal grids, and present performance benchmarks of this code on a parallel supercomputer. While the hybrid particle push algorithm has a second-order accuracy in space, the accuracy of the DSI field solve algorithm is between first and second order for non-orthogonal grids. The parallel implementation of this code, which is almost identical to that of a Cartesian-grid EMPIC code using domain decomposition, achieved a high parallel efficiency of over 96% for large-scale simulations.

Smyth, Padhraic, Kayo Ide, and Michael Ghil. “Multiple Regimes in Northern Hemisphere Height Fields via Mixture Model Clustering.” Journal of the Atmospheric Sciences 56, no. 21 (1999): 3704–3723.
1998
Moron, Vincent, Robert Vautard, and Michael Ghil. “Trends, interdecadal and interannual oscillations in global sea-surface temperatures.” Climate Dynamics 14, no. 7 (1998): 545–569. Abstract

This study aims at a global description of climatic phenomena that exhibit some regularity during the twentieth century. Multi-channel singular spectrum analysis is used to extract long-term trends and quasi-regular oscillations of global sea-surface temperature (SST) fields since 1901. Regional analyses are also performed on the Pacific, (Northern and Southern) Atlantic, and Indian Ocean basins. The strongest climatic signal is the irregular long-term trend, characterized by overall warming during 1910–1940 and since 1975, with cooling (especially of the Northern Hemisphere) between these two warming intervals. Substantial cooling prevailed in the North Pacific between 1950 and 1980, and continues in the North Atlantic today. Both cooling and warming are preceded by SST anomalies of the same sign in the subpolar North Atlantic. Near-decadal oscillations are present primarily over the North Atlantic, but also over the South Atlantic and the Indian Ocean. A 13–15-y oscillation exhibits a seesaw pattern between the Gulf-Stream region and the North-Atlantic Drift and affects also the tropical Atlantic. Another 7–8-y oscillation involves the entire double-gyre circulation of the North Atlantic, being mostly of one sign across the basin, with a minor maximum of opposite sign in the subpolar gyre and the major maximum in the northwestern part of the subtropical gyre. Three distinct interannual signals are found, with periods of about 60–65, 45 and 24–30 months. All three are strongest in the tropical Eastern Pacific. The first two extend throughout the whole Pacific and still exhibit some consistent, albeit weak, patterns in other ocean basins. The latter is weaker overall and has no consistent signature outside the Pacific. The 60-month oscillation obtains primarily before the 1960s and the 45-month oscillation afterwards.

1996
Ghil, Michael, and Pascal Yiou. “Spectral methods: What they can and cannot do for climatic time series.” In Decadal Climate Variability: Dynamics and Predictability, edited by D. Anderson and J. Willebrand, 446–482. Springer-Verlag, Berlin/Heidelberg, 1996.
1995
Unal, Yurdanur Sezginer, and Michael Ghil. “Interannual and interdecadal oscillation patterns in sea level.” Climate Dynamics 11, no. 5 (1995): 255–278. Abstract

Relative sea-level height (RSLH) data at 213 tide-gauge stations have been analyzed on a monthly and an annual basis to study interannual and interdecadal oscillations, respectively. The main tools of the study are singular spectrum analysis (SSA) and multi-channel SSA (M-SSA). Very-low-frequency variability of RSLH was filtered by SSA to estimate the linear trend at each station. Global sea-level rise, after postglacial rebound corrections, has been found to equal 1.62±0.38 mm/y, by averaging over 175 stations which have a trend consistent with the neighboring ones. We have identified two dominant time scales of El Niño-Southern Oscillation (ENSO) variability, quasi-biennial and low-frequency, in the RSLH data at almost all stations. However, the amplitudes of both ENSO signals are higher in the equatorial Pacific and along the west coast of North America. RSLH data were interpolated along ocean coasts by latitudinal intervals of 5 or 10 degrees, depending on station density. Interannual variability was then examined by M-SSA in five regions: eastern Pacific (25°S–55°N at 10° resolution), western Pacific (35°S–45°N at 10°), equatorial Pacific (123°E–169°W, 6 stations), eastern Atlantic (30°S, 0°, and 30°N–70°N at 5°) and western Atlantic (50°S–50°N at 10°). Throughout the Pacific, we have found three dominant spatio-temporal oscillatory patterns, associated with time scales of ENSO variability; their periods are 2, 2.5–3 and 4–6 y. In the eastern Pacific, the biennial mode and the 6-y low-frequency mode propagate poleward. There is a southward propagation of low-frequency modes in the western Pacific RSLH, between 35°N and 5°S, but no clear propagation in the latitudes further south. However, equatorward propagation of the biennial signal is very clear in the Southern Hemisphere. In the equatorial Pacific, both the quasi-quadrennial and quasi-biennial modes at 10°N propagate westward. Strong and weak El Niño years are evident in the sea-level time series reconstructed from the quasi-biennial and low-frequency modes. Interannual variability with periods of 3 and 4–8 y is detected in the Atlantic RSLH data. In the eastern Atlantic region, we have found slow propagation of both modes northward and southward, away from 40–45°N. Interdecadal oscillations were studied using 81 stations with sufficiently long and continuous records. Most of these have variability at 9–13 and some at 18 y. Two significant eigenmode pairs, corresponding to periods of 11.6 and 12.8 y, are found in the eastern and western Atlantic ocean at latitudes 40°N–70°N and 10°N–50°N, respectively.

PDF
