Singular Spectrum Analysis (SSA)

SSA is a nonparametric method. It tries to overcome the problems of finite sample length and noisiness of sampled time series not by fitting an assumed model to the available series, but by using a data-adaptive basis set, instead of the fixed sine and cosine of the BT method.

Colebrook (1978) applied a form of SSA to biological oceanography and noted the duality between principal component analysis (PCA) in the space and time domain. Broomhead and King (1986: BK hereafter) applied the ``method of delays'' of dynamical systems theory to estimate the dimension of and reconstruct the Lorenz attractor using singular-value decomposition (SVD) on the trajectory matrix formed by lagged copies of a single series obtained from the system.

Vautard and Ghil (1989: VG hereafter) realized the formal similarity between classical lagged-covariance analysis and the method of delays. They exploited this similarity further by pointing out that pairs of SSA eigenmodes corresponding to nearly equal eigenvalues and associated (temporal) principal components that are nearly in phase quadrature can represent efficiently a nonlinear, anharmonic oscillation. This is due to the fact that a single pair of data-adaptive eigenmodes can capture the basic periodicity of a boxcar or seesaw-shaped oscillation, say, rather than necessitate the many overtones that will appear in methods with fixed basis functions.

There are three basic steps in SSA: i) embedding the sampled time series in a vector space of dimension M; ii) computing the M x M lag-covariance matrix CD of the data; and iii) diagonalizing CD.

Step (i): The time series {xn: t=1,...,N} is embedded into a vector space of dimension M by considering M lagged copies {xn-m: m=0,...,M-1} thereof. The choice of M is not obvious. Vautard et al. (1992) suggested that SSA is typically successful at analyzing periods in the range (M/5, M).

Step (ii): One defines the M x M lag-covariance matrix estimator CD . There are two distinct methods used widely to define CD. In the BK algorithm, a window of length M is moved along the time series, producing a sequence of N'=N-M+1 vectors in the embedding space. This sequence is used to obtain the N' x M trajectory matrix X, where the i-th row is the i-th view of the time series through the window. In the BK approach, CD is directly defined by the matrix product

CD=N' -1 XTX          (1)

In the VG algorithm, CD is estimated to have Toeplitz matrix with constant diagonals. Both the BG and VG algorithms are implemented in the SSA-MTM Toolkit.

Step (iii): The covariance matrix calculated from the N sample points, using the BK or VG algorithm, is then diagonalized and the eigenvalues tex2html_wrap_inline181 are ranked in decreasing order. The eigenvalue tex2html_wrap_inline183 gives the variance of the time series in the direction specified by the corresponding eigenvector tex2html_wrap_inline185 ; the square roots of the eigenvalues are called singular values and the set tex2html_wrap_inline187 the singular spectrum. These terms give SSA its name; BK showed how to obtain the set tex2html_wrap_inline189by SVD applied to the trajectory matrix tex2html_wrap_inline169 . VG called the tex2html_wrap_inline185 's empirical orthogonal functions (EOFs), by analogy with the meteorological term used when applying PCA in the spatial domain.

If we arrange and plot the singular values in decreasing order, one can often distinguish an initial steep slope, representing the signal, and a (more or less) ``flat floor,'' representing the noise level (Vautard and Ghil, 1989) To correctly separate signal from noise, this visual observation of the spectral plot must be confirmed by suitable criteria for statistical significance as discussed later.

Once these three steps have been completed, a number of other interesting results can be obtained. For each EOF tex2html_wrap_inline185 with components tex2html_wrap_inline197 , we can construct the time series of length N' given by:


called the k-th principal component (PC). It represents the projection of the original time series onto the k-th EOF. The sum of the power spectra of the PCs is identical to the power spectrum of the time series x(t) (Vautard et al., 1992); therefore we can study separately the spectral contribution of the various components. The PCs, however, have length N', notN, and do not contain phase information.

In order to extract time series of length N, corresponding to a chosen subset of tex2html_wrap_inline213 eigenelements, the associated PCs are combined to form the partial reconstruction tex2html_wrap_inline215 of the original time series x(t). Over the central part of the time series,


and tex2html_wrap_inline219 ; near the endpoints ( tex2html_wrap_inline221 or tex2html_wrap_inline223 ), the summation in j extends from 1 to t or from (t-N+M) to M and tex2html_wrap_inline233(Vautard et al., 1992). These series of length N are called reconstructed components (RCs). They have the important property of preserving the phase of the time series; thereforex(t) and tex2html_wrap_inline215 can be superimposed. When tex2html_wrap_inline213 contains the components of a single eigenvector tex2html_wrap_inline185 , tex2html_wrap_inline215 is called the k-th RC. If we denote it by tex2html_wrap_inline249 , one can show that


that is, the time series can be reconstructed completely by summing its RCs, without losing any information. Allen et al. (1992a) noted that, when a linear trend is present, reconstruction based on the BK algorithm has certain advantages near the endpoints of the sample.

The correct partial reconstruction of the signal, i.e., the reconstruction over the correct set tex2html_wrap_inline251 of EOFs, yields the optimal signal-to-noise (S/N) enhancement with respect to white noise. MEM spectral estimation will work very well on this clean signal (Penland et al., 1991, Keppenne and Ghil, 1992) which, unlike the raw time series, is band limited. As a result, a low-order AR-process fit can yield good spectral resolution, without spurious peaks.

In order to perform a good reconstruction of the signal or of its oscillatory part(s) several methods - either heuristic or based on Monte Carlo ideas - have been proposed for S/N separation or for reliable identification of oscillatory pairs and trends (Vautard and Ghil, 1989; Vautard et al., 1992; Ghil and Mo, 1991a, b).

Monte Carlo SSA

The particular Monte Carlo approach to S/N separation introduced by Allen (1992) has become known as Monte Carlo SSA (MC-SSA), and detailed description can be found in Allen and Smith (1996; AS hereafter). See also the review papers of Ghil and Yiou (1996), and Ghil and Taricco (1997).

Monte Carlo SSA can be used to establish whether a given timeseries is linearly distinguishable from any well-defined process, including the output of a deterministic chaotic system, but we will focus on testing against the linear stochastic processes which are normally considered as ``noise''. ``Red noise'' is often used to refer to any linear stochastic process in which power declines monotonically with increasing frequency, but we prefer to use the term to refer specifically to a first-order auto-regressive, or AR(1), process tex2html_wrap_inline253whose value at a time t depends on the value at time t-1 only,


where tex2html_wrap_inline259 is a gaussian-distributed white-noise process, for which each value is independent of all previous values, tex2html_wrap_inline261 is the process mean and tex2html_wrap_inline263 and tex2html_wrap_inline265 are constant coefficients.

When testing against a red noise null-hypothesis, the first step in MC-SSA is to determine the red-noise coefficients tex2html_wrap_inline263 and tex2html_wrap_inline265 from the time series X(t) using a maximum-likelihood criterion. Estimators are provided by Allen (1992) and AS which are asymptotically unbiased in the limit of large N and close to unbiased for shorter series provided the series length is at least an order of magnitude longer than the timescale of decay of autocorrelation, tex2html_wrap_inline275 . Based on these coefficients, an ensemble of surrogate red-noise data can be generated and, for each realization, a covariance matrix tex2html_wrap_inline277 is computed. These covariance matrices are then projected onto the eigenvector basis tex2html_wrap_inline279 of the original data,


Since (6) is not the SVD of that realization, the matrix tex2html_wrap_inline281 is not necessarily diagonal, but it measures the resemblance of a given surrogate set with the data set. This resemblance can be quantified by computing the statistics of the diagonal elements of tex2html_wrap_inline281 . The statistical distribution of these elements, determined from the ensemble of Monte Carlo simulations, gives confidence intervals outside which a time series can be considered to be significantly different from a generic red-noise simulation. For instance, if an eigenvaluetex2html_wrap_inline183 lies outside a 90% noise percentile, then the red-noise null hypothesis for the associated EOF (and PC) can be rejected with this confidence. Otherwise, that particular SSA component of the time series cannot be considered as significantly different from red noise.

Allen (1992) stresses that care must be taken at the parameter-estimation stage because, for the results of a test against AR(1) noise to be interesting, we must ensure we are testing against that specific AR(1) process which maximises the likelihood that we will fail to reject the null-hypothesis. Only if we can reject this (the ``worst case'') red noise process, can we be confident of rejecting all other red noise processes at the same or higher confidence level. A second important point is test multiplicity: comparing M data eigenvalues with Mconfidence intervals computed from the surrogate ensemble, we expect M/10 to lie above the 90th percentiles even if the null hypothesis is true. Thus a small number of excursions above a relatively low percentile should be interpreted with caution. AS discuss this problem in detail, and propose a number of possible solutions.

The MC-SSA algorithm described above can be adapted to eliminate known periodic components and test the residual against noise. This adaptation provides sharper insight into the dynamics captured by the data, since known periodicities (like orbital forcing on the Quaternary time scale or seasonal forcing on the intraseasonal-to-interannual one) often generate much of the variance at the lower frequencies manifest in a time series and alter the rest of the spectrum. Allen and Smith (1992) and AS describe this refinement of MC-SSA which consists in restricting the projections to the EOFs that do not account for known periodic behavior. This refinement is implemented in the SSA-MTM toolkit, where it is referred to as the "Hybrid Basis" (see the Toolkit Demonstration section below).