# Methodology

The ENSO-CLIPER predictive model uses multiple linear regression, fitted by least squares, with predictors selected by the method of leaps and bounds (IMSL 1987). This predictor selection routine steps forward through every possible combination of the predictors, eventually finding the best multiple regression equation having one, two, three, ... predictors. Prospective predictors were retained only if they were significant in the regression beyond the 95 percent level using a t test and increased the total variance explained by at least 2.5 percent. If no predictor met these two criteria, no ENSO-CLIPER forecast equation was obtained and a zero anomaly (climatology) forecast was made. This occurred only occasionally, most notably at lead times of three seasons and beyond. Other restrictions on regression predictors, intended to avoid "over-fitting", are detailed below.

The SST indices and SOI are forecast at leads of zero to seven seasons. All forecasts are made for three month target periods, and a separate forecast is made for each monthly initiation time. Here we follow the nomenclature of Barnston and Ropelewski (1992), wherein zero lead indicates a prediction for the period beginning with the immediately upcoming month (their Fig. 5). For example, a forecast issued on 1 February for February through April conditions is termed a zero lead seasonal forecast. A 1 February forecast for May through July is a one season lead forecast, and so forth. The limit of two years lead time (i.e., seven seasons) reflects the fact that hindcast ability becomes negligible beyond seven seasons lead.
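The lead nomenclature can be illustrated with a minimal sketch. The function name `target_season` and its arguments are hypothetical, introduced only for illustration; the mapping itself (three months per season of lead, counted from the issue month) is taken from the 1 February examples above.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def target_season(init_month: int, lead: int) -> list[str]:
    """Return the three target months for a forecast issued on the first
    day of `init_month` (1-12) at a lead of `lead` seasons (0-7)."""
    if not 1 <= init_month <= 12:
        raise ValueError("init_month must be in 1..12")
    if not 0 <= lead <= 7:
        raise ValueError("lead must be in 0..7 seasons")
    # Each season of lead shifts the target window by three months.
    start = (init_month - 1) + 3 * lead
    return [MONTHS[(start + k) % 12] for k in range(3)]


print(target_season(2, 0))  # ['Feb', 'Mar', 'Apr']
print(target_season(2, 1))  # ['May', 'Jun', 'Jul']
```

At the longest lead (seven seasons) the target window ends 24 months after the issue date, which is the two year limit quoted above.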

As stated in our introduction, the aim is to optimally utilize trend and climatology to augment persistence as a "no-skill" ENSO forecast. As shown by Wright (1985) and Wright et al. (1988), persistence of initial conditions depends upon both the region being forecast and the season. For example, persistence (the anomaly of the previous month) explains 92 percent of the variance at zero season lead for Niño 3.4 during November through January, but only 45 percent for May through July. In contrast, Niño 1+2 SSTs have peak zero season lead persistence during July through September (85 percent of the variance) and a minimum in February through April (29 percent). Thus, to account for such a strong annual cycle in the effectiveness of persistence, separate regressions were performed for each monthly initial starting date.
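The seasonal dependence of persistence can be estimated with a short sketch such as the following. It assumes a one-dimensional monthly anomaly series that begins in a January, and it takes the variance explained to be the squared correlation between the previous month's anomaly and the following three month mean; the function name and array layout are hypothetical and are not taken from the original analysis.

```python
import numpy as np


def persistence_variance_explained(anom, init_month):
    """Percent of variance in the zero-lead three month mean explained by
    the previous month's anomaly, for one calendar initial month.

    anom       : 1-D monthly anomalies, first element is a January (assumed)
    init_month : 0 for forecasts issued 1 January, ..., 11 for 1 December
    """
    anom = np.asarray(anom, dtype=float)
    # Index of the first target month in every year with a full window.
    starts = np.arange(init_month, len(anom) - 2, 12)
    starts = starts[starts >= 1]          # need one earlier month
    x = anom[starts - 1]                  # persistence predictor
    y = (anom[starts] + anom[starts + 1] + anom[starts + 2]) / 3.0
    r = np.corrcoef(x, y)[0, 1]
    return 100.0 * r * r
```

Calling the function once per initial month (0 through 11) gives the annual cycle of persistence that motivates fitting a separate regression for each starting date.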

A pool of fourteen predictors was available for selection by the regression scheme. For the parameter being predicted, each regression had the choice of one, three or five month averages of the initial anomalies, and similar choices for the trend of the initial conditions (one, three or five month differences of average anomalies). For example, predictions made on 1 January had the choice of December, October through December or August through December mean initial values for predictor conditions. Options for the trend of initial conditions (again from a 1 January starting point) included December minus November, October-December minus July-September or August-December minus March-July. Similarly, the regression considered the three month initial conditions and trend of the other four predictands. Hence, the potential predictors used for a prediction of Niño 3.4 are as follows: the one, three and five month initial conditions and trends of Niño 3.4 itself (predictors 1 through 6), and the three month initial condition and trend of each of the other four predictands (predictors 7 through 14).
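The initial condition and trend predictors described above can be sketched as follows. The function and key names are hypothetical; the windows follow the 1 January example (the most recent n months, minus the n months immediately before them, for n = 1, 3 and 5).

```python
import numpy as np


def initial_conditions_and_trends(anom, t):
    """Candidate predictors for a forecast issued at the start of month `t`.

    anom : 1-D array of monthly anomalies
    t    : index of the first month not yet observed (the issue month)
    """
    anom = np.asarray(anom, dtype=float)
    if t < 10 or t > len(anom):
        raise ValueError("need at least 10 observed months before t")
    out = {}
    for n in (1, 3, 5):
        recent = anom[t - n:t].mean()            # e.g. Oct-Dec for n = 3
        earlier = anom[t - 2 * n:t - n].mean()   # e.g. Jul-Sep for n = 3
        out[f"mean_{n}"] = recent
        out[f"trend_{n}"] = recent - earlier
    return out


# Twelve months, January..December, with values 0..11 for illustration.
print(initial_conditions_and_trends(np.arange(12.0), 12))
```

For a 1 January issue date this yields December, October-December and August-December means, together with the December minus November, October-December minus July-September and August-December minus March-July differences.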

As noted above, the regression procedure imposed an additional criterion to inhibit predictor selection beyond meaningful significance: the regression may retain no more than one of predictors 1, 2 or 3 and no more than one of predictors 4, 5 or 6. This restriction is intended to keep multicollinearity among predictors from inflating hindcast ability (Aczel 1989). The variety of initial conditions and trends of the predictand allows flexibility in handling a strong annual cycle of persistence. For example, for Niño 3.4 SST zero and one season lead forecasts, it was common for the one month initial conditions and trends to be chosen, whereas at lead times of four seasons and longer, the three month or five month initial conditions (typically as negatively correlated predictors) and trends were chosen instead. Rather than manually selecting the highest persistence and trend time periods, we allowed the regression model to perform the selection subject to the above criterion. If no predictors are found, which is occasionally the case at longer leads, the equation produces a climatology forecast.
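A rough sketch of constrained predictor selection is given below. It is not the IMSL leaps and bounds routine: it substitutes a brute-force search over subsets, omits the t test, and assumes that selection stops once the best subset of the next size adds less than 2.5 percent of variance explained. The names `GROUPS`, `allowed`, `select_predictors` and `max_size` are hypothetical, and columns 0-2 and 3-5 of `X` are assumed to hold predictors 1-3 and 4-6.

```python
from itertools import combinations

import numpy as np

GROUPS = [{0, 1, 2}, {3, 4, 5}]   # zero-based predictors 1-3 and 4-6
MIN_GAIN = 2.5                    # percent of variance explained


def allowed(subset):
    """At most one predictor may come from each restricted group."""
    return all(len(g.intersection(subset)) <= 1 for g in GROUPS)


def variance_explained(X, y, subset):
    """Percent variance explained by a least squares fit with intercept."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in subset])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 100.0 * (1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2))


def select_predictors(X, y, max_size=4):
    """Return (subset, percent variance explained); an empty subset
    corresponds to a climatology (zero anomaly) forecast."""
    best, best_r2 = (), 0.0
    for k in range(1, max_size + 1):
        candidates = [s for s in combinations(range(X.shape[1]), k)
                      if allowed(s)]
        if not candidates:
            break
        scores = [variance_explained(X, y, s) for s in candidates]
        i = int(np.argmax(scores))
        if scores[i] - best_r2 < MIN_GAIN:
            break                 # next size does not add enough variance
        best, best_r2 = candidates[i], scores[i]
    return best, best_r2
```

With fourteen candidate predictors and subsets of up to four, the exhaustive search evaluates at most a few thousand regressions, so the brute-force substitute is cheap here; a branch and bound method such as leaps and bounds matters for larger pools.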

All results from the multiple regressions are adjusted (degraded) to reflect what should be expected in completely independent future forecasts rather than the values obtained in the hindcasts. This adjustment of both the variance explained and the RMSE follows the methodology of Davis (1979) and Shapiro (1984).

This methodology begins with a definition of the amount of artificial ability, expressed as variance explained, in Eq. 1.

where M is the number of predictors (which varies between individual equations), N is the number of observations (43 years), and the hindcast ability is the value obtained from the regression equation, expressed as percent variance explained.

When hindcasts are applied to independent data, the degradation is expected to be twice this estimate of artificial ability; the actual forecast ability can therefore be estimated as the hindcast ability minus twice the artificial ability (Eq. 2).

Since forecast ability is related to the squared errors, the adjusted RMSE can also be estimated, as shown in Eq. 3.
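The adjustment can be sketched numerically as follows. The first function follows the statement that the degradation is twice the artificial ability; the artificial ability itself is taken as an input, since its defining expression is not restated here. The RMSE scaling in the second function is an assumption based on the usual relation between mean squared error and unexplained variance, and may differ in form from Eq. 3. All function and argument names are hypothetical.

```python
import math


def forecast_ability(hindcast_pct, artificial_pct):
    """Expected independent-data ability (percent variance explained):
    hindcast ability minus twice the artificial ability."""
    return hindcast_pct - 2.0 * artificial_pct


def adjusted_rmse(rmse_hindcast, hindcast_pct, forecast_pct):
    """Assumed scaling: squared error is proportional to the unexplained
    variance, so RMSE grows with the square root of the ratio."""
    if hindcast_pct >= 100.0:
        raise ValueError("hindcast_pct must be below 100")
    return rmse_hindcast * math.sqrt(
        (100.0 - forecast_pct) / (100.0 - hindcast_pct))


f = forecast_ability(60.0, 5.0)   # 50.0 percent
print(f, adjusted_rmse(1.0, 60.0, f))
```

In this illustrative case (the numbers are invented, not results from the hindcasts), a hindcast ability of 60 percent with 5 percent artificial ability degrades to 50 percent, and the RMSE grows by a factor of the square root of 1.25.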

The results to be shown in the following sections have been adjusted to reflect these likely degradations in performance on independent data.

Five separate predictands (Niño 1+2, Niño 3, Niño 4 and Niño 3.4 SST indices and the SOI), eight different forecast periods (zero to seven seasons lead) and twelve initial starting times (1 January, 1 February, ... 1 December) yield a total of 480 regression relationships, all of which were examined. An equation for each was developed using the 1950-1994 data, which provided a sample of 43 hindcast data points.
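The count of regression relationships follows directly from the three dimensions listed above, as this short enumeration shows. The variable names are illustrative only.

```python
from itertools import product

PREDICTANDS = ["Nino 1+2", "Nino 3", "Nino 4", "Nino 3.4", "SOI"]
LEADS = range(8)             # zero to seven seasons lead
INIT_MONTHS = range(1, 13)   # 1 January, ..., 1 December

# One regression relationship per (predictand, lead, initial month).
cases = list(product(PREDICTANDS, LEADS, INIT_MONTHS))
print(len(cases))  # 480
```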