FIFTH INTERNATIONAL WORKSHOP ON TROPICAL CYCLONES
Topic 3.2 Consensus approach to track forecasting.
Rapporteurs: R.A. Jeffries and E. J. Fukada
Joint Typhoon Warning Center
425 Luapele RD
Pearl Harbor, HI 96860-3103
Working Group: Lixion Avila (USA)
Harry Weber (Germany)
Russell Elsberry (USA)
James Goerss (NRL-USA)
Qian Chuanhai (China)
This paper gives an overview of consensus forecast research and discusses the Joint Typhoon Warning Center (JTWC) use and development of consensus forecast techniques for tropical cyclone track forecasting. The use of simple consensus forecast guidance at the JTWC has resulted in three straight record forecast seasons in the western North Pacific even with a 100% turnover in the forecast staff. The use of consensus model blends for tropical cyclone track forecast guidance is discussed in detail. Early consensus forecast results and the evolution of consensus forecasting at the JTWC are described, along with results of the first two years of a three-year test of the Systematic Approach to Tropical Cyclone Forecasting Aid (SAFA) at the JTWC.
SAFA provided JTWC with a systematic process to refocus the TC track forecast process on use of consensus forecast guidance as the first-guess forecast track. The Non-selective CONsensus (NCON), a simple blended consensus of five available dynamic models, was a major contributor to JTWC forecast improvement during the 2000 TC season. SAFA also enabled the development of a thorough mental picture of the evolution of the TC environment, which increases forecaster understanding, standardizes the forecast process, and facilitates forecaster training. However, another basic element of SAFA, the development of Selective Consensus (SCON) forecasts, proved to be of little value to the JTWC warning process when compared to a consensus of all available dynamic models.
Consensus forecasts do not apply to all forecast scenarios, and much work is needed to help forecasters rapidly identify the cases where consensus forecasts lack skill. JTWC continues to refine the consensus forecast process and develop new consensus forecast tools, and expects to experiment with consensus forecasting to improve TC intensity and structure predictions. Data are also presented that show the value of extending these two new consensus forecasts to 120 h.
3.2.1. Introduction to the Consensus Forecasting Approach.
At the Fourth WMO/ICSU International Workshop on Tropical Cyclones (IWTC-IV), it was suggested that the systematic use of ensembles could aid the forecaster in information management and result in improved tropical cyclone (TC) track forecasts. A demonstration project to document the use of ensembles was suggested as a way to identify where improvements can be made.
The JTWC has experimented with, and inconsistently used, ensemble-type forecast methods for approximately 10 years. Since 2000, the JTWC has been systematically applying simple ensembles of dynamic model forecasts to the TC track forecast problem.
The idea of using a consensus of various objective or dynamical model tracks as a tool in developing a tropical cyclone (TC) track forecast is not new. Aberson (2001) noted that "...ensemble forecasting has been used operationally since the middle 1960s at the National Hurricane Center."
It has been and continues to be common practice for a forecast center to plot all available dynamic and objective model TC track forecasts and subjectively evaluate this guidance considering the recent-past motion, the synoptic situation, known error characteristics of the various track forecasts and other factors. This subjective evaluation at the JTWC has resulted in very accurate official forecasts when the numerous numerical track forecasts were in basic agreement.
In the mid-1990s, Goerss proposed that TC track forecasts by regional and global numerical models be used to produce a simple ensemble average or consensus forecast. Goerss (1998) presented data indicating that the overall forecast performance of a simple ensemble, determined by averaging the forecast positions for the three global models in the western North Pacific, was superior to that of the best model, JGSM. In February 1998, Goerss presented similar findings at the Department of Defense (DoD) Tropical Cyclone Conference (TCC) at the U.S. Forces Center, Tokyo, Japan. Goerss (1999) also evaluated the extension of consensus forecasts to 96 h and 120 h in the North Atlantic. Goerss (2000a,b) further evaluated the consensus forecast technique using three global models and two regional models¹ and found that the average consensus track errors were smaller than the average errors for each of the individual models. Goerss (2000a,b) also found that the consensus forecasts provided either the most accurate or second-most accurate TC track forecast in more than 70% of the cases. At the 2001 DoD TCC, Goerss and Sampson described the potential improvements from using COAMPS and MM5 in the 72 h consensus forecasts. At this same conference, Goerss also described findings indicating that the quality of the 120 h consensus forecasts in the western North Pacific would be greatly improved with the inclusion of the AVN/MRF and ECMWF global models.
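The simple consensus described above amounts to an unweighted average of the member forecast positions valid at the same time. The following is a minimal sketch, not the operational code: the function name, the `min_members` parameter, and the sample positions are hypothetical, and the arithmetic mean of latitude and longitude assumes the tracks do not straddle the dateline.

```python
def simple_consensus(member_positions, min_members=2):
    """Return the unweighted mean (lat, lon) of the available member forecasts.

    member_positions maps a model name to a (lat_deg, lon_deg) tuple, or to
    None when that model has no forecast at this lead time.
    Returns None when fewer than min_members forecasts are available.
    """
    available = [p for p in member_positions.values() if p is not None]
    if len(available) < min_members:
        return None
    lat = sum(p[0] for p in available) / len(available)
    lon = sum(p[1] for p in available) / len(available)
    return (lat, lon)


# Illustrative 72 h positions (made-up values, not from any verification set).
forecasts_72h = {
    "NOGAPS": (22.0, 130.0),
    "JGSM": (23.0, 128.0),
    "UKMO": (21.0, 129.0),
}
consensus = simple_consensus(forecasts_72h)
```

Requiring a minimum number of members reflects the observation in the text that consensus skill depends on having several independent tracks; the threshold of two is an assumption of this sketch.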
Elsberry and Carr (2000) used the same five models evaluated by Goerss (2000a) to evaluate consensus forecasts and the consensus error versus the linear spread of the numerical forecasts (distance from 72-h consensus position to the farthest track position of the five models). Based on these findings, Elsberry and Carr proposed that the forecaster could improve on the large spread consensus forecasts by eliminating erroneous track(s) to form a selective consensus forecast. Subsequently, Carr and Elsberry (2000a, b) developed conceptual models for detecting large consensus forecast error situations.
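The spread measure described above (distance from the consensus position to the farthest member position) can be sketched as follows. This is an illustration under stated assumptions: the haversine formula on a spherical Earth of radius 6371 km is used for distance, the consensus is taken as the arithmetic mean of latitude and longitude, and the function names are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def great_circle_km(p, q):
    """Haversine distance in km between two (lat_deg, lon_deg) points."""
    lat1, lon1 = map(radians, p)
    lat2, lon2 = map(radians, q)
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def consensus_spread_km(positions):
    """Return (consensus_position, spread_km) for a list of member positions.

    Spread is the distance from the consensus to the farthest member.
    """
    n = len(positions)
    consensus = (sum(p[0] for p in positions) / n,
                 sum(p[1] for p in positions) / n)
    spread = max(great_circle_km(consensus, p) for p in positions)
    return consensus, spread


# Three made-up 72 h positions on the same meridian, 2 degrees apart.
members = [(20.0, 130.0), (22.0, 130.0), (24.0, 130.0)]
center, spread = consensus_spread_km(members)
```

In this example the farthest member lies 2 degrees of latitude from the consensus, so the spread is about 222 km. A forecaster-facing tool would compare such a value against a threshold to flag large-spread cases; no threshold is given here because the text does not state one.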
Aberson (2001) investigated the ensemble mean of most of the model track guidance available at the U. S. National Hurricane Center (NHC), Miami, FL from 1976 to 2000 and determined that numerical guidance available at the NHC had improved since 1976. Aberson also concluded that the ensemble forecast process needed more development to further develop forecast reliability and forecast distribution potential.
Weber (2002a, 2002b) developed a statistical ensemble prediction system (STEPS) that uses the model performance during the previous year as a weighting factor for use in consensus forecasts. Results with STEPS showed a mean positive skill for Atlantic TC track predictions of more than 15% relative to all major dynamical models and the official National Hurricane Center forecasts for the 1997-2000 TC dataset.
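A performance-weighted consensus can be illustrated with a short sketch. The weighting actually used in STEPS is not described in this paper; the inverse of each model's mean error from the previous year is used here purely as an assumed stand-in, and all names and numbers are hypothetical.

```python
def weighted_consensus(positions, prior_errors_km):
    """Weighted mean position with weights = 1 / prior-year mean error.

    positions: {model: (lat_deg, lon_deg)} for the current forecast.
    prior_errors_km: {model: mean track error in km over the previous year}.
    The inverse-error weighting is an assumption of this sketch.
    """
    weights = {m: 1.0 / prior_errors_km[m] for m in positions}
    total = sum(weights.values())
    lat = sum(weights[m] * positions[m][0] for m in positions) / total
    lon = sum(weights[m] * positions[m][1] for m in positions) / total
    return (lat, lon)


# Made-up example: model A was three times more accurate than model B last year,
# so the weighted position falls three times closer to A than to B.
current = {"A": (20.0, 130.0), "B": (24.0, 134.0)}
last_year = {"A": 100.0, "B": 300.0}
blended = weighted_consensus(current, last_year)
```

With equal prior errors this reduces to the simple unweighted consensus, which makes the two approaches easy to compare on the same cases.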
In a different approach to Goerss (2000a), Krishnamurti et al. (2000) created a consensus of two global models and one spectral model of the Florida State University (FSU), the GFDL model, the UKMO model and NOGAPS to predict storm tracks and intensities during the 1998 Atlantic hurricane season. In the training period of Krishnamurti's method, statistical weights were determined for each individual model. The individual model forecasts of all storms except the one to be predicted, and all available model forecasts of the ensemble members, were subjected to a linear multiple regression relative to best track information to derive the statistical weights of the expected performance of each ensemble member. In the forecast period, the weighted individual forecasts of all ensemble members were used to produce track and intensity predictions. Krishnamurti et al. justified this cross-validation approach based on the major modifications made to some of the models after 1997. With mean position errors of about 125, 190 and 260 km at 24, 48, and 72 h, respectively, the average guidance was found to be significantly better than that of each individual model and the official NHC forecast.
3.2.2. Evolution of Consensus Track Forecasting at JTWC.
The JTWC began to use consensus forecasts in the early 1990s but the persistent and methodical development of this forecast tool did not occur until the late 1990s.
In 1991, JTWC began use of what was then labeled as Hybrid Forecast Aids to reduce very large track forecast errors (JTWC 1993). These hybrid aids, BLND and WGTD, consisted of six forecasts that were developed through simple and weighted averages. These early aids were heavily weighted toward climatology but still produced the lowest overall track errors when compared to individual numerical model performance. Operational experience revealed that the aids were not consistently used because they required manual data entry and were too time consuming.
In 1993, JTWC began using the Dynamic AVErage (DAVE), which was a simple average of all available dynamic model guidance. Statistics for 1993 indicated that DAVE out-performed JTWC by approximately 8 to 10% at 48 h and 72 h (JTWC 1993). A homogeneous comparison with the individual dynamic models included in DAVE shows most individual models out-performed both JTWC and DAVE by approximately 10 to 30%. These results were not encouraging and only sporadic JTWC use of DAVE continued until 2000.
In 1998, two forecast aids proposed by Goerss (2000), the Global AVerage (GLAV)² and the Regional model AVerage (RGAV)³, were installed⁴ as an upgrade to the previous simple ensemble or consensus forecast efforts. Tables 3.2.1(a) and (b) show the 1998 and 1999 forecast performance for GLAV and RGAV based on the data set available in the JTWC operational database. Since GLAV was produced at 0000 and 1200 UTC and RGAV was produced at 0600 and 1800 UTC, no homogeneous comparison is made.
Table 3.2.1: Non-homogeneous comparison of JTWC, CLIP, GLAV, and RGAV at 24, 48, and 72 h for the 1998 (a) and 1999 (b) western North Pacific seasons. Error in kilometers (number of forecasts in brackets).
Although GLAV and RGAV consensus forecasts showed skill, they were not consistently used to develop forecasts during the 1998 and 1999 seasons. Discussions with the JTWC forecasters indicate that the products were not used for two primary reasons: (1) a lack of appreciation for the skill of the consensus products; and (2) the product was not fully automated until the 1999 season.
In 2000, JTWC began operational evaluation of a prototype rules-based consensus forecast process called the Systematic Approach to TC Forecasting Aid (SAFA), which was developed by Carr and Elsberry (2000a, b). The SAFA is applied at the JTWC using a five-step process:
1). A Non-selective Consensus (NCON)
[Table columns: Two Models, Three Models, Four Models, Five Models; table body not recovered]
Using the findings from the 2000 season together with counseling and advice from Les Carr, JTWC attempted to improve the TDOs' skill in deriving SCON forecasts. Training was conducted to improve the forecasters' understanding of the numerical depictions of the meteorological patterns that govern tropical cyclone motion. Additionally, the forecasters were encouraged to be more conservative when creating SCON forecasts, as post-analysis by Carr determined that the JTWC on numerous occasions had created SCON forecasts when none were required. Forecasters were encouraged to decrease the number of SCON forecasts and minimize the variance from the NCON.
Since Table 3.2.2 and Goerss (1999) demonstrate the value of having more consensus members, the Naval Research Laboratory (NRL) Automated Tropical Cyclone Forecast (ATCF) system development team then assisted JTWC in testing and development of consensus combinations other than NCON and SCON. As a result, the ATCF system was revised to allow JTWC to create additional consensus combinations. A version of the NHC interpolation and consensus track forecast code (GUNS) was installed by NRL at JTWC during July 2001.
Using the revised ATCF consensus forecasting capability, JTWC applied two new model combinations to determine the effects of the regional models on the consensus forecast. Consensus forecasts using the four global models (CONG)⁶ were compared to consensus forecasts produced by all available dynamic models (CONU)⁷. A homogeneous comparison of CONG, CONU, NCON and the JTWC forecasts for the 2001 western North Pacific season (Table 3.2.3) seems to indicate that the greatest TC track forecasting skill lies with the consensus of all numerical models (CONU).
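A homogeneous comparison restricts the verification to forecast cases for which every aid being compared produced a forecast, so that all mean errors are computed over the same sample. A minimal sketch follows; the data layout, function name, and error values are hypothetical and are not taken from Table 3.2.3.

```python
def homogeneous_mean_errors(errors_by_aid):
    """Mean error per aid over only the cases common to every aid.

    errors_by_aid: {aid_name: {case_id: error_km}}, where case_id identifies
    one storm, initial time, and lead time.
    Returns (number_of_common_cases, {aid_name: mean_error_km}).
    """
    common = set.intersection(*(set(cases) for cases in errors_by_aid.values()))
    if not common:
        return 0, {}
    means = {
        aid: sum(cases[c] for c in common) / len(common)
        for aid, cases in errors_by_aid.items()
    }
    return len(common), means


# Made-up 72 h errors; case "c" lacks a CONU forecast and is excluded for all aids.
sample = {
    "JTWC": {"a": 100.0, "b": 200.0, "c": 300.0},
    "CONU": {"a": 90.0, "b": 150.0},
}
n_cases, mean_errors = homogeneous_mean_errors(sample)
```

Dropping case "c" lowers the JTWC mean from 200 km to 150 km in this example, which shows why non-homogeneous tables cannot be compared aid against aid.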
3.2.3. Fusion of Consensus TC Track Guidance into the Track Forecast Process at the JTWC.
U. S. military personnel rotation policy causes the JTWC TC forecasters to completely change every 2 to 3 years. Consistent application and development of consensus forecasting techniques and application of the SAFA systematic field review process have provided the JTWC with tools to mitigate the routine loss of these skilled forecasters.
Consensus forecasts are used as the initial first guess for all JTWC TC track forecasts. Through continuous process refinement, JTWC shifted the focus away from creation of SCON forecasts to a more conservative goal of consistently adding value to the consensus forecast by better understanding of the numerical model output. The new focus at JTWC is on fine-tuning the consensus by systematically analyzing the model field and rapidly creating a mental picture of the model field evolution with time for the forecast period. This picture must accurately assess differences in steering, intensity and structure among the model solutions. Known model biases, departures from satellite-derived intensity and structure assessments, and current model trends are then subjectively applied to the track forecast.
With constantly changing models and the availability of new remotely sensed data, continuous training is required to enable forecasters to improve on the consensus forecast. JTWC routinely monitors the forecast process in near-real time through statistical analysis to improve forecast quality. Twice-monthly statistical analyses of the CONU, CONG, NCON, and SCON performance, along with performance statistics for each consensus member, are also conducted. These analyses are presented when the situation requires and at monthly forecaster meetings. The objective is to standardize procedures, build forecaster confidence and improve on systematic procedures to add value over the consensus when developing the JTWC track forecast. Table 3.2.4 shows the decrease in JTWC track forecast errors since beginning the systematic use of consensus forecast guidance in 2000, and the error reductions due to improvements to the consensus forecast process in 2001 and 2002. Even though a 100% turnover in forecast staff occurred late in 2001, a preliminary review indicates that the quality of the JTWC forecasts continues to improve. The data presented in Table 3.2.5 support the premise that persistent consensus forecast application can prevent degradation of the forecast process due to routine changes in the forecast staff. Table 3.2.5 indicates JTWC's skill compared to CLIPER improved 4 to 5% from 1997-2000 with the exception of the 1998 season. In 1998, JTWC did not systematically use consensus forecast guidance but shifted to using the best performing model as the starting point for the official forecast. The poor performance of all the numerical models in 1998 resulted in degradation in quality of the JTWC forecasts. Table 3.2.6 suggests that had JTWC used consensus model guidance⁸ in 1998, substantial improvements would have occurred.
Table 3.2.4: Mean JTWC western North Pacific track forecast errors (km) for 2000 through 16 September 2002.
Table 3.2.5: JTWC and CLIP 72-h errors (km) and forecast improvement over CLIP in percent.
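Forecast improvement over CLIP (CLIPER) in percent is commonly expressed as the reduction in mean error relative to the CLIPER mean error. The exact formula used for Table 3.2.5 is not stated in this paper, so the following sketch assumes that standard definition; the function name and the numbers are illustrative only.

```python
def improvement_over_clip(forecast_error_km, clip_error_km):
    """Percent improvement of a forecast over CLIPER.

    Assumed definition: 100 * (CLIP error - forecast error) / CLIP error.
    Positive values mean the forecast beat CLIPER.
    """
    if clip_error_km <= 0:
        raise ValueError("CLIPER error must be positive")
    return 100.0 * (clip_error_km - forecast_error_km) / clip_error_km


# Made-up 72 h mean errors: forecast 300 km, CLIPER 400 km.
skill = improvement_over_clip(300.0, 400.0)
```

Normalizing by CLIPER in this way helps separate genuine skill changes from year-to-year differences in forecast difficulty, which is why a comparison against CLIP is shown alongside the raw errors.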
Table 3.2.7 shows a steady improvement in JTWC performance since 1997 with the exception of 1998. Based on these results and the findings of Goerss (2000 and 2001) and Aberson (2001), it can be suggested that some of the forecast improvement was the result of improved model performance. Additionally, these model improvements resulted in improved consensus forecasts that were used in the JTWC forecast process during the 2000, 2001 and 2002 western North Pacific seasons. Table 3.2.7 shows a significant reduction in track forecast error standard deviation for 2001 and 2002, which the authors believe is a direct result of the systematic use of consensus forecast aids and review of model fields. Further studies are needed and will be conducted to verify these deductions.
Table 3.2.7: JTWC western North Pacific 24 h, 48 h, and 72 h track forecast error standard deviation for 1998 through 16 September 2002. Error (km).
3.2.4. Limitations With Consensus TC Track Guidance.
Figure 1: Typhoon Nari as it finished a second loop near Okinawa. The consensus forecast predicted that Typhoon Nari would move onshore on mainland China north of Fuzhou, when in fact it moved over Taiwan. This is just one example of a potential failure when using consensus forecasts.
As shown in Figure 1, there are limitations with the application of consensus track forecast guidance. Incorrect synoptic pattern depiction, incorrect numerical model representation of incipient or mature tropical cyclones, and the erroneous development or poor initialization of adjacent cyclonic circulations are three major causes of error in current numerical model solutions.
When tropical cyclones begin to move into high vertical wind shear regions, over colder water, or begin to interact with mid-latitude systems, varying solutions will occur in the numerical models contained in the consensus. This situation can result in major divergence in model solutions and can result in poor consensus forecasts.
Weak tropical cyclones and tropical depressions usually are not well analyzed in the numerical models. Additionally, small tropical cyclones are often portrayed as being too large in the numerical models, which results in an incorrect poleward bias in the model track predictions. Often, the large size of the vortex in the model initialization is due to effects of the synthetic TC observations as they are assimilated into the dynamic models. Jim Goerss suggests that the combination of model resolution and model physics controls the vortex size. For example, a T239 model can maintain a smaller vortex than a T159 model can. If synthetic observations are inserted into the T239 model, a small vortex will be maintained. The same synthetic observations inserted into a T159 model will quickly be modified to something the model can maintain. Regardless of the source of these too-large tropical cyclone vortices, when the majority of the model tracks are poleward-biased, the resulting consensus forecast is of low quality.
Occasionally, the numerical models will forecast development of a cyclonic circulation near the location of a developing tropical cyclone or will show multiple cyclonic circulations along the monsoon trough. These adjacent, sometimes erroneous circulations have a tendency to interact in the model fields and degrade the quality of the consensus.
3.2.5. Experimental TC Track Forecast Products.
In 2001, JTWC began a three-year evaluation of 96 h and 120 h track forecasts. Goerss (2000b) had provided evidence that skillful 96 h and 120 h TC forecasts were being produced by the GFDL, NOGAPS, UKMO, and the European Centre for Medium-Range Weather Forecasts (ECMWF) models. Based on Goerss (2000b and 2001), independent analysis of CONU at JTWC, and lessons learned in 2000 and 2001 during the SAFA test, consensus forecast techniques are used for creation of these 96 h and 120 h forecasts. Table 3.2.8 gives a comparison of the 2002 JTWC 96 h and 120 h forecasts and the consensus guidance used to create these forecasts. Note that JTWC has not extended NCON and SCON tracks in SAFA to 96 h and 120 h because of the significant costs to update the SAFA computer code when a similar capability existed on ATCF. Table 3.2.9 shows the results of the JTWC 96 h and 120 h test through 16 September 2002. These results suggest that relatively skillful consensus guidance extends to 120 h and that the experimental 96 h and 120 h JTWC forecasts improved significantly through use of CONU as the primary guidance.
Table 3.2.8: Homogeneous comparison of JTWC, CONG, CONU, and NCON forecast errors (km) at 24, 48, 72, 96, and 120 h for the western North Pacific.
Table 3.2.9: Homogeneous comparison of JTWC and CLIP western North Pacific forecast errors (km) for 2000 through 16 September 2002.
3.2.6. Recommendations for Future Needs.
A tool or method is needed to calculate and display differences in the model analyses, along with a method to create an independent analysis of satellite data. This method will aid forecasters in adding value to the consensus forecast. An independent analysis of satellite-derived data should be compared with the various model analyses to identify potential track errors resulting from poorly analyzed features that are evident in satellite data. Post-analysis at JTWC often reveals that poor model initial conditions, as judged from satellite imagery, were present when the model produced poor quality track forecasts.
A method is needed to rapidly assess the model impacts resulting from warning position and past track fluctuations. Consensus forecasts are contaminated by rapid fluctuations in TC direction and speed of movement as represented in the initial position, and these fluctuations must be accounted for when using the consensus forecasts.
It is recommended that numerical models be improved to more accurately depict expected tracks of weak tropical cyclones. Errors in the track forecasts by dynamical models are larger for less intense cyclones (tropical storm stage and weaker) than for hurricane stage cyclones. Some of this error may be from poorly defined initial conditions. Based on JTWC's results using consensus forecasting, it is suggested that efforts be made to share skillful model TC tracks to ensure availability of five or more tracks for use in consensus forecasting in other basins.
The missing ingredient for implementation of consensus forecasting in the 1990s was proper training on the systematic use of consensus guidance. The development of SAFA was needed to engineer a systematic process to incorporate consensus forecasts into the TC warning process. The use of the consensus forecast approach has helped JTWC track forecasting in the Pacific and Indian Oceans. During the forecast development process, the tropical cyclone forecaster gains needed information on the temporal and spatial evolution of model forecasts, erroneous model features, and the strengths and weaknesses of these models. This information can be applied within the time constraints of the TC warning cycle to improve the official forecast. Based on the positive results since the 2000 forecast season, JTWC will continue to explore and improve the use of systematic field review and consensus forecasts to produce TC warnings.
Mr. Charles R. Sampson and Dr. Jim Gross assisted with the post-analysis of consensus forecasts for 1998 and 1999. LT Dave Roberts assisted in producing figures. Captain Chris Cantrell and Captain Steve Vilpors assisted in creating statistics and Lt Colonel Greg Engel provided a review and helpful suggestions to improve this manuscript. Additionally, we want to thank Dr. Johnny Chan and Dr. Russ Elsberry for research assistance, review of the manuscript, and suggestions for improvement.
Aberson, S., 2001: The ensemble of tropical cyclone track forecasting models in the North Atlantic basin (1976-2000). Bull. Amer. Meteor. Soc., 82, 1895-1904.
Carr, L. E., III, and R. L. Elsberry, 2000a: Dynamical tropical cyclone track forecast errors. Part I: Tropical region error sources. Wea. Forecasting, 15, 641-661.
Carr, L. E., III, and R. L. Elsberry, 2000b: Dynamical tropical cyclone track forecast errors. Part II: Midlatitude circulation influences. Wea. Forecasting, 15, 662-681.
Elsberry, R. L., and L. E. Carr, III, 2000: Consensus of dynamical tropical cyclone track forecasts: Errors versus spread. Mon. Wea. Rev., 128, 4131-4138.
Goerss, J., 1998: Global model tropical cyclone forecast performance. Minutes of the 52nd Interdepartmental Hurricane Conference, 26-30 January 1998, Clearwater Beach, FL. Office of Federal Coordinator for Meteorological Services and Supporting Research.
______ 1999: Tropical cyclone forecasting using an ensemble of dynamical models: 1998 Atlantic hurricane season. Preprints, 23rd Conf. Hurr. Trop. Meteor, Dallas,TX, Amer. Meteor. Soc., 826-827.
_____ 2000a: Tropical cyclone track forecasts using an ensemble of dynamical models. Mon. Wea. Rev., 128, 1187-1193.
_____ 2000b: Quantifying tropical cyclone forecast uncertainty using an ensemble of dynamical models. Preprints, 24th Conf. Hurr. Trop. Meteor., Ft. Lauderdale, FL, Amer. Meteor. Soc., 429-430.
Joint Typhoon Warning Center, 1994. Annual Tropical Cyclone Report, 1993.
Weber, H. C., 2002a: Hurricane track and intensity prediction using a statistical ensemble of numerical models. Preprints, 25th Conf. Hurr. Trop. Meteor., San Diego, CA, Amer. Meteor. Soc.
Weber, H. C., 2002b: Hurricane track prediction using a statistical ensemble of numerical models. Mon. Wea. Rev. (in press).
1 Global: Navy Operational Global Atmospheric Prediction System (NOGAPS); Japan Global Spectral Model (JGSM); and United Kingdom Meteorological Office (UKMO)
Regional: Navy version of the Geophysical Fluid Dynamics Lab model (GFDN); Japan Typhoon Model (JTYM)
2 NOGAPS, JGSM, and UKMO models
3 GFDN and JTYM models
4 On the Automated Tropical Cyclone Forecasting System (ATCF), the TC forecast workstation used by JTWC
5 NCON is a simple ensemble or consensus forecast that combines the same five dynamic TC track predictions described by Goerss (2000) that are interpolated spatially and temporally to coincide with the initial TC warning position.
6 NOGAPS, JGSM, UKMO and AVN models
7 NOGAPS, JGSM, UKMO, AVN, JTYM, COAMPS, USAF MM5, and GFDN.
8 Mr. Charles R. Sampson at NRL Monterey computed CON_ (NOGAPS, UKMO, JGSM, GFDN, JTYM) to replicate NCON for 1998 and 1999. This consensus aid was reconstructed from JTWC archives and was not available at JTWC.