D10: Recommendations on methodologies for identification of the best predictor variables for extreme events
1. Introduction
Definition of good/best predictor variables:
Strong/robust relationship with predictand
Stationary relationship with predictand
Explain low-frequency variability/trends
Physically meaningful
Appropriate spatial scale (physics/GCM)
Data widely/freely available (obs/GCM)
Well reproduced by GCM (see D13)
2. Identification of potential predictor variables
Constrained by Reanalysis/GCM data
Guided by expert judgement
Two general approaches in STARDEX:
–Start with minimum and add more if necessary
–Start with (nearly) everything and select/prune
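The first approach (start with a minimum and add more if necessary) can be illustrated with a greedy forward selection. This is only a minimal sketch on synthetic data, not a method used by any STARDEX group: the function name `forward_select`, the `min_gain` threshold, and the predictor names are all hypothetical.

```python
import numpy as np

def forward_select(X, y, names, min_gain=0.02):
    """Greedy forward selection: repeatedly add the predictor that most
    increases R^2, stopping when the gain drops below min_gain
    (an assumed, illustrative threshold)."""
    n = len(y)
    selected, remaining = [], list(range(X.shape[1]))
    best_r2 = 0.0
    while remaining:
        scores = []
        for j in remaining:
            # Least-squares fit with an intercept plus the trial predictor set.
            A = np.column_stack([np.ones(n), X[:, selected + [j]]])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            resid = y - A @ coef
            scores.append((1.0 - resid.var() / y.var(), j))
        r2, j = max(scores)
        if r2 - best_r2 < min_gain:
            break
        selected.append(j)
        remaining.remove(j)
        best_r2 = r2
    return [names[j] for j in selected], best_r2

# Synthetic example: only the first and third candidates carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = 2.0 * X[:, 0] - 1.0 * X[:, 2] + 0.1 * rng.normal(size=500)
selected, r2 = forward_select(X, y, ["slp", "z500", "rh700", "t850"])
print(selected, round(r2, 3))
```

The second approach (start with nearly everything and prune) would run the same loop in reverse, removing the predictor whose loss costs the least explained variance.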
3. Choices
Surface and/or upper air
Continuous vs discrete (CTs) predictors
Circulation only or include atmospheric humidity/stability etc.
Spatial domain
Lags – temporal and spatial
Number of predictors
4. Number of predictors
What is optimal/desirable number?
Traditionally feel comfortable with “a few”
–Physical understanding
–Avoid correlated predictors
Also an issue “within” predictors
–Few PC/sEOFs or Guy’s clusters (e.g., 3-5) vs CT classifications (e.g., classes)
But is it so important to prune?
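One simple way to "avoid correlated predictors" is to drop any candidate that is strongly correlated with one already kept. The sketch below is an illustration under assumptions: the function `prune_correlated`, the 0.8 threshold, and the predictor names are hypothetical, and the data are synthetic.

```python
import numpy as np

def prune_correlated(X, names, max_abs_r=0.8):
    """Keep predictors in the given order, skipping any whose absolute
    correlation with an already kept predictor exceeds max_abs_r
    (an assumed, illustrative threshold)."""
    corr = np.corrcoef(X, rowvar=False)  # predictors are columns
    kept = []
    for j in range(X.shape[1]):
        if all(abs(corr[j, k]) <= max_abs_r for k in kept):
            kept.append(j)
    return [names[j] for j in kept]

rng = np.random.default_rng(1)
base = rng.normal(size=(300, 2))
# The third column is nearly a copy of the first, mimicking two redundant predictors.
X = np.column_stack([base[:, 0], base[:, 1],
                     base[:, 0] + 0.05 * rng.normal(size=300)])
kept = prune_correlated(X, ["mslp", "rh700", "z1000"])
print(kept)
```

The result depends on the order in which candidates are listed, so in practice the candidates would be ordered by physical relevance or by strength of relationship with the predictand first.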
5. Methods
Correlation, e.g., UEA, USTUTT-IWS
Stepwise multiple regression, e.g., KCL
PCA/CCA, e.g., ARPA-SMR, UEA
Compositing, e.g., KCL
Neural networks, e.g., KCL, UEA(SYS)
Genetic algorithm, e.g., KCL
“Weather typing”, e.g., AUTH, USTUTT-IWS
Trend analysis, e.g., DMI, USTUTT-IWS
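As an illustration of the simplest method in the list, correlation screening ranks each candidate predictor by the strength of its Pearson correlation with the predictand (for example, a seasonal index of extremes). This is a minimal sketch on synthetic data, not the procedure of any named group; `rank_by_correlation` and the predictor names are hypothetical.

```python
import numpy as np

def rank_by_correlation(X, y, names):
    """Return (name, r) pairs sorted by decreasing |Pearson r| with y."""
    r = [float(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
    return sorted(zip(names, r), key=lambda pair: abs(pair[1]), reverse=True)

# Synthetic predictand: strong negative link to the second candidate,
# weaker positive link to the first, none to the third.
rng = np.random.default_rng(2)
X = rng.normal(size=(400, 3))
y = -1.5 * X[:, 1] + 0.8 * X[:, 0] + rng.normal(size=400)
ranking = rank_by_correlation(X, y, ["nao_index", "z500", "q850"])
for name, r in ranking:
    print(f"{name}: r = {r:+.2f}")
```

Ranking by absolute value keeps strong negative relationships at the top. A high correlation alone does not show that the relationship is stationary or physically meaningful, so the criteria in the Introduction still apply.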
6. Conclusions
Include summary table of variables recommended by each group
Refer to D13 – need for validation of potential predictors