Federal Department of Home Affairs FDHA Federal Office of Meteorology and Climatology MeteoSwiss. Task 1 of PP Interpretation: 1.1 Further applications of boosting: this talk. 1.2 Publication on boosting: paper by Oliver Marchand submitted, but not yet published.
Thunderstorm Prediction with Boosting: Verification and Implementation of a New Base Classifier. André Walser (MeteoSwiss), Martin Kohli (ETH Zürich, semester thesis)
Overview: boosting algorithm; impact of the training data; verification results; mapping to a probability forecast; new base classifier: decision tree
Supervised learning: historic data → learner → rules (classifier); new data → classifier → yes/no
Training data. Model data: COSMO-7 assimilation cycle, data for 79 SYNOP stations in Switzerland, at least one year, every hour, e.g. SI, CAPE, W, date, time. Label data: a thunderstorm is labeled "yes" if an appropriate ww code was reported in the SYNOP message or at least 3 lightning strikes were registered within 13.5 km of the station.
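The labeling rule can be sketched as a small function. This is a minimal sketch, not the operational code: the set of "appropriate" ww codes is an assumption (here the WMO present-weather codes that report a thunderstorm), and the names `thunderstorm_label` and `distance_km` are hypothetical. Distances are computed with the haversine great-circle formula.

```python
import math

# ASSUMPTION: the slide does not list the "appropriate" ww codes. Here we take the
# WMO present-weather codes that report a thunderstorm: 17, 29 and 91-99.
THUNDER_WW_CODES = {17, 29} | set(range(91, 100))

RADIUS_KM = 13.5        # search radius around the station (from the slide)
MIN_STRIKES = 3         # minimum number of lightning strikes (from the slide)
EARTH_RADIUS_KM = 6371.0


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def thunderstorm_label(station, ww_code, strikes):
    """Return True ("yes") if the SYNOP ww code reports a thunderstorm or if at
    least MIN_STRIKES lightning strikes lie within RADIUS_KM of the station.

    station: (lat, lon); ww_code: int, or None if there is no SYNOP message;
    strikes: list of (lat, lon) strike locations for the hour.
    """
    if ww_code is not None and ww_code in THUNDER_WW_CODES:
        return True
    n_near = sum(1 for lat, lon in strikes
                 if distance_km(station[0], station[1], lat, lon) <= RADIUS_KM)
    return n_near >= MIN_STRIKES
```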
AdaBoost algorithm. Input: weighted training samples; number of base classifiers M. Iteration m = 1, ..., M: (1) determine the base classifier G_m; (2) calculate the weighted error err_m and the classifier weight; (3) increase the weights w of the misclassified samples.
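A minimal sketch of the three iteration steps in Python. It assumes labels in {-1, +1}, threshold classifiers (decision stumps) found by exhaustive search, and the classifier weight alpha_m = log((1 - err_m) / err_m) of discrete AdaBoost; the slides do not state which AdaBoost variant was used, and all function names are hypothetical.

```python
import math


def fit_stump(X, y, w):
    """Weighted threshold classifier: search feature j, threshold t and
    polarity s for the rule "s if x[j] > t else -s" with the smallest
    weighted error. Returns (weighted error, j, t, s)."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            for s in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if (s if xi[j] > t else -s) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, s)
    return best


def stump_predict(stump, x):
    _, j, t, s = stump
    return s if x[j] > t else -s


def adaboost(X, y, M):
    """Discrete AdaBoost with labels -1/+1. Returns a list of (alpha_m, G_m)."""
    n = len(X)
    w = [1.0 / n] * n                          # initial sample weights
    ensemble = []
    for _ in range(M):
        stump = fit_stump(X, y, w)             # 1. determine base classifier G_m
        err = stump[0] / sum(w)                # 2. weighted error err_m ...
        err = min(max(err, 1e-10), 1 - 1e-10)  #    (clipped to avoid log(0))
        alpha = math.log((1 - err) / err)      #    ... and classifier weight alpha_m
        ensemble.append((alpha, stump))
        # 3. increase the weights of the misclassified samples
        w = [wi * math.exp(alpha) if stump_predict(stump, xi) != yi else wi
             for xi, yi, wi in zip(X, y, w)]
    return ensemble


def predict(ensemble, x):
    """Sign of the weighted vote of all base classifiers."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score > 0 else -1
```

Usage: `ensemble = adaboost(X, y, M=50)` followed by `predict(ensemble, x_new)`. The exhaustive stump search is quadratic in the number of samples and only meant to keep the sketch short.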
Output of the learning process: M base classifiers, each a threshold classifier.
AdaBoost classifier: the M base classifiers G_m are combined into a single classifier by a weighted vote.
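The slides do not give the formula for C_TSTORM. One plausible form, assumed here only for illustration, is the weighted fraction of base classifiers voting "thunderstorm": with G_m(x) in {0, 1}, C = sum(alpha_m * G_m(x)) / sum(alpha_m) lies in [0, 1] and can be compared with a threshold. The function names `c_tstorm` and `alarm` are hypothetical.

```python
def c_tstorm(alphas, votes):
    """Hypothetical normalized boosting score in [0, 1]:
    sum_m alpha_m * G_m(x) / sum_m alpha_m, with each vote G_m(x) in {0, 1}."""
    total = sum(alphas)
    if total <= 0:
        raise ValueError("classifier weights must sum to a positive value")
    return sum(a * g for a, g in zip(alphas, votes)) / total


def alarm(score, threshold=0.5):
    """Issue a thunderstorm alarm when the score reaches the threshold."""
    return score >= threshold
```

A score of this kind also gives the "certainty of the classification": values near 0 or 1 mean the base classifiers largely agree.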
Output of the classifier: C_TSTORM at 17, 18 and 19 UTC (figure). Biased!
Reason: inappropriate training data. SYNOP messages contain both events and non-events, but are only available every 3 hours (most messages at 06, 12 and 18 UTC). Lightning data contain only events. Hence, at hours without a SYNOP message the training data contain thunderstorm events but no non-events.
New training data sets. B (biased): SYNOP messages; only events from the lightning data. F (full): SYNOP messages; all missing values are considered non-events. AL1 (at least 1): SYNOP messages; when the lightning data show at least 1 event, all missing values are considered non-events.
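The three labeling schemes differ only in how an hour without a SYNOP message and without a lightning event at the station is treated. A minimal sketch follows; the function name and arguments are hypothetical, and for AL1 it ASSUMES that "at least 1 event" means at least one lightning strike anywhere in the network during that hour, which the slide does not state explicitly.

```python
from typing import Optional


def label_hour(scheme: str, synop_thunder: Optional[bool], station_event: bool,
               network_has_strikes: bool) -> Optional[int]:
    """Label one station-hour under scheme "B", "F" or "AL1".

    synop_thunder: True/False from the SYNOP ww code, None if no SYNOP message.
    station_event: lightning criterion met at the station (e.g. >= 3 strikes within 13.5 km).
    network_has_strikes: ASSUMPTION, lightning data show >= 1 strike anywhere in that hour.
    Returns 1 (event), 0 (non-event) or None (hour left out of the training data).
    """
    if synop_thunder or station_event:
        return 1                      # events from SYNOP or lightning, in all schemes
    if synop_thunder is False:
        return 0                      # SYNOP message reports no thunderstorm
    # No SYNOP message and no lightning event at this station:
    if scheme == "B":
        return None                   # biased: non-event unknown, sample missing
    if scheme == "F":
        return 0                      # full: missing values count as non-events
    if scheme == "AL1":
        return 0 if network_has_strikes else None
    raise ValueError(f"unknown scheme: {scheme}")
```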
Without bias: C_TSTORM at 17, 18 and 19 UTC (figure).
Verification: POD and FAR for C_TSTORM thresholds between 0.3 and 0.6, with POD = hits / observed events and FAR = false alarms / number of alarms. Training data: COSMO-7 assimilation cycle (June 2006 to May 2007) with observations B, AL1 or F. Verification data: COSMO-7 forecasts (July 2006 and May/June 2007) with observations F.
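POD and FAR for one threshold follow directly from the contingency table of alarms and observed events. A minimal sketch with a hypothetical function name; evaluating it for thresholds 0.3, 0.4, 0.5 and 0.6 gives one curve per training data set.

```python
def pod_far(scores, observed, threshold):
    """POD = hits / (hits + misses), FAR = false alarms / (hits + false alarms).
    An alarm is issued when the score reaches the threshold."""
    hits = misses = false_alarms = 0
    for score, event in zip(scores, observed):
        alarm = score >= threshold
        if alarm and event:
            hits += 1
        elif alarm:
            false_alarms += 1
        elif event:
            misses += 1
    pod = hits / (hits + misses) if hits + misses else float("nan")
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else float("nan")
    return pod, far
```

Usage: `[pod_far(scores, observed, t) for t in (0.3, 0.4, 0.5, 0.6)]`.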
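Verification: earlier results. Results reported last year for 2005: POD = 72%, FAR = 34%. Unfortunately these are not realistic: the verification was done with observation data set B, which contains no non-events at hours without a SYNOP message, so false alarms at those hours are not counted.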
Verification, July 2006 (about 7% of the cases are events), compared with a random forecast (figure).
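Why the random forecast is a useful reference: suppose alarms are issued at random with probability q, independently of the observations, and events occur with base rate p (about 0.07 in July 2006) among N cases. The expected scores then follow from a few lines of arithmetic:

```latex
\begin{aligned}
\text{hits} &= N p q, \qquad \text{false alarms} = N (1-p)\, q, \qquad \text{events} = N p,\\
\mathrm{POD} &= \frac{N p q}{N p} = q,\\
\mathrm{FAR} &= \frac{N (1-p)\, q}{N p q + N (1-p)\, q} = 1 - p \approx 0.93 .
\end{aligned}
```

A random forecast can therefore reach any POD simply by issuing more alarms, but its FAR stays near 0.93; a useful system must lie clearly below that.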
Verification, 18 May to 24 June 2007 (figure).
Comparison with another system: DWD expert system, period April to September 2006: POD = 0.346, FAR = 0.740.
Mapping to a probability forecast P(C_TSTORM): polynomial fit in a reliability diagram.
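A reliability diagram plots, for bins of the classifier score, the observed relative frequency of events against the mean score; a polynomial fitted through these points gives the mapping to a probability. The sketch below assumes half-open score bins and an ordinary least-squares quadratic fit; the bin edges and function names are hypothetical, and the slides do not state how the fit was weighted.

```python
def reliability_points(scores, observed, edges):
    """Group scores into half-open bins [lo, hi) and return, for each
    non-empty bin, (mean score, observed relative frequency of events)."""
    points = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = [(s, o) for s, o in zip(scores, observed) if lo <= s < hi]
        if in_bin:
            mean_score = sum(s for s, _ in in_bin) / len(in_bin)
            frequency = sum(o for _, o in in_bin) / len(in_bin)
            points.append((mean_score, frequency))
    return points


def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_quadratic(points):
    """Least-squares fit y ~ a*x**2 + b*x + c by solving the 3x3 normal
    equations with Cramer's rule. Needs at least 3 distinct x values."""
    sx = [sum(x ** k for x, _ in points) for k in range(5)]       # sums of x^k
    sxy = [sum(x ** k * y for x, y in points) for k in range(3)]  # sums of x^k * y
    A = [[sx[4], sx[3], sx[2]],
         [sx[3], sx[2], sx[1]],
         [sx[2], sx[1], sx[0]]]
    rhs = [sxy[2], sxy[1], sxy[0]]
    d = _det3(A)
    coeffs = []
    for col in range(3):
        Ai = [row[:] for row in A]
        for r in range(3):
            Ai[r][col] = rhs[r]
        coeffs.append(_det3(Ai) / d)
    return tuple(coeffs)  # (a, b, c)
```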
Mapping to a probability forecast, with x = C_TSTORM:

$$
P(x) = \begin{cases} 0 & \text{if } x \le 0.4, \\ a x^2 + b x + c & \text{if } 0.4 < x < 0.6, \\ a \cdot 0.6^2 + b \cdot 0.6 + c & \text{if } x \ge 0.6. \end{cases}
$$

Limited resolution: the system predicts probabilities only between 0 and about 40%.
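The piecewise mapping translates directly into a function. The coefficients a, b and c come from the reliability-diagram fit and are not given in the slides; the values in the usage example below are made up purely for illustration, and the function name is hypothetical.

```python
def probability(x, a, b, c, lower=0.4, upper=0.6):
    """Map a C_TSTORM value x to a probability: 0 for x <= lower, the fitted
    quadratic a*x**2 + b*x + c in between, and the quadratic's value at
    `upper` for x >= upper."""
    if x <= lower:
        return 0.0
    if x >= upper:
        x = upper  # hold the value reached at the upper end of the fit range
    return a * x * x + b * x + c
```

Usage with illustrative coefficients: `probability(0.5, 0.0, 2.0, -0.8)` gives about 0.2. Capping the input at 0.6 keeps the probability from growing beyond the range where the fit is supported by data, which is one reason the forecast probabilities stay low.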
New base classifier: decision tree. Starting point: a single threshold classifier that splits the samples into class 1 and class 0 (figure).
New base classifier: decision tree. Threshold classifiers 1, 2 and 3 are combined in a tree whose leaves are the classes (figure).
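A decision tree of this kind can be represented with internal nodes that each hold one threshold classifier and leaves that hold a class. A minimal sketch follows; the predictor names and thresholds in the example tree are hypothetical and only illustrate the structure, not values used at MeteoSwiss.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: int  # class: 1 = thunderstorm, 0 = no thunderstorm


@dataclass
class Node:
    feature: str      # predictor name (hypothetical, e.g. "CAPE")
    threshold: float  # threshold classifier: go right if x[feature] > threshold
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Node]


def classify(tree: Tree, x: dict) -> int:
    """Follow the threshold classifiers from the root down to a leaf."""
    while isinstance(tree, Node):
        tree = tree.right if x[tree.feature] > tree.threshold else tree.left
    return tree.label


# Example tree with three threshold classifiers (illustrative thresholds only).
tree = Node("CAPE", 500.0,
            left=Leaf(0),
            right=Node("SI", 0.0,
                       left=Node("W", 0.1, left=Leaf(0), right=Leaf(1)),
                       right=Leaf(0)))
```

Used as a base classifier inside boosting, such a tree can capture interactions between predictors (for example, CAPE only mattering when the atmosphere is also unstable) that a single threshold classifier cannot.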
Decision tree: example (figure).
Conclusions & Outlook. Boosting is a simple, efficient and effective machine-learning method for model post-processing: it is completely general, can employ a number of redundant indicators, and computes a certainty of the classification that can be mapped to a probability forecast. First verification results are promising; extended verification is required. Benefit of decision trees?