Towards a Learning Incident Detection System ICML 06 Workshop on Machine Learning for Surveillance and Event Detection June 29, 2006 Tomas Singliar Joint.

1 Towards a Learning Incident Detection System ICML 06 Workshop on Machine Learning for Surveillance and Event Detection June 29, 2006 Tomas Singliar Joint work with Dr. Milos Hauskrecht

2 Outline
- Replace traffic engineers with ML algorithms for incident detection
- Traffic data collection and quality
- Why, who and for what purposes
- Incident detection algorithms
- Evaluation metrics
- Individual feature performance
- Sensor fusion with SVM
- Noisy data problems
- Attempts to model accident evolution with DBN
- Conclusions and future work
- Noisy data: poor onset tagging and "bootstrap"

3 Traffic data collection
- Sensor network: volumes, speeds, occupancy
- Data aggregated over 5 minutes
- Incidents: police, camera system

4 Incident annotation
[Figure legend: incident, no incident]

5 Incident annotation
- Incident labels are not necessarily correct or timely
  - Timing is not corrected (an opportunity for more ML)

6 Incident detection algorithms, intuition
- Incidents are detected indirectly, through the congestion they cause
- Baseline: the "California 2" algorithm
  1. If OCC(up) - OCC(down) > T1, go to the next step
  2. If [OCC(up) - OCC(down)] / OCC(up) > T2, go to the next step
  3. If [OCC(up) - OCC(down)] / OCC(down) > T3, possible accident
  4. If the previous condition persists for another time step, sound the alarm
- T1-T3 are hand-calibrated, which is very labor intensive
- Why so few ML applications? Nontraditional data; anomaly detection with rare positives; common sense works well
[Figure labels: occupancy spikes; occupancy falls]
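The four California 2 steps on this slide can be sketched in a few lines of Python. This is a minimal illustration of the rule as stated on the slide, not the deployed implementation: the function name, the zero-occupancy guards, and the threshold values in the usage note are assumptions.

```python
def california2_alarms(occ_up, occ_down, t1, t2, t3):
    """Apply the California 2 rule to aligned upstream/downstream occupancies.

    Returns one boolean per time step: True where the alarm sounds.
    """
    alarms = []
    prev_possible = False
    for up, down in zip(occ_up, occ_down):
        diff = up - down
        # Steps 1-3: absolute difference, then difference relative to the
        # upstream and to the downstream occupancy. The "> 0" guards are an
        # assumption added here to avoid division by zero.
        possible = (
            diff > t1
            and up > 0 and diff / up > t2
            and down > 0 and diff / down > t3
        )
        # Step 4: persistence check over two consecutive time steps.
        alarms.append(possible and prev_possible)
        prev_possible = possible
    return alarms
```

With made-up occupancies `[10, 40, 45, 12]` upstream, `[9, 10, 10, 11]` downstream and hypothetical thresholds `t1=8, t2=0.5, t3=0.5`, the condition first holds at the second step and the alarm sounds at the third.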

7 Evaluation metrics
- AMOC curve
  - Time to detection (TTD) vs. false positive rate (FPR)
  - We do not know exactly when an incident happened
  - Maximal TTD (120 min)
  - Area under the interesting region of the curve
- Performance envelope
  - Detection rate (DR) vs. FPR
  - A random detector gets above the diagonal
- Report ROC as a check
  - Sensitivity vs. specificity
- Low false positive region
  - 1 false alarm/day * 150 sensors
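One point of an AMOC curve (mean TTD against FPR at a fixed threshold) could be computed roughly as follows. This is a sketch under stated assumptions, not the evaluation code behind the talk: incidents are given as hypothetical half-open `(start, end)` index pairs, TTD is measured from the annotated start, and an undetected incident is charged the maximal TTD of 120 minutes. The 5-minute step comes from the data aggregation interval.

```python
STEP_MIN = 5        # data are aggregated over 5 minutes
MAX_TTD_MIN = 120   # maximal time to detection, in minutes

def amoc_point(scores, incidents, threshold):
    """Return (false positive rate, mean time to detection in minutes)."""
    alarms = [s > threshold for s in scores]

    # Mark the time steps that fall inside an annotated incident.
    in_incident = [False] * len(scores)
    for start, end in incidents:
        for t in range(start, end):
            in_incident[t] = True

    # FPR: fraction of non-incident steps on which an alarm was raised.
    negatives = [a for a, inc in zip(alarms, in_incident) if not inc]
    fpr = sum(negatives) / len(negatives) if negatives else 0.0

    # TTD: delay from the annotated start to the first alarm, capped.
    ttds = []
    for start, end in incidents:
        delay = MAX_TTD_MIN
        for t in range(start, end):
            if alarms[t]:
                delay = min((t - start) * STEP_MIN, MAX_TTD_MIN)
                break
        ttds.append(delay)
    mean_ttd = sum(ttds) / len(ttds) if ttds else 0.0
    return fpr, mean_ttd
```

Sweeping `threshold` over the range of detector scores and plotting the resulting pairs traces out the curve.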

8 Features
- Sensor measurements
- Temporal derivatives
- Spatial differences

9 Features  Simple measurements: 3 per sensor, 6 total Occupancy < threshold

10 Temporal features
- Capture abrupt changes
  - Occupancy spike: now minus previous time slice

11 Spatial differences
- "Discontinuities" in flow between sensor positions
  - Difference in speeds: downstream - upstream
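The temporal and spatial features from the two slides above amount to simple differences. A minimal sketch, assuming each sensor reading is a plain list with one value per 5-minute slice; the function names and the choice of 0.0 for the first slice are illustrative assumptions.

```python
def temporal_difference(series):
    """Current value minus the previous time slice.

    The first slice has no predecessor; 0.0 is used there by assumption.
    """
    return [0.0] + [cur - prev for prev, cur in zip(series, series[1:])]

def spatial_difference(downstream, upstream):
    """Downstream minus upstream reading at each time slice."""
    return [d - u for d, u in zip(downstream, upstream)]
```

For example, an occupancy series `[10, 12, 30, 28]` gives temporal differences `[0.0, 2, 18, -2]`, where the jump of 18 is the kind of spike the feature is meant to capture.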

12 Sensor fusion
- There is information in all of the simple detectors
- How to combine their outputs?
- Linear combination: SVM
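To make "linear combination: SVM" concrete, here is a small linear SVM trained by subgradient descent on the regularized hinge loss. It is only a sketch of the general technique: the slides do not say which SVM implementation or settings were used, and the function names, the hyperparameters (`lam`, `lr`, `epochs`), and the toy feature vectors below are all assumptions.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Fit weights w and bias b; labels y must be +1 (incident) or -1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # Gradient step on the L2 regularizer (shrinks the weights).
            w = [wj - lr * lam * wj for wj in w]
            # Hinge loss is active only when the margin is below 1.
            if margin < 1:
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b

def decision_value(w, b, x):
    """Linear combination of the features; larger means more incident-like."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

# Toy data (made up): two scaled features per example.
X = [[0.10, 0.00], [0.20, 0.10], [0.15, 0.05],
     [0.80, 0.60], [0.90, 0.70], [0.70, 0.50]]
y = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(X, y)
```

Thresholding `decision_value` at different levels trades detection rate against false alarms, which is how a single trained model yields a whole curve.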

13 Baseline: California 2
- Hand-calibrated (+ brute force)
- Good performance at a low false alarm rate (FAR), but poor detection rate

14 SVM
- Combines sensor measurements via a linear combination

15 SVM
- Spatial relations: sensor measurements plus ratios and differences from the neighboring sensor

16 SVM
- Temporal derivatives: sensor measurements plus differences and ratios to the previous step

17 Focus on low FAR
- California 2 is better here, owing to its persistency check

18 A dynamic Naïve Bayes network
- Problem: incidents are recorded later than they occur
  - The true state of the highway is unobservable by sensors
- The picture of an incident evolves in time
- About 30 features: 3 readings up/downstream, differences, ratios to the neighboring sensor, previous time point
[Figure: dynamic Naïve Bayes network unrolled over time; each slice has a true hidden state H with children I (incident observed) and observations O1 ... On, such as speed and Occupancy(t-5)]

19 A dynamic Naïve Bayes network
- Evolution of an accident:
  - Normal traffic steady state
  - Accident happens, effects build up
  - Constricted steady state
  - Recovery
- Model has 4 hidden states
  - Anchor hidden states to the desired semantics: clamp p(I|H)
  - Raise an alarm if p(H = acc_state | O) > threshold
- Learned hidden state transition matrix:
    0.9536  0.0332  0.0000  0.0133
    0.0050  0.9577  0.0339  0.0034
    0.0000  0.0882  0.9033  0.0084
    0.0957  0.0000  0.0753  0.8290
[Figure: state diagram over the hidden states H1, H2, H3, H4]
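The alarm rule p(H = acc_state | O) > threshold can be evaluated online with standard forward filtering. The sketch below uses the learned transition matrix from this slide and reads each row as the distribution over the next state (the rows, not the columns, sum to one). Everything else is an assumption: the per-state observation likelihoods are passed in as given numbers, whereas in a Naïve Bayes model they would be products of per-feature terms, and which state index plays the role of `acc_state` is left to the caller.

```python
TRANSITION = [
    [0.9536, 0.0332, 0.0000, 0.0133],
    [0.0050, 0.9577, 0.0339, 0.0034],
    [0.0000, 0.0882, 0.9033, 0.0084],
    [0.0957, 0.0000, 0.0753, 0.8290],
]

def forward_filter(likelihoods, prior, transition=TRANSITION):
    """Return p(H_t | O_1..t) for each time step.

    likelihoods[t][j] is p(O_t | H_t = j); prior is the belief over the
    hidden state just before the first observation.
    """
    belief = list(prior)
    n = len(belief)
    posteriors = []
    for lik in likelihoods:
        # Predict: push the belief through the transition matrix.
        predicted = [sum(belief[i] * transition[i][j] for i in range(n))
                     for j in range(n)]
        # Update: weight by the observation likelihood and renormalize.
        unnorm = [p * l for p, l in zip(predicted, lik)]
        total = sum(unnorm)
        belief = [u / total for u in unnorm]
        posteriors.append(belief)
    return posteriors

def raise_alarms(posteriors, acc_state, threshold):
    """Alarm wherever the posterior of the accident state exceeds threshold."""
    return [p[acc_state] > threshold for p in posteriors]
```

Because the diagonal entries are close to one, a single observation that favors the accident state moves the posterior only modestly; it takes a couple of consistent observations to cross a threshold such as 0.5.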

20 DNB performance
- Poor job at low FAR
- Fairly insensitive to threshold

21 Summary
- Challenges to ML in traffic incident detection
  - Rare class: data sparsity, unequal misclassification cost
  - Incident annotations are noisy
- Machine learning methods are competitive, though
  - SVM outperforms current practice
  - No manual tuning; readapts to data after changes
- Lessons and surprises:
  - Richer feature sets do not help much
  - Neither does removing diurnal trends (?)
  - SVM has very stable performance
  - Dynamic Naïve Bayes is weak

22 Future work
- Discriminate between incidents and benign congestion
- Improve discriminative classification
  - SVM with nonlinearities (?)
  - Unequal misclassification cost models
- Improve dynamical models
  - SVM handles time awkwardly: Dynamic Bayes nets
  - Conditional random fields: discriminative + time
- Improve data
  - Bootstrap: use even a strawman to label incident start, learn from the relabeled data (and iterate)
- Supplemental materials available (AMOC curves that did not fit into the paper)

23 Thank you
- Questions?
- Suggestions?

24 SVM
- California 2 measurements: current and past occupancies

25 DNB Performance

