1 Traffic accident analysis using machine learning paradigms. Miao Chong, Ajith Abraham, Marcin Paprzycki. Informatica 29, p. 89, 2005. Report: Hsin-Chan Tsai
2 Conditions behind traffic accidents –Drivers’ behavior –Roadway condition –Weather condition
3 Some traffic features and methods
Columns: goals and features | methodology | result
1. To compare different traffic fatalities | Direct graphs [22] (1998) | An out-of-sample forecast
2. To identify significant factors that predict the possibility of crashes and injury | Logistic regression model [24] (2001) | Village sites < residential and shopping sites
3. To verify the relationship between accident location and frequency | Relationship between casualty frequency and the distance from the zones of residence [25] (1997) | Poorer residents > richer residents
4. A new regression model | 2 conventional linear regression models and 2 Poisson regression models; HSIS (Highway Safety Information System) [26] (1993) | Adequately describes random, discrete, and sporadic accident events
5. Focus on intersection accidents from 1997 accident data [2] | Compare Multi-layered Perceptron (MLP) and Fuzzy ARTMAP | Accuracy: MLP 60.4~65.6%, Fuzzy ARTMAP 56.1%
4 Injury type –No injury –Possible injury –Disabling injury
5 Some traffic features and methods (2)
Columns: goals and features | methodology | result
6. A frequency-based scheme that transforms categorical codes into numerical values; CARE system | Back-propagation Neural Networks (BP) (1998) | Controlling a single variable could reduce fatalities and injuries by up to 40% (using alcohol data only)
7. Using NN to analyze accidents at intersections | Feed-forward MLP using BP learning [12] (1999); 10 inputs, 8 variables | Car accidents occurred at non-signalized intersections at nighttime
8. Drivers’ behaviors | Nested logic formulation [14] | Not using a restraint system increases accident occurrence
9. Drivers’ behaviors | Log-linear model [8] | Drivers’ alcohol, seat belt…
10. Roadway and drivers’ behavior | FARS system [1] | Light Truck Vehicles: LTV-car, car-car, LTV-LTV
6 Previous methods –Neural networks –Nested logic formulation –Log-linear model –Fuzzy ARTMAP
7 Data sets –From the National Automotive Sampling System (NASS) General Estimates System (GES) datasets –From 6.4 million accident reports in the U.S. –From 1995 to 2000 –A total of 417,670 cases
8 Data sets Drivers’ records only; passengers’ information is not included. Total set dimension:
–Year, month, region, primary sampling unit, the number describing the police jurisdiction, case number, person number, vehicle number, vehicle make and type
–Inputs: drivers’ age, gender, alcohol usage, restraint system, ejection, vehicle body type, vehicle age, vehicle role, initial point of impact, manner of collision, rollover, roadway surface condition, light condition, travel speed, speed limit
–Output: injury severity
9 Data sets The injury severity has five classes –No injury: 70.18% –Possible injury: 16.07% –Non-incapacitating injury: 9.48% –Incapacitating injury: 4.02% –Fatal injury: 0.25%
10 Data preprocessing The 7 collision categories and their share of fatal injuries –Non-collision: 0.56% –Rear-end: 0.08% –Head-on: 1.54% –Rear to rear: 0.00% –Angle: 0.20% –Sideswipe same direction: 0.08% –Sideswipe opposite direction: 0.49% Only the head-on category, which has the highest fatal-injury share, is used.
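The choice of collision category can be checked with a short lookup. This is only a sketch that restates the slide's percentages; the names `fatal_share` and `most_fatal` are illustrative and do not come from the paper.

```python
# Share of fatal injuries per collision category, as listed on the slide (percent).
fatal_share = {
    "Non-collision": 0.56,
    "Rear-end": 0.08,
    "Head-on": 1.54,
    "Rear to rear": 0.00,
    "Angle": 0.20,
    "Sideswipe same direction": 0.08,
    "Sideswipe opposite direction": 0.49,
}

# Pick the category with the highest fatal-injury share.
most_fatal = max(fatal_share, key=fatal_share.get)
print(most_fatal)
```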
11 Head-on collision A total of records –Fatal injury: 160 records –Impact categorized as front
12 Machine Learning Paradigms –Artificial Neural Networks (ANN) using hybrid learning –Decision Trees (DT) –Support Vector Machines (SVM) –Hybrid Decision Tree-ANN (DTANN)
13 A. ANN A Multilayer Perceptron (MLP) –Is a feed-forward neural network with one or more hidden layers –Inputs: nominal data and frequency values –Training data set: $\{x_1(p), d_1(p)\}, \dots, \{x_i(p), d_i(p)\}, \dots, \{x_n(p), d_n(p)\}$ –Weights ($w$) and thresholds ($\theta$) are small random numbers
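A minimal sketch of one way nominal inputs could be turned into frequency values before they reach the MLP. The slide does not give the exact encoding, so the function `frequency_encode` and the sample column are assumptions made for illustration.

```python
from collections import Counter

def frequency_encode(values):
    """Replace each nominal value by its relative frequency in the column."""
    counts = Counter(values)
    total = len(values)
    return [counts[v] / total for v in values]

# Hypothetical column of alcohol-usage codes (not taken from the GES data).
alcohol = ["no", "no", "yes", "no"]
encoded = frequency_encode(alcohol)
print(encoded)  # [0.75, 0.75, 0.25, 0.75]
```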
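14 Network inputs: age, gender, ejection from vehicle, alcohol, warning system, vehicle body structure, vehicle condition, rollover, road information, light level. Network outputs: no injury, injured, conscious, unconscious, fatal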
15 An Artificial Neural Network: training ……
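16 Factors
Drivers’ type: –age, gender, alcohol
Road condition: –interchange, light level, wet road
Car type: –passenger car, bus, heavy truck
Car condition: –striking, struck, rollover, driver ejected from vehicle
Injury type: –no injury, injured, conscious, unconscious, fatal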
17 Neurons calculation –Calculate the actual output of the neurons in the hidden layers –Calculate the actual output of the neurons in the output layer
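18 Neurons calculation $y_i(\text{no injury}) = \text{age (old, middle, young) frequency ratio} \times 0.2 - \Theta_1 + \text{alcohol (yes, no)} \times 1.0 - \Theta_2 + \text{rollover (yes, no)} \times 1.5 - \Theta_3 + \text{light (good, poor)} \times 0.8 - \Theta_4 + \dots$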
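The neuron calculation can be sketched as a forward pass. This assumes the common form of one threshold per neuron and a sigmoid activation, neither of which is stated on the slide; the function names, weights, thresholds, and inputs below are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, theta):
    """Weighted sum of the inputs minus the threshold, passed through a sigmoid."""
    net = sum(x * w for x, w in zip(inputs, weights)) - theta
    return sigmoid(net)

def layer_output(inputs, weight_rows, thetas):
    """Outputs of all neurons in one layer; one weight row and threshold per neuron."""
    return [neuron_output(inputs, w, t) for w, t in zip(weight_rows, thetas)]

# Hypothetical encoded inputs: age frequency ratio, alcohol, rollover, light.
x = [0.5, 1.0, 0.0, 1.0]
hidden = layer_output(x, [[0.2, 1.0, 1.5, 0.8], [0.1, -0.4, 0.3, 0.2]], [0.5, 0.1])
y = layer_output(hidden, [[0.7, -0.3]], [0.2])
print(hidden, y)
```

The hidden layer is computed first and its outputs become the inputs of the output layer, which matches the two steps listed for the neuron calculation.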
19 Backpropagation (BP) Error function Disadvantages –Can get trapped in a local minimum –Detects global features quite well, but after prolonged training it starts to detect individual features
20 Neural networks $E(w)$ = error function to be minimized; $w$ = weight vector; $PT$ = number of training patterns; $l$ = number of output neurons; $d_i(p)$ = desired output of neuron $i$ when pattern $p$ is introduced to the MLP
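The error formula itself did not survive on this slide. A minimal sketch, assuming the usual sum-of-squared-errors form over the $PT$ patterns and $l$ output neurons, with $y_i(p)$ as an assumed symbol for the actual output:

```python
def sum_squared_error(desired, actual):
    """E = sum over patterns p and output neurons i of (d_i(p) - y_i(p))^2."""
    return sum(
        (d - y) ** 2
        for d_row, y_row in zip(desired, actual)
        for d, y in zip(d_row, y_row)
    )

# Two hypothetical patterns (PT = 2) with two output neurons each (l = 2).
d = [[1.0, 0.0], [0.0, 1.0]]
y = [[0.5, 0.5], [0.0, 1.0]]
E = sum_squared_error(d, y)
print(E)  # 0.5
```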
21 Scaled Conjugate Gradient Algorithm Uses second-derivative information of the global error function $E(w_k)$. $E''$: second derivative; $E'$: first derivative; $w_k$: the weights; $p_k$: search direction; $\sigma_k$: change in weight used for the second-derivative approximation; $\lambda_k$: parameter for regulating the indefiniteness of the Hessian
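The slide lists the symbols but not the formula. The sketch below follows Møller's formulation of scaled conjugate gradient, in which the second-derivative information is approximated by a finite difference of the gradient, $s_k = \frac{E'(w_k + \sigma_k p_k) - E'(w_k)}{\sigma_k} + \lambda_k p_k$. Treating this as the formula behind the slide is an assumption, and the function name and test error function are hypothetical.

```python
def second_order_term(grad, w, p, sigma, lam):
    """s_k = (E'(w_k + sigma_k p_k) - E'(w_k)) / sigma_k + lambda_k p_k."""
    shifted = [wi + sigma * pi for wi, pi in zip(w, p)]
    g1, g0 = grad(shifted), grad(w)
    return [(a - b) / sigma + lam * pi for a, b, pi in zip(g1, g0, p)]

# Hypothetical quadratic error E(w) = w0^2 + 2*w1^2, so E'(w) = [2*w0, 4*w1].
def grad(w):
    return [2.0 * w[0], 4.0 * w[1]]

s = second_order_term(grad, w=[1.0, 1.0], p=[1.0, 0.0], sigma=1e-3, lam=0.0)
print(s)
```

For this quadratic the result approximates the Hessian applied to the search direction, which is why no explicit second derivative is needed.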
22 B. Decision Trees An algorithm for classification problems –The Classification and Regression Trees (CART) model –Starting at the root, choose a variable that splits the data into two groups, then split each group recursively
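23 Gini function $\mathrm{gini}(T) = \frac{3}{10}\,\mathrm{gini}(T_1) + \frac{7}{10}\,\mathrm{gini}(T_2) = \frac{3}{10}\left(\frac{1}{3}\,\mathrm{gini}(T_3) + \frac{2}{3}\,\mathrm{gini}(T_4)\right) + \frac{7}{10}\left(\frac{4}{7}\,\mathrm{gini}(T_5) + \frac{3}{7}\,\mathrm{gini}(T_6)\right)$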
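A small sketch of the size-weighted Gini computation, assuming the standard impurity $1 - \sum_j p_j^2$ for a single node. The 10 labelled records are hypothetical and were chosen only so that the split weights are 3/10 and 7/10.

```python
def gini(labels):
    """Gini impurity 1 - sum_j p_j^2 of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def gini_split(groups):
    """Size-weighted Gini impurity of a split into several groups."""
    total = sum(len(g) for g in groups)
    return sum(len(g) / total * gini(g) for g in groups)

# Hypothetical node of 10 records split into groups of 3 and 7.
t1 = ["fatal", "fatal", "fatal"]
t2 = ["none", "none", "none", "none", "fatal", "fatal", "fatal"]
print(gini(t1), gini(t2), gini_split([t1, t2]))
```

A CART-style learner would evaluate `gini_split` for each candidate variable and keep the split with the lowest value.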
24 C. SVM (support vector machine) For a set of $n$ training examples $(x_i, y_i)$, where $x_i \in \mathbb{R}^d$ and $y_i \in \{-1, +1\}$ Hyperplanes –H0: $w \cdot x + b = 0$ –H1: $w \cdot x_i + b = 1$ –H2: $w \cdot x_i + b = -1$
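A minimal sketch of how the three hyperplanes are used once $w$ and $b$ are known: a point is classified by the sign of $w \cdot x + b$, and the distance between H1 and H2 is $2 / \lVert w \rVert$. Training is not shown, and the values of `w` and `b` are hypothetical.

```python
import math

def decision_value(w, b, x):
    """Value of w . x + b; zero on H0, +1 on H1, -1 on H2."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    """Predicted label in {-1, +1} from the side of H0 the point falls on."""
    return 1 if decision_value(w, b, x) >= 0 else -1

def margin_width(w):
    """Distance between H1 and H2, which is 2 / ||w||."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical separating hyperplane in two dimensions.
w, b = [3.0, 4.0], -5.0
print(classify(w, b, [3.0, 4.0]), classify(w, b, [0.0, 0.0]), margin_width(w))
```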
25 Hybrid decision tree-ANN (DTANN) Accident data, Decision Trees, Neural Network ……
26 Decision tree structure (figure: internal nodes and terminal nodes 1 to 7)
27 Performance Analysis
28 A. ANN Number of neurons: 65 neurons give the better accuracy
29 B. Decision tree
30 Non-terminal nodes: 355, terminal nodes: 356. Non-terminal nodes: 486, terminal nodes: 496
31 Non-terminal nodes: 448, terminal nodes: 449. Non-terminal nodes: 290, terminal nodes: 291
32 Non-terminal nodes: 149, terminal nodes: 150
33 C. SVM
34 D. Hybrid DT-ANN Approach
Columns: injury class | best hidden neurons number | training performance | testing performance
No injury | | % | 65.12%
Possible injury | | % | 63.10%
Non-incapacitating injury | | % | 62.24%
Incapacitating injury | | % | 72.63%
Fatal injury | | % | 90.00%
35 < < > > > 89.46
36 Comparison