Published by Eduardo Duarte Godoy. Modified over 6 years ago
1
Traffic flow prediction and minimization of traffic congestion using the Adaboost algorithm with Random Forest as a weak learner. (Hebrew title: Prediction of congestion states and minimization of traffic load using a learning system (the Adaboost algorithm with Random Forest as a weak learner) for predicting and identifying traffic problems in an urban area.) By Guy Leshem, joint work with Prof. Ya’acov Ritov, Department of Statistics, The Hebrew University of Jerusalem, Israel. 11/30/2018
2
Outline: Introduction, Motivation, Methods, Results, Conclusion, Future work
3
Part I Introduction
4
Traffic in urban road networks
Road transportation is a critical link between all other modes of transportation and is essential to their proper functioning. Signalized intersections are a critical element of an urban road transportation system. Traffic congestion in urban areas causes vehicular delays (which increase the total travel time through an urban road network), resulting in reduced speed, reliability and cost-effectiveness of the transportation system, as well as air and noise pollution.
5
Part II Motivation
6
Our Motivation: The first motivation for this research is to check whether heavy traffic in an urban area can be predicted with high accuracy from traffic-related data. The main goal of this research is to plan and build a learning system for traffic flow management and control in an urban area that predicts traffic flow problems, and to use these predictions to minimize traffic congestion.
7
Part III Methods
8
Definition of Machine learning
Machine learning is concerned with the development of algorithms and techniques that allow computers to "learn". Inductive machine learning methods create computer programs by extracting rules and patterns from massive data sets. Every discrete sample comes with a label, and the goal of machine learning is to predict the labels of new samples that the algorithm did not see during the learning process. The learning algorithm receives partial feedback on its performance during learning and must infer which decisions lead to success and which lead to failure.
9
Example of train/test files (medical data)
Train file: columns Patient's name, E.K.G, Blood test, Temperature, …, and Classification by doctor (+1 = healthy, -1 = sick); rows for patients Guy, Yossi, Yaacov, …
Test file: the same columns for patients Amir, Dov, Moshe, …; the Classification by the algorithm column is unknown (?) and must be predicted by the trained classifier.
10
Part III - A System Part
11
Scheme of intersection
12
Data Collection by Magnetic loop detectors
Traffic management and information systems (TMIS) must rely on a system of sensors to estimate traffic parameters in real time. Currently, the dominant technology for this purpose is magnetic loop detectors, which are buried beneath the road at almost every urban intersection and measure the parameters of vehicles passing over them. The table below shows an example of the data set (training and testing sets) collected by the TMIS and used by our system. The last parameter, Y, is the class label (-1 = congestion, 1 = under congestion), which is determined by the Traffic Flow Evaluation model.
13
Data Collection by Real-time computer vision system for measuring traffic parameters
Video monitoring technology is an alternative and/or addition to magnetic loop detectors.
14
Traffic Flow Evaluation model: determination of the class label (level of service, LOS) for the train file (Example 1: under congestion)
15
Traffic Flow Evaluation model: determination of the class label for the train file (Example 2: congestion)
16
Part III - B Algorithmic part
17
Boosting Algorithm - What is Boosting?
A method for improving classifier accuracy. Basic idea: perform an iterative search to locate the regions/examples that are more difficult to predict. Through each iteration, reward accurate predictions on those regions. Combine the rules from each iteration. Boosting only requires that the underlying learning algorithm be better than random guessing.
18
Boosting - Sample space X with M samples: $X_M = \{(x_1,y_1),\dots,(x_M,y_M)\}$
- Distribution D, e.g. uniform $D(i) = 1/M$.
- Labels $Y: X \to \{-1,+1\}$.
- Weak learner with two parameters $\gamma$ and $\lambda$: given $(x_1,y_1),\dots,(x_M,y_M)$, Weak Learn returns a hypothesis h such that $\mathrm{error}_D(h) = \Pr_{x \sim D}[h(x) \neq Y(x)] < \tfrac{1}{2} - \gamma$, and this occurs with probability at least $\lambda$.
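The weak-learning condition can be made concrete with a few lines of code. This is a minimal sketch, assuming 1-D inputs and a hypothetical threshold hypothesis; the data and the names `weighted_error`, `satisfies_weak_condition` and `threshold_stump` are illustrative only. It checks $\mathrm{error}_D(h) < \tfrac12 - \gamma$ for a single hypothesis; the "probability at least $\lambda$" part concerns the learner's randomness and is not modelled here.

```python
from typing import Callable, Sequence

def weighted_error(h: Callable[[float], int], xs: Sequence[float],
                   ys: Sequence[int], D: Sequence[float]) -> float:
    """error_D(h): total weight D(i) of the samples that h misclassifies."""
    return sum(d for x, y, d in zip(xs, ys, D) if h(x) != y)

def satisfies_weak_condition(err: float, gamma: float) -> bool:
    """Weak-learning condition on one hypothesis: error_D(h) < 1/2 - gamma."""
    return err < 0.5 - gamma

def threshold_stump(x: float) -> int:
    """Hypothetical threshold weak hypothesis: +1 above 0.5, otherwise -1."""
    return 1 if x > 0.5 else -1

# Hypothetical toy data with the uniform distribution D(i) = 1/M.
xs = [0.1, 0.4, 0.35, 0.8, 0.9]
ys = [-1, -1, +1, +1, +1]
D = [1 / len(xs)] * len(xs)

err = weighted_error(threshold_stump, xs, ys, D)
print(err, satisfies_weak_condition(err, gamma=0.1))  # prints 0.2 True
```

Only the third sample (0.35, labelled +1) falls on the wrong side of the threshold, so the weighted error is its weight 1/5, which is below 1/2 - 0.1.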
19
Boosting (cont.)
Run Weak Learn on $(x_1,y_1),\dots,(x_M,y_M)$ over $D_1$ → hypothesis $h_1$
Run Weak Learn on $(x_1,y_1),\dots,(x_M,y_M)$ over $D_2$ → hypothesis $h_2$
…
Run Weak Learn on $(x_1,y_1),\dots,(x_M,y_M)$ over $D_k$ → hypothesis $h_k$
Final hypothesis: $h_M = F(h_1, h_2, \dots, h_k)$
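How $D_t$ becomes $D_{t+1}$ between rounds can be sketched as below. This assumes the standard AdaBoost update (hypothesis weight $\alpha_t = \tfrac12\ln\frac{1-\varepsilon_t}{\varepsilon_t}$, and $D_{t+1}(i) \propto D_t(i)\,e^{-\alpha_t y_i h_t(x_i)}$); the slides do not spell out the update, and the toy predictions are hypothetical, chosen so that the numbers match Round 1 of the example on the following slides ($\varepsilon_1 = 0.300$, $\alpha_1 = 0.424$).

```python
import math
from typing import List, Sequence, Tuple

def adaboost_round(D: Sequence[float], preds: Sequence[int],
                   ys: Sequence[int]) -> Tuple[float, float, List[float]]:
    """One boosting round: weighted error, hypothesis weight, next distribution.

    preds[i] = h_t(x_i) and ys[i] = y_i, both in {-1, +1}.
    """
    eps = sum(d for d, p, y in zip(D, preds, ys) if p != y)
    alpha = 0.5 * math.log((1 - eps) / eps)   # assumes 0 < eps < 1/2
    # Misclassified samples (y * h(x) = -1) are scaled up by e^alpha,
    # correctly classified ones are scaled down by e^-alpha.
    unnorm = [d * math.exp(-alpha * y * p) for d, p, y in zip(D, preds, ys)]
    Z = sum(unnorm)                           # normalization constant
    return eps, alpha, [u / Z for u in unnorm]

# Hypothetical round with 10 samples where h_1 misclassifies samples 2, 5 and 7.
ys    = [+1, +1, +1, -1, -1, +1, -1, -1, +1, -1]
preds = [+1, +1, -1, -1, -1, -1, -1, +1, +1, -1]
D1 = [0.1] * 10

eps1, alpha1, D2 = adaboost_round(D1, preds, ys)
print(round(eps1, 3), round(alpha1, 3))  # prints 0.3 0.424
```

A useful sanity check of this update: after reweighting, the misclassified samples carry exactly half of the total weight of $D_2$, so $h_1$ itself would have error 1/2 under $D_2$ and the next weak learner is pushed towards the hard examples.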
20
Example of a Good Classifier
[Figure: toy training set of positive (+) and negative (-) examples]
21
Round 1 of 3: [Figure: weak hypothesis $h_1$ on the toy data, with the reweighted distribution $D_2$] $\varepsilon_1 = 0.300$, $\alpha_1 = 0.424$
22
Round 2 of 3: [Figure: weak hypothesis $h_2$ trained over $D_2$] $\varepsilon_2 = 0.196$, $\alpha_2 = 0.704$
23
Round 3 of 3: [Figure: weak hypothesis $h_3$] $\varepsilon_3 = 0.344$, $\alpha_3 = 0.323$. STOP
24
Final Hypothesis: $H_{final}(x) = \mathrm{sign}\big(0.42\,h_1(x) + 0.70\,h_2(x) + 0.32\,h_3(x)\big)$, where each $h_t(x) \in \{+1, -1\}$.
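The arithmetic behind the three-round example can be reproduced as follows. This is a sketch assuming the standard AdaBoost weight $\alpha_t = \tfrac12\ln\frac{1-\varepsilon_t}{\varepsilon_t}$; it recovers $\alpha_1 \approx 0.424$ and $\alpha_3 \approx 0.323$, while $\alpha_2$ comes out at about 0.706 rather than the 0.704 shown on the slide, which likely reflects rounding of the displayed $\varepsilon_2$. The function names are illustrative.

```python
import math
from typing import Sequence

def hypothesis_weight(eps: float) -> float:
    """AdaBoost weight alpha_t = 1/2 * ln((1 - eps_t) / eps_t)."""
    return 0.5 * math.log((1 - eps) / eps)

def final_hypothesis(alphas: Sequence[float], votes: Sequence[int]) -> int:
    """H_final(x) = sign(sum_t alpha_t * h_t(x)), with each h_t(x) in {-1, +1}."""
    score = sum(a * v for a, v in zip(alphas, votes))
    return 1 if score >= 0 else -1   # a zero score is mapped to +1 (a convention)

# Round errors from the three-round example.
alphas = [hypothesis_weight(e) for e in (0.300, 0.196, 0.344)]
print([round(a, 3) for a in alphas])

# A point where h1 and h3 vote +1 but h2 votes -1: 0.42 - 0.71 + 0.32 > 0.
print(final_hypothesis(alphas, [+1, -1, +1]))  # prints 1
```

Note that $h_2$, the most accurate round, gets the largest weight, yet it can still be outvoted when $h_1$ and $h_3$ agree against it.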
25
AdaBoost on our Example
[Figure panels: Train data; Initialization; Round 1]
26
Accuracy Change per Round
27
Random Forest Model
A random forest is a combination of tree predictors: a large number of trees is generated to create the forest. Each tree depends on the value of a random vector $\Theta_k$; the vectors $\Theta_k$ are generated independently from the same distribution for all trees. What is a random vector? It represents random columns chosen from the training data. Data samples (columns) are usually selected with replacement.
28
How does the Random Forest algorithm work?
Random vectors are chosen from the train file, and any decision tree algorithm is used to build a classification tree from each of them. A large number of trees is generated to create a random forest. Each sample passes through the random forest and gets a classification from every tree (e.g., a tree may classify 10 as 1, -10 as 1 and -1 as -1). Each new sample therefore receives a forest of results, because each tree casts 1 vote; the class at X is predicted by selecting the class with the maximum number of votes.
[Figure: a random column and the label column, classified by a tree with split thresholds < 0 and ≤ 3]
29
Random Forest Pseudo Code
Initially select the number of trees to be generated, e.g. K.
At step k ($1 \le k \le K$):
   a vector $\Theta_k$ is generated, representing the samples (data selected for creating the tree);
   construct tree $h(x, \Theta_k)$ using any decision tree algorithm.
Each tree casts 1 vote for the most popular class at X.
The class at X is predicted by selecting the class with the maximum number of votes.
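The pseudo code above can be turned into a small, dependency-free sketch. To keep it short, each "tree" here is a depth-1 tree (a decision stump) rather than a full decision tree, which is a simplification and not what the slides prescribe. Each step draws a bootstrap sample of rows with replacement plus a random subset of columns (together playing the role of $\Theta_k$), and the forest predicts by majority vote. The names `fit_stump` and `random_forest` are hypothetical.

```python
import random
from collections import Counter

def fit_stump(X, y, features):
    """Depth-1 tree: the threshold split with the fewest errors over the given columns."""
    best = None
    for f in features:
        vals = sorted({row[f] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values.
        thrs = [(a + b) / 2 for a, b in zip(vals, vals[1:])] or vals[:1]
        for thr in thrs:
            for sign in (+1, -1):
                err = sum((sign if row[f] > thr else -sign) != t
                          for row, t in zip(X, y))
                if best is None or err < best[0]:
                    best = (err, f, thr, sign)
    _, f, thr, sign = best
    return lambda row: sign if row[f] > thr else -sign

def random_forest(X, y, K=25, n_features=1, seed=0):
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    trees = []
    for _ in range(K):                                   # step k = 1..K
        idx = [rng.randrange(n) for _ in range(n)]       # bootstrap rows, with replacement
        cols = rng.sample(range(d), n_features)          # random subset of columns
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], cols))

    def predict(row):
        votes = Counter(tree(row) for tree in trees)     # each tree casts 1 vote
        return votes.most_common(1)[0][0]                # class with the maximum votes
    return predict

# Hypothetical toy data: two columns, labels in {-1, +1}.
X = [(0.0, 0.2), (0.1, 0.0), (0.2, 0.3), (0.3, 0.1),
     (0.7, 0.9), (0.8, 0.7), (0.9, 1.0), (1.0, 0.8)]
y = [-1, -1, -1, -1, +1, +1, +1, +1]
predict = random_forest(X, y)
print([predict(row) for row in X])
```

An odd K avoids ties in the binary vote; in practice one would use full decision trees (for example from a library such as scikit-learn) instead of stumps.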
30
A New Approach: the Adaboost-Random Forests algorithm
Two approaches are considered. The first is "boost in forest", where an Adaboost classifier is built for each random vector (i.e., a collection of variables), yielding a sequence of "simple" Adaboost classifiers, each with a small number of variables. The second approach is to use Random Forests as weak learners.
31
Adaboost with Random Forest as Weak Learner Pseudo Code
Input: N examples $S_N = \{(x_1,y_1),\dots,(x_N,y_N)\}$ and a weak base learner $h = h(D, S)$.
Initialize: equal example weights $w_1(n) = 1/N$ for all $n = 1..N$.
Iterate for $l = 1..L$:
1. Compute normalized weights $P_l(i) = w_l(i) / \sum_j w_l(j)$.
2. Weak learner: $h_l := \mathrm{Learn}(S; P_l)$, which calls Random Forest:
   Initially select the number K of trees to be generated.
   For k = 1 to K: a vector $\Theta_k$ is generated from the chosen columns (e.g. columns 2, 4, 5), and a tree $h(x, \Theta_k)$ is constructed using any decision tree algorithm.
   End For
   Each tree casts 1 vote, multiplied by $P_l$, for the most popular class at X; the class at X is predicted by selecting the class with the maximum votes.
   Return a hypothesis $h_l$.
3. Compute the hypothesis error $\varepsilon_l$.
4. Compute the hypothesis weight $\beta_l$.
5. Update the example weights for the next iteration, $w_{l+1}(i)$.
Output: the final hypothesis as a linear combination of the $h_l$.
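A runnable sketch of this loop follows, under several stated assumptions. First, $P_l$ is passed to the forest through a weighted bootstrap (rows are drawn with probability $P_l(i)$ via `random.Random.choices`), which is one common way to feed boosting weights to a bagged learner; the slides instead describe multiplying each tree's vote by $P_l$, which is not reproduced here. Second, it uses the $\alpha_l = \tfrac12\ln\frac{1-\varepsilon_l}{\varepsilon_l}$ form of the hypothesis weight; in the AdaBoost.M1 formulation $\beta_l = \varepsilon_l/(1-\varepsilon_l)$, and $\ln(1/\beta_l) = 2\alpha_l$, so the final vote is the same up to a constant factor. Third, trees are depth-1 stumps for brevity. All function names are hypothetical.

```python
import math
import random
from collections import Counter

def fit_stump(X, y, features):
    """Depth-1 tree: best midpoint threshold over the given columns."""
    best = None
    for f in features:
        vals = sorted({row[f] for row in X})
        thrs = [(a + b) / 2 for a, b in zip(vals, vals[1:])] or vals[:1]
        for thr in thrs:
            for sign in (+1, -1):
                err = sum((sign if row[f] > thr else -sign) != t for row, t in zip(X, y))
                if best is None or err < best[0]:
                    best = (err, f, thr, sign)
    _, f, thr, sign = best
    return lambda row: sign if row[f] > thr else -sign

def forest_weak_learner(X, y, P, rng, K=15, n_features=1):
    """Random Forest as weak learner h_l = Learn(S; P_l), via a P_l-weighted bootstrap."""
    n, d = len(X), len(X[0])
    trees = []
    for _ in range(K):
        idx = rng.choices(range(n), weights=P, k=n)      # rows drawn with prob. P_l(i)
        cols = rng.sample(range(d), n_features)          # random subset of columns
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], cols))
    return lambda row: Counter(t(row) for t in trees).most_common(1)[0][0]

def adaboost_rf(X, y, L=5, seed=0):
    rng = random.Random(seed)
    n = len(X)
    w = [1.0 / n] * n                                    # w_1(i) = 1/N
    ensemble = []                                        # pairs (alpha_l, h_l)
    for _ in range(L):
        total = sum(w)
        P = [wi / total for wi in w]                     # 1. normalized weights
        h = forest_weak_learner(X, y, P, rng)            # 2. weak learner
        eps = sum(p for p, row, t in zip(P, X, y) if h(row) != t)  # 3. error
        if eps >= 0.5:                                   # no better than guessing: stop
            break
        eps = max(eps, 1e-10)                            # avoid division by zero for a perfect h
        alpha = 0.5 * math.log((1 - eps) / eps)          # 4. hypothesis weight
        ensemble.append((alpha, h))
        w = [wi * math.exp(-alpha * t * h(row))          # 5. update example weights
             for wi, row, t in zip(w, X, y)]

    def H(row):                                          # linear combination of the h_l
        score = sum(a * h_l(row) for a, h_l in ensemble)
        return 1 if score >= 0 else -1
    return H

# Hypothetical toy data: two columns, labels in {-1, +1}.
X = [(0.0, 0.2), (0.1, 0.0), (0.2, 0.3), (0.3, 0.1),
     (0.7, 0.9), (0.8, 0.7), (0.9, 1.0), (1.0, 0.8)]
y = [-1, -1, -1, -1, +1, +1, +1, +1]
H = adaboost_rf(X, y)
print([H(row) for row in X])
```

On this easy toy set each forest is already perfect, so the boosting weights stay uniform; the reweighting only matters on data where single forests make mistakes.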
32
Prediction of traffic congestion using machine learning
The algorithm is trained on the train file, where X holds the traffic data samples: an m × n matrix produced from a specific date d and a specific time t, with m class labels produced from the same date d but at a later time t + Δt. The test samples, produced from a later date d* at the same time t, are then classified by the trained classifier.
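The time-shifted construction of the train and test files can be sketched as follows. This assumes a hypothetical storage layout in which feature matrices and label vectors are kept in dictionaries keyed by (date, "HH:MM"); the names `shift` and `make_train_test` and the example dates are illustrative, not taken from the system. The shift does not handle intervals that cross midnight.

```python
from datetime import date, datetime, timedelta
from typing import Dict, List, Tuple

Key = Tuple[date, str]   # (date, "HH:MM"), a hypothetical indexing scheme

def shift(time_str: str, delta: timedelta) -> str:
    """Return the clock time time_str + delta (no wrap-around past midnight)."""
    t = datetime.strptime(time_str, "%H:%M") + delta
    return t.strftime("%H:%M")

def make_train_test(features: Dict[Key, List[List[float]]],
                    labels: Dict[Key, List[int]],
                    d: date, d_star: date, t: str, delta: timedelta):
    """Train: X from (d, t), labels from (d, t + delta). Test: X from (d*, t)."""
    X_train = features[(d, t)]                  # m x n matrix at time t
    y_train = labels[(d, shift(t, delta))]      # m labels at the later time t + delta
    if len(X_train) != len(y_train):
        raise ValueError("one label is needed per sample")
    X_test = features[(d_star, t)]              # later date d*, same time t
    return X_train, y_train, X_test

# Hypothetical example: features at 07:00 predict the labels at 07:30.
d, d_star = date(2018, 1, 1), date(2018, 1, 8)
features = {(d, "07:00"): [[1.0, 2.0], [3.0, 4.0]],
            (d_star, "07:00"): [[5.0, 6.0], [7.0, 8.0]]}
labels = {(d, "07:30"): [1, -1]}
X_tr, y_tr, X_te = make_train_test(features, labels, d, d_star, "07:00",
                                   timedelta(minutes=30))
print(y_tr, X_te)
```

The key design point is that the label comes from a later time than the features, so the trained classifier forecasts congestion Δt ahead rather than describing the present.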
33
Traffic Signal Timing Optimization
There are a number of models and software packages for traffic signal timing optimization (e.g. PASSER). Based on the existing data and such optimization software, we can plan many traffic light programs, one for every possible congestion scenario. The program suited to a given prediction can then be selected, and the current traffic light program replaced by this optimized program before the congestion occurs.
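The plan-selection step can be sketched as a lookup into a library of precomputed programs. This is a minimal sketch assuming that a congestion scenario is represented as the set of lanes predicted to be congested (label -1, following the Y convention of the data set) and that plans were computed offline by optimization software; the lane ids, plan names and function names are hypothetical.

```python
from typing import Dict, FrozenSet, Sequence

Scenario = FrozenSet[str]

def congestion_scenario(lane_ids: Sequence[str], predicted: Sequence[int]) -> Scenario:
    """Set of lanes whose predicted label is -1 (congestion)."""
    return frozenset(lane for lane, label in zip(lane_ids, predicted) if label == -1)

def select_plan(plans: Dict[Scenario, str], scenario: Scenario, default: str) -> str:
    """Precomputed program for this scenario, or the default program if none exists."""
    return plans.get(scenario, default)

# Hypothetical plan library, e.g. produced offline by a signal-timing optimizer.
plans = {frozenset(): "plan_base",
         frozenset({"A1", "B1"}): "plan_A1_B1"}

scenario = congestion_scenario(["A1", "A2", "B1"], [-1, 1, -1])
print(select_plan(plans, scenario, default="plan_base"))  # prints plan_A1_B1
```

Keying the library by an immutable frozenset makes the lookup independent of lane order; scenarios without a planned program fall back to the default.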
34
Microscopic traffic simulator
In order to check the quality of the traffic prediction and of the traffic signal timing optimization, we also used a microscopic traffic simulator. We found an error rate of approximately 7%. In comparison, the naive predictor (which estimates the future class label by its current value) gives an error rate of 16%.
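The comparison with the naive predictor amounts to computing two error rates against the labels observed at t + Δt. The sketch below uses small hypothetical label vectors (not the simulator data behind the 7% and 16% figures) to show the computation; the naive "persistence" predictor simply copies the current label forward.

```python
from typing import List, Sequence

def error_rate(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Fraction of samples whose predicted label differs from the true label."""
    return sum(p != t for p, t in zip(pred, truth)) / len(truth)

def naive_persistence(current: Sequence[int]) -> List[int]:
    """Naive predictor: the label at t + delta equals the current label at t."""
    return list(current)

# Hypothetical labels (-1 = congestion, 1 = under congestion).
current = [1, 1, -1, 1, 1, 1, -1, 1, 1, 1]    # observed at time t
future  = [1, -1, -1, 1, 1, -1, -1, 1, 1, 1]  # observed at time t + delta
model   = [1, -1, -1, 1, 1, 1, -1, 1, 1, 1]   # classifier's forecast for t + delta

print(error_rate(naive_persistence(current), future))  # prints 0.2
print(error_rate(model, future))                       # prints 0.1
```

The persistence baseline misses every change of state between t and t + Δt, which is exactly where a predictive model can add value.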
35
Part IV Results
36
Experiment 1: Error curve of Adaboost with threshold, and with Random Forest as weak learners
Error curves for the Adaboost algorithm with the classical weak learner (threshold) and with Random Forest as weak learner, on the same dataset (satimage).
37
Experiment 2: Error curve of Adaboost with Random Forest as weak learners, with or without missing data
Error curves for the Adaboost algorithm with Random Forest as weak learners, with and without missing data, on the same dataset (satimage).
38
Experiment 3: prediction results when one week predicts one week later
First prediction results on the Jerusalem data set: one week predicts one week later
39
Experiment 4: prediction results when one week predicts one day later
First prediction results on the Jerusalem data set: one week predicts one day later
40
Experiment 5: Using Traffic Signal Timing Optimization (preliminary test)
Number of lanes with a congestion situation along Yermiyahu road: 4. Number of lanes with a congestion situation along Yermiyahu road with the optimization program: 1.
41
Part V Conclusions
42
Advantages of the new approach
The final classifier of this approach (the Adaboost algorithm with Random Forest as weak learner) is very accurate, with few misclassifications and strong prediction ability (it excels in accuracy among current algorithms). It handles missing data effectively and maintains accuracy when a large proportion of the data is missing. It handles multi-class classification problems effectively. It also handles unbalanced data effectively, for instance when one label (e.g. the congestion label) is rare in the train file (e.g. 10%) but much more frequent in the test file (e.g. 35%).
43
Conclusions from the prediction results
This preliminary test showed very good results: the methodology used to predict traffic congestion for t + Δt = 07:30 gave an error rate of approximately 2% in the first experiment ("one week predicts the following week") and approximately 3% in the second experiment ("one week predicts a following day"). Being able to predict traffic congestion half an hour before it happens opens the way to future traffic flow policies.
44
Part VI Future Work
45
Future traffic flow policies
System methodology:
1. Data set collection by traffic management and information systems, and preprocessing of the traffic data (producing train and test files).
2. Prediction of traffic flow congestion by machine learning.
3. Dynamic change of the traffic light program according to the traffic light optimization obtained for the given predictions.
[Figure: future traffic flow policies, with the stages real-time data collection; prediction of traffic flow by machine learning; dynamic change of traffic lights program; reducing average time delay]
46
The suggested system
[Figure: components of the suggested system: data set collection by traffic management and information systems; prediction of traffic flow congestion by machine learning; optimization of traffic lights according to the given predictions]
47
That is all, folks… Thank you for your patience!
Guy Leshem