
1 Traffic flow prediction and minimization of traffic congestion using the AdaBoost algorithm with Random Forest as a weak learner
[Hebrew subtitle: Prediction of congestion states and minimization of traffic load by means of a learning system (the AdaBoost algorithm with Random Forest as a weak learner), used for the prediction and identification of traffic problems in an urban area]
by Guy Leshem, joint work with Prof. Ya'acov Ritov, Department of Statistics, The Hebrew University of Jerusalem, Israel

2 Outline: Introduction, Motivation, Methods, Results, Conclusion, Future work

3 Part I Introduction

4 Traffic in urban road network
Road transportation is a critical link between all the other modes of transportation and their proper functioning. Signalized intersections are a critical element of an urban road transportation system. Traffic congestion in urban areas causes vehicular delays (which increase the total travel time through an urban road network), reducing the speed, reliability, and cost-effectiveness of the transportation system and increasing air and noise pollution.

5 Part II Motivation

6 Our Motivation The first motivation for this research is to check whether, from traffic-related data, we can predict heavy traffic with high accuracy in an urban area. The main goal of this research is to design and build a learning system for traffic flow management and control of an urban area that predicts traffic flow problems, and to use these predictions to minimize traffic congestion.

7 Part III Methods

8 Definition of Machine learning
Machine learning is concerned with the development of algorithms and techniques that allow computers to "learn". Inductive machine learning methods create computer programs by extracting rules and patterns from massive data sets. Every discrete sample comes with a label, and the goal of machine learning is to predict the labels of new samples that the algorithm did not see during the learning process. The learning algorithm receives partial feedback on its performance during the learning process, and it must conclude which decisions lead to success and which lead to failure.

9 Example of train / test files (medical data)
Train file:
Patient name | E.K.G | Blood test | Temperature | ... | Classified by doctor
Guy          |  ...  |    ...     |     ...     | ... | +1 (healthy)
Yossi        |  ...  |    ...     |     ...     | ... | -1 (sick)
Yaacov       |  ...  |    ...     |     ...     | ... | -1

Test file:
Patient name | E.K.G | Blood test | Temperature | ... | Classified by the algorithm
Amir         |  ...  |    ...     |     ...     | ... | ?
Dov          |  ...  |    ...     |     ...     | ... | ?
Moshe        |  ...  |    ...     |     ...     | ... | ?
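A minimal sketch of this setup in Python with scikit-learn; the feature values below are invented, since the slide leaves them blank:

```python
from sklearn.tree import DecisionTreeClassifier

# Train file: one row per patient -> [E.K.G, blood test, temperature]
X_train = [[0.8, 1.2, 36.9],   # Guy
           [0.3, 2.9, 38.4],   # Yossi
           [0.4, 3.1, 39.0]]   # Yaacov
y_train = [1, -1, -1]          # labels assigned by the doctor

# Test file: same features, labels unknown ("?")
X_test = [[0.7, 1.4, 37.0],    # Amir
          [0.2, 3.0, 38.8],    # Dov
          [0.9, 1.1, 36.6]]    # Moshe

clf = DecisionTreeClassifier().fit(X_train, y_train)
print(clf.predict(X_test))     # fills in the "classified by the algorithm" column
```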

10 Part III - A System Part

11 Scheme of intersection

12 Data Collection by Magnetic loop detectors
Traffic management and information systems (TMIS) must rely on a system of sensors for estimating traffic parameters in real time. Currently, the dominant technology for this purpose is the magnetic loop detector, buried beneath the pavement at almost every urban intersection, which measures the traffic parameters of vehicles passing over it. The slide shows an example of the data set (training and testing set) collected by TMIS and used by our system. The last parameter, Y, is the hypothesis class (-1 = congestion, 1 = under congestion), determined by the Traffic Flow Evaluation model.
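Since the slide's table is not reproduced in the transcript, here is a hypothetical sketch of its layout; the column names and values are our assumptions, apart from the label Y defined above:

```python
import pandas as pd

# Hypothetical layout of the TMIS training set; real column names unknown.
data = pd.DataFrame({
    "date":      ["2006-05-01"] * 3,
    "time":      ["07:00", "07:15", "07:30"],
    "volume":    [420, 510, 640],     # vehicles counted by the loop detector
    "occupancy": [0.12, 0.19, 0.31],  # fraction of time the loop is occupied
    "Y":         [1, 1, -1],          # -1 = congestion, 1 = under congestion
})
X, y = data[["volume", "occupancy"]], data["Y"]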

13 Data Collection by Real-time computer vision system for measuring traffic parameters
Video monitoring systems are an alternative and/or a complement to magnetic loop detectors.

14 Traffic Flow Evaluation model: determination of the hypothesis class (LOS) for the train file (Example 1 – under congestion)

15 Traffic Flow Evaluation model: determination of the hypothesis class for the train file (Example 2 – congestion)

16 Part III - B Algorithmic part

17 Boosting Algorithm - What is Boosting?
A method for improving classifier accuracy. Basic idea: perform an iterative search to locate the regions/examples that are more difficult to predict; through each iteration, reward accurate predictions on those regions; combine the rules from each iteration. Boosting only requires that the underlying learning algorithm be better than guessing.

18 Boosting
- Sample space X with M samples: X_M = {(x_1, y_1), …, (x_M, y_M)}
- Distribution D, e.g. D = 1/M
- Labels Y, Y : X → {-1, +1}
- Weak Learner with two parameters γ, δ: given (x_1, y_1), (x_2, y_2), …, (x_M, y_M), the Weak Learner outputs a hypothesis h with error_D(h) = P_{x~D}(h(x) ≠ Y(x)) < 1/2 - γ, and this holds with probability at least δ

19 Boosting (cont.)
(x_1, y_1), (x_2, y_2), …, (x_M, y_M) → Weak Learner → hypothesis h_1 over D_1
(x_1, y_1), (x_2, y_2), …, (x_M, y_M) → Weak Learner → hypothesis h_2 over D_2
…
(x_1, y_1), (x_2, y_2), …, (x_M, y_M) → Weak Learner → hypothesis h_k over D_k
Final hypothesis: h_M = F(h_1, h_2, …, h_k)
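A minimal sketch of this loop in Python (our illustration, not the authors' code), using decision stumps from scikit-learn as the weak learner and labels in {-1, +1}:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost(X, y, rounds=3):
    """y must hold labels in {-1, +1}."""
    X, y = np.asarray(X), np.asarray(y)
    D = np.full(len(y), 1.0 / len(y))            # initial distribution, D = 1/M
    hs, alphas = [], []
    for _ in range(rounds):
        h = DecisionTreeClassifier(max_depth=1)  # weak learner (a stump)
        h.fit(X, y, sample_weight=D)
        eps = D[h.predict(X) != y].sum()         # weighted error, < 1/2 - gamma
        alpha = 0.5 * np.log((1.0 - eps) / eps)
        D *= np.exp(-alpha * y * h.predict(X))   # up-weight the mistakes
        D /= D.sum()
        hs.append(h); alphas.append(alpha)
    # final hypothesis: sign of the alpha-weighted vote of the weak hypotheses
    return lambda Xn: np.sign(sum(a * h.predict(np.asarray(Xn))
                                  for a, h in zip(alphas, hs)))
```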

20 Example of a Good Classifier
[Figure: labeled points (+/-) in the plane, separated by a good classifier]

21 Round 1 of 3: weak hypothesis h1 with error ε1 = 0.300 and weight α1 = 0.424. [Figure: the labeled points with h1's decision boundary, reweighted under D2]

22 Round 2 of 3: weak hypothesis h2 with error ε2 = 0.196 and weight α2 = 0.704. [Figure: the reweighted points with h2's decision boundary]

23 - - - + + + Round 3 of 3 STOP - + - + O h3 e3 = 0.344 a2=0.323
11/30/2018

24 Final Hypothesis
H_final = sign[ 0.42·h1(x) + 0.70·h2(x) + 0.32·h3(x) ], where each hi(x) ∈ {+1, -1}
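The round weights on the previous slides follow the standard AdaBoost rule; plugging in the slide's errors reproduces them:

```latex
\alpha_t = \tfrac{1}{2}\ln\frac{1-\varepsilon_t}{\varepsilon_t}, \qquad
\alpha_1 = \tfrac{1}{2}\ln\frac{0.700}{0.300} \approx 0.424, \quad
\alpha_2 = \tfrac{1}{2}\ln\frac{0.804}{0.196} \approx 0.704, \quad
\alpha_3 = \tfrac{1}{2}\ln\frac{0.656}{0.344} \approx 0.323.
```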

25 AdaBoost on our Example
[Table: the training data with the round-1 initialization of the example weights]

26 Accuracy Change per Round

27 Random Forest Model
A combination of tree predictors: a large number of trees is generated to create a random forest. Each tree depends on the value of a random vector Θ, which is generated independently, using an identical distribution, for all trees.
What is a random vector? It represents random columns chosen from the training data. Data samples (columns) are usually selected with replacement.

28 How does the Random Forest algorithm work?
Random vectors are chosen from the train file. Any decision-tree algorithm is used to build a classifier tree. A large number of trees is generated to create a random forest. Each variable passes through the random forest and gets a classification (e.g., 10 will be classified as 1, -10 as 1, and -1 as -1). Each new variable gets a forest of results: each tree casts one vote, and the class at X is predicted by selecting the class with the maximum votes.
[Table: a toy random column with its label column and the resulting split rule]

29 Random Forest Pseudo Code
Initially select the number of trees to be generated, K.
At step k (1 ≤ k ≤ K):
- A vector Θ_k is generated, representing the samples (the data selected for creating the tree).
- Construct tree h(x, Θ_k) using any decision-tree algorithm.
Each tree casts one vote for the most popular class at X; the class at X is predicted by selecting the class with the maximum votes.
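As an illustration (our sketch, not the authors' implementation), the pseudo code can be rendered in Python with scikit-learn decision trees, bootstrap rows, and a random column subset playing the role of Θ_k; labels are assumed to be in {-1, +1}:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def random_forest_predict(X, y, X_new, K=25, n_cols=2, seed=0):
    """Each of the K trees sees a bootstrap sample restricted to a random
    subset of n_cols columns (the vector Theta_k); labels in {-1, +1}."""
    X, y, X_new = np.asarray(X), np.asarray(y), np.asarray(X_new)
    rng = np.random.default_rng(seed)
    votes = np.zeros(len(X_new))
    for _ in range(K):
        rows = rng.integers(0, len(y), len(y))                # with replacement
        cols = rng.choice(X.shape[1], n_cols, replace=False)  # random columns
        tree = DecisionTreeClassifier().fit(X[rows][:, cols], y[rows])
        votes += tree.predict(X_new[:, cols])                 # each tree casts 1 vote
    return np.sign(votes)   # the class with the maximum votes at each point
```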

30 A New Approach: the Adaboost-Random Forests algorithm
We combine the two algorithms in two ways. The first is "boost in forest": an Adaboost classifier is built for each random vector (i.e., each collection of variables), which yields sequences of "simple" Adaboost classifiers, each with a small number of variables. The second approach is to use the Random Forest algorithm as the weak learner.

31 Adaboost with Random Forest as Weak Learner: Pseudo Code
Input: N examples S_N = {(x_1, y_1), …, (x_N, y_N)}; a weak base learner h = h(D, S).
Initialize: equal example weights w_1(i) = 1/N for all i = 1..N.
Iterate for l = 1..L:
1. Compute the normalized weights P_l(i) = w_l(i) / Σ_i w_l(i).
2. Weak learner: h_l := Learn(S; P_l) — call Random Forest:
   - Initially select the number K of trees to be generated.
   - For k = 1 to K: a vector Θ_k is generated from the chosen columns (e.g., columns 2, 4, 5); construct tree h(x, Θ_k) using any decision-tree algorithm.
   - Each tree casts one vote, multiplied by P_l, for the most popular class at X; the class at X is predicted by selecting the class with the maximum votes.
   - Return the hypothesis h_l.
3. Compute the hypothesis error ε_l.
4. Compute the hypothesis weight β_l.
5. Update the example weights for the next iteration: w_{l+1}(i).
Output: the final hypothesis, a linear combination of the h_l.
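For a concrete (hypothetical) realization, scikit-learn can chain the two algorithms directly; X_train, y_train, X_test stand for the train/test arrays described earlier, and in scikit-learn versions before 1.2 the keyword is base_estimator rather than estimator:

```python
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier

# A small Random Forest (few trees, few features per split) as the weak learner.
weak = RandomForestClassifier(n_estimators=10, max_features=3)
clf = AdaBoostClassifier(estimator=weak, n_estimators=50)

clf.fit(X_train, y_train)        # the train file (traffic samples + labels)
y_pred = clf.predict(X_test)     # the test file, classified by the trained model
```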

32 Prediction of traffic congestion using machine learning
The algorithm is trained on the train file: X holds samples of traffic data, in matrices of size m × n, produced at a specific date d and a specific time t, with class labels m* produced from the same date d but a later time t + Δt. The test samples, produced at a later date d* and the same time t, are then classified by the trained classifier.
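A small sketch of this train-file construction, assuming the hypothetical detector DataFrame from the earlier example, sampled every 15 minutes:

```python
import pandas as pd

def make_train_file(readings: pd.DataFrame, steps: int = 2):
    """readings: detector samples in time order (every 15 minutes, so
    steps=2 gives dt = 30 minutes); column names are the hypothetical
    ones from the earlier sketch."""
    X = readings[["volume", "occupancy"]]   # features at time t
    y = readings["Y"].shift(-steps)         # class label observed at t + dt
    keep = y.notna()                        # drop rows with no future label
    return X[keep], y[keep]
```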

33 Traffic Signal Timing Optimization
There are a number of models / software packages for traffic signal timing optimization (e.g., PASSER). Based on the existing data, such software lets us plan a stop-light program for every possible congestion scenario. The program suited to a given prediction can then be found, and the current stop-light program is replaced with this optimized program before the congestion occurs.
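Schematically (the plan names here are hypothetical), the swap reduces to a lookup from the predicted class to a pre-computed timing program:

```python
# Plan names are hypothetical; the optimized programs themselves would be
# produced offline by timing-optimization software such as PASSER.
TIMING_PLANS = {
    "under_congestion": "plan_A",   # the default stop-light program
    "congestion":       "plan_B",   # program optimized for the congested scenario
}

def select_plan(predicted_label: int) -> str:
    # -1 = congestion, 1 = under congestion (the hypothesis classes above)
    scenario = "congestion" if predicted_label == -1 else "under_congestion"
    return TIMING_PLANS[scenario]
```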

34 Microscopic traffic simulator
To check the quality of the traffic prediction and of the traffic signal timing optimization, we also used a microscopic traffic simulator. We found an error rate of approximately 7%. In comparison, the naive predictor (which estimates the future class label by its current value) gives an error rate of 16%.

35 Part IV Results

36 Experiment 1: Error curve of Adaboost with threshold, and with Random Forest as weak learners
Error curve for the Adaboost algorithm with the classical weak learner (threshold) and with Random Forest as the weak learner, on the same dataset (satimage).

37 Experiment 2: Error curve of Adaboost with Random Forest as weak learner, with or without missing data
Error curve for the Adaboost algorithm with Random Forest as the weak learner, with / without missing data, on the same dataset (satimage).

38 Experiment 3: prediction results when one week predicts one week later
First prediction results on the Jerusalem data set: one week predicts one week later.

39 Experiment 4: prediction results when one week predicts one day later
First prediction results on the Jerusalem data set: one week predicts one day later.

40 Experiment 5: Using Traffic Signal Timing Optimization (preliminary test)
Number of lanes in a congestion situation along Yermiyahu Road: 4. Number of lanes in a congestion situation along Yermiyahu Road with the optimization program: 1.

41 Part V Conclusions

42 Advantages of the new approach
The final classifier of this approach (the Adaboost algorithm with Random Forest as the weak learner) is very accurate, with few misclassifications and strong prediction ability (it excels in accuracy among current algorithms). It has an effective method for dealing with missing data and maintains accuracy when a large proportion of the data is missing. It is effective on multi-class classification problems. It is effective on unbalanced data (for instance, when one label, e.g. "-1" = congestion, is very rare in the train file, say 10%, but much more frequent in the test file, say 35%).

43 Conclusions from the prediction results
This preliminary test showed very good results: the methodology used to predict traffic congestion for t + Δt = 07:30 gave an error rate of approximately 2% in the first experiment ("one week predicts the following week") and approximately 3% in the second experiment ("one week predicts the following day"). The possibility of predicting traffic congestion half an hour before it happens makes it possible to discuss future traffic flow policies.

44 Part VI Future Work

45 Future traffic flow policies
System methodology: data set collection by traffic management and information systems, and preprocessing of the traffic data (producing the train and test files); prediction of traffic flow congestion by the machine learning system; dynamic change of the traffic-light program according to the traffic-light optimization obtained for the given predictions.
[Diagram: real-time data collection → prediction of traffic flow by machine learning → dynamic change of the traffic-light program → reduction of the average time delay]

46 The suggested system
[Diagram: data set collection by traffic management and information systems → prediction of traffic flow congestion by machine learning → optimization of traffic lights according to the given predictions]

47 That is all, folks… Thank you for your patience!
Guy Leshem

