TRAFFIC PREDICTION THROUGH DATA MINING IN LAS VEGAS
Benjamin Pecheux

1 TRAFFIC PREDICTION THROUGH DATA MINING IN LAS VEGAS
Benjamin Pecheux, bpecheux@vizuri.com

2 Overview
- Motivation
- RTCSNV - FAST
- Approach
- Data and Infrastructure
- Data and Infrastructure Assessment
- Visualizations
- Prediction Test

3 Motivation
- Most traffic monitoring and management systems display only the current traffic status
- Existing traffic prediction uses simple trending algorithms or complex simulation-in-the-loop models
- Traffic Management Centers (TMCs) want to be proactive rather than reactive, to improve traffic conditions and reduce delays

4 FAST
- Freeway and Arterial System of Transportation (FAST) in Las Vegas, NV

5 FAST
- FAST Dashboard: http://bugatti.nvfast.org/

6 Approach

7 Large-scale data analysis using RTCFAST data
- Reviewed the FAST data collection architecture
- Obtained FAST historical traffic data
- Evaluated the data produced by FAST against the data collection architecture
- Assessed the potential of using the collected data for traffic prediction
- Developed a test prediction using the best data available

8 Tools
- Cloud infrastructure: Google Cloud
  - Relational database: Google Cloud SQL
  - Cloud instances: Google Compute
  - NoSQL database: Google BigQuery
  - Storage: Google Cloud Storage
- Statistics and visualization: R, ggplot2, ggmap
- Development: RStudio

9 Data and Infrastructure

10 Traffic Sensor Data
- FAST archived data for all of 2013
- Extracted from FAST dashboard database backups
- Data is archived at a 15-minute interval
- 499 road sensors and 35,164,505 records in total for 2013:

Sensor type              Devices  Records
Wavetronix radar sensor  208      19,401,481
ISS radar sensor         121      9,787,761
Unknown                  170      5,975,263
Total                    499      35,164,505

11 FAST Data Collection Infrastructure
- Unknown device type present in the dataset, possibly loop detectors

12 Location of Various Sensors
- Green: Wavetronix radar sensors; serial connection to McCain 170 controllers, Ethernet cable to RuggedCom RS9000, fiber daisy chain to hub, fiber to TMC
- Red: Wavetronix radar sensors; Ethernet wireless connection to McCain 170 controllers, then the same path to the TMC as Green
- Blue: ISS radar sensors; same path to the TMC as Red

13 FAST Archive Data Fields

Field           Data Type
IpmId           integer
DateTimeStamp   datetime
Path            integer
RoadIndex       integer
RoadwayID       integer
SegmentID       integer
Lane            integer
DeviceID        integer
Volume          integer
Volume1         integer
Volume2         integer
Volume3         integer
Volume4         integer
Volume5         integer
Volume6         integer
Occupancy       integer
Speed           integer
Poll_Count      integer
Failure         integer
RoadType        characters
Location        characters
Polling_Period  integer
Invalid         integer
DetectorID      characters
DayOfWeek       integer
DateValue       datetime
HourIdx         integer
Holiday         integer

14 Construction, Road Weather and Incident Data
- Construction data
  - Valuable for distinguishing “bad sensor” data from “good sensor” data recorded during disrupted traffic
  - Available from NDOT but not aggregated
- Road weather data
  - Only 31 days available (all of 2013 is needed)
- Incident data
  - Available from FAST but not useful for prediction at this point, because incidents occur somewhat randomly

15 Data and Infrastructure Assessment

16 Visualizing Millions of FAST Data Points in 2013
- 35+ million records in 2013
- 30 columns per record => ~1 billion data points
- Pie charts and histograms don’t work anymore
- New types of graphs are required
  - Color schemes
  - Trending, flattening, aggregation or binning techniques
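As a rough illustration of the binning idea on this slide, the sketch below first reproduces the "~1 billion data points" arithmetic and then counts (speed, volume) pairs per cell of a rectangular grid. Rectangular cells are used here only as a simple stand-in for other binning schemes; the function name `bin_2d`, the bin widths, and the sample values are assumptions made for illustration and are not taken from the FAST analysis.

```python
from collections import Counter

# Scale of the problem: record count from the 2013 archive times 30 columns.
approx_data_points = 35_164_505 * 30  # roughly one billion values


def bin_2d(points, x_width, y_width):
    """Count points per rectangular cell; cells are indexed by (x_bin, y_bin)."""
    counts = Counter()
    for x, y in points:
        counts[(int(x // x_width), int(y // y_width))] += 1
    return counts


# Made-up (speed, volume) pairs, only to show the mechanics.
sample = [(61, 110), (64, 118), (12, 40), (66, 95)]
cells = bin_2d(sample, x_width=10, y_width=25)
```

Plotting one colored cell per bin, instead of one mark per record, keeps the chart readable no matter how many records fall into it.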

17 Treemap

18 Treemap
- Figure labels: Unknown Connection to TMC; Fiber Connection from Controller to TMC

19 Treemap
- Figure labels: Serial Connection from Sensor to Controller; Ethernet Wireless Connection from Sensor to Controller; Unknown Connection from Sensor to TMC; Unknown Sensor; Fiber Connection from Controller to TMC

20 Treemap
- Figure labels: Serial Connection from Sensor to Controller; Ethernet Wireless Connection from Sensor to Controller; Unknown Connection from Sensor to TMC; Image Sensing System; Unknown Sensor; Fiber Connection from Controller to TMC; Wavetronix Radar Sensor

21 Treemap
- Figure labels: Serial Connection from Sensor to Controller; Ethernet Wireless Connection from Sensor to Controller; Unknown Connection from Sensor to TMC; Image Sensing System; Unknown Sensor; Fiber Connection from Controller to TMC; Wavetronix Radar Sensor
- One box represents one sensor
- Size of box represents number of records

22 FAST Dataset Treemap (499)

23 FAST Sensor Locations (331 Geocoded)

24 Completeness of FAST Data
- Completeness is expressed as: a sensor generating data for all 12 months of 2013
- Good or bad data
  - Good data: data for every month, with less than 20% of each month being invalid
  - Bad data: everything else
- Count the records archived (15-minute interval) in every month of 2013 for each of the 499 FAST sensors
- Construction data can be used to better understand missing data
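The completeness rule above could be sketched roughly as follows. This is not the presenter's code: it assumes each archived record has been reduced to a `(sensor_id, month, is_invalid)` tuple, and that "invalid" is measured against the records present in that month (the slide does not say whether missing intervals also count). The function name and the demo sensors are hypothetical.

```python
from collections import defaultdict

INVALID_LIMIT = 0.20  # "less than 20% of that month being invalid"


def complete_sensors(records):
    """Return the sensor ids with data in all 12 months and <20% invalid per month.

    records: iterable of (sensor_id, month, is_invalid) tuples (assumed layout).
    """
    total = defaultdict(int)
    invalid = defaultdict(int)
    for sensor_id, month, is_invalid in records:
        total[(sensor_id, month)] += 1
        if is_invalid:
            invalid[(sensor_id, month)] += 1

    sensors = {sensor_id for sensor_id, _ in total}
    good = set()
    for sensor_id in sensors:
        if all(
            total[(sensor_id, m)] > 0
            and invalid[(sensor_id, m)] / total[(sensor_id, m)] < INVALID_LIMIT
            for m in range(1, 13)
        ):
            good.add(sensor_id)
    return good


# Made-up demo: A is fine, B has no December data, C is 30% invalid in June.
demo = []
for m in range(1, 13):
    demo += [("A", m, i == 0) for i in range(10)]
    if m != 12:
        demo += [("B", m, False) for _ in range(10)]
    demo += [("C", m, m == 6 and i < 3) for i in range(10)]
good = complete_sensors(demo)
```

Only sensor A survives in this toy input, mirroring how a single bad or missing month is enough to classify a sensor as "bad data".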

25 Completeness of FAST Data: Example 1

26 Completeness of FAST Data: Example 2

27 Completeness of FAST Data: Example 3

28 Completeness of FAST Data: Example 4

29 Completeness of FAST Sensor Data Across 2013

30 Final Completeness of FAST Sensor Data (288/499)

31 Completeness of Data Across 2013: Before Step 1

32 Completeness of Data Across 2013: After Step 1

33 Completeness of Data Across 2013: What Remains

34 Completeness Filter Outcome
- 7,280,003 records removed
- 27,884,502 records remain

Secondary Network  Primary Network  Device                   Total Records  Records Passed  Percent Remaining  Sensors
Fiber              Serial/Ethernet  Wavetronix radar sensor  16,407,332     15,072,547      91.86%             141
Fiber              Wireless         ISS radar sensor         9,787,761      7,300,042       74.58%             73
Fiber              Wireless         Wavetronix radar sensor  2,994,149      2,420,765       80.85%             19
Unknown                                                      5,975,263      3,091,148       51.73%             55
Total                                                        35,164,505     27,884,502                         288

35 FAST Data Quality of 15-Minute Aggregation
- Assessed the quality of each FAST sensor found to have complete data in 2013
- FAST sensor data is pulled every minute for 15 minutes, then averaged and recorded
- For the average to be reliable, only a limited number of the 15 data pulls may fail
- Based on FAST guidance, a record can be trusted if it has 10 or more successful pulls out of 15 attempts (about 67% successful)
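A minimal sketch of the 10-out-of-15 rule follows. It assumes the number of successful pulls is available per record (possibly what the archive's `Poll_Count` field holds, though the slides do not confirm this); the helper names and the sample counts are hypothetical. The slides also do not state how record-level results were turned into a per-sensor decision, so only the record-level check is shown.

```python
PULLS_PER_INTERVAL = 15      # one pull per minute over a 15-minute interval
MIN_SUCCESSFUL_PULLS = 10    # FAST guidance quoted on the slide


def is_trusted(successful_pulls):
    """A 15-minute record is trusted when at least 10 of its 15 pulls succeeded."""
    return successful_pulls >= MIN_SUCCESSFUL_PULLS


def trusted_fraction(pull_counts):
    """Share of records that pass the rule; 0.0 for an empty input."""
    counts = list(pull_counts)
    if not counts:
        return 0.0
    return sum(1 for c in counts if is_trusted(c)) / len(counts)


# Made-up per-record pull counts for one sensor.
fraction = trusted_fraction([15, 12, 10, 9, 0])
threshold_ratio = MIN_SUCCESSFUL_PULLS / PULLS_PER_INTERVAL
```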

36 FAST Sensor Data Quality Across 15-Minute Intervals

37 FAST Sensor Data Quality Across 15-Minute Intervals (264/499)

38 Quality of Data Across 2013: Before Step 2

39 Quality of Data Across 2013: After Step 2

40 Quality of Data Across 2013: What Remains

41 Quality Filter Outcome
- 1,341,366 records removed
- 26,543,136 records remain
- 24 sensors removed

Secondary Network  Primary Network  Device                   Total Records  Records Passed  Percent Remaining  Sensors
Fiber              Serial/Ethernet  Wavetronix radar sensor  16,407,332     14,937,943      91.04%             138
Fiber              Wireless         ISS radar sensor         9,787,761      7,136,576       72.91%             71
Fiber              Wireless         Wavetronix radar sensor  2,994,149      2,420,765       80.85%             19
Unknown                                                      5,975,263      2,047,852       34.27%             36
Total                                                        35,164,505     26,543,136                         264

42 FAST Data Traffic Flow Usability
- Flow as a basis for traffic prediction
- Can we calculate traffic flow using the data collected by the “good” sensors we have isolated?
- We observed that some records show Speed > 0 and Volume = 0, or Speed = 0 and Volume > 0
- Assess how much of the “good” dataset these rows represent
- Select FAST sensors with less than 20% flow error for all of 2013 and for each 15-minute interval
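The flow-consistency check described above can be sketched as below. The two inconsistent cases come straight from the slide; the function names, the `(speed, volume)` row layout, and the sample rows are assumptions, and only the whole-period error rate is computed (the per-interval variant mentioned on the slide is not spelled out in the source).

```python
FLOW_ERROR_LIMIT = 0.20  # "less than 20% flow error"


def is_flow_error(speed, volume):
    """True when speed and volume contradict each other."""
    return (speed > 0 and volume == 0) or (speed == 0 and volume > 0)


def flow_error_rate(rows):
    """Fraction of (speed, volume) rows that are inconsistent; 0.0 if empty."""
    rows = list(rows)
    if not rows:
        return 0.0
    return sum(1 for speed, volume in rows if is_flow_error(speed, volume)) / len(rows)


# Made-up rows for one sensor: two of the five are inconsistent.
rows = [(65, 120), (0, 0), (55, 0), (0, 14), (60, 90)]
rate = flow_error_rate(rows)
usable = rate < FLOW_ERROR_LIMIT
```

Note that a record with Speed = 0 and Volume = 0 is treated as consistent here, since the slide only flags the two mixed cases.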

43 Usable (Flow) FAST Sensor Data for 2013

44 Usable (Flow) FAST Sensor Data for 2013 (242/499)

45 Usable Sensors for 2013: Before Step 3

46 Usable Sensors for 2013: After Step 3

47 Usable Sensors for 2013: What Remains

48 Usability Filter Outcome
- 1,838,859 records removed
- 24,704,277 records remain
- 22 sensors removed

Secondary Network  Primary Network  Device                   Total Records  Records Passed  Percent Remaining  Sensors
Fiber              Serial/Ethernet  Wavetronix radar sensor  16,407,332     14,062,011      85.71%             127
Fiber              Wireless         ISS radar sensor         9,787,761      6,679,735       68.25%             66
Fiber              Wireless         Wavetronix radar sensor  2,994,149      2,126,525       71.02%             16
Unknown                                                      5,975,263      1,836,006       30.73%             33
Total                                                        35,164,505     24,704,277                         242

49 Failure Map: All Sensors that Failed Data Analysis

50 Failure Map: Fiber – Wireless – Wavetronix radar sensor

51 Failure Map: Fiber – Wireless – ISS radar sensor

52 Failure Treemap

53 FAST Sensor Data: Top 10 Road Sensors for Prediction
- Prediction model candidates

54 Top 10 Sensors to Consider for Prediction

55 Visualizations

56 Additional Visualizations
- Additional visualizations allow further review of the behavior of each of the 499 FAST sensors
- Two types of graphs were generated
  - Hexagonal binning plot
  - Calendar heatmap

57 Hexagonal Binning Plot

58 428.3.344 Hexagonal Binning Plot

59 Calendar Heatmap

60 Roadway ID 437 Calendar Heatmap

61 Prediction Test

62 Prediction Limitations
- FAST data is aggregated at 15-minute intervals
- Traffic is random (stochastic) and cannot be predicted precisely
- Need inline traffic simulation with origin and destination
1. Classification. Example: estimate the future phase (three-phase traffic theory): free flow, synchronized flow, or wide moving jam
2. Regression. Example: estimate future speed

63 Prediction Algorithms
- K-Nearest Neighbor
  - Simplest prediction method
  - Sensitive to locality and noise
- Neural network
  - Emulates a network of biological neurons
  - Complex, difficult to implement and tune
- Random Forest
  - Thousands of decision trees
  - Simple, brute force
  - Handles noisy datasets and locality
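To make the "simplest prediction method" concrete, here is a minimal k-nearest-neighbor classifier using only the standard library. It is a sketch of the general technique, not the model used in the presentation (the deck goes on to use a random forest); the feature choice `(hour, day_of_week)`, the training rows, and the function name are invented for illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Majority vote among the k training rows closest to `query`.

    train: list of (feature_tuple, label); distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up training rows: (hour, day_of_week) -> traffic phase.
train = [
    ((8, 1), "synchronized flow"),
    ((8, 2), "synchronized flow"),
    ((9, 1), "synchronized flow"),
    ((2, 1), "free flow"),
    ((3, 2), "free flow"),
    ((3, 1), "free flow"),
]
prediction = knn_predict(train, query=(8, 3), k=3)
```

Because every prediction depends on a handful of nearby rows, a few mislabeled or noisy neighbors can flip the vote, which is the sensitivity to locality and noise noted on the slide.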

64 Random Trees or Random Forest Algorithm
- Figure labels: Predictors; Records; Outcomes

65 Random Trees or Random Forest Algorithm

66 FAST Sensor Selection
- Road sensor 350.1.158

67 FAST Sensor 350.1.158: Location

68 FAST Sensor 350.1.158: Hexagonal Binning Plot

69 FAST Sensor 350.1.158: Calendar Heatmap

70 FAST Sensor 350.1.158: Classification of Sensor Data

71 FAST Sensor 350.1.158: Classification of Sensor Data
- Figure labels: Free Flow; Wide Moving Jam; Synchronized Flow

72 Ancillary Data
- Predictors so far
  - Year, month, day and time
  - 2013 traffic data classified by traffic phase
- Not enough to create a good prediction model
- Need more datasets to add more possible predictors
  - Weather prediction (always included in other predictions)
  - Construction schedule
  - Las Vegas events attendance data
  - Hotel and casino occupancy data

73 Weather Data
- Road Weather Information System (RWIS)
  - Not enough data
- KLAS NOAA archive
  - Mean Temperature, Wind Speed

74 Weather Data
- Data obtained from KLAS

Field                         Type
PST                           date
Max Temperature (F)           integer
Mean Temperature (F)          integer
Min Temperature (F)           integer
Max Dew Point (F)             integer
Mean Dew Point (F)            characters
Min Dew Point (F)             characters
Max Humidity                  integer
Mean Humidity                 integer
Min Humidity                  integer
Max Sea Level Pressure (In)   float
Mean Sea Level Pressure (In)  float
Min Sea Level Pressure (In)   float
Max Visibility (Miles)        integer
Mean Visibility (Miles)       integer
Min Visibility (Miles)        integer
Max Wind Speed (MPH)          integer
Mean Wind Speed (MPH)         integer
Max Gust Speed (MPH)          integer
Precipitation (In)            float
Cloud Cover                   integer
Events                        characters
Wind Dir. (Degrees)           integer

75 Las Vegas Events Data
- Las Vegas Convention Calendar
  - Event
  - Venue
  - Start and End Date
  - Attendance

Date      Venue                       Occupancy
12/28/12  Hard Rock Hotel & Casino    190
12/29/12  Hard Rock Hotel & Casino    190
12/30/12  Hard Rock Hotel & Casino    190
12/31/12  Hard Rock Hotel & Casino    190
1/1/13    Hard Rock Hotel & Casino    190
12/31/12  Bellagio                    31
1/1/13    Bellagio                    31
1/2/13    MGM Grand Hotel and Casino  650
1/3/13    MGM Grand Hotel and Casino  650
1/4/13    MGM Grand Hotel and Casino  650
1/5/13    MGM Grand Hotel and Casino  650
1/3/13    Caesars Palace              252
1/4/13    Caesars Palace              252
1/3/13    Tropicana Las Vegas         300
1/4/13    Tropicana Las Vegas         300
1/5/13    Tropicana Las Vegas         300
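The table above appears to repeat each event's attendance for every day between its start and end dates. One plausible way to produce such rows from calendar entries is sketched below; the function name and the tuple layout are assumptions, and the Bellagio values are simply the two rows shown in the table.

```python
from datetime import date, timedelta


def expand_event(venue, start, end, attendance):
    """One (date, venue, attendance) row per day from start to end, inclusive."""
    days = (end - start).days + 1
    return [(start + timedelta(days=i), venue, attendance) for i in range(days)]


# The Bellagio entry from the table: 12/31/12 through 1/1/13, attendance 31.
rows = expand_event("Bellagio", date(2012, 12, 31), date(2013, 1, 1), 31)
```

Daily rows like these can then be joined to the 15-minute traffic records by date, giving one attendance column per venue.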

76 Predictors and Outcome
- For each 15-minute interval of 2013
  - Predictors: 94 predictors combining date, time, weather, and events attendance
  - Outcome: traffic phase

77 Predictor Selection
- Out of 94 predictors, 20 have non-zero variance for sensor 350.1.158
- Date and time
  - Month, Day, Hour, Minute (15-minute interval), DayOfWeek
- Weather data
  - Mean Temperature in Fahrenheit, Mean Wind Speed in MPH
- Events data
  - Bellagio, Caesars Palace, Tropicana Las Vegas, Mandalay Bay Resort And Casino, Bally’s Las Vegas, Circus Circus Hotel Casino and Theme Park, Monte Carlo Resort and Casino, Venetian Resort Hotel Casino, ARIA Resort And Casino, Cosmopolitan of Las Vegas, South Point Hotel Casino And Spa, Westin Las Vegas Hotel Casino And Spa, MEET Las Vegas
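Dropping predictors with zero variance amounts to removing columns whose value never changes, since a constant column cannot help separate traffic phases. A minimal sketch, assuming the predictors are held as a dict of equal-length columns; the function name, the placeholder column `VenueX`, and all values are hypothetical.

```python
def nonzero_variance_columns(table):
    """Names of columns that take more than one distinct value.

    table: dict mapping column name -> list of values (assumed layout).
    A column has zero variance exactly when all of its values are equal.
    """
    return [name for name, values in table.items() if len(set(values)) > 1]


# Made-up predictor columns; "VenueX" is a placeholder that is always 0.
table = {
    "Hour": [0, 1, 2, 3],
    "Mean Temperature (F)": [50, 52, 55, 53],
    "VenueX": [0, 0, 0, 0],
}
kept = nonzero_variance_columns(table)
```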

78 Prediction Model Training
- Two-way partition
- Cross validation
- Bootstrapping
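The three resampling schemes named on this slide differ only in how row indices are split into training and test sets. The sketch below shows one common form of each using the standard library; the 75/25 split, the fold count, the seeds, and the function names are assumptions, since the slide gives no parameters.

```python
import random


def two_way_partition(n, train_fraction=0.75, seed=0):
    """Shuffle indices once and cut them into a training and a test set."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(n * train_fraction)
    return idx[:cut], idx[cut:]


def k_fold(n, k=5, seed=0):
    """Return k (train, test) pairs; each index is in exactly one test fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [(sorted(set(idx) - set(fold)), sorted(fold)) for fold in folds]


def bootstrap(n, seed=0):
    """Sample n indices with replacement; unsampled rows form the test set."""
    rng = random.Random(seed)
    train = [rng.randrange(n) for _ in range(n)]
    chosen = set(train)
    out_of_bag = [i for i in range(n) if i not in chosen]
    return train, out_of_bag


part_train, part_test = two_way_partition(100)
folds = k_fold(10, k=5)
boot_train, boot_test = bootstrap(50)
```

For time-ordered traffic data, random shuffling lets a model see intervals adjacent to the ones it is tested on, so a held-out block of days is often a stricter check than any of these random splits.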

79 Prediction Model Test Results

80 Three-Day Prediction Test

81 Possible Use of Prediction

82 Possible Use of Prediction

83 What’s Next?

84 Questions?

