Big Data and Advanced Models on a Mid-Sized City’s Budget: The Chattanooga Experience
Vince Bernardin, PhD (RSG)
Jason Chen, PhD (RSG)
Steven Trevino (RSG)
Yuen Lee, AICP (Chattanooga-Hamilton County Planning Agency)
May 15, 2017
Background
Planning cycle:
- TDM Development / Update
- Data Collection Plan
- Data Collection
- TDM Development / Update
- Regional Plan Development
- Adoption
EVALUATED DATA OPTIONS
- Data Needs / Sources
- Staff / Consultant Resources
- Cost
- Accuracy & Geographic Coverage
- Update Frequency / Schedule
Internal Origin-Destination Data
Data Source(s) | Update Frequency | Latest Data | Next Collection Year | Cost Estimate | Recommendation
Household Travel Surveys | 8 - 12 years | 2010 | 2020 | > $250,000 | Conduct HHTS every third planning cycle, align with Census
Cell Phone Surveys (e.g., AirSage) | 4 years | Not currently used | 2014 | $40,000 – 50,000 | Purchase cell phone O/D data once per planning cycle
Bluetooth Surveys | N/A | | | | Too expensive for regional data, consider for specific locations
Transit On-board Surveys | | | | $100,000 | Conduct survey once each planning cycle
Stated Preference Transit Surveys | | | | $50,000 – 100,000 |
Freight Establishment Surveys | 8 years | | | $120,000 – 130,000 | Not needed
Truck GPS Data | | | | $25,000 | Purchase directly from ATRI once each planning cycle starting in 2014
Travel Time / Speed Data
Columns: Data Source(s) | Update Frequency | Update Priority | Latest Data | Next Recommended Data Collection Year | Staff / Consultant Resources Required | Data Purchase Cost | Accuracy | Expected Geographic Coverage | Ability to Continuously Update | Suitability For TPO Use | Cost Estimate^a | Recommendation
- FHWA probe data from National Highway System (NPMRDS): update frequency annual; update priority 1; latest data not currently used; next recommended collection year 2014^b; scores 3, 4, 2; cost estimate free. Recommendation: Obtain free NPMRDS data annually.
- HERE – additional roadways: cost estimate $17.8k. Recommendation: Consider purchasing annually.
- INRIX: cost estimate $20.5-24.5k.
- TomTom: cost estimate $59.0k. Recommendation: Too expensive, drop from consideration.
- Traditional travel time runs on specific corridors: update frequency 4 years; latest data 2009 (40 corridors); next recommended collection year 2014; cost estimate ~$7k for 5 corridors, $54.5k for 40 corridors. Recommendation: Collect a small sample (4 or 5 corridors) of travel time runs to validate the first year of vendor travel time data. Otherwise discontinue, not needed.
Data Acquired / Collected
- Travel Time Data
- Cell-phone Total Origin-Destination Data
- Truck GPS Origin-Destination Data
- Supplemental Traffic Counts
- On-Board Transit Survey
Project Planning
Full day kick-off and model design workshop
- Established project goals
- Reviewed available data
- Reviewed alternative model designs
Model designs compared: Trip-based | Hybrid | Activity-based (values listed in order from trip-based to activity-based)
- Spatial Resolution: zone; block
- Temporal Resolution: AM/PM/MD/NT; minute-by-minute
- Demographic Resolution: household; person
- Randomness: analytic; simulation
- Behavior: Urban Form (no; yes); Trip-chaining; Tours/Physically Possible; maybe; Inter-personal Interactions; Re-scheduling; some
- Output: matrix; table
- Software: TransCAD; TransCAD & Daysim
- Programming: GISDK; GISDK & C#
- Runtimes: ~2 hrs; ~4 hrs; ~5 hrs
- Hardware: any desktop; high end desktop
- Calibration Effort: least; intermediate; most
- Cost (resident demand ONLY): ~$175k; ~$225k; ~$275k
Activity-Based Model + Data-Driven Approach
- Sensitivity to urban form
- Bike & Walk demand
- More expensive, but still possible in budget
Data-Driven Approach
- Leverage data investments
- Improved accuracy
- 1st time with activity-based
The Power of Big Data
Trip Table (OD pairs)
- Total: 529,984
- HH Survey: 8,350 (2.0%)
- AirSage: 182,742 (34.5%)
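As a quick check on the coverage figures above, the share of OD pairs observed by each source can be recomputed from the counts on the slide. This is only an arithmetic sketch; the function and variable names are illustrative. Direct division gives about 1.6% for the household survey, which is consistent with the "<2%" quoted on the next slide, so the 2.0% shown here may be rounded or computed on a slightly different base.

```python
# Counts quoted on the slide.
TOTAL_OD_PAIRS = 529_984
observed_pairs = {"HH Survey": 8_350, "AirSage": 182_742}


def coverage_share(observed: int, total: int) -> float:
    """Percent of all OD pairs with at least one observation."""
    return 100.0 * observed / total


shares = {
    source: coverage_share(count, TOTAL_OD_PAIRS)
    for source, count in observed_pairs.items()
}

for source, share in shares.items():
    print(f"{source}: {share:.1f}% of OD pairs observed")
```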
Can you recognize the pattern based on <2%?
How about based on >25%?
Big Data allows us to see the Big Picture
1,000 Truck Sample
Same 1,000 Trucks After 24 Hours
Same 1,000 Trucks After 48 Hours
Same 1,000 Trucks After 72 Hours
Same 1,000 Trucks After 5 Days
Same 1,000 Trucks After 7 Days
The Limitations of Big OD Data
The Weaknesses of Big Data
- Filtering / Cleaning
  - Needs vary by data source – but all need it
  - GPS jumps/blips and equivalent
  - Missing data
- No Purpose or Mode
  - Just ODs – not a survey substitute
  - Better to supplement with CTPP / LEHD
- Trip Definitions
- Non-representative
Not Representative
- Big Sample, NOT Random Sample
- Locational biases, holes
- Trip length / duration biases
  - Not corrected by penetration-based expansion
  - A 100 mi trip is 12 times as likely to be detected as a 10 mi trip
Pros and Cons of Expansion Methods
- Methods often combined to address multiple issues
- Level of effort and significance of biases vary
Chattanooga Expansion Adjustments
FOUR-STEP ADJUSTMENT
How best to expand to traffic counts?
1. AirSage’s Market Penetration-based Expansion
2. Trip-Generation-based filling of “holes” (ATRI)
3. Single-factor Scaling
4. Matrix Partitioning / Iterative Screenline Fitting
   - No reliance on network model; holdout sample of counts
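The slide does not give formulas for the last two steps, so the following is only a sketch of one plausible reading, not the project's implementation. It assumes each screenline splits the districts into two sides, so the OD cells that cross it can be found by partitioning the matrix without any network assignment. Single-factor scaling applies one ratio to the whole matrix, and the iterative fit rescales the crossing cells of each screenline in turn. All function names and the toy data are hypothetical.

```python
import numpy as np


def crossing_mask(side):
    """True for OD cells whose origin and destination lie on opposite sides."""
    side = np.asarray(side, dtype=bool)
    return side[:, None] != side[None, :]


def single_factor_scale(od, masks, counts):
    """Scale the whole matrix by total counted / total matrix crossings."""
    crossings = sum(od[mask].sum() for mask in masks)
    return od * (sum(counts) / crossings)


def screenline_fit(od, masks, counts, iterations=50):
    """Cycle through screenlines, scaling crossing cells to each count."""
    od = od.astype(float).copy()
    for _ in range(iterations):
        for mask, count in zip(masks, counts):
            crossing = od[mask].sum()
            if crossing > 0:
                od[mask] *= count / crossing
    return od


# Toy example: 4 districts, 2 screenlines (assumed data, not from the study).
od = np.full((4, 4), 100.0)
masks = [crossing_mask([0, 0, 1, 1]), crossing_mask([0, 1, 0, 1])]
counts = [1200.0, 600.0]

scaled = single_factor_scale(od, masks, counts)
fitted = screenline_fit(scaled, masks, counts)
fitted_crossings = [fitted[mask].sum() for mask in masks]
```

In practice some counts would be withheld from the fit and used only to check the result, in line with the holdout sample mentioned on the slide.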
Destination Choice with Fixed Factors
Shadow-Pricing
Used 40 district scheme with LEHD and AirSage data
Destination District O-D Shadow Pricing Convergence Summary
Iteration | Absolute Error | Mean absolute % error | Weighted mean absolute % error | RMSE
1 | 516,595 | 23.3% | 22.2% | 37.1%
2 | 421,404 | 20.6% | 19.1% | 30.7%
…
24 | 59,962 | 11.8% | 8.3% | 10.5%
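The convergence summary above can be illustrated with a small, generic shadow-pricing loop. This is a sketch under stated assumptions, not the Daysim procedure: it uses a toy logit destination choice model at the district level, one constant per district pair, a damped update of the form constant += damping x ln(target / model), and commonly used definitions of the four error measures. The slide does not define its measures, so those definitions, the damping value, and all names are assumptions.

```python
import numpy as np


def model_trips(productions, utility, constants):
    """Logit destination choice: split each origin's trips across destinations."""
    expu = np.exp(utility + constants)
    return productions[:, None] * expu / expu.sum(axis=1, keepdims=True)


def fit_metrics(model, target):
    """Assumed definitions of the error measures in the summary table."""
    err = model - target
    return {
        "abs_error": np.abs(err).sum(),
        "mape": np.mean(np.abs(err) / target),
        "wmape": np.abs(err).sum() / target.sum(),
        "pct_rmse": np.sqrt(np.mean(err ** 2)) / target.mean(),
    }


def shadow_price(target, utility, iterations=24, damping=0.5):
    """Adjust OD constants until modeled flows approach the target flows."""
    productions = target.sum(axis=1)
    constants = np.zeros_like(target, dtype=float)
    history = []
    for _ in range(iterations):
        model = model_trips(productions, utility, constants)
        history.append(fit_metrics(model, target))
        constants += damping * np.log(target / model)
    return constants, history


# Synthetic 5-district example (assumed data, not from the study).
rng = np.random.default_rng(0)
target = rng.uniform(50, 500, size=(5, 5))
utility = rng.normal(0, 1, size=(5, 5))
constants, history = shadow_price(target, utility)
```

The toy model converges far more tightly than the 10.5% RMSE reported after 24 iterations, because it has no simulation noise and no mismatch between model zones and target districts.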
Total Daysim Trip Table vs. AirSage
Daysim vs. AirSage
- Very good agreement – All cells within +/- 1%
- All residence/work Super Districts within +/- 2.5%
- 10.5% RMSE
Assignment Validation
Volume Range | RMSE | TDOT Maximum
< 5,000 | 62.13% | 100%
5,000 to 10,000 | 37.91% | 45%
10,000 to 15,000 | 28.00% | 35%
15,000 to 20,000 | 22.73% | 30%
20,000 to 30,000 | 15.73% | 27%
30,000 to 50,000 | 14.05% | 25%
50,000 to 60,000 | 9.93% | 20%
All | 28.97% |
- Great fit!
- Better than old model
- Far exceeds TDOT standards
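For readers unfamiliar with the statistic, the sketch below shows one common definition of percent RMSE (root mean squared error divided by the mean count) and checks the reported values against the TDOT maximums listed above. The exact formula used in the study is not stated on the slide; some agencies divide by N - 1 rather than N, so the definition here is an assumption.

```python
import numpy as np


def pct_rmse(modeled, counted):
    """Percent RMSE: RMSE of modeled vs. counted volumes over the mean count."""
    modeled = np.asarray(modeled, dtype=float)
    counted = np.asarray(counted, dtype=float)
    rmse = np.sqrt(np.mean((modeled - counted) ** 2))
    return 100.0 * rmse / counted.mean()


# (volume range, reported %RMSE, TDOT maximum) as listed on the slide.
reported = [
    ("< 5,000", 62.13, 100.0),
    ("5,000 to 10,000", 37.91, 45.0),
    ("10,000 to 15,000", 28.00, 35.0),
    ("15,000 to 20,000", 22.73, 30.0),
    ("20,000 to 30,000", 15.73, 27.0),
    ("30,000 to 50,000", 14.05, 25.0),
    ("50,000 to 60,000", 9.93, 20.0),
]

# True when every volume group is at or below its TDOT maximum.
all_within = all(rmse <= limit for _, rmse, limit in reported)
```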
Big Data Driven Forecasting
- Improving forecasts may have as much to do with using better data as with using more advanced models
- Big data is not the solution for everything, but its greatest strength addresses travel models’ greatest weakness
- But new data should result in new, data-driven modeling approaches
- Need to be humble enough to admit the limitations of “pure” models and capitalize on the new opportunity
  - Pivoting, destination choice models with constants
  - Better accuracy, analogous to STOPS
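Pivoting is named above without detail. As a hedged illustration, the sketch below shows the two textbook forms: an additive pivot, which adds the model's predicted change to the observed base-year flows, and a ratio pivot, which scales the observed flows by the model's growth ratio. This is a generic formulation, not necessarily the one used in Chattanooga; the function names and the handling of zero cells are assumptions.

```python
import numpy as np


def pivot_additive(observed_base, model_base, model_future):
    """Observed base flows plus the modeled change, floored at zero."""
    observed_base = np.asarray(observed_base, dtype=float)
    change = np.asarray(model_future, dtype=float) - np.asarray(model_base, dtype=float)
    return np.maximum(observed_base + change, 0.0)


def pivot_ratio(observed_base, model_base, model_future):
    """Observed base flows times modeled growth; cells with zero model base keep the observed value."""
    observed_base = np.asarray(observed_base, dtype=float)
    model_base = np.asarray(model_base, dtype=float)
    model_future = np.asarray(model_future, dtype=float)
    ratio = np.divide(
        model_future,
        model_base,
        out=np.ones_like(model_future),
        where=model_base > 0,
    )
    return observed_base * ratio


# Toy flows for two OD cells (assumed data).
observed = [100.0, 50.0]
base = [80.0, 50.0]
future = [120.0, 40.0]

additive_forecast = pivot_additive(observed, base, future)
ratio_forecast = pivot_ratio(observed, base, future)
```

The additive form is more stable when the base model flow is small, while the ratio form preserves zero cells in the observed data; applications often mix the two depending on the cell.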
Consultant Team
Vince Bernardin, PhD
Director of Travel Forecasting
Vince.Bernardin@rsginc.com
Agency Contacts
Yuen Lee, AICP
Director of Research & Analysis
ylee@chattanooga.gov