Big Data “Triage” for Long Range Planning
Transportation Engineering and Safety Conference
Reuben S. MacMartin
December 12, 2014
Delaware Valley Regional Planning Commission
Metropolitan Planning Organization (MPO): 2 states, 9 counties, 351 municipalities, 5.6 million population, 3,800 sq. miles, ~115 employees
Activities – Long Range Plan (LRP), Transportation Improvement Program (TIP), and a wide range of planning and technical support for regional partners
Outline
What do we use data for?
Traditional data sources – traffic counts, surveys, demographic data
The old-new – OSM, GTFS, VPP Suite, Bluetooth
The new-new – CycleTracks, real-time transit data, …, data-mined GPS data, etc.
What do we use data for? Current conditions for transportation studies
Current Conditions
What do we use data for? Current conditions for transportation studies; Definition and analysis of congestion for the Congestion Management Process (CMP)
A bad day compared to average
What do we use data for? Current conditions for transportation studies; Definition and analysis of congestion for the Congestion Management Process (CMP); Long Range Planning
Long Range Planning
What do we use data for? Current conditions for transportation studies; Definition and analysis of congestion for the Congestion Management Process (CMP); Long Range Planning; Calibration and validation of travel forecasting models [figure: “250 Riders in 2040”]
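One common way the regional traffic counts feed model validation is a percent root-mean-square error (%RMSE) comparison of modeled link volumes against observed counts. The sketch below illustrates that check only in general terms; it is not DVRPC's specific procedure, and the file and column names are assumptions.

```python
"""Minimal %RMSE validation sketch: modeled link volumes vs. observed counts."""
import pandas as pd

obs = pd.read_csv("counts.csv")           # assumed columns: link_id, observed_volume
mod = pd.read_csv("model_volumes.csv")    # assumed columns: link_id, modeled_volume
merged = obs.merge(mod, on="link_id")

# %RMSE: root-mean-square error as a percentage of the mean observed volume.
sq_err = (merged["modeled_volume"] - merged["observed_volume"]) ** 2
prmse = 100 * (sq_err.mean() ** 0.5) / merged["observed_volume"].mean()
print(f"%RMSE across {len(merged)} counted links: {prmse:.1f}%")
```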
Also a data provider – e.g., RIMIS
“Traditional” Planning Data Sources
Inventories – traffic counts (78,300+), bike and pedestrian counts (1,000+), travel time surveys
Behavioral surveys – household travel survey ( ), transit on-board ( )
Demographic data – Census, American Community Survey, National Employment Time Series (NETS)
The old “new” data (these were innovative 5 years ago): open-source data for our travel demand model networks
Travel Demand Model Networks
The need: accurate representations of regional highway and transit networks
The past: “hand” code from paper maps, schedules, etc., or combine a multitude of different data sources
The innovation: fuse OpenStreetMap (OSM) and GTFS (i.e., “Google transit”) data and add extra data for modeling
Open Data Mash-up for Transportation Modeling
Data integration: data objects of different origin are merged and new relationships are created
[Data model diagram: GTFS/OSM-sourced objects (Stop Point, Service Pattern, Scheduled Run, Stop Area) linked to travel demand model objects (Node, Link, Connector, Zone), with relationship cardinalities (exactly 1, 1 or more, 0 or more) shown in the legend]
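To make the mash-up concrete, the sketch below shows one typical step: attaching each GTFS stop to the nearest node of an OSM-derived model network so transit service can be coded onto it. This is an illustration under assumptions, not DVRPC's actual tooling: stops.txt follows the standard GTFS layout, while nodes.csv (node_id, lat, lon) is a hypothetical export of the network nodes.

```python
"""Sketch: snap GTFS stops to the nearest OSM-derived model network node."""
import math
import pandas as pd

stops = pd.read_csv("stops.txt")    # standard GTFS columns: stop_id, stop_lat, stop_lon, ...
nodes = pd.read_csv("nodes.csv")    # assumed columns: node_id, lat, lon

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat, dlon = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def nearest_node(lat, lon):
    """Brute-force nearest node; a spatial index would replace this at regional scale."""
    dists = nodes.apply(lambda n: haversine_m(lat, lon, n["lat"], n["lon"]), axis=1)
    i = dists.idxmin()
    return nodes.loc[i, "node_id"], dists.loc[i]

matches = [nearest_node(lat, lon) for lat, lon in zip(stops["stop_lat"], stops["stop_lon"])]
stops["model_node"] = [m[0] for m in matches]
stops["snap_dist_m"] = [m[1] for m in matches]

# Flag stops that land suspiciously far from any node for manual review.
print(stops.loc[stops["snap_dist_m"] > 100, ["stop_id", "model_node", "snap_dist_m"]])
```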
Integrated Street & Transit Network © in part by OSM and CC-by-SA
TIM 2 Highway Network © in part by OSM and CC-by-SA
[Comparison figure: legacy DVRPC network model vs. new, accurate (and routable) topology]
Original SEPTA GTFS (2010)
VISUM Imported Network
VISUM Exported Network (WKTPoly shape)
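The exported shape is each link's geometry carried as WKT LINESTRING text, which GIS tools can parse directly. A small illustration with shapely follows; the coordinates are made up and the round trip is only meant to show the format.

```python
"""Illustration: write and re-read a link shape as a WKT polyline."""
from shapely.geometry import LineString
from shapely import wkt

# Shape points (lon, lat) along one link, between its from-node and to-node.
shape_points = [(-75.1652, 39.9526), (-75.1630, 39.9538), (-75.1611, 39.9550)]

link_wkt = LineString(shape_points).wkt
print(link_wkt)              # e.g. "LINESTRING (-75.1652 39.9526, ...)"

# Parse the WKT text back into a geometry object, as a GIS import would.
geom = wkt.loads(link_wkt)
print(len(geom.coords))      # 3 shape points recovered
```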
The old “new” data (these were innovative 5 years ago): open-source data for our travel demand model networks; Bluetooth detectors for speed and O-D data
The old “new” data (these were innovative 5 years ago): open-source data for our travel demand model networks; Bluetooth detectors for speed and O-D data; Automated Passenger Counter (APC) data – SEPTA
Why APC data?
Time-stamped boarding and alighting data, by line and by stop
Time-period-level targets for modeling
Stop- and line-level expansion values for on-board survey work
Used in calibration/validation of the path builder
Transit studies: O-D matrices by line
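As a rough illustration of turning stop-level APC records into time-period targets and survey expansion values, here is a sketch with assumed column names (not SEPTA's actual APC schema); the time-period cut points are also illustrative.

```python
"""Sketch: roll APC records up to line x stop x time-period totals and line-level expansion factors."""
import pandas as pd

apc = pd.read_csv("apc_records.csv", parse_dates=["timestamp"])
# assumed columns: line, stop_id, timestamp, boardings, alightings

def time_period(ts):
    """Assign a model time period (illustrative cut points)."""
    h = ts.hour
    if 6 <= h < 9:
        return "AM"
    if 9 <= h < 15:
        return "MD"
    if 15 <= h < 18:
        return "PM"
    return "NT"

apc["period"] = apc["timestamp"].map(time_period)

# Time-period-level boarding/alighting targets by line and stop.
targets = (apc.groupby(["line", "stop_id", "period"])[["boardings", "alightings"]]
              .sum()
              .reset_index())

# Line-level expansion factors: APC boardings divided by on-board survey responses.
survey = pd.read_csv("onboard_survey_counts.csv")   # assumed columns: line, responses
expansion = (apc.groupby("line")["boardings"].sum()
             / survey.set_index("line")["responses"]).rename("expansion_factor")

print(targets.head())
print(expansion.head())
```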
The new “new” data: user-sourced bike data – CyclePhilly
CyclePhilly – User Generated GPS Data
[Map panels: Raw GPS Trace | Snapped GPS | Model Path | Model vs. Data]
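The “snapped GPS” step is map matching; the sketch below is a much-simplified stand-in that moves each GPS point to the closest point on the nearest model link and reads off the observed link sequence for comparison with the model path. Link geometries, file names, and columns are assumptions.

```python
"""Simplified GPS-to-network snapping sketch (a stand-in for real map matching)."""
from itertools import groupby
import pandas as pd
from shapely.geometry import Point
from shapely import wkt

links = pd.read_csv("links_wkt.csv")          # assumed columns: link_id, geometry (WKT LINESTRING)
links["geom"] = links["geometry"].map(wkt.loads)

trace = pd.read_csv("cyclephilly_trip.csv")   # assumed columns: lon, lat, ordered along one trip

def snap(lon, lat):
    """Return the nearest link and the point on it closest to the GPS fix.
    Distances here are in degrees, which is adequate for picking a nearby link."""
    p = Point(lon, lat)
    nearest = links.loc[links["geom"].map(p.distance).idxmin()]
    on_link = nearest["geom"].interpolate(nearest["geom"].project(p))
    return nearest["link_id"], on_link.x, on_link.y

matched = [snap(lon, lat) for lon, lat in zip(trace["lon"], trace["lat"])]
trace["link_id"] = [m[0] for m in matched]

# Collapse consecutive repeats to get the observed link sequence, which can then
# be compared against the model's assigned path for the same trip.
observed_path = [link for link, _ in groupby(trace["link_id"])]
print(observed_path)
```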
The new “new” data: user-sourced bike data – CyclePhilly; vehicle probe data – INRIX
PM Peak TTI – INRIX
Archived Operational Data – INRIX
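For context, the PM-peak Travel Time Index (TTI) shown above is the ratio of observed travel time to free-flow travel time, equivalently the reference speed divided by the observed speed. The sketch below computes it from an archived probe-speed table; the column names and the 4-6 PM window are assumptions, not the vendor's schema.

```python
"""Minimal PM-peak Travel Time Index (TTI) sketch from archived probe speeds."""
import pandas as pd

probe = pd.read_csv("probe_speeds.csv", parse_dates=["timestamp"])
# assumed columns: segment_id, timestamp, speed_mph, reference_speed_mph

# Keep 4-6 PM observations as the "PM peak" (illustrative cut points).
pm = probe[(probe["timestamp"].dt.hour >= 16) & (probe["timestamp"].dt.hour < 18)].copy()

# TTI per observation = reference (free-flow) speed / observed speed; average by segment.
pm["tti"] = pm["reference_speed_mph"] / pm["speed_mph"]
tti = pm.groupby("segment_id")["tti"].mean().rename("pm_peak_tti")

# Segments well above 1.0 are the recurring PM-peak congestion candidates.
print(tti.sort_values(ascending=False).head(10))
```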
The new “new” data: user-sourced bike data – CyclePhilly; vehicle probe data – INRIX; SEPTA Key (new fare payment technology) data – SEPTA (availability TBD)
Fare Card Data – Possibilities
Anonymized full-day, transit-based tour data for all riders
O-D data
Route choice data
Transfer behavior (see the sketch below)
Frequency of transit use
Much higher resolution than current survey methods
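One way transfer behavior could be inferred from anonymized, time-stamped taps is sketched below. The tap-record layout and the 60-minute transfer window are assumptions for illustration, not SEPTA Key specifics.

```python
"""Sketch: link fare-card taps into trips and count inferred transfers."""
import pandas as pd

taps = pd.read_csv("fare_taps.csv", parse_dates=["tap_time"])
# assumed columns: card_id (anonymized), tap_time, route, stop_id

TRANSFER_WINDOW = pd.Timedelta(minutes=60)   # assumed window, for illustration only

taps = taps.sort_values(["card_id", "tap_time"])
taps["gap"] = taps.groupby("card_id")["tap_time"].diff()

# A tap on a different route within the window of the previous tap is treated as
# a transfer; any other tap starts a new linked trip.
taps["is_transfer"] = (taps["gap"] <= TRANSFER_WINDOW) & (
    taps["route"] != taps.groupby("card_id")["route"].shift())
taps["trip_id"] = (~taps["is_transfer"]).groupby(taps["card_id"]).cumsum()

linked_trips = taps.groupby(["card_id", "trip_id"]).agg(
    boarding_stop=("stop_id", "first"),
    routes_used=("route", list),
    transfers=("is_transfer", "sum"))

print(linked_trips["transfers"].value_counts())
```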
Triage – Making Data Usable
Aggregation – resolution and limits of existing analytical tools/methods
Cleaning – you can’t check every data point: do an initial spot check, then clean as you go if you find discrepancies
Sampling biases – not all big data is truly random: compare non-random to random sources whenever possible (see the sketch below) and declare the data’s biases when using it
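The “compare non-random to random sources” check can be as simple as lining up a distribution from the big-data source against the same distribution from a weighted random survey. A minimal sketch follows; file names, columns, and bin edges are assumptions.

```python
"""Sketch: compare trip-distance shares from a self-selected big-data source vs. a weighted survey."""
import pandas as pd

bins = [0, 1, 2, 5, 10, 25, float("inf")]              # miles; illustrative cut points
labels = ["0-1", "1-2", "2-5", "5-10", "10-25", "25+"]

big = pd.read_csv("app_trips.csv")                     # assumed column: distance_mi
survey = pd.read_csv("survey_trips.csv")               # assumed columns: distance_mi, weight

big_share = (pd.cut(big["distance_mi"], bins, labels=labels)
               .value_counts(normalize=True).sort_index())

survey_binned = pd.cut(survey["distance_mi"], bins, labels=labels)
survey_share = survey.groupby(survey_binned)["weight"].sum() / survey["weight"].sum()

comparison = pd.DataFrame({"big_data": big_share, "survey": survey_share})
comparison["difference"] = comparison["big_data"] - comparison["survey"]

# Large gaps flag where the self-selected sample over- or under-represents trips.
print(comparison.round(3))
```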