Using Google’s Aggregated and Anonymized Trip Data to Estimate Dynamic Origin-Destination Matrices for San Francisco
TRB Applications Conference 2017
Bhargava Sana, Joe Castiglione, Dan Tischler, Drew Cooper
SAN FRANCISCO COUNTY TRANSPORTATION AUTHORITY
May 16, 2017

Good afternoon! I am here to share our recent experience using a new passive data source from Google to estimate OD matrices for the San Francisco Bay Area.

OD Data Collection
Conventional methods: license plate surveys; roadside interviews
Emerging passive data collection methods: Bluetooth detectors; cell phone Call Detail Records (CDR) data; GPS, Wi-Fi detectors

Historically, trip OD information was available only through expensive and time-consuming surveys. Recently, the use of passive data collection technologies has been gaining momentum.

Google’s Better Cities Program
Aggregated and Anonymized Trip (AAT) Data
Minimize congestion, improve safety, and reduce infrastructure spending
Aggregated and Anonymized Trip (AAT) information from location reports
AAT creation process: extract data from moving users; clean data and snap to road network; aggregate OD trip counts; apply differential privacy filters and a minimum trips threshold; output AAT data

We partnered with Google’s Better Cities program, which made available aggregated and anonymized OD flow information derived from the location reports of mobile device users. The reports are based on data from users who chose to store location information from their Google-enabled devices on Google servers. Location storage is easy to turn on in account settings and records trips fairly accurately; users do not need to be actively navigating. (Describe the AAT creation process.)
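
As a rough illustration of the last two steps listed above (aggregating OD trip counts, then applying a privacy filter and a minimum trips threshold), here is a minimal Python sketch. It is not Google's actual procedure, which the slides do not detail. The function names, the choice of Laplace noise, the `epsilon` and `min_trips` parameters, and the toy trip records are all assumptions made for illustration, and the sketch assumes each user contributes at most one trip per origin-destination-hour cell.

```python
from collections import Counter

import numpy as np


def aggregate_od_counts(trips):
    """Count trips per (origin, destination, hour) cell."""
    return Counter(trips)


def release_counts(counts, epsilon, min_trips, seed=0):
    """Add Laplace noise to each cell, then drop cells below the threshold.

    Hypothetical helper: assumes each user adds at most one trip to a cell
    (sensitivity 1), so the Laplace noise scale is 1 / epsilon.
    """
    rng = np.random.default_rng(seed)
    released = {}
    for cell in sorted(counts):
        noisy = counts[cell] + rng.laplace(loc=0.0, scale=1.0 / epsilon)
        if noisy >= min_trips:  # suppress sparsely populated cells
            released[cell] = noisy
    return released


# Toy records: 100 trips from district A to B at 8 AM, and a single A-to-C trip.
trips = [("A", "B", 8)] * 100 + [("A", "C", 8)]
counts = aggregate_od_counts(trips)
released = release_counts(counts, epsilon=10.0, min_trips=5)
```

In this toy run the single A-to-C trip is suppressed by the threshold, while the A-to-B cell is released with a small amount of noise. A production system would need a more careful treatment (for example, bounding each user's total contribution) to make a formal privacy claim.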

Google AAT Dataset
Hourly AAT data for six months (Apr-Jun and Sep-Nov 2015)
Flow data provided as relative trips as opposed to absolute counts
Convert relative flows to trips using HH travel survey?

Hourly flow data were provided for six months in 2015. 85 districts were defined for the 9-county San Francisco Bay Area using a combination of tract and county boundaries. Google provided relative flows rather than absolute counts, stating that location reports account for only a sample of travelers. But we need trips for planning purposes, so could the relative flows be converted to trips using the regional household (HH) travel survey? A quick comparison with CHTS flows shows that the relative magnitude of the AAT flows is reasonable. CHTS is sparse, with data for less than half of the 7,225 (85 × 85) OD district pairs, whereas AAT has data for all of them.

Relative Flow Conversion Model
$PT_{odt} = \beta_t \, RF_{odt} + \epsilon, \quad 0 \le t \le 23$
$PT_{odt}$: person trips between o-d for hour-of-day t
$\beta_t$: coefficient of AAT relative flow for t
$RF_{odt}$: average AAT flow between o-d for hour-of-day t
$\epsilon$: error term
No geographic constants applied
District- and county-level regression models estimated
20% sample used for validation

The relationship between relative flows and trips appeared to vary by geography as well as by time of day. However, only time-of-day (TOD) coefficients were used, so that the conversion model is independent of OD markets. Both district-level and county-level regression-through-the-origin (RTO) models were estimated. A 20% holdout sample was used for validation in each case.
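
The conversion model above can be sketched in a few lines of Python. For a regression through the origin, the least-squares coefficient for each hour has the closed form $\beta_t = \sum RF_{odt} PT_{odt} / \sum RF_{odt}^2$, summed over the OD pairs observed in that hour. The sketch below uses synthetic data, not the AAT or CHTS data. The function names, the random 80/20 split, and the generated values are assumptions for illustration and not necessarily how the study was implemented.

```python
import numpy as np


def fit_rto_by_hour(rf, pt, hour):
    """Fit PT = beta_t * RF (no intercept) separately for each hour 0..23."""
    betas = np.full(24, np.nan)
    for t in range(24):
        mask = hour == t
        denom = np.sum(rf[mask] ** 2)
        if denom > 0:
            # Closed-form least-squares slope for regression through the origin.
            betas[t] = np.sum(rf[mask] * pt[mask]) / denom
    return betas


def split_holdout(n, frac=0.2, seed=0):
    """Return (train_idx, val_idx) with roughly `frac` of rows held out."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_val = int(round(frac * n))
    return idx[n_val:], idx[:n_val]


# Synthetic origin-destination-hour records (assumed values, not real data).
rng = np.random.default_rng(42)
n = 5000
hour = rng.integers(0, 24, size=n)
rf = rng.uniform(0.0, 1.0, size=n)            # relative flows
true_beta = np.linspace(100.0, 300.0, 24)      # one coefficient per hour
pt = true_beta[hour] * rf + rng.normal(0.0, 1.0, size=n)  # person trips

train, val = split_holdout(n)
betas = fit_rto_by_hour(rf[train], pt[train], hour[train])
pred_val = betas[hour[val]] * rf[val]          # predictions on the 20% holdout
```

Because the coefficients depend only on the hour, the same `betas` can be applied to any OD pair, including pairs with no survey observations.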

Predicted vs Observed (Origin-Destination-Hour)

Scatter plots of predicted vs observed hourly OD flows show that the models perform reasonably well, except for the county model on the validation sample. Note that each point is an origin-destination-hour. RMSE and mean absolute error (MAE) show that the county model is, unsurprisingly, better because of its higher spatial aggregation.
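
For reference, the two error measures mentioned above can be computed as follows. This is a minimal sketch: the function names and the four sample values are made up for illustration and are not results from the study.

```python
import numpy as np


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted flows."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((predicted - observed) ** 2)))


def mae(observed, predicted):
    """Mean absolute error between observed and predicted flows."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(predicted - observed)))


# Each entry stands for one origin-destination-hour (illustrative values only).
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 40.0]

print(rmse(observed, predicted))
print(mae(observed, predicted))  # 1.75
```

RMSE penalizes large errors more heavily than MAE, so reporting both gives a sense of whether a few OD pairs dominate the error.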

Summary
AAT relative flow magnitudes correlated with actual trips
AAT geographic coverage significantly higher
Simple linear regression model may be used for conversion
Could support measuring longitudinal variation
Further studies: better/smooth survey data; compare with cell CDR data