Sample Expansion for Transit Rider Surveys Experiments with Two Rail Systems.

Slides:



Advertisements
Similar presentations
Data Analysis for Two-Way Tables
Advertisements

Contingency Tables Chapters Seven, Sixteen, and Eighteen Chapter Seven –Definition of Contingency Tables –Basic Statistics –SPSS program (Crosstabulation)
Learning Objectives 1 Copyright © 2002 South-Western/Thomson Learning Data Processing and Fundamental Data Analysis CHAPTER fourteen.
STRATEGIES FOR ORGANIZATION, VALIDATION AND DISTRIBUTION OF TRANSIT GEOGRAPHIC INFORMATION SYSTEMS DATA Jonathan Wade Manager, Service Development Support.
Smith Myung, Cambridge Systematics Sean McAtee, Cambridge Systematics Cambridge Systematics.
An Approach for Base Transit Trip Matrix Development: Sound Transit EMME/2 Model Experience Sujay Davuluri Parsons Brinckerhoff Inc., Seattle October,
Maths for Computer Graphics
CS 8751 ML & KDDEvaluating Hypotheses1 Sample error, true error Confidence intervals for observed hypothesis error Estimators Binomial distribution, Normal.
ARCGICE WP 1.4 ERROR ESTIMATES IN SPATIAL AND SPECTRAL DOMAINS C.C.Tscherning, University of Copenhagen,
Aaker, Kumar, Day Ninth Edition Instructor’s Presentation Slides
An Application of Mitigating Flow Bias from Origin/Destination Surveys in a Transit System Jamie Snow (AECOM) David Schmitt (AECOM) May 20, 2015.
Interfacing Regional Model with Statewide Model to Improve Regional Commercial Vehicle Travel Forecasting Bing Mei, P.E. Joe Huegy, AICP Institute for.
ARCGICE WP 5.1 Leader: UCL Synthesis report on benefit of combining data C.C.Tscherning, University of Copenhagen,
February 15, 2006 Geog 458: Map Sources and Errors
8/2/2015Slide 1 SPSS does not calculate confidence intervals for proportions. The Excel spreadsheet that I used to calculate the proportions can be downloaded.
Use of Truck GPS Data for Travel Model Improvements Talking Freight Seminar April 21, 2010.
KY 4/22 Module 1b Chapter 3 in the TS Manual Main Survey Types.
Learning Objective Chapter 13 Data Processing, Basic Data Analysis, and Statistical Testing of Differences CHAPTER thirteen Data Processing, Basic Data.
Arun Srivastava. Types of Non-sampling Errors Specification errors, Coverage errors, Measurement or response errors, Non-response errors and Processing.
Transportation leadership you can trust. presented to presented by Cambridge Systematics, Inc. Development of a Hybrid Freight Model from Truck Travel.
The Impact of Convergence Criteria on Equilibrium Assignment Yongqiang Wu, Huiwei Shen, and Terry Corkery Florida Department of Transportation 11 th Conference.
Presented to MTF Transit Committee presented by Scott Seeburger, Myung Sung, Dave Schmitt & Peter Haliburton November 20, Tri-Rail On-Board Survey.
© 2008 McGraw-Hill Higher Education The Statistical Imagination Chapter 13: Nominal Variables: The Chi-Square and Binomial Distributions.
Sampling Rates for Transit Rider Surveys An Initial Analysis 1.
Section 3.1B Other Sampling Methods. Objective: To be able to understand and implement other sampling techniques including systematic, stratified, cluster,
Matrix Arithmetic. A matrix M is an array of cell entries (m row,column ) and it must have rectangular dimensions (Rows x Columns). Example: 3x x.
Chi-Square Procedures Chi-Square Test for Goodness of Fit, Independence of Variables, and Homogeneity of Proportions.
Best Practices in Transit Rider Survey Data Collection Chris Tatham Sr. Vice President, CEO, ETC Institute 725 W. Frontier Circle Olathe, KS
Presented to MTF Transit Committee presented by David Schmitt, AICP November 20, 2008 FSUTMS Transit Survey Applied Research.
Random Sampling. Introduction Scientists cannot possibly count every organism in a population. One way to estimate the size of a population is to collect.
Aim: How do we analyze data with a two-way table?
Table 5 and Follow-up Collecting and Reporting Follow-up Data Or What’s Behind Table 5? American Institutes for Research February 2005.
May 2009TRB National Transportation Planning Applications Conference 1 PATHBUILDER TESTS USING 2007 DALLAS ON-BOARD SURVEY Hua Yang, Arash Mirzaei, Kathleen.
Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 1 In an observational study, the researcher observes values of the response variable and explanatory.
Chapter Outline Goodness of Fit test Test of Independence.
Matrices: Simplifying Algebraic Expressions Combining Like Terms & Distributive Property.
Estimate of Overseas Travellers to Northern Ireland through RoI ports Comparison of PCI and SOT.
7: The Logic of Sampling. Introduction Nobody can observe everything Critical to decide what to observe Sampling –Process of selecting observations Probability.
Topics Survey data comparisons to “old” model Bluetooth OD data findings Freight model design Q & A.
Exploring Microsimulation Methodologies for the Estimation of Household Attributes Dimitris Ballas, Graham Clarke, and Ian Turton School of Geography University.
Copyright © Cengage Learning. All rights reserved. 2 SYSTEMS OF LINEAR EQUATIONS AND MATRICES.
Lecture PowerPoint Slides Basic Practice of Statistics 7 th Edition.
Tutorial I: Missing Value Analysis
GG 313 beginning Chapter 5 Sequences and Time Series Analysis Sequences and Markov Chains Lecture 21 Nov. 12, 2005.
Theory of Errors in Observations Chapter 3 (continued)
What is a Matrices? A matrix is a rectangular array of data entries (elements) displayed in rows and columns and enclosed in brackets. The number of rows.
Hua Yang Arash Mirzaei Zhen Ding North Central Texas Council of Governments Travel Model Development and Data Management.
Presented to Toll Modeling Panel presented by Krishnan Viswanathan, Cambridge Systematics, Inc.. September 16, 2010 Time of Day in FSUTMS.
Class Seven Turn In: Chapter 18: 32, 34, 36 Chapter 19: 26, 34, 44 Quiz 3 For Class Eight: Chapter 20: 18, 20, 24 Chapter 22: 34, 36 Read Chapters 23 &
DATA MINING: CLUSTER ANALYSIS (3) Instructor: Dr. Chun Yu School of Statistics Jiangxi University of Finance and Economics Fall 2015.
Applications in Mobile Technology for Travel Data Collection 2012 Border to Border Transportation Conference South Padre Island, Texas November, 13, 2012.
CHAPTER 13 Data Processing, Basic Data Analysis, and the Statistical Testing Of Differences Copyright © 2000 by John Wiley & Sons, Inc.
Marketing Research Aaker, Kumar, Leone and Day Eleventh Edition
13.4 Product of Two Matrices
Matrix Operations.
Introduction The two-sample z procedures of Chapter 10 allow us to compare the proportions of successes in two populations or for two treatments. What.
Matrix Operations.
Matrix Multiplication
Matrix Operations.
Regression.
WIFI Data Collection and the Effectiveness of Various Survey Expansion Techniques- A Case Study on I-95 Corridor in Palm Beach County, FL Presented to.
Matrices Elements, Adding and Subtracting
Chap. 7 Regularization for Deep Learning (7.8~7.12 )
MATRICES MATRIX OPERATIONS.
Overview and Chi-Square
Data Processing, Basic Data Analysis, and the
TransCAD Working with Matrices 2019/4/29.
Inference for Two Way Tables
Transit Survey White Paper
Correspondence Analysis
Presentation transcript:

Sample Expansion for Transit Rider Surveys Experiments with Two Rail Systems

Motivations FTA requirements for rider surveys – Model validation for major-project forecasting – “Before” and “After” rider surveys for projects Frequently asked questions: – What is best practice for sample expansion? – Are special on-to-off “surveys” worth the effort? 2

Motivations (continued) Sample expansion – Computed weight for each survey record: actual trips in the cell – Weight = useable survey records in the cell – Where the cell is an element of the sampling frame By route By route, direction, and time of day (TOD) By route, TOD, direction, and boarding stop By route, TOD, and on-to-off (or entry-to-exit) locations 3

Experimental design Find rider surveys for which station-to-station trip flows are known from fare-gate data Compare accuracy of survey records expanded: 1.To entry-station counts 2.To entry-station to exit-station flows estimated through matrix filling with: Seed matrix = 1.0 for all cells Seed matrix = tabulated survey records Seed matrix = N% sample of fare-gate records 4

Survey datasets Washington DC Metrorail system (2008) – 58,100 records from 750,400 daily entries (7.7%) – Paper survey forms with self enumeration – After-the-fact tabulation of the data Atlanta GA rail system (2009/2010) – 22,900 records from 195,500 daily entries (11.7%) – Personal interviews with tablets – Active sample monitoring and deficit repairs 5

1. Expansion to entry-station counts Estimated v. actual exits: Washington 6 AM MD PM EV

Expansion to entry-station counts (cont’d) Bias in estimated exits: Washington 7 Plotted points are the 86 rail stations R Squared =

Bias in response rates : Washington 8 R Squared =

Expansion to entry-station counts (cont’d) Bias in estimated exits: Atlanta 9 R Squared = 0.022

Bias in response rates: Atlanta 10 R Squared = 0.002

Expansion to entry-station counts (cont’d) Estimated v. actual flows: Washington 11 AM MD PM EV

2. Expansion to station-to-station flows Cells in sample designed defined jointly by: – Entry station and exit station by time period – Expansion weight computed for records in each cell – Number of actual trips divided by number of records Method to estimate actual trips in each cell – Matrix filling via iterative proportional fitting (IPF) – Counts of station entries and exits  targets for rows and columns – Seed matrix: Survey records? Sample of flows? 12

13 Entry counts only IPF: seed = 1’s IPF: seed = survey records IPF: seed = survey; 0.1 infills IPF: seed = survey; 1.0 infills IPF: seed = 20% flow sample Method for Estimating Station-to-Station Flows 109% 39% 2. Expansion to station-to-station flows (cont’d) Percent RMSE by time period: Washington

14 39% 2. Expansion to station-to-station flows (cont’d) Percent RMSE by time period: Washington Sampling Rate for the Sample of Entry-to-Exit Flows 20%

2. Expansion to station-to-station flows (cont’d) Sparseness of the flow table: Washington Actual flows have relatively few empty cells Low sampling rates lead to sparse seed matrices Main effect is to empty out cells with 1-5 trips – Actual:28,466 trips – 5%:0 trips – 20%:20,101 trips – 50%:24,127 trips 15 Percent of station-to-station cells with specified number of trips Sample rate 0 trips 1-5 trips 6-10 trips >10 trips 100%8%38%15%39% 5%64%0% 36% 10%51%0%10%39% 15%44%1%18%38% 20%38%13%11%38% 25%35%17%10%38% 30%32%17%13%38% 50%24%25%13%38% 80%18%33%12%38%

2. Expansion to station-to-station flows (cont’d) Comparison for AM flows: Washington 16 Records expanded with counts of station entries Records expanded with flows developed from a 20% sample

17 Records expanded with counts of station entries Records expanded with flows developed from a 20% sample 2. Expansion to station-to-station flows (cont’d) Comparison for AM flows: Atlanta

1.Expansion to station-entry counts – Errors in estimates of flows and station exits – Income-related biases in response rates Evident with paper-based, post-processed survey Much less evident with combination of: – Tablet-assisted personal interviews, – Sample design based on fare-gate flows, and – Real-time monitoring to ensure realization of the sample design 18 Observations

2.Expansion to station-to-station flows – Adds important information: station exits – IPF application improves estimate flows significantly Seed matrix from survey records retains survey biases Seed matrix from independent sample of flows – 20-30% sample effective for these two systems – Larger samples marginally better 19 Observations

Method in the field matters Second dimension helpful in all cases Sample of flows helps – With sample design and management – With sample expansion Extension to smaller-ridership cases – Similar sample design, management, expansion – Sparseness  grouping of stations/stops Implications for rider surveys 20

Thank you. Questions?