Evaluating Forecast Demographic Scenarios Using Population Synthesis and Data Simulation
Joshua Auld
CTS IGERT Seminar
June 25, 2009
Overview
–Introduction
–Population Synthesis
–Forecasting Marginal Variables
–Travel Data Simulation Model
–Scenario Analysis
–ITA Analysis
–Conclusions
Introduction
Travel demand forecasting:
–Typically done at long time horizons (20 years, 30 years, etc.)
–Requires forecast demographics to forecast demand
–Many ways to obtain them (expert opinion, trend lines, land-use models, etc.)
Move to activity-based models (ABMs):
–Require synthetic populations
–Synthetic individuals are used as agents in the ABM simulation
–Travel patterns of all agents are summed to give demand
Data requirements for population synthesis:
–Household/individual sample data: the joint distribution
–Marginal data: small-area distributions of single variables
Introduction (continued)
For forecast synthetic populations:
–Same data requirements as the base year
–Data often do not exist (there are no observed data 30 years in the future)
Solutions for data problems:
–Usually use the base-year sample directly as the seed
–Update the base-year marginals
–This gives the population distribution closest to the base year that still matches the forecast marginals
Forecasting marginals can be done in several ways:
–Full, integrated land-use model (UrbanSim, PECAS, etc.)
–Proportional updating (assume the same marginal distributions), a common approach for many agencies
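Proportional updating, listed above, holds each zone's base-year category shares fixed and rescales them to the forecast total. A minimal sketch, assuming each zone's marginal is stored as an array of category counts; the function name `proportional_update` and the example counts are illustrative only.

```python
import numpy as np


def proportional_update(base_marginal, forecast_total):
    """Scale a zone's base-year category counts to a forecast total.

    The base-year share of each category is held fixed (the proportional
    updating assumption); only the zone total changes.
    """
    base = np.asarray(base_marginal, dtype=float)
    total = base.sum()
    if total == 0:
        # No base-year households: there are no shares to carry forward.
        return np.zeros_like(base)
    return base / total * forecast_total


# Illustrative household-size marginal (1, 2, 3-4, 5+) for one zone,
# 400 base-year households growing to 600 forecast households.
forecast = proportional_update([120, 150, 100, 30], 600)  # 180, 225, 150 and 45
```

Because the shares never move, this approach cannot represent shifts such as shrinking household sizes, which is the limitation the forecasting model is meant to address.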
Introduction (continued)
Our approach:
–Combine forecasting models, expert opinion/scenario analysis and proportional updating
Forecasting models:
–Estimate marginal distributions for household size and number of workers
–Based on limited information (number of households and employees per zone)
Expert opinion/scenario analysis:
–For marginals of interest that are difficult to predict
–Allows marginals to be varied by the analyst
–Easy-to-use scenario definition tool with direct manipulation of marginal distributions
Useful where forecast information is limited
Objectives of Current Work
To demonstrate:
–Use of a flexible population synthesizer/scenario evaluation tool
–Combining a forecast population with a data transferability model to synthesize forecast travel attributes
–The impact of forecast population changes on several travel demand variables
–NOT to make realistic travel demand/demographic predictions (left to the planning agency)
Using Population Synthesis in ITA Evaluation
In addition to use in travel demand modeling:
–Improve ITA communications simulation
–Market analysis
–ITA system performance
In conjunction with ITA adoption and usage models:
–Where are the people who will use the ITA?
–Where are they coming from and going to?
–How will these patterns impact ITA performance?
–Evaluate estimated individual/system benefits
Population Synthesis Program
Base Population Synthesis Program
–Link sample data geography to marginal data
–Choose up to six control variables
–Define the categories (link between sample data and marginal data)
–Apply weighting
–Specify a test variable to estimate the fit of various forecast populations
Population Synthesis Methodology
For each Pums in Pums_List:
    Fill Pums.HH_List and Pums.PER_List from sample data
    Initialize Pums.HH_MWay and Pums.PER_MWay
    Run IPF to fit Pums.HH_MWay and Pums.PER_MWay to Pums.Marginals
    For each BG in Pums.BG_List:
        Seed BG.HH_MWay and BG.PER_MWay from Pums
        Run IPF to fit BG.HH_MWay and BG.PER_MWay to BG.Marginals
        For each HH in Pums.HH_List:
            For i = 0 to BG.HH_MWay(cell number of HH in MWay):
                Get probability of adding household = f(HHtype, HH_MWay, PER_MWay)
                If HH added:
                    Update BG.HH_MWay, BG.PER_MWay, N remaining
                    Write HH.Data with BG.ID
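The IPF and selection steps above can be sketched in Python. This is a simplified illustration, not the synthesizer's implementation: the hypothetical `ipf` fits a single N-way table to one-dimensional marginals (the program fits household and person tables and controls both during selection), and the hypothetical `draw_households` replaces the probability function f(HHtype, HH_MWay, PER_MWay) with uniform draws within each cell.

```python
import numpy as np


def ipf(seed, marginals, tol=1e-6, max_iter=500):
    """Fit an N-way seed table to one 1-D target marginal per axis."""
    table = np.asarray(seed, dtype=float).copy()
    targets = [np.asarray(m, dtype=float) for m in marginals]

    def axis_sums(t, axis):
        # Collapse every dimension except `axis`.
        return t.sum(axis=tuple(a for a in range(t.ndim) if a != axis))

    for _ in range(max_iter):
        for axis, target in enumerate(targets):
            current = axis_sums(table, axis)
            # Categories with no sample support stay at zero.
            factor = np.divide(target, current,
                               out=np.zeros_like(current), where=current > 0)
            shape = [1] * table.ndim
            shape[axis] = -1
            table *= factor.reshape(shape)
        if all(np.allclose(axis_sums(table, ax), t, atol=tol)
               for ax, t in enumerate(targets)):
            break
    return table


def draw_households(fitted, hh_cells, rng):
    """Pick sample household indices so each cell holds round(fitted) households.

    Simplification: households in a cell are drawn uniformly with replacement,
    and plain rounding can leave totals off by a few households.
    """
    chosen = []
    counts = np.rint(fitted).astype(int)
    for cell, target in np.ndenumerate(counts):
        members = [i for i, c in enumerate(hh_cells) if tuple(c) == cell]
        if target > 0 and members:
            chosen.extend(rng.choice(members, size=target, replace=True).tolist())
    return chosen
```

For example, `ipf(np.ones((2, 2)), [[3, 7], [4, 6]])` fits a uniform 2 x 2 seed (e.g., household size by workers) to row totals 3 and 7 and column totals 4 and 6.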
Forecasting Control Variables
–Input the required base-year and forecast-year zonal data
–Link control variable categories to forecast categories (4 household-size categories, 3 number-of-workers categories)
–Generate forecast marginals by either proportional updating or the forecast model
Scenario Definition
–Select the sub-regions to which changes are applied
–Select the control variable to modify
–Adjust the variable's marginal distribution
–Multiple selections and multiple modified variables are allowed
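Linking categories can be sketched as a lookup from the synthesizer's categories to the forecast model's coarser groups, with forecast group totals split back into fine categories using base-year shares within each group. The 1 to 7+ household-size coding, the `HHSIZE_LINK` table and both functions are hypothetical; only the forecast groups 1, 2, 3-4 and 5+ come from the SURE model's categories.

```python
# Hypothetical link: synthesizer household-size categories (7 = 7 or more)
# mapped onto the forecast model's household-size groups.
HHSIZE_LINK = {1: "1", 2: "2", 3: "3-4", 4: "3-4", 5: "5+", 6: "5+", 7: "5+"}


def aggregate(base_fine, link):
    """Sum fine-category counts into forecast groups."""
    groups = {}
    for category, count in base_fine.items():
        groups[link[category]] = groups.get(link[category], 0) + count
    return groups


def disaggregate(forecast_groups, base_fine, link):
    """Split forecast group totals into fine categories by base-year shares."""
    out = {}
    for group, total in forecast_groups.items():
        members = [c for c, g in link.items() if g == group]
        base_sum = sum(base_fine[c] for c in members)
        for c in members:
            # Fall back to an even split if the group was empty in the base year.
            share = base_fine[c] / base_sum if base_sum > 0 else 1 / len(members)
            out[c] = total * share
    return out
```

Splitting by base-year within-group shares keeps the detail the synthesizer needs while letting the forecast model change only the coarse distribution.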
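A scenario adjustment of this kind can be sketched as reweighting selected categories in the selected zones. The sketch assumes that zone totals are preserved so that only the distribution changes, and that the analyst supplies per-category multipliers; `apply_scenario` and its arguments are hypothetical, not the tool's interface.

```python
import numpy as np


def apply_scenario(marginals, zones, factors):
    """Return adjusted copies of zone marginals.

    marginals: dict of zone id -> array of category counts
    zones: ids of the zones in the selected sub-region
    factors: per-category multipliers (e.g., > 1 for older age groups)
    Zone totals are preserved; only the distribution shifts.
    """
    factors = np.asarray(factors, dtype=float)
    adjusted = {}
    for zone, counts in marginals.items():
        counts = np.asarray(counts, dtype=float)
        if zone in zones:
            scaled = counts * factors
            if scaled.sum() > 0:
                # Renormalize so the zone keeps its original total.
                counts = scaled * (counts.sum() / scaled.sum())
        adjusted[zone] = counts
    return adjusted
```

Zones outside the selection are returned unchanged, so several sub-region selections can be applied in sequence.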
Performance Comparison
Our synthesizer:
–Nearly exact matching of household-level marginals
–Close matching of person-level marginals (undercount of large household sizes; group quarters not included)
–Tested on the Chicago region: 2.9 million households, 7.8 million people (within 2%)
–3 household controls, 3 person controls (MWay sizes of 560 and 112)
–Run time of 123 minutes
Guo and Bhat (2007):
–Tested on Dallas/Fort Worth: 5 household controls, 3 person controls (MWay sizes of 336 and 140)
–Introduces slack in the selection procedure, so marginals are not matched
–No performance characteristics given
Ye et al.:
–Tested on Maricopa County (Phoenix): 1.1 million households, 3.1 million people
–3 household controls, 3 person controls (MWay sizes of 280 and 140)
–Appears to match distributions well, using a heuristic weight-setting procedure
–Run time of 16 hours
Forecasting Control Variable Distributions
Forecasting
Forecasting is often done by proportional updating:
–Assume the same marginal distribution in the forecast year
However, marginals change over time:
–e.g., changes in population, households, housing, etc. lead to changes in household size
–Census data show that marginal distributions are not constant
–The distribution of each marginal should therefore change
Need a model of marginal changes:
–Only for certain variables (household size and number of workers in this study)
–Need data that drive the marginal changes
–Changes in income, race, etc. are not modeled; they are handled through scenario definition
SURE Forecasting Model
SURE (seemingly unrelated regression equations) model of marginal changes:
–System of linear regression equations
–Equations related only through correlated error terms
–Accounts for cross-equation correlations
–Δ(HH, Emp) → ΔHHsize=1, ΔHHsize=2, etc.
–Estimate changes in the household-size and number-of-workers categories
Model specification:
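In generic notation, a SURE system of this kind can be written as follows (a standard statement with assumed symbols, not the exact equations estimated here). Zones are z = 1, ..., n; k = 1, ..., K indexes the category equations; superscripts b and f denote base and forecast year; HH^b_z is the base-year total households in zone z; y and ε stack the y_k and ε_k, and X is block-diagonal in the X_k:

$$
y_{kz} = \frac{HH^{f}_{kz} - HH^{b}_{kz}}{HH^{b}_{z}}, \qquad
y_{k} = X_{k}\beta_{k} + \varepsilon_{k}, \quad k = 1,\dots,K
$$

$$
\mathrm{E}\!\left[\varepsilon_{k}\varepsilon_{l}^{\top}\right] = \sigma_{kl} I_{n}
\quad\Longrightarrow\quad
\Omega = \operatorname{Var}(\varepsilon) = \Sigma \otimes I_{n}
$$

$$
\hat{\beta}_{\mathrm{FGLS}} = \left[X^{\top}\left(\hat{\Sigma}^{-1} \otimes I_{n}\right)X\right]^{-1} X^{\top}\left(\hat{\Sigma}^{-1} \otimes I_{n}\right) y
$$

Here $\hat{\Sigma} = [\hat{\sigma}_{kl}]$ is estimated from equation-by-equation OLS residuals. When every equation has the same regressors, the estimator reduces to equation-by-equation OLS.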
SURE Forecasting Model: Explanatory Variables
Dependent variables are the change in households in each category:
–HHsize=1, HHsize=2, HHsize=3-4, HHsize=5+
–NumWorkers=0-1, NumWorkers=2+, NumWorkers=NA (non-family)
–All dependent variables are normalized by the base-year total households, i.e., the change in households with HHsize=i per base-year household
Independent variables include:
–Total households in zone, base and forecast
–Total employment in zone, base and forecast
–Household density, base and forecast
–Base-year demographics
–Base-year land-use mix (% of area devoted to single-family)
–Job accessibility, base and forecast (computed with base-year LOS/mode split)
SURE Forecasting Model: HH Size Results
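One standard way to estimate such a system is two-step feasible GLS; the estimator actually used in the study is not stated, so this is a sketch under that assumption. Each equation k would hold one category's normalized change per zone (y_k) and that equation's explanatory variables (X_k); `sur_fgls` and `_block_diag` are hypothetical helpers built only on NumPy.

```python
import numpy as np


def _block_diag(mats):
    """Place design matrices on the diagonal of one large matrix."""
    rows = sum(m.shape[0] for m in mats)
    cols = sum(m.shape[1] for m in mats)
    out = np.zeros((rows, cols))
    r = c = 0
    for m in mats:
        out[r:r + m.shape[0], c:c + m.shape[1]] = m
        r += m.shape[0]
        c += m.shape[1]
    return out


def sur_fgls(ys, Xs):
    """Two-step feasible GLS for a seemingly unrelated regression system.

    ys: list of K response vectors, all of length n (one value per zone)
    Xs: list of K design matrices, each n x p_k (include a constant column)
    Returns (list of K coefficient vectors, K x K residual covariance).
    """
    ys = [np.asarray(y, dtype=float) for y in ys]
    Xs = [np.asarray(X, dtype=float) for X in Xs]
    n, K = len(ys[0]), len(ys)

    # Step 1: equation-by-equation OLS residuals.
    resid = np.empty((n, K))
    for k, (y, X) in enumerate(zip(ys, Xs)):
        beta_k, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid[:, k] = y - X @ beta_k
    sigma = resid.T @ resid / n  # cross-equation error covariance

    # Step 2: GLS on the stacked system with Omega^-1 = Sigma^-1 kron I_n.
    # The dense Kronecker product is fine for small n, wasteful for many zones.
    X_big = _block_diag(Xs)
    y_big = np.concatenate(ys)
    omega_inv = np.kron(np.linalg.inv(sigma), np.eye(n))
    xtw = X_big.T @ omega_inv
    beta_big = np.linalg.solve(xtw @ X_big, xtw @ y_big)

    # Split the stacked coefficients back into one vector per equation.
    splits = np.cumsum([X.shape[1] for X in Xs])[:-1]
    return np.split(beta_big, splits), sigma
```

The efficiency gain over separate OLS comes from correlated errors combined with different regressors per equation; with identical regressors the two estimators coincide.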
SURE Forecasting Model: Number of Workers Results
SURE Forecasting Model Validation
Validation run for the household-size and number-of-workers models:
–Run using unseen data (1980)
–Validation forecast: 1980 to 2000
–Compared against results from proportional updating
–Shows moderate improvement (~10%) in R² and RMSE
HH size validation:
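The comparison against proportional updating can be sketched with the two reported metrics. This assumes R² is computed as 1 − SS_res/SS_tot on the validation-year values (the slide does not say which definition was used) and that the improvement is a relative RMSE reduction; all three function names are hypothetical.

```python
import numpy as np


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((o - p) ** 2)))


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (can be negative)."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    ss_res = np.sum((o - p) ** 2)
    ss_tot = np.sum((o - o.mean()) ** 2)
    return float(1 - ss_res / ss_tot)


def improvement(observed, model_pred, baseline_pred):
    """Relative RMSE reduction of the model over the proportional-updating baseline."""
    base = rmse(observed, baseline_pred)
    return (base - rmse(observed, model_pred)) / base
```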
Travel Data Simulation Model
Data simulation overview
Objective:
–Quick alternative to a travel demand model
–Generate joint disaggregate travel data at the household level
–Transfer data from the NHTS to the synthetic population
Travel attributes:
–Household total trips per day
–Household mandatory trips per day
–Household maintenance trips per day
–Household discretionary trips per day
–Household auto trips per day
Data simulation overview
Travel attribute generation models:
–32 explanatory variables, drawn from the NHTS and TIGER files, including:
–Household socio-demographic characteristics, e.g., age, income, occupation, education, ethnicity, etc.
–Built-environment variables, e.g., residential density, intersection density, transit use, etc.
Data simulation model
Travel attribute generation models:
–Models are decision trees with a maximum depth of three levels
–The decision trees were tested against observed travel data from the Des Moines add-on data and provided good fits
Simulation Model Validation
Travel attribute generation models:
–Probability density functions of household total trips per day in the Des Moines area: observed, transferred and national
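One way such a tree can transfer data is to assign each synthetic household to a leaf and draw an observed value from the NHTS households in the same leaf. This draw-from-leaf mechanism is an assumption, not necessarily the model's procedure; `transfer_by_leaf` is hypothetical. Leaf ids could come, for example, from scikit-learn's `DecisionTreeRegressor(max_depth=3)` and its `apply()` method, applied to both samples.

```python
from collections import defaultdict

import numpy as np


def transfer_by_leaf(source_leaves, source_values, target_leaves, rng):
    """Give each target household a value drawn from source households in its leaf.

    source_leaves: leaf id of each NHTS household under the fitted tree
    source_values: observed attribute (e.g., total trips/day) per NHTS household
    target_leaves: leaf id of each synthetic household under the same tree
    """
    pools = defaultdict(list)
    for leaf, value in zip(source_leaves, source_values):
        pools[leaf].append(value)
    out = np.empty(len(target_leaves))
    for i, leaf in enumerate(target_leaves):
        pool = pools.get(leaf)
        if not pool:
            raise ValueError(f"no source households in leaf {leaf}")
        out[i] = rng.choice(pool)  # uniform draw from the leaf's observations
    return out
```

Drawing from the leaf rather than assigning the leaf mean preserves the spread of trip counts, which matters when comparing simulated and observed distributions.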
Analysis Results
Scenarios Analyzed
Base year, forecast year and two scenarios analyzed for the six-county Chicago region
Four different synthetic populations generated:
–BY: 2000 (base year)
–FY: 2030 (forecast year)
–S1: 2030, high aging
–S2: 2030, high aging in the suburbs, lowered age in Chicago
Travel data indicators simulated for each scenario
Scenario Marginal Distributions
Selected scenario analysis results
Change in total trips/HH for S1 and S2 compared to FY (legend: increase, no change, decrease)
Selected scenario analysis results
Change in discretionary trips/HH for S1 and S2 compared to FY (legend: increase, no change, decrease)
Selected scenario analysis results
Change in auto share for S1 and S2 compared to FY (legend: increase, no change, decrease)
Scenario Analysis Results
Aggregate results for the whole region, Chicago and the suburbs:
–Aging decreases total trips and increases auto share overall
–In Chicago, both increased aging (S1) and decreased aging (S2) increase auto share
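The increase/no change/decrease maps can be sketched as a per-zone classification of the relative change in an indicator against FY. The ±5% dead band treated as "no change" is an assumed value (the slides do not give the thresholds), and `classify_change` is hypothetical.

```python
import numpy as np


def classify_change(scenario, forecast, tol=0.05):
    """Label each zone's change in a per-household indicator relative to FY.

    tol is an assumed dead band: relative changes within +/- tol count as
    "no change". Zones with a zero FY value are also labeled "no change".
    """
    s = np.asarray(scenario, dtype=float)
    f = np.asarray(forecast, dtype=float)
    rel = np.divide(s - f, f, out=np.zeros_like(f), where=f != 0)
    return np.where(rel > tol, "increase",
                    np.where(rel < -tol, "decrease", "no change"))
```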
ITA Analysis Demonstration
Under an assumed ITA adoption model (a binary choice model with made-up numbers):
–On average, 24% use an ITA
–Probability depends on gender and increases with income, travel time to work and having a degree
–Probability decreases with age
Plot the distribution of ITA users (density by block group)
ITA Usage Results
After applying the model to the synthetic population:
–ITA density per sq. mile for each block group in the Chicago area
High-density areas:
–North Side
–Loop
Low-density areas:
–SW suburbs
–South Side
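Applying a binary logit to the synthetic population and mapping the result can be sketched as below. Like the model on the slide, any coefficients passed in are made up; the function names are hypothetical. Expected users are summed from probabilities rather than simulated draws, which gives a smooth density without Monte Carlo noise.

```python
import numpy as np


def adoption_probability(X, beta):
    """Binary logit: P(use ITA) = 1 / (1 + exp(-X @ beta)).

    X: one row per synthetic person (constant, gender, income, commute time,
    degree, age, ...); beta: assumed, made-up coefficients.
    """
    utility = np.asarray(X, dtype=float) @ np.asarray(beta, dtype=float)
    return 1.0 / (1.0 + np.exp(-utility))


def users_per_sq_mile(prob, bg_ids, bg_area_sq_mi):
    """Expected ITA users per square mile for each block group."""
    totals = {}
    for p, bg in zip(prob, bg_ids):
        totals[bg] = totals.get(bg, 0.0) + p
    return {bg: totals.get(bg, 0.0) / area for bg, area in bg_area_sq_mi.items()}
```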
Conclusions
Conclusions and Discussion
Flexible, easy-to-use scenario analysis tool:
–Few limitations on geography/analysis variables
Allows:
–Accurate forecasts with minimal information requirements
–Quick scenario visualization/analysis
–Different scenarios applied to different sub-regions
–Multiple levels of control (household and person)
Useful for:
–Four-step travel demand models: reduces aggregation bias
–ABMs: synthesizes agents for microsimulation
–ITA analysis
Performance:
–Compares favorably to other population synthesizers (123 minutes for 2.9 million Chicago households, vs. 16 hours for 1.1 million Maricopa County households reported by Ye et al.)
Thank You! Questions?