Published by Ashlee Ford, modified over 9 years ago
1
What about the whole country? Extending the Activity-Based Person-Trip Synthesizer to all 330 million Americans
Judy Sun ’14 & Luke Cheng ’14
ORF467 F13
2
The Process
- Generate Schools
- Generate Employee Patronage File
- Assign Patronage
- Generate Patronage-Employee Ratios
- A Look at the Data
- Generate Census File (with Microsoft Access)
- NN Files through 7 NJ Modules by Jake and Talal
- Trip File Generator: out-of-state commuters, students, workplace assignment, assignment of 18 tour types (activity patterns), temporal dimension
3
Roadmap
- Schools Data
- Employee-Patronage Data
- A Look at the Data
- Census Data
- Further Steps
4
Schools Data
5
Public Schools in the US
6
Quick stats on Public Schools (2011)

School Type | # of Charter | # of Public | Total
Primary     |        2,584 |      51,793 |  54,377
Middle      |          615 |      16,332 |  16,947
High        |        1,316 |      19,762 |  21,078
Other       |        1,145 |       5,847 |   6,992
No Answer   |          564 |       3,525 |   4,089
Total       |        6,224 |      97,259 | 103,483
7
Public Schools: Enrollment

School Type |   Charter |     Public |      Total
Primary     |   896,544 | 23,226,606 | 24,123,150
Middle      |   166,519 |  9,425,155 |  9,591,674
High        |   368,109 | 13,767,489 | 14,135,598
Other       |   626,562 |  1,289,050 |  1,915,612
No Answer   |   (1,128) |    (7,016) |    (8,144)
Total       | 2,056,606 | 47,701,284 | 49,757,890
8
Private Schools in the US

Type      | Number of Schools
Primary   | 18,400
Secondary |  2,517
Combined  |  7,300
Total     | 28,217
9
Private Schools: Enrollment

Type      | # of Students
Primary   | 2,134,007
Secondary |   738,600
Combined  | 1,431,252
Total     | 4,303,859
10
Private Schools: School Size
11
Post-secondary Schools (2009)

Institution Type                          | # of Students Enrolled | % of Total Students | Number of Schools
Graduate                                  |                    291 |                  0% |               350
Primarily Baccalaureate                   |              1,483,018 |                 93% |             2,169
Primarily Non-Baccalaureate               |                 53,903 |                  3% |               623
Associate's                               |                 49,263 |                  3% |             1,745
Nondegree-granting, post-baccalaureate    |                     17 |                  0% |                14
Nondegree-granting, pre-baccalaureate     |                 10,960 |                  1% |             2,698
Total                                     |              1,597,452 |                100% |             7,735
12
Employee-Patronage Data
13
The Process
1. Start: 2012 InfoGroup US Businesses File (5.80 GB)
2. Shell script: split into 30 CSV files of 500,000 entries each (~200 MB each)
3. R script: patronage generation, data cleaning and mapping on the 30 CSV files (~115 MB each afterwards)
4. R script: split into 1,570 segmented state files (1 KB to 20 MB)
5. Python script: merge into 51 state files (8 MB to 390 MB)
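The split and merge steps above can be sketched in Python as one streaming pass. This is a minimal sketch, not the shell, R, or Python scripts the authors used: it assumes the businesses file is a CSV with a header row and a state-code column named `STATE` (a hypothetical name), and it collapses the 1,570 intermediate segmented files into direct appends to one file per state.

```python
import csv
import itertools
from collections import defaultdict
from pathlib import Path
from typing import Iterable, Iterator

CHUNK_ROWS = 500_000  # entries per chunk, as on the slide


def chunked(rows: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Yield consecutive lists of at most `size` rows."""
    it = iter(rows)
    while batch := list(itertools.islice(it, size)):
        yield batch


def group_by_state(rows: Iterable[dict], state_col: str = "STATE") -> dict[str, list[dict]]:
    """Bucket rows by state code (one in-memory 'segmented state file' per bucket)."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for row in rows:
        groups[row[state_col]].append(row)
    return dict(groups)


def write_rows(path: Path, rows: list[dict], append: bool) -> None:
    """Write rows to a CSV; the header is written only when a new file is started."""
    with path.open("a" if append else "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        if not append:
            writer.writeheader()
        writer.writerows(rows)


def split_and_merge(src: Path, out_dir: Path, chunk_rows: int = CHUNK_ROWS,
                    state_col: str = "STATE") -> dict[str, int]:
    """Stream the national file chunk by chunk, appending each chunk's rows to per-state files."""
    out_dir.mkdir(parents=True, exist_ok=True)
    counts: dict[str, int] = {}
    with src.open(newline="") as f:
        for batch in chunked(csv.DictReader(f), chunk_rows):
            # Per-chunk patronage generation and cleaning (the R step) would go here.
            for state, rows in group_by_state(batch, state_col).items():
                write_rows(out_dir / f"{state}.csv", rows, append=state in counts)
                counts[state] = counts.get(state, 0) + len(rows)
    return counts
```

Streaming keeps only one 500,000-row chunk in memory at a time, which matters for a 5.80 GB input.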
14
Patronage Generation
- Previous process: manual fine-tuning
- Inconsistent: businesses with the same NAICS code get different patronage/employee ratios
- Current process: employee size range and sales volume range
- Not perfect
- Data: match businesses (ZIP, county, NAICS, lat/long) within the same employee size range
- Assumption: sales volume is the same across time
- Trying to acquire the 2005 data for better correlations
- Ratios come from averaging the previous EP file
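The current process (averaging ratios from the previous EP file by employee size range and sales volume range) might look like the following. This is a minimal sketch under stated assumptions: the field names `emp_size_range`, `sales_range`, `employees`, and `patrons` are hypothetical, and records are plain dicts rather than the actual EP file layout.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical field names: emp_size_range, sales_range, employees, patrons.
Key = tuple[str, str]  # (employee size range, sales volume range)


def average_ratios(previous_ep: list[dict]) -> dict[Key, float]:
    """Mean patrons-per-employee ratio for each (size range, sales range) bucket."""
    buckets: dict[Key, list[float]] = defaultdict(list)
    for rec in previous_ep:
        if rec["employees"] > 0:  # skip records with no employees to avoid division by zero
            buckets[(rec["emp_size_range"], rec["sales_range"])].append(
                rec["patrons"] / rec["employees"])
    return {key: mean(vals) for key, vals in buckets.items()}


def assign_patrons(businesses: list[dict], ratios: dict[Key, float],
                   default_ratio: float = 0.0) -> list[dict]:
    """Return copies of the businesses with patrons = bucket ratio x employees."""
    return [
        {**b, "patrons": round(ratios.get((b["emp_size_range"], b["sales_range"]), default_ratio)
                               * b["employees"])}
        for b in businesses
    ]
```

Because every business in a bucket gets the same ratio, two businesses with different NAICS codes but the same size and sales ranges are treated identically, which is the limitation the next slides discuss.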
15
Comparison: Distributions
16
Conclusion: we need to use NAICS codes in addition to employee size and sales volume.
- A large number of ratios in the 0-1 range are offset by ratios in the 7-20 range, so averaging produces a spike of values around 4-5.
- It is difficult to capture these nuances with employee size and sales volume alone.
Next steps: manpower is needed to assign a ratio to each combination of NAICS code, sales volume, and employee size.
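One way to bring NAICS codes in without needing a hand-assigned ratio for every combination is a lookup that falls back from specific to general. This is a hypothetical sketch, not the authors' method: the two lookup tables are assumed inputs, and the fallback relies on NAICS codes being hierarchical (a 2-digit sector down to a 6-digit industry), ending at the current size/sales-only average.

```python
# Hypothetical lookup tables:
#   by_naics[(naics_prefix, size_range, sales_range)] -> ratio
#   by_size_sales[(size_range, sales_range)]          -> ratio (current process)


def lookup_ratio(naics: str, size_range: str, sales_range: str,
                 by_naics: dict[tuple[str, str, str], float],
                 by_size_sales: dict[tuple[str, str], float],
                 default: float = 0.0) -> float:
    """Most specific patrons/employee ratio available for one business."""
    # Try the full NAICS code first, then shorter (coarser) prefixes down to 2 digits.
    for n in range(len(naics), 1, -1):
        key = (naics[:n], size_range, sales_range)
        if key in by_naics:
            return by_naics[key]
    # No NAICS-specific entry: use the size/sales-only average.
    return by_size_sales.get((size_range, sales_range), default)
```

With this design, ratios can be filled in first for a few high-volume sectors at the 2-digit level and refined to 6-digit industries as time allows.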
17
A Look at the Data
18
NJ Counties (Change in NJ EP File)
[Figure panels: Uncensored; Un-named Removed]
19
NJ-Wide Change

Metric            | Uncensored | Un-named Removed
No. Businesses    |    +73,500 |          +39,350
Total Employees   |      +4.8M |            +4.8M
Avg Employee Size |      +7.85 |            +9.09
Total Patrons     |      -4.9M |            -5.3M
Avg Patrons       |     -17.17 |           -16.29
20
Nation-Wide

Rank | State         | Sales Volume | No. Businesses | Total Employees | Avg Employee Size | Total Patrons | Average Patrons
1    | California    | $1,889       | 1,579,342      | 23,518,022      | 14.89             | 36,820,129    | 23.31
2    | Texas         | $2,115       |   999,331      | 17,624,235      | 17.64             | 24,846,695    | 24.86
3    | Florida       | $1,702       |   895,586      | 12,331,524      | 13.77             | 21,231,864    | 23.71
4    | New York      | $1,822       |   837,773      | 18,327,933      | 21.88             | 19,610,813    | 23.41
5    | Pennsylvania  | $2,134       |   550,678      | 10,498,442      | 19.06             | 13,704,903    | 24.89
9    | New Jersey    | $1,919       |   428,596      |  8,833,890      | 20.61             |  9,986,529    | 23.30
45   | Washington DC | $1,317       |    49,488      |  5,702,617      | 115.23            |  1,067,938    | 21.58
47   | Rhode Island  | $1,814       |    46,503      |  1,117,140      | 24.02             |  1,201,124    | 25.83
48   | North Dakota  | $1,978       |    44,518      |    492,547      | 11.06             |  1,021,077    | 22.94
49   | Delaware      | $2,108       |    41,296      |    670,622      | 16.24             |  1,011,400    | 24.49
50   | Vermont       | $1,554       |    39,230      |    379,291      | 9.67              |    821,193    | 20.93
51   | Wyoming       | $1,679       |    35,881      |    340,342      | 9.49              |    772,090    | 21.52
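In the nation-wide table, Avg Employee Size and Average Patrons appear to be per-business means (for example, California: 23,518,022 / 1,579,342 ≈ 14.89 employees per business), and states appear to be ranked by number of businesses. A minimal sketch of producing such a summary from merged state files, assuming hypothetical per-business fields `state`, `employees`, and `patrons`; the Sales Volume column is left out because its definition is not given on the slide.

```python
from collections import defaultdict


def state_summary(businesses: list[dict]) -> list[dict]:
    """Per-state totals and per-business averages, ranked by number of businesses."""
    totals: dict[str, dict[str, int]] = defaultdict(
        lambda: {"businesses": 0, "employees": 0, "patrons": 0})
    for b in businesses:
        t = totals[b["state"]]
        t["businesses"] += 1
        t["employees"] += b["employees"]
        t["patrons"] += b["patrons"]

    rows = [
        {"state": state, **t,
         "avg_employee_size": round(t["employees"] / t["businesses"], 2),
         "avg_patrons": round(t["patrons"] / t["businesses"], 2)}
        for state, t in totals.items()
    ]
    rows.sort(key=lambda r: r["businesses"], reverse=True)  # largest state first
    for rank, row in enumerate(rows, start=1):
        row["rank"] = rank
    return rows
```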
21
Census Data
22
Inputs
- 2010 Census Summary File 1: http://www2.census.gov/census_2010/04-Summary_File_1/
- Does not convert to CSV/TXT; files are made for MS Access
- Process tables (P12, P16, P29, H13, P43) with Talal’s VBA macro in MS Access (p.78)
- VBA code: whereabouts unknown, perhaps with Prof K
- 2012 5-Year American Community Survey: http://www2.census.gov/acs2012_5yr/summaryfile/
- Income data to assign incomes to households and residents
23
Generation
- Module 1: outputs a resident file for each county in the state
- Rows: individual people
- Attributes/columns: County Number (replace with State Number_County Number for the national file), Household ID, Household Type, Lat/Long, ID Number, Age, Sex, Traveler Type, Income Bracket
- Module 2: out-of-state/region/nation nodes
- For commentary on the code, see pp. 17-19 of http://www.princeton.edu/~alaink/Orf467F12/MuftiTripSynthesizer_v.1.pdf
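The column change for the national file (County Number becomes State Number_County Number) can be captured in a small record type. This is an illustrative sketch: the field names and integer codes are hypothetical, and zero-padding the numbers like FIPS codes (New Jersey is 34, Mercer County is 021) is an assumption rather than something the slide specifies.

```python
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class Resident:
    """One row of a Module 1 resident file (field names are illustrative)."""
    state_fips: int
    county_fips: int
    household_id: int
    household_type: int
    lat: float
    lon: float
    person_id: int
    age: int
    sex: str
    traveler_type: int
    income_bracket: int

    @property
    def county_key(self) -> str:
        """'State Number_County Number', zero-padded like FIPS codes (NJ Mercer -> '34_021')."""
        return f"{self.state_fips:02d}_{self.county_fips:03d}"

    def to_row(self) -> list:
        """Output columns: the national county key replaces the two separate numbers."""
        return [self.county_key, *astuple(self)[2:]]
```

A combined key keeps county numbers unique across states, since every state numbers its counties starting from 001.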
24
Further Steps
25
What To Do Next?
- Patronage generation with NAICS, sales volume, employee size, and research (low difficulty)
- I have already generated a file mapping all NAICS codes to employment counts and payrolls for patronage assignment, using 2010 Census data (200K entries)
- Census data generation and rework of the NN generation modules (high difficulty)
- Optional: data verification for the employee-patronage files
26
Modules
- Very hard-coded for NJ; not well commented

Initial National Implementation Ideas
- Treat the US as one entity, with external nodes at airports to represent foreigners
- Problem: computationally intensive for 330M people
- Solution: do a semi-randomized sample
- Regionalize the US and use out-of-region external nodes
- Less labor-intensive; allows parallel processing
- Do each state separately
- Problem: hard to generalize the code and the out-of-state nodes
- Extremely labor-intensive
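The "semi-randomized sample" idea is not specified further; one plausible reading is a sample stratified by county, with each sampled person carrying an expansion weight so that totals still scale to the full population. This is a hypothetical sketch under that assumption, with people represented as dicts that have a `county` field.

```python
import random
from collections import defaultdict
from typing import Callable


def stratified_sample(people: list[dict], rate: float, seed: int = 0,
                      stratum: Callable[[dict], str] = lambda p: p["county"]) -> list[dict]:
    """Sample `rate` of each stratum (default: county) and attach an expansion weight."""
    rng = random.Random(seed)  # fixed seed -> reproducible sample
    strata: dict[str, list[dict]] = defaultdict(list)
    for person in people:
        strata[stratum(person)].append(person)

    sample = []
    for members in strata.values():
        # Keep at least one person per stratum, and never more than the stratum holds.
        k = min(len(members), max(1, round(rate * len(members))))
        weight = len(members) / k  # each sampled person stands for `weight` people
        sample.extend({**p, "weight": weight} for p in rng.sample(members, k))
    return sample
```

Stratifying by county guarantees that small counties still appear in the sample, which a purely random 1-in-N draw across 330M people would not.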
27
The Code: Thought Process
- Generate trips state by state, using state-level demographic information on residents
- Ignore state boundaries for destinations, since we have employer and attraction information for the whole nation
- Example: John Smith lives in NYC and works in CT. We take his household from the NYC Census file and draw his workplace from the probability distribution of workplaces in the CT E-P file. When we map NYC trips, we will see John Smith going to CT for work; when we map CT trips, we will see him returning from work.
- Trip destinations can be approximated by destination county centroids, which requires assigning a centroid to each county
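A minimal sketch of the two ingredients on this slide: drawing a workplace county in proportion to employment from the E-P data, and approximating each county's centroid. Both are assumptions for illustration: the mean latitude/longitude of points in a county is only a simple stand-in (published county centroids could be substituted), and the dict field names are hypothetical.

```python
import random


def county_centroids(points: list[dict]) -> dict[str, tuple[float, float]]:
    """Approximate each county's centroid as the mean lat/long of points located in it."""
    sums: dict[str, list[float]] = {}
    for p in points:
        s = sums.setdefault(p["county"], [0.0, 0.0, 0])
        s[0] += p["lat"]
        s[1] += p["lon"]
        s[2] += 1
    return {c: (lat / n, lon / n) for c, (lat, lon, n) in sums.items()}


def draw_work_county(jobs_by_county: dict[str, int], rng: random.Random) -> str:
    """Pick a workplace county with probability proportional to its E-P employment."""
    counties = list(jobs_by_county)
    return rng.choices(counties, weights=[jobs_by_county[c] for c in counties], k=1)[0]
```

For John Smith, the trip destination would then be `centroids[draw_work_county(ct_jobs, rng)]`, where `ct_jobs` is a hypothetical county-to-employment map built from the CT E-P file.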
28
The Code: Thought Process
- Workplace assignment (without replacement):
- The Census maps individuals to a workplace (e.g., John Smith lives in NYC and works in CT)
- Use the distribution to match the workplace to the E-P file, keeping a count of employees so that the number assigned matches the number given
- John Smith is mapped to an employer in CT
- If the workplace is more than x (e.g., 250) miles away, assume arrival at an airport
- School assignment (without replacement):
- Use bounds and distribution to match students with schools (assume same county)
- Jane (8) is mapped to an elementary school in her county
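Assignment "without replacement" can be sketched as weighted draws against remaining capacity, so that no employer receives more workers than its employee count. This is a minimal sketch, not Talal's or the authors' code: `slots` (place ID to remaining capacity) is a hypothetical input, the per-person rescan is O(people x places), and the 250-mile airport rule uses great-circle (haversine) distance as an assumed stand-in for travel distance.

```python
import math
import random

EARTH_RADIUS_MI = 3958.8
AIRPORT_THRESHOLD_MI = 250.0  # the slide's example value for "x"


def haversine_mi(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(a))


def assign_without_replacement(people: list[str], slots: dict[str, int],
                               rng: random.Random) -> dict[str, str]:
    """Give each person a place, weighted by remaining capacity; `slots` is decremented."""
    assignment: dict[str, str] = {}
    for person in people:
        open_places = [place for place, left in slots.items() if left > 0]
        if not open_places:
            break  # every job/seat is taken; remaining people stay unassigned
        place = rng.choices(open_places, weights=[slots[p] for p in open_places], k=1)[0]
        slots[place] -= 1
        assignment[person] = place
    return assignment


def arrives_by_air(home: tuple[float, float], work: tuple[float, float],
                   threshold_mi: float = AIRPORT_THRESHOLD_MI) -> bool:
    """Beyond the threshold, assume the trip arrives at an airport."""
    return haversine_mi(*home, *work) > threshold_mi
```

The same function could handle school assignment by calling it once per county and school level, with enrollment counts as the slots.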
29
The Code: Thought Process
- Tour type assignment and temporal dimension
- Can try to repurpose Talal’s code
- Add time zones to the temporal dimension
- Can be done with replacement (patrons)
- Assumption: the same behavior across states in work time, leisure time, and activity patterns
- Out-of-country commuters / non-resident workers
- International nodes for the states along the Canadian and Mexican borders
- Trip to the nearest border crossing
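Two pieces of this slide lend themselves to a short sketch: putting departure times from different time zones on a common clock, and routing a non-resident worker to the nearest border crossing. Both are illustrative assumptions: the offsets are standard-time values with daylight saving ignored, Eastern time is an arbitrary choice of reference clock, and the four crossings and their coordinates are approximate placeholders, not an official list.

```python
import math

# Standard-time UTC offsets in hours; daylight saving time is ignored in this sketch.
UTC_OFFSET = {"Eastern": -5, "Central": -6, "Mountain": -7,
              "Pacific": -8, "Alaska": -9, "Hawaii": -10}

# Approximate coordinates, for illustration only (not an official crossing list).
BORDER_CROSSINGS = {
    "Ambassador Bridge (Detroit)": (42.312, -83.074),
    "Peace Bridge (Buffalo)": (42.907, -78.906),
    "San Ysidro (San Diego)": (32.542, -117.029),
    "Laredo": (27.500, -99.507),
}


def to_eastern_minutes(local_minutes: int, zone: str) -> int:
    """Shift minutes after local midnight to Eastern time (result may fall outside 0-1439)."""
    return local_minutes + (UTC_OFFSET["Eastern"] - UTC_OFFSET[zone]) * 60


def haversine_mi(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * 3958.8 * math.asin(math.sqrt(a))


def nearest_crossing(lat: float, lon: float) -> str:
    """Name of the closest listed border crossing by great-circle distance."""
    return min(BORDER_CROSSINGS, key=lambda name: haversine_mi(lat, lon, *BORDER_CROSSINGS[name]))
```

For example, an 8:00 a.m. Pacific departure becomes 660 minutes (11:00) on the Eastern clock, so trips from all states can be binned into the same time periods.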