Published by Brendan Sullivan. Modified over 9 years ago.
1
INFO 7470/ECON 7400/ILRLE 7400
Solutions to Lab 5
John M. Abowd and Lars Vilhuber
March 25, 2013
2
LESSONS TO BE LEARNED
Subtitle organization
3/4/2013 © John M. Abowd and Lars Vilhuber 2013, all rights reserved
3
Lessons
Answering data-driven questions
Identify tools to answer the question
Correctly use available metadata
Not all data on the same topic provide the same answer
4
Required tools
SAS, Stata, R, Python, etc.
Web browser
Search engine…
5
NAICS
6
NAICS sub-sectors (NAICS3)
7
QCEW
8
After downloading ZIP file
For historical data, BLS has packaged an entire year into a single ZIP file (151MB)
We only need one file from there: the county file for Pennsylvania
What is the state code for PA?
  – PA -> FIPS=42
We thus need cn42pa10.enb (note the extension, but no choice: only .enb files are available)
Extract it from the ZIP file; unpacked: 38MB
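The extraction step can also be scripted rather than done by hand. The following is a minimal sketch in Python, not the course's own solution. The helper names are made up for illustration, and the file-name pattern (`cn` + state FIPS + state abbreviation + two-digit year) is an assumption inferred from the single example `cn42pa10.enb`.

```python
import zipfile
from pathlib import Path


def county_file_name(state_fips: str, state_abbr: str, year: int) -> str:
    """Build a name such as 'cn42pa10.enb'.

    ASSUMPTION: the pattern is inferred from the one example on the slide.
    """
    return f"cn{state_fips}{state_abbr.lower()}{year % 100:02d}.enb"


def extract_member(zip_path: str, wanted: str, dest_dir: str) -> Path:
    """Extract only the archive member whose base name equals `wanted`."""
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            # Members may sit in sub-folders, so compare base names only.
            if Path(info.filename).name == wanted:
                return Path(archive.extract(info, path=dest_dir))
    raise FileNotFoundError(f"{wanted} not found in {zip_path}")
```

Calling `extract_member(path_to_zip, county_file_name("42", "PA", 2010), "data")` would unpack only the Pennsylvania county file and leave the rest of the archive compressed; the archive path and the `data` directory are placeholders.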
9
How to read it in?
No information in the ZIP file, but…
  – On the same FTP server: DOCUMENT/
  – On the Web page: “Flat file formatters”
  – On the Web page: “Tools and tutorials”
Use the template files to construct a SAS program
  – For Stata: construct a dictionary file
  – For R: read a fixed format file
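For readers working in Python instead of SAS, Stata, or R, a fixed-format file can be read by slicing each line at known column positions. The sketch below shows only the technique: the field names and column positions in `LAYOUT` are hypothetical placeholders, and the real ones have to be copied from the BLS layout documentation mentioned above.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# HYPOTHETICAL layout: (field name, start, end), 0-based, end exclusive.
# Replace these with the positions given in the BLS documentation.
LAYOUT: List[Tuple[str, int, int]] = [
    ("area_fips", 0, 5),
    ("industry_code", 5, 11),
    ("year", 11, 15),
    ("employment", 15, 24),
]


def parse_fixed_width(
    lines: Iterable[str],
    layout: List[Tuple[str, int, int]] = LAYOUT,
) -> Iterator[Dict[str, str]]:
    """Yield one dict per non-empty line, slicing fields by column position."""
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        # Values stay strings so that codes keep any leading zeros.
        yield {name: line[start:end].strip() for name, start, end in layout}
```

Because the function accepts any iterable of lines, it can be fed an open file handle directly, for example `parse_fixed_width(open("cn42pa10.enb"))`, and it never holds more than one line in memory.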
10
Solution to QCEW
http://www.vrdc.cornell.edu/info7470/Data/lab5-qcew.sas.txt
Compare it to the template program provided in BLS’ makesas.zip
11
Minor modifications
12
Computations
13
QCEW Pitfalls
Industry coding: ftp://ftp.bls.gov/pub/special.requests/cew/DOCUMENT/industry.map
“Industry Code Map: This is for NAICS based Quarterly Census of Employment and Wages (QCEW) data.”
14
Mixed industry coding
Industry Code   Industry Title
10              10 Total, all industries
101             101 Goods-producing
1011            1011 Natural resources and mining
11              NAICS 11 Agriculture, forestry…
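The pitfall is that an aggregate code such as 101 (Goods-producing) has the same length as a three-digit NAICS sub-sector code. One way to tell them apart, sketched here in Python, relies on the title column: in the rows shown on this slide, genuine NAICS entries carry a title that begins with "NAICS". Whether this holds for every row of the industry map is an assumption to verify against the file itself, and the function name is illustrative.

```python
from typing import Iterable, List, Optional, Tuple


def naics_rows(
    rows: Iterable[Tuple[str, str]],
    digits: Optional[int] = None,
) -> List[Tuple[str, str]]:
    """Keep (code, title) pairs whose title starts with 'NAICS '.

    ASSUMPTION: aggregate (non-NAICS) codes never have such a title.
    If `digits` is given, keep only codes of exactly that length.
    """
    kept = []
    for code, title in rows:
        if not title.startswith("NAICS "):
            continue  # aggregate such as "101 Goods-producing"
        if digits is not None and len(code) != digits:
            continue
        kept.append((code, title))
    return kept


# The four rows transcribed from the slide.
SAMPLE = [
    ("10", "10 Total, all industries"),
    ("101", "101 Goods-producing"),
    ("1011", "1011 Natural resources and mining"),
    ("11", "NAICS 11 Agriculture, forestry…"),
]
```

With the sample rows, `naics_rows(SAMPLE, digits=3)` drops code 101 even though it is three characters long, which is the behavior needed when selecting NAICS sub-sectors.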
15
QWI
Challenge: very large files
http://www.vrdc.cornell.edu/qwipu/R2012Q2/pa/wia/qwi_pa_wia_county_naicssec_pri.csv.bz2 : 81MB compressed, 2.3GB uncompressed
Read-in requires 8GB of RAM for R…
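One way around the memory requirement is to stream the compressed file row by row and keep only the rows of interest, instead of loading everything at once. The Python sketch below uses only the standard library. The function name is illustrative, and any column names used in a filter (for example `geography`) are assumptions that should be checked against the file's header row.

```python
import bz2
import csv
from typing import Callable, Dict, Iterator


def stream_rows(
    path: str,
    keep: Callable[[Dict[str, str]], bool],
) -> Iterator[Dict[str, str]]:
    """Yield only the CSV rows for which `keep(row)` is true.

    The .bz2 file is decompressed on the fly, so neither the 2.3GB
    uncompressed file nor the full table is ever held in memory.
    """
    with bz2.open(path, mode="rt", newline="", encoding="utf-8") as handle:
        # DictReader consumes the header row and uses it for the keys.
        for row in csv.DictReader(handle):
            if keep(row):
                yield row
```

A call such as `stream_rows(path, lambda r: r["geography"] == "42001")` would keep a single county, assuming a `geography` column exists. All values arrive as strings, so numeric fields need an explicit conversion before any arithmetic.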
16
Metadata and data
“How many data rows does the file you downloaded have?”
  – QCEW: as many as the .enb file has (no embedded metadata) (88,093)
  – QWI: count of lines minus 1: the header row is metadata, not data (8,482,131)
  – Same reasoning for CBP (2,155,389)
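The counting rule above is easy to express in code. This is a small Python sketch with an illustrative function name; it expects an uncompressed file and takes a flag that says whether the first line is a header.

```python
def count_data_rows(path: str, has_header: bool) -> int:
    """Count lines in a text file, excluding the header row if present."""
    # Binary mode avoids decoding costs; iteration reads one line at a time.
    with open(path, "rb") as handle:
        total = sum(1 for _ in handle)
    if has_header and total > 0:
        return total - 1  # the header row is metadata, not data
    return total
```

For the QCEW .enb file the call would use `has_header=False`; for the QWI and CBP CSV files it would use `has_header=True`.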
17
Reading in QWI
http://www.vrdc.cornell.edu/qwipu/R2012Q2/pa/wia/sas_import_wia.sas in the same directory
Very long program, but the very first section is for the file we want: qwi_pa_wia_county_naics3
Alternatively, use “proc import”, but it may not yield correct results.
18
After read-in, same as for QCEW
http://www.vrdc.cornell.edu/info7470/Data/lab5_qwi.sas.txt
19
Solution for QWI
20
County Business Patterns
Straight CSV file, but for the entire year (15.2MB ZIP file)
But: employment refers to March 15, so it is comparable to the other two
Caution: file contains all levels of NAICS, right-filled with “////”
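Because every NAICS level appears in the same column, the level has to be recovered from the code itself by stripping the fill characters. The Python sketch below does this. The slide names only "/" as the fill character; treating "-" as a second fill character is an added assumption about the CBP files and should be checked against the data. Function names are illustrative.

```python
from typing import Iterable, List

FILL_CHARS = "/-"  # "/" per the slide; "-" is an ASSUMPTION to verify


def naics_level(code: str) -> int:
    """Return the number of significant digits in a right-filled code."""
    return len(code.rstrip(FILL_CHARS))


def select_level(codes: Iterable[str], level: int) -> List[str]:
    """Keep only codes at the requested NAICS level (3 = sub-sector)."""
    return [code for code in codes if naics_level(code) == level]
```

Selecting a single level before summing employment avoids counting the same jobs at several levels of the hierarchy.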
21
Solution for CBP
22
Results
Not all sources give the same answer…
  – Differences in source data
      Count of individual wage records
      Firm-level report of employment at a particular point in time to state reporting system
      Establishment-level report of employment at a particular point in time to federal reporting system
  – Differences in data cleaning
  – Other…
23
Now that you know how
Try it on Lewis and Clark County, MT
Try it for earlier time periods
Drill down