INFO 7470/ECON 7400/ILRLE 7400 Solutions to Lab 5 John M. Abowd and Lars Vilhuber March 25, 2013.


1 INFO 7470/ECON 7400/ILRLE 7400 Solutions to Lab 5 John M. Abowd and Lars Vilhuber March 25, 2013

2 LESSONS TO BE LEARNED Subtitle organization 3/4/2013 © John M. Abowd and Lars Vilhuber 2013, all rights reserved 2

3 Lessons
Answering data-driven questions
Identify tools to answer the question
Correctly use available metadata
Not all data on the same topic provide the same answer

4 Required tools
SAS, Stata, R, Python, etc.
Web browser
Search engine…

5 NAICS

6 NAICS sub-sectors (NAICS3)

7 QCEW

8 After downloading ZIP file
For historical data, BLS has packaged an entire year into a single ZIP file (151MB)
We only need one file from there: the county file for Pennsylvania
What is the state code for PA?
– PA -> FIPS=42
We thus need cn42pa10.enb (note the extension, but there is no choice: only .enb files are available)
Extract it from the ZIP file; unpacked: 38MB
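
The extraction step can be sketched in Python, one of the tools listed earlier. This is a minimal sketch, not the course solution: the function name `extract_member` is hypothetical, and because the folder layout inside the BLS ZIP file is not described on the slide, the code matches on the base file name only.

```python
import zipfile
from pathlib import Path, PurePosixPath


def extract_member(zip_path, wanted_name, out_dir):
    """Extract the single archive member whose base name is `wanted_name`."""
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            # Compare base names only: the folder layout inside the ZIP is unknown.
            if PurePosixPath(member).name == wanted_name:
                target = Path(out_dir) / wanted_name
                target.write_bytes(archive.read(member))
                return target
    raise FileNotFoundError(f"{wanted_name} not found in {zip_path}")
```

A call would look like `extract_member(path_to_zip, "cn42pa10.enb", ".")`, where `path_to_zip` is wherever the downloaded yearly ZIP file was saved.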

9 How to read it in?
No information in the ZIP file, but…
– On the same FTP server: DOCUMENT/
– On the Web page: “Flat file formatters”
– On the Web page: “Tools and tutorials”
Use the template files to construct a SAS program
– For Stata: construct a dictionary file
– For R: read a fixed format file
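
For readers working in Python rather than SAS, Stata, or R, reading a fixed format file amounts to slicing each line at known column offsets. The sketch below illustrates the idea only. The field names and positions in `EXAMPLE_LAYOUT` are hypothetical placeholders, not the real .enb layout, which has to be taken from the BLS documentation and template files mentioned above.

```python
# HYPOTHETICAL layout: field name -> (start, end), 0-based, end-exclusive.
# The real positions must be copied from the BLS layout documentation.
EXAMPLE_LAYOUT = {
    "area_fips": (0, 5),
    "industry_code": (5, 11),
    "employment": (11, 20),
}


def parse_fixed_line(line, layout):
    """Slice one fixed-width record into a dict of stripped strings."""
    return {name: line[start:end].strip() for name, (start, end) in layout.items()}


def read_fixed_file(path, layout):
    """Yield one dict per line of a fixed-width text file."""
    with open(path, encoding="ascii", errors="replace") as handle:
        for line in handle:
            yield parse_fixed_line(line.rstrip("\n"), layout)
```

All values come back as strings; numeric fields would still need converting once the true layout is known.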

10 Solution to QCEW
http://www.vrdc.cornell.edu/info7470/Data/lab5-qcew.sas.txt
Compare it to the template program provided in BLS’ makesas.zip

11 Minor modifications

12 Computations

13 QCEW Pitfalls
Industry coding: ftp://ftp.bls.gov/pub/special.requests/cew/DOCUMENT/industry.map
“Industry Code Map: This is for NAICS based Quarterly Census of Employment and Wages (QCEW) data.”

14 Mixed industry coding
Industry Code    Industry Title
10               10 Total, all industries
101              101 Goods-producing
1011             1011 Natural resources and mining
11               NAICS 11 Agriculture, forestry…
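
The pitfall is that a three-character code such as 101 looks like a NAICS sub-sector but is a BLS roll-up. A minimal Python sketch of a filter follows. It assumes that BLS roll-up codes are exactly those beginning with "10", which is consistent with the rows shown here but should be checked against industry.map; both function names are hypothetical.

```python
def is_bls_aggregate(industry_code):
    """True for BLS roll-up codes such as 10, 101, 1011.

    ASSUMPTION: every roll-up code starts with '10'.
    """
    return industry_code.strip().startswith("10")


def is_naics_subsector(industry_code):
    """True for a three-digit NAICS code, excluding roll-ups like 101."""
    code = industry_code.strip()
    return len(code) == 3 and code.isdigit() and not is_bls_aggregate(code)
```

With this rule, "113" is kept as a sub-sector while "101" and "11" are not.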

15 QWI
Challenge: very large files
http://www.vrdc.cornell.edu/qwipu/R2012Q2/pa/wia/qwi_pa_wia_county_naicssec_pri.csv.bz2 : 81MB compressed, 2.3GB uncompressed
Read-in requires 8GB of RAM for R…
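
One way around the memory requirement, sketched here in Python rather than R, is to decompress and filter the file row by row instead of loading it whole. This is an illustration under assumptions: `stream_rows` is a hypothetical helper, and any column names used in the filter must be taken from the actual CSV header.

```python
import bz2
import csv


def stream_rows(path, keep):
    """Yield CSV rows (as dicts) from a .bz2 file for which keep(row) is true."""
    # Text mode decompresses on the fly, so the file is never fully in memory.
    with bz2.open(path, mode="rt", newline="") as handle:
        for row in csv.DictReader(handle):
            if keep(row):
                yield row
```

Because the function is a generator, only the rows that pass the filter are ever collected by the caller.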

16 Metadata and data
“How many data rows does the file you downloaded have?”
– QCEW: as many as the .enb file has (no embedded metadata) (88,093)
– QWI: count of lines minus 1: the header row is metadata, not data (8,482,131)
– Same reasoning for CBP (2,155,389)
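
The counting rule above can be written as a short Python sketch. The function name `count_data_rows` is hypothetical, and the file is assumed to be already uncompressed.

```python
def count_data_rows(path, has_header):
    """Count data rows in a text file, excluding one header line if present."""
    with open(path, "rb") as handle:
        total = sum(1 for _ in handle)
    # A header row is metadata, not data, so it is subtracted.
    return total - 1 if has_header and total > 0 else total
```

For the .enb file `has_header` would be `False`; for the QWI and CBP CSV files it would be `True`.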

17 Reading in QWI
http://www.vrdc.cornell.edu/qwipu/R2012Q2/pa/wia/sas_import_wia.sas in the same directory
Very long program, but the very first section is for the file we want: qwi_pa_wia_county_naics3
Alternatively, use “proc import”, but it may not yield correct results.

18 After read-in, same as for QCEW
http://www.vrdc.cornell.edu/info7470/Data/lab5_qwi.sas.txt

19 Solution for QWI

20 County Business Patterns
Straight CSV file, but for entire year (15.2MB ZIP file)
But: employment refers to March 15, so comparable to the other two
Caution: file contains all levels of NAICS, right-filled with “////”
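
Since every NAICS level sits in the same column, the level has to be recovered from the filler. A minimal Python sketch follows; the function names are hypothetical. The "/" filler comes from the caution above, while also stripping "-" is an assumption about how totals and two-digit sectors are marked, to be checked against the CBP record layout.

```python
def naics_level(code):
    """Return the number of significant digits in a right-filled NAICS code."""
    # '/' is the filler named on the slide; '-' is an ASSUMPTION
    # about how total and sector rows are padded.
    return len(code.strip().rstrip("/-"))


def is_three_digit(code):
    """True when the code identifies a NAICS sub-sector (three digits)."""
    return naics_level(code) == 3
```

Filtering on `is_three_digit` would keep sub-sector rows such as "113///" and drop the more and less detailed levels.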

21 Solution for CBP

22 Results
Not all sources give the same answer…
– Differences in source data
    Count of individual wage records
    Firm-level report of employment at a particular point in time to state reporting system
    Establishment-level report of employment at a particular point in time to federal reporting system
– Differences in data cleaning
– Other…

23 Now that you know how
Try it on Lewis and Clark County, MT
Try it for earlier time periods
Drill down

