Presentation is loading. Please wait.

Presentation is loading. Please wait.

INFO 4470/ILRLE 4470 Visualization Tools and Data Quality John M. Abowd and Lars Vilhuber March 16, 2011.

Similar presentations


Presentation on theme: "INFO 4470/ILRLE 4470 Visualization Tools and Data Quality John M. Abowd and Lars Vilhuber March 16, 2011."— Presentation transcript:

1 INFO 4470/ILRLE 4470 Visualization Tools and Data Quality John M. Abowd and Lars Vilhuber March 16, 2011

2 2 Outline Data quality issues Data quality issues Coverage Coverage Known biases Known biases Missing data Missing data Extended example using the Quarterly Workforce Indicators Extended example using the Quarterly Workforce Indicators 3/16/2011 (c) John M. Abowd and Lars Vilhuber 2011, all rights reserved

3 Coverage What is the target (theoretical) population? Does the statistical frame fully cover this population? What are the known deficiencies in the frame? Can these be addressed with other data? 3/16/2011 (c) John M. Abowd and Lars Vilhuber 2011, all rights reserved 3

4 Biases and Sampling Error Bias: the statistical model does not estimate the quantity of interest in expected value Sampling error: the statistical model has imprecision in estimating the quantity of interest Non-sampling error: variation in the statistical model introduced by editing, imputation, non- random non-response, etc. 3/16/20114 (c) John M. Abowd and Lars Vilhuber 2011, all rights reserved

5 What are Missing Data? Random or systematic missing items or entities in a statistical sample or frame One of the most ubiquitous data quality problems known Cannot be ignored, or rather 3/16/20115 (c) John M. Abowd and Lars Vilhuber 2011, all rights reserved

6 How to code missing data? How about giving a zero, “0” for missing? Note with a distinctive value such as: – -9 for non-responses – -8 for invalid responses – Or even “.” Statistical software like SAS, Stata and SPSS can then recognize those values as missing and treat them according to your instructions 3/16/20116 (c) John M. Abowd and Lars Vilhuber 2011, all rights reserved

7 How to Handle Missing Data It depends on the type of analysis: – Simple summary tabulations – Descriptive stats, including mean and standard deviation – Correlation, relation between two or more variables – Model to estimate the effects of characteristics 3/16/20117 (c) John M. Abowd and Lars Vilhuber 2011, all rights reserved

8 Missing vs Non-missing "There is no difference between cases with missing versus observed data"

9 “Inference and Missing Data” Rubin (1976) MCAR – Missing Completely At Random – Rigorous and hard to achieve – Sometimes by design MAR – Missing At Random – Less Rigorous – Cannot be tested without the data which are missing 3/16/20119 (c) John M. Abowd and Lars Vilhuber 2011, all rights reserved

10 Assume MAR? YES – then mechanism by which data are missing is IGNORABLE NO – then mechanism is not ignorable, requires more involved procedures 3/16/201110 (c) John M. Abowd and Lars Vilhuber 2011, all rights reserved

11 MAR - YES Delete Listwise Pairwise Impute Assign Mean Hot Deck Maximum Likelihood Multiple Imputation 3/16/201111 (c) John M. Abowd and Lars Vilhuber 2011, all rights reserved

12 Visualization of Missing Data in the Quarterly Workforce Indicators

13 Data Sources (Quarterly Only) Quarterly Workforce Indicators (Census Bureau) Quarterly Census of Employment and Wages (BLS) Business Employment Dynamics (BLS) Job Openings and Labor Turnover Survey Adjustments by to JOLTS by Davis, Faberman, Haltiwanger, and Rucker (2010) 3/16/201113 (c) John M. Abowd and Lars Vilhuber 2011, all rights reserved

14 Quarterly Workforce Indicators I Flows are based on longitudinally linked (by employer and employee) Unemployment Insurance Wage Records Beginning-of-quarter employed if wage record with earnings > $1.00 in quarters t-1 and t (B) End-of-quarter employed if wage record with earnings > $1.00 in quarters t and t+1 (E) Accession if wage record in t but not t-1 (A) Separation if wage record in t but not t+1 (S)

15 Quarterly Workforce Indicators II Job creation if establishment has positive employment change from beginning to end of quarter (JC) Job destruction if establishment has negative employment change from beginning to end of quarter (JD), always stated as absolute value of change Demographic (age x gender), geographic (county, CBSA, WIB), NAICS (sector, sub-sector, industry group), and ownership (All, private) 3/16/201115 (c) John M. Abowd and Lars Vilhuber 2011, all rights reserved

16 Quarterly Census of Employment and Wages Stocks of employment measured as of the 12 th day of the month for each month in the quarter for each establishment (no job level data) BLS uses month-3 employment to measure changes Beginning of quarter employment is month-3 employment from quarter t-1 End of quarter employment is month-3 employment for quarter t Geographic (county, MSA), NAICS (sector, sub-sector, industry group), and ownership (all, public, private) 3/16/201116 (c) John M. Abowd and Lars Vilhuber 2011, all rights reserved

17 Business Employment Dynamics Gross job gains (job creations) is the change in employment at an establishment between month-3 in quarter t-1 and month-3 in quarter t, if positive Gross job losses (job destructions) is the absolute value of the change in employment at an establishment between month-3 in quarter t-1 and month-3 in quarter t, if negative Limited geography (state), NAICS (sector) detail 3/16/201117 (c) John M. Abowd and Lars Vilhuber 2011, all rights reserved

18 Job Openings and Labor Turnover Survey Monthly survey of continuing establishments Accessions measured as all new employment over the course of the month (summed over the three months in the quarter here) Separations measured as quits, layoffs, discharges, and other over the course of the month (summed over the three months of the quarter here) Limited geography (national) and NAICS (sector) detail 3/16/201118 (c) John M. Abowd and Lars Vilhuber 2011, all rights reserved

19 QWI Coverage of the Private Workforce Sub-period 1 Sub-period 2 3/16/201119 (c) John M. Abowd and Lars Vilhuber 2011, all rights reserved

20 Worker Reallocation Rate This rate is available in the QWI for 8 age groups, both genders, NAICS sector, state and quarter from 1990:Q1 to 2008:Q4 (more detail is available than we used) 3/16/201120 (c) John M. Abowd and Lars Vilhuber 2011, all rights reserved

21 Job Reallocation Rate This rate is available in the QWI for 8 age groups, both genders, NAICS sector, state and quarter from 1990:Q1 to 2008:Q4 (more detail is available than we used) 3/16/201121 (c) John M. Abowd and Lars Vilhuber 2011, all rights reserved

22 Excess Reallocation Rate ERR = WRR – JRR The excess reallocation rate measures the extent to which gross worker flows exceed the minimum required to service the gross job flows This has been very difficult to estimate nationally because there were no data collected on a consistent basis for all the component flows QWI solved that problem 3/16/201122 (c) John M. Abowd and Lars Vilhuber 2011, all rights reserved

23 Statistical Methodology Divide the analysis into two periods – 1993:Q1-2001:Q4 (early period, many states are completely missing, 10 states complete) – 1999:Q1-2008:Q4 (later period, 37 states are complete) For each sub-period use a multiple imputation model to complete the missing data For the overlap period, use a ramped weight to compute the average implicate combining the two periods Use the standard multiple imputation formulae to combine implicates 3/16/201123 (c) John M. Abowd and Lars Vilhuber 2011, all rights reserved

24 National Estimates The combining formula for producing the national WRR is shown above (similar formulae apply to other rates) 3/16/201124 (c) John M. Abowd and Lars Vilhuber 2011, all rights reserved

25 Implicate Combining Formulae I 3/16/201125 (c) John M. Abowd and Lars Vilhuber 2011, all rights reserved

26 Implicate Combining Formulae II 3/16/201126 (c) John M. Abowd and Lars Vilhuber 2011, all rights reserved

27 Implicate Combining Formulae III 3/16/201127 (c) John M. Abowd and Lars Vilhuber 2011, all rights reserved

28 Results Comparison of WRR between QWI and JOLTS Comparison of JRR between QWI and BED Demographic detail Industry detail 3/16/201128 (c) John M. Abowd and Lars Vilhuber 2011, all rights reserved

29 WRR: QWI v. JOLTS 3/16/201129 (c) John M. Abowd and Lars Vilhuber 2011, all rights reserved

30 WRR: QWI v. Adjusted JOLTS 3/16/201130 (c) John M. Abowd and Lars Vilhuber 2011, all rights reserved

31 JRR: QWI v. BED 3/16/201131 (c) John M. Abowd and Lars Vilhuber 2011, all rights reserved

32 JRR WRR ERR by Gender 3/16/201132 (c) John M. Abowd and Lars Vilhuber 2011, all rights reserved

33 ERR (Churning) by Age Group 3/16/201133 (c) John M. Abowd and Lars Vilhuber 2011, all rights reserved

34 Job Creation Rate 3/16/201134 (c) John M. Abowd and Lars Vilhuber 2011, all rights reserved

35 Job Destruction Rate 3/16/201135 (c) John M. Abowd and Lars Vilhuber 2011, all rights reserved

36 Job Reallocation Rate 3/16/201136 (c) John M. Abowd and Lars Vilhuber 2011, all rights reserved

37 Accession Rate 3/16/201137 (c) John M. Abowd and Lars Vilhuber 2011, all rights reserved

38 Separation Rate 3/16/201138 (c) John M. Abowd and Lars Vilhuber 2011, all rights reserved

39 Worker Reallocation Rate 3/16/201139 (c) John M. Abowd and Lars Vilhuber 2011, all rights reserved


Download ppt "INFO 4470/ILRLE 4470 Visualization Tools and Data Quality John M. Abowd and Lars Vilhuber March 16, 2011."

Similar presentations


Ads by Google