REDI 3x3 Presentation: Data projects, Wage Inequality and Top Incomes Martin Wittenberg DataFirst 4 November 2014.

Presentation transcript:

REDI 3x3 Presentation: Data projects, Wage Inequality and Top Incomes Martin Wittenberg DataFirst 4 November 2014

Overview
DataFirst data projects
Wage and Wage Inequality Trends
Top earnings

DATAFIRST DATA PROJECTS REDI 3x3 Presentation

Data Projects: What is DataFirst?
A data service based at UCT
Data dissemination
– DataFirst portal: survey data, metadata, searchable
– Secure Data Research Centre: data that is confidential/sensitive (NIDS geospatial data, UCT admissions data, CT RSC levy data…)
Training
Research
– Data quality
– Harmonising data

Data Projects: REDI 3x3 data projects
Secure data projects
– Tax data
– QES data
– Key issue for both is how to do this within the current legal framework; trust; worry that the secure facility is based in CT
Harmonisation/data creation projects
– SESE: Survey of Employers and the Self-employed, 4 surveys: 2001, 2005, 2009 and 2013
– PALMS: Post-Apartheid Labour Market Series, v2
  Contains employment, wages, some infrastructure
  OHS: annual
  LFS: biannual
  QLFS: quarterly q.1
  39 surveys, almost 3.8 million records

Data Projects: PALMS – What did we add?
Rename/redefine variables to be as consistent across time as possible
A set of harmonised weights
Real earnings series across time:
– Changes in measurement
– Dealing with outliers
– Dealing with brackets/missing incomes

Data Projects: Harmonising weights
Why do we need to do this?
Problems with Stats SA weights
– Branson & Wittenberg (2014)

Data Projects Harmonising weights

Data Projects: Measurement changes
Lots of changes
Biggest: break between OHSs and LFSs
– Two questions in OHSs (wages and earnings from self-employment; could answer both)
– Only one question in LFSs
Coverage change between OHSs and LFSs
– Big increase in low income earners
  Mainly self-employed agricultural workers

Data Projects: Outliers
– Millionaires (real terms)
[Table: Survey | unweighted: n, prop | weighted: total, prop; cell values not recoverable]

Data Projects: How do we deal with this?
Run (“Mincerian”) wage regression
– Generate residuals (i.e. deviations from the predicted wage)
– “Studentize” these
– Flag residuals that are bigger than 5 in absolute value
  Should have seen 0.3 cases on a dataset as big as PALMS
  Actually flagged 476
Outlier variable included with PALMS public release
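The flagging steps above can be sketched roughly as follows. This is a minimal illustration, not the PALMS code: the function names are hypothetical, the regressors are whatever covariates the wage regression uses, and the externally studentized form is an assumption, since the slides only say the residuals were "studentized". The last helper gives the number of flags one would expect if the studentized residuals were standard normal, which is also an assumed benchmark.

```python
import math
import numpy as np

def studentized_residuals(y, X):
    """Externally studentized OLS residuals of y on X (an intercept is added here)."""
    X = np.column_stack([np.ones(len(y)), X])
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # Leverage: diagonal of the hat matrix X (X'X)^-1 X'.
    xtx_inv = np.linalg.inv(X.T @ X)
    leverage = np.einsum("ij,jk,ik->i", X, xtx_inv, X)
    s2 = resid @ resid / (n - k)
    internal = resid / np.sqrt(s2 * (1.0 - leverage))
    # Rescale so each residual is judged against a variance
    # estimated without that observation.
    return internal * np.sqrt((n - k - 1) / (n - k - internal**2))

def flag_outliers(y, X, threshold=5.0):
    """True where the absolute studentized residual exceeds the threshold."""
    return np.abs(studentized_residuals(y, X)) > threshold

def expected_flags_if_normal(n_obs, threshold=5.0):
    """Expected number of flags among n_obs standard normal residuals."""
    return n_obs * math.erfc(threshold / math.sqrt(2.0))
```

With log wages in `y` and covariates in `X`, `flag_outliers(y, X)` returns a boolean array that could be stored as an outlier indicator alongside the earnings variable.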

Data Projects: Brackets (LFS case)
[Table: salary category (None; R 1 - R …; …; R … or more) by survey (00:1, 00:2, 01:1, 01:2, 02:1, 02:2, 03:1, 03:2); bracket bounds and cell values not recoverable]

Data Projects: How does one deal with this?
4 approaches:
– Reweighting: let those giving Rand amounts “represent” missing incomes in the same bracket
– Deterministic imputations
  Midpoint, mean, conditional mean
– Stochastic imputations
  Hot deck: match individuals to “similar” individuals (on covariates like gender, education etc.), copy income
– Multiple stochastic imputation
  Problem with stochastic imputation is that the value that is imputed is not actually measured; it is the true value plus some error
  We need to take the variability associated with this into account
  Do the stochastic imputation multiple times
  Can then take the uncertainty arising from the imputation into account
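Three of these approaches can be sketched in a few lines. This is an illustration under stated assumptions, not the PALMS procedure: the function names are hypothetical, missing Rand amounts are coded as NaN, and `cells` is an assumed integer label for the matching cell (the bracket alone, or the bracket crossed with covariates such as gender and education).

```python
import numpy as np

def bracket_weights(cells, incomes, weights):
    """Reweighting: scale up the weights of people reporting Rand amounts so
    they also represent those in the same cell whose amount is missing."""
    new_w = np.zeros_like(weights, dtype=float)
    for c in np.unique(cells):
        in_cell = cells == c
        has_amount = in_cell & ~np.isnan(incomes)
        if has_amount.any():
            scale = weights[in_cell].sum() / weights[has_amount].sum()
            new_w[has_amount] = weights[has_amount] * scale
    return new_w

def midpoint_impute(lower, upper):
    """Deterministic imputation: the midpoint of the reported bracket."""
    return (lower + upper) / 2.0

def hotdeck_impute(cells, incomes, rng):
    """Stochastic imputation: copy the income of a randomly drawn donor
    from the same cell into each missing value."""
    imputed = incomes.copy()
    for c in np.unique(cells):
        in_cell = cells == c
        donors = incomes[in_cell & ~np.isnan(incomes)]
        recipients = np.flatnonzero(in_cell & np.isnan(incomes))
        if donors.size and recipients.size:
            imputed[recipients] = rng.choice(donors, size=recipients.size)
    return imputed
```

Calling `hotdeck_impute` several times with the same random generator yields several completed versions of the income variable, which is the starting point for multiple imputation.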

Data Projects: How does PALMS deal with this?
“Bracket weights”
– Does the reweighting of point values to take the brackets into account
Multiple stochastic imputation
– Released a dataset with 10 versions of real earnings
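One standard way to use several imputed versions of a variable is to run the same analysis on each version and pool the results with Rubin's combining rules. The sketch below assumes that approach; the slides do not say how the 10 versions are meant to be combined, and the function name is hypothetical.

```python
import numpy as np

def combine_imputations(estimates, variances):
    """Pool one estimate per imputed dataset using Rubin's rules.

    estimates: the point estimate from each imputed version
    variances: the squared standard error from each imputed version
    Returns the pooled estimate and its standard error.
    """
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = estimates.size
    pooled = estimates.mean()
    within = variances.mean()          # average sampling variance
    between = estimates.var(ddof=1)    # variance across imputations
    total = within + (1.0 + 1.0 / m) * between
    return pooled, np.sqrt(total)
```

For example, mean real earnings could be computed on each of the 10 versions and the ten means and squared standard errors passed to `combine_imputations`; the pooled standard error then reflects the imputation uncertainty as well as the sampling error.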

Data Projects: What do the adjustments do?
Columns: point values only, (1) outliers, (2) removed; reweighted, (3) outliers, (4) removed; imputations (no outliers), (5) mean, (6) midpt, (7) hotdeck, (8) multiple
Survey    (1)        (2)        (3)        (4)        (5)        (6)        (7)        (8)
…         (54.73)    (54.74)    (59.33)    (59.34)    (53.15)    (57.47)    (54.32)    (66.63)
…         (42.5)     (42.51)    (95.37)    (95.39)    (52.77)    (60.29)    (55.41)    (70.15)
…         (90)       (75.37)    (111.01)   (96.57)    (68.33)    (67.95)    (72.03)    (79.7)
…         (327.01)   (77.62)    (259.53)   (84.85)    (66.26)    (74.57)    (68.73)    (111.25)
2000:…    (80.22)    (73.01)    (90.96)    (85.78)    (69.45)    (84.94)    (74.63)    (72.67)
2000:…    (…)        (74.85)    (990.97)   (78.26)    (72.71)    (85.54)    (74.65)    (79.74)
2001:…    (43.67)    (42.25)    (61.42)    (60.53)    (51.24)    (55.77)    (54.46)    (61.7)
2001:…    (59.3)     (50.3)     (77.94)    (69.3)     (55.21)    (65.37)    (57.25)    (60.77)
Estimated standard errors in parentheses, correcting for clustering, but not correcting for imputations (except in the multiple imputations case). Point estimates and most survey labels are not recoverable (“…”).

USING THE DATA: WAGE AND WAGE INEQUALITY TRENDS REDI 3x3

Wage and Wage Inequality Trends Real wage trends

Wage and Wage Inequality Trends Looking at the wage distribution

USING THE DATA: TOP EARNINGS REDI 3x3

Top Earnings: Preview
Preliminary work done on PALMS v1
Core idea: fit a Pareto distribution to the top tail
Estimation strategy
– Nonparametric
– Parametric
Results

Top Earnings: Why Pareto distribution?
Seems to fit the top tail reasonably well
Cowell & Flachaire (2007) suggest that in the presence of data quality issues, inequality might be estimated better by a hybrid approach:
– Standard nonparametric estimates on the bulk of the distribution, combined with estimation of the Pareto coefficient at the top
Pareto coefficient is a measure of how “heavy” the tails at the top are
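As an illustration of estimating a Pareto coefficient above a cut-off, the sketch below uses the standard maximum likelihood (Hill) estimator, alpha = n / sum(ln(x_i / cutoff)), with approximate standard error alpha / sqrt(n). Whether the presentation used this estimator is an assumption, the function name is hypothetical, and survey weights are ignored for simplicity. A smaller alpha means a heavier top tail.

```python
import numpy as np

def pareto_alpha(incomes, cutoff):
    """ML (Hill) estimate of the Pareto coefficient for incomes at or above cutoff.

    Returns (alpha, approximate standard error, number of tail observations).
    """
    incomes = np.asarray(incomes, dtype=float)
    tail = incomes[incomes >= cutoff]
    n = tail.size
    alpha = n / np.log(tail / cutoff).sum()
    se = alpha / np.sqrt(n)
    return alpha, se, n
```

Running `pareto_alpha` over a grid of cut-offs is a simple way to check how sensitive the estimate is to the cut-off chosen.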

Top Earnings Pareto distribution

Top Earnings Position of the top tail

Top Earnings Distribution within the top tail

Top Earnings: Estimated Pareto coefficients
Values lost from the transcript are marked “…”.
          Cutoff: R4501 (1996)       Cutoff: R6001 (1996)     Cutoff: R8001 (1996)     Cutoff: R2501 (1996)
Survey    alpha             n        alpha           n        alpha           n        alpha           n
95Oct     1.950 (0.0376)    4,…      … (0.0527)      2,…      … (0.0788)      1,…      … (0.0180)      9,536
96Oct     1.873 (0.0639)    1,…      … (0.0841)      …        … (0.114)       …        … (0.0284)      3,781
97Oct     1.712 (0.0451)    2,…      … (0.0556)      1,…      … (0.0671)      …        … (0.0224)      5,999
98Oct     1.471 (0.0451)    1,…      … (0.0510)      1,…      … (0.0631)      …        … (0.0297)      4,175
99Oct     1.728 (0.0540)    2,…      … (0.0657)      1,…      … (0.0850)      …        … (0.0282)      4,990
00Sep     1.805 (0.0686)    2,…      … (0.0959)      1,…      … (0.124)       …        … (0.0282)      5,048
01Sep     2.138 (0.0621)    2,…      … (0.0818)      1,…      … (0.0897)      …        … (0.0248)      5,614
02Sep     1.914 (0.0584)    2,…      … (0.0871)      1,…      … (0.122)       …        … (0.0265)      5,079
03Sep     2.054 (0.0549)    2,…      … (0.0706)      1,…      … (0.0911)      …        … (0.0240)      5,442
04Sep     2.097 (0.0709)    2,…      … (0.0926)      1,…      … (0.126)       …        … (0.0306)      5,088
05Sep     1.808 (0.0621)    2,…      … (0.0920)      1,…      … (0.109)       …        … (0.0271)      5,024
06Sep     1.857 (0.0651)    2,…      … (0.0793)      1,…      … (0.117)       …        … (0.0282)      5,354
07Sep     1.628 (0.0918)    2,…      … (0.119)       1,…      … (0.155)       1,…      … (0.0453)      5,166
Pooled    1.823 (0.0140)    53,…     … (0.0186)      31,…     … (0.0238)      17,…     … (0.0064)      117,647

Top Earnings: Summary
No evidence in the graphs or table of a systematic trend for the distribution to flatten out/steepen
Above a cut-off of R4500, the parameter estimates are not that sensitive to the particular cut-off chosen

Top Earnings Implications

Top Earnings: Example
Illustrative probabilities in the tail
[Table: cut-off (monthly) | prob | numbers; values not recoverable]

Top Earnings Tax statistics Cutoff

Top Earnings: Discussion
Results in this case are somewhat sensitive to the choice of the cut-off
– For some choices there seems to be evidence that the tail gets “fatter”
– Change in coverage?
The Pareto estimates (ranging from 1.5 to 1.1) are noticeably smaller than in the case of labour earnings
– Impact of returns on investments? Other forms of compensation?
Some comparative figures for other countries (Levy & Levy): US 1.35, UK 1.06, France 1.83

WHERE TO NOW? REDI 3x3

Top Earnings: PALMS
We will update PALMS next year
There seems to be a need for more extensive training
– Use of the “bracket weights”
– Use of the multiple imputation dataset
Further work on data quality adjustments

Top Earnings: TAX DATA
Hopefully we’ll be able to redo the “top tails” analyses on unit record data
Make a “synthetic” version available