Inequality: Empirical Issues
Frank Cowell
Inequality and Poverty Measurement, Technical University of Lisbon, July 2006

Motivation
- Interested in sensitivity to extreme values for a number of reasons:
  - welfare properties of the income distribution
  - robustness in estimation
  - intrinsic interest in the very rich and the very poor

Sensitivity?
- How to define a “sensitive” inequality measure?
- Ad hoc discussion of individual measures:
  - empirical performance on actual data (Braulke 1983)
  - not satisfactory for characterising general properties
- Welfare-theoretic approaches:
  - focus on transfer sensitivity (Shorrocks and Foster 1987)
  - but do not provide a guide to the way measures may respond to extreme values
- Need a general and empirically applicable tool.

Preliminaries
- Consider a large class of inequality measures built from two moments of the distribution F:
  - the mean, μ(F) = ∫ y dF(y)
  - a power moment, ν_α(F) = ∫ y^α dF(y)
- Measures in this class can be written as a smooth function of the two moments, I(F) = ψ(ν_α(F), μ(F)).

The Influence Function
- Mixture distribution: F_ε = (1 − ε) F + ε δ_z, where δ_z puts all its mass on the income value z.
- Influence function: IF(z; I, F) = lim_{ε→0} [ I(F_ε) − I(F) ] / ε.
- For the class of inequality measures above, the IF follows directly from the influence functions of the two moments.
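One way to make the definition concrete is to approximate IF(z) numerically at an empirical distribution, applying the mixture formula above with a small ε. This is an illustrative sketch rather than anything from the slides; the Theil index and the lognormal stand-in sample are arbitrary choices for the example.

```python
# Numerical check of the influence-function definition:
# IF(z; I, F) ≈ [ I((1-eps) F + eps δ_z) - I(F) ] / eps  for small eps.
import numpy as np

def theil_weighted(y, w):
    """Theil index of a discrete distribution with probability weights w."""
    w = w / w.sum()
    mu = np.sum(w * y)
    return np.sum(w * (y / mu) * np.log(y / mu))

def numerical_if(y, z, index=theil_weighted, eps=1e-6):
    """Approximate IF(z) at the empirical distribution of the sample y."""
    n = len(y)
    w0 = np.full(n, 1.0 / n)
    base = index(y, w0)
    # mixture (1-eps) F_n + eps δ_z: add the point z with mass eps
    y_mix = np.append(y, z)
    w_mix = np.append((1 - eps) * w0, eps)
    return (index(y_mix, w_mix) - base) / eps

rng = np.random.default_rng(0)
y = rng.lognormal(mean=0.0, sigma=0.7, size=2000)   # stand-in income sample
for z in (0.05, 1.0, 10.0, 100.0):
    print(f"z = {z:6.2f}   IF ≈ {numerical_if(y, z):8.3f}")
```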

Some Standard Measures
- GE(α): I_α(F) = [ ∫ (y/μ(F))^α dF(y) − 1 ] / (α² − α), for α ≠ 0, 1
- Theil (α → 1): ∫ (y/μ(F)) log(y/μ(F)) dF(y)
- MLD (α → 0): ∫ log(μ(F)/y) dF(y)
- Atkinson(ε): 1 − [ ∫ (y/μ(F))^(1−ε) dF(y) ]^(1/(1−ε))
- Log variance: ∫ [ log(y/μ(F)) ]² dF(y)
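For reference, here is a plug-in sketch of these measures for a sample y, using the standard definitions listed above; the logarithmic-variance convention (deviations of log y from log μ) is one common choice.

```python
# Sample (plug-in) versions of the standard measures for a vector of incomes y.
import numpy as np

def generalized_entropy(y, alpha):
    """GE(alpha); alpha=0 gives the MLD, alpha=1 gives the Theil index."""
    y = np.asarray(y, dtype=float)
    mu = y.mean()
    if alpha == 0:                          # mean logarithmic deviation
        return np.mean(np.log(mu / y))
    if alpha == 1:                          # Theil index
        return np.mean((y / mu) * np.log(y / mu))
    return (np.mean((y / mu) ** alpha) - 1.0) / (alpha ** 2 - alpha)

def atkinson(y, epsilon):
    """Atkinson index with inequality-aversion parameter epsilon > 0."""
    y = np.asarray(y, dtype=float)
    mu = y.mean()
    if epsilon == 1:
        return 1.0 - np.exp(np.mean(np.log(y))) / mu
    return 1.0 - np.mean((y / mu) ** (1.0 - epsilon)) ** (1.0 / (1.0 - epsilon))

def log_variance(y):
    """Logarithmic variance: mean of [log(y/mu)]^2 (one common convention)."""
    y = np.asarray(y, dtype=float)
    return np.mean(np.log(y / y.mean()) ** 2)
```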

…and their IFs
- GE, Theil, MLD, Atkinson and the logarithmic variance each have an explicit influence function, obtained by differentiating the expressions above with respect to the contamination weight ε.
- Each IF is a simple combination of terms in z, z^α, log z and z log z; the dominant term governs how the measure reacts to extreme values.

Special case
- The Gini coefficient can be written as G(F) = 1 − (2/μ(F)) ∫ C(y) dF(y), where C(y) = ∫_0^y x dF(x) is the cumulative (incomplete) first moment.
- Its influence function is obtained in the same way as for the GE class, from the IFs of μ(F) and C(·).
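A minimal sample version of the Gini coefficient, written with the usual sorted-sample formula (one of several equivalent conventions), sketched for later use:

```python
# Gini coefficient from a sorted sample: G = 2*sum(i*y_(i)) / (n*sum(y)) - (n+1)/n.
import numpy as np

def gini(y):
    y = np.sort(np.asarray(y, dtype=float))
    n = y.size
    ranks = np.arange(1, n + 1)
    return 2.0 * np.sum(ranks * y) / (n * y.sum()) - (n + 1.0) / n
```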

Tail behaviour
- As z → ∞ the influence function grows:
  - like z^α for GE with α > 1
  - like z (with an extra log factor at α = 1) for GE with α ≤ 1, for the Log Var and for the Gini
- As z → 0 the influence function:
  - diverges like z^α for GE with α < 0
  - diverges like −log z for GE with α = 0 (MLD)
  - diverges like [log z]² for the Log Var
  - stays bounded for GE with α > 0 and for the Gini

Implications
- Generalised Entropy measures with α > 1 are very sensitive to high incomes in the data.
- GE measures with α < 0 are very sensitive to low incomes.
- We cannot compare the speed of increase of the IF for different values of 0 < α < 1.
- If we do not know the income distribution, we cannot compare the IFs of different classes of measures.
- So, let’s take a standard model…

Singh-Maddala
[Figure: Singh-Maddala densities for shape parameters c = 0.7, c = 1.2 and c = 1.7]

Using S-M to get the IFs
- Use the Singh-Maddala distribution to get true values of the inequality measures, obtained from its moments.
- Take parameter values a = 100, b = 2.8, c = 1.7: a good model of the income distribution of German households.
- Normalise the IFs: work with the relative influence function.
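The “true” values can be sketched directly from the Singh-Maddala moments. The parameterization below, F(y) = 1 − (1 + a·y^b)^(−c) with E[Y^r] = a^(−r/b) Γ(1 + r/b) Γ(c − r/b) / Γ(c), is one common convention and is assumed here; the parameter values are those quoted on the slide.

```python
# True GE values implied by a Singh-Maddala distribution (assumed parameterization:
# F(y) = 1 - (1 + a*y^b)^(-c); moments exist for -b < r < b*c).
import math

def sm_moment(r, a=100.0, b=2.8, c=1.7):
    """E[Y^r] for the Singh-Maddala distribution."""
    return a ** (-r / b) * math.gamma(1 + r / b) * math.gamma(c - r / b) / math.gamma(c)

def sm_true_ge(alpha, a=100.0, b=2.8, c=1.7):
    """True GE(alpha), alpha not in {0, 1}, from the first and alpha-th moments."""
    m1, ma = sm_moment(1, a, b, c), sm_moment(alpha, a, b, c)
    return (ma / m1 ** alpha - 1.0) / (alpha ** 2 - alpha)

for alpha in (-1.0, 0.5, 2.0):
    print(f"GE({alpha:+.1f}) = {sm_true_ge(alpha):.4f}")
```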

IFs based on S-M
[Figure: relative influence functions under the Singh-Maddala benchmark, for the GE family and the Gini coefficient]

IF using S-M: conclusions
- When z increases, the IF increases faster for high values of α.
- When z tends to 0, the IF increases faster for small values of α.
- The IF of the Gini index increases more slowly than the others, but is larger for moderate values of z.
- Comparison of the Gini index with GE or the logarithmic variance does not lead to clear conclusions.

A simulation approach
- Use a simulation study to evaluate the impact of contamination in extreme observations.
- Simulate 100 samples of 200 observations from the S-M distribution.
- Contaminate just one randomly chosen observation, either multiplying it by 10 (high-value contamination) or dividing it by 10 (low-value contamination).
- Compute the quantity RC(I), comparing the index on the contaminated distribution with the index on the uncontaminated empirical distribution (a sketch of the experiment follows).
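A sketch of this experiment, under two assumptions of mine: sampling is by inverse-CDF draws from the Singh-Maddala form used above, and RC(I) is taken to be the relative change (Î_contaminated − Î_clean)/Î_clean; the MLD as the example index is also my choice, not fixed by the slide.

```python
# Contamination experiment: 100 samples of 200 Singh-Maddala observations,
# one randomly chosen observation multiplied (or divided) by 10.
import numpy as np

def sm_sample(n, a=100.0, b=2.8, c=1.7, rng=None):
    """Inverse-CDF draws from F(y) = 1 - (1 + a*y^b)^(-c)."""
    u = (rng or np.random.default_rng()).uniform(size=n)
    return (((1.0 - u) ** (-1.0 / c) - 1.0) / a) ** (1.0 / b)

def mld(y):
    return np.mean(np.log(np.mean(y) / y))

rng = np.random.default_rng(42)
rc_high, rc_low = [], []
for _ in range(100):                       # 100 samples of 200 observations
    y = sm_sample(200, rng=rng)
    i = rng.integers(y.size)               # one randomly chosen observation
    y_hi, y_lo = y.copy(), y.copy()
    y_hi[i] *= 10.0                        # contamination in high values
    y_lo[i] /= 10.0                        # contamination in low values
    base = mld(y)
    rc_high.append((mld(y_hi) - base) / base)
    rc_low.append((mld(y_lo) - base) / base)

print("median RC, x10 contamination :", np.median(rc_high))
print("median RC, /10 contamination :", np.median(rc_low))
```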

Contamination in high values
[Figure: RC(I) for 100 different samples, sorted so that the Gini realisations are increasing]
- The Gini coefficient is less affected by contamination than the GE measures.
- The impact on Log Var and GE (α ≤ 1) is relatively small compared to GE (α > 1).
- GE (0 ≤ α ≤ 1) is less sensitive if α is smaller.
- Log Var is slightly more sensitive than the Gini.

Contamination in low values
[Figure: RC(I) for 100 different samples, sorted so that the Gini realisations are increasing]
- The Gini coefficient is less affected by contamination than the GE measures.
- The impact on Log Var and GE (α ≥ 1) is relatively small compared to GE (α < 1).
- GE (0 ≤ α ≤ 1) is less sensitive if α is larger.
- Log Var is more sensitive than the Gini.

Influential Observations
- Drop the i-th observation from the sample and call the resulting inequality estimate Î(i).
- Compare the full-sample estimate with Î(i) by means of a relative-difference statistic.
- Take a sorted sample of 5,000 observations and examine the effect of dropping each of 10 observations from the bottom, the middle and the top (see the sketch below).
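A leave-one-out sketch along these lines, using GE(α = 2) and the relative change (Î(i) − Î)/Î as a stand-in for the statistic on the slide, which is not reproduced in the transcript; the Singh-Maddala sample follows the earlier design.

```python
# Drop observation i, re-estimate GE(2), and report the relative change.
import numpy as np

def ge2(y):
    """GE(2) = [ mean((y/mu)^2) - 1 ] / 2."""
    mu = y.mean()
    return (np.mean((y / mu) ** 2) - 1.0) / 2.0

rng = np.random.default_rng(1)
u = rng.uniform(size=5000)
y = np.sort((((1.0 - u) ** (-1.0 / 1.7) - 1.0) / 100.0) ** (1.0 / 2.8))  # sorted S-M sample

full = ge2(y)
for label, idx in (("bottom", range(10)),
                   ("middle", range(2495, 2505)),
                   ("top", range(4990, 5000))):
    changes = [(ge2(np.delete(y, i)) - full) / full for i in idx]
    print(f"{label:6s}: max |relative change| = {max(abs(c) for c in changes):.4f}")
```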

Influential observations: summary
- Observations in the middle of the sorted sample barely affect the estimates, compared with the smallest or largest observations.
- The highest values are more influential than the smallest values.
- The highest value is very influential for GE (α = 2): the estimate changes markedly if this single observation is removed.
- GE (α = −1) is strongly influenced by the smallest observation.

Extreme values
- An extreme value is not necessarily an error or some sort of contamination.
- It could be an observation belonging to the true distribution, and could convey important information.
- An observation is “extreme” in the sense that its influence on the estimate of the inequality measure is important.
- Call this a high-leverage observation.

High-leverage observations
- The term leaves open the question of whether such observations “belong” to the distribution.
- But they can have important consequences for the statistical performance of the measure.
- This performance can be used to characterise the properties of inequality measures under certain conditions.
- Focus on the Error in Rejection Probability (ERP) as the criterion.

Davidson-Flachaire (1)
- Even in very large samples, the ERP of an asymptotic or bootstrap test based on the Theil index can be significant.
- Such tests are therefore not reliable.
- Three main possible causes:
  1. nonlinearity
  2. noise
  3. the nature of the tails

Davidson-Flachaire (2)
- Three main possible causes:
  1. Indices are nonlinear functions of sample moments, which induces biases and non-normality in the estimates.
  2. Estimates of the covariances of the sample moments used to construct the indices are often noisy.
  3. Indices are often sensitive to the exact nature of the tails: a bootstrap sample with nothing resampled from the tail can have properties quite different from those of the population.
- Simulation experiments show that cause 3 is often quantitatively the most important.
- Statistical performance should therefore be better with the MLD and GE (0 < α < 1) than with the Theil index.

Empirical methods
- The empirical distribution: F̂(y) = (1/n) Σ_i ι(y_i ≤ y), where ι(·) is the indicator function.
- Empirical moments: μ(F̂) = (1/n) Σ_i y_i and ν_α(F̂) = (1/n) Σ_i y_i^α.
- Inequality estimate: the plug-in value Î = I(F̂).

Testing
- For a given value I₀, test H₀: I(F) = I₀.
- Test statistic: W = (Î − I₀) / [V̂(Î)]^(1/2), where V̂(Î) is a variance estimate for Î.
- Under H₀, W is compared with standard normal critical values.
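A sketch of such a test for the MLD. The standard error below is an influence-function (delta-method) plug-in, which is an assumption of this sketch rather than necessarily the variance estimate used on the slide.

```python
# Asymptotic two-sided test of H0: MLD = i0, with an IF-based standard error.
import math
import numpy as np

def mld_test(y, i0):
    """Return (estimate, test statistic W, asymptotic two-sided p-value)."""
    y = np.asarray(y, dtype=float)
    n, mu = y.size, y.mean()
    i_hat = np.log(mu) - np.mean(np.log(y))
    # empirical influence function of the MLD at each observation
    if_vals = (y - mu) / mu - (np.log(y) - np.mean(np.log(y)))
    se = np.sqrt(np.sum(if_vals ** 2)) / n          # sqrt( mean(IF^2) / n )
    w = (i_hat - i0) / se
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(w) / math.sqrt(2.0))))
    return i_hat, w, p
```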

Bootstrap
- To construct a bootstrap test, resample from the original data; bootstrap inference should be superior.
- For bootstrap sample j, j = 1, …, B, a bootstrap statistic W*_j is computed almost as W was from the original data, but with I₀ in the numerator replaced by the index Î estimated from the original data.
- The bootstrap P-value is then the proportion of the W*_j that are more extreme than the observed W.
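A self-contained sketch of the bootstrap test for the MLD, following the recipe above; B = 199, the two-sided p-value convention and the IF-based standard error are illustrative choices of this sketch.

```python
# Bootstrap test for the MLD: W*_j is built like W but with I0 replaced by I_hat.
import numpy as np

def mld_and_se(y):
    """MLD estimate and an IF-based (delta-method) standard error."""
    n, mu = y.size, y.mean()
    i_hat = np.log(mu) - np.mean(np.log(y))
    if_vals = (y - mu) / mu - (np.log(y) - np.mean(np.log(y)))
    return i_hat, np.sqrt(np.sum(if_vals ** 2)) / n

def bootstrap_mld_test(y, i0, B=199, rng=None):
    rng = rng or np.random.default_rng()
    y = np.asarray(y, dtype=float)
    i_hat, se = mld_and_se(y)
    w = (i_hat - i0) / se
    w_star = np.empty(B)
    for j in range(B):
        yb = rng.choice(y, size=y.size, replace=True)   # resample from the data
        ib, seb = mld_and_se(yb)
        w_star[j] = (ib - i_hat) / seb                  # I0 replaced by I_hat
    p_boot = np.mean(np.abs(w_star) > abs(w))           # bootstrap P-value
    return w, p_boot
```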

Error in Rejection Probability: A
[Figure: ERPs of asymptotic tests at the nominal level 0.05]
- ERP = the difference between the actual and nominal probabilities of rejection.
- Example (at the sample size shown in the figure): the ERP of GE (α = 2) is 0.11, so the asymptotic test over-rejects the null hypothesis; the actual level is 16% when the nominal level is 5%.
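ERP figures of this kind can be reproduced in principle as follows; the sketch uses the asymptotic MLD test from above rather than the GE(α = 2) example on the slide, approximates the “true” index value by one very large simulated sample, and the settings R and n are arbitrary illustrative choices.

```python
# ERP of the asymptotic MLD test at nominal 5% under the Singh-Maddala design.
import math
import numpy as np

def sm_sample(n, a=100.0, b=2.8, c=1.7, rng=None):
    u = (rng or np.random.default_rng()).uniform(size=n)
    return (((1.0 - u) ** (-1.0 / c) - 1.0) / a) ** (1.0 / b)

def mld_test_pvalue(y, i0):
    n, mu = y.size, y.mean()
    i_hat = np.log(mu) - np.mean(np.log(y))
    if_vals = (y - mu) / mu - (np.log(y) - np.mean(np.log(y)))
    se = np.sqrt(np.sum(if_vals ** 2)) / n
    w = (i_hat - i0) / se
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(w) / math.sqrt(2.0))))

rng = np.random.default_rng(7)
big = sm_sample(1_000_000, rng=rng)
i_true = np.log(big.mean()) - np.mean(np.log(big))   # stand-in for the true MLD

R, n, reject = 5000, 500, 0
for _ in range(R):
    reject += mld_test_pvalue(sm_sample(n, rng=rng), i_true) < 0.05
print("ERP at nominal 5% ≈", reject / R - 0.05)
```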

Error in Rejection Probability: B
[Figure: ERPs of bootstrap tests]
- Distortions are reduced for all measures.
- But the ERP of GE (α = 2) is still very large, even in large samples.
- The ERPs of GE (α = 0.5 and α = −1) are small only for large samples.
- GE (α = 0), the MLD, performs better than the others: its ERP is small for 500 or more observations.

More on ERP for GE
- What would happen in very large samples?
[Figure: ERPs of GE tests for different values of α, with N = 50,000 and N = 100,000]

ERP: conclusions
- The rate of convergence to zero of the ERP of asymptotic tests is very slow; the same applies to the bootstrap.
- Tests based on GE measures can be unreliable even in large samples.

Sensitivity: a broader perspective
- The results so far are for a specific Singh-Maddala distribution: realistic, but obviously special.
- Consider alternative parameter values, with particular focus on behaviour in the upper tail.
- Consider alternative distributions: other familiar and “realistic” functional forms, focusing on the lognormal and the Pareto.

Alternative distributions
- First consider comparative contamination performance across alternative distributions for the same inequality index.
- Use the same diagrammatic tool as before:
  - the x-axis shows the 100 different samples, sorted so that the inequality realisations are increasing
  - the y-axis shows RC(I) for the MLD index

Singh-Maddala
[Figure: Singh-Maddala densities for c = 0.7 (“heavy” upper tail), c = 1.2 and c = 1.7]
- Distribution function: F(y) = 1 − (1 + a y^b)^(−c).
- Inequality values found from the moments, as before.

Contamination S-M
[Figure: RC(I) for the MLD under contamination, for the three Singh-Maddala parameter values]

Lognormal
[Figure: lognormal densities for σ = 0.5, σ = 0.7 and σ = 1.0 (“heavy” upper tail)]
- Distribution function: F(y) = Φ((log y − m)/σ), the lognormal Λ(m, σ²).
- Inequality values follow in closed form from σ (see the sketch below).
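The lognormal benchmark has convenient closed forms; this sketch tabulates “true” values for the three σ settings using standard lognormal results (MLD = Theil = σ²/2, GE(α) from the moment ratio exp(α(α−1)σ²/2), Gini = 2Φ(σ/√2) − 1), none of which are spelled out on the slide.

```python
# Closed-form "true" inequality values under the lognormal benchmark.
import math

def lognormal_true(sigma):
    mld = theil = sigma ** 2 / 2.0
    gini = math.erf(sigma / 2.0)            # equals 2*Phi(sigma/sqrt(2)) - 1
    def ge(alpha):
        return (math.exp(alpha * (alpha - 1.0) * sigma ** 2 / 2.0) - 1.0) / (alpha ** 2 - alpha)
    return {"MLD": mld, "Theil": theil, "Gini": gini, "GE(2)": ge(2.0)}

for s in (0.5, 0.7, 1.0):
    print(s, lognormal_true(s))
```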

Contamination: Lognormal
[Figure: RC(I) for the MLD under contamination, for the three lognormal parameter values]

Pareto
[Figure: Pareto densities for α = 1.5 (“heavy” upper tail), α = 2.0 and α = 2.5]
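Pareto comparison samples can be drawn by inverse-CDF under the usual form F(y) = 1 − (y_m/y)^α, an assumed convention with minimum income y_m = 1 (the scale is irrelevant for scale-invariant measures); the closed-form Gini, 1/(2α − 1), gives a quick sanity check.

```python
# Inverse-CDF Pareto sampler plus a Gini sanity check against 1/(2*alpha - 1).
import numpy as np

def pareto_sample(n, alpha, y_min=1.0, rng=None):
    u = (rng or np.random.default_rng()).uniform(size=n)
    return y_min * (1.0 - u) ** (-1.0 / alpha)

def gini(y):
    y = np.sort(np.asarray(y, dtype=float))
    n = y.size
    return 2.0 * np.sum(np.arange(1, n + 1) * y) / (n * y.sum()) - (n + 1.0) / n

rng = np.random.default_rng(3)
for alpha in (1.5, 2.0, 2.5):
    y = pareto_sample(100_000, alpha, rng=rng)
    print(f"alpha={alpha}: sample Gini = {gini(y):.3f}, theoretical = {1.0/(2*alpha-1):.3f}")
```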

MLD Contamination Pareto
[Figure: RC(I) for the MLD under contamination, for the three Pareto parameter values]

ERP at nominal 5%: MLD
[Figure: ERPs of asymptotic tests and bootstrap tests based on the MLD, across the alternative distributions]

ERP at nominal 5%: Theil
[Figure: ERPs of asymptotic tests and bootstrap tests based on the Theil index, across the alternative distributions]

Comparing Distributions
- Bootstrap tests usually improve numerical performance.
- The MLD is more sensitive to contamination in high incomes when the upper tail of the underlying distribution is heavy.
- The ERP of asymptotic and bootstrap tests based on the MLD or the Theil index is larger when the upper tail of the underlying distribution is heavy.

Why the Gini…?
- Why use the Gini coefficient?
  - it has obvious intuitive appeal
  - it is sometimes suggested that the Gini is less prone to the influence of outliers
- It is indeed less sensitive to contamination in high incomes than the GE indices.
- But there is little to choose between the Gini coefficient and the MLD, or between the Gini and the logarithmic variance.

The Bootstrap…?
- Does the bootstrap “get you out of trouble”?
- The bootstrap performs better than asymptotic methods, but does it perform well enough?
- In terms of the ERP, the bootstrap does well only for the Gini, the MLD and the logarithmic variance.
- With a heavy upper tail, the bootstrap performs poorly even in the case of α = 0 (the MLD), and even in large samples.