Stat 305 2009 Lab 7
Two methods of finding estimators:
1. Method of moments estimator
2. Maximum likelihood estimator
How do we compare estimators? Which one is better? How do we measure the goodness of an estimator $\hat\theta$ of $\theta$?
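Criterion: closeness of $\hat\theta$ to the true value $\theta$.
Measure closeness with a distance function $d(\hat\theta, \theta)$. But $\hat\theta$ is random, so $d(\hat\theta, \theta)$ is random too!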
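Since $d(\hat\theta, \theta)$ is random, take its expectation. In one dimension, the squared distance $d(\hat\theta, \theta) = (\hat\theta - \theta)^2$ gives the mean squared error:
$\mathrm{MSE}(\hat\theta) = E[(\hat\theta - \theta)^2]$,
which in general is not equal to $\mathrm{Var}(\hat\theta) = E[(\hat\theta - E\hat\theta)^2]$. Instead,
$\mathrm{MSE}(\hat\theta) = \mathrm{Var}(\hat\theta) + \mathrm{bias}^2$, where $\mathrm{bias} = E(\hat\theta) - \theta$.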
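The decomposition follows by adding and subtracting $E\hat\theta$ inside the square. The cross term vanishes because $E[\hat\theta - E\hat\theta] = 0$:

```latex
\begin{aligned}
\mathrm{MSE}(\hat\theta)
  &= E\big[(\hat\theta - E\hat\theta + E\hat\theta - \theta)^2\big] \\
  &= E\big[(\hat\theta - E\hat\theta)^2\big]
     + 2\,(E\hat\theta - \theta)\,E\big[\hat\theta - E\hat\theta\big]
     + (E\hat\theta - \theta)^2 \\
  &= \mathrm{Var}(\hat\theta) + 0 + \mathrm{bias}^2 .
\end{aligned}
```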
In one dimension, $\hat\theta_1$ is better than $\hat\theta_2$ if $\hat\theta_1$ has a smaller MSE than $\hat\theta_2$ for all $\theta$.
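The MSE is often hard to obtain in explicit form! How can we estimate it? In a simulation, we can generate B independent random samples of size n, say $(X_1^{(i)}, \dots, X_n^{(i)})$ for i = 1, …, B, compute the estimate $\hat\theta_i$ from the i-th sample, and estimate the MSE by
$\widehat{\mathrm{MSE}}(\hat\theta) = \frac{1}{B}\sum_{i=1}^{B} (\hat\theta_i - \theta)^2$,
where $\theta$ is the true value used to generate the data.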
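The Monte Carlo MSE estimate can be wrapped in a small generic R function. This is a minimal sketch: the name `mc_mse` and its arguments (`estimator`, `rgen`) are hypothetical, and the normal-mean example is only an illustration with a known answer (the exact MSE of the sample mean of 25 draws from N(5, 2²) is 4/25 = 0.16).

```r
# Monte Carlo estimate of MSE(theta_hat) = E[(theta_hat - theta)^2]
#   estimator: function(x) returning one number, the estimate theta_hat
#   rgen:      function(n) returning a random sample of size n
#   theta:     the true parameter value used to generate the data
mc_mse <- function(estimator, rgen, theta, n, B = 1000) {
  est <- replicate(B, estimator(rgen(n)))   # theta_hat_1, ..., theta_hat_B
  mean((est - theta)^2)                     # (1/B) * sum (theta_hat_i - theta)^2
}

set.seed(1)
# Illustration: sample mean of N(5, 2^2) data with n = 25 (exact MSE = 0.16)
mc_mse(mean, function(n) rnorm(n, mean = 5, sd = 2), theta = 5, n = 25)
```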
Assignment 2
(a) For n = 200, write an R function that simulates the incomes of n individuals from the Pareto distribution. Return $Y_n$ and $Z_n$.
Hints: Let U be uniformly distributed over [0, 1]. Then $1000\,U^{-1/\alpha}$ follows the Pareto distribution with minimum income 1000 and shape parameter $\alpha$.
(i) Generate a random sample u of size n from the uniform distribution over [0, 1]: runif(n)
(ii) Transform u into $1000\,u^{-1/\alpha}$, and call the transformed sample x.
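Hint steps (i) and (ii) are an inverse-CDF transform. For t ≥ 1000, $P(1000\,U^{-1/\alpha} > t) = P(U < (1000/t)^\alpha) = (1000/t)^\alpha$, which is the Pareto survival function. A minimal sketch, where the helper name `rpareto_income` is hypothetical and the default α = 1.16 is taken from part (c2):

```r
# Hypothetical helper name; implements hint steps (i) and (ii).
# Simulates n incomes from a Pareto distribution with minimum 1000 and shape alpha.
rpareto_income <- function(n, alpha = 1.16) {
  u <- runif(n)           # (i) U ~ Uniform(0, 1); runif does not return exactly 0
  1000 * u^(-1 / alpha)   # (ii) transform so that P(X > t) = (1000 / t)^alpha
}

set.seed(305)
x <- rpareto_income(200)
summary(x)   # every value is at least 1000
```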
myfunction1a = function(n) {
  # Simulate the incomes x using steps (i) and (ii)
  # Define Yn and Zn
  Yn = ???
  Zn = ???
  return(c(Yn, Zn))
}
(b) Write an R function that repeats (a) K = 1000 times. Then we have 1000 different values of both $Y_{200}$ and $Z_{200}$.
Y = rep(0, 1000)
Z = rep(0, 1000)
for (i in 1:1000) {
  res = myfunction1a(200)   # call once so Y[i] and Z[i] come from the same sample
  Y[i] = res[1]
  Z[i] = res[2]
}
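The same repetition can be written with `replicate`, which calls the function once per repetition and stacks the two returned estimates as columns of a matrix. Because the placeholders in `myfunction1a` are not filled in here, this sketch uses a hypothetical stand-in `demo_fun` (sample mean and median of uniform data). It is not the lab's $Y_n$ and $Z_n$; substitute your own function.

```r
# Hypothetical stand-in for myfunction1a: like (a), it returns TWO estimates
# computed from ONE simulated sample. Here they are simply the sample mean and
# median of Uniform(0, 1) data, only so that the snippet runs on its own.
demo_fun <- function(n) {
  x <- runif(n)
  c(mean(x), median(x))
}

set.seed(305)
est <- replicate(1000, demo_fun(200))  # 2 x 1000 matrix, one column per repetition
Y <- est[1, ]                          # first estimate from each sample
Z <- est[2, ]                          # second estimate from the same sample
```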
(b1) Output the sample means and sample standard deviations of $Y_{200}$ and $Z_{200}$. Plot the box-plots of $Y_{200}$ and $Z_{200}$ and compare them.
boxplot(Y, Z)
OR
boxplot(Y)
boxplot(Z)
When plotting separately, use the same scale!!! For example,
boxplot(Y, ylim = c(0.9, 1.8))
boxplot(Z, ylim = c(0.9, 1.8))
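The summary statistics can be collected in one small table, and passing a named list to `boxplot` draws both box-plots on a shared axis. A sketch with hypothetical helper names, demonstrated on toy vectors only:

```r
# Sample mean and sample standard deviation of each set of estimates
summarize_estimates <- function(Y, Z) {
  rbind(Y = c(mean = mean(Y), sd = sd(Y)),
        Z = c(mean = mean(Z), sd = sd(Z)))
}

# Side-by-side box-plots drawn on one shared vertical scale
compare_boxplots <- function(Y, Z) {
  boxplot(list(Y = Y, Z = Z), ylab = "estimate of alpha")
}

# Toy vectors, only to show the output format
summarize_estimates(c(1, 2, 3), c(2, 4, 6))
```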
(b2) Plot the histogram of $\sqrt{200}\,(Y_{200} - \alpha)$.
alpha = 1.16   # true value of alpha, as in (c2)
newy200 = sqrt(200) * (Y - alpha)
hist(newy200, freq = FALSE)
(c1) Repeat the exercise in (b) for n = 400, 600, 800, 1000. What does the histogram of $\sqrt{1000}\,(Y_{1000} - \alpha)$ look like?
# Plot 4 histograms in the same window
par(mfrow = c(2, 2))
hist(newy400, freq = FALSE, main = "n = 400")
hist(newy600, freq = FALSE, main = "n = 600")
hist(newy800, freq = FALSE, main = "n = 800")
hist(newy1000, freq = FALSE, main = "n = 1000")
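The four repetitions can be driven by a loop over n instead of four hand-written copies. This sketch assumes a stand-in estimator: the Pareto maximum likelihood estimator of α with the minimum income 1000 known, $\hat\alpha = n / \sum_i \log(x_i/1000)$. It is not necessarily the lab's $Y_n$. For this stand-in, standard theory says $\sqrt{n}(\hat\alpha - \alpha)$ is approximately N(0, α²), so the histograms should look increasingly bell-shaped. The names `est_mle` and `sim_scaled` are hypothetical.

```r
alpha <- 1.16

# ASSUMPTION: stand-in estimator, the Pareto MLE of alpha when the minimum
# income 1000 is known. Replace est_fun with the lab's Y_n from (a).
est_mle <- function(x) length(x) / sum(log(x / 1000))

# K repetitions of (b) for sample size n; returns sqrt(n) * (estimate - alpha)
sim_scaled <- function(n, K = 1000, est_fun = est_mle) {
  est <- replicate(K, est_fun(1000 * runif(n)^(-1 / alpha)))
  sqrt(n) * (est - alpha)
}

set.seed(305)
ns   <- c(400, 600, 800, 1000)
newy <- lapply(ns, sim_scaled)

par(mfrow = c(2, 2))   # 4 histograms in the same window
for (j in seq_along(ns)) {
  hist(newy[[j]], freq = FALSE, main = paste("n =", ns[j]),
       xlab = "sqrt(n) * (Y - alpha)")
}
```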
(c2) If you want to estimate $\alpha$, which estimator, $Y_n$ or $Z_n$, would you prefer? Compare their estimated MSEs, using the true value $\alpha = 1.16$ and B = 1000 simulated samples.
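Given the B simulated values of each estimator, the comparison reduces to computing $\frac{1}{B}\sum_i (\hat\theta_i - \alpha)^2$ for each one and preferring the smaller, following the MSE criterion of this lab. A sketch with a hypothetical helper name, shown on a toy example with B = 2:

```r
# Compare two estimators of alpha from their simulated values (one per sample).
# The estimator with the smaller estimated MSE is reported as preferred.
compare_estimators <- function(Y, Z, alpha = 1.16) {
  stats <- function(e) c(bias = mean(e) - alpha,
                         sd   = sd(e),
                         mse  = mean((e - alpha)^2))   # (1/B) * sum (e_i - alpha)^2
  tab <- rbind(Y = stats(Y), Z = stats(Z))
  list(table = tab, preferred = rownames(tab)[which.min(tab[, "mse"])])
}

# Toy example with B = 2, only to show the output format
compare_estimators(c(1.1, 1.2), c(1.0, 1.4))
```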
(d) Simulate the incomes earned by a population of n = 10000 individuals as in part (a). For p = 1, 2, …, 100, calculate the total income earned by the p% of the population with the lowest income. Obtain Q(p), the proportion of the income (earned by the whole population) owned by the p% of the people with the lowest income. Plot the Lorenz curve, i.e., Q(p) against p.
The theoretical value of Q(p) is the Lorenz curve
$L(p) = \frac{\int_0^{X(p)} x f(x)\,dx}{\int_0^{\infty} x f(x)\,dx}$,
where f is the density of X and X(p) is the p/100-th population quantile of the distribution of X, i.e. $P(X \le X(p)) = p/100$.
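The definition of L(p) can be evaluated numerically with `integrate`. This sketch assumes the Pareto distribution of part (a): minimum income 1000, shape α = 1.16, density $f(x) = \alpha\,1000^\alpha / x^{\alpha+1}$ for x ≥ 1000, quantile $X(p) = 1000\,(1 - p/100)^{-1/\alpha}$ and $E[X] = 1000\,\alpha/(\alpha - 1)$. The helper names are hypothetical.

```r
alpha <- 1.16
xm    <- 1000                                         # minimum income
dens  <- function(x) alpha * xm^alpha / x^(alpha + 1) # Pareto density (zero below xm)
qpar  <- function(p) xm * (1 - p / 100)^(-1 / alpha)  # X(p), with P(X <= X(p)) = p/100
EX    <- alpha * xm / (alpha - 1)                     # E[X]; finite because alpha > 1

# L(p) from its definition; the lower limit is xm because f(x) = 0 for x < xm
lorenz_numeric <- function(p) {
  integrate(function(x) x * dens(x), lower = xm, upper = qpar(p))$value / EX
}

sapply(c(10, 50, 90), lorenz_numeric)
```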
(d) The sample analogue of L(p), computed from the simulated incomes, is
$Q(p) = \frac{\frac{1}{n}\sum_{i=1}^{[np/100]} x_i}{\frac{1}{n}\sum_{i=1}^{n} x_i}$,
where $x_1 \le x_2 \le \dots \le x_n$ are the ordered data, so $x_{[np/100]}$ is the [np/100]-th ordered observation and roughly p% of the data are less than or equal to it. Here [z] is the largest integer less than or equal to z. For example, [1.89] = 1, [0.23] = 0 and [−1.34] = −2.
(d) The numerator of Q(p) is (up to the factor 1/n) a partial sum of the ordered data.
Sort the data with sort(x) (ascending by default); below, x[k] is the k-th element of the sorted data.
When p = 1: x[1] + … + x[n/100], the sum of the first 100 ordered values (n = 10000).
When p = 2: x[1] + … + x[n/100] + x[n/100+1] + … + x[2n/100].
When p = 100: x[1] + … + x[n], the sum of all x.
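All 100 partial sums come from a single `cumsum` of the sorted data, so there is no need to re-add values for each p. In this sketch the name `empirical_Q` is hypothetical. A leading 0 is prepended so that [np/100] = 0 (possible for small n) gives Q(p) = 0.

```r
# Q(p) for p = 1, ..., 100 via partial sums of the sorted data:
# Q(p) = (x[1] + ... + x[[np/100]]) / (x[1] + ... + x[n]), x sorted ascending
empirical_Q <- function(x, p = 1:100) {
  n  <- length(x)
  cs <- c(0, cumsum(sort(x)))   # cs[k + 1] = sum of the k smallest values
  k  <- floor(n * p / 100)      # [np/100]: how many values are included
  cs[k + 1] / cs[n + 1]         # the 1/n factors cancel
}

set.seed(305)
x <- 1000 * runif(10000)^(-1 / 1.16)   # n = 10000 simulated incomes, alpha = 1.16
Q <- empirical_Q(x)                    # Q[1] uses the 100 smallest incomes
```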
(d) Obtain Q(p), the proportion of the income (earned by the whole population) owned by the p% of the people with the lowest income: divide the partial sum for p by the sum of all x.
(d) Plot the Lorenz curve, i.e., Q(p) against p.
The explicit form of L(p) for the Pareto distribution (with $\alpha > 1$) is
$L(p) = 1 - \left(1 - \frac{p}{100}\right)^{1 - 1/\alpha}$.
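The simulated Q(p) and the theoretical L(p) can be overlaid on one plot, together with the line of perfect equality Q(p) = p/100. This sketch assumes α = 1.16 and n = 10000 as in part (d). Because n = 10000, [np/100] = 100p exactly. The name `lorenz_pareto` is hypothetical.

```r
alpha <- 1.16
# Closed-form Pareto Lorenz curve, p in percent (requires alpha > 1)
lorenz_pareto <- function(p, alpha) 1 - (1 - p / 100)^(1 - 1 / alpha)

set.seed(305)
inc <- 1000 * runif(10000)^(-1 / alpha)     # simulated incomes, n = 10000
p   <- 1:100
cs  <- cumsum(sort(inc))
Q   <- cs[100 * p] / cs[10000]              # [np/100] = 100p when n = 10000

plot(p, Q, type = "l", xlab = "p (%)", ylab = "Q(p)", main = "Lorenz curve")
curve(lorenz_pareto(x, alpha), from = 0, to = 100, add = TRUE, lty = 2)
abline(a = 0, b = 1 / 100, col = "grey")    # line of perfect equality
legend("topleft", legend = c("simulated Q(p)", "theoretical L(p)", "equality"),
       lty = c(1, 2, 1), col = c("black", "black", "grey"))
```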
(e) Find the Gini coefficient G for the simulated data obtained in (d).
$G = 1 - 2 \cdot (\text{area under the Lorenz curve})$, which for the Pareto distribution equals $1/(2\alpha - 1)$.
Approximate the area under the Lorenz curve (horizontal axis p/100 from 0 to 1) by the average of Q(p) over p = 1, …, 100.
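A sketch of the Gini computation, assuming α = 1.16 and n = 10000 as in (d). The average of Q(p) over p = 1, …, 100 is a right-endpoint Riemann sum. Since Q(p) increases in p, it tends to overstate the area slightly, so the simulated G tends to come out a little low. With α = 1.16 the income distribution is very heavy-tailed, so expect noticeable variation from one simulation to the next.

```r
alpha <- 1.16
set.seed(305)
inc <- 1000 * runif(10000)^(-1 / alpha)   # simulated incomes, as in (d)
cs  <- cumsum(sort(inc))
Q   <- cs[100 * (1:100)] / cs[10000]      # Q(p), p = 1, ..., 100 (n = 10000)

G_hat  <- 1 - 2 * mean(Q)                 # area under the Lorenz curve ~ mean of Q(p)
G_true <- 1 / (2 * alpha - 1)             # Pareto value, about 0.758
c(simulated = G_hat, theoretical = G_true)
```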