Statistical Inference and Regression Analysis: Stat-GB.3302, Stat-UB
Professor William Greene
Stern School of Business
IOMS Department and Department of Economics
Part 2-A Expectations of Random Variables
Part 2-B Covariance and Correlation
Part 2-C Limit Results for Sums
Expected Value of a Random Variable
Weighted average of the values taken by the variable
Discrete Uniform: X = 1, 2, …, J; Prob(X = x) = 1/J
E[X] = 1/J + 2/J + … + J/J = [J(J+1)/2](1/J) = (J+1)/2. Expected toss of a die = 3.5 (J = 6). Expected sum of two dice = 7. Proof?
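A quick sketch in Python (added for illustration) that verifies both expectations by enumerating the sample space:

    # Enumerate the sample spaces of one and two fair dice.
    faces = range(1, 7)
    e_one = sum(faces) / 6                                  # (J+1)/2 with J = 6
    e_two = sum(i + j for i in faces for j in faces) / 36   # all 36 pairs
    print(e_one, e_two)                                     # 3.5 7.0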
Poisson (λ)
Poisson (5)
The St. Petersburg Paradox
Coin toss game: if the first head comes up on the nth toss, you win $2^n. The entry fee to play the game is $C. Expected value of the game: E[Win] = -C + (½)2 + (½)²2² + … + (½)^k 2^k + … = -C + 1 + 1 + 1 + … The game has infinite expected value, yet no one would pay very much to play it. Why not?
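A simulation sketch of the game in Python: the sample average of the winnings keeps drifting upward as the number of plays grows, which is how an infinite mean shows up in practice.

    import random

    def play():
        # Toss until the first head; win 2**n if it arrives on toss n.
        n = 1
        while random.random() < 0.5:   # tail with probability 1/2
            n += 1
        return 2 ** n

    random.seed(1)
    for trials in (10**3, 10**5, 10**6):
        avg = sum(play() for _ in range(trials)) / trials
        print(trials, avg)   # the average tends to grow without settling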
Continuous Random Variable
Gamma Random Variable
Gamma Function: Γ(1/2) = √π
Gamma Distributed Random Variable
Used to model nonnegative random variables, e.g., survival times of people and electronic components. Two special cases: P = 1 is the exponential distribution; P = ½ and λ = ½ is the chi squared with one "degree of freedom".
Expected Value of a Linear Translation
Z = aX + b, so E[Z] = aE[X] + b. The proof is immediate from the definition of the expected value and the fact that the density integrates to 1, which gives E[b] = b.
Normal(μ,σ) Variable: from the definition of the random variable, μ is the mean. The proof in Rice (p. 119) uses the linear translation: if X ~ N[0,1], then σX + μ ~ N(μ,σ).
Cauchy Random Variables
f(x) = (1/π)·1/(1 + x²). The mean does not exist, and no higher moments exist. If X ~ N[0,1] and Y ~ N[0,1] are independent, then X/Y has the Cauchy distribution. Many applications obtain estimates of interesting quantities as ratios of estimators that are normally distributed.
Cauchy Random Sample
Expected Value of a Function of X
Y = g(X). One-to-one case: E[Y] = expected value of Y(X); find the distribution of the new variable. E[g(X)] = Σx g(x)f(x) will equal E[Y]. Many-to-one case: similar argument; proceed without transforming the random variable. E[g(X)] is generally not equal to g(E[X]) unless g(X) is linear.
Powers of X: Moments. Moment = E[X^k] for positive integer k.
Raw moment: E[X^k] = μk'. Central moment: E[(X - E[X])^k] = μk. Mean = μ1' = μ.
Variance as a g(X): Variance = E[(X - E[X])²]
The standard deviation, the square root of the variance, is usually the more interesting quantity.
Variance of a Translation: Y = a + bX
Var[a] = 0. Var[bX] = b²Var[X]. Standard deviation of Y = |b|·S.D.(X).
Shortcut: Var[X] = E[X²] - {E[X]}²
Bernoulli: Prob(X=1) = θ; Prob(X=0) = 1 - θ. E[X] = 0(1-θ) + 1(θ) = θ.
Var[X] = E[X²] - θ² = θ - θ² = θ(1-θ)
Poisson: Factorial Moment
Normal Moments
Gamma Random Variable
Chi Squared [1] = Gamma(½, ½): P = ½, λ = ½
Mean = P/λ = (½)/(½) = 1. Variance = P/λ² = (½)/(½)² = 2.
Higher Moments
Skewness: μ3, which is 0 for all symmetric distributions (not just the normal); standardized measure μ3/σ³.
Kurtosis: μ4; standardized measure μ4/σ⁴. Compare to the normal, for which μ4/σ⁴ = 3; degree of excess = μ4/σ⁴ - 3.
Symmetric and Skewed Distributions
Kurtosis: t[5] vs. Normal
Kurtosis of Normal(0,1) = 3; excess = 0. Excess kurtosis of t[k] = 6/(k - 4) for k > 4; for t[5], 6/(5 - 4) = 6.
Approximations for g(X)
g(X) = continuous function; g(μ) exists; continuous first derivative not equal to zero at μ. Taylor series approximation around μ:
g(X) ≈ g(μ) + g'(μ)(X - μ) + ½g''(μ)(X - μ)² (+ higher order terms)
Approximation to the Mean
g(X) ≈ g(μ) + g'(μ)(X - μ) + ½g''(μ)(X - μ)²
E[g(X)] ≈ E[approximation] = g(μ) + ½g''(μ)E[(X - μ)²] = g(μ) + ½g''(μ)σ², since E[g'(μ)(X - μ)] = 0.
Example: X ~ N[μ, σ²], g(X) = exp(X). True mean = exp(μ + σ²/2). Approximation: exp(μ) + ½exp(μ)σ². For μ = 0, σ = 1: true mean = exp(.5) = 1.6487; approximation = exp(0) + .5·exp(0)·1 = 1.5.
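A numerical check of this example in Python, comparing the exact mean, the second-order approximation, and a simulated average (the sample size of 100,000 is arbitrary):

    import math, random

    mu, sigma = 0.0, 1.0
    true_mean = math.exp(mu + sigma**2 / 2)                 # 1.6487...
    approx = math.exp(mu) + 0.5 * math.exp(mu) * sigma**2   # 1.5
    random.seed(0)
    n = 100_000
    sim = sum(math.exp(random.gauss(mu, sigma)) for _ in range(n)) / n
    print(true_mean, approx, sim)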
Delta method: Var[g(X)]
Use the linear approximation g(X) ≈ g(μ) + g'(μ)(X - μ). Then Var[g(X)] ≈ Var[approximation] = [g'(μ)]²σ². Example: Var[X²] ≈ (2μ)²σ².
Delta Method: x ~ N[μ, σ²]; y = g(x) = exp(x) ~ lognormal. Exact:
E[y] = exp(μ + ½σ²); Var[y] = exp(2μ + σ²)[exp(σ²) - 1]. Approximate: E*[y] = exp(μ) + ½exp(μ)σ²; V*[y] = [exp(μ)]²σ². For N[0,1], the exact mean and variance are exp(.5) = 1.648 and exp(1)(exp(1) - 1) = 4.671. The approximations are 1.5 and 1 (!)
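The same comparison for the variance, sketched in Python (simulation size chosen arbitrarily); it shows how far the linear delta-method value falls from the exact lognormal variance:

    import math, random

    mu, sigma2 = 0.0, 1.0
    exact_var = math.exp(2*mu + sigma2) * (math.exp(sigma2) - 1)   # 4.671
    delta_var = (math.exp(mu) ** 2) * sigma2                       # 1.0
    random.seed(0)
    ys = [math.exp(random.gauss(mu, sigma2**0.5)) for _ in range(200_000)]
    m = sum(ys) / len(ys)
    sim_var = sum((y - m)**2 for y in ys) / (len(ys) - 1)
    print(exact_var, delta_var, sim_var)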
Moment Generating Function
Let g(X) = exp(tX). Then M(t) = E[exp(tX)] is the moment generating function for the random variable X.
MGF Bernoulli: P(x) = 1 - θ for x = 0 and θ for x = 1
E[exp(tX)] = (1 - θ)exp(0t) + θexp(1t) = (1 - θ) + θexp(t).
MGF Poisson
MGF Gamma
MGF Normal: MX(t) for X ~ N[0,1] is exp(½t²)
MY(t) for Y = σX + μ is exp(μt)MX(σt) = exp[μt + ½σ²t²]. This is the moment generating function for N[μ, σ²].
Generating the Moments
The rth derivative of M(t) evaluated at t = 0 gives the rth raw moment μr': M(r)(0) = d^rM(t)/dt^r |t=0 = E[X^r].
Poisson MGF: M(t) = exp(λ(exp(t) - 1)); M(0) = 1
M'(t) = M(t)·λexp(t); M'(0) = λ = μ1'
μ2' = E[X²] = M''(0) = M'(0)·λexp(0) + λexp(0)·M(0) = λ² + λ
Variance = μ2' - μ² = λ
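These derivatives can be checked symbolically; a sketch using Python's sympy library (an added illustration, not part of the slides):

    import sympy as sp

    t, lam = sp.symbols('t lam', positive=True)
    M = sp.exp(lam * (sp.exp(t) - 1))        # Poisson MGF
    m1 = sp.diff(M, t).subs(t, 0)            # first raw moment: lam
    m2 = sp.diff(M, t, 2).subs(t, 0)         # second raw moment: lam**2 + lam
    print(sp.simplify(m1), sp.simplify(m2 - m1**2))   # mean lam, variance lam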
Useful properties: if the MGF of X is MX(t) and Y = a + bX, then
MY(t) = exp(at)MX(bt). For independent X and Y, MX+Y(t) = MX(t)MY(t). The sequence of moments does not uniquely define the distribution.
Side results: the MGF MX(t) = E[exp(tX)] does not always exist.
The characteristic function E[exp(itX)] always exists and is used to prove central limit theorems. The cumulant generating function log MX(t) is sometimes useful; cumulants are functions of moments, the first cumulant being the mean and the second the variance.
Part 2-B Covariance and Correlation
Covariance: random variables X, Y with joint discrete distribution p(x,y) or continuous density f(x,y). Covariance = E({X - E[X]}{Y - E[Y]}) = E[XY] - E[X]E[Y]. (Note: the covariance of X with itself is Var[X].) The covariance connects the joint distribution to the covariation of X and Y.
Correlation and Covariance
Correlated Populations
Correlated Variables: X1 and X2 are independent with means 0 and standard deviations 1. Let Y = aX1 + bX2, and choose a and b so that Y has mean 0, standard deviation 1, and correlation ρ with X1: Var[Y] = a² + b² = 1 and Cov[X1, Y] = a = ρ, so b = √(1 - ρ²).
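A simulation sketch in Python confirming the construction (ρ = 0.6 is an arbitrary illustration):

    import math, random

    rho = 0.6
    a, b = rho, math.sqrt(1 - rho**2)
    random.seed(0)
    n = 100_000
    x1 = [random.gauss(0, 1) for _ in range(n)]
    x2 = [random.gauss(0, 1) for _ in range(n)]
    y = [a*u + b*v for u, v in zip(x1, x2)]
    # Sample correlation of x1 and y should be close to rho.
    mx, my = sum(x1)/n, sum(y)/n
    cov = sum((u - mx)*(w - my) for u, w in zip(x1, y)) / n
    vx = sum((u - mx)**2 for u in x1) / n
    vy = sum((w - my)**2 for w in y) / n
    print(cov / math.sqrt(vx * vy))   # approximately 0.6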
Conditional Distributions
f(y|x) = f(y,x) / f(x) Conditional distribution of y given a realization of x Conditional mean = mean of the conditional random variable = regression function Conditional variance = variance of conditional random variable = scedastic function
Litigation Risk Analysis
Form the probability tree for decisions and outcomes. Determine the conditional expected payoffs (gains or losses). Choose the strategy that optimizes the expected value of the payoff function (minimize the expected loss or maximize the expected (net) gain).
Litigation Risk Analysis: Using Conditional Probabilities to Determine a Strategy
P(Upper path) = P(Causation|Liability,Document) P(Liability|Document) P(Document) = P(Causation,Liability|Document) P(Document) = P(Causation,Liability,Document) = .7(.6)(.4) = .168. (Similarly for the lower path, probability = .5(.3)(.6) = .09.) There are two paths to a favorable outcome: probability = (upper) .168 + (lower) .09 = .258. How can I use this to decide whether to litigate or not? Suppose the cost to litigate = $1,000,000 and a favorable outcome pays $3,000,000. What should you do?
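The expected net payoff puts a number on that question; a short Python sketch using the slide's figures (the expected-value decision rule is implied rather than stated on the slide):

    # Probability tree for the litigation decision, using the slide's numbers.
    p_favorable = 0.7*0.6*0.4 + 0.5*0.3*0.6      # 0.168 + 0.09 = 0.258
    cost, payoff = 1_000_000, 3_000_000
    expected_net = p_favorable * payoff - cost   # 774,000 - 1,000,000 = -226,000
    print(p_favorable, expected_net)             # negative: do not litigate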
Joint Normal Random Variables
Conditional Normal
Y and Y|X (figure)
Application: Conditional Expected Profits and Risk
You must decide how many copies of your self-published novel to print. Based on market research, you believe the following distribution describes X, your likely sales (demand): x, P(X=x) = … (Note: sales are in thousands. Convert your final result to dollars after all computations are done by multiplying your final results by $1,000.) Printing costs are $1.25 per book. (It's a small book.) The selling price will be $… Any unsold books that you print must be discarded (at a loss of $1.25/copy). You must decide how many copies of the book to print: 25, 40, 55 or 70. (You are committed to one of these four; 0 is not an option.) A. What is the expected number of copies demanded? B. What is the standard deviation of the number of copies demanded? C. Which of the four print runs shown maximizes your expected profit? Compute all four. D. Which of the four print runs is least risky, i.e., minimizes the standard deviation of the profit (given the number printed)? Compute all four. E. Based on C and D, which of the four print runs seems best for you?
Compute the expected profit given the decision of how many to print.
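A sketch of that computation in Python. The demand probabilities and the selling price below are placeholders, since the slide's table and price did not survive extraction; substitute the values from the slide.

    # Hypothetical demand distribution (thousands of copies) -- placeholder values.
    demand_dist = {25: 0.10, 40: 0.30, 55: 0.45, 70: 0.15}
    price, cost = 3.25, 1.25    # price is a placeholder; cost is from the slide

    def expected_profit(run):
        # Sell min(demand, run) copies; every printed copy costs $1.25.
        return sum(p * (price * min(x, run) - cost * run)
                   for x, p in demand_dist.items())

    for run in (25, 40, 55, 70):
        print(run, round(expected_profit(run), 2))   # profit in thousands of $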
Expected Profit Given Print Run
Profit distributions for print runs of 25,000, 40,000, 55,000 and 70,000 (figure).
55,000 surely dominates 70,000.
40,000 should dominate 25,000 for any but the most extremely risk-averse decision maker. Now what?
On average, does the number of doctor visits by a household vary systematically with income? YES, negatively in fact. The data suggest that people with higher incomes are healthier and require fewer doctor visits.
[Poisson regression output: dependent variable DOCTOR VISITS; regressors Constant, AGE, EDUC, FEMALE, MARRIED, INCOME and HHKIDS. The coefficient and standard-error columns did not survive extraction; all coefficients except MARRIED are marked significant (***), and INCOME enters negatively.]
Useful Theorems - 1: the Law of Iterated Expectations
E[Y] = EX[ E[Y|X] ], the expectation over X of the conditional mean E[Y|X].
Example: Hierarchical Model
Useful Theorems - 2 Decomposition of variance
Var[Y] = Var[E[Y|X]] + E[Var[Y|X]]
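A simulation check of the decomposition in Python, under a two-stage model chosen only for this sketch: X ~ U(0,1) and Y|X ~ N(X, 1), so Var[E[Y|X]] = Var[X] = 1/12 and E[Var[Y|X]] = 1.

    import random

    random.seed(0)
    n = 200_000
    pairs = [(x, random.gauss(x, 1)) for x in (random.random() for _ in range(n))]
    ys = [y for _, y in pairs]
    my = sum(ys) / n
    var_y = sum((y - my)**2 for y in ys) / n
    print(var_y, 1/12 + 1)   # both approximately 1.0833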
Bivariate Normal
Useful Theorems - 3: Cov(X,Y) = Cov(X, E[Y|X])
In the hierarchical model, E[y|x] = λx, so Cov(X,Y) = Cov(X, λX) = λVar[X].
Mean Squared Error
Minimum MSE Predictor
Variance of the Sum of X and Y
Var[X+Y] = E[ {(X+Y) - (μX + μY)}² ]
= E[ {(X - μX) + (Y - μY)}² ]
= E[(X - μX)²] + E[(Y - μY)²] + 2E[(X - μX)(Y - μY)]
= Var[X] + Var[Y] + 2Cov(X,Y)
Variance of Weighted Sum
Var[aX + bY] = Var[aX] + Var[bY] + 2Cov(aX, bY) = a²Var[X] + b²Var[Y] + 2ab·Cov(X,Y). Also, Cov(X,Y) is the numerator in ρxy, so Cov(X,Y) = ρxy·σx·σy.
Application - Portfolio
You have $1000 to allocate between assets A and B. The yearly returns on the two assets are random variables rA and rB. The means of the two returns are E[rA] = μA and E[rB] = μB The standard deviations (risks) of the returns are σA and σB. The correlation of the two returns is ρAB You will allocate a proportion w of your $1000 to A and (1-w) to B.
Risk and Return
Your expected return on each dollar is E[wrA + (1-w)rB] = wμA + (1-w)μB. The variance of your return on each dollar is Var[wrA + (1-w)rB] = w²σA² + (1-w)²σB² + 2w(1-w)ρABσAσB. The standard deviation is the square root.
Risk and Return: Example
Suppose you know μA, μB, ρAB, σA, and σB (you have watched these stocks for many years). The mean and standard deviation of the return are then just functions of w, so I can compute both for different values of w. For example, take μA = .04, μB = .07, σA = .02, σB = .06, ρAB = -.4. Then E[return] = w(.04) + (1-w)(.07) = .07 - .03w and SD[return] = sqr[w²(.02²) + (1-w)²(.06²) + 2w(1-w)(-.4)(.02)(.06)] = sqr[.0004w² + .0036(1-w)² - .00096w(1-w)].
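The slide's example as a Python sketch, tracing the mean and standard deviation over a grid of allocations w:

    import math

    muA, muB = 0.04, 0.07
    sA, sB, rho = 0.02, 0.06, -0.4

    def mean_sd(w):
        m = w*muA + (1 - w)*muB    # = .07 - .03w
        v = (w*sA)**2 + ((1 - w)*sB)**2 + 2*w*(1 - w)*rho*sA*sB
        return m, math.sqrt(v)

    for w in (0.0, 0.25, 0.5, 0.75, 1.0):
        m, s = mean_sd(w)
        print(w, round(m, 4), round(s, 4))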
Mean and Variance of a Sum
Extension: Weighted Sum
More General Portfolio Problem
Optimal Portfolio?
Part 2-C Sums of Random Variables
Sequences of Independent Random Variables
x1, x2, …, xn = a set of n random variables with the same (marginal) probability distribution f(x), finite identical mean μ and variance σ², and statistically independent. IID = independent, identically distributed. This is a "random sample" from the population f(x).
The Sample Mean
Convergence of a Random Variable to a Constant
Convergence in Mean Square
If E[Xn] = μ and Var[Xn] → 0 as n → ∞, then Xn converges in mean square to μ. Slightly broader extension: if E[Xn] → μ and Var[Xn] → 0 as n → ∞, then Xn converges in mean square to μ. For example, Xn + (1/n) also converges in mean square to μ.
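A quick illustration in Python: sample means of U(0,1) draws have mean 1/2 and variance (1/12)/n, so the variance visibly shrinks toward zero as n grows.

    import random

    random.seed(0)
    for n in (4, 16, 64, 256):
        means = [sum(random.random() for _ in range(n)) / n for _ in range(2000)]
        m = sum(means) / len(means)
        v = sum((x - m)**2 for x in means) / len(means)
        print(n, round(m, 3), round(v, 5))   # variance ~ (1/12)/n -> 0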
Convergence in mean square: the top figure is a histogram of 1,000 means of samples of 4; the center, of samples of 9; the bottom, of samples of 15. The vertical bars pass through 7, 10 and 13 in all three figures.
Convergence of Means
Probability Limits: Plim xn
Probability Limits and Expectations
What is the difference between E[xn] and plim xn? Consider: X = n with prob(X = n) = 1/n; X = 1 with prob(X = 1) = 1 - 1/n. Then E[X] = 2 - 1/n → 2, but plim(X) = 1.
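A simulation sketch of this example in Python: the distribution piles up on 1 even though the expectation stays near 2.

    import random

    def draw(n):
        # X = n with probability 1/n, else X = 1.
        return n if random.random() < 1/n else 1

    random.seed(0)
    for n in (10, 1000, 100000):
        sample = [draw(n) for _ in range(50_000)]
        share_ones = sum(x == 1 for x in sample) / len(sample)
        # E[X] = 2 - 1/n, but the sample mean is erratic for large n
        # because it hinges on a few enormous draws.
        print(n, sum(sample) / len(sample), share_ones)   # share of 1s -> 1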
The Slutsky Theorem
Assumptions: xn is a random variable such that plim xn = θ; for now, we assume θ is a constant. g(.) is a continuous function with continuous derivatives, and g(.) is not a function of n.
Conclusion: plim[g(xn)] = g[plim(xn)], assuming g[plim(xn)] exists. This works for probability limits; it does not work for expectations.
Multivariate Slutsky Theorem
Plim xn = a, Plim yn = b g(xn,yn) is continuous, has continuous first derivatives and exists at (a,b). Plim g(xn,yn) = g(a,b) Generalizes to K functions of M random variables
Monte Carlo Integration Using the Law of Large Numbers
Application: for X ~ Normal(2, 1.5²), E[exp(X)] = exp(2 + ½·1.5²) = 22.76.
Draw 10,000 random U(0,1) values; transform them to x ~ N(0,1), then set z = 2 + 1.5x. Compute q = exp(z) and average the 10,000 draws on q; by the law of large numbers, the average approximates E[exp(X)].
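A sketch of that experiment in Python, using the Box-Muller transform to turn the U(0,1) draws into standard normals (any normal generator would do):

    import math, random

    random.seed(0)
    n = 10_000
    zs = []
    while len(zs) < n:
        u1, u2 = 1.0 - random.random(), random.random()   # u1 in (0,1] avoids log(0)
        zs.append(math.sqrt(-2 * math.log(u1)) * math.cos(2 * math.pi * u2))
    est = sum(math.exp(2 + 1.5*z) for z in zs) / n
    print(est, math.exp(2 + 0.5 * 1.5**2))   # estimate vs. exact 22.76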
Limit Results
The mean converges in probability to μ, and its variance goes to zero. If n is finite, what can be said about its behavior? Objective: characterize the distribution of the mean when n is large but finite. Strategy: find a limit result, then use it to approximate for finite n.
A Finite Sample Distribution
Means of 1000 samples of 8 observations from U[0,1].
Central Limit Theorems
Limiting Distributions
Xn has probability density fXn(xn) and cdf FXn(xn). If FXn(xn) → F(x) as n → ∞, then F(x) is the limiting distribution (at points where F(x) is continuous in x).
Lindeberg – Levy Central Limit Theorem
Other Central Limit Theorems
Lindeberg-Levy: i.i.d. observations. Lindeberg-Feller: heteroscedastic; variances may differ. Lyapunov: distributions may differ. Extensions: time series with some covariance across observations.
Rough Approximations
A Useful Convergence Result
Combine Slutsky with the Central Limit Theorem
General result: if Xn(θ) → F(X|θ) and if yn → θ, then Xn(yn) → F(X|θ).
Asymptotic Distributions
An asymptotic distribution is an approximation to the true finite-n distribution, based on a result obtained for the limiting distribution (with infinite n).
Asymptotic Distribution
Appendix
The Chebychev Inequality
For any random variable with finite mean μ and variance σ², Prob[|X - μ|/σ > k] ≤ 1/k². That is, the probability that X falls farther than k standard deviations from its mean is at most 1/k². Useful for proofs, not for practical computations.
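An empirical look at how loose the bound is, sketched in Python for standard normal draws:

    import random

    random.seed(0)
    xs = [random.gauss(0, 1) for _ in range(100_000)]
    for k in (1.5, 2, 3):
        frac = sum(abs(x) > k for x in xs) / len(xs)
        print(k, frac, 1/k**2)   # observed tail mass vs. the Chebychev bound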
Normal Approximation to Binomial
A Binomial(n, p) random variable equals the sum of n Bernoullis with parameter p. Each Bernoulli X has μ = p and σ² = p(1-p), so the sum of the n variables is approximately normal with mean np and variance np(1-p).
Approximation to binomial with n = 48, p=.25
Demoivre’s Normal Approximation
The binomial density function has n=48, θ=.25, so μ = 12 and σ = 3. The normal density plotted has mean 12 and standard deviation 3.
Using deMoivre’s Approximation
P[8 ≤ x ≤ 15] ≈ P[(8-12)/3 ≤ z ≤ (15-12)/3] = P[-1.33 ≤ z ≤ 1] = P[z ≤ 1] - P[z ≤ -1.33] = .8413 - .0918 = .7495, noticeably below the exact binomial probability. The binomial has n = 48, θ = .25, so μ = 12 and σ = 3. The normal distribution plotted has mean 12 and standard deviation 3. What happened?
Continuity Correction
When using a continuous distribution (normal) to approximate a discrete probability (binomial), subtract .5 from the lowest value in the range and add .5 to the highest value in the range. (The correction becomes less important as n increases.)
Correcting deMoivre’s Approximation
P[7.5 ≤ x ≤ 15.5] ≈ P[(7.5-12)/3 ≤ z ≤ (15.5-12)/3] = P[-1.5 ≤ z ≤ 1.167] = P[z ≤ 1.167] - P[z ≤ -1.5] = .8784 - .0668 = .8116, which is much closer to the exact binomial probability. The binomial has n = 48, θ = .25, so μ = 12 and σ = 3. The normal distribution plotted has mean 12 and standard deviation 3.
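A check of both approximations in Python; scipy is assumed here for the binomial and normal cdfs:

    from scipy.stats import binom, norm

    n, p = 48, 0.25
    mu, sd = n*p, (n*p*(1 - p))**0.5                       # 12, 3
    exact = binom.cdf(15, n, p) - binom.cdf(7, n, p)       # P[8 <= X <= 15]
    plain = norm.cdf((15 - mu)/sd) - norm.cdf((8 - mu)/sd)         # no correction
    corrected = norm.cdf((15.5 - mu)/sd) - norm.cdf((7.5 - mu)/sd)
    print(exact, plain, corrected)   # the corrected value is much closer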