Statistical Inference and Regression Analysis: Stat-GB.3302, Stat-UB
Professor William Greene
Stern School of Business
IOMS Department and Department of Economics
Part 2-A Expectations of Random Variables
Part 2-B Covariance and Correlation
Part 2-C Limit Results for Sums
Expected Value of a Random Variable
Weighted average of the values taken by the variable
Discrete Uniform: X = 1, 2, …, J; Prob(X = x) = 1/J
E[X] = 1/J + 2/J + … + J/J = [J(J+1)/2](1/J) = (J+1)/2. Expected toss of a die = 3.5 (J = 6). Expected sum of two dice = 7. Proof?
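A quick sketch in Python (added for illustration) that verifies both expectations by enumerating the sample space:

    # Enumerate the sample spaces of one and two fair dice.
    faces = range(1, 7)
    e_one = sum(faces) / 6                                  # (J+1)/2 with J = 6
    e_two = sum(i + j for i in faces for j in faces) / 36   # all 36 pairs
    print(e_one, e_two)                                     # 3.5 7.0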
Poisson (λ)
Poisson (5)
The St. Petersburg Paradox
Coin toss game: if the first head comes up on the nth toss, you win $2^n. The entry fee to play the game is $C. Expected value of the game: E[Win] = -C + (½)2 + (½)²2² + … + (½)^k 2^k + … = -C + 1 + 1 + 1 + … The game has infinite expected value, yet no one would pay very much to play it. Why not?
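A simulation sketch of the game in Python: the sample average of the winnings keeps drifting upward as the number of plays grows, which is how an infinite mean shows up in practice.

    import random

    def play():
        # Toss until the first head; win 2**n if it arrives on toss n.
        n = 1
        while random.random() < 0.5:   # tail with probability 1/2
            n += 1
        return 2 ** n

    random.seed(1)
    for trials in (10**3, 10**5, 10**6):
        avg = sum(play() for _ in range(trials)) / trials
        print(trials, avg)   # the average tends to grow without settling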
Continuous Random Variable
Gamma Random Variable
Gamma Function: Γ(1/2) = √π
Gamma Distributed Random Variable
Used to model nonnegative random variables, e.g., survival times of people and electronic components. Two special cases: P = 1 is the exponential distribution; P = ½ and λ = ½ is the chi squared with one "degree of freedom".
Expected Value of a Linear Translation
Z = aX + b, so E[Z] = aE[X] + b. The proof is immediate from the definition of the expected value and the fact that the density integrates to 1, which gives E[b] = b.
Normal(μ,σ) Variable: from the definition of the random variable, μ is the mean. The proof in Rice (p. 119) uses the linear translation: if X ~ N[0,1], then σX + μ ~ N(μ,σ).
Cauchy Random Variables
f(x) = (1/π)·1/(1 + x²). The mean does not exist, and no higher moments exist. If X ~ N[0,1] and Y ~ N[0,1] are independent, then X/Y has the Cauchy distribution. Many applications obtain estimates of interesting quantities as ratios of estimators that are normally distributed.
Cauchy Random Sample
Expected Value of a Function of X
Y = g(X). One-to-one case: E[Y] = expected value of Y(X); find the distribution of the new variable. E[g(X)] = Σx g(x)f(x) will equal E[Y]. Many-to-one case: similar argument; proceed without transforming the random variable. E[g(X)] is generally not equal to g(E[X]) unless g(X) is linear.
Powers of X: Moments. Moment = E[X^k] for positive integer k.
Raw moment: E[X^k] = μk'. Central moment: E[(X - E[X])^k] = μk. Mean = μ1' = μ.
Variance as a g(X): Variance = E[(X - E[X])²]
The standard deviation, the square root of the variance, is usually the more interesting quantity.
Variance of a Translation: Y = a + bX
Var[a] = 0. Var[bX] = b²Var[X]. Standard deviation of Y = |b|·S.D.(X).
Shortcut: Var[X] = E[X²] - {E[X]}²
Bernoulli: Prob(X=1) = θ; Prob(X=0) = 1 - θ. E[X] = 0(1-θ) + 1(θ) = θ.
Var[X] = E[X²] - θ² = θ - θ² = θ(1-θ)
Poisson: Factorial Moment
Normal Moments
Gamma Random Variable
Chi Squared [1] = Gamma(½, ½): P = ½, λ = ½
Mean = P/λ = (½)/(½) = 1. Variance = P/λ² = (½)/(½)² = 2.
Higher Moments
Skewness: μ3, which is 0 for all symmetric distributions (not just the normal); standardized measure μ3/σ³.
Kurtosis: μ4; standardized measure μ4/σ⁴. Compare to the normal, for which μ4/σ⁴ = 3; degree of excess = μ4/σ⁴ - 3.
Symmetric and Skewed Distributions
Kurtosis: t[5] vs. Normal
Kurtosis of Normal(0,1) = 3; excess = 0. Excess kurtosis of t[k] = 6/(k - 4) for k > 4; for t[5], 6/(5 - 4) = 6.
Approximations for g(X)
g(X) = continuous function; g(μ) exists; continuous first derivative not equal to zero at μ. Taylor series approximation around μ:
g(X) ≈ g(μ) + g'(μ)(X - μ) + ½g''(μ)(X - μ)² (+ higher order terms)
Approximation to the Mean
g(X) ≈ g(μ) + g'(μ)(X - μ) + ½g''(μ)(X - μ)²
E[g(X)] ≈ E[approximation] = g(μ) + ½g''(μ)E[(X - μ)²] = g(μ) + ½g''(μ)σ², since E[g'(μ)(X - μ)] = 0.
Example: X ~ N[μ, σ²], g(X) = exp(X). True mean = exp(μ + σ²/2). Approximation: exp(μ) + ½exp(μ)σ². For μ = 0, σ = 1: true mean = exp(.5) = 1.6487; approximation = exp(0) + .5·exp(0)·1 = 1.5.
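A numerical check of this example in Python, comparing the exact mean, the second-order approximation, and a simulated average (the sample size of 100,000 is arbitrary):

    import math, random

    mu, sigma = 0.0, 1.0
    true_mean = math.exp(mu + sigma**2 / 2)                 # 1.6487...
    approx = math.exp(mu) + 0.5 * math.exp(mu) * sigma**2   # 1.5
    random.seed(0)
    n = 100_000
    sim = sum(math.exp(random.gauss(mu, sigma)) for _ in range(n)) / n
    print(true_mean, approx, sim)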
Delta method: Var[g(X)]
Use the linear approximation g(X) ≈ g(μ) + g'(μ)(X - μ). Then Var[g(X)] ≈ Var[approximation] = [g'(μ)]²σ². Example: Var[X²] ≈ (2μ)²σ².
Delta Method: x ~ N[μ, σ²]; y = g(x) = exp(x) ~ lognormal. Exact:
E[y] = exp(μ + ½σ²); Var[y] = exp(2μ + σ²)[exp(σ²) - 1]. Approximate: E*[y] = exp(μ) + ½exp(μ)σ²; V*[y] = [exp(μ)]²σ². For N[0,1], the exact mean and variance are exp(.5) = 1.648 and exp(1)(exp(1) - 1) = 4.671. The approximations are 1.5 and 1 (!)
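The same comparison for the variance, sketched in Python (simulation size chosen arbitrarily); it shows how far the linear delta-method value falls from the exact lognormal variance:

    import math, random

    mu, sigma2 = 0.0, 1.0
    exact_var = math.exp(2*mu + sigma2) * (math.exp(sigma2) - 1)   # 4.671
    delta_var = (math.exp(mu) ** 2) * sigma2                       # 1.0
    random.seed(0)
    ys = [math.exp(random.gauss(mu, sigma2**0.5)) for _ in range(200_000)]
    m = sum(ys) / len(ys)
    sim_var = sum((y - m)**2 for y in ys) / (len(ys) - 1)
    print(exact_var, delta_var, sim_var)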
Moment Generating Function
Let g(X) = exp(tX). Then M(t) = E[exp(tX)] is the moment generating function for the random variable X.
MGF Bernoulli: P(x) = 1 - θ for x = 0 and θ for x = 1
E[exp(tX)] = (1 - θ)exp(0t) + θexp(1t) = (1 - θ) + θexp(t).
MGF Poisson
MGF Gamma
MGF Normal: MX(t) for X ~ N[0,1] is exp(½t²)
MY(t) for Y = σX + μ is exp(μt)MX(σt) = exp[μt + ½σ²t²]. This is the moment generating function for N[μ, σ²].
Generating the Moments
The rth derivative of M(t) evaluated at t = 0 gives the rth raw moment μr': M(r)(0) = d^rM(t)/dt^r |t=0 = E[X^r].
Poisson MGF: M(t) = exp(λ(exp(t) - 1)); M(0) = 1
M'(t) = M(t)·λexp(t); M'(0) = λ = μ1'
μ2' = E[X²] = M''(0) = M'(0)·λexp(0) + λexp(0)·M(0) = λ² + λ
Variance = μ2' - μ² = λ
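These derivatives can be checked symbolically; a sketch using Python's sympy library (an added illustration, not part of the slides):

    import sympy as sp

    t, lam = sp.symbols('t lam', positive=True)
    M = sp.exp(lam * (sp.exp(t) - 1))        # Poisson MGF
    m1 = sp.diff(M, t).subs(t, 0)            # first raw moment: lam
    m2 = sp.diff(M, t, 2).subs(t, 0)         # second raw moment: lam**2 + lam
    print(sp.simplify(m1), sp.simplify(m2 - m1**2))   # mean lam, variance lam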
Useful properties: if the MGF of X is MX(t) and Y = a + bX, then
MY(t) = exp(at)MX(bt). For independent X and Y, MX+Y(t) = MX(t)MY(t). The sequence of moments does not uniquely define the distribution.
Side results: the MGF MX(t) = E[exp(tX)] does not always exist.
The characteristic function E[exp(itX)] always exists and is used to prove central limit theorems. The cumulant generating function log MX(t) is sometimes useful; cumulants are functions of moments, the first cumulant being the mean and the second the variance.
Part 2-B Covariance and Correlation
Covariance: random variables X, Y with joint discrete distribution p(x,y) or continuous density f(x,y). Covariance = E({X - E[X]}{Y - E[Y]}) = E[XY] - E[X]E[Y]. (Note: the covariance of X with itself is Var[X].) The covariance connects the joint distribution to the covariation of X and Y.
Correlation and Covariance
Correlated Populations
Correlated Variables: X1 and X2 are independent with means 0 and standard deviations 1. Let Y = aX1 + bX2, and choose a and b so that Y has mean 0, standard deviation 1, and correlation ρ with X1: Var[Y] = a² + b² = 1 and Cov[X1, Y] = a = ρ, so b = √(1 - ρ²).
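A simulation sketch in Python confirming the construction (ρ = 0.6 is an arbitrary illustration):

    import math, random

    rho = 0.6
    a, b = rho, math.sqrt(1 - rho**2)
    random.seed(0)
    n = 100_000
    x1 = [random.gauss(0, 1) for _ in range(n)]
    x2 = [random.gauss(0, 1) for _ in range(n)]
    y = [a*u + b*v for u, v in zip(x1, x2)]
    # Sample correlation of x1 and y should be close to rho.
    mx, my = sum(x1)/n, sum(y)/n
    cov = sum((u - mx)*(w - my) for u, w in zip(x1, y)) / n
    vx = sum((u - mx)**2 for u in x1) / n
    vy = sum((w - my)**2 for w in y) / n
    print(cov / math.sqrt(vx * vy))   # approximately 0.6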
Conditional Distributions
f(y|x) = f(y,x) / f(x) Conditional distribution of y given a realization of x Conditional mean = mean of the conditional random variable = regression function Conditional variance = variance of conditional random variable = scedastic function
Litigation Risk Analysis
Form the probability tree for decisions and outcomes. Determine the conditional expected payoffs (gains or losses). Choose the strategy that optimizes the expected value of the payoff function (minimize the expected loss or maximize the expected (net) gain).
Litigation Risk Analysis: Using Conditional Probabilities to Determine a Strategy
P(Upper path) = P(Causation|Liability,Document) P(Liability|Document) P(Document) = P(Causation,Liability|Document) P(Document) = P(Causation,Liability,Document) = .7(.6)(.4) = .168. (Similarly for the lower path, probability = .5(.3)(.6) = .09.) There are two paths to a favorable outcome: probability = (upper) .168 + (lower) .09 = .258. How can I use this to decide whether to litigate or not? Suppose the cost to litigate = $1,000,000 and a favorable outcome pays $3,000,000. What should you do?
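The expected net payoff puts a number on that question; a short Python sketch using the slide's figures (the expected-value decision rule is implied rather than stated on the slide):

    # Probability tree for the litigation decision, using the slide's numbers.
    p_favorable = 0.7*0.6*0.4 + 0.5*0.3*0.6      # 0.168 + 0.09 = 0.258
    cost, payoff = 1_000_000, 3_000_000
    expected_net = p_favorable * payoff - cost   # 774,000 - 1,000,000 = -226,000
    print(p_favorable, expected_net)             # negative: do not litigate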
Joint Normal Random Variables
Conditional Normal
Y and Y|X (figure)
Application: Conditional Expected Profits and Risk
You must decide how many copies of your self-published novel to print. Based on market research, you believe the following distribution describes X, your likely sales (demand): x, P(X=x) = … (Note: sales are in thousands. Convert your final result to dollars after all computations are done by multiplying your final results by $1,000.) Printing costs are $1.25 per book. (It's a small book.) The selling price will be $… Any unsold books that you print must be discarded (at a loss of $1.25/copy). You must decide how many copies of the book to print: 25, 40, 55 or 70. (You are committed to one of these four; 0 is not an option.) A. What is the expected number of copies demanded? B. What is the standard deviation of the number of copies demanded? C. Which of the four print runs shown maximizes your expected profit? Compute all four. D. Which of the four print runs is least risky, i.e., minimizes the standard deviation of the profit (given the number printed)? Compute all four. E. Based on C and D, which of the four print runs seems best for you?
Compute the expected profit given the decision of how many to print.
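A sketch of that computation in Python. The demand probabilities and the selling price below are placeholders, since the slide's table and price did not survive extraction; substitute the values from the slide.

    # Hypothetical demand distribution (thousands of copies) -- placeholder values.
    demand_dist = {25: 0.10, 40: 0.30, 55: 0.45, 70: 0.15}
    price, cost = 3.25, 1.25    # price is a placeholder; cost is from the slide

    def expected_profit(run):
        # Sell min(demand, run) copies; every printed copy costs $1.25.
        return sum(p * (price * min(x, run) - cost * run)
                   for x, p in demand_dist.items())

    for run in (25, 40, 55, 70):
        print(run, round(expected_profit(run), 2))   # profit in thousands of $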
Expected Profit Given Print Run
Profit distributions for print runs of 25,000, 40,000, 55,000 and 70,000 (figure).
55,000 surely dominates 70,000.
40,000 should dominate 25,000 for any but the most extremely risk-averse decision maker. Now what?
On average, does the number of doctor visits by a household vary systematically with income? YES, negatively in fact. The data suggest that people with higher incomes are healthier and require fewer doctor visits.
[Poisson regression output: dependent variable DOCTOR VISITS; regressors Constant, AGE, EDUC, FEMALE, MARRIED, INCOME and HHKIDS. The coefficient and standard-error columns did not survive extraction; all coefficients except MARRIED are marked significant (***), and INCOME enters negatively.]
Useful Theorems - 1: the Law of Iterated Expectations
E[Y] = EX[ E[Y|X] ], the expectation over X of the conditional mean E[Y|X].
Example: Hierarchical Model
Useful Theorems - 2 Decomposition of variance
Var[Y] = Var[E[Y|X]] + E[Var[Y|X]]
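A simulation check of the decomposition in Python, under a two-stage model chosen only for this sketch: X ~ U(0,1) and Y|X ~ N(X, 1), so Var[E[Y|X]] = Var[X] = 1/12 and E[Var[Y|X]] = 1.

    import random

    random.seed(0)
    n = 200_000
    pairs = [(x, random.gauss(x, 1)) for x in (random.random() for _ in range(n))]
    ys = [y for _, y in pairs]
    my = sum(ys) / n
    var_y = sum((y - my)**2 for y in ys) / n
    print(var_y, 1/12 + 1)   # both approximately 1.0833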
Bivariate Normal
Useful Theorems - 3: Cov(X,Y) = Cov(X, E[Y|X])
In the hierarchical model, E[y|x] = λx, so Cov(X,Y) = Cov(X, λX) = λVar[X].
Mean Squared Error
Minimum MSE Predictor
Variance of the Sum of X and Y
Var[X+Y] = E[ {(X+Y) - (μX + μY)}² ]
= E[ {(X - μX) + (Y - μY)}² ]
= E[(X - μX)²] + E[(Y - μY)²] + 2E[(X - μX)(Y - μY)]
= Var[X] + Var[Y] + 2Cov(X,Y)
Variance of Weighted Sum
Var[aX + bY] = Var[aX] + Var[bY] + 2Cov(aX, bY) = a²Var[X] + b²Var[Y] + 2ab·Cov(X,Y). Also, Cov(X,Y) is the numerator in ρxy, so Cov(X,Y) = ρxy·σx·σy.
Application - Portfolio
You have $1000 to allocate between assets A and B. The yearly returns on the two assets are random variables rA and rB. The means of the two returns are E[rA] = μA and E[rB] = μB The standard deviations (risks) of the returns are σA and σB. The correlation of the two returns is ρAB You will allocate a proportion w of your $1000 to A and (1-w) to B.
Risk and Return
Your expected return on each dollar is E[wrA + (1-w)rB] = wμA + (1-w)μB. The variance of your return on each dollar is Var[wrA + (1-w)rB] = w²σA² + (1-w)²σB² + 2w(1-w)ρABσAσB. The standard deviation is the square root.
Risk and Return: Example
Suppose you know μA, μB, ρAB, σA, and σB (you have watched these stocks for many years). The mean and standard deviation of the return are then just functions of w, so I can compute both for different values of w. For example, take μA = .04, μB = .07, σA = .02, σB = .06, ρAB = -.4. Then E[return] = w(.04) + (1-w)(.07) = .07 - .03w and SD[return] = sqr[w²(.02²) + (1-w)²(.06²) + 2w(1-w)(-.4)(.02)(.06)] = sqr[.0004w² + .0036(1-w)² - .00096w(1-w)].
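The slide's example as a Python sketch, tracing the mean and standard deviation over a grid of allocations w:

    import math

    muA, muB = 0.04, 0.07
    sA, sB, rho = 0.02, 0.06, -0.4

    def mean_sd(w):
        m = w*muA + (1 - w)*muB    # = .07 - .03w
        v = (w*sA)**2 + ((1 - w)*sB)**2 + 2*w*(1 - w)*rho*sA*sB
        return m, math.sqrt(v)

    for w in (0.0, 0.25, 0.5, 0.75, 1.0):
        m, s = mean_sd(w)
        print(w, round(m, 4), round(s, 4))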
Mean and Variance of a Sum
Extension: Weighted Sum
More General Portfolio Problem
Optimal Portfolio?
Part 2-C Sums of Random Variables
Sequences of Independent Random Variables
x1, x2, …, xn = a set of n random variables with the same (marginal) probability distribution f(x), finite identical mean μ and variance σ², and statistically independent. IID = independent, identically distributed. This is a "random sample" from the population f(x).
The Sample Mean
Convergence of a Random Variable to a Constant
Convergence in Mean Square
If E[Xn] = μ and Var[Xn] → 0 as n → ∞, then Xn converges in mean square to μ. Slightly broader extension: if E[Xn] → μ and Var[Xn] → 0 as n → ∞, then Xn converges in mean square to μ. For example, Xn + (1/n) also converges in mean square to μ.
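A quick illustration in Python: sample means of U(0,1) draws have mean 1/2 and variance (1/12)/n, so the variance visibly shrinks toward zero as n grows.

    import random

    random.seed(0)
    for n in (4, 16, 64, 256):
        means = [sum(random.random() for _ in range(n)) / n for _ in range(2000)]
        m = sum(means) / len(means)
        v = sum((x - m)**2 for x in means) / len(means)
        print(n, round(m, 3), round(v, 5))   # variance ~ (1/12)/n -> 0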
Convergence in mean square: the top figure is a histogram of 1,000 means of samples of 4; the center, of samples of 9; the bottom, of samples of 15. The vertical bars pass through 7, 10 and 13 in all three figures.
Convergence of Means
Probability Limits: Plim xn
Probability Limits and Expectations
What is the difference between E[xn] and plim xn? Consider: X = n with prob(X = n) = 1/n; X = 1 with prob(X = 1) = 1 - 1/n. Then E[X] = 2 - 1/n → 2, but plim(X) = 1.
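A simulation sketch of this example in Python: the distribution piles up on 1 even though the expectation stays near 2.

    import random

    def draw(n):
        # X = n with probability 1/n, else X = 1.
        return n if random.random() < 1/n else 1

    random.seed(0)
    for n in (10, 1000, 100000):
        sample = [draw(n) for _ in range(50_000)]
        share_ones = sum(x == 1 for x in sample) / len(sample)
        # E[X] = 2 - 1/n, but the sample mean is erratic for large n
        # because it hinges on a few enormous draws.
        print(n, sum(sample) / len(sample), share_ones)   # share of 1s -> 1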
The Slutsky Theorem
Assumptions: xn is a random variable such that plim xn = θ; for now, we assume θ is a constant. g(.) is a continuous function with continuous derivatives, and g(.) is not a function of n.
Conclusion: plim[g(xn)] = g[plim(xn)], assuming g[plim(xn)] exists. This works for probability limits; it does not work for expectations.
Multivariate Slutsky Theorem
Plim xn = a, Plim yn = b g(xn,yn) is continuous, has continuous first derivatives and exists at (a,b). Plim g(xn,yn) = g(a,b) Generalizes to K functions of M random variables
Monte Carlo Integration Using the Law of Large Numbers
Application: for X ~ Normal(2, 1.5²), E[exp(X)] = exp(2 + ½·1.5²) = 22.76.
Draw 10,000 random U(0,1) values; transform them to x ~ N(0,1), then set z = 2 + 1.5x. Compute q = exp(z) and average the 10,000 draws on q; by the law of large numbers, the average approximates E[exp(X)].
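A sketch of that experiment in Python, using the Box-Muller transform to turn the U(0,1) draws into standard normals (any normal generator would do):

    import math, random

    random.seed(0)
    n = 10_000
    zs = []
    while len(zs) < n:
        u1, u2 = 1.0 - random.random(), random.random()   # u1 in (0,1] avoids log(0)
        zs.append(math.sqrt(-2 * math.log(u1)) * math.cos(2 * math.pi * u2))
    est = sum(math.exp(2 + 1.5*z) for z in zs) / n
    print(est, math.exp(2 + 0.5 * 1.5**2))   # estimate vs. exact 22.76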
Limit Results
The mean converges in probability to μ, and its variance goes to zero. If n is finite, what can be said about its behavior? Objective: characterize the distribution of the mean when n is large but finite. Strategy: find a limit result, then use it to approximate for finite n.
A Finite Sample Distribution
Means of 1000 samples of 8 observations from U[0,1].
Central Limit Theorems
Limiting Distributions
Xn has probability density fXn(xn) and cdf FXn(xn). If FXn(xn) → F(x) as n → ∞, then F(x) is the limiting distribution (at points where F(x) is continuous in x).
Lindeberg – Levy Central Limit Theorem
Other Central Limit Theorems
Lindeberg-Levy: i.i.d. observations. Lindeberg-Feller: heteroscedastic; variances may differ. Lyapunov: distributions may differ. Extensions: time series with some covariance across observations.
Rough Approximations
A Useful Convergence Result
Combine Slutsky with the Central Limit Theorem
General result: if Xn(θ) → F(X|θ) and if yn → θ, then Xn(yn) → F(X|θ).
Asymptotic Distributions
An asymptotic distribution is an approximation to the true finite-n distribution, based on a result obtained for the limiting distribution (with infinite n).
Asymptotic Distribution
Appendix
The Chebychev Inequality
For any random variable with finite mean μ and variance σ², Prob[|X - μ|/σ > k] ≤ 1/k². That is, the probability that X falls farther than k standard deviations from its mean is at most 1/k². Useful for proofs, not for practical computations.
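An empirical look at how loose the bound is, sketched in Python for standard normal draws:

    import random

    random.seed(0)
    xs = [random.gauss(0, 1) for _ in range(100_000)]
    for k in (1.5, 2, 3):
        frac = sum(abs(x) > k for x in xs) / len(xs)
        print(k, frac, 1/k**2)   # observed tail mass vs. the Chebychev bound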
Normal Approximation to Binomial
A Binomial(n, p) random variable equals the sum of n Bernoullis with parameter p. Each Bernoulli X has μ = p and σ² = p(1-p), so the sum of the n variables is approximately normal with mean np and variance np(1-p).
Approximation to binomial with n = 48, p=.25
Demoivre’s Normal Approximation
The binomial density function has n=48, θ=.25, so μ = 12 and σ = 3. The normal density plotted has mean 12 and standard deviation 3.
Using deMoivre’s Approximation
P[8 ≤ x ≤ 15] ≈ P[(8-12)/3 ≤ z ≤ (15-12)/3] = P[-1.33 ≤ z ≤ 1] = P[z ≤ 1] - P[z ≤ -1.33] = .8413 - .0918 = .7495, noticeably below the exact binomial probability. The binomial has n = 48, θ = .25, so μ = 12 and σ = 3. The normal distribution plotted has mean 12 and standard deviation 3. What happened?
Continuity Correction
When using a continuous distribution (normal) to approximate a discrete probability (binomial), subtract .5 from the lowest value in the range and add .5 to the highest value in the range. (The correction becomes less important as n increases.)
Correcting deMoivre’s Approximation
P[7.5 ≤ x ≤ 15.5] ≈ P[(7.5-12)/3 ≤ z ≤ (15.5-12)/3] = P[-1.5 ≤ z ≤ 1.167] = P[z ≤ 1.167] - P[z ≤ -1.5] = .8784 - .0668 = .8116, which is much closer to the exact binomial probability. The binomial has n = 48, θ = .25, so μ = 12 and σ = 3. The normal distribution plotted has mean 12 and standard deviation 3.
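A check of both approximations in Python; scipy is assumed here for the binomial and normal cdfs:

    from scipy.stats import binom, norm

    n, p = 48, 0.25
    mu, sd = n*p, (n*p*(1 - p))**0.5                       # 12, 3
    exact = binom.cdf(15, n, p) - binom.cdf(7, n, p)       # P[8 <= X <= 15]
    plain = norm.cdf((15 - mu)/sd) - norm.cdf((8 - mu)/sd)         # no correction
    corrected = norm.cdf((15.5 - mu)/sd) - norm.cdf((7.5 - mu)/sd)
    print(exact, plain, corrected)   # the corrected value is much closer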