Purpose of Regression Analysis


1 Purpose of Regression Analysis
1. Estimate a relationship among economic variables, such as y = f(x).
2. Forecast or predict the value of one variable, y, based on the value of another variable, x.

2 Weekly Food Expenditures
y = dollars spent each week on food items. x = consumer’s weekly income. The relationship between x and the expected value of y, given x, might be linear: E(y|x) = β1 + β2x
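As a small illustration (not part of the original slides), one can simulate weekly food expenditure data from such a linear conditional mean; the parameter values, sample size and income range below are purely hypothetical.

```python
import numpy as np

# Hypothetical parameters for E(y|x) = beta1 + beta2*x (illustration only)
beta1, beta2, sigma = 40.0, 0.13, 20.0   # intercept, slope, error std. dev.
T = 40                                    # sample size

rng = np.random.default_rng(0)
x = rng.uniform(300, 900, size=T)         # weekly incomes
eps = rng.normal(0.0, sigma, size=T)      # random errors with E(eps) = 0
y = beta1 + beta2 * x + eps               # weekly food expenditures

print(y[:5])
```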

3 Probability Distribution f(y|x=480)
of food expenditures given income x = $480.

4 Probability Distribution of Food Expenditures
[Figure: densities f(y|x = 480) and f(y|x = 800) of food expenditures, given income x = $480 and x = $800.]

5 Average Expenditure E(y|x)
[Figure: average expenditure E(y|x) = β1 + β2x plotted against income x; the slope is β2 = ΔE(y|x)/Δx.]
The economic model: a linear relationship between average expenditure on food and income.

6 Homoskedastic Case
[Figure: densities f(yt) of expenditure around the line E(y|x) = β1 + β2x at incomes x1 = 480 and x2 = 800, with equal spread.]
The probability density function for yt at two levels of household income, xt.

7 Heteroskedastic Case
[Figure: densities f(yt) of expenditure at incomes x1, x2, x3, with spread increasing in income.]
The variance of yt increases as household income, xt, increases.

8 The Error Term y is a random variable composed of two parts:
I. Systematic component: E(y|x) = β1 + β2x. This is the mean of y given x.
II. Random component: ε = y − E(y|x) = y − β1 − β2x. This is called the random error.
Together E(y|x) and ε form the model: y = β1 + β2x + ε

9 [Figure: scatter of observations (x1, y1), …, (x4, y4) in the x–y plane.]

10 [Figure: the same scatter of observations with the population regression line drawn through them.]

11 [Figure: observations y1, …, y4 with errors ε1, …, ε4 measured from the true regression line E(y) = β1 + β2x.]
The relationship among y, ε and the true regression line.

12 Why must the stochastic error term be present in a regression equation?
1. Many minor influences on y are omitted from the equation (for instance, because data are unavailable).
2. It is virtually impossible to avoid some sort of measurement error in at least one of the equation’s variables.
3. The underlying theoretical equation might have a different functional form (or shape) than the one chosen for the regression. For example, the underlying relationship might be nonlinear in the variables while the regression is linear.
4. All attempts to generalize human behavior must contain at least some amount of unpredictable or purely random variation.

13 The Assumptions of Simple Linear Regression Models
1. The value of y, for each value of x, is y = β1 + β2x + ε.
2. The average value of the random error ε is E(ε) = 0.
3. The variance of the random error ε is var(ε) = σ² = var(y).
4. The covariance between any pair of errors is cov(εi, εj) = cov(yi, yj) = 0 for i ≠ j.
5. x must take at least two different values, so that x is not a constant.
6. (Optional) ε is normally distributed with mean 0 and variance σ²: ε ~ N(0, σ²).

14 Population regression line: E(yt|xt) = β1 + β2xt
Population regression values: yt = β1 + β2xt + εt
Sample regression values: yt = b1 + b2xt + ε̂t
Sample regression line: ŷt = b1 + b2xt

15 The relationship among y, ε̂ and the fitted regression line.
[Figure: observations y1, …, y4, fitted values ŷ1, …, ŷ4 on the fitted line ŷ = b1 + b2x, and residuals ε̂1, …, ε̂4 measured from that line.]

16 y t = 1 + 2x t + ε t εt = y t - 1 - 2x t
Minimize error sum of squared deviations: S(1,2) =  (y t - 1 - 2x t )2 T t =1 T =  εt 2 t =1

17 Minimize S(1,2) w. r. t. 1 and 2:
S(1,2) = (y t - 1 - 2x t )2 t =1 T S() = - 2 (y t - 1 - 2x t ) 1 S() = -2 x t (y t - 1 - 2x t ) 2 Set each of these two derivatives equal to zero and solve these two equations for the two unknowns: 1, 2

18 Minimize S(β1, β2) = Σ (yt − β1 − β2xt)² w.r.t. β1 and β2
[Figure: S(·) plotted against βi; the slope ∂S(·)/∂βi is negative to the left of the minimum, zero at βi = bi, and positive to the right.]

19 To minimize S(.), you set the two derivatives equal to zero to get:
∂S(·)/∂β1 = −2 Σ (yt − b1 − b2xt) = 0
∂S(·)/∂β2 = −2 Σ xt (yt − b1 − b2xt) = 0
When these two derivatives are set to zero, β1 and β2 become b1 and b2 because they no longer represent just any values of β1 and β2 but the special values that correspond to the minimum of S(·).

20 The Normal Equations
−2 Σ (yt − b1 − b2xt) = 0
−2 Σ xt (yt − b1 − b2xt) = 0
⇒ Σ yt − T b1 − b2 Σ xt = 0
⇒ Σ xt yt − b1 Σ xt − b2 Σ xt² = 0
⇒ T b1 + b2 Σ xt = Σ yt
⇒ b1 Σ xt + b2 Σ xt² = Σ xt yt

21 Solving for b1 and b2
Normal equations:
T b1 + b2 Σ xt = Σ yt
b1 Σ xt + b2 Σ xt² = Σ xt yt
Solve for b1 and b2 using the definitions of x̄ and ȳ:
b2 = [T Σ xt yt − Σ xt Σ yt] / [T Σ xt² − (Σ xt)²] = [Σ xt yt − T x̄ ȳ] / [Σ xt² − T x̄²]
b1 = ȳ − b2 x̄

22 Interpretation of Coefficients, b1 and b2
b2 represents an estimate of the mean change in y in response to a one-unit change in x. b1 is an estimate of the mean of y when x = 0. One must be very careful when interpreting the estimated intercept, since we usually do not have any data points near x = 0. Note that regression analysis cannot be interpreted as a procedure for establishing a cause-and-effect relationship between variables.

23 Simple Linear Regression Model
yt = 1 + 2 x t +  t yt = demand for cars x t = prices For a given level of x t, the expected level of demand for cars will be: E(yt|x t) = 1 + 2 x t

24 Assumptions of the Simple Linear Regression Model
1. yt = 1 + 2x t +  t 2. E( t) = 0 <=> E(yt I xt) = 1 + 2x t 3. var( t) = 2 = var(yt) 4. cov( i, j) = cov(yi,yj) = 0 5. x t is not constant (no perfect collinearity) 6.  t~N(0,2) <=> yt~N(1+ 2x t,2)

25 The population parameters β1 and β2 are unknown population constants.
The formulas that produce the sample estimates b1 and b2 are called the estimators of β1 and β2. When b1 and b2 are used to represent the formulas rather than specific values, they are called estimators of β1 and β2, which are random variables because they differ from sample to sample.

26 Estimators are Random Variables
(Estimates are not.) If the least squares estimators b1 and b2 are random variables, then what are their means, variances, covariances and probability distributions? Compare the properties of alternative estimators to the properties of the least squares estimators.

27 The Expected Values of b1 and b2
The least squares formulas (estimators) in the simple regression case:
b2 = [T Σ xt yt − Σ xt Σ yt] / [T Σ xt² − (Σ xt)²]
b1 = ȳ − b2 x̄,  where ȳ = Σ yt / T and x̄ = Σ xt / T

28 Txtt - xt t b2 = 2 + Txt -(xt) TxtE(t) - xt E(t)
Substitute in yt = 1 + 2x t +  t to get: b2 = 2 + Txtt - xt t Txt -(xt) 2 The mean of b2 is: E(b2) = 2 + TxtE(t) - xt E(t) Txt -(xt) 2 Since E(t) = 0, then E(b2) = 2 .

29 b2 is an unbiased estimator of β2.
The result E(b2) = β2 means that the distribution of b2 is centered at β2. Since the distribution of b2 is centered at β2, we say that b2 is an unbiased estimator of β2.

30 Wrong Model Specification
The unbiasedness result on the previous slide assumes that we are using the correct model. If the model is of the wrong form or is missing important variables, then E(εt) ≠ 0, and consequently E(b2) ≠ β2.

31 E(b1) = 1 Unbiased Estimator of the Intercept
In a similar manner, the estimator b1 of the intercept or constant term can be shown to be an unbiased estimator of 1 when the model is correctly specified. E(b1) = 1

32 Equivalent expressions for b2:
(xt  x)yt  y ) xt  x ) 2 Expand and multiply top and bottom by T: b2 = Txtyt  xt yt Txt (xt) 2 xtyt – T x y = xt2 – T x2

33 x t  x Variance of b2 2 var(b2) =
Given that both yt and t have variance 2, the variance of the estimator b2 is: x t  x 2 2 var(b2) = b2 is a function of the yt values but var(b2) does not involve yt directly.

34 x t x t  x Variance of b1 b1 = y  b2x var(b1) = 2 Given
the variance of the estimator b1 is: x t 2 var(b1) = 2 x t  x 2

35 Covariance of b1 and b2
cov(b1, b2) = −σ² x̄ / Σ (xt − x̄)²
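These sampling variances and the covariance depend only on σ² and the xt values, so they can be computed before any y data are seen. A sketch, treating σ² as known for illustration (function name and inputs are hypothetical):

```python
import numpy as np

def ls_moments(x, sigma2):
    """var(b1), var(b2) and cov(b1, b2) from the formulas above,
    for fixed regressor values x and (assumed known) error variance sigma2."""
    x = np.asarray(x, float)
    T = x.size
    sxx = np.sum((x - x.mean())**2)
    var_b2 = sigma2 / sxx
    var_b1 = sigma2 * np.sum(x**2) / (T * sxx)
    cov_b1b2 = -sigma2 * x.mean() / sxx
    return var_b1, var_b2, cov_b1b2

print(ls_moments([480, 560, 640, 720, 800], sigma2=400.0))
```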

36 What factors determine variance and covariance of b1 and b2?
1. The larger the 2, the greater the uncertainty about b1, b2 and their relationship. 2. The more spread out the xt values are then the more confidence we have in b1, b2, etc. 3. The larger the sample size, T, the smaller the variances and covariances. 4. The variance b1 is large when the (squared) xt values are far from zero (in either direction). 5. Changing the slope, b2, has no effect on the intercept, b1, when the sample mean is zero. But if sample mean is positive, the covariance between b1 and b2 will be negative, and vice versa.

37 Gauss-Markov Theorem
Under the first five assumptions of the simple linear regression model, the ordinary least squares estimators b1 and b2 have the smallest variance of all linear and unbiased estimators of β1 and β2. This means that b1 and b2 are the Best Linear Unbiased Estimators (BLUE) of β1 and β2.

38 Implications of Gauss-Markov
1. b1 and b2 are best within the class of linear and unbiased estimators.
2. "Best" means smallest variance within the class of linear and unbiased estimators.
3. All of the first five assumptions must hold to satisfy Gauss-Markov.
4. Gauss-Markov does not require assumption six: normality.
5. Gauss-Markov applies not to the least squares principle itself but to the least squares estimation rules for b1 and b2.

39 G-Markov implications (continued)
6. If we are not satisfied with restricting our estimation to the class of linear and unbiased estimators, we should ignore the Gauss-Markov Theorem and use some nonlinear and/or biased estimator instead. (Note: a biased or nonlinear estimator could have smaller variance than those satisfying Gauss-Markov.)
7. Gauss-Markov applies to the b1 and b2 estimators and not to particular sample values (estimates) of b1 and b2.

40 yt and t normally distributed
The least squares estimator of 2 and 1 can be expressed as a linear combination of yt: b2 = wt yt x t  x 2 where wt = x t  x b1 = y  b2x This means that b1and b2 are normal since linear combinations of normals are normal.

41 b1 and b2 normally distributed under the Central Limit Theorem
If the first five Gauss-Markov assumptions hold, and sample size, T, is sufficiently large, then the least squares estimators, b1 and b2, have a distribution that approximates the normal distribution with greater accuracy the larger the value of sample size, T.

42 Probability Distribution of Least Squares Estimators
If one of the above two conditions is satisfied, then the distributions of b1 and b2 are:
b1 ~ N( β1 ,  σ² Σ xt² / [T Σ (xt − x̄)²] )
b2 ~ N( β2 ,  σ² / Σ (xt − x̄)² )

43 Consistency
We would like our estimators, b1 and b2, to collapse onto the true population values, β1 and β2, as the sample size, T, goes to infinity. One way to achieve this consistency property is for the variances of b1 and b2 to go to zero as T goes to infinity. Since the formulas for the variances of the least squares estimators b1 and b2 show that their variances do, in fact, go to zero as T goes to infinity, b1 and b2 are consistent estimators of β1 and β2.

44 Estimating the variance of the error term, σ²
ε̂t = yt − b1 − b2xt
σ̂² = Σ ε̂t² / (T − 2),  summed over t = 1, …, T
σ̂² is an unbiased estimator of σ².

45 The Least Squares Predictor, ŷ0
Given a value of the explanatory variable, x0, we would like to predict a value of the dependent variable, y0. The least squares predictor is:
ŷ0 = b1 + b2 x0

46 Probability Distribution of Least Squares Estimators
b1 ~ N 1 , x t  x 2 2 x t b2 ~ N 2 , x t  x 2 2

47 x t  x b2 2   var(b2) 2 b2 ~ N 2 ,
Create a standardized normal random variable, Z, by subtracting the mean of b2 and dividing by its standard deviation: b2 2 var(b2)  

48 Error Variance Estimation
Unbiased estimator of the error variance:
σ̂² = Σ ε̂t² / (T − 2)
Transform to a chi-square distribution:
(T − 2) σ̂² / σ²  ~  χ²(T − 2)

49 Chi-Square degrees of freedom
Since the errors εt = yt − β1 − β2xt are not observable, we estimate them with the sample residuals ε̂t = yt − b1 − b2xt. Unlike the errors, the sample residuals are not independent, since they use up two degrees of freedom by using b1 and b2 to estimate β1 and β2. We get only T − 2 degrees of freedom instead of T.

50 Student-t Distribution
t = Z / √(V / m)  ~  t(m)
where Z ~ N(0, 1) and V ~ χ²(m), provided Z and V are independent.

51 t = Z / √( V / (T − 2) )  ~  t(T − 2)
where Z = (b2 − β2) / √var(b2) and var(b2) = σ² / Σ (xi − x̄)²

52 ^ V = 2 ^ 2 Z t = V / (T-2) (b2 2) var(b2) t = ( T2)

53 Notice the cancellations: the unknown σ² cancels, leaving the estimated variance vâr(b2) = σ̂² / Σ (xi − x̄)² in place of var(b2) = σ² / Σ (xi − x̄)².

54 (b2 2) 2 ( xi  x )2 ^ t = = var(b2) t = (b2 2) se(b2)

55 t has a Student-t Distribution
Student-t statistic: t = (b2 − β2) / se(b2)  ~  t(T − 2)
t has a Student-t distribution with T − 2 degrees of freedom.
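Putting the pieces together, here is a sketch of the t statistic for a hypothesized value of β2 (commonly 0), using σ̂² and se(b2); scipy is used only for the t tail probability, and all names are illustrative.

```python
import numpy as np
from scipy import stats

def t_stat_b2(x, b2, sigma2_hat, beta2_null=0.0):
    """t = (b2 - beta2_null) / se(b2), with se(b2) = sqrt(sigma2_hat / Σ(x - x̄)²)."""
    x = np.asarray(x, float)
    se_b2 = np.sqrt(sigma2_hat / np.sum((x - x.mean())**2))
    t = (b2 - beta2_null) / se_b2
    df = x.size - 2
    p = 2 * stats.t.sf(abs(t), df)        # two-sided p-value
    return t, p
```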

56 The Least Squares Predictor, ŷ0
Given a value of the explanatory variable, x0, we would like to predict a value of the dependent variable, y0. The least squares predictor is:
ŷ0 = b1 + b2 x0
Prediction error: f = ŷ0 − y0 = (b1 − β1) + (b2 − β2) x0 − ε0

57 x o  x  x t  x Prediction error :
f = yo  yo = (b1 - 1) + (b2 - 2)x0 – ε0 ^ E[f ] =E[ yo  yo] = 0 ^ x t  x 2 var( f ) =  1 x o  x f ~ N [0, var( f )]

58 x o  x  x t  x yo  t(T-2),/2 se( f ) se( f ) = var( f )
Prediction Intervals A (1)x100% prediction interval for yo is: yo  t(T-2),/2 se( f ) ^ se( f ) = var( f ) ^ f = yo  yo ^ x t  x 2 var( f ) =  ^ 1 x o  x
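A sketch of such a prediction interval for a new x0, under the assumptions above; the function and variable names are illustrative.

```python
import numpy as np
from scipy import stats

def prediction_interval(x, y, x0, alpha=0.05):
    """(1 - alpha)*100% prediction interval for y0 at x = x0."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    T = x.size
    sxx = np.sum((x - x.mean())**2)
    b2 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
    b1 = y.mean() - b2 * x.mean()
    resid = y - b1 - b2 * x
    sigma2 = np.sum(resid**2) / (T - 2)           # unbiased error-variance estimate
    y0_hat = b1 + b2 * x0                          # point prediction
    se_f = np.sqrt(sigma2 * (1 + 1/T + (x0 - x.mean())**2 / sxx))
    tcrit = stats.t.ppf(1 - alpha/2, T - 2)
    return y0_hat - tcrit * se_f, y0_hat + tcrit * se_f
```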


60 The Least Squares Estimator of the Mean Response, μ̂0, when x = x0
μ̂0 = b1 + b2 x0
Estimation error: μ̂0 − E[y0] = (b1 − β1) + (b2 − β2) x0
var(μ̂0) = σ² [ 1/T + (x0 − x̄)² / Σ (xt − x̄)² ]

61 [Figure: densities f(yt) at x0 and x1, showing the mean responses μ̂0, μ̂1 and the predicted responses ŷ0, ŷ1.]
Mean response μ̂ and predicted response ŷ.

62 Explaining Variation in yt
Predicting yt without any explanatory variables: yt = β1 + εt
Minimize Σ εt² = Σ (yt − β1)²,  summed over t = 1, …, T
Setting the derivative to zero: Σ (yt − b1) = 0  ⇒  Σ yt − T b1 = 0  ⇒  b1 = ȳ

63 [Figure: scatter of observations (x1, y1), …, (x4, y4).]

64 [Figure: the scatter of observations with the baseline ȳ drawn as a horizontal line.]

65 Explaining Variation in yt
yt = b1 + b2xt + ε̂t
Explained variation: ŷt = b1 + b2xt
Unexplained variation: ε̂t = yt − ŷt = yt − b1 − b2xt

66 [Figure: observations y1, …, y4, the fitted line ŷ = b1 + b2x, and the residuals ε̂1, …, ε̂4.]

67 Explaining Variation in yt
yt = ŷt + ε̂t
Using ȳ as the baseline: yt − ȳ = (ŷt − ȳ) + ε̂t
Squaring and summing over t = 1, …, T (the cross-product term drops out):
Σ (yt − ȳ)² = Σ (ŷt − ȳ)² + Σ ε̂t²
SST = SSR + SSE

68 [Figure: for one observation, the total deviation y − ȳ (SST) splits into the explained part (b1 + b2x) − ȳ (SSR) and the residual y − (b1 + b2x) (SSE).]
The relationship among SST, SSR, and SSE.

69 (yt y)2= yt2 Ty2 SST = Total Variation in yt
SST = total sum of squares SST measures variation of yt around y (yt y)2= yt2 Ty2 t = 1 T SST =

70 Explained Variation in yt
SSR = regression sum of squares. Fitted values: ŷt = b1 + b2xt. SSR measures the variation of ŷt around ȳ:
SSR = Σ (ŷt − ȳ)² = b2² Σ (xt − x̄)²

71 Unexplained Variation in yt
SSE = error sum of squares. Residuals: ε̂t = yt − ŷt = yt − b1 − b2xt. SSE measures the variation of yt around ŷt:
SSE = Σ (yt − ŷt)² = Σ ε̂t²

72 Analysis of Variance Table
Table: Analysis of Variance
Source of Variation   DF      Sum of Squares   Mean Square
Explained             1       SSR              MSR = SSR/1
Unexplained           T − 2   SSE              MSE = SSE/(T − 2)  [= σ̂²]
Total                 T − 1   SST

73 Coefficient of Determination
What proportion of the variation in yt is explained?
R² = SSR / SST,  0 ≤ R² ≤ 1

74 Coefficient of Determination
SST = SSR + SSE
Dividing by SST: 1 = SSR/SST + SSE/SST
R² = SSR/SST = 1 − SSE/SST
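A sketch that computes SST, SSR and SSE and checks the decomposition and R² numerically; variable and function names are illustrative.

```python
import numpy as np

def variance_decomposition(x, y):
    """Return SST, SSR, SSE and R² for the fitted line ŷ = b1 + b2*x."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    sxx = np.sum((x - x.mean())**2)
    b2 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
    b1 = y.mean() - b2 * x.mean()
    y_hat = b1 + b2 * x
    sst = np.sum((y - y.mean())**2)
    ssr = np.sum((y_hat - y.mean())**2)
    sse = np.sum((y - y_hat)**2)
    assert np.isclose(sst, ssr + sse)      # SST = SSR + SSE
    return sst, ssr, sse, ssr / sst
```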

75 Coefficient of Determination
R² is only a descriptive measure. R² does not measure the quality of the regression model. Focusing solely on maximizing R² is not a good idea.

76 Testing H0: β2 = 0
In simple linear regression models, there are two ways to test H0: β2 = 0 vs HA: β2 ≠ 0:
1. Under H0, t = b2 / se(b2) ~ t(T − 2)
2. Under H0, F = MSR / MSE ~ F(1, T − 2)
Note that:
1. It can be shown that t²(T − 2) = F(1, T − 2).
2. F = MSR / MSE = R² / [ (1 − R²) / (T − 2) ]
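The equivalence t² = F can be checked numerically on any data set; a sketch, reusing quantities defined above (names illustrative):

```python
import numpy as np

def t2_equals_F(x, y):
    """Verify that the square of the t statistic for H0: beta2 = 0
    equals the F statistic MSR/MSE in simple linear regression."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    T = x.size
    sxx = np.sum((x - x.mean())**2)
    b2 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
    b1 = y.mean() - b2 * x.mean()
    y_hat = b1 + b2 * x
    sse = np.sum((y - y_hat)**2)
    ssr = np.sum((y_hat - y.mean())**2)
    mse = sse / (T - 2)
    t = b2 / np.sqrt(mse / sxx)            # t statistic for beta2 = 0
    F = (ssr / 1) / mse                    # F statistic, 1 and T-2 df
    return np.isclose(t**2, F)
```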

77 Regression Computer Output
Typical computer output of regression estimates:
Table: Computer-Generated Least Squares Results
(1)         (2)          (3)         (4)               (5)
Variable    Parameter    Standard    T for H0:         Prob > |T|
            Estimate     Error       Parameter = 0
INTERCEPT   …            …           …                 …
X           …            …           …                 …

78 Regression Computer Output
b1 = …,  b2 = …
se(b1) = √vâr(b1) = …
se(b2) = √vâr(b2) = …
t = b1 / se(b1) = … = 1.84
t = b2 / se(b2) = 0.1283 / 0.0305 = …

79 Regression Computer Output
Sources of variation in the dependent variable:
Table: Analysis of Variance
Source        DF    Sum of Squares   Mean Square
Explained     …     …                …
Unexplained   …     …                …
Total         …     …
R-square: …

80 Regression Computer Output
SST = (yty)2 = SSR = (yty)2 = ^ SSE = εt2 = ^ SSE /(T-2) = 2 = ^ SSR SST R2 = = 1 = 0.317 SSE

81 Reporting Regression Results
This R² value may seem low, but it is typical in studies involving cross-sectional data analyzed at the individual or micro level. A considerably higher R² value would be expected in studies involving time-series data analyzed at an aggregate or macro level.

