Regression
Hal Varian, 10 April 2006

Outline:
- What is regression?
- History
- Curve fitting v. statistics
- Correlation and causation
- Statistical models
- Gauss-Markov theorem
- Maximum likelihood
- Conditional mean
- What can go wrong…
- Examples

Francis Galton, 1877: plotted the first regression line.
- Diameter of sweet peas v. diameter of parents
- Heights of fathers v. heights of sons

Sons of unusually tall fathers tend to be tall, but shorter than their fathers. Galton called this "regression to mediocrity". But the same is true the other way around! Hence the regression-to-the-mean fallacy: pick the lowest-scoring 10% on the midterm and give them extra tutoring. If they do better on the final, what can you conclude? Did the tutoring help?

Regression analysis
Assume a linear relation between two variables and estimate the unknown parameters:

y_t = a + b x_t + e_t, for t = 1, …, T

observed = fitted + error (residual)
dependent variable ~ independent variables / predictors / correlates
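As a concrete illustration (a minimal sketch, not from the original slides; the parameter values a = 3, b = 0.5 and all variable names are invented), the model can be simulated and estimated in R:

# Simulate y_t = a + b*x_t + e_t with a = 3, b = 0.5, then fit by least squares.
set.seed(1)                       # for reproducibility
n <- 100
x <- runif(n, 0, 10)              # explanatory variable
e <- rnorm(n, mean = 0, sd = 2)   # error term
y <- 3 + 0.5 * x + e
fit <- lm(y ~ x)                  # least squares estimates of (a, b)
coef(fit)                         # estimated intercept and slope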

Curve fitting v. regression
- Often one chooses (a, b) to minimize the sum of squared residuals ("least squares").
- Why not the absolute value of the residuals?
- Why not fit x_t = a + b y_t instead?
- How much can you trust the estimated values? You need a statistical model to answer these questions!
- Linear regression means linear in the parameters.
- Nonlinear regression, local regression, the general linear model, the generalized additive model: the same principles apply.
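For reference (a standard formula, added here; it is not on the original slide), the least squares choice of (a, b) has the closed form

\[
\hat{b} = \frac{\sum_{t=1}^{T}(x_t - \bar{x})(y_t - \bar{y})}{\sum_{t=1}^{T}(x_t - \bar{x})^2},
\qquad
\hat{a} = \bar{y} - \hat{b}\,\bar{x}.
\]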

Possible goals
- Estimate the parameters (a, b, and the error variance).
- Test hypotheses (such as "x has no influence on y").
- Make predictions about y conditional on observing a new x-value.
- Summarize data (the most common unstated goal!).
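Continuing the hypothetical sketch above (the object fit and the new value 7.5 are assumptions of that sketch), the testing and prediction goals look like this in R:

summary(fit)                  # t-tests of "x has no influence on y" (b = 0)
new_x <- data.frame(x = 7.5)  # a new x-value
predict(fit, newdata = new_x, interval = "prediction")  # forecast with interval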

Summarizing relationships
We would like to interpret regression causally: "If x changes by Δx, then y will on average change by b Δx." Correlation v. causation: compare the time on my wristwatch with the time on your wristwatch… Even ideally, the best you can say is: "When x changes by Δx in the sample, then on average y changes by b Δx in the sample."

Problem with causality
There may be a "third cause": "my watch time" and "your watch time" both depend on NIST time. Economics example:

income ~ b · education + (unobserved IQ + other)
education ~ IQ

Higher income is associated with higher education in the sample, but b is a biased estimate of the partial effect of education on income. You need a controlled experiment or a more elaborate estimation technique to resolve this "simultaneous equations bias".
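A minimal simulation of this bias (all names and coefficients here are invented for illustration; the true partial effect of education is 1):

# Omitted-variable bias: education and income both depend on unobserved IQ.
set.seed(2)
n <- 1000
iq <- rnorm(n)                               # the unobserved "third cause"
education <- 2 * iq + rnorm(n)               # education depends on IQ
income <- education + 3 * iq + rnorm(n)      # true effect of education = 1
coef(lm(income ~ education))                 # slope biased upward (about 2.2)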

Statistical regression model

y_t = a + b x_t + ε_t, for t = 1, …, T

Think of the random variable ε_t as the sum of all the omitted effects. What are attractive properties for the error term?
- E ε_t = 0
- Var ε_t = constant
- E ε_t ε_s = 0 for t ≠ s (errors are uncorrelated across observations)
- E x_t ε_t = 0 (errors are uncorrelated with the explanatory variables – often problematic for the reasons on the last slide! Exogenous v. endogenous.)

You have to ask: how do the variables you don't observe affect the variables you do observe?

Optimality properties
- Gauss-Markov theorem: if the error term has these properties, then the linear regression estimates of (a, b) are BLUE, "best linear unbiased estimates": out of all unbiased estimates that are linear in y_t, the least squares estimates have minimum variance.
- If the ε_t are IID Normal, then the OLS estimates are also the maximum likelihood estimates.
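To make the maximum likelihood connection explicit (a standard derivation, added here for reference; it is not on the slide): with IID Normal errors the log-likelihood of the sample is

\[
\log L(a, b, \sigma^2) = -\frac{T}{2}\log\!\left(2\pi\sigma^2\right)
- \frac{1}{2\sigma^2}\sum_{t=1}^{T}\left(y_t - a - b\,x_t\right)^2,
\]

so for any fixed \(\sigma^2\), maximizing over (a, b) is exactly the same as minimizing the sum of squared residuals.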

Conditional means
In the regression model, the expected value of y_t is a + b x_t. So the conditional mean is linear in x_t, which is another interpretation of regression. More generally, one can think of the regression model as

E y_t = f(x_t, b)

Regression output
- Estimates of the parameters
- Standard errors of the estimates and of the error term
- t-statistics = estimate / se, and p-values
- R² = goodness-of-fit measure:
  Total SS = Fitted SS + Residual SS
  R² = Fitted SS / Total SS
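One way to check this decomposition in R (a sketch reusing y and fit from the earlier simulated example):

# Goodness of fit by hand.
total_ss  <- sum((y - mean(y))^2)
fitted_ss <- sum((fitted(fit) - mean(y))^2)
resid_ss  <- sum(resid(fit)^2)
total_ss - (fitted_ss + resid_ss)   # ~0: Total SS = Fitted SS + Residual SS
fitted_ss / total_ss                # R^2
summary(fit)$r.squared              # matches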

Example from R

> x <- 1:100
> y <- x + 10*rnorm(100)
> summary(lm(y~x))

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)        …          …       …        …
x                  …          …       …   <2e-16 ***

Residual standard error: … on 98 degrees of freedom
Multiple R-Squared: …, Adjusted R-squared: …

What can go wrong?
- Nonlinear relationship: try a quadratic, an interaction term, logs, etc.
- Var ε_t is not constant: heteroskedasticity affects testing, not the estimates. Take logs or use weighted least squares.
- Serial correlation: affects testing and prediction accuracy. Use time-series methods.
- Multiple regression – collinearity: socks ~ right shoes + left shoes + shoes + error.
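A sketch of the weighted least squares fix (this example assumes the error standard deviation is proportional to x, which is invented for illustration):

# Heteroskedasticity: error sd grows with x, so weight by the inverse variance.
set.seed(3)
x <- 1:100
y <- 2 + 0.5 * x + x * rnorm(100)     # Var(e_t) grows like x^2
ols <- lm(y ~ x)                      # coefficients fine, standard errors not
wls <- lm(y ~ x, weights = 1 / x^2)   # downweight the noisy observations
summary(wls)$coefficients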

What can go wrong, cont.
- Errors in variables: you underestimate the magnitude of the true effect.
- Omitted variable bias: bias that depends on the correlation of the omitted with the included variables.
- Simultaneous equations bias: the "third cause" alluded to earlier; you need to estimate the full model or use a controlled experiment.
- Outliers: non-normality of errors and influential observations – remove them or use robust estimation.
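A small simulation of the errors-in-variables attenuation (all numbers invented for illustration):

# Errors in variables: noise in x biases the estimated slope toward zero.
set.seed(4)
n <- 1000
x_true <- rnorm(n)
y <- 1 + 2 * x_true + rnorm(n)   # true slope = 2
x_obs  <- x_true + rnorm(n)      # x observed with measurement error
coef(lm(y ~ x_obs))              # slope near 1 = 2 * Var(x)/(Var(x)+Var(noise))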

Diagnostics
Look at the residuals!! R allows you to plot various regression diagnostics:

reg <- lm(y~x)
plot(reg)

Examples to follow…
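For reference (an addition, not part of the original slides): plot() on an lm object produces four diagnostic plots by default, namely residuals v. fitted values, a normal Q-Q plot of the residuals, a scale-location plot, and residuals v. leverage.

reg <- lm(y ~ x)
par(mfrow = c(2, 2))  # show all four default diagnostic plots in one grid
plot(reg)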