Linear Regression Models

Presentation transcript:

Linear Regression Models. A powerful modeling technique that teases out relationships between "independent" variables and one "dependent" variable. Models are not perfect, so they need an error term to absorb measurement errors, a wrong model, omitted variables, and inherent randomness. Linear models are often misused.

Example: Lake Water Quality. Chlorophyll-a (C) is a widely used indicator and a measure of eutrophication. Nitrogen (N) is associated with eutrophication. Question: with golf course development, nitrogen is expected to increase. By how much will C increase or decrease? How should we proceed?

Plot C vs. N

A "Better" Model. Explain the (single) regression line (model?). The negative relationship suggests a problem: an omitted variable, phosphorus (P). We want to tease out the effects of N and P separately, so we write a multiple linear regression model designed to "tease out" the effect of N and the effect of P, separately, on C. (**) Define and interpret the variables and parameters.

Estimation. Use data to estimate the parameter values that give the "best fit": b0 = -9.4, b1 = 0.3, b2 = 1.2. Answer: a one-unit increase in N results in about a 1.2-unit increase in C. Importance: omitting phosphorus from the model introduced significant bias!
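The bias from omitting a variable can be illustrated with a small simulation. This is a minimal sketch on synthetic data, not the lake data from the slides: the sample size, the coefficients, the negative correlation between N and P, and the helper `ols` are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Hypothetical data: P and N are negatively correlated, and both raise C.
P = rng.normal(5.0, 2.0, n)
N = 10.0 - 0.8 * P + rng.normal(0.0, 1.0, n)
C = 1.0 + 1.2 * N + 3.0 * P + rng.normal(0.0, 1.0, n)

def ols(y, *regressors):
    """Least-squares coefficients; the first entry is the intercept."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

short = ols(C, N)      # P omitted
full = ols(C, N, P)    # P included

print("slope on N, P omitted: ", round(short[1], 2))
print("slope on N, P included:", round(full[1], 2))
```

In this setup the regression that omits P gives a negative slope on N even though the true effect is positive, while the regression that includes P recovers a slope close to the assumed 1.2.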

Question: US Gas Consumption. Gasoline consumption produces many negative byproducts. Policy may aim to raise the price of gas in order to reduce consumption. But what is the effect of a price change? Question: what is the price elasticity of demand for gasoline in the U.S.?

Some Gasoline Data

Gas Data Cont'd. Gas consumption increases through time, but there is no information here about price. The next plot shows a positive (+) relationship between gas price and gas consumption, which is the opposite of a demand curve. Something is wrong here. Just as in the eutrophication problem, we may have omitted important variables. There may be other problems, too.

The OLS "Estimator". Estimator: a rule or strategy for using data to estimate an unknown parameter, defined before the data are drawn. The Ordinary Least Squares (OLS) estimator finds the parameter values that minimize the sum of squared deviations (see the C vs. N plot). Several assumptions must hold for the OLS estimator to apply to a model.
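The idea of minimizing the sum of squared deviations can be sketched in a few lines. The data below are synthetic and the function name `ssr` is a hypothetical helper; the estimate is computed from the normal equations and then compared with a slightly perturbed coefficient vector.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 10.0, 50)
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, 50)   # hypothetical true line plus noise

# Design matrix with an intercept column.
X = np.column_stack([np.ones_like(x), x])

# OLS estimate from the normal equations (X'X) b = X'y.
b = np.linalg.solve(X.T @ X, X.T @ y)

def ssr(beta):
    """Sum of squared residuals for a candidate coefficient vector."""
    r = y - X @ beta
    return float(r @ r)

print("OLS estimate:", b)
print("SSR at OLS estimate:", ssr(b))
print("SSR after nudging the intercept:", ssr(b + np.array([0.1, 0.0])))
```

Any coefficient vector other than the OLS estimate gives a larger sum of squared residuals on the same data.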

Linear Model. The model must be linear in parameters, not necessarily in variables. Note the difference between a parameter and a variable. Examples:
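One illustration of "linear in parameters, not in variables" is a quadratic in x, which is not one of the slide's own examples and is used here only as an assumption. The squared term is just another column of the design matrix, so OLS applies directly.

```python
import numpy as np

# Hypothetical noiseless data from y = 1 + 2x - 0.5x^2.
x = np.linspace(-3.0, 3.0, 41)
y = 1.0 + 2.0 * x - 0.5 * x**2

# The model is nonlinear in the variable x but linear in the parameters,
# so x and x^2 enter as separate regressors.
X = np.column_stack([np.ones_like(x), x, x**2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

print(b)
```

By contrast, a model such as y = exp(b * x) is nonlinear in the parameter b and cannot be written as a design matrix times a coefficient vector without a transformation.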

Transforming Models. The previous "Ricker" model is nonlinear (in the parameter). Sometimes a model can be transformed so that it becomes linear. When plotted, the graph is nonlinear. Take the log of both sides, giving:
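The transcript does not preserve the equations, so the sketch below assumes the standard Ricker form R = a * S * exp(-b * S), for which taking logs gives log(R/S) = log(a) - b * S. The parameter values, the stock sizes, and the multiplicative error are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
a_true, b_true = 5.0, 0.02                 # hypothetical parameter values
S = rng.uniform(1.0, 100.0, 200)           # hypothetical stock sizes
noise = np.exp(rng.normal(0.0, 0.1, 200))  # multiplicative error (assumption)
R = a_true * S * np.exp(-b_true * S) * noise

# After the log transform the model is linear in log(a) and b:
#   log(R / S) = log(a) - b * S + error
X = np.column_stack([np.ones_like(S), S])
coef, *_ = np.linalg.lstsq(X, np.log(R / S), rcond=None)

a_hat = np.exp(coef[0])
b_hat = -coef[1]
print("a_hat:", a_hat, "b_hat:", b_hat)
```

The transform works cleanly here because the error is assumed to be multiplicative; with an additive error on R the logged model would not have an additive disturbance.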

CLRM: Assumption 1. The dependent variable (Y) is a function of a specific set of independent variables (X's): linear in parameters, with an additive error, and with coefficients that are constant but unknown. Violations are called "specification errors", e.g. wrong regressors (a.k.a. independent variables, X's), nonlinearity, and changing parameters (e.g. through time).

CLRM: Assumption 2. Disturbances ($\varepsilon_i$'s) are independently and identically distributed, $\varepsilon_i \sim (0, \sigma^2)$. Typically we assume $\varepsilon_i \sim N(0, \sigma^2)$: mean 0, constant (but unknown) variance $\sigma^2$, and errors uncorrelated with one another. Examples of violations: measurement bias (seep gas flux), heteroskedasticity (variance differs), and autocorrelated errors (disturbances correlated).

CLRM: Assumption 3. It is possible to repeat the sample with the same independent variables. If we had the same levels of the explanatory variables, would it be possible to generate the same value of Y? Common violations: errors in variables (measurement error in X), autoregression (a lagged dependent variable should be an independent variable), and simultaneous equations (several relationships act jointly).

Properties of Estimators. Estimators have many properties. "6" is an estimator, but not a very good one. Two main properties we care about: Unbiased: the expected difference between the estimator and the thing it is estimating is 0. Efficient: small variance (spread). "6" is biased, but has a very small variance (zero). Under the CLRM assumptions, the OLS estimator is unbiased and has minimum variance among all linear unbiased estimators.
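The contrast between the constant estimator "6" and a sensible estimator can be checked by simulation. This sketch uses the sample mean as the comparison estimator and a hypothetical population with true mean 4; the sample size and number of replications are also assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma = 4.0, 2.0        # hypothetical true mean and standard deviation
n, reps = 25, 5000          # sample size and number of repeated samples

samples = rng.normal(mu, sigma, size=(reps, n))

est_mean = samples.mean(axis=1)   # sample mean, one estimate per sample
est_six = np.full(reps, 6.0)      # the estimator "6" ignores the data

bias_mean = est_mean.mean() - mu
bias_six = est_six.mean() - mu
var_mean = est_mean.var()
var_six = est_six.var()

print("sample mean: bias", bias_mean, "variance", var_mean)
print("'6':         bias", bias_six, "variance", var_six)
```

The constant has zero variance but a bias that never shrinks, while the sample mean has some spread but is centered on the true value.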

Correlation vs. Causation. Now we know just enough to be dangerous! We can estimate how any set of variables affects some other variable, which is very powerful. The problem is that correlation doesn't imply causation, which is why data mining is bad. Example: chicken production and global CO2. A relationship may be "spurious" (no underlying relationship), and this is difficult to tease out statistically. See "Granger causality".

Violations & Consequences

Problem                                          Consequences
Autocorrelation                                  Unbiased, wrong inference
Heteroskedasticity                               Unbiased, wrong inference
Contemporaneous correlation (X, ε correlated)    Biased
Multicollinearity                                Usually OK
Omitted variables                                Biased
Included regressors                              Unbiased, extra noise
True model nonlinear                             Biased, wrong inference

Guide to Model Specification
1. Start with theory to generate the model
2. Check the assumptions of the CLRM
3. Collect and plot data
4. Estimate the model and test restrictions (possibly perform a Box-Cox transform)
5. Check R² and "Adjusted R²"
6. Plot residuals and look for patterns
7. Seek explanations for patterns
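Step 5 can be sketched as follows, using the usual definitions R² = 1 - SSR/SST and adjusted R² = 1 - (1 - R²)(n - 1)/(n - p), where p counts all estimated coefficients including the intercept. The data, the pure-noise regressor `junk`, and the helper `r2_and_adjusted` are hypothetical.

```python
import numpy as np

def r2_and_adjusted(y, X):
    """Return (R^2, adjusted R^2); X must include the intercept column."""
    n, p = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    ssr = resid @ resid                       # unexplained variation
    sst = ((y - y.mean()) ** 2).sum()         # total variation
    r2 = 1.0 - ssr / sst
    adj = 1.0 - (1.0 - r2) * (n - 1) / (n - p)
    return r2, adj

rng = np.random.default_rng(4)
n = 60
x = rng.uniform(0.0, 10.0, n)
y = 1.0 + 0.5 * x + rng.normal(0.0, 1.0, n)
junk = rng.normal(0.0, 1.0, n)                # regressor unrelated to y

X1 = np.column_stack([np.ones(n), x])
X2 = np.column_stack([X1, junk])

r2_1, adj_1 = r2_and_adjusted(y, X1)
r2_2, adj_2 = r2_and_adjusted(y, X2)
print("without junk:", r2_1, adj_1)
print("with junk:   ", r2_2, adj_2)
```

R² cannot fall when a regressor is added, even an irrelevant one, which is why the adjusted version with its penalty for extra coefficients is checked alongside it.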

What’s a Residual? General form of linear model: Graphically on board.
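The slide's equation is not preserved in the transcript. Assuming the usual form y_i = b0 + b1*x_i1 + ... + bk*x_ik + e_i, the residual is the observed value minus the fitted value. The sketch below computes residuals on synthetic data (all values hypothetical) and checks two properties that hold for OLS with an intercept.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 40
x1 = rng.uniform(0.0, 5.0, n)
x2 = rng.uniform(0.0, 5.0, n)
y = 2.0 + 1.0 * x1 - 0.5 * x2 + rng.normal(0.0, 1.0, n)

X = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

fitted = X @ b          # points on the estimated regression surface
resid = y - fitted      # vertical distance from each observation to the fit

print("sum of residuals:", resid.sum())
print("X' * residuals:  ", X.T @ resid)
```

Both printed quantities are zero up to rounding error: OLS residuals sum to zero when an intercept is included and are uncorrelated with every regressor in the sample.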

Residual Plots. Panels: residuals vs. fit; normal quantile plot.

Back to Gasoline Consumption. Recall that we are interested in how gas consumption is affected by a price increase (say $0.10/gal.). Variables: gas consumption per capita (G), gas price (Pg), income (Y), new car price (Pnc), and used car price (Puc).

2 Alternative Specifications. Linear specification: Log-log specification (often used with economic data): One way to test the specification is the Box-Cox transform (see 3 lectures back).
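The two equations are missing from the transcript. A common way to write them, assumed here, is G = b0 + b1*Pg + b2*Y + ... for the linear form and log(G) = b0 + b1*log(Pg) + b2*log(Y) + ... for the log-log form, where each log-log slope is an elasticity. The sketch below uses synthetic data with hypothetical elasticities (-0.3 for price, 0.8 for income) and only two regressors to keep it short.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 500
Pg = rng.uniform(1.0, 4.0, n)      # hypothetical gas prices
Y = rng.uniform(20.0, 60.0, n)     # hypothetical incomes

# Constant-elasticity demand with multiplicative noise (assumption).
G = 2.0 * Pg ** -0.3 * Y ** 0.8 * np.exp(rng.normal(0.0, 0.05, n))

# Log-log specification: regress log(G) on log(Pg) and log(Y).
X = np.column_stack([np.ones(n), np.log(Pg), np.log(Y)])
coef, *_ = np.linalg.lstsq(X, np.log(G), rcond=None)

print("estimated price elasticity: ", coef[1])
print("estimated income elasticity:", coef[2])
```

In the log-log form the price coefficient reads directly as "a 1% increase in price changes consumption by about coef[1] percent", whereas a linear-specification slope is in units of G per unit of Pg.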

Results of Linear Model. Each entry is a parameter estimate with the p-value of its t-test in parentheses. A low p-value means "statistically significant". R² measures the goodness of fit of the model. A low p-value of the F statistic means the model has explanatory power.

b0            b1             b2              b3            b4            R²    p (F)
-.09 (.08)    -.04 (.002)    .0002 (.000)    -.10 (.11)    -.04 (.08)

Answer to Question. A 1-unit increase in price leads to a 0.04-unit decrease in gas consumption. Units are G (1000 gallons) and Pg ($). So a $0.10 increase in gas price leads to, on average, a 4-gallon decrease in gas consumption. Not much!
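The unit conversion behind this answer can be written out explicitly. The coefficient and units come from the slides; the variable names are hypothetical.

```python
b_price = -0.04            # change in G (in 1000 gallons) per $1 change in Pg
gallons_per_unit = 1000    # G is measured in thousands of gallons
price_increase = 0.10      # dollars per gallon

# A $1 increase changes consumption by -0.04 * 1000 = -40 gallons,
# so a $0.10 increase changes it by one tenth of that.
change_gallons = b_price * gallons_per_unit * price_increase
print(round(change_gallons, 2), "gallons")
```

This is the effect holding income and car prices fixed, since those variables are also in the model.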