Nonlinear Regression Functions

Slides:



Advertisements
Similar presentations
Managerial Economics in a Global Economy
Advertisements

Welcome to Econ 420 Applied Regression Analysis
Economics 20 - Prof. Anderson1 Multiple Regression Analysis y =  0 +  1 x 1 +  2 x  k x k + u 7. Specification and Data Problems.
Stat 112: Lecture 7 Notes Homework 2: Due next Thursday The Multiple Linear Regression model (Chapter 4.1) Inferences from multiple regression analysis.
Irwin/McGraw-Hill © Andrew F. Siegel, 1997 and l Chapter 12 l Multiple Regression: Predicting One Factor from Several Others.
Copyright © 2009 Pearson Education, Inc. Chapter 29 Multiple Regression.
Copyright © 2010 Pearson Education, Inc. Slide
6-1 Introduction To Empirical Models 6-1 Introduction To Empirical Models.
Regression Analysis Once a linear relationship is defined, the independent variable can be used to forecast the dependent variable. Y ^ = bo + bX bo is.
Linear Regression with Multiple Regressors
Multiple Regression Fenster Today we start on the last part of the course: multivariate analysis. Up to now we have been concerned with testing the significance.
Qualitative Variables and
The Use and Interpretation of the Constant Term
Choosing a Functional Form
Chapter 13 Multiple Regression
Copyright © 2010, 2007, 2004 Pearson Education, Inc. *Chapter 29 Multiple Regression.
Chapter 10 Simple Regression.
1Prof. Dr. Rainer Stachuletz Multiple Regression Analysis y =  0 +  1 x 1 +  2 x  k x k + u 7. Specification and Data Problems.
Chapter 12 Simple Regression
Linear Regression with One Regression
Chapter 12 Multiple Regression
Chapter 8 Nonlinear Regression Functions. 2 Nonlinear Regression Functions (SW Chapter 8)
Lecture 19: Tues., Nov. 11th R-squared (8.6.1) Review
Lecture 24: Thurs. Dec. 4 Extra sum of squares F-tests (10.3) R-squared statistic (10.4.1) Residual plots (11.2) Influential observations (11.3,
1 Regression Analysis Regression used to estimate relationship between dependent variable (Y) and one or more independent variables (X). Consider the variable.
Statistical Analysis SC504/HS927 Spring Term 2008 Session 7: Week 23: 7 th March 2008 Complex independent variables and regression diagnostics.
Chapter 11 Multiple Regression.
Lecture 16 – Thurs, Oct. 30 Inference for Regression (Sections ): –Hypothesis Tests and Confidence Intervals for Intercept and Slope –Confidence.
Multiple Linear Regression
Ch. 14: The Multiple Regression Model building
Stat 112: Lecture 9 Notes Homework 3: Due next Thursday
Correlation and Regression Analysis
Spreadsheet Problem Solving
Introduction to Regression Analysis, Chapter 13,
Simple Linear Regression Analysis
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 13-1 Chapter 13 Introduction to Multiple Regression Statistics for Managers.
Multiple Linear Regression Analysis
Chapter 8 Forecasting with Multiple Regression
Hypothesis Tests and Confidence Intervals in Multiple Regressors
Chapter 8 Nonlinear Regression Functions.
Hypothesis Testing in Linear Regression Analysis
Assessing Studies Based on Multiple Regression
Regression Method.
Chapter 12 Multiple Regression and Model Building.
Copyright © 2007 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Slide
Multiple regression - Inference for multiple regression - A case study IPS chapters 11.1 and 11.2 © 2006 W.H. Freeman and Company.
Random Regressors and Moment Based Estimation Prepared by Vera Tabakova, East Carolina University.
Review of Building Multiple Regression Models Generalization of univariate linear regression models. One unit of data with a value of dependent variable.
Lesson Multiple Regression Models. Objectives Obtain the correlation matrix Use technology to find a multiple regression equation Interpret the.
Multiple Regression Petter Mostad Review: Simple linear regression We define a model where are independent (normally distributed) with equal.
Discussion of time series and panel models
STA 286 week 131 Inference for the Regression Coefficient Recall, b 0 and b 1 are the estimates of the slope β 1 and intercept β 0 of population regression.
Stat 112 Notes 9 Today: –Multicollinearity (Chapter 4.6) –Multiple regression and causal inference.
Chapter 22: Building Multiple Regression Models Generalization of univariate linear regression models. One unit of data with a value of dependent variable.
Chapter 14: Inference for Regression. A brief review of chapter 4... (Regression Analysis: Exploring Association BetweenVariables )  Bi-variate data.
1 MGT 511: Hypothesis Testing and Regression Lecture 8: Framework for Multiple Regression Analysis K. Sudhir Yale SOM-EMBA.
I271B QUANTITATIVE METHODS Regression and Diagnostics.
Business Statistics: A Decision-Making Approach, 6e © 2005 Prentice- Hall, Inc. Chap 14-1 Business Statistics: A Decision-Making Approach 6 th Edition.
11 Chapter 5 The Research Process – Hypothesis Development – (Stage 4 in Research Process) © 2009 John Wiley & Sons Ltd.
Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc.. Chap 14-1 Chapter 14 Introduction to Multiple Regression Basic Business Statistics 10 th Edition.
Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.Chap 14-1 Statistics for Managers Using Microsoft® Excel 5th Edition Chapter.
4-1 MGMG 522 : Session #4 Choosing the Independent Variables and a Functional Form (Ch. 6 & 7)
PO 141: INTRODUCTION TO PUBLIC POLICY Summer I (2015) Claire Leavitt Boston University.
Multiple Regression Analysis: Inference
Correlation and Linear Regression
Chapter 4 Basic Estimation Techniques
Linear Regression with One Regression
Chapter 7: The Normality Assumption and Inference with OLS
Seminar in Economics Econ. 470
Presentation transcript:

Nonlinear Regression Functions

Nolinear in a Single Variable General Ideas Nolinear in a Single Variable Polynomials Logarithm Interactions Between Independent Variables Between Two Dummies Between Continuous and Binary Variables Between Continuous Variables Application

Everything so far has been linear in the X’s. The approximation that the regression function is linear might be good for some variables, but not for others. The multiple regression framework can be extended to handle regression functions that are nonlinear in one or more X.

The Test Score - ST R relation looks approximately linear.

But the Test Score - Income relation looks like it is nonlinear.

General Ideas If a relation between Y and X is nonlinear: The effect on Y of a change in X depends on the value of X - that is, the marginal effect of X is not constant. A linear regression is mis-specified - the functional form is wrong. The estimator of the effect on Y of X is biased - it need not even be right on average. The solution to this is to estimate a regression function that is nonlinear in X.

General Nonlinear Regression Function The General Nonlinear Population Regression Function Assumptions (same); implies that f is the conditional expectation of Y given the X’s. are i.i.d. (same). “enough” moments exist (same idea; the precise statement depends on specific f ). No perfect multicollinearity (same idea; the precise statement depends on the specific f ).

Nonlinear Functions of a Single Independent Variable We’ll look at two complementary approaches: Polynomials in X The population regression function is approximated by a quadratic, cubic, or higher-degree polynomial. Logarithmic transformations Y and/or X is transformed by taking its logarithm. This gives a “percentage” interpretation that makes sense in many applications.

1. Polynomials in X Approximate the population regression function by a polynomial. This is just the linear multiple regression model – except that the regressors are powers of X. Estimation, hypothesis testing, etc. proceed as in the multiple regression model using OLS. The coefficients are difficult to interpret, but the regression function itself is interpretable.

Example: the Test Score - Income relation Incomei = average district income in the ith district (thousdand dollars per capita) Quadratic specification: Cubic specification:

Interpreting the estimated regression function (a) Plot the predicted values

(b) Compute “effects” for different values of X. Predicted change in Test Score for a change in income to 6,000 from 5,000 per capita:

Predicted “effects” for different values of X The “effect” of a change in income is greater at low than high income levels (perhaps, a declining marginal benefit of an increase in school budgets?) Caution! What about a change from 65 to 66? Don’t extrapolate outside the range of the data.

Testing the null hypothesis of linearity, against the alternative that the population regression is quadratic and/or cubic, that is, it is a polynomial of degree up to 3. The hypothesis that the population regression is linear is rejected at the 1% significance level against the alternative that it is a polynomial of degree up to 3.

Summary: polynomial regression functions Estimation: by OLS after defining new regressors. Coefficients have complicated interpretations. To interpret the estimated regression function: plot predicted values as a function of x. compute predicted at different values of x. Hypotheses concerning degree r can be tested by t- and F-tests on the appropriate (blocks of) variable(s). Choice of degree r . plot the data; t- and F-tests, check sensitivity of estimated effects, then judgement.

2. Logarithmic functions of Y and/or X ln(X) = the natural logarithm of X. Logarithmic transforms permit modeling relations in “percentage” terms (like elasticities), rather than linearly.

Three cases:

Interpreting log-linear model, When Xi increases to Xi + 1, predicted Yi becomes . Therefore, the percentage change of is

because Taylor series of f(x) is

Example: Test Score vs. ln(Income) First defining the new regressor, ln(Income). The model is now linear in ln(Income), so the linear-log model can be estimated by OLS. so a 1% increase in Income is associated with an increase in Test Score of 0.36 points on the test. Standard errors, confidence intervals, R2 - all the usual tools of regression apply here. How does this compare to the cubic model?

The linear-log and cubic regression functions

Example: ln(Test Score) vs. ln(Income) First defining a new dependent variable, ln(Test Score), and the new regressor, ln(Income). The model is now a linear regression of ln(Test Score) against ln(Income), which can be estimated by OLS: An 1% increase in Income is associated with an increase of .0554% in Test Score

How does log-log compare to the log-linear model? Neither specification seems to fit as well as the cubic or linear-log.

Summary: Logarithmic transformations Three cases, differing in whether Y and/or X is transformed by taking logarithms. After creating the new variable(s) ln(Y) and/or ln(X), the regression is linear in the new variables and the coefficients can be estimated by OLS. Hypothesis tests and confidence intervals are now standard. The interpretation of differs from case to case. Choice of specification should be guided by judgment (which interpretation makes the most sense in your application?), tests, and plotting predicted values.

Interactions Between Independent Variables Perhaps a class size reduction is more effective in some circumstances than in others. Perhaps smaller classes help more if there are many English learners, who need individual attention. That is, might depend on Pct EL. More generally, might depend on X2. How to model such “interactions” between X1 and X2? We first consider binary X’s, then continuous X’s.

(a) Interactions between two binary variables are binary. is the effect of changing D1 = 0 to D1 = 1. In this specification, this effect doesn’t depend on the value of D2. To allow the effect of changing D1 to depend on D2, include the “interaction term” as a regressor.

Interpreting the coefficients General rule: compare the various cases. subtract: The effect of D1 depends on d2. = increment to the effect of D1, when D2 = 1.

Example: Test Score, STR, English learners Let and

“Effect” of Hi STR when Hi EL = 0 is -1.9. Class size reduction is estimated to have a bigger effect when the percent of English learners is large. This interaction isn’t statistically significant: t = 3.5/3.1

(b) Interactions between continuous and binary variables Di is binary, X is continuous. As specified above, the effect on Y of X (holding constant D) = , which does not depend on D. To allow the effect of X to depend on D, include the “interaction term” as a regressor.

Binary-continuous interactions: the two regression lines Observations with Di = 0: Observations with Di = 1:

Interpreting the coefficients The effect of X depends on D. = increment to the effect of X, when D = 1.

Example: Test Score, STR, Hi EL (=1 if Pct EL≥20) When Hi EL = 0

When Hi EL = 1, Two regression lines: one for each Hi STR group. Class size reduction is estimated to have a larger effect when the percent of English learners is large.

Example, ctd. Testing various hypotheses: The two regression lines have the same slope the coefficient on STR × Hi EL is zero: t = -1.28/0.97 = -1.32 can’t reject. The two regression lines have the same intercept the coefficient on Hi EL is zero: t = -5.6/19.5 = 0.29 ) can’t reject

Example, ctd. Joint hypothesis that the two regression lines are the same population coefficient on Hi EL = 0 and population coefficient on STR × Hi EL = 0. F = 89.94 (p-value < .001) !! Why do we reject the joint hypothesis but neither individual hypothesis? Consequence of high but imperfect multicollinearity: high correlation between Hi EL and STR × Hi EL.

(c) Interactions between two continuous variables X1, X2 are continuous As specified, the effect of X1 doesn’t depend on X2. As specified, the effect of X2 doesn’t depend on X1. To allow the effect of X1 to depend on X2, include the “interaction term” X1i × X2i as a regressor.

Coefficients in continuous-continuous interactions The effect of X1 depends on X2. = increment to the effect of X1 from a unit change in X2.

Example: Test Score, STR, Pct EL The estimated effect of class size reduction is nonlinear because the size of the effect itself depends on Pct EL:

Does population coefficient on STR × Pct EL = 0. t =. 0012/. 019 = Does population coefficient on STR × Pct EL = 0 ? t = .0012/.019 = .06 can’t reject at 5% level. Does population coefficient on STR = 0 ? t = -1.12/0.59 = -1.90 can’t reject null at 5% level. Do the coefficients on both STR and STR × Pct EL =0? F = 3.89 (p-value = .021))reject null at 5% level(!!) (Why? high but imperfect multicollinearity)

Application: Nonlinear Effects on Test Scores of the Student-Teacher Ratio Nonlinear specifications let us examine more questions about the Test Score- STR relations, such as Are there nonlinear effects of class size reduction on test scores? (Does a reduction from 35 to 30 have same effect as a reduction from 20 to 15?) Are there nonlinear interactions between Pct EL and STR? (Are small classes more effective when there are many English learners?)

Strategy for Question #1 (different effects for different STR?) Estimate linear and nonlinear functions of STR, holding constant relevant demographic variables. Pct EL Income (remember the nonlinear Test Score-Income relation!) LunchPCT (fraction on free/subsidized lunch) See whether adding the nonlinear terms makes an “economically important” quantitative difference (“economic” or ”real-world” importance is different than statistically significant). Test for whether the nonlinear terms are significant.

Strategy for Question #2 (nonlinear interactions between Pct EL and STR?) Estimate linear and nonlinear functions of STR, interacted with Pct EL. If the specification is nonlinear (with STR, STR2, STR3), then you need to add interactions with all the terms so that the entire functional form can be different, depending on the level of Pct EL. We will use a binary-continuous interaction specification by adding Hi EL × STR, Hi EL × STR2, and Hi EL × STR3.

What is a good “base” specification? The TestScore- Income relation The logarithmic specification is better behaved near the ends of the sample, especially large values of income.

Question #1: Investigate by considering a polynomial in STR(column 5)

Interpreting the regression function via plots (preceding regression is labeled (5) in this figure)

Are the higher order terms in STR statistically significant? H0: quadratic in STR v. H1: cubic in STR? t = .059/.021 = 2.86(p = .005) H0: linear in STR v. H1: nonlinear/up to cubic in STR? F = 6.17(p = .002)

Question #2: STR - Pct EL interactions (column 4) (to simplify things, ignore STR2, STR3 terms for now)

“Real-world” importance of the interaction term: It is -1.12 if Hi EL = 1 and is -.53 if Hi EL = 0. The difference in the estimated effect of reducing the STR is substantial; class size reduction is more effective in districts with more English learners.

Is the interaction effect statistically significant? H0: coeff. on interaction=0 v. H1: nonzero interaction t = -0.58/0.50 = -1.17 not significant at the 10% level. H0: both coeffs involving STR = 0 vs. H1: at least one coefficient is nonzero (STR enters) F = 5.92(p = .003)

column (6)

Interpreting the regression functions via plots:

Tests of joint hypotheses:

Summary: Nonlinear Regression Functions Using functions of the independent variables such as ln(X) or X1 × X2, allows recasting a large family of nonlinear regression functions as multiple regression. Estimation and inference proceeds in the same way as in the linear multiple regression model. Interpretation of the coefficients is model-specific, but the general rule is to compute effects by comparing different cases (different value of the original X’s). Many nonlinear specifications are possible, so you must use judgment: What nonlinear effect you want to analyze? What makes sense in your application?