Unit 2b: Dealing “Rationally” with Nonlinear Relationships © Andrew Ho, Harvard Graduate School of Education Unit 2b – Slide 1

Course Roadmap: Unit 2b © Andrew Ho, Harvard Graduate School of Education Unit 2b – Slide 2

Today's topics:
- Introducing a theory-driven approach to fitting nonlinear models to data
- Fitting the nonlinear model and interpreting results
- Polynomial regression

[Roadmap diagram: a decision tree rooted at Multiple Regression Analysis (MRA). Do your residuals meet the required assumptions? Test for residual normality; use influence statistics to detect atypical data points. If your residuals are not independent, replace OLS with GLS regression analysis, use individual growth modeling, or specify a multilevel model. If time is a predictor, you need discrete-time survival analysis. If your outcome is categorical, you need binomial logistic regression analysis (dichotomous outcome) or multinomial logistic regression analysis (polytomous outcome). If you have more predictors than you can deal with, create taxonomies of fitted models and compare them, or form composites of the indicators of any common construct: conduct a Principal Components Analysis, use Cluster Analysis, or use Factor Analysis (EFA or CFA?). Today's topic area: if your outcome vs. predictor relationship is nonlinear, use nonlinear regression analysis or transform the outcome or predictor.]

Two General Approaches to Fitting Nonlinear Relationships © Andrew Ho, Harvard Graduate School of Education Unit 2b – Slide 3

Theory-Driven, “Rational” Approach (this class; harder to apply, easier to interpret):
- Use theory, or knowledge of the field, to postulate a nonlinear model for the hypothesized relationship between outcome and predictor.
- Use nonlinear regression analysis to fit the postulated trend, and conduct all of your statistical inference there.
- Interpret the parameter estimates directly, and produce plots of findings.

Data-Driven, “Empirical” Approach (last class; easier to apply, harder to interpret):
- Find an ad hoc transformation of the outcome, the predictor, or both, that renders their relationship linear.
- Use regular linear regression analysis to fit a linear trend in the transformed world, and conduct all statistical inference there.
- De-transform the fitted model to produce plots of findings, and tell the substantive story in the untransformed world.

Theory-Driven, “Rational” Approach © Andrew Ho, Harvard Graduate School of Education Unit 2b – Slide 4

- Use theory, or knowledge of the field, to postulate a nonlinear model for the hypothesized relationship between outcome and predictor.
- Use nonlinear regression analysis to fit the postulated trend, and conduct all of your statistical inference there.
- Interpret the parameter estimates directly, and produce plots of findings.

Theory: Pioneers in mathematical psychology, in the mid-20th century, theorized that human learning was state-dependent: the rate at which individuals learned was proportional to the amount that they had left to learn. This led psychologists, like Nancy Bayley, to hypothesize that IQ had a negative exponential trajectory with age:

    IQ = λ(1 − e^(−γ·AGE))

(λ is a Greek lambda, "l"; γ is a Greek gamma, "g".) Under this theory, the IQ/AGE trend in the BAYLEY data would rise steeply at young ages and level off toward an upper plateau. Because the meaning of the model parameters is not immediately obvious, we need to build intuition about the shape of negative exponential curves by sketching a few plots.

© Andrew Ho, Harvard Graduate School of Education Unit 2b – Slide 5

You can build intuition about how the shape of a negative exponential curve depends on the values of its parameters by sketching curves for prototypical parameter values, fixing all but one and varying the rest. Let's start with parameter λ: holding γ = .04, plot IQ against AGE for λ = 200, 250, and 300.

Conclusion? Parameter λ is the upper asymptote: the larger λ, the higher the asymptote.

(Sliders in Excel; properties: 1) Linked Cell, 2) Min/Max.)

© Andrew Ho, Harvard Graduate School of Education Unit 2b – Slide 6

And here is how the value of parameter γ affects the shape: holding λ = 200, plot IQ against AGE for γ = .01, .04, and .07.

Conclusion? Parameter γ determines the rate at which the asymptote is approached: the higher the value of γ, the more rapid the approach (see later).
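The two parameter explorations above can be reproduced numerically. Here is a minimal Python sketch (an aside, not part of the course's Stata or Excel materials) of the negative exponential IQ = λ(1 − e^(−γ·AGE)), evaluated at the prototypical parameter values from the slides; the evaluation ages are arbitrary illustration choices:

```python
import math

def neg_exponential(age, lam, gamma):
    """Negative exponential curve IQ = lam * (1 - exp(-gamma * age)):
    lam is the upper asymptote; gamma sets how fast it is approached."""
    return lam * (1 - math.exp(-gamma * age))

# Vary lambda with gamma fixed at .04: the curve's ceiling moves up.
for lam in (200, 250, 300):
    print(f"lambda={lam}: IQ at AGE=60 is {neg_exponential(60, lam, 0.04):.1f}")

# Vary gamma with lambda fixed at 200: larger gamma climbs faster,
# so the same early AGE sits closer to the asymptote.
for gamma in (0.01, 0.04, 0.07):
    print(f"gamma={gamma}: IQ at AGE=24 is {neg_exponential(24, 200, gamma):.1f}")
```

Evaluating the function far out on the AGE axis confirms the "λ is the asymptote" conclusion directly, since e^(−γ·AGE) vanishes.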

© Andrew Ho, Harvard Graduate School of Education Unit 2b – Slide 7

Fitting a hypothesized negative exponential curve to the BAYLEY data, using nl, proceeds by an iterative process of informed guessing. If you were to do it by hand, you would start by plotting an initial (pretty bad) guess for the fitted IQ curve against the observed child IQ and AGE data. What might the next step be? Would you:
- Increase or decrease the initial estimate of λ?
- Increase or decrease the initial estimate of γ?

© Andrew Ho, Harvard Graduate School of Education Unit 2b – Slide 8

There is another useful way of looking at the iterative journey to a final fitted model: think of it as a hike through a mountainous region of SSELAND, whose map grid is laid out in units of λ and γ and whose altitude is SSE. At each step (Step 0, Step 1, Step 2, Step 3, Step 4, Step 5, and beyond), we keep going downhill.

The problem: How do we know our “local minimum” is our “global minimum”? You might try a number of different starting points and see if you converge to the same answer. Also, always visualize the fit if you can.
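The downhill hike through SSELAND can be sketched in code. The Python illustration below is a crude pattern search, not Stata's actual algorithm, run on invented data (generated from λ = 210, γ = 0.05 plus noise, not the BAYLEY series): from a bad starting guess, it repeatedly moves to the lowest-SSE point on a shrinking grid of (λ, γ) neighbors.

```python
import math, random

random.seed(1)
# Hypothetical data from a known curve plus noise (NOT the BAYLEY data):
# generating values lambda = 210, gamma = 0.05.
ages = list(range(1, 22))
iqs = [210 * (1 - math.exp(-0.05 * a)) + random.gauss(0, 5) for a in ages]

def sse(lam, gamma):
    """Sum of squared errors: the 'altitude' in SSELAND at map point (lam, gamma)."""
    return sum((iq - lam * (1 - math.exp(-gamma * a))) ** 2
               for a, iq in zip(ages, iqs))

# Start at a deliberately bad guess, then repeatedly move to the best of the
# nine grid neighbors (staying put is one of them) while the grid spacing
# shrinks: an always-downhill hike.
lam, gamma = 100.0, 0.5
step_l, step_g = 50.0, 0.2
for _ in range(40):
    candidates = [(lam + i * step_l, gamma + j * step_g)
                  for i in (-1, 0, 1) for j in (-1, 0, 1)
                  if gamma + j * step_g > 0]
    lam, gamma = min(candidates, key=lambda p: sse(*p))
    step_l *= 0.8
    step_g *= 0.8

print(round(lam, 1), round(gamma, 3))  # typically lands near the generating values
```

Because the current point is always among the candidates, SSE can never increase, which is exactly the "keep going downhill" picture; like any such hike, it only guarantees a local minimum, so trying several starting points remains good advice.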

Unit 2b.do File: programming Stata to conduct a nonlinear regression analysis © Andrew Ho, Harvard Graduate School of Education Unit 2b – Slide 9

    * Hypothesize and fit a nonlinear relationship directly.
    *
    * Specify the hypothesized non-linear model and conduct nonlinear regression
    * analysis, providing some sensible initial guesses ("start values") for the
    * parameter estimates.
    nl (IQ = {lambda}*(1-exp(-{gamma}*AGE))), initial(lambda 225 gamma 1)

    * Output the predicted values and raw residuals for brief diagnosis:
    predict PREDICTED, yhat
    predict RESID, resid
    * Other standard diagnostic statistics can also be output.

Annotations:
- nl is the Stata routine for fitting nonlinear regression models by least squares. You not only have to identify the outcome and predictors, you also have to provide the hypothesized model. Stata recognizes the variable names in the model (here IQ and AGE) and assumes that other “names” in the model (here lambda and gamma) are parameters you want to estimate.
- You have to provide some sensible initial guesses (“starting values”) for the parameter estimates. This is where your hike begins.
- You can output diagnostic datasets, as in linear regression analysis, including diagnostic statistics, although they are limited due to the nonlinear fit. (I choose not to do a full accounting and output only residuals and fits, to retain focus on the nonlinear modeling itself. But much of what you already know still applies.)
- Warning: the hypothesized model is fitted to the data ITERATIVELY, by a process of guessing parameter estimates and then successively refining that guess, while attending to a best-fit criterion. The process stops when the parameter estimates have “converged” on the “best” answer. With difficult problems, this can sometimes take a lot of steps, lead to loops, or, worse, lead you to a suboptimal answer. Adjusting starting values and convergence criteria can help.

© Andrew Ho, Harvard Graduate School of Education Unit 2b – Slide 10

Here is the actual sequence of refinements to the sum of squared residuals made by Stata as it iterated toward a final fitted negative exponential curve for the BAYLEY data (each row pairs an iteration step # with its sum of squared errors, SSE):

    Iteration 0: residual SS = …
    Iteration 1: residual SS = …
    Iteration 2: residual SS = …
    Iteration 3: residual SS = …
    Iteration 4: residual SS = …
    Iteration 5: residual SS = …
    Iteration 6: residual SS = …

Annotations:
- Stata began the iterative fitting process at Step 0 by computing the SSE associated with the initial guesses that I had provided. Clearly, my initial guesses were not good!
- Over the next three steps, Stata focused rapidly on better estimates of the parameters, and SSE plummeted from over 70,000 to just under 700.
- Then Stata spent a couple of steps trying to refine the final estimates, without much luck, making only a marginal improvement to SSE. It quit when, between Step 5 and Step 6, it could not reduce SSE any further.
- The computer regards the fitting process as having “converged” when SSE is reduced by less than one millionth between any two contiguous steps; you can modify this criterion and choose your own.
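A log like the one on this slide can be imitated outside Stata. The Python sketch below is a simplified relative of the algorithms such routines use, not Stata's implementation: it runs a few damped Gauss-Newton refinements on hypothetical data (not the BAYLEY values) from the same rough start values as the .do file, printing the residual SS at each step so you can watch it fall and then flatten.

```python
import math, random

def fit_curve(age, lam, gam):
    """The hypothesized negative exponential: lam * (1 - exp(-gam * age))."""
    return lam * (1 - math.exp(-gam * age))

def sse(ages, iqs, lam, gam):
    return sum((iq - fit_curve(a, lam, gam)) ** 2 for a, iq in zip(ages, iqs))

random.seed(2)
ages = list(range(1, 22))
# Hypothetical data: generating values lambda = 200, gamma = 0.06, plus noise.
iqs = [fit_curve(a, 200, 0.06) + random.gauss(0, 4) for a in ages]

lam, gam = 225.0, 1.0   # the same deliberately rough start values as the .do file
for step in range(7):
    print(f"Iteration {step}: residual SS = {sse(ages, iqs, lam, gam):.2f}")
    # Gauss-Newton ingredients: residuals and partial derivatives of the curve.
    resid = [iq - fit_curve(a, lam, gam) for a, iq in zip(ages, iqs)]
    d_lam = [1 - math.exp(-gam * a) for a in ages]         # d(fit)/d(lambda)
    d_gam = [lam * a * math.exp(-gam * a) for a in ages]   # d(fit)/d(gamma)
    # Solve the 2x2 normal equations J'J * delta = J'r for the refinement step.
    a11 = sum(x * x for x in d_lam)
    a12 = sum(x * y for x, y in zip(d_lam, d_gam))
    a22 = sum(y * y for y in d_gam)
    b1 = sum(x * r for x, r in zip(d_lam, resid))
    b2 = sum(y * r for y, r in zip(d_gam, resid))
    det = a11 * a22 - a12 * a12
    dl = (a22 * b1 - a12 * b2) / det
    dg = (a11 * b2 - a12 * b1) / det
    # Damped update: halve the step until SSE stops increasing (gamma stays > 0).
    t = 1.0
    while t > 1e-8:
        if gam + t * dg > 0 and sse(ages, iqs, lam + t * dl, gam + t * dg) <= sse(ages, iqs, lam, gam):
            lam, gam = lam + t * dl, gam + t * dg
            break
        t /= 2
```

The damping is what keeps a bad start value like gamma = 1 from sending the search off a cliff; real routines use more sophisticated safeguards, but the shape of the printed log is the same: a steep early drop in SSE, then marginal refinements.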

© Andrew Ho, Harvard Graduate School of EducationUnit 2b – Slide 11 Source | SS df MS Number of obs = 21 Model | R-squared = Residual | Adj R-squared = Root MSE = Total | Res. dev. = IQ | Coef. Std. Err. t P>|t| [95% Conf. Interval] /lambda | /gamma | Here are the t-statistics and p-values for each predictor. They test the usual marginal null hypotheses of no population effect on the outcome, for the respective predictor variable, given all else in the model A familiar quantity? Approximate standard errors: 95% confidence intervals on each regression parameter. Final parameter estimates: The “rational approach” provides parameter estimates that have an intuitive meaning in the context of the theory that provided the hypothesized regression model …

© Andrew Ho, Harvard Graduate School of Education Unit 2b – Slide 12

Does it fit? R² statistic = … Pretty darn good, but don't forget this is one individual with time-series data.
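The R² reported for a nonlinear fit is the usual 1 − SSE/SST. A small Python sketch with made-up observed and fitted values (not the actual BAYLEY results):

```python
def r_squared(observed, fitted):
    """R-squared as 1 - SSE/SST: the proportion of variation in the outcome
    accounted for by the fitted curve (a descriptive measure for nonlinear fits)."""
    mean = sum(observed) / len(observed)
    sse = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    sst = sum((o - mean) ** 2 for o in observed)
    return 1 - sse / sst

# Hypothetical observed and fitted IQ values, NOT the BAYLEY results.
observed = [10, 45, 80, 105, 125, 140, 150]
fitted = [12, 43, 78, 107, 124, 141, 149]
print(round(r_squared(observed, fitted), 3))  # close to 1: the curve tracks the data
```

With a single individual's time series, a high R² mostly reflects the smoothness of growth over age, which is one reason the slide hedges its enthusiasm.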

Residual Diagnostics: Normality © Andrew Ho, Harvard Graduate School of Education Unit 2b – Slide 13

Insufficient evidence to reject the null hypothesis that the residuals are normally distributed in the population. A bit of a heavy lower tail in the residual distribution, but there's not much to say given the low sample size.

Residual Diagnostics: Heteroscedasticity and Autocorrelation © Andrew Ho, Harvard Graduate School of Education Unit 2b – Slide 14

Because we have time-series data, we might begin to ask about autocorrelation.

A look at the residuals seems to hint at heteroscedasticity, but it is difficult to make that claim with this small sample size. Is it consistent with greater measurement error at the center of raw-score test scales (test theory), whereas error is reduced toward the asymptote?

Adjacent residuals do show signs of being correlated: negatives tend to sit next to negatives, and positives tend to sit next to positives.
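The "adjacent residuals agree" impression can be made concrete with a lag-1 autocorrelation coefficient. A Python sketch with made-up residuals arranged in runs of the same sign, as the slide describes (not the actual BAYLEY residuals):

```python
def lag1_autocorr(resid):
    """Sample lag-1 autocorrelation: how strongly each residual
    co-varies with the residual at the previous time point."""
    n = len(resid)
    mean = sum(resid) / n
    dev = [r - mean for r in resid]
    num = sum(dev[t] * dev[t - 1] for t in range(1, n))
    den = sum(d * d for d in dev)
    return num / den

# Hypothetical residuals in same-sign runs, like the pattern on the slide.
resid = [2.1, 1.8, 2.5, 0.4, -1.9, -2.2, -1.5, -0.8, 1.1, 1.7, 2.0]
print(round(lag1_autocorr(resid), 2))  # well above zero: adjacent residuals agree
```

A value near zero would be consistent with independent residuals; a clearly positive value, as here, is the numeric counterpart of the visual runs of positives and negatives.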

Polynomial Regression: Interacting Variables with Themselves © Andrew Ho, Harvard Graduate School of Education Unit 2b – Slide 15

“I test the following hypotheses… wives’ percentage of income is associated with divorce in an inverted U-shaped curve such that the odds of divorce are highest when spouses’ economic contributions are similar.”
Source: Rogers, S. J. (2004). Dollars, dependency, and divorce: Four perspectives on the role of wives’ income. Journal of Marriage and Family, 66.

The quadratic model,

    Y = β₀ + β₁X + β₂X² + ε,

allows a predictor's effect to differ according to levels of that predictor: an interaction of a variable with itself. The test on β₂ provides a test of whether the quadratic term (and thus the quadratic model) is necessary.

All quadratics are non-monotonic: they both rise and fall (or fall and rise). However, quadratic regression can fit monotonic curves as well, when the turning point falls outside the observed range of X. As with all interactions, we have to be careful about extrapolation.
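The quadratic specification is ordinary least squares on X and X² together. A pure-Python sketch on invented inverted-U data (hypothetical numbers, not Rogers's divorce data) that fits the model and locates the turning point:

```python
def solve(A, b):
    """Gauss-Jordan elimination with partial pivoting for a small system A x = b."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [v - f * w for v, w in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_quadratic(xs, ys):
    """OLS for y = b0 + b1*x + b2*x^2 via the normal equations X'X b = X'y."""
    X = [[1.0, x, x * x] for x in xs]
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
    Xty = [sum(r[i] * y for r, y in zip(X, ys)) for i in range(3)]
    return solve(XtX, Xty)

# Hypothetical inverted-U data: the outcome peaks when x is near 0.5.
xs = [i / 10 for i in range(11)]
ys = [0.2 + 1.0 * x - 1.0 * x * x for x in xs]  # exact quadratic, no noise
b0, b1, b2 = fit_quadratic(xs, ys)
vertex = -b1 / (2 * b2)  # the x at which the fitted curve turns
print(round(b2, 3), round(vertex, 2))  # b2 < 0 gives the inverted U, peak near 0.5
```

The vertex formula −β₁/(2β₂) is also how you check whether the fitted turning point lies inside or outside the observed range of X, the distinction that decides whether the fitted curve is non-monotonic where your data actually live.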

Residual Diagnostics: Heteroscedasticity and Autocorrelation © Andrew Ho, Harvard Graduate School of Education Unit 2b – Slide 16

Higher-Order Polynomials: Less Rational Than Empirical © Andrew Ho, Harvard Graduate School of Education Unit 2b – Slide 17

[Panels: Linear, Quadratic, Cubic, and Quartic fits.]

A quadratic model may have a loose argument for being theory-driven, but polynomial regression is largely a data-driven exercise. An advantage of polynomial regression over Box-Cox is a built-in framework for testing the hypothesis that an additional order added to the polynomial is useful for prediction.
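That testing framework can be sketched: fit successive polynomial orders and compute the incremental F statistic for each added order. The Python sketch below uses invented data generated from a true quadratic plus noise, so only the first added order (linear to quadratic) should earn a large F; the pure-Python normal-equations solver is for illustration only:

```python
import random

def polyfit_sse(xs, ys, degree):
    """OLS polynomial fit of the given degree (pure stdlib): build the normal
    equations X'X b = X'y, solve by Gauss-Jordan elimination, and return the
    residual sum of squares of the fitted polynomial."""
    n = degree + 1
    X = [[x ** j for j in range(n)] for x in xs]
    A = [[sum(row[i] * row[j] for row in X) for j in range(n)] for i in range(n)]
    b = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(n)]
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [v - f * w for v, w in zip(M[r], M[c])]
    beta = [M[i][n] / M[i][i] for i in range(n)]
    fitted = [sum(beta[j] * x ** j for j in range(n)) for x in xs]
    return sum((y - f) ** 2 for y, f in zip(ys, fitted))

random.seed(3)
xs = [i / 20 for i in range(40)]
# Invented data from a TRUE quadratic plus noise: 1 + 2x - 3x^2.
ys = [1 + 2 * x - 3 * x * x + random.gauss(0, 0.1) for x in xs]

# Incremental F for adding one polynomial order:
# F = (SSE_smaller - SSE_larger) / (SSE_larger / (n_obs - params_larger)).
for d in (1, 2, 3):
    s_small = polyfit_sse(xs, ys, d)
    s_big = polyfit_sse(xs, ys, d + 1)
    F = (s_small - s_big) / (s_big / (len(xs) - (d + 2)))
    print(f"order {d} -> {d + 1}: incremental F = {F:.1f}")
```

With quadratic truth, the linear-to-quadratic F is enormous while the later increments hover near what chance alone produces, which is exactly the stopping logic the slide describes.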