QM222 Class 16 & 17 Today’s New topic: Estimating nonlinear relationships QM222 Fall 2017 Section A1

To-dos
I am available today 11:00-3:30. If you don't have your data completed, come then or make a special appointment.

Today
Estimating nonlinear relationships (part 1)
Future special topics:
- Omitted variable bias (leaving things out of the regression)
- One variable with different slopes
We already covered:
- A dummy variable as the dependent (Y) variable
- Multiple-category dummies
- Time series analysis (see me if you have time-series cross-section data)

Estimating nonlinear relationships
Could the relationship be non-linear, and if so, how can we estimate it using linear regression?

Non-linear relationships between Y and X
Sometimes the relationship between a numerical Y variable and a numerical X variable is unlikely to be linear. This may lead you to measure a very low, insignificant slope. e.g. if you ran a linear regression on data that rise and then fall (an inverted U), the estimated coefficient would be close to zero.

Many of you believe that you might have nonlinear relationships
e.g. You want to see how age affects job satisfaction (measured on a scale of 1 to 10), so you think about running the regression:
regress jobsatis age
Is this likely to be linear? Maybe job satisfaction goes up with age and then down again. For instance, maybe you do not believe that an extra year increases job satisfaction by a constant amount, whether it is the difference between ages 24 and 25 or between ages 60 and 61.

Many of you believe that you might have nonlinear relationships
Note that this section applies only to numerical variables. You cannot do these nonlinear things with dummy variables. Why not? Because 1² = 1 and 0² = 0: squaring a dummy variable just gives you the same dummy variable back.

To solve the problem of Y possibly increasing with X and then decreasing:
You simply add to the regression a new X variable that is a non-linear function of the old variable. My suggestion: estimate a quadratic by making a new variable X² and running the regression with both the linear and the non-linear (quadratic) term in the equation. If you don't know whether a relationship is nonlinear, you can estimate the regression assuming it is nonlinear (e.g. quadratic) and then examine the results to see whether this assumption is correct.
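For the job-satisfaction example above, a minimal Stata sketch (assuming the variables are named jobsatis and age):

gen agesq = age^2           // new variable: age squared
regress jobsatis age agesq  // include both the linear and the quadratic term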

Quadratic: Y = b0 + b1 X + b2 X²
In high school you learned that quadratic equations like this trace out a parabola (opening upward or downward). So by adding a squared term, you can estimate these shapes.

However, a regression with a quadratic can estimate ANY part of these shapes
So using a quadratic does not mean that the curve need ever actually change from a positive to a negative slope, or vice versa: the estimated piece may cover only the rising part, only the falling part, or both.

How do you know whether the relationship really is nonlinear?
Put a nonlinear term (e.g. a squared term) in the regression and let its |t-stat| tell you whether it belongs there. If |t-stat| > 2, you are more than 95% confident that the relationship is nonlinear. However, it's a good idea to keep the quadratic term in as long as |t-stat| > 1, which means you are at least 68% confident that the relationship is nonlinear.
Recall:
- The adjusted R² goes up if you add a variable with |t| >= 1 (so that you are at least 68% certain that the coefficient is not 0).
- The adjusted R² goes down if you add a variable with |t| < 1 (so that you are less than 68% certain that the coefficient is not 0).
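In Stata you can compare adjusted R² values directly, since regress stores the adjusted R² in e(r2_a). A sketch with placeholder variable names y, x, and xsq:

regress y x
display e(r2_a)    // adjusted R-squared without the squared term
regress y x xsq
display e(r2_a)    // adjusted R-squared with the squared term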

Example of testing nonlinear relationships by adding in a squared term
Example: I know annual visitors to a national park (Cape Canaveral). I want to know if they are growing (or falling) at a constant rate over time, or not. Assume the data is in chronological order.
First I make the variables:
gen time = _n
gen timesq = time^2
Then I run a regression with BOTH variables, time and timesq.
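Putting the steps together in one do-file sketch (the sort line is an assumption: it supposes a year variable exists to guarantee chronological order before _n is used):

sort year                            // assumption: a year variable exists
gen time = _n                        // 1, 2, ..., 23 in chronological order
gen timesq = time^2                  // the squared (quadratic) term
regress annualvisitors time timesq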

Here are regressions of visitors (to Cape Canaveral), first on time, then on time AND timesq. Is the relationship nonlinear? Are visitors growing/shrinking, and at a constant rate?

. regress annualvisitors time

      Source |       SS       df       MS              Number of obs =      23
-------------+------------------------------          F(1, 21)      =    1.74
       Model |  1.1103e+11     1  1.1103e+11          Prob > F      =  0.2010
    Residual |  1.3382e+12    21  6.3722e+10          R-squared     =  0.0766
-------------+------------------------------          Adj R-squared =  0.0326
       Total |  1.4492e+12    22  6.5872e+10          Root MSE      =  2.5e+05

------------------------------------------------------------------------------
annualvisi~s |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        time |  -10474.59   7935.107    -1.32   0.201    -26976.55    6027.369
       _cons |    1639786   108800.7    15.07   0.000      1413523     1866050
------------------------------------------------------------------------------

. regress annualvisitors time timesq

      Source |       SS       df       MS              Number of obs =      23
-------------+------------------------------          F(2, 20)      =   35.96
       Model |  1.1339e+12     2  5.6695e+11          Prob > F      =  0.0000
    Residual |  3.1528e+11    20  1.5764e+10          R-squared     =  0.7824
-------------+------------------------------          Adj R-squared =  0.7607
       Total |  1.4492e+12    22  6.5872e+10          Root MSE      =  1.3e+05

------------------------------------------------------------------------------
annualvisi~s |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        time |   118497.8   16490.43     7.19   0.000     84099.37    152896.3
      timesq |   -5373.85   667.1316    -8.06   0.000    -6765.462   -3982.238
       _cons |    1102401   85902.09    12.83   0.000     923212.7     1281590
------------------------------------------------------------------------------

Sketching that quadratic
Visitors = 1102401 + 118498 time - 5374 time²
The linear term is positive, so at a small X (e.g. time = 0.1) the slope is positive. The squared term is negative, so the slope eventually becomes negative. So the general shape is an inverted U. But which part of the curve is it?
For those who don't think in derivatives, plug high, medium, and low values of X into the original equation. In this data, time goes from 1 to 23, so:
At time = 1:  Visitors = 1102401 + 118498(1) - 5374(1²) = 1,215,525
At time = 10: Visitors = 1102401 + 118498(10) - 5374(10²) = 1,749,981
At time = 23: Visitors = 1102401 + 118498(23) - 5374(23²) = 985,009
So over these 23 years, predicted visitors go up, then back down again.
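You can do the plugging in with Stata's display command (a minimal sketch using the rounded coefficients above):

display 1102401 + 118498*1 - 5374*1^2      // 1,215,525
display 1102401 + 118498*10 - 5374*10^2    // 1,749,981
display 1102401 + 118498*23 - 5374*23^2    // 985,009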

Sketching the quadratic using calculus (easier!)
Visitors = 1102401 + 118498 time - 5374 time²
Calculus tells us the slope: dVisitors/dtime = 118498 - 2(5374) time
The slope gets smaller as time increases. At the top of this curve, the slope is exactly zero. So solve:
slope = 118498 - 2(5374) time = 0
time = 118498/(2*5374) = 11.03
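Or let Stata do the arithmetic (a one-line sketch):

display 118498/(2*5374)    // = 11.03, the turning point (in time units)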

Another kind of non-linear term [NOT ON TEST]
You believe that a 1% increase in X will have the same percentage (%) effect on Y no matter what X you start at. e.g. you believe a 1 percent increase in price has a constant percentage effect on sales.
Mathematical rule: If lnY = b0 + b1 lnX, then b1 represents %∆Y / %∆X (for small changes), i.e. the percentage change in Y when X changes by 1%.
(ln is the natural log, i.e. log to the base e; log usually means log to the base 10. Either works here.)
So just make two new variables from yvariable and xvariable:
gen lnY = ln(yvariable)
gen lnX = ln(xvariable)
regress lnY lnX
Then the coefficient will tell you the percentage change in Y when X changes by 1%.
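For the price-and-sales example, a sketch with hypothetical variable names sales and price (the coefficient in such a log-log regression is the price elasticity of sales):

gen lnsales = ln(sales)
gen lnprice = ln(price)
regress lnsales lnprice    // coefficient on lnprice = % change in sales when price rises 1%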

A case when logs might be useful?
If you have skewed data (like lifetime gross in movies), you could just regress:
ln(Lifetime gross) = b0 + b1 ln(metascore)
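In Stata, a sketch assuming the variables are named lifetimegross and metascore:

gen lngross = ln(lifetimegross)
gen lnmeta = ln(metascore)
regress lngross lnmeta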

Of course, we said last time that with a very skewed Y-variable, you could…
- Predict the median, not the average, income:
qreg incwage age
- Top-code the variable (i.e. treat all incomes above a certain amount, your choice, as that top amount). e.g. make all incomes above $400,000 into $400,000:
replace incwage = 400000 if incwage > 400000
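To keep the original variable intact, you can top-code a copy instead (a sketch; the comparison slides below use a variable built this way, called incometop):

gen incometop = income                          // copy the original income variable
replace incometop = 400000 if income > 400000   // top-code the copy at $400,000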

Which non-linear way of dealing with things is best with a skewed Y?
There is no one best way. It is trial and error. It is an art rather than a science. BUT ONLY use methods that you understand.

Comparing regressions with skewed Y

Regular variable:

. regress income age

      Source |       SS           df       MS      Number of obs =   150,000
-------------+----------------------------------   F(1, 149998)  =   6049.10
       Model |  3.7133e+13         1  3.7133e+13   Prob > F      =    0.0000
    Residual |  9.2077e+14   149,998  6.1385e+09   R-squared     =    0.0388
-------------+----------------------------------   Adj R-squared =    0.0388
       Total |  9.5790e+14   149,999  6.3860e+09   Root MSE      =     78349

------------------------------------------------------------------------------
      income |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   1418.423   18.23728    77.78   0.000     1382.678    1454.167
       _cons |   28233.09   820.0105    34.43   0.000     26625.89     29840.3
------------------------------------------------------------------------------

With a quadratic:

. regress income age agesq

      Source |       SS           df       MS      Number of obs =   150,000
-------------+----------------------------------   F(2, 149997)  =   4162.44
       Model |  5.0368e+13         2  2.5184e+13   Prob > F      =    0.0000
    Residual |  9.0753e+14   149,997  6.0503e+09   R-squared     =    0.0526
-------------+----------------------------------   Adj R-squared =    0.0526
       Total |  9.5790e+14   149,999  6.3860e+09   Root MSE      =     77784

------------------------------------------------------------------------------
      income |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   8636.687    155.388    55.58   0.000      8332.13    8941.245
       agesq |  -81.87879   1.750603   -46.77   0.000    -85.30994   -78.44764
       _cons |  -120758.3   3287.881   -36.73   0.000    -127202.5   -114314.1
------------------------------------------------------------------------------

Comparing regressions

Regular variable:

. regress income age

      Source |       SS           df       MS      Number of obs =   150,000
-------------+----------------------------------   F(1, 149998)  =   6049.10
       Model |  3.7133e+13         1  3.7133e+13   Prob > F      =    0.0000
    Residual |  9.2077e+14   149,998  6.1385e+09   R-squared     =    0.0388
-------------+----------------------------------   Adj R-squared =    0.0388
       Total |  9.5790e+14   149,999  6.3860e+09   Root MSE      =     78349

------------------------------------------------------------------------------
      income |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   1418.423   18.23728    77.78   0.000     1382.678    1454.167
       _cons |   28233.09   820.0105    34.43   0.000     26625.89     29840.3
------------------------------------------------------------------------------

With topcoding (if income > 400000):

. regress incometop age

      Source |       SS           df       MS      Number of obs =   150,000
-------------+----------------------------------   F(1, 149998)  =   6919.29
       Model |  3.3996e+13         1  3.3996e+13   Prob > F      =    0.0000
    Residual |  7.3698e+14   149,998  4.9133e+09   R-squared     =    0.0441
-------------+----------------------------------   Adj R-squared =    0.0441
       Total |  7.7098e+14   149,999  5.1399e+09   Root MSE      =     70095

------------------------------------------------------------------------------
   incometop |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |     1357.2   16.31598    83.18   0.000     1325.221    1389.179
       _cons |   29247.77   733.6221    39.87   0.000     27809.88    30685.65
------------------------------------------------------------------------------

Comparing regressions

With a quadratic:

. regress income age agesq

      Source |       SS           df       MS      Number of obs =   150,000
-------------+----------------------------------   F(2, 149997)  =   4162.44
       Model |  5.0368e+13         2  2.5184e+13   Prob > F      =    0.0000
    Residual |  9.0753e+14   149,997  6.0503e+09   R-squared     =    0.0526
-------------+----------------------------------   Adj R-squared =    0.0526
       Total |  9.5790e+14   149,999  6.3860e+09   Root MSE      =     77784

------------------------------------------------------------------------------
      income |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   8636.687    155.388    55.58   0.000      8332.13    8941.245
       agesq |  -81.87879   1.750603   -46.77   0.000    -85.30994   -78.44764
       _cons |  -120758.3   3287.881   -36.73   0.000    -127202.5   -114314.1
------------------------------------------------------------------------------

With topcoding (if income > 400000) AND a quadratic (best adj R-sq):

. regress incometop age agesq

      Source |       SS           df       MS      Number of obs =   150,000
-------------+----------------------------------   F(2, 149997)  =   4747.54
       Model |  4.5899e+13         2  2.2949e+13   Prob > F      =    0.0000
    Residual |  7.2508e+14   149,997  4.8339e+09   R-squared     =    0.0595
-------------+----------------------------------   Adj R-squared =    0.0595
       Total |  7.7098e+14   149,999  5.1399e+09   Root MSE      =     69527

------------------------------------------------------------------------------
   incometop |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   8202.253   138.8926    59.05   0.000     7930.027     8474.48
       agesq |  -77.64535   1.564765   -49.62   0.000    -80.71226   -74.57844
       _cons |  -112040.2   2938.851   -38.12   0.000    -117800.3   -106280.1
------------------------------------------------------------------------------

Comparing regressions

With topcoding (if income > 400000) AND a quadratic:

. regress incometop age agesq

      Source |       SS           df       MS      Number of obs =   150,000
-------------+----------------------------------   F(2, 149997)  =   4747.54
       Model |  4.5899e+13         2  2.2949e+13   Prob > F      =    0.0000
    Residual |  7.2508e+14   149,997  4.8339e+09   R-squared     =    0.0595
-------------+----------------------------------   Adj R-squared =    0.0595
       Total |  7.7098e+14   149,999  5.1399e+09   Root MSE      =     69527

------------------------------------------------------------------------------
   incometop |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   8202.253   138.8926    59.05   0.000     7930.027     8474.48
       agesq |  -77.64535   1.564765   -49.62   0.000    -80.71226   -74.57844
       _cons |  -112040.2   2938.851   -38.12   0.000    -117800.3   -106280.1
------------------------------------------------------------------------------

With logs:

. regress lnincome lnage

      Source |       SS           df       MS      Number of obs =   150,000
-------------+----------------------------------   F(1, 149998)  =  11370.59
       Model |  4376.05532         1  4376.05532   Prob > F      =    0.0000
    Residual |  57727.8423   149,998  .384857413   R-squared     =    0.0705
-------------+----------------------------------   Adj R-squared =    0.0705
       Total |  62103.8976   149,999  .414028744   Root MSE      =    .62037

------------------------------------------------------------------------------
    lnincome |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       lnage |    .646042   .0060586   106.63   0.000     .6341673    .6579166
       _cons |   8.755814   .0227184   385.41   0.000     8.711287    8.800342
------------------------------------------------------------------------------

This is the best adj R-sq, but you are fitting a whole different Y-variable. It does look very good from the t-stat on lnage.

Comparing regressions

With logs:

. regress lnincome lnage

      Source |       SS           df       MS      Number of obs =   150,000
-------------+----------------------------------   F(1, 149998)  =  11370.59
       Model |  4376.05532         1  4376.05532   Prob > F      =    0.0000
    Residual |  57727.8423   149,998  .384857413   R-squared     =    0.0705
-------------+----------------------------------   Adj R-squared =    0.0705
       Total |  62103.8976   149,999  .414028744   Root MSE      =    .62037

------------------------------------------------------------------------------
    lnincome |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       lnage |    .646042   .0060586   106.63   0.000     .6341673    .6579166
       _cons |   8.755814   .0227184   385.41   0.000     8.711287    8.800342
------------------------------------------------------------------------------

With logged income and an age quadratic (rather than using log age):

. regress lnincome age agesq

      Source |       SS           df       MS      Number of obs =   150,000
-------------+----------------------------------   F(2, 149997)  =   7249.43
       Model |  5473.90915         2  2736.95457   Prob > F      =    0.0000
    Residual |  56629.9885   149,997  .377540807   R-squared     =    0.0881
-------------+----------------------------------   Adj R-squared =    0.0881
       Total |  62103.8976   149,999  .414028744   Root MSE      =    .61444

------------------------------------------------------------------------------
    lnincome |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0959792   .0012275    78.19   0.000     .0935734     .098385
       agesq |  -.0009264   .0000138   -66.99   0.000    -.0009535   -.0008993
       _cons |   8.863046   .0259722   341.25   0.000     8.812141    8.913951
------------------------------------------------------------------------------

This is the best fit so far, and it IS the same lnY variable as above.

Comparing regressions

With logged income and an age quadratic (rather than using log age):

. regress lnincome age agesq

      Source |       SS           df       MS      Number of obs =   150,000
-------------+----------------------------------   F(2, 149997)  =   7249.43
       Model |  5473.90915         2  2736.95457   Prob > F      =    0.0000
    Residual |  56629.9885   149,997  .377540807   R-squared     =    0.0881
-------------+----------------------------------   Adj R-squared =    0.0881
       Total |  62103.8976   149,999  .414028744   Root MSE      =    .61444

------------------------------------------------------------------------------
    lnincome |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0959792   .0012275    78.19   0.000     .0935734     .098385
       agesq |  -.0009264   .0000138   -66.99   0.000    -.0009535   -.0008993
       _cons |   8.863046   .0259722   341.25   0.000     8.812141    8.913951
------------------------------------------------------------------------------

Finally, with qreg to get a prediction for median income, with age and agesq:

. qreg income age agesq

Median regression                                   Number of obs =   150,000
  Raw sum of deviations 3.35e+09 (about 69000)
  Min sum of deviations 3.22e+09                    Pseudo R2     =    0.0382

------------------------------------------------------------------------------
      income |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   5607.143   95.94801    58.44   0.000     5419.087    5795.199
       agesq |  -53.57143   1.080952   -49.56   0.000    -55.69007   -51.45279
       _cons |     -66000    2030.18   -32.51   0.000    -69979.11   -62020.89
------------------------------------------------------------------------------

Think of the pseudo R-sq as an adjusted R-sq. This does NOT fit as well as the lnincome regressions.

To sum today up
There are many ways to deal with a skewed Y variable:
- Top-code it
- Take its log
- Predict the median rather than the average
There are many ways to deal with an X variable when you think the Y-X relationship is nonlinear:
- First look at the scatter diagram; it helps
- Then add a quadratic X term and see if |t| > 1 [EASIEST]
- Or add a different kind of non-linear X term (e.g. log of X)
Remember: if you use logs of both sides, the coefficient can be interpreted as the percentage change in Y when X changes by 1%.
