Stat 112 Notes 17: Time Series and Assessing the Assumption that the Disturbances Are Independent (Chapter 6.8); Using and Interpreting Indicator Variables (Chapter 7.1)

Time Series Data and Autocorrelation
When Y is a variable collected for the same entity (person, state, country) over time, the data are called time series data. For time series data, we need to examine the independence assumption of the simple and multiple regression model.
Independence Assumption: the residuals are independent of one another. This means that if the residual is positive this year, positive and negative residuals must be equally likely next year, i.e., there is no autocorrelation.
Positive autocorrelation: positive residuals are more likely to be followed by positive residuals than by negative residuals.
Negative autocorrelation: positive residuals are more likely to be followed by negative residuals than by positive residuals.
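As a rough sketch (not the formula JMP uses), the lag-1 sample autocorrelation of the residuals makes these definitions concrete: a clearly positive value means positive residuals tend to follow positive ones, and a clearly negative value means the residuals tend to flip sign.

```python
def lag1_autocorrelation(residuals):
    """Lag-1 sample autocorrelation of a residual series (a simple sketch,
    not the exact estimator JMP uses)."""
    num = sum(residuals[t] * residuals[t - 1] for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den

# Residuals that stay on the same side for stretches: positive autocorrelation
print(lag1_autocorrelation([1.2, 0.8, 1.1, -0.9, -1.3, -1.0]))  # > 0
# Residuals that flip sign every period: negative autocorrelation
print(lag1_autocorrelation([1.0, -1.0, 1.0, -1.0]))             # < 0
```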

Ski Ticket Sales
Christmas week is a critical period for most ski resorts. A ski resort in Vermont wanted to determine the effect that weather had on its sales of lift tickets during Christmas week, using data from the past 20 years.
Y_i = number of lift tickets sold during Christmas week in year i
X_i1 = snowfall during Christmas week in year i
X_i2 = average temperature during Christmas week in year i
Data in skitickets.JMP.

A plot of the residuals over time suggests positive autocorrelation.

Durbin-Watson Test of Independence
The Durbin-Watson test is a test of whether the residuals are independent. The null hypothesis is that the residuals are independent; the alternative hypothesis is that the residuals are (either positively or negatively) autocorrelated. The test works by examining the correlation of consecutive residuals.
To compute the Durbin-Watson test in JMP, after Fit Model, click the red triangle next to Response, click Row Diagnostics, and click Durbin-Watson Test. Then click the red triangle next to Durbin-Watson to get the p-value.
For the ski ticket data, the p-value (shown in the JMP output) is very small: strong evidence of autocorrelation.
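The Durbin-Watson statistic itself is simple to compute from the residuals. This pure-Python sketch (not JMP's implementation) shows the formula d = Σ(e_t − e_{t−1})² / Σ e_t²: values near 2 are consistent with independence, values well below 2 suggest positive autocorrelation, and values well above 2 suggest negative autocorrelation.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: near 2 -> consistent with independence,
    well below 2 -> positive autocorrelation, well above 2 -> negative."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den

# Residuals that persist on one side give a small statistic
print(durbin_watson([1, 1, 1, -1, -1, -1]))  # about 0.67: positive autocorrelation
# Residuals that alternate in sign give a large statistic
print(durbin_watson([1, -1, 1, -1, 1, -1]))  # about 3.33: negative autocorrelation
```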

Remedies for Autocorrelation
Add a time variable to the regression.
Add a lagged dependent (Y) variable to the regression. In JMP, create a new column, right-click, click Formula, click Row, click Lag, and then click the Y variable.
After adding these variables, refit the model and recheck the Durbin-Watson statistic to see whether the autocorrelation has been removed.
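JMP's Lag formula shifts the column down one row, leaving the first entry missing. A minimal sketch of the same idea (the ticket counts below are hypothetical, not from the data set):

```python
def lag(column, k=1):
    """Shift a column down k rows, like JMP's Lag formula; the first k
    entries have no predecessor and are left missing (None)."""
    return [None] * k + column[:-k]

tickets = [6835, 7870, 6173, 7979, 7639]  # hypothetical yearly ticket counts
print(lag(tickets))  # [None, 6835, 7870, 6173, 7979]
```

Rows with a missing lagged value (the first year here) are dropped when the model is refit.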

Example 6.10 in the book.

Categorical Variables
Categorical (nominal) variables are variables that define group membership, e.g., sex (male/female), color (blue/green/red), county (Bucks County, Chester County, Delaware County, Philadelphia County).
How can we use categorical variables as explanatory variables in regression analysis?

Comparing Toy Factory Managers
An analysis has shown that the time required to complete a production run in a toy factory increases with the number of toys produced. Data were collected on the time required to process 20 randomly selected production runs as supervised by three managers (Alice, Bob and Carol). Data in toyfactorymanager.JMP. How do the managers compare?
(Picture from Toy Story (1995).)

Marginal Comparison
A marginal comparison could be misleading. We know that large production runs with more toys take longer than small runs with fewer toys. How can we be sure that Carol has not simply been supervising very small production runs?
Solution: run a multiple regression that includes the size of the production run as an explanatory variable along with manager, in order to control for the size of the production run.

Including a Categorical Variable in Multiple Regression: Wrong Approach
We could assign numeric codes to the managers, e.g., Alice = 0, Bob = 1, Carol = 2. This model says that, for the same run size, Bob is 31 minutes faster than Alice and Carol is 31 minutes faster than Bob. It restricts the difference between Alice and Bob to equal the difference between Bob and Carol, and we have no reason to impose this restriction.
If we use a different coding for manager, we get different results: e.g., with Bob = 0, Alice = 1, Carol = 2, the model estimates Alice to be 5 minutes faster than Bob.

Including a Categorical Variable in Multiple Regression: Right Approach
Create an indicator (dummy) variable for each category:
Manager[Alice] = 1 if the manager is Alice, 0 if the manager is not Alice
Manager[Bob] = 1 if the manager is Bob, 0 if the manager is not Bob
Manager[Carol] = 1 if the manager is Carol, 0 if the manager is not Carol
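The indicator columns can be built mechanically from the manager column; a quick sketch:

```python
def indicator_columns(values, categories):
    """One 0/1 indicator (dummy) column per category."""
    return {c: [1 if v == c else 0 for v in values] for c in categories}

managers = ["Alice", "Bob", "Carol", "Alice"]  # hypothetical column
cols = indicator_columns(managers, ["Alice", "Bob", "Carol"])
print(cols["Alice"])  # [1, 0, 0, 1]
print(cols["Bob"])    # [0, 1, 0, 0]
print(cols["Carol"])  # [0, 0, 1, 0]
```

Each row has exactly one indicator equal to 1, so the three columns carry the same information as the original categorical column without forcing an arbitrary numeric ordering.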

Categorical Variables in Multiple Regression in JMP
Make sure that the categorical variable is coded as nominal. To change the coding, right-click on the variable's column, click Column Info, and change Modeling Type to Nominal.
Use Fit Model and include the categorical variable in the multiple regression.
After Fit Model, click the red triangle next to Response, click Estimates, then Expanded Estimates (the initial output in JMP uses a different, more confusing coding of the dummy variables).

For a run size of 100, we can compare the estimated run times for Alice, Bob and Carol. For the same run size, Alice is estimated to be on average 38.41 − (−14.65) = 53.06 minutes slower than Bob and 38.41 − (−23.76) = 62.17 minutes slower than Carol (38.41, −14.65 and −23.76 are the estimated coefficients of Manager[Alice], Manager[Bob] and Manager[Carol]).
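Under JMP's sum-to-zero coding, the displayed coefficients for any two managers determine the third, and pairwise differences follow directly. A sketch using the slide's two coefficients (−14.65 for Bob, −23.76 for Carol); Alice's coefficient is implied by the constraint, not read from the output:

```python
coef = {"Bob": -14.65, "Carol": -23.76}  # coefficients from the slide
# Sum-to-zero constraint: the three manager coefficients add to 0
coef["Alice"] = -(coef["Bob"] + coef["Carol"])

print(round(coef["Alice"], 2))                  # 38.41
print(round(coef["Alice"] - coef["Bob"], 2))    # 53.06 min slower than Bob
print(round(coef["Alice"] - coef["Carol"], 2))  # 62.17 min slower than Carol
```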

Election Regression

Prediction and Prediction Interval for 2008

Effect Tests
Effect test for manager: H0: manager[Alice] = manager[Bob] = manager[Carol] vs. Ha: at least two of manager[Alice], manager[Bob] and manager[Carol] are not equal. The null hypothesis is that all managers have the same mean run time when run size is held fixed; the alternative hypothesis is that not all managers are the same when run size is held fixed. This is a partial F test.
The p-value for the effect test is very small: strong evidence that not all managers are the same when run size is held fixed.
Note that H0: manager[Alice] = manager[Bob] = manager[Carol] is equivalent to H0: manager[Alice] = manager[Bob] = manager[Carol] = 0, because JMP imposes the constraint manager[Alice] + manager[Bob] + manager[Carol] = 0.
The effect test for Run Size tests the null hypothesis that the Run Size coefficient is 0 against the alternative that it is not zero; it has the same p-value as the t-test.
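The partial F statistic behind the effect test compares the full model's error sum of squares with that of a reduced model that drops the terms being tested. A sketch of the formula; the SSE numbers below are hypothetical, chosen only to illustrate the calculation:

```python
def partial_f(sse_reduced, sse_full, extra_df, error_df_full):
    """Partial F statistic for testing the extra terms in the full model.

    extra_df: number of additional coefficients in the full model
    error_df_full: n - (number of coefficients in the full model) - 1
    """
    return ((sse_reduced - sse_full) / extra_df) / (sse_full / error_df_full)

# Hypothetical numbers: dropping the 2 manager indicators raises SSE a lot,
# so the F statistic is large and the null hypothesis is rejected.
# With 60 runs and a full model of run size + 2 indicators + intercept,
# error_df_full = 60 - 3 - 1 = 56.
print(partial_f(sse_reduced=3600.0, sse_full=1200.0,
                extra_df=2, error_df_full=56))  # about 56
```

The p-value is then the upper-tail probability of an F distribution with (extra_df, error_df_full) degrees of freedom.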

The effect test shows that the managers are not all equal. For the same run size, Carol is best (lowest mean run time), followed by Bob and then Alice.
The above model assumes no interaction between manager and run size: the difference between the managers' mean run times is the same for all run sizes.

Testing for Differences Between Specific Managers

Inference for Differences of Coefficients in JMP