Diploma in Statistics: Introduction to Regression, Lecture 5.1


Outline
1. Review
2. Transforming data, the log transform
   i. liver fluke egg hatching rate
   ii. explaining CEO remuneration
   iii. brain weights and body weights
3. SLR with transformed data
4. Transforming X, quadratic fit
5. Other options

Using t values
Convention: n > 30 is big, n < 30 is small.
z(0.05) = 1.96 ≈ 2, t(30, 0.05) = 2.04 ≈ 2


Homework 4.2.1: quantify the extent of the recovery in Year 6, Q3.
Fitted model: P-hat = 1030 + … (terms in Q1, Q2, Q3, Q4 and Time)
Year 6, Q2: P = 1657; fitted value 2033; P - fitted = 1657 - 2033 = -376
Year 6, Q3: P = 2185; fitted value 1985; P - fitted = 2185 - 1985 = 200

Homework
List correspondences between the output from the original regression and the output from the alternative regression. Confirm that the coefficients of Q1, Q2 and Q3 in the original are the corresponding coefficients in the alternative with the Q4 coefficient added.

Alternative regression (no constant):
Predictor   Coef   SE Coef   T   P
Q1          …
Q2          …
Q3          …
Q4          …
Time        …
S = …

Original regression:
Predictor   Coef   SE Coef   T   P
Constant    …
Q1          …
Q2          …
Q3          …
Time        …
S = …

Homework
1. Calculate the simple linear regressions of Jobtime on each of T_Ops and Units. Confirm the corresponding t-values.
2. Calculate the simple linear regression of Jobtime on Ops per Unit. Comment on the negative correlation of Jobtime with Ops per Unit in the light of the corresponding t-value.
3. Confirm the calculation of the R² values.

Solution
Calculate the simple linear regression of Jobtime on Ops per Unit. Comment on the negative correlation of Jobtime with Ops per Unit in the light of the corresponding t-value.
Comment: the t-value is insignificant; the negative correlation is just chance variation, with no substantive meaning.

Variance Inflation Factors
Convention: multicollinearity is a problem if R²(k) > 90% or, equivalently, VIF(k) = 1 / (1 - R²(k)) > 10.
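As a concrete illustration of the convention above, the sketch below uses hypothetical data (not from the lecture) to compute the VIF for one of two predictors directly from their correlation, since with two predictors R²(k) is just the squared correlation between them.

```python
import math

def vif_two_predictors(x1, x2):
    """VIF for either predictor in a two-predictor regression:
    VIF = 1 / (1 - r^2), where r is the correlation of x1 with x2.
    (With k > 2 predictors, r^2 becomes the R^2 of X_k on the rest.)"""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    sxy = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    sxx = sum((a - m1) ** 2 for a in x1)
    syy = sum((b - m2) ** 2 for b in x2)
    r = sxy / math.sqrt(sxx * syy)
    return 1.0 / (1.0 - r * r)

# Nearly collinear predictors push the VIF far beyond the rule-of-thumb 10.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.1, 2.0, 3.1, 4.0, 5.1, 6.0]   # almost a copy of x1
print(vif_two_predictors(x1, x2) > 10)   # True
```

The VIF is the factor by which the variance of the coefficient estimate is inflated relative to uncorrelated predictors, which is why the 90% / 10 thresholds are two statements of the same rule.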

What to do?
Get new X values, to break the correlation pattern
– impractical in observational studies
Choose a subset of the X variables
– manually
– automatically: stepwise regression, other methods

Residential load survey data: collected by a US electricity supplier during an investigation of the factors that influence peak demand for electricity by residential customers.
Load is demand at system peak demand hour (kW)
Size is house size (sq ft / 1000)
Income (X2) is annual family income ($/1000)
AirCon (X3) is air conditioning capacity (tons)
Index (X4) is the house appliance index (kW)
Residents (X5) is number in house on a typical day

Matrix plot

Results
All variables in:
Predictor   Coef   SE Coef   T   P
Constant    …
Size        …
Income      …
AirCon      …
Index       …
Residents   …

Income deleted:
Predictor   Coef   SE Coef   T   P
Constant    …
Size        …
AirCon      …
Index       …
Residents   …

Exercise
Calculate the VIF for Size. Comment.
Homework
Calculate variance inflation factors for all explanatory variables. Discuss.

Multicollinearity occurs when there is perfect correlation within the X variables.
Example: indicator variables.
Illustration: Minitab.

2. Transforming data, the log transform

(i) Hatching of liver fluke eggs: the life cycle of the liver fluke

Hatching of liver fluke eggs: duration and success rate


(ii) Explaining CEO compensation and company sales (Forbes magazine, May 1994)

Explaining CEO remuneration: bivariate log transformation

(iii) Mammals' brain weight vs body weight

Scatterplot view
Scatterplot view, log transform
Scatterplot view, dinosaurs deleted
Histogram view
Histogram view, log transform

Changing spread with log

Why the log transform works
High spread at high X is transformed to low spread at high Y.
Low spread at low X is transformed to high spread at low Y.

Why the log transform works
10 to 100 is transformed to log10(10) to log10(10²), i.e. 1 to 2.
1/10 = 0.1 to 1/100 = 0.01 is transformed to log10(10⁻¹) to log10(10⁻²), i.e. -1 to -2.
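The mapping on this slide can be checked directly; this minimal Python check uses exactly the slide's numbers:

```python
import math

# The slide's numbers: 10 and 100 map to 1 and 2,
# while 0.1 and 0.01 map to -1 and -2, so a two-decade
# range on the raw scale becomes one unit either side of zero.
for x, expected in [(10, 1), (100, 2), (0.1, -1), (0.01, -2)]:
    assert abs(math.log10(x) - expected) < 1e-12
print("ok")   # prints "ok"
```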

3. SLR with transformed data

SLR with transformed data: LBrainW versus LBodyW
The regression equation is LBrainW = … + … LBodyW
Predictor   Coef   SE Coef   T   P
Constant    …
LBodyW      …
S = …
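A minimal sketch of SLR on log-transformed data, using hypothetical power-law data rather than the lecture's brain-weight dataset: if BrainW = c · BodyW^b, then log10(BrainW) = log10(c) + b · log10(BodyW), so the log-log slope recovers the exponent b.

```python
import math

def slr(x, y):
    """Least-squares intercept and slope for y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

# Hypothetical power law: brain = 0.01 * body ** 0.75 (units arbitrary).
body = [0.05, 1.0, 10.0, 100.0, 3000.0]
brain = [0.01 * w ** 0.75 for w in body]

# Regress log10(brain) on log10(body): the slope recovers the exponent
# 0.75 and the intercept recovers log10(0.01) = -2.
a, b = slr([math.log10(w) for w in body], [math.log10(w) for w in brain])
```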

Application: Do humans conform? (Scatterplot; the Human point is labelled.)

Application: Do humans conform?
Delete the Human data, calculate the regression, predict human LBrainW, and compare to actual, relative to s.

Regression Analysis: LBrainW versus LBodyW (Human deleted)
The regression equation is LBrainW = … + … LBodyW
Predictor   Coef   SE Coef   T   P
Constant    …
LBodyW      …
S = …

Application: Do humans conform?
LBodyW(Human) = …
LBrainW(Human) = …
Predicted LBrainW = …
Residual = observed - predicted = …
Residual / s = 3.03

Deleted residuals
For each potentially exceptional case:
– delete the case
– calculate the regression from the rest
– use the fitted equation to calculate a deleted fitted value
– calculate deleted residual = observed value - deleted fitted value
Minitab does this automatically for all cases!
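The recipe above can be sketched for simple linear regression (hypothetical data; as noted, Minitab computes these automatically):

```python
def slr_fit(x, y):
    """Least-squares intercept and slope for y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((u - mx) * (v - my) for u, v in zip(x, y)) / \
        sum((u - mx) ** 2 for u in x)
    return my - b * mx, b

def deleted_residuals(x, y):
    """For each case i: refit the line without case i, predict y[i]
    from the refitted line, and return observed - deleted fitted value."""
    out = []
    for i in range(len(x)):
        a, b = slr_fit(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        out.append(y[i] - (a + b * x[i]))
    return out

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.9, 3.2, 3.9, 9.0]   # the last case is an outlier
res = deleted_residuals(x, y)
# The outlier's deleted residual (about 4.05) dwarfs the others,
# because the refit without it is not dragged toward the outlier.
```

This is why deleted residuals are better at exposing exceptional cases than ordinary residuals: an influential point pulls the full-data fit toward itself and so shrinks its own ordinary residual.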

Application: Do humans conform?
With 63 cases, we do not expect to see any cases with residuals exceeding 3 standard deviations. On the other hand, recalling the scatter plot, the humans do not appear particularly exceptional. The dotplot view of deleted residuals emphasises this: water opossums appear more exceptional. (Points labelled: Human, Water Opossum.)


4. Transforming X, quadratic fit

Optimising a nicotine extraction process
In determining the quantity of nicotine in different samples of tobacco, temperature is a key variable in optimising the extraction process. A study of this phenomenon involving analysis of 18 samples produced these data.

Regression Analysis: Nicotine versus Temperature
The regression equation is Nicotine = … + … Temperature
Predictor     Coef   SE Coef   T   P
Constant      …
Temperature   …
S = …   R-Sq = 74.8%


Optimising a nicotine extraction process, quadratic fit

Quadratic fit:
The regression equation is Nicotine = … + … Temperature + … Temp-sqr
Predictor     Coef   SE Coef   T   P
Constant      …
Temperature   …
Temp-sqr      …
S = …   R-Sq = 81.5%
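A sketch of the quadratic-fit idea on hypothetical data (not the nicotine dataset): add X² as a second predictor, fit Y = a + bX + cX² by least squares via the 3×3 normal equations, and locate the optimum temperature at X = -b / (2c).

```python
def quad_fit(x, y):
    """Least-squares fit of y = a + b*x + c*x^2 by solving the 3x3
    normal equations with Cramer's rule."""
    n = len(x)
    s = lambda p: sum(v ** p for v in x)
    sy = lambda p: sum(v ** p * w for v, w in zip(x, y))
    M = [[n,    s(1), s(2)],
         [s(1), s(2), s(3)],
         [s(2), s(3), s(4)]]
    rhs = [sy(0), sy(1), sy(2)]

    def det3(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
              - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
              + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

    d = det3(M)
    coefs = []
    for col in range(3):           # Cramer's rule, one column at a time
        Mc = [row[:] for row in M]
        for r in range(3):
            Mc[r][col] = rhs[r]
        coefs.append(det3(Mc) / d)
    return coefs                   # [a, b, c]

# Hypothetical response peaking at temperature 60:
xs = [20.0, 40.0, 50.0, 60.0, 70.0, 80.0, 100.0]
ys = [5.0 - 0.002 * (x - 60.0) ** 2 for x in xs]
a, b, c = quad_fit(xs, ys)
print(round(-b / (2 * c), 3))   # optimum temperature: 60.0
```

In practice a library routine would solve the normal equations, but on exactly quadratic data this hand-rolled fit recovers the generating coefficients and hence the optimum.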

Optimising a nicotine extraction process, quadratic fit

Quadratic fit, case 5 excluded:
The regression equation is Nicotine = … + … Temperature + … Temp-sqr
Predictor     Coef   SE Coef   T   P
Constant      …
Temperature   …
Temp-sqr      …
S = …   R-Sq = 88.6%

Optimising a nicotine extraction process, quadratic fit, case 5 excluded

5. Other options
Other functions, e.g., 1/Y, √Y, Y², etc.; the same for X.
Generalised linear models: choose a function of Y, a model for …, etc.

Reading
EM, Section …
Hamilton, Ch. 5
Extra Notes: More on log