Diploma in Statistics Introduction to Regression Lecture 5.1
1. Review
2. Transforming data, the log transform
   i. liver fluke egg hatching rate
   ii. explaining CEO remuneration
   iii. brain weights and body weights
3. SLR with transformed data
4. Transforming X, quadratic fit
5. Other options
Using t values
Convention: n > 30 is big, n < 30 is small.
Two-sided 5% critical values: z(0.05) = 1.96 ≈ 2 and t(30, 0.05) = 2.04 ≈ 2, so for n of about 30 or more a |t| value above about 2 is taken as significant at the 5% level.
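The two critical values can be checked with a short sketch in Python, using only the standard library. The `statistics` module supplies the normal quantile (`NormalDist.inv_cdf`) but has no t quantile, so here the t value is found by integrating the t density numerically (Simpson's rule) and bisecting; the helper names `t_pdf`, `t_cdf` and `t_crit` are our own, not from any package.

```python
import math
from statistics import NormalDist


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x) for x >= 0: 0.5 plus Simpson's rule over [0, x] (symmetry)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_crit(df: int, alpha: float = 0.05) -> float:
    """Two-sided critical value: solve P(T <= t) = 1 - alpha/2 by bisection."""
    lo, hi = 0.0, 10.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < 1 - alpha / 2:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


z_crit = NormalDist().inv_cdf(1 - 0.05 / 2)   # two-sided 5% normal value
t30 = t_crit(30)                               # two-sided 5% t value, 30 df
print(f"z(0.05)     = {z_crit:.3f}")
print(f"t(30, 0.05) = {t30:.3f}")
```

Both values round to 2, which is why |t| > 2 serves as the working rule.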
Quantify the extent of the recovery in Year 6, Q3.
Fitted equation: P̂ = 1030 … Q1 … Q2 … Q3 … Q4 … Time
Year 6 Q2: P = 1657, P̂ = … × 22 = 2033, P − P̂ = 1657 − 2033 = −376
Year 6 Q3: P = 2185, P̂ = … × 23 = 1985, P − P̂ = 2185 − 1985 = 200
Homework 4.2.1
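The residual arithmetic for the two Year 6 quarters can be sketched as follows, using only the observed and fitted values quoted on the slide. The final "swing" (change in residual from Q2 to Q3) is our own illustrative summary of the turnaround, not a quantity defined in the lecture.

```python
# Observed value P and fitted value P-hat for Year 6, as quoted on the slide.
cases = {
    "Year 6 Q2": (1657, 2033),
    "Year 6 Q3": (2185, 1985),
}

# Residual = observed - fitted; negative means below the fitted trend plus season.
residuals = {label: obs - fit for label, (obs, fit) in cases.items()}

for label, r in residuals.items():
    print(f"{label}: P - P_hat = {r:+d}")

# Illustrative summary (our own): change from the Q2 shortfall to the Q3 surplus.
swing = residuals["Year 6 Q3"] - residuals["Year 6 Q2"]
print("Change in residual, Q2 to Q3:", swing)
```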
Homework
List correspondences between the output from the original regression and the output from the alternative regression. Confirm that the coefficients of Q1, Q2 and Q3 in the original are the corresponding coefficients in the alternative with the Q4 coefficient added.
Predictor   Coef   SE Coef   T   P
Noconstant
Q1
Q2
Q3
Q4
Time
S = …

Predictor   Coef   SE Coef   T   P
Constant
Q1
Q2
Q3
Time
S = …
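The homework correspondence can be checked numerically. The sketch below fits both parameterisations, (a) no constant with all four quarter indicators plus Time and (b) a constant with Q1 to Q3 plus Time, by least squares in pure Python. The 12 quarterly values are hypothetical, made up for illustration; the relationships being checked (Q4 coefficient = constant, each other quarter's coefficient = alternative coefficient + constant, identical Time coefficient) hold for any data, because the two models are reparameterisations of each other.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular: coefficients are not unique")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ols(X, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)


def ind(q, k):
    """Indicator: 1 if the observation is in quarter k, else 0."""
    return 1.0 if q == k else 0.0


# Hypothetical quarterly series: 3 years = 12 quarters (not the lecture's data).
y = [120, 95, 130, 150, 128, 101, 139, 160, 135, 110, 146, 168]
period = list(range(1, 13))
quarter = [(t - 1) % 4 + 1 for t in period]  # 1, 2, 3, 4, 1, 2, ...

# (a) No constant, all four quarter indicators, plus Time.
X_nc = [[ind(q, 1), ind(q, 2), ind(q, 3), ind(q, 4), t] for q, t in zip(quarter, period)]
# (b) Constant, Q1-Q3 only (Q4 becomes the baseline), plus Time.
X_c = [[1.0, ind(q, 1), ind(q, 2), ind(q, 3), t] for q, t in zip(quarter, period)]

q1, q2, q3, q4, time_nc = ols(X_nc, y)
const, c1, c2, c3, time_c = ols(X_c, y)

print("Q4 (no constant) vs Constant:", round(q4, 4), round(const, 4))
for name, nc, c in (("Q1", q1, c1), ("Q2", q2, c2), ("Q3", q3, c3)):
    print(name, round(nc, 4), "=", round(c, 4), "+", round(const, 4))
print("Time:", round(time_nc, 4), round(time_c, 4))
```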
Homework
1. Calculate the simple linear regressions of Jobtime on each of T_Ops and Units. Confirm the corresponding t-values.
2. Calculate the simple linear regression of Jobtime on Ops per Unit. Comment on the negative correlation of Jobtime with Ops per Unit in the light of the corresponding t-value.
3. Confirm the calculation of the R² values.
Solution
Calculate the simple linear regression of Jobtime on Ops per Unit. Comment on the negative correlation of Jobtime with Ops per Unit in the light of the corresponding t-value.
Comment: The t-value is not significant, so the negative correlation is consistent with chance variation and has no substantive meaning.
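In simple linear regression the slope's t-value depends only on the correlation r and the sample size n, through t = r√(n − 2) / √(1 − r²), which is why a visible correlation can still be "not significant" when r is small or n is small. The sketch below checks this identity against the usual Coef / SE Coef calculation. The data are hypothetical (not the Jobtime data) and were chosen to show a weak negative correlation.

```python
import math


def slr_t(x, y):
    """t-value of the slope in the simple linear regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    s = math.sqrt(sse / (n - 2))          # residual standard deviation
    return slope / (s / math.sqrt(sxx))   # Coef / SE Coef


def corr(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_from_r(r, n):
    """The same t-value, computed from the correlation alone."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Hypothetical data (not the Jobtime data): weak negative association, n = 8.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [3.1, 2.4, 3.9, 2.2, 3.0, 2.6, 3.3, 2.1]

r = corr(x, y)
t = slr_t(x, y)
print(f"r = {r:.3f}, t = {t:.3f}, t via r = {t_from_r(r, len(x)):.3f}")
```

Here |t| is well below 2, so the negative r would be read as chance variation.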
Variance Inflation Factors
VIF_k = 1 / (1 − R_k²), where R_k² is the R² from the regression of X_k on the other X variables.
Convention: problem if R_k² > 90%, equivalently VIF_k > 10.
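A minimal pure-Python sketch of the VIF calculation: each explanatory variable is regressed on a constant and the remaining variables, and VIF_k = 1 / (1 − R_k²). The function name `vif` and the two short columns are hypothetical, chosen so that the two variables have correlation 0.8; with only two explanatory variables R_k² = r² = 0.64 and both VIFs equal 1 / 0.36 ≈ 2.78, below the threshold of 10.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ols(X, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)


def vif(columns):
    """VIF_k = 1 / (1 - R_k^2), with R_k^2 from regressing column k on the others."""
    result = []
    for k, xk in enumerate(columns):
        others = [c for j, c in enumerate(columns) if j != k]
        X = [[1.0] + [c[i] for c in others] for i in range(len(xk))]
        b = ols(X, xk)
        fitted = [sum(bj * v for bj, v in zip(b, row)) for row in X]
        mean = sum(xk) / len(xk)
        sse = sum((a - f) ** 2 for a, f in zip(xk, fitted))
        sst = sum((a - mean) ** 2 for a in xk)
        result.append(1 / (1 - (1 - sse / sst)))
    return result


# Hypothetical explanatory variables with correlation r = 0.8.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
vifs = vif([x1, x2])
print([round(v, 3) for v in vifs])
print(["problem" if v > 10 else "ok" for v in vifs])
```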
What to do?
– Get new X values, to break the correlation pattern (impractical in observational studies)
– Choose a subset of the X variables
   – manually
   – automatically: stepwise regression, other methods
Residential load survey data
Data collected by a US electricity supplier during an investigation of the factors that influence peak demand for electricity by residential customers.
Load: demand at the system peak demand hour, in kW
Size (X1): house size, in SqFt/1000
Income (X2): annual family income, in $/1000
AirCon (X3): air-conditioning capacity, in tons
Index (X4): house appliance index, in kW
Residents (X5): number of residents in the house on a typical day
Matrix plot
Results
All variables in:
Predictor   Coef   SE Coef   T   P
Constant
Size
Income
AirCon
Index
Residents

Income deleted:
Predictor   Coef   SE Coef   T   P
Constant
Size
AirCon
Index
Residents
Exercise
Calculate the VIF for Size. Comment.
Homework
Calculate variance inflation factors for all explanatory variables. Discuss.
Multicollinearity when there is perfect correlation within the X variables, i.e. one X variable is an exact linear combination of the others.
Example: indicators
Illustration: Minitab
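The indicator example can be made concrete. With a constant and an indicator for every quarter, Q1 + Q2 + Q3 + Q4 equals the constant column in every row, so X'X is singular and the least-squares coefficients are not unique. The sketch below, assuming three years of quarterly observations, confirms this with an exact determinant computed over `fractions.Fraction`; dropping Q4 (as in the regression with a constant) restores a non-singular X'X.

```python
from fractions import Fraction


def det(M):
    """Exact determinant of a square matrix, by elimination over fractions."""
    A = [[Fraction(v) for v in row] for row in M]
    n = len(A)
    d = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if A[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)          # no pivot: the matrix is singular
        if pivot != col:
            A[col], A[pivot] = A[pivot], A[col]
            d = -d                      # a row swap flips the sign
        d *= A[col][col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
    return d


def cross_products(X):
    """X'X for a design matrix given as a list of rows."""
    p = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]


quarters = [1, 2, 3, 4] * 3             # three years of quarterly data
Q = [[1 if q == k else 0 for k in (1, 2, 3, 4)] for q in quarters]

X_all = [[1] + row for row in Q]        # constant + Q1, Q2, Q3, Q4
X_drop = [[1] + row[:3] for row in Q]   # constant + Q1, Q2, Q3

# In every row Q1 + Q2 + Q3 + Q4 equals the constant column.
dependence = [row[1] + row[2] + row[3] + row[4] - row[0] for row in X_all]

print(dependence)
print(det(cross_products(X_all)))       # singular
print(det(cross_products(X_drop)))      # non-singular once Q4 is dropped
```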
2. Transforming data, the log transform
(i) Hatching of liver fluke eggs
The life cycle of the liver fluke
Hatching of liver fluke eggs: Duration and Success rate
(ii) Explaining CEO Compensation and Company Sales (Forbes magazine, May 1994)
Explaining CEO Remuneration, bivariate log transformation
(iii) Mammals' Brainweight vs Bodyweight
Scatterplot view
Scatterplot view, log transform
Scatterplot view, Dinosaurs deleted
Histogram view
Histogram view, log transform
Changing spread with log
Why the log transform works
High spread at high X is transformed to low spread at high Y.
Low spread at low X is transformed to high spread at low Y.
10 to 100 is transformed to log10(10) to log10(10²), i.e. 1 to 2.
1/10 = 0.1 to 1/100 = 0.01 is transformed to log10(10⁻¹) to log10(10⁻²), i.e. −1 to −2.
So an interval of width 90 and an interval of width 0.09 both become intervals of width 1: the log compresses large values and stretches small ones.
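The same arithmetic in a few lines of Python: intervals whose endpoints differ by a factor of 10 all have width 1 on the log10 scale, however wide they are on the original scale. The helper name `log_width` is our own.

```python
import math


def log_width(a, b):
    """Width of the interval [a, b] after a base-10 log transform."""
    return math.log10(b) - math.log10(a)


for a, b in [(0.01, 0.1), (1, 10), (10, 100)]:
    print(f"[{a}, {b}]: width {b - a:g} -> {log_width(a, b):.6f} on the log10 scale")
```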
3. SLR with transformed data
SLR with transformed data
Regression Analysis: LBrainW versus LBodyW
The regression equation is LBrainW = … + … LBodyW
Predictor   Coef   SE Coef   T   P
Constant
LBodyW
S = …
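A straight line on the log scale corresponds to a power law on the original scale: if LBrainW = a + b LBodyW with base-10 logs, then BrainW = 10^a × BodyW^b. The sketch below assumes LBrainW and LBodyW are base-10 logs (the slide does not state the base) and uses hypothetical weights that follow an exact power law with exponent 0.75 and constant 12, so the SLR on the logs should recover b = 0.75 and 10^a = 12.

```python
import math


def slr(x, y):
    """Intercept and slope of the least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical weights following BrainW = 12 * BodyW ** 0.75 exactly.
body = [0.1, 1.0, 10.0, 100.0, 1000.0]
brain = [12.0 * w ** 0.75 for w in body]

# SLR of log10(brain) on log10(body).
a, b = slr([math.log10(w) for w in body], [math.log10(v) for v in brain])


def predict_brain(body_weight):
    """Back-transform LBrainW = a + b * LBodyW to BrainW = 10**a * BodyW**b."""
    return 10 ** a * body_weight ** b


print(f"a = {a:.4f}, b = {b:.4f}, 10**a = {10 ** a:.4f}")
print(predict_brain(50.0))
```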
Application: Do humans conform?
[Figure: scatterplot with the Human case labelled]
Application: Do humans conform?
Delete the Human data, calculate the regression, predict the human LBrainW and compare it with the actual value, relative to s.
Application: Do humans conform?
Regression Analysis: LBrainW versus LBodyW
The regression equation is LBrainW = … + … LBodyW
Predictor   Coef   SE Coef   T   P
Constant
LBodyW
S = …
Application: Do humans conform?
LBodyW(Human) = …
LBrainW(Human) = …
Predicted LBrainW = … + … × … = …
Residual = … − … = …
Residual / s = … / … = 3.03
Deleted residuals
For each potentially exceptional case:
– delete the case
– calculate the regression from the rest
– use the fitted equation to calculate a deleted fitted value
– calculate deleted residual = observed value − deleted fitted value
Minitab does this automatically for all cases.
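The four steps above translate directly into a loop. This sketch implements the raw deleted residual exactly as defined on the slide, for simple linear regression only; the data are hypothetical, with the last case placed far off the line y = 2x + 1 followed by the other cases. The comparison with the ordinary residual shows why deletion matters: when the exceptional case is included, it pulls the fitted line towards itself and its ordinary residual understates how unusual it is.

```python
def slr(x, y):
    """Intercept and slope of the least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def ordinary_residuals(x, y):
    """Residuals from the fit to all cases."""
    a, b = slr(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def deleted_residuals(x, y):
    """For each case: refit without it, predict it, and subtract the prediction."""
    out = []
    for i in range(len(x)):
        x_rest = x[:i] + x[i + 1:]
        y_rest = y[:i] + y[i + 1:]
        a, b = slr(x_rest, y_rest)
        deleted_fit = a + b * x[i]          # deleted fitted value
        out.append(y[i] - deleted_fit)      # observed - deleted fitted value
    return out


x = [0, 1, 2, 3, 4, 5]
y = [1, 3, 5, 7, 9, 100]   # hypothetical: last case far off the line y = 2x + 1

ordinary = ordinary_residuals(x, y)
deleted = deleted_residuals(x, y)
print("ordinary:", [round(r, 2) for r in ordinary])
print("deleted: ", [round(r, 2) for r in deleted])
```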
Application: Do humans conform?
With 63 cases, we do not expect to see any case with a residual exceeding 3 standard deviations: for normal errors P(|Z| > 3) ≈ 0.0027, so the expected number of such cases is about 63 × 0.0027 ≈ 0.17. On the other hand, recalling the scatterplot, the humans do not appear particularly exceptional. The dotplot of deleted residuals emphasises this: water opossums appear more exceptional.
[Figure: dotplot of deleted residuals, with Human and Water Opossum labelled]
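The expectation can be checked with the standard library, assuming (as an approximation) that standardised residuals behave like independent standard normal values. The sketch computes the chance that one case exceeds 3 standard deviations, the expected number among 63 cases, and the chance of seeing at least one.

```python
from statistics import NormalDist

n = 63
p = 2 * (1 - NormalDist().cdf(3))   # P(|Z| > 3) for one case
expected = n * p                     # expected number of cases beyond 3 SD
p_any = 1 - (1 - p) ** n             # chance of at least one such case

print(f"P(|Z| > 3)        = {p:.4f}")
print(f"expected count    = {expected:.2f}")
print(f"P(at least one)   = {p_any:.3f}")
```

So a residual of about 3 standard deviations among 63 cases is unusual, but not impossible: roughly a one-in-six event.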
4. Transforming X, quadratic fit
Optimising a nicotine extraction process
In determining the quantity of nicotine in different samples of tobacco, temperature is a key variable in optimising the extraction process. A study of this phenomenon, involving analysis of 18 samples, produced the data analysed below.
Regression Analysis: Nicotine versus Temperature
The regression equation is Nicotine = … + … Temperature
Predictor   Coef   SE Coef   T   P
Constant
Temperature
S = …   R-Sq = 74.8%
Optimising a nicotine extraction process, quadratic fit
The regression equation is Nicotine = … + … Temperature + … Temp-sqr
Predictor   Coef   SE Coef   T   P
Constant
Temperature
Temp-sqr
S = …   R-Sq = 81.5%
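A quadratic fit is still a linear regression: Temp-sqr is simply an extra X column. The sketch below fits both the straight line and the quadratic by least squares in pure Python, compares their R² values, and locates the stationary point of the fitted curve, T = −b1 / (2 b2), which is a maximum when b2 < 0. The temperatures and nicotine values are hypothetical and noise-free (generated from a known quadratic peaking at T = 50), not the study's 18 samples; they were chosen so that a straight line captures almost nothing while the quadratic fits exactly.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ols(X, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)


def r_squared(X, y, b):
    """R^2 = 1 - SSE / SST for coefficients b."""
    fitted = [sum(bj * v for bj, v in zip(b, row)) for row in X]
    my = sum(y) / len(y)
    sse = sum((a - f) ** 2 for a, f in zip(y, fitted))
    sst = sum((a - my) ** 2 for a in y)
    return 1 - sse / sst


# Hypothetical, noise-free data from Nicotine = 5 + 0.6 T - 0.006 T^2.
temps = [20, 30, 40, 50, 60, 70, 80]
nicotine = [5 + 0.6 * t - 0.006 * t * t for t in temps]

X_lin = [[1.0, t] for t in temps]
X_quad = [[1.0, t, t * t] for t in temps]   # Temp-sqr as an extra column

b_lin = ols(X_lin, nicotine)
b0, b1, b2 = ols(X_quad, nicotine)

r2_lin = r_squared(X_lin, nicotine, b_lin)
r2_quad = r_squared(X_quad, nicotine, [b0, b1, b2])

# With b2 < 0 the fitted curve has its maximum at T = -b1 / (2 * b2).
t_best = -b1 / (2 * b2)
print(f"R-Sq linear = {r2_lin:.3f}, R-Sq quadratic = {r2_quad:.3f}")
print(f"fitted optimum temperature = {t_best:.2f}")
```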
Optimising a nicotine extraction process, quadratic fit, case 5 excluded
The regression equation is Nicotine = … + … Temperature + … Temp-sqr
Predictor   Coef   SE Coef   T   P
Constant
Temperature
Temp-sqr
S = …   R-Sq = 88.6%
Other options
– Other functions, e.g., 1/Y, √Y, Y², etc.; the same for X
– Generalised linear models: choose a function of Y, a model for …, etc.
Reading
EM Section …
Hamilton, Ch. 5
Extra Notes: More on log