Re-expressing Data

Presentation transcript:

1 Re-expressing Data  Chapter 6 – Normal Model –What if data do not follow a Normal model?  Chapters 8 & 9 – Linear Model –What if a relationship between two variables is not linear?

2 Re-expressing Data  Re-expression is another name for changing the scale of (transforming) the data.  Usually we re-express the response variable, Y.

3 Goals of Re-expression  Goal 1 – Make the distribution of the re-expressed data more symmetric.  Goal 2 – Make the spread of the re-expressed data more similar across groups.

4 Goals of Re-expression  Goal 3 – Make the form of a scatter plot more linear.  Goal 4 – Make the scatter in the scatter plot more even across all values of the explanatory variable.

5 Ladder of Powers  Power: 2  Re-expression: Y²  Comment: Use on left skewed data.

6 Ladder of Powers  Power: 1  Re-expression: Y  Comment: No re-expression. Do not re-express the data if they are already well behaved.

7 Ladder of Powers  Power: ½  Re-expression: √Y  Comment: Use on count data or when scatter in a scatter plot tends to increase as the explanatory variable increases.

8 Ladder of Powers  Power: “0”  Re-expression: log(Y)  Comments: Not really the “0” power; the logarithm takes that place on the ladder. Use on right skewed data. Measurements cannot be negative or zero.

9 Ladder of Powers  Power: –½, –1  Re-expression: –1/√Y, –1/Y  Comments: Use on right skewed data. Measurements cannot be negative or zero. Use on ratios.
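
The ladder on slides 5 through 9 can be sketched as a single helper. This is only an illustration: the function name `re_express` is hypothetical, and the minus sign on the negative powers is an assumed convention (it keeps larger measurements mapping to larger re-expressed values).

```python
import math

def re_express(y, power):
    """Re-express one measurement y using a rung of the ladder of powers."""
    if power <= 0 and y <= 0:
        # Logs and reciprocal powers need strictly positive measurements.
        raise ValueError("measurements cannot be negative or zero")
    if power == 0:
        return math.log(y)       # the "0" rung is the logarithm, not y**0
    if power < 0:
        return -(y ** power)     # assumed sign convention: preserves ordering
    return y ** power

values = [1.0, 4.0, 9.0, 16.0]   # made-up positive measurements
sqrt_values = [re_express(v, 0.5) for v in values]
log_values = [re_express(v, 0) for v in values]
recip_values = [re_express(v, -1) for v in values]
```

Moving down the ladder (2, 1, ½, "0", –½, –1) pulls in a long right tail progressively harder, which is why the lower rungs are suggested for right skewed data.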

10 Goal 1 - Symmetry  Data are obtained on the time between nerve pulses along a nerve fiber.  Time is rounded to the nearest half unit, where a unit is … of a second. –30.5 represents …

11 Time ( sec)

12 Time – Nerve Pulses  Distribution is skewed right.  Sample mean (12.305) is much larger than the sample median (7.5).  Many potential outliers.  Data not from a Normal model.

13 Sqrt(Time)

14 Log(Time)

15 Summary  Time – Highly skewed to the right.  Sqrt(Time) – Still skewed right.  Log(Time) –Fairly symmetric and mounded in the middle. –Could have come from a Normal model.
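
The comparison in this summary can be mimicked numerically. The sketch below uses a small made-up right skewed sample (not the nerve pulse data) and a simple moment-based skewness coefficient, which is an assumed measure; the slides judge symmetry from the plots.

```python
import math

def skewness(values):
    """Moment-based skewness: positive means a longer right tail."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right skewed measurements, chosen only for illustration.
times = [1, 2, 4, 8, 16, 32, 64]

skew_raw = skewness(times)
skew_sqrt = skewness([math.sqrt(t) for t in times])
skew_log = skewness([math.log(t) for t in times])

print(f"raw: {skew_raw:.2f}  sqrt: {skew_sqrt:.2f}  log: {skew_log:.2f}")
```

For this sample the square root reduces the skewness but leaves it positive, while the log brings it to essentially zero, the same ordering the slides report for Time, Sqrt(Time) and Log(Time).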

16 Goal 3 – Straighten Up  What is the relationship between the temperature of coffee and the time since it was poured? –Y, temperature (°F) –X, time (minutes)

17

18 Cooling Coffee  There is a general negative association – as the time since the coffee was poured increases, the temperature of the coffee decreases.

19 Linear Model

20 Linear Model Fit  Summary –Predicted Temp = … – 1.56*Time –On average, temperature decreases 1.56 °F per minute. –R² = 0.99; 99% of the variation in temperature is explained by the linear relationship with time.

21 Plot of Residuals

22 Curved Pattern  There is a clear pattern in the plot of residuals versus time. –Under predict, over predict, under predict.  The linear fit is very good, but we can do better.
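
The under, over, under pattern can be reproduced with a short sketch. The readings below are synthetic and noise-free (generated from an exponential cooling curve, not taken from the slides), and `fit_line` is a hypothetical helper implementing ordinary least squares.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Synthetic readings (assumption): temperature decays exponentially with time.
times = list(range(0, 65, 5))                        # minutes 0, 5, ..., 60
temps = [180.3 * math.exp(-0.0114 * t) for t in times]

intercept, slope = fit_line(times, temps)
# Residual = observed - predicted; positive means the line under-predicts.
residuals = [y - (intercept + slope * t) for t, y in zip(times, temps)]
print("".join("+" if r > 0 else "-" for r in residuals))
```

The residuals are positive at both ends and negative in the middle, so a straight line can fit closely overall and still leave a systematic curve behind.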

23

24 Log(Temp) by Time  Summary –Predicted Log(Temp) = … – 0.0114*Time –On average, log temperature decreases 0.0114 log(°F) per minute.

25 Plot of Residuals

26 Interpretation  There is a random scatter of points around the zero line.  The linear model relating Log(Temp) to Time is the best we can do.

27 Original Scale?  Predicted Log(Temp) = … – 0.0114*Time  Predicted Temp = 180.3*e^(–0.0114*Time) –Predicted temp at time=0, 180.3 °F –The predicted temp in one more minute is the predicted temp now multiplied by e^(–0.0114) ≈ 0.9887
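
A minimal sketch of the back-transformation, using the two numbers given on this slide (180.3 and 0.0114) and assuming Log is the natural logarithm, as the use of e implies. The function name `predicted_temp` is hypothetical.

```python
import math

A = 180.3     # predicted temperature (°F) at time 0, from the slide
K = 0.0114    # decrease in Log(Temp) per minute, from the slide

def predicted_temp(minutes):
    """Undo the log: exp(log(A) - K*t) is the same as A * exp(-K*t)."""
    return A * math.exp(-K * minutes)

# The ratio of predictions one minute apart is the same at every time.
ratio = predicted_temp(11) / predicted_temp(10)
print(f"{predicted_temp(0):.1f} °F at time 0, factor per minute {ratio:.4f}")
```

A constant subtraction on the log scale becomes a constant multiplication on the original scale, which is why the slide describes the change per minute as a factor rather than a number of degrees.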

28 JMP  Method 1 –Create a new column in JMP, Log(Temp): Cols – Formula – Transcendental – Log.

29 JMP  Method 1 (continued) –Fit Y by X  Y – Log(Temp)  X – Time –Fit Linear

30 JMP  Method 2 –Fit Y by X  Y – Temp  X – Time –Fit Special  Transform Y – Log
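
For readers without JMP, Method 1 can be approximated in plain Python: build the Log(Temp) column, fit a line to it, and back-transform the predictions. This is a sketch under stated assumptions, not a description of what JMP computes internally; the data are synthetic and `fit_line` and `predict_temp` are hypothetical helpers.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Synthetic, noise-free readings (assumption), not the slide data.
times = list(range(0, 65, 5))
temps = [180.3 * math.exp(-0.0114 * t) for t in times]

# Step 1: the new column, Log(Temp) (natural log).
log_temps = [math.log(temp) for temp in temps]

# Step 2: fit Log(Temp) by Time with a straight line.
b0, b1 = fit_line(times, log_temps)

# Step 3: predictions on the original scale.
def predict_temp(minutes):
    return math.exp(b0 + b1 * minutes)

print(f"slope {b1:.4f}, temp at time 0: {math.exp(b0):.1f}")
```

Because the synthetic readings lie exactly on an exponential curve, the fit on the log scale recovers the generating slope and starting temperature; real measurements would scatter around the line.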