Re-expressing Data:Get it Straight!

Slides:



Advertisements
Similar presentations
Copyright © 2010 Pearson Education, Inc. Slide A least squares regression line was fitted to the weights (in pounds) versus age (in months) of a.
Advertisements

 Objective: To determine whether or not a curved relationship can be salvaged and re-expressed into a linear relationship. If so, complete the re-expression.
Chapter 10: Re-expressing data –Get it straight!
Copyright © 2011 Pearson Education, Inc. Curved Patterns Chapter 20.
Copyright © 2014, 2011 Pearson Education, Inc. 1 Chapter 20 Curved Patterns.
Chapter 10: Re-Expressing Data: Get it Straight
Extrapolation: Reaching Beyond the Data
Chapter 10 Re-Expressing data: Get it Straight
Get it Straight!! Chapter 10
Copyright © 2010, 2007, 2004 Pearson Education, Inc. Chapter 10 Re-expressing Data: Get it Straight!
Chapter 10 Re-expressing the data
Re-expressing data CH. 10.
Re-expressing the Data: Get It Straight!
Chapter 10 Re-expressing Data: Get it Straight!!
The Normal Model Ch. 6. “All models are wrong – but some are useful.” -- George Box.
Inference for regression - Simple linear regression
Transforming to achieve linearity
Chapter 10: Re-Expressing Data: Get it Straight AP Statistics.
DO NOW Read Pages 222 – 224 Read Pages 222 – 224 Stop before “Goals of Re-expression” Stop before “Goals of Re-expression” Answer the following questions:
Chapter 10 Re-expressing the data
Copyright © 2007 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Slide
Chapter 10 Re-expressing Data: Get It Straight!. Slide Straight to the Point We cannot use a linear model unless the relationship between the two.
Chapter 10: Re- expressing Data by: Sai Machineni, Hang Ha AP STATISTICS.
Lecture 6 Re-expressing Data: It’s Easier Than You Think.
Copyright © 2010 Pearson Education, Inc. Slide A least squares regression line was fitted to the weights (in pounds) versus age (in months) of a.
Bivariate Data Analysis Bivariate Data analysis 4.
Reexpressing Data. Re-express data – is that cheating? Not at all. Sometimes data that may look linear at first is actually not linear at all. Straight.
If the scatter is curved, we can straighten it Then use a linear model Types of transformations for x, y, or both: 1.Square 2.Square root 3.Log 4.Negative.
Chapter 5 Lesson 5.4 Summarizing Bivariate Data 5.4: Nonlinear Relationships and Transformations.
Re-Expressing Data. Scatter Plot of: Weight of Vehicle vs. Fuel Efficiency Residual Plot of: Weight of Vehicle vs. Fuel Efficiency.
Chapter 9 Regression Wisdom
Copyright © 2010 Pearson Education, Inc. Chapter 10 Re-expressing Data: Get it Straight!
Copyright © 2010, 2007, 2004 Pearson Education, Inc. Chapter 10 Re-expressing Data: Get it Straight!
Chapter 10 Notes AP Statistics. Re-expressing Data We cannot use a linear model unless the relationship between the two variables is linear. If the relationship.
Statistics 10 Re-Expressing Data Get it Straight.
Copyright © 2010, 2007, 2004 Pearson Education, Inc. Chapter 10 Re-expressing Data: Get it Straight!
Copyright © 2007 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Slide
Chapter 13 Lesson 13.2a Simple Linear Regression and Correlation: Inferential Methods 13.2: Inferences About the Slope of the Population Regression Line.
Chapter 13 Lesson 13.2a Simple Linear Regression and Correlation: Inferential Methods 13.2: Inferences About the Slope of the Population Regression Line.
 Understand why re-expressing data is useful  Recognize when the pattern of the data indicates that no re- expression will improve it  Be able to reverse.
Howard Community College
Thursday, May 12, 2016 Report at 11:30 to Prairieview
Scatterplots, Association, and Correlation
MATH-138 Elementary Statistics
WARM UP: Use the Reciprocal Model to predict the y for an x = 8
Let’s Get It Straight! Re-expressing Data Curvilinear Regression
Chapter 10: Re-Expression of Curved Relationships
Understanding and Comparing Distributions
Inferences for Regression
Re-expressing the Data: Get It Straight!
Chapter 7 Linear Regression.
Chapter 10 Re-Expressing data: Get it Straight
Re-expressing Data: Get it Straight!
Chapter 8 Part 2 Linear Regression
Re-expressing the Data: Get It Straight!
Re-expressing the Data: Get It Straight!
AP Exam Review Chapters 1-10
Chapter 7 Part 2 Scatterplots, Association, and Correlation
So how do we know what type of re-expression to use?
Understanding and Comparing Distributions
Understanding and Comparing Distributions
Chapter 10 Re-expression Day 1.
Lecture 6 Re-expressing Data: It’s Easier Than You Think
Understanding and Comparing Distributions
Inferences for Regression
Chapter 3.2 Regression Wisdom.
Chapter 9 Regression Wisdom.
Re-expressing Data: Get it Straight!
Scatterplots, Association, and Correlation
Presentation transcript:

Re-expressing Data:Get it Straight! Chapter 9 Re-expressing Data:Get it Straight!

Straight to the Point We cannot use a linear model unless the relationship between the two variables is linear. Often re-expression can save the day, straightening bent relationships so that we can fit and use a simple linear model.

Straight to the Point (cont.) Here is the scatterplot for fuel efficiency (in miles per gallon) vs weight (in pounds) for cars. The graph looks fairly linear at first:

Straight to the Point (cont.) A look at the residuals plot shows a problem: What is wrong here?

Straight to the Point (cont.) Even though the R² is 81.6%, the residuals don’t show a random scatter. The shape is bent.

Looking back at the first scatterplot, you may be able to see this slight bend. Now think about the regression line through these points. How much do you think a car would have to weigh to have a fuel efficiency of 0 mpg? Do you think this is accurate?

A Hummer H2 weighs about 6400 pounds. This vehicle is not known for its fuel efficiency, but it does get a positive fuel efficiency. Extrapolation is ALWAYS dangerous, but it becomes more dangerous as the model becomes more wrong. We can re-express the data to get a model that is more linear.

We can re-express fuel efficiency as gallons per hundred miles (a reciprocal) and eliminate the bend in the original scatterplot: We will talk about different ways to re-express in a bit and which ones we should be using for certain scenarios...

A look at the residuals plot for the new model seems more reasonable:

Now compare the two graphs: The direction of the association is positive now. Why? Because we are measuring gas consumption and heavier cars consume more gas per mile.

Now compare the two graphs: The relationship is much straighter now. Question?: What does the reciprocal model tell us about the Hummer?

The regression line for this graph predicts that the hummer, with a weight of 6400 lbs, will predict somewhere near 9.7 gallons. What does this mean? It means the car is predicted to use 9.7 gallons for every 100 miles. Or, we can write it like this so it makes more sense: 100 𝑚𝑖𝑙𝑒𝑠 9.7 𝑔𝑎𝑙𝑙𝑜𝑛𝑠 This is 10.3 mpg. This is a much more reasonable prediction than earlier. The Hummer actually gets 11 mpg, which was pretty close.

Why Do We Re-Express Data? There are several reasons. Each of the following goals help make the data more suitable for analysis by the methods that we know.

Goals of Re-expression Goal 1: Make the distribution of a variable more symmetric. Why? Because it is easier to summarize the center. If it is symmetric, we can use the mean and standard deviation. If it is normal, we can use the Empirical rule and find out things using z-scores.

Goals of Re-expression (cont.) Goal 2: Make the spread of several groups more alike, even if their centers differ. Why? Groups that share a common spread are easier to compare. Taking the Logs of all the Assets makes the boxes more symmetric with spreads that are more equal.

Goals of Re-expression (cont.) Doing this makes it easier to compare assets across market sectors. It can also reveal problems in the data. Some companies that looked like outliers turned out to be more typical. Two companies in the finance sector now stick out. They may have been placed in the wrong sector, but we wouldn’t have seen that in the original data.

Goals of Re-expression (cont.) Goal 3: Make the form of a scatterplot more nearly linear. Why? Linear scatterplots are easier to model. We can fit a linear regression to a model once it is straight. Taking the logs of the assets makes things much more linear.

Goals of Re-expression (cont.) Goal 4: Make the scatter in a scatterplot spread out evenly rather than thickening at one end. Why? Having a consistent scatter is an important condition for many of the methods we use in Statistics.

Question: What feature of each of these displays suggest that a re-expression may be helpful? Solution: The boxplots each show skewness to the high end. The medians are low in the boxes and several show high outliers.

Question: What feature of each of these displays suggest that a re-expression may be helpful? Solution: The histogram of heart rates is skewed to the right. Re-expression often makes skewed distributions look more symmetric.

Question: What feature of each of these displays suggest that a re-expression may be helpful? Solution: The scatterplot shows a curved relationship, concave upward. Re-expressing either variable may help to straighten the pattern.

Homework: Pg 247 – 250, Exercises 1 – 7, 9, 11, 13 Read Chapter 9 Work on the Guided Reading