Bivariate Data Analysis 4

Presentation transcript:


If the relationship is linear, the residuals plotted against the original x-values will be scattered randomly above and below the zero line.

A scatter plot of residuals versus the x-values should be boring and have no interesting features, such as direction or shape. It should stretch horizontally with about the same amount of scatter throughout. It should show no curves or outliers.

r = 0.87 suggests a strong linear relationship between x and y.

The scatter plot below, however, shows that the relationship is clearly non-linear.

When examining residuals to check whether a linear model is appropriate, it is usually best to plot them. The variation in the residuals is the key to assessing how well the model fits.

The pattern of residuals looks more like a parabola. This indicates that the data were not really linear and were more likely to be quadratic.
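The point about curved residuals can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are made up (y = x^2 for x = 1 to 10, not the data on the slides) and NumPy is assumed to be available. The correlation comes out above 0.95, yet the residuals are positive at both ends and negative in the middle, which is the parabola shape described above.

```python
import numpy as np

# Made-up curved data (y = x^2); NOT the data shown on the slides.
x = np.arange(1, 11, dtype=float)
y = x ** 2

# Least squares line and correlation coefficient.
slope, intercept = np.polyfit(x, y, 1)
r = np.corrcoef(x, y)[0, 1]

# Residual = observed y minus the y predicted by the line.
residuals = y - (slope * x + intercept)

print(f"r = {r:.3f}")
for xi, res in zip(x, residuals):
    # A plus or minus sign per point makes the curved pattern easy to see.
    print(f"x = {xi:4.0f}  residual = {res:7.2f}  {'+' if res > 0 else '-'}")
```

A high r on its own therefore says little about whether a straight line is the right model; the residual pattern is what gives the curve away.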

Discuss this data.

Discuss this situation. Outlier?

Discuss the plot of the residuals

Discuss this scatter plot

Linear?

Residuals

Useful website (plots residuals, regression lines, etc.): http://stat-

Many of our tools for displaying and summarizing data work only when the data meet certain conditions. We cannot use a linear model unless the relationship between two variables is linear. Often re-expression can save the day, straightening bent relationships so that we can fit and use a simple linear model.

Displays of the residuals can often help you find subsets in the data.

When a scatterplot shows a curved form that consistently increases or decreases, we can often straighten the form of the plot by re-expressing one or both of the variables.

The correlation sounds pretty high, but the scatter plot shows that something is not quite right.

Re-expressing f/stop speed by squaring straightens the plot.

This plot looks 'straight'. The correlation is now 0.998, but the increase in correlation is not important. (The original value was already large.) What is important is that the form of the plot is now straight, so the correlation is now an appropriate measure of association.

Goals of re-expression: Make the distribution (as seen in its histogram, for example) more symmetric. Make the form of the scatter plot more nearly linear. Make the scatter in a scatter plot spread out evenly rather than following a fan shape.
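As a rough sketch of the first goal, the snippet below measures how lopsided a batch of values is before and after taking logs. Everything here is an assumption made for illustration: the values are made up, the helper name `skewness` is hypothetical, and the moment-based skewness formula is just one convenient way to put a number on what a histogram would show.

```python
import numpy as np

def skewness(values):
    """Moment-based skewness: near 0 for a symmetric batch,
    positive when there is a long right tail."""
    v = np.asarray(values, dtype=float)
    deviations = v - v.mean()
    return np.mean(deviations ** 3) / np.mean(deviations ** 2) ** 1.5

# Made-up, right-skewed measurements (not data from the slides).
raw = np.array([1, 2, 2, 3, 3, 4, 5, 8, 15, 40], dtype=float)
logged = np.log10(raw)

skew_raw = skewness(raw)
skew_log = skewness(logged)
print(f"skewness before: {skew_raw:.2f}, after log: {skew_log:.2f}")
```

The log pulls in the long right tail, so the second figure is noticeably closer to zero. A histogram of each version remains the more direct check.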

Some hints: Try y^2 for unimodal distributions skewed to the left. Try the square root of y for counted data. Try logs for measurements that can't be negative, especially when they grow by percentage increases. Try -1/y or -1/(square root of y). Logs straighten exponential trends and pull in a long right tail. Logs of both variables straighten power curves.
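The hint about percentage growth can be checked with a small sketch. The data below are made up (a starting value of 100 growing by 50% per step), and NumPy is assumed. Against x, the raw values curve upward; their logs fall on a straight line, and the slope of that line converts back into the growth factor.

```python
import numpy as np

# Made-up data: start at 100 and grow by 50% at every step.
x = np.arange(10, dtype=float)
y = 100 * 1.5 ** x

# Correlation before and after re-expressing y with log base 10.
r_raw = np.corrcoef(x, y)[0, 1]
r_log = np.corrcoef(x, np.log10(y))[0, 1]

# Fit the line log10(y) = intercept + slope * x, then undo the log.
slope, intercept = np.polyfit(x, np.log10(y), 1)
growth_factor = 10 ** slope    # multiplier per unit of x
start_value = 10 ** intercept  # fitted value at x = 0

print(f"r (raw) = {r_raw:.3f}, r (log y) = {r_log:.3f}")
print(f"growth factor = {growth_factor:.3f}, start value = {start_value:.1f}")
```

Reversing the re-expression in this way is what turns the straight-line model back into a statement about the original measurements.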

Try y versus x^2

Try log or 1/x

Don't stray too far from the powers suggested. Taking a high power may artificially inflate R^2, but it won't give a useful or meaningful model. It is better to stick with powers between 2 and -2. Even in that range you should prefer the simpler powers in the ladder to those in the cracks. A square root is easier to understand than an arbitrary fractional power.
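One way to walk the simple rungs of the ladder is sketched below. The helper names `reexpress` and `r_squared` are hypothetical, the data are made up so that the square root happens to straighten them, and R^2 is used only as a rough screen, not as the goal.

```python
import numpy as np

def reexpress(y, power):
    """Re-express y using one rung of the ladder of powers.

    Power 0 stands for the logarithm. Negative powers get a minus sign
    (for example -1/y) so the direction of the relationship is kept.
    """
    y = np.asarray(y, dtype=float)
    if power == 0:
        return np.log10(y)
    if power < 0:
        return -(y ** power)
    return y ** power

def r_squared(x, y):
    """Squared correlation between x and y."""
    return np.corrcoef(x, y)[0, 1] ** 2

# Made-up curved data (not from the slides).
x = np.arange(1, 11, dtype=float)
y = (2 * x + 1) ** 2

# Only the simple rungs between 2 and -2.
ladder = [2, 1, 0.5, 0, -0.5, -1]
scores = {p: r_squared(x, reexpress(y, p)) for p in ladder}
best_power = max(scores, key=scores.get)

for p in ladder:
    print(f"power {p:>4}: R^2 = {scores[p]:.4f}")
print("straightest rung:", best_power)
```

Whichever rung scores best, the scatter plot and residual plot of the re-expressed data still need to be looked at before the model is used.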

Comparing histograms and scatter graphs

The data in the scatter plot below show the progression of the fastest times for the men's marathon since the Second World War. We may want to use these data to predict the fastest time at 1 January 2010 (i.e. 64 years after 1 January 1946). (Page 53)

Possible solutions: a quadratic (y = ax^2 + bx + c); an exponential function (y = ae^(bx)); a power function (y = ax^b); two separate straight lines, one for say 0 to 23 years and one for say 23 to 60 years; a line for only the later years, say 23 to 60 years.

Quadratic: The curve seems to fit, and R^2 is very high. It is inappropriate to quote r because the relationship is not linear. Beyond the data, the predicted time starts increasing, which is not sensible. (Page 54)

Exponential: Doesn't fit the data points particularly well.

Power function: Reasonable fit; R^2 is high. ✓

Line for only the later years: Reasonable fit; R^2 is high. Note: we use only the later-years line for the prediction and ignore the earlier years. ✓
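The comparison of candidate models can be sketched as follows. The numbers are made up (times that fall quickly and then level off) and are not the marathon records, so the output says nothing about which model suits the real data. The sketch only shows one way to fit each form with NumPy and why a quadratic can misbehave when extrapolated.

```python
import numpy as np

# Made-up times (minutes) that drop and then level off.
# These are NOT the marathon data from the slides.
x = np.arange(1, 62, 5, dtype=float)   # years: 1, 6, ..., 61
y = 125 + 20 * np.exp(-x / 20)

def r_squared(observed, fitted):
    """Proportion of the variation in the observed values explained by the fit."""
    ss_res = np.sum((observed - fitted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return 1 - ss_res / ss_tot

# Quadratic: y = a x^2 + b x + c
a, b, c = np.polyfit(x, y, 2)
def quadratic(t):
    return a * t ** 2 + b * t + c

# Exponential: y = A e^(B x), fitted as a line to log(y) versus x
B_exp, logA_exp = np.polyfit(x, np.log(y), 1)
def exponential(t):
    return np.exp(logA_exp + B_exp * t)

# Power: y = A x^B, fitted as a line to log(y) versus log(x)
B_pow, logA_pow = np.polyfit(np.log(x), np.log(y), 1)
def power(t):
    return np.exp(logA_pow) * t ** B_pow

models = {"quadratic": quadratic, "exponential": exponential, "power": power}
for name, model in models.items():
    print(f"{name:12s} R^2 = {r_squared(y, model(x)):.3f}  "
          f"prediction at x = 64: {model(64.0):.1f}")

# With a > 0 the fitted parabola turns upward after this x-value,
# so predictions past it rise again.
turning_point = -b / (2 * a)
print(f"quadratic turns upward after x = {turning_point:.1f}")
```

A high R^2 for the quadratic does not protect it from the upturn, which is why the slides prefer a model whose behaviour beyond the data stays sensible.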

The data in the scatter plot below come from a random sample of 60 models of new cars taken from all models on the market in New Zealand in May. We want to use the engine size to predict the weight of a car. The relationship seems to be linear for engine sizes less than 2500 cc, with a very weak or no linear relationship for engine sizes over 2500 cc. Solution: fit a line for engine sizes less than 2500 cc. (Page 55)
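Fitting a line to only part of the range can be sketched like this. The ten engine sizes and weights are made up for illustration and are not the New Zealand sample. The names `CUTOFF_CC` and `predict_weight` are hypothetical, and the weights are assumed to be in kilograms.

```python
import numpy as np

# Made-up engine sizes (cc) and weights (kg); NOT the New Zealand sample.
engine = np.array([1000, 1300, 1500, 1800, 2000, 2400,
                   3000, 3500, 4000, 5000], dtype=float)
weight = np.array([900, 1000, 1100, 1250, 1350, 1500,
                   1600, 1550, 1650, 1600], dtype=float)

CUTOFF_CC = 2500
small = engine < CUTOFF_CC

# Correlation within each subset.
r_small = np.corrcoef(engine[small], weight[small])[0, 1]
r_large = np.corrcoef(engine[~small], weight[~small])[0, 1]

# Fit the line only to the engines below the cutoff.
slope, intercept = np.polyfit(engine[small], weight[small], 1)

def predict_weight(cc):
    """Predict weight from engine size, refusing to extrapolate past the cutoff."""
    if cc >= CUTOFF_CC:
        raise ValueError("the line was fitted only to engines under 2500 cc")
    return slope * cc + intercept

print(f"r below cutoff = {r_small:.3f}, r above cutoff = {r_large:.3f}")
print(f"predicted weight at 1600 cc: {predict_weight(1600):.0f} kg")
```

Refusing to predict above the cutoff is a deliberate choice: the fitted line describes only the cars it was built from.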