Correlations and scatterplots -- Optical illusion? -- Finding the marginal distributions in the scatterplots (shoe size vs. hours of TV) -- Regressions

Presentation transcript:

Correlations and scatterplots
-- Optical illusion?
-- Finding the marginal distributions in the scatterplots (shoe size vs. hours of TV)
Regressions
-- Any model has predicted values and residuals. (Do we always want a model with small residuals?)
-- The “regression effect” (Why did Galton call these things “regressions”?)
-- Pitfalls: Outliers
-- Pitfalls: Extrapolation
-- Conditions for a good regression

Which looks like a stronger relationship?

Optical Illusion?

[Two scatterplots: correlation = .97 and correlation = .71]

Shoe size vs. hours of TV…

Linear models and non-linear models
Model A: y = a + bx + error
Model B: y = a x^(1/2) + error
Model B has smaller errors. Is it a better model?
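As a rough illustration (not from the slides), here is how one might fit both models to a small made-up data set and compare their residual sums of squares; the data values and the use of NumPy's least-squares routine are assumptions for the sketch.

import numpy as np

# Hypothetical data, invented for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 2.7, 3.5, 4.1, 4.4, 4.8, 5.4, 5.6])

# Model A: y = a + b*x  (fit a and b by least squares)
A_lin = np.column_stack([np.ones_like(x), x])
coef_a, *_ = np.linalg.lstsq(A_lin, y, rcond=None)
resid_a = y - A_lin @ coef_a

# Model B: y = a * sqrt(x)  (fit a by least squares)
A_sqrt = np.sqrt(x).reshape(-1, 1)
coef_b, *_ = np.linalg.lstsq(A_sqrt, y, rcond=None)
resid_b = y - A_sqrt @ coef_b

# Compare the sizes of the errors under each model
print("Model A (linear) SSE:", np.sum(resid_a**2))
print("Model B (sqrt)   SSE:", np.sum(resid_b**2))

On these made-up numbers the square-root model comes out with the smaller sum of squared errors, echoing the slide; whether that makes it the better model is exactly the tradeoff raised next.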

y = (a deliberately absurd jumble of symbols) + error
This model has even smaller errors. In fact, zero errors.
Tradeoff: Small errors vs. complexity. (We’ll only consider linear models.)

The “Regression” Effect
A preschool program attempts to boost children’s reading scores. Children are given a pre-test and a post-test.
Pre-test: mean score ≈ 100, SD ≈ 10
Post-test: mean score ≈ 100, SD ≈ 10
The program seems to have no effect.

A closer look at the data shows a surprising result:
-- Children who were below average on the pre-test tended to gain about 5-10 points on the post-test.
-- Children who were above average on the pre-test tended to lose about 5-10 points on the post-test.

Maybe we should provide the program only for children whose pre-test scores are below average?

Fact: In most test–retest and analogous situations, the bottom group on the first test will on average tend to improve, while the top group on the first test will on average tend to do worse.
Other examples:
-- Students who score high on the midterm tend on average to score high on the final, but not as high.
-- An athlete who has a good rookie year tends to slump in his or her second year. (“Sophomore jinx”, “Sports Illustrated jinx”)
-- Tall fathers tend to have sons who are tall, but not as tall. (Galton’s original example!)
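A short worked example, not on the slides and assuming a midterm-to-final correlation of about 0.7 purely for illustration: in standard units the regression prediction is (predicted final z-score) = r × (midterm z-score), so a student 2 SDs above average on the midterm is predicted to be only about 0.7 × 2 = 1.4 SDs above average on the final. Still high, but not as high.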

It works the other way, too:
-- Students who score high on the final tend to have scored high on the midterm, but not as high.
-- Tall sons tend to have fathers who are tall, but not as tall.
-- Students who did well on the post-test showed improvements, on average, of 5-10 points, while students who did poorly on the post-test dropped an average of 5-10 points.

Students can do well on the pre-test...
-- because they are good readers, or
-- because they get lucky.
The good readers, on average, do exactly as well on the post-test. The lucky group, on average, scores lower.
Students can get unlucky, too, but fewer of that group are among the high scorers on the pre-test. So the top group on the pre-test, on average, tends to score a little lower on the post-test.
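A small simulation sketch, not from the slides, makes the same point; the "true skill plus luck" model and every number in it are assumptions, chosen only so that the pre-test and post-test each come out with mean ≈ 100 and SD ≈ 10.

import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical model: each child has a stable "true skill", and each test
# adds independent luck. There is NO program effect at all.
true_skill = rng.normal(100, 6, size=n)
pre = true_skill + rng.normal(0, 8, size=n)
post = true_skill + rng.normal(0, 8, size=n)

below = pre < pre.mean()

# The below-average group "improves" and the above-average group "declines",
# purely from the luck washing out: the regression effect.
print("Avg change, below-average pre-test group:", (post - pre)[below].mean())
print("Avg change, above-average pre-test group:", (post - pre)[~below].mean())

With these made-up numbers the below-average group gains about 5 points on average and the above-average group loses about 5, even though the simulated "program" does nothing at all.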

Outliers [figure credit: W. H. Freeman, publishers]

Extrapolation
Interpolation: using a model to estimate Y for an X value within the range on which the model was based.
Extrapolation: estimating based on an X value outside the range.
Interpolation Good, Extrapolation Bad.
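A brief sketch of the distinction in code (made-up data, NumPy assumed): a line fit on ages 2-10 interpolates sensibly at age 7 but gives a silly answer when pushed far outside that range.

import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: children's ages (2-10 years) and heights (cm),
# roughly linear over this range
age = rng.uniform(2, 10, size=50)
height = 80 + 6 * age + rng.normal(0, 3, size=50)

slope, intercept = np.polyfit(age, height, deg=1)
predict = lambda a: intercept + slope * a

print("Predicted height at age 7 (interpolation):", round(predict(7)))    # plausible
print("Predicted height at age 50 (extrapolation):", round(predict(50)))  # absurd

Growth is roughly linear over the fitted range, so the age-7 prediction is plausible; extending the same line to age 50 predicts a person nearly four meters tall.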

Nixon’s Graph: Economic Growth
[Chart built up over successive slides, with labels “Start of Nixon Adm.”, “Now”, and “Projection”]

Conditions for regression
-- “Straight enough” condition (linearity)
-- Errors are mostly independent of X
-- Errors are mostly independent of anything else you can think of
-- Errors are more-or-less normally distributed

How to test the quality of a regression:
-- Plot the residuals. Pattern bad, no pattern good.
-- R²
-- How sure are you of the coefficients?
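A rough sketch of those checks (made-up data; NumPy and matplotlib assumed available): fit a line, plot the residuals against x to look for a pattern, and compute R² from the residuals.

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=80)
y = 3 + 2 * x + rng.normal(0, 2, size=80)   # hypothetical, roughly linear data

slope, intercept = np.polyfit(x, y, deg=1)
fitted = intercept + slope * x
residuals = y - fitted

# Residual plot: a visible pattern suggests the linear model is wrong
plt.scatter(x, residuals)
plt.axhline(0)
plt.xlabel("x")
plt.ylabel("residual")
plt.show()

# R^2: fraction of the variance in y explained by the line
r_squared = 1 - np.sum(residuals**2) / np.sum((y - y.mean())**2)
print("R^2 =", round(r_squared, 3))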

Computing correlation…
1. Replace each variable with its standardized version:
   x_i′ = (x_i − x̄) / s_x,   y_i′ = (y_i − ȳ) / s_y
2. Take an “average” of (x_i′ times y_i′):
   r = [ Σ x_i′ y_i′ ] / (n − 1)
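A minimal sketch of that two-step recipe in code (the shoe-size and TV-hours numbers are invented), checked against NumPy's built-in correlation.

import numpy as np

# Hypothetical data
x = np.array([7.0, 8.5, 9.0, 10.0, 10.5, 11.0, 12.0])   # shoe size
y = np.array([1.0, 2.0, 1.5, 3.0, 2.5, 4.0, 3.5])       # hours of TV

# Step 1: standardize each variable (sample SD, ddof=1)
x_std = (x - x.mean()) / x.std(ddof=1)
y_std = (y - y.mean()) / y.std(ddof=1)

# Step 2: "average" the products, dividing by n - 1
r = np.sum(x_std * y_std) / (len(x) - 1)

print("r by hand:      ", r)
print("r via corrcoef: ", np.corrcoef(x, y)[0, 1])

NumPy's corrcoef returns the same value, since correlation does not depend on whether n or n − 1 is used, as long as the choice is made consistently.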