
Optical illusion? Correlation (r, R, or ρ)
-- One-number summary of the strength of a relationship
-- How to recognize it
-- How to compute it
Regressions
-- Any model has predicted values and residuals. (Do we always want a model with small residuals?)
-- Regression lines
--- how to use them
--- how to compute them
-- The "regression effect" (Why did Galton call these things "regressions"?)
-- Pitfalls: Outliers
-- Pitfalls: Extrapolation
-- Conditions for a good regression

Which looks like a stronger relationship?

Optical Illusion ?

Kinds of Association… Positive vs. Negative Strong vs. Weak Linear vs. Non-linear

CORRELATION (or, the CORRELATION COEFFICIENT) measures the strength of a linear relationship. If the relationship is non-linear, it measures the strength of the linear part of the relationship. But then it doesn’t tell the whole story. Correlation can be positive or negative.

[Scatterplots: correlation = .97 and correlation = .71.]

[Scatterplots with X and Y axes: correlation = –.97, and a second negative correlation whose value is missing in the source.]

[Scatterplot with X and Y axes; the correlation value is missing in the source.]

[Scatterplots: correlation = .24 and correlation = .90.]

[Scatterplots: correlation = .50 and correlation = 0.]

Computing correlation…
1. Replace each variable with its standardized version: x_i' = (x_i − mean of x) / (SD of x), and likewise y_i'.
2. Take an "average" of the products x_i' · y_i':  r = [sum of x_i' · y_i'] / (n − 1)

Computing correlation: r = (x_1'·y_1' + x_2'·y_2' + … + x_n'·y_n') / (n − 1), the sum of all the products divided by n − 1. (Why n − 1 rather than n? The same convention as the sample SD.) The statistic is written r, R, or Greek ρ (rho).
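
The recipe above can be sketched in plain Python. The study-hours data here are made up for illustration; only the formula comes from the slides.

```python
# Hypothetical data: hours studied (x) vs. exam score (y).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [52.0, 60.0, 55.0, 70.0, 75.0]

def mean(v):
    return sum(v) / len(v)

def sd(v):
    # Sample SD: divide by n - 1, matching the n - 1 in the formula for r.
    m = mean(v)
    return (sum((a - m) ** 2 for a in v) / (len(v) - 1)) ** 0.5

def correlation(x, y):
    mx, my, sx, sy = mean(x), mean(y), sd(x), sd(y)
    # Standardize each variable, then "average" the products.
    return sum(((a - mx) / sx) * ((b - my) / sy)
               for a, b in zip(x, y)) / (len(x) - 1)

print(round(correlation(x, y), 3))
```

Because r is built entirely from standardized values, it is unitless and symmetric in x and y, which is exactly the "good things about correlation" listed below.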

Good things about correlation:
-- It's symmetric: the correlation of x and y is the same as the correlation of y and x.
-- It doesn't depend on scale or units. Adding a constant to either variable, or multiplying either by a positive constant, doesn't change r; of course not, since r depends only on the standardized versions.
-- r is always in the range from −1 to 1. 1 means perfect positive correlation (dots exactly on an upward-sloping line); −1 means perfect negative correlation (dots exactly on a downward-sloping line); 0 means no relationship, OR no linear relationship.

Bad things about correlation Sensitive to outliers Misses non-linear relationships Doesn’t imply causality

Made-up Examples: [scatterplot of STATE AVE SCORE vs. PERCENT TAKING SAT]

Made-up Examples: [scatterplot of IQ vs. SHOE SIZE]

Made-up Examples: [scatterplot of JUDGE'S IMPRESSION vs. BAKING TEMP]

Made-up Examples: [scatterplot of LIFE EXPECTANCY vs. GDP PER CAPITA]

Observed Values, Predictions, and Residuals: [scatterplot with the explanatory variable on the x-axis and the response variable on the y-axis]

Observed Values, Predictions, and Residuals: [same scatterplot, with one point's observed value, its predicted value on the line, and the vertical gap between them labeled] Residual = observed − predicted.
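
The residual arithmetic can be sketched in a few lines of Python; the fitted line and the data points here are hypothetical, not from the slides.

```python
# Hypothetical fitted line: predicted score = 50 + 5 * (hours studied).
def predict(hours):
    return 50 + 5 * hours

# (hours studied, observed score) pairs -- made-up data.
points = [(2, 58), (4, 67)]

for hours, observed in points:
    residual = observed - predict(hours)  # residual = observed - predicted
    print(hours, residual)
```

A positive residual means the point sits above the line (the model under-predicts); a negative residual means it sits below.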

Linear models and non-linear models. Model A: y = a + bx + error. Model B: y = a·x^(1/2) + error. Model B has smaller errors. Is it a better model?

Model "aa opas asl poasie ;aaslkf" (deliberate nonsense): y = )_(*_n &*^(*LKH l;j;)(*&)(*& + error. This model has even smaller errors; in fact, zero errors. Tradeoff: small errors vs. complexity. (We'll only consider linear models.)

About Lines. The same line can be written many ways:
y = mx + b (slope m, y-intercept b)
y = b + mx
y = α + βx
y = β0 + β1x
y = b0 + b1x (slope b1, y-intercept b0)

Computing the best-fit line.
In the STANDARDIZED scatterplot, the line -- goes through the origin -- has slope r.
In the ORIGINAL scatterplot, the line -- goes through the "point of means" (mean of x, mean of y) -- has slope r × (SD of Y) / (SD of X).
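
That recipe translates directly into plain Python (the data are hypothetical, not from the slides): the slope is r scaled by the ratio of SDs, and the intercept is whatever makes the line pass through the point of means.

```python
# Made-up data: hours studied (x) vs. exam score (y).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [52.0, 60.0, 55.0, 70.0, 75.0]

def mean(v): return sum(v) / len(v)

def sd(v):
    m = mean(v)
    return (sum((a - m) ** 2 for a in v) / (len(v) - 1)) ** 0.5

def corr(x, y):
    mx, my = mean(x), mean(y)
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / ((len(x) - 1) * sd(x) * sd(y)))

r = corr(x, y)
slope = r * sd(y) / sd(x)              # slope = r * SD(y) / SD(x)
intercept = mean(y) - slope * mean(x)  # forces the line through the point of means
print(slope, intercept)
```

This is algebraically the same line the least-squares criterion picks; the slide's recipe is just a convenient way to compute it from r and the two SDs.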

The “Regression” Effect A preschool program attempts to boost children’s reading scores. Children are given a pre-test and a post-test. Pre-test: mean score ≈ 100, SD ≈ 10 Post-test:mean score ≈ 100, SD ≈ 10 The program seems to have no effect.

A closer look at the data shows a surprising result: Children who were below average on the pre-test tended to gain about 5-10 points on the post-test Children who were above average on the pre-test tended to lose about 5-10 points on the post-test. Maybe we should provide the program only for children whose pre-test scores are below average?

Fact: In most test–retest and analogous situations, the bottom group on the first test will on average tend to improve, while the top group on the first test will on average tend to do worse. Other examples: Students who score high on the midterm tend on average to score high on the final, but not as high. An athlete who has a good rookie year tends to slump in his or her second year. (“Sophomore jinx”, "Sports Illustrated Jinx") Tall fathers tend to have sons who are tall, but not as tall. (Galton’s original example!)

It works the other way, too: Students who score high on the final tend to have scored high on the midterm, but not as high. Tall sons tend to have fathers who are tall, but not as tall. Students who did well on the post-test showed improvements, on average, of 5-10 points, while students who did poorly on the post-test dropped an average of 5-10 points.

Students can do well on the pre-test -- because they are good readers, or -- because they got lucky. The good readers, on average, do exactly as well on the post-test. The lucky group, on average, scores lower. Students can get unlucky, too, but fewer of that group are among the high-scorers on the pre-test. So the top group on the pre-test, on average, tends to score a little lower on the post-test.

Extrapolation Interpolation: Using a model to estimate Y for an X value within the range on which the model was based. Extrapolation: Estimating based on an X value outside the range. Interpolation Good, Extrapolation Bad.

Nixon’s Graph: Economic Growth. [Chart: economic growth plotted from the "Start of Nixon Adm." to "Now," with a "Projection" extrapolated beyond the data.]

Conditions for regression:
-- "Straight enough" condition (linearity)
-- Errors are mostly independent of X
-- Errors are mostly independent of anything else you can think of
-- Errors are more-or-less normally distributed

How to test the quality of a regression:
-- Plot the residuals. A pattern is bad; no pattern is good.
-- Check R², the fraction of the variation in Y explained by the line.
-- How sure are you of the coefficients?
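
A minimal sketch of the residual-and-R² check in plain Python, using a hypothetical fitted line and made-up data (the numbers are illustrative, not from the slides):

```python
# Made-up data and a hypothetical fitted line y_hat = 45.6 + 5.6 * x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [52.0, 60.0, 55.0, 70.0, 75.0]
predict = lambda a: 45.6 + 5.6 * a

# Residuals: observed minus predicted. For a least-squares line they
# sum to (essentially) zero, since the line passes through the means.
residuals = [b - predict(a) for a, b in zip(x, y)]

# R^2 = 1 - SS_residual / SS_total: the fraction of the variation
# in y that the line accounts for.
ss_res = sum(e ** 2 for e in residuals)
mean_y = sum(y) / len(y)
ss_tot = sum((b - mean_y) ** 2 for b in y)
r_squared = 1 - ss_res / ss_tot
print(round(r_squared, 3))
```

For simple linear regression, R² is just the square of the correlation coefficient r, so the one-number summaries on both halves of this lecture are two views of the same quantity.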