Chapter 4 Describing Relationships: Scatterplots and Correlation


Objectives (BPS chapter 4)
Relationships: Scatterplots and correlation
- Explanatory and response variables
- Displaying relationships: scatterplots
- Interpreting scatterplots
- Adding categorical variables to scatterplots
- Measuring linear association (correlation)
- Facts about correlation

Scatterplot
A scatterplot is a graph in which paired (x, y) data (usually collected on the same individuals) are plotted with one variable on the horizontal (x) axis and the other on the vertical (y) axis. Each pair (x, y) is plotted as a single point.
Example:

[Table: Student, Number of Beers, Blood Alcohol Level; values not preserved]
Here we have two quantitative variables for each of 16 students:
1. How many beers they drank
2. Their blood alcohol content (BAC)
We are interested in the relationship between the two variables: how is one affected by changes in the other?

Scatterplots
In a scatterplot, one axis represents each of the variables, and the data are plotted as points on the graph.
[Table: Student, Beers, BAC; values not preserved]

Explanatory and response variables
A response variable measures or records an outcome of a study. An explanatory variable explains changes in the response variable. Typically, the explanatory (independent) variable is plotted on the x axis and the response (dependent) variable is plotted on the y axis.
[Figure: scatterplot with x = number of beers (explanatory, independent variable) and y = blood alcohol content (response, dependent variable)]

Some plots don’t have clear explanatory and response variables. Do calories explain sodium amounts? Does percent return on Treasury bills explain percent return on common stocks?

Examining a Scatterplot
You can describe the overall pattern of a scatterplot by its:
- Form: linear, non-linear (quadratic, exponential, etc.), or no correlation
- Direction: positive or negative
- Strength: very strong, strong, moderately strong, weak, etc.
Also look for outliers and how they affect the correlation.

Scatterplot
Example: Draw a scatterplot for the data below. What is the nature of the relationship between x and y?
[Table: x = 1, 2, 3, 4, 5; y values not preserved]
[Figure: scatterplot of y against x]
Answer: Strong, positive, and linear.

Examining a Scatterplot
- Two variables are positively correlated when high values of one tend to occur with high values of the other, and low values with low values. The scatterplot slopes upward from left to right.
- Two variables are negatively correlated when high values of one tend to occur with low values of the other, and vice versa. The scatterplot slopes downward from left to right.

Types of Correlation
[Figure: four scatterplots of y against x]
- Negative linear correlation: as x increases, y tends to decrease.
- No correlation
- Positive linear correlation: as x increases, y tends to increase.
- Non-linear correlation

Examples of Relationships

Caution:
- Describing a relationship by form and direction requires that both variables be quantitative, so that the order of the data points is defined entirely by their values.
- Correspondingly, form and direction are meaningless for categorical data, where the order of the categories is arbitrary. Describe one category at a time.
Example: Beetles trapped on boards of different colors. What association? What relationship?
[Figure: the same data plotted twice, with board color ordered Blue, White, Green, Yellow and then Blue, Green, White, Yellow]

Thought Question 1
What type of association would the following pairs of variables have: positive, negative, or none?
1. Temperature during the summer and electricity bills
2. Temperature during the winter and heating costs
3. Number of years of education and height (elementary school)
4. Frequency of brushing and number of cavities
5. Number of churches and number of bars in cities
6. Height of husband and height of wife

Thought Question 2
Consider the two scatterplots below. How does the outlier affect the correlation in each plot: does it increase the correlation, decrease the correlation, or have no impact?
[Figure: two scatterplots, each containing an outlier; not preserved]
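The effect of an outlier can be checked numerically. The sketch below uses made-up points, not the data from the slide's two plots, and a hypothetical helper `pearson_r` written for this illustration. It shows one outlier that inflates the correlation and one that deflates it.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation from sums of squared and cross deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Case 1 (hypothetical): a patternless cluster, then one far-away point
cluster_x, cluster_y = [1, 2, 1, 2], [2, 1, 1, 2]
r_cluster = pearson_r(cluster_x, cluster_y)
r_cluster_out = pearson_r(cluster_x + [10], cluster_y + [10])

# Case 2 (hypothetical): a perfect line, then one point far off the line
line_x, line_y = [1, 2, 3, 4], [1, 2, 3, 4]
r_line = pearson_r(line_x, line_y)
r_line_out = pearson_r(line_x + [5], line_y + [0])

print(round(r_cluster, 2), round(r_cluster_out, 2))  # outlier inflates r
print(round(r_line, 2), round(r_line_out, 2))        # outlier deflates r
```

In the first case a single distant point creates a strong correlation where the cluster alone has none; in the second, a single point off the trend wipes out a perfect correlation.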

Strength of the association
The strength of the relationship between the two variables can be seen by how much variation, or scatter, there is around the main form. With a strong relationship, you can get a pretty good estimate of y if you know x. With a weak relationship, for any x you might get a wide range of y values.

How to scale a scatterplot
Using an inappropriate scale for a scatterplot can give an incorrect impression. Both variables should be given a similar amount of space:
- Plot roughly square
- Points should occupy all the plot space (no blank space)
[Figure: the same data shown in four plots with different scales]

Adding categorical variables to scatterplots
Often, things are not simple and one-dimensional. We need to group the data into categories to reveal trends. What may look like a positive linear relationship is in fact a series of negative linear associations. Plotting different habitats in different colors allowed us to make that important distinction.
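A small numeric sketch of this situation, assuming made-up observations and habitat labels "A" and "B" (the slide's data are not preserved). The helper `pearson_r` is hypothetical, written for this illustration. The pooled data show a positive correlation while each habitat on its own shows a negative one.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation from sums of squared and cross deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical (x, y, habitat) observations; values and labels are made up
data = [
    (1, 5, "A"), (2, 4, "A"), (3, 3, "A"),
    (4, 8, "B"), (5, 7, "B"), (6, 6, "B"),
]

# Correlation when the habitat is ignored
r_overall = pearson_r([x for x, _, _ in data], [y for _, y, _ in data])

# Correlation within each habitat separately
r_by_habitat = {}
for habitat in sorted({h for _, _, h in data}):
    xs = [x for x, _, h in data if h == habitat]
    ys = [y for _, y, h in data if h == habitat]
    r_by_habitat[habitat] = pearson_r(xs, ys)

print(round(r_overall, 2))  # 0.54
print(r_by_habitat)         # {'A': -1.0, 'B': -1.0}
```

Coloring points by group in a scatterplot makes the same distinction visible at a glance.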

[Figure] Comparison of men’s and women’s racing records over time. Each group shows a very strong negative linear relationship that would not be apparent without the gender categorization.
[Figure] Relationship between lean body mass and metabolic rate in men and women. While both men and women follow the same positive linear trend, women show a stronger association. As a group, males typically have larger values for both variables.

Measuring Strength & Direction of a Linear Relationship
- How closely does a non-horizontal straight line fit the points of a scatterplot?
- The correlation coefficient (often referred to as just the correlation), r, is:
  - a measure of the strength of the relationship: the stronger the relationship, the larger the magnitude of r.
  - a measure of the direction of the relationship: positive r indicates a positive relationship, negative r indicates a negative relationship.

Correlation Coefficient
[Formula not preserved]
The Greek capital letter sigma (Σ) denotes summation, or addition.
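The slide's formula did not survive in this transcript, so the sketch below assumes the standard z-score definition of Pearson's correlation, r = (1/(n - 1)) Σ (z_x · z_y), which matches the later remark that r uses the z-scores of the observations. The function name `pearson_r` and the data are hypothetical; the y values are not the ones from the slide's example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation as the sum of z-score products divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample standard deviations (n - 1 in the denominator)
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    # Standardize each observation, multiply the paired z-scores, and sum
    z_products = sum(
        ((x - mean_x) / sd_x) * ((y - mean_y) / sd_y)
        for x, y in zip(xs, ys)
    )
    return z_products / (n - 1)

# Hypothetical data for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 3))  # 0.775
```

Computational versions of the formula built from Σx, Σy, Σxy, Σx², and Σy² give the same value; the z-score form is used here because it makes the unit-free nature of r explicit.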

Example: Find the correlation between x and y.
[Table: columns x, y, xy; x = 1, 2, 3, 4, 5; y values not preserved]

Correlation Coefficient
- The range of the correlation coefficient is -1 to 1.
- If r = -1, there is a perfect negative correlation.
- If r = 1, there is a perfect positive correlation.
- If r is close to 0, there is little or no linear correlation.

Linear Correlation
[Figure: four scatterplots of y against x]
- Strong negative correlation: r = -0.91
- Strong positive correlation: r = 0.88
- Weak positive correlation: r = 0.42
- Non-linear correlation: r = 0.07

Correlation Coefficient
- Special values for r:
  - A perfect positive linear relationship would have r = +1.
  - A perfect negative linear relationship would have r = -1.
  - If there is no linear relationship, or if the scatterplot points are best fit by a horizontal line, then r = 0.
  - Note: r must be between -1 and +1, inclusive.
- r > 0: as one variable changes, the other variable tends to change in the same direction.
- r < 0: as one variable changes, the other variable tends to change in the opposite direction.

Correlation Coefficient
- Because r uses the z-scores for the observations, it does not change when we change the units of measurement of x, y, or both.
- Correlation ignores the distinction between explanatory and response variables.
- r measures the strength of only linear association between variables.
- A large value of r does not necessarily mean that there is a strong linear relationship between the variables: the relationship might not be linear. Always look at the scatterplot.
- When r is close to 0, it does not mean that there is no relationship between the variables; it means there is no linear relationship.
- Outliers can inflate or deflate correlations.
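The first two facts are easy to verify numerically. The sketch below assumes made-up ages and incomes and a hypothetical helper `pearson_r`. It changes the units of both variables and then swaps the roles of x and y.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation from sums of squared and cross deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical ages (years) and incomes (thousands of dollars)
age_years = [25, 32, 41, 48, 56, 63]
income_k = [31, 45, 52, 50, 68, 61]

# Same observations in different units: months and dollars
age_months = [12 * a for a in age_years]
income_dollars = [1000 * v for v in income_k]

r_original = pearson_r(age_years, income_k)
r_rescaled = pearson_r(age_months, income_dollars)
r_swapped = pearson_r(income_k, age_years)  # explanatory and response exchanged

# All three values agree up to floating-point rounding
print(r_original, r_rescaled, r_swapped)
```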

Not all Relationships are Linear
Miles per Gallon versus Speed
- Curved relationship (r is misleading)
- Speed chosen for each subject varies from 20 mph to 60 mph
- MPG varies from trial to trial, even at the same speed
- Statistical relationship
[Figure: MPG against speed, r = -0.06]
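A minimal sketch of why r misleads here, assuming made-up MPG values that rise and then fall symmetrically over the 20 to 60 mph range (these are not the data behind the slide's r = -0.06). The helper `pearson_r` is hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation from sums of squared and cross deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical speeds (mph) and fuel economy (mpg): MPG peaks at 40 mph
speed = [20, 30, 40, 50, 60]
mpg = [22, 28, 30, 28, 22]

r = pearson_r(speed, mpg)
print(r)  # 0.0
```

MPG is completely determined by speed in this made-up data, yet r is zero, because the rising half of the curve cancels the falling half. Only a scatterplot reveals the pattern.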

Common Errors Involving Correlation
1. Causation: It is wrong to conclude that correlation implies causality.
2. Averages: Averages suppress individual variation and may inflate the correlation coefficient.
3. Linearity: There may be some relationship between x and y even when there is no linear correlation.
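The second error can be illustrated with a small sketch, assuming made-up individual observations in which y varies widely at each value of x. The helper `pearson_r` is hypothetical. Replacing the individuals with the mean of y at each x removes the scatter and pushes r toward 1.

```python
from collections import defaultdict
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation from sums of squared and cross deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical individuals: three widely spread y values at each x
individuals = [(x, x + offset) for x in (1, 2, 3, 4) for offset in (0, 4, 8)]

r_individuals = pearson_r(
    [x for x, _ in individuals], [y for _, y in individuals]
)

# Average y within each x value, then correlate the averages
groups = defaultdict(list)
for x, y in individuals:
    groups[x].append(y)
mean_xs = sorted(groups)
mean_ys = [sum(groups[x]) / len(groups[x]) for x in mean_xs]
r_means = pearson_r(mean_xs, mean_ys)

print(round(r_individuals, 2), round(r_means, 2))  # 0.32 1.0
```

A correlation computed from averages therefore says little about how well x predicts y for an individual.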

Example
A survey of the world’s nations in 2004 shows a strong positive correlation between the percentage of each country’s population using cell phones and life expectancy in years at birth.
a) Does this mean that cell phones are good for your health?
No. It simply means that in countries where cell phone use is high, life expectancy tends to be high as well.
b) What might explain the strong correlation?
The economy could be a lurking variable. Richer countries generally have more cell phone use and better health care.

Example
The correlation between Age and Income as measured on 100 people is r = 0.75. Explain whether or not each of these conclusions is justified.
a) When Age increases, Income increases as well.
b) The form of the relationship between Age and Income is linear.
c) There are no outliers in the scatterplot of Income vs. Age.
d) Whether we measure Age in years or months, the correlation will still be 0.75.

Example
Explain the mistakes in the statements below:
a) “My correlation of [value not preserved] between GDP and Infant Mortality Rate shows that there is almost no association between GDP and Infant Mortality Rate.”
b) “There was a correlation of 0.44 between GDP and Continent.”
c) “There was a very strong correlation of 1.22 between Life Expectancy and GDP.”

Key Concepts
- Strength of Linear Relationship
- Direction of Linear Relationship
- Correlation Coefficient
- Common Problems with Correlations
- r can only be calculated for quantitative data.