

Scatterplots

Learning Objectives
By the end of this lecture, you should be able to:
– Describe what a scatterplot is
– Be comfortable with the terms explanatory variable and response variable
– Describe a scatterplot in terms of form, direction, and strength
– Define what is meant by an outlier, and identify outliers on a scatterplot
– Recognize why poorly chosen scales on a scatterplot can give misleading impressions of the data

Examining Relationships
Up to this point, we have focused on single-variable (“univariate”) data, e.g., women’s heights, the percentage of Hispanics in each state, SAT scores, etc. Most statistical studies involve more than one variable, and a great deal of analysis goes into examining the relationship between two variables.
Example: We may be interested in the relationship between:
– The number of beers a person consumed at a party
– That person’s blood alcohol level (BAC)
With the proper statistical tools we can try to determine things like:
– Is there a relationship? I.e., does the number of beers affect blood alcohol level?
– If there is a relationship, can we predict how much each beer contributes to BAC?
A great human flaw: It is tempting to assume intuitively that there is a relationship between two variables. However, this can lead to highly erroneous conclusions. As humans, we love to make assumptions, find patterns that don’t truly exist, and then jump to conclusions. This is a well-known flaw in the human character, and we should be aware of it. We will discuss this topic in more detail as we progress through the course.

[Table: Student, Beers, Blood Alcohol. 16 rows, one per student; only the first row survives in this transcript: S1, 5 beers, blood alcohol 0.1.]
Here, we have two quantitative variables for each of 16 students (n = 16):
1) How many beers they drank
2) Their blood alcohol level (BAC)
We are interested in the relationship between the two variables: how is one affected by changes in the other?

Looking for relationships between variables
– Start with a graph (always, whenever possible)
– Look for an overall pattern, and for deviations from the pattern (deviations such as outliers are sometimes the most interesting part!)
– If appropriate, try to provide numerical descriptions of the data and the overall pattern

Scatterplots
In a scatterplot, each of the two variables is represented on its own axis, and each individual in the data is plotted as a point on the graph.
[Table columns: Student, Beers, BAC]
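To make the idea of "one axis per variable, one point per individual" concrete, here is a minimal sketch that draws a crude text scatterplot in plain Python. The function name `text_scatter` and the beer/BAC values are made up for illustration; they are not the lecture's data set, and in practice you would use statistical software to draw the plot.

```python
def text_scatter(xs, ys, width=40, height=12):
    """Return a crude text scatterplot: x runs left to right, y bottom to top.

    Assumes xs and ys each contain at least two distinct values.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a column / row index inside the grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # grid row 0 is the TOP of the plot
    body = "\n".join("|" + "".join(line) for line in grid)
    return body + "\n+" + "-" * width

# Made-up illustrative values, NOT the lecture's data.
beers = [1, 2, 3, 4, 5, 7, 9]                        # explanatory variable (x)
bac = [0.01, 0.03, 0.04, 0.07, 0.08, 0.10, 0.15]     # response variable (y)

plot = text_scatter(beers, bac)
print(plot)
```

Each `*` is one student: its horizontal position comes from the number of beers and its vertical position from the BAC.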

Explanatory and response variables
A response variable measures or records an outcome of a study. An explanatory variable explains (“causes”) the changes in the response variable. Typically, the explanatory variable is plotted on the x axis, and the response variable is plotted on the y axis.
[Figure: x axis: Number of Beers (explanatory variable); y axis: Blood Alcohol Content (response variable).]

Terminology: Dependent / Independent
Instead of explanatory / response, you will often encounter the terms independent and dependent:
– Independent for explanatory
– Dependent for response
The two pairs of terms are pretty much interchangeable, although there is a subtle difference. Because explanatory and response are the more accurate terms, I would like you to focus on those.
– You will occasionally see SPSS use dependent/independent.

Which should be the explanatory, and which the response?
The variable that you think “causes” the change in the other variable should be the explanatory variable.
– (This is why it is frequently called the ‘independent’ variable. But as was just mentioned, there is a subtle distinction between the terms, which we may get to down the road.)
The variable that “responds” to a change in the explanatory variable is, then, the response variable.
Example:
– Exercise vs. calories burned? Answer: The amount of exercise will (hopefully!) result in a change in calories burned, whereas burning calories does not ‘cause’ a change in exercise. So exercise should be our explanatory variable, and calories burned the response variable.
– Exam score vs. hours studying? Answer: We would expect the number of hours spent studying to cause a change in exam score rather than the other way around. So ‘hours studying’ would be our explanatory variable, and exam score the response variable.

Describing/Interpreting scatterplots
When describing a scatterplot, we describe the relationship by examining the form, direction, and strength of the association. We look for an overall pattern:
– Form: linear (a straight line), curved, clusters, no pattern
– Direction: positive, negative, no direction
– Strength: how closely the points fit the “form”

Form of an association: Linear / Nonlinear / No Relationship
[Figure: three example scatterplots, labeled Linear, Nonlinear, and No relationship.]

Direction of a linear association: Positive or Negative
A linear relationship is given a directional description of positive or negative.
– Positive association: High values of one variable tend to occur together with high values of the other variable.
– Negative association: High values of one variable tend to occur together with low values of the other variable.
Note that we only describe the direction of the relationship when the relationship is linear.
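The definitions of positive and negative association can be sketched in code. One possible way (an assumption of this example, not a method given in the lecture, which judges direction by eye) is to check whether above-average x values tend to pair with above-average or below-average y values, using the sign of the summed products of deviations from the means. The function name `direction` and the sample values are hypothetical.

```python
def direction(xs, ys):
    """Classify direction from the sign of sum((x - mean_x) * (y - mean_y)).

    Illustrative only: the sign says nothing about strength, and it is
    only meaningful when the form of the relationship is roughly linear.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # A product is positive when x and y are both above or both below their means.
    total = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "no direction"

# Made-up values for illustration.
print(direction([1, 2, 3, 4], [2, 4, 5, 9]))   # high x goes with high y
print(direction([1, 2, 3, 4], [9, 5, 4, 2]))   # high x goes with low y
```

Because any small nonzero sum still produces a label, this check should follow a look at the graph, not replace it.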

Scatterplot Direction: No Relationship
Sometimes there isn’t any relationship: X and Y may vary, but are independent of each other. Knowing a value for X tells you nothing about the value for Y. We describe this as ‘no relationship’.

Scatterplot: Strength of the association
The strength of the relationship between the two variables can be seen by how much variation, or scatter, there is around the main form.
With a strong relationship, you can get a pretty good estimate of y if you know x.
With a weak relationship, for any x you might get a wide range of y values. (You could probably make a reasonable argument that the relationship in this plot isn’t even linear.)
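The idea that a weak relationship gives "a wide range of y values" for a given x can be illustrated with a small sketch. It groups the y values by their x value and reports the spread (maximum minus minimum) in each group. The helper name `y_range_by_x` and both data sets are made up for illustration; this is not the technique for quantifying strength that the lecture defers to a later session.

```python
from collections import defaultdict

def y_range_by_x(xs, ys):
    """For each distinct x, return the spread (max - min) of the y values seen there."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    return {x: max(values) - min(values) for x, values in sorted(groups.items())}

# Made-up data: the same x values paired with tightly and loosely scattered y values.
xs = [1, 1, 2, 2]
strong_ys = [10, 11, 20, 21]   # knowing x pins y down closely
weak_ys = [5, 25, 3, 30]       # knowing x leaves y wide open

print(y_range_by_x(xs, strong_ys))  # small spreads at each x
print(y_range_by_x(xs, weak_ys))    # large spreads at each x
```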

This is a strong relationship. The daily amount of gas consumed can be predicted quite accurately for a given temperature value.
This is a relatively weak relationship. For a particular state median household income, you can’t predict the state per capita income very well.

Describing the strength For now we are using the admittedly vague terms ‘strong, moderate, weak’. In a subsequent lecture on scatterplots, we will learn a technique for quantifying the strength.

Describing/Interpreting scatterplots
As mentioned earlier, when you are asked to interpret a scatterplot, you should be familiar with these 3 terms in particular:
– Form: linear, curved, clusters, no pattern
– Direction: positive, negative, no direction
– Strength: how closely the points fit the “form”
– Note: Recall that if the relationship is not linear, we will not bother to describe direction or strength.

Examples – Describe each plot
– Form: linear. Direction: positive. Strength: strong.
– Form: linear. Direction: negative. Strength: moderate.
– Form: no relationship. Note that knowing x does not tell us anything about y. As a result, the terms ‘positive/negative’ don’t apply. Neither does strength.

Examples
– Form: non-linear. Therefore, we don’t bother trying to describe direction or strength.
– Form: linear. Direction: positive. Strength: moderate.
In our next lecture on scatterplots, we will discuss a tool for quantifying the strength of the relationship.

Lying with statistics: How (not) to scale a scatterplot
Using an inappropriate scale for a scatterplot can give an incorrect impression. Ideally, both variables should be given a similar amount of space:
– The plot should be roughly square
– The points should occupy most of the plot space
[Figure: four plots of the same data, each drawn with different scales.]
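The guideline that points should occupy most of the plot space can be sketched as a small calculation. The example below picks axis limits that hug the data with a little padding, and measures what fraction of an axis the data actually spans. The helper names `padded_limits` and `fill_fraction`, the 5% padding, and the sample values are assumptions made for illustration.

```python
def padded_limits(values, pad_fraction=0.05):
    """Return (axis_min, axis_max) that enclose the data with a small margin."""
    lo, hi = min(values), max(values)
    pad = (hi - lo) * pad_fraction
    return lo - pad, hi + pad

def fill_fraction(values, axis_min, axis_max):
    """Fraction of the axis length that the data actually spans."""
    return (max(values) - min(values)) / (axis_max - axis_min)

# Made-up x values for illustration.
x = [60, 62, 65, 70]

snug = padded_limits(x)
print(snug, fill_fraction(x, *snug))   # data fills most of the axis
print(fill_fraction(x, 0, 250))        # data squeezed into a sliver of the axis
```

Applying the same check to both axes gives a rough sense of whether a plot is wasting space in one direction, which is what makes the same data look flatter or steeper than it is.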

How to scale a scatterplot
[Figure: four plots of the same data.]
In other words, if faced with this group of plots, you should be suspicious of most of them!

Outliers An outlier is a data value that has a very low probability of occurrence (i.e., it is unusual or unexpected). In a scatterplot, outliers are points that fall outside of the overall pattern of the relationship.

Outliers
Not an outlier: The upper right-hand point here is not an outlier of the relationship. It is what you would expect for this many beers given the linear relationship between beers/weight and blood alcohol.
Outlier: This point is not in line with the others, so it is an outlier of the relationship.

IQ score and Grade point average
Describe in words what this plot shows.
– We are looking to see if there is a relationship between IQ score and GPA.
Describe the direction, shape, and strength. Are there outliers?
– Shape: linear
– Direction: positive
– Strength: appears somewhat weak
– Outliers present? There appear to be outliers, but it is hard to say.

IQ score and Grade point average
Are there outliers present? The circled datapoints (and perhaps some of the others too) appear to be outliers. Still, it is hard to say. How do we decide?
Recall that on a scatterplot, we consider a datapoint to be an outlier if it is way off the “line”. If the “regression” line (the line through the points) looks like the one here, then both circled datapoints would almost certainly be considered outliers.

IQ score and Grade point average
Are there outliers present? If the regression line looks like the one drawn here, then certainly the lower circled datapoint (and probably some of the others nearby as well) would be considered an outlier.

IQ score and Grade point average
Are there outliers present? Conversely, if the regression line looks like the one drawn here, then certainly the upper circled datapoint (and probably several of the others nearby as well) would be considered an outlier. But the lower one would not be.
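The point of the last three slides, that which datapoints count as outliers depends on which line you draw, can be sketched in a few lines of code. The example flags points whose vertical distance from a candidate line exceeds a cutoff. Everything here is hypothetical: the function name `far_from_line`, the (IQ, GPA) pairs, the three candidate lines, and the cutoff of 3.5 are made up for illustration, and the lecture itself judges "way off the line" by eye.

```python
def far_from_line(points, slope, intercept, threshold):
    """Return the points whose vertical distance from y = slope*x + intercept exceeds threshold."""
    return [(x, y) for x, y in points
            if abs(y - (slope * x + intercept)) > threshold]

# Made-up (IQ, GPA) pairs, NOT the lecture's data.
points = [(100, 10.0), (100, 2.0), (90, 5.0), (110, 7.0)]

# Three hypothetical candidate lines with the same slope but different heights.
middle = far_from_line(points, slope=0.125, intercept=-6.5, threshold=3.5)
high = far_from_line(points, slope=0.125, intercept=-3.5, threshold=3.5)
low = far_from_line(points, slope=0.125, intercept=-9.5, threshold=3.5)

print(middle)  # both the upper and the lower point are flagged
print(high)    # only the lower point is flagged
print(low)     # only the upper point is flagged
```

The same four points give three different answers, which is why the choice of line cannot be left to guesswork.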

WHICH line, then, is the “correct” regression line? Answer: Once again, we use a mathematical model to draw a regression line. We will discuss how to do so in our next lecture on scatterplots.