Describing Scatterplots

Describing Scatterplots Section 3.1

Data Univariate data: data that involve a single variable per case. This is the type of data we have been working with so far.

Data Bivariate data: data that involve two variables per case. For quantitative variables, these are often displayed as a scatterplot.

Scatterplot A scatterplot shows the relationship between two quantitative variables (y vs. x).

Describing Scatterplots Recall, for univariate data we use shape, center, and spread to summarize data. For bivariate data, we use shape, trend, and strength.

Describing Scatterplots Describing scatterplots is a 6-step process.

6-Step Process
1. Identify cases and variables
2. Describe overall shape
3. Describe trend
4. Describe strength
5. Does pattern generalize to other cases?
6. Plausible explanation for pattern, or is there a lurking variable?

1. Identify Cases and Variables Need these to put the data in context; otherwise, the data are meaningless.

Cases and Variables Scatterplot: each point represents one case. The x-coordinate equals the value of one variable and the y-coordinate equals the value of the other. Describe the scale (units of measurement) and range of each variable.
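
As a minimal sketch of this bookkeeping, the Python below stores each case as an (x, y) pair and reports the range of each variable. The data values, variable names, and units are hypothetical and chosen only for illustration; they do not come from the slides.

```python
# Hypothetical data: each tuple is one case (one student),
# x = hours studied, y = exam score in points. Values are made up.
cases = [(1, 52), (2, 58), (3, 61), (4, 70), (5, 74), (6, 83)]

def variable_range(values):
    """Return (minimum, maximum) of one variable."""
    return min(values), max(values)

xs = [x for x, _ in cases]  # x-coordinates: first variable
ys = [y for _, y in cases]  # y-coordinates: second variable

x_range = variable_range(xs)
y_range = variable_range(ys)

print(f"{len(cases)} cases, one point per case")
print(f"x (hours studied): {x_range[0]} to {x_range[1]}")
print(f"y (exam score, points): {y_range[0]} to {y_range[1]}")
```

Stating the cases, units, and ranges first gives every later statement about shape, trend, and strength a context.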

2. Describe Overall Shape Linearity: is pattern linear, curved, or none at all? Clusters: is there just one cluster or more than one? Outliers: any striking exceptions to the pattern?

Shape: Linear Points do not have to fall exactly on a line to have a linear pattern!

Shape: Linear

Shape: Curved

Shape: Curved

Shape: Curved

Shape: None

Shape: None

Outliers: Extreme x-value

Outliers: Extreme x-value

Outliers: Extreme y-value

Outliers: Extreme y-value

Outliers: Do Not Follow General Trend

Outliers: Do Not Follow General Trend
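
The first two kinds of outlier (extreme x-value, extreme y-value) can be screened for numerically. The sketch below is one possible approach, not the slides' method: it assumes the 1.5 × IQR rule from univariate data as the cutoff, and the data and the function names `iqr_fences` and `extreme_cases` are hypothetical.

```python
import statistics

def iqr_fences(values):
    """Lower and upper fences using the 1.5 * IQR rule (an assumed cutoff)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def extreme_cases(cases):
    """Return cases whose x-value or y-value falls outside the fences."""
    xs = [x for x, _ in cases]
    ys = [y for _, y in cases]
    x_lo, x_hi = iqr_fences(xs)
    y_lo, y_hi = iqr_fences(ys)
    flagged = []
    for x, y in cases:
        reasons = []
        if not x_lo <= x <= x_hi:
            reasons.append("extreme x-value")
        if not y_lo <= y <= y_hi:
            reasons.append("extreme y-value")
        if reasons:
            flagged.append(((x, y), reasons))
    return flagged

# Hypothetical data for illustration only.
cases = [(1, 10), (2, 12), (3, 11), (4, 13), (5, 12),
         (6, 14), (7, 13), (20, 15), (4, 40)]

flagged = extreme_cases(cases)
for point, reasons in flagged:
    print(point, reasons)
```

This check looks at each variable separately, so it cannot find the third kind of outlier: a point with ordinary x- and y-values that still lies far from the general trend. That kind is best judged from the plot itself.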

Clusters Is there just one cluster or is there more than one?

3. Describe Trend If y tends to get larger as x gets larger, there is a positive trend.

Trend

Trend Think of slope!

Trend If y tends to get smaller as x gets larger, there is a negative trend.

Trend

Trend If there is no shape, then there is no trend.

Trend
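
The "think of slope" hint can be made concrete. The sketch below labels the trend by the sign of a least-squares slope; this is one possible way to formalize the idea, not something the slides prescribe. The data, the function names, and the tolerance `tol` are assumptions for illustration.

```python
def least_squares_slope(xs, ys):
    """Slope of the least-squares line through the points (xs[i], ys[i])."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def describe_trend(xs, ys, tol=1e-9):
    """Label the trend by the sign of the fitted slope.

    tol is an arbitrary cutoff (an assumption) for treating a slope as zero.
    """
    slope = least_squares_slope(xs, ys)
    if slope > tol:
        return "positive"
    if slope < -tol:
        return "negative"
    return "none"

# Hypothetical data for illustration only.
xs = [1, 2, 3, 4, 5]
rising = [2, 4, 5, 4, 6]
falling = [9, 7, 6, 4, 3]
flat = [3, 3, 3, 3, 3]

for label, ys in [("rising", rising), ("falling", falling), ("flat", flat)]:
    print(label, describe_trend(xs, ys))
```

The sign of a fitted slope is only meaningful when the shape is roughly linear. A strongly curved pattern can produce a slope near zero even though the variables are clearly related, so check the shape first.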

4. Describe Strength If the points cluster closely around an imaginary line or curve, the strength is strong.

Strength

4. Describe Strength If the points are scattered farther away from an imaginary line or curve, the strength decreases.

Strength

Strength

Strength
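
For linear patterns, one common numeric summary of strength is the correlation coefficient r. The slides describe strength visually and do not give a formula here, so treat the sketch below as a supplementary illustration under that assumption. The data and the function name `pearson_r` are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data for illustration only.
xs = [1, 2, 3, 4, 5, 6]
tight = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]  # points hug a line
loose = [5, 1, 8, 3, 9, 6]                # points scatter widely

r_tight = pearson_r(xs, tight)
r_loose = pearson_r(xs, loose)
print(f"tight cluster: r = {r_tight:.3f}")
print(f"loose scatter: r = {r_loose:.3f}")
```

Values of r near 1 or -1 correspond to points clustered closely around a line; values near 0 correspond to wide scatter. Because r measures only linear association, it can understate the strength of a curved pattern.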

Variability of Strength Is strength constant or does the strength vary?

Variability: Fairly Uniform

Variability: Fairly Uniform Could be strong, moderate, or weak and still be fairly uniform.

Variability: Heteroscedasticity

Variability: Heteroscedasticity
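
A rough numeric check for heteroscedasticity is to compare how far the points sit from a fitted line at small x-values versus large x-values. The sketch below is one simple way to do that, assuming a least-squares line and a split of the cases into two halves by x; the data and function names are hypothetical, and this is not a formal test.

```python
import math

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def rms(values):
    """Root-mean-square size of a list of residuals."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def residual_spread_by_half(xs, ys):
    """RMS residual for the smaller-x half and the larger-x half of the cases."""
    slope, intercept = fit_line(xs, ys)
    pairs = sorted(zip(xs, ys))  # order cases by x
    residuals = [y - (slope * x + intercept) for x, y in pairs]
    mid = len(residuals) // 2
    return rms(residuals[:mid]), rms(residuals[mid:])

# Hypothetical fan-shaped data: scatter grows as x grows.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
fan = [1.0, 2.1, 2.9, 4.2, 3.5, 8.0, 4.5, 11.0]

left_spread, right_spread = residual_spread_by_half(xs, fan)
print(f"spread for smaller x: {left_spread:.2f}")
print(f"spread for larger x:  {right_spread:.2f}")
```

Similar spreads in both halves suggest fairly uniform variability; a much larger spread in one half suggests the strength varies across x.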

5. Pattern Generalize? Does pattern generalize to other cases or is relationship a case of “what you see is all you get?”

6. Plausible Explanation Are there plausible explanations for pattern? Is it reasonable to conclude that change in one variable causes a change in the other? Is there a third or lurking variable that might be causing both variables to change?

6. Plausible Explanation
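
A small simulation shows how a lurking variable can produce a strong pattern between two variables that have no direct effect on each other. In this sketch, the variable z, the noise level, and the seed are all assumptions chosen for illustration: x and y are each computed from z alone, yet they end up strongly related.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(0)  # fixed seed so the run is repeatable

# z is the lurking variable; x and y each depend on z plus noise.
# Neither x nor y appears in the formula for the other.
z = [float(i) for i in range(50)]
x = [zi + rng.gauss(0, 1) for zi in z]
y = [2 * zi + rng.gauss(0, 1) for zi in z]

r_xy = pearson_r(x, y)
print(f"correlation between x and y: {r_xy:.3f}")
```

A scatterplot of y vs x from this simulation would show a strong positive linear trend, but changing x would do nothing to y. Only knowledge of how the data arose can separate this situation from a causal one.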

Generalization vs Explanation Prices of houses sold in Gainesville, FL in one month

Example Turn to page 110 and look at problem P1.

Page 110, P1

Page 110, P1 b. These data are not very interesting to describe. The x-axis shows ages 2 to 7 years, and the y-axis shows the median height of children at each age. The shape is linear, the trend is positive, and the strength is very strong. That is, the scatterplot shows a very strong positive linear trend. Students may mention that a typical child grows about 2.7 inches per year.

Page 110, P1 c. The linear trend could reasonably be expected to hold for another year. However, median height could not be expected to increase at this rate to age 50, as people typically stop growing around age 20.
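
The arithmetic behind this caution can be sketched in a few lines. The growth rate of 2.7 inches per year and the upper age of 7 come from part b; the function name `projected_gain` is hypothetical.

```python
GROWTH_PER_YEAR = 2.7   # inches per year, the figure quoted in part b
LAST_OBSERVED_AGE = 7   # oldest age shown on the x-axis

def projected_gain(target_age):
    """Height gain in inches if the linear trend continued to target_age."""
    return GROWTH_PER_YEAR * (target_age - LAST_OBSERVED_AGE)

gain_to_8 = projected_gain(8)
gain_to_50 = projected_gain(50)

print(f"age 8:  +{gain_to_8:.1f} inches")
print(f"age 50: +{gain_to_50:.1f} inches ({gain_to_50 / 12:.1f} feet)")
```

One more year adds a believable 2.7 inches. Continuing the same line to age 50 adds 43 × 2.7 = 116.1 inches, nearly 10 feet on top of a 7-year-old's height, which shows why the pattern cannot be generalized that far outside the observed ages.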

Page 110, P1 d. In the background is something called “growing up” that happens over time during the early years of life. That is, an increase in age is associated with an increase in height.

Page 111, E1

Page 111, E1 Plot a shows a positive relationship that is strong and linear. There is fairly uniform variation across all values of x.

Page 111, E1 Plot b shows a negative relationship that is strong and linear, again with fairly uniform variation across all values of x.

Page 111, E1 Plot c shows a positive relationship that is moderate and linear with fairly uniform variation across all values of x. One point lies a short distance from the bulk of the data.

Page 111, E1 Plot d shows a negative relationship that is moderate and linear with fairly uniform variation across all values of x. Again, there is one outlier.

Page 111, E1 Plot e shows a positive relationship that is strong and linear except for the outlier. The one outlier has dramatic influence on the strength of this relationship. There is fairly uniform variation across all values of x.

Page 111, E1 Plot f shows a negative relationship that is very strong and curved. One point on the far right lies in the general pattern but far away from the remainder of the data, which accentuates the strong relationship. Another outlier lies below the bulk of the data on the left.

Page 111, E1 Plot g shows a negative relationship that is strong and curved. The two points at either end of the array accentuate the curvature. There is a bit more variability among values of y for smaller values of x than for larger values of x.

Page 111, E1 Plot h shows a positive relationship that is strong and curved. Again, the outlier on the extreme right accentuates the curved pattern and would have dramatic influence on where a trend line might be placed. The variability in y is fairly constant across all values of x.

Page 113, E5

Page 113, E5 a. Plots A, B, and C are the most linear. Plot D is not linear because of the seven universities in the lower right, which may be different from the rest. Plot A, of graduation rate versus alumni giving rate, gives some impression of downward curvature. However, if you disregard the point in the upper right, the impression of any curvature disappears.

Page 113, E5 Plots A, B, and C all have just one cluster. However, plot D, the plot of graduation rate versus top 10% in high school, has two clusters. Most of the points follow the upward linear trend, but the cluster of seven points in the lower right with the highest percentage of freshmen in the top 10% shows little relationship with the graduation rate.

Page 113, E5 Plots A and C have possible outliers. Plot A, the plot of graduation rate versus alumni giving rate, has a possible outlier in the upper right. The point is below the general trend and its x-value (but not its y-value) is unusually large. In plot C, the plot of graduation rate versus SAT 75th percentile, the points toward the upper left and the middle right should be examined because they are farther from the general trend than the other points, although neither their x-values nor their y-values are unusual. The point in the lower right of plot D should also be examined, along with the other six points nearby.

Page 113, E5 b. Plots A and C have similar moderate positive linear trends. Plot D shows wide variation in both variables, with little or no trend. Plot B is the only plot that shows a negative trend.

Page 113, E5 c. Among these four variables, it appears that the alumni giving rate is the best predictor of the graduation rate, and SAT scores (as measured by the 75th percentile) are second best. However, both of these relationships are moderate and neither is a strong predictor of graduation rate. Ranking in high school class (as measured by the top 10%) is almost useless as a predictor of college graduation rate. Plot A, of graduation rate versus alumni giving rate, owes part of the impression of a strong relationship to the point in the upper right. This plot shows some heteroscedasticity, with the graduation rate varying more with smaller alumni giving rates.

Questions?