Exploring Relationships Between Variables

Slides:



Advertisements
Similar presentations
Looking at data: relationships - Correlation Lecture Unit 7.
Advertisements

Chapter 6: Exploring Data: Relationships Lesson Plan
Chapter 7 Scatterplots, Association, Correlation Scatterplots and correlation Fitting a straight line to bivariate data © 2006 W. H. Freeman.
LECTURE 2 Understanding Relationships Between 2 Numerical Variables
Chapter 7 Scatterplots and Correlation Scatterplots: graphical display of bivariate data Correlation: a numerical summary of bivariate data.
Association between 2 variables We've described the distribution of 1 variable in Chapter 1 - but what if 2 variables are measured on the same individual?
Relationships Scatterplots and correlation BPS chapter 4 © 2006 W.H. Freeman and Company.
Chapter 6: Exploring Data: Relationships Chi-Kwong Li Displaying Relationships: Scatterplots Regression Lines Correlation Least-Squares Regression Interpreting.
Chapter 3: Examining relationships between Data
Examining Relationships
Chapter 6: Exploring Data: Relationships Lesson Plan Displaying Relationships: Scatterplots Making Predictions: Regression Line Correlation Least-Squares.
LECTURE UNIT 7 Understanding Relationships Among Variables Scatterplots and correlation Fitting a straight line to bivariate data.
Chapter 6 Scatterplots and Correlation Chapter 7 Objectives Scatterplots  Scatterplots  Explanatory and response variables  Interpreting scatterplots.
1 Examining Relationships in Data William P. Wattles, Ph.D. Francis Marion University.
Exploring Relationships Between Variables Chapter 7 Scatterplots and Correlation.
Objectives (IPS Chapter 2.1)
Objectives 2.1Scatterplots  Scatterplots  Explanatory and response variables  Interpreting scatterplots  Outliers Adapted from authors’ slides © 2012.
Chapter 7 Scatterplots, Association, and Correlation.
Chapter 4 - Scatterplots and Correlation Dealing with several variables within a group vs. the same variable for different groups. Response Variable:
Relationships Scatterplots and Correlation.  Explanatory and response variables  Displaying relationships: scatterplots  Interpreting scatterplots.
Warm up In a population of 10 th grade girls, the mean height is 65 in. If 15% of the population is under 61 inches tall, what is the standard deviation?
Lecture 8 Sections Objectives: Bivariate and Multivariate Data and Distributions − Scatter Plots − Form, Direction, Strength − Correlation − Properties.
1 Simple Linear Regression Review 1. review of scatterplots and correlation 2. review of least squares procedure 3. inference for least squares lines.
Lecture 4 Chapter 3. Bivariate Associations. Objectives (PSLS Chapter 3) Relationships: Scatterplots and correlation  Bivariate data  Scatterplots (2.
Scatter plots Adapted from 350/
3. Relationships Scatterplots and correlation
CHAPTER 3 Describing Relationships
Variables Dependent variable: measures an outcome of a study
CHAPTER 3 Describing Relationships
Chapter 6: Exploring Data: Relationships Lesson Plan
Simple Linear Regression Review 1
Module 11 Math 075. Module 11 Math 075 Bivariate Data Proceed similarly as univariate distributions … What is univariate data? Which graphical models.
Exploring Relationships Between Variables
Review for Test Chapters 1 & 2:
Looking at data: relationships - Correlation
Data Analysis and Statistical Software I ( ) Quarter: Autumn 02/03
Describing Bivariate Relationships
Chapter 6: Exploring Data: Relationships Lesson Plan
The Practice of Statistics in the Life Sciences Fourth Edition
Scatterplots, Association, and Correlation
AGENDA: Quiz # minutes Begin notes Section 3.1.
Chapter 7 Part 1 Scatterplots, Association, and Correlation
Variables Dependent variable: measures an outcome of a study
Examining Relationships in Data
Chapter 2 Looking at Data— Relationships
Examining Relationships
Objectives (IPS Chapter 2.3)
Looking at data: relationships - Caution about correlation and regression - The question of causation IPS chapters 2.4 and 2.5 © 2006 W. H. Freeman and.
CHAPTER 3 Describing Relationships
CHAPTER 3 Describing Relationships
Chapter 4 - Scatterplots and Correlation
Chapter 3 Scatterplots and Correlation.
CHAPTER 3 Describing Relationships
CHAPTER 3 Describing Relationships
CHAPTER 3 Describing Relationships
CHAPTER 3 Describing Relationships
Summarizing Bivariate Data
CHAPTER 3 Describing Relationships
CHAPTER 3 Describing Relationships
Correlation/regression using averages
Association between 2 variables
CHAPTER 3 Describing Relationships
Describing Bivariate Relationships
CHAPTER 3 Describing Relationships
Describing Relationships
Day 8 Agenda: Quiz 1.1 & minutes Begin Ch 3.1
Bivariate Data Response Variable: measures the outcome of a study (aka Dependent Variable) Explanatory Variable: helps explain or influences the change.
Review of Chapter 3 Examining Relationships
CHAPTER 3 Describing Relationships
Correlation/regression using averages
Presentation transcript:

Exploring Relationships Between Variables Scatterplots and Correlation

Objectives Scatterplots Explanatory and response variables Interpreting scatterplots Outliers Categorical variables in scatterplots Correlation The correlation coefficient “r” r does not distinguish x and y r has no units r ranges from -1 to +1 Influential points

Basic Terminology Univariate data: 1 variable is measured on each sample unit or population unit e.g. height of each student in a sample Bivariate data: 2 variables are measured on each sample unit or population unit e.g. height and GPA of each student in a sample; (caution: data from 2 separate samples is not bivariate data)

Same goals with bivariate data that we had with univariate data Graphical displays and numerical summaries Seek overall patterns and deviations from those patterns Descriptive measures of specific aspects of the data

Here, we have two quantitative variables for each of 16 students. Beers Blood Alcohol 1 5 0.1 2 0.03 3 9 0.19 4 7 0.095 0.07 6 0.02 8 0.085 0.12 10 0.04 11 0.06 12 0.05 13 14 0.09 15 0.01 16 Here, we have two quantitative variables for each of 16 students. 1) How many beers they drank, and 2) Their blood alcohol level (BAC) We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots Useful method to graphically describe the relationship between 2 quantitative variables

Scatterplot: Blood Alcohol Content vs Number of Beers In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph. Student Beers BAC 1 5 0.1 2 0.03 3 9 0.19 4 7 0.095 0.07 6 0.02 8 0.085 0.12 10 0.04 11 0.06 12 0.05 13 14 0.09 15 0.01 16 Quantitative data - have two pieces of data per individual and wonder if there is an association between them. Very important in biology, as we not only want to describe individuals but also understand various things about them. Here for example we plot BAC vs number of beers. Can clearly see that there is a pattern to the data. When you drink more beers you generally have a higher BAC. Dots are arranged in pretty straight line - a linear relationship. And since when one goes up, the other does too, it is a positive linear relationship. Also see that

Focus on Three Features of a Scatterplot Look for an overall pattern regarding … Shape - ? Approximately linear, curved, up-and-down? Direction - ? Positive, negative, none? Strength - ? Are the points tightly clustered in the particular shape, or are they spread out? … and deviations from the overall pattern: Outliers 

Scatterplot: Fuel Consumption vs Car Weight. x=car weight, y=fuel cons. (xi, yi): (3.4, 5.5) (3.8, 5.9) (4.1, 6.5) (2.2, 3.3) (2.6, 3.6) (2.9, 4.6) (2, 2.9) (2.7, 3.6) (1.9, 3.1) (3.4, 4.9)

Explanatory and response variables A response variable measures or records an outcome of a study. An explanatory variable explains changes in the response variable. Typically, the explanatory or independent variable is plotted on the x axis, and the response or dependent variable is plotted on the y axis. Explanatory (independent) variable: number of beers Response (dependent) variable: blood alcohol content x y An example of a study in which you are looking at the effects of number of beers on blood alcohol content. If you think about it, the response is obviously an increase in blood alcohol, and we want see if we can explain it by the number of beers drunk. Always put the explanatory variable on the x axis and response variable on the y axis.

SAT Score vs Proportion of Seniors Taking SAT 2005 IW IL NC 74% 1010

Objectives Correlation The correlation coefficient “r” r does not distinguish x and y r has no units r ranges from -1 to +1 Influential points

The correlation coefficient "r" The correlation coefficient is a measure of the direction and strength of the linear relationship between 2 quantitative variables. It is calculated using the mean and the standard deviation of both the x and y variables. Correlation can only be used to describe quantitative variables. Categorical variables don’t have means and standard deviations.

Correlation: Fuel Consumption vs Car Weight

Example: calculating correlation (x1, y1), (x2, y2), (x3, y3) (1, 3) (1.5, 6) (2.5, 8) Automate calculation of the correlation! (Excel, statcrunch, calculator, etc.)

Properties of Correlation r is a measure of the strength of the linear relationship between x and y. No units [like demand elasticity in economics (-infinity, 0)] -1 < r < 1

Properties (cont.) r ranges from -1 to+1 "r" quantifies the strength and direction of a linear relationship between 2 quantitative variables. Strength: how closely the points follow a straight line. Direction: is positive when individuals with higher X values tend to have higher values of Y.

Properties of Correlation (cont.) r = -1 only if y = a + bx with slope b<0 r = +1 only if y = a + bx with slope b>0 y = 1 + 2x y = 11 - x

Properties (cont.) High correlation does not imply cause and effect CARROTS: Hidden terror in the produce department at your neighborhood grocery Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin!!! Everyone who ate carrots in 1865 is now dead!!! 45 of 50 17 yr olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest !!!

Properties (cont.) Cause and Effect There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen-more damage) Improper training? Will no firemen present result in the least amount of damage?

Properties (cont.) Cause and Effect (1,2) (24,75) (1,0) (18,59) (9,9) (3,7) (5,35) (20,46) (1,0) (3,2) (22,57) x = fouls committed by player; y = points scored by same player The correlation is due to a third “lurking” variable – playing time r measures the strength of the linear relationship between x and y; it does not indicate cause and effect correlation r = .935