Bivariate Data analysis. Bivariate Data: In this PowerPoint we look at sets of data which contain two variables. Scatter plots, Correlation, Outliers, Causation.

Presentation transcript:

Bivariate Data analysis

Bivariate Data
In this PowerPoint we look at sets of data which contain two variables.
Scatter plots, Correlation, Outliers, Causation

Variables
Quantitative (numerical; measurements and counts): discrete or continuous.
Qualitative (categorical; define groups): ordinal (fall in a natural order) or categorical (no idea of order).
We are only going to consider quantitative variables in this AS.

Quantitative
Discrete (many repeated values): age groups, marks.
Continuous (few repeated values): height, length, weight.

Qualitative
Categorical: gender, religious denomination, blood types, sports numbers (e.g. he wears the number ‘8’ jersey).
Ordinal: grades, places in a race (e.g. 1st, 2nd, 3rd).

We often want to know if there is a relationship between two numerical variables. A scatter plot, which gives a visual display of the relationship between two variables, provides a good starting point.

In a relationship involving two variables, if the values of one variable ‘depend’ on the values of another variable, then the former variable is referred to as the dependent (or response) variable and the latter variable is referred to as the independent (or explanatory) variable.
y-axis: dependent (response) variable
x-axis: independent (explanatory) variable

Consider data on ‘hours of study’ vs ‘test score’.
[Table with columns Hours and Score]

We may want to see if we could predict the test score (response variable) based on the hours of study (explanatory variable).
y-axis: Test score
x-axis: Hours of study

We look for a pattern in the way the points lie. Certain patterns tell us about the relationship. This is called correlation.
[Scatter plot annotation: This point is an outlier]

We could describe the rest of the data as having a linear form.

Scatter plots
Use hollow circles for points.
Label axes correctly, with units.
What you want to predict goes on the y-axis (response variable).
Give the graph a title.
No background; no gridlines.
No legend unless you need to show categories.
Show different categories on a single graph in different colours rather than on separate graphs.
Adjust scale and size of font (14pt for pasting).

What to look for in your plot?
Direction of the relationship: positive or negative.
Form of the graph: linear or curved.
Strength: whether it is strong, moderate or weak.
Scatter: constant scatter, a fan effect…
Outliers.
Groupings.

Page 22

What do you see in this scatter plot?
[Scatter plot: Mean January Air Temperatures for 30 New Zealand Locations; Temperature (°C) against Latitude (°S)]
There appears to be a linear trend. There appears to be moderate constant scatter. Negative association. No outliers or groupings visible.

What do you see in this scatter plot?
[Scatter plot: % of population who are Internet Users vs GDP per capita for 202 Countries; y-axis Internet Users (%)]
There appears to be a non-linear trend. There appears to be non-constant scatter about the trend line. Positive association. One possible outlier (large GDP, low % Internet Users).

What do you see in this scatter plot?
Two non-linear trends (Male and Female). Very little scatter about the trend lines. Negative association until about 1970, then a positive association. Gap in the data collection (Second World War).

Rank these relationships from weakest (1) to strongest (4):

Describe these relationships
Perfect, negative, linear relationship.
Perfect, positive, linear relationship.
No relationship.
Moderate, negative, linear relationship.
Weak, positive, linear relationship.

Describe this relationship.

As the hours of study increase, the test score....?...

Pearson’s product-moment correlation coefficient, r
Correlation measures the strength of the linear association between two quantitative variables.
The correlation coefficient may take any value between -1.0 and +1.0.
r = -1 or r = 1: points fall exactly on a straight line.
r = 0: no linear relationship (uncorrelated).
[Scale shown on slide: r = -1, -0.7, -0.4, 0, 0.3, 0.8, 1]
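As a rough illustration of the two ends of the scale, the sketch below computes r for made-up data. The function name `pearson_r` and all of the hours and scores are hypothetical, chosen only to show points lying exactly on a line versus points with scatter.

```python
import math

def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: hours of study and three made-up sets of test scores.
hours = [1, 2, 3, 4, 5, 6]
scores_rising = [50 + 5 * h for h in hours]   # exactly on a rising line
scores_falling = [90 - 5 * h for h in hours]  # exactly on a falling line
scores_noisy = [52, 61, 58, 70, 69, 80]       # rising, but with scatter

r_rising = pearson_r(hours, scores_rising)
r_falling = pearson_r(hours, scores_falling)
r_noisy = pearson_r(hours, scores_noisy)
print(round(r_rising, 3), round(r_falling, 3), round(r_noisy, 3))
```

The first two values come out at the ends of the scale (up to rounding), while the scattered data gives a value strictly between 0 and 1.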

r - what does it tell you?
How close the points in the scatter plot come to lying on the line.
[Two scatter plots of y against x: one with r = 0.99, one with r = 0.57]

Interpreting r
Strong positive linear association.
Moderate positive linear association.
Weak positive linear association.
No association or weak linear association.
Weak negative linear association.
Moderate negative linear association.
Strong negative linear association.

Useful websites
Regression by eye: http:// e/index.html
Guessing: http://istics.net/stat/Correlations/
Effect of outliers: http://illuminations.nctm.org/LessonDetail.aspx?ID=L455#whatif

Assumptions
There is a linear relationship between x and y.
x and y are continuous random variables.
The residuals must be normally distributed.
The observations (the x, y pairs) must be independent of each other.
All individuals must be selected at random from the population.
All individuals must have an equal chance of being selected.

What is correlation? A measure of the strength of a LINEAR association between two quantitative variables.

Sure, you can calculate a correlation coefficient for any pair of variables, but correlation measures the strength only of the linear association and will be misleading if the relationship is not linear.

Do you know that: Correlation applies only to quantitative variables. Check you know the units and what they measure. Outliers can distort the correlation dramatically.

Some facts about the correlation coefficient
The sign gives the direction of the association.
Correlation is always between -1 and 1.
Correlation treats x and y symmetrically. The correlation of x and y is the same as the correlation of y with x.
Correlation has no units and is generally given as a decimal.
r is a multiple of the slope.
Note: variables can have a strong association but still have a small correlation if the association isn’t linear.
Correlation is sensitive to outliers. A single outlying value can make a small correlation large or make a large one small.

The sign gives the direction of the association.
[Two scatter plots: Positive, Negative]

Correlation treats x and y symmetrically. The correlation of x and y is the same as the correlation of y with x.
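A small check of this symmetry, and of the related fact that r has no units, might look like the sketch below. The heights and weights are invented for illustration, and `pearson_r` is a hypothetical helper, not something taken from the slides.

```python
import math

def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical heights (inches) and weights (pounds) for six students.
heights_in = [60, 62, 65, 68, 70, 72]
weights_lb = [110, 125, 130, 150, 160, 175]

r_xy = pearson_r(heights_in, weights_lb)   # height as x, weight as y
r_yx = pearson_r(weights_lb, heights_in)   # roles swapped

# Changing the units (inches to cm, pounds to kg) rescales the data
# but should leave r unchanged.
heights_cm = [h * 2.54 for h in heights_in]
weights_kg = [w * 0.45359237 for w in weights_lb]
r_metric = pearson_r(heights_cm, weights_kg)

print(round(r_xy, 4), round(r_yx, 4), round(r_metric, 4))
```

All three printed values agree, which is why it does not matter which variable is called x when only the correlation is being reported.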

r is a multiple of the slope
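One way to see this: for a least-squares line, r equals the slope multiplied by the ratio of the standard deviations, s_x / s_y. The sketch below checks that identity numerically on made-up data; the variable names and values are assumptions for illustration only.

```python
import math

# Hypothetical paired data (hours of study, test score).
xs = [1, 2, 3, 4, 5, 6]
ys = [52, 61, 58, 70, 69, 80]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)

slope = sxy / sxx                 # least-squares slope of y on x
sd_x = math.sqrt(sxx / (n - 1))   # sample standard deviation of x
sd_y = math.sqrt(syy / (n - 1))   # sample standard deviation of y
r = sxy / math.sqrt(sxx * syy)    # correlation coefficient

# r is the slope scaled by sd_x / sd_y, so r and the slope share a sign.
r_from_slope = slope * sd_x / sd_y
print(round(r, 4), round(r_from_slope, 4))
```

Because the scaling factor s_x / s_y is always positive, r and the slope always have the same sign, but a steep slope does not by itself mean a strong correlation.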

Variables can have a strong association but still have a small correlation if the association isn’t linear. Always plot the data before looking at the correlation!
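A deliberately extreme, made-up example: below, y is determined exactly by x (y = x²), yet the correlation coefficient is zero because the relationship is a symmetric curve rather than a line. The data and the helper `pearson_r` are assumptions for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# y depends perfectly on x, but along a U-shaped curve, not a line.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r_curve = pearson_r(xs, ys)
print(r_curve)
```

A plot of these points would show the pattern immediately, which the single number r hides.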

Would it be OK to use a correlation coefficient to describe the strength of the relationship?
[Four scatter plots, two marked √ and two marked X:
Distances of Planets from the Sun; Distance (million miles) against Position Number.
Reaction Times (seconds) for 30 Year 10 Students; Non-dominant Hand and Dominant Hand.
Mean January Air Temperatures for 30 New Zealand Locations; Temperature (°C) against Latitude (°S).
Average Weekly Income for Employed New Zealanders in 2001; Female ($) and Male ($).]

Correlation is sensitive to outliers. A single outlying value can make a small correlation large or make a large one small.

You should be cautious in interpreting the correlation - these graphs all have the same correlation coefficient (0.817)

Data set 1

Data set 2

Data set 3

Data set 4
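The slides do not say where the four data sets come from. The sketch below assumes they resemble Anscombe's quartet, a well-known set of four small data sets whose correlation coefficients are all about 0.816 (close to the value quoted above) even though their scatter plots look completely different. The helper `pearson_r` is a hypothetical name.

```python
import math

def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Anscombe's quartet (assumed, not confirmed, to match the slides' data sets).
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "Data set 1": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "Data set 2": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "Data set 3": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "Data set 4": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

# Compute r for each data set; the values are nearly identical.
correlations = {name: pearson_r(xs, ys) for name, (xs, ys) in quartet.items()}
for name, r in correlations.items():
    print(name, round(r, 2))
```

Only set 1 is a roughly linear cloud; set 2 is curved, set 3 is a perfect line plus one outlier, and set 4 owes its correlation to a single point, so the shared value of r describes none of them fully without the plot.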

Outliers can distort the correlation dramatically. An outlier can make an otherwise small correlation look big or hide a large correlation. It can even give an otherwise positive association a negative correlation coefficient (and vice versa).

What do you see in this scatterplot?
[Scatter plot: Height and Foot Size for 30 Year 10 Students; axes Foot size (cm) and Height (cm)]
Appears to be a linear trend, with a possible outlier (tall person with a small foot size). Appears to be constant scatter. Positive association.

What will happen to the correlation coefficient if the tallest Year 10 student is removed?
[Scatter plot: Height and Foot Size for 30 Year 10 Students; axes Foot size (cm) and Height (cm)]
It will get smaller.
It will get bigger.
It will stay the same.

What do you see in this scatter plot?
[Scatter plot: Life Expectancies and Gestation Period for a sample of non-human Mammals; axes Gestation (Days) and Life Expectancy (Years); one point labelled Elephant]
Appears to be a strong linear trend. Outlier in X (the elephant). Appears to be constant scatter. Positive association.

What will happen to the correlation coefficient if the elephant is removed?
[Scatter plot: Life Expectancies and Gestation Period for a sample of non-human Mammals; axes Gestation (Days) and Life Expectancy (Years); one point labelled Elephant]
It will get smaller.
It will get bigger.
It will stay the same.

How does the outlier affect the r - value?

When you see an outlier, it’s often a good idea to report the correlations with and without the point.
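Reporting both values could be sketched as follows. The data are invented: six points with a strong positive trend plus one extreme point placed so that it reverses the sign of r. The helper `pearson_r` is a hypothetical name.

```python
import math

def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data with a clear positive linear trend.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9]

# The same data with one made-up outlying point added.
outlier = (20, -10)
xs_with = xs + [outlier[0]]
ys_with = ys + [outlier[1]]

r_without = pearson_r(xs, ys)
r_with = pearson_r(xs_with, ys_with)
print("without outlier:", round(r_without, 2))
print("with outlier:   ", round(r_with, 2))
```

Here a single point turns a strong positive correlation into a negative one, which is why quoting both figures, alongside the plot, gives a more honest picture.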

Don’t confuse correlation with causation. Scatterplots and correlation never prove causation.

Using the information in the plot, can you suggest what needs to be done in a country to increase the life expectancy? Explain.
[Scatter plot: Life Expectancy and Availability of Doctors for a Sample of 40 Countries; Life Expectancy against People per Doctor]
Perhaps if you have fewer people per doctor (i.e. more doctors per person), then the life expectancy will increase.

Using the information in this plot, can you make another suggestion as to what needs to be done in a country to increase life expectancy?
[Scatter plot: Life Expectancy and Availability of Televisions for a Sample of 40 Countries; Life Expectancy against People per Television]
It looks like if you decrease the number of people per television (i.e. have more TVs per person), then the life expectancy will increase!

Can you suggest another variable that is linked to life expectancy and the availability of doctors (and televisions) which explains the association between the life expectancy and the availability of doctors (and televisions)?
Some measure of wealth of a country, e.g. average income per person or GDP.
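The effect of a lurking variable can be imitated with entirely synthetic numbers. In the sketch below, a made-up "wealth" score drives both a television index and a life-expectancy index; neither index influences the other, yet they are strongly correlated. All names and values are hypothetical and are not taken from the 40-country sample.

```python
import math

def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Synthetic wealth score for 20 imaginary countries.
wealth = list(range(20))

# Each index is wealth plus its own small, unrelated wobble.
tv_wobble = [(7 * i) % 5 - 2 for i in range(20)]
life_wobble = [(3 * i) % 5 - 2 for i in range(20)]
tv_index = [w + e for w, e in zip(wealth, tv_wobble)]
life_index = [w + e for w, e in zip(wealth, life_wobble)]

# Strong correlation between TVs and life expectancy...
r_tv_life = pearson_r(tv_index, life_index)

# ...which vanishes once the shared dependence on wealth is removed.
r_after_removing_wealth = pearson_r(tv_wobble, life_wobble)

print(round(r_tv_life, 2), round(r_after_removing_wealth, 2))
```

In this toy setup, handing out televisions would change nothing about life expectancy, because the whole association comes from the shared link to wealth.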

Damaged for life by too much TV

Watching too much television as a child causes serious health problems years later, and raises the risk of heart disease, a New Zealand study of 1000 children has found…. It links the amount of time spent in front of the box as a child with obesity, high cholesterol, poor fitness and smoking….

Health Score TV watching r =

Causal relationships
Two general types of studies: experiments and observational studies.
In an experiment, the experimenter determines which experimental units receive which treatments.
In an observational study, we simply compare units that happen to have received each of the treatments.

Causal relationships
Only properly designed and carefully executed experiments can reliably demonstrate causation.
An observational study is often useful for identifying possible causes of effects, but it cannot reliably establish causation.

Causal relationships
In observational studies, strong relationships are not necessarily causal relationships.
Correlation does not imply causation.
Be aware of the possibility of lurking variables.

Watch out for lurking variables. Damage ($) vs number of firemen would show a strong correlation, but damage doesn’t cause firemen and firemen do seem to cause damage (spraying water and chopping holes). The underlying variable is the size of the blaze.

Although there was plenty of evidence that increased smoking was associated with increased levels of lung cancer, it took years to provide evidence that smoking actually causes lung cancer.

It would be a good idea to read the two pages of notes you have that discuss correlation and causation!

So now you want to know how to calculate the correlation coefficient, r. Here is one version of the formula!
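The formula shown on the slide did not survive in this transcript. One common version, given here as an assumption rather than as the slide's own, writes r in terms of deviations from the means of the n paired values:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \; \sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The numerator is positive when x and y tend to be above (or below) their means together, which is what gives r its sign; the denominator rescales the result so that it always lies between -1 and 1.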

Luckily the computer will calculate R² and you can take the square root of this to get r, giving r the same sign as the slope of the trend line. Remember: only when the association is linear.
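Taking a square root always gives a non-negative number, so the sign of r has to be taken from the slope of the fitted line. A minimal sketch, using made-up data with a falling trend (all names and values are assumptions for illustration):

```python
import math

# Hypothetical data with a negative linear trend.
xs = [1, 2, 3, 4, 5]
ys = [9.8, 8.1, 6.2, 3.9, 2.0]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)

# Least-squares line y = intercept + slope * x.
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# R squared: the share of the variation in y explained by the line.
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
r_squared = 1 - sse / syy

# Square root of R squared, with the sign copied from the slope.
r_from_r_squared = math.copysign(math.sqrt(r_squared), slope)

# Direct calculation of r, for comparison.
r_direct = sxy / math.sqrt(sxx * syy)
print(round(r_squared, 4), round(r_from_r_squared, 4), round(r_direct, 4))
```

Without the sign step, the square root alone would report this falling trend as a positive correlation.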

r measures the strength of the relationship, NOT R²!

The words you use
There is a strong, positive, linear relationship between ‘x’ and ‘y’, and when the x-values increase, the y-values increase also. This is indicated by the value of the correlation coefficient, i.e. r = 0.85, which is close to 1.
(Note: Do not use ‘x’ and ‘y’; use what they represent.)