Exploratory Data Analysis: Two Variables


Exploratory Data Analysis: Two Variables FPP 7-9

Exploratory data analysis: two variables
There are three combinations of variables we must consider. We do so in the following order:
  1. One qualitative/categorical variable and one quantitative variable: side-by-side box plots, counts, etc.
  2. Two quantitative variables: scatter plots, correlations, regressions.
  3. Two qualitative/categorical variables: contingency tables (we will cover these later in the semester).

Box plots
A box plot is a graph of five numbers (often called the five-number summary):
  minimum
  1st quartile
  median
  3rd quartile
  maximum
We know how to compute three of the numbers (min, max, median).
To compute the 1st quartile, find the median of the 50% of observations that are smaller than the median.
To compute the 3rd quartile, find the median of the 50% of observations that are bigger than the median.
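The quartile rule above can be sketched in a few lines of Python. This is a minimal illustration, not the slides' own code: the function name `five_number_summary` and the sample values are hypothetical, and it assumes that when the number of observations is odd, the median itself is left out of both halves. Statistical software may use a slightly different quartile convention, so its box plots can differ a little from this hand rule.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule.

    Assumes at least two observations. For an odd count, the middle
    observation is excluded from both halves (an assumed convention).
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]             # observations below the median position
    upper = data[half + n % 2:]     # observations above the median position
    return (data[0], median(lower), median(data), median(upper), data[-1])

# Hypothetical sample, for illustration only.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(five_number_summary(sample))
```

A box plot draws the box from the 1st to the 3rd quartile with a line at the median; the minimum and maximum mark the ends of the whiskers in the simplest version.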

JMP box plots

Side-by-side box plots
Side-by-side box plots are graphical summaries of data when one variable is categorical and the other quantitative.
These plots can be used to compare the distributions of the quantitative variable across the levels of the categorical variable.

Pets and stress
Are there any differences in stress levels when doing tasks with your pet, a good friend, or alone? Allen et al. (1988) asked 45 people to count backwards by 13s and 17s. People were randomly assigned to one of three groups: pet, friend, alone. The response is each subject’s average heart rate during the task.

Pets and stress
It looks like the task is most stressful around friends and least stressful around pets.
Note that we are comparing a quantitative variable (heart rate) across different levels of a categorical variable (group).

Vietnam draft lottery
In 1970, the US government drafted young men for military service in the Vietnam War. These men were drafted by means of a random lottery. Basically, paper slips containing all dates in January were placed in a wooden box and then mixed. Next, all dates in February (including 2/29) were added to the box and mixed. This procedure was repeated until all 366 dates were mixed in the box. Finally, dates were successively drawn without replacement. The first date drawn (Sept. 14) was assigned rank 1, the second date drawn (April 24) was assigned rank 2, and so on. Those eligible for the draft who were born on Sept. 14 were called first to service, then those born on April 24 were called, and so on.
Soon after the lottery, people began to complain that the randomization system was not completely fair. They believed that birth dates later in the year had lower lottery numbers than those earlier in the year (Fienberg, 1971).
What do the data say? Was the draft lottery fair? Let’s do a statistical analysis of the data to find out.

Draft rank by month in the Vietnam draft lottery: Raw data

Draft rank by month in the Vietnam draft lottery: Box plots
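
As a point of comparison for the box plots of draft rank by month, the sketch below simulates what a fair lottery might look like. It does not use the actual 1970 data: the ranks are a random shuffle, and the function name `simulate_fair_lottery` and the seed are assumptions made for illustration. Under a fair lottery, the monthly medians should scatter around the overall median of 183.5 with no systematic trend across months.

```python
import random
from statistics import median

# Days per month in a leap year, so that 2/29 is included (366 dates).
DAYS_IN_MONTH = [31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

def simulate_fair_lottery(seed=0):
    """Assign ranks 1..366 to the 366 birth dates uniformly at random.

    Returns a list of 12 lists, one per month, holding that month's ranks.
    """
    rng = random.Random(seed)
    ranks = list(range(1, 367))
    rng.shuffle(ranks)              # every ordering is equally likely
    by_month, start = [], 0
    for days in DAYS_IN_MONTH:
        by_month.append(ranks[start:start + days])
        start += days
    return by_month

by_month = simulate_fair_lottery()
for month, month_ranks in enumerate(by_month, start=1):
    print(month, median(month_ranks))
```

Running the simulation with several seeds gives a feel for how much monthly medians vary by chance alone, which is the yardstick against which the real box plots can be judged.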

Exploratory data analysis: two quantitative variables
Scatter plots
A scatter plot shows one variable vs. the other in a 2-dimensional graph.
Always plot the explanatory variable, if there is one, on the horizontal axis.
We usually call the explanatory variable x and the response variable y. Alternatively, x is called the independent variable and y the dependent variable.
If there is no explanatory-response distinction, either variable can go on the horizontal axis.

Example

Describing scatter plots
Form: linear, quadratic, exponential
Direction:
  Positive association: an increase in one variable is accompanied by an increase in the other.
  Negative association: an increase in one variable is accompanied by a decrease in the other.
Strength: how closely the points follow a clear form

Describing scatter plots
Form: linear. Direction: positive. Strength: strong.

Correlation coefficient
We need something more than an arbitrary ocular guess to assess the strength of an association between two variables. We need a value that can summarize the strength of a relationship and
  that doesn’t change when units change
  that makes no distinction between the response and explanatory variables

Correlation coefficient
Definition: The correlation coefficient is a quantity used to measure the direction and strength of a linear relationship between two quantitative variables. We will denote this value as r.

Computing the correlation coefficient
Let x and y be any two quantitative variables measured on n individuals.

Correlation coefficient
Remember that $(x_i - \bar{x})/s_x$ and $(y_i - \bar{y})/s_y$ are the standardized values of variables x and y respectively.
The correlation r is an average of the products of the standardized values of the two variables x and y for the n observations.
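
The description of r as an average of products of standardized values can be sketched as follows. This is an illustrative implementation, not the slides' own formula, which did not survive in this transcript. It assumes the standard deviations use divisor n, so the products are averaged over n; if the SDs use divisor n - 1 instead, the sum of products is divided by n - 1, and the resulting r is the same either way. The function name and the sample values are hypothetical.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson r as the average product of standardized values."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Standard deviations with divisor n (assumed convention, see lead-in).
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    # Standardize each variable: subtract the mean, divide by the SD.
    z_x = [(x - mean_x) / sd_x for x in xs]
    z_y = [(y - mean_y) / sd_y for y in ys]
    # Average the products of the paired standardized values.
    return sum(a * b for a, b in zip(z_x, z_y)) / n

# Made-up values, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(correlation(xs, ys), 3))
```

Because each variable is standardized before the products are taken, the units of x and y cancel out, which is why r itself has no units.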

Properties of r
  Makes no distinction between explanatory and response variables
  Both variables must be quantitative (there is no ordering with qualitative variables)
  Is invariant to change of units
  Is between -1 and 1
  Is affected by outliers
  Measures strength of association for only linear relationships!
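
Several of these properties can be checked numerically. The sketch below uses made-up numbers and a hypothetical conversion factor of 0.05; it compares r before and after swapping the variables, rescaling one of them, and adding a single outlying point.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up, roughly linear data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

r = correlation(x, y)
r_swapped = correlation(y, x)                        # roles of x and y exchanged
r_rescaled = correlation(x, [0.05 * v for v in y])   # y in different units (hypothetical factor)
r_outlier = correlation(x + [20.0], y + [0.0])       # one point far from the pattern

print(r, r_swapped, r_rescaled, r_outlier)
```

In this example, swapping or rescaling leaves r unchanged (up to floating-point rounding), while the single added point is enough to turn a strong positive correlation into a negative one.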

True or false?
Let X be GNP for the U.S. in dollars and Y be GNP for Mexico in pesos. Changing Y to U.S. dollars changes the value of the correlation.

[Five scatter plots, each captioned: Correlation coefficient is ____]

Think about it
In each case, say which correlation is higher.
  Height at age 4 and height at age 18, or height at age 16 and height at age 18
  Height at age 4 and height at age 18, or weight at age 4 and weight at age 18
  Height and weight at age 4, or height and weight at age 18

Correlation coefficient
Correlation is not an appropriate measure of association for non-linear relationships.
What would r be for this scatter plot?
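
The scatter plot from this slide is not available in the transcript, so the sketch below uses a hypothetical stand-in: points lying exactly on the parabola y = x^2, placed symmetrically around x = 0. It illustrates how r can fail to detect a relationship that is perfectly predictable but not linear.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical example: y is completely determined by x, but not linearly.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]

r_parabola = correlation(x, y)
print(r_parabola)  # zero: the positive and negative products cancel out
```

Here the products of deviations on the left half of the parabola cancel those on the right half, so r is zero even though knowing x tells you y exactly.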

Correlation coefficient

Correlation coefficient
CORRELATION IS NOT CAUSATION
A substantial correlation between two variables might indicate the influence of other variables on both.
Or, a lack of substantial correlation might mask the effect of other variables.
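
A small simulation can make the first point concrete. In the sketch below, which is an assumed setup and not data from any study, a third variable z drives both x and y, while x and y have no direct effect on each other. The coefficient 2, the sample size, and the seed are arbitrary choices for illustration.

```python
import random
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

rng = random.Random(1)
n = 2000
z = [rng.gauss(0, 1) for _ in range(n)]          # the "other" variable
x = [2 * zi + rng.gauss(0, 1) for zi in z]       # x depends on z plus noise
y = [2 * zi + rng.gauss(0, 1) for zi in z]       # y depends on z plus noise

r_xy = correlation(x, y)

# Subtract z's contribution (known here only because we built the data).
x_rest = [xi - 2 * zi for xi, zi in zip(x, z)]
y_rest = [yi - 2 * zi for yi, zi in zip(y, z)]
r_rest = correlation(x_rest, y_rest)

print(round(r_xy, 2), round(r_rest, 2))
```

In this setup x and y are strongly correlated, yet once the shared influence of z is removed, what remains of x and y is essentially uncorrelated. In real data the other variable is usually not known so neatly, which is why the caution matters.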

Correlation coefficient
CORRELATION IS NOT CAUSATION
Plot of life expectancy of population and number of people per TV for 22 countries (1991 data)

Correlation coefficient
CORRELATION IS NOT CAUSATION
A study showed that there was a strong correlation between the number of firefighters at a fire and the property damage that the fire causes. We should send fewer firefighters to fight fires, right?
This is an example of a lurking variable. What might it be?

Interpreting correlations
A newspaper article contains a quote from a psychologist, who says, “The evidence indicates the correlation between the research productivity and teaching rating of faculty members is close to zero.” The paper reports this as “The professor said that good researchers tend to be poor teachers, and vice versa.” Did the newspaper get it right?

Correlation coefficient
What’s wrong with each of these statements?
  There exists a high correlation between the gender of American workers and their income.
  The correlation between amount of sunlight and plant growth was r = 0.35 centimeters.
  There is a correlation of r = 1.78 between speed of reading and years of practice.

Examining many correlations simultaneously
The correlation matrix displays correlations for all pairs of variables.
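
A correlation matrix can be built by computing r for every pair of variables. The sketch below is a minimal illustration; the variable names and values are made up, and `correlation_matrix` is a hypothetical helper rather than a function from any particular statistics package.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def correlation_matrix(columns):
    """Map each pair of column names to their correlation.

    `columns` is a dict of name -> list of values (all the same length).
    """
    names = list(columns)
    return {a: {b: correlation(columns[a], columns[b]) for b in names}
            for a in names}

# Made-up measurements on five individuals, for illustration only.
data = {
    "height": [150, 160, 165, 172, 180],
    "weight": [52, 58, 63, 70, 79],
    "age": [34, 25, 41, 29, 38],
}

matrix = correlation_matrix(data)
for a in data:
    print(a.ljust(8), "  ".join(f"{matrix[a][b]:6.2f}" for b in data))
```

The matrix is symmetric, because r makes no distinction between the two variables, and its diagonal is all 1s, because every variable is perfectly correlated with itself.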

Ecological correlation
Correlations based on rates or averages. How will using rates or averages affect r?
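
One way to explore this question is by simulation. The sketch below is an assumed setup, not real data: individuals belong to 10 groups, each group has its own typical level for x and y, and individuals vary widely around their group's level. It then compares r computed from individuals with r computed from the 10 group averages. The group count, group size, spread, and seed are arbitrary choices.

```python
import random
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

rng = random.Random(2)
all_x, all_y = [], []
avg_x, avg_y = [], []

for g in range(10):                     # group g has typical level g
    xs = [g + rng.gauss(0, 5) for _ in range(200)]
    ys = [g + rng.gauss(0, 5) for _ in range(200)]
    all_x += xs
    all_y += ys
    avg_x.append(sum(xs) / len(xs))     # group average of x
    avg_y.append(sum(ys) / len(ys))     # group average of y

r_individuals = correlation(all_x, all_y)
r_averages = correlation(avg_x, avg_y)
print(round(r_individuals, 2), round(r_averages, 2))
```

In this setup the correlation between group averages comes out far stronger than the correlation between individuals, because averaging removes the spread within each group. A correlation based on averages can therefore overstate how closely the two variables are related for individuals.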