Lecture 8 Sections 3.1-3.2 Objectives: Bivariate and Multivariate Data and Distributions − Scatter Plots − Form, Direction, Strength − Correlation − Properties of Correlation.



Multivariate Data A multivariate data set consists of observations made simultaneously on two or more variables. One important special case is bivariate data, in which observations on only two variables, x and y, are available. We’ll study: 1) the scatter plot, a graphical tool for gaining insight into the nature of any relationship between x and y; 2) the correlation coefficient, a numerical measure of how strongly two variables are linearly related; 3) the regression problem, a statistical tool for modeling the relationship between two variables and predicting y from x.

Scatter Plots A scatter plot is a graphical tool for displaying the association between two quantitative variables measured on the same individuals. A scatter plot cannot display the association between two qualitative variables, or between a qualitative variable and a quantitative variable. A response variable measures or records an outcome of a study. An explanatory variable explains changes in the response variable. Typically, the explanatory (independent) variable is plotted on the x axis, and the response (dependent) variable is plotted on the y axis.

Scatter Plots After plotting two variables on a scatterplot, we describe the relationship by examining the form, direction, and strength of the association. We look for an overall pattern:
- Form: linear, curved, clusters, no pattern
- Direction: positive, negative, no direction
- Strength: how closely the points fit the “form”
and for deviations from that pattern:
- Outliers

Scatter Plots [Figure: three example scatter plots showing a linear relationship, a nonlinear relationship, and no relationship]

Strength of association The strength of the relationship between the two variables can be seen by how much variation, or scatter, there is around the main form. With a strong relationship, you can get a pretty good estimate of y if you know x. With a weak relationship, for any x you might get a wide range of y values.

Outliers An outlier is a data value that has a very low probability of occurrence (i.e., it is unusual or unexpected). In a scatterplot, outliers are points that fall outside of the overall pattern of the relationship.

Example Forest growth and decline phenomena throughout the world have attracted considerable public and scientific interest. The following observations were taken on y=mean crown dieback (%) and x=soil pH (in the article “Relationships Among Crown Condition, Growth, and Stand Nutrition in Seven Northern Vermont Sugarbushes”, Cana. J. of Forest Res., 1995: ): x: y: Plot the scatterplot and explain the association.

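Sample Correlation Coefficient Pearson’s sample correlation coefficient r is given by
$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right), $$
where $\bar{x}$ and $\bar{y}$ are the sample means and $s_x$ and $s_y$ are the sample standard deviations of the x and y values.
• The correlation coefficient is a measure of the direction and strength of a linear relationship.
• It is calculated using the mean and the standard deviation of both the x and y variables.
• Correlation can only be used to describe quantitative variables. Categorical variables don’t have means and standard deviations.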
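The calculation described on this slide can be sketched in a few lines of Python. This is a minimal illustration, assuming the usual definition of r as the sum of products of standardized x and y values divided by n − 1; the function name `pearson_r` and the data values are hypothetical and are not taken from the lecture.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Sample correlation: sum of products of standardized values, divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (divisor n - 1)
    n = len(xs)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (n - 1)

# Hypothetical data, made up for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 3))  # 0.775
```

Here the deviations give Sxy = 6, Sxx = 10, and Syy = 6, so r = 6 / sqrt(60), a fairly strong positive linear association.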

Properties of Correlation Coefficient, r
• The correlation coefficient does not depend on the unit of measurement for either variable.
• The correlation coefficient is not affected by the distinction between explanatory and response variables.
• The correlation coefficient is always a number between -1 and 1. Values of r near 0 indicate a very weak linear relationship, while values of r close to -1 or 1 indicate a strong linear relationship. Positive r indicates a positive linear association between the variables, and negative r indicates a negative linear association.
• The correlation coefficient is strongly affected by outliers.
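The first three properties can be checked numerically. The sketch below uses hypothetical height and weight values (not data from the lecture) and a helper named `pearson_r`, which is an illustrative name rather than a library function.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Sample correlation computed from standardized values (divisor n - 1)."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    n = len(xs)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical measurements, made up for illustration only.
height_cm = [150, 160, 165, 172, 180]
weight_kg = [52, 58, 63, 70, 77]

r = pearson_r(height_cm, weight_kg)

# Property 1: changing the units (cm -> inches, kg -> pounds) leaves r unchanged.
height_in = [h / 2.54 for h in height_cm]
weight_lb = [w * 2.20462 for w in weight_kg]
r_units = pearson_r(height_in, weight_lb)

# Property 2: swapping the roles of x and y leaves r unchanged.
r_swapped = pearson_r(weight_kg, height_cm)

# Property 3: r always lies between -1 and 1.
print(r, r_units, r_swapped)
```

The three printed values agree up to floating-point rounding.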

"r" ranges from -1 to +1 "r" quantifies the strength and direction of a linear relationship between 2 quantitative variables. Strength: how closely the points follow a straight line. Direction: is positive when individuals with higher X values tend to have higher values of Y.

Outliers Correlations are calculated using means and standard deviations, and thus are NOT resistant to outliers. Just moving one point away from the general trend here decreases the correlation from to -0.75
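The lack of resistance can be seen in a small numerical experiment. The data below are hypothetical, chosen only to show the effect, and do not reproduce the figure from the slide; `pearson_r` is an illustrative helper name.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Sample correlation computed from standardized values (divisor n - 1)."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    n = len(xs)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data with a strong negative linear trend.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [9, 8, 8, 6, 5, 4, 3, 1]
r_original = pearson_r(x, y)

# Move only the last point, from (8, 1) to (8, 9), away from the trend.
y_moved = y[:-1] + [9]
r_moved = pearson_r(x, y_moved)

print(round(r_original, 2), round(r_moved, 2))
```

Changing a single observation moves r from close to -1 to a much weaker negative value.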

Example In recent years, environmental scientists have mounted a major effort to investigate the sources of acid rain. Nitrates are a major constituent of acid rain, and arsenic has been proposed as a tracer element. The accompanying data on x=nitrate concentration (μM) of a precipitation sample and y=arsenic concentration (nM) are from the article “The Atmospheric Deposition of Arsenic and Association with Acid Precipitation” (Atmospheric Environ., 1988: ): x: y: Calculate the correlation coefficient r. The sample correlation coefficient measures the direction and strength of the linear association between two quantitative variables. A value of r close to zero does not rule out a strong relationship between x and y; there could still be a strong relationship, but one that is not linear.
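The closing remark, that r near zero does not rule out a strong nonlinear relationship, can be illustrated with a short sketch. The data are hypothetical: y is determined exactly by x through y = x², yet the sample correlation is essentially zero. `pearson_r` is an illustrative helper name.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Sample correlation computed from standardized values (divisor n - 1)."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    n = len(xs)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data: a perfect but nonlinear (quadratic) relationship.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]

r = pearson_r(x, y)
print(abs(r) < 1e-12)  # True: no linear association despite an exact relationship
```

Because the x values are symmetric about zero, the positive and negative products cancel, which is why a scatter plot should always accompany r.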

Example The accompanying data on y=glucose concentration (g/L) and x=fermentation time (days) for a particular brand of malt liquor were read from a scatter plot appearing in the article “Improving Fermentation Productivity with Reverse Osmosis” (Food Tech., 1984:92-96): x: y: Calculate the correlation coefficient r and state the relationship using the scatter plot.

Population Correlation Coefficient The sample correlation coefficient r measures how strongly the x and y values in a sample of pairs are linearly related. There is an analogous measure of how strongly x and y are related in the entire population of pairs from which the sample $(x_1, y_1), \ldots, (x_n, y_n)$ was obtained. It is called the population correlation coefficient and is denoted by ρ. The population correlation coefficient satisfies 1) -1 ≤ ρ ≤ 1; 2) ρ = 1 or -1 if and only if all (x, y) pairs in the population lie exactly on a straight line.
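For reference, the standard textbook definition of ρ is sketched below. It is not stated on the slide; the random-variable notation X, Y and the symbols σ_X, σ_Y, μ_X, μ_Y are assumptions introduced here for illustration.

```latex
\rho = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \, \sigma_Y}
     = \frac{E\big[(X - \mu_X)(Y - \mu_Y)\big]}{\sigma_X \, \sigma_Y},
\qquad -1 \le \rho \le 1 .
```

The sample coefficient r has the same form, with sample means and sample standard deviations in place of the population quantities.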

Correlation and Causation o Correlation between variables need not be the result of a causal link between them. o It is possible to find correlation between variables that in truth have nothing to do with each other. o Association does not imply causation. Example. Consider x = number of TV sets per person for a country and y = life expectancy. Suppose that r (the correlation between x and y) is large and positive. Could we lengthen people’s lives by shipping them TV sets? No: economic status can produce such a high correlation between x and y, because both variables are strongly related to a third variable, “economic status”. Such a variable is called a “lurking variable”.
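The TV-set example can be mimicked with made-up numbers. In the sketch below, every value is hypothetical: a `wealth` index stands in for economic status, and both TV sets per person and life expectancy are constructed to depend on it, with no direct link between the two. `pearson_r` is an illustrative helper name.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Sample correlation computed from standardized values (divisor n - 1)."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    n = len(xs)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical lurking variable: an economic-status index for 10 countries.
wealth = list(range(1, 11))

# Both variables are built from wealth plus a small deterministic wiggle;
# neither is computed from the other.
tv_per_person = [0.05 * w + 0.02 * (-1) ** i for i, w in enumerate(wealth)]
life_expectancy = [50 + 3 * w + 1.5 * (-1) ** (i // 2) for i, w in enumerate(wealth)]

r_tv_life = pearson_r(tv_per_person, life_expectancy)
r_tv_wealth = pearson_r(tv_per_person, wealth)
r_life_wealth = pearson_r(life_expectancy, wealth)
print(round(r_tv_life, 2), round(r_tv_wealth, 2), round(r_life_wealth, 2))
```

All three correlations come out large and positive, even though TV sets play no role in how life expectancy was generated here.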