Basic Concepts of Correlation. Definition A correlation exists between two variables when the values of one are somehow associated with the values of.

Slides:



Advertisements
Similar presentations
Chapter 10 Correlation and Regression
Advertisements

11-1 Empirical Models Many problems in engineering and science involve exploring the relationships between two or more variables. Regression analysis.
Lesson 10: Linear Regression and Correlation
Forecasting Using the Simple Linear Regression Model and Correlation
6-1 Introduction To Empirical Models 6-1 Introduction To Empirical Models.
Regression Analysis Module 3. Regression Regression is the attempt to explain the variation in a dependent variable using the variation in independent.
Probabilistic & Statistical Techniques Eng. Tamer Eshtawi First Semester Eng. Tamer Eshtawi First Semester
Correlation and Regression
1 Objective Investigate how two variables (x and y) are related (i.e. correlated). That is, how much they depend on each other. Section 10.2 Correlation.
© 2010 Pearson Prentice Hall. All rights reserved Least Squares Regression Models.
Chapter 12 Simple Regression
SIMPLE LINEAR REGRESSION
10-2 Correlation A correlation exists between two variables when the values of one are somehow associated with the values of the other in some way. A.
© 2000 Prentice-Hall, Inc. Chap Forecasting Using the Simple Linear Regression Model and Correlation.
11-1 Empirical Models Many problems in engineering and science involve exploring the relationships between two or more variables. Regression analysis.
Correlation and Regression Analysis
1 Chapter 10 Correlation and Regression We deal with two variables, x and y. Main goal: Investigate how x and y are related, or correlated; how much they.
1 1 Slide © 2008 Thomson South-Western. All Rights Reserved Slides by JOHN LOUCKS & Updated by SPIROS VELIANITIS.
Correlation & Regression
STATISTICS ELEMENTARY C.M. Pascual
Linear Regression.
Chapter 10 Correlation and Regression
Introduction to Linear Regression and Correlation Analysis
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved Section 10-3 Regression.
Relationship of two variables
Correlation.
Section Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series.
Sections 9-1 and 9-2 Overview Correlation. PAIRED DATA Is there a relationship? If so, what is the equation? Use that equation for prediction. In this.
1 Chapter 9. Section 9-1 and 9-2. Triola, Elementary Statistics, Eighth Edition. Copyright Addison Wesley Longman M ARIO F. T RIOLA E IGHTH E DITION.
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved Section 10-1 Review and Preview.
Section Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series.
Probabilistic and Statistical Techniques 1 Lecture 24 Eng. Ismail Zakaria El Daour 2010.
Section Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series.
OPIM 303-Lecture #8 Jose M. Cruz Assistant Professor.
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved Lecture Slides Elementary Statistics Eleventh Edition and the Triola.
Section Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series.
1 Chapter 10 Correlation and Regression 10.2 Correlation 10.3 Regression.
Chapter 10 Correlation and Regression
Chapter 4 Linear Regression 1. Introduction Managerial decisions are often based on the relationship between two or more variables. For example, after.
Slide Slide 1 Warm Up Page 536; #16 and #18 For each number, answer the question in the book but also: 1)Prove whether or not there is a linear correlation.
1 11 Simple Linear Regression and Correlation 11-1 Empirical Models 11-2 Simple Linear Regression 11-3 Properties of the Least Squares Estimators 11-4.
© Copyright McGraw-Hill Correlation and Regression CHAPTER 10.
Chapter 10 Correlation and Regression Lecture 1 Sections: 10.1 – 10.2.
Section Copyright © 2014, 2012, 2010 Pearson Education, Inc. Chapter 10 Correlation and Regression 10-2 Correlation 10-3 Regression.
1 MVS 250: V. Katch S TATISTICS Chapter 5 Correlation/Regression.
Slide Slide 1 Copyright © 2007 Pearson Education, Inc Publishing as Pearson Addison-Wesley. Lecture Slides Elementary Statistics Tenth Edition and the.
Chapter 14 Introduction to Regression Analysis. Objectives Regression Analysis Uses of Regression Analysis Method of Least Squares Difference between.
Slide Slide 1 Chapter 10 Correlation and Regression 10-1 Overview 10-2 Correlation 10-3 Regression 10-4 Variation and Prediction Intervals 10-5 Multiple.
Slide 1 Copyright © 2004 Pearson Education, Inc. Chapter 10 Correlation and Regression 10-1 Overview Overview 10-2 Correlation 10-3 Regression-3 Regression.
1 Objective Given two linearly correlated variables (x and y), find the linear function (equation) that best describes the trend. Section 10.3 Regression.
Section Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series.
Lecture Slides Elementary Statistics Twelfth Edition
Linear Regression Essentials Line Basics y = mx + b vs. Definitions
Review and Preview and Correlation
Correlation and Simple Linear Regression
CHS 221 Biostatistics Dr. wajed Hatamleh
Chapter 5 STATISTICS (PART 4).
Elementary Statistics
Correlation and Simple Linear Regression
Lecture Slides Elementary Statistics Thirteenth Edition
Correlation and Regression
Chapter 10 Correlation and Regression
Correlation and Simple Linear Regression
Lecture Slides Elementary Statistics Eleventh Edition
Lecture Slides Elementary Statistics Tenth Edition
Simple Linear Regression and Correlation
Correlation and Regression Lecture 1 Sections: 10.1 – 10.2
Lecture Slides Elementary Statistics Eleventh Edition
Created by Erin Hodgess, Houston, Texas
Chapter 9 Correlation and Regression
Presentation transcript:

Basic Concepts of Correlation

Definition A correlation exists between two variables when the values of one are somehow associated with the values of the other in some way.

Definition The linear correlation coefficient r measures the strength of the linear relationship between the paired quantitative x- and y- values in a sample.

Exploring the Data We can often see a relationship between two variables by constructing a scatterplot. Figure 10-2 following shows scatterplots with different characteristics.

Scatterplots of Paired Data Figure 10-2

Scatterplots of Paired Data Figure 10-2

Scatterplots of Paired Data Figure 10-2

Requirements 1. The sample of paired ( x, y ) data is a simple random sample of quantitative data. 2. Visual examination of the scatterplot must confirm that the points approximate a straight-line pattern. 3. The outliers must be removed ONLY if they are known to be errors. The effects of any other outliers should be considered by calculating r with and without the outliers included.

Notation for the Linear Correlation Coefficient n = number of pairs of sample data  denotes the addition of the items indicated.  x denotes the sum of all x- values.  x 2 indicates that each x - value should be squared and then those squares added. (  x ) 2 indicates that the x- values should be added and then the total squared.

Notation for the Linear Correlation Coefficient  xy indicates that each x -value should be first multiplied by its corresponding y- value. After obtaining all such products, find their sum. r = linear correlation coefficient for sample data.  = linear correlation coefficient for population data.

Formula 10-1 n  xy – (  x)(  y) n(  x 2 ) – (  x) 2 n(  y 2 ) – (  y) 2 r = The linear correlation coefficient r measures the strength of a linear relationship between the paired values in a sample. Computer software or calculators can compute r Formula

Interpreting r Using Table A-6: If the absolute value of the computed value of r, denoted | r |, exceeds the value in Table A-6, conclude that there is a linear correlation. Otherwise, there is not sufficient evidence to support the conclusion of a linear correlation. Using Software: If the computed P-value is less than or equal to the significance level, conclude that there is a linear correlation. Otherwise, there is not sufficient evidence to support the conclusion of a linear correlation.

- Ken Carroll, 1975 Correlation of incidence of death due to breast cancer with animal fat intake

- Ken Carroll, 1975 Correlation of incidence of death due to breast cancer with vegetable fat intake

From Francis Anscombe, 1973… each plot has the same mean (y=7.5), SD (4.12), r, (0.816), and linear regression line (y = x)…

ppppr 2 df … From the previous slide: note that with df of 9 r = is significant at p < 0.01 A meaningless correlation in anyone’s book!!!

Interpreting the Linear Correlation Coefficient r Critical Values from Table A-6 and the Computed Value of r

Interpreting r : Explained Variation The value of r 2 is the proportion of the variation in y that is explained by the linear relationship between x and y.

Common Errors Involving Correlation 1. Causation: It is wrong to conclude that correlation implies causality. 2. Averages: Averages suppress individual variation and may inflate the correlation coefficient. 3. Linearity: There may be some relationship between x and y even when there is no linear correlation.

Part 1: Basic Concepts of Regression

Regression The typical equation of a straight line y = mx + b is expressed in the form y = b 0 + b 1 x, where b 0 is the y -intercept and b 1 is the slope. ^ The regression equation expresses a relationship between x (called the explanatory variable, predictor variable or independent variable), and y (called the response variable or dependent variable). ^

Definitions  Regression Equation Given a collection of paired data, the regression equation  Regression Line The graph of the regression equation is called the regression line (or line of best fit, or least squares line). y = b 0 + b 1 x ^ algebraically describes the relationship between the two variables.

Notation for Regression Equation y -intercept of regression equation Slope of regression equation Equation of the regression line Population Parameter Sample Statistic  0 b 0  1 b 1 y =  0 +  1 x y = b 0 + b 1 x ^

Requirements 1. The sample of paired ( x, y ) data is a random sample of quantitative data. 2. Visual examination of the scatterplot shows that the points approximate a straight-line pattern. 3. Any outliers must be removed if they are known to be errors. Consider the effects of any outliers that are not known errors.

Formulas for b 0 and b 1 Formula 10-3 (slope) ( y -intercept) Formula 10-4 calculators or computers can compute these values

The regression line fits the sample points best. Special Property

Rounding the y -intercept b 0 and the Slope b 1  Round to three significant digits.  If you use the formulas 10-3 and 10-4, do not round intermediate values.

- Ken Carroll, 1975 From Francis Anscombe, 1973… each plot has the same mean (y=7.5), SD (4.12), r, (0.816), and linear regression line (y = x)…

We should know that the regression equation is an estimate of the true regression equation. This estimate is based on one particular set of sample data, but another sample drawn from the same population would probably lead to a slightly different equation.

1.Use the regression equation for predictions only if the graph of the regression line on the scatterplot confirms that the regression line fits the points reasonably well?????. Using the Regression Equation for Predictions 2.Use the regression equation for predictions only if the linear correlation coefficient r indicates that there is a linear correlation between the two variables????? Think… r 2

A % Fat % Fat Based on actual BD Based on predicted BD A25.96% 13.92% B 7.91% 13.92% B

Be VERY skeptical of individual “results” based on linear regression equations being used to make multiple serial predictions

Summary - correlation is used to calculate the strength of a linear relationship between paired variables from a sample - r value > or < 0 indicates a non-zero association between the 2 variables - a scatter plot is used to visualize the relationship between the 2 variables - r CANNOT indicate a causal relationship - interpret correlation by using r 2 … indicating the proportion of variation in variable A accounted for by the variation in variable B “significant r only means NOT a “0 association” - regression equation describes the association between the 2 variables - regression line is a graph of the regression equation describing the best fit line of the scatter plot - only with confidence limits can you properly interpret a linear regression line