Correlation and Regression Lecture 1 Sections: 10.1 – 10.2

Slides:



Advertisements
Similar presentations
11-1 Empirical Models Many problems in engineering and science involve exploring the relationships between two or more variables. Regression analysis.
Advertisements

6-1 Introduction To Empirical Models 6-1 Introduction To Empirical Models.
Correlation & Regression Chapter 10. Outline Section 10-1Introduction Section 10-2Scatter Plots Section 10-3Correlation Section 10-4Regression Section.
Probabilistic & Statistical Techniques Eng. Tamer Eshtawi First Semester Eng. Tamer Eshtawi First Semester
Correlation and Regression
1 Objective Investigate how two variables (x and y) are related (i.e. correlated). That is, how much they depend on each other. Section 10.2 Correlation.
Chapter 4 Describing the Relation Between Two Variables
SIMPLE LINEAR REGRESSION
Correlation A correlation exists between two variables when one of them is related to the other in some way. A scatterplot is a graph in which the paired.
SIMPLE LINEAR REGRESSION
11-1 Empirical Models Many problems in engineering and science involve exploring the relationships between two or more variables. Regression analysis.
Correlation and Regression Analysis
1 Chapter 10 Correlation and Regression We deal with two variables, x and y. Main goal: Investigate how x and y are related, or correlated; how much they.
STATISTICS ELEMENTARY C.M. Pascual
Descriptive Methods in Regression and Correlation
Chapter 10 Correlation and Regression
SIMPLE LINEAR REGRESSION
Relationship of two variables
Correlation.
Section Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series.
Sections 9-1 and 9-2 Overview Correlation. PAIRED DATA Is there a relationship? If so, what is the equation? Use that equation for prediction. In this.
Is there a relationship between the lengths of body parts ?
1 Chapter 9. Section 9-1 and 9-2. Triola, Elementary Statistics, Eighth Edition. Copyright Addison Wesley Longman M ARIO F. T RIOLA E IGHTH E DITION.
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved Section 10-1 Review and Preview.
Section Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series.
Correlation and Regression Paired Data Is there a relationship? Do the numbers appear to increase or decrease together? Does one set increase as the other.
Probabilistic and Statistical Techniques 1 Lecture 24 Eng. Ismail Zakaria El Daour 2010.
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved Lecture Slides Elementary Statistics Eleventh Edition and the Triola.
Section Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series.
1 Chapter 10 Correlation and Regression 10.2 Correlation 10.3 Regression.
Chapter 10 Correlation and Regression
Correlation & Regression
Chapter 10 Lecture 2 Section: We analyzed paired data with the goal of determining whether there is a linear correlation between two variables.
Introduction to Probability and Statistics Thirteenth Edition Chapter 12 Linear Regression and Correlation.
Basic Concepts of Correlation. Definition A correlation exists between two variables when the values of one are somehow associated with the values of.
© Copyright McGraw-Hill Correlation and Regression CHAPTER 10.
Chapter 10 Correlation and Regression Lecture 1 Sections: 10.1 – 10.2.
Copyright (C) 2002 Houghton Mifflin Company. All rights reserved. 1 Understandable Statistics Seventh Edition By Brase and Brase Prepared by: Lynn Smith.
Section Copyright © 2014, 2012, 2010 Pearson Education, Inc. Chapter 10 Correlation and Regression 10-2 Correlation 10-3 Regression.
Copyright (C) 2002 Houghton Mifflin Company. All rights reserved. 1 Understandable Statistics Seventh Edition By Brase and Brase Prepared by: Lynn Smith.
1 MVS 250: V. Katch S TATISTICS Chapter 5 Correlation/Regression.
Slide Slide 1 Copyright © 2007 Pearson Education, Inc Publishing as Pearson Addison-Wesley. Lecture Slides Elementary Statistics Tenth Edition and the.
Slide Slide 1 Chapter 10 Correlation and Regression 10-1 Overview 10-2 Correlation 10-3 Regression 10-4 Variation and Prediction Intervals 10-5 Multiple.
Slide 1 Copyright © 2004 Pearson Education, Inc. Chapter 10 Correlation and Regression 10-1 Overview Overview 10-2 Correlation 10-3 Regression-3 Regression.
Section Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series.
Lecture Slides Elementary Statistics Twelfth Edition
Regression and Correlation
Review and Preview and Correlation
Is there a relationship between the lengths of body parts?
Elementary Statistics
11-1 Empirical Models Many problems in engineering and science involve exploring the relationships between two or more variables. Regression analysis.
CHS 221 Biostatistics Dr. wajed Hatamleh
Chapter 13 Created by Bethany Stubbe and Stephan Kogitz.
Correlation and Regression
Elementary Statistics
Lecture Slides Elementary Statistics Twelfth Edition
Suppose the maximum number of hours of study among students in your sample is 6. If you used the equation to predict the test score of a student who studied.
Lecture Slides Elementary Statistics Thirteenth Edition
Correlation and Regression
CHAPTER 26: Inference for Regression
Linear Regression/Correlation
Chapter 10 Correlation and Regression
Lecture Slides Elementary Statistics Eleventh Edition
SIMPLE LINEAR REGRESSION
Product moment correlation
SIMPLE LINEAR REGRESSION
Chapter Fourteen McGraw-Hill/Irwin
Lecture Slides Elementary Statistics Eleventh Edition
Chapter 9 Correlation and Regression
Presentation transcript:

Correlation and Regression Lecture 1 Sections: 10.1 – 10.2 Chapter 10 Correlation and Regression Lecture 1 Sections: 10.1 – 10.2

You will now be introduced to important methods for making inferences based on sample data that come in pairs. In the previous chapter, we discussed matched pairs which dealt with differences between two population means. The objective of this chapter is to determine whether there is a relationship between two variables, which is sometimes referred to as bivariate data. If there exists a relationship, then we want to describe it with an equation that can be used for predictions. We will begin by considering the concept of correlation, which is used to determine whether there is a statistically significant relationship between two variables. We can investigate correlation by using a scatterplot. We will consider only linear relationships, which means that when graphed, the points approximate a straight-line pattern. We can also investigate correlation with a measure of the direction and strength of linear association between two variables, which is called the linear correlation coefficient.

Definitions: 1. A correlation exists between two variables when one of them is related to the other in some way. 2. A scatterplot sometimes called a scatter diagram, is a graph in which the paired (x, y) sample data are plotted with a horizontal x-axis and a vertical y-axis. Each individual (x, y) pair is plotted as a single point. 3. The linear correlation coefficient, denoted as r measures the strength of the linear relationship between the paired x- and y-quantitative values in a sample. If we had every pair of population values for x and y, the result would be a population parameter, represented by Greek letter “rho – ρ”.

Assumptions: 1. The sample data of pairs (x, y) is a random sample of quantitative data. 2. The pairs of (x, y) data have a bivariate normal distribution. This assumption requires that for any fixed value of x, the corresponding values of y have a distribution that is normal. Furthermore, for any fixed value of y, the values of x have a distribution that is normal. This assumption is usually difficult to check, but a partial check can be made by determining whether the values of both x and y have distributions that are approximately normal.

ρ: represents the linear correlation coefficient for a population. Notation for the Linear Correlation Coefficient: n: represents the number of pairs of data present. ∑: denotes the addition of the items indicated. ∑x: denotes the sum of all x-values. ∑x2: indicates that each x-value should be squared and then those squares added. (∑x)2:indicates that the x-values should be added and the total then squared. *NOTE: ∑x2 ≠ (∑x)2 ∑xy: indicates that each x-value should first be multiplied by its corresponding y-value. After obtaining all such products, find their sum. r: represents the linear correlation coefficient for a sample. ρ: represents the linear correlation coefficient for a population.

Round to 3 decimal places Properties of the Linear Correlation Coefficient r: 1. 2. – 1≤ r ≤+1 3. The value of r does not change if all values of either variable are converted to a different scale. 4. The value of r is not affected by the choice of x or y. Interchange all x- and y-values and the value of r will not change. 5. r measures the strength of a linear relationship. It is not designed to measure the strength of a relationship that is not linear. 6. The value of r2 is the proportion of the variation in y that is explained by the linear relationship between x and y. Round to 3 decimal places

Food Expenditure: $1,400 $2,400 $1,300 $1,600 $900 $1,500 $1,700 The accompanying table lists monthly income and their food expenditures for the month of December. Income: $5,500 $8,300 $3,800 $6,100 $3,300 $4,900 $6,700 Food Expenditure: $1,400 $2,400 $1,300 $1,600 $900 $1,500 $1,700 MINITAB Output Regression Analysis: FoodEx versus income The regression equation is FoodEx = 151 + 0.252 income Predictor Coef SE Coef T P Constant 150.7 217.4 0.69 0.519 income 0.25246 0.03788 6.66 0.001 S = 159.508 R-Sq = 89.9% R-Sq(adj) = 87.9%

Common Errors Involving Correlation: 1. A common error is to conclude that correlation implies causality. Using the example above, we can conclude that there is a correlation between neck size and head width, but we cannot conclude that neck size causes head width. 2. Error arises with data based on averages. Averages suppress individual variation and may inflate the correlation coefficient. 3. The property of linearity. A relationship may exist between x and y even when there is no significant linear correlation. If we look at the figure at the right, r = 0. This is an indication of no linear correlation between the two variables. However, we can easily see that the figure has a pattern reflecting a very strong nonlinear relationship.

Formal Hypothesis Test: Results and Conclusions

2. NECK(X) 15.8 18.0 16.3 17.5 16.6 17.2 16.5 Arm Length(Y) 33.5 36.8 35.2 35.0 34.7 35.7 34.8 Test the claim that there is a linear correlation between neck size and arm length. α = 0.01

3. The U.S. Department of education reports that there exists a linear correlation between SAT scores and GPA at the high school level. Ten high school students were randomly selected and their SAT score and GPA are listed below. a.Test the claim b.The proportion of the variation in y that is explained by the linear relationship between x and y. SAT GPA 1591 4.33 1530 3.96 1322 3.74 1169 3.12 979 2.80 825 2.70 791 2.54 766 2.35 743 2.32 633 2.07

4. The accompanying table lists weights in pounds of paper discarded by a sample of households, along with the size of the household. Paper: 2.41 7.57 9.55 8.82 8.72 6.96 6.83 11.42 HSize : 2 3 3 6 4 2 1 5 Test the claim that ρ = 0

5. The following is the age and the corresponding blood pressure of 10 subjects randomly selected subjects from a large city Is there significant linear correlation between age and blood pressure? Age 38 41 42 45 50 52 55 60 62 65 Blood Pressure 120 115 130 120 132 135 140 145 140 149

6. A study was conducted to investigate the relationship between age (in years) and BAC (blood alcohol concentration) measured when convicted DWI (driving while intoxicated) jail inmates were first arrested. Based on the data below, does the BAC seem to be correlated to the age of the person tested? Age 17.2 43.5 30.7 53.1 37.2 21.0 27.6 46.3 BAC 0.19 0.20 0.26 0.16 0.24 0.20 0.18 0.23