Correlation & Regression


Correlation & Regression Chapter 10

Introduction
Another area of inferential statistics involves determining whether a relationship exists between two or more quantitative variables. For example: a businessperson deciding whether the volume of sales for a given month is related to the amount of advertising the firm does that month; educators interested in how the number of hours a student studies is related to the student's score on an exam; medical researchers interested in determining whether caffeine is related to heart damage.

Introduction cont.
Correlation: statistical method used to determine whether a relationship between variables exists.
Regression: statistical method used to describe the nature of the relationship between variables, that is, positive or negative, linear or nonlinear.
Questions to be answered: Are two or more variables related? If so, what is the strength of the relationship? What type of relationship exists? What kind of predictions can be made from the relationship?

Types of Relationships
Two types of relationships: simple and multiple.
Simple relationship: one independent (explanatory) variable and one dependent (response) variable. Simple relationship analysis is called simple regression.
Positive relationship: exists when both variables increase or decrease at the same time.
Negative relationship: exists when one variable increases as the other decreases, and vice versa.
Multiple relationship: two or more independent variables are used to predict one dependent variable.

10.1 – Scatter Plots & Regression
In simple correlation and regression studies, the researcher collects data on two quantitative variables to see whether a relationship exists between them. The independent variable can be controlled or manipulated (designated as the x-axis variable). The dependent variable cannot be controlled or manipulated (designated as the y-axis variable).

Scatter Plots
Scatter plot: graph of ordered pairs (x, y) of numbers consisting of the independent variable x and the dependent variable y. It is a visual way to describe the nature of the relationship between the independent and dependent variables. After the plot is drawn, it should be analyzed to determine which type of relationship, if any, exists.
Example 10 – 1 (p. 536), Example 10 – 2 (p. 537), Example 10 – 3 (p. 538)

Correlation
Statisticians use the correlation coefficient to determine the strength of the linear relationship between two variables.
Correlation coefficient: computed from sample data, it measures the strength and direction of the linear relationship between two variables. The symbol for the sample correlation coefficient is r. The symbol for the population correlation coefficient is ρ (Greek letter rho).
Pearson product moment correlation coefficient (PPMC): named after statistician Karl Pearson, who pioneered research in this area.

Formula for Correlation Coefficient
The range of the correlation coefficient is from -1 to +1. A value of r close to +1 suggests a strong positive linear relationship. A value of r close to -1 suggests a strong negative linear relationship. A value of r close to 0 suggests a weak or no linear relationship.
Formula for correlation coefficient r:
$r = \dfrac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$
where n is the number of data pairs.
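The sums formula above can be sketched in a few lines of Python. This is only an illustration: the function name `pearson_r` is made up for this sketch, and the data are the X and Y values of the hypothetical regression model given in Section 10.3 of this transcript, not the textbook's Example 10 – 1 data (which the transcript does not include).

```python
import math

# Data from the hypothetical regression model in Section 10.3 of the transcript.
x = [1, 2, 3, 4, 5]
y = [10, 8, 12, 16, 20]

def pearson_r(xs, ys):
    """Pearson correlation coefficient computed with the sums formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    sum_x2 = sum(a * a for a in xs)
    sum_y2 = sum(b * b for b in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

r = pearson_r(x, y)
print(round(r, 3))  # 0.919, matching the r value quoted for this model
```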

Example 10 – 4
Compute the correlation coefficient for the data in Example 10 – 1.

Significance of Correlation Coefficient
The question arises: when is the value of r due to chance, and when does it suggest a significant linear relationship between the variables? Since the value of r is computed from samples, two possibilities exist when r is not equal to zero: either the value of r is high enough to conclude that there is a significant linear relationship, or the value of r is due to chance. To make a decision, use a hypothesis-testing procedure similar to the traditional method.

Population Correlation Coefficient
Population correlation coefficient: correlation computed by using all possible pairs of data values (x, y) taken from a population. The sample correlation coefficient r can be used as an estimator of ρ (rho) if the following assumptions are valid: the variables x and y are linearly related; the variables are random variables; the two variables have a bivariate normal distribution.

Hypothesis Testing
In hypothesis testing, one of these is true: $H_0: \rho = 0$ or $H_1: \rho \neq 0$. When the null hypothesis is rejected at a specific level, it means there is a significant difference between the value of r and 0. When the null hypothesis is not rejected, it means the value of r is not significantly different from 0 and is probably due to chance. You do not have to identify the claim, since the question will always be whether there is a significant linear relationship between the variables.

Formula for t Test for Correlation Coefficient
$t = r\sqrt{\dfrac{n-2}{1-r^2}}$, with degrees of freedom equal to n – 2.
Example 10 – 7: Test the significance of the correlation coefficient found in Example 10 – 4. Use α = 0.05 and r = 0.982.
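As a rough sketch of the test, the snippet below applies the t formula to the hypothetical regression model from Section 10.3 (r = 0.919, n = 5), because the transcript does not state n for Example 10 – 7. The helper name `t_statistic` is invented here, and the critical value 3.182 is taken from a standard t table (two-tailed, α = 0.05, d.f. = 3) rather than computed.

```python
import math

def t_statistic(r, n):
    """t test value for a sample correlation coefficient r from n data pairs."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

r, n = 0.919, 5          # values from the hypothetical model in Section 10.3
t = t_statistic(r, n)

# Assumption: critical value read from a t table (two-tailed, alpha = 0.05, d.f. = n - 2 = 3).
t_critical = 3.182
reject_null = abs(t) > t_critical

print(round(t, 2), reject_null)  # 4.04 True
```

Because |t| exceeds the critical value, the null hypothesis ρ = 0 would be rejected for this small data set at α = 0.05.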

Correlation and Causation
When a hypothesis test indicates that a significant linear relationship exists between variables, researchers must consider the possibilities outlined next.
Possible Relationships Between Variables
When the null hypothesis has been rejected for a specific α value, any of the following five possibilities can exist:
1. There is a direct cause-and-effect relationship between the variables.
2. There is a reverse cause-and-effect relationship between the variables.
3. The relationship between the variables may be caused by a third variable.
4. There may be a complexity of interrelationships among many variables.
5. The relationship may be coincidental.
Remember, correlation does not necessarily imply causation.

10.2 – Regression
If the value of the correlation coefficient is significant, the next step is to determine the equation of the regression line.
Regression line: the data's line of best fit. It allows the researcher to see the trend and make predictions on the basis of the data.

Line of Best Fit
Given a scatter plot, you must be able to draw the line of best fit.
Line of best fit: line drawn so that the sum of the squares of the vertical distances from each point in the scatter plot to the line is at a minimum.

Determination of Regression Line Equation
A linear equation in algebra is written as $y = mx + b$. In statistics, the regression line is written as $y' = a + bx$, where a is the y' intercept and b is the slope of the line.
Formula for regression line y' = a + bx:
$a = \dfrac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n\sum x^2 - (\sum x)^2}$ and $b = \dfrac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$
Rounding rule: round the values of a and b to three decimal places.
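A minimal sketch of the intercept and slope formulas, using the X and Y values of the hypothetical regression model from Section 10.3 of this transcript. The function name `regression_line` is an illustrative choice, not something from the slides.

```python
# Data from the hypothetical regression model in Section 10.3 of the transcript.
x = [1, 2, 3, 4, 5]
y = [10, 8, 12, 16, 20]

def regression_line(xs, ys):
    """Return (a, b) for the least-squares line y' = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(u * v for u, v in zip(xs, ys))
    sum_x2 = sum(u * u for u in xs)
    denominator = n * sum_x2 - sum_x ** 2
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator  # y' intercept
    b = (n * sum_xy - sum_x * sum_y) / denominator       # slope
    return a, b

a, b = regression_line(x, y)
print(round(a, 3), round(b, 3))  # 4.8 2.8

# Predicted value at x = 1, which the slides give as 7.6.
print(round(a + b * 1, 3))       # 7.6
```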

Examples
10 – 9: Find the equation of the regression line for the data in Example 10 – 4 and graph the line on the scatter plot of the data.
10 – 11: Use the equation of the regression line to predict the income of a car rental agency that has 200,000 automobiles.

Assumptions
Marginal change: magnitude of the change in one variable when the other variable changes exactly 1 unit.
When r is not significantly different from 0, the best predictor of y is the mean of the data values of y.
For valid predictions, the value of the correlation coefficient must be significant, and two other assumptions must be met: (1) for any specific value of the independent variable x, the value of the dependent variable y must be normally distributed about the regression line; (2) the standard deviation of the dependent variable must be the same for each value of the independent variable.

Checking for Outliers
All scatter plots should be checked for outliers.
Influential points (influential observations): points that can affect the equation of the regression line. When a point on the scatter plot seems to be an outlier, it should be checked to see whether it is an influential point, because influential points tend to "pull" the regression line toward themselves.
Researchers should use their judgment as to whether to include influential observations in the final analysis of the data. If the researcher feels an observation is not necessary, then it should be excluded so that it does not influence the results of the study. If the researcher feels that it is necessary, he or she may want to obtain additional data values whose x values are near the x value of the influential point.

10.3 – Coefficient of Determination & Standard Error of the Estimate
If the correlation coefficient is significant, then the equation of the regression line can be determined. Other measures are associated with correlation and regression techniques: the coefficient of determination, the standard error of the estimate, and the prediction interval.

Regression Model
Consider this hypothetical regression model:
X values: {1, 2, 3, 4, 5}
Y values: {10, 8, 12, 16, 20}
The regression line equation is $y' = 4.8 + 2.8x$ and r = 0.919.
For each value of x there is an observed y value and a predicted y' value. When x = 1, y = 10 and y' = 7.6. Recall that the closer the y' values are to the actual y values, the better the fit and the closer r is to +1 or -1.

Total Variation
Total variation: sum of the squares of the vertical distances of each point from the mean, $\sum (y - \bar{y})^2$.
Explained variation: variation obtained from the relationship (the predicted y' values), $\sum (y' - \bar{y})^2$.
Unexplained variation: variation due to chance, $\sum (y - y')^2$.
Total variation = explained variation + unexplained variation:
$\sum (y - \bar{y})^2 = \sum (y' - \bar{y})^2 + \sum (y - y')^2$
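The decomposition can be checked numerically on the hypothetical regression model from the previous slide (y' = 4.8 + 2.8x). The variable names below are illustrative only.

```python
# Hypothetical regression model from the slides: y' = 4.8 + 2.8x.
x = [1, 2, 3, 4, 5]
y = [10, 8, 12, 16, 20]
a, b = 4.8, 2.8

y_mean = sum(y) / len(y)
y_pred = [a + b * xi for xi in x]  # predicted y' values

total = sum((yi - y_mean) ** 2 for yi in y)                    # sum of (y - ybar)^2
explained = sum((yp - y_mean) ** 2 for yp in y_pred)           # sum of (y' - ybar)^2
unexplained = sum((yi - yp) ** 2 for yi, yp in zip(y, y_pred)) # sum of (y - y')^2

print(round(total, 1), round(explained, 1), round(unexplained, 1))  # 92.8 78.4 14.4
```

Here 78.4 + 14.4 = 92.8, so the explained and unexplained parts add up to the total variation, as the identity states.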

Residuals & Least-Squares
Residual: difference between the actual value of y and the predicted y' value for a given x value.
Least-squares line: another name for a regression line, because it is computed so that the sum of the squares of the residuals is the smallest possible value.

Coefficient of Determination
Coefficient of determination: measure of the variation of the dependent variable that is explained by the regression line and the independent variable. It is the ratio of the explained variation to the total variation:
$r^2 = \dfrac{\text{explained variation}}{\text{total variation}}$
It can also be found by squaring the r value.
Coefficient of nondetermination: found by subtracting the coefficient of determination from 1, that is, $1 - r^2$.
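A short sketch, again on the hypothetical regression model from Section 10.3 (y' = 4.8 + 2.8x, r = 0.919), showing that the ratio of explained to total variation agrees with the square of r up to rounding. Variable names are assumptions made for this example.

```python
# Hypothetical regression model from the slides: y' = 4.8 + 2.8x, r = 0.919.
x = [1, 2, 3, 4, 5]
y = [10, 8, 12, 16, 20]
a, b, r = 4.8, 2.8, 0.919

y_mean = sum(y) / len(y)
explained = sum(((a + b * xi) - y_mean) ** 2 for xi in x)
total = sum((yi - y_mean) ** 2 for yi in y)

r_squared = explained / total        # coefficient of determination
nondetermination = 1 - r_squared     # coefficient of nondetermination

print(round(r_squared, 3), round(r ** 2, 3), round(nondetermination, 3))
# 0.845 0.845 0.155
```

Read this way, about 84.5% of the variation in y is explained by the regression line and about 15.5% is left unexplained.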

Standard Error of the Estimate
When a y' value is predicted for a specific x value, the prediction is a point estimate.
Standard error of the estimate: denoted by $s_{est}$, it is the standard deviation of the observed y values about the predicted y' values. The prediction interval uses this statistic.
Formula for the standard error of the estimate:
$s_{est} = \sqrt{\dfrac{\sum (y - y')^2}{n - 2}}$
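The formula can be sketched as follows. The copy-machine data for Example 10 – 12 are not reproduced in this transcript, so the sketch uses the hypothetical regression model from earlier in Section 10.3 instead; the function name `standard_error_of_estimate` is invented for illustration.

```python
import math

# Hypothetical regression model from the slides: y' = 4.8 + 2.8x.
x = [1, 2, 3, 4, 5]
y = [10, 8, 12, 16, 20]
a, b = 4.8, 2.8

def standard_error_of_estimate(xs, ys, a, b):
    """Standard deviation of observed y values about the predicted y' values."""
    n = len(xs)
    unexplained = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(xs, ys))
    return math.sqrt(unexplained / (n - 2))

s_est = standard_error_of_estimate(x, y, a, b)
print(round(s_est, 3))  # 2.191
```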

Examples
10 – 12: A researcher collects the following data (page 569) and determines that there is a significant relationship between the age of a copy machine and its monthly maintenance cost. The regression line is $y' = 55.57 + 8.13x$. Find the standard error of the estimate.

Prediction Interval
Prediction interval: similar to a confidence interval, where the standard error of the estimate is used to create an interval about a y' value. By selecting an α value, you can achieve $(1 - \alpha) \cdot 100\%$ confidence that the interval contains the actual mean of the y values that correspond to the given x value.
Formula for the prediction interval about a value y':
$y = y' \pm t_{\alpha/2}\, s_{est} \sqrt{1 + \dfrac{1}{n} + \dfrac{n(x - \bar{X})^2}{n\sum x^2 - (\sum x)^2}}$
with d.f. = n – 2.
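The interval formula can be sketched on the hypothetical regression model from earlier in Section 10.3, since the data for Examples 10 – 12 and 10 – 14 are not included in this transcript. The function name `prediction_interval`, the choice of x = 4, and the critical value 3.182 (read from a standard t table for α = 0.05, two-tailed, d.f. = 3) are all assumptions made for this illustration.

```python
import math

# Hypothetical regression model from the slides: y' = 4.8 + 2.8x.
x = [1, 2, 3, 4, 5]
y = [10, 8, 12, 16, 20]
a, b = 4.8, 2.8
t_crit = 3.182  # assumption: t table value, two-tailed alpha = 0.05, d.f. = 3

def prediction_interval(x0, xs, ys, a, b, t_crit):
    """Return (low, high) of the prediction interval about y' at x = x0."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(v * v for v in xs)
    x_mean = sum_x / n
    y_pred = a + b * x0
    unexplained = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(xs, ys))
    s_est = math.sqrt(unexplained / (n - 2))
    spread = math.sqrt(1 + 1 / n + n * (x0 - x_mean) ** 2 / (n * sum_x2 - sum_x ** 2))
    margin = t_crit * s_est * spread
    return y_pred - margin, y_pred + margin

low, high = prediction_interval(4, x, y, a, b, t_crit)
print(round(low, 2), round(high, 2))  # 8.05 23.95
```

With only five data pairs the interval is wide, which reflects the large t value at 3 degrees of freedom.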

Example 10 – 14
For the data in Example 10 – 12, find the 95% prediction interval for the monthly maintenance cost of a machine that is 3 years old.