Introduction to Probability and Statistics Linear Regression and Correlation.

Slides:



Advertisements
Similar presentations
11-1 Empirical Models Many problems in engineering and science involve exploring the relationships between two or more variables. Regression analysis.
Advertisements

Forecasting Using the Simple Linear Regression Model and Correlation
Inference for Regression
6-1 Introduction To Empirical Models 6-1 Introduction To Empirical Models.
Regression Analysis Once a linear relationship is defined, the independent variable can be used to forecast the dependent variable. Y ^ = bo + bX bo is.
Chapter 12 Simple Linear Regression
Chapter 12 Linear Regression and Correlation
Chapter 10 Simple Regression.
The Simple Regression Model
1 Pertemuan 13 Uji Koefisien Korelasi dan Regresi Matakuliah: A0392 – Statistik Ekonomi Tahun: 2006.
SIMPLE LINEAR REGRESSION
Pengujian Parameter Koefisien Korelasi Pertemuan 04 Matakuliah: I0174 – Analisis Regresi Tahun: Ganjil 2007/2008.
Chapter Topics Types of Regression Models
1 Simple Linear Regression Chapter Introduction In this chapter we examine the relationship among interval variables via a mathematical equation.
Correlation and Regression. Correlation What type of relationship exists between the two variables and is the correlation significant? x y Cigarettes.
Note 13 of 5E Statistics with Economics and Business Applications Chapter 11 Linear Regression and Correlation Correlation Coefficient, Least Squares,
SIMPLE LINEAR REGRESSION
Korelasi dalam Regresi Linear Sederhana Pertemuan 03 Matakuliah: I0174 – Analisis Regresi Tahun: Ganjil 2007/2008.
© 2000 Prentice-Hall, Inc. Chap Forecasting Using the Simple Linear Regression Model and Correlation.
11-1 Empirical Models Many problems in engineering and science involve exploring the relationships between two or more variables. Regression analysis.
Correlation and Regression Analysis
Simple Linear Regression and Correlation
Chapter 7 Forecasting with Simple Regression
Copyright ©2006 Brooks/Cole, a division of Thomson Learning, Inc. More About Regression Chapter 14.
Linear Regression/Correlation
1 1 Slide © 2008 Thomson South-Western. All Rights Reserved Slides by JOHN LOUCKS & Updated by SPIROS VELIANITIS.
Lecture 5 Correlation and Regression
Correlation & Regression
Regression and Correlation Methods Judy Zhong Ph.D.
SIMPLE LINEAR REGRESSION
Introduction to Linear Regression and Correlation Analysis
Regression Analysis (2)
Simple Linear Regression Models
Ms. Khatijahhusna Abd Rani School of Electrical System Engineering Sem II 2014/2015.
Inferences in Regression and Correlation Analysis Ayona Chatterjee Spring 2008 Math 4803/5803.
OPIM 303-Lecture #8 Jose M. Cruz Assistant Professor.
© 2003 Prentice-Hall, Inc.Chap 13-1 Basic Business Statistics (9 th Edition) Chapter 13 Simple Linear Regression.
INTRODUCTORY LINEAR REGRESSION SIMPLE LINEAR REGRESSION - Curve fitting - Inferences about estimated parameter - Adequacy of the models - Linear.
Introduction to Probability and Statistics Chapter 12 Linear Regression and Correlation.
Introduction to Linear Regression
Chap 12-1 A Course In Business Statistics, 4th © 2006 Prentice-Hall, Inc. A Course In Business Statistics 4 th Edition Chapter 12 Introduction to Linear.
CS433: Modeling and Simulation Dr. Anis Koubâa Al-Imam Mohammad bin Saud University 15 October 2010 Lecture 05: Statistical Analysis Tools.
Elementary Statistics Correlation and Regression.
1 Chapter 12 Simple Linear Regression. 2 Chapter Outline  Simple Linear Regression Model  Least Squares Method  Coefficient of Determination  Model.
Go to Table of Content Single Variable Regression Farrokh Alemi, Ph.D. Kashif Haqqi M.D.
Introduction to Probability and Statistics Thirteenth Edition Chapter 12 Linear Regression and Correlation.
Copyright ©2011 Nelson Education Limited Linear Regression and Correlation CHAPTER 12.
© Copyright McGraw-Hill Correlation and Regression CHAPTER 10.
Copyright ©2011 Brooks/Cole, Cengage Learning Inference about Simple Regression Chapter 14 1.
Lecture 10: Correlation and Regression Model.
Chapter 14: Inference for Regression. A brief review of chapter 4... (Regression Analysis: Exploring Association BetweenVariables )  Bi-variate data.
Chapter 8: Simple Linear Regression Yang Zhenlin.
McGraw-Hill/IrwinCopyright © 2009 by The McGraw-Hill Companies, Inc. All Rights Reserved. Simple Linear Regression Analysis Chapter 13.
Copyright (C) 2002 Houghton Mifflin Company. All rights reserved. 1 Understandable Statistics Seventh Edition By Brase and Brase Prepared by: Lynn Smith.
Chapter 12 Simple Linear Regression n Simple Linear Regression Model n Least Squares Method n Coefficient of Determination n Model Assumptions n Testing.
1 1 Slide The Simple Linear Regression Model n Simple Linear Regression Model y =  0 +  1 x +  n Simple Linear Regression Equation E( y ) =  0 + 
Regression Analysis Presentation 13. Regression In Chapter 15, we looked at associations between two categorical variables. We will now focus on relationships.
REGRESSION AND CORRELATION SIMPLE LINEAR REGRESSION 10.2 SCATTER DIAGRAM 10.3 GRAPHICAL METHOD FOR DETERMINING REGRESSION 10.4 LEAST SQUARE METHOD.
Regression and Correlation
11-1 Empirical Models Many problems in engineering and science involve exploring the relationships between two or more variables. Regression analysis.
Statistics for Managers using Microsoft Excel 3rd Edition
Correlation and Simple Linear Regression
Simple Linear Regression
Correlation and Simple Linear Regression
Linear Regression/Correlation
Correlation and Simple Linear Regression
SIMPLE LINEAR REGRESSION
Simple Linear Regression and Correlation
SIMPLE LINEAR REGRESSION
Presentation transcript:

Introduction to Probability and Statistics Linear Regression and Correlation

Example  Let y be a student’s college achievement, measured by his/her GPA. This might be a function of several variables: x 1 = rank in high school class x 2 = high school’s overall rating x 3 = high school GPA x 4 = SAT scores  We want to predict y using knowledge of x 1, x 2, x 3 and x 4.

Some Questions  Which of the independent variables are useful and which are not?  How could we create a prediction equation to allow us to predict y using knowledge of x 1, x 2, x 3 etc?  How good is this prediction? We start with the simplest case, in which the response y is a function of a single independent variable, x.

A Simple Linear Model  We use the equation of a line to describe the relationship between y and x for a sample of n pairs, (x, y).  If we want to describe the relationship between y and x for the whole population, there are two models we can choose Deterministic Model: y =  x Probabilistic Model: –y = deterministic model + random error –y =  x  Deterministic Model: y =  x Probabilistic Model: –y = deterministic model + random error –y =  x 

A Simple Linear Model exactly  Since the measurements that we observe do not generally fall exactly on a straight line, we choose to use:  Probabilistic Model: y = x  y = x  E(y) = x E(y) = x Points deviate from the line of means line of means by an amount  where  has a normal distribution with mean 0 and variance  2.

The Random Error E(y) = x,  The line of means, E(y) = x, describes average value of y for any fixed value of x.  The population of measurements is generated as y deviates from the population line  by . We estimate   andusing sample information.

The Method of Least Squares  The equation of the best-fitting line is calculated using a set of n pairs (x i, y i ). We choose our estimates a and b to estimate  and  so that the vertical distances of the points from the line, are minimized. Applet

Least Squares Estimators

Example The table shows the math achievement test scores for a random sample of n = 10 college freshmen, along with their final calculus grades. Student Math test, x Calculus grade, y Use your calculator to find the sums and sums of squares.

Example

total sum of squares  The total variation in the experiment is measured by the total sum of squares: The Analysis of Variance Total SS The Total SS is divided into two parts: SSR SSR (sum of squares for regression): measures the variation explained by using x in the model. SSE SSE (sum of squares for error): measures the leftover variation not explained by x.

The Analysis of Variance We calculate

The ANOVA Table Total df = Mean Squares Regression df = Error df = n -1 1 n –1 – 1 = n - 2 MSR = SSR/(1) MSE = SSE/(n-2) SourcedfSSMSF Regression1SSRSSR/(1)MSR/MSE Errorn - 2SSESSE/(n-2) Totaln -1Total SS

The Calculus Problem SourcedfSSMSF Regression Error Total

Testing the Usefulness of the Model The first question to ask is whether the independent variable x is of any use in predicting y. If it is not, then the value of y does not change, regardless of the value of x. This implies that the slope of the line, , is zero.

Testing the Usefulness of the Model The test statistic is function of b, our best estimate of  Using MSE as the best estimate of the random variation  2, we obtain a t statistic.

The Calculus Problem Is there a significant relationship between the calculus grades and the test scores at the 5% level of significance? Applet Reject H 0 when |t| > Since t = 4.38 falls into the rejection region, H 0 is rejected. There is a significant linear relationship between the calculus grades and the test scores for the population of college freshmen.

The F Test  You can test the overall usefulness of the model using an F test. If the model is useful, MSR will be large compared to the unexplained variation, MSE. This test is exactly equivalent to the t-test, with t 2 = F.

Measuring the Strength of the Relationship If the independent variable x is of useful in predicting y, you will want to know how well the model fits. The strength of the relationship between x and y can be measured using:

Measuring the Strength of the Relationship Since Total SS = SSR + SSE, r 2 measures the proportion of the total variation in the responses that can be explained by using the independent variable x in the model. the percent reduction the total variation by using the regression equation rather than just using the sample mean y-bar to estimate y. For the calculus problem, r 2 =.705 or 70.5%. The model is working well!

Interpreting a Significant Regression Even if you do not reject the null hypothesis that the slope of the line equals 0, it does not necessarily mean that y and x are unrelated. Type II error—falsely declaring that the slope is 0 and that x and y are unrelated. It may happen that y and x are perfectly related in a nonlinear way.

Some Cautions You may have fit the wrong model. Extrapolation—predicting values of y outside the range of the fitted data. Causality—Do not conclude that x causes y. There may be an unknown variable at work!

Checking the Regression Assumptions 1.The relationship between x and y is linear, given by y =  +  x +  2.The random error terms  are independent and, for any value of x, have a normal distribution with mean 0 and variance  2. 1.The relationship between x and y is linear, given by y =  +  x +  2.The random error terms  are independent and, for any value of x, have a normal distribution with mean 0 and variance  2. Remember that the results of a regression analysis are only valid when the necessary assumptions have been satisfied.

Diagnostic Tools 1.Normal probability plot of residuals 2.Plot of residuals versus fit or residuals versus variables 1.Normal probability plot of residuals 2.Plot of residuals versus fit or residuals versus variables We use the following diagnostic tools to check the normality assumption and the assumption of equal variances.

Residuals residual errorThe residual error is the “leftover” variation in each data point after the variation explained by the regression model has been removed. normalIf all assumptions have been met, these residuals should be normal, with mean 0 and variance  2.

If the normality assumption is valid, the plot should resemble a straight line, sloping upward to the right. If not, you will often see the pattern fail in the tails of the graph. If the normality assumption is valid, the plot should resemble a straight line, sloping upward to the right. If not, you will often see the pattern fail in the tails of the graph. Normal Probability Plot

If the equal variance assumption is valid, the plot should appear as a random scatter around the zero center line. If not, you will see a pattern in the residuals. If the equal variance assumption is valid, the plot should appear as a random scatter around the zero center line. If not, you will see a pattern in the residuals. Residuals versus Fits

Estimation and Prediction Once you have determined that the regression line is useful used the diagnostic plots to check for violation of the regression assumptions. You are ready to use the regression line to Estimate the average value of y for a given value of x Predict a particular value of y for a given value of x.

Estimation and Prediction Estimating the average value of y when x = x 0 Estimating a particular value of y when x = x 0

Estimation and Prediction The best estimate of either E(y) or y for a given value x = x 0 is Particular values of y are more difficult to predict, requiring a wider range of values in the prediction interval.

Estimation and Prediction

The Calculus Problem  Estimate the average calculus grade for students whose achievement score is 50 with a 95% confidence interval.

The Calculus Problem  Estimate the calculus grade for a particular student whose achievement score is 50 with a 95% confidence interval. Notice how much wider this interval is!

Correlation Analysis coefficient of correlationThe strength of the relationship between x and y is measured using the coefficient of correlation: (1)-1  r  1 (2) r and b have the same sign (3)r  0 means no linear relationship (4) r  1 or –1 means a strong (+) or (-) relationship

Example The table shows the heights and weights of n = 10 randomly selected college football players. Player Height, x Weight, y Use your calculator to find the sums and sums of squares.

Football Players r =.8261 Strong positive correlation As the player’s height increases, so does his weight. r =.8261 Strong positive correlation As the player’s height increases, so does his weight.

Some Correlation Patterns Exploring CorrelationUse the Exploring Correlation applet to explore some correlation patterns: Applet r = 0; No correlation r =.931; Strong positive correlation r = 1; Linear relationship r = -.67; Weaker negative correlation

Inference using r coefficient of correlationThe population coefficient of correlation is called  (“rho”). We can test for a significant correlation between x and y using a t test: This test is exactly equivalent to the t-test for the slope .

Example Is there a significant positive correlation between weight and height in the population of all college football players? Use the t-table with n-2 = 8 df to bound the p-value as p-value <.005. There is a significant positive correlation. Applet