LECTURE 3 Introduction to Linear Regression and Correlation Analysis

LECTURE 3 Introduction to Linear Regression and Correlation Analysis
1. Simple Linear Regression
2. Regression Analysis
3. Regression Model Validity

Goals
After this, you should be able to:
- Interpret the simple linear regression equation for a set of data
- Use descriptive statistics to describe the relationship between X and Y
- Determine whether a regression model is significant

Goals (continued)
After this, you should be able to:
- Interpret confidence intervals for the regression coefficients
- Interpret confidence intervals for a predicted value of Y
- Check whether the regression assumptions are satisfied
- Check whether the data contain unusual values

Introduction to Regression Analysis
Regression analysis is used to:
- Predict the value of a dependent variable based on the value of at least one independent variable
- Explain the impact of changes in an independent variable on the dependent variable
Dependent variable: the variable we wish to explain
Independent variable: the variable used to explain the dependent variable

Simple Linear Regression Model
- Only one independent variable, x
- The relationship between x and y is described by a linear function
- Changes in y are assumed to be caused by changes in x

Types of Regression Models
- Positive linear relationship
- Negative linear relationship
- Relationship NOT linear
- No relationship

Population Linear Regression
The population regression model:

    y = β0 + β1x + ε

where y is the dependent variable, x is the independent variable, β0 is the population y-intercept, β1 is the population slope coefficient, and ε is the random error term (residual). β0 + β1x is the linear component; ε is the random error component.

Linear Regression Assumptions
- The underlying relationship between the x variable and the y variable is linear
- The distribution of the errors has constant variability
- Error values are normally distributed
- Error values are independent (over time)

Population Linear Regression
(Graph: for each xi, the observed value of y equals the predicted value on the line, β0 + β1xi, plus the random error εi; the line has intercept β0 and slope β1.)

Estimated Regression Model
The sample regression line provides an estimate of the population regression line:

    ŷ = b0 + b1x

where ŷ is the estimated (or predicted) y value, b0 is the estimate of the regression intercept, b1 is the estimate of the regression slope, and x is the independent variable.

Interpretation of the Slope and the Intercept
- b0 is the estimated average value of y when the value of x is zero
- b1 is the estimated change in the average value of y as a result of a one-unit change in x

Finding the Least Squares Equation
The coefficients b0 and b1 are found using computer software, such as Excel's Data Analysis add-in or MegaStat. Other regression measures are also computed as part of computer-based regression analysis.
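Although the lecture defers the arithmetic to Excel or MegaStat, the least squares coefficients come from simple closed-form formulas: b1 = Sxy / Sxx and b0 = ȳ − b1·x̄. The following is a minimal sketch of those formulas using a small hypothetical data set (not the lecture's house data), assuming NumPy is available:

```python
import numpy as np

# Hypothetical (x, y) data for illustration -- not the lecture's house data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Least squares estimates: b1 = S_xy / S_xx, b0 = ybar - b1 * xbar
xbar, ybar = x.mean(), y.mean()
b1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
b0 = ybar - b1 * xbar

print(b0, b1)  # intercept and slope of the fitted line y-hat = b0 + b1*x
```

Software packages use exactly these formulas internally; running them by hand once makes the computer output easier to interpret.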

Simple Linear Regression Example
- A real estate agent wishes to examine the relationship between the selling price of a home and its size (measured in square feet)
- A random sample of 10 houses is selected
- Dependent variable (y) = house price in $1000s
- Independent variable (x) = square feet

Sample Data for House Price Model

House Price in $1000s (y)   Square Feet (x)
245                         1400
312                         1600
279                         1700
308                         1875
199                         1100
219                         1550
405                         2350
324                         2450
319                         1425
255                         —

Regression output can be produced in Excel (Data > Data Analysis > Regression) or in MegaStat (Correlation/Regression).

MegaStat Output
The regression equation is: Price($000) = 98.2483 + 0.1098 (Square feet)

Regression Analysis
r²           0.581     n          10
r            0.762     k          1
Std. Error   41.330    Dep. Var.  Price($000)

ANOVA table
Source       SS            df   MS            F       p-value
Regression   18,934.9348    1   18,934.9348   11.08   .0104
Residual     13,665.5652    8    1,708.1957
Total        32,600.5000    9

Regression output                                      confidence interval
variables     coefficients   std. error   t (df=8)    95% lower   95% upper
Intercept     98.2483
Square feet   0.1098         0.0330       3.329       0.0337      0.1858

Graphical Presentation
House price model: scatter plot and regression line (slope = 0.10977, intercept = 98.248)

Interpretation of the Intercept, b0
b0 is the estimated average value of Y when the value of X is zero (if x = 0 is in the range of observed x values). Here, houses with 0 square feet do not occur, so b0 = 98.24833 just indicates the height of the line at x = 0.

Interpretation of the Slope Coefficient, b1
b1 measures the estimated change in Y resulting from a one-unit increase in X. Here, b1 = .10977 tells us that the average value of a house increases by .10977($1000) = $109.77 for each additional square foot of size.

Least Squares Regression Properties
- The simple regression line always passes through the point (x̄, ȳ), the means of the x and y variables
- The least squares coefficients are unbiased estimates of β0 and β1

Coefficient of Determination, R²
The percentage of variability in Y that can be explained by variability in X. Note: in the single-independent-variable case, the coefficient of determination is

    R² = r²

where R² is the coefficient of determination and r is the simple correlation coefficient.
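The identity R² = r² in simple regression can be verified numerically. The sketch below computes R² from the sums of squares (1 − SSE/SST) and compares it with the squared correlation coefficient, again on hypothetical data rather than the lecture's house data; it assumes NumPy is available:

```python
import numpy as np

# Hypothetical data for illustration -- not the lecture's house data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Fit the least squares line
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

# R^2 from the sums of squares: 1 - SSE/SST
ss_total = np.sum((y - y.mean()) ** 2)
ss_resid = np.sum((y - y_hat) ** 2)
r_squared = 1 - ss_resid / ss_total

# In simple regression this equals the squared correlation coefficient
r = np.corrcoef(x, y)[0, 1]
print(r_squared, r ** 2)  # the two values agree
```

This equivalence holds only with a single independent variable; in multiple regression R² is computed from the sums of squares.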

Examples of R² Values
- R² = 1 (correlation = +1 or −1): perfect linear relationship between x and y; 100% of the variation in y is explained by variation in x
- 0 < R² < 1 (correlation positive or negative): weaker linear relationship between x and y; some but not all of the variation in y is explained by variation in x
- R² = 0: no linear relationship between x and y; the value of Y does not depend on x (none of the variation in y is explained by variation in x)

Excel Output
r² 0.581, r 0.762, Std. Error 41.330
- 58.08% of the variation in house prices is explained by variation in square feet
- The correlation of .762 shows a fairly strong direct relationship
- The typical error in predicting Price is 41.33($000) = $41,330

Inference about the Slope: t Test
t test for a population slope: is there a linear relationship between x and y?
Null and alternative hypotheses:
H0: β1 = 0 (no linear relationship)
Ha: β1 ≠ 0 (linear relationship does exist)
Obtain the p-value from the ANOVA table or from the row for the slope coefficient (they are the same in simple regression).

Inference about the Slope: t Test (continued)
Estimated regression equation: house price = 98.25 + 0.1098 (square feet)
The slope of this model is 0.1098. Does square footage of the house affect its sales price?

Inferences about the Slope: t Test Example
H0: β1 = 0    Ha: β1 ≠ 0
From Excel output:
              Coefficients   Standard Error   t Stat    P-value
Intercept     98.24833       58.03348         1.69296   0.12892
Square Feet   0.10977        0.03297          3.32938   0.01039
Decision: reject H0, since the p-value (0.01039) is below α = 0.05.
Conclusion: there is strong evidence that square feet is related to house price.
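The t statistic in the output above is the slope divided by its standard error, with the p-value from a t distribution on n − 2 degrees of freedom. Here is a minimal sketch of that computation, using hypothetical data (not the lecture's house data) and assuming NumPy and SciPy are available:

```python
import numpy as np
from scipy import stats

# Hypothetical data for illustration -- not the lecture's house data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

# Least squares fit and residuals
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)

# Standard error of the estimate and standard error of the slope
se = np.sqrt(np.sum(resid ** 2) / (n - 2))
s_b1 = se / np.sqrt(np.sum((x - x.mean()) ** 2))

# t statistic and two-tailed p-value for H0: beta1 = 0
t_stat = b1 / s_b1
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)
print(t_stat, p_value)
```

`scipy.stats.linregress(x, y)` returns the same slope, t-based p-value, and standard error in one call, which is a convenient cross-check.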

Regression Analysis for Description
Confidence interval estimate of the slope, from the Excel printout for house prices:
              Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept     98.24833       58.03348         1.69296   0.12892   -35.57720   232.07386
Square Feet   0.10977        0.03297          3.32938   0.01039   0.03374     0.18580
We can be 95% confident that average house price increases by between $33.74 and $185.80 for each 1-square-foot increase.
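The Lower 95% and Upper 95% columns come from b1 ± t(0.975, n−2) · s_b1. A minimal sketch of that interval, on hypothetical data (not the lecture's house data) and assuming NumPy and SciPy:

```python
import numpy as np
from scipy import stats

# Hypothetical data for illustration -- not the lecture's house data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

# Least squares fit, residuals, and standard error of the slope
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)
se = np.sqrt(np.sum(resid ** 2) / (n - 2))
s_b1 = se / np.sqrt(np.sum((x - x.mean()) ** 2))

# 95% confidence interval for the slope: b1 +/- t(0.975, n-2) * s_b1
t_crit = stats.t.ppf(0.975, df=n - 2)
lower, upper = b1 - t_crit * s_b1, b1 + t_crit * s_b1
print(lower, upper)
```

An interval that excludes zero leads to the same conclusion as the t test: the slope is significantly different from zero at the 5% level.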

Estimates of Expected y for Different Values of x
(Graph: the fitted line ŷ = b0 + b1x shows how the estimate of y changes with x; at a chosen value xp, the point on the line gives the estimated expected value ŷp.)

Interval Estimates for Different Values of x
Prediction interval for an individual y, given xp. The farther xp is from x̄, the less accurate the prediction.
(Graph: prediction limits around the line ŷ = b0 + b1x, widening as xp moves away from x̄.)

Example: House Prices
Estimated regression equation: house price = 98.25 + 0.1098 (square feet)
Predict the price for a house with 2000 square feet.

Example: House Prices (continued)
Predict the price for a house with 2000 square feet:
house price = 98.25 + 0.1098(2000) = 317.85
The predicted price for a house with 2000 square feet is 317.85($1000s) = $317,850.

Estimation of Individual Values: Example
Prediction interval estimate for y|xp: find the 95% prediction interval for an individual house with 2,000 square feet.
Predicted price: ŷ = 317.85 ($1,000s) = $317,850
MegaStat gives both the predicted value and the lower and upper limits:

Predicted values for: Price($000)
Square feet   Predicted   95% CI lower   95% CI upper   95% PI lower   95% PI upper
2,000         317.784     280.664        354.903        215.503        420.065

The prediction interval endpoints are $215,503 and $420,065: we can be 95% confident that the price of a 2000-square-foot home will fall within those limits.
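The MegaStat output shows two intervals at the same x value: a confidence interval for the mean of y and a wider prediction interval for an individual y. The sketch below computes both from the standard formulas, using hypothetical data (not the lecture's house data) and a hypothetical prediction point x_p; it assumes NumPy and SciPy:

```python
import numpy as np
from scipy import stats

# Hypothetical data for illustration -- not the lecture's house data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)
se = np.sqrt(np.sum(resid ** 2) / (n - 2))
s_xx = np.sum((x - x.mean()) ** 2)
t_crit = stats.t.ppf(0.975, df=n - 2)

x_p = 4.5                  # hypothetical x value to predict at
y_hat_p = b0 + b1 * x_p    # point prediction

# Confidence interval for the MEAN of y at x_p (narrower)
ci_margin = t_crit * se * np.sqrt(1 / n + (x_p - x.mean()) ** 2 / s_xx)
# Prediction interval for an INDIVIDUAL y at x_p (wider: extra "1 +" term)
pi_margin = t_crit * se * np.sqrt(1 + 1 / n + (x_p - x.mean()) ** 2 / s_xx)

print((y_hat_p - ci_margin, y_hat_p + ci_margin))
print((y_hat_p - pi_margin, y_hat_p + pi_margin))
```

Both margins grow with (x_p − x̄)², which is why the interval bands widen as x_p moves away from the mean of x.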

Residual Analysis
Purposes:
- Check the linearity assumption
- Check the constant variability assumption for all levels of predicted Y
- Check the normal residuals assumption
- Check for independence over time
Graphical analysis of residuals:
- Plot residuals vs. x and vs. predicted Y
- Create a normal probability plot (NPP) of the residuals to check for normality (or use skewness/kurtosis)
- Check the Durbin-Watson (D-W) statistic to confirm independence

Residual Analysis for Linearity
(Plots: residuals vs. x. A random scatter of residuals around zero indicates a linear relationship; a curved pattern indicates the relationship is not linear.)

Residual Analysis for Constant Variance
(Plots: residuals vs. ŷ. A band of roughly constant width indicates constant variance; a funnel shape indicates non-constant variance.)

Residual Analysis for Normality
Create a normal probability plot (NPP) of the residuals; if you see an approximately straight line, the residuals are acceptably normal. You can also use skewness/kurtosis: if both are within ±1, the residuals are acceptably normal.

Residual Analysis for Independence
Check the Durbin-Watson (D-W) statistic to confirm independence; if it is greater than 1.3, the residuals are acceptably independent. This check is needed only if the data are collected over time.
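The numerical checks above are easy to compute directly from the residuals: skewness and excess kurtosis for normality, and the Durbin-Watson statistic (the ratio of squared successive residual differences to the residual sum of squares, which always falls between 0 and 4) for independence. A minimal sketch on hypothetical data (not the lecture's house data), assuming NumPy and SciPy:

```python
import numpy as np
from scipy import stats

# Hypothetical data for illustration -- not the lecture's house data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Fit the least squares line and compute residuals
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)

# Skewness and excess kurtosis of the residuals (normality rule of thumb: within +/- 1)
skewness = stats.skew(resid)
kurtosis = stats.kurtosis(resid)

# Durbin-Watson statistic (independence; only meaningful for time-ordered data)
dw = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)
print(skewness, kurtosis, dw)
```

Values of D-W near 2 indicate little autocorrelation; values near 0 or 4 indicate positive or negative autocorrelation, respectively.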

Checking Unusual Data Points
- Check for outliers from the predicted values (studentized and studentized deleted residuals do this; MegaStat highlights them in blue)
- Check for outliers on the X-axis; they are indicated by large leverage values, more than twice as large as the average leverage (MegaStat highlights them in blue)
- Check Cook's Distance, which measures the harmful influence of a data point on the equation by looking at residuals and leverage together; Cook's D > 1 suggests potentially harmful data points, which should be checked for data-entry error (MegaStat highlights them in blue, based on F distribution values)

Patterns of Outliers
a) Outlier is extreme in both X and Y but not in the pattern: the point is unlikely to alter the regression line.
b) Outlier is extreme in both X and Y as well as in the overall pattern: this point will strongly influence the regression line.
c) Outlier is extreme in X but nearly average in Y: the farther it is from the pattern, the more it will change the regression.
d) Outlier is extreme in Y but not in X: the farther it is from the pattern, the more it will change the regression.
e) Outlier is extreme in the pattern, but not in X or Y: the slope may not change much, but the intercept will be higher with this point included.

Summary
- Introduced simple linear regression analysis
- Calculated the coefficients for the simple linear regression equation
- Described measures of strength of the relationship (r, R², and se)

Summary (continued)
- Described inference about the slope
- Addressed prediction of individual values
- Discussed residual analysis to address the assumptions of regression and correlation
- Discussed checks for unusual data points