Simple linear regression and correlation analysis


Simple linear regression and correlation analysis: significance testing

1. Simple linear regression analysis
Simple regression describes the relationship between two variables, generally Y = f(X):
Y = dependent variable (regressand)
X = independent variable (regressor)

Simple linear regression
The model is y_i = f(x_i) + e_i, where f(x) is the regression equation and e_i is the random error (residual deviation), an independent random quantity with distribution N(0, σ²).

Simple linear regression – straight line
y = b0 + b1·x, where b0 is the constant (intercept) and b1 is the coefficient of regression (slope).

Parameter estimates → least squares condition
The sum of squared differences between the actual y and the estimated y must be minimal:
S = Σ (y_i − b0 − b1·x_i)² → min, where n is the number of observations (x_i, y_i).
The adjustment: take the partial derivatives of the sum of squared deviations S with respect to the parameters b0 and b1 and set them equal to zero; this yields the normal equations.
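The least squares condition above can be sketched numerically. This is a minimal illustration with an assumed toy dataset (five observations); the closed-form estimates come from setting the two partial derivatives of S to zero.

```python
# Least-squares estimates for a straight line, computed from scratch.
# Toy data (assumed, for illustration only): five (x, y) observations.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Setting dS/db0 = 0 and dS/db1 = 0 gives these closed-form estimates:
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)

b1 = s_xy / s_xx          # coefficient of regression (slope)
b0 = y_bar - b1 * x_bar   # constant (intercept)

print(b0, b1)  # 2.2 and 0.6 for this toy data
```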

Two approaches to parameter estimation under the least squares condition (shown for the straight-line equation):
1. The normal equation system for the straight line:
Σy_i = n·b0 + b1·Σx_i
Σx_i·y_i = b0·Σx_i + b1·Σx_i²
2. The matrix approach, y = X·b + ε, with estimate b = (XᵀX)⁻¹·Xᵀ·y, where
y = vector of the dependent variable
X = matrix of the independent variable(s)
b = vector of regression coefficients (straight line → b0 and b1)
ε = vector of random errors
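A sketch of the matrix approach, using the same assumed toy data: for the straight line, XᵀX and Xᵀy reduce to simple sums, and the resulting 2×2 normal equation system can be solved directly (here by Cramer's rule).

```python
# Matrix form of least squares for the straight line: b = (X'X)^(-1) X'y.
# Toy data (assumed, for illustration only).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

# The design matrix X has a column of ones (for b0) and the x values (for b1),
# so X'X and X'y reduce to the familiar sums of the normal equation system:
xtx = [[n,       sum(xs)],
       [sum(xs), sum(x * x for x in xs)]]
xty = [sum(ys), sum(x * y for x, y in zip(xs, ys))]

# Solve the 2x2 system (X'X) b = X'y by Cramer's rule.
det = xtx[0][0] * xtx[1][1] - xtx[0][1] * xtx[1][0]
b0 = (xty[0] * xtx[1][1] - xtx[0][1] * xty[1]) / det
b1 = (xtx[0][0] * xty[1] - xty[0] * xtx[1][0]) / det

print(b0, b1)  # 2.2 and 0.6, matching the sum-based formulas
```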

Simple linear regression – fitted values and residuals
observation y_i; smoothed (fitted) value ŷ_i
residual deviation e_i = y_i − ŷ_i
residual sum of squares SSE = Σ e_i²
residual variance s² = SSE / (n − 2)
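These quantities can be computed directly; a minimal sketch with the same assumed toy data and its fitted parameters:

```python
# Residual deviations, residual sum of squares, residual variance.
# Toy data (assumed); b0, b1 are its least-squares estimates.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
b0, b1 = 2.2, 0.6

y_hat = [b0 + b1 * x for x in xs]              # smoothed (fitted) values
resid = [y - yh for y, yh in zip(ys, y_hat)]   # residual deviations e_i
sse = sum(e * e for e in resid)                # residual sum of squares
s2 = sse / (n - 2)                             # residual variance (two parameters fitted)

print(sse, s2)  # 2.4 and 0.8 for this toy data
```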

Simple linear regression → dependence of Y on X
Straight-line equation: ŷ = b0 + b1·x
Normal equation system: Σy = n·b0 + b1·Σx; Σxy = b0·Σx + b1·Σx²
Computational formulas for the parameter estimates:
b1 = Σ(x_i − x̄)(y_i − ȳ) / Σ(x_i − x̄)²; b0 = ȳ − b1·x̄

Simple linear regression → dependence of X on Y
Associated straight-line equation: x̂ = b0′ + b1′·y
Computational formulas for the parameter estimates:
b1′ = Σ(x_i − x̄)(y_i − ȳ) / Σ(y_i − ȳ)²; b0′ = x̄ − b1′·ȳ

2. Correlation analysis
Correlation analysis measures the strength of the dependence via the coefficient of correlation r, with |r| in <0; 1>:
|r| in <0; 0.33> → weak dependence
|r| in <0.34; 0.66> → medium strong dependence
|r| in <0.67; 1> → strong to very strong dependence
r² = coefficient of determination: the proportion (%) of the variance of Y that is explained by the effect of X.
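A minimal sketch of r and r² on the same assumed toy data:

```python
from math import sqrt

# Coefficient of correlation r and coefficient of determination r^2.
# Toy data (assumed, for illustration only).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)

r = s_xy / sqrt(s_xx * s_yy)   # coefficient of correlation
r2 = r * r                     # coefficient of determination

print(round(r, 4), round(r2, 4))  # 0.7746 and 0.6: medium strong to strong
```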

3. Significance testing in simple regression

Significance test of the parameter b1 (straight line)
H0: β1 = 0 (two-sided)
Test criterion: t = b1 / s_b1, where s_b1 is the estimated standard error of b1
Table value: two-sided t with n − 2 degrees of freedom
If |test criterion| > table value → H0 is rejected and H1 is valid; equivalently, if alpha > p-value → H0 is rejected.

Estimation of the coefficient of regression
Interval estimate for the unknown β_i: b_i ± t_(1−α/2)(n − 2) · s_bi
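A sketch of the interval estimate, with b1 and its standard error assumed to come from an already fitted toy example (slope 0.6, residual variance 0.8, S_xx = 10, n = 5):

```python
from math import sqrt

# Interval estimate for the unknown beta1: b1 +/- t * s_b1.
# All inputs below are assumed values from a fitted toy example.
b1 = 0.6
s_b1 = sqrt(0.8 / 10)   # sqrt(residual variance / S_xx)
t_crit = 3.182          # two-sided t table value, alpha = 0.05, df = 3

lower = b1 - t_crit * s_b1
upper = b1 + t_crit * s_b1
print(round(lower, 2), round(upper, 2))  # -0.3 and 1.5: the interval covers zero,
                                         # consistent with not rejecting H0: beta1 = 0
```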

Significance test of the coefficient of correlation r (straight line)
H0: ρ = 0 (two-sided)
Test criterion: t = r·√(n − 2) / √(1 − r²)
Table value: two-sided t with n − 2 degrees of freedom
If |test criterion| > table value → H0 is rejected and H1 is valid; equivalently, if alpha > p-value → H0 is rejected.
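A minimal sketch, with r assumed from the toy dataset used earlier (n = 5). Note that for the straight line this t equals the t of the slope test, so the two tests agree.

```python
from math import sqrt

# t-type test criterion for H0: rho = 0 (straight-line case).
r = 0.7746   # assumed sample correlation from a toy dataset
n = 5

t = r * sqrt(n - 2) / sqrt(1 - r * r)
t_crit = 3.182   # two-sided table value, alpha = 0.05, df = n - 2

print(round(t, 3), abs(t) > t_crit)  # same t as the slope test; H0 not rejected
```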

Estimation of the coefficient of correlation
For small samples and non-normal distributions, use the Fisher Z transformation:
first, r is assigned to Z (from tables): Z = ½·ln((1 + r)/(1 − r))
interval estimate for the unknown ρ (on the Z scale): Z ± u_(1−α/2) / √(n − 3)
last step: the limits Z1 and Z2 are assigned back to r1 and r2.
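The same three steps in code, with an assumed r and n; the back-transformation from Z to r is the hyperbolic tangent, which replaces the table lookup.

```python
from math import sqrt, log, tanh

# Fisher Z interval estimate for the unknown rho.
r = 0.7746   # assumed sample correlation, toy example
n = 5

z = 0.5 * log((1 + r) / (1 - r))        # step 1: r -> Z (atanh transformation)
se = 1 / sqrt(n - 3)
z1, z2 = z - 1.96 * se, z + 1.96 * se   # step 2: 95% interval on the Z scale

r1, r2 = tanh(z1), tanh(z2)             # step 3: Z1 and Z2 back to r1 and r2
print(round(r1, 3), round(r2, 3))       # a very wide interval, as n is tiny
```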

The summary ANOVA table

Variation                        Sum of squared deviations   df      Variance             Test criterion
along the regression function    SSR                         k − 1   MSR = SSR/(k − 1)    F = MSR/MSE
across the regression function   SSE                         n − k   MSE = SSE/(n − k)
total                            SST                         n − 1

(k = number of estimated parameters; for the straight line k = 2.)

The summary ANOVA (alternative significance test)
Test criterion: F = MSR / MSE
Table value: F_(1−α)(k − 1; n − k); if test criterion > table value → H0 is rejected.
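A sketch of the ANOVA decomposition and the F criterion on the assumed toy data; for the straight line, F equals the square of the t criterion for b1.

```python
# ANOVA F test for the straight line, on assumed toy data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n, k = len(xs), 2   # k = number of estimated parameters (b0, b1)

x_bar, y_bar = sum(xs) / n, sum(ys) / n
s_xx = sum((x - x_bar) ** 2 for x in xs)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * x for x in xs]

ss_reg = sum((yh - y_bar) ** 2 for yh in y_hat)           # along the regression function
ss_res = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))   # across the regression function

ms_reg = ss_reg / (k - 1)
ms_res = ss_res / (n - k)
f = ms_reg / ms_res   # test criterion; compare with the table value F(k - 1; n - k)

print(round(f, 2))  # 4.5, which is t^2 = 2.121^2 from the slope test
```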

Multicollinearity
A relationship between (among) the independent variables: there is an almost perfect linear relationship among the independent variables (X1, X2, …, XN) → high multicollinearity.
Before the model is formed, the relationships among the regressors need to be analyzed.
The linear independence of the columns (variables) of X is disturbed.

Causes of multicollinearity
trends in time series, i.e. similar tendencies among the regressor variables
inclusion of exogenous variables and lagged (delayed) variables
use of 0/1 (dummy) coding in the sample

Consequences of multicollinearity
wrong inference from the sample: the null hypothesis of a zero regression coefficient is not rejected although it really should be
confidence intervals are wide
the regression coefficient estimates are strongly influenced by changes in the data
regression coefficients can have the wrong sign
the regression equation is not suitable for prediction

Testing for multicollinearity
paired (pairwise) coefficients of correlation, each checked with a t-test
Farrar–Glauber test: if test criterion > table value → H0 (no multicollinearity) is rejected
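The pairwise-correlation check can be sketched as follows, with two assumed regressors X1 and X2 that are nearly proportional; a large |r| (and a large t) between regressors signals possible multicollinearity.

```python
from math import sqrt

# Pairwise correlation between two independent variables X1 and X2.
# Toy regressor data (assumed): X2 is almost exactly 2 * X1.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 4, 6, 8, 9, 12]
n = len(x1)

m1, m2 = sum(x1) / n, sum(x2) / n
s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
s11 = sum((a - m1) ** 2 for a in x1)
s22 = sum((b - m2) ** 2 for b in x2)
r = s12 / sqrt(s11 * s22)

# t-test of H0: rho = 0 between the two regressors.
t = r * sqrt(n - 2) / sqrt(1 - r * r)
print(round(r, 3), round(t, 1))  # r near 1 -> high multicollinearity warning
```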

Elimination of multicollinearity
exclude variables
collect a new sample
re-formulate and re-think the model (the chosen variables)
transform the chosen variables, i.e. recompute them (e.g. not total consumption, but consumption per capita, etc.)

Regression diagnostics
quality of the data for the chosen model
suitability of the model for the chosen dataset
conditions of the method

Data quality evaluation
A) outlying observations in the y values
studentized residuals: |SR| > 2 → outlying observation
an outlying observation need not be influential (an influential observation has a substantial influence on the regression)

Data quality evaluation
B) outlying observations in the x values
Hat Diag leverage h_ii – the diagonal values of the hat matrix H, H = X·(XᵀX)⁻¹·Xᵀ
h_ii greater than the cut-off (commonly 2p/n, where p is the number of parameters) → outlying observation

Data quality evaluation
C) influential observations
Cook's D (an influential observation influences the whole equation): D_i > the cut-off (commonly 4/n) → influential observation
Welsch–Kuh DFFITS distance (an influential observation influences its own smoothed value): |DFFITS| > the cut-off (commonly 2·√(p/n)) → influential observation
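The diagnostics above can be sketched together for the straight-line case, where the hat diagonal has the closed form h_ii = 1/n + (x_i − x̄)²/S_xx. The toy data are assumed and include one deliberately high-leverage point; the studentized residual here is the internally studentized version, so the Cook's D and DFFITS values are approximate illustrations rather than the exact textbook (externally studentized) quantities.

```python
from math import sqrt

# Sketch of regression diagnostics for simple linear regression.
# Toy data (assumed) with a high-leverage point at x = 10.
xs = [1, 2, 3, 4, 5, 10]
ys = [2, 4, 5, 4, 5, 14]
n, p = len(xs), 2            # p = number of estimated parameters

x_bar, y_bar = sum(xs) / n, sum(ys) / n
s_xx = sum((x - x_bar) ** 2 for x in xs)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
b0 = y_bar - b1 * x_bar
resid = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
s = sqrt(sum(e * e for e in resid) / (n - p))   # residual standard deviation

for x, e in zip(xs, resid):
    h = 1 / n + (x - x_bar) ** 2 / s_xx         # leverage (hat-matrix diagonal)
    sr = e / (s * sqrt(1 - h))                  # internally studentized residual
    cook_d = sr ** 2 / p * h / (1 - h)          # Cook's D (approximate form)
    dffits = sr * sqrt(h / (1 - h))             # DFFITS-style distance
    flags = []
    if abs(sr) > 2:
        flags.append("outlying in y")
    if h > 2 * p / n:                           # common leverage rule of thumb
        flags.append("outlying in x")
    print(f"x={x:>2} h={h:.3f} SR={sr:+.2f} D={cook_d:.3f} DFFITS={dffits:+.2f}", flags)
```

The point at x = 10 is flagged as outlying in x: its leverage (about 0.84) far exceeds 2p/n, so it dominates the fitted slope.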

Conditions of the method
the regression parameters can take any value in <−∞; +∞>
the regression model is linear in its parameters (if it is not linear → transform the data)
independence of the residuals
normal distribution of the residuals, N(0; σ²)