REGRESSION ANALYSIS

REGRESSION ANALYSIS
- Regression analysis attempts to establish the nature of the relationship between variables.
- It is a measure of the average relationship between two or more variables.
- It is among the most frequently used techniques in economics and business research.

Historical Origin of Regression
- Regression analysis was first developed in the latter part of the 19th century by Sir Francis Galton, who studied the relationship between the heights of fathers and sons.
- The heights of sons of both tall and short fathers appeared to "revert" or "regress" toward the mean of the group.
- Galton considered this tendency to be a regression to "mediocrity" and developed a mathematical description of it.
- Galton's model is the precursor of today's regression models.

REGRESSION ANALYSIS
- A statistical tool to estimate unknown values of one variable from known values of another variable.
- The variables are the independent variable (X) and the dependent variable (Y).
- Simple linear regression analysis uses only one predictor and fits a straight line.
- "Dependent" and "independent" refer to the mathematical or functional role: values of Y depend on values of X, but X may or may not be causing the change in Y.

USES
- Provides estimates of values of the dependent variable from values of the independent variable, via the regression line.
- Gives a measure of the error involved in using the regression line as the basis for estimation.
- The correlation coefficient can be calculated with the help of the regression coefficients.

DIFFERENCES FROM CORRELATION
- Correlation: measures the degree of relationship, i.e. the degree of covariability between the variables.
- Regression: studies the nature of the relationship.
- Correlation: cannot tell which variable is the cause and which the effect.
- Regression: one variable is treated as dependent and the other as independent.

REGRESSION LINES
- The two regression lines (Y on X, and X on Y) cut each other at the point of the means of X and Y.
- They are drawn on the least-squares principle.

REGRESSION EQUATIONS
- The regression equation of Y on X is expressed as: Y = a + bX
- Y is the dependent variable, X is the independent variable.
- 'a' is the Y-intercept; 'b' is the slope (the change in Y for a unit change in X).
- The values of 'a' and 'b' are found by the method of least squares.

REGRESSION EQUATIONS
- Least-squares method: the line is drawn through the plotted points in such a manner that the sum of squared deviations of the actual y values from the computed (fitted) y values is least.
- Σ(y − ŷ)² should be minimum to obtain the best-fitting line.

CHARACTERISTICS OF THE STRAIGHT LINE (BEST FIT)
- It gives the best fit to the data: Σ(y − ŷ)² is minimum, and the deviations above the line balance those below the line.
- The straight line passes through the overall mean of the data (x̄, ȳ).
- For data representing a sample from a population, the least-squares line is the 'best' estimate of the population regression line.

REGRESSION EQUATIONS
- Similarly, the regression equation of X on Y is expressed as: X = a + bY
- X is the dependent variable, Y is the independent variable.
- 'a' is the X-intercept; 'b' is the slope (the change in X for a unit change in Y).
- The values of 'a' and 'b' are again found by the method of least squares.

EXPRESSION FOR A LINE
[Figure: graph of the line y = 4 + 0.3x; a = intercept (where the line meets the y-axis) and b = slope = Δy′/Δx′, measured between two points P and Q on the line.]

REGRESSION ANALYSIS: LIMITATIONS
- It assumes the relationship has not changed since the regression equation was computed.
- The relationship shown by the scatter diagram may not hold if the equation is extended beyond the range of values used in computing it.

LINE OF BEST FIT
The regression equation is ŷ = a + bx, where
b = SPxy / SSx = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)²  and  a = ȳ − b·x̄
- The numerator of the equation for b is called the sum of products, SPxy.
- The denominator is the sum of squared deviations from the mean, SSx.
- The denominator is always positive, so the sign of the slope is determined by the sign of the numerator.
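A minimal Python sketch of these formulas (the function and variable names are illustrative, not from the slides):

def least_squares_fit(x, y):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of products SPxy and sum of squared deviations SSx
    sp_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    b = sp_xy / ss_x        # slope: its sign follows the sign of SPxy
    a = y_bar - b * x_bar   # intercept: the line passes through (x-bar, y-bar)
    return a, b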

REGRESSION EQUATION FOR A POINT ESTIMATE
- If the number of hours of study is 4, what is the estimate of marks in the exam?
- The 'point estimate' of y using the regression equation: ŷ = a + b·x = 1.0277 + 5.1389 × 4 = 21.58
- The value of x for which you estimate y should lie within the range of the given data (here, 3–10 hours).
- The reliability of a point estimate depends on: the sample size, the amount of variation within the sample, and the value of x itself.
- Therefore, an 'interval estimate' is always better.
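A quick arithmetic check of the point estimate above, reusing the a and b values quoted on the slide (the underlying hours-of-study data are not reproduced here):

a, b = 1.0277, 5.1389
x = 4                       # hours of study, within the observed range 3-10
print(round(a + b * x, 2))  # 21.58 marks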

STANDARD ERROR OF ESTIMATE (a measure of goodness of fit; also called the standard error of the regression)
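For a simple linear regression fitted to n paired observations, the standard error of estimate is conventionally defined as (the n − 2 reflects the two estimated parameters, a and b):

s_y,x = √( Σ(y − ŷ)² / (n − 2) )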

ASSUMPTIONS ('LINE')
1. For a given value of x, the actual y values are normally distributed around the estimated value ŷ (deviations half negative and half positive).
2. The mean of each error component is zero (the mean of all y's for a given x equals the estimate ŷ).
3. The variances of the error components (the variances of the y's at the various x's) are the same: homoscedasticity.
4. The errors are independent of each other.

Assumptions of the Simple Linear Regression Model
LINE: Linear, Independent, Normal and Equal variance.
[Figure: identical normal distributions of errors, all centered on the regression line; for each x, y is distributed N(μ_y|x, σ²_y|x) with μ_y|x = α + βx.]

Pictorial Presentation of the Linear Regression Model
The number of man-hours Y is treated in a regression model as a random variable. For each lot size, a probability distribution of Y is postulated; the figure shows the probability distributions for X = 30, X = 50 and X = 70. The actual number of man-hours Y is then viewed as a random selection from that distribution. The means of the probability distributions have a systematic relation to the level of X. This systematic relationship is called the regression function of Y on X, and its graph is called the regression curve. In this figure the regression function is linear, which implies that the mean number of man-hours varies linearly with lot size.

REPRESENTING THE STANDARD ERROR OF ESTIMATE
[Figure: the fitted line y = a + bx with bands at ±1 s_y,x, ±2 s_y,x and ±3 s_y,x around it; dependent variable on the vertical axis, independent variable on the horizontal axis.]

STANDARD ERROR OF ESTIMATE
In the hours-of-study example the standard error of estimate works out to √2.884 = 1.698 marks. What does it mean?

INTERPRETING THE STANDARD ERROR OF ESTIMATE
- We can expect to find about 68.26% of the points (y values) within ±1 s_y,x, 95.45% within ±2 s_y,x, and 99.7% within ±3 s_y,x of the estimated ŷ.
- The larger the standard error of estimate, the greater the scattering of points around the regression line.
- Conversely, if s_y,x = 0, the estimating equation would be a perfect estimator of the dependent variable.

INTERVAL ESTIMATION
- Interval estimation of y for a given x value (for a given level of significance and sample size).
- The accuracy of this interval estimate depends on the distance of x from its mean x̄: the closer x is to x̄, the more reliable the estimate.
- Hence, for x values other than x̄, a correction factor is used.

CONFIDENCE INTERVAL FOR ESTIMATING THE MEAN
The confidence interval for the mean value of y (using the correction factor for a given x) is given by the expression below.
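In its usual form, for a chosen value x₀ of the independent variable and a t value with n − 2 degrees of freedom, the interval for the mean value of y is:

ŷ₀ ± t · s_y,x · √( 1/n + (x₀ − x̄)² / SSx ),  where ŷ₀ = a + b·x₀

The (x₀ − x̄)²/SSx term is the correction factor: the interval widens as x₀ moves away from x̄.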

PREDICTION INTERVAL FOR AN INDIVIDUAL Y VALUE
The interval for an individual value of y (and not the mean value of y) is given by the expression below. This interval is therefore wider than the interval for the mean of y.
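The usual form of this interval, under the same assumptions as above, simply adds 1 under the square root, which is why it is wider than the interval for the mean:

ŷ₀ ± t · s_y,x · √( 1 + 1/n + (x₀ − x̄)² / SSx )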

Confidence Interval for the Average Value of Y
[Figure: confidence band for the mean value of Y around the fitted regression line.]

Confidence Interval for the Average Value of Y and Prediction Interval for the Individual Value of Y
[Figure: the prediction interval band for an individual Y value lies outside the narrower confidence band for the mean value of Y.]

AN ILLUSTRATION: LRCA
Q. A study was conducted by the Air Force on the effect of sleep deprivation on air traffic controllers' performance while on watch. The sample data are as follows:

No. of hrs w/o sleep    No. of errors
        8                    8
        8                    6
       12                    6
       12                   10
       16                    8
       16                   14
       20                   14
       20                   12
       24                   16
       24                   12

Estimate the number of errors if the number of hours without sleep were 10, at the 95% confidence level.
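A sketch of how this question could be worked in plain Python, under the usual simple-regression assumptions; the commented values are approximate and come only from the data listed above (the 2.306 critical t value for 8 degrees of freedom is taken from standard t tables):

hours  = [8, 8, 12, 12, 16, 16, 20, 20, 24, 24]
errors = [8, 6, 6, 10, 8, 14, 14, 12, 16, 12]

n = len(hours)
x_bar = sum(hours) / n    # 16.0
y_bar = sum(errors) / n   # 10.6

ss_x  = sum((x - x_bar) ** 2 for x in hours)                            # 320.0
sp_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, errors))   # 152.0

b = sp_xy / ss_x       # slope ~ 0.475 extra errors per hour without sleep
a = y_bar - b * x_bar  # intercept ~ 3.0

sse = sum((y - (a + b * x)) ** 2 for x, y in zip(hours, errors))  # unexplained variation ~ 40.2
sst = sum((y - y_bar) ** 2 for y in errors)                       # total variation ~ 112.4
r2  = 1 - sse / sst               # coefficient of determination ~ 0.64
s_yx = (sse / (n - 2)) ** 0.5     # standard error of estimate ~ 2.24

x0 = 10
y0_hat = a + b * x0               # point estimate ~ 7.75 errors

t_crit = 2.306                    # t for n - 2 = 8 df, 95% two-sided (standard t table)
half_mean = t_crit * s_yx * (1 / n + (x0 - x_bar) ** 2 / ss_x) ** 0.5      # for mean y
half_indl = t_crit * s_yx * (1 + 1 / n + (x0 - x_bar) ** 2 / ss_x) ** 0.5  # for individual y

print(f"point estimate at x = {x0}: {y0_hat:.2f} errors, r^2 = {r2:.2f}")
print(f"95% interval for the mean y:      {y0_hat - half_mean:.2f} to {y0_hat + half_mean:.2f}")
print(f"95% interval for an individual y: {y0_hat - half_indl:.2f} to {y0_hat + half_indl:.2f}")

As the earlier slides suggest, the interval for an individual y comes out noticeably wider than the interval for the mean y.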

CORRELATION ANALYSIS
- How strong is the relationship between the dependent and independent variables? How are the variables correlated?
- A statistical tool to describe the degree to which one variable is linearly related to another.
- Measures for describing the correlation between two variables:
  - Coefficient of determination, r²
  - Coefficient of correlation, r

COEFFICIENT OF DETERMINATION
- Measures the extent or strength of the association: it is the percentage of explained variation in the dependent variable (y).
- Coefficient of determination r² = (Total variation − Unexplained variation) / Total variation = (SST − SSE) / SST
- For the ATC case (number of errors and hours without sleep, from the illustration above): r² = (112.4 − 40.2) / 112.4 ≈ 0.64. What does it mean? It means about 64% of the variation in errors is explained by lack of sleep; the balance could be due to other factors such as poor training.
- The sum of squares due to regression, SSR = SST − SSE, is the explained variation.

COEFFICIENT OF DETERMINATION
- SSR = SST − SSE is the explained (regression) sum of squares, and r² = SSR / SST.
- r² = 0 if ŷ = ȳ for all values of x (the fitted line is flat at the mean of y), showing no correlation.
- r² = 1 if ŷ = y for every observation (all points lie exactly on the line), showing perfect correlation.

CORRELATION ANALYSIS: INTERPRETING r² ANOTHER WAY
- Interpret the coefficient of determination by looking at the amount of the variation in y that can be explained by the regression line.
- Total variation = Explained variation + Unexplained variation
- Σ(y − ȳ)² = Σ(ŷ − ȳ)² + Σ(y − ŷ)²
- r² = Explained variation / Total variation
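As a worked check using the air-traffic-controller data from the illustration above (values rounded to one decimal): total variation Σ(y − ȳ)² ≈ 112.4 splits into explained variation Σ(ŷ − ȳ)² ≈ 72.2 and unexplained variation Σ(y − ŷ)² ≈ 40.2, so 112.4 ≈ 72.2 + 40.2 and r² ≈ 72.2 / 112.4 ≈ 0.64.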

CORRELATION ANALYSIS: THE COEFFICIENT OF CORRELATION, r
- r = ±√r²
- Measures the strength of the relationship, i.e. how strongly the variables are related.
- In the ATC case (errors and hours without sleep), r = 0.8, which means a very strong relationship between the two variables.
- The sign of r is given by the sign of the slope (b) of the regression line; a negative sign indicates an inverse relationship between the two variables.

PROPERTIES OF THE SAMPLE CORRELATION COEFFICIENT (r)
- Ranges from −1 to +1.
- The sign of r tells whether the relationship is positive or negative.
- A larger absolute value of r indicates a stronger relationship.
- An r value near zero indicates no, or a poor, relationship between x and y.
- r = +1 or −1 indicates a perfect linear relationship.
- r values of exactly 0, +1 or −1 are rare in practice.

Questions?