A Session On Regression Analysis www.AssignmentPoint.com.

Presentation transcript:

What does the term regression literally mean? The term "regression" literally means "stepping back towards the average".

What is Regression? Regression is a statistical tool to estimate (or predict) the unknown values of one variable from known values of another variable. Example: If we know that advertising and sales are correlated, we may find out the expected amount of sales for a given advertising expenditure or the amount of expenditure for achieving a fixed sales target.

What is the utility of studying regression? Regression analysis is a branch of statistical theory that is widely used in almost all scientific disciplines. In economics it is the basic technique for measuring or estimating the relationships among the economic variables that constitute the essence of economic theory and economic life. Example: If two variables, price (X) and demand (Y), are closely related, one can find the most probable value of X for a given value of Y, or the most probable value of Y for a given value of X. Similarly, if the amount of tax and the rise in the price of a commodity are closely related, it is possible to find the expected price for a certain amount of tax levy.

Distinguish between correlation and regression analysis. The points of difference between correlation and regression analysis are: While the correlation co-efficient is a measure of the degree of relationship between X and Y, regression analysis helps study the "nature of relationship" between the variables. The cause and effect relation is more clearly indicated through regression analysis than by correlation. Correlation is merely a tool for ascertaining the degree of relationship between two variables and, therefore, one cannot say that one variable is the cause and the other the effect, while regression shows the extent of dependence of one variable on another.

What are Regression Lines? If we take the case of two variables X and Y, we shall have two regression lines: the regression line of X on Y and the regression line of Y on X. The regression line of Y on X gives the most probable values of Y for given values of X, and the regression line of X on Y gives the most probable values of X for given values of Y.

Under what conditions can there be one regression line? When there is either perfect positive or perfect negative correlation between the two variables, the two regression lines will coincide, i.e. we will have one line. The further the two regression lines are from each other, the lower the degree of correlation, and the nearer the two regression lines are to each other, the higher the degree of correlation. If the variables are independent, r is zero and the lines of regression are at right angles, i.e. parallel to the X-axis and Y-axis.

What are regression equations? Regression equations are algebraic expressions of the regression lines. Since there are two regression lines, there are two regression equations: the regression equation of X on Y is used to describe the variation in the values of X for given changes in Y, and the regression equation of Y on X is used to describe the variation in the values of Y for given changes in X.

Why is the regression line called the line of "best fit"? The line of regression is the line which gives the best estimate of the value of one variable for any specific value of the other variable. Thus the line of regression is the line of "best fit" and is obtained by the principle of least squares.

What is the general form of the regression equation of Y on X? The general form of the linear regression equation of Y on X is expressed as follows: $Y_e = a + bX$, where $Y_e$ is the dependent variable to be estimated and $X$ is the independent variable. In this equation a and b are two unknown constants (fixed numerical values) which determine the position of the line completely. The constants are called the parameters of the line. If the value of either or both of them is changed, another line is determined.

What is the general form of the regression equation of Y on X? (continued) The parameter 'a' determines the level of the fitted line (i.e. the distance of the line directly above or below the origin). The parameter 'b' determines the slope of the line, i.e. the change in Y for a unit change in X. To determine the values of 'a' and 'b', the following two normal equations are to be solved simultaneously: $\sum Y = Na + b\sum X$ and $\sum XY = a\sum X + b\sum X^2$.

What is the general form of the regression equation of X on Y? The general form of the regression equation of X on Y is expressed as follows: $X = a + bY$. To determine the values of 'a' and 'b', the following two normal equations are to be solved simultaneously: $\sum X = Na + b\sum Y$ and $\sum XY = a\sum Y + b\sum Y^2$.

What is the general form of the regression equation of X on Y? (continued) These equations are usually called the normal equations. In the equations, $\sum X$, $\sum Y$, $\sum XY$, $\sum X^2$ and $\sum Y^2$ indicate totals which are computed from the observed pairs of values of the two variables X and Y to which the least squares estimating line is to be fitted, and N is the total number of observed pairs of values. The geometrical presentation of the linear equation $Y = a + bX$ is shown in the diagram below:

[Figure: the line Y = a + bX drawn on X and Y axes through the origin O, with the slope b marked as the rise in Y for 1 unit in X.]

As the diagram shows, the height of the line gives the average value of Y at a fixed value of X. When X = 0, the average value of Y is equal to a. The value of a is called the Y-intercept since it is the point at which the straight line crosses the Y-axis. The slope of the line is measured by b, which gives the average amount of change in Y per unit change in the value of X. The sign of b also indicates the type of relationship between Y and X.

How are the values of 'a' and 'b' obtained to determine a regression line completely? The values of 'a' and 'b' are obtained by "the method of least squares", which states that the line should be drawn through the plotted points in such a manner that the sum of the squares of the vertical deviations of the actual Y values from the estimated Y values is the least. In other words, in order to obtain a line which fits the points best, $\sum (Y - Y_e)^2$ should be a minimum. Such a line is known as the line of best fit.

How can the normal equations be arrived at? Let $S = \sum (Y - Y_e)^2 = \sum (Y - a - bX)^2$ (since $Y_e = a + bX$). Differentiating S partially with respect to a and b and equating each derivative to zero gives the two normal equations.

Example: The following data give the hardness (X) and tensile strength (Y) of 7 samples of metal in certain units. Find the linear regression equation of Y on X.
X: 146 152 158 164 170 176 182
Y: 75 78 77 89 82 85 86
Solution: The regression equation of Y on X is given by $Y = a + bX$. The normal equations are:
$\sum Y = Na + b\sum X$ ... (1)
$\sum XY = a\sum X + b\sum X^2$ ... (2)

Calculation of regression equations

| X | Y | $X^2$ | $Y^2$ | XY |
|---|---|---|---|---|
| 146 | 75 | 21316 | 5625 | 10950 |
| 152 | 78 | 23104 | 6084 | 11856 |
| 158 | 77 | 24964 | 5929 | 12166 |
| 164 | 89 | 26896 | 7921 | 14596 |
| 170 | 82 | 28900 | 6724 | 13940 |
| 176 | 85 | 30976 | 7225 | 14960 |
| 182 | 86 | 33124 | 7396 | 15652 |
| $\sum X = 1148$ | $\sum Y = 572$ | $\sum X^2 = 189280$ | $\sum Y^2 = 46904$ | $\sum XY = 94120$ |

Here, N = 7. Substituting the values in equations (1) and (2), we get
572 = 7a + 1148b ... (3)
94120 = 1148a + 189280b ... (4)
Multiplying equation (3) by 164, we get
93808 = 1148a + 188272b ... (5)
Subtracting equation (5) from (4), we get 312 = 1008b, so b = 0.31. Putting this value of b in equation (3), we have 572 = 7a + 1148(0.31) = 7a + 355.88, so a = 30.87.

Therefore, the linear regression equation of Y on X is $Y = 30.87 + 0.31X$.
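The same fit can be checked numerically. The following is a minimal Python sketch, not part of the original slides; the function name `fit_y_on_x` is a hypothetical helper that solves the two normal equations by elimination. Because it keeps the unrounded slope (about 0.3095), the intercept comes out near 30.95 rather than the 30.87 obtained above with b rounded to 0.31.

```python
# Least-squares fit of Y on X by solving the two normal equations:
#   sum(Y)  = N*a + b*sum(X)
#   sum(XY) = a*sum(X) + b*sum(X^2)

def fit_y_on_x(xs, ys):
    """Return (a, b) for the fitted line Y = a + bX (hypothetical helper)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Eliminate a between the two normal equations to get b, then back-substitute.
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b

hardness = [146, 152, 158, 164, 170, 176, 182]   # X
strength = [75, 78, 77, 89, 82, 85, 86]          # Y

a, b = fit_y_on_x(hardness, strength)
print(f"Y = {a:.2f} + {b:.2f}X")
```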

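Calculate the regression equations of X on Y and Y on X from the following data:
X: 1 2 3 4 5
Y: 2 5 3 8 7
Solution:

| X | Y | $X^2$ | $Y^2$ | XY |
|---|---|---|---|---|
| 1 | 2 | 1 | 4 | 2 |
| 2 | 5 | 4 | 25 | 10 |
| 3 | 3 | 9 | 9 | 9 |
| 4 | 8 | 16 | 64 | 32 |
| 5 | 7 | 25 | 49 | 35 |
| $\sum X = 15$ | $\sum Y = 25$ | $\sum X^2 = 55$ | $\sum Y^2 = 151$ | $\sum XY = 88$ |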

The regression equation of X on Y is given by $X = a + bY$. The normal equations are $\sum X = Na + b\sum Y$ and $\sum XY = a\sum Y + b\sum Y^2$. Substituting the values, we get
15 = 5a + 25b ... (1)
88 = 25a + 151b ... (2)
Solving (1) and (2), we get a = 0.5 and b = 0.5.

Hence, the regression equation of X on Y is given by $X = 0.5 + 0.5Y$. The regression equation of Y on X is $Y = a + bX$. The normal equations are $\sum Y = Na + b\sum X$ and $\sum XY = a\sum X + b\sum X^2$. Substituting the values, we get
25 = 5a + 15b ... (3)
88 = 15a + 55b ... (4)
Solving (3) and (4), we get a = 1.10 and b = 1.30. Hence, the regression equation of Y on X is given by $Y = 1.10 + 1.30X$.
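Both lines can be reproduced with one routine by swapping the roles of the two variables. The sketch below is illustrative only: `fit_line` is a hypothetical helper, and the Y series is taken to be 2, 5, 3, 8, 7, the values that reproduce the totals ΣY = 25, ΣY² = 151 and ΣXY = 88 used in the solution.

```python
def fit_line(u, v):
    """Least-squares line v = a + b*u from the normal equations; returns (a, b)."""
    n = len(u)
    sum_u, sum_v = sum(u), sum(v)
    sum_uv = sum(p * q for p, q in zip(u, v))
    sum_u2 = sum(p * p for p in u)
    b = (n * sum_uv - sum_u * sum_v) / (n * sum_u2 - sum_u ** 2)
    a = (sum_v - b * sum_u) / n
    return a, b

X = [1, 2, 3, 4, 5]
Y = [2, 5, 3, 8, 7]   # assumed series, consistent with the totals above

a_yx, b_yx = fit_line(X, Y)   # regression of Y on X: Y = a + bX
a_xy, b_xy = fit_line(Y, X)   # regression of X on Y: X = a + bY

print(f"Y on X: Y = {a_yx:.2f} + {b_yx:.2f}X")
print(f"X on Y: X = {a_xy:.2f} + {b_xy:.2f}Y")
```

Passing the variables in the opposite order is all that distinguishes the two regressions, which is why the two lines generally differ.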

What will be the forms of the regression equation of X on Y and the regression equation of Y on X on the basis of deviations taken from the arithmetic means of X and Y? If we take the deviations of the X and Y series from their respective means, $x = X - \bar{X}$ and $y = Y - \bar{Y}$, the regression equation of Y on X will take the form $Y - \bar{Y} = b_{yx}(X - \bar{X})$, i.e. $y = b_{yx}x$. The value of $b_{yx}$ can be easily obtained as follows. The two normal equations in terms of x and y will then become $\sum y = Na + b\sum x$ ... (1) and $\sum xy = a\sum x + b\sum x^2$ ... (2).

Since $\sum x = \sum y = 0$ (deviations being taken from means), equation (1) reduces to $Na = 0$, so $a = 0$. Equation (2) reduces to $\sum xy = b\sum x^2$, so $b_{yx} = \sum xy / \sum x^2$.

After obtaining the value of $b_{yx}$, the regression equation can easily be written in terms of X and Y by substituting $Y - \bar{Y}$ for y and $X - \bar{X}$ for x. Similarly, the regression equation $X = a + bY$ is reduced to $X - \bar{X} = b_{xy}(Y - \bar{Y})$, and the value of $b_{xy}$ can be similarly obtained as $b_{xy} = \sum xy / \sum y^2$.

Example: The following data give the hardness (X) and tensile strength (Y) of 7 samples of metal in certain units. Find the linear regression equation of Y on X.
X: 146 152 158 164 170 176 182
Y: 75 78 77 89 82 85 86

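| No. | Hardness (X) | Strength (Y) | $x = X - \bar{X}$ | $y = Y - \bar{Y}$ | $x^2$ | $y^2$ | xy |
|---|---|---|---|---|---|---|---|
| 1 | 146 | 75 | -18 | -6.7143 | 324 | 45.08 | 120.86 |
| 2 | 152 | 78 | -12 | -3.7143 | 144 | 13.80 | 44.57 |
| 3 | 158 | 77 | -6 | -4.7143 | 36 | 22.22 | 28.29 |
| 4 | 164 | 89 | 0 | 7.2857 | 0 | 53.08 | 0 |
| 5 | 170 | 82 | 6 | 0.2857 | 36 | 0.08 | 1.71 |
| 6 | 176 | 85 | 12 | 3.2857 | 144 | 10.80 | 39.43 |
| 7 | 182 | 86 | 18 | 4.2857 | 324 | 18.37 | 77.14 |
| N = 7 | $\sum X = 1148$ | $\sum Y = 572$ | $\sum x = 0$ | $\sum y = 0$ | $\sum x^2 = 1008$ | $\sum y^2 = 163.43$ | $\sum xy = 312$ |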
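Carrying the table's totals through the deviation formula gives the fitted line. The working below is a sketch that takes $\bar{X} = 1148/7 = 164$ and $\bar{Y} = 572/7 \approx 81.71$ from the totals and rounds $b_{yx}$ to two decimals; that rounding is an assumption, chosen so that the result agrees with the line found earlier from the normal equations.

```latex
\begin{aligned}
b_{yx} &= \frac{\sum xy}{\sum x^2} = \frac{312}{1008} \approx 0.31 \\
Y - \bar{Y} &= b_{yx}\,(X - \bar{X}) \\
Y - 81.71 &= 0.31\,(X - 164) \\
Y &= 0.31X - 50.84 + 81.71 = 30.87 + 0.31X
\end{aligned}
```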

What are regression co-efficients? The quantity b in the regression equations ($Y = a + bX$ and $X = a + bY$) is called the "regression co-efficient" or "slope co-efficient". Since there are two regression equations, there are two regression co-efficients: the regression co-efficient of X on Y and the regression co-efficient of Y on X.

What is the regression co-efficient of X on Y? The regression co-efficient of X on Y is represented by the symbol $b_{xy}$ or $b_1$. It measures the amount of change in X corresponding to a unit change in Y. The regression co-efficient of X on Y is given by $b_{xy} = r\,\sigma_x / \sigma_y$. When deviations are taken from the means of X and Y, the regression co-efficient is obtained by $b_{xy} = \sum xy / \sum y^2$.

When deviations are taken from assumed means, the value of $b_{xy}$ is obtained as follows: $b_{xy} = \dfrac{N\sum d_x d_y - \sum d_x \sum d_y}{N\sum d_y^2 - (\sum d_y)^2}$, where $d_x$ and $d_y$ are the deviations of X and Y from their assumed means.

What are the assumptions of constructing a regression model? The value of the dependent variable, Y, is dependent in some degree upon the value of the independent variable, X. The dependent variable is assumed to be a random variable, but the values of X are assumed to be fixed quantities that are selected and controlled by the experimenter. The requirement that the independent variable assumes fixed values, however, is not a critical one. Useful results can still be obtained by regression analysis in the case where both X and Y are random variables. The average relationship between X and Y can be adequately described by a linear equation $Y = a + bX$ whose geometrical presentation is a straight line.

The height of the line gives the average value of Y at a fixed value of X. When X = 0, the average value of Y is equal to a. The value of a is called the Y-intercept, since it is the point at which the straight line crosses the Y-axis. The slope of the line is measured by b, which gives the average amount of change in Y per unit change in the value of X. The sign of b also indicates the type of relationship between Y and X. Associated with each value of X there is a sub-population of Y. The distribution of the sub-population may be assumed to be normal, or non-specified in the sense that it is unknown. In any event, the distribution of each sub-population of Y is conditional on the value of X.

The mean of each sub-population Y is called the expected value of Y for a given X: $E(Y \mid X)$. Furthermore, under the assumption of a linear relationship between X and Y, all values of $E(Y \mid X)$ must fall on a straight line. That is, $E(Y \mid X) = a + bX$, which is the population regression equation for our bivariate linear model. In this equation a and b are called the population regression co-efficients. An individual value in each sub-population Y may be expressed as $Y = E(Y \mid X) + e = a + bX + e$,

where e is the deviation of a particular value of Y from $E(Y \mid X)$ and is called the error term or the stochastic disturbance term. The errors are assumed to be independent random variables because the Y's are random variables and independent. The expectations of these errors are zero: $E(e) = 0$. Moreover, if the Y's are normal variables, the errors can also be assumed to be normal. It is assumed that the variances of all sub-populations, called variances of the regression, are identical.

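What is the regression co-efficient of Y on X? The regression co-efficient of Y on X is represented by $b_{yx}$ or $b_2$. It measures the amount of change in Y corresponding to a unit change in X. The value of $b_{yx}$ is given by $b_{yx} = r\,\sigma_y / \sigma_x$. When deviations are taken from the means of X and Y, $b_{yx} = \sum xy / \sum x^2$.

When deviations are taken from assumed means, $b_{yx} = \dfrac{N\sum d_x d_y - \sum d_x \sum d_y}{N\sum d_x^2 - (\sum d_x)^2}$, where $d_x$ and $d_y$ are the deviations of X and Y from their assumed means.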

What are the properties of regression co-efficients? The co-efficient of correlation is the geometric mean of the two regression co-efficients. Symbolically, $r = \pm\sqrt{b_{xy} \times b_{yx}}$. If one of the regression co-efficients is greater than unity, the other must be less than unity, as the value of the co-efficient of correlation cannot exceed unity. Example: if $b_{xy} = 1.2$ and $b_{yx} = 1.4$, then $r = \sqrt{1.2 \times 1.4} = \sqrt{1.68} \approx 1.30$, which is not possible.

What are the properties of regression co-efficients? (continued) Both the regression co-efficients will have the same sign, i.e. they will be either both positive or both negative. The co-efficient of correlation will have the same sign as that of the regression co-efficients.

What are the properties of regression co-efficients? (continued) The average value of the two regression co-efficients would be greater than or equal to the value of the co-efficient of correlation. Symbolically, $\dfrac{b_{xy} + b_{yx}}{2} \ge r$.

Regression co-efficients are independent of change of origin but not of scale.

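Example: Prove that the co-efficient of correlation is the geometric mean of the regression co-efficients. Proof: Let $b_{xy}$ be the regression co-efficient of X on Y and $b_{yx}$ be the regression co-efficient of Y on X. Then $b_{xy} = r\,\sigma_x / \sigma_y$ and $b_{yx} = r\,\sigma_y / \sigma_x$, so $b_{xy} \times b_{yx} = r^2$ and $r = \pm\sqrt{b_{xy} \times b_{yx}}$. Therefore, the co-efficient of correlation is the geometric mean of the two regression co-efficients.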
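The property can also be checked numerically. The sketch below is an illustration rather than part of the slides: it uses the small data set X = 1..5, Y = 2, 5, 3, 8, 7 (an assumed series consistent with the earlier totals), computes r directly from deviations, and compares it with the signed geometric mean of the two regression co-efficients. The helper name `deviations` is hypothetical.

```python
import math

def deviations(values):
    """Deviations of each value from the arithmetic mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

X = [1, 2, 3, 4, 5]
Y = [2, 5, 3, 8, 7]   # assumed series

x, y = deviations(X), deviations(Y)
sum_xy = sum(p * q for p, q in zip(x, y))
sum_x2 = sum(p * p for p in x)
sum_y2 = sum(q * q for q in y)

b_yx = sum_xy / sum_x2          # regression co-efficient of Y on X
b_xy = sum_xy / sum_y2          # regression co-efficient of X on Y

r_direct = sum_xy / math.sqrt(sum_x2 * sum_y2)
# Geometric mean of the two co-efficients, carrying their common sign.
r_from_b = math.copysign(math.sqrt(b_xy * b_yx), b_yx)

print(round(r_direct, 4), round(r_from_b, 4))
```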

Prove that regression co-efficients are independent of change of origin but not of scale.
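One way to see this is sketched below. The shifted and rescaled variables U and V, the origins A and B, and the positive scale factors h and k are notation introduced here for illustration; they do not appear in the slides.

```latex
\begin{aligned}
U &= \frac{X - A}{h}, \qquad V = \frac{Y - B}{k}
  \quad\Rightarrow\quad X - \bar{X} = h\,(U - \bar{U}), \qquad Y - \bar{Y} = k\,(V - \bar{V}) \\
b_{yx} &= \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}
        = \frac{hk \sum (U - \bar{U})(V - \bar{V})}{h^2 \sum (U - \bar{U})^2}
        = \frac{k}{h}\, b_{vu}
\end{aligned}
```

The origins A and B cancel out, so a change of origin leaves the co-efficient unchanged, while the factor k/h shows that a change of scale does not. The same argument gives $b_{xy} = (h/k)\,b_{uv}$.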

Example: The following figures relate to advertisement expenditure and corresponding sales:
Advertisement (in lakhs of Taka): 60 62 65 70 73 75 71
Sales (in crores of Taka): 10 11 13 15 16 19 14
Estimate i) the sales for an advertisement expenditure of Tk. 80 lakhs and ii) the advertisement expenditure for a sales target of Tk. 25 crores.

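Let the advertisement expenditure be denoted by X and sales by Y. Calculation of regression equations, with deviations taken from the means $\bar{X} = 476/7 = 68$ and $\bar{Y} = 98/7 = 14$:

| X | $x = X - 68$ | $x^2$ | Y | $y = Y - 14$ | $y^2$ | xy |
|---|---|---|---|---|---|---|
| 60 | -8 | 64 | 10 | -4 | 16 | 32 |
| 62 | -6 | 36 | 11 | -3 | 9 | 18 |
| 65 | -3 | 9 | 13 | -1 | 1 | 3 |
| 70 | 2 | 4 | 15 | 1 | 1 | 2 |
| 73 | 5 | 25 | 16 | 2 | 4 | 10 |
| 75 | 7 | 49 | 19 | 5 | 25 | 35 |
| 71 | 3 | 9 | 14 | 0 | 0 | 0 |
| $\sum X = 476$ | $\sum x = 0$ | $\sum x^2 = 196$ | $\sum Y = 98$ | $\sum y = 0$ | $\sum y^2 = 56$ | $\sum xy = 100$ |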

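i) Regression equation of Y on X: $Y - \bar{Y} = b_{yx}(X - \bar{X})$ ... (1), where $b_{yx} = \sum xy / \sum x^2 = 100/196 = 0.51$, $\bar{X} = 68$ and $\bar{Y} = 14$. Putting the values in equation (1), we get $Y - 14 = 0.51(X - 68)$, i.e. $Y = 0.51X - 20.68$.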

When the advertisement expenditure X = 80 lakhs, sales Y = 0.51 × 80 - 20.68 = 40.8 - 20.68 = 20.12. The likely sales for an advertisement expenditure of Tk. 80 lakhs = Tk. 20.12 crores.

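ii) Regression equation of X on Y: $X - \bar{X} = b_{xy}(Y - \bar{Y})$ ... (2), where $b_{xy} = \sum xy / \sum y^2 = 100/56 = 1.79$. Putting the values in equation (2), we get $X - 68 = 1.79(Y - 14)$, i.e. $X = 1.79Y + 43$ (approximately).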

When the sales target Y = 25 crores, advertisement expenditure X = 1.79 × 25 + 43 = 44.75 + 43 = 87.75. Therefore, the likely advertisement expenditure for a sales target of Tk. 25 crores = Tk. 87.75 lakhs.
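Both estimates can be reproduced with the deviation formulas $b_{yx} = \sum xy / \sum x^2$ and $b_{xy} = \sum xy / \sum y^2$. The Python sketch below is illustrative; `regression_lines` is a hypothetical helper that returns one predictor for each regression line. It keeps full precision, so it gives about 20.12 crores and 87.64 lakhs; the second figure differs slightly from 87.75 above because the slide rounds $b_{xy}$ to 1.79 and the intercept to 43.

```python
def regression_lines(xs, ys):
    """Return (predict_y, predict_x) built from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_x2 = sum((x - mean_x) ** 2 for x in xs)
    sum_y2 = sum((y - mean_y) ** 2 for y in ys)
    b_yx = sum_xy / sum_x2   # slope of the Y-on-X line
    b_xy = sum_xy / sum_y2   # slope of the X-on-Y line

    def predict_y(x):
        # Regression of Y on X: Y - mean_y = b_yx * (X - mean_x)
        return mean_y + b_yx * (x - mean_x)

    def predict_x(y):
        # Regression of X on Y: X - mean_x = b_xy * (Y - mean_y)
        return mean_x + b_xy * (y - mean_y)

    return predict_y, predict_x

advertising = [60, 62, 65, 70, 73, 75, 71]   # lakhs of Taka
sales = [10, 11, 13, 15, 16, 19, 14]         # crores of Taka

predict_sales, predict_advertising = regression_lines(advertising, sales)
sales_at_80 = predict_sales(80)
advertising_for_25 = predict_advertising(25)
print(round(sales_at_80, 2), round(advertising_for_25, 2))
```

Note that each question uses its own line: sales are estimated from the Y-on-X line, and advertising from the X-on-Y line, rather than by inverting a single equation.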

Example: The following data relate to advertising expenditure (in lakhs of Taka) and their corresponding sales (in crores of Taka):
Advertising Expenditure: 10 12 15 23 20
Sales: 14 17 23 25 21
Estimate i) the sales corresponding to an advertising expenditure of Tk. 30 lakhs and ii) the advertising expenditure for a sales target of Tk. 35 crores.

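Calculation of regression equations, with deviations taken from the means $\bar{X} = 80/5 = 16$ and $\bar{Y} = 100/5 = 20$:

| X | x | $x^2$ | Y | y | $y^2$ | xy |
|---|---|---|---|---|---|---|
| 10 | -6 | 36 | 14 | -6 | 36 | +36 |
| 12 | -4 | 16 | 17 | -3 | 9 | +12 |
| 15 | -1 | 1 | 23 | +3 | 9 | -3 |
| 23 | +7 | 49 | 25 | +5 | 25 | +35 |
| 20 | +4 | 16 | 21 | +1 | 1 | +4 |
| $\sum X = 80$ | $\sum x = 0$ | $\sum x^2 = 118$ | $\sum Y = 100$ | $\sum y = 0$ | $\sum y^2 = 80$ | $\sum xy = 84$ |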

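The regression equation of Y on X is given by $Y - \bar{Y} = b_{yx}(X - \bar{X})$ ... (1), where $\bar{X} = 16$, $\bar{Y} = 20$ and $b_{yx} = \sum xy / \sum x^2 = 84/118 = 0.712$. From equation (1), we have $Y - 20 = 0.712(X - 16)$, i.e. $Y = 0.712X + 8.608$.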

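When the advertisement expenditure is Tk. 30 lakhs, sales Y = 0.712X + 8.608 = 0.712 × 30 + 8.608 = 21.36 + 8.608 = 29.968. Thus the likely sales corresponding to an advertisement expenditure of Tk. 30 lakhs is Tk. 29.968 crores. The regression equation of X on Y is given by $X - \bar{X} = b_{xy}(Y - \bar{Y})$, where $b_{xy} = \sum xy / \sum y^2 = 84/80 = 1.05$, so $X - 16 = 1.05(Y - 20)$, i.e. $X = -5 + 1.05Y$.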

Therefore, when the sales target is Tk. 35 crores, the advertising expenditure X = -5 + 1.05 × 35 = -5 + 36.75 = 31.75. The advertising expenditure for a sales target of Tk. 35 crores is Tk. 31.75 lakhs.

What is the standard error of estimate? The measure of variation of the observations around the computed regression line is referred to as the standard error of estimate. Just as the standard deviation is a measure of the scatter of observations in a frequency distribution around the mean of that distribution, the standard error of estimate is a measure of the scatter of the observed values of Y around the corresponding computed values of Y on the regression line. It is computed as a standard deviation, being also a square root of the mean of the squared deviation. But the deviations here are not deviations of the items from the arithmetic mean; they are rather the vertical distances of every dot from the line of average relationship.

What are the formulae for calculating the standard error of estimate? $S_{yx} = \sqrt{\dfrac{\sum (Y - Y_e)^2}{N}}$, where $S_{yx}$ = S.E. of estimate of the regression equation of Y on X.

The standard error of estimate can be easily calculated with the help of the following formula: $S_{yx} = \sqrt{\dfrac{\sum Y^2 - a\sum Y - b\sum XY}{N}}$.
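As an illustration, the standard error of estimate for the hardness and tensile strength data can be computed as below. This is a sketch under one stated assumption: following the description of the measure as the square root of the mean squared deviation, the sum of squared residuals is divided by N (some textbooks divide by N - 2 instead). The function name is hypothetical.

```python
import math

def standard_error_of_estimate(xs, ys):
    """Root of the mean squared vertical deviation from the fitted Y-on-X line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    squared_residuals = [(y - (a + b * x)) ** 2 for x, y in zip(xs, ys)]
    # Assumption: divide by N, not N - 2.
    return math.sqrt(sum(squared_residuals) / n)

hardness = [146, 152, 158, 164, 170, 176, 182]   # X
strength = [75, 78, 77, 89, 82, 85, 86]          # Y

s_yx = standard_error_of_estimate(hardness, strength)
print(round(s_yx, 2))
```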

Significance of standard error: The standard error of estimate measures the accuracy of the estimated figures. The smaller the value of the standard error of estimate, the closer the dots will be to the regression line and the better the estimates based on the equation for this line. If the standard error of estimate is zero, then there is no variation about the line and the correlation will be perfect. With the help of the standard error of estimate, it is possible for us to ascertain how good and representative the regression line is as a description of the average relationship between two series.

What is the co-efficient of determination? The ratio of the unexplained variation to the total variation represents the proportion of variation in Y that is not explained by regression on X. Subtraction of this proportion from 1.0 gives the proportion of variation in Y that is explained by regression on X. The statistic used to express this proportion is called the co-efficient of determination and is denoted by $R^2$. It may be written as follows: $R^2 = 1 - \dfrac{\text{unexplained variation}}{\text{total variation}} = 1 - \dfrac{\sum (Y - Y_e)^2}{\sum (Y - \bar{Y})^2}$.

The value of $R^2$ is the proportion of the variation in the dependent variable Y explained by regression on the independent variable X.
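A short Python sketch of this calculation for the hardness and tensile strength data follows. It is illustrative only; `coefficient_of_determination` is a hypothetical helper that computes one minus the ratio of unexplained to total variation.

```python
def coefficient_of_determination(xs, ys):
    """R^2 = 1 - (unexplained variation / total variation) for the Y-on-X line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    unexplained = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    total = sum((y - mean_y) ** 2 for y in ys)
    return 1 - unexplained / total

hardness = [146, 152, 158, 164, 170, 176, 182]   # X
strength = [75, 78, 77, 89, 82, 85, 86]          # Y

r_squared = coefficient_of_determination(hardness, strength)
print(round(r_squared, 3))
```

On these data roughly 59% of the variation in tensile strength is explained by regression on hardness.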

Thank you For Attending The Session