A Session On Regression Analysis www.AssignmentPoint.com
What does the term Regression literally mean? The term “ regressions” literally mean “stepping back towards the average”. www.AssignmentPoint.com
What is Regression? Regression is a statistical tool to estimate (or predict) the unknown values of one variable from known values of another variable. Example: If we know that advertising and sales are correlated, we may find out the expected amount of sales for a given advertising expenditure or the amount of expenditure for achieving a fixed sales target. www.AssignmentPoint.com
What is the utility of studying? The regression analysis is a branch of statistical theory that is widely used in almost all the scientific disciplines. In economics it is the basic technique for measuring or estimating the relationship among economic variables that constitute the essence of economic theory and economic life. Example: If two variables price (X) and demand (Y) are closely related one can find out the most probable value of X for a given value of Y or the most probable value of Y for a given value of X. Similarly, if the amount of tax and the rise in the price of a commodity are closely related, it is possible find out the expected price for a certain amount of tax levy. www.AssignmentPoint.com
Distinguish between correlation and regression analysis? The points of difference between correlation and regression analysis are: While correlation co-efficient is a measure of degree of relationship between X and Y, the regression analysis helps study the ‘nature of relationship’ between the variables. The cause and effect relation is more clearly indicated through regression analysis than by correlation. Correlation is merely a tool of ascertaining the degree of relationship between two variables and, therefore, one can not say that one variable is the cause and the other the effect, while regression shows the extent of dependence of one variable on another. www.AssignmentPoint.com
What are Regression Lines? If we take the case of two variables X and Y, we shall have two regression lines as regression line of X on Y and the regression line of Y on X. The regression line of Y on X gives most probable values of Y for given values of X and the regression line of X on Y gives most probable values of X for given values of Y. Thus we have two regression lines. www.AssignmentPoint.com
Under what conditions can there be one regression line? When there is either perfect positive or perfect negative correlation between the two variables, the two correlation lines will coincide i.e. we will have one line. The further the two regression lines are from each other, the lesser is the degree of correlation and the nearer the two regression lines to each other, the higher is the degree of correlation. If variables are independent, r is zero and the lines of regression are at right angles i.e. parallel to X – axis and Y – axis. www.AssignmentPoint.com
What are regression equations? Regression Equations are algebraic expressions of regression lines. Since there are two regression lines there are two regression equations–the regression equation of X on Y is used to describe the variations in the values of X for given change in Y and the regression equation of Y on X is used to describe the variation in the values of Y for given changes in X. www.AssignmentPoint.com
Why is the line of ‘best fit’? The line of regression is the line which gives the best estimate to the value of one variable for any specific value of other variable. Thus the line of regression is the line of “best fit” and is obtained by the principles of least squares. www.AssignmentPoint.com
What is the general form of the regression equation of Y on X? The general form of the linear regression equation of Y on X is expressed as follows: Ye = a+bX, where Ye = dependent variable to be estimated X = independent variable. In this equation a and b are two unknown constants (fixed numerical values) which determine the position of the line completely. The constants are called the parameters of the line. If the value of either one or both of them are changed, another line is determined. www.AssignmentPoint.com
What is the general form of the regression equation of Y on X? The parameter ‘a’ determines the level of the fitted line (i.e. the distance of the line directly, above or below the origin). The parameter ‘b’ determines the slope of the line i.e. the change in Y for unit change in X. To determine the values of ‘a’ and ‘b’, the following two normal equations are to be solved simultaneously. Y=Na+bX, XY=aX +bX2. www.AssignmentPoint.com
What is the general form of the regression equation of X on Y? The general form of the regression equation of X on Y is expressed as follows: X = a+bY To determine the values of ‘a’ and ‘b’, the following two normal equations are to be solved simultaneously. X = Na+bY, XY = aY+bY2. Continued…… www.AssignmentPoint.com
What is the general form of the regression equation of X on Y? These equations are usually called the normal equations. In the equations X, Y, XY, X2, indicate totals which are computed from the observed pairs of values of the two variables X and Y to which the least squares estimating line is to be fitted and N is the total number of observed pairs of values. The geometrical presentation of the linear equation , Y = a+bX is shown in the diagram below: www.AssignmentPoint.com
Y Y=a+bX b 1 UNIT IN X o X www.AssignmentPoint.com
It is clear from this diagram, the height of the line tells the average value of Y at a fixed value of X. When X=0, the average value of Y is equal to a . The value of a is called the Y- intercept since it is the point at which the straight line crosses the Y- axis. The slope of the line is measured by b, which gives the average amount of change of Y per unit change in the value of X. The sign of b also indicates the type of relationship between Y and X. www.AssignmentPoint.com
How are the values of ‘a’ and ‘b’ obtained to determine a regression completely? The values of ‘a’ and ‘b’ are obtained by “the method of least squares” which states that the line should be drawn through the plotted points in such a manner that the sum of the squares of the vertical deviations of the actual Y values from the estimated Y values is the least, or in other words, in order to obtain a line which fits the points best, (Y–Ye)2 should be minimum. Such a line is known as line of best fit. www.AssignmentPoint.com
How can the normal equations be arrived at? Continued…… www.AssignmentPoint.com
Let S = (Y –Ye)2 = (Y –a – bX)2 ( Ye = a+bX) Differentiating partially with respect to a and b, www.AssignmentPoint.com Continued…………
Regression equation of Y on X is given by Y = a+bX Example: The following data give the hardness (X) and tensile strength (Y) of 7 samples of metal in certain units. Find the linear regression equation of Y on X. X: 146 152 158 164 170 176 182 Y: 65 78 77 89 82 85 86 Solution: Regression equation of Y on X is given by Y = a+bX The normal equations are: Y = Na + bX…………………………….. (1) XY = aX+ bX2………………………… (2) www.AssignmentPoint.com Continued…………
Calculation of regression equations X Y X2 Y2 XY 146 75 21316 5625 10950 152 78 23104 6084 11856 158 77 24964 5929 12166 164 89 26896 7921 14596 170 82 28900 6724 13940 176 85 30976 7225 14960 182 86 33124 7396 15652 1148 (=X) 572 (=Y) 189280 (=X2) 46904 (=Y2) 94120 (=XY) Continued…….. www.AssignmentPoint.com
Substituting the values in equations (1) and (2), we get Here, N = 7 Substituting the values in equations (1) and (2), we get 572 =7a +1148 b ………………………. (5) 94120 = 1148 a +189280 b ……………… (4) Multiplying the equation (3) by 164, we get 93808 =1148 a +188272 b ………..(5) Subtracting this equation from (4), we get b = 0.31 Putting this value of b in equation (3), we have Continued…….. www.AssignmentPoint.com
The linear regression equation of Y on X is Y = 30.87 + 0.31X www.AssignmentPoint.com
Calculate the regression equations of X on Y and Y on X from the following data 1 2 3 4 5 Y: 8 7 Solution: X Y X2 Y2 XY 1 2 4 5 25 10 3 9 8 16 64 32 7 49 35 X=15 Y=25 X2 = 55 Y2 =151 XY = 88 Continued…….. www.AssignmentPoint.com
Regression equation of X on Y is given by X= a +bY. The normal equations are X = Na+bY and XY=aY+bY2 Substituting the values, we get 15 =5a+25b …….. (1) 88= 25a+151b ……(2) Solving (1) and (2), we get =0.5 and b =0.5 www.AssignmentPoint.com
Hence, the regression equation of X on Y is given by X = 0.5+ 0.5Y The Regression equation of Y on X is : Y = a+bX The normal equations are Y= Na+ bX XY = aX + bX2 Substituting the values, we get 25 = 5a +15b …………………… (iii) 88 = 15 a + 55b ……………… (iv) Solving (iii) and (iv), we get a = 1.10 and b = 1.3 Hence, the regression equation of Y on X is given by Y =1.10 +1.30X Continued…….. www.AssignmentPoint.com
The value byx can be easily obtained as follows: What will be the forms of regression equation of X on Y and regression equation of Y on X on the basis of deviations taken from arithmetic Means of X and Y? If we take the deviations of X and Y series from their respective means, the regression equation of Y on X will take the form The value byx can be easily obtained as follows: The two normal equations in terms of x and y will then become www.AssignmentPoint.com
Since x =y =0 (deviations being taken from means) Equation (1) reduces to Na = 0 a = 0 Equation (2) reduces to www.AssignmentPoint.com
After obtaining the value of byx , the regression equation can easily be written in terms of X and Y by substituting for y, and for x, . Similarly, the regression equations X= a+bY is reduced to = and the value of bxy can be similarly obtained as www.AssignmentPoint.com
Example: The following data give the hardness (X) and tensile strength (Y) 7 samples of metal in certain units. Find the linear regression equation of Y on X. X: 146 152 158 164 170 176 182 Y: 75 78 77 89 82 85 86 Continued…….. www.AssignmentPoint.com
Hardness (X) Strength (Y) (x) (y) x2 y2 xy 1 146 75 - 18 - 6.7142 324 45.08 120.86 2 152 78 - 12 -3.7122 144 13.80 44.57 3 158 77 - 6 - 4.7142 36 22.22 28.29 4 164 89 7.2858 53.08 5 170 82 6 0.2853 0.08 1.71 176 85 12 3.2858 10.80 39.43 7 182 86 18 4.2858 18.37 77.14 N=7 X=1148 Y=572 x=0 y=0 x2= 1008 y2 =163.43 xy =312 Continued…….. www.AssignmentPoint.com
www.AssignmentPoint.com
What are regression co–efficients? The quantity b in the regression equations (Y= a+bX and X = a+bY) is called the “regression co–efficient” or “slope co-efficient”. Since there are two regression equations, therefore, there are two regression co-efficients regression co–efficient of X on Y and regression co–efficient of Y on X. www.AssignmentPoint.com
What is regression co–efficient of X on Y? The regression co–efficient of X on Y is represented by the symbol bxy or b1 . It measures the amount of change in X corresponding to a unit change in Y. The regression co–efficient of X on Y is given by When deviations are taken from the means of X and Y, the regression co–efficient is obtained by Continued….. www.AssignmentPoint.com
What is regression co–efficient of X on Y? When deviations are taken from assumed means, the value of bxy is obtained as follows: www.AssignmentPoint.com
What are the assumptions of constructing a regression modal? The value of the dependent variable, Y, is dependent in some degree upon the value of the independent variable, X. The dependent variable is assumed to be a random variable, but the values of X are assumed to be fixed quantities that are selected and controlled by the experimenter. The requirement that the independent variables assumes fixed values, however, is not a critical one. Useful results can still be obtained by regression analysis in the case where both X and Y are random variables. The average relationship between X and Y can be adequately described by a linear equation Y= a+bX whose geometrical presentation is a straight line. www.AssignmentPoint.com
The height of the line tells the average value of Y at a fixed value of X. When X= 0, the average value of Y is equal to a. The value of a is called the Y intercept, since it is the point at which the straight line crosses the Y–-axis. The slope of the line is measured by b, which gives the average amount of change of Y per unit change in the value of X. The sign of b also indicates the type of relationship between Y and X. Associated with each value of X there is a sub– population of Y. The distribution of the sub– population may be assumed to be normal or non– specified in the sense that it is unknown. In any event, the distribution of each population Y is conditional to the value of X. www.AssignmentPoint.com
An individual value in each sub-population Y, may be expressed as: The mean of each sub- population Y is called the expected value of Y for a given X: . Furthermore, under the assumption of a linear relationship between X and Y, all values of must fall on a straight line. This is Which is the population regression equation for our bivariate linear model. In this equation a and b are called the population regression co– efficient. An individual value in each sub-population Y, may be expressed as: www.AssignmentPoint.com
Where e is the deviation of a particular value of Y from and is called the error term or the stochastic disturbance term. The errors are assumed to be independent random variables because Y’s are random variables and independent. The expectations of these errors are zero; E(e) = 0. Moreover, if Y’s are normal variables, the error can also be assumed to be normal. It is assumed that the variances of all sub–populations, called variances of the regression, are identical. www.AssignmentPoint.com
What is regression co–efficient of Y on X ? The regression Co–efficient of Y on X is represented by byx or b2. It measures the amount of change in Y corresponding to a unit change in X. The value of byx is given When deviations are taken from the means of X on Y, Continued…….. www.AssignmentPoint.com
When deviations are taken from assumed mean www.AssignmentPoint.com
What are the properties of regression co–efficients? The co–efficient of correlation is the geometric mean of the two regression co–efficients . Symbolically If one of the regression co–efficients is greater than unity, the other must be less than unity, as the value of the co–efficient correlation cannot exceed unity. Example: if bxy =1.2 and byx =1.4, Continued…… www.AssignmentPoint.com
What are the properties of regression co–efficient? Both the regression co–efficients will have the same sign i.e. they will be either positive or negative. The co–efficient of correlation will have the same sign as that of regression co–efficient. Example: Continued…… www.AssignmentPoint.com
What are the properties of regression co-efficient? The average value of the two regression co–efficients would be greater than the value of co–efficient of correlation. Symbolically, Example: r Continued…… www.AssignmentPoint.com
Regression co–efficients are independent of change of origin but not scale. www.AssignmentPoint.com
Example: Prove that the co–efficient of correlation is the geometric mean of the regression co–efficient Proof: Let bxy be co– efficient of X on Y and byx be co–efficient of Y on X. The co–efficient of correlation is the geometric mean of the two regression co–efficients. www.AssignmentPoint.com
Prove that Regression co–efficients are independent of change of origin but not scale. www.AssignmentPoint.com
i) The sales for advertisement expenditure of Tk. 80 lakhs and Example: The following figures relate to advertisement expenditure and corresponding sales Advertisement (in lakhs of Taka) 60 62 65 70 73 75 71 Sales ( in crores of Taka) 10 11 13 15 16 19 14 Estimate i) The sales for advertisement expenditure of Tk. 80 lakhs and ii) The advertisement expenditure for a sales target of Tk. 25 crores www.AssignmentPoint.com
Let the advertisement expenditure be denoted by X and sales by y calculation of regression equations xy 60 8 64 10 4 16 32 62 6 36 11 3 9 18 65 13 1 70 2 15 73 5 25 75 7 49 19 35 71 14 X=476 x= 0 x2=196 Y=98 y= 0 y2= 476 xy= 100 www.AssignmentPoint.com
www.AssignmentPoint.com
When advertisement expenditure, X= 80 lakhs, Sales,Y= 0.51×80 – 20.68 = 40.8 – 20.68 = 20.12 The likely sales for advertisement expenditure of Tk. 80 lakhs = Tk. 20.12 crores. www.AssignmentPoint.com
ii) Regression Equation of X on Y: Putting the values in equation (2), we get www.AssignmentPoint.com
When sales target, Y= 25 croes, Advertisement expenditure , X= 1.79 ×25+43 = 44.75+43 = 87.75 The likely advertisement expenditure for a sales target of Tk.25 crores = Tk. 87.75 Lakhs. www.AssignmentPoint.com
Advertising Expenditure 10 12 15 23 20 Sales 14 17 25 21 Example: The following data relate to advertising expenditure (in lakhs of Taka) and their corresponding sales (in crores of Taka): Advertising Expenditure 10 12 15 23 20 Sales 14 17 25 21 Estimate i) the sales corresponding to advertising expenditure of Tk 30 lakhs and ii) the advertising expenditure for a sales target of Tka. 35 crores. www.AssignmentPoint.com
Calculation of regression equations X (x) x2 Y (y) y2 xy 10 - 6 36 14 +36 12 - 4 16 17 - 3 9 +12 15 - 1 1 23 +3 +7 49 25 +5 +35 20 +4 21 +1 X= 80 x= 0 x2=118 Y=100 y=0 y2=80 xy=84 . www.AssignmentPoint.com
From equation (1) , we have www.AssignmentPoint.com
When the advertisement expenditure is Tk. 30 lakhs, Sales, Y= 0.712 X +8.608= 21.36+8.608=29.968 Thus the likely sales corresponding to advertisement expenditure of Tk.30 lakhs is Tk 29.968 crores. Regression equation of X on Y is given by www.AssignmentPoint.com
advertising expenditure, X = - 5 +1.05× 35 = - 5 +36.75 = 31.75 when the sales target is Tk. 35 crores, the advertising expenditure, X = - 5 +1.05× 35 = - 5 +36.75 = 31.75 The advertising expenditure for a sales target of Tk. 35 crores is Tk. 31.75 lakhs. www.AssignmentPoint.com
What is the standard error of estimate? The measure of variation of the observations around the computed regression line is referred to as the standard error of estimate. Just as the standard deviation is a measure of the scatter of observations in a frequency distribution around the mean of that distribution, the standard error of estimate is a measure of the scatter of the observed values of Y around the corresponding computed values of Y on the regression line. It is computed as a standard deviation, being also a square root of the mean of the squared deviation. But the deviations here are not deviations of the items from the arithmetic mean; they are rather the vertical distances of every dot from the line of average relationship. www.AssignmentPoint.com
What are the formulae for calculating the standard error of estimate? where Syx= S.E. of estimate of regression equation of Y on X. www.AssignmentPoint.com
The standard error of estimate can be easily calculated with the help of the following formula. www.AssignmentPoint.com Continued…….
Significance of standard error The standard error of estimate measures the accuracy of the estimated figures. The smaller the value of standard error of estimate, the closer will be dots to the regression line and the better the estimates based on the equation for this line. If standard error of estimate is zero, then there is no variation about the line and the correlation will be perfect. With the help of standard error of estimate, it is possible for us to ascertain how good and representative the regression line is as a description of the average relationship between two series. www.AssignmentPoint.com
What is co–efficient of determination? The ratio of the unexplained variation to the total variation represents the proportion of variation in Y that is not explained by regression on X. Subtraction of this proportion from 1.0 gives the proportion of variation in Y that is explained by regression on X. The statistic used to express this proportion is called the co–-efficient of determination and is denoted by R2. It may be written as follows: Continued……. www.AssignmentPoint.com
The value of R2 is the proportion of the variation in the dependent variable Y explained by regression on the independent variable X. www.AssignmentPoint.com
Thank you For Attending The Session www.AssignmentPoint.com