Download presentation
Presentation is loading. Please wait.
Published byEvan Eaton Modified over 6 years ago
1
A Session On Regression Analysis
2
What does the term Regression literally mean?
The term “ regressions” literally mean “stepping back towards the average”.
3
What is Regression? Regression is a statistical tool to estimate (or predict) the unknown values of one variable from known values of another variable. Example: If we know that advertising and sales are correlated, we may find out the expected amount of sales for a given advertising expenditure or the amount of expenditure for achieving a fixed sales target.
4
What is the utility of studying?
The regression analysis is a branch of statistical theory that is widely used in almost all the scientific disciplines. In economics it is the basic technique for measuring or estimating the relationship among economic variables that constitute the essence of economic theory and economic life. Example: If two variables price (X) and demand (Y) are closely related one can find out the most probable value of X for a given value of Y or the most probable value of Y for a given value of X. Similarly, if the amount of tax and the rise in the price of a commodity are closely related, it is possible find out the expected price for a certain amount of tax levy.
5
Distinguish between correlation and regression analysis?
The points of difference between correlation and regression analysis are: While correlation co-efficient is a measure of degree of relationship between X and Y, the regression analysis helps study the ‘nature of relationship’ between the variables. The cause and effect relation is more clearly indicated through regression analysis than by correlation. Correlation is merely a tool of ascertaining the degree of relationship between two variables and, therefore, one can not say that one variable is the cause and the other the effect, while regression shows the extent of dependence of one variable on another.
6
What are Regression Lines?
If we take the case of two variables X and Y, we shall have two regression lines as regression line of X on Y and the regression line of Y on X. The regression line of Y on X gives most probable values of Y for given values of X and the regression line of X on Y gives most probable values of X for given values of Y. Thus we have two regression lines.
7
Under what conditions can there be one regression line?
When there is either perfect positive or perfect negative correlation between the two variables, the two correlation lines will coincide i.e. we will have one line. The further the two regression lines are from each other, the lesser is the degree of correlation and the nearer the two regression lines to each other, the higher is the degree of correlation. If variables are independent, r is zero and the lines of regression are at right angles i.e. parallel to X – axis and Y – axis.
8
What are regression equations?
Regression Equations are algebraic expressions of regression lines. Since there are two regression lines there are two regression equations–the regression equation of X on Y is used to describe the variations in the values of X for given change in Y and the regression equation of Y on X is used to describe the variation in the values of Y for given changes in X.
9
Why is the line of ‘best fit’?
The line of regression is the line which gives the best estimate to the value of one variable for any specific value of other variable. Thus the line of regression is the line of “best fit” and is obtained by the principles of least squares.
10
What is the general form of the regression equation of Y on X?
The general form of the linear regression equation of Y on X is expressed as follows: Ye = a+bX, where Ye = dependent variable to be estimated X = independent variable. In this equation a and b are two unknown constants (fixed numerical values) which determine the position of the line completely. The constants are called the parameters of the line. If the value of either one or both of them are changed, another line is determined.
11
What is the general form of the regression equation of Y on X?
The parameter ‘a’ determines the level of the fitted line (i.e. the distance of the line directly, above or below the origin). The parameter ‘b’ determines the slope of the line i.e. the change in Y for unit change in X. To determine the values of ‘a’ and ‘b’, the following two normal equations are to be solved simultaneously. Y=Na+bX, XY=aX +bX2.
12
What is the general form of the regression equation of X on Y?
The general form of the regression equation of X on Y is expressed as follows: X = a+bY To determine the values of ‘a’ and ‘b’, the following two normal equations are to be solved simultaneously. X = Na+bY, XY = aY+bY2. Continued……
13
What is the general form of the regression equation of X on Y?
These equations are usually called the normal equations. In the equations X, Y, XY, X2, indicate totals which are computed from the observed pairs of values of the two variables X and Y to which the least squares estimating line is to be fitted and N is the total number of observed pairs of values. The geometrical presentation of the linear equation , Y = a+bX is shown in the diagram below:
14
Y Y=a+bX b 1 UNIT IN X o X
15
It is clear from this diagram, the height of the line tells the average value of Y at a fixed value of X. When X=0, the average value of Y is equal to a . The value of a is called the Y- intercept since it is the point at which the straight line crosses the Y- axis. The slope of the line is measured by b, which gives the average amount of change of Y per unit change in the value of X. The sign of b also indicates the type of relationship between Y and X.
16
How are the values of ‘a’ and ‘b’ obtained to determine a regression completely?
The values of ‘a’ and ‘b’ are obtained by “the method of least squares” which states that the line should be drawn through the plotted points in such a manner that the sum of the squares of the vertical deviations of the actual Y values from the estimated Y values is the least, or in other words, in order to obtain a line which fits the points best, (Y–Ye)2 should be minimum. Such a line is known as line of best fit.
17
How can the normal equations be arrived at?
Continued……
18
Let S = (Y –Ye)2 = (Y –a – bX)2 ( Ye = a+bX)
Differentiating partially with respect to a and b, Continued…………
19
Regression equation of Y on X is given by Y = a+bX
Example: The following data give the hardness (X) and tensile strength (Y) of 7 samples of metal in certain units. Find the linear regression equation of Y on X. X: 146 152 158 164 170 176 182 Y: 65 78 77 89 82 85 86 Solution: Regression equation of Y on X is given by Y = a+bX The normal equations are: Y = Na + bX…………………………….. (1) XY = aX+ bX2………………………… (2) Continued…………
20
Calculation of regression equations
X Y X2 Y2 XY 146 75 21316 5625 10950 152 78 23104 6084 11856 158 77 24964 5929 12166 164 89 26896 7921 14596 170 82 28900 6724 13940 176 85 30976 7225 14960 182 86 33124 7396 15652 1148 (=X) 572 (=Y) 189280 (=X2) 46904 (=Y2) 94120 (=XY) Continued……..
21
Substituting the values in equations (1) and (2), we get
Here, N = 7 Substituting the values in equations (1) and (2), we get 572 =7a b ………………………. (5) 94120 = 1148 a b ……………… (4) Multiplying the equation (3) by 164, we get 93808 =1148 a b ………..(5) Subtracting this equation from (4), we get b = 0.31 Putting this value of b in equation (3), we have Continued……..
22
The linear regression equation of Y on X is Y = 30.87 + 0.31X
23
Calculate the regression equations of X on Y and Y on X from the following data
1 2 3 4 5 Y: 8 7 Solution: X Y X2 Y2 XY 1 2 4 5 25 10 3 9 8 16 64 32 7 49 35 X=15 Y=25 X2 = 55 Y2 =151 XY = 88 Continued……..
24
Regression equation of X on Y is given by X= a +bY.
The normal equations are X = Na+bY and XY=aY+bY2 Substituting the values, we get 15 =5a+25b …….. (1) 88= 25a+151b ……(2) Solving (1) and (2), we get =0.5 and b =0.5
25
Hence, the regression equation of X on Y is given by X = 0.5+ 0.5Y
The Regression equation of Y on X is : Y = a+bX The normal equations are Y= Na+ bX XY = aX + bX2 Substituting the values, we get 25 = 5a +15b …………………… (iii) 88 = 15 a + 55b ……………… (iv) Solving (iii) and (iv), we get a = 1.10 and b = 1.3 Hence, the regression equation of Y on X is given by Y = X Continued……..
26
The value byx can be easily obtained as follows:
What will be the forms of regression equation of X on Y and regression equation of Y on X on the basis of deviations taken from arithmetic Means of X and Y? If we take the deviations of X and Y series from their respective means, the regression equation of Y on X will take the form The value byx can be easily obtained as follows: The two normal equations in terms of x and y will then become
27
Since x =y =0 (deviations being taken from means)
Equation (1) reduces to Na = 0 a = 0 Equation (2) reduces to
28
After obtaining the value of byx , the regression equation can easily be written in terms of X and Y by substituting for y, and for x, Similarly, the regression equations X= a+bY is reduced to = and the value of bxy can be similarly obtained as
29
Example: The following data give the hardness (X) and tensile strength (Y) 7 samples of metal in certain units. Find the linear regression equation of Y on X. X: 146 152 158 164 170 176 182 Y: 75 78 77 89 82 85 86 Continued……..
30
Hardness (X) Strength (Y) (x) (y) x2 y2 xy 1 146 75 - 18 - 6.7142 324
45.08 120.86 2 152 78 - 12 144 13.80 44.57 3 158 77 - 6 36 22.22 28.29 4 164 89 7.2858 53.08 5 170 82 6 0.2853 0.08 1.71 176 85 12 3.2858 10.80 39.43 7 182 86 18 4.2858 18.37 77.14 N=7 X=1148 Y=572 x=0 y=0 x2= 1008 y2 =163.43 xy =312 Continued……..
32
What are regression co–efficients?
The quantity b in the regression equations (Y= a+bX and X = a+bY) is called the “regression co–efficient” or “slope co-efficient”. Since there are two regression equations, therefore, there are two regression co-efficients regression co–efficient of X on Y and regression co–efficient of Y on X.
33
What is regression co–efficient of X on Y?
The regression co–efficient of X on Y is represented by the symbol bxy or b1 . It measures the amount of change in X corresponding to a unit change in Y. The regression co–efficient of X on Y is given by When deviations are taken from the means of X and Y, the regression co–efficient is obtained by Continued…..
34
What is regression co–efficient of X on Y?
When deviations are taken from assumed means, the value of bxy is obtained as follows:
35
What are the assumptions of constructing a regression modal?
The value of the dependent variable, Y, is dependent in some degree upon the value of the independent variable, X. The dependent variable is assumed to be a random variable, but the values of X are assumed to be fixed quantities that are selected and controlled by the experimenter. The requirement that the independent variables assumes fixed values, however, is not a critical one. Useful results can still be obtained by regression analysis in the case where both X and Y are random variables. The average relationship between X and Y can be adequately described by a linear equation Y= a+bX whose geometrical presentation is a straight line.
36
The height of the line tells the average value of Y at a fixed value of X. When X= 0, the average value of Y is equal to a. The value of a is called the Y intercept, since it is the point at which the straight line crosses the Y–-axis. The slope of the line is measured by b, which gives the average amount of change of Y per unit change in the value of X. The sign of b also indicates the type of relationship between Y and X. Associated with each value of X there is a sub– population of Y. The distribution of the sub– population may be assumed to be normal or non– specified in the sense that it is unknown. In any event, the distribution of each population Y is conditional to the value of X.
37
An individual value in each sub-population Y, may be expressed as:
The mean of each sub- population Y is called the expected value of Y for a given X: Furthermore, under the assumption of a linear relationship between X and Y, all values of must fall on a straight line This is Which is the population regression equation for our bivariate linear model. In this equation a and b are called the population regression co– efficient. An individual value in each sub-population Y, may be expressed as:
38
Where e is the deviation of a particular value of Y from and is called the error term or the stochastic disturbance term. The errors are assumed to be independent random variables because Y’s are random variables and independent. The expectations of these errors are zero; E(e) = 0. Moreover, if Y’s are normal variables, the error can also be assumed to be normal. It is assumed that the variances of all sub–populations, called variances of the regression, are identical.
39
What is regression co–efficient of Y on X ?
The regression Co–efficient of Y on X is represented by byx or b2. It measures the amount of change in Y corresponding to a unit change in X. The value of byx is given When deviations are taken from the means of X on Y, Continued……..
40
When deviations are taken from assumed mean
41
What are the properties of regression co–efficients?
The co–efficient of correlation is the geometric mean of the two regression co–efficients . Symbolically If one of the regression co–efficients is greater than unity, the other must be less than unity, as the value of the co–efficient correlation cannot exceed unity. Example: if bxy =1.2 and byx =1.4, Continued……
42
What are the properties of regression co–efficient?
Both the regression co–efficients will have the same sign i.e. they will be either positive or negative. The co–efficient of correlation will have the same sign as that of regression co–efficient. Example: Continued……
43
What are the properties of regression co-efficient?
The average value of the two regression co–efficients would be greater than the value of co–efficient of correlation. Symbolically, Example: r Continued……
44
Regression co–efficients are independent of change of origin but not scale.
45
Example: Prove that the co–efficient of correlation is the geometric mean of the regression co–efficient Proof: Let bxy be co– efficient of X on Y and byx be co–efficient of Y on X. The co–efficient of correlation is the geometric mean of the two regression co–efficients.
46
Prove that Regression co–efficients are independent of change of origin but not scale.
47
i) The sales for advertisement expenditure of Tk. 80 lakhs and
Example: The following figures relate to advertisement expenditure and corresponding sales Advertisement (in lakhs of Taka) 60 62 65 70 73 75 71 Sales ( in crores of Taka) 10 11 13 15 16 19 14 Estimate i) The sales for advertisement expenditure of Tk. 80 lakhs and ii) The advertisement expenditure for a sales target of Tk. 25 crores
48
Let the advertisement expenditure be denoted by X and sales by y calculation of regression equations
xy 60 8 64 10 4 16 32 62 6 36 11 3 9 18 65 13 1 70 2 15 73 5 25 75 7 49 19 35 71 14 X=476 x= 0 x2=196 Y=98 y= 0 y2= 476 xy= 100
50
When advertisement expenditure, X= 80 lakhs, Sales,Y= 0.51×80 – 20.68
= 40.8 – 20.68 = 20.12 The likely sales for advertisement expenditure of Tk. 80 lakhs = Tk crores.
51
ii) Regression Equation of X on Y:
Putting the values in equation (2), we get
52
When sales target, Y= 25 croes,
Advertisement expenditure , X= 1.79 ×25+43 = = 87.75 The likely advertisement expenditure for a sales target of Tk.25 crores = Tk Lakhs.
53
Advertising Expenditure 10 12 15 23 20 Sales 14 17 25 21
Example: The following data relate to advertising expenditure (in lakhs of Taka) and their corresponding sales (in crores of Taka): Advertising Expenditure 10 12 15 23 20 Sales 14 17 25 21 Estimate i) the sales corresponding to advertising expenditure of Tk 30 lakhs and ii) the advertising expenditure for a sales target of Tka. 35 crores.
54
Calculation of regression equations
X (x) x2 Y (y) y2 xy 10 - 6 36 14 +36 12 - 4 16 17 - 3 9 +12 15 - 1 1 23 +3 +7 49 25 +5 +35 20 +4 21 +1 X= 80 x= 0 x2=118 Y=100 y=0 y2=80 xy=84 .
55
From equation (1) , we have
56
When the advertisement expenditure is Tk. 30 lakhs,
Sales, Y= X = =29.968 Thus the likely sales corresponding to advertisement expenditure of Tk.30 lakhs is Tk crores. Regression equation of X on Y is given by
57
advertising expenditure, X = - 5 +1.05× 35 = - 5 +36.75 = 31.75
when the sales target is Tk. 35 crores, the advertising expenditure, X = × 35 = = 31.75 The advertising expenditure for a sales target of Tk. 35 crores is Tk lakhs.
58
What is the standard error of estimate?
The measure of variation of the observations around the computed regression line is referred to as the standard error of estimate. Just as the standard deviation is a measure of the scatter of observations in a frequency distribution around the mean of that distribution, the standard error of estimate is a measure of the scatter of the observed values of Y around the corresponding computed values of Y on the regression line. It is computed as a standard deviation, being also a square root of the mean of the squared deviation. But the deviations here are not deviations of the items from the arithmetic mean; they are rather the vertical distances of every dot from the line of average relationship.
59
What are the formulae for calculating the standard error of estimate?
where Syx= S.E. of estimate of regression equation of Y on X.
60
The standard error of estimate can be easily calculated with the help of the following formula.
Continued…….
61
Significance of standard error
The standard error of estimate measures the accuracy of the estimated figures. The smaller the value of standard error of estimate, the closer will be dots to the regression line and the better the estimates based on the equation for this line. If standard error of estimate is zero, then there is no variation about the line and the correlation will be perfect. With the help of standard error of estimate, it is possible for us to ascertain how good and representative the regression line is as a description of the average relationship between two series.
62
What is co–efficient of determination?
The ratio of the unexplained variation to the total variation represents the proportion of variation in Y that is not explained by regression on X. Subtraction of this proportion from 1.0 gives the proportion of variation in Y that is explained by regression on X. The statistic used to express this proportion is called the co–-efficient of determination and is denoted by R2. It may be written as follows: Continued…….
63
The value of R2 is the proportion of the variation in the dependent variable Y explained by regression on the independent variable X.
64
Thank you For Attending The Session
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.