REGRESSION ANALYSIS Definition:

Presentation on theme: "REGRESSION ANALYSIS Definition:"— Presentation transcript:

1 REGRESSION ANALYSIS Definition:
A regression is a statistical analysis that assesses the average association between two variables. It is used to find the relationship between two variables, i.e.:
* It uses one variable to predict the value of another variable.
* It tests hypotheses concerning the relationship between two variables.
* It quantifies the strength of the relationship between two variables.

2 Uses of Regression Analysis
* It is useful for estimating the average relationship between two variables.
* It is useful for predicting an unknown value.
* It is widely used in social sciences such as economics, and in the natural and physical sciences.
* It is useful for forecasting business situations.
* It is useful for estimating the error in sampling.
* It is useful for calculating the correlation coefficient and the coefficient of determination.

3 Difference Between Correlation and Regression
1) Correlation: measures the relationship between two or more variables. Regression: measures the average relationship between two or more variables.
2) Correlation: X and Y are both random variables. Regression: X is the random variable and Y is the fixed variable.
3) Correlation: gives limited information after verifying the relationship between the variables. Regression: is used to predict one value in relation to the other given value.
4) Correlation: the coefficient lies between -1 and +1. Regression: the regression value is an absolute figure.
5) Correlation: studies the linear relationship between the variables. Regression: studies both linear and non-linear relationships between the variables.
6) Correlation: if the coefficient of correlation is positive, the two variables are positively correlated, and vice versa. Regression: the regression coefficient explains that a decrease in one variable is associated with an increase in the other variable.

4 Regression Mathematical Equations
The algebraic expressions of the two regression lines are called regression equations.

Regression equation of X on Y: $X_c = a + bY$
To determine the values of a and b, the following two normal equations are solved simultaneously:
$\sum X = Na + b\sum Y$
$\sum XY = a\sum Y + b\sum Y^2$

Regression equation of Y on X: $Y_c = a + bX$
To determine the values of a and b, the following two normal equations are solved simultaneously:
$\sum Y = Na + b\sum X$
$\sum XY = a\sum X + b\sum X^2$
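
As a rough illustration of how the two normal equations for Y on X can be solved, the sketch below builds the four sums and applies Cramer's rule to the 2x2 system. The function name `solve_normal_equations` and the five data points are assumptions made up for this example; they do not come from the slides.

```python
def solve_normal_equations(xs, ys):
    """Fit Yc = a + b*X by solving the two normal equations directly."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    # Normal equations for Y on X:
    #   sum_y  = n*a     + b*sum_x
    #   sum_xy = a*sum_x + b*sum_x2
    # Solve the 2x2 linear system with Cramer's rule.
    det = n * sum_x2 - sum_x * sum_x
    if det == 0:
        raise ValueError("all X values are identical; the line is undefined")
    a = (sum_y * sum_x2 - sum_x * sum_xy) / det
    b = (n * sum_xy - sum_x * sum_y) / det
    return a, b


# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = solve_normal_equations(xs, ys)
print(f"Yc = {a:.2f} + {b:.2f} X")
```

Calling the same function with the arguments swapped, `solve_normal_equations(ys, xs)`, gives the constants of the regression of X on Y, since the two pairs of normal equations differ only in the roles of X and Y.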

5 Find the regression equations
The regression equations can be found in two ways:
1) Deviations taken from the actual means of X and Y.
2) Deviations taken from assumed means of X and Y.

1) Deviations taken from the actual means of X and Y

Let $x = X - \bar{X}$ and $y = Y - \bar{Y}$.

The regression equation of X on Y is:
$X - \bar{X} = r\frac{\sigma_x}{\sigma_y}(Y - \bar{Y})$, where $r\frac{\sigma_x}{\sigma_y} = \frac{\sum xy}{\sum y^2}$

The regression equation of Y on X is:
$Y - \bar{Y} = r\frac{\sigma_y}{\sigma_x}(X - \bar{X})$, where $r\frac{\sigma_y}{\sigma_x} = \frac{\sum xy}{\sum x^2}$

6 1) Deviations taken from the Actual Means of X and Y
A panel of two judges, P and Q, graded seven dramatic performances by independently awarding marks as follows:

Performance    1    2    3    4    5    6    7
Marks by P    46   42   44   40   43   41   45
Marks by Q    40   38   36   35   39   37   41

Calculate the regression equations. The eighth performance, which judge Q could not attend, was awarded 37 marks by judge P. If judge Q had also been present, how many marks would he have been expected to award to the eighth performance? Also find Karl Pearson's coefficient of correlation.

7 Deviations taken from the Actual Means of X and Y: Solution
Let the marks awarded by judge P be represented by X and those awarded by judge Q by Y. We have to find the value of Y when X = 37. This can be done by finding the regression equations of X on Y and of Y on X, and then Karl Pearson's coefficient of correlation.

X     x = X - 43    x²    Y     y = Y - 38    y²    xy
46     3            9     40     2            4      6
42    -1            1     38     0            0      0
44     1            1     36    -2            4     -2
40    -3            9     35    -3            9      9
43     0            0     39     1            1      0
41    -2            4     37    -1            1      2
45     2            4     41     3            9      6

ΣX = 301; Σx = 0; Σx² = 28; ΣY = 266; Σy = 0; Σy² = 28; Σxy = 21

8 (Continued)
Mean of X: $\bar{X} = \frac{\sum X}{N} = \frac{301}{7} = 43$
Mean of Y: $\bar{Y} = \frac{\sum Y}{N} = \frac{266}{7} = 38$

The regression equation of X on Y is:
$X - \bar{X} = r\frac{\sigma_x}{\sigma_y}(Y - \bar{Y})$
$r\frac{\sigma_x}{\sigma_y} = \frac{\sum xy}{\sum y^2} = \frac{21}{28} = 0.75$
$X - 43 = 0.75(Y - 38)$
$X = 0.75Y - 28.5 + 43$
$X = 0.75Y + 14.5$

9 (Continued)
The regression equation of Y on X is:
$Y - \bar{Y} = r\frac{\sigma_y}{\sigma_x}(X - \bar{X})$
$r\frac{\sigma_y}{\sigma_x} = \frac{\sum xy}{\sum x^2} = \frac{21}{28} = 0.75$
$Y - 38 = 0.75(X - 43)$
$Y = 0.75X - 32.25 + 38$
$Y = 0.75X + 5.75$

When X = 37, what is Y?
$Y = (0.75 \times 37) + 5.75 = 27.75 + 5.75 = 33.5$
If judge Q had been present, he would have been expected to award 33.5 marks.

Karl Pearson's coefficient of correlation:
$r = \sqrt{b_{xy} \times b_{yx}} = \sqrt{0.75 \times 0.75} = 0.75$
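
The judges example can be checked numerically from the column totals of the solution table (Σxy = 21, Σx² = 28, Σy² = 28, with means 43 and 38). The sketch below is only an illustration; the function name `regression_from_deviation_sums` is an assumption, not something defined in the slides. It uses the identity that r is the geometric mean of the two regression coefficients and takes their common sign.

```python
import math


def regression_from_deviation_sums(sum_xy, sum_x2, sum_y2):
    """Return (b_xy, b_yx, r) from sums of deviations about the actual means."""
    b_xy = sum_xy / sum_y2  # regression coefficient of X on Y
    b_yx = sum_xy / sum_x2  # regression coefficient of Y on X
    # r is the geometric mean of the two coefficients and carries their sign.
    r = math.copysign(math.sqrt(b_xy * b_yx), sum_xy)
    return b_xy, b_yx, r


mean_x, mean_y = 43, 38
b_xy, b_yx, r = regression_from_deviation_sums(sum_xy=21, sum_x2=28, sum_y2=28)

# Expected marks from judge Q when judge P awards 37 (regression of Y on X).
y_at_37 = mean_y + b_yx * (37 - mean_x)
print(b_xy, b_yx, r, y_at_37)
```

Because Σx² and Σy² happen to be equal in this data set, both regression coefficients and r coincide at 0.75.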

10 Actual Mean Method – Additional Problem
Calculate the coefficient of correlation and obtain the lines of regression for the following data. Obtain an estimate of Y which should correspond, on average, to X = 6.2.
X 1 2 3 4 5 6 7 8 9
Y 10 12 11 13 14 16 15

11 Regression Analysis – Assumed Mean Method – Model 2
This method differs from the previous one in that, instead of taking deviations from the arithmetic mean, we take deviations from an assumed mean. It is convenient when the actual mean is a fraction. Let $d_x = X - A_x$ and $d_y = Y - A_y$, where $A_x$ and $A_y$ are the assumed means.

The regression equation of X on Y is:
$X - \bar{X} = r\frac{\sigma_x}{\sigma_y}(Y - \bar{Y})$, where $r\frac{\sigma_x}{\sigma_y} = \frac{N\sum d_x d_y - (\sum d_x)(\sum d_y)}{N\sum d_y^2 - (\sum d_y)^2}$

The regression equation of Y on X is:
$Y - \bar{Y} = r\frac{\sigma_y}{\sigma_x}(X - \bar{X})$, where $r\frac{\sigma_y}{\sigma_x} = \frac{N\sum d_x d_y - (\sum d_x)(\sum d_y)}{N\sum d_x^2 - (\sum d_x)^2}$
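
The assumed-mean formula works because shifting the origin of X or Y does not change the regression coefficient. A minimal sketch of that property follows; the function name `slope_y_on_x` and the five data points are hypothetical, chosen only for illustration.

```python
def slope_y_on_x(xs, ys, assumed_x=0.0, assumed_y=0.0):
    """Regression coefficient of Y on X from deviations about assumed means."""
    n = len(xs)
    dx = [x - assumed_x for x in xs]
    dy = [y - assumed_y for y in ys]
    sum_dx, sum_dy = sum(dx), sum(dy)
    sum_dxdy = sum(u * v for u, v in zip(dx, dy))
    sum_dx2 = sum(u * u for u in dx)
    return (n * sum_dxdy - sum_dx * sum_dy) / (n * sum_dx2 - sum_dx ** 2)


# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Three different pairs of assumed means give the same coefficient.
for a_x, a_y in [(0, 0), (3, 4), (10, -2)]:
    print(a_x, a_y, slope_y_on_x(xs, ys, a_x, a_y))
```

In practice the assumed means are picked close to the actual means so that the deviations stay small and easy to square by hand.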

12 Regression Analysis – Assumed Mean Method
Illustration 1: Price indices of cotton and wool are given below for the 12 months of a year. Obtain the equations of the lines of regression between the indices.

Price index of cotton (X)   78  77  85  88  87  82  81  77  76  83  97  93
Price index of wool (Y)     84  82  82  85  89  90  88  92  83  89  98  99

13 Solution
X is the price index of cotton (assumed mean A = 84) and Y is the price index of wool (assumed mean A = 88).

X     dx = X - 84    dx²    Y     dy = Y - 88    dy²    dxdy
78     -6             36    84     -4             16      24
77     -7             49    82     -6             36      42
85      1              1    82     -6             36      -6
88      4             16    85     -3              9     -12
87      3              9    89      1              1       3
82     -2              4    90      2              4      -4
81     -3              9    88      0              0       0
77     -7             49    92      4             16     -28
76     -8             64    83     -5             25      40
83     -1              1    89      1              1      -1
97     13            169    98     10            100     130
93      9             81    99     11            121      99

ΣX = 1004; Σdx = -4; Σdx² = 488; ΣY = 1061; Σdy = 5; Σdy² = 365; Σdxdy = 287

14 (Continued)
The regression equation of X on Y is:
$X - \bar{X} = r\frac{\sigma_x}{\sigma_y}(Y - \bar{Y})$
$\bar{X} = \frac{\sum X}{N} = \frac{1004}{12} = 83.67$; $\bar{Y} = \frac{\sum Y}{N} = \frac{1061}{12} = 88.42$
$r\frac{\sigma_x}{\sigma_y} = \frac{N\sum d_x d_y - (\sum d_x)(\sum d_y)}{N\sum d_y^2 - (\sum d_y)^2} = \frac{12 \times 287 - (-4)(5)}{12 \times 365 - (5)^2} = \frac{3444 - (-20)}{4380 - 25} = \frac{3464}{4355} = 0.795$
$X - 83.67 = 0.795(Y - 88.42)$
$X = 0.795Y - 70.29 + 83.67$
$X = 0.795Y + 13.38$

15 (Continued)
The regression equation of Y on X is:
$Y - \bar{Y} = r\frac{\sigma_y}{\sigma_x}(X - \bar{X})$
$r\frac{\sigma_y}{\sigma_x} = \frac{N\sum d_x d_y - (\sum d_x)(\sum d_y)}{N\sum d_x^2 - (\sum d_x)^2} = \frac{12 \times 287 - (-4)(5)}{12 \times 488 - (-4)^2} = \frac{3444 - (-20)}{5856 - 16} = \frac{3464}{5840} = 0.59$
$Y - 88.42 = 0.59(X - 83.67)$
$Y = 0.59X - 49.37 + 88.42$
$Y = 0.59X + 39.05$
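
The cotton and wool working can be reproduced from the column totals of the solution table alone (N = 12, assumed means 84 and 88, Σdx = -4, Σdy = 5, Σdx² = 488, Σdy² = 365, Σdxdy = 287). The sketch below is an illustration under that assumption; the function name `equations_from_assumed_mean_sums` is hypothetical. It also recovers the actual means from the assumed means, using mean = A + Σd / N.

```python
def equations_from_assumed_mean_sums(n, a_x, a_y, sum_dx, sum_dy,
                                     sum_dx2, sum_dy2, sum_dxdy):
    """Return (mean_x, mean_y, b_xy, b_yx) from assumed-mean column totals."""
    # The actual mean is the assumed mean plus the average deviation.
    mean_x = a_x + sum_dx / n
    mean_y = a_y + sum_dy / n
    numerator = n * sum_dxdy - sum_dx * sum_dy
    b_xy = numerator / (n * sum_dy2 - sum_dy ** 2)  # X on Y
    b_yx = numerator / (n * sum_dx2 - sum_dx ** 2)  # Y on X
    return mean_x, mean_y, b_xy, b_yx


mean_x, mean_y, b_xy, b_yx = equations_from_assumed_mean_sums(
    n=12, a_x=84, a_y=88, sum_dx=-4, sum_dy=5,
    sum_dx2=488, sum_dy2=365, sum_dxdy=287)

# Each line passes through the point of means (mean_x, mean_y).
print(f"X = {b_xy:.3f} Y + {mean_x - b_xy * mean_y:.2f}")
print(f"Y = {b_yx:.3f} X + {mean_y - b_yx * mean_x:.2f}")
```

The intercepts printed here use unrounded coefficients, so they can differ in the second decimal place from a hand calculation that rounds the coefficient to 0.59 before multiplying.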

16 Illustration 2
The quantity of raw material purchased by a company at the specified price during the 12 months of 2010 is given below.
a) Find the regression equations based on the data.
b) Estimate the approximate quantity likely to be purchased if the price shoots up to Rs. 124 per kg.

Month        Price (Rs. per kg)    Quantity (kg)
January       96                   250
February     110                   200
March        100                   250
April         90                   280
May           86                   300
June          92                   300
July         112                   220
August       112                   220
September    108                   200
October      116                   210
November      86                   300
December      92                   250

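17 Solution
X is the price in Rs. (assumed mean A = 100) and Y is the quantity in kg (assumed mean A = 248).

X      dx = X - 100    dx²    Y      dy = Y - 248    dy²     dxdy
96      -4              16    250      2                4      -8
110     10             100    200    -48             2304    -480
100      0               0    250      2                4       0
90     -10             100    280     32             1024    -320
86     -14             196    300     52             2704    -728
92      -8              64    300     52             2704    -416
112     12             144    220    -28              784    -336
112     12             144    220    -28              784    -336
108      8              64    200    -48             2304    -384
116     16             256    210    -38             1444    -608
86     -14             196    300     52             2704    -728
92      -8              64    250      2                4     -16

ΣX = 1200; Σdx = 0; Σdx² = 1344; ΣY = 2980; Σdy = 4; Σdy² = 16768; Σdxdy = -4360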

18 (Continued)
The regression equation of X on Y is:
$X - \bar{X} = r\frac{\sigma_x}{\sigma_y}(Y - \bar{Y})$
$\bar{X} = \frac{\sum X}{N} = \frac{1200}{12} = 100$; $\bar{Y} = \frac{\sum Y}{N} = \frac{2980}{12} = 248.33$
$r\frac{\sigma_x}{\sigma_y} = \frac{N\sum d_x d_y - (\sum d_x)(\sum d_y)}{N\sum d_y^2 - (\sum d_y)^2} = \frac{12 \times (-4360) - (0)(4)}{12 \times 16768 - (4)^2} = \frac{-52320}{201200} = -0.26$
$X - 100 = -0.26(Y - 248.33)$
$X = -0.26Y + 64.57 + 100$
$X = -0.26Y + 164.57$

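19 (Continued)
The regression equation of Y on X is:
$Y - \bar{Y} = r\frac{\sigma_y}{\sigma_x}(X - \bar{X})$
$r\frac{\sigma_y}{\sigma_x} = \frac{N\sum d_x d_y - (\sum d_x)(\sum d_y)}{N\sum d_x^2 - (\sum d_x)^2} = \frac{12 \times (-4360) - (0)(4)}{12 \times 1344 - (0)^2} = \frac{-52320}{16128} = -3.244$
$Y - 248.33 = -3.244(X - 100)$
$Y = -3.244X + 324.4 + 248.33$
$Y = -3.244X + 572.73$

b) When the price is Rs. 124 (X = 124), the quantity is:
$Y = -3.244(124) + 572.73 = -402.26 + 572.73 = 170.47$
The approximate quantity likely to be purchased is about 170.47 kg.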

20 University Question – December 2009
A tyre manufacturing company is interested in removing pollutants from the exhaust at its factory, and cost is a concern. The company has collected data from other companies on the amount of money spent on environmental measures and the resulting amount of dangerous pollutants released (as a percentage of total emissions). Compute the regression equation and predict the percentage of dangerous pollutants released when Rs. 20,00,000 is spent on control measures.

Money spent (Rs. lakhs)    Percentage of dangerous pollutants
 8.4                       35.9
10.2                       31.8
16.5                       24.7
21.7                       25.2
 9.4                       36.8
 8.3                       35.8
11.5                       33.4
18.4                       25.4
16.7                       31.4
19.3                       27.4
28.4                       15.8
 4.7                       31.5
12.3                       28.9

21 Solution
X is the money spent (Rs. lakhs) and Y is the percentage of dangerous pollutants. Deviations are taken from the actual means, 14.29 for X and 29.53 for Y.

X       x = X - 14.29    x²        Y       y = Y - 29.53    y²        xy
 8.4     -5.89            34.69    35.9      6.37            40.58     -37.51
10.2     -4.09            16.73    31.8      2.27             5.15      -9.28
16.5      2.21             4.88    24.7     -4.83            23.33     -10.67
21.7      7.41            54.91    25.2     -4.33            18.75     -32.08
 9.4     -4.89            23.91    36.8      7.27            52.85     -35.55
 8.3     -5.99            35.88    35.8      6.27            39.31     -37.55
11.5     -2.79             7.78    33.4      3.87            14.98     -10.79
18.4      4.11            16.89    25.4     -4.13            17.06     -16.97
16.7      2.41             5.81    31.4      1.87             3.50       4.50
19.3      5.01            25.10    27.4     -2.13             4.54     -10.67
28.4     14.11           199.09    15.8    -13.73           188.51    -193.73
 4.7     -9.59            91.97    31.5      1.97             3.88     -18.89
12.3     -1.99             3.96    28.9     -0.63             0.40       1.25

ΣX = 185.8; Σx² = 521.60; ΣY = 384; Σy² = 412.84; Σxy = -407.94
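
As a cross-check on the table, the regression of Y on X and the requested prediction can be computed directly from the thirteen data pairs given in the question. This is a sketch, not the worked solution from the slides; the function name `fit_y_on_x` is an assumption. Since money spent is measured in Rs. lakhs, Rs. 20,00,000 corresponds to X = 20.

```python
def fit_y_on_x(xs, ys):
    """Least-squares line of Y on X using deviations from the actual means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_x2 = sum((x - mean_x) ** 2 for x in xs)
    b_yx = sum_xy / sum_x2           # regression coefficient of Y on X
    a = mean_y - b_yx * mean_x       # the line passes through the means
    return a, b_yx


money_spent = [8.4, 10.2, 16.5, 21.7, 9.4, 8.3, 11.5,
               18.4, 16.7, 19.3, 28.4, 4.7, 12.3]
pollutants = [35.9, 31.8, 24.7, 25.2, 36.8, 35.8, 33.4,
              25.4, 31.4, 27.4, 15.8, 31.5, 28.9]

a, b = fit_y_on_x(money_spent, pollutants)
prediction = a + b * 20  # Rs. 20,00,000 = 20 lakhs
print(f"Y = {a:.2f} + ({b:.3f}) X")
print(f"Predicted percentage at Rs. 20 lakhs: {prediction:.2f}")
```

The coefficient is negative, which matches the pattern in the data: companies that spent more on control measures released a smaller percentage of dangerous pollutants.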

25 University Question July 2009
Problem 1) The following data give the experience of machine operators and their performance ratings, as given by the number of good parts turned out per 100 pieces. Operator : Experience : Performance Rating : Obtain the regression line of performance rating on experience, and estimate the probable performance of an operator who has 9 years of experience.

26 Model 3 – Regression Analysis
From the following results, estimate the marks in Management obtained by a student who has scored 60 marks in Mathematics in a certain examination. Coefficient of correlation = 0.4.

Particulars           Score in Management (X)    Score in Mathematics (Y)
Mean                  80                         50
Standard deviation    15                         10

27 Solution
Let X denote marks in Management and Y denote marks in Mathematics. We have to estimate the marks in Management for a student who has secured 60 marks in Mathematics. Hence we have to fit a regression of X on Y.
$X - \bar{X} = r\frac{\sigma_x}{\sigma_y}(Y - \bar{Y})$
$\bar{X} = 80$; $\bar{Y} = 50$; $\sigma_x = 15$; $\sigma_y = 10$; $r = 0.4$
$X - 80 = 0.4 \times \frac{15}{10}(Y - 50)$
$X - 80 = 0.6(Y - 50)$
$X = 0.6Y - 30 + 80$
$X = 0.6Y + 50$
When Y = 60:
$X = 0.6 \times 60 + 50 = 36 + 50 = 86$
Hence, for a student who has obtained 60 marks in Mathematics, the likely marks in Management are 86.
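
When only the means, standard deviations and r are known, the estimate follows from a single formula. The sketch below applies it to the figures of this problem; the function name `predict_x_from_y` is an assumption introduced for illustration.

```python
def predict_x_from_y(r, mean_x, mean_y, sd_x, sd_y, y):
    """Regression of X on Y: X - mean_x = r * (sd_x / sd_y) * (Y - mean_y)."""
    b_xy = r * sd_x / sd_y
    return mean_x + b_xy * (y - mean_y)


# Management (X): mean 80, s.d. 15. Mathematics (Y): mean 50, s.d. 10. r = 0.4.
marks = predict_x_from_y(r=0.4, mean_x=80, mean_y=50, sd_x=15, sd_y=10, y=60)
print(round(marks, 2))
```

To estimate Y from X instead, the roles of the two variables are swapped, so the coefficient becomes r multiplied by sd_y / sd_x.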

28 Model 3 – Regression Analysis (Additional)
From the following data on rainfall and production of rice, find the most likely production corresponding to a rainfall of 40 inches. Coefficient of correlation = 0.8.

Particulars           Rainfall in inches (X)    Production in quintals (Y)
Mean                  35                        50
Standard deviation     5                         8

29 Model 4 – Regression Analysis
The following calculations have been made for the prices of twelve stocks (X) on the Calcutta Stock Exchange on a certain day, along with the volume of sales in thousands of shares (Y). From these calculations, find the regression equation of the price of stocks on the volume of sales of shares.
ΣX = 580; ΣY = 370; ΣXY = 11,494; ΣX² = 41, ΣY² = 17,206

$X - \bar{X} = r\frac{\sigma_x}{\sigma_y}(Y - \bar{Y})$, where $r\frac{\sigma_x}{\sigma_y} = \frac{\sum XY - N\bar{X}\bar{Y}}{\sum Y^2 - N\bar{Y}^2}$
$\bar{X} = \frac{\sum X}{N} = \frac{580}{12} = 48.33$; $\bar{Y} = \frac{\sum Y}{N} = \frac{370}{12} = 30.83$
$N\bar{X}\bar{Y} = \frac{580 \times 370}{12} = 17{,}883.33$; $N\bar{Y}^2 = \frac{(370)^2}{12} = 11{,}408.33$
$r\frac{\sigma_x}{\sigma_y} = \frac{11{,}494 - 17{,}883.33}{17{,}206 - 11{,}408.33} = \frac{-6{,}389.33}{5{,}797.67} = -1.102$

30 (Continued)
$X - 48.33 = -1.102(Y - 30.83)$
$X - 48.33 = -1.102Y + 33.98$
$X = 82.31 - 1.102Y$
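
The same result can be obtained directly from the raw sums without computing any deviations. The following is a minimal sketch using the totals quoted in the problem (N = 12, ΣX = 580, ΣY = 370, ΣXY = 11,494, ΣY² = 17,206); the function name `x_on_y_from_raw_sums` is an assumption for illustration.

```python
def x_on_y_from_raw_sums(n, sum_x, sum_y, sum_xy, sum_y2):
    """Return (intercept, b_xy) for the line X = intercept + b_xy * Y."""
    mean_x = sum_x / n
    mean_y = sum_y / n
    b_xy = (sum_xy - n * mean_x * mean_y) / (sum_y2 - n * mean_y ** 2)
    # The line passes through the point of means.
    return mean_x - b_xy * mean_y, b_xy


intercept, b_xy = x_on_y_from_raw_sums(
    n=12, sum_x=580, sum_y=370, sum_xy=11494, sum_y2=17206)
print(f"X = {intercept:.2f} + ({b_xy:.3f}) Y")
```

Only ΣY² is needed for the regression of X on Y; ΣX² would be required for the regression of Y on X.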

31 Model 4 – Regression Analysis – Additional Problem
From 10 observations of price (X) and supply (Y) of a commodity, the following summary figures were obtained (in appropriate units): ΣX = 130; ΣY = 220; ΣXY = 3467; ΣX² = 2288. Compute the line of regression of Y on X and interpret the result. Estimate the supply when the price is 16 units.

32 Model 5 – Regression Analysis
A research investigator collected data on saving and investment from 16 households. Saving shows a mean of Rs. 6565 and a variance of Rs. 250. Against this, mean investment was found to be Rs. 4525, with a variance of Rs. 520. If the coefficient of correlation between saving and investment is 0.67, find the most approximate value of saving against an investment of Rs. 9000, and that of investment against a given amount of saving.

Solution: Let X denote saving and Y denote investment.
Given: $n = 16$; $\bar{X} = 6565$; $\sigma_x^2 = 250$; $\bar{Y} = 4525$; $\sigma_y^2 = 520$; $r_{xy} = 0.67$

Regression line of X on Y:
$X - \bar{X} = r\frac{\sigma_x}{\sigma_y}(Y - \bar{Y})$
$X - 6565 = \frac{0.67 \times \sqrt{250}}{\sqrt{520}}(Y - 4525)$
$X - 6565 = 0.4646(Y - 4525)$
i.e. $X = 0.4646Y + 4462.69$
When Y = 9000, $X = 0.4646(9000) + 4462.69 = 8644.09$
Therefore, when investment is Rs. 9000, saving is approximately Rs. 8644.09.

33 (Continued)
The regression line of Y on X is:
$Y - \bar{Y} = r\frac{\sigma_y}{\sigma_x}(X - \bar{X})$
$Y - 4525 = \frac{0.67 \times \sqrt{520}}{\sqrt{250}}(X - 6565)$
$Y - 4525 = 0.9663(X - 6565)$
$Y = 0.9663X - 1818.76$
When X = 5600, $Y = 0.9663(5600) - 1818.76 = 3592.52$
Therefore, when saving is Rs. 5600, investment is approximately Rs. 3592.52.

34 Model 5 – Additional Problem
A researcher collected data on research expenses and selling expenses from 16 companies. Research expenses show a mean of Rs. 13, and a variance of Rs As against this, mean selling expenses was found as Rs and variance as Rs If the coefficient of correlation between research expenses and selling expenses is 0.67, find the most approximate value of research expenses against selling expenses of Rs. 18,000, and that of selling expenses against research expenses of Rs. 11,200.

