REGRESSION ANALYSIS 11/28/2019.

REGRESSION ANALYSIS
Regression analysis attempts to establish the nature of the relation between variables.
It measures the average relation between two or more variables.
It is the most frequently used technique in economics and business research.

Historical Origin of Regression
Regression analysis was first developed by Sir Francis Galton in the latter part of the 19th century. He studied the relation between the heights of sons and fathers. The heights of sons of both tall and short fathers appeared to “revert” or “regress” to the mean of the group. He considered this tendency to be a regression to “mediocrity” and developed a mathematical description of it. Galton’s model is the precursor of today’s regression models.

REGRESSION ANALYSIS
A statistical tool to estimate the unknown values of one variable from known values of another variable.
The variables are the independent variable (X) and the dependent variable (Y).
Simple linear regression analysis uses only one predictor and a straight line.
“Dependent” and “independent” refer to the mathematical or functional meaning: values of Y depend on values of X, but X may or may not be causing the change in Y.

USES
Provides estimates of values of the dependent variable from values of the independent variable, using regression lines.
Gives a measure of the error involved in using the regression line as a basis for estimation.
The correlation coefficient can be calculated with the help of the regression coefficients.

DIFFERENCES WITH CORRELATION
Correlation: measures the degree of relationship (the degree of covariability) between variables.
Regression: studies the nature of the relationship.
Correlation: cannot tell which variable is the cause and which is the effect.
Regression: one variable is treated as dependent and the other as independent.

REGRESSION LINES
The two regression lines cut each other at the point of the averages of X and Y.
They are drawn on the least squares assumption.

REGRESSION EQUATIONS
The regression equation of Y on X is expressed as: Y = a + bX
Y is the dependent variable and X is the independent variable.
a is the Y-intercept and b is the slope (the change in Y for a unit change in X).
The values of a and b are found by the method of least squares.

REGRESSION EQUATIONS
Least squares method: the line should be drawn through the plotted points in such a manner that the sum of squares of the deviations of the actual y values from the computed y values is the least.
$\sum (y - y_e)^2$ should be a minimum to obtain the best-fitting line.
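The least squares criterion can be checked numerically. The sketch below is only an illustration: the data set is hypothetical (not from the slides), and `sum_squared_deviations` is a made-up helper name. It compares the least-squares line for that data, y = 2.2 + 0.6x, with two nearby candidate lines.

```python
# Hypothetical data, chosen only for illustration (not from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]


def sum_squared_deviations(a, b, xs, ys):
    """Return the sum of (y - ye)^2 for the candidate line ye = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Least-squares line for this toy data (a = 2.2, b = 0.6) versus two nearby lines.
best = sum_squared_deviations(2.2, 0.6, xs, ys)
steeper = sum_squared_deviations(2.2, 0.7, xs, ys)
higher = sum_squared_deviations(2.5, 0.6, xs, ys)

print(best, steeper, higher)
```

Any change to the slope or the intercept increases the sum of squared deviations, which is what "best fit" means under this criterion.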

CHARACTERISTICS OF STRAIGHT LINE (BEST FIT)
Gives the best fit to the data.
$\sum (y - y_e)^2$ is a minimum, and the deviations above the line equal those below the line (they sum to zero).
The straight line goes through the overall mean of the data.
For data representing a sample from a population, the least squares line is the ‘best’ estimate of the population regression line.
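Two of these characteristics are easy to verify on a small example. The sketch below assumes a hypothetical data set and uses the standard closed-form least-squares slope and intercept; it checks that the deviations cancel out and that the line passes through the point of means.

```python
# Hypothetical data, chosen only for illustration (not from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Standard closed-form least-squares slope and intercept.
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
a = y_bar - b * x_bar

# Deviations of actual y from fitted y: positives and negatives cancel.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
residual_sum = sum(residuals)

# The fitted line evaluated at the mean of x returns the mean of y.
value_at_x_bar = a + b * x_bar

print(residual_sum, value_at_x_bar, y_bar)
```

The residual sum is zero up to floating-point rounding.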

REGRESSION EQUATIONS
Similarly, the regression equation of X on Y is expressed as: X = a + bY
X is the dependent variable and Y is the independent variable.
a is the X-intercept and b is the slope (the change in X for a unit change in Y).
The values of a and b are found by the method of least squares.
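The two regression lines can be fitted side by side. The sketch below uses hypothetical data and illustrative variable names (`b_yx`, `b_xy`); it shows that both lines pass through the point of means and relies on the standard identity that the product of the two regression coefficients equals r squared, which is one way to obtain the correlation coefficient from the regression coefficients.

```python
import math

# Hypothetical data, chosen only for illustration (not from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

sp_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
ss_x = sum((x - x_bar) ** 2 for x in xs)
ss_y = sum((y - y_bar) ** 2 for y in ys)

# Regression of Y on X: Y = a_yx + b_yx * X
b_yx = sp_xy / ss_x
a_yx = y_bar - b_yx * x_bar

# Regression of X on Y: X = a_xy + b_xy * Y
b_xy = sp_xy / ss_y
a_xy = x_bar - b_xy * y_bar

# Both lines pass through the point of means (x_bar, y_bar).
y_at_mean = a_yx + b_yx * x_bar
x_at_mean = a_xy + b_xy * y_bar

# Product of the two slopes gives r^2; r takes the sign of the slopes.
r_squared = b_yx * b_xy
r = math.copysign(math.sqrt(r_squared), b_yx)

print(b_yx, b_xy, r_squared, r)
```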

EXPRESSION FOR A LINE
[Figure: two example lines, y = 4 + 0.3x and y = x, with a = intercept and b (slope) = y’/x’ marked between points P and Q.]

REGRESSION ANALYSIS : LIMITATIONS
Assumption: the relationship has not changed since the regression equation was computed.
The relationship shown by the scatter diagram may not be the same if the equation is extended beyond the values used in computing it.

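LINE OF BEST FIT
The regression equation is given by $\hat{y} = a + bx$, where $b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$ and $a = \bar{y} - b\bar{x}$.
The numerator of the equation for b is called the sum of products, $SP_{xy}$. The denominator is the sum of squared deviations from the mean, $SS_x$. The denominator is always positive, so the sign of the slope of the line is determined by the sign of the numerator.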
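The slope and intercept can be computed directly from the sum of products and the sum of squared deviations. This is a minimal sketch on hypothetical data; `fit_line` is an illustrative helper name, not something defined in the slides.

```python
def fit_line(xs, ys):
    """Return (a, b, sp_xy, ss_x) for the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of products of deviations (numerator of b).
    sp_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Sum of squared deviations of x from its mean (denominator of b).
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    b = sp_xy / ss_x
    a = y_bar - b * x_bar
    return a, b, sp_xy, ss_x


# Hypothetical data, chosen only for illustration (not from the slides).
a, b, sp_xy, ss_x = fit_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(a, b, sp_xy, ss_x)
```

Because `ss_x` is a sum of squares, the sign of `b` always matches the sign of `sp_xy`.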

REGRESSION EQUATION FOR POINT ESTIMATE
If the number of hours of study is 4, what is the estimate of marks in the exam?
The ‘point estimate’ of y uses the regression equation: Y = a + b * x, evaluated at x = 4, gives 21.58.
The value of x for which you wish to estimate y should lie within the range of the given data (i.e. 3-10).
The reliability of the point estimate depends on:
the sample size;
the amount of variation within the sample;
the value of x.
Therefore, an ‘interval estimate’ is always better.
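The range restriction on x can be made explicit in code. The sketch below is an assumption-laden illustration: `point_estimate` is a hypothetical helper, and the coefficients and range are made up rather than taken from the study-hours example.

```python
def point_estimate(a, b, x, x_min, x_max):
    """Return a + b*x, refusing to extrapolate outside [x_min, x_max]."""
    if not (x_min <= x <= x_max):
        raise ValueError(
            f"x = {x} lies outside the fitted range [{x_min}, {x_max}]"
        )
    return a + b * x


# Hypothetical coefficients and data range (not the slide's values).
a, b = 2.2, 0.6
estimate = point_estimate(a, b, 4, x_min=1, x_max=5)
print(estimate)

# A request outside the observed range is rejected instead of extrapolated.
try:
    point_estimate(a, b, 12, x_min=1, x_max=5)
    extrapolation_blocked = False
except ValueError:
    extrapolation_blocked = True
print(extrapolation_blocked)
```

Raising an error is one possible design choice; a warning would also work where mild extrapolation is acceptable.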

STD ERROR OF ESTIMATE (Measure of Goodness of Fit) (Std Error of Regression)

ASSUMPTIONS (LINE)
1. All actual values of y for a given value of x are normally distributed around the estimated value of y (errors half negative and half positive).
2. The mean of each error component is zero (the mean of all the y’s for a given x equals the estimated y).
3. The variances of the error components (the variances of the y’s for the various x’s) are the same: homoscedasticity.
4. The errors are independent of each other.

Assumptions of the Simple Linear Regression Model
LINE assumptions of the simple linear regression model: Linear, Independent, Normal and Equal variance.
Identical normal distributions of errors, all centered on the regression line: $\mu_{y|x} = \alpha + \beta x$, with $y \sim N(\mu_{y|x}, \sigma_{y|x}^2)$.

Pictorial Presentation of Linear Regression Model
The number of man-hours Y is treated in a regression model as a random variable. For each lot size X, a probability distribution of Y is postulated. The figure shows the probability distributions for X = 30, X = 50 and X = 70. The actual number of man-hours Y is then viewed as a random selection from this probability distribution. The means of the probability distributions have a systematic relation to the level of X. This systematic relationship is called the regression function of Y on X, and its graph is called the regression curve. In this figure the regression function is linear, which implies that the mean number of man-hours varies linearly with lot size.

REPRESENTING STANDARD ERROR OF ESTIMATE
[Figure: regression line y = a + bx with bands at ±1, ±2 and ±3 $s_{y,x}$ on either side; vertical axis: dependent variable (Y); horizontal axis: independent variable (X).]

STANDARD ERROR OF ESTIMATE
In the hours-of-study example, the standard error of estimate is $\sqrt{2.884} = 1.698$ marks. What does it mean?

INTERPRETING STD ERROR OF ESTIMATE
We can expect to find:
68.26% of the points (y values) within ±1 $s_{y,x}$,
95.45% of the points (y values) within ±2 $s_{y,x}$,
99.7% of the points (y values) within ±3 $s_{y,x}$
of the estimated y (y hat).
The larger the standard error of estimate, the greater the scattering of points around the regression line. Conversely, if $s_{y,x} = 0$, the estimating equation would be a perfect estimator of the dependent variable.
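The standard error of estimate can be computed from the residuals. The slides' own formula did not survive in this transcript, so the sketch below assumes the usual definition, the square root of the sum of squared residuals divided by n - 2. The data and the helper name `standard_error_of_estimate` are hypothetical.

```python
import math


def standard_error_of_estimate(xs, ys):
    """Return (s_yx, residuals), assuming s_yx = sqrt(SSE / (n - 2))."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    a = y_bar - b * x_bar
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    sse = sum(r ** 2 for r in residuals)
    return math.sqrt(sse / (n - 2)), residuals


# Hypothetical data, chosen only for illustration (not from the slides).
s_yx, residuals = standard_error_of_estimate([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])

# Count how many points fall within 1 and 2 standard errors of the line.
within_1 = sum(abs(r) <= s_yx for r in residuals)
within_2 = sum(abs(r) <= 2 * s_yx for r in residuals)

print(s_yx, within_1, within_2)
```

With only five points the observed shares will not match the 68.26% and 95.45% figures closely; those percentages describe normally distributed errors in the long run.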

INTERVAL ESTIMATION
An interval estimate of y for a given x value (for a given level of significance and sample size) is a range from a lower limit to an upper limit.
The accuracy of this interval estimate depends on the distance of x from its mean (x bar): the closer x is to x bar, the more reliable the estimate.
Hence, for x values other than x bar, a correction factor is used.

CONFIDENCE INTERVAL FOR ESTIMATION OF MEAN
The confidence interval for the mean value of y (using the correction factor for a given x) runs from a lower limit to an upper limit.

PREDICTION INTERVAL ESTIMATION OF INDIVIDUAL Y VALUE
The interval for an individual value of y (and not the mean value of y) runs from a lower limit to an upper limit.
Therefore, the interval for an individual y is wider than the interval for the mean of y.
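The interval formulas themselves are missing from this transcript. As a sketch only, the code below assumes the standard textbook forms: the half-width for the mean of y is t * s_yx * sqrt(1/n + (x0 - x_bar)^2 / SSx), and the half-width for an individual y adds 1 under the square root. The data, the helper name `interval_estimates`, and the critical value passed in are all illustrative assumptions.

```python
import math


def interval_estimates(xs, ys, x0, t_crit):
    """Return (y_hat, mean_interval, prediction_interval) at x0.

    t_crit is the two-sided t critical value for n - 2 degrees of
    freedom, supplied by the caller (for example from a t table).
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ss_x
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s_yx = math.sqrt(sse / (n - 2))

    y_hat = a + b * x0
    # Correction factor grows with the distance of x0 from x_bar.
    correction = 1 / n + (x0 - x_bar) ** 2 / ss_x
    half_mean = t_crit * s_yx * math.sqrt(correction)
    half_pred = t_crit * s_yx * math.sqrt(1 + correction)
    return (
        y_hat,
        (y_hat - half_mean, y_hat + half_mean),
        (y_hat - half_pred, y_hat + half_pred),
    )


# Hypothetical data; 3.182 is the 95% two-sided t value for 3 degrees of freedom.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
y_hat, mean_ci, pred_int = interval_estimates(xs, ys, x0=4, t_crit=3.182)
print(y_hat, mean_ci, pred_int)
```

Both intervals are centered on the point estimate, and the interval for an individual y is always the wider of the two.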

Confidence Interval for the Average Value of Y
[Figure: confidence band for the mean of Y around the regression line.]

Confidence Interval for the Average Value of Y and Prediction Interval for the Individual Value of Y
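[Figure: confidence band for the mean of Y and the wider prediction band for an individual Y, around the regression line.]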

AN ILLUSTRATION: LRCA Qn
A study was conducted by the Air Force on the effect of sleep deprivation on air traffic controllers’ performance whilst on watch. The sample data has two columns: No of hrs w/o sleep and No of errors.
Estimate the number of errors if the number of hours without sleep were 10, at a 95% confidence level.

CORRELATION ANALYSIS
How strong is the relationship between the dependent and independent variables? How are the variables correlated?
Correlation analysis is a statistical tool to describe the degree to which one variable is linearly related to another.
Measures for describing the correlation between two variables:
- Coefficient of determination, $r^2$
- Coefficient of correlation, r

COEFFICIENT OF DETERMINATION
Measures the extent or strength of association. It is the percentage of the variation in the dependent variable (y) that is explained.
Coefficient of determination = (Total variation - Unexplained variation) / Total variation, i.e. $r^2 = \frac{SST - SSE}{SST}$.
For the ATC case (number of errors and going without sleep), the working has the form (… - 17.3) / … and gives $r^2 = 0.64$. What does it mean? It means 64% of the variation in errors is explained by lack of sleep; the balance could be due to other factors such as poor training.
Regression sum of squares: SSR = SST - SSE. SSR is the explained variation.

COEFFICIENT OF DETERMINATION
Coefficient of determination: $r^2 = \frac{SSR}{SST}$, where the regression sum of squares SSR = SST - SSE is the explained variation.
$r^2 = 0$ if $\hat{y} = \bar{y}$ for all values of x, showing no correlation.
$r^2 = 1$ if $\hat{y} = y$ for all values of x, showing perfect correlation.

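CORRELATION ANALYSIS: INTERPRETING $r^2$ ANOTHER WAY
Interpret the coefficient of determination by looking at the amount of the variation in y that can be explained by the regression line.
Total variation = Explained variation + Unexplained variation. For each point, $(y - \bar{y}) = (\hat{y} - \bar{y}) + (y - \hat{y})$.
[Figure: for a single point, the total deviation $(y - \bar{y})$ split into the explained part $(\hat{y} - \bar{y})$ and the unexplained part $(y - \hat{y})$, plotted against x.]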
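The split of total variation into explained and unexplained parts can be confirmed numerically. The sketch below uses hypothetical data and an illustrative helper, `variation_breakdown`, with SST, SSR and SSE computed as sums of squared deviations.

```python
def variation_breakdown(xs, ys):
    """Return (sst, ssr, sse) for the least-squares line through the data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    a = y_bar - b * x_bar
    fitted = [a + b * x for x in xs]

    sst = sum((y - y_bar) ** 2 for y in ys)                 # total variation
    ssr = sum((yh - y_bar) ** 2 for yh in fitted)           # explained variation
    sse = sum((y - yh) ** 2 for y, yh in zip(ys, fitted))   # unexplained variation
    return sst, ssr, sse


# Hypothetical data, chosen only for illustration (not from the slides).
sst, ssr, sse = variation_breakdown([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
r_squared = (sst - sse) / sst
print(sst, ssr, sse, r_squared)
```

For this toy data the regression line explains 60% of the variation in y.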

CORRELATION ANALYSIS: THE COEFFICIENT OF CORRELATION, r
$r = \pm\sqrt{r^2}$
Measures the strength of the relationship, i.e. how strongly the variables are related.
Multiple r = 0.8 in the ATC case (errors and hours without sleep) means a very strong relationship between the two variables.
The sign of r follows the sign of the slope (b) of the regression line; a negative sign indicates an inverse relationship between the two variables.

CORRELATION COEFFICIENT (r)
PROPERTIES OF SAMPLE CORRELATION COEFFICIENT (r)
Ranges from -1 to +1.
The sign of r tells whether the relationship is positive or negative.
A larger absolute value of r indicates a stronger relationship.
An r value near zero indicates no or a poor linear relationship between x and y.
r = +1 or -1 indicates a perfect linear relationship.
r values of exactly 0, 1 or -1 are rare in practice.
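These properties can be seen on small examples. The sketch below assumes the usual product-moment formula, r = SPxy / sqrt(SSx * SSy), and three hypothetical data sets: one rising, one falling and one perfectly linear. The helper name `correlation` is illustrative.

```python
import math


def correlation(xs, ys):
    """Return the sample correlation coefficient r = SPxy / sqrt(SSx * SSy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sp_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_y = sum((y - y_bar) ** 2 for y in ys)
    return sp_xy / math.sqrt(ss_x * ss_y)


# Hypothetical data sets, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
r_up = correlation(xs, [2, 4, 5, 4, 5])         # positive slope, positive r
r_down = correlation(xs, [5, 4, 5, 4, 2])       # negative slope, negative r
r_perfect = correlation(xs, [3, 5, 7, 9, 11])   # exact straight line

print(r_up, r_down, r_perfect)
```

The sign of r comes from the sum of products in the numerator, the same quantity that fixes the sign of the slope b.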
