
1 General Announcement (01.07.2004)
All course slides, solutions to quizzes, solutions to assignments 1 and 3, and practice questions on probability have been uploaded to the Intranet. All quizzes and assignments have been corrected. The corrected documents can be seen by approaching Mr. Parmanand Bhuye in the Secretarial section on the ground floor (in front of reception). You can report any totalling mistakes. The mark sheet for the above will be put on the notice board by this evening.

2 QUANTITATIVE METHODS 1 SAMIR K. SRIVASTAVA

3 Correlation and Regression
Univariate vs. bivariate (multivariate) data
More than one attribute for each member of the population. Examples:
Height and Weight
Absenteeism and Production
Advertising Expenditure and Sales Volume
Unemployment and Crime Rate
Rainfall and Food Production
Web Site Visitor Profile
11/14/2018 Quantitative Methods 1

4 Correlation and Regression
Are the two attributes related to each other?
Can we use one to predict the other?
Can we change one to control the other?
Predictor variable and response variable
The relationship may be positive or negative (or nonexistent), and weak or strong.
Two variables are said to be correlated if the value of one is indicative of the value of the other.

5 Organizing Bivariate Data Scatter Plots
[Figure: scatter plots of Y against X, with panels labelled Negatively Correlated, Positively Correlated, Loosely Correlated, Strongly Correlated, and Not Correlated]

6 Measuring the Strength of Correlation
Can we define a quantitative measure of the strength of correlation? Covariance is such a measure. It looks similar to variance, but it can be positive as well as negative. When will it have a positive vs. negative value? A high vs. low value?

7 Measuring the Strength of Correlation

8 Coefficient of Correlation
Suppose we wish to measure the strength of correlation on a scale of 0 to 1. Is covariance an appropriate measure? What if we multiply all X values by a constant? The measure should not be affected by a change of scale.
Coefficient of correlation: $r = \sigma_{xy} / (\sigma_x \sigma_y)$
The value of r lies between -1 and 1. Values close to 0 indicate little or no correlation. Values close to 1 or -1 indicate a very strong correlation.
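The scale argument can be checked numerically. The following is a minimal sketch in Python, using hypothetical data that is not from the slides: multiplying every X value by 100 multiplies the covariance by 100 but leaves the coefficient of correlation unchanged. The function names and the n - 1 divisor are assumptions made for the illustration.

```python
import math

def covariance(xs, ys):
    """Sample covariance with the n - 1 divisor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Coefficient of correlation r = cov(x, y) / (s_x * s_y)."""
    s_x = math.sqrt(covariance(xs, xs))
    s_y = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (s_x * s_y)

# Hypothetical data, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
xs_scaled = [100.0 * x for x in xs]  # e.g. metres converted to centimetres

cov_raw = covariance(xs, ys)
cov_scaled = covariance(xs_scaled, ys)
r_raw = correlation(xs, ys)
r_scaled = correlation(xs_scaled, ys)

print(f"covariance:  {cov_raw:.4f} -> {cov_scaled:.4f}")
print(f"correlation: {r_raw:.4f} -> {r_scaled:.4f}")
```

The covariance depends on the units of measurement, which is why it cannot serve as a fixed-scale measure of strength, while r is unit-free.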

9 Illustration

   x       y     x - x̄    y - ȳ    (x - x̄)²   (y - ȳ)²   (x - x̄)(y - ȳ)
  1.25    125    -0.90      45      0.8100      2025        -40.50
  1.75    105    -0.40      25      0.1600       625        -10.00
  2.25     65     0.10     -15      0.0100       225         -1.50
  2.00     85    -0.15       5      0.0225        25         -0.75
  2.50     75     0.35      -5      0.1225        25         -1.75
  2.25     80     0.10       0      0.0100         0          0.00
  2.70     50     0.55     -30      0.3025       900        -16.50
  2.50     55     0.35     -25      0.1225       625         -8.75
 -----   -----                      ------      -----       ------
 17.20    640                       1.5600      4450        -79.75
                                    (Sxx)       (Syy)       (Sxy)

Means: x̄ = 2.15, ȳ = 80
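The illustration's totals lead directly to the coefficient of correlation, r = Sxy / sqrt(Sxx · Syy). The sketch below recomputes them in Python. The eight (x, y) pairs are an assumption: they are the values implied by the slide's deviations and column totals, since repeated entries are not all visible in the transcript.

```python
import math

# (x, y) pairs as implied by the slide's deviations and totals (an assumption).
xs = [1.25, 1.75, 2.25, 2.00, 2.50, 2.25, 2.70, 2.50]
ys = [125, 105, 65, 85, 75, 80, 50, 55]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Sums of squares and cross-products about the means.
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

# Coefficient of correlation from the three totals.
r = s_xy / math.sqrt(s_xx * s_yy)

print(f"Sxx = {s_xx:.2f}, Syy = {s_yy:.0f}, Sxy = {s_xy:.2f}, r = {r:.3f}")
```

With these values r comes out at roughly -0.96, a strong negative correlation.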

10 Correlation and Causation
Is there a causal relationship between the two variables?
Rainfall and food production
Absenteeism and production
Advertising expenditure and sales
Strange, spurious or “nonsense” correlations:
Teachers’ salaries and liquor sales
Divorce rate and death rate (negative correlation)

11 Correlation and Causation
Spurious correlation is due to a third “lurking variable”:
Economic growth → higher salaries, higher liquor consumption
Age → older couples have fewer divorces, but a higher death rate
To establish causation between variables, establish:
Consistency (relationship true in a variety of contexts)
Responsiveness (change in one precedes change in other)
Mechanism (manner in which change in X changes Y)

12 Regression
Francis Galton introduced the term in 1877: height of children → mean of population
Predicting one variable from another
Relationships of association, not causal
Relating variables mathematically: linear or non-linear
Bivariate linear: between two variables

13 Bivariate Regression Assumptions
Assumptions for bivariate regression:
1. Random sample. Ideally N > 20, but different rules of thumb exist (10, 30, etc.).
2. Variables are linearly related, i.e., the mean of Y increases (or decreases) linearly with X. Check the scatter plot for a general linear trend and watch out for non-linear relationships (e.g., U-shaped).

14 Bivariate Regression Assumptions
3. Y is normally distributed for every outcome of X in the population (“conditional normality”).
Example: Years of Education = X, Job Prestige = Y. Suppose we look only at a sub-sample: X = 12 years of education. Is a histogram of Job Prestige approximately normal? What about for people with X = 4? X = 16? If all are roughly normal, the assumption is met.

15 Two Possible Regressions

16 Simple Linear Regression: An Example
For a sample of 8 employees, a personnel director has collected the following data on ownership of company stock, y, versus years with the firm, x.
[Table: x, y]
(a) Determine the least squares regression line and interpret its slope.
(b) For an employee who has been with the firm 10 years, what is the predicted number of shares of stock owned?

17 An Example, cont.
[Table: columns x, y, x•y, x², with Mean and Sum rows]

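18 An Example, cont.
Slope: $b_1 = \dfrac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2} = \dfrac{41238 - 8(10.5)(451.25)}{968 - 8(10.5)^2} = 38.7558$
y-Intercept: $b_0 = \bar{y} - b_1\bar{x} = 451.25 - 38.7558(10.5) = 44.3140$
So the “best-fit” linear model, rounding to the nearest tenth, is: $\hat{y} = 44.3 + 38.8x$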

19 An Example, cont.
Interpretation of the slope: For every additional year an employee works for the firm, the employee acquires an estimated 38.8 shares of stock.
If x1 = 10, the point estimate for the number of shares of stock that this employee owns is: $\hat{y} = 44.314 + 38.7558(10) = 431.872 \approx 432$ shares
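The arithmetic of this example can be reproduced from the summary figures alone. The sketch below, in Python, assumes only the values quoted in the slope calculation (n = 8, x̄ = 10.5, ȳ = 451.25, Σxy = 41238, Σx² = 968); the variable names and the `predict` helper are hypothetical.

```python
# Summary statistics quoted in the worked example.
n = 8
x_bar = 10.5       # mean years with the firm
y_bar = 451.25     # mean shares owned
sum_xy = 41238.0   # sum of x_i * y_i
sum_x2 = 968.0     # sum of x_i squared

# Least-squares slope and intercept from the summary statistics.
b1 = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)
b0 = y_bar - b1 * x_bar

def predict(years):
    """Point estimate of shares owned after the given years with the firm."""
    return b0 + b1 * years

shares = predict(10)
print(f"b1 = {b1:.4f}, b0 = {b0:.4f}, prediction at x = 10: {shares:.3f}")
```

The prediction rounds to 432 shares, matching the slide.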

20 Using the Regression Equation
Before using the regression model, we need to assess how well it fits the data. If we are satisfied with how well the model fits the data, we can use it to make predictions for y.
Illustration: Predict the selling price of a three-year-old car with 40,000 km on the odometer.

21 Bivariate Regression Assumptions
Examine sub-samples at different values of X. Make histograms and check for normality.
[Figure: two example histograms, labelled Good and Not very good]

22 Estimating the Coefficients
The estimates are determined by drawing a sample from the population of interest, calculating sample statistics, and producing a straight line that cuts into the data.
[Figure: scatter plot of y against x]
The question is: which straight line fits best?

23 Ordinary Least Squares
1. “Best fit” means the differences between actual values ($Y_i$) and predicted values ($\hat{Y}_i$) are a minimum. But positive differences offset negative ones.
2. OLS minimizes the sum of the squared differences (or errors).

24 Least Squares Method
The best line is the one that minimizes the sum of squared vertical differences between the points and the line.
Let us compare two lines for the points (1,2), (2,4), (3,1.5), (4,3.2). The second line is horizontal, at y = 2.5.
First line: sum of squared differences = (2 - 1)² + (4 - 2)² + (1.5 - 3)² + (3.2 - 4)² = 7.89
Second line: sum of squared differences = (2 - 2.5)² + (4 - 2.5)² + (1.5 - 2.5)² + (3.2 - 2.5)² = 3.99
The smaller the sum of squared differences, the better the fit of the line to the data.
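The comparison of the two lines can be sketched in Python as below. The four points are those on the slide. Taking the first line to be y = x is an assumption, based on its two visible terms (2 - 1)² and (4 - 2)²; the second is the horizontal line y = 2.5. Evaluated directly, the sums are 7.89 for the first line and 3.99 for the second.

```python
points = [(1, 2), (2, 4), (3, 1.5), (4, 3.2)]

def sum_squared_differences(line, data):
    """Sum of squared vertical distances between the data and a line."""
    return sum((y - line(x)) ** 2 for x, y in data)

def first_line(x):
    # Assumed to be y = x, matching the terms shown on the slide.
    return x

def second_line(x):
    # Horizontal line at y = 2.5.
    return 2.5

sse_first = sum_squared_differences(first_line, points)
sse_second = sum_squared_differences(second_line, points)

print(f"first line:  {sse_first:.2f}")
print(f"second line: {sse_second:.2f}")
```

Of these two candidates the horizontal line fits better, although neither is necessarily the least-squares line itself.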

25 Assumptions of OLS regression
Model is linear in parameters
The residuals are normally distributed
The residuals have constant variance
The expected value of the residuals is always zero
The residuals are independent from one another
The X values are precise
The independent variables are not too strongly collinear
If these assumptions are satisfied, then the OLS estimator is unbiased and has minimum variance of all unbiased estimators.
How can we test these assumptions? If assumptions are violated, what does this do to our conclusions? How do we fix the problem?

26 The Model
The first-order linear model, or simple regression model, is $y = \beta_0 + \beta_1 x + \varepsilon$, where
y = dependent variable
x = independent variable
$\beta_0$ = y-intercept
$\beta_1$ = slope of the line (rise/run)
$\varepsilon$ = error variable
$\beta_0$ and $\beta_1$ are unknown and are therefore estimated from the data.

27 Least Squares Method To calculate the estimates of the coefficients that minimize the differences between the data points and the line, use the formulas:

28 Least Squares Method Now we define

29 Least Squares Method Then
The estimated simple linear regression equation that estimates the equation of the first order linear model is:
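The formulas referred to on the three slides above can be sketched in the standard least-squares form, using the Sxx and Sxy notation of the earlier illustration. This is a reconstruction from the textbook result, not the slides' own typesetting. It agrees with the slope formula used in the worked example on company stock.

```latex
\begin{align*}
S_{xx} &= \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2, \\
S_{xy} &= \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}, \\
b_1 &= \frac{S_{xy}}{S_{xx}}, \qquad b_0 = \bar{y} - b_1 \bar{x}, \\
\hat{y} &= b_0 + b_1 x .
\end{align*}
```

Here $b_0$ and $b_1$ are the sample estimates of the intercept and slope, and $\hat{y}$ is the fitted value for a given x.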

30 Error Variable: Required Conditions
The error $\varepsilon$ is a critical part of the regression model. Five requirements involving the distribution of $\varepsilon$ must be satisfied:
The mean of $\varepsilon$ is zero: $E(\varepsilon) = 0$.
The standard deviation of $\varepsilon$ is a constant ($\sigma_\varepsilon$) for all values of x.
The errors are independent.
The errors are independent of the independent variable x.
The probability distribution of $\varepsilon$ is normal.

31 Standard error of estimate
If $\sigma_\varepsilon$ is small, the errors tend to be close to zero (close to the mean error), and the model fits the data well. Therefore, we can use $\sigma_\varepsilon$ as a measure of the suitability of using a linear model. An unbiased estimator of $\sigma_\varepsilon^2$ is given by $s_\varepsilon^2 = \mathrm{SSE}/(n - 2)$, where SSE is the sum of squared errors.

32 Assessing the Model
The least squares method will produce a regression line whether or not there is a linear relationship between x and y. Consequently, it is important to assess how well the linear model fits the data. Several methods are used to assess the model:
Testing and/or estimating the coefficients.
Using descriptive measurements.

33 Outliers
An outlier is an observation that is unusually small or large. Several possibilities need to be investigated when an outlier is observed:
There was an error in recording the value.
The point does not belong in the sample.
The observation is valid.
Identify outliers from the scatter diagram and remove them.

34 Practice Problem
A car dealer wants to find the relationship between the odometer reading and the selling price of used cars. A random sample of 100 cars is selected, and the data recorded. Find the regression line.
[Table: odometer reading (independent variable x) and selling price (dependent variable y)]

35 Solution We need to calculate several statistics first; where n = 100.

36 Coefficient of Determination
A measure of the strength of the linear relationship between x and y.
The larger the value of $r^2$, the more the value of y depends in a linear way on the value of x.
It is the amount of variation in y that is related to variation in x: the ratio of the variation in y explained by the regression model to the total variation in y.
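As a minimal sketch of this ratio, the Python below fits a least-squares line to the four points used in the earlier comparison of two lines, (1,2), (2,4), (3,1.5), (4,3.2). It computes the coefficient of determination as 1 - SSE/SST and checks that it equals the square of the coefficient of correlation. The variable names are assumptions made for the illustration.

```python
import math

points = [(1, 2), (2, 4), (3, 1.5), (4, 3.2)]
xs = [x for x, _ in points]
ys = [y for _, y in points]

n = len(points)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)   # total variation in y (SST)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in points)

# Least-squares estimates of slope and intercept.
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Unexplained variation: squared residuals about the fitted line.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in points)

r_squared = 1 - sse / s_yy              # share of variation explained
r = s_xy / math.sqrt(s_xx * s_yy)       # coefficient of correlation

print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, r^2 = {r_squared:.4f}")
```

For these four points the fitted line explains under 2% of the variation in y, so a linear model is a poor description of them.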

37 Coefficient of determination
To understand the significance of this coefficient, note: the overall variability in y is explained in part by the regression model; the rest remains unexplained (the error).

38 Coefficient of determination
When we want to measure the strength of the linear relationship, we use the coefficient of determination.

39 Conclusion
Used scatter diagrams to visualize the relationship between two variables
Learnt the use of correlation analysis
Described the linear regression model
Explained the ordinary least-squares method for generating the regression equation
Learnt the limitations of regression and correlation analysis

40 Thank You!

