SESSION Last Update 17th June 2011 Regression
Lecturer: Florian Boehlandt
University: University of Stellenbosch Business School
Domain: analysis.net/pages/vega.php
Learning Objectives
1. XY-Scatter Diagrams
2. Plotting the Regression Line
3. Coefficient Estimates
4. Pearson Coefficient of Correlation
5. Spearman Rank Correlation Coefficient
XY-Scatter Diagram To draw a scatter diagram we need data for two variables. In applications where one variable depends to some degree on the other variable, the dependent variable is labeled Y and the other, called the independent variable, X. The values for X and Y are combined into a single data point using the observations for X and Y as coordinates.
Example Temperature - Truck [Table: Obs, Temp (x), Trucks (y)]
Regression Analysis Regression analysis is used to predict the value of one variable on the basis of one or more other variables. The first-order linear model describes the relationship between the dependent variable Y and the independent variable(s) X. The regression model with a as the y-intercept and b as the slope coefficient is of the form: y = a + bx
Example Temperature - Truck [Table: Obs, Temp (x), Trucks (y)] The estimators of the intercept a and slope coefficient b are based on drawing a straight line through the sample data.
Intercept and Slope The intercept a is the y-coordinate of the point where the linear function intersects the y-axis. The slope coefficient b is defined as the change in y for a unit change in x.
Fitted Line With Residuals The line drawn through the points is called the regression line.
Residuals Squared The regression line, or least squares line, is the line that minimizes the sum of the squared differences (residuals) between the points and the line.
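The least squares criterion can be checked numerically. The sketch below uses small hypothetical data (not the lecture's temperature-truck sample) and a helper name, sum_squared_residuals, chosen for illustration. It compares the sum of squared residuals for the least squares line of that data against a few nearby candidate lines.

```python
def sum_squared_residuals(xs, ys, a, b):
    """Sum of squared vertical distances between the points and the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical illustration data (assumption, not from the lecture).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.1, 5.9, 8.0]

# Least squares coefficients for this particular data set, worked out by hand.
a_ls, b_ls = 0.05, 1.98

sse_least_squares = sum_squared_residuals(xs, ys, a_ls, b_ls)

# Other candidate lines through the same points, as (intercept, slope) pairs.
candidates = [(0.0, 2.0), (0.05, 1.90), (0.20, 1.98), (-0.10, 2.05)]
for a, b in candidates:
    sse = sum_squared_residuals(xs, ys, a, b)
    print(f"a={a:5.2f} b={b:5.2f} SSE={sse:.4f} (least squares SSE={sse_least_squares:.4f})")
```

Every candidate line gives a larger sum of squared residuals than the least squares line, which is what "minimizes" means here.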
Calculating Coefficients Raw Data (y-variable as dependent and x as independent variable): [Table: Obs, Temp (x), Trucks (y)]
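Solution [Table: Obs, Temp (x), Trucks (y), xy, x^2, with column totals] Step 1: Calculate the gradient (beta), i.e. the slope coefficient b, from the column totals: b = (n Σxy - Σx Σy) / (n Σx^2 - (Σx)^2)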
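Solution [Table: Obs, Temp (x), Trucks (y), xy, x^2, with column totals] Step 2: Calculate the intercept (alpha), i.e. the intercept a, using the slope b from Step 1: a = (Σy - b Σx) / n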
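The two steps can be sketched in code. This is a minimal illustration, assuming the standard least squares formulas built from the column totals Σx, Σy, Σxy and Σx^2. The function name and the data are hypothetical and are not the lecture's temperature-truck values.

```python
def least_squares_coefficients(xs, ys):
    """Return (a, b) for the line y = a + b*x fitted by least squares."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples with at least 2 observations")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # total of the xy column
    sum_x2 = sum(x * x for x in xs)               # total of the x^2 column

    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x values are identical; the slope is undefined")

    b = (n * sum_xy - sum_x * sum_y) / denominator   # Step 1: gradient
    a = (sum_y - b * sum_x) / n                      # Step 2: intercept
    return a, b


# Hypothetical data lying exactly on y = 1 + 2x (assumption for illustration).
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
a, b = least_squares_coefficients(xs, ys)
print(a, b)
```

Because the sample points lie exactly on a line, the function recovers that line's intercept and slope.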
Interpreting the Coefficients The slope coefficient b may be interpreted as the change in the dependent variable y for a one-unit change in x. In the previous example, a one-unit increase in temperature results in b additional truckloads of cool drinks sold. The intercept a is the point at which the regression line and the y-axis intersect. If x = 0 lies far outside the range of sample values of x, the interpretation of the intercept is not straightforward. In the temperature-truck example, x = 0 lies outside the range between the smallest and largest values of x in the sample. Interpreting the intercept literally would imply that at a temperature of x = 0, soft-drink sales decline to negative 3.914 truckloads!
Point Prediction Upon obtaining the coefficient estimates, we can predict the outcome for various x (point prediction) between the minimum and maximum sample observations using the regression function y = a + bx. For example:
x = 16 degrees: y = a + b*16 ≈ 7 truckloads
x = 32 degrees: y = a + b*32 ≈ 17 truckloads
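A point prediction is a single evaluation of the regression function, restricted to the sampled range of x. The sketch below makes that restriction explicit. The function name, the coefficients and the range limits are hypothetical placeholders, not the lecture's estimates.

```python
def predict(x, a, b, x_min, x_max):
    """Point prediction y = a + b*x, allowed only inside the sampled range of x."""
    if not (x_min <= x <= x_max):
        raise ValueError(
            f"x = {x} lies outside the sample range [{x_min}, {x_max}]; "
            "the fitted line should not be extrapolated"
        )
    return a + b * x


# Hypothetical coefficients and sample range (assumptions for illustration).
a, b = 1.0, 2.0
x_min, x_max = 1.0, 5.0

print(predict(3.0, a, b, x_min, x_max))

try:
    predict(10.0, a, b, x_min, x_max)
except ValueError as error:
    print("refused:", error)
```

Raising an error outside the range is one possible design choice. It mirrors the caution about interpreting the intercept when x = 0 is far from the observed data.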
Pearson Coefficient of Correlation The Pearson coefficient of correlation R may be used to test whether a linear relationship exists between y and x. Note that variables may be positively or negatively correlated. R = 1 denotes perfect positive correlation, R = -1 signifies perfect negative correlation. R is defined for: -1 ≤ R ≤ +1
Type of Relationship
DIRECT LINEAR RELATIONSHIP (small or wide dispersion): Positive linear correlation exists, 0 < r < +1
INVERSE LINEAR RELATIONSHIP (small or wide dispersion): Negative linear correlation exists, -1 < r < 0
NO LINEAR RELATIONSHIP: No correlation, r = 0
Coefficient of Determination Squaring the Pearson coefficient of correlation yields the coefficient of determination R^2 in regression. It may be interpreted as the proportion of variation in the dependent variable y that is explained by the variation in the explanatory variable x. R^2 is a measure of the strength of the linear relationship between y and x.
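Solution Step 3: Calculate R and R^2 [Table: Obs, Temp (x), Trucks (y), xy, x^2, y^2, with column totals] Using the column totals: R = (n Σxy - Σx Σy) / sqrt[(n Σx^2 - (Σx)^2)(n Σy^2 - (Σy)^2)], and R^2 is the square of R.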
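Step 3 can be sketched the same way as the coefficient calculation, assuming the standard computational formula for the Pearson coefficient based on the totals of the xy, x^2 and y^2 columns. The function name and the data are hypothetical and are not the lecture's sample.

```python
import math


def pearson_r(xs, ys):
    """Pearson coefficient of correlation from the column totals."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples with at least 2 observations")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("one variable is constant; R is undefined")
    return numerator / denominator


# Hypothetical data (assumption for illustration).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
r_squared = r ** 2  # coefficient of determination
print(round(r, 4), round(r_squared, 4))
```

For this data R is about 0.77 and R^2 is 0.6, so roughly 60% of the variation in y is explained by the variation in x.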
Spearman Rank Correlation The standard coefficient of correlation allows for determining whether there is evidence of a linear relationship between two interval variables. Where the variables are ordinal, or where both variables are interval but the normality requirement is not satisfied, a nonparametric statistic called the Spearman Rank Correlation Coefficient may be used instead.
Objective: Comparing 2 Variables
Analyzing the relationship between two variables. Data type?
Nominal: Chi-Square test of a contingency table
Ordinal: Spearman Rank Correlation
Interval: Population Distribution?
  Error is normal, or x and y bivariate normal: Simple linear regression
  x and y not bivariate normal: Spearman Rank Correlation
Example Ranking
Below is a list of organizational strengths that were independently ranked by management and staff. The managing director wished to know how closely correlated the assessments were:
Business Aspect          Management  Staff
Brand Equity             1           1
Financial Controls       2           3
Customer Service         3           2
Planning Systems         4           6
Research & Development   5           4
Company Morale           6           7
Productivity             7           5
Calculating R_S
Ranking (d = Management rank - Staff rank)
Business Aspect          Obs  Management  Staff  d    d^2
Brand Equity             1    1           1      0    0
Financial Controls       2    2           3      -1   1
Customer Service         3    3           2      1    1
Planning Systems         4    4           6      -2   4
Research & Development   5    5           4      1    1
Company Morale           6    6           7      -1   1
Productivity             7    7           5      2    4
Total                                                 12
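The ranking table can be turned into a coefficient with a short sketch. It assumes the usual formula for rankings without ties, R_S = 1 - 6 Σd^2 / (n (n^2 - 1)), which is not shown on the slide itself. The ranks are those from the management-staff example and the function name is chosen for illustration.

```python
def spearman_rank_correlation(ranks_a, ranks_b):
    """Spearman coefficient for two rankings without ties."""
    if len(ranks_a) != len(ranks_b) or len(ranks_a) < 2:
        raise ValueError("need two equally long rankings with at least 2 items")
    n = len(ranks_a)
    # d is the rank difference per item; sum_d2 is the total of the d^2 column.
    sum_d2 = sum((ra - rb) ** 2 for ra, rb in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Ranks from the example, in table order from Brand Equity to Productivity.
management = [1, 2, 3, 4, 5, 6, 7]
staff = [1, 3, 2, 6, 4, 7, 5]

r_s = spearman_rank_correlation(management, staff)
print(round(r_s, 4))  # 0.7857
```

With Σd^2 = 12 and n = 7 this gives 1 - 72/336, or about 0.79, indicating that the management and staff assessments are fairly closely aligned.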