
1 Simple Linear Regression (SLR)
CHE1147 Saed Sayad University of Toronto

2 Types of Correlation
Positive correlation, negative correlation, and no correlation.

3 Simple linear regression describes the linear relationship between a predictor (independent) variable X, plotted on the x-axis, and a response (dependent) variable Y, plotted on the y-axis.

4 [Figure: Y plotted against X]

5 [Figure: Y plotted against X]

6 [Figure: Y plotted against X]

7 [Figure: Y plotted against X, with residuals ε marked as vertical distances from the fitted line]

8 Fitting data to a linear model
Observations are measured in a bivariate way, as pairs $(x_i, y_i)$. The linear model is $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, with intercept $\beta_0$, slope $\beta_1$, and residuals $\varepsilon_i$.

9 How to fit data to a linear model?
The Ordinary Least Squares (OLS) method.

10 Least Squares Regression
Model line: $\hat{y} = b_0 + b_1 x$. Residual: $\varepsilon_i = y_i - \hat{y}_i$. Sum of squares of residuals: $SS_{res} = \sum_i (y_i - b_0 - b_1 x_i)^2$. We must find the values of $b_0$ and $b_1$ that minimise $SS_{res}$.

11 Regression Coefficients
The least-squares estimates are $b_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}$ and $b_0 = \bar{y} - b_1 \bar{x}$.
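
A minimal Python sketch of these closed-form estimates (numpy only; the small data set is hypothetical, purely for illustration):

```python
import numpy as np

# Illustrative bivariate data (hypothetical, not from the slides)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Closed-form least-squares estimates of the slope and intercept
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

y_hat = b0 + b1 * x   # fitted values on the model line
resid = y - y_hat     # residuals (errors)

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
```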

12 Required Statistics

13 Descriptive Statistics

14 Regression Statistics

15 Variance to be explained by predictors (SST)
[Figure: distribution of Y, showing its total variance]

16 [Figure: Venn diagram of Y and X1 — variance of Y explained by X1 (SSR) and variance NOT explained by X1 (SSE)]

17 Regression Statistics
$SST = SSR + SSE$

18 Regression Statistics: Coefficient of Determination
$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$, used to judge the adequacy of the regression model.
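
A short sketch of this variance decomposition and $R^2$ on the same kind of toy data (names and values are illustrative assumptions):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean()) ** 2)   # total variance to be explained
sse = np.sum((y - y_hat) ** 2)      # variance NOT explained by the model
ssr = sst - sse                     # variance explained by the model

r2 = ssr / sst                      # coefficient of determination
print(f"R^2 = {r2:.4f}")
```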

19 Regression Statistics
Correlation measures the strength of the linear association between two variables.

20 Regression Statistics
Standard error for the regression model: $s = \sqrt{MSE} = \sqrt{\frac{SSE}{n-2}}$.

21 ANOVA to test significance of regression

Source       df      SS     MS         F           P-value
Regression   1       SSR    SSR / df   MSR / MSE   P(F)
Residual     n-2     SSE    SSE / df
Total        n-1     SST

If P(F) < α, then we get significantly better prediction of Y from the regression model than by just predicting the mean of Y.
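
A hedged sketch of this F-test using scipy's F distribution; the toy data are again illustrative:

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(y)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean()) ** 2)
sse = np.sum((y - y_hat) ** 2)
ssr = sst - sse

msr = ssr / 1           # regression mean square, df = 1
mse = sse / (n - 2)     # residual mean square, df = n - 2
F = msr / mse
p_value = stats.f.sf(F, 1, n - 2)   # upper-tail probability P(F)

print(f"F = {F:.2f}, p = {p_value:.4g}")
```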

22 Hypothesis Tests for Regression Coefficients
$H_0: \beta_1 = 0$ versus $H_1: \beta_1 \neq 0$; the test statistic is $t = b_1 / se(b_1)$ with $n-2$ degrees of freedom.

23 Hypothesis Tests for Regression Coefficients

24 Confidence Interval on Regression Coefficients
Confidence interval for $b_1$: $b_1 \pm t_{\alpha/2,\,n-2}\, se(b_1)$.
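
A sketch of the slope's confidence interval, assuming the usual standard-error formula $se(b_1) = s/\sqrt{S_{xx}}$ (toy data, illustrative only):

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(y)

sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)

s = np.sqrt(np.sum(resid ** 2) / (n - 2))  # standard error of the regression
se_b1 = s / np.sqrt(sxx)                   # standard error of the slope

t_crit = stats.t.ppf(0.975, df=n - 2)      # two-sided 95% critical value
lo, hi = b1 - t_crit * se_b1, b1 + t_crit * se_b1
print(f"95% CI for b1: ({lo:.3f}, {hi:.3f})")
```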

25 Hypothesis Tests on Regression Coefficients

26 Confidence Interval on Regression Coefficients
Confidence interval for the intercept: $b_0 \pm t_{\alpha/2,\,n-2}\, se(b_0)$.

27 Hypothesis Test on the Correlation Coefficient
$H_0: \rho = 0$ versus $H_1: \rho \neq 0$. We would reject the null hypothesis if $|t| = \left|\frac{r\sqrt{n-2}}{\sqrt{1-r^2}}\right| > t_{\alpha/2,\,n-2}$.
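
A sketch of this correlation test in Python (toy data; the 0.975 quantile corresponds to a two-sided test at α = 0.05):

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(y)

r = np.corrcoef(x, y)[0, 1]                  # sample correlation coefficient
t = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)   # test statistic for H0: rho = 0
t_crit = stats.t.ppf(0.975, df=n - 2)

print(f"r = {r:.4f}, |t| = {abs(t):.2f}, reject H0: {abs(t) > t_crit}")
```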

28 Diagnostic Tests For Regressions
Expected distribution of residuals for a linear model with a normal distribution of residuals (errors).

29 Diagnostic Tests For Regressions
Residuals for a non-linear fit

30 Diagnostic Tests For Regressions
Residuals for a quadratic function or polynomial

31 Diagnostic Tests For Regressions
Residuals are not homogeneous (increasing in variance)

32 Regression – important points
1. Ensure that the range of values sampled for the predictor variable is large enough to capture the full range of responses of the response variable.

33 [Figure: two scatter plots of Y vs. X, contrasting a narrow and a wide sampled range of X]

34 Regression – important points
2. Ensure that the distribution of predictor values is approximately uniform within the sampled range.

35 [Figure: two scatter plots of Y vs. X, contrasting an uneven and a uniform distribution of X values]

36 Assumptions of Regression
1. The linear model correctly describes the functional relationship between X and Y.

37 Assumptions of Regression
1. The linear model correctly describes the functional relationship between X and Y. [Figure: Y plotted against X]

38 Assumptions of Regression
2. The X variable is measured without error. [Figure: Y plotted against X]

39 Assumptions of Regression
3. For any given value of X, the sampled Y values are independent 4. Residuals (errors) are normally distributed. 5. Variances are constant along the regression line.

40 Multiple Linear Regression (MLR)

41 The linear model with a single predictor variable X can easily be extended to two or more predictor variables: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon$.

42 [Figure: Venn diagram of Y, X1, and X2 — unique variance explained by X1, unique variance explained by X2, common variance explained by X1 and X2, and variance NOT explained by X1 and X2]

43 A “good” model [Figure: Venn diagram of Y, X1, and X2]

44 Partial Regression Coefficients
The model is $y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_k x_k + \varepsilon$, with intercept $b_0$ and residuals $\varepsilon$. The partial regression coefficients (slopes) $b_i$ give the regression coefficient of $X_i$ after controlling for (holding all other predictors constant) the influence of the other variables on both $X_i$ and $Y$.

45 The matrix algebra of Ordinary Least Squares
Intercept and slopes: $b = (X'X)^{-1} X'y$. Predicted values: $\hat{y} = Xb$. Residuals: $e = y - \hat{y}$.
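
A minimal numpy sketch of the matrix-algebra fit; the two-predictor data set is hypothetical:

```python
import numpy as np

# Illustrative data with two predictors (not from the slides)
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = np.array([3.1, 3.9, 7.0, 7.8, 11.2, 11.0])

# Design matrix with a leading column of ones for the intercept
X = np.column_stack([np.ones_like(x1), x1, x2])

# b = (X'X)^(-1) X'y -- solve() is preferred over forming an explicit inverse
b = np.linalg.solve(X.T @ X, X.T @ y)

y_hat = X @ b    # predicted values
e = y - y_hat    # residuals

print("intercept and slopes:", np.round(b, 3))
```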

46 Regression Statistics
How good is our model?

47 Regression Statistics: Coefficient of Determination
$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$, used to judge the adequacy of the regression model.

48 Regression Statistics
Adjusted $R^2 = 1 - \frac{SSE/(n-k-1)}{SST/(n-1)}$, where n = sample size and k = number of independent variables. Adjusted $R^2$ is not biased!
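
A small numeric sketch of adjusted $R^2$; the values of n, k, SSE, and SST below are made up for illustration:

```python
# Illustrative quantities (hypothetical, not from the slides)
n, k = 30, 3            # sample size, number of independent variables
sse, sst = 40.0, 100.0  # unexplained and total sums of squares

r2 = 1 - sse / sst
r2_adj = 1 - (sse / (n - k - 1)) / (sst / (n - 1))  # penalises extra predictors

print(f"R^2 = {r2:.3f}, adjusted R^2 = {r2_adj:.3f}")
```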

49 Regression Statistics
Standard error for the regression model: $s = \sqrt{MSE} = \sqrt{\frac{SSE}{n-k-1}}$.

50 ANOVA to test significance of regression
$H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$ versus $H_1:$ at least one $\beta_i \neq 0$.

Source       df        SS     MS         F           P-value
Regression   k         SSR    SSR / df   MSR / MSE   P(F)
Residual     n-k-1     SSE    SSE / df
Total        n-1       SST

If P(F) < α, then we get significantly better prediction of Y from the regression model than by just predicting the mean of Y.

51 Hypothesis Tests for Regression Coefficients
$H_0: \beta_i = 0$ versus $H_1: \beta_i \neq 0$; the test statistic is $t = b_i / se(b_i)$ with $n-k-1$ degrees of freedom.

52 Hypothesis Tests for Regression Coefficients

53 Confidence Interval on Regression Coefficients
Confidence interval for $b_i$: $b_i \pm t_{\alpha/2,\,n-k-1}\, se(b_i)$.


58 Diagnostic Tests For Regressions
Expected distribution of residuals for a linear model with a normal distribution of residuals (errors).

59 Standardized Residuals
$d_i = \frac{e_i}{\sqrt{MSE}}$; standardized residuals far outside roughly ±3 flag potential outliers.
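
A sketch computing standardized residuals on toy data, assuming the $e_i/\sqrt{MSE}$ definition above:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(y)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
e = y - (b0 + b1 * x)           # ordinary residuals

mse = np.sum(e ** 2) / (n - 2)  # residual mean square
d = e / np.sqrt(mse)            # standardized residuals

print(np.round(d, 2))           # values beyond about +/-3 suggest outliers
```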

60 Model Selection
Avoiding predictors (Xs) that do not contribute significantly to model prediction.

61 Model Selection
- Forward selection: the ‘best’ predictor variables are entered, one by one (see the sketch below).
- Backward elimination: the ‘worst’ predictor variables are eliminated, one by one.
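
A highly simplified forward-selection sketch in Python. It uses reduction in SSE as the entry criterion, whereas the procedure on the following slides would typically use partial F-tests or p-values; the function names and stopping rule are illustrative assumptions:

```python
import numpy as np

def fit_sse(X, y):
    """SSE of an OLS fit of y on X (X already includes the intercept column)."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    return float(e @ e)

def forward_selection(predictors, y, min_improvement=1e-3):
    """Greedily add the predictor column that most reduces SSE at each step."""
    n = len(y)
    selected, remaining = [], list(range(predictors.shape[1]))
    X = np.ones((n, 1))                       # start from the intercept-only model
    best_sse = fit_sse(X, y)
    while remaining:
        trials = [(fit_sse(np.column_stack([X, predictors[:, j]]), y), j)
                  for j in remaining]
        sse, j = min(trials)                  # candidate with the smallest SSE
        if best_sse - sse < min_improvement:  # no worthwhile improvement left
            break
        X = np.column_stack([X, predictors[:, j]])
        selected.append(j)
        remaining.remove(j)
        best_sse = sse
    return selected

# Example (hypothetical names): columns of a 2-D predictor array
# chosen = forward_selection(np.column_stack([x1, x2, x3]), y)
```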

62 Forward Selection

63 Backward Elimination

64 Model Selection: The General Case
Compare a full model with p parameters to a reduced model that omits r of them. Reject $H_0$ (the omitted coefficients are all zero) if $F = \frac{(SSE_{reduced} - SSE_{full})/r}{MSE_{full}} > F_{\alpha,\,r,\,n-p}$.

65 Multicollinearity
The degree of correlation between the Xs. A high degree of multicollinearity produces unacceptable uncertainty (large variance) in regression coefficient estimates (i.e., large sampling variation): estimates of the slopes are imprecise, even the signs of the coefficients may be misleading, and t-tests may fail to reveal significant factors.

66 Scatter Plot

67 Multicollinearity
If the F-test for significance of regression is significant, but the tests on the individual regression coefficients are not, multicollinearity may be present. Variance Inflation Factors (VIFs) are very useful measures of multicollinearity; if any VIF exceeds 5, multicollinearity is a problem.
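
A sketch of VIF computation under the standard definition $VIF_j = 1/(1 - R_j^2)$, where $R_j^2$ comes from regressing $X_j$ on the remaining predictors; the helper name is illustrative:

```python
import numpy as np

def vif(X):
    """VIF for each column of predictor matrix X (no intercept column)."""
    n, k = X.shape
    out = []
    for j in range(k):
        # Regress X_j on the other predictors plus an intercept
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        b, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        e = X[:, j] - others @ b
        sst = np.sum((X[:, j] - X[:, j].mean()) ** 2)
        r2 = 1 - (e @ e) / sst
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

# Example (hypothetical names): vifs = vif(np.column_stack([x1, x2, x3]))
```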

68 Model Evaluation: Prediction Error Sum of Squares
$PRESS = \sum_i (y_i - \hat{y}_{(i)})^2$, where $\hat{y}_{(i)}$ is the prediction for observation i from a model fitted with observation i left out (leave-one-out).
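
A sketch using the standard shortcut that obtains PRESS from a single fit via the hat matrix, $PRESS = \sum_i (e_i/(1-h_{ii}))^2$, rather than refitting n times; the function name is illustrative:

```python
import numpy as np

def press(X, y):
    """Leave-one-out prediction error sum of squares from one OLS fit.

    X is the design matrix including the intercept column; uses the
    hat matrix H = X (X'X)^(-1) X' and its diagonal (the leverages).
    """
    H = X @ np.linalg.solve(X.T @ X, X.T)
    e = y - H @ y                  # ordinary residuals
    h = np.diag(H)                 # leverages h_ii
    return float(np.sum((e / (1 - h)) ** 2))
```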

69 Thank You!

