Regression Analysis [Simple Regression]
y = mx + b, or equivalently y = a + bx
where, in y = a + bx:
- y: dependent variable (its value depends on x)
- a: y-intercept (value of y when x = 0)
- b: slope (rate of change: delta y divided by delta x)
- x: independent variable
Assumptions: Linearity, Independence of Error, Homoscedasticity, Normality
Linearity The most fundamental assumption is that the model fits the situation [i.e.: the Y variable is linearly related to the value of the X variable].
Independence of Error The error (residual) is independent for each value of X. [Residual = observed - predicted]
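The residual definition above can be sketched in a few lines of Python. The data and the fitted line (a = 60, b = 5) are illustrative assumptions and are not taken from the deck's data table.

```python
# Hypothetical data: x = population (thousands), y = annual sales (thousands of dollars).
# These figures are illustrative only; they are not the deck's data table.
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]

# Assumed fitted line y-hat = a + b*x (coefficients chosen for illustration).
a, b = 60.0, 5.0


def residuals(x, y, a, b):
    """Return observed minus predicted for each observation."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


res = residuals(x, y, a, b)
print(res)
```

A positive residual means the observation lies above the line; a negative one means it lies below.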
Homoscedasticity The variation around the line of regression is constant for all values of X.
Normality The values of Y are normally distributed at each value of X.
Diagnostic Checking
- Linearity: examine a scatter plot of residuals versus fitted values [Y-hat] for evidence of nonlinearity.
- Independence: plot residuals in time order and look for patterns.
Diagnostic Checking
- Homoscedasticity: examine scatter plots of residuals versus fitted values [Y-hat] and of residuals versus time order, and look for changing scatter.
- Normality: examine a histogram of residuals and look for departures from the normal curve.
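As a rough sketch of the data behind these diagnostic plots, the Python below builds the three series (residuals versus fitted, residuals in order, and a binned residual histogram) without any plotting library. The data, the fitted line, the bin width, and the assumption that rows are listed in collection order are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical data and assumed fitted line (not the deck's data table).
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
a, b = 60.0, 5.0

fitted = [a + b * xi for xi in x]
res = [yi - fi for yi, fi in zip(y, fitted)]

# Series behind the "residuals versus fitted" scatter plot.
resid_vs_fitted = sorted(zip(fitted, res))

# Series behind the "residuals in time order" plot
# (assumes the rows above are already in collection order).
resid_in_order = list(enumerate(res, start=1))

WIDTH = 10.0  # assumed histogram bin width


def residual_histogram(res, width):
    """Count residuals per bin; each key is the left edge of a bin."""
    counts = Counter(math.floor(r / width) * width for r in res)
    return dict(sorted(counts.items()))


hist = residual_histogram(res, WIDTH)
for left, count in hist.items():
    print(f"[{left:6.1f}, {left + WIDTH:6.1f})  {'*' * count}")
```

Feeding `resid_vs_fitted` and `resid_in_order` to any plotting tool gives the scatter plots described above; the text histogram is only a crude stand-in for a real one.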
Goal Develop a statistical model that can predict the values of a dependent (response) variable based upon the values of the independent (explanatory) variable(s).
Simple Regression A statistical model that uses one quantitative independent variable “X” to predict the quantitative dependent variable “Y.”
Mini-Case Since a new housing complex is being developed in Carmichael, management is under pressure to open a new pie restaurant. Assuming that population and annual sales are related, a study was conducted to predict expected sales.
Mini-Case (Descartes Pie Restaurants)
Mini-Case
- What preliminary conclusions can management draw from the data?
- What could management expect sales to be if the population of the new complex is approximately 18,000 people?
Scatter Diagrams
- The values are plotted on a two-dimensional graph called a “scatter diagram.”
- Each value is plotted at its X and Y coordinates.
Scatter Plot of Pieshop
Types of Models: no relationship between X and Y, positive linear relationship, negative linear relationship
Method of Least Squares
- Finds the straight line that best fits the data.
- Determines the straight line for which the sum of the squared differences between the actual values (Y) and the values predicted from the fitted line of regression (Y-hat) is as small as possible.
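A minimal sketch of the least-squares calculation, using the standard closed-form formulas for the slope and intercept of y = a + bx. The data are hypothetical (population in thousands, sales in thousands of dollars) and are not the Descartes Pie figures, so the prediction is illustrative only.

```python
# Hypothetical data (not the deck's data table).
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]


def fit_line(x, y):
    """Return (a, b) minimizing the sum of squared errors for y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx          # slope
    a = y_bar - b * x_bar    # intercept: the line passes through (x_bar, y_bar)
    return a, b


a, b = fit_line(x, y)
predicted = a + b * 18  # prediction at x = 18 (thousand), for illustration
print(f"y-hat = {a:.1f} + {b:.1f}x; prediction at x = 18: {predicted:.1f}")
```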
Measures of Variation: Explained, Unexplained, Total
Explained Variation: Sum of Squares due to Regression [SSR] = $\sum (\hat{Y} - \bar{Y})^2$
Unexplained Variation: Sum of Squares Error [SSE] = $\sum (Y_{obs} - \hat{Y})^2$
Total Variation: Sum of Squares Total [SST] = $\sum (Y_{obs} - \bar{Y})^2$
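The three sums of squares can be computed directly from their definitions, which also makes it easy to check that SST = SSR + SSE for a least-squares line. The data below are hypothetical, and the line (a = 60, b = 5) is the least-squares fit to those data.

```python
# Hypothetical data (not the deck's data table).
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
a, b = 60.0, 5.0  # least-squares line for the data above

y_bar = sum(y) / len(y)
y_hat = [a + b * xi for xi in x]

ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained variation
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained variation
sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation

print(f"SSR = {ssr:.1f}, SSE = {sse:.1f}, SST = {sst:.1f}")
```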
$H_0$: There is no linear relationship between the dependent variable and the explanatory variable.
Hypotheses $H_0: \beta = 0$, $H_1: \beta \neq 0$ (where $\beta$ is the population slope), or $H_0$: No relationship exists, $H_1$: A relationship exists
Analysis of Variance for Regression
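A sketch of how the ANOVA quantities for simple regression follow from the sums of squares: one degree of freedom for regression, n - 2 for error, and F = MSR / MSE. The inputs are illustrative assumptions, not the deck's ANOVA table, and the p-value is left out because it requires an F-distribution routine that the Python standard library does not provide.

```python
def anova_for_regression(ssr, sse, n):
    """Return the ANOVA quantities for a one-predictor regression."""
    df_reg = 1        # one explanatory variable
    df_err = n - 2    # n observations minus two estimated coefficients
    msr = ssr / df_reg
    mse = sse / df_err
    return {
        "df_reg": df_reg,
        "df_err": df_err,
        "df_total": n - 1,
        "MSR": msr,
        "MSE": mse,
        "F": msr / mse,
    }


# Illustrative inputs (assumed): SSR, SSE, and sample size.
table = anova_for_regression(ssr=14200.0, sse=1530.0, n=10)
for name, value in table.items():
    print(f"{name:8s} {value:10.3f}")
```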
Standard Error of the Estimate $s_{y.x}$: the measure of variability around the line of regression
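A minimal sketch of the usual formula for simple regression, $s_{y.x} = \sqrt{SSE / (n - 2)}$, computed from observed and fitted values. The data and the line are hypothetical.

```python
import math

# Hypothetical data and assumed least-squares line (not the deck's data table).
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
a, b = 60.0, 5.0


def standard_error_of_estimate(y, y_hat):
    """s_yx = sqrt(SSE / (n - 2)) for a one-predictor regression."""
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    return math.sqrt(sse / (len(y) - 2))


y_hat = [a + b * xi for xi in x]
s_yx = standard_error_of_estimate(y, y_hat)
print(f"s_yx = {s_yx:.2f}")
```

The result is in the same units as Y, so it can be read as a typical distance between an observation and the line.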
Relationship When the null hypothesis is rejected, there is evidence that a linear relationship between the Y and X variables exists.
Coefficient of Determination $R^2$ measures the proportion of variation in the dependent variable that is explained by the independent variable in the regression model. $R^2$ = SSR / SST
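The ratio SSR / SST can be sketched as follows. The data and fitted line are hypothetical, so the resulting value is illustrative and is not the Pieshop result.

```python
# Hypothetical data and assumed least-squares line (not the deck's data table).
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
a, b = 60.0, 5.0


def r_squared(y, y_hat):
    """Proportion of total variation explained by the regression: SSR / SST."""
    y_bar = sum(y) / len(y)
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return ssr / sst


r2 = r_squared(y, [a + b * xi for xi in x])
print(f"R^2 = {r2:.4f}")
```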
Confidence interval estimates
- True mean of Y at a given X ($\mu_{Y|X}$)
- Individual Y-hat (predicted value for a single observation)
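A sketch of the two interval estimates using the textbook formulas for simple regression: the half-width for the true mean is $t \cdot s_{y.x}\sqrt{h}$ and for an individual value is $t \cdot s_{y.x}\sqrt{1 + h}$, with $h = 1/n + (x_0 - \bar{x})^2 / \sum (x - \bar{x})^2$. The data, the point x0 = 18, and the 95% level are assumptions; the critical value 2.306 is the two-sided 95% t value for 8 degrees of freedom, taken from a t-table rather than computed.

```python
import math

# Hypothetical data (not the deck's data table).
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]

T_CRIT = 2.306  # t-table value: 95% two-sided, n - 2 = 8 degrees of freedom


def interval_half_widths(x, y, x0, t_crit):
    """Return (y-hat at x0, half-width for the mean, half-width for an individual)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s_yx = math.sqrt(sse / (n - 2))
    h = 1 / n + (x0 - x_bar) ** 2 / s_xx
    y0 = a + b * x0
    return y0, t_crit * s_yx * math.sqrt(h), t_crit * s_yx * math.sqrt(1 + h)


y0, half_mean, half_indiv = interval_half_widths(x, y, x0=18, t_crit=T_CRIT)
print(f"mean of Y at x0:  {y0 - half_mean:.1f} to {y0 + half_mean:.1f}")
print(f"individual Y:     {y0 - half_indiv:.1f} to {y0 + half_indiv:.1f}")
```

The interval for an individual value is always wider than the interval for the mean, because it also covers the scatter of single observations around the line.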
Pieshop Forecasting
Coefficient of Sanity
Diagnostic Checking
- $H_0$: retain or reject {reject if p-value < 0.05}
- $R^2$ (larger is “better”)
- $s_{y.x}$ (smaller is “better”)
Analysis of Variance for Regression for Pieshop
Coefficient of Determination $R^2$ = SSR / SST = ___ %; thus, ___ percent of the variation in annual sales is explained by the population.
Standard Error of the Estimate $s_{y.x}$ = ___ with SSE = 1,530.0
Regression Analysis [Simple Regression] *** End of Presentation *** Questions?