Time Series Analysis – Chapter 2 Simple Regression
"Essentially, all models are wrong, but some are useful." George Box, Empirical Model-Building and Response Surfaces (1987), co-authored with Norman R. Draper, p. 424, ISBN
George Box is the son-in-law of Sir Ronald Fisher.
Equation of a Line – Algebra vs. Simple Regression – Statistics
Equation of a Line Example
y = mx + b
wage = 3.55 educ – 33.8
y = wage in dollars per hour
x = education in years completed
Note: if I know how many years of education someone has completed, I can predict their wage perfectly. Nothing else matters.
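As a small illustration of the algebra view, the line on this slide can be evaluated directly. The function name `predicted_wage` is a hypothetical helper introduced for this sketch; the slope and intercept are the ones given above.

```python
def predicted_wage(educ_years: float) -> float:
    """Evaluate the slide's line wage = 3.55 * educ - 33.8 (dollars per hour)."""
    return 3.55 * educ_years - 33.8


# In the algebra view the line is exact: the same education always gives the same wage.
wage_12 = predicted_wage(12)
wage_16 = predicted_wage(16)

# Each extra year of education changes the wage by the slope.
slope_check = predicted_wage(13) - predicted_wage(12)

print(round(wage_12, 2), round(wage_16, 2), round(slope_check, 2))
```

There is no error term here, which is exactly what separates this from the regression version of the same equation.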
Simple Regression Example
Algebra vs. Statistics - Summary
Student  GPA  ACT
The Analysis of Variance Table
Analysis of Variance
Source          DF  SS  MS  F  P
Regression
Residual Error
Total
ANOVA
Models can be evaluated by examining variability. Three types of variability are quantified:
Overall or total variability present in the data (SST)
Variability explained by the regression model (SSR)
Error variability that is unexplained (SSE)
SST = SSR + SSE
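The decomposition SST = SSR + SSE can be checked numerically. The sketch below is one minimal way to do it in plain Python; the eight (ACT, GPA) pairs are illustrative values assumed for this example, not taken from the slides.

```python
# Illustrative (ACT, GPA) pairs assumed for this sketch.
act = [21, 24, 26, 27, 29, 25, 25, 30]
gpa = [2.8, 3.4, 3.0, 3.5, 3.6, 3.0, 2.7, 3.7]

n = len(act)
x_bar = sum(act) / n
y_bar = sum(gpa) / n

# OLS slope and intercept for the simple regression of GPA on ACT.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(act, gpa))
sxx = sum((x - x_bar) ** 2 for x in act)
slope = sxy / sxx
intercept = y_bar - slope * x_bar
fitted = [intercept + slope * x for x in act]

sst = sum((y - y_bar) ** 2 for y in gpa)               # total variability
ssr = sum((f - y_bar) ** 2 for f in fitted)            # explained by the regression
sse = sum((y - f) ** 2 for y, f in zip(gpa, fitted))   # unexplained (error)

print(f"SST={sst:.4f}  SSR={ssr:.4f}  SSE={sse:.4f}  SSR+SSE={ssr + sse:.4f}")
```

The identity holds for OLS fits that include an intercept; for other fitted lines the two pieces need not add up to SST.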
ANOVA
The larger the regression variability (SSR) is compared to the error variability (SSE), the more evidence there is that the model is explanatory.
Analysis of Variance
Source          DF  SS  MS  F  P
Regression
Residual Error
Total
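The comparison of SSR to SSE is what the F column of the table summarizes. The sketch below fills in the DF, SS, MS and F columns for a simple regression, using the standard degrees of freedom (1 for regression, n – 2 for error). `anova_table` is a hypothetical helper and the data are illustrative values assumed for this example. The P column is left out because it needs the F distribution, which the Python standard library does not provide.

```python
def anova_table(x, y):
    """Hypothetical helper: DF, SS, MS and F for a simple regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    intercept = y_bar - slope * x_bar
    fitted = [intercept + slope * xi for xi in x]

    ssr = sum((f - y_bar) ** 2 for f in fitted)
    sse = sum((yi - f) ** 2 for yi, f in zip(y, fitted))
    df_reg, df_err = 1, n - 2          # one slope; two estimated parameters in total
    msr, mse = ssr / df_reg, sse / df_err
    return {
        "Regression": (df_reg, ssr, msr),
        "Residual Error": (df_err, sse, mse),
        "Total": (n - 1, ssr + sse, None),
        "F": msr / mse,
    }


# Illustrative (ACT, GPA) pairs assumed for this sketch.
act = [21, 24, 26, 27, 29, 25, 25, 30]
gpa = [2.8, 3.4, 3.0, 3.5, 3.6, 3.0, 2.7, 3.7]

table = anova_table(act, gpa)
for source in ("Regression", "Residual Error", "Total"):
    df, ss, ms = table[source]
    ms_text = "" if ms is None else f"{ms:.4f}"
    print(f"{source:<15} {df:>2} {ss:.4f} {ms_text}")
print(f"F = {table['F']:.2f}")
```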
ANOVA – R²
R² is the Coefficient of Determination
R² = SSR/SST = 1 – SSE/SST (TYPO on pg. 40!!)
R² is the percentage of the variation in y (response variable) explained by x (explanatory variable).
R-Sq = SSR/SST = 57.7%
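Both forms of the R² formula give the same number, which the following sketch checks. `r_squared` is a hypothetical helper, and the (ACT, GPA) pairs are illustrative values assumed for this example rather than the table from the slides.

```python
def r_squared(x, y):
    """Hypothetical helper: return R^2 computed as SSR/SST and as 1 - SSE/SST."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    intercept = y_bar - slope * x_bar
    fitted = [intercept + slope * xi for xi in x]

    sst = sum((yi - y_bar) ** 2 for yi in y)
    ssr = sum((f - y_bar) ** 2 for f in fitted)
    sse = sum((yi - f) ** 2 for yi, f in zip(y, fitted))
    return ssr / sst, 1 - sse / sst


# Illustrative (ACT, GPA) pairs assumed for this sketch.
act = [21, 24, 26, 27, 29, 25, 25, 30]
gpa = [2.8, 3.4, 3.0, 3.5, 3.6, 3.0, 2.7, 3.7]

r2_from_ssr, r2_from_sse = r_squared(act, gpa)
print(f"R-Sq = {100 * r2_from_ssr:.1f}%")
```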
ANOVA – r
ANOVA – R² vs. r
R² always exists for simple regression and multiple regression, and always has the same definition.
r only exists and makes sense for simple regression.
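In simple regression the two quantities are tied together: squaring the correlation r gives R², while r also carries the sign of the relationship. The sketch below checks this on illustrative data assumed for the example; `pearson_r` and `r_squared` are hypothetical helper names.

```python
import math


def pearson_r(x, y):
    """Sample correlation coefficient between x and y."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def r_squared(x, y):
    """R^2 = 1 - SSE/SST for the simple regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    intercept = y_bar - slope * x_bar
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst


# Illustrative (ACT, GPA) pairs assumed for this sketch.
act = [21, 24, 26, 27, 29, 25, 25, 30]
gpa = [2.8, 3.4, 3.0, 3.5, 3.6, 3.0, 2.7, 3.7]

r = pearson_r(act, gpa)
r2 = r_squared(act, gpa)

# Flipping the sign of y reverses the direction of the relationship:
# r changes sign, R^2 does not.
flipped = [-g for g in gpa]
r_flipped = pearson_r(act, flipped)
r2_flipped = r_squared(act, flipped)

print(f"r = {r:.3f}, r^2 = {r * r:.3f}, R^2 = {r2:.3f}")
print(f"flipped: r = {r_flipped:.3f}, R^2 = {r2_flipped:.3f}")
```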
Nobel Prize vs. # of McDonald's
Explanatory variable is the number of McDonald's restaurants a country has.
Response variable is the number of Nobel Prizes that have been awarded to that country.
Logs
Level – Level Model
Student  GPA  ACT
Level – Log Model
Dependent variable: y
Independent variable: log(x)
Not used in this chapter; discussed in future chapters.
Log – Level Model
Student  GPA  ACT  log(GPA)
Log – Level Model
Level – Level Model
Log – Level Model
Is this still linear regression?
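One way to see that the log-level model is still a linear regression: the model stays linear in its parameters, and only the response is transformed before the same OLS formulas are applied. The sketch below fits log(GPA) on ACT with illustrative data assumed for the example. The reading of 100 × slope as an approximate percent change in y per unit of x is the usual log-level interpretation.

```python
import math

# Illustrative (ACT, GPA) pairs assumed for this sketch.
act = [21, 24, 26, 27, 29, 25, 25, 30]
gpa = [2.8, 3.4, 3.0, 3.5, 3.6, 3.0, 2.7, 3.7]

# Step 1: transform the response. Nothing else about the fit changes.
log_gpa = [math.log(g) for g in gpa]

# Step 2: ordinary least squares of log(GPA) on ACT.
n = len(act)
x_bar = sum(act) / n
y_bar = sum(log_gpa) / n
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(act, log_gpa))
         / sum((x - x_bar) ** 2 for x in act))
intercept = y_bar - slope * x_bar

# Approximate percent change in GPA for one more ACT point.
approx_pct_change = 100 * slope

print(f"log(GPA) = {intercept:.4f} + {slope:.4f} ACT")
print(f"about {approx_pct_change:.1f}% higher GPA per ACT point")
```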
Log – Log Model
Student  GPA  ACT  log(GPA)  log(ACT)
Log – Log or Constant Elasticity Model
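The name "constant elasticity" comes from the fact that the slope of a log-log regression is the elasticity of y with respect to x: a 1% change in x goes with roughly a slope-percent change in y, at every level of x. The sketch below uses a made-up power law, y = 2·x^0.5, chosen only so the true elasticity (0.5) is known in advance.

```python
import math

# Hypothetical data generated from y = 2 * x**0.5, so the true elasticity is 0.5.
x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [2.0 * xi ** 0.5 for xi in x]

log_x = [math.log(v) for v in x]
log_y = [math.log(v) for v in y]

# OLS of log(y) on log(x).
n = len(x)
lx_bar = sum(log_x) / n
ly_bar = sum(log_y) / n
elasticity = (sum((a - lx_bar) * (b - ly_bar) for a, b in zip(log_x, log_y))
              / sum((a - lx_bar) ** 2 for a in log_x))
intercept = ly_bar - elasticity * lx_bar
scale = math.exp(intercept)   # recovers the multiplicative constant

# Effect on y of a 1% increase in x, in percent; it does not depend on x itself.
pct_change_y = (1.01 ** elasticity - 1) * 100

print(f"elasticity = {elasticity:.3f}, scale = {scale:.3f}")
print(f"1% more x -> about {pct_change_y:.2f}% more y")
```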
Simple Linear Regression Assumptions
SLR.2: The sample of size n used to estimate the model parameters is a random sample (sometimes called a simple random sample). What is the definition of a random sample?
Simple Linear Regression Assumptions
SLR.3: The sample x values are not all the same value.
OK          NOT OK
x  y        x  y
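SLR.3 matters for a mechanical reason: the OLS slope divides by the sum of squared deviations of x from its mean, and that sum is zero when every x is the same. The sketch below makes the failure explicit. `ols_slope` is a hypothetical helper and both small samples are made up for illustration.

```python
def ols_slope(x, y):
    """Hypothetical helper: OLS slope, refusing samples where x has no variation."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is not defined (SLR.3 fails)")
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx


# OK: the x values vary.
ok_slope = ols_slope([21, 24, 26, 27], [2.8, 3.4, 3.0, 3.5])

# NOT OK: every x value is the same, so no line can be estimated.
try:
    ols_slope([25, 25, 25, 25], [2.8, 3.4, 3.0, 3.5])
    not_ok_message = None
except ValueError as err:
    not_ok_message = str(err)

print(f"OK slope: {ok_slope:.4f}")
print(f"NOT OK: {not_ok_message}")
```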
Simple Linear Regression Assumptions
Ordinary Least Squares Estimators
Ordinary Least Squares
Minimize the sum of the squared residuals.
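To make "minimize" concrete, the sketch below computes the OLS estimates from the usual closed-form formulas and then checks that nudging either coefficient makes the sum of squared residuals larger. The data are illustrative values assumed for the example, and `sum_sq_resid` is a hypothetical helper.

```python
# Illustrative (ACT, GPA) pairs assumed for this sketch.
act = [21, 24, 26, 27, 29, 25, 25, 30]
gpa = [2.8, 3.4, 3.0, 3.5, 3.6, 3.0, 2.7, 3.7]


def sum_sq_resid(b0, b1):
    """Sum of squared residuals for the candidate line y = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(act, gpa))


# Closed-form OLS estimates.
n = len(act)
x_bar = sum(act) / n
y_bar = sum(gpa) / n
b1_hat = (sum((x - x_bar) * (y - y_bar) for x, y in zip(act, gpa))
          / sum((x - x_bar) ** 2 for x in act))
b0_hat = y_bar - b1_hat * x_bar

best = sum_sq_resid(b0_hat, b1_hat)

# Try lines slightly different from the OLS line; each should fit worse.
nearby = [
    sum_sq_resid(b0_hat + d0, b1_hat + d1)
    for d0 in (-0.1, 0.0, 0.1)
    for d1 in (-0.01, 0.0, 0.01)
    if (d0, d1) != (0.0, 0.0)
]

print(f"OLS line: GPA = {b0_hat:.4f} + {b1_hat:.4f} ACT, SSE = {best:.4f}")
print(f"smallest SSE among nearby lines: {min(nearby):.4f}")
```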
Ordinary Least Squares
Student  GPA  ACT  RESI
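A residual (the RESI column) is the observed response minus the fitted value. The sketch below builds such a column for illustrative data assumed for the example, and checks two standard properties of OLS residuals when the model has an intercept: they sum to zero, and they are uncorrelated with x in the sample.

```python
# Illustrative (ACT, GPA) pairs assumed for this sketch.
act = [21, 24, 26, 27, 29, 25, 25, 30]
gpa = [2.8, 3.4, 3.0, 3.5, 3.6, 3.0, 2.7, 3.7]

n = len(act)
x_bar = sum(act) / n
y_bar = sum(gpa) / n
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(act, gpa))
         / sum((x - x_bar) ** 2 for x in act))
intercept = y_bar - slope * x_bar

# Residual = observed GPA minus the GPA predicted from ACT.
resid = [y - (intercept + slope * x) for x, y in zip(act, gpa)]

print("Student  GPA  ACT  RESI")
for student, (x, y, e) in enumerate(zip(act, gpa, resid), start=1):
    print(f"{student:>7}  {y:.1f}  {x:>3}  {e:+.4f}")

resid_sum = sum(resid)                                  # zero up to rounding
resid_x_sum = sum(x * e for x, e in zip(act, resid))    # zero up to rounding
print(f"sum of residuals = {resid_sum:.2e}, sum of x*residual = {resid_x_sum:.2e}")
```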
Ordinary Least Squares