Stat 324 – Day 25 Penalized Regression.


1 Stat 324 – Day 25 Penalized Regression

2 Last Time - Variable selection
Want to find the combination of variables that explains the most variability in the simplest possible model.
Look for variables that explain a higher percentage of the remaining unexplained variation (partial correlation coefficients).
Can use automated procedures … with caution.

3 Principal components
Example: Have ranked communities on 9 variables. What best distinguishes the communities?
Climate and Terrain (higher scores are better)
Housing (lower scores are better)
Health Care & the Environment (higher)
Crime (lower scores are better)
Transportation (higher)
Education (higher)
The Arts (higher)
Recreation (higher)
Economics (higher)

4 Example
The first principal component formula:
The first principal component could then be used as an explanatory variable in a regression model to predict rating.
The second component can also be used, with the bonus of being orthogonal to the first.
*Probably should standardize the variables first.

5 Example
Here is how the original variables correlate with the first three principal components.
Five variables have a strong correlation with PC1 (communities with better housing tend to have better health, etc.).
PC1 is really about quality of arts.
PC2 is about health.
PC3 suggests places with high crime also tend to have better recreation facilities.
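The steps on the two Example slides (standardize, extract the first component, use its scores as a new variable) can be sketched in plain Python. This is only an illustration under stated assumptions: the three columns below are made-up numbers, not the community ratings from the slides, and power iteration is one simple way to get the leading component, not necessarily what the course software does.

```python
import math

def standardize(columns):
    """Center each column and scale it to unit sample standard deviation."""
    out = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (n - 1))
        out.append([(x - mean) / sd for x in col])
    return out

def correlation_matrix(z):
    """Correlation matrix of already-standardized columns."""
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z] for zi in z]

def first_component(matrix, iterations=500):
    """Power iteration: unit-length leading eigenvector and its eigenvalue."""
    v = [1.0] * len(matrix)
    for _ in range(iterations):
        w = [sum(m * x for m, x in zip(row, v)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(
        vi * sum(m * x for m, x in zip(row, v)) for vi, row in zip(v, matrix)
    )
    return v, eigenvalue

# Hypothetical scores for six communities (NOT the data from the slides).
arts = [2, 4, 5, 7, 8, 10]
health = [1, 3, 6, 6, 9, 11]
crime = [5, 3, 6, 2, 7, 4]

z = standardize([arts, health, crime])
loadings, eigenvalue = first_component(correlation_matrix(z))

# PC1 score for each community: weighted sum of its standardized values.
scores = [sum(w * col[i] for w, col in zip(loadings, z)) for i in range(len(z[0]))]

print("PC1 loadings (arts, health, crime):", [round(w, 3) for w in loadings])
print("Share of total variance explained by PC1:", round(eigenvalue / len(z), 3))
```

The `scores` list is what would enter a regression model as a single explanatory variable in place of the original columns.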

6 Stepwise Regression (Mixed)

7 Best Subsets

8 Last Time

9 Last Time: AIC vs. BIC
AIC
tyer: 311.1
tiyer: 311.9
typer: 312.7
tiyper: 313.9
BIC
tyer: 322.4
te: 322.7
tye: 324.2
ter: 324.6
The idea behind these measures is similar, but BIC has a larger penalty for the number of variables, so it tends to be a bit more conservative (often choosing smaller, less complex models).
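To make the difference in penalties concrete, here is a minimal sketch using the usual least-squares forms AIC = n ln(RSS/n) + 2k and BIC = n ln(RSS/n) + k ln(n), each defined up to an additive constant. The RSS values, sample size, and parameter counts are hypothetical and are not the models listed on the slide; software packages may also report values that differ by a constant.

```python
import math

def aic_bic(rss, n, k):
    """AIC and BIC (up to an additive constant) for a least-squares fit.

    rss: residual sum of squares, n: number of observations,
    k: number of estimated parameters.
    """
    fit = n * math.log(rss / n)
    return fit + 2 * k, fit + k * math.log(n)

# Hypothetical comparison: the larger model fits slightly better.
n = 50
aic_small, bic_small = aic_bic(rss=112.0, n=n, k=3)
aic_large, bic_large = aic_bic(rss=100.0, n=n, k=5)

print("AIC prefers:", "large" if aic_large < aic_small else "small")
print("BIC prefers:", "large" if bic_large < bic_small else "small")
```

With these made-up numbers the two criteria disagree: the improvement in fit is enough to pay AIC's penalty of 2 per parameter but not BIC's penalty of ln(n) per parameter, which exceeds 2 whenever n is at least 8.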

10 Other notes
Insignificant terms
It doesn’t really hurt to leave them in the model, as long as you clarify that they are not significant.
vs. parsimony, adjusted R²
Could keep them in at the request of a subject matter expert or for the sake of completeness (e.g., lower-order terms of a polynomial, a full set of indicator variables, indicators in the presence of interactions).
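The parsimony side of this trade-off can be seen in adjusted R², which can fall when a term adds almost nothing. A small sketch with hypothetical values (the R² figures, n, and p below are assumptions chosen for illustration):

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared for a model with p predictors and n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical: adding a fourth, insignificant predictor nudges R-squared
# from 0.800 to 0.802 with n = 30 observations.
before = adjusted_r2(0.800, n=30, p=3)
after = adjusted_r2(0.802, n=30, p=4)

print(round(before, 4), round(after, 4))
```

Here the raw R² goes up but the adjusted value goes down, which is the sense in which an insignificant term costs parsimony even if it does no other harm.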

11 Today
Another method, developed to deal with multicollinearity, is increasingly popular as a form of variable selection as well.
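The slide does not name the method, so the following is a sketch under an assumption: that "penalized regression" here refers to ridge-type (squared) and lasso-type (absolute value) penalties on the coefficients. The function name `penalized_fit`, the penalty parameters `l1` and `l2`, and the data are all hypothetical. It uses coordinate descent in plain Python; real analyses would normally use a statistics package.

```python
def center(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within lam of zero become 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def penalized_fit(columns, y, l1=0.0, l2=0.0, sweeps=1000):
    """Coordinate descent on centered data for the objective
    (1/2n) * sum((y - Xb)^2) + l1 * sum(|b|) + (l2/2) * sum(b^2).
    l1 > 0 gives a lasso-type penalty, l2 > 0 a ridge-type penalty."""
    n = len(y)
    xs = [center(col) for col in columns]
    resid = center(y)  # residuals when every coefficient is zero
    beta = [0.0] * len(xs)
    for _ in range(sweeps):
        for j, x in enumerate(xs):
            # Covariance of predictor j with the residual that ignores predictor j.
            rho = sum(xi * (ri + xi * beta[j]) for xi, ri in zip(x, resid)) / n
            scale = sum(xi * xi for xi in x) / n
            new = soft_threshold(rho, l1) / (scale + l2)
            resid = [ri - xi * (new - beta[j]) for xi, ri in zip(x, resid)]
            beta[j] = new
    return beta

# Hypothetical data: x2 is nearly a copy of x1 (multicollinearity).
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8, 7.2, 7.9]
x3 = [3, 1, 4, 1, 5, 9, 2, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
X = [x1, x2, x3]

# Smallest l1 at which every coefficient is shrunk exactly to zero.
yc = center(y)
lam_max = max(abs(sum(a * b for a, b in zip(center(col), yc))) / len(y) for col in X)

ridge_like = penalized_fit(X, y, l2=0.5)
lasso_like = penalized_fit(X, y, l1=0.5 * lam_max)
all_zero = penalized_fit(X, y, l1=1.01 * lam_max)

print("ridge-type:", [round(b, 3) for b in ridge_like])
print("lasso-type:", [round(b, 3) for b in lasso_like])
```

The soft-threshold step is what lets an absolute-value penalty set coefficients to exactly zero, which is why this kind of fit can double as variable selection; the squared penalty only shrinks coefficients and stabilizes them when predictors are highly correlated.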

12 To Do
Practice problem
Wednesday/Thursday: Lab Assignment
Dr. Chance questions!

