Stat 324 – Day 25 Penalized Regression.


Last Time - Variable selection
Want to find the combination of variables that explains the most variability in the simplest possible model.
Look for variables that explain a higher percentage of the remaining unexplained variation (partial correlation coefficients).
Can use automated procedures ... with caution.

Principal components
Example: Have ranked communities on 9 variables. What best distinguishes the communities?
Climate and Terrain (higher scores are better)
Housing (lower scores are better)
Health Care & the Environment (higher)
Crime (lower scores are better)
Transportation (higher)
Education (higher)
The Arts (higher)
Recreation (higher)
Economics (higher)
https://onlinecourses.science.psu.edu/stat505/node/53

Example
The first principal component formula:
The first component could then be used as an explanatory variable in a regression model to predict rating.
The second component can also be used, with the bonus of being orthogonal to the first.
*Probably should standardize first.

Example
Here is how the original variables correlate with the first three principal components.
Five variables have a strong correlation with PC1 (communities with better housing tend to have better health, etc.).
PC1 is really about quality of arts.
PC2 is about health.
PC3 suggests places with high crime tend to also have better recreation facilities.
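The principal components steps in this example (standardize, extract components, correlate the original variables with the components) can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions: the data are synthetic stand-ins, not the communities data, and the names `principal_components`, `scores`, `weights`, and `loadings` are hypothetical.

```python
import numpy as np

def principal_components(X):
    """Return (scores, weights, variances) for the standardized columns of X."""
    # Standardize first so no variable dominates because of its units.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # SVD of the standardized data: rows of Vt are the component weights.
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt.T                 # column k holds each row's score on PC(k+1)
    variances = s**2 / (len(Z) - 1)   # variance captured by each component
    return scores, Vt.T, variances

# Synthetic stand-in data (an assumption): three correlated columns plus one
# unrelated column, 50 rows.
rng = np.random.default_rng(0)
base = rng.normal(size=(50, 1))
X = np.hstack([base + 0.3 * rng.normal(size=(50, 1)) for _ in range(3)]
              + [rng.normal(size=(50, 1))])

scores, weights, variances = principal_components(X)

# Correlation of each original variable with each of the first three components.
loadings = np.array([[np.corrcoef(X[:, j], scores[:, k])[0, 1]
                      for k in range(3)]
                     for j in range(X.shape[1])])
print(np.round(loadings, 2))
```

The column `scores[:, 0]` could then serve as a single explanatory variable in a regression, and because the score columns are orthogonal, adding `scores[:, 1]` introduces no collinearity with the first.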

Stepwise Regression (Mixed)

Best Subsets

Last Time

Last Time: AIC vs. BIC
AIC               BIC
tyer:   311.1     tyer: 322.4
tiyer:  311.9     te:   322.7
typer:  312.7     tye:  324.2
tiyper: 313.9     ter:  324.6
The idea behind these measures is similar, but BIC has a larger penalty for the number of variables, so it tends to be a bit more conservative (often choosing smaller, less complex models).
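To see why BIC penalizes extra variables more heavily, here is a minimal sketch that computes both criteria for a least squares fit. It assumes the common Gaussian forms AIC = n ln(RSS/n) + 2k and BIC = n ln(RSS/n) + k ln(n), with additive constants dropped; software packages differ in those constants and in whether the error variance counts toward k, so the values will not match the slide's numbers. The data are synthetic and the function name `aic_bic` is hypothetical.

```python
import numpy as np

def aic_bic(y, X):
    """AIC and BIC (up to an additive constant) for least squares of y on X.

    X must already include a column of ones if an intercept is wanted.
    """
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    fit = n * np.log(rss / n)              # shared goodness-of-fit term
    return fit + 2 * k, fit + k * np.log(n)

# Synthetic data (an assumption): only the first predictor matters.
rng = np.random.default_rng(1)
n = 100
x = rng.normal(size=(n, 3))
y = 2.0 + 1.5 * x[:, 0] + rng.normal(size=n)

ones = np.ones((n, 1))
small = np.hstack([ones, x[:, :1]])   # intercept + 1 predictor
large = np.hstack([ones, x])          # intercept + 3 predictors

aic_small, bic_small = aic_bic(y, small)
aic_large, bic_large = aic_bic(y, large)
print("AIC:", round(aic_small, 1), round(aic_large, 1))
print("BIC:", round(bic_small, 1), round(bic_large, 1))
```

Each added coefficient costs 2 under AIC but ln(n) under BIC, so once n exceeds about 7 the BIC penalty is the larger of the two.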

Other notes
Insignificant terms: it doesn't really hurt to leave them in the model as long as you clarify that they are not significant. Weigh this against parsimony and adjusted R2.
Could keep them in by request of a subject matter expert or for the sake of completeness (e.g., lower order terms of a polynomial, a set of indicator variables, indicators in the presence of interactions).

Today
Another method, developed to deal with multicollinearity, is increasingly popular as a form of variable selection as well.
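The slide does not name the method. Given the lecture title, one penalized approach that was developed for multicollinearity is ridge regression, sketched below as an assumption rather than as the course's own example. It uses the closed form (Z'Z + lambda I)^(-1) Z'y on standardized predictors and a centered response. The data are synthetic and the name `ridge_coefficients` is hypothetical.

```python
import numpy as np

def ridge_coefficients(X, y, lam):
    """Ridge slopes for standardized predictors and a centered response."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    yc = y - y.mean()
    p = Z.shape[1]
    # The penalty lam * I stabilizes Z'Z when predictors are nearly collinear.
    return np.linalg.solve(Z.T @ Z + lam * np.eye(p), Z.T @ yc)

# Synthetic data (an assumption): x2 is almost a copy of x1.
rng = np.random.default_rng(2)
n = 60
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(size=n)

beta_ols = ridge_coefficients(X, y, lam=0.0)     # lam = 0 is ordinary least squares
beta_ridge = ridge_coefficients(X, y, lam=10.0)  # lam > 0 shrinks the slopes
print("OLS slopes:  ", np.round(beta_ols, 2))
print("Ridge slopes:", np.round(beta_ridge, 2))
```

Ridge shrinks coefficients toward zero without setting any of them exactly to zero. Penalties that can zero out coefficients, such as the lasso, are the ones that act as variable selection.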

To Do
Practice problem
Wednesday/Thursday: Lab Assignment
Email Dr. Chance questions!