From OLS to Generalized Regression Chong Ho Yu (I am regressing)

OLS regression has a long history Ordinary least squares (OLS) regression, also known as standard least squares (SLS) regression, was developed by Legendre (1805) and Gauss (1809) when our great-grandparents were born.

OLS regression Least squares = minimize the sum of squared residuals. A residual is the distance between the actual and the predicted value. The line that makes this sum smallest is the best fit.
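
(Not part of the original slides.) A minimal sketch of the idea in Python, assuming NumPy is available; the data values are made up for illustration:

```python
import numpy as np

# Toy data: one predictor x and one outcome y (made-up values)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Design matrix with an intercept column
X = np.column_stack([np.ones_like(x), x])

# OLS picks the coefficients that minimize the sum of squared residuals
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta              # predicted values
residuals = y - y_hat         # actual minus predicted
print(beta, np.sum(residuals ** 2))
```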

R square The purpose of simple regression is to find a relationship (but not the one in the picture below). When there are multiple predictors, the strength of the multiple relationship is denoted by the R-square (variance explained).
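
For reference (this formula is not shown on the slide), the R-square compares the residual variation to the total variation in Y:

R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}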

Inflated variance explained Picture the overlapping area between Y and the Xs as the variance explained (the multiple relationship). As you pile more and more Xs onto Y, the circle of Y is almost fully covered. R-square = .89! Wow! Voila! Hallelujah!

Useless model A student asked me how he could improve his grade. I told him that my fifty-variable regression model could explain almost 89% of the variance in test performance: study long hours, earn more money, buy a reliable car, watch less TV, browse more often on the Web, exercise more often, attend church more often, pray more often, go to fewer movies, play fewer video games, cut your hair more often, drink more milk and coffee, etc. This complicated model is useless!

Fitness In this example I want to use six variables to predict weight. The method is OLS regression.

Negative adjusted R-square! The R-square is .199. Not bad! This model can explain about 20% of the variance in weight. But when many predictors are used, the program reports an adjusted R-square to correct for the inflated R-square, and here it is negative! What is that?
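
The standard adjustment (not shown on the slide) penalizes the R-square for the number of predictors p relative to the sample size n, which is why it can drop below zero:

R^2_{adj} = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}

When the R-square is modest and p is large relative to n, the fraction (n - 1)/(n - p - 1) inflates the subtracted term past 1, and the adjusted R-square goes negative.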

How about all possible interactions?

100% R-square but biased? If I use all possible interactions, the R-square is 100%, but JMP cannot estimate the adjusted R-square, and every parameter estimate is biased. What is happening?
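
(A sketch, not from the slides.) One way to see what is happening: once the number of estimated parameters reaches the number of observations, OLS can reproduce the sample perfectly even from pure noise, so a 100% R-square says nothing about the population:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 10, 10                   # as many parameters as observations
X = rng.normal(size=(n, p))     # the predictors are pure noise
y = rng.normal(size=n)          # the outcome is pure noise too

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta
print(np.allclose(y, y_hat))    # True: the fit is perfect even though nothing is related
```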

Problems of OLS regression Too many assumptions about the residuals and the predictors. It tends to overfit the sample. The model is unstable when some predictors are strongly correlated (collinearity). There is no unique solution when there are more predictors than observations. It must be a linear model.

Generalized regression Also known as regularized regression (R2). Introduced by Friedman (2008). Similar to abduction or IBE (inference to the best explanation): don't fix on one single answer; consider a few. There may be many ways to solve the problem, so why not explore different paths? Start with no model (all coefficients at zero), try out a series of models, and let the solution stay elastic (changeable). Pick the best one (chosen by the algorithm, not by you)!

Four alternatives in JMP Maximum likelihood (classical) Penalized regression: give the model a penalty if it is too complicated or the fit is inflated → Keep it simple, stupid (KISS)!  Lasso  Ridge  Elastic net (use this)
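
In symbols (one common parameterization, not taken from the slides), each penalized option minimizes the residual sum of squares plus a penalty on the size of the coefficients, with \lambda controlling how heavy the penalty is:

Lasso: \min_\beta \; \sum_i (y_i - x_i^\top \beta)^2 + \lambda \sum_j |\beta_j|

Ridge: \min_\beta \; \sum_i (y_i - x_i^\top \beta)^2 + \lambda \sum_j \beta_j^2

Elastic net: \min_\beta \; \sum_i (y_i - x_i^\top \beta)^2 + \lambda \left[ \alpha \sum_j |\beta_j| + (1 - \alpha) \sum_j \beta_j^2 \right]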

Lasso Will zero out some regression coefficients → it selects variables by dropping others out. If there are too many predictors and too few observations (high p, low n), the lasso saturates quickly: it selects at most n variables and then stops selecting. When there are many collinear predictors, the lasso tends to select just one and ignore the others.

Ridge A countermeasure against collinearity and variance inflation: it shrinks the regression coefficients towards zero, but the coefficients will not become exactly zero. You end up keeping either all of the coefficients or none of them. It controls the cancer cell, but won't remove it.

Elastic net Adaptive and versatile: it combines the penalties of the lasso and ridge approaches. Since it takes the best of both, why not just use this method?
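
A minimal sketch in Python with scikit-learn (an illustrative stand-in for JMP; the data, penalty strengths, and settings are assumptions), showing how the three penalties treat nearly collinear predictors differently:

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso, ElasticNet

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)   # nearly collinear with x1
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 3 * x1 + 0.5 * x3 + rng.normal(scale=0.5, size=n)

for model in (Ridge(alpha=1.0), Lasso(alpha=0.1), ElasticNet(alpha=0.1, l1_ratio=0.5)):
    model.fit(X, y)
    # Ridge shrinks but keeps both collinear terms; the lasso tends to zero one out;
    # the elastic net is a compromise between the two behaviours.
    print(type(model).__name__, np.round(model.coef_, 2))
```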

Example Use multiple predictors to predict diabetes progression (Y).

JMP's generalized regression (GR) output looks like a standard regression output.

SPSS SPSS can also do regularized regression.

SPSS You can access this feature from: Analyze → Regression → Optimal Scaling (CATREG) → Regularization. CATREG (categorical regression) quantifies categorical variables through optimal scaling. The SPSS output is harder to interpret.

Pros and cons Pros  It can solve the problem of collinearity.  It can avoid overfitting.  It picks the best of all possible paths. Cons  It is still a global model (one size fits all). Unlike hierarchical regression, it cannot discover local structures or specific solutions for special population segments.  It is still a linear model. What if the real relationship is non-linear?

Suggestions If your colleague or the reviewer wants a conventional solution (wants to see the term “regression”), use generalized regression. If there are many predictors and some are collinear, use GR. If the data structure has other problems in addition to collinearity, consider decision trees and the bootstrap forest (covered later). If the relationship is nonlinear, use an artificial neural network (covered later).