Regression Shrinkage and Selection via the Lasso. Author: Robert Tibshirani. Journal of the Royal Statistical Society, 1996. Presentation: Tinglin Liu.



Outline: What is the Lasso? Why should we use the Lasso? Why are the Lasso solutions sparse? How do we find the Lasso solutions?


Lasso (Least Absolute Shrinkage and Selection Operator). Definition: the Lasso is a shrunken version of the ordinary least squares estimate, obtained by minimizing the residual sum of squares subject to the constraint that the sum of the absolute values of the coefficients is no greater than a constant: minimize Σ_i (y_i − Σ_j x_ij β_j)² subject to Σ_j |β_j| ≤ t.

Lasso (Least Absolute Shrinkage and Selection Operator). Features: the constrained form is equivalent to the classic penalized expression used in sparse coding, minimize Σ_i (y_i − Σ_j x_ij β_j)² + λ Σ_j |β_j|, for a λ that depends on the bound t. Standardization of the data (Σ_i x_ij / N = 0 and Σ_i x_ij² / N = 1) is required so that the constraint treats every predictor variable on the same scale. Murray, W., Gill, P. and Wright, M. (1981) Practical Optimization, Chapter 5. Academic Press.
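The penalized form, minimize Σ_i (y_i − Σ_j x_ij β_j)² + λ Σ_j |β_j|, can be illustrated with a minimal pure-Python coordinate-descent sketch. This is not the algorithm used in the paper (which predates coordinate descent for the lasso), and all function names here are our own; each pass soft-thresholds one coefficient at a time:

```python
def soft_threshold(z, g):
    # S(z, g) = sign(z) * max(|z| - g, 0)
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0

def lasso_cd(X, y, lam, iters=200):
    """Coordinate descent for: minimize sum_i (y_i - sum_j x_ij b_j)^2
    + lam * sum_j |b_j|.  X is a list of rows, y a list of responses."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(iters):
        for j in range(p):
            # correlation of predictor j with the partial residual
            rho = sum(X[i][j] * (y[i] - sum(X[i][k] * b[k]
                                            for k in range(p) if k != j))
                      for i in range(n))
            z = sum(X[i][j] ** 2 for i in range(n))
            # minimizing the 1-D objective gives a soft-thresholded update
            b[j] = soft_threshold(rho, lam / 2.0) / z
    return b
```

On a tiny design with orthogonal columns where the response depends only on the first predictor, the second coefficient is set exactly to zero, which is the sparsity behaviour the slides describe.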

Lasso (Least Absolute Shrinkage and Selection Operator). Features: sparse solutions. Let β̂_j° be the full least squares estimates and t₀ = Σ_j |β̂_j°|. Values t < t₀ will cause shrinkage of the solutions towards 0, and some coefficients may become exactly 0. Let s = t / t₀ be the scaled Lasso parameter.

Lasso (Least Absolute Shrinkage and Selection Operator). Features: Lasso as a Bayes estimate. Assume that y_i ~ N(Σ_j x_ij β_j, σ²) and that each β_j has the double-exponential (Laplace) prior π(β_j) = (τ/2) exp(−τ |β_j|). Then we can derive the Lasso regression estimate as the mode of the Bayes posterior.
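The connection can be sketched as follows (notation ours): with the Gaussian likelihood and independent Laplace priors, the log posterior is

```latex
\log \pi(\beta \mid y)
  = -\frac{1}{2\sigma^{2}} \sum_{i=1}^{N}\Bigl(y_i - \sum_{j} x_{ij}\beta_j\Bigr)^{2}
    - \tau \sum_{j=1}^{p} \lvert\beta_j\rvert + \text{const},
```

so maximizing the posterior is the same as minimizing the residual sum of squares plus 2σ²τ Σ_j |β_j|, i.e. the penalized form of the Lasso with λ = 2σ²τ.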


Why Lasso? Prediction accuracy. Assume y = η(x) + ε with E(ε) = 0 and Var(ε) = σ²; then the mean squared error of an estimate η̂(x) is ME = E[(η̂(x) − η(x))²], and the prediction error is PE = ME + σ². OLS estimates often have low bias but large variance; the Lasso can improve overall prediction accuracy by sacrificing a little bias to reduce the variance of the predicted values.
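The bias-variance tradeoff can be illustrated by a small simulation in a one-parameter toy case (our own example, not from the paper): we observe z ~ N(θ, 1), and compare the unbiased OLS-style estimate θ̂ = z with a shrunken, soft-thresholded estimate:

```python
import random

def soft(z, g):
    # soft-threshold: shrink z towards 0 by g, clipping small values to 0
    return (z - g) if z > g else (z + g) if z < -g else 0.0

def compare_mse(theta, gamma, n_sim=20000, seed=1):
    """Monte Carlo MSE of the unbiased estimate (just z) versus the
    shrunken estimate soft(z, gamma), for z ~ N(theta, 1)."""
    rng = random.Random(seed)
    mse_ols = mse_shrunk = 0.0
    for _ in range(n_sim):
        z = rng.gauss(theta, 1.0)
        mse_ols += (z - theta) ** 2          # variance only, no bias
        mse_shrunk += (soft(z, gamma) - theta) ** 2  # some bias, less variance
    return mse_ols / n_sim, mse_shrunk / n_sim
```

When the true coefficient is small (e.g. θ = 0), the shrunken estimate accepts some bias but has much lower variance, so its simulated MSE comes out well below the unbiased estimate's value of roughly 1.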

Why Lasso? Interpretation. In many cases, the response is determined by just a small subset of the predictor variables; the Lasso yields an interpretable model by setting the remaining coefficients exactly to zero.


Why Sparse? Geometry of the Lasso. The least squares criterion equals the quadratic function (β − β̂°)ᵀ XᵀX (β − β̂°), plus a constant. This function is expressed as elliptical contours centred at the OLS estimates β̂°. The L1-norm constraint region {β : Σ_j |β_j| ≤ t} is expressed as a square (a diamond) centred at the origin. The Lasso solution is the first place where a contour touches the square; because the square has corners on the coordinate axes, contact often occurs at a corner, where some coefficients are exactly zero.

Why Sparse? [Figure: elliptical contours of the least squares criterion and the square L1 constraint region; the contour first touches the region at a corner, where a coefficient is zero.]

Why Sparse? Geometry of the Lasso. Since the variables are standardized, the principal axes of the contours are at 45° to the coordinate axes. The correlations between the variables can influence the lengths of the principal axes of the elliptical contours, but they have almost no influence on the sparsity of the Lasso solutions.


How to solve the problem? The absolute-value constraint Σ_j |β_j| ≤ t can be translated into 2^p linear inequality constraints (p stands for the number of predictor variables): G β ≤ t·1, where G is a 2^p × p matrix whose rows are the sign vectors δ_i with entries ±1, each corresponding to one linear inequality constraint δ_iᵀ β ≤ t. But direct application of this procedure is not practical, due to the fact that 2^p may be very large. Lawson, C. and Hansen, R. (1974) Solving Least Squares Problems. Prentice Hall.
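The translation into linear constraints can be sketched directly (helper names ours): the rows of G are all sign vectors δ in {−1, +1}^p, and requiring δᵀβ ≤ t for every δ is equivalent to Σ_j |β_j| ≤ t, because the binding δ is the one matching the signs of β:

```python
from itertools import product

def sign_vectors(p):
    # the 2**p rows of G: every vector delta in {-1, +1}^p
    return list(product((-1, 1), repeat=p))

def satisfies_l1(beta, t):
    """Check sum_j |beta_j| <= t via the 2**p linear constraints
    delta . beta <= t, instead of using absolute values directly."""
    return all(sum(d * b for d, b in zip(delta, beta)) <= t
               for delta in sign_vectors(len(beta)))
```

Already for p = 10 this is 1024 constraints, which is why the constraints are introduced sequentially rather than all at once.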

How to solve the problem? Outline of the algorithm: solve the least squares problem and sequentially introduce the violated inequality constraints until the solution satisfies Σ_j |β_j| ≤ t. In practice, the average number of iteration steps required is in the range (0.5p, 0.75p), so the algorithm is acceptable. Lawson, C. and Hansen, R. (1974) Solving Least Squares Problems. Prentice Hall.
