SUG London 2007
Least Angle Regression: Translating the S-Plus/R Least Angle Regression package to Mata
Adrian Mander, MRC-Human Nutrition Research Unit, Cambridge

Outline
- The LARS package
- Lasso (the constrained OLS)
- Forward Stagewise regression
- Least Angle Regression
- Translating Hastie & Efron's code from R to Mata
- The lars Stata command
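For orientation, this is roughly how the original R package is driven. A minimal sketch, assuming the CRAN lars package and the diabetes example data it ships with:

```r
# Fit the three LARS-family algorithms with Hastie & Efron's R package
library(lars)
data(diabetes)               # example data bundled with the package
x <- diabetes$x
y <- diabetes$y

fit_lar   <- lars(x, y, type = "lar")                # least angle regression
fit_lasso <- lars(x, y, type = "lasso")              # lasso path
fit_stage <- lars(x, y, type = "forward.stagewise")  # forward stagewise

plot(fit_lasso)              # coefficient paths against the shrinkage factor
```

The Mata translation described in this talk covers these same three algorithm types.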

Lasso
Let y be the dependent variable and $x_j$ ($j = 1, \dots, m$) be the m covariates. The usual linear predictor is $\mu_i = \sum_j \beta_j x_{ij}$. We want to minimise the squared differences $\sum_i (y_i - \mu_i)^2$ subject to the L1 constraint $\sum_j |\beta_j| \le t$. (N.B. Ridge regression instead puts the constraint on the L2 norm, $\sum_j \beta_j^2 \le t$.) Subject to this constraint, a large t gives the OLS solution.

Lasso graphically
The constraint region is shown on the slide; in two dimensions the L1 constraint $|\beta_1| + |\beta_2| \le t$ is a diamond. One property of this constraint is that the solution will have coefficients exactly equal to 0 for a subset of the variables.

Ridge Regression
The constraint region is shown on the slide; in two dimensions the L2 constraint $\beta_1^2 + \beta_2^2 \le t$ is a disc. The coefficients are shrunk, but ridge does not have the property of parsimony: no coefficients are set exactly to zero.

Forward Stagewise
Using the constrained view above, the vector of current correlations, as a function of the current fit $\mu$, is $c(\mu) = X'(y - \mu)$. Move the fitted mean in the direction of the covariate with the greatest current correlation, $\mu \leftarrow \mu + \epsilon\,\mathrm{sign}(c_j)\,x_j$ with $j = \arg\max_j |c_j|$, for some small ε. (FORWARD STEPWISE, by contrast, is greedy: it selects the best predictor outright and takes the full least-squares step.)
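A minimal sketch of that loop in R, assuming standardised covariates and arbitrarily chosen values for ε and the number of steps (the function and variable names are illustrative, not from the package):

```r
# Forward stagewise: repeatedly nudge the coefficient of the covariate most
# correlated with the current residual by a small amount eps
forward_stagewise <- function(X, y, eps = 0.01, steps = 5000) {
  X    <- scale(X)                    # standardise the covariates
  y    <- y - mean(y)                 # centre the response
  beta <- rep(0, ncol(X))
  mu   <- rep(0, nrow(X))             # current fit, starts at 0
  for (s in 1:steps) {
    cors  <- crossprod(X, y - mu)     # current correlations c = X'(y - mu)
    j     <- which.max(abs(cors))     # covariate with the greatest correlation
    delta <- eps * sign(cors[j])      # small step in its direction
    beta[j] <- beta[j] + delta
    mu      <- mu + delta * X[, j]
  }
  beta
}
```

Taking the full least-squares step instead of ε recovers greedy forward selection; letting ε shrink towards zero traces out a path closely related to the lasso.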

Least Angle Regression
LARS (the "S" suggesting LaSso and Stagewise) starts like classic Forward Selection:
- Find the predictor x_j1 most correlated with the current residual.
- Make a step (ε) just large enough that another predictor, x_j2, has as much correlation with the current residual.
- LARS then steps in the direction equiangular between the two predictors until x_j3 earns its way into the "correlated set".
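One way to see predictors "earning their way in" is to inspect the coefficient path from the R package: the active set grows by one variable per step. A sketch, again assuming the lars package's bundled diabetes data:

```r
library(lars)
data(diabetes)
fit <- lars(diabetes$x, diabetes$y, type = "lar")

# coef() returns one row of coefficients per step of the LAR path;
# counting non-zeros per row shows the active set growing by one each step
b <- coef(fit)
rowSums(b != 0)   # 0, 1, 2, 3, ... as predictors enter in turn
```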

Least Angle Regression geometrically
Two covariates x_1 and x_2, and the space L(x_1, x_2) that is spanned by them. y_2 is the projection of y onto L(x_1, x_2). Start at μ_0 = 0.
[Figure: the path μ_0, μ_1, … moving through L(x_1, x_2) towards y_2.]

Continued…
The current correlations only depend on the projection of y onto L(x_1, x_2), i.e. y_2.
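This is easy to check numerically: replacing y by its projection y_2 leaves the current correlations unchanged, because X'(y − y_2) = 0. A small sketch with made-up data:

```r
set.seed(1)
X  <- cbind(x1 = rnorm(50), x2 = rnorm(50))
y  <- rnorm(50)
mu <- 0.3 * X[, 1]                           # some current fit inside L(x1, x2)

# y2: the projection of y onto the column space of X
y2 <- X %*% solve(crossprod(X), crossprod(X, y))

# the current correlations c = X'(y - mu) agree whether we use y or y2
all.equal(as.vector(crossprod(X, y - mu)),
          as.vector(crossprod(X, y2 - mu)))  # TRUE
```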

Programming similarities
Side by side, the S-Plus and Mata code look incredibly similar.

Programming similarities
There are some differences, though:
- Arrays of arrays: beta[[k]] = array.
- Indexing on the left-hand side: beta[positive] = beta0.
- Being able to "join" null matrices.
- Row and column vectors are not very strict in S-Plus.
- Being able to use the minus sign in indexing: beta[-positive].
- The "local"-ness of Mata functions within Mata functions: a local exists from the first call of Mata.
- Not the easiest language to debug when you don't know what you are doing (thanks to Statalist/Kit for push-starting me).
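For readers who don't write S, here is a small R illustration of the idioms in the list above that have no one-line Mata equivalent (the objects are made up for the example):

```r
# "Array of arrays": an R list indexed with double brackets
beta <- list()
beta[[3]] <- matrix(0, 2, 2)   # slots 1 and 2 are created as NULL automatically

# Indexing on the left-hand side, and the minus sign in indexing
b        <- c(1, 2, 3, 4)
positive <- c(1, 3)
b[positive] <- c(9, 9)         # assign through an index expression
b[-positive]                   # minus drops elements: returns b[c(2, 4)]

# "Joining" null matrices: rbind onto NULL grows a matrix row by row
m <- NULL
m <- rbind(m, c(1, 2))
m <- rbind(m, c(3, 4))

# Row/column vectors are not strict: R coerces the shape the context demands
v <- 1:3
t(v) %*% v                     # 1x3 times 3x1, coerced silently; Mata would not
```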

Stata command
LARS is very simple to use:
lars y, a(lar)
lars y, a(lasso)
lars y, a(stagewise)
Not everything in the S-Plus package is implemented, because I didn't have all the data required to test all the code.

Stata command

Graph output

Conclusions
- Mata could be a little easier to use.
- Translating S-Plus code is pretty simple.
- Least Angle Regression, the Lasso and Forward Stagewise are all very attractive algorithms, and certainly an improvement over stepwise selection.