
Auto-tuned high-dimensional regression with the TREX: theoretical guarantees and non-convex global optimization
Jacob Bien¹, Irina Gaynanova¹, Johannes Lederer¹, Christian L. Müller²,³
¹Cornell University, Ithaca; ²New York University; ³Simons Center for Data Analysis, New York

High-dimensional variable selection in linear regression

Standard approach: the LASSO (Tibshirani, 1996)
+ convex optimization problem
+ good statistical properties
- tuning of the regularization parameter required

Novel proposition: the TREX (Lederer and Müller, AAAI 2015)
+ good statistical properties
+ tuning-free method
- non-convex optimization problem

BUT…
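Neither estimator's formula is reproduced in the transcript; for reference, a sketch of the standard formulations (following Tibshirani, 1996, and Lederer & Müller, AAAI 2015; λ > 0 is the LASSO tuning parameter, a is a fixed constant, and the poster's own normalization may differ):

\[
\hat\beta_{\mathrm{LASSO}}(\lambda) \in \arg\min_{\beta \in \mathbb{R}^p} \Big\{ \tfrac{1}{2}\lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1 \Big\},
\qquad
\hat\beta_{\mathrm{TREX}} \in \arg\min_{\beta \in \mathbb{R}^p} \Big\{ \frac{\lVert y - X\beta \rVert_2^2}{a\,\lVert X^\top (y - X\beta) \rVert_\infty} + \lVert \beta \rVert_1 \Big\}.
\]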

The non-convex TREX objective function can be minimized globally by solving a family of Second-Order Cone Programs (SOCPs). 1.) The TREX (with, e.g., the constant a = 0.5) can be written as:
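The slide's equation is not part of the transcript; a sketch of what step 1 presumably states, using the fact that the sup-norm is a maximum over signed columns (x_j denotes the j-th column of X):

\[
\hat\beta_{\mathrm{TREX}} \in \arg\min_{\beta \in \mathbb{R}^p}
\Big\{ \frac{\lVert y - X\beta \rVert_2^2}{\tfrac{1}{2}\max_{j \le p,\, s \in \{\pm 1\}} s\, x_j^\top (y - X\beta)} + \lVert \beta \rVert_1 \Big\},
\]

i.e., with a = 0.5 the sup-norm in the denominator is written out as a maximum over the 2p sign–column pairs (j, s), which sets up the decomposition in steps 2 and 3.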

2.) For each index j this leads to a pair of problems of the form sketched below, 3.) or, in general, to 2p problems of the quadratic-over-linear form. Each such problem is a Second-Order Cone Program!
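The subproblem equations are likewise missing from the transcript; a sketch of the quadratic-over-linear family the steps describe (an assumption based on the decomposition above, not a verbatim reproduction of the slide):

\[
\min_{\beta \in \mathbb{R}^p} \; \frac{\lVert y - X\beta \rVert_2^2}{a\, s\, x_j^\top (y - X\beta)} + \lVert \beta \rVert_1
\quad \text{s.t.} \quad s\, x_j^\top (y - X\beta) \ge 0,
\qquad j = 1, \dots, p,\; s \in \{-1, +1\}.
\]

Each subproblem is convex: with an epigraph variable t, the constraint t · a s x_jᵀ(y − Xβ) ≥ ‖y − Xβ‖₂² is a rotated second-order cone constraint, so each of the 2p problems is an SOCP, and the global TREX minimum is the smallest of their optimal values.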

Phase transition of exact recovery with the TREX and the LASSO

Figure 1: Success probability P[S±(β̂) = S±(β*)] of obtaining the correct signed support versus the rescaled sample size θ(n, p, k) = n / [2k log(p − k)] for problem size p = 64 with sparsity k = ⌈0.2 p^0.75⌉. The number of repetitions is 50. The optimal a = 0.5 is used in the TREX; the lambda in the LASSO is automatically determined by MATLAB. Variable selection using the function-gap property (Fun TREX) is shown in red. (Axes: rescaled sample size θ(n, p, k) on the horizontal axis, probability of exact recovery on the vertical axis.)

Sketching the topology of the TREX

[Figure: panels sketching the TREX objective f_TREX over the solution space, with the function gap Δf between minima, for increasing sample size n.]

Consider the case where the data (p >> n) are generated from a linear model with a sparse β vector with k << p non-zero entries of equal absolute value. The topology of the objective function can be used as an alternative variable selection method.

How can we scale the TREX to BIG DATA?

Current solvers for the SOCP formulation:
+ ECOS solver (interior-point method)
+ SCS solver (ADMM scheme)

Current solvers for local minimization of the non-convex TREX function (smooth non-convex + L1):
+ Projected scaled sub-gradient method (Mark Schmidt's code)
+ Orthant-wise L-BFGS
+ Proximal gradient (Jason Lee's package)

ANY IDEA HOW TO SPEED THINGS UP?
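As a concrete illustration of the SOCP route above, here is a minimal sketch (not the poster's implementation) that solves the 2p quadratic-over-linear subproblems with CVXPY, using ECOS or SCS as the backend, and keeps the best solution. The function name trex_socp and the small lower bound on the denominator are illustrative assumptions; the objective follows the reconstructed subproblem form sketched earlier.

import numpy as np
import cvxpy as cp

def trex_socp(X, y, a=0.5):
    """Global TREX minimization via 2p convex subproblems,
    one per (column j, sign s) pair (illustrative sketch)."""
    n, p = X.shape
    best_val, best_beta = np.inf, None
    for j in range(p):
        for s in (1.0, -1.0):
            beta = cp.Variable(p)
            resid = y - X @ beta                  # affine in beta
            denom = a * s * (X[:, j] @ resid)     # affine in beta
            # quad_over_lin(resid, denom) = ||resid||_2^2 / denom, convex for denom > 0;
            # CVXPY compiles it to a rotated second-order cone constraint.
            obj = cp.quad_over_lin(resid, denom) + cp.norm1(beta)
            prob = cp.Problem(cp.Minimize(obj), [denom >= 1e-8])
            prob.solve(solver=cp.ECOS)            # or solver=cp.SCS for larger problems
            if prob.status in ("optimal", "optimal_inaccurate") and prob.value < best_val:
                best_val, best_beta = prob.value, beta.value
    return best_beta, best_val

# Example usage on simulated sparse data:
# rng = np.random.default_rng(0)
# X = rng.standard_normal((50, 100))
# beta_true = np.zeros(100); beta_true[:5] = 1.0
# y = X @ beta_true + 0.1 * rng.standard_normal(50)
# beta_hat, val = trex_socp(X, y)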