
An Efficient Algorithm for a Class of Fused Lasso Problems
Jun Liu, Lei Yuan, and Jieping Ye
Computer Science and Engineering, The Biodesign Institute, Arizona State University

Sparse Learning for Feature Selection
Given a data matrix (data points x features), solve
min_x loss(x) + penalty(x)
where loss(x) is a convex function defined on the set of training samples and penalty(x) encodes our prior assumption on the parameter x.

Outline
- Fused Lasso and Applications
- Proposed Algorithm
- Experiments
- Conclusion

The Fused Lasso Penalty (Tibshirani et al., 2005; Tibshirani and Wang, 2008; Friedman et al., 2007)
Lasso penalty: λ1 Σ_i |x_i|
Fused Lasso penalty: λ1 Σ_i |x_i| + λ2 Σ_i |x_i − x_{i−1}|
The fused lasso yields a solution that is sparse in both the parameters and their successive differences.
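For concreteness, a minimal Python sketch of the penalty's two terms (the function name is illustrative, not from the authors' code):

```python
import numpy as np

def fused_lasso_penalty(x, lam1, lam2):
    """lam1 * sum_i |x_i| + lam2 * sum_i |x_i - x_{i-1}|."""
    return lam1 * np.abs(x).sum() + lam2 * np.abs(np.diff(x)).sum()

# A piecewise-constant vector pays little in the second term
# even when few of its entries are exactly zero.
x = np.array([0.0, 0.0, 2.0, 2.0, 2.0, 0.0, -1.0, -1.0])
print(fused_lasso_penalty(x, lam1=0.1, lam2=1.0))
```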

ArrayCGH Data
arrayCGH: array-based Comparative Genomic Hybridization (Tibshirani and Wang, 2007)
ratio = (# DNA copies of the gene in the tumor cells) / (# DNA copies in the reference cells)
The copy number changes have a piecewise-constant shape. Copy number alterations take two forms:
1. large chromosome segment gain/loss
2. abrupt local amplification/deletion
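This piecewise-constant structure is exactly what the fused lasso penalty favors. A hedged sketch of a synthetic arrayCGH-like signal (segment locations and magnitudes are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic log ratios: mostly 0 (no alteration), with a broad segment
# gain and a narrow local amplification, plus measurement noise.
true_signal = np.zeros(500)
true_signal[100:200] = 0.6   # large chromosome segment gain
true_signal[320:325] = 2.0   # abrupt local amplification
observed = true_signal + 0.3 * rng.standard_normal(500)
```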

Unordered Data
Leukaemia data (Tibshirani et al., 2005): 7,129 genes and 38 samples; 27 in class 1 (acute lymphocytic leukaemia) and 11 in class 2 (acute myelogenous leukaemia).
Hierarchical clustering can be used to estimate an ordering of the genes, putting correlated genes near one another in the list (Tibshirani et al., 2005).
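One way to realize that reordering step, sketched with SciPy's hierarchical clustering (a small random stand-in for the expression matrix; the authors' exact procedure may differ):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 38))  # genes x samples stand-in

# Cluster the genes and take the dendrogram leaf order as the gene
# ordering, so correlated genes sit next to each other before the
# fused penalty is applied along the list.
order = leaves_list(linkage(X, method="average", metric="correlation"))
X_reordered = X[order]
```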

Outline
- Fused Lasso and Applications
- Proposed Algorithm
- Experiments
- Conclusion

The Fused Lasso Penalized Problem
min_x loss(x) + λ1 Σ_i |x_i| + λ2 Σ_i |x_i − x_{i−1}|
Smooth convex loss functions:
- Least squares loss (Tibshirani et al., 2005)
- Logistic loss (Ahmed and Xing, 2009)
The penalty is non-smooth and non-separable. The standard approach is a smooth reformulation (auxiliary variables and constraints) handed to a general solver (e.g., CVX).
"One difficulty in using the fused lasso is computational speed, …, speed could become a practical limitation. This is especially true if five or tenfold cross-validation is carried out." (Tibshirani et al., 2005)
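For reference, the general-solver baseline can be sketched in CVXPY (rather than the MATLAB CVX used in the slides; problem data here are random stand-ins):

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
A, b = rng.standard_normal((57, 200)), rng.standard_normal(57)
lam1, lam2 = 0.1, 0.5

# Least squares loss plus the fused lasso penalty, handed directly
# to a general-purpose convex solver.
x = cp.Variable(200)
obj = (0.5 * cp.sum_squares(A @ x - b)
       + lam1 * cp.norm1(x) + lam2 * cp.norm1(cp.diff(x)))
cp.Problem(cp.Minimize(obj)).solve()
```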

Efficient Fused Lasso Algorithm (EFLA)
The objective splits into a smooth part (the loss) and a non-smooth part (the fused lasso penalty).
- Smooth part: accelerated gradient descent (Nesterov, 2007; Beck and Teboulle, 2009), which has a convergence rate of O(1/k²) for k iterations.
- Non-smooth part: Fused Lasso Signal Approximator (FLSA; Friedman et al., 2007).
FLSA is called in each iteration of EFLA.
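The resulting loop has the familiar accelerated proximal gradient shape; a hedged sketch (the FLSA subproblem solver is assumed, and all names are illustrative):

```python
import numpy as np

def efla_sketch(grad, flsa_prox, x0, step, lam1, lam2, n_iter=100):
    """Accelerated proximal gradient in the style of EFLA.

    `grad` is the gradient of the smooth loss; `flsa_prox(v, a, b)` is
    assumed to solve min_x 0.5*||x - v||^2 + a*||x||_1 + b*sum|x_i - x_{i-1}|.
    """
    x, x_prev, t, t_prev = x0.copy(), x0.copy(), 1.0, 1.0
    for _ in range(n_iter):
        # Nesterov search point combining the two most recent iterates.
        s = x + ((t_prev - 1.0) / t) * (x - x_prev)
        # Gradient step on the smooth loss, then the FLSA proximal step.
        x_prev, x = x, flsa_prox(s - step * grad(s), step * lam1, step * lam2)
        t_prev, t = t, (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
    return x
```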

Subgradient Finding Algorithm (1)
Fused Lasso Signal Approximator (FLSA):
min_x (1/2) ||x − v||² + λ1 Σ_i |x_i| + λ2 Σ_i |x_i − x_{i−1}|
This reduces to the simplified problem with λ1 = 0: solve that problem first, then soft-threshold the result at λ1 (Friedman et al., 2007).
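In code, the reduction is a one-line soft-thresholding step (a sketch; `flsa_tv` stands for any solver of the λ1 = 0 problem):

```python
import numpy as np

def flsa(v, lam1, lam2, flsa_tv):
    """Full FLSA from the lam1 = 0 case (Friedman et al., 2007).

    `flsa_tv(v, lam2)` is assumed to solve
    min_x 0.5*||x - v||^2 + lam2 * sum |x_i - x_{i-1}|.
    """
    x = flsa_tv(v, lam2)
    return np.sign(x) * np.maximum(np.abs(x) - lam1, 0.0)  # soft-threshold
```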

Subgradient Finding Algorithm (2)
Primal (P): min_x (1/2) ||x − v||² + λ2 Σ_i |x_i − x_{i−1}|
Dual (D): min_z (1/2) ||v − Rᵀz||² subject to ||z||∞ ≤ λ2, where R is the (n−1) × n difference matrix.
Primal-dual relationship: x = v − Rᵀz. The duality gap between (P) and (D) measures how close an approximate solution is to optimality.
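Solving (D) by projected gradient gives the simplest SFA variant; a hedged sketch in the spirit of SFA_G (R and Rᵀ are applied via np.diff rather than formed explicitly):

```python
import numpy as np

def sfa_gradient(v, lam2, n_iter=200, z0=None):
    """Projected gradient on the dual of the (lam1 = 0) FLSA problem.

    Convention: (R x)_i = x_i - x_{i+1}, so R^T z is a padded difference
    of z, and projecting onto ||z||_inf <= lam2 is a clip.
    """
    z = np.zeros(v.size - 1) if z0 is None else z0.copy()
    step = 0.25  # 1/L; the largest eigenvalue of R R^T is below 4
    for _ in range(n_iter):
        x = v - np.diff(z, prepend=0.0, append=0.0)      # x = v - R^T z
        z = np.clip(z - step * np.diff(x), -lam2, lam2)  # gradient step + projection
    return z, v - np.diff(z, prepend=0.0, append=0.0)
```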

Subgradient Finding Algorithm (3)
Three variants for solving (D):
- SFA_G: SFA via gradient descent
- SFA_N: SFA via Nesterov's method
- SFA_R: SFA via the restart technique
On a dual problem where v has dimensionality n = 10⁵ and entries drawn from the standard normal distribution, SFA_G and SFA_N achieve duality gaps of 1e-2 and 1e-3 in 200 iterations, respectively. With the restart technique, we achieve the exact solution in 8 iterations.

Outline
- Fused Lasso and Applications
- Proposed Algorithm
- Experiments
- Conclusion

Illustration of SFA
v is of dimensionality n = 10⁵; its entries are drawn from the standard normal distribution.
[Figure: primal (P) and dual (D) objective values across iterations; the duality gap falls below 1e-12.]

Efficiency of SFA
We compare with the recently proposed pathFLSA (Höfling, 2009). SFA is much more efficient than pathFLSA (over 10 times faster in most cases).

Efficiency of EFLA (Comparison with the CVX Solver)
Data sets:
- arrayCGH: 57 samples of dimensionality 2,385
- Prostate cancer: 132 samples of dimensionality 15,154
- Leukaemia (unordered): 72 samples of dimensionality 7,129
- Leu_r (reordered via hierarchical clustering): 72 samples of dimensionality 7,129
EFLA: our proposed Efficient Fused Lasso Algorithm with the efficient SFA. CVX: smooth reformulation solved via CVX.

Classification Performance
- Significant performance improvement on arrayCGH
- Comparable results on the prostate cancer data set
- Hierarchical clustering can help improve the performance

SLEP: Sparse Learning with Efficient Projections
Liu, Ji, and Ye (2009). SLEP: A Sparse Learning Package.

Conclusion and Future Work
Contributions:
- An efficient algorithm for a class of fused Lasso problems
- A subgradient finding algorithm with a novel restart technique
Future work:
- Extend the algorithm to the multi-dimensional fused Lasso
- Apply the proposed algorithm to learning time-varying networks

SFA via the Restart Technique
With a start point z_0, we apply gradient descent to (D) to obtain an approximate solution z. Through the primal-dual relationship we form x = ω(z) and check the termination criterion; if it is not met, we compute a restart point z_0 from z and repeat.
Key observations:
1. With an approximate solution z, we can obtain an exact solution x = ω(z) to (P) if the support set S(z) is correct.
2. S(z) is much easier to obtain than z itself.
3. With x, we can compute a unique restart point z_0 satisfying x = v − Rᵀz_0, which may have significantly higher precision than z.
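Observation 3 unrolls into a cumulative sum; a hedged sketch under the same difference-matrix convention as the earlier SFA sketch (function name illustrative):

```python
import numpy as np

def restart_point(v, x):
    """Recover z0 from a primal estimate x via x = v - R^T z0.

    With (R^T z)_j = z_j - z_{j-1} (and z_0 = z_n = 0), the relation
    gives z0_j = sum_{i <= j} (v_i - x_i).
    """
    return np.cumsum(v - x)[:-1]
```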