An Efficient Algorithm for a Class of Fused Lasso Problems
Jun Liu, Lei Yuan, and Jieping Ye
Computer Science and Engineering, The Biodesign Institute, Arizona State University
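Sparse Learning for Feature Selection
Given a data matrix whose rows are data points and whose columns are features, we solve
$\min_x \; \text{loss}(x) + \text{penalty}(x)$,
where loss(x) is a convex function defined on a set of training samples and penalty(x) encodes our prior assumption on the parameter x.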
Outline: Fused Lasso and Applications; Proposed Algorithm; Experiments; Conclusion
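The Fused Lasso Penalty (Tibshirani et al., 2005; Tibshirani and Wang, 2008; Friedman et al., 2007)
Lasso: $\lambda_1 \sum_{i=1}^{n} |x_i|$
Fused Lasso: $\lambda_1 \sum_{i=1}^{n} |x_i| + \lambda_2 \sum_{i=1}^{n-1} |x_i - x_{i+1}|$
The fused Lasso yields a solution that is sparse in both the parameters and their successive differences.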
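As a quick illustration, the sketch below evaluates the fused Lasso penalty $\lambda_1 \sum_i |x_i| + \lambda_2 \sum_i |x_i - x_{i+1}|$ for a vector that is both sparse and piecewise constant. The function name `fused_lasso_penalty` is hypothetical, not taken from the authors' code.

```python
import numpy as np

def fused_lasso_penalty(x, lam1, lam2):
    """lam1 * sum_i |x_i| + lam2 * sum_i |x_i - x_{i+1}| (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    sparsity = np.sum(np.abs(x))            # Lasso term: sparsity in the parameters
    fusion = np.sum(np.abs(np.diff(x)))     # fusion term: sparsity in successive differences
    return lam1 * sparsity + lam2 * fusion

# A sparse, piecewise constant vector: only two nonzero successive differences.
x = np.array([0.0, 0.0, 2.0, 2.0, 2.0, 0.0])
print(fused_lasso_penalty(x, lam1=1.0, lam2=0.5))  # 1.0 * 6 + 0.5 * 4 = 8.0
```

Because the fusion term only charges for jumps, a vector with long constant runs pays little for it, which is what drives the solution toward piecewise constant shapes.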
ArrayCGH Data
arrayCGH: array-based Comparative Genomic Hybridization (Tibshirani and Wang, 2007).
ratio = (# DNA copies of the gene in the tumor cells) / (# DNA copies in the reference cells)
Copy number changes have a piecewise constant shape. Copy number alterations include: 1. gain/loss of large chromosome segments; 2. abrupt local amplification/deletion.
Unordered Data
Leukaemia data (Tibshirani et al., 2005): 7129 genes and 38 samples, 27 in class 1 (acute lymphocytic leukaemia) and 11 in class 2 (acute myelogenous leukaemia).
Hierarchical clustering can be used to estimate an ordering of the genes, putting correlated genes near one another in the list (Tibshirani et al., 2005).
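To show how hierarchical clustering yields a gene ordering, here is a naive average-linkage sketch using 1 minus the Pearson correlation as the distance, reading the order off the leaves of the dendrogram. The function `cluster_order`, the choice of average linkage, and the correlation distance are our assumptions, not the settings used by Tibshirani et al.; for real data a library routine such as SciPy's `scipy.cluster.hierarchy` would be the practical choice.

```python
import numpy as np

def cluster_order(X):
    """Order the columns (genes) of X (samples x genes) by naive average-linkage
    hierarchical clustering on 1 - Pearson correlation (hypothetical helper).
    Returns the dendrogram leaf order, without any optimal leaf reordering."""
    D = 1.0 - np.corrcoef(X, rowvar=False)        # gene-by-gene distance matrix
    clusters = [[j] for j in range(X.shape[1])]   # each cluster keeps its ordered leaves
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # average linkage: mean pairwise distance between the two clusters
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]        # concatenation keeps merged genes adjacent
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return clusters[0]

g0 = np.array([1.0, 2.0, 3.0, 4.0])
g1 = np.array([4.0, 1.0, 3.0, 2.0])
X = np.column_stack([g0, g1, 2 * g0, g1 + 1])     # gene 2 tracks gene 0, gene 3 tracks gene 1
order = cluster_order(X)
print(order)  # correlated genes end up next to each other
```

The cubic-time loop is only meant to make the merge rule explicit; it is far too slow for 7129 genes.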
The Fused Lasso Penalized Problem
$\min_x \; \text{loss}(x) + \lambda_1 \sum_{i=1}^{n} |x_i| + \lambda_2 \sum_{i=1}^{n-1} |x_i - x_{i+1}|$
Smooth convex loss functions: least squares loss (Tibshirani, 2005); logistic loss (Ahmed and Xing, 2009).
The fused Lasso penalty is non-smooth and non-separable. A common approach is a smooth reformulation (auxiliary variables and constraints) solved with a general solver (e.g., CVX).
“One difficulty in using the fused lasso is computational speed, …, speed could become a practical limitation. This is especially true if five or tenfold cross-validation is carried out.” (Tibshirani et al., 2005)
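A gradient-based solver only needs the value and gradient of the smooth loss, so here is a minimal sketch of both losses named above. The function names, the 1/2 scaling of the least squares loss, and the {-1, +1} label coding for the logistic loss are our assumptions.

```python
import numpy as np

def least_squares_loss(x, A, y):
    """0.5 * ||A x - y||^2 and its gradient A^T (A x - y)."""
    r = A @ x - y
    return 0.5 * r @ r, A.T @ r

def logistic_loss(x, A, y):
    """sum_i log(1 + exp(-y_i * a_i^T x)) for labels y_i in {-1, +1}, and its gradient."""
    m = y * (A @ x)                          # margins
    loss = np.sum(np.logaddexp(0.0, -m))     # numerically stable log(1 + exp(-m))
    w = np.exp(-np.logaddexp(0.0, m))        # numerically stable 1 / (1 + exp(m))
    grad = A.T @ (-y * w)
    return loss, grad

A = np.eye(2)
y = np.array([1.0, -1.0])
x0 = np.zeros(2)
print(least_squares_loss(x0, A, y))
print(logistic_loss(x0, A, y))
```

Both losses are smooth with Lipschitz-continuous gradients, which is exactly what an accelerated gradient method requires; all the non-smoothness is left to the fused Lasso penalty.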
Efficient Fused Lasso Algorithm (EFLA)
The objective splits into a smooth part, loss(x), and a non-smooth part, the fused Lasso penalty. EFLA applies accelerated gradient descent (Nesterov, 2007; Beck and Teboulle, 2009), which converges at rate $O(1/k^2)$ after k iterations. In each iteration, EFLA calls the Fused Lasso Signal Approximator (FLSA, Friedman et al., 2007) to handle the non-smooth part.
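The outer loop can be sketched as a FISTA-style accelerated proximal gradient method (Beck and Teboulle, 2009) that takes the proximal operator of the penalty as a callback. In EFLA that callback would be an FLSA solver; to keep this sketch self-contained we plug in soft-thresholding instead, which is the proximal operator only in the special case $\lambda_2 = 0$. The function name `accelerated_proximal_gradient` and its arguments are hypothetical.

```python
import numpy as np

def accelerated_proximal_gradient(grad, prox, L, x0, n_iter=500):
    """Sketch of FISTA for min_x f(x) + g(x).

    grad(x): gradient of the smooth loss f; L: a Lipschitz constant of grad.
    prox(u, t): argmin_x 0.5 * ||x - u||^2 + t * g(x) (an FLSA problem in EFLA).
    """
    x_prev = x0.copy()
    s = x0.copy()                                     # extrapolated search point
    t_prev = 1.0
    for _ in range(n_iter):
        x = prox(s - grad(s) / L, 1.0 / L)            # gradient step on f, proximal step on g
        t = (1.0 + np.sqrt(1.0 + 4.0 * t_prev ** 2)) / 2.0
        s = x + ((t_prev - 1.0) / t) * (x - x_prev)   # Nesterov extrapolation
        x_prev, t_prev = x, t
    return x_prev

# Usage in the lambda_2 = 0 special case, where the prox is soft-thresholding.
A = np.diag([2.0, 1.0])
y = np.array([4.0, 3.0])
lam1 = 1.0

def grad(x):
    return A.T @ (A @ x - y)                          # least squares loss 0.5 * ||Ax - y||^2

def prox(u, t):
    return np.sign(u) * np.maximum(np.abs(u) - lam1 * t, 0.0)

L = np.linalg.norm(A, 2) ** 2                         # largest eigenvalue of A^T A
x_hat = accelerated_proximal_gradient(grad, prox, L, np.zeros(2))
print(x_hat)  # close to [1.75, 2.0]
```

Keeping the prox as a callback mirrors the structure on the slide: the accelerated scheme is unchanged, and all fused-Lasso-specific work happens inside the per-iteration FLSA call.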
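Subgradient Finding Algorithm (1)
Fused Lasso Signal Approximator (FLSA): $\min_x \frac{1}{2}\|x - v\|_2^2 + \lambda_1 \sum_{i=1}^{n} |x_i| + \lambda_2 \sum_{i=1}^{n-1} |x_i - x_{i+1}|$
Simplified problem ($\lambda_1 = 0$): $\min_x \frac{1}{2}\|x - v\|_2^2 + \lambda_2 \sum_{i=1}^{n-1} |x_i - x_{i+1}|$. The FLSA solution for $\lambda_1 > 0$ follows by soft-thresholding the solution of the simplified problem at $\lambda_1$ (Friedman et al., 2007).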
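The reduction from FLSA to the simplified problem can be sketched in a few lines, assuming the simplified problem is the $\lambda_1 = 0$ case and using the soft-thresholding result of Friedman et al. (2007). The helper names are hypothetical. For v = (0, 0, 10, 10) and $\lambda_2 = 1$, the $\lambda_1 = 0$ solution (0.5, 0.5, 9.5, 9.5) was worked out by hand: each constant block is pulled $\lambda_2/2$ toward the other.

```python
import numpy as np

def flsa_from_fused_only(x_fused, lam1):
    """Soft-threshold the lambda_1 = 0 FLSA solution at lam1 (hypothetical helper)."""
    return np.sign(x_fused) * np.maximum(np.abs(x_fused) - lam1, 0.0)

def flsa_objective(x, v, lam1, lam2):
    """0.5 * ||x - v||^2 + lam1 * ||x||_1 + lam2 * sum_i |x_i - x_{i+1}|."""
    return (0.5 * np.sum((x - v) ** 2) + lam1 * np.sum(np.abs(x))
            + lam2 * np.sum(np.abs(np.diff(x))))

v = np.array([0.0, 0.0, 10.0, 10.0])
x_fused = np.array([0.5, 0.5, 9.5, 9.5])     # lambda_1 = 0 solution for lam2 = 1 (by hand)
x = flsa_from_fused_only(x_fused, lam1=1.0)
print(x)  # values 0, 0, 8.5, 8.5
```

This is why only the simplified problem needs a dedicated solver: the $\lambda_1$ part costs one elementwise pass.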
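Subgradient Finding Algorithm (2)
Let $R \in \mathbb{R}^{(n-1) \times n}$ be the difference matrix with $(Rx)_i = x_i - x_{i+1}$.
(P) primal: $\min_x \frac{1}{2}\|x - v\|_2^2 + \lambda_2 \|Rx\|_1$
(D) dual: $\min_{z:\, \|z\|_\infty \le \lambda_2} \psi(z) = \frac{1}{2}\|R^T z\|_2^2 - \langle R^T z, v \rangle$
Primal-dual relationship: $x = v - R^T z$
Duality gap: $\lambda_2 \|Rx\|_1 - \langle z, Rx \rangle$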
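A minimal sketch of the primal-dual pair for the $\lambda_1 = 0$ problem, assuming (P) is $\min_x \frac12\|x - v\|^2 + \lambda\|Rx\|_1$ with $(Rx)_i = x_i - x_{i+1}$ and (D) is $\min_{\|z\|_\infty \le \lambda} \frac12\|R^T z\|^2 - \langle R^T z, v\rangle$. It applies $R$ and $R^T$ without forming the matrix and computes the duality gap via $x = v - R^T z$. All names are hypothetical.

```python
import numpy as np

def R_op(x):
    """(Rx)_i = x_i - x_{i+1}: the (n-1) x n difference matrix applied to x."""
    return x[:-1] - x[1:]

def RT_op(z):
    """Adjoint: (R^T z)_j = z_j - z_{j-1}, with z padded by zeros at both ends."""
    out = np.zeros(z.size + 1)
    out[:-1] += z
    out[1:] -= z
    return out

def primal_objective(x, v, lam):
    return 0.5 * np.sum((x - v) ** 2) + lam * np.sum(np.abs(R_op(x)))

def dual_objective(z, v):
    """psi(z) = 0.5 * ||R^T z||^2 - <R^T z, v>, minimized over |z_i| <= lam."""
    rz = RT_op(z)
    return 0.5 * rz @ rz - rz @ v

def duality_gap(z, v, lam):
    """P(x) + psi(z) with x = v - R^T z; nonnegative for feasible z."""
    x = v - RT_op(z)                         # primal-dual relationship
    return primal_objective(x, v, lam) + dual_objective(z, v)

v = np.array([0.0, 0.0, 10.0, 10.0])
print(duality_gap(np.zeros(3), v, 1.0))                  # 10.0 at the starting point z = 0
print(duality_gap(np.array([-0.5, -1.0, -0.5]), v, 1.0)) # 0.0 at the optimum
```

Expanding P(x) + ψ(z) with x = v - R^T z gives exactly $\lambda\|Rx\|_1 - \langle z, Rx\rangle$, which is nonnegative whenever every $|z_i| \le \lambda$; that makes it a cheap stopping criterion.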
Subgradient Finding Algorithm (3)
Setup: (D) is solved for v of dimensionality n = 10^5, with entries drawn from the standard normal distribution.
SFA_G: SFA via gradient descent; SFA_N: SFA via Nesterov's method; SFA_R: SFA via the restart technique.
SFA_G and SFA_N achieve duality gaps of 1e-2 and 1e-3, respectively, in 200 iterations. With the restart technique, SFA_R obtains the exact solution in 8 iterations.
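SFA via gradient descent can be sketched as projected gradient descent on the box-constrained dual $\min_{\|z\|_\infty \le \lambda} \frac12\|R^T z\|^2 - \langle R^T z, v\rangle$. We assume a fixed step of 1/4, which is safe because the eigenvalues of $RR^T$ are $2 - 2\cos(k\pi/n) < 4$. The smallest eigenvalue shrinks roughly like $\pi^2/n^2$, which helps explain why plain gradient steps are slow at n = 10^5. The name `sfa_gradient` is hypothetical; this is not the authors' implementation.

```python
import numpy as np

def R_op(x):
    """(Rx)_i = x_i - x_{i+1}."""
    return x[:-1] - x[1:]

def RT_op(z):
    """(R^T z)_j = z_j - z_{j-1}, with zero padding at both ends."""
    out = np.zeros(z.size + 1)
    out[:-1] += z
    out[1:] -= z
    return out

def sfa_gradient(v, lam, n_iter=1000):
    """Projected gradient on (D): min 0.5*||R^T z||^2 - <R^T z, v> s.t. |z_i| <= lam."""
    z = np.zeros(v.size - 1)
    step = 0.25                      # 1/4 <= 1/L, since the eigenvalues of R R^T are below 4
    for _ in range(n_iter):
        x = v - RT_op(z)             # primal point from the primal-dual relationship
        # gradient of psi is R R^T z - R v = -R x: take the step, then project onto the box
        z = np.clip(z + step * R_op(x), -lam, lam)
    return v - RT_op(z), z

x, z = sfa_gradient(np.array([0.0, 0.0, 10.0, 10.0]), lam=1.0)
print(np.round(x, 6))  # approximately [0.5, 0.5, 9.5, 9.5]
```

Projection onto the box is a simple clip, so each iteration costs O(n); the difficulty is the number of iterations, not their cost.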
Illustration of SFA
v has dimensionality n = 10^5, with entries drawn from the standard normal distribution.
[Figure: objective values of (P) and (D); annotation: less than 1e-12]
Efficiency of SFA
We compare SFA with the recently proposed pathFLSA (Hofling, 2009). SFA is much more efficient than pathFLSA, over 10 times faster in most cases.
Efficiency of EFLA (Comparison with the CVX solver)
Data sets: arrayCGH, 57 samples of dimensionality 2,385; prostate cancer, 132 samples of dimensionality 15,154; Leukaemia (unordered), 72 samples of dimensionality 7,129; Leu_r, the same Leukaemia samples with the genes reordered via hierarchical clustering.
EFLA: our proposed Efficient Fused Lasso Algorithm, with the efficient SFA. CVX: smooth reformulation solved via CVX.
Classification Performance
Significant performance improvement on the arrayCGH data set; comparable results on the prostate cancer data set; hierarchical clustering can help improve the performance.
SLEP: Sparse Learning with Efficient Projections
Liu, Ji, and Ye (2009). SLEP: A Sparse Learning Package.
Conclusion and Future Work
Contributions: an efficient algorithm for a class of fused Lasso problems; a subgradient finding algorithm with a novel restart technique.
Future work: extend the algorithm to the multi-dimensional fused Lasso; apply the proposed algorithm to learning time-varying networks.
SFA via the Restart Technique
Starting from a point $z_0$, we apply gradient descent to (D) to get an approximate solution $z$, and map it to the primal via the primal-dual relationship. If the termination criterion holds, we stop; otherwise we compute a restart point $z_0$ from $z$ and repeat.
Key observations:
1. With an approximate solution $z$, we can obtain an exact solution $x = \omega(z)$ to (P) if the support set $S(z)$ is correct.
2. $S(z)$ is much easier to obtain than $z$.
3. With $x$, we can compute a unique restart point $z_0$ satisfying $x = v - R^T z_0$, which may have significantly higher precision than $z$.
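The three observations can be turned into a small sketch. We assume $S(z) = \{i : |z_i| = \lambda\}$ marks where $x$ may jump, so $x = \omega(z)$ is piecewise constant between those indices. Summing $x = v - R^T z$ over a segment telescopes, which gives each segment's value from $v$ and the boundary duals $\pm\lambda$. The restart point is then the unique solution of $R^T z_0 = v - x$, a cumulative sum. The helpers `omega` and `sfa_restart`, the inner iteration count, and the duality-gap stopping rule are our assumptions, not necessarily the authors' choices.

```python
import numpy as np

def R_op(x):
    return x[:-1] - x[1:]                    # (Rx)_i = x_i - x_{i+1}

def RT_op(z):
    out = np.zeros(z.size + 1)               # (R^T z)_j = z_j - z_{j-1}, zero-padded
    out[:-1] += z
    out[1:] -= z
    return out

def gap(z, v, lam):
    Rx = R_op(v - RT_op(z))
    return lam * np.sum(np.abs(Rx)) - z @ Rx

def omega(z, v, lam):
    """Hypothetical x = omega(z): piecewise constant x with jumps only at S(z) = {i: |z_i| = lam}."""
    n = v.size
    S = np.flatnonzero(np.abs(z) >= lam)
    x = np.empty(n)
    start = 0
    for end in list(S) + [n - 1]:
        # boundary duals: +-lam at a jump, 0 at either end of the signal
        z_left = lam * np.sign(z[start - 1]) if start > 0 else 0.0
        z_right = lam * np.sign(z[end]) if end < n - 1 else 0.0
        # summing x = v - R^T z over [start, end] telescopes to sum(v) - z_right + z_left
        x[start:end + 1] = (v[start:end + 1].sum() - z_right + z_left) / (end - start + 1)
        start = end + 1
    return x

def sfa_restart(v, lam, inner=20, max_restarts=50, tol=1e-10):
    z = np.zeros(v.size - 1)
    for _ in range(max_restarts):
        for _ in range(inner):               # plain projected gradient steps on (D)
            z = np.clip(z + 0.25 * R_op(v - RT_op(z)), -lam, lam)
        x = omega(z, v, lam)
        # unique z0 with R^T z0 = v - x (sum(v - x) = 0 by construction), kept feasible
        z = np.clip(np.cumsum(v - x)[:-1], -lam, lam)
        if gap(z, v, lam) <= tol:            # exact up to rounding once S(z) is right
            break
    return v - RT_op(z), z

x, z = sfa_restart(np.array([0.0, 0.0, 10.0, 10.0]), lam=1.0)
print(x)  # [0.5, 0.5, 9.5, 9.5] after a single restart
```

The point of the design is that the gradient steps only need to identify which dual entries sit on the bound; once they do, the restart recovers the solution to machine precision instead of waiting for the slow gradient iterations to converge.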