
Proximal Methods for Sparse Hierarchical Dictionary Learning
Rodolphe Jenatton, Julien Mairal, Guillaume Obozinski, Francis Bach
Presented by Bo Chen, June 11, 2010

Outline
1. Structured Sparsity
2. Dictionary Learning
3. Sparse Hierarchical Dictionary Learning
4. Experimental Results

Structured Sparsity
- Lasso (R. Tibshirani, 1996)
- Group Lasso (M. Yuan & Y. Lin, 2006)
- Tree-Guided Group Lasso (Kim & Xing, 2009)

Tree-Guided Structure: Example
Tree regularization in the multi-task setting (Kim & Xing, 2009); the penalty is defined below.
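The slide's formula was an image; the following is a reconstruction of the tree-guided group lasso penalty from Kim & Xing (2009), with notation assumed from that paper. The tasks are the leaves of a tree with node set $V$; $G_v$ denotes the set of tasks under node $v$, and $\beta^{j}_{G_v}$ the coefficients of feature $j$ across those tasks:

$$\Omega(B) \;=\; \lambda \sum_{j=1}^{p} \sum_{v \in V} w_v \,\bigl\|\beta^{j}_{G_v}\bigr\|_2 .$$

Each node couples the tasks in its subtree, so closely related tasks tend to select the same features.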

Tree-Guided Structure: Penalty
Introduce two parameters per internal node, $s_v$ and $g_v = 1 - s_v$, weighting task-specific versus joint selection, and rewrite the penalty term, first for two tasks (K = 2) and then in general (Kim & Xing, 2009); see the reconstruction below.
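A reconstruction of the rewritten penalty, assuming the recursion of Kim & Xing (2009). For K = 2 tasks under a single internal node with weights $s$ and $g = 1 - s$:

$$\Omega(B) \;=\; \lambda \sum_{j=1}^{p} \Bigl( s\,\bigl(|\beta_{j1}| + |\beta_{j2}|\bigr) \;+\; g\,\bigl\|(\beta_{j1}, \beta_{j2})\bigr\|_2 \Bigr).$$

Generally, the per-feature contribution is built recursively from the leaves to the root:

$$W_j(v) \;=\; \begin{cases} |\beta_{jv}|, & v \text{ a leaf},\\[2pt] s_v \displaystyle\sum_{c \in \mathrm{Children}(v)} W_j(c) \;+\; g_v \,\bigl\|\beta^{j}_{G_v}\bigr\|_2, & v \text{ internal}, \end{cases} \qquad \Omega(B) \;=\; \lambda \sum_{j} W_j(v_{\mathrm{root}}).$$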

In Detail (Kim & Xing, 2009)

Some Definitions about Hierarchical Groups
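The definitions on this slide were graphical; a short summary, assuming the setup of the presented paper: each dictionary element is attached to a node of a tree $\mathcal{T}$, and each group collects a node together with all of its descendants,

$$\mathcal{G} \;=\; \{\, g_v : v \in \mathcal{T} \,\}, \qquad g_v \;=\; \{v\} \cup \mathrm{descendants}(v),$$

so that any two groups in $\mathcal{G}$ are either nested or disjoint.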

Hierarchical Sparsity-Inducing Norms
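A reconstruction of the hierarchical norm, following the paper: with weights $\eta_g > 0$ and $w_g$ the restriction of $w$ to group $g$,

$$\Omega(w) \;=\; \sum_{g \in \mathcal{G}} \eta_g \,\|w_g\|, \qquad \|\cdot\| \in \{\ell_2,\; \ell_\infty\}.$$

Because every group contains a node and all of its descendants, zeroing a group zeroes the whole subtree: a dictionary atom can be selected only if all of its ancestors in the tree are selected as well.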

Dictionary Learning
When structure information is introduced, dictionary learning differs from the group lasso in an important way:
1. The group lasso is a regression problem: each feature has a fixed physical meaning, so the imposed structure must be meaningful and correct; otherwise the 'structure' hurts the method.
2. In dictionary learning, the dictionary is unknown, so the structure information acts as a guide that shapes the learned, structured dictionary. The resulting objective is sketched below.
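A sketch of the sparse hierarchical dictionary learning objective, reconstructed from the paper (the slide's formula was an image); $\mathcal{D}$ denotes the constraint set for the dictionary, e.g. columns with norm at most one:

$$\min_{D \in \mathcal{D},\; A} \;\sum_{i=1}^{n} \Bigl[ \tfrac{1}{2}\,\bigl\|x_i - D\alpha_i\bigr\|_2^2 \;+\; \lambda\,\Omega(\alpha_i) \Bigr],$$

where $A = [\alpha_1, \ldots, \alpha_n]$ collects the sparse codes and $\Omega$ is the hierarchical norm above.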

Optimization: Proximal Operator for the Structured Norm
Fix the dictionary $D$; the objective in each code $\alpha$ becomes $\min_{\alpha} \frac{1}{2}\|x - D\alpha\|_2^2 + \lambda\,\Omega(\alpha)$. Each proximal-gradient step transforms this into a proximal problem, $\min_{\alpha} \frac{1}{2}\|\alpha - z\|_2^2 + \lambda'\,\Omega(\alpha)$, i.e. evaluating the proximal operator of the structured penalty; a sketch follows.
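A minimal sketch of the key computation, assuming $\ell_2$ norms on the groups: for tree-structured groups, a single pass of group soft-thresholding from the deepest nodes up to the root gives the exact proximal operator (the paper's central result). All names below are illustrative, not the authors' code.

```python
import numpy as np

def group_soft_threshold(v, idx, t):
    """Shrink the subvector v[idx] toward zero by t in l2 norm."""
    norm = np.linalg.norm(v[idx])
    if norm <= t:
        v[idx] = 0.0
    else:
        v[idx] *= 1.0 - t / norm
    return v

def hierarchical_prox(u, groups, weights, lam):
    """Compute prox of lam * sum_g weights[g] * ||.||_2 at u.

    `groups` is a list of index arrays ordered leaves-to-root,
    i.e. every group appears before any group containing it.
    """
    v = u.copy()
    for g, idx in enumerate(groups):
        v = group_soft_threshold(v, idx, lam * weights[g])
    return v

# Example: a depth-2 tree over 3 coefficients; root group {0, 1, 2}
# with child groups {1} and {2} (coefficient 0 sits at the root).
groups = [np.array([1]), np.array([2]), np.array([0, 1, 2])]
weights = [1.0, 1.0, 1.0]
print(hierarchical_prox(np.array([0.5, 2.0, -0.1]), groups, weights, lam=0.3))
```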

Learning the Dictionary
$D$ and $A$ are updated in alternation: $D$ is updated 5 times in each iteration, then $A$ is recomputed with the proximal scheme above; a sketch of one outer iteration follows.
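A hedged sketch of one alternating iteration, reusing hierarchical_prox from the previous block. The update of $A$ is one ISTA pass; the update of $D$ is block coordinate descent on its columns, projected onto the unit ball. Step sizes and the exact constraint set follow common practice rather than the slide.

```python
import numpy as np

def dictionary_learning_step(X, D, A, groups, weights, lam, d_updates=5):
    """One outer iteration: prox-gradient pass on A, then d_updates passes on D."""
    # ISTA update of the codes, using the hierarchical prox defined above.
    L = max(np.linalg.norm(D, 2) ** 2, 1e-8)   # Lipschitz constant of the fit term
    G = A - D.T @ (D @ A - X) / L              # gradient step on every code at once
    for i in range(A.shape[1]):
        A[:, i] = hierarchical_prox(G[:, i], groups, weights, lam / L)
    # Block coordinate descent on the columns of D, projected on the unit ball.
    for _ in range(d_updates):
        for k in range(D.shape[1]):
            ak = A[k, :]
            denom = ak @ ak
            if denom > 0:
                dk = (X - D @ A + np.outer(D[:, k], ak)) @ ak / denom
                D[:, k] = dk / max(np.linalg.norm(dk), 1.0)
    return D, A
```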

Experiments: Natural Image Patches
The dictionary learned on the training set is used to impute the missing values in the test samples. Each sample is an 8x8 patch; the training set contains 50,000 patches, with a held-out testing set. 21 balanced tree structures of depth 3 and 4 are tested, also varying the number of nodes in each layer.

Learned Hierarchical Dictionary

Experiments: Text Documents

Visualization of NIPS Proceedings
Documents: 1,714; vocabulary: 8,274 words.

Postings Classification
Documents: 1,425 (training: 1,000; testing: 425); vocabulary: 13,312 words.
Goal: classify postings from the two newsgroups alt.atheism and talk.religion.misc.