Learning Incoherent Sparse and Low-Rank Patterns from Multiple Tasks Jianhui Chen Computer Science & Engineering Arizona State University Joint work with Ji Liu and Jieping Ye
Learning Multiple Tasks Single Task Learning (STL): learn f1, f2, …, fm separately, one task at a time. Multi-Task Learning (MTL): learn f1, f2, …, fm simultaneously.
Outline Introduction Multi-Task Learning Formulations Projected Gradient Scheme Main Algorithms Experiments
Motivation What is the goal of multi-task learning? To improve the overall generalization performance. How can the performance be improved? By learning the tasks simultaneously and exploiting the relationships among them. When do we need multi-task learning? When there are a number of related tasks but the training data for each task is limited.
Introduction In the past decade, many approaches have been proposed for MTL. It has been studied from many perspectives: share hidden units of neural networks among similar tasks [Caruana'97; Baxter'00]; model task relatedness via a common prior distribution in hierarchical Bayesian models [Bakker'03; Schwaighofer'04; Yu'05; Zhang'05]; learn the parameters of the Gaussian process covariance from multiple tasks [Lawrence'04]; extend kernel methods and regularization networks to the MTL setting [Evgeniou'05]; learn a shared low-rank structure from multiple tasks [Ando'05; Chen'09]; employ trace-norm regularization for multi-task learning [Abernethy'09; Argyriou'08; Obozinski'09; Pong'09].
Applications MTL has been applied in many areas, such as bioinformatics [Ando'07], medical image analysis [Bi'08], web search ranking [Chapelle'10], and computer vision [Quattoni'07].
Outline Introduction Multi-Task Learning Formulations Projected Gradient Scheme Main Algorithms Experiments In this work, we propose a novel multi-task learning formulation.
Learning Multiple Tasks One linear classifier is learned for each of the m tasks (Task 1, Task 2, …, Task m); the classifiers are correlated via an underlying relationship among the tasks.
Sparse and Low-Rank Structure The task relationship can be captured as a low-rank structure (Argyriou'08; Pong'09). At the same time, multiple tasks may differ sufficiently from each other, and the discriminative features can be sparse, which corresponds to a sparse structure. Both low-rank and sparse structures are therefore desirable.
Sparse and Low-Rank Structure Incoherent sparse and low-rank structure: Transformation Matrix = Sparse Component + Low-Rank Component. Based on this assumption, we can induce the incoherent structure in the transformation matrix Z.
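Written out, the decomposition on this slide reads as below (the symbols P and Q for the sparse and low-rank components are notation chosen here for illustration):

```latex
% Incoherent sparse and low-rank decomposition of the transformation matrix Z,
% whose columns are the weight vectors of the m task-specific linear classifiers.
Z \;=\; \underbrace{P}_{\text{sparse component}} \;+\; \underbrace{Q}_{\text{low-rank component}}
```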
Sparse and Low-Rank Structure Illustration with face images: the low-rank component captures the rough shape of the faces, while the sparse component captures the detailed facial marks.
Multi-Task Learning Formulation The proposed MTL formulation combines a smooth convex loss with a sparse structure and a low-rank structure, which together form the incoherent structure. The proposed formulation is non-convex; the problem is NP-hard.
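A sketch of the non-convex formulation described above, assuming the decomposition Z = P + Q from the previous slides; the exact placement of the two structure-inducing terms (penalty versus constraint) and the symbols γ, τ are one common way of writing it and may differ from the paper:

```latex
% Non-convex formulation: smooth convex loss L(Z), cardinality (l0) penalty on the
% sparse component P, rank constraint on the low-rank component Q.
\min_{Z,\,P,\,Q}\; \mathcal{L}(Z) + \gamma\,\|P\|_0
\quad \text{s.t.}\quad Z = P + Q,\;\; \operatorname{rank}(Q) \le \tau
% Both \|P\|_0 and rank(Q) are non-convex, hence the NP-hardness noted above.
```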
Convex Envelope We consider a convex relaxation via substitution of the non-convex terms. Each non-convex term is replaced by its convex envelope, the tightest convex function that approximates the non-convex function from below; the convex envelope can be seen as the best convex approximation of the original function.
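The standard convex-envelope facts behind the substitution (stated here for reference; the domain scaling used in the paper may differ):

```latex
% On the box {P : \|P\|_\infty \le 1}, the convex envelope of the cardinality \|P\|_0
% is the l1 norm \|P\|_1.  On the spectral-norm ball {Q : \|Q\|_2 \le 1}, the convex
% envelope of rank(Q) is the trace norm \|Q\|_* (the sum of singular values).
\|P\|_0 \;\longrightarrow\; \|P\|_1,
\qquad
\operatorname{rank}(Q) \;\longrightarrow\; \|Q\|_* = \sum_i \sigma_i(Q)
```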
Convex Formulation A convex relaxation is obtained by substituting the convex envelopes. The resulting formulation is convex, non-smooth, and constrained. It can be reformulated as a semidefinite program, but that route is practical only for small problems; this limitation further motivates us to develop efficient algorithms.
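Under the same notational assumptions as the non-convex sketch above, the convex relaxation reads:

```latex
% Convex relaxation: l1 norm in place of the cardinality, trace norm in place of the rank.
% The objective is convex but non-smooth, and the trace-norm constraint makes the
% problem constrained; it can be cast as a semidefinite program.
\min_{Z,\,P,\,Q}\; \mathcal{L}(Z) + \gamma\,\|P\|_1
\quad \text{s.t.}\quad Z = P + Q,\;\; \|Q\|_* \le \tau
```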
Outline Introduction Multi-Task Learning Formulations Projected Gradient Scheme Main Algorithms Experiments We propose to apply the Projected Gradient Scheme to solve the convex relaxation.
Projected Gradient Scheme A compact form of the convex relaxation consists of a smooth term, a non-smooth term, and a convex domain set. The projected gradient scheme finds the optimal solution T* via a sequence of iterates.
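One way to read the compact form, assuming T collects the optimization variables, f is the smooth loss, g the non-smooth l1 term, and M the trace-norm ball:

```latex
% Compact form: smooth term f, non-smooth term g, convex domain set M.
\min_{T \in \mathcal{M}} \; f(T) + g(T),
\qquad
T = (P, Q),\quad g(T) = \gamma\,\|P\|_1,\quad
\mathcal{M} = \{\, (P, Q) : \|Q\|_* \le \tau \,\}
```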
Projected Gradient Scheme The key component of the projected gradient scheme computes the next solution point from the current searching point using a suitable step size. If g(T) = 0 and M is the entire Euclidean space, the update reduces to a standard gradient descent step.
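The key per-iteration component in its standard projected (proximal) gradient form, with searching point Sk and step size 1/Lk (a sketch of the generic scheme, not the paper's exact notation):

```latex
% Minimize the quadratic upper model of f at the searching point S_k, plus g, over M:
T_{k+1} = \arg\min_{T \in \mathcal{M}}
  \Big\{ f(S_k) + \langle \nabla f(S_k),\, T - S_k \rangle
         + \tfrac{L_k}{2}\,\|T - S_k\|_F^2 + g(T) \Big\}
        = \arg\min_{T \in \mathcal{M}}
  \Big\{ \tfrac{L_k}{2}\,\big\|T - \big(S_k - \tfrac{1}{L_k}\nabla f(S_k)\big)\big\|_F^2 + g(T) \Big\}.
% If g(T) = 0 and M is the whole space, this reduces to the plain gradient step
% T_{k+1} = S_k - (1/L_k) \nabla f(S_k).
```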
Projected Gradient Scheme For the convex relaxation, the key component admits a closed-form solution: it is solved via an SVD followed by a projection, as sketched below.
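Reading "an SVD + a projection" as the Euclidean projection of the low-rank block onto the trace-norm ball, a minimal numpy sketch is given below (function names are illustrative, not from the paper; the sparse block's l1 term would be handled separately, e.g. by soft-thresholding):

```python
import numpy as np

def project_onto_l1_ball(v, tau):
    """Euclidean projection of a non-negative vector v onto {u >= 0 : sum(u) <= tau}."""
    if v.sum() <= tau:
        return v
    # Sort descending and find the soft-threshold level (standard l1-ball projection).
    u = np.sort(v)[::-1]
    cssv = np.cumsum(u) - tau
    rho = np.nonzero(u - cssv / (np.arange(len(u)) + 1.0) > 0)[0][-1]
    theta = cssv[rho] / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def project_onto_trace_norm_ball(Q, tau):
    """Euclidean projection of Q onto {Q : ||Q||_* <= tau}: an SVD plus a projection
    of the singular values."""
    U, s, Vt = np.linalg.svd(Q, full_matrices=False)
    return (U * project_onto_l1_ball(s, tau)) @ Vt
```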
Outline Introduction Multi-Task Learning Formulations Projected Gradient Scheme Main Algorithms Experiments
Main Algorithms Main components of the algorithms: find the appropriate step size 1/Li via line search, and solve the per-iteration optimization (see the sketch below). Projected gradient algorithm (PG): set Si = Ti; attains the convergence rate O(1/k). Accelerated projected gradient algorithm (AG): set Si = (1 + αi)Ti - αi Ti-1; attains the convergence rate O(1/k²).
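A minimal numpy sketch of the two loops described above (illustrative, not the paper's implementation; f, grad_f, and project are assumed callables for the smooth loss, its gradient, and the non-smooth/constraint step, such as the SVD-based projection sketched earlier):

```python
import numpy as np

def projected_gradient(f, grad_f, project, T0, L0=1.0, eta=2.0, max_iter=200,
                       accelerated=False):
    """Generic PG / AG loop: gradient step on the smooth part f, then `project`
    handles the non-smooth term and the constraint set.  The step size 1/L_i is
    found by backtracking line search.  PG converges at O(1/k), AG at O(1/k^2)."""
    T_prev, T = T0.copy(), T0.copy()
    L, t_prev, t = L0, 1.0, 1.0
    for _ in range(max_iter):
        # Searching point: S_i = T_i for PG; S_i = T_i + alpha_i (T_i - T_{i-1}) for AG.
        S = T + ((t_prev - 1.0) / t) * (T - T_prev) if accelerated else T
        g, fS = grad_f(S), f(S)
        while True:  # line search: increase L until the quadratic model upper-bounds f
            T_new = project(S - g / L)
            d = T_new - S
            if f(T_new) <= fS + np.vdot(g, d) + 0.5 * L * np.vdot(d, d):
                break
            L *= eta
        T_prev, T = T, T_new
        t_prev, t = t, 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
    return T
```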
Outline Introduction Multi-Task Learning Formulations Projected Gradient Scheme Main Algorithms Experiments
Performance Evaluation Key observations: the incoherent structure improves performance; the sparse structure is effective on multimedia data; the low-rank structure is important on image and gene data.
Efficiency Comparison AG is more efficient than PG for solving the proposed MTL formulation
Conclusion and Future Work Main contributions: propose the MTL formulation and the efficient algorithms; conduct experiments for demonstration. Future work: conduct theoretical analysis of the MTL formulation; apply the MTL formulation to real-world applications.