Introduction to Several Works and Some Ideas
Songcan Chen
2012.9.4

Outline
- Introduction to several works
- Some ideas from sparsity-aware modeling

Introduction to several works
1. A Least-Squares Framework for Component Analysis (CA) [1]
2. On the convexity of log-sum-exp functions with positive definite matrices [2]

Some ideas
- Motivated by the CA framework [1]
- Motivated by Log-Sum-Exp [2]
- Motivated by sparsity-aware modeling [3-4]

CA framework

[1] proposes a unified least-squares framework, called least-squares weighted kernel reduced-rank regression (LS-WKRRR), that formulates many CA methods; as a result, PCA, LDA, CCA, SC, LE and their kernel versions become its special cases.
LS-WKRRR's benefits:
(1) it provides a clean connection between many CA techniques;
(2) it yields efficient numerical schemes to solve CA techniques;
(3) it overcomes the small sample size (SSS) problem;
(4) it provides a framework to easily extend CA methods, for example to weighted generalizations of PCA, LDA, SC, and CCA, and to several new CA techniques.

The LS-WKRRR problem minimizes the following expression:

E(A, B) = ||W_r (Γ − B A^T Υ) W_c||_F^2    (1)

where A and B are the factors, W_r and W_c are the (row and column) weights, and Υ and Γ are the (possibly kernel-mapped) data matrices.

Solutions to A and B: the optimal A is obtained from a generalized eigenvalue problem (GEP), and the optimal B then follows in closed form from A.

Computational aspects: subspace iteration; alternated least squares (ALS); gradient descent and second-order methods. Important to notice: both the ALS and the gradient-based algorithms effectively deal with the SSS problem, unlike methods that directly solve the GEP.
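A minimal ALS sketch for the unweighted case (W_r = W_c = I) of (1); the function and variable names are illustrative, not from [1]:

```python
import numpy as np

def ls_wkrrr_als(Gamma, Upsilon, k, n_iter=100, seed=0):
    """ALS sketch for min_{A,B} ||Gamma - B A^T Upsilon||_F^2 (unweighted case).

    Gamma:   d_out x n output data matrix
    Upsilon: d_in  x n input data matrix
    k:       target rank (number of components)
    """
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((Upsilon.shape[0], k))    # factor A (d_in x k)
    for _ in range(n_iter):
        Z = A.T @ Upsilon                              # k x n projected inputs
        # Fix A, solve for B: B = Gamma Z^T (Z Z^T)^-1
        B = Gamma @ Z.T @ np.linalg.pinv(Z @ Z.T)
        # Fix B, solve for A^T: A^T = (B^T B)^-1 B^T Gamma Upsilon^T (Upsilon Upsilon^T)^-1
        At = np.linalg.pinv(B.T @ B) @ B.T @ Gamma @ Upsilon.T @ np.linalg.pinv(Upsilon @ Upsilon.T)
        A = At.T
    return A, B
```

For instance, calling it with Γ = Υ = D yields a rank-k, PCA-like reconstruction of D, consistent with the special cases listed on the next slides.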

PCA, KPCA and weighted extensions. PCA: obtained from (1) by a particular choice of the data and weight matrices; the specific settings, and an alternative formulation, are given in [1].

KPCA and weighted extensions. KPCA: obtained by mapping the data into a kernel-induced feature space; weighted PCA: obtained by choosing non-identity weight matrices in (1).

LDA, KLDA and weighted extensions. LDA: in (1), set the output data matrix to the label matrix G, which uses one-of-c encoding for the c classes.

CCA, KCCA and weighted extensions. CCA: in (1), set the two data matrices to the two views; the specific settings are given in [1].

The relations to LLE, LE, etc.: please refer to [1].

On the convexity of log-sum-exp functions with positive definite (PD) matrices [2]

Log-Sum-Exp (LSE) function. One of the fundamental functions in convex analysis is the LSE function, whose convexity is the core ingredient in the methodology of geometric programming (GP), which has made considerable impact in different fields, e.g., power control in communication theory. This paper extends these results and considers the convexity of the log-determinant of a sum of rank-one PD matrices with scalar exponential weights.

LSE function (convex): LSE(x) = log( Σ_i exp(x_i) ).
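A small numerical aside (illustrative, not from [2]): the LSE value is usually computed with the max-shift trick for stability, and its convexity can be spot-checked at midpoints.

```python
import numpy as np

def lse(x):
    """Numerically stable log-sum-exp: log(sum(exp(x)))."""
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))

# Spot-check midpoint convexity: lse((x+y)/2) <= (lse(x)+lse(y))/2
rng = np.random.default_rng(0)
x, y = rng.standard_normal(5), rng.standard_normal(5)
assert lse((x + y) / 2) <= (lse(x) + lse(y)) / 2 + 1e-12
```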

Extending convexity of a vector function to a matrix variable (for PD matrices). A general convexity definition: f(λ q_0 + (1 − λ) q_1) ≤ λ f(q_0) + (1 − λ) f(q_1) for all λ ∈ [0, 1], between any two points q_0 and q_1 in the domain.

Several Definitions:

More generally:

Applications:
- Robust covariance estimation
- Kronecker-structured covariance estimation
- Hybrid robust Kronecker model

Robust covariance estimation. Assume each sample x_i is zero-mean Gaussian with covariance q_i Σ, where the per-sample scale factors q_i are unknown. The ML objective is the resulting negative log-likelihood, minimized jointly over Σ and the q_i.

The objective is convex in 1/q_i, and its minimizers over the q_i are available in closed form. Plugging this solution back into the objective results in a concentrated problem in Σ alone. A key lemma (Lemma 4 of [2]) is then used to handle the resulting objective.

Applying this lemma to (37) yields a closed-form minimizer; plugging it back into the objective yields the final reduced problem.

To avoid ill-conditioning, (37) is regularized and the regularized objective is minimized.
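As an illustration only (not necessarily the exact scheme of [2]): objectives of this scale-invariant form are often handled with a Tyler-style fixed-point iteration, here regularized by diagonal loading toward the identity; the function name, rho and the trace normalization are assumptions of this sketch.

```python
import numpy as np

def robust_cov_fixed_point(X, rho=0.1, n_iter=50):
    """Tyler-style fixed-point iteration with diagonal loading (illustrative sketch).

    X: n x p matrix of samples (rows), assumed zero-mean.
    rho: regularization weight toward the identity (avoids ill-conditioning).
    """
    n, p = X.shape
    Sigma = np.eye(p)
    for _ in range(n_iter):
        Sinv = np.linalg.inv(Sigma)
        # Per-sample scales q_i proportional to x_i^T Sigma^{-1} x_i
        q = np.einsum('ij,jk,ik->i', X, Sinv, X)
        S = (p / n) * (X / q[:, None]).T @ X           # reweighted sample covariance
        Sigma = (1 - rho) * S + rho * np.eye(p)        # diagonal loading
        Sigma *= p / np.trace(Sigma)                   # fix the overall scale
    return Sigma
```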

Other priors can be added if available:
1) bounded peak values;
2) bounded second moment;
3) smoothness;
4) sparsity.

Kronecker-structured covariance estimation. The basic Kronecker model assumes the covariance is the Kronecker product of two smaller covariance matrices (a row factor and a column factor). The ML objective is the corresponding negative log-likelihood.

Using this Kronecker structure, problem (58) turns into an optimization over the two Kronecker factors.
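A common way to solve such Kronecker-factor ML problems is the alternating "flip-flop" iteration sketched below; it is offered as an illustration under that assumption, not as the exact algorithm of [2].

```python
import numpy as np

def kronecker_cov_flip_flop(X, n_iter=20):
    """Flip-flop estimation of row/column covariances A (p x p) and B (q x q).

    X: array of shape (n, p, q); each sample is a p x q matrix with zero mean,
       modeled with covariance kron(A, B).
    """
    n, p, q = X.shape
    A, B = np.eye(p), np.eye(q)
    for _ in range(n_iter):
        Binv = np.linalg.inv(B)
        A = sum(Xi @ Binv @ Xi.T for Xi in X) / (n * q)   # update row covariance
        Ainv = np.linalg.inv(A)
        B = sum(Xi.T @ Ainv @ Xi for Xi in X) / (n * p)   # update column covariance
    return A, B
```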

Hybrid robust Kronecker model. The ML objective combines the robust (per-sample scale) model with the Kronecker structure; solving for Σ > 0, again via Lemma 4, yields a concentrated objective.

The problem (73) then reduces to (75); (75) is solved using a fixed-point iteration, for which an arbitrary feasible point can be used as the initial iterate.

Some ideas
- Motivated by the CA framework [1]
- Motivated by Log-Sum-Exp [2]
- Motivated by sparsity-aware modeling [3][4]

Motivated by the CA framework [1]: recall the LS-WKRRR objective (1) and its special cases.

Motivated by Log-Sum-Exp [2]
1) Metric learning (ML): ML&CL, relative-distance constraints, LMNN-like formulations, ...
2) Classification learning: predictive function f(X) = tr(W^T X) + b; the objective builds on the LSE convexity results of [2].
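Purely as a hypothetical illustration of the predictive function above (the slide's actual objective is omitted): a logistic loss, itself a two-term log-sum-exp, applied to f(X) = tr(W^T X) + b and minimized by gradient descent. The function name, step size and iteration count are assumptions.

```python
import numpy as np

def train_matrix_classifier(Xs, ys, lr=0.1, n_iter=200):
    """Hypothetical sketch: minimize sum_i log(1 + exp(-y_i (tr(W^T X_i) + b))).

    Xs: array (n, p, q) of matrix-valued inputs; ys: labels in {-1, +1}.
    The logistic loss is a two-term log-sum-exp, tying in with the LSE theme of [2].
    """
    n, p, q = Xs.shape
    W, b = np.zeros((p, q)), 0.0
    for _ in range(n_iter):
        margins = ys * (np.einsum('ipq,pq->i', Xs, W) + b)   # y_i * f(X_i)
        coef = -ys / (1.0 + np.exp(margins))                 # d(loss)/d f(X_i)
        W -= lr * np.einsum('i,ipq->pq', coef, Xs) / n       # gradient step on W
        b -= lr * coef.mean()                                # gradient step on b
    return W, b
```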

Metric learning across heterogeneous domains, along two lines:
1) Line 1;
2) Line 2 (for ML&CL).
Key ingredients: symmetry and PSD constraints; an indefinite measure ({U_i} is the base and {α_i} is sparsified). This implies that the two lines can be unified into a common indefinite metric learning formulation.

Motivated by sparsity-aware modeling [3][4]. Noise model: x_ci = m_c + U_c v_ci + e_ci + o_ci, where c indexes the c-th class or cluster, e_ci is dense noise, and o_ci is the outlier term, with ||o_ci|| ≠ 0 if the sample is an outlier and 0 otherwise.
Discussion of special cases:
1) U_c = 0, o_ci = 0; e_ci ~ N(0, dI) → means; e_ci ~ Lap(0, dI) → medians; other priors → other statistics.
2) U_c ≠ 0, o_ci = 0; e_ci ~ N(0, dI) → PCA; e_ci ~ Lap(0, dI) → L1-PCA; other priors → other PCAs.

3) U_c = 0, o_ci ≠ 0; e_ci ~ N(0, dI) → robust (k-)means (see the sketch after this list); e_ci ~ Lap(0, dI) → robust (k-)medians.
4) Subspace case: U_c ≠ 0, o_ci ≠ 0; e_ci ~ N(0, dI) → robust k-subspaces.
5) m_c = 0 → ...
6) Robust (semi-)NMF → ...
7) Robust CA → ..., where the noise model becomes Γ = B A^T Υ + E + O.
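A minimal sketch of case 3 (robust mean via outlier-sparsity regularization), in the spirit of [3][4] but not their exact algorithm: alternate between updating the mean given the outliers and group soft-thresholding each o_i; lambda is an assumed tuning parameter.

```python
import numpy as np

def robust_mean(X, lam=1.0, n_iter=50):
    """Sketch: min over m, {o_i} of sum_i ||x_i - m - o_i||^2 + lam * sum_i ||o_i||_2.

    X: n x p data matrix; rows flagged as outliers end up with nonzero o_i.
    """
    n, p = X.shape
    O = np.zeros((n, p))
    for _ in range(n_iter):
        m = (X - O).mean(axis=0)                       # mean given current outlier estimates
        R = X - m                                      # residuals
        norms = np.linalg.norm(R, axis=1, keepdims=True)
        # Group soft-threshold: o_i = r_i * max(0, 1 - lam / (2 ||r_i||))
        O = R * np.maximum(0.0, 1.0 - lam / (2.0 * np.maximum(norms, 1e-12)))
    return m, O
```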

References
[1] Fernando De la Torre, A Least-Squares Framework for Component Analysis, IEEE TPAMI, 34(6), 2012.
[2] Ami Wiesel, On the convexity of log-sum-exp functions with positive definite matrices, available online.
[3] Gonzalo Mateos and Georgios B. Giannakis, Robust PCA as Bilinear Decomposition with Outlier-Sparsity Regularization, available at the homepage of Georgios B. Giannakis.
[4] Pedro A. Forero, Vassilis Kekatos and Georgios B. Giannakis, Robust Clustering Using Outlier-Sparsity Regularization, available at the homepage of Georgios B. Giannakis.

Thanks! Q&A