Bayesian Robust Principal Component Analysis
Presenter: Raghu Ranganathan, ECE / CMR, Tennessee Technological University
Reading Group, January 21, 2011
Paper by Xinghao Ding, Lihan He, and Lawrence Carin

Paper contribution
■ The problem of decomposing a matrix into low-rank and sparse components is addressed with a hierarchical Bayesian approach
■ The matrix is assumed noisy, with unknown and possibly non-stationary noise statistics
■ The Bayesian framework infers an approximation to the noise statistics, in addition to the low-rank and sparse-outlier contributions
■ The proposed model is robust to a broad range of noise levels without changing the hyperparameter settings
■ In addition, the Bayesian model infers a Markov dependency between successive columns of the matrix to exploit additional structure in the observed matrix, particularly in video applications

Introduction
■ Most high-dimensional data, such as images, biological data, and social-network data (e.g., the Netflix data), reside in a low-dimensional subspace or on a low-dimensional manifold

Noise models
■ In low-rank matrix representations, two types of noise models are usually considered
■ The first causes small-scale perturbations to all matrix elements, e.g., i.i.d. Gaussian noise added to each element
■ In this case, if the noise energy is small compared to the dominant singular values of the SVD, it does not significantly affect the principal vectors
■ The second is sparse noise of arbitrary magnitude, impacting a small subset of matrix elements; for example, a moving object in video against a static background manifests as such sparse noise
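A small numerical illustration of this contrast (my construction, not from the slides; the matrix sizes and noise levels are arbitrary): add dense low-power Gaussian noise and, separately, a few large sparse outliers to the same low-rank matrix, and compare how far the leading singular subspace moves.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, r = 100, 80, 3

# Ground-truth rank-r matrix L = A @ B
A = rng.standard_normal((n, r))
B = rng.standard_normal((r, m))
L = A @ B

def principal_subspace(M, r):
    """Left singular vectors spanning the leading rank-r subspace."""
    U, _, _ = np.linalg.svd(M, full_matrices=False)
    return U[:, :r]

def subspace_gap(U, V):
    """Sine of the largest principal angle between two orthonormal bases."""
    s = np.linalg.svd(U.T @ V, compute_uv=False)
    return np.sqrt(max(0.0, 1.0 - s.min() ** 2))

U0 = principal_subspace(L, r)

# Case 1: dense, small i.i.d. Gaussian noise on every entry
E = 0.05 * rng.standard_normal((n, m))
print("dense noise gap :", subspace_gap(U0, principal_subspace(L + E, r)))

# Case 2: sparse outliers of large magnitude on ~1% of entries
S = np.zeros((n, m))
idx = rng.random((n, m)) < 0.01
S[idx] = 20.0 * rng.standard_normal(idx.sum())
print("sparse noise gap:", subspace_gap(U0, principal_subspace(L + S, r)))
```

The dense-noise gap stays near zero while the sparse outliers visibly rotate the principal subspace, which is exactly the failure mode robust PCA is designed to handle.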

Convex optimization approach
[Slide 5: equations of the convex robust-PCA formulation; a reconstruction is sketched below]
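The equations on this slide are not preserved in the transcript; presumably it refers to the Principal Component Pursuit formulation of Candès et al., which reads

$$
\min_{L,\,S}\ \|L\|_* + \lambda\,\|S\|_1
\quad\text{s.t.}\quad \|Y - L - S\|_F \le \varepsilon ,
$$

where $\|L\|_*$ is the nuclear norm (the sum of the singular values of $L$), $\|S\|_1$ is the entrywise $\ell_1$ norm, and $\lambda$ is a trade-off weight (a common choice is $\lambda = 1/\sqrt{\max(n,m)}$ for an $n \times m$ matrix). The noiseless version replaces the inequality with the equality constraint $Y = L + S$.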

Bayesian approach
■ The observation matrix is considered to be of the form Y = L (low-rank) + S (sparse) + E (noise), with both sparse noise S and dense noise E present
■ In the proposed Bayesian model, the noise statistics of E are approximately learned, along with S and L
■ The proposed model is robust to a broad range of noise variances
■ The Bayesian model infers approximations to the posterior distributions on the model parameters, yielding approximate probability distributions for L, S, and E
■ An advantage of the Bayesian model is that prior knowledge is employed in the inference

Bayesian approach
■ The Bayesian framework exploits the anticipated structure in the sparse component
■ In video analysis, it is desired to separate spatially localized moving objects (the sparse component) from the static or quasi-static background (the low-rank component), in the presence of frame-dependent additive noise E
■ The correlation between the sparse components of the video from frame to frame (column to column in the matrix) must be considered
■ In this paper, a Markov dependency in time and space is assumed between the sparse components of consecutive matrix columns
■ This structure is incorporated into the Bayesian framework, with the Markov parameters inferred from the observed matrix

Bayesian Robust PCA
■ The work in this paper is closely related to the low-rank matrix completion problem, where a matrix with noisy entries is approximated by a low-rank matrix and its missing entries are predicted
■ When the matrix Y = L + S + E is missing entries at random, the proposed model can estimate the missing entries (in terms of the low-rank term L)
■ The S term is defined as a sparse set of matrix entries; the locations of S must be inferred while estimating the values of L, S, and E
■ Typically, in Bayesian inference, a sparseness-promoting prior is imposed on the desired signal, and the posterior distribution of the sparse signal is inferred
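One way to make the missing-entry statement concrete (the notation $\Omega$ for the observed index set is mine, not the slides'): the likelihood is evaluated only over observed entries,

$$
Y_{ij} = L_{ij} + S_{ij} + E_{ij}, \qquad (i,j) \in \Omega ,
$$

and a missing entry is predicted by the posterior estimate of the low-rank term, $\hat{Y}_{ij} = L_{ij}$ for $(i,j) \notin \Omega$.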

Bayesian Low-rank and Sparse Model
[Slides 9-12: specification of the low-rank and sparse priors; a generic sketch of the construction follows]
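The model equations on slides 9-12 are not preserved. As a generic sketch of the beta-Bernoulli low-rank-plus-sparse construction this family of models uses (the notation is illustrative, not necessarily the authors'): the low-rank term is a sum of rank-one factors gated by binary variables, and the sparse term is an elementwise product of a binary support matrix and a Gaussian magnitude matrix,

$$
L = \sum_{k=1}^{K} z_k\,\lambda_k\, d_k w_k^{\top},
\qquad z_k \sim \mathrm{Bernoulli}(\pi_k), \quad \pi_k \sim \mathrm{Beta}(a_0, b_0),
$$
$$
S = B \circ X,
\qquad B_{ij} \sim \mathrm{Bernoulli}(\rho), \quad X_{ij} \sim \mathcal{N}(0, \tau^{-1}),
$$

with Gaussian priors on the factor vectors $d_k$, $w_k$ and Gamma priors on the precisions. The number of active gates $z_k$ determines the inferred rank, and the binary matrix $B$ encodes the sparse-outlier support; the Markov construction described later replaces the independent Bernoulli prior on the columns of $B$.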

C. Noise component
■ The measurement noise is drawn i.i.d. from a Gaussian distribution, and the noise affects all measurements
■ The noise variance is assumed unknown and is learned during model inference. Mathematically, the noise is modeled as $E_{ij} \sim \mathcal{N}(0, \gamma^{-1})$, with a Gamma prior placed on the precision $\gamma$
■ The model can learn different noise variances for different parts of E, i.e., each column/row of Y (each frame) may in general have its own noise level. The noise structure is then modified so that column m has its own precision, $E_{im} \sim \mathcal{N}(0, \gamma_m^{-1})$
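Because the Gaussian likelihood and the Gamma prior on the precision are conjugate, the noise level can be resampled in closed form inside a Gibbs sweep. A minimal sketch (the hyperparameter values a0, b0 are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_noise_precision(residual, a0=1e-6, b0=1e-6, rng=rng):
    """Draw gamma ~ Gamma(a0 + n/2, rate = b0 + ||residual||^2 / 2),
    the conditional posterior of the noise precision given the
    current residual E = Y - L - S (conjugate update)."""
    n = residual.size
    shape = a0 + 0.5 * n
    rate = b0 + 0.5 * np.sum(residual ** 2)
    return rng.gamma(shape, 1.0 / rate)  # numpy parameterizes by scale

# Example: residual whose true noise std is 0.1 (precision 100)
E = 0.1 * rng.standard_normal((100, 80))
gamma = sample_noise_precision(E)
print("sampled precision:", gamma, "-> noise std estimate:", gamma ** -0.5)
```

The per-column variant simply applies the same update to each column's residual with its own (a0, b0).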

Relation to the optimization-based approach
[Slide 14: side-by-side equations comparing the Bayesian and convex formulations]

Relation to the optimization-based approach
■ In the Bayesian model, the noise variance need not be known a priori; the model learns the noise during inference
■ For the low-rank component, instead of the nuclear-norm constraint that imposes sparseness on the singular values, a Gaussian prior together with the beta-Bernoulli distribution is used to obtain a low-rank constraint
■ For the sparse component, instead of the ℓ1 constraint, the beta-Bernoulli distribution is employed to enforce sparsity
■ Compared to the Laplacian prior (which gives many small entries close to 0), the beta-Bernoulli prior yields exactly zero values
■ In Bayesian learning, numerical methods are used to estimate the distributions of the unknown parameters, whereas the optimization-based approach seeks a point solution minimizing a cost function similar to the convex formulation above
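A quick way to see the "exactly zero" point (my demo; the sizes and hyperparameters are arbitrary): draw a vector from a Laplace prior and from a beta-Bernoulli (spike-and-slab) prior and count exact zeros.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000

# Laplace prior: entries concentrate near 0 but are almost surely nonzero
laplace = rng.laplace(scale=0.1, size=n)

# Beta-Bernoulli (spike-and-slab): pi ~ Beta(1, 9) favors sparsity,
# z_i ~ Bernoulli(pi) gates a Gaussian magnitude on or off
pi = rng.beta(1.0, 9.0)
z = rng.random(n) < pi
spike_slab = np.where(z, rng.standard_normal(n), 0.0)

print("Laplace exact zeros   :", np.sum(laplace == 0.0))   # 0
print("spike-slab exact zeros:", np.sum(spike_slab == 0.0)) # ~ n * (1 - pi)
```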

Markov dependency of Sparse Term in Time and Space
[Slides 16-18: equations of the Markov prior on the sparse support; a generic sketch follows]
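These three slides were equations that the transcript does not preserve. A generic form of such a Markov support prior (the notation and the exact neighborhood structure are illustrative, not taken from the slides): the binary support indicator of pixel $i$ in frame $m$ depends on its state in the previous frame,

$$
b_{i,m} \mid b_{i,m-1} \sim
\begin{cases}
\mathrm{Bernoulli}(\pi_1), & b_{i,m-1} = 1,\\[2pt]
\mathrm{Bernoulli}(\pi_0), & b_{i,m-1} = 0,
\end{cases}
\qquad \pi_1 \sim \mathrm{Beta}(a_1, b_1), \quad \pi_0 \sim \mathrm{Beta}(a_0, b_0),
$$

with $\pi_1$ expected to be large and $\pi_0$ small, so that support persists over time; an analogous coupling over the spatial neighborhood of pixel $i$ encourages spatially contiguous foreground regions. The Beta hyperparameters let the strength of both couplings be inferred from the observed matrix.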

Posterior inference
[Slides 19-20: conditional posterior updates for approximate inference]
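The inference slides are not preserved; a sampler for such a model cycles through conjugate conditional updates for all latent variables. As a flavor of what those updates look like, here is a toy spike-and-slab Gibbs sampler of my own construction (not the paper's sampler): infer the sparse support b, magnitudes x, noise precision, and support probability for a vector y = b ∘ x + e.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic data: y = b * x + e, ~5% support, noise std 0.1
n, sigma_x, noise_std = 500, 3.0, 0.1
b_true = rng.random(n) < 0.05
y = np.where(b_true, sigma_x * rng.standard_normal(n), 0.0) \
    + noise_std * rng.standard_normal(n)

# Gibbs sampler state
b = np.zeros(n, dtype=bool)
x = np.zeros(n)
gamma = 1.0          # noise precision
pi = 0.5             # support probability

for it in range(500):
    # 1) b_i | x_i, y_i, gamma, pi  (pointwise Bernoulli conditional)
    log_on  = np.log(pi)     - 0.5 * gamma * (y - x) ** 2
    log_off = np.log(1 - pi) - 0.5 * gamma * y ** 2
    d = np.clip(log_off - log_on, -500.0, 500.0)
    b = rng.random(n) < 1.0 / (1.0 + np.exp(d))

    # 2) x_i | b_i, y_i, gamma  (Gaussian; prior draw where b_i = 0)
    post_var = 1.0 / (gamma + sigma_x ** -2)
    x = np.where(b,
                 rng.normal(post_var * gamma * y, np.sqrt(post_var)),
                 sigma_x * rng.standard_normal(n))

    # 3) gamma | residual  (conjugate Gamma update, as sketched earlier)
    resid = y - np.where(b, x, 0.0)
    gamma = rng.gamma(1e-6 + 0.5 * n, 1.0 / (1e-6 + 0.5 * resid @ resid))

    # 4) pi | b  (conjugate Beta update)
    k = b.sum()
    pi = rng.beta(1 + k, 1 + n - k)

print("true support size:", b_true.sum(), " inferred:", b.sum())
print("inferred noise std:", gamma ** -0.5)
```

The full model adds the low-rank factors and the Markov-coupled support, but every update retains this conjugate, closed-form character.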

Experimental results
[Slides 21-23: figures and tables of results; not preserved in the transcript]

B. Video example
■ The application of video surveillance with a fixed camera is considered
■ The objective is to reconstruct a near-static background and a moving foreground from a video sequence
■ The data are organized such that column m of Y is constructed by concatenating all pixels of frame m of a grayscale video sequence
■ The background is modeled as the low-rank component, and the moving foreground as the sparse component
■ The rank r is usually small for a static background, and the sparse components across frames (columns of Y) are strongly correlated, which is modeled by the Markov dependency
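The frame-to-column packing is straightforward; a minimal sketch (the frame dimensions are arbitrary, and the decomposition itself is only indicated in comments):

```python
import numpy as np

rng = np.random.default_rng(4)

# Stand-in for a grayscale sequence: M frames of H x W pixels
H, W, M = 48, 64, 30
frames = rng.random((M, H, W))

# Column m of Y is frame m flattened into a vector of H*W pixels
Y = frames.reshape(M, H * W).T          # shape (H*W, M)

# After a decomposition Y ~= L + S + E (e.g., by the Bayesian model),
# columns fold back into images:
#   background_m = L[:, m].reshape(H, W)
#   foreground_m = S[:, m].reshape(H, W)
print(Y.shape)   # (3072, 30)
```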

[Slides 25-26: video-example result figures; not preserved in the transcript]

Conclusions
■ The authors have developed a new robust Bayesian PCA framework for analyzing matrices with sparsely distributed noise of arbitrary magnitude
■ The Bayesian approach is also robust to densely distributed noise, and the noise statistics may be inferred from the data with no tuning of hyperparameters
■ In addition, the model allows the noise statistics to vary from frame to frame, and the Markov dependency captures the frame-to-frame structure of the sparse component
■ Future research directions involve a moving camera, for which the background would reside on a low-dimensional manifold rather than in a low-dimensional linear subspace
■ The Bayesian framework may then be extended to infer the properties of the low-dimensional manifold