
Fast Bayesian Matching Pursuit Presenter: Changchun Zhang ECE / CMR Tennessee Technological University November 12, 2010 Reading Group (Authors: Philip Schniter, Lee C. Potter, and Justin Ziniel)

Outline ■ Introduction ■ Signal Model ■ Estimation of Basis and Parameters ■ Numerical Results ■ Conclusions

Introduction ■ Linear regression model □ Unknown sparse parameter vector x □ Unit-norm columns in the regressor matrix A □ Additive noise ■ Rough survey of existing approaches □ Greedy approaches □ Penalized least-squares solutions ■ In the literature, the primary focus is placed on detecting the few significant entries of the sparse x ■ In contrast, this paper adopts an MMSE (minimum mean-squared error) estimation formulation and focuses on accurately inferring x from the noisy observations y.

Signal Model ■ Observation y: a noisy linear combination of the parameters in x, i.e., y = A x + e. Assumptions: 1. The noise e is assumed to be white Gaussian with variance σ², i.e., e ~ N(0, σ²I). 2. The columns of A are taken to be unit-norm. 3. The parameters x are generated from a Gaussian mixture density: a) the covariance R(s) is determined by a discrete random vector s of mixture parameters; b) R(s) is taken to be diagonal, with nth diagonal entry σ²_{s_n}, implying that the x_n are independent with x_n | s_n ~ N(0, σ²_{s_n}); c) the s_n are Bernoulli(p₁), and to model a sparse x, choose p₁ << 1 and σ₀² = 0. 4. N >> M, and the expected number of active coefficients, p₁N, is small relative to M.
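To make the generative model concrete, here is a minimal simulation sketch (my own illustration, not code from the slides or the paper) that draws a Bernoulli-Gaussian sparse vector and a noisy observation under the assumptions above; the sizes, p₁, and the variances are illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes and hyperparameters (placeholders, not the paper's values)
N, M = 256, 64            # N >> M: more unknowns than measurements
p1 = 0.04                 # Pr(s_n = 1): prior activity probability
sigma2_1 = 1.0            # variance of an active coefficient (sigma_0^2 = 0)
sigma2_e = 0.01           # noise variance

# Regressor matrix A with unit-norm columns, as assumed on the slide
A = rng.standard_normal((M, N))
A /= np.linalg.norm(A, axis=0, keepdims=True)

# Mixture parameters s and Bernoulli-Gaussian coefficients x
s_true = rng.random(N) < p1                                    # s_n ~ Bernoulli(p1)
x = np.zeros(N)
x[s_true] = rng.normal(0.0, np.sqrt(sigma2_1), s_true.sum())   # active taps ~ N(0, sigma_1^2)

# Noisy linear observation y = A x + e with white Gaussian noise
y = A @ x + rng.normal(0.0, np.sqrt(sigma2_e), size=M)
```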

Signal Model (continued) ■ From the previous assumptions, it can be seen that y conditioned on s is zero-mean Gaussian: p(y | s) = N(y; 0, Φ(s)), where Φ(s) = A R(s) Aᵀ + σ²I.

Estimation of Basis and Parameters ■ Basis Selection Metric □ The non-zero locations in s specify which of the basis elements (i.e., columns of A) are "active". Thus, basis selection reduces to estimation of s. □ Estimating s not only determines which basis configurations are most likely, but also how likely those bases are. The latter is accomplished by estimating the dominant posteriors p(s | y), which can be written, via Bayes' rule, as p(s | y) = p(y | s) p(s) / Σ_{s'} p(y | s') p(s'). Since the denominator is common to all configurations, the estimation reduces to computing p(y | s) p(s).

Estimation of Basis and Parameters ■ Basis Selection Metric (continued) □ The size of the configuration space (2^N possible vectors s) makes it impractical to compute p(s | y) for every s; instead, the set S* of configurations occupying the dominant posteriors is used for simplicity. □ Working in the log domain, it is found that μ(s) = ln p(y | s) + ln p(s), which is referred to as the basis selection metric.
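As a hedged illustration of this metric (a sketch under the model assumptions above, not the authors' code), the function below evaluates μ(s) = ln p(y | s) + ln p(s) directly from the Gaussian form of p(y | s); the variable names follow the simulation sketch given earlier.

```python
import numpy as np

def basis_selection_metric(s, y, A, p1, sigma2_1, sigma2_e):
    """mu(s) = ln p(y | s) + ln p(s) for a binary activity vector s."""
    M = y.shape[0]
    act = np.asarray(s, dtype=bool)
    # Phi(s) = A R(s) A^T + sigma_e^2 I, with R(s) diagonal (sigma_1^2 on active taps)
    A_act = A[:, act]
    Phi = sigma2_1 * (A_act @ A_act.T) + sigma2_e * np.eye(M)
    # Gaussian log-likelihood ln N(y; 0, Phi(s)) via a Cholesky factorization
    L = np.linalg.cholesky(Phi)
    z = np.linalg.solve(L, y)
    log_lik = -0.5 * (M * np.log(2 * np.pi) + 2 * np.sum(np.log(np.diag(L))) + z @ z)
    # Bernoulli prior ln p(s)
    k = act.sum()
    log_prior = k * np.log(p1) + (len(act) - k) * np.log(1 - p1)
    return log_lik + log_prior
```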

Estimation of Basis and Parameters ■ MMSE Parameter Estimation The MMSE estimate of x from y is E[x | y] = Σ_s p(s | y) E[x | y, s], with the sum taken over all configurations s. Because this sum contains 2^N terms, the MMSE estimate is closely approximated using only the dominant posteriors: x_mmse ≈ Σ_{s ∈ S*} p(s | y) E[x | y, s]. Likewise, the covariance of the corresponding estimation error can be closely approximated by the analogous sum over S* (the within-configuration covariances plus the spread of the conditional means). The primary challenge becomes that of obtaining S*, p(s | y), and E[x | y, s] for each s that belongs to S*.
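The snippet below is a small sketch (my own, reusing basis_selection_metric from the previous sketch) of the approximate MMSE combination: the metrics of the candidate configurations are converted into normalized posterior weights, and the per-configuration conditional means E[x | y, s] = R(s) Aᵀ Φ(s)⁻¹ y are averaged with those weights.

```python
import numpy as np

def conditional_mean(s, y, A, sigma2_1, sigma2_e):
    """E[x | y, s] = R(s) A^T Phi(s)^{-1} y for one activity pattern s."""
    M, N = A.shape
    act = np.asarray(s, dtype=bool)
    A_act = A[:, act]
    Phi = sigma2_1 * (A_act @ A_act.T) + sigma2_e * np.eye(M)
    x_hat = np.zeros(N)
    x_hat[act] = sigma2_1 * (A_act.T @ np.linalg.solve(Phi, y))
    return x_hat

def approximate_mmse(candidates, y, A, p1, sigma2_1, sigma2_e):
    """Weight each candidate s by its normalized posterior and average E[x | y, s]."""
    mus = np.array([basis_selection_metric(s, y, A, p1, sigma2_1, sigma2_e)
                    for s in candidates])
    w = np.exp(mus - mus.max())          # exponentiate safely; common factors cancel
    w /= w.sum()                         # posterior weights restricted to the candidate set
    return sum(wi * conditional_mean(s, y, A, sigma2_1, sigma2_e)
               for wi, s in zip(w, candidates))
```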

Estimation of Basis and Parameters ■ Bayesian Matching Pursuit An efficient means of determining S*, the set of mixture parameter vectors s yielding the dominant values of p(s | y) or, equivalently, the dominant values of μ(s). Steps: 1. Start from s = 0 and turn on one mixture parameter at a time, yielding a set of N binary vectors s, each with a single active entry. 2. Compute the metrics μ(s) for these vectors and collect the D largest into S1*. 3. Starting from the vectors in S1*, all locations of a second active mixture parameter are considered, yielding ND - D(D+1)/2 unique vectors. 4. From these, again collect the D vectors with the largest metrics into S2*. 5. Repeat the process until depth P is reached, where P is chosen so that the probability of more than P active coefficients is very small. 6. The vectors collected across the search depths constitute the final estimate of S*.
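Below is a compact sketch of this D-path greedy search (an illustration in the same notation, not the authors' implementation); for clarity it recomputes the full metric for every candidate rather than using the fast metric update described on the next slides.

```python
import numpy as np

def greedy_search(y, A, p1, sigma2_1, sigma2_e, D=5, P=10):
    """Keep the D highest-metric activity patterns at each of P search depths."""
    N = A.shape[1]
    survivors = [np.zeros(N, dtype=bool)]      # start from the all-inactive vector s = 0
    S_star = []
    for _ in range(P):
        # Expand every survivor by activating one additional index (deduplicated)
        candidates = {}
        for s in survivors:
            for n in np.flatnonzero(~s):
                s_new = s.copy()
                s_new[n] = True
                candidates[s_new.tobytes()] = s_new
        # Rank by the basis selection metric and keep the D largest
        ranked = sorted(candidates.values(),
                        key=lambda s: basis_selection_metric(s, y, A, p1,
                                                             sigma2_1, sigma2_e),
                        reverse=True)
        survivors = ranked[:D]
        S_star.extend(survivors)
    return S_star
```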

Estimation of Basis and Parameters ■ Fast Metric Update A fast metric update computes the change in μ(·) that results from the activation of a single mixture parameter. That is, compute Δ_n(s) = μ(s⁽ⁿ⁾) - μ(s), where s⁽ⁿ⁾ is the vector identical to s except for the nth coefficient, which is active in s⁽ⁿ⁾ but inactive in s. Key property of Φ: activating the nth coefficient produces a rank-one update, Φ(s⁽ⁿ⁾) = Φ(s) + σ₁² a_n a_nᵀ, where a_n is the nth column of A.

Estimation of Basis and Parameters ■ Fast Metric Update (continued) With c_n = a_nᵀ Φ(s)⁻¹ a_n and β_n = a_nᵀ Φ(s)⁻¹ y, the matrix determinant lemma and the Sherman-Morrison formula give Δ_n(s) = -½ ln(1 + σ₁² c_n) + σ₁² β_n² / (2(1 + σ₁² c_n)) + ln(p₁ / (1 - p₁)). This quantifies the change in the basis selection metric due to the activation of the nth tap of s, so computing μ(s⁽ⁿ⁾) from its nearest neighbor s via this update simplifies the computation of the basis selection metric.
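To illustrate the idea (a derivation-level sketch based on the rank-one structure above, not the paper's exact recursions), the snippet below computes Δ_n for every index at once; the paper's fast algorithm maintains the needed quantities recursively along each search path rather than inverting Φ(s) at every step, which this sketch does for simplicity.

```python
import numpy as np

def metric_deltas(s, y, A, p1, sigma2_1, sigma2_e):
    """Delta_n(s) = mu(s with index n activated) - mu(s), for every index n."""
    M = y.shape[0]
    act = np.asarray(s, dtype=bool)
    A_act = A[:, act]
    Phi = sigma2_1 * (A_act @ A_act.T) + sigma2_e * np.eye(M)
    Phi_inv = np.linalg.inv(Phi)       # FBMP tracks this information recursively instead
    B = Phi_inv @ A                    # columns b_n = Phi(s)^{-1} a_n
    c = np.einsum('mn,mn->n', A, B)    # c_n = a_n^T Phi(s)^{-1} a_n
    beta = B.T @ y                     # beta_n = a_n^T Phi(s)^{-1} y
    denom = 1.0 + sigma2_1 * c
    delta = (-0.5 * np.log(denom)
             + 0.5 * sigma2_1 * beta ** 2 / denom
             + np.log(p1 / (1.0 - p1)))
    delta[act] = -np.inf               # already-active indices are not candidates
    return delta
```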

Estimation of Basis and Parameters ■ Fast Bayesian Matching Pursuit Here the complexity of computing Δ_n(s) can be reduced to be linear in M by exploiting the structure of Φ(s)⁻¹. The index set of active elements in s is maintained along each search path; quantities such as b_n = Φ(s)⁻¹ a_n and β_n = a_nᵀ Φ(s)⁻¹ y are updated recursively when an index is activated, so they only need to be recomputed for the surviving indices. This trick makes the number of multiplications required by the algorithm grow roughly linearly in M, N, D, and the search depth P.

Numerical Results ■ FBMP behavior □ Parameters considered ▪ N ▪ M ▪ SNR ▪ p₁ ▪ P, set from a target value P₀ for the probability of more than P active coefficients ▪ D □ Results: ▪ Figure 1 – Normalized MSE vs. M, for several D ▪ Figure 2 – Average number of missed coefficients vs. M, for several D ▪ Figure 3 – Normalized MSE vs. number of active coefficients, for several D ▪ Figure 4 – Normalized MSE vs. SNR, for several D ▪ Figure 5 – Normalized MSE vs. SNR, for the MMSE and MAP cases ▪ Figure 6 – Average FBMP runtime vs. D
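Purely as a usage sketch (a hypothetical end-to-end run on the synthetic data from the first snippet, not a reproduction of the paper's experiments), the normalized MSE reported in these figures could be estimated as follows.

```python
# Assumes y, A, x, p1, sigma2_1, sigma2_e and the helper functions from the earlier sketches.
candidates = greedy_search(y, A, p1, sigma2_1, sigma2_e, D=5, P=20)
x_hat = approximate_mmse(candidates, y, A, p1, sigma2_1, sigma2_e)
nmse = np.sum((x - x_hat) ** 2) / np.sum(x ** 2)
print(f"normalized MSE: {nmse:.3e}")
```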

[Slides 14-19: figure slides (Figures 1-6 listed above).]

Numerical Results (continued) ■ Comparison to Other Algorithms □ Algorithms used for comparison ▪ SparseBayes ▪ OMP ▪ StOMP ▪ GPSR-Basic ▪ BCS □ Measurements for comparison ▪ Normalized MSE vs. M ▪ Normalized MSE vs. SNR ▪ Average runtime vs. M

[Slides 21-23: comparison plots for the measurements listed above.]

Conclusion ■ Brief Review of the Process 1. FBMP models each unknown coefficient as either inactive or active (with prior probability p₁). 2. A Gaussian distribution (zero mean, variance σ₁²) is assigned to the active values. 3. The observation y is modeled as an AWGN-corrupted version of the unknown coefficients mixed by a known matrix A. 4. FBMP searches the active/inactive configurations s to find the subset S* with dominant posterior probability. 5. The parameter D is used to control the tradeoff between complexity and accuracy. ■ Numerical results show that the FBMP estimates outperform (in NMSE) other popular algorithms by several dB in certain situations.

Thank you!