Empirical Bayes approaches to thresholding
Bernard Silverman, University of Bristol (joint work with Iain Johnstone, Stanford)
IMS meeting, 30 July 2002

Slide 2: Finding needles or hay in haystacks
- Archetypal problem: n noisy observations Y_i = θ_i + ε_i, i = 1, ..., n, with ε_i ~ N(0, 1)
- The sequence θ_i may well be sparse, but not necessarily
- Assume the noise variance is 1; this is no restriction in practice
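To make the setup concrete, here is a minimal Python sketch (my own illustration, not from the talk) that generates a sparse "needles in a haystack" sequence and observes it with unit-variance Gaussian noise; the sizes and signal values are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

n, k = 1000, 20                      # assumed sizes: n observations, k "needles"
theta = np.zeros(n)
theta[rng.choice(n, size=k, replace=False)] = rng.uniform(3.0, 7.0, size=k)

# Observation model from the slide: Y_i = theta_i + eps_i, with eps_i ~ N(0, 1)
Y = theta + rng.standard_normal(n)
```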

Slide 3: Examples
- Wavelet coefficients at each level of an unknown function
- Coefficients in some more general dictionary
- Pixels of a nearly, or not so nearly, black object or image

Slide 4: Needles or hay?
- Needles: rare objects in the noise. If an observation were 3, one would be inclined to think it was straw
- Hay: common objects, a non-sparse signal. If an observation were 3, one would be inclined to think it was a nonzero "object"
- A good method will adapt to either situation automatically

Slide 5: Thresholding
- Choose a threshold t
- If |Y_i| is less than t, estimate θ_i = 0
- If |Y_i| is greater than t, estimate θ_i = Y_i
- Gain strength from sparsity: if the signal is sparse, a high threshold gives great accuracy
- Data-dependent choice of threshold is essential to adapt to sparsity: a high threshold can be disadvantageous for a 'dense' signal
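A small sketch of the keep-or-kill (hard) thresholding rule the slide describes; the function and variable names are mine.

```python
import numpy as np

def hard_threshold(Y, t):
    """Estimate theta_i = Y_i when |Y_i| exceeds t, and 0 otherwise."""
    Y = np.asarray(Y, dtype=float)
    return np.where(np.abs(Y) > t, Y, 0.0)

# Example: a fixed threshold around sqrt(2 log n) is a common default,
# but the point of the talk is that the threshold should be chosen from the data.
# theta_hat = hard_threshold(Y, np.sqrt(2 * np.log(len(Y))))
```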

Slide 6: Aims for a thresholding method
- Adaptive to sparse and dense signals
- Stable to small data changes
- Tractable, with available software
- Performs well on simulations
- Performs well on real data
- Has good theoretical properties
Our method does all these!

Slide 7: Bayesian Formulation
- The prior for each parameter is a mixture of an atom at zero (probability 1 − w) and a suitable heavy-tailed density γ (probability w)
- The posterior median is a true thresholding rule; denote its threshold by t(w)
- Small w ⇒ large threshold, so we want small w for sparse signals and large w for dense ones

Slide 8: Other possible thresholding rules
- Hard or soft thresholding with the same threshold t(w)
- Posterior mean: not a strict thresholding rule
- The posterior probability of non-zero gives the probability that a pixel/coefficient/feature is 'really there'
  – threshold (set to zero) if this probability is < 0.5
  – threshold at some larger probability?
- The mean and Bayes factor rules generalize to the complex and multivariate cases

Slide 9: Empirical Bayes: data-based choice of w
- Let g = the convolution of γ with the normal density
- The marginal log likelihood of w, ℓ(w) = Σ_i log[(1 − w)·φ(Y_i) + w·g(Y_i)], is computationally tractable to maximize
- Automatically adaptive: gives a large w if a large number of the Y_i are large, and vice versa
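A hedged sketch of this marginal-likelihood step. It takes the slab γ to be a Laplace density with a fixed scale a (the talk only says "a suitable heavy-tailed density", and in practice the scale may also be estimated), forms g = γ ∗ φ in closed form, and maximizes the marginal log likelihood over w with SciPy. All function names are mine.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize_scalar

def log_g_laplace(y, a=0.5):
    """log of g = (Laplace slab with scale a) convolved with the standard normal."""
    y = np.asarray(y, dtype=float)
    # Closed form: g(y) = (a/2) * [ exp(a^2/2 - a*y) * Phi(y - a)
    #                             + exp(a^2/2 + a*y) * Phi(-y - a) ]
    term_plus = 0.5 * a**2 - a * y + norm.logcdf(y - a)
    term_minus = 0.5 * a**2 + a * y + norm.logcdf(-y - a)
    return np.log(a / 2.0) + np.logaddexp(term_plus, term_minus)

def neg_marginal_loglik(w, Y, a=0.5):
    """Negative marginal log likelihood of the mixing weight w."""
    log_zero = np.log1p(-w) + norm.logpdf(Y)      # atom-at-zero component
    log_slab = np.log(w) + log_g_laplace(Y, a)    # heavy-tailed slab component
    return -np.sum(np.logaddexp(log_zero, log_slab))

def estimate_w(Y, a=0.5):
    """Maximize the marginal likelihood over w in (0, 1)."""
    res = minimize_scalar(neg_marginal_loglik, bounds=(1e-6, 1 - 1e-6),
                          method="bounded", args=(Y, a))
    return res.x
```

The estimated ŵ then fixes the threshold: a larger ŵ means a smaller t(ŵ), which can be applied through the posterior median rule of slide 7 or the hard/soft rules of slide 8.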

Slide 10: Example
- Six signals of varying sparsity
- Each has its values arranged as an image for display
- Independent Gaussian noise added
- The excellent behaviour of the MML (marginal maximum likelihood) automatic thresholding method is borne out by other simulations, including in the wavelet context

Slide 13 (figure): Root mean square error plotted against threshold

Slide 14: Root mean square error plotted against threshold
- Much lower RMSE can be obtained for sparse signals with suitable thresholding
- The best threshold decreases as the density of the signal increases
- The MML automatic choice of threshold is excellent
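An illustrative sketch (mine, not the talk's simulation code) of how such an RMSE-versus-threshold curve can be computed when the true signal is known; hard thresholding is used here.

```python
import numpy as np

def rmse_vs_threshold(Y, theta, thresholds):
    """RMSE of the hard-thresholding estimate of theta at each candidate threshold."""
    Y = np.asarray(Y, dtype=float)
    theta = np.asarray(theta, dtype=float)
    rmse = [np.sqrt(np.mean((np.where(np.abs(Y) > t, Y, 0.0) - theta) ** 2))
            for t in thresholds]
    return np.array(rmse)

# Example use: scan a grid of thresholds and locate the best one.
# grid = np.linspace(0.0, 5.0, 101)
# best_t = grid[np.argmin(rmse_vs_threshold(Y, theta, grid))]
```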

Slide 15 (figure): Estimates obtained with the optimal threshold

Slide 16: Theoretical Properties
- Characterize sparseness by n^(-1) Σ_i |θ_i|^p ≤ η^p for some small p > 0
- Among all signals with a given energy (sum of squares), the sparsest are those with small l_p norm
- For signals with this level of sparsity, the best possible estimation MSE is O(η^p |log η|^((2−p)/2))

Slide 17: Automatic adaptivity
- The MML thresholding method achieves this best mean square error rate, without being told p or η, all the way down to p = 0
- The price to pay is an additional O(n^(-1) log^3 n) term
- The result also holds if error is measured in q-norm, for 0 < q ≤ 2
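Read together with slide 16, the adaptivity claim can be written roughly as follows (my LaTeX restatement of the garbled slide formulas, with η denoting the sparsity level and the risk taken as an average over coordinates):

```latex
% Sparsity class (slide 16) and the rate achieved by MML thresholding (slide 17):
\[
  \frac{1}{n}\sum_{i=1}^{n}\lvert\theta_i\rvert^{p}\;\le\;\eta^{p}
  \quad\text{for some small } p>0,
\]
\[
  \frac{1}{n}\sum_{i=1}^{n}\mathbb{E}\bigl(\hat\theta_i-\theta_i\bigr)^{2}
  \;=\;O\!\bigl(\eta^{p}\,\lvert\log\eta\rvert^{(2-p)/2}\bigr)\;+\;O\!\bigl(n^{-1}\log^{3}n\bigr),
\]
% uniformly over the class, without knowledge of p or eta.
```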

Slide 18: Adaptivity for the standard wavelet transform
- Assume the MML method is applied level by level
- Assume the array of coefficients lies in some Besov class with 0 < p ≤ 2; this allows for a very wide range of function classes, including very inhomogeneous ones
- Allow mean q-norm error
- Apart from an O(n^(-1) log^4 n) term, the minimax rate is achieved regardless of the parameters
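A minimal sketch of the level-by-level recipe using PyWavelets (an assumed library choice, not mentioned in the talk). The per-level threshold is left as a plug-in function; a universal-style sqrt(2 log n_j) value is used only as a stand-in where the empirical Bayes threshold t(ŵ_j) from the earlier slides would go, and the noise is assumed to have unit variance as in slide 2.

```python
import numpy as np
import pywt  # PyWavelets

def denoise_levelwise(y, wavelet="db4", threshold_fn=None):
    """Hard-threshold the detail coefficients of each resolution level separately."""
    coeffs = pywt.wavedec(np.asarray(y, dtype=float), wavelet)
    approx, details = coeffs[0], coeffs[1:]
    thresholded = []
    for d in details:
        # Placeholder per-level threshold; replace with the level's
        # empirical Bayes threshold t(w_hat) in the method of the talk.
        t = threshold_fn(d) if threshold_fn else np.sqrt(2.0 * np.log(len(d)))
        thresholded.append(np.where(np.abs(d) > t, d, 0.0))
    return pywt.waverec([approx] + thresholded, wavelet)
```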

Slide 19: Conclusion to this part
- Empirical Bayes thresholding has great promise as an adaptive method
- Wavelets are only one of many contexts where this approach can be used
- The Bayesian aspects have not been considered much in practical contexts; if you want 95% posterior probability that a feature is there, you just increase the threshold