Multiscale Likelihood Analysis and Image Reconstruction

Presentation transcript:

Multiscale Likelihood Analysis and Image Reconstruction. Rebecca Willett and Robert Nowak (Rice University and University of Wisconsin-Madison), presented by Veronique Delouille. Presenter's note: you'll have 20 minutes total, a 15-minute talk with 5 minutes for questions?

Beyond Wavelets. Multiresolution analysis is a powerful tool, but what about image edges, non-Gaussian noise, and inverse problems? Piecewise polynomial and platelet-based estimators address these issues. Wavelet analysis has led to recent breakthroughs in statistical signal and image processing. Wavelets are effective because they adaptively zoom in and out on the data for more effective analysis. Despite the success of wavelets, however, several key issues are not adequately handled by conventional wavelet-based statistical methods:
1. Many applications, particularly in astronomy, involve non-Gaussian data, and standard wavelet denoising methods are difficult to analyze in that setting.
2. Images differ from one-dimensional signals in that edges are not isolated point singularities but instead occur on one-dimensional manifolds; classical wavelet bases do not provide good approximations to edges of this type.
3. Ill-conditioned inverse problems are routinely encountered, and simple wavelet thresholding/shrinkage methods are applicable in only a small subset of the inverse problems of interest.
The methods presented here are similar to wavelets in that they too adaptively zoom in and out on the data, but they have been designed to handle challenges such as these. I'll first discuss the piecewise polynomial technique for one-dimensional signals, and then I'll discuss its extension to two dimensions.

Non-Gaussian Noise GRB: BATSE 845 Observed Poisson Data Begin by addressing non-Gaussian noise problems in the special case of 1D. Astronomical signals, such as this observation of a gamma ray burst, are frequently non Gaussian. In fact, the data are collected by counting the number of photons hitting a detector over time, and these statistics are distinctly Poissonian. The analysis of most conventional wavelet-based techniques, however, assumes that the observations are Gaussian. If we were to use wavelet-based techniques to estimate the underlying intensity of this signal, the analysis of the wavelet coefficients would be intractable and we would not be able to perform tasks such as setting optimal threshold levels, leading to distorted results. The only exception to this is the Haar wavelet analysis, which leads to piecewise constant intensity estimates such as the one shown here. We would like to produce smoother estimates of intensities of this type using methods optimal for non-Gaussian data. Haar Wavelet Estimate of time-varying intensity
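To make the observation model concrete, here is a minimal Python sketch (purely illustrative; the burst profile and bin width are invented, not the BATSE 845 data) that draws Poisson photon counts from a hypothetical time-varying intensity and forms a crude piecewise-constant estimate by averaging over fixed dyadic blocks. The adaptive tree pruning discussed next replaces the fixed block width with data-driven splits.

    import numpy as np

    # Hypothetical burst-like intensity on 1024 time bins (illustrative values only).
    n = 1024
    t = np.linspace(0.0, 1.0, n)
    intensity = 5.0 + 50.0 * np.exp(-((t - 0.3) / 0.05) ** 2)   # expected counts per bin

    # Photon-limited observations: independent Poisson counts in each bin.
    rng = np.random.default_rng(0)
    counts = rng.poisson(intensity)

    # Crude Haar-style estimate: piecewise constant over fixed dyadic blocks of width 32.
    block = 32
    haar_est = counts.reshape(-1, block).mean(axis=1).repeat(block)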

Optimal Tree Pruning. Piecewise Constant Estimate, Piecewise Linear Estimate. Thresholding Haar wavelet coefficients is effectively equivalent to optimally pruning a binary tree representation of the observations until each leaf node represents an interval of uniform intensity. Smoother results can be obtained with piecewise polynomial estimates. As mentioned before, wavelets can produce smooth results but are difficult to analyze. It is possible, however, to conduct optimal tree pruning in which each leaf node represents a polynomially varying intensity. Such a tree pruning algorithm can be precisely analyzed for non-Gaussian noise models and thus does not exhibit the artifacts associated with wavelet analysis on non-Gaussian data.
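A minimal sketch of this pruning in its simplest (piecewise-constant, degree-0) form, assuming a Poisson likelihood and a fixed penalty of gamma per leaf; the piecewise-polynomial version replaces the constant fit on each leaf with a polynomial fit, but the bottom-up compare-and-merge structure is the same. The function names and the penalty value are illustrative, not the authors' implementation.

    import numpy as np

    def leaf_cost(x, gamma):
        # Penalized negative Poisson log-likelihood of a constant-intensity fit
        # (the log(x!) term is dropped since it does not depend on the fit).
        lam = max(x.mean(), 1e-12)
        return -(x * np.log(lam) - lam).sum() + gamma

    def prune(x, gamma):
        # Recursively decide whether to keep one leaf or split this dyadic interval.
        # Returns (cost, piecewise-constant estimate) for the optimally pruned subtree.
        cost_leaf = leaf_cost(x, gamma)
        est_leaf = np.full(len(x), max(x.mean(), 1e-12))
        if len(x) <= 2:
            return cost_leaf, est_leaf
        half = len(x) // 2
        cost_l, est_l = prune(x[:half], gamma)
        cost_r, est_r = prune(x[half:], gamma)
        if cost_l + cost_r < cost_leaf:           # the split pays for its extra penalty
            return cost_l + cost_r, np.concatenate([est_l, est_r])
        return cost_leaf, est_leaf

    # Example usage: cost, est = prune(counts.astype(float), gamma=2.0)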

Piecewise Polynomial Estimation Observations Haar Est. Piecewise Cubic Est. When we apply this tree-pruning to the GRB data to obtain a piecewise cubic estimate of the intensity, we get the result displayed at the bottom of the screen. Clearly this estimate is smoother than the Haar estimate, and, unlike a traditional wavelet-based estimate, it is optimal for the observation model. Let’s explore this optimality more carefully... (onto next slide)

Penalized Likelihood Estimation. Minimize -log(likelihood) + gamma * penalty. Theory: variance is proportional to the number of terms, so a sparse approximation means lower variance and fewer artifacts. The proposed tree pruning can be accomplished within a penalized likelihood estimation framework. In this case, we penalize the negative log-likelihood with a penalty proportional to the number of terms in the approximation (e.g., the number of cubic pieces in the approximation on the previous slide). The idea is to balance fidelity to the data against the complexity of the estimate. An estimate with a large number of terms could fit the data very well, but another observation of the same process would then result in a very different estimate; that is, the estimator would have high variance. Conversely, an estimate with a small number of terms has a smaller variance because a large number of observations is used to estimate each term. As a result, such estimators exhibit fewer artifacts.
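In symbols (writing gamma for the penalty weight and Gamma for the class of pruned-tree estimates), the estimator on this slide has the generic penalized-likelihood form

    \[
    \hat{f} \;=\; \arg\min_{f \in \Gamma} \Big\{ -\log p(x \mid f) \;+\; \gamma \,\mathrm{pen}(f) \Big\},
    \qquad \mathrm{pen}(f) \;\propto\; \#\{\text{terms in } f\},
    \]

so the data-fidelity term and the complexity term trade off exactly as described above.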

Upper Bounds on the Risk. Goal: determine the best penalization strategy such that minimizing the penalized likelihood results in a bound on the risk. Now the key questions are: 1. How should the penalty be calculated? 2. Given a penalization scheme, what can we say about the expected error? We are going to discuss these bounds in terms of density estimation, but it's important to emphasize that they apply to Poisson intensity estimation with only minor alterations. Specifically, observations of random variables can be modeled using a multinomial distribution, where the parameters of the multinomial correspond to the pmf we are trying to estimate. This is like estimating Poisson observations with a priori knowledge of how many events will be counted. We measure the risk by the squared Hellinger distance between the estimate and the true density, for a number of reasons. First, it is a general nonparametric measure appropriate for any density. Second, the Hellinger distance provides upper and lower bounds on the L1 error. The significance of these bounds will be detailed later.
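For reference, the squared Hellinger distance between the estimate and the true density (using the convention without the factor 1/2) and its relation to the L1 error are

    \[
    H^2(f,\hat f) \;=\; \int \Big( \sqrt{f(x)} - \sqrt{\hat f(x)} \Big)^2 dx,
    \qquad
    H^2(f,\hat f) \;\le\; \| f - \hat f \|_1 \;\le\; 2\, H(f,\hat f),
    \]

which is why a Hellinger risk bound also controls the L1 risk discussed later.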

Minimum Description Length. For fixed-degree polynomials, the penalty is proportional to the number of bits needed to prefix-encode an order-r, m-piece estimate (partition plus polynomial coefficients); such a code is guaranteed to satisfy the Kraft inequality. We have demonstrated that if the penalty is set to be proportional to the number of bits required to prefix-code the estimate, then the penalty satisfies the Kraft inequality and allows us to take advantage of a key error-bound theorem first developed by Li and Barron and then generalized by Kolaczyk and Nowak. In this case, the penalty has two parts: one for encoding the locations of the knots of the polynomial and another for encoding the coefficients of each polynomial piece.
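The Kraft inequality mentioned here is the requirement that the code lengths, and hence the penalties, satisfy

    \[
    \sum_{\theta \in \Gamma} 2^{-\mathrm{pen}(\theta)} \;\le\; 1,
    \]

where Gamma is the set of candidate estimates; this is the condition needed to invoke the Li-Barron / Kolaczyk-Nowak risk bound.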

Fixed-degree Risk Bound. This is near-optimal: given the penalization scheme on the previous slide, we can bound the error as stated in this theorem. (If someone asks, this is for f in a Besov space parameterized by alpha.) The upper bound is within a logarithmic factor of the lower bound. This tells us that we do not expect to find any other estimator able to asymptotically outperform the one proposed here.

Implications for L1 Error. Similar bounds can also be developed for the L1 error. L1 errors are particularly interesting in density estimation because of Scheffé's identity, which tells us that a bound on the L1 error provides a bound on the difference between the true probability measure and the density estimator's measure on every event of interest. As in the Hellinger case, the L1 upper bound is within a logarithmic factor of the lower bound.
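Scheffé's identity referred to here equates the worst-case disagreement in probability assigned to any event with half the L1 distance between the densities:

    \[
    \sup_{A} \big| P_f(A) - P_{\hat f}(A) \big| \;=\; \tfrac{1}{2} \int \big| f(x) - \hat f(x) \big| \, dx,
    \]

so an L1 risk bound uniformly controls the error in the probability assigned to every event of interest.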

Minimum Description Length (continued). For fixed-degree polynomials, the penalty is proportional to the number of bits needed to prefix-encode an order-r, m-piece estimate (partition plus polynomial coefficients), guaranteed to satisfy the Kraft inequality. For free-degree polynomials, the penalty is proportional to the number of bits needed to prefix-encode an estimate with d coefficients on m intervals with lengths {li} (partition, polynomial coefficients, and the distribution of coefficients among the pieces). These ideas can easily be extended to the case of free-degree polynomials. The tree-pruning algorithm described above allows free-degree piecewise polynomial estimates to be computed with a computationally efficient algorithm, and this method also has some desirable theoretical properties. Note that the penalization structure is about the same, except that for free-degree estimates we also have to encode the degree of each polynomial piece in the estimate.

Near-parametric Rates for Free-Degree Estimation Given the penalization scheme on the previous slide, we can bound the risk (for some f) by a term within a logarithmic factor of the parametric rate. That means that if we knew the form of f a priori and just had to estimate the beta parameters, asymptotically we wouldn’t do much better than this free-degree method.

Density Estimation Simulation. Comparing the proposed density estimation method with the method proposed by Donoho et al. These are average errors over a number of trials. In each case, the penalties were weighted to minimize the average L1 error. This demonstrates that the proposed tree-pruning method, which uses the more precise multinomial model, outperforms methods that use a Gaussian approximation, as predicted by the theory. D. Donoho, I. Johnstone, G. Kerkyacharian, and D. Picard, "Density estimation by wavelet thresholding," Ann. Statist., vol. 24, pp. 508–539, 1996.

Haar Wavelets and Wedgelets Original Image Haar Wavelet Partition Wedgelet Partition Wedgelet Donoho (Stanford) began to address the issue of edge representation with his work on wedgelets. Instead of representing edges with successive refinements of the dyadic partition space, he would refine the partition until he could approximate an edge with a straight line to within a given error. The image examined here is a Shepp-Logan phantom, commonly used for PET reconstruction simulations. Below the original image is a picture of just its edges. Next is a Haar wavelet approximation of the phantom, and the partition space underlying the approximation. Finally we see a wedgelet approximation of the phantom, exhibiting the same total MSE as the Haar approximation, and its underlying partition space. Clearly the use of wedgelets affords a more efficient partitioning of the image into regions of constant intensity. The key idea behind wedgelets is that not all edges in images are strictly vertical or horizontal, and that we can better approximate these edges by accounting for this fact. Analogously, not all smooth regions in images exhibit homogeneous intensities. Platelets account for this fact and attempt to better approximate smoothly varying regions.
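The following Python sketch illustrates the wedgelet idea on a single dyadic square: search over a small family of candidate lines, fit a constant on each side of the line, and keep the best split. The line parameterization and the exhaustive search used here are a simplification for illustration, not Donoho's exact wedgelet dictionary.

    import numpy as np

    def best_wedgelet(patch, n_angles=16, n_offsets=9):
        # Fit a wedgelet to a square patch: split it along the best of a small set of
        # candidate lines and use the mean intensity on each side of the split.
        n = patch.shape[0]
        yy, xx = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
        best_err = ((patch - patch.mean()) ** 2).sum()      # baseline: a single constant
        best_fit = np.full(patch.shape, patch.mean())
        for theta in np.linspace(0.0, np.pi, n_angles, endpoint=False):
            # Signed distance of each pixel centre from a line through the patch centre.
            proj = np.cos(theta) * (xx - n / 2) + np.sin(theta) * (yy - n / 2)
            for offset in np.linspace(proj.min(), proj.max(), n_offsets):
                side = proj <= offset
                if side.all() or not side.any():
                    continue
                fit = np.where(side, patch[side].mean(), patch[~side].mean())
                err = ((patch - fit) ** 2).sum()
                if err < best_err:
                    best_err, best_fit = err, fit
        return best_fit, best_err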

Approximation with Platelets. True profile, wedgelet profile, platelet profile. Instead of approximating the image on each cell of the partition by a constant, as is done in a Haar wavelet or wedgelet analysis, we can approximate it with a planar surface. Each platelet requires three coefficients, compared with the one coefficient for a piecewise constant approximation. These cross-sections demonstrate the piecewise-linear nature of platelet approximations.
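As a sketch of the three-coefficient fit on one cell, the planar surface a + b*x + c*y can be obtained by ordinary least squares; least squares is used here only to make the "three coefficients per cell" point concrete (the platelet estimator itself fits these coefficients within the penalized likelihood framework above).

    import numpy as np

    def platelet_fit(patch):
        # Fit a planar surface a + b*x + c*y to a square patch by least squares.
        n = patch.shape[0]
        yy, xx = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
        A = np.column_stack([np.ones(n * n), xx.ravel(), yy.ravel()])
        coeffs, *_ = np.linalg.lstsq(A, patch.ravel(), rcond=None)
        return (A @ coeffs).reshape(n, n), coeffs   # fitted surface and (a, b, c)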

Platelet Approximation Theory. For images consisting of twice continuously differentiable regions separated by twice continuously differentiable boundaries, the m-term approximation error decay rates are: Fourier O(m^{-1/2}), wavelets O(m^{-1}), wedgelets O(m^{-1}), and platelets O(m^{-2}). We have shown that for this class of images, m-term platelet approximations may significantly outperform Fourier, wavelet, or wedgelet approximations. Wavelet and Fourier approximations do not perform well on this class of images because of the boundary. Conversely, wedgelets can handle boundaries of this type but produce piecewise constant approximations and perform poorly in the smoother (but non-constant) regions of images. Using the m-term approximation theory for platelets, combined with the estimation error bounding technique outlined above, it is straightforward to bound the platelet approximation error.

Platelet Denoising. Noisy Image, Haar Estimate, Platelet Estimate. Point out the better edge and surface representations.

Inverse Problems. Goal: estimate f from indirect observations x ~ Poisson(Pf), where P models the imaging system. EM-MLE algorithm: this theory can also be extended to solving inverse problems in a very general context. The EM-MLE algorithm is typically used in medical imaging and astronomical applications; however, it can produce strong artifacts as the number of iterations grows, as demonstrated in the movie above. We have shown that the MLE part of the EM-MLE algorithm can be replaced by the platelet penalized likelihood estimation described above, for improved results.
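For context, the classical EM-MLE iteration for this Poisson model (the Shepp-Vardi / Richardson-Lucy update) has the multiplicative form

    \[
    f_j^{(k+1)} \;=\; \frac{f_j^{(k)}}{\sum_i P_{ij}} \sum_i P_{ij}\, \frac{x_i}{\big(P f^{(k)}\big)_i},
    \]

and the modification described on this slide replaces the plain maximum-likelihood update with the platelet penalized-likelihood estimate at each iteration, which is what suppresses the late-iteration artifacts.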

Platelet Performance. MLE reconstruction, Haar reconstruction, platelet reconstruction. Point out how the error decays with iteration, so no one has to guess where to stop the iterations. Also, the platelet estimate converges to an error less than the minimum error achievable by the EM-MLE. Finally, point out how many fewer artifacts the platelet reconstruction has.

Conclusions. Edge and surface approximations with platelets lead to fewer noise artifacts. Multiscale polynomial and platelet tree pruning is computationally nearly as efficient as wavelet analysis. Beyond pixels, beyond wavelets, there are piecewise polynomials and platelets!