Multiscale Likelihood Analysis and Image Reconstruction
Rebecca Willett and Robert Nowak, presented by Veronique Delouille
Rice University; University of Wisconsin-Madison
You'll have 20 minutes total: a 15-minute talk with 5 minutes for questions?
Beyond Wavelets
Multiresolution analysis is a powerful tool, but what about… Edges? Non-Gaussian noise? Inverse problems? Piecewise polynomial and platelet-based estimators address these issues.
Wavelet analysis has led to recent breakthroughs in statistical signal and image processing. Wavelets are effective because they adaptively zoom in and out on the data. Despite their success, however, several key issues are not adequately handled by conventional wavelet-based statistical methods:
1. Many applications, particularly in astronomy, involve non-Gaussian data, and standard wavelet denoising methods are difficult to analyze under non-Gaussian noise.
2. Images differ from one-dimensional signals in that edges are not isolated point singularities; instead, they lie along one-dimensional curves. Classical wavelet bases do not provide good approximations to edges of this type.
3. Ill-conditioned inverse problems are routinely encountered, and simple wavelet thresholding/shrinkage methods apply to only a small subset of the inverse problems of interest.
The methods presented here resemble wavelets in that they too adaptively zoom in and out on the data, but they have been designed to handle challenges such as these. I'll first discuss the piecewise polynomial technique for one-dimensional signals, and then its extension to two dimensions.
Non-Gaussian Noise
[Figure: GRB BATSE 845, observed Poisson data, and Haar wavelet estimate of the time-varying intensity]
We begin by addressing non-Gaussian noise in the special case of 1D signals. Astronomical signals, such as this observation of a gamma-ray burst, are frequently non-Gaussian: the data are collected by counting the number of photons hitting a detector over time, so their statistics are distinctly Poisson. Most conventional wavelet-based techniques, however, are analyzed under the assumption that the observations are Gaussian. If we used such techniques to estimate the underlying intensity of this signal, the analysis of the wavelet coefficients would be intractable, so we could not perform tasks such as setting optimal threshold levels, and the results would be distorted. The one exception is Haar wavelet analysis (Haar coefficients are built from sums of counts, and sums of independent Poisson variables are again Poisson), which leads to piecewise constant intensity estimates such as the one shown here. We would like to produce smoother estimates of intensities of this type using methods that are optimal for non-Gaussian data.
Optimal Tree Pruning
[Figure: piecewise constant estimate and piecewise linear estimate]
Thresholding Haar wavelet coefficients is effectively the same as optimally pruning a binary tree representation of the observations until each leaf node represents an interval of uniform intensity. Smoother results can be obtained with piecewise polynomial estimates. As mentioned before, smoother (non-Haar) wavelets can produce smooth results but are difficult to analyze. It is possible, however, to conduct optimal tree pruning in which each leaf node represents a polynomially varying intensity. Such a tree-pruning algorithm can be precisely analyzed for non-Gaussian noise models and thus does not exhibit the artifacts associated with wavelet analysis of non-Gaussian data.
Piecewise Polynomial Estimation
[Figure: observations, Haar estimate, piecewise cubic estimate]
When we apply this tree pruning to the GRB data to obtain a piecewise cubic estimate of the intensity, we get the result displayed at the bottom of the screen. This estimate is clearly smoother than the Haar estimate and, unlike a traditional wavelet-based estimate, it is optimal for the observation model. Let's explore this optimality more carefully... (on to next slide)
Penalized Likelihood Estimation
Minimize $-\log(\text{likelihood}) + \gamma \cdot \text{penalty}$
Theory: variance $\propto$ number of terms. Sparse approximation → lower variance → fewer artifacts.
The proposed tree pruning can be carried out within a penalized likelihood estimation framework. Here we penalize the negative log-likelihood with a penalty proportional to the number of terms in the approximation (e.g., the number of cubic pieces in the approximation on the previous slide). The idea is to balance fidelity to the data against the complexity of the estimate. An estimate with a large number of terms can fit the data very well, but that is a problem: another observation of the same process would yield a very different estimate, so the estimator has high variance. Conversely, an estimate with a small number of terms has lower variance, because many observations are used to estimate each term. As a result, such estimators exhibit fewer artifacts.
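To make the pruning concrete, here is a minimal Python sketch of the simplest, piecewise constant (Haar-like) case for Poisson counts, assuming the number of bins is a power of two and a penalty of γ per leaf. The function names and the exact penalty form are illustrative assumptions, not the authors' penalty; a piecewise polynomial version would replace the constant fit on each leaf with a polynomial Poisson fit.

```python
import math

def leaf_nll(counts):
    """Poisson negative log-likelihood of one constant intensity fitted to `counts`.

    The log(x!) terms are dropped because they do not depend on the fit.
    """
    n, total = len(counts), sum(counts)
    if total == 0:
        return 0.0  # MLE intensity is 0; uses the convention 0 * log 0 = 0
    lam = total / n  # MLE of a constant intensity on this interval
    # n*lam - total*log(lam), with n*lam == total
    return total - total * math.log(lam)

def prune_poisson(counts, gamma):
    """Optimal pruning of the dyadic tree over `counts`.

    Minimizes  -log L + gamma * (#leaves)  over all pruned dyadic partitions by
    comparing, at every node, "keep as one leaf" with the best pruning of its
    two children (the cost is additive over leaves, so this is exact).
    Returns (cost, estimate) where estimate holds the fitted intensity per bin.
    """
    n = len(counts)
    keep_cost = leaf_nll(counts) + gamma
    keep_est = [sum(counts) / n] * n
    if n == 1:
        return keep_cost, keep_est
    half = n // 2
    left_cost, left_est = prune_poisson(counts[:half], gamma)
    right_cost, right_est = prune_poisson(counts[half:], gamma)
    if left_cost + right_cost < keep_cost:
        return left_cost + right_cost, left_est + right_est
    return keep_cost, keep_est

counts = [2, 3, 1, 2, 10, 12, 9, 11]
cost, estimate = prune_poisson(counts, gamma=2.0)
```

Here γ plays the role of the weight on the penalty: γ = 0 returns the raw per-bin MLE, while a very large γ collapses the estimate to a single constant.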
Upper Bounds on the Risk
Goal: determine the best penalization strategy such that minimizing the penalized likelihood criterion results in a bound on the risk.
The key questions are now: 1. How should the penalty be calculated? 2. Given a penalization scheme, what can we say about the expected error?
We discuss these bounds in terms of density estimation, but it is important to emphasize that they apply to Poisson intensity estimation with only minor alterations. Specifically, binned observations of i.i.d. random variables can be modeled with a multinomial distribution whose parameters are the pmf we are trying to estimate. This is like estimating a Poisson intensity with a priori knowledge of how many events will be counted.
We use the squared Hellinger distance for several reasons. First, it is a general nonparametric measure appropriate for any density. Second, with $H^2(f,\hat f) = \int (\sqrt{f} - \sqrt{\hat f})^2$, we have $H^2 \le \|f - \hat f\|_1 \le 2H$, so the Hellinger distance yields both lower and upper bounds on the L1 error. The significance of these bounds will be detailed later.
[Figure: estimate vs. true density, with the squared Hellinger distance between them]
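The relation between the squared Hellinger distance and the L1 error can be checked numerically. This short sketch assumes the convention $H^2 = \sum_i (\sqrt{p_i} - \sqrt{q_i})^2$ (no factor of 1/2) on discrete pmfs and tests $H^2 \le L_1 \le 2H$ on random pmfs; the helper names are our own.

```python
import math
import random

def sq_hellinger(p, q):
    """Squared Hellinger distance sum_i (sqrt(p_i) - sqrt(q_i))**2 between two pmfs."""
    return sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))

def l1(p, q):
    """L1 distance between two pmfs."""
    return sum(abs(a - b) for a, b in zip(p, q))

def random_pmf(k, rng):
    """A random pmf on k points (normalized uniform weights)."""
    w = [rng.random() for _ in range(k)]
    s = sum(w)
    return [x / s for x in w]

rng = random.Random(0)
for _ in range(1000):
    p, q = random_pmf(8, rng), random_pmf(8, rng)
    h2, d1 = sq_hellinger(p, q), l1(p, q)
    # H^2 <= L1 <= 2H, with a small tolerance for rounding
    assert h2 <= d1 + 1e-12 and d1 <= 2 * math.sqrt(h2) + 1e-12
```

Disjoint point masses, p = [1, 0] and q = [0, 1], give H² = L1 = 2, so the lower bound is attained there.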
Minimum Description Length
For fixed-degree polynomials: pen ∝ number of bits needed to prefix-encode an order-r, m-piece estimate (guaranteed to satisfy the Kraft inequality). Two parts: partition, polynomial coefficients.
We have shown that if the penalty is proportional to the number of bits required to prefix-encode the estimate, then it satisfies the Kraft inequality, which lets us take advantage of a key error bound theorem first developed by Li and Barron and later generalized by Kolaczyk and Nowak. In this case the penalty has two parts: one for encoding the locations of the knots of the piecewise polynomial (the partition) and another for encoding the coefficients on each polynomial piece.
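To illustrate why a prefix code yields a valid penalty, here is a sketch of one simple way to encode the partition part, assumed for illustration rather than taken from the talk: write the pruned binary tree in preorder, one bit per node ('1' = split, '0' = leaf). A tree with m leaves has 2m − 1 nodes, so it costs 2m − 1 bits. The code checks that these codewords are prefix-free and that their Kraft sum stays below 1. If each of the m(r+1) coefficients is then quantized to a hypothetical fixed b bits, appending them keeps the code prefix-free and leaves the Kraft sum unchanged, since the 2^{m(r+1)b} coefficient strings each carry weight 2^{-m(r+1)b}.

```python
def tree_codes(m):
    """Preorder codes of all full binary trees with m leaves ('1' = split, '0' = leaf)."""
    if m == 1:
        return ["0"]
    codes = []
    for k in range(1, m):  # k leaves in the left subtree, m - k in the right
        for left in tree_codes(k):
            for right in tree_codes(m - k):
                codes.append("1" + left + right)
    return codes

def is_prefix_free(codes):
    """True if no codeword is a prefix of another (or a duplicate).

    After sorting, any codeword that prefixes another also prefixes its successor.
    """
    s = sorted(codes)
    return all(not b.startswith(a) for a, b in zip(s, s[1:]))

def kraft_sum(max_leaves):
    """Sum of 2**(-length) over the codes of all trees with up to max_leaves leaves."""
    return sum(2.0 ** -len(c) for m in range(1, max_leaves + 1) for c in tree_codes(m))
```

The partial sums increase toward 1 as more trees are allowed but never exceed it, which is exactly the Kraft condition the penalty needs.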
Fixed-degree Risk Bound
This is near-optimal: the upper bound is within a logarithmic factor of the lower bound.
Given the penalization scheme on the previous slide, we can bound the error as stated in this theorem. (If someone asks: this is for f in a Besov space parameterized by α.) Because the upper bound is within a logarithmic factor of the lower bound, we do not expect any other estimator to asymptotically outperform the one proposed here by more than a logarithmic factor.
Implications for L1 Error
Similar bounds can also be developed for the L1 error. L1 errors are particularly interesting in density estimation because of Scheffé's identity, $\sup_A |P(A) - \hat P(A)| = \tfrac{1}{2}\|f - \hat f\|_1$: a bound on the L1 error bounds the difference between the probabilities that the true measure and the estimate's measure assign to every event of interest. As in the Hellinger case, the L1 error upper bound is within a logarithmic factor of the lower bound.
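Scheffé's identity can be verified by brute force on a small discrete example: the largest discrepancy |P(A) − Q(A)| over all events A equals half the L1 distance. The pmfs below are arbitrary illustrative values.

```python
from itertools import combinations

def max_event_gap(p, q):
    """Largest |P(A) - Q(A)| over all events A (subsets of the support), by enumeration."""
    k = len(p)
    best = 0.0
    for r in range(k + 1):
        for A in combinations(range(k), r):
            gap = abs(sum(p[i] for i in A) - sum(q[i] for i in A))
            best = max(best, gap)
    return best

def half_l1(p, q):
    """Half the L1 distance between two pmfs."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

p = [0.1, 0.4, 0.2, 0.3]
q = [0.25, 0.25, 0.25, 0.25]
print(max_event_gap(p, q), half_l1(p, q))  # both 0.2 up to rounding
```

The maximizing event is the set where p exceeds q (here points 1 and 3), which is why the supremum is attained.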
Minimum Description Length
For free-degree polynomials: pen ∝ number of bits needed to prefix-encode an estimate with d coefficients on m intervals with lengths $\{l_i\}$. Parts: partition, polynomial coefficients, distribution of coefficients.
These ideas extend easily to free-degree polynomials. The tree-pruning algorithm described above computes free-degree piecewise polynomial estimates efficiently, and this method also has some desirable theoretical properties. Note first that the penalization structure is about the same, except that for free degree we must also encode the degree of each polynomial piece in the estimate.
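Encoding the degree of each piece requires another prefix code. One possible choice, assumed here for illustration and not necessarily the authors', is the Elias gamma code applied to d + 1. It is prefix-free, spends 2⌊log₂(d+1)⌋ + 1 bits on degree d (so low degrees are cheap), and satisfies the Kraft inequality with equality in the limit.

```python
def elias_gamma(n):
    """Elias gamma code of a positive integer: (bit length - 1) zeros, then n in binary."""
    if n < 1:
        raise ValueError("n must be >= 1")
    b = bin(n)[2:]
    return "0" * (len(b) - 1) + b

def encode_degrees(degrees):
    """Concatenate the gamma codes of d + 1 for each piece's degree d >= 0."""
    return "".join(elias_gamma(d + 1) for d in degrees)

def decode_degrees(bits):
    """Invert encode_degrees; the prefix property makes the split unambiguous."""
    out, i = [], 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":  # count leading zeros = (bit length - 1)
            zeros += 1
            i += 1
        out.append(int(bits[i:i + zeros + 1], 2) - 1)
        i += zeros + 1
    return out

bits = encode_degrees([0, 3, 1])  # constant, cubic and linear pieces
```

Because no codeword is a prefix of another, the degree bits can be concatenated with the partition and coefficient bits without separators.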
Near-parametric Rates for Free-Degree Estimation
Given the penalization scheme on the previous slide, we can bound the risk (for some f) by a term within a logarithmic factor of the parametric rate. That means that even if we knew the form of f a priori and only had to estimate its parameters β, we would asymptotically do little better than this free-degree method.
Density Estimation Simulation
Comparison of the proposed density estimation method with the method proposed by Donoho et al. These are average errors over a number of trials. In each case, the penalties were weighted to minimize the average L1 error. The results show that the proposed tree-pruning method, which uses the more precise multinomial model, outperforms the other method, which uses a Gaussian approximation, as predicted by the theory.
D. Donoho, I. Johnstone, G. Kerkyacharian, and D. Picard, "Density estimation by wavelet thresholding," Ann. Statist., vol. 24, pp. 508–539, 1996.
Haar Wavelets and Wedgelets
[Figure: original image and its edges; Haar wavelet approximation and partition; wedgelet approximation and partition]
Donoho (Stanford) began to address the issue of edge representation with his work on wedgelets. Instead of representing edges through successive refinements of a dyadic partition, he refines the partition only until each edge within a cell can be approximated by a straight line to within a given error. The image examined here is a Shepp-Logan phantom, commonly used in PET reconstruction simulations. Below the original image is a picture of just its edges. Next is a Haar wavelet approximation of the phantom and the partition underlying it. Finally, we see a wedgelet approximation of the phantom, with the same total MSE as the Haar approximation, and its underlying partition. Clearly the use of wedgelets affords a more efficient partitioning of the image into regions of constant intensity.
The key idea behind wedgelets is that not all edges in images are strictly vertical or horizontal, and that we can approximate these edges better by accounting for this fact. Analogously, not all smooth regions in images have homogeneous intensities. Platelets account for this fact and attempt to better approximate smoothly varying regions.
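The core wedgelet operation on a single cell, splitting it by one straight line and fitting a constant on each side, can be sketched as follows. As an assumption, lines are searched on a grid of angles and offsets rather than Donoho's dictionary of lines joining boundary vertices, and the function name is hypothetical. Offsets that leave one side empty reproduce the plain constant (Haar-style) fit, so the no-edge case is always available.

```python
import math

def wedgelet_fit(block, n_angles=16, n_offsets=17):
    """Best two-constant fit to a square block split by one straight line.

    Candidate lines are {(x, y): x cos(theta) + y sin(theta) = t} on a grid of
    angles theta in [0, pi) and offsets t (an illustrative parametrization).
    Returns (sse, theta, t, mean_on_side_0, mean_on_side_1).
    """
    n = len(block)
    c = (n - 1) / 2.0  # centre pixel coordinates on the middle of the block
    pixels = [(j - c, c - i, block[i][j]) for i in range(n) for j in range(n)]
    radius = math.hypot(c, c) + 0.5  # every pixel centre projects inside [-radius, radius]
    best = None
    for a in range(n_angles):
        theta = math.pi * a / n_angles
        ct, st = math.cos(theta), math.sin(theta)
        for b in range(n_offsets):
            t = -radius + 2 * radius * b / (n_offsets - 1)
            side0 = [v for x, y, v in pixels if x * ct + y * st < t]
            side1 = [v for x, y, v in pixels if x * ct + y * st >= t]
            sse, means = 0.0, []
            for side in (side0, side1):
                mu = sum(side) / len(side) if side else 0.0
                means.append(mu)
                sse += sum((v - mu) ** 2 for v in side)
            if best is None or sse < best[0]:
                best = (sse, theta, t, means[0], means[1])
    return best

# An 8x8 cell containing a diagonal edge: 1 above the diagonal, 0 on and below it
edge = [[1.0 if j > i else 0.0 for j in range(8)] for i in range(8)]
sse, theta, t, m0, m1 = wedgelet_fit(edge)
```

A single constant would leave a large error on this cell, whereas one diagonal cut represents the edge exactly; a Haar partition would have to keep subdividing along the diagonal.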
Approximation with Platelets
[Figure: true profile, wedgelet profile, platelet profile]
Instead of approximating the image on each cell of the partition by a constant, as is done in a Haar wavelet or wedgelet analysis, we can approximate it with a planar surface. Each platelet requires three coefficients, compared with the one coefficient for a piecewise constant approximation. These cross-sections demonstrate the piecewise-linear nature of platelet approximations.
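The three platelet coefficients on a cell can be obtained by least squares. In this sketch (function name hypothetical; the cell is assumed square with at least 2 pixels per side), pixel coordinates are centred on the cell, which makes the columns 1, x and y of the design matrix mutually orthogonal, so each coefficient has a closed form. The piecewise constant fit corresponds to keeping only the first coefficient, the mean.

```python
def platelet_fit(block):
    """Least-squares plane a + b*x + c*y on a square cell: three coefficients.

    Returns (a, b, c, sse). With centred coordinates, sum(x) = sum(y) = sum(x*y) = 0,
    so the normal equations decouple.
    """
    n = len(block)
    if n < 2:
        raise ValueError("cell must be at least 2x2")
    c0 = (n - 1) / 2.0
    pts = [(j - c0, c0 - i, block[i][j]) for i in range(n) for j in range(n)]
    sxx = sum(x * x for x, _, _ in pts)  # equals sum(y*y) on a square grid
    a = sum(v for _, _, v in pts) / len(pts)  # the constant (Haar/wedgelet) coefficient
    b = sum(x * v for x, _, v in pts) / sxx
    c = sum(y * v for _, y, v in pts) / sxx
    sse = sum((v - (a + b * x + c * y)) ** 2 for x, y, v in pts)
    return a, b, c, sse

# A linear ramp: exactly representable by one platelet, but not by one constant
ramp = [[5.0 + 2.0 * j - 3.0 * i for j in range(4)] for i in range(4)]
a, b, c, sse = platelet_fit(ramp)
```

On this ramp the plane fit is exact, while a single constant (the mean a) would leave a residual proportional to the slope; this is the smooth-region advantage the slide describes.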
Platelet Approximation Theory
[Figure: image class with twice continuously differentiable regions separated by a twice continuously differentiable boundary]
m-term approximation error decay rates: Fourier $O(m^{-1/2})$; wavelets $O(m^{-1})$; wedgelets $O(m^{-1})$; platelets $O(m^{-2})$.
We have shown that for images consisting of smooth regions separated by smooth boundaries, m-term platelet approximations may significantly outperform Fourier, wavelet, or wedgelet approximations, which have rates of $O(m^{-1/2})$, $O(m^{-1})$, and $O(m^{-1})$, respectively, for this class. Wavelet and Fourier approximations do not perform well on this class of images because of the boundaries. Wedgelets, on the other hand, can handle boundaries of this type, but they produce piecewise constant approximations and perform poorly in the smooth (but non-constant) regions of images. Combining the m-term approximation theory for platelets with the estimation error bounding technique outlined above, it is straightforward to bound the error of platelet-based estimates.
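A quick way to read these rates: if the error behaves like C·m^(−rate), then reducing the error by a factor F requires multiplying the number of terms by F^(1/rate). The arithmetic below ignores the constants hidden in the O(·) and assumes the rates are tight, so it is only indicative.

```python
# Error ~ C * m**(-rate); shrinking the error by `factor` needs m to grow by factor**(1/rate).
rates = {"Fourier": 0.5, "wavelets": 1.0, "wedgelets": 1.0, "platelets": 2.0}

def term_growth(rate, factor):
    """Multiplicative increase in the number of terms m for a `factor`-fold lower error."""
    return factor ** (1.0 / rate)

for name, rate in rates.items():
    print(f"{name:10s} needs x{term_growth(rate, 100.0):,.0f} terms for 100x lower error")
```

For a 100-fold error reduction this gives roughly 10,000 times more Fourier terms, 100 times more wavelet or wedgelet terms, and only 10 times more platelets.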
Platelet Denoising
[Figure: noisy image, Haar estimate, platelet estimate]
Point out the better edge and surface representations.
Inverse Problems
Goal: estimate f from observations $x \sim \mathrm{Poisson}(Pf)$
EM-MLE algorithm:
This theory can also be extended to solving inverse problems in a very general context. The EM-MLE algorithm is typically used in medical imaging and astronomical applications; however, it can produce strong artifacts as the number of iterations grows, as demonstrated in the movie above. We have shown that the MLE step of the EM-MLE algorithm can be replaced by the platelet penalized likelihood estimation described above, with improved results.
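The EM-MLE update itself is not reproduced in the text, so as an assumption this sketch uses the standard EM iteration for Poisson data x ~ Poisson(Pf) (the classical emission-tomography update of Shepp and Vardi). The E-step computes the expected complete-data counts z; the default M-step is the maximum-likelihood estimate z_j / s_j, where s_j is the j-th column sum of P. The `m_step` hook is hypothetical and only marks where, according to the notes, a platelet penalized likelihood estimate would replace the MLE step; no platelet estimator is implemented here.

```python
def em_poisson(P, x, n_iter=100, m_step=None):
    """EM iterations for x ~ Poisson(P f), with P given as a list of rows (P[i][j] >= 0).

    `m_step(z, s)` is an optional hook returning the new f from the expected
    complete-data counts z and the column sums s; the default is the MLE z_j / s_j.
    """
    rows, cols = len(P), len(P[0])
    s = [sum(P[i][j] for i in range(rows)) for j in range(cols)]  # sensitivities
    f = [1.0] * cols  # flat, strictly positive starting point
    for _ in range(n_iter):
        Pf = [sum(P[i][j] * f[j] for j in range(cols)) for i in range(rows)]
        ratio = [x[i] / Pf[i] if Pf[i] > 0 else 0.0 for i in range(rows)]
        # E-step: expected number of counts that originated from each unknown f_j
        z = [f[j] * sum(P[i][j] * ratio[i] for i in range(rows)) for j in range(cols)]
        # M-step: MLE by default, or a penalized estimate supplied through the hook
        f = m_step(z, s) if m_step else [z[j] / s[j] for j in range(cols)]
    return f

P = [[0.8, 0.2], [0.3, 0.7]]
x = [14.0, 24.0]  # noise-free data generated from f = [10, 30]
f_hat = em_poisson(P, x, n_iter=1000)
```

Each iteration preserves the total count (the entries of P f always sum to the sum of x), and with noisy data the unpenalized iterates eventually fit the noise, which is the artifact growth the notes describe.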
Platelet Performance
[Figure: MLE reconstruction, Haar reconstruction, platelet reconstruction]
Point out how the error decays with iteration, so no one has to guess where to stop the iterations. Also, the platelet estimate converges to an error lower than the minimum error achievable by the EM-MLE. Finally, point out how many fewer artifacts the platelet approximation has.
Conclusions
Edge and surface approximations with platelets
Fewer noise artifacts
Multiscale polynomial and platelet tree pruning: computationally nearly as efficient as wavelet analysis
Beyond pixels, beyond wavelets, there are piecewise polynomials and platelets!