Multiscale Likelihood Analysis and Inverse Problems in Imaging Rob Nowak Rice University Electrical and Computer Engineering www.dsp.rice.edu/~nowak Supported by DARPA, INRIA, NSF, ARO, and ONR
Image Analysis Applications: remote sensing, brain mapping, image restoration, network tomography.
Maximum Likelihood Image Analysis: an unknown object is observed through the physics of the measurement process, producing data; a statistical model combines that physics with prior knowledge; maximizing the likelihood yields the maximum likelihood estimate.
Analysis of Gamma-Ray Bursts: the same pipeline with photon counting as the physics and piecewise smoothness as the prior knowledge; the output is an estimate of the underlying intensity.
Image Deconvolution: the same pipeline with noise & blurring as the physics and an image model as the prior knowledge.
Brain Tomography: the same pipeline with counting & projection as the physics and anatomy & physiology as the prior knowledge.
Wavelet-Based Multiresolution Analysis: a high-resolution image is represented as a mid-resolution image plus prediction errors, then a low-resolution image plus prediction errors; the prediction errors are the wavelet coefficients. Most wavelet coefficients are zero: a sparse representation.
Wavelets and Denoising: noisy image → wavelet transform → thresholding estimator → denoised reconstruction. 'Keep' large coefficients, 'kill' small coefficients. Near-minimax optimality over a range of function spaces; extensions to non-Gaussian noise models are non-trivial.
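A minimal numerical sketch of the keep/kill rule for a 1D signal, assuming an orthonormal Haar transform written directly in NumPy and the universal threshold sigma*sqrt(2 log N); these specific choices (Haar, hard thresholding, that particular threshold) are illustrative assumptions, not prescriptions from the slides.

```python
import numpy as np

def haar_forward(x):
    """Orthonormal Haar DWT of a length-2^J signal: returns the coarsest
    scaling coefficient plus the detail (wavelet) coefficients, coarse to fine."""
    x = np.asarray(x, dtype=float)
    details = []
    while len(x) > 1:
        details.append((x[0::2] - x[1::2]) / np.sqrt(2.0))  # wavelet (detail)
        x = (x[0::2] + x[1::2]) / np.sqrt(2.0)               # scaling (smooth)
    return x, details[::-1]

def haar_inverse(s, details):
    """Invert haar_forward."""
    x = np.asarray(s, dtype=float)
    for d in details:
        up = np.empty(2 * len(x))
        up[0::2] = (x + d) / np.sqrt(2.0)
        up[1::2] = (x - d) / np.sqrt(2.0)
        x = up
    return x

def haar_denoise(y, sigma):
    """'Keep' large, 'kill' small wavelet coefficients (hard thresholding)."""
    thresh = sigma * np.sqrt(2.0 * np.log(len(y)))   # universal threshold
    s, details = haar_forward(y)
    details = [d * (np.abs(d) > thresh) for d in details]
    return haar_inverse(s, details)

# usage: a noisy piecewise-constant ("blocks") signal of dyadic length
rng = np.random.default_rng(0)
x = np.repeat([0.0, 4.0, 1.0, 6.0], 256)
y = x + rng.normal(scale=1.0, size=x.size)
x_hat = haar_denoise(y, sigma=1.0)
```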
Wavelets and Inverse Problems: observations = linear operator applied to the image, plus noise. Two common approaches: (1) linear reconstruction followed by nonlinear denoising; the image is easy to model in the reconstruction domain, but the noise is not; (2) denoising the raw data followed by linear reconstruction; the noise is easy to model in the observation domain, but the image is not. Exception: if the linear operator K is approximately diagonalized by a wavelet transform, then both the image and the noise are easy to model in both domains.
Example: Image Restoration. Observations = convolution of the image, plus noise. Deblurring in the FFT domain costs O(N log N); denoising in the DWT domain costs O(N).
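A sketch of the O(N log N) FFT-domain deblurring step, assuming a regularized-inverse (Wiener-type) filter and a hypothetical 9-tap blur kernel; the O(N) DWT-domain denoising step would then be the same keep/kill operation as the earlier sketch.

```python
import numpy as np

def fft_deblur(y, kernel, reg=1e-2):
    """O(N log N) regularized-inverse deblurring in the FFT domain:
    divide by the blur frequency response, with regularization so that
    noise does not blow up where the response is near zero."""
    H = np.fft.fft(kernel, n=len(y))              # frequency response of K
    Y = np.fft.fft(y)
    X = np.conj(H) * Y / (np.abs(H) ** 2 + reg)   # regularized inverse filter
    return np.real(np.fft.ifft(X))

# usage: circularly blur a boxcar signal with a (hypothetical) 9-tap average
rng = np.random.default_rng(1)
x = np.zeros(512); x[200:300] = 1.0
kernel = np.zeros(512); kernel[:9] = 1.0 / 9.0
y = np.real(np.fft.ifft(np.fft.fft(kernel) * np.fft.fft(x)))
y = y + 0.01 * rng.normal(size=512)
x_hat = fft_deblur(y, kernel, reg=1e-3)
```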
Wavelets and Inverse Problems. Fast exact methods for special operators: Donoho (1995), wavelet-vaguelette decomposition for scale-homogeneous operators (e.g., the Radon transform); Rougé (1995) and Kalifa & Mallat (1999), wavelet-packet decompositions for certain scale-inhomogeneous convolutions. General methods with heavy computation or approximate solutions: Liu & Moulin (1998), Yi & Nowak (1999), Neelamani et al. (1999), Belge et al. (2000), Jalobeanu et al. (2001), Rivaz et al. (2001). General methods with fast exact solutions via EM: Nowak & Kolaczyk (1999), Poisson inverse problems; Figueiredo & Nowak (2001), Gaussian inverse problems.
Maximum Penalized Likelihood Estimation (MPLE): θ̂ = arg max_θ { log p(y | θ) - pen(θ) }, where log p(y | θ) is the log-likelihood function. Gaussian model: y = Kx + n with n ~ N(0, σ²I), so the log-likelihood is -||y - Kx||² / (2σ²) up to a constant. Poisson model: y ~ Poisson(Kx), so the log-likelihood is Σ_i [ y_i log(Kx)_i - (Kx)_i ] up to a constant.
Image Complexity: images have structure; not a totally unorganized city, not a completely random brain.
Location Uncertainty in Images. Location information is central to image analysis: when are there jumps / bursts? Where are the edges? Where are the hotspots? From limited data we can detect & locate key "structure"; "images" lie on a low-dimensional manifold … but we don't know which manifold.
Complexity Penalized Likelihood Analysis. Let θ denote the wavelet/multiscale coefficients; find θ̂ = arg max_θ { log p(y | θ) - pen(θ) }, i.e., a θ with high likelihood that keeps the model as simple as possible. Goal: balance the trade-off between model complexity and fitting the data. Examples: gamma-ray bursts, where the penalty counts the number of jumps / bumps; satellite images & brains, where the penalty counts the number of boundaries / edges.
Basic Approach. The multiscale maximum penalized likelihood estimator (MPLE) is easy to compute if K = I: if we could directly observe the image, we would have a simple denoising problem. The MPLE is very difficult to compute in the indirect case (K ≠ I); no analytic solution exists in general. Expectation-Maximization (EM) idea: introduce (unobserved) direct image data, then iterate between estimating the direct-data likelihood function and maximizing it (a skeleton of this loop is sketched below). The expectation step involves simple linear filtering! The maximization step involves simple denoising!
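A minimal sketch of just the alternation described above; e_step and m_step stand in for the application-specific linear filtering and denoising routines developed on the later slides.

```python
def em_mple(y, e_step, m_step, x0, n_iters=50):
    """Generic EM loop for indirect data: alternately estimate the
    unobserved direct data (E step) and denoise that estimate (M step)."""
    x = x0
    for _ in range(n_iters):
        z_hat = e_step(y, x)   # E step: expected direct data given y and x
        x = m_step(z_hat)      # M step: direct-data MPLE ("denoising")
    return x
```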
Key Ingredients: fast multiscale analysis and thresholding-type estimators for directly observed Gaussian or Poisson data; selection of the "right" unobserved data space; multiscale likelihood factorizations, a generalization of the conventional "energy-based" MRA to an "information-based" MRA; re-interpretation of the inverse problem as a combination of two imaging experiments, a fictitious direct (K = I) observation and the real observed (K ≠ I) observation.
Multiscale Likelihood Factorizations. Assume the direct observation model y = x + noise (possibly non-Gaussian). Orthonormal wavelet decomposition: a multiscale decomposition into orthogonal energy components. Multiscale likelihood factorization: g(y | x) = ∏_I f( y_c(I) | y_I , θ_I ), a Haar-style analysis that factors the likelihood into independent "information" components.
Summary of Likelihood MRA Results. Multiresolution analysis (MRA): a set of sufficient conditions for a multiscale factorization of the likelihood. Efficient algorithms for MPLE: linear-complexity algorithms for analogues of "denoising" methods. Risk bounds: near-minimax risk rates for BV and Besov objects estimated from Gaussian or Poisson data.
Wavelet MRA vs. Likelihood MRA. Function-space MRA: hierarchy of nested subspaces; orthonormal basis; scaling between subspaces; translations within a subspace. Likelihood MRA: hierarchy of data partitions; statistical independence in the data space; the likelihood function reproduces under summation; the conditional likelihood of a child given its parent is a single-parameter density.
Multiscale Data Aggregation basic Haar multiscale analysis
Multiscale Likelihood Factorizations: the original likelihood versus its multiscale factorization (see the numerical sketch below).
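A small sketch of the Haar-style aggregation behind the factorization, assuming Poisson counts on a dyadic grid; the remark that each child-given-parent term is binomial is a standard fact about Poisson data and is stated here as an assumption, not a quotation from the slides.

```python
import numpy as np

def aggregate_counts(y):
    """Haar-style multiscale aggregation: counts at each coarser scale are
    obtained by summing adjacent pairs at the finer scale (finest = data)."""
    levels = [np.asarray(y, dtype=float)]
    while len(levels[-1]) > 1:
        c = levels[-1]
        levels.append(c[0::2] + c[1::2])
    return levels[::-1]          # coarsest (total count) first

def factorization_terms(y):
    """One (parent count, left-child count) pair per internal node.
    For Poisson data, left | parent is Binomial(parent, rho_I), so the full
    likelihood factors into the total-count term times these pieces."""
    levels = aggregate_counts(y)
    terms = []
    for coarse, fine in zip(levels[:-1], levels[1:]):
        for k, parent in enumerate(coarse):
            terms.append((parent, fine[2 * k]))
    return terms

# usage: dyadic-length Poisson counts
rng = np.random.default_rng(2)
counts = rng.poisson(lam=5.0, size=64)
levels = aggregate_counts(counts)     # levels[0] = total count, ..., the data
pairs = factorization_terms(counts)   # one (parent, left-child) pair per node
```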
Examples: in the Gaussian case the multiscale parameter is the Haar wavelet coefficient; in the Poisson case it is a function of the Haar wavelet & scaling coefficients.
Denoising and Information Extraction. Wavelet denoising: "keep" a coefficient if it is large, "kill" it otherwise. Multiscale information extraction: "keep" a multiscale parameter if it is statistically significant, "kill" it (reset it to its trivial value) otherwise.
Penalized Likelihood Estimation. Algorithm: set the penalty to count the number of non-trivial multiscale coefficients. The optimization then reduces to a set of N separate generalized likelihood ratio tests (GLRTs), which can be computed in O(N) operations; this is the analog of hard thresholding (see the sketch below).
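A sketch of the O(N) multiscale GLRT estimator for Poisson counts; the specific GLRT statistic (a binomial test of an even split), the 2 log N penalty, and the top-down intensity reconstruction are illustrative choices consistent with, but not copied from, the slides.

```python
import numpy as np

def multiscale_poisson_mple(y, penalty=None):
    """Multiscale MPLE for Poisson counts y (length a power of 2).
    Each parent/child split gets one GLRT: keep the data-driven split
    proportion if the test beats the penalty, otherwise 'kill' it to 1/2
    (the analog of hard thresholding).  One test per node, O(N) overall."""
    y = np.asarray(y, dtype=float)
    N = len(y)
    if penalty is None:
        penalty = 2.0 * np.log(N)                # illustrative choice

    def xlogx(v):                                # 0 * log(0) = 0 convention
        return np.where(v > 0, v * np.log(np.maximum(v, 1e-300)), 0.0)

    # aggregate counts: finest scale = data, parent = sum of its two children
    levels = [y]
    while len(levels[-1]) > 1:
        c = levels[-1]
        levels.append(c[0::2] + c[1::2])

    # top-down intensity reconstruction, starting from the total count
    lam = np.array([levels[-1][0]])
    for c in levels[-2::-1]:                     # next-finer child counts
        left, right = c[0::2], c[1::2]
        parent = left + right
        # GLRT: data-driven split proportion vs. the trivial proportion 1/2
        glrt = 2.0 * (xlogx(left) + xlogx(right)
                      - xlogx(parent) + parent * np.log(2.0))
        mle_rho = np.divide(left, parent, out=np.full_like(left, 0.5),
                            where=parent > 0)
        rho = np.where((glrt > penalty) & (parent > 0), mle_rho, 0.5)
        child_lam = np.empty(2 * len(lam))
        child_lam[0::2] = lam * rho
        child_lam[1::2] = lam * (1.0 - rho)
        lam = child_lam
    return lam                                   # estimated intensity per bin
```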
Risk Analysis. Hellinger loss: H²(p, q) = ∫ (√p - √q)². Bound on the Hellinger risk (follows from Li & Barron '99): the expected Hellinger loss of the MPLE is bounded by the best achievable trade-off between the KL distance from the truth to a candidate model and that model's penalty. Choice of penalty: proportional to the number of non-trivial multiscale coefficients, balancing the two terms.
Risk Analysis – Upper Bounds. Theorem (Nowak & Kolaczyk '01): if the underlying function / intensity x belongs to BV (α = 1) or a Besov space, then the Hellinger risk decays at near-minimax rates in the Gaussian model, the Poisson model, and the multinomial model (density estimation); n = number of samples.
Risk Analysis – Lower Bounds. Theorem (Nowak & Kolaczyk '01): matching minimax lower bounds. The proof technique is similar in spirit to the "method of hyperrectangles" (Donoho et al. '90). The multiscale likelihood factorization plays a key role in deriving the minimax bounds in the Poisson and multinomial cases. Key point: the multiscale MPLE is near-minimax optimal.
Gamma-Ray Burst Analysis: photon counts versus time for a burst, recorded by the Burst and Transient Source Experiment (BATSE) aboard the Compton Gamma-Ray Observatory. One burst (tens of seconds) emits as much energy as our entire Milky Way does in one hundred years! Bursts are followed by an x-ray "afterglow".
Gamma-Ray Burst Analysis (Poisson data). If we know where the jumps are located, the risk is of order 1/N, where N is the number of measurements. If we don't know where the jumps are located, we can still achieve order log N / N, only a log N factor worse, while simultaneously detecting the jump locations and estimating the intensity. The optimal estimate is computed via a multiresolution sequence of GLRTs in O(N) operations.
BATSE Trigger 845: piecewise-linear multiscale MPLE and piecewise-polynomial multiscale MPLE reconstructions.
Inverse Problems. Observation model: y = Kx + noise (possibly non-Gaussian), or y ~ Poisson(Kx) in the counting case. Goal: apply the same multiscale MPLE methodology. Difficulty: the likelihood function does not admit a multiscale factorization due to the presence of K. Example: in the Poisson case y_i ~ Poisson((Kx)_i), and the coupling of all pixels through (Kx)_i prevents the factorization.
EM Algorithm: Basic Idea
EM Algorithm. We do not actually measure the direct data z, so it must be estimated. Given the observed data y and a current estimate of x, define the expected complete-data log-likelihood given y. The EM algorithm alternates between the E step, computing this conditional expectation, and the M step, maximizing it (together with the penalty) over x. Monotonicity: each iteration can only increase the penalized likelihood.
Gaussian Model. Observation model: y = Kx + n, n ~ N(0, σ²I). Wavelet parameterization: x = Wθ. Reformulation with an unobserved "direct" image z: z = Wθ + n₁ and y = Kz + n₂, where the "inner noise" is n₁ ~ N(0, αI) and the "outer noise" is n₂ ~ N(0, σ²I - αKKᵀ).
EM Algorithm – Gaussian Case. Initialize x̂. E step (linear filtering, O(N log N)): ẑ = x̂ + (α/σ²) Kᵀ(y - Kx̂), computable with FFTs when K is a convolution. M step (basic hard-thresholding, O(N)): hard-threshold the wavelet coefficients of ẑ to obtain the new x̂. The "complete data" log-likelihood depends on z only through terms that are linear in z, so the E step needs only the conditional mean of z (see the sketch below).
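A sketch of the Gaussian-case EM iteration under two assumptions: K is a circular convolution (so the E step is FFT filtering) and the inner-noise variance is taken as α = σ² with the frequency response of K bounded by 1, which reduces the E step to ẑ = x̂ + Kᵀ(y - Kx̂). The Haar hard-thresholding M step follows the earlier denoising sketch.

```python
import numpy as np

def haar_hard_denoise(z, noise_std):
    """Hard-threshold the orthonormal Haar coefficients of z (dyadic length)."""
    x = np.asarray(z, dtype=float)
    details = []
    while len(x) > 1:
        details.append((x[0::2] - x[1::2]) / np.sqrt(2.0))
        x = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    t = noise_std * np.sqrt(2.0 * np.log(len(z)))
    for d in reversed(details):
        d = d * (np.abs(d) > t)                  # keep large, kill small
        up = np.empty(2 * len(x))
        up[0::2] = (x + d) / np.sqrt(2.0)
        up[1::2] = (x - d) / np.sqrt(2.0)
        x = up
    return x

def em_deconvolve(y, kernel, sigma, n_iters=100):
    """EM for y = K x + n with circular-convolution K and alpha = sigma^2
    (valid when |frequency response of K| <= 1):
      E step: z_hat = x + K^T (y - K x)   (linear filtering via FFTs)
      M step: x     = hard-threshold Haar denoising of z_hat."""
    H = np.fft.fft(kernel, n=len(y))
    def apply_K(v):  return np.real(np.fft.ifft(H * np.fft.fft(v)))
    def apply_Kt(v): return np.real(np.fft.ifft(np.conj(H) * np.fft.fft(v)))
    x = apply_Kt(y)                              # simple initialization
    for _ in range(n_iters):
        z_hat = x + apply_Kt(y - apply_K(x))     # E step
        x = haar_hard_denoise(z_hat, sigma)      # M step
    return x
```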
Example – Image Deblurring (BSNR = 40 dB): original, blurred + noise (unrestored SNR = 15 dB), and restored (SNR = 21 dB).
Example – Image Deblurring: initializing EM with an "aggressive" linear Wiener restoration; original, Wiener-restored (SNR = 18 dB), and EM-restored (SNR = 22 dB).
Example – Satellite Image Restoration: original (© CNES), simulated observation (unrestored SNR = 18 dB), and EM-restored (SNR = 26 dB).
Poisson Model. Observation model: y ~ Poisson(Kx). Multiscale parameterization of the unobserved image: introduce direct counts z ~ Poisson(x), with the intensity x parameterized by its Haar multiscale coefficients. Complete-data likelihood function: admits the multiscale factorization. "Inner noise": the Poisson counting variability of z about x; "outer noise": the additional randomness introduced when K redistributes the direct counts into the observations y.
EM Algorithm – Poisson Case. Initialize x̂. E step (linear filtering, O(N log N)): compute the expected direct counts ẑ = E[z | y, x̂]. M step (N independent GLRTs, O(N)): apply the multiscale MPLE to ẑ. The "complete data" log-likelihood is linear in z, so the E step needs only the conditional mean of z (see the sketch below).
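A sketch of the Poisson-case EM; the E step below is the classical conditional expectation of the direct counts (the same quantity that drives ML-EM / Richardson-Lucy), and the M step is left as a pluggable denoiser, e.g. the multiscale GLRT estimator sketched earlier. K is assumed nonnegative with unit column sums; all of these are illustrative assumptions.

```python
import numpy as np

def em_poisson_restore(y, K, n_iters=50, m_step=None):
    """EM for y ~ Poisson(K x), K a nonnegative matrix with unit column sums.
      E step: z_hat_j = x_j * sum_i K_ij * y_i / (K x)_i  (expected direct counts)
      M step: a denoiser applied to z_hat; with m_step=None this reduces to
              plain ML-EM (Richardson-Lucy)."""
    y = np.asarray(y, dtype=float)
    K = np.asarray(K, dtype=float)
    x = np.full(K.shape[1], y.sum() / K.shape[1])     # flat initialization
    for _ in range(n_iters):
        Kx = K @ x
        ratio = np.divide(y, Kx, out=np.zeros_like(y), where=Kx > 0)
        z_hat = x * (K.T @ ratio)                     # E step
        x = m_step(z_hat) if m_step is not None else z_hat   # M step
    return x

# usage (hypothetical): plug in the multiscale GLRT estimator from the
# earlier sketch as the M step, e.g.
#   x_hat = em_poisson_restore(y, K, m_step=multiscale_poisson_mple)
```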
Example – Tomographic Reconstruction: the Shepp-Logan phantom and its sinogram; the projection process is photon-counting.
Example – Tomographic Reconstruction
Open Problems. What can we say about optimality? Goals: minimax-optimal brain reconstruction; information-theoretic restoration / segmentation in remote sensing. Key challenges: indirect measurements (adaptation of the risk bounds, uniqueness); complex image manifolds (image edges are curves).
Future Work: determination of convergence rates and performance bounds in inverse problems; development of practical algorithms that achieve near-optimal rates; multiscale image segmentation schemes based on recursive partitioning; natural image modeling and alternative representations (e.g., edgelets, curvelets). www.dsp.rice.edu/~nowak