Statistical Methods for Image Reconstruction MIC 2007 Short Course October 30, 2007 Honolulu
Topics: Lecture I: Warming Up (Larry Zeng); Lecture II: ML and MAP Reconstruction (Johan Nuyts); Lecture III: X-Ray CT Iterative Reconstruction (Bruno De Man); Round-Table Discussion of Topics of Interest to You
Warming Up. Larry Zeng, Ph.D., University of Utah
Problem to solve: computed tomography, with discrete measurements of a continuous unknown image
Analytic Algorithms. For example, FBP (Filtered Backprojection). Treat the unknown image as continuous; point-by-point reconstruction (arbitrary points). Regular grid points are commonly chosen for display.
Iterative Algorithms. For example, ML-EM, OS-EM, ART, CG, … Discretize the image into pixels and solve the imaging equations AX = P, where X = unknowns (pixel values), P = projection data, and A = imaging system matrix (entries aij).
Example: a 2×2 image with pixel values x1, x2, x3, x4 and projection bins p1, p2, p3, p4, written as AX = P.
Example: with measured projections (5, 4, 3, 2), Rank(A) = 3 < 4, the system is not consistent, and no exact solution exists.
Solving AX = P. In practice, A is not invertible (not square, or not full rank), so a generalized inverse is used: X = A†P, with A† the Moore-Penrose inverse. When AᵀA is invertible, A† = (AᵀA)⁻¹Aᵀ; in general it satisfies AA†A = A and A†AA† = A†. This gives the least-squares solution (Gaussian noise model).
Least-squares solution for the 2×2 example: X = (2.25, 1.75, 1.75, 1.25). The reprojected values AX = (4, 3, 4, 3) do not match the measured projections (5, 4, 3, 2).
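As a sanity check, the least-squares numbers above can be reproduced with NumPy's pseudoinverse. The system matrix below is an assumed row/column ray-sum geometry for the 2×2 example (the exact geometry of the original figure is not given in the slides):

```python
import numpy as np

# Assumed geometry for the 2x2 example: two horizontal and two vertical ray sums.
A = np.array([[1, 1, 0, 0],   # p1 = x1 + x2
              [0, 0, 1, 1],   # p2 = x3 + x4
              [1, 0, 1, 0],   # p3 = x1 + x3
              [0, 1, 0, 1]],  # p4 = x2 + x4
             dtype=float)
p = np.array([5.0, 4.0, 3.0, 2.0])

print(np.linalg.matrix_rank(A))   # 3 < 4: rank deficient
x_ls = np.linalg.pinv(A) @ p      # Moore-Penrose (minimum-norm least-squares) solution
print(x_ls)                       # approx. [2.25, 1.75, 1.75, 1.25]
print(A @ x_ls)                   # reprojection [4. 3. 4. 3.] != measured p
```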
The most significant motivation for using an iterative algorithm: usually, iterative algorithms give better reconstructions than analytical algorithms.
Computer Simulation Examples (Fessler). FBP = Filtered Backprojection (analytical); PWLS = Penalized data-weighted least-squares (iterative); PL (or MAP) = Penalized likelihood (iterative)
Why are iterative algorithms better than analytical algorithms? The system matrix A is more flexible than the assumptions in an analytical algorithm. It is a lot harder for an analytical algorithm to handle some realistic situations.
Attenuation in SPECT (γ photons are attenuated along the path to the detector)
Attenuation in SPECT: photon attenuation is spatially variant. Non-uniform attenuation was modeled in the iterative-algorithm system matrix A in the 1970s; uniform attenuation was handled in analytic algorithms in the 1980s; non-uniform attenuation in analytic algorithms only in the 2000s.
Distance-Dependent Blurring: the response is narrow for sources close to the collimator and broad for sources far from it.
Distance-Dependent Blurring: the system matrix A models the blurring.
Distance-Dependent Blurring: analytic treatment uses the Frequency-Distance Principle (an approximation), in which the slope in the sinogram's 2-D frequency domain corresponds to source distance (near vs. far).
Truncation: (Analytic) the old FBP algorithm does not allow truncation. (Iterative) the system matrix models only the measured data and ignores unmeasured data. (Analytic) the newer DBP (Differentiated Backprojection) algorithm can handle some truncation situations.
Scatter: scattered vs. primary photons.
Scatter: (Iterative) the system matrix A can model scatter using an “effective scattering source” or Monte Carlo. (Analytic) it is still not known how, other than preprocessing the data using multiple energy windows.
In modeling the physics and the imaging geometry, analytic methods lag behind iterative ones.
Analytic algorithm — “Open-loop system”: Data → Filtering → Backprojection → Image
Iterative algorithm — “Closed-loop system”: project the current Image with A, Compare with the measured projection Data, Backproject the discrepancy with Aᵀ, and Update the image.
Main Difference Analytic algorithms do not have a projector A, but have a set of assumptions. Iterative algorithms have a projector A.
Under what conditions will the analytic algorithm outperform the iterative algorithm? When the analytical assumptions (e.g., sampling, geometry, physics) are well satisfied, because the system matrix A is always an approximation (e.g., the pixel model assumes uniform activity within each pixel).
Noise
Noise Consideration. Iterative: objective function (e.g., likelihood). Iterative: regularization (very effective!). Analytic: filtering (assumes spatial invariance, so not as good).
Modeling noise in an iterative algorithm (hence a statistical algorithm). Example: a one-pixel image (i.e., the total count is the only concern) with 3 measurements: 1100 counts in 10 s (110/s), 100 counts in 1 s (100/s), 15000 counts in 100 s (150/s). What is the best estimate of counts per second?
m1 = 1100, σ²(m1) = 1100, x1 = 1100/10 = 110, σ²(x1) = σ²(m1)/10² = 1100/100 = 11
m2 = 100, σ²(m2) = 100, x2 = 100/1 = 100, σ²(x2) = σ²(m2)/1² = 100
m3 = 15000, σ²(m3) = 15000, x3 = 15000/100 = 150, σ²(x3) = σ²(m3)/100² = 15000/10000 = 1.5
Objective Function: minimize Φ(x) = Σi (x − xi)² / σ²(xi), i.e., weight each measurement by the inverse of its variance.
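A minimal sketch of this weighting for the three count measurements above; minimizing the inverse-variance-weighted objective gives the weighted average of the three rate estimates:

```python
import numpy as np

x   = np.array([110.0, 100.0, 150.0])   # rate estimates (counts/s) from the 3 measurements
var = np.array([11.0, 100.0, 1.5])      # their variances, computed above

# Minimizing sum_i (x_hat - x_i)^2 / var_i gives the inverse-variance weighted average:
w = 1.0 / var
x_hat = np.sum(w * x) / np.sum(w)
print(x_hat)   # ~144.6 counts/s, dominated by the low-variance 100-second measurement
```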
Generalization: if you have N unknowns, you need to make a lot more than N measurements, so the measurements are redundant. Put more weight on the measurements that you trust more.
Geometric Illustration Two unknowns x1, x2 Each measurement = a linear eqn = 1 line We make 3 measurements; we have 3 lines
Noise free (consistent): the three lines intersect at a single point in the (x1, x2) plane.
Noisy (inconsistent): the three lines no longer meet at one point; each measurement has a residual error e1, e2, e3.
If you don’t have redundant measurements, the noise model (e.g., σ²) does not make any difference.
In practice, # of unknowns ≈ # of measurements However, iterative methods still outperform analytic methods Why?
Answer: Regularization & Constraints. Helpful constraints: non-negativity (xi ≥ 0); image support (xk = 0 for pixels k outside the support K); bounds on the values (e.g., a transmission map); a prior.
Some common regularization methods (1) — Stop early. Why stop? Doesn’t the algorithm converge? Yes, in many cases (e.g., ML-EM) it does. When an ML-EM reconstruction becomes noisy, its likelihood function is still increasing. The ultimate (e.g., ML-EM) solution is too noisy; we don’t like it.
Iterative Reconstruction: An Example (ML-EM). [Figure sequence: reconstructed images after successive numbers of iterations.]
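For concreteness, a minimal NumPy sketch of the ML-EM update for the discrete model AX = P (no regularization; the toy system matrix is the same assumed geometry as in the earlier 2×2 example):

```python
import numpy as np

def mlem(A, p, n_iter=50, x0=None):
    """Plain ML-EM for Poisson data p and system matrix A (AX = P)."""
    x = np.ones(A.shape[1]) if x0 is None else np.asarray(x0, float).copy()
    sens = A.sum(axis=0)                              # sensitivity: sum_i a_ij
    for _ in range(n_iter):
        proj = A @ x                                  # forward-project the current image
        ratio = p / np.maximum(proj, 1e-12)           # compare with the measured data
        x *= (A.T @ ratio) / np.maximum(sens, 1e-12)  # backproject the ratio and update
    return x

# Same assumed 2x2 toy geometry as before.
A = np.array([[1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 1, 0], [0, 1, 0, 1]], float)
p = np.array([5.0, 4.0, 3.0, 2.0])
print(mlem(A, p, n_iter=10))
print(mlem(A, p, n_iter=500))   # more iterations fit the data better, but are noisier on real data
```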
Stopping early is like lowpass filtering: low-frequency components converge faster than high-frequency components.
OR: Over-iterate, then filter
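A minimal sketch of the “over-iterate, then filter” option, applying a Gaussian post-filter to a stand-in noisy reconstruction (assumes SciPy is available; the filter width plays the same resolution/noise trade-off role as the stopping iteration):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Stand-in for a noisy, fully converged reconstruction.
noisy_recon = np.random.default_rng(0).poisson(100, size=(64, 64)).astype(float)

# Post-filter: sigma (in pixels) trades resolution against noise,
# much like the stopping iteration does in early stopping.
smoothed = gaussian_filter(noisy_recon, sigma=1.5)
print(noisy_recon.std(), smoothed.std())   # the post-filter reduces the noise level
```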
Regularization using Prior What is a prior? Example: You want to estimate tomorrow’s temperature. You know today’s temperature. You assume that tomorrow’s temperature is pretty close to today’s
Regularization using Prior MAP algorithm = Maximum A Posteriori algorithm = Bayesian
How to select a prior? It is not easy; it is an art. Example objective function: a data-matching term (weighted according to the noise model) plus a prior-encouragement term built from a potential function V. If V(x) = x², the prior enforces smoothness.
Edge-Preserving Prior: if V(x) = |x|, it preserves edges and reduces noise.
How does V know where the edge is? V(x) = |x| increases a lot more slowly than V(x) = x², so it suppresses small jumps as noise while penalizing large jumps (edges) much less, relatively. The shape of V(x) is important.
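A tiny numeric illustration of why the shape of V matters: compare how much more each potential charges for a large (edge-like) jump than for a small (noise-like) jump.

```python
quad = lambda d: d ** 2      # smoothness prior
absv = lambda d: abs(d)      # edge-preserving prior

small_jump, edge_jump = 0.1, 10.0   # a noise-like jump vs. an edge-like jump

# Relative cost of the edge compared with the small jump under each potential:
print(quad(edge_jump) / quad(small_jump))   # 10000x: the quadratic prior penalizes edges heavily
print(absv(edge_jump) / absv(small_jump))   # 100x: |x| penalizes the edge far less, relatively
```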
What is an ML algorithm? An algorithm that maximizes the likelihood function.
Example: Gaussian noise. A likelihood function is a conditional probability, L(X) = Pr(P | X). Taking the log shows that maximizing the likelihood function L(X) is equivalent to a least-squares problem.
Least-Squares = ML for Gaussian. An algorithm that minimizes the weighted sum of squared residuals Σi (pi − [AX]i)² / σi² is an ML algorithm for independent Gaussian noise. One can also set up a Poisson likelihood function and find an ML algorithm for it; the well-known ML-EM algorithm is one such algorithm.
What is a MAP algorithm? MAP = maximum a posteriori: an algorithm that maximizes the likelihood function combined with a prior. For example, an algorithm that minimizes an objective function of the form (data-mismatch term) + β × (prior term) is a MAP algorithm.
What is ill-conditioning? Small data noise causes large reconstruction noise; in the SVD of A, the small singular values are what amplify the noise.
Regularization changes an ill-conditioned problem into a different well-conditioned problem.
Example: an ill-conditioned two-unknown system in which a small data error produces a big error in the solution.
Assume a prior: x1 and x2 are close. Objective function: ‖Ax − p‖² + β(x1 − x2)². Note: β is NOT a Lagrange multiplier.
To minimize, we set the gradient to zero and obtain a different linear problem: (AᵀA + βR) x = Aᵀp, with R = [[1, −1], [−1, 1]].
MAP Solutions, Noiseless data (d1 = d2 = 0):
β: 0.01, 0.1, 1, 10, 100
Cond. #: 50000, 200, 22, 5.7
x1: 0.978, …, 1.094
x2: 22.222, 1.178, 1.103, 1.095
MAP Solutions, Noisy data (d1 = −d2 = 0.2):
β: 0.01, 0.1, 1, 10, 100
Cond. #: 50000, 200, 22, 5.7
x1: 0.978, 0.733, 1.094, 1.095
x2: 22.222, 66.667, 1.178, 1.357, 1.103, 1.122, 1.098, 1.096
Observations: using a prior changes the original problem; even for noiseless data the solutions are different. The solution is more stable when there is noise. Choosing the prior is an art.
Observations: in fact, a closed-form solution exists for this PWLS (penalized weighted least-squares) problem: X = (AᵀWA + βR)⁻¹ AᵀWP.
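A sketch of this closed-form PWLS solution, using a hypothetical ill-conditioned 2×2 system (not the actual matrix from the slides) to show how increasing β lowers the condition number and stabilizes the solution against the d1 = −d2 = 0.2 perturbation:

```python
import numpy as np

# Hypothetical ill-conditioned 2x2 system (NOT the exact matrix used in the slides).
A = np.array([[1.00, 1.00],
              [1.00, 1.01]])
p_clean = A @ np.array([1.0, 1.1])          # noiseless data for a "true" x
d = np.array([0.2, -0.2])                   # data perturbation, d1 = -d2 = 0.2

W = np.eye(2)                               # data weights (identity here)
R = np.array([[1.0, -1.0], [-1.0, 1.0]])    # penalty matrix for the prior "x1 and x2 are close"

for beta in [0.0, 0.01, 0.1, 1.0, 10.0]:
    H = A.T @ W @ A + beta * R              # PWLS normal-equation matrix
    x_clean = np.linalg.solve(H, A.T @ W @ p_clean)
    x_noisy = np.linalg.solve(H, A.T @ W @ (p_clean + d))
    print(f"beta={beta:<5}  cond={np.linalg.cond(H):10.1f}  "
          f"x_clean={x_clean.round(3)}  x_noisy={x_noisy.round(3)}")
```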
Can analytic algorithms do these? Not yet. It is not even known how to correctly enforce non-negativity, the support constraint, etc., in an analytic algorithm.
On Condition Numbers. For AX = P, error propagation obeys ‖ΔX‖/‖X‖ ≤ κ (‖ΔA‖/‖A‖ + ‖ΔP‖/‖P‖) (to first order), where κ is the condition number of A. Trade-off: introducing β reduces κ, but increases ‖ΔA‖.
More on Condition Numbers. For AX = P, with the same error-propagation relation, the trade-offs are: more accurate modeling (i.e., ‖ΔA‖ significantly smaller) makes A less sparse; A may then have a larger κ; yet the overall error of X may be reduced (better resolution and lower noise).
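A quick numerical illustration of the error-propagation bound: for the same relative data error, an ill-conditioned system amplifies the error in X far more than a well-conditioned one (the matrices below are hypothetical examples):

```python
import numpy as np

rng = np.random.default_rng(1)
x_true = np.array([1.0, 1.1])

for A in (np.array([[1.0, 1.0], [1.0, 1.01]]),   # hypothetical ill-conditioned system
          np.array([[1.0, 0.0], [0.0, 1.0]])):   # perfectly conditioned system
    p = A @ x_true
    dp = 0.01 * np.linalg.norm(p) * rng.standard_normal(2)   # ~1% relative data error
    x_hat = np.linalg.solve(A, p + dp)
    rel_in = np.linalg.norm(dp) / np.linalg.norm(p)
    rel_out = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
    print(f"cond(A)={np.linalg.cond(A):9.1f}  data error={rel_in:.2%}  recon error={rel_out:.2%}")
```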
Ray-Driven & Pixel-Driven Projector (A) / Backprojector (Aᵀ). Ray-driven aij: along a projection ray, calculate how much pixel i contributes to detector bin j. Usually line-length or strip-area weighting.
Line-Length Weighting: aij = length of the intersection of ray j with pixel i. Projection (A): pj = Σi aij xi. Backprojection (Aᵀ): bi = Σj aij pj.
Area Weighting: aij = overlap area between the strip for detector bin j and pixel i. Projection (A): pj = Σi aij xi. Backprojection (Aᵀ): bi = Σj aij pj.
Rotation-Based Projector/Backprojector: very easy to implement distance-dependent blurring; fast; warping is required for convergent beams.
Pixel-Driven Backprojector: widely used in analytic algorithms; aij = 1, and interpolation is required on the detector.
If you don’t know what you are doing, do not use an “iterative algorithm backprojector” (Aᵀ), e.g., a line-length-weighted backprojector, in an analytic algorithm. Likewise, an “analytic algorithm backprojector” is not Aᵀ; you may not want to use it in an iterative algorithm if you don’t know what you are doing.
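A standard way to check whether a given backprojector really is Aᵀ for a given projector is the inner-product (adjoint) test ⟨Ax, y⟩ = ⟨x, Aᵀy⟩. A minimal sketch with matrix-free projector/backprojector callables (the names here are illustrative):

```python
import numpy as np

def adjoint_test(project, backproject, n_pix, n_bins, n_trials=5, rtol=1e-6):
    """Check <A x, y> == <x, A^T y> for random x and y (matrix-free projector pair)."""
    rng = np.random.default_rng(0)
    for _ in range(n_trials):
        x = rng.random(n_pix)
        y = rng.random(n_bins)
        if not np.isclose(np.dot(project(x), y), np.dot(x, backproject(y)), rtol=rtol):
            return False
    return True

# Toy check with an explicit matrix: A.T is a matched backprojector, 0.5*A.T is not.
A = np.array([[1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 1, 0], [0, 1, 0, 1]], float)
print(adjoint_test(lambda x: A @ x, lambda y: A.T @ y, 4, 4))        # True  (matched pair)
print(adjoint_test(lambda x: A @ x, lambda y: 0.5 * A.T @ y, 4, 4))  # False (mismatched pair)
```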
Some Common Terms Used in Iterative Algorithms. Pixels (voxels): easy to use, but have unrealistic high-frequency edges.
Blobs: smooth, overlapping basis functions with no sharp edge, used in place of voxels.
Natural Pixels: basis functions defined by the projection paths, so they depend on the collimator geometry. The image is expanded as f = Σ wi xi, where each wi is a natural pixel.
Iterative Methods Summary. Can incorporate: imaging physics, irregular imaging geometry, prior information. But: it is an art to set things up; complex; long computation time.