CS 433/557 Algorithms for Image Analysis Template Matching Acknowledgements: Dan Huttenlocher
Matching and Registration. Template Matching: intensity based (correlation measures), feature based (distance transforms). Flexible Templates: pictorial structures, Dynamic Programming on trees, generalized distance transforms. Extra Material:
Intensity Based Template Matching Basic Idea Left ventricle template image Face template image Find best template “position” in the image
Intensity-Based Rigid Template matching. A pixel p in the template coordinate system corresponds to pixel p+s in the image coordinate system. For each position s of the template compute some goodness of “match” measure Q(s), e.g. the sum of squared differences between T(p) and I(p+s), summed over all pixels p in template T.
Intensity-Based Rigid Template matching. Search over all plausible positions s (e.g. s1, s2) and find the optimal one that has the largest goodness of match value Q(s).
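The exhaustive search described on these slides can be sketched as follows. This is a minimal illustration, not the course's implementation: the function names `ssd_quality` and `best_position` are hypothetical, images are plain nested lists, and the sum of squared differences is negated (an assumption) so that a larger Q(s) means a better match.

```python
def ssd_quality(image, template, s):
    """Q(s): negated sum of squared differences for the shift s = (row, col)."""
    sr, sc = s
    total = 0
    for r, row in enumerate(template):
        for c, t in enumerate(row):
            diff = image[r + sr][c + sc] - t   # I(p+s) - T(p)
            total += diff * diff
    return -total  # negated so that the best match has the largest value


def best_position(image, template, quality=ssd_quality):
    """Try every shift that keeps the template inside the image."""
    rows = len(image) - len(template) + 1
    cols = len(image[0]) - len(template[0]) + 1
    shifts = [(r, c) for r in range(rows) for c in range(cols)]
    return max(shifts, key=lambda s: quality(image, template, s))


image = [
    [0, 0, 0, 0, 0],
    [0, 0, 9, 8, 0],
    [0, 0, 7, 9, 0],
    [0, 0, 0, 0, 0],
]
template = [[9, 8], [7, 9]]
print(best_position(image, template))
```

The cost of this search is the number of positions times the template size, which is what motivates the faster search strategies later in the lecture.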
Intensity-Based Rigid Template matching What if intensities of your image are not exactly the same as in the template? (e.g. may happen due to different gain setting at image acquisition)
Other intensity based goodness of match measures Normalized correlation Mutual Information (next slide)
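As an illustration of why normalization helps with gain changes, here is a small sketch of one common variant, zero-mean normalized correlation. The slide does not give its exact formula, so this particular definition and the name `normalized_correlation` are assumptions.

```python
import math


def normalized_correlation(image, template, s):
    """Zero-mean normalized correlation between T and the image window at shift s."""
    sr, sc = s
    t_vals = [t for row in template for t in row]
    i_vals = [image[r + sr][c + sc]
              for r in range(len(template))
              for c in range(len(template[0]))]
    t_mean = sum(t_vals) / len(t_vals)
    i_mean = sum(i_vals) / len(i_vals)
    num = sum((t - t_mean) * (i - i_mean) for t, i in zip(t_vals, i_vals))
    den = math.sqrt(sum((t - t_mean) ** 2 for t in t_vals) *
                    sum((i - i_mean) ** 2 for i in i_vals))
    # A constant window (or template) has no structure to correlate with.
    return num / den if den > 0 else 0.0


template = [[1, 2], [3, 4]]
# The window at shift (1, 1) equals 2 * template + 10 (different gain and offset).
image = [
    [0, 0, 0, 0],
    [0, 12, 14, 0],
    [0, 16, 18, 0],
    [0, 0, 0, 0],
]
print(normalized_correlation(image, template, (1, 1)))
```

Under this definition a window that differs from the template only by a positive gain and an offset scores (up to rounding) the maximum value of 1, whereas a sum of squared differences would penalize it heavily.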
Other goodness of match measures : Mutual Information Will work even in extreme cases In this example the spatial structure of template and image object are similar while actual intensities are completely different
Other goodness of match measures: Mutual Information. Fix s and consider the joint histogram of intensity “pairs” (T(p), I(p+s)). The joint histogram is spread out for s1 and more concentrated (peaked) for s2. Mutual information between template T and image I (for a given transformation s) describes the “peakedness” of the joint histogram and measures how well spatial structures in T and I align.
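Mutual Information (technical definition). Assuming two random variables X and Y, their mutual information is $MI(X,Y) = H(X) + H(Y) - H(X,Y)$, where the entropy $H(X) = -\sum_x \Pr(x)\log\Pr(x)$ and the joint entropy $H(X,Y) = -\sum_{x,y}\Pr(x,y)\log\Pr(x,y)$ of random variables X and Y measure the “peakedness” of a histogram/distribution: Pr(x) is the marginal histogram (distribution) and Pr(x,y) is the joint histogram (distribution).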
Mutual Information. Computing MI for a given position s: the marginal distributions Pr(x) and Pr(y) of T and I and the joint distribution Pr(x,y) are normalized histograms for a fixed given s. We want to find s that maximizes MI, which can be written as $MI = \sum_{x,y} \Pr(x,y) \log \frac{\Pr(x,y)}{\Pr(x)\Pr(y)}$. NOTE: one has to be careful when computing. For example, what if H(x,y)=0 for a given pair (x,y)?
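A minimal sketch of the computation for one fixed s, assuming the standard form of mutual information as a sum over intensity pairs. The name `mutual_information` is hypothetical, and each distinct intensity is used as its own histogram bin, which a practical implementation would likely coarsen. Empty joint-histogram bins are handled by summing only over pairs that actually occur (the usual convention that 0 log 0 = 0).

```python
import math
from collections import Counter


def mutual_information(image, template, s):
    """MI between template intensities and image intensities under shift s."""
    sr, sc = s
    pairs = [(t, image[r + sr][c + sc])
             for r, row in enumerate(template)
             for c, t in enumerate(row)]
    n = len(pairs)
    joint = Counter(pairs)                 # joint histogram of (T, I) pairs
    px = Counter(x for x, _ in pairs)      # marginal histogram of T
    py = Counter(y for _, y in pairs)      # marginal histogram of I
    mi = 0.0
    for (x, y), count in joint.items():
        # Only non-empty bins are visited, so log(0) never occurs.
        p_xy = count / n
        mi += p_xy * math.log(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi


template = [[0, 1], [1, 0]]
# At shift (0, 0) the image shows the same structure with very different intensities.
image = [[200, 50, 50], [50, 200, 50]]
print(mutual_information(image, template, (0, 0)))
print(mutual_information(image, template, (0, 1)))
```

In this toy case the aligned shift gives a perfectly peaked joint histogram and therefore the larger value, even though no template intensity equals any image intensity.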
Finding optimal template position s Need to search over all feasible values of s Template T could be large The bigger the template T the more time we spend computing goodness of match measure at each s Search space (of feasible positions s) could be huge Besides translation/shift, position s could include scale, rotation angle, and other parameters (e.g. shear) Q: Efficient search over all s?
Finding optimal template position s. One possible solution: Hierarchical Approach. Subsample both template and image. Note that the search space can be significantly reduced. The template size is also reduced. Once a good solution (or solutions) is found at a coarser scale, go to a finer scale. Refine the search in the neighborhood of the coarser scale solution.
Feature Based Template Matching. Features: edges, corners, … (found via filtering). Distance transforms of binary images. Chamfer and Hausdorff matching. Iterated Closest Point.
Feature-based Binary Templates/Models. What are they? What are features? Object edges, corners, junctions, etc. Features can be detected by the corresponding image filters. Intensity can also be considered a feature but it may not be very robust (e.g. due to illumination changes). A model (binary template) is a set of feature points in N-dimensional space (also called feature space). Each feature is defined by a descriptor (vector).
Binary Feature Templates (Models): 2D example (2D feature space). Model’s features are represented by points. Links may represent neighborhood relationships between the features of the model. A descriptor could be a 2D vector specifying feature position with respect to the model’s coordinate system (reference point). Feature spaces could be 3D (or higher). E.g., the position of an edge in a medical volume is a 3D vector. But even in 2D images edge features can be described by 3D vectors (add the edge’s angular orientation to its 2D location). For simplicity, we will mainly concentrate on 2D feature space examples.
Matching Binary Template to Image. L is the model’s positioning, which determines the position of each feature i. At a fixed position L we can compute match quality Q(L) using some goodness of match criteria. Object is detected at all positions which are local maxima of function Q(L) such that $Q(L) > K$, where K is some presence threshold. Example: Q(L) = number of (exact) matches (in red) between model and image features (e.g. edges).
Exact feature matching is not robust Counting exact matches may be sensitive to even minor deviation in shape between the model and the actual object appearance
Distance Transform. More robust goodness of match measures use the distance transform of image features. Detect desirable image features (edges, corners, etc.) using appropriate filters. For all image pixels p find distance D(p) to the nearest image feature q.
Distance Transform. [Figure: image features (2D) and their distance transform.] Distance Transform is a function that for each image pixel p assigns a non-negative number corresponding to the distance from p to the nearest feature in the image I.
Distance Transform can be visualized as a gray-scale image Image features (edges) Distance Transform
Distance Transform can be very efficiently computed
Metric properties of discrete Distance Transforms. The forward and backward masks determine the metric and the shape of the set of equidistant points. Masks with unit steps (1) give the Manhattan (L1) metric. Masks with steps 1 and 1.4 (diagonal) give a better approximation of the Euclidean metric. Euclidean (L2) metric: the exact Euclidean Distance Transform can be computed fairly efficiently (in linear time) without bigger masks. www.cs.cornell.edu/~dph/matchalgs/
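The forward-backward pass idea can be sketched for the simplest masks (unit steps, Manhattan metric). This is an illustrative sketch with a hypothetical function name, `distance_transform_l1`; the feature map is a nested list of 0/1 values.

```python
INF = float("inf")


def distance_transform_l1(features):
    """Two-pass L1 distance transform of a binary feature map (1 = feature)."""
    rows, cols = len(features), len(features[0])
    d = [[0 if features[r][c] else INF for c in range(cols)] for r in range(rows)]
    # Forward pass: scan top-left to bottom-right, using the upper and left neighbors.
    for r in range(rows):
        for c in range(cols):
            if r > 0:
                d[r][c] = min(d[r][c], d[r - 1][c] + 1)
            if c > 0:
                d[r][c] = min(d[r][c], d[r][c - 1] + 1)
    # Backward pass: scan bottom-right to top-left, using the lower and right neighbors.
    for r in reversed(range(rows)):
        for c in reversed(range(cols)):
            if r < rows - 1:
                d[r][c] = min(d[r][c], d[r + 1][c] + 1)
            if c < cols - 1:
                d[r][c] = min(d[r][c], d[r][c + 1] + 1)
    return d


features = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
for row in distance_transform_l1(features):
    print(row)
```

Each pixel is visited twice, so the cost is linear in the number of pixels regardless of how many features the image contains.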
Goodness of Match via Distance Transforms. At each model position one can “probe” distance transform values at locations specified by model (template) features. Use distance transform values as evidence of proximity to image features.
Goodness of Match Measures using Distance Transforms. Chamfer Measure: sum of distance transform values “probed” by template features. Hausdorff Measure: k-th largest value of the distance transform at locations “probed” by template features; (equivalently) number of template features with “probed” distance transform values less than a fixed (small) threshold, i.e. count template features “sufficiently” close to image features. Spatially coherent matching.
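The first two measures can be sketched directly from a precomputed distance transform. The function names (`probe`, `chamfer`, `hausdorff_kth_largest`, `hausdorff_count`) are hypothetical, the model is assumed to be a list of (row, col) feature offsets, and a position is a pure translation.

```python
def probe(dt, model, shift):
    """Distance transform values at the locations of the shifted model features."""
    sr, sc = shift
    return [dt[r + sr][c + sc] for r, c in model]


def chamfer(dt, model, shift):
    """Chamfer measure: sum of probed values (smaller is better)."""
    return sum(probe(dt, model, shift))


def hausdorff_kth_largest(dt, model, shift, k):
    """Hausdorff measure as stated on the slide: k-th largest probed value."""
    return sorted(probe(dt, model, shift), reverse=True)[k - 1]


def hausdorff_count(dt, model, shift, threshold):
    """Number of model features whose probed value is below the threshold."""
    return sum(1 for v in probe(dt, model, shift) if v < threshold)


# L1 distance transform of an image with features at (1, 1) and (2, 2).
dt = [
    [2, 1, 2, 3, 4],
    [1, 0, 1, 2, 3],
    [2, 1, 0, 1, 2],
]
model = [(0, 0), (1, 1)]
print(chamfer(dt, model, (1, 1)), chamfer(dt, model, (0, 0)))
print(hausdorff_count(dt, model, (1, 1), 1), hausdorff_count(dt, model, (0, 0), 1))
```

Note that no correspondence between model and image features is ever computed: every measure only reads values out of the distance transform.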
Hausdorff Matching: counting matches with a dilated set of image features
Spatial Coherence of Feature Matches 50% matched 50% matched Spatially incoherent matches Few “discontinuities” between neighboring features Neighborhood is defined by links between template/model features Spatial coherence:
Spatially Coherent Matching Separate template/model features into three subsets Matchable (red) -near image features Boundary (blue circle) -matchable but “near” un-matchable -links define “near” for model features Un-matchable (gray) -far from image features Count the number of non-boundary matchable features
Spatially Coherent Matching Percentage of non-boundary matchable features (spatially coherent matches)
Comparing different match measures Monte Carlo experiments with known object location and synthetic clutter and occlusion -Matching edge locations Varying percent clutter -Probability of edge pixel 2.5-15% Varying occlusion -Single missing interval 10-25% of the boundary Search over location, scale, orientation Binary model (edges) 5% clutter image
Comparing different match measures: ROC curves Probability of false alarm versus detection - 10% and 15% of occlusion with 5% clutter -Chamfer is lowest, Hausdorff (f=0.8) is highest -Chamfer truncated distance better than trimmed
ROC’s for Spatial Coherence Matching. [Figure: ROC curves (FA versus CD) for clutter 3% with occlusion 20%, and for clutter 5% with occlusion 40%.] A parameter defines the degree of connectivity between model features. For the setting in which model features are not connected at all, spatially coherent matching reduces to plain Hausdorff matching.
Edge Orientation Information Match edge orientation (in addition to location) Edge normals or gradient direction 3D model feature space (2D location + orientation) Extract 3D (edge) features from image as well. Requires 3D distance transform of image features weight orientation versus location fast forward-backward pass algorithm applies Increases detection robustness and speeds up matching better able to discriminate object from clutter better able to eliminate cells in branch and bound search
ROC’s for Oriented Edge Pixels Vast Improvement for moderate clutter Images with 5% randomly generated contours Good for 20-25% occlusion rather than 2-5% Oriented Edges Location only
Efficient search for good matching positions L Distance transform of observed image features needs to be computed only once (fast operation). Need to compute match quality for all possible template/model locations L (global search) Use hierarchical approach to efficiently prune the search space. Alternatively, gradient descent from a given initial position (e.g. Iterative Closest Point algorithm, …later) Easily gets stuck at local minima Sensitive to initialization
Global Search Hierarchical Search Space Pruning Assume that the entire box might be pruned out if the match quality is sufficiently bad in the center of the box (how? … in a moment)
Global Search Hierarchical Search Space Pruning If a box is not pruned then subdivide it into smaller boxes and test the centers of these smaller boxes.
Global Search Hierarchical Search Space Pruning Continue in this fashion until the object is localized.
Pruning a Box (preliminary technicality). Location L’ is uniformly better than L” if, for all model features i, the distance transform value probed by feature i at L’ is no larger than the value probed at L”. A uniformly better location is guaranteed to have better match quality!
Pruning a Box (preliminary technicality). Assume that a hypothetical location is uniformly better than any location in the box. Then the match quality of the hypothetical location is at least as good as the match quality of any location in the box. If the presence test fails at the hypothetical location (for a given threshold K), then any location in the box must also fail the test. The entire box can be pruned by one test at the hypothetical location!
Building the hypothetical location for a Box of “Radius” n. Start from the distance transform values probed at the center of the box. The value of the distance transform changes by at most 1 between neighboring pixels, so each probed value can decrease by at most n (box radius) for other box positions.
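A sketch of the resulting pruning test, under stated assumptions: match quality is the Hausdorff-style count of features with a probed value below a threshold, the box radius n is measured in the same unit steps as the distance transform, and the hypothetical location is obtained by lowering each value probed at the center by n (never below zero). The names `upper_bound_in_box` and `can_prune` are hypothetical.

```python
def match_count(values, threshold):
    """Number of probed values that are below the threshold."""
    return sum(1 for v in values if v < threshold)


def upper_bound_in_box(dt, model, center, n, threshold):
    """Best match count any position within n unit steps of the center could have."""
    cr, cc = center
    center_values = [dt[r + cr][c + cc] for r, c in model]
    # Hypothetical location: every probed value improved by the maximum possible n.
    optimistic = [max(v - n, 0) for v in center_values]
    return match_count(optimistic, threshold)


def can_prune(dt, model, center, n, threshold, K):
    """True if no position in the box can reach the presence threshold K."""
    return upper_bound_in_box(dt, model, center, n, threshold) < K


# L1 distance transform of an image whose only feature is at (0, 0).
dt = [[r + c for c in range(6)] for r in range(6)]
model = [(0, 0), (0, 1)]
print(can_prune(dt, model, (4, 3), 1, 2, 1))   # box far from the feature
print(can_prune(dt, model, (1, 0), 1, 2, 2))   # box containing a good position
```

The bound is optimistic by construction, so a box is discarded only when even the hypothetical location fails; otherwise the box is subdivided and its smaller boxes are tested the same way.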
Global Hierarchical Search (Branch and Bound) Hierarchical search works in more general case where “position” L includes translation, scale, and orientation of the model N-dimensional search space Guaranteed or admissible search heuristic Bound on how good answer could be in unexplored region can not miss an answer In worst case won’t rule anything out In practice rule out vast majority of template locations (transformations)
Local Search (gradient descent): Iterated Closest Point algorithm. ICP: iterate until convergence. Estimate correspondence between each template feature i and some image feature located at F(i) (Fitzgibbon: use DT). Move model to minimize the sum of distances between the corresponding features (like chamfer matching). Alternatively, find a local move of the model improving the DT-based match quality function Q(L).
Problems with ICP and gradient descent matching. Slow: can take many iterations. ICP: each iteration is slow due to search for correspondences (Fitzgibbon: improve this by using DT). No convergence guarantees: can get stuck in local minima. Not much to do about this. Can be improved by using robust distance measures (e.g. truncated Euclidean measure).
Observations on DT based matching. Main point of DT: it allows measuring match quality without explicitly finding correspondence between pairs of model and image features (hard problem!). Hierarchical search over entire transformation space. Important to use robust distance: straight Chamfer is very sensitive to outliers. Truncated DT can be computed very fast. Fast exact or approximate methods for DT ( metric). For edge features use orientation too: edge normals or intensity gradients.
Rigid 2D templates: should we really care? So far we studied matching in case of 2D images and rigid 2D templates/models of objects. When do rigid 2D templates work? There are rigid 2D objects (e.g. fingerprints). A 3D object may be imaged from the same view point: controlled image-based databases (e.g. photos of employees, criminals). 2D satellite images always view 3D objects from above. X-Rays, microscope photography, etc.
More general 3D objects 3D image volumes and 3D objects Distance transforms, DT-based matching criteria, and hierarchical search techniques easily generalize Mainly medical applications 2D image and 3D objects 3D objects may be represented by a collection of 2D templates (e.g. tree-structured templates, next slide) 3D objects may be represented by flexible 2D templates (soon)
Tree-structured templates Larger pair-wise differences higher in tree
Tree-structured templates Rule out multiple templates simultaneously - Speeds up matching - Coarse-to-fine search where coarse granularity can rule out many templates - Applies to variety of DT based matching measures: Chamfer, Hausdorff, robust Chamfer
Flexible Templates Flexible Template combines a number of rigid templates connected by flexible strings parts connected by springs and appearance models for each part Used for human bodies, faces Fischler & Elschlager, 1973 – considerable recent work (e.g. Felzenszwalb & Huttenlocher, 2003 )
Flexible Templates: Why? To account for significant deviation between proportions of a generic model (e.g. average face template) and a multitude of actual object appearances. Non-rigid (3D) objects may consist of multiple rigid parts with (relatively) view independent 2D appearance.
Flexible Templates: Formal Definition. Set of parts. Positioning: configuration L specifies the locations of the parts. Appearance model: matching quality of part i at its location. Edges for connected parts: an explicit dependency between edge-connected parts. Interaction/connection energy for each edge, e.g. elastic energy.
Flexible Templates: Formal Definition. Find configuration L (location of all parts) that minimizes the energy E(L). Difficulty depends on graph structure: which parts are connected (E) and how (C). General case: exponential time.
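The formulas on these two slides did not survive extraction. One common way to write such an energy, consistent with the text above, is sketched below. The symbols $l_i$ (location of part i) and $m_i$ (appearance cost of part i) are assumed names, $C_{ij}$ follows the slide's "(C)", and the squared-distance spring is only one possible elastic energy.

```latex
% Configuration: L = (l_1, ..., l_n), one location per part.
% E on the left is the energy; E under the second sum is the edge set,
% following the slides' overloaded notation.
E(L) \;=\; \sum_{i=1}^{n} m_i(l_i) \;+\; \sum_{(i,j) \in E} C_{ij}(l_i, l_j),
\qquad \text{e.g. } C_{ij}(l_i, l_j) = \| l_i - l_j \|^2 .
```

The first sum rewards placing each part where it matches the image well, and the second penalizes stretching the connections between parts.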
Flexible Templates: simplistic example from the past Discrete Snakes What graph? What appearance model? What connection/interaction model? What optimization algorithm?
Flexible Templates: special cases Pictorial Structures What graph? What appearance model? -intensity based match measure -DT based match measure (binary templates) What connection/interaction model? -elastic springs What optimization algorithm?
Dynamic Programming for Flexible Template Matching DP can be used for minimization of E(L) for tree graphs (no loops!)
Dynamic Programming for Flexible Template Matching. DP algorithm on trees: Choose post-order traversal for any selected “root” site/part. Compute for all “leaf” parts. Process a part after its children are processed. Select best energy position for the “root” and backtrack to the “leaves”. If part ‘i’ has only one child ‘a’ If part ‘i’ has two (or more) children ‘a’, ‘b’, …
Dynamic Programming for Flexible Template Matching. DP’s complexity on trees (same as for 1D snakes): $O(nm^2)$ for n parts and m positions. This is OK complexity for local search where “m” is relatively small (e.g. in snakes), e.g. for tracking a flexible model from frame to frame in a video sequence.
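The tree DP can be sketched as follows. This is an illustrative sketch, not the lecture's code: `match_tree`, the dictionary-based tree, and the callable `interaction` are hypothetical, and positions are indexed 0..m-1. Each parent-child pair costs m times m evaluations, which gives the quadratic dependence on m.

```python
def match_tree(children, root, appearance, interaction):
    """
    Minimize the sum of appearance and interaction costs over a tree of parts.
    children[i]        -> list of children of part i
    appearance[i][x]   -> cost of placing part i at position index x
    interaction(i, a, x, y) -> cost of parent i at x together with child a at y
    """
    m = len(appearance[root])
    best = {}      # best[i][x]: minimal energy of the subtree of i, given i at x
    argbest = {}   # argbest[(i, a)][x]: best position of child a, given i at x

    def solve(i):
        for a in children.get(i, []):      # post-order: children first
            solve(a)
        best[i] = list(appearance[i])
        for a in children.get(i, []):
            choice = []
            for x in range(m):
                costs = [best[a][y] + interaction(i, a, x, y) for y in range(m)]
                y_min = min(range(m), key=costs.__getitem__)
                best[i][x] += costs[y_min]
                choice.append(y_min)
            argbest[(i, a)] = choice

    solve(root)
    # Pick the best root position, then backtrack towards the leaves.
    config = {root: min(range(m), key=best[root].__getitem__)}
    stack = [root]
    while stack:
        i = stack.pop()
        for a in children.get(i, []):
            config[a] = argbest[(i, a)][config[i]]
            stack.append(a)
    return best[root][config[root]], config


children = {0: [1, 2]}
appearance = {0: [5, 0, 5, 5], 1: [0, 9, 9, 9], 2: [9, 9, 9, 0]}
energy, config = match_tree(children, 0, appearance,
                            lambda i, a, x, y: (x - y) ** 2)
print(energy, config)
```

A part with several children simply adds the minimized cost of each child, which is why the two cases on the slide (one child, two or more children) share the same loop here.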
Local Search Tracking Flexible Templates
Searching in the whole image (large m): m = image size, or m = image size*rotations. Then complexity $O(nm^2)$ is not good. For some interactions it can be improved to $O(nm)$ based on the Generalized Distance Transform (from Computational Geometry). This is an amazing complexity for matching n dependent parts. Note that $O(nm)$ is the number of operations for finding n independent matches.
Generalized Distance Transform. Idea: improve efficiency of the key computational step (performed for each parent-child pair, n times), which takes $O(m^2)$ operations. Intuitively: if x and y describe all feasible positions of “parts” in the image, then the energy functions of x and of y can be thought of as some gray-scale images (e.g. like responses of the original image to some filters).
Generalized Distance Transform. Idea: improve efficiency of the key computational step ($O(m^2)$ operations performed for each parent-child pair). Let the interaction cost be $\|x-y\|$ (distance between x and y): a reasonable interactions model! Then $\min_y \{E(y) + \|x-y\|\}$ is called a Generalized Distance Transform of $E(y)$.
From Distance Transform to Generalized Distance Transform. Assuming $E(y) = 0$ at the locations of binary image features and $E(y) = \infty$ elsewhere, $\min_y \{E(y) + \|x-y\|\}$ is the standard Distance Transform (of image features).
From Distance Transform to Generalized Distance Transform. For general $E(y)$, $\min_y \{E(y) + \|x-y\|\}$ is called the Generalized Distance Transform of $E(y)$. It may prefer strength of E(x) to proximity. E(x) may represent non-binary image features (e.g. image intensity gradient).
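Algorithm for computing Generalized Distance Transform. Straightforward generalization of the forward-backward pass algorithm for standard Distance Transforms: initialize to E(x) instead of the binary feature map (0 at features, infinity elsewhere), and use the interaction cost of a unit step instead of 1.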
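A one-dimensional sketch of this generalization, assuming an interaction cost of the form w·|x − y| with a hypothetical weight parameter `w`; the 2D version would use the same forward and backward masks as the ordinary distance transform. The brute-force function is included only to show what quantity the two passes compute.

```python
def generalized_dt_1d(energy, w=1):
    """Two-pass computation of min over y of energy[y] + w * |x - y|, for every x."""
    d = list(energy)                      # initialize to E(x) instead of 0 / infinity
    for x in range(1, len(d)):            # forward pass: use the left neighbor
        d[x] = min(d[x], d[x - 1] + w)    # step cost w instead of 1
    for x in range(len(d) - 2, -1, -1):   # backward pass: use the right neighbor
        d[x] = min(d[x], d[x + 1] + w)
    return d


def generalized_dt_brute_force(energy, w=1):
    """Direct evaluation with m * m operations, for comparison only."""
    m = len(energy)
    return [min(energy[y] + w * abs(x - y) for y in range(m)) for x in range(m)]


energy = [5, 0, 7, 7, 1]
print(generalized_dt_1d(energy, w=2))
print(generalized_dt_brute_force(energy, w=2))
```

With `energy` set to 0 at features and infinity elsewhere and `w=1`, the same routine reduces to the standard distance transform.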
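Flexible Template Matching Complexity. Computing the key minimization step via the Generalized Distance Transform takes $O(m)$ operations, previously $O(m^2)$ (m is the number of positions x and y). This improves the complexity of Flexible Template Matching to $O(nm)$ in case of distance-based interactions.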
“Simple” Flexible Template Example: Central Part Model. Consider the special case in which parts translate with respect to a common origin (e.g., useful for faces). Parts: a distinguished central part and the remaining parts. Connect each part i>1 to the central part. Elastic spring costs. NOTE: for simplicity (only) we consider part positions that are translations only (no rotation or scaling of parts).
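Central Part Model example. The “ideal” location of part i w.r.t. the central part is given by the central part’s location plus a fixed translation vector for each i>1. There is a “string” cost for deformation from this “ideal” location. The whole template energy combines the matching costs of all parts with these deformation costs.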
Central Part Model: summary of search algorithm. Matching cost: for each non-central part i>1 compute the matching cost for all possible positions of that part in the image. For each i>1 compute the Generalized DT of that matching cost. For all possible positions of the central part compute the energy. Select the best location, or select all locations that pass a fixed threshold test.
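These steps can be sketched in one dimension. Everything here is an assumption made for illustration: part 0 plays the role of the central part, `offsets[i]` is the fixed ideal translation of part i, the deformation cost is w times the distance from the ideal location, and energies are minimized. An ideal location that falls outside the image is handled by charging the distance to the nearest border position.

```python
def gdt_1d(energy, w=1):
    """Two-pass generalized distance transform with step cost w."""
    d = list(energy)
    for x in range(1, len(d)):
        d[x] = min(d[x], d[x - 1] + w)
    for x in range(len(d) - 2, -1, -1):
        d[x] = min(d[x], d[x + 1] + w)
    return d


def central_part_energy(match_cost, offsets, w=1):
    """Energy of the whole template for every position of the central part (part 0)."""
    m = len(match_cost[0])
    total = list(match_cost[0])                 # matching cost of the central part
    for i in range(1, len(match_cost)):
        d = gdt_1d(match_cost[i], w)            # best deformed placement of part i
        for x in range(m):
            ideal = x + offsets[i]
            inside = min(max(ideal, 0), m - 1)  # nearest position inside the image
            total[x] += d[inside] + w * abs(ideal - inside)
    return total


match_cost = [
    [3, 3, 0, 3, 3, 3],   # central part matches best at position 2
    [0, 4, 4, 4, 4, 4],   # part 1 matches best at position 0
    [4, 4, 4, 4, 4, 0],   # part 2 matches best at position 5
]
offsets = [0, -2, 2]
energy = central_part_energy(match_cost, offsets)
print(energy, energy.index(min(energy)))
```

Each non-central part needs one generalized distance transform and one lookup per central position, so the total work grows linearly with both the number of parts and the number of positions.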
Central Part Model for face detection
Search Algorithm for tree-based pictorial structures Algorithm is basically the same as for Central Part Model. Each “parent” part knows ideal positions of “child” parts. String deformations are accounted for by the Generalized Distance Transform of the children’s positioning energies