Bayesian Frameworks for Deformable Pattern Classification and Retrieval by Kwok-Wai Cheung January 1999
Model-Based Scene Analysis KnowledgeInputOutput An “H” model Integrated Segmentation and Recognition
Template Matching: Limitation Reference Models Matching Score of “H” 10/12 Matching Score of “A” 10/12 Knowledge Input Output
Deformable Models A deformable model is mostly referred to as an object shape abstraction and possesses shape varying capability for modeling non-rigid objects. A Deformable “6” Model
A Common Formulation Modeling RetrievalClassification Matching
Modeling Model representation H j Model shape parameter vector w parameter space w1w1 w2w2 w3w3 Hj(w3)Hj(w3) Hj(w2)Hj(w2) Hj(w1)Hj(w1) A Common Formulation
Matching A search process (multi-criterion optimization) parameter space w0w0 w1w1 wfwf Model Deformation Criterion Data Mismatch Criterion Combined Criterion and Regularization Hj(wf)Hj(wf)
A Common Formulation Classification
A Common Formulation Retrieval
Thesis Overview Reasoning: Bayesian Framework Approach: Deformable Models Problem: Deformable Pattern Classification Problem: Deformable Pattern Retrieval Application: Handwritten Digit Recognition Application: Handwritten Word Retrieval
Presentation Outline A Bayesian framework for deformable pattern classification (applied to handwritten character recognition) Extensions of the framework –A competitive mixture of deformable models –Robust deformable matching A Bayesian framework for deformable pattern detection (applied to handwritten word retrieval) Conclusions and future works
A Bayesian Framework for Deformable Pattern Classification with Application to Isolated Handwritten Character Recognition
Bayesian Background Prior Distribution w w Posterior Distribution Likelihood Function w Data Distribution D
Bayesian Formulation Shape Parameter Distribution Prior distribution (without data) Likelihood function Posterior distribution (with data)
Bayesian Inference: Matching Matching by maximum a posteriori (MAP) estimation. parameter space MAP estimate
Bayesian Inference: Classification Classification by computing the model evidence (Laplacian approximation).
Model Representation Cubic B-splines for modeling handwritten character shape. Shape parameter vector { w, A, T } –w = spline control points (local deformation) –{A,T} = affine transform parameter (global deformation) Mixture of Gaussians for modeling black pixels.
Model Representation Stroke width Control points with sequence number Spline curve Gaussian distributions modeling black pixels
Criterion Function Formulation Model Deformation Criterion Data Mismatch Criterion Mahalanobis distance Negative log of product of a mixture of Gaussians
Matching MAP estimation for {w, A, T, , } using the expectation-maximization (EM) algorithm [Dempster et al. 1977]. No closed form solutions and iterations between the estimation of {w, A, T} (linear) and that of { , are required.
Matching Results Simple Initialization Affine Transform Initialization Final Match
Matching Results = 3.54 ~ 0.9 deformed less = 0.89 ~ 0.9 deformed more ~ 3.0 = 0.9 thinner stroke ~ 3.0 = 0.52 thicker stroke
Classification Best Match with highest P(D|H 6 ). The output class is “Six”.
Critical Factors for Higher Accuracy Size of the Model Set –how many models for each class? Model Flexibility Constraints Likelihood Inaccuracy –use prior only for the best few candidates. Unconstrained Constrained
Critical Factors for Higher Accuracy Filtering Normalized “1” Sub-part Detection These are the unmatched portions for matching model “2” to data “0”. For the NIST dataset we used, all the characters are normalized to 20x32. Some abnormal “1”s are observed.
Experiment Training Set (NIST SD-1) –11,660 digits (32x32 by 100 writers) Test Set (NIST SD-1) –11,791 digits (32x32 by 100 writers) Size of Model Set = 23 (manually created)
Experimental Results
Previous Works
Accuracy and Size of Model Set No. of models Accuracy % 99.25% [Jain et al.1997] [Our system] Optimal accuracy curve Nearest NeighborManual
Summary A unified framework based on Bayesian inference is proposed for modeling, matching and classifying non-rigid patterns with promising results for handwritten character recognition. Several critical factors related with the recognition accuracy are carefully studied.
Extensions of the Bayesian Framework
Major Limitations of the Framework The Scale-up Problem –The classification time increases linearly with the size of the model set. The Outlier Problem –The framework is very sensitive to the presence of outlier data (e.g., strokes due to the adjacent characters)
The Scale-up Problem Solns. Hardware solution –Independent Matching Process -> Highly Parallel Computing Architecture Software solution –Cutting down the unnecessary computation by carefully designing the data structure and the implementation of the algorithm.
A Competitive Mixture of Deformable Models Let H = {H 1, H 2, …, H M, 1, 2, …, M } denote a mixture of M models. Input data D H1H1 H2H2 HMHM 11 22 MM
A Competitive Mixture of Deformable Models The Bayesian framework is extended and { i } can then be estimated using the EM algorithm. By maximizing p(D| H ) and assuming the data D comes from H i, the ideal outcome of { i } = [ ] ii
Speed up: Elimination Process Input data D H1H1 H2H2 HMHM 11 22 MM
Experiment Training Set (NIST SD-1) –2,044 digits (32x32 by 30 writers) Test Set (NIST SD-1) –1,427 digits (32x32 by 19 writers) Size of Model Set = 10 (manually created) Elimination Rule –After the first iteration, only best R models are retained.
Experimental Results: Accuracy 92.7% 94.2% 95.1%
Experimental Results: Speedup
The Outlier Problem The mixture of Gaussians noise model fails when some gross errors (outliers) are present. Badly Segmented InputWell Segmented Input
The Outlier Problem There is a necessity to distinguish between the true data and the outliers. Utilize true data and suppress outliers. True data Outliers
Use of Robust Statistics Robust statistics takes into account the outliers by either: 1) Modeling them explicitly using probability distributions, e.g. uniform distribution 2) Discounting their effect (M-estimation), e.g. defining the data mismatch measure (which is normally quadratic) such that
Use of Robust Statistics Suppressing the outliers’ contribution
Robust Linear Regression Without Robust Statistics With Robust Statistics
Robust Deformable Matching An M-estimator is proposed such that Original Data Mismatch Criterion Data Mismatch Criterion with Robust Statistics
Experiment Goal: To extract the leftmost characters from handwritten words. Test Set - CEDAR database Model Set - manually created Model Initialization –Chamfer matching based on a distance transform.
Experimental Results Initialization Fixed Window Width 1 Fixed Window Width 2 Fixed Window Width 3 Robust Window
More Experimental Results
Summary The basic framework can be extended to a competitive mixture of deformable models where significant speedup can be achieved. The robust statistical approach is found to be an effective solution for robust deformable matching in the presence of outliers.
Deformable Pattern Detection
A Bayesian Framework for Deformable Pattern Detection with Application to Handwritten Word Retrieval
The Bayesian Framework Revisit Model, H i (Uniform prior) Shape parameter, w (Prior distribution of w) Regularization parameter, (Uniform prior) Data, D (Likelihood function of w) Stroke width parameter, (Uniform prior) Multivariate Gaussian Mixture of Gaussians Direction of Generation From Model to Data
A Dual View of Generativity The Sub-part Problem The Outlier Problem
Forward and Reverse Frameworks HiHi w D HiHi w D Reverse Framework Forward Framework
Model, H i Shape parameter, w Regularization parameter, (Uniform prior) Data, D (Uniform prior) Model localization parameter, (Uniform prior) Multivariate Gaussian Mixture of Gaussians (each data point is a Gaussian center) Direction of Generation From Data to Model
New Criterion Function Sub-data Mismatch Criterion Negative log of product of a mixture of Gaussians Old Data Mismatch Criterion
Forward Matching Matching –Optimal estimates {w *, A *, T *, *, * } are obtained by maximizing –The EM algorithm is used.
Pattern Detection Detection –by computing the forward evidence (Laplacian approximation) Formula for these three parts are different when compared with the reverse evidence computation.
Comparison between Two Frameworks Shape Discriminating Properties –The reverse evidence does not penalize models resting on the white space. [Proof: see Proposition 1] –The forward evidence does penalize white space. [Proof: see Proposition 2] (The sub-part problem is solved implicitly.)
Comparison between Two Frameworks Shape Matching Properties –Reverse matching is sensitive to outliers but possesses good data exploration capability. [Proof: see Proposition 3] –Forward matching is insensitive to outliers but with weak data exploration capability. Thus, its effectiveness relies on some good initialization. [Proof: see Proposition 4] (The outlier problem is solved implicitly.)
Bidirectional Matching Algorithm A matching algorithm is proposed which possesses the advantages of the two frameworks. The underlying idea is to try to obtain a correspondence between the model and data such that the model looks like the data AND vice versa (i.e., the data mismatch measures for the two frameworks should both be small.).
Bidirectional Matching Algorithm Initialization by Chamfer matching Forward Matching Compute the data mismatch measures for the two frameworks, E mis and E sub-mis Reverse Matching E mis > E sub-mis ? :=(1+ ) if :=4 Converge ? yes no
Convergence Property The local convergence property of the bidirectional matching algorithm has been proved. [see Theorem 1]
Experiment (I) Goal: To extract the leftmost characters from handwritten words. Test Set - CEDAR database –300 handwritten city name images Model Set - manually created Model Initialization –Chamfer matching based on a distance transform
Experimental Results Forward Matching Reverse Matching Bidirectional Matching
Experimental Results * Results are obtained by visual checking.
Experiment (II) Goal: To retrieve handwritten words with its leftmost character similar to an input shape query. Test Set - CEDAR database –100 handwritten city name images Query Set
Performance Evaluation More false positive cases, the precision rate decreases. More true negative cases, the recall rate decreases.
Experimental Results Best N Approach Recall = 59% Precision = 43% # of candidates = 10
Experimental Results Evidence Thresholding Recall = 65% Precision = 45% Averaged # of candidates = 12.7
Related Works [Huttenlocher et al. 1993] Hausdorff matching for image comparison. [Burl et al. 1998] Keyword spotting in on- line handwriting data. [Jain et al. 1998] Shape-based retrieval of trademark images.
Summary A novel Bayesian framework is proposed for deformable pattern detection. By combining the two proposed frameworks, the bidirectional matching algorithm is proposed and applied to handwritten word retrieval. Both theoretical and experimental results show that the algorithm is robust against outliers and possesses good data exploration capability.
Conclusions & Future Works
Summary of Contributions A comprehensive study on deformable pattern classification based on a unified framework. A competitive mixture of deformable models for alleviating the scale-up problem. A study on using non-linear robust estimation for alleviating the outlier problem.
Summary of Contributions A novel Bayesian framework for deformable pattern detection, the theoretical comparison between the frameworks and the newly proposed bidirectional matching algorithm. Portability to other shape recognition problems.
Future Works Modeling a Dataset of Non-Rigid Shapes –On Model Representation Construction –On Model Set Construction Shape Discrimination and Model Initialization Fast Implementation –More on Model Competition –On Search Space Pruning and Deformable Shape Parameterization