Combining Shape and Physical Models for Online Cursive Handwriting Synthesis Jue Wang (University of Washington) Chenyu Wu (Carnegie Mellon University)


Combining Shape and Physical Models for Online Cursive Handwriting Synthesis Jue Wang (University of Washington) Chenyu Wu (Carnegie Mellon University) Ying-Qing Xu (Microsoft Research Asia) Heung-Yeung Shum (Microsoft Research Asia) International Journal on Document Analysis and Recognition (IJDAR) 2004

Introduction
Handwriting computing techniques (pen-based devices):
- Handwriting recognition: makes it possible for computers to understand the information contained in handwriting
- Handwriting manipulation: handwriting editing, error correction, script searching

Introduction
Handwriting Modeling & Synthesis
- Movement-simulation techniques are based on motor models and try to model the process of handwriting production; they focus on the representation and analysis of real handwriting signals rather than on handwriting synthesis

Introduction  Shape-simulation methods consider the static shape of handwriting trajectory more practical than movement-simulation tech when dynamic information is not available straight forward approach : synthesize form collected handwritten glyphs learning-based cursive handwriting synthesis approach

Introduction
A successful handwriting synthesis algorithm must consider:
- the shapes of letters vs. the training samples
- the connections between synthesized letters
A novel cursive handwriting synthesis technique:
- combines the advantages of the shape-simulation and the movement-simulation methods

Outline Sample collection and segmentation Learning strategies Synthesis Strategies Experimental results Discussion and Conclusion

Sample Collection
- About 200 words were collected
- Each letter appears more than 5 times
- The handwriting samples first pass through a low-pass filter and are then re-sampled to produce equidistant points
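The preprocessing step above can be sketched as follows. The choice of a moving-average low-pass filter, its window size, and the number of output points are assumptions for illustration; the slide does not specify them.

```python
import numpy as np

def smooth_and_resample(points, n_out=64, window=5):
    """Low-pass filter a pen trajectory with a moving average, then
    re-sample it to n_out points spaced equidistantly by arc length."""
    pts = np.asarray(points, dtype=float)
    # simple moving-average low-pass filter, applied per coordinate
    kernel = np.ones(window) / window
    pts = np.column_stack([np.convolve(pts[:, d], kernel, mode="same")
                           for d in range(pts.shape[1])])
    # cumulative arc length along the stroke
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])
    # interpolate each coordinate at equally spaced arc lengths
    s_new = np.linspace(0.0, s[-1], n_out)
    return np.column_stack([np.interp(s_new, s, pts[:, d])
                            for d in range(pts.shape[1])])
```

With `window=1` the filter is the identity, so the function reduces to pure equidistant re-sampling.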

Sample Segmentation Overview
- Segmentation-based recognition methods
- Recognition-based segmentation (relies heavily on the performance of the recognition engine)
- Level-building: simultaneously outputs the recognition and segmentation results; segmentation and recognition are merged to give an optimal result

A Two-level Framework
Framework of traditional handwriting segmentation approaches:
- The temporal handwriting sequence is the low-level feature; z_t denotes the coordinates and velocity of the sequence at time t

Segmentation
The segmentation problem is to find the identity string {I_1, …, I_n}, with the corresponding segments of the sequence {S_1, …, S_n}, S_1 = {z_1, …, z_{t_1}}, …, S_n = {z_{t_{n-1}}, …, z_T}, that best explain the sequence

Segmentation
For the training of the writer-independent segmentation system:
- A low-level feature-based segmentation algorithm works well only for a small number of writers
- A script code calculated from the handwriting data is therefore introduced as a middle-level feature

Middle-Level Feature
Five kinds of key points are extracted:
- points of maximum/minimum x-coordinate (X+, X-)
- points of maximum/minimum y-coordinate (Y+, Y-)
- crossing points
The average direction of the interval sequence between two adjacent key points is also recorded
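The extraction of the x/y extrema among these key points can be sketched as below; crossing (self-intersection) points are omitted for brevity, and the strict-inequality test for extrema is an assumption.

```python
import numpy as np

def key_points(stroke):
    """Indices and labels of four of the five key-point kinds used as
    middle-level features: local maxima/minima of the x-coordinate
    (X+, X-) and of the y-coordinate (Y+, Y-)."""
    pts = np.asarray(stroke, dtype=float)
    found = []
    for dim, hi, lo in ((0, "X+", "X-"), (1, "Y+", "Y-")):
        v = pts[:, dim]
        for i in range(1, len(v) - 1):
            if v[i] > v[i - 1] and v[i] > v[i + 1]:
                found.append((i, hi))        # local maximum
            elif v[i] < v[i - 1] and v[i] < v[i + 1]:
                found.append((i, lo))        # local minimum
    return sorted(found)                     # in temporal order
```

Reading out the labels in temporal order yields the script code for a stroke.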

Script Codes Examples

Middle-Level Feature
- Samples of each character are divided into several clusters; those in the same cluster have a similar structural topology
- Since the lengths of script codes are not always the same, their similarity cannot be computed directly
- The script code is therefore modeled as a homogeneous Markov chain

Middle-Level Feature
Given two script codes T_1, T_2:
- Compute their stationary distributions π_1, π_2 and transition matrices A_1, A_2
- The similarity between the two script codes is then measured using the KL divergence between these distributions

Middle-Level Feature
- The positions of π_1, π_2 and A_1, A_2 are enforced symmetrically to balance the variance of the KL divergence and the difference in code length
- If both the stationary distributions and the transition matrices of two script codes match well, and their code lengths are almost the same, then d(T_1, T_2) is close to 1
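A minimal sketch of modeling a script code as a homogeneous Markov chain follows. The symbol alphabet and the smoothing constant are assumptions, and the paper's exact similarity formula d(T_1, T_2) is not reproduced here; the sketch only produces the ingredients (A, π) that the measure is built from.

```python
import numpy as np

# Assumed alphabet of script-code symbols (for illustration only).
SYMBOLS = ["X+", "X-", "Y+", "Y-", "C"]

def chain_stats(code, eps=1e-6):
    """Estimate the transition matrix A (row-stochastic, lightly
    smoothed) and the stationary distribution pi of a script code
    modeled as a homogeneous Markov chain."""
    idx = {s: i for i, s in enumerate(SYMBOLS)}
    n = len(SYMBOLS)
    A = np.full((n, n), eps)                  # smoothed transition counts
    for a, b in zip(code, code[1:]):
        A[idx[a], idx[b]] += 1.0
    A /= A.sum(axis=1, keepdims=True)         # normalize each row
    # stationary distribution: left eigenvector of A for eigenvalue 1
    w, v = np.linalg.eig(A.T)
    pi = np.real(v[:, np.argmax(np.real(w))])
    pi = np.abs(pi) / np.abs(pi).sum()
    return A, pi
```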

Segmentation
After introducing the script code as a middle-level feature, the optimization problem is reformulated, which:
- improves the accuracy of segmentation
- dramatically reduces the computational complexity of level-building

Graph Model

Result

Outline Sample collection and segmentation Learning strategies Synthesis Strategies Experimental results Discussion and Conclusion

Learning Strategies
- Data alignment
  - Trajectory matching
  - Training set alignment
- Shape models

Trajectory Matching
Segmentation and reconstruction of on-line handwritten scripts (1998, Pattern Recognition)
- Each piece is a simple arc, so points can be equidistantly sampled from it to represent the stroke

Trajectory Matching
Landmark-point-extraction method:
- pen-down, pen-up points
- local extrema of curvature
- inflection points of curvature
A handwriting sample can be divided into as many as six pieces; samples of the same character are mostly composed of the same number of pieces, so they match each other naturally
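The curvature-based landmarks (local extrema and inflection points of curvature) need a curvature estimate along the sampled trajectory; a finite-difference sketch, with the discretization as an assumption:

```python
import numpy as np

def curvature(pts):
    """Signed curvature of a sampled 2D trajectory via finite
    differences. Local extrema of this signal give curvature-extremum
    landmarks; its sign changes mark inflection points."""
    pts = np.asarray(pts, dtype=float)
    dx, dy = np.gradient(pts[:, 0]), np.gradient(pts[:, 1])
    ddx, ddy = np.gradient(dx), np.gradient(dy)
    denom = np.maximum((dx * dx + dy * dy) ** 1.5, 1e-12)
    return (dx * ddy - dy * ddx) / denom
```

The formula is invariant to the sampling parameterization, so it applies directly to the equidistantly re-sampled strokes.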

Trajectory Matching
A handwriting sample can be represented by a point vector, where:
- s: number of static pieces segmented from the sample
- n_i: number of points extracted from the i-th piece

Trajectory Matching
The next step is to align the different vectors into a common coordinate frame:
- estimate an affine transform for each sample that maps the sample into the common frame
- affine transformations: translation, rotation, scaling

Training Set Alignment
Iterative algorithm (Learning from one example through shared densities on transforms, IEEE CVPR 2000)
- A deformation-energy-based criterion E is defined over the training set

Training Set Alignment - Algorithm
- Maintain an affine transform matrix U_i for each sample, initialized to the identity
- Compute the deformation-energy-based criterion E
- Repeat until convergence:
  - For each of the six unit affine matrices [14], A_j, j = 1, …, 6:
    - Tentatively update U_i ← A_j U_i, apply it to the sample, and recalculate the criterion E
    - If E has been reduced, accept the update; otherwise try the inverse perturbation U_i ← A_j^-1 U_i (starting from the original U_i) and apply again
    - If E has been reduced, accept; otherwise revert to the previous U_i
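A greedy sketch of this coordinate-descent alignment. The step size EPS and the energy are assumptions: total squared deviation from the mean shape stands in for the paper's deformation-energy criterion, which additionally guards against the degenerate solution of shrinking every sample toward a point.

```python
import numpy as np

EPS = 0.05  # perturbation step size (assumption)

# Six unit affine matrices (homogeneous 3x3): x/y scaling, shear,
# rotation, and x/y translation.
UNIT_AFFINES = [
    np.array([[1 + EPS, 0, 0], [0, 1, 0], [0, 0, 1.0]]),
    np.array([[1, 0, 0], [0, 1 + EPS, 0], [0, 0, 1.0]]),
    np.array([[1, EPS, 0], [0, 1, 0], [0, 0, 1.0]]),
    np.array([[np.cos(EPS), -np.sin(EPS), 0],
              [np.sin(EPS), np.cos(EPS), 0], [0, 0, 1.0]]),
    np.array([[1, 0, EPS], [0, 1, 0], [0, 0, 1.0]]),
    np.array([[1, 0, 0], [0, 1, EPS], [0, 0, 1.0]]),
]

def transform(U, pts):
    """Apply a homogeneous affine matrix U to an (n, 2) point array."""
    h = np.column_stack([pts, np.ones(len(pts))])
    return (h @ U.T)[:, :2]

def energy(samples, Us):
    """Stand-in criterion: squared deviation from the mean shape."""
    X = np.stack([transform(U, s) for U, s in zip(Us, samples)])
    return float(((X - X.mean(axis=0)) ** 2).sum())

def align(samples, sweeps=20):
    """Greedily perturb each U_i by a unit affine (or its inverse),
    keeping only perturbations that reduce the energy."""
    Us = [np.eye(3) for _ in samples]
    E = energy(samples, Us)
    for _ in range(sweeps):
        improved = False
        for i in range(len(samples)):
            for A in UNIT_AFFINES:
                for cand in (A, np.linalg.inv(A)):
                    trial = list(Us)
                    trial[i] = cand @ Us[i]
                    e = energy(samples, trial)
                    if e < E - 1e-12:
                        Us, E, improved = trial, e, True
                        break  # accept; move on to the next generator
        if not improved:
            break
    return Us, E
```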

Shape Models
- By modeling the distribution of aligned vectors, new examples can be generated that are similar to those in the training set
- As in the Active Shape Model, principal component analysis (PCA) is applied to the data (Statistical models of appearance for computer vision, draft report, 2000)

Shape Model
Formally, the covariance of the data is calculated as
  S = (1/N) Σ_i (x_i − x̄)(x_i − x̄)^T
Then the eigenvectors φ_i and corresponding eigenvalues λ_i of S are computed and sorted so that λ_1 ≥ λ_2 ≥ …
The training set is approximated by
  x ≈ x̄ + Φb
- Φ = (φ_1 | φ_2 | … | φ_t) holds the t eigenvectors corresponding to the largest eigenvalues
- b is a t-dimensional vector given by b = Φ^T (x − x̄)
- By varying the elements of b, new handwriting trajectories can be generated from this model; limits of ±3√λ_i are applied to the elements b_i
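The PCA shape model can be sketched as below; the 95% variance-retention threshold used to pick t is an assumption.

```python
import numpy as np

def train_shape_model(X, var_keep=0.95):
    """PCA shape model in the Active Shape Model style. X holds one
    aligned point vector per row. Returns the mean shape, the matrix
    Phi of the t leading eigenvectors, and their eigenvalues."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    S = np.cov(X - mean, rowvar=False)        # covariance of the data
    evals, evecs = np.linalg.eigh(S)
    order = np.argsort(evals)[::-1]           # sort descending
    evals, evecs = evals[order], evecs[:, order]
    t = int(np.searchsorted(np.cumsum(evals) / evals.sum(), var_keep)) + 1
    return mean, evecs[:, :t], evals[:t]

def synthesize(mean, Phi, lam, b):
    """Generate a shape x = mean + Phi b, clipping each b_i to
    +/- 3*sqrt(lambda_i) so the result stays plausible."""
    b = np.clip(b, -3 * np.sqrt(lam), 3 * np.sqrt(lam))
    return mean + Phi @ b
```

Projecting a training vector with b = Phi.T @ (x − mean) and synthesizing from that b reconstructs the vector up to the discarded modes.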

Outline Sample collection and segmentation Learning strategies Synthesis Strategies Experimental results Discussion and Conclusion

Synthesis Strategies
- Generate each individual letter in the word
- Align the baselines of these letters and juxtapose them in a sequence
- Concatenating letters with their neighbors to form cursive handwriting cannot be achieved easily
- To solve this problem, a conditional sampling algorithm based on the delta log-normal model is proposed

Individual Letter Synthesis

Delta Log-normal Model
- A powerful tool for analyzing rapid human movements
- With respect to handwriting generation, the movement of a simple stroke is controlled by its velocity
- The magnitude of the velocity is described as the difference of two log-normal functions (Why handwriting segmentation can be misleading?, 13th International Conference on Pattern Recognition, 1996):
  |v(t)| = D_1 Λ(t; t_0, μ_1, σ_1²) − D_2 Λ(t; t_0, μ_2, σ_2²)
- t_0: activation time; D_i: amplitudes of the impulse commands; μ_i: mean time delays; σ_i: response times of the agonist and antagonist systems
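Assuming the standard delta log-normal form (velocity magnitude as the difference of two log-normal impulse responses, one for the agonist and one for the antagonist system), a sketch:

```python
import numpy as np

def lognormal_impulse(t, t0, mu, sigma):
    """Log-normal impulse response Lambda(t; t0, mu, sigma^2),
    zero for t <= t0 (the activation time)."""
    t = np.asarray(t, dtype=float)
    out = np.zeros_like(t)
    m = t > t0
    z = np.log(t[m] - t0) - mu
    out[m] = np.exp(-z ** 2 / (2 * sigma ** 2)) / (
        (t[m] - t0) * sigma * np.sqrt(2 * np.pi))
    return out

def delta_lognormal_speed(t, t0, D1, mu1, s1, D2, mu2, s2):
    """|v(t)| = D1*Lambda1(t) - D2*Lambda2(t): agonist impulse minus
    antagonist impulse."""
    return (D1 * lognormal_impulse(t, t0, mu1, s1)
            - D2 * lognormal_impulse(t, t0, mu2, s2))
```

Each impulse response integrates to 1 over time, so D_1 − D_2 is the total displacement of the stroke.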

Delta Log-normal Model
- The angular velocity is calculated as the derivative of the stroke direction
- Given the velocities, the curvature along a stroke piece can be calculated
- The static shape of the piece is an arc, characterized by its initial direction θ_0, a constant curvature c_0, and its arc length

Delta Log-normal Model-Example [Why Handwriting Segmentation Can Be Misleading, 1996 IEEE ICPR]

Conditional Sampling
- First, the trajectories of synthesized handwriting letters are decomposed into static pieces
- The first piece of a trajectory is called the head piece, and the last piece the tail piece
- In the concatenation process, the trajectories of letters are deformed to produce natural cursive handwriting by changing the parameters of the head and tail pieces

Conditional Sampling
- A deformation energy of a stroke is defined
- A concatenation energy between the i-th letter and the (i+1)-th letter is defined; by minimizing its second and third terms, the two letters are forced to connect with each other smoothly and naturally

Conditional Sampling
- The concatenation energy of a whole word is calculated by summing over adjacent letter pairs
- The deformed letters must remain consistent with the shape models; this is enforced through a sampling energy
- The whole energy formulation combines the concatenation and sampling energies

Synthesis - Iterative Approach
1. Randomly generate a vector b(i) for each letter initially
2. Generate trajectories S_i of the letters and calculate an affine transform T_i for each letter (transforming it to its desired position)
3. For each pair of adjacent letters {S_i, S_i+1}, deform the pieces in these letters to minimize the concatenation energy E_c(i, i+1)
4. Project the deformed shapes into the model coordinate frame
5. Update the model parameters
6. If not converged, return to step 2

Experimental Results

Discussion & Conclusion
- Performance is limited by the samples used for training, since the shape models can only generate novel shapes within the variation of the training samples
- Although some experimental results are shown, it is still not known how to objectively evaluate synthesized scripts and compare different synthesis approaches

Markov Chains
A Markov chain on a space X with transition matrix T is a random process (an infinite sequence of random variables) (x(0), x(1), …, x(t), …) that satisfies
  P(x(t) | x(t−1), …, x(0)) = P(x(t) | x(t−1))
That is, the probability of being in a particular state at time t given the state history depends only on the state at time t−1
If the transition probabilities are fixed for all t, the chain is called homogeneous
(The slide illustrates this with a three-state chain on x_1, x_2, x_3 and its transition matrix T)

Stationary Distribution
Consider the Markov chain given above: the stationary distribution π satisfies π = πT
(The slide shows the numeric three-state transition matrix T and the resulting stationary distribution)
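A quick way to compute a stationary distribution is power iteration on the row-stochastic transition matrix. The three-state matrix below uses illustrative values, since the slide's own numbers are not legible in this transcript.

```python
import numpy as np

def stationary(T, iters=500):
    """Stationary distribution pi of a homogeneous Markov chain with
    row-stochastic transition matrix T, found by power iteration:
    at the fixed point, pi = pi @ T."""
    pi = np.full(T.shape[0], 1.0 / T.shape[0])  # start from uniform
    for _ in range(iters):
        pi = pi @ T
    return pi

# Illustrative three-state transition matrix (assumed values).
T = np.array([[0.5, 0.25, 0.25],
              [0.2, 0.5,  0.3],
              [0.3, 0.3,  0.4]])
```

Power iteration converges for irreducible, aperiodic chains; for periodic chains the stationary distribution still exists but must be found as the left eigenvector of T for eigenvalue 1.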