Polynomial Approximation in Handwriting Recognition Stephen M. Watt University of Western Ontario International Workshop on Symbolic-Numeric Computation.

Slides:



Advertisements
Similar presentations
Pseudo-Relevance Feedback For Multimedia Retrieval By Rong Yan, Alexander G. and Rong Jin Mwangi S. Kariuki
Advertisements

Statistical Machine Learning- The Basic Approach and Current Research Challenges Shai Ben-David CS497 February, 2007.
Introduction to Support Vector Machines (SVM)
Generative Models Thus far we have essentially considered techniques that perform classification indirectly by modeling the training data, optimizing.
Handwritten Mathematical Symbol Recognition for Computer Algebra Applications Xiaofang Xie, Stephen M. Watt Dept. of Computer Science, University of Western.
Data Mining Feature Selection. Data reduction: Obtain a reduced representation of the data set that is much smaller in volume but yet produces the same.
Word Spotting DTW.
ONLINE ARABIC HANDWRITING RECOGNITION By George Kour Supervised by Dr. Raid Saabne.
Medical Image Registration Kumar Rajamani. Registration Spatial transform that maps points from one image to corresponding points in another image.
Support Vector Machines
SVM—Support Vector Machines
Machine learning continued Image source:
CSCI 347 / CS 4206: Data Mining Module 07: Implementations Topic 03: Linear Models.
Chapter 1 - An Introduction to Computers and Problem Solving
The Disputed Federalist Papers : SVM Feature Selection via Concave Minimization Glenn Fung and Olvi L. Mangasarian CSNA 2002 June 13-16, 2002 Madison,
Basis Expansion and Regularization Presenter: Hongliang Fei Brian Quanz Brian Quanz Date: July 03, 2008.
Classification and Decision Boundaries
CS292 Computational Vision and Language Pattern Recognition and Classification.
Chapter 5: Linear Discriminant Functions
Chapter 2: Pattern Recognition
1 Classification: Definition Given a collection of records (training set ) Each record contains a set of attributes, one of the attributes is the class.
Supervised Distance Metric Learning Presented at CMU’s Computer Vision Misc-Read Reading Group May 9, 2007 by Tomasz Malisiewicz.
Basic Concepts and Definitions Vector and Function Space. A finite or an infinite dimensional linear vector/function space described with set of non-unique.
Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley.
Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley.
Pattern Recognition. Introduction. Definitions.. Recognition process. Recognition process relates input signal to the stored concepts about the object.
University of Texas at Austin CS395T - Advanced Image Synthesis Spring 2006 Don Fussell Orthogonal Functions and Fourier Series.
Ch. Eick: Support Vector Machines: The Main Ideas Reading Material Support Vector Machines: 1.Textbook 2. First 3 columns of Smola/Schönkopf article on.
Classification with Hyperplanes Defines a boundary between various points of data which represent examples plotted in multidimensional space according.
Overview of Kernel Methods Prof. Bennett Math Model of Learning and Discovery 2/27/05 Based on Chapter 2 of Shawe-Taylor and Cristianini.
ONLINE HANDWRITTEN GURMUKHI SCRIPT RECOGNITION AND ITS CHALLENGES R. K. SHARMA THAPAR UNIVERSITY, PATIALA.
Methods in Medical Image Analysis Statistics of Pattern Recognition: Classification and Clustering Some content provided by Milos Hauskrecht, University.
1 Introduction to Modeling Languages Striving for Engineering Precision in Information Systems Jim Carpenter Bureau of Labor Statistics, and President,
A D V A N C E D C O M P U T E R G R A P H I C S CMSC 635 January 15, 2013 Spline curves 1/23 Curves and Surfaces.
CS 8751 ML & KDDSupport Vector Machines1 Support Vector Machines (SVMs) Learning mechanism based on linear programming Chooses a separating plane based.
InkChat Stephen M. Watt Western University Teaching with Technology 21 May 2015, W estern University, London Ontario, Canada.
University of Texas at Austin CS384G - Computer Graphics Fall 2008 Don Fussell Orthogonal Functions and Fourier Series.
Intelligent Vision Systems ENT 496 Object Shape Identification and Representation Hema C.R. Lecture 7.
1 SUPPORT VECTOR MACHINES İsmail GÜNEŞ. 2 What is SVM? A new generation learning system. A new generation learning system. Based on recent advances in.
Universit at Dortmund, LS VIII
WB1440 Engineering Optimization – Concepts and Applications Engineering Optimization Concepts and Applications Fred van Keulen Matthijs Langelaar CLA H21.1.
SVM Support Vector Machines Presented by: Anas Assiri Supervisor Prof. Dr. Mohamed Batouche.
Machine Learning in Ad-hoc IR. Machine Learning for ad hoc IR We’ve looked at methods for ranking documents in IR using factors like –Cosine similarity,
Support Vector Machines Reading: Ben-Hur and Weston, “A User’s Guide to Support Vector Machines” (linked from class web page)
An Introduction to Support Vector Machines (M. Law)
Ranking in Information Retrieval Systems Prepared by: Mariam John CSE /23/2006.
Pattern Recognition April 19, 2007 Suggested Reading: Horn Chapter 14.
Introduction to String Kernels Blaz Fortuna JSI, Slovenija.
Classification (slides adapted from Rob Schapire) Eran Segal Weizmann Institute.
Support vector machine LING 572 Fei Xia Week 8: 2/23/2010 TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: A 1.
Gesture Input and Gesture Recognition Algorithms.
Support Vector Machines. Notation Assume a binary classification problem. –Instances are represented by vector x   n. –Training examples: x = (x 1,
© 2009 Robert Hecht-Nielsen. All rights reserved. 1 Andrew Smith University of California, San Diego Building a Visual Hierarchy.
Fast Query-Optimized Kernel Machine Classification Via Incremental Approximate Nearest Support Vectors by Dennis DeCoste and Dominic Mazzoni International.
Support Vector Machines Reading: Ben-Hur and Weston, “A User’s Guide to Support Vector Machines” (linked from class web page)
CS654: Digital Image Analysis Lecture 11: Image Transforms.
Extending linear models by transformation (section 3.4 in text) (lectures 3&4 on amlbook.com)
Support Vector Machines Reading: Textbook, Chapter 5 Ben-Hur and Weston, A User’s Guide to Support Vector Machines (linked from class web page)
Support Vector Machines
CS 9633 Machine Learning Support Vector Machines
Glenn Fung, Murat Dundar, Bharat Rao and Jinbo Bi
K Nearest Neighbors and Instance-based methods
K Nearest Neighbor Classification
Brief Review of Recognition + Context
Digital Control Systems Waseem Gulsher
COSC 4335: Other Classification Techniques
Support Vector Machines
CS4670: Intro to Computer Vision
SVMs for Document Ranking
Support Vector Machines 2
Presentation transcript:

Polynomial Approximation in Handwriting Recognition Stephen M. Watt University of Western Ontario International Workshop on Symbolic-Numeric Computation (SNC 2011) 7-9 June 2011, San Jose, California

The Pen as an Input Device Pen input for electronic devices is becoming important as an input modality. Pens can be used where keyboards can't, on very small or very large devices, in wet or dirty environments, by people with repetitive stress injuries. They also allow much easier 2-dimensional input, e.g. for drawings, music or mathematics.

Pen-Based Math Input for CAS and document processing. 2D editing. Computer-mediated collaboration.

Pen-Based Math Different than natural language recognition: – 2-D layout is a combination of writing and drawing. – Many similar few-stroke characters. – Many alphabets, used idiosyncratically. – Many symbols, each person uses a subset. – No fixed dictionary for disambiguation.

Similar Few-Stroke Characters

Character Ambiguities

Character Recognition A story about three statisticians Will concentrate on character recognition Several projects ignoring this problem

Digital Ink Formats Collected by surface digitizer or camera Sequence of ( x,y ) points sampled at some known frequency Possibly other info (angles, pressure, etc) Grouping into traces, letters, words + labelling

InkML Concepts Traces, trace groups Device information: sampling rate, resolution, etc. Pre-defined and application defined channels Trace formats, coordinate transformations Annotation text and XML

What the Computer Sees

Usual Character Reco. Methods Smooth and re-sample data THEN Match against N models by sequence alignment OR Identify “features”, such as – C oordinate values of sample points, Number of loops, cusps, Writing direction at selected points, etc Use a classification method, such as – Nearest neighbour, Subspace projection, Cluster analysis, Support Vector Machine THEN Rank choices by consulting dictionary

Difficulties Having many similar characters (e.g. for math) means comparison against all possible symbol models is slow. Determining features from points – Requires many ad hoc parameters. – Replaces measured points with interpolations – It is not clear how many points to keep, and most methods depend on number of points – Device dependent What to do since there is no dictionary? New ideas are needed!

Two Thoughts For HWR do we need all the trace data? – Do we need all the points? – Do we need full accuracy for all the points? What is classification? – H (English aitch, Greek eta, Russian en) – O (zero, oh, degree, …) – P, C, S(R, S, T)

Main Ideas Treat traces as curves, not point sets. Have to be able to recognize perturbed input. There is not always a single “right answer”.

First Axiom of HWR ∀ A, if a sample looks like an A, then it can be an A.

First Axiom of HWR ∀ A, if a sample looks like an A, then it can be an A. Implications: Classification gives a set of valid possibilities. Must be able to classify perturbed inputs. Can use approximation to represent traces more conveniently.

Orthogonal Series Representation Main idea: Represent coordinate curves as truncated orthogonal series. Advantages: – Compact – few coefficients needed – Geometric – the truncation order is a property of the character set – gives a natural metric on the space of characters – Algebraic – properties of curves can be computed algebraically (instead of numerically using heuristic parameters) – Device independent – resolution of the device is not important

Inner Product and Basis Functions

First Look: Chebyshev Series Initially used Chebyshev series [Char+SMW ICDAR 2007]. Found could approximate closely (small RMS error) with series of order 10. Like symbols tended to form clusters.

Raw Data for Symbol G

Coordinate fn approximations

Chebyshev Approx to Character

RMS Error

Problems Want fast response – how to work while trace is being captured. Low RMS does not mean similar shape.

Problem 1. On-Line Ink The main problem: In handwriting recognition, the human and the computer take turns thinking and sitting idle. We ask: Can we do useful work while the user is writing and thereby get the answer faster after the user stops writing? We show: The answer is “Yes”!

On-Line Series Coefficients

Approach works for any inner product with linear weight function. This is the Hausdorff moment problem (1921), shown to be unstable by Talenti (1987). It is just fine, however, for the dimensions we need.

An On-Line Complexity Model

Complexity

Problem 2. Shape vs Variation

Legendre-Sobolev Basis

Life in an Inner Product Space With the Legendre-Sobolev inner product we have – Low dimensional rep for curves ( ) – Compact rep of samples ~ 160 bits [G+W 2009] – >99% linear separability => convexity of classes – A useful notion of distance between curves that is very fast to compute

Linear Separability

Recognition Some classification methods compute the distance between the input curve and models. E.g. Elastic matching with DTW takes time up to quadratic in the number of sample points and linear in the number of models. Many tricks and heuristics to improve on this. E.g. Limit amount of dynamic time warping, pre-classify based on features,... We can do substantially better.

Distance Between Curves Elastic matching: Approximate the variation between curves by some fn of distances between sample points. May be coordinate curves or curves in a jet space. Sequence alignment Interpolation (“resampling”) Why not just calculate the area? This is very fast in ortho. series representation.

Distance Between Curves

Comparison of Candidate to Models Use Euclidean distance in the coefficient space. Just as accurate as elastic matching. Much less expensive. Linear in d, the degree of the approximation. < 3 d machine instructions (30ns) vs several thousand! Can trace through SVM-induced cells incrementally. Normed space for characters gives other advantages.

Choosing between Alternatives Red class or blue class?

Choosing between Alternatives

The Joy of Convex Can compute distance of a sample to this line Distance to convex hull of nearest neighbors in class gives best recognition [Golubitsky+SMW 2009,2010]

Choosing between Alternatives The nearest convex hull of neighbors is red.

Training Using CHKNN allows training with relatively few samples. (Dozens vs Thousands per class)

Recognition Summary

Error Rates as Fn of Distance SVM Convex Hull Error rate as fn of distance gives confidence measure for classifiers [MKM – Golubitsky + SMW 2009]

Combining with Statistical Info Empirical confidence on classifiers allows geometric recognition of isolated symbols to be combined with statistical methods. Domain-specific n-gram information: – Research mathematics – 20,000 articles from arXiv [MKM -- So+SMW 2005] – 2 nd year engineering math – most popular textbooks [DAS -- SMW 2008] – Inverse problem – identifying area via n-gram freq! [DML -- SMW 2008]

Deciding with Confidence Measure Symbol Recognizer: X  Class 1 with Conf  1 Symbol X in an Expression E Context-Based Predictor: X  Class 2 with Conf  2 X  Class i   i = max(  1,  2 )

Orientation and Shear

SNC for Features

Sensible Critical Points

Representations

SNC Problems Want small perturbations wrt LS norm. Transformation between LS basis and monomial basis ill-conditioned. Want to compute resultants, etc, without transforming to monomial basis. Can use degree-grading to push some arguments through. How far can we take this?

Conclusions Ask what are we really trying to do. Work with ink traces as curves, rather than as collections of sample points. Admits powerful analytic tools. Have useful geometry on space of curves. Gives device/resolution independence. Gives faster algorithms. Gives useful insights.

Thanks Bruce Char Joseph Choi Michael Friesen Oleg Golubitsky Rui Hu Vadim Mazalov Jeliasko Polihronov Elena Smirnova Clare So Coby Viner Maplesoft Microsoft MITACS NSERC

Digital Ink Forever