Microsoft’s Cursive Handwriting Recognizer


Microsoft’s Cursive Handwriting Recognizer Jay Pittman and the entire Microsoft Handwriting Recognition Research and Development Team jpittman@microsoft.com

Agenda Neural Network Review Basic Recognition Architecture Language Model Personalization Error Reporting New Languages

Handwriting Recognition Team An experiment: A research group, but not housed in MSR Positioned inside a product group Our direction and inspiration come directly from the users This isn’t for everyone, but we like it A dozen researchers Half with PhDs Mostly CS, but 1 Neuroscience, 1 Chemistry, 1 Industrial Engineering, 1 Speech Roughly half neural network researchers With various other recognition technologies

Neural Network Review A neural network is a directed acyclic graph: nodes and arcs, each containing a simple value. Nodes contain activations; arcs contain weights. Activations represent soft booleans, ranging from 0.0 to 1.0. Weights represent excitatory and inhibitory connections, roughly symmetric about 0. At run time we do a "forward pass," which computes activations from the inputs to the hiddens, and then to the outputs. From outside, the app sees only the input nodes and output nodes. [Diagram: a small network with input, hidden, and output layers, annotated with example activations and weights.]

Neural Network Forward Pass Inputs are features computed from ink; outputs are probability estimates of letters. Each activation is computed as act = F(Σ(in × weight) + bias), where F(x) = 1 / (1 + e^(-x)) is the logistic function. [Diagram: the same network, with example activations and weights on the forward pass.]

Neural Network Training Start with a fixed architecture and a random set of weights. Iterate randomly through the training samples. For each training sample, do a forward pass and compute the error of each output (size and direction). Compute what change in individual weights (size and direction) would reduce each output error, then reduce that change to a small fraction. Repeat this walk through the training samples over and over, in different random orders.

C Example: Forward Pass

typedef struct {
    int    cInputs;
    float *Inputs;        /* activations from the previous layer */
    int    cActivations;
    float *Activations;
    float *Biases;        /* one bias per activation */
    float **Weights;      /* Weights[i][j]: arc from input j to activation i */
} LAYER;

float Logistic(float in)
{
    return 1.0f / ((float)exp((double)-in) + 1.0f);
}

void Forward(LAYER *pLayer)
{
    int i;
    for (i = 0; i < pLayer->cActivations; i++)
    {
        int j;
        float in = pLayer->Biases[i];
        for (j = 0; j < pLayer->cInputs; j++)
            in += pLayer->Inputs[j] * pLayer->Weights[i][j];
        pLayer->Activations[i] = Logistic(in);
    }
}

(All values are floats.)

TDNN: Time-Delayed Neural Network This is still a normal back-propagation network, and all the points in the previous slides still apply. The difference is in the connections: connections are limited, and weights are shared. The input is segmented, and the same features are computed for each segment. One small detail, edge effects: for the first two and last two columns, the hidden nodes and input nodes that reach outside the range of the input receive zero activations. [Diagram: the same network replicated across a sequence of input segments.]

Segmentation [Diagram: cursive ink segmented at the tops, bottoms, and midpoints going up.]

Training We use back-propagation training We collect millions of words of ink data from thousands of writers Young and old, male and female, left handed and right handed Natural text, newspaper text, URLs, email addresses, numeric values, street addresses, phone numbers, dates, times, currency amounts, etc. We collect in more than two dozen languages around the world Training on such large databases takes weeks We constantly worry about how well our data reflect our customers Their writing styles Their text content We can be no better than the quality of our training sets And that goes for our test sets too We are teaching the computer to read

Recognizer Architecture Ink → Segments → TDNN → Output Matrix → Beam Search (guided by the Lexicon) → Top-10 List. Example top-10 list: dog 68, clog 57, dug 51, doom 42, divvy 37, ooze 35, cloy 34, doxy 29, client 22, dozy 13. [Diagram: the output matrix holds a probability estimate for every character at every segment position; the beam search walks the matrix and the lexicon together to produce the top-10 list.]

Maintaining Ambiguity TDNN does NOT tell us which letter it is At least not in a definite answer Instead it tells us probability estimates for each and every character that it might be The same shape might be different pieces of different letters It is important to keep all theories alive for now So we can decide later, after we add in more information from the language model I suppose “maintaining ambiguity” is a euphemism for procrastinating

Error Correction: SetTextContext() Goal: better context usage for error-correction scenarios. 1. User writes "Dictionary". 2. Recognizer misrecognizes it as "Dictum". 3. User selects "um" and rewrites "ionary". 4. The TIP notes the partial-word selection and puts the recognizer into correction mode with left context "Dict" and right context "". The beam search artificially recognizes the left context, runs the ink as normal, then artificially recognizes the right context. This produces "ionary" in the top-10 list; the TIP must insert it to the right of "Dict". [Diagram: the beam search with the left and right contexts pinned around the rewritten ink.]

Language Model We get better recognition if we bias our interpretation of the output matrix with a language model Better recognition means we can handle sloppier cursive You can write faster, in a more relaxed manner The lexicon (system dictionary) is the main part But there is also a user dictionary And there are regular expressions for things like dates and currency amounts We want a generator We ask it: “what characters could be next after this prefix?” It answers with a set of characters We still output the top letter recognitions In case you are writing a word out-of-dictionary You will have to write more neatly

Lexicon The lexicon is stored as a trie: simple interior nodes, plus leaf nodes marking the end of a valid word. Nodes carry locale flags (U.S. only, U.K. only, Australian only, Canadian only), so that, for example, the branch spelling "color(s)" is flagged US while the branch spelling "colour(s)" is flagged UK, A, C. Each word also carries a unigram score, the log of its probability. [Diagram: a trie sharing prefixes among words such as colors/colours, run, the, theaters, and walking, with locale flags and unigram scores on the leaves.]
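The trie, and the generator question from the language-model slide ("what characters could be next after this prefix?"), can be sketched as below. Locale flags and unigram scores are omitted, and all names are illustrative, not from the shipped recognizer.

```c
#include <stdlib.h>

/* Minimal lexicon trie: 26 children per node, plus an end-of-word flag. */
typedef struct TrieNode {
    struct TrieNode *children[26];
    int isWord;                     /* leaf flag: end of a valid word */
} TrieNode;

TrieNode *NewNode(void) { return (TrieNode *)calloc(1, sizeof(TrieNode)); }

void Insert(TrieNode *root, const char *word)
{
    for (; *word; word++) {
        int c = *word - 'a';
        if (!root->children[c])
            root->children[c] = NewNode();
        root = root->children[c];
    }
    root->isWord = 1;
}

/* The generator the beam search consults: fills next[] with the characters
   that can validly follow the prefix, and returns how many there are. */
int NextChars(const TrieNode *root, const char *prefix, char next[26])
{
    for (; *prefix; prefix++) {
        root = root->children[*prefix - 'a'];
        if (!root)
            return 0;               /* prefix leads out of the dictionary */
    }
    int n = 0;
    for (int c = 0; c < 26; c++)
        if (root->children[c])
            next[n++] = (char)('a' + c);
    return n;
}
```

After inserting "color", "colour", and "cold", the generator answers "d, o" for the prefix "col" and "r, u" for the prefix "colo".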

Offensive Words The lexicon includes all the words in the spellchecker The spellchecker includes obscenities Otherwise they would get marked as misspelled But people get upset if these words are offered as corrections for other misspellings So the spellchecker marks them as “restricted” We live in an apparently stochastic world We will throw up 6 theories about what you were trying to write If your ink is near an obscene word, we might include that Dilemma: We want to recognize your obscene word when you write it Otherwise we are censoring, which is NOT our place We DON’T want to offer these outputs when you don’t write them Solution (weak): We took these words out of the lexicon You can still write them, because you can write out-of-dictionary But you have to write very neat cursive, or nice handprint Only works at the word level Can’t remove words with dual meanings Can’t handle phrases that are obscene when the individual words are not

Regular Expressions Many are built in and callable by ISVs and web pages: number, digit string, date, time, currency amount, phone number; name, address, arbitrary word/phrase list; URL, email address, file name, login name, password, isolated character. Many components of the above: month, day of month, month name, day name (of week), year; hour, minute, second; local phone number, area code, country code; first name, last name, prefix, suffix; street name, city, state or province, postal code, country. None: yields an out-of-dictionary-only system (turns off the language model). Great for form-filling apps and web pages; accuracy is greatly improved. Use SetFactoid() or SetInputScope(). This is in addition to the ability to load the user dictionary: one could load 500 color names for a color field in a form-based app, or 8,000 drug names in a prescription app, or 2,000 stock symbols.

Regular Expressions A simple regular expression compiler is available at run time ISVs can add their own regular expressions One could imagine the DMV adding automobile VINs Blood pressure example: (!IS_DIGITS)/(!IS_DIGITS) p(!IS_DIGITS) Latitude example: (!IS_DIGITS)°((!IS_TIME_MINORSEC)’((!IS_TIME_MINORSEC)”)+)+ (N|S) http://msdn.microsoft.com/library/default.asp?url=/library/en-us/tpcsdk10/lonestar/inkconcepts/custominputscopeswithregex.asp

Default Factoid Used when no factoid is set; intended for natural text, such as the body of an email. Includes the system dictionary, user dictionary, hyphenation rule, number grammar, and URL grammar, all wrapped by optional leading and trailing punctuation. The hyphenation rule allows a sequence of dictionary words with hyphens between them. The number grammar includes actual numbers, plus dates, times, currency amounts, telephone numbers, percents, numeric ranges, ordinal abbreviations, number-unit combinations, and Roman numerals. Alternatively, the input can be a single character (any character supported by the system). [Diagram: a state machine from Start to Final through SysDict, UserDict, Numeric, Web, or Single Char branches, with optional leading and trailing punctuation.]

Calligrapher The Russian recognition company Paragraph sold itself to SGI (Silicon Graphics, Incorporated), who then sold it to Vadem, who sold it to Microsoft. In the purchase we obtained: Calligrapher Cursive recognizer that shipped on the first Apple Newton (but not the second) Transcriber Handwriting app for handheld computers (shipped on PocketPC) Calligrapher has a very similar architecture Instead of a TDNN it employs a hand-built HMM The lexicon and beam search are similar in nature (many small differences) We combined our system with Calligrapher We use a voting system (neural nets) to combine each recognizer’s top 10 list They are very different, and make different mistakes We get the best of both worlds If either recognizer outputs a single-character “word” we forget these lists and run the isolated character recognizer
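A minimal sketch of combining the two recognizers' lists follows. The shipped combiner is a trained neural network over the two top-10 lists; this simple additive vote, and all names in it, are only illustrative.

```c
#include <string.h>

/* Hypothetical candidate from one recognizer's top-10 list. */
typedef struct {
    const char *word;
    int score;            /* 0..100 confidence, as in the top-10 list example */
} CANDIDATE;

/* Score a word received from one list, or 0 if that list omits it. */
static int FindScore(const CANDIDATE *list, int cList, const char *word)
{
    for (int i = 0; i < cList; i++)
        if (strcmp(list[i].word, word) == 0)
            return list[i].score;
    return 0;
}

/* Pick the word with the highest summed score across both lists: a word
   both recognizers agree on can beat either recognizer's top choice. */
const char *CombineVote(const CANDIDATE *a, int cA, const CANDIDATE *b, int cB)
{
    const char *best = NULL;
    int bestScore = -1;
    for (int i = 0; i < cA; i++) {
        int s = a[i].score + FindScore(b, cB, a[i].word);
        if (s > bestScore) { bestScore = s; best = a[i].word; }
    }
    for (int i = 0; i < cB; i++) {
        int s = b[i].score + FindScore(a, cA, b[i].word);
        if (s > bestScore) { bestScore = s; best = b[i].word; }
    }
    return best;
}
```

Because the two recognizers make different mistakes, agreement between them is strong evidence, which is what makes this kind of combination pay off.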

Personalization: Ink Shape Simple concept: just do the same training on this customer's ink. Start with components already trained on a massive database of ink samples; train further on the specific user's ink samples. Explicit training: the user must go to a wizard and copy a short script. We do have labels from the customer, but they are limited in quantity because of the tediousness. Implicit training: data is collected in the background during normal use. We get more data, but it doesn't have labels verified by the customer; we protect ourselves from mislabeled data using our internal confidence measure and a pipeline of quarantined stores. Much of the work is in the infrastructure: GUI, database, management of different users' trained networks, etc.

Personalization: Text Harvesting Simple concept: just add the user’s new words to the lexicon Examples (at Microsoft): RTM, dev, SDET, dogfooding, KKOMO, featurization We scan Outlook for outgoing email (avoids spam), outgoing appointments, notes, tasks, and contacts (names, email addresses) We scan Internet Explorer history for URLs Natural text goes through the Indexing Service word breaker Strips punctuation, quotes, etc. We support add-to-dictionary and remove-from-dictionary, from within the TIP We also run a post-processor based on frequent correction pairs

Personalization: Vista East Asian Recognizers: Chinese (Simplified and Traditional), Japanese, Korean. Explicit and implicit personalization. No text harvesting, because no lexicon is used in recognition. Dr. Qi Zhang, Dr. John Drakopoulos, Michael Black. English Recognizers: US and UK. Explicit personalization and text harvesting; no implicit personalization. Dr. Michael Revow, Dr. Dave Stevens, David Winkler, Rick Sailor, Brian Leung, Nick Strathy

Vista: Recognition Error Reporting Button in the TIP Start menu Dr. Jamie Eisenhart

Languages: Vista Previous to Vista, we shipped: English (US), English (UK), French, German, Spanish, Italian Using a completely different approach, we also shipped: Japanese, Chinese (Simplified), Chinese (Traditional), Korean All of the above have improved accuracy in Vista Latin recognizers are significantly better on URLs EA recognizers are significantly better on cursive And some will have personalization Vista adds: Dutch, Portuguese (for Brazil) Yours truly

Future Languages We have done some initial work in: Swedish, Danish, Norwegian, Finnish, Serbian (Latin and Cyrillic) We ship based on quality, so we don’t tie to specific releases We have started initial research in roughly a dozen more Some in the Latin script and some in other scripts My research goal is to speed the development of new languages

Additional Latin Script Languages
Accents: We already handle acute, grave, dieresis, circumflex. Ring over vowels (Danish, Norwegian, Swedish, Finnish, Czech). Double acute (Hungarian). Hacek (caron) on consonants (Czech, Slovak, Slovene, Estonian, Latvian, Lithuanian). Cedilla under consonants (Romanian, Croatian, Serbian in Latin, Catalan, Turkish, Latvian). Ogonek under vowels (Polish, Lithuanian). Dot over letter (Polish, Lithuanian, Maltese). Macron over vowels (Latvian, Lithuanian, Maori). Breve over letters (Romanian, Turkish). Others: ø (Danish, Norwegian); ł (Polish); ŀl (Catalan); ð þ (Icelandic); ħ (Maltese); dotted capital I and dotless lowercase i (Turkish).
Quotes: “high quotes” (English, Spanish, Portuguese, Modern Dutch, Turkish, Catalan, Galician, Welsh, Zulu, Malay); „low quotes“ (German, Polish, Czech, Romanian, Croatian, Serbian, Hungarian, Slovak); ”left-facing quotes” (Danish, Norwegian, Swedish, Finnish); « chevron quotes » (French).
Numbers: 1,234,567.89 (English, some former British colonies); 1 234 567,89 (Swedish, Norwegian, Finnish, Czech, Slovak, Hungarian, Lithuanian, Latvian, Estonian); 1 234 567,89 or 1.234.567,89 (French, Italian, Polish, Slovene); 1.234.567,89 (everyone else).

Best Job at Microsoft Bill Gates makes more money, but I have more fun. I remember senior people at several research institutions calling cursive recognition a "waste of time and money." Some find it recognizes their writing when no one else can, but I also know there are others who get poor recognition. I wonder if Gary Trudeau has tried it. People will adapt to a recognizer, if they use it enough, just as they adapt to the people they live with and work with. My physician in Issaquah gets perfect recognition on a Newton. Biggest complaints: no adaptation to my handwriting style (coming in Vista); we don't yet ship their language (I'm working on it). Other complaints: weak on URLs (much better in Vista), email addresses, slashes, and some styles of handprint (all better in Vista); East Asian recognizers weak on cursive (much better in Vista).

© 2006 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.