Microsoft's Cursive Recognizer
Jay Pittman and the entire Microsoft Handwriting Recognition Research and Development Team
Microsoft Tablet PC, February 1, 2005

Syllabus
- Neural Network Review
- Microsoft's Own Cursive Recognizer
- Isolated Character Recognizer
- Paragraph's Calligrapher
- Combined System

Neural Network Review
- Directed acyclic graph
- Nodes and arcs, each containing a simple value
  - Nodes contain activations, arcs contain weights
- At run time, we do a "forward pass" which computes activations from the inputs to the hiddens, and then to the outputs
- From the outside, the application only sees the input nodes and output nodes
- Node values (in and out) range from 0.0 to 1.0
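To make the review concrete, here is a minimal forward-pass sketch, assuming one fully connected hidden layer and sigmoid activations (bias terms omitted); the layer sizes and numbers are invented for illustration.

```python
import math

def sigmoid(x):
    # Squashes any real value into the (0.0, 1.0) activation range.
    return 1.0 / (1.0 + math.exp(-x))

def forward_pass(inputs, w_in_hidden, w_hidden_out):
    # Weights live on the arcs, activations live on the nodes.
    hidden = [sigmoid(sum(w * x for w, x in zip(row, inputs)))
              for row in w_in_hidden]
    outputs = [sigmoid(sum(w * h for w, h in zip(row, hidden)))
               for row in w_hidden_out]
    return outputs   # the application only ever sees inputs and outputs

# Toy usage: 3 inputs -> 2 hidden nodes -> 1 output
print(forward_pass([0.2, 0.9, 0.4],
                   [[0.1, -0.3, 0.8], [0.5, 0.2, -0.1]],
                   [[1.2, -0.7]]))
```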

TDNN: Time Delayed Neural Network
- This is still a normal back-propagation network
  - All the points in the previous slide still apply
- The difference is in the connections
  - Connections are limited
- The input is segmented, and the same features are computed for each segment
- I decided I didn't like this artwork, so I started over (next slide)
(diagram: input items 1 through 6 feeding the network)

TDNN: Time Delayed Neural Network
- Edge effects: for the first two and last two columns, the hidden nodes and input nodes that reach outside the range of our output receive zero activations
(diagram: the same network with the edge columns highlighted)

TDNN: Weights Are Shared
- Since the weights are shared, this net is not really as big as it looks
- When a net is stored (on disk or in memory), there is only one copy of each weight
- On disk, we don't store the activations, just the weights (and architecture)
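A minimal sketch of the weight-sharing idea under assumptions not in the deck (a window of three segments and invented feature counts): one weight matrix is reused at every column position, and windows that reach past the ends of the input see zero activations, as the previous slide describes.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tdnn_hidden_layer(segments, shared_weights, window=3):
    """segments: list of per-segment feature vectors.
    shared_weights: ONE weight matrix, reused at every column position.
    Windows that reach outside the input are padded with zero activations."""
    n_feats = len(segments[0])
    zero = [0.0] * n_feats
    columns = []
    for t in range(len(segments)):
        window_feats = []
        for dt in range(-(window // 2), window // 2 + 1):
            src = segments[t + dt] if 0 <= t + dt < len(segments) else zero
            window_feats.extend(src)
        columns.append([sigmoid(sum(w * x for w, x in zip(row, window_feats)))
                        for row in shared_weights])
    return columns   # one hidden-activation vector per ink segment

# Toy usage: 5 segments x 2 features, 4 hidden nodes over a 3-segment window
segs = [[0.1, 0.9], [0.3, 0.7], [0.8, 0.2], [0.5, 0.5], [0.9, 0.1]]
weights = [[0.2] * 6, [-0.1] * 6, [0.05] * 6, [0.3] * 6]
print(len(tdnn_hidden_layer(segs, weights)))   # 5 columns
```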

Training
- We use back-propagation training
- We collect millions of words of ink data from thousands of writers
  - Young and old, male and female, left-handed and right-handed
  - Natural text, newspaper text, URLs, email addresses, street addresses
  - We collect in over a dozen languages around the world
- Training on such large databases takes weeks
- We constantly worry about how well our data reflect our customers
  - Their writing styles
  - Their text content
- We can be no better than the quality of our training sets
  - And that goes for our test sets too

Languages
- We ship now in:
  - English (US), English (UK), French, German, Spanish, Italian
- We have done some initial work in:
  - Dutch, Portuguese, Swedish, Danish, Norwegian, Finnish
  - We cannot predict when we might ship these
- Using a completely different approach, we also ship now in:
  - Japanese, Chinese (Simplified), Chinese (Traditional), Korean

Recognizer Architecture
(diagram: the ink is cut into segments; the segments feed the TDNN, which produces an output matrix of per-character activations; a beam search guided by the lexicon turns that matrix into a top-10 list, e.g. dog 68, clog 57, dug 51, doom 42, divvy 37, ooze 35, cloy 34, doxy 29, client 22, dozy 13)

Segmentation
(figure: ink annotated with tops, bottoms, and the midpoints going up, where the ink is cut into segments)

TDNN Output Matrix
(figure: a grid with one row per character (a, b, c, ...) and one column per ink segment; each cell holds that character's activation for that segment)

Language Model
- Now that we have a complete output matrix from the TDNN, what are we going to do with it?
- We get better recognition if we bias our interpretation of that output matrix with a language model
  - Better recognition means we can handle sloppier cursive
- The lexicon (system dictionary) is the main part
  - But there is also a user dictionary
  - And there are regular expressions for things like dates and currency amounts
- We want a generator
  - We ask it: "what characters could be next after this prefix?"
  - It answers with a set of characters
- We still output the top letter recognitions
  - In case you are writing a word out-of-dictionary
  - You will have to write more neatly
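A minimal sketch of the generator idea, assuming nothing about the real implementation: the lexicon here is a plain Python set rather than the trie shown on the next slide, and the word list is invented.

```python
def next_chars(lexicon, prefix):
    # "What characters could be next after this prefix?"
    # A real implementation walks the lexicon trie; a set scan shows the idea.
    return {word[len(prefix)] for word in lexicon
            if word.startswith(prefix) and len(word) > len(prefix)}

lexicon = {"dog", "dot", "doom", "clog", "dug"}
print(next_chars(lexicon, "do"))    # {'g', 't', 'o'}
print(next_chars(lexicon, "clo"))   # {'g'}
```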

Lexicon
(diagram: the lexicon stored as a letter tree; the legend distinguishes simple nodes from leaf nodes that end a valid word, marks some branches as U.S. only, U.K. only, Australian only, or Canadian only, and attaches unigram scores, the log of the word's probability, e.g. "the 952"; example branches spell out words such as "walking")

Clumsy Lexicon Issue
- The lexicon includes all the words in the spellchecker
  - The spellchecker includes obscenities
    - Otherwise they would get marked as misspelled
  - But people get upset if these words are offered as corrections for other misspellings
  - So the spellchecker marks them as "restricted"
- We live in an apparently stochastic world
  - We will throw up 6 theories about what you were trying to write
  - If your ink is near an obscene word, we might include that
- Dilemma:
  - We want to recognize your obscene word when you write it
    - Otherwise we are censoring, which is NOT our place
  - We DON'T want to offer these outputs when you don't write them
- Solution (weak):
  - We took these words out of the lexicon
  - You can still write them, because you can write out-of-dictionary
  - But you have to write very neat cursive, or nice handprint

Grammars
- MonthNum = "123456789" | "1" "012";
- seconds = digit | "12345" digit;
(diagram: the MonthNum and seconds grammars compiled into small Start-to-Stop state machines)
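The two fragments appear to describe month numbers 1-12 and seconds 0-59. A hedged sketch of the same constraints as Python regular expressions (my reading of the garbled slide text), to show how such a grammar restricts which characters the recognizer will accept next:

```python
import re

# MonthNum: a single digit 1-9, or "1" followed by 0, 1, or 2  (values 1-12)
MONTH_NUM = re.compile(r"^(?:[1-9]|1[012])$")
# seconds: a single digit, or a digit 1-5 followed by another digit (values 0-59)
SECONDS = re.compile(r"^(?:\d|[1-5]\d)$")

for s in ["7", "12", "13", "59", "61"]:
    print(s, bool(MONTH_NUM.match(s)), bool(SECONDS.match(s)))
```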

Factoids and Input Scope
Setting the Factoid property merely enables and disables various grammars and lexica.
- IS_DEFAULT (see next slide)
- IS_PHRASELIST (user dictionary only)
- IS_DATE_FULLDATE, IS_TIME_FULLTIME
- IS_TIME_HOUR, IS_TIME_MINORSEC
- IS_DATE_MONTH, IS_DATE_DAY, IS_DATE_YEAR, IS_DATE_MONTHNAME, IS_DATE_DAYNAME
- IS_CURRENCY_AMOUNTANDSYMBOL, IS_CURRENCY_AMOUNT
- IS_TELEPHONE_FULLTELEPHONENUMBER
- IS_TELEPHONE_COUNTRYCODE, IS_TELEPHONE_AREACODE, IS_TELEPHONE_LOCALNUMBER
- IS_ADDRESS_FULLPOSTALADDRESS
- IS_ADDRESS_POSTALCODE, IS_ADDRESS_STREET, IS_ADDRESS_STATEORPROVINCE, IS_ADDRESS_CITY, IS_ADDRESS_COUNTRYNAME, IS_ADDRESS_COUNTRYSHORTNAME
- IS_URL, IS_EMAIL_USERNAME, IS_EMAIL_SMTPEMAILADDRESS
- IS_FILE_FULLFILEPATH, IS_FILE_FILENAME
- IS_DIGITS, IS_NUMBER
- IS_ONECHAR
- NONE (yields an out-of-dictionary-only system)

Default Factoid
- Used when no factoid is set
- Intended for natural text, such as the body of an email message
- Includes the system dictionary, user dictionary, hyphenation rule, number grammar, and web address grammar
  - All wrapped by optional leading punctuation and trailing punctuation
  - The hyphenation rule allows a sequence of dictionary words with hyphens between them
  - Alternatively, the input can be a single character (any character supported by the system)
(diagram: Start, optional leading punctuation, then one of {system dictionary, user dictionary, hyphenation, number, web address, single character}, then optional trailing punctuation, Final)

Factoid Extensibility
- All the grammar-based factoids were specified in a regular expression grammar, and then "compiled" into the binary table using a simple compiler
- The compiler is available at run time
  - Software vendors can add their own regular expressions
  - The string is set as the value of the Factoid property
  - One could imagine the DMV adding automobile VINs
- This is in addition to the ability to load the user dictionary
  - One could load 500 color names for a color field in a form-based app
  - Or 8000 drug names in a prescription app
  - Construct a WordList object, and set it to the WordList property
  - Set the Factoid property to "IS_PHRASELIST"

Recognizer Architecture
(the same architecture diagram, repeated as a recap before the discussion of the beam search and DTW)

DTW
- Dynamic Time Warping
- Dynamic Programming
- Elastic Matching
(figure: a word from the dictionary or from the ink prototypes elastically matched against the entry from the user)

Brute Force Matching
- Matrix of all possible matches between the entry from the dictionary ("elephant") and the entry from the user ("elphant")
- The user must provide a distance function: 0 means match, 1 means no match

Cumulative Matching
- Each cell adds its own match score to the minimum of the cumulative scores to its left, below, and diagonally below-left
- We start in the lower-left corner and work our way up to the upper-right corner
- The upper-right corner cell holds the total cost of aligning the two sequences
(figure: an example match-score matrix and the cumulative-score matrix computed from it)

Cumulative Matching
(figure: the cumulative-score matrix filled in for "elephant" against "elphant")

Alignment
- Each cell can remember which neighbor it used, and these links can be followed back from the upper-right corner to recover the alignment
- A vertical move indicates an omission in the entry from the user
- A horizontal move indicates an insertion in the entry from the user
(figure: the backtrace path through the "elephant" / "elphant" matrix)
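A minimal sketch of the cumulative-score matrix from the last three slides, using the 0/1 match score and the left/below/diagonal rule; the backtrace bookkeeping is described in the comments rather than implemented.

```python
def cumulative_matrix(dict_word, user_word):
    """Cumulative-score matrix as on the slides: each cell's match score
    (0 = match, 1 = mismatch) is added to the minimum of the cumulative
    scores to its left, below, and diagonally below-left.  Rows are the word
    from the dictionary, columns are the entry from the user; we start in the
    lower-left corner, and the upper-right corner holds the total cost.
    (For the backtrace, each cell would also remember which neighbor it used:
    a vertical move is an omission by the user, a horizontal move an insertion.)"""
    INF = float("inf")
    rows, cols = len(dict_word), len(user_word)
    cum = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):                      # i = 0 is the bottom row
        for j in range(cols):
            match = 0 if dict_word[i] == user_word[j] else 1
            below = cum[i - 1][j] if i > 0 else INF
            left = cum[i][j - 1] if j > 0 else INF
            diag = cum[i - 1][j - 1] if i > 0 and j > 0 else INF
            prev = min(below, left, diag)
            cum[i][j] = match + (prev if prev != INF else 0)
    return cum

cum = cumulative_matrix("elephant", "elphant")
print(cum[-1][-1])   # 1: the user omitted one letter
```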

Ink Prototypes
(figure: ink from the prototypes matched against ink from the user)

Searching the Prototypes
- We can compute the score for every word in the dictionary, to find the closest set of words
- This is slow, due to the size of the dictionary

DTW as a Stack
- If we compute row-by-row (from the bottom), we can treat the matrix as a stack
- We can pop off a row when we back up a letter
- This allows us to walk the dictionary tree, reusing the rows already computed for a shared prefix
(figure: the matrix rows stacked against the branches of the lexicon tree)

Using Columns to Avoid Memory
- If we compute the scores column-by-column, we don't need to store the entire matrix
- This isn't a stack, so we don't have to pop back to previous columns
- We don't even need double buffering; we just need 2 local variables
- We don't need to store the simple distance, just the cumulative distance
(figure: full matrix vs. double buffer vs. single buffer plus locals)
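A sketch of the column-by-column trick under the same 0/1 scoring: only one column of cumulative scores is kept, plus the two local variables the slide mentions.

```python
def alignment_cost_low_memory(dict_word, user_word):
    """Same cumulative score as the full matrix, computed column-by-column.
    Only the previous column survives in 'col'; 'below' and 'diag' are the
    two local variables the slide mentions."""
    INF = float("inf")
    col = [INF] * len(dict_word)        # cumulative scores of the column to the left
    for uc in user_word:
        diag = INF                      # cumulative score diagonally below-left
        below = INF                     # cumulative score of the cell below
        for i, dc in enumerate(dict_word):
            match = 0 if dc == uc else 1
            prev = min(col[i], below, diag)            # left, below, diagonal
            new = match + (prev if prev != INF else 0)
            diag, col[i], below = col[i], new, new     # slide the locals upward
    return col[-1]

print(alignment_cost_low_memory("elephant", "elphant"))   # 1, matching the full matrix
```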

Beam Search
- We can do column-by-column and row-by-row at the same time if we treat the rows as a tree, with each new row pointing backwards to its parent
(figure: the rows of the matrix arranged as a tree of lexicon prefixes, each row pointing back to its parent)

Why Is It Called a Beam Search?
- As we compute a column, we can remember the best score so far
- We add a constant to that score
- Any scores worse than that are culled
- Back in the original cumulative distance matrix, this keeps us from computing cells too far away from the best path (the beam)
- Since we are following a tree, culling a cell may allow us to avoid an entire subtree
  - This is the real savings
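A rough sketch that puts the last three slides together, under stated simplifications: the lexicon is a small trie, each letter adds one row to the cumulative matrix, cells worse than the best score in that row plus a constant are culled, and a fully culled row prunes the whole subtree. The character-level 0/1 scores and the beam width stand in for the real TDNN activations.

```python
INF = float("inf")
BEAM = 2            # pruning constant added to the best score in each row

def insert(trie, word):
    node = trie
    for ch in word:
        node = node.setdefault(ch, {})
    node["$"] = True                   # end-of-word marker

def extend_row(parent_row, letter, ink):
    """Add one row (one more lexicon letter) to the cumulative matrix."""
    row = []
    for j, ic in enumerate(ink):
        match = 0 if letter == ic else 1
        below = parent_row[j] if parent_row else INF
        left = row[j - 1] if j else INF
        diag = parent_row[j - 1] if parent_row and j else INF
        prev = min(below, left, diag)
        row.append(match + (prev if prev != INF else 0))
    # beam pruning: anything much worse than the best cell in this row is culled
    cutoff = min(row) + BEAM
    return [c if c <= cutoff else INF for c in row]

def beam_search(trie, ink):
    results = []                       # (cost, word) for complete lexicon words
    def walk(node, row, prefix):
        for letter, child in node.items():
            if letter == "$":
                continue
            new_row = extend_row(row, letter, ink)
            if min(new_row) == INF:    # whole row culled: skip the subtree
                continue
            if "$" in child and new_row[-1] != INF:
                results.append((new_row[-1], prefix + letter))
            walk(child, new_row, prefix + letter)
    walk(trie, None, "")
    return sorted(results)[:10]        # the "top 10 list"

trie = {}
for w in ["dog", "doom", "dug", "clog", "dot"]:
    insert(trie, w)
print(beam_search(trie, "dog"))   # [(0, 'dog'), (1, 'dot'), (1, 'dug'), ...]
```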

Out of Dictionary
- This is the wrong name:
  - It should really be called Out of Language Model
  - Or simply Unsupported, since letter sequences in the language model are called "Supported"
- We simply want to walk across the output matrix and find the best characters
- This is needed for part numbers, and for words and abbreviations we don't yet have in the user dictionary
- We bias the output (slightly) toward the language statistics by using bigram probabilities
  - For instance, the probability of the sequence "at":
    P(at | ink) = P(a | ink) P(t | ink) P(at)
  - where P(a | ink) and P(t | ink) come from the output matrix
  - and P(at) comes from the bigram table
- We impose a penalty for OOD words, relative to supported words
  - Otherwise the entire language model accomplishes nothing
- The COERCE flag, if on, disables the OOD system
  - This forces us to output the nearest language-model character sequence, or nothing at all
- There is also a Factoid NONE, which yields an out-of-dictionary-only recognizer
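A hedged sketch of the bigram bias and OOD penalty described above: per-character probabilities from the output matrix are combined with bigram probabilities, and the whole word pays a fixed penalty relative to in-dictionary words. A greedy left-to-right pick keeps the sketch short (the real system searches the output matrix), and all numbers are invented.

```python
import math

def ood_score(char_probs, bigram_logp, ood_penalty=-5.0):
    """char_probs: one dict per ink segment, mapping characters to P(char | ink).
    bigram_logp: dict mapping (prev, cur) pairs to log P(cur follows prev)."""
    logp, prev, word = ood_penalty, None, ""
    for column in char_probs:          # one column per ink segment
        best_ch, best_lp = None, -math.inf
        for ch, p in column.items():
            lp = math.log(p)
            if prev is not None:       # bias toward likely letter pairs
                lp += bigram_logp.get((prev, ch), math.log(1e-4))
            if lp > best_lp:
                best_ch, best_lp = ch, lp
        word += best_ch
        logp += best_lp
        prev = best_ch
    return word, logp

columns = [{"a": 0.7, "o": 0.3}, {"t": 0.6, "l": 0.4}]   # invented activations
bigrams = {("a", "t"): math.log(0.05), ("a", "l"): math.log(0.02)}
print(ood_score(columns, bigrams))   # ('at', ...)
```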

Error Correction: SetTextContext()
Goal: better context usage for error correction scenarios
1. User writes "Dictionary"
2. Recognizer misrecognizes it as "Dictum"
3. User selects "um" and rewrites "ionary"
4. TIP notes the partial word selection, puts the recognizer into correction mode with left context "Dict" and right context ""
5. Beam search artificially recognizes the left context
6. Beam search runs the ink as normal
7. Beam search artificially recognizes the right context
8. This produces "ionary" in the top 10 list; the TIP must insert it to the right of "Dict"
(figure: the left context "Dict" forced through the output matrix before the user's new ink is recognized)
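A hedged sketch of the correction flow above: candidates must agree with the left and right context, and only the rewritten middle is scored against the new ink. The cost function here is a crude character-mismatch count standing in for the real recognizer scores, and the word list is invented.

```python
def correct_with_context(lexicon, left, right, rewritten_ink, align_cost):
    """Only the middle part, the piece the user rewrote, is scored against
    the new ink; candidates must start with the left context and end with
    the right context.  align_cost can be any sequence matcher."""
    scored = []
    for word in lexicon:
        if word.startswith(left) and word.endswith(right):
            middle = word[len(left):len(word) - len(right)]
            scored.append((align_cost(middle, rewritten_ink), middle, word))
    return sorted(scored)[:10]

lexicon = ["Dictionary", "Dictum", "Diction", "Dictate"]
# crude stand-in cost: character mismatches plus the length difference
cost = lambda a, b: sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))
print(correct_with_context(lexicon, "Dict", "", "ionary", cost))
# best candidate: 'ionary' from 'Dictionary'; the TIP inserts it after "Dict"
```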

Isolated Character Recognizer
- The input character is fed in via a variety of features
- A single neural network takes all the inputs
- We have also experimented with an alternate version which has a separate neural network per stroke count
(figure: the input character's features feeding one neural network)

Calligrapher
- The Russian recognition company Paragraph sold itself to SGI (Silicon Graphics, Incorporated), who then sold it to Vadem, who sold it to Microsoft
- In the purchase we obtained:
  - Calligrapher: the cursive recognizer that shipped on the first Apple Newton
  - Transcriber: a handwriting app for handheld computers
- We combined our system with Calligrapher
  - We use a voting system to combine each recognizer's top 10 list
  - They are very different, and make different mistakes
  - We get the best of both worlds
- If either recognizer outputs a single-character "word", we forget these lists and run the isolated character recognizer
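The deck does not spell out the voting scheme, so this is only an illustrative merge that sums the two recognizers' scores so that words proposed by both float to the top; the example scores are taken from the two top-10 lists shown in the architecture diagrams.

```python
def combine_top10(list_a, list_b):
    """Merge two recognizers' (word, score) lists by summing scores.
    The real voting scheme is more involved; this is only the general idea."""
    votes = {}
    for ranked in (list_a, list_b):
        for word, score in ranked:
            votes[word] = votes.get(word, 0) + score
    return sorted(votes.items(), key=lambda kv: kv[1], reverse=True)[:10]

tdnn = [("dog", 68), ("clog", 57), ("dug", 51), ("doom", 42)]
calligrapher = [("dog", 59), ("clog", 54), ("dug", 44), ("dig", 31)]
print(combine_top10(tdnn, calligrapher))   # 'dog' wins with 127
```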

HMMs (Hidden Markov Models)
- Start with a DTW, but replace the sequence of ink segments on the left with a sequence of probability histograms; this represents a set of ink samples

Calligrapher
(diagram: Calligrapher matches the ink against HMM letter models; a beam search guided by the lexicon produces its own top-10 list, e.g. dog 59, clog 54, dug 44, doom 37, dig 31, dag 29, cloy 23, clug 18, clag 14, clay 9)

Personalization
- Ink shape personalization
  - Simple concept: just do the same training on this customer's ink
    - Start with components already trained on a massive database of ink samples
    - Train further on the specific user's ink samples
    - Trains the TDNN, combiner nets, and isolated character network
  - Explicit training
    - The user must go to a wizard and copy a short script
    - Does have labels from the customer
    - Limited in quantity, because of tediousness
  - Implicit training
    - Data is collected in the background during normal use
    - Doesn't have labels from the customer
    - We must assume correctness of our recognition result, using our confidence measure
    - We get more data
  - Much of the work is in the GUI, the database support, management of different users' trained networks, etc.
- Lexicon personalization: harvesting
  - Simple concept: just add the user's new words to the lexicon
  - Examples: RTM, dev, SDET, dogfooding, KKOMO, featurization
  - Happens when correcting words in the TIP
  - Also scans Word documents and outgoing email (avoiding spam)