From Anthrax to ZIP Codes - The Handwriting is on the Wall
Venu Govindaraju
Dept. of Computer Science & Engineering, University at Buffalo
venu@cedar.buffalo.edu
Outline: Success in Postal Application, Role of Handwriting Recognition, Recognition Models, Interactive Cognitive Models, New Research Areas, Other Applications
USPS HWAI Background
- Postal sponsorship started in 1984
- 370 academic articles published
- Millions of letters examined
- Many experimental systems built and tested
- Migrated from hardware to software systems
- Only postal research continuously funded
Pattern Recognition Tasks
Items to be recognized, read, and evaluated (machine printed and script):
- Delivery address, sender's address, endorsements
- Linear codes, mail class indicia (2D codes, meter marks)
(Envelope figure labels: meter mark, sender's address, delivery address, linear code, digital post mark, endorsement "In Case of Undeliverable as Addressed Return to Sender")
Deployed
- USA: 250 P&DC sites, 27 Remote Encoding Centers, 25 billion images processed annually, 89% automated bar-coding
- UK: 67 processing centers, 27 million pieces per day, 9.7 million pieces per hour at peak
- Australia
RCR Overview
(System diagram: the mail image captured by the Advanced Facer Canceler passes through the Multi-Line OCR to the RCR; unresolved pieces go to Remote Encoding, and results drive the Bar Code Sorter.)
At the Right Price
Processing Type    Cost/1,000 Pieces
Manual             $47.78
Mechanized         $27.46
Automated          $5.30
80% encode rate and counting!
Impact
- Applications of CEDAR research are helping to automate tasks at the IRS and USPS.
- In the first year that USPS used CEDAR-developed software to read handwritten addresses on envelopes, it saved $100 million.
- The 1997-1999 USPS deployment of CEDAR-developed RCRs saved 12 million work hours and over $340 million.
- 500 scientific publications and 10 patents.
Outline: Success in Postal Application, Role of Handwriting Recognition, Recognition Models, Interactive Cognitive Models, New Research Areas, Other Applications
Role of Handwriting Recognition in Address Interpretation
Context Provided by Postal Directories
Create street name lexicon:
- DPF yields 8 street names
- ZIP+4 yields 31 street names (on average about 5 times more)
(Example DPF records: HAWLEY RD 1034, NEWGATE RD 1533, BEE MOUNTAIN RD 1615, DORMAN RD 1642, BOWERS HILL RD 1757, FREEMAN RD 1781, PUNKUP RD 1784, PARK RD 6124)
Context
- One record per delivery point in the USA
- Provided weekly by USPS, San Mateo
- Raw DPF: 138 million records, 15 GB (114 bytes per record); 41,889 ZIP Code files
- Fields of interest to HWAI: ZIP Code, street name, primary number, secondary number, add-on
Power of Context
ZIP Code:
- 30% of ZIP Codes contain a single street name
- 5% of ZIP Codes contain a single primary number
- 2% of ZIP Codes contain a single add-on
- Maximum number of records returned is 3,071
- Maximum number of records returned is 3,070
Outline: Success in Postal Application, Role of Handwriting Recognition, Recognition Models, Interactive Cognitive Models, New Research Areas, Other Applications
Handwriting Recognition (diagram): image and context-derived lexicon in, ranked lexicon out.
Multiple Choice Question (diagram): recognition treated as a multiple choice question; the context supplies the choices and the recognizer returns them ranked.
Lexicon Driven Model
(Figure: segmentation lattice over segmentation points 1-9 with character-to-segment match distances such as w[7.6], w[7.2], w[5.0], o[7.6], r[6.3], d[4.9].)
Find the best way of accounting for the characters 'w', 'o', 'r', 'd' by consuming all segments 1 to 8 in the process.
Distance between the first character 'w' of lexicon entry 'word' and the image between:
- segments 1 and 4 is 5.0
- segments 1 and 3 is 7.2
- segments 1 and 2 is 7.6
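A minimal dynamic-programming sketch of this lexicon-driven matching, assuming a precomputed table of character-versus-segment-range distances; a few entries echo the slide's illustrative values and the rest are made up so that a complete path exists. It is not the deployed matcher, just the recursion the slide describes: consume all segments while accounting for every character of the lexicon entry.

```python
from functools import lru_cache

# Hypothetical distances: DIST[(char, s, e)] is the cost of matching `char`
# against the image between segmentation points s and e (points 0..8 bound
# the slide's segments 1..8). Some values echo the slide; others are invented.
DIST = {
    ("w", 0, 2): 7.6, ("w", 0, 3): 7.2, ("w", 0, 4): 5.0,
    ("o", 4, 5): 6.0, ("o", 4, 6): 6.6, ("o", 3, 5): 7.8,
    ("r", 5, 6): 6.4, ("r", 6, 7): 3.8, ("r", 5, 7): 7.5,
    ("d", 6, 8): 4.9, ("d", 7, 8): 4.4, ("d", 5, 8): 6.5,
}

def match(word, n_points, dist):
    """Minimum total distance for `word` to account for all segments 0..n_points."""
    INF = float("inf")

    @lru_cache(maxsize=None)
    def best(i, s):
        # Best cost of matching word[i:] to the image from point s to the end.
        if i == len(word):
            return 0.0 if s == n_points else INF
        return min(
            (dist.get((word[i], s, e), INF) + best(i + 1, e)
             for e in range(s + 1, n_points + 1)),
            default=INF,
        )

    return best(0, 0)

print(match("word", 8, DIST))   # about 19.8 with these made-up distances
```

The same recursion can then be run for every lexicon entry, ranking entries by their best cost.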
Lexicon Free Model
(Figure: character-candidate graph over the segments with confidences such as i[.8], l[.8], u[.5], v[.2], w[.7], w[.6], m[.3], o[.5], r[.4], d[.8].)
- Image from segments 1 to 3 is a 'u' with 0.5 confidence
- Image from segments 1 to 4 is a 'w' with 0.7 confidence
- Image from segments 1 to 5 is a 'w' with 0.6 confidence and an 'm' with 0.3 confidence
Find the best path in the graph from segment 1 to 8 (here spelling 'w', 'o', 'r', 'd').
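A companion sketch for the lexicon-free case, assuming the recognizer has already emitted a graph of (start, end, character, confidence) hypotheses; the edge list below is illustrative, not the slide's actual data. The decoder returns the highest-scoring character sequence that spans all segments, scoring a path by the product of its confidences (a real system would normally also normalize for path length).

```python
from functools import lru_cache

# Hypothetical character hypotheses: (start, end, char, confidence),
# with boundaries 0..8 standing in for the slide's segments 1..8.
EDGES = [
    (0, 3, "u", 0.5), (0, 4, "w", 0.7), (0, 5, "w", 0.3), (0, 5, "m", 0.2),
    (3, 5, "o", 0.4), (4, 5, "o", 0.6), (4, 6, "a", 0.2),
    (5, 6, "r", 0.5), (5, 7, "n", 0.3),
    (6, 8, "d", 0.8), (7, 8, "d", 0.4),
]

def best_path(edges, n):
    """Return (score, string) for the best path from boundary 0 to n."""
    out = {}
    for s, e, ch, conf in edges:
        out.setdefault(s, []).append((e, ch, conf))

    @lru_cache(maxsize=None)
    def best(s):
        if s == n:
            return (1.0, "")
        result = (0.0, None)
        for e, ch, conf in out.get(s, []):
            sub_score, sub_str = best(e)
            if sub_str is not None and conf * sub_score > result[0]:
                result = (conf * sub_score, ch + sub_str)
        return result

    return best(0)

print(best_path(EDGES, 8))   # roughly (0.168, 'word') with these made-up confidences
```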
Holistic Features (diagram): slant normalization, turn points, position grid and gaps, ascenders, descenders, reference lines.
Lexicon Reduction and Verification
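Lexicon reduction with holistic features can be sketched as a simple shape-profile filter. The profile extractor below is a stand-in that works on ASCII spellings (counting letters with ascenders and descenders); in the real system the length estimate and ascender/descender counts come from the word image, and the lexicon and tolerances here are illustrative.

```python
ASCENDERS = set("bdfhklt")
DESCENDERS = set("gjpqy")

def shape_profile(word):
    """(length, ascender count, descender count) estimated from the spelling.
    A real system estimates the same profile from the word image."""
    w = word.lower()
    return (len(w), sum(c in ASCENDERS for c in w), sum(c in DESCENDERS for c in w))

def reduce_lexicon(lexicon, observed, tol=(1, 1, 1)):
    """Keep entries whose profile is within `tol` of the observed profile."""
    return [
        entry for entry in lexicon
        if all(abs(p - o) <= t for p, o, t in zip(shape_profile(entry), observed, tol))
    ]

lexicon = ["Hawley", "Newgate", "Dorman", "Freeman", "Punkup", "Park"]
# Suppose the image looks roughly 6 characters long with 2 ascenders, 1 descender:
print(reduce_lexicon(lexicon, observed=(6, 2, 1)))   # "Park" is pruned here
```

The surviving entries would then go on to the full recognizer for verification.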
Outline: Success in Postal Application, Role of Handwriting Recognition, Recognition Models, Interactive Cognitive Models, New Research Areas, Other Applications
Grapheme Models
Structural Features (diagram): BAG with junctions, loops, turns, and end points.
Feature Extraction and Ordering
- Critical node: removal disconnects a connected component.
- 2-degree critical nodes keep feature ordering from left to right.
(Figure: left and right components labeled with loops, turns, junctions, and end points.)
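The critical-node idea can be illustrated with a small, self-contained check: a node is critical if removing it disconnects its component, and the 2-degree critical nodes are the ones that preserve a left-to-right ordering of features. The tiny graph below is invented; real graphs come from the character skeleton.

```python
from collections import defaultdict

def is_connected(nodes, edges, removed=None):
    """True if `nodes` minus `removed` form a single connected component."""
    nodes = set(nodes) - {removed}
    adj = defaultdict(set)
    for a, b in edges:
        if a in nodes and b in nodes:
            adj[a].add(b)
            adj[b].add(a)
    stack, seen = [next(iter(nodes))], set()
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(adj[n] - seen)
    return seen == nodes

def critical_nodes(nodes, edges):
    return [n for n in nodes if not is_connected(nodes, edges, removed=n)]

# Invented skeleton graph: a loop (1-2-3) with a tail (3-4-5).
nodes = [1, 2, 3, 4, 5]
edges = [(1, 2), (2, 3), (3, 1), (3, 4), (4, 5)]
degree = {n: sum(n in e for e in edges) for n in nodes}
crit = critical_nodes(nodes, edges)
print(crit)                                   # nodes 3 and 4 are critical
print([n for n in crit if degree[n] == 2])    # 2-degree critical nodes: [4]
```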
Continuous Attributes
(Table: each grapheme carries position, orientation, and angle attributes; e.g., a down cusp at position 3.0 with angle -90 degrees. Other graphemes shown: up loop, down arc.)
Stochastic Model
Observations
Results
Lex size   Top   WMR %   SM CA %
10         1     96.86   96.56
           2     98.80   98.77
100        1     91.36   89.12
           2     95.30   94.06
1000       1     79.58   75.38
           2     88.29   86.29
20000      1     62.43   58.14
           2     71.07   66.49
Interactive Models [McClelland and Rumelhart, Psychological Review, 1981]
(Diagram: three interacting levels; words such as ABLE, TRIP, TRAP; letters such as A, T, N; and features.)
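The interactive-activation idea can be sketched with a toy two-level network (letters and words only; the original model also has a feature level, top-down feedback, and specific parameter values, none of which are reproduced here). Made-up letter evidence for an ambiguous input is pushed up to the word level, where words compete through lateral inhibition.

```python
# Toy sketch of the interactive-activation idea with only two levels
# (letters -> words); parameters and the reduced architecture are
# illustrative, not the values from McClelland and Rumelhart (1981).
WORDS = ["ABLE", "TRIP", "TRAP"]

def step(word_act, letter_evidence, excite=0.05, inhibit=0.1, decay=0.1):
    """One update: words are excited by matching letter evidence, inhibited by
    competing words, and decay toward rest; activations stay in [-0.2, 1.0]."""
    new_act = {}
    total = sum(max(a, 0.0) for a in word_act.values())
    for w, a in word_act.items():
        support = sum(letter_evidence[i].get(ch, 0.0) for i, ch in enumerate(w))
        competition = total - max(a, 0.0)
        net = excite * support - inhibit * competition - decay * a
        new_act[w] = min(1.0, max(-0.2, a + net))
    return new_act

# Ambiguous letter evidence for a 4-letter input (position -> letter support):
evidence = [{"T": 0.9, "A": 0.1}, {"R": 0.8}, {"A": 0.6, "I": 0.4}, {"P": 0.9, "E": 0.1}]
acts = {w: 0.0 for w in WORDS}
for _ in range(10):
    acts = step(acts, evidence)
print(max(acts, key=acts.get))   # settles on "TRAP" with this evidence
```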
Interactive Recognition
Holistic features (t-crossings, loops, ascenders, descenders, length) extracted from the image drive the interactive model over candidate lexicons:
- Lexicon 1: West Central Street, West Main Street, Sunset Avenue
- Lexicon 2: West Central Street, East Central Street, Sunset Avenue
- Lexicon 3: West Central Street, West Central Avenue, Sunset Avenue
Adaptive Character Recognition [Park and Govindaraju, IEEE CVPR 2000]
- Adaptive selection of features
- Adaptive number of features
- Adaptive resolutions
- Adaptive sequencing of features
- Adaptive termination conditions
Features: 4 gradient features, 5 moment features, vector code book
Feature Space
- |V| x |N_c| x |I_xy| = 2^9 x 10 x 85 (quad tree, 4 levels)
- Recognition rate vs. code book size |V|
- GSC: |V| = 2^512
- Tradeoffs: space vs. accuracy
- Hierarchical space with additional resolution and features as needed
Active Recognition Using Quad Trees
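A hedged sketch of the active, coarse-to-fine idea: extract features from a quad-tree decomposition of the image, classify, and descend to a finer level only while the decision is not yet confident. The per-quadrant mean-intensity feature and the nearest-prototype classifier are placeholders for the gradient/moment features and code book on the previous slides; the margin threshold and data are invented.

```python
import numpy as np

def quad_features(img, level):
    """Mean intensity of each cell in a 2^level x 2^level quad-tree grid."""
    n = 2 ** level
    h, w = img.shape
    return np.array([
        img[i * h // n:(i + 1) * h // n, j * w // n:(j + 1) * w // n].mean()
        for i in range(n) for j in range(n)
    ])

def active_classify(img, prototypes, max_level=3, margin=1.0):
    """Nearest-prototype classification, refining the quad tree only while the
    gap between best and second-best class is below `margin`
    (an adaptive termination condition)."""
    for level in range(1, max_level + 1):
        feats = quad_features(img, level)
        dists = sorted(
            (float(np.linalg.norm(feats - quad_features(proto, level))), label)
            for label, proto in prototypes.items()
        )
        if len(dists) < 2 or dists[1][0] - dists[0][0] >= margin:
            break   # confident enough: stop adding resolution
    return dists[0][1], level

rng = np.random.default_rng(0)
prototypes = {digit: rng.random((16, 16)) for digit in "0123456789"}
test = prototypes["3"] + 0.05 * rng.random((16, 16))
print(active_classify(test, prototypes))   # label "3", at whatever level sufficed
```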
Experimental Results
Results
Classifier      Active Model   Neural Net   KNN
Top 1 %         95.7%          96.4%        95.7%
Templates       612            976          3,777
Msec/char       1.45           11.5         384
Training hrs    1              24           1
(25,656 training and 12,242 test samples, Postal + NIST)
Outline: Success in Postal Application, Role of Handwriting Recognition, Recognition Models, Interactive Cognitive Models, New Research Areas, Other Applications
Fast Recognition
- Reuse matched characters
- Reuse matched sub-strings
- Parallel processing
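One way to picture "reuse matched characters" is a shared cache of character-versus-segment scores: the dynamic program from the lexicon-driven sketch earlier is run once per lexicon entry, but every (character, segment range) score is computed only once and reused across entries with common letters and prefixes. char_score() below is a placeholder for the real character recognizer.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def char_score(ch, start, end):
    # Placeholder for the (expensive) character recognizer: cost of matching
    # `ch` to the image between segmentation points `start` and `end`.
    return (ord(ch) * 7 + start * 3 + end) % 100 / 10.0

def word_score(word, n_points):
    """Best segmentation cost for `word` over points 0..n_points
    (same dynamic program as in the lexicon-driven sketch)."""
    INF = float("inf")

    @lru_cache(maxsize=None)
    def best(i, s):
        if i == len(word):
            return 0.0 if s == n_points else INF
        return min(
            (char_score(word[i], s, e) + best(i + 1, e)
             for e in range(s + 1, n_points + 1)),
            default=INF,
        )

    return best(0, 0)

lexicon = ["MEMO", "MEMORY", "MELLON"]
print({w: word_score(w, 8) for w in lexicon})
print("character scores reused:", char_score.cache_info().hits)
```

Reusing matched sub-strings takes this a step further by sharing whole prefix computations (for example, via a trie over the lexicon), and scoring the lexicon entries is naturally parallelizable.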
Combination and Dynamic Selection [Govindaraju and Ianakiev, MCS 2000]
(Diagram: word recognizers WR1, WR2, WR3 applied in sequence to the image, each stage reducing the lexicon, e.g., from Lexicon 1 to the top 50 and then to the top 5.)
Optimization problem: combinatorial explosion in the arrangement of recognizers and lexicon reduction levels.
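A sketch of the cascade arrangement in the diagram: each stage ranks the surviving lexicon and hands only its top choices to the next stage. The recognizers here are random stand-ins; the 50 and 5 cutoffs follow the figure, the final top-1 is assumed, and finding the best arrangement of recognizers and cutoffs is the optimization problem the slide mentions.

```python
import random

def make_recognizer(seed):
    """Stand-in word recognizer: ranks a lexicon by fake confidence scores."""
    rng = random.Random(seed)
    def recognize(image, lexicon):
        return sorted(lexicon, key=lambda w: rng.random(), reverse=True)
    return recognize

def cascade(image, lexicon, recognizers, cutoffs):
    """Run recognizers in sequence, shrinking the lexicon after each stage."""
    candidates = list(lexicon)
    for recognize, k in zip(recognizers, cutoffs):
        candidates = recognize(image, candidates)[:k]
    return candidates

lexicon = [f"STREET_{i}" for i in range(1000)]
wr1, wr2, wr3 = (make_recognizer(s) for s in (1, 2, 3))
print(cascade(image=None, lexicon=lexicon,
              recognizers=[wr1, wr2, wr3], cutoffs=[50, 5, 1]))
```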
Lexicon Density [Govindaraju, Slavik, and Xue, IEEE PAMI 2002]
- Lexicon 1: Me, He, So, To, In
- Lexicon 2: Me, Memo, Memory, Memoirs, Mellon
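The paper defines lexicon density relative to the recognizer's distance measure; as a crude, recognizer-independent stand-in, the sketch below uses average normalized edit distance between entries, where a lower value means a denser (harder) lexicon. With this proxy, Lexicon 2 above comes out denser than Lexicon 1.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def density_proxy(lexicon):
    """Average pairwise edit distance normalized by word length
    (smaller value = denser lexicon). Not the PAMI 2002 definition."""
    words = [w.lower() for w in lexicon]
    pairs = [(a, b) for i, a in enumerate(words) for b in words[i + 1:]]
    return sum(edit_distance(a, b) / max(len(a), len(b)) for a, b in pairs) / len(pairs)

print(density_proxy(["Me", "He", "So", "To", "In"]))                  # larger average distance
print(density_proxy(["Me", "Memo", "Memory", "Memoirs", "Mellon"]))   # smaller: denser lexicon
```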
Classifier Performance Prediction [Xue and Govindaraju, IEEE PAMI 2002]
q: probability that the recognizer makes a unit-distance error
D: average distance between any two words in the lexicon
n: lexicon size; p: performance; a, k: model parameters
ln(-ln p) = (ln q) D + a ln ln n + ln k
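Rearranging the formula gives p directly: p = exp(-k * q^D * (ln n)^a). The parameter values in the sketch below are invented purely to show the predicted trend of performance falling as the lexicon grows; they are not fitted values from the paper.

```python
import math

def predicted_performance(q, D, n, a, k):
    """p = exp(-k * q**D * (ln n)**a), i.e. ln(-ln p) = (ln q)*D + a*ln(ln n) + ln k."""
    return math.exp(-k * (q ** D) * math.log(n) ** a)

# Illustrative parameter values (not fitted values from the paper):
q, D, a, k = 0.3, 4.0, 1.5, 0.8
for n in (10, 100, 1000, 20000):
    print(n, round(predicted_performance(q, D, n, a, k), 3))
```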
Outline: Success in Postal Application, Role of Handwriting Recognition, Recognition Models, Interactive Cognitive Models, New Research Areas, Other Applications
Bank Check Recognition
PCR Trend Analysis
NYS EMS PCR Form
(Figure: NYS PCR example.)
- Thousands are filed each day, passed from EMS to the hospital.
- PCR purpose: medical care/diagnosis, legal documentation, quality assurance
EMS Abbreviations:
- COPD: Chronic Obstructive Pulmonary Disease
- CHF: Congestive Heart Failure
- D/S: Dextrose in Saline
- PID: Pelvic Inflammatory Disease
- GSW: Gunshot Wound
- NKA: No Known Allergies
- KVO: Keep Vein Open
- NaCl: Sodium Chloride
Medical Text Recognition and Data Mining
Reading Census Forms
Lexicon anomalies:
- Space: "sales man" and "salesman"
- Morphology: "acct manager" and "account management"
- Abbreviation
- Plural: "school" and "schools"
- Typographical: "managar" and "manager"
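A small sketch of how some of these anomalies might be normalized away when matching a recognized response against the lexicon: collapse spaces and case, strip a trailing plural 's', and fall back to fuzzy matching for typographical errors. The rules, lexicon, and similarity cutoff are illustrative; the abbreviation and morphology cases would need richer resources such as an abbreviation table.

```python
import difflib

def normalize(s):
    """Collapse spaces and case; strip a trailing plural 's'."""
    s = "".join(s.lower().split())                            # "sales man" -> "salesman"
    return s[:-1] if s.endswith("s") and len(s) > 3 else s    # "schools" -> "school"

def match(response, lexicon, cutoff=0.8):
    norm = normalize(response)
    candidates = {normalize(entry): entry for entry in lexicon}
    if norm in candidates:
        return candidates[norm]
    # Typographical anomalies ("managar" vs "manager"): fuzzy fallback.
    close = difflib.get_close_matches(norm, list(candidates), n=1, cutoff=cutoff)
    return candidates[close[0]] if close else None

lexicon = ["salesman", "manager", "school teacher"]
for response in ("sales man", "managar", "schools teacher"):
    print(response, "->", match(response, lexicon))
```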
Binarization
Historic Manuscripts
Summary
- Handwriting recognition technology
- Pattern recognition task
- Lexicon holds domain-specific knowledge
- Adaptive methods
- Classifier combination methods
- Many applications