From Anthrax to ZIP Codes- The Handwriting is on the Wall Venu Govindaraju Dept. of Computer Science & Engineering University at Buffalo


1 From Anthrax to ZIP Codes: The Handwriting is on the Wall
Venu Govindaraju, Dept. of Computer Science & Engineering, University at Buffalo (venu@cedar.buffalo.edu)

2 Outline
- Success in Postal Application
- Role of Handwriting Recognition
- Recognition Models
- Interactive Cognitive Models
- New Research Areas
- Other Applications

3 USPS HWAI (Handwritten Address Interpretation) Background
- Postal sponsorship started in 1984
- 370 academic articles published
- Millions of letters examined
- Many experimental systems built and tested
- Migrated from hardware to software systems
- Only postal research continuously funded

4 Pattern Recognition Tasks
Items to be recognized, read, and evaluated (machine-printed and script):
- Delivery address, sender's address, endorsements
- Linear codes, mail class indicia (2D codes, meter marks)
[Figure: sample envelope with labeled regions: meter mark, sender's address, delivery address, linear code, digital postmark, endorsement ("In Case of Undeliverable as Addressed Return to Sender")]

5 Deployed
USA:
- 250 P&DC (Processing and Distribution Center) sites
- 27 Remote Encoding Centers
- 25 billion images processed annually
- 89% automated bar-coding
UK:
- 67 processing centers
- 27 million pieces per day; 9.7 million pieces per hour at peak
Australia

6 RCR (Remote Computer Reader) Overview
[Diagram components: Bar Code Sorter, Remote Encoding, Advanced Facer Canceler, Multi-Line OCR, Image, RCR]

7 At the Right Price
Processing Type   Cost per 1,000 Pieces
Manual            $47.78
Mechanized        $27.46
Automated         $5.30
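A quick way to read the cost table: the sketch below compares each processing type with automated processing and computes the dollar savings for a given volume of mail. Only the three cost figures come from the slide; the volume passed in is a hypothetical input.

```python
# Cost per 1,000 pieces, copied from the table above (US dollars).
COST_PER_1000 = {"Manual": 47.78, "Mechanized": 27.46, "Automated": 5.30}


def savings(volume: int, from_type: str, to_type: str = "Automated") -> float:
    """Dollar savings from moving `volume` pieces from one processing type to another."""
    per_piece_diff = (COST_PER_1000[from_type] - COST_PER_1000[to_type]) / 1000
    return volume * per_piece_diff


if __name__ == "__main__":
    for kind, cost in COST_PER_1000.items():
        print(f"{kind}: {cost / COST_PER_1000['Automated']:.1f}x the automated cost")
    print(f"Automating 1 million manual pieces saves ${savings(1_000_000, 'Manual'):,.0f}")
```

By this table, manual handling costs roughly nine times as much per piece as automated handling.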

8 80% encode rate and counting!

9 Impact
- Applications of CEDAR research are helping to automate tasks at the IRS and USPS
- In the first year that USPS used CEDAR-developed software to read handwritten addresses on envelopes, it saved $100 million
- 1997-1999: through deployment of CEDAR-developed RCRs, USPS saved 12 million work hours and over $340 million
- 500 scientific publications and 10 patents

11 Role of Handwriting Recognition in Address Interpretation

12 Context Provided by Postal Directories
- Create a street name lexicon
- DPF (Delivery Point File) yields 8 street names
- ZIP+4 yields 31 street names (on average about 5 times more)
Street name lexicon (8 entries):
HAWLEY         RD   1034
NEWGATE        RD   1533
BEE MOUNTAIN   RD   1615
DORMAN         RD   1642
BOWERS HILL    RD   1757
FREEMAN        RD   1781
PUNKUP         RD   1784
PARK           RD   6124
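Building the lexicon amounts to grouping directory records by ZIP Code and collecting the distinct street names. This is a minimal sketch assuming a simplified, hypothetical record of (ZIP Code, street name, suffix); the real DPF record layout is not described here, and the ZIP values in the example are placeholders.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical simplified record: (ZIP Code, street name, street suffix).
Record = Tuple[str, str, str]


def build_street_lexicons(records: Iterable[Record]) -> Dict[str, List[str]]:
    """Map each ZIP Code to the sorted list of distinct street names found in it."""
    by_zip: Dict[str, Set[str]] = defaultdict(set)
    for zip_code, street, suffix in records:
        by_zip[zip_code].add(f"{street} {suffix}")
    return {z: sorted(names) for z, names in by_zip.items()}


if __name__ == "__main__":
    rows = [("ZIP-A", "HAWLEY", "RD"), ("ZIP-A", "PARK", "RD"),
            ("ZIP-A", "HAWLEY", "RD"), ("ZIP-B", "MAIN", "ST")]
    print(build_street_lexicons(rows))
```

Once the ZIP Code on a mail piece is recognized, only that ZIP Code's list needs to be handed to the word recognizer, which is why a tighter directory (fewer names per ZIP Code) makes recognition easier.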

13 Context
- One record per delivery point in the USA
- Provided weekly by USPS, San Mateo
- Raw DPF: 138 million records, 15 GB (114 bytes per record); 41,889 ZIP Code files
- Fields of interest to HWAI: ZIP Code, street name, primary number, secondary number, add-on

14 Power of Context
[Diagram label: ZIP Code]
- 30% of ZIP Codes contain a single street name
- 5% of ZIP Codes contain a single primary number
- 2% of ZIP Codes contain a single add-on
- Maximum number of records returned is 3,071
- Maximum number of records returned is 3,070
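Percentages like these come from counting, per ZIP Code, how many distinct values each field takes. A sketch assuming simplified, hypothetical (ZIP Code, street name, primary number, add-on) tuples rather than actual DPF records:

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical simplified record: (ZIP Code, street name, primary number, add-on).
Record = Tuple[str, str, str, str]
FIELDS = ("street", "primary", "addon")


def single_value_share(records: Iterable[Record]) -> Dict[str, float]:
    """Percentage of ZIP Codes in which each field takes exactly one distinct value."""
    values: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: {f: set() for f in FIELDS})
    for zip_code, street, primary, addon in records:
        entry = values[zip_code]
        entry["street"].add(street)
        entry["primary"].add(primary)
        entry["addon"].add(addon)
    if not values:
        return {f: 0.0 for f in FIELDS}
    n = len(values)
    return {f: 100.0 * sum(len(v[f]) == 1 for v in values.values()) / n for f in FIELDS}
```

A ZIP Code with a single street name means the street can be assigned from the ZIP Code alone, without reading the street line at all.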

16 Handwriting Recognition
[Diagram labels: Context, Ranked Lexicon]

17 Multiple Choice Question
[Diagram labels: Context, Ranked Lexicon]

18 Lexicon-Driven Model
[Figure: word image over-segmented at points 1 to 9; each candidate span is labeled with a character and its match distance, e.g. w[7.6], w[7.2], w[5.0], o[6.6], r[3.8], d[4.9]]
Find the best way of accounting for the characters ‘w’, ‘o’, ‘r’, ‘d’ by consuming all segments 1 to 8 in the process.
Distance between the first character ‘w’ of the lexicon entry ‘word’ and the image between:
- segments 1 and 4 is 5.0
- segments 1 and 3 is 7.2
- segments 1 and 2 is 7.6
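The search described on this slide is naturally a dynamic program over (characters matched, segments consumed). The sketch below assumes a caller-supplied function `dist(ch, i, j)` giving the distance between character `ch` and the image spanning primitive segments i through j (inclusive), and a hypothetical cap `max_width` on how many segments one character may absorb. It illustrates the idea and is not CEDAR's implementation; the distance table in the example is made up.

```python
import math
from typing import Callable, List, Optional, Tuple

Span = Tuple[int, int]  # inclusive range of primitive segments, 1-based


def match_word(word: str, n_segments: int,
               dist: Callable[[str, int, int], float],
               max_width: int = 4) -> Tuple[float, Optional[List[Span]]]:
    """Minimum total distance for matching `word` to segments 1..n_segments.

    Each character consumes a contiguous run of 1..max_width segments and
    every segment is used exactly once.
    """
    m, INF = len(word), math.inf
    # best[k][j]: lowest cost of matching the first k characters to segments 1..j
    best = [[INF] * (n_segments + 1) for _ in range(m + 1)]
    back = [[-1] * (n_segments + 1) for _ in range(m + 1)]
    best[0][0] = 0.0
    for k in range(1, m + 1):
        ch = word[k - 1]
        for j in range(1, n_segments + 1):
            for i in range(max(0, j - max_width), j):
                if best[k - 1][i] == INF:
                    continue
                cost = best[k - 1][i] + dist(ch, i + 1, j)
                if cost < best[k][j]:
                    best[k][j], back[k][j] = cost, i
    if best[m][n_segments] == INF:
        return INF, None
    # Follow the back-pointers to recover which segments each character used.
    spans: List[Span] = []
    j = n_segments
    for k in range(m, 0, -1):
        i = back[k][j]
        spans.append((i + 1, j))
        j = i
    return best[m][n_segments], spans[::-1]


if __name__ == "__main__":
    # Toy distances (hypothetical), keyed by (char, first segment, last segment).
    table = {("a", 1, 1): 1.0, ("a", 1, 2): 2.0, ("b", 2, 3): 1.0, ("b", 3, 3): 4.0}

    def dist(ch: str, i: int, j: int) -> float:
        return table.get((ch, i, j), math.inf)

    print(match_word("ab", 3, dist))  # (2.0, [(1, 1), (2, 3)])
```

Running this once per lexicon entry and sorting by the returned distance yields the ranked lexicon.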

19 Lexicon-Free Model
[Figure: segmentation graph for the same image; each edge carries candidate characters with confidences, e.g. i[.8], l[.8], u[.5], v[.2], w[.7], w[.6], m[.3], r[.4], o[.5], d[.8]]
- Image from segment 1 to 3 is a ‘u’ with 0.5 confidence
- Image from segment 1 to 4 is a ‘w’ with 0.7 confidence
- Image from segment 1 to 5 is a ‘w’ with 0.6 confidence and an ‘m’ with 0.3 confidence
Find the best path in the graph from segment 1 to 8: w o r d
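Without a lexicon, the segmentation graph can be decoded by finding the path whose confidences have the highest product. The sketch works in log space and assumes edges are given as hypothetical `(from, to, char, confidence)` tuples with positive confidences and `from < to`, so processing points left to right is a valid topological order. The example graph uses made-up values loosely patterned on the slide.

```python
import math
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Edge = Tuple[int, int, str, float]  # (from point, to point, character, confidence)


def best_path(edges: List[Edge], start: int, end: int) -> Tuple[Optional[str], float]:
    """Highest-confidence string spelled by any path from start to end (no lexicon).

    Confidences along a path are multiplied (summed as logs); they must be positive.
    """
    out: Dict[int, List[Tuple[int, str, float]]] = defaultdict(list)
    for i, j, ch, conf in edges:
        out[i].append((j, ch, conf))
    score = {start: 0.0}  # best log-confidence found so far for each point
    back: Dict[int, Tuple[int, str]] = {}
    for i in range(start, end):  # left to right is a topological order here
        if i not in score:
            continue
        for j, ch, conf in out[i]:
            s = score[i] + math.log(conf)
            if j not in score or s > score[j]:
                score[j], back[j] = s, (i, ch)
    if end not in score:
        return None, 0.0
    chars, node = [], end
    while node != start:
        node, ch = back[node]
        chars.append(ch)
    return "".join(reversed(chars)), math.exp(score[end])


if __name__ == "__main__":
    graph = [(1, 2, "i", 0.8), (1, 3, "u", 0.5), (1, 4, "w", 0.7),
             (2, 3, "v", 0.2), (3, 4, "r", 0.4), (4, 5, "o", 0.5)]
    word, conf = best_path(graph, 1, 5)
    print(word, round(conf, 2))  # wo 0.35
```

Because nothing constrains the output to a dictionary word, this model can return strings that are not in any lexicon; that is its strength for open vocabularies and its weakness when the lexicon is small.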

20 Holistic Features
[Figure labels: slant norm, turn points, position grid and gaps, ascender, descender, reference lines]

21 Lexicon Reduction and Verification
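One common way to use holistic features for lexicon reduction is to compare coarse counts measured on the image, such as ascenders and descenders, with the counts each lexicon entry would be expected to produce, and to discard entries that disagree. The sketch below is a simplified version of that idea; the letter classes and the tolerance are assumptions, not values from the talk.

```python
from typing import List, Tuple

# Assumed letter classes for lowercase script; a real system would learn how
# each letter tends to be written instead of relying on a fixed table.
ASCENDERS = set("bdfhklt")
DESCENDERS = set("gjpqy")


def expected_profile(word: str) -> Tuple[int, int]:
    """(ascender count, descender count) expected for a lexicon entry."""
    w = word.lower()
    return sum(c in ASCENDERS for c in w), sum(c in DESCENDERS for c in w)


def reduce_lexicon(lexicon: List[str], ascenders: int, descenders: int,
                   tol: int = 1) -> List[str]:
    """Keep entries whose expected counts are within `tol` of the counts found in the image."""
    kept = []
    for word in lexicon:
        a, d = expected_profile(word)
        if abs(a - ascenders) <= tol and abs(d - descenders) <= tol:
            kept.append(word)
    return kept


if __name__ == "__main__":
    print(reduce_lexicon(["word", "hello", "apple", "mean"], ascenders=1, descenders=0, tol=0))
```

The surviving entries then go to a slower, more accurate recognizer for verification; a larger `tol` trades less reduction for a lower risk of discarding the true word.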

23 Grapheme Models

24 Structural Features
[Figure labels: BAG (block adjacency graph), junction, loops, loop, turns, end]

25 Feature Extraction and Ordering
- Critical node: a node whose removal disconnects a connected component.
- Critical nodes of degree 2 keep the feature ordering from left to right.
[Figure labels: left component, right component; loop, end, turns, junction, loops]
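A critical node in this sense is what graph theory calls an articulation point (cut vertex), and the textbook way to find all of them is a single depth-first search that tracks discovery times and low links. The sketch applies that standard algorithm to a small hypothetical graph standing in for a stroke skeleton: two loops joined by a stroke. How the skeleton graph itself is built from the image is not shown here.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Set, Tuple


def critical_nodes(edges: Iterable[Tuple[Hashable, Hashable]]
                   ) -> Tuple[Set[Hashable], Dict[Hashable, Set[Hashable]]]:
    """Articulation points of an undirected simple graph, plus its adjacency sets."""
    adj: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    disc: Dict[Hashable, int] = {}  # DFS discovery time
    low: Dict[Hashable, int] = {}   # earliest discovery time reachable from the subtree
    cut: Set[Hashable] = set()

    def dfs(u, parent):
        disc[u] = low[u] = len(disc)
        children = 0
        for v in adj[u]:
            if v == parent:
                continue
            if v in disc:  # already visited: a non-tree edge
                low[u] = min(low[u], disc[v])
            else:          # tree edge
                children += 1
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if parent is not None and low[v] >= disc[u]:
                    cut.add(u)  # v's subtree cannot reach above u without u
        if parent is None and children > 1:
            cut.add(u)  # DFS root with several independent subtrees

    for node in list(adj):
        if node not in disc:
            dfs(node, None)
    return cut, adj


if __name__ == "__main__":
    # Loop a-b-c, stroke c-d-e, loop e-f-g (hypothetical skeleton).
    skeleton = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"),
                ("d", "e"), ("e", "f"), ("f", "g"), ("g", "e")]
    cut, adj = critical_nodes(skeleton)
    print(sorted(cut), sorted(n for n in cut if len(adj[n]) == 2))
```

In this toy graph, c, d, and e are critical, and d, the only one of degree 2, is the point that separates a left component from a right one.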

26 Continuous Attributes
grapheme   position   orientation   angle
cusp       3.0        Down          -90°
loop                  Up
arc                   Down

27 Stochastic Model

28 Observations

29 Results
Lex size   Top   WMR %   SM CA %
10         1     96.86   96.56
10         2     98.80   98.77
100        1     91.36   89.12
100        2     95.30   94.06
1000       1     79.58   75.38
1000       2     88.29   86.29
20000      1     62.43   58.14
20000      2     71.07   66.49

30 Interactive Models [McClelland and Rumelhart, Psychological Review, 1981]
[Figure: three-level network of features, letters, and words; labels include the words ABLE, TRIP, TRAP and the letters A, T, N]

31 Interactive Recognition
[Figure: the interactive model takes features extracted from the image (T-crossings, loops, ascenders, descenders, length) and works against three lexicons]
Lexicon 1: West Central Street, West Main Street, Sunset Avenue
Lexicon 2: West Central Street, East Central Street, Sunset Avenue
Lexicon 3: West Central Street, West Central Avenue, Sunset Avenue

32 Adaptive Character Recognition [Park and Govindaraju, IEEE CVPR 2000]
- Adaptive selection of features
- Adaptive number of features
- Adaptive resolutions
- Adaptive sequencing of features
- Adaptive termination conditions

33 Features
- 4 gradient features
- 5 moment features
- Vector code book

34 Feature Space
- $|V| \times |N_c| \times |I_{xy}| = 2^{9} \times 10 \times 85$ (quad tree, 4 levels)
- Recognition rate and feature space size $|V|$; for GSC features, $|V| = 2^{512}$
- Tradeoffs: space vs. accuracy
- Hierarchical space with additional resolution and features as needed

35 Active Recognition Using Quad Trees
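The coarse-to-fine idea behind active recognition can be sketched as evaluating quad-tree levels one at a time and stopping as soon as the leading class is clearly ahead. Here `cell_scores` is a hypothetical function returning per-class evidence for one cell, and the `margin` rule is an assumed termination condition; this is a generic illustration, not the published algorithm. Four levels contain 1 + 4 + 16 + 64 = 85 cells, the 85 in the feature-space size.

```python
from typing import Callable, Dict, List, Optional, Tuple

Cell = Tuple[int, int, int]  # (level, row, col) of a quad-tree cell


def quadtree_cells(level: int) -> List[Cell]:
    """All cells at one quad-tree level: 1, 4, 16, 64, ... cells."""
    n = 2 ** level
    return [(level, r, c) for r in range(n) for c in range(n)]


def active_recognize(cell_scores: Callable[[Cell], Dict[str, float]],
                     max_levels: int = 4,
                     margin: float = 1.0) -> Tuple[Optional[str], int]:
    """Add finer quad-tree levels until the top class leads the runner-up by `margin`.

    Returns (best label, number of levels examined).
    """
    totals: Dict[str, float] = {}
    ranked: List[Tuple[str, float]] = []
    for level in range(max_levels):
        for cell in quadtree_cells(level):
            for label, s in cell_scores(cell).items():
                totals[label] = totals.get(label, 0.0) + s
        ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
        if ranked and (len(ranked) == 1 or ranked[0][1] - ranked[1][1] >= margin):
            return ranked[0][0], level + 1  # confident enough: stop early
    return (ranked[0][0] if ranked else None), max_levels
```

Easy characters terminate after one or two levels, so the average cost per character drops while hard characters still get the full resolution.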

36 Experimental Results

37

38 Results
Data: 25,656 training and 12,242 test samples (Postal + NIST)
Classifier     Active Model   Neural Net   KNN
Top 1          95.7%          96.4%        95.7%
Templates      612            976          3,777
Msec/char      1.4511.5384
Training hrs   1241

40 Fast Recognition
- Reuse matched characters
- Reuse matched sub-strings
- Parallel processing
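Reusing matched sub-strings can be illustrated by caching the dynamic-programming row for every prefix, so that lexicon entries such as "Memo" and "Memory" share the rows already computed for their common prefix. A minimal sketch, assuming a segment-distance function `dist(ch, i, j)` like the one a lexicon-driven matcher uses and a hypothetical `max_width` limit on segments per character:

```python
import math
from typing import Callable, Dict, List, Tuple


def make_prefix_scorer(n_segments: int,
                       dist: Callable[[str, int, int], float],
                       max_width: int = 4
                       ) -> Tuple[Callable[[str], float], Dict[str, List[float]]]:
    """Score lexicon entries while sharing work between entries with a common prefix.

    row(p)[j] is the lowest cost of matching prefix p to segments 1..j. Rows are
    cached, so "Memory" reuses the rows computed for "M", "Me", "Mem", "Memo".
    """
    INF = math.inf
    cache: Dict[str, List[float]] = {"": [0.0] + [INF] * n_segments}

    def row(prefix: str) -> List[float]:
        if prefix in cache:
            return cache[prefix]
        prev, ch = row(prefix[:-1]), prefix[-1]
        cur = [INF] * (n_segments + 1)
        for j in range(1, n_segments + 1):
            for i in range(max(0, j - max_width), j):
                if prev[i] < INF:
                    cur[j] = min(cur[j], prev[i] + dist(ch, i + 1, j))
        cache[prefix] = cur
        return cur

    def score(word: str) -> float:
        return row(word)[n_segments]

    return score, cache
```

With a constant distance, scoring "memo" and then "memory" computes 6 rows instead of 10; the saving grows with lexicons like Lexicon 2 on slide 42, where many entries share a prefix.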

41 Combination and Dynamic Selection [Govindaraju and Ianakiev, MCS 2000]
[Figure: word recognizers WR 1, WR 2, and WR 3 applied to the image with Lexicon 1; labels include "Top 50" and "Top 5"]
- Optimization problem
- Combinatorial explosion in the arrangement of recognizers and lexicon reduction levels
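A cascade in the spirit of this slide runs recognizers in sequence, with each stage passing only its top-ranked words to the next. The sketch assumes each recognizer is a function returning `(word, score)` pairs where a higher score is better; the two toy recognizers in the example treat the "image" as a plain string and are purely illustrative.

```python
from typing import Callable, List, Sequence, Tuple

# A recognizer scores candidate words for an image; a higher score means a better match.
Recognizer = Callable[[object, Sequence[str]], List[Tuple[str, float]]]


def cascade(image: object, lexicon: Sequence[str],
            stages: Sequence[Tuple[Recognizer, int]]) -> List[str]:
    """Run recognizers in order, each keeping only its top-k words for the next stage."""
    candidates = list(lexicon)
    for recognize, keep in stages:
        ranked = sorted(recognize(image, candidates), key=lambda ws: ws[1], reverse=True)
        candidates = [word for word, _ in ranked[:keep]]
    return candidates


if __name__ == "__main__":
    # Toy recognizers: a cheap length check, then a per-position character match.
    length_rec = lambda img, words: [(w, -abs(len(w) - len(img))) for w in words]
    char_rec = lambda img, words: [(w, sum(a == b for a, b in zip(w, img))) for w in words]
    words = ["word", "ward", "cord", "wordy", "sword", "w"]
    print(cascade("word", words, [(length_rec, 3), (char_rec, 1)]))
```

Putting a cheap recognizer first and a costly one last keeps the expensive stage on a short list; choosing the order and the top-k at each stage is the optimization problem the slide refers to.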

42 Lexicon Density [Govindaraju, Slavik, and Xue, IEEE PAMI 2002]
Lexicon 1   Lexicon 2
Me          Me
He          Memo
So          Memory
To          Memoirs
In          Mellon

43 Classifier Performance Prediction [Xue and Govindaraju, IEEE PAMI 2002]
- q: probability that the recognizer makes a unit-distance error
- D: average distance between any two words in the lexicon
- n: lexicon size; p: performance; a, k: model parameters
$$\ln(-\ln p) = (\ln q)\,D + a \ln\ln n + \ln k$$
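Solving the model for p makes the prediction explicit: exponentiating both sides twice gives $p = \exp\left(-k\,q^{D}(\ln n)^{a}\right)$. With 0 < q < 1, predicted performance rises as the average distance D grows, and for a > 0 it falls as the lexicon size n grows. The function below simply evaluates that expression; the parameter values in the example are made up, since fitted values of q, a, and k are not given here.

```python
import math


def predicted_performance(q: float, D: float, n: int, a: float, k: float) -> float:
    """Performance p predicted by ln(-ln p) = (ln q) D + a ln ln n + ln k.

    Exponentiating twice gives p = exp(-k * q**D * (ln n)**a).
    Assumes 0 < q < 1, k > 0, and n >= 2 so that ln ln n is defined.
    """
    if n < 2:
        raise ValueError("lexicon size n must be at least 2")
    return math.exp(-k * q ** D * math.log(n) ** a)


if __name__ == "__main__":
    # Made-up parameter values, for illustration only.
    for size in (10, 100, 1000, 20000):
        print(size, round(predicted_performance(q=0.5, D=3.0, n=size, a=1.0, k=1.0), 3))
```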

45 Bank Check Recognition

46 PCR Trend Analysis

47 NYS EMS PCR (Prehospital Care Report) Form
[Figure: example NYS PCR]
- Thousands are filed a day.
- Passed from EMS to the hospital.
- PCR purpose:
  - Medical care/diagnosis
  - Legal documentation
  - Quality assurance
EMS abbreviations:
COPD   Chronic Obstructive Pulmonary Disease
CHF    Congestive Heart Failure
D/S    Dextrose in Saline
PID    Pelvic Inflammatory Disease
GSW    Gunshot Wound
NKA    No known allergies
KVO    Keep vein open
NaCl   Sodium Chloride
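Since these abbreviations recur in recognized PCR text, a lookup table is a natural first step toward readable output. A minimal sketch that expands the abbreviations listed on the slide; the tokenization rule (runs of letters and slashes) is an assumption chosen so that "D/S" is looked up as one token.

```python
import re

# Expansion table copied from the slide's abbreviation list.
EMS_ABBREVIATIONS = {
    "COPD": "Chronic Obstructive Pulmonary Disease",
    "CHF": "Congestive Heart Failure",
    "D/S": "Dextrose in Saline",
    "PID": "Pelvic Inflammatory Disease",
    "GSW": "Gunshot Wound",
    "NKA": "No known allergies",
    "KVO": "Keep vein open",
    "NaCl": "Sodium Chloride",
}

# Letters and slashes form one token, so "D/S" is treated as a unit.
_TOKEN = re.compile(r"[A-Za-z/]+")


def expand_abbreviations(text: str) -> str:
    """Replace each known abbreviation token with its expansion; keep other tokens as-is."""
    return _TOKEN.sub(lambda m: EMS_ABBREVIATIONS.get(m.group(0), m.group(0)), text)


if __name__ == "__main__":
    print(expand_abbreviations("hx COPD, CHF; NKA"))
```

Matching is case-sensitive on purpose, so an ordinary lowercase word is never mistaken for an abbreviation.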

48 Medical Text Recognition and Data Mining

49 Reading Census Forms
Lexicon anomalies:
- Space: “sales man” and “salesman”
- Morphology / abbreviation: “acct manager” and “account management”
- Plural: “school” and “schools”
- Typographical: “managar” and “manager”
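Each anomaly type suggests a normalization step before an entry is compared with the lexicon: remove spaces, expand abbreviations, strip plurals, and tolerate small typographical differences. The sketch below chains crude versions of these steps using Python's `difflib`; the abbreviation table, the plural rule, the 0.8 cutoff, and the example lexicon are all assumptions for illustration.

```python
import difflib
from typing import List, Optional

# Hypothetical abbreviation table; a real system would need a much larger one.
ABBREVIATIONS = {"acct": "account", "mgr": "manager"}


def normalize(entry: str) -> str:
    """Lower-case, expand abbreviations, and crudely strip a plural 's' from the last word."""
    tokens = [ABBREVIATIONS.get(t, t) for t in entry.lower().split()]
    if tokens and tokens[-1].endswith("s") and len(tokens[-1]) > 3:
        tokens[-1] = tokens[-1][:-1]
    return " ".join(tokens)


def match_entry(entry: str, lexicon: List[str], cutoff: float = 0.8) -> Optional[str]:
    """Closest lexicon word after normalization, comparing strings with spaces removed."""
    keyed = {normalize(w).replace(" ", ""): w for w in lexicon}
    query = normalize(entry).replace(" ", "")
    hits = difflib.get_close_matches(query, list(keyed), n=1, cutoff=cutoff)
    return keyed[hits[0]] if hits else None


if __name__ == "__main__":
    lexicon = ["salesman", "account manager", "school", "manager", "teacher"]
    for raw in ("sales man", "acct manager", "schools", "managar"):
        print(raw, "->", match_entry(raw, lexicon))
```

The morphology case ("management" vs. "manager") is only partly covered here, through the fuzzy match; a stemmer or morphological analyzer would handle it more reliably.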

50 Binarization
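The slide does not say which binarization method is used. As a point of reference, the sketch below implements Otsu's method, a standard global-threshold technique, on 8-bit grayscale values given as a flat list; it is not necessarily the approach used for these documents.

```python
from typing import List


def otsu_threshold(pixels: List[int]) -> int:
    """Gray level t (0-255) that maximizes between-class variance when splitting at <= t."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0, sum0 = 0, 0.0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w0 += hist[t]          # pixels at or below t (dark class)
        sum0 += t * hist[t]
        w1 = total - w0        # pixels above t (light class)
        if w0 == 0:
            continue
        if w1 == 0:
            break
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2  # proportional to between-class variance
        if var > best_var:
            best_t, best_var = t, var
    return best_t


def binarize(pixels: List[int]) -> List[int]:
    """1 for ink (pixels at or below the Otsu threshold), 0 for background."""
    t = otsu_threshold(pixels)
    return [1 if p <= t else 0 for p in pixels]
```

A single global threshold struggles with uneven backgrounds such as stained historic pages, which is where locally adaptive thresholds are usually preferred.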

51 Historic Manuscripts

52 Summary
- Handwriting recognition technology
- Pattern recognition task
- Lexicon holds domain-specific knowledge
- Adaptive methods
- Classifier combination methods
- Many applications

