From Anthrax to ZIP Codes: The Handwriting is on the Wall
Venu Govindaraju, Dept. of Computer Science & Engineering, University at Buffalo

Presentation transcript:

Outline
- Success in Postal Application
- Role of Handwriting Recognition
- Recognition Models
- Interactive Cognitive Models
- New Research Areas
- Other Applications

USPS HWAI Background
- Postal Sponsorship Started
- Academic Articles Published
- Millions of Letters Examined
- Many Experimental Systems Built and Tested
- Migrated from Hardware to Software System
- Only Postal Research Continuously Funded

Pattern Recognition Tasks
Items to be Recognized, Read, and Evaluated (Machine-printed and Script)
- Delivery address, sender's address, endorsements
- Linear Codes, Mail Class Indicia (2D-Codes, Meter Marks)
Labeled regions on the sample mail piece: Meter Mark, Sender's Address, Delivery Address, Linear Code, Digital Post Mark, Endorsement ("In Case of Undeliverable as Addressed Return to Sender")

Deployed
USA
- 250 P&DC sites
- 27 Remote Encoding Centers
- 25 Billion Images Processed Annually
- 89% Automated Bar-coding
UK
- 67 Processing Centers
- 27 Million Pieces Per Day, 9.7 Million Pieces Per Hour Peak
Australia

RCR Overview
Diagram components: Bar Code Sorter, Remote Encoding, Advanced Facer Canceler, Multi-Line OCR, Image, RCR

At the Right Price
Processing Type    Cost/1000 Pieces
Manual             $47.78
Mechanized         $27.46
Automated          $5.30

80% encode rate and counting!

Impact
- Applications of CEDAR research helping to automate tasks at IRS and USPS
- 1st year that USPS used CEDAR-developed software to read handwritten addresses on envelopes: saved $100 million
- USPS deployment of CEDAR-developed RCRs: USPS saved 12 million work hours and over $340 million
- 500 scientific publications and 10 patents

Role of Handwriting Recognition in Address Interpretation

Context Provided by Postal Directories
- Create street name lexicon
- DPF yields 8 street names
- ZIP+4 yields 31 street names (on average about 5 times more)
HAWLEY RD        1034
NEWGATE RD       1533
BEE MOUNTAIN RD  1615
DORMAN RD        1642
BOWERS HILL RD   1757
FREEMAN RD       1781
PUNKUP RD        1784
PARK RD          6124

Context
- One record per delivery point in USA
- Provided weekly by USPS, San Mateo
- Raw DPF: 138 million records, 15 GB (114 bytes per record); 41,889 ZIP Code files
- Fields of interest to HWAI: ZIP Code, street name, primary number, secondary number, add-on
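A minimal sketch of how a street-name lexicon could be pulled from delivery-point records once the ZIP Code and primary number have been recognized. The record layout, the sample values, and the function names `build_index` and `street_lexicon` are all assumptions made for illustration; the slides do not describe the actual DPF format or the HWAI lookup code.

```python
from collections import defaultdict

# Hypothetical delivery-point records: (zip_code, street_name, primary_number).
# These values are invented; the real DPF layout is not given in the slides.
RECORDS = [
    ("14228", "HAWLEY RD", "12"),
    ("14228", "NEWGATE RD", "12"),
    ("14228", "PARK RD", "40"),
    ("14260", "PARK RD", "12"),
]


def build_index(records):
    """Map (zip_code, primary_number) to a sorted list of street names."""
    index = defaultdict(set)
    for zip_code, street, number in records:
        index[(zip_code, number)].add(street)
    return {key: sorted(streets) for key, streets in index.items()}


def street_lexicon(index, zip_code, primary_number):
    """Return the candidate street names for a recognized ZIP Code and number."""
    return index.get((zip_code, primary_number), [])


INDEX = build_index(RECORDS)
print(street_lexicon(INDEX, "14228", "12"))
```

The point of the index is that the word recognizer only has to choose among the handful of street names returned for the recognized context, rather than among every street in the country.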

Power of Context
ZIP Code
- 30% of ZIP Codes contain a single street name
- 5% of ZIP Codes contain a single primary number
- 2% of ZIP Codes contain a single add-on
- Maximum number of records returned is 3,071
- Maximum number of records returned is 3,070

Handwriting Recognition Context Ranked Lexicon

Multiple Choice Question Context Ranked Lexicon

Lexicon Driven Model
Find the best way of accounting for the characters ‘w’, ‘o’, ‘r’, ‘d’ by consuming all segments 1 to 8 in the process.
Distance between the first character ‘w’ of lexicon entry ‘word’ and the image between:
- segments 1 and 4 is
- segments 1 and 3 is
- segments 1 and 2 is 7.6
Edge labels in the figure (character[distance]): w[7.6] w[7.2] r[3.8] w[5.0] w[8.6] o[7.6] r[6.3] d[4.9] w[5.0] o[6.6] o[6.0] o[7.2] o[10.6] d[6.5] d[4.4] r[7.5] r[6.4] o[7.8] r[8.6] o[8.7] r[7.4] r[7.6] o[8.3] o[7.7] r[5.8] o[6.1]
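The matching described on this slide can be sketched as a small dynamic program. This is an illustrative reconstruction, not the deployed recognizer: it assumes each character covers between one and `max_span` consecutive segments, that segments are numbered from 1 and ranges are inclusive, and that a character-to-image distance function is available. The function name `match_word`, the `max_span` limit, and the toy distance table are hypothetical.

```python
import math


def match_word(word, n_segments, dist, max_span=4):
    """Match `word` against segments 1..n_segments with minimum total distance.

    dist(ch, start, end) is the distance between character `ch` and the image
    made of segments start..end (1-based, inclusive).
    Returns (total_distance, [(char, start, end), ...]).
    """
    n_chars = len(word)
    # best[k][j]: cheapest way to explain the first k characters
    # using exactly the first j segments.
    best = [[math.inf] * (n_segments + 1) for _ in range(n_chars + 1)]
    back = [[None] * (n_segments + 1) for _ in range(n_chars + 1)]
    best[0][0] = 0.0
    for k, ch in enumerate(word, start=1):
        for j in range(1, n_segments + 1):
            for span in range(1, min(max_span, j) + 1):
                i = j - span  # segments consumed before this character
                if best[k - 1][i] == math.inf:
                    continue
                cost = best[k - 1][i] + dist(ch, i + 1, j)
                if cost < best[k][j]:
                    best[k][j] = cost
                    back[k][j] = i
    if best[n_chars][n_segments] == math.inf:
        return math.inf, []
    # Walk the back-pointers to recover which segments each character used.
    alignment, j = [], n_segments
    for k in range(n_chars, 0, -1):
        i = back[k][j]
        alignment.append((word[k - 1], i + 1, j))
        j = i
    alignment.reverse()
    return best[n_chars][n_segments], alignment


# Hypothetical distances: small for one "correct" segmentation, 9.0 otherwise.
TOY = {("w", 1, 3): 5.0, ("o", 4, 5): 6.0, ("r", 6, 6): 3.5, ("d", 7, 8): 4.5}


def toy_dist(ch, start, end):
    return TOY.get((ch, start, end), 9.0)


total, alignment = match_word("word", 8, toy_dist)
print(total, alignment)
```

Running the same function over every entry in the lexicon and sorting by total distance would give a ranked lexicon.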

Lexicon Free Model
- Image from segment 1 to 3 is a ‘u’ with 0.5 confidence
- Image from segment 1 to 4 is a ‘w’ with 0.7 confidence
- Image from segment 1 to 5 is a ‘w’ with 0.6 confidence and an ‘m’ with 0.3 confidence
Find the best path in the graph from segment 1 to 8: w o r d
Edge labels in the figure (character[confidence]): i[.8], l[.8] u[.5], v[.2] w[.6], m[.3] w[.7] i[.7] u[.3] m[.2] m[.1] r[.4] d[.8] o[.5]
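The best-path search on this slide can be illustrated with a short sketch over a segmentation graph. The assumptions here are mine: nodes are cut points 1 to 8, an edge (i, j) carries the character hypotheses for the image between cut points i and j, and a path is scored by the product of its confidences. The edge set below is invented and only loosely modeled on the slide's labels; `best_path` is a hypothetical name.

```python
import math


def best_path(edges, start, goal):
    """Return (text, score) for the highest-scoring path from start to goal.

    edges maps (i, j) with i < j to a list of (char, confidence) hypotheses.
    The score of a path is the product of the confidences along it.
    """
    best = {start: (0.0, "")}  # node -> (log score, text read so far)
    # Sorting by (i, j) visits every edge into a node before any edge out of
    # it, because all edges run from a lower to a higher cut point.
    for (i, j) in sorted(edges):
        if i not in best:
            continue
        log_score, text = best[i]
        char, conf = max(edges[(i, j)], key=lambda hyp: hyp[1])
        candidate = (log_score + math.log(conf), text + char)
        if j not in best or candidate[0] > best[j][0]:
            best[j] = candidate
    if goal not in best:
        return None
    log_score, text = best[goal]
    return text, math.exp(log_score)


# Hypothetical character hypotheses on a graph with cut points 1..8.
EDGES = {
    (1, 2): [("i", 0.8), ("l", 0.8)],
    (1, 4): [("w", 0.7)],
    (1, 5): [("w", 0.6), ("m", 0.3)],
    (2, 4): [("u", 0.3)],
    (4, 6): [("o", 0.5)],
    (5, 6): [("o", 0.2)],
    (6, 7): [("r", 0.4)],
    (7, 8): [("d", 0.8)],
}

text, score = best_path(EDGES, 1, 8)
print(text, round(score, 3))
```

A product of confidences favors paths with fewer characters, so a real system would likely normalize for path length; that refinement is left out of this sketch.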

Holistic Features Slant Norm Turn Points Position Grid and gaps Ascender Descender Reference Lines

Lexicon Reduction and Verification

Grapheme Models

Structural Features BAG Junction Loops Loop Turns End

Feature Extraction and Ordering
- Critical node: a node whose removal disconnects a connected component.
- 2-degree critical nodes keep feature ordering from left to right.
Figure labels: Left Component, Right Component, Loop, End, Turns, Junction, Loops, End, Turns
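A critical node as defined here is what graph theory calls an articulation point. The sketch below finds such nodes by brute force (remove each node in turn and test connectivity), which is adequate for the small graphs extracted from a single word image. The adjacency-list representation, the toy graph, and the function names are assumptions; the slides do not show how the structural graph is stored.

```python
def is_connected(adj, removed=None):
    """Check whether the graph stays connected when `removed` is left out."""
    nodes = [n for n in adj if n != removed]
    if not nodes:
        return True
    seen = {nodes[0]}
    stack = [nodes[0]]
    while stack:
        node = stack.pop()
        for neighbor in adj[node]:
            if neighbor != removed and neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return len(seen) == len(nodes)


def critical_nodes(adj):
    """Nodes whose removal disconnects an (initially connected) graph."""
    return [n for n in adj if not is_connected(adj, removed=n)]


# Hypothetical stroke graph: an end point "a", a chain through "b",
# and a loop formed by "c", "d", "e".
ADJ = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d", "e"],
    "d": ["c", "e"],
    "e": ["c", "d"],
}

critical = critical_nodes(ADJ)
two_degree = [n for n in critical if len(ADJ[n]) == 2]
print(critical, two_degree)
```

In this toy graph the 2-degree critical node splits the graph into a left part and a right part, which is the property the slide relies on for left-to-right ordering.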

Continuous Attributes
Table columns: grapheme, pos, orientation, angle
Table rows: Down cusp, Up loop, Down arc

Stochastic Model

Observations

Results
Table columns: Lex size, Top, WMR %, SM CA%

Interactive Models [McClelland and Rumelhart, Psychological Review, 1981]
- Words: ABLE, TRIP, TRAP
- Letters: A, T, N
- Features

Interactive Recognition
Features: T-crossings, loops, ascenders, descenders, length
- Lexicon 1: West Central Street, West Main Street, Sunset Avenue
- Lexicon 2: West Central Street, East Central Street, Sunset Avenue
- Lexicon 3: West Central Street, West Central Avenue, Sunset Avenue
Figure labels: Interactive Model, features, image

Adaptive Character Recognition [Park and Govindaraju, IEEE CVPR 2000]
- Adaptive selection of features
- Adaptive number of features
- Adaptive resolutions
- Adaptive sequencing of features
- Adaptive termination conditions

Features
- 4 gradient features
- 5 moment features
- Vector code book

Feature Space
- |V| x |N_c| x |I_xy| = 2^9 x 10 x 85 (quad tree, 4 levels)
- Recognition rate and feature |V|: GSC: |V| :
- Tradeoffs: space vs accuracy
- Hierarchical space with additional resolution and features as needed
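The size of this feature space can be checked with a few lines of arithmetic. The sketch reads the slide's figures as 2^9 x 10 x 85, which is an interpretation: it assumes 2^9 is the number of binary codes over the 4 gradient and 5 moment features, that 10 is the number of digit classes, and that 85 counts every node of a 4-level quad tree. Only the last count follows directly from the quad-tree structure; the variable and function names are hypothetical.

```python
def quadtree_nodes(levels, branching=4):
    """Total nodes in a full quad tree: 1 + 4 + 16 + ... over `levels` levels."""
    return sum(branching ** level for level in range(levels))


n_vector_codes = 2 ** 9        # assumed: one code per pattern of 9 binary features
n_classes = 10                 # assumed: digit classes 0-9
n_regions = quadtree_nodes(4)  # sub-images across 4 quad-tree levels

feature_space_size = n_vector_codes * n_classes * n_regions
print(n_regions, feature_space_size)
```

Each extra quad-tree level multiplies the number of new regions by four, which is one way to see the space versus accuracy tradeoff the slide mentions.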

Active Recognition Using Quad Trees

Experimental Results

Results
Classifier    Active Model    Neural Net    KNN
Top 1 %       95.7%           96.4%         95.7%
Templates     ,777
Msec/char
Training hrs
Training and test data: Postal + NIST

Fast Recognition
- Reuse matched characters
- Reuse matched sub-strings
- Parallel processing

Combination and Dynamic Selection [Govindaraju and Ianakiev, MCS 2000]
Figure labels: image, WR 1, WR 2, WR 3, +, Lexicon 1, Top 5, <55, Top 50
- Optimization problem
- Combinatorial explosion in arrangement of recognizers, lexicon reduction levels

Lexicon Density [Govindaraju, Slavik, and Xue, IEEE PAMI 2002]
Lexicon 1    Lexicon 2
Me           Me
He           Memo
So           Memory
To           Memoirs
In           Mellon
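One way to get a feel for how the two lexicons differ is to measure how far apart their entries are. The sketch below uses plain Levenshtein edit distance as a stand-in; that is an assumption on my part, and the cited paper may define distance and density differently (for example, relative to a particular recognizer). The function names are hypothetical.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions, substitutions cost 1 each."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def average_pairwise_distance(lexicon):
    """Mean edit distance over all unordered pairs of lexicon entries."""
    pairs = list(combinations(lexicon, 2))
    return sum(edit_distance(a, b) for a, b in pairs) / len(pairs)


LEXICON_1 = ["Me", "He", "So", "To", "In"]
LEXICON_2 = ["Me", "Memo", "Memory", "Memoirs", "Mellon"]

print(average_pairwise_distance(LEXICON_1))
print(average_pairwise_distance(LEXICON_2))
```

Both lexicons have five entries, so lexicon size alone cannot distinguish them; any difficulty measure has to look at how the entries relate to one another.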

Classifier Performance Prediction [Xue and Govindaraju, IEEE PAMI 2002]
- q: probability that the recognizer makes a unit-distance error
- D: average distance between any two words in the lexicon
- n: lexicon size; p: performance; a, k: model parameters
ln(-ln p) = (ln q) D + a ln ln n + ln k
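Solving the slide's relation for p gives p = exp(-k q^D (ln n)^a), which is easier to evaluate directly. The sketch below does that; the parameter values are invented purely to exercise the formula and are not fitted values from the paper, and `predicted_performance` is a hypothetical name.

```python
import math


def predicted_performance(q, D, n, a, k):
    """Evaluate p = exp(-k * q**D * (ln n)**a).

    This is the slide's relation ln(-ln p) = (ln q) D + a ln ln n + ln k
    solved for p. Requires 0 < q < 1, n > 1, and k > 0.
    """
    return math.exp(-k * q ** D * math.log(n) ** a)


# Hypothetical parameter values, chosen only for illustration.
p_small = predicted_performance(q=0.5, D=3.0, n=10, a=1.0, k=1.0)
p_large = predicted_performance(q=0.5, D=3.0, n=1000, a=1.0, k=1.0)
p_sparse = predicted_performance(q=0.5, D=5.0, n=1000, a=1.0, k=1.0)
print(p_small, p_large, p_sparse)
```

With q below 1 and a positive, the formula predicts lower performance as the lexicon grows and higher performance as the average distance D between entries increases.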

Bank Check Recognition

PCR Trend Analysis

NYS EMS PCR Form
NYS PCR Example
- Thousands are filed a day.
- Passed from EMS to Hospital.
PCR Purpose:
- Medical care/diagnosis
- Legal Documentation
- Quality Assurance
EMS Abbreviations
COPD    Chronic Obstructive Pulmonary Disease
CHF     Congestive Heart Failure
D/S     Dextrose in Saline
PID     Pelvic Inflammatory Disease
GSW     Gunshot Wound
NKA     No known allergies
KVO     Keep vein open
NaCl    Sodium Chloride

Medical Text Recognition and Data Mining

Reading Census Forms
Lexicon Anomalies
- Space: “sales man” and “salesman”
- Morphology: “acct manager” and “account management”
- Abbreviation
- Plural: “school” and “schools”
- Typographical: “managar” and “manager”
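A few of these anomalies can be handled with simple normalization before lexicon lookup. The sketch below covers only the space, plural, and typographical cases, using Python's standard `difflib` for approximate matching; the tiny lexicon, the 0.8 cutoff, and the function name `canonical` are assumptions, and the slides do not say how the census system actually resolved these cases.

```python
import difflib

# Hypothetical occupation lexicon with one canonical spelling per entry.
LEXICON = ["salesman", "manager", "school"]


def canonical(entry, lexicon=LEXICON, cutoff=0.8):
    """Map a written entry to a canonical lexicon word, or None if no match."""
    key = entry.lower().replace(" ", "")  # space anomaly: "sales man"
    if key in lexicon:
        return key
    if key.endswith("s") and key[:-1] in lexicon:  # plural: "schools"
        return key[:-1]
    # Typographical anomaly: closest lexicon word by similarity ratio.
    close = difflib.get_close_matches(key, lexicon, n=1, cutoff=cutoff)
    return close[0] if close else None


for written in ["sales man", "schools", "managar", "astronaut"]:
    print(written, "->", canonical(written))
```

Morphology and abbreviation cases such as “acct manager” versus “account management” need a mapping table or stemming and are left out of this sketch.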

Binarization

Historic Manuscripts

Summary
- Handwriting recognition technology
- Pattern recognition task
- Lexicon holds domain specific knowledge
- Adaptive methods
- Classifier combination methods
- Many applications