UC Berkeley CS294-9, Fall 2000. Document Image Analysis, Lecture 11: Word Recognition and Segmentation. Richard J. Fateman, Henry S. Baird.

Presentation transcript:

UC Berkeley CS294-9 Fall Document Image Analysis Lecture 11: Word Recognition and Segmentation Richard J. Fateman Henry S. Baird University of California – Berkeley Xerox Palo Alto Research Center

The course so far
– DIA overview, objectives, measuring success
– Isolated-symbol recognition: symbols/glyphs; models/features/classifiers; image metrics; scaling up to 100 fonts of full ASCII
– Last 2 lectures: the ‘best’ classifier. None dominates, but voting helps; combinations of randomized features/classifiers!

Recall: we can often spot words when characters are unclear
– Crude segmentation into columns, paragraphs, lines, words
– Bottom up, by smearing horizontally/vertically, or
– Top down, by recursive x-y cuts
– What we really want, most of the time, is WORD recognition.
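The bottom-up smearing idea on this slide can be sketched in a few lines. This is only a minimal illustration, assuming a single binary pixel row (1 = ink) and a hypothetical `max_gap` threshold; the function names `smear_row` and `runs` are invented for this example and do not come from the lecture.

```python
def smear_row(row, max_gap):
    """Fill background gaps of at most max_gap pixels that lie between ink pixels."""
    out = list(row)
    last_ink = None  # index of the most recent ink pixel seen so far
    for i, pixel in enumerate(row):
        if pixel == 1:
            gap = i - last_ink - 1 if last_ink is not None else 0
            if 0 < gap <= max_gap:
                for j in range(last_ink + 1, i):
                    out[j] = 1  # smear ink across the short gap
            last_ink = i
    return out


def runs(row):
    """Return (start, end) pairs, end exclusive, for each maximal run of ink."""
    spans, start = [], None
    for i, pixel in enumerate(row):
        if pixel and start is None:
            start = i
        elif not pixel and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(row)))
    return spans


# Small gaps (inside a word) are filled; the wide gap (between words) survives.
row = [1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1]
smeared = smear_row(row, max_gap=2)
words = runs(smeared)
print(smeared)
print(words)
```

Choosing `max_gap` is the hard part in practice: too small and characters stay separate, too large and neighbouring words merge.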

Recall the scenario (lecture 9): Lopresti & Zhou (1994)

The flow goes one way
– No opportunity to correct segmentation failures at the symbol stage.
– No opportunity to object to implausible text at the next stage.
(Providing alternative character choices gives limited flexibility.)

Recall: Character-by-Character Voting Succeeds & Fails
Majority vote (the most commonly used method)
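A character-by-character majority vote might look like the sketch below. It assumes the readings from the different recognizers are already aligned to the same length, which is itself an assumption that fails when segmentation differs between recognizers; `vote_per_character` is a name made up for this illustration.

```python
from collections import Counter


def vote_per_character(readings):
    """Majority vote at each character position across equal-length readings."""
    if len({len(r) for r in readings}) != 1:
        # Different lengths mean the recognizers segmented the word differently,
        # so a position-by-position vote is not meaningful.
        raise ValueError("readings must be aligned to the same length")
    result = []
    for column in zip(*readings):
        winner, _count = Counter(column).most_common(1)[0]
        result.append(winner)
    return "".join(result)


# Succeeds: each error is outvoted at its own position.
print(vote_per_character(["c1ear", "clear", "cleor"]))
# Fails: two recognizers share the same mistake, so the majority is wrong.
print(vote_per_character(["c1ear", "c1ear", "clear"]))
```

The second call shows the failure mode: voting only helps when the recognizers' errors are not correlated.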

High accuracy requires some cleverness
– In fact, some words have touching characters even in cleanly typeset text scanned at high resolution.
– In noisy or low-resolution images, adjacent characters may be almost entirely touching or broken (or both touching and broken!).
– If we accept the flowchart model, we need perfect segmentation to feed the symbol recognition module.
– If we reject the flowchart: OK, where do we go from here?

Compare alternative approaches
– First, clarify the word recognition problem and see how to approach it.
– Next, see how good a job we can do on segmentation (a fall-back when we can’t use the word recognition model).
– Robustness might require both approaches (multiple algorithms again!).

Formalize the word recognition problem (T. K. Ho)
– Machine-printed, ordinary fonts (variable width)
– Cut down on the variations
– NOT: a word is all in the same font/size [shape = feature] [we could trivialize the task with one font, e.g. E-13B]
– Known lexicon (say 100,000 English words): 26^6 is about 309 million six-letter strings, so our lexicon is well under 0.3% of this (about 0.03%) [trivialize with 1 item (check the box, say “yes”..)]
– Applications in mind: post office, UNLV bakeoff
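The lexicon-size arithmetic on this slide can be checked directly. The snippet below is just that check; it treats the 100,000-word lexicon as if every word were six letters long, which is a simplification for the sake of the comparison.

```python
# Number of distinct six-letter strings over a 26-letter alphabet.
possible_strings = 26 ** 6

# Lexicon size quoted on the slide.
lexicon_size = 100_000

# Fraction of the string space that the lexicon occupies.
fraction = lexicon_size / possible_strings

print(possible_strings)
print(f"{fraction:.4%}")
```

The point of the comparison is that a known lexicon rules out almost every character string, which is what makes word-level recognition tractable.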

Word Recognition: Objective

At Least Three Approaches

In reality, a combination
Later we will find that additional processing, such as inter-word statistics or even natural language parsing, may be incorporated in the ranking.

Character Recognition Approach
– Symbol recognition is done at the character level.
– Contextual knowledge is used only at the ranking stage.

One error in character segmentation can distort many characters
[Figure stages: input word image; character segmentation; segmented and normalized characters; recognition decisions]

How to segment words into characters?
– Aspect ratio (fixed width, anyway)
– Projection profile
– Other tricks

Projection Profiles

Modified projection profiles: “and” adjacent columns
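The slide only says to “and” adjacent columns, so the exact formulation below is an assumption: take the pixelwise AND of each column with its right-hand neighbour, then count what survives. The function names are hypothetical. The toy image has two solid characters joined by a thin diagonal link, which the plain profile cannot separate.

```python
def projection_profile(image):
    """Count ink pixels (1s) in each column of a binary image given as a list of rows."""
    return [sum(column) for column in zip(*image)]


def anded_profile(image):
    """Count pixels that are ink in both a column and its right-hand neighbour."""
    columns = list(zip(*image))
    return [
        sum(a & b for a, b in zip(left, right))
        for left, right in zip(columns, columns[1:])
    ]


def cut_candidates(profile, threshold=0):
    """Indices where the profile drops to the threshold or below."""
    return [x for x, value in enumerate(profile) if value <= threshold]


# Two blocks (columns 0-1 and 4-5) touching through a one-pixel diagonal link.
image = [
    [1, 1, 0, 0, 1, 1],
    [1, 1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1, 1],
    [1, 1, 0, 0, 1, 1],
]

plain = projection_profile(image)
modified = anded_profile(image)
print(plain, cut_candidates(plain))
print(modified, cut_candidates(modified))
```

In this sketch the plain profile never reaches zero, so it offers no cut, while the ANDed profile reaches zero between columns 2 and 3, where the link changes rows.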

Poor images: confusing profiles

The argument for more context
Similar shapes appear in different contexts; in each case they are different characters, or parts of them.

Segmentation-based Approach
– Segment the word into characters.
– Extract features from the normalized character images.
– Concatenate the feature vectors to form a word feature vector.
– The character features are compared in the context of a word.
(Works if segmentation is easy but characters are difficult to recognize in isolation.)
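The steps on this slide can be sketched as follows. Everything concrete here is an assumption made for illustration: the features are raw pixels of tiny 2x2 "normalized" images, the per-letter `prototypes` are invented, and the distance is a simple sum of absolute differences. The lecture does not specify any of these choices.

```python
def char_features(char_image):
    """Flatten a normalized binary character image into a feature vector."""
    return [pixel for row in char_image for pixel in row]


def word_vector(char_images):
    """Concatenate the per-character feature vectors into one word vector."""
    vector = []
    for image in char_images:
        vector.extend(char_features(image))
    return vector


def rank_lexicon(segments, lexicon, prototypes):
    """Rank lexicon words of matching length by distance to the observed word vector."""
    observed = word_vector(segments)
    scored = []
    for word in lexicon:
        if len(word) != len(segments):
            continue  # segmentation fixes the number of characters
        expected = word_vector([prototypes[ch] for ch in word])
        distance = sum(abs(a - b) for a, b in zip(observed, expected))
        scored.append((distance, word))
    return [word for _distance, word in sorted(scored)]


# Hypothetical 2x2 prototypes for four letters.
prototypes = {
    "a": [[1, 1], [1, 0]],
    "b": [[1, 0], [1, 1]],
    "c": [[1, 1], [0, 1]],
    "t": [[0, 1], [1, 1]],
}

# Noisy segments: the first is equally close to "a" and "c",
# the third is equally close to "c" and "t".
segments = [[[1, 1], [0, 0]], [[1, 1], [1, 0]], [[0, 1], [0, 1]]]

ranking = rank_lexicon(segments, ["cab", "cat", "bat", "at"], prototypes)
print(ranking)
```

Two of the three segments are ambiguous in isolation, yet the word-level comparison still puts a single lexicon entry first, which is the situation the slide's parenthetical describes.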

Segmentation-based Word Recognition
Note that you would not have much chance of recognizing these individual characters!

Word-shape Analysis Approach
– Squeeze out extra white space; locate global reference lines (upper, top, base, bottom: Xxp).
– T. K. Ho partitions a word into 40 cells: 4 vertical regions and 10 horizontal.
– Some words have no descender or ascender regions: Hill
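A 4 x 10 partition of a word image might be computed as in the sketch below. The slide does not say how the four vertical regions are bounded, so `row_bounds` is passed in as a hypothetical parameter (five row indices delimiting four bands), and the ten horizontal divisions are assumed to be equal-width. Counting ink per cell is just one possible per-cell feature.

```python
def cell_ink_counts(image, row_bounds, n_columns=10):
    """Count ink pixels in each cell of a (len(row_bounds) - 1) x n_columns grid."""
    width = len(image[0])
    # Equal-width column bands, computed with integer arithmetic.
    col_bounds = [(k * width) // n_columns for k in range(n_columns + 1)]
    counts = []
    for top, bottom in zip(row_bounds, row_bounds[1:]):
        band = []
        for left, right in zip(col_bounds, col_bounds[1:]):
            ink = sum(
                image[y][x]
                for y in range(top, bottom)
                for x in range(left, right)
            )
            band.append(ink)
        counts.append(band)
    return counts


# Toy word image, 8 rows by 20 columns, with ink only in the middle rows,
# i.e. a word with neither ascenders nor descenders.
image = [[1 if 2 <= y < 6 else 0 for _x in range(20)] for y in range(8)]

# Hypothetical reference-line positions delimiting the four vertical regions.
counts = cell_ink_counts(image, row_bounds=[0, 2, 4, 6, 8])
print(len(counts) * len(counts[0]))
print(counts[0])
print(counts[1])
```

The empty first and last bands in this example correspond to the slide's remark that some words leave whole regions unused.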

Word transformations

Detecting base, upper, top by smearing

The 40 area partitions

Stroke Directions

Edges, Endpoints

Cases Each Approach is Best At …

Most effective features?
– Best: defined locally, yet containing shape information: stroke vectors, Baird templates
– Less effective: very high level (“holes”); very low level (“pixel values”)
– Uncertainty and partial matching are important (T. K. Ho)

T. K. Ho’s experiments
– Context: Zip code recognition
– Redundancy check requires reading the whole address
– Postal words
– Character recognizer trained on images
– 77 font samples were used to make prototypes

T. K. Ho’s experiments
Five (10?) methods used in parallel:
1. A fuzzy character template matcher plus heuristic contextual postprocessor
2. Six character recognizers
3. Segmentation-based word recognizer using pixel values
4. Word shape analyzer using strokes
5. Word shape analyzer using Baird templates

T. K. Ho’s experiments
Many interesting conclusions:
1. If several methods agree, they are almost always correct (99.6%), or the right answer is their second choice (100%)
2. Classifiers can be dynamically selected
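The first conclusion suggests a simple acceptance rule: trust a word only when enough recognizers agree on it. The sketch below is one hypothetical way to express that rule; the function name, the `min_agree` parameter and the example words are all invented here and are not taken from the experiments.

```python
from collections import Counter


def accept_if_agreed(top_choices, min_agree):
    """Return the most common top choice if at least min_agree recognizers picked it, else None."""
    word, count = Counter(top_choices).most_common(1)[0]
    return word if count >= min_agree else None


# Three of four recognizers agree: accept the word.
print(accept_if_agreed(["berkeley", "berkeley", "berkeley", "barkley"], min_agree=3))
# Only two agree: reject, and fall back to some other strategy.
print(accept_if_agreed(["berkeley", "barkley", "berkley", "berkeley"], min_agree=3))
```

A rejected word could then be passed to a different method, which is one reading of the second conclusion about selecting classifiers dynamically.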