UC Berkeley CS294-9, Fall: Document Image Analysis
Lecture 11: Word Recognition and Segmentation
Richard J. Fateman, University of California, Berkeley
Henry S. Baird, Xerox Palo Alto Research Center
The course so far…
– DIA overview, objectives, measuring success
– Isolated-symbol recognition: symbols/glyphs; models, features, classifiers
– Image metrics; scaling up to 100 fonts of full ASCII
– Last 2 lectures: no single 'best' classifier dominates, but voting helps, and so do combinations of randomized features/classifiers!
Recall: we can often spot words even when characters are unclear…
Crude segmentation into columns, paragraphs, lines, and words can proceed
– bottom-up, by smearing horizontally/vertically (sketched below), or
– top-down, by recursive X-Y cuts.
What we really want, most of the time, is WORD recognition.
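To make the bottom-up route concrete, here is a minimal sketch of smearing in the style of the run-length smoothing algorithm (RLSA), assuming a binary image stored as a list of rows of 0/1 integers (1 = ink); the gap thresholds are illustrative, not values from the lecture. Short white runs are filled with ink so that glyphs merge into word blobs (small gap) or lines and paragraphs (larger gaps).

```python
# A sketch of run-length smoothing ("smearing") on a binary row-list image.

def smear_row(row, gap):
    out = row[:]
    run_start = None
    for i, v in enumerate(row + [1]):        # sentinel 1 closes a trailing run
        if v == 0 and run_start is None:
            run_start = i                    # a white run begins
        elif v == 1 and run_start is not None:
            if i - run_start <= gap:         # short white run: fill it in
                out[run_start:i] = [1] * (i - run_start)
            run_start = None
    return out

def smear_horizontal(image, gap):
    return [smear_row(row, gap) for row in image]

def smear_vertical(image, gap):
    cols = [smear_row(list(col), gap) for col in zip(*image)]
    return [list(row) for row in zip(*cols)]  # transpose back to rows
```

A small horizontal gap merges characters into word blobs; larger gaps, applied in both directions, merge words into lines and paragraphs.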
Recall the scenario (Lecture 9): Lopresti & Zhou (1994)
The flow goes one way:
– No opportunity to correct segmentation failures at the symbol stage.
– No opportunity to object to implausible text at the next stage.
(Providing alternative character choices gives only limited flexibility.)
Recall: Character-by-Character Voting Succeeds & Fails
Majority vote (the most commonly used method; sketched below)
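As a reminder of how the combination works, here is a minimal sketch of per-position plurality voting over several classifiers' readings of the same word; the inputs are hypothetical. Note the failure mode: voting is wrong whenever the majority is.

```python
# A sketch of character-by-character voting across classifiers.

from collections import Counter

def majority_vote(readings):
    """readings: equal-length label strings, one per classifier."""
    result = []
    for labels in zip(*readings):
        top, count = Counter(labels).most_common(1)[0]
        # if all classifiers disagree, fall back to the first one's choice
        result.append(top if count > 1 else labels[0])
    return "".join(result)

print(majority_vote(["word", "werd", "wond"]))  # -> "word" (voting succeeds)
print(majority_vote(["cat", "cot", "cot"]))     # -> "cot"  (voting fails)
```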
High accuracy requires some cleverness.
Some words, even in cleanly typeset, high-resolution scanned text, have touching characters.
In noisy or low-resolution images, adjacent characters may be nearly entirely touching or broken (or both touching and broken!).
If we accept the flowchart model, we need perfect segmentation to feed the symbol-recognition module.
If we reject the flowchart: OK, where do we go from here?
Compare alternative approaches.
First, clarify the word-recognition problem and see how to approach it.
Next, see how good a job we can do on segmentation (a fall-back for when we can't use the word-recognition model).
Robustness might require both approaches (multiple algorithms again!).
Formalize the word-recognition problem (T. K. Ho):
– Machine-printed, ordinary (variable-width) fonts.
– Cut down on the variations: a word is all in the same font/size [so shape is a usable feature]. [We could trivialize the task with one font, e.g. E-13B.]
– Known lexicon (say, 100,000 English words). There are 26^6 ≈ 309 million six-letter strings, so our lexicon is only about 0.03% of this space. [Trivialize with a 1-item lexicon: check the box, say "yes"…]
– Applications in mind: post office, UNLV bake-off.
Word Recognition: Objective
At Least Three Approaches: character-by-character recognition, segmentation-based word recognition, and word-shape analysis.
In reality, a combination: later we will see that additional processing, such as inter-word statistics or even natural-language parsing, may be incorporated into the ranking.
Character Recognition Approach
Symbol recognition is done at the character level.
Contextual knowledge is used only at the ranking stage.
One error in character segmentation can distort many characters.
[Pipeline: input word image → character segmentation → segmented and normalized characters → recognition decisions]
How do we segment words into characters?
– Aspect ratio (works for fixed-width fonts, anyway)
– Projection profiles (see the sketch below)
– Other tricks
Projection Profiles
Modified projection profiles: "AND" adjacent pixel columns before projecting.
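A minimal sketch of both profiles, using the same binary row-list image representation as before; the zero cut threshold is an assumption. The plain profile counts ink per pixel column; the modified profile ANDs each column with its right-hand neighbor so that, as I read the slide, a one-column-wide bridge between touching characters disappears from the profile, leaving a dip at the touch point.

```python
# A sketch of candidate cut-point detection from projection profiles.

def profile(image):
    """Ink count per pixel column."""
    return [sum(col) for col in zip(*image)]

def anded_profile(image):
    """Ink per column after ANDing each column with its right neighbor:
    one-column bridges between touching characters vanish."""
    cols = list(zip(*image))
    return [sum(a & b for a, b in zip(cols[i], cols[i + 1]))
            for i in range(len(cols) - 1)]

def cut_points(prof, threshold=0):
    """Columns where the profile dips to the threshold: candidate cuts."""
    return [i for i, v in enumerate(prof) if v <= threshold]
```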
Poor images: confusing profiles
The argument for more context: similar shapes appear in different contexts, and in each case they are different characters, or parts of them.
Segmentation-based Approach
– Segment the word into characters.
– Extract features from the normalized character images.
– Concatenate the feature vectors to form a word feature vector.
– The character features are then compared in the context of a whole word (sketched below).
(Works if segmentation is easy but characters are difficult to recognize in isolation.)
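A minimal sketch of this idea, assuming characters are already segmented and reduced to fixed-length feature vectors; `lexicon_prototypes` and the squared-Euclidean distance are hypothetical stand-ins for whatever features and metric the recognizer actually uses.

```python
# A sketch of matching a concatenated word feature vector against a lexicon.

def word_vector(char_vectors):
    """Concatenate per-character feature vectors into one word vector."""
    return [x for v in char_vectors for x in v]

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def rank_lexicon(word_vec, lexicon_prototypes):
    """Return lexicon words ranked by distance; only prototypes of the
    same length (i.e., the same character count) are compared."""
    scored = sorted((dist2(word_vec, p), w)
                    for w, p in lexicon_prototypes.items()
                    if len(p) == len(word_vec))
    return [w for _, w in scored]
```

Because the whole concatenated vector is scored at once, weak evidence for one character can be outvoted by strong evidence for its neighbors, which is the point of comparing character features in word context.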
Segmentation-based Word Recognition
Note that you would not have much chance of recognizing these individual characters in isolation!
Word-shape Analysis Approach
Squeeze out extra white space; locate global reference lines (upper, top, base, bottom; illustrated by "Xxp").
T. K. Ho partitions a word into 40 cells: 4 vertical bands × 10 horizontal cells.
Some words lack descender or ascender regions (e.g., "Hill" has no descender).
Word transformations
Detecting base, upper, and top reference lines by smearing (sketched below)
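A minimal sketch of reference-line detection from the row-ink profile of a horizontally smeared word image: rows in the x-height band carry ink across most of the word's width, so the profile is dense there. The 50% density threshold is an assumption, not a value from the lecture.

```python
# A sketch of finding top / upper / base / bottom reference rows.

def reference_lines(image, frac=0.5):
    prof = [sum(row) for row in image]          # ink count per row
    inked = [i for i, v in enumerate(prof) if v > 0]
    top, bottom = min(inked), max(inked)        # extreme ink rows
    peak = max(prof)
    dense = [i for i, v in enumerate(prof) if v >= frac * peak]
    upper, base = min(dense), max(dense)        # x-height band boundaries
    return top, upper, base, bottom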
The 40 area partitions (sketched below)
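A minimal sketch of the 4 × 10 partition using the reference rows found above. The exact band convention is a guess (here: above the x-height line, the x-height band, the descender band, and anything below), and the per-cell feature is simply ink density.

```python
# A sketch of the 40-cell word-shape features (4 bands x 10 columns).

def cell_features(image, upper, base, bottom, ncols=10):
    h, w = len(image), len(image[0])
    bands = [(0, upper), (upper, base), (base, bottom), (bottom, h)]
    feats = []
    for r0, r1 in bands:
        for c in range(ncols):
            c0, c1 = c * w // ncols, (c + 1) * w // ncols
            ink = sum(image[r][x] for r in range(r0, r1)
                                  for x in range(c0, c1))
            area = max(1, (r1 - r0) * (c1 - c0))
            feats.append(ink / area)            # density in [0, 1]
    return feats                                # 40 numbers per word
```

Words with no ascender or descender region simply yield empty (zero-density) bands, so "Hill"-like words still map to a 40-number vector.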
Stroke Directions
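One plausible way to compute stroke-direction features, sketched below: label each ink pixel with the axis (horizontal, vertical, or either diagonal) along which its black run is longest, then histogram the labels (per cell in the full scheme; globally here). The lecture does not pin down the exact method, so treat this purely as an illustration.

```python
# A sketch of run-length-based stroke-direction features.

AXES = [(0, 1), (1, 0), (1, 1), (1, -1)]     # horizontal, vertical, 2 diagonals

def run_length(image, r, c, dr, dc):
    """Length of the black run through (r, c) along axis (dr, dc)."""
    h, w = len(image), len(image[0])
    n = 1
    for s in (1, -1):                         # walk both ways from the pixel
        rr, cc = r + s * dr, c + s * dc
        while 0 <= rr < h and 0 <= cc < w and image[rr][cc]:
            n += 1
            rr, cc = rr + s * dr, cc + s * dc
    return n

def direction_histogram(image):
    hist = [0, 0, 0, 0]                       # one bin per axis in AXES
    for r, row in enumerate(image):
        for c, v in enumerate(row):
            if v:
                runs = [run_length(image, r, c, dr, dc) for dr, dc in AXES]
                hist[runs.index(max(runs))] += 1
    return hist
```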
Edges, Endpoints
Cases Each Approach is Best At…
Most effective features?
– Best: those defined locally yet containing shape information (stroke vectors, Baird templates).
– Less effective: the very high-level ("holes") and the very low-level ("pixel values").
– Uncertainty/partial matching is important (T. K. Ho).
T. K. Ho's experiments
Context: ZIP-code recognition, where a redundancy check requires reading the whole address, hence postal words.
The character recognizer was trained on images; 77 font samples were used to make the prototypes.
T. K. Ho's experiments
Five methods (ten classifiers in all, since the second method comprises six) were used in parallel:
1. A fuzzy character-template matcher plus a heuristic contextual postprocessor
2. Six character recognizers
3. A segmentation-based word recognizer using pixel values
4. A word-shape analyzer using strokes
5. A word-shape analyzer using Baird templates
T. K. Ho's experiments: many interesting conclusions…
1. If several methods agree, they are almost always correct (99.6% as first choice; 100% by the second choice).
2. Classifiers can be dynamically selected.