UC Berkeley CS294-9 Fall Document Image Analysis Lecture 12: Word Segmentation Richard J. Fateman Henry S. Baird University of California – Berkeley.

Presentation transcript:

UC Berkeley CS294-9 Fall Document Image Analysis Lecture 12: Word Segmentation Richard J. Fateman Henry S. Baird University of California – Berkeley Xerox Palo Alto Research Center

The course, recently…
- We studied symbol recognition, classifiers, and their combinations
- Word recognition as distinct from character recognition

A good segmentation method (or several) is handy
- We cannot rely on a lexicon to contain all words (names, proper nouns, numbers, acronyms).
- Insisting that words be in the lexicon does not mean they are correct.
- PowerPoint tries to refuse "misspell" written as "mispell", since the latter is not in the dictionary!
- Good segmentation gives symbol-based recognition a better chance of success.

Segmentation: naïve or clever
- Numerous papers on the subject
- Some without strong models (e.g. cut at thin parts)
- Some with exhaustive search / template matching
- Some with learning / internal comparisons

Naïve connected component analysis can't come close…
- Characters like i, j, :, ;, Ξ, â, % are split into separate components.
- Ligatures are not separated: ffl, Œ, Æ, œ, ffi.
- Vertical cuts between touching characters will not ordinarily work for italics.
- Type samples: "THIS IS ULTRA CONDENSED..TZ" and "this is times italic."
- (other problems: X 2, )
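
The failure modes above can be reproduced on toy bitmaps. The following is a minimal sketch, not code from the lecture: the function name, the `#`/`.` bitmap encoding, and the two sample glyphs are all assumptions made for illustration. It labels 8-connected black regions and shows that a dotted letter splits in two while a touching pair merges into one.

```python
from collections import deque

def connected_components(image):
    """Return one bounding box (top, left, bottom, right) per 8-connected
    black region. `image` is a list of equal-length strings, '#' = black."""
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for r in range(height):
        for c in range(width):
            if image[r][c] != "#" or seen[r][c]:
                continue
            # Breadth-first flood fill starting from this unvisited black pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] == "#" and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes

# Toy "i": the dot and the stem come out as two components.
LETTER_I = [
    ".#.",
    "...",
    ".#.",
    ".#.",
    ".#.",
]

# Toy touching pair: two stems joined by a thin bridge come out as one.
TOUCHING = [
    "#...#",
    "#...#",
    "#####",
    "#...#",
]

print(len(connected_components(LETTER_I)))   # 2
print(len(connected_components(TOUCHING)))   # 1
```

Neither result matches the character count, which is why component analysis alone is not enough for segmentation.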

Papers of interest on segmentation
- Tsujimoto and Asada
- Bayer and Kressel
- Tao Hong's (1995) PhD on Degraded Text Recognition

Segmentation + Clustering (Tao Hong)

Can lead to decoding!

Sometimes the image itself holds a key to decoding…

Visual inter-word relations

An example text block showing visual inter-word relationships

Pattern matching can lead to identifying a segment

Where this fits…

Example

Tsujimoto & Asada: Overview

Resolve the touching characters
- New metric for finding breaks (find plausible breaks)
- Use knowledge about "the usual suspects": rn/m, k/lc, d/cl, … (limits search substantially)
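
To see why a short list of "usual suspects" limits the search, consider the sketch below. It is an illustration only, not the Tsujimoto and Asada procedure: the function name and the choice to work on recognized strings rather than images are assumptions. Only the three confusion pairs named on the slide are used.

```python
# Confusion pairs taken from the slide; each pair can be read either way.
CONFUSIONS = [("m", "rn"), ("k", "lc"), ("d", "cl")]

def alternatives(text):
    """Return every reading obtained by swapping one confusable pair once."""
    results = set()
    for a, b in CONFUSIONS:
        for src, dst in ((a, b), (b, a)):
            pos = text.find(src)
            while pos != -1:
                results.add(text[:pos] + dst + text[pos + len(src):])
                pos = text.find(src, pos + 1)
    return sorted(results)

print(alternatives("modem"))  # ['moclem', 'modern', 'rnodem']
print(alternatives("clock"))  # ['cloclc', 'dock']
```

Only positions that contain a known confusable glyph generate alternatives, so the number of hypotheses stays small compared with trying a cut at every column.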

Metric, pre-processing
- ANDing columns for profile
- Removing slant from italics
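
One reading of "ANDing columns for profile" is to AND each pixel column with its right-hand neighbour before counting black pixels, so that strokes touching only diagonally vanish from the profile. The sketch below works under that assumption; the function names and the toy bitmap are hypothetical, and slant removal is not shown.

```python
def column_profile(image):
    """Plain vertical projection: black-pixel count per column (1 = black)."""
    return [sum(row[c] for row in image) for c in range(len(image[0]))]

def anded_profile(image):
    """Black-pixel count of each column ANDed with its right-hand neighbour."""
    width = len(image[0])
    return [sum(row[c] & row[c + 1] for row in image) for c in range(width - 1)]

# Two blobs that touch only diagonally, between columns 2 and 3.
IMAGE = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]

print(column_profile(IMAGE))  # [3, 3, 1, 2, 2]
print(anded_profile(IMAGE))   # [3, 1, 0, 2]
```

The plain profile never reaches zero, while the ANDed profile drops to zero exactly at the junction, which makes the break candidate easier to find.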

Choosing break candidates

Decision Tree for “The”

Tree search
- Depth first, looking for a solution to the string matching, in sequence.
- Some partitions are penalized (but not eliminated) if the segmentation point is uncertain.
- Segments are matched to omnifont templates (“multiple similarity method..”).
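
The search described above can be sketched as a recursive depth-first walk over break candidates. This is a simplified illustration, not the published algorithm: the `match` recognizer, the penalty values, and the column positions are all made up, and the sketch explores every branch and keeps the cheapest one rather than stopping at the first solution.

```python
def segment(breaks, end, match, start=0):
    """Depth-first search over break candidates.

    breaks: dict mapping a candidate column to its uncertainty penalty.
    match:  hypothetical recognizer, (left, right) -> (label, cost) or None.
    Returns (total_cost, text) for the cheapest full segmentation, or None.
    """
    if start == end:
        return (0.0, "")
    best = None
    for cut in sorted(set(breaks) | {end}):
        if cut <= start:
            continue
        hit = match(start, cut)
        if hit is None:
            continue  # no template fits this span, so prune the branch
        label, cost = hit
        rest = segment(breaks, end, match, cut)
        if rest is None:
            continue  # the remainder cannot be segmented from here
        # Uncertain cuts add a penalty but are still explored.
        total = cost + breaks.get(cut, 0.0) + rest[0]
        if best is None or total < best[0]:
            best = (total, label + rest[1])
    return best

# Toy word image 30 columns wide; the cut at column 16 is uncertain.
BREAKS = {10: 0.0, 16: 0.5, 20: 0.0}
TEMPLATES = {
    (0, 10): ("T", 0.1),
    (10, 20): ("h", 0.2),
    (10, 16): ("l", 0.3),
    (16, 20): ("1", 0.6),
    (20, 30): ("e", 0.1),
}

def match(left, right):
    return TEMPLATES.get((left, right))

result = segment(BREAKS, 30, match)
print(result[1])  # The
```

The competing reading "Tl1e" is still reachable through the uncertain cut at column 16, but its penalty and weaker template scores make it more expensive than "The".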

Reexamined explanations
- m/rn, q/cj, k/lc, B/13, H/I-I, mmnun, ckdc, etc. … 30 confusions
- This might be mistaken for This

Some tough calls…

Unbelievable accuracy…

A different, perhaps more general method (Bayer, Kressel)
Goal: find the column position(s) at which characters are touching
- Treat as a systematic classification problem
- Learn from a database containing labelled merged characters
  - Collect real-life data; get human breakpoints [or could be synthetic, I suppose]
  - Find appropriate feature set
  - Learn the features of touching characters
- Hypothesize column breaks
- Application: postal addresses, other stuff too

Database of touching chars ….2158 patterns

Big idea
Rather than represent the breaks as low points in the projection profile, represent the breaks in the natural context of touching characters by actual example, suitably normalized for size (15-30 pixels high). These locations are manually marked.

Local feature set describing cut locations / measures of similarity
- Number of black pixels (= projection profile!)
- Number of white pixels counting from top/bottom
- Number of white-black transitions
- Number of identical black or white pixels next to this column (derivative of the projection profile?)
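
The local features listed above are simple to compute per column. The sketch below is one plausible interpretation, not the Bayer and Kressel implementation: the function name, the dictionary keys, and the choice to compare against the right-hand neighbour only are assumptions.

```python
def local_features(image, col):
    """Per-column features; `image` is a list of rows of 0/1 with 1 = black."""
    column = [row[col] for row in image]
    height = len(column)
    # Length of the white run from the top edge and from the bottom edge.
    white_top = next((i for i, p in enumerate(column) if p), height)
    white_bottom = next((i for i, p in enumerate(reversed(column)) if p), height)
    # White-to-black steps between vertically adjacent pixels, top to bottom.
    transitions = sum(1 for a, b in zip(column, column[1:]) if a == 0 and b == 1)
    # Rows where this column agrees with its right-hand neighbour (if any).
    if col + 1 < len(image[0]):
        same_as_next = sum(1 for row in image if row[col] == row[col + 1])
    else:
        same_as_next = height
    return {
        "black": sum(column),
        "white_top": white_top,
        "white_bottom": white_bottom,
        "transitions": transitions,
        "same_as_next": same_as_next,
    }

# Two vertical stems joined by a one-pixel bridge in column 1.
IMAGE = [
    [1, 0, 1],
    [1, 0, 1],
    [1, 1, 1],
    [1, 0, 1],
]

print(local_features(IMAGE, 1))
# {'black': 1, 'white_top': 2, 'white_bottom': 1, 'transitions': 1, 'same_as_next': 1}
```

A classifier would see such a vector for the candidate column and, in a windowed setup, for its neighbours as well.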

Global feature set describing cut locations / measures of similarity
- Width-to-height ratio of the full image (wider suggests touching characters)
- Width-to-height ratio of the image AFTER cutting(s)
- Number of white-black transitions
- Number of identical black or white pixels next to this column (derivative of the projection profile?)

Illustration of the strategy

How accurate, how fast? (cut location)
- Finding cuts: 7.8% error on the learning set, 7.2% (!) on the test set
- 22% of the no-cut regions had errors
- Best results used a 50-feature classifier using a 9-column width
- Cost of one image cut analysis ≈ one character analysis
- Validates statistics > heuristics