A mathematical formula recognition method and its performance evaluation. Masayuki Okamoto, Shinshu University, Japan

Overview of presentation: Goal of our study; Character and symbol recognition; Structure analysis and recognition; Performance evaluation method; Experimental results; Future work

Goal of our study: A high-performance formula recognition system for “Archiv der Mathematik”

Overview of recognition system: Labeling; Character or symbol recognition; Structure recognition; Touching character separation

Font type (1/2): Alphabet (1. Roman, 2. Italic, 3. Bold, 4. Calligraphy, 5. German); Greek

Font type (2/2): Digits; Mathematical symbols. Character sizes: Normal size, Small size. Number of characters or symbols: 650

Dictionary data: The following three features are calculated from each sample: 1. Mesh features, 2. Peripheral features, 3. PDC features. Flow: image -> feature calculation -> features -> dictionary
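As a rough illustration of the first feature type, the sketch below computes one common form of mesh feature: the ink density of each cell when a binary glyph image is divided into a grid. The slides do not define the features, so the grid size, the density definition, and the names `mesh_features` and `glyph` are assumptions, and the peripheral and PDC features are not covered.

```python
from typing import List

Image = List[List[int]]  # binary image: 1 = ink, 0 = background


def mesh_features(image: Image, grid: int = 2) -> List[float]:
    """Return the ink density of each cell in a grid x grid mesh (assumed definition)."""
    height, width = len(image), len(image[0])
    features = []
    for gy in range(grid):
        y0, y1 = gy * height // grid, (gy + 1) * height // grid
        for gx in range(grid):
            x0, x1 = gx * width // grid, (gx + 1) * width // grid
            cell = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            # Empty cells (image smaller than the grid) contribute a density of 0.
            features.append(sum(cell) / len(cell) if cell else 0.0)
    return features


# Hypothetical 4x4 glyph used only for illustration.
glyph = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]
print(mesh_features(glyph))  # [1.0, 0.0, 0.0, 0.75]
```

A dictionary entry would then hold one such vector per sample character, alongside the other two feature vectors.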

Character recognition process: Given image -> feature calculation -> features -> comparison with dictionary data -> result

Character recognition process: We classify the given image with each feature separately and take the majority vote. Result from mesh features + result from peripheral features + result from PDC features -> majority vote
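The voting step can be sketched as follows. The slides do not say what happens when all three classifiers disagree, so returning `None` in that case is an assumption, as is the function name `majority_vote`.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_vote(labels: Sequence[str]) -> Optional[str]:
    """Return the label chosen by more than half of the classifiers, else None.

    `labels` must be non-empty. Returning None when there is no majority is an
    assumption; the slides do not describe the tie-breaking rule.
    """
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else None


# Hypothetical outputs of the mesh, peripheral and PDC classifiers.
print(majority_vote(["x", "x", "χ"]))  # x
print(majority_vote(["C", "c", "("]))  # None
```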

Touching characters: We treat a character with a low similarity score as a touching character. Result/Score: ‘O’ / 0.847, ‘(’ / 0.980, ‘y’ / 0.990
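A minimal sketch of this rule, using the three result/score pairs from the slide. The cut-off value of 0.9 and the name `touching_candidates` are hypothetical; the slides do not give the threshold actually used.

```python
from typing import List, Tuple

# Hypothetical cut-off; the slides do not state the value actually used.
SIMILARITY_THRESHOLD = 0.9


def touching_candidates(
    results: List[Tuple[str, float]], threshold: float = SIMILARITY_THRESHOLD
) -> List[str]:
    """Return the labels whose similarity score falls below the threshold."""
    return [label for label, score in results if score < threshold]


# Result/score pairs quoted on the slide.
scores = [("O", 0.847), ("(", 0.980), ("y", 0.990)]
print(touching_candidates(scores))  # ['O']
```

With this assumed threshold only the first result would be passed on to the segmentation step.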

Touching character segmentation (1): Blur the image; Calculate minimal points; Estimate cutting lines; Comparison; Classification

Touching character segmentation (2): Make the projection profile; find positions where $|h_i - h_{i+1}| > \theta$; Recognize. Figure labels: Image, Projection profile
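The profile test can be sketched as below, assuming that h_i is the number of ink pixels in column i of a binary image and that a large jump between neighbouring columns marks a candidate cutting position. The function names, the sample image and the value of θ are illustrative assumptions.

```python
from typing import List

Image = List[List[int]]  # binary image: 1 = ink, 0 = background


def projection_profile(image: Image) -> List[int]:
    """Count ink pixels in each column (assumed meaning of h_i)."""
    return [sum(column) for column in zip(*image)]


def cut_candidates(profile: List[int], theta: int) -> List[int]:
    """Return every index i with |h_i - h_(i+1)| > theta."""
    return [
        i
        for i in range(len(profile) - 1)
        if abs(profile[i] - profile[i + 1]) > theta
    ]


# Two hypothetical blobs joined by a one-pixel bridge in column 3.
image = [
    [1, 1, 1, 0, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1, 1],
]
profile = projection_profile(image)
print(profile)                     # [3, 3, 3, 1, 3, 3, 3]
print(cut_candidates(profile, 1))  # [2, 3]
```

Each candidate position would then be tried as a cut, and the resulting pieces passed back to the character recognizer.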

Segmentation experiment: 47 touching characters were found in our experimental data.

Touching type      Samples  Errors  Rate (correctly segmented)
Vertically              12       9   25%
Horizontally            25       9   64%
Fraction bar             6       2   67%
Three characters         4       4    0%

Correct results: Correct examples; Touching a fraction bar

Errors: Three touching characters; Other types

Recognition experiment: Number of symbols (value not preserved in the transcript). We excluded touching characters. We distinguished the following similar-shaped characters

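Recognition rate by font type. Columns: Font type, Samples, Errors, Rate (%). Rows: Digit, Alphabet, Greek, Bold, Calligraphy, German, Symbol, Total (values not preserved in the transcript)

Recognition rate for similar-shaped characters. Columns: Type, Samples, Errors, Rate (%). Rows: C,c; 1,l; O,; x,χ; S,s; Total (values not preserved in the transcript)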

Causes of errors

Cause                        Errors
Similar-shaped characters        31
Other causes                     37
Total                            68

Examples of recognition errors: Most errors occurred in small characters such as scripts

Summary of experiment: Collecting fonts from the target document can achieve good recognition results. Most errors occurred in ambiguous and small characters. Separation of touching characters is difficult, and its performance is not yet sufficient

Our previous methods (1): Projection profile cutting

Our previous methods (2): Specific structure processing (bottom-up): Script, Root, Matrix. Fundamental structure processing (top-down): Vertical division by symbols, Horizontal division by symbols, Horizontal division by blank space, Core symbol in subexpression

Outline of structure recognition: Image -> Character recognition -> Structure recognition -> Output in LaTeX/MathML. Structure recognition steps: Target symbol; Horizontal connection; Top to bottom; Group A processing [symbol = fraction, root, matrix]; Group B processing [symbol = script, limit]; Recursion

Structure recognition (1/2): Fractions; Roots; Matrices. Figure labels: Target symbol, Matrix recognition

Structure recognition (2/2): Scripts; Limits. Figure labels: Adjacent symbol, Target symbol

Matrix recognition: Vertical overlap; Horizontal overlap
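The transcript does not say how the two overlap tests are used. One plausible reading is that symbols whose bounding boxes overlap vertically lie in the same matrix row, and those that overlap horizontally lie in the same column. The sketch below only computes the overlaps under that assumption; the `Box` type, the coordinates and the function names are hypothetical.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Bounding box in pixel coordinates, with y growing downward."""
    left: int
    top: int
    right: int
    bottom: int


def vertical_overlap(a: Box, b: Box) -> int:
    """Length of the shared vertical extent (0 if the boxes do not overlap)."""
    return max(0, min(a.bottom, b.bottom) - max(a.top, b.top))


def horizontal_overlap(a: Box, b: Box) -> int:
    """Length of the shared horizontal extent (0 if the boxes do not overlap)."""
    return max(0, min(a.right, b.right) - max(a.left, b.left))


# Hypothetical matrix elements: b is to the right of a, c is below a.
a = Box(0, 0, 10, 10)
b = Box(20, 2, 30, 12)
c = Box(1, 20, 11, 30)

print(vertical_overlap(a, b), horizontal_overlap(a, b))  # 8 0  (same row)
print(vertical_overlap(a, c), horizontal_overlap(a, c))  # 0 9  (same column)
```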

Case-distinction: Vertical overlap; Horizontal overlap; Right edge; Left parenthesis

Answer database format: Original expression ( β α ) ( e i ) =...; Positional information

Comparison between results and answers: (a) Original expression ( β α ) ( e i ) =...; (b) Recognition result ( β α ) ( e i ) =... Each structure in the answer is marked as found or not found in the result. Correct recognition count: Fractions C = 1, N = 1; Scripts C = 1, N = 2, where C is the number correctly recognized and N is the number in the original expression. Recognition rate = C / N
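The rate C / N can be computed per structure type along the following lines. This is a sketch under the assumption that each structure in the answer database and in the recognition result can be reduced to a (type, identifier) pair; the real database format and matching rule are not given in the transcript, and the sample data are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

Structure = Tuple[str, str]  # (structure type, hypothetical identifier)


def recognition_rates(
    answers: Sequence[Structure], results: Sequence[Structure]
) -> Dict[str, float]:
    """Return C / N for each structure type present in the answers."""
    found = set(results)
    total = Counter(kind for kind, _ in answers)                     # N per type
    correct = Counter(kind for kind, ident in answers
                      if (kind, ident) in found)                     # C per type
    return {kind: correct[kind] / n for kind, n in total.items()}


# Hypothetical expression with one fraction and two scripts; one script is missed.
answers = [("fraction", "f1"), ("script", "s1"), ("script", "s2")]
results = [("fraction", "f1"), ("script", "s1")]
print(recognition_rates(answers, results))  # {'fraction': 1.0, 'script': 0.5}
```

Summing C and N over many expressions before dividing would give the per-structure rates for a whole test set.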

Correct results (1/4): Limit. Arch. Math., Vol. 64, p. 44

Correct results (2/4): Multi-fraction. Arch. Math., Vol. 65, p. 272

Correct results (3/4): Sparse matrix. Arch. Math., Vol. 64, p. 277. Figure labels: Original expression, Recognition result

Correct results (4/4): Nested case-distinction. Arch. Math., Vol. 64, p. 108. Figure labels: Original expression, Recognition result

Errors (1/2): Matrix. Arch. Math., Vol. 24, p. 65. Figure labels: Original expression, Recognition result

Errors (2/2): Case-distinction. Arch. Math., Vol. 64, p. 104. Figure labels: Original expression, Recognition result

Inapplicable expressions (1)

Inapplicable expressions (2)

Inapplicable expressions (3)

Structure recognition rate. Columns: Structure, Total, Error, Correct rate (%). Rows: Scripts, Limits, Fractions, Roots, Matrices (Case-distinctions), Total (values not preserved in the transcript)

Summary of structure recognition: Extension of the recognition method (matrix and case-distinction). Performance evaluation (quantitative evaluation for a large number of expressions; automatic calculation of the recognition rate for each typical structure)

Future work: Improvement of touching character separation; Extension to inapplicable expressions; Experiments on other documents