A mathematical formula recognition method and its performance evaluation Masayuki Okamoto Shinshu University JAPAN
Overview of presentation: Goal of our study; Character and symbol recognition; Structure analysis and recognition; Performance evaluation method; Experimental results; Future work
Goal of our study: a high-performance formula recognition system for “Archiv der Mathematik”
Overview of Recognition System Labeling Character or symbol recognition Structure recognition Touching character separation
Font type (1/2) Alphabet 1. Roman 2. Italic 3. Bold 4. Calligraphy 5. German Greek
Font type (2/2) Digits Mathematical symbols Characters Normal size Small size Number of characters or symbols: 650
Dictionary data: the following three features are calculated from each sample image and stored in the dictionary (image → feature calculation → features → dictionary): 1. Mesh features 2. Peripheral features 3. PDC features
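The slide names the mesh features without defining them. As a rough illustration, the sketch below uses one common definition: the black-pixel density in each cell of a regular grid. The function name `mesh_features`, the binary list-of-rows image format, and the grid size are assumptions, not details taken from the presentation.

```python
def mesh_features(image, grid=4):
    """Return the fraction of black pixels in each cell of a grid x grid mesh.

    `image` is a list of rows, each a list of 0 (white) / 1 (black) pixels.
    The grid size is an assumed parameter, not a value from the slides.
    """
    height, width = len(image), len(image[0])
    features = []
    for gy in range(grid):
        y0, y1 = gy * height // grid, (gy + 1) * height // grid
        for gx in range(grid):
            x0, x1 = gx * width // grid, (gx + 1) * width // grid
            cell = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            # Empty cells (image smaller than the grid) contribute zero density.
            features.append(sum(cell) / len(cell) if cell else 0.0)
    return features


# A 4x4 image whose left half is black, summarised on a 2x2 mesh.
half_black = [[1, 1, 0, 0] for _ in range(4)]
print(mesh_features(half_black, grid=2))
```

A dictionary entry would then pair a character label with such a vector, so that an unknown image can be compared against every stored sample.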
Character recognition process: Given image → Feature calculation → features → comparison with Dictionary data → result
Character recognition process: we classify the given image with each feature separately and take the majority vote (Result from mesh features, Result from peripheral features, Result from PDC features → Majority vote)
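The majority vote over the three per-feature results can be sketched as follows. The function name and the handling of a three-way disagreement (returning `None`) are assumptions; the slide does not say how ties are resolved.

```python
from collections import Counter


def majority_vote(mesh_label, peripheral_label, pdc_label):
    """Return the label chosen by at least two of the three classifiers.

    Returns None when all three disagree; this tie handling is an
    assumption, since the slides do not describe that case.
    """
    votes = Counter([mesh_label, peripheral_label, pdc_label])
    label, count = votes.most_common(1)[0]
    return label if count >= 2 else None


print(majority_vote("x", "x", "χ"))
```

Returning `None` rather than picking one classifier makes the undecided case visible to the caller, which can then apply whatever fallback suits the system.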
Touching characters: we treat a character with a low similarity score as a touching character. Result / Score: ‘O’ / 0.847, ‘(’ / 0.980, ‘y’ / 0.990
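The low-score rule can be written as a simple threshold test. This is a minimal sketch: the threshold of 0.9 is a hypothetical value chosen only so that the example scores from the slide separate; the presentation does not state the value actually used.

```python
def find_touching_candidates(results, threshold=0.9):
    """Return the (label, score) pairs whose similarity score is below threshold.

    `threshold` is a hypothetical value, not one given in the slides.
    """
    return [(label, score) for label, score in results if score < threshold]


# Scores shown on the slide.
results = [("O", 0.847), ("(", 0.980), ("y", 0.990)]
print(find_touching_candidates(results))
```

Components flagged this way would be passed on to the segmentation step instead of being accepted as single characters.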
Touching character segmentation (1): Blur the image → Calculate minimal points → Estimate cutting lines → Comparison → Classification
Touching character segmentation (2): Image → Make projection profile → find positions where |h_i - h_{i+1}| > θ → Recognize
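The threshold test on the projection profile can be illustrated as below. The sketch assumes that h_i is the number of black pixels in column i (a vertical projection) and that positions passing the test are candidate cut positions; both readings are assumptions, as are the function names.

```python
def projection_profile(image):
    """Count black pixels in each column of a binary image (list of rows)."""
    width = len(image[0])
    return [sum(row[x] for row in image) for x in range(width)]


def cut_candidates(profile, theta):
    """Return indices i where |h_i - h_{i+1}| exceeds theta."""
    return [
        i
        for i in range(len(profile) - 1)
        if abs(profile[i] - profile[i + 1]) > theta
    ]


image = [
    [1, 1, 0, 1],
    [1, 1, 0, 1],
    [1, 0, 0, 1],
]
profile = projection_profile(image)
print(profile, cut_candidates(profile, theta=1))
```

Each candidate position would then be tried as a cut, and the resulting pieces passed back to the character recognizer.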
Segmentation experiment: 47 touching characters found in our experimental data
Touching type      Samples  Errors  Rate
Vertically              12       9   25%
Horizontally            25       9   64%
Fraction bar             6       2   67%
Three characters         4       4    0%
Correct result Correct examples Touch with fraction bar
Errors Three touching characters Other types
Recognition experiment Number of symbols : We excluded touching characters. We distinguished the following similar-shaped characters
Recognition rate
Font type     Samples  Errors  Rate
Digit                             %
Alphabet                          %
Greek                             %
Bold                              %
Calligraphy                       %
German                            %
Symbol                            %
Total                             %
Recognition rate: Similar-shaped characters
Type    Samples  Errors  Rate
C,c                         %
1,l                         %
O,                          %
x,χ                         %
S,s                         %
Total                       %
Causes of errors
Cause                      Errors
Similar-shaped characters      31
Other causes                   37
Total                          68
Examples of recognition errors: most errors occurred in small characters such as scripts
Summary of the experiment: Collecting fonts from the target document can achieve good recognition results. Most errors occurred in ambiguous and small characters. Separation of touching characters is difficult, and its performance is not yet sufficient.
Our previous methods (1) Projection profile cutting
Our previous methods (2) Specific structure processing (bottom-up): script, root, matrix. Fundamental structure processing (top-down): vertical division by symbols, horizontal division by symbols, horizontal division by blank space, core symbol in subexpression
Outline of structure recognition (flowchart): Image → Character recognition → Structure recognition → Output in LaTeX/MathML. Structure recognition: target symbol, horizontal connection, top to bottom; Group A processing [symbol = fraction, root, matrix]; Group B processing [symbol = script, limit]; recursion
Structure Recognition (1/2) Fractions Roots Matrices Target symbol Matrix Recognition Target symbol
Structure Recognition (2/2) Scripts Limits Adjacent symbol Target symbol
Matrix Recognition Vertical Overlap Horizontal Overlap
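The slide does not define the overlap tests, so the following is only one plausible reading: matrix elements whose vertical extents overlap are grouped into a row, and elements whose horizontal extents overlap are grouped into a column. The bounding-box format `(x0, y0, x1, y1)` and the function name are assumptions made for this sketch.

```python
def group_by_overlap(boxes, axis):
    """Group bounding boxes whose extents overlap along one axis.

    boxes: list of (x0, y0, x1, y1) tuples (assumed format).
    axis:  "y" groups boxes with overlapping vertical extents (rows),
           "x" groups boxes with overlapping horizontal extents (columns).
    """
    lo, hi = (0, 2) if axis == "x" else (1, 3)
    groups = []
    for box in sorted(boxes, key=lambda b: b[lo]):
        if groups and box[lo] < groups[-1]["end"]:
            # The box starts before the current group ends: same row/column.
            groups[-1]["boxes"].append(box)
            groups[-1]["end"] = max(groups[-1]["end"], box[hi])
        else:
            groups.append({"end": box[hi], "boxes": [box]})
    return [g["boxes"] for g in groups]


# Four elements laid out as a 2 x 2 matrix, slightly misaligned.
elements = [(0, 0, 10, 10), (20, 1, 30, 11), (0, 20, 10, 30), (21, 20, 31, 30)]
rows = group_by_overlap(elements, "y")
columns = group_by_overlap(elements, "x")
print(len(rows), len(columns))
```

Overlap grouping of this kind tolerates small misalignments between elements, which a fixed grid would not.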
Case-distinction Vertical Overlap Horizontal Overlap Right Edge Left Parenthesis
Answer Database: Original expression ( β α ) ( e i ) =... ; Format; Positional Information
Comparison between Results and Answers
(a) Original expression: ( β α ) ( e i ) =...
(b) Recognition result: ( β α ) ( e i ) =...
Labels: found / not found
Correct Recognition Count
Structure   Number correctly recognized (C)   Number in original expression (N)
Fractions   1                                 1
Scripts     1                                 2
Recognition rate = C / N
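The rate C / N can be computed per structure type as in this minimal sketch. The function name and the dictionary layout are assumptions, and the counts in the usage example are illustrative only, not experimental results.

```python
def recognition_rates(counts):
    """Compute C / N for each structure type.

    counts maps a structure name to (C, N): the number correctly
    recognized and the number in the original expressions.
    Structures with N == 0 are skipped to avoid division by zero.
    """
    return {name: c / n for name, (c, n) in counts.items() if n > 0}


# Illustrative counts only.
counts = {"Fractions": (1, 1), "Scripts": (1, 2)}
print(recognition_rates(counts))
```

Keeping the counts per structure type is what makes the evaluation automatic: the same tally can be accumulated over many expressions before the division is applied.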
Arch.Math., Page 44, Vol. 64 limit Correct Results (1/4)
Arch.Math., Page 272, Vol. 65 Multi-fraction Correct Results (2/4)
Arch.Math., Page 277, Vol. 64 Sparse Matrix Correct Results (3/4) Original expression Recognition result
Correct Results (4/4) Arch.Math., Page 108, Vol. 64 Nested case-distinction Original expression Recognition result
Errors (1/2) Arch.Math., Page 65, Vol. 24 Matrix Original expression Recognition result
Errors (2/2) Arch.Math., Page 104, Vol. 64 Case-distinction Original expression Recognition result
Inapplicable expressions (1)
Inapplicable expressions (2)
Inapplicable expressions (3)
Structure Recognition Rate
Structure   Total  Error  Correct rate (%)
Scripts
Limits
Fractions
Roots
Matrices (Case-distinctions)
Total
Summary of structure recognition. Extension of recognition method: matrix and case-distinction. Performance evaluation: quantitative evaluation for a large number of expressions; automatic calculation of recognition rate for each typical structure
Future work Improvement of touching character separation Extension for inapplicable expressions Experiments on other documents