1
A mathematical formula recognition method and its performance evaluation
Masayuki Okamoto
Shinshu University, Japan
2
Overview of presentation
- Goal of our study
- Character and symbol recognition
- Structure analysis and recognition
- Performance evaluation method
- Experimental results
- Future work
3
Goal of our study
A high-performance formula recognition system for “Archiv der Mathematik”
4
Overview of the recognition system
- Labeling
- Character or symbol recognition
- Structure recognition
- Touching character separation
5
Font types (1/2)
- Alphabet: 1. Roman, 2. Italic, 3. Bold, 4. Calligraphy, 5. German
- Greek
6
Font types (2/2)
- Digits
- Mathematical symbols
- Characters in normal size and small size
Number of characters or symbols: 650
7
Dictionary data
The following three features are calculated from each sample image and stored as dictionary features:
1. Mesh features
2. Peripheral features
3. PDC features
(Diagram: image → feature calculation → dictionary features)
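A minimal sketch of the first two features, assuming a size-normalized binary image stored as a list of rows (1 = black). The mesh size `n`, the use of one peripheral distance per row and column (rather than averaged bands), and the function names are illustrative assumptions; the PDC features are not sketched here.

```python
from typing import List

Image = List[List[int]]  # size-normalized binary image: 1 = black, 0 = white


def mesh_features(img: Image, n: int = 4) -> List[float]:
    """Black-pixel density of each cell in an n x n mesh laid over the image."""
    h, w = len(img), len(img[0])
    feats = []
    for gy in range(n):
        y0, y1 = gy * h // n, (gy + 1) * h // n
        for gx in range(n):
            x0, x1 = gx * w // n, (gx + 1) * w // n
            area = max((y1 - y0) * (x1 - x0), 1)  # guard against empty cells
            black = sum(img[y][x] for y in range(y0, y1) for x in range(x0, x1))
            feats.append(black / area)
    return feats


def peripheral_features(img: Image) -> List[float]:
    """Normalized distance from each side of the image to the first black pixel.

    One value per row (from the left and from the right) and per column (from
    the top and from the bottom); a line with no black pixel gets 1.0.
    """
    h, w = len(img), len(img[0])
    cols = [list(c) for c in zip(*img)]
    feats = []
    feats += [next((x for x in range(w) if row[x]), w) / w for row in img]
    feats += [next((x for x in range(w) if row[w - 1 - x]), w) / w for row in img]
    feats += [next((y for y in range(h) if col[y]), h) / h for col in cols]
    feats += [next((y for y in range(h) if col[h - 1 - y]), h) / h for col in cols]
    return feats
```

Both functions return fixed-length vectors only if every image is first normalized to the same size, which is why that normalization is assumed.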
8
Character recognition process
Given image → feature calculation → features → comparison with dictionary data → result
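The comparison step can be sketched as nearest-neighbour matching against every dictionary sample. Cosine similarity is an assumed measure (the slides do not name one), and `classify` and the dictionary layout (label → list of feature vectors) are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def classify(features: List[float],
             dictionary: Dict[str, List[List[float]]]) -> Tuple[str, float]:
    """Return the label of the most similar dictionary sample and its score."""
    best_label, best_score = "", -1.0
    for label, samples in dictionary.items():
        for sample in samples:
            score = cosine_similarity(features, sample)
            if score > best_score:
                best_label, best_score = label, score
    return best_label, best_score
```

Returning the score together with the label keeps it available for later checks on low-similarity results.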
9
Character recognition process (voting)
The given image is classified separately with each feature, and the final result is decided by majority vote over the result from mesh features, the result from peripheral features, and the result from PDC features.
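One way to combine the three per-feature results, sketched under assumptions: each classifier returns a (label, score) pair, and when no label wins a majority the single highest-scoring result is used. That tie-break rule is hypothetical; the slides do not say how a three-way disagreement is resolved.

```python
from collections import Counter
from typing import List, Tuple


def majority_vote(results: List[Tuple[str, float]]) -> str:
    """Combine (label, score) results from the mesh, peripheral and PDC classifiers."""
    counts = Counter(label for label, _ in results)
    label, votes = counts.most_common(1)[0]
    if votes > len(results) / 2:  # a strict majority (2 of 3) decides
        return label
    # Hypothetical fallback: trust the single most confident classifier.
    return max(results, key=lambda r: r[1])[0]
```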
10
Touching characters
A character whose similarity score is low is assumed to be a touching character.
Result / score: ‘O’ / 0.847, ‘(’ / 0.980, ‘y’ / 0.990
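A sketch of the low-score test applied to the example results above. The threshold 0.9 is a hypothetical value, not one given in the slides; with it, only the ‘O’ result would be flagged as a touching character.

```python
TOUCHING_THRESHOLD = 0.9  # hypothetical value; the slides give no threshold


def is_touching(score: float, threshold: float = TOUCHING_THRESHOLD) -> bool:
    """Treat a recognition result with low similarity as a touching character."""
    return score < threshold


results = [("O", 0.847), ("(", 0.980), ("y", 0.990)]
flagged = [label for label, score in results if is_touching(score)]
```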
11
Touching character segmentation (1)
Blur the image → calculate minimal points → estimate cutting lines → comparison → classification
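A simplified sketch of the first two steps, assuming a binary image and a horizontal touch (characters side by side). For brevity it blurs the 1-D vertical projection profile with a [1, 2, 1] kernel instead of blurring the 2-D image, then reports local minima as candidate cutting columns. The later comparison and classification steps (trying each cut and recognizing the pieces) are not shown, and all names are hypothetical.

```python
from typing import List

Image = List[List[int]]  # binary image: 1 = black, 0 = white


def column_profile(img: Image) -> List[int]:
    """Number of black pixels in each column."""
    return [sum(col) for col in zip(*img)]


def blur(profile: List[int]) -> List[float]:
    """Smooth with a [1, 2, 1] / 4 kernel, replicating the border values."""
    n = len(profile)
    out = []
    for i in range(n):
        left = profile[max(i - 1, 0)]
        right = profile[min(i + 1, n - 1)]
        out.append((left + 2 * profile[i] + right) / 4)
    return out


def minimal_points(values: List[float]) -> List[int]:
    """Interior indices that are local minima of the smoothed profile."""
    return [i for i in range(1, len(values) - 1)
            if values[i] < values[i - 1] and values[i] <= values[i + 1]]


def candidate_cuts(img: Image) -> List[int]:
    """Candidate cutting columns for a horizontally touching pair."""
    return minimal_points(blur(column_profile(img)))
```

Smoothing first suppresses single-pixel noise in the profile, so that only broad valleys, such as the thin bridge between two touching glyphs, survive as minimal points.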
12
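Touching character segmentation (2)
Make a projection profile h of the image; positions where adjacent values change sharply, |h_i − h_{i+1}| > θ, are taken as cut positions, and the resulting pieces are recognized.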
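The jump condition |h_i − h_{i+1}| > θ can be sketched directly on a projection profile. Which axis is projected and the value of θ are not given in the slides, so both are parameters here; the function names are hypothetical.

```python
from typing import List

Image = List[List[int]]  # binary image: 1 = black, 0 = white


def row_profile(img: Image) -> List[int]:
    """Horizontal projection: black pixels per row."""
    return [sum(row) for row in img]


def col_profile(img: Image) -> List[int]:
    """Vertical projection: black pixels per column."""
    return [sum(col) for col in zip(*img)]


def cut_positions(h: List[int], theta: int) -> List[int]:
    """Indices i where the profile jumps: |h[i] - h[i+1]| > theta."""
    return [i for i in range(len(h) - 1) if abs(h[i] - h[i + 1]) > theta]
```

For example, the profile [2, 2, 9, 9, 3, 3] with θ = 4 gives cuts after indices 1 and 3, where the profile jumps by 7 and 6.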
13
Segmentation experiment
47 touching characters were found in our experimental data.
Touching type | Samples | Errors | Correct rate
Vertically | 12 | 9 | 25%
Horizontally | 25 | 9 | 64%
Fraction bar | 6 | 2 | 67%
Three characters | 4 | 4 | 0%
14
Correct results
Correctly separated examples, including characters touching a fraction bar
15
Errors
Three touching characters; other types
16
Recognition experiment
Number of symbols: 12659 (touching characters excluded)
The following similar-shaped characters were distinguished: C/c, 1/l, O/0, x/χ, S/s
17
Recognition rate
Font type | Samples | Errors | Rate
Digit | 1940 | 2 | 99.90%
Alphabet | 3811 | 29 | 99.24%
Greek | 518 | 3 | 99.42%
Bold | 171 | 0 | 100.0%
Calligraphy | 204 | 3 | 98.53%
German | 58 | 2 | 96.55%
Symbol | 5957 | 29 | 99.51%
Total | 12659 | 68 | 99.46%
18
Recognition rate (similar-shaped characters)
Type | Samples | Errors | Rate
C, c | 98 | 6 | 93.88%
1, l | 502 | 3 | 99.40%
O, 0 | 368 | 12 | 96.74%
x, χ | 452 | 0 | 100.00%
S, s | 189 | 10 | 94.71%
Total | 1609 | 31 | 98.07%
19
Causes of errors
Cause | Errors
Similar-shaped characters | 31
Other causes | 37
Total | 68
20
Examples of recognition errors
Most errors occurred on small characters such as scripts.
21
Summary of the experiment
- Collecting fonts from the target document itself achieves good recognition results.
- Most errors occurred on ambiguous or small characters.
- Separating touching characters is difficult, and its performance is not yet sufficient.
22
Our previous methods (1)
Projection profile cutting
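Projection profile cutting can be illustrated with a recursive X-Y cut: split a region at blank rows of its horizontal projection, split each part at blank columns of its vertical projection, and continue alternately until neither direction yields a split. This is a generic sketch of the technique under an assumed box convention, not the authors' implementation; the names are hypothetical.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # binary image: 1 = black, 0 = white
Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive


def _runs(profile: List[int]) -> List[Tuple[int, int]]:
    """Maximal [start, end) runs of non-zero entries in a projection profile."""
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, v in enumerate(profile):
        if v and start is None:
            start = i
        elif not v and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def xy_cut(img: Image, box: Box, horizontal: bool = True,
           tried_other: bool = False) -> List[Box]:
    """Recursively split a region at blank rows/columns; return the leaf boxes."""
    top, left, bottom, right = box
    if horizontal:  # one value per row: cut at blank rows
        profile = [sum(img[y][left:right]) for y in range(top, bottom)]
        parts = [(top + s, left, top + e, right) for s, e in _runs(profile)]
    else:  # one value per column: cut at blank columns
        profile = [sum(img[y][x] for y in range(top, bottom)) for x in range(left, right)]
        parts = [(top, left + s, bottom, left + e) for s, e in _runs(profile)]
    if len(parts) > 1:
        return [leaf for p in parts for leaf in xy_cut(img, p, not horizontal)]
    if len(parts) == 1 and not tried_other:
        return xy_cut(img, parts[0], not horizontal, True)  # try the other direction
    return parts  # no further split possible (empty if the region is blank)
```

Each leaf is a tight bounding box; in formula images, structures such as scripts or roots often cannot be separated by blank lines alone, which is why additional structure processing is needed.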
23
Our previous methods (2)
Specific structure processing (bottom-up): script, root, matrix
Fundamental structure processing (top-down):
- Vertical division by symbols
- Horizontal division by symbols
- Horizontal division by blank space
- Core symbol in subexpression
24
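Outline of structure recognition
Image → character recognition → structure recognition → output in LaTeX/MathML
Within structure recognition, target symbols are processed from top to bottom:
- Group A processing: target symbol is a fraction, root, or matrix
- Group B processing: target symbol is a script or limit
Subexpressions are processed by recursion, and the results are combined by horizontal connection.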
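The recursive LaTeX output stage can be sketched as a walk over a structure tree. The `Node` type, its `kind` values, and the child layout are hypothetical stand-ins for whatever the recognizer actually produces; limits reuse the script case (for example a `\lim` base with a `sub` node), and MathML output is not shown.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical node of a recognized structure tree."""
    kind: str  # "symbol", "row", "fraction", "root", "script", "matrix"
    text: str = ""  # symbol text (only for kind == "symbol")
    children: List["Node"] = field(default_factory=list)
    sub: Optional["Node"] = None  # subscript or lower limit
    sup: Optional["Node"] = None  # superscript or upper limit


def to_latex(node: Node) -> str:
    """Recursively convert a structure tree to a LaTeX string."""
    if node.kind == "symbol":
        return node.text
    if node.kind == "row":  # horizontally connected subexpressions
        return " ".join(to_latex(c) for c in node.children)
    if node.kind == "fraction":  # children: [numerator, denominator]
        num, den = node.children
        return r"\frac{" + to_latex(num) + "}{" + to_latex(den) + "}"
    if node.kind == "root":  # children: [radicand]
        return r"\sqrt{" + to_latex(node.children[0]) + "}"
    if node.kind == "script":  # children: [base]; also used for limits
        out = to_latex(node.children[0])
        if node.sub is not None:
            out += "_{" + to_latex(node.sub) + "}"
        if node.sup is not None:
            out += "^{" + to_latex(node.sup) + "}"
        return out
    if node.kind == "matrix":  # children: rows, each a "row" node of cells
        rows = [" & ".join(to_latex(c) for c in r.children) for r in node.children]
        return r"\begin{pmatrix}" + r" \\ ".join(rows) + r"\end{pmatrix}"
    raise ValueError(f"unknown node kind: {node.kind}")
```

For example, a row holding x with superscript 2, a plus sign, and the fraction 1 over n becomes `x^{2} + \frac{1}{n}`.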
25
Structure recognition (1/2)
Group A: fractions, roots, and matrices. Subexpressions are located relative to the target symbol; matrices are passed to matrix recognition.
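For a fraction, locating subexpressions relative to the target symbol can be sketched as follows, assuming bounding boxes with y growing downward. The rule used here (a symbol belongs to the fraction if its horizontal centre lies within the bar's extent, and to the numerator or denominator depending on whether it lies entirely above or below the bar) is an illustrative assumption, not the method described in the slides.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right); y grows downward


def split_fraction(bar: Box, symbols: List[Box]) -> Tuple[List[Box], List[Box]]:
    """Assign symbols within the bar's horizontal extent to numerator or denominator."""
    bar_top, bar_left, bar_bottom, bar_right = bar
    numerator: List[Box] = []
    denominator: List[Box] = []
    for box in symbols:
        top, left, bottom, right = box
        cx = (left + right) / 2
        if not (bar_left <= cx <= bar_right):
            continue  # outside the fraction: left for other processing
        if bottom <= bar_top:
            numerator.append(box)
        elif top >= bar_bottom:
            denominator.append(box)
    return numerator, denominator
```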
26
Structure recognition (2/2)
Group B: scripts and limits, determined from the position of an adjacent symbol relative to the target symbol.
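A sketch of classifying an adjacent symbol relative to the target symbol, assuming bounding boxes with y growing downward. The geometric rules and the `ratio` margin (0.25 of the target height) are hypothetical choices for illustration: a symbol directly above or below the target is treated as a limit; otherwise its vertical centre decides between superscript, subscript, and same line.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right); y grows downward


def relation(target: Box, adjacent: Box, ratio: float = 0.25) -> str:
    """Classify an adjacent symbol as a limit, a script, or on the same line."""
    t_top, t_left, t_bottom, t_right = target
    a_top, a_left, a_bottom, a_right = adjacent
    overlaps_x = a_left < t_right and a_right > t_left
    if overlaps_x and a_top >= t_bottom:
        return "lower limit"
    if overlaps_x and a_bottom <= t_top:
        return "upper limit"
    margin = (t_bottom - t_top) * ratio  # hypothetical tolerance band
    center = (a_top + a_bottom) / 2
    if center < t_top + margin:
        return "superscript"
    if center > t_bottom - margin:
        return "subscript"
    return "same line"
```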
27
Matrix recognition
Matrix elements are grouped into rows by vertical overlap and into columns by horizontal overlap.
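Grouping by overlap can be sketched as merging overlapping intervals along one axis: sort the element boxes by their start coordinate and sweep, opening a new group whenever a box starts past the current group's end. Applying this to the vertical intervals gives rows, and applying it to the horizontal intervals gives columns. The box layout and function names are assumptions for illustration.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right)


def group_by_overlap(boxes: List[Box], lo: int, hi: int) -> List[List[Box]]:
    """Group boxes whose [box[lo], box[hi]) intervals overlap, transitively."""
    groups: List[List[Box]] = []
    end: Optional[int] = None
    for box in sorted(boxes, key=lambda b: b[lo]):
        if groups and end is not None and box[lo] < end:
            groups[-1].append(box)
            end = max(end, box[hi])
        else:
            groups.append([box])
            end = box[hi]
    return groups


def matrix_cells(elements: List[Box]) -> Tuple[List[List[Box]], List[List[Box]]]:
    """Rows from vertical overlap, columns from horizontal overlap."""
    rows = group_by_overlap(elements, 0, 2)  # compare (top, bottom)
    cols = group_by_overlap(elements, 1, 3)  # compare (left, right)
    return rows, cols
```

Because groups merge transitively, a tall element spanning two rows would join them; handling such cases, or sparse matrices with empty cells, would need extra rules beyond this sketch.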
28
Case-distinction
Processed with the same vertical and horizontal overlap tests as matrices, within the region between the left parenthesis and the right edge.
29
Answer database
Original expression: (β α)(e i) = ...
Each answer is stored in a database format that includes positional information.
30
Comparison between results and answers
(a) Original expression: (β α)(e i) = ...
(b) Recognition result: (β α)(e i) = ...
Each structure in the answer is searched for in the recognition result; it counts as correctly recognized if found, and as an error if not found.
Structure | Correctly recognized (C) | In original expression (N)
Fractions | 1 | 1
Scripts | 1 | 2
Recognition rate = C / N
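The per-structure count can be sketched as multiset matching: each answer item, a (structure type, position key) pair, is looked up in the recognition result and consumed if found. The position keys stand in for the database's positional information and are hypothetical, as is the function name.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Item = Tuple[str, str]  # (structure type, hypothetical position key)


def recognition_rates(answers: Iterable[Item],
                      results: Iterable[Item]) -> Dict[str, Tuple[int, int, float]]:
    """Per structure type: (C, N, C / N), counting answer items found in the results."""
    found = Counter(results)
    c: Counter = Counter()
    n: Counter = Counter()
    for item in answers:
        kind = item[0]
        n[kind] += 1
        if found[item] > 0:  # "found": consume one matching result item
            found[item] -= 1
            c[kind] += 1
    return {k: (c[k], n[k], c[k] / n[k]) for k in n}
```

With one fraction and two scripts in the answer, and one script missing from the result, this reproduces the example counts: fractions 1/1 and scripts 1/2.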
31
Correct results (1/4)
Arch. Math., Vol. 64, page 44: limit
32
Correct results (2/4)
Arch. Math., Vol. 65, page 272: multi-fraction
33
Correct results (3/4)
Arch. Math., Vol. 64, page 277: sparse matrix (original expression and recognition result)
34
Correct results (4/4)
Arch. Math., Vol. 64, page 108: nested case-distinction (original expression and recognition result)
35
Errors (1/2)
Arch. Math., Vol. 24, page 65: matrix (original expression and recognition result)
36
Errors (2/2)
Arch. Math., Vol. 64, page 104: case-distinction (original expression and recognition result)
37
Inapplicable expressions (1)
38
Inapplicable expressions (2)
39
Inapplicable expressions (3)
40
Structure recognition rate
Structure | Total | Errors | Correct rate (%)
Scripts | 3841 | 43 | 98.9
Limits | 605 | 37 | 93.9
Fractions | 119 | 3 | 97.5
Roots | 70 | 1 | 98.6
Matrices (case-distinctions) | 66 | 8 | 87.9
Total | 4701 | 92 | 98.0
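The correct rates in the structure table follow from correct rate = (total − errors) / total × 100, which is the same C / N ratio with C = total − errors. A short check of that arithmetic using the table's counts:

```python
def correct_rate(total: int, errors: int) -> float:
    """Correct rate in percent: (total - errors) / total * 100."""
    return 100 * (total - errors) / total


table = {
    "Scripts": (3841, 43),
    "Limits": (605, 37),
    "Fractions": (119, 3),
    "Roots": (70, 1),
    "Matrices (case-distinctions)": (66, 8),
}
total = sum(t for t, _ in table.values())
errors = sum(e for _, e in table.values())
for name, (t, e) in table.items():
    print(f"{name}: {correct_rate(t, e):.1f}%")
print(f"Total: {total} structures, {errors} errors, {correct_rate(total, errors):.1f}%")
```

Each printed rate rounds to the value in the table, and the per-structure counts sum to 4701 structures with 92 errors.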
41
Summary of structure recognition
- Extension of the recognition method: matrices and case-distinctions
- Performance evaluation: quantitative evaluation on a large number of expressions, with automatic calculation of the recognition rate for each typical structure
42
Future work
- Improvement of touching character separation
- Extension to inapplicable expressions
- Experiments on other documents