A mathematical formula recognition method and its performance evaluation Masayuki Okamoto Shinshu University JAPAN.


1 A mathematical formula recognition method and its performance evaluation Masayuki Okamoto Shinshu University JAPAN

2 Overview of presentation: Goal of our study; Character and symbol recognition; Structure analysis and recognition; Performance evaluation method; Experimental results; Future work

3 Goal of our study: High-performance formula recognition system for “Archiv der Mathematik”

4 Overview of Recognition System: Labeling; Character or symbol recognition; Structure recognition; Touching character separation

5 Font type (1/2): Alphabet (1. Roman, 2. Italic, 3. Bold, 4. Calligraphy, 5. German); Greek

6 Font type (2/2): Digits; Mathematical symbols; Characters (Normal size, Small size). Number of characters or symbols: 650

7 Dictionary data: The following three features are calculated from each sample image and stored as dictionary features: 1. Mesh features 2. Peripheral features 3. PDC features (image → feature calculation → features → dictionary)
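
The slide names mesh features without defining them. As a minimal sketch, assuming the common definition (ink density per cell of a regular grid laid over the binarized character image), the calculation could look like the following. The function name `mesh_features`, the `grid` parameter, and the 0/1 list-of-rows image format are illustrative assumptions, not the authors' implementation.

```python
def mesh_features(image, grid=4):
    """Return the ink density of each cell in a grid x grid mesh.

    `image` is a list of rows of 0/1 integers (1 = ink). The grid size
    and the density definition are assumptions for illustration.
    """
    height, width = len(image), len(image[0])
    features = []
    for gy in range(grid):
        # Row range covered by this band of cells.
        y0, y1 = gy * height // grid, (gy + 1) * height // grid
        for gx in range(grid):
            # Column range covered by this cell.
            x0, x1 = gx * width // grid, (gx + 1) * width // grid
            cell = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            # Empty cells (image smaller than the grid) contribute 0.0.
            features.append(sum(cell) / len(cell) if cell else 0.0)
    return features


sample = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]
print(mesh_features(sample, grid=2))
```

A feature vector of this kind would be computed once per dictionary sample and once per input image, so that the two can be compared.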

8 Character recognition process: Given image → Feature calculation → features → Comparison with dictionary data → result

9 Character recognition process: We classify the given image with each feature separately and take the majority vote of the three results (result from mesh features, result from peripheral features, result from PDC features → majority vote)
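
The voting step can be sketched as below. The slide does not say what happens when all three classifiers disagree, so the fallback to the mesh result is an assumption made only to keep the example complete; `majority_vote` is a hypothetical name.

```python
from collections import Counter


def majority_vote(mesh_label, peripheral_label, pdc_label):
    """Combine three per-feature classification results by majority vote."""
    votes = [mesh_label, peripheral_label, pdc_label]
    label, count = Counter(votes).most_common(1)[0]
    # Assumption: with a three-way disagreement, fall back to the mesh result.
    return label if count >= 2 else mesh_label


print(majority_vote("O", "0", "0"))
```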

10 Touching characters: We treat a character with a low similarity score as a touching character. Result/Score: ‘O’/0.847, ‘(’/0.980, ‘y’/0.990
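
A minimal sketch of this check, using the three result/score pairs from the slide. The cut-off of 0.9 is a hypothetical value chosen for illustration; the presentation does not state the threshold that was used.

```python
SIMILARITY_THRESHOLD = 0.9  # hypothetical value, not given in the slides


def touching_candidates(results, threshold=SIMILARITY_THRESHOLD):
    """Return the labels whose similarity score falls below the threshold."""
    return [label for label, score in results if score < threshold]


results = [("O", 0.847), ("(", 0.980), ("y", 0.990)]
print(touching_candidates(results))
```

With this assumed threshold only the first result would be passed on to the segmentation step.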

11 Touching character segmentation (1): Blurring the image → Calculate minimal points → Estimate cutting lines → Comparison → Classification

12 Touching character segmentation (2): Image → Make projection profile → $|h_i - h_{i+1}| > \theta$ → Recognize
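
The condition on the profile can be illustrated as follows. This sketch assumes that h_i is the ink count of column i (a vertical projection) and that positions where neighbouring values differ by more than θ are treated as candidate cut positions; the slide states neither point explicitly, and the function names are illustrative.

```python
def projection_profile(image):
    """Count ink pixels in each column of a 0/1 image (list of rows)."""
    width = len(image[0])
    return [sum(row[x] for row in image) for x in range(width)]


def cut_candidates(profile, theta):
    """Return every index i where |h_i - h_{i+1}| > theta."""
    return [
        i
        for i in range(len(profile) - 1)
        if abs(profile[i] - profile[i + 1]) > theta
    ]


image = [
    [1, 1, 0, 1],
    [1, 1, 0, 1],
    [1, 0, 0, 1],
]
profile = projection_profile(image)
print(profile, cut_candidates(profile, theta=1))
```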

13 Segmentation experiment: 47 touching characters found in our experimental data
Touching type     Samples  Errors  Correct rate
Vertically             12       9           25%
Horizontally           25       9           64%
Fraction bar            6       2           67%
Three characters        4       4            0%

14 Correct results: Correct examples; Touching with a fraction bar

15 Errors: Three touching characters; Other types

16 Recognition experiment: Number of symbols: 12659. We excluded touching characters. We distinguished the following similar-shaped characters

17 Recognition rate
Font type    Samples  Errors  Rate
Digit           1940       2  99.90%
Alphabet        3811      29  99.24%
Greek            518       3  99.42%
Bold             171       0  100.0%
Calligraphy      204       3  98.53%
German            58       2  96.55%
Symbol          5957      29  99.85%
Total          12659      68  99.95%

18 Recognition rate: Similar shaped characters
Type   Samples  Errors  Rate
C,c         98       6  93.88%
1,l        502       3  99.40%
O,0        368      12  96.74%
x,χ        452       0  100.00%
S,s        189      10  94.71%
Total     1609      31  98.07%

19 Causes of errors
Cause                      Errors
Similar shaped characters      31
Other causes                   37
Total                          68

20 Examples of recognition errors: Most errors occurred in small characters such as scripts

21 Summary for experiment: Collecting fonts from the target document can achieve good recognition results. Most errors occurred in ambiguous and small characters. Separation of touching characters is difficult, and its performance is not yet sufficient

22 Our previous methods (1): Projection profile cutting

23 Our previous methods (2): Specific structure processing (Bottom-up): Script, Root, Matrix. Fundamental structure processing (Top-down): Vertical division by symbols, Horizontal division by symbols, Horizontal division by blank space, Core symbol in subexpression

24 Outline of structure recognition: Image → Character recognition → Structure recognition → Output in LaTeX/MathML. Structure recognition: Target symbol; Horizontal connection; Top to bottom; Group A processing [symbol = fraction, root, matrix]; Group B processing [symbol = script, limit]; Recursion
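
The recursion and the LaTeX output stage can be pictured with a small sketch. It assumes that structure recognition has already produced a nested tree; the tuple-based node format and the node names ("row", "frac", "sqrt", "sup", "sub") are hypothetical and are not taken from the presentation.

```python
def to_latex(node):
    """Recursively convert a recognized structure tree to a LaTeX string.

    A node is either a plain symbol (str) or a tuple whose first element
    names the structure. This node format is an illustrative assumption.
    """
    if isinstance(node, str):
        return node
    kind = node[0]
    if kind == "row":  # symbols connected horizontally
        return " ".join(to_latex(child) for child in node[1:])
    if kind == "frac":  # numerator and denominator are subexpressions
        return r"\frac{%s}{%s}" % (to_latex(node[1]), to_latex(node[2]))
    if kind == "sqrt":
        return r"\sqrt{%s}" % to_latex(node[1])
    if kind == "sup":  # superscript
        return "%s^{%s}" % (to_latex(node[1]), to_latex(node[2]))
    if kind == "sub":  # subscript
        return "%s_{%s}" % (to_latex(node[1]), to_latex(node[2]))
    raise ValueError("unknown node kind: %r" % (kind,))


tree = ("row", ("frac", ("sup", "x", "2"), ("sqrt", "y")), "=", ("sub", "e", "i"))
print(to_latex(tree))
```

Each structure calls back into the same function for its subexpressions, which is the role recursion plays in the outline above.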

25 Structure Recognition (1/2): Fractions; Roots; Matrices (figure labels: Target symbol, Matrix Recognition)

26 Structure Recognition (2/2): Scripts; Limits (figure labels: Adjacent symbol, Target symbol)

27 Matrix Recognition: Vertical Overlap; Horizontal Overlap

28 Case-distinction: Vertical Overlap; Horizontal Overlap; Right Edge; Left Parenthesis
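
The slide only names the two overlap tests. One plausible reading, offered here as an assumption rather than as the authors' algorithm, is that matrix elements whose bounding boxes overlap in vertical extent form a row, and those that overlap in horizontal extent form a column. The box format `(x0, y0, x1, y1)` and the function name are hypothetical.

```python
def group_by_overlap(boxes, axis):
    """Group bounding boxes whose extents overlap along one axis.

    `boxes` are (x0, y0, x1, y1) tuples. axis=1 compares vertical extents
    (giving rows); axis=0 compares horizontal extents (giving columns).
    """
    lo, hi = axis, axis + 2
    groups = []
    for box in sorted(boxes, key=lambda b: b[lo]):
        # Boxes are visited in order of their start coordinate, so a box
        # overlaps the current group if it starts before the group ends.
        if groups and box[lo] < groups[-1]["end"]:
            groups[-1]["boxes"].append(box)
            groups[-1]["end"] = max(groups[-1]["end"], box[hi])
        else:
            groups.append({"end": box[hi], "boxes": [box]})
    return [group["boxes"] for group in groups]


elements = [(0, 0, 10, 10), (20, 1, 30, 9), (0, 20, 10, 30), (20, 21, 30, 29)]
rows = group_by_overlap(elements, axis=1)
columns = group_by_overlap(elements, axis=0)
print(len(rows), len(columns))
```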

29 Answer Database: Original expression: ( β α ) ( e i ) = ... ; Format; Positional Information

30 Comparison between Results and Answers: (a) Original expression: ( β α ) ( e i ) = ... (b) Recognition result: ( β α ) ( e i ) = ... Each structure is marked found or not found. Correct Recognition Count:
Structure  Number correctly recognized (C)  Number in original expression (N)
Fractions                                1                                  1
Scripts                                  1                                  2
Recognition rate = C / N
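
The rate C / N is simple to compute per structure type. The sketch below uses hypothetical function names; the worked check takes the Scripts figures reported on the Structure Recognition Rate slide (3841 in total, 43 errors, so C = 3841 - 43).

```python
def recognition_rate(correct, total):
    """Recognition rate = C / N for one structure type."""
    if total == 0:
        raise ValueError("N must be positive")
    return correct / total


def rates_by_structure(counts):
    """Map each structure name to its rate, given {name: (C, N)}."""
    return {name: recognition_rate(c, n) for name, (c, n) in counts.items()}


# Scripts: N = 3841 expressions, 43 errors, so C = 3841 - 43.
scripts_rate = recognition_rate(3841 - 43, 3841)
print(round(100 * scripts_rate, 1))
```

Rounded to one decimal place, this gives the 98.9 % listed for scripts.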

31 Correct Results (1/4): Limit (Arch. Math., Vol. 64, Page 44)

32 Correct Results (2/4): Multi-fraction (Arch. Math., Vol. 65, Page 272)

33 Correct Results (3/4): Sparse matrix (Arch. Math., Vol. 64, Page 277); Original expression; Recognition result

34 Correct Results (4/4): Nested case-distinction (Arch. Math., Vol. 64, Page 108); Original expression; Recognition result

35 Errors (1/2): Matrix (Arch. Math., Vol. 24, Page 65); Original expression; Recognition result

36 Errors (2/2): Case-distinction (Arch. Math., Vol. 64, Page 104); Original expression; Recognition result

37 Inapplicable expressions (1)

38 Inapplicable expressions (2)

39 Inapplicable expressions (3)

40 Structure Recognition Rate
Structure                     Total  Error  Correct rate (%)
Scripts                        3841     43              98.9
Limits                          605     37              93.9
Fractions                       119      3              97.5
Roots                            70      1              98.6
Matrices (Case-distinctions)     66      8              87.9
Total                          4701     92              98.0

41 Summary of structure recognition: Extension of recognition method: matrix and case-distinction. Performance evaluation: quantitative evaluation for a large number of expressions; automatic calculation of recognition rate for each typical structure

42 Future work: Improvement of touching character separation; Extension for inapplicable expressions; Experiments on other documents

