1
ExpressReader Pro adapted to the retrodigitization of mathematical documents
Kazuaki Yokota
2
ExpressReader Pro Features
■Printed-text OCR
■Japanese / English
■Recognition rate: 99.7% for Japanese, 99.8% for English
■Powerful layout analysis
■For x86-based Windows PCs
3
Layout analysis 1
4
Layout analysis 2
5
Problems in adapting to mathematical documents
■Application framework
■Detection and recognition of mathematical formulae
■Output format
6
Flow diagram
Stages in the diagram: image scanning, skew correction, layout analysis, character recognition, formula detection, formula recognition, user modification, output conversion
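The flow diagram can be read as a chain of processing stages. Below is a minimal Python sketch of that chain, assuming the formula detection and formula recognition branch runs between character recognition and user modification (the slide does not fix the order). The stage functions are placeholders that only record that they ran; `make_stage`, `PIPELINE` and `run` are hypothetical names, not ExpressReader Pro APIs.

```python
from typing import Callable, Dict, List

Doc = Dict[str, object]

def make_stage(name: str) -> Callable[[Doc], Doc]:
    """Build a placeholder stage (hypothetical; real stages would transform doc)."""
    def stage(doc: Doc) -> Doc:
        # Record that this stage ran, in order.
        log: List[str] = doc.setdefault("log", [])  # type: ignore[assignment]
        log.append(name)
        return doc
    return stage

# Assumed order: the formula branch follows character recognition.
PIPELINE = [make_stage(n) for n in (
    "image scanning", "skew correction", "layout analysis",
    "character recognition", "formula detection", "formula recognition",
    "user modification", "output conversion",
)]

def run(doc: Doc) -> Doc:
    """Pass the document through every stage in turn."""
    for stage in PIPELINE:
        doc = stage(doc)
    return doc

result = run({"image": "page-001.tif"})
print(" -> ".join(result["log"]))
```

Keeping every stage behind the same `Doc -> Doc` signature makes it easy to insert the formula stages into an existing text-OCR chain without touching the other stages.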
7
Component relation
Components in the diagram: scanning, graphical user interface, layout analysis, character recognition, formula detection, INFTY formula recognition
8
Formula detection 1
■Score each word obtained by character recognition twice: once as a mathematical formula (M) and once as a text word (T).
Example scores for eight words:
Word   1    2    3    4    5    6    7    8
M      0   90  100  100    0   90   70   90
T    100   40   20   20  100   40   70   90
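The slide does not say how the two scores are compared. As a minimal Python sketch, assuming the larger score simply wins and ties are left for the later grammar-based parse, the eight example words can be labeled like this (`label_words` is a hypothetical helper, not part of the product):

```python
from typing import List

# Scores copied from the slide: M = formula score, T = text score, per word.
M = [0, 90, 100, 100, 0, 90, 70, 90]
T = [100, 40, 20, 20, 100, 40, 70, 90]

def label_words(m_scores: List[int], t_scores: List[int]) -> List[str]:
    """Label each word 'M' (formula), 'T' (text) or '?' (tie).

    Assumption: the larger score wins; ties are deferred to the parser.
    """
    labels = []
    for m, t in zip(m_scores, t_scores):
        if m > t:
            labels.append("M")
        elif t > m:
            labels.append("T")
        else:
            labels.append("?")
    return labels

print(label_words(M, T))
```

With these scores words 7 and 8 tie, so a per-word rule alone cannot decide them; context from the surrounding words is needed.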
9
Formula detection 2
■Parse with a context-free grammar (CFG)
- Formula is itself a non-terminal symbol of this CFG.
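The slide gives no grammar rules. The sketch below assumes a toy weighted grammar in which a line (S) is a sequence of Text and Formula non-terminals, a Formula can span several words, and grouping adjacent words of the same kind earns a hypothetical bonus (`MERGE_BONUS`). It runs a max-score (Viterbi-style) CYK parse over the eight word scores from the "Formula detection 1" slide. The rules, the bonus and all names are illustrative assumptions, not the product's actual grammar.

```python
# Word scores from "Formula detection 1": M = formula score, T = text score.
M = [0, 90, 100, 100, 0, 90, 70, 90]
T = [100, 40, 20, 20, 100, 40, 70, 90]

MERGE_BONUS = 10  # hypothetical reward for grouping adjacent words
BINARY = [  # (parent, left child, right child, bonus); toy rules only
    ("Formula", "Formula", "Formula", MERGE_BONUS),
    ("Text", "Text", "Text", MERGE_BONUS),
    ("S", "S", "S", 0),
]
UNARY = [("S", "Formula"), ("S", "Text")]  # (parent, child), score unchanged

def cyk(m, t):
    """best[i][j][X] = (score, backpointer) for X spanning words i..j-1."""
    n = len(m)
    best = [[{} for _ in range(n + 1)] for _ in range(n + 1)]

    def add_unary(cell):
        # One pass suffices: no unary chains in this grammar.
        for parent, child in UNARY:
            if child in cell and (parent not in cell
                                  or cell[child][0] > cell[parent][0]):
                cell[parent] = (cell[child][0], ("unary", child))

    for i in range(n):  # lexical rules: each word as Formula or Text
        cell = best[i][i + 1]
        cell["Formula"] = (m[i], ("word", i))
        cell["Text"] = (t[i], ("word", i))
        add_unary(cell)
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            cell = best[i][j]
            for k in range(i + 1, j):  # try every split point
                left, right = best[i][k], best[k][j]
                for parent, lhs, rhs, bonus in BINARY:
                    if lhs in left and rhs in right:
                        score = left[lhs][0] + right[rhs][0] + bonus
                        if parent not in cell or score > cell[parent][0]:
                            cell[parent] = (score, ("split", k, lhs, rhs))
            add_unary(cell)
    return best

def labels(best, i, j, symbol, out):
    """Follow backpointers and write each word's preterminal into out."""
    _, back = best[i][j][symbol]
    if back[0] == "word":
        out[back[1]] = symbol
    elif back[0] == "unary":
        labels(best, i, j, back[1], out)
    else:
        _, k, lhs, rhs = back
        labels(best, i, k, lhs, out)
        labels(best, k, j, rhs, out)
    return out

best = cyk(M, T)
n = len(M)
tags = labels(best, 0, n, "S", [None] * n)
print(best[0][n]["S"][0], tags)
```

Under these assumptions the tied words 7 and 8 end up in the formula that starts at word 6, because extending that formula earns more merge bonus than starting a new text run. This is the kind of contextual decision that treating Formula as a non-terminal makes possible.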
11
XML-based processing
■Input: recognition parameters, image
■During processing: layout information, etc.
■Output: recognition results
OCR needs various data during processing. To integrate OCR into a particular application system, the user must write code to handle all of these data.
→ Unify them in XML
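Keeping the input, the intermediate layout data and the output in one XML tree means an application only needs a single parser. A minimal Python sketch using the standard `xml.etree.ElementTree` module; the element and attribute names (`ocrDocument`, `parameters`, `layout`, `block`, `result`) and the helper `build_document` are hypothetical, since the slides elide the real schema.

```python
import xml.etree.ElementTree as ET

def build_document(image_path, params, blocks):
    """Collect input, layout and results in one tree (hypothetical schema)."""
    root = ET.Element("ocrDocument")
    # Input: recognition parameters and a reference to the image.
    param_el = ET.SubElement(root, "parameters")
    for name, value in params.items():
        ET.SubElement(param_el, "param", name=name, value=str(value))
    ET.SubElement(root, "image", href=image_path)
    # During processing: layout information, one block per region.
    layout = ET.SubElement(root, "layout")
    for block_id, (kind, bbox, text) in enumerate(blocks):
        blk = ET.SubElement(layout, "block", id=str(block_id), type=kind,
                            bbox=",".join(map(str, bbox)))
        # Output: the recognized result is attached to its layout block.
        ET.SubElement(blk, "result").text = text
    return root

root = build_document(
    "page-001.tif",
    {"language": "en", "resolution": 400},
    [("text", (10, 10, 500, 40), "Let f be continuous."),
     ("formula", (10, 50, 300, 90), "f(x) = x^2")],
)
print(ET.tostring(root, encoding="unicode"))
```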
12
XML-based processing
Diagram components: layout analysis, character recognition, formula detection, graphical user interface, XML
13
Advantages of XML
■Easy to convert to other formats (XSLT)
■Easy to process (DOM/SAX)
■Extensible and flexible
■MathML
■Platform-independent
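The DOM/SAX point can be illustrated with Python's standard library: DOM loads the whole tree and then queries it, while SAX streams parse events and keeps only the state it needs. The sample document and its element names are hypothetical; `formulas_dom`, `formulas_sax` and `FormulaHandler` are illustrative helpers that both extract the formula results.

```python
import xml.dom.minidom
import xml.sax

# A hypothetical result document; element names are illustrative only.
DOC = """<ocrDocument>
  <layout>
    <block type="text"><result>Let f be continuous.</result></block>
    <block type="formula"><result>f(x) = x^2</result></block>
    <block type="formula"><result>a + b</result></block>
  </layout>
</ocrDocument>"""

def formulas_dom(xml_text):
    """DOM: build the whole tree in memory, then query it."""
    dom = xml.dom.minidom.parseString(xml_text)
    return [b.getElementsByTagName("result")[0].firstChild.data
            for b in dom.getElementsByTagName("block")
            if b.getAttribute("type") == "formula"]

class FormulaHandler(xml.sax.ContentHandler):
    """SAX: react to start/end/character events as the document streams."""
    def __init__(self):
        super().__init__()
        self.in_formula = False
        self.in_result = False
        self.formulas = []

    def startElement(self, name, attrs):
        if name == "block":
            self.in_formula = attrs.get("type") == "formula"
        elif name == "result" and self.in_formula:
            self.in_result = True
            self.formulas.append("")

    def endElement(self, name):
        if name == "result":
            self.in_result = False
        elif name == "block":
            self.in_formula = False

    def characters(self, content):
        # characters() may fire several times per text node, so append.
        if self.in_result:
            self.formulas[-1] += content

def formulas_sax(xml_text):
    handler = FormulaHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.formulas

print(formulas_dom(DOC))
print(formulas_sax(DOC))
```

DOM is simpler to query; SAX uses memory proportional to the nesting depth rather than the document size, which matters for long multi-page results. The XSLT route needs a separate processor (for example `lxml.etree.XSLT`), which is not in the standard library.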
14
XML format 1
…… Recognition parameters
….. Recognized results (after recognition)
15
XML format 2
……
16
XML format 3
…. Mathematical formulae
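The slide's actual formula markup is elided. Since MathML is listed among the advantages of XML, here is a sketch of how a recognized formula such as x^2 + 1 could be expressed in standard presentation MathML, built with Python's `xml.etree.ElementTree`. This is generic MathML, not ExpressReader Pro's format; the helper `m` is hypothetical.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"
# Serialize MathML elements without a prefix (xmlns="..." on the root).
ET.register_namespace("", MATHML_NS)

def m(tag, *children, text=None):
    """Create a MathML element in the MathML namespace."""
    el = ET.Element(f"{{{MATHML_NS}}}{tag}")
    el.text = text
    el.extend(children)
    return el

# x^2 + 1: <msup> holds the base and the exponent, <mo> the operator.
formula = m("math",
            m("mrow",
              m("msup", m("mi", text="x"), m("mn", text="2")),
              m("mo", text="+"),
              m("mn", text="1")))

print(ET.tostring(formula, encoding="unicode"))
```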
17
Demonstration ■….
18
Product forms
■Software development kit
■Simple OCR software
For x86-based Windows PCs
19
Summary
■A more convenient GUI is needed
■We hope our product will make your business more efficient.