ExpressReader Pro adapted to the retrodigitization of mathematical documents
Kazuaki Yokota
ExpressReader Pro: Features
■Printed text OCR
■Japanese / English
■Recognition rate: 99.7% for Japanese, 99.8% for English
■Powerful layout analysis
■For x86-based Windows PCs
Layout analysis 1
Layout analysis 2
Adaptation for mathematical documents: Problems
■Application framework
■Detection and recognition of mathematical formulae
■Output format
Flow diagram
[Figure: flow diagram with the boxes Image scanning, Skew correction, Layout analysis, Character recognition, User modification, Output conversion, Formula recognition, Formula detection]
Component relation
[Figure: component diagram with the boxes Scanning, Graphical User Interface, INFTY formula recognition, Layout analysis, Character recognition, Formula detection]
Formula detection 1
■Score each word obtained by character recognition both as a mathematical formula (M) and as a text word (T).
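The slides do not say how the two scores are computed. As a minimal sketch, assuming a plain character-class heuristic, the idea of giving every recognized word a formula score M and a text score T could look like the following. The names `MATH_CHARS`, `score_word` and `label`, and the weighting itself, are hypothetical and are not taken from ExpressReader Pro.

```python
import string
from typing import Tuple

# Hypothetical set of characters that hint at a formula (an assumption for illustration).
MATH_CHARS = set("+-=<>/^_()[]{}|*") | set(string.digits)

def score_word(word: str) -> Tuple[float, float]:
    """Return (M, T): illustrative scores for 'mathematical formula' and 'text word'."""
    if not word:
        return 0.0, 0.0
    math_like = sum(ch in MATH_CHARS for ch in word)
    # Treat a lone letter such as "x" as a variable, except common one-letter English words.
    if len(word) == 1 and word.isalpha() and word not in ("a", "A", "I"):
        math_like = 1
    m = math_like / len(word)
    return m, 1.0 - m

def label(word: str) -> str:
    """Pick the tag with the higher score; ties fall back to text."""
    m, t = score_word(word)
    return "M" if m > t else "T"
```

With this heuristic, `label("x^2+1")` yields `"M"` and `label("theorem")` yields `"T"`. A real system would more plausibly derive both scores from the recognizer's confidence values and a dictionary, which the slides leave unspecified.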
Formula detection 2
■Parse with a context-free grammar (CFG)
- Formula is also a non-terminal symbol of this CFG.
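To illustrate what it means for Formula to be a non-terminal, here is a minimal sketch with a toy grammar. The grammar, the `M`/`T` tags on the input tokens, and the function names are assumptions made for this example; the product's actual grammar is not given in the slides. This toy grammar happens to be regular, whereas a realistic one would allow nesting.

```python
from typing import List, Tuple

# Toy grammar (hypothetical):
#   Line    -> Segment Line | Segment
#   Segment -> Formula | Text
#   Formula -> 'M' Formula | 'M'
#   Text    -> 'T' Text    | 'T'
# Input tokens are (word, tag) pairs, where tag is 'M' or 'T'.
Token = Tuple[str, str]

def parse_run(tokens: List[Token], pos: int, tag: str) -> int:
    """Match Formula (tag 'M') or Text (tag 'T') and return the position after the run."""
    end = pos
    while end < len(tokens) and tokens[end][1] == tag:
        end += 1
    return end

def parse_line(tokens: List[Token]) -> List[Tuple[str, List[str]]]:
    """Parse a Line into (non-terminal, words) segments, taking maximal runs."""
    segments = []
    pos = 0
    while pos < len(tokens):
        tag = tokens[pos][1]
        if tag not in ("M", "T"):
            raise ValueError(f"unknown tag {tag!r} at position {pos}")
        end = parse_run(tokens, pos, tag)
        name = "Formula" if tag == "M" else "Text"
        segments.append((name, [word for word, _ in tokens[pos:end]]))
        pos = end
    return segments
```

For the tagged input `[("Let", "T"), ("x", "M"), ("=", "M"), ("1", "M"), ("hold", "T")]`, `parse_line` returns a Text segment, a Formula segment containing `x`, `=`, `1`, and a final Text segment. The Formula segments are the regions that would be handed to formula recognition.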
XML based processing
OCR needs various data while processing:
■Input: recognition parameters, image
■While processing: layout information, etc.
■Output: result
To integrate OCR into an application system, the user must write code to handle these data.
Unify all of these data in XML.
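As a minimal sketch of this unification, one XML document can carry the input parameters, the intermediate layout information and the output result. Every element and attribute name below (`ocr`, `parameters`, `layout`, `result`, `line`, `bbox`, and so on) is hypothetical, as are the file name and the helper functions; the product's real schema is not reproduced here.

```python
import xml.etree.ElementTree as ET

def build_job(image_path, language, skew_correction=True):
    """Create one XML tree that holds input, intermediate and output data.

    All element names are illustrative assumptions, not the product's schema.
    """
    root = ET.Element("ocr")
    params = ET.SubElement(root, "parameters")
    ET.SubElement(params, "image", {"src": image_path})
    ET.SubElement(params, "language").text = language
    ET.SubElement(params, "skewCorrection").text = "true" if skew_correction else "false"
    ET.SubElement(root, "layout")  # to be filled while processing
    ET.SubElement(root, "result")  # to be filled after recognition
    return root

def add_text_line(root, text, bbox):
    """Append one recognized text line with its bounding box to the result."""
    result = root.find("result")
    line = ET.SubElement(result, "line", {"bbox": ",".join(str(v) for v in bbox)})
    line.text = text
    return line

job = build_job("page001.tif", "en")
add_text_line(job, "Let x = 1.", (10, 20, 300, 40))
xml_text = ET.tostring(job, encoding="unicode")
```

Because every component reads from and writes to the same tree, an application only needs an XML parser to exchange data with the OCR components.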
XML Based Processing
[Figure: diagram with the boxes Layout analysis, Character recognition, Formula detection, Graphical User Interface, and XML]
Advantages of XML
■Easy to convert to other formats (XSLT)
■Easy to handle (DOM/SAX)
■Extensible / flexible
■MathML
■Platform independent
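The DOM and conversion points can be illustrated with a short sketch that walks a result document using Python's standard `xml.dom.minidom` and flattens it to plain text. The `ocr`, `result`, `line` and `formula` element names and the `$...$` wrapping are assumptions for this example only; the embedded formula uses the standard MathML namespace.

```python
from xml.dom import minidom

# Hypothetical result document; only the MathML part follows a real standard.
SAMPLE = """<ocr>
  <result>
    <line>Let</line>
    <formula><math xmlns="http://www.w3.org/1998/Math/MathML"><mi>x</mi><mo>=</mo><mn>1</mn></math></formula>
    <line>hold.</line>
  </result>
</ocr>"""

def text_of(node):
    """Concatenate all text nodes below a DOM node."""
    if node.nodeType == node.TEXT_NODE:
        return node.data
    return "".join(text_of(child) for child in node.childNodes)

def to_plain_text(xml_source):
    """Convert the result into one line of text, marking formulae with $...$."""
    doc = minidom.parseString(xml_source)
    result = doc.getElementsByTagName("result")[0]
    parts = []
    for child in result.childNodes:
        if child.nodeType != child.ELEMENT_NODE:
            continue  # skip whitespace between elements
        content = text_of(child).strip()
        if child.tagName == "formula":
            content = "$" + content + "$"
        parts.append(content)
    return " ".join(parts)

print(to_plain_text(SAMPLE))  # prints: Let $x=1$ hold.
```

The same traversal could emit HTML or LaTeX instead. An XSLT stylesheet would express such a conversion declaratively, but it needs a processor outside the Python standard library.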
XML format 1 ……Recognition Parameters ….. Recognized Results(After Recognition)
XML format 2 g ……
XML format 3 g ….Mathematical formulae
Demonstration ■….
Product form
■Software Development Kit
■Simple OCR software
For x86-based Windows PCs
Summary
■A more convenient GUI is needed
■We hope our product will make your business more efficient.