ExpressReader Pro adopted to retrodigitization of mathematical documents Kazuaki Yokota.

Presentation transcript:

ExpressReader Pro adopted to retrodigitization of mathematical documents Kazuaki Yokota

ExpressReader Pro: Features
■ Printed-text OCR
■ Japanese / English
■ Recognition rate: 99.7% for Japanese, 99.8% for English
■ Powerful layout analysis
■ For x86-based Windows PCs

Layout analysis 1

Layout analysis 2

Adaptation to mathematical documents: Problems
■ Application framework
■ Detection and recognition of mathematical formulae
■ Output format

Flow diagram
Boxes in the diagram: image scanning, skew correction, layout analysis, character recognition, formula detection, formula recognition, user modification, output conversion.
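The stages named in the flow diagram can be sketched as a chain of functions that each take a page record and return an updated one. This is only an illustration: the stage order below is an assumption inferred from the stage names (the diagram's arrows did not survive in the transcript), and `make_stage`, `run`, and the `history` field are hypothetical names, not part of ExpressReader Pro.

```python
from typing import Callable, Dict, List

Stage = Callable[[Dict], Dict]

# Assumed order; the transcript lists the boxes but not the arrows between them.
STAGE_NAMES: List[str] = [
    "image scanning",
    "skew correction",
    "layout analysis",
    "character recognition",
    "formula detection",
    "formula recognition",
    "user modification",
    "output conversion",
]

def make_stage(name: str) -> Stage:
    """Build a placeholder stage that only records that it ran."""
    def stage(page: Dict) -> Dict:
        updated = dict(page)  # copy so each stage leaves its input untouched
        updated["history"] = page.get("history", []) + [name]
        return updated
    return stage

def run(page: Dict, stages: List[Stage]) -> Dict:
    """Pass the page record through every stage in order."""
    for stage in stages:
        page = stage(page)
    return page

result = run({"image": "page1.png"}, [make_stage(n) for n in STAGE_NAMES])
```

A real system would replace each placeholder with the actual processing step while keeping the same record-in, record-out shape.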

Component relation
Boxes in the diagram: scanning, graphical user interface, layout analysis, character recognition, formula detection, INFTY formula recognition.

Formula detection 1
■ Each word obtained by character recognition is given two scores: one as a mathematical formula (M) and one as a text word (T).
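A minimal sketch of the two-score idea, assuming a toy scoring rule. The slides do not say how the M and T scores are computed, so the character set `MATH_CHARS`, the single-letter heuristic, and the functions `score_word` and `label` are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical set of formula-like characters; the real features are not given.
MATH_CHARS = set("+-=<>^_/*()[]{}|∑∫√≤≥±×÷")

@dataclass
class WordScore:
    word: str
    math: float  # score M: how formula-like the word looks
    text: float  # score T: how much it looks like ordinary text

def score_word(word: str) -> WordScore:
    """Assign toy M and T scores from the recognized characters of one word."""
    if not word:
        return WordScore(word, 0.0, 0.0)
    # Assumption: an isolated letter other than "a"/"I" is treated as a variable.
    if len(word) == 1 and word.isalpha() and word.lower() not in {"a", "i"}:
        return WordScore(word, 1.0, 0.0)
    math_hits = sum(1 for ch in word if ch in MATH_CHARS or ch.isdigit())
    text_hits = sum(1 for ch in word if ch.isalpha())
    return WordScore(word, math_hits / len(word), text_hits / len(word))

def label(word: str) -> str:
    """Return "M" when the formula score wins, otherwise "T"."""
    s = score_word(word)
    return "M" if s.math > s.text else "T"

words = "Let x = y^2 + 1 be a curve".split()
labels = [label(w) for w in words]
```

Keeping both scores, rather than deciding immediately, leaves room for a later stage to resolve ambiguous words from their context.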

Formula detection 2
■ The scored words are parsed with a context-free grammar (CFG).
- "Formula" is itself a non-terminal symbol of this CFG.
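As an illustration of treating Formula as a non-terminal, the sketch below parses a sequence of per-word labels with a deliberately tiny grammar: `Line -> (Text | Formula)*`, `Text -> 'T'+`, `Formula -> 'M'+`. The grammar, the label alphabet, and the function names are assumptions made for this example; the grammar actually used is not given in the slides and is presumably much richer.

```python
from typing import List, Tuple

Span = Tuple[str, int, int]  # (non-terminal, start index, end index exclusive)

def parse_run(labels: List[str], pos: int, terminal: str) -> int:
    """Consume one or more copies of `terminal` from `pos`; return the new position."""
    end = pos
    while end < len(labels) and labels[end] == terminal:
        end += 1
    return end

def parse_line(labels: List[str]) -> List[Span]:
    """Recursive-descent parse of Line -> (Text | Formula)* over word labels."""
    spans: List[Span] = []
    pos = 0
    while pos < len(labels):
        if labels[pos] == "M":
            end = parse_run(labels, pos, "M")
            spans.append(("Formula", pos, end))
        elif labels[pos] == "T":
            end = parse_run(labels, pos, "T")
            spans.append(("Text", pos, end))
        else:
            raise ValueError(f"unknown label {labels[pos]!r} at index {pos}")
        pos = end
    return spans

# One label per word, e.g. for "Let x = y^2 + 1 be a curve"
word_labels = ["T", "M", "M", "M", "M", "M", "T", "T", "T"]
spans = parse_line(word_labels)
```

Each `Formula` span marks a run of words that could then be handed to a dedicated formula recognizer.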

XML-based processing
■ Input: recognition parameters, image
■ While processing: layout information, etc.
■ Output: result
OCR needs various data while processing. To integrate OCR into an application system, the user must write code to handle these data. The approach here is to unify them in XML.
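The idea of carrying input parameters, intermediate layout information, and results in one XML document can be sketched with Python's standard `xml.etree.ElementTree`. All element and attribute names here (`ocr`, `parameters`, `layout`, `block`, `result`, `line`, `kind`) are invented for illustration; the product's actual schema is not shown in the transcript.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

def build_job(image_path: str, language: str, blocks: List[Dict[str, str]]) -> ET.Element:
    """Collect input, intermediate, and output data under one XML root."""
    root = ET.Element("ocr")

    # Input: recognition parameters and the image to process
    params = ET.SubElement(root, "parameters")
    ET.SubElement(params, "image").text = image_path
    ET.SubElement(params, "language").text = language

    # Intermediate data (layout) and output (recognized content)
    layout = ET.SubElement(root, "layout")
    result = ET.SubElement(root, "result")
    for index, block in enumerate(blocks):
        ET.SubElement(layout, "block", id=str(index), kind=block["kind"])
        line = ET.SubElement(result, "line", block=str(index), kind=block["kind"])
        line.text = block["content"]
    return root

job = build_job(
    "page1.png",
    "English",
    [
        {"kind": "text", "content": "Let"},
        {"kind": "formula", "content": "x = y^2 + 1"},
    ],
)
xml_text = ET.tostring(job, encoding="unicode")
```

Because every component reads and writes the same document, an application only needs one parser rather than a separate interface per processing stage.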

XML-based processing
Diagram components: layout analysis, character recognition, formula detection, graphical user interface, XML.

Advantages of XML
■ Easy to convert to other formats (XSLT)
■ Easy to process (DOM/SAX)
■ Extensible / flexible
■ MathML
■ Platform independent
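To illustrate the DOM/SAX point, the sketch below reads the same small document twice using only the Python standard library: once through a DOM tree (`xml.dom.minidom`) and once through SAX events (`xml.sax`). The sample document and its `line`/`kind` names are made up for this example and do not reflect the product's real format.

```python
import xml.sax
from xml.dom import minidom

SAMPLE = (
    "<ocr><result>"
    '<line kind="text">Let</line>'
    '<line kind="formula">x = y^2</line>'
    "</result></ocr>"
)

# DOM: load the whole tree, then query it
doc = minidom.parseString(SAMPLE)
dom_formulas = [
    node.firstChild.data
    for node in doc.getElementsByTagName("line")
    if node.getAttribute("kind") == "formula"
]

# SAX: react to events while the document streams past
class LineCollector(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.lines = []
        self._buffer = None  # collects text only while inside a <line>

    def startElement(self, name, attrs):
        if name == "line":
            self._buffer = []

    def characters(self, content):
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "line":
            self.lines.append("".join(self._buffer))
            self._buffer = None

handler = LineCollector()
xml.sax.parseString(SAMPLE.encode("utf-8"), handler)
sax_lines = handler.lines
```

DOM is convenient for random access to a page's results, while SAX keeps memory use low on large documents.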

XML format 1 ……Recognition Parameters ….. Recognized Results(After Recognition)

XML format 2 g ……

XML format 3 g ….Mathematical formulae

Demonstration ■….

Product form
■ Software Development Kit
■ Simple OCR software
For x86-based Windows PCs

Summary
■ A more convenient GUI is needed
■ We hope our product will make your business more efficient.