Advanced OCR with OmniPage and FineReader

Overview
Optical character recognition
Structural recognition
Options
Loading
Zoning
OCR
Editing

Optical Character Recognition (OCR)
OCR turns pictures of text into e-text
Does well unless…
– The picture is fuzzy
– The contrast is poor
– The font is unusual
– The font is too small or too large
– The material has unusual characters

Structural Recognition
Analyzes the layout of the page
– Columns
– Headings
– Graphics
– Tables
Usually does fairly well, unless the layout is non-standard

Programs that Run OCR
Programs for consumers
– Kurzweil 1000, 3000
– OpenBook
– Intel Reader
– Many others…
Programs for production
– ABBYY FineReader
– Nuance OmniPage

Consumer Programs
Highly automated
Designed for individuals who have print disabilities
Are not good production tools
– Do not provide flexibility
– Do not allow much overriding
– Interfaces not designed for editing

Production Programs in General
A good program for production allows you to…
– Control the zones (areas or blocks of text and graphics): add, delete, change
– Edit easily
– Improve recognition

Preferred Programs
ABBYY FineReader
– Relatively easy to learn
– Fairly intuitive
– Good structural recognition
Nuance OmniPage
– Less intuitive but more accessible
– Often does better with technical materials

Both Good Tools
Having both is nice if you can afford it, but not absolutely necessary.
If you have both, run a couple of test pages through each to see which does better on a particular job.
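
One way to make the test-page comparison less subjective is to proofread a single page by hand and score each program's output against it. The sketch below is only an illustration using Python's standard `difflib`; it is not a feature of either program. The names `program_a` and `program_b` and the sample strings are hypothetical stand-ins for text exported from each tool.

```python
from difflib import SequenceMatcher


def similarity(ocr_text: str, reference: str) -> float:
    """Return a 0..1 similarity ratio between OCR output and a proofread reference."""
    # Collapse whitespace so line-wrapping differences are not counted as errors.
    a = " ".join(ocr_text.split())
    b = " ".join(reference.split())
    return SequenceMatcher(None, a, b).ratio()


def pick_better(reference: str, outputs: dict[str, str]) -> str:
    """Return the name of the program whose output is closest to the reference."""
    return max(outputs, key=lambda name: similarity(outputs[name], reference))


# Hypothetical test page: these strings stand in for text exported from each program.
reference = "The integral of x squared is x cubed over three."
outputs = {
    "program_a": "The integral of x squared is x cubed over three.",
    "program_b": "The lntegral of x squarcd is x cubed ovcr thrce.",
}
best = pick_better(reference, outputs)
```

A similarity ratio is a rough guide only. It says nothing about structural recognition such as columns and tables, so it is still worth looking at the pages themselves.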

Under the Hood
For best results with a program, set up your options before you begin!
Tools > Options

Lots of Languages
FineReader and OmniPage handle multiple languages.
For foreign-language material, turn on all the languages used in the book.
– The program will then recognize the diacritical marks.
– Turn on what you need, but only what you need.

Math
If you are running OCR on math, try turning on Greek.
– Greek will allow the program to recognize alphas, deltas, sigmas, etc.
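
The language advice from the last two slides can be written down as a small rule. This is a sketch of the decision only, not of either program's settings dialog. The function name `languages_to_enable` and the `has_math` flag are made up for illustration.

```python
def languages_to_enable(book_languages, has_math=False):
    """Return the sorted list of recognition languages to switch on for one job."""
    # Start from exactly the languages that appear in the book, nothing extra.
    selected = set(book_languages)
    # For math, add Greek so that alphas, deltas, sigmas, etc. can be recognized.
    if has_math:
        selected.add("Greek")
    return sorted(selected)


# Hypothetical jobs: a French reader with English notes, and an English calculus text.
french_reader = languages_to_enable(["English", "French"])
calculus_text = languages_to_enable(["English"], has_math=True)
```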

Another Decision
Detect page orientation or not?
– Does not always get it right
– Try it if many of your pages are rotated

Considerations
You may or may not want to keep headers and footers.
– I generally keep them to pull the page numbers.
You may want to keep the page breaks.
– Retaining page breaks helps to maintain one-to-one page correspondence with the book.
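
To show why retained page breaks and footers are useful after export, here is a minimal sketch. It assumes the converted file is plain text in which each page break is a form feed character and the footer is the last non-blank line of the page. Neither assumption comes from the slides, so check what your export actually contains. The function names are hypothetical.

```python
import re


def split_pages(text: str) -> list[str]:
    """Split exported text into pages, assuming a form feed marks each page break."""
    return text.split("\f")


def footer_page_number(page: str):
    """Return the page number if the last non-blank line is only digits, else None."""
    lines = [line.strip() for line in page.splitlines() if line.strip()]
    if lines and re.fullmatch(r"\d+", lines[-1]):
        return int(lines[-1])
    return None


# Hypothetical export: two pages, each ending in a footer that holds the page number.
exported = "Chapter 1\nFirst page text.\n41\fSecond page text.\n42"
pages = split_pages(exported)
numbers = [footer_page_number(p) for p in pages]
```

With the breaks kept, each entry in `pages` lines up with one printed page, and the footer numbers give a quick check that no page was skipped.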

Fitting Everything
In some cases, you may need to work with a custom paper size to fit everything onto one page.
This feature can be helpful when you are retaining everything on the page but not the layout.

Loading Files
“Open”
– Opens saved program files
“Load”
– Loads image files to process
Note that this same issue comes up with saving!

Wizards Are Evil…
Do not rely on the automation
Load the image file and choose the processes you want

Workspace
The program has three primary areas
Pages Pane
– Either thumbnails or details
– Allows simple navigation of pages
Image Pane
– Your graphic
Text Pane
– Area where the text from OCR will show

More Accessible
Both programs have a detail view.
– Shows text instead of graphics
Detail view is more accessible for screen readers.
Otherwise, it is personal preference.

Two Ways to Save
To save the program file so you can return to it later in the OCR program, choose File > Save
– This saves your work file.
You save your converted file during the last phase of the processing.

Production Tips
Work with dual monitors
– Check your computer and video card
Stretching an OCR program across two monitors is a HUGE time-saver!
Learn to use keyboard shortcuts.
– They save tons of time!