Published by Audra Chapman. Modified over 9 years ago.
1
Advanced OCR with OmniPage and FineReader
2
Overview
Optical character recognition
Structural recognition
Options
Loading
Zoning
OCR
Editing
3
Optical Character Recognition (OCR)
OCR turns pictures of text into e-text
Does well unless…
–The picture is fuzzy
–The contrast is poor
–The font is unusual
–The font is too small or too large
–The material has unusual characters
4
Structural Recognition
Analyzes the layout of the page
–Columns
–Headings
–Graphics
–Tables
Usually does fairly well, unless the layout is non-standard
5
Programs that Run OCR
Programs for consumers
–Kurzweil 1000, 3000
–OpenBook
–Intel Reader
–Many others…
Programs for production
–ABBYY FineReader
–Nuance OmniPage
6
Consumer Programs
Highly automated
Designed for individuals who have print disabilities
Are not good production tools
–Do not provide flexibility
–Do not allow much overriding
–Interfaces not designed for editing
7
Production Programs in General
A good program for production allows you to…
–Control the zones (areas or blocks of text and graphics): add, delete, change
–Edit easily
–Improve recognition
8
Preferred Programs
ABBYY FineReader
–Relatively easy to learn
–Fairly intuitive
–Good structural recognition
Nuance OmniPage
–Less intuitive but more accessible
–Often does better with technical materials
9
Both Good Tools
If you can afford to have both, it’s nice, but not absolutely necessary.
If you have both, run a couple of test pages through each to see which is doing better on a particular job.
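One way to make the test-page comparison less subjective is to proofread a page by hand and score each program's exported text against it. The sketch below is only an illustration and is not part of either product: the function names, the sample strings, and the `program_a` / `program_b` labels are all hypothetical, and it assumes you have already exported plain text from each program.

```python
from difflib import SequenceMatcher


def similarity(ocr_text, reference_text):
    """Return a 0..1 score for how closely OCR output matches a proofread page."""
    # Collapse whitespace so differences in line wrapping are not counted as errors.
    ocr_norm = " ".join(ocr_text.split())
    ref_norm = " ".join(reference_text.split())
    return SequenceMatcher(None, ref_norm, ocr_norm, autojunk=False).ratio()


def pick_better(reference_text, outputs):
    """Return the label of the output that best matches the reference text."""
    return max(outputs, key=lambda name: similarity(outputs[name], reference_text))


# Hypothetical sample data: one hand-corrected test page and two exported results.
reference = "The quick brown fox jumps over the lazy dog."
outputs = {
    "program_a": "The quick brown fox jumps over the lazy dog.",
    "program_b": "The qu1ck brovvn fox jumps over the lazy d0g.",
}

best = pick_better(reference, outputs)
print(best)  # prints "program_a" for this made-up sample
```

A score like this only reflects the pages you test, so it is worth repeating with pages typical of the job at hand (for example, a page with tables or technical notation).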
10
Under the Hood
For best results with a program, set up your options before you begin!
Tools > Options
11
Lots of Languages
FineReader and OmniPage handle multiple languages.
For foreign-language material, turn on all the languages used in the book.
–The program will then recognize the diacritical marks.
–Turn on what you need, but only what you need.
12
Math
If you are running OCR on math, try turning on Greek.
–Greek will allow the program to recognize alphas, deltas, sigmas, etc.
13
Another Decision
Detect page orientation or not?
–Detection does not always get it right
–Try it if you have many turned pages
14
Considerations
You may or may not want to keep headers and footers.
–I generally keep them to pull the page numbers.
You may want to keep the page breaks.
–Retaining page breaks helps to maintain one-to-one page correspondence with the book.
15
Fitting Everything
In some cases, you may need to work with a custom paper size to fit everything onto one page.
This feature can be helpful when you are retaining everything on the page but not the layout.
16
Loading Files
“Open”
–Opens saved program files
“Load”
–Loads image files to process
Note that this same issue comes up with saving!
17
Wizards Are Evil…
Do not rely on the automation
Load the image file and choose the processes you want
18
Workspace
The program has three primary areas
Pages Pane
–Either thumbnails or details
–Allows simple navigation of pages
Image Pane
–Your graphic
Text Pane
–Area where the text from OCR will show
19
More Accessible
Both programs have a detail view.
–Shows text instead of graphics
Detail view is more accessible for screen readers.
Otherwise, the choice of view is personal preference.
20
Two Ways to Save
To save the program file to access later in the OCR program, choose File > Save
–This saves your work file.
You save your converted file during the last phase of the processing.
21
Production Tips
Work with dual monitors
–Check your computer and video card
Stretching an OCR program across two monitors is a HUGE time-saver!
Learn to use keyboard shortcuts.
–They save tons of time!