Published by Cathleen Hampton. Modified over 9 years ago.
1
OCR at INIS
Branko Krznarić
2
Outline
- What is OCR?
- OCR Objectives
- Principles
- Techniques
- Software
INIS Training Seminar, 12-16 October 2015, Vienna, Austria
3
What is OCR?
(source: pcmag.com)
4
Optical Character Recognition (OCR)
- OCR is the “conversion of scanned images of handwritten, typewritten or printed text into machine-encoded text.” [1]
- Make digitized images of printed documents searchable.
- Font encoding issues.
5
OCR Objectives
- Data entry from printed records.
- OCR adds extra value to your images.
- OCR brings your digitized collection to life.
- We can “find the needle in the haystack.”
6
OCR Objectives (contd.)
A method of digitizing printed texts so that they can be:
- Electronically edited
- Searched
- Stored more compactly
- Displayed online
- Used in machine processes
7
OCR Techniques
- Pre-processing
  - De-skew
  - Despeckle
  - Binarization
  - Line removal
  - Layout analysis (zoning)
- Post-processing (dictionary)
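As an illustration of the binarization step listed on this slide, here is a minimal pure-Python sketch. It assumes the page is a grid of 8-bit gray levels (0 = black, 255 = white) and uses Otsu's method to pick the threshold. That choice is an assumption made for the example; the slides do not say which binarization method the INIS tools use, and the function names `otsu_threshold` and `binarize` are hypothetical.

```python
def otsu_threshold(pixels):
    """Return the gray level t that maximizes the between-class variance
    when pixels <= t form one class and pixels > t form the other."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_dark = 0    # number of pixels at or below the candidate threshold
    sum_dark = 0  # sum of their gray levels
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        if w_dark == 0:
            continue  # no dark pixels yet, nothing to split
        w_light = total - w_dark
        if w_light == 0:
            break     # every pixel is in the dark class, stop
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        between_var = w_dark * w_light * (mean_dark - mean_light) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def binarize(image):
    """Map a grid of gray levels to 1 (ink) and 0 (paper)."""
    t = otsu_threshold([p for row in image for p in row])
    return [[1 if p <= t else 0 for p in row] for row in image]


# Hypothetical 3x4 scan fragment: dark strokes on light paper.
page = [
    [220, 210, 30, 225],
    [215, 25, 35, 230],
    [200, 40, 20, 210],
]
for row in binarize(page):
    print(row)
```

A global threshold like this works when ink and paper are well separated across the page. Scans with uneven lighting usually need a locally adaptive threshold instead.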
8
Scanned vs. Vector Image
9
“Do not look at the trees (letters); try to see the forest (sentences).”
F0R 488UR1N6 7H3 L0N63V17Y 0F 1NF0RM4710N, P3RH4P8 7H3 M087 1MP0R74N7 R0L3 1N 7H3 0P3R4710N 0F 4 D16174L 4RCH1V3 18 M4N461N6 7H3 1D3N717Y, 1N736R17Y 4ND QU4L17Y 0F 7H3 4RCH1V38 1783LF 48 4 7RU873D 80URC3 0F 7H3 CUL7UR4L R3C0RD.
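A reader decodes the digit-for-letter text on this slide by using whole words as context. Dictionary post-processing can imitate that in a small way. The sketch below is only an illustration: the substitution table is read off this slide (0→O, 1→I, 3→E, 4→A, 6→G, 7→T, 8→S), while the tiny `LEXICON` and the function names are hypothetical. A substitution is accepted only when it produces a known word, so genuine numbers are left alone.

```python
# Digit-to-letter substitutions taken from the slide text.
SUBSTITUTIONS = str.maketrans("0134678", "OIEAGTS")

# Hypothetical mini-dictionary; a real system would use a full word list.
LEXICON = {"FOR", "ASSURING", "THE", "LONGEVITY", "OF", "INFORMATION"}


def correct_token(token, lexicon=LEXICON):
    """Apply the substitutions only if the result is a dictionary word."""
    candidate = token.translate(SUBSTITUTIONS)
    return candidate if candidate in lexicon else token


def correct_line(line, lexicon=LEXICON):
    """Correct each whitespace-separated token independently."""
    return " ".join(correct_token(tok, lexicon) for tok in line.split())


print(correct_line("F0R 488UR1N6 7H3 L0N63V17Y 0F 1NF0RM4710N"))
print(correct_line("12-16 0F 2015"))  # the numbers are not dictionary words
```

The dictionary check is the point of the design: a blind character swap would also rewrite dates and page numbers.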
10
Verdana Font
FOR ASSURING THE LONGEVITY OF INFORMATION, PERHAPS THE MOST IMPORTANT ROLE IN THE OPERATION OF A DIGITAL ARCHIVE IS MANAGING THE IDENTITY, INTEGRITY AND QUALITY OF THE ARCHIVES ITSELF AS A TRUSTED SOURCE OF THE CULTURAL RECORD.
11
Brush Script MT (Windows Font)
FOR ASSURING THE LONGEVITY OF INFORMATION, PERHAPS THE MOST IMPORTANT ROLE IN THE OPERATION OF A DIGITAL ARCHIVE IS MANAGING THE IDENTITY, INTEGRITY AND QUALITY OF THE ARCHIVES ITSELF AS A TRUSTED SOURCE OF THE CULTURAL RECORD.
12
PCs ≠ Humans
- OCR compares patterns and selects the closest match.
- It can be constrained to a specific context, but that requires customization.
- People adapt to circumstances and can read past misspellings when the context is clear.
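The "compare patterns and select the closest match" idea can be sketched as nearest-template matching. This is a toy illustration under stated assumptions, not how any particular OCR product works: the 3x5 bitmaps in `TEMPLATES`, the Hamming-distance measure, and the function names are all made up for the example.

```python
# Hypothetical 3x5 glyph templates: '1' is ink, '0' is paper.
TEMPLATES = {
    "I": ("111", "010", "010", "010", "111"),
    "L": ("100", "100", "100", "100", "111"),
    "T": ("111", "010", "010", "010", "010"),
}


def hamming(glyph_a, glyph_b):
    """Count the pixel positions where two same-sized bitmaps differ."""
    return sum(
        a != b
        for row_a, row_b in zip(glyph_a, glyph_b)
        for a, b in zip(row_a, row_b)
    )


def recognize(glyph):
    """Return (character, distance) for the closest template."""
    best = min(TEMPLATES, key=lambda ch: hamming(glyph, TEMPLATES[ch]))
    return best, hamming(glyph, TEMPLATES[best])


# A 'T' with one speck of noise in the fourth row.
noisy_t = ("111", "010", "010", "011", "010")
print(recognize(noisy_t))
```

The matcher always returns some character, even for a glyph that resembles none of the templates. That is one reason context-specific customization and dictionary checks matter.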
13
True or false?
Usually, printed text is adequately sampled if each line is at least two pixels wide.
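To relate the two-pixel rule of thumb to scanner settings, the arithmetic can be sketched as follows. The 0.2 mm stroke width is an assumed value chosen for illustration, not a figure from the slides, and the function names are hypothetical.

```python
import math

MM_PER_INCH = 25.4


def stroke_width_px(stroke_mm, dpi):
    """Width in pixels of a printed stroke scanned at the given resolution."""
    return stroke_mm / MM_PER_INCH * dpi


def min_dpi(stroke_mm, min_px=2.0):
    """Smallest whole-number dpi at which the stroke spans min_px pixels."""
    # Round first so floating-point noise cannot push the result up by one.
    return math.ceil(round(min_px * MM_PER_INCH / stroke_mm, 6))


# Assumed example: a 0.2 mm stroke.
for dpi in (200, 300):
    print(dpi, "dpi:", round(stroke_width_px(0.2, dpi), 2), "px")
print("minimum dpi for two pixels:", min_dpi(0.2))
```

Under this assumption, the stroke falls below two pixels at 200 dpi and exceeds two pixels at 300 dpi.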
14
Zoom in
15
Zoom in
16
Results from OCR
It is in this context that I…
… and an additional protocol on the basis…
17
Chinese Raster Image (scanned)
18
Chinese Vector Image (OCR)
滤器
19
Arabic Raster Image (scanned)
20
Arabic Vector Image (OCR)
هذ ا وشملت
21
Japanese Raster Image (scanned)
22
Japanese Vector Image (OCR)
23
Font Encoding
24
Font Encoding (cont.)
25
OCR Software
- High degree of recognition accuracy
- Reproducing formatted output
- OCR Software at INIS:
  - Abbyy FineReader (multilingual OCR)
  - Adobe Acrobat
  - InftyReader
26
Abbyy FineReader (interface)
27
InftyReader - an OCR System for Math Documents
28
Reference
[1] “Optical character recognition.” Wikipedia. http://en.wikipedia.org/wiki/Optical_character_recognition. Retrieved 2015-09-29.
29
Thank you!