Published by Lora Thomas. Modified over 9 years ago.
1
IAEA International Atomic Energy Agency
International Nuclear Information System (INIS)
OCR at INIS
INIS Training Seminar, 7-11 October 2013, Vienna, Austria
Branko Krznarić (based on the presentation by Yves Reynaud)
INIS Unit
2
Outline
- What is OCR?
- OCR Objectives
- Principles
- Techniques
- Software
3
What is OCR? (source: pcmag.com)
4
Optical Character Recognition (OCR)
- OCR is the “conversion of scanned images of handwritten, typewritten or printed text into machine-encoded text.” [1]
- Make digitized images of printed documents searchable.
- Font encoding issues.
5
OCR Objectives
- We can “find the needle in the haystack.”
- OCR enables basic search within an unstructured document.
- OCR adds extra value to your image.
- OCR brings your digitized collection to life.
6
OCR Techniques
- Pre-processing
- De-skew
- Despeckle
- Binarization (optional)
- Line removal
- Layout analysis (zoning)
- Post-processing (dictionary)
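Two of the pre-processing steps listed above, binarization and despeckling, can be sketched in a few lines. This is a minimal illustration only: the function names, the fixed threshold of 128 and the "no ink among the 8 neighbours" speckle rule are assumptions made for the example, not the method used by any particular OCR package.

```python
# Hypothetical helper names; a real pipeline would use an imaging library.

def binarize(gray, threshold=128):
    """Map a grayscale image (rows of 0-255 ints) to 1 = ink, 0 = background."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def despeckle(binary):
    """Clear ink pixels that have no ink among their 8 neighbours."""
    height = len(binary)
    width = len(binary[0]) if height else 0
    cleaned = [row[:] for row in binary]
    for y in range(height):
        for x in range(width):
            if binary[y][x] != 1:
                continue
            has_neighbour = any(
                binary[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            )
            if not has_neighbour:
                cleaned[y][x] = 0
    return cleaned


page = [
    [250, 250, 250, 250, 250],
    [250,  30,  40, 250, 250],
    [250,  35, 250, 250, 250],
    [250, 250, 250, 250,  20],  # isolated dark pixel: treated as a speckle
]
binary = binarize(page)
clean = despeckle(binary)
for row in clean:
    print("".join("#" if px else "." for px in row))
```

The connected dark pixels survive as ink, while the lone dark pixel in the bottom-right corner is cleared. A fixed threshold is the simplest possible choice; unevenly lit scans usually need an adaptive one.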
7
Scanned vs. Vector Image
8
“Do not look at the trees (letters); try to see the forest (sentences).”
F0R 488UR1N6 7H3 L0N63V17Y 0F 1NF0RM4710N, P3RH4P8 7H3 M087 1MP0R74N7 R0L3 1N 7H3 0P3R4710N 0F 4 D16174L 4RCH1V3 18 M4N461N6 7H3 1D3N717Y, 1N736R17Y 4ND QU4L17Y 0F 7H3 4RCH1V38 1783LF 48 4 7RU873D 80URC3 0F 7H3 CUL7UR4L R3C0RD.
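A human reads the digit-substituted text above almost without effort. A program needs the substitutions spelled out. The sketch below is illustrative only: the lookalike table is inferred from this slide's sample text, and the name `decode` is hypothetical.

```python
# Digit-to-letter substitutions inferred from the slide's example text.
LOOKALIKES = str.maketrans(
    {"0": "O", "1": "I", "3": "E", "4": "A", "6": "G", "7": "T", "8": "S"}
)


def decode(text):
    """Replace every lookalike digit with the letter it resembles."""
    return text.translate(LOOKALIKES)


sample = "F0R 488UR1N6 7H3 L0N63V17Y 0F 1NF0RM4710N"
print(decode(sample))  # FOR ASSURING THE LONGEVITY OF INFORMATION
```

A blanket substitution like this would also corrupt genuine digits such as years or page numbers, which is one reason dictionary-based post-processing has to take context into account.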
9
Verdana Font
FOR ASSURING THE LONGEVITY OF INFORMATION, PERHAPS THE MOST IMPORTANT ROLE IN THE OPERATION OF A DIGITAL ARCHIVE IS MANAGING THE IDENTITY, INTEGRITY AND QUALITY OF THE ARCHIVES ITSELF AS A TRUSTED SOURCE OF THE CULTURAL RECORD.
10
Brush Script MT (Windows Font)
FOR ASSURING THE LONGEVITY OF INFORMATION, PERHAPS THE MOST IMPORTANT ROLE IN THE OPERATION OF A DIGITAL ARCHIVE IS MANAGING THE IDENTITY, INTEGRITY AND QUALITY OF THE ARCHIVES ITSELF AS A TRUSTED SOURCE OF THE CULTURAL RECORD.
11
PCs ≠ Humans
- OCR compares patterns and selects the closest match.
- It can be forced to a specific context, but requires customization.
- People adapt to circumstances and can circumvent misspellings if context is clear.
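The idea of comparing patterns and selecting the closest match can be sketched with tiny bitmaps. Everything here is an assumption made for illustration: the 3x5 glyph templates, the pixel-difference (Hamming) distance and the names `distance` and `recognize` are hypothetical, and production OCR engines use far richer features.

```python
# Hypothetical 3x5 glyph templates; "1" marks an ink pixel.
TEMPLATES = {
    "I": ("111", "010", "010", "010", "111"),
    "L": ("100", "100", "100", "100", "111"),
    "T": ("111", "010", "010", "010", "010"),
}


def distance(a, b):
    """Count the pixels that differ between two same-sized glyph bitmaps."""
    return sum(pa != pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))


def recognize(glyph):
    """Return the template label with the fewest differing pixels."""
    return min(TEMPLATES, key=lambda label: distance(glyph, TEMPLATES[label]))


noisy_t = ("111", "010", "010", "011", "010")  # a "T" with one stray pixel
print(recognize(noisy_t))  # T
```

In this toy set, "I" and "T" differ by only two pixels, so a little noise in the bottom row is enough to flip the result. Unlike a human reader, the matcher has no context to fall back on.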
12
True or false?
Usually, printed text is adequately sampled if each line is at least two pixels in thickness:
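Whatever the answer to the question, the thickness of a printed stroke in pixels follows directly from the scan resolution. The arithmetic below is a sketch under stated assumptions: the 0.2 mm stroke width is an invented example value, and the function names are hypothetical.

```python
MM_PER_INCH = 25.4


def stroke_width_px(stroke_mm, dpi):
    """Pixels covered by a stroke of the given physical width at a scan resolution."""
    return stroke_mm / MM_PER_INCH * dpi


def min_dpi(stroke_mm, min_pixels=2):
    """Lowest resolution at which the stroke spans at least min_pixels pixels."""
    return min_pixels * MM_PER_INCH / stroke_mm


stroke = 0.2  # assumed stroke width in millimetres, for illustration only
for dpi in (150, 300, 600):
    print(dpi, "dpi:", round(stroke_width_px(stroke, dpi), 2), "px")
print("resolution needed for two pixels:", round(min_dpi(stroke)), "dpi")
```

Under this assumption a 150 dpi scan gives the stroke just over one pixel, while 300 dpi gives it more than two.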
13
Zoom in
14
Zoom in
15
Results from OCR
It is in this context that I…
… and an additional protocol on the basis…
16
Chinese Raster Image (scanned)
17
Chinese Vector Image (OCR)
滤器
18
Arabic Raster Image (scanned)
19
Arabic Vector Image (OCR)
هذ ا وشملت
20
Japanese Raster Image (scanned)
21
Japanese Vector Image (OCR)
22
Font Encoding
23
Font Encoding (cont.)
24
OCR Software
- Abbyy FineReader (multilingual OCR)
- Adobe Acrobat
- InftyReader
25
Abbyy FineReader (interface)
26
InftyReader - an OCR System for Math Documents
27
Reference
[1] “Optical character recognition.” http://en.wikipedia.org/wiki/Optical_character_recognition. Retrieved 2013-09-23.
28
Thank you!