
IAEA International Atomic Energy Agency International Nuclear Information System (INIS) OCR at INIS INIS Training Seminar 7-11 October 2013, Vienna, Austria.


1 IAEA International Atomic Energy Agency International Nuclear Information System (INIS) OCR at INIS INIS Training Seminar 7-11 October 2013, Vienna, Austria Branko Krznarić (based on the presentation by Yves Reynaud) INIS Unit

2 Outline: What is OCR?; OCR Objectives; Principles; Techniques; Software

3 What is OCR? (source: pcmag.com)

4 Optical Character Recognition (OCR): OCR is the “conversion of scanned images of handwritten, typewritten or printed text into machine-encoded text.” [1] Make digitized images of printed documents searchable. Font encoding issues.

5 OCR Objectives: We can “find the needle in the haystack”. OCR enables basic search within an unstructured document. OCR adds extra value to your image. OCR brings your digitized collection to life.

6 OCR Techniques: Pre-processing; De-skew; Despeckle; Binarization (optional); Line removal; Layout analysis (zoning); Post-processing (dictionary)
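Two of the listed steps, binarization and despeckling, can be sketched in a few lines. This is only an illustration under stated assumptions, not the method used by any particular OCR product: the fixed threshold of 128, the "no ink among the 8 neighbours" speckle rule, and the names `binarize` and `despeckle` are all hypothetical choices made for the example.

```python
from typing import List

Image = List[List[int]]  # rows of grayscale pixels, 0 = black, 255 = white


def binarize(gray: Image, threshold: int = 128) -> Image:
    """Map each grayscale pixel to 1 (ink) or 0 (background).

    The fixed threshold is an assumption; real tools often pick it adaptively.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def despeckle(binary: Image) -> Image:
    """Clear ink pixels that have no ink among their 8 neighbours."""
    height = len(binary)
    width = len(binary[0]) if height else 0
    cleaned = [row[:] for row in binary]  # copy so the input stays untouched
    for y in range(height):
        for x in range(width):
            if binary[y][x] != 1:
                continue
            has_ink_neighbour = False
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if dy == 0 and dx == 0:
                        continue
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width and binary[ny][nx] == 1:
                        has_ink_neighbour = True
            if not has_ink_neighbour:
                cleaned[y][x] = 0
    return cleaned


# A tiny made-up "scan": a 2x2 blob of ink plus one isolated dark pixel.
page = [
    [255, 255, 255, 255, 255],
    [255,  20,  30, 255, 255],
    [255,  25,  10, 255, 255],
    [255, 255, 255, 255,  40],
]

binary_page = binarize(page)
clean_page = despeckle(binary_page)
```

After both steps the 2x2 blob is kept and the isolated pixel in the bottom-right corner is cleared.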

7 Scanned vs. Vector Image

8 “Do not look at the trees (letters); try to see the forest (sentences).” F0R 488UR1N6 7H3 L0N63V17Y 0F 1NF0RM4710N, P3RH4P8 7H3 M087 1MP0R74N7 R0L3 1N 7H3 0P3R4710N 0F 4 D16174L 4RCH1V3 18 M4N461N6 7H3 1D3N717Y, 1N736R17Y 4ND QU4L17Y 0F 7H3 4RCH1V38 1783LF 48 4 7RU873D 80URC3 0F 7H3 CUL7UR4L R3C0RD.
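A human reader resolves the digit-for-letter text on this slide from context. A program needs the look-alike pairs spelled out. The sketch below assumes a hypothetical confusion table (`LOOKALIKES`) inferred from the slide text; it is not taken from any OCR package, and it would wrongly rewrite genuine digits, which is why real post-processing also relies on a dictionary.

```python
# Hypothetical confusion table: digit -> the letter it resembles on the slide.
LOOKALIKES = str.maketrans({
    "0": "O",
    "1": "I",
    "3": "E",
    "4": "A",
    "6": "G",
    "7": "T",
    "8": "S",
})


def undo_lookalikes(text: str) -> str:
    """Replace every digit in the table with its look-alike letter."""
    return text.translate(LOOKALIKES)


decoded = undo_lookalikes("F0R 488UR1N6 7H3 L0N63V17Y 0F 1NF0RM4710N")
print(decoded)  # FOR ASSURING THE LONGEVITY OF INFORMATION
```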

9 Verdana Font: FOR ASSURING THE LONGEVITY OF INFORMATION, PERHAPS THE MOST IMPORTANT ROLE IN THE OPERATION OF A DIGITAL ARCHIVE IS MANAGING THE IDENTITY, INTEGRITY AND QUALITY OF THE ARCHIVES ITSELF AS A TRUSTED SOURCE OF THE CULTURAL RECORD.

10 Brush Script MT (Windows Font): FOR ASSURING THE LONGEVITY OF INFORMATION, PERHAPS THE MOST IMPORTANT ROLE IN THE OPERATION OF A DIGITAL ARCHIVE IS MANAGING THE IDENTITY, INTEGRITY AND QUALITY OF THE ARCHIVES ITSELF AS A TRUSTED SOURCE OF THE CULTURAL RECORD.

11 PCs ≠ Humans: OCR compares patterns and selects the closest match. It can be forced to a specific context, but that requires customization. People adapt to circumstances and can work around misspellings if the context is clear.
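"Compares patterns and selects the closest match" can be illustrated with a toy template matcher. Everything here is an assumption made for the example: the 3x5 glyph bitmaps in `TEMPLATES`, the Hamming distance as the similarity measure, and the `allowed` parameter standing in for "forcing a specific context". Real OCR engines use far richer features.

```python
from typing import Optional

# Hypothetical 3x5 glyph templates, flattened row by row ("1" = ink).
TEMPLATES = {
    "I": "111" "010" "010" "010" "111",
    "L": "100" "100" "100" "100" "111",
    "T": "111" "010" "010" "010" "010",
}


def hamming(a: str, b: str) -> int:
    """Count the positions where two equal-length bitmaps differ."""
    return sum(x != y for x, y in zip(a, b))


def closest_match(glyph: str, allowed: Optional[str] = None) -> str:
    """Return the template character with the fewest differing pixels.

    `allowed` restricts the candidates, a crude stand-in for context.
    """
    candidates = allowed if allowed is not None else "".join(TEMPLATES)
    return min(candidates, key=lambda ch: hamming(glyph, TEMPLATES[ch]))


# An "I" with one extra ink pixel in the middle row (scanner noise).
noisy = "111" "010" "110" "010" "111"

print(closest_match(noisy))                # I
print(closest_match(noisy, allowed="LT"))  # T
```

The second call shows the trade-off: restricting the context always yields an answer from the allowed set, even when the true character is outside it.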

12 True or false: Usually, printed text is adequately sampled if each line is at least two pixels in thickness.
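The two-pixel rule of thumb in this question can be turned into a scan-resolution estimate. The sketch below only does the unit conversion; the 0.2 mm stroke width is a made-up example value, and the function names are hypothetical. It does not settle whether two pixels are enough in practice.

```python
MM_PER_INCH = 25.4


def stroke_width_px(stroke_mm: float, dpi: float) -> float:
    """Thickness of a printed stroke, in pixels, at a given scan resolution."""
    return stroke_mm / MM_PER_INCH * dpi


def min_dpi(stroke_mm: float, min_px: float = 2.0) -> float:
    """Lowest resolution at which the stroke spans at least `min_px` pixels."""
    return min_px * MM_PER_INCH / stroke_mm


# Assumed example: a thin stroke of 0.2 mm.
thin_stroke_mm = 0.2
at_200 = stroke_width_px(thin_stroke_mm, 200)  # below two pixels
at_300 = stroke_width_px(thin_stroke_mm, 300)  # above two pixels
needed = min_dpi(thin_stroke_mm)               # about 254 dpi
```

Under these assumptions a 200 dpi scan leaves the stroke thinner than two pixels, while 300 dpi clears the threshold.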

13 Zoom in

14 Zoom in

15 Results from OCR: It is in this context that I… … and an additional protocol on the basis…

16 Chinese Raster Image (scanned)

17 Chinese Vector Image (OCR): 滤器

18 Arabic Raster Image (scanned)

19 Arabic Vector Image (OCR): هذ ا وشملت

20 Japanese Raster Image (scanned)

21 Japanese Vector Image (OCR)

22 Font Encoding

23 Font Encoding (cont.)

24 OCR Software: Abbyy FineReader (multilingual OCR); Adobe Acrobat; InftyReader

25 Abbyy FineReader (interface)

26 InftyReader - an OCR System for Math Documents

27 Reference: [1] “Optical character recognition”, http://en.wikipedia.org/wiki/Optical_character_recognition. Retrieved 2013-09-23.

28 Thank you!

