DML-CZ: Scanning and adjusting the images. Martin Lhoták, Academy of Sciences Library. Launching the DML-CZ, 11. 5. 2008, Prague.

1 DML-CZ: Scanning and adjusting the images. Martin Lhoták, Academy of Sciences Library. Launching the DML-CZ, 11. 5. 2008, Prague

2 DML-CZ Workflow
1. Preparation
2. Scanning and adjusting the images
3. OCR
4. Metadata harvesting (MR, ZBL)
5. Integration
6. Digital Library

3 Content
1. Digitization Centre of the AS Library
2. Scanning
3. Adjusting the images
4. Basic metadata
5. OCR
6. Backup and movement of the data
7. Production to date

4 Digitization Centre of the AS Library
- In operation since 1. 1. 2004
- Built with support from the EU Solidarity Fund after the 2002 floods in Czechia
- Main aim: to build a digital library of scientific publications published by the Academy of Sciences of the Czech Republic (Digital Library of ASCR)
- Partner of the DML-CZ project since 2005

5 The Academy of Sciences of the Czech Republic
- More than 50 scientific institutes
- 7,500 employees (4,000 in R&D)
- More than 11,000 articles, reports, etc. a year
- Publishes more than 90 journals (circa 3,000 articles)
- More than 100 years of history

6 Digitization Centre of the AS Library
- 2 x A2 black-and-white scanners Zeutschel OS 7000
- 1 x A1 color scanner Digibook 10000
- 1 x A4 fast production scanner Panasonic
- Staff: 8 to 10 people
- Monthly production: 40,000 to 50,000 pages
- Overall production: more than 2,000,000 pages

7 DML-CZ: Scanning
- 2 x A2 black-and-white scanners Zeutschel OS 7000
- 600 DPI, 4-bit greyscale
- 1 page = 1 file, usually A5
- TIFF with lossless LZW compression, circa 10 MB
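
As a rough cross-check of the file size quoted on this slide, the sketch below computes the pixel dimensions and the uncompressed size of an A5 page scanned at 600 DPI. The A5 dimensions (148 x 210 mm) come from ISO 216, not from the slide, and the 8-bit figure is an added assumption for comparison, since the slide does not say how the 4-bit samples are stored in the TIFF.

```python
MM_PER_INCH = 25.4

def pixel_dimensions(width_mm, height_mm, dpi):
    """Convert a physical page size to pixel dimensions at a given resolution."""
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    return width_px, height_px

def uncompressed_bytes(width_px, height_px, bits_per_pixel):
    """Raw raster size, with each row padded to a whole number of bytes."""
    row_bytes = (width_px * bits_per_pixel + 7) // 8
    return row_bytes * height_px

# Assumption: A5 is 148 x 210 mm (ISO 216); the slide only says "usually A5".
width_px, height_px = pixel_dimensions(148, 210, 600)
size_4bit = uncompressed_bytes(width_px, height_px, 4)
size_8bit = uncompressed_bytes(width_px, height_px, 8)  # comparison only

print(f"{width_px} x {height_px} px")
print(f"4-bit raw: {size_4bit / 1e6:.1f} MB")
print(f"8-bit raw: {size_8bit / 1e6:.1f} MB")
```

The raw sizes land on either side of the quoted figure of circa 10 MB, so that figure is at least plausible for a single-page TIFF; the actual size after LZW compression depends on the page content.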

10 Image Adjusting
- Software: Book Restorer from i2S
- Designed to process scanned books
- Geometrical correction
- Crop
- Blur
- Binarization
- Despeckle
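
To illustrate what two of the listed operations mean, here is a minimal sketch of binarization and despeckling on a tiny greyscale raster. It is not Book Restorer's algorithm, which the slides do not describe: the global threshold, the 0 to 255 value range, and the "no ink neighbour" rule are all assumptions chosen for simplicity.

```python
def binarize(gray, threshold=128):
    """Map each greyscale value to 1 (ink) if darker than the threshold, else 0."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]

def despeckle(binary):
    """Clear ink pixels that have no ink among their 8 surrounding pixels."""
    height, width = len(binary), len(binary[0])
    out = [row[:] for row in binary]
    for y in range(height):
        for x in range(width):
            if binary[y][x] != 1:
                continue
            neighbours = sum(
                binary[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            )
            if neighbours == 0:
                out[y][x] = 0  # isolated speck: treat as noise
    return out

# Hypothetical 4 x 4 page fragment: one isolated dark pixel and one small blob.
gray = [
    [255, 255, 255, 255],
    [255,  10, 255, 255],
    [255, 255, 255,  20],
    [255, 255,  30,  20],
]
cleaned = despeckle(binarize(gray))
for row in cleaned:
    print(row)
```

The isolated pixel is removed while the connected blob in the lower-right corner survives, which is the behaviour a despeckle step is meant to provide.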

24 Basic Metadata
- XML (DTD of the Czech National Library)
- Title, basic bibliographic data
- Physical size of the journal
- Numbers of pages
- Software: Sirius (CZ)

27 OCR
- FineReader 8.1
- 2 runs:
  1. to recognize the language of each paragraph
  2. to do OCR with the right language
- OCR workflow developed by the team of Dr. P. Sojka
- Output: double-layer PDF
  1. layer: scanned picture
  2. layer: "OCRed" text
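
The two-run idea on this slide can be sketched as plain control flow. This is only an illustration under stated assumptions: `two_pass_ocr`, `fake_detect` and `fake_recognize` are hypothetical names, the stand-in functions do no real recognition, and nothing here reflects the FineReader API or the actual workflow built by Dr. Sojka's team.

```python
from typing import Callable, List, Sequence, Tuple

def two_pass_ocr(
    paragraphs: Sequence[bytes],
    detect_language: Callable[[bytes], str],
    recognize: Callable[[bytes, str], str],
) -> List[Tuple[str, str]]:
    """Return (language, text) per paragraph using two separate runs."""
    # Run 1: decide the language of every paragraph.
    languages = [detect_language(p) for p in paragraphs]
    # Run 2: recognize each paragraph with the language chosen in run 1.
    return [(lang, recognize(p, lang)) for p, lang in zip(paragraphs, languages)]

# Stand-in callables for demonstration only; a real OCR engine would replace them.
def fake_detect(paragraph: bytes) -> str:
    return "cs" if paragraph.startswith(b"cs:") else "en"

def fake_recognize(paragraph: bytes, language: str) -> str:
    # A real engine would use `language` to select dictionaries and models.
    return paragraph.split(b":", 1)[1].decode("utf-8")

result = two_pass_ocr([b"cs:veta", b"en:theorem"], fake_detect, fake_recognize)
print(result)
```

Passing the engine calls in as parameters keeps the sketch independent of any particular OCR product.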

28 Backup and movement of the data
Main steps and outputs:
1. Scanning: TIFF
2. Image adjustment and basic metadata: TIFF, XML
3. OCR: PDF
After each step above:
- One copy to the server in Brno
- Two copies on LTO tapes
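
The backup rule on this slide can be written down as a small plan generator. This is a sketch only: the `Step` class, the `backup_jobs` function and the tape labels are hypothetical, and the slide does not say how the copies are actually made or verified.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class Step:
    name: str
    outputs: Tuple[str, ...]

# Steps and output formats as listed on the slide.
STEPS = [
    Step("scanning", ("TIFF",)),
    Step("image adjustment and basic metadata", ("TIFF", "XML")),
    Step("OCR", ("PDF",)),
]

# One server copy and two tape copies; the tape labels are placeholders.
DESTINATIONS = ["server in Brno", "LTO tape copy 1", "LTO tape copy 2"]

def backup_jobs(steps: List[Step], destinations: List[str]) -> List[Tuple[str, str, str]]:
    """List every (step, output format, destination) copy the rule implies."""
    return [
        (step.name, fmt, dest)
        for step in steps
        for fmt in step.outputs
        for dest in destinations
    ]

jobs = backup_jobs(STEPS, DESTINATIONS)
for job in jobs:
    print(job)
```

Listing the jobs explicitly makes it easy to check that every output of every step has all three copies.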

29 Production for DML-CZ to date
- Scanning: 97,268 pages
- Image adjustment: 123,961 pages
- Basic metadata: 96,009 pages
- OCR: 126,278 pages
- The totals differ because some data were obtained from GDZ Göttingen
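
The disproportion between the steps can be made explicit with a little arithmetic on the figures from this slide. Reading a positive difference as pages that came from outside the scanning step is an interpretation based on the slide's remark about GDZ Göttingen, not a figure the slide reports.

```python
# Page counts per step, as given on the slide.
production = {
    "scanning": 97268,
    "image adjustment": 123961,
    "basic metadata": 96009,
    "OCR": 126278,
}

# Difference of each later step relative to the number of pages scanned in-house.
difference = {
    step: pages - production["scanning"]
    for step, pages in production.items()
    if step != "scanning"
}

for step, delta in difference.items():
    print(f"{step}: {delta:+d} pages relative to scanning")
```

Image adjustment and OCR are ahead of scanning by 26,693 and 29,010 pages respectively, while basic metadata trails it by 1,259 pages.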

30 Alternative output for the Academy of Sciences mathematics content: http://kramerius.lib.cas.cz

31 Thank you! Questions? Martin Lhoták lhotak@knav.cz www.knav.cz

