Published by Merry Banks. Modified over 9 years ago.
1
DML-CZ: Scanning and adjusting the images. Martin Lhoták, Academy of Sciences Library. Launching the DML-CZ, 11. 5. 2008, Prague
2
DML-CZ Workflow 1. Preparation 2. Scanning and adjusting the images 3. OCR 4. Metadata harvesting (MR, ZBL) 5. Integration 6. Digital Library
3
Content: 1. Digitization Centre of the AS Library 2. Scanning 3. Adjusting the images 4. Basic metadata 5. OCR 6. Backup and movement of the data 7. Production to date
4
Digitization Centre of the AS Library: In operation since 1. 1. 2004. Built with support from the EU Solidarity Fund after the 2002 floods in Czechia. Main aim: to build a digital library of scientific publications published by the Academy of Sciences of the Czech Republic (Digital Library of ASCR). Partner of the DML-CZ project since 2005.
5
The Academy of Sciences of the Czech Republic: > 50 scientific institutes; 7500 employees (4000 in R&D); > 11 000 articles, reports, etc. a year; publishes > 90 journals (circa 3000 articles); > 100 years of history.
6
Digitization Centre of the AS Library: 2 x A2 black-and-white scanners Zeutschel OS 7000; 1 x A1 color scanner Digibook 10000; 1 x A4 fast production scanner Panasonic. Staff: 8 to 10 people. Monthly production: 40 000 to 50 000 pages. Overall production: > 2 000 000 pages.
7
DML-CZ: Scanning. 2 x A2 bw scanners Zeutschel OS 7000; 600 DPI, 4-bit greyscale; 1 page = 1 file; usually A5; TIFF with lossless LZW compression, circa 10 MB.
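As a rough plausibility check of the figures on this slide, the sketch below computes the pixel dimensions and uncompressed size of one A5 page at 600 DPI and 4 bits per pixel. The A5 dimensions (148 x 210 mm) are the ISO 216 values; the function name `raw_scan_size` and the assumption that pixels are packed into whole bytes per row are illustrative and not taken from the presentation.

```python
import math

MM_PER_INCH = 25.4


def raw_scan_size(width_mm, height_mm, dpi, bits_per_pixel):
    """Return (width_px, height_px, raw_bytes) for an uncompressed scan."""
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    # Assumption: each row is bit-packed and padded to a whole number of bytes.
    bytes_per_row = math.ceil(width_px * bits_per_pixel / 8)
    return width_px, height_px, bytes_per_row * height_px


w, h, size = raw_scan_size(148, 210, dpi=600, bits_per_pixel=4)
print(f"{w} x {h} px, about {size / 1e6:.1f} MB uncompressed")
```

Under these assumptions the page is 3496 x 4961 pixels and roughly 8.7 MB before compression, which is of the same order as the circa 10 MB stated on the slide. The real file size depends on the scanned margin, the page format, and how well LZW compresses a given page.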
10
Image Adjusting. Software: Book Restorer from i2S, designed to process scanned books. Operations: geometrical correction, crop, blur, binarization, despeckle.
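Book Restorer is a commercial product and the presentation does not describe how it implements these operations. Purely to illustrate what two of the named steps mean, here is a generic textbook-style sketch of binarization (fixed threshold) and despeckling (removing isolated ink pixels) on a 4-bit greyscale image. The threshold value, the function names, and the 8-neighbour rule are assumptions chosen for the example.

```python
from typing import List

Grey = List[List[int]]   # rows of 4-bit grey levels: 0 = black, 15 = white
Bits = List[List[int]]   # rows of binary pixels: 1 = ink, 0 = paper


def binarize(image: Grey, threshold: int = 8) -> Bits:
    """Mark every pixel darker than the (assumed) threshold as ink."""
    return [[1 if px < threshold else 0 for px in row] for row in image]


def despeckle(bits: Bits) -> Bits:
    """Remove ink pixels that have no ink among their 8 neighbours."""
    height = len(bits)
    width = len(bits[0]) if bits else 0
    cleaned = [row[:] for row in bits]
    for y in range(height):
        for x in range(width):
            if bits[y][x] != 1:
                continue
            # Sum the 3x3 window (clipped at the borders), minus the pixel itself.
            ink_neighbours = sum(
                bits[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ) - 1
            if ink_neighbours == 0:
                cleaned[y][x] = 0
    return cleaned


page = [
    [15, 15, 15, 15, 15],
    [15,  2, 15, 15, 15],   # a single dark speck
    [15, 15, 15, 15, 15],
    [15, 15, 15,  1,  3],   # a two-pixel stroke
    [15, 15, 15, 15, 15],
]
clean = despeckle(binarize(page))
```

In this toy page the lone speck is dropped while the two-pixel stroke survives. Production tools typically use adaptive thresholds and size-based speck filters rather than this single-pixel rule.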
24
Basic Metadata: XML (DTD of the Czech National Library). Title and basic bibliographic data; physical size of the journal; numbers of pages. Software: Sirius (CZ).
27
OCR: FineReader 8.1. Two runs: 1. recognize the language of each paragraph; 2. run OCR with the correct language. OCR workflow developed by the team of Dr. P. Sojka. Output: two-layer PDF: layer 1 is the scanned image, layer 2 is the OCRed text.
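The two-run idea can be sketched independently of any particular OCR engine. The code below is not the DML-CZ workflow and does not call FineReader; `ocr` is a caller-supplied function with an assumed signature `ocr(image, languages) -> str`, and the stopword-based `guess_language` is a deliberately naive stand-in for real language identification.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple

# Hypothetical, tiny stopword lists used only for this illustration.
STOPWORDS: Dict[str, Set[str]] = {
    "en": {"the", "and", "of", "is"},
    "cs": {"je", "se", "na", "že"},
    "de": {"der", "und", "ist", "das"},
}


def guess_language(rough_text: str) -> str:
    """Return the language whose stopwords occur most often in the text.

    Ties (including no matches at all) fall back to the first language listed.
    """
    words = rough_text.lower().split()
    scores = {
        lang: sum(word in stops for word in words)
        for lang, stops in STOPWORDS.items()
    }
    return max(scores, key=scores.get)


def two_pass_ocr(
    paragraphs: Sequence[object],
    ocr: Callable[[object, List[str]], str],
) -> List[Tuple[str, str]]:
    """Run 1 finds each paragraph's language; run 2 re-reads it with that language."""
    all_languages = list(STOPWORDS)
    results = []
    for image in paragraphs:
        rough_text = ocr(image, all_languages)   # run 1: all languages enabled
        language = guess_language(rough_text)
        final_text = ocr(image, [language])      # run 2: single best language
        results.append((language, final_text))
    return results
```

The design point is that the second run restricts the recognizer to one language per paragraph, which in multilingual journals tends to reduce misread diacritics compared with a single run over all languages.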
28
Backup and movement of the data. Main steps and outputs: 1. scanning: TIFF; 2. image adjustment and basic metadata: TIFF, XML; 3. OCR: PDF. After each step above: one copy to the server in Brno and two copies on LTO tapes.
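A "one copy to the server, two copies to tape" rule is easy to get wrong silently, so a checksum comparison after each copy is a common safeguard. The sketch below is an assumption about how such a step could be scripted, not a description of the centre's actual tooling; plain directories stand in for the Brno server and the LTO tape staging areas, and the function names are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Iterable


def sha256(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, target_dirs: Iterable[Path]) -> str:
    """Copy `source` into every target directory and verify each copy."""
    expected = sha256(source)
    for target_dir in target_dirs:
        target_dir.mkdir(parents=True, exist_ok=True)
        copy = Path(shutil.copy2(source, target_dir / source.name))
        if sha256(copy) != expected:
            raise IOError(f"checksum mismatch for {copy}")
    return expected
```

Calling `back_up(Path("page_0001.tif"), [server_dir, tape_dir_1, tape_dir_2])` after each workflow step would yield the three copies listed on the slide, with the returned digest available for later integrity checks.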
29
Production for DML-CZ to date. Scanning: 97 268 pages. Image adjustment: 123 961 pages. Basic metadata: 96 009 pages. OCR: 126 278 pages. Disproportion: some data was obtained from GDZ Goettingen.
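The size of the disproportion can be read off by subtracting the locally scanned page count from each later step. The snippet below only does that arithmetic on the figures from the slide; interpreting a positive difference as a lower bound on pages that came from outside the centre's own scanning is an assumption, since the presentation gives no breakdown.

```python
production = {
    "scanning": 97_268,
    "image adjustment": 123_961,
    "basic metadata": 96_009,
    "OCR": 126_278,
}


def difference_from_scanning(counts):
    """Return each step's page count minus the locally scanned page count."""
    scanned = counts["scanning"]
    return {
        step: pages - scanned
        for step, pages in counts.items()
        if step != "scanning"
    }


for step, diff in difference_from_scanning(production).items():
    print(f"{step}: {diff:+d} pages relative to scanning")
```

Image adjustment and OCR exceed scanning by 26 693 and 29 010 pages respectively, while basic metadata trails scanning by 1 259 pages.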
30
Alternative output for the Academy of Sciences mathematics content: http://kramerius.lib.cas.cz
31
Thank you! Questions? Martin Lhoták lhotak@knav.cz www.knav.cz