Published by Bonnie Rice. Modified over 9 years ago.
1
Practices and Open Problems of Document Digitization for the Million Book Project
Xiaohui Zheng, Tsinghua University Library
2
Background
THU joined the CADAL Project at the end of 2002 and finished 50,000 e-books and e-dissertations in July 2006.
The Digitization Center was founded in March 2003. It is affiliated with the Digital Library Research Division of THU.
3
Experiences
In-house or outsource
Planning and Source Material Selection
Digitization Process
Facility and Staff Management
4
In-house or outsource
In-house pros:
1. Full control over all procedures, the handling of materials, and product quality.
2. No risk of working with a vendor who turns out to be incompetent.
5
In-house or outsource
In-house pros:
3. Provides a foundation of experience that helps with policy making, cost analysis, standards development, and data transfer.
4. Keeping the production line in-house helps other digitization projects move forward smoothly within a flexible organization.
6
In-house or outsource
In-house cons:
1. Less experience in staffing and workflow management
2. Low productivity
3. Small scale
7
In-house or outsource
Outsource pros:
1. Professional staff and a developed workflow
2. High productivity: large output in a short time
3. Large scale
8
Our Choice
In-house operation
A staff of 10 is enough to finish 50,000 e-books in 3 years.
There is enough time to train staff and improve efficiency.
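The staffing claim above can be checked with simple arithmetic. The sketch below uses the three figures from the slide (50,000 e-books, 3 years, 10 staff); the number of working days per year is an assumption that does not come from the slides, and the function name is hypothetical.

```python
# Figures from the slide.
TOTAL_BOOKS = 50_000
YEARS = 3
STAFF = 10

# Assumption (not from the slides): about 250 working days per year.
WORKING_DAYS_PER_YEAR = 250


def required_rates(total_books, years, staff, working_days):
    """Return the throughput needed per year, per staff-year, and per staff-day."""
    per_year = total_books / years
    per_staff_year = per_year / staff
    per_staff_day = per_staff_year / working_days
    return per_year, per_staff_year, per_staff_day


per_year, per_staff_year, per_staff_day = required_rates(
    TOTAL_BOOKS, YEARS, STAFF, WORKING_DAYS_PER_YEAR
)
print(f"{per_year:.0f} books/year, "
      f"{per_staff_year:.0f} books/staff/year, "
      f"{per_staff_day:.2f} books/staff/day")
```

Under the assumed working calendar, each staff member would need to complete roughly six to seven books per working day, averaged over all steps of the workflow.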
9
Source Material Selection
Copyright was the place to start
Easy to handle
Good quality of materials (not fragile)
Quick submission of the title list for deduplication
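The slides do not describe how the title list is deduplicated. As a minimal sketch, assuming titles are compared after simple text normalization (case, punctuation, whitespace, Unicode width forms), a check against already digitized titles could look like this. The function names and the matching rule are illustrative assumptions, not the CADAL procedure.

```python
import re
import unicodedata


def normalize_title(title):
    """Fold case, width variants, punctuation and spacing so near-identical titles match."""
    text = unicodedata.normalize("NFKC", title).casefold()
    text = re.sub(r"[^\w\s]", " ", text)  # punctuation becomes a space
    return " ".join(text.split())         # collapse runs of whitespace


def find_duplicates(submitted, already_digitized):
    """Split a submitted title list into (unique, duplicates).

    A title counts as a duplicate if its normalized form matches an
    already digitized title or an earlier title in the same list.
    """
    seen = {normalize_title(t) for t in already_digitized}
    unique, duplicates = [], []
    for title in submitted:
        key = normalize_title(title)
        if key in seen:
            duplicates.append(title)
        else:
            seen.add(key)
            unique.append(title)
    return unique, duplicates
```

Matching on titles alone is deliberately crude; a real title list would likely also compare author, edition, and year before rejecting an item.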
10
Digitization Process
Preparation (selection, identifier assignment)
Scanning
Image processing
Metadata creation and packaging
Quality control
Data storage and backup
11
Ancient Book Scanning and Image Processing (double-page, upside-down scanning)
12
De-speckling and Centering (CADAL production tool, image processing)
13
Splitting into two pages (Batch processing)
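The slides show this step only as a screenshot. A minimal sketch of the idea, assuming each scan is held as a list of pixel rows and the gutter is at the horizontal midpoint unless given, is shown below. The function names are hypothetical and this is not the tool used in the project.

```python
def split_double_page(image, gutter=None):
    """Split one double-page scan into (left, right) page images.

    image  -- list of rows, each row a list of pixel values
    gutter -- column index of the page boundary; defaults to the midpoint
    """
    if not image or not image[0]:
        raise ValueError("empty image")
    width = len(image[0])
    if gutter is None:
        gutter = width // 2
    if not 0 < gutter < width:
        raise ValueError("gutter must lie inside the image")
    left = [row[:gutter] for row in image]
    right = [row[gutter:] for row in image]
    return left, right


def split_batch(images):
    """Apply the split to every scan and return a flat list of single pages."""
    pages = []
    for image in images:
        left, right = split_double_page(image)
        pages.extend([left, right])
    return pages
```

The batch function emits the left page first; for books read right to left the two pages would need to be swapped.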
14
Rotating (Batch processing)
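The rotation angle is not stated on the slide. Assuming pages scanned upside down need a 180-degree turn, and using the same list-of-rows image representation as an illustrative assumption, batch rotation can be sketched as follows. The function names are hypothetical.

```python
def rotate_180(image):
    """Rotate an image (list of pixel rows) by 180 degrees."""
    return [row[::-1] for row in image[::-1]]


def rotate_90_clockwise(image):
    """Rotate an image by 90 degrees clockwise."""
    return [list(column) for column in zip(*image[::-1])]


def rotate_batch(images, rotate=rotate_180):
    """Apply the same rotation to every page in a batch."""
    return [rotate(image) for image in images]
```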
15
De-skewing (Batch processing) TPI
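The slides do not say which de-skewing algorithm the tool applies. One common approach is the projection-profile method: try several candidate skews and keep the one that makes text lines fall most sharply onto single rows. The sketch below is a simplified version under stated assumptions: the page is a binary image (1 = ink), the skew is small, and correction is done by shifting columns vertically (a shear) rather than by a true rotation. All names are hypothetical.

```python
def _shifted_row(y, x, slope):
    """Row that pixel (x, y) moves to when column x is shifted by slope * x."""
    return y - round(slope * x)


def row_profile_score(image, slope):
    """Sum of squared per-row ink counts after undoing the candidate slope.

    The score is highest when ink concentrates in few rows, which is
    what happens when text lines become horizontal.
    """
    height, width = len(image), len(image[0])
    counts = [0] * height
    for y in range(height):
        for x in range(width):
            if image[y][x]:
                target = _shifted_row(y, x, slope)
                if 0 <= target < height:
                    counts[target] += 1
    return sum(c * c for c in counts)


def estimate_skew_slope(image, candidates):
    """Pick the candidate slope (rows per column) with the sharpest profile."""
    return max(candidates, key=lambda s: row_profile_score(image, s))


def deskew(image, slope):
    """Shift each column so that lines with the given slope become horizontal."""
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if image[y][x]:
                target = _shifted_row(y, x, slope)
                if 0 <= target < height:
                    out[target][x] = 1
    return out
```

The slope is expressed in rows per column, so the corresponding skew angle is its arctangent. For larger skews a real rotation with interpolation would be more appropriate than this shear.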
16
Format conversion (Batch processing)
17
Metadata creation and packaging
18
Facility and Staff Management
Facility:
Three flatbed AVA3 AVISION scanners
Two FB6000E AVISION flatbed scanners
Minolta PS 7000
High-speed AVISION AV3800
Staff: 1 manager, 1 technical supervisor, 11 temporary staff
Capacity: 5,000,000 pages/year
19
Network topology and data storage system
[Diagram: WAN, gateway, LAN, Gigabit Ethernet switch, NAS, DAS, backup system, Dell system, 4 flatbed scanners, high-speed scanner, face-up scanner, 9 manual processing PCs, 6 automatic processing PCs]
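The diagram shows primary storage and a backup system but not how copies are verified. As a minimal sketch, assuming both copies are reachable as directory trees, a checksum manifest can confirm that the backup matches the primary store. The function names and directory layout are hypothetical and do not describe the center's actual procedure.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_manifests(primary, backup):
    """Return (missing, mismatched) relative paths, judged from the primary copy."""
    missing = sorted(set(primary) - set(backup))
    mismatched = sorted(
        name for name in primary
        if name in backup and primary[name] != backup[name]
    )
    return missing, mismatched
```

Running `compare_manifests(build_manifest(primary_dir), build_manifest(backup_dir))` after each backup would list files that never reached the backup and files whose content differs.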
20
Related Software
Scanning: QuickScan…
Image processing: Bookshop, ACDSee, XnView, UltraEdit, Scanfix, DjVuerPro,…
Cataloging and Packaging: CADAL Cataloging Tool, OEBEditor, CMDL Cataloging Toolkit,…
Data transfer: DResManages
21
Open Problems and Considerations
Content Discovery: Metadata description is rough and inconsistent.
Resource Selection: The coverage of the million books is neither clear nor systematic.
22
Open Problems and Considerations
OCR Processing: OCR processing has not yet started. OCR technology for ancient books is underdeveloped.
Copyright Problem: Almost 400,000 dissertations and modern books in the CADAL collection have no clear copyright disclaimer.
23
Open Problems and Considerations
Organization Structure: My suggestion is to have more source collection providers and fewer digitization centers.
24
Thank you for your attention!