Practices and Open Problems of Document Digitization For Million Book Project Xiaohui Zheng Tsinghua Univ. Library.

Background
THU participated in the CADAL Project at the end of 2002 and finished E-books and E-dissertations in Jul
The Digitization Center was founded in March of
It is affiliated to the Digital Library Research Division of THU.

Experiences
In house or outsource
Planning and Source Material Selection
Digitization Process
Facility and Staff Management

In house or outsource
In House Pro:
1. Control over all procedures, the handling of materials, and the quality of products.
2. No worry about working with a vendor who turns out to be incompetent.

In house or outsource
In House Pro:
3. Provides a foundation of experience that helps with creating policies, cost analyses, standards, and data transfer.
4. Keeping the production line in house lets other digitization projects move forward smoothly within a flexible organization.

In house or outsource
In House Con:
1. Less experience in staffing and workflow management
2. Low productivity
3. Small scale

In house or outsource
Outsource Pro:
1. Professional staff and a developed workflow
2. High productivity: large output in a short time
3. Large scale

Our Choice
In-house operation
10 staff are enough to finish E-books in 3 years
Enough time to train staff and improve efficiency

Source Material Selection
Copyright was the place to start
Easy to handle
Good quality of materials (not fragile)
Quick action in submitting the title list for deduplication
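The deduplication step can be illustrated with a minimal sketch. It assumes that titles are compared after normalizing case, character width, punctuation, and spacing; the slides do not say how the project actually matched titles, and the function names and sample titles below are hypothetical.

```python
import unicodedata


def normalize_title(title: str) -> str:
    """Fold case, character width, punctuation, and spacing for comparison."""
    folded = unicodedata.normalize("NFKC", title).casefold()
    # Keep only letters and digits so punctuation and spacing are ignored.
    return "".join(ch for ch in folded if ch.isalnum())


def deduplicate(candidates, already_claimed):
    """Return candidate titles that are neither claimed already nor repeated."""
    seen = {normalize_title(t) for t in already_claimed}
    unique = []
    for title in candidates:
        key = normalize_title(title)
        if key and key not in seen:
            seen.add(key)
            unique.append(title)
    return unique


# Illustrative data only; not taken from the project's title lists.
claimed = ["Dream of the Red Chamber"]
submitted = [
    "Dream of the  Red Chamber.",
    "Journey to the West",
    "journey to the west",
    "Water Margin",
]
print(deduplicate(submitted, claimed))
# ['Journey to the West', 'Water Margin']
```

Exact matching on a normalized key is the simplest choice; a real title list would probably also need author and edition comparison.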

Digitization Process
Preparation (selection, identifier assignment)
Scanning
Image processing
Metadata creation and packaging
Quality control
Data storage and backup
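For the last two steps, quality control and backup, one common safeguard is a checksum manifest. The sketch below is an assumption about how a backup copy could be verified, not a description of the center's actual procedure; the folder layout and file names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large page images do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path) -> dict:
    """Map each file's relative path to its SHA-256 checksum."""
    return {
        p.relative_to(folder).as_posix(): sha256_of(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def verify_copy(manifest: dict, copy_folder: Path) -> list:
    """Return the relative paths that are missing or differ in the copy."""
    problems = []
    for name, expected in manifest.items():
        target = copy_folder / name
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(name)
    return problems


def demo() -> list:
    """Build a tiny master folder and a backup with one damaged page."""
    with tempfile.TemporaryDirectory() as tmp:
        master, backup = Path(tmp, "master"), Path(tmp, "backup")
        master.mkdir()
        backup.mkdir()
        for name, data in [("0001.tif", b"page one"), ("0002.tif", b"page two")]:
            (master / name).write_bytes(data)
        (backup / "0001.tif").write_bytes(b"page one")
        (backup / "0002.tif").write_bytes(b"page 2 corrupted")
        return verify_copy(build_manifest(master), backup)


print(demo())
# ['0002.tif']
```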

Ancient book scanning and image processing (double-page, upside-down scanning)

De-speckling and Centering
CADAL production tool: image processing
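As a rough illustration of these two operations, the sketch below works on a binary image stored as a list of rows (1 = black, 0 = white). It assumes a simple neighbour-count filter for de-speckling and bounding-box centering; the CADAL tool's own algorithms are not described in the slides and may differ.

```python
def despeckle(image, min_neighbors=1):
    """Clear black pixels with fewer than min_neighbors black 8-neighbours."""
    height, width = len(image), len(image[0])
    cleaned = [row[:] for row in image]
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1:
                continue
            neighbors = sum(
                image[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            )
            if neighbors < min_neighbors:
                cleaned[y][x] = 0
    return cleaned


def center(image):
    """Shift the black content so its bounding box sits mid-canvas."""
    height, width = len(image), len(image[0])
    black = [(y, x) for y in range(height) for x in range(width) if image[y][x]]
    if not black:
        return [row[:] for row in image]
    top, bottom = min(y for y, _ in black), max(y for y, _ in black)
    left, right = min(x for _, x in black), max(x for _, x in black)
    dy = (height - (bottom - top + 1)) // 2 - top
    dx = (width - (right - left + 1)) // 2 - left
    shifted = [[0] * width for _ in range(height)]
    for y, x in black:
        shifted[y + dy][x + dx] = 1
    return shifted


# One isolated speck in the corner, and a 2x2 block of "text" off-center.
page = [
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
]
result = center(despeckle(page))
for row in result:
    print(row)
```

De-speckling runs first so that stray specks do not stretch the bounding box used for centering.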

Splitting into two pages (Batch processing)
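A minimal sketch of batch splitting, assuming each double-page scan is cut at the horizontal midpoint and that the right-hand page comes first, as in traditionally bound Chinese books. A production tool would more likely detect the gutter; the function names are hypothetical.

```python
def split_spread(image, right_to_left=True):
    """Cut a double-page image (list of pixel rows) at its midpoint."""
    middle = len(image[0]) // 2
    left_page = [row[:middle] for row in image]
    right_page = [row[middle:] for row in image]
    # Reading order: right page first for right-to-left books.
    return [right_page, left_page] if right_to_left else [left_page, right_page]


def split_batch(spreads, right_to_left=True):
    """Split every spread and return single pages in reading order."""
    pages = []
    for spread in spreads:
        pages.extend(split_spread(spread, right_to_left))
    return pages


spread = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
]
print(split_batch([spread]))
# [[[3, 4], [7, 8]], [[1, 2], [5, 6]]]
```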

Rotating (Batch processing)
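Since the ancient books were scanned upside down, the batch rotation is presumably a 180-degree turn. The sketch below shows that and a 90-degree clockwise turn on images stored as lists of pixel rows; it is an illustration, not the tool used in the project.

```python
def rotate_180(image):
    """Turn an image upside down: reverse the rows, then each row."""
    return [row[::-1] for row in image[::-1]]


def rotate_90_clockwise(image):
    """Rotate clockwise: the first row becomes the last column."""
    return [list(column) for column in zip(*image[::-1])]


def rotate_batch(images, rotate=rotate_180):
    """Apply the same rotation to every image in the batch."""
    return [rotate(image) for image in images]


page = [
    [1, 2],
    [3, 4],
]
print(rotate_batch([page]))
# [[[4, 3], [2, 1]]]
```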

De-skewing (batch processing) TPI
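One widely used way to estimate skew is the projection-profile method: try a range of small angles and keep the one that makes the row sums of black pixels most sharply peaked. The sketch below assumes that method and approximates a small rotation with a vertical shear; the slides do not state which algorithm the software named here uses.

```python
import math


def shear_columns(image, angle_degrees):
    """Approximate a small rotation by shifting column x down by x * tan(angle)."""
    height, width = len(image), len(image[0])
    slope = math.tan(math.radians(angle_degrees))
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if image[y][x]:
                new_y = y + round(x * slope)
                if 0 <= new_y < height:
                    out[new_y][x] = 1
    return out


def profile_score(image):
    """Sum of squared row totals: highest when text lines are level."""
    return sum(sum(row) ** 2 for row in image)


def estimate_skew(image, max_angle=5.0, step=0.5):
    """Return the angle whose inverse shear gives the sharpest profile."""
    steps = int(round(max_angle / step))
    candidates = [i * step for i in range(-steps, steps + 1)]
    # On ties, prefer the smallest correction.
    return max(
        candidates,
        key=lambda a: (profile_score(shear_columns(image, -a)), -abs(a)),
    )


# Synthetic page: two level "text lines", then skewed by 3 degrees.
straight = [[0] * 40 for _ in range(20)]
for line_row in (5, 12):
    straight[line_row] = [1] * 40
skewed = shear_columns(straight, 3.0)

estimated = estimate_skew(skewed)
deskewed = shear_columns(skewed, -estimated)
print(estimated)
```

The search range and step size are arbitrary choices for the example; real scans would need a finer step and proper resampling instead of a shear.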

Format transferring (Batch processing)

Metadata creation and packaging

Facility and Staff Management
Facility:
Three flatbed AVA3 AVISION scanners
Two FB6000E AVISION flatbed scanners
Minolta PS 7000
High-speed AVISION AV3800
Staff: 1 manager, 1 technical supervisor, 11 temp. staff
Capacity: 5,000,000 pages/year
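To put the stated capacity in daily terms, here is a small worked calculation. The annual capacity and the temp staff count come from the slide; the figure of 250 working days per year is an assumption, and dividing evenly across temp staff is only a rough yardstick, since not every staff member scans.

```python
PAGES_PER_YEAR = 5_000_000   # from the slide
TEMP_STAFF = 11              # from the slide
WORKING_DAYS_PER_YEAR = 250  # assumption, not stated in the slides


def daily_throughput(pages_per_year, working_days):
    """Pages that must be completed per working day to reach the yearly total."""
    return pages_per_year / working_days


pages_per_day = daily_throughput(PAGES_PER_YEAR, WORKING_DAYS_PER_YEAR)
pages_per_staff_day = pages_per_day / TEMP_STAFF

print(f"{pages_per_day:.0f} pages/day")
print(f"about {pages_per_staff_day:.0f} pages per temp staff member per day")
# 20000 pages/day
# about 1818 pages per temp staff member per day
```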

Network topology and data storage system
[Diagram; labeled components: WAN, gateway, LAN, Gigabit Ethernet switch, NAS, backup system, DAS, Dell system, 4 flatbed scanners, high-speed scanner, face-up scanner, 9 manual processing PCs, 6 automatic processing PCs]

Related Software
Scanning: QuickScan…
Image processing: Bookshop, ACDSee, XnView, UltraEdit, Scanfix, DjVuerPro,…
Cataloging and Packaging: CADAL Cataloging Tool, OEBEditor, CMDL Cataloging Toolkit,…
Data transferring: DResManages

Open Problems And Considerations
Content Discovery: Metadata description is rough and inconsistent.
Resource Selection: The coverage of the million books is not clear or systematic.
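Inconsistent metadata of the kind noted under Content Discovery can be caught with a simple validation pass. The sketch below is hypothetical: the required fields and the date format are assumptions chosen for illustration, not the CADAL metadata specification.

```python
import re

# Hypothetical rules; the project's real metadata schema is not given in the slides.
REQUIRED_FIELDS = ("title", "creator", "date")
DATE_PATTERN = re.compile(r"^\d{4}(-\d{2}(-\d{2})?)?$")  # YYYY, YYYY-MM, or YYYY-MM-DD


def check_record(record):
    """Return a list of problems found in one metadata record."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not str(record.get(field, "")).strip():
            problems.append(f"missing {field}")
    date = str(record.get("date", "")).strip()
    if date and not DATE_PATTERN.match(date):
        problems.append(f"non-standard date: {date}")
    return problems


# Illustrative records only.
records = [
    {"title": "A", "creator": "B", "date": "1935"},
    {"title": "A", "creator": "", "date": "Republic 24"},
]
for record in records:
    print(check_record(record))
# []
# ['missing creator', 'non-standard date: Republic 24']
```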

Open Problems And Considerations
OCR Processing: OCR processing has not yet started. OCR technology for ancient books is underdeveloped.
Copyright Problem: Almost 400,000 dissertations and modern books in the CADAL collection have no clear copyright disclaimer.

Open Problems And Considerations
Organization Structure: My suggestion is more source collection providers and fewer digitization centers.

Thank you for your attention!