Published by Eleanor Phelps. Modified over 8 years ago.
1
Million Book Project: Vision Becoming Reality. Gabrielle Michalek, Carnegie Mellon. Presentation to Carnegie Mellon Qatar Library, November 9 & 10, 2005.
2
Vision: “To attempt to understand and solve the technical, economic, and social policy issues of providing online access to all creative works of the human race.” – Dr. Raj Reddy
3
What is the Million Book Project? The Million Book Project (MBP) is a worldwide endeavor to digitize and provide full-text searching and free-to-read access to a million books by 2007.
4
Why is this important? To share knowledge and inform citizenry; facilitate new knowledge; enhance student learning and success of faculty research; address copyright absurdities; support digital library research; preserve rare and fragile cultural materials.
5
Digital library research initiatives: Machine translation; massive distributed database; storage formats; use of digital libraries; distribution and sustainability; security; search engines; image processing; Optical Character Recognition (OCR); language processing; copyright laws.
6
Who is involved? Carnegie Mellon University Libraries and the School of Computer Science; other U.S. libraries; OCLC, Digital Library Federation, and College & Research Libraries; Internet Archive; U.N. Food and Agriculture Organization; India; China.
7
Partners: Indian Institute of Science; International Institute of Information Technology; Indian Institute of Information Technology; Anna University; Mysore University; University of Pune; Goa University; Tirumala Tirupati Devasthanams; Shanmugha Arts, Science, Technology & Research Academy; Arulmigu Kalasalingam College of Engineering; Maharashtra Industrial Development Corporation; Chinese Academy of Science; Chinese Ministry of Education; Fudan University; Nanjing University; Peking University; Tsinghua University; Zhejiang University.
8
Partners: National Science Foundation. 2001: $665,600; 2002: $1,000,000; 2003: $1,000,000; 2004: $1,000,000; 2005: $58,500 for equipment and travel.
9
Content parameters: Balance users’ wants with legality; opportunity-driven, many sub-collections. Some content strategies: Books for College Libraries; public domain materials; cultural heritage materials.
10
Almost 500,000 books scanned to date: 230,000 books in Chinese; 100,000 books in Indian languages; 140,000 English or Western-language books; incised palm leaves from the Saraswathi Mahal Library.
11
Scanning in India: Established 20 scanning centers; have scanned 200,000 books to date; provides above-average wages and desirable jobs.
12
Scanning in China: Established 17 scanning centers, including one in the Shenzhen Free Trade Zone; Shenzhen scanning center; are scanning indigenous materials, public domain works shipped from the U.S., and U.S. copyrighted works already in Chinese libraries (with permission granted); provides above-average wages and desirable jobs.
13
Million Book Project in China: Centers scan 1,000 volumes / 200,000 pages daily; 270,000 volumes have been scanned to date.
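The daily figures on this slide imply a few derived numbers. The short Python sketch below works them out; it assumes the quoted daily rate is an average and is held constant, which the slide does not state, so the results are rough estimates only.

```python
# Figures quoted on the slide.
VOLUMES_PER_DAY = 1_000
PAGES_PER_DAY = 200_000
VOLUMES_SCANNED = 270_000

# Average volume length implied by the two daily figures.
pages_per_volume = PAGES_PER_DAY // VOLUMES_PER_DAY

# Assumption: the quoted daily rate is constant (not claimed by the slide).
days_at_quoted_rate = VOLUMES_SCANNED // VOLUMES_PER_DAY
pages_scanned_estimate = VOLUMES_SCANNED * pages_per_volume

print(f"Implied pages per volume: {pages_per_volume}")
print(f"Scanning days at the quoted rate: {days_at_quoted_rate}")
print(f"Estimated pages scanned to date: {pages_scanned_estimate:,}")
```

At the quoted rate, the 270,000 volumes correspond to roughly 270 scanning days and about 54 million pages.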
14
Quality control improvements: Data corruption discovered in some test-case books was caused by compressing digital files to transfer data; rather than compressing files, more disks are now used to transfer data; other quality control improvements in the Shenzhen scanning center and North Technical Center in Beijing.
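The slide does not say how the corruption was detected. One common way to catch damage introduced during transfer is to record a checksum for every file before shipping and compare it on arrival. The Python sketch below illustrates that idea; the function names (`sha256_of`, `build_manifest`, `find_corrupted`) are hypothetical and this is not a description of the project's actual procedure.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum before shipping."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_corrupted(manifest: dict[str, str], root: Path) -> list[str]:
    """List files that are missing or whose checksum changed after transfer."""
    bad = []
    for relative_path, expected in manifest.items():
        target = root / relative_path
        if not target.is_file() or sha256_of(target) != expected:
            bad.append(relative_path)
    return bad
```

In this sketch the manifest would be built at the scanning center, shipped alongside the disks, and checked with `find_corrupted` at the receiving site.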
15
Value of digitization: Digitization preserves fragile old or ancient books and manuscripts; digitization benefits the worldwide public as well as academic communities by sharing knowledge that is otherwise unavailable to citizens.
16
Standards and workflow: National standards for digital preservation (www.imls.gov/pubs/forumframework.htm); national standards for cataloging; documented workflow & training developed and provided by Carnegie Mellon University Libraries.
17
Digitization workflow: Operators scan, post-process, and OCR; 600 dpi TIFFs; Scan-Fix; ABBYY FineReader; technicians capture metadata.
18
Sustaining the collection: Goal: ten organizations host the collection; cost is ~$1M per host site; collection is ~20 terabytes. Current host sites: Digital Library of India; Universal Library, China; Universal Library, Carnegie Mellon; Internet Archive; UC Merced.
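The hosting goal can be turned into rough totals. The Python sketch below multiplies out the slide's approximate figures; it reads the host list as five separate sites and treats "~$1M" and "~20 terabytes" as exact values, both of which are simplifying assumptions.

```python
# Approximate figures from the slide, treated as exact for this estimate.
TARGET_HOSTS = 10
COST_PER_HOST_USD = 1_000_000  # "~$1M" per host site
COLLECTION_TB = 20             # "~20 terabytes"

# Assumption: the slide's list names five distinct host sites.
CURRENT_HOSTS = [
    "Digital Library of India",
    "Universal Library, China",
    "Universal Library, Carnegie Mellon",
    "Internet Archive",
    "UC Merced",
]

total_cost_usd = TARGET_HOSTS * COST_PER_HOST_USD
replicated_storage_tb = TARGET_HOSTS * COLLECTION_TB
hosts_still_needed = TARGET_HOSTS - len(CURRENT_HOSTS)

print(f"Total hosting cost at goal: ~${total_cost_usd:,}")
print(f"Total replicated storage at goal: ~{replicated_storage_tb} TB")
print(f"Host sites still needed: {hosts_still_needed}")
```

Under these assumptions, reaching the goal means about $10M in total hosting cost and about 200 TB of replicated storage, with five more host sites to recruit.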
19
Thank you. Gabrielle Michalek, Head of Archives & Digital Library Initiatives, Carnegie Mellon University Libraries, gabrielle@cmu.edu