Mass Digitization Projects: Celebration and Challenges
Presented to the 2nd ICUDL, Alexandria, Egypt
by Dr. Gloriana St. Clair, Carnegie Mellon University
Thesis
  Mass digitization projects are creating a revolution in information retrieval.
  Focusing human attention must be the new research agenda.
Main Points
  History and current state
    – Million Book Project
    – Google Print/Book
    – Open Content Alliance
  Challenges
    – Technology
    – Metadata
    – Legal issues
  What’s next
    – Organization for learning
Million Book Project
  Began in 2000
  Universal Library project
  Free to read
  Out-of-copyright; scanned with permission
  800,000 volumes
  Funding: NSF, India, China, Internet Archive
  Pilot project in Qatar in 2007
Google Print/Book Search
  Began in 2004
  Google and a half dozen partners
  Search yields snippets, then buy or borrow book…
  Presumes its strategy respects copyright
  Funding: online advertising
  Google’s ease of use shapes expectations
Open Content Alliance (OCA)
  Alliance of non-profits and universities, including Million Book Project
  Led by Brewster Kahle, Internet Archive
  Targets in-copyright books
  Digitizes onsite in libraries
Challenges
  Technology
  Metadata
  Legal issues
Technology
  Proprietary equipment: Google, OCA
  Changing standards: grayscale, color
  Bandwidth v. cost: images v. OCR'd text
  Readability v. cost: corrected v. uncorrected
Metadata
  MARC/WorldCat: access issues; in English
  Creating on the fly: inaccurate
  Native cataloging: various standards
  Traditional cataloging: cost; suitability
  Non-book formats: no standards
Legal Issues
  Copyright is our biggest constraint
    – In much of the world, a book is in copyright for the life of the author + 70 years
    – U.S. book copyright renewal records are searchable online (thanks to Michael Lesk)
    – Verifying copyright is time consuming and expensive
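The "life of the author + 70 years" rule above can be sketched as simple arithmetic. This is a minimal illustration, not a legal test: the function names are hypothetical, and it assumes the term runs through the end of the calendar year 70 years after the author's death, which is common but varies by jurisdiction. Real verification also depends on renewal records, publication date, and country, which is why the slide calls it time consuming and expensive.

```python
# Sketch of the "life of the author + 70 years" rule.
# Assumption: protection lasts through December 31 of the 70th year
# after the author's death, so the work is free from January 1 of the
# following year. Jurisdictions differ; the term is a parameter.

TERM_YEARS = 70  # assumed term, per the "life + 70" rule on the slide


def first_public_domain_year(author_death_year: int,
                             term_years: int = TERM_YEARS) -> int:
    """Return the first calendar year in which the work is out of copyright."""
    return author_death_year + term_years + 1


def is_in_copyright(author_death_year: int,
                    as_of_year: int,
                    term_years: int = TERM_YEARS) -> bool:
    """Return True if the work is still protected during as_of_year."""
    return as_of_year < first_public_domain_year(author_death_year, term_years)


if __name__ == "__main__":
    # An author who died in 1936: protected through 2006, free from 2007.
    print(first_public_domain_year(1936))   # 2007
    print(is_in_copyright(1936, 2006))      # True
    print(is_in_copyright(1936, 2007))      # False
```

Even this simple check needs a reliable death year for every author, which is metadata a scanning project often lacks.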
Copyright Strategies
  MBP
    – Approaches publishers to digitize their entire out-of-print holdings, not title by title.
  Google
    – Publishers sued over snippets. Google now pushes users to analog books.
  1st ICUDL
    – Michael Shamos proposed machine summarization as a way to deliver content without breaking copyright.
What Will Happen to Books?
  “What will happen to books? Reader, Take heart! (Publisher, be very, very afraid.) Internet search engines will set them free.”
  Kevin Kelly, 2006
What’s Next
  How will this digital repository contribute to learning, help create new knowledge and build a better future?
  “Learning takes place in the head of the student, and depends entirely on the activities of the student.”
  Herbert A. Simon, 2002
Technology + Learning Theory
  Competition for time, attention
    – Develop expert systems to assist selection
  Mastery of a discipline is now impossible
    – Sampling
    – Problem-solving
    – Just-in-time learning
Organizing Information
  Selected search
    – Discipline-specific gateways and portals
  Pattern recognition
    – IF-THEN sequences
Presenting Knowledge
  Creation of a dynamic pedagogy
    – Engage students
    – Relate concepts
    – Focus on learners, learning styles
Conclusions
  Much to celebrate
    – TEST BED: Critical mass of digital materials for scholars, for computer science research.
    – NEW FACES, NEW IDEAS: Involvement of new partners, launching of new projects.
    – ICUDL: An international group that faces similar problems and concerns, works together, and shares solutions.
Conclusions
  Next, most difficult challenge: focusing human attention
    – Selecting information
    – Presenting information
    – Enabling learning
  ICUDL: Let’s look forward to celebrating that victory as well, as partners.
Thank You
  Dr. Gloriana St. Clair
  Dean of University Libraries
  Carnegie Mellon University
  Pittsburgh, Pennsylvania