Denise Troll Covey Associate Dean, University Libraries, Carnegie Mellon Pennsylvania Library Association Conference Pittsburgh, PA – October 5, 2003 Understanding.

Similar presentations
Partnering with Faculty / researchers to Enhance Scholarly Communication Caroline Mutwiri.

What is HathiTrust and How Can it Make a Difference? Sourcing and Scaling brought to the collective collection.
Million Book Project Today Gloriana St. Clair October 21, 2003 OCLC.
DSpace: the MIT Libraries Institutional Repository MacKenzie Smith, MIT EDUCAUSE 2003, November 5 th Copyright MacKenzie Smith, This work is the.
Turning to Dust or Digital Denise Troll Covey Associate University Librarian, Carnegie Mellon Future of the Book Conference Cairns, Australia – April 2003.
1. The Digital Library Challenge The Hybrid Library Today’s information resources collections are “hybrid” Combinations of - paper and digital format.
FirstSearch Update March 2005 Vivien Cook, Regional Account Manager
Million Book Project: Dreams and Realities Dr. Gloriana St. Clair University Librarian, Carnegie Mellon.
Changes in Library Usage, Usability, & User Support Denise A. Troll Distinguished Fellow, Digital Library Federation Associate University Librarian, Carnegie.
The Million Book Project: Removing Obstacles to Use, Satisfaction, & Success Denise Troll Covey Principal Librarian for Special Projects – Carnegie Mellon.
Automated Reference Assistance: Reference for a New Generation Denise Troll Covey Associate University Librarian Carnegie Mellon CNI Meeting – April 2002.
The Million Book Project: Confronting Copyright Absurdity, Creating Copyright Hope Denise Troll Covey Associate Dean, Carnegie Mellon University Libraries.
Denise Troll Covey Principal Librarian for Special Projects The Impact of Current Copyright Law Erin Rhodes Copyright Permission Assistant Carnegie Mellon.
Global Cooperation for Global Access: The Million Book Project Denise Troll Covey Principal Librarian for Special Projects Carnegie Mellon CRIS 2004 –
Proquest. Digital Commons/Institutional Repository at Pace.
Unconditional Copyright Removing the Camouflage Denise Troll Covey Principal Librarian for Special Projects Erin Rhodes Copyright Permission Assistant.
Seeking Copyright Permission for Open Access Denise Troll Covey Associate Dean, Carnegie Mellon University Libraries Coalition for Networked Information.
Institutional Repositories Tools for scholarship Mary Westell University of Calgary AMTEC Conference May 26, 2005.
Denise Troll Covey Principal Librarian for Special Projects – Carnegie Mellon DLF Forum – April 2004 – New Orleans, LA Copyright Permission for Open Access:
1 Minerva The Web Preservation Project. 2 Team Members Library of Congress Roger Adkins Cassy Ammen Allene Hayes Melissa Levine Diane Kresh Jane Mandelbaum.
Project Updates: Posner & Million Book Projects Denise Troll Covey Principal Librarian for Special Projects July 2004 – University Libraries Staff Meeting.
The Voice of A Community Chinese Times Digitization Project Ian Song Prepared for the Multicultural Canada Conference
The National Digital Newspaper Program (NDNP) An NEH/LC Collaborative Program Enhancing access to historical newspapers Release: September 2006.
Million Book Project (MBP) Gloriana St. Clair Johns Hopkins University February 5, 2003.
Sample Search ___________________________________ Search Results Abstract ___________________________________ Full Text Online Catalog WorldCat Assessment.
Isabel Silver and Laurie Taylor IMLS Library Publishing Services Workshop May 5, 2011 UF Smathers Libraries Publishing Services.
Serenate1 Non-standard users: The Library Raf Dekeyser K.U.Leuven.
Web Capture team Office of strategic initiatives February 27, 2006 Selecting Content from the Web: Challenges and Experiences of the Library of Congress.
OCLC Online Computer Library Center CONTENTdm ® Digital Collection Management Software Ron Gardner, OCLC Digital Services Consultant ICOLC Meeting April.
Million Book Project (MBP) Coalition for Networked Information December 5-6, 2002.
Ask A Librarian and QuestionPoint: Integrating Collaborative Digital Reference in the Real World (and in a really big library) Linda J. White Digital Project.
Marshall University Electronic Theses & Dissertations Program Implementation Issues & Responsibilities.
Exploring the Feasibility of Seeking Copyright Permissions ALA Annual Conference June 16, 2001 Carole A. George, Ed. D. Carnegie Mellon University Libraries.
Live Search Books University of Toronto – Scholar’s Portal Forum 2007 January 2007.
Google Books, UMI and Other Intriguing Trends in Digital Publishing Joe Wible Hopkins Marine Station of Stanford University October 9, 2006.
11-15 April 2011 Mauritius Institute of Health S.S.Pillai
The New Digital World and the Transformation of Information and Libraries Patricia L. Thibodeau Associate Dean Library Services & Archives Oct. 26, 2011.
Preserving Digital Collections for Future Scholarship Oya Y. Rieger Cornell University
OCLC Online Computer Library Center Kathy Kie December 2007 OCLC Cataloging & Metadata Services an introduction.
Ms. Irene Onyancha ISTD/Library & Information Management Services United Nations Economic Commission for Africa The Second Session of the Committee on.
ERIC and the WorldCat Registry Lawrence Henry ERIC Program Manager Joanna White WorldCat Registry Product Manager.
Preserving Digital Culture: Tools & Strategies for Building Web Archives : Tools and Strategies for Building Web Archives Internet Librarian 2009 Tracy.
University of California Mass Digitization Projects Update Users Council Annual Meeting May 8, 2008 Heather Christenson, Mass Digitization Project Mgr,
Library of Vilnius Gediminas Technical University Asta Katinaitė, Aurelija Striogienė
OpenWeb: Expanding access to Digital Collections Marshall Breeding Director for Innovative Technologies and Research Vanderbilt University
Bringing Electronic Books to Libraries netLibrary.
1 UNOG Library Digitization and Microform Unit (DMU) – December 2009.
CBSOR,Indian Statistical Institute 30th March 07, ISI,Kokata 1 Digital Repository support for Consortium Dr. Devika P. Madalli Documentation Research &
Implementing an Institutional Repository: Part III 16 th North Carolina Serials Conference March 29, 2007 Resource Issues.
Preservation Program Digital Preservation Program Digital Preservation Services: Extending tools to meet campus needs Patricia Cruse, Director, Digital.
JOURNAL CITATION REPORTS James Cook University Celebrating Research 9 OCTOBER 2009 Steven Werkheiser Manager, Customer Education & Training ANZ Thomson.
Mass Digitization Projects Celebration and Challenges Presented to the 2 nd ICUDL Alexandria, Egypt by Dr. Gloriana St. Clair Carnegie Mellon University.
“Help, we started a journal!” Adventures in supporting open access publishing using Open Journal Systems Anna Craft Metadata Cataloger The University of.
A Resource Discovery Service for the Library of Texas Requirements, Architecture, and Interoperability Testing William E. Moen, Ph.D. Principal Investigator.
HINARI: What have we learned? World Health Organization Trieste, October 2003.
1/16/2016I. Revels Digital Imaging Workshop 1 Selection Considerations For Digital Imaging Projects.
Serenate1 The librarian’s view Raf Dekeyser K.U.Leuven.
Carnegie Mellon University’s Million Book Project (MBP) Laurel Foundation – August 27, 2002.
One Library’s Successful Venture in Providing Comprehensive Streaming Media Services Charleston Conference 2015 Saturday, November 7 10:45am - 11:15am.
Million Book Project in U. S. and India International Conference on The Future of the Book April 22, 2003 Gloriana St. Clair Carnegie Mellon University.
Million Book Project: Vision Becoming Reality Gabrielle Michalek, Carnegie Mellon Presentation to Carnegie Mellon Qatar Library November 9 & 10, 2005.
Rebecca L. Mugridge LFO Research Colloquium March 19, 2008.
Million Book Project: Collections Dr. Gloriana St. Clair University Librarian, Carnegie Mellon.
Million Book Project Today
Copyright Permission for Open Access: Costs, Strategies, & Success Rates Denise Troll Covey Principal Librarian for Special Projects – Carnegie Mellon.
Carnegie Mellon University Libraries
Turning to Dust or Digital
Implementing an Institutional Repository: Part III
AUC’s Role In Facilitating Access To Knowledge In The Arab World
Presentation transcript:

Denise Troll Covey Associate Dean, University Libraries, Carnegie Mellon Pennsylvania Library Association Conference Pittsburgh, PA – October 5, 2003 Understanding & Assessing The Million Book Project

Million Book Project Vision “Attempt to understand & solve the technical, economic, & social policy issues of providing online access to all creative works of the human race.” – Dr. Raj Reddy

What is the Million Book Project? An effort to digitize a million books by 2007 and provide full-text searching & free-to-read access to them. Areas of work: collection development; copyright permissions; acquisitions & shipping; scanning operations; proposal writing.

Why is the Million Book Project? To democratize knowledge & empower the citizenry; address the disparity in library size & accessibility; facilitate new knowledge by combining old & new, east & west, technical & humanistic; enhance student learning & the success of faculty research; address copyright absurdities.

Why is the Million Book Project? To support digital library research on: information distribution, management, & sustainability; security, copyright, & digital rights management; accuracy of optical character recognition (OCR); OCR of non-Roman languages & scripts; automatic creation of structural metadata; automatic summarization; intelligent indexing; machine translation; storage formats; search engines.

Who is involved? Carnegie Mellon University Libraries & School of Computer Science; other U.S. libraries; Internet Archive; India & China; OCLC, DLF, & CRL; Archival Resource Company Inc.

Funding. Collection development: NSF, $35,000 for the initial planning meeting in 2001; funded by partners. Copyright permission: UC Merced, $35,000; Carnegie Mellon (amount unspecified). Project administration: Carnegie Mellon (amount unspecified). Equipment & travel: NSF, $3.6 million (discounts from Minolta). Labor for scanning: India, $1.5 million; China (amount unspecified). Acquisitions & shipping: Internet Archive (amount unspecified); NSF, pilot shipment.

Collection Development Strategies. Librarians as selectors: best books (cited in bibliographies); technical reports & government documents; priorities of participants & funding agencies; topics to get full collections. What can we acquire? Bulk, cheap, fast; libraries weeding, closing, renovating.

Initial Collection of Collections (diagram): in copyright; indigenous Indian & Chinese materials; public domain; multi-lingual & multi-script language processing; November 2001 planning meeting funded by NSF; Books for College Libraries & other selected bibliographies.

Current Collection of Collections (diagram): in copyright; indigenous Indian & Chinese materials; public domain; multi-lingual & multi-script language processing; Books for College Libraries & other selected bibliographies; new copyright permission strategy.

Scanning Underway in India. Multiple centers, each with terabyte storage; indigenous materials; shipments from the U.S.; 100,000 books by 2004; above-average wages.

6000 Book Pilot Shipment to India. 20 ft ocean container, 25 days from New York to Chennai; 243 boxes on 9 pallets, 11,298 lbs, duty free; mostly public domain: government documents, social science, biography, history, & literature; approximate cost $2 per book round trip; 4000 books did not have to be returned; 2000 books were returned in good condition; August 2002 to August 2003.

Lessons Learned from Pilot. Reduce shipping cost per book: changing the packing method brings the cost to $1 per book, or 50 cents per book shipped one way. Reduce turn-around time: we learned the procedures for clearing customs; establish 5 international centers for U.S. shipments, funded by the Indian government and to be inaugurated January 2004, that receive books, scan them, check quality, & return them.
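As a rough illustration of the packing change, the sketch below applies the per-book rates from this slide to the 6000-book pilot volume and checks the quote on the Next Steps slide (12,500 books shipped one way for $6000). It assumes a flat per-book rate for every book, which simplifies the pilot, where 4000 of the books were never returned.

```python
# Per-book shipping rates from the pilot and lessons-learned slides (US dollars).
PILOT_ROUND_TRIP = 2.00    # approximate pilot cost
REVISED_PER_BOOK = 1.00    # after changing the packing method
REVISED_ONE_WAY = 0.50     # books that are not returned


def shipment_cost(books: int, per_book: float) -> float:
    """Total cost assuming a flat per-book rate (a simplification)."""
    return books * per_book


pilot = shipment_cost(6000, PILOT_ROUND_TRIP)
revised = shipment_cost(6000, REVISED_PER_BOOK)
one_way = shipment_cost(6000, REVISED_ONE_WAY)

# The Next Steps slide quotes $6000 to ship 12,500 books one way.
quoted_per_book = 6000 / 12500

print(f"6000 books at $2.00: ${pilot:,.0f}")
print(f"6000 books at $1.00: ${revised:,.0f}")
print(f"6000 books one way at $0.50: ${one_way:,.0f}")
print(f"Hyderabad quote: ${quoted_per_book:.2f} per book")
```

The quoted rate works out to 48 cents per book, in line with the 50-cent one-way target.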

Distribution in India. Central distribution site: Deemed University, Thanjavur. Pilot: Bangalore, Hyderabad. Current: Allahabad, Calcutta, Delhi.

Scanning Underway in China. Customs & content issues prohibit shipping books to China; scanning indigenous materials & U.S. copyrighted works already in their libraries (with permission granted); above-average wages.

Standards & Workflow. National standards for digital preservation, developed by IMLS in 2001 & endorsed by DLF; national standards for cataloging. Carnegie Mellon University Libraries developed & documented the workflow and provided training.

Digitization Workflow. Operators scan, post-process, & OCR: 600 DPI TIFF (v5) images; ScanFix post-processing; ABBYY FineReader OCR, with 98% accuracy for English; also some foreign languages; OCR being developed for other languages & scripts.

Optimum Scanner Throughput: 4000 books per year per Minolta scanner. One scanner, two shifts daily = 16 books per day; 250 work days per year = 4000 books per year. 72,000 books per year with the current 18 scanners; 400,000 books per year with 100 scanners. Allowing for a 50% deterioration in throughput, 100 scanners can complete the project in 5 years.
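The throughput figures can be reproduced with a few lines of arithmetic. This sketch only encodes the slide's numbers (16 books per scanner per day, 250 work days, a one-million-book target) and models the 50% deterioration as an efficiency factor; the function names are illustrative.

```python
BOOKS_PER_DAY = 16          # one scanner, two shifts daily
WORK_DAYS_PER_YEAR = 250
TARGET_BOOKS = 1_000_000    # the Million Book goal


def books_per_year(scanners: int, efficiency: float = 1.0) -> float:
    """Annual output of a pool of scanners running at a fraction of optimum throughput."""
    return scanners * BOOKS_PER_DAY * WORK_DAYS_PER_YEAR * efficiency


def years_to_finish(scanners: int, efficiency: float = 1.0) -> float:
    """Years needed to scan the whole target at that output."""
    return TARGET_BOOKS / books_per_year(scanners, efficiency)


print(books_per_year(1))            # 4000.0
print(books_per_year(18))           # 72000.0
print(books_per_year(100))          # 400000.0
print(years_to_finish(100, 0.5))    # 5.0
```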

Metadata Workflow. Librarians capture bibliographic metadata for the delivery system (MARC records from OCLC, or newly created Dublin Core; guest IDs provided by OCLC) and administrative metadata for the reporting system: bibliographic metadata; source library; whether return is requested; copyright status (checked against renewal records); permission status (used by the delivery system). Collaborative development by India & Carnegie Mellon.
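A minimal sketch of one administrative metadata record, assuming the fields listed on this slide. The class, field, and enum names are hypothetical, not the project's actual schema; the single method shows how a delivery system might use permission status.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class CopyrightStatus(Enum):
    PUBLIC_DOMAIN = "public domain"
    IN_COPYRIGHT = "in copyright"
    UNCHECKED = "unchecked"          # renewal records not yet consulted


class PermissionStatus(Enum):
    NOT_NEEDED = "not needed"        # e.g. public-domain works
    GRANTED = "granted"
    DENIED = "denied"
    PENDING = "pending"


@dataclass
class AdminRecord:
    """Administrative metadata for one book (hypothetical field names)."""
    title: str                       # core bibliographic metadata
    oclc_number: Optional[str]       # None if a Dublin Core record was created instead
    source_library: str
    return_requested: bool
    copyright_status: CopyrightStatus
    permission_status: PermissionStatus

    def full_text_displayable(self) -> bool:
        """What the delivery system needs: may the whole book be shown?"""
        return self.permission_status in (PermissionStatus.NOT_NEEDED,
                                          PermissionStatus.GRANTED)


record = AdminRecord(
    title="Example Title",           # placeholder values
    oclc_number=None,
    source_library="Example Library",
    return_requested=True,
    copyright_status=CopyrightStatus.IN_COPYRIGHT,
    permission_status=PermissionStatus.PENDING,
)
print(record.full_text_displayable())   # False
```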

Copyright Workflow #1 India Carnegie Mellon

Copyright Workflow #2 India Carnegie Mellon Internet Archive & Archival Resource Company, Inc.

Contextual Searching. It is legal to scan a book & create an index without permission. When no permission has been granted to display a copyrighted book, a search returns the query terms in their OCR context.
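The contextual-search rule can be sketched as a keyword-in-context (KWIC) lookup over a book's OCR text. The function names and the fixed-width context window are assumptions for illustration; the talk does not describe the project's actual search engine.

```python
import re
from typing import List, Union


def keyword_in_context(ocr_text: str, term: str, width: int = 40) -> List[str]:
    """Each case-insensitive hit on `term`, with up to `width` characters of context per side."""
    snippets = []
    for match in re.finditer(re.escape(term), ocr_text, flags=re.IGNORECASE):
        start = max(0, match.start() - width)
        end = min(len(ocr_text), match.end() + width)
        snippets.append("..." + ocr_text[start:end] + "...")
    return snippets


def search_result(ocr_text: str, term: str, permission_granted: bool) -> Union[str, List[str]]:
    """Whole text if display is permitted; otherwise only query terms in OCR context."""
    if permission_granted:
        return ocr_text
    return keyword_in_context(ocr_text, term)


page = "The quick brown fox jumps over the lazy dog."
print(search_result(page, "FOX", permission_granted=False))
```

Keeping the permission check in one place means the index can be built for every scanned book, while display is decided per request.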

Acquiring the Collection. Archival Resources Company Inc.: packing, shipping, & tracking; helps locate & acquire books from weeded collections & closing libraries. Acquisitions web site: materials wanted; loaning & donating; insurance.

Integrating the Collection. Transporting files to Carnegie Mellon: Internet bandwidth is inadequate, and copying to & from gold CD or DVD is expensive, so files are physically transported on disks.
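To show why shipping disks can beat the network, here is a back-of-the-envelope transfer-time estimate for the 20-terabyte collection size given on the Sustaining the Collection slide. The link speeds are hypothetical values chosen for illustration, not figures from the talk.

```python
def transfer_days(terabytes: float, megabits_per_second: float) -> float:
    """Days to move `terabytes` (decimal units) over a link sustaining the given rate."""
    bits = terabytes * 1e12 * 8
    seconds = bits / (megabits_per_second * 1e6)
    return seconds / 86_400


# Hypothetical sustained link speeds, for illustration only.
for mbps in (2, 10, 100):
    print(f"{mbps:>3} Mbit/s: about {transfer_days(20, mbps):,.0f} days for 20 TB")
```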

Sustaining the Collection. Goal: 10 organizations host the Collection. India: Digital Library of India, multiple locations. China: site(s) not yet known. U.S.: Carnegie Mellon, Internet Archive, & University of California Merced. Discussions with OCLC, Library of Congress, & Digital Library of Alexandria. Estimated cost: $1 million per host site. Estimated size: 20 terabytes.
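The slide's estimates imply a few simple totals. This sketch assumes decimal units (1 TB = 1,000,000 MB), that the 20 TB covers the full million books, and, as an assumption not stated on the slide, that every host site keeps a complete copy.

```python
HOST_SITES = 10              # goal stated on the slide
COST_PER_HOST = 1_000_000    # estimated US dollars per host site
COLLECTION_TB = 20           # estimated collection size
BOOKS = 1_000_000

total_hosting_cost = HOST_SITES * COST_PER_HOST
mb_per_book = COLLECTION_TB * 1_000_000 / BOOKS   # decimal units
replicated_tb = HOST_SITES * COLLECTION_TB         # assumes a full copy at every site

print(f"${total_hosting_cost:,} for {HOST_SITES} host sites")
print(f"{mb_per_book:.0f} MB per book on average")
print(f"{replicated_tb} TB stored across all sites")
```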

Next Steps. Ship 12,500 books one way to Hyderabad, India, from University of … for $6000. Negotiations: UMI/ProQuest, print-on-demand service; OCLC, digital registry & identifying source libraries; CRL, supply or help acquire books. November 2003: collection meeting. NSF proposal to create a database of copyright renewal records.

Print on Demand Service. UMI/ProQuest: handle financial transactions; print, bind, & send books to customers. Collaboration with Carnegie Mellon: negotiate royalties with publishers; develop a suitable business model.

Digital Rights Management – Lite. Free-to-read by any Internet user. Difficult to save or print books: one page at a time using a browser. Secure servers restrict access. Discourage hacking by offering affordable printed, bound books.

Global Business Model. Hardback book = $30.00; paperback book = $15.00; digitalback book = cost of a cup of coffee. Internet Archive print on demand (POD) = $1.00 paperback; India POD = $0.80 paperback, $2.00 hardback.

Open Access Feasibility Study. Couldn't locate the publisher for 11% of books. Of the publishers located, half didn't respond, even to a second letter. Of those that responded, fewer than half gave permission, and permission was often restricted. Overall, permission was granted for 22% of titles in a statistically valid random sample.
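The three attrition steps above multiply together. This sketch treats "half didn't respond" as exactly 50% (an approximation) and solves for the share of responding publishers that must have granted permission for the overall rate to come out at 22%.

```python
located = 1 - 0.11          # publisher found for 89% of books
responded = located * 0.5   # roughly half of located publishers replied
overall_granted = 0.22      # reported overall success rate

# Grant rate among responders implied by the 22% overall figure.
implied_grant_rate = overall_granted / responded

print(f"responded: {responded:.1%} of all books")
print(f"implied grant rate among responders: {implied_grant_rate:.1%}")
```

The implied rate, just under 50%, agrees with "fewer than half gave permission."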

Success Rate: Permission by Publisher Type
Scholarly associations: 45%
University presses: 37%
Museums & galleries: 31%
Commercial publishers: 12%
Overall: 22%

Open Access Fine & Rare Books. 367 titles in copyright (34% of the collection). Couldn't locate the copyright holder for 13% of titles. 127 letters & 44 follow-up calls to date. Permission granted for 56% of titles (6% with restrictions); permission denied for 3% of titles (denial assumed if 3 contacts get no response).
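Converting the percentages back into approximate title counts is simple arithmetic; because the slide's percentages are rounded, the counts below are estimates rather than reported figures.

```python
in_copyright = 367          # titles in copyright
share_of_collection = 0.34  # 34% of the Fine & Rare Books collection

collection_size = in_copyright / share_of_collection
approx_titles = {
    label: round(rate * in_copyright)
    for label, rate in [("holder not located", 0.13),
                        ("permission granted", 0.56),
                        ("permission denied", 0.03)]
}

print(f"collection size: about {collection_size:,.0f} titles")
print(approx_titles)
```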

Transaction Costs, May 2003 through August 2003
FTE labor: $6,550
Phone calls: $225
Paper & postage: $65
Total: $6,840 ($37.00 per title)
Does not include legal fees, cost of Internet connectivity, or administrator time.
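A quick check that the cost components add up to the stated total, plus the title count implied by $37.00 per title. The slide does not state how many titles the total covers, so the implied count is a derived estimate only.

```python
costs = {"FTE labor": 6550, "phone calls": 225, "paper & postage": 65}
total = sum(costs.values())
assert total == 6840, "components should match the slide's total"

PER_TITLE = 37.00
implied_titles = total / PER_TITLE      # derived estimate; not stated on the slide
labor_share = costs["FTE labor"] / total

print(f"total ${total:,}; about {implied_titles:.0f} titles; labor {labor_share:.0%} of cost")
```

Labor dominates the cost, which is why the per-title transaction cost drives the change of strategy on the following slides.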

Copyright Negotiations (Million Book Project). Educate: people find books online but use print; online access increases use; open access doesn't decrease, & can increase, sales; copyright absurdity. Ask: non-exclusive permission to scan & provide open access; minimal system functionality. Give: preservation-quality copies; metadata & OCR. Motivate: $$ from use in added-value, fee-based services; $$ from print on demand for out-of-print titles; $$ from a buy button for in-print titles.

Initial Copyright Approach. Do not pay permission costs; focus on out-of-print, in-copyright titles (Books for College Libraries has 50,000 titles); begin with scholarly associations & university presses. The transaction cost per title is prohibitive: identifying & inserting titles in letters; negotiating & tracking permission per title.

Epiphany & New Approach. Focus on publishers of quality books: treat bibliographies as an approval plan of publishers (Books for College Libraries has 5600 publishers). Ask for permission to digitize: all out-of-print, in-copyright titles; all titles published prior to a date of their choosing; all titles published # or more years ago; or a list of titles they provide. Follow up with a phone call or visit.
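The four permission scopes offered to publishers can be read as predicates over a title. The sketch below is a hypothetical model (class and field names are invented), assuming publication year and print status are known for each title and that all titles considered are in copyright, since only those need permission.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import FrozenSet, Optional


@dataclass
class Title:
    name: str
    published: int        # publication year
    in_print: bool


@dataclass
class BlanketGrant:
    """A publisher's permission in one of the four scopes (hypothetical model)."""
    scope: str                               # "out_of_print", "before_year", "older_than", "listed"
    year: Optional[int] = None               # cutoff for "before_year"
    min_age: Optional[int] = None            # years for "older_than"
    titles: FrozenSet[str] = field(default_factory=frozenset)  # names for "listed"

    def covers(self, t: Title, current_year: Optional[int] = None) -> bool:
        """True if this grant lets the project digitize title `t`."""
        if current_year is None:
            current_year = date.today().year
        if self.scope == "out_of_print":
            return not t.in_print
        if self.scope == "before_year":
            return t.published < self.year
        if self.scope == "older_than":
            return current_year - t.published >= self.min_age
        if self.scope == "listed":
            return t.name in self.titles
        raise ValueError(f"unknown scope: {self.scope}")


book = Title("Sample Monograph", published=1975, in_print=False)
print(BlanketGrant("before_year", year=1980).covers(book))                     # True
print(BlanketGrant("older_than", min_age=30).covers(book, current_year=2003))  # False
```

One grant then covers every matching title in a publisher's list, which is what removes the per-title negotiation cost.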

Current Statistics. 5600 publishers in Books for College Libraries. Using intermittent labor: couldn't locate 30 publishers (so far); 184 letters & 24 follow-up calls to date; 4% permission granted; 5% permission denied. Full-time staff from October 2003.

Results of New Approach. Estimated transaction costs remain the same, but we acquire more books for the $$ spent. National Academy Press: 99% increase (26 titles in Books for College Libraries; permission for 3,046 titles). Brookings Institution: 96% increase. Rand McNally: 60% increase.
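The slide does not say how the percentage increases were computed. One reading that reproduces the National Academy Press figure is the share of granted titles that lie outside Books for College Libraries; this sketch uses that inferred formula, not a documented one.

```python
def share_outside_bcl(titles_in_bcl: int, titles_granted: int) -> float:
    """Fraction of granted titles that Books for College Libraries did not list."""
    return (titles_granted - titles_in_bcl) / titles_granted


print(f"{share_outside_bcl(26, 3046):.0%}")   # 99%
```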

“More Bang for the Buck” (diagram comparing the initial & current collections: indigenous materials, public domain, in copyright)

Projections
Success rate (# BCL publishers) | # of books per publisher | Million Book Collection
4% (224) | | ,000
6% (336) | | ,000
22% (1,232) | 1500 | 1,848,000
We may need to negotiate with India for more labor.
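The projections multiply the number of cooperating publishers by a per-publisher book count. The slide gives 1,500 books per publisher only for the 22% row; this sketch assumes the same figure for all three rows, which fits the surviving ",000" endings but is not confirmed by the source.

```python
from typing import Tuple

BCL_PUBLISHERS = 5600
BOOKS_PER_PUBLISHER = 1500   # stated only for the 22% row; assumed for all rows here


def projected_books(success_rate: float) -> Tuple[int, int]:
    """(cooperating publishers, books) for a given permission success rate."""
    publishers = round(BCL_PUBLISHERS * success_rate)
    return publishers, publishers * BOOKS_PER_PUBLISHER


for rate in (0.04, 0.06, 0.22):
    pubs, books = projected_books(rate)
    print(f"{rate:.0%}: {pubs:,} publishers -> {books:,} books")
```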

Usage Assessments. Transaction log analysis, beginning 2004: number of searches, browses, & pages displayed; use per title, online & print-on-demand; use by geographic demographics; use per time of day, day of week, & month of the year. Outcomes assessment: user demographics (age, gender, location); how users found the Collection; why they used it & what they did with it; what difference the Collection made; their assessment of the quality of the Collection & the usability & functionality of the system; their view of the significance of the project.
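A minimal sketch of the planned transaction log analysis, assuming a hypothetical log format of one "timestamp action book-id" record per line; the talk does not describe the real log format, and the sample entries are invented.

```python
from collections import Counter
from datetime import datetime

# Hypothetical log lines: ISO timestamp, action, book id ("-" if none).
LOG = [
    "2004-03-01T09:15:00 search -",
    "2004-03-01T09:16:10 display book123",
    "2004-03-02T21:40:05 browse -",
    "2004-04-10T09:05:00 display book123",
]


def summarize(lines):
    """Count actions, per-title use, and use by hour, weekday (0 = Monday), and month."""
    actions, per_title = Counter(), Counter()
    by_hour, by_weekday, by_month = Counter(), Counter(), Counter()
    for line in lines:
        stamp, action, book = line.split()
        when = datetime.fromisoformat(stamp)
        actions[action] += 1
        if book != "-":
            per_title[book] += 1
        by_hour[when.hour] += 1
        by_weekday[when.weekday()] += 1
        by_month[when.month] += 1
    return actions, per_title, by_hour, by_weekday, by_month


actions, per_title, by_hour, by_weekday, by_month = summarize(LOG)
print(actions.most_common(), per_title.most_common())
```

Geographic breakdowns would need an extra field, such as a client address or region, which this sketch omits.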

Copyright Assessments, 2006. Number of copyrighted books in the collection; success rate of permission requests. Survey of participating publishers: overall satisfaction; quality of the copies; what they did or plan to do with the copies; impact on revenue & view of open access.

Dissemination. Million Book Collection: books accessible via Google search; libraries can link to the collection from their web sites; libraries can link books to catalog records. Publisher database: successful negotiation strategies. Research test bed.

Thank you!