Project Updates: Posner & Million Book Projects
Denise Troll Covey, Principal Librarian for Special Projects
July 2004 – University Libraries Staff Meeting

Posner Project 2002 – 2004
Digitize the collection of Henry Posner Sr.
–642 titles (1106 volumes)
–archival folders
Acquire copyright permission
–27% of volumes & small % of archival documents
Create web site
Provide unrestricted access
Funded by Henry & Helen Posner Jr.

Preparation
Evaluated & purchased Zeutschel scanner
Interviewed & hired scanner operator
Organized working group
Designed workflow

Posner Working Group
2001–2002 Erika Linke, Project Director
2002–2004 Denise Troll Covey, Project Director
–Linda Dujmic, Carole George, Bella Gerlich, Terry Hurlbert, Mary Kay Johnsen, Chris Kellen, Ann Marie Mesco, Joe Mesco, Gabrielle Michalek, Angel Morris, Jon Singletary, Brian Woolstrum

Scanning the Books
313,413 pages – all pages that can be scanned
–600 DPI full color TIFF images
–Most books scanned on Zeutschel
–24 books scanned on loaner Indus
–20 books not scanned
–35 books partially scanned
200 gigabytes total storage
–Scanned 40 gigabytes every 3–10 days

55 Books Not Scanned or Only Partially Scanned
Reasons
–Bound too tight, pages uncut, poor condition, too large, too small, vellum
Plans (20 surrogates located)
–Locate & link to electronic versions at other sites
–Digitize same title, different edition in our collection

Scanning the Archival Documents
1,714 pages – all docs that are not confidential
–Correspondence, dealer catalogs, pamphlets, advertisements, newspaper clippings, & newsletters
Gloriana will verify that nothing confidential was scanned

Design, Implement & Maintain
Search & retrieval system (DIVA v2)
Usage statistics
User interface design & functionality were informed by user studies

Search
Currently search via metadata
Full text search coming soon

Browse

Results Click to display book

Display – Full Text Available Click to display full text

Navigate

Display – Annotation Available Click to display annotation

User Interface Messages
Blank pages
Books not scanned in entirety
No permission to display
Alternative copy

Posner Collection Usage
Books: 633 titles (1004 volumes) online
Archival documents not yet online

Workflow Plan 2003
Scan page images (create TIFFs)
Post–process (Wolfpack)
–Fix – straighten, remove speckles, etc.
–Convert to JPEG for display
–Convert to text for full text search (OCR)
Backup to tape
Ingest into system for search & retrieval
Update DOI server & add PURL to library catalog

Problems
No OCR in initial workflow
–Not specified in proposal
–No OCR software for color images
Scanned faster than we could post–process
–Huge backlogs of books
  Scanned to servers, waiting for post–processing
  On backup tape, waiting to restore & OCR
Identified OCR as bottleneck
Changed workflow to ingest books prior to OCR

Many–to–One, One–to–Many Problem
Books
–1 title → many volumes
–Many titles → 1 volume
Archival folders
–1 folder → many titles
–Many folders → 1 title
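The title/volume relationship described here is many-to-many, which can be pictured as a pair of lookup tables. The following is only a minimal Python sketch: the names `link`, `title_to_volumes`, and `volume_to_titles`, and the sample identifiers, are hypothetical and do not describe the project's actual database.

```python
from collections import defaultdict

# Hypothetical many-to-many index between titles and physical volumes.
title_to_volumes = defaultdict(set)
volume_to_titles = defaultdict(set)

def link(title, volume):
    """Record that a title appears (wholly or partly) in a volume."""
    title_to_volumes[title].add(volume)
    volume_to_titles[volume].add(title)

# One title spread over many volumes (invented sample data).
link("Title A", "vol-001")
link("Title A", "vol-002")
# Many titles bound together in one volume (invented sample data).
link("Title B", "vol-003")
link("Title C", "vol-003")

print(sorted(title_to_volumes["Title A"]))   # volumes holding Title A
print(sorted(volume_to_titles["vol-003"]))   # titles bound in vol-003
```

The same two-way index would apply to archival folders and the titles they contain.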

2004 Quality Control
Discovered project personnel had different versions of the “complete” list of Posner books
Created canonical spreadsheet
Checking status of each book
–Scanned & fixed page images
–Converted to JPEG
–Converted to text (OCR)
–Added PURL to library catalog
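The four per-book status checks can be expressed as a simple checklist over one canonical list. This is a sketch under stated assumptions, not the project's spreadsheet: the field names, the `missing_steps` helper, and the sample records are all invented for illustration.

```python
# Hypothetical field names mirroring the four status checks on the slide.
STEPS = ("scanned_and_fixed", "jpeg", "ocr_text", "purl_in_catalog")

def missing_steps(record):
    """Return the steps not yet completed for one book record."""
    return [step for step in STEPS if not record.get(step, False)]

# Invented sample rows standing in for the canonical spreadsheet.
books = {
    "book-001": {"scanned_and_fixed": True, "jpeg": True,
                 "ocr_text": True, "purl_in_catalog": True},
    "book-002": {"scanned_and_fixed": True, "jpeg": True},
}

for book_id, record in books.items():
    todo = missing_steps(record)
    print(book_id, "complete" if not todo else "missing: " + ", ".join(todo))
```

Keeping a single list of records avoids the problem of personnel working from different versions of the "complete" list.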

Copyright Permission
Goal: permission to digitize & provide open access
–302 copyrighted volumes (27% of Collection)
–Archival catalogs & newsletters

Copyright Permissions Workflow
Identify copyrighted volumes (27%)
Identify & locate copyright holders
Send letters requesting non–exclusive permission
–To digitize & provide open access
–Option to restrict access to Carnegie Mellon only
Follow up with phone call or
Negotiate to closure
Update statistics & database

Problems Seeking Permission
Identifying & locating copyright holders
Publishers
–Lost or misplaced request letter
–Don’t know what they published
–Don’t know what rights they have
–Afraid of open access & lost revenue
Learning copyright laws
–Abandoned foreign works

Posner Study Success Rates
[Charts; legend: Permission granted, Permission denied, No response, Not located. Values as extracted: 29%, 71%, 24%, 22%, 54%, 32%, 17%, 15%, 36%]
Received permission for 48% of books
2% of publishers applied access restrictions

Analysis by Copyright Holder Type
[Chart; categories: Scholarly associations, University presses, Commercial publishers, Authors & estates, Other]
Success rate based on responses

Transaction Costs
May 2003 – October 2003
$10,808   FTE labor
$379      Phone calls
$100      Paper & postage
$11,287   TOTAL
Does not include legal fees, administrator time, or cost of Internet connectivity or database creation.
$78 per book/volume
174 letters & 159 follow–up calls or
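The line items and the per-volume figure can be checked with a few lines of arithmetic. The slide does not state what the $78 is divided by; the sketch below assumes it is the number of volumes for which permission was received, taken as 48% of the 302 copyrighted volumes. That denominator is an assumption, not something the slides confirm.

```python
# Line items from the slide (May 2003 - October 2003), in U.S. dollars.
costs = {"FTE labor": 10808, "Phone calls": 379, "Paper & postage": 100}
total = sum(costs.values())

# Assumption: the per-volume figure divides the total by the volumes with
# permission granted, estimated here as 48% of 302 copyrighted volumes.
volumes_granted = round(0.48 * 302)
cost_per_volume = total / volumes_granted

print(total)                   # matches the slide's TOTAL of 11,287
print(volumes_granted)         # 145 under the stated assumption
print(round(cost_per_volume))  # 78, consistent with "$78 per book/volume"
```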

Other Work
Created student internships (funded by Posner)
Assisted Posner Center opening May 2004
–DVD, “jade” reproduction, initial exhibit
Moving the Collection August 2004
Ongoing
–Posner Center & Collection security
–Staffing the Posner Center
–Selecting & supervising interns
–Preparing exhibits

The Million Book Project
Digitize & provide open access to a million books
Vision, leadership, & research – Carnegie Mellon
$$ Equipment & travel – NSF
$$ Labor & research – India & China
“Attempt to understand & solve the technical, economic, & social policy issues of providing online access to all creative works of the human race.” Raj Reddy

Rationale
Democratize knowledge & empower citizenry
– Address disparity in library size & accessibility
Facilitate new knowledge
– Combining old & new, east & west, technical & humanistic
Enhance student learning & success of faculty research
Preserve cultural treasures
Address copyright absurdities

Support Digital Library Research
Information distribution, management, & sustainability
Security, copyright, & digital rights management
Accuracy of optical character recognition (OCR)
OCR of non–Romanic languages & scripts
Automatic creation of structural metadata
Automatic summarization
Intelligent indexing
Machine translation
Storage formats
Search engines

Overview 2001–
–NSF funded collection & planning meeting
–Meet with Chinese partners in U.S.
–NSF funded equipment & travel
–Meet with Chinese partners in China
–Meet with Indian partners in U.S.
–Pilot shipment of books to India
[Image: Gloriana & Mark Kamlet]

Overview 2003 –
–All partners meet in India
–Meet with partner OCLC
–Partner U.C. Merced Libraries fund copyright permission work
2004
–All partners meet in U.S.
–Lesk work on identifying public domain books
–Kahle v Ashcroft
[Image: Meeting with Indian President Dr. A.P.J. Abdul Kalam]

Collection & Planning Meeting November 2001
Collection of collections
Copyright considerations
–Books for College Libraries
–Separate funding
Avoid duplicate scanning
Safe, affordable shipments
Pilot shipment to India
[Diagram labels: Indigenous Materials; Public Domain; In Copyright]

2002
Purchased scanners
Sent scanners to India & China
Trained scanner operators & librarians
Began scanning indigenous materials
Began working with DLF & OCLC to develop digital registry
[Image: Incised palm leaves, Saraswathi Mahal Library]

Pilot Shipment to India
Questions to be answered
–What does it take to pull & pack books for shipment abroad & track their scanning & return?
–Should shipments be coordinated among participating U.S. libraries, centralized, or individualized?
–How long will the books be away?
–Will they be returned safely?
–How will the scanning turn out?

Explore trade–offs
By air is faster, more expensive
–Air containers hold 1500 kilos (3300 pounds)
By sea is slower, less expensive
–Ocean containers hold four times as much
Preliminary talk with Emery
–Space available on flights to India & China, but little space available on return flights
–Suggested shrink–wrap & return books by sea

August 2002 Pilot Shipment to India
6,000 books – mostly public domain
–243 boxes, 11,298 pounds of books on nine pallets
–Shipped from New York to Chennai in a 20–foot ocean container
Shipment took 25 days to reach Chennai
Round trip cost $2 per book
–A third of the books had to be returned to the U.S.

Pilot Shipment Book Distribution
From Chennai, books went to the central distribution center in Tanjore
From Tanjore, books were distributed to scanning centers
[Image: Pilot Central Distribution Site, Deemed University]

One Year Later...
Most of the books were returned in good condition in August 2003
Lost books located & returned 2004
[Image: Scanning center in Hyderabad, India]

Lessons Learned & New Strategies
Reduce costs
–Packing books in crates costs $1 per book round trip
–Books that don’t need to be returned cost just $0.50
Reduce turnaround time
–Learned customs procedures
–Changed distribution strategy
  Books now go direct from seaport to 4 international megacenters

Copyright Permission
Dedicated labor beginning November 2003
–Funded by U.C. Merced Libraries
Send request letter + prompt follow up call or
Books for College Libraries
–50,000 titles
–5,600 publishers
[Image: Scanning center in Hyderabad, India]

“More Bang for the Buck!”
Shifted from per title to per publisher approach
[Diagram labels: Indigenous Materials in India & China; Public Domain; In Copyright; Original; Current]

Request Letter
Educate about open access & user behavior
Ask for non–exclusive permission to digitize
–All out of print, in copyright titles
–All titles published prior to a date of their choosing
–All titles published # or more years ago
–List of titles they provide
Assure – follow standards & laws; limit print & save
Give – images, metadata, & OCR $$$

Problems Seeking Permission
Identifying & locating copyright holders
Publishers
–Lost or misplaced request letter
–Afraid of open access & lost revenue
–Don’t know what they published or what rights they have & don’t have the resources to figure it out
–Involved in other projects – perhaps exclusively
–Copyright reverts to author when book out of print

Preliminary Statistics
[Charts; legend: Permission granted, Permission denied, Still negotiating. Values as extracted: 46%, 54%, 73%, 12%, 14%]
Need 18% success rate with BCL publishers granting permission for 500 books each
371 publishers contacted
14% success rate
46,700+ titles

Analysis by Copyright Holder Type: Million Book Project
[Chart; categories: Scholarly associations, University presses, Commercial publishers, Authors & estates, Other]
Success rate based on responses

Estimated Transaction Costs
Nov 2003 – Jun 2004
$18,846   Labor
$323      Follow up
$194      Paper & postage
$19,363   TOTAL
Does not include legal fees, administrator time, or cost of Internet connectivity or database creation.
$0.42 per book
540 letters & 359 follow up calls or
[Image: Internet Café, India]

Experiment: Renewal Databases
Catalog of Copyright Entries (Digitized at Carnegie Mellon) –
Library of Congress Copyright Catalog –
–To find renewals 1973–1978 must consult another source for registration numbers
Lesk Copyright Renewal Records –
–Functionality created & enhanced by Michael Lesk

Experiment: U.S. Office of Copyright
Expedite identifying & locating copyright holder
–Asked to identify & locate copyright holders of 7 titles
–$150 fee – charged 3 days after request
–Estimated 4 to 6 weeks
–Nudged at 8 weeks
–Took 15 weeks to respond
–Confirmed one citation
[Image: Scanning center in Hyderabad, India]

Experiment: Authors Registry
Expedite locating copyright holder
–Asked to locate 25 authors or estates
–$2.50 fee per author/estate found
–Same day response
–Found 52%
–92% accuracy rate
Authors likely to grant permission, but transaction cost per book is high
[Image: Scanning center in Hyderabad, India]

Experiment: Workflow Time Trials
Need to generate lists of BCL titles for publishers that will consider a list that we provide
Expedite generating lists of titles (from dirty OCR)
–Verify citation – 30% improvement using digital BCL over WorldCat
–Verify copyright status – improvement using Lesk’s enhanced copyright renewal records
–Verify print status – not cost effective
Reduced cost per title from $2.12 to $1.41
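The size of the reported cost reduction follows directly from the two per-title figures. A minimal Python check, using only the numbers on the slide (the variable names are illustrative):

```python
# Per-title cost of generating title lists, before and after the time trials.
before = 2.12
after = 1.41

# Relative reduction in cost per title.
reduction = (before - after) / before
print(f"{reduction:.1%}")  # roughly a one-third reduction
```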

Next Steps
Partnerships pending
–Vendor to provide print–on–demand service
–U.S. Office of Copyright to study impact of CTEA
Standards & best practices
–NISO solicited proposal to develop copyright metadata
–Lead best practice for acquiring copyright permission
Research
–Continue data gathering & analyses
–Survey participating publishers’ views of open access

2004 Meeting at Carnegie Mellon
Partners from India & China
124,000+ books scanned in India, China & Egypt
Additional centers planned in India, China, Poland, Turkey, Korea

Scanning in China
$8.46M from Ministry of Education 2003–
,000 books scanned
–Capacity 0.5M pages scanned per day
–Want to scan 1M books in 2 years
–14 scanning centers
25,000 books from our collection shipped to China July 2004
[Image: Scanning center in Beijing]

Scanning in India
$25M annually to support research
70,000 books scanned
–Capacity 1.5M pages per day
–20 scanning centers
–4 are megacenters
Cost 0.69 rupees per page
–0.89 rupees with cost of scanner
–$0.015 – $0.019 U.S. dollars

Books     Language
45,660    English
10,486    Telugu
1,114     Sanskrit
1,091     Tamil
3,454     Urdu
220       Kannada
475       Hindi
636       Marathi
7,158     Other
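The rupee and dollar figures for per-page cost are consistent with each other, which a short calculation can show. The sketch below derives the exchange rate implied by the slide's own numbers; it is not an official rate, and the variable names are illustrative.

```python
# Per-page scanning costs from the slide.
rupees_per_page = 0.69       # without cost of scanner
rupees_with_scanner = 0.89   # with cost of scanner
usd_per_page = 0.015         # dollar figure quoted for the lower cost

# Exchange rate implied by the slide's own figures (rupees per U.S. dollar).
implied_rate = rupees_per_page / usd_per_page

# Apply the same rate to the higher rupee figure.
usd_with_scanner = rupees_with_scanner / implied_rate

print(round(implied_rate))         # about 46 rupees per dollar
print(round(usd_with_scanner, 3))  # 0.019, matching the slide's upper figure
```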

Scanning in India
Many books scanned, but no OCR available
Will be done with Indian books in 6 to 12 months
Working on quality control
[Image: Librarians in Hyderabad, India]

Research
Access for visual & hearing impaired
Automatic summarization
Machine translation
OCR
Searching
User interfaces
[Image: State Library in Hyderabad, India]

Sample Translation Telugu to English

Current book display & navigation
[Screenshot labels: Search within the book; Next & previous page; Zoom in & out; Select format; Go to page]

Proposed book display & navigation
[Screenshot labels: Zoom in & out; Select format; Search within the book; Next & previous page; Go to page; Add or remove bookmark; Add to or view bookbag; Get info about the book; New search for other books; Return to results; Help; Title of book; Location within book; Navigate book; Beginning & end of book]

Needs
More books
–India requested 60K books
–China wants 10K books per month
More scanners
More storage
Way to transfer content to Carnegie Mellon
–DVD burners to be provided
Way to standardize the metadata
New policy on copyright

Copyright Policy
Public Domain Enhancement Act HR 2601
–Pay $1.00 to maintain copyright after 50 years
–Register copyright agent
–U.S. Office of Copyright maintains freely accessible list of titles & agents
Options to compensate copyright holders
–Compulsory licensing – like sound recordings
–Public Lending Right – government pays
–Subscription model – like HBO
–Metered use – pay per view
–Free to read – pay to print
Want a global model

Lesk Identifying Public Domain
Public domain appears to have 5.5M books in English
–2M published in U.S. pre–1923
–2M published in U.S. 1923–1963 (90+% not renewed)
–1.5M (out of 8M) published in foreign countries
1.5M books in French, German, Spanish & Italian
Expect half to be difficult to locate

Kahle v Ashcroft
92% of books are in copyright, but out of print
Challenge U.S. copyright system
–No records of copyright ownership
–Denies public access to orphaned works without providing any benefits
–Submit examples of impact of barriers to use of orphaned books

Use Internet Explorer
The Universal Library, China site
Digital Library of India
The Universal Library, U.S. site
FAQ:

Thank you! Denise Troll Covey –