Beyond the Google Book: the Future of the Digital Library
Cory Snavely, Library IT Core Services manager, University of Michigan
April 20, 2010




Similar presentations
NATIONAL LIBRARY OF MEDICINE PubMed Central Edwin Sequeira National Library of Medicine May 26, 2004.

Home-Grown Digital Library System Built Upon Open Source XML Technologies and Metadata Standards David Lacy Villanova University
HathiTrust Digital Library
HATHI TRUST A Shared Digital Repository Building A Future By Preserving Our Past The Preservation Infrastructure of HathiTrust Digital Library Jeremy York.
HATHI TRUST A Shared Digital Repository HathiTrust Digital Library Is There A Past In Your Future? Princeton University February 2010.
KAT HAGEDORN HATHITRUST SPECIAL PROJECTS COORDINATOR UNIVERSITY OF MICHIGAN LIBRARIES OCTOBER 9, 2009 Seamless Sharing: NYU, HathiTrust, ReCAP and the.
HathiTrust: Building the Universal Collection John Wilkin 18 May 2009.
This Library Never Forgets Preservation, Cooperation, and the Making of HathiTrust Digital Library Jeremy York Project Librarian HathiTrust Digital Library.
Cory Snavely Library IT Core Services manager University of Michigan September 2010.
Building the Universal Library: The Promise and Challenges of HathiTrust John Wilkin 2 April 2009.
HathiTrust Large Scale Search Tom Burton-West Information Retrieval Programmer Digital Library Production Service University of Michigan
HathiTrust Sharing a Federal Print Repository: Issues and Opportunities May 25, 2011 Heather Christenson.
What is HathiTrust and How Can it Make a Difference? Sourcing and Scaling brought to the collective collection.
HATHI TRUST A Shared Digital Repository HathiTrust 101 John Wilkin and Jeremy York August 27, 2010.
What is HathiTrust and Why is it relevant to research libraries? Sourcing and Scaling brought to the collective collection.
Building the Universal Library: Introducing HathiTrust Patricia A. Steele Indiana University Libraries John Price Wilkin University of Michigan Libraries.
HATHI TRUST A Shared Digital Repository Delivering Data For New Generations of Research Strategies and Challenges Jeremy York NISO/BISG Forum ALA 2010.
HATHI TRUST A Shared Digital Repository HathiTrust, Collections, and Collaboration COLD 2011 Spring Meeting Jeremy York May 20, 2011.
E-Content Service Group Virtual Meeting Digital Preservation: How to Get Started.
Digital Preservation A Matter of Trust. Context * As of March 5, 2011.
Capacity Building Passing on the Experience Dr. Noha Adly World Digital Library Arab Peninsula Regional Group meeting.
An Introduction to Repositories Thornton Staples Director of Community Strategy and Alliances Director of the Fedora Project.
Digital Collections: Storage and Access Jon Dunn Assistant Director for Technology IU Digital Library Program
HathiTrust Digital Library: Enrich Your Research and Scholarship Doreen Bradley Chris Powell University Library May 2011.
October 28, 2003 Copyright MIT, 2003 METS repositories: DSpace MacKenzie Smith Associate Director for Technology MIT Libraries.
MacKenzie Smith Associate Director for Technology MIT Libraries.
HathiTrust and the Ecology of Shared Collections Paul N. Courant 21 May 2009.
HATHITRUST A Shared Digital Repository We’re Preserving the Past, What About the Present? NISO Webinar: Ensuring the Preservation of E-Books May 23, 2012.
HATHITRUST A Shared Digital Repository HathiTrust current work, challenges, and opportunities for public libraries Creating a Blueprint for a National.
HATHI TRUST A Shared Digital Repository Digital Repositories for Preservation and Access Digital Directions 2013 Jeremy York July 22, 2013 Unless otherwise.
Information Analysis at Scale: HathiTrust Research Center Beth Plale Director, Data to Insight Center Co-Director, HathiTrust Research Center November.
Digital archival storage for the University of Michigan Library collections.
Transformations at GPO: An Update on the Government Printing Office's Future Digital System George Barnum Coalition for Networked Information December.
Constructing the Memories Creating a Digital Collection Linda J. White, Digital Project Coordinator.
Tools and Services for the Long Term Preservation and Access of Digital Archives Joseph JaJa, Mike Smorul, and Sangchul Song Institute for Advanced Computer.
Mike Smorul Saurabh Channan Digital Preservation and Archiving at the Institute for Advanced Computer Studies University of Maryland, College Park.
A Digital Preservation Repository for Duke University Libraries Jim Coble Digital Repository Developer Open Repositories 2013.
HathiTrust – How To By Dr. Rob McGeachin 20th Annual AgNIC Meeting May 7, 2015.
HATHITRUST A Shared Digital Repository HathiTrust: Putting Research in Context HTRC UnCamp September 10, 2012 John Wilkin, Executive Director, HathiTrust.
Electronic Mail List Preservation Takes Off: The H-Net Archive Lisa M. Schmidt MATRIX: The Center.
City of Seattle Office of the City Clerk Open Government = Access Challenges and Opportunities with Digital Records.
HathiTrust Digital Library. Overview ›Began in 2008 ›Large scale digital preservation repository ›Partnership of major research libraries ›Focus on both.
NCSU Libraries TRLN Digital Preservation Seminar NCSU.
Digitising Journals, March 2000, Copenhagen Astrid Wissenburg Information Services and Systems King’s College London
Preserving Digital Collections for Future Scholarship Oya Y. Rieger Cornell University
Technology Choices for the JSTOR Online Archive Presented by Chang Feng Department of Computer Engineering and Computer Science, University of Missouri-Columbia,
Breana McCracken University of Illinois at Urbana-Champaign HathiTrust and Copyright Future Implications - Strong precedent for libraries to continue to.
HATHITRUST A Shared Digital Repository HathiTrust and TRAC DigitalPreservation 2012 July 25, 2012 Jeremy York, Project Librarian, HathiTrust.
HATHITRUST HTTP://WWW.HATHITRUST.ORG Large-Scale Digital Initiatives and their potential impact on the Maine Shared Collections Strategy Colby College.
HathiTrust’s Past, Present and Future. Short- and Long-term Functional Objectives Short-term Page turner mechanism (and Mobile!) Branding (overall initiative;
Author(s): Jeremy York, 2010 License: Unless otherwise noted, this material is made available under the terms of the Creative Commons Attribution–Noncommercial–Share.
The Data Capacitor and Digital Libraries at IU Jon Dunn Associate Director for Technology IU Digital Library Program February 22, 2006.
Successes and Growing Pains: The Indiana University Digital Library Program Jenn Riley Metadata Librarian Indiana University Digital Library Program January.
Digital preservation activities at the NLW Sally McInnes 18 September 2009.
CONTENT DISCOVERY, SERVICES, AND SUSTAINED ACCESS Timothy Cole, William Mischo, Beth Sandore, Sarah Shreeves ~ University of Illinois Library
HATHI TRUST A Shared Digital Repository Use of PREMIS for Internet Archive AIPs September 22, 2010.
National Library of the Czech Republic as End-User of the Research Networks Adolf Knoll deputy director
Preserving Electronic Mailing Lists as Scholarly Resources: The H-Net Archives Lisa M. Schmidt
HathiTrust: Collaboration in Building the Universal Collection John Wilkin 1 October 2009.
HathiTrust: Possibilities Metadata Working Group Cornell University Library March 21, 2014.
Barbara Preece ICOLC, April Mark Sandler Center for Library Initiatives Chicago Illinois Indiana Iowa Michigan Michigan State Minnesota Northwestern.
HathiTrust Digital Library Interface and Services
UNT Libraries TRAIL Processing Mark Phillips April 26, 2016
Joseph JaJa, Mike Smorul, and Sangchul Song
Building the Universal Library: Introducing HathiTrust
DIGITAL LIBRARY.
HathiTrust And Its Research Center
digital archival storage
Dissemination and Communication Introductory course
Presentation transcript:

HathiTrust project profile
• Launched October
• … member institutions and growing
• 99% Google-scanned materials
• 5.6 million volumes, 350 pages on average
• 210 terabytes
• 2 US sites
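
As a rough consistency check on these figures, the sketch below divides total storage by volume count. It assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and that the 210 TB counts a single copy of the content. The slide states neither assumption. Under them, 210 TB over 5.6 million volumes comes to about 37.5 MB per volume, which is close to the 36 MB average Zip size given on the data-characteristics slide.

```python
# Back-of-envelope check of the project-profile figures.
# Assumptions: decimal units, and 210 TB is one copy of the content
# (not the sum over both sites).
TB = 10**12
MB = 10**6

volumes = 5_600_000        # 5.6 million volumes
pages_per_volume = 350     # average pages per volume
total_bytes = 210 * TB     # 210 terabytes

bytes_per_volume = total_bytes // volumes
total_pages = volumes * pages_per_volume
bytes_per_page = total_bytes // total_pages

print(f"{bytes_per_volume / MB:.1f} MB per volume")   # 37.5 MB per volume
print(f"{total_pages:,} pages in total")              # 1,960,000,000 pages in total
print(f"{bytes_per_page / 1000:.0f} KB per page")     # 107 KB per page
```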

Founding principles
• Long-term digital preservation
• Access to materials
• Digital formats allow simultaneous preservation and access
• Pioneer the concept of a universal, non-commercial, collaborative digital library

Preservation models
• Open Archival Information System (OAIS)
  – Provides formal guidelines for object storage and retrieval.
  – We incorporate these principles.
• Trustworthy Repositories Audit & Certification (TRAC) checklist
  – Provides an auditing framework for assuring reliability: policy management, documentation, succession.
  – Our TRAC audit was recently conducted.
• Still evolving!

Preservation architecture
• Google: multi-purpose index and advertising engine
• HathiTrust: preservation-oriented service architecture
[Diagram labels: content, managed repository, API, index service]

Access services
• Bibliographic catalog for traditional discovery
  – Based on the open-source VuFind system
• Full-text search
  – Based on the open-source Solr system
• Page turner for online reading
  – Based on our older open-source digital library system
  – Only public-domain books according to US copyright law
• APIs for building new services

[Screenshot callouts]
• Search full text of this item
• Save item to a new or existing collection
• Image, text, or PDF views

Funding and governance
• Major financial support from UM and Indiana University
• Cost recovery based on content deposited
• Executive committee
  – Deans and CIOs of founding institutions, executive director
  – Budget and major initiatives
• Strategic advisory board
  – Representatives of member institutions
  – Development priorities, policy development

Staffing and server infrastructure
• Significant developer staff contributed by UM
• One of three sysadmins in Core Services funded by HathiTrust
• Basic infrastructure cost: $3.86/GB
  – One 420 TB Isilon storage cluster per site; linear cost increment for adding new storage
  – Tape backup
  – 2 web, 1 database, and 4 search servers at both sites
  – 5 ingest and 2 index-building servers in Michigan
  – Shared development environment
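
The per-GB figure can be turned into absolute numbers. This back-of-envelope sketch assumes that the $3.86/GB rate scales linearly with raw capacity, as the "linear cost increment" bullet suggests, and that 1 TB = 1,000 GB. The slide does not say whether the rate is one-time or recurring, or whether it covers one site or both, so treat the results as illustrative only. Amounts are kept in integer cents to avoid floating-point rounding.

```python
# Rough storage cost implied by the slide's $3.86/GB figure.
# Assumptions: linear scaling with raw capacity, 1 TB = 1,000 GB.
RATE_CENTS_PER_GB = 386  # $3.86/GB
GB_PER_TB = 1_000

def storage_cost_cents(terabytes: int) -> int:
    """Cost in cents of `terabytes` of capacity at the quoted per-GB rate."""
    return terabytes * GB_PER_TB * RATE_CENTS_PER_GB

def dollars(cents: int) -> str:
    """Format integer cents as a dollar string, e.g. $1,234.50."""
    return f"${cents // 100:,}.{cents % 100:02d}"

cluster = storage_cost_cents(420)  # one 420 TB Isilon cluster (one site)
content = storage_cost_cents(210)  # the 210 TB of content in the project profile

print(dollars(cluster))  # $1,621,200.00
print(dollars(content))  # $810,600.00
```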

Material and data flow
[Diagram labels: Google or other scanning project; network or media delivery; ingest; sync; catalog; rights database; web; index]

Automated data ingest
• Handles per-volume logistics
• Rigorously validates identifiers, object files, and object completeness
• Generates a METS (XML) object inventory
• Determines copyright status by date and place of publication
• 500K+ volumes/month!
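
Two of these ingest checks can be sketched in a few lines. Everything in the sketch is hypothetical: the 14-digit barcode rule, the page-file naming scheme (NNNNNNNN.jp2 or .tif images with matching NNNNNNNN.txt OCR files), the Volume record, and the rights labels are illustrative assumptions, not HathiTrust's actual formats. The rights rule is reduced to the 1923 cutoff for US public domain status that applied in 2010. Real determination has more cases than this. The sketch does not attempt file-format validation or METS generation.

```python
import re
from dataclasses import dataclass

# Hypothetical identifier rule: a 14-digit barcode such as 39015123456789.
BARCODE_RE = re.compile(r"\d{14}")
# Assumption: in 2010, works published before 1923 were public domain in the US.
US_PD_CUTOFF_YEAR = 1923

@dataclass
class Volume:
    identifier: str
    files: list[str]   # file names inside the volume's package
    pub_year: int
    pub_country: str   # e.g. "US"

def validate(vol: Volume) -> list[str]:
    """Return a list of problems; an empty list means the volume passes."""
    problems = []
    if not BARCODE_RE.fullmatch(vol.identifier):
        problems.append(f"bad identifier: {vol.identifier!r}")
    # Hypothetical naming: page images NNNNNNNN.jp2/.tif, OCR text NNNNNNNN.txt.
    images = {f.rsplit(".", 1)[0] for f in vol.files if f.endswith((".jp2", ".tif"))}
    ocr = {f.rsplit(".", 1)[0] for f in vol.files if f.endswith(".txt")}
    if not images:
        problems.append("no page images")
    # Completeness: every page image needs OCR text, and vice versa.
    for page in sorted(images - ocr):
        problems.append(f"missing OCR for page {page}")
    for page in sorted(ocr - images):
        problems.append(f"OCR without image for page {page}")
    return problems

def determine_rights(vol: Volume) -> str:
    """Simplified rights rule using only publication date and place."""
    if vol.pub_year < US_PD_CUTOFF_YEAR:
        # Non-US works before the cutoff are public domain under US law,
        # but not necessarily in other countries.
        return "public_domain" if vol.pub_country == "US" else "public_domain_us_only"
    return "in_copyright"  # later works are withheld pending further review

vol = Volume("39015123456789", ["00000001.jp2", "00000001.txt", "00000002.tif"], 1905, "US")
print(validate(vol))          # ['missing OCR for page 00000002']
print(determine_rights(vol))  # public_domain
```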

Data characteristics
• 1 METS (XML) file and 1 Zip archive (JPEG2000 and TIFF images, plus OCR text) per book
• 36 MB average Zip file size (compressed)
• Layout uses pairtree, an IETF Internet-Draft developed at the California Digital Library
  – Example: pairtree_root/39/01/51/23/45/67/89/
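
The pairtree layout maps an identifier to a directory path by splitting it into two-character segments. The example path above shows this for 39015123456789. The sketch below stays within what the slide shows. It accepts only ASCII alphanumeric identifiers, because characters such as '.', ':' or '/' require the pairtree draft's character-escaping rules, and this sketch does not implement them.

```python
def pairtree_path(identifier: str, root: str = "pairtree_root") -> str:
    """Map an identifier to a pairtree directory path.

    The identifier is split into two-character segments; an odd final
    character becomes its own one-character segment.
    """
    # Restriction of this sketch: no escaping, so only plain alphanumerics.
    if not (identifier.isascii() and identifier.isalnum()):
        raise ValueError("this sketch handles only ASCII alphanumeric identifiers")
    segments = [identifier[i:i + 2] for i in range(0, len(identifier), 2)]
    return "/".join([root, *segments]) + "/"

print(pairtree_path("39015123456789"))  # pairtree_root/39/01/51/23/45/67/89/
```

Because the path is derived from the identifier alone, a volume can be located without a lookup table, and no single directory accumulates millions of entries.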

Search infrastructure
[Diagram labels: web, storage, search, web]
1. Query submission
2. Query distribution
3. Query processing
4. Results combining
5. Results display
6. Object retrieval
7. Object display
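
Steps 2 to 4 (query distribution, query processing, results combining) follow the scatter-gather pattern. The sketch below is a toy model and not Solr's distributed search. Shards are in-memory dicts, the score is plain term frequency, and each shard returns its own top k hits, which are then merged into a global top k.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Toy stand-in for a search shard: maps document id -> text.
Shard = dict[str, str]

def search_shard(shard: Shard, term: str, k: int) -> list[tuple[float, str]]:
    """Query processing on one shard; score = term frequency (toy scoring)."""
    hits = [(float(text.lower().split().count(term)), doc) for doc, text in shard.items()]
    return heapq.nlargest(k, [hit for hit in hits if hit[0] > 0])

def distributed_search(shards: list[Shard], term: str, k: int = 10) -> list[tuple[float, str]]:
    # Query distribution: send the same query to every shard in parallel.
    with ThreadPoolExecutor() as pool:
        per_shard = list(pool.map(lambda s: search_shard(s, term, k), shards))
    # Results combining: the global top k is drawn from the shards' top-k lists.
    return heapq.nlargest(k, (hit for hits in per_shard for hit in hits))

shards = [
    {"a1": "whale ship whale", "a2": "river"},
    {"b1": "whale", "b2": "whale whale whale"},
]
print(distributed_search(shards, "whale", k=2))  # [(3.0, 'b2'), (2.0, 'a1')]
```

Merging per-shard top-k lists is only meaningful if scores are comparable across shards. With scoring that depends on collection statistics, per-shard differences in those statistics can skew the merge.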

Optimizing search infrastructure
• How many books/shard?
• …shards/server?
• …memory/server?
[Chart: right size (600K books/shard), good performance (ms)]
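
The 600K books/shard figure implies a shard count for the collection. The arithmetic below makes two assumptions that the slides do not state: all 5.6 million volumes from the project profile are in the full-text index, and a site's 4 search servers share the shards evenly. The slide arrived at its shard size by measuring response time. This sketch only shows what that choice implies for the layout.

```python
import math

def shard_layout(volumes: int, books_per_shard: int, servers: int) -> tuple[int, int]:
    """Return (number of shards, maximum shards per server), rounding up."""
    shards = math.ceil(volumes / books_per_shard)
    return shards, math.ceil(shards / servers)

# Assumptions: 5.6M indexed volumes, 600K books/shard, 4 search servers per site.
print(shard_layout(5_600_000, 600_000, 4))  # (10, 3)
```

Ten shards do not divide evenly over four servers (3, 3, 2, 2), so per-server load and memory would differ slightly.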

Questions?
Cory Snavely