Reexamining Digital Library Infrastructure at IU Jon Dunn, Ryan Scherle, Eric Peters Indiana University Digital Library Program IU Digital Library Brown.

Slides:



Advertisements
Similar presentations
Digital Music and Audio Projects at Indiana University Jon Dunn Digital Library Program Indiana University August 16, 2001.
Advertisements

Data Publishing Service Indiana University Stacy Kowalczyk April 9, 2010.
Long-Term Preservation. Technical Approaches to Long-Term Preservation the challenge is to interpret formats a similar development: sound carriers From.
An Introduction to Repositories Thornton Staples Director of Community Strategy and Alliances Director of the Fedora Project.
Digital Collections: Storage and Access Jon Dunn Assistant Director for Technology IU Digital Library Program
October 28, 2003Copyright MIT, 2003 METS repositories: DSpace MacKenzie Smith Associate Director for Technology MIT Libraries.
Vital Implementation Update Vital Implementation Update 11 th January 2006 Paul Bevan – Glen Robson –
METS: An Introduction Structuring Digital Content.
Introduction to metadata for IDAH fellows Jenn Riley Metadata Librarian Digital Library Program.
Andrea Fojtu Charles University in Prague, National Library of the CR.
Fedora 3.0 and METS: A Partnership for the Organization, Presentation and Preservation of Digital Objects Open Repositories Georgia Tech, Atlanta,
Depositing e-material to The National Library of Sweden.
From EAD to METS An overview and history of METS Rick Beaubien UC Berkeley.
Building a Digital Library with Fedora International Conference on Developing Digital Institutional Repositories Hong Kong December 9, 2004.
Merrilee Proffitt e(X)literature / Digital Cultures Project April 2003 News from the Digital Library The Metadata Encoding and Transmission Standard; the.
All Things to All People Combining Resources to Build an Integrated Digital Repository Preservation and Access for Electronic College and University Records.
WMS: Democratizing Data
Kristin Eberle Monica Hampton Carmen Velasquez Kristin Eberle Monica Hampton Carmen Velasquez Knowledge Management.
Metadata: use of METS with Fedora Marie Lagerwall Technical Officer Centre for Learning Technology London School of Economics and.
Internet Resources Discovery (IRD) IBM DB2 Digital Library Thanks to Zvika Michnik and Avital Greenberg.
Demonstration of repositories Fedora (Flexible Extensible Digital Object Repository Architecture) Marie Lagerwall MIDESS Partners Meeting February 9, 2007.
Digital Object: A Virtual Online Storage Solution 598C Course Project Huajing Li.
Architecting an Extensible Digital Repository Anoop Kumar, Ranjani Saigal,Rob Chavez, Nikolai Schwertner Tufts University, Medford, MA.
Digital Library Architecture and Technology
Open Source Software Sustainability: A Case Study of Indiana University's Variations Software Jon W. Dunn, Phil Ponella, and Robert H. McDonald Indiana.
METS-Based Cataloging Toolkit for Digital Library Management System Dong, Li Tsinghua University Library
Dr. Kurt Fendt, Comparative Media Studies, MIT MetaMedia An Open Platform for Media Annotation and Sharing Workshop "Online Archives:
Building a Fedora Architecture to Support Diverse Collections Jon Dunn Ryan Scherle Digital Library Program Indiana University.
EVIA Digital Archive Project Jon W. Dunn and William G. Cowan Digital Library Program Indiana University IU Digital Library Brown Bag Series November 19,
Managing the Record of Research At the Smithsonian Using SIdora SAA Research Forum August 12, 2014.
Implementing an Integrated Digital Asset Management System: FEDORA and OAIS in Context Paul Bevan DAMS Implementation Manager
How to build your own Dark Archive (in your spare time) Priscilla Caplan FCLA.
1 XML as a preservation strategy Experiences with the DiVA document format Eva Müller, Uwe Klosa Electronic Publishing Centre Uppsala University Library,
Metadata: Essential Standards for Management of Digital Libraries ALI Digital Library Workshop Linda Cantara, Metadata Librarian Indiana University, Bloomington.
Overview of IU Digital Collections Search Hui Zhang Jon Dunn Indiana University Digital Library Program IU Digital Library Brown Bag October 19, 2011.
OAIS Rathachai Chawuthai Information Management CSIM / AIT Issued document 1.0.
UVa's Digital Library CSG - September 2005 Slides courtesy of: Leslie Johnston Director, Digital Access Services, UVA Library Tim Sigmon University of.
Music Digital Libraries Jon Dunn IU Digital Library Program April 9, 2002.
Library Repositories and the Documentation of Rights Leslie Johnston, University of Virginia Library NISO Workshop on Rights Expression May 19, 2005.
Use & Access 26 March Use “Proof of Concept” Model for General Libraries & IS faculty Model for General Libraries & IS faculty Test bed for DSpace.
The Data Capacitor and Digital Libraries at IU Jon Dunn Associate Director for Technology IU Digital Library Program February 22, 2006.
Successes and Growing Pains: The Indiana University Digital Library Program Jenn Riley Metadata Librarian Indiana University Digital Library Program January.
CONTENT DISCOVERY, SERVICES, AND SUSTAINED ACCESS Timothy Cole, William Mischo, Beth Sandore, Sarah Shreeves ~ University of Illinois Library
Introduction to metadata
ETD2006 Preserving ETDs With D.A.I.T.S.S. FLORIDA CENTER FOR LIBRARY AUTOMATION FC LA PAPER AUTHORS: Chuck Thomas Priscilla.
Alternative Architecture for Information in Digital Libraries Onno W. Purbo
VITAL at the National Library of Wales Glen Robson
OAIS Rathachai Chawuthai Information Management CSIM / AIT Issued document 1.0.
Metadata and Technology/Architecture Working Groups DLF Aquifer Project DLF Fall Forum Providence, RI November 14, 2008.
Metadata “Data about data” Describes various aspects of a digital file or group of files Identifies the parts of a digital object and documents their content,
Metadata and Meta tag. What is metadata? What does metadata do? Metadata schemes What is meta tag? Meta tag example Table of Content.
DSpace - Digital Library Software
Digital Preservation Panel Medusa at the University of Illinois at Urbana-Champaign: A Digital Preservation Service Based on PREMIS Kyle Rimkus, Preservation.
Storage Why is storage an issue? Space requirements Persistence Accessibility Needs depend on purpose of storage Capture/encoding Access/delivery Preservation.
Sharing Digital Scores: Will the Open Archives Initiative Protocol for Metadata Harvesting Provide the Key? Constance Mayer, Harvard University Peter Munstedt,
The library is open Digital Assets Management & Institutional Repository Russian-IUG November 2015 Tomsk, Russia Nabil Saadallah Manager Business.
Lifecycle Metadata for Digital Objects November 15, 2004 Preservation Metadata.
Institutional Repositories July 2007 DIGITAL CURATION creating, managing and preserving digital objects Dr D Peters DISA Digital Innovation South.
NLW. Object Classes Class 1  1 MARC Record  1 Image  No METS Class 2  1 MARC Record  Many images  No METS Class 3  1 MARC Record  Many.
Introduction to metadata for IDAH fellows Jenn Riley Metadata Librarian Digital Library Program.
Building Preservation Environments with Data Grid Technology Reagan W. Moore Presenter: Praveen Namburi.
5/29/2001Y. D. Wu & M. Liu1 Content Management for Digital Library May 29, 2001.
2/26/2004 Dan Swaney 1 Preservation Metadata and the OAIS Information Model A Metadata Framework to Support the Preservation of Digital Objects A review.
Building Foundations: Fedora, Fez, and the ADR prepared by Jessica Branco Colati ADR Project Director, Colorado Alliance of Research Libraries
Building Digital Archives Mark Phillips Cathy Hartman June 6, 2008.
FLORIDA CENTER FOR LIBRARY AUTOMATION
The Fedora Project March 19, 2003 ISTEC Symposium, Brazil
VI-SEEM Data Repository
Metadata to fit your needs... How much is too much?
An Open Archival Repository System for UT Austin
Presentation transcript:

Reexamining Digital Library Infrastructure at IU Jon Dunn, Ryan Scherle, Eric Peters Indiana University Digital Library Program IU Digital Library Brown Bag Series November 30, 2005

Some IU Digital Library History 1995: LETRS – electronic text 1996: Variations, DIDO – audio, images 1997: Digital Library Program

Digital Library Content Types at IU Books Manuscripts Photographs Art images Music audio Video Sheet music Musical score images Music notation files …and more

Current DLP Technical Environment Variety of access systems  DLXS (University of Michigan) Text Finding Aids Bibliographic information  Locally-developed systems Cushman Photograph Collection DIDO: Digital Images Delivered Online Variations/Variations2 Page turners (sheet music, METS Navigator)

Current DLP Technical Environment Variety of storage systems  Local DLP servers  DLP Tivoli Storage Manager  IU Massive Data Storage System (HPSS) No repository

What is a digital library repository? A system (hardware and software) in which to deposit digital objects (files and metadata) for purposes of access and/or long-term storage.

Repository Purposes Access  Web access to digital files and metadata  Services/applications for searching, browsing, transformation, etc. Preservation  Secure storage for digital files and metadata  Services for file integrity checking (using checksums), migration, conversion, etc. Some repositories are single-purpose; some are dual-purpose

Not a New Model… Digital Repository  Common system for storing, managing, and providing access to digital content and metadata Integrated Library System  Common system for storing, managing, and providing access to MARC cataloging records

Why do we need a repository? Isn’t what we have good enough?  Web servers  File servers  Databases  Mass storage systems

Mass Storage Systems High-capacity, high-performance data storage Hardware  Servers  Automated tape libraries, e.g. IBM, Storagetek  Spinning disk Software  HSM: hierarchical storage management  IU uses HPSS (High Performance Storage System) from IBM

Mass Storage Systems Typical features  Bit-level storage and retrieval of files  Security: authentication, authorization  Mirroring of data between sites over a network  Migration of files to new media types Is that enough for digital preservation?

Data Persistence Key is migration Keeping the bits alive  Physical media  Logical media format Keeping the bits understandable  File format  Metadata Digital data must be actively managed Small “pockets” of digital content pose a problem for migration

Digital Objects: More than just files Hi-res page image files (TIFF) Delivery page image files (JPEG) Text transcription (TEI/XML) Metadata Example: Electronic Book

Digital Objects: More than just files Hi-res audio files (Broadcast WAVE) Delivery audio files (MP3 or other) Images of labels, jacket, box, etc. Metadata Example: Sound Recording

Digital Objects: More than just files EAD Finding Aid Example: Archival Collection

DL Objects Digital library “objects” have many parts  Metadata Descriptive, administrative, structural, preservation, …  Preservation/archival files (several)  Delivery files (several)  Persistent identifier How do we keep them connected and organized?  Now: Good practice in file naming, directory organization, project documentation -not scalable!  Future: Digital object repository

A Word About Metadata Descriptive  Used for discovery and identification Technical  Technical characteristics of the object and its components  Used for preservation and for delivery Digital Provenance  How an object got to be what it is today Structural  How the parts of an object relate to each other

Some Relevant Metadata Standards Descriptive  MARC, MARCXML, Dublin Core, MODS, VRA Core, EAD Technical  MIX, PREMIS Digital Provenance  PMD, PREMIS Structural  METS, MPEG-7, MPEG-21

OAIS: Open Archival Information System Conceptual framework for an archival system dedicated to preserving and maintaining access to digital information over the long term Origins in space science community Discusses interactions that producers, consumers, and managers have with a repository Basis for much current thinking on repositories in digital library community  OCLC/RLG Trusted Digital Repositories: Attributes and Responsibilities  RLG/NARA Audit Checklist for Certifying Digital Repositories

OAIS Reference Model

Object Packaging Standards: Content and Metadata Functions in OAIS model  Submission Information Package (SIP)  Archival Information Package (AIP)  Dissemination Information Package (DIP) Two main competitors  METS Metadata Encoding and Transmission Standard  MPEG-21 DIDL Digital Item Declaration Language

METS METS Document Header Descript. MD Admin. MD File List Link Struct. Struct. Map Behaviors

Digital Object Repository Software Platforms Commercial digital asset management / content management / document management systems  e.g. IBM Content Manager, Artesia TEAMS, FileNet, Documentum Open source systems  e.g. Fedora (University of Virginia and Cornell) Homegrown systems  e.g. Harvard, California Digital Library Commercial services  e.g. OCLC Digital Archive

“Digital Repository” vs. “Institutional Repository” Digital repository  Common storage for digital content and metadata  Basic infrastructure component: “plumbing”  e.g. Fedora Institutional repository  Often implies focus on one application: institutional content, research output  e.g. MIT DSpace: “capture, store, index, preserve, and redistribute the intellectual output of a university’s research faculty in digital formats”

Motivation for a Digital Repository at IU Many pockets of digital content and metadata Difficult to sustain  Variable tech support, replacement funding  Harder to preserve, migrate data forward to new software and hardware  Harder to budget for Difficult to build common services and applications  Cross-collection search  Standard interfaces for viewing and playing content  Interfaces to course management and other IT services  OAI data providers  Preservation services (integrity checks, etc.)

Questions In Repository Planning at IU Scope  Just library?  Museums and archives?  All campuses?  Other digital content Instructional (e.g. faculty materials in OnCourse) Business (PR, Athletics, etc.) Funding model Standards  Minimum requirements for content formats and metadata Tools/services/applications  What else is needed to make a repository useful/usable for preservation and access?

Repository Evaluation Criteria Flexibility  Not a rigid data model  Support for many media types, complex digital objects  Not locked into one technology platform (OS, database) Extensibility  Use of modern technologies  Easy integration with other systems/tools  Means of extension/modification  Support for DL standards, particularly metadata Sustainability Supportability Usability Cost

Fedora FEDORA Flexible Extensible Digital Object and Repository Architecture

Fedora - Background Began as CS research project at Cornell –  Architecture  Reference implementation UVa Libraries became interested – 2000  Trying to create a DL architecture  No commercial solutions found Mellon-funded project –  Joint UVa/Cornell project  Update technologies  Make use of relational database  Make more production-ready  IU member of “deployment group” engaged in testing

Fedora - Technical Environment Open Source software Written in Java OS Platforms:  Windows  Linux / Unix  Mac OS X (not yet officially supported) Database support:  MySQL  McKoi  Oracle8i, Oracle9i

What does Fedora do? Manages files or references to files that make up digital objects Manages associations between objects and interfaces Invokes behaviors of objects

What does Fedora not do (yet)? Searching/browsing of metadata and content End-user UI for display/navigation of metadata and content Cataloging tools Preservation services … Fedora is DL “plumbing”… Not an out-of-the- box complete DL system

Digital object identifier Reserved Datastreams Key object metadata Disseminators Pointers to service definitions to provide service-mediated views Datastreams Aggregate content or metadata items Persistent ID (PID) Dublin Core (DC) Datastream Audit Trail (AUDIT) Relations (RELS-EXT ) Disseminator Default Disseminator Fedora Object Model

Fedora Repository and Web Services Web Services Exposure RDF files rdbms

Fedora Service Framework (Fedora 2.1)

Fedora Service Framework ( )

Content models A content model describes the internal structure of a class of Fedora objects  Number & type of datastreams  Number & type of disseminators Benefits of a content model  A method to describe the structure of similar Fedora objects  Facilitate the creation of “batches” of objects  Standardize handling of Fedora objects by tools outside the repository

Content model goals Maintain consistency with other Fedora users Standardize disseminators across objects, shifting the implementation to suit the needs of the collection  Makes it easier to build collection- independent applications on top of Fedora  It’s possible to change implementations behind the scenes (JPEG2000?) Maintain functionality of existing collections

Standard disseminators All objects implement the default disseminator Most objects implement the metadata disseminator Most objects implement type-specific disseminators Default dissem getDefaultContent getLabel getFullView getPreview Metadata dissem getMetadata(type) getDC

Content model for simple images Each image is a single Fedora object Images are available in a variety of sizes Each image belongs to a collection Default dissem Metadata dissem Collection obj Collection dissem Default dissem Metadata dissem Image obj Image dissem Default dissem Metadata dissem Image obj Image dissem

Handling metadata All metadata is stored in a single datastream All metadata is wrapped in a METS document Authoritative metadata is stored at the “natural location” Derived metadata may be stored elsewhere for technical reasons

Fedora Demos Hohenberger collection IU test server (Fedora native interface)  Horseshoe players Horseshoe players  Hohenberger collection Hohenberger collection Fedora at Tufts

Default dissem Metadata dissem Collection obj Collection dissem Default dissem Metadata dissem Book obj Book dissem Default dissem Metadata dissem Book obj Book dissem Default dissem Metadata dissem Page obj Image with Text dissem Default dissem Metadata dissem Page obj Image with Text dissem Default dissem Metadata dissem Page obj Image with Text dissem Default dissem Metadata dissem Page obj Image with Text dissem More complex models

Infrastructure Project Progress New staff hired with support from UITS Scope defined  Start with IUB Libraries Fedora selected as repository Initial planning work on DIDO2 started  Evaluation of tools Content modeling work begun Test import of some existing image collections

Infrastructure Project: Next Steps Finalize project sequencing  DIDO2  Documentary photography  Multi-page image objects  TEI text Define content, metadata standards Define and implement tools  Validation/loading/“ingestion”  Cataloging/metadata creation  Searching/browsing/discovery  Use Ongoing process

Infrastructure Project Challenges Time and resources vs. scope of work Sorting out old collections – digital archeology Implementing new infrastructure while continuing to do new projects Metadata entry / cataloging tool design Integration with MDSS/HPSS

Thank You! Contact info:  Jon Dunn  Ryan Scherle  Eric Peters Thanks to the Fedora project for the diagrams