An Introduction to Repositories Thornton Staples Director of Community Strategy and Alliances Director of the Fedora Project.

Presentation transcript:

An Introduction to Repositories Thornton Staples Director of Community Strategy and Alliances Director of the Fedora Project

Creating a digital library is not a process of moving the traditional library online. Increasingly, it’s more about the care and feeding of the web!

Creating digital surrogates of paper collections is only the beginning
  Surrogate collections are an important step!
  Collecting born-digital materials is rapidly coming upon us
  Simple institutional repository approaches are good but only scratch the surface
  Complex scholarly and scientific projects are the biggest challenge

Repositories are designed to be flexible and adaptable
  Relational databases are too rigid
  Need to be able to add new content types and media easily
  Need to be able to handle arbitrary complexity in relatively simple ways
  Above all, it all needs to be durable over a very long time!

[Diagram: the repository as a content abstraction layer]
  Applications: Preservation and Archiving, Scholars Workbench, Institutional Repository, Data Curation Solutions
  The Repository (Content abstraction)
  Storage: RAID Arrays, Tape Libraries, Cloud Storage

Repositories are the foundation for many applications
  A set of abstractions that can be used to represent different kinds of data
  Manages the actual content beneath the surface
  Negotiates the connection between access and storage
  Designed to make data “durable” over the long term

Access is the core purpose of a repository
  Searching is important but it is not the only thing
  Finding is the point of searching!
  The point of finding is very often to use the resource that you have found, for analysis or reuse
  New digital resources that reuse found objects depend on continuing access for validity

Any unit of content may have more than one context
  Within one collection
    – An architectural image may relate to more than one building
  Across collections
    – Special collections images may be art objects
  Across repositories
    – Born-digital publications will almost always cross institutional boundaries

Authenticity and fidelity
  What is an authoritative digital surrogate of a real object?
  When is a copy of an original surrogate exact?
  A born-digital object has nothing to be compared against
  Digital “fingerprints” must be captured and managed as metadata
  When formats change, objects will not have all the same technical characteristics…

Making complex digital information “durable” is a very hard problem
  Durability implies that digital content is directly in use and sustained long-term
  A history of the changes to the encoding and state of content must be reliably provided
  A meaningful context for any unit of content may be one of many and must be sustained
  Replication appears to be our best friend and the cloud looks like an answer

Management is the core function of a repository
  Repositories are designed to keep everything as stable as possible while providing flexible access
  Managing things such that when they aren’t changing they are reliably the same
  Accounting for migration for technical reasons
  Disaster preparedness (lots of copies!)
  Must respect legal and policy issues

Repository abstractions provide a durability framework for managing. Content is “unitized” as information objects that combine data, metadata, policies, relationships and the history of the object. Complex digital resources are formally defined graphs of related objects. The public view of the content is presented as virtual data components.

A data object is one unit of content
[Diagram: a data object with a Persistent ID]
  Reserved Datastreams: DC, RELS-EXT, AUDIT, POLICY
  Custom Datastreams (any type, any number)
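The data object described above can be sketched as a small Python class. This is only an illustration under stated assumptions, not Fedora's implementation: the class name `InformationObject`, the `put` method, and the PID `demo:1` are hypothetical, and the audit trail is kept as a plain list rather than as the reserved AUDIT datastream.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Reserved datastream IDs named on the slide.
RESERVED = ("DC", "RELS-EXT", "AUDIT", "POLICY")


@dataclass
class InformationObject:
    """Hypothetical unit of content: a persistent ID plus named datastreams."""
    pid: str
    datastreams: dict = field(default_factory=dict)
    audit: list = field(default_factory=list)  # simplified stand-in for AUDIT

    def put(self, ds_id: str, content: bytes) -> None:
        """Add or replace a datastream and record the event in the audit list."""
        action = "modify" if ds_id in self.datastreams else "add"
        self.datastreams[ds_id] = content
        timestamp = datetime.now(timezone.utc).isoformat()
        self.audit.append((timestamp, action, ds_id))

    def custom_datastreams(self) -> list:
        """Return the IDs of all non-reserved datastreams, sorted."""
        return sorted(k for k in self.datastreams if k not in RESERVED)


obj = InformationObject("demo:1")
obj.put("DC", b"<dc/>")
obj.put("IMAGE", b"first scan")
obj.put("IMAGE", b"rescanned")
```

Keeping the history next to the data, inside the same object, is the point of the sketch: the object carries its own record of what happened to it.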

Files are stored on disk and managed directly
  Versioning is necessary
  Checksums for each file provide assurance that the file has not changed
  Can be managed by the repository or as remote files
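A minimal sketch of the checksum and versioning idea, assuming SHA-256 as the digest (the slides do not name an algorithm). `VersionedFile` and its methods are hypothetical names used only for illustration.

```python
import hashlib
from dataclasses import dataclass, field


def checksum(data: bytes) -> str:
    """Return the SHA-256 digest of the content as a hex string."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class VersionedFile:
    """Hypothetical file record: every change appends a new version."""
    versions: list = field(default_factory=list)  # (content, digest) pairs

    def add_version(self, data: bytes) -> None:
        """Store the content together with the digest computed at write time."""
        self.versions.append((data, checksum(data)))

    def verify_latest(self) -> bool:
        """Recompute the newest version's digest and compare it to the stored one."""
        data, stored_digest = self.versions[-1]
        return checksum(data) == stored_digest


f = VersionedFile()
f.add_version(b"first draft")
f.add_version(b"second draft")
intact = f.verify_latest()
```

Older versions stay in the list untouched, so a fixity check can be repeated for any point in the file's history.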

Virtual datastreams provide the access abstraction
  Can be simply retrieving a stored component
  Views of the content can be derived on demand, for different formats and resolutions
  Other data productions can be derived on demand; e.g. tiles from a JPEG2000 file
  By providing an abstract view of the content you break the dependence on the stored files
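The access abstraction can be illustrated with a toy registry of named views, each computed from stored content when requested. The names `ViewableObject`, `register_view`, and `get` are assumptions made for this sketch, and a byte slice stands in for a real derivation such as image tiling.

```python
from typing import Callable, Dict


class ViewableObject:
    """Hypothetical object whose public views are computed on demand."""

    def __init__(self, stored: Dict[str, bytes]):
        self._stored = stored  # the files actually kept on disk
        self._views: Dict[str, Callable[[Dict[str, bytes]], bytes]] = {}

    def register_view(self, name: str,
                      producer: Callable[[Dict[str, bytes]], bytes]) -> None:
        """Associate a public view name with a function over the stored files."""
        self._views[name] = producer

    def get(self, name: str) -> bytes:
        """Callers ask for a view by name and never touch stored files directly."""
        return self._views[name](self._stored)


obj = ViewableObject({"master": b"Hello, repository"})
# A view can simply return a stored component...
obj.register_view("original", lambda stored: stored["master"])
# ...or derive something new from it at request time.
obj.register_view("preview", lambda stored: stored["master"][:5])
```

Because callers only know view names, the stored master could later be migrated to another format without changing how the object is accessed.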

[Diagram labels: Content Access; Content Management]

Descriptive metadata is about the content of the resource
  Indexed for searching
  Also used for rendering user experiences
  Some standards in use:
    Dublin Core - general
    MODS - bibliographic
    VRA Core - cultural heritage
    FGDC - GIS datasets
    DDI - social science datasets
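As a small illustration of descriptive metadata being indexed for searching, the sketch below builds a simple Dublin Core record with Python's standard library and extracts lowercase search terms from it. The wrapper element `record`, the helper names, and the sample values are assumptions; only the Dublin Core element namespace is standard.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"


def build_dc_record(fields: dict) -> ET.Element:
    """Build a flat record with one Dublin Core element per field."""
    ET.register_namespace("dc", DC_NS)
    record = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields.items():
        ET.SubElement(record, f"{{{DC_NS}}}{name}").text = value
    return record


def index_terms(record: ET.Element) -> dict:
    """Map each element's local name to a list of lowercased values."""
    terms = {}
    for child in record:
        # child.tag looks like "{namespace}title"; keep only the local name
        local_name = child.tag.split("}", 1)[1]
        terms.setdefault(local_name, []).append(child.text.lower())
    return terms


record = build_dc_record({"title": "Rotunda, North Elevation",
                          "creator": "Unknown"})
terms = index_terms(record)
```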

Administrative metadata is more about the encoding and use
  Metadata about the object generally, like checksums
  Technical metadata about the specifics of the encoding of each format
  Event metadata, about what happens to an object over its lifetime; audit trails
  Policy metadata, like access restrictions and credit lines

Relationships Among Objects
  Describe adjacency relationships among objects, among units of content
  Can be done by explicitly listing IDs in XML, using METS for example
  or using RDF: PID – typeOfRelationship – relatedObjectPID
  Can be used to assemble complex resources and aggregations of objects
  Explicit and implicit aggregations
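The PID – typeOfRelationship – relatedObjectPID pattern can be sketched as plain tuples. The PIDs and relationship names below are invented for illustration, and a real repository would hold these statements in RDF rather than in a Python list.

```python
# Each statement is (PID, typeOfRelationship, relatedObjectPID).
# All identifiers here are hypothetical examples.
triples = [
    ("demo:page1", "isPartOf", "demo:book1"),
    ("demo:page2", "isPartOf", "demo:book1"),
    ("demo:book1", "isMemberOf", "demo:collectionA"),
    ("demo:book1", "isMemberOf", "demo:collectionB"),
]


def objects_related_to(target: str, relationship: str, statements: list) -> list:
    """Assemble an aggregation: every PID pointing at target via relationship."""
    return sorted(s for s, p, o in statements
                  if p == relationship and o == target)


def contexts_of(pid: str, statements: list) -> list:
    """List every (relationship, related PID) pair a single object takes part in."""
    return sorted((p, o) for s, p, o in statements if s == pid)


pages = objects_related_to("demo:book1", "isPartOf", triples)
contexts = contexts_of("demo:book1", triples)
```

The same book appears in two collections without being copied, which is how one unit of content can carry more than one context.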

Text Collections

Establishing and Enforcing Policies
  Policies must be established for the entire life-cycle of the information
    – Ownership and workflow policies
    – Access and use policies
    – Policies associated with sustaining (or not!)
  Policies must be expressed for end users
  Policies must also be expressed for machine access
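One way to picture a policy that is expressed both for end users and for machine access is to keep a readable statement next to a rule that software can evaluate. This is a deliberately simple sketch: `AccessPolicy`, its fields, and the role names are hypothetical, and the slides do not say which policy language a repository would actually use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessPolicy:
    """Hypothetical access policy with a human and a machine form."""
    statement: str            # what end users are told
    allowed_roles: frozenset  # what software checks

    def permits(self, role: str) -> bool:
        """Machine-readable check: is this role allowed access?"""
        return role in self.allowed_roles


policy = AccessPolicy(
    statement="Available to university members only.",
    allowed_roles=frozenset({"student", "faculty"}),
)
```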

Indexing
  In a repository there is no “catalog”; the repository is the catalog
  Many indexes can be created for many reasons
  Either metadata or full content, or both
  Ontology-based indexes are rapidly becoming more feasible
  Keeping indexes updated is the trick

Indexing as a harvesting service
[Diagram: Fedora Repository Service; GSearch; OAI; Ingest; Simple JMS; Blacklight; More…]
  The repository publishes events
  Services listen and consume events or other messages
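The publish-and-listen pattern can be sketched in a few lines of Python. This is an in-process stand-in, not JMS: `EventBus`, `SearchIndex`, the event fields, and the event type names are all assumptions made for illustration.

```python
class EventBus:
    """Hypothetical in-process stand-in for a messaging service."""

    def __init__(self):
        self._listeners = []

    def subscribe(self, listener) -> None:
        """Register a callable that receives every published event."""
        self._listeners.append(listener)

    def publish(self, event: dict) -> None:
        """Deliver the event to each listener in subscription order."""
        for listener in self._listeners:
            listener(event)


class SearchIndex:
    """Hypothetical index service that stays current by consuming events."""

    def __init__(self):
        self.entries = {}

    def handle(self, event: dict) -> None:
        """Add or refresh an entry on ingest/modify; drop it on purge."""
        if event["type"] in ("ingest", "modify"):
            self.entries[event["pid"]] = event["title"]
        elif event["type"] == "purge":
            self.entries.pop(event["pid"], None)


bus = EventBus()
index = SearchIndex()
bus.subscribe(index.handle)
bus.publish({"type": "ingest", "pid": "demo:1", "title": "Rotunda"})
bus.publish({"type": "modify", "pid": "demo:1", "title": "Rotunda, North Elevation"})
```

The repository never calls the index directly, so further services can subscribe later without any change on the repository side.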