A Digital Preservation Repository for Duke University Libraries Jim Coble Digital Repository Developer Open Repositories 2013.

Slides:



Advertisements
Similar presentations
Home-Grown Digital Library System Built Upon Open Source XML Technologies and Metadata Standards David Lacy Villanova University
Advertisements

The Hydra Framework as a Series of Diagrams Naomi Dushay Stanford University Libraries April,
An Introduction to Repositories Thornton Staples Director of Community Strategy and Alliances Director of the Fedora Project.
October 28, 2003Copyright MIT, 2003 METS repositories: DSpace MacKenzie Smith Associate Director for Technology MIT Libraries.
DRS 2 one in a series of periodic updates Harvard University Library Andrea Goethals October 21, 2009 DRS = Digital Repository Service.
Transformations at GPO: An Update on the Government Printing Office's Future Digital System George Barnum Coalition for Networked Information December.
BCAD Architecture 2009 British Cartoon Archive. Projects A project to digitise and catalogue the Carl Giles Archive to current international standards.
Depositing e-material to The National Library of Sweden.
DSpace Devika P. Madalli DRTC, ISI Bangalore.
Building a Digital Library with Fedora International Conference on Developing Digital Institutional Repositories Hong Kong December 9, 2004.
Ingest and Loading DigiTool Version 3.0. Ingest and Loading 2 Ingest Agenda Ingest Overview and Introduction Ingest activity steps Transformers Task Chains.
WMS: Democratizing Data
Demonstration of repositories Fedora (Flexible Extensible Digital Object Repository Architecture) Marie Lagerwall MIDESS Partners Meeting February 9, 2007.
Progress in Access Technologies: NLM Video Search Jennifer Marill Chief, Technical Services Division Edward Luczak Systems Architect, Office of Computer.
Shared October 13, 2010 Shelf Michael Roy, Dean of Library and Information Services, Middlebury College A Networked Image Platform Jeremy Stynes, Head.
The New DRS (DRS 2) Introduction. What is DRS? Digital repository for preservation and access –Maintains integrity of deposited content –Preserves content.
Metadata: Its Functions in Knowledge Representation for Digital Collections 1 Summary.
July 29 and August 11, 2015 How CONTENTdm works: A demonstration Ron Gardner OCLC Digital Services Consultant.
Putting it all together for Digital Assets Jon Morley Beck Locey.
Greg Harris President & CEO We Can Work It Out Establishing the World’s First Rock and Roll Library.
Making the SHiFt: Using Sufia with Hydra/Fedora for collection management and access James Halliday Programmer/Analyst, Library Technologies Juliet L.
Malaysian Grid for Learning October DC 2004, Shanghai, China. © 2004 MIMOS Berhad. All Rights Reserved Metadata Management System DC2004: International.
Ingest and Dissemination with DAITSS Presented by Randy Fischer, Programmer, Florida Center for Library Automation, University of Florida DigCCurr2007.
SobekCM’s Community Ecosystems & Socio-Technical Practices Presented by Mark V. Sullivan June 10 th, 2014 Sobek image created by Jeff Dahl and is shared.
Dspace 1 Introduction to DSpace Mukesh Pund Scientist NISCAIR, New Delhi.
OCLC Online Computer Library Center CONTENTdm ® Digital Collection Management Software Ron Gardner, OCLC Digital Services Consultant ICOLC Meeting April.
International Council on Archives Section on University and Research Institution Archives Michigan State University September 7, 2005 Preserving Electronic.
Using Hydra/Fedora for digital repository infrastructure 5. September 2013 Andreas Borchsenius Westh The Royal Library, Copenhagen.
Managing the Record of Research At the Smithsonian Using SIdora SAA Research Forum August 12, 2014.
1. 2 introductions Nicholas Fischio Development Manager Kelvin Smith Library of Case Western Reserve University Benjamin Bykowski Tech Lead and Senior.
University of Washington Digital Library Project Geri Bunker The Fifth Dublin Core Metadata Workshop October 7, 1997.
Implementing an Integrated Digital Asset Management System: FEDORA and OAIS in Context Paul Bevan DAMS Implementation Manager
NLM Digital Collections Update for DCFedoraUsersGroup January 22, 2013 John Doyle National Library of Medicine.
1 XML as a preservation strategy Experiences with the DiVA document format Eva Müller, Uwe Klosa Electronic Publishing Centre Uppsala University Library,
IUScholarWorks is a set of services to make the work of IU scholars freely available. Allows IU departments, institutes, centers and research units to.
The DigiTool to FDA Program Lydia Motyka Florida Center for Library Automation.
Challenges of Digital Media Preservation Karen Cariani, Director Media Library and Archives Dave MacCarn, Chief Technologist.
Overview of IU Digital Collections Search Hui Zhang Jon Dunn Indiana University Digital Library Program IU Digital Library Brown Bag October 19, 2011.
Metadata Lessons Learned Katy Ginger Digital Learning Sciences University Corporation for Atmospheric Research (UCAR)
Dermot Frost Digital Repository of Ireland Trinity College Dublin.
1 Schema Registries Steven Hughes, Lou Reich, Dan Crichton NASA 21 October 2015.
Digital preservation activities at the NLW Sally McInnes 18 September 2009.
CONTENT DISCOVERY, SERVICES, AND SUSTAINED ACCESS Timothy Cole, William Mischo, Beth Sandore, Sarah Shreeves ~ University of Illinois Library
ETD2006 Preserving ETDs With D.A.I.T.S.S. FLORIDA CENTER FOR LIBRARY AUTOMATION FC LA PAPER AUTHORS: Chuck Thomas Priscilla.
How to Implement an Institutional Repository: Part II A NASIG 2006 Pre-Conference May 4, 2006 Technical Issues.
The New DRS Introduction. What is DRS? Digital repository for preservation and access – Maintains integrity of deposited content – Preserves content for.
Funded by: © AHDS Preservation in Institutional Repositories Preliminary conclusions of the SHERPA DP project Gareth Knight Digital Preservation Officer.
Digital Repository Service Update ___________________________ Yale University Library Roy Lechich, ILTS Audrey Novak 15 Aug 2007.
Research Data Management At the Smithsonian Using Sidora CNI December 10, 2013.
The NLW Digital Asset Management System Paul Bevan DAMS Implementation Manager
DSpace - Digital Library Software
CBRC Digital Repository: Storing and viewing 3D objects, for science! James Halliday Programmer/Analyst, Library Technologies Juliet L. Hardesty
The library is open Digital Assets Management & Institutional Repository Russian-IUG November 2015 Tomsk, Russia Nabil Saadallah Manager Business.
NLM Update and Still Image Serving April 27, 2016 John Doyle, Doron Shalvi, TA Nguyen National Library of Medicine.
Developing a Dark Archive for OJS Journals Yu-Hung Lin, Metadata Librarian for Continuing Resources, Scholarship and Data Rutgers University 1 10/7/2015.
Eliot Wilczek University Records Manager Digital Collections and Archives Tufts University Repositories: How are They Evolving? A NERCOMP Workshop September.
Rebecca L. Mugridge LFO Research Colloquium March 19, 2008.
Ingest and Dissemination with DAITSS
FLORIDA CENTER FOR LIBRARY AUTOMATION
DAITSS and the Florida Digital Archive
Bentley Project Reel Digitization Bentley Historical Library t
Introduction, Features & Technology
Flexible Extensible Digital Object Repository Architecture
Flexible Extensible Digital Object Repository Architecture
Introduction to DSpace
Library Technology Conference: Building Exhibits
Implementing an Institutional Repository: Part II
Metadata to fit your needs... How much is too much?
Implementing an Institutional Repository: Part II
How to Implement an Institutional Repository: Part II
Presentation transcript:

A Digital Preservation Repository for Duke University Libraries Jim Coble Digital Repository Developer Open Repositories 2013

Duke University  Research university in Durham, NC, USA  14,500 students, graduate and undergraduate  Duke University Libraries  Centrally administered library system  240 staff  6 million+ volumes  Professional school libraries serving schools of Business, Law, Divinity, and Medicine Open Repositories 2013

Initial Goal: Preservation Repository  Focus: Preservation Infrastructure  Improve our processes around preservation of digital assets  Reduce initial complexity by ignoring discovery and access issues  First Use Case: Digital Collections Program  Familiar with this content  Descriptive and technical metadata already exists  Separate discovery and access interface already exists Open Repositories 2013

Digital Collections Program  Digitized content, in-house and out-sourced  380,000 archival master files (~ 20 TB)  Primarily still images, with some audio and video  Locally developed public access interface  Open Repositories 2013

Current Scenario (Typical)  Archival master files  Produced by library’s Digital Production Center (DPC)  Stored on filesystem  ACE-AM for periodic checksum validation ACE-AM  Descriptive metadata  Produced by Cataloging and Metadata Services department  Maintained in CONTENTdm (or elsewhere)  Technical metadata  Generated and maintained by DPC  Nothing ties these elements together except local knowledge and a DPC identifier Open Repositories 2013

Initial Project Goal Open Repositories 2013 Descriptive Metadata Preservation Repository DPC Technical Metadata Archival Master Files

Technology  Fedora Commons Repository  Hydra Project Framework  Fedora (repository)  Solr (index)  Blacklight (discovery and access)  Hydra-Head (object creation / management) Open Repositories 2013

Resources  Experience on prior project (abandoned before production)  Fedora  Modeling digital collections content  Two developers  Part-time, though proportion of time increased throughout this project  Web application development experience (Django/Python, Java servlets)  No prior Ruby or Rails experience Open Repositories 2013

Timeline  Spring 2012: Prototype using Fedora command line utilities and Django using “found time”  June 2012: Project formally launched  July 2012: OR 2012; growing interest in Hydra Project  October 2012: HydraCamp at Penn State; Hydra-based development begins in earnest  February 2013: Initial pilot completed  April 2013: Duke becomes Hydra Partner  June 2013: Production preservation repository launched with two collections ingested Open Repositories 2013

Content Models  Collection  Collection-level descriptive metadata  Aggregated metadata about items / components in some cases  Item  Item-level descriptive metadata  Component  Digital content file (e.g., TIFF image file)  Technical metadata  Target  External digitization target image  Digital content file for target image Open Repositories 2013

Additional Models  AdminPolicy  Used in Hydra Framework to specify access rights  Individual objects are “governed by” a particular AdminPolicy  PreservationEvent  Records PREMIS Event data for …  Ingest  Ingest validation  Periodic fixity checks  Associated with object to which it applies Open Repositories 2013

Metadata Practices  Collect metadata available at time of ingest  CONTENTdm  MarcXML from library catalog  Digitization Guide from DPC  etc  Store collected metadata in its native formats in object datastreams  Normalize one set of descriptive metadata into Qualified Dublin Core for indexing and display Open Repositories 2013

Batch Ingest  Problem to solve  380,000 archival master files (~ 20 TB) spanning 8 years of digitization work  Some areas of relative consistency across the collections but also some divergences  Needed flexible batch ingest mechanism  Solution: Ingest “Manifest”  Enumerates the objects to be ingested in any given batch  Provides information about nature and location of content files, metadata, and related objects Open Repositories 2013

Ingest Manifest YAML File: Open Repositories 2013

Ingest Processor v1.0  Reads manifest file  Performs any needed pre-ingest steps  Creates a repository object for each object in turn  Adds appropriate datastreams and relationships  Creates thumbnail image from uploaded digital content  Creates Ingestion PreservationEvent  Validates each ingested object in turn  Compares repository object with manifest  Validates content file against external checksum if available  Creates Validation PreservationEvent and first Fixity Check PreservationEvent Open Repositories 2013

Validation PreservationEvent In PreservationEvent eventMetadata datastream … Open Repositories 2013

Export Sets  Example service built on top of repository infrastructure  Delivering archival master files to authorized patrons upon request  Current process is manual  DPC staff locate master file(s) on filesystem  Possibly create a zip file  Place file(s) in pick-up location or copy onto CD, DVD, etc., for delivery  Pre-Hydra prototype implementation was Django web app using Fedora REST API Open Repositories 2013

Export Sets  Built on bookmark functionality  Staff member searches for content-bearing objects of interest and bookmarks them  Export set can be created from bookmark list  Content files are retrieved from the repository and bundled into a zip file  Staff member can download and deliver to patron  Zip file includes a README manifest listing the content files with basic metadata Open Repositories 2013

Export Sets  Export sets can be named and stored for re-use  By default, zip file is also stored  Staff member can delete the zip file (to save space) and re-generate it as needed from the export set record  When no longer needed, export set record can be deleted Open Repositories 2013

Screenshot Walk-Through

Repository Home Page Open Repositories 2013

Collection Index Open Repositories 2013

Collection Content: Items Open Repositories 2013

Item Contents: Components Open Repositories 2013

Item Metadata Open Repositories 2013

Collection FCRepo View Open Repositories 2013

Creating Export Set Open Repositories 2013

Creating Export Set Open Repositories 2013

Export Set Created Open Repositories 2013

Export Set Zip File Open Repositories 2013

Future Plans  Version 1.1 – By September 2013  Interface improvements  Refactored batch ingest  Future enhancements  Ingest (batch and individual) performed by library staff  Editing capability  Future Use Cases  Faculty scholarship, electronic theses and dissertations  Electronic records and other born-digital content  Datasets  Image library for teaching / learning Open Repositories 2013

Questions? Jim Coble Digital Repository Developer Duke University Libraries Project Open Repositories 2013