Workflows for Digital Curation and Preservation Stacy Kowalczyk PASIG Dublin 2012 October 17, 2012.

Topics
Goals
A Very Brief Introduction to Workflow Systems
Components for Curation
Workflow Scenarios
Future Work

Workflows for Curation
Goals
– Increase capacity and scalability of curation efforts
– Develop distributed curation processes
– Lower costs of curation activities
– Improve quality with systematic and repeatable processes
– Reduce human errors

Why Workflow Systems
Repetitive and mundane activities simplified
Facilitates and enforces best practices
Enables efficient scheduling
Machinery for coordinating the execution of services and linking together resources
Facilitates outreach to researchers for direct deposit and automatic curation

Types of Workflow Systems
Kepler
BPEL
Ptolemy II
Triana
Taverna

Trident
Open source project
Based on Microsoft Workflow Foundation classes
Supported by Microsoft Research and academic researchers
Integrates with myExperiment
Well accepted in the research community
– Well over 100 peer-reviewed and white papers were discovered from one scholarly aggregation service
Graphical workflow design and execution interface

Trident Workflow Components
Fixity
Data Integrity
Metadata Creation
Format Normalization and Derivative Generation
Persistent Identification
Repository Integration

Fixity Components
MD5 checksum generator
MD5 checksum validator
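
The two fixity components can be illustrated with a minimal Python sketch. This is not the Trident implementation (Trident components are built on Microsoft Workflow Foundation); the function names `md5_checksum` and `validate_checksum` are hypothetical and only show what a generator and a validator typically do.

```python
import hashlib
from pathlib import Path


def md5_checksum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Generator: return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are never fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_checksum(path: Path, expected: str) -> bool:
    """Validator: recompute the digest and compare it with a stored value."""
    return md5_checksum(path) == expected.strip().lower()
```

A workflow would usually store the generated value next to the file at ingest and call the validator again whenever the file is moved or audited.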

Data Integrity Components
JHOVE for format verification and validation
Group validation (for object integrity)
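
The slides do not say how group validation works. One plausible reading, offered here only as an assumption, is a check that every expected part of an object is present and that nothing unexpected was added. The function name `check_object_integrity` and the report keys are hypothetical.

```python
from pathlib import Path


def check_object_integrity(object_dir: Path, expected_files: list[str]) -> dict[str, list[str]]:
    """Compare the files in a directory with the list of expected parts."""
    present = {p.name for p in object_dir.iterdir() if p.is_file()}
    expected = set(expected_files)
    return {
        # Parts listed for the object but not found on disk.
        "missing": sorted(expected - present),
        # Files on disk that the object description does not mention.
        "unexpected": sorted(present - expected),
    }
```

An object would pass this check only when both lists in the report are empty.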

Metadata Creation Components
MIX data generator and validator
METS data generator and validator

Format Components
Format Conversions for normalization and derivative generation
– .xlsx to .csv
– .docx to .pdf
– .ppt to .pdf
– .tif to .jpg
– Zipping on demand
– Image (.tif or .jpg) to .pdf (single document and multipage)
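
Of the conversions listed, zipping on demand is the simplest to sketch with the Python standard library alone. The function name `zip_directory` is hypothetical, and the sketch assumes the archive is written outside the directory being zipped.

```python
import zipfile
from pathlib import Path


def zip_directory(source_dir: Path, archive_path: Path) -> list[str]:
    """Zip every file under source_dir and return the archived names."""
    names = []
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to the object root, with forward slashes.
                arcname = path.relative_to(source_dir).as_posix()
                archive.write(path, arcname)
                names.append(arcname)
    return names
```

The other conversions (office documents to .pdf, .tif to .jpg) depend on external tools or libraries that the slides do not name, so they are not sketched here.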

Repository Component
Ingest to DSpace via Sword
DOI generator

Data Ingest Workflows
Scenarios
– Single part objects (individual images)
– Multi-part objects (a book)
– Multiple instantiations of a logical object (word, pdf and ppt of a research paper)
– Multiple multi-part objects (a group of letters)
– Research data products (multiple files of various types)

Single Part Objects

Single Part Objects Workflow
Steps shown in the workflow diagram (in extraction order, which may differ from execution order):
– Derivative Generation
– Format Validation and Verification
– Fixity Check
– Create Tech Metadata
– Create Intellectual Metadata
– Create Object Metadata
– Persistent Identification
– Deposit in Repository
– Image Quality Checks

Single Part Objects Workflow
For each original image
– MD5 checksum
– JHOVE validation and verification report
– ImageMagick report
– MIX file
For each derivative file
– MD5 Checksum
– DOI
For each logical object
– DC record
– METS record
– Sword package
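
The idea of chaining components so that each step adds its output to the object can be sketched as a small pipeline. This is an illustration only: Trident composes workflows graphically, the step order below is an assumption, and `run_workflow`, `fixity_check`, and `assign_identifier` are hypothetical names. The identifier is a placeholder, not a real DOI.

```python
import hashlib
from typing import Callable

Step = Callable[[dict], dict]


def run_workflow(item: dict, steps: list[tuple[str, Step]]) -> dict:
    """Apply each named step in order and record which steps completed."""
    state = dict(item, completed=[])
    for name, step in steps:
        state = step(state)
        state["completed"].append(name)
    return state


def fixity_check(state: dict) -> dict:
    """Add an MD5 digest of the item's content to the state."""
    return {**state, "md5": hashlib.md5(state["content"]).hexdigest()}


def assign_identifier(state: dict) -> dict:
    """Attach a placeholder identifier (a real workflow would mint a DOI)."""
    return {**state, "identifier": f"example-id:{state['md5'][:8]}"}
```

For example, `run_workflow({"name": "image.tif", "content": data}, [("Fixity Check", fixity_check), ("Persistent Identification", assign_identifier)])` returns the item with its checksum, its placeholder identifier, and the list of completed steps.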

Multi-part Object Workflow

Multi-part Object Workflow
Comic Book
– RIS
– Set of .tif files
Steps shown in the workflow diagram (in extraction order, which may differ from execution order):
– Create Tech Metadata
– Derivative Generation
– Format Validation and Verification
– Fixity Check
– Object Integrity
– Create Intellectual Metadata
– Create Object Metadata
– Persistent Identification
– Deposit in Repository
– Image Quality Checks

Multi-part Object Workflow
For each individual image file
– MD5 checksum
– JHOVE validation and verification report
– ImageMagick report
– MIX file
For each derivative file
– MD5 Checksum
For the whole object
– DOI
– DC record
– METS record
Sword Package

Multiple Instantiations of a Logical Object Workflow

Multiple Instantiations of a Logical Object Workflow
Papers
– Each logical object per subdirectory
– RIS, word file and (perhaps) supplemental file
Steps shown in the workflow diagram (in extraction order, which may differ from execution order):
– Format Normalization
– Format Validation and Verification
– Fixity Check
– Create Intellectual Metadata
– Create Object Metadata
– Persistent Identification
– Deposit in Repository
– Derivative Generation

Multiple Instantiations of a Logical Object Workflow
For each original object
– MD5 Checksum
– JHOVE report
For each derivative object
– MD5 Checksum
– Output from normalization process
– DOI for delivery object
For the whole package
– METS file
– DC record
– Sword Package

Multiple Multi-part Object Workflow

Multiple Multi-part Object Workflow
Ball collection
– RIS for collection and Inventory spreadsheet
– Each logical object in separate subdirectory
Steps shown in the workflow diagram (in extraction order, which may differ from execution order):
– Create Tech Metadata
– Derivative Generation
– Format Validation and Verification
– Fixity Check
– Create Intellectual Metadata
– Create Object Metadata
– Persistent Identification
– Deposit in Repository
– Image Quality Checks
– Collection Integrity
– Create Collection Metadata

Multiple Multi-part Object Workflow
For each file
– MD5 checksum
– JHOVE report
– MIX file
– Scanning specifications
– Derivative files
For each logical object
– Derivative object
– DC record
– METS file
– DOIs
For the whole collection
– METS file
– DC record

Research Data Products

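Research Data Products
Vortex
– A subdirectory for each experiment
Steps shown in the workflow diagram (in extraction order, which may differ from execution order):
– Compress Data
– Fixity Check
– Create Intellectual Metadata
– Create Object Metadata
– Persistent Identification
– Deposit in Repository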

Research Data Products
Outputs
– Zipped data file
– MD5 Checksum
– FGDC metadata record
– Dublin Core record
– METS record
– Sword Package
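
As an illustration of one of these outputs, a Dublin Core record can be produced with Python's standard XML library. The sketch uses the standard Dublin Core element namespace; the `record` wrapper element and the function name `dublin_core_record` are assumptions, since the slides do not show the record layout the workflow actually emits.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15 simple Dublin Core elements (title, creator, date, ...).
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields: dict[str, list[str]]) -> str:
    """Serialize a mapping of DC element names to values as an XML string."""
    root = ET.Element("record")  # hypothetical wrapper element
    for name, values in fields.items():
        for value in values:
            # Repeatable elements (e.g. several creators) become siblings.
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = value
    return ET.tostring(root, encoding="unicode")
```

Calling `dublin_core_record({"title": ["Experiment 1"], "creator": ["A. Researcher"]})` yields a record with one `dc:title` and one `dc:creator` element.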

Post Deposit Curation Workflow
Scenarios
– Fixity verification
– Format normalization
– New or additional derivative generation
– Media migration
– Persistent identifier updates
– Metadata updates

Future Work
Adding components
– EAD from spreadsheet
– MARC record support
– PREMIS support
Testing in the lab
– Digital library scanning labs
– Research labs
– Integrating with a production repository

Acknowledgements
This research was made possible through a generous grant by Microsoft Research and by the Data to Insight Center of Indiana University’s Pervasive Technology Institute.
Thanks to Kavitha Chandrashankar and Quan Zhou for their help with developing components, workflows, and documentation.

Thank you