Sherpa DP – a Technical Architecture for a Disaggregated Preservation Service
Mark Hedges
Arts and Humanities Data Service, King’s College London

Funded by: © AHDS
Digital repositories: Dealing with the digital deluge, Manchester, 5 June 2007

SHERPA DP Project
Development Partners: AHDS at King’s College London (Lead), Nottingham, Glasgow, Edinburgh, White Rose Consortium, London Leap Consortium
Objective: To create a shared, distributed preservation environment for the SHERPA project framed around the OAIS Reference Model.
Notes: Participating repositories all based on DSpace or EPrints. Relatively simple data objects (eprints).

Distributed OAIS Model

Distributed Workflow

System Architecture

Key preservation actions at ingest
Integrity/fixity checks.
File format identification.
Preservation metadata creation.
Implement preservation strategy
File format normalisation.
Others …
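
An integrity/fixity check at ingest can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the function names `file_digest` and `fixity_ok` and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path, algorithm: str = "sha256", chunk_size: int = 65536) -> str:
    """Return the hex digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path: Path, expected_hex: str, algorithm: str = "sha256") -> bool:
    """Compare the computed digest with the one supplied by the depositing repository."""
    return file_digest(path, algorithm) == expected_hex.lower()
```

A caller would compute the digest when the object is received and compare it with the value recorded by the source repository; a mismatch would normally stop the ingest and be recorded as a failed event.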

Requirements
Scalability: need to handle increasingly large quantities of data
Generation and management of extensive set of preservation metadata
Audit trail/provenance metadata: knowledge held in explicit machine-processable form

More Requirements
Distributed architecture
Integration of specialised tools
Follow standards to allow flexible integration of future tools
Automate workflow where possible, but also allow human interaction

Approach
Web services encapsulating preservation actions
Web interface for points in the process where human input is required
Linked by workflow management tool

Workflow management
Large number of tools available
  – Taverna
  – BPEL (Active BPEL)
  – jBPM
  – others …
Settled on jBPM

jBPM
Web services and UI functions chained together to form a workflow or “Business Process”
Open source, flexible, extensible workflow management system
Bridges the gap between users and developers by giving them a common language
Packaged as a J2EE application - can run on any J2EE application server like JBoss, Tomcat, etc.
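
The idea of chaining automated steps and human-input points into one process can be illustrated with a small stand-in. This sketch is plain Python and does not use jBPM or jPDL; the classes `Node` and `ProcessInstance`, the `signal` method, and the handlers are hypothetical names chosen to echo the concepts, not the jBPM API.

```python
from typing import Callable, Dict, List, Optional

Context = Dict[str, object]


class Node:
    """One step in the process: an optional handler, optionally followed by a human wait."""

    def __init__(self, name: str,
                 handler: Optional[Callable[[Context], None]] = None,
                 wait_for_human: bool = False) -> None:
        self.name = name
        self.handler = handler
        self.wait_for_human = wait_for_human


class ProcessInstance:
    """Runs nodes in order, pausing at nodes that need human input."""

    def __init__(self, nodes: List[Node], context: Context) -> None:
        self.nodes = nodes
        self.context = context
        self.position = 0
        self.history: List[str] = []
        self._paused = False

    @property
    def finished(self) -> bool:
        return self.position >= len(self.nodes)

    def signal(self, **human_input: object) -> None:
        """Start or resume the process, merging any human input into the context."""
        self.context.update(human_input)
        if self._paused:
            # Leave the wait node we stopped at.
            self.position += 1
            self._paused = False
        while not self.finished:
            node = self.nodes[self.position]
            self.history.append(node.name)
            if node.handler is not None:
                node.handler(self.context)
            if node.wait_for_human:
                self._paused = True
                return
            self.position += 1


def identify(context: Context) -> None:
    # Hypothetical automated step: a crude check standing in for a format service.
    data = context["data"]
    context["format"] = "application/pdf" if data.startswith(b"%PDF-") else "unknown"


def store(context: Context) -> None:
    # Hypothetical final step: only store objects a curator approved.
    context["stored"] = bool(context.get("approved", False))


process = ProcessInstance(
    [Node("identify", identify), Node("review", wait_for_human=True), Node("store", store)],
    {"data": b"%PDF-1.4 example"},
)
process.signal()               # runs "identify", then pauses at "review"
process.signal(approved=True)  # the curator's decision resumes the process
```

In this sketch the automated handlers stand in for web service calls and the wait node stands in for a web form; a workflow engine adds persistence, branching, and task assignment on top of the same basic pattern.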

Preservation Metadata
Approach based on PREMIS data dictionary
PREMIS data model based on five categories: intellectual entities, objects, agents, events, rights
Implementing a subset of this model …
… with some format-specific extensions (e.g. MIX for images)
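
To make the event category concrete, the following sketch builds a simplified PREMIS-style event record. The element names follow semantic units from the PREMIS data dictionary, but the record omits namespaces and is not validated against any PREMIS schema; the function name `premis_event`, the identifier type "local", and all values are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from typing import Optional


def premis_event(event_id: str, event_type: str, outcome: str, object_id: str,
                 when: Optional[datetime] = None) -> ET.Element:
    """Build a simplified, namespace-free PREMIS-style event element."""
    when = when or datetime.now(timezone.utc)
    event = ET.Element("event")

    identifier = ET.SubElement(event, "eventIdentifier")
    ET.SubElement(identifier, "eventIdentifierType").text = "local"
    ET.SubElement(identifier, "eventIdentifierValue").text = event_id

    ET.SubElement(event, "eventType").text = event_type
    ET.SubElement(event, "eventDateTime").text = when.isoformat()

    outcome_info = ET.SubElement(event, "eventOutcomeInformation")
    ET.SubElement(outcome_info, "eventOutcome").text = outcome

    # Link the event to the object it acted on, which is what forms the audit trail.
    linking = ET.SubElement(event, "linkingObjectIdentifier")
    ET.SubElement(linking, "linkingObjectIdentifierType").text = "local"
    ET.SubElement(linking, "linkingObjectIdentifierValue").text = object_id
    return event


example = premis_event("evt-1", "fixity check", "success", "obj-42",
                       datetime(2007, 6, 5, tzinfo=timezone.utc))
xml_text = ET.tostring(example, encoding="unicode")
```

Emitting one such record per preservation action is one way to hold the provenance in explicit, machine-processable form.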

Available Tools
Stand-alone specialised tools that perform preservation-related tasks
File format identification, e.g. DROID-PRONOM
  – Developed by The National Archives
  – Identification of file formats based on their file signatures
Technical metadata generation, e.g. JHOVE
  – Extensible framework for format validation
  – Perform format-specific identification, validation, and characterization of a digital object
File format migration tools (e.g. XENA, Open Office)
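
Signature-based identification can be illustrated with a toy example. This is not DROID and does not read PRONOM signature files, whose signatures are considerably richer; the hand-written `SIGNATURES` table and the function `identify_format` are assumptions covering only a few well-known leading byte sequences.

```python
from typing import List, Optional, Tuple

# (leading bytes, MIME type) pairs for a handful of common formats.
SIGNATURES: List[Tuple[bytes, str]] = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
]


def identify_format(data: bytes) -> Optional[str]:
    """Return the MIME type whose signature matches the start of the data, if any."""
    for magic, mime_type in SIGNATURES:
        if data.startswith(magic):
            return mime_type
    return None
```

Identifying by content rather than by file extension matters at ingest because extensions supplied by depositors can be missing or wrong.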

Available tools and workflow
Tools written in different languages
Define generic interfaces for preservation actions
Wrap the tools used as web services to promote:
  – Interoperability
  – Loose coupling, flexibility
  – Reusability
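
A generic interface over heterogeneous tools might look like the sketch below. It is an in-process stand-in for the web service layer, under stated assumptions: `PreservationAction`, `ActionResult`, `ChecksumAction`, and `CommandLineAction` are hypothetical names, and the external "tool" is simply the Python interpreter run as a subprocess so the example stays self-contained.

```python
import hashlib
import subprocess
import sys
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ActionResult:
    action: str
    success: bool
    details: Dict[str, str] = field(default_factory=dict)


class PreservationAction(ABC):
    """Common interface: callers do not need to know how a tool is implemented."""

    name: str = "action"

    @abstractmethod
    def run(self, data: bytes) -> ActionResult:
        ...


class ChecksumAction(PreservationAction):
    name = "fixity"

    def run(self, data: bytes) -> ActionResult:
        return ActionResult(self.name, True, {"sha256": hashlib.sha256(data).hexdigest()})


class CommandLineAction(PreservationAction):
    """Wraps an external command that reads the object on stdin and reports on stdout."""

    def __init__(self, name: str, command: List[str]) -> None:
        self.name = name
        self.command = command

    def run(self, data: bytes) -> ActionResult:
        completed = subprocess.run(self.command, input=data, capture_output=True, check=False)
        output = completed.stdout.decode("utf-8", errors="replace").strip()
        return ActionResult(self.name, completed.returncode == 0, {"output": output})


# Stand-in for a tool written in another language: a one-line script that reports size.
size_tool = CommandLineAction(
    "size", [sys.executable, "-c", "import sys; print(len(sys.stdin.buffer.read()))"]
)

results = [action.run(b"hello") for action in (ChecksumAction(), size_tool)]
```

Because every action returns the same result type, the workflow layer can invoke, log, and replace tools without depending on their implementation language.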

Workflow in jBPM

jBPM (jPDL)

Node and ActionHandler

Workflow Inputs & Outputs

Workflow Outputs
Multiple METS packages (atomic model), each containing (some of):
  – Data
  – Descriptive metadata
  – PREMIS object metadata (technical)
  – PREMIS event metadata
  – PREMIS relationship metadata
  – Format-specific technical metadata (e.g. MIX)

Fedora object model

Issues with automation
Preserving content – what do we actually want to preserve?
Significant properties – soft concept, hard to quantify (INSPECT)
Lack of suitable tools – expensive, outputs unreliable

Next Steps
SHERPA DP 2 ( ), looking at:
  - Additional repository types
  - More complex object types
  - Different methods of data transfer
Generalise system
Add post-ingest preservation actions
Add semantics for dynamic service discovery
Resource discovery metadata generation

Questions
Contact: