1 Sherpa DP – a Technical Architecture for a Disaggregated Preservation Service
Mark Hedges, Arts and Humanities Data Service, King’s College London
Funded by: © AHDS

2 SHERPA DP Project
Funded by: © AHDS. Digital repositories: Dealing with the digital deluge, Manchester, 5 June 2007
- Development Partners: AHDS at King’s College London (Lead), Nottingham, Glasgow, Edinburgh, White Rose Consortium, London Leap Consortium
- Objective: To create a shared, distributed preservation environment for the SHERPA project, framed around the OAIS Reference Model.
- Notes: Participating repositories are all based on DSpace or EPrints. Relatively simple data objects (eprints).

3 Distributed OAIS Model

4 Distributed Workflow

5 System Architecture

6 Key preservation actions at ingest
- Integrity/fixity checks.
- File format identification.
- Preservation metadata creation.
- Implement preservation strategy
  - File format normalisation.
  - Others …
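As an illustration of the integrity/fixity check listed above, here is a minimal sketch. It assumes SHA-256 as the checksum algorithm and a digest recorded at deposit time; the slides name neither, and the function names `sha256_of` and `fixity_ok` are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path: Path, expected_hex: str) -> bool:
    """Compare the recomputed digest with the one recorded at deposit."""
    return sha256_of(path) == expected_hex.lower()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        eprint = Path(tmp) / "eprint.pdf"  # hypothetical deposited file
        eprint.write_bytes(b"%PDF-1.4 example payload")
        recorded = sha256_of(eprint)       # digest stored at deposit
        print(fixity_ok(eprint, recorded))
        eprint.write_bytes(b"%PDF-1.4 corrupted payload")
        print(fixity_ok(eprint, recorded))
```

Reading in chunks keeps memory use flat for large files, which matters given the scalability requirement stated later in the talk.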

7 Requirements
- Scalability: need to handle increasingly large quantities of data
- Generation and management of an extensive set of preservation metadata
- Audit trail/provenance metadata: knowledge held in explicit machine-processable form

8 More Requirements
- Distributed architecture
- Integration of specialised tools
- Follow standards to allow flexible integration of future tools
- Automate workflow where possible, but also allow human interaction

9 Approach
- Web services encapsulating preservation actions
- Web interface for points in the process where human input is required
- Linked by a workflow management tool

10 Workflow management
- Large number of tools available
  - Taverna
  - BPEL (Active BPEL)
  - jBPM
  - others …
- Settled on jBPM

11 jBPM
- Web services and UI functions chained together to form a workflow or “Business Process”
- Open source, flexible, extensible workflow management system
- Bridges the gap between users and developers by giving them a common language
- Packaged as a J2EE application: can run on any J2EE application server like JBoss, Tomcat, etc.
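The idea of chaining automated steps and human-input points into one process can be sketched in a few lines. This is not the jBPM API and not the project's code: it is a plain-Python stand-in in which each node is a (name, handler) pair, a handler returns False to pause for human input, and all node and function names are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

Context = Dict[str, Any]
Handler = Callable[[Context], bool]  # return False to pause the workflow
Node = Tuple[str, Handler]


def run_workflow(nodes: List[Node], context: Context, start: int = 0) -> int:
    """Run nodes in order from `start`.

    Returns the index of the node that paused, or len(nodes) when the
    whole chain has completed. Completed node names are appended to
    context["trail"] as a simple audit trail.
    """
    trail = context.setdefault("trail", [])
    for index in range(start, len(nodes)):
        name, handler = nodes[index]
        if not handler(context):
            return index  # waiting for human input at this node
        trail.append(name)
    return len(nodes)


# Hypothetical handlers standing in for wrapped services and a UI step.
def identify_format(ctx: Context) -> bool:
    ctx["format"] = "application/pdf"
    return True


def curator_review(ctx: Context) -> bool:
    return bool(ctx.get("approved", False))


def normalise(ctx: Context) -> bool:
    ctx["normalised"] = True
    return True


if __name__ == "__main__":
    nodes = [("identify", identify_format),
             ("review", curator_review),
             ("normalise", normalise)]
    ctx: Context = {}
    paused_at = run_workflow(nodes, ctx)       # stops at the review node
    ctx["approved"] = True                     # curator signs off
    run_workflow(nodes, ctx, start=paused_at)  # resumes and finishes
    print(ctx["trail"])
```

A real workflow engine adds persistence of paused process instances, branching, and timers; the sketch only shows the chaining and the pause/resume point.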

12 Preservation Metadata
- Approach based on PREMIS data dictionary
- PREMIS data model based on five categories: intellectual entities, objects, agents, events, rights
- Implementing a subset of this model …
- … with some format-specific extensions (e.g. MIX for images)
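To make the event category concrete, the sketch below builds a simplified event record using element names taken from the PREMIS data dictionary (eventIdentifier, eventType, eventDateTime, eventOutcomeInformation, linkingObjectIdentifier). It is an assumption-laden illustration: it omits the PREMIS namespace, is not validated against any schema version, and the identifier values and the `premis_event` helper are hypothetical. The subset actually implemented by the project is not stated in the slides.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from typing import Optional


def premis_event(event_id: str, event_type: str, outcome: str,
                 object_id: str,
                 when: Optional[datetime] = None) -> ET.Element:
    """Build a simplified, non-namespaced PREMIS-style event element."""
    when = when or datetime.now(timezone.utc)
    event = ET.Element("event")

    ident = ET.SubElement(event, "eventIdentifier")
    ET.SubElement(ident, "eventIdentifierType").text = "local"
    ET.SubElement(ident, "eventIdentifierValue").text = event_id

    ET.SubElement(event, "eventType").text = event_type
    ET.SubElement(event, "eventDateTime").text = when.isoformat()

    info = ET.SubElement(event, "eventOutcomeInformation")
    ET.SubElement(info, "eventOutcome").text = outcome

    # Link the event to the object it acted on.
    link = ET.SubElement(event, "linkingObjectIdentifier")
    ET.SubElement(link, "linkingObjectIdentifierType").text = "local"
    ET.SubElement(link, "linkingObjectIdentifierValue").text = object_id
    return event


if __name__ == "__main__":
    record = premis_event("evt-0001", "fixity check", "pass", "obj-0001")
    print(ET.tostring(record, encoding="unicode"))
```

Recording each preservation action as an event of this kind is one way to satisfy the audit trail requirement, since the history is then held in machine-processable form.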

13 Available Tools
- Stand-alone specialised tools that perform preservation-related tasks
- File format identification, e.g. DROID-PRONOM
  - Developed by The National Archives
  - Identification of file formats based on their file signatures
- Technical metadata generation, e.g. JHOVE
  - Extensible framework for format validation
  - Performs format-specific identification, validation, and characterization of a digital object
- File format migration tools (e.g. XENA, Open Office)
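Signature-based identification, in its simplest form, means matching the leading bytes of a file against known "magic numbers". The sketch below shows that idea only; it is not DROID's API and does not read PRONOM signature files, which describe far richer patterns (offsets, end-of-file sequences, priorities). The `SIGNATURES` table and `identify` function are hypothetical, and MIME types stand in for PRONOM identifiers.

```python
from typing import List, Optional, Tuple

# (leading bytes, MIME type) pairs for a handful of common formats.
SIGNATURES: List[Tuple[bytes, str]] = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"II*\x00", "image/tiff"),  # little-endian TIFF
    (b"MM\x00*", "image/tiff"),  # big-endian TIFF
]


def identify(data: bytes) -> Optional[str]:
    """Return the MIME type whose signature prefixes `data`, else None."""
    for magic, mime in SIGNATURES:
        if data.startswith(magic):
            return mime
    return None


if __name__ == "__main__":
    print(identify(b"%PDF-1.4\n%..."))
    print(identify(b"plain text, no known signature"))
```

Identifying by content rather than by file extension is what makes the result usable as preservation metadata, since extensions can be missing or wrong.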

14 Available tools and workflow
- Tools written in different languages
- Define generic interfaces for preservation actions
- Wrap the tools used as web services to promote:
  - Interoperability
  - Loose coupling, flexibility
  - Reusability
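A generic interface for preservation actions might look like the following sketch. It is in-process Python, not a web service definition, and it is not the project's interface: `PreservationAction`, `FunctionAction`, `run_all`, and the two stand-in tools are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Protocol


class PreservationAction(Protocol):
    """Anything with a name and a run() method counts as an action."""
    name: str

    def run(self, data: bytes) -> Dict[str, str]: ...


@dataclass
class FunctionAction:
    """Adapter that exposes a plain function through the interface."""
    name: str
    func: Callable[[bytes], Dict[str, str]]

    def run(self, data: bytes) -> Dict[str, str]:
        return self.func(data)


def run_all(actions: Iterable[PreservationAction],
            data: bytes) -> Dict[str, Dict[str, str]]:
    """Apply every action to the same object, keyed by action name."""
    return {action.name: action.run(data) for action in actions}


# Stand-ins for wrapped external tools.
def measure_size(data: bytes) -> Dict[str, str]:
    return {"bytes": str(len(data))}


def looks_like_pdf(data: bytes) -> Dict[str, str]:
    return {"is_pdf": str(data.startswith(b"%PDF-"))}


if __name__ == "__main__":
    actions = [FunctionAction("size", measure_size),
               FunctionAction("pdf-check", looks_like_pdf)]
    print(run_all(actions, b"%PDF-1.4 example"))
```

Because callers depend only on the interface, a tool can be swapped for another implementation without touching the workflow, which is the loose coupling the slide refers to.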

15 Workflow in jBPM

16 jBPM (jPDL)

17 Node and ActionHandler

18 Workflow Inputs & Outputs

19 Workflow Outputs
- Multiple METS packages (atomic model), each containing (some of):
  - Data
  - Descriptive metadata
  - PREMIS object metadata (technical)
  - PREMIS event metadata
  - PREMIS relationship metadata
  - Format-specific technical metadata (e.g. MIX)

20 Fedora object model

21 Issues with automation
- Preserving content: what do we actually want to preserve?
- Significant properties: soft concept, hard to quantify (INSPECT)
- Lack of suitable tools: expensive, outputs unreliable

22 Next Steps
- SHERPA DP 2 (2007-2008), looking at:
  - Additional repository types
  - More complex object types
  - Different methods of data transfer
- Generalise system
- Add post-ingest preservation actions
- Add semantics for dynamic service discovery
- Resource discovery metadata generation

23 Questions
Contact: mark.hedges@ahds.ac.uk

