1
Hydra, research data and Archivematica
Julie Allinson (York) and Richard Green (Hull)
Hydra Connect 2016 – Boston, MA – 5th October 2016
2
Research data management
Why do we need digital preservation for research data?
  Often unique and irreplaceable
  May be needed to validate conclusions reached in publications
  May have a high-level mandate to do so
  May have potential for re-use
Doesn’t our repository do that?
  Many (most?) repositories are designed for medium-term curation and access rather than long-term preservation
  Preservation implies actively taking steps to increase the chances of enabling meaningful re-use in the future
3
Enter Archivematica
Archivematica is “a web- and standards-based, open-source application which allows your institution to preserve long-term access to trustworthy, authentic and reliable digital content.”
“…in compliance with the ISO-OAIS functional model.”
York and Hull both keen to see how Archivematica might fit into our preservation workflows
4
Jisc “Research Data Spring” funding
Late 2014: Jisc made grant funding available for universities to investigate ways of managing research data (to satisfy a Government mandate)
Hull and York jointly awarded money to look at the role Archivematica might play in an RDM workflow: “Filling the Digital Preservation Gap”
5
Jisc “Research Data Spring” funding
Led to a three-phase project (each phase bid for separately!):
  Phase 1: Desk research
  Phase 2: Work with Archivematica and PRONOM teams to make Archivematica a better fit to the need (the project actively funded development work) and to develop local implementation plans for integrated systems
  Phase 3: Further improvements (particularly to file format identification via DROID) and build proof-of-concept (p-o-c) systems at Hull and York
6
Jisc Research Data Shared Service
Almost in parallel with the Research Data Spring projects, Jisc were planning a Research Data Shared Service
The resulting system will be managed and hosted, and will offer three core modules: repository, preservation and reporting
Phase 1 and 2 reports from Hull and York very influential for the preservation module
Commercial and open source offerings for each module, including Archivematica (for preservation) and Hydra (for the repo)
Over 20 pilot institutions recruited (including York) – all identified preservation as a priority
8
York p-o-c implementation
York wanted to provide:
  an easy way of depositing data
  a way for RDM staff to monitor datasets
  a way of requesting access to data
with:
  data sent to Archivematica
  dataset metadata pulled from PURE
9
York p-o-c implementation
Metadata from PURE pulled in nightly or on demand
Visual representation of workflow status
Fedora objects created for the dataset to store local admin info and help connect the PURE and Archivematica records
10
York PCDM modelling
Dataset = Dataset record from PURE
Dataset can be made up of multiple ‘Packages’ of data, e.g. a newer version
Individual data files stored, but folder structure is not
Folder structure available in Archivematica METS
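As a rough illustration of the modelling points above, the Dataset / Package / file relationship could be sketched as plain Python dataclasses. This is not York's actual data model: the class names, the `pure_id` field and the `add_path` helper are all hypothetical, and dropping parent folders by keeping only the file name is just one assumed way of "storing files but not folder structure".

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class DataFile:
    name: str  # file name only; folder structure is not modelled here


@dataclass
class Package:
    label: str  # hypothetical label, e.g. "v1" or "v2" for a newer version
    files: list[DataFile] = field(default_factory=list)

    def add_path(self, relative_path: str) -> None:
        # Keep the file but drop its parent folders; in the slides the
        # folder structure remains available in the Archivematica METS.
        self.files.append(DataFile(PurePosixPath(relative_path).name))


@dataclass
class Dataset:
    pure_id: str  # hypothetical identifier of the dataset record in PURE
    packages: list[Package] = field(default_factory=list)
```

A dataset with a later version would then simply hold two `Package` instances in `packages`.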
11
York p-o-c outputs
Code:
Data model (draft): I5aStRr3nlBqqj5BzkI3eFZY4zNjOmo8w/edit?usp=sharing
12
What next in York?
Our RDM staff love the p-o-c and we have agreement to turn it into a production system over the autumn/winter
This has been a helpful exercise for broader data modelling / Hydra implementation at York
13
Hull p-o-c implementation
Hull keen to make Archivematica part of a workflow for any type of repository content – not just research data
You may have seen a poster at Hydra Connect last year:
Hull’s p-o-c implements most of the automated bulk ingest route, creates AIP(s) and builds repository objects from the DIP(s)
14
Hull p-o-c implementation
User assembles files and simple descriptive file(s) in a Box folder and shares the folder with Archivematica
System checks folder contents and, if OK, creates a bag (BagIt standard) for each object, which is passed to Archivematica
Archivematica processes the bag to create an AIP, which goes to a preservation store…
…and also a DIP, which is passed to the DIP processor
DIP processor creates Hydra objects from the DIP contents and injects them into the repository QA queue…
…matched to the AIP by UUID
Thanks to Cottage Labs for all the new development work!
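To make the "creates a bag" step more concrete, here is a minimal sketch of building a BagIt bag using only the Python standard library. It is not the Hull / Cottage Labs code: the function name `make_bag` is hypothetical, the choice of SHA-256 and BagIt version 1.0 (RFC 8493) is an assumption, and a real workflow would more likely use an existing BagIt library and add bag-info metadata.

```python
import hashlib
import shutil
from pathlib import Path


def make_bag(source_dir: Path, bag_dir: Path) -> Path:
    """Copy source_dir into bag_dir/data and write the required BagIt tag files."""
    payload_dir = bag_dir / "data"
    # copytree creates bag_dir and data/ and fails if data/ already exists.
    shutil.copytree(source_dir, payload_dir)

    # bagit.txt: the bag declaration required by the BagIt specification.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )

    # manifest-sha256.txt: one "<checksum>  <path relative to bag root>"
    # line for every payload file.
    lines = []
    for path in sorted(p for p in payload_dir.rglob("*") if p.is_file()):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        lines.append(f"{digest}  {path.relative_to(bag_dir).as_posix()}\n")
    (bag_dir / "manifest-sha256.txt").write_text("".join(lines), encoding="utf-8")
    return bag_dir
```

Calling `make_bag(Path("deposit"), Path("deposit-bag"))` on a checked folder would produce a directory that a bag-aware tool can validate by recomputing the checksums in the manifest.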
15
Hull p-o-c options
Depositors have several options:
A folder containing multiple data files and one descriptive file
  Result: a single AIP and a single repository object with (optionally) one or more surrogate files for download (so can be a “metadata-only” record)
A folder containing multiple files and a csv file (one row per file)
  Result: multiple AIPs with multiple repository objects, each with (optionally) a surrogate for download
A folder containing the top-level folder of a structure
  Result: a zipped structure in a single AIP and a single repository object (optionally) containing the zipped file for download
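The three deposit options can be pictured as a small decision function. The slides do not say how the Hull system actually tells the options apart, so the rules below (a lone sub-folder means "folder structure", any `.csv` file means "one row per file") are purely illustrative assumptions, and the names `DepositMode` and `classify_deposit` are hypothetical.

```python
from enum import Enum
from pathlib import Path


class DepositMode(Enum):
    SINGLE_OBJECT = "one AIP, one repository object"
    OBJECT_PER_ROW = "one AIP and one repository object per CSV row"
    ZIPPED_STRUCTURE = "one AIP holding the zipped structure, one repository object"


def classify_deposit(folder: Path) -> DepositMode:
    """Guess which deposit option a shared folder represents."""
    entries = [p for p in folder.iterdir() if not p.name.startswith(".")]
    subdirs = [p for p in entries if p.is_dir()]
    files = [p for p in entries if p.is_file()]

    # Assumption: a lone sub-folder and nothing else signals a folder structure.
    if len(subdirs) == 1 and not files:
        return DepositMode.ZIPPED_STRUCTURE
    # Assumption: a CSV file signals one descriptive row per data file.
    if any(p.suffix.lower() == ".csv" for p in files):
        return DepositMode.OBJECT_PER_ROW
    # Otherwise treat the folder as data files plus one descriptive file.
    return DepositMode.SINGLE_OBJECT
```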
16
What’s next in Hull?
We hope to be able to take the p-o-c work and turn it into a production system
Hull is the UK’s “City of Culture” next year and there will be a great deal of digital material that the University Archives want to capture for posterity
17
Digital Preservation Awards 2016
We’re flattered that the project has been nominated as one of the three finalists for the Digital Preservation Awards “Research and Innovation Category” this year
18
Project reports
The project reports for “Filling the Digital Preservation Gap” are available through Hull’s repository hydra.hull.ac.uk - search for the project title
The phase one and two reports are there now, phase three by the end of the month
19
Thanks! Questions?
julie.allinson@york.ac.uk
Hull github:
Colleagues
  Chris Awre (Interim Librarian, University of Hull)
  Jenny Mitcham (Digital Archivist, University of York)
  Simon Wilson (University Archivist, University of Hull)
Archivematica -