CASPAR Framework and Lessons Learned David Giaretta.

Overview
CASPAR
OAIS
Threats and Solutions
Validation

CASPAR Project
EU FP6 Integrated Project
Total spend approx. 16 MEuro (8.8 MEuro from EU)

Digital Preservation
Ensure that digitally encoded information is understandable and usable over the long term
– Long term could start at just a few years
Easy to make claims
– Difficult to provide proof
Reference Model for an Open Archival Information System (ISO 14721)
– The basic standard for work in digital preservation
– Defines terminology and compliance criteria

Information Model & Representation Information
The Information Model is key
Recursion ends at KNOWLEDGEBASE of the DESIGNATED COMMUNITY (this knowledge will change over time and region)
[Diagram: an Information Object consists of a Data Object interpreted using 1+ Representation Information; Representation Information is itself interpreted using further Representation Information; a Data Object is either a Physical Object or a Digital Object, and a Digital Object is made up of 1+ Bit Sequences]
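The recursion in the Information Model can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the `rep_info_to_supply` helper and the example chain are hypothetical and are not taken from the CASPAR software or from the OAIS text.

```python
from dataclasses import dataclass, field


@dataclass
class RepInfo:
    """Representation Information; may itself need further RepInfo."""
    name: str
    interpreted_using: list["RepInfo"] = field(default_factory=list)


@dataclass
class InformationObject:
    """A Data Object (here a bit sequence) plus the RepInfo that interprets it."""
    data_object: bytes
    interpreted_using: list[RepInfo] = field(default_factory=list)


def rep_info_to_supply(rep_infos, knowledge_base, seen=None):
    """Names of RepInfo the community lacks; recursion stops at its knowledge base."""
    seen = set() if seen is None else seen
    needed = []
    for rep in rep_infos:
        if rep.name in knowledge_base or rep.name in seen:
            continue  # already understood, or already listed
        seen.add(rep.name)
        needed.append(rep.name)
        needed.extend(rep_info_to_supply(rep.interpreted_using, knowledge_base, seen))
    return needed


# Hypothetical chain: a data file needs a format standard, which needs a dictionary.
dictionary = RepInfo("data dictionary")
standard = RepInfo("format standard", [dictionary])
obj = InformationObject(b"\x00\x01", [standard])

with_dictionary = rep_info_to_supply(obj.interpreted_using, {"data dictionary"})
with_nothing = rep_info_to_supply(obj.interpreted_using, set())
print(with_dictionary)
print(with_nothing)
```

The same object needs less Representation Information for a community that already understands the dictionary, which is why the knowledge base has to be tracked as it changes.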

Basic concept of CASPAR
Digital preservation had been dominated by libraries and (state) archives
However there was a focus there on “rendered objects” and a tendency to think data is an “easy” add-on
HOWEVER
Need to deal with DATA: processed to new things, not just rendered
Need to follow OAIS: finer grained view
Need to test and prove that things work
“metadata”

Preservation Strategies Emulation Access software Migration Transformation Description techniques

Data… Level 2 GOME Satellite instrument data

Contains numbers – need meaning

...to process to this

...or this

...through complex processing schemes

Just Format?
sfqsftfoubujpo jogpsnbujpo svmft
You have a file; JHOVE tells you it is WORD version 7

..with some extra information..
representation information rules
Format Registries – useful but not enough: formats can be used for multiple purposes, e.g. audio files used to store configuration parameters
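Comparing the two slides, the unreadable string appears to be the readable one with every letter moved one place forward in the alphabet. That reading is an inference from the slide text, not something the slides state. Under that assumption, the "extra information" is a one-line rule, and a short Python sketch (the function name is hypothetical) recovers the text:

```python
def shift_letters(text: str, shift: int) -> str:
    """Shift lowercase letters by `shift` places, wrapping around the alphabet."""
    shifted = []
    for ch in text:
        if "a" <= ch <= "z":
            shifted.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        else:
            shifted.append(ch)  # leave spaces and other characters untouched
    return "".join(shifted)


encoded = "sfqsftfoubujpo jogpsnbujpo svmft"
decoded = shift_letters(encoded, -1)
print(decoded)  # representation information rules
```

Knowing the file format alone would not have revealed this rule; it has to be recorded alongside the data.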

Examples (cont)
“504b f696….”
“This is a ZIP file which contains Word files, each of which contains an encoded message which needs the key ‘!D$G^AJU*KI’ to decode it using encryption method SHA7”
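A small Python sketch shows how far format identification alone gets with such a file. It builds a throwaway ZIP in memory (the member name and contents are hypothetical) and inspects it with the standard `zipfile` module:

```python
import io
import zipfile

# Build a small archive in memory so the example is self-contained.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("message.doc", b"placeholder, not a real Word file")

raw = buffer.getvalue()
signature = raw[:2].hex()  # first two bytes of the local file header
looks_like_zip = zipfile.is_zipfile(io.BytesIO(raw))
with zipfile.ZipFile(io.BytesIO(raw)) as archive:
    members = archive.namelist()

print(signature, looks_like_zip, members)
```

The leading bytes `50 4b` ("PK") and the member list identify the container, but nothing in them says that the members carry an encoded message or which key decodes it. That part of the description has to come from Representation Information.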

Examples (cont)
LaTeX file containing an EPS (Encapsulated PostScript) version of an image
Web page containing a Java applet generating random numbers
SWISS-PROT data
Foreign languages

XML enough? – can stare at this and probably understand it
John Mary Paul

..but what about this?
<VOTABLE version="1.1" xmlns:xsi=" xsi:schemaLocation=" xmlns="
URL of data file used to create this table.
Target name
U0lNUExFICA9ICAgICAgICAgICAgICAgICAgICBUIC8gU3RhbmRhcmQgRklUUyBm
b3JtYXQgICAgICAgICAgICAgICAgICAgICAgICAgICBCSVRQSVggID0gICAgICAg
ICAgICAgICAgICAgIDggLyBDaGFyYWN0ZXIgZGF0YSAgICAgICAgICAgICAgICAg
ICAgICAgICAgICAgICAgIE5BWElTICAgPSAgICAgICAgICAgICAgICAgICAgMCAv
IE5vIGltYWdlLCBqdXN0IGV4dGVuc2lvbnMgICAgICAgICAgICAgICAgICAgICAg
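The opaque block inside the XML is Base64 text. A short Python sketch, using only the standard `base64` module and the first line of the block as it appears on the slide, shows what one extra piece of knowledge (the encoding) reveals:

```python
import base64

# First line of the encoded block, copied from the slide text.
first_line = "U0lNUExFICA9ICAgICAgICAgICAgICAgICAgICBUIC8gU3RhbmRhcmQgRklUUyBm"

decoded = base64.b64decode(first_line).decode("ascii")
print(repr(decoded))
```

The decoded text begins with a `SIMPLE` keyword and a comment starting "Standard FITS f", which looks like the start of a FITS header. So the XML wrapper is understandable on sight, but using the payload needs knowledge of Base64 and then of the FITS standard, each a further piece of Representation Information.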

Representation Information The Information Model is key Recursion ends at KNOWLEDGEBASE of the DESIGNATED COMMUNITY (this knowledge will change over time and region)

Representation Information Network

Preservation Data Flows and Strategies

Rep Info /DISCIPLINE Virtualisation

Modules and Dependencies: defining the Designated Community
[Diagram of modules and their dependencies. Labels: README.txt, TEXT EDITOR, ENGLISH LANGUAGE, WINDOWS XP, FITS FILE, FITS STANDARD, PDF STANDARD, FITS JAVA s/w, JAVA VM, PDF s/w, FITS DICTIONARY, DICTIONARY SPECIFICATION, UNICODE SPECIFICATION, XML SPECIFICATION, MULTIMEDIA PERFORMANCE DATA, C3D, DirectX, MAX/MSP, 3D motion data files, 3D scene data files, motion to music mapping strategy]

USE DATA
Use application to find data in Repository
Create DIP with enough RepInfo for the user (via DC profile)
Obtain more RepInfo from Registry if necessary
DRM
Cost sharing
Preservable infrastructure
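The DIP step can be sketched as follows. This is a minimal illustration, not the CASPAR Packager: the dependency edges, the `build_dip` function, the file names and the idea of a DC profile as a plain set of module names are all assumptions made for the example; only the module labels come from the slides.

```python
# Hypothetical dependency edges between modules named on the slides.
DEPENDS_ON = {
    "FITS FILE": ["FITS STANDARD", "FITS DICTIONARY"],
    "FITS STANDARD": ["PDF STANDARD"],
    "FITS DICTIONARY": ["DICTIONARY SPECIFICATION"],
    "DICTIONARY SPECIFICATION": ["XML SPECIFICATION"],
}


def build_dip(data_id, dc_profile, repository_rep_info, registry):
    """Package the RepInfo that the user's community does not already have."""
    dip = {"data": data_id, "rep_info": {}, "unresolved": []}
    pending = list(DEPENDS_ON.get(data_id, []))
    visited = set()
    while pending:
        module = pending.pop()
        if module in dc_profile or module in visited:
            continue  # the community knows this, or it is already handled
        visited.add(module)
        if module in repository_rep_info:
            dip["rep_info"][module] = repository_rep_info[module]
        elif module in registry:
            dip["rep_info"][module] = registry[module]  # fetched from the Registry
        else:
            dip["unresolved"].append(module)  # a gap nobody can fill yet
        pending.extend(DEPENDS_ON.get(module, []))
    return dip


dc_profile = {"PDF STANDARD", "XML SPECIFICATION"}
repository_rep_info = {"FITS STANDARD": "fits_standard.pdf"}
registry = {
    "FITS DICTIONARY": "fits_dictionary.xml",
    "DICTIONARY SPECIFICATION": "dictionary_spec.xml",
}

dip = build_dip("FITS FILE", dc_profile, repository_rep_info, registry)
print(sorted(dip["rep_info"]), dip["unresolved"])
```

A different DC profile yields a different package for the same data, and anything left in `unresolved` marks a gap in the available Representation Information.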

Threat: Users may be unable to understand or use the data, e.g. the semantics, format, processes or algorithms involved
Requirement for solution: Ability to create and maintain adequate Representation Information
CASPAR solution: RepInfo toolkit, Packager and Registry, to create and store Representation Information. In addition the Orchestration Manager and Knowledge Gap Manager help to ensure that the RepInfo is adequate.

Threat: Non-maintainability of essential hardware, software or support environment may make the information inaccessible
Requirement for solution: Ability to share information about the availability of hardware and software and their replacements/substitutes
CASPAR solution: Registry and Orchestration Manager to exchange information about the obsolescence of hardware and software, amongst other changes. The Representation Information will include such things as software source code and emulators.

Threat: The chain of evidence may be lost and there may be lack of certainty of provenance or authenticity
Requirement for solution: Ability to bring together evidence from diverse sources about the Authenticity of a digital object
CASPAR solution: Authenticity toolkit will allow one to capture evidence from many sources which may be used to judge Authenticity.

Threat: Access and use restrictions may make it difficult to reuse data, or alternatively may not be respected in future
Requirement for solution: Ability to deal with Digital Rights correctly in a changing and evolving environment
CASPAR solution: Digital Rights and Access Rights tools allow one to virtualise and preserve the DRM and Access Rights information which exist at the time the Content Information is submitted for preservation.

Threat: Loss of ability to identify the location of data
Requirement for solution: An ID resolver which is really persistent
CASPAR solution: Persistent Identifier system: such a system will allow objects to be located over time.

Threat: The current custodian of the data, whether an organisation or project, may cease to exist at some point in the future
Requirement for solution: Brokering of organisations to hold data and the ability to package together the information needed to transfer information between organisations ready for long term preservation
CASPAR solution: Orchestration Manager will, amongst other things, allow the exchange of information about datasets which need to be passed from one curator to another.

Threat: The ones we trust to look after the digital holdings may let us down
Requirement for solution: Certification process so that one can have confidence about whom to trust to preserve data holdings over the long term
CASPAR solution: The Audit and Certification standard to which CASPAR has contributed will allow a certification process to be set up.
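The identifier requirement comes down to one level of indirection: the identifier stays fixed while the location behind it may change, for example when holdings pass to a new custodian. The Python sketch below illustrates only that idea. The class, the identifier syntax and the URLs are hypothetical and do not describe the CASPAR Persistent Identifier system.

```python
class PersistentIdResolver:
    """Maps identifiers that never change to locations that may."""

    def __init__(self):
        self._current = {}
        self._history = {}

    def register(self, pid, location):
        if pid in self._current:
            raise ValueError(f"{pid} is already registered")
        self._current[pid] = location
        self._history[pid] = [location]

    def relocate(self, pid, new_location):
        """Record a hand-over, e.g. when the custodian changes."""
        if pid not in self._current:
            raise KeyError(pid)
        self._current[pid] = new_location
        self._history[pid].append(new_location)

    def resolve(self, pid):
        return self._current[pid]

    def history(self, pid):
        return list(self._history[pid])


resolver = PersistentIdResolver()
resolver.register("pid:example/0001", "https://archive-a.example/data/0001")
resolver.relocate("pid:example/0001", "https://archive-b.example/holdings/0001")
print(resolver.resolve("pid:example/0001"))
```

Code alone cannot make such a resolver persistent. That depends on the organisations that keep it running, which is the concern of the last two threats in the list.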

Accelerated Lifetime tests
As part of the validation, CASPAR simulated the following:
hardware changes
software changes
changes in the environment (including legal framework)
changes to the knowledge bases of the Designated Communities

Test scenarios vs Threats to digital preservation

STFC Testbed – various STP data

ESA testbed

UNESCO testbed The Villa Livia dataset is a collection of files used within the "virtual museum of the ancient Via Flaminia" project: a 3D reconstruction of several archaeological sites along the ancient Via Flaminia, the largest of them being Villa Livia

This is an elevation grid (height map) of the area where Villa Livia is located. It is an ASCII file in the ESRI GRID file format
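Even a plain ASCII file needs its layout described before it can be reused. The Python sketch below reads the usual ESRI ASCII grid layout: a few "keyword value" header lines (`ncols`, `nrows`, `xllcorner`, `yllcorner`, `cellsize`, `NODATA_value`) followed by rows of cell values. The sample values are made up for illustration and are not taken from the Villa Livia dataset; the function name is hypothetical.

```python
def parse_esri_ascii_grid(text):
    """Return (header, rows) for a grid in ESRI ASCII layout."""
    lines = text.strip().splitlines()
    header = {}
    index = 0
    # Header lines are "keyword value" pairs; data rows start with a number.
    while index < len(lines):
        parts = lines[index].split()
        if len(parts) != 2 or not parts[0][0].isalpha():
            break
        header[parts[0].lower()] = float(parts[1])
        index += 1

    nodata = header.get("nodata_value")
    rows = []
    for line in lines[index:]:
        values = [float(token) for token in line.split()]
        # Replace the NODATA marker with None so it cannot pass as a height.
        rows.append([None if value == nodata else value for value in values])

    if len(rows) != int(header["nrows"]):
        raise ValueError("row count does not match nrows")
    if any(len(row) != int(header["ncols"]) for row in rows):
        raise ValueError("column count does not match ncols")
    return header, rows


sample = """ncols 3
nrows 2
xllcorner 0.0
yllcorner 0.0
cellsize 10.0
NODATA_value -9999
12.5 13.0 -9999
11.0 12.0 14.5
"""

header, rows = parse_esri_ascii_grid(sample)
print(header["cellsize"], rows)
```

Units, coordinate reference system and the meaning of each cell are not in the file at all, so they would have to be supplied as further Representation Information.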

Contemporary Art Testbed

Performance Viewer: side-by-side comparison and validation of the transformation. From left to right: 3D visualization in Ogre3D, 3D model of the stage including the virtual dancer in VRML.

Figure 8 Some aspects of acousmatic production

CASPAR Validation In all cases members of the Designated Community, with appropriate changes to mimic changes over time, verified that the metadata was adequate for the use despite simulated changes of hardware, software, environment and Designated Community over time. Full details are available in the validation report (CASPAR Validation report, 2009)

Links
CASPAR –
CASPAR Source code -
OAIS Reference Model - and the updated draft is available from px
CASPAR Validation report validation-evaluation-report/at_download/file
PARSE.Insight: –
Alliance for Permanent Access: –
Digital Curation Centre: –

FUTURE
Users may be unable to understand or use the data, e.g. the semantics, format, processes or algorithms involved
Non-maintainability of essential hardware, software or support environment may make the information inaccessible
The chain of evidence may be lost and there may be lack of certainty of provenance or authenticity
Access and use restrictions may not be respected in the future
Loss of ability to identify the location of data
The current custodian of the data, whether an organisation or project, may cease to exist at some point in the future
The ones we trust to look after the digital holdings may let us down

END