BNSC Report Fall 2007 David Giaretta.


Presentation transcript:

BNSC Report Fall 2007 David Giaretta

CASPAR Consortium
Integrated project. Total spend: 16 MEuro.
A brief introduction to the CASPAR consortium, not all of whose members are represented here.

…CASPAR
Strongly based on OAIS.
Passed 1st-year EU review.

CASPAR Aims
Produce tools and techniques to support digital preservation and make it easier to share the cost. The tools and techniques:
- must be relatively easy to use
- must have a low “buy-in” in terms of effort required for adoption
- must avoid requiring wholesale change of everyone else’s systems
- must be decentralised and reproducible, so that they can live on after the formal end of the CASPAR project
- must be “preservable”
- must be open: open source, open standards
Cannot do everything. Working closely with other projects.
It is, I think, very important for us to recognise a number of constraints on what we do, IF we are serious about wanting a wide take-up of our tools and techniques.

Validation
How can we judge any proposed solution? CASPAR validation metrics:
- Theoretic underpinning
- Testbed scenarios addressing real issues: no “hand-waving”, use what is there now
- Accelerated lifetime tests: hardware and software, environment, people
- Improved “trustability”/“certifiability”
- Live a long time
It is easy to propose something and wave one’s hands. But how CAN we judge any proposed solution OTHER than ….? We believe that this is an important topic, which we will return to.
In summary we propose: theory, and testbed scenarios. Here we look at actual examples with no hand-waving (well… not much): no saying that we could just do THIS [wave hands], for example just convert everything to XML and everything is OK. The really difficult steps involve addressing REAL issues and then showing that the digitally encoded information remains UNDERSTANDABLE/USABLE.
Note that we cannot provide ABSOLUTE proof, only EVIDENCE. Evidence, not proof.

CASPAR information flow architecture
Diagram labels: Virtualisation, Rep Info.
Introducing the layered view of CASPAR, which points out that we need to deal with more than RepInfo, e.g. Digital Rights etc. Follow the “life-cycle” description in the CM. The items in the red ellipse are the RepInfo we have been talking about previously. The virtualisation is introduced in order to help with automation, i.e. we need programs to process the bytes: how can we make this easier?

INFRASTRUCTURE ELEMENTS
Diagram labels: Orchestration, Gap Manager, Data Source, Data Curator, RegRep, RepInfo toolkit, Registry, Repository, User Application.

Preservation Aware Storage and Preservation DataStores
Preservation Aware Storage: the storage component of a digital preservation system that has built-in support for both bit preservation and logical preservation.
Preservation DataStores (PDS) is a new OAIS-based preservation-aware storage. It offloads functionality to the storage layer in order to:
- decrease the probability of data loss
- simplify the applications
- provide improved performance and robustness
- utilize locality properties
- compute data-intensive functions internally, e.g. fixity
- provide better support for links among objects
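The idea of computing fixity inside the storage layer, rather than shipping the bytes to an application, might look roughly like the following. This is an illustrative sketch only, not the PDS API: the class `PreservationStore` and its methods `ingest` and `verify_fixity` are hypothetical names, and SHA-256 is an assumed choice of digest.

```python
import hashlib


class PreservationStore:
    """Toy in-memory store that keeps a digest next to each object.

    Hypothetical illustration; it does not reproduce the PDS interfaces.
    """

    def __init__(self):
        self._objects = {}  # object id -> raw bytes
        self._digests = {}  # object id -> hex digest recorded at ingest

    def ingest(self, object_id: str, data: bytes) -> str:
        """Store the bytes and record their digest at the storage layer."""
        digest = hashlib.sha256(data).hexdigest()
        self._objects[object_id] = data
        self._digests[object_id] = digest
        return digest

    def verify_fixity(self, object_id: str) -> bool:
        """Recompute the digest where the data lives and compare it."""
        current = hashlib.sha256(self._objects[object_id]).hexdigest()
        return current == self._digests[object_id]
```

In this sketch the application only receives a boolean (or a short digest), so no bulk data has to leave the store for a fixity check.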

Preservation Aware Storage: Functionality and Rationale
- Physically co-locate the Information Object (AIP); however, this is relaxed if the AIP data already resides in an existing archive. Rationale: ensure metadata is never lost when raw data survives.
- Execute data-intensive functions at the storage component (fixity computations and validation, data transformation). Rationale: utilize the data locality property; lessen data transfers to applications.
- Handle technical provenance events internally, e.g. migration and copy occur at the storage. Rationale: simplify applications.
- Support the loading and execution of external transformations, ideally performed during bit-migration. Rationale: performed close to data.

Preservation Aware Storage: Functionality and Rationale (Cont.)
- Maintain referential integrity: update links during migration (ideally done during migration).
- Ensure readability of the data by a different system in the future: support global self-described formats.
- Interaction with backend storage: support media migration; load and execute transformations; portable export format.
- Support a graceful loss of data: self-describing, self-contained media format; minimize the effect of media loss/corruption.
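Updating links during migration can be illustrated with a small sketch. It assumes a toy store modelled as a dictionary from object id to a record with `data` and `links`; the function name `migrate` and this record layout are hypothetical and are not taken from PDS.

```python
def migrate(store: dict, old_id: str, new_id: str, transform) -> None:
    """Replace old_id by a transformed copy under new_id and repoint links.

    `store` maps object ids to {"data": bytes, "links": [ids]} records
    (an assumed layout used only for this illustration).
    """
    record = store.pop(old_id)
    store[new_id] = {
        "data": transform(record["data"]),
        "links": list(record["links"]),
    }
    # Referential integrity: every link that pointed at the old object
    # is rewritten in the same step as the migration itself.
    for rec in store.values():
        rec["links"] = [new_id if link == old_id else link for link in rec["links"]]
```

Doing the rewrite inside the same operation means there is no window in which other objects still point at an identifier that no longer exists.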

PDS Architecture
Diagram labels: Applications; Preservation Web Services; AIP; Preservation DataStore; Preservation Engine Layer (Ingest, Access, Administration, …); XAM Layer; Object/File Layer backend.
- Layered approach
- Prototype based on open standards: OAIS, XAM, OSD
- Generic gradual mapping from logical to physical object
- Independent of physical storage
- Independent of stored data type
- Scalable

PDS Architecture
Diagram labels: Applications; Preservation Web Services; Preservation WSDL; AIP; Preservation DataStore; Preservation Engine Layer; Preservation Engine (Ingest, Access, Administration, …); RepInfo Mgr; PDI Mgr; Migration Mgr; Placement Mgr; XAM Layer; XAM API; XAM Library; VIM API; XAM to FS; XAM to OSD; WAS CE; posix I/O; sockets; File System; HL OSD + Object Store; web service; Security; Admin; HL OSD; Object Layer backend.

Preservation DataStores
- Preservation DataStores are OAIS-based preservation-aware storage.
- The API covers different options for ingest and access, configures policies, and enables updates of AIPs and PDS code.
- The prototype implements mainly ingest and access, using web services.
References
- “Towards OAIS-Based Preservation Aware Storage - A White Paper”.
- “The Need for Preservation Aware Storage - A Position Paper”. ACM SIGOPS Operating Systems Review, Special Issue on File and Storage Systems, Volume 41, Issue 1 (Jan 2007), pp 19-23.
- “Preservation DataStores: Architecture for Preservation Aware Storage”, to appear in 24th IEEE Conference on Mass Storage Systems and Technologies (MSST), 2007.
Web site -

Virtualisation - building up data types…
Building up data types from individual values, to complex data structures, to specialised domain data types.
Diagram labels: Spectrum, Earth Observation image, Astronomical image, Time Series, 3-D data, Image, Vector, Data Value.
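One way to picture this layering is as types built on top of one another. The sketch below is purely illustrative: the class names `Vector`, `Image` and `AstronomicalImage`, and the fields `ra_deg` and `dec_deg`, are hypothetical and do not come from the CASPAR virtualisation work.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Vector:
    """A complex data structure built from individual values."""
    values: List[float]


@dataclass
class Image:
    """A two-dimensional structure built from vectors (one per row)."""
    rows: List[Vector]

    @property
    def shape(self) -> Tuple[int, int]:
        width = len(self.rows[0].values) if self.rows else 0
        return (len(self.rows), width)


@dataclass
class AstronomicalImage:
    """A specialised domain type: a generic image plus sky coordinates."""
    pixels: Image
    ra_deg: float   # assumed field: right ascension in degrees
    dec_deg: float  # assumed field: declination in degrees
```

Software that only understands `Image` can still process the pixels of an `AstronomicalImage`, which is the kind of reuse the layering is meant to allow.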

Content dependent components
- Representation Information tools
  - Structure: EAST, DRB, DFDL, Virtualisation assistant
  - Semantics: RDF editors, RDFSuite, Terminology capture
  - Software: UVC
  - Hardware emulators
- Trust, Authenticity & Provenance tools: Certification assistant, PREMIS
- Packaging tools: XFDU toolkit
Use existing tools where applicable. Develop new tools as needed and as resources allow.
The difficulties have been isolated in the various tool-kits. Here are some examples of the sorts of tools we will be using and developing. Some of the tools will need their own RepInfo. We will use existing tools where applicable and develop new tools as needed, as part of an iterative process.

Strawman Architecture…

…CASPAR Architecture Overview

CASPAR meets OAIS - 2
This is the most important overview of the CASPAR Architecture Deliverable: OAIS Functional Components and their responsibilities. CASPAR Key Components match (part of) the OAIS Functional Components. That analysis has allowed us to lay the foundation for the CASPAR Architecture modelling.

OAIS Information Model and CASPAR API

OAIS Information Model
- Capture in UML diagrams
- Add “obvious” methods: get/set for sub-components, e.g. we know an AIP has PDI, so we need get/setPDI
- Add “best guess” methods: iterators over contents
- May need to change
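As a rough illustration of what “obvious” and “best guess” methods could look like, here is a minimal sketch. It is not the CASPAR API: the classes `PDI`, `ContentInformation` and `AIP` are simplified stand-ins for the OAIS concepts, and `get_pdi`/`set_pdi` are Python-style spellings of the get/setPDI pair mentioned on the slide.

```python
from typing import Iterator, List, Optional


class PDI:
    """Stand-in for Preservation Description Information (simplified)."""

    def __init__(self, provenance: str):
        self.provenance = provenance


class ContentInformation:
    """Stand-in for a piece of content held by the package (simplified)."""

    def __init__(self, name: str, data: bytes):
        self.name = name
        self.data = data


class AIP:
    """Archival Information Package with get/set and iterator methods."""

    def __init__(self) -> None:
        self._pdi: Optional[PDI] = None
        self._contents: List[ContentInformation] = []

    # "Obvious" methods: an AIP has PDI, so it needs a getter and a setter.
    def get_pdi(self) -> Optional[PDI]:
        return self._pdi

    def set_pdi(self, pdi: PDI) -> None:
        self._pdi = pdi

    def add_content(self, item: ContentInformation) -> None:
        self._contents.append(item)

    # "Best guess" method: an iterator over the contents.
    def __iter__(self) -> Iterator[ContentInformation]:
        return iter(self._contents)
```

A caller could then write `for item in aip:` without knowing how the contents are stored, which leaves room for the interface to change later.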

Summary
The Conceptual Model is based on OAIS and works out some implications.
It suggests areas of research: Intelligibility, Structure, Virtualisation, Authenticity.
It leads into the Architecture, which is broadly applicable and is useful not just for preservation but also for interoperability.
Note: Registry/Repository of Representation Information.

Digital Curation Centre
DCC Development is closely linked to CASPAR.
Other linked JISC-funded projects:
- SCARP
- Significant properties of software
- …may be others

Audit and Certification

The need for Trustable Repositories
The Task Force on Archiving of Digital Information (1996) declared that “a critical component of digital archiving infrastructure is the existence of a sufficient number of trusted organizations capable of storing, migrating, and providing access to digital collections”, and that “a process of certification for digital archives is needed to create an overall climate of trust about the prospects of preserving digital information.”
This has been a recurring request in many subsequent studies and workshops.

Trusted Digital Repositories
Invited group, hosted by the Research Libraries Group (RLG). Concerned with organisational and financial issues.
Report: Trusted Digital Repositories: Attributes and Responsibilities (TDR).

Critique of TRAC
- Closed process: single review of draft document; many changes based on unpublished “test audits”
- Underplays “understandability”: important for data; assumed not to be important for “documents”
- Simple list: do ALL boxes have to be ticked? What does a “tick” mean anyway?
- Link to other standards: ISO 17799/27001 for security (overlap with TRAC section C); ISO 9000 (say what you do and do what you say); but impractical to demand multiple independent audits

ISO process status
- New group set up with the primary aim of producing an ISO standard: Repository Audit and Certification (RAC)
- OPEN process: Wiki open to all; mailing list open to all; virtual meetings normally every week. See
- Into ISO via CCSDS, the same route as OAIS
- Some organisational/procedural changes in CCSDS: currently a Birds of a Feather (BoF) group, to demonstrate adequate support for the work; subsequently should become a Working Group
- Documents agreed by the WG will then be reviewed by CCSDS and more broadly via the international ISO review process

Current status
- Reviewing and comparing the TRAC, NESTOR and DCC documents
- Do we need another ISO standard? Could we simply add to existing standards, e.g. ISO 27001? The view is that ISO 27001 CANNOT be modified adequately: its view of Information is too limited
- Started drafting a straw man document, taking TRAC and adding concepts from other docs

Key Issues
How to get from a checklist to an international accreditation/certification system?
- Evidence: short term
- Evidence: long term
- The real crunch!
- Quantification: the marking system
- Levels of audit? External review; internal maturity

The Market
Transparency. Trustable? What cost?
- Certified by whom?
- To what level?
- What evidence?
- For what Designated Community?
- Relevant/sensible?
- What cost?

Links
- RAC group Wiki:
- TRAC document
- Digital Curation Centre
- CASPAR project: EU project on digital preservation (Science, Culture and Arts data); infrastructure, tools and detailed case studies. What does one need to actually “understand” the data?

Alliance for Permanent Access
Members:
- Science and Technology Facilities Council
- Koninklijke Bibliotheek
- Deutsche Nationalbibliothek
- Max Planck Gesellschaft
- International Association of Scientific, Technical and Medical Publishers
- European Space Agency, ESRIN
- Fernuniversität in Hagen
- European Organization for Nuclear Research
- Georg-August-Universität Göttingen Stiftung Öffentlichen Rechts
- European Science Foundation
- Centre National d’Etudes Spatiales
- Centre Informatique National de l’Enseignement Supérieur
- UK Joint Information Systems Committee
- British Library
- National Archives of Sweden

Alliance status
- First stage: fairly informal sign-up
- Preparing for Conference in Nov
- More formal framework next year

PARSE bid
The consortium is a sub-group of the Alliance. EU bid, aimed at E-Infrastructure for Preservation:
- Roadmap
- Survey of what is in place and planned
- Gap Analysis
- Impact Analysis tool

Other opportunities
NSF solicitation entitled Sustainable Digital Data Preservation and Access Network Partners (DataNet). An informational meeting for prospective Principal Investigators will be held 10 am to noon, Tuesday, November 6, 2007, in Room 595, NSF Stafford II building, Arlington, Virginia.