WP24 – Authenticity and Provenance


WP24 – Authenticity and Provenance Silvio Salza, salza@dis.uniroma1.it CINI - Università di Roma “La Sapienza” First year review, Luxembourg, February 2, 2012

WP24 – Authenticity and provenance
- Start month: 3
- End month: 14
- Total effort: 32.5 p/m
- Partners: CINI (lead), MRL, APA, SBA, CERN, STM, FORTH, UK-DA, IKIRAN

Motivations and goals
- To review and recommend authenticity systems:
  - Analyze the results of relevant projects with the aim of building an interoperable framework
  - Define guidelines to collect authenticity evidence
- Analyze the way provenance, fixity and context are currently recorded by partners, and suggest improvements
- Develop mappings between provenance models to allow interoperability
- Develop provenance-based inference rules

WP24 Tasks
- Task 2410 - Review of authenticity systems
  - Produce a state of the art of international research projects
  - Extend the methodology proposed by CASPAR and InterPARES
  - Propose a basic framework for authenticity
- Task 2420 - Evaluation of authenticity evidence
  - Analyze the relation between authenticity and digital resource lifecycle
  - Propose operational guidelines to collect authenticity evidence
  - Evaluate the methodology through a set of case studies
- Task 2430 - Provenance Interoperability and Reasoning
  - Map provenance models for interoperability
  - Develop provenance-based inference rules

WP24 Deliverables
- D24.1 - Report on authenticity and plan for interoperable authenticity evaluation system
  - Detailed analysis of the state of the art (projects and standards)
  - Proposal of a common view for capturing and evaluating authenticity evidence in a standardized way
  - Development of a consistent methodology and of concrete guidelines to allow interoperability and support changes in data holders and processing workflows
  - Analysis and discussion of secure logging mechanisms
  - Provenance interoperability and reasoning (results of task 2430)

WP24 Deliverables (cont)
- D24.2 - Implementation and testing of an authenticity protocol on a specific domain
  - Test the methodology and the guidelines to check how they specialize to specific environments
  - Case study analysis in different environments, to explore the current practices and to propose improvements
  - Proposal and implementation of authenticity protocols (according to the CASPAR methodology)
- ID24.01 - Report on Provenance Interoperability and Mappings
  - Internal deliverable documenting the activities in task 2430

State of the art: projects and standards
- Analysis of the outputs of the main research projects
  - InterPARES and CASPAR (main reference on authenticity)
  - PLANETS, InSPECT, PROTAGE, SHAMAN, PARSE.Insight, LIWA, KEEP, PersID, PrestoPRIME, Wf4Ever, SCAPE, TIMBUS, ENSURE, SCIDIP-ES, ARCOMEM
- Standards and recommendations on the management and certification of ERM and LTDP systems
  - OAIS, PREMIS
  - MoReq2 and MoReq2010, ISO 15489-1:2001, ISO 23081-1:2006 (creation and management of digital resources)
  - UN/CEFACT – BRS. Transfer of Digital Records
  - TRAC, ISO/DIS 16363, ISO/DIS 16919 (certification of digital repositories)

Authenticity in OAIS and InterPARES
- OAIS: “the degree to which a person (or system) may regard an object as what it is purported to be. The degree of authenticity is judged on the basis of evidence” (draft of the updated version)
- InterPARES: authenticity has no degree in itself
  - The presumption of authenticity is graduated
  - The assessment is supported by the preservation system and by the information collected during the lifecycle of the digital resource, both before and after preservation begins

Authenticity evidence and the chain of custody
- InterPARES links authenticity to the transformations a digital resource undergoes during its lifecycle
- Authenticity is inferred from the trustworthiness of all the information collected in the various phases:
  - Creation and management
  - Preservation
  - Any transfer between systems (keeping and LTDP)
- Authenticity evidence should be collected along the whole lifecycle of digital resources
  - During the preservation phase this information is part of the PDI
  - A systematic way should be proposed to collect and preserve authenticity evidence before ingestion

Technical and non-technical evidence
- Authenticity evidence cannot be limited to ‘technical evidence’, i.e. mechanisms to validate integrity at the bit level (digests, signatures etc.)
- ‘Non-technical evidence’ should be collected as well:
  - Identity of the author
  - Evidence of the reliability of the creation system
  - Trustworthiness of the custodian
  - Etc.
- Non-technical evidence becomes crucial if the bit stream is modified by transformations during the digital resource lifecycle, because bit-level checks can then no longer be compared against the original
- Authenticity evidence should be defined in relation to the specific domain and its Designated Community (InterPARES)

Conceptual and methodological framework
- OAIS to be used as a reference model for the preservation process, to manage workflows and responsibilities, and to define which authenticity evidence should be preserved and how it should be organized (PDI)
- InterPARES as the conceptual framework for interrelating principles, policies and procedures, and for comparing and assessing the quality and consistency of digital practices with regard to authenticity
- General InterPARES assumption: LTDP objects can be preserved only as authentic copies; the authenticity evidence collected and preserved should be adequate to support the assessment
(Note: concepts on which CASPAR based its approach to authenticity)

The CASPAR contribution
- Basic CASPAR assumption: authenticity assessment is based on the collection of documentation and information related to the workflows, actors and events in the transfer and preservation processes
- CASPAR devises a methodological approach based on a set of authenticity management tools (not implemented in the EU project) to integrate and document the main events during the preservation phase
- The definition of protocols and procedures for the preservation phase may have a considerable influence on the earlier part of the custody chain, as it demands a more systematic collection of authenticity evidence

CASPAR authenticity protocols
- Authenticity protocol: formal procedure that defines controls and actions to be performed in connection with transformations of digital resources during their preservation in an OAIS
- An authenticity protocol gives an operational guideline to perform controls and to collect authenticity evidence, and is based on:
  - a workflow, which can be automatic or manual
  - a series of steps applied to a class of objects or to a class of events and related to one or more components of the PDI
  - the information related to the step execution (actor, information, time, place, context of execution)

THE OVERALL CASPAR MODEL

The APARSEN proposal
- The state of the art shows that significant scientific contributions have been made
  - A good level of theoretical and methodological formalization has been achieved
  - A large gap still separates the theoretical results from the actual practices carried out in most repositories
- CASPAR and InterPARES have provided a solid basic framework, but further contributions are still needed
  - Develop general, detailed guidelines at a concrete and operational level for the management of authenticity evidence
  - Extend the analysis to the whole lifecycle

The Digital Resource lifecycle
[Figure: lifecycle diagram split into a PRE-INGEST PHASE (keeping systems) and an LTDP PHASE (LTDP systems). Event labels shown: CREATION, CAPTURE, AGGREGATE, MIGRATE, TRANSFER, SUBMIT, INGEST.]

Authenticity and the Digital Resource lifecycle
- Transformations connected to lifecycle events may affect the authenticity of the Digital Resource (DR)
- To assess authenticity one must be able to trace back all transformations
- Authenticity evidence must be collected in connection with each relevant event of the lifecycle
- The whole custody chain must be considered
  - The pre-ingest phase is crucial in some environments, e.g. e-gov
  - Precise guidelines must be given

Achieving interoperability
- Why is interoperability a crucial requirement?
  - There may be several changes of custody along the lifecycle
  - Authenticity evidence must be interpreted and managed by systems different from the ones that have collected it
- Achieving interoperability:
  - Formal standardization: long process, needs consensus
  - Guidelines (based on existing standards): concrete results in the short/medium term (introducing good practices)
- Setting up a preliminary proposal:
  - Define a core set of events: when the evidence should be collected
  - Specify which evidence should be collected and how to structure it
  - Select test environments among current practices in the NOE

Events in the pre-ingest phase
The core set of events (includes the most important and the most likely to occur):
- CAPTURE: the DR is delivered by its author to a keeping system
- INTEGRATE: new information is added to or associated with a DR
- AGGREGATE: several DRs are aggregated to form a new DR
- DELETE: a DR is deleted according to a stated policy
- MIGRATE: one or several components of the DR are converted to a new format
- TRANSFER: a DR is transferred to another keeping system
- SUBMIT: a DR stored in a keeping system is delivered to an LTDP system
(Specific environments may require the definition of additional events)

Events in the LTDP phase
- LTDP-INGEST: a DR delivered in a SIP is ingested and stored as an AIP
- LTDP-AGGREGATE: several DRs, stored in different AIPs, are aggregated in a single AIC
- LTDP-EXTRACT: DRs are extracted from an AIC to form individual AIPs
- LTDP-MIGRATE: one or several components of an AIP are converted to a new format
- LTDP-DELETE: a DR stored in an AIP is deleted when its preservation time expires
- LTDP-TRANSFER: a DR stored in an AIP is transferred to another LTDP system
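The two core event sets can be written down as a small vocabulary, which is one way a system receiving evidence from another custodian could check that it recognizes every event name. The sketch below is only illustrative: the class and function names (`Phase`, `LifecycleEvent`, `events_in`) are assumptions made here, not part of the APARSEN proposal.

```python
from enum import Enum
from typing import List


class Phase(Enum):
    """The two lifecycle phases named on the slides."""
    PRE_INGEST = "pre-ingest"
    LTDP = "LTDP"


class LifecycleEvent(Enum):
    """Core events, each tagged with the phase it belongs to."""
    # Pre-ingest phase
    CAPTURE = ("CAPTURE", Phase.PRE_INGEST)
    INTEGRATE = ("INTEGRATE", Phase.PRE_INGEST)
    AGGREGATE = ("AGGREGATE", Phase.PRE_INGEST)
    DELETE = ("DELETE", Phase.PRE_INGEST)
    MIGRATE = ("MIGRATE", Phase.PRE_INGEST)
    TRANSFER = ("TRANSFER", Phase.PRE_INGEST)
    SUBMIT = ("SUBMIT", Phase.PRE_INGEST)
    # LTDP phase
    LTDP_INGEST = ("LTDP-INGEST", Phase.LTDP)
    LTDP_AGGREGATE = ("LTDP-AGGREGATE", Phase.LTDP)
    LTDP_EXTRACT = ("LTDP-EXTRACT", Phase.LTDP)
    LTDP_MIGRATE = ("LTDP-MIGRATE", Phase.LTDP)
    LTDP_DELETE = ("LTDP-DELETE", Phase.LTDP)
    LTDP_TRANSFER = ("LTDP-TRANSFER", Phase.LTDP)

    def __init__(self, label: str, phase: Phase) -> None:
        self.label = label  # name as written on the slides
        self.phase = phase  # phase in which the event occurs


def events_in(phase: Phase) -> List[LifecycleEvent]:
    """Return the core events defined for the given phase."""
    return [event for event in LifecycleEvent if event.phase is phase]
```

Since specific environments may need additional events, a real deployment would probably keep this vocabulary extensible rather than fixed in code.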

Characterizing the events
- Description: circumstances and actions
- Agents: the person(s) who take the responsibility
- Input: the DR(s) which are the object of the transformation
- Output: the DR(s) which are the result of the transformation
- Controls: which controls are performed and by whom
- Authenticity Evidence Record:
  - Identity and authentication data of agents and systems involved
  - Date and time
  - Specification of the actions performed
  - Results of controls performed
  - Other…

Authenticity Evidence Records
- Each AER contains the evidence for a specific event
- The sequence of AERs reproduces the sequence of events in the DR lifecycle
[Figure: an AEH drawn as a sequence of records AER1, AER2, AER3, ..., AERn]
- Authenticity Evidence Record (AER): structure containing the evidence collected in connection with a specific event
- Authenticity Evidence History (AEH): incremental structure of AERs
- When the DR enters the LTDP phase, the AEH collected during the pre-ingest phase provides the information to generate the PDI in the SIP
- During the LTDP phase the new AERs contribute to the PDI of the AIP
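As a rough illustration of the AER and AEH structures described above, the sketch below models an AER as a record with the fields listed under "Characterizing the events" and an AEH as an append-only sequence of AERs. All field and class names are assumptions made for this example, and the rule that records must arrive in chronological order is likewise an assumption used to mirror "the sequence of AERs reproduces the sequence of events".

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List


@dataclass
class AuthenticityEvidenceRecord:
    """Evidence collected in connection with one lifecycle event (hypothetical layout)."""
    event: str                 # e.g. "CAPTURE", "SUBMIT", "LTDP-INGEST"
    timestamp: datetime        # date and time of the event
    agents: List[str]          # identities of the agents and systems involved
    actions: List[str]         # specification of the actions performed
    control_results: Dict[str, str] = field(default_factory=dict)  # control -> result


@dataclass
class AuthenticityEvidenceHistory:
    """Incremental sequence of AERs for one digital resource."""
    resource_id: str
    records: List[AuthenticityEvidenceRecord] = field(default_factory=list)

    def append(self, record: AuthenticityEvidenceRecord) -> None:
        # Assumption: records are appended in chronological order, so the
        # history reproduces the sequence of lifecycle events.
        if self.records and record.timestamp < self.records[-1].timestamp:
            raise ValueError("AER is older than the last record in the AEH")
        self.records.append(record)

    def events(self) -> List[str]:
        """Event names in the order in which they were recorded."""
        return [record.event for record in self.records]
```

A possible use: append a CAPTURE record when the DR reaches the keeping system and a SUBMIT record when it is delivered to the LTDP system; `events()` then returns `["CAPTURE", "SUBMIT"]`, which is the information a SIP builder would draw on for the PDI.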

Case study analysis
- Test environments selected among the NOE partners
  - Could the model reasonably fit into the real life scenario?
- Phase 1: analyze current practices
  - What does authenticity mean to the community of users?
  - What is being done right now about authenticity?
  - Which are the relevant events of the lifecycle?
  - Which evidence is currently collected?
- Phase 2: proposals to improve the current situation
  - Identify the lifecycle events relevant to authenticity
  - Specify actions and controls to be performed
  - Define content and structure of the Authenticity Evidence Records

Test environments from NOE partners
- CINI
  - USSL-Vicenza: Repository of the Public Health Care System
- UK-DA
  - Archive in Social Sciences & Humanities
- CERN
  - HEP (High Energy Physics): Repository of experimental data
- IKIRAN
  - Repository of Scientific heritage of the Russian Academy of Sciences
  - Satellite Data Center of Russian Academy of Sciences

USSL-Vicenza: phase 1: current practices
- Variety of digital resources:
  - Several types of digital resources with separate workflows
  - Test results (files in DICOM format and more)
  - Medical reports (digitally signed by physicians)
- Long term preservation
  - Pre-ingest workflow may involve several systems under different responsibilities
  - All records are sent to the repository shortly after their creation
  - Italian rules on LTDP are very specific, mostly centered on digital signatures and certified timestamps, and based on collecting many DRs in a single large batch (called Preservation Volume)

USSL-Vicenza: the DR lifecycle
[Figure: DR lifecycle diagram. PRE-INGEST PHASE labels: MODALITY, LOCAL PACS, CENTRAL PACS, CAPTURE, TRANSFER, SUBMIT. LTDP PHASE labels: LTDP SYSTEM SCRYBA, INGEST, LTDP-AGGREGATE, LTDP-MIGRATE.]
- MODALITY: imaging device that generates DICOM files
- PACS: Picture Archiving and Communication System
- Test results are submitted as individual SIPs
- AIPs are generated for the individual DRs
- Multiple AIPs are aggregated in a Preservation Volume (AIC)
- LTDP-MIGRATE: format migration, to be implemented

USSL-Vicenza: phase 2: improvements
- The model fits quite well: 6 events have been identified
- Current problems:
  - Changes of custody and related responsibilities in the pre-ingest phase need better documentation
  - Part of the authenticity evidence is currently not handled at the metadata level
  - Additional controls and the gathering of additional evidence are needed
- More precise formalization of the events
  - Detailed specification of controls, actions and responsibilities
  - Detailed specification of Authenticity Evidence Records
  - Define a precise (XML) data structure for AER
- Authenticity protocols will be developed for sample events
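The slides do not give the XML structure planned for the AER, so the following is only a guess at what a minimal serialization could look like. The element and attribute names (`aer`, `agents`, `agent`, `actions`, `action`, `controls`, `control`) and the function `aer_to_xml` are hypothetical; only the kinds of information carried come from the slides.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def aer_to_xml(event: str, timestamp: str, agents: List[str],
               actions: List[str], controls: Dict[str, str]) -> str:
    """Serialize one AER to an XML string (hypothetical element names)."""
    root = ET.Element("aer", {"event": event, "timestamp": timestamp})

    # Identity of the agents and systems involved in the event
    agents_el = ET.SubElement(root, "agents")
    for name in agents:
        ET.SubElement(agents_el, "agent").text = name

    # Specification of the actions performed
    actions_el = ET.SubElement(root, "actions")
    for description in actions:
        ET.SubElement(actions_el, "action").text = description

    # Results of the controls performed
    controls_el = ET.SubElement(root, "controls")
    for name, result in controls.items():
        ET.SubElement(controls_el, "control", {"name": name}).text = result

    return ET.tostring(root, encoding="unicode")
```

For a CAPTURE event this yields one `aer` element with nested `agents`, `actions` and `controls` children. A real specification would also need a schema and a namespace so that systems other than the one that produced the record can validate it.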

UK Data Archive: phase 1: current practices
- Long-standing archive (1967) in Social Science & Humanities
- Close relationships with Producers, Funders, Consumers
- Current practices:
  - Network of Confidence with stakeholders
  - Well devised and well documented procedures, compliance with international standards
  - Part of the authenticity evidence is not included in the SIP, but derived from data deposit forms and agreements
  - Manual processing by expert teams: considerable effort devoted to collecting and organizing authenticity evidence in the ingest phase
  - Improvement of evidence pursued by the Archive for best practice reasons rather than to meet a perceived demand

UK Data Archive: phase 2: improvements
- Model fits quite well: 3 events identified in the DR lifecycle:
  - Pre-Ingest Phase > SUBMIT
  - LTDP Phase > INGEST
  - LTDP Phase > MIGRATE
- Suggested improvements (work in progress)
  - Authenticity evidence could be collected by the Producers, structured according to detailed specifications and incorporated in the SIP
    - Would make the responsibilities of the producers more evident
    - Would improve the trustworthiness of the authenticity evidence
    - Would contribute to a more automated ingest processing
  - Application of checksums to SIPs (currently not mandated)
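The checksum suggestion can be illustrated with a small fixity manifest: the Producer computes a digest for each file in the SIP, and the archive recomputes the digests at ingest. This is a sketch under assumptions: SHA-256 is chosen here for illustration (the slides do not name an algorithm), the SIP is modelled as an in-memory mapping from file name to content, and the function names are hypothetical.

```python
import hashlib
from typing import Dict, List


def sip_manifest(files: Dict[str, bytes]) -> Dict[str, str]:
    """Producer side: map each file name in the SIP to its SHA-256 hex digest."""
    return {name: hashlib.sha256(content).hexdigest()
            for name, content in files.items()}


def verify_sip(files: Dict[str, bytes], manifest: Dict[str, str]) -> List[str]:
    """Archive side: return the sorted names of files that are missing
    from the SIP or whose digest no longer matches the manifest."""
    failures = []
    for name, expected in manifest.items():
        content = files.get(name)
        if content is None or hashlib.sha256(content).hexdigest() != expected:
            failures.append(name)
    return sorted(failures)
```

An empty result from `verify_sip` means every file listed in the manifest arrived unchanged. Files present in the SIP but absent from the manifest are not reported by this sketch, and the manifest itself would have to travel over a trusted channel or be signed to count as authenticity evidence.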

CERN: HEP data repository (work in progress)
- The HEP (High Energy Physics) community is considering the problem of preserving their experimental data
- The issue of authenticity is not always well understood
  - Data are often transferred via unreliable channels (even e-mail)
  - No specific procedures for authentication and submission
  - Authenticity evidence is ‘retrieved’ from publications
  - In extreme cases: somebody tried to provide “abandoned data”
- Broad margins for improvement
  - Authenticity of data should be linked to the experimental context
  - Procedures for authentication and submission should be defined
  - Minimal authenticity evidence (and a predefined structure) to be provided in the PDI of the SIP should be agreed upon

Log Files and Logging Systems
- Log files capture relevant events and interactions with a system:
  - Automatically
  - According to a logging policy
  - In a secure way
- Log files contain highly sensitive data
  - Need to be protected from unauthorized access and tampering
- Required by
  - ISO 27002 (formerly ISO 17799) (Code of practice for information security management)
  - ISO 27001 (Information Security Management Systems)

Secure logging protocols
- Trustworthy logs are fundamental for monitoring and auditing
  - Serve as evidence for provenance and authenticity of data
  - Provide audit trails
  - Describe events that occurred during the life cycle of an object
- Secure logging protocols
  - Use cryptography to protect content
  - Prevent unauthorized insertion or deletion of events
  - Allow detection of fraud and tampering
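One common building block for tamper-evident logs is a hash chain, in which every entry's digest also covers the digest of the previous entry. The slides do not say which protocol the work package analysed, so the sketch below is a generic illustration rather than the mechanism from the deliverable; the function names and the `GENESIS` constant are assumptions.

```python
import hashlib
from typing import List, Tuple

# Fixed starting value for the chain (an arbitrary choice for this sketch)
GENESIS = "0" * 64

LogEntry = Tuple[str, str]  # (message, chained digest)


def _link(prev_digest: str, message: str) -> str:
    """Digest binding a message to the digest of the entry before it."""
    data = (prev_digest + "\n" + message).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def append_entry(log: List[LogEntry], message: str) -> None:
    """Append a message, chaining it to the last entry in the log."""
    prev_digest = log[-1][1] if log else GENESIS
    log.append((message, _link(prev_digest, message)))


def verify_log(log: List[LogEntry]) -> bool:
    """Recompute the chain and report whether every link still matches."""
    prev_digest = GENESIS
    for message, digest in log:
        if _link(prev_digest, message) != digest:
            return False
        prev_digest = digest
    return True
```

Editing an entry, or inserting or deleting one in the middle, breaks every later link, so `verify_log` returns `False`. Two limits of this simple form are worth stating: removing entries from the end goes unnoticed unless the latest digest is recorded somewhere else, and an attacker who can rewrite the whole log can recompute the chain, which is why real protocols add keys, signatures or external anchoring.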

Log File Audits and Secure Storage
- Log file audits have to be performed on a regular basis
  - Audits check logs for integrity and validity
  - Provide insights about data quality
  - ISO/DIS 16363:2011 (Requirements for Audit and Certification of Trustworthy Digital Repositories)
- Secure storage for log files
  - Enhanced security and safety requirements for sensitive log data
  - Needs to be suitable for long term preservation
  - Specialized provenance-aware storage
  - WORM (Write Once Read Many) systems

Conclusions
- The state of the art
  - Results from research projects (mainly CASPAR and InterPARES) provide a solid basic framework for authenticity
  - A large gap still separates theoretical results from actual practices
  - Interoperability is a crucial target
- The operational guidelines
  - Consider the whole lifecycle of the DR, including the pre-ingest workflow
  - Identify the set of events that may affect authenticity
  - Specify actions, controls and the Authenticity Evidence to be collected
- Testing the methodology
  - Check the model against test environments from the NOE's partners
  - Verify how the guidelines may improve the current practices

Network of Excellence