WP24 – Authenticity and Provenance

1 WP24 – Authenticity and Provenance
Silvio Salza, CINI - Università di Roma “La Sapienza”
First year review, Luxembourg, February 2, 2012

2 WP24 – Authenticity and provenance
Start month: End month: 14 Total effort: p/m Partners: CINI (lead) MRL APA SBA CERN STM FORTH UK-DA IKIRAN

3 Motivations and goals
To review and recommend authenticity systems:
  Analyze the results of relevant projects with the aim of building an interoperable framework;
  Define guidelines to collect authenticity evidence;
  Analyze the way provenance, fixity and context are currently recorded by partners, and suggest improvements;
  Develop mappings between provenance models to allow interoperability;
  Develop provenance-based inference rules.

4 WP24 Tasks
Task 2410 - Review of authenticity systems:
  Produce a state of the art of international research projects;
  Extend the methodology proposed by CASPAR and InterPARES;
  Propose a basic framework for authenticity.
Task Evaluation of authenticity evidence:
  Analyze the relation between authenticity and digital resource lifecycle;
  Propose operational guidelines to collect authenticity evidence;
  Evaluate the methodology through a set of case studies.
Task 2430 – Provenance Interoperability and Reasoning:
  Map provenance models for interoperability;
  Develop provenance-based inference rules.

5 WP24 Deliverables
D Report on authenticity and plan for interoperable authenticity evaluation system:
  Detailed analysis of the state of the art (projects and standards);
  Proposal of a common view for capturing and evaluating authenticity evidence in a standardized way;
  Development of a consistent methodology and of concrete guidelines to allow interoperability and support changes in data holders and processing workflows;
  Analysis and discussion of secure logging mechanisms;
  Provenance interoperability and reasoning (results of task 2430).

6 WP24 Deliverables (cont)
D Implementation and testing of an authenticity protocol on a specific domain:
  Test the methodology and the guidelines to check how they specialize on specific environments;
  Case study analysis in different environments, to explore the current practices and to propose improvements;
  Proposal and implementation of authenticity protocols (according to the CASPAR methodology).
ID Report on Provenance Interoperability and Mappings:
  Internal deliverable documenting the activities in task 2430.

7 State of the art: projects and standards
Analysis of the outputs of the main research projects:
  InterPARES and CASPAR (main reference on authenticity);
  PLANETS, InSPECT, PROTAGE, SHAMAN, PARSE.Insight, LIWA, KEEP, PersID, PrestoPRIME, Wf4Ever, SCAPE, TIMBUS, ENSURE, SCIDIP-ES, ARCOMEM.
Standards and recommendations on the management and certification of ERM and LTDP systems:
  OAIS, PREMIS;
  MoReq2 and MoReq2010, ISO :2001, ISO :2006 (creation and management of digital resources);
  UN/CEFACT – BRS. Transfer of Digital Records;
  TRAC, ISO/DIS 16363, ISO/DIS (certification of digital repositories).

8 Authenticity in OAIS and InterPARES
OAIS: “the degree to which a person (or system) may regard an object as what it is purported to be. The degree of authenticity is judged on the basis of evidence” (draft of the updated version).
InterPARES: authenticity has no degree in itself:
  The presumption of authenticity is graduated;
  The assessment is supported by the preservation system and by the information collected during the lifecycle of the digital resource, both before and after preservation begins.

9 Authenticity evidence and the chain of custody
InterPARES links authenticity to the transformations a digital resource undergoes during its lifecycle.
Authenticity is inferred from the trustworthiness of all the information collected in the various phases:
  Creation and management;
  Preservation;
  Any transfer between systems (keeping and LTDP).
Authenticity evidence should be collected along the whole lifecycle of digital resources:
  During the preservation phase this information is part of PDI;
  A systematic way should be proposed to collect and preserve authenticity evidence before ingestion.

10 Technical and non technical evidence
Authenticity evidence cannot be limited to ‘technical evidence’, i.e. mechanisms to validate the integrity at bit level (digests, signatures, etc.).
‘Non technical evidence’ should be collected as well:
  Identity of the author;
  Evidence of the reliability of the creation system;
  Trustworthiness of the custodian;
  Etc.
Non technical evidence becomes of crucial importance if the bit stream is modified by transformations during the digital resource lifecycle.
Authenticity evidence should be defined in relation to the specific domain and its Designated Community (InterPARES).

11 Conceptual and methodological framework
OAIS to be used as a reference model for the preservation process, to manage workflows and responsibilities, and to define which authenticity evidence should be preserved and how it should be organized (PDI).
InterPARES as the conceptual framework for interrelating principles, policies and procedures, to compare and assess quality and consistency of digital practices with regard to authenticity.
General InterPARES assumption: LTDP objects can be preserved only as authentic copies; the authenticity evidence collected and preserved should be adequate to support the assessment.
Concepts on which CASPAR based its approach to authenticity.

12 The CASPAR contribution
Basic CASPAR assumption: authenticity assessment is based on the collection of documentation and information related to the workflows, actors and events in the transfer and preservation processes.
CASPAR devises a methodological approach based on a set of authenticity management tools (not implemented in the EU project) to integrate and document the main events during the preservation phase.
The definition of protocols and procedures for the preservation phase may have a considerable influence on the previous part of the custody chain, as it demands a more systematic collection of authenticity evidence.

13 CASPAR authenticity protocols
Authenticity protocol: formal procedure that defines controls and actions to be performed in connection with transformations of digital resources during their preservation in an OAIS.
An authenticity protocol gives an operational guideline to perform controls and to collect authenticity evidence, and is based on:
  a workflow which can be automatic or manual;
  a series of steps applied to a class of objects or to a class of events and related to one or more components of the PDI;
  the information related to the step execution (actor, information, time, place, context of execution).

14 THE OVERALL CASPAR MODEL

15 The APARSEN proposal
The state of the art testifies that significant scientific contributions have been made:
  A good level of theoretical and methodological formalization has been achieved;
  A large gap still divides the theoretical results from the actual practices carried out in most repositories.
CASPAR and InterPARES have provided a solid basic framework, but further contribution is still needed:
  Develop general detailed guidelines at concrete and operational level for the management of authenticity evidence;
  Extend the analysis to the whole lifecycle.

16 The Digital Resource lifecycle
[Diagram: the DR lifecycle in two phases. Pre-ingest phase: CREATION and keeping systems. LTDP phase: LTDP systems. Events shown: CAPTURE, AGGREGATE, MIGRATE, TRANSFER, SUBMIT, INGEST.]

17 Authenticity and the Digital Resource lifecycle
Transformations connected to lifecycle events may affect the authenticity of the Digital Resource (DR):
  To assess authenticity one must be able to trace back all transformations;
  Authenticity evidence must be collected in connection with each relevant event of the lifecycle.
The whole custody chain must be considered:
  Pre-ingest phase is crucial in some environments, e.g. e-gov;
  Precise guidelines must be given.
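The requirement to trace back all transformations can be illustrated with a minimal sketch. It assumes (this is not stated in the slides) that every event is logged as a dictionary with `inputs` and `outputs` lists of DR identifiers, that every transformation yields a new identifier, and that the log is in chronological order. The function name `trace_back` and the identifiers are hypothetical.

```python
def trace_back(dr_id, events):
    """Return, in log order, every event that contributed to dr_id."""
    # Assumption: each DR identifier is produced by at most one event.
    produced_by = {}
    for index, event in enumerate(events):
        for output in event["outputs"]:
            produced_by[output] = index

    contributing, frontier = set(), [dr_id]
    while frontier:
        index = produced_by.get(frontier.pop())
        if index is None or index in contributing:
            continue  # no recorded origin, or event already visited
        contributing.add(index)
        frontier.extend(events[index]["inputs"])
    return [events[i] for i in sorted(contributing)]


log = [
    {"event": "CAPTURE", "inputs": [], "outputs": ["dr-1"]},
    {"event": "CAPTURE", "inputs": [], "outputs": ["dr-2"]},
    {"event": "MIGRATE", "inputs": ["dr-1"], "outputs": ["dr-1b"]},
    {"event": "AGGREGATE", "inputs": ["dr-1b", "dr-2"], "outputs": ["dr-3"]},
    {"event": "CAPTURE", "inputs": [], "outputs": ["dr-9"]},  # unrelated DR
]
history = [e["event"] for e in trace_back("dr-3", log)]
```

A DR whose chain stops at an identifier with no producing event is exactly the case where custody evidence is missing.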

18 Achieving interoperability
Why is interoperability a crucial requirement?
  There may be several changes of custody along the lifecycle;
  Authenticity evidence must be interpreted and managed by systems different from the ones that have collected it.
Achieving interoperability:
  Formal standardization: long process, needs consensus;
  Guidelines (based on existing standards): concrete results in the short/medium term (introducing good practices).
Setting up a preliminary proposal:
  Define a core set of events: when the evidence should be collected;
  Specify which evidence should be collected and how to structure it;
  Select test environments among current practices in the NOE.

19 Events in the pre-ingest phase
The core set of events (includes the most important and the most likely to occur):
  CAPTURE: the DR is delivered by its author to a keeping system;
  INTEGRATE: new information is added or associated to a DR;
  AGGREGATE: several DRs are aggregated to form a new DR;
  DELETE: a DR is deleted according to a stated policy;
  MIGRATE: one or several components of the DR are converted to a new format;
  TRANSFER: a DR is transferred to another keeping system;
  SUBMIT: a DR stored in a keeping system is delivered to an LTDP system.
Specific environments may require the definition of additional events.

20 Events in the LTDP phase
LTDP-INGEST: a DR delivered in a SIP is ingested and stored as an AIP;
LTDP-AGGREGATE: several DRs, stored in different AIPs, are aggregated in a single AIC;
LTDP-EXTRACT: DRs are extracted from an AIC to form individual AIPs;
LTDP-MIGRATE: one or several components of an AIP are converted to a new format;
LTDP-DELETE: a DR stored in an AIP is deleted when its preservation time expires;
LTDP-TRANSFER: a DR stored in an AIP is transferred to another LTDP system.
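One possible way to make the core event set machine-readable is a plain enumeration. The sketch below uses the event names of the pre-ingest and LTDP phases listed above; the class name `LifecycleEvent` and the `phase` property are assumptions introduced for illustration, not part of the proposal.

```python
from enum import Enum


class LifecycleEvent(Enum):
    # Pre-ingest phase
    CAPTURE = "CAPTURE"
    INTEGRATE = "INTEGRATE"
    AGGREGATE = "AGGREGATE"
    DELETE = "DELETE"
    MIGRATE = "MIGRATE"
    TRANSFER = "TRANSFER"
    SUBMIT = "SUBMIT"
    # LTDP phase
    LTDP_INGEST = "LTDP-INGEST"
    LTDP_AGGREGATE = "LTDP-AGGREGATE"
    LTDP_EXTRACT = "LTDP-EXTRACT"
    LTDP_MIGRATE = "LTDP-MIGRATE"
    LTDP_DELETE = "LTDP-DELETE"
    LTDP_TRANSFER = "LTDP-TRANSFER"

    @property
    def phase(self) -> str:
        """Phase derived from the naming convention used in the slides."""
        return "LTDP" if self.value.startswith("LTDP-") else "pre-ingest"


# Looking an event up by the name used in the slides:
submit = LifecycleEvent("SUBMIT")
```

Environments that need additional events would extend this list; a closed enumeration is only a convenient starting point.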

21 Characterizing the events
Description: circumstances and actions
Agents: the person(s) who take the responsibility
Input: the DR(s) which are the object of the transformation
Output: the DR(s) which are the result of the transformation
Controls: which controls are performed and by whom
Authenticity Evidence Record:
  Identity and authentication data of agents and systems involved;
  Date and time;
  Specification of the actions performed;
  Results of controls performed;
  Other…
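The characterization above maps naturally onto a record type. The following is a minimal sketch only: the field names, the types, and the choice of a boolean per control are assumptions, since the slides do not prescribe a concrete structure.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List


@dataclass
class AuthenticityEvidenceRecord:
    event: str            # e.g. "CAPTURE" or "LTDP-MIGRATE"
    description: str      # circumstances and actions performed
    agents: List[str]     # who takes the responsibility
    inputs: List[str]     # identifiers of the DR(s) transformed
    outputs: List[str]    # identifiers of the resulting DR(s)
    # Hypothetical encoding: control name -> whether it passed.
    controls: Dict[str, bool] = field(default_factory=dict)
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    def all_controls_passed(self) -> bool:
        """True when every recorded control passed (also when none is recorded)."""
        return all(self.controls.values())


record = AuthenticityEvidenceRecord(
    event="MIGRATE",
    description="Conversion of a report to a new format",
    agents=["archivist-01"],
    inputs=["dr-1"],
    outputs=["dr-1b"],
    controls={"digest verified before conversion": True},
)
```

A real record would also carry authentication data for the agents and systems involved, which is left out here.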

22 Authenticity Evidence Records
Each AER contains the evidence for a specific event.
The sequence of AERs reproduces the sequence of events in the DR lifecycle.
[Diagram: an AEH made of the sequence AER1, AER2, AER3, …, AERn.]
Authenticity Evidence Record (AER): structure containing the evidence collected in connection with a specific event.
Authenticity Evidence History (AEH): incremental sequence of AERs.
When the DR enters the LTDP phase, the AEH collected during the pre-ingest phase provides the information to generate the PDI in the SIP.
During the LTDP phase the new AERs contribute to the PDI of the AIP.
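The relation between AERs, the AEH and the PDI can be sketched as an append-only list. This is an illustration under stated assumptions: the class and method names are hypothetical, the `LTDP-` prefix is used to tell the two phases apart, and the dictionary returned by `pdi_for_sip` is an invented layout, not the OAIS PDI schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AER:
    event: str                  # event name, e.g. "CAPTURE"
    evidence: Dict[str, str]    # evidence collected for that event


@dataclass
class AEH:
    records: List[AER] = field(default_factory=list)

    def append(self, aer: AER) -> None:
        """Records are only ever added, mirroring the lifecycle order."""
        self.records.append(aer)

    def events(self) -> List[str]:
        return [r.event for r in self.records]

    def pdi_for_sip(self) -> Dict[str, list]:
        """Collect pre-ingest evidence for the SIP (hypothetical layout)."""
        pre_ingest = [r for r in self.records if not r.event.startswith("LTDP-")]
        return {"provenance": [{"event": r.event, **r.evidence} for r in pre_ingest]}


aeh = AEH()
aeh.append(AER("CAPTURE", {"agent": "author-7"}))
aeh.append(AER("SUBMIT", {"agent": "keeper-2"}))
aeh.append(AER("LTDP-INGEST", {"agent": "repository"}))
pdi = aeh.pdi_for_sip()
```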

23 Case study analysis
Test environments selected among the NOE partners:
  Could the model reasonably fit into the real life scenario?
Phase 1: analyze current practices
  What does authenticity mean to the community of users?
  What is being done right now about authenticity?
  Which are the relevant events of the lifecycle?
  Which evidence is currently collected?
Phase 2: proposals to improve the current situation
  Identify the lifecycle events relevant to authenticity;
  Specify actions and controls to be performed;
  Define content and structure of the Authenticity Evidence Records.

24 Test environments from NOE partners
CINI
  USSL-Vicenza: Repository of the Public Health Care System
UK-DA
  Archive in Social Sciences & Humanities
CERN
  HEP (High Energy Physics) Repository of experimental data
IKIRAN
  Repository of Scientific heritage of the Russian Academy of Sciences
  Satellite Data Center of Russian Academy of Sciences

25 USSL-Vicenza: phase 1: current practices
Variety of digital resources:
  Several types of digital resources with separate workflows;
  Test results (files in DICOM format and more);
  Medical reports (digitally signed by physicians).
Long term preservation:
  Pre-ingest workflow may involve several systems under different responsibilities;
  All records are sent to the repository shortly after their creation;
  Italian rules on LTDP are very specific, mostly centered on digital signatures, certified timestamps and based on collecting many DRs in a single large batch (called Preservation Volume).

26 USSL-Vicenza: the DR lifecycle
[Diagram: the DR lifecycle at USSL-Vicenza, in a pre-ingest phase and an LTDP phase. Systems shown: MODALITY, LOCAL PACS, CENTRAL PACS, LTDP SYSTEM SCRYBA. Events shown: CAPTURE, TRANSFER, SUBMIT, INGEST, LTDP-AGGREGATE, LTDP-MIGRATE.]
Annotations in the diagram:
  Imaging device that generates DICOM files;
  PACS: Picture Archiving and Communication System;
  Test results are submitted as individual SIPs;
  AIPs are generated for the individual DRs;
  Multiple AIPs are aggregated in a Preservation Volume (AIC);
  Format migration, to be implemented.

27 USSL-Vicenza: phase 2: improvements
The model fits quite well: 6 events have been identified.
Current problems:
  Changes of custody and related responsibilities in the pre-ingest phase need better documentation;
  Part of the authenticity evidence is currently not handled at the metadata level;
  Additional controls and the gathering of additional evidence are needed.
More precise formalization of the events:
  Detailed specification of controls, actions and responsibilities;
  Detailed specification of Authenticity Evidence Records;
  Define a precise (XML) data structure for AER.
Authenticity protocols will be developed for sample events.
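The slides do not give the XML structure planned for the AER, so the following is only a hedged illustration of what such a serialization could look like. Every element and attribute name (`aer`, `agent`, `dateTime`, `actions`, `controls`, `control`) is an assumption made for this sketch, as are the sample values.

```python
import xml.etree.ElementTree as ET


def aer_to_xml(event, agent, timestamp, actions, controls):
    """Serialize one AER to an XML string (hypothetical element names)."""
    root = ET.Element("aer", {"event": event})
    ET.SubElement(root, "agent").text = agent
    ET.SubElement(root, "dateTime").text = timestamp
    ET.SubElement(root, "actions").text = actions
    controls_el = ET.SubElement(root, "controls")
    for name, result in controls.items():
        ET.SubElement(controls_el, "control", {"name": name, "result": result})
    return ET.tostring(root, encoding="unicode")


xml_text = aer_to_xml(
    event="SUBMIT",
    agent="central-pacs",
    timestamp="2012-02-02T10:00:00Z",
    actions="Test result delivered as an individual SIP",
    controls={"digital signature": "valid"},
)
parsed = ET.fromstring(xml_text)
```

A production format would normally be backed by a schema so that a receiving system can validate records it did not create.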

28 UK Data Archive: phase 1: current practices
Long standing Archive (1967) in Social Science & Humanities.
Close relationships with Producers, Funders, Consumers.
Current practices:
  Network of Confidence with stakeholders;
  Well devised and well documented procedures, compliance with international standards;
  Part of the authenticity evidence not included in the SIP, but derived from data deposit forms and agreements;
  Manual processing by expert teams: considerable effort devoted to collecting and organizing authenticity evidence in the ingest phase;
  Improvement of evidence pursued by the Archive for best practice reasons rather than to meet a perceived demand.

29 UK Data Archive: phase 2: improvements
Model fits quite well: 3 events identified in the DR lifecycle:
  Pre-Ingest Phase > SUBMIT
  LTDP Phase > INGEST
  LTDP Phase > MIGRATE
Suggested improvements (work in progress):
  Authenticity evidence could be collected by the Producers, structured according to detailed specifications and incorporated in the SIP:
    Would make the responsibilities of the producers more evident;
    Would improve the trustworthiness of the authenticity evidence;
    Would contribute to a more automated ingest processing.
  Application of checksums to SIPs (currently not mandated).
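The suggested application of checksums to SIPs could be sketched as follows. The assumptions are that a SIP is available as a directory of files and that SHA-256 is an acceptable digest; neither choice comes from the slides, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sip_manifest(sip_dir):
    """Map each file's path (relative to the SIP root) to its SHA-256 digest."""
    root = Path(sip_dir)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Whole-file read keeps the sketch short; large files would be
            # hashed in chunks instead.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def verify_sip(sip_dir, manifest):
    """True when the SIP contains exactly the files and digests in the manifest."""
    return sip_manifest(sip_dir) == manifest
```

The Producer would compute the manifest before submission and the Archive would call `verify_sip` at ingest; a mismatch reveals a changed, missing or added file, but says nothing about who changed it.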

30 CERN: HEP data repository (work in progress)
The HEP (High Energy Physics) community is considering the problem of preserving their experimental data.
The issue of authenticity is not always well understood:
  Data are often transferred via unreliable channels (even );
  No specific procedures for authentication and submission;
  Authenticity evidence is ‘retrieved’ from publications;
  In extreme cases: somebody tried to provide “abandoned data”.
Broad margins for improvement:
  Authenticity of data should be linked to the experimental context;
  Procedures for authentication and submission should be defined;
  Minimal authenticity evidence (and a predefined structure) to be provided in the PDI of the SIP should be agreed upon.

31 Log Files and Logging Systems
Log files capture relevant events and interactions with a system:
  Automatically;
  According to a logging policy;
  In a secure way.
Log files contain highly sensitive data:
  Need to be protected from unauthorized access and tampering.
Required by:
  ISO (ISO 17799) (Code of practice for information security management);
  ISO (Information Security Management Systems).

32 Secure logging protocols
Trustworthy logs are fundamental for monitoring and auditing:
  Serve as evidence for provenance and authenticity of data;
  Provide audit trails;
  Describe events that occurred during the life cycle of an object.
Secure logging protocols:
  Use cryptography to protect content;
  Prevent insertion or deletion of unauthorized events;
  Allow detection of fraud and tampering.
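The slides do not name a specific secure logging protocol. As a generic illustration of the properties listed above, the sketch below chains each entry to the previous one with an HMAC, so that modifying, inserting or deleting an entry inside the log is detected. The function names and the entry layout are assumptions, and the scheme is deliberately simplified: it does not detect truncation of the most recent entries and it assumes the key is kept away from anyone able to edit the log.

```python
import hashlib
import hmac
import json


def _mac(key, seq, event, previous_mac):
    # The MAC covers the position, the content and the previous entry's MAC.
    payload = json.dumps(
        {"seq": seq, "event": event, "prev": previous_mac}, sort_keys=True
    )
    return hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def append_entry(log, key, event):
    """Append an event, chaining it to the MAC of the previous entry."""
    previous_mac = log[-1]["mac"] if log else ""
    log.append({"event": event, "mac": _mac(key, len(log), event, previous_mac)})


def verify_log(log, key):
    """Recompute the chain; any altered, inserted or removed entry breaks it."""
    previous_mac = ""
    for seq, entry in enumerate(log):
        expected = _mac(key, seq, entry["event"], previous_mac)
        if not hmac.compare_digest(expected, entry["mac"]):
            return False
        previous_mac = entry["mac"]
    return True


key = b"demo-key-for-illustration-only"
log = []
for event in ["CAPTURE dr-1", "MIGRATE dr-1 -> dr-1b", "SUBMIT dr-1b"]:
    append_entry(log, key, event)
```

Published secure logging schemes add further protections, such as evolving keys and periodic external anchoring, which are omitted here.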

33 Log File Audits and Secure Storage
Log file audits have to be performed on a regular basis:
  Audits check logs for integrity and validity;
  Provide insights about data quality;
  ISO/DIS 16363:2011 (Requirements for Audit and Certification of Trustworthy Digital Repositories).
Secure storage for log files:
  Enhance security and safety requirements for sensitive log data;
  Needs to be suitable for long term preservation;
  Specialized provenance-aware storage;
  WORM (Write Once Read Many) systems.

34 Conclusions
The state of the art:
  Results from research projects (mainly CASPAR and InterPARES) provide a solid basic framework for authenticity;
  A large gap still divides theoretical results from actual practices;
  Interoperability is a crucial target.
The operational guidelines:
  Consider the whole lifecycle of the DR including pre-ingest workflow;
  Identify the set of events that may affect authenticity;
  Specify actions, controls and the Authenticity Evidence to be collected.
Testing the methodology:
  Check the model against test environments from the NOE’s partners;
  Verify how the guidelines may improve the current practices.

35 Network of Excellence

