Components for a Science Data Infrastructure – preservation and re-use of data David Giaretta.

Slides:



Advertisements
Similar presentations
Preserving Intellectual Property Rights in the long term CASPAR All Hands Meeting - Rome, September 2009 Demo
Advertisements

1 The CASPAR Project Key Components - Overview Luigi Briguglio Engineering R&D Laboratory – Roma CASPAR FINAL WORKSHOP IN ROME SCUOLA SUPERIORE PUBBLICA.
Curating Research: problems and policy Dale Peters Scientific Technical Manager DRIVER II.
CASPAR Preservable Infrastructure Addressing Preservation with an OAIS based Infrastructure Luigi Briguglio Engineering R&D Laboratory – Rome (Italy) 3rd.
Long-term Digital Metadata Curation Arif Shaon University of Reading 16 April 2014.
A centre of expertise in data curation and preservation CETIS MDR SIG::28 June 2006::University of Bath Funded by: This work is licensed under the Creative.
Long-Term Preservation. Technical Approaches to Long-Term Preservation the challenge is to interpret formats a similar development: sound carriers From.
DigCCurr 2007: What digital curators do and what they need to know The CASPAR view on: What digital curators do and what they need to know : Research Perspectives.
Project Overview APA Conference 2012 ESA/ESRIN (Frascati), 6-7 November 2012 D. Giaretta (APA)
CODATA 2006, Beijing, China Oct CASPAR: Early results and future goals David Giaretta.
SCIDIP-ES services and toolkits David Giaretta. Preserving digitally encoded information Ensure that digitally encoded information are understandable.
Digital Preservation - Its all about the metadata right? “Metadata and Digital Preservation: How Much Do We Really Need?” SAA 2014 Panel Saturday, August.
PARSE.Insight Framework and Lesson Learned David Giaretta (STFC)
SCIDIP-ES Components Oct ,Brussels. Basic Preservation Strategies Often stated as: “Emulate or Migrate” OAIS concepts change these to: Add Representation.
Future Access to the Scientific and Cultural Heritage – A shared Responsibility Birte Christensen-Dalsgaard State and University Library.
PAWN: A Novel Ingestion Workflow Technology for Digital Preservation
E-IRG Open Workshop on e-Infrastructures 4-5 Oct 2006 CASPAR Project Digital Preservation and Digital interoperability.
Co-funded by the European Union under FP7-ICT Alliance Permanent Access to the Records of Science in Europe Network Co-ordinated by aparsen.eu #APARSEN.
Who is doing a good job in digital preservation? Audit and Certification of Digital Repositories: ISO and the European Framework.
APARSEN Metadata for preservation, curation and interoperability Workshop on Research Metadata in Context 7-8 Sept 2010, Nijmegen David Giaretta APA and.
1 Authenticity Capture Prototype Matt Dunckley, STFC.
EU Project proposal. Andrei S. Lopatenko 1 EU Project Proposal CERIF-SW Andrei S. Lopatenko Vienna University of Technology
1 Schema Registries Steven Hughes, Lou Reich, Dan Crichton NASA 21 October 2015.
CASPAR Framework and Lessons Learned David Giaretta.
PREMIS Rathachai Chawuthai Information Management CSIM / AIT.
Digital Preservation: Current Thinking Anne Gilliland-Swetland Department of Information Studies.
Metadata for digital preservation: a review of recent developments Michael Day UKOLN, University of Bath ECDL2001, 5th European Conference.
How to Implement an Institutional Repository: Part II A NASIG 2006 Pre-Conference May 4, 2006 Technical Issues.
26/05/2005 Research Infrastructures - 'eInfrastructure: Grid initiatives‘ FP INFRASTRUCTURES-71 DIMMI Project a DI gital M ulti M edia I nfrastructure.
Metadata and Meta tag. What is metadata? What does metadata do? Metadata schemes What is meta tag? Meta tag example Table of Content.
Lecture 13.  Failure mode: when team understands requirements but is unable to meet them.  To ensure that you are building the right system Continually.
Preservation metadata and the Cedars project Michael Day UKOLN: UK Office for Library and Information Networking University of Bath
Lifecycle Metadata for Digital Objects November 15, 2004 Preservation Metadata.
Institutional Repositories July 2007 DIGITAL CURATION creating, managing and preserving digital objects Dr D Peters DISA Digital Innovation South.
Building Preservation Environments with Data Grid Technology Reagan W. Moore Presenter: Praveen Namburi.
BNSC Agency Report David Giaretta Colorado Springs 16 Jan 2007.
Co-funded by the European Union under FP7-ICT Alliance Permanent Access to the Records of Science in Europe Network Co-ordinated by aparsen.eu #APARSEN.
PV 2009, ESAC, Spain, 1-3 Dec Long term data and knowledge preservation for the Earth Sciences Archive S. ALBANI (ESA) D. Giaretta (STFC) PV 2009.
Co-funded by the European Union under FP7-ICT Alliance Permanent Access to the Records of Science in Europe Network aparsen.eu #APARSEN Options.
Data Grids, Digital Libraries and Persistent Archives: An Integrated Approach to Publishing, Sharing and Archiving Data. Written By: R. Moore, A. Rajasekar,
Co-ordinated by aparsen.eu #APARSEN Co-funded by the European Union under FP7-ICT Services and Sustainability David Giaretta,
1 The XMSF Profile Overlay to the FEDEP Dr. Katherine L. Morse, SAIC Mr. Robert Lutz, JHU APL
DP Knowhow: Open Archival Information Systems (OAIS) in ISO APA/C-DAC International Conference on Digital Preservation and the Development of Trusted.
Metadata Issues in Long-term Management of Data and Metadata
Project Management: Messages
WP14 Common Testing Environments
Ingest and Dissemination with DAITSS
An Approach to Software Preservation
OGF PGI – EDGI Security Use Case and Requirements
Conceptualizing the research world
Dependency Management
David Giaretta Colorado Springs 16 Jan 2007
Presentation on Software Requirements Submitted by
OAIS Producer (archive) Consumer Management
D33.1B PEER REVIEW OF DIGITAL REPOSITORIES
Computer Aided Software Engineering (CASE)
Configuration Management and Prince2
CASPAR Cultural, Artistic and Scientific knowledge for Preservation Access and Retrieval.
Distribution and components
Active Data Management in Space 20m DG
Data Management: Documentation & Metadata
Health Ingenuity Exchange - HingX
Analysis models and design models
An Introduction to Software Architecture
Metadata in Digital Preservation: Setting the Scene
An Open Archival Repository System for UT Austin
Open Archival Information System
Metadata The metadata contains
Subject Name: SOFTWARE ENGINEERING Subject Code:10IS51
How to Implement an Institutional Repository: Part II
Presentation transcript:

Components for a Science Data Infrastructure – preservation and re-use of data David Giaretta

CASPAR Project http://www.casparpreserves.eu EU FP6 Integrated Project Total spend approx. 16MEuro (8.8 MEuro from EU) Brief introduction to the CASPAR consortium – not all of whom are represented here. http://www.casparpreserves.eu

Key Components DRM

Modules and Dependencies: defining the Designated Community README.txt TEXT EDITOR ENGLISH LANGUAGE WINDOWS XP FITS FILE FITS STANDARD PDF JAVA s/w JAVA VM s/w DICTIONARY SPECIFICATION UNICODE XML MULTIMEDIA PERFORMANCE DATA C3D DirectX MAX/MSP 3D motion data files 3D scene motion to music mapping strategy We can formalize intelligibility based on this notion. For example, to understand a file README.txt I need a TEXT EDITOR but also knowledge of ENGLISH LANGUAGE. Alternatively, and I if I could not understand ENGLISH, I could understand this file if I had an ENGLISH2GREEK DICTIONARY. We have the same “pattern” in multimedia performance data, in scientific data, ... The same “pattern” occurs almost everywhere. For instance, astronomers use FITS files. To understand a FITS file one has to understand the FITS standards, and so on.

USE DATA Use application to find data in Repository Create DIP with enough RepInfo for the user (via DC profile) Obtain more RepInfo from Registry if necessary

Registry/Repository Scenario The Digital Object could have some RepInfo packed with it. User gets data from archive. User may be unfamiliar with the data so needs Representation Information. Some may be packaged with the data CPID CPID CPID CPID CPID CPID CPID CPID Digital Digital CPID CPID CPID Object Object CPID CPID CPID This gives an example of the way in which a Registry might be used in more detail. WE CAN DEMONSTRATE VARIOUS PIECES OF THIS Note that the RepInfo may kept WITH the data in the archive. However we should look on this as a form of “caching” of the RepInfo. Archive Archive User

Archive Archive May need MORE Representation Information. Registry/Repository of Representation Information May need MORE Representation Information. Obtain this from one or more Registry/Repository of Rep.Info. CPID Representation Information CPID CPID CPID CPID CPID CPID CPID Do not want to guess which RepInfo is needed – use an identifier (CPID) associated with the data CPID CPID CPID Digital Digital CPID CPID CPID Object Object CPID CPID CPID This gives an example of the way in which a Registry might be used. Note that the RepInfo may kept WITH the data in the archive. However we should look on this as a form of “caching” of the RepInfo. Archive Archive

Registry/Repository of Representation Information At some point in the future there is a need for further Representation Information. RepInfo Toolkit Someone creates the necessary RepInfo using whatever tools are appropriate. CPID Representation Information CPID CPID CPID CPID CPID CPID CPID CPID CPID CPID Digital Digital CPID CPID CPID Object Object CPID CPID CPID This gives an example of the way in which a Registry might be used. Note that the RepInfo may kept WITH the data in the archive. However we should look on this as a form of “caching” of the RepInfo. Archive Archive

Use of RepInfo CPID CPID CPID RepInfoLabel – points to other RepInfo CPID Structure = CPID Semantics = CPID Rendering s/w = CPID Each “bag of bits” has an associated pointer (CPID) to a Label CPID copy More detail about the link between the data and the registry CPID Structure = CPID Semantics = CPID Rendering s/w = CPID Registry External

http://www.youtube.com/watch?v=Yun9hkPPF9M

CREATE REPINFO Data holders etc inform Orchestration of dangers to preservation Orchestration uses Gap Manager to derive implications Orchestration asks experts for help e.g., Experts search for or create additional RepInfo needed to fill “gap” & store in RegRep

Key Components DRM

Authenticity Protocol Authenticity Step

Authenticity Protocol Execution Authenticity Protocol Execution Evaluation

Overall Authenticity Model Being implemented by IBM – built into Preservation Data Store

Transformation Protocol Transformation of received data files into IIWG format Recommendation (Policy) “For the received data file to be deemed as sufficient quality to support data analysis it must have been successfully transformed into standard IIWG data format, the use of processing software must be recorded” Steps will capture Details of transformation used Software details including name, version, source Time/date of process System details Details of person responsible Reason for believing this software is reliable Details of Transformational Information Properties checked Information Property descriptions Values checked Details of how checked Details of who was responsible for the check

User Experience / Flow diagram Slide shows the user experience of using the tool with screen shots

Key Components DRM

Digital Rights Ontology Core DRO entities A Domain ontology About Intellectual Property Rights To describe all aspects that are relevant to evaluate rights Attributes of rights Validity in time and space Laws and Agreements People Actions Licenses Constraints and Conditions … Harmonized with CIDOC CRM and FRBRoo

Interoperable Provenance Information (1) Intellectual Property Rights Life Cycle

Creation of “Spaces of Mind” Interoperable Provenance Information (2) DRM Creation History Intellectual Property Rights P1F_is_identified_by E39_Actor Daniel Teruggi E82_Actor_Appellation Daniel Teruggi P14B_performed E21_Person Daniel Teruggi F31_Expression_Creation Creation of “Spaces of Mind” R22F_created XXX P14B_performed F20_Self-Contained_Expression XXX A20F_isOn XXX C5_Copyright R1 - Distribution Right E65_Creation Creation of “Spaces of Mind” XXX C5_Copyright R2 - Reproduction Right XXX XXX C5_Copyright R3 - Attribution Right CIDOC-CRM core ontology Digital Rights Ontology

Insight: stakeholders Funding/policy National Funding organisations European funding Corporate funding Research Research institutes (non-profit) Universities Academic libraries Data management (preservation) Data centres (profit / non-profit) Libraries Archives Publishing General (cross-community) publishers Specific (community) publishers

Surveys to stakeholders Funding/policy < responses Research 1397 responses Data management (preservation) 273 responses Publishing 186 responses

About researchers Per category EU 44%, USA 33%, Other 23%

Data spectrum (R)

Sharing of data (R) How open is your data?

Sharing of data (R) Which constrains do you see in making data open?

Threats to preservation (R) The ones we trust to look after the digital holdings may let us down The current custodian of the data may cease to exist Loss of ability to identify the location of data Access and use restrictions may not be respected in the future Evidence may be lost Lack of sustainable hardware/software Users may be unable to understand or use the data

Threats to preservation (R) Users may be unable to understand or use the data e.g. the semantics, format or algorithms involved.

Requirement for solution Threat Requirement for solution Users may be unable to understand or use the data e.g. the semantics, format, processes or algorithms involved Ability to create and maintain adequate Representation Information Non-maintainability of essential hardware, software or support environment may make the information inaccessible Ability to share information about the availability of hardware and software and their replacements/substitutes The chain of evidence may be lost and there may be lack of certainty of provenance or authenticity Ability to bring together evidence from diverse sources about the Authenticity of a digital object Access and use restrictions may make it difficult to reuse data, or alternatively may not be respected in future Ability to deal with Digital Rights correctly in a changing and evolving environment Loss of ability to identify the location of data An ID resolver which is really persistent The current custodian of the data, whether an organisation or project, may cease to exist at some point in the future Brokering of organisations to hold data and the ability to package together the information needed to transfer information between organisations ready for long term preservation The ones we trust to look after the digital holdings may let us down Certification process so that one can have confidence about whom to trust to preserve data holdings over the long term RepInfo toolkit, Packager and Registry – to create and store Representation Information. In addition the Orchestration Manager and Knowledge Gap Manager help to ensure that the RepInfo is adequate. Registry and Orchestration Manager to exchange information about the obsolescence of hardware and software, amongst other changes. The Representation Information will include such things as software source code and emulators. Authenticity toolkit will allow one to capture evidence from many sources which may be used to judge Authenticity. Digital Rights and Access Rights tools allow one to virtualise and preserve the DRM and Access Rights information which exist at the time the Content Information is submitted for preservation. Persistent Identifier system: such a system will allow objects to be located over time. Orchestration Manager will, amongst other things, allow the exchange of information about datasets which need to be passed from one curator to another. The Audit and Certification standard to which CASPAR has contributed will allow a certification process to be set up.

Discipline repositories Cross-references Translators Thesauri Knowledge Gap Manager Orchestration/Brokering Processing Context Persistent ID resolver Authenticity tools RepInfo Registry Certification Knowledge Gap Manager Orchestration/Brokering Processing Context Persistent ID resolver Authenticity tools RepInfo Registry Certification Cross-references Translators Thesauri Automated systems Repositories Users Automated systems Repositories Users Compute Resource Local Authentication Local Authorisation Storage Resource Registries Process ID Scheduler Shibboleth FUTURE Compute Resource Local Authentication Local Authorisation Storage Users may be unable to understand or use the data e.g. the semantics, format, processes or algorithms involved Non-maintainability of essential hardware, software or support environment may make the information inaccessible The chain of evidence may be lost and there may be lack of certainty of provenance or authenticity Access and use restrictions may not be respected in the future Loss of ability to identify the location of data The current custodian of the data, whether an organisation or project, may cease to exist at some point in the future The ones we trust to look after the digital holdings may let us down WAN LAN Router Switch Cable Interconnects Gateways Management Network infrastructure: Individual islands of connectivity e.g. individual countrie EU puts in infrastructure GEANT (JANET in UK etc) to connect these (NOTE THAT THE INFRASTRUCTURE COMPONENTS NAMED HERE ARE ONLY INDICATIVE) The network supplies services to isolated organisations, each with its own resources such as CPU and storage EU funds EGEE, UK funds various GRID project to connect these 3) On these resources are used by repositories 1) the individual repositories need to be used in an integrated way 2) EU and JISC fund broad e-Research infrastructure In the future there will be some similar interconnected repositories and we need to transfer information from now to the future However there are many difficulties/threats to this transfer To counter these threats we need to put in place components to support preservation Many, if not all, of this preservation infrastructure is also useful to support the current e-Research infrastructure, particularly where in multidisciplinary research one needs to be able to use unfamiliar information resources. WAN Router LAN Switch Cable

Links CASPAR: PARSE.Insight: Alliance for Permanent Access: www.casparpreserves.eu http://www.casparpreserves.eu/training/advanced-digital-preservation-training-lectures/ PARSE.Insight: www.parse-insight.eu Alliance for Permanent Access: www.alliancepermanentaccess.eu Digital Curation Centre: www.dcc.ac.uk YouTube cartoon: www.youtube.com/watch?v=Yun9hkPPF9M OAIS: http://public.ccsds.org/publications/archive/650x0b1.pdf http://public.ccsds.org/sites/cwe/rids/Lists/CCSDS%206500P11/Overview.aspx http://developers.casparpreserves.eu:8080 http://sourceforge.net/projects/digitalpreserve

END

About researchers Communities aggregated to: Agriculture & Nutrition Behavioural Sciences Humanities Life Sciences Medicine Social Sciences Physical Sciences Socio-Cultural Sciences Technology Based on KNAW classification (Royal Netherlands Academy of Arts and Sciences)

Reasons for preservation It is unique. It potentially has economic value. It may stimulate inter-disciplinary collaborations. It allows for re-analysis of existing data. It may serve validation purposes in the future. It will stimulate the advancement of science (new research can build on existing knowledge). If research is publicly funded, the results should become public property and therefore properly preserved.

Reasons for preservation (R) It is unique It potentially has economic value It may stimulate inter-disciplinary collaborations. It allows for re-analysis of existing data. It may serve validation purposes in the future. It will stimulate the advancement of science. If research is publicly funded, the results should become public property and therefore properly preserved.

Reasons for preservation (DM) It is unique It potentially has economic value It may stimulate inter-disciplinary collaborations. It allows for re-analysis of existing data. It may serve validation purposes in the future. It will stimulate the advancement of science. If research is publicly funded, the results should become public property and therefore properly preserved.

Reasons for preservation (P) It is unique It potentially has economic value It may stimulate inter-disciplinary collaborations. It allows for re-analysis of existing data. It may serve validation purposes in the future. It will stimulate the advancement of science. If research is publicly funded, the results should become public property and therefore properly preserved.

Threats to preservation The ones we trust to look after the digital holdings may let us down. The current custodian of the data, whether an organisation or project, may cease to exist at some point in the future. Loss of ability to identify the location of data. Access and use restrictions (e.g. Digital Rights Management) may not be respected in the future. Evidence may be lost because the origin and authenticity of the data may be uncertain. Lack of sustainable hardware, software or support of computer environment may make the information inaccessible. Users may be unable to understand or use the data e.g. the semantics, format or algorithms involved.

Threats to preservation (DM) The ones we trust to look after the digital holdings may let us down The current custodian of the data may cease to exist Loss of ability to identify the location of data Access and use restrictions may not be respected in the future Evidence may be lost Lack of sustainable hardware/software Users may be unable to understand or use the data

Threats to preservation (P) The ones we trust to look after the digital holdings may let us down The current custodian of the data may cease to exist Loss of ability to identify the location of data Access and use restrictions may not be respected in the future Evidence may be lost Lack of sustainable hardware/software Users may be unable to understand or use the data

OAIS Archival Information Package (AIP) Neeri 2009 1-2 Oct 2009, Helsinki

Rep Info Virtualisation /DISCIPLINE Introducing the layered view of CASPAR which points out that we need to deal with more than RepInfo e.g. Digital Rights etc. Follow the “life-cycle” description in the CM. The items in the red ellipse are the RepInfo we have been talking about previously. The virtualisation is introduced in order to help with automation i.e. we need programmes to process the bytes – how can we make this easier? Virtualisation /DISCIPLINE

CASPAR Consortium http://www.casparpreserves.eu EU FP6 Integrated Project Total spend approx. 16MEuro (8.8 MEuro from EU) Brief introduction to the CASPAR consortium – not all of whom are represented here. http://www.casparpreserves.eu

CASPAR Solution The CASPAR Foundation Facade Layer KeyComponents Information Package Mngt Communication Mngt Information Access Designated Community & Knowledge Mngt Security Mngt The CASPAR Foundation KeyComponents Framework Platform

CASPAR Foundation The CASPAR Foundation KeyComponents GapManager DataAccess&Security RepInfoToolbox SemanticWeb Orchestration Registry Packaging DigitalRights FindingAids DataStores Authenticity Virtualisation Framework CASPAR Service Factory The CASPAR Foundation Application Server: Tomcat, Glassfish, WASCE Development Framework: JAX-WS, GWT, Ant Development Management: Hudson and JTrac Platform DBMS: H2, Postgres Java Platform Operating System: Linux, Unix, Windows, Mac

Preservation Issues 1… How To guarantee a digital information may be accessed and understood in the future How To guarantee a proper information package management within and OAIS Archive How To guarantee long-time preservation maintenance of any information package How To guarantee retrieval of Archival Information

Preservation Issues 2… How To guarantee intellegibility within heterogeneous Designated Communities and their digital information How To guarantee preservation actors are informed about change events How To guarantee an adequate security access with the proper rights to any resource and functionality within an OAIS Archive How To guarantee an adequate integrity and identity for any Archival Information

Answer - 1 To guarantee a digital information may be accessed and understood in the future, you need an adequate OAIS Representation Information REPINFO RepInfo ToolBox REG Registry VIRT Virtualisation

Answer - 2 To guarantee a proper information package management within and OAIS Archive, you need to create an adequate OAIS Information Package PACK Packaging

Preservation DataStores Answer - 3 To guarantee long-time preservation maintenance of any information package, you need an implementation of OAIS Archival Storage PDS Preservation DataStores

Answer - 4 To guarantee retrieval of Archival Information, you need an OAIS Finding Aids FIND Finding

Answer - 5 To guarantee intellegibility within heterogeneous Designated Communities and their digital information, you need to manage Designated Community Profiles and their Knowledge Base KM Knowledge

Answer - 6 To guarantee preservation actors are informed about change events, you need an adequate management of message exchange POM Orchestration

Digital Rights Manager Answer - 7 To guarantee an adequate security access with the proper rights to any resource and functionality within an OAIS Archive, you need a Security and DRM Management DRM Digital Rights Manager DAMS Data Access Manager & Security

Answer - 8 To guarantee an adequate integrity and identity for any Archival Information, you need an Authenticity Tool AUTH Authenticity

USE DATA Use application to find data in Repository Create DIP with enough RepInfo for the user (via DC profile) Obtain more RepInfo from Registry if necessary

Data… Level 2 GOME Satellite instrument data This is a mosaic composed by the data retrieved over 14 orbits per day with interpolation of data values to fill the gap between two quasi-adjacent geographical areas covered by the satellite ground track. On that date the white zone over Asia indicates absence of useful data. Format of file can be identified through a combination of file extension, physical location of file (as Indicated by Postgres database) and file name which indicates format. Textual description of the formats Is available on MST support pages Raw data IQ (In phase and quadrature) data - non standard binary format files format (textual description of format available) Spectral - Raw data which has undergone first stage processing to spectral non standard binary file format. (textual description of format available) Processed product Version 2 processed data product Radial data: Nasa Ames format (product specific textual description available) Cartesian data: Nasa Ames format (product specific textual description available) Version 1: processed data product Version 0: processed data product Time averaged radial data: non standard ASCII format (textual description available) Unaveraged radial data: non standard ASCII format (textual description available) Time averaged wind data: non standard ASCII format (textual description available) Unaveraged wind data: non standard ASCII format (textual description available) Time averaged power data: non standard ASCII format (textual description available) Quick Look plots: graphic file png format generated from cartesian products using GNU plot Under development With version 3 binary file of cartesian and radial product will be produced in the NetCDF format in tandem with a Nasa Ames ASCII versions. Once Version 3 processing (Python producing NetCDF radial and Cartesian Product) is released a reprocessing of the archived Spectral file will be undertaken. In order to produce a consistent series of Cartesian and radial product. As they will be of higher quality and more easily supported. After a period of time the BADC would wish to dispose of the old higher level product but would like to maintain the ability to recreate it.

Representation Information Network Neeri 2009 1-2 Oct 2009, Helsinki

Modules and Dependencies: defining the Designated Community README.txt TEXT EDITOR ENGLISH LANGUAGE WINDOWS XP FITS FILE FITS STANDARD PDF JAVA s/w JAVA VM s/w DICTIONARY SPECIFICATION UNICODE XML MULTIMEDIA PERFORMANCE DATA C3D DirectX MAX/MSP 3D motion data files 3D scene motion to music mapping strategy We can formalize intelligibility based on this notion. For example, to understand a file README.txt I need a TEXT EDITOR but also knowledge of ENGLISH LANGUAGE. Alternatively, and I if I could not understand ENGLISH, I could understand this file if I had an ENGLISH2GREEK DICTIONARY. We have the same “pattern” in multimedia performance data, in scientific data, ... The same “pattern” occurs almost everywhere. For instance, astronomers use FITS files. To understand a FITS file one has to understand the FITS standards, and so on.

Provenance: Performing Arts Here is shown the Representation Information model of this performance constructed with the use of the CNRS UI Prototype. Thanks to ULeeds and CNRS

Authenticity Model Neeri 2009 1-2 Oct 2009, Helsinki

Digital Rights Ontology Core DRO entities A Domain ontology About Intellectual Property Rights To describe all aspects that are relevant to evaluate rights Attributes of rights Validity in time and space Laws and Agreements People Actions Licenses Constraints and Conditions … Harmonized with CIDOC CRM and FRBRoo

Services of the DRM component A user or an application registerers the creation history of a creative work Title of the work, when and where it was made public for the first time Who contributed in some part of the work What is the precise contribution of each involved person … The DRM derives the existing Copyrights Who owns the rights Details of the rights Author Rights vs Related Rights Economic Rights vs Moral Rights Country of origin of the rights When do the rights expire DRM

Interoperable Provenance Information (1) Intellectual Property Rights Life Cycle

Creation of “Spaces of Mind” Interoperable Provenance Information (2) DRM Creation History Intellectual Property Rights P1F_is_identified_by E39_Actor Daniel Teruggi E82_Actor_Appellation Daniel Teruggi P14B_performed E21_Person Daniel Teruggi F31_Expression_Creation Creation of “Spaces of Mind” R22F_created XXX P14B_performed F20_Self-Contained_Expression XXX A20F_isOn XXX C5_Copyright R1 - Distribution Right E65_Creation Creation of “Spaces of Mind” XXX C5_Copyright R2 - Reproduction Right XXX XXX C5_Copyright R3 - Attribution Right CIDOC-CRM core ontology Digital Rights Ontology

Scenario: Change in Copyright Law Goal Demonstrate that the system withstands changes in Law Scenario: An amendment to a Copyright Law clarifies who can be considered a performer, hence eligible to related rights Current IPR Law No. 92–597 of July 1, 1992, Art. 212-1 says: “… persons who act, sing, deliver, declaim, play in or otherwise perform literary or artistic works, variety, circus or puppet acts.” A hypothetical amendment could extend the definition of performers including also people who project and interpret acousmatic sound files  We have more rightholders

Scenario 2 Update of PDI (Provenance) Update Intellectual Property Rights after a Change in Law POM FIND DRM preservation expert <use> DRM CASPAR Web Desktop PACK PDS

Scenario: Change in Copyright Law (2/2) 0. A notification is sent to the DRM Preservation Expert Through the POM Actions to do: The rules for correctly deriving existing rights must be updated Through the DRM Administration Interface Rights information (PDI-Provenance) must be updated for all affected archive holdings The affected AIPs are retrieved ( FIND Key Component ) The rights are re-calculated ( DRM Key Component ) The new Provenance is packaged in the AIPs and stored Through the PACK and PDS Key Components 76

Repository Audit and Certification Standard for certification in OAIS Roadmap Initial work produced TRAC Now an official CCSDS Working Group Open virtual meetings, notes and documents: http://www.digitalrepositoryauditandcertification.org Draft standard submitted to CCSDS/ISO to form the basis of an international audit and certification process

Conclusions Preservation Many layers of information needs more than format information needs more than Significant Properties Many layers of information This extra information must be created – tools are needed must itself be preserved Made easier by being integrated with (e-)infrastructure addressing users’ concerns Must deal with all types of digital objects (metadata)

OAIS conformance From OAIS: A conforming OAIS archive implementation shall support the model of information described in 2.2. The OAIS Reference Model does not define or require any particular method of implementation of these concepts. A conforming OAIS archive shall fulfill the responsibilities listed in 3.1. Subsection 4 provides examples of the mechanisms that may be used to discharge the responsibilities identified in 3.1. These mechanisms are not required for conformance. It is expected that a separate standard, as noted in section 1.5, will be produced on which accreditation and certification processes can be built.

Modules and Dependencies: Examples (Semantic Web data) ns4 ns2 ns3 ns1 The same is true for formally expressed knowledge. For instance, in the SWeb one schema may reuse or specialize elements coming from other schemas. Also in this case we have a dependency graph (over namespaces this time). RDF/S

Scenario: Intelligibility-aware Packaging FITS ZIP C3D DirectX MAX/MSP FITS DICTIONARY FITS STANDARD Gap(o2,P1) =  Gap(o2,P2) = {FITS, FITS_STANDARD, FITS_DICTIONARY, DICTIONARY_SPECIFICATION} Gap(o2,P3) = {FITS, FITS_STANDARD, FITS_DICTIONARY, DICTIONARY_SPECIFICATION, PDF_STANDARD, XML_SPECIFICATION, UNICODE_SPECIFICATION} Gap(o3,P3) = {ZIP} Gap(o3, ) = {ZIP, C3D, DirectX, MAX/MSP} DICTIONARY SPECIFICATION P2 Here you see the gaps between objects and various profiles (assumption the dependencies of objects have been recorded in a complete manner) PDF STANDARD XML SPECIFICATION UNICODE SPECIFICATION

Requirement for solution Threat Requirement for solution Users may be unable to understand or use the data e.g. the semantics, format, processes or algorithms involved Ability to create and maintain adequate Representation Information Non-maintainability of essential hardware, software or support environment may make the information inaccessible Ability to share information about the availability of hardware and software and their replacements/substitutes The chain of evidence may be lost and there may be lack of certainty of provenance or authenticity Ability to bring together evidence from diverse sources about the Authenticity of a digital object Access and use restrictions may make it difficult to reuse data, or alternatively may not be respected in future Ability to deal with Digital Rights correctly in a changing and evolving environment Loss of ability to identify the location of data An ID resolver which is really persistent The current custodian of the data, whether an organisation or project, may cease to exist at some point in the future Brokering of organisations to hold data and the ability to package together the information needed to transfer information between organisations ready for long term preservation The ones we trust to look after the digital holdings may let us down Certification process so that one can have confidence about whom to trust to preserve data holdings over the long term

Creating an OAIS Archival Information Package Need to decide and form strategies based upon Preservation objective Designated community knowledge base Information required based on this Sources and supply of information (is this achievable given realistic restrictions on resources) Identify Risks dependacies and what and things needs to be monitored