SCIDIP-ES services and toolkits
David Giaretta

Presentation transcript:

Preserving digitally encoded information
Ensure that digitally encoded information is understandable and usable over the long term.
- “Long term” could start at just a few years.
- We need to do something because things become “unfamiliar” over time.
- The same techniques also enable use of data which is “unfamiliar” right now.

The OAIS Reference Model
- is concerned with the Long Term preservation of information
- provides vital concepts that are necessary to preserve digitally encoded information
- provides testable mandatory responsibilities
- provides useful vocabulary and check-lists
- is widely used in the design and description of archives and libraries
- forms the basis of a number of follow-on standards which are being developed

OAIS CONFORMANCE: Mandatory responsibilities
1. Negotiate for and accept appropriate information from information Producers.
2. Obtain sufficient control of the information provided to the level needed to ensure Long Term Preservation.
3. Determine, either by itself or in conjunction with other parties, which communities should become the Designated Community and, therefore, should be able to understand the information provided, thereby defining its Knowledge Base.
4. Ensure that the information to be preserved is Independently Understandable to the Designated Community. In particular, the Designated Community should be able to understand the information without needing special resources such as the assistance of the experts who produced the information.
5. Follow documented policies and procedures which ensure that the information is preserved against all reasonable contingencies, including the demise of the archive, ensuring that it is never deleted unless allowed as part of an approved strategy. There should be no ad-hoc deletions.
6. Make the preserved information available to the Designated Community and enable the information to be disseminated as copies of, or as traceable to, the original submitted Data Objects with evidence supporting its Authenticity.

Long Term Preservation: The act of maintaining information, Independently Understandable by a Designated Community, and with evidence supporting its Authenticity, over the Long Term.
OAIS Functional Model: useful terminology
“Open Archival Information System (OAIS), now adopted as the ‘de facto’ standard for building digital archives” (NSF: Cyberinfrastructure Vision for 21st Century Discovery)
Available free from …; for more information see …

OAIS Information Model: key concepts needed for conformance
Representation Information: the information that maps a Data Object into more meaningful concepts. Examples include software, ontologies, formal data descriptions, human-readable documentation and web pages.
Representation Information is itself Information, and hence there is a network: a kind of recursion. This recursion stops when it matches the Designated Community’s Knowledge Base.
AIP (Archival Information Package): a set of information that has, in principle, all the qualities needed for permanent, or indefinite, Long Term Preservation of a designated Information Object.
(OAIS: 2002, updated 2011)

Information Model: Representation Information
The Information Model is key. The recursion ends at the KNOWLEDGE BASE of the DESIGNATED COMMUNITY (this knowledge will change over time and region).
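The recursion on this slide can be made concrete as a small graph walk. This is a minimal sketch, assuming the RepInfo network is held as a plain mapping from each item to the further RepInfo needed to interpret it; the function name `resolve_repinfo`, the example items and the Knowledge Base contents are hypothetical and are not taken from CASPAR or SCIDIP-ES. Links are followed until an item is already in the Designated Community's Knowledge Base; the items passed on the way are the RepInfo to preserve, and an item that is neither known nor further described is a gap.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical RepInfo network: each item maps to the further RepInfo needed
# to interpret it. Items absent from the dict have no RepInfo recorded.
RepInfoNetwork = Dict[str, List[str]]


def resolve_repinfo(
    top_level: Iterable[str],
    network: RepInfoNetwork,
    knowledge_base: Set[str],
) -> Tuple[List[str], Set[str]]:
    """Follow the RepInfo network from a Data Object's top-level RepInfo.

    Returns (needed, gaps): `needed` lists RepInfo that must be preserved
    because the Designated Community does not already understand it;
    `gaps` holds items outside the Knowledge Base with no further RepInfo.
    """
    needed: List[str] = []
    gaps: Set[str] = set()
    seen: Set[str] = set()

    def visit(item: str) -> None:
        if item in seen:  # shared nodes (and any cycles) are visited once
            return
        seen.add(item)
        if item in knowledge_base:  # recursion ends at the Knowledge Base
            return
        children = network.get(item, [])
        if not children:  # not understood and nothing to explain it
            gaps.add(item)
            return
        needed.append(item)
        for child in children:
            visit(child)

    for item in top_level:
        visit(item)
    return needed, gaps


# Illustrative data only: an HDF5 file whose variables are described
# with a (hypothetical) domain ontology.
network: RepInfoNetwork = {
    "HDF5 format specification": ["English", "IEEE 754 floating point"],
    "variable definitions": ["English", "climate ontology"],
    "climate ontology": ["OWL specification"],
}
knowledge_base = {"English", "IEEE 754 floating point"}
needed, gaps = resolve_repinfo(
    ["HDF5 format specification", "variable definitions"], network, knowledge_base
)
```

Because the Knowledge Base changes over time, re-running the walk with a smaller Knowledge Base (for example, one without `IEEE 754 floating point`) exposes new gaps; that re-evaluation is the kind of check a gap identification service would need to automate.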

[Figure: Archival Information Package. The AIP contains Content Information (a Data Object, either a Physical Object or a Digital Object made of bits, interpreted using Representation Information: Structure, Semantic and Other Representation Information) and Preservation Description Information (Reference, Provenance, Context, Fixity and Access Rights Information). It is delimited by Packaging Information and described by a Package Description.]
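One way to read the AIP diagram is as nested records. The Python dataclasses below are a hypothetical mapping of the OAIS terms onto types, not a standard serialisation, and SHA-256 is an assumed choice of fixity algorithm. The `fixity_ok` check shows how stored Fixity Information lets anyone confirm that the Data Object's bits have not changed.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RepresentationInformation:
    structure: List[str] = field(default_factory=list)  # e.g. format specifications
    semantic: List[str] = field(default_factory=list)   # e.g. data dictionaries
    other: List[str] = field(default_factory=list)      # e.g. software, emulators


@dataclass
class ContentInformation:
    data_object: bytes  # the preserved bits
    representation_info: RepresentationInformation


@dataclass
class PreservationDescriptionInformation:
    reference: str          # identifier for the Content Information
    provenance: List[str]   # history of the content
    context: List[str]      # relationships to other information
    fixity: Dict[str, str]  # algorithm name -> hex digest
    access_rights: str      # terms of access and use


@dataclass
class ArchivalInformationPackage:
    content: ContentInformation
    pdi: PreservationDescriptionInformation
    package_description: str  # e.g. text used by finding aids
    packaging: str            # how the parts are bound together, e.g. "zip"


def make_fixity(data: bytes) -> Dict[str, str]:
    """Record a SHA-256 digest as Fixity Information (assumed algorithm)."""
    return {"sha256": hashlib.sha256(data).hexdigest()}


def fixity_ok(aip: ArchivalInformationPackage) -> bool:
    """Recompute the digest and compare it with the stored Fixity Information."""
    expected = aip.pdi.fixity["sha256"]
    return hashlib.sha256(aip.content.data_object).hexdigest() == expected
```

Fixity only shows that the bits are unchanged; judging Authenticity more broadly still relies on the Provenance and Context Information carried alongside it.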

PARSE.Insight: indication of the distribution of researchers’ responses
Researchers: 1/3 Europe, 1/3 USA, 1/3 rest of world (incomplete sample of respondents)
Overall: 44% Europe, 33% USA, 23% rest of world

What? Data spectrum (R)

Sharing of data (R) How open is your data?

Sharing of data (R) Which constraints do you see in making data open?

Threats to preservation
1. The ones we trust to look after the digital holdings may let us down.
2. The current custodian of the data, whether an organisation or project, may cease to exist at some point in the future.
3. Loss of ability to identify the location of data.
4. Access and use restrictions (e.g. Digital Rights Management) may not be respected in the future.
5. Evidence may be lost because the origin and authenticity of the data may be uncertain.
6. Lack of sustainable hardware, software or support of computer environment may make the information inaccessible.
7. Users may be unable to understand or use the data, e.g. the semantics, format or algorithms involved.

Threats to preservation (R)
[Chart covering the seven threats: the ones we trust to look after the digital holdings may let us down; the current custodian of the data may cease to exist; loss of ability to identify the location of data; access and use restrictions may not be respected in the future; evidence may be lost; lack of sustainable hardware/software; users may be unable to understand or use the data.]

Threats to preservation (R) Users may be unable to understand or use the data e.g. the semantics, format or algorithms involved.

What works - evidence

CASPAR in brief
- Prototyped discipline-independent infrastructure components
- Carried out fundamental research based on, and contributing to, OAIS
- Developed toolkits for Representation Information, Authenticity, Digital Rights etc.
- Provided a substantial collection of evidence, validated by the designated communities, supporting their effectiveness for digital preservation by:
  - accelerated lifetime tests using changes in hardware, software, environment and the knowledge base of designated communities
  - using many types of digitally encoded information: data and documents from science (STFC, ESA), cultural heritage (UNESCO) and contemporary performing arts (CIANT, INA, IRCAM, Univ Leeds)
Infrastructure to support preservation of all types of digitally encoded information:
- supports maintenance of Representation Information Networks
- simple, re-implementable interfaces
- no single point of failure
- decentralised, heterogeneous, asynchronous
Toolkits to create all components of AIPs
Test scenarios vs Threats to digital preservation
For more information see … and …

Infrastructures

FUTURE
- Users may be unable to understand or use the data, e.g. the semantics, format, processes or algorithms involved.
- Non-maintainability of essential hardware, software or support environment may make the information inaccessible.
- The chain of evidence may be lost and there may be lack of certainty of provenance or authenticity.
- Access and use restrictions may not be respected in the future.
- Loss of ability to identify the location of data.
- The current custodian of the data, whether an organisation or project, may cease to exist at some point in the future.
- The ones we trust to look after the digital holdings may let us down.

Preservation Infrastructure
- Services which are not centralised, with no single point of failure
- Supplements for existing archives to improve their ability to preserve their holdings: they do not replace everything, they are small additions, and they lead to a better certification result
- Simple services which can be maintained into the future
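“No single point of failure” can be illustrated with identifier resolution: if several independent services can each answer “where is this object now?”, losing one of them does not lose the location of the data. This is a minimal sketch under stated assumptions: resolvers are modelled as plain functions, an unreachable service raises `OSError`, and the identifier, URL and function names are made up for illustration rather than taken from any SCIDIP-ES interface.

```python
from typing import Callable, Iterable, Optional

# A resolver maps a persistent identifier to a current location, or returns
# None if it does not know the identifier. Network-style failures are
# modelled as OSError (ConnectionError and TimeoutError are subclasses).
Resolver = Callable[[str], Optional[str]]


def resolve(pid: str, resolvers: Iterable[Resolver]) -> Optional[str]:
    """Ask independent resolver services in turn, so that no single one
    is a point of failure. Returns the first location found, else None."""
    for resolver in resolvers:
        try:
            location = resolver(pid)
        except OSError:
            continue  # this service is unavailable; try the next one
        if location is not None:
            return location
    return None


# Illustrative services: one is down, one is a mirror that knows the PID.
def unavailable_service(pid: str) -> Optional[str]:
    raise ConnectionError("resolver unreachable")


mirror_table = {"pid:example/123": "https://archive-b.example.org/objects/123"}
location = resolve("pid:example/123", [unavailable_service, mirror_table.get])
```

Here the first service fails and the mirror answers, so `location` holds the mirror's URL. Keeping each service this simple is what makes it plausible to re-implement or replicate it later.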

SCIDIP-ES in brief
SCIence Data Infrastructure for Preservation, with a focus on Earth Science. Led by ESA. Currently in negotiation with the EU. For more information see …
- Upgrade CASPAR prototype components into scalable, robust e-infrastructure components to support digital preservation of all types of digital objects:
  - decentralised, heterogeneous, asynchronous, no single point of failure
  - persistent, simple, re-implementable interfaces
- Critical mass of users: Earth science as the initial focus; other disciplines via APA
- DIGITAL PRESERVATION RESEARCH is needed to create the tools that produce the “metadata” used by the e-infrastructure and user applications. Tools may be domain dependent. They must include the Representation Information Network of the metadata.
[Diagram labels: E-INFRASTRUCTURE (domain independent); TOOLKITS; Archives; User applications. Components: Storage Service, Gap Identification Service, Orchestration Service, RepInfo Registry Service, Preservation Strategy Toolkit, Process Virtualisation Toolkit, Finding Aid Toolkit, Cloud Storage, Persistent ID interface Service, External PI services, ISO Certification Organisation, Certification Toolkit, External Access/Use Services.]
- The infrastructure counters the threats identified by PARSE.Insight, based on the CASPAR prototypes
- APARSEN will produce a common vision to allow a coherent approach
- It will help archives with certification

Threats, requirements for solution, and solutions
1. Threat: Users may be unable to understand or use the data, e.g. the semantics, format, processes or algorithms involved.
   Requirement for solution: Ability to create and maintain adequate Representation Information.
   Solution: RepInfo toolkit, Packager and Registry, to create and store Representation Information. In addition, the Orchestration Manager and Knowledge Gap Manager help to ensure that the RepInfo is adequate.
2. Threat: Non-maintainability of essential hardware, software or support environment may make the information inaccessible.
   Requirement for solution: Ability to share information about the availability of hardware and software and their replacements/substitutes.
   Solution: Registry and Orchestration Manager to exchange information about the obsolescence of hardware and software, amongst other changes. The Representation Information will include such things as software source code and emulators.
3. Threat: The chain of evidence may be lost and there may be lack of certainty of provenance or authenticity.
   Requirement for solution: Ability to bring together evidence from diverse sources about the Authenticity of a digital object.
   Solution: The Authenticity toolkit will allow one to capture evidence from many sources which may be used to judge Authenticity.
4. Threat: Access and use restrictions may make it difficult to reuse data, or alternatively may not be respected in future.
   Requirement for solution: Ability to deal with Digital Rights correctly in a changing and evolving environment.
   Solution: Digital Rights and Access Rights tools allow one to virtualise and preserve the DRM and Access Rights information which exist at the time the Content Information is submitted for preservation.
5. Threat: Loss of ability to identify the location of data.
   Requirement for solution: An ID resolver which is really persistent.
   Solution: A Persistent Identifier system, which will allow objects to be located over time.
6. Threat: The current custodian of the data, whether an organisation or project, may cease to exist at some point in the future.
   Requirement for solution: Brokering of organisations to hold data, and the ability to package together the information needed to transfer information between organisations ready for long term preservation.
   Solution: The Orchestration Manager will, amongst other things, allow the exchange of information about datasets which need to be passed from one curator to another.
7. Threat: The ones we trust to look after the digital holdings may let us down.
   Requirement for solution: A certification process, so that one can have confidence about whom to trust to preserve data holdings over the long term.
   Solution: The Audit and Certification standard to which CASPAR has contributed will allow a certification process to be set up.
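The solution for the hardware and software threat relies on the Registry and Orchestration Manager exchanging information about changes such as obsolescence. A publish/subscribe hub is one simple way to picture that exchange. The sketch below is hypothetical (`OrchestrationHub`, `subscribe` and `publish` are illustrative names, not the CASPAR or SCIDIP-ES interface) and runs synchronously in a single process, whereas the infrastructure described here is meant to be decentralised and asynchronous.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

# A handler receives (item, notice), e.g. ("FORTRAN 77 compiler", "obsolete").
Handler = Callable[[str, str], None]


class OrchestrationHub:
    """Toy publish/subscribe hub. Archives subscribe to the hardware,
    software or RepInfo items their holdings depend on; anyone can publish
    a notice about an item, and every subscriber to that item is told."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, item: str, handler: Handler) -> None:
        self._subscribers[item].append(handler)

    def publish(self, item: str, notice: str) -> int:
        """Deliver `notice` to all subscribers of `item`; return the count."""
        handlers = self._subscribers.get(item, [])
        for handler in handlers:
            handler(item, notice)
        return len(handlers)


# Illustrative use: two archives depend on the same compiler.
inbox: List[str] = []
hub = OrchestrationHub()
hub.subscribe("FORTRAN 77 compiler", lambda item, n: inbox.append(f"archive A <- {item}: {n}"))
hub.subscribe("FORTRAN 77 compiler", lambda item, n: inbox.append(f"archive B <- {item}: {n}"))
delivered = hub.publish("FORTRAN 77 compiler", "no longer maintained; emulator registered")
```

The same `publish` call could carry a notice that a dataset needs a new curator, which is the other Orchestration Manager role mentioned above.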

[Diagram: distribution of SCIDIP-ES components mapped to the OAIS model. Components shown: Authenticity/Annotation, Finding Aids, DRM, DAMS, Registry, Data Store, Orchestration, Packaging, RepInfo Toolbox, Gap Manager. A Data Store holds the AIP (Archival Information Package).]
Q5: Please explain by means of a graphic a potential distribution of the SCIDIP-ES infrastructure with respect to geographical locations (for example for storage), and with a mapping to the OAIS model.

Summary: SCIDIP-ES services and toolkits
- Demonstrated demand for these services
- Demonstrated effectiveness across domains
- Maintainable