CASPAR Cultural, Artistic and Scientific knowledge for Preservation Access and Retrieval
WHAT: objectives (from the call) Develop systems and tools which will support the accessibility and use over time of digital cultural and scientific resources. Explore how to preserve the availability and authenticity of digital resources over time. Support the emerging complexity of scientific, cultural and creative objects and associated repositories.
Objectives
Objective 1: to lay the foundation for all future preservation activities (CASPAR methodology)
Objective 2: to create key advanced components to use in all the preservation activities (CASPAR components)
Objective 3: to create the long-term autonomous system to support all the preservation activities (CASPAR framework)
Objective 4: to demonstrate the validity of the CASPAR framework with heterogeneous data and a variety of innovative applications (CASPAR testbeds)
In addition to these fundamental objectives, CASPAR offers supporting activities in order to guarantee the successful execution of the project results even after the end of the project and the re-usability of outcomes in a wider domain than the testbed-related sectors:
Objective 5: to build up the CASPAR preservation user community in order to create consensus around the initiative and gather a critical mass of potential users/customers
Objective 6: to create a self-sustainable model for the CASPAR process and offer supporting activities in order to promote the successful exploitation of the project results after the end of the project.
WHAT: vision CASPAR manages knowledge to keep archives alive through time. Preserve information & knowledge, not just “the bits”. Preservation is a process, not a one-shot event: transforming content (migration, emulation, etc.) to adapt it to new constraints of rendition and playability, and enriching content to preserve its intelligibility and (re)usability (not just rendering). OAIS provides a general framework; current implementations deal more with format than with the interpretation of data. CASPAR proposes a richer implementation for dealing with content interpretation.
WHAT: expected results CASPAR approach and framework to support the “end-to-end” lifecycle for scientific, cultural and creative digital resources Infrastructure Tools Techniques Testbeds: science, culture, artistic to identify and test common infrastructure Supported by discipline specific access Embedded in long-lived institutions
must be relatively easy to use must have a low “buy-in” in terms of effort required to adopt the CASPAR paradigm must avoid requiring wholesale change of everyone else’s systems must be decentralised and reproducible so that it can live on after the formal end of the CASPAR project.
FOR WHOM Potential USERS: Creators of the resources Funders of the resources and their preservation Curators of the resources Suppliers of preservation-related services Users of the information
...for WHOM Large user communities involved with: Science: European Space Agency, CCLRC. Culture: UNESCO. Artistic: INA, IRCAM, CIANT … Creators, Funders, Curators, Suppliers, End-users
.......for WHOM Multi-Industry perspectives Software Hardware Middleware
HOW: Foundations of Preservation approach OAIS Reference Model OAIS related stds work: Producer-Archive interface NARA/RLG Audit & Certification draft – now released for testing and comment SIP, XFDU….others OAIS based projects InterPARES ….many others
HOW: Implementation plan structure (blocks of work)
HOW (cont’d): S&T approach Component-based research OAIS-based components e.g. Storage OAIS-based extensions Next generation components Focused research & testbeds: vertical threads
HOW (cont’d): OAIS extensions Knowledge driven approach Knowledge management to support long-term preservation of concepts/information: Single, complex, on demand, interactive objects DRM Authenticity Access Storage
Framework Integrated Framework: supports the development of the three vertical testbeds Component-based research Open standards & Open Source development methodology Framework: integration of research components with existing off-the-shelf/modifiable-off-the-shelf components Service Oriented Architecture for service delivery Process control and composition
CASPAR Testbeds Three testbeds: Cultural, Performing Arts, Scientific. Cultural <- UNESCO; Performing Arts <- INA, IRCAM; Scientific <- ESA (with CCLRC). Complex, multi-source, multifaceted data. Specific requirements on preservation (technical, delivery, legal). Specific research issues: in effect, the testbeds represent three focused research streams. Identifying and confirming common infrastructure elements.
CASPAR testbeds: Testing and Validation Common design & validation methodology Uniform evaluation parameters Each testbed has its own user communities Continuous feeding to the Project Performance Evaluation process
CASPAR Integrated architecture
CCLRC Infrastructure Build-up FP7 projects Other CCLRC projects CCLRC Curation Facility European Preservation Infrastructure Alliance CASPAR Other CCLRC projects Other Alliance Members e.g. ESA Future Alliance Members
Registries
UK DCC Organisation communities of practice: users Industry curation organisations eg DPC community support & outreach Collaborative Associates Network of Data Organisations service definition & delivery management & admin support research collaborators research development co-ordination testbeds & tools Industry standards bodies
DCC Registry
Sharing RepInfo RepInfo is needed. RepInfo is extensive. May need to “extend” RepInfo as the Designated Community and/or its knowledgebase changes. How can we avoid every Repository repeating the work? Need to control costs. Need to share the effort.
Requirements Data users: need to be able to obtain pre-identified RepInfo. Curators: need to be able to find suitable pre-existing RepInfo to re-use, or create RepInfo.
Registry for Representation Info The Digital Object could have RepInfo packed with it Support automated access & processing Example of use of Representation Information Labelling
Use of RepInfo Each “bag of bits” has an associated pointer (CPID) to a Label. Label: Structure = CPID, Semantics = CPID, Rendering s/w = CPID. [Diagram labels: Registry, External]
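The pointer-to-Label idea can be sketched in a few lines of Python. This is only an illustration, not the DCC or CASPAR implementation: the `Label` class, the `resolve_label` helper, the `cpid:...` identifier strings and the registry contents are all hypothetical, and the registry is modelled as a plain dictionary.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Label:
    """A Label groups pointers (CPIDs) to individual pieces of RepInfo."""
    structure: str           # CPID of the RepInfo describing the structure
    semantics: str           # CPID of the RepInfo describing the semantics
    rendering_software: str  # CPID of the RepInfo for the rendering s/w


# Hypothetical registry contents: CPID -> Label or plain RepInfo text.
REGISTRY: Dict[str, Union[Label, str]] = {
    "cpid:label-1": Label("cpid:struct-1", "cpid:sem-1", "cpid:render-1"),
    "cpid:struct-1": "byte layout of the file",
    "cpid:sem-1": "meaning and units of each field",
    "cpid:render-1": "software able to display the data",
}


def resolve_label(label_cpid: str, registry: Dict[str, Union[Label, str]]) -> dict:
    """Follow a data object's CPID to its Label, then resolve each entry."""
    label = registry[label_cpid]
    if not isinstance(label, Label):
        raise TypeError(f"{label_cpid} does not point to a Label")
    return {
        "structure": registry[label.structure],
        "semantics": registry[label.semantics],
        "rendering_software": registry[label.rendering_software],
    }


# A "bag of bits" carries only the pointer to its Label.
data_object = {"bits": b"\x00\x01", "cpid": "cpid:label-1"}
repinfo = resolve_label(data_object["cpid"], REGISTRY)
```

Because the data object stores only a CPID, the RepInfo behind the Label can be shared by many objects and maintained in one place.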
Registry Interface Requirements Give it an identifier, give me back something (e.g. RepInfo) Allow me to search for RepInfo Interoperable with other (format) registries Not limited to single protocols
Registry API API allows applications to talk to many different implementations http://dev.dcc.ac.uk/cvs
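A minimal sketch of what such an interface could look like, assuming only the two requirements stated on the previous slide (look up by identifier, search). The class and method names (`RepInfoRegistry`, `get`, `search`, `InMemoryRegistry`, `describe`) are hypothetical and are not taken from the DCC API; the point is only that an application written against the abstract interface works with any implementation.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class RepInfoRegistry(ABC):
    """Hypothetical registry interface; implementations may use any protocol."""

    @abstractmethod
    def get(self, identifier: str) -> str:
        """Give it an identifier, get back something (e.g. RepInfo)."""

    @abstractmethod
    def search(self, keyword: str) -> List[str]:
        """Return identifiers of RepInfo matching the keyword."""


class InMemoryRegistry(RepInfoRegistry):
    """One possible implementation, backed by a dictionary."""

    def __init__(self, entries: Dict[str, str]) -> None:
        self._entries = dict(entries)

    def get(self, identifier: str) -> str:
        return self._entries[identifier]

    def search(self, keyword: str) -> List[str]:
        needle = keyword.lower()
        return sorted(
            identifier
            for identifier, text in self._entries.items()
            if needle in text.lower()
        )


def describe(registry: RepInfoRegistry, identifier: str) -> str:
    """Application code depends only on the abstract interface."""
    return f"{identifier}: {registry.get(identifier)}"


registry = InMemoryRegistry({
    "cpid:fits": "FITS format description",
    "cpid:pdf": "PDF specification",
})
```

A second implementation (for example one that forwards calls to a remote registry) could be swapped in without changing `describe`.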
API
ebXML Registry Version 3.0: Simplified View of Architecture Source: ebXML Registry Services and Protocols Committee Draft, 10 February 2005
Labels and CPIDs
Example RepInfo Label A Label is itself RepInfo. It provides a way to collect together in a sensible way lots of individual pieces of RepInfo
Re-using RepInfo Existing RepInfo can be used to build up further RepInfo E.g. refer to existing RepInfo in labels
Versioning and LID Each object has a unique identifier Versions of an object share a “logical ID” (LID) Simply using the LID gives the latest version Can specify a particular version
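The versioning rule can be illustrated with a small Python sketch. The identifier scheme (`<LID>:v<n>`), the `VersionedRegistry` class and its methods are assumptions made for this example, not the scheme used by the actual registry.

```python
from typing import Dict, List, Optional, Tuple


class VersionedRegistry:
    """Toy registry: versions of an object share a logical ID (LID)."""

    def __init__(self) -> None:
        # LID -> list of (unique object ID, content), oldest first
        self._versions: Dict[str, List[Tuple[str, str]]] = {}

    def register(self, lid: str, content: str) -> str:
        """Store a new version under the LID and return its unique ID."""
        versions = self._versions.setdefault(lid, [])
        object_id = f"{lid}:v{len(versions) + 1}"  # hypothetical ID scheme
        versions.append((object_id, content))
        return object_id

    def resolve(self, lid: str, version: Optional[int] = None) -> Tuple[str, str]:
        """Return the latest version, or a particular one if requested."""
        versions = self._versions[lid]
        if version is None:
            return versions[-1]
        if not 1 <= version <= len(versions):
            raise KeyError(f"{lid} has no version {version}")
        return versions[version - 1]


registry = VersionedRegistry()
registry.register("lid:fits-dictionary", "first draft")
registry.register("lid:fits-dictionary", "revised description")

latest = registry.resolve("lid:fits-dictionary")
first = registry.resolve("lid:fits-dictionary", version=1)
```

Here `latest` is the revised entry and `first` is the original one; both remain retrievable, and each has its own unique identifier.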
Clients DCC Registry: Any Registry Web browser Thick client (http://registry.dcc.ac.uk) Any Registry Applications using API
GUI access to Registry
Classifications Many Classification Schemes Help to find RepInfo
Initial RepInfo: Simple text; ASCII; Unicode UTF7/8; PDF, Word(!); FITS format; FITS standard dictionaries; Things that are “MISSING”
RepInfo entry Simple command line tool
Creating RepInfo There are many tools which can be used to create RepInfo: Simple text editor to create text describing the data. Complex tools to capture data description, e.g. EAST (see next slides), DFDL etc. Programming languages of various sorts.
EAST descriptions
OASIS tool for creating EAST descriptions Screenshot of OASIS
Example of EAST description
Using RepInfo A pointer to RepInfo can be attached to data The RepInfo can be used to Display Examine Process Re-use the data
Example of use of RepInfo Laser facility produces Binary data normally used by proprietary software Describe using EAST data description language Use in generic application (shown here) to display/process
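The principle of a generic application driven by a data description can be sketched as follows. This is not EAST syntax and not the laser facility's real format: the `DESCRIPTION` dictionary, the field names and the `decode` function are hypothetical stand-ins, using Python's `struct` format codes in place of a real data description language.

```python
import struct
from typing import Any, Dict

# Hypothetical description of a binary record (a stand-in for RepInfo
# written in a data description language such as EAST).
DESCRIPTION: Dict[str, Any] = {
    "byte_order": ">",  # big-endian
    "fields": [
        ("shot_number", "I"),  # unsigned 32-bit integer
        ("energy", "f"),       # 32-bit float
        ("duration", "f"),     # 32-bit float
    ],
}


def decode(data: bytes, description: Dict[str, Any]) -> Dict[str, Any]:
    """Decode one binary record using only the supplied description."""
    codes = "".join(code for _, code in description["fields"])
    values = struct.unpack(description["byte_order"] + codes, data)
    names = [name for name, _ in description["fields"]]
    return dict(zip(names, values))


# Sample record built here for illustration only.
sample = struct.pack(">Iff", 7, 1.5, 0.25)
record = decode(sample, DESCRIPTION)
```

The decoding function knows nothing about the data beyond what the description tells it, so the same code can read any format for which a description exists.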
Simple Buy-In Need to add RepInfo to your Data Objects? Does the RepInfo already exist? Yes: get its ID and put that in a label No: register what you have – be assigned an ID. Add more details later when needed Or others can add more details
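The buy-in decision above can be sketched in Python. The `SimpleRegistry` class, its methods, the `repinfo:<n>` identifiers and the `label_for` helper are all hypothetical; they only illustrate the yes/no branch and the option of adding details later.

```python
import itertools
from typing import Dict, Optional


class SimpleRegistry:
    """Toy registry that assigns an ID to whatever RepInfo is registered."""

    def __init__(self) -> None:
        self._ids_by_name: Dict[str, str] = {}
        self._details: Dict[str, str] = {}
        self._counter = itertools.count(1)

    def find(self, name: str) -> Optional[str]:
        """Return the ID of existing RepInfo, or None if there is none."""
        return self._ids_by_name.get(name)

    def register(self, name: str, details: str = "") -> str:
        """Register what you have and be assigned an ID."""
        identifier = f"repinfo:{next(self._counter)}"  # hypothetical scheme
        self._ids_by_name[name] = identifier
        self._details[identifier] = details
        return identifier

    def add_details(self, identifier: str, more: str) -> None:
        """Add more details later; anyone may do this."""
        self._details[identifier] = (self._details[identifier] + " " + more).strip()

    def details(self, identifier: str) -> str:
        return self._details[identifier]


def label_for(registry: SimpleRegistry, name: str, details: str = "") -> Dict[str, str]:
    """Reuse existing RepInfo if present, otherwise register it first."""
    identifier = registry.find(name)
    if identifier is None:
        identifier = registry.register(name, details)
    return {"repinfo": identifier}


registry = SimpleRegistry()
first_label = label_for(registry, "laser shot format", "binary, undocumented")
second_label = label_for(registry, "laser shot format")
registry.add_details(first_label["repinfo"], "field layout added later")
```

The second call finds the entry created by the first, so both labels carry the same ID and nobody repeats the registration work.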
Operating Registries See http://dev.dcc.ac.uk/twiki/bin/view/Main/RegistryProcedures