1
Collection Based Persistent Archives
Reagan Moore, Chaitan Baru, Amarnath Gupta, George Kremenek, Bertram Ludaescher, Richard Marciano, Arcot Rajasekar, Wayne Schroeder, Michael Wan, Ilya Zaslavsky, Bing Zhu
2
Topics
Components of a persistent archive
Information management example
Data management example
Knowledge management example
3
Fundamental Concept for a Persistent Archive
Persistence requires migration over time onto new technology.
While the migration occurs, a persistent archive must be able to interoperate with both the old technology and the new technology.
A persistent archive is therefore an interoperability system.
4
What Types of Interoperability are Needed?
Data management (data sets)
  Ability to work with multiple types of storage systems, across separate administration domains
Information management (schema)
  Ability to define a collection independent of database choice
  Ability to migrate collection onto new databases
Knowledge management (ontology)
  Ability to map old concepts to current view of the world
  Ability to present and manipulate information associated with data sets
5
Implicit Concepts
Infrastructure independence
  Data set access
  Authentication
  Collection management
  Presentation
Non-proprietary formatting
  Information models
  XML - Information markup language
  GML - Graphics markup language
Functional separation of archival systems
  Accessioning workbench, archive, access workbench
6
Implicit Goals
Maintain digital objects and the information retrieval catalog description in the archive
Provide ability to instantiate collection as needed on new technology
Instantiate archived collection only when needed
  Implies collection can sit in the archive forever, and can still be accessed at an arbitrary point in the future
7
Electronic Records Archive (ERA)
[Diagram: ERA workflow in four stages: Transfer, Accession, Archives, Reference.
Components shown: Media Handlers, Accessioning Work Bench (snap-in), Reference Workbench (snap-in), Catalog, Metadata Repository, Records, Arrangement, Query & Reference Tools, Presentation, Order Fulfillment, Retrieve Records.
Record types shown: Text, Image, Photo, Video, Audio, Geographical Information System, Compound Records, Web, Database.
Media and transports shown: Tape, CD, Disk, FTP, Internet, Intranet.
Records pass through Wrap and Unwrap steps with a Metadata wrapper.]
8
Common Information Model
eXtensible Markup Language (XML)
  Use tags to define semantic context for components of the data set
Document Type Definition (DTD)
  Provides semi-structured representation for organizing tags that can be applied to groups of digital objects
Development of standards for tags
  California Digital Library - Encoded Archival Description
  Digital sky, Protein Data Bank, Neuroscience brain images
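The tagging idea on this slide can be sketched in a few lines of Python. This is only an illustration: the tag names and the `RECORD_TAGS` table are hypothetical, and because the Python standard library does not validate against a DTD, a hand-written table of allowed and required tags stands in for one.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag set standing in for a DTD: element name -> required?
RECORD_TAGS = {"title": True, "creator": True, "date": False}

def tag_record(fields):
    """Wrap each field of a data set in a tag that names its semantic role."""
    record = ET.Element("record")
    for name, value in fields.items():
        ET.SubElement(record, name).text = value
    return record

def check_record(record):
    """Return a list of problems: unknown tags, then missing required tags."""
    present = [child.tag for child in record]
    problems = [f"unknown tag: {t}" for t in present if t not in RECORD_TAGS]
    problems += [f"missing tag: {t}" for t, required in RECORD_TAGS.items()
                 if required and t not in present]
    return problems

record = tag_record({"title": "Brain image 17", "creator": "Example Lab"})
xml_text = ET.tostring(record, encoding="unicode")
```

Applying the same tag table to every object in a group is what lets the group be treated as one collection.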
9
Digital Object Representation
Require non-proprietary markup language for formats that can be controlled by the archive
  HTML - text
  SVG - Scalable Vector Graphics markup language
As standards evolve, choose next format markup language to be a superset of the previous language
Convert to new standard on the fly as digital objects are accessed, or during a media migration
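A minimal sketch of conversion on access, assuming each stored object carries a format label and that a converter exists for every step between adjacent formats. The format names `v1` to `v3` and the converters themselves are invented for illustration.

```python
# Hypothetical format versions. Each entry lifts an object one step forward,
# which is straightforward when each new format is a superset of the last.
CONVERTERS = {
    "v1": ("v2", lambda body: body.replace("<b>", "<strong>").replace("</b>", "</strong>")),
    "v2": ("v3", lambda body: "<doc>" + body + "</doc>"),
}
CURRENT_FORMAT = "v3"

def migrate(fmt, body):
    """Apply converters in sequence until the object is in the current format."""
    while fmt != CURRENT_FORMAT:
        fmt, convert = CONVERTERS[fmt]
        body = convert(body)
    return fmt, body

def access(store, key):
    """Read an object, converting it on the fly and writing back the result."""
    fmt, body = migrate(*store[key])
    store[key] = (fmt, body)
    return body
```

Running `migrate` over every key during a media migration gives the bulk alternative mentioned on the slide.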
10
Hierarchy of Information Contexts
Digital object context
  Meta-data to define the structure of the object
  When publishing a digital object, must also publish the context of the object
Use collections to organize objects
  Meta-data to define the structure of the collection
  When publishing a collection, must also publish the information needed to organize the collection
Use knowledge context to control presentation
  Rules to map information to presentation style
  Rules that govern the generation of the digital objects
11
Information Management
XML representation of metadata attributes
  Standardization of DTDs - MOA II DTD for text
  Standardization of markup language
XML based representation of collection structure
  Attributes defining the physical layout of a schema into relational tables (foreign keys, attribute data types, …)
XML databases & XML organized data collections
  Commercial systems: Excelon, TAMINO, Oracle8i,…
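One way to picture an XML representation of collection structure: keep the table layout in XML and generate the database schema from it whenever the collection is instantiated. The sketch below is an assumption-laden illustration; the XML vocabulary (`table`, `attribute`, `key`, `references`) and the example tables are hypothetical, and SQLite stands in for whichever database is current.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical XML description of a collection's relational layout.
SCHEMA_XML = """
<collection>
  <table name="artwork">
    <attribute name="id" type="INTEGER" key="primary"/>
    <attribute name="title" type="TEXT"/>
  </table>
  <table name="image">
    <attribute name="id" type="INTEGER" key="primary"/>
    <attribute name="artwork_id" type="INTEGER" references="artwork(id)"/>
  </table>
</collection>
"""

def ddl_from_xml(schema_xml):
    """Turn the archived schema description into CREATE TABLE statements."""
    statements = []
    for table in ET.fromstring(schema_xml).findall("table"):
        columns = []
        for attr in table.findall("attribute"):
            column = f'{attr.get("name")} {attr.get("type")}'
            if attr.get("key") == "primary":
                column += " PRIMARY KEY"
            if attr.get("references"):
                column += f' REFERENCES {attr.get("references")}'
            columns.append(column)
        statements.append(f'CREATE TABLE {table.get("name")} ({", ".join(columns)})')
    return statements

def instantiate(schema_xml, connection):
    """Recreate the collection's tables on a new database."""
    for statement in ddl_from_xml(schema_xml):
        connection.execute(statement)

db = sqlite3.connect(":memory:")
instantiate(SCHEMA_XML, db)
```

Because the layout lives in the XML rather than in any one database, moving to a new database only requires a new generator.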
12
Art Museum Image Consortium
Information management
  Support for heterogeneous digital objects
  Automated conversion of meta-data to XML DTD
  Validation of meta-data
13
AMICO Meta-data Conversion to XML
14
Collection
Demonstrate ability to ingest, archive, recreate, query, and present a digital object from a 1 million record collection (RFC1036)
  2.5 GB of data
  6 required fields
  13 optional fields
  User defined fields (over 1000)
Determine information model needed for persistent archive
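The ingest step for one record might look like the sketch below. It assumes the records are RFC 1036 messages with plain-text headers; the six names in `REQUIRED` are the headers RFC 1036 makes mandatory. The XML layout (`message`, `field`, `body`) and the sample message are hypothetical and are not the DTD used in the demonstration.

```python
from email.parser import HeaderParser
import xml.etree.ElementTree as ET

# The six headers that RFC 1036 requires on every message.
REQUIRED = ["From", "Date", "Newsgroups", "Subject", "Message-ID", "Path"]

def message_to_xml(raw):
    """Parse one message and emit an XML record, rejecting incomplete ones."""
    msg = HeaderParser().parsestr(raw)
    missing = [name for name in REQUIRED if msg[name] is None]
    if missing:
        raise ValueError("missing required headers: " + ", ".join(missing))
    required = {name.lower() for name in REQUIRED}
    root = ET.Element("message")
    for name, value in msg.items():
        # Optional and user-defined headers are both kept, marked "other".
        kind = "required" if name.lower() in required else "other"
        ET.SubElement(root, "field", name=name, kind=kind).text = value
    ET.SubElement(root, "body").text = msg.get_payload()
    return root

SAMPLE = (
    "From: jane@example.com\n"
    "Date: Fri, 19 Nov 1982 16:14:55 GMT\n"
    "Newsgroups: net.test\n"
    "Subject: Hello\n"
    "Message-ID: <1@example.com>\n"
    "Path: example!jane\n"
    "Organization: Example Org\n"
    "\n"
    "Body text.\n"
)
```

Keeping unknown headers instead of dropping them is what allows the user-defined fields to survive archiving.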
15
XML DTD for
16
Data Management Hierarchy
Persistent Archives
  Storage of information model, data model, along with data
Data Grid
  Access to data in a different administration domain
Digital Library - services
  Interlib - ADEPT, UC Berkeley Digital Library
Data Collection
  Extensible Meta-data catalog - EMCAT
Data handling
  SDSC Storage Resource Broker - SRB
Archival Storage
  High performance storage system - HPSS
17
Storage Transparencies
Location transparency
  Distribution of data collection across multiple physical resources
Name transparency
  Attribute-based access to data
Protocol transparency
  Common API for access to remote data resources
Time transparency
  Minimization of data access latency
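Three of these transparencies can be illustrated with a small catalog that sits between the user and the storage systems. This is a sketch under stated assumptions, not the SRB interface: `MemoryStore`, `Catalog`, and their methods are hypothetical names, and an in-memory dictionary stands in for each storage resource.

```python
class MemoryStore:
    """Stand-in for one storage system behind a common read/write API."""
    def __init__(self):
        self.blobs = {}

    def read(self, path):
        return self.blobs[path]

    def write(self, path, data):
        self.blobs[path] = data

class Catalog:
    def __init__(self, resources):
        self.resources = resources   # resource name -> driver
        self.entries = []            # (attributes, resource name, path)

    def put(self, attributes, resource, path, data):
        self.resources[resource].write(path, data)
        self.entries.append((attributes, resource, path))

    def find(self, **query):
        """Name transparency: select data sets by attribute values, not paths."""
        return [entry for entry in self.entries
                if all(entry[0].get(k) == v for k, v in query.items())]

    def get(self, **query):
        """Location and protocol transparency: one call for any resource."""
        attributes, resource, path = self.find(**query)[0]
        return self.resources[resource].read(path)
```

Time transparency is not modeled here; it would come from caching and replica selection behind `get`.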
18
Digital Library Data Management
Persistent identifiers
  Ability to move a data set without the name changing
Data set replicas
  Management of multiple copies of a data set
Archival backup of data sets
  Integration of disk data caches with archival storage
Persistent archives
  Management of a collection through multiple cycles of technology evolution
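Persistent identifiers and replicas fit together naturally: the identifier maps to a list of physical copies, and moving a copy only edits that list. The class and method names below are hypothetical and the sketch ignores consistency between copies.

```python
class ReplicaCatalog:
    def __init__(self):
        self.replicas = {}   # persistent identifier -> list of (resource, path)

    def register(self, pid, resource, path):
        """Record one more physical copy of the data set."""
        self.replicas.setdefault(pid, []).append((resource, path))

    def move(self, pid, old, new):
        """Relocate one copy; the persistent identifier is untouched."""
        copies = self.replicas[pid]
        copies[copies.index(old)] = new

    def resolve(self, pid, available):
        """Return the first replica held on a currently available resource."""
        for resource, path in self.replicas[pid]:
            if resource in available:
                return resource, path
        raise LookupError(f"no reachable replica for {pid}")
```

Registering the disk cache copy first and the archival copy second gives a simple cache-then-archive lookup order.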
19
SDSC Storage Resource Broker & Meta-data Catalog
[Diagram labels: Application, User, SRB, MCAT, Remote Proxies, Resource, Third-party copy, File SID, DBLobj SID, Obj SID, ADSM, HPSS, DB2, Oracle, Unix, Dublin Core, DataCutter, Application Meta-data]
20
Collection Based Access
Abstract data set naming and administration away from physical storage resource
  Data sets defined by attributes
  Logical collection used to group data sets across storage systems
  Enables support for replication of data
Collection owned data
  Authentication controlled by data handling system
  Persistence controlled by data handling system
21
SRB Containers - Managing Archive Latency
Create container in a logical storage resource containing at least one "cacheable" resource
Create objects in containers
"Cache" daemon will move filled containers to archive
Sync and purge APIs
[Diagram labels: SRB client, SRB server, UNIX, HPSS, container, cached containers, Distributed Storage Resources]
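The container mechanism reduces archive latency by writing many small objects to the archive as one large unit. The sketch below models that flow with hypothetical names (`Container`, `ContainerCache`, `sync_and_purge`); it is not the SRB API, and plain lists stand in for the cache and archive resources.

```python
class Container:
    def __init__(self):
        self.objects = {}   # object name -> bytes

    def size(self):
        return sum(len(data) for data in self.objects.values())

class ContainerCache:
    def __init__(self, capacity):
        self.capacity = capacity      # maximum bytes per container
        self.cache = [Container()]    # containers on the cacheable resource
        self.archive = []             # containers moved to archival storage

    def create_object(self, name, data):
        """Place an object in the open container, starting a new one if full."""
        current = self.cache[-1]
        if current.objects and current.size() + len(data) > self.capacity:
            current = Container()
            self.cache.append(current)
        current.objects[name] = data

    def sync_and_purge(self):
        """Daemon step: archive every container except the one still filling."""
        filled, self.cache = self.cache[:-1], self.cache[-1:]
        self.archive.extend(filled)
        return len(filled)
```

Reads of recently written objects are served from the cached container, so only older data pays the archive retrieval cost.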
22
Knowledge Management
Knowledge-based mediation
  Conceptual-level integration
  Predictive learning models
Rule-based ontology maps
  Map source XML to Concept Map (ontologies, views)
Rule-based presentation and analysis
  Rules governing accessioning of data sets
  Rules governing integrity constraints
  Style sheets for presentation
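A rule-based ontology map can be pictured as a table of rules that assign each source XML tag to a concept, plus a concept map that places the concept in a hierarchy. Everything named below (`RULES`, `IS_A`, the tags, and the concepts) is a hypothetical illustration, not the mediator used in the project.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules: source tag -> concept in the current concept map.
RULES = {"cerebellar_cortex": "Cerebellum", "purkinje": "Neuron"}
# Hypothetical concept map: concept -> broader concept.
IS_A = {"Neuron": "Cell", "Cerebellum": "Brain region"}

def map_to_concepts(source_xml):
    """Attach a concept and its ancestors to each element a rule matches."""
    result = []
    for element in ET.fromstring(source_xml).iter():
        concept = RULES.get(element.tag)
        if concept is None:
            continue   # no rule for this tag, so it is left unmapped
        lineage = [concept]
        while lineage[-1] in IS_A:
            lineage.append(IS_A[lineage[-1]])
        result.append((element.text, lineage))
    return result
```

When the current view of the world changes, only the rules and the concept map are edited; the archived source XML stays as it was.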
23
AMICO Presentation Interface
25
Formatted Message Using XML DTD
26
Knowledge Representation
[Diagram: model-based mediation. Labeled results: PROTLOC Result (XML/XSLT), ANATOM Result (VML). Labeled sources: Surface atlas, Van Essen Lab; stereotaxic atlas, LONI; CCB, Montana SU; NCMIR, UCSD; MCell, CNL, Salk.]
27
ANATOM
28
ANATOM
29
PROTLOC
30
Applications
Support for distributed data collections
Federation of data collections to form digital library
Integration of digital libraries with archives
Finding aids for federation of digital libraries through mediation of information
Data grids for data access
Persistent archives
31
Communities Providing Technology
Archival storage - HPSS, ADSM, SANs
Data handling - Storage Resource Broker
Databases - XML, Object relational
Digital libraries - services, information discovery
Data grids - collection federation, finding aids
Computational grids - remote execution
Library - catalogs, DTDs, finding aids
Archivist - archival procedures
33
Further Information