1
Introduction to Digital Libraries Week 8: Complex Objects, Part 1
Old Dominion University Department of Computer Science CS 695 Fall 2005 Michael L. Nelson 10/17/05
2
Interrelated Projects: A Partial View
PICS RDF DC … METS WF Fedora Dienst OAI-PMH KWF MPEG-21 DIDL Buckets/SODA
3
Warwick Framework
4
From Dublin to Warwick Second Invitational Metadata Workshop, 1996
Extend DC by introducing typed metadata packages aggregated into containers
packages:
- simple (DC, MARC, FGDC, etc.)
- indirect (pointers to remote packages)
- container (hierarchical relationships)
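The three package kinds can be pictured as a small recursive data structure. The following is only a sketch: the class names, field names, and the example URI are hypothetical and are not taken from the Warwick Framework documents.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class SimplePackage:
    """One typed metadata record, e.g. DC, MARC, or FGDC."""
    metadata_type: str
    record: dict


@dataclass
class IndirectPackage:
    """A pointer to a package that lives somewhere else."""
    metadata_type: str
    uri: str


@dataclass
class Container:
    """Aggregates packages; a container is itself a package, so it can nest."""
    packages: List["Package"] = field(default_factory=list)

    def types(self) -> List[str]:
        """Collect the metadata types found at any depth of the hierarchy."""
        found = []
        for package in self.packages:
            if isinstance(package, Container):
                found.extend(package.types())
            else:
                found.append(package.metadata_type)
        return found


Package = Union[SimplePackage, IndirectPackage, Container]

# Hypothetical example: a container holding one package of each kind.
inner = Container([SimplePackage("MARC", {"245": "Example title"})])
outer = Container([
    SimplePackage("DC", {"title": "Example title"}),
    IndirectPackage("FGDC", "http://example.org/fgdc/1"),
    inner,
])
print(outer.types())
```

Nesting a container inside a container is what gives the "hierarchical relationships" named in the list.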
5
WF Motivations
- no distinction between data & metadata
- no single “about” relationship
- resource location does not matter
- resources can be computational objects
6
Simple WF Container Figure 1 from:
7
Warwick Framework Catalog
WFC = a table of contents or manifest defining the relationships of the packages in the container…
8
A WF Catalog as a WF Package
Figure 2 from:
9
Metadata vs. Data Figure 3 from:
10
Distributed Digital Object Container
Figure 5 from:
11
Distributed Active Relationships
Figure 6 from:
12
Fedora
13
Fedora Principles
From Payette & Lagoze, ECDL 1998:
- support for heterogeneous data types
- accommodation of new types as they emerge
- aggregation of mixed, possibly distributed, data into complex objects
- the ability to specify multiple content disseminations of these objects
- the ability to associate rights management schemes with these disseminations
14
Fedora Services
From Payette & Lagoze, ECDL 1998:
- repository services that provide the mechanisms for depositing, storing and accessing digital objects
- index services that provide the mechanisms for discovering digital objects
- collection services that provide the means of aggregating sets of digital objects and services into meaningful collections
- naming services that register and resolve globally unique, persistent names for digital objects
- user interface services that provide a human gateway into the other services
15
Fedora Digital Object Figure 7 from:
16
Fedora Primitive Disseminator
Figure 1 from:
17
Fedora Content-Type Disseminators
“The service requests of particular content-types are not actually specified in the DigitalObject architecture; instead, FEDORA provides the means to link to externally-defined content types.” Figure 2 from:
18
How to Identify External Content Types?
- Store the Content-Types as DOs
- assign handles or other URIs
- define 2 types:
  - signature: the what
  - servlet: the how
19
Servlets & Signatures Figure 3 from:
20
Accessing a Fedora DO Figure 7 from:
21
Static -> Dynamic Binding of Behaviors
Dushay (JCDL 2002) introduces a behavior registry
- DOs expose their structural metadata, specified in structoid schemas
- DOs link to the behavior registry, not to specific disseminators
- the registry matches structural metadata requirements, adding a layer of indirection between DOs & behaviors
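The indirection described here can be sketched as a lookup table. This is a deliberately simplified illustration, not the registry from the paper: structural metadata is reduced to a set of part names, and the class, method, and behavior names are all hypothetical.

```python
class BehaviorRegistry:
    """Maps behavior names to the structural parts each behavior requires."""

    def __init__(self):
        self._requirements = {}

    def register(self, behavior, required_parts):
        """Record which structural parts a behavior needs in order to run."""
        self._requirements[behavior] = frozenset(required_parts)

    def behaviors_for(self, structure):
        """Return the behaviors whose requirements the object's structure meets."""
        available = set(structure)
        return sorted(
            name
            for name, required in self._requirements.items()
            if required <= available
        )


registry = BehaviorRegistry()
registry.register("get_page", {"page_images"})
registry.register("get_ocr_text", {"page_images", "ocr"})
registry.register("play_audio", {"audio"})

# The object exposes only its structure; it never names a disseminator.
book_structure = {"page_images", "ocr"}
print(registry.behaviors_for(book_structure))
```

Because the object links only to the registry, registering a new behavior later makes it available to every existing object whose structure satisfies it, without touching the objects.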
22
Discovering Behaviors
Figure 7 from:
23
Invoking a Behavior Figure 8 from:
24
Buckets / SODA
25
Repositories must look after the information they hold
Principle #7 in Bill Arms’s 1995 D-Lib Magazine paper
“Repository Access Protocol”
Kahn Wilensky Framework
figure 3 in
26
Objects vs. Archives This is the tenet that I question…
Most DL objects still bound to the applications that generate or render the objects
27
Archival Responsibility
Traditional archives provide collection level granularity for terms and conditions, access control, etc.
Buckets are autonomous and intelligent, and can enforce access control by themselves
- “archivelets”?
Buckets are heterogeneous
- every bucket can have different access controls
28
SODA: Smart Objects, Dumb Archives
- Objects are more important than the archive that holds them
- The object should be the authority on its contents, not an archive
- We envision a general shift of intelligence from archives to the objects themselves
- DL protocols should find, index, and search -- not know about file formats, policy, terms and conditions, etc.
29
Content is King
- The information content is more important than the systems used for its storage, management and retrieval
- Objects should not be “locked” in specific DLs or archives
30
Design Goals
Aggregation
- DLs should be shielded from the transient nature of file formats
- Prevent information hemorrhaging by archiving all data types
Intelligence
- Aggregation (above) implies code, why stop at passive objects?
- Make objects smart...
- Bucket-bucket & bucket-tool intelligence
31
Design Goals
Self-Sufficiency
- Maximum autonomy & survivability: fully self-sufficient buckets
- Option to internally store all needed materials
Mobility
- Why should an information object be stuck in one place?
- Mobility for replication, workflow, data collection
32
Design Goals
Heterogeneity
- One size does not fit all...
- Different buckets for different applications, sites, disciplines, etc.
Archive Independence
- Focus is on information, not yet another DL “system”
- does not require an archive to function
- “Work with everything; break nothing”
33
Buckets
Aggregation + intelligence = buckets
- Object-oriented, intelligent agent archival entities
- A collection of all information about a project: manuscripts, software, data, images, video, etc.
- Customizable, heterogeneous
- buckets can “learn”, “talk”, and “coordinate”
- buckets control terms and conditions, display, etc. -- not the archive that holds them
34
A Sample Bucket
4 packages:
- report (4 elements)
- appendix (2 elements)
- contact information (2 elements)
- translation (1 element)
35
Another Sample Bucket
2 packages:
- pre-print (2 elements)
- pointer to SFX reference linking service for published and pre-print versions (2 elements)
this bucket display for the Universal Preprint Service
36
IRI Project Bucket
Not really a replacement home page, but a storage container for source code, publications, reports, forms, reviews, etc.
37
Course Bucket Readings, slides, syllabus, etc.
38
Heterogeneous Buckets
- Buckets are envisioned to be locally modifiable and extensible
- There is a default set of public methods defined for buckets
- additional methods can be locally defined
- Buckets can “learn” new methods
  - new “default” methods, or
  - locally defined extensions
- override default methods
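One way to picture "learning" and overriding is a per-bucket method table. This is a minimal sketch under assumptions: the `Bucket` class, `learn`, `invoke`, and the method names are hypothetical and do not reproduce the actual bucket implementation.

```python
class Bucket:
    """A toy bucket that carries its own table of callable methods."""

    def __init__(self, metadata):
        self.metadata = metadata
        # The default set of public methods that every bucket starts with.
        self._methods = {
            "metadata": lambda bucket: bucket.metadata,
            "list_methods": lambda bucket: sorted(bucket._methods),
        }

    def learn(self, name, func):
        """Add a locally defined method, or override a default of the same name."""
        self._methods[name] = func

    def invoke(self, name):
        """Run a method by name against this bucket."""
        return self._methods[name](self)


bucket = Bucket({"title": "CS 695"})

# A locally defined extension that other buckets do not have.
bucket.learn("shout_title", lambda b: b.metadata["title"].upper())

# A local override of the default "metadata" method.
bucket.learn("metadata", lambda b: dict(b.metadata, site="local"))

print(bucket.invoke("list_methods"))
print(bucket.invoke("metadata"))
```

Keeping the table on the instance rather than the class is what makes each bucket independently modifiable, which matches the heterogeneity goal.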
39
Bucket 1.x Messages
Sample bucket messages:
- returns the metadata for the bucket
- invokes the default display method
- displays a single element
- lists all the methods that this bucket implements
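The message URLs themselves are not shown here, so the following only illustrates the general idea of treating a URL as a message to a bucket. The `method` and `element` query parameters, the example.org URLs, and the fallback to `display` when no method is named are assumptions made for the sketch, not the documented bucket API.

```python
from urllib.parse import parse_qs, urlsplit


def parse_message(url):
    """Split a bucket URL into (method name, argument dict).

    Assumption: the method is carried in a "method" query parameter and a
    URL with no method means "display".
    """
    query = parse_qs(urlsplit(url).query)
    method = query.get("method", ["display"])[0]
    args = {key: values[0] for key, values in query.items() if key != "method"}
    return method, args


# Hypothetical messages, one per bullet in the list above.
examples = [
    "http://example.org/bucket/?method=metadata",
    "http://example.org/bucket/",
    "http://example.org/bucket/?method=display&element=syllabus.txt",
    "http://example.org/bucket/?method=list_methods",
]
for url in examples:
    print(parse_message(url))
```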
40
Bucket 2.0 Messages
aggregate:
- metadata
- data
- methods to operate on the metadata/data
(cheat) assumptions:
- Perl
- http server
41
2.0 Internal Structure
jaga.cs.odu.edu:/home/mln/public_html/teaching/cs695-f03 % ls bucket/
CVS/  index.cgi*
jaga.cs.odu.edu:/home/mln/public_html/teaching/cs695-f03 % ls bucket/
bucket.xml*  content/  CVS/  lib/  logs/  methods/
jaga.cs.odu.edu:/home/mln/public_html/teaching/cs695-f03 % ls bucket/content/
~syllabus.txt          ~week1~readings.html    ~week5~readings.html
~week10~readings.html  ~week1~week-01.ppt      ~week6~readings.html
~week11~readings.html  ~week2~readings.html    ~week7~readings.html
~week12~readings.html  ~week2~week-02.ppt      ~week8~readings.html
~week13~readings.html  ~week3~assignment1.ppt  ~week9~readings.html
~week14~readings.html  ~week3~readings.html
~week15~readings.html  ~week3~week-03.ppt
jaga.cs.odu.edu:/home/mln/public_html/teaching/cs695-f03 % ls bucket/lib
CVS/  EZXML.pm  mime.e  style.css
jaga.cs.odu.edu:/home/mln/public_html/teaching/cs695-f03 % ls bucket/logs/
access.log  CVS/  mylog.log
jaga.cs.odu.edu:/home/mln/public_html/teaching/cs695-f03 % ls bucket/methods/
addElement.pl*     getElement.pl*   listMethods.pl*     setPreference.pl*
CVS/               get_log.pl*      listPreference.pl*
deleteElement.pl*  getlog.pl*       log.pl*
display.pl*        getMetadata.pl*  setMetadata.pl*
jaga.cs.odu.edu:/home/mln/public_html/teaching/cs695-f03 %
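The names under content/ look like a flat encoding of package and element, with "~" as the separator and no package for top-level items such as the syllabus. That reading is inferred from the listing alone and is not confirmed by the slides; the function name below is hypothetical.

```python
def split_content_name(filename):
    """Split a content/ filename into (package, element).

    Assumed convention, inferred from the listing:
      "~element"          -> (None, "element")
      "~package~element"  -> ("package", "element")
    """
    parts = filename.split("~")
    if parts[0] != "" or len(parts) not in (2, 3):
        raise ValueError(f"unexpected content name: {filename!r}")
    if len(parts) == 2:
        return None, parts[1]
    return parts[1], parts[2]


names = ["~syllabus.txt", "~week1~readings.html", "~week1~week-01.ppt"]

# Group the elements by package to recover the logical structure.
packages = {}
for name in names:
    package, element = split_content_name(name)
    packages.setdefault(package, []).append(element)

print(packages)
```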
42
Examples
- 1.6.X bucket
- 2.0 buckets
- 3.0 buckets (under development)
  - uses MPEG-21 DIDLs
  - cf.
43
Buckets as Entire Archive
- Archive Ingest and Handling Test (AIHT) project
- upcoming D-Lib Magazine Articles (December 2005)
- implicit assumptions:
  - 1 bucket = 1 logical item (N physical items)
  - Display is for human use
  - Bucket contents are DOM-parsable
44
Which Interface? Display based on web use
Display based on archival use
45
Bucket / MPEG-21 Model http://beatitude.cs.odu.edu:8080/bucket/
MPEG-21 DIDL Payload Bucket Infrastructure methods logs support libraries
46
MPEG-21 DIDL
- A generic, powerful complex object metadata format
- Based on an abstract data model
- Semantics separated from syntax
  - i.e. the tags don’t mean anything -- a little disconcerting at first glance
- Digital library use championed by LANL
47
Mobility Model
- Not filesystem semantics: cp, mv, rm
- Closer to process semantics: fork, exec
- returns a bucket stream (fork)
- overwrites the bucket with the uploaded bucket stream
- Arguments: bucket, package, payload, ride
- format: only tar currently supported
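Serializing a bucket to a tar stream and unpacking it elsewhere can be sketched with the Python standard library. This is an illustration under assumptions, not the bucket implementation: the function names and the directory layout in the demo are hypothetical, and the sketch unpacks into a fresh target directory instead of overwriting a live bucket.

```python
import io
import os
import tarfile
import tempfile


def bucket_to_stream(bucket_dir):
    """Pack a bucket directory into an in-memory tar stream (the "fork" step)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        archive.add(bucket_dir, arcname="bucket")
    return buffer.getvalue()


def stream_to_bucket(stream, target_dir):
    """Unpack a tar stream into target_dir (a stand-in for the overwrite step)."""
    with tarfile.open(fileobj=io.BytesIO(stream), mode="r") as archive:
        archive.extractall(target_dir)


with tempfile.TemporaryDirectory() as work:
    # Build a tiny source bucket with one element.
    source = os.path.join(work, "source")
    os.makedirs(os.path.join(source, "content"))
    with open(os.path.join(source, "content", "~syllabus.txt"), "w") as handle:
        handle.write("CS 695 syllabus")

    # Serialize the bucket, then materialize the copy somewhere else.
    stream = bucket_to_stream(source)
    destination = os.path.join(work, "copy")
    stream_to_bucket(stream, destination)

    copied = os.path.join(destination, "bucket", "content", "~syllabus.txt")
    with open(copied) as handle:
        restored = handle.read()

print(restored)
```

Because the whole bucket travels as one stream, methods and logs move together with the content, which is what distinguishes this from copying individual files.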
48
Intelligence
- Shift of responsibility into the data objects opens up an entire new class of applications: data objects as intelligent agents
- Premise: instead of having the data objects do nothing while they patiently wait to be accessed, have them do something useful while waiting ...
49
Conclusions
- Smart objects are an idea whose time has come
  - natural progression of DL R&D
- Smart objects will play a fundamental role in digital preservation
More info on preservation: