Thanks to Michael L. Nelson, Sandra Payette, Thornton Staples


1 Complex objects
Thanks to Michael L. Nelson, Sandra Payette, Thornton Staples

2 Warwick Framework

3 From Dublin to Warwick
Second Invitational Metadata Workshop, 1996
Extend DC by introducing typed metadata packages aggregated into containers
Packages:
- simple (DC, MARC, FGDC, etc.)
- indirect (pointers to remote packages)
- container (hierarchical relationships)
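
The three package kinds can be sketched as a small data model. This is an illustrative Python sketch, not something taken from the Warwick Framework documents: the class names (`SimplePackage`, `IndirectPackage`, `Container`), the `flatten` helper, and the example URI are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class SimplePackage:
    """A typed metadata package held directly in the container."""
    metadata_type: str  # e.g. "DC", "MARC", "FGDC"
    content: str


@dataclass
class IndirectPackage:
    """A pointer to a package stored somewhere else."""
    uri: str  # hypothetical location of the remote package


@dataclass
class Container:
    """A container aggregates packages, including other containers."""
    packages: List[Union[SimplePackage, IndirectPackage, "Container"]] = field(
        default_factory=list
    )

    def flatten(self):
        """Yield every non-container package, walking nested containers."""
        for package in self.packages:
            if isinstance(package, Container):
                yield from package.flatten()
            else:
                yield package


inner = Container([SimplePackage("MARC", "<marc record>")])
outer = Container([
    SimplePackage("DC", "<dc record>"),
    IndirectPackage("http://example.org/remote-fgdc"),
    inner,
])
kinds = [type(p).__name__ for p in outer.flatten()]
```

Nesting a `Container` inside another is what gives the hierarchical relationship; the indirect package only records where the remote package lives and never fetches it.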

4 WF Motivations
- no distinction between data & metadata
- no single “about” relationship
- resource location does not matter
- resources can be computational objects

5 Simple WF Container Figure 1 from:

6 Metadata types
Dublin Core: A simple set of metadata elements used in digital libraries, primarily to describe digital objects, to support collections management, and to exchange metadata.
MARC (Machine-Readable Cataloging): A format used by libraries to store and exchange catalog records.

7 Warwick Framework Catalog
WFC = a table of contents or manifest defining the relationships of the packages in the container…

8 A WF Catalog as a WF Package
Figure 2 from:

9 Metadata vs. Data Figure 3 from:

10 Distributed Digital Object Container
Figure 5 from:

11 Distributed Active Relationships
Figure 6 from:

12 Fedora

13 Digital Library Interoperability
Cornell Digital Library Library of Congress

14 Principles for Digital Library Architecture
Open Architecture
- functionality partitioned into set of well-defined services
- services accessible via well-defined protocol
Modularization
- promotes interoperability
- scalable to different clientele (library, informal web)
Federation
- enable aggregations into logical collections
Distribution
- of content and services
- of administration and management

15 Component-Ware Digital Libraries
[Diagram: UI Gateway Service, Name Service (Identifiers), Collection Service, Query Mediator Service, Index Service, Repository Service (Digital Objects)]
Define:
- UI: user interface service, where the presentation of information can be customized
- Index Service: a way to talk about disparate search engines as one conceptual entity
- Collection: a set of digital content grouped because it fulfills particular criteria
- Collection Service: supplies metadata specific to the contents and services of a particular collection (this could cover which Query Mediators are in the collection as well as what content is in it)
- Query Mediator: deals with distribution of queries and aggregation of results in the resource discovery process
All services communicate via one or more open protocols.

16 The Fedora Project
The Flexible Extensible Digital Object Repository Architecture
Fedora Use Cases
- Digital Asset Management
- Institutional Repository
- Digital archives and preservation
- Content Management System (CMS)
- Digital Library Architecture
Open source software
- Not RedHat!
- Mozilla Public License

17 Fedora History
Research (1997-present):
- DARPA and NSF-funded research project at Cornell
- Reference implementation developed at Cornell
- Extensive interoperability testing (experiments with CNRI)
- Policy Enforcement
First Application ( ):
- University of Virginia digital library prototype
Open Source Software (2002-present):
- Andrew W. Mellon Foundation granted $1 million to develop a production-quality Fedora system
- Four releases of software, leading to 2.0 this fall
- $1.4 million granted by Mellon for second phase

18

19 FEDORA
Digital Object Model
- container for aggregating any digital material
- disseminations of complex types
- global extensibility mechanisms
- access management
Repository Service
- Service layer for “contained” DigitalObjects
- Object lifecycle management
- Secure environment
- open interface

20 FEDORA: Goals
- Distribution - of digital content and services
- Interface Stability - for digital objects
- Interoperability - for digital objects and repositories
- Extensibility - naturally evolving type system
- Flexibility - community-driven type development
- Security - rights management and access control
- Preservation - longevity of digital objects

21 FEDORA DigitalObjects can be...
- Simple, familiar entities
- Complex, compound, dynamic objects

22 FEDORA DigitalObject Model
MIME-typed stream of bytes Book Diary Future Dissemination Dublin Core Service Request upon external source Internal DataStream Reference DataStream

23 Data types
MIME (Internet Media Type): A scheme for specifying the data type of digital material.
Datastream: A typed byte stream that preserves the internal format and encoding of the type, but encapsulates it so that it can be treated generically within the DigitalObject. This distillation of data into its essential type and byte representation allows heterogeneous forms of digital content to be treated in a uniform manner.
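
A minimal Python sketch of the Datastream idea follows: a MIME type paired with untouched bytes, so that one piece of code can handle every content type. The `Datastream` class, its fields, and the sample byte strings are assumptions for illustration and are not FEDORA's actual data structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Datastream:
    """A MIME-typed byte stream; the bytes are stored unchanged."""
    mime_type: str
    data: bytes

    def size(self) -> int:
        """Length in bytes, computed without interpreting the content."""
        return len(self.data)


streams = [
    Datastream("application/postscript", b"%!PS-Adobe-3.0\n"),
    Datastream("image/gif", b"GIF89a"),
]

# Generic handling: the same loop works for every content type.
summary = [(s.mime_type, s.size()) for s in streams]
```

Nothing in the loop needs to know what PostScript or GIF is, which is the uniform treatment the slide describes.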

24 Disseminator Type
A set of behaviors that formally describes the functionality of any global or community-specific notion of content.
Example behaviors: getSection, getArticle, getChapter, getPage, getFrame, getLength

25 Disseminator
A generic component that associates a set of behaviors with a DigitalObject.
- Primitive Disseminator: generic behaviors
- Extensible Type Disseminator: extended behaviors

26 FEDORA DigitalObject
[Diagram: DigitalObjects, each with a Primitive Disseminator, holding datastreams of types application/MARC, application/postscript, and image/gif]

27 Client communicates with generic requests
- ListDisseminatorTypes returns Book, DublinCore
- GetMethods(Book) returns GetChapter(n), GetPage(n), GetTOC()
- GetDissemination(Book.GetPage(1))
[Diagram: a DigitalObject with Book and DublinCore Disseminators, a Primitive Disseminator, and datastreams DS1 (application/MARC) and DS2 (application/postscript)]
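
The request sequence on this slide (list the Disseminator types, ask for a type's methods, then request a dissemination) can be sketched in Python. The `DigitalObject` class below is a hypothetical stand-in: its method names mirror the slide's requests, but the signatures and the in-memory storage are assumptions, not the FEDORA interface.

```python
class DigitalObject:
    """Hypothetical stand-in for a FEDORA DigitalObject."""

    def __init__(self):
        # Disseminator type name -> {method name -> callable}
        self._disseminators = {}

    def attach(self, type_name, methods):
        """Associate a Disseminator type and its behaviors with this object."""
        self._disseminators[type_name] = dict(methods)

    def list_disseminator_types(self):
        return sorted(self._disseminators)

    def get_methods(self, type_name):
        return sorted(self._disseminators[type_name])

    def get_dissemination(self, type_name, method, *args):
        """Run one behavior and return its result."""
        return self._disseminators[type_name][method](*args)


pages = ["page one", "page two"]
book = DigitalObject()
book.attach("Book", {
    "GetPage": lambda n: pages[n - 1],
    "GetTOC": lambda: ["Chapter 1"],
    "GetChapter": lambda n: f"chapter {n}",
})
book.attach("DublinCore", {"GetDCRecord": lambda: {"title": "Example"}})

first_page = book.get_dissemination("Book", "GetPage", 1)
```

The client only ever uses the three generic calls; everything specific to books is discovered at run time.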

28 A Disseminator...
… references a Servlet
TYPE DESCRIPTION = DublinCore
SERVLET = cornell.dli2/DC-from-MARC
… to produce non-generic behaviors for the DigitalObject
- GetMethods(DC) returns GetDCField(Title), GetDCRecord
[Diagram: a DigitalObject with a DC Disseminator (GetDCField, GetDCRecord) and datastreams DS1 (application/MARC) and DS2 (application/postscript)]

29 DigitalObject Interface Stability
Mechanisms can be updated or replaced as technology changes ...
… and the interface to the Digital Object remains stable
[Diagram layers: Structure; Mechanism (Servlet-1, Servlet-2, Servlet-3); Interface (Disseminator Type)]
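
The stability claim can be illustrated with a short Python sketch in which the interface is a fixed tuple of method names and the mechanism behind it is swapped. The `Disseminator` class, `replace_mechanism`, and the two dictionary "servlets" are hypothetical names chosen for this example.

```python
class Disseminator:
    """Binds a fixed set of method names to a replaceable mechanism."""

    def __init__(self, interface, mechanism):
        self.interface = tuple(interface)
        self.replace_mechanism(mechanism)

    def replace_mechanism(self, mechanism):
        """Swap the implementation; reject it if it breaks the interface."""
        missing = [name for name in self.interface if name not in mechanism]
        if missing:
            raise ValueError(f"mechanism lacks methods: {missing}")
        self._mechanism = mechanism

    def call(self, name, *args):
        if name not in self.interface:
            raise AttributeError(name)
        return self._mechanism[name](*args)


servlet_1 = {"GetPage": lambda n: f"v1 page {n}"}
servlet_2 = {"GetPage": lambda n: f"v2 page {n}"}

book = Disseminator(["GetPage"], servlet_1)
before = book.call("GetPage", 1)
book.replace_mechanism(servlet_2)  # technology changes
after = book.call("GetPage", 1)    # the client call is unchanged
```

The check in `replace_mechanism` is a design choice of the sketch: a new mechanism is accepted only if it still covers every method the interface promises.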

30 DigitalObject Extensibility: Adding New Types
The same underlying data...
… can be operated on in novel ways…
… to create new disseminations not originally conceived of for the particular digital object.
[Diagram: Book and Photo Collection Disseminators over the Structure, Mechanism, and Interface layers]

31 Extensibility: a look under the hood
[Diagram: the DublinCore Disseminator Type (Interface Definition, URNDC) holds the DC signature with the method list GetDCField, GetDCRecord; the DublinCore Mechanism (Servlet, URNDC1) holds the DC servlet; the DigitalObject's DC Disseminator records Signature and Servlet = URNDC1; GetDissemination(GetDCRecord) over the application/MARC and application/postscript datastreams produces a DublinCore Record]

32 Proliferation of Disseminator Types
We use FEDORA DigitalObjects to store Disseminator Signatures and Servlets.
Type Registration (via name service)
- a Disseminator Type’s global identifier is the URN of a DigitalObject containing a Signature
- a Servlet’s global identifier is the URN of a DigitalObject containing a Servlet
Types can be globally recognizable and mechanisms can be shared.

33 Interoperable Digital Objects and Repositories
RAP Client Name Service Repository Repository Repository Identifiers Audio/Visual Archive Cornell Library Collections Image Database System

34 RAP
A part of the core infrastructure (along with the notion of digital objects and a naming system) is a common repository access protocol (RAP). The RAP is to be supported by all repositories in the system, and defines the core set of interactions with that repository, such as storing or retrieving a digital object. RAP is not an implementation blueprint, but rather only an interface description that is technology independent. The repository itself may be considered as a digital object containing other digital objects.

35 Persistent Identifiers
In FEDORA, use them for:
- Repositories
- DigitalObjects
- Disseminator Types
- Servlet Mechanisms
Benefits:
- Ensure uniqueness
- Provide stability (location independence)
- Promote global extensibility
- Promote interoperability
[Diagram: Identifiers, Name Service]

36 Identifiers - A Brief Primer
IETF Uniform Resource Name (URN) Spec
- Naming Scheme: The policies and procedures for creating and assigning URNs within a particular domain.
- Resolution System: A system that translates URNs into their location-specific identifiers (e.g., URLs).
- Registries: A set of global directories that provide information on which resolution systems can translate any particular URN.
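
The split between resolution systems and registries can be sketched in Python. This is an illustrative model only: the `Resolver` and `Registry` classes, the `example` namespace, and the URLs are assumptions, and real systems such as the Handle System work over network protocols rather than in-memory dictionaries.

```python
class Resolver:
    """Translates URNs in one namespace into location-specific URLs."""

    def __init__(self):
        self._locations = {}

    def bind(self, urn, *urls):
        """Record (or update) where the named resource currently lives."""
        self._locations[urn] = list(urls)

    def resolve(self, urn):
        return list(self._locations[urn])


class Registry:
    """Directory of which resolver handles which URN namespace."""

    def __init__(self):
        self._resolvers = {}

    def add(self, namespace, resolver):
        self._resolvers[namespace.lower()] = resolver

    def resolve(self, urn):
        scheme, namespace, _ = urn.split(":", 2)
        if scheme.lower() != "urn":
            raise ValueError(f"not a URN: {urn}")
        return self._resolvers[namespace.lower()].resolve(urn)


library = Resolver()
library.bind("urn:example:book-1", "http://a.example.org/book-1")
registry = Registry()
registry.add("example", library)

old_urls = registry.resolve("urn:example:book-1")
# The resource moves and gains a mirror; the URN itself does not change.
library.bind(
    "urn:example:book-1",
    "http://b.example.org/book-1",
    "http://c.example.org/book-1",
)
new_urls = registry.resolve("urn:example:book-1")
```

Only the binding inside the resolver changes when content moves, which is the location independence that persistent identifiers are meant to provide.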

37 Identifiers - Existing Solutions
CNRI’s Handle System
- good implementation of URN specification
- 1 Handle >> one or more locations
- resolve to different data types (URL, IOR,…)
OCLC’s PURL
- persistent URLs, not really URNs
- 1 PURL >> only one location (an HTTP redirect)
Community-specific Initiatives
- Digital Object Identifier (DOI) - publishers; Handle System + Rights Metadata
- PubMedID - Medline
- BibCode - astrophysics journals

38 FEDORA Status
Reference Implementation
- CORBA IDL defines open interfaces for Repository Access Protocol (RAP)
- Java/CORBA repository and clients
Collaborations
- CNRI: core design and interoperability; complex disseminations (dynamic)
- U of Virginia: web integration; complex disseminations (e.g., e-texts)

39 PRISM Security Policy Enforcement
Challenges
- what is enforceable?
- distributed object environment
- interoperability and extensibility
Monitor all operations, generic and extended
Enforce a wide array of policies
- basic security violations
- rights management
- access control
[Diagram: a DigitalObject with a DC Disseminator (GetDCField, GetDCRecord) and datastreams of types application/MARC and text/x-acl]

40 PRISM: Preservation Handles Fedora Repositories Preservation Service

41 PRISM: Preservation Policy Enforcement
Surrogate Object
- Monitors DigitalObject state and catches unacceptable or risky transitions
[Diagram labels: Book, Preserve (P), preservation metadata, DS1, DS2 (application/postscript), Preservation Service]

42 References
- Payette, Blanchi, Lagoze, and Overly: Interoperability for Digital Objects and Repositories: The Cornell/CNRI Experiments, D-Lib Magazine, May
- Payette and Lagoze: Flexible and Extensible Digital Object and Repository Architecture (FEDORA), ECDL
- Lagoze and Payette: An Infrastructure for Open-Architecture Digital Libraries
- Daniel, Lagoze, and Payette: A Metadata Architecture for Digital Libraries, IEEE ADL
- FEDORA Home Page
- Payette: Persistent Identifiers on the Digital Terrain, RLG DigiNews, April 1998, Volume 2, Number 2.

43 Fedora Principles
From Payette & Lagoze, ECDL 1998
- support for heterogeneous data types
- accommodation of new types as they emerge
- aggregation of mixed, possibly distributed, data into complex objects
- the ability to specify multiple content disseminations of these objects
- the ability to associate rights management schemes with these disseminations

44 Fedora Services
From Payette & Lagoze, ECDL 1998
- repository services that provide the mechanisms for depositing, storing and accessing digital objects
- index services that provide the mechanisms for discovering digital objects
- collection services that provide the means of aggregating sets of digital objects and services into meaningful collections
- naming services that register and resolve globally unique, persistent names for digital objects
- user interface services that provide a human gateway into the other services

45 Fedora Digital Object Figure 7 from:

46 Fedora Primitive Disseminator
Figure 1 from:

47 Fedora Content-Type Disseminators
“The service requests of particular content-types are not actually specified in the DigitalObject architecture; instead, FEDORA provides the means to link to externally-defined content types.” Figure 2 from:

48 How to Identify External Content Types?
Store the Content-Types as DOs
- assign handles or other URIs
- define 2 types:
  - signature: the what
  - servlet: the how

49 Servlets & Signatures Figure 3 from:

50 Accessing a Fedora DO Figure 7 from:

51 Static -> Dynamic Binding of Behaviors
Dushay (JCDL 2002) introduces a behavior registry
- DOs expose their structural metadata, specified in structoid schemas
- DOs link to the behavior registry, not to specific disseminators
- the registry matches structural metadata requirements, adding a layer of indirection between DOs & behaviors
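
The indirection described here can be sketched as a registry that matches what a behavior requires against what an object's structure exposes. The Python below is a loose illustration, not Dushay's design: the `BehaviorRegistry` class is hypothetical, and plain dictionary keys stand in for structoid schemas.

```python
class BehaviorRegistry:
    """Matches behaviors to objects by structural requirements."""

    def __init__(self):
        self._behaviors = []  # (name, required features, callable)

    def register(self, name, requires, func):
        self._behaviors.append((name, frozenset(requires), func))

    def discover(self, structure):
        """Names of behaviors whose requirements the structure satisfies."""
        features = set(structure)
        return [name for name, requires, _ in self._behaviors
                if requires <= features]

    def invoke(self, name, structure):
        features = set(structure)
        for candidate, requires, func in self._behaviors:
            if candidate == name and requires <= features:
                return func(structure)
        raise LookupError(f"no matching behavior: {name}")


registry = BehaviorRegistry()
registry.register("GetPage", {"pages"}, lambda s: s["pages"][0])
registry.register("GetThumbnail", {"images"}, lambda s: s["images"][0])

# The object exposes only its structure; it names no disseminator.
book_structure = {"pages": ["p1", "p2"]}
available = registry.discover(book_structure)
first = registry.invoke("GetPage", book_structure)
```

Registering a new behavior later makes it available to every existing object with a matching structure, without touching those objects.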

52 Discovering Behaviors
Figure 7 from:

53 Invoking a Behavior Figure 8 from:

54 Buckets / SODA Smart objects, dumb archives

55 Bucket Concept
Abstract metadata category
- Strongly typed
- Well-defined search semantics: query terms, query operators
- Explicitly mapped from source metadata (FGDC, 1.3, “Time period of content”, “ ”)
Bucket-level search
- uniform across all collections
- e.g.: search all collections for items whose Originator bucket contains the phrase “geological survey”

56 Bucket Properties
- name: Coverage date
- semantic definition: The time period to which the item is relevant.
- data type (strictly observed): calendar date or range of calendar dates
- syntactic representation (strictly observed): ISO 8601
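
A strictly observed type and syntax is what makes a uniform search operator possible. The Python sketch below parses a Coverage date value and tests two values for overlap. It assumes only full calendar dates (`YYYY-MM-DD`) and `start/end` ranges, which is a small subset of ISO 8601; the function names `parse_coverage` and `overlaps` are hypothetical.

```python
from datetime import date


def parse_coverage(value):
    """Return (start, end) for 'YYYY-MM-DD' or 'YYYY-MM-DD/YYYY-MM-DD'."""
    start_text, _, end_text = value.partition("/")
    start = date.fromisoformat(start_text)
    end = date.fromisoformat(end_text) if end_text else start
    if end < start:
        raise ValueError("range ends before it starts")
    return start, end


def overlaps(coverage, query):
    """True when the two dates or ranges share at least one day."""
    c_start, c_end = parse_coverage(coverage)
    q_start, q_end = parse_coverage(query)
    return c_start <= q_end and q_start <= c_end


single = parse_coverage("1995-07-01")
hit = overlaps("1990-01-01/1999-12-31", "1995-07-01")
miss = overlaps("1990-01-01/1990-12-31", "1995-07-01")
```

Because every collection maps its source metadata into the same representation, one `overlaps` operator can serve all of them.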

57 Repositories must look after the information they hold
Principle #7 in Bill Arms’s 1995 D-Lib Magazine paper
“Repository Access Protocol”
Kahn-Wilensky Framework, figure 3 in

58 Objects vs. Archives Michael Nelson’s concern
Most DL objects still bound to the applications that generate or render the objects

59 Archival Responsibility
- Traditional archives provide collection-level granularity for terms and conditions, access control, etc.
- Buckets are autonomous and intelligent, and can enforce access control by themselves: “archivelets”?
- Buckets are heterogeneous: every bucket can have different access controls

60 SODA: Smart Objects, Dumb Archives
- Objects are more important than the archive that holds them
- The object should be the authority on its contents, not an archive
- We envision a general shift of intelligence from archives to the objects themselves
- DL protocols should find, index, and search -- not know about file formats, policy, terms and conditions, etc.

61 Content is King
- The information content is more important than the systems used for its storage, management and retrieval
- Objects should not be “locked” in specific DLs or archives

62 Design Goals
Aggregation
- DLs should be shielded from the transient nature of file formats
- Prevent information hemorrhaging by archiving all data types
Intelligence
- Aggregation (above) implies code, why stop at passive objects?
- Make objects smart...
- Bucket-bucket & bucket-tool intelligence

63 Design Goals
Self-Sufficiency
- Maximum autonomy & survivability: fully self-sufficient buckets
- Option to internally store all needed materials
Mobility
- Why should an information object be stuck in one place?
- Mobility for replication, workflow, data collection

64 Design Goals
Heterogeneity
- One size does not fit all...
- Different buckets for different applications, sites, disciplines, etc.
Archive Independence
- Focus is on information, not yet another DL “system”
- does not require an archive to function
- “Work with everything; break nothing”

65 Buckets
Aggregation + intelligence = buckets
- Object-oriented, intelligent agent archival entities
- A collection of all information about a project: manuscripts, software, data, images, video, etc.
- Customizable, heterogeneous
- buckets can “learn”, “talk”, and “coordinate”
- buckets control terms and conditions, display, etc. -- not the archive that holds them

66 A Sample Bucket
4 packages:
- report (4 elements)
- appendix (2 elements)
- contact information (2 elements)
- translation (1 element)

67 Is this a bucket?

68 Another Sample Bucket
2 packages:
- pre-print (2 elements)
- pointer to SFX reference linking service for published and pre-print versions (2 elements)
this bucket display is for the Universal Preprint Service

69 IRI Project Bucket
Not really a replacement home page, but a storage container for source code, publications, reports, forms, reviews, etc.

70 Course Bucket Readings, slides, syllabus, etc.

71 Heterogeneous Buckets
Buckets are envisioned to be locally modifiable and extensible
- There is a default set of public methods defined for buckets
- additional methods can be locally defined
Buckets can “learn” new methods
- new “default” methods, or locally defined extensions
- override default methods
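
The learn-and-override behavior can be sketched in a few lines of Python. The real buckets shown in this deck are Perl-based, so this is only a model of the idea: the `Bucket` class and its `learn` and `send` methods are hypothetical, while the method names `display` and `listMethods` echo names that appear elsewhere in these slides.

```python
class Bucket:
    """Toy bucket: default methods plus locally learned ones."""

    # Default public methods shared by every bucket.
    DEFAULT_METHODS = {
        "display": lambda bucket: f"bucket with {len(bucket.elements)} element(s)",
        "listMethods": lambda bucket: sorted(bucket.methods),
    }

    def __init__(self, elements=()):
        self.elements = list(elements)
        # Each bucket gets its own copy, so learning is local to it.
        self.methods = dict(self.DEFAULT_METHODS)

    def learn(self, name, func):
        """Add a new method or override a default one."""
        self.methods[name] = func

    def send(self, name):
        """Deliver a message to the bucket by method name."""
        return self.methods[name](self)


plain = Bucket(["syllabus.txt"])
custom = Bucket(["syllabus.txt"])
custom.learn("display", lambda bucket: "custom display")    # override
custom.learn("count", lambda bucket: len(bucket.elements))  # extension
```

Copying the defaults per bucket is what keeps buckets heterogeneous: teaching `custom` a method leaves `plain` untouched.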

72 Bucket 1.x Messages
Sample bucket messages:
- returns the metadata for the bucket
- invokes the default display method
- displays a single element
- lists all the methods that this bucket implements

73 Bucket 2.0 Messages
aggregate:
- metadata
- data
- methods to operate on the metadata/data
(cheat) assumptions
- Perl
- http server

74 2.0 Internal Structure
jaga.cs.odu.edu:/home/mln/public_html/teaching/cs695-f03 % ls
bucket/ CVS/ index.cgi*
jaga.cs.odu.edu:/home/mln/public_html/teaching/cs695-f03 % ls bucket/
bucket.xml* content/ CVS/ lib/ logs/ methods/
jaga.cs.odu.edu:/home/mln/public_html/teaching/cs695-f03 % ls bucket/content/
~syllabus.txt           ~week1~readings.html    ~week5~readings.html
~week10~readings.html   ~week1~week-01.ppt      ~week6~readings.html
~week11~readings.html   ~week2~readings.html    ~week7~readings.html
~week12~readings.html   ~week2~week-02.ppt      ~week8~readings.html
~week13~readings.html   ~week3~assignment1.ppt  ~week9~readings.html
~week14~readings.html   ~week3~readings.html
~week15~readings.html   ~week3~week-03.ppt
jaga.cs.odu.edu:/home/mln/public_html/teaching/cs695-f03 % ls bucket/lib
CVS/ EZXML.pm mime.e style.css
jaga.cs.odu.edu:/home/mln/public_html/teaching/cs695-f03 % ls bucket/logs/
access.log CVS/ mylog.log
jaga.cs.odu.edu:/home/mln/public_html/teaching/cs695-f03 % ls bucket/methods/
addElement.pl*     getElement.pl*   listMethods.pl*     setPreference.pl*
CVS/               get_log.pl*      listPreference.pl*
deleteElement.pl*  getlog.pl*       log.pl*
display.pl*        getMetadata.pl*  setMetadata.pl*
jaga.cs.odu.edu:/home/mln/public_html/teaching/cs695-f03 %

75 Examples
- 1.6.X bucket
- 2.0 buckets
- 3.0 buckets (under development): uses MPEG-21 DIDLs, cf.

76 Buckets as Entire Archive
Archive Ingest and Handling Test (AIHT) project
- upcoming D-Lib Magazine Articles (December 2005)
implicit assumptions:
- 1 bucket = 1 logical item (N physical items)
- Display is for human use
- Bucket contents are DOM-parsable

77 Which Interface? Display based on web use
Display based on archival use

78 Bucket / MPEG-21 Model http://beatitude.cs.odu.edu:8080/bucket/
MPEG-21 DIDL Payload Bucket Infrastructure methods logs support libraries

79 MPEG-21 DIDL
A generic, powerful complex object metadata format
- Based on an abstract data model
- Semantics separated from syntax, i.e. the tags don’t mean anything -- a little disconcerting at first glance
- Digital library use championed by LANL

80 Mobility Model
Not filesystem semantics
- cp, mv, rm
Closer to process semantics
- fork, exec
- returns a bucket stream (fork)
- overwrites the bucket with the uploaded bucket stream
Arguments
- bucket, package, payload, ride
- format: only tar currently supported
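
The idea of a bucket stream in tar format can be sketched with Python's standard `tarfile` module. This is not the bucket implementation: `pack` and `unpack` are hypothetical helpers that serialize a bucket's files into an in-memory tar stream and rebuild them from it, and the file names merely echo the directory listing shown earlier in the deck.

```python
import io
import tarfile


def pack(files):
    """Serialize {name: bytes} into an in-memory tar stream."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, data in sorted(files.items()):
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def unpack(stream):
    """Rebuild {name: bytes} from a tar stream produced by pack()."""
    files = {}
    with tarfile.open(fileobj=io.BytesIO(stream), mode="r") as archive:
        for member in archive.getmembers():
            if member.isfile():
                files[member.name] = archive.extractfile(member).read()
    return files


original = {"bucket.xml": b"<bucket/>", "content/syllabus.txt": b"week 1"}
stream = pack(original)   # analogous to returning a bucket stream
copy = unpack(stream)     # analogous to overwriting from an uploaded stream
```

Because the stream carries everything the bucket holds, the receiving side needs no shared filesystem, which fits the process-like semantics on this slide.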

81 Intelligence
Shift of responsibility into the data objects opens up an entire new class of applications: data objects as intelligent agents
Premise: instead of having the data objects do nothing while they patiently wait to be accessed, have them do something useful while waiting ...

82 Conclusions
- Smart objects are an idea whose time has come: a natural progression of DL R&D
- Smart objects will play a fundamental role in digital preservation
- More info on preservation:

