Flexible and Extensible Digital Object and Repository Architecture (FEDORA)
Sandra Payette, Cornell University
MOA2/Cornell Architecture Meeting, December 10,
Background: CDLRG
- NCSTRL
- Dienst architecture
- Interoperability strawman proposal (see Leiner paper in next D-Lib)
- Open Architecture Research Program
- Cornell Reference Architecture for Distributed Digital Libraries (CRADDL)
- Flexible and Extensible Digital Object and Repository Architecture (FEDORA)
- Distributed Searching
- Resource Discovery and Metadata (Dublin Core effort, STARTS)
[Figure: Digital Library Interoperability between the Library of Congress and the Cornell Digital Library]
Cornell Reference Architecture for Distributed Digital Libraries (CRADDL)
- Open architecture
  - functionality partitioned into a set of well-defined services
  - services accessible via well-defined protocols
- Modularization
  - promotes interoperability
  - scalable to different clienteles (research library, informal web)
- Federation
  - enables aggregation into logical collections
- Distribution
  - of content (collections) and services
  - of administration and management of the DL
CRADDL: Component-Ware Digital Libraries
[Figure: component services: Repository Services (Digital Objects), Collection Services, Index Services, Name Service (Handles), UI Gateway Service]
FEDORA
- Repository Service
  - core service providing a reliable and secure means to store and disseminate digital content
  - interoperability with other CRADDL services
- Digital Object Model
  - container for aggregating any digital material
  - disseminations of complex content types, with rights management
  - global extensibility mechanisms
- Part of our broader effort to develop a component-ware digital library architecture
FEDORA: Conceptual Backdrop
- CNRI Digital Object Architecture (Kahn/Wilensky, Arms/Blanchi/Overly)
- Warwick Framework
- Distributed Active Relationships
FEDORA
- DigitalObject: container for content
  - Structure (raw data structure)
  - Interface (content views)
  - Mechanisms (executables)
- Repository: logical service
  - Service layer for “contained” DigitalObjects
  - Object lifecycle management
  - Secure environment for running mobile code
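To make the DigitalObject/Repository split concrete, here is a minimal Python sketch (the original prototype was Java/CORBA). It assumes a DigitalObject is just a URN, a set of MIME-typed DataStreams, and a map from behavior names to executables, and that the Repository offers ingest, get, purge, and disseminate operations. All of these names are hypothetical, and the secure execution environment for mobile code is not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class DataStream:
    """Raw structure: a MIME-typed byte stream."""
    mime_type: str
    content: bytes


@dataclass
class DigitalObject:
    """Container for content: structure plus named behaviors."""
    urn: str
    # Structure: raw data, keyed by a local id such as "DS1".
    datastreams: Dict[str, DataStream] = field(default_factory=dict)
    # Interface -> mechanism: behavior name mapped to the executable
    # that produces that view of the content.
    behaviors: Dict[str, Callable[["DigitalObject"], bytes]] = field(default_factory=dict)


class Repository:
    """Logical service layer that manages the lifecycle of contained objects."""

    def __init__(self) -> None:
        self._objects: Dict[str, DigitalObject] = {}

    def ingest(self, obj: DigitalObject) -> None:
        if obj.urn in self._objects:
            raise ValueError(f"{obj.urn} is already in the repository")
        self._objects[obj.urn] = obj

    def get(self, urn: str) -> DigitalObject:
        return self._objects[urn]

    def purge(self, urn: str) -> None:
        del self._objects[urn]

    def disseminate(self, urn: str, behavior: str) -> bytes:
        # Clients reach content only through a named behavior; a real
        # repository would sandbox the mechanism (not modeled here).
        obj = self.get(urn)
        return obj.behaviors[behavior](obj)


# Usage
repo = Repository()
obj = DigitalObject("urn:example:1", {"DS1": DataStream("text/plain", b"hello")})
obj.behaviors["getText"] = lambda o: o.datastreams["DS1"].content
repo.ingest(obj)
text = repo.disseminate("urn:example:1", "getText")
```

The point of the split is that clients never read `datastreams` directly; they ask the repository for a behavior, which leaves room for lifecycle and security policy in the service layer.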
Digital Library Content
- Simple, familiar content types
- Complex, compound, dynamic content types
FEDORA: Goals
- Normalization of digital library content: order the chaos
- Flexible notions of content while ensuring interoperability
- Stable interfaces as underlying mechanisms change
- Naturally evolving content type system: extensibility
- Community-driven content type development
- Complex aggregations of distributed content
- Rights management: leverage existing and future schemes
Multiple “views” of a DigitalObject
[Figure: one DataStream (MIME-typed byte stream) seen through several views: Dublin Core, Book, Diary-MOA, and a future view]
A Digital Object is... recognizable by what it can do
Example behaviors: getChapter, getPage, getTrack, getLabel, getSection, getArticle, getFrame, getLength
What the client sees vs. what the object is
[Figure: the client sees Content-Type Interfaces (Book, Dublin Core); the object is Structure plus Mechanism]
Content Type: a set of behaviors that formally describes the functionality of any global or domain-specific notion of content.
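The idea that an object is "recognizable by what it can do" can be sketched as a subset test: a content type is a named set of behavior names, and an object supports a type if it offers every behavior in that set. In this Python sketch the `Audio` type, the `conforms`/`recognize` helpers, and the Book behavior set are illustrative assumptions, not FEDORA definitions.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Set


@dataclass(frozen=True)
class ContentType:
    """A named set of behaviors that describes one notion of content."""
    name: str
    behaviors: FrozenSet[str]


# Hypothetical example types; Book's behaviors follow the Book example in this talk.
BOOK = ContentType("Book", frozenset({"getChapter", "getPage", "getTOC"}))
AUDIO = ContentType("Audio", frozenset({"getTrack", "getLength"}))


def conforms(offered: Set[str], content_type: ContentType) -> bool:
    """An object 'is' a content type if it offers every behavior the type names."""
    return content_type.behaviors <= offered


def recognize(offered: Set[str], known: Iterable[ContentType]) -> List[str]:
    """Name the content types an object supports, judged only by its behaviors."""
    return [ct.name for ct in known if conforms(offered, ct)]


offered = {"getChapter", "getPage", "getTOC", "getLabel"}
print(recognize(offered, [BOOK, AUDIO]))  # ['Book']
```

Note that the check never inspects the object's raw structure, which is the sense in which content types are orthogonal to structure.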
Disseminator: a generic component for associating a set of behaviors with a DigitalObject.
- Primitive Disseminator
- Content-Type Disseminator
FEDORA DigitalObject
[Figure: DigitalObject layers: Structural Kernel (DataStreams of type application/postscript and application/MARC), Primitive Disseminator, Content-Type Wrapper]
DigitalObject: Client communicates with the PrimitiveDisseminator
[Figure: a DigitalObject with DataStreams DS 1 (application/MARC) and DS 2 (application/postscript), a Primitive Disseminator, a Book Disseminator (GetChapter, GetTOC, GetPage), and a DublinCore Disseminator. Example exchange:
- ListContentTypes returns Book, DublinCore
- GetMethods(Book) returns GetChapter(n), GetTOC(), etc.
- GetDissemination(Book.GetPage(1))]
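The client exchange above can be sketched in Python as a primitive disseminator that routes ListContentTypes, GetMethods, and GetDissemination to per-type disseminators. This is a sketch under stated assumptions: the snake_case method names, the `attach` helper, and the mechanisms are hypothetical, and for simplicity DS 1 holds a plain-text record and DS 2 holds page text separated by form feeds rather than real MARC and PostScript.

```python
from typing import Callable, Dict, List

# A mechanism receives the object's DataStreams plus any call arguments.
Mechanism = Callable[..., str]


class ContentTypeDisseminator:
    """Associates one content type's behaviors with executable mechanisms."""

    def __init__(self, content_type: str, methods: Dict[str, Mechanism]) -> None:
        self.content_type = content_type
        self.methods = methods


class PrimitiveDisseminator:
    """The single entry point a client talks to; routes to content-type disseminators."""

    def __init__(self, datastreams: Dict[str, bytes]) -> None:
        self.datastreams = datastreams
        self._disseminators: Dict[str, ContentTypeDisseminator] = {}

    def attach(self, disseminator: ContentTypeDisseminator) -> None:
        self._disseminators[disseminator.content_type] = disseminator

    def list_content_types(self) -> List[str]:  # ListContentTypes
        return sorted(self._disseminators)

    def get_methods(self, content_type: str) -> List[str]:  # GetMethods(Book)
        return sorted(self._disseminators[content_type].methods)

    def get_dissemination(self, content_type: str, method: str, *args) -> str:
        # GetDissemination(Book.GetPage(1)) becomes
        # get_dissemination("Book", "GetPage", 1)
        mechanism = self._disseminators[content_type].methods[method]
        return mechanism(self.datastreams, *args)


# Hypothetical mechanisms over simplified DataStreams.
def get_page(ds: Dict[str, bytes], n: int) -> str:
    return ds["DS2"].decode().split("\f")[n - 1]


def get_toc(ds: Dict[str, bytes]) -> str:
    pages = ds["DS2"].decode().split("\f")
    return "\n".join(f"Page {i}" for i in range(1, len(pages) + 1))


def get_dc_record(ds: Dict[str, bytes]) -> str:
    return ds["DS1"].decode()


obj = PrimitiveDisseminator({"DS1": b"title=FEDORA", "DS2": b"page one\fpage two"})
obj.attach(ContentTypeDisseminator("Book", {"GetPage": get_page, "GetTOC": get_toc}))
obj.attach(ContentTypeDisseminator("DublinCore", {"GetDCRecord": get_dc_record}))
```

A client first calls `list_content_types()`, then `get_methods("Book")`, then `get_dissemination("Book", "GetPage", 1)`, without ever touching the DataStreams.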
Content Type Principles
- Stability
- Orthogonality to structure
- Extensibility
These are achieved in FEDORA through the architectural segregation of DigitalObject structure, mechanisms, and content-type interfaces.
FEDORA: Interface Stability
[Figure: layers labeled Structure, Mechanism, and Interface (Content Type)]
Mechanisms can be updated or replaced as technology changes... and the content interface to the Digital Object remains stable.
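A small Python sketch of interface stability, assuming the Book interface is a single `get_page` method and that the disseminator holds a replaceable mechanism object. Both mechanism classes are hypothetical: one re-parses the DataStream on every call, the other indexes pages once. The client function is written against the interface only, so it returns the same result before and after the swap.

```python
from abc import ABC, abstractmethod
from typing import List


class BookMechanism(ABC):
    """Executable behind the Book interface; replaceable over time."""

    @abstractmethod
    def get_page(self, n: int) -> str: ...


class SplitOnEachCall(BookMechanism):
    """Hypothetical first mechanism: re-parses the DataStream on every call."""

    def __init__(self, raw: bytes) -> None:
        self.raw = raw

    def get_page(self, n: int) -> str:
        return self.raw.decode().split("\f")[n - 1]


class IndexedPages(BookMechanism):
    """Hypothetical replacement: parses once and indexes the pages."""

    def __init__(self, raw: bytes) -> None:
        self.pages: List[str] = raw.decode().split("\f")

    def get_page(self, n: int) -> str:
        return self.pages[n - 1]


class BookDisseminator:
    """Stable Book interface; the mechanism behind it can be swapped."""

    def __init__(self, mechanism: BookMechanism) -> None:
        self._mechanism = mechanism

    def replace_mechanism(self, mechanism: BookMechanism) -> None:
        self._mechanism = mechanism

    def get_page(self, n: int) -> str:
        return self._mechanism.get_page(n)


def client(book: BookDisseminator) -> str:
    # Client code depends only on get_page, never on the mechanism.
    return book.get_page(2)


raw = b"page one\fpage two"  # the structure (DataStream) does not change
book = BookDisseminator(SplitOnEachCall(raw))
before = client(book)
book.replace_mechanism(IndexedPages(raw))
after = client(book)
```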
Digital Object Extensibility: Adding New Content Types
[Figure: one Structure with a Book interface and mechanism, extended with a Photo Collection interface and mechanism]
The same underlying data (a Book) can be operated on in novel ways (as a Photo Collection) to create new disseminations not originally conceived of for the particular digital object.
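The Book-to-Photo-Collection example can be sketched in Python by attaching a second disseminator to an object whose DataStreams stay untouched. The structure here (one placeholder image per page, keyed `page1`, `page2`) and the method names `GetPage`, `ListPhotos`, and `GetPhoto` are hypothetical assumptions chosen for illustration.

```python
from typing import Callable, Dict


class DigitalObject:
    """Structure (DataStreams) plus any number of content-type disseminators."""

    def __init__(self, datastreams: Dict[str, bytes]) -> None:
        self.datastreams = datastreams
        self.disseminators: Dict[str, Dict[str, Callable]] = {}

    def add_disseminator(self, content_type: str, methods: Dict[str, Callable]) -> None:
        # Adding a content type attaches behavior; the DataStreams are untouched.
        self.disseminators[content_type] = methods

    def disseminate(self, content_type: str, method: str, *args):
        return self.disseminators[content_type][method](self.datastreams, *args)


# Hypothetical structure: one scanned image per page (placeholder bytes).
scans = {"page1": b"<image 1>", "page2": b"<image 2>"}
book = DigitalObject(scans)
book.add_disseminator("Book", {"GetPage": lambda ds, n: ds[f"page{n}"]})

# Later, a Photo Collection view is added over the very same data.
book.add_disseminator("PhotoCollection", {
    "ListPhotos": lambda ds: sorted(ds),
    "GetPhoto": lambda ds, name: ds[name],
})
```

Both views return the same bytes for page 2; only the behaviors used to reach them differ.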
Content Type Extensibility
- There must be a way to identify, register, and proliferate content types in the global digital library infrastructure.
- Content types must become persistent, named entities in the digital library infrastructure.
- How? Content-type definitions and mechanisms are disseminated from named DigitalObjects (using FEDORA’s own architectural abstractions).
A Content-Type Disseminator Is a Generic Component...
... that references another FEDORA DigitalObject, which disseminates a content-type servlet
[Figure: a DigitalObject with DataStreams DS 1 (application/MARC) and DS 2 (application/postscript) and a DC disseminator (GetDCField, GetDCRecord) configured with DataStreams = DS 1 and ContentTypeID = URN DC1; GetMethods(DC) returns GetDCField(e), GetDCRecord]
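How Is Content-Type Extensibility Achieved?
[Figure: GetDissemination(GetDCRecord) on a DigitalObject (DataStreams application/MARC and application/postscript; DC disseminator with CTID = URN DC1) returns a DublinCore Record produced by the DC servlet. The DC signature (DC MethodList: GetDCField, GetDCRecord) is disseminated by a Signature Disseminator in the DigitalObject named URN DC; the DC Mechanism servlet is disseminated by a Servlet Disseminator in the DigitalObject named URN DC1.]
A Digital Object attains its extended content-type behaviors through association and delegation.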
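Association and delegation can be sketched in Python as follows: the object's DC disseminator stores only a content-type ID (a URN) and its DataStream bindings, and at request time it resolves that URN to fetch the signature and the servlet from other objects. In this sketch `REGISTRY` is a hypothetical in-memory stand-in for the name service and remote repositories, the `urn:example:` URNs are placeholders, and the DC record is simplified to `key=value` lines. The link from the servlet object back to its signature URN is an assumption, not something the slide specifies.

```python
from typing import Any, Callable, Dict, List

# Hypothetical stand-in for name service + repositories: URN -> DigitalObject.
REGISTRY: Dict[str, Any] = {}


class SignatureObject:
    """DigitalObject whose Signature Disseminator returns a content type's method list."""

    def __init__(self, methods: List[str]) -> None:
        self.methods = methods

    def get_signature(self) -> List[str]:
        return list(self.methods)


class ServletObject:
    """DigitalObject whose Servlet Disseminator returns the executable mechanism.

    Assumption: it records the URN of the signature it implements.
    """

    def __init__(self, signature_urn: str, servlet: Dict[str, Callable]) -> None:
        self.signature_urn = signature_urn
        self.servlet = servlet

    def get_servlet(self) -> Dict[str, Callable]:
        return self.servlet


class ContentTypeDisseminator:
    """Holds only a reference (CTID) and DataStream bindings; behavior is delegated."""

    def __init__(self, ctid: str, datastream_ids: List[str]) -> None:
        self.ctid = ctid
        self.datastream_ids = datastream_ids

    def get_methods(self) -> List[str]:
        servlet_obj = REGISTRY[self.ctid]
        return REGISTRY[servlet_obj.signature_urn].get_signature()

    def get_dissemination(self, datastreams: Dict[str, bytes], method: str, *args) -> str:
        if method not in self.get_methods():
            raise AttributeError(f"{method} is not in the content type's signature")
        servlet = REGISTRY[self.ctid].get_servlet()
        bound = [datastreams[i] for i in self.datastream_ids]
        return servlet[method](bound, *args)


def dc_record(bound: List[bytes]) -> str:
    return bound[0].decode()


def dc_field(bound: List[bytes], element: str) -> str:
    for line in bound[0].decode().splitlines():
        key, _, value = line.partition("=")
        if key == element:
            return value
    raise KeyError(element)


REGISTRY["urn:example:DC"] = SignatureObject(["GetDCField", "GetDCRecord"])
REGISTRY["urn:example:DC1"] = ServletObject(
    "urn:example:DC", {"GetDCField": dc_field, "GetDCRecord": dc_record}
)

# The object's DC disseminator: DataStreams = DS 1, ContentTypeID = URN DC1.
dc = ContentTypeDisseminator("urn:example:DC1", ["DS1"])
datastreams = {"DS1": b"title=FEDORA\ncreator=Payette", "DS2": b"%!PS-Adobe"}
```

Because the disseminator holds only URNs, updating the servlet object changes the behavior of every object that references it, without editing those objects.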
Registration and Proliferation of Content Types
- A content type becomes registered when the URN of the DigitalObject that disseminates its signature is registered (in a DL name service).
- A content type becomes usable when the URN of the DigitalObject that disseminates its servlet is registered.
- Other DigitalObjects can use content types by referencing them by these URNs.
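The two registration states can be sketched in Python with a toy name service. `NameService`, `content_type_status`, the location strings, and the `urn:example:` URNs are all hypothetical. The sketch also assumes a servlet only counts once its signature is registered, which is one reading of the rules above rather than a stated FEDORA policy.

```python
from typing import Dict, Optional


class NameService:
    """Hypothetical DL name service mapping URNs to repository locations."""

    def __init__(self) -> None:
        self._names: Dict[str, str] = {}

    def register(self, urn: str, location: str) -> None:
        self._names[urn] = location

    def resolve(self, urn: str) -> Optional[str]:
        return self._names.get(urn)


def content_type_status(ns: NameService, signature_urn: str, servlet_urn: str) -> str:
    # Registered: the signature object's URN resolves.
    # Usable: the servlet object's URN resolves as well (assumption: a
    # servlet only counts once its signature is registered).
    if ns.resolve(signature_urn) is None:
        return "unregistered"
    if ns.resolve(servlet_urn) is None:
        return "registered"
    return "usable"


ns = NameService()
SIG, SERVLET = "urn:example:DC", "urn:example:DC1"
print(content_type_status(ns, SIG, SERVLET))  # unregistered
ns.register(SIG, "repository-a")
print(content_type_status(ns, SIG, SERVLET))  # registered
ns.register(SERVLET, "repository-b")
print(content_type_status(ns, SIG, SERVLET))  # usable
```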
Access Management
- Must have facilities to protect content
- No single solution
- Association of existing, external rights management schemes
- Accommodate new schemes
- FEDORA applies the same extensibility model to rights management...
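AccessManager Mechanisms
[Figure: DigitalObject URN 1 holds DataStreams application/MARC and text/x-acl; its DC disseminator (GetDCField, GetDCRecord) is protected by an AccessManager, which uses an external ACL mechanism servlet disseminated by a Servlet Disseminator in DigitalObject URN ACL1]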
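A Python sketch of the AccessManager pattern, assuming the object's `text/x-acl` DataStream holds lines of the form `Method: user1, user2` and that an external mechanism (here the hypothetical `acl_servlet` function) turns that text into rules. The ACL syntax, the default-deny policy, and the class names are illustrative assumptions, not FEDORA's actual rights model.

```python
from typing import Callable, Dict, Set

AclRules = Dict[str, Set[str]]


def acl_servlet(acl_text: str) -> AclRules:
    """Hypothetical external ACL mechanism: parses 'Method: user1, user2' lines."""
    rules: AclRules = {}
    for line in acl_text.splitlines():
        if ":" in line:
            method, users = line.split(":", 1)
            rules[method.strip()] = {u.strip() for u in users.split(",") if u.strip()}
    return rules


class AccessManager:
    """Interprets the object's text/x-acl DataStream with an external mechanism."""

    def __init__(self, acl_datastream: bytes, mechanism: Callable[[str], AclRules]) -> None:
        self.rules = mechanism(acl_datastream.decode())

    def check(self, user: str, method: str) -> bool:
        # Default deny: a method with no rule is not accessible.
        return user in self.rules.get(method, set())


class ProtectedDisseminator:
    """A disseminator whose every request passes through the AccessManager."""

    def __init__(self, methods: Dict[str, Callable[[], str]], manager: AccessManager) -> None:
        self.methods = methods
        self.manager = manager

    def get_dissemination(self, user: str, method: str) -> str:
        if not self.manager.check(user, method):
            raise PermissionError(f"{user} may not invoke {method}")
        return self.methods[method]()


acl = b"GetDCRecord: alice, bob\nGetDCField: alice"
manager = AccessManager(acl, acl_servlet)
dc = ProtectedDisseminator(
    {"GetDCRecord": lambda: "title=FEDORA", "GetDCField": lambda: "FEDORA"}, manager
)
```

Swapping `acl_servlet` for a different mechanism changes the rights scheme without touching the disseminator, mirroring how content-type servlets are swapped.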
Current Status
- Full reference implementation
  - CORBA IDL defines all component interfaces
  - Java/CORBA prototype system complete
  - Java client application for building and accessing digital objects
- Initial demonstration content types
  - Dublin Core
  - Article/Technical Report
  - Book (with CNRI / Library of Congress)
  - Photo
CNRI/Cornell Interoperability Project
- CNRI and Library of Congress partners
- Developed Joint Interface Definition
  - agreement on all conceptual abstractions
  - merger of RAP and FEDORA IDL
- Separate repository implementations
  - CNRI using Visigenics ORB
  - Cornell using Iona’s OrbixWeb ORB
- Test collections of Digital Objects
  - CNRI: Library of Congress materials (books, journals, photographs, speeches)
  - Cornell: NCSTRL research collections
CNRI/Cornell Interoperability Experiments
- IT0: Fundamental Communication
  - Inter-ORB communication
  - IDL recognition: request invocation; proper return types
  - Status: Success (October 1998)
- IT1: Functional Interoperability
  - create/access DigitalObjects in each repository
  - exercise all operations on each other’s repositories
  - Status: In Progress (completion 12/18)
- IT2: Content-Type Servlet Interoperability
  - dynamic loading and running of remote servlets
FEDORA: Planned Research
- Scale up: demonstrate complex content types and servlets with CNRI and LC
- Integration of new community-developed content types (e.g., MOA2)
- Access Management
- Reliability, security, integrity (DLI2: CS/Cornell University Library)
For more information:
CDLRG References
- Lagoze and Payette: An Infrastructure for Open-Architecture Digital Libraries
- Payette and Lagoze: Flexible and Extensible Digital Object and Repository Architecture (FEDORA)
- Lagoze and Fielding: Defining Collections in Distributed Digital Libraries
- Distributed Search and Resource Discovery