The Fedora Project March 10, 2003 Sandy Payette Cornell Information Science
Motivation The Problem of Complex Content
Digital Library Content not just documents ... Some familiar objects Complex, compound, dynamic objects
Key Research Questions
- How can clients interact with heterogeneous collections of complex objects in a simple and interoperable manner?
- How can complex objects be designed to be both generic and genre-specific at the same time?
- How can we associate services and tools with objects to provide different presentations or transformations of the object content?
- How can we associate specialized, fine-grained access control policies with specific objects, or with groups of objects?
- How can we facilitate the long-term management and preservation of complex objects with dependencies on distributed content and services?
Shortcomings of commercial digital library products
- Narrow focus on specific media formats (e.g. image databases, document management)
- Fail to effectively address interrelationships among digital entities
- Fail to address interoperability: no open interfaces to facilitate sharing of services; no standard protocols for cross-system interoperability
- Fail to provide facilities for managing programs and tools that are integral to delivering digital content
- Not extensible; do not enable easy integration of new tools and services
- Do not address fine-grained access control and preservation issues
The Flexible Extensible Digital Object Repository Architecture (FEDORA)
- DARPA and NSF-funded research at Cornell (1997-present)
  - CORBA-based reference implementation (Payette/Lagoze)
  - Extensive interoperability testing (with Arms/Blanchi/Overly)
  - Policy Enforcement (Payette/Schneider)
- Interpreted and re-implemented at U of Virginia (1999-)
  - Simple web-oriented implementation, focused on access to collections
  - Java servlet and relational db
  - Testbed of 10,000,000 objects with performance metrics (1999-2001)
- Mellon-Funded FEDORA Software (2002-)
  - University of Virginia and Cornell: joint development
  - Open source
  - Web services and XML
  - Mediation of distributed services
  - Preservation focus
The Fedora Architecture
- Digital Object Model
- The Repository
- Web Services
FEDORA Basic Object Architecture: Digital Object Model
- Container to aggregate digital content of any type
  - Data or metadata
  - Local or distributed
- “Behavior” definitions (like abstract interfaces)
- Hooks to external services
- Enables multiple “disseminations” of content
Digital Object Model Functional View dynamic Application services
Digital Object Model: Architectural View
- Persistent ID (PID): globally unique persistent id
- Disseminators: public view; access methods for obtaining “disseminations” of digital object content
- System Metadata: internal view; metadata necessary to manage the object
- Datastreams: protected view; content that makes up the “basis” of the object
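The four parts of the architectural view can be pictured as a small data structure. The following is only an illustrative sketch, not Fedora's actual classes: the names `Datastream`, `DigitalObject`, `disseminate`, the `demo:1` PID, and the `example.org` URL are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Datastream:
    """Protected view: one piece of content (data or metadata)."""
    stream_id: str
    mime_type: str
    location: str  # local path or remote URL; content may be distributed


@dataclass
class DigitalObject:
    pid: str  # globally unique persistent ID
    # Internal view: metadata needed to manage the object
    system_metadata: Dict[str, str] = field(default_factory=dict)
    # Protected view: the content that forms the basis of the object
    datastreams: Dict[str, Datastream] = field(default_factory=dict)
    # Public view: method name -> callable that produces a dissemination
    disseminators: Dict[str, Callable[["DigitalObject"], str]] = field(default_factory=dict)

    def disseminate(self, method: str) -> str:
        """Run a named access method and return its dissemination."""
        return self.disseminators[method](self)


# Hypothetical object with one datastream and one access method
obj = DigitalObject(pid="demo:1", system_metadata={"state": "active"})
obj.datastreams["DS1"] = Datastream("DS1", "image/jpeg", "http://example.org/img.jpg")
obj.disseminators["getThumbnail"] = lambda o: "thumbnail of " + o.datastreams["DS1"].location
print(obj.disseminate("getThumbnail"))
```

Clients in this sketch only ever call `disseminate`; the datastreams stay behind the public view.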
Digital Object Model: Service Relationships
- Behavior Definition Object: Persistent ID (PID), Service Definition Metadata (WSDL), System Metadata, Datastreams
- Data Object: Persistent ID (PID), System Metadata, Datastreams, Disseminators
- Behavior Mechanism Object: Persistent ID (PID), Service Binding Metadata (WSDL), System Metadata, Datastreams
- External Service
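One way to read these relationships: a data object's disseminator names a behavior definition (the abstract methods) and a behavior mechanism (the binding to an external service). The sketch below assumes that reading. All class names, PIDs, and the `scaled(...)` stand-in for an external service are hypothetical; a real mechanism object would carry WSDL binding metadata rather than Python callables.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class BehaviorDefinition:
    pid: str
    methods: List[str]  # abstract method names only, no implementation


@dataclass
class BehaviorMechanism:
    pid: str
    bdef_pid: str  # the behavior definition this mechanism implements
    bindings: Dict[str, Callable[[str], str]]  # method -> stand-in for an external service


@dataclass
class Disseminator:
    bdef_pid: str
    bmech_pid: str
    datastream_url: str  # datastream handed to the external service


def disseminate(diss, bdefs, bmechs, method):
    """Resolve a method call through the definition and mechanism objects."""
    bdef = bdefs[diss.bdef_pid]
    bmech = bmechs[diss.bmech_pid]
    if method not in bdef.methods:
        raise ValueError(f"{method} is not defined by {bdef.pid}")
    if bmech.bdef_pid != bdef.pid:
        raise ValueError(f"{bmech.pid} does not implement {bdef.pid}")
    return bmech.bindings[method](diss.datastream_url)


bdefs = {"demo:bdef-image": BehaviorDefinition("demo:bdef-image", ["getThumbnail"])}
bmechs = {
    "demo:bmech-image": BehaviorMechanism(
        "demo:bmech-image",
        "demo:bdef-image",
        {"getThumbnail": lambda url: f"scaled({url})"},
    )
}
diss = Disseminator("demo:bdef-image", "demo:bmech-image", "http://example.org/img.jpg")
result = disseminate(diss, bdefs, bmechs, "getThumbnail")
print(result)
```

Swapping in a different mechanism object changes which service runs without changing the method names clients see.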
FEDORA Basic Repository Architecture: Repository System
- Object Management
  - Lifecycle (ingest/create, store, delete, approve, purge)
  - Validation
  - PID Generation
  - Version management
  - Access Control
  - Preservation support
- Object Access
  - Object Dissemination
  - Object Reflection
- Service Mediation
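A few of the object management functions listed above (PID generation, validation, versioning, delete versus purge) can be sketched with an in-memory store. This is a toy under stated assumptions: the `Repository` class, its method names, the `namespace:number` PID format, and the rule that "delete" marks an object while "purge" removes it are all choices made for this example, not the Fedora API.

```python
import itertools


class Repository:
    def __init__(self, namespace="demo"):
        self._namespace = namespace
        self._counter = itertools.count(1)
        self._objects = {}  # pid -> {"state": str, "versions": list}

    def _next_pid(self):
        # PID generation: namespace plus a running number (assumed format)
        return f"{self._namespace}:{next(self._counter)}"

    def ingest(self, content):
        # Validation: reject empty content before it is stored
        if not content:
            raise ValueError("cannot ingest empty content")
        pid = self._next_pid()
        self._objects[pid] = {"state": "active", "versions": [content]}
        return pid

    def modify(self, pid, content):
        # Version management: keep every earlier version
        self._objects[pid]["versions"].append(content)

    def delete(self, pid):
        # Assumption: delete only marks the object; it stays in storage
        self._objects[pid]["state"] = "deleted"

    def purge(self, pid):
        # Assumption: purge removes the object permanently
        del self._objects[pid]

    def get(self, pid, version=-1):
        record = self._objects[pid]
        if record["state"] != "active":
            raise LookupError(f"{pid} is not active")
        return record["versions"][version]


repo = Repository()
pid = repo.ingest("first version")
repo.modify(pid, "second version")
print(pid, repo.get(pid), repo.get(pid, 0))
```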
Fedora: A Programmer’s View Understanding the system implementation Web Services Server Design
What is a Web Service?
- A distributed application that runs over the Internet.
- An addressable network endpoint that receives structured messages and returns structured responses.
- A web application that publishes an open interface through which clients can send requests and receive responses.
How is this different from plain old web applications?
- A formally defined API (application programming interface) defines a set of abstract operations for a web service
- Published bindings for clients to run operations
- Standard protocol for invoking operations on the service
- XML as the standard means of encoding service requests and responses
Why are Web Services important?
- Interoperability
  - Web applications can interact and build upon each other
  - Data is transferred in an interoperable manner (HTTP)
  - Data is encoded in an interoperable format (XML)
  - Works in a decentralized, distributed, operating-system-independent environment
- Standards-oriented
- Means to expose complex operations with rich data typing (via XML Schema language typing)
- Ease of integrating distributed systems via the Web
- W3C effort to develop this service architecture
How are Web Services Implemented? Simple Object Access Protocol (SOAP)
- SOAP is a messaging protocol that can run over different transport protocols (e.g., HTTP, SMTP)
- Operation oriented (send a request to an endpoint)
- Like CORBA, RMI, DCOM… but for the Web, and simpler
- Application APIs can be defined and published using the Web Services Description Language (WSDL)
- Requests and responses are sent as XML messages
- Supports simple and complex data typing in requests and responses
- Supports transmission of binary data within request or response packages
How are Web Services Implemented? REST (Representational State Transfer)
- URI + HTTP + XML
  - URI/resource driven; message built into a URL
  - HTTP GET or POST
  - Response is XML data
- Issues:
  - Not a standard, but a style of building web applications. Arguably it just names what many web applications already do by default, made a little more uniform by using XML.
  - Fragile service definition: URLs change
  - No data typing on requests
  - Limited ability to transmit complex requests in a URL
- W3C is behind SOAP; one strong voice out there for REST (Prescod).
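To make "message built into a URL" concrete, here is a small sketch using Python's standard `urllib.parse`. The base URL, the operation name, and the `max` parameter are hypothetical. The round trip also illustrates the "no data typing" issue: the integer sent as `max=3` comes back as the string `"3"`.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_request_url(base, operation, **params):
    """Encode an operation and its arguments entirely in the URL."""
    query = urlencode(sorted(params.items()))
    return f"{base}/{operation}?{query}" if query else f"{base}/{operation}"


# Hypothetical endpoint and operation, for illustration only
url = build_request_url(
    "http://example.org/service", "spellingSuggestion", phrase="payet", max=3
)
print(url)

# On the receiving side every argument is just a string
received = parse_qs(urlsplit(url).query)
print(received)
```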
Example of Web Service using SOAP
[Diagram: My Application sends a SOAP request (XML) over SOAP/HTTP to the Google Web Service, calling doSpellingSuggestion(payet); the SOAP response (XML) comes back over SOAP/HTTP with "payette".]
XML SOAP Request
<?xml version="1.0" encoding="UTF-8"?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/1999/XMLSchema">
  <SOAP-ENV:Body>
    <m:doSpellingSuggestion xmlns:m="urn:GoogleSearch">
      <key>/e325JlNPASJu</key>
      <phrase>payet</phrase>
    </m:doSpellingSuggestion>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
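A request envelope like the one above could be assembled in code rather than by hand. The sketch below uses Python's standard `xml.etree.ElementTree`. The function name `build_spelling_request` and the key value `demo-key` are made up for this example, and the envelope omits the `xsi`/`xsd` namespace declarations because nothing in the request body uses them.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
GOOGLE_NS = "urn:GoogleSearch"


def build_spelling_request(key, phrase):
    """Return a SOAP envelope (as text) that calls doSpellingSuggestion."""
    ET.register_namespace("SOAP-ENV", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    operation = ET.SubElement(body, f"{{{GOOGLE_NS}}}doSpellingSuggestion")
    # The two arguments are plain, unqualified child elements
    ET.SubElement(operation, "key").text = key
    ET.SubElement(operation, "phrase").text = phrase
    return ET.tostring(envelope, encoding="unicode")


request_xml = build_spelling_request("demo-key", "payet")
print(request_xml)
```

The resulting text would then be sent as the body of an HTTP POST to the service endpoint.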
XML SOAP Response
<?xml version="1.0" encoding="UTF-8"?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/1999/XMLSchema">
  <SOAP-ENV:Body>
    <ns1:doSpellingSuggestionResponse xmlns:ns1="urn:GoogleSearch"
        SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
      <return xsi:type="xsd:string">payette</return>
    </ns1:doSpellingSuggestionResponse>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
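On the client side, the suggestion has to be pulled back out of the response envelope. A minimal sketch with `xml.etree.ElementTree` follows; the function name `extract_suggestion` is an assumption, and the XML declaration is left out of the embedded sample only to keep the string literal simple.

```python
import xml.etree.ElementTree as ET

RESPONSE = """
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/1999/XMLSchema">
  <SOAP-ENV:Body>
    <ns1:doSpellingSuggestionResponse xmlns:ns1="urn:GoogleSearch"
        SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
      <return xsi:type="xsd:string">payette</return>
    </ns1:doSpellingSuggestionResponse>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
"""

# Prefixes used in the search path; they need not match the document's own
NAMESPACES = {
    "soap": "http://schemas.xmlsoap.org/soap/envelope/",
    "g": "urn:GoogleSearch",
}


def extract_suggestion(xml_text):
    """Return the text of the <return> element inside the SOAP body."""
    root = ET.fromstring(xml_text)
    result = root.find("soap:Body/g:doSpellingSuggestionResponse/return", NAMESPACES)
    if result is None:
        raise ValueError("no doSpellingSuggestionResponse in SOAP body")
    return result.text


suggestion = extract_suggestion(RESPONSE)
print(suggestion)
```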
Fedora and Web Services
- Fedora Repository system exposed as two related Web services
  - Access (API-A) and Management (API-M)
  - Both described using WSDL
  - Both have SOAP and HTTP bindings
- Back-end services
  - Digital object behaviors implemented as linkages to other distributed web services
  - Service binding metadata (WSDL) stored in special Fedora objects
  - Fedora Repository system acts as a mediator to these services
Fedora: Web Services View
Fedora Server Design 3-Tiered Architecture Modular & Extensible System Diagram
Server Design: 3 Layers
- Interface: service exposure. API-A, API-M, pure HTTP and SOAP via HTTP.
- Application Logic: implements requests in terms of the Fedora object model.
- Storage: database, filesystem, object serializations and cache(s).
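The layering can be sketched as three small classes, each of which talks only to the layer beneath it. Everything here is illustrative: the class names, the `/listDatastreams/<pid>` path scheme, and the use of JSON as the object serialization are assumptions for this example, not the Fedora server's design.

```python
import json


class Storage:
    """Storage layer: keeps object serializations (here, JSON text in a dict)."""

    def __init__(self):
        self._serialized = {}

    def write(self, pid, record):
        self._serialized[pid] = json.dumps(record)

    def read(self, pid):
        return json.loads(self._serialized[pid])


class ApplicationLogic:
    """Application logic layer: answers requests in terms of the object model."""

    def __init__(self, storage):
        self._storage = storage

    def list_datastreams(self, pid):
        return sorted(self._storage.read(pid)["datastreams"])


class AccessInterface:
    """Interface layer: turns an incoming request path into a logic call."""

    def __init__(self, logic):
        self._logic = logic

    def handle(self, path):
        # Hypothetical path scheme: /listDatastreams/<pid>
        operation, pid = path.strip("/").split("/", 1)
        if operation != "listDatastreams":
            raise ValueError(f"unsupported operation: {operation}")
        return self._logic.list_datastreams(pid)


storage = Storage()
storage.write("demo:1", {"datastreams": {"DS2": "text/xml", "DS1": "image/jpeg"}})
interface = AccessInterface(ApplicationLogic(storage))
print(interface.handle("/listDatastreams/demo:1"))
```

Because the interface layer never touches storage directly, a SOAP front end could be added beside the path-based one without changing the other two layers.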
System Diagram
Fedora: Implementation Technologies
- Fedora Web Services Layer
  - Apache Axis for SOAP over HTTP
  - Apache Tomcat 4.1
- Core Repository System
  - Sun Java J2SDK 1.4
  - Xerces 2-2.0.2 for XML parsing and validation
  - Saxon 6.5 for XSLT transformation
  - Schematron 1.5 for validation
  - MySQL 3.23.52 and Mckoi relational databases
- Deployment Platforms
  - Windows 2000, NT, XP
  - Solaris
  - Linux
DEMO Local Repository www.fedora.info
Deployment Partners
- Los Alamos National Laboratory: Research Library
- Library of Congress: Motion Picture and Recorded Sound Division
- Indiana University: Digital Library group
- King's College London: Humanities Computing
- NYU: Humanities Computing
- Northwestern University: Academic Computing
- Oxford: Oxford Digital Library and The Refugee Studies Center
- Tufts: Digital Collections and Archives Department