Published by Lenard King. Modified over 9 years ago.
1
FEDORA Project
McGill University, May 17 2004
Bill Parod
Academic Technologies, Northwestern University
bill-parod@northwestern.edu
2
Priorities for digital libraries
Managing digital resources as if they are all the same
Delivering digital resources as if they are all unique and free to participate in any number of contexts
Supporting digital scholarship wherever it may lead
Slide courtesy of Sandy Payette and Thornton Staples
3
Shortcomings of commercial digital library products
Narrow focus on specific media formats (e.g. image databases, document management)
Fail to effectively address interrelationships among digital entities
Fail to address interoperability
Fail to provide facilities for managing programs and tools that deliver digital content
Not extensible; do not enable easy integration of new tools and services
Slide courtesy of Sandy Payette and Thornton Staples
4
The Flexible Extensible Digital Object Repository Architecture (FEDORA)
Developed as a DARPA and NSF-funded research project at Cornell (1997-present)
Interpreted and re-implemented at University of Virginia (1999)
Virginia prototype supported a testbed of 10,000,000 digital objects with very good results (1999-2001)
Andrew W. Mellon Foundation granted Virginia and Cornell $1,000,000 to develop a full-featured, web-based production FEDORA system (2002+)
Slide courtesy of Sandy Payette and Thornton Staples
5
Digital Object Model
Slide courtesy of Sandy Payette and Thornton Staples
6
Digital Object Model: Architectural View
Persistent ID (PID): globally unique persistent id
Disseminators: public view; access methods for obtaining “disseminations” of digital object content
System Metadata: internal view; metadata necessary to manage the object
Datastreams: protected view; content that makes up the “basis” of the object
Slide courtesy of Sandy Payette and Thornton Staples
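The four parts of the architectural view can be sketched as a small data structure. This is an illustration only, not Fedora's actual storage format or API: the class name, field names, the `demo:1` PID and the `THUMB` datastream id are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DigitalObject:
    """Hypothetical stand-in for the four parts named on the slide."""
    pid: str  # globally unique persistent id
    # Internal view: metadata needed to manage the object.
    system_metadata: Dict[str, str] = field(default_factory=dict)
    # Protected view: the content that forms the basis of the object.
    datastreams: Dict[str, bytes] = field(default_factory=dict)
    # Public view: method name -> function that produces a dissemination.
    disseminators: Dict[str, Callable[["DigitalObject"], bytes]] = field(
        default_factory=dict
    )

    def list_methods(self) -> List[str]:
        """Names of the access methods this object exposes."""
        return sorted(self.disseminators)

    def disseminate(self, method: str) -> bytes:
        """Run one access method; callers never read datastreams directly."""
        return self.disseminators[method](self)


obj = DigitalObject(
    pid="demo:1",  # hypothetical PID
    datastreams={"THUMB": b"thumb-bytes"},
    disseminators={"getThumbnail": lambda o: o.datastreams["THUMB"]},
)
```

Clients see only `list_methods` and `disseminate`, which mirrors the split between the public view and the protected datastreams.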
7
Digital Object Model Example
Persistent ID (PID)
Disseminators
  – Default: Get Profile, List Items, Get Item, List Methods, Get DC Record
  – Simple Image: Get Thumbnail, Get Medium, Get High, Get VeryHigh
System Metadata
Datastreams
Slide courtesy of Sandy Payette and Thornton Staples
8
Behavior Contracts
[Diagram: three object types and a web service]
Behavior Definition Object: Persistent ID (PID), Behavior Definition Metadata, System Metadata, Datastreams
Behavior Mechanism Object: Persistent ID (PID), Service Binding Metadata (WSDL), System Metadata, Datastreams
Data Object: Persistent ID (PID), Disseminators, System Metadata, Datastreams
Web Service
Relationships shown: behavior contract, behavior subscription, data contract
Slide courtesy of Sandy Payette and Thornton Staples
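One way to read the three relationships in the diagram is sketched below. It assumes that a behavior mechanism promises to implement every method of its behavior definition (behavior contract), that it declares which datastreams it needs from a data object (data contract), and that a data object subscribes to a definition through a disseminator. The diagram's arrows are not recoverable from the transcript, so this reading, and all class names, PIDs and datastream ids, are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet

Datastreams = Dict[str, bytes]


@dataclass(frozen=True)
class BehaviorDefinition:
    pid: str
    methods: FrozenSet[str]  # abstract method names only


@dataclass
class BehaviorMechanism:
    pid: str
    bdef: BehaviorDefinition
    required_datastreams: FrozenSet[str]  # data contract (assumed reading)
    bindings: Dict[str, Callable[[Datastreams], bytes]]  # stand-in for WSDL bindings

    def honors_behavior_contract(self) -> bool:
        """True when every method in the definition has a binding."""
        return self.bdef.methods <= set(self.bindings)

    def data_contract_met(self, datastreams: Datastreams) -> bool:
        """True when the data object supplies every required datastream."""
        return self.required_datastreams <= set(datastreams)


def disseminate(bmech: BehaviorMechanism, datastreams: Datastreams, method: str) -> bytes:
    """Dispatch one method of the subscribed definition through the mechanism."""
    if method not in bmech.bdef.methods:
        raise KeyError(f"{method} is not part of {bmech.bdef.pid}")
    if not bmech.data_contract_met(datastreams):
        raise ValueError("data object does not satisfy the data contract")
    return bmech.bindings[method](datastreams)


# Hypothetical PIDs and datastream id, for illustration only.
image_bdef = BehaviorDefinition("demo:bdef-image", frozenset({"getThumbnail"}))
image_bmech = BehaviorMechanism(
    pid="demo:bmech-image",
    bdef=image_bdef,
    required_datastreams=frozenset({"THUMB"}),
    bindings={"getThumbnail": lambda ds: ds["THUMB"]},
)
```

Because the definition holds only method names, a different mechanism can be bound to the same definition without changing the data objects that subscribe to it.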
9
Shared Image Behavior Definitions
Slide courtesy of Sandy Payette and Thornton Staples
10
Client and Web Service Interactions
[Diagram components: user, web browser, client application, server application, Fedora Service APIs, Fedora Repository System, External Service Dispatch, Content Transform Service API]
Slide courtesy of Sandy Payette and Thornton Staples
11
Fedora 1.2 Software Feature Set
Open Fedora APIs
  – Repository as web services (REST and SOAP bindings); WSDL interface defs
Flexible Digital Object Model
  – Content View: objects as bundle of items (content and metadata)
  – Service View: objects as a set of service methods (“behaviors”)
  – Extensible functionality by associating services with objects
Repository System
  – Core Services: Management, Access/Search, OAI-PMH
  – Storage: XML object store; relational db object cache; relational db object registry
  – Mediation: auto-dispatching to distributed web services for content transformation
  – Auto-Indexing: system metadata and DC record of each object
  – HTTP Basic Authentication and Access Control
  – Built-in disseminator services: XSLT x-form, image manipulation, xml-to-PDF
Content Versioning
  – Automatic version control (saves version of content/metadata when modified)
  – Enables date-time stamped API requests (see object as it looked at a point in time)
Clients
  – Fedora Administrator: GUI client to create/maintain objects
  – Default Web browser interface: search; access objects via default disseminator
  – Command line utilities (batch load, ingest, purge, others)
  – Migration Utility: mass export/ingest
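The content versioning idea, where every modification saves a version and a request can carry a date-time to see the object as it looked then, can be sketched as follows. This is a minimal illustration of the concept, not Fedora's implementation; the class and method names are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime
from typing import List, Optional


class VersionedDatastream:
    """Keeps every saved version and answers 'as of' requests."""

    def __init__(self) -> None:
        self._stamps: List[datetime] = []
        self._contents: List[bytes] = []

    def modify(self, when: datetime, content: bytes) -> None:
        """Save a new version instead of overwriting the old one."""
        if self._stamps and when < self._stamps[-1]:
            raise ValueError("versions must be appended in time order")
        self._stamps.append(when)
        self._contents.append(content)

    def as_of(self, when: Optional[datetime] = None) -> bytes:
        """Return the version in effect at `when`, or the latest if omitted."""
        idx = len(self._stamps) if when is None else bisect_right(self._stamps, when)
        if idx == 0:
            raise LookupError("no version existed at that time")
        return self._contents[idx - 1]


ds = VersionedDatastream()
ds.modify(datetime(2004, 1, 1), b"v1")
ds.modify(datetime(2004, 3, 1), b"v2")
```

With this history, `ds.as_of(datetime(2004, 2, 1))` returns the first version and `ds.as_of()` returns the most recent one.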
12
Fedora Repository Service Interfaces
Management Service (API-M)
  – Ingest: XML-encoded object submission
  – Create: interactive object creation via API requests
  – Maintain: interactive object modification via API requests
  – Validate: application of integrity rules to objects
  – Identify: generate unique object identifiers
  – Security: authentication and access control
  – Preserve: automatic content versioning and audit trail
  – Export: XML-encoded object formats
Access Service (API-A and API-A-LITE)
  – Search: search repository for objects
  – Object Reflection: what disseminations can the object provide?
  – Object Dissemination: request a view of the object’s content
OAI-PMH Provider Service
  – OAI-DC records
Slide courtesy of Sandy Payette and Thornton Staples
13
Fedora Software Distribution Package
Open Source (Mozilla Public License)
100% Java (Sun Java J2SDK1.4)
Supporting Technologies
  – Apache Tomcat 4.1 and Apache Axis (SOAP)
  – Xerces 2-2.0.2 for XML parsing and validation
  – Saxon 6.5 for XSLT transformation
  – Schematron 1.5 for validation
  – MySQL and Mckoi relational database
  – Oracle 9i support
Deployment Platforms
  – Windows 2000, NT, XP
  – Solaris
  – Linux
Slide courtesy of Sandy Payette and Thornton Staples
14
FEDORA at Northwestern University
15
General Background
Academic Technologies unit of IT
Develop and support faculty projects
Library partnerships
Institutional partnerships
Diverse clientele
Diverse content
16
Current FEDORA Projects
Block Museum of Art
The Last Expression Art Collection
Introduction to Asian Art History
BBC Spoken Word Archive
Encyclopedia of Chicago
WordHoard Text Analysis Project
Various image collections
17
General Goals
Efficient production: code reuse
Efficient access: content reuse
Content flexibility
Implementation flexibility
Content management
Implementation management
18
Diversity of Content
Art collections
Wall murals
Photographs
Historical maps
GIS maps
Newspapers
Book page images
Digital video
Spoken word
Literary works
Encyclopedias
Lexical data
Census data
Event data
19
Diversity of Systems
RDBMS
XML Databases
XSLT Processors
GIS
Wavelet Image Servers
Vector Image Processors
Streaming Media Servers
Custom Servlets
20
Abstract Image Models
Art collections, Wall murals, Photographs, Historical maps, GIS maps, Newspapers, Book page images, Digital video, Spoken word, Literary works, Encyclopedias, Lexical data, Census data, Event data
21
Abstract Text Model
Art collections, Wall murals, Photographs, Historical maps, GIS maps, Newspapers, Book page images, Digital video, Spoken word, Literary works, Encyclopedias, Lexical data, Census data, Event data
22
Time-based Media Model
Digital video, Spoken word, Literary works, Encyclopedias, Lexical data, Census data, Event data, Art collections, Wall murals, Photographs, Historical maps, GIS maps, Newspapers, Book page images
23
Images
Image types: Simple, Zoom, Layered
Behaviors
  – Core: getThumbnail, getCoverpage
  – Basic: getThumbnail, getMedium, getHigh, getVeryHigh
  – Zoom: getRegion, getViewer
  – Layered: getRegion, getViewer
24
Behaviors by Type
Object types: Image, Map, A/V, Book, News, EText
Behaviors: Core, Image, Hi-Res, Layered, Geo, Time, Text
25
Simple Image
Simple Image model used for art collections
Collection-specific page style is achieved by bundling an XSLT style sheet as a datastream with the collection object
The same image model can be used for different collections
26
Zoomable image
Zoomable image XSLT includes zooming controls
Collection-specific style is also achieved with XSLT
“Zoomable” image also provides simple image behavior
It can participate in basic image applications in this way
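The point that a zoomable image also provides simple image behavior can be sketched as set membership over behavior methods. The method names come from the Images slide; the type names and the assumption that a simple image subscribes to exactly the Basic set are illustrative, not taken from the talk.

```python
from typing import Iterable

# Method names as listed on the Images slide.
BASIC = frozenset({"getThumbnail", "getMedium", "getHigh", "getVeryHigh"})
ZOOM = frozenset({"getRegion", "getViewer"})

# Assumed subscriptions: a zoomable image carries the Basic set as well.
SUBSCRIPTIONS = {
    "SimpleImage": BASIC,
    "ZoomableImage": BASIC | ZOOM,
}


def can_serve(object_type: str, needed: Iterable[str]) -> bool:
    """True when the object type offers every method an application needs."""
    return set(needed) <= SUBSCRIPTIONS[object_type]
```

An application written against the Basic methods therefore accepts either type, while a zooming viewer accepts only the zoomable one.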
27
Collection Object
Collection behavior
  – getSearchForm
  – performSearch
  – getItem
  – getItems
  – addItem
  – deleteItem
  – reindex
  – displayItem
28
Customizing collection objects
Collection objects all leverage common search functionality
Each provides its own XSLT for search results
So new collections can be brought up easily
This is true regardless of the collection type: image, audio, …
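The split between shared search and per-collection presentation can be sketched as below. A plain Python function stands in for each collection's XSLT style sheet, since the standard library has no XSLT processor; the collection names, PIDs and titles are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


def perform_search(items: Dict[str, str], query: str) -> List[str]:
    """Common search shared by every collection: case-insensitive title match."""
    q = query.lower()
    return [pid for pid, title in sorted(items.items()) if q in title.lower()]


@dataclass
class Collection:
    name: str
    # Stand-in for the collection's own XSLT: turns hit PIDs into a page.
    render: Callable[[List[str]], str]
    items: Dict[str, str] = field(default_factory=dict)  # pid -> title

    def search(self, query: str) -> str:
        """Shared search, collection-specific presentation."""
        return self.render(perform_search(self.items, query))


art = Collection(
    name="art",
    render=lambda hits: "<ul>" + "".join(f"<li>{h}</li>" for h in hits) + "</ul>",
    items={"art:1": "Wall mural", "art:2": "Historical map"},
)
audio = Collection(
    name="audio",
    render=lambda hits: ", ".join(hits),
    items={"bbc:1": "Wall Street report"},
)
```

Bringing up a new collection then means supplying only a new `render` step; the search code is untouched.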
29
Search Implementation
FEDORA METS files currently indexed offline
Plan to integrate update notification and indexing
Search Engine
  – Have 3 implementations: FEDORA native search, Sgrep, OpenText
  – Investigating SRW/CQL
Search results passed through XSLT
Easy to provide search capability to collections
30
TEI Text
31
Bound Volume
TEI Book object
For transcribed and/or page image scans
Table of Contents tree viewer
Zoomable image object for page scans
32
Content Re-use
Contextualization
Collection maintenance
  – Topical galleries
Ad-hoc or dynamic collections
  – For classes...
  – personal collections…
  – special exhibits…
33
Specialized clients
“Project Pad” software
Group/Private network folders
Image annotation
Audio annotation
Client for FEDORA image and audio objects
34
Image workspace
35
Implementation flexibility
Development vs production environment
Avoid product “lock-in”
Technology migration
Services are external
  – Image server
  – Tomcat servlets
  – Search engines
  – Table of contents service
  – XQuery
  – RDBMS
36
FEDORA – External Services
[Diagram labels: FEDORA, Dissemination Requests, Cache, Data Request, Data stream, Dissemination, BMECH]
[External services shown: RDBMS, Search Engine, Image Server, TOC Server]
37
Next Steps
Implement more object types
  – Event, video, tabular data
Authoring tools
Work flow support
Security management
Content management tools
Wider interoperability
38
Image Workflow: FEDORA – TrueSpectra – Xythos
[Diagram labels: Metadata in Excel, METS, FEDORA, Tiffs in Xythos, TrueSpectra Image Server, Dissemination Requests, Department, Academic Technologies, Users; legend: Data flow, Requests]
Catalog in Excel converted to METS for FEDORA ingest
Tiff Masters deposited in collection’s Xythos directory
Access to Xythos directory enabled for TrueSpectra virtual paths
METS/FEDORA record includes link to TrueSpectra image
Access to image is through FEDORA image behaviors
39
Physical Collection Management Scenario: FEDORA – Content Service – Xythos Integration
[Diagram labels: Auto-ingester, FEDORA, Files in Xythos, TrueSpectra, Streaming Server, Search, Dissemination Requests, Metadata update, Faculty or Support, Academic Technologies, Users; legend: Data flow, Requests]
FEDORA collection object attached to Xythos directory
Xythos notifies collection object of changes in the directory
  – File added: collection creates new member item
  – File updated: item accepts new version for file stream
  – File removed: item is set dormant in FEDORA
Metadata added/updated online or batch
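The three notification cases in this scenario can be sketched as one event handler on the collection object. This is an illustration of the described behavior only; the event names (`added`, `updated`, `removed`), the class names and the file path are hypothetical and do not reflect a real Xythos or FEDORA interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Item:
    path: str
    versions: List[bytes] = field(default_factory=list)
    state: str = "active"


@dataclass
class CollectionObject:
    """Hypothetical collection attached to one watched directory."""
    members: Dict[str, Item] = field(default_factory=dict)

    def on_directory_event(self, kind: str, path: str, content: bytes = b"") -> None:
        if kind == "added":
            # File added: the collection creates a new member item.
            self.members[path] = Item(path, [content])
        elif kind == "updated":
            # File updated: the item accepts a new version of its file stream.
            self.members[path].versions.append(content)
        elif kind == "removed":
            # File removed: the item is kept but set dormant, not deleted.
            self.members[path].state = "dormant"
        else:
            raise ValueError(f"unknown event kind: {kind}")


coll = CollectionObject()
coll.on_directory_event("added", "murals/001.tif", b"scan-1")
coll.on_directory_event("updated", "murals/001.tif", b"scan-2")
coll.on_directory_event("removed", "murals/001.tif")
```

Setting the item dormant rather than deleting it keeps its version history available after the file disappears from the directory.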
40
Summary
Code reuse through object abstraction
Content reuse through clear object models
Flexible implementation binding
Flexible content modeling
41
Future Software Releases
December 2003 – December 2004
Fedora Object XML (FOXML)
  – Internal storage format; direct expression of Fedora object model
  – Better support for relationships (“kinship” metadata)
  – Better support for audit trail (event history)
  – Format identifiers for dynamic service binding
Shibboleth authentication
Policy Enforcement
  – XACML expression language
  – Fedora policy enforcement module
Web interface for easy content submission
Batch object modification utility
Administrative Reporting
Object Event History (ABC/RDF disseminations)
Better support for “collections”
New ingest and export formats (METS1.3, DIDL)
Slide courtesy of Sandy Payette and Thornton Staples
42
Future Development Proposals
Digital Library in a Box
  – Full-featured DL application with “Fedora inside”
  – Optimized for common set of content types
Fedora Power Server
  – Integrity Management Tools
  – Service and link liveness checker
  – Fault Tolerance
  – Mirroring and Replication
  – Peer-to-peer interoperability features
  – Repository clustering
  – Load balancing
Object Creation Tools
  – Workflow applications based on content models
  – Web interface for document/content submission
Slide courtesy of Sandy Payette and Thornton Staples
43
Questions
http://www.fedora.net
Bill-parod@northwestern.edu