FEDORA Project McGill University May 17 2004 Bill Parod Academic Technologies Northwestern University

1 FEDORA Project McGill University May 17 2004 Bill Parod Academic Technologies Northwestern University bill-parod@northwestern.edu

2 Priorities for digital libraries Managing digital resources as if they are all the same Delivering digital resources as if they are all unique and free to participate in any number of contexts Supporting digital scholarship wherever it may lead Slide courtesy of Sandy Payette and Thornton Staples

3 Shortcomings of commercial digital library products
Narrow focus on specific media formats (e.g. image databases, document management)
Fail to effectively address interrelationships among digital entities
Fail to address interoperability
Fail to provide facilities for managing programs and tools that deliver digital content
Not extensible; do not enable easy integration of new tools and services
Slide courtesy of Sandy Payette and Thornton Staples

4 The Flexible Extensible Digital Object Repository Architecture (FEDORA)
Developed as a DARPA- and NSF-funded research project at Cornell (1997-present)
Interpreted and re-implemented at University of Virginia (1999)
Virginia prototype supported a testbed of 10,000,000 digital objects with very good results (1999-2001)
Andrew W. Mellon Foundation granted Virginia and Cornell $1,000,000 to develop a full-featured production FEDORA system that is web-based (2002+)
Slide courtesy of Sandy Payette and Thornton Staples

5 Digital Object Model Slide courtesy of Sandy Payette and Thornton Staples

6 Digital Object Model: Architectural View
Persistent ID (PID): globally unique persistent id
Disseminators: public view; access methods for obtaining “disseminations” of digital object content
System Metadata: internal view; metadata necessary to manage the object
Datastreams: protected view; content that makes up the “basis” of the object
Slide courtesy of Sandy Payette and Thornton Staples
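
The four parts of the object model on slide 6 can be pictured as a small data structure. This is only a minimal sketch under stated assumptions: the class names, field names, PIDs and URL are hypothetical and are not Fedora's storage format or API.

```python
from dataclasses import dataclass, field


@dataclass
class Datastream:
    """One content or metadata item inside the object (protected view)."""
    ds_id: str
    mime_type: str
    location: str  # hypothetical: where the bytes live


@dataclass
class Disseminator:
    """Associates the object with a set of access methods (public view)."""
    bdef_pid: str   # hypothetical id of the behavior definition
    methods: tuple  # method names the object exposes through it


@dataclass
class DigitalObject:
    pid: str                                             # globally unique persistent id
    system_metadata: dict = field(default_factory=dict)  # internal view
    datastreams: dict = field(default_factory=dict)      # protected view, keyed by ds_id
    disseminators: list = field(default_factory=list)    # public view

    def list_methods(self):
        """Return (bdef_pid, method) pairs the object can disseminate."""
        return [(d.bdef_pid, m) for d in self.disseminators for m in d.methods]


# Illustrative values only; none of these identifiers come from the slides.
obj = DigitalObject(pid="example:1")
obj.datastreams["THUMB"] = Datastream(
    "THUMB", "image/jpeg", "http://images.example.org/1/thumb.jpg"
)
obj.disseminators.append(
    Disseminator("example:simple-image", ("getThumbnail", "getMedium"))
)
print(obj.list_methods())
```

Clients only see what `list_methods` reports; the datastreams stay behind the disseminators, which is the public/protected split the slide describes.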

7 Digital Object Model Example: Disseminators
Persistent ID (PID)
Disseminators
– Default: Get Profile, List Items, Get Item, List Methods, Get DC Record
– Simple Image: Get Thumbnail, Get Medium, Get High, Get VeryHigh
System Metadata
Datastreams
Slide courtesy of Sandy Payette and Thornton Staples

8 Behavior Contracts
Data Object: Persistent ID (PID), Disseminators, System Metadata, Datastreams
Behavior Definition Object: Persistent ID (PID), Behavior Definition Metadata, System Metadata, Datastreams
Behavior Mechanism Object: Persistent ID (PID), Service Binding Metadata (WSDL), System Metadata, Datastreams
Web Service
Relationships shown: behavior contract, behavior subscription, data contract
Slide courtesy of Sandy Payette and Thornton Staples

9 Shared Image Behavior Definitions Slide courtesy of Sandy Payette and Thornton Staples

10 Client and Web Service Interactions
Diagram labels: user, web browser, client application, server application, Fedora Service APIs, Fedora Repository System, External Service Dispatch, Content Transform Service API
Slide courtesy of Sandy Payette and Thornton Staples

11 Fedora 1.2 Software Feature Set
Open Fedora APIs
– Repository as web services (REST and SOAP bindings); WSDL interface defs
Flexible Digital Object Model
– Content View: objects as bundle of items (content and metadata)
– Service View: objects as a set of service methods (“behaviors”)
– Extensible functionality by associating services with objects
Repository System
– Core Services: Management, Access/Search, OAI-PMH
– Storage: XML object store; relational db object cache; relational db object registry
– Mediation: auto-dispatching to distributed web services for content transformation
– Auto-Indexing: system metadata and DC record of each object
– HTTP Basic Authentication and Access Control
– Built-in disseminator services: XSLT x-form, image manipulation, xml-to-PDF
Content Versioning
– Automatic version control (saves version of content/metadata when modified)
– Enables date-time stamped API requests (see object as it looked at a point in time)
Clients
– Fedora Administrator: GUI client to create/maintain objects
– Default Web browser interface: search; access objects via default disseminator
– Command line utilities (batch load, ingest, purge, others)
– Migration Utility: mass export/ingest
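
The content versioning bullets (save a version on every modification, answer date-time stamped requests) can be illustrated with a small sketch. It assumes versions are kept in time order and looked up by binary search; the class and method names are hypothetical and this is not how Fedora itself stores versions.

```python
from bisect import bisect_right
from datetime import datetime


class VersionedDatastream:
    """Keeps every saved version so the content can be viewed as of a given time."""

    def __init__(self):
        self._stamps = []    # modification times, kept in ascending order
        self._contents = []  # content saved at the matching index

    def modify(self, when, content):
        """Save a new version instead of overwriting the previous one."""
        if self._stamps and when <= self._stamps[-1]:
            raise ValueError("versions must be added in time order")
        self._stamps.append(when)
        self._contents.append(content)

    def as_of(self, when=None):
        """Return the content as it looked at `when` (latest version if None)."""
        if not self._stamps:
            raise LookupError("no versions saved")
        if when is None:
            return self._contents[-1]
        i = bisect_right(self._stamps, when)  # number of versions saved at or before `when`
        if i == 0:
            raise LookupError("no version existed at that time")
        return self._contents[i - 1]


ds = VersionedDatastream()
ds.modify(datetime(2003, 12, 1), "first description")
ds.modify(datetime(2004, 5, 1), "revised description")
print(ds.as_of(datetime(2004, 1, 15)))  # the version saved on 2003-12-01
print(ds.as_of())                       # the latest version
```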

12 Fedora Repository Service Interfaces
Management Service (API-M)
– Ingest: XML-encoded object submission
– Create: interactive object creation via API requests
– Maintain: interactive object modification via API requests
– Validate: application of integrity rules to objects
– Identify: generate unique object identifiers
– Security: authentication and access control
– Preserve: automatic content versioning and audit trail
– Export: XML-encoded object formats
Access Service (API-A and API-A-LITE)
– Search: search repository for objects
– Object Reflection: what disseminations can the object provide?
– Object Dissemination: request a view of the object’s content
OAI-PMH Provider Service
– OAI-DC records
Slide courtesy of Sandy Payette and Thornton Staples
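
An object dissemination over the REST-style binding (API-A-LITE) is just a URL. The sketch below assumes the pattern `.../fedora/get/PID/bDefPID/methodName[/dateTime][?params]` that I recall from the API-A-LITE documentation of early Fedora releases; confirm it against the release in use. The host, the PIDs and the `x`/`y` parameters are hypothetical, and the date-time format is left to the caller because it depends on the release.

```python
from urllib.parse import urlencode


def dissemination_url(base, pid, bdef_pid, method, as_of=None, params=None):
    """Build a dissemination request URL (assumed API-A-LITE style pattern)."""
    url = f"{base.rstrip('/')}/get/{pid}/{bdef_pid}/{method}"
    if as_of:
        url += f"/{as_of}"             # optional date-time stamp for a versioned view
    if params:
        url += "?" + urlencode(params)  # optional method parameters
    return url


# Hypothetical repository address and identifiers, for illustration only.
print(dissemination_url(
    "http://localhost:8080/fedora", "example:5", "example:1", "getThumbnail"
))
```

Because the request is a plain URL, a web browser or any HTTP client can act as the access client.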

13 Fedora Software Distribution Package
Open Source (Mozilla Public License)
100% Java (Sun Java J2SDK1.4)
Supporting Technologies
– Apache Tomcat 4.1 and Apache Axis (SOAP)
– Xerces 2-2.0.2 for XML parsing and validation
– Saxon 6.5 for XSLT transformation
– Schematron 1.5 for validation
– MySQL and Mckoi relational database
– Oracle 9i support
Deployment Platforms
– Windows 2000, NT, XP
– Solaris
– Linux
Slide courtesy of Sandy Payette and Thornton Staples

14 FEDORA at Northwestern University

15 General Background Academic Technologies unit of IT Develop and support faculty projects Library partnerships Institutional partnerships Diverse clientele Diverse content

16 Current FEDORA Projects Block Museum of Art The Last Expression Art Collection Introduction to Asian Art History BBC Spoken Word Archive Encyclopedia of Chicago WordHoard Text Analysis Project Various image collections

17 General Goals Efficient production - code reuse Efficient access – content reuse Content flexibility Implementation flexibility Content management Implementation management

18 Diversity of Content: Art collections, Wall murals, Photographs, Historical maps, GIS maps, Newspapers, Book page images, Digital video, Spoken word, Literary works, Encyclopedias, Lexical data, Census data, Event data

19 Diversity of Systems: RDBMS, XML Databases, XSLT Processors, GIS, Wavelet Image Servers, Vector Image Processors, Streaming Media Servers, Custom Servlets

20 Abstract Image Models: Art collections, Wall murals, Photographs, Historical maps, GIS maps, Newspapers, Book page images, Digital video, Spoken word, Literary works, Encyclopedias, Lexical data, Census data, Event data

21 Abstract Text Model: Art collections, Wall murals, Photographs, Historical maps, GIS maps, Newspapers, Book page images, Digital video, Spoken word, Literary works, Encyclopedias, Lexical data, Census data, Event data

22 Time-based Media Model: Digital video, Spoken word, Literary works, Encyclopedias, Lexical data, Census data, Event data, Art collections, Wall murals, Photographs, Historical maps, GIS maps, Newspapers, Book page images

23 Images
Image types: Simple, Zoom, Layered
Core: getThumbnail, getCoverpage
Basic: getThumbnail, getMedium, getHigh, getVeryHigh
Zoom: getRegion, getViewer
Layered: getRegion, getViewer

24 Behaviors by Type
Object types: Image, Map, A/V, Book, News, EText
Behaviors: Core, Image, Hi-Res, Layered, Geo, Time, Text

25 Simple Image
Simple Image model used for art collections
Collection-specific page style is achieved by bundling an XSLT style sheet as a datastream with the collection object
The same image model can be used for different collections

26 Zoomable image
Zoomable image XSLT includes zooming controls
Collection-specific style is also achieved with XSLT
“Zoomable” image also provides simple image behavior, so it can participate in basic image applications
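
The point that a zoomable image can still take part in basic image applications comes down to set inclusion over behavior methods. A minimal sketch: the method names are taken from slide 23, but which behavior definitions each image model subscribes to is an assumption made here for illustration, as are all Python names.

```python
# Behavior definitions and their methods, as listed on slide 23.
BEHAVIOR_DEFS = {
    "Core": {"getThumbnail", "getCoverpage"},
    "Basic": {"getThumbnail", "getMedium", "getHigh", "getVeryHigh"},
    "Zoom": {"getRegion", "getViewer"},
}

# Assumed subscriptions (not stated on the slides): a zoomable image keeps
# the simple image behaviors and adds the zoom behaviors on top.
IMAGE_MODELS = {
    "simple": ["Core", "Basic"],
    "zoomable": ["Core", "Basic", "Zoom"],
}


def methods_for(model):
    """Union of all methods an image model exposes."""
    methods = set()
    for bdef in IMAGE_MODELS[model]:
        methods |= BEHAVIOR_DEFS[bdef]
    return methods


def can_participate(model, required_bdef):
    """True if the model offers every method the application requires."""
    return BEHAVIOR_DEFS[required_bdef] <= methods_for(model)


print(can_participate("zoomable", "Basic"))  # a basic image viewer can use it
print(can_participate("simple", "Zoom"))     # a zoom viewer cannot use a simple image
```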

27 Collection Object Collection behavior – getSearchForm – performSearch – getItem – getItems – addItem – deleteItem – reindex – displayItem
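
A rough in-memory sketch of the collection behavior may help make the method list concrete. It keeps the method names from the slide (hence the camelCase) but everything else is an assumption: the word index stands in for whatever search engine a real collection binds to, and `getSearchForm` and `displayItem` are left out because they are presentation steps.

```python
class Collection:
    """Hypothetical in-memory stand-in for a collection object."""

    def __init__(self):
        self._items = {}  # item id -> metadata dict
        self._index = {}  # lowercase word -> set of item ids

    def addItem(self, item_id, metadata):
        self._items[item_id] = dict(metadata)
        self.reindex()

    def deleteItem(self, item_id):
        self._items.pop(item_id, None)
        self.reindex()

    def getItem(self, item_id):
        return self._items[item_id]

    def getItems(self):
        return sorted(self._items)

    def reindex(self):
        """Rebuild the word index from the metadata of every item."""
        self._index = {}
        for item_id, metadata in self._items.items():
            for value in metadata.values():
                for word in str(value).lower().split():
                    self._index.setdefault(word, set()).add(item_id)

    def performSearch(self, query):
        """Return ids of items whose metadata contains every query word."""
        words = query.lower().split()
        if not words:
            return []
        hits = set.intersection(*(self._index.get(w, set()) for w in words))
        return sorted(hits)


murals = Collection()
murals.addItem("example:1", {"title": "Wall mural Chicago"})
murals.addItem("example:2", {"title": "Historical map Chicago"})
print(murals.performSearch("mural chicago"))
```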

28 Customizing collection objects
Collection objects all leverage common search functionality
Each provides its own XSLT for search results
So new collections can be brought up easily
This is true regardless of the collection type: image, audio, …

29 Search Implementation
FEDORA METS files currently indexed offline
Plan to integrate update notification and indexing
Search Engine
– Have 3 implementations: FEDORA native search, Sgrep, OpenText
– Investigating SRW/CQL
Search results passed through XSLT
Easy to provide search capability to collections

30 TEI Text

31 Bound Volume TEI Book object For transcribed and/or page image scans Table of Contents tree viewer Zoomable image object for page scans

32 Content Re-use Contextualization Collection maintenance – Topical galleries Ad-hoc or dynamic collections – For classes... – personal collections… – special exhibits…

33 Specialized clients “Project Pad” software Group/Private network folders Image annotation Audio annotation Client for FEDORA image and audio objects

34 Image workspace

35 Implementation flexibility Development vs production environment Avoid product “lock-in” Technology migration Services are external – Image server – Tomcat servlets – Search engines – Table of contents service – Xquery – RDBMS

36 FEDORA – External Services
Diagram labels: Dissemination Requests, FEDORA, BMECH, Data stream, Data Request, Dissemination, Cache data, External Services (RDBMS, Search Engine, Image Server, TOC Server)

37 Next Steps Implement more object types – Event, video, tabular data Authoring tools Work flow support Security management Content management tools Wider interoperability

38 Image Workflow: FEDORA – TrueSpectra – Xythos
Catalog in Excel converted to METS for FEDORA ingest
Tiff Masters deposited in collection’s Xythos directory
Access to Xythos directory enabled for TrueSpectra virtual paths
METS/FEDORA record includes link to TrueSpectra image
Access to image is through FEDORA image behaviors
Diagram labels: Metadata in Excel, METS, FEDORA, Tiffs in Xythos, TrueSpectra Image Server, Department, Academic Technologies, Users, Data flow, Requests, Dissemination Requests
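
The catalog-to-METS step can be sketched as follows. This is a generic METS skeleton built with the Python standard library, not the Fedora METS profile used at Northwestern: it omits the structMap and other parts a complete METS document requires, the catalog row is a plain dict standing in for a spreadsheet row, and the image server URL is hypothetical.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
DC = "http://purl.org/dc/elements/1.1/"

# Use conventional prefixes when the tree is serialized.
ET.register_namespace("METS", METS)
ET.register_namespace("xlink", XLINK)
ET.register_namespace("dc", DC)


def row_to_mets(row, image_base_url):
    """Turn one catalog row into a skeletal METS record linking to the image."""
    mets = ET.Element(f"{{{METS}}}mets", {"OBJID": row["id"], "LABEL": row["title"]})

    # Descriptive metadata: a Dublin Core title wrapped in a dmdSec.
    dmd = ET.SubElement(mets, f"{{{METS}}}dmdSec", {"ID": "DC1"})
    wrap = ET.SubElement(dmd, f"{{{METS}}}mdWrap", {"MDTYPE": "DC"})
    xml_data = ET.SubElement(wrap, f"{{{METS}}}xmlData")
    ET.SubElement(xml_data, f"{{{DC}}}title").text = row["title"]

    # File section: a link to the image on the (hypothetical) image server.
    file_sec = ET.SubElement(mets, f"{{{METS}}}fileSec")
    group = ET.SubElement(file_sec, f"{{{METS}}}fileGrp")
    file_el = ET.SubElement(group, f"{{{METS}}}file", {"ID": "IMG1"})
    ET.SubElement(
        file_el,
        f"{{{METS}}}FLocat",
        {"LOCTYPE": "URL", f"{{{XLINK}}}href": image_base_url + row["file"]},
    )
    return mets


record = row_to_mets(
    {"id": "example:42", "title": "Wall mural", "file": "murals/042.tif"},
    "http://images.example.org/",
)
print(ET.tostring(record, encoding="unicode"))
```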

39 Physical Collection Management Scenario: FEDORA – Content Service – Xythos Integration
FEDORA collection object attached to Xythos directory
Xythos notifies collection object of changes in the directory
– File added: collection creates new member item
– File updated: item accepts new version for file stream
– File removed: item is set dormant in FEDORA
Metadata added/updated online or batch
Diagram labels: Auto-ingester, FEDORA, Files in Xythos, TrueSpectra, Streaming Server, Search, Metadata update, Faculty or Support, Academic Technologies, Users, Data flow, Requests, Dissemination Requests
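
The three notification rules in this scenario map naturally onto a small event handler. A minimal sketch, assuming hypothetical event names (`added`, `updated`, `removed`) and an in-memory item store; it does not reflect the actual Xythos notification interface or Fedora's object states.

```python
class CollectionObject:
    """Hypothetical collection object attached to a watched directory."""

    def __init__(self):
        self.items = {}  # file path -> {"state": str, "versions": list}

    def on_directory_event(self, event, path, content=None):
        if event == "added":
            # File added: the collection creates a new member item.
            self.items[path] = {"state": "active", "versions": [content]}
        elif event == "updated":
            # File updated: the item accepts a new version of the file stream.
            self.items[path]["versions"].append(content)
        elif event == "removed":
            # File removed: the item is set dormant rather than deleted.
            self.items[path]["state"] = "dormant"
        else:
            raise ValueError(f"unknown event: {event}")


collection = CollectionObject()
collection.on_directory_event("added", "murals/001.tif", b"scan-1")
collection.on_directory_event("updated", "murals/001.tif", b"scan-2")
collection.on_directory_event("removed", "murals/001.tif")
print(collection.items["murals/001.tif"]["state"])
```

Keeping the removed item as dormant preserves its earlier versions, which fits the versioning goals described earlier in the deck.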

40 Summary Code reuse through object abstraction Content reuse through clear object models Flexible implementation binding Flexible content modeling

41 Future Software Releases
December 2003 – December 2004
Fedora Object XML (FOXML)
– Internal storage format; direct expression of Fedora object model
– Better support for relationships (“kinship” metadata)
– Better support for audit trail (event history)
– Format identifiers for dynamic service binding
Shibboleth authentication
Policy Enforcement
– XACML expression language
– Fedora policy enforcement module
Web interface for easy content submission
Batch object modification utility
Administrative Reporting
Object Event History (ABC/RDF disseminations)
Better support for “collections”
New ingest and export formats (METS1.3, DIDL)
Slide courtesy of Sandy Payette and Thornton Staples

42 Future Development Proposals
Digital Library in a Box
– Full-featured DL application with “Fedora inside”
– Optimized for common set of content types
Fedora Power Server
– Integrity Management Tools
– Service and link liveness checker
– Fault Tolerance
– Mirroring and Replication
– Peer-to-peer interoperability features
– Repository clustering
– Load balancing
Object Creation Tools
– Workflow applications based on content models
– Web interface for document/content submission
Slide courtesy of Sandy Payette and Thornton Staples

43 Questions
http://www.fedora.net
bill-parod@northwestern.edu

