Designing Protocols in Support of Digital Library Componentization
Hussein Suleman and Edward A. Fox
Digital Library Research Laboratory, Virginia Tech
Acknowledgements
§Portions of this work were funded in part by the US National Science Foundation through DUE and IIS grants. Among these are subcontracts with original funding to UNC Wilmington, U. of Arizona, and U. of Florida.
§Portions of this work were funded in part by the Mellon Foundation through a subcontract with original funding to SOLINET for AmericanSouth.org.
[Figure: users and digital objects (programs, documents, images, videos), with "?" marking the unknown link between them]
[Figure: digital library as a monolithic and/or custom-built web-based application in front of the digital objects (programs, documents, images, videos)]
[Figure: componentized digital library: digital objects (programs, documents, images, videos) served by many components, each marked "?"]
Open Archives Initiative (OAI)
§Advocacy for interoperability
§Standard for transferring metadata among digital libraries
  • Protocol for Metadata Harvesting (PMH): simplicity, generality, extensibility
§Support for PMH => Open Archive (OA)
[Figure: open digital library: digital objects (programs, documents, images, videos) with components labeled OA, linked by PMH and XPMH]
[Figure: protocol layers: Open Digital Library Protocol, Extended OAI-PMH, Protocol for Metadata Harvesting]
[Figure: component layers: Open Digital Library Component, Extended Open Archive, Open Archive]
Open Digital Library
§Network of Extended Open Archives in which each node acts as a provider of data, services, or both
§Component = Node
§Protocol = Arc
[Figure: Example Open Digital Library. ETD collections connect over PMH to Union, Filter, Search, Browse, and Recent components, which communicate via the ODLUnion, ODLSearch, ODLBrowse, and ODLRecent protocols; students and researchers reach the ETD Digital Library through a user interface.]
Prototype - FrontPage
Prototype - Search
Prototype - Browse
ODL Component Requirements
§Search
  • Retrieve a list of items
  • Index new items
§Annotate
  • Add annotation to item
  • Retrieve a list of annotations for an item
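The two requirement lists above map naturally onto small component interfaces. The following is a minimal sketch in Python, not the ODL implementation: the class and method names (`SearchComponent`, `AnnotateComponent`, `InMemorySearch`, `InMemoryAnnotate`) are hypothetical, and the in-memory stores stand in for whatever back end a real component would use.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Set


class SearchComponent(ABC):
    """Hypothetical interface for the Search requirements."""

    @abstractmethod
    def index(self, item_id: str, text: str) -> None:
        """Index a new item."""

    @abstractmethod
    def search(self, query: str) -> List[str]:
        """Retrieve a list of identifiers of matching items."""


class AnnotateComponent(ABC):
    """Hypothetical interface for the Annotate requirements."""

    @abstractmethod
    def annotate(self, item_id: str, annotation: str) -> None:
        """Add an annotation to an item."""

    @abstractmethod
    def annotations(self, item_id: str) -> List[str]:
        """Retrieve the list of annotations for an item."""


class InMemorySearch(SearchComponent):
    """Toy inverted index: every query term must occur in the item."""

    def __init__(self) -> None:
        self._index: Dict[str, Set[str]] = {}

    def index(self, item_id: str, text: str) -> None:
        for term in text.lower().split():
            self._index.setdefault(term, set()).add(item_id)

    def search(self, query: str) -> List[str]:
        terms = query.lower().split()
        if not terms:
            return []
        postings = [self._index.get(term, set()) for term in terms]
        return sorted(set.intersection(*postings))


class InMemoryAnnotate(AnnotateComponent):
    """Toy annotation store keyed by item identifier."""

    def __init__(self) -> None:
        self._notes: Dict[str, List[str]] = {}

    def annotate(self, item_id: str, annotation: str) -> None:
        self._notes.setdefault(item_id, []).append(annotation)

    def annotations(self, item_id: str) -> List[str]:
        return list(self._notes.get(item_id, []))
```

Keeping each component behind two operations is what lets a protocol layer expose it later without knowing how indexing or storage works.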
Open Digital Library Components
§Running now
  • XML-File (data provider from file system)
  • Union, search, browse, recent, filter
  • E-journal/review, Submit, Edit, Annotation
§Class projects
  • High performance multilingual search
  • Recommender, Rating; Mirroring (see JCDL’02)
  • Working with NCSA: from DB, unstructured text
§Others discussed
  • Classification/categorization
  • DL-Viz interconnection (VIDI – Jun Wang ETD)
[Figure: Open Digital Library: Extended. Components shown: XML File Collection & Data Providers 1-3, an OAI-PMH Data Provider, a Submit Archive, and OAIB (NCSA: from RDBMS), harvested into DBUnion (Archive Merger Component) and a Filter; DBBrowse (Browse Engine, as Metadata Browse Service Provider); IRDB-1 (Search Engine, as Metadata Search Service Provider); What's New Engine (as What's New Service Provider); Recommend & Rate Engine (as Recommend & Rate Service Provider); Annotation Engine; IRDB-2 (Search Engine, as Annotation Search Service Provider).]
Layer 1: OAI-PMH
§Protocol for Metadata Harvesting
  • Transfer stream of metadata from one archive or component to another
§Service Requests
  • Identify, ListSets, ListMetadataFormats
  • GetRecord, ListIdentifiers, ListRecords
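Each of the six service requests above is an HTTP request whose `verb` argument names the operation. As a rough sketch, the helper below builds such request URLs; the function name `pmh_request_url`, the base URL, and the record identifier are illustrative assumptions, not values from the talk.

```python
from typing import Dict, Optional
from urllib.parse import urlencode

# The six OAI-PMH service requests named on the slide.
VERBS = {
    "Identify",
    "ListSets",
    "ListMetadataFormats",
    "GetRecord",
    "ListIdentifiers",
    "ListRecords",
}


def pmh_request_url(base_url: str, verb: str,
                    params: Optional[Dict[str, str]] = None) -> str:
    """Build a request URL for one OAI-PMH verb (hypothetical helper)."""
    if verb not in VERBS:
        raise ValueError("not an OAI-PMH verb: " + verb)
    # Put the verb first and the other arguments in a stable order.
    query = [("verb", verb)] + sorted((params or {}).items())
    return base_url + "?" + urlencode(query)


# Example with an assumed base URL and identifier.
print(pmh_request_url(
    "http://example.org/oai",
    "GetRecord",
    {"identifier": "oai:example.org:etd-1", "metadataPrefix": "oai_dc"},
))
```

A dictionary is used for the arguments rather than keyword arguments because one standard OAI-PMH argument, `from`, is a reserved word in Python.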
Layer 2: Extended OAI-PMH
§OAI-PMH + extensions for general-purpose inter-component communication
  • Added generic containers to every response for additional information
  • Added “PutRecord” to submit a record
  • Increased granularity to support times as well as dates (same as OAI-PMH v2.0)
  • Ignored the Dublin Core (DC) requirement
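To illustrate the granularity extension, the sketch below formats a timestamp at the two granularities defined by OAI-PMH v2.0: day (`YYYY-MM-DD`) and seconds (`YYYY-MM-DDThh:mm:ssZ`, in UTC). The function name `datestamp` and the `granularity` argument values are assumptions made for this example.

```python
from datetime import datetime, timezone


def datestamp(moment: datetime, granularity: str = "seconds") -> str:
    """Format a timezone-aware datetime as a UTC datestamp (hypothetical helper)."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    utc = moment.astimezone(timezone.utc)
    if granularity == "day":
        # Date only: the coarser granularity.
        return utc.strftime("%Y-%m-%d")
    if granularity == "seconds":
        # Date and time: lets a harvester ask for changes within a day.
        return utc.strftime("%Y-%m-%dT%H:%M:%SZ")
    raise ValueError("unknown granularity: " + granularity)


changed = datetime(2002, 7, 15, 14, 30, 5, tzinfo=timezone.utc)
print(datestamp(changed, "day"))      # 2002-07-15
print(datestamp(changed, "seconds"))  # 2002-07-15T14:30:05Z
```

With day granularity, a component that polls another several times a day re-fetches everything changed since midnight; the finer datestamp avoids that repeat traffic.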
Layer 3: ODL Protocols
§Specialized protocol semantics for different components, e.g.:
  • Search component uses ODLSearch protocol
      • ListRecords and ListIdentifiers embed query terms in “set” parameter
  • Annotation component uses ODLAnnotate protocol
      • ListRecords and ListIdentifiers specify the item for which annotations are requested in the “set” parameter
      • PutRecord adds an annotation to an item
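The idea of reusing the “set” parameter can be sketched as follows. This is an illustration under stated assumptions, not the ODLSearch or ODLAnnotate wire format: the slide does not give the exact encoding inside “set”, so the example simply places the space-joined query terms (or the item identifier) there, and the function names and URLs are hypothetical.

```python
from typing import List
from urllib.parse import parse_qs, urlencode, urlsplit


def odlsearch_url(base_url: str, terms: List[str], metadata_prefix: str) -> str:
    """ListRecords request whose 'set' carries the query terms (assumed encoding)."""
    query = [
        ("verb", "ListRecords"),
        ("metadataPrefix", metadata_prefix),
        ("set", " ".join(terms)),
    ]
    return base_url + "?" + urlencode(query)


def odlannotate_url(base_url: str, item_id: str, metadata_prefix: str) -> str:
    """ListRecords request whose 'set' names the annotated item (assumed encoding)."""
    query = [
        ("verb", "ListRecords"),
        ("metadataPrefix", metadata_prefix),
        ("set", item_id),
    ]
    return base_url + "?" + urlencode(query)


def terms_from_request(url: str) -> List[str]:
    """Component side: recover the query terms from the 'set' argument."""
    arguments = parse_qs(urlsplit(url).query)
    return arguments.get("set", [""])[0].split()


url = odlsearch_url("http://example.org/search", ["digital", "library"], "oai_dc")
print(url)
print(terms_from_request(url))
```

Because the request is still a well-formed harvesting request, a search or annotation component can be reached with the same client code that talks to any other archive; only the meaning of “set” changes.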
Case Study: ETD ODL Prototype
§Electronic Thesis and Dissertation Open Digital Library
Case Study: CSTC
§Computer Science Teaching Center
CSTC User Interface
Performance Optimizations
§Caching of responses
§Persistent CGI mechanisms
  • FastCGI
  • SpeedyCGI
§Request multiple records in a single operation (proposed)
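Response caching, the first optimization above, can be sketched as a thin wrapper around a component's request handler. The class name `ResponseCache`, its methods, and the stand-in handler are hypothetical, and a real deployment would also need an invalidation policy suited to how often the underlying collection changes.

```python
from typing import Callable, Dict, Tuple

Params = Dict[str, str]


class ResponseCache:
    """Cache responses keyed by the full set of request arguments (sketch)."""

    def __init__(self, handler: Callable[[Params], str]) -> None:
        self._handler = handler
        self._store: Dict[Tuple[Tuple[str, str], ...], str] = {}
        self.hits = 0
        self.misses = 0

    def respond(self, params: Params) -> str:
        # Sort the arguments so their order does not affect the cache key.
        key = tuple(sorted(params.items()))
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = self._handler(params)
        return self._store[key]

    def invalidate(self) -> None:
        """Drop all cached responses, e.g. after new records are added."""
        self._store.clear()


def slow_handler(params: Params) -> str:
    # Stand-in for a component that builds a response from its index.
    return "<response verb='" + params["verb"] + "'/>"


cache = ResponseCache(slow_handler)
cache.respond({"verb": "ListSets"})
cache.respond({"verb": "ListSets"})
print(cache.hits, cache.misses)  # 1 1
```

Caching suits a componentized design because the same request tends to recur: every page view of the user interface may trigger identical requests to the browse or search component.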
What have we accomplished?
§Complete protocol-level separation among components within the DL
§Seamless integration with little “glue”
§Simple extensions of OAI-PMH
§Modular and portable components
§Efficient in speed, less efficient in storage
Projects Using ODL
§NDLTD
  • Union Catalog for Electronic Theses and Dissertations – prototype ODL site
§Computer Science Teaching Center
  • Digital library of peer-reviewed teaching resources in the computing sciences
§AmericanSouth.org
  • Portal to meta-collection of resources related to Southern History and Culture
(Somewhat) Open Issues
§Is this scalable? Portable? Extensible?
§Can we define all popular DL services using such a methodology? (completeness problem)
§Can we define DLs as configurations of ODL components? (composition problem)
§Is OAI-PMH a good baseline protocol? Can we design a better baseline protocol upon which to base harvesting and repository access?
§To what degree is an ODL network equivalent to a monolithic system? (comparison problem)
Ultimate Goal
§Package different configurations into instant DL systems or subsystems
§DL building = component configuration
§All DLs speak the same language(s)
§Basic services are trivial to provide so more effort is spent on advanced capabilities of DLs
Discussion?
§Questions?
§How can we extend this discussion in the context of the OCKHAM effort?
§Will you please join:
§Will you add to ?