An OAI-Compliant Federated Physics Digital Library for the NSDL
Department of Computer Science, Old Dominion University, Norfolk, VA
In Collaboration With Los Alamos National Laboratory & American Physical Society
Motivation
Lack of a federation service that provides a unified interface to diverse collections in the physics domain whose metadata differ in richness, syntax, and semantics
Challenges
Resource Discovery
– Diversity in metadata richness
– Lack of controlled vocabulary
– Ease of discovery
– Reference services
Creation and Maintenance
– Freshness of metadata
– Dynamic nature of collections
– Filtering
Economic Sustainability
– Rights management
– Who pays?
Automated metadata mapping approach
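One way to picture automated metadata mapping is as a crosswalk table from each source schema's field names to Dublin Core elements. The minimal sketch below assumes two hypothetical sources, `rich_source` and `poor_source`, whose field names are invented for illustration; it is not the project's actual mapping rules.

```python
from collections import defaultdict

# Hypothetical crosswalks: source field name -> Dublin Core element.
# Both source schemas and all their field names are invented for this sketch.
CROSSWALKS = {
    "rich_source": {
        "article_title": "title",
        "author_list": "creator",
        "keywords": "subject",
        "abstract_text": "description",
        "pub_date": "date",
    },
    "poor_source": {
        "name": "title",
        "by": "creator",
        "year": "date",
    },
}


def to_dublin_core(source: str, record: dict) -> dict:
    """Map one flat source record onto repeatable Dublin Core elements.

    Fields without a crosswalk entry are dropped, and list values are
    expanded, so several authors become several 'creator' values.
    """
    mapping = CROSSWALKS[source]
    dc = defaultdict(list)
    for field_name, value in record.items():
        element = mapping.get(field_name)
        if element is None:
            continue  # no known Dublin Core equivalent for this field
        values = value if isinstance(value, list) else [value]
        dc[element].extend(str(v) for v in values)
    return dict(dc)


rich = {
    "article_title": "Quantum Hall effect",
    "author_list": ["Smith, J.", "Doe, A."],
    "internal_id": 17,
}
print(to_dublin_core("rich_source", rich))
# {'title': ['Quantum Hall effect'], 'creator': ['Smith, J.', 'Doe, A.']}
```

A record from the poorer source, such as `{"name": "Notes", "year": 2001}`, yields only `title` and `date`, which illustrates the richness gap between collections that the federation has to accommodate.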
Interactive resource discovery approach components
Data Normalization and Authority Files
Authority File Creation Process
Normalization of entries in the Creator field
– Attempt to convert the author fields in all the archives to a standard format, for example:
Clustering (iterative refinement approach)
– Coarse-level clusters based on approximate string matching (edit distance, Soundex, n-grams)
– Refining clusters based on affiliation, where available
Interactive author confirmation service
– Part of a general metadata ‘scrubbing’ service
– Interactive author confirmation
– Feedback to the source archive
Some of the related work
– Klas Erikson, “Approximate Swedish name matching – survey and test of different algorithms”
– James C. French, Allison L. Powell, and Eric Schulman, “Applications of approximate word matching in information retrieval”
– James C. French, Allison L. Powell, and Eric Schulman, “Automating the construction of authority files in digital libraries: A case study”
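The normalization and coarse clustering steps can be sketched as follows. This sketch assumes a target format of `Last, I. J.` (the slide's own example format did not survive), uses plain Levenshtein edit distance as the approximate matcher, and clusters greedily against a hypothetical threshold `max_dist`. Soundex or n-gram similarity could be swapped in for the matcher, and the affiliation-based refinement step is omitted.

```python
import re
import unicodedata


def normalize_author(raw: str) -> str:
    """Rewrite an author string into an assumed 'Last, I. J.' form.

    Accepts 'First Middle Last' or 'Last, First Middle'; diacritics are
    stripped so 'Schrödinger' and 'Schrodinger' normalize identically.
    """
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"\s+", " ", text.replace(".", ". ")).strip()
    if "," in text:
        last, _, given = text.partition(",")
    else:
        parts = text.split(" ")
        last, given = parts[-1], " ".join(parts[:-1])
    initials = " ".join(f"{g[0].upper()}." for g in given.split())
    last = last.strip().title()
    return f"{last}, {initials}" if initials else last


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using a single rolling row of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute
        prev = curr
    return prev[-1]


def cluster_authors(names: list, max_dist: int = 2) -> list:
    """Coarse greedy clustering: each name joins the first cluster whose
    first member is within max_dist edits after normalization."""
    clusters = []
    for name in names:
        key = normalize_author(name)
        for cluster in clusters:
            if edit_distance(key, normalize_author(cluster[0])) <= max_dist:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


names = ["J. Smith", "Smith, John", "Smyth, J.", "A. Jones", "Jones, Alice"]
print(cluster_authors(names))
# [['J. Smith', 'Smith, John', 'Smyth, J.'], ['A. Jones', 'Jones, Alice']]
```

The threshold merges "Smyth, J." into the Smith cluster, which may well be a different person. Coarse clusters like this are why the slides add affiliation-based refinement and interactive author confirmation afterwards.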
Federation/archives Consistency
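One common scheme for keeping a federation consistent with its source archives is incremental harvesting. The federation remembers the newest datestamp from its last harvest and asks each archive only for records added, changed, or deleted since then, using the OAI-PMH `ListRecords` verb with the `from` argument. Deletions can be mirrored only if the archive reports them (its `deletedRecord` support). The sketch below builds request URLs and merges already-parsed results in memory. The base URL and the record dict layout are assumptions, network access and resumption tokens are omitted, and this is not necessarily the scheme the project adopted.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(base_url: str, since: Optional[str],
                     metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH ListRecords request; 'from' makes it incremental."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if since is not None:
        params["from"] = since  # inclusive lower bound on record datestamps
    return f"{base_url}?{urlencode(params)}"


def apply_harvest(store: dict, harvested: list) -> Optional[str]:
    """Merge harvested records into a local store keyed by OAI identifier.

    Each item is assumed to be a dict with 'identifier', 'datestamp',
    'deleted' and 'metadata' keys, already parsed from the XML response.
    Returns the newest datestamp seen (None for an empty batch, in which
    case the caller should keep its previous 'from' value).
    """
    newest = None
    for rec in harvested:
        if rec["deleted"]:
            store.pop(rec["identifier"], None)  # mirror deletions at the archive
        else:
            store[rec["identifier"]] = rec["metadata"]
        if newest is None or rec["datestamp"] > newest:
            newest = rec["datestamp"]  # same-granularity ISO 8601 strings sort correctly
    return newest


store = {"oai:example.org:1": {"title": ["Old"]},
         "oai:example.org:2": {"title": ["Gone"]}}
batch = [
    {"identifier": "oai:example.org:1", "datestamp": "2002-03-01",
     "deleted": False, "metadata": {"title": ["Revised"]}},
    {"identifier": "oai:example.org:2", "datestamp": "2002-03-02",
     "deleted": True, "metadata": None},
]
since = apply_harvest(store, batch)
print(list_records_url("http://archive.example.org/oai", since))
# http://archive.example.org/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2002-03-02
```

Because `from` is inclusive, the next harvest re-fetches records stamped exactly at the high-water mark. This costs a little redundancy but does not miss records.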
Homogenizing User Space
– Allowing Web search users to discover information in OAI collections (DP-9 service)
– Allowing OAI search users to discover information in non-OAI-compliant collections, Web sites, and databases that are Web-enabled
DP-9 Service for Exposing OAI Collections to Web
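The DP-9 idea can be sketched as a thin gateway with two jobs. It turns a crawler's request for a record page into an OAI-PMH `GetRecord` request, and it returns plain HTML with ordinary links, which is what Web crawlers know how to follow. The sketch below assumes the record has already been fetched and parsed into a dict of Dublin Core elements. The base URL, gateway prefix, and identifier are placeholders, and this is an illustration rather than DP-9's actual code.

```python
from html import escape
from urllib.parse import quote, urlencode


def get_record_url(base_url: str, identifier: str,
                   metadata_prefix: str = "oai_dc") -> str:
    """OAI-PMH GetRecord request the gateway would send for one record."""
    query = urlencode({"verb": "GetRecord", "identifier": identifier,
                       "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def record_page(identifier: str, dc: dict) -> str:
    """Render parsed Dublin Core as plain HTML that a crawler can index."""
    title = escape((dc.get("title") or [identifier])[0])
    rows = "".join(
        f"<tr><th>{escape(element)}</th><td>{escape(value)}</td></tr>"
        for element, values in dc.items() for value in values)
    return (f"<html><head><title>{title}</title></head><body>"
            f"<h1>{title}</h1><table>{rows}</table></body></html>")


def index_page(gateway_prefix: str, identifiers: list) -> str:
    """A page of ordinary links, so a crawler can reach every record page."""
    items = []
    for ident in identifiers:
        href = gateway_prefix + quote(ident, safe="")
        items.append(f'<li><a href="{escape(href)}">{escape(ident)}</a></li>')
    return "<html><body><ul>" + "".join(items) + "</ul></body></html>"


print(get_record_url("http://archive.example.org/oai", "oai:example.org:1234"))
# http://archive.example.org/oai?verb=GetRecord&identifier=oai%3Aexample.org%3A1234&metadataPrefix=oai_dc
```

The OAI identifier is percent-encoded into the page URL, so each record gets a stable address a search engine can store and revisit.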
Gateway Service for Harvesting Non-OAI Collections
[Figure: three Web-enabled non-OAI-compliant collections/databases/Web sites, each with a WIDL description (an XML-based language), connected to an OAI service provider through a gateway to non-OAI collections]
Sample WIDL Description of a Web-Enabled Non-OAI Collection
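Since the sample description itself did not survive, here is a hedged sketch of how a gateway might use a WIDL-style description to query a Web search form. Element and attribute names loosely follow the W3C WIDL note (`WIDL`, `SERVICE`, `BINDING`, `VARIABLE`). The collection, URLs, and variables are invented. The output binding, whose `REFERENCE` expression would locate values in the returned page, is included only to show the shape and is not evaluated here.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode, urljoin

# Hypothetical WIDL-style description of a Web search form; the collection,
# URL and variable names are invented for this sketch.
SAMPLE_WIDL = """
<WIDL NAME="PreprintSearch" BASEURL="http://preprints.example.org" VERSION="2.0">
  <SERVICE NAME="Search" METHOD="Get" URL="/cgi-bin/search"
           INPUT="SearchInput" OUTPUT="SearchOutput"/>
  <BINDING NAME="SearchInput" TYPE="INPUT">
    <VARIABLE NAME="query" TYPE="String" FORMNAME="q"/>
    <VARIABLE NAME="year" TYPE="String" FORMNAME="yr"/>
  </BINDING>
  <BINDING NAME="SearchOutput" TYPE="OUTPUT">
    <VARIABLE NAME="title" TYPE="String[]" REFERENCE="doc.a[].text"/>
  </BINDING>
</WIDL>
"""


def build_request(widl_xml: str, service_name: str, inputs: dict) -> tuple:
    """Turn named inputs into the HTTP request the described form expects."""
    root = ET.fromstring(widl_xml.strip())
    service = root.find(f"SERVICE[@NAME='{service_name}']")
    if service is None:
        raise KeyError(f"no service named {service_name!r}")
    binding = root.find(f"BINDING[@NAME='{service.get('INPUT')}']")
    params = {}
    for var in binding.findall("VARIABLE"):
        name = var.get("NAME")
        if name in inputs:
            # FORMNAME is the field name the site's HTML form actually uses
            params[var.get("FORMNAME", name)] = inputs[name]
    url = urljoin(root.get("BASEURL"), service.get("URL"))
    return service.get("METHOD", "Get").upper(), f"{url}?{urlencode(params)}"


print(build_request(SAMPLE_WIDL, "Search", {"query": "quantum hall", "year": "2001"}))
# ('GET', 'http://preprints.example.org/cgi-bin/search?q=quantum+hall&yr=2001')
```

Keeping the site-specific details (form field names, paths) in the description rather than in code is what lets one gateway serve many non-OAI collections.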
Gateway Implementation
[Figure: gateway components ServiceController, WIDLProcessor2, DcRecord, Record2DB, WIDLProcessor1, WIDLGenerator, MetadataConverter, and Database]
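The component names above come from the slide, but their responsibilities do not. Guessing from the names alone, `DcRecord` holds a Dublin Core record and `Record2DB` stores it in the database that the OAI side then serves. The minimal sketch below illustrates that step with SQLite. `record_to_db`, `changed_since`, the table layout, and the identifier are all hypothetical, not the project's implementation.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class DcRecord:
    """One harvested record as Dublin Core element -> list of values."""
    identifier: str
    datestamp: str
    elements: dict = field(default_factory=dict)


SCHEMA = """
CREATE TABLE IF NOT EXISTS record (identifier TEXT PRIMARY KEY, datestamp TEXT);
CREATE TABLE IF NOT EXISTS dc_value (identifier TEXT, element TEXT, value TEXT);
"""


def record_to_db(conn: sqlite3.Connection, rec: DcRecord) -> None:
    """Insert or refresh one record, replacing its old element values."""
    conn.executescript(SCHEMA)
    conn.execute("INSERT OR REPLACE INTO record VALUES (?, ?)",
                 (rec.identifier, rec.datestamp))
    conn.execute("DELETE FROM dc_value WHERE identifier = ?", (rec.identifier,))
    conn.executemany("INSERT INTO dc_value VALUES (?, ?, ?)",
                     [(rec.identifier, element, value)
                      for element, values in rec.elements.items()
                      for value in values])
    conn.commit()


def changed_since(conn: sqlite3.Connection, since: str) -> list:
    """Identifiers an OAI ListRecords request with from=since would return."""
    rows = conn.execute("SELECT identifier FROM record WHERE datestamp >= ? "
                        "ORDER BY identifier", (since,))
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
record_to_db(conn, DcRecord("gw:preprints:1", "2002-04-01",
                            {"title": ["Quantum Hall effect"], "creator": ["Smith, J."]}))
record_to_db(conn, DcRecord("gw:preprints:1", "2002-04-05",
                            {"title": ["Quantum Hall effect (rev.)"]}))
print(changed_since(conn, "2002-04-02"))
# ['gw:preprints:1']
```

Re-harvesting a changed page updates the record's datestamp, so OAI harvesters of the gateway see the change through ordinary incremental requests.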
NSDL Physics Collection – Prototype in Development
[Figure label: Web-enabled non-OAI collection]
Interactive Subject Selection Interface
Future Tasks
– Exploiting richer metadata (formula-based search and authoring)
– Handling diversity in metadata (simultaneous display of rich and poor metadata records)
– Interactive search interface for resource discovery
– Data normalization, authority files, filtering
– Harvesting other OAI collections, such as those at CERN
– Harvesting non-OAI-compliant collections
– Investigating different schemes for maintaining federation/archive consistency
– Providing high-level services, such as cross-linking, based on existing tools (e.g., a CiteSeer (ResearchIndex)-based reference linking service)
– Collaboration with the NUDL project at VT and the SINN project at Oldenburg
– Integration into the NSDL main portal