Published by Clarissa Daniel. Modified over 9 years ago.
1
MARIAN: Searching and Querying Across Heterogeneous Federated Digital Libraries. Marcos André Gonçalves, Robert K. France, Edward A. Fox, Tamas E. Doszkocs. Work performed at Virginia Tech, Blacksburg, VA, USA. Support provided in part by NSF and the National Library of Medicine.
2
JCDL 2001: First Joint ACM/IEEE Conference on Digital Libraries (+ NSF DLI-2 PI mtg), http://www.jcdl.org, June 24-28, 2001 in Roanoke, VA. Conference Committee: General Chair: Edward A. Fox, Virginia Tech; Program Chair: Christine Borgman, UCLA; Treasurer: Neil Rowe, Naval Postgraduate School; Posters Chair: Craig Nevill-Manning, Rutgers U.; …
3
Outline: NDLTD; Harvesting Strategies and the OAI; MARIAN Middleware; Generating Digital Libraries with 5SL; Future Directions.
4
NDLTD (1 of 3). Context: Networked Digital Library of Theses and Dissertations, www.ndltd.org, www.theses.org. Please join! Submit your (student’s) works! International federation of universities, libraries, supporting institutions (e.g., VTLS union catalog). Extremely heterogeneous: autonomy of management and decentralization; disparate protocols, metadata, repositories (e.g., UMI, OCLC’s WorldCat), language, encodings, user characteristics and preferences.
5
NDLTD (2 of 3). Worldwide organization: educational/social context. National/regional projects in Australia, Catalunya, Germany, India, Latin America (UNESCO/OAS/ISTEC), South Africa (Mellon), USA (including OhioLINK), … International conference (225 in March 2000, more expected for next, at Caltech). Steering committee representing supporting groups as well as the hundreds of universities.
6
NDLTD (3 of 3). Unique collection: discipline/document context. Multilingual and multimedia content. Large book-size documents. Full content in several formats (XML, PDF, etc.). Large number of bibliographic references. Several sets of metadata of varying quality that can fit with the Open Archives Initiative (www.openarchives.org).
7
Harvesting Strategies. Harvesting vs. Federated Search. Harvesting plus Federated Search, plus local collections. The NDLTD Union Collection. Multiple Harvesting Protocols: Harvest™ System, Z39.50, Dienst, OAI.
8
Union Collection Architecture
9
Open Archives Initiative (OAI). Interoperability Standards: Released - Jan/Feb. Data + Service Providers. Metadata Harvesting Protocol: unique identifiers (URNs) for each record; date-stamp for each record when last modified/created/deleted; HTTP server with scripting capabilities. 6 service requests (verbs): Identify, ListMetadataFormats, ListSets, ListIdentifiers, GetRecord, ListRecords.
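As a rough illustration of the six verbs, the sketch below builds HTTP GET request URLs for a data provider. The base URL is hypothetical, and the argument names (`metadataPrefix`, `from`) and the `oai_dc` prefix follow the published OAI-PMH specification; they are assumptions here, since the slide does not list request arguments.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Sketch only: builds request URLs for the six OAI verbs named on the slide. */
class OaiRequestSketch {

    /** The six service requests (verbs). */
    enum Verb { Identify, ListMetadataFormats, ListSets, ListIdentifiers, GetRecord, ListRecords }

    /** Joins the base URL, the verb, and any extra arguments into one GET URL. */
    static String buildUrl(String baseUrl, Verb verb, Map<String, String> args) {
        StringJoiner query = new StringJoiner("&");
        query.add("verb=" + verb.name());
        for (Map.Entry<String, String> e : args.entrySet()) {
            query.add(encode(e.getKey()) + "=" + encode(e.getValue()));
        }
        return baseUrl + "?" + query;
    }

    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] argv) {
        String base = "http://repository.example.org/oai"; // hypothetical data provider
        Map<String, String> args = new LinkedHashMap<>();
        args.put("metadataPrefix", "oai_dc"); // assumed argument name and value
        args.put("from", "2001-01-01");       // assumed argument name
        System.out.println(buildUrl(base, Verb.Identify, new LinkedHashMap<>()));
        System.out.println(buildUrl(base, Verb.ListRecords, args));
    }
}
```

Keeping the verbs in an enum means a misspelled verb fails at compile time rather than at the data provider.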
10
Figure (Herbert Van de Sompel): low-barrier interop umbrella. Labels: metadata; OPAC, image, FTXT, A&I, e-print.
11
OAI harvesting tools (Herbert Van de Sompel). Figure labels: service provider, harvester; data provider, repository; Datestamp, Identifier, Set; Records.
12
OAI harvesting tools (Herbert Van de Sompel). Figure labels: service provider, harvester; data provider, repository. Supporting protocol requests: Identify, ListMetadataFormats, ListSets. Harvesting protocol requests: ListRecords, ListIdentifiers, GetRecord.
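The datestamp on each record is what lets a harvester fetch only what changed since its last visit. The sketch below imitates that exchange entirely in memory, under assumptions: the `Rec`, `listRecords`, and `Harvester` names are hypothetical, and a real harvester would issue HTTP requests to the data provider instead of reading a list.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch only: incremental harvesting driven by record datestamps. */
class HarvestSketch {

    /** Record header with the three fields named in the figure. */
    static final class Rec {
        final String identifier;
        final LocalDate datestamp;
        final String set;

        Rec(String identifier, LocalDate datestamp, String set) {
            this.identifier = identifier;
            this.datestamp = datestamp;
            this.set = set;
        }
    }

    /** Stand-in for a data provider's ListRecords: records stamped on or after 'from'. */
    static List<Rec> listRecords(List<Rec> repository, LocalDate from) {
        List<Rec> changed = new ArrayList<>();
        for (Rec r : repository) {
            if (!r.datestamp.isBefore(from)) {
                changed.add(r);
            }
        }
        return changed;
    }

    /** Service-provider side: keeps a local copy and the date of the last harvest. */
    static final class Harvester {
        private LocalDate lastHarvest = LocalDate.MIN;
        final Map<String, Rec> localCopy = new LinkedHashMap<>();

        /** Fetches only records changed since the previous run; returns how many arrived. */
        int harvest(List<Rec> repository, LocalDate today) {
            List<Rec> changed = listRecords(repository, lastHarvest);
            for (Rec r : changed) {
                localCopy.put(r.identifier, r); // the unique identifier is the merge key
            }
            lastHarvest = today;
            return changed.size();
        }
    }
}
```

Because the identifier is the merge key, re-harvesting a record that was already fetched simply overwrites the local copy.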
13
Design Features: Combined Harvesting, Federated Search, and Local Collections; Object-Oriented Information Graph Representation; 5S Model and 5SL Specification Language.
14
MARIAN Middleware. Flexible Representation Model: Information Graph; Class Hierarchies; Weights and Weighted Sets (with lazy evaluation). Class-Based Search: Unified Searcher API. Combining Heterogeneous Information: Structural Matching; Synthetic Superclasses.
15
Information Graph Model (1/2). Each Information Object is a Node. Structure is exposed through Links. Features of interest can become Nodes or can remain hidden within Node Class search methods.
16
Information Graph Model (2/2)
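A minimal sketch of the node-and-link idea, assuming nothing about MARIAN's actual classes: the `Node` and `Link` types, their fields, and the `HasAuthor` link class used in the test are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: information objects as nodes, structure exposed as typed, weighted links. */
class InfoGraphSketch {

    /** A typed, weighted, directed link between two nodes. */
    static final class Link {
        final String linkClass;
        final Node from;
        final Node to;
        final double weight;

        Link(String linkClass, Node from, Node to, double weight) {
            this.linkClass = linkClass;
            this.from = from;
            this.to = to;
            this.weight = weight;
        }
    }

    /** An information object; its class name decides which search methods would apply. */
    static final class Node {
        final String id;
        final String className;
        final List<Link> links = new ArrayList<>();

        Node(String id, String className) {
            this.id = id;
            this.className = className;
        }

        /** Adds an outgoing link and returns it. */
        Link linkTo(Node target, String linkClass, double weight) {
            Link link = new Link(linkClass, this, target, weight);
            links.add(link);
            return link;
        }

        /** Follows outgoing links of one class. */
        List<Node> neighbors(String linkClass) {
            List<Node> result = new ArrayList<>();
            for (Link l : links) {
                if (l.linkClass.equals(linkClass)) {
                    result.add(l.to);
                }
            }
            return result;
        }
    }
}
```

A feature such as an author can be promoted to its own node and reached by a link, or kept as a hidden field that only the node class's own search method reads.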
17
Class-Based Search. Common Search Methods: Text; Link / Weighted Link; Node in Context. Common Searcher Operations: Match Best (weighted maximum); Match Most (summative union).
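The two searcher operations can be read as two ways of merging weighted result sets. The sketch below takes the parenthetical glosses literally (maximum weight per object versus sum of weights per object). The map-based representation and the method names are assumptions, and MARIAN may normalize or cap weights in ways not shown on the slide.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch only: merging weighted sets, modeled as maps from object id to weight. */
class WeightedSetSketch {

    /** "Match Best": each object keeps the highest weight it received in any input set. */
    static Map<String, Double> matchBest(List<Map<String, Double>> sets) {
        Map<String, Double> merged = new LinkedHashMap<>();
        for (Map<String, Double> s : sets) {
            for (Map.Entry<String, Double> e : s.entrySet()) {
                merged.merge(e.getKey(), e.getValue(), Double::max);
            }
        }
        return merged;
    }

    /** "Match Most": each object accumulates the weights from every set it appears in. */
    static Map<String, Double> matchMost(List<Map<String, Double>> sets) {
        Map<String, Double> merged = new LinkedHashMap<>();
        for (Map<String, Double> s : sets) {
            for (Map.Entry<String, Double> e : s.entrySet()) {
                merged.merge(e.getKey(), e.getValue(), Double::sum);
            }
        }
        return merged;
    }
}
```

Under this reading, Match Best favors an object that matches one criterion very well, while Match Most favors an object that matches many criteria at once.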
18
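Class-Based Search
    import java.util.Vector;

    public interface ClassManager {
        // Weighted set of objects in this class that match the description.
        public WtdObjSet match(InfoDesc description);
        // Whether the identified object belongs to this class.
        public boolean isInClass(FullID id);
        // Resolve one identifier, or a vector of identifiers, to objects.
        public Object idToObject(FullID id);
        public Vector idsToObjects(Vector ids);
    }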
19
Class-Based Search
20
Combining Sources of Information. Structural Matching: extends Weighted Retrieval to include “Best Match to Document Structure”; recursive, extensible. Collection Views: simple interface to complex collections; common interface to diverse collections; weighted interface to collections of varying quality.
21
NDLTD Collection View (part). Figure labels: PhysDis-ETD (SOIF); ThesisDissertation (title, description); Individual; Subject; Dc.creator, Dc.Subject, dc.title, dc.description; crawlerTitle, crawlerDescription; Headings, Keywords, body; links HasDcCreator, HasCrawlerAuthor, HasAuthor, HasDcSubject, HasHeadings, HasKeywords, HasSubject; SubClasses; link weights 0.8, 0.9, 1.0.
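One way to read a weighted collection view is that each source field feeds a synthetic attribute through a weighted link, so that evidence from less reliable fields counts for less. The sketch below assumes that reading. The pairing of `dc.title` with 1.0 and `crawlerTitle` with 0.8 is illustrative only, because the figure's actual weight assignments cannot be recovered from the transcript, and scoring by the maximum of weight times field score is likewise an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch only: scoring a synthetic "title" attribute through weighted source links. */
class CollectionViewSketch {

    /** Illustrative link weights from source fields into the synthetic "title" attribute. */
    static Map<String, Double> titleLinks() {
        Map<String, Double> links = new LinkedHashMap<>();
        links.put("dc.title", 1.0);     // assumed weight
        links.put("crawlerTitle", 0.8); // assumed weight
        return links;
    }

    /** Best weighted match: the maximum over source fields of linkWeight * fieldScore. */
    static double score(Map<String, Double> links, Map<String, Double> fieldScores) {
        double best = 0.0;
        for (Map.Entry<String, Double> link : links.entrySet()) {
            Double fieldScore = fieldScores.get(link.getKey());
            if (fieldScore != null) {
                best = Math.max(best, link.getValue() * fieldScore);
            }
        }
        return best;
    }
}
```

A query against the synthetic attribute then needs no knowledge of which source fields exist in a given collection; fields a collection lacks are skipped.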
22
5S Model for Digital Libraries (1/2). Formal Model: Streams, Structures, Spaces, Services, Societies.
23
5S Model for Digital Libraries (2/2). Formal Model, with NDLTD / MARIAN Example:
Streams: Document (presentable, indexable information object)
Structures: Weighted Set (e.g., of results to a match operation)
Spaces: Collection Graph; Inheritance Lattice; Measure Space
Services: Adaptive Search; Query History Maintenance
Societies: Library End-Users; DL Builders
24
5SL Generates Digital Library (Components)
25
Generating Digital Libraries: XML
26
Interoperability with 5S and 5SL. Reductionist / Constructivist Approach. Compositional mappings between DLs. Composition of S-based constructs. Mapping language.
27
Student Projects to Integrate: Schedule-driven Harvester; SDI / Filtering for NDLTD; MARIAN-Phronesis (Spanish – Monterrey), and work with German (Oldenburg / DFG), Portuguese, Chinese, Japanese, Korean; TREC data formatted for loading.
28
Future Work. Fusion on hybrid architecture. Incorporation of belief networks. Using 5SL to generate wrappers. New services/functionalities: Personalization (e.g., history, folders); Visualization (e.g., Envision applet). Integration with PetaPlex (100 nodes, 2.5 Tbytes disk capacity, > 300 Mbps to campus backbone, Sornil inversion).
29
Conclusions. NDLTD provides a real, fertile DL testbed. Harvesting strategies and the OAI. MARIAN middleware: graphs, classes, views. Generating Digital Libraries with 5SL. Future: high performance services, experimental comparisons.