ETD Search Services Ming Luo Edward A. Fox Virginia Tech
Acknowledgements (selected) Support: Adobe, AOL, DFG, NSF (DUE , , ; IIS , ), OCLC, UNESCO, VTLS Colleagues: Vinod Chachra, Tom Dehn, Marcos Gonçalves, Thom Hickey, Aaron Krowne, Ming Luo, Gail McMillan, Hussein Suleman, Jeff Young
Where are the data coming from? From this community! Please join us, share your data! OCLC NDLTD Union – Metadata for 112,652 ETDs – 44 institutions – Will be able to provide 3 formats: DC, ETDMS, MARC21
Where are the data coming from? OCLC (Research) contacts are – Thom Hickey – Jeff Young – Tom Dehn VTLS has some additional data sources – Providing data other than through OAI-PMH – Including in Korean and Greek – Contact is Vinod Chachra
Institutions in OCLC NDLTD Union Set (1)
Institutions in OCLC NDLTD Union Set (2)
OCLC SRU What is SRU? – Search and Retrieve URL Service (SRU) is a web-service-based protocol for searching databases – Derived from Z39.50 – Uses the Common Query Language (CQL) – Current version: V1.1, 13 February 2004 – Three basic operations: explain, scan, and searchRetrieve
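The three operations above can be sketched as plain HTTP GET URLs. This is a minimal illustration, not the OCLC service's actual interface: the endpoint `BASE_URL` and the helper names are hypothetical, while the parameter names (`operation`, `version`, `query`, `startRecord`, `maximumRecords`, `scanClause`) are those defined by SRU 1.1.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration (not the real OCLC address).
BASE_URL = "http://sru.example.org/etd"

OPERATIONS = ("explain", "scan", "searchRetrieve")


def sru_url(operation, base_url=BASE_URL, **params):
    """Build an SRU 1.1 request URL for one of the three basic operations."""
    if operation not in OPERATIONS:
        raise ValueError(f"unknown SRU operation: {operation}")
    query = {"operation": operation, "version": "1.1"}
    query.update(params)
    # urlencode percent-encodes the CQL text so it is safe inside a URL.
    return base_url + "?" + urlencode(query)


def search_retrieve(cql, start=1, maximum=10):
    """URL for a searchRetrieve request carrying a CQL query."""
    return sru_url("searchRetrieve", query=cql,
                   startRecord=start, maximumRecords=maximum)


def scan(clause):
    """URL for a scan request, which browses index terms near a clause."""
    return sru_url("scan", scanClause=clause)


if __name__ == "__main__":
    print(sru_url("explain"))
    print(scan('dc.title = "cat"'))
    print(search_retrieve('dc.title = "cat"', maximum=5))
```

Because every request is an ordinary URL, any HTTP client (or a browser) can issue it; the server answers with an XML document.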
CQL Examples
– dc.title cql.stem
– dc.title = "cat"
– cat
– dc.title = "cat"
– author = "smith"
– dc.title any "cat"
– bath.author cql.exact "smith, j."
– dc.title any/relevant/rel.CORI "cat fish"
– dc.author exact/stem "smith, j."
– dc.title = cat " "
– dc.title = "cat" and bath.author = "smith"
– "cat" or hat
– dc.title = "cat" prox/distance=1/unit=word dc.title = "in"
– "cat" prox/distance>2/ordered "hat"
– dc.title=cat and/rel.sum dc.title=dog
– > dc=" dc.title = "cat"
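Queries of the simpler shapes shown on this slide (index, relation, term, joined by booleans) can be assembled programmatically. The sketch below is only an illustration: the helper names `quote`, `clause`, and `combine` are hypothetical, and it always double-quotes terms, which CQL permits, rather than deciding when quotes are strictly required.

```python
BOOLEANS = ("and", "or", "not", "prox")


def quote(term):
    """Wrap a term in double quotes, escaping any embedded double quote."""
    return '"' + term.replace('"', '\\"') + '"'


def clause(index, relation, term):
    """Build one search clause, e.g. dc.title = "cat"."""
    return f"{index} {relation} {quote(term)}"


def combine(operator, *clauses):
    """Join clauses with a boolean operator, parenthesised for safe nesting."""
    if operator not in BOOLEANS:
        raise ValueError(f"not a CQL boolean operator: {operator}")
    return "(" + f" {operator} ".join(clauses) + ")"


if __name__ == "__main__":
    title = clause("dc.title", "=", "cat")
    author = clause("bath.author", "=", "smith")
    print(combine("and", title, author))
```

Parenthesising every combined group keeps nested queries unambiguous without relying on operator precedence.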
OCLC SRU Interface
VTLS Search Based on the Virtua system from VTLS (Visionary Technology in Library Solutions) – Developed in C++ – Uses Oracle Database
Virtua User Interface Scan Search Key Word Search Expert Search
VTLS Union Catalog Content Languages The VTLS NDLTD Union Catalog has data in 6 different languages: English, German, Greek, Korean, Portuguese, and Spanish. Examples follow.
Language = German; hits = 137
Full record display
Virginia Tech ETD Union – Componentized Digital Library Software – Uses OCLC’s OAI data provider – Mirrored in China by CALIS – About 200 queries and 400 pages per day for the past year, and usage is increasing
The World According to OAI [Diagram: service providers (discovery, current awareness, preservation) and data providers, connected by metadata harvesting]
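Metadata harvesting between a service provider and a data provider can be sketched as follows. This is a hedged illustration, not the VT or OCLC code: the base URL, the sample response, and the function names are made up for the example, while the `ListRecords` verb, the `metadataPrefix` and `resumptionToken` arguments, and the XML namespace come from OAI-PMH 2.0. No network access is performed.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
BASE_URL = "http://provider.example.org/oai"  # hypothetical data provider

# Invented sample response, trimmed to the parts the parser reads.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:etd-1</identifier></header></record>
    <record><header><identifier>oai:example.org:etd-2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request; a resumption token replaces other arguments."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, next resumption token or None)."""
    root = ET.fromstring(xml_text)
    ids = [h.findtext(OAI_NS + "identifier") for h in root.iter(OAI_NS + "header")]
    token_el = root.find(f".//{OAI_NS}resumptionToken")
    has_token = token_el is not None and token_el.text and token_el.text.strip()
    return ids, (token_el.text.strip() if has_token else None)


if __name__ == "__main__":
    print(list_records_url(BASE_URL))
    ids, token = parse_list_records(SAMPLE)
    print(ids, token)
    print(list_records_url(BASE_URL, resumption_token=token))
```

A real harvester would fetch each URL, parse the response, and repeat with the returned token until the provider sends none.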
VT ETD Union System Diagram [Diagram labels: User Interface; Browse, Search, What’s New; OCLC data provider]
Open Digital Library Protocol: Extended OAI-PMH (Protocol for Metadata Harvesting)
Open Digital Library Component: Extended Open Archive
Open Digital Library Components Running now – XML-File (data provider from file system) – Search: simple or in-memory (Essex) or generalized – Union, browse, recent, filter – E-journal/review, Submit, Edit, Annotation – Recommender, Rating; Mirroring (see JCDL’02) – Working with NCSA: from DB, unstructured text Others in process – Classification/categorization – Registry (and other connections with web services)
[Diagram: an open digital library over collections of programs, documents, images, and videos; labels: OA, PMH, XPMH]
Example Open Digital Library: ETD DL for the Networked Digital Library of Theses and Dissertations [Diagram labels: ETD collections (ETD, ETD-4); PMH; ODLUnion, ODLSearch, ODLBrowse, ODLRecent; Search, Filter, Union, Recent, Browse; user interface; students and researchers]
ETD Union Search Mirror Site in China (CALIS): popular site!
Quality of Search Services
Columns: Composability; Efficiency; Effectiveness
– OCLC SRU: Medium; High
– VTLS: Low; High; Medium
– Virginia Tech ETD Union Search: High; Medium
Comparison of Software
Columns: Software Used; Software License; Price of Software; Full Text Search
– OCLC SRU: Homegrown; N/A; No
– VTLS: VIRTUA (VTLS.com); Commercial; Depends on user number, collection size; No
– Virginia Tech ETD Union Search: Open Digital Library; BSD-like Open Source License; Free; No
– Virginia Tech ETD collection: Ultraseek (Verity.com); Commercial; Depends on user number, collection size; Yes
Next Steps with VT ETD Union – Web Services based component – Easier user interface configuration – Better precision of search results; full-text? – Research studies (e.g., Ryan Richardson dissertation): studies of collections and genre; summaries using concept maps; cross-language retrieval
References: Z39.50 International - Next Generation: VT Service: bin/OCLCUnion/UI/index.pl VTLS Service: OCLC Service (SRU):
Thank You! Paper with more details is available at URL: SearchServices0.7.doc DLRL: Fox: