Arc – Federated Searching Service
Kurt Maly, Xiaoming Liu, M. Zubair, Michael L. Nelson
Old Dominion University
January 23, 2001

Introduction
- Federated searching service
- Participant in the OAI alpha test

Background
- Universal Preprint Service
- Initial demonstration vehicle for OAI
- Based on NCSTRL+, which is an extension of NCSTRL
- Buckets
- Search engine developed at ODU, based on an Oracle database

Service (1/2)
Simple search
- Search free text across archives.
- Supports Boolean operators (and/or).
Advanced search
- Search across archives, or in a specific archive and its subset.
- Search free text in the author/title/abstract fields.
- Filter search/browse by archive/set/subject/type/language/datestamp/discovery date.
- Controlled vocabulary extracted from archives.
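The simple and advanced searches described on this slide can be sketched as follows. This is only an illustration, not Arc's implementation (the slides say Arc uses an Oracle-based search engine): the `Record` type, the `matches` and `search` helpers, and the case-insensitive substring matching are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical in-memory stand-in for one harvested metadata record."""
    archive: str
    title: str
    author: str
    abstract: str


def matches(record, terms, operator="and", fields=("title", "author", "abstract")):
    """Return True if the chosen fields satisfy the Boolean combination of terms."""
    if operator not in ("and", "or"):
        raise ValueError("operator must be 'and' or 'or'")
    # Case-insensitive substring matching is an assumption of this sketch.
    text = " ".join(getattr(record, f) for f in fields).lower()
    hits = [term.lower() in text for term in terms]
    return all(hits) if operator == "and" else any(hits)


def search(records, terms, operator="and",
           fields=("title", "author", "abstract"), archive=None):
    """Search across all archives, or only within one archive when given."""
    return [
        r for r in records
        if (archive is None or r.archive == archive)
        and matches(r, terms, operator, fields)
    ]
```

Leaving `archive` and `fields` at their defaults corresponds to the simple search; passing them narrows the query in the way the advanced search does.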

Service (2/2)
Result sorting
- By datestamp, archive, or relevance ranking.
Result display
- Result list: NCSTRL+-like interface.
- Display a single document in detail:
  - Lightweight bucket.
  - Link to data source.

Collections being harvested
Data harvested from OAI 1.0 compliant archives:

Identifier   Full name of the archive
arXiv        arXiv e-print archive
CogPrints
NACA         National Advisory Committee for Aeronautics
NDLTD        Virginia Tech Thesis/Dissertation Collection
LTRS         Langley Technical Report Server

Data harvested from old SFC archives: WCR, NCSTRL

Harvesting - For Alpha Test Only

Identifier   Organization   Harvest URL
HeinOnline   Cornell
NSDL-CU      Cornell
ldc          UPenn
elra         UPenn
lcoa1        LOC
tkn          UTK
idli         UIUC

Implementation (1/3)

Implementation (2/3)
Data normalization
- Different archives use different formats and naming conventions for specific metadata fields.
Harvest
- Historical harvest: collects archival data published before a fixed time.
- Fresh harvest: an incremental harvester daemon periodically fetches newly published metadata from data providers.
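The difference between the historical and the fresh harvest can be sketched as a choice of request arguments. The sketch assumes the `ListRecords` verb with `metadataPrefix`, `from`, and `set` arguments as used in OAI metadata harvesting; the base URLs, the `list_records_url` and `plan_harvest` helpers, and the per-archive bookkeeping are hypothetical and not taken from Arc.

```python
from datetime import date
from urllib.parse import urlencode


def list_records_url(base_url, from_date=None, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL.

    With no from_date the request covers everything (historical harvest);
    with a from_date it asks only for newer records (fresh harvest).
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        params["from"] = from_date.isoformat()
    if set_spec is not None:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def plan_harvest(archives, last_harvest):
    """Map each archive id to its next request URL.

    archives: archive id -> base URL (hypothetical values in this sketch).
    last_harvest: archive id -> date of the last successful harvest, if any.
    """
    return {
        archive_id: list_records_url(url, last_harvest.get(archive_id))
        for archive_id, url in archives.items()
    }
```

A daemon would call something like `plan_harvest` on a schedule, fetch each URL, and update `last_harvest` only after a harvest succeeds, so that a failed run is simply retried from the same date.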

Implementation (3/3)
- Metadata indexed with Oracle’s context cartridge server.
- Session information maintained in a local cache, for performance reasons: result sets can be large and are manipulated in the cache rather than in the RDBMS.
- More info about the architecture: ECDL 2000, Maly et al., pp
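The idea of keeping a session's result set in a local cache, so that re-sorting and paging do not go back to the database, can be sketched as below. The `ResultCache` class and the dictionary-shaped rows are assumptions for illustration; the slides do not describe how Arc's cache is structured.

```python
class ResultCache:
    """Hypothetical per-session cache of search results held in memory."""

    def __init__(self):
        self._sessions = {}

    def store(self, session_id, results):
        """Keep a copy of the full result set for this session."""
        self._sessions[session_id] = list(results)

    def page(self, session_id, sort_key, page=0, page_size=10, reverse=False):
        """Sort the cached rows by one field and return a single page of them."""
        rows = sorted(
            self._sessions[session_id],
            key=lambda row: row[sort_key],
            reverse=reverse,
        )
        start = page * page_size
        return rows[start:start + page_size]

    def drop(self, session_id):
        """Release the cached rows when the session ends."""
        self._sessions.pop(session_id, None)
```

The query runs once per search; changing the sort order or moving to the next page then touches only the cached rows.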

Lessons Learned (1/2)
Quality of data providers
- The expense of maintaining a quality federation service depends heavily on the quality of the data providers.
Controlled vocabulary
- Using a unified controlled vocabulary, or at least defining mapping relationships, is important in a cross-archive service.
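A mapping relationship between archive-specific terms and a shared vocabulary can be sketched as a lookup table. The table entries, the canonical labels, and the `canonical_subject` and `group_by_subject` helpers are hypothetical; they only illustrate the kind of mapping this slide argues for.

```python
from collections import defaultdict

# Hypothetical mapping from (archive, local subject term) to a canonical subject.
SUBJECT_MAP = {
    ("arXiv", "hep-th"): "Physics",
    ("arXiv", "cs.DL"): "Computer Science",
    ("LTRS", "Aerodynamics"): "Aeronautics",
}


def canonical_subject(archive, term, mapping=SUBJECT_MAP, default="Unclassified"):
    """Translate an archive's own subject term into the shared vocabulary."""
    return mapping.get((archive, term.strip()), default)


def group_by_subject(records, mapping=SUBJECT_MAP):
    """Group (archive, term, identifier) triples under canonical subjects."""
    groups = defaultdict(list)
    for archive, term, identifier in records:
        groups[canonical_subject(archive, term, mapping)].append(identifier)
    return dict(groups)
```

Unmapped terms fall into a default bucket rather than being dropped, which keeps them visible so the mapping can be extended later.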

Lessons Learned (2/2)
XML syntax and character encoding
- A single error can affect a large set of data.
- Character encoding errors occur frequently at most data providers.
Harvest schedule
- We use a historical harvest plus a daily incremental harvest.
- There is a trade-off between data freshness and harvest efficiency.
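One way to limit the damage of a single XML or encoding error is to check records one at a time, so that a bad record is set aside instead of invalidating the whole batch. The sketch below assumes the records have already been split into separate byte strings and are expected to be UTF-8; the `parse_records` helper is hypothetical, and the slides do not say how Arc handles such errors.

```python
import xml.etree.ElementTree as ET


def parse_records(raw_records):
    """Parse each record separately and collect failures instead of aborting.

    raw_records: iterable of byte strings, one XML record each (an assumption).
    Returns (good, bad): parsed elements, and (index, error message) pairs.
    """
    good, bad = [], []
    for index, raw in enumerate(raw_records):
        try:
            text = raw.decode("utf-8")        # catches character encoding errors
            good.append(ET.fromstring(text))  # catches XML syntax errors
        except (UnicodeDecodeError, ET.ParseError) as err:
            bad.append((index, str(err)))
    return good, bad
```

The list of failures can then be reported back to the data provider while the well-formed records are indexed as usual.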

Future Work
- Create authority files for author, organization, format, etc.
- Map different subject classification systems to a canonical one.
- Add full bucket support.
- Link service, customized collections, changing the nature of the collection based on usage... and other value-added services if possible.

Acknowledgements
- Thanks for the help from the OAI alpha group and data providers.
- Thanks for the help from the ODU DL Group (