Challenges in Building Federation Services over Harvested Metadata (ICADL 2003, Dec 9-11, 2003)

Presentation transcript:

Challenges in Building Federation Services over Harvested Metadata
Hesham Anan, Jianfeng Tang, Kurt Maly, Michael Nelson, Mohammad Zubair, and Zhao Yang
Digital Library Group, Old Dominion University, Norfolk, VA 23529
ICADL 2003, Dec 9-11, 2003

Outline
- Motivation
- Overview
- Process Automation
- Web Services and Applications
- Performance
- Conclusions and Future Work

Motivation
- Harvesting provides only the basic service of getting metadata from repositories.
- Processing these data or retrieving related metadata is not part of OAI-PMH.
- Dynamic harvesting introduces the challenge of keeping specialized services consistent with the ingestion of new metadata records.

Motivation
- There is growing use of the Web Services standard, so providing services compliant with this standard will increase the usability of our digital library.
- Using Web Services enables third parties to provide services that enhance our native services on top of our federated collection.

Overview
Archon is a federation of physics digital libraries. Its architecture provides services to both humans and machines:
- Basic Services (for humans)
  - a search and discovery service
  - a service to allow searching on equations embedded in the metadata
  - a cross-archive citation service
- OAI Services (for machines)
  - a storage service for the metadata of collected archives
  - a harvester service to collect data from digital libraries using OAI-PMH
  - a data provider service to expose metadata to OAI-PMH harvesters
- Web Services (for machines)
  - a focus library for personal use
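As a rough illustration of what the harvester service sends to a data provider, the sketch below builds an OAI-PMH `ListRecords` request URL. The verb and argument names (`metadataPrefix`, `from`, `set`, `resumptionToken`) come from the OAI-PMH specification; the function name and the base URL are assumptions made for this example and are not taken from Archon.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None,
                     set_spec=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL (illustrative helper)."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument: it is sent
        # alone with the verb to continue a partially delivered list.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            # Selective harvesting: only records changed since this date.
            params["from"] = from_date
        if set_spec:
            params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Hypothetical repository address, used only for illustration.
EXAMPLE_BASE_URL = "http://example.org/oai"
first_request = list_records_url(EXAMPLE_BASE_URL, from_date="2003-12-01")
next_request = list_records_url(EXAMPLE_BASE_URL, resumption_token="abc")
```

The `from` argument is what makes incremental harvesting possible: each scheduled run asks only for records changed since the previous run.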

Archon Architecture

Process Automation
- At the core of Archon are high-level services that require post-processing of harvested metadata.
- We implemented Archon's post-harvesting processes as tasks that can be run incrementally and automatically.
- The Archon post-processing consists of tasks for citation processing, equation processing, normalization, and subject resolution.
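One way to picture "tasks that can be run incrementally and automatically" is a small runner that remembers which records it has already handled and applies every task only to newly harvested ones. This is a minimal sketch under that assumption; the class name, the record layout, and the `normalize` task are hypothetical and do not describe Archon's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class PostProcessor:
    """Runs post-harvest tasks over records it has not seen before."""
    tasks: List[Callable[[Dict], None]]
    processed_ids: Set[str] = field(default_factory=set)

    def run(self, records: List[Dict]) -> int:
        """Apply every task to each new record; return how many were new."""
        new_records = [r for r in records if r["id"] not in self.processed_ids]
        for record in new_records:
            for task in self.tasks:
                task(record)
            self.processed_ids.add(record["id"])
        return len(new_records)


def normalize(record: Dict) -> None:
    """Example task: collapse runs of whitespace in the title."""
    record["title"] = " ".join(record["title"].split())
```

A scheduler could call `run` after each harvest; records processed in an earlier run are skipped, so repeated runs stay cheap.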

Harvest Post Processing - Citation Processing
- The reference-linking service provides the user with a list of the references for each metadata record.
- Where possible, the service provides links to the documents at external source archives and within Archon.
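The slides do not say how references are matched to documents. As one plausible sketch, a parsed reference can be reduced to a normalized (journal, volume, page) key and looked up in an index of known records; unresolved references simply get no link. The key scheme, field names, and identifiers below are assumptions for illustration only.

```python
import re
from typing import Dict, List, Optional, Tuple

CitationKey = Tuple[str, str, str]


def citation_key(journal: str, volume, page) -> CitationKey:
    """Normalize a citation so that 'Phys. Rev. D' and 'phys rev d' match."""
    journal_norm = re.sub(r"[^a-z]", "", journal.lower())
    return (journal_norm, str(volume), str(page))


def resolve_references(references: List[Dict],
                       index: Dict[CitationKey, str]) -> List[Dict]:
    """Attach a link (record identifier) to each reference, or None."""
    resolved = []
    for ref in references:
        key = citation_key(ref["journal"], ref["volume"], ref["page"])
        link: Optional[str] = index.get(key)
        resolved.append({**ref, "link": link})
    return resolved
```

Usage: build `index` once from the harvested records, then call `resolve_references` on the parsed reference list of each newly ingested record.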

Harvest Post Processing - Citation Processing


Harvest Post Processing - Citation Processing: Data for Resolved References

Harvest Post Processing - Equation Processing
We represent the equations as images and display these images when the metadata records are displayed. This requires the following tasks to be performed after harvesting new metadata records:
- Identifying equations
- Filtering equations
- Equation storage
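The three tasks above can be sketched as follows, assuming (the slides do not say) that equations appear in the metadata as LaTeX between `$` delimiters. The filter rule and the hash-keyed store are likewise illustrative choices; rendering to images is left out.

```python
import hashlib
import re
from typing import Dict, List

# Assumption: inline LaTeX delimited by single dollar signs.
EQUATION_RE = re.compile(r"\$([^$]+)\$")


def identify_equations(text: str) -> List[str]:
    """Step 1: find candidate equations in a metadata field."""
    return [match.strip() for match in EQUATION_RE.findall(text)]


def is_worth_storing(equation: str) -> bool:
    """Step 2: drop trivial fragments such as a lone symbol."""
    return len(equation) > 3 and any(ch in equation for ch in "=\\^_")


def store_equations(equations: List[str], store: Dict[str, str]) -> List[str]:
    """Step 3: store each equation once, keyed by a content hash."""
    keys = []
    for equation in equations:
        key = hashlib.sha1(equation.encode("utf-8")).hexdigest()
        store.setdefault(key, equation)  # identical equations share one entry
        keys.append(key)
    return keys
```

Keying the store by a content hash means an equation that occurs in many records is stored, and would be rendered, only once.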

Harvest Post Processing - Equation Processing

Harvest Post Processing - Subject Resolvers
Our subject resolver tries to fill the subject field for APS and arXiv DC records.
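A subject resolver of this kind could be sketched as below, under the assumption that the richer parallel metadata record carries classification codes that can be mapped to subject labels. The field names (`subject`, `codes`) and the code table are hypothetical placeholders, not Archon's schema.

```python
from typing import Dict, List, Optional


def resolve_subject(dc_record: Dict,
                    parallel_record: Optional[Dict],
                    code_labels: Dict[str, str]) -> Dict:
    """Return a copy of dc_record with 'subject' filled in when possible."""
    if dc_record.get("subject"):
        return dc_record  # already has a subject, nothing to resolve
    if not parallel_record:
        return dc_record  # no parallel metadata: stays unresolved
    labels: List[str] = [
        code_labels[code]
        for code in parallel_record.get("codes", [])
        if code in code_labels
    ]
    if not labels:
        return dc_record  # codes missing or unknown: stays unresolved
    return {**dc_record, "subject": labels}
```

Records that come back without a subject are the ones a later run can retry once parallel metadata becomes available.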

Harvest Post Processing - Statistics

Historical (columns: #records, #refs)
- APS: 39, ,521
- arXiv: 229,076; 4,838,158
- CERN: 17,055; 58,105
- NASA: 38,688; N/A
- Emilio: 3,480; N/A

Incremental (rows: APS, arXiv, CERN, NASA; columns: #records, #refs, #Equation, #subject resolved)
- 4, ,096 0*

*Due to lack of parallel metadata or parse errors in the parallel metadata. Equations are not processed for records whose subject is not resolved.

Archon collection
- Unique Authors: 346,315
- Unique Subjects: 9,889
- Equations (all): 330,503

Web Services and Applications
We created Web services that allow students and teachers to create personal collections. These services use Web Services standards, including SOAP requests and responses for communication between the clients and the services. Examples of these services include:
- Search Service
- Book Shelf Service

Web Services and Applications
- Book Shelf Service
  - allows each user to have a personalized collection that is a subset of the federation
  - enables teachers to collect course materials and package them in a personalized collection
  - enables students who are researching a topic to make a special collection that contains all the related documents
- Search Service
  - provides access to all search functionality without the need to use the Archon interface
  - allows each user (e.g. a teacher) to provide a customized client for the collections, with special features suited to a course's needs
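To make the SOAP exchange concrete, the sketch below builds a request envelope that a custom client might send to a search service. The envelope namespace is the standard SOAP 1.1 one; the service namespace, the `search` operation, and its `query` and `maxResults` parameters are invented for this example, since the slides do not give Archon's actual service interface.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope
SERVICE_NS = "urn:example:archon-search"  # hypothetical service namespace


def build_search_request(query: str, max_results: int = 10) -> str:
    """Return a SOAP 1.1 envelope for a hypothetical 'search' operation."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    operation = ET.SubElement(body, f"{{{SERVICE_NS}}}search")
    ET.SubElement(operation, f"{{{SERVICE_NS}}}query").text = query
    ET.SubElement(operation, f"{{{SERVICE_NS}}}maxResults").text = str(max_results)
    return ET.tostring(envelope, encoding="unicode")


request_xml = build_search_request("quark mixing", max_results=5)
```

A client would POST this document to the service endpoint and parse the response envelope the same way, which is what lets a teacher's custom client reuse the search functionality without going through the Archon interface.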


Web Services and Applications


Conclusions and Future Work
- We collected about 300K DC metadata records for documents from APS, CERN, arXiv, Emilio, and NASA, plus 30K parallel metadata records from APS.
- We have also resolved the data of 5.5M references cited by these documents.
- Our performance analysis shows that we can comfortably set the scheduler of the OAI harvester to about 1 day and still have a safety factor for human intervention should the automatic process break down.

Conclusions and Future Work
- We have developed Web services that can be used for search and discovery of our collections.
- Other developers can use these Web services to provide customized or enhanced services, or to build services in addition to the ones currently provided.
- We have also developed sample client applications, such as a bookshelf client that stores a collection of documents and can export them as references (in user-defined formats) to help authors write research papers.

Conclusions and Future Work
- We have almost completed adding a production service that federates CERN, arXiv, and APS.
- We have partially completed adding NASA and plan to collaborate with AIP (American Institute of Physics) to have their collections included as well.
- Once all of these are federated and working at the high service level on a dynamic basis, the Web services should prove attractive, particularly to authors of papers, who can thus maintain their own bibliographies.

Future Work
- Collections have overlapping holdings, so a strong de-duplication service is needed.
- Expand the personalization effort to allow students and researchers to integrate the DL information into their writing of reports and papers.
- Test a role-based access system that allows each contributing collection to have different policies for different organizations.
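The de-duplication item is stated only as a need, so the following is just one simple way such a service could start: group records from different archives whose normalized title and first-author surname agree. The record fields and the matching rule are assumptions for illustration; a production service would need a more robust rule.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple


def dedup_key(record: Dict) -> Tuple[str, str]:
    """Normalized title plus first-author surname, used as a match key."""
    title = re.sub(r"[^a-z0-9]+", " ", record["title"].lower()).strip()
    authors = record.get("authors") or []
    # Assumes 'Surname, Given' author strings.
    surname = authors[0].split(",")[0].strip().lower() if authors else ""
    return (title, surname)


def group_duplicates(records: List[Dict]) -> List[List[str]]:
    """Return lists of record ids that share the same key."""
    groups = defaultdict(list)
    for record in records:
        groups[dedup_key(record)].append(record["id"])
    return [ids for ids in groups.values() if len(ids) > 1]
```

Exact-key matching like this catches the easy cases, such as the same paper harvested from both a preprint archive and a publisher, and leaves near-matches for a later fuzzy comparison step.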

Harvest Performance - Harvesting from NCSTRL-NCSU
Columns: Operation; Operation Time (s); Number of Times; Average Time (s)
- Identify: 0.61
- DB Resumption: 0.12
- ListRecords:
- ListSets: 24.81
- Total:
[1] An entry '0.1' means a time less than 0.1 s.

Harvest Performance - Harvesting from arXiv (from ARC)

Harvest Performance - Harvesting from APS (DC)

Harvest Performance - Parallel Harvesting from APS

Citation Processing Performance - Citation Processing for APS

Citation Processing Performance - Citation Processing for arXiv

Citation Processing Performance - Citation Processing for CERN

Subject Resolving Performance - APS Subject Resolving
