1
Dec 9-11, 2003, ICADL 2003
Challenges in Building Federation Services over Harvested Metadata
Hesham Anan, Jianfeng Tang, Kurt Maly, Michael Nelson, Mohammad Zubair, and Zhao Yang
Digital Library Group, Old Dominion University, Norfolk, VA 23529
2
Outline
– Motivation
– Overview
– Process Automation
– Web Services and Applications
– Performance
– Conclusions and Future Work
3
Motivation
Harvesting provides only the basic service of getting metadata from repositories. Processing these data or retrieving related metadata is not part of OAI-PMH. Dynamic harvesting introduces the challenge of keeping specialized services consistent as new metadata records are ingested.
4
Motivation
Use of the Web Services standard is growing, so providing services compliant with this standard will increase the usability of our digital library. Web services enable third parties to build services on top of our federated collection that enhance our native services.
5
Overview
Archon is a federation of physics digital libraries. Its architecture provides services to both humans and machines:
Basic Services (for humans)
– a search and discovery service
– a service to allow searching on equations embedded in the metadata
– a cross-archive citation service
OAI Services (for machines)
– a storage service for the metadata of collected archives
– a harvester service to collect data from digital libraries using OAI-PMH
– a data provider service to expose metadata to OAI-PMH harvesters
Web Services (for machines)
– a focus library for personal use
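The harvester service talks to repositories through OAI-PMH, where every request is an HTTP query carrying a `verb` argument. As a minimal sketch of how an incremental `ListRecords` request could be formed (the helper name `list_records_url` and the example base URL are assumptions for illustration, not part of Archon):

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None,
                     set_spec=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    if resumption_token is not None:
        # resumptionToken is an exclusive argument: only the verb accompanies it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date is not None:
            # "from" restricts the harvest to records changed since that date.
            params["from"] = from_date
        if set_spec is not None:
            params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# First request of an incremental harvest, then a follow-up with a token.
first = list_records_url("http://example.org/oai", from_date="2003-12-01")
follow_up = list_records_url("http://example.org/oai", resumption_token="abc")
```

Passing a `from` date is what keeps repeated harvests incremental: only records added or changed since the previous run are returned.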
6
Archon Architecture
7
Process Automation
At the core of Archon are high-level services that require post-processing of harvested metadata. We implemented Archon’s post-harvesting processes as tasks that can be run incrementally and automatically. The post-processing consists of tasks for citation processing, equation processing, normalization, and subject resolving.
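One way to picture tasks that run incrementally is to let each task remember the last record it handled and process only newer records on the next run. The sketch below assumes records carry increasing integer ids; the `Task` class and `run_incrementally` function are hypothetical names, not Archon's actual implementation:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Task:
    name: str                        # e.g. "citation", "equation", "subject"
    process: Callable[[dict], None]  # work applied to one metadata record
    last_seen: int = 0               # highest record id already processed


def run_incrementally(tasks, records):
    """Run each task on the records it has not processed yet.

    records: list of (record_id, metadata) pairs sorted by increasing id.
    """
    for task in tasks:
        for record_id, metadata in records:
            if record_id > task.last_seen:
                task.process(metadata)
                task.last_seen = record_id


# Usage: a second run after a new harvest touches only the new record.
processed = []
citation_task = Task("citation", processed.append)
harvested = [(1, {"title": "A"}), (2, {"title": "B"})]
run_incrementally([citation_task], harvested)
harvested.append((3, {"title": "C"}))
run_incrementally([citation_task], harvested)
```

Keeping the marker per task means a failed task can be rerun on its own without repeating the work of the others.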
8
Harvest Post Processing - Citation Processing
The reference-linking service provides the user with a list of the references for each metadata record. Where possible, the service provides links to the documents at external source archives and within Archon.
9
Harvest Post Processing - Citation Processing
10
Harvest Post Processing - Citation Processing
11
Harvest Post Processing - Citation Processing
12
Harvest Post Processing - Citation Processing
Data for Resolved References
13
Harvest Post Processing - Equation Processing
We represent equations as images and display these images when the metadata records are displayed. This requires the following tasks to be performed after harvesting new metadata records:
– Identifying equations
– Filtering equations
– Equation storage
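The three tasks above can be sketched as a small pipeline. The slides do not say how equations are encoded or filtered, so this sketch assumes inline LaTeX between single dollar signs, an invented filter rule, and an in-memory dictionary standing in for storage; all function names are hypothetical:

```python
import hashlib
import re

# Assumption: equations appear as inline LaTeX between single dollar signs.
EQUATION = re.compile(r"\$([^$]+)\$")


def identify(text):
    """Return every candidate equation found in a metadata field."""
    return [match.strip() for match in EQUATION.findall(text)]


def keep(equation):
    """Assumed filter rule: keep fragments containing an operator,
    a sub/superscript, or a LaTeX command; drop bare symbols."""
    return any(ch in equation for ch in "=\\^_+-")


def store(equations, table):
    """Store each equation once, keyed by a hash of its source text."""
    for equation in equations:
        key = hashlib.sha1(equation.encode("utf-8")).hexdigest()
        table.setdefault(key, equation)
    return table


abstract = r"Mass $m$ obeys $E = mc^2$ and $\alpha_s$ runs."
found = identify(abstract)
kept = [eq for eq in found if keep(eq)]
table = store(kept, {})
```

Rendering the stored source to an image would be a further step that is not shown here.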
14
Harvest Post Processing - Equation Processing
15
Harvest Post Processing - Subject Resolvers
Our subject resolver tries to fill in the subject field for APS and arXiv DC records.
16
Harvest Post Processing - Statistics
Historical
Archive   #records   #refs
APS       39,064     686,521
arXiv     229,076    4,838,158
CERN      17,055     58,105
NASA      38,688     N/A
Emilio    3,480      N/A
Incremental (columns: #records, #refs)
APS arXiv CERN NASA 4,052 49 607 66,096 0* 594 12 #Equation #subject resolved 37 581 25 48
*Due to lack of parallel metadata or parse errors in parallel metadata. Equations are not processed for records whose subject is not resolved.
Archon collection
Unique Authors: 346,315
Unique Subjects: 9,889
Equations (all): 330,503
17
Web Services and Applications
We created web services that allow students and teachers to create personal collections. These services use Web Services standards, including SOAP requests and responses for communication between the clients and the services. Examples of these services include:
– Search Service
– Book Shelf Service
18
Web Services and Applications
Book Shelf Service
– allows each user to have a personalized collection that is a subset of the federation
– enables teachers to collect course materials and package them in a personalized collection
– enables students researching a topic to make a special collection that contains all the related documents
Search Service
– provides access to all search functionality without the need to use the Archon interface
– allows each user (e.g. a teacher) to provide a customized client for the collections, with special features according to a course’s needs
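Because clients and services exchange SOAP messages, a custom client mainly needs to build a request envelope and parse the response. The following is a rough sketch of a SOAP 1.1 request for a search call; the service namespace, the `search` operation, and its `query` and `maxResults` parameters are placeholders, since the slides do not give the real interface:

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:search"                       # hypothetical service namespace


def build_search_request(query, max_results=10):
    """Serialize a SOAP 1.1 envelope for a hypothetical 'search' operation."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    operation = ET.SubElement(body, f"{{{SERVICE_NS}}}search")
    ET.SubElement(operation, f"{{{SERVICE_NS}}}query").text = query
    ET.SubElement(operation, f"{{{SERVICE_NS}}}maxResults").text = str(max_results)
    return ET.tostring(envelope, encoding="unicode")


request_xml = build_search_request("quark mass", max_results=5)
```

A client would POST this string to the service endpoint with a `text/xml` content type and read the result out of the response body in the same way.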
21
Web Services and Applications
22
Web Services and Applications
24
Conclusions and Future Work
We collected about 300K DC metadata records for documents from APS, CERN, arXiv, Emilio and NASA, along with 30K parallel metadata records from APS. We have also resolved the data of 5.5M references that are cited by these documents. Our performance analysis shows that we can comfortably set the scheduler of the OAI harvester to about 1 day and still have a safety factor for human intervention should the automatic process break down.
25
Conclusions and Future Work
We have developed Web Services that can be used for search and discovery of our collections. Other developers can use these web services to provide customized or enhanced services, or to build services in addition to those currently provided. We have also developed sample client applications, such as a bookshelf client that stores a collection of documents and can export them as references (in user-defined formats) to help authors write research papers.
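Exporting a bookshelf in a user-defined format can be illustrated with a simple template substitution. This is only a sketch of the idea: the record fields (`authors`, `year`, `title`) and the function name are assumptions, and the actual bookshelf client may work differently:

```python
from string import Template


def export_references(bookshelf, user_format):
    """Render each stored record with a user-supplied format string.

    Placeholders use the $name syntax of string.Template and must match
    keys present in every record (field names here are hypothetical).
    """
    template = Template(user_format)
    return [template.substitute(record) for record in bookshelf]


bookshelf = [
    {"authors": "A. Author", "year": "2003", "title": "An Example Title"},
    {"authors": "B. Writer", "year": "2001", "title": "Another Title"},
]
references = export_references(bookshelf, "$authors ($year). $title.")
```

Changing only the format string (for example to `"$title, $authors, $year"`) yields a different citation style from the same stored collection.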
26
Conclusions and Future Work
We have almost completed adding production service for the federation of CERN, arXiv, and APS. We have partially completed adding NASA and plan to collaborate with AIP (American Institute of Physics) to have their collections included as well. Once all of these are federated and working at the high service level on a dynamic basis, the Web services should prove attractive, particularly to authors of papers, who can thus maintain their own bibliographies.
27
Future Work
– Collections have overlapping holdings, so a strong de-duplication service is needed
– Expand the personalization effort to allow students and researchers to integrate the DL information into their writing of reports and papers
– Test a role-based access system that allows each contributing collection to have different policies for different organizations
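To make the de-duplication need concrete, here is one simple illustrative approach, not the authors' planned design: treat two records as the same holding when their normalized title and year match. The record fields and function names are assumptions:

```python
import re


def dedup_key(record):
    """Normalize the title (lowercase, punctuation and extra spaces removed)
    and pair it with the year. Field names are hypothetical."""
    title = re.sub(r"[^a-z0-9]+", " ", record["title"].lower()).strip()
    return (title, record.get("year"))


def deduplicate(records):
    """Keep the first record seen for each key, preserving input order."""
    unique = {}
    for record in records:
        unique.setdefault(dedup_key(record), record)
    return list(unique.values())


holdings = [
    {"archive": "arXiv", "title": "Quark Masses", "year": "2002"},
    {"archive": "APS", "title": "quark  masses.", "year": "2002"},
    {"archive": "CERN", "title": "Lattice Methods", "year": "2001"},
]
unique_holdings = deduplicate(holdings)
```

A production service would likely need fuzzier matching (author lists, identifiers), since titles alone can collide or differ between archives.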
28
Harvest Performance
Harvesting from NCSTRL-NCSU
Operation     Operation Time (s)   Number of Times   Average Time (s)
Identify      0.6                  1
DB            7.0                  143               0.1
Resumption    0.1                  2
ListRecords   46.2                 2                 23.1
ListSets      24.8                 1
Total         80.5                 143               0.6
[1] An entry ‘0.1’ means a time less than 0.1s.
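The average-time column appears to be the operation time divided by the number of times, with the footnote's convention that 0.1 stands for anything below 0.1 s. A small sketch under that assumption (the function name is hypothetical) reproduces the listed averages:

```python
def average_time(total_seconds, count):
    """Average time per operation, rounded to one decimal place.

    Following the table footnote, any average below 0.1 s is reported as 0.1.
    """
    average = total_seconds / count
    return 0.1 if average < 0.1 else round(average, 1)


db_average = average_time(7.0, 143)             # 7.0 s over 143 DB operations
list_records_average = average_time(46.2, 2)    # 46.2 s over 2 ListRecords calls
total_average = average_time(80.5, 143)         # Total row
```

The two `ListRecords` calls dominate the harvest time, at roughly 23 s each.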
29
Harvest Performance
Harvesting from arXiv (from ARC)
30
Harvest Performance
Harvesting from APS (DC)
31
Harvest Performance
Parallel Harvesting from APS
32
Citation Processing Performance
Citation Processing for APS
33
Citation Processing Performance
Citation Processing for arXiv
34
Citation Processing Performance
Citation Processing for CERN
35
Subject Resolving Performance
APS Subject Resolving