Danmarks Tekniske Videncenter / Technical Knowledge Center of Denmark Danmarks Tekniske Universitet / Technical University of Denmark DORSDL Workshop,

Similar presentations
Testing Relational Database

Alvis status report: Index Data Mike Taylor Alvis status report: Index Data Check out the exciting things to come! 1. Technical contribution.
Advanced Metadata Usage Daan Broeder TLA - MPI for Psycholinguistics / CLARIN Metadata in Context, APA/CLARIN Workshop, September 2010 Nijmegen.
JMS messaging service  All write-only Fedora operations are published to subscribed clients  Messaging system can be durable – if client/consumer/subscriber.
A. Grigorov, A. Georgiev, M. Petrov, S. Varbanov, K. Stefanov Building a Knowledge Repository for Life-long Competence Development.
Chapter 2. Slide 1 CULTURAL SUBJECT GATEWAYS CULTURAL SUBJECT GATEWAYS Subject Gateways  Started as links of lists  Continued as Web directories  Culminated.
Connect. Communicate. Collaborate Click to edit Master title style MODULE 1: perfSONAR TECHNICAL OVERVIEW.
Funded by: © AHDS Sherpa DP – a Technical Architecture for a Disaggregated Preservation Service Mark Hedges Arts and Humanities Data Service King’s College.
Depositing e-material to The National Library of Sweden.
1 Introduction to XML. XML eXtensible implies that users define tag content Markup implies it is a coded document Language implies it is a metalanguage.
Information Retrieval in Practice
Building a Digital Library with Fedora International Conference on Developing Digital Institutional Repositories Hong Kong December 9, 2004.
Rutgers Community Repository RU CORE Fedora Repository Object Datastreams.
Rheeve: A Plug-n-Play Peer-to-Peer Computing Platform Wang-kee Poon and Jiannong Cao Department of Computing, The Hong Kong Polytechnic University ICDCSW.
Tools and Services for the Long Term Preservation and Access of Digital Archives Joseph JaJa, Mike Smorul, and Sangchul Song Institute for Advanced Computer.
The Open Archives Initiative Simeon Warner (Cornell University) Symposium on “Scholarly Publishing and Archiving on the Web”, University.
ReQuest (Validating Semantic Searches) Norman Piedade de Noronha 16th July, 2004.
Interpret Application Specifications
Greenstone Digital Library Usage and Implementation By: Paul Raymond A. Afroilan Network Applications Team Preginet, ASTI-DOST.
AgriDrupal - a “suite of solutions” for agricultural information management and dissemination, built on the Drupal CMS; - the community of practice around.
Overview of Search Engines
Databases & Data Warehouses Chapter 3 Database Processing.
Digital Library Architecture and Technology
CONTI’2008, 5-6 June 2008, TIMISOARA 1 Towards a digital content management system Gheorghe Sebestyen-Pal, Tünde Bálint, Bogdan Moscaliuc, Agnes Sebestyen-Pal.
DEF System Architecture XML Web Services Fedora and the Zebra Search Engine in an OAI Eprints Application by Gert Schmeltz Pedersen, DTV
1. 2 introductions Nicholas Fischio Development Manager Kelvin Smith Library of Case Western Reserve University Benjamin Bykowski Tech Lead and Senior.
SITools Enhanced Use of Laboratory Services and Data Romain Conseil
LIS 506 (Fall 2006) LIS 506 Information Technology Week 11: Digital Libraries & Institutional Repositories.
Indo-US Workshop, June23-25, 2003 Building Digital Libraries for Communities using Kepler Framework M. Zubair Old Dominion University.
University of North Texas Libraries Building Search Systems for Digital Library Collections Mark E. Phillips Texas Conference on Digital Libraries May.
Fedora Content Models for the National Science Digital Library Data Repository Fedora User’s Group Meeting Copenhagen, September 28, 2005 Carl Lagoze Cornell.
Fedora and GSearch in a Research Project about Integrated Search Open Repositories 2009 Gert Schmeltz Pedersen DTU Library, Technical Information Center.
FI-CORE Data Context Media Management Chapter Release 4.1 & Sprint Review.
Overview of IU Digital Collections Search Hui Zhang Jon Dunn Indiana University Digital Library Program IU Digital Library Brown Bag October 19, 2011.
PLoS ONE Application Journal Publishing System (JPS) First application built on Topaz application framework Web 2.0 –Uses a template engine to display.
CITIDEL: Computing & Information Technology Interactive Digital Educational Library Web Page: Contacts: Future.
Topic Rathachai Chawuthai Information Management CSIM / AIT Review Draft/Issued document 0.1.
11 CORE Architecture Mauro Bruno, Monica Scannapieco, Carlo Vaccari, Giulia Vaste Antonino Virgillito, Diego Zardetto (Istat)
1 Schema Registries Steven Hughes, Lou Reich, Dan Crichton NASA 21 October 2015.
Design of a Search Engine for Metadata Search Based on Metalogy Ing-Xiang Chen, Che-Min Chen,and Cheng-Zen Yang Dept. of Computer Engineering and Science.
Uwe Schindler GES 2007 – May 2-4, 2007 Data Information Service based on Open Archives Initiative Protocols and Apache Lucene Uwe Schindler 1, Benny Bräuer.
Introduction to the Semantic Web and Linked Data
Funded by: © AHDS Preservation in Institutional Repositories Preliminary conclusions of the SHERPA DP project Gareth Knight Digital Preservation Officer.
The Mint Mapping tool The MoRe aggregator Vassilis Tzouvaras, Dimitris Gavrilis National Technical University of Athens Digital Curation Unit - IMIS, Athena.
Scalable Hybrid Keyword Search on Distributed Database Jungkee Kim Florida State University Community Grids Laboratory, Indiana University Workshop on.
INRIA - Progress report DBGlobe meeting - Athens November 29th, 2002.
Automatic Metadata Discovery from Non-cooperative Digital Libraries By Ron Shi, Kurt Maly, Mohammad Zubair IADIS International Conference May 2003.
Information Retrieval
Fedora Content Modeling for Improved Services for Research Databases Open Repositories 2009 Mikael Karstensen Elbæk Alfred Heller Gert Schmeltz Pedersen.
JISC/NSF PI Meeting, June Archon - A Digital Library that Federates Physics Collections with Varying Degrees of Metadata Richness Department of Computer.
Distributed Data Analysis & Dissemination System (D-DADS ) Special Interest Group on Data Integration June 2000.
Distributed Service Registry Workshop, Warwick, U.K. 1 Distributed Functionality in the UIUC OAI Registry
Copyright 2007, Information Builders. Slide 1 iWay Web Services and WebFOCUS Consumption Michael Florkowski Information Builders.
1 CS 430: Information Discovery Lecture 26 Architecture of Information Retrieval Systems 1.
ELISQ Systems Demonstration Sagnik Ray Choudhury Doha -- May 2015.
CERN Document Server 19th January 2006 CERN Document Server Jean-Yves Le Meur 19th January 2006.
Fedora Service Framework Sandy Payette, Executive Director UK Fedora Training London January 22-23, 2009.
September 2003, 7th EDG Conference, Heidelberg – Roberta Faggian, CERN/IT CERN – European Organization for Nuclear Research The GRACE Project GRid enabled.
General Architecture of Retrieval Systems 1 Adrienn Skrop.
CRISP WP 17 1 / 2 Proposed Metadata Catalogue Architecture Document.
Breeda Herlihy, IR Manager, UCC Library. UCC selected DSpace in 2008 Software selection group Staff from Library IT, Computer Centre, Special Collections,
A Semi-Automated Digital Preservation System based on Semantic Web Services Jane Hunter Sharmin Choudhury DSTC PTY LTD, Brisbane, Australia Slides by Ananta.
Information Retrieval in Practice
Usage scenarios, User Interface & tools
CHAPTER 3 Architectures for Distributed Systems
Building Search Systems for Digital Library Collections
An Architecture for Complex Objects and their Relationships
Outline Pursue Interoperability: Digital Libraries
Dr. Bhavani Thuraisingham The University of Texas at Dallas
SDMX IT Tools SDMX Registry
Presentation transcript:

Danmarks Tekniske Videncenter / Technical Knowledge Center of Denmark
Danmarks Tekniske Universitet / Technical University of Denmark
DORSDL Workshop, 21 September 2006
Development of Services in the Fedora Service Framework
by Gert Schmeltz Pedersen

Development of Services in the Fedora Service Framework
Contents
- The Fedora Service Framework
- The Fedora Generic Search Service
- Considerations about a Peer-to-Peer Service for Fedora
- Conclusion

The Fedora Service Framework
Services are stand-alone web applications that run independently of the Fedora repository.
The service framework approach has two main benefits:
- it allows new functionality to be added as atomic, modular services that can interact with Fedora repositories without being part of the repository;
- it makes co-development of new services for Fedora easier, since each service can be developed independently and plugged into the framework.
Fedora (Flexible Extensible Digital Object Repository Architecture):
- Powerful digital object model
- Extensible metadata management
- Expressive inter-object relationships

The Fedora Service Framework
Fedora Object XML (FOXML) is a simple XML format that directly expresses the Fedora digital object model.
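To make the object model concrete, here is a minimal Python sketch that reads a FOXML fragment with the standard library's ElementTree. It extracts the PID, the object state and two Dublin Core fields from the inline DC datastream. The fragment is hand-written and heavily simplified. It reuses demo:21 from the prototype screenshot, and its element and attribute names follow FOXML 1.1 as far as we know. A real Fedora export carries much more, for example audit records, further properties and several datastream versions.

```python
import xml.etree.ElementTree as ET

# Namespaces used by FOXML and by the oai_dc Dublin Core datastream.
NS = {
    "foxml": "info:fedora/fedora-system:def/foxml#",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Simplified, hand-written sample record (illustrative only).
SAMPLE_FOXML = """<foxml:digitalObject PID="demo:21" VERSION="1.1"
    xmlns:foxml="info:fedora/fedora-system:def/foxml#">
  <foxml:objectProperties>
    <foxml:property NAME="info:fedora/fedora-system:def/model#state" VALUE="Active"/>
  </foxml:objectProperties>
  <foxml:datastream ID="DC" STATE="A" CONTROL_GROUP="X">
    <foxml:datastreamVersion ID="DC.0" MIMETYPE="text/xml">
      <foxml:xmlContent>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Advanced FO Sample from Apache FOP Distribution</dc:title>
          <dc:creator>Apache Group</dc:creator>
        </oai_dc:dc>
      </foxml:xmlContent>
    </foxml:datastreamVersion>
  </foxml:datastream>
</foxml:digitalObject>"""


def summarize_foxml(xml_text):
    """Return the PID, state and DC title/creator values of one FOXML record."""
    root = ET.fromstring(xml_text)
    summary = {"pid": root.get("PID")}
    for prop in root.findall("foxml:objectProperties/foxml:property", NS):
        if prop.get("NAME", "").endswith("#state"):
            summary["state"] = prop.get("VALUE")
    # Assumption: the last listed version of the DC datastream is the current one.
    versions = root.findall("foxml:datastream[@ID='DC']/foxml:datastreamVersion", NS)
    if versions:
        dc = versions[-1].find("foxml:xmlContent/oai_dc:dc", NS)
        if dc is not None:
            for field in ("title", "creator"):
                summary[field] = [e.text for e in dc.findall(f"dc:{field}", NS)]
    return summary


print(summarize_foxml(SAMPLE_FOXML))
```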

Development of Services in the Fedora Service Framework
The Fedora Generic Search Service
- Background: the DEF-XWS project; Zebra at work; Lucene in action
- Approach and requirements
- Current prototype (fedoragsearch)
- Architectural snapshots
- Configuration and customization
- Further work
- The work is funded by DEFF, Denmark's Electronic Research Library.

Background - DEF-XWS Eprints
[Architecture diagram. Components: Open Archives Initiative data providers (OAI-PMH); OAI Harvester; OAI Manager (full set / subset) used by the librarian; MySQL; Eprint service provider; Zebra servers with Z39.50 web UIs for DEF Portal and InfoNet users; Fedora server with batch ingest, export and Zebra full-text retrieval; DEF-XWS Eprints with SOAP/REST access for web UIs (Java, PHP) and other applications (AppXYZ, Perl).]

Background - DEF-XWS Eprints
Purpose achieved:
- hands-on experience with Fedora
- hands-on experience with web services
- DEF-XWS Eprints available from web services and to applications combining many web services
Lesson:
- Do not override field search;
- provide a generic search service instead.

Zebra at work
Features:
- Zebra is provided as open source by Index Data.
- Written in portable C, so it runs on most Unix-like systems as well as Windows.
- Modules: zebraidx and zebrasrv.
- Searching supports a combination of boolean queries, relevance ranking, truncation, masking, full regular expression matching and "approximate matching" (e.g. spelling mistakes).
- Z39.50 protocol support, recently also SRW/SRU and CQL.
- Configurable to understand many input formats: SGML, XML, ISO2709 (MARC), raw text.
- Arbitrarily complex records.
- Robust updating: records can be added and deleted "on the fly".
- Very large databases: logical files can be automatically partitioned over multiple disks.
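Because Zebra also supports SRU with CQL, a client can query it over plain HTTP. The Python sketch below builds an SRU 1.1 searchRetrieve URL from (index, term) pairs. The host, port and database name are placeholders. The CQL index names dc.title and dc.creator assume that the server's CQL mapping defines them. How "=" treats a multi-word quoted term depends on the CQL version and the server configuration.

```python
from urllib.parse import quote, urlencode


def cql_term(value):
    """Quote a CQL term, escaping embedded backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def sru_search_url(base_url, clauses, max_records=10, start_record=1):
    """Build an SRU 1.1 searchRetrieve URL; (index, term) clauses are ANDed."""
    query = " and ".join(f"{index} = {cql_term(term)}" for index, term in clauses)
    params = {
        "version": "1.1",
        "operation": "searchRetrieve",
        "query": query,
        "startRecord": start_record,
        "maximumRecords": max_records,
    }
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return f"{base_url}?{urlencode(params, quote_via=quote)}"


# Placeholder endpoint: host, port and database name are hypothetical.
url = sru_search_url(
    "http://localhost:9999/Default",
    [("dc.title", "information retrieval"), ("dc.creator", "Staples")],
)
print(url)
```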

"Lucene in Action", Figure 1.5: A typical application integration with Lucene
[Figure: Document and Field objects]
Example query: dc.title:"Information retrieval" AND dc.creator:Staples
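The Document/Field model behind the example query can be illustrated without Lucene. The Python sketch below is a toy positional inverted index, not Lucene's API or data structures. Each document is a set of named fields, and terms are lower-cased word tokens. Each clause is matched as a phrase, and clauses are combined with AND. That is just enough to evaluate a query shaped like the one on the slide. The document ids and field texts are hypothetical.

```python
import re
from collections import defaultdict


class ToyFieldIndex:
    """Toy positional index: (field, term) -> {doc_id: [positions]}."""

    def __init__(self):
        self.postings = defaultdict(lambda: defaultdict(list))

    @staticmethod
    def tokenize(text):
        return re.findall(r"\w+", text.lower())

    def add(self, doc_id, fields):
        for field, text in fields.items():
            for pos, term in enumerate(self.tokenize(text)):
                self.postings[(field, term)][doc_id].append(pos)

    def phrase(self, field, text):
        """Doc ids whose field contains the terms of `text` consecutively."""
        terms = self.tokenize(text)
        if not terms:
            return set()
        hits = set()
        for doc_id, starts in self.postings.get((field, terms[0]), {}).items():
            for start in starts:
                if all(start + i in self.postings.get((field, t), {}).get(doc_id, [])
                       for i, t in enumerate(terms[1:], 1)):
                    hits.add(doc_id)
                    break
        return hits

    def search_and(self, clauses):
        """AND together (field, text) clauses, each evaluated as a phrase."""
        results = [self.phrase(f, t) for f, t in clauses]
        return set.intersection(*results) if results else set()


index = ToyFieldIndex()
index.add("doc:1", {"dc.title": "Information retrieval in practice", "dc.creator": "Staples"})
index.add("doc:2", {"dc.title": "Retrieval of information", "dc.creator": "Staples"})
index.add("doc:3", {"dc.title": "Information retrieval", "dc.creator": "Smith"})
# dc.title:"Information retrieval" AND dc.creator:Staples
print(index.search_and([("dc.title", "Information retrieval"), ("dc.creator", "Staples")]))  # {'doc:1'}
```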

Approach and requirements
- Do iterations of requirements analysis and prototype development.
- Allow various indexing-and-search engines to be configured or plugged in, initially Lucene and Zebra.
- Implement as a webapp within the Fedora Service Framework.
- Allow indexing of, and search in, all information in FOXML records for Fedora objects, including full texts in datastreams and disseminator results.
- Define an interface for a set of operations; provide REST and SOAP access.
- Basic operations:
  - updateIndex: indexing the contents of the Fedora repository
  - gfindObjects: search similar to Fedora findObjects
- Secondary operations:
  - browseIndex: browsing terms in a given index
  - getRepositoryInfo: describing the properties of a repository
  - getIndexInfo: describing the properties of an index
- Allow multiple repositories to be indexed in one and the same index.
- Allow multiple indexes to be generated from one repository.
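The requirement that several indexing-and-search engines can be plugged in behind one set of operations is essentially a strategy interface. Below is a minimal Python sketch of that idea, built on two hypothetical classes:
- `GSearchOperations`, an abstract base class.
- `InMemoryOperations`, an in-memory engine standing in for the Lucene and Zebra plugins.

Only the operation names come from the slide. All parameters and return shapes are invented for illustration, and browseIndex and getRepositoryInfo are left out for brevity.

```python
from abc import ABC, abstractmethod


class GSearchOperations(ABC):
    """Hypothetical engine-neutral interface named after the slide's operations."""

    @abstractmethod
    def update_index(self, repository_name, objects):
        """Index {pid: text} from one repository; return the number of documents."""

    @abstractmethod
    def gfind_objects(self, query, hit_page_start=1, hit_page_size=10):
        """Return one page of PIDs whose text contains every query word."""

    @abstractmethod
    def get_index_info(self):
        """Describe the index, e.g. document count and repositories covered."""


class InMemoryOperations(GSearchOperations):
    """Stand-in engine; a real plugin would wrap Lucene or Zebra."""

    def __init__(self):
        self.docs = {}  # (repository, pid) -> lower-cased text

    def update_index(self, repository_name, objects):
        for pid, text in objects.items():
            self.docs[(repository_name, pid)] = text.lower()
        return len(objects)

    def gfind_objects(self, query, hit_page_start=1, hit_page_size=10):
        words = query.lower().split()
        hits = sorted(pid for (_, pid), text in self.docs.items()
                      if all(w in text.split() for w in words))
        start = hit_page_start - 1
        return hits[start:start + hit_page_size]

    def get_index_info(self):
        repositories = sorted({repo for repo, _ in self.docs})
        return {"documents": len(self.docs), "repositories": repositories}


# Engines are chosen by name; one index here is fed from two repositories.
ENGINES = {"memory": InMemoryOperations}

ops = ENGINES["memory"]()
ops.update_index("repoA", {"demo:1": "Fedora generic search", "demo:2": "Zebra server"})
ops.update_index("repoB", {"demo:3": "Generic search with Lucene"})
print(ops.gfind_objects("generic search"))  # ['demo:1', 'demo:3']
print(ops.get_index_info())
```

Keeping the engine behind an interface means that the REST and SOAP layers only ever call the abstract operations. Swapping Lucene for Zebra then becomes a configuration choice.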

Current prototype - updateIndex
[Screenshot of an indexed record: … Advanced FO Sample from Apache FOP Distribution / Apache Group / demo:21 / FedoraObject / Active / FO_TO_PDFDOC / Advanced FO Sample … / Apache Group / transformation]

Current prototype - gfindObjects


Current prototype - browseIndex
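Browsing terms in a given index amounts to paging through a sorted term list from a start term. A minimal Python sketch, assuming the terms of one field are already available as a sorted list. The function name and parameters are hypothetical. `bisect_left` finds the first term that is greater than or equal to the start term, so a start term that is not itself in the index still lands at the right place.

```python
from bisect import bisect_left


def browse_index(sorted_terms, start_term, term_page_size=10):
    """Return up to term_page_size terms, in order, from the first term >= start_term."""
    start = bisect_left(sorted_terms, start_term)
    return sorted_terms[start:start + term_page_size]


# Hypothetical term list for one index field, e.g. dc.creator.
terms = sorted(["apache", "fedora", "lucene", "pedersen", "staples", "zebra"])
print(browse_index(terms, "l", term_page_size=3))  # ['lucene', 'pedersen', 'staples']
```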

Current prototype - getRepositoryInfo

Current prototype - getIndexInfo

Architectural snapshots - basic - fedoragsearch
Contents:
- Lucene
- Zebra
- fedoragsearch: REST demo, architecture, installation and configuration, further customizations

Architectural snapshots - indexing - many-to-many

Configuration and customization
Configuration examples:
fedoragsearch.properties
- soapBase =
- repositoryNames = REPOSNAMES
- indexNames = INDEXNAMES
- mimeTypes = MIMETYPES
INDEXNAME/index.properties
- operationsImpl = dk.defxws.fgslucene.OperationsImpl
- defaultQueryFields = dc.description dc.title
REPOSNAME/repository.properties
- soapBase =
- fedoraObjectDir = FEDORAOBJECTDIR
Customization examples:
- demoFoxmlToLucene.xslt
- demoGfindObjectsToHtml.xslt (gfindObjects)
- Implement a plugin for XyzEngine
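The configuration is layered. fedoragsearch.properties names the repositories and indexes, and each index's index.properties names the engine class in operationsImpl. The Python sketch below mimics that lookup under two assumptions:
- It parses only simple `key = value` lines. Java's full .properties syntax also allows line continuations, `:` separators and escapes.
- It resolves a dotted class name with importlib. dk.defxws.fgslucene.OperationsImpl is a Java class, so a standard-library class stands in for the engine so that the sketch runs anywhere.

```python
import importlib


def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blanks and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def load_class(dotted_name):
    """Resolve 'package.module.ClassName' to the class object."""
    module_name, _, class_name = dotted_name.rpartition(".")
    return getattr(importlib.import_module(module_name), class_name)


# Hypothetical index.properties; operationsImpl points at a stand-in class.
INDEX_PROPERTIES = """
# engine plugin for this index
operationsImpl = collections.OrderedDict
defaultQueryFields = dc.description dc.title
"""

props = parse_properties(INDEX_PROPERTIES)
engine_class = load_class(props["operationsImpl"])
default_fields = props["defaultQueryFields"].split()
print(engine_class.__name__, default_fields)  # OrderedDict ['dc.description', 'dc.title']
```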

Further work
From prototype to production version:
- Clean up
- Give access
- Better exceptions and error messages
- Handle XACML
- Notification mechanism
- javaDoc
- JUnit test cases
- Test on various platforms
- Documentation
- Ensure the same high quality as the Fedora code itself
- Takeover by the core development team
- Contributions from the Fedora community

Development of Services in the Fedora Service Framework
Considerations about a Peer-to-Peer Service for Fedora
1. The background: Alvis utilization activities
2. The EU project: Alvis - Superpeer Semantic Search Engine
3. Analysis of alternatives
4. Design of a peer-to-peer service for Fedora

The Background: Alvis utilization activities
The Alvis project is developing an open source prototype of a distributed, semantic-based search engine. An important consideration in the Alvis project has been how to utilize Alvis results in the digital library context. Therefore, a test case has been established to utilize Alvis results in the context of the Fedora repository system, on the assumption that the experience and some of the principles will be applicable to other digital library systems.
The test plan for this test case has the following steps:
1. Analysis
   - Alternative 1: a document enrichment service
   - Alternative 2: a peer-to-peer service
2. Design of a peer-to-peer service for Fedora, so that Fedora may act as an Alvis superpeer
3. Involving the Fedora developer and user community
4. Implementation of the service
5. Evaluation of uses of the service

The Alvis EU project
The initial tasks:
- research in the design, use and interoperability of topic-specific search engines
- development of an open-source prototype of a distributed, semantic-based search engine
- building on content through automatic analysis of free text
- advancing peer-to-peer technology

The Alvis EU project
- input system: can include a crawler, an RSS reader, XML database extraction, etc.
- document system: does routine processing on documents prior to entry to the runtime system, such as tagging named entities.
- maintenance system: does processing at the full document collection level to update linguistic and semantic resources used in the document system.
- superpeer: runs the search engine at a node and provides the user interface. This represents an individual, possibly topic-specific, search engine.
- p2p system: provides a network-wide interface to a set of individual search engines using P2P.
Document enrichment service: semi-automatic tagging with semantic knowledge.
Peer-to-Peer service: network-wide search.
Peer-to-peer (from Wikipedia, the free encyclopedia): A peer-to-peer (or P2P) computer network is a network that relies primarily on the computing power and bandwidth of the participants in the network rather than concentrating it in a relatively low number of servers. The P2P overlay network consists of all the participating peers as its nodes and has links between any two nodes that know each other. Structured P2P networks overcome the limitations of unstructured networks by maintaining a Distributed Hash Table (DHT) and by allowing each peer to be responsible for a specific part of the content in the network.
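The DHT idea of each peer being responsible for a specific part of the content can be illustrated with consistent hashing, one common way of assigning keys to peers. This is a generic Python sketch, not the Alvis P2P protocol, and the peer names are hypothetical. SHA-1 places peers and keys on a ring, and each key (for example an index term) belongs to the first peer at or after the key's position.

```python
import hashlib
from bisect import bisect_left


def ring_position(name):
    """Map a string to a position on a 160-bit ring using SHA-1."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")


class ConsistentHashRing:
    """Assign each key to the first peer at or after the key's ring position."""

    def __init__(self, peers):
        self.ring = sorted((ring_position(p), p) for p in peers)
        self.positions = [pos for pos, _ in self.ring]

    def responsible_peer(self, key):
        # bisect_left finds the first peer position >= the key position;
        # past the last peer we wrap around to the first one.
        i = bisect_left(self.positions, ring_position(key)) % len(self.ring)
        return self.ring[i][1]


peers = ["peer-a", "peer-b", "peer-c"]  # hypothetical peer names
ring = ConsistentHashRing(peers)
terms = ["fedora", "lucene", "zebra", "repository", "search"]
owners = {t: ring.responsible_peer(t) for t in terms}
print(owners)

# Adding a peer only moves the keys that now fall into its segment of the ring.
bigger = ConsistentHashRing(peers + ["peer-d"])
moved = [t for t in terms if bigger.responsible_peer(t) != owners[t]]
print(moved)
```

The point of this design is that a joining or leaving peer disturbs only its neighbours' share of the keys, rather than forcing a global reassignment.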

Analysis: Document enrichment service
[Diagram of Alvis work packages: WP2 Relevance analysis, WP3 Indexing engine, WP4 Peer-to-peer, WP5 Linguistic analysis, WP6 Resource acquisition, WP7 Crawler, WP8 Chinese contribution, WP9 User interface; exchange formats: acquisition format, linguistic format, relevance format; SRU/Z39.50. Highlighted: crawler, linguistic analysis and relevance analysis.]

Usage of enrichment elements

Analysis: Document enrichment service
Functionality as a Fedora service:
- Topic-specific crawling based on subject hierarchies
- Natural language analysis of content
- Entity recognition
- Classification of content
- Addition of synonyms
- Topic-specific scores for customised rankings
Concerns:
- Too many partners/modules/subsystems involved
- Usages of enrichment not clarified

Analysis: Peer-to-Peer service
The initial vision:
- A set of heterogeneous servers connected into a search network.
- Each one is wrapped suitably, so as to act as an Alvis search peer.
- In this view, Fedora repositories may be wrapped as well.
[Diagram: wrapper]

Design: alvisp2p service
The alvisp2p service shall implement the IndexingQuery and Retrieval interfaces for interacting with the P2P system, and implement the operations needed to interact with the core Fedora repository service. From the Alvis point of view we will then have a thin superpeer; from the Fedora point of view we will have a peer-to-peer service.

Development of Services in the Fedora Service Framework
Conclusion
- Two examples of services illustrating the issues in developing services for the Fedora Service Framework: interaction with Fedora, reuse from Fedora, security
- A promising development approach for Fedora
- Promising in general for Digital Object Repository Systems in Digital Libraries?
Thank you


alvisp2p service scenario
- Logon to network
- Publish document list
- Publish index
- Publish query
- Receive local query
- Deliver local hit list
- Receive global hit list
- Logoff from network
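The scenario reads as a session lifecycle. Below is a minimal Python sketch, assuming the steps happen in roughly this order. A hypothetical `AlvisP2PSession` allows publishing and querying only while logged on, and it records each step so that the order can be checked. The network is faked with in-memory peers. The method names, the hit-list shape (PID, score) and the merge-by-score rule are invented for illustration and do not describe the real Alvis protocol.

```python
class AlvisP2PSession:
    """Hypothetical lifecycle of the alvisp2p service on one Fedora node."""

    def __init__(self, local_search):
        self.local_search = local_search  # callable: query -> [(pid, score), ...]
        self.logged_on = False
        self.log = []

    def _require_logon(self):
        if not self.logged_on:
            raise RuntimeError("not logged on to the P2P network")

    def logon(self):
        self.logged_on = True
        self.log.append("logon")

    def publish(self, what):
        """Publish e.g. the document list or the index (network call stubbed out)."""
        self._require_logon()
        self.log.append(f"publish {what}")

    def receive_local_query(self, query):
        """Answer a query arriving from the network with this node's hit list."""
        self._require_logon()
        self.log.append("deliver local hit list")
        return self.local_search(query)

    def publish_query(self, query, peers):
        """Send a query to the peers and merge their hit lists, best score first."""
        self._require_logon()
        self.log.append("publish query")
        hits = [hit for peer in peers for hit in peer.receive_local_query(query)]
        self.log.append("receive global hit list")
        return sorted(hits, key=lambda hit: hit[1], reverse=True)

    def logoff(self):
        self.logged_on = False
        self.log.append("logoff")


# Two in-memory nodes with canned local search results (hypothetical PIDs).
a = AlvisP2PSession(lambda q: [("a:1", 0.4)])
b = AlvisP2PSession(lambda q: [("b:7", 0.9)])
for node in (a, b):
    node.logon()
    node.publish("document list")
    node.publish("index")
print(a.publish_query("digital libraries", [a, b]))  # [('b:7', 0.9), ('a:1', 0.4)]
a.logoff()
```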