OGF19 Grid Information Retrieval Working Group January 30, 2007 Chapel Hill, NC.

2 Agenda
- IP Policy reminder
- Introduce participants
- GIR-WG charter & overview
- GIR document status review
- Reference implementations
- Mention of related work elsewhere
- Paul Kim presentation
- Chris Fallen presentation
- Discussion

3 Session Particulars
- OGF IP policies apply
- GIR-WG chairs:
  - Dr. Greg Newby, Arctic Region Supercomputing Center
  - Dr. Paul Yangwoo Kim, Dongguk U.
  - Nassib Nassar, RENCI

4 What is GIR-WG?
- GIR-WG was chartered by OGF to develop standards and reference implementations for information retrieval (IR) on computational grids.
- GIR-WG has published a Requirements document under GGF (GFD-I.027).
- Our first Experimental document was published recently (GFD-E.082).
- Progress on the Architecture document is dormant, awaiting practical experience.
- Practical experience is being gained, and will result in at least further experimental documents.

5 What is Information Retrieval?
- IR is the science and method of delivering documents that are relevant to human information needs.
- Rather than delivering sets of matching documents (as DBMSs do), IR systems rank matching documents.
- IR systems usually focus on textual input data (i.e., natural language), either unformatted or formatted (plain text, HTML, XML, etc.).
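The contrast between set matching and ranking can be illustrated with a minimal sketch. The toy collection, the document IDs and the plain term-count score are all assumptions made for illustration; none of them come from the GIR documents.

```python
from collections import Counter

# Hypothetical toy collection; documents and IDs are illustrative only.
DOCS = {
    "d1": "grid computing for information retrieval",
    "d2": "information retrieval ranks documents by relevance to an information need",
    "d3": "database systems return matching sets",
}


def boolean_match(query: str) -> set[str]:
    """DBMS-style: the unordered set of documents containing every query term."""
    terms = query.lower().split()
    return {
        doc_id
        for doc_id, text in DOCS.items()
        if all(term in text.lower().split() for term in terms)
    }


def ranked_search(query: str) -> list[tuple[str, int]]:
    """IR-style: score each matching document and return them best first.

    The score is a plain term-frequency sum, chosen only for brevity.
    """
    terms = query.lower().split()
    scored = []
    for doc_id, text in DOCS.items():
        counts = Counter(text.lower().split())
        score = sum(counts[term] for term in terms)
        if score > 0:
            scored.append((doc_id, score))
    # Descending score, then ID, so that ties are ordered deterministically.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))
```

For the query "information retrieval", both functions select d1 and d2, but only `ranked_search` says that d2 should be shown first.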

6 GIR-WG Charter
The GIR WG will establish a specific set of requirements, an architecture, and detailed specifications for Information Retrieval (IR) on computational grids. GIR will provide document collection management, indexing/searching, and query processing services to grid users and applications.
GIR Milestones:
- GIR Requirements Document - Stakeholder-driven list of service-level requirements for building a grid-based IR system. Published in 2005 as GFD-I.027.
- GIR Architecture Document - Describes the overall system composed of integrated grid services, scenarios, etc. Draft under consideration since 2004; based on Experimental document outcomes, final version is expected in
- Experimental Documents - Experiences with GIR implementations or partial implementations (query processors, indexers, collection managers...). GFD-E.082 in 2006; others under consideration.
- GIR Recommendation Draft Document - Describes each service in detail, with sections for different implementation platforms (such as Web Services, Grid Services, standalone...). Draft is expected after Architecture document, in
- GIR Recommendation Final Document - After the Draft Recommendation, based on independent interoperable implementations and further practical experiences. Within 2 years of the Draft Recommendation.

7 Why IR is a good candidate for Grid computing
- Excellent for divide-and-conquer, coarse-grained parallelism
- Input items are discrete
- Coordination across subsets of a document collection can be minimal
- Results from multiple sources can be coordinated and relevance ranked together
- Queries may be handled independently
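The divide-and-conquer pattern listed above can be sketched as follows. This is a single-machine illustration under stated assumptions: threads stand in for grid nodes, each shard is a hypothetical dict of document ID to text, and every shard uses the same term-count scorer so that raw scores are directly comparable.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def search_shard(shard: dict[str, str], query: str) -> list[tuple[int, str]]:
    """Score one subset of the collection; needs no knowledge of other shards."""
    terms = query.lower().split()
    hits = []
    for doc_id, text in shard.items():
        words = text.lower().split()
        score = sum(words.count(term) for term in terms)
        if score > 0:
            hits.append((score, doc_id))
    return hits


def distributed_search(shards: list[dict[str, str]], query: str, k: int = 3):
    """Search every shard independently, then rank the combined hits together."""
    # max(1, ...) avoids a ValueError when the shard list is empty.
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partial_results = pool.map(lambda shard: search_shard(shard, query), shards)
        all_hits = [hit for hits in partial_results for hit in hits]
    # The only coordination step: a global top-k over (score, doc_id) pairs.
    return heapq.nlargest(k, all_hits)
```

The shards never communicate with each other; the only shared step is the final top-k merge. Merging raw scores this way is valid only because every shard uses the same scorer.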

8 Significant Progress
- Documents:
  - GIR Requirements published
  - GIR Architecture in mid-draft (dormant)
  - Experimental document: published
- Implementation:
  - MCNC released a technology preview
  - Kim's work: an experimental document
  - Newby's work: heading to an experimental document
  - Nassar's work: Sarcomere & Amberfish, open source toolkit based on GT4
  - Fallen & Newby distributed IR research

9 Requirements overview (per GFD-I.027)
Desirability of Grid infrastructure for IR, notably enterprise IR:
- VO (for security, segmentation)
- Conceptual separation of functions (for indexing, collection management & query processing)
- Flexible but coarse-grained flow of control among elements
- Persistence of queries, collections and indexes
Three primary components:
- Collection manager: handles input gathering, transformation, transport, staging and delivery
- Indexer: core information retrieval collection representation
- Query processor: responds to user needs, including standing information needs (i.e., information filtering)
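The separation into three components can be sketched with in-memory stand-ins. The class and method names below are hypothetical, chosen only to mirror the component names on the slide; GFD-I.027 describes requirements, not this interface.

```python
from collections import defaultdict


class CollectionManager:
    """Gathers and stages input documents (hypothetical in-memory version)."""

    def __init__(self):
        self._docs = {}

    def add(self, doc_id: str, raw_text: str) -> None:
        # "Transformation" here is only lowercasing and whitespace cleanup.
        self._docs[doc_id] = " ".join(raw_text.lower().split())

    def documents(self) -> dict[str, str]:
        return dict(self._docs)


class Indexer:
    """Builds the collection representation: term -> {doc_id: count}."""

    def build(self, docs: dict[str, str]) -> dict[str, dict[str, int]]:
        index = defaultdict(dict)
        for doc_id, text in docs.items():
            for term in text.split():
                index[term][doc_id] = index[term].get(doc_id, 0) + 1
        return dict(index)


class QueryProcessor:
    """Answers ad hoc queries and keeps standing queries for filtering."""

    def __init__(self, index: dict[str, dict[str, int]]):
        self._index = index
        self._standing = []  # persisted queries, each a list of terms

    def search(self, query: str) -> list[str]:
        scores = defaultdict(int)
        for term in query.lower().split():
            for doc_id, count in self._index.get(term, {}).items():
                scores[doc_id] += count
        return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

    def add_standing_query(self, query: str) -> None:
        self._standing.append(query.lower().split())

    def matches_standing(self, new_text: str) -> list[str]:
        """Information filtering: which standing queries does new text satisfy?"""
        words = set(new_text.lower().split())
        return [" ".join(terms) for terms in self._standing if set(terms) <= words]
```

Each class depends only on the output of the previous one (staged documents, then an index), which matches the coarse-grained flow of control the requirements call for.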

10 Implementation Approaches
- Do not rely on particular implementations or middleware (e.g., Globus)
- Pursue different types of Grid implementations:
  - Minimalist, home grown
  - Globus-based
  - Pure Web services
- These approaches can each be separate Experimental docs; will be appendices in the Architecture doc

11 GFD-E.082
- Kim: Grid Information Retrieval System for Dynamically Reconfigurable Virtual Organization
- Practical experience on re-allocation of GIR nodes based on system load
- Indexer, collection manager or query processor, based on system load
- Dynamic reallocation of nodes within a computational grid
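The idea of load-based role reallocation can be illustrated with a toy policy. This is not the algorithm from GFD-E.082, which is not described on the slide. The role names, the load metric (pending work per role) and the "move one node from the least-loaded role to the most-loaded role" rule are all assumptions.

```python
ROLES = ("collection_manager", "indexer", "query_processor")


def reallocate(assignment: dict[str, str], load: dict[str, float]) -> dict[str, str]:
    """Return a new node -> role mapping with at most one node moved.

    assignment maps node name to its current role; load maps role to its
    pending work. A role never gives up its last node.
    """
    counts = {role: sum(1 for r in assignment.values() if r == role) for role in ROLES}

    def per_node(role: str) -> float:
        if counts[role] == 0:
            # A role with work but no nodes is treated as infinitely loaded.
            return float("inf") if load.get(role, 0) > 0 else 0.0
        return load.get(role, 0) / counts[role]

    busiest = max(ROLES, key=per_node)
    donors = [role for role in ROLES if role != busiest and counts[role] > 1]
    if not donors:
        return dict(assignment)
    donor = min(donors, key=per_node)
    if per_node(donor) >= per_node(busiest):
        return dict(assignment)  # already balanced; nothing to move

    # Pick the donor's first node by name so the result is deterministic.
    node = sorted(n for n, role in assignment.items() if role == donor)[0]
    updated = dict(assignment)
    updated[node] = busiest
    return updated
```

Calling `reallocate` repeatedly, once per monitoring interval, would shift nodes gradually toward whichever function is under the most pressure.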

12 Nassar: Sarcomere
See
Sarcomere calls a collection of documents a "database". One or more "indexes" can be created per database. Each index represents an access point for searching the document collection. In theory, indexes can differ in how they constrain the queries (e.g. by fields), what kind of data structures are used, etc. At the moment only Amberfish full text indexes are supported (index type = "Amberfish").
Current port types (very rudimentary and highly subject to change):
- createDatabase
- deleteDatabase
- createIndex
- deleteIndex
- addDocument
- Search
Stay tuned for more developments!
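The database/index model described above can be mimicked with a small in-memory mock. This is not Sarcomere code and makes no claim about its actual signatures. The class name, the argument lists and the single-term search are hypothetical; only the operation names and the "Amberfish" index type follow the slide.

```python
class MockSarcomere:
    """In-memory stand-in for the six operations listed on the slide."""

    def __init__(self):
        # database name -> {"docs": {doc_id: text}, "indexes": {name: {...}}}
        self._databases = {}

    def create_database(self, name: str) -> None:
        if name in self._databases:
            raise ValueError(f"database {name!r} already exists")
        self._databases[name] = {"docs": {}, "indexes": {}}

    def delete_database(self, name: str) -> None:
        del self._databases[name]

    def create_index(self, database: str, index_name: str,
                     index_type: str = "Amberfish") -> None:
        db = self._databases[database]
        postings = {}
        # Index the documents that are already in the database.
        for doc_id, text in db["docs"].items():
            self._post(postings, doc_id, text)
        db["indexes"][index_name] = {"type": index_type, "postings": postings}

    def delete_index(self, database: str, index_name: str) -> None:
        del self._databases[database]["indexes"][index_name]

    def add_document(self, database: str, doc_id: str, text: str) -> None:
        db = self._databases[database]
        db["docs"][doc_id] = text
        # Every index on the database is an access point, so update them all.
        for index in db["indexes"].values():
            self._post(index["postings"], doc_id, text)

    def search(self, database: str, index_name: str, term: str) -> list[str]:
        postings = self._databases[database]["indexes"][index_name]["postings"]
        return sorted(postings.get(term.lower(), set()))

    @staticmethod
    def _post(postings: dict, doc_id: str, text: str) -> None:
        for word in text.lower().split():
            postings.setdefault(word, set()).add(doc_id)
```

The mock keeps several indexes per database in sync on `add_document`, reflecting the statement that each index is a separate access point to the same document collection.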

13 Newby: Multisearch
- How can we merge result sets from different IR engines?
- Desire to merge based on global relevance
- Challenging because different IR engines have different scoring/ranking algorithms
- Challenging because different collections have different characteristics, influencing ranking
- Used for TREC by Fallen & Newby 2005, 2006

14
- Results are merged based on statistical normalization
- No accounting for different IR engines or different collections
- Simplifying assumption that all IR rankings come from the same basic distribution
- Simple interface to an Axis/Tomcat backend
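Merging by statistical normalization can be sketched as follows. The slides do not say which normalization Multisearch uses; z-score standardization is assumed here because it fits the stated simplification that all rankings come from the same basic distribution. Function names are hypothetical, and document IDs are assumed to be unique across engines.

```python
from statistics import mean, pstdev


def zscore_normalize(results: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Rescale one engine's scores to zero mean and unit standard deviation."""
    scores = [score for _, score in results]
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # All scores equal: no information to rank on, so map them all to 0.
        return [(doc_id, 0.0) for doc_id, _ in results]
    return [(doc_id, (score - mu) / sigma) for doc_id, score in results]


def merge_results(result_sets: list[list[tuple[str, float]]], k: int = 10):
    """Normalize each engine's result set separately, then rank them together."""
    merged = []
    for results in result_sets:
        if results:  # skip engines that returned nothing
            merged.extend(zscore_normalize(results))
    merged.sort(key=lambda pair: (-pair[1], pair[0]))
    return merged[:k]
```

With one engine scoring in the tens and another scoring between 0 and 1, raw scores would put every result from the first engine on top. After normalization, the best result from each engine lands near the top of the merged list.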

15 Opportunities for Interaction
- OGSA-DAI has middleware that provides basic query and result set transport
- Search from multiple databases; add a higher-level merger
- Seems promising for GIR!

16 Discussion of GIR-WG
- Your questions, thoughts and suggestions

17 Get Involved!
- Visit
- Subscribe to
- Talk with chairs about data and reference implementations and documents