Using SRB and iRODS with the Cheshire3 Information Framework
Building Data Grids with iRODS, 27-30 May 2008, National e-Science Centre, Edinburgh
Dr Robert Sanderson

Slide 1: Using SRB and iRODS with the Cheshire3 Information Framework
Building Data Grids with iRODS, May 2008, National e-Science Centre, Edinburgh
Dr Robert Sanderson, Dept. of Computer Science, University of Liverpool
iRODS Workshop, 27 May 2008

Slide 2: Overview
- Cheshire3: Introduction, Architecture
- SRB Integration: Architecture, Grid Usage
- iRODS Integration: Possible Architectures

Slide 3: Introduction. Cheshire3: Information Analysis Framework
- Digital Library/Information Retrieval engine with:
  - Data Mining/Machine Learning
  - Text Mining/Natural Language Processing
  - Computational Grid
  - Data Grid
- Standards based: Unicode, XML/XPath, MPI, Z39.50/SRU, ...
- Object-oriented architecture
- Easy to develop and extend in Python, but heavy lifting is possible in imported C libraries
- Developed at the University of Liverpool, plus UC Berkeley
- Version: mostly stable; needs thorough testing and documentation

Slide 4: Context

Slide 5: Architecture
Components shown in the architecture diagram: Server, ConfigStore, UserStore, User, Database, ProtocolHandler, Query, ResultSet, Index, IndexStore, Terms, Ingest Process, DocumentFactory, Document, DocumentStore, PreParser, Parser, Transformer, Record, RecordStore, Extractor, Normalizer, Tokenizer, TokenMerger.
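The ingest side of the diagram can be read as a chain of small objects, each with one job: a DocumentFactory yields Documents, a PreParser turns a Document into another Document, a Parser turns a Document into a Record, and a RecordStore keeps Records. The sketch below is a minimal illustration under that reading only. The class and method names (`WrapTextPreParser`, `XmlParser`, `process`, `ingest`) are hypothetical and are not the actual Cheshire3 API, and a plain dict stands in for a persistent RecordStore backend.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from xml.sax.saxutils import escape


@dataclass
class Document:
    """Raw content as delivered by a DocumentFactory."""
    text: str


@dataclass
class Record:
    """Parsed form of a Document, identified within its RecordStore."""
    identifier: int
    root: ET.Element


class DocumentFactory:
    """Yields Documents from a source; here, simply an in-memory list of strings."""
    def __init__(self, sources):
        self.sources = sources

    def documents(self):
        for text in self.sources:
            yield Document(text)


class WrapTextPreParser:
    """Hypothetical PreParser (Document -> Document): wraps plain text in minimal XML."""
    def process(self, doc):
        return Document(f"<doc><text>{escape(doc.text)}</text></doc>")


class XmlParser:
    """Hypothetical Parser (Document -> Record): parses XML, assigns sequential ids."""
    def __init__(self):
        self.next_id = 0

    def process(self, doc):
        record = Record(self.next_id, ET.fromstring(doc.text))
        self.next_id += 1
        return record


class RecordStore:
    """Holds Records by identifier; a dict stands in for a persistent backend."""
    def __init__(self):
        self.records = {}

    def store(self, record):
        self.records[record.identifier] = record


def ingest(factory, preparser, parser, store):
    """Chain the components: each Document is pre-parsed, parsed, then stored."""
    for doc in factory.documents():
        store.store(parser.process(preparser.process(doc)))
    return store
```

Because each stage only sees the output of the previous one, swapping a component (for example an HTML-cleaning PreParser in place of `WrapTextPreParser`) leaves the rest of the chain unchanged.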

Slide 6: Architecture 2
Index processing (diagram): Record → XPathObject → Extractor → Tokenizer → TokenMerger → Normalizer → Index → IndexStore. The diagram shows several XPathObject/Extractor/Tokenizer/TokenMerger/Normalizer chains feeding Indexes.
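The index chain on this slide can be sketched as a pipeline of plain functions. This is an illustration under assumptions: ElementTree's limited XPath subset stands in for XPathObject, the tokenizer splits on word characters, the token merger counts term frequencies, the normalizer only case-folds, and all function names are hypothetical rather than Cheshire3's own classes.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter


def xpath_select(root, path):
    """XPathObject stand-in: ElementTree supports a limited XPath subset."""
    return root.findall(path)


def extract(nodes):
    """Extractor: pull the text content out of each selected node."""
    return [" ".join(node.itertext()) for node in nodes]


def tokenize(texts):
    """Tokenizer: split each text into word tokens."""
    return [token for text in texts for token in re.findall(r"\w+", text)]


def merge(tokens):
    """TokenMerger: collapse repeated tokens into term -> frequency."""
    return Counter(tokens)


def normalize(terms):
    """Normalizer: case-fold; frequencies of terms that now collide are summed."""
    out = Counter()
    for term, freq in terms.items():
        out[term.lower()] += freq
    return out


def index_record(record_id, root, path, index_store):
    """Run the chain for one record and add its terms to a term -> {record: freq} map."""
    terms = normalize(merge(tokenize(extract(xpath_select(root, path)))))
    for term, freq in terms.items():
        index_store.setdefault(term, {})[record_id] = freq
    return index_store
```

Keeping each step as a separate function mirrors the slide's point that one Record can feed several differently configured chains, each producing its own Index.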

Slide 7: SRB Integration (RecordStore / DocumentStore)
Record and document data can be held in any of these storage backends: Filesystem, Berkeley DB, SQL RDBMS (PostgreSQL), or SRB.
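The slide's point is that a RecordStore does not care where its bytes live. A minimal sketch of that separation, assuming a hypothetical two-method backend interface (`put`, `get`): a filesystem backend and a backend built on Python's standard `dbm` module (standing in for Berkeley DB) are shown; SQL and SRB backends would implement the same two methods but are not shown here. None of these names are Cheshire3's actual classes.

```python
import dbm
import os
import tempfile
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Hypothetical interface every backend (filesystem, DBM, SQL, SRB) would implement."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class FilesystemBackend(StorageBackend):
    """One file per key inside a directory (a fresh temporary directory by default)."""
    def __init__(self, directory=None):
        self.directory = directory or tempfile.mkdtemp()
        os.makedirs(self.directory, exist_ok=True)

    def _path(self, key):
        return os.path.join(self.directory, key)

    def put(self, key, data):
        with open(self._path(key), "wb") as f:
            f.write(data)

    def get(self, key):
        with open(self._path(key), "rb") as f:
            return f.read()


class DbmBackend(StorageBackend):
    """Key/value file via the standard-library dbm module, standing in for Berkeley DB."""
    def __init__(self, filename=None):
        self.filename = filename or os.path.join(tempfile.mkdtemp(), "records")

    def put(self, key, data):
        with dbm.open(self.filename, "c") as db:
            db[key] = data

    def get(self, key):
        with dbm.open(self.filename, "r") as db:
            return db[key]


class RecordStore:
    """Stores serialized records without knowing which backend holds the bytes."""
    def __init__(self, backend: StorageBackend):
        self.backend = backend

    def store_record(self, record_id, xml_text):
        self.backend.put(record_id, xml_text.encode("utf-8"))

    def fetch_record(self, record_id):
        return self.backend.get(record_id).decode("utf-8")
```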

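Slide 8: SRB Integration (IndexStore)
Index terms are split into alphabetical ranges (a-b, c-d, e-f, g-h, ...), each held as a separate index db in SRB; at query time, only the db whose range contains the query term is fetched.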
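A minimal sketch of that range-partitioned lookup, assuming partitions start at the boundaries a, c, e and g shown on the slide. A dict per partition stands in for a separate index db held in SRB, and the class and method names are hypothetical. The `fetches` counter is there only to make visible that a lookup touches a single partition.

```python
import bisect


class PartitionedIndexStore:
    """Splits an index's terms into alphabetical ranges, one 'database' per range."""

    def __init__(self, boundaries=("a", "c", "e", "g")):
        # Each boundary is the first term of a partition: a-b, c-d, e-f, g onwards
        self.boundaries = list(boundaries)
        self.partitions = [dict() for _ in self.boundaries]
        self.fetches = 0

    def _slot(self, term):
        # Rightmost boundary <= term; terms sorting before "a" fall into slot 0
        return max(bisect.bisect_right(self.boundaries, term) - 1, 0)

    def add(self, term, record_id):
        self.partitions[self._slot(term)].setdefault(term, set()).add(record_id)

    def lookup(self, term):
        # Against SRB, this would be a single retrieval of one partition's db
        self.fetches += 1
        return self.partitions[self._slot(term)].get(term, set())
```

The design trade-off is that each query transfers one partition rather than the whole index, at the cost of choosing boundaries that keep partitions reasonably balanced.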

Slide 9: Grid Implementation
- Focus on ingest, not discovery (yet)
- Instantiate the architecture on every node
- Assign one node as master and the rest as slaves; the master then divides the processing as appropriate
- Calls between slaves are possible
- Calls are kept as small and simple as possible: (objectIdentifier, functionName, *arguments)
- Typically: (workflow_id, 'process', document_id)
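Because every node instantiates the same architecture, a message only needs to name an object and a method; the receiving node resolves both locally. A minimal sketch of that dispatch, assuming `pickle` and direct function calls as the transport (the standards list on slide 3 mentions MPI, which is not used here). The `Workflow`, `Slave` and `master` names and the `"ingestWorkflow"` identifier are hypothetical.

```python
import pickle
from itertools import cycle


class Workflow:
    """Stand-in for a configured workflow object present on every node."""
    def __init__(self, node_name):
        self.node_name = node_name
        self.processed = []

    def process(self, document_id):
        self.processed.append(document_id)
        return f"{self.node_name}:{document_id}"


class Slave:
    """Resolves (objectIdentifier, functionName, *arguments) against local objects."""
    def __init__(self, objects):
        self.objects = objects

    def handle(self, message):
        object_id, function_name, *args = pickle.loads(message)
        return getattr(self.objects[object_id], function_name)(*args)


def master(document_ids, slaves):
    """Send one small message per document, distributing them round-robin."""
    results = []
    for doc_id, slave in zip(document_ids, cycle(slaves)):
        message = pickle.dumps(("ingestWorkflow", "process", doc_id))
        results.append(slave.handle(message))
    return results
```

Round-robin is only the simplest way to divide the work; the slide leaves the division strategy to the master.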

Slide 10: Grid Architecture
Diagram: the Master Task sends (workflow, process, document) to Slave Tasks 1 to N; each slave fetches its document from the Data Grid and writes the extracted data to GPFS Temporary Storage.

Slide 11: Grid Architecture 2
Diagram: the Master Task sends (index, load) to Slave Tasks 1 to N; each slave fetches the extracted data from GPFS Temporary Storage and stores the index in the Data Grid.
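The two grid slides together describe a two-phase ingest: first the documents are processed and their extracted data is written to shared temporary storage, then indexes are loaded from that data and stored in the data grid. A minimal sequential sketch of the flow, assuming dicts stand in for the Data Grid and for GPFS temporary storage, and that "extraction" is simply lower-cased whitespace splitting. All function names are hypothetical, and the "slaves" run one after another rather than in parallel.

```python
from collections import defaultdict


def phase1_process(slave_id, doc_ids, data_grid, temp_storage):
    """Phase 1 (workflow, process, document): fetch documents, extract (term, doc_id)
    pairs, and write them to temporary storage under a per-slave key."""
    pairs = []
    for doc_id in doc_ids:
        for term in data_grid["documents"][doc_id].lower().split():
            pairs.append((term, doc_id))
    temp_storage[f"extracted-{slave_id}"] = pairs


def phase2_load(index_name, temp_storage, data_grid):
    """Phase 2 (index, load): read every slave's extracted data, build the index,
    and store the finished index in the data grid."""
    postings = defaultdict(set)
    for key, pairs in temp_storage.items():
        if key.startswith("extracted-"):
            for term, doc_id in pairs:
                postings[term].add(doc_id)
    data_grid.setdefault("indexes", {})[index_name] = dict(postings)


def master(data_grid, n_slaves=2):
    """Split documents round-robin for phase 1, then request the index load."""
    temp_storage = {}  # stands in for GPFS temporary storage
    doc_ids = sorted(data_grid["documents"])
    for slave_id in range(n_slaves):
        phase1_process(slave_id, doc_ids[slave_id::n_slaves], data_grid, temp_storage)
    # With several indexes, each (index, load) request could go to a different slave
    phase2_load("words", temp_storage, data_grid)
    return data_grid["indexes"]["words"]
```

Separating the phases means no slave has to update a shared index while documents are still being processed; index building starts only once all extracted data is available.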

Slide 12: Usage
- NARA ERA Demonstrator: 20 GB of web-crawled data in SRB, with indexes stored in SRB; interface generated by an easily deployable Python layer
- Medline dataset experiments: 16.5 million abstracts plus associated metadata; parsed data stored in SRB, indexes in the filesystem
- NSDL grade-level analysis: NSDL web crawl data (3 TB+); data already in SRB, analysis stored to SRB

Slide 13: iRODS Integration
- Simple integration (à la SRB) is possible: store data in iRODS for the Storage classes
  - Requires a Python interface to iRODS
  - Doesn't really benefit from rule capabilities
- Other (more interesting) options:
  - Cheshire3 as External Microservice Platform
  - Cheshire3 as Internal Microservice Platform
  - Cheshire3 as Rules Platform (?)

Slide 14: External Microservice Platform
Diagram: a C3 Interface Microservice in iRODS passes data to a C3 Microservice in Cheshire3 and receives processed data back.
- Possible interfaces: MPI/PVM, RPC, SOAP, XML over HTTP, an arbitrary transport protocol, etc.
- Loose coupling via a client interface
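RPC is one of the interfaces listed for the external option. As a hedged illustration of that loose coupling, the sketch runs a stand-in Cheshire3 service behind Python's standard-library XML-RPC server and calls it from a client. In a real deployment the iRODS-side interface microservice would be C code; the remote method name `process` and the word-count "workflow" are hypothetical placeholders.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def c3_process(text):
    """Stand-in for a Cheshire3 workflow: returns the word count of the input."""
    return len(text.split())


def start_c3_service():
    """Serve c3_process over XML-RPC on a free local port, in a background thread."""
    # Port 0 asks the operating system for any free port
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False, allow_none=True)
    server.register_function(c3_process, "process")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def microservice_call(server, data):
    """What the interface microservice would do: ship data out, get processed data back."""
    host, port = server.server_address[:2]
    with ServerProxy(f"http://{host}:{port}/") as proxy:
        return proxy.process(data)
```

Only the data passed in the call crosses the boundary, which is both the strength of this option (either side can change independently) and its cost (large objects must travel over the wire).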

Slide 15: Internal Microservice Platform
Diagram: the C3 Microservice and Cheshire3 run inside iRODS, next to the data.
- Requires iRODS to have a Python interpreter as an alternative Microservice platform, rather than a Python client API
- Much tighter integration: Cheshire3 would have access to iRODS internal information, rather than just what was passed over the interface
- The Microservice definition problem becomes a Cheshire3 Workflow definition (an XML description)
- No bandwidth problems from transferring large amounts of data back and forth
- Tight coupling via Python integration
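If a microservice is defined as a Cheshire3 workflow, its definition is an XML document naming the steps to run. A minimal sketch of loading and running such a description, assuming a made-up XML format (`<workflow>` with `<step ref="...">` children) and a hypothetical registry of Python callables; the real Cheshire3 workflow configuration schema is different and richer.

```python
import xml.etree.ElementTree as ET

# Hypothetical workflow description, not the actual Cheshire3 schema
WORKFLOW_XML = """
<workflow id="ingestWorkflow">
  <step ref="stripWhitespace"/>
  <step ref="lowercase"/>
  <step ref="tokenize"/>
</workflow>
"""

# Hypothetical registry mapping step identifiers to callables
STEPS = {
    "stripWhitespace": str.strip,
    "lowercase": str.lower,
    "tokenize": str.split,
}


def load_workflow(xml_text, registry):
    """Resolve each <step ref="..."> to a callable, preserving document order."""
    root = ET.fromstring(xml_text.strip())
    return [registry[step.get("ref")] for step in root.iter("step")]


def run_workflow(steps, data):
    """Feed the output of each step into the next."""
    for step in steps:
        data = step(data)
    return data
```

Changing what the microservice does then means editing the XML rather than recompiling a C microservice.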

Slide 16: Rules Platform?
Diagram: iRODS data, with Cheshire3 hosting both the Rules and the C3 Microservices.
- Requires a Python interpreter at the Rules execution level, rather than (or as well as) at the Microservice level
- More flexible in terms of rule design
- Easier to write rules than with the current rule language
- Event system rather than rules execution?
- Integration of a Computational Grid for rule/microservice execution?
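The slide raises an event system as an alternative to rule execution. A minimal event-driven sketch with rules written as Python condition/action pairs: the event name `acPostProcForPut` is borrowed from the iRODS policy hook that runs after a data object is put, but the `RuleEngine` class and its `on` and `fire` methods are hypothetical and do not correspond to any iRODS API.

```python
from collections import defaultdict


class RuleEngine:
    """Event-driven rules: each event maps to (condition, action) pairs in Python."""

    def __init__(self):
        self.rules = defaultdict(list)

    def on(self, event, condition=lambda ctx: True):
        """Decorator registering an action to run when `event` fires and `condition` holds."""
        def register(action):
            self.rules[event].append((condition, action))
            return action
        return register

    def fire(self, event, ctx):
        """Run every matching action, returning their results in registration order."""
        return [action(ctx) for condition, action in self.rules[event] if condition(ctx)]


engine = RuleEngine()


@engine.on("acPostProcForPut", condition=lambda ctx: ctx["path"].endswith(".xml"))
def index_xml(ctx):
    # Hypothetical: hand XML objects to a Cheshire3 indexing workflow
    return f"index {ctx['path']}"


@engine.on("acPostProcForPut")
def checksum(ctx):
    # Hypothetical: runs for every object that is put
    return f"checksum {ctx['path']}"
```

Rules written this way are ordinary Python, which is the sense in which the slide expects them to be easier to write than the current rule language.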

Slide 17: Questions? Thank You!
Website:
Me:
Acknowledgements: SHAMAN (EU 7th Framework Programme); Cheshire3 (JISC, NSF)