Interactive Data Analysis on the “Grid” Tech-X/SLAC/PPDG:CS-11 Balamurali Ananthan David Alexander


Interactive Data Analysis on the “Grid”
Tech-X/SLAC/PPDG:CS-11
Balamurali Ananthan, David Alexander, Tony Johnson, Victor Serbo
Presented at Computing in High Energy Physics, Interlaken, Switzerland, September 2004

Focus of our work
Interactive Data Analysis on the Grid
- Very quick (<1 second to 100's of seconds) turnaround
- Intermediate results presented in real time
  - Plots update as analysis proceeds
  - Output from analysis displayed immediately
- High degree of interactivity
  - Change cuts/binning etc. and see immediate results
Goal: seamless interactive computing on the web.

Starting Point
JAS2 analysis client supports
- Local Analysis
  - Data, analysis code and GUI client live on the same machine
- Client-Server Analysis
  - Data and analysis run on a remote machine, GUI client runs on the local machine (uses Java RMI as network protocol)
In 2002 we added Grid-based analysis
- GUI client runs on local machine
- Data and analysis run in parallel on a farm of remote machines
- Initial implementation used Globus2 + Java RMI
In all three modes the goal is for the physicist to feel that he is interacting with his local machine
- All three modes look almost identical to use
- Try to hide as much of the Grid from the end-user as practical

JAS2 Grid Client

Current Project Builds on Earlier Work
- Grid Services based on OGSI/Globus 3
- Switch to using WS-RF (Globus 4?) in future
  - Reuse existing Globus facilities where possible
  - Define new services if not already available
- Design loosely-coupled services to encourage re-use
- Separate interface from implementation
  - Interfaces: Collaborate with CS-11/PPDG/ARDA
  - Reference Implementation: JAS-DAGS (Dataset Analysis Grid Service)
- Use JAS3 as reference analysis client
- Currently in development
  - Plan for initial use for International Linear Collider Simulation Studies

Dataset Catalog Service
- First component developed
  - Interface collaboratively designed as part of PPDG-CS11 project
  - Aims to separate interface from implementation
- We have a reference implementation
  - Based on Java and simple “in-memory” XML database
- Designed to make it easy to put same interface on top of other existing data catalog systems
- Has also been deployed as a Clarens service

Dataset Catalog Service
- Allows user to “browse” dataset hierarchy
- Allows user to “search” using “meta-data” associated with each dataset
- Output
  - Grid Service Handle (GSH) of the Dataset Locator
    - The Locator service knows the actual location of the Dataset
  - String ID of the Dataset
    - An opaque string interpreted only by the dataset locator
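The browse/search behaviour and the two-part output described on this slide can be sketched in Java roughly as follows. This is a minimal illustration under stated assumptions, not the PPDG CS-11 interface: the names `DatasetCatalog`, `DatasetRef` and `InMemoryCatalog`, the method signatures, and the sample entries are all hypothetical, and a plain string stands in for the Grid Service Handle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class CatalogSketch {

    /** What a catalog lookup returns: where to ask, and what to ask for. */
    static final class DatasetRef {
        final String locatorHandle; // stands in for the GSH of the Dataset Locator
        final String datasetId;     // opaque: only the locator interprets it

        DatasetRef(String locatorHandle, String datasetId) {
            this.locatorHandle = locatorHandle;
            this.datasetId = datasetId;
        }
    }

    /** Catalog interface, kept separate from any backing store (hypothetical signatures). */
    interface DatasetCatalog {
        List<String> browse(String folder);
        List<DatasetRef> search(String metaKey, String metaValue);
    }

    /** Toy in-memory implementation; another one could wrap an existing catalog system. */
    static final class InMemoryCatalog implements DatasetCatalog {
        private final List<String> paths = new ArrayList<>();
        private final List<Map<String, String>> metas = new ArrayList<>();
        private final List<DatasetRef> refs = new ArrayList<>();

        void add(String path, Map<String, String> meta, DatasetRef ref) {
            paths.add(path);
            metas.add(meta);
            refs.add(ref);
        }

        /** Lists the distinct child names directly below the given folder. */
        @Override
        public List<String> browse(String folder) {
            String prefix = folder.endsWith("/") ? folder : folder + "/";
            List<String> children = new ArrayList<>();
            for (String path : paths) {
                if (!path.startsWith(prefix)) continue;
                String rest = path.substring(prefix.length());
                int slash = rest.indexOf('/');
                String child = slash < 0 ? rest : rest.substring(0, slash);
                if (!children.contains(child)) children.add(child);
            }
            return children;
        }

        /** Returns references to every dataset whose meta-data has key == value. */
        @Override
        public List<DatasetRef> search(String metaKey, String metaValue) {
            List<DatasetRef> hits = new ArrayList<>();
            for (int i = 0; i < paths.size(); i++) {
                if (metaValue.equals(metas.get(i).get(metaKey))) hits.add(refs.get(i));
            }
            return hits;
        }
    }

    public static void main(String[] args) {
        // Made-up sample entries, for illustration only.
        InMemoryCatalog catalog = new InMemoryCatalog();
        catalog.add("/sim/run1", Map.of("type", "signal"), new DatasetRef("locator-A", "id-001"));
        catalog.add("/sim/run2", Map.of("type", "background"), new DatasetRef("locator-A", "id-002"));
        System.out.println(catalog.browse("/sim"));
        for (DatasetRef ref : catalog.search("type", "signal")) {
            System.out.println(ref.locatorHandle + " " + ref.datasetId);
        }
    }
}
```

Because client code only sees `DatasetCatalog`, the in-memory store could be swapped for a wrapper around another catalog system without touching the client, which is the separation the slide describes.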

DAGS
Dataset Analysis Grid Service
- Aim to produce complete interactive data analysis system
- Loosely based on CS-11 APIs
- Migrate from RMI to OGSA in stages to maintain a working system at each stage
Key design goals
- Only requires Globus (+JavaVM) on worker nodes
  - Everything else dynamically deployed
- Specialized analysis services only need to be installed on specific gateway nodes
- Few services need to be visible outside firewall
- No Grid software on Client node (except Java COG)

DAGS Conceptual Diagram
[Figure. Components shown: JAS3 Client (Data Chooser Plugin, Proxy Login Plugin, DAGS client); Firewall; Dataset Analysis Manager Service; Dataset Catalog Service; Index Service; Dataset Locator Service; Data Splitter Service; Caching Service; Result Merging Service (AIDA based); Worker Node 1 and Worker Node 2; Analysis Server; Managed Job Service; Reliable File Transfer Service. Labelled flows: Dataset query, Dataset ID, Analysis Job Description, Analysis Task, Results.]
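The diagram names a Data Splitter Service and a Result Merging Service. The sketch below shows one plausible reading of that split/merge pattern for a one-dimensional histogram: each worker fills its own histogram over a chunk of the data, and the partial results are summed bin by bin. It is an illustration only. `Hist1D`, `split` and `analyze` are hypothetical names, the histogram is a plain array rather than an AIDA object, and the workers are simulated by a sequential loop.

```java
import java.util.Arrays;

final class HistogramMerge {

    /** Fixed-binning 1D histogram; a plain-array stand-in, not the AIDA API. */
    static final class Hist1D {
        final double lo, hi;
        final long[] bins;

        Hist1D(int nBins, double lo, double hi) {
            this.lo = lo;
            this.hi = hi;
            this.bins = new long[nBins];
        }

        void fill(double x) {
            if (!(x >= lo && x < hi)) return; // drop underflow, overflow and NaN
            int i = (int) ((x - lo) / (hi - lo) * bins.length);
            bins[Math.min(i, bins.length - 1)]++; // guard against rounding at the upper edge
        }

        /** Merge step: bin-by-bin sum, valid only when the binning is identical. */
        void add(Hist1D other) {
            if (other.bins.length != bins.length || other.lo != lo || other.hi != hi) {
                throw new IllegalArgumentException("binning mismatch");
            }
            for (int i = 0; i < bins.length; i++) bins[i] += other.bins[i];
        }
    }

    /** Split step: cut the data into nearly equal contiguous chunks, one per worker. */
    static double[][] split(double[] data, int workers) {
        double[][] chunks = new double[workers][];
        for (int w = 0; w < workers; w++) {
            int from = (int) ((long) data.length * w / workers);
            int to = (int) ((long) data.length * (w + 1) / workers);
            chunks[w] = Arrays.copyOfRange(data, from, to);
        }
        return chunks;
    }

    /** Each simulated worker fills its own histogram; partial results are merged in turn. */
    static Hist1D analyze(double[] data, int workers, int nBins, double lo, double hi) {
        Hist1D merged = new Hist1D(nBins, lo, hi);
        for (double[] chunk : split(data, workers)) {
            Hist1D partial = new Hist1D(nBins, lo, hi);
            for (double x : chunk) partial.fill(x);
            merged.add(partial); // a client could redraw the plot after each merge
        }
        return merged;
    }

    public static void main(String[] args) {
        double[] data = {0.1, 0.4, 0.45, 0.9, 1.5, 0.75, 0.2};
        Hist1D h = analyze(data, 2, 4, 0.0, 1.0);
        System.out.println(Arrays.toString(h.bins));
    }
}
```

Since bin counts simply add, the merged histogram does not depend on how the data were split, and a partial plot can be shown as soon as the first worker reports.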

Performance
- JAS2 system used Java Remote Method Invocation (RMI)
- Current system still uses RMI in some areas, but intention is to migrate to OGSA
- Performance is a real problem:
  - Trivial Service Invocation (AuctionService) over 10Mbit LAN
  - All times for 100 calls, excluding first call
    - RMI: 100 calls - 96ms
    - Globus3.2 (non-secure): 100 calls - 22 seconds
    - Globus3.2 (secure): 100 calls - [figure missing] seconds
  - Problems may be partly related to Globus implementation, but are clearly also partly fundamental problems with XML encoding/decoding and web-service protocol
  - Possible workarounds
    - “fast web services” WS/ WS/ WS/ or “clarens + xml-rpc” or …
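To put the two complete figures on this slide side by side: 96 ms for 100 RMI calls is about 0.96 ms per call, while 22 seconds for 100 non-secure Globus 3.2 calls is about 220 ms per call, a factor of roughly 230. The Java sketch below reproduces that arithmetic and shows one way a measurement of the "100 calls, excluding first call" kind could be taken. The `CallTimer` class and its methods are hypothetical and are not the benchmark the authors used.

```java
final class CallTimer {

    /** Times `calls` invocations after one untimed warm-up call; returns total nanoseconds. */
    static long timeCalls(Runnable call, int calls) {
        call.run(); // first call excluded, as on the slide
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            call.run();
        }
        return System.nanoTime() - start;
    }

    /** Average cost of one call, given the total time for a batch. */
    static double perCallMillis(double totalMillis, int calls) {
        return totalMillis / calls;
    }

    public static void main(String[] args) {
        // Totals quoted on the slide for 100 calls.
        double rmi = perCallMillis(96.0, 100);
        double globus = perCallMillis(22_000.0, 100);
        System.out.printf("RMI: %.2f ms/call%n", rmi);
        System.out.printf("Globus 3.2 (non-secure): %.0f ms/call%n", globus);
        System.out.printf("Slowdown: about %.0fx%n", globus / rmi);

        // Timing harness applied to a trivial local call, purely as a usage example.
        long nanos = timeCalls(() -> Math.sqrt(2.0), 100);
        System.out.println("Local no-op batch took " + nanos + " ns");
    }
}
```

At around 220 ms per call, only a handful of service round trips fit within the sub-second turnaround the project is aiming for, which is why the slide treats this as a real problem.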

Plans
- Deploy Dataset Catalog Interface with some real data sources
  - International Linear Collider Simulation Data
  - Some interest in interface to POOL
- Deploy full DAGS system and try with real users
  - First target will be linear collider simulation studies
- Work on interoperability with other systems
  - Clarens/Rendezvous service
  - gLite?
  - One goal of switching to OGSI was to use interoperable modules
    - This requires development of “standard” interfaces which provide for flexibility in the way in which they will be used
    - It is unclear that the HEP community has the motivation to do this

Conclusion
- We are making progress on developing a Globus 3 based interactive data system
  - Aim to have usable system by end 2004
- Globus/OGSI/WS-RF is certainly not the easiest way to implement interactive data analysis
  - Performance is a problem
    - Workarounds exist
  - Not clear if/when this will be addressed by core Globus software
  - Looking at other technologies for better performance
  - Interoperability and Component Reuse
    - Some progress, but so far not as effective as was hoped

Links DAGS CS JAS AIDA Clarens

Screenshots

Some Screenshots Starting Work Manager.. Starting Grid Service Manager..

Screenshots(cont…) Starting MMJFS on the end nodes…

Starting JAS Client..

JAS Client..

Resulting Histogram…