Distributed Services for Grid Enabled Data Analysis


Distributed Services for Grid Enabled Data Analysis

Scenario

Liz and John are members of CMS. Liz is from Caltech and is an expert in event reconstruction; John is from Florida and is an expert in statistical fits. They wish to combine their expertise and collaborate on a CMS data analysis project.

Grid Services via a Web Service: Clarens

Demo goals:
- Prototype a vertically integrated system with a transparent, seamless user experience
- Distribute grid services using a uniform web service: Clarens!
- Understand system latencies and failure modes
- Investigate request scheduling in a resource-limited and dynamic environment
- Emphasize functionality over scalability
- Investigate interactive vs. scheduled data analysis on a grid (a hybrid example) and understand where the difficult issues are

Services and components:
- Grid monitoring service: MonALISA
- Grid resource: VDT server
- Grid execution: VDT client
- Grid scheduling: Sphinx
- Virtual data: Chimera
- Workflow generation: ShahKar
- Collaborative environment: CAVES
- Analysis clients: IGUANA, ROOT, web browser, PDA; remote data access via Clarens

Data Discovery: Chimera

Virtual data products are pre-registered with the Chimera Virtual Data Service. Using Clarens, Liz and John discover data products by remotely browsing the Chimera Virtual Data Service.

[Diagram: derivation chains x.cards → pythia → x.ntpl → h2root → x.root and y.cards → pythia → y.ntpl → h2root → y.root, browsed and requested through the Chimera Virtual Data System]

Data Analysis: Chimera

Liz wants to analyse x.root using her analysis code a.C:

    // Analysis code: a.C
    #include <iostream.h>
    #include <math.h>
    #include "TFile.h"
    #include "TTree.h"
    #include "TBrowser.h"
    #include "TH1.h"
    #include "TH2.h"
    #include "TH3.h"
    #include "TRandom.h"
    #include "TCanvas.h"
    #include "TPolyLine3D.h"
    #include "TPolyMarker3D.h"
    #include "TString.h"

    void a( char treefile[], char newtreefile[] )
    {
       // HEPEVT-style event record buffers
       Int_t Nhep, Nevhep;
       Int_t Isthep[3000];
       Int_t Idhep[3000], Jmohep[3000][2], Jdahep[3000][2];
       Float_t Phep[3000][5], Vhep[3000][4];
       Int_t Irun, Ievt;
       Float_t Weight;
       Int_t Nparam;
       Float_t Param[200];

       // Open the input tree and attach the event-record branches
       TFile *file = new TFile( treefile );
       TTree *tree = (TTree*) file->Get( "h10" );
       tree->SetBranchAddress( "Nhep", &Nhep );
       // ... (the rest of the macro is cut off on the slide)
    }

[Diagram: x.cards → pythia → x.ntpl → h2root → x.root in the Chimera Virtual Data System]

Interactive Workflow Generation

Liz browses the local directory for her analysis code and the Chimera Virtual Data Service for input LFNs.

[Diagram: browsing the Chimera Virtual Data System — select input LFN, select CINT script, define output LFN]

Interactive Workflow Generation

She selects and registers (to the Grid) her analysis code a.C, the appropriate input LFN (x.root), and a newly defined output LFN (xa.root).

[Diagram: selecting the CINT script a.C from the local scripts a.C, b.C, c.C, d.C; selecting the input LFN x.root; defining the output LFN xa.root]

Interactive Workflow Generation

A branch is automatically added in the Chimera Virtual Data Catalog, and a.C is uploaded into "gridspace" and registered with RLS.

[Diagram: a new branch x.root + a.C → root → xa.root appended to the x.cards → pythia → x.ntpl → h2root → x.root chain]

Interactive Workflow Generation

Querying the Virtual Data Service, Liz sees that xa.root is now available to her as a new virtual data product.

[Diagram: the catalog now offers x.ntpl, x.root, y.ntpl, y.root, and xa.root for browsing and requests]

Request Submission: Chimera

She requests it....

[Diagram: the request for xa.root entering the Chimera Virtual Data System, which resolves the chain x.root + a.C → root → xa.root]

Brief Interlude: The Grid Is Busy and Resources Are Limited!

Busy: production is taking place and other physicists are using the system; MonALISA is used to avoid congestion in the grid.

Limited: as grid computing becomes standard fare, oversubscription to resources will be common!

CMS gives Liz a high global priority. Based upon local and global policies, and on current grid weather, a grid scheduler must schedule her requests for optimal resource use.

Sphinx Scheduling Server

The nerve centre, with a global view of the system.

Data Warehouse:
- Information driven
- Repository of the current state of the grid

Control Process:
- A finite state machine
- Different modules modify jobs, graphs, workflows, etc. and change their state
- Flexible and extensible

[Diagram: the Sphinx server — message interface; control process; data warehouse (policies, accounting info, grid weather, resource properties and status, request tracking, workflows, etc.); graph reducer; job and graph predictors; job and graph admission control; graph data planner; job execution planner; graph tracker; data management; information gatherer]

Distributed Services for Grid Enabled Data Analysis

[Diagram: the deployed prototype — a ROOT data analysis client talks via Clarens to distributed services: Caltech (file service, VDT resource, Chimera Virtual Data Service); Florida (file service, VDT resource, Sphinx/VDT execution service with Globus and GridFTP); Fermilab (file service, VDT resource, Sphinx scheduling service, Globus); Iowa (file service, VDT resource, Globus); an RLS replica location service; and MonALISA monitoring]

Collaborative Analysis

Meanwhile, John has been developing his statistical fits in b.C by analysing the data product x.root.

[Diagram: a second branch x.root + b.C → root → xb.root added alongside x.root + a.C → root → xa.root]

Collaborative Analysis

After Liz has finished optimising the event reconstruction, John uses his analysis code b.C on her data product xa.root to produce the final statistical fits and results!

[Diagram: xa.root + b.C → root → xab.root, combining both branches of the derivation graph]

Key Features

A distributed-services prototype for data analysis:
- Remote data service
- Replica location service
- Virtual data service
- Scheduling service
- Grid execution service
- Monitoring service

Smart replication strategies for "hot data": virtual data with respect to location. Execution priority management on a resource-limited grid: policy-based scheduling and QoS, and virtual data with respect to existence. A collaborative environment: sharing of datasets and use of provenance.

Credits

California Institute of Technology: Julian Bunn, Iosif Legrand, Harvey Newman, Suresh Singh, Conrad Steenberg, Michael Thomas, Frank van Lingen, Yang Xia

University of Florida: Paul Avery, Dimitri Bourilkov, Richard Cavanaugh, Laukik Chitnis, Jang-uk In, Mandar Kulkarni, Pradeep Padala, Craig Prescott, Sanjay Ranka

Fermi National Accelerator Laboratory: Anzar Afaq, Greg Graham