Published byさみら やまがた Modified over 5 years ago
1
Distributed Services for Grid Enabled Data Analysis
2
Scenario
Liz and John are members of CMS.
Liz is from Caltech and is an expert in event reconstruction.
John is from Florida and is an expert in statistical fits.
They wish to combine their expertise and collaborate on a CMS data analysis project.
3
Demo Goals
Prototype a vertically integrated system with a transparent, seamless experience.
Distribute grid services using a uniform web service: Clarens!
Understand system latencies and failure modes.
Investigate request scheduling in a resource-limited and dynamic environment.
Emphasize functionality over scalability.
Investigate interactive vs. scheduled data analysis on a grid (hybrid example).
Understand where the difficult issues are.
[Diagram: service stack]
Grid Monitoring Service: MonALISA
Grid Resource: VDT Server
Grid Execution: VDT Client
Grid Scheduling: Sphinx
Virtual Data: Chimera
Workflow Generation: ShahKar
Collaborative Environment: CAVE
Grid-services Web Service: Clarens
Analysis Client: IGUANA, ROOT, Web Browser, PDA
Remote Data: Clarens
4
Data Discovery
Virtual data products are pre-registered with the Chimera Virtual Data Service.
Using Clarens, Liz and John discover data products by remotely browsing the Chimera Virtual Data Service.
[Diagram: Chimera Virtual Data System. x.cards -> pythia -> x.ntpl -> h2root -> x.root, and y.cards -> pythia -> y.ntpl -> h2root -> y.root. The products y.ntpl, y.root, x.ntpl and x.root can be browsed and requested.]
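The derivation chain on this slide can be pictured as a small catalog in which each logical file name records the transformation and the input that produce it. The sketch below is a hypothetical in-memory model written for illustration; it is not Chimera's API or its virtual data language, and the names `Derivation`, `Catalog`, `demoCatalog` and `lineage` are assumptions.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical record: how one logical file name (LFN) is derived.
struct Derivation {
    std::string transformation;  // e.g. "pythia" or "h2root"
    std::string input;           // LFN consumed by the transformation
};

// Output LFN -> the recipe that produces it.
using Catalog = std::map<std::string, Derivation>;

// The two chains shown on the slide; x.cards and y.cards are primary inputs.
Catalog demoCatalog() {
    return {
        {"x.ntpl", {"pythia", "x.cards"}},
        {"x.root", {"h2root", "x.ntpl"}},
        {"y.ntpl", {"pythia", "y.cards"}},
        {"y.root", {"h2root", "y.ntpl"}},
    };
}

// Walk back from a product towards its primary input, recording each
// step as "transformation(input)", nearest step first.
std::vector<std::string> lineage(const Catalog& catalog, std::string lfn) {
    std::vector<std::string> steps;
    auto it = catalog.find(lfn);
    while (it != catalog.end()) {
        steps.push_back(it->second.transformation + "(" + it->second.input + ")");
        assert(steps.size() <= catalog.size());  // a cyclic catalog would be a bug
        lfn = it->second.input;
        it = catalog.find(lfn);
    }
    return steps;
}
```

Calling `lineage(demoCatalog(), "x.root")` walks back through `h2root` and `pythia` to `x.cards`, which is roughly what browsing the catalog shows Liz and John.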
5
Data Analysis
Liz wants to analyse x.root using her analysis code a.C.
// Analysis code: a.C
#include <iostream.h>
#include <math.h>
#include "TFile.h"
#include "TTree.h"
#include "TBrowser.h"
#include "TH1.h"
#include "TH2.h"
#include "TH3.h"
#include "TRandom.h"
#include "TCanvas.h"
#include "TPolyLine3D.h"
#include "TPolyMarker3D.h"
#include "TString.h"

void a( char treefile[], char newtreefile[] )
{
  Int_t Nhep;
  Int_t Nevhep;
  Int_t Isthep[3000];
  Int_t Idhep[3000], Jmohep[3000][2], Jdahep[3000][2];
  Float_t Phep[3000][5], Vhep[3000][4];
  Int_t Irun, Ievt;
  Float_t Weight;
  Int_t Nparam;
  Float_t Param[200];
  TFile *file = new TFile( treefile );
  TTree *tree = (TTree*) file -> Get( "h10 …
  tree -> SetBranchAddress( "Nhep", &Nh …
[The listing is cut off on the slide.]
[Diagram: Chimera Virtual Data System. x.cards -> pythia -> x.ntpl -> h2root -> x.root.]
6
Interactive Workflow Generation
Liz browses the local directory for her analysis code and the Chimera Virtual Data Service for input logical file names (LFNs)…
[Diagram: workflow form with the fields "Select input LFN", "Select CINT script" and "Define output LFN", linked by "register" and "browse" to the Chimera Virtual Data System (x.cards -> pythia -> x.ntpl -> h2root -> x.root).]
7
Interactive Workflow Generation
She selects and registers (to the Grid) her analysis code, the appropriate input LFN, and a newly defined output LFN.
[Diagram: workflow form. "Select CINT script" lists a.C, b.C, c.C, d.C; "Select input LFN" lists y.ntpl, y.root, x.ntpl, x.root; "Define output LFN" holds xa.root. The form is linked by "register" and "browse" to the Chimera Virtual Data System (x.cards -> pythia -> x.ntpl -> h2root -> x.root).]
8
Interactive Workflow Generation
A branch is automatically added to the Chimera Virtual Data Catalog, and a.C is uploaded into “gridspace” and registered with RLS.
[Diagram: same workflow form as on the previous slide. In the Chimera Virtual Data System a new branch appears: x.root and a.C -> root -> xa.root.]
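Adding a branch amounts to recording a new recipe whose inputs are already known to the catalog. The following is a minimal sketch of that bookkeeping under stated assumptions: `VirtualDataCatalog`, `addPrimary` and `registerDerivation` are hypothetical names, and nothing here reflects the real Chimera or RLS interfaces.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical catalog entry: the executable and input LFNs behind one output LFN.
struct Derivation {
    std::string transformation;       // e.g. "root"
    std::vector<std::string> inputs;  // e.g. {"a.C", "x.root"}
};

struct VirtualDataCatalog {
    std::set<std::string> knownLfns;                // every LFN that can be browsed
    std::map<std::string, Derivation> derivations;  // output LFN -> recipe

    // Record a file that has no recipe of its own, such as an uploaded script.
    void addPrimary(const std::string& lfn) { knownLfns.insert(lfn); }

    // Add a new branch. Returns false if the output LFN is already taken
    // or if any input LFN is unknown to the catalog.
    bool registerDerivation(const std::string& output, const Derivation& d) {
        assert(!output.empty());
        if (knownLfns.count(output) != 0) return false;
        for (const auto& in : d.inputs) {
            if (knownLfns.count(in) == 0) return false;
        }
        derivations[output] = d;
        knownLfns.insert(output);
        return true;
    }
};
```

In this model, registering `xa.root` with transformation `root` and inputs `a.C` and `x.root` succeeds only once `a.C` has been uploaded, mirroring the order of events on the slide.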
9
Interactive Workflow Generation
Querying the Virtual Data Service, Liz sees that xa.root is now available to her as a new virtual data product.
[Diagram: Chimera Virtual Data System. x.cards -> pythia -> x.ntpl -> h2root -> x.root; x.root and a.C -> root -> xa.root. The products y.ntpl, y.root, x.ntpl, x.root and xa.root can be browsed and requested.]
10
Request Submission
She requests it…
[Diagram: Chimera Virtual Data System. x.cards -> pythia -> x.ntpl -> h2root -> x.root; x.root and a.C -> root -> xa.root. Liz requests xa.root from the browsable products y.ntpl, y.root, x.ntpl, x.root and xa.root.]
11
Brief Interlude: The Grid is Busy and Resources are Limited!
Production is taking place.
Other physicists are using the system.
Use MonALISA to avoid congestion in the grid.
Limited: As grid computing becomes standard fare, oversubscription to resources will be common!
CMS gives Liz a global high priority.
Based upon local and global policies, and current Grid weather, a grid scheduler must schedule her requests for optimal resource use.
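One way to picture the interplay of policy, priority and Grid weather is a selection rule that filters sites by local policy and then prefers the least-loaded one. This is only an illustrative sketch, not the Sphinx algorithm: the `Site` fields, the `chooseSite` function and any load or priority numbers used with it are assumptions.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical snapshot of one site, as a monitoring service might report it.
struct Site {
    std::string name;
    double load;      // fraction of busy CPUs: 0.0 (idle) to 1.0 (full)
    int minPriority;  // local policy: lowest request priority the site admits
};

// Return the name of the least-loaded site whose local policy admits the
// request, or an empty string if no site qualifies.
std::string chooseSite(const std::vector<Site>& sites, int priority) {
    const Site* best = nullptr;
    for (const auto& s : sites) {
        assert(s.load >= 0.0 && s.load <= 1.0);
        if (priority < s.minPriority) continue;  // rejected by local policy
        if (s.load >= 1.0) continue;             // saturated, avoid congestion
        if (best == nullptr || s.load < best->load) best = &s;
    }
    return best != nullptr ? best->name : std::string();
}
```

With a high global priority, a request like Liz's passes more local policy filters and can therefore reach less congested sites than a low-priority request would.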
12
Sphinx Scheduling Server
Nerve Centre: global view of system
Data Warehouse: information driven; repository of current state of the grid
Control Process: finite state machine; different modules modify jobs, graphs, workflows, etc. and change their state
Flexible, extensible
[Diagram: Sphinx Server. Components: Message Interface, Control Process, Data Warehouse. Modules: Graph Reducer, Job Predictor, Graph Predictor, Job Admission Control, Graph Admission Control, Graph Data Planner, Job Execution Planner, Graph Tracker, Data Management, Information Gatherer. Data Warehouse contents: Policies, Accounting Info, Grid Weather, Resource Prop. and status, Request Tracking, Workflows etc.]
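The control process described here, in which modules pick up jobs in one state and leave them in another, can be sketched as a table-driven finite state machine. The slide names the modules but not the states, so the `JobState` values and the pairing of modules with states below are assumptions made purely for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical job states; these names do not come from the slide.
enum class JobState { Received, Predicted, Admitted, Planned };

// Which module handles a job in a given state, and the state it leaves behind.
struct Transition {
    std::string module;
    JobState next;
};

// Assumed transition table; Planned has no entry and is therefore terminal.
const std::map<JobState, Transition>& transitions() {
    static const std::map<JobState, Transition> table = {
        {JobState::Received,  {"Job Predictor",         JobState::Predicted}},
        {JobState::Predicted, {"Job Admission Control", JobState::Admitted}},
        {JobState::Admitted,  {"Job Execution Planner", JobState::Planned}},
    };
    return table;
}

// One control-process step: advance the job and return the name of the
// module that handled it, or an empty string if the state is terminal.
std::string step(JobState& state) {
    auto it = transitions().find(state);
    if (it == transitions().end()) return "";
    state = it->second.next;
    return it->second.module;
}

// Drive a job until no module claims it; returns the number of steps taken.
int drive(JobState& state) {
    int steps = 0;
    while (!step(state).empty()) {
        ++steps;
        assert(steps <= static_cast<int>(transitions().size()));  // table must be acyclic
    }
    return steps;
}
```

Keeping the transitions in a table rather than in code is one way to get the flexibility and extensibility the slide mentions: adding a module means adding a row.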
13
Distributed Services for Grid Enabled Data Analysis
[Diagram: deployment of the distributed services]
ROOT Data Analysis Client
Chimera Virtual Data Service
Sphinx Scheduling Service
Sphinx/VDT Execution Service
RLS Replica Location Service
MonALISA Monitoring Service
Caltech File Service (VDT Resource)
Florida File Service (VDT Resource)
Fermilab File Service (VDT Resource)
Iowa File Service (VDT Resource)
Connection labels: Clarens, Globus, GridFTP, MonALISA
14
Collaborative Analysis
Meanwhile, John has been developing his statistical fits in b.C by analysing the data product x.root.
[Diagram: x.cards -> pythia -> x.ntpl -> h2root -> x.root; x.root and a.C -> root -> xa.root; x.root and b.C -> root -> xb.root. The products y.ntpl, y.root, x.ntpl, x.root, xa.root and xb.root can be browsed and requested.]
15
Collaborative Analysis
After Liz has finished optimising the event reconstruction, John uses his analysis code b.C on her data product xa.root to produce the final statistical fits and results!
[Diagram: x.cards -> pythia -> x.ntpl -> h2root -> x.root; x.root and a.C -> root -> xa.root; x.root and b.C -> root -> xb.root; xa.root and b.C -> root -> xab.root. The products y.root, x.ntpl, x.root, xa.root, xb.root and xab.root can be browsed and requested.]
16
Key Features
Distributed Services Prototype in Data Analysis: Remote Data Service, Replica Location Service, Virtual Data Service, Scheduling Service, Grid-Execution Service, Monitoring Service
Smart Replication Strategies for “Hot Data”
Virtual Data w.r.t. Location
Execution Priority Management on a Resource Limited Grid
Policy Based Scheduling & QoS
Virtual Data w.r.t. Existence
Collaborative Environment
Sharing of Datasets
Use of Provenance
17
Credits
California Institute of Technology: Julian Bunn, Iosif Legrand, Harvey Newman, Suresh Singh, Conrad Steenberg, Michael Thomas, Frank Van Lingen, Yang Xia
University of Florida: Paul Avery, Dimitri Bourilkov, Richard Cavanaugh, Laukik Chitnis, Jang-uk In, Mandar Kulkarni, Pradeep Padala, Craig Prescott, Sanjay Ranka
Fermi National Accelerator Laboratory: Anzar Afaq, Greg Graham