Grid infrastructure analysis with a simple flow model Andrey Demichev, Alexander Kryukov, Lev Shamardin, Grigory Shpiz Scobeltsyn Institute of Nuclear.

Presentation transcript:

Grid infrastructure analysis with a simple flow model Andrey Demichev, Alexander Kryukov, Lev Shamardin, Grigory Shpiz Scobeltsyn Institute of Nuclear Physics, Moscow State University

Why a grid simulator?
A simulator makes it easy to change the grid structure and behavior.
Grid behavior under stress conditions:
• Site failures
• Job execution failures
• Unexpected rise in the job load
• Bottleneck analysis
System structure optimization

Different approaches to job flow simulation
Individual job tracking
• Monte Carlo simulation of job submission. The model system simulates the stages of the job life, from submission to completion or failure.
• Easier to implement.
• Examples: simgrid, gridsim, beosim

Different approaches to job flow simulation
Statistical models of job flows
• Simulation of job flows (i.e. “jobs/second”). The model system consists of boxes which take a number of job flows as input and produce a number of job flows as output.
• The output of such a model is exactly the numbers we are interested in.
• Examples: optorsim

Goals
Create a simple, realistic model of the grid.
The model should be capable of answering the questions:
• Will the grid handle the required constant average job load?
• Can it be reorganized to handle the load?

Structure of a workload management system
[Figure: diagram with blocks labeled Job Registration, Job Submission & Status, Resource Status Information, Input queue, Planner and Output queue; arrows labeled “job” and “job status, output”.]

Simple flow-based model
Simulation of an LCG-like grid.
Four general node types:
• User Interface (UI): the source of the jobs in the system.
• Resource Broker (RB): accepts the jobs, queries the information system, and dispatches the jobs to the Computing Elements.
• Computing Element (CE): where the jobs are executed.
• BDII nodes: the information system.

User Interface
UIs may be connected to a number of RBs.
• Each UI generates a constant flow of job requests toward a connected RB.
[Figure: UI nodes connected to RB nodes.]

Resource Broker
RBs are connected to BDIIs and CEs, and have UIs connected to them.
An RB is characterized by:
• the maximum input flow of job requests
• the number of information system lookups per job and the maximum flow of information system lookups
• the maximum job flow to the CEs
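
The three RB limits listed above can be sketched as a small data structure. This is only an illustration, not the authors' implementation: the class name, field names and the `loads` and `overloaded` helpers are hypothetical, and the sketch assumes that every accepted job is dispatched to a CE, so the output job flow equals the input flow.

```python
from dataclasses import dataclass


@dataclass
class ResourceBroker:
    """Hypothetical RB description; all flows are in jobs (or lookups) per second."""
    max_input_flow: float    # maximum job request flow accepted from UIs
    lookups_per_job: float   # information system lookups issued per job
    max_lookup_flow: float   # maximum flow of information system lookups
    max_output_flow: float   # maximum job flow dispatched to the CEs

    def loads(self, input_flow):
        """Return each flow as a fraction of its limit (1.0 means saturated)."""
        return {
            "input": input_flow / self.max_input_flow,
            "lookup": input_flow * self.lookups_per_job / self.max_lookup_flow,
            # Assumption: output job flow equals input job flow.
            "output": input_flow / self.max_output_flow,
        }

    def overloaded(self, input_flow):
        """Names of the limits that the given input flow exceeds."""
        return [name for name, load in self.loads(input_flow).items() if load > 1.0]
```

With this shape, an RB at exactly its input limit reports a load of 1.0 for "input" and is not yet counted as overloaded.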

Information System (BDII)
A BDII is characterized by the maximum flow of requests it can handle.
[Figure: UI nodes connected to RB nodes, which are connected to a BDII.]

Computing Elements
A CE is characterized by the maximum flow of “jobs” it can process.
All jobs are assumed to be equal.
We are not interested in the exact location of the failing CE when the grid is overloaded, so we can combine all grid CEs into one virtual CE with an effective capacity.
We could do the same for the UIs.
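
A minimal sketch of the merging step, assuming that the effective capacity of the virtual CE is simply the sum of the individual CE capacities (the slides do not state the formula). The function names are hypothetical, and the check deliberately ignores the RB and BDII limits.

```python
def merge_nodes(values):
    """Combine per-node flows or capacities (jobs/s) into one virtual node."""
    return sum(values)


def grid_overloaded(ui_flows, ce_capacities):
    """True if the total UI job flow exceeds the virtual CE capacity."""
    return merge_nodes(ui_flows) > merge_nodes(ce_capacities)
```

For example, two UIs producing 0.5 jobs/s each overload CEs with capacities 0.25 and 0.5 jobs/s, because 1.0 exceeds 0.75.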

Simple flow-based model
[Figure: model diagram with nodes labeled UI, RB, BDII and Virtual CE.]

Flows
Think of plumbing.
The UIs generate the flow of incoming jobs to the RBs.
Each RB generates a flow of requests to the BDII and to the CE.
The flow of requests to the CE is checked against the maximum.
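
The plumbing picture can be sketched as a single pass over the graph. This is an illustrative sketch under stated assumptions rather than the authors' code: the function and argument names are hypothetical, a UI is assumed to split its flow evenly among its connected RBs, and every job accepted by an RB is assumed to reach the virtual CE.

```python
from collections import defaultdict


def propagate(ui_flow, ui_links, lookups_per_job, rb_bdii):
    """Propagate job flows UI -> RB -> (BDII, virtual CE).

    ui_flow:         {ui: jobs/s generated}
    ui_links:        {ui: [connected RBs]}
    lookups_per_job: {rb: information system lookups per job}
    rb_bdii:         {rb: BDII used by that RB}
    """
    rb_in = defaultdict(float)
    for ui, flow in ui_flow.items():
        rbs = ui_links[ui]
        for rb in rbs:
            # Assumption: even split among the connected RBs.
            rb_in[rb] += flow / len(rbs)

    bdii_in = defaultdict(float)
    for rb, flow in rb_in.items():
        bdii_in[rb_bdii[rb]] += flow * lookups_per_job[rb]

    # All RB output flows end up in the single virtual CE.
    ce_in = sum(rb_in.values())
    return dict(rb_in), dict(bdii_in), ce_in
```

The returned flows can then be compared with the node limits; nothing is capped along the way.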

Overflow treatment
All overflows are monitored but not truncated.
When an overflow occurs, we are interested not in its exact value but in the fact that it occurred.
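
A minimal sketch of this policy: flows are left untouched and only the fact of an overflow is recorded. The function name and the dictionary layout are hypothetical, and treating a flow exactly equal to its limit as "not overflowing" is an assumption.

```python
def find_overflows(flows, limits):
    """Return the sorted names of nodes whose flow exceeds the limit.

    The flows themselves are not modified (no truncation), so downstream
    nodes still see the full, possibly excessive, flow.
    """
    return sorted(name for name, flow in flows.items() if flow > limits[name])
```

An empty result means the modeled grid handles the given constant average load.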

Automatic structure generation
Information published in the GOC database
• There is no direct access to the GOCDB, so the data is pulled from the SAM web services.
Information published in the service configuration files
• There is no direct way to determine which BDII is used by a particular RB, but gsiftp access to the RB filesystem allows us to read and parse the RB configuration.

Automatic structure generation: UI
No information about UIs is published, so we have to guess and/or estimate.
Each site is assumed to be running a UI with some default parameters.
This UI is connected to the site RBs, or to the country RBs, or to the region RBs, or to the “default” RB.

Automatic structure generation: RB
RB parameters are based on the measurements by the CMS collaboration (“Update on gLite WMS tests” by Andrea Sciabà).
All RBs are assumed to be able to submit jobs to all CEs.

Automatic structure generation: RB
The RB uses the BDII specified in its configuration if this data is available.
• The site BDII is used if the information is unavailable.
• One of the BDIIs in the same country is used if there is no site BDII.
• One of the BDIIs in the same region is used if there is no BDII in the country.
• The top-level “default” BDII is used if there are no BDIIs in the region.
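
The fallback chain above can be sketched as follows. The record layout (`site`, `country`, `region`, `configured_bdii` keys), the function name and the `"default-bdii"` placeholder are hypothetical. Where the slide says "one of the BDIIs", this sketch picks the alphabetically first one only to stay deterministic.

```python
def choose_bdii(rb, bdiis, default="default-bdii"):
    """Pick the BDII for an RB: configuration, then site, country, region, default."""
    # 1. Use the BDII from the RB configuration when it is known.
    if rb.get("configured_bdii"):
        return rb["configured_bdii"]
    # 2. Otherwise widen the search scope step by step.
    for scope in ("site", "country", "region"):
        candidates = sorted(b["name"] for b in bdiis if b[scope] == rb[scope])
        if candidates:
            return candidates[0]  # assumption: any BDII in the scope will do
    # 3. Fall back to the top-level BDII.
    return default
```

The same widening search could be reused to connect the assumed per-site UIs to RBs.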

Automatic structure generation
For the BDII performance we use the results from the talk “LCG/gLite BDII performance measurements”.
The CE performance is scaled according to the number of CPUs on each CE.
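
One possible reading of the CPU scaling, given only as a sketch: if each CPU runs one job at a time and all jobs take the same time (the model treats all jobs as equal), a CE with N CPUs can process N / T jobs per second for a job duration T. The slides do not give the actual scaling rule, so the formula, the function name and the parameters are assumptions.

```python
def ce_capacity(cpus, job_seconds):
    """Maximum job flow (jobs/s) of a CE, assuming one job per CPU at a time."""
    return cpus / job_seconds


def scaled_capacities(cpu_counts, job_seconds):
    """Capacity of every CE, given {ce_name: number_of_cpus}."""
    return {ce: ce_capacity(n, job_seconds) for ce, n in cpu_counts.items()}
```

Under this reading, shorter jobs raise the job flow needed to keep the same CPUs busy.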

Example: Russian part of LCG
[Figure: generated structure graph; legend: UI, RB, BDII.]

Conclusion
A simple flow-based model describing the job load distribution in the grid.
The structure of the modeled grid is automatically updated to match the real grid structure.
Parameters of the nodes are based on measured values.

Conclusion
Any node connections or parameters may be overridden, which allows experimenting with the grid.
Numbers for the current LCG are quite optimistic:
• RBs are capable of generating a job flow that uses all available resources on the CEs, but
• a clever connection between RBs and UIs is required, i.e. to avoid overflowing the RBs, the UI should become a registered service.

Future plans
Distinguish different kinds of jobs.
• A large number of short jobs puts a higher load on the grid than a smaller number of long jobs.
Account for the delays in the information system.
• The information about CE availability seen by the RB lags behind reality, causing job submission failures and resubmissions => additional “background” load on the RB.

Acknowledgements
The research was partially supported by
• INTAS-CERN Grant
• RFBR Grant