Grid infrastructure analysis with a simple flow model Andrey Demichev, Alexander Kryukov, Lev Shamardin, Grigory Shpiz Scobeltsyn Institute of Nuclear.

1 Grid infrastructure analysis with a simple flow model Andrey Demichev, Alexander Kryukov, Lev Shamardin, Grigory Shpiz Scobeltsyn Institute of Nuclear Physics, Moscow State University

2 Lev Shamardin 2 Why a grid simulator?
A simulator allows easy changes to the grid structure and behavior.
Grid behavior under stress conditions:
- Site failures
- Job execution failures
- Unexpected rise of the job load
- Bottleneck analysis
System structure optimization

3 Different approaches to job flow simulation
Individual job tracking
- Monte Carlo simulation of job submission. The model system simulates the stages of the job life from submission to completion or failure.
- Easier to implement.
- Examples: simgrid, gridsim, beosim

4 Different approaches to job flow simulation
Statistical models of job flows
- Simulation of job flows (i.e. “jobs/second”). The model system consists of boxes, each of which takes a number of job flows as input and produces a number of job flows as output.
- The output of such a model is exactly the numbers we are interested in.
- Examples: optorsim

5 Goals
Create a simple, realistic model of the grid.
The model should be capable of answering the questions:
- Will the grid handle the required constant average job load?
- Can it be reorganized to handle the load?

6 Structure of a workload management system
[Figure: diagram with the labels Job Registration, Job Submission & Status, Resource Status Information, Input queue, Planner, Output queue, "job", and "job status, output"]

7 Simple flow-based model
Simulation of an LCG-like grid
Four general node types:
- User Interface (UI), the source of the jobs in the system.
- Resource Broker (RB), which accepts the jobs, queries the information system, and dispatches the jobs to the Computing Elements.
- Computing Elements (CE), where the jobs are executed.
- BDII nodes, which form the information system.

8 User Interface
UIs may be connected to a number of RBs.
- Each UI generates a constant flow of job requests toward a connected RB.
[Figure: diagram with nodes UI, RB, UI, RB]

9 Resource Broker
RBs are connected to BDIIs and CEs, and have UIs connected to them.
An RB is characterized by
- the maximum input flow of job requests
- the number of information system lookups per job and the maximum flow of information system lookups
- the maximum job flow to the CEs

10 Information System (BDII)
A BDII is characterized by the maximum flow of requests it can handle.
[Figure: diagram with nodes UI, RB, UI, RB, BDII]

11 Computing Elements
A CE is characterized by the maximum flow of “jobs” it can process.
All jobs are assumed to be equal.
We are not interested in the exact location of the failing CE when the grid is overloaded, so we can combine all grid CEs into one virtual CE with the effective capacity.
We could actually do the same for the UIs.

12 Simple flow-based model
[Figure: diagram with nodes UI, RB, UI, RB, BDII, Virtual CE]

13 Flows
Think of plumbing.
The UIs generate the flow of incoming jobs to the RBs.
Each RB generates a flow of requests to the BDII and to the CE.
The flow of requests to the CE is checked against the maximum.

14 Overflow treatment
All overflows are monitored but not truncated.
If an overflow happens, we are interested not in the exact value of the overflow but in the fact of the overflow itself.
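
The flow propagation described on the slides above could be sketched roughly as follows. This is only an illustration, not the authors' implementation: the names `ResourceBroker` and `find_overflows` are hypothetical, and the even split of a UI's flow among its connected RBs is an assumption that the slides do not state. Overflows are recorded but the flows are passed on untruncated, as on the overflow slide.

```python
from dataclasses import dataclass


@dataclass
class ResourceBroker:
    """Hypothetical container for the RB parameters listed on the RB slide."""
    name: str
    max_input_flow: float   # job requests per second the RB can accept
    lookups_per_job: float  # information system lookups per job
    max_lookup_flow: float  # lookups per second the RB can issue
    max_output_flow: float  # jobs per second the RB can send to the CEs
    bdii: str               # name of the BDII this RB queries


def find_overflows(ui_flows, ui_links, rbs, bdii_capacity, ce_capacity):
    """Propagate constant flows UI -> RB -> (BDII, virtual CE).

    ui_flows: {ui name: jobs per second}
    ui_links: {ui name: [names of connected RBs]}
    bdii_capacity: {bdii name: max requests per second}
    ce_capacity: max jobs per second of the single virtual CE
    Returns a list of (node name, overflowing quantity) pairs.
    """
    rb_input = {rb.name: 0.0 for rb in rbs}
    for ui, flow in ui_flows.items():
        targets = ui_links[ui]
        for rb_name in targets:
            # Assumption: a UI splits its flow evenly among its RBs.
            rb_input[rb_name] += flow / len(targets)

    overflows = []
    bdii_load = {name: 0.0 for name in bdii_capacity}
    ce_load = 0.0
    for rb in rbs:
        jobs = rb_input[rb.name]
        lookups = jobs * rb.lookups_per_job
        if jobs > rb.max_input_flow:
            overflows.append((rb.name, "input"))
        if lookups > rb.max_lookup_flow:
            overflows.append((rb.name, "lookups"))
        if jobs > rb.max_output_flow:
            overflows.append((rb.name, "output"))
        # Flows are monitored but not truncated: pass them on in full.
        bdii_load[rb.bdii] += lookups
        ce_load += jobs

    for name, load in bdii_load.items():
        if load > bdii_capacity[name]:
            overflows.append((name, "requests"))
    if ce_load > ce_capacity:
        overflows.append(("virtual CE", "jobs"))
    return overflows
```

An empty result would mean that, under these assumptions, the modeled grid handles the given constant load; any entry names a node whose limit is exceeded, without saying by how much.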

15 Automatic structure generation
Information published in the GOC database
- There is no direct access to the GOCDB, so the data is pulled from the SAM web services.
Information published in the service configuration files
- There is no direct way to determine which BDII is used by a particular RB, but gsiftp access to the RB filesystem allows us to read and parse the RB config.

16 Automatic structure generation: UI
No information about UIs is published, so we have to guess and/or estimate.
Each site is assumed to be running a UI with some default parameters.
This UI is connected to the site RBs, or to the country RBs, or to the region RBs, or to the “default” RB.

17 Automatic structure generation: RB
RB parameters are based on the measurements by the CMS collaboration (“Update on gLite WMS tests” by Andrea Sciabà).
All RBs are assumed to be able to submit jobs to all CEs.

18 Automatic structure generation: RB
The RB uses the BDII specified in its configuration if this data is available.
- The site BDII is used if the information is unavailable.
- One of the BDIIs in the same country is used if there is no site BDII.
- One of the BDIIs in the same region is used if there is no BDII in the country.
- The top-level “default” BDII is used if there are no BDIIs in the region.
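
The fallback order on this slide can be read as a simple cascade. The sketch below is one possible reading, not the authors' code: the function name `choose_bdii`, the dictionary keys, and the choice of the first matching BDII for "one of the BDIIs" are all assumptions made for illustration.

```python
def choose_bdii(rb, bdiis, default_bdii):
    """Pick the BDII an RB is assumed to use.

    rb: dict with keys "configured_bdii" (name or None), "site",
        "country", "region" (hypothetical record layout).
    bdiis: list of dicts with keys "name", "site", "country", "region".
    default_bdii: name of the top-level "default" BDII.
    """
    # 1. Use the BDII from the RB configuration if it is known.
    if rb.get("configured_bdii"):
        return rb["configured_bdii"]
    # 2-4. Otherwise widen the search: site, then country, then region.
    for scope in ("site", "country", "region"):
        for bdii in bdiis:
            if bdii[scope] == rb[scope]:
                # Assumption: "one of the BDIIs" means the first match.
                return bdii["name"]
    # 5. Fall back to the top-level default BDII.
    return default_bdii
```

The same cascade would apply to connecting a site's UI to site, country, region, or default RBs, with the RB list in place of the BDII list.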

19 Automatic structure generation
For the BDII performance we use the results from the talk “LCG/gLite BDII performance measurements”.
The CE performance is scaled according to the number of CPUs on each CE.
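
As a small illustration of the CE scaling, and of combining all CEs into one virtual CE, the capacity could be computed as below. The slides only say that performance is scaled with the number of CPUs; the strictly linear scaling and the parameter `jobs_per_cpu_per_second` are assumptions, and both function names are hypothetical.

```python
def ce_capacity(cpus, jobs_per_cpu_per_second):
    """Job flow one CE can process, assuming linear scaling with CPU count."""
    return cpus * jobs_per_cpu_per_second


def virtual_ce_capacity(cpu_counts, jobs_per_cpu_per_second):
    """Capacity of the single virtual CE: the sum over all real CEs."""
    return sum(ce_capacity(n, jobs_per_cpu_per_second) for n in cpu_counts)
```

Under these assumptions the virtual CE depends only on the total CPU count, which is why the location of an individual overloaded CE drops out of the model.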

20 Example: Russian part of LCG
[Figure: diagram with UI, RB, and BDII nodes]

21 Conclusion
A simple flow-based model describes the job load distribution in the grid.
The structure of the modeled grid is automatically updated to match the real grid structure.
Parameters of the nodes are based on measured values.

22 Conclusion
Any node connections or parameters may be overridden, which allows experimenting with the grid.
Numbers for the current LCG are quite optimistic:
- RBs are capable of generating a job flow that uses all available resources on the CEs, but
- a clever connection between RBs and UIs is required, i.e. if we do not want to overflow the RBs, the UI should become a registered service.

23 Future plans
Distinguish different kinds of jobs.
- A large number of short jobs puts a higher load on the grid than a smaller number of long jobs.
Account for the delays in the information system.
- The information about CE availability on the RB lags behind reality, causing job submission failures and resubmissions => additional “background” load on the RB.
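
Both points can be made concrete with back-of-the-envelope arithmetic. The sketch below is not part of the presented model: it assumes that all CPUs are kept busy, and that each submission attempt fails independently with a fixed probability and is resubmitted until it succeeds. Both function names are hypothetical.

```python
def sustained_job_flow(cpus, mean_job_seconds):
    """Jobs per second needed to keep `cpus` CPUs busy (assumption: full load)."""
    return cpus / mean_job_seconds


def rb_flow_with_resubmissions(job_flow, failure_probability):
    """Submission attempts per second seen by the RB.

    Assumes independent failures with a fixed probability and resubmission
    until success, so the mean number of attempts per job is 1 / (1 - p).
    """
    if not 0 <= failure_probability < 1:
        raise ValueError("failure_probability must be in [0, 1)")
    return job_flow / (1 - failure_probability)
```

For the same number of CPUs, jobs that are 100 times shorter require a job flow that is 100 times higher through the RBs and the information system, and a nonzero failure probability raises the flow at the RB further.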

24 Acknowledgements
The research was partially supported by
- INTAS-CERN Grant 2005-7509
- RFBR Grant 06-07-89199

