Jean-Sébastien Gay
LIP ENS Lyon, Université Claude Bernard Lyon 1
INRIA Rhône-Alpes, GRAAL Research Team
Joint work with the DIET team
DIET: Distributed Interactive Engineering Toolbox
DIET Batch and Simbatch: a quick glance
RPC and Grid Computing: Grid RPC
[Figure: Grid RPC request flow. Labels: Client, AGENT(s), servers S1, S2, S3, S4; Request Op(C, A, B); agent reply "S2 !"; data A, B, C; Answer (C).]
Outline
1. Introduction
2. DIET-Batch
3. Simbatch
4. Conclusion and perspectives
DIET Architecture
[Figure: DIET hierarchy. Labels: Client, Master Agent (MA), Local Agent (LA), Server front end; JXTA; FAST library (application modeling, system availabilities); LDAP, NWS.]
Parallel and batch submissions - 1/2
[Figure labels: MA, LA, Frontal, SeD_seq, SeD_parallel, SeD_batch, NFS, GLUE; batch systems LSF, PBS, Loadleveler, SGE, OAR.]
- Parallel & sequential jobs → transparent for the user
- Submit a parallel job → system dependent
  - NFS: copy the code?
  - MPI: LAM, MPICH?
- Batch system dependent
  - Numerous batch systems (homogenization?)
  - Batch schedulers behaviour (queues, scripts, etc.)
  - Information about the internal scheduling process
  - Monitoring & performance prediction
Parallel and batch submissions - 2/2
Two APIs
- Client side
  - Request a sequential or a parallel resolution, or let DIET choose the best
- Server side
  - Script with generic mnemonics: DIET_NAME_FRONTALE, DIET_NB_NODES, DIET_BATCH_NODESFILE
  - A program that must end with a call to diet_submit_call()
Experiments
Performance prediction with batch system
During the submission stage
- Need to know when the task will begin/end
- Need to decide how many processors will be used
- Need performance prediction!
Three means
- Use a probabilistic tool
- Ask the batch system (only available for MAUI and OAR 2.0)
- Use a simulator
Batch scheduler overview
- Portable Batch System (PBS): First Come First Served (FCFS)
- OAR (v. 1.6): Conservative BackFilling (CBF)
- Torque + Maui
  - Only Torque: FCFS
  - Maui: 3 scheduling policies: BESTFIT, FIRSTFIT (CBF), GREEDY
- Sun Grid Engine (SGE): FCFS
- Loadleveler
  - 3 scheduling policies: FCFS, CBF, GANG
  - Possibility to plug external schedulers: EASY, Maui (should soon become the standard scheduler)
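To make the difference between FCFS and conservative backfilling concrete, here is a minimal sketch in C. It is not code from any of the schedulers listed above: the `job_t` record, the function names and the node-counting model (nodes are interchangeable, jobs hold their nodes for their whole wall time) are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical job record for this sketch (not a PBS, OAR or Maui type). */
typedef struct {
    double submit;  /* submission time (s) */
    double wall;    /* requested wall time (s) */
    int nodes;      /* requested number of nodes */
    double start;   /* filled in by schedule() */
} job_t;

/* Nodes held at instant t by the first n (already placed) jobs. */
static int used_at(const job_t *j, size_t n, double t)
{
    int used = 0;
    for (size_t k = 0; k < n; k++)
        if (j[k].start <= t && t < j[k].start + j[k].wall)
            used += j[k].nodes;
    return used;
}

/* 1 if `nodes` nodes stay free during [t, t + len). Usage only rises at
   job starts, so checking t and each start inside the window suffices. */
static int fits(const job_t *j, size_t n, int total, int nodes,
                double t, double len)
{
    if (used_at(j, n, t) + nodes > total)
        return 0;
    for (size_t k = 0; k < n; k++)
        if (j[k].start > t && j[k].start < t + len &&
            used_at(j, n, j[k].start) + nodes > total)
            return 0;
    return 1;
}

/* Earliest feasible start >= t0: either t0 or the end of a placed job. */
static double earliest(const job_t *j, size_t n, int total, int nodes,
                       double t0, double len)
{
    double best = -1.0;
    if (fits(j, n, total, nodes, t0, len))
        return t0;
    for (size_t k = 0; k < n; k++) {
        double end = j[k].start + j[k].wall;
        if (end > t0 && fits(j, n, total, nodes, end, len) &&
            (best < 0.0 || end < best))
            best = end;
    }
    return best;
}

/* Place jobs in queue order.
   backfill == 0: FCFS, a job never starts before the job ahead of it.
   backfill != 0: conservative backfilling, a job may slip into an earlier
   hole; already placed jobs are never moved, so none of them is delayed. */
void schedule(job_t *jobs, size_t n, int total_nodes, int backfill)
{
    for (size_t i = 0; i < n; i++) {
        double t0 = jobs[i].submit;
        assert(jobs[i].nodes >= 1 && jobs[i].nodes <= total_nodes);
        assert(jobs[i].wall > 0.0);
        if (!backfill && i > 0 && jobs[i - 1].start > t0)
            t0 = jobs[i - 1].start;
        jobs[i].start = earliest(jobs, i, total_nodes, jobs[i].nodes,
                                 t0, jobs[i].wall);
    }
}
```

On a 5-node cluster with three jobs submitted at time 0 (3 nodes for 100 s, 4 nodes for 100 s, 2 nodes for 50 s), the FCFS variant starts the third job only after the second one has finished, while the backfilling variant runs it immediately next to the first job.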
Grid simulator overview
- Data replication
  - ChicSim: I. Foster. PARallel Simulation Environment for Complex Systems
  - OptorSim: W. H. Bell, D. G. Cameron, R. Carvajal-Schiaffino. Java
- Grid economy
  - GridSim: R. Buyya (Nimrod/G). Java. Quite similar to Simgrid
- Non-specialized toolkit
  - Simgrid: H. Casanova, A. Legrand and M. Quinson. C
… and their drawbacks
- Minimal support for batch schedulers
- Sometimes lack of functionalities to create them
- Often difficult to reuse
- Example: OptorSim
  - No parallel tasks available
  - Backfilling impossible
  - Lack of realism
Simbatch in a nutshell
Goals
- Cluster simulation for enhancing realism
- Prediction tool for DIET
API for clients
- Description of the platform in XML files
- Use of the API in the deployment.xml file
  - Example 1: Creating a batch process on the host « Frontal »
  - Example 2: Creating a resource
- Each batch must be described in simbatch.xml
- A specific load can be simulated for each batch
API for developers
- Algorithms are plug-ins
- Reusable functions
  - Find the first matching slot in a Gantt chart:
    slot_t * find_first_slot(cluster_t c, int nb_nodes, double start_time, double duration);
  - Empty queues and reschedule:
    void generic_reschedule(cluster_t cluster, void (*schedule)(cluster_t cluster, m_task_t task));
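The slides give only the signature of find_first_slot, not its body. As an illustration of what "find the first matching slot in a Gantt chart" can mean, here is a minimal sketch in C. The types `busy_t`, `gantt_t` and `found_slot_t` and the function `first_slot` are hypothetical stand-ins, not Simbatch's `slot_t` or `cluster_t`, and the search strategy (try the requested start time, then every instant at which a booked interval ends) is an assumption.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 64

/* Hypothetical, simplified types for this sketch; Simbatch's real
   slot_t and cluster_t are not shown in the slides. */
typedef struct {
    int node;      /* node the booked interval belongs to */
    double start;  /* busy from start (inclusive) */
    double end;    /* to end (exclusive) */
} busy_t;

typedef struct {
    int nb_nodes;        /* nodes in the cluster */
    const busy_t *busy;  /* booked intervals, in any order */
    size_t nb_busy;
} gantt_t;

typedef struct {
    double start;          /* slot start time, -1.0 if none was found */
    int nodes[MAX_NODES];  /* indices of the selected nodes */
} found_slot_t;

/* 1 if `node` has no booked interval overlapping [t, t + duration). */
static int node_free(const gantt_t *g, int node, double t, double duration)
{
    for (size_t i = 0; i < g->nb_busy; i++)
        if (g->busy[i].node == node &&
            g->busy[i].start < t + duration && t < g->busy[i].end)
            return 0;
    return 1;
}

/* Try to pick nb_nodes free nodes at time t; fill *s on success. */
static int try_slot(const gantt_t *g, int nb_nodes, double t,
                    double duration, found_slot_t *s)
{
    int found = 0;
    for (int n = 0; n < g->nb_nodes && found < nb_nodes; n++)
        if (node_free(g, n, t, duration))
            s->nodes[found++] = n;
    if (found < nb_nodes)
        return 0;
    s->start = t;
    return 1;
}

/* Earliest slot at or after start_time. A slot can only open up when a
   booked interval ends, so start_time and the interval ends are the only
   candidate start times that need to be examined. */
found_slot_t first_slot(const gantt_t *g, int nb_nodes,
                        double start_time, double duration)
{
    found_slot_t best = { .start = -1.0 };
    assert(g->nb_nodes <= MAX_NODES && duration > 0.0);
    if (nb_nodes < 1 || nb_nodes > g->nb_nodes)
        return best;
    if (try_slot(g, nb_nodes, start_time, duration, &best))
        return best;
    for (size_t i = 0; i < g->nb_busy; i++) {
        found_slot_t cand = { .start = -1.0 };
        double t = g->busy[i].end;
        if (t > start_time && (best.start < 0.0 || t < best.start) &&
            try_slot(g, nb_nodes, t, duration, &cand))
            best = cand;
    }
    return best;
}
```

A scheduling plug-in built on such a helper would call it once per queued task and then book the returned nodes, which is also the basic step of a backfilling policy.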
Experiment description
2 types of experiments
- Validation by simulation: parameter variation (topology, scheduling algorithm…)
- Comparison between simulated platform
Task generation
- Inter-arrival time: Poisson law, µ = 300 s
- Number of resources: U(1, 5)
- Run time: U(600, 1800)
- Wall time: run time × U(1.1, 1.3)
Experiment platform
- 5-node cluster
- Star topology
- OAR v. 1.6
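The task generation parameters above can be turned into a small workload generator. The following C sketch is an illustration only, not the generator used for the experiments. Its assumptions: the number of resources is an integer drawn uniformly from 1 to 5, run times are in seconds, and the Poisson arrivals are approximated in 1-second steps (a task arrives in each step with probability 1/300, which gives geometric gaps with a mean of 300 s). The names `gen_task_t` and `generate_tasks` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical task record for this sketch (not Simbatch's m_task_t). */
typedef struct {
    double submit_time;  /* s */
    int nb_nodes;        /* requested resources */
    double run_time;     /* s */
    double wall_time;    /* s, requested by the user */
} gen_task_t;

/* Uniform draw in the open interval (0, 1); rand() is used for brevity. */
static double uniform01(void)
{
    return (rand() + 1.0) / (RAND_MAX + 2.0);
}

/* Uniform draw in (lo, hi). */
static double uniform(double lo, double hi)
{
    return lo + (hi - lo) * uniform01();
}

/* Discrete-time stand-in for Poisson arrivals: in each 1 s step a task
   arrives with probability 1/mean, so gaps are geometric with that mean. */
static double arrival_gap(double mean)
{
    double gap = 0.0;
    do {
        gap += 1.0;
    } while (uniform01() >= 1.0 / mean);
    return gap;
}

/* Fill tasks[0..n-1] with the distributions listed on the slide. */
void generate_tasks(gen_task_t *tasks, size_t n, unsigned seed)
{
    double now = 0.0;
    assert(tasks != NULL || n == 0);
    srand(seed);
    for (size_t i = 0; i < n; i++) {
        now += arrival_gap(300.0);
        tasks[i].submit_time = now;
        tasks[i].nb_nodes = 1 + (int)(5.0 * uniform01());    /* 1..5 */
        tasks[i].run_time = uniform(600.0, 1800.0);
        tasks[i].wall_time = tasks[i].run_time * uniform(1.1, 1.3);
    }
}
```

Fixing the seed makes a run reproducible, which matters when the same load has to be replayed on the simulated and on the real platform.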
Validation
Simulation precision
- Number of tasks: 100
- Makespan: 23h
- Error rate on the flow metrics around 1%
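The slides do not spell out how the error rate on the flow metric is computed. One plausible reading, sketched below in C, takes the flow of a task as its completion time minus its submission time and reports the relative difference between the simulated and the measured total flow. Both the definition and the function names are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Sum over all tasks of (completion time - submission time). */
double total_flow(const double *submit, const double *completion, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        assert(completion[i] >= submit[i]);
        sum += completion[i] - submit[i];
    }
    return sum;
}

/* Relative error of the simulated value against the measured one, in %. */
double error_rate_pct(double simulated, double measured)
{
    double diff = simulated - measured;
    assert(measured > 0.0);
    if (diff < 0.0)
        diff = -diff;
    return 100.0 * diff / measured;
}
```

For instance, a measured total flow of 300 s against a simulated total flow of 303 s gives an error rate of 1%.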
Conclusion and perspectives
DIET-Batch
- DIET is now able to handle batch schedulers
- 3 SeD types: sequential, batch, parallel
- Good performance improvements
Simbatch
- Standalone simulations show good results
- Configuration file available to simulate Lyon’s site
- Excellent tool to replay load
Next steps
- Integrate Simbatch in DIET-Batch
Questions?