1
DAS-3/Grid’5000 meeting: 4th December 2006
The KOALA Grid Scheduler over DAS-3 and Grid’5000
Processor and data co-allocation in grids
Dick Epema, Alexandru Iosup, Mathieu Jan, Hashim Mohamed, Ozan Sonmez
Parallel and Distributed Systems Group
2
Contents
  Our context: grid scheduling and co-allocation
  The design of the KOALA co-allocating scheduler
  Some performance results
  KOALA over Grid’5000 and DAS-3
  Conclusion & future work
3
Grid scheduling environment
  System
    Grid schedulers usually do not own resources themselves
    Grid schedulers have to interface to different local schedulers
      Sun Grid Engine (SGE 6.0) on DAS-2/DAS-3
      OAR on Grid’5000
  Workload
    Various kinds of applications
    Various requirements
4
Co-Allocation (1)
  In grids, jobs may use multiple types of resources in multiple sites: co-allocation or multi-site operation
  Without co-allocation, a grid is just a big load-sharing device
    Find a suitable candidate system for running a job
    If the candidate is not suitable anymore, migrate
  [Figure: multiple separate jobs submitted to the grid]
5
Co-Allocation (2)
  With co-allocation
    Use available resources (e.g., processors)
    Access and/or process geographically spread data
    Application characteristics (e.g., simulation in one location, visualization in another)
  Problems
    More difficult resource-discovery process
    Need to coordinate allocations of local schedulers
    Slowdown due to wide-area communications
  [Figure: a single global job spread over the grid]
6
A model for co-allocation: schedulers
  [Figure: a global queue with the grid scheduler (KOALA) in front of several clusters, each with a local queue and a local scheduler (LS). Labels: local jobs, global job, non-local job, load sharing, co-allocation]
7
A model for co-allocation: job types
  Fixed job: job component placement fixed
  Non-fixed job: scheduler decides on component placement
  Flexible job: scheduler decides on split up and placement
  [Figure: the three job types drawn as sets of job components with the same total job size]
8
A model for co-allocation: policies
  Placement policies dictate where the components of a job go
  Placement policies for non-fixed jobs
    Load-aware: Worst Fit (WF) (balance load in clusters)
    Input-file-location-aware: Close-to-Files (CF) (reduce file-transfer times)
    Communication-aware: Cluster Minimization (CM) (reduce number of wide-area messages)
  Placement policies for flexible jobs
    Communication- and queue time-aware: Flexible Cluster Minimization (FCM) (CM + reduce queue wait time)
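As an illustration of the load-aware policy above, here is a minimal Python sketch of Worst Fit placement for a non-fixed job. It assumes each component goes to the cluster that currently has the most idle processors. The function name, the largest-first ordering of components, and the alphabetical tie-break are assumptions made for this sketch, not details taken from KOALA.

```python
from typing import Dict, List, Optional


def worst_fit_placement(
    component_sizes: List[int], idle: Dict[str, int]
) -> Optional[List[str]]:
    """Return one cluster name per component, or None if some component does not fit.

    `idle` maps a (hypothetical) cluster name to its number of idle processors.
    """
    if not idle:
        return None
    remaining = dict(idle)  # work on a copy so the caller's view is untouched
    placement: List[Optional[str]] = [None] * len(component_sizes)

    # Assumption: handle larger components first (stable for equal sizes).
    order = sorted(
        range(len(component_sizes)),
        key=lambda i: component_sizes[i],
        reverse=True,
    )
    for i in order:
        # Worst Fit: choose the cluster with the most idle processors right now.
        # Iterating over sorted names makes ties resolve alphabetically.
        cluster = max(sorted(remaining), key=lambda name: remaining[name])
        if remaining[cluster] < component_sizes[i]:
            return None  # even the emptiest cluster cannot host this component
        remaining[cluster] -= component_sizes[i]
        placement[i] = cluster
    return [name for name in placement if name is not None]
```

With two 8-processor components and clusters holding 20 and 16 idle processors, the components end up on different clusters, which is the load-balancing effect WF aims for.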
9
KOALA: a Co-Allocating grid scheduler
  Main goals
    1. Processor co-allocation: non-fixed/flexible jobs
    2. Data co-allocation: move large input files to the locations where the job components will run prior to execution
    3. Load sharing: in the absence of co-allocation
  KOALA
    Runs alongside local schedulers
    Scheduler independent from Globus
    Uses Globus components (e.g., RSL and GridFTP)
    For launching jobs, uses its own mechanisms or Globus DUROC
    Deployed on DAS-2 in September 2005
10
KOALA: the architecture
  PIP/NIP: information services
  RLS: replica location service
  CO: co-allocator
  PC: processor claimer
  RM: run monitor
  RL: runners listener
  DM: data manager
  Ri: runners
  [Figure: architecture diagram; the local scheduler is labelled "SGE ?"]
11
KOALA: the runners
  The KOALA runners are adaptation modules for different application types
    Set up communication
    Launch applications
  Current runners
    KRunner: default KOALA runner that co-allocates processors and that’s it
    DRunner: DUROC runner for co-allocated MPI applications
    IRunner: runner for applications using the Ibis Java library for grid applications
12
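KOALA: job flow with four phases
  Phase 1: job placement
  Phase 2: file transfer
  Phase 3: claim processors
  Phase 4: launch job
  [Figure: a new submission goes to "place job"; on failure (-) it enters the placement queue and is retried, on success (+) it moves on to "claim processors"; a failed claim (-) enters the claiming queue and is retried, a successful claim (+) hands the job to the runners]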
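The four phases and the two retry queues can be sketched as a small driver function. This is only an illustration of the flow on the slide: the callables `try_place`, `transfer_files`, `try_claim`, and `launch` are hypothetical stand-ins for the real components, and the `max_tries` limit is an assumption (the slides do not say when, or whether, retries stop).

```python
from typing import Callable, List


def run_job(
    try_place: Callable[[], bool],
    transfer_files: Callable[[], None],
    try_claim: Callable[[], bool],
    launch: Callable[[], None],
    max_tries: int = 3,  # hypothetical retry limit, not from the slides
) -> List[str]:
    """Walk one job through the four phases and return a log of what happened."""
    log: List[str] = []

    # Phase 1: job placement. A failed try parks the job in the placement queue.
    for attempt in range(1, max_tries + 1):
        if try_place():
            log.append(f"placed (try {attempt})")
            break
        log.append(f"placement queue (try {attempt})")
    else:
        log.append("placement failed")
        return log

    # Phase 2: file transfer to the sites chosen in phase 1.
    transfer_files()
    log.append("files transferred")

    # Phase 3: claim processors. A failed try parks the job in the claiming queue.
    for attempt in range(1, max_tries + 1):
        if try_claim():
            log.append(f"claimed (try {attempt})")
            break
        log.append(f"claiming queue (try {attempt})")
    else:
        log.append("claiming failed")
        return log

    # Phase 4: launch the job through a runner.
    launch()
    log.append("launched")
    return log
```

Passing callables keeps the sketch independent of any particular local scheduler; in a test they can simply replay a scripted sequence of successes and failures.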
13
KOALA: job time line
  If advance reservations are not supported, don’t claim processors immediately after placing, but wait until close to the estimated job start time
  So processors are left idle (processor gained time)
  Placing and claiming may have to be retried multiple times
  [Figure: time line running from job submission through job placement and claiming time to the estimated start time, with the estimated file-transfer time, processor gained time, and processor wasted time marked]
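The time line can be made concrete with a little arithmetic. The sketch below is one reading of the figure labels, not KOALA's actual rule: it assumes the estimated start time is the placement time plus the estimated file-transfer time, that gained time runs from placement to claiming, and that wasted time runs from claiming to the estimated start. The `claim_lead` parameter (how long before the estimated start the processors are claimed) is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Timeline:
    estimated_start: float  # when the job is expected to start
    claiming_time: float    # when processors are claimed
    gained: float           # placement -> claiming: processors not yet held
    wasted: float           # claiming -> estimated start: processors held, job not running


def job_timeline(
    placement_time: float, est_transfer_time: float, claim_lead: float
) -> Timeline:
    """Derive the time-line quantities under the assumptions stated above."""
    estimated_start = placement_time + est_transfer_time
    # Claim `claim_lead` seconds before the estimated start, but never before placement.
    claiming_time = max(placement_time, estimated_start - claim_lead)
    return Timeline(
        estimated_start=estimated_start,
        claiming_time=claiming_time,
        gained=claiming_time - placement_time,
        wasted=estimated_start - claiming_time,
    )
```

For a job placed at t = 100 s with an estimated transfer of 60 s and a 10 s lead, this gives a claim at t = 150 s, so 50 s gained and 10 s wasted; a smaller lead wastes less but leaves less margin if claiming has to be retried.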
14
KOALA: performance results (1)
  With replication (3 copies of input files, 2, 4, or 6 GB)
  Offer a 30% co-allocation load during two hours
  Try to keep the background load between 30% and 40%
  [Figures: utilization (%) over time (s) for the KOALA workload and the background load; processor gained time and processor wasted time; number of CF and WF placement tries and claiming tries by job size (number of components x component size): 1x8, 2x8, 4x8, 1x16, 2x16, 4x16]
  See, e.g.: H.H. Mohamed and D.H.J. Epema, “An Evaluation of the Close-to-Files Processor and Data Co-Allocation Policy in Multiclusters,” IEEE Cluster 2004.
15
KOALA: performance results (2)
  Communication-intensive applications
  Workload 1: low load
  Workload 2: high load
  Background load: 15-20%
  [Figures: plots for workload 1 and workload 2 with labels: average wait time (s), average execution time (s), average middleware overhead (s), number of job components]
  See: O. Sonmez, H.H. Mohamed, D.H.J. Epema, “Communication-Aware Job-Placement Policies for the KOALA Grid Scheduler,” 2nd IEEE Int’l Conf. on e-Science and Grid Computing, Dec. 2006.
16
Grid’5000 and DAS-3 interconnection: scheduling issues
  Preserve each system’s usage
    Characterize jobs (especially for Grid’5000)
    Usage policies
  Allow simultaneous use of both testbeds
    One more level of hierarchy in latencies
    Co-allocation of jobs
    Various types of applications: PSAs, GridRPC, etc.
17
KOALA over Grid’5000 and DAS-3
  Goal: testing KOALA policies …
    … in a heterogeneous environment
    … with different workloads
    … with OAR reservation capabilities
  Grid’5000 from DAS-3
    “Virtual” clusters inside KOALA
    Used whenever DAS-3 is overloaded
  How: deployment of DAS-3 environment on Grid’5000
18
KOALA over Grid’5000 and DAS-3: how
  [Figure: DAS-3 environments deployed on the Grid’5000 sites Lyon, Orsay, and Rennes, together with a file-server and OAR, …]
19
Using DAS-3 from Grid’5000
  Authorize Grid’5000 users to submit jobs …
    … via SGE directly, OARGrid or KOALA
    Usage policies?
  Deployment of environments on DAS-3 as in Grid’5000?
    When: during nights and weekends?
    Deployment at grid level
      KOALA submits kadeploy jobs
20
Current progress
  Collected traces of Grid’5000 [done]
    OAR tables of 15 clusters
    OARGrid tables
    LDAP database
    Analysis in progress
  KOALA over Grid’5000 [in progress]
    KOALA communicates with OAR for its information service [done]
    GRAM interface to OAR
    “DAS-2” image on Grid’5000: Globus, KOALA, OAR
21
Conclusion
  KOALA is a grid resource management system
  Supports processor and data co-allocation
  Several job placement policies (WF, CF, CM, FCM)
Future work
  Use bandwidth and latency in job placements (lightpaths?)
  Deal with more application types (PSAs, …)
  A decentralized P2P KOALA
22
Information
  Publications: see the PDS publication database at www.pds.ewi.tudelft.nl
  Web site KOALA: www.st.ewi.tudelft.nl/koala
23
Slowdown due to wide-area communications
  Co-allocated applications are less efficient due to the relatively slow wide-area communications
  Extension factor of a job = (service time on multicluster) / (service time on single cluster)   (>1 usually)
  Co-allocation is beneficial when the extension factor ≤ 1.20
  Unlimited co-allocation is no good
  Communications libraries may be optimized for wide-area communication
  See, e.g.: A.I.D. Bucur and D.H.J. Epema, “Trace-Based Simulations of Processor Co-Allocation Policies in Multiclusters,” HPDC 2003.
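The extension factor and the 1.20 threshold quoted above translate directly into code. The sketch below is only a worked restatement of the slide's definition; the function and parameter names are made up for illustration, and the service times would have to come from measurements or simulation.

```python
def extension_factor(multicluster_time: float, single_cluster_time: float) -> float:
    """Service time on a multicluster divided by service time on a single cluster."""
    if single_cluster_time <= 0:
        raise ValueError("single-cluster service time must be positive")
    return multicluster_time / single_cluster_time


def co_allocation_beneficial(
    multicluster_time: float,
    single_cluster_time: float,
    threshold: float = 1.20,  # the bound quoted on the slide
) -> bool:
    """True when the extension factor does not exceed the threshold."""
    return extension_factor(multicluster_time, single_cluster_time) <= threshold
```

For instance, a job that takes 110 s when co-allocated and 100 s on one cluster has an extension factor of 1.1 and passes the test, while one that takes 150 s (factor 1.5) does not.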