Faucets: Efficient Utilization of Multiple Clusters


1 Faucets: Efficient Utilization of Multiple Clusters
Laxmikant Kale, Jayant DeSouza, Sameer Kumar, Sindhura Bandhakavi, Mani Potnuru
Parallel Programming Laboratory
Department of Computer Science
University of Illinois at Urbana-Champaign
Speaker notes: change title to match web page; put up software, maybe CVS; fix components diagram, separate diagram for AQS; simpler scripts to configure, run
2/23/2019 Charm++ Workshop 2002

2 Outline
Motivation, and
  the Faucets solution
  the Adaptive Jobs solution
Faucets
  Job Submission
  Job Monitoring
Adaptive Jobs, and Performance Results
Adaptive Queuing System, and Simulations and Performance Results
Future Work

3 Motivation
Demand for high-end compute power, but:
  Dispersed
    Which machine will give me back my results quickest?
  Hard to use
    Use ssh to log in, ftp files, decide on a queue, create a script, submit
    Because of the hassle, users just submit the same script to the same machine even if a better alternative exists
    Monitor a running job
  Low operational efficiency of existing computing systems
Speaker note: first this, then outline

4 Solution 1: Faucets
Motivation #1: dispersed, hard to use
Central source of compute power
  Users
  Providers of compute resources
  User account not needed on every resource
Match users and providers
  Market economy?
  QoS requirements, contracts and bidding systems
GUI or web-based interface
  Submission
  Monitoring

5 Faucets
[Diagram: Job Submission and Job Monitor connect through Faucets to several clusters. Arrow labels: Job Specs, Bids, File Upload, Job Id.]
Parallel systems need to maximize their efficiency!
Efficiency metrics are profit and utilization

6 Motivation #2: Inefficient Utilization
[Diagram: 16-processor system. Job A (10 processors) is allocated; Job B (8 processors) conflicts and is queued.]
Current job schedulers can have low system utilization!
Speaker notes: first why inefficient, then how adaptive jobs solve it; also mention external “fragmentation”
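
The arithmetic behind this slide can be sketched in a few lines of Python. This is only an illustration, not the scheduler discussed in the talk: the function name `fifo_rigid` and the strict first-come-first-served rule are assumptions made for the example. Only the job sizes (10 and 8 processors) and the system size (16 processors) come from the slide.

```python
def fifo_rigid(total_pe, requests):
    """Admit rigid (fixed-size) jobs in arrival order.

    Returns (running job names, queued job names, utilization in percent).
    """
    free = total_pe
    running, queued = [], []
    for name, pe in requests:
        # Assumption: strict FIFO, so once one job waits, later jobs wait too.
        if not queued and pe <= free:
            running.append(name)
            free -= pe
        else:
            queued.append(name)
    utilization = 100.0 * (total_pe - free) / total_pe
    return running, queued, utilization


# Job A needs 10 processors, Job B needs 8, on a 16-processor system.
running, queued, utilization = fifo_rigid(16, [("A", 10), ("B", 8)])
print(running, queued, utilization)  # ['A'] ['B'] 62.5
```

With rigid jobs, 6 of the 16 processors stay idle while Job B waits, because 10 + 8 exceeds 16.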

7 Motivation #2, contd.
Chun & Culler paper
  Compares FirstPrice (market-based scheduling) with PrioFIFO
  Up to 2.5x improvement as degree of job parallelism increases
  Both have “head-of-line” blocking
    Adaptive jobs fix this
Brent Chun and David Culler, User-centric Performance Analysis of Market-based Cluster Batch Schedulers, CCGrid 2002.

8 Solution 2: Adaptive Jobs
Jobs that can shrink or expand the number of processors they are running on at runtime
Improve system utilization and response time
Properties
  Min_pe, related to the memory requirements of the job
  Max_pe, related to speedup

9 Adaptive Job Scheduler
Scheduler can take advantage of this adaptivity
  Improve system utilization and response time
Scheduling decisions
  Shrink existing jobs when a new job arrives
  Expand jobs to use all processors when a job finishes
Processor map sent to the job
  Bit vector specifying which processors a job is allowed to use (use 3 4 and 5!)
Handles regular (non-adaptive) jobs
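
A processor map of the kind described here could be represented as follows. This is a minimal sketch assuming one bit per processor stored in a Python integer. The helper names `make_processor_map` and `allowed_processors` are hypothetical, and the encoding Charm++ actually uses is not given in the slides.

```python
def make_processor_map(allowed, total_pe):
    """Build a bit vector in which bit i is set if processor i may be used."""
    bits = 0
    for pe in allowed:
        if not 0 <= pe < total_pe:
            raise ValueError(f"processor {pe} is outside 0..{total_pe - 1}")
        bits |= 1 << pe
    return bits


def allowed_processors(bits, total_pe):
    """Decode a bit vector back into a sorted list of processor indices."""
    return [pe for pe in range(total_pe) if (bits >> pe) & 1]


# "use 3 4 and 5!" on a 16-processor system
pmap = make_processor_map([3, 4, 5], 16)
print(format(pmap, "016b"))          # 0000000000111000
print(allowed_processors(pmap, 16))  # [3, 4, 5]
```

A shrink or expand request then amounts to sending the job a new map; the job's load balancer moves work off any processor whose bit has been cleared.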

10 Two Adaptive Jobs
[Diagram: 16-processor system running Job A (Min_pe = 1, Max_pe = 10) and Job B (Min_pe = 8, Max_pe = 16). Steps shown: Allocate A; Allocate B, Shrink A; B Finishes; A Expands.]

11 Outline
Motivation, and
  the Faucets solution
  the Adaptive Jobs solution
Faucets
  Job Submission
  Job Monitoring
Adaptive Jobs, and Performance Results
Adaptive Queuing System, and Simulations and Performance Results
Future Work

12 Faucets: Job Submission

13 Submission Mechanism
QoS requirements, contract, bidding
  Type, number of processors
  Memory
  Estimated compute time, or table: processors vs. compute time
  Deadline
  Price
Authentication, security
Accounting
Cluster bartering
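
The QoS fields listed on this slide could be carried in a small record like the one below. This is a sketch only: the class name `QoSContract`, the field names, and the units (megabytes, seconds) are assumptions for illustration and do not reflect the actual Faucets job-spec format.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class QoSContract:
    """Hypothetical job specification carrying the slide's QoS fields."""
    job_type: str
    processors: int
    memory_mb: int                      # assumed unit: megabytes
    deadline_s: float                   # assumed unit: seconds
    price: float
    est_time_s: Optional[float] = None  # single compute-time estimate, or
    time_table: Dict[int, float] = field(default_factory=dict)  # PEs -> seconds

    def estimated_time(self, pe: int) -> Optional[float]:
        """Prefer the processors-vs-time table, else the single estimate."""
        return self.time_table.get(pe, self.est_time_s)

    def meets_deadline(self, pe: int) -> bool:
        """True if the job is expected to finish in time on pe processors."""
        t = self.estimated_time(pe)
        return t is not None and t <= self.deadline_s


spec = QoSContract(job_type="adaptive", processors=8, memory_mb=512,
                   deadline_s=3600.0, price=10.0,
                   time_table={4: 5000.0, 8: 2800.0})
print(spec.meets_deadline(4), spec.meets_deadline(8))  # False True
```

A cluster deciding whether to bid could use a check like `meets_deadline` to see how many processors it would have to commit.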

14 Faucets
[Diagram: Job Submission and Job Monitor connect through Faucets to several clusters. Arrow labels: Job Specs, Bids, File Upload, Job Id.]
Parallel systems need to maximize their efficiency!
Efficiency metrics are profit and utilization

15 Job Monitoring: Appspector

16 Using Appspector
Charm client-server (CCS) interface
  Default server
  Default Java client
User can write
  Program code to send relevant data
  Java class to display data

17 Clusters Status View

18 Adaptive Jobs

19 Adaptive Job Framework
Applications written in MPI or Charm++
Scheduler controls the processor map for each job
Processor map is used by the job’s load balancer
[Diagram: the Scheduler passes a Proc. Map to the Adaptive Application, which is layered as AMPI, CHARM++ with Loadbalancer, and Converse.]
Use the Charm++ framework

20 Charm++
Charm++: object-based virtualization
  Program written as a large number of objects which can migrate
  Number of objects typically much larger than number of processors
  Load balancer can remap objects
    Measurement-based load balancing
Charm++ is a data-driven message-passing language

21 Adaptive Charm++ Programs
A Charm++ program is automatically adaptive if a shrink-expand-enabled centralized load-balancing strategy is used
Currently CommLB and RandcentLB are shrink-expand enabled
  Compile with -module CommLB
  Run with +balancer CommLB

22 MPI Jobs
How do we make MPI jobs adaptive? AMPI
  AMPI maps the MPI processes to user-level threads which can migrate
  Each thread is embedded in a Charm++ object, thus allowing load balancing and shrink-expand
Use the Charm++ framework

23 Adaptive AMPI Programs
Build AMPI with an adaptive load balancing strategy
Call MPI_MIGRATE() at regular intervals in each MPI process; otherwise the process will not respond to the processor map

24 Performance Results for Adaptive Jobs

25 Shrink Expand Overhead
[Table columns: Processors, Shrink Time (s), Expand Time (s). Timing values as extracted: 0.49, 0.56, 0.46, 0.59, 0.54, 0.66, 0.50, 0.61]
Performance for MD program with 10 MB migrated data per processor on NCSA Platinum

26 Residual Processes
Shrink
  Objects are moved from the unallocated processors to the allocated processors
  Leaves behind a residual process
Speaker notes: repetition, eliminate; more work being done on the load balancer; many strategies have been implemented; obvious question: how long does it take to shrink and expand?; new call MPI Migrate

27 Effect of Residual Process
[Plot: Utilization (%) vs. Time (s). Performance of Job1 and Job2]
Performance on a 16-processor system
  Jobs in system    Performance cost (%)
  2                 1.98
  4                 1.43
  8                 3.24
Speaker note: now that we are convinced of the adaptive job implementation, how much does system performance improve with adaptive jobs?

28 Adaptive Queuing System

29 AQS Features
Multithreaded
Reliable and robust
  Tested on the cool.cs Linux cluster at PPL
Supports most features of standard queuing systems
Has the ability to manage adaptive jobs, currently implemented in Charm++ and MPI
Handles regular (non-adaptive) jobs

30 AQS Scheduling Strategy
A library component that decides which jobs to schedule
Similar to equipartitioning [N Islam et al]
On job arrival and job completion
  All running jobs and the new one are allocated their minimum number of processors
  Leftover processors are shared equally, subject to each job's maximum processor usage
  If it is not possible to allocate the new job its minimum number of processors, it is queued
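
The strategy on this slide can be sketched as below. This is an illustrative reading of the three rules, not the AQS implementation: the names `Job` and `share_processors` are hypothetical, and handing out leftover processors one at a time in job order is an assumption about how "shared equally" handles a remainder.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Job:
    name: str
    min_pe: int
    max_pe: int


def share_processors(jobs: List[Job], total_pe: int) -> Optional[Dict[str, int]]:
    """Give every job its minimum, then share leftovers equally up to max_pe.

    Returns None when the minimums do not fit, meaning the newest job
    must be queued.
    """
    if sum(j.min_pe for j in jobs) > total_pe:
        return None
    alloc = {j.name: j.min_pe for j in jobs}
    leftover = total_pe - sum(alloc.values())
    growable = [j for j in jobs if alloc[j.name] < j.max_pe]
    # Assumption: round-robin, one processor per job per pass.
    while leftover > 0 and growable:
        for j in list(growable):
            if leftover == 0:
                break
            alloc[j.name] += 1
            leftover -= 1
            if alloc[j.name] == j.max_pe:
                growable.remove(j)
    return alloc


a = Job("A", min_pe=1, max_pe=10)
b = Job("B", min_pe=8, max_pe=16)
print(share_processors([a], 16))     # {'A': 10}
print(share_processors([a, b], 16))  # {'A': 5, 'B': 11}
print(share_processors([a, b, Job("C", 12, 16)], 16))  # None
```

The example uses the Min_pe and Max_pe values from the "Two Adaptive Jobs" slide: A runs alone at its maximum of 10, shrinks when B arrives, and would return to 10 when B finishes. The exact split between A and B depends on the tie-breaking assumption noted above.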

31 Simulated Utilization

32 Simulated MRT

33 Experimental Utilization

34 Experimental MRT

35 Summary and Future Work
Ease of use: Faucets
Better utilization: Charm++/AMPI Adaptive Jobs
Go to to download
Future
  Extend the system to other parallel machines
  Eliminate residual processes
  Integrate the scheduler with Globus
  More comprehensive QoS contracts being developed
  Sophisticated bidding schemes for the Faucets framework
  Bidding schemes to include memory, deadline, profit, etc.

