1
Cluster Resource Management: A Scalable Approach
Ning Li and Jordan Parker CS 736 Class Project
2
Outline
- Introduction
- A Scalable Approach: Hierarchy
- Results
- Conclusions
- Questions
11/21/2018 Ning Li and Jordan Parker
3
Why Study Resource Management?
- Clusters have become increasingly popular for large parallel computing and for web servers.
- Clusters are growing large, on the order of thousands of nodes.
- Clusters are providing multiple services.
- Resource management is hard to evaluate: bad management is easy to recognize, but good management is much harder to pin down.
- Possible scenario: an ISP does web hosting for 5 clients. Four of them pay the same amount, and the fifth pays five times as much, so that client should get 5/9 (roughly half) of the cluster's resources. But clients are really paying only for guarantees, so it can get very complicated from here.
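The ISP scenario is proportional-share arithmetic: each client's entitlement is its payment divided by the total payments. A minimal sketch, assuming payments map directly to shares (it ignores the guarantees the slide mentions); `proportional_shares` is a hypothetical helper, not part of the project:

```python
from fractions import Fraction

def proportional_shares(payments):
    """Map each client to payment / total payments, as an exact fraction."""
    total = sum(payments.values())
    return {client: Fraction(amount, total) for client, amount in payments.items()}

# Four clients pay 1 unit each; the fifth pays five times as much.
shares = proportional_shares({"c1": 1, "c2": 1, "c3": 1, "c4": 1, "c5": 5})
print(shares["c5"])  # 5/9
```

For the fifth client to get exactly half, it would have to pay as much as the other four combined (4 units).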
4
Resource Management Example
Two service classes, A and B, should each get 50% of the cluster; the 4th node services only B.

Node    | Poor management  | Ideal
Node 1  | A 50%, B 50%     | A 66%, B 33%
Node 2  | A 50%, B 50%     | A 66%, B 33%
Node 3  | A 50%, B 50%     | A 66%, B 33%
Node 4  | B 100%           | B 100%
Overall | A 37.5%, B 62.5% | A 50%, B 50%
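The "Overall" row is the average of the per-node shares. A small sketch that reproduces both columns, assuming all four nodes have equal capacity; `overall_share` is a hypothetical helper, and the slide's 66%/33% are read as exactly 2/3 and 1/3:

```python
from fractions import Fraction

def overall_share(per_node):
    """Average each service class's share over equally sized nodes."""
    classes = sorted({c for node in per_node for c in node})
    n = len(per_node)
    return {c: sum(node.get(c, Fraction(0)) for node in per_node) / n for c in classes}

half, third = Fraction(1, 2), Fraction(1, 3)
poor = [{"A": half, "B": half}] * 3 + [{"B": Fraction(1)}]
ideal = [{"A": 2 * third, "B": third}] * 3 + [{"B": Fraction(1)}]
print(overall_share(poor))   # {'A': Fraction(3, 8), 'B': Fraction(5, 8)}
print(overall_share(ideal))  # {'A': Fraction(1, 2), 'B': Fraction(1, 2)}
```

Because node 4 gives all of its capacity to B, nodes 1-3 must lean toward A (2:1) to bring the cluster back to 50/50.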
5
Clustering Goals
- Scalability
- Reliability
- High performance
- Affordability
Cluster resource management should have these same goals.
6
Related Work
- Proportional Share: Andrea Arpaci-Dusseau
- Cluster Reserves
7
Related Work: Approach Differences
- Our goal: to provide a scalable solution for resource management.
- Other work focused primarily on just having good management.
  - This often meant one manager for all the nodes.
  - That single manager could clearly become a scalability bottleneck.
- Effectiveness: other solutions are probably better for smaller clusters; we hope to be better for large (>1000-node) clusters.
9
Hierarchy: A Scalable Approach
Hierarchical management:
- Nodes service jobs.
- Managers facilitate resource management.
[Figure: a management hierarchy over nodes numbered 1-12, with service nodes reporting to managers.]
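One way to picture the hierarchy is as a tree in which every manager only sums the reports of its direct children, so no single manager hears from every node. A minimal sketch; the class names, the integer usage units, and the assignment of nodes 5-12 to managers 2-4 under root 1 are all hypothetical, chosen only to mirror the 12-node figure:

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class ServiceNode:
    """Leaf of the hierarchy: services jobs and reports per-class usage."""
    name: str
    usage: dict

    def report(self) -> Counter:
        return Counter(self.usage)

@dataclass
class Manager:
    """Interior node: sums the reports of its direct children only."""
    name: str
    children: list = field(default_factory=list)

    def report(self) -> Counter:
        total = Counter()
        for child in self.children:
            total.update(child.report())  # update() adds counts together
        return total

# Hypothetical layout: 1 is the root, 2-4 are managers, 5-12 service jobs.
leaves = {i: ServiceNode(str(i), {"A": 1, "B": 1}) for i in range(5, 13)}
m2 = Manager("2", [leaves[i] for i in (5, 6, 7)])
m3 = Manager("3", [leaves[i] for i in (8, 9, 10)])
m4 = Manager("4", [leaves[i] for i in (11, 12)])
root = Manager("1", [m2, m3, m4])
print(dict(root.report()))  # {'A': 8, 'B': 8}
```

Here the root receives 3 reports instead of 8; with more levels, each manager's fan-in stays bounded as the cluster grows.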
10
Banking Algorithm
- Goal: determine the best allocation given previous usage
- Primitives:
  - Tickets
  - Bank accounts
  - Deposit/withdraw tickets
- 6 steps
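The banking primitives can be modeled as one ticket account per service class. A minimal sketch, assuming accounts are never overdrawn (a withdrawal grants at most the current balance); `TicketBank` and its methods are hypothetical names, not the project's code:

```python
class TicketBank:
    """Hypothetical per-service-class ticket accounts for the banking algorithm."""

    def __init__(self):
        self.accounts = {}  # service class -> banked tickets

    def deposit(self, service_class, tickets):
        if tickets < 0:
            raise ValueError("deposit must be non-negative")
        self.accounts[service_class] = self.accounts.get(service_class, 0) + tickets

    def withdraw(self, service_class, tickets):
        """Take up to `tickets` from the account; return how many were granted."""
        available = self.accounts.get(service_class, 0)
        granted = min(tickets, available)
        self.accounts[service_class] = available - granted
        return granted

    def clear(self):
        """Empty every account (the algorithm's final step)."""
        self.accounts.clear()

bank = TicketBank()
bank.deposit("A", 10)
print(bank.withdraw("A", 4))   # 4
print(bank.withdraw("A", 20))  # 6
```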
11
Banking Algorithm
Step 1: For each service class on each node:
- Deposit unused tickets
Step 2: For each service class on each node, reallocate the service class:
- Full utilization: allocation = usage + k
- Under-utilization: allocation = usage - k
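Steps 1 and 2 can be sketched as a single pass over every (node, service class) pair, applying the slide's two rules literally. Assumptions not stated in the slides: "unused tickets" means allocation minus usage, "full utilization" means usage at or above the allocation, and a new allocation is clamped at zero. The function name and data layout are hypothetical:

```python
def banking_steps_1_2(nodes, bank, k):
    """
    Sketch of Steps 1-2 (hypothetical data layout, assumed semantics).

    nodes: node id -> {service class: (allocation, usage)} in tickets
    bank:  service class -> banked tickets (updated in place)
    k:     adjustment step
    Returns node id -> {service class: new allocation}.
    """
    new_alloc = {}
    for node, classes in nodes.items():
        new_alloc[node] = {}
        for cls, (alloc, usage) in classes.items():
            # Step 1: bank whatever the class did not use on this node.
            unused = max(alloc - usage, 0)
            bank[cls] = bank.get(cls, 0) + unused
            # Step 2: grow fully utilized classes, shrink under-utilized ones.
            if usage >= alloc:
                new_alloc[node][cls] = usage + k
            else:
                new_alloc[node][cls] = max(usage - k, 0)  # clamp is an assumption
    return new_alloc

bank = {}
print(banking_steps_1_2({1: {"A": (60, 60), "B": (40, 20)}}, bank, k=5))  # {1: {'A': 65, 'B': 15}}
print(bank)  # {'A': 0, 'B': 20}
```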
12
Banking Algorithm Cont.
Step 3: For each service class:
- Compare the total allocation to the desired allocation
- Subtract from the over-allocated
- Add to the needy and under-allocated
Step 4: For each service class, deposit/withdraw:
- If still over-allocated, withdraw
- If still under-allocated, deposit
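Steps 3 and 4 work per service class rather than per node. A minimal sketch under loudly stated assumptions: Step 3 can only take back `reclaimable` tickets and can only add as many as the class's needy nodes can use (`extra_demand`), which is why a class can be "still" over- or under-allocated; Step 4 then settles the remaining gap through the class's bank account, never withdrawing below zero. `ClassState`, its fields, and the function name are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class ClassState:
    alloc: int         # total tickets allocated to the class after Step 2
    desired: int       # tickets the class is entitled to (e.g. its 60/30/10 share)
    reclaimable: int   # tickets that can be taken back without starving usage (assumed)
    extra_demand: int  # extra tickets its needy nodes could use (assumed)

def banking_steps_3_4(classes, bank):
    for name, c in classes.items():
        # Step 3: compare total allocation with the desired allocation.
        if c.alloc > c.desired:
            c.alloc -= min(c.alloc - c.desired, c.reclaimable)
        elif c.alloc < c.desired:
            c.alloc += min(c.desired - c.alloc, c.extra_demand)
        # Step 4: settle any remaining imbalance through the class's account.
        gap = c.alloc - c.desired
        balance = bank.get(name, 0)
        if gap > 0:    # still over-allocated: withdraw (never below zero)
            bank[name] = balance - min(gap, balance)
        elif gap < 0:  # still under-allocated: deposit the shortfall
            bank[name] = balance - gap
    return classes, bank

classes = {"A": ClassState(70, 60, 5, 0), "C": ClassState(5, 10, 0, 2)}
bank = {"A": 2}
banking_steps_3_4(classes, bank)
print(classes["A"].alloc, classes["C"].alloc, bank)  # 65 7 {'A': 0, 'C': 3}
```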
13
Banking Algorithm Cont.
Step 5: Withdraw and allocate:
- Reward the needy nodes
Step 6: Done; clear the bank accounts
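Steps 5 and 6 close the round: positive balances are withdrawn and handed to the nodes that wanted more, then every account is emptied. A minimal sketch; splitting a balance evenly across needy nodes (earlier nodes get any remainder) is an assumption, as are the function name and data layout:

```python
def banking_steps_5_6(bank, needy, alloc):
    """
    bank:   service class -> banked tickets (cleared in place at the end)
    needy:  service class -> list of node ids that wanted more tickets
    alloc:  node id -> {service class: tickets}, updated in place
    """
    for cls, balance in bank.items():
        nodes = needy.get(cls, [])
        if balance <= 0 or not nodes:
            continue
        # Step 5: withdraw the balance and spread it evenly over the needy nodes.
        share, extra = divmod(balance, len(nodes))
        for i, node in enumerate(nodes):
            bonus = share + (1 if i < extra else 0)
            alloc[node][cls] = alloc[node].get(cls, 0) + bonus
    # Step 6: done, clear the bank accounts.
    bank.clear()
    return alloc

alloc = {"n1": {"B": 10}, "n2": {"B": 10}}
bank = {"A": 0, "B": 5}
banking_steps_5_6(bank, {"B": ["n1", "n2"]}, alloc)
print(alloc, bank)  # {'n1': {'B': 13}, 'n2': {'B': 12}} {}
```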
14
Reliability: Bottom-up Manager Replacement
[Figure: the 12-node management hierarchy before and after a failed manager is replaced from below.]
- This is not relevant to the performance of our scheduler, and we did not even simulate it, but it does show that the network layout we designed could easily handle failures.
- Making the tree balance itself and handle failures could be relatively straightforward.
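The slides name bottom-up replacement but do not spell out the policy, and the authors did not simulate it. One plausible reading, sketched here as an assumption: a failed manager is replaced by its first child, that child's vacated slot is refilled the same way, and so on down to a leaf. `TreeNode`, `promote`, and the tree layout are hypothetical:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class TreeNode:
    name: str
    children: List["TreeNode"] = field(default_factory=list)

def promote(failed: TreeNode) -> Optional[TreeNode]:
    """Return the subtree that takes over from `failed` (None if it was a leaf)."""
    if not failed.children:
        return None  # a failed service node leaves nothing to manage
    heir, rest = failed.children[0], failed.children[1:]
    # The heir leaves its old slot, so that slot is refilled bottom-up too.
    heir_slot = promote(heir)
    new_children = ([heir_slot] if heir_slot is not None else []) + rest
    return TreeNode(heir.name, new_children)

def leaves(*ids):
    return [TreeNode(str(i)) for i in ids]

root = TreeNode("1", [
    TreeNode("2", leaves(5, 6, 7)),
    TreeNode("3", leaves(8, 9, 10)),
    TreeNode("4", leaves(11, 12)),
])
new_root = promote(root)  # manager 1 fails
print(new_root.name, [c.name for c in new_root.children])  # 2 ['5', '3', '4']
```

Only the nodes on one root-to-leaf path change roles, so the repair touches O(depth) nodes rather than the whole tree.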
16
Results

Test | Cluster nodes | Managers (1st/2nd level) | Reporting (1st/2nd level) | Workload | Class 2 constraint
1    | 4             | 2/1                      | 1/1                       | Steady   | node 4
2    | 100           | 10/1                     | 1/1                       | Steady   | nodes 1-30
3    | 100           | 10/1                     | 1/1                       | Dynamic  | nodes 1-30
4    | 100           | 10/1                     | 1/5                       | Steady   | nodes 1-30
5    | 900           | 30/1                     | 1/1                       | Steady   | nodes 1-300

- We do not show the case of having one manager here. The data make a nice comparison, but take our word for it: we got essentially the same results.
- We stopped at 900 nodes because NS started page faulting with more nodes. We now have access to a larger server, ironsides (7 GB RAM); we'll see what happens.
- We chose 3 service classes because it makes it easy to evaluate and easy to
17
Implementation Details
- Simulations via NS, the Network Simulator
- Low-bandwidth 10 Mb/s communication network
- UDP for lower server overhead
- Assumption: node-level resource management works ideally
UDP might not be appropriate once fault tolerance is introduced; we just wanted to show that the scheme could work on a low-bandwidth network, since UDP introduces lower overhead.
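To make the UDP choice concrete: each usage report fits in a single datagram, so there is no connection setup and no per-connection state on the manager. A minimal sketch using Python's standard `socket` and `json` modules; the JSON message format and the helper names are hypothetical, since the slides do not describe the wire format:

```python
import json
import socket

def encode_report(node_id, usage):
    """Serialize one node's per-class usage into a single UDP datagram."""
    return json.dumps({"node": node_id, "usage": usage}).encode("utf-8")

def decode_report(payload):
    msg = json.loads(payload.decode("utf-8"))
    return msg["node"], msg["usage"]

if __name__ == "__main__":
    # Manager and node both on localhost; port 0 lets the OS pick a free port.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as manager:
        manager.bind(("127.0.0.1", 0))
        manager.settimeout(1.0)  # UDP gives no delivery guarantee: don't block forever
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as node:
            node.sendto(encode_report(7, {"A": 60, "B": 30, "C": 10}), manager.getsockname())
        data, _ = manager.recvfrom(4096)
        print(decode_report(data))  # (7, {'A': 60, 'B': 30, 'C': 10})
```

A lost datagram here simply means one missed report until the next reporting period, which is tolerable for periodic usage updates but not for the failure handling the slide warns about.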
18
Test 1: Overview
- 4 nodes, 3 services, 60/30/10 allocation
- The 4th node receives all of the 3rd class's requests
- Steady workload

Node    | 1st | 2nd | 3rd
Node 1  | 66% | 33% | -
Node 2  | 66% | 33% | -
Node 3  | 66% | 33% | -
Node 4  | 40% | 20% | 40%
Overall | 60% | 30% | 10%
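The per-node targets follow from a short derivation: the 3rd class's 10% of four nodes (0.4 of a node) must all land on node 4, and whatever capacity is left on each node is split 2:1 between the 1st and 2nd classes. A sketch of that arithmetic, assuming equal-capacity nodes and a leftover split proportional to the other classes' targets; `constrained_allocation` is a hypothetical helper:

```python
from fractions import Fraction

def constrained_allocation(n_nodes, targets, constrained, allowed):
    """
    targets:     service class -> share of the whole cluster (must sum to 1)
    constrained: the class whose requests may only go to `allowed` nodes
    allowed:     set of node numbers (1-based) that receive that class
    Returns node -> {class: share of that node}.
    """
    local = targets[constrained] * n_nodes / len(allowed)  # share on each allowed node
    if local > 1:
        raise ValueError("allowed nodes cannot absorb the constrained class")
    others = {c: t for c, t in targets.items() if c != constrained}
    other_total = sum(others.values())
    plan = {}
    for node in range(1, n_nodes + 1):
        used = local if node in allowed else Fraction(0)
        # Leftover capacity is split in proportion to the other classes' targets.
        plan[node] = {c: (1 - used) * t / other_total for c, t in others.items()}
        if node in allowed:
            plan[node][constrained] = used
    return plan

targets = {"1st": Fraction(6, 10), "2nd": Fraction(3, 10), "3rd": Fraction(1, 10)}
test1 = constrained_allocation(4, targets, "3rd", {4})
for node in (1, 4):
    print(node, {c: f"{float(s):.0%}" for c, s in test1[node].items()})
# 1 {'1st': '67%', '2nd': '33%'}
# 4 {'1st': '40%', '2nd': '20%', '3rd': '40%'}
```

The same call with 100 nodes and nodes 1-30 (or 900 nodes and nodes 1-300) puts one third of each constrained node's capacity on the 3rd class.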
19
Test 1: Data
20
Test 2: Overview
- 100 nodes, 3 services, 60/30/10 allocation
- Nodes 1-30 receive all of the 3rd class's requests
- Steady workload
21
Test 2: Data
22
Test 3: Overview
- 100 nodes, 3 services, 60/30/10 allocation
- Nodes 1-30 receive all of the 3rd class's requests
- Dynamic workload
23
Test 3: Data
24
Test 4: Overview
- 100 nodes, 3 services, 60/30/10 allocation
- Nodes 1-30 receive all of the 3rd class's requests
- Steady workload
- Reporting 1/5: nodes report every 0.3 seconds, managers every 1.5 seconds
25
Test 4: Data
26
Test 5: Overview
- 900 nodes, 3 services, 60/30/10 allocation
- Nodes 1-300 receive all of the 3rd class's requests
- Steady workload
27
Test 5: Data
29
Conclusions
- Benefits of a hierarchy:
  - Scalable
  - Reliable
  - Geographic applications
- Implemented a new management scheme, Banking:
  - Comparable results
  - Improved scalability
30
Conclusions
- Clusters are sensitive to small policy changes:
  - Clusters are built for specific workloads.
  - Their performance is important, and small changes have a significant impact.
- No scheme is universally applicable.
Future Work
- Real system implementation
- Real workloads
- Real node-level resource management
- Steadier performance
32
Questions
33
Related Work: Proportional-Share
- Stride scheduling: ticket-based, similar to lottery scheduling
- Scale: randomly query k nodes to find the best allocation
- Different application: Condor-like resource allocation/applications
Andrea C. Arpaci-Dusseau and David Culler, "Extending Proportional-Share Scheduling to a Network of Workstations"
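For readers unfamiliar with stride scheduling, a minimal sketch of the classic technique (Waldspurger and Weihl): each client's stride is a large constant divided by its tickets, and the scheduler always runs the client with the smallest pass value, then advances that pass by its stride. The `pick_node` helper illustrates "randomly query k nodes"; choosing the least-loaded sampled node is an assumption about what "best allocation" means. This is a generic illustration, not code from the Arpaci-Dusseau and Culler system, and all names are hypothetical:

```python
import random
from collections import Counter

STRIDE1 = 1 << 20  # large constant so integer strides stay precise

def stride_schedule(tickets, quanta):
    """Classic stride scheduling: run the client with the smallest pass value."""
    stride = {c: STRIDE1 // t for c, t in tickets.items()}
    pass_value = dict(stride)  # each client starts one stride in
    order = []
    for _ in range(quanta):
        client = min(pass_value, key=lambda c: (pass_value[c], c))  # ties by name
        order.append(client)
        pass_value[client] += stride[client]
    return order

def pick_node(loads, k, rng=random):
    """Query k random nodes and return the least-loaded one (assumed criterion)."""
    sample = rng.sample(list(loads), k)
    return min(sample, key=lambda n: loads[n])

print(Counter(stride_schedule({"A": 3, "B": 2, "C": 1}, 600)))
```

Over 600 quanta the three clients run close to 300, 200, and 100 times, matching their 3:2:1 tickets deterministically, whereas lottery scheduling only achieves that ratio in expectation.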
34
Related Work: Cluster Reserves
- Resource container schedulers
- Constrained optimization algorithm
- Scale: centralized, single manager
35
Hierarchical Cluster Reserves – Version 1
- Modify the Cluster Reserves optimization algorithm
- Use it both when a manager manages nodes AND when a level n+1 manager manages level n managers
36
Hierarchical Cluster Reserves – Version 2
- Cluster Reserves optimization algorithm: use it when a manager manages nodes
- Don't use it for upper-level managers
- Modify the manager-to-manager reporting
- Lie to the algorithm