Hierarchical Scheduling for Diverse Datacenter Workloads. Arka A. Bhattacharya, David Culler, Ali Ghodsi, Scott Shenker, and Ion Stoica (University of California, Berkeley); Eric Friedman (International Computer Science Institute, Berkeley). ACM SoCC'13.
Hierarchical Scheduling A key feature of cloud schedulers: it enables scheduling resources to reflect organizational priorities. The defining property of hierarchical scheduling, absent in flat (non-hierarchical) scheduling, is that if some node in the hierarchy is not using its resources, they are redistributed among that node's sibling nodes rather than among all leaf nodes.
Hierarchical Share Guarantee Assign to each node in the weighted tree some guaranteed share of the resources. A node ni is guaranteed to get at least an x share of its parent's resources, where x = wi / Σ_{nj ∈ A(C(P(ni)))} wj, with wi the weight of node ni, P() the parent of a node, C() the set of children of a node, and A() the subset of demanding nodes. A leaf node is demanding if it asks for more resources than are allocated to it, whereas an internal node is demanding if any of its children are demanding.
Example Given 480 servers, shared by a weighted hierarchy tree (figure: top-level nodes holding 240, 160, and 80 servers, further split among their children, e.g. 48 and 96 at the leaves).
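As a rough illustration of the guarantee, here is a small Python sketch over a hypothetical two-level hierarchy sharing the 480 servers; the organization names and weights are invented for the example, and every node is assumed to be demanding:

```python
# Hypothetical weighted hierarchy over 480 servers; names and weights are illustrative only.
# Each node's guarantee is weight / (sum of weights of its demanding siblings),
# applied to its parent's guaranteed share.

CLUSTER = 480

tree = {
    "org1": {"weight": 2, "children": {"team-a": 1, "team-b": 1}},
    "org2": {"weight": 1, "children": {"team-c": 1}},
}

def guarantees(tree, total):
    """Compute each leaf's guaranteed servers, assuming every node is demanding."""
    out = {}
    top_weight = sum(org["weight"] for org in tree.values())
    for org_name, org in tree.items():
        org_share = total * org["weight"] / top_weight
        child_weight = sum(org["children"].values())
        for child, w in org["children"].items():
            out[f"{org_name}/{child}"] = org_share * w / child_weight
    return out

print(guarantees(tree, CLUSTER))
# {'org1/team-a': 160.0, 'org1/team-b': 160.0, 'org2/team-c': 160.0}
```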
Multi-resource Scheduling Workloads in data centers tend to be diverse: CPU-intensive, memory-intensive, or I/O-intensive. Ignoring the actual resource needs of jobs leads to poor performance isolation and low throughput.
Dominant Resource Fairness (DRF) A generalization of max-min fairness to multiple resource types: maximize the minimum dominant share among users in the system. A user's dominant share si is the largest share the user holds of any single resource, and the dominant resource is the resource corresponding to that share. The intuition behind DRF is that in a multi-resource environment, a user's allocation should be determined by the user's dominant share.
Example Job 1's dominant resource is memory and Job 2's dominant resource is CPU (figure showing a dominant share of 60%).
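A minimal sketch of the dominant-share computation; the cluster capacity and per-job allocations below are hypothetical, chosen only so that Job 1 is memory-dominant and Job 2 is CPU-dominant:

```python
CAPACITY = {"cpu": 10, "mem_gb": 40}   # hypothetical cluster capacity

def dominant_share(alloc, capacity):
    """Dominant share = the largest per-resource share a user holds;
    the resource achieving that maximum is the dominant resource."""
    shares = {r: alloc.get(r, 0) / capacity[r] for r in capacity}
    resource = max(shares, key=shares.get)
    return resource, shares[resource]

print(dominant_share({"cpu": 2, "mem_gb": 24}, CAPACITY))  # ('mem_gb', 0.6) -> memory-dominant
print(dominant_share({"cpu": 6, "mem_gb": 4},  CAPACITY))  # ('cpu', 0.6)    -> CPU-dominant
```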
How DRF Works Given a set of users, each with a resource demand vector (the resources required to execute one job). Starts with every user allocated zero resources. Repeatedly picks the user with the lowest dominant share and launches one of that user's jobs if there are enough resources available in the system.
Example System with 9 CPUs and 18 GB RAM. User A: <1 CPU, 4 GB> User B: <3 CPUs, 1 GB>
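A runnable sketch of this progressive-filling loop on the example above; as a simplification it allocates whole jobs and skips users whose next job no longer fits:

```python
capacity = [9, 18]                      # 9 CPUs, 18 GB RAM
demands = {"A": [1, 4], "B": [3, 1]}    # per-job demand vectors

alloc = {u: [0, 0] for u in demands}
free = capacity[:]

def dominant_share(u):
    return max(alloc[u][j] / capacity[j] for j in range(len(capacity)))

def can_launch(u):
    return all(free[j] >= demands[u][j] for j in range(len(capacity)))

# Repeatedly pick the (launchable) user with the lowest dominant share and run one of its jobs.
while any(can_launch(u) for u in demands):
    user = min((u for u in demands if can_launch(u)), key=dominant_share)
    for j in range(len(capacity)):
        alloc[user][j] += demands[user][j]
        free[j] -= demands[user][j]

print(alloc)   # {'A': [3, 12], 'B': [6, 2]} -> A runs 3 jobs, B runs 2 jobs
print({u: round(dominant_share(u), 2) for u in demands})   # both dominant shares ~0.67
```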
Hierarchical DRF (H-DRF) Static H-DRF Collapsed hierarchies Naive H-DRF Dynamic H-DRF
Static H-DRF A static version of DRF that handles hierarchies. Algorithm: given the hierarchy structure and the amount of resources in the system, start with every leaf node allocated zero resources, then repeatedly allocate resources to a leaf node until no more resources can be assigned to any node.
Resource Allocation in Static H-DRF Start at the root of the tree and traverse down to a leaf, at each step picking the demanding child that has the smallest dominant share (internal nodes are assigned the sum of all the resources assigned to their immediate children). Allocate the leaf node an ε amount of its resource demands, which increases that node's dominant share by ε.
Example Given 10 CPUs and 10 GPUs.
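A compact Python sketch of this allocation loop under some assumptions: two resources (the 10 CPUs and 10 GPUs expressed as fractions of capacity), leaf demand vectors normalized so their dominant entry is 1, and an illustrative hierarchy that is not necessarily the slide's exact figure:

```python
EPS = 0.1   # allocation step; stands in for the paper's epsilon, chosen coarse for readability

class Node:
    def __init__(self, name, children=None, demand=None):
        self.name = name
        self.children = children or []
        self.demand = demand                      # leaf demand direction (dominant entry = 1)
        self.alloc = [0.0, 0.0] if demand else None

    def allocation(self):
        """A leaf reports its own vector; an internal node reports the sum of its children's."""
        if self.demand is not None:
            return self.alloc
        return [sum(v) for v in zip(*(c.allocation() for c in self.children))]

    def dominant_share(self):
        return max(self.allocation())

    def demanding(self, free):
        """Leaf: can still receive an epsilon step; internal: any child is demanding."""
        if self.demand is not None:
            return all(f >= EPS * d for f, d in zip(free, self.demand))
        return any(c.demanding(free) for c in self.children)

def static_hdrf(root):
    free = [1.0, 1.0]                   # remaining fraction of each resource (CPU, GPU)
    while root.demanding(free):
        node = root
        while node.demand is None:      # descend, picking the demanding child with min dominant share
            node = min((c for c in node.children if c.demanding(free)),
                       key=Node.dominant_share)
        for j, d in enumerate(node.demand):   # give the chosen leaf an epsilon of its demand
            node.alloc[j] += EPS * d
            free[j] -= EPS * d
    return root

# Illustrative hierarchy: n1 has a CPU-only and a GPU-only leaf, n2 has a single CPU-only leaf.
n11, n12, n21 = Node("n1,1", demand=[1, 0]), Node("n1,2", demand=[0, 1]), Node("n2,1", demand=[1, 0])
root = Node("root", children=[Node("n1", children=[n11, n12]), Node("n2", children=[n21])])
static_hdrf(root)
print({n.name: n.alloc for n in (n11, n12, n21)})
# roughly: n1,1 and n2,1 each end up with half the CPUs, n1,2 with all the GPUs (up to epsilon granularity)
```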
Weakness of Static H-DRF Re-calculating the static H-DRF allocation from scratch on every task completion or arrival at any of the leaves is computationally infeasible.
Collapsed Hierarchies Converts the hierarchical scheduler into a flat one and applies the weighted DRF algorithm. Works when only one resource is involved, but with multiple resources it violates the hierarchical share guarantee for internal nodes in the hierarchy.
Example Flattening the hierarchy rooted at nr (figure): leaf n1,1 with demand <1,1> receives a 50% share, while leaf n2,1 with demand <1,0> receives 25%.
Weighted DRF Each user i is associated with a weight vector Wi = {wi,1, ..., wi,m}, where wi,j represents the weight of user i for resource j. Dominant share: si = max_j { ui,j / wi,j }, where ui,j is user i's share of resource j. If the weights of all users are set to 1, weighted DRF reduces to DRF.
Weighted DRF in Collapsed Hierarchies Each node ni has a weight wi. Let wi,j = wi for 1 ≤ j ≤ m. The ratio between the dominant shares allocated to users a and b then equals wa/wb.
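A tiny sketch of the weighted dominant share used here; the share and weight values below are arbitrary examples:

```python
def weighted_dominant_share(shares, weights):
    """Weighted dominant share: s_i = max_j (u_ij / w_ij), where u_ij is node i's
    share of resource j and w_ij its weight for that resource."""
    return max(u / w for u, w in zip(shares, weights))

# Node with flat weight w_i = 2 copied to every resource (w_ij = w_i for all j):
print(weighted_dominant_share([0.5, 0.2], [2, 2]))   # 0.25
# With unit weights this reduces to plain DRF's dominant share:
print(weighted_dominant_share([0.5, 0.2], [1, 1]))   # 0.5
```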
Example Collapsed hierarchies on the tree rooted at nr (figure): n1,1 with demand <1,1> receives a 50% share.
Naive H-DRF A natural adaptation of the original DRF to the hierarchical setting. The hierarchical share guarantee is violated for leaf nodes, which can lead to starvation.
Example Static H-DRF vs. Naive H-DRF (figure): under Naive H-DRF an internal node's dominant share stays at 1.0, so its demanding leaf is starved.
Dynamic H-DRF Does not suffer from starvation and satisfies the hierarchical share guarantee. Two key features: rescaling to minimum nodes, and ignoring blocked nodes.
Rescaling to Minimum Nodes Compute the resource consumption of an internal node as follows: Find the demanding child with minimum dominant share M. Rescale every child’s resource consumption vector so that its dominant share becomes M. Add all the children’s rescaled vectors to get the internal node’s resource consumption vector.
Example Given 10 CPUs and 10 GPUs, after n2,1 finishes a job and releases 1 CPU (figure): n1,1 holds <0.4, 0> and n1,2 holds <0, 1>; rescaling n1,2 to the minimum dominant share 0.4 gives <0, 0.4>, so n1's consumption is <0.4, 0.4> with dominant share 0.4, while n2,1's <0.5, 0> has dominant share 0.5, so n1's subtree has priority for the freed CPU.
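A small sketch of this rescale-and-sum step; the child consumption vectors below mirror the example's numbers (n1,1 holding 4 of the 10 CPUs, n1,2 holding all 10 GPUs) and the children are assumed to be demanding with nonzero consumption:

```python
def rescale_and_sum(children, capacity):
    """Internal node's consumption under dynamic H-DRF: rescale every demanding child's
    consumption vector so its dominant share equals the minimum dominant share M, then sum."""
    def dominant_share(vec):
        return max(v / c for v, c in zip(vec, capacity))
    m = min(dominant_share(v) for v in children)              # minimum dominant share M
    scaled = [[x * (m / dominant_share(v)) for x in v] for v in children]
    return [sum(col) for col in zip(*scaled)]

capacity = [10, 10]                       # 10 CPUs, 10 GPUs
children = [[4, 0], [0, 10]]              # e.g. n1,1 holds 4 CPUs, n1,2 holds 10 GPUs
print(rescale_and_sum(children, capacity))   # [4.0, 4.0] -> dominant share 0.4, not 1.0
```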
Ignoring Blocked Nodes Dynamic H-DRF only considers non-blocked nodes for rescaling. A leaf node is blocked if any of the resources it requires is saturated, or if the node is non-demanding. An internal node is blocked if all of its children are blocked.
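A minimal sketch of the blocked-node test as stated above; the demand vectors and saturation flags below are hypothetical:

```python
def leaf_blocked(demand, saturated, is_demanding):
    """A leaf is blocked if it is non-demanding or any resource it needs is saturated."""
    return (not is_demanding) or any(d > 0 and s for d, s in zip(demand, saturated))

def internal_blocked(children_blocked):
    """An internal node is blocked only if every one of its children is blocked."""
    return all(children_blocked)

# Example: a CPU-only leaf when all CPUs are saturated vs. a GPU-only leaf in the same state.
print(leaf_blocked(demand=[1, 0], saturated=[True, False], is_demanding=True))   # True
print(leaf_blocked(demand=[0, 1], saturated=[True, False], is_demanding=True))   # False
```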
Example Static H-DRF vs. rescaling without ignoring blocked nodes (figure, dominant share = 1/3).
Allocation Properties Hierarchical share guarantee. Group strategy-proofness: no group of users can misrepresent their resource requirements in such a way that all of them are weakly better off and at least one of them is strictly better off. Recursive scheduling. Not population monotonicity (PM): PM requires that any node exiting the system should not decrease the resource allocation to any other node in the hierarchy tree; H-DRF does not satisfy it.
Evaluation - Hierarchical Sharing 49 Amazon EC2 servers. Dominant resources: CPU for n1,1, n2,1, and n2,2; GPU for n1,2.
Result Pareto efficiency: no node in the hierarchy can be allocated an extra task on the cluster without reducing the share of some other node.
Conclusion Proposed H-DRF, a hierarchical multi-resource scheduler that avoids job starvation and maintains the hierarchical share guarantee. Future work: DRF under placement constraints; efficient allocation-vector updates.