Load Balancing Tasks with Overlapping Requirements
Milan Vojnovic, Microsoft Research
Joint work with Dan Alistarh, Christos Gkantsidis, Jennifer Iglesias, Bo Zong
Motivating Application Scenario: Stream Processing Platforms
Tasks and Requirements
Problem #1: Bi-Criteria Load Balancing
Query Assignment Problem: find an assignment of tasks to machines such that
  Criterion 1: the total number of distinct requirements that need to be supplied to machines is minimized
  Criterion 2: the number of tasks assigned is balanced across machines
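The two criteria can be made concrete with a small sketch. It assumes each task is a set of requirement labels and an assignment maps task ids to machine indices. The function name `bicriteria_cost` and the imbalance measure (largest minus smallest task count) are illustrative assumptions; the slide only asks that task counts be "balanced".

```python
from collections import defaultdict

def bicriteria_cost(tasks, assignment, num_machines):
    """Return (total distinct requirements over machines, task-count imbalance).

    tasks: dict task_id -> set of requirement labels
    assignment: dict task_id -> machine index in range(num_machines)
    The imbalance measure (max load minus min load) is an assumption.
    """
    reqs = defaultdict(set)      # requirements each machine must be supplied with
    load = [0] * num_machines    # number of tasks on each machine
    for task_id, machine in assignment.items():
        reqs[machine] |= tasks[task_id]
        load[machine] += 1
    total_requirements = sum(len(r) for r in reqs.values())
    imbalance = max(load) - min(load)
    return total_requirements, imbalance

tasks = {"q1": {"a", "b"}, "q2": {"b", "c"}, "q3": {"x"}, "q4": {"x", "y"}}
clustered = {"q1": 0, "q2": 0, "q3": 1, "q4": 1}  # overlapping tasks together
mixed = {"q1": 0, "q3": 0, "q2": 1, "q4": 1}      # overlapping tasks split

print(bicriteria_cost(tasks, clustered, 2))  # (5, 0)
print(bicriteria_cost(tasks, mixed, 2))      # (7, 0)
```

Both assignments are perfectly balanced, but co-locating tasks with overlapping requirements lowers the total number of requirements that must be supplied.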
Problem #2: Min-Max Load Balancing
Query Assignment Problem: find an assignment of tasks to machines that minimizes the maximum number of distinct requirements needed by a machine
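As a rough illustration of the min-max objective, the sketch below evaluates a given assignment. The task representation (sets of requirement labels) and the name `max_requirements_per_machine` are assumptions made for the example.

```python
def max_requirements_per_machine(tasks, assignment, num_machines):
    """Largest number of distinct requirements needed by any single machine.

    tasks: dict task_id -> set of requirement labels
    assignment: dict task_id -> machine index in range(num_machines)
    """
    reqs = [set() for _ in range(num_machines)]
    for task_id, machine in assignment.items():
        reqs[machine] |= tasks[task_id]
    return max(len(r) for r in reqs)

tasks = {"q1": {"a", "b"}, "q2": {"b", "c"}, "q3": {"x"}, "q4": {"x", "y"}}
clustered = {"q1": 0, "q2": 0, "q3": 1, "q4": 1}
mixed = {"q1": 0, "q3": 0, "q2": 1, "q4": 1}

print(max_requirements_per_machine(tasks, clustered, 2))  # 3
print(max_requirements_per_machine(tasks, mixed, 2))      # 4
```

The objective is the value to be minimized over all assignments; this function only scores one candidate.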
Other Motivating Application Scenarios
Scheduling tasks in distributed clusters of machines with data locality
…
Beyond resource allocation in data centres:
  Clustering of information objects (documents, images, videos)
  Summarizing topics for collections of documents
  …
Related Work
Problem #1: Bi-Criteria Load Balancing
NP Hardness
The Query Assignment Problem is NP-complete
Proof: reduction from the well-known bin packing problem
Random Query Assignment
Deficiency of Random Query Assignment
Special Case: Tasks with Singleton Requirements
There exists a polynomial-time algorithm that guarantees a 2-approximation for singleton task requirements with arbitrary weights
Algorithm
Tasks with Arbitrary Sets of Requirements
Gadget: Minimum Task Type Packing
Algorithm
Experimental Evaluation
Offline Algorithms
MQP = defined in an earlier slide
OffRand = uniform random assignment of a query type to a machine
IC = incremental cost
MMS = min-max traffic cost per machine
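Two of the baselines named above can be sketched as follows. `off_rand` follows the slide's description directly. `incremental_cost` is only one plausible reading of "IC": the slide gives the name without a definition, so the rule used here (place each task where it adds the fewest new requirements, breaking ties by the smaller task count) is an assumption, as are the function names.

```python
import random

def off_rand(tasks, num_machines, seed=0):
    """Assign every task to a machine chosen uniformly at random."""
    rng = random.Random(seed)
    return {task_id: rng.randrange(num_machines) for task_id in tasks}

def incremental_cost(tasks, num_machines):
    """Greedy sketch (assumed reading of IC): place each task on the machine
    where it adds the fewest new requirements; ties go to the machine with
    fewer tasks, then to the lower index."""
    reqs = [set() for _ in range(num_machines)]
    load = [0] * num_machines
    assignment = {}
    for task_id, needed in tasks.items():
        best = min(
            range(num_machines),
            key=lambda m: (len(needed - reqs[m]), load[m], m),
        )
        reqs[best] |= needed
        load[best] += 1
        assignment[task_id] = best
    return assignment

tasks = {"q1": {"a", "b"}, "q2": {"b", "c"}, "q3": {"x"}, "q4": {"x", "y"}}
print(incremental_cost(tasks, 2))  # {'q1': 0, 'q2': 0, 'q3': 1, 'q4': 1}
print(off_rand(tasks, 2))          # depends on the seed
```

Without the load-based tie-break, the greedy rule would place all four example tasks on machine 0, which is why some balancing term is included in the sketch.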
Performance of Offline Algorithms
(Plot; x-axis: number of requirements per task)
Online Task Assignment
Performance of Online Algorithms
(Plot; x-axis: number of requirements per task)
Problem #2: Min-Max Load Balancing
Online Task Assignment
Hidden Co-Clustering Input
Recovery Theorem
Experimental Evaluation
Dataset
Greedy
  Random = random task arrival
  Decreasing with respect to the number of requirements
Balance big = large tasks to least loaded, small items according to greedy
Prefer big = large tasks to least loaded, delayed assignment of up to a fixed number of small tasks
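The two arrival orders for the greedy strategy can be illustrated with a short sketch. The greedy rule itself is not spelled out on this slide, so the version below (send each arriving task to the machine whose requirement set grows least, breaking ties by the smaller requirement set) is an assumption, as are all function names.

```python
import random

def greedy_online(arrivals, num_machines):
    """Assumed greedy rule: each arriving task goes to the machine whose
    requirement set grows least; ties go to the smaller requirement set.

    arrivals: list of (task_id, set of requirements) in arrival order
    Returns (assignment, max distinct requirements on any machine).
    """
    reqs = [set() for _ in range(num_machines)]
    assignment = {}
    for task_id, needed in arrivals:
        best = min(
            range(num_machines),
            key=lambda m: (len(needed - reqs[m]), len(reqs[m])),
        )
        reqs[best] |= needed
        assignment[task_id] = best
    return assignment, max(len(r) for r in reqs)

def random_order(tasks, seed=0):
    """Random task arrival."""
    items = list(tasks.items())
    random.Random(seed).shuffle(items)
    return items

def decreasing_order(tasks):
    """Tasks sorted by decreasing number of requirements."""
    return sorted(tasks.items(), key=lambda kv: len(kv[1]), reverse=True)

tasks = {"t1": {"a", "b", "c"}, "t2": {"a"}, "t3": {"x", "y"}, "t4": {"y"}}
assignment, worst = greedy_online(decreasing_order(tasks), 2)
print(assignment, worst)  # t1 and t2 share a machine, t3 and t4 the other; worst is 3
print(greedy_online(random_order(tasks), 2))
```

Balance big and Prefer big would additionally need a size threshold separating large from small tasks, which the slide does not give, so they are left out of the sketch.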
Retail dataset
Conclusion
Studied two variants of non-standard load balancing problems: bi-criteria and min-max
Approximation ratios for offline problems
Hidden clustering recovery conditions for a simple greedy online task assignment strategy
Open questions:
  Tighter approximation ratios for offline versions of both problems?
  Similar hidden cluster recovery questions (allowing for more memory)?