A Multi-Agent Learning Approach to Online Distributed Resource Allocation Chongjie Zhang Victor Lesser Prashant Shenoy Computer Science Department University of Massachusetts Amherst
Focus This paper presents a multi-agent learning (MAL) approach to address resource sharing in cluster networks. – Exploit unknown task arrival patterns Problem characteristics: – Realistic – Multiple agents – Partial observability – No global reward signal – Communication delay – Two interacting learning problems
Increasing Computing Demands “Software as a service” is becoming a popular IT business model. It is challenging to build large computing infrastructures to host such widespread online services.
A Potentially Cost-Effective Solution Shared clusters – Built using commodity PCs or workstations. – Running a number of applications significantly larger than the number of nodes [Figure: a dedicated cluster and a shared cluster, each with its own resource manager] [Arpaci-Dusseau and Culler, 1997; Aron et al., 2000; Urgaonkar and Shenoy, 2003]
Building Larger, Scalable Computing Infrastructures Centralized resource management limits the size of shared clusters. Organize shared clusters into a network and share resources across clusters. How can resources be shared efficiently within a cluster network?
Outline Problem Formulation Fair Action Learning Algorithm Learning Distributed Resource Allocation – Local Allocation Decision – Task Routing Decision Experimental Results Summary
Problem Formulation A distributed sequential resource allocation problem (DSRAP) is denoted as a tuple ⟨C, A, T, R, B⟩: – C = { C 1, …, C m } is a set of agents (or clusters) – A = { a ij } m x m is the adjacency matrix of agents, where a ij is the task transfer time from C i to C j – T = { t 1, …, t l } is a set of task types – R = { R 1, …, R q } is a set of resource types – B = { D ij } l x m is the task arrival pattern, where D ij is the arrival distribution of tasks of type t i at C j
Problem Description: Cluster Network [Figure: an example cluster network of ten clusters C 1 –C 10 connected by links labeled with task transfer times a ij (a 12, a 13, …, a 9,10); each cluster consists of computing nodes, and each node provides resources R 1, R 2, R 3]
Problem Formulation: Static Model (Cont.) Each agent C i = { n i1, …, n ik } contains a set of computing nodes. Each node n ij provides a set of resources, represented as ⟨ v ij1, …, v ijq ⟩, where v ijh is the capacity of resource type R h on node n ij. We assume there exist standards that quantify each type of resource.
Problem Description: Task A task is denoted as a tuple ⟨ t, u, w, d 1, …, d q ⟩, where – t is the task type – u is the utility rate of the task – w is the maximum waiting time before being allocated – d i is the demand for resource i, i = 1, …, q.
Problem Description: Task Type A task type characterizes a set of tasks, each of whose feature components follows a common distribution. A task type t is denoted as a tuple ⟨ D t s, D t u, D t w, D t d1, …, D t dq ⟩, where – D t s is the task service time distribution – D t u is the distribution of utility rate – D t w is the distribution of the maximum waiting time – D t di is the distribution of the demand for resource i, i = 1, …, q.
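To make the formulation concrete, here is a minimal Python sketch of the task and task-type structures; the class and field names (Task, TaskType, demands, and the sampling functions) are illustrative assumptions, not identifiers from the paper.

from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Task:
    task_type: str          # t: the task type
    utility_rate: float     # u: utility accrued per unit time while the task runs
    max_wait: float         # w: maximum waiting time before the task must be allocated
    demands: List[float]    # d_1, ..., d_q: demand for each resource type

@dataclass
class TaskType:
    # Each field is a sampling function standing in for a distribution D_t.
    sample_service_time: Callable[[], float]
    sample_utility_rate: Callable[[], float]
    sample_max_wait: Callable[[], float]
    sample_demands: Callable[[], List[float]]

    def sample_task(self, name: str) -> Task:
        # Draw one task of this type from the component distributions.
        return Task(name, self.sample_utility_rate(),
                    self.sample_max_wait(), self.sample_demands())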
Individual Agent’s Decision-Making [Figure: each agent receives a task set and makes two learned decisions: local task allocation decision-making selects the tasks to be allocated locally and hands them to an existing cluster resource scheduling algorithm, while task routing decision-making forwards the tasks not allocated locally to neighboring clusters]
Problem Goal The main goal is to derive decision policies for each agent that maximize the average utility rate (AUR) of the whole system. Note that, due to its partial view of the system, each individual cluster can only observe its local utility rate, but not the system's utility rate.
Multi-Agent Reinforcement Learning (MARL) In a multi-agent setting, all agents learn their policies concurrently, so the environment becomes non-stationary from the perspective of an individual agent. Single-agent reinforcement learning algorithms may diverge due to lack of synchronization. Several MARL algorithms have been proposed: GIGA, GIGA-WoLF, WPL, etc.
Fair Action Learning (FAL) Algorithm In practical problems we usually cannot compute the exact policy gradient that GIGA uses. FAL is a direct policy search technique: a variant of GIGA that uses an easily-calculable, approximate policy gradient and then applies GIGA’s normalization function to keep the policy a valid probability distribution.
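As an illustration, here is a minimal FAL-style update for one state, assuming the approximate policy gradient for an action is its value estimate minus the policy-weighted average value, followed by a simplified GIGA-style normalization back onto a probability distribution; the learning rate and the exact normalization are assumptions, not the paper’s formulation.

# FAL-style policy update for a single state (simplified sketch; details assumed).
def fal_update(policy, q_values, learning_rate=0.01):
    """policy and q_values are dicts mapping action -> probability / value estimate."""
    # Policy-weighted average value serves as the baseline.
    baseline = sum(policy[a] * q_values[a] for a in policy)
    # Approximate policy gradient: each action's advantage over the baseline.
    updated = {a: policy[a] + learning_rate * (q_values[a] - baseline) for a in policy}
    # Simplified normalization standing in for GIGA's projection onto the simplex.
    clipped = {a: min(max(p, 0.0), 1.0) for a, p in updated.items()}
    total = sum(clipped.values())
    if total == 0.0:                           # degenerate case: fall back to uniform
        return {a: 1.0 / len(policy) for a in policy}
    return {a: p / total for a, p in clipped.items()}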
Local Task Allocation Decision-Making [Figure: the agent decision-making diagram, repeated as a transition to local task allocation decision-making]
Local Task Allocation Decision-Making Select a subset of received tasks to be allocated locally so as to maximize the local utility rate, which potentially improves the global utility rate. Use an incremental selection algorithm (see the sketch after this slide):
selected := Ø
repeat:
  allocable := getAllocable(tasks)
  t := selectTask(allocable)
  if t = nil: learn(); return selected
  selected := selected ∪ {t}
  tasks := tasks \ {t}
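A minimal Python sketch of this selection loop, assuming tasks carry a demands list (as in the earlier Task sketch), that getAllocable keeps only tasks whose demands fit the remaining free resources, and that selectTask consults the learned stochastic policy and may return nothing to stop; all helper behavior here is an assumption.

# Incremental local task selection (sketch; helper behavior is assumed).
def select_local_tasks(tasks, free_resources, select_task, learn):
    """Greedily pick tasks to allocate locally until the policy selects the nil task."""
    selected = []
    remaining = list(tasks)
    while True:
        # Tasks whose resource demands still fit within the free local resources.
        allocable = [t for t in remaining
                     if all(d <= f for d, f in zip(t.demands, free_resources))]
        task = select_task(allocable)      # learned stochastic policy; None means stop
        if task is None:
            learn()                        # update the value function and policy
            return selected
        selected.append(task)
        remaining.remove(task)
        free_resources = [f - d for f, d in zip(free_resources, task.demands)]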
Learning Local Task Allocation Learning model: – State: features describing both the tasks to be allocated and the availability of local resources – Action: selecting a task – Reward for selecting task a at state s Due to partial observability, each agent uses FAL to learn a stochastic policy. Q-learning is used to update the value function.
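A sketch of one learning step for local allocation, assuming a tabular value function, a discount factor and learning rate chosen for illustration, and the fal_update routine sketched earlier; none of these constants come from the paper.

# One learning step for local task allocation (sketch; hyperparameters assumed).
def allocation_learning_step(Q, policy, state, action, reward, next_state,
                             alpha=0.1, gamma=0.95):
    """Q maps (state, action) -> value; policy maps state -> {action: probability}."""
    # Q-learning update of the value function.
    best_next = max((Q.get((next_state, a), 0.0) for a in policy.get(next_state, [])),
                    default=0.0)
    old = Q.get((state, action), 0.0)
    Q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    # FAL update of the stochastic policy from the current value estimates.
    q_values = {a: Q.get((state, a), 0.0) for a in policy[state]}
    policy[state] = fal_update(policy[state], q_values)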
Accelerating the Learning Process Reasons: – Extremely large policy search space – Non-stationary learning environment – Avoid poor initial policies in practical systems Techniques (see the sketch after this slide): – Initialize policies with a greedy allocation algorithm – Set a utilization threshold for conducting ε-greedy exploration – Limit the exploration rate for selecting the nil task
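A sketch of the gated exploration described above, assuming exploration only happens while local resource utilization is below a threshold and the nil (select-nothing) action gets a separately capped exploration rate; the thresholds, rates, and use of None for the nil task are assumptions.

import random

# Gated epsilon-greedy task selection (sketch; thresholds are assumptions).
def choose_task(policy_for_state, utilization, epsilon=0.1,
                utilization_threshold=0.9, nil_exploration_cap=0.01):
    """policy_for_state maps actions (tasks, or None for the nil task) to probabilities."""
    if utilization < utilization_threshold and random.random() < epsilon:
        actions = list(policy_for_state)
        # Rarely explore the nil action so the agent does not needlessly idle resources.
        if None in actions and len(actions) > 1 and random.random() > nil_exploration_cap:
            actions.remove(None)
        return random.choice(actions)
    # Otherwise sample from the learned stochastic policy.
    actions = list(policy_for_state)
    weights = [policy_for_state[a] for a in actions]
    return random.choices(actions, weights=weights, k=1)[0]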
Task Routing Decision-Making [Figure: the agent decision-making diagram, repeated as a transition to task routing decision-making]
Task Routing Decision-Making To which neighbor should an agent forward an unallocated task so that it reaches an unsaturated cluster before it expires? Each agent learns to route tasks by interacting with its neighbors. The learning objective is to maximize the probability that each task is allocated somewhere in the system. [Figure: a task being forwarded through a network of clusters C 1 –C 6]
Learning Task Routing State s x is defined by the characteristics of the current task x that an agent is forwarding. An action j corresponds to choosing neighbor j for forwarding the task. The reward is the allocation probability of task x when it is forwarded to neighbor j, which combines the probability that j allocates x locally with the allocation probability of x being forwarded onward by j, weighted by j’s routing policy.
Learning Task Routing (cont.) Q i ( s x, j ) is the expected probability that task x will be allocated in the system if agent i forwards it to its neighbor j. Q-learning is used to update this value function, and FAL is used to learn the task routing policy.
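A sketch of how the routing reward and value update could be computed, assuming the reward for forwarding task x to neighbor j combines j’s probability of allocating x locally with the policy-weighted allocation probability of j forwarding x onward; the exact combination and the learning rate are assumptions, not the paper’s formula.

# Routing reward and value update for forwarding task x to neighbor j (sketch).
def routing_reward(p_local_j, policy_j, Q_j):
    """p_local_j: probability that neighbor j allocates x locally.
    policy_j: j's routing policy for x, mapping j's neighbors -> probability.
    Q_j: j's allocation-probability estimates, mapping j's neighbors -> value."""
    onward = sum(policy_j[k] * Q_j[k] for k in policy_j)   # expected value if j forwards x
    return p_local_j + (1.0 - p_local_j) * onward

def routing_value_update(Q_i, j, reward, alpha=0.1):
    """Update agent i's estimate Q_i[j] toward the observed reward (no discounting,
    since the reward already estimates the eventual allocation probability)."""
    Q_i[j] = Q_i[j] + alpha * (reward - Q_i[j])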
Dual Exploration [Figure: when a task x is forwarded from agent i to agent j, i updates Q i ( s x, j ) using the reward r ( s x, j ) reported by j (forward exploration), while j updates Q j ( s x, i ) using r ( s x, i ) carried with the task (backward exploration)] [Kumar and Miikkulainen, 1999]
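A minimal sketch of the two exploration directions, reusing the routing_value_update routine above and assuming the forwarded task carries the sender’s own reward estimate so the receiver can perform the backward update; the message contents and timing are assumptions.

# Dual exploration when agent i forwards task x to neighbor j (sketch; protocol assumed).
def on_forward(Q_i, Q_j, j_name, i_name, reward_from_j, reward_from_i):
    """reward_from_j: j's allocation-probability estimate for x, reported back to i.
    reward_from_i: i's own estimate for x, carried along with the forwarded task."""
    routing_value_update(Q_i, j_name, reward_from_j)   # forward exploration: i learns about j
    routing_value_update(Q_j, i_name, reward_from_i)   # backward exploration: j learns about i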
Experiments: Compared Approaches Distributed approaches. A centralized approach: – using a best-first algorithm with a global view – ignoring the communication delay – sometimes generating optimal allocations
Experimental Setup Cluster network with heterogeneous clusters and heterogeneous computing nodes (total 1024 nodes) Four types of tasks: ordinary, IO-intensive, compute-intensive, and demanding Two task arrival patterns: light load and heavy load
Experimental Result: Light Load
Experimental Result: Heavy Load
Summary This paper presents a multi-agent learning (MAL) approach to resource sharing in cluster networks for building large computing infrastructures. Experimental results are encouraging. This work suggests that MAL may be a promising approach to online optimization problems in distributed systems.