A Multi-Agent Learning Approach to Online Distributed Resource Allocation
Chongjie Zhang, Victor Lesser, Prashant Shenoy
Computer Science Department, University of Massachusetts Amherst

Focus
This paper presents a multi-agent learning (MAL) approach to address resource sharing in cluster networks.
– Exploit unknown task arrival patterns
Problem characteristics:
– Realistic
– Multiple agents
– Partial observability
– No global reward signal
– Communication delay
– Two interacting learning problems

Increasing Computing Demands
"Software as a service" is becoming a popular IT business model.
It is challenging to build computing infrastructures large enough to host such widespread online services.

A Potentially Cost-Effective Solution
Shared clusters [Arpaci-Dusseau and Culler, 1997; Aron et al., 2000; Urgaonkar and Shenoy, 2003]
– Built using commodity PCs or workstations
– Run a number of applications significantly larger than the number of nodes
[Figure: a dedicated cluster vs. a shared cluster, each managed by a resource manager]

Building Larger, Scalable Computing Infrastructures
Centralized resource management limits the size of shared clusters.
Idea: organize shared clusters into a network and share resources across clusters.
How can resources be shared efficiently within a cluster network?

Outline
Problem Formulation
Fair Action Learning Algorithm
Learning Distributed Resource Allocation
– Local Allocation Decision
– Task Routing Decision
Experimental Results
Summary

Problem Formulation
A distributed sequential resource allocation problem (DSRAP) is denoted as a tuple ⟨C, A, T, R, B⟩, where:
– C = {C_1, …, C_m} is a set of agents (or clusters)
– A = {a_ij} is the m×m adjacency matrix of agents, where a_ij is the task transfer time from C_i to C_j
– T = {t_1, …, t_l} is a set of task types
– R = {R_1, …, R_q} is a set of resource types
– B = {D_ij} is the l×m task arrival pattern, where D_ij is the arrival distribution of tasks of type t_i at C_j

Problem Description: Cluster Network
[Figure: an example cluster network of ten clusters C_1, …, C_10 connected by links with transfer times a_ij (a_12, a_13, …, a_9,10); each cluster is a set of computing nodes, and each node provides resources of types R_1, R_2, R_3.]

Problem Formulation: Static Model (cont.)
Each agent C_i = {n_i1, …, n_ik} contains a set of computing nodes.
Each node n_ij provides a set of resources, represented as ⟨v_ij1, …, v_ijq⟩, where v_ijh is the capacity of resource type R_h on node n_ij.
We assume there exist standards that quantify each type of resource.
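As a concrete reference point, here is a minimal sketch of how the DSRAP tuple and its static model might be represented in code; all names and types are illustrative assumptions, not the paper's.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical containers for the DSRAP tuple <C, A, T, R, B>;
# structure and names are illustrative, not from the paper.

@dataclass
class Node:
    capacities: list[float]            # v_ijh: capacity of each resource type R_h

@dataclass
class Cluster:
    nodes: list[Node]                  # C_i = {n_i1, ..., n_ik}

@dataclass
class DSRAP:
    clusters: list[Cluster]                      # C
    transfer_time: list[list[float]]             # A: a_ij, transfer time C_i -> C_j
    task_types: list[str]                        # T
    resource_types: list[str]                    # R
    arrival: list[list[Callable[[], float]]]     # B: D_ij, arrival sampler of t_i at C_j
```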

Problem Description: Task
A task is denoted as a tuple ⟨t, u, w, d_1, …, d_q⟩, where:
– t is the task type
– u is the utility rate of the task
– w is the maximum waiting time before the task must be allocated
– d_i is the demand for resource i, for i = 1, …, q

Problem Description: Task Type
A task type characterizes a set of tasks whose feature components each follow a common distribution. A task type t is denoted as a tuple ⟨D_t^s, D_t^u, D_t^w, D_t^d1, …, D_t^dq⟩, where:
– D_t^s is the task service time distribution
– D_t^u is the distribution of the utility rate
– D_t^w is the distribution of the maximum waiting time
– D_t^di is the distribution of the demand for resource i, for i = 1, …, q
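Continuing the sketch above, tasks and task types might be represented as follows (again, names are illustrative assumptions).

```python
from dataclasses import dataclass
from typing import Callable

Sampler = Callable[[], float]    # draws one value from a distribution

@dataclass
class Task:                      # <t, u, w, d_1, ..., d_q>
    type_id: int                 # t: task type
    utility_rate: float          # u
    max_wait: float              # w: maximum waiting time before allocation
    demands: list[float]         # d_i: demand for each resource type

@dataclass
class TaskType:                  # <D_t^s, D_t^u, D_t^w, D_t^d1, ..., D_t^dq>
    service_time: Sampler        # D_t^s
    utility_rate: Sampler        # D_t^u
    max_wait: Sampler            # D_t^w
    demands: list[Sampler]       # D_t^di, one sampler per resource type
```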

Individual Agent's Decision-Making
[Diagram: an incoming task set T flows into two decision modules. The local task allocation module decides which tasks to allocate locally; those are handed to local resource scheduling (an existing cluster resource scheduling algorithm). Tasks not allocated locally go to the task routing module, which forwards them to neighboring clusters.]

Problem Goal
The main goal is to derive decision policies for each agent that maximize the average utility rate (AUR) of the whole system.
Note that, due to its partial view of the system, each individual cluster can only observe its local utility rate, not the system's utility rate.
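The slides do not define AUR formally. One plausible formalization, assuming each task x accrues utility at its rate u_x while it is running (an assumption, not quoted from the paper), is:

```latex
% Average utility rate over a growing horizon T, assuming utility
% accrues at rate u_x while task x is being served.
\mathrm{AUR} \;=\; \lim_{T \to \infty} \frac{1}{T}
  \int_{0}^{T} \sum_{x \,\in\, \mathrm{running}(\tau)} u_x \,\mathrm{d}\tau
```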

Multi-Agent Reinforcement Learning (MARL)
In a multi-agent setting, all agents learn their policies concurrently, so the environment becomes non-stationary from the perspective of an individual agent.
Single-agent reinforcement learning algorithms may diverge in such settings.
Several MARL algorithms have been proposed: GIGA, GIGA-WoLF, WPL, etc.

Fair Action Learning (FAL) Algorithm
In practical problems, the exact policy gradient used by GIGA is usually unknown.
FAL is a direct policy search technique: a variant of GIGA that uses an easily-calculable, approximate policy gradient.
For each state s and action a, the approximate policy gradient is Q(s, a) − V(s), where V(s) = Σ_a π(s, a) Q(s, a); the policy is updated as
  π(s, a) ← π(s, a) + η (Q(s, a) − V(s))
and then passed through GIGA's normalization function, which maps π(s, ·) back onto the probability simplex.
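A minimal NumPy sketch of this update follows; the advantage-based gradient matches the slide's description, while the standard Euclidean simplex projection is a stand-in for GIGA's exact normalization function (an assumption).

```python
import numpy as np

def project_to_simplex(v: np.ndarray) -> np.ndarray:
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]                       # sort descending
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1.0)
    return np.maximum(v + theta, 0.0)

def fal_update(pi_s: np.ndarray, q_s: np.ndarray, eta: float = 0.01) -> np.ndarray:
    """One FAL policy update for a single state; pi_s, q_s are vectors over actions."""
    v = pi_s @ q_s                             # V(s) = sum_a pi(s,a) Q(s,a)
    gradient = q_s - v                         # approximate policy gradient
    return project_to_simplex(pi_s + eta * gradient)
```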

Individual Agent's Decision-Making (revisited)
[Transition slide: the decision-making diagram again, highlighting the local task allocation component.]

Local Task Allocation Decision-Making
Select a subset of the received tasks to be allocated locally so as to maximize the local utility rate, which potentially improves the global utility rate.
An incremental selection algorithm is used (flowchart reconstructed as pseudocode):
  selected := Ø
  loop:
    allocable := getAllocable(tasks)
    t := selectTask(allocable)
    if t = nil: break
    selected := selected ∪ {t}
    tasks := tasks \ {t}
  learn()
  return selected
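A runnable rendering of that loop; getAllocable, selectTask, and learn are stand-ins for the paper's components.

```python
def allocate_locally(tasks, get_allocable, select_task, learn):
    """Incrementally pick tasks for local allocation until the policy picks nil."""
    selected = set()
    while True:
        allocable = get_allocable(tasks)   # tasks that still fit local resources
        t = select_task(allocable)         # sample from the learned policy; may be None
        if t is None:
            break
        selected.add(t)
        tasks.remove(t)
    learn()                                # update policy/value estimates
    return selected
```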

Learning Local Task Allocation
Learning model:
– State: features describing both the tasks to be allocated and the availability of local resources
– Action: selecting a task
– Reward for selecting task a at state s
Due to partial observability, each agent uses FAL to learn a stochastic policy.
Q-learning is used to update the value function.
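A sketch of the tabular Q-learning backup that could drive the value function here; the state encoding and reward are left abstract, since the slide does not give them.

```python
# Generic one-step tabular Q-learning backup; learning rate and discount
# are illustrative values, not the paper's.
ALPHA, GAMMA = 0.1, 0.95
Q = {}                        # Q[(state, action)] -> estimated value

def q_update(s, a, reward, s_next, next_actions):
    best_next = max((Q.get((s_next, a2), 0.0) for a2 in next_actions), default=0.0)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + ALPHA * (reward + GAMMA * best_next - old)
```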

Accelerating the Learning Process
Reasons:
– Extremely large policy search space
– Non-stationary learning environment
– The need to avoid poor initial policies in practical systems
Techniques (see the sketch below):
– Initialize policies with a greedy allocation algorithm
– Set a utilization threshold for conducting ε-greedy exploration
– Limit the exploration rate for selecting the nil task
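An illustrative sketch of the last two techniques: explore only when local utilization is below a threshold, and cap how often the nil task is chosen during exploration. The threshold values and the direction of the gating are assumptions, not the paper's exact rules.

```python
import random

EPSILON, UTIL_THRESHOLD, NIL_CAP = 0.1, 0.8, 0.05   # illustrative values

def select_action(pi, utilization):
    """pi: dict mapping action -> probability from the learned stochastic policy;
    None stands for the nil task (allocate nothing)."""
    actions = [a for a in pi if a is not None]
    if utilization < UTIL_THRESHOLD and random.random() < EPSILON:
        # Explore, but only rarely pick the nil task.
        if random.random() < NIL_CAP or not actions:
            return None
        return random.choice(actions)
    # Exploit: sample from the learned policy.
    return random.choices(list(pi), weights=list(pi.values()))[0]
```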

Individual Agent's Decision-Making (revisited)
[Transition slide: the decision-making diagram again, highlighting the task routing component.]

Task Routing Decision-Making
To which neighbor should an agent forward an unallocated task so that it reaches an unsaturated cluster before it expires?
Agents learn to route tasks by interacting with their neighbors.
The learning objective is to maximize the probability that each task is allocated somewhere in the system.
[Figure: a task being routed through clusters C_1, …, C_6.]

Learning Task Routing
State s_x is defined by the characteristics of the current task x that an agent is forwarding.
An action j corresponds to choosing neighbor j for forwarding a task.
The reward is the allocation probability of task x when forwarded to neighbor j:
  r(s_x, j) = p_j(x) + (1 − p_j(x)) · Σ_k π_j(s_x, k) Q_j(s_x, k)
where p_j(x) is the probability that j allocates x locally, π_j is the routing policy of j, and Q_j(s_x, k) is the allocation probability of x when forwarded onward by j to neighbor k.

Learning Task Routing (cont.)
Q_i(s_x, j) is the expected probability that task x will be allocated if agent i forwards it to its neighbor j.
Q-learning is used to update the value function.
FAL is used to learn the task routing policy.
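A sketch of how agent i might compute this reward and update its routing values, assuming neighbor j reports its local allocation probability, routing policy, and onward estimates (the message format is hypothetical).

```python
# Since the reward is itself a probability estimate, a simple exponential
# average Q <- Q + alpha * (r - Q) is used here; an illustrative choice.
ALPHA = 0.1
Q_route = {}                   # Q_route[(state, neighbor)] -> allocation probability

def routing_update(s_x, j, p_local, pi_j, q_j):
    """r(s_x, j) = p_local + (1 - p_local) * sum_k pi_j[k] * q_j[k]."""
    onward = sum(pi_j[k] * q_j[k] for k in pi_j)
    reward = p_local + (1.0 - p_local) * onward
    old = Q_route.get((s_x, j), 0.0)
    Q_route[(s_x, j)] = old + ALPHA * (reward - old)
```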

Dual Exploration
[Diagram: task x travels from source s to destination d via agents i and j. Forward exploration updates Q_i(s_x, j) using reward r(s_x, j); backward exploration updates Q_j(s_x, i) using reward r(s_x, i).]
[Kumar and Miikkulainen, 1999]

Experiments: Compared Approaches
Distributed approaches
A centralized approach:
– uses a best-first algorithm with a global view
– ignores communication delay
– sometimes generates an optimal allocation

Experimental Setup
Cluster network with heterogeneous clusters and heterogeneous computing nodes (1,024 nodes in total).
Four types of tasks: ordinary, IO-intensive, compute-intensive, and demanding.
Two task arrival patterns: light load and heavy load.

Experimental Result: Light Load

Experimental Result: Heavy Load

Summary
This paper presents a multi-agent learning (MAL) approach to resource sharing in cluster networks, with the goal of building large computing infrastructures.
Experimental results are encouraging.
This work suggests that MAL may be a promising approach to online optimization problems in distributed systems.