Decentralised load balancing in closed and open systems
A. J. Ganesh, University of Bristol
Joint work with S. Lilienthal, D. Manjunath, A. Proutiere and F. Simatos

Model
Fixed set of $m$ servers.
Closed system: fixed set of $n$ clients.
Open system: clients arrive according to independent Poisson processes of rates $\lambda_1,\dots,\lambda_m$.
Exponential job sizes, i.i.d. with unit mean.
Service rates are $\mu_1,\dots,\mu_m$.
Processor-sharing service discipline.

Objective
Closed system: balance the server loads.
Open system: maximise throughput and minimise delay.
Seek decentralised algorithms: a client can sample an arbitrary server and decide whether to move based on the loads at its current and sampled servers.

Motivation
Dynamic spectrum access in wireless: servers are channels.
Multipath TCP or dynamic routing: servers are routes.
Route choice in transport networks: servers are routes.
All are examples of congestion games; the question is the time to reach a Nash equilibrium.

Algorithm 1: Random local search (RLS)
Each client samples servers uniformly at random, at the points of an independent unit-rate Poisson process.
It moves if the move would strictly improve its individual service rate (the rate of the server divided by its load).
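A minimal simulation sketch of RLS in a closed system, assuming exactly the rule stated above: a client at server $i$ moves to a sampled server $j$ only if $\mu_j/(N_j+1) > \mu_i/N_i$. The function name `simulate_rls` and its arguments are hypothetical, chosen for illustration; this is not the authors' code.

```python
import random


def simulate_rls(loads, rates, t_max, seed=0):
    """Hypothetical sketch of Random Local Search (RLS) in a closed system.

    loads[i] is the number of clients at server i, rates[i] its service rate.
    Each of the n clients wakes at the points of its own unit-rate Poisson
    process, so wake-ups occur at total rate n; the waking client samples a
    server uniformly at random and moves only if that strictly improves its
    individual service rate (server rate divided by server load).
    """
    rng = random.Random(seed)
    loads = list(loads)
    m, n = len(loads), sum(loads)
    t = 0.0
    while True:
        t += rng.expovariate(n)  # time of next wake-up among n unit-rate clocks
        if t > t_max:
            return loads
        # A uniformly chosen client sits at server i with probability loads[i] / n.
        i = rng.choices(range(m), weights=loads)[0]
        j = rng.randrange(m)  # sampled server, uniform over all m servers
        if j != i and rates[j] / (loads[j] + 1) > rates[i] / loads[i]:
            loads[i] -= 1
            loads[j] += 1


# Usage: 20 clients start on one of 4 identical servers.
print(simulate_rls([20, 0, 0, 0], [1.0] * 4, t_max=50.0))
```

With identical servers a move from load $A$ to load $B$ is accepted only when $B \le A-2$, so once the loads differ by at most 1 no further moves occur.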

Algorithm 2: Random load-oblivious (RLO)
Clients are impatient and simply perform independent random walks over the servers until they leave.
The random walk is a continuous-time Markov chain with rate matrix $Q$ and invariant distribution $\pi > 0$.
Moves are oblivious of server load.
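The following sketch follows a single client's load-oblivious walk, ignoring departures, and records how long it spends at each server. For an irreducible chain these time fractions approach $\pi$. The function `walk_occupancy` and the uniform example rate matrix are hypothetical illustrations.

```python
import random


def walk_occupancy(Q, start, t_max, seed=0):
    """Hypothetical sketch of one client's load-oblivious (RLO) random walk.

    Q[i][j] (i != j) is the jump rate from server i to server j; diagonal
    entries are ignored. Server loads never enter the computation, which is
    what makes the walk oblivious. Returns the fraction of [0, t_max] spent
    at each server.
    """
    rng = random.Random(seed)
    m = len(Q)
    time_at = [0.0] * m
    state, t = start, 0.0
    while True:
        out = [Q[state][j] if j != state else 0.0 for j in range(m)]
        hold = rng.expovariate(sum(out))  # exponential holding time at `state`
        if t + hold >= t_max:
            time_at[state] += t_max - t
            return [x / t_max for x in time_at]
        time_at[state] += hold
        t += hold
        state = rng.choices(range(m), weights=out)[0]  # jump proportional to rates


# Usage: unit rates between 3 servers, so pi is uniform (1/3 each).
Q = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
print(walk_occupancy(Q, start=0, t_max=10_000.0))
```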

Related work: a synchronous model
Berenbrink et al. (2005): at each time step, each client picks a server at random.
If the load at its current server is $A$ and the load at the new server is $B < A$, it moves with probability $(A-B)/A$.
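One round of this synchronous rule can be sketched as follows. It assumes that "at random" means uniformly over all servers and that every client decides using the loads at the start of the round. Both are assumptions, and `synchronous_round` is a hypothetical name.

```python
import random


def synchronous_round(loads, rng):
    """Hypothetical sketch of one round of the synchronous model above.

    Every client samples a server (assumed uniform). A client whose current
    server has load A and whose sampled server has load B < A moves with
    probability (A - B) / A. All decisions use start-of-round loads (assumed).
    """
    m = len(loads)
    new_loads = list(loads)
    for i, a in enumerate(loads):
        for _ in range(a):  # each of the a clients at server i decides independently
            j = rng.randrange(m)
            b = loads[j]
            if b < a and rng.random() < (a - b) / a:
                new_loads[i] -= 1
                new_loads[j] += 1
    return new_loads


# Usage: 40 clients on one of 4 servers, 20 synchronous rounds.
rng = random.Random(1)
loads = [40, 0, 0, 0]
for _ in range(20):
    loads = synchronous_round(loads, rng)
print(loads)
```

Decisions are made simultaneously, so several clients can pile onto the same lightly loaded server in one round. The move probability $(A-B)/A$ damps this overshoot.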

Previous results: closed systems
Asynchronous model: expected time to reach load balance is $O(m^2)$ (Goldberg, 2004).
Synchronous model: expected time to reach balance is $O(\log\log m + n^4)$ (Berenbrink et al., 2005) and $O(\log m + n\log n)$ (Berenbrink et al., 2007).

Our results
Closed systems: time to reach perfect balance is $O(m^2\log(m)/n + \log^2 m)$; time to reach $\varepsilon$-balance is $O(\log(m)/\varepsilon)$.
Open systems: both RLS and RLO are throughput maximising, i.e. the system is stable whenever $\sum_i \lambda_i < \sum_i \mu_i$.

Notation and definitions
$N(t) = (N_1(t),\dots,N_m(t))$: number of clients at servers $1,\dots,m$ at time $t$.
$N(t)$ is balanced if $|N_i(t) - N_j(t)| \le 1$ for all $i$ and $j$.
$N(t)$ is $\varepsilon$-balanced if $(1-\varepsilon)p \le N_i(t) \le (1+\varepsilon)p$ for all $i$, where $p = n/m$.
$\tau$ = time to reach balance; $\tau_\varepsilon$ = time to reach $\varepsilon$-balance.
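The two balance conditions translate directly into predicates on a load vector. This is a small sketch; the function names are hypothetical.

```python
def is_balanced(loads):
    """N is balanced if |N_i - N_j| <= 1 for all i, j."""
    return max(loads) - min(loads) <= 1


def is_eps_balanced(loads, eps):
    """N is eps-balanced if (1 - eps) p <= N_i <= (1 + eps) p for all i, with p = n / m."""
    p = sum(loads) / len(loads)
    return all((1 - eps) * p <= x <= (1 + eps) * p for x in loads)


print(is_balanced([3, 4, 3, 4]), is_eps_balanced([9, 10, 11], 0.2))
```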

Notation and definitions (cont.)
$V(t) = \max_j N_j(t)$.
$C_v(t)$ = number of servers with exactly $v$ clients at time $t$.
$B_v(t) = C_{v-1}(t)$.
$A_v(t)$ = number of servers with $v-2$ or fewer clients at time $t$.
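A sketch that computes these quantities from a load vector at a single time instant, taking $v = V(t)$ as in the phase argument. The helper name `phase_statistics` is hypothetical.

```python
from collections import Counter


def phase_statistics(loads):
    """Return (V, C_v, B_v, A_v) for a load vector, with v = V = max load."""
    v = max(loads)
    counts = Counter(loads)
    c_v = counts[v]  # servers with exactly v clients
    b_v = counts[v - 1]  # B_v = C_{v-1}
    a_v = sum(1 for x in loads if x <= v - 2)  # servers with v - 2 or fewer clients
    return v, c_v, b_v, a_v


print(phase_statistics([5, 5, 4, 2, 0]))
```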

Results for closed systems
$E[\tau] = O(m^2\log(m)/n + \log^2 m)$.
$E[\tau_\varepsilon] = O(\log(m)/\varepsilon)$.
$E[\tau] = \Omega(m^2/n + \log m)$.
We are typically interested in $n \gg m$.

Proof (perfect balance)
Previous work used quadratic Lyapunov functions; we use $V(t)$ as the Lyapunov function.
Say the RLS algorithm is in phase $v$ at time $t$ if $V(t) = v$.
$C_v(t)$ decreases monotonically during phase $v$, and phase $v$ ends when $C_v(t)$ hits 0.

Proof (cont.)
$C_v$ decreases by 1 when one of the $vC_v$ clients at a maximally loaded server samples one of the $A_v$ servers with $v-2$ or fewer clients.
This happens at rate $vC_vA_v/m$.
Lower bound for $A_v$: no more than $n/(v-1)$ servers can have $v-1$ or more clients, so $A_v \ge m - n/(v-1)$.
This implies an upper bound on the mean time for $C_v$ to decrease by 1, and hence for $V$ to decrease by 1.
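Written out, the counting argument gives the bound below on the expected length of phase $v$. This is our own sketch of the step, not the slide's derivation. It is valid only while $v - 1 > n/m$, so that the lower bound on $A_v$ is positive. It uses two facts: $C_v \le m$ at the start of the phase, and since the rate of the next decrease never falls below the bound, the expected wait for each decrease is at most the reciprocal of that bound. How the phases with $v - 1 \le n/m$ are handled is not shown on the slide.

```latex
\begin{aligned}
A_v(t) &\ge m - \frac{n}{v-1}, \\
\text{rate at which } C_v \text{ falls from } c \text{ to } c-1
  &= \frac{v\,c\,A_v}{m} \;\ge\; \frac{v\,c}{m}\Big(m - \frac{n}{v-1}\Big), \\
\mathbb{E}[\text{length of phase } v]
  &\le \sum_{c=1}^{m} \frac{m}{v\,c\,\big(m - n/(v-1)\big)}
   = \frac{m\,H_m}{v\,\big(m - n/(v-1)\big)},
  \qquad H_m = \sum_{c=1}^{m} \frac{1}{c} \le 1 + \ln m .
\end{aligned}
```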

Proof ($\varepsilon$-balance)
The proof involves counting the $\varepsilon$-balanced, underloaded and overloaded servers and the clients at overloaded servers. These counts are then used to bound the expected time until one such client moves to an underloaded or $\varepsilon$-balanced server.
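The counting step starts from a classification like the one below, relative to $p = n/m$. The helper `classify_servers` is a hypothetical illustration of the quantities being counted, not part of the proof.

```python
def classify_servers(loads, eps):
    """Hypothetical helper: split servers into underloaded, eps-balanced and
    overloaded relative to p = n / m, and count clients at overloaded servers."""
    p = sum(loads) / len(loads)
    under = [i for i, x in enumerate(loads) if x < (1 - eps) * p]
    over = [i for i, x in enumerate(loads) if x > (1 + eps) * p]
    balanced = [i for i in range(len(loads)) if i not in under and i not in over]
    clients_at_over = sum(loads[i] for i in over)
    return under, balanced, over, clients_at_over


print(classify_servers([2, 10, 10, 18], eps=0.2))
```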

Stability results for open systems
If $\sum_i \lambda_i < \sum_i \mu_i$, the system is stable under both the RLO and RLS policies.

Proof of stability for RLO algorithm
The proof uses Foster’s criterion, with the total number of clients in the system as the Lyapunov function.
Denote by $|x|$ the $L_1$-norm of a vector $x$. Then $|N(t)|$ is the total number of clients in the system at time $t$, $|\lambda|$ is the total arrival rate and $|\mu|$ is the maximum (total) service rate.

Foster’s criterion
Suppose there exist $K$, $\varepsilon > 0$ and $t > 0$ such that $E_n[|N(t)| - |n|] \le -\varepsilon$ for all states $n$ with $|n| > K$. Then $N(t)$ is ergodic.

Bounding the drift
$E_n[|N(t)| - |n|] = t|\lambda| - \sum_i \mu_i E[Y_i(t)]$, where $Y_i(t)$ is the amount of time up to $t$ during which server $i$ is non-idle (has at least one client).
If $E[Y_i(t)]$ is very nearly equal to $t$ for every $i$, Foster’s criterion follows from the condition $|\lambda| < |\mu|$.
So we need a lower bound on $Y_i(t)$ to get an upper bound on the drift.
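To see why a near-$t$ lower bound on $E[Y_i(t)]$ suffices, suppose, as an illustrative assumption, that $E[Y_i(t)] \ge t - \eta$ for every server $i$ and some $\eta \ge 0$, where $|\mu| = \sum_i \mu_i$. The drift identity then gives:

```latex
\begin{aligned}
E_n\big[|N(t)| - |n|\big]
  &= t|\lambda| - \sum_i \mu_i\, E[Y_i(t)]
   \;\le\; t|\lambda| - \sum_i \mu_i\,(t - \eta) \\
  &= -\,t\big(|\mu| - |\lambda|\big) + \eta\,|\mu| .
\end{aligned}
```

The right-hand side equals $-\varepsilon$ with $\varepsilon = t(|\mu| - |\lambda|) - \eta|\mu|$. This is positive whenever $\eta < t(|\mu| - |\lambda|)/|\mu|$, which is achievable because $|\lambda| < |\mu|$.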

Bounding the idle time
Clients perform independent random walks on the system, but don’t leave.
Independent rate-$\mu_i$ Poisson processes generate ‘virtual’ services at the servers.
If the number of clients at server $i$ at time $t$ is more than the total number of virtual services at all servers on $[0,t]$, then queue $i$ has to be non-empty at time $t$, since at most that many clients can have departed.

Bounding the idle time (cont.)
Suppose $|n|$ is large.
The Markov chain describing the random walks reaches equilibrium in constant time.
From this time on, the number of clients at each server is $\Theta(|n|)$, while the number of virtual services is $O(1)$.
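A Monte Carlo sketch of the sufficient condition on one sample path. It assumes a hypothetical walk in which each client jumps at rate 1 to a uniformly chosen server, possibly its own, and it assumes round-robin initial positions. Neither choice comes from the slides. `poisson_sample` and `sufficient_condition_holds` are hypothetical helpers.

```python
import math
import random


def poisson_sample(rng, mean):
    """Knuth's multiplication method; adequate for the small means used here."""
    threshold = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def sufficient_condition_holds(num_clients, rates, t, seed=0):
    """Hypothetical check of the idle-time argument on one sample path.

    Clients never leave and jump at rate 1 to a uniform server (assumed), so a
    client that has jumped at least once by time t is at a uniform server.
    Virtual services at server i form a rate-rates[i] Poisson process.
    Returns True if every server holds more clients than the total number of
    virtual services on [0, t].
    """
    rng = random.Random(seed)
    m = len(rates)
    counts = [0] * m
    p_jumped = 1.0 - math.exp(-t)  # P(a rate-1 clock has rung by time t)
    for c in range(num_clients):
        start = c % m  # round-robin initial positions (arbitrary choice)
        pos = rng.randrange(m) if rng.random() < p_jumped else start
        counts[pos] += 1
    virtual = sum(poisson_sample(rng, mu * t) for mu in rates)
    return all(x > virtual for x in counts)


print(sufficient_condition_holds(3000, [1.0, 1.0, 1.0], t=1.0))
```

With many clients, each server holds on the order of $|n|/m$ walkers, while the virtual-service count stays small. This mirrors the $\Theta(|n|)$ versus $O(1)$ comparison above.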

Proof of stability for RLS algorithm
The proof uses a slightly different Lyapunov function, $f(n) = |n| + \delta\, k(n)$ for suitably small $\delta > 0$, where $k(n)$ is the number of empty servers in state $n$.
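Evaluating this Lyapunov function on a state vector is straightforward. In the sketch below, `delta` stands in for the small constant and the name `rls_lyapunov` is hypothetical.

```python
def rls_lyapunov(n, delta):
    """f(n) = |n| + delta * k(n): total clients plus delta times the number
    of empty servers, for a small hypothetical constant delta > 0."""
    total_clients = sum(n)  # |n|, the L1-norm of the state
    empty_servers = sum(1 for x in n if x == 0)  # k(n)
    return total_clients + delta * empty_servers


print(rls_lyapunov([3, 0, 2, 0], delta=0.5))
```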

Performance estimates in open systems
Consider large-$m$ asymptotics.
$X^m_k(t)$: proportion of servers with exactly $k$ clients at time $t$.
$X^m(t)$ evolves as a density-dependent Markov process.
By Kurtz’s theorem, its evolution converges to the solution of a deterministic differential equation over finite time horizons.
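The state descriptor that Kurtz’s theorem is applied to is the occupancy profile $X^m(t) = (X^m_0(t), X^m_1(t), \dots)$. The slides do not give the limiting differential equation, so this sketch only computes the profile from a load vector at one instant. The truncation at `k_max` and the function name are hypothetical.

```python
def occupancy_profile(loads, k_max):
    """Empirical X^m_k: fraction of the m servers holding exactly k clients, k = 0..k_max."""
    m = len(loads)
    return [sum(1 for x in loads if x == k) / m for k in range(k_max + 1)]


print(occupancy_profile([0, 1, 1, 3], k_max=3))
```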

Performance estimates in open systems (cont.)
Idea: look at the equilibrium points of the deterministic dynamics.
If there is a unique stable equilibrium, we expect the stochastic dynamics to live in the vicinity of this equilibrium.
Use the equilibrium to derive performance measures.

In more detail …
Kurtz’s theorem only applies over finite time horizons, so it doesn’t tell us about long-term behaviour.
We can get around this by using propagation-of-chaos techniques developed by Sznitman.

Numerical results
Asymptotic estimates are quite accurate even for small $m$, say $m = 10$.
RLO is only a little worse than RLS in terms of mean delay (about 20% worse in the parameter range considered).

Conclusions
Random local search balances loads very quickly in closed systems: the time is polylogarithmic in the number of servers (for $n$ of order $m^2$ or larger).
Impatience is a virtue: impatient customers help to balance load and achieve resource pooling, even if they migrate obliviously of load.

Open problems
We have assumed that all clients can use all servers and can move between any pair of servers.
What if clients can only move from a server to its neighbours in some graph?
What if clients are of different types, and each type can only use a subset of the servers?

Open problems (cont.)
Suppose clients can only migrate to neighbouring servers in a graph.
Can the time to balance loads be related to the mixing times of random walks on this graph?

Open problems (cont.)
Performance measures in open systems are obtained in terms of equilibrium points of a differential equation.
Is there perfect resource pooling in the heavy-traffic limit?
Can we get tail bounds on delays?
What if clients can use multiple servers simultaneously?