Published by Gwenda Hopkins. Modified over 9 years ago.
1
Decentralised load balancing in closed and open systems. A. J. Ganesh, University of Bristol. Joint work with S. Lilienthal, D. Manjunath, A. Proutiere and F. Simatos.
2
Model. Fixed set of m servers. Closed system: fixed set of n clients. Open system: clients arrive according to independent Poisson processes of rates λ_1, …, λ_m. Exponential job sizes, iid with unit mean. Service rates are μ_1, …, μ_m. Processor sharing service discipline.
3
Objective. Closed system: balance the server loads. Open system: maximise throughput, minimise delay. Seek decentralised algorithms: a client can sample an arbitrary server and decide to move based on the loads at its current and sampled servers.
4
Motivation. Dynamic spectrum access in wireless: servers are channels. Multipath TCP or dynamic routing: servers are routes. Route choice in transport networks: servers are routes. All are examples of congestion games: of interest is the time to reach Nash equilibrium.
5
Algorithm 1: Random local search (RLS). Clients pick servers uniformly at random, at the points of independent unit-rate Poisson processes. A client moves to the sampled server if doing so would strictly improve its individual service rate (= rate of server divided by its load).
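The RLS rule can be illustrated with a small event-driven simulation of the closed system. This is only a sketch under stated assumptions: the function names `is_equilibrium` and `simulate_rls`, the seed, and the event cap are hypothetical and not from the slides. The superposition of the n unit-rate client clocks is simulated as a single rate-n clock, the acting client is chosen uniformly (so its server is picked with probability proportional to its load), and the comparison of service rates is cross-multiplied to avoid division.

```python
import random

def is_equilibrium(loads, rates):
    """True if no client can strictly raise its service rate by moving."""
    for i, ni in enumerate(loads):
        if ni == 0:
            continue
        for j, nj in enumerate(loads):
            # Moving i -> j improves iff rates[j]/(nj+1) > rates[i]/ni.
            if j != i and rates[j] * ni > rates[i] * (nj + 1):
                return False
    return True

def simulate_rls(loads, rates, seed=0, max_events=1_000_000):
    """Run RLS until no improving move exists; return (elapsed time, loads)."""
    rng = random.Random(seed)
    loads = list(loads)
    n, m = sum(loads), len(loads)
    t = 0.0
    for _ in range(max_events):
        if is_equilibrium(loads, rates):
            break
        # n independent unit-rate clocks merge into one clock of rate n.
        t += rng.expovariate(n)
        # The client whose clock rang is uniform over clients, so its
        # server i is chosen with probability loads[i] / n.
        i = rng.choices(range(m), weights=loads)[0]
        j = rng.randrange(m)  # sampled server, uniform over all m servers
        if j != i and rates[j] * loads[i] > rates[i] * (loads[j] + 1):
            loads[i] -= 1
            loads[j] += 1
    return t, loads
```

For example, `simulate_rls([12, 0, 0, 0], [1.0] * 4)` starts with all 12 clients on one of four identical servers and runs until the loads differ by at most one.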
6
Algorithm 2: Random load oblivious (RLO). Clients are impatient and simply perform independent random walks over the servers until they leave. The random walk is described by a continuous-time Markov chain with rate matrix Q and invariant distribution π > 0. Moves are oblivious of server load.
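A rough sketch of RLO in the open system follows, written as a standard event-by-event (Gillespie-style) simulation. The function name `simulate_rlo` and the parameter names `lam`, `mu`, `Q`, `horizon` are assumptions made for illustration. It relies on the model stated earlier: with unit-mean exponential jobs and processor sharing, a non-empty server i completes jobs at total rate mu[i], and each client at server i jumps to server j at rate Q[i][j], regardless of load.

```python
import random

def simulate_rlo(lam, mu, Q, horizon, seed=0):
    """Simulate RLO from an empty system; return the loads at time `horizon`."""
    rng = random.Random(seed)
    m = len(lam)
    loads = [0] * m
    t = 0.0
    while True:
        events, weights = [], []
        for i in range(m):
            events.append(("arrive", i, i))
            weights.append(lam[i])
            if loads[i] > 0:
                # Processor sharing with exponential unit-mean jobs:
                # a busy server completes jobs at total rate mu[i].
                events.append(("depart", i, i))
                weights.append(mu[i])
                for j in range(m):
                    if j != i and Q[i][j] > 0:
                        # Each of the loads[i] clients jumps i -> j at rate Q[i][j].
                        events.append(("move", i, j))
                        weights.append(loads[i] * Q[i][j])
        t += rng.expovariate(sum(weights))
        if t > horizon:
            return loads
        kind, i, j = rng.choices(events, weights=weights)[0]
        if kind == "arrive":
            loads[i] += 1
        elif kind == "depart":
            loads[i] -= 1
        else:  # load-oblivious move from server i to server j
            loads[i] -= 1
            loads[j] += 1
```

With two servers, `Q = [[-1.0, 1.0], [1.0, -1.0]]` makes every client hop to the other server at unit rate.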
7
Related work: a synchronous model. Berenbrink et al. (2005): at each time step, each client picks a server at random. If the load at the current server is A and at the new server is B < A, then the client moves with probability (A − B)/A.
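One round of the synchronous rule might look as follows. This is a sketch, not the authors' code: the name `synchronous_step` and the representation of the state as a list `assignment` (client index to server index) are assumptions. All clients decide on the basis of the loads at the start of the round, which is what makes the model synchronous.

```python
import random

def synchronous_step(assignment, m, rng):
    """One synchronous round: every client samples a server simultaneously."""
    loads = [0] * m
    for server in assignment:
        loads[server] += 1
    new_assignment = list(assignment)
    for client, current in enumerate(assignment):
        candidate = rng.randrange(m)
        a, b = loads[current], loads[candidate]  # loads at start of round
        # Move only towards a strictly lighter server, with probability (A - B)/A.
        if b < a and rng.random() < (a - b) / a:
            new_assignment[client] = candidate
    return new_assignment
```

Calling `synchronous_step([0, 0, 0, 0], 2, random.Random(0))` performs one round for four clients that all start on server 0 of two.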
8
Previous results: closed systems. Expected time to reach load balance in the asynchronous model is O(m^2): Goldberg (2004). Expected time to reach balance in the synchronous model is O(log log(m) + n^4): Berenbrink et al. (2005); O(log(m) + n log(n)): Berenbrink et al. (2007).
9
Our results. Closed systems: time to reach perfect balance is O(m^2 log(m)/n + log^2(m)); time to reach ε-balance is O(log(m)/ε). Open systems: both RLS and RLO are throughput maximising, i.e. the system is stable whenever Σ_i λ_i < Σ_i μ_i.
10
Notation and definitions. N(t) = (N_1(t), …, N_m(t)): number of clients at servers 1, …, m at time t. N(t) is balanced if |N_i(t) − N_j(t)| ≤ 1 for all i and j. N(t) is ε-balanced if (1 − ε)p ≤ N_i(t) ≤ (1 + ε)p for all i, where p = n/m. τ = time to reach balance; τ_ε = time to reach ε-balance.
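The two balance conditions translate directly into predicates on the load vector. The function names below are hypothetical; the conditions are the ones on this slide (loads differing by at most one, and loads within a factor 1 ± ε of the mean load p = n/m).

```python
def is_balanced(loads):
    """Perfect balance: any two server loads differ by at most 1."""
    return max(loads) - min(loads) <= 1

def is_eps_balanced(loads, eps):
    """eps-balance: every load lies in [(1 - eps) * p, (1 + eps) * p], p = n/m."""
    p = sum(loads) / len(loads)
    return all((1 - eps) * p <= x <= (1 + eps) * p for x in loads)
```

For instance, the load vector `[9, 10, 11]` is not perfectly balanced, but it is ε-balanced for ε = 0.2, since p = 10 and every load lies in [8, 12].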
11
Notation and definitions. V(t) = max_j N_j(t). C_v(t) = number of servers with exactly v clients at time t. B_v(t) = C_{v−1}(t). A_v(t) = number of servers with v − 2 or fewer clients at time t.
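As a quick illustration of these quantities, the helper below computes them for a given load vector, taking v to be the current maximum load. The name `phase_stats` and the dictionary layout are assumptions for the sketch.

```python
from collections import Counter

def phase_stats(loads):
    """Return V, C_v, B_v and A_v for v = V = max load."""
    v = max(loads)
    counts = Counter(loads)
    return {
        "V": v,
        "C_v": counts[v],        # servers with exactly v clients
        "B_v": counts[v - 1],    # servers with exactly v - 1 clients
        "A_v": sum(1 for x in loads if x <= v - 2),  # v - 2 or fewer clients
    }
```

When v is the maximum load, every server falls into exactly one of the three groups, so C_v + B_v + A_v = m.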
12
Results for closed systems. Expected time to reach perfect balance: E[τ] = O(m^2 log(m)/n + log^2(m)). Expected time to reach ε-balance: E[τ_ε] = O(log(m)/ε). Lower bound: E[τ] = Ω(m^2/n + log(m)). Typically interested in n >> m.
13
Proof (perfect balance). Previous work used quadratic Lyapunov functions. We use V(t) as Lyapunov function. Say the RLS algorithm is in phase v at time t if V(t) = v. C_v(t) decreases monotonically during phase v. Phase v ends when C_v(t) hits 0.
14
Proof (cont.). C_v decreases by 1 when one of the v C_v clients at a maximally loaded server samples one of the A_v servers with v − 2 or fewer clients. This happens at rate v C_v A_v / m. Lower bound for A_v: no more than n/(v − 1) servers can have v − 1 or more clients. This implies an upper bound on the mean time for C_v to decrease by 1, and hence for V to decrease by 1.
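The step from the rate to the time bound can be written out as follows. This is a sketch of the arithmetic only, assuming equal service rates and m > n/(v − 1); the symbols T_c (time for C_v to drop from c to c − 1), c_0 (value of C_v at the start of phase v) and H (harmonic number) are introduced here for illustration and do not appear on the slides.

```latex
% At most n/(v-1) servers hold v-1 or more clients, so
\[
  A_v \;\ge\; m - \frac{n}{v-1}.
\]
% While C_v = c, the rate of decrease is v c A_v / m, hence
\[
  \mathbb{E}[T_c] \;\le\; \frac{m}{v\,c\,\bigl(m - \tfrac{n}{v-1}\bigr)}.
\]
% Summing over c = 1, ..., c_0 bounds the mean duration of phase v:
\[
  \mathbb{E}[\text{duration of phase } v]
  \;\le\; \frac{m}{v\,\bigl(m - \tfrac{n}{v-1}\bigr)} \sum_{c=1}^{c_0} \frac{1}{c}
  \;=\; \frac{m\,H_{c_0}}{v\,\bigl(m - \tfrac{n}{v-1}\bigr)}.
\]
```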
15
Proof (ε-balance). Involves counting the number of ε-balanced, underloaded and overloaded servers, and the number of clients at overloaded servers, and using these to bound the expected time till one such client moves to an underloaded or ε-balanced server.
16
Stability results for open systems. If Σ_i λ_i < Σ_i μ_i, then the system is stable under both the RLO and RLS policies.
17
Proof of stability for RLO algorithm. Proof uses Foster’s criterion, with the total number of clients in the system as Lyapunov function. Denote by |x| the L_1-norm of vector x. |N(t)| is the total number of clients in the system at time t; |λ| is the total arrival rate; |μ| is the maximum service rate.
18
Foster’s criterion. Suppose there exist K, ε > 0 and t > 0 such that E_n[|N(t)| − |n|] ≤ −ε whenever |n| > K. Then N(t) is ergodic.
19
Bounding the drift. E_n[|N(t)| − |n|] = |λ| t − Σ_i μ_i E[Y_i(t)], where Y_i(t) is the amount of time up to time t that server i is non-idle (has at least 1 client). If E[Y_i(t)] is very nearly equal to t, then we have Foster’s criterion from the condition |λ| < |μ|. Need a lower bound on Y_i(t) to get an upper bound on the drift.
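To make "very nearly equal to t" concrete, here is a short worked step. It is a sketch assuming the drift identity on this slide, with |λ| the total arrival rate and |μ| the sum of the service rates; the slack parameter δ is introduced here for illustration.

```latex
% Suppose E[Y_i(t)] >= (1 - \delta) t for every server i. Then
\[
  \mathbb{E}_n\bigl[|N(t)| - |n|\bigr]
  \;=\; |\lambda|\,t - \sum_i \mu_i\,\mathbb{E}[Y_i(t)]
  \;\le\; \bigl(|\lambda| - (1-\delta)\,|\mu|\bigr)\,t ,
\]
% which is strictly negative as soon as
\[
  \delta \;<\; 1 - \frac{|\lambda|}{|\mu|} ,
\]
% and such a \delta > 0 exists exactly when |\lambda| < |\mu|.
```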
20
Bounding the idle time. Clients perform independent random walks on the system, but don’t leave. Independent rate-μ_i Poisson processes of ‘virtual’ services at servers. If the number of clients at server i at time t is more than the total number of virtual services at all servers on [0, t], then queue i has to be non-empty at time t.
21
Bounding the idle time (cont.). Suppose |n| is large. The Markov chain describing the random walks reaches equilibrium in constant time. The number of clients at each server is Ω(|n|) from this time. The number of virtual services is O(1).
22
Proof of stability for RLS algorithm. Uses a slightly different Lyapunov function: f(n) = |n| + ε k(n) for suitably small ε > 0, where k(n) is the number of empty servers in state n.
23
Performance estimates in open systems. Consider large m asymptotics. X_k^m(t): proportion of servers with exactly k clients at time t. X^m(t) evolves as a density dependent Markov process. By Kurtz’s theorem, its evolution converges to the solution of a deterministic differential equation over finite time horizons.
24
Performance estimates in open systems (cont.). Idea: look at equilibrium points of the deterministic dynamics. If there is a unique stable equilibrium, we expect that the stochastic dynamics will live in the vicinity of this equilibrium. Use the equilibrium to derive performance measures.
25
In more detail … Kurtz’s theorem only applies for finite time horizons, so it doesn’t tell us about long-term behaviour. Can get around this by using propagation of chaos techniques developed by Sznitman.
26
Numerical results. Asymptotic estimates are pretty accurate even for small m, say m = 10. RLO is only a little bit worse than RLS in terms of mean delay (about 20% worse in the parameter range considered).
27
Conclusions. Random local search balances loads very quickly in closed systems: polylog in the number of servers. Impatience is a virtue: impatient customers help to balance load and achieve resource pooling, even if they migrate oblivious of load.
28
Open problems. Have assumed all clients can use all servers, and also that they can move between any pair of servers. What if clients can only move from a server to its neighbours in some graph? What if clients are of different types, and each type can only use a subset of the servers?
29
Open problems. Suppose clients can only migrate to neighbouring servers in a graph. Can the time to balance loads be related to mixing times of random walks on this graph?
30
Open problems. Performance measures in open systems are obtained in terms of equilibrium points of a differential equation. Is there perfect resource pooling in the heavy traffic limit? Can we get tail bounds on delays? What if clients can use multiple servers simultaneously?