Distributed Computing 9. Sorting - a lower bound on bit complexity Shmuel Zaks ©
Michael C. Loui, The Complexity of Sorting on Distributed Systems, 1984
The problem
Model
- Asynchronous network
- N identical processors on a ring
- Known topology
- No failures
- Bounded memory O(logN)
- Bounded message size O(logN)
Example: N=8, L=100
[Figure: ring of processors p0, ..., p7 before and after sorting.]
Initial values: IV(0)=45, IV(1)=29, IV(2)=34, IV(3)=90, IV(4)=8, IV(5)=28, IV(6)=16, IV(7)=4
Final values: FV(0)=34, FV(1)=45, FV(2)=90, FV(3)=4, FV(4)=8, FV(5)=16, FV(6)=28, FV(7)=29
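The example above can be reproduced with a short Python sketch. It assumes the final values are the sorted initial values laid out cyclically around the ring, starting at a base processor. The function name `final_values` and the choice `base=3` are illustrative assumptions read off the figure (the smallest value, 4, ends up at p3); they are not stated in the slides.

```python
def final_values(iv, base):
    """Place the sorted initial values around the ring, starting at `base`.

    iv   -- list of initial values, iv[i] held by processor i
    base -- index of the processor that receives the smallest value (assumed)
    """
    n = len(iv)
    fv = [None] * n
    for rank, value in enumerate(sorted(iv)):
        # The value of rank r goes to the processor r steps after the base.
        fv[(base + rank) % n] = value
    return fv


iv = [45, 29, 34, 90, 8, 28, 16, 4]   # IV(0), ..., IV(7) from the example
fv = final_values(iv, base=3)
print(fv)
```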
Upper bound
Simple algorithm
Program for i:
Phase 1: Select a leader
Phase 2: if leader then send Sorted({IV(i)})
         else [ receive Sorted({IV(leader), IV(leader+1), …, IV(i-1)})
                send Sorted({IV(leader), IV(leader+1), …, IV(i)}) ]
Phase 3: receive Sorted(S)
         FV(i) = min(S)
         send Sorted(S \ {FV(i)})
Phase 1: find a leader. This will be the base b=0.
Phase 2: b sends its value. Each processor i, at its turn, receives the sorted values IV(0), …, IV(i-1) and sends the sorted values IV(0), …, IV(i).
Phase 3: each i (starting at i=0) receives the suffix of length N-i of the sorted list, and sends the suffix of length N-(i+1) of the sorted list.
At the end each i holds the (i+1)-st value.
Phase 1: O(N logN) messages of size O(logL) each
Phase 2: i sends i+1 messages of size O(logL)
Phase 3: i sends N-(i+1) messages of size O(logL)
Hence, in phases 2 and 3 together, i sends N messages of size O(logL)
Messages: O(N logN) + O(N^2) = O(N^2)
Bits: O(N^2 logL)
Time: O(N)
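The message counts for phases 2 and 3 can be checked with a small sequential simulation. This is only a sketch under stated assumptions: Phase 1 is skipped and processor 0 is taken as the leader, each transmitted value counts as one message of O(logL) bits, and the name `simple_ring_sort` is hypothetical.

```python
from bisect import insort


def simple_ring_sort(iv):
    """Simulate phases 2 and 3 of the simple algorithm on a ring.

    Returns (fv, sent), where fv[i] is the final value of processor i and
    sent[i] is the number of values processor i sends in phases 2 and 3.
    """
    n = len(iv)
    fv = [None] * n
    sent = [0] * n

    # Phase 2: processor i adds IV(i) to the sorted list it received
    # and forwards the i+1 sorted values to its neighbour.
    s = []
    for i in range(n):
        insort(s, iv[i])
        sent[i] += len(s)

    # Phase 3: processor i keeps the minimum of the list it received
    # and forwards the remaining N-(i+1) values.
    for i in range(n):
        fv[i] = s[0]
        s = s[1:]
        sent[i] += len(s)

    return fv, sent


fv, sent = simple_ring_sort([45, 29, 34, 90, 8, 28, 16, 4])
print(fv, sent, sum(sent))
```

In this simulation every processor sends exactly N values over the two phases, so the total is N^2, in line with the O(N^2) message count.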
Optimal algorithm
Encode the sorted sequence more efficiently.
Encode a sequence S = (a_1, …, a_k) by (a_1 - a_0, a_2 - a_1, …, a_k - a_{k-1}) (assume a_0 = 0)
For a given S = (a_1, …, a_k), encode (a_1 - a_0, a_2 - a_1, …, a_k - a_{k-1}):
write each a_i - a_{i-1} in binary, and then replace
0 by 00
1 by 10
, by 01
( and ) by 11
E(S) is the resulting binary string.
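The encoding can be sketched in Python as follows. The sketch assumes that the difference sequence is written as a parenthesised, comma-separated list of binary numbers before the symbol substitution, which is one reading of the slide. The names `encode`, `decode` and `SYMBOLS` are hypothetical.

```python
# Two-bit code for each symbol of the written sequence (assumed reading).
SYMBOLS = {"0": "00", "1": "10", ",": "01", "(": "11", ")": "11"}


def encode(sorted_values):
    """Return E(S) for a sorted list of non-negative integers."""
    diffs, prev = [], 0
    for a in sorted_values:
        diffs.append(a - prev)      # a_i - a_{i-1}, with a_0 = 0
        prev = a
    text = "(" + ",".join(format(d, "b") for d in diffs) + ")"
    return "".join(SYMBOLS[ch] for ch in text)


def decode(bits):
    """Invert encode(): recover the sorted list from E(S)."""
    pairs = [bits[j:j + 2] for j in range(0, len(bits), 2)]
    inner = pairs[1:-1]             # drop the two parentheses
    if not inner:
        return []
    values, current, prev = [], "", 0
    for p in inner + ["01"]:        # a trailing comma flushes the last number
        if p == "01":
            prev += int(current, 2)
            values.append(prev)
            current = ""
        else:
            current += "0" if p == "00" else "1"
    return values


code = encode([4, 8, 16])
print(code, len(code), decode(code))
```

Because every symbol costs exactly two bits, the length of E(S) is twice the number of symbols in the written difference sequence.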
Length of E(S) ( Better than O(NlogL) ) (prove)
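One way to bound the length of E(S) is sketched below. It is not necessarily the bound intended on the slide. It assumes k >= 1, values in {0, ..., L}, and that the written sequence consists of k binary numbers, k-1 commas and two parentheses, each symbol costing two bits. Here d_i denotes a_i - a_{i-1}, and \ell(d) denotes the number of binary digits of d; both symbols are introduced only for this sketch.

```latex
\begin{align*}
d_i &= a_i - a_{i-1}, \qquad \sum_{i=1}^{k} d_i = a_k \le L,\\
|E(S)| &= 2\Bigl((k+1) + \sum_{i=1}^{k} \ell(d_i)\Bigr),
  \qquad \ell(d) \le 1 + \log_2(1+d),\\
\sum_{i=1}^{k} \log_2(1+d_i) &\le k \log_2\Bigl(1 + \frac{L}{k}\Bigr)
  \quad\text{(concavity of } \log\text{)},\\
|E(S)| &\le 2\Bigl(2k + 1 + k\log_2\bigl(1 + \tfrac{L}{k}\bigr)\Bigr)
  = O\Bigl(k + k\log\bigl(1 + \tfrac{L}{k}\bigr)\Bigr).
\end{align*}
```

For k = N and L >= 2 this is at most a constant times N logL, and it is much smaller when L is close to N.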
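insert and delete_min of E(S)
[Figure: the difference sequence a_1, a_2 - a_1, …, a_n - a_{n-1} before and after each operation.]
insert b (with a_i <= b <= a_{i+1}): the entry a_{i+1} - a_i is replaced by the two entries b - a_i and a_{i+1} - b
delete_min: the entry a_1 is removed, and the entry a_2 - a_1 is replaced by a_2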
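Both operations can be carried out directly on the difference sequence, without rebuilding the sorted list. The following Python sketch works on a plain list of differences rather than on the bit string E(S), which is a simplification; the function names are hypothetical.

```python
def insert(diffs, b):
    """Insert value b into the sequence represented by its differences."""
    out, prev, placed = [], 0, False
    for d in diffs:
        a = prev + d                 # current value a_{i+1}
        if not placed and b <= a:
            out.append(b - prev)     # b - a_i
            out.append(a - b)        # a_{i+1} - b
            placed = True
        else:
            out.append(d)
        prev = a
    if not placed:                   # b is larger than every value
        out.append(b - prev)
    return out


def delete_min(diffs):
    """Remove the minimum; return (minimum, remaining differences)."""
    a1 = diffs[0]
    rest = diffs[1:]
    if rest:
        rest[0] += a1                # a_2 - a_1 becomes a_2
    return a1, rest


d = [4, 4, 8]                        # represents the sequence 4, 8, 16
print(insert(d, 10))                 # differences of 4, 8, 10, 16
print(delete_min(d))                 # minimum 4, differences of 8, 16
```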
Optimal algorithm
Program for i:
Phase 1: Select a leader
Phase 2: if leader then send Encode({IV(i)})
         else [ receive SortedEncoding({IV(leader), IV(leader+1), …, IV(i-1)})
                send SortedEncoding({IV(leader), IV(leader+1), …, IV(i)}) ]
Phase 3: receive SortedEncoding(S)
         FV(i) = min(S)
         send SortedEncoding(S \ {FV(i)})
Complexity
In phases 2 and 3 each processor sends an encoding of a sequence of length at most k, hence the bit complexity is
In the model the memory is bounded by O(logN), hence L=O(logN), and the message complexity is
Lower bound
Sketch: find many distributions of initial values such that N/4 values will have to travel a distance of at least N/16.
SBS – the set of finite sequences of binary strings in which each component is non-empty
Lemma 1: Fewer than 4^(b+1)/6 sequences in SBS have at most b bits
n(b) – the number of sequences with b bits
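Lemma 1 can be checked by brute force for small b. The sketch below assumes that only non-empty sequences are counted (total length between 1 and b bits). Under that assumption the enumeration gives n(b) = 4^b / 2, and summing over 1, ..., b gives (4^(b+1) - 4)/6, which is below 4^(b+1)/6. The function name is hypothetical.

```python
from itertools import product


def sequences_with_bits(b):
    """All sequences of non-empty binary strings with exactly b bits in total."""
    if b == 0:
        return [()]                          # only the empty sequence
    result = []
    for first_len in range(1, b + 1):        # length of the first component
        for bits in product("01", repeat=first_len):
            first = "".join(bits)
            for rest in sequences_with_bits(b - first_len):
                result.append((first,) + rest)
    return result


total = 0
for b in range(1, 6):
    n_b = len(sequences_with_bits(b))        # n(b)
    total += n_b                             # sequences with 1..b bits
    print(b, n_b, total, 4 ** (b + 1) / 6)
```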
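[Figure: ring of processors p0, ..., p8 holding IV(0), ..., IV(8), split into two parts P1 and P2 by a cut of two links c1 and c2. The messages crossing the cut are labelled (c1, m1), (c1, m2), (c2, m3), (c1, m4).]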
[Figure: labels S1, S2 and p.]
Example: N=17, L=51 (R=3)
[Figure: ring of processors p0, ..., p16, each labelled with one block of three consecutive values: 0,1,2; 3,4,5; 6,7,8; 9,10,11; 12,13,14; 15,16,17; 18,19,20; 21,22,23; 24,25,26; 27,28,29; 30,31,32; 33,34,35; 36,37,38; 39,40,41; 42,43,44; 45,46,47; 48,49,50.]
If b is chosen as the base, then the destination of IV(p) is
Pigeonhole principle
If you put N pigeons in k holes, then at least one hole will contain at least N/k pigeons.
If the sum of k numbers is N, then at least one of them is at least N/k.
[Figure: the ring with the processors q, r, s, t marked and the two parts p1 and p2.]
q = 15, r = 1, s = 3, t = 13
[Figure: the ring of the N=17 example with the processors r, s, q, t marked.]
[Figure: the two parts P1 and P2 of the ring.]
Summing over all cuts:
Hence there exists a distribution d* such that
Namely, there exists a distribution such that the total number of bits sent over the cut c is
Notes:
- The proof also applies to synchronous systems.
- The memory at each processor can be unbounded, or even infinite.
- By the model each message is bounded by O(logN), thus the lower bound on the number of messages is