Rateless codes and random walks for P2P resource discovery in Grids
IEEE Transactions on Parallel and Distributed Systems, Nov.
Valerio Bioglio, Rossano Gaeta, Marco Grangetto, Matteo Sereno
Outline
• Introduction
• Related Work
• Proposed System
• Analysis
• Simulation Results
• Conclusion
Introduction
• The system is modeled as a set of nodes connected to form a P2P network.
  ◦ Each node holds a piece of information.
  ◦ Nodes may join or leave dynamically.
• Goal: every peer obtains a local view of global information defined on all peers of an unstructured P2P network.
• To obtain the information held by the other peers, every node must communicate with all participants.
Introduction
• Many proposals exploiting unstructured P2P systems share a common characteristic:
  ◦ Interface peers
    - belong to one administrative domain,
    - connect to other interface peers,
    - maintain the data of their local nodes.
• This paper assumes that
  ◦ each peer holds a piece of information;
  ◦ any peer needs to access the data of all other peers at a rate of λ queries/s.
Introduction
• The goals are threefold:
  1. Every node can collect the complete global information.
  2. The communication overhead must be limited.
  3. The processing power of each node must be used parsimoniously.
Contribution
1. A continuous flow of control packets exchanged among the nodes according to the random walk principle.
2. The information combined by each node must belong to the same version.
3. The proposed solution is suitable when each node holds a large amount of data.
Related Work (1/2)
• [6] analyzes flow control in terms of the maximum rate at which a participant can submit updates without creating a backlog, and devises content reconciliation mechanisms to reduce message redundancy.
• Algebraic Gossip [11] is a gossip algorithm based on network coding; its spreading time is proved to be O(K).
[6] “Efficient reconciliation and flow control for anti-entropy protocols,” in Proc. 2nd Workshop on Large-Scale Distributed Systems and Middleware (LADIS ’08), ACM.
[11] “Algebraic gossip: a network coding approach to optimal multiple rumor mongering,” IEEE Transactions on Information Theory, vol. 52, no. 6, pp. 2486–2507, Jun. 2006.
Related Work (2/2)
• In [13], distributed fountain codes are proposed for networked storage: to create a new encoded packet, each storage node requests information from a randomly selected node of the network.
• A similar algorithm is proposed in [14], but the coded-packet formation mechanism is reversed.
• In these schemes the nodes take care of information gathering and encoding; in [16] this responsibility is assigned to the packets.
[13] “Distributed fountain codes for networked storage,” in Proc. IEEE ICASSP.
[14] “Data persistence in large-scale sensor networks with decentralized fountain codes,” in Proc. IEEE INFOCOM.
[16] “Rateless packet approach for data gathering in wireless sensor networks,” IEEE Journal on Selected Areas in Communications, vol. 28, no. 9, pp. 1169–1179, Sep.
System Description (1/3)
System Description (2/3)
• Goal: concurrent broadcasting of all the information collected by all the nodes in the network.
  ◦ All nodes must communicate with each other.
• This paper proposes a fully distributed solution based on random walks:
  ◦ each node starts a limited number ω of packets (random walkers);
  ◦ the packets are propagated through the network by random walk;
  ◦ all nodes use the packets to solve a system of linear equations.
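The random walk dissemination step can be illustrated with a small simulation. This is a minimal sketch, not the authors' simulator: the function name `simulate_walkers`, the adjacency-list representation, and the ring topology in the example are assumptions made for illustration. Each node starts ω walkers, and in every time slot each walker moves to a neighbor chosen uniformly at random.

```python
import random
from collections import Counter

def simulate_walkers(neighbors, omega, hops, seed=0):
    """Start `omega` walkers at every node and move each walker one hop per
    time slot to a neighbour chosen uniformly at random (hypothetical helper).

    Returns the final walker positions and how many walker visits each node
    received over the whole run."""
    rng = random.Random(seed)
    # One entry per walker: its current node. Walkers start at their origin.
    positions = [node for node in neighbors for _ in range(omega)]
    visits = Counter()
    for _ in range(hops):
        positions = [rng.choice(neighbors[node]) for node in positions]
        visits.update(positions)
    return positions, visits

# Ring of 6 nodes, each linked to its two neighbours; omega = 2 walkers per node.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
positions, visits = simulate_walkers(ring, omega=2, hops=10)
```

With a fixed seed the run is reproducible, which makes it easy to compare topologies or values of ω side by side.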
System Description (3/3)
• Shortcomings of network coding:
  1. The added computational complexity
     - Solution:
       · use simple XOR combinations
       · use rateless codes, known as LT codes
  2. The impossibility of asynchronous updating
     - Solution: support asynchronous updates
[Figure: Node A, Node B]
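The XOR-based LT encoding mentioned on this slide can be sketched as follows. This is a generic, centralized LT encoder for illustration only: the helper names are hypothetical, and degrees are drawn from the ideal soliton distribution for brevity, whereas practical LT codes usually use the robust soliton distribution; the degree distribution used in the paper is not stated on these slides. Each piece of information is modeled as an integer, so combining pieces is a bitwise XOR.

```python
import random

def ideal_soliton(k):
    """Ideal soliton distribution over degrees 1..k:
    rho(1) = 1/k and rho(d) = 1/(d(d-1)) for d = 2..k."""
    return [1.0 / k] + [1.0 / (d * (d - 1)) for d in range(2, k + 1)]

def lt_encode_symbol(pieces, rng):
    """Build one LT-coded equation from the list `pieces`.

    Returns (indices, value): the sorted indices of the combined pieces and
    their bitwise XOR."""
    k = len(pieces)
    # Draw the degree d, then d distinct pieces uniformly at random.
    degree = rng.choices(range(1, k + 1), weights=ideal_soliton(k))[0]
    indices = sorted(rng.sample(range(k), degree))
    value = 0
    for i in indices:
        value ^= pieces[i]
    return indices, value

rng = random.Random(42)
pieces = [0x1F, 0x2A, 0x33, 0x4C, 0x55]  # one small integer per node
indices, value = lt_encode_symbol(pieces, rng)
```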
Random Walk and LT Coding
[Figure: random walk packet layout. Fields: Header, d_F, d_i, (v_1, t_1), (v_2, t_2), (v_3, t_3), (v_4, t_4), c; labels eq 1, eq 2]
Random Walk and LT Coding
• When a packet approaches the maximum size DIM, the eldest equation it carries is deleted.
• When the acknowledgement timer reaches 0, the receiving node notifies the originator that its random walker is still alive.
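The two packet-handling rules can be sketched with a bounded queue. This is a hypothetical sketch: the class `WalkerPacket`, the `ack_period` parameter, and measuring DIM as a number of equations (rather than bytes) are all assumptions. A `collections.deque` created with `maxlen` discards its eldest entry automatically when a new one is appended to a full queue.

```python
from collections import deque

class WalkerPacket:
    """Hypothetical random-walk packet that carries at most `dim` equations."""

    def __init__(self, origin, dim, ack_period):
        self.origin = origin          # node that started this walker
        # A deque with maxlen drops its leftmost (eldest) item when a new
        # item is appended to a full deque.
        self.equations = deque(maxlen=dim)
        self.ack_period = ack_period  # hops between two acknowledgements
        self.ack_timer = ack_period

    def add_equation(self, equation):
        """Store a new equation, evicting the eldest one if the packet is full."""
        self.equations.append(equation)

    def on_hop(self):
        """Count down one hop; return True when the receiving node should tell
        the originator that this walker is still alive, then restart the timer."""
        self.ack_timer -= 1
        if self.ack_timer == 0:
            self.ack_timer = self.ack_period
            return True
        return False

packet = WalkerPacket(origin=7, dim=3, ack_period=4)
for eq in ["eq0", "eq1", "eq2", "eq3", "eq4"]:
    packet.add_equation(eq)
acks = [packet.on_hop() for _ in range(8)]
```

After five insertions into a packet that holds three equations, only the three most recent survive, and an acknowledgement fires every fourth hop.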
Asynchronous Update and LT Coding (1/3)
• The information spread by the random walkers can be recovered by any node as soon as it has collected enough equations (N linearly independent ones).
• The decoder task can be formulated as the solution of the system of linear equations Gx = c, where
  ◦ G is an N×N binary matrix whose rows are the N independent equations collected by the node;
  ◦ x is the N×1 column vector of the N unknown pieces of information;
  ◦ c is the vector of the corresponding buffered linear combinations.
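For illustration, a system Gx = c over GF(2) can be solved by Gauss-Jordan elimination. This is a generic solver, not the optimal partial decoding algorithm of [21] nor necessarily the decoder used in the paper; the bitmask encoding of rows and the name `solve_gf2` are assumptions. Each row of G is stored as an integer bitmask, each piece of information as an integer, and every row operation is an XOR. The solver also accepts more than N equations.

```python
def solve_gf2(rows, values, n):
    """Solve G x = c over GF(2) by Gauss-Jordan elimination (hypothetical helper).

    rows[i] is an int bitmask: bit j set means x_j appears in equation i.
    values[i] is the buffered combination c_i (pieces are ints, '+' is XOR).
    Returns the list x if G has rank n, otherwise None."""
    rows, values = list(rows), list(values)  # work on copies
    for col in range(n):
        bit = 1 << col
        # Look for an equation (at or below row `col`) that involves x_col.
        sel = next((r for r in range(col, len(rows)) if rows[r] & bit), None)
        if sel is None:
            return None  # rank-deficient: x_col cannot be determined yet
        rows[col], rows[sel] = rows[sel], rows[col]
        values[col], values[sel] = values[sel], values[col]
        # Remove x_col from every other equation; XOR is addition in GF(2).
        for r in range(len(rows)):
            if r != col and rows[r] & bit:
                rows[r] ^= rows[col]
                values[r] ^= values[col]
    # Row i now contains only x_i, so its right-hand side is the value of x_i.
    return values[:n]

# x = [5, 9, 12]; equations x0^x1, x1^x2, x0^x1^x2.
x = solve_gf2([0b011, 0b110, 0b111], [5 ^ 9, 9 ^ 12, 5 ^ 9 ^ 12], 3)
```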
Asynchronous Update and LT Coding (2/3)
• Nodes may update their information only when a new generation is initiated.
  ◦ The vector x is extended to the (ν+1)N×1 vector x̃.
  ◦ G̃ becomes a (ν+1)N×(ν+1)N extended decoding matrix.
• The information collected in the network is handled with a sliding window that includes the (ν+1) most recent generations.
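One way to index the extended vector x̃ under the sliding window is sketched below. The column ordering (oldest generation first, so that the last N columns hold the newest generation), the rule of discarding any equation that mentions a generation outside the window, and the function names are all assumptions made for this example. Terms that appear twice in an equation cancel, as they do under XOR.

```python
def column_index(node, gen, newest_gen, nu, n_nodes):
    """Map (node, generation) to a column of the extended vector x~.

    The window keeps the nu+1 most recent generations; the last n_nodes
    columns belong to the newest one. Returns None if gen is outside it."""
    oldest = newest_gen - nu
    if not (oldest <= gen <= newest_gen):
        return None
    return (gen - oldest) * n_nodes + node

def eq_to_mask(terms, newest_gen, nu, n_nodes):
    """Turn an equation's (node, generation) terms into a bitmask over x~,
    or None if any term refers to a generation that left the window."""
    mask = 0
    for node, gen in terms:
        col = column_index(node, gen, newest_gen, nu, n_nodes)
        if col is None:
            return None
        mask ^= 1 << col  # XOR so that repeated terms cancel out
    return mask

# N = 4 nodes, nu = 2 (three generations kept), newest generation = 10.
mask = eq_to_mask([(0, 10), (2, 9)], newest_gen=10, nu=2, n_nodes=4)
```

The resulting bitmasks can be fed to any GF(2) solver over (ν+1)N unknowns.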
Asynchronous Update and LT Coding (3/3)
• The idea is to keep decoding as up to date as possible, aiming at reconstructing the last N elements of x̃.
• This paper proposes a strategy to manage the extended decoding matrix G̃ so that decoding is robust to asynchronous updates of the information.
Asynchronous Update Algorithm
[21] V. Bioglio, M. Grangetto, R. Gaeta, and M. Sereno, “An optimal partial decoding algorithm for rateless codes,” in Proc. IEEE International Symposium on Information Theory (ISIT), Aug. 2011, pp. –2735.
Recovery Time (1/6)
• The recovery time is defined as the time required to spread all the local information to all the participants in the network.
• The recovery time is modeled as a function of
  ◦ the size of the local information m,
  ◦ the number of random walkers generated per node ω,
  ◦ the number of nodes in the network N,
  ◦ the maximum size of the random walk packets DIM.
Recovery Time (2/6)
Recovery Time (3/6)
• n_U and n_C denote the maximum number of equations that can be stored in an uncoded and in a coded packet, respectively.
Recovery Time (4/6)
• Using the coded approach, the number of hops T_C required to distribute a given number of equations R_C can be predicted.
Recovery Time (5/6)
• N = 1000 nodes
Recovery Time (6/6)
• N = 1000, N_neigh = 50, ω = 1
• 95% confidence intervals
Simulation Results (1/4)
• To simulate realistic P2P conditions:
  1. At each time slot, 30 random nodes shuffle their neighborhood by exchanging one random neighbor.
  2. When a node joins, it connects to a random set of neighboring nodes; when a node leaves, its neighbors replace it through the shuffling mechanism described in item 1.
  3. The overall number of packets in the network is kept constant, and ideal signaling is assumed.
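The neighborhood shuffling of item 1 can be sketched as a degree-preserving double edge swap. The slides do not say exactly how the exchange is performed, so the choice of swap partner (a random current neighbor w of the shuffling node u) and the helper name `shuffle_step` are assumptions. Node joins and leaves are not modeled here.

```python
import random

def shuffle_step(adj, n_shufflers, rng):
    """One time slot of neighbourhood shuffling (hypothetical sketch).

    adj maps each node to the set of its neighbours (undirected graph).
    Each of n_shufflers randomly chosen nodes u picks a neighbour w, and the
    two exchange one random neighbour each: edges (u, a) and (w, b) become
    (u, b) and (w, a). This double edge swap keeps every node's degree."""
    for u in rng.sample(sorted(adj), n_shufflers):
        if not adj[u]:
            continue
        w = rng.choice(sorted(adj[u]))
        # a: neighbour of u that w does not already have; b: the converse.
        cand_a = sorted(adj[u] - adj[w] - {w})
        cand_b = sorted(adj[w] - adj[u] - {u})
        if not cand_a or not cand_b:
            continue  # any exchange would create a duplicate edge
        a, b = rng.choice(cand_a), rng.choice(cand_b)
        adj[u].discard(a)
        adj[a].discard(u)
        adj[w].discard(b)
        adj[b].discard(w)
        adj[u].add(b)
        adj[b].add(u)
        adj[w].add(a)
        adj[a].add(w)

# 40-node circulant graph: every node starts with 4 neighbours.
rng = random.Random(1)
adj = {i: {(i - 1) % 40, (i + 1) % 40, (i + 5) % 40, (i - 5) % 40} for i in range(40)}
degrees = {v: len(ns) for v, ns in adj.items()}
for _ in range(10):
    shuffle_step(adj, 30, rng)
```

Preserving degrees keeps the neighborhood size of every node stable while the topology keeps changing.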
Simulation Results (2/4)
• For each node v_l, the percentage of the overall information retrieved by that node is computed as a function of time T.
Simulation Results (3/4)
• This index is then averaged over the set A(T) of nodes that are active at time T.
• All numerical results based on these definitions are averaged over 30 independent trials to obtain statistically meaningful values.
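The two metrics can be computed as below. This is a minimal sketch under the assumption that the percentage for node v_l is the number of pieces it has recovered divided by the total number of pieces in the network, times 100; the exact formula on the slide is not recoverable from the text, and the function names are hypothetical.

```python
def retrieved_percentage(decoded_count, n_total):
    """Percentage of the overall information (n_total pieces) held by a node."""
    return 100.0 * decoded_count / n_total

def average_over_active(decoded, active, n_total):
    """Average the per-node percentage over the active set A(T).

    decoded maps node -> number of pieces recovered at time T;
    active is the set A(T) of nodes online at time T."""
    if not active:
        return 0.0
    return sum(retrieved_percentage(decoded[v], n_total) for v in active) / len(active)

decoded = {0: 1000, 1: 500, 2: 250}  # pieces recovered at time T
active = {0, 1}                      # node 2 has left, so it is not in A(T)
avg = average_over_active(decoded, active, n_total=1000)
```

Averaging the resulting curves over the 30 independent trials is then a plain arithmetic mean per time instant (for example with `statistics.mean`).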
Simulation Results (4/4)
Conclusion
• A novel decoder for rateless codes that is robust to asynchronous updates of the information.
• A simple analytical model to estimate the time required to spread the information.
• The coded system scales better than the uncoded one as the number of nodes in the distributed system increases.