P2P systems: epidemic scheduling, content placement and user profiling Laurent Massoulié Thomson, Paris Research Lab.


3 Outline Epidemic schemes for live streaming – Rate-optimality – Delay-optimality Content placement – Optimisation framework – Adaptive replication and 3/4-competitiveness User profiling – Spectral clustering – Linear programming

4 Context P2P systems for live streaming on the Internet – PPLive, CoolStreaming, Sopcast, TVants, TVUPlay, Joost…

5 Network constraints
● Graph connecting nodes
● Capacities assigned to edges
Achievable broadcast rate [Edmonds, 73]:
● Equals the maximal number of edge-disjoint spanning trees that can be packed in the graph
● Coincides with the minimum over receivers of the max-flow (= min-cut) between source and receiver
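The min-cut characterisation above can be checked numerically on a small graph. The following is a minimal sketch, assuming a dict-of-dicts capacity map with integer capacities; the function names `max_flow` and `broadcast_rate` and the example graph are illustrative and not taken from the slides. It uses a plain Edmonds-Karp max-flow, not the tree-packing construction.

```python
from collections import deque

def max_flow(capacity, source, sink):
    """Edmonds-Karp max-flow; capacity[u][v] is the capacity of edge u -> v."""
    # Residual graph: copy the capacities and add zero-capacity reverse edges.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)
    residual.setdefault(source, {})
    residual.setdefault(sink, {})
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        # Find the bottleneck capacity along the path, then push it.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck

def broadcast_rate(capacity, source, receivers):
    """Minimum over receivers of the max-flow from the source."""
    return min(max_flow(capacity, source, r) for r in receivers)

# Hypothetical example: source "s" and two receivers "a" and "b".
example = {"s": {"a": 2, "b": 1}, "a": {"b": 1}, "b": {"a": 1}}
rate = broadcast_rate(example, "s", ["a", "b"])
```

In this example receiver "b" is the bottleneck: it can be reached at rate 1 directly and at rate 1 through "a".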

6 Random useful chunk selection and Edmonds' theorem [LM, A. Twigg, C. Gkantsidis & P. Rodriguez]
● Based on local information
● No explicit construction of spanning trees
When the injection rate at the source is strictly feasible, the Markov process is ergodic.
● Chunks are successfully broadcast with bounded delay
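The selection rule named on this slide can be sketched in a few lines. This is only an illustration, assuming each node's chunks are represented as a Python set of chunk identifiers; the function name `random_useful_chunk` is hypothetical.

```python
import random

def random_useful_chunk(sender_chunks, receiver_chunks, rng=random):
    """Pick uniformly at random a chunk the sender holds and the receiver lacks.

    Returns None when the sender has nothing useful for this receiver.
    """
    useful = sorted(set(sender_chunks) - set(receiver_chunks))
    if not useful:
        return None
    return rng.choice(useful)
```

The rule only needs the chunk sets of the two endpoints, which is what makes it local: no spanning tree is ever built.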

7 Network with access (node) constraints
● Scarce resource: access capacity
● Complete communication graph: everyone can send to anyone
● Bound on maximum streaming rate λ: let $c_i$ = uplink bandwidth of node i. Necessary condition for feasibility: [equation not preserved in transcript]
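The feasibility condition itself did not survive in this transcript. A simple counting argument gives two necessary conditions, and the sketch below assumes that form: the source must upload every chunk at least once, so λ cannot exceed the source uplink, and N receivers each downloading at rate λ cannot consume more than the total uplink of the source and receivers. The function name and the indexing convention (source counted separately from the N receivers) are assumptions, not the slide's notation.

```python
def streaming_rate_upper_bound(source_uplink, receiver_uplinks):
    """Upper bound on the streaming rate from two counting arguments.

    1. Every chunk leaves the source at least once: rate <= source_uplink.
    2. N receivers each download at the streaming rate, and all of that
       traffic is uploaded by someone:
       N * rate <= source_uplink + sum(receiver_uplinks).
    """
    n = len(receiver_uplinks)
    if n == 0:
        raise ValueError("need at least one receiver")
    return min(source_uplink, (source_uplink + sum(receiver_uplinks)) / n)
```

For instance, a source with uplink 2 and four receivers with uplink 1 each gives a bound of 1.5, limited by the aggregate upload capacity.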

8 Deprived Peer / Random Useful Chunk [LM, A. Twigg, C. Gkantsidis & P. Rodriguez]
[Figure: sender's packets, potential receiver 1, potential receiver 2]
Source policy: sends "fresh" packets if any (fresh = not yet sent to anyone)

9 Deprived Peer / Random Useful Chunk [LM, A. Twigg, C. Gkantsidis & P. Rodriguez]
[Figure: sender's packets, potential receiver 1, potential receiver 2]
Neighborhood management: periodically add a random neighbor and suppress the least deprived neighbor
● Fixed neighborhood sizes
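A rough sketch of the peer-selection rule and of the source policy follows. Several details are assumptions made for illustration: deprivation of a neighbor is measured as the number of the sender's packets that the neighbor lacks, ties are broken by neighbor name, the source sends its oldest fresh packet, and it falls back to a random useful packet when nothing is fresh. The function names are hypothetical, and neighborhood management is not covered.

```python
import random

def most_deprived_neighbor(sender_packets, neighbor_packets):
    """Return the neighbor missing the most of the sender's packets.

    neighbor_packets maps a neighbor name to its set of packets.
    Returns None if no neighbor is missing anything.
    """
    best, best_deficit = None, 0
    for name in sorted(neighbor_packets):
        deficit = len(sender_packets - neighbor_packets[name])
        if deficit > best_deficit:
            best, best_deficit = name, deficit
    return best

def source_pick_packet(source_packets, already_sent, receiver_packets, rng=random):
    """Source policy: prefer a fresh packet (never sent to anyone)."""
    fresh = sorted(source_packets - already_sent)
    if fresh:
        return fresh[0]  # assumption: oldest fresh packet first
    # Assumption: otherwise behave like an ordinary peer.
    useful = sorted(source_packets - receiver_packets)
    return rng.choice(useful) if useful else None
```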

10 Main result
When λ < λ*, the Markov process is ergodic.
● Hence all packets are received at all nodes after a time bounded in probability

11 Multiple commodities
● Several sources s
● Dedicated receiver sets V(s), which can overlap
● Sources are not receivers
● Nodes cannot relay commodities they don't consume

12 Multiple commodities
Necessary conditions for feasibility: [equation not preserved in transcript]
Bundled most deprived / random useful: do not distinguish between commodities when
– measuring deprivation
– choosing a random useful packet
System is ergodic when the conditions hold with strict inequality

13 Symmetric networks ($c_1 = c_2 = \dots = c_N = 1$ chunk/sec)
● Previous lower bound reads $\log_2(N)$
● Achievable [J. Mundinger & R. Weber]
[Figure: source sending chunks t-3, t-2, t-1, t, t+1]
● Makes use of $\log_2(N)$ trees; not robust against churn
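The $\log_2(N)$ figure can be understood through a doubling argument: with unit uplinks, each node holding a chunk can forward one copy per second, so the number of holders at most doubles every second. The sketch below counts the seconds needed under that argument. It assumes N counts all nodes that must end up holding the chunk, source included, which is an interpretation and not something the slide states.

```python
import math

def min_broadcast_seconds(num_nodes):
    """Seconds needed for one chunk to reach num_nodes nodes by doubling.

    Starts with a single holder (the source); each second every holder
    forwards one copy, so the holder count at most doubles.
    """
    holders, seconds = 1, 0
    while holders < num_nodes:
        holders *= 2
        seconds += 1
    return seconds
```

For powers of two the count equals $\log_2(N)$ exactly; otherwise it is the next integer above.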

14 A look at the corresponding trees
[Figure: trees for N=4, N=8, N=16, N=32]

15 Random target / latest useful packet
[Figure: sender's packets, receiver's packets, latest useful packet]

16 Random target / latest useful packet
For arbitrary injection rate λ < 1 and constant x > 0, each peer receives a fraction 1 - 1/x of packets in time $\log_2(N) + O(x)$. [T. Bonald, LM, F. Mathieu, D. Perino & A. Twigg]
I.e., diffusion at rates arbitrarily close to optimal is feasible under optimal delay (plus a constant)
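One transmission under this scheme can be sketched as follows, assuming packets carry increasing integer sequence numbers (so "latest" means the largest number) and that the sender can see the target's packet set. The function names and the dict-of-sets state are illustrative assumptions.

```python
import random

def latest_useful_packet(sender_packets, receiver_packets):
    """Most recent packet held by the sender and missing at the receiver."""
    useful = sender_packets - receiver_packets
    return max(useful) if useful else None

def push_step(packets, sender, rng=random):
    """One transmission: uniform random target, latest useful packet.

    packets maps each node name to its set of packet numbers and is
    updated in place. Returns (target, packet); packet may be None.
    """
    targets = sorted(node for node in packets if node != sender)
    target = rng.choice(targets)
    packet = latest_useful_packet(packets[sender], packets[target])
    if packet is not None:
        packets[target].add(packet)
    return target, packet
```

Unlike the deprived-peer rule, the target is chosen without looking at any state; only the packet choice depends on what the target already holds.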

17 Open questions
● Delay optimality in heterogeneous environments
● Cost optimality
● Convergence time scale

18 Outline Epidemic schemes for live streaming – Rate-optimality – Delay-optimality Content placement – Optimisation framework – Adaptive replication User profiling – Spectral clustering – Linear programming

20 Problem statement
● N users
● Storage capacity: m objects
● Service capacity: B requests
● Local accesses are free
● Request rate: one rate per object f [symbol not preserved in transcript]
● Request duration: 1
Aim: minimize number of lost requests

21 Optimal placement structure
Let $M_f$ = number of replicas of object f
Schedulable region: request rates $x_f$ verifying [equation not preserved in transcript]
Effective arrival rates: times K if objects can be split into K sub-objects of size 1/K

22 Hot/Warm/Cold partition
Sort objects by decreasing popularity: 1, 2, …
● Replicate everywhere ($M_f = N$) the most popular objects 1, …, f(1)
● Partial replication of objects f(1)+1, …, f(2): [expression not preserved in transcript]
● No replication of objects f > f(2)
f(1) and f(2) are such that "warm objects" generate requests at rate BN, and all memory is used

23 Adaptive replication
Replication policy:
– Create a new replica of object f after each dropped request
– Remove an object chosen at random
Ignoring object-specific capacity constraints, caricature dynamics: [equation not preserved in transcript]
● Equilibrium: [equation not preserved in transcript]
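The replication policy itself (not the caricature dynamics, whose equations are missing here) can be sketched as a single update step. Two points are assumptions: "chosen at random" is read as uniform over stored replicas, so an object is evicted with probability proportional to its replica count, and storage is treated as one global pool rather than per-user caches. The function name is hypothetical.

```python
import random

def on_dropped_request(replicas, obj, rng=random):
    """Apply the policy after a request for obj was dropped.

    replicas maps object -> replica count and is updated in place.
    One replica chosen uniformly among all stored replicas is evicted,
    then one replica of obj is created, so total storage is unchanged.
    Returns the object that lost a replica.
    """
    stored = sorted(o for o, count in replicas.items() if count > 0)
    if not stored:
        raise ValueError("no stored replica to evict")
    weights = [replicas[o] for o in stored]
    victim = rng.choices(stored, weights=weights, k=1)[0]
    replicas[victim] -= 1
    replicas[obj] = replicas.get(obj, 0) + 1
    return victim
```

Objects that keep losing requests gain replicas over time, while total storage use stays constant.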

24 Adaptive replication (ctd)
Compare to full replication of only the most popular objects, i.e. [expression not preserved in transcript]
Then reductions to offered rates verify [inequality not preserved in transcript]
● "Value of foresight" is less than 25%...

25 Outline Epidemic schemes for live streaming – Rate-optimality – Delay-optimality Content placement – Optimisation framework – Adaptive replication User profiling – Spectral clustering – Linear programming

27 User profiling Aim: predict tastes of users Applications: – Further optimization of placement – Recommender Systems

28 Netflix dataset 17,770 movies, rated by 480,000 users

29 The planted partition model
● Users partitioned into clusters k = 1, …, K
● Each pair of users (i,j): conflict level C(i,j) in [0,1] (e.g., fraction of movies rated differently)
Statistical assumptions:
– C(i,j) independent over i < j
– $E[C(i,j)] = b_{kl} D/N$ if users i, j belong to clusters k, l
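A small sampler helps make the model concrete. The sketch below assumes the simplest case compatible with the stated assumptions: each C(i,j) is a Bernoulli variable (0 or 1) with mean $b_{kl} D/N$. The slide allows any value in [0,1], so this is one special case, and the function name is hypothetical.

```python
import random

def sample_conflict_matrix(labels, b, degree, rng=random):
    """Sample a symmetric 0/1 conflict matrix from a planted partition.

    labels[i] is the cluster index of user i, b[k][l] the affinity
    between clusters k and l, and degree plays the role of D.
    Entries are independent over i < j with mean b[k][l] * degree / N.
    """
    n = len(labels)
    conflicts = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p = b[labels[i]][labels[j]] * degree / n
            if not 0.0 <= p <= 1.0:
                raise ValueError("b[k][l] * degree / N must lie in [0, 1]")
            value = 1.0 if rng.random() < p else 0.0
            conflicts[i][j] = conflicts[j][i] = value
    return conflicts
```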

30 A spectral algorithm
Step 1: find suitable "de-noised" descriptors of users
● Form normalized eigenvectors x(1), …, x(K) associated with the K largest (in absolute value) eigenvalues of the conflict matrix
● To each user i, assign the vector $z_i = (x_i(1), \dots, x_i(K))$
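Step 1 can be sketched without external libraries by using power iteration with deflation, which is a technique substituted here for illustration; the slides do not say how the eigenvectors are computed. Power iteration converges to the eigenvalue of largest absolute value, which matches the selection rule above, but it can fail when two eigenvalues have the same absolute value. Function names, the iteration count and the seed are assumptions.

```python
import math
import random

def top_eigenvectors(matrix, k, iterations=500, seed=0):
    """Unit eigenvectors for the k largest-|eigenvalue| of a symmetric matrix."""
    n = len(matrix)
    work = [[float(v) for v in row] for row in matrix]
    rng = random.Random(seed)
    vectors = []
    for _ in range(k):
        # Random unit start vector.
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(v * v for v in x))
        x = [v / norm for v in x]
        for _ in range(iterations):
            y = [sum(work[i][j] * x[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(v * v for v in y))
            if norm == 0.0:
                break
            x = [v / norm for v in y]
        # Rayleigh quotient gives the eigenvalue for the unit vector x.
        wx = [sum(work[i][j] * x[j] for j in range(n)) for i in range(n)]
        eigenvalue = sum(x[i] * wx[i] for i in range(n))
        vectors.append(x)
        # Deflation: remove the component just found.
        for i in range(n):
            for j in range(n):
                work[i][j] -= eigenvalue * x[i] * x[j]
    return vectors

def user_descriptors(conflict_matrix, k):
    """Descriptor of user i: i-th coordinate of each of the k eigenvectors."""
    vectors = top_eigenvectors(conflict_matrix, k)
    n = len(conflict_matrix)
    return [tuple(vectors[c][i] for c in range(k)) for i in range(n)]
```

For matrices of realistic size a numerical linear algebra library would be the natural choice; the pure-Python loop costs on the order of n² operations per iteration.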

31 A spectral algorithm
Step 2: do crude clustering on descriptors
● Pick a random set of A users u(1), …, u(A)
● Identify the pair with closest descriptors (in $L_2$ norm) and remove one of them, until only K users are left, say v(1), …, v(K)
● Cluster the nodes according to proximity of their descriptors to the cluster exemplars v(1), …, v(K)
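The three bullets of Step 2 translate almost directly into code. This sketch assumes descriptors are given as tuples of floats and that, when a closest pair is found, the second member of the pair is the one removed (the slide does not say which). The names `crude_clustering` and `sample_size` (for A) are illustrative.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def crude_clustering(descriptors, k, sample_size, rng=random):
    """Return (exemplars, labels) following Step 2.

    exemplars holds k user indices; labels[i] is the position in
    exemplars of the exemplar closest to user i.
    Requires k <= sample_size <= len(descriptors).
    """
    exemplars = rng.sample(range(len(descriptors)), sample_size)
    while len(exemplars) > k:
        # Find the closest pair among remaining candidates.
        best = None
        for a in range(len(exemplars)):
            for b in range(a + 1, len(exemplars)):
                d = squared_distance(descriptors[exemplars[a]],
                                     descriptors[exemplars[b]])
                if best is None or d < best[0]:
                    best = (d, b)
        del exemplars[best[1]]  # drop one member of the closest pair
    labels = [
        min(range(k), key=lambda c: squared_distance(z, descriptors[exemplars[c]]))
        for z in descriptors
    ]
    return exemplars, labels
```

Removing near-duplicates first means the surviving K exemplars tend to come from distinct clusters, provided the sample hits every cluster.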

32 Theorem
Assume that
– The number K of clusters is fixed, each cluster of size of order N
– Matrix $(b_{kl})$ has full rank K
– $D \ge C \log(N)$ for some constant C
Then with probability 1-o(1), the algorithm correctly partitions a fraction 1-o(1) of nodes for suitable A ($1 \ll A \ll D^{1/2}$)
Main tool: control of the spectral structure of the Erdős–Rényi graph adjacency matrix when average degree $D \ge C \log(N)$ [Feige-Ofek]

33 Open question
Brute-force maximum likelihood retrieves clusters when $D \gg 1$
● Efficient procedure under this assumption?

34 Another algorithmic version of Netflix
Objective: for user n, find the inference of all unknown ratings that maximizes the number of users fully agreeing with user n
● NP-hard (badly so)
Probabilistic model
– Users belong to clusters k = 1, …, K, with sizes a(k)N
– Within a cluster, identical ratings (i.i.d., +1 or -1 w.p. ½ for each movie, F movies in total)
– Each rating of each user: revealed w.p. p
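The probabilistic model can be sampled as follows. This is a minimal sketch in which cluster sizes are passed as absolute counts (that is, a(k)N already multiplied out) and an unrevealed rating is represented by `None`; both choices, and the function name, are assumptions for illustration.

```python
import random

def sample_ratings(cluster_sizes, num_movies, reveal_prob, rng=random):
    """Sample full and observed ratings under the clustered model.

    Each cluster draws one rating vector of i.i.d. +1/-1 entries with
    probability 1/2 each, shared by all its users. Each rating of each
    user is then revealed independently with probability reveal_prob.
    Returns (full, observed), with None marking hidden ratings.
    """
    full, observed = [], []
    for size in cluster_sizes:
        profile = [rng.choice((-1, 1)) for _ in range(num_movies)]
        for _ in range(size):
            full.append(profile[:])
            observed.append(
                [r if rng.random() < reveal_prob else None for r in profile]
            )
    return full, observed
```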

35 Proposed algorithm (inspiration: compressive sensing; see [Decoding by linear programming, Candès & Tao])
Consider user 1
For a suitable cost function g, determine full rating vectors X(n), compatible with known ratings (i.e. $P_n X(n) = Y(n)$), that minimize (I): [expression not preserved in transcript]
A proxy to the (intractable) minimization of (II): [expression not preserved in transcript]

36 Conditions for optimality
Assume optimum of (II): "clustered" reconstruction X**(n) such that X**(n) = X**(1) for all indices n ∈ A
Then optimum of (I) is such that X*(n) = X*(1), n ∈ A, provided: [condition not preserved in transcript]

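37 Application to probabilistic model
Necessary condition for hidden cluster to be optimal: [condition not preserved in transcript]
Sufficient condition for LP algorithm to retrieve hidden cluster, under choice g = |·| [symbol not preserved in transcript]: [condition not preserved in transcript]
● Differ by factor at most K-1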

38 Outlook Clustering – Robustness of proposed schemes to statistical modeling assumptions – Efficient (distributed?) implementations