PERFORMANCE AND REPUTATION IN PEER-TO-PEER SYSTEMS
Hui Zhang
University of Southern California

A true story from Star Wars Episode I: a little boy’s attack on a central server
21st Century Fox 9/19/2018

A true story from Star Wars Episode I: the failure of a whole Droid army
21st Century Fox

A lesson on designing a robust and scalable distributed system:
Why not a P2P design?
21st Century Fox

Outline
P2P design: Application / User; Overlay routing; Underlying network
- Reputation systems
- Data availability: Freenet
- Latency optimization: Chord
I will start with the overlay routing layer, as the algorithms in this layer are the critical determinant of overall system performance.

Freenet [Clarke et al. 2001]
A distributed anonymous information storage and retrieval system.
Provides privacy for information producers, consumers, and holders.
[Figure: data and nodes hashed into a common key space.]

An example search in Freenet network
The cache replacement scheme is LRU.
[Figure: nodes A, B, C, D, and E, with the routing tables of A, B, and D shown as (key, pointer) entries; B's routing table lists keys 7, 10, 8, …; E has a copy of key 8.]

Performance Deterioration of Freenet under heavy load
[Figure: request hit ratio (0 to 1) vs. number of files generated per node (5 to 25).]
Setup: 300 nodes, cache size = 60, routing table size = 110.
We have tried other sizes, such as cache size = 180 files, and still get qualitatively the same curve.

Why can't Freenet search efficiently under heavy workload?
[Figure: key distribution in a routing table under light workload and under heavy workload, over the key space (0 to 1e+06); annotation: clustering.]
Comparison of the key distribution in the routing tables of Freenet nodes under different workloads.

Clustering with randomness
To improve routing performance, we want the routing table at node x to conform to the small-world model [Kleinberg 1999].
Crucial observation: such clustering can be achieved by just changing the route-cache replacement policy.
[Figure: key space with clustered keys and a random key (shortcut).]

Small-world Freenet [Zhang et al. 2002, 2004]
Enhanced-clustering route-cache replacement:
- Each node chooses a seed randomly when joining the network.
- When a new key (file) u is to be cached, the node chooses the key v in the current datastore that is farthest from the seed.
- If Distance(u, seed) < Distance(v, seed), cache u and evict v with probability 1-P (clustering).
- If Distance(u, seed) > Distance(v, seed), cache u and evict v with probability P (randomness).
(Author note: clarify route-cache; put an example for Freenet.)
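The replacement rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Freenet implementation: the names `maybe_cache` and `distance` are hypothetical, distance is taken as the plain absolute difference on an integer key space, a non-full datastore simply accepts the key, and a tie in distance is treated like the "farther" case.

```python
import random


def distance(a, b):
    # Assumption: absolute difference on an integer key space.
    return abs(a - b)


def maybe_cache(datastore, capacity, seed, u, p, rng=random):
    """Offer new key u to a node's datastore (a set of keys).

    Returns the evicted key, or None if nothing was evicted.
    seed is the node's randomly chosen seed; p is the randomness probability P.
    """
    if u in datastore:
        return None
    if len(datastore) < capacity:
        # Assumption: while there is free space, just store the key.
        datastore.add(u)
        return None
    # v is the stored key farthest from the seed.
    v = max(datastore, key=lambda k: distance(k, seed))
    if distance(u, seed) < distance(v, seed):
        accept = rng.random() < 1 - p   # clustering
    else:
        accept = rng.random() < p       # randomness (shortcut)
    if accept:
        datastore.remove(v)
        datastore.add(u)
        return v
    return None
```

With P = 0 the datastore only ever moves closer to the seed; a small positive P occasionally admits a far-away key, which plays the role of the shortcut.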

Random, regular, and small-world graphs – micro-level and macro-level
[Figure: key distribution in a routing table under heavy workload for the original Freenet, the new Freenet with P=0, and the new Freenet with P=0.03; axes: key space (0 to 1e+06) and key value.] [Watts et al. 1998]
(Author note: show low- and heavy-load clustering, then a slide for the question: can we change a little?)

Small-world Freenet - performance
[Figures: hit ratio vs. workload, and average hops per successful request vs. workload, for small-world Freenet, regular Freenet, and random Freenet.]

Small-world Freenet - analysis
We showed that the expected number of steps to deliver a message in the idealized small-world Freenet is O(log N) if the routing table size is on the order of log^2 N, where N is the network size.
Comparison (system: expected routing path; average routing table size):
- CAN [Ratnasamy et al. 2001]: O(d N^(1/d)); O(d)
- Chord [Stoica et al. 2001]: O(log N)
- Tapestry [Zhao et al. 2001]
Notes for after this slide:
- Experiments: does this problem happen? Yes: the Red Hat 9.0 CD distribution brought Freenet down (personal communication, Ian Clarke).
- Implementation: class project; observed key clustering in a real implementation.

Experiments
Does this performance degradation problem happen?
- Yes. For example, when Red Hat 9.0 CD-ROMs were distributed in Freenet [Ian Clarke, private communication].
Real-world testing:
- Local datastore observation on live Freenet nodes.
[Figure: key distribution in the datastore at Hour-1 on an original Freenet node.]
(Author note: use fraction instead of percentage.)

Outline
P2P design: Application / User; Overlay routing; Underlying network
- Reputation systems
- Data availability: Freenet
- Latency optimization: Chord (also Tapestry, Pastry, small-world Freenet, …)
It is not hard to show that the results shown next also apply to many other P2P systems, including …

Chord – overlay routing structure
Chord: a well-known DHT design that supports a simple primitive: given a key, it maps the key onto a node.
[Figure: a Chord network with 8 nodes (32, 64, 96, 128, 160, 192, 224, 256) and an 8-bit key space. Key ranges shown: 0: [225, 256] and 32: [1, 32]. Routing pointers cover Range-1 through Range-4; a data item has key 220.]
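As a rough illustration of the key-to-node primitive, the sketch below maps a key to the first node at or after it on the ring, using the eight node identifiers from the figure. The function name `successor` is made up for this example, and keys are assumed to be written as 1..256 the way the figure labels them (256 coinciding with 0 on the ring).

```python
from bisect import bisect_left

# Node identifiers from the 8-node example, in ring order.
NODES = [32, 64, 96, 128, 160, 192, 224, 256]


def successor(key, nodes=NODES):
    """Return the node responsible for key: the first node id >= key,
    wrapping around to the smallest id past the end of the ring."""
    i = bisect_left(nodes, key)
    return nodes[i % len(nodes)]
```

Under these assumptions the data item with key 220 lands on node 224, and keys in [225, 256] land on node 256, which matches the ranges drawn in the figure.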

Chord – routing latency optimization
[Figure: a Chord network with 8 nodes and 8-bit key space, with nodes located at different sites (@usc, @beijing, @stanford, @ucsd, @HP Labs); overlay routing toward the data item with key 220.]

LPRS-Chord [Zhang et al. 2003; Goel et al. 2005]
Proximity neighbor selection [Plaxton et al. 1997] [Rowstron et al. 2001] [Zhao et al. 2001] [Gummadi et al. 2003]: populate each entry of the routing table with nearby nodes among candidates.
- Nearby to what degree?
- When and how to populate?
Previous solutions require each entry to point to the nearest candidate in order to guarantee low latency. Building and maintaining such an optimal routing table is costly. Heuristics have been proposed to build this optimal routing table when a node joins, which brings a high bootstrap cost and delayed joining, and nodes still have to pay the maintenance cost of keeping the invariants in the routing tables.
LPRS-Chord can achieve near-optimal latency when each entry points to a candidate that is near enough (the degree will be explained later). It starts with the original routing tables, which are not optimized for latency, and then incrementally but rapidly improves the routing table entries with random sampling piggybacked on regular lookup messages.
[Figure: a Chord network with 8 nodes and 8-bit key space; nodes at @usc, @shanghai, @ucla, @beijing, @HP Labs; routing pointers and distance measurements.]

Lookup-Parasitic Random Sampling (LPRS)
1. Recursive lookup.
2. Each intermediate hop appends its IP address to the lookup message.
3. When the lookup reaches its target, the target informs each listed hop of its identity.
4. Each intermediate hop then gets a reasonable estimate of the latency to the target, and updates its routing table accordingly.
5. When the target key is random to the initial node, a sampling on each range (of some node) happens with the same probability 1/2.
Uniform sampling in terms of range.
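Steps 3 and 4 can be sketched as follows. This is a hedged illustration, not the authors' code: hops are identified by Chord ids rather than IP addresses, `tables` is a hypothetical per-node dictionary from range index to the chosen node, `latency` is a caller-supplied measurement function, and the update rule "keep whichever node has the lower latency" is an assumption about what "updates its routing table accordingly" means.

```python
def range_index(node_id, target_id, bits=8):
    """Index i such that target lies in [node + 2^i, node + 2^(i+1)) on the ring."""
    gap = (target_id - node_id) % (1 << bits)
    if gap == 0:
        raise ValueError("target coincides with node")
    return gap.bit_length() - 1


def lprs_update(path, target, tables, latency, bits=8):
    """After a lookup reaches target, let every hop recorded in path
    sample the target for the range that contains it."""
    for hop in path:
        if hop == target:
            continue
        i = range_index(hop, target, bits)
        current = tables[hop].get(i)
        # Assumed rule: keep the lower-latency candidate for this range.
        if current is None or latency(hop, target) < latency(hop, current):
            tables[hop][i] = target
```

Because the update rides on lookups that happen anyway, the only extra traffic in this sketch is the target's message to each recorded hop plus the latency measurement.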

LPRS-Chord: latency performance on ring, mesh, and random graph
[Figure: latency stretch (1 to 6) vs. network size (1000 to 6000) at time 2logN, for ring, mesh, and random-graph underlying topologies.]
latency stretch = (latency for each lookup on the underlying topology) / (average latency on the underlying topology)

Underlying network vs. overlay routing
Expected lookup latency in LPRS-Chord and cost per node, by type of underlying network:
- d-power-law latency expansion: expected lookup latency O(L); cost per node (log N)^d samples on each range.
- Exponential latency expansion: expected lookup latency on the order of L·log N; cost per node: -.
L: the average unicast latency in the underlying network. N: overlay network size.
Latency expansion: let N_u(x) denote the number of nodes in the network G that are within latency x of node u.
- Power-law latency expansion: N_u(x) grows proportionally to x^d, for all nodes u.
- Exponential latency expansion: N_u(x) grows proportionally to α^x for some constant α > 1.
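To make the definition of N_u(x) concrete, the sketch below counts the nodes within latency x of a node by breadth-first search. It assumes every link has unit latency, which is a simplification; `ball_sizes`, `ring`, and `torus` are illustrative names, and the torus stands in for the mesh.

```python
from collections import deque
from itertools import accumulate


def ball_sizes(neighbors, u, max_x):
    """Return [N_u(0), N_u(1), ..., N_u(max_x)], assuming unit link latency.

    neighbors(v) must return an iterable of the nodes adjacent to v.
    """
    dist = {u: 0}
    queue = deque([u])
    while queue:
        v = queue.popleft()
        if dist[v] == max_x:
            continue  # do not explore beyond the requested radius
        for w in neighbors(v):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    per_distance = [0] * (max_x + 1)
    for d in dist.values():
        per_distance[d] += 1
    return list(accumulate(per_distance))


def ring(n):
    """Neighbor function for a ring of n nodes (power-law expansion, d = 1)."""
    return lambda v: ((v - 1) % n, (v + 1) % n)


def torus(side):
    """Neighbor function for a side x side wrap-around mesh (d = 2)."""
    def neighbors(v):
        r, c = v
        return [((r + 1) % side, c), ((r - 1) % side, c),
                (r, (c + 1) % side), (r, (c - 1) % side)]
    return neighbors
```

On the ring the counts grow linearly in x (2x + 1), and on the torus quadratically (2x^2 + 2x + 1) until the ball wraps around, which is the power-law behaviour with d = 1 and d = 2.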

Is the Internet latency expansion power-law?
- The hop-count expansion of the Internet router-level topology is exponential.
- The latency expansion of the Internet router-level topology is power-law, with an exponent between 1 and 2.
Networking algorithms that rely on the latency expansion of the underlying network include:
- Request latency reduction in web cache systems [Plaxton et al.].
- Nearest neighbor search [Karger et al. 2002].
- Locality-aware DHT design [Abraham et al. 2004].
- Gossip-based communication mechanisms [Kempe et al. 2002].
This immediately implies that LPRS is a very practical approach to reducing lookup latency in DHT systems.

Outline
P2P design: Application / User; Overlay routing; Underlying network
- Reputation systems
- Data availability: Freenet
- Latency optimization: DHTs

Building social trust in P2P users
Peers are prone to selfish behavior when there is no incentive to cooperate.
- Free-riding phenomenon [Adar et al. 2000] [Saroiu et al. 2001].
Reputation systems:
- A means of describing social trust networks.
- Effective at incentivizing user cooperation, isolating malicious users, adjudging node reliability, …

Eigenvector-based reputation systems
- A referential (endorsement) link structure, represented as an N by N matrix.
- Eigenvector or stationary-distribution based rating schemes: HITS [Kleinberg1999], PageRank [Brin et al. 1998], etc.
(Author note: what is an eigenvector? Replace this with a figure showing the P2P link structure.)

PageRank: random walk model
ε: resetting probability.
- With probability (1-ε), the walker continues the walk to a random successor node.
- With probability ε, the walker restarts the walk at a random node.
[Figure: the walker on a graph of nodes X, Y, Z joined by referential links, with transition probabilities 1/2 and 1/3.]
As time goes on, the expected percentage of steps the walker spends at each node v converges to the PageRank weight PR(v).
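The random walk can be simulated directly, as in the minimal sketch below. The function name, the `links` dictionary format (every node must appear as a key), and the choice to restart the walk whenever a node has no successors are assumptions made for illustration; the estimate is the fraction of steps spent at each node.

```python
import random


def random_walk_pagerank(links, eps=0.15, steps=100_000, seed=0):
    """Estimate PageRank weights by simulating the resetting random walk.

    links maps each node to a list of its successor nodes.
    eps is the resetting probability.
    """
    rng = random.Random(seed)
    nodes = sorted(links)
    visits = dict.fromkeys(nodes, 0)
    current = rng.choice(nodes)
    for _ in range(steps):
        visits[current] += 1
        successors = links[current]
        if rng.random() < eps or not successors:
            current = rng.choice(nodes)       # restart at a random node
        else:
            current = rng.choice(successors)  # follow a random out-link
    return {v: count / steps for v, count in visits.items()}
```

On a symmetric graph such as a three-node cycle the estimates settle near 1/3 each; on real link structures the fractions differ, and those differences are the ranking signal.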

PageRank: is it collusion-proof?
Can a node easily boost its rank by manipulating its outgoing links together with others'?
An indication of "Yes".

Two experimental topologies
W, a Web link topology:
- Contains the link structure of upwards of 80 million URLs.
- Source: the Stanford WebBase.
B, a weblog blogrolling topology:
- Contains the blogrolling structure of upwards of 72,000 blogs.
- Source: the XML-RPC weblog service.

Experiment: Collusion200
Models a small number of web pages colluding simultaneously.
Methodology:
- 100 colluding groups.
- Each colluding group has the circle topology, consisting of two nodes with adjacent ranks.
- Nodes were chosen arbitrarily from those originally ranked around the 1000th, 2000th, … positions (100th, 200th, …, 10000th for B, due to the smaller graph size).
- ε = 0.15.

Experiment result of Collusion200
Example rank changes:
- Old rank: …th; new rank: 5038th
- Old rank: 10001st; new rank: 450th
- Old rank: 1005th; new rank: 67th

There is a long flat portion…
The long flat portion explains the dramatic perturbation of the ranking in Collusion200: a small change in PR weight can cause a large change in relative ranking.
[Figure: the PR weight distribution of the 2 topologies.]

An observation on collusion behaviors
To increase their PR weight, i.e., the stationary weight in the random walk, the colluding nodes will stall the random walk.
[Figure: group G and colluding subgroup G'.]
When the resetting probability ε increases, the colluding nodes must suffer a significant drop in PR weight.
Therefore, we expect the PR weight of colluding nodes to be highly correlated with 1/ε (the average walk length), while that of non-colluding nodes is relatively insensitive to changes in ε.

Adaptive-resetting scheme [Zhang et al. 2004]
Part I – collusion detection:
- Given the topology, calculate the PR vector under different ε values: {ε} = {0.0375, 0.05, 0.075, 0.15, 0.3, 0.45, 0.6}, ε_default = 0.15.
- Calculate the correlation coefficient between the curve of each node x's PR weight and the curve of 1/ε. Label it co-co(x).
Part II – ε personalization:
- Calculate each node x's out-link personalized ε = F(ε_default, co-co(x)).
- Exponential function F_Exp = …
- The final PR weight vector is calculated with these personalized resetting values.
(Author note: remove linear.)
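Part I can be sketched as below, assuming the correlation coefficient is the ordinary Pearson coefficient (the slide does not say which one is used). The names `pearson` and `co_co`, the input format (one PR weight per resetting value for a single node), and the convention of returning 0.0 for a constant curve are assumptions. The personalization function F is not reproduced, since its formula is not given here.

```python
from math import sqrt

# Resetting probabilities listed for the detection step.
EPSILONS = [0.0375, 0.05, 0.075, 0.15, 0.3, 0.45, 0.6]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumed convention: a constant curve is uncorrelated
    return cov / sqrt(var_x * var_y)


def co_co(weight_by_eps):
    """Correlate one node's PR weight curve with the curve of 1/eps.

    weight_by_eps maps each value in EPSILONS to the node's PR weight
    computed under that resetting probability.
    """
    inverse_eps = [1.0 / e for e in EPSILONS]
    weights = [weight_by_eps[e] for e in EPSILONS]
    return pearson(inverse_eps, weights)
```

A node whose weight falls in step with 1/ε as ε grows scores close to 1 and is a collusion suspect under the observation above; a node whose weight barely moves scores near 0.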

Experiment result of Collusion200 (II)
[Figure panels: before and after (with, without).]

Topology analysis on W and B
[Figures: a small loop of two top nodes in W (labeled messenger.yahoo.com); a star sub-graph in B.]

New top-25 URL list in W
[Table legend: dropped out, dropping, new.]

Conclusion
A set of algorithms and schemes for performance and reputation in P2P systems.
They have been or will be implemented in real applications:
- Small-world Freenet: source code implemented and tested in Freenet.
- LPRS-Chord in grid computing [Cai et al. 2004].
- A collusion-robust reputation system for the weblog community.
I have also worked on:
- Traffic estimation and measurement [INFOCOM05].
- Distributed indexing for multi-dimensional range queries [NetDB05].

Thanks!

Backup slides

P2P networked systems
A collaborating group of Internet end-hosts that overlay their own special-purpose network atop the Internet.
Design goals:
- Allow rapid deployment through self-organization.
- Scale with increasing network size.
- Adapt to dynamics from both the underlying network and the users.

Small-world model [Kleinberg1999]
Small-world graphs have:
- short-distance clustering (like a regular graph);
- long-distance shortcuts (resulting in a short global path length, like a random graph).
How to find short paths in a distributed fashion?
[Figure: a one-dimensional small-world network example. Each node i has local contacts and a shortcut to a node i+j, labeled with probability 1/j; average routing path length log^2(N).]
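A small simulation can illustrate how short paths are found with only local knowledge. The sketch below is an assumption-laden toy, not Kleinberg's construction verbatim: nodes sit on a ring, each gets one shortcut at clockwise offset j drawn with weight proportional to 1/j, and routing is greedy (always forward to the contact closest to the destination). `build_ring`, `ring_distance`, and `greedy_route` are illustrative names.

```python
import random


def build_ring(n, seed=0):
    """Give every node two local contacts and one 1/j-weighted shortcut."""
    rng = random.Random(seed)
    offsets = list(range(1, n))
    weights = [1.0 / j for j in offsets]
    contacts = {}
    for i in range(n):
        j = rng.choices(offsets, weights=weights)[0]
        contacts[i] = [(i - 1) % n, (i + 1) % n, (i + j) % n]
    return contacts


def ring_distance(a, b, n):
    """Shortest distance between a and b around a ring of n nodes."""
    d = abs(a - b) % n
    return min(d, n - d)


def greedy_route(contacts, src, dst):
    """Forward to the contact nearest to dst until dst is reached."""
    n = len(contacts)
    path = [src]
    while path[-1] != dst:
        here = path[-1]
        path.append(min(contacts[here], key=lambda c: ring_distance(c, dst, n)))
    return path
```

Each hop uses only the current node's own contacts, and a local contact always reduces the remaining distance by one, so the route terminates; the shortcuts are what let it finish in far fewer hops than the ring distance on average.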

Term definition: Latency expansion
Let N_u(x) denote the number of nodes in the network G that are within latency x of node u.
- Power-law latency expansion: N_u(x) grows (i.e., "expands") proportionally to x^d, for all nodes u. Examples: ring (d=1), mesh (d=2).
- Exponential latency expansion: N_u(x) grows proportionally to α^x for some constant α > 1. Examples: random graphs.
While overlay, the underlying network has a critical impact on the performance of …

Lookup-Parasitic Random Sampling (LPRS)
Uniform sampling in terms of ranges: for a routing request with (log N) hops, the final node t will be a random node in (log N) different ranges for the intermediate nodes.
[Figure: a Chord network with 8 nodes and 8-bit key space, showing network nodes (128, 192, 224, 256), the data item with key 220, overlay routing, and distance measurement.]

Underlying network vs. overlay routing (II)
Expected lookup latency in Chord and cost per node, by type of underlying network:
- Exponential latency expansion: expected lookup latency on the order of L·log N; cost per node: -.
L: the average unicast latency in the underlying network. N: overlay network size.
Expansion rate: N_u(x) ~ c·α^x (α > 1; e.g., random graphs with hop distance).
[Figure: network nodes and distance measurement D; axis: latency x.]

LPRS-Chord: topology with power-law expansion
[Figure: stretch (at time 2logN) on a ring topology.]

LPRS-Chord: convergence time
[Figure: convergence time.]

Internet latency expansion
The performance of many other networking algorithms relies on the latency expansion characteristic of the underlying network:
- Request latency reduction in web cache systems [Plaxton et al.].
- Nearest neighbor search [Karger et al. 2002].
- Locality-aware DHT design [Abraham et al. 2004].
- Gossip-based communication mechanisms [Kempe et al. 2002].
The Internet router-level topology has an exponential expansion [Phillips et al. 1999].
- The expansion is defined on router-level hops.
- How about the expansion in terms of latency?

Internet latency expansion: measurement methodology
- Collected two router-level topologies: one in May 2002 with … routers, and the other in November 2003 with … routers.
- Randomly sampled about 100,000 node pairs from each topology and used their latency to estimate Internet latency expansion.
- Approximated the link latency between any two nodes by the accumulated geographic distance of the path between them under shortest-path routing; geo-locations were assigned to nodes using the Geotrack tool.

LPRS-Chord: Internet subgraphs
[Figure: stretch on the router-level graph (at time 2logN).]

Uniform sampling in terms of ranges
Node x: the node at hop x. Node 0: the request initiator. Node t: the request terminator.
[Figure: routing path from Node 0 through Node 1 and Node 2 to Node t. For Node 0, node t is in its range-d; for Node 1, node t is in its range-x (x < d); for Node 2, node t is in its range-y (y < x).]
For a routing request with (log N) hops, the final node t will be a random node in (log N) different ranges.

Reputation systems [Okita2003]
A means of describing social trust networks.
The basic concept is a democratic meritocracy: a rating system is used to evaluate individual members, and those results are then collated to produce a consensus about the merit of any given member.
Examples: Livejournal, Friendster, eBay, Advogato.

PageRank algorithm [Brin1998]
Assume N pages. Assign all pages the initial value 1/N.
Let N_u be the out-degree of page u, Rank(v) the importance of page v, and B_v the set of pages pointing to v.
Basic algorithm: for all v, Rank(v) = \sum_{u \in B_v} Rank(u) / N_u
Enhanced algorithm against rank sinks: for all v, Rank(v) = … (with a damping factor)
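The iteration can be sketched as a plain power iteration. Since the enhanced formula is not legible here, the code assumes the resetting form that matches the random walk model, Rank(v) = eps/N + (1 - eps) * sum over u in B_v of Rank(u)/N_u, and it assumes that pages without out-links spread their weight uniformly; both choices, and the name `pagerank`, are assumptions rather than the slide's exact definition.

```python
def pagerank(links, eps=0.15, iterations=200):
    """Power iteration for PageRank with resetting probability eps.

    links maps every page to the list of pages it points to.
    """
    nodes = list(links)
    n = len(nodes)
    rank = dict.fromkeys(nodes, 1.0 / n)  # initial value 1/N for all pages
    for _ in range(iterations):
        # Assumption: weight sitting on pages with no out-links is
        # redistributed uniformly, so the ranks keep summing to 1.
        dangling = sum(rank[u] for u in nodes if not links[u])
        new_rank = dict.fromkeys(nodes, eps / n + (1 - eps) * dangling / n)
        for u in nodes:
            for v in links[u]:
                new_rank[v] += (1 - eps) * rank[u] / len(links[u])
        rank = new_rank
    return rank
```

For example, with pages a and b pointing only at each other and a third page c pointing at a, the two-page loop ends up holding 95% of the total weight while c keeps only the reset share eps/N.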

PageRank [Brin et al. 1998]
- A rating scheme to rank hypertext documents on the WWW.
- An iterative algorithm to calculate the importance of a web page based on the importance of its parent pages.
- Can be applied to systems other than the WWW.

Amp(G): a metric on group collusion
[Figure: node group G, subgroup G', and nodes x, y, i, j.]
ε: resetting probability.
W_G(G') = PR(i) + PR(j): the real group weight.
W_in(G') = …: the "actual" group weight.
In the system of node group G, for a subgroup G', the amplification factor Amp(G') = …
(Author note: remove this.)

Answer for (1+1 = ?) in PageRank
In the original PageRank system, … where ε is the resetting probability.
(Author note: remove this slide.)

Experiment result of Collusion200 (I)
[Figure: W – amplification factors of the colluding groups in Collusion200.]
(Author note: remove this.)

Experiment result of Collusion200 (II)
Figure A: W – new PR weight after Collusion200.

Experiment result of Collusion200 (VII)
Figure B: B – new PR rank after Collusion200.

Experiment result of Collusion200 (X)
Figure C: B – new PR weight after Collusion200.

Next step: how to detect collusions?
Theorem on group detection hardness: max_{G' ⊆ G} Amp(G') is an NP-hard problem.
(Author note: remove this.)

Correlation coefficient

Experiment result of Collusion200 (IV)
[Figure: W – amplification factors of the colluding groups in Collusion200.]
(Author note: remove.)

Experiment result of Collusion200 (V)
[Figure: W – new PR weight after Collusion200.]

Experiment 2: Collusion22
Models various colluding subgraphs.
Methodology: 3 colluding groups (100th, 200th, …, 10000th for B, due to the smaller graph size):
- G1: 10-node ring
- G2: 10-node star topology
- G3: 2-node ring
[Figure legend: node, referential link.]

Experiment result of Collusion22 (I)
[Figure: amplification factors of the 3 colluding groups in Collusion22.]

Experiment result of Collusion22 (II)
[Figure: W – new PR weight after Collusion22.]

Experiment result of Collusion22 (III)
Figure D: W – new PR rank after Collusion22.

How about using finer statistics of the random walk
The revisit intervals of the random walk on a colluding node will likely have a large variance compared to their expectation.
Figure E: A counterexample: a star + dangling circle topology (nodes 1, 2, …, N-2, N-1, N, N+1).
