Published by Margaret Bryan. Modified over 6 years ago.
1
Handling Churn in Less-structured P2P Systems: Elders Know Best
Yi Qiao & Fabián E. Bustamante
Department of Electrical Engineering & Computer Science, Northwestern University
20’ presentation, 5’ Q&A
Who is my audience? P2P practitioners, some experimental, some more theory-oriented.
If someone remembers only one thing from my talk, what would I like it to be? You can build churn-resilient, high-performing systems through self-adaptive protocols that rely on the system’s dynamic characteristics.
2
Toward Massively Distributed Systems
What scale may bring
- Virtually infinite resources, always available
- Information everywhere at any time
- Power to the people!
… but not for free
- Resource management
- Heterogeneity
- Naming
- Administration
- Measurement, testing & debugging in the midst of chaos
John Lennon. 50-year trend from centralized to massively distributed: from centralized, high unit cost, low unit volume systems to low unit cost, high unit volume systems.
Quantitatively & qualitatively different from traditional distributed systems.
Scale & scope introduce new design & implementation problems.
Cooperative or peer-to-peer: a good model for these systems.
Qiao & Bustamante, EE&CS, Northwestern U. IEEE P2P 2005
3
Peers’ Transiency (a.k.a. Churn)
The problem with peers’ transiency
- Very large peer populations
- Autonomous nature of peers
- Architectural mutual dependencies of P2P systems
- Median session length from 1 hr to 1’ [Saroiu ’02], [Bustamante ’03], [Rhea ’04] …
Why should you care?
- E.g., for data sharing applications: control traffic cost, spread of queries, cache effectiveness, degree of replication, …
<Explain session time and lifespan>
Session time is the time from a node’s joining the system to its subsequent leaving. We use session time and lifespan interchangeably. Another metric of transiency is lifetime: the time between a node first entering the system and its final departure from it.
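To make the distinction between session time and lifetime concrete, here is a minimal sketch. It assumes a peer's history is available as a list of (join, leave) timestamp pairs; the function names and the sample values are illustrative and do not come from the talk.

```python
def session_lengths(events):
    """Length of each session, given (join, leave) timestamp pairs."""
    return [leave - join for join, leave in events]


def lifetime(events):
    """Time between the first join and the final departure."""
    first_join = min(join for join, _ in events)
    last_leave = max(leave for _, leave in events)
    return last_leave - first_join


# Illustrative history of one peer, in minutes (made-up values).
history = [(0, 60), (300, 330), (1000, 1240)]
sessions = session_lengths(history)
total = lifetime(history)
```

A peer with several short sessions can therefore have a long lifetime; the lifespan-based techniques in this talk key on the session time.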
4
The Lifespan Approach
Peer Lifespan Distribution
- Active probing of ~1 million peers’ lifespans
- RCDF of peers with lifespan in [~22’, 3.5 days]
- Pareto distribution of the form λT^k (k < 0)
- A peer’s expected remaining session length is proportional to the peer’s age
Basis for churn-resilient protocols and strategies!
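The proportionality between age and remaining session length follows from the Pareto tail. The sketch below assumes the fitted form λT^k describes the RCDF, so the tail exponent is alpha = -k, and it treats the tail as unbounded, which is an idealization of the measured range. Under those assumptions the median remaining time is proportional to age for any alpha > 0, and the mean remaining time is age / (alpha - 1) when alpha > 1. Function names are illustrative.

```python
def conditional_survival(age, extra, alpha):
    """P(lifespan > age + extra | lifespan > age) for a tail ~ T**(-alpha)."""
    return (age / (age + extra)) ** alpha


def median_remaining(age, alpha):
    """Extra time x at which conditional_survival(age, x, alpha) equals 0.5."""
    return age * (2 ** (1 / alpha) - 1)


def mean_remaining(age, alpha):
    """Expected remaining time; finite only when alpha > 1."""
    if alpha <= 1:
        raise ValueError("mean residual life is infinite for alpha <= 1")
    return age / (alpha - 1)
```

Doubling a peer's age doubles both estimates, which is the sense in which an older peer is the better bet.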
5
Outline
- Motivation & background
- Lifespan-based protocols and strategies
  - Organizational protocols
  - Query-related strategies
- Evaluation
- Conclusions
6
Organizational Protocols
The way a peer-to-peer system is structured
- Unstructured (UDP) - All peers equal; e.g., Gnutella v0.4
- Loosely structured (HDP) - Leaf & super-peers; e.g., Gnutella v0.6, Kazaa
- Highly structured (DHT)
Lifespan-based organizational protocols
- Opt for longer-lived peers when choosing neighbors and/or recommending peers to others [Bustamante02]
- Lifespan UDP (LUDP) - Opt for older peers for connections; random recommendations
- Lifespan HDP (LHDP) - Leaf and super-peers opt for older super-peers for connections
Given the Used-Better-than-New-in-Expectation nature of the lifespan distribution, LUDP and LHDP use a weighted credit selection scheme that also considers the peer’s estimated current number of available incoming connections.
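A weighted credit selection of this kind can be sketched as follows. The slide does not give the credit formula, so the product of age and free incoming slots used here is purely an assumption, as are the function names and the data layout.

```python
import random


def credit(age, free_slots):
    """Hypothetical credit: older peers with spare incoming slots score higher."""
    return age * free_slots


def pick_neighbors(candidates, count, rng=random):
    """Draw up to `count` distinct peers with probability proportional to credit.

    `candidates` maps peer id -> (age, estimated free incoming slots).
    Peers with zero credit (for example, no free slots) are never chosen.
    """
    pool = {}
    for peer, (age, slots) in candidates.items():
        score = credit(age, slots)
        if score > 0:
            pool[peer] = score
    chosen = []
    while pool and len(chosen) < count:
        peers = list(pool)
        pick = rng.choices(peers, weights=[pool[p] for p in peers], k=1)[0]
        chosen.append(pick)
        del pool[pick]  # sample without replacement
    return chosen
```

Drawing probabilistically rather than always taking the oldest peer spreads incoming connections, so long-lived peers are favored without being overloaded.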
7
Query, Caching & Replication Strategies
Flooding
- Query is propagated to all neighbors within a radius
- Inherently unscalable
K-random walks
- k parallel query messages randomly forwarded at each hop [Lv02]
- Improvement factoring in node’s degree [Adamic01], capacity [Lv03], …
Lifespan-based k-random walk query
- Opt for older peers when forwarding a query walker
- A simple weighted probabilistic approach works well
  - Avoids collision between walkers
  - Prevents hot spots
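The lifespan-based walk can be sketched as below. This is a minimal illustration, not the authors' implementation: it assumes each peer knows its neighbors' ages (all positive), weights the next hop directly by age, and avoids bouncing straight back to the sender. The graph layout and function names are hypothetical.

```python
import random


def next_hop(neighbors, ages, came_from, rng):
    """Pick the next peer with probability proportional to its age."""
    options = [n for n in neighbors if n != came_from] or list(neighbors)
    return rng.choices(options, weights=[ages[n] for n in options], k=1)[0]


def lifespan_walk(graph, ages, holders, start, k=4, ttl=20, rng=None):
    """Launch k walkers from `start`; return the set of holders they reach."""
    rng = rng or random.Random()
    hits = set()
    for _ in range(k):
        prev, here = None, start
        for _ in range(ttl):
            if not graph[here]:
                break  # dead end: this walker stops
            prev, here = here, next_hop(graph[here], ages, prev, rng)
            if here in holders:
                hits.add(here)
                break  # walker terminates on a query hit
    return hits
```

Because forwarding is probabilistic rather than strictly oldest-first, parallel walkers tend to diverge and the oldest peers are not hit by every query.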
8
Query, Caching & Replication Strategies
- Neighbor Caching with incremental Update (NCU)
- Path Caching with eXpiration (PCX) [Roussopoulos03]
  - Effectiveness not obvious for less-structured systems
- Regional Caching with eXpiration (RCX) - new
  - Peers in the query hit path push query hit entries to some of their neighbors
- Lifespan-based RCX
  - Caching in older neighbors along the path
  - Expiration threshold for cached entries is based on the age of the target peer
NCU: each peer maintains a cache of metadata for all its neighbors.
PCX: directly applicable, but effectiveness not obvious.
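An age-based expiration threshold could look like the sketch below. The slide only says that the threshold depends on the target peer's age; the linear rule, the cap, and all names here are assumptions made for illustration.

```python
class AgeAwareCache:
    """Query-hit cache whose entries live longer when the target peer is older."""

    def __init__(self, ttl_factor=0.5, max_ttl=3600.0):
        self.ttl_factor = ttl_factor  # hypothetical: fraction of the target's age
        self.max_ttl = max_ttl        # hypothetical cap, in seconds
        self.entries = {}             # key -> (target peer, expiry time)

    def put(self, key, target, target_age, now):
        """Cache a query hit; older targets get a later expiry."""
        ttl = min(self.ttl_factor * target_age, self.max_ttl)
        self.entries[key] = (target, now + ttl)

    def get(self, key, now):
        """Return the cached target, or None if absent or expired."""
        entry = self.entries.get(key)
        if entry is None:
            return None
        target, expires = entry
        if now >= expires:
            del self.entries[key]
            return None
        return target
```

The intuition is the same as for neighbor selection: an entry pointing at an old peer is likely to stay valid longer, so it can safely be kept longer.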
9
Query, Caching & Replication Strategies
- Simple replication - make replicas on requesters
- Proactive replication (path replication) puts more replicas on multiple peers
- Regional replication - more effective than path replication
  - Put replicas on some neighbors of each peer along the query path
- Lifespan-based Regional Replication (LRRep)
  - Opt for in-the-path-region peers’ older neighbors for placing replicas
  - Upper bound on the number of replicas each peer can store
Replication also plays an important role in improving the system’s performance scalability.
All query, caching and replication strategies can be built on either the original organizational protocols, such as UDP and HDP, or the lifespan-based protocols, such as LUDP and LHDP.
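Replica placement in the lifespan-based regional scheme can be sketched as follows. For simplicity this version picks the oldest eligible neighbors deterministically; a weighted random choice would fit the slide's wording equally well. The parameters `per_peer` and `capacity` and the data layout are assumptions.

```python
def place_replicas(path, neighbors, ages, stored, per_peer=1, capacity=3):
    """Return the peers chosen to hold a new replica.

    For each peer on the query path, take its `per_peer` oldest neighbors
    that still have room (fewer than `capacity` replicas already stored).
    """
    chosen = []
    for peer in path:
        by_age = sorted(neighbors[peer], key=lambda n: ages[n], reverse=True)
        picked = 0
        for candidate in by_age:
            if picked == per_peer:
                break
            if candidate in chosen or stored.get(candidate, 0) >= capacity:
                continue  # already selected, or at its replica upper bound
            chosen.append(candidate)
            picked += 1
    return chosen
```

The capacity check is what keeps the oldest peers from accumulating every replica in their region.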
10
Determining Peer’s Age
Effectiveness of the lifespan-based approach depends on
- Fitness of session length estimators
- Accuracy of peers’ age information
- …
A lightweight distributed protocol for age determination
Some good characteristics
- Age is never directly requested from the peer itself
- Trimming/sampling reduces the probability of small cabals
P trying to determine C’s age
1. Witness collection
   Get from C a list of potential witnesses & interaction windows
2. Witness sampling & trimming
   a. Trim witnesses with suspiciously large interaction windows
   b. Sample final list W
3. Collecting testimonies & determining age
   a. Validate C’s reported interaction windows by asking peers in W
   b. Determine C’s age
Algorithm based on previous work on reputation: Damiani et al., CCS 2002; Dutta et al., P2PEcon 2003
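The three steps above can be sketched as a single function run by P. The slide does not specify how testimonies are compared or how the age is derived, so the exact-match validation and the "earliest confirmed interaction" rule below are assumptions, as are all parameter names.

```python
import random


def determine_age(claims, ask_witness, now, max_window, sample_size, rng=None):
    """Estimate C's age from witnesses rather than from C itself.

    `claims` maps witness id -> (start, end) interaction window reported by C.
    `ask_witness(w)` returns the (start, end) window that w recalls, or None.
    """
    rng = rng or random.Random()
    # Step 2a: trim witnesses whose claimed window is suspiciously large.
    trimmed = {w: win for w, win in claims.items() if win[1] - win[0] <= max_window}
    # Step 2b: sample the final witness list W.
    witnesses = rng.sample(sorted(trimmed), min(sample_size, len(trimmed)))
    # Step 3a: keep only the windows that the witness confirms.
    confirmed = [trimmed[w] for w in witnesses if ask_witness(w) == trimmed[w]]
    if not confirmed:
        return None  # nothing could be validated
    # Step 3b: measure age from the earliest confirmed interaction.
    return now - min(start for start, _ in confirmed)
```

Since P chooses which witnesses to ask, C cannot rely on a small group of colluding peers being the ones consulted.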
11
Outline
- Motivation & background
- Lifespan-based protocols and strategies
  - Organizational protocols
  - Query-related strategies
- Evaluation
- Conclusions
[Diagram labels: Query, Caching, Replication, Organizational protocol]
12
Evaluation Setups
Simulation
- Simulations driven by 4 of the 20 lifespan traces
- ~150,000 peers online at any time
- 4 query walkers, with TTL = 20
- Simulated time 511,000” (~6 days)
Wide-area
- Modified open-source Gnutella client
- 150 PlanetLab nodes online peers during experiment
- 3 query walkers, with TTL = 7
13
Basic Advantages of Lifespan Approach
LUDP has 50-70% shorter query resolution time than UDP
… and % more query hits
Configurations compared (figure legend):
- k-random-walk query (RQuery), Simple replication (SRep), Lifespan-based Unstructured (LUDP)
- k-random-walk query (RQuery), Simple replication (SRep), Random Unstructured (UDP)
(Simulation)
14
Basic Advantages of Lifespan Approach
Comparable results in wide-area experiments
LUDP delivers >40% more query hits than UDP
Configurations compared (figure legend):
- k-random-walk query (RQuery), Simple replication (SRep), Lifespan-based Unstructured (LUDP)
- k-random-walk query (RQuery), Simple replication (SRep), Random Unstructured (UDP)
(Wide-Area)
15
Basic Advantages of Lifespan Approach
And with hierarchical protocols
Significantly faster query - 3x faster for 50% of queries
… and more query hits
Configurations compared (figure legend):
- k-random-walk query (RQuery), Simple replication (SRep), Lifespan-based Hierarchical (LHDP)
- k-random-walk query (RQuery), Simple replication (SRep), Random Hierarchical (HDP)
(Simulation)
16
Lifespan-based Query-related Strategies
Significantly faster query x faster for 50% of queries
… and more query hits
Configurations compared (figure legend):
- Lifespan k-random-walk query (LQuery), Lifespan-based regional replication (LRRep), Unstructured (UDP)
- k-random-walk query (RQuery), Regional replication (RRRep), Unstructured (UDP)
(Wide-Area)
17
Combined Strengths
>4x faster query resolution
… and 3x improvement on query hits
Configurations compared (figure legend):
- Lifespan k-random-walk query (LQuery), Lifespan-based regional replication (LRRep), Lifespan-based Hierarchical (LHDP)
- k-random-walk query (RQuery), Regional replication (RRRep), Random Hierarchical (HDP)
(Simulation)
18
Conclusions & Future Work
- Need to address churn resilience in massively distributed systems
- Lifespan is a good base for structurally resilient systems
- Illustrative lifespan-based organizational protocols & strategies
- Demonstrated effectiveness through trace-driven simulations & wide-area experiments
  - Lower control overhead
  - Faster query resolution
  - Higher query hits
- Currently applying similar ideas to build structurally churn-resilient DHT systems
19
20
Basic Advantages of Lifespan Approach
Relative query satisfaction: the percentage of queries achieving Z satisfaction (i.e., at least Z query hits)
Why is lifespan-based LUDP better?
- Queries are more likely to reach older peers, which store more replicas, cache indexes longer, and are much less likely to break down query/reply paths
Columns: Replication, Caching, Aggregate Query Hit, Z=5 Satisfaction, Z=10 Satisfaction, Z=20 Satisfaction
- SRep, None: 1.57, 1.21, 1.50, 1.65
- NCU: 1.22, 1.13, 1.18
- PCX: 1.00, 1.67, 1.15, 1.37
Using PCX, LUDP results in faster query resolution.
(Simulation)
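The Z-satisfaction metric defined on this slide can be computed as in the sketch below. The function name and the sample hit counts are illustrative and are not data from the talk.

```python
def satisfaction(hit_counts, z):
    """Percentage of queries that received at least z query hits."""
    if not hit_counts:
        return 0.0
    satisfied = sum(1 for hits in hit_counts if hits >= z)
    return 100.0 * satisfied / len(hit_counts)


# Made-up per-query hit counts, for illustration only.
example_hits = [0, 3, 5, 12, 25]
z5 = satisfaction(example_hits, 5)
z10 = satisfaction(example_hits, 10)
z20 = satisfaction(example_hits, 20)
```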
21
Lifespan-based Query-related Strategies
Just from query: ~100% improvement on query resolution time & hit numbers
Configurations compared (figure legend):
- Lifespan k-random-walk query (LQuery), Simple replication (SRep), Unstructured (UDP)
- k-random-walk query (RQuery), Simple replication (SRep), Unstructured (UDP)
(Simulation)
22
Lifespan-based Query-related Strategies
Median query hit number: 25 to 60
90% query resolution time: 0.2 sec to 0.55 sec
Configurations compared (figure legend):
- Lifespan k-random-walk query (LQuery), Lifespan-based regional replication (LRRep), Lifespan-based regional caching (LRCX), Unstructured (UDP)
- k-random-walk query (RQuery), Regional replication (RRRep), Regional caching (RRCX), Unstructured (UDP)
(Simulation)