Handling Churn in Less-structured P2P Systems: Elders Know Best

Yi Qiao & Fabián E. Bustamante, Department of Electrical Engineering & Computer Science, Northwestern University, {yqiao,fabianb}@cs.northwestern.edu
20-minute presentation, 5-minute Q&A.
Who is my audience? P2P practitioners, some experimental, some more theory-oriented.
If someone remembers only one thing from my talk, what would I like it to be? You can build churn-resilient, high-performance systems through self-adaptive protocols that rely on the system's dynamic characteristics.

Toward Massively Distributed Systems
What scale may bring: virtually infinite resources, always available; information everywhere, at any time; power to the people! (John Lennon, 1940-1980)
... but not for free: resource management, heterogeneity, naming, administration, and measurement, testing & debugging in the midst of chaos.
There is a 50-year trend from centralized to massively distributed systems: from centralized systems with high unit cost and low unit volume to systems with low unit cost and high unit volume. These systems are quantitatively and qualitatively different from traditional distributed systems, and their scale and scope introduce new design and implementation problems. Cooperative, or peer-to-peer, systems are a good model for them.
Qiao & Bustamante, EE&CS, Northwestern U. IEEE P2P 2005

Peers' Transiency (a.k.a. Churn)
The problem with peers' transiency: very large peer populations, the autonomous nature of peers, and the architectural mutual dependencies of P2P systems. Median session lengths range from 1 hour down to 1 minute [Saroiu '02], [Bustamante '03], [Rhea '04], ...
Why should you care? For data-sharing applications, for example, churn affects control traffic cost, the spread of queries, cache effectiveness, the degree of replication, ...
<Explain session time and lifespan> Session time is the time from a node's joining the system to its subsequent departure from it; we use session time and lifespan interchangeably. Another metric of transiency is lifetime: the time between the node first entering the system and its final departure from it.

The Lifespan Approach: Peer Lifespan Distribution
Active probing of the lifespans of ~1 million peers.
The RCDF of peers with lifespans in [~22 min, 3.5 days] follows a Pareto distribution of the form λT^k (k < 0).
A peer's expected remaining session length is therefore proportional to the peer's age. This is the basis for churn-resilient protocols and strategies!
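Why a Pareto fit implies "expected remaining session length proportional to age" can be checked directly. The minimal sketch below assumes the slide's λT^k is rewritten as P(T > t) = (t_min / t)^α with α = -k and λ = t_min^α; integrating P(T > t) / P(T > age) from age to infinity gives age / (α - 1). The shape value used in the example is illustrative, not fitted to the measured traces.

```python
# Minimal sketch: Pareto session lengths and mean residual session length.
# Assumption: the RCDF lambda * T**k (k < 0) is written as
# P(T > t) = (t_min / t) ** alpha, with alpha = -k and lambda = t_min ** alpha.


def pareto_survival(t: float, alpha: float, t_min: float) -> float:
    """P(T > t) for a Pareto session length with shape alpha and minimum t_min."""
    if t <= t_min:
        return 1.0
    return (t_min / t) ** alpha


def expected_remaining(age: float, alpha: float) -> float:
    """E[T - age | T > age] for a Pareto session length, valid for age >= t_min.

    Integrating P(T > t) / P(T > age) from age to infinity yields age / (alpha - 1),
    which grows linearly with the peer's current age.
    """
    if alpha <= 1:
        raise ValueError("mean residual session length is infinite for alpha <= 1")
    return age / (alpha - 1)


if __name__ == "__main__":
    alpha = 2.0  # illustrative shape only, not fitted to the measured traces
    for age_min in (22, 60, 240):
        print(age_min, "min old -> expected", expected_remaining(age_min, alpha), "more min")
```

Under this model a peer that has been online four times longer is expected to stay four times longer, which is what makes "prefer elders" a sensible default.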

Outline
Motivation & background
Lifespan-based protocols and strategies: organizational protocols; query-related strategies
Evaluation
Conclusions

Organizational Protocols
The way a peer-to-peer system is structured:
Unstructured (UDP): all peers are equal; e.g., Gnutella v0.4.
Loosely structured (HDP): leaf peers and super-peers; e.g., Gnutella v0.6, Kazaa.
Highly structured (DHT).
Lifespan-based organizational protocols opt for longer-lived peers when choosing neighbors and/or recommending peers to others [Bustamante02]:
Lifespan UDP (LUDP): opt for older peers for connections; random recommendations.
Lifespan HDP (LHDP): leaf peers and super-peers opt for older super-peers for connections.
Given the Used-Better-than-New-in-Expectation nature of the lifespan distribution, LUDP and LHDP use a weighted credit selection scheme that also considers each peer's estimated current number of available incoming connections.
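The slide does not give the exact credit formula, so the sketch below is hypothetical: it assumes credit = age × free incoming slots and draws neighbors without replacement with probability proportional to credit. It only illustrates the idea that older peers are favored while peers with no spare capacity are skipped.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    peer_id: str
    age: float       # estimated time since the peer joined, in seconds
    free_slots: int  # estimated number of available incoming connections


def credit(c: Candidate) -> float:
    """Hypothetical credit: older peers with spare capacity score higher.

    A peer with no free slots gets zero credit and is never picked, so the
    oldest peers are not flooded with connection requests they must refuse.
    """
    return c.age * max(c.free_slots, 0)


def pick_neighbors(candidates: List[Candidate], n: int, rng: random.Random) -> List[Candidate]:
    """Pick up to n distinct candidates, each draw proportional to credit."""
    pool = [c for c in candidates if credit(c) > 0]
    chosen: List[Candidate] = []
    while pool and len(chosen) < n:
        weights = [credit(c) for c in pool]
        idx = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        chosen.append(pool.pop(idx))  # sample without replacement
    return chosen


if __name__ == "__main__":
    rng = random.Random(7)
    peers = [Candidate("old", 7200, 2), Candidate("mid", 1800, 4),
             Candidate("new", 60, 5), Candidate("full", 86400, 0)]
    print([c.peer_id for c in pick_neighbors(peers, 2, rng)])
```

A probabilistic draw rather than "always take the oldest" keeps load spread across many long-lived peers.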

Query, Caching & Replication Strategies
Flooding: the query is propagated to all neighbors within a radius; inherently unscalable.
K-random walks: k parallel query messages, randomly forwarded at each hop [Lv02]; improvements factor in node degree [Adamic01], capacity [Lv03], ...
Lifespan-based k-random-walk query: opt for older peers when forwarding a query walker. A simple weighted probabilistic approach works well: it avoids collisions between walkers and prevents hot spots.
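A minimal sketch of a lifespan-weighted k-random walk, assuming a hypothetical adjacency dict `graph`, an `ages` map and a `has_item` predicate. Each hop picks a neighbor with probability proportional to age + 1 (the +1 is an assumption that keeps brand-new peers reachable). The shared per-query `visited` set is also an assumption added here so walkers do not revisit peers; it is not stated on the slide.

```python
import random
from typing import Callable, Dict, List, Set, Tuple


def lifespan_k_walk(graph: Dict[str, List[str]], ages: Dict[str, float], source: str,
                    k: int, ttl: int, has_item: Callable[[str], bool],
                    rng: random.Random) -> Tuple[Set[str], Set[str]]:
    """Run k query walkers from source for at most ttl hops.

    At each hop a walker moves to an unvisited neighbor chosen with probability
    proportional to (age + 1): older peers are preferred but not always picked,
    so walkers spread out instead of all converging on the single oldest peer.
    Returns (peers with a query hit, peers visited).
    """
    visited = {source}
    hits: Set[str] = set()
    walkers = [source] * k
    for _ in range(ttl):
        next_walkers = []
        for node in walkers:
            options = [n for n in graph.get(node, []) if n not in visited]
            if not options:
                continue  # dead end: this walker stops
            weights = [ages.get(n, 0.0) + 1.0 for n in options]  # +1 avoids all-zero weights
            nxt = rng.choices(options, weights=weights, k=1)[0]
            visited.add(nxt)
            if has_item(nxt):
                hits.add(nxt)
            next_walkers.append(nxt)
        walkers = next_walkers
        if not walkers:
            break
    return hits, visited
```

Replacing the weights with a deterministic "oldest neighbor" rule would send every walker down the same path, which is the collision and hot-spot problem the slide mentions.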

Query, Caching & Replication Strategies
Neighbor Caching with incremental Update (NCU) and Path Caching with eXpiration (PCX) [Roussopoulos03]: their effectiveness is not obvious for less-structured systems.
Regional Caching with eXpiration (RCX), new: peers in the query-hit path push query-hit entries to some of their neighbors.
Lifespan-based RCX: cache in older neighbors along the path; the expiration threshold for cached entries is based on the age of the target peer.
NCU: each peer maintains caches of metadata for all its neighbors. PCX: directly applicable, but its effectiveness is not obvious.
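The slide says only that the expiration threshold depends on the target peer's age, so the rule below is an assumption: TTL = min(factor × target_age, max_ttl), motivated by the Pareto property that older peers are expected to stay longer. `push_targets` is a hypothetical helper that simplifies "older neighbors" to "the m oldest neighbors".

```python
from typing import Dict, List, Optional, Tuple


class RegionalHitCache:
    """Query-hit cache whose entries expire based on the target peer's age.

    Hypothetical rule: TTL = min(factor * target_age, max_ttl), so entries that
    point at older (expected longer-lived) peers stay cached longer.
    """

    def __init__(self, factor: float = 1.0, max_ttl: float = 3600.0):
        self.factor = factor
        self.max_ttl = max_ttl
        self.entries: Dict[str, Tuple[str, float]] = {}  # key -> (target peer, expiry time)

    def insert(self, key: str, target: str, target_age: float, now: float) -> None:
        ttl = min(self.factor * target_age, self.max_ttl)
        self.entries[key] = (target, now + ttl)

    def lookup(self, key: str, now: float) -> Optional[str]:
        entry = self.entries.get(key)
        if entry is None:
            return None
        target, expires_at = entry
        if now >= expires_at:
            del self.entries[key]  # expired: the target may well have left
            return None
        return target


def push_targets(neighbors: List[str], ages: Dict[str, float], m: int) -> List[str]:
    """Pick the m oldest neighbors of a query-hit-path peer to receive the entry."""
    return sorted(neighbors, key=lambda n: ages[n], reverse=True)[:m]
```

The `max_ttl` cap keeps a very old target from pinning stale entries indefinitely.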

Query, Caching & Replication Strategies
Simple replication: make replicas on requesters.
Proactive replication (path replication): puts more replicas on multiple peers.
Regional replication, more effective than path replication: put replicas on some neighbors of each peer along the query path.
Lifespan-based Regional Replication (LRRep): opt for the older neighbors of in-the-path-region peers when placing replicas, with an upper bound on the number of replicas each peer can store.
Replication also plays an important role in improving the system's performance scalability. All query, caching and replication strategies can be built on either the original organizational protocols, such as UDP and HDP, or lifespan-based protocols, such as LUDP and LHDP.
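A minimal sketch of LRRep as described above, assuming hypothetical `graph`, `ages` and per-peer `stores` structures. `per_peer` (replicas placed around each path peer) and `capacity` (the per-peer upper bound) are illustrative knobs, not values from the talk.

```python
from typing import Dict, List, Set


def lifespan_regional_replicate(path: List[str], graph: Dict[str, List[str]],
                                ages: Dict[str, float], stores: Dict[str, Set[str]],
                                item: str, per_peer: int = 1, capacity: int = 10) -> List[str]:
    """Place replicas of item on the oldest neighbors of each peer on the query path.

    per_peer and capacity are hypothetical knobs. Returns the peers that
    received a new replica, in placement order.
    """
    placed = []
    for peer in path:
        # Skip neighbors that already hold the item or have hit the replica bound.
        candidates = [n for n in graph.get(peer, [])
                      if item not in stores[n] and len(stores[n]) < capacity]
        candidates.sort(key=lambda n: ages[n], reverse=True)  # oldest first
        for n in candidates[:per_peer]:
            stores[n].add(item)
            placed.append(n)
    return placed
```

Because a neighbor that already received the item is filtered out, overlapping neighborhoods along the path do not produce duplicate replicas.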

Determining a Peer's Age
The effectiveness of the lifespan-based approach depends on the fitness of session-length estimators, the accuracy of peers' age information, ...
A lightweight distributed protocol for age determination. Some good characteristics: a peer's age is never requested directly from the peer itself, and trimming/sampling reduces the chance that a small cabal of colluding peers can skew the result.
P trying to determine C's age:
1. Witness collection: get from C a list of potential witnesses and interaction windows.
2. Witness sampling & trimming: (a) trim witnesses with suspiciously large interaction windows; (b) sample the final list W.
3. Collecting testimonies & determining age: (a) validate C's reported interaction windows by asking the peers in W; (b) determine C's age.
The algorithm is based on previous work on reputation: Damiani et al., CCCS 2002; Dutta et al., P2PEcon 2003.
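The three steps can be sketched as follows. The concrete rules are assumptions, not taken from the talk: "suspiciously large" is modeled as a window longer than a hypothetical `max_window`, validation is a hypothetical `confirm` callback to each sampled witness, and the age is taken as the time since the earliest confirmed interaction.

```python
import random
from typing import Callable, Dict, Optional, Tuple

Window = Tuple[float, float]  # (start, end) of an interaction between C and a witness


def estimate_age(claimed: Dict[str, Window],
                 confirm: Callable[[str, Window], bool],
                 now: float, max_window: float, sample_size: int,
                 rng: random.Random) -> Optional[float]:
    """P's estimate of C's age from witness testimonies (hypothetical rules).

    claimed: step 1 output, the witnesses and interaction windows reported by C.
    confirm: asks a witness whether it really interacted with C in that window.
    """
    # Step 2a: trim witnesses whose claimed window is suspiciously large.
    trimmed = [w for w, (start, end) in claimed.items() if end - start <= max_window]
    # Step 2b: sample the final witness list W.
    sample = rng.sample(trimmed, min(sample_size, len(trimmed)))
    # Step 3a: keep only windows the witnesses themselves confirm.
    confirmed = [claimed[w] for w in sample if confirm(w, claimed[w])]
    if not confirmed:
        return None  # no trustworthy evidence; the caller decides how to treat C
    # Step 3b: age = time since the earliest confirmed interaction.
    return now - min(start for start, _ in confirmed)
```

Note that C only supplies witnesses and windows, never a number for its own age, matching the property listed on the slide.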

Evaluation Setups
Simulation: driven by 4 of the 20 lifespan traces; ~150,000 peers, 3,000-4,000 online at any time; 4 query walkers with TTL = 20; simulated time 511,000 s (~6 days).
Wide-area: modified open-source Gnutella client on 150 PlanetLab nodes; 200-300 online peers during the experiment; 3 query walkers with TTL = 7.

Basic Advantages of the Lifespan Approach (simulation)
LUDP has 50-70% shorter query resolution time than UDP ... and 50-70% more query hits.
[Figure: k-random-walk query (RQuery) with simple replication (SRep), over lifespan-based unstructured (LUDP) vs. random unstructured (UDP) organization]

Basic Advantages of the Lifespan Approach (wide-area)
Comparable results in wide-area experiments: LUDP delivers >40% more query hits than UDP.
[Figure: k-random-walk query (RQuery) with simple replication (SRep), over lifespan-based unstructured (LUDP) vs. random unstructured (UDP) organization]

Basic Advantages of the Lifespan Approach (simulation)
And with hierarchical protocols: significantly faster queries, 3x faster for 50% of queries ... and more query hits.
[Figure: k-random-walk query (RQuery) with simple replication (SRep), over lifespan-based hierarchical (LHDP) vs. random hierarchical (HDP) organization]

Lifespan-based Query-related Strategies (wide-area)
Significantly faster queries, 2-3x faster for 50% of queries ... and more query hits.
[Figure: lifespan k-random-walk query (LQuery) with lifespan-based regional replication (LRRep) vs. k-random-walk query (RQuery) with regional replication (RRRep), both over unstructured (UDP)]

Combined Strengths (simulation)
>4x faster query resolution times ... and a 3x improvement in query hits.
[Figure: lifespan k-random-walk query (LQuery) with lifespan-based regional replication (LRRep) over lifespan-based hierarchical (LHDP) vs. k-random-walk query (RQuery) with regional replication (RRRep) over random hierarchical (HDP)]

Conclusions & Future Work
We need to address churn resilience in massively distributed systems.
Lifespan is a good basis for structurally resilient systems.
Illustrative lifespan-based organizational protocols & strategies.
Demonstrated effectiveness through trace-driven simulations & wide-area experiments: lower control overhead, faster query resolution, more query hits.
Currently applying similar ideas to build structurally churn-resilient DHT systems.

Basic Advantages of the Lifespan Approach (simulation)
Relative query satisfaction: the percentage of queries achieving Z satisfaction (i.e., at least Z query hits).
Why is lifespan-based LUDP better? Queries are more likely to reach older peers, which store more replicas, cache indexes longer, and are much less likely to break query/reply paths.
Replication  Caching  Aggregate query hits  Z=5 satisfaction  Z=10 satisfaction  Z=20 satisfaction
SRep         None     1.57                  1.21              1.50               1.65
SRep         NCU      1.22, 1.13, 1.18 (only three values survive; column alignment lost)
SRep         PCX      1.00                  1.67              1.15               1.37
With PCX, LUDP achieves faster query resolution.
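The satisfaction metric is simple to compute from per-query hit counts. The sketch below assumes, as the values near 1.0 suggest but the slide does not state, that the table's entries are LUDP's satisfaction divided by UDP's under the same replication and caching setup; `lifespan_hits` and `baseline_hits` are hypothetical per-query hit lists.

```python
from typing import Sequence


def satisfaction(hits_per_query: Sequence[int], z: int) -> float:
    """Fraction of queries that achieved Z satisfaction (at least z query hits)."""
    if not hits_per_query:
        return 0.0
    return sum(1 for h in hits_per_query if h >= z) / len(hits_per_query)


def relative_satisfaction(lifespan_hits: Sequence[int], baseline_hits: Sequence[int], z: int) -> float:
    """Assumed reading of the table: LUDP satisfaction divided by UDP satisfaction."""
    base = satisfaction(baseline_hits, z)
    if base == 0:
        raise ValueError("baseline satisfied no queries at this z")
    return satisfaction(lifespan_hits, z) / base
```

Under this reading, a value of 1.50 at Z=10 would mean LUDP satisfied 50% more queries at that threshold than UDP.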

Lifespan-based Query-related Strategies (simulation)
From the query strategy alone: ~100% improvement in query resolution time and number of query hits.
[Figure: lifespan k-random-walk query (LQuery) vs. k-random-walk query (RQuery), both with simple replication (SRep) over unstructured (UDP)]

Lifespan-based Query-related Strategies (simulation)
Median number of query hits: 25 to 60. Query resolution time for 90% of queries: 0.2 s to 0.55 s.
[Figure: lifespan k-random-walk query (LQuery) with lifespan-based regional replication (LRRep) and lifespan-based regional caching (LRCX) vs. k-random-walk query (RQuery) with regional replication (RRRep) and regional caching (RRCX), both over unstructured (UDP)]