1 - CS7701 – Fall 2004
Review of: Making Gnutella-like P2P Systems Scalable
Paper by:
– Yatin Chawathe (AT&T)
– Sylvia Ratnasamy (Intel)
– Lee Breslau (AT&T)
– Nick Lanham (UC Berkeley)
– Scott Shenker (ICSI)
Published in:
– ACM SIGCOMM 2003
Reviewed by:
– Todd Sproull
Discussion Leader:
– Christoph Jechlitschek
CS7701: Research Seminar on Networking

2 - Outline
– Introduction
– Problem Description
– Gia Design
– Simulation Results
– Implementation
– Conclusions

3 - Introduction
Peer-to-Peer (P2P) Networks
– “Systems serving other systems”
– Potential for millions of users
– Gained consumer popularity through Napster
Napster
– Started in 1999 by Shawn Fanning
– Enabled music fans to trade songs over a P2P network
– Clients connected to centralized Napster servers to locate music
– 2001: a judge ruled that Napster had to block all copyrighted material
– 2002: Napster folded
    The RIAA continued to pursue Napster clones
Gnutella
– March 14, 2000: Nullsoft released the first version of the software
    Created by Justin Frankel and Tom Pepper
    Nullsoft pulled the software the next day
– The software was reverse engineered
– Open-source clients became available
– Built around a decentralized approach

4 - Gnutella
Distributed search and download
Unstructured: ad-hoc topology
– Peers connect to random nodes
Random search
– Flood queries across the network
Scaling problems
– As the network grows, search overhead increases
[Figure: peers P1 to P6; the query “who has madonna” is flooded through the network; P4 has “madonna-american-life.mp3” and P2 has “madonna-ray-of-light.mp3”]
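To make the scaling problem concrete, here is a minimal sketch of TTL-bounded flooding over a small topology. The six-peer graph is hypothetical (the figure does not give the links), and the duplicate handling (count the message, do not re-forward) is an assumption about how a node treats a query it has already seen.

```python
from collections import deque

def flood(graph, source, ttl):
    """Flood a query from `source` breadth-first, decrementing the TTL at each hop.

    Every node forwards the query to all neighbors except the one it came from.
    A duplicate still costs a message but is not forwarded again.
    Returns (set of nodes reached, number of messages sent).
    """
    reached = {source}
    messages = 0
    frontier = deque([(source, None, ttl)])
    while frontier:
        node, parent, hops_left = frontier.popleft()
        if hops_left == 0:
            continue  # TTL exhausted: this copy stops here
        for nbr in graph[node]:
            if nbr == parent:
                continue
            messages += 1
            if nbr not in reached:
                reached.add(nbr)
                frontier.append((nbr, node, hops_left - 1))
    return reached, messages

# Hypothetical six-peer topology.
graph = {
    "P1": ["P2", "P3"],
    "P2": ["P1", "P4", "P5"],
    "P3": ["P1", "P5"],
    "P4": ["P2", "P6"],
    "P5": ["P2", "P3", "P6"],
    "P6": ["P4", "P5"],
}

for ttl in (2, 3):
    reached, messages = flood(graph, "P1", ttl)
    print(ttl, len(reached), messages)  # prints "2 5 5", then "3 6 8"
```

With TTL 3 all six peers are reached, but eight messages are sent to do it; three of them are duplicates. That overhead grows with the degree and size of the network.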

5 - Problem
Gnutella has notoriously poor scaling
– Flooding-based search
Solution
– Simply using Distributed Hash Tables does not necessarily fix the problem
Challenge
– Improve scaling while maintaining Gnutella’s simplicity
    Propose new mechanisms to fix the scalability issues
    Evaluate the performance of the individual components and of the entire network

6 - What about DHTs?
Distributed Hash Tables (DHTs)
– Provide a hash table abstraction over multiple compute nodes
How it works
– Each DHT node stores data items
– Data items are indexed via a lookup key
– Overlay routing delivers requests for a given key to the responsible node
– O(log N) message hops in a network of N nodes
– The DHT adjusts the mapping of keys and the neighbor tables when the node set changes
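The slide does not say how a DHT decides which node is “responsible” for a key. One common scheme, used by Chord, is consistent hashing on an identifier ring: a key belongs to the first node whose identifier is at or after the key’s identifier. The sketch below shows only that mapping, under that assumption; the O(log N) overlay routing is left out. The class and function names are illustrative.

```python
import bisect
import hashlib

def ring_id(name: str, bits: int = 32) -> int:
    """Map a node name or key to a point on a 2**bits identifier ring."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)

class HashRing:
    """Assigns each key to its successor node on the ring (consistent hashing)."""

    def __init__(self, nodes):
        self.points = sorted((ring_id(n), n) for n in nodes)

    def responsible_node(self, key: str) -> str:
        ids = [p for p, _ in self.points]
        i = bisect.bisect_left(ids, ring_id(key))
        return self.points[i % len(self.points)][1]  # wrap around past the top

    def add_node(self, node: str) -> None:
        bisect.insort(self.points, (ring_id(node), node))

    def remove_node(self, node: str) -> None:
        self.points = [(p, n) for p, n in self.points if n != node]

ring = HashRing(["A", "B", "C", "D", "E"])
owner = ring.responsible_node("madonna-ray-of-light.mp3")
```

The point of this design is the last bullet on the slide. When a node leaves, only the keys it owned move (to its successor), so the DHT adjusts its mapping without reshuffling everything.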

7 - Example
[Figure: DHT lookup for key 6 among nodes A, B, C, D, and E. B’s routing table maps key 7 → C and key 8 → D; D’s routing table maps key 6 → E. The request “Key 6?” is forwarded from node to node until E answers “I have key 6”.]

8 - DHT-only P2P network?
Problems
– P2P clients are transient
    Clients join and leave at rates that cause a fair amount of “churn”
    Route failures require O(log n) repair operations
– Keyword searches are more prevalent, and more important, than exact-match queries
    “Madonna Ray of Light mp3” or “Madona Ray Light mp3”
– Queries are for hay, not needles
    Most requests are for popular content
    50% of content requests are for files with more than 100 replicas
    80% of content requests are for files with more than 80 replicas

9 - The Solution
Design a new Gnutella-like P2P system, “Gia”
– Short for gianduia, the generic form of the hazelnut spread Nutella
What’s so great about it?
– Dynamic topology adaptation
    Accounts for heterogeneity among nodes
– Active flow control scheme
    Implements token-based allocation for queries
– One-hop replication
    Keeps small nodes next to well-connected, “higher capacity” nodes
    Capacity refers to the number of messages a node can process per unit time
– Search protocol based on random walks
    No longer floods the network with requests

10 -
Make high-capacity nodes easily reachable
– Dynamic topology adaptation
Make high-capacity nodes have more answers
– One-hop replication
Search efficiently
– Biased random walks
Prevent overloaded nodes
– Active flow control
[Figure: example query]

11 - Dynamic Topology Adaptation
Core component of Gia
Goals
– Ensure that high-capacity nodes are the ones with high degree
– Keep low-capacity nodes within short reach of high-capacity nodes
Accomplished through a satisfaction level S
– When S = 0, the node is dissatisfied
– As the node accumulates more neighbors, its satisfaction rises until it reaches 1

12 - Adding new neighbors
Adding neighbor Y to X
– Add the new neighbor if room exists
– If there is no room, check whether an existing neighbor can be replaced
– Goal: among the existing neighbors whose capacity is less than or equal to the new neighbor’s, find the one with the highest degree
    Do not drop an already poorly connected neighbor
Assumptions:
– Max neighbors of X = 3
– Capacity of all nodes is the same
[Figure: nodes X, A, B, C, and Y]
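The replacement rule can be sketched as a small decision function. This is an illustration of the slide’s wording, not the paper’s exact algorithm. In particular, “do not drop an already poorly connected neighbor” is modeled here, as an assumption, by dropping the candidate only if it has more neighbors than Y. `Node` and `choose_neighbor_to_drop` are hypothetical names, and the neighbor degrees in the example are made up.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)  # identity comparison; neighbor lists may be cyclic
class Node:
    name: str
    capacity: float
    max_neighbors: int = 3
    neighbors: list = field(default_factory=list, repr=False)

    @property
    def degree(self) -> int:
        return len(self.neighbors)

def choose_neighbor_to_drop(x: Node, y: Node):
    """Decide how X handles a request from Y to become its neighbor.

    Returns ("add", None) if X has room, ("replace", z) if X should drop z
    to make room for Y, or ("reject", None) otherwise.
    """
    if x.degree < x.max_neighbors:
        return "add", None
    # Only neighbors with capacity <= Y's capacity are candidates for dropping.
    candidates = [n for n in x.neighbors if n.capacity <= y.capacity]
    if not candidates:
        return "reject", None
    # The highest-degree candidate loses the least connectivity if dropped.
    z = max(candidates, key=lambda n: n.degree)
    # Assumption: drop z only if it is better connected than Y.
    if z.degree > y.degree:
        return "replace", z
    return "reject", None

def with_degree(name, degree, capacity=1.0):
    """Helper: a node with `degree` placeholder neighbors (hypothetical)."""
    n = Node(name, capacity)
    n.neighbors = [Node(f"{name}{i}", capacity) for i in range(degree)]
    return n

# Slide scenario: X is full with A, B, C; all capacities equal.
a, b, c, y = with_degree("A", 1), with_degree("B", 3), with_degree("C", 2), with_degree("Y", 1)
x = Node("X", 1.0, neighbors=[a, b, c])
action, victim = choose_neighbor_to_drop(x, y)
print(action, victim.name)  # prints "replace B"
```

B is dropped because it is the best-connected of the equal-capacity candidates. A, which has only one link, is left alone.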

13 - Token-Based Flow Control
A client may send a query to a neighbor only if that neighbor allows it
– The client must hold a token from the neighbor
Tokens are sent periodically from a client to its neighbors
– The token allocation rate is based on the node’s ability to process queries
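A minimal sketch of the token scheme follows. Each node periodically grants as many tokens as it can process queries per interval, and a sender spends one token per query. The slide says only that the allocation rate depends on the granting node’s capacity. Splitting the tokens in proportion to each neighbor’s capacity, rounded down, is an assumption made here, and the class name is hypothetical.

```python
from collections import defaultdict

class FlowControlledNode:
    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity             # queries this node can handle per interval
        self.neighbors = []
        self.tokens_from = defaultdict(int)  # neighbor name -> tokens it granted us

    def grant_tokens(self):
        """Periodic step: give out `capacity` tokens among the neighbors.

        Assumption: tokens are split in proportion to each neighbor's capacity
        and rounded down.
        """
        total = sum(n.capacity for n in self.neighbors)
        if total == 0:
            return
        for n in self.neighbors:
            n.tokens_from[self.name] += int(self.capacity * n.capacity / total)

    def try_send_query(self, neighbor):
        """Spend one token from `neighbor`; False means the query must be queued."""
        if self.tokens_from[neighbor.name] <= 0:
            return False
        self.tokens_from[neighbor.name] -= 1
        return True

hub = FlowControlledNode("H", 10)
slow, fast = FlowControlledNode("S", 1), FlowControlledNode("F", 4)
hub.neighbors = [slow, fast]
hub.grant_tokens()
print(slow.tokens_from["H"], fast.tokens_from["H"])  # prints "2 8"
```

Because a node never hands out more tokens than it can process, a neighbor that runs out of tokens has to hold its queries instead of overloading the node.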

14 - One-Hop Replication
Gia nodes maintain an index of their neighbors’ content
– Improves the efficiency of the search process
– Allows neighbors to respond to search queries
Being “close” to content is useful
– A node need not hold the requested content itself; a pointer to it is enough
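The index can be sketched as a map from neighbor to that neighbor’s file names, so a node can answer for content one hop away. The sketch assumes the full file list is exchanged when a link forms and that matching is a simple substring test. Both are simplifications for illustration, and `OneHopIndex` is a hypothetical name. The file names come from the Gnutella figure.

```python
class OneHopIndex:
    """A node's own files plus an index of its neighbors' files."""

    def __init__(self, name, files):
        self.name = name
        self.files = set(files)
        self.neighbor_index = {}  # neighbor name -> set of that neighbor's files

    def learn_neighbor(self, neighbor):
        # Assumption: the neighbor's full file list is copied when the link forms.
        self.neighbor_index[neighbor.name] = set(neighbor.files)

    def forget_neighbor(self, name):
        self.neighbor_index.pop(name, None)

    def answer(self, keyword):
        """Return sorted (holder, filename) pairs for local and one-hop matches."""
        hits = [(self.name, f) for f in self.files if keyword in f]
        for holder, files in self.neighbor_index.items():
            hits.extend((holder, f) for f in files if keyword in f)
        return sorted(hits)

p2 = OneHopIndex("P2", ["madonna-ray-of-light.mp3"])
p4 = OneHopIndex("P4", ["madonna-american-life.mp3", "other.mp3"])
hub = OneHopIndex("H", ["jazz.mp3"])
hub.learn_neighbor(p2)
hub.learn_neighbor(p4)
print(hub.answer("madonna"))
```

The hub holds no Madonna files itself, yet it can point the querier to P2 and P4. That is the “pointer to the content” idea on the slide.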

15 - Search Protocol
Based on biased random walks
– A Gia node selects the highest-capacity neighbor for which it has tokens and sends it the query
– The node queues the message if no tokens are available for any neighbor
Uses two mechanisms for control
– A TTL bounds the duration of walks
– A MAX_RESPONSES parameter sets the maximum number of answers the search looks for
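The walk described above can be sketched as a loop that moves the query to the highest-capacity neighbor holding an unspent token. It stops when TTL hops are used up or MAX_RESPONSES answers are found. Several details are assumptions, not taken from the slide: neighbors already on the path are avoided when possible, ties are broken by node name for reproducibility, and “queued” is returned rather than actually waiting for tokens. The topology, capacities, and files are hypothetical.

```python
def biased_random_walk(start, ttl, max_responses, neighbors, capacity, tokens, lookup):
    """Walk a query from `start`; returns (results, path, status).

    neighbors: node -> list of neighbor nodes
    capacity:  node -> capacity
    tokens:    (sender, receiver) -> tokens the receiver has granted the sender
    lookup:    node -> list of answers that node can give (e.g. via one-hop index)
    """
    node, path, results = start, [start], list(lookup(start))
    for _ in range(ttl):
        if len(results) >= max_responses:
            break
        ready = [n for n in neighbors[node] if tokens.get((node, n), 0) > 0]
        if not ready:
            return results, path, "queued"  # wait for tokens before forwarding
        fresh = [n for n in ready if n not in path] or ready  # assumption: avoid revisits
        nxt = max(fresh, key=lambda n: (capacity[n], n))
        tokens[(node, nxt)] -= 1
        node = nxt
        path.append(node)
        results.extend(lookup(node))
    status = "satisfied" if len(results) >= max_responses else "ttl_expired"
    return results[:max_responses], path, status

neighbors = {"S": ["A", "B"], "A": ["S", "C"], "B": ["S", "C"], "C": ["A", "B"]}
capacity = {"S": 1, "A": 10, "B": 100, "C": 1000}
files = {"B": ["madonna-american-life.mp3"], "C": ["madonna-ray-of-light.mp3"]}

def lookup(n):
    return [(n, f) for f in files.get(n, [])]

def fresh_tokens():
    return {(u, v): 1 for u in neighbors for v in neighbors[u]}

results, path, status = biased_random_walk("S", 5, 2, neighbors, capacity, fresh_tokens(), lookup)
print(path, status)  # prints "['S', 'B', 'C'] satisfied"
```

Unlike flooding, each hop costs exactly one message, and the bias toward capacity steers the query to the nodes most likely to hold (or index) answers.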

16 - Simulations
Four basic models
– FLOOD: Gnutella model
– RWRT: random walks over random topologies
    Proposed by Lv et al.
– SUPER: classifies some nodes as “super nodes” based on capacity (> 1000x)
– GIA: Gia protocol suite
Capacity
– The number of messages (queries or add/drop requests) a node can process per unit time
– Derived from bandwidth distributions measured by Saroiu et al.
    A fair number of clients have dial-up connections
    The majority use cable-modem or DSL
    Few have “high-speed” connections

17 - Performance Metrics
Collapse Point (CP)
– The per-node query rate beyond which the success rate drops below 90%
– Referred to as the knee
Hop count before collapse (CP-HC)
– Average hop count just prior to collapse
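Given simulation samples of (per-node query rate, success rate, average hop count), the two metrics can be read off as in the sketch below. It scans rates in increasing order and returns the last rate that still meets the 90% success threshold, together with the hop count measured at that rate. Taking the hop count at that single sample as CP-HC is an assumption, and the sample values are hypothetical.

```python
def collapse_point(samples, threshold=0.90):
    """Return (CP, CP-HC) from (query_rate, success_rate, avg_hops) samples.

    CP is the highest query rate reached before the success rate first drops
    below `threshold`; CP-HC is the average hop count at that rate.
    Returns None if even the lowest rate fails.
    """
    cp = None
    for rate, success, hops in sorted(samples):
        if success < threshold:
            break  # past the knee
        cp = (rate, hops)
    return cp

samples = [(0.01, 1.00, 3.0), (0.1, 0.98, 4.5), (1.0, 0.92, 6.0), (10.0, 0.40, 9.0)]
print(collapse_point(samples))  # prints "(1.0, 6.0)"
```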

18 - Performance Comparison

19 - Factor Analysis
Effects of individual components
– Remove each component from Gia, one at a time
– Add each component to RWRT
– No single component accounts entirely for Gia’s success

20 - Multiple Searches
CP changes with MAX_RESPONSES
[Figure: replication factor and MAX_RESPONSES]

21 - Robustness
[Figure legend: Static SUPER; Static RWRT (1% replication)]

22 - Active Replication
Allow higher-capacity nodes to replicate files
– On-demand replication when a high-capacity node receives a query and a download request
Active replication can increase the capacity of nodes serving files from a factor of 38 to 50

23 - Implementation
Satisfaction level
– Aggressiveness of adaptation
– Exponential relationship between satisfaction level S and adaptation interval I
– Define:
    I = adaptation interval
    S = satisfaction level
    T = maximum interval between adaptation iterations
    K = aggressiveness of adaptation
– Let $I = T \cdot K^{-(1-S)}$
    A dissatisfied node (S = 0) adapts every T/K; a fully satisfied node (S = 1) adapts every T
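The interval formula can be evaluated directly. The values T = 10 seconds and K = 256 below are hypothetical, chosen only to show the range: with these settings a completely dissatisfied node re-runs adaptation 256 times as often as a satisfied one.

```python
def adaptation_interval(s: float, t_max: float, k: float) -> float:
    """I = T * K**-(1 - S): T/K when S = 0, rising exponentially to T when S = 1."""
    if not 0.0 <= s <= 1.0:
        raise ValueError("satisfaction level must lie in [0, 1]")
    return t_max * k ** -(1.0 - s)

T, K = 10.0, 256.0  # hypothetical settings
for s in (0.0, 0.5, 1.0):
    print(s, adaptation_interval(s, T, K))
```

Larger K makes unsatisfied nodes adapt more aggressively, while T caps how long even a satisfied node waits before checking again.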

24 - Satisfaction Level
Calculating the satisfaction level
– S = 0 initially, and whenever the number of neighbors is below a predefined minimum
– The satisfaction algorithm:
    Adds up the normalized capacity (capacity divided by degree) of all neighbors
        A high-capacity neighbor with low degree is worth more than a high-capacity neighbor with high degree
    Divides that total by the node’s own capacity to obtain S
    Returns S = 1 if S > 1 or if the number of neighbors exceeds a predefined maximum
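The steps above translate into a short function. This sketch follows the slide’s description. Representing each neighbor as a (capacity, degree) pair and naming the bounds `min_nbrs` and `max_nbrs` are choices made here for illustration.

```python
def satisfaction_level(own_capacity, neighbors, min_nbrs, max_nbrs):
    """Compute S in [0, 1]; `neighbors` is a list of (capacity, degree) pairs."""
    if len(neighbors) < min_nbrs:
        return 0.0
    # Each neighbor contributes its capacity normalized by its degree, so a
    # high-capacity neighbor with few links counts for more than one with many.
    total = sum(cap / deg for cap, deg in neighbors)
    s = total / own_capacity
    if s > 1.0 or len(neighbors) > max_nbrs:
        return 1.0
    return s

# A node of capacity 100 with two neighbors: (10, degree 2) and (20, degree 4).
print(satisfaction_level(100.0, [(10.0, 2), (20.0, 4)], min_nbrs=1, max_nbrs=5))
```

Here each neighbor contributes 5, so S = 10 / 100 = 0.1. The same two neighbors would fully satisfy a node of capacity 10, which is how the metric pushes high-capacity nodes to keep acquiring neighbors.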

25 - Deployment
PlanetLab
– Wide-area service deployment testbed spanning North America, Europe, Asia, and the South Pacific
– Deployed Gia on 83 clients
– Measured the time to reach “steady state”

26 - Related Work
KaZaA
– At the time of SIGCOMM 2003, little had been published on KaZaA
– “Understanding KaZaA”, Liang et al.
CAP
– Cluster-based approach to handle scaling in Gnutella
    Based on a central clustering server
    Clusters act as directory servers
PierSearch
– Published in SIGCOMM 2004
– PIER + Gnutella
    Uses the PIER DHT for hard-to-find content and Gnutella for more popular content
Gnutella2
– Aimed at fixing many of the problems with Gnutella
– Not created by the Gnutella founders, causing some controversy in the community

27 - Conclusion
Gia proves to be a scalable Gnutella
– 3 to 5 orders of magnitude improvement in total system capacity
An unstructured system works well for popular content
– A DHT is not necessary in most cases
Working implementation on PlanetLab
