

Peer Communications

References
- Many of the slides are from the tutorial “P2P Systems” by Keith Ross and Dan Rubenstein
- Dejan S. Milojicic, Vana Kalogeraki, Rajan Lukose, Kiran Nagaraja, Jim Pruyne, Bruno Richard, Sami Rollins, Zhichen Xu, “Peer-to-Peer Computing”, HP Technical Report HPL

Client-Server Model
Let’s look at the client-server model:
- Servers are centrally maintained and administered
- A client has fewer computing resources than a server
- This is the way the web works
- No interaction between clients

Client-Server Model
Disadvantages of the client-server model:
- Reliability
  - The network depends on a possibly highly loaded server to function properly.
  - The server needs to be replicated to some extent to provide better reliability.
- Scalability
  - More users imply more demand for computing power, storage space, and bandwidth.

Peer-to-Peer Model
- All nodes have similar resources.
- No reliance on central services or resources.
- A node acts as both a “server” and a client.
- Considered more scalable.

Why P2P?
The Internet has three valuable fundamental assets:
- Information
- Computing resources
- Bandwidth
All of these are vastly underutilized, partly due to the traditional client-server model.

Why P2P?
- No single search engine can locate and catalog the ever-increasing amount of information on the Web in a timely way
- Moreover, a huge amount of information is transient and not subject to capture by techniques such as Web crawling
  - Google claims that it searches about 1.3 x 10^8 web pages
  - Finding useful information in real time is increasingly difficult!

Why P2P?
- Although miles of new fiber have been installed, the new bandwidth gets little use if everyone goes to Yahoo for content and to eBay
- Instead, hot spots just get hotter while cold pipes remain cold
- This is partly why most people still feel congestion on the Internet even though a single fiber’s bandwidth has increased by a factor of 10^6 since 1975, doubling every 16 months

Why P2P?
- P2P can potentially eliminate the single-source bottleneck
- P2P can be used to distribute data and control, and to load-balance requests across the Net
- P2P potentially eliminates the risk of a single point of failure
- P2P infrastructure allows direct access and shared space, which can enable remote maintenance

Peer-to-Peer Applications
- File/object sharing (e.g., Gnutella)
- Distributed computing
- Collaboration (e.g., Jabber, Magi, Groove)

Environment Characteristics for Peer-to-Peer Systems
- Unreliable environments
- Peers connecting/disconnecting: network failures to participation
- Random failures, e.g., power outages, cable and DSL failures, hackers
- Personal machines are much more vulnerable than servers

Evaluating Peer-to-Peer Systems
- A node’s database: what does a node need to save in order to operate properly and efficiently?
- Success rate: if the file is in the network, what are the chances that a search will find it?
- Lookup cost:
  - Time
  - Communication (bandwidth usage)
- Join/departure cost
- Fault tolerance: resilience to faults
- Resilience to denial-of-service attacks; security

Distributed Computing:
- The main server distributes a screen saver-based application to users
- Applies signal analysis algorithms to different data sets to process radio-telescope data
- Has more than 3 million users; has used over a million years of CPU time to date
Steps:
1. Install screen saver
2. SETI client (screen saver) starts
3. SETI client gets data from server and runs
4. Client sends results back to server

Issues in File Sharing Services
- Publish: how to insert a new file into the network
- Lookup: find a specific file
- Retrieval: getting a copy of a file

P2P File Sharing Software
- Allows a user to open up a directory in their file system
  - Anyone can retrieve a file from the directory
  - Like a Web server
- Allows the user to copy files from other users’ open directories
  - Like a Web client
- Allows users to search nodes for content based on keyword matches
  - Like Google

Napster: How Did It Work
- Application-level, client-server protocol over point-to-point TCP
- Centralized directory server
- Steps:
  - Connect to the Napster server
  - Give the server keywords to search the full list with
  - Select the “best” of the correct answers
  - One approach is to select based on the response time of pings: the shortest response time is chosen

Napster: How Did It Work
1. File list and IP address are uploaded to the napster.com centralized directory.

Napster: How Did It Work
2. User requests a search at the server (napster.com centralized directory); the query goes out and results come back.

Napster: How Did It Work
3. User pings hosts that apparently have the data, looking for the best transfer rate.

Napster: How Did It Work
4. User chooses a server and retrieves the file.
Napster’s centralized server farm had a difficult time keeping up with traffic.
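
The four steps above can be sketched as a toy, in-memory directory. This is only an illustration of the centralized-index idea, not Napster's actual protocol: the class name `CentralDirectory`, the helper `pick_fastest`, the peer addresses, and the ping times are all hypothetical.

```python
from collections import defaultdict


class CentralDirectory:
    """Toy stand-in for a Napster-style index server (hypothetical API)."""

    def __init__(self):
        # keyword -> set of (peer_address, filename) pairs
        self._index = defaultdict(set)

    def publish(self, peer, filenames):
        """Step 1: a peer uploads its file list together with its address."""
        for name in filenames:
            for word in name.lower().split():
                self._index[word].add((peer, name))

    def search(self, keywords):
        """Step 2: return the (peer, filename) pairs matching every keyword."""
        words = keywords.lower().split()
        if not words:
            return set()
        hits = set(self._index.get(words[0], set()))
        for word in words[1:]:
            hits &= self._index.get(word, set())
        return hits


def pick_fastest(candidates, ping_ms):
    """Step 3: choose the candidate peer with the shortest ping time."""
    return min(candidates, key=lambda peer: ping_ms[peer])


directory = CentralDirectory()
directory.publish("10.0.0.1:6699", ["homer song.mp3"])
directory.publish("10.0.0.2:6699", ["homer song.mp3", "other tune.mp3"])

hits = directory.search("homer")
peers = sorted({peer for peer, _ in hits})
# Ping times are made-up numbers for illustration only.
best = pick_fastest(peers, {"10.0.0.1:6699": 80, "10.0.0.2:6699": 35})
print(best)  # step 4 would fetch the file directly from this peer
```

Only the lookup goes through the directory; the file transfer itself happens between the two peers, which is why the directory, and not the transfer, becomes the bottleneck and the single point of failure.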

Napster
History:
- 5/99: Shawn Fanning (freshman, Northeastern U.) founds the Napster online music service
- 12/99: first lawsuit
- 3/00: 25% of UWisc traffic is Napster
- 2/01: US Circuit Court of Appeals: Napster knew users were violating copyright laws
- 7/01: number of simultaneous online users: Napster 160K, Gnutella 40K, Morpheus (KaZaA) 300K

Napster
- Judge orders Napster to pull the plug in July ’01
- Other file sharing apps take over!
[Figure: traffic in bits per second (axis from 0.0 to 8M) for Gnutella, Napster, and FastTrack (KaZaA)]

Napster: Discussion
- Locates files quickly
- Vulnerable to censorship and technical failure
- Popular data become less accessible because of the load of requests on a central server

Gnutella
- The focus is on a decentralized method of searching for files
  - Central directory server is no longer the bottleneck
  - More difficult to “pull the plug”
- Each application instance serves to:
  - Store selected files
  - Route queries from and to its neighboring peers
  - Respond to queries if the file is stored locally
  - Serve files

Gnutella
- Gnutella history:
  - 3/14/00: release by AOL, almost immediately withdrawn
  - Became open source
  - Many iterations to fix the poor initial design (the poor design turned many people off)
- Issues:
  - How much traffic does one query generate?
  - How many hosts can it support at once?
  - What is the latency associated with querying?
  - Is there a bottleneck?

Gnutella: Searching
Searching by flooding:
- A Query packet might ask, “Do you have any content that matches the string ‘Homer’?”
  - If a node does not have the requested file, then 7 (default set by Gnutella) of its neighbors are queried.
  - If the neighbors do not have it, they contact 7 of their neighbors.
  - Maximum hop count: 10 (this is called the time-to-live, TTL)
  - Reverse path forwarding for responses (not files)
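
A minimal sketch of flooding with a TTL, to make the traffic question concrete. It is a simplified model rather than the Gnutella wire protocol: the function names, the five-peer graph, and the file names are assumptions, and unlike the slide's description the sketch forwards a query even from a peer that has a match. `max_messages` gives a loose upper bound that assumes every peer forwards to `fanout` fresh neighbors at every hop.

```python
from collections import deque


def flood_query(graph, files, start, keyword, ttl):
    """Breadth-first flood of a query; returns (hits, messages_sent).

    graph: dict mapping peer -> list of neighbor peers
    files: dict mapping peer -> list of file names held locally
    """
    seen = {start}                 # peers that have already handled the query
    queue = deque([(start, ttl)])
    hits, messages = [], 0
    while queue:
        peer, remaining = queue.popleft()
        if any(keyword in name for name in files.get(peer, [])):
            hits.append(peer)      # a QueryHit would travel back from here
        if remaining == 0:
            continue               # TTL exhausted: do not forward further
        for neighbor in graph[peer]:
            messages += 1          # every forwarded copy costs one message
            if neighbor not in seen:   # duplicate copies are dropped
                seen.add(neighbor)
                queue.append((neighbor, remaining - 1))
    return hits, messages


def max_messages(fanout, ttl):
    """Upper bound on messages if the overlay were a tree with this fan-out."""
    return sum(fanout ** hop for hop in range(1, ttl + 1))


# Hypothetical overlay: A is the querier; C and E hold matching files.
graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
files = {"C": ["homer_live.mp3"], "E": ["homer.mp3"]}

print(flood_query(graph, files, "A", "homer", ttl=2))  # (['C'], 6)
print(flood_query(graph, files, "A", "homer", ttl=3))  # (['C', 'E'], 9)
print(max_messages(7, 10))                             # 329554456
```

With TTL 2 the query never reaches E, so a file that exists in the network is not found. Raising the TTL finds it at the cost of more messages, and the bound for a fan-out of 7 and a TTL of 10 shows why flooding does not scale.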

Gnutella: Searching
Downloading:
- Peers respond with a “QueryHit” (contains contact info)
- File transfers use a direct connection using the HTTP protocol’s GET method
- When there is a firewall, a “Push” packet is used, which reroutes via the Push path

Gnutella: Searching

Gnutella: Discovering Peers
- A peer has to know at least one other peer to send requests to.
- Addresses of some peers have been published on a website.
- When a peer enters the network, it contacts a designated peer and receives a list of other peers that have recently entered the network.

Gnutella: Discussion
- Robust: the failure of a peer is not a failure of Gnutella.
- Performance: flooding leads to poor performance.
- Free riders: those who get data but do not share data.

KaZaA: The Service
- More than 3 million up peers sharing over 3,000 terabytes of content
- More popular than Napster ever was
- More than 50% of Internet traffic?
- MP3s and entire albums, videos, games
- Optional parallel downloading of files
- Automatically switches to a new download server when the current server becomes unavailable
- Provides estimated download times

KaZaA: The Service
- A user can configure the maximum number of simultaneous uploads and the maximum number of simultaneous downloads
- Queue management at server and client
  - Frequent uploaders can get priority in the server queue
- Keyword search
  - User can configure “up to x” responses to keywords
- Responses to keyword queries come in waves; the search stops when x responses are found

KaZaA: The Technology
- Proprietary
- Control data encrypted
- Everything in HTTP request and response messages

KaZaA: Architecture
- Each peer is either a supernode (SN) or is assigned to a supernode
  - 56 min avg connect
  - Each SN has about children
  - Roughly 30,000 SNs
- Each supernode has TCP connections with supernodes
  - 23 min avg connect

KaZaA: Architecture
- Nodes that have more connection bandwidth and are more available are designated as supernodes
- Each supernode acts as a mini-Napster hub, tracking the content and IP addresses of its descendants
- A supernode tracks only the content of its children
- Considered a cross between Napster and Gnutella

KaZaA: Finding Supernodes
- List of potential supernodes is included within the software download
- New peer goes through the list until it finds an operational supernode
  - Connects, obtains a more up-to-date list, with 200 entries
  - Nodes in the list are “close” to the ON
  - Node then pings 5 nodes on the list and connects with the one
- If the supernode goes down, the node obtains an updated list and chooses a new supernode

KaZaA Queries
- Node first sends the query to its supernode
  - Supernode responds with matches
  - If x matches are found, done
- Otherwise, the supernode forwards the query to a subset of supernodes
  - If a total of x matches is found, done
- Otherwise, the query is forwarded further
  - Probably by the original supernode rather than recursively
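
The query waves above can be sketched as follows. Because the protocol is proprietary, this is a guess at the general shape only: the `Supernode` class, the `query` function, the three-supernode topology, and the rule that the search widens one supernode-hop per wave are all assumptions made for illustration.

```python
class Supernode:
    """Toy supernode that indexes only its own children (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.index = {}        # child address -> list of shared file names
        self.neighbors = []    # other supernodes this one is connected to

    def register(self, child, filenames):
        self.index[child] = list(filenames)

    def local_matches(self, keyword):
        return [(child, name)
                for child, names in self.index.items()
                for name in names
                if keyword in name]


def query(first, keyword, x):
    """Collect up to x matches, widening the search one wave at a time."""
    results = first.local_matches(keyword)[:x]
    asked = {first.name}
    frontier = [first]
    while len(results) < x and frontier:
        next_wave = []
        for supernode in frontier:
            for other in supernode.neighbors:
                if other.name in asked:
                    continue           # never ask the same supernode twice
                asked.add(other.name)
                next_wave.append(other)
                results.extend(other.local_matches(keyword))
                if len(results) >= x:
                    return results[:x]  # enough matches: stop forwarding
        frontier = next_wave
    return results


# Hypothetical chain of supernodes A - B - C, each with one child peer.
a, b, c = Supernode("A"), Supernode("B"), Supernode("C")
a.neighbors, b.neighbors, c.neighbors = [b], [a, c], [b]
a.register("peer1", ["homer.mp3"])
b.register("peer2", ["homer_live.mp3"])
c.register("peer3", ["homer_remix.mp3"])

print(query(a, "homer", 1))  # answered by the first supernode alone
print(query(a, "homer", 3))  # needs two further waves
```

Compared with flooding among all peers, only supernodes exchange queries, and forwarding stops as soon as x matches are in hand.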

Peer Topologies
- Core
  - Centralized
  - Ring
  - Hierarchical
  - Decentralized
- Hybrid
  - Centralized-Ring
  - Centralized-Centralized
  - Centralized-Decentralized

Centralized
- Client/server
- Web servers
- Databases
- Napster search
- Instant Messaging
- Popular Power

Ring
- Fail-over clusters
- Simple load balancing
- Assumption
  - Single owner

Hierarchical
- Tree structure
- Example: DNS

Decentralized
- Gnutella
- Freenet
- Internet routing

Centralized + Ring
- Robust web applications
- High availability of servers

Centralized + Centralized
- N-tier apps
- Database-heavy systems
- Web services gateways
- Google.com uses this topology to deliver its service

Centralized + Decentralized
- New wave of P2P
- Clip2 Gnutella Reflector (next)
- FastTrack
  - KaZaA
  - Morpheus

Structured P2P
- The examples we have seen so far are considered unstructured P2P, in the sense that looking for content does not actually take the content into account.
- Structured P2P allows for more directed searches based on the content.
- Directed searches are an attempt to address the flooding problem.

Distributed Hash Tables (DHT)
- Used to overcome the flooded-search problem
- Operationally like standard hash tables
- Data is distributed around the network
- Features
  - Efficient: O(log N) messages per lookup
  - Even distribution of keys among nodes
  - Adaptable: network reconfiguration does not cascade to all nodes
  - Robust: replication of tables provides survival of node failures

DHT Step 1: The Hash
- Introduce a hash function to map the object being searched for to a unique identifier:
  - e.g., h(“NGC’02 Tutorial Notes”) → 8045
- Distribute the range of the hash function among all nodes in the network
- Each node must “know about” at least one copy of each object that hashes within its range (when one exists)
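
A small sketch of this step, under stated assumptions: SHA-1 truncated to a 16-bit identifier space stands in for the hash function h, and a key is assigned to the first node identifier at or after it on a ring (a Chord-style convention, which is only one of several ways to divide the range). The names `ID_SPACE`, `h`, and `responsible_node` are illustrative, and this `h` will not reproduce the slide's example value 8045.

```python
import bisect
import hashlib

ID_SPACE = 2 ** 16  # small identifier space, chosen only for readability


def h(key):
    """Map a string to an identifier in [0, ID_SPACE)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % ID_SPACE


def responsible_node(node_ids, key_id):
    """Return the first node id >= key_id, wrapping around the ring.

    node_ids must be sorted in ascending order.
    """
    pos = bisect.bisect_left(node_ids, key_id)
    return node_ids[pos % len(node_ids)]


# Hypothetical node identifiers; each node covers the ids up to its own.
nodes = [100, 2000, 50000]
key_id = h("NGC'02 Tutorial Notes")
print(key_id, "is handled by node", responsible_node(nodes, key_id))
print(responsible_node(nodes, 8045))   # 50000, the next node after 8045
print(responsible_node(nodes, 60000))  # 100, wrapping past the top of the range
```

Every node can compute `h` locally, so any peer can work out which part of the identifier space an object falls in without consulting a central directory.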

“Knowing about objects”
Two alternatives:
- Node can cache each (existing) object that hashes within its range
- Pointer-based: level of indirection, where the node caches a pointer to the location(s) of the object
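
The two alternatives differ only in what the responsible node stores. A minimal sketch, with hypothetical class names and peer addresses:

```python
class CachingNode:
    """Alternative 1: the responsible node holds the object itself."""

    def __init__(self):
        self.objects = {}            # key id -> object bytes

    def store(self, key_id, data):
        self.objects[key_id] = data

    def lookup(self, key_id):
        return self.objects.get(key_id)


class PointerNode:
    """Alternative 2: the responsible node holds only locations of copies."""

    def __init__(self):
        self.locations = {}          # key id -> set of peer addresses

    def announce(self, key_id, holder):
        self.locations.setdefault(key_id, set()).add(holder)

    def lookup(self, key_id):
        return sorted(self.locations.get(key_id, set()))


caching = CachingNode()
caching.store(8045, b"tutorial notes")
print(caching.lookup(8045))          # the object comes back directly

pointers = PointerNode()
pointers.announce(8045, "peerA")
pointers.announce(8045, "peerB")
print(pointers.lookup(8045))         # the querier must then contact a holder
```

Caching costs storage and transfer at the responsible node, while the pointer-based variant adds one more hop to reach a peer that actually holds the object.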

DHT Step 2: Routing
- For each object, the node(s) whose range(s) cover that object must be reachable via a “short” path
  - by the querier node (assumed can be chosen arbitrarily)
  - by nodes that have copies of the object (when the pointer-based approach is used)
- The different approaches (CAN, Chord, Pastry, Tapestry) differ fundamentally only in the routing approach
  - any “good” random hash function will suffice
- This is a major research topic
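
To illustrate what a “short” path can look like, here is a simplified Chord-style lookup. It is a sketch of one of the four named approaches, not a full implementation: it computes finger tables from a global list of node ids instead of maintaining them through joins and stabilization, and the 6-bit identifier space and the node ids are example values chosen for readability.

```python
M = 6                 # identifier bits, so ids live in [0, 64)
RING = 2 ** M


def successor(node_ids, ident):
    """First node clockwise from ident (node_ids must be sorted)."""
    for n in node_ids:
        if n >= ident:
            return n
    return node_ids[0]


def in_open_interval(x, a, b):
    """True if x lies strictly between a and b going clockwise."""
    if a < b:
        return a < x < b
    return x > a or x < b      # the interval wraps past zero


def fingers(node_ids, n):
    """Finger i of node n points at successor(n + 2**i)."""
    return [successor(node_ids, (n + 2 ** i) % RING) for i in range(M)]


def lookup(node_ids, start, key_id):
    """Route from start to the node responsible for key_id; return the path."""
    path = [start]
    current = start
    while True:
        if key_id == current:
            return path                    # current node owns this id
        succ = successor(node_ids, (current + 1) % RING)
        # A key in (current, succ] belongs to succ.
        if key_id == succ or in_open_interval(key_id, current, succ):
            if succ != current:
                path.append(succ)
            return path
        # Otherwise jump to the farthest finger that still precedes the key.
        nxt = succ
        for finger in reversed(fingers(node_ids, current)):
            if in_open_interval(finger, current, key_id):
                nxt = finger
                break
        current = nxt
        path.append(current)


nodes = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]
print(lookup(nodes, 8, 54))   # [8, 42, 51, 56]
```

Each jump at least halves the remaining clockwise distance to the key, which is where the O(log N) messages per lookup come from in this style of routing.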

Summary
- P2P is a major portion of Internet traffic.
  - Has exceeded web traffic
- There are different approaches to structuring P2P applications.
- There is a good deal of concern about future scalability and traffic, which is leading to a lot of research from academia and industry.