Network Applications: P2P Applications (1/30/2008)

Recap: FTP, HTTP
- FTP: file transfer
  - ASCII (human-readable format) requests and responses
  - stateful server
  - one data channel and one control channel
- HTTP
  - ASCII requests, header lines, entity body, and response lines
  - stateless server (each request should contain the full information)
  - one data channel

Domain Name Service (DNS)
- Hierarchical delegation avoids central control, improving manageability and scalability
- Redundant servers improve robustness
  - see news/article.php/ for the DDoS attack on root servers in Oct (9 of the 13 root servers were crippled, but this only slowed the network)
  - see http:// for performance monitoring
- Caching reduces workload and improves robustness

Problems of DNS
- Domain names may not be the best way to name other resources, e.g. files
- Separation of the DNS query from the application query does not work well in mobile, dynamic environments
  - e.g., locate the nearest printer
- The simple query model makes it hard to implement advanced queries
- Relatively static records
  - although in theory you can update the values of the records, this is rarely enabled

Summary of Traditional C-S Network Applications
- How does a client locate a server?
- Is the application extensible, robust, scalable?
[Figure: app. server (C0) serving client 1, client 2, client 3, ..., client n. Annotations: DNS; slashdot effect, CNN on 9/11; down speed to the clients?]

An Upper Bound on Scalability
- Assume
  - need to achieve same speed to all clients
  - only uplinks can be bottlenecks
[Figure: server (C0) and client 1, client 2, client 3, ..., client n (C1, C2, C3, ..., Cn)]

An Upper Bound on Scalability
- Maximum throughput $R = \min\{C_0, (C_0 + \sum_{i=1}^{n} C_i)/n\}$
- R is clearly the upper bound
- The bound is theoretically approachable
  - why not in practice?
[Figure: server (C0) and client 1, client 2, client 3, ..., client n (C1, C2, C3, ..., Cn)]
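To get a feel for the bound, the sketch below evaluates R for a hypothetical server and compares it with a pure client-server baseline. The baseline of C0/n (the server alone uploads a full copy to every client) is an assumption made here for comparison; the function names and capacity values are illustrative and not from the slides.

```python
def p2p_upper_bound(c0, client_uplinks):
    """R = min{C0, (C0 + sum of Ci) / n}: best common rate when clients also upload."""
    n = len(client_uplinks)
    if n == 0:
        raise ValueError("need at least one client")
    return min(c0, (c0 + sum(client_uplinks)) / n)


def client_server_rate(c0, n):
    """Assumed baseline: the server uplink C0 is split evenly over n clients."""
    return c0 / n


# Hypothetical numbers: a 10 Mbps server uplink and 100 clients with 1 Mbps uplinks.
uplinks = [1.0] * 100
bound = p2p_upper_bound(10.0, uplinks)
baseline = client_server_rate(10.0, len(uplinks))
print(f"P2P bound: {bound:.2f} Mbps, client-server: {baseline:.2f} Mbps")
```

With these made-up numbers the bound is about 1.1 Mbps per client against 0.1 Mbps for the baseline, which is the scalability argument for using client uplinks.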

Outline
- Recap
- P2P

Objectives of P2P
- Bypass DNS to access resources!
  - examples: instant messaging, Skype
- Share the storage and bandwidth of individual clients to improve scalability
  - examples: file sharing and streaming

Peer-to-Peer Computing
- Quickly grown in popularity:
  - dozens or hundreds of file sharing applications
  - 50-80% of Internet traffic is P2P
  - upset the music industry; drawn college students, web developers, recording artists and universities into court
Source: ipoque web site; Nov. 2007

What is P2P?
- But P2P is not new and is probably here to stay
- The original Internet was a P2P system:
  - the original ARPANET connected UCLA, Stanford Research Institute, UCSB, and Univ. of Utah
  - no DNS or routing infrastructure, just connected by phone lines
  - computers also served as routers
- P2P is simply an iteration of scalable distributed systems

P2P Systems
- File sharing: BitTorrent, LimeWire
- Streaming: PPLive, PPStream, …
- Research systems
  - Distributed Hash Tables
  - Content distribution networks
- Collaborative computing:
  - project
  - Human genome mapping
  - Intel NetBatch: 10,000 computers in 25 worldwide sites for simulations, saved about 500 million

Outline
- Recap
- P2P
  - the lookup problem

The Lookup Problem
[Figure: nodes N1 to N6 connected through the Internet; a publisher holds Key=“title”, Value=MP3 data…; a client issues Lookup(“title”) with a question mark]
- find where a particular file is stored
- pay particular attention to its equivalence to DNS

Outline
- Recap
- P2P
  - the lookup problem
  - Napster

Centralized Database: Napster
- Program for sharing music over the Internet
- History:
  - 5/99: Shawn Fanning (freshman, Northeastern U.) founded the Napster online music service; wrote the program in 60 hours
  - 12/99: first lawsuit
  - 3/00: 25% of UWisc traffic is Napster
  - 2000: est. 60M users
  - 2/01: US Circuit Court of Appeals: Napster knew users were violating copyright laws
  - 7/01: # simultaneous online users: Napster 160K
  - 9/02: bankruptcy
We are referring to the Napster before closure.
03/2000

Napster: How Does it Work?
Application-level, client-server protocol over TCP
A centralized index system that maps files (songs) to machines that are alive and have the files
Steps:
- Connect to Napster server
- Upload your list of files (push) to server
- Give server keywords to search the full list
- Select “best” of hosts with answers
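The steps above can be sketched as a tiny in-memory index. This is only an illustration of the idea of a central index mapping files to live machines: the class name, the method names, the peer addresses and the substring-based keyword match are all assumptions, not Napster's actual implementation.

```python
from collections import defaultdict


class CentralIndex:
    """Hypothetical central index: file name -> set of peers that hold it."""

    def __init__(self):
        self.files = defaultdict(set)

    def publish(self, peer, filenames):
        # A peer pushes its file list after connecting.
        for name in filenames:
            self.files[name].add(peer)

    def unpublish(self, peer):
        # Called when a peer goes offline, so the index only lists live machines.
        for peers in self.files.values():
            peers.discard(peer)

    def search(self, keyword):
        # Assumed matching rule: case-insensitive substring of the file name.
        kw = keyword.lower()
        return {name: sorted(peers)
                for name, peers in self.files.items()
                if kw in name.lower() and peers}


# Made-up peer addresses and file names.
index = CentralIndex()
index.publish("10.0.0.1:6699", ["X.mp3", "Y.mp3"])
index.publish("10.0.0.2:6699", ["X.mp3"])
hosts_with_x = index.search("x")
```

The client would then choose the "best" host from the returned list and fetch the file directly from that peer, so only the lookup goes through the server.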

Napster Architecture

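Napster: Publish
[Figure: a peer announces "I have X, Y, and Z!" and publishes: insert(X, )]

Napster: Search
[Figure: a client asks "Where is file A?"; Query and Reply: search(A) -->]

Napster: Ping
[Figure labels: ping, ping]

Napster: Fetch
[Figure label: fetch]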

Napster Messages
General Packet Format: [chunksize] [chunkinfo] [data...]
- CHUNKSIZE: Intel-endian 16-bit integer, size of [data...] in bytes
- CHUNKINFO: (hex) Intel-endian 16-bit integer
  - login rejected
  - 02 - login requested
  - 03 - login accepted
  - 0D - challenge? (nuprin1715)
  - 2D - added to hotlist
  - 2E - browse error (user isn't online!)
  - 2F - user offline
  - 5B - whois query
  - 5C - whois result
  - 5D - whois: user is offline!
  - 69 - list all channels
  - 6A - channel info
  - 90 - join channel
  - 91 - leave channel
  - …..
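A minimal sketch of the framing described above, using Python's `struct` module with little-endian ("Intel-endian") 16-bit fields. The two type codes are taken from the list on the slide; the function names and the payload bytes are hypothetical, and the real payload layout of each message type is not covered here.

```python
import struct

# [chunksize][chunkinfo]: two little-endian unsigned 16-bit integers.
HEADER = struct.Struct("<HH")

LOGIN_REQUESTED = 0x02  # code from the slide
LOGIN_ACCEPTED = 0x03   # code from the slide


def pack_message(msg_type, data):
    """Prefix data with its size in bytes and the message type."""
    return HEADER.pack(len(data), msg_type) + data


def unpack_message(buf):
    """Return (msg_type, data, rest), where rest is any bytes after this message."""
    size, msg_type = HEADER.unpack_from(buf, 0)
    end = HEADER.size + size
    data = buf[HEADER.size:end]
    if len(data) != size:
        raise ValueError("truncated message")
    return msg_type, data, buf[end:]


# Hypothetical payload, for illustration only.
wire = pack_message(LOGIN_REQUESTED, b"abc")
msg_type, data, rest = unpack_message(wire)
```

Because the size field comes first, a receiver can read the 4-byte header from the TCP stream and then know exactly how many more bytes belong to the message.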

Centralized Database: Napster
- Summary of features: a hybrid design
  - control: client-server (aka special DNS) for files
  - data: peer to peer
- Advantages
  - simplicity; easy to implement sophisticated search engines on top of the index system
- Disadvantages
  - application specific (compared with DNS)
  - lack of robustness and scalability: the central search server is a single point of bottleneck/failure
  - easy to sue!

Variation: BitTorrent
- A global central index server is replaced by one tracker per file (the peers sharing a file are called a swarm)
  - reduces centralization, but needs other means to locate trackers
- The bandwidth scalability management technique is more interesting
  - more later

Outline
- Recap
- P2P
  - the lookup problem
  - Napster (central query server; distributed data servers)
  - Gnutella

Gnutella
- On March 14, 2000, J. Frankel and T. Pepper from AOL’s Nullsoft division (also the developers of the popular Winamp mp3 player) released Gnutella
- Within hours, AOL pulled the plug on it
- Quickly reverse-engineered, and soon many other clients became available: Bearshare, Morpheus, LimeWire, etc.

Decentralized Flooding: Gnutella
- On startup, a client contacts other servents (server + client) in the network to form interconnection/peering relationships
  - servent interconnections are used to forward control (queries, hits, etc.)
- How to find a resource record: decentralized flooding
  - send requests to neighbors
  - neighbors recursively forward the requests

Decentralized Flooding
[Figure: overlay graph with nodes S, A, B, C, D, E, F, G, H, I, J, K, L, M, N]

Decentralized Flooding
[Figure: overlay graph with nodes S, A, B, C, D, E, F, G, H, I, J, K, L, M, N; label: send query to neighbors]
- Each node forwards the query to all of its neighbors except the one it received the query from

Background: Decentralized Flooding
[Figure: overlay graph with nodes S, A, B, C, D, E, F, G, H, I, J, K, L, M, N]
- Each node should keep track of forwarded queries to avoid loops!
  - nodes keep state (which will time out: soft state)
  - carry the state in the query, i.e. carry a list of visited nodes

Decentralized Flooding: Gnutella
- Basic message header
  - Unique ID, TTL, Hops
- Message types
  - Ping: probes network for other servents
  - Pong: response to Ping, contains IP addr, # of files, etc.
  - Query: search criteria + speed requirement of servent
  - QueryHit: successful response to Query, contains addr + port to transfer from, speed of servent, etc.
  - Ping and Query are flooded
  - QueryHit and Pong follow the reverse path of the message they answer
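The reverse-path rule can be sketched as follows: every servent remembers, per unique message ID, which neighbor the Query arrived from, and a QueryHit is handed back along those remembered hops. This is a simplified illustration, not the Gnutella wire protocol: the `Servent` class, its method names and the `link` helper are hypothetical, and direct method calls stand in for network messages.

```python
class Servent:
    """Hypothetical servent; method calls stand in for network messages."""

    def __init__(self, name, files=()):
        self.name = name
        self.files = set(files)
        self.neighbors = []
        self.came_from = {}  # message ID -> previous hop (None at the originator)
        self.hits = []       # holders reported back to this servent's own queries

    def query(self, msg_id, wanted, ttl):
        """Originate a Query and flood it to all neighbors."""
        self.came_from[msg_id] = None
        self._forward(msg_id, wanted, ttl, exclude=None)

    def on_query(self, msg_id, wanted, ttl, sender):
        if msg_id in self.came_from:
            return  # already seen this ID: drop the duplicate
        self.came_from[msg_id] = sender
        if wanted in self.files:
            sender.on_query_hit(msg_id, self.name)
        self._forward(msg_id, wanted, ttl - 1, exclude=sender)

    def _forward(self, msg_id, wanted, ttl, exclude):
        if ttl <= 0:
            return
        for nbr in self.neighbors:
            if nbr is not exclude:
                nbr.on_query(msg_id, wanted, ttl, self)

    def on_query_hit(self, msg_id, holder):
        if msg_id not in self.came_from:
            return  # unknown query: nothing to route back
        prev = self.came_from[msg_id]
        if prev is None:
            self.hits.append(holder)            # we originated the query
        else:
            prev.on_query_hit(msg_id, holder)   # pass it one hop back


def link(x, y):
    x.neighbors.append(y)
    y.neighbors.append(x)


# Made-up line topology a - b - c, where only c holds the file.
a, b, c = Servent("a"), Servent("b"), Servent("c", files=["song"])
link(a, b)
link(b, c)
a.query(msg_id=1, wanted="song", ttl=2)
```

Here `a.hits` ends up containing `"c"` even though `c` never learned who originated the query; it only knew the previous hop `b`.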

Advantages and Disadvantages of Gnutella
- Advantages:
  - totally decentralized, highly robust
- Disadvantages:
  - not scalable; the entire network can be swamped with flood requests
    - especially hard on slow clients; at some point broadcast traffic on Gnutella exceeded 56 kbps
  - to alleviate this problem, each request has a TTL to limit the scope
    - each query has an initial TTL, and each node forwarding it reduces it by one; if TTL reaches 0, the query is dropped (consequence?)
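A small simulation makes the trade-off concrete: it floods a query over a made-up overlay, drops duplicates using per-node "seen" state, and counts both the nodes reached and the messages sent for a given TTL. The graph, the function name and the breadth-first processing order are assumptions for illustration only.

```python
from collections import deque


def flood(graph, source, ttl):
    """Flood from source; return (set of nodes reached, number of messages sent)."""
    seen = {source}
    messages = 0
    queue = deque([(source, None, ttl)])  # (node, previous hop, remaining hops)
    while queue:
        node, came_from, remaining = queue.popleft()
        if remaining == 0:
            continue  # TTL exhausted: the query is not forwarded further
        for nbr in graph[node]:
            if nbr == came_from:
                continue  # never send the query back to the node it came from
            messages += 1
            if nbr in seen:
                continue  # neighbor has already seen this query and drops it
            seen.add(nbr)
            queue.append((nbr, node, remaining - 1))
    return seen, messages


# Hypothetical overlay, given as an adjacency list.
graph = {
    "S": ["A", "B"],
    "A": ["S", "B", "C"],
    "B": ["S", "A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
reached_2, sent_2 = flood(graph, "S", ttl=2)
reached_3, sent_3 = flood(graph, "S", ttl=3)
```

In this toy graph a TTL of 2 never reaches node E, so a file stored only on E would not be found; raising the TTL to 3 reaches it at the cost of more messages, several of which are duplicates that get dropped.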

Aside: Search Time?

Aside: All Peers Equal?
[Figure: peers with different access links: 56 kbps modem, 10 Mbps LAN, 1.5 Mbps DSL, 56 kbps modem, 1.5 Mbps DSL]

Aside: Network Resilience
[Figure: three panels: Partial Topology; Random 30% die; Targeted 4% die. From Saroiu et al., MMCN 2002]

Flooding: FastTrack (aka Kazaa)
- Modifies the Gnutella protocol into a two-level hierarchy
- Supernodes
  - nodes that have a better connection to the Internet
  - act as temporary indexing servers for other nodes
  - help improve the stability of the network
- Standard nodes
  - connect to supernodes and report their list of files
- Search
  - broadcast (Gnutella-style) search across supernodes
- Disadvantages
  - kept a centralized registration, hence prone to lawsuits

Outline
- Recap
- P2P
  - the lookup problem
  - Napster (central query server; distributed data server)
  - Gnutella (decentralized, flooding)
  - Freenet

Freenet
- History
  - final-year project of Ian Clarke, Edinburgh University, Scotland, June 1999
- Goals:
  - totally distributed system without using a centralized index or broadcast (flooding)
  - respond adaptively to usage patterns, transparently moving and replicating files as necessary to provide efficient service
  - provide publisher anonymity, security
  - free speech: resistant to attacks; a third party shouldn’t be able to deny access to a particular file (data item, object), e.g., by deleting it

Basic Structure of Freenet
- Each machine stores a set of files; each file is identified by a unique identifier (called key or id)
- Each node maintains a “routing table”
  - id: file id, key
  - next_hop node: where a file corresponding to the id might be available
  - file: local copy if one exists

id | next_hop | file
…  | …        | …

Query
- API: file = query(id);
- Upon receiving a query for file id
  - check whether the queried file is stored locally
    - if yes, return it
    - if not, forward the query message
      - key step: search for the “closest” id in the table, and forward the message to the corresponding next_hop

id | next_hop | file
…  | …        | …
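The key step can be sketched as a single decision over one node's routing table. The slide does not define "closest", so the absolute numeric difference between ids is an assumption here, as are the function name, the dictionary layout and the sample table.

```python
def handle_query(routing_table, target_id):
    """Decide what one node does with a query: answer locally or pick a next hop."""
    if not routing_table:
        return ("fail", None)
    # Step 1: is the queried file stored locally?
    for entry in routing_table:
        if entry["id"] == target_id and entry["file"] is not None:
            return ("found", entry["file"])
    # Step 2 (key step): forward towards the entry with the closest id.
    # Assumption: closeness is the absolute difference of numeric ids.
    closest = min(routing_table, key=lambda e: abs(e["id"] - target_id))
    return ("forward", closest["next_hop"])


# Hypothetical routing table with the columns id, next_hop, file.
table = [
    {"id": 4, "next_hop": "n1", "file": "f4"},
    {"id": 12, "next_hop": "n2", "file": "f12"},
    {"id": 5, "next_hop": "n3", "file": None},
]
decision_10 = handle_query(table, 10)
decision_4 = handle_query(table, 4)
```

For id 10 the closest known id is 12, so the query is forwarded to n2; for id 4 a local copy exists and is returned directly.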

Query Example
[Figure: routing tables (id, next_hop, file) of nodes n1 to n5 and the path taken by query(10). Table rows in reading order: 4 n1 f4; 12 n2 f12; 5 n3; 9 n3 f9; 3 n1 f3; 14 n4 f14; 5 n3; 14 n5 f14; 13 n2 f13; 3 n6; 4 n1 f4; 10 n5 f10; 8 n6]
Besides the routing table, each node also maintains a query table containing the state of all outstanding queries that have traversed it, so that it can backtrack.

Query: the Complete Process
- API: file = query(id);
- Upon receiving a query for file id
  - check whether the queried file is stored locally
    - if yes, return it; otherwise
  - check TTL to limit the search scope
    - each query is associated with a TTL that is decremented each time the query message is forwarded
    - when TTL=1, the query is forwarded with a probability
    - TTL can be initialized to a random value within some bounds to obscure the distance to the originator
  - look for the “closest” id in the table with an unvisited next_hop node
    - if one is found, forward the query to the corresponding next_hop
    - otherwise, backtrack
      - ends up performing a Depth First Search (DFS)-like traversal
      - search direction ordered by closeness to target
- When the file is returned, it is cached along the reverse path (any advantage?)
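The whole process can be sketched as a recursive depth-first search. Several simplifications are assumed and are not Freenet's actual behavior: the visited set is carried with the query instead of being kept in per-node query tables, the TTL is a plain counter with no probabilistic forwarding or random initial value, closeness is the absolute difference of numeric ids, and the `Node` class and topology are made up.

```python
class Node:
    """Hypothetical Freenet-like node."""

    def __init__(self, name):
        self.name = name
        self.routes = {}  # id -> next_hop Node where that id might be available
        self.store = {}   # id -> local copy of the file


def query(node, file_id, ttl, visited=None):
    """Depth-first search ordered by closeness of known ids to file_id."""
    if visited is None:
        visited = set()
    visited.add(node.name)
    if file_id in node.store:
        return node.store[file_id]  # stored locally: return it
    if ttl <= 0:
        return None                 # search scope exhausted
    for known_id in sorted(node.routes, key=lambda k: abs(k - file_id)):
        next_hop = node.routes[known_id]
        if next_hop.name in visited:
            continue                # only try unvisited next hops
        data = query(next_hop, file_id, ttl - 1, visited)
        if data is not None:
            # Cache along the reverse path and remember where it came from.
            node.store[file_id] = data
            node.routes[file_id] = next_hop
            return data
    return None                     # nothing left to try: caller backtracks


# Made-up topology: a first tries b (dead end via e), backtracks, then finds d via c.
a, b, c, d, e = (Node(n) for n in "abcde")
a.routes = {12: b, 5: c}
b.routes = {9: e}
c.routes = {3: a, 14: d}
d.store = {10: "f10"}
result = query(a, file_id=10, ttl=5)
```

After the call, `a` and `c` hold cached copies of file 10 while `b` and `e`, which were on the failed branch, do not; a later query for the same id at `a` is answered locally, which is one answer to the "any advantage?" question.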