
EE324 DISTRIBUTED SYSTEMS L17 P2P

Scaling Problem
- Millions of clients → server and network meltdown

P2P System
- Leverage the resources of client machines (peers): computation, storage, bandwidth

Why P2P?
- Harness lots of spare capacity: 1M end-hosts can be used for free by providing the right incentive
- Build self-managing systems / deal with huge scale
- The same techniques are attractive for companies and servers as well as P2P, e.g., Akamai's 14,000 nodes, Google's 100,000+ nodes

Outline
- P2P file-sharing techniques
  - Downloading: whole-file vs. chunks
  - Searching: centralized index (Napster, etc.), flooding (Gnutella, etc.), smarter flooding (KaZaA, ...), routing (Freenet, etc.)
- Uses of P2P: what works well, what doesn't?
  - Servers vs. arbitrary nodes
  - Hard state (backups!) vs. soft state (caches)
- Challenges
  - Fairness, freeloading, security, ...

P2P File Sharing
- Quickly grew in popularity: dozens or hundreds of file-sharing applications
- 40%–70% of Internet traffic in 2007; ~40% by 2012 [PAM 2012]

What's out there?

|             | Central    | Flood    | Super-node flood          | Route                           |
|-------------|------------|----------|---------------------------|---------------------------------|
| Whole file  | Napster    | Gnutella |                           | Freenet                         |
| Chunk-based | BitTorrent |          | KaZaA (bytes, not chunks) | DHTs (Vuze, uTorrent, BitComet) |

Searching
[Diagram: nodes N1–N6 connected across the Internet; a Publisher stores Key="SNSD", Value=music video data; a Client issues Lookup("SNSD") and must locate the publisher.]

Framework
- Common primitives:
  - Join: how do I begin participating?
  - Publish: how do I advertise my file?
  - Search: how do I find a file?
  - Fetch: how do I retrieve a file?

Outline
- Centralized database: Napster
- Query flooding: Gnutella, KaZaA
- Swarming: BitTorrent
- Unstructured overlay routing: Freenet
- Structured overlay routing: distributed hash tables
- Something new: Bitcoin

Napster
- History
  - 1999: Shawn Fanning launches Napster
  - Peaked at 1.5 million simultaneous users
  - July 2001: Napster shuts down
- Centralized database:
  - Join: on startup, the client contacts the central server
  - Publish: the client reports its list of files to the central server
  - Search: query the server, which returns someone that stores the requested file
  - Fetch: get the file directly from that peer

Napster: Publish
[Diagram: a peer tells the central server "I have X, Y, and Z!"; the server records insert(X, ...) entries mapping each file to the peer's address.]

Napster: Search
[Diagram: a client asks the server "Where is file A?"; the server runs search(A) and replies with the address of a peer holding A; the client then fetches the file directly from that peer.]

Napster: Discussion
- Pros:
  - Simple
  - Search scope is O(1)
  - Controllable (pro or con?)
- Cons:
  - Server maintains O(N) state
  - Server does all processing
  - Single point of failure
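To make the centralized-database model concrete, here is a minimal sketch in Python of a Napster-style index. The class name, method names, and peer-address format are illustrative assumptions, not Napster's actual wire protocol:

```python
# Minimal sketch of a Napster-style central index (hypothetical API).
# The server keeps all the state: O(N) entries, single point of failure.

class CentralIndex:
    def __init__(self):
        self.files = {}  # filename -> set of peer addresses

    def publish(self, peer: str, filenames: list[str]) -> None:
        """Join/Publish: a peer reports the files it stores."""
        for name in filenames:
            self.files.setdefault(name, set()).add(peer)

    def search(self, name: str) -> set[str]:
        """Search: O(1) lookup; returns peers that store the file."""
        return self.files.get(name, set())

index = CentralIndex()
index.publish("peer-a:6699", ["X", "Y", "Z"])
print(index.search("X"))  # {'peer-a:6699'}; the fetch then goes peer-to-peer
```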

Outline
- Centralized database: Napster
- Query flooding: Gnutella, KaZaA
- Swarming: BitTorrent
- Structured overlay routing: distributed hash tables

Gnutella
- History
  - In 2000, J. Frankel and T. Pepper from Nullsoft released Gnutella
  - Soon many other clients: BearShare, Morpheus, LimeWire, etc.
  - In 2001, many protocol enhancements, including "ultrapeers"

Gnutella: Overview
- Query flooding:
  - Join: on startup, the client contacts a few other nodes; these become its "neighbors"
  - Publish: no need
  - Search: ask neighbors, who ask their neighbors, and so on; when/if found, reply to the sender. A TTL limits propagation (see the sketch below)
  - Fetch: get the file directly from the peer
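A minimal sketch of TTL-limited query flooding, with the overlay modeled as an in-process graph and synchronous calls standing in for network messages (all names here are illustrative, not the Gnutella wire protocol):

```python
# Sketch of Gnutella-style query flooding. Each node knows only its
# neighbors; a TTL bounds how far a query spreads through the overlay.

class Node:
    def __init__(self, name: str, files: list[str]):
        self.name, self.files, self.neighbors = name, set(files), []

    def query(self, filename: str, ttl: int, seen=None):
        seen = seen if seen is not None else set()
        if self.name in seen:           # suppress duplicate forwarding
            return None
        seen.add(self.name)
        if filename in self.files:      # found: reply back toward the sender
            return self.name
        if ttl == 0:                    # TTL exhausted: stop flooding
            return None
        for n in self.neighbors:
            hit = n.query(filename, ttl - 1, seen)
            if hit:
                return hit
        return None

a, b, c = Node("A", []), Node("B", []), Node("C", ["song.mp3"])
a.neighbors, b.neighbors = [b], [c]
print(a.query("song.mp3", ttl=2))  # 'C'; the fetch then goes directly to C
```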

Gnutella: Bootstrapping
- Manual bootstrapping (type your friend's IP address)
- Software ships with a list of IP addresses
- Gnutella Web Caching System
  - Returns a list of other Gnutella clients
  - HTTP-based protocol for getting and updating the list of clients

Gnutella: Search
[Diagram: a query "Where is file A?" floods from neighbor to neighbor; the node with "I have file A" sends a reply back along the query path.]

Gnutella: Discussion
- Pros:
  - Fully decentralized
  - Search cost distributed
  - Each node permits powerful search semantics
- Cons:
  - Search scope is O(N)
  - Search time is O(???)
  - Nodes leave often; the network is unstable
- TTL-limited search works well for haystacks
  - For scalability, it does NOT search every node; you may have to re-issue the query later

KaZaA
- History
  - In 2001, KaZaA was created by the Dutch company Kazaa BV
  - Its single network, FastTrack, was used by other clients as well: Morpheus, giFT, etc.
  - Eventually the protocol changed so other clients could no longer talk to it
- "Supernode" query flooding:
  - Join: on startup, the client contacts a "supernode"; it may at some point become one itself
  - Publish: send the list of files to the supernode
  - Search: send the query to the supernode; supernodes flood the query amongst themselves
  - Fetch: get the file directly from peer(s); can fetch simultaneously from multiple peers

KaZaA: Network Design
[Diagram: ordinary peers cluster around "super nodes", which are interconnected with one another.]

KaZaA: File Insert
[Diagram: a peer tells its supernode "I have X!"; the supernode records an insert(X, ...) entry for that peer.]

KaZaA: File Search
[Diagram: a client asks its supernode "Where is file A?"; the supernode forwards search(A) to other supernodes, and replies flow back to the client.]

KaZaA: Fetching
- More than one node may have the requested file...
- How to tell?
  - Must be able to distinguish identical files
  - Not necessarily the same filename, and the same filename is not necessarily the same file...
  - Use a hash of the file
    - KaZaA uses UUHash: fast, but not secure
    - Alternatives: MD5, SHA-1
- How to fetch?
  - Get bytes [ ] from A, [ ] from B (see the sketch below)
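A small sketch of the two ideas on this slide: identify a file by a cryptographic hash of its contents (SHA-1 here, standing in as the secure alternative; KaZaA's actual UUHash only samples parts of the file), then split the byte range across multiple source peers. Names and the range scheme are illustrative:

```python
# Sketch: content hashing recognizes identical files regardless of
# filename; byte-range assignment enables multi-source fetching.
import hashlib

def file_id(data: bytes) -> str:
    """Identify a file by the hash of its contents, not its name."""
    return hashlib.sha1(data).hexdigest()

def split_ranges(size: int, peers: list[str]) -> list[tuple[str, int, int]]:
    """Assign roughly equal byte ranges [start, end) to each peer."""
    step = -(-size // len(peers))  # ceiling division
    return [(p, i * step, min((i + 1) * step, size))
            for i, p in enumerate(peers)]

print(file_id(b"same bytes") == file_id(b"same bytes"))  # True
print(split_ranges(1000, ["A", "B"]))  # [('A', 0, 500), ('B', 500, 1000)]
```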

KaZaA: Discussion
- Pros:
  - Tries to take node heterogeneity into account:
    - Bandwidth
    - Host computational resources
    - Host availability (?)
- Cons:
  - Mechanisms are easy to circumvent
  - Still no real guarantees on search scope or search time
- Similar behavior to Gnutella, but better

Stability and Superpeers
- Why superpeers?
  - Query consolidation
    - Many connected nodes may have only a few files
    - Propagating a query to a sub-node would take more bandwidth than answering it yourself
  - Caching effect
    - Requires network stability
- Superpeer selection is time-based
  - How long you've been on is a good predictor of how long you'll be around

Outline
- Centralized database: Napster
- Query flooding: Gnutella, KaZaA
- Swarming: BitTorrent
- Structured overlay routing: distributed hash tables

BitTorrent: History
- In 2002, B. Cohen debuted BitTorrent
- Key motivation:
  - Popularity exhibits temporal locality (flash crowds)
  - E.g., the Slashdot effect, CNN on 9/11, a new movie/game release
- Focused on efficient fetching, not searching:
  - Distribute the same file to all peers
  - Single publisher, multiple downloaders
- Has some "real" publishers:
  - Blizzard Entertainment used it to distribute the beta of their new game

BitTorrent: Overview
- Swarming:
  - Join: contact a centralized "tracker" server to get a list of peers
  - Publish: run a tracker server
  - Search: out-of-band; e.g., use Google to find a tracker for the file you want
  - Fetch: download chunks of the file from your peers; upload chunks you have to them
- Big differences from Napster:
  - Chunk-based downloading
  - "Few large files" focus
  - Anti-freeloading mechanisms

BitTorrent: Publish/Join
[Diagram: peers contact the tracker to join the swarm; the tracker returns a list of peers.]

BitTorrent: Fetch
[Diagram: peers exchange chunks of the file directly with one another.]

BitTorrent: Sharing Strategy
- Employs a "tit-for-tat" sharing strategy (sketched below)
  - A is downloading from some other people; A will let the fastest N of those download from him
  - Be optimistic: occasionally let freeloaders download
    - Otherwise no one would ever start!
    - Also lets you discover better peers to download from when they reciprocate
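A sketch of the tit-for-tat choice described above: unchoke the N fastest uploaders to you, plus one random "optimistic" slot so freeloaders and newcomers get a chance. Function names and rate values are illustrative; real clients re-run this selection roughly every 10 seconds and rotate the optimistic slot about every 30:

```python
# Sketch of BitTorrent-style tit-for-tat peer selection (simplified).
import random

def choose_unchoked(upload_rates: dict[str, float], n: int = 4) -> list[str]:
    """upload_rates: peer -> bytes/s that peer recently uploaded to us."""
    # Reward the n peers that upload to us the fastest.
    regular = sorted(upload_rates, key=upload_rates.get, reverse=True)[:n]
    # One optimistic slot: let a random other peer download anyway.
    rest = [p for p in upload_rates if p not in regular]
    optimistic = [random.choice(rest)] if rest else []
    return regular + optimistic

rates = {"p1": 900.0, "p2": 150.0, "p3": 700.0, "p4": 0.0, "p5": 300.0}
print(choose_unchoked(rates, n=2))  # e.g. ['p1', 'p3', 'p4']
```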

Modern BitTorrent
- Uses a DHT (e.g., Vuze)
- The DHT operates by distributing lists of peers identified by the SHA-1 hash of the torrent
- The DHT acts like a transparent, distributed hash table (thus the name), with node IDs based on the SHA-1 hash of the node's IP/port combination

Outline
- Centralized database: Napster
- Query flooding: Gnutella, KaZaA
- Swarming: BitTorrent
- Structured overlay routing: distributed hash tables

Distributed Hash Tables: History
- The academic answer to P2P
- Goals
  - Guaranteed lookup success
  - Provable bounds on search time
  - Provable scalability
- Makes some things harder
  - Fuzzy queries / full-text search / etc.
  - Read-write, not read-only
- A hot topic in networking since its introduction in ~2000/2001

DHT: Overview
- Abstraction: a distributed "hash table" (DHT) data structure:
  - put(id, item);
  - item = get(id);
- Implementation: nodes in the system form a distributed data structure
  - Can be a ring, tree, hypercube, skip list, butterfly network, ...
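As a sketch of the abstraction only (how the id space is partitioned across nodes comes in the next slides), the interface looks like an ordinary hash table keyed by hashed ids. The helper name and the m-bit id space are illustrative assumptions:

```python
# Sketch of the DHT abstraction: the same put/get interface as a local
# hash table, but in a real DHT the id decides which node stores the item.
import hashlib

def make_id(key: bytes, m: int = 16) -> int:
    """Hash a key into the m-bit identifier space."""
    return int.from_bytes(hashlib.sha1(key).digest(), "big") % (2 ** m)

table = {}  # stands in for the whole distributed system in this sketch
def put(id_, item): table[id_] = item
def get(id_): return table.get(id_)

i = make_id(b"song.mp3")
put(i, "peer-a:6881 has song.mp3")
print(get(i))
```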

DHT: Overview (2)
- Structured overlay routing:
  - Join: on startup, contact a "bootstrap" node and integrate yourself into the distributed data structure; get a node id
  - Publish: route the publication for a file id toward a close node id along the data structure
  - Search: route a query for the file id toward a close node id; the data structure guarantees that the query will meet the publication
  - Fetch: two options:
    - The publication contains the actual file => fetch from where the query stops
    - The publication says "I have file X" => the query tells you who has X; use IP routing to get X from them

DHT: Example - Chord
- From MIT, 2001
- Associate to each node and file a unique id in a one-dimensional space (a ring)
  - E.g., pick from the range [0...2^m]
  - Usually the hash of the file or of the IP address
- Properties:
  - Routing table size is O(log N), where N is the total number of nodes
  - Guarantees that a file is found in O(log N) hops

DHT: Consistent Hashing
[Diagram: circular ID space with nodes N32, N90, N105 and keys K5, K20, K80.]
- A key is stored at its successor: the node with the next-higher ID
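A minimal sketch of the successor rule from the diagram: store each key at the first node whose id is greater than or equal to the key id, wrapping around the ring. The node ids match the figure; the function name is illustrative:

```python
# Sketch of consistent hashing: a key is stored at its successor,
# the first node id clockwise from the key id (with wraparound).
import bisect

def successor(node_ids: list[int], key: int) -> int:
    ids = sorted(node_ids)
    i = bisect.bisect_left(ids, key)  # first node id >= key
    return ids[i % len(ids)]          # wrap to the lowest id past the end

nodes = [32, 90, 105]
for k in (5, 20, 80):
    print(f"K{k} -> N{successor(nodes, k)}")  # K5->N32, K20->N32, K80->N90
```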

DHT: Chord Basic Lookup
[Diagram: ring of nodes N10, N32, N60, N90, N105, N120; the query "Where is key 80?" is forwarded around the ring until the answer "N90 has K80" returns.]

DHT: Chord "Finger Table"
[Diagram: node N80 with fingers pointing 1/2, 1/4, 1/8, 1/16, 1/32, 1/64, and 1/128 of the way around the ring.]
- Entry i in the finger table of node n is the first node that succeeds or equals n + 2^i
- In other words, the ith finger points 1/2^(m-i) of the way around the ring
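A sketch of building the finger table under the definition above (m-bit id space, finger i targets n + 2^i mod 2^m). The node ids in the example are illustrative, not from the figure:

```python
# Sketch: the finger table of node n in an m-bit Chord ring.
# Entry i holds the first node that succeeds or equals n + 2^i (mod 2^m).
import bisect

def finger_table(n: int, node_ids: list[int], m: int) -> list[int]:
    ids = sorted(node_ids)
    table = []
    for i in range(m):
        target = (n + 2 ** i) % (2 ** m)
        j = bisect.bisect_left(ids, target)
        table.append(ids[j % len(ids)])  # successor of the target id
    return table

# 8 entries for node 80 in a 2^8-id ring (node ids are made up):
print(finger_table(80, [10, 32, 60, 80, 90, 105, 120], m=8))
# -> [90, 90, 90, 90, 105, 120, 10, 10]
```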

DHT: Chord Join
- Assume an identifier space [0..8]
- Node n1 joins
[Diagram: ring with n1 and its successor table (entries i, id+2^i, succ).]

DHT: Chord Join
- Node n2 joins
[Diagram: ring with n1 and n2 and their updated successor tables.]

DHT: Chord Join
- Nodes n0, n6 join
[Diagram: ring with n0, n1, n2, n6 and their successor tables.]

DHT: Chord Join
- Nodes: n1, n2, n0, n6
- Items: f7, f1
[Diagram: each node's successor table, with items 7 and 1 stored at their successor nodes.]

DHT: Chord Routing
- Upon receiving a query for item id, a node:
  - Checks whether it stores the item locally
  - If not, forwards the query to the largest node in its successor table that does not exceed id
[Diagram: query(7) routed around the ring using the successor tables until it reaches the node storing item 7.]
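A sketch of the routing rule on this slide, simplified to an iterative loop over precomputed tables for the [0..8] example ring above (nodes n0, n1, n2, n6; items f7, f1). The tables and function name are illustrative:

```python
# Sketch of Chord-style greedy routing: forward to the largest known
# node that does not overshoot the queried id; stop at the id's successor.

def route(query_id, start, succ, fingers, ring=8):
    node, hops = start, 0
    while True:
        d = (query_id - node) % ring        # clockwise distance to the id
        if d == 0:
            return node, hops               # this node owns the id
        if d <= (succ[node] - node) % ring:
            return succ[node], hops + 1     # the id falls to our successor
        # Otherwise jump to the largest finger that does not overshoot.
        node = max((f for f in fingers[node] if 0 < (f - node) % ring <= d),
                   key=lambda f: (f - node) % ring)
        hops += 1

# Ring [0..8] with nodes {0, 1, 2, 6}; finger tables computed as above.
succ = {0: 1, 1: 2, 2: 6, 6: 0}
fingers = {0: [1, 2, 6], 1: [2, 6], 2: [6], 6: [0, 2]}
print(route(7, start=1, succ=succ, fingers=fingers))  # (0, 2): n0 stores f7
```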

DHT: Chord Summary
- Routing table size?
  - Log N fingers
- Routing time?
  - Each hop is expected to halve the distance to the desired id => expect O(log N) hops
- Pros:
  - Guaranteed lookup
  - O(log N) per-node state and search scope
- Cons:
  - No one uses them? (only one file-sharing app)
  - Supporting non-exact-match search is hard

Bitcoin
- What they claim:
  - "Bitcoin is an innovative payment network and a new kind of money."
  - "Bitcoin uses peer-to-peer technology to operate with no central authority or banks; managing transactions and the issuing of bitcoins is carried out collectively by the network. Bitcoin is open-source; its design is public, nobody owns or controls Bitcoin and everyone can take part."

Bitcoin
- Each client mints a bitcoin
  - Clients have to perform many cryptographic operations to generate a bitcoin
- A bitcoin can only be used once; this is guaranteed by the software
- The rate at which a client can create a bitcoin is limited by the difficulty of solving the crypto problem
  - The problem gets more difficult over time
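A toy sketch of the "many cryptographic operations" idea: search for a nonce whose hash falls below a target, where tightening the target makes the problem harder. This mirrors Bitcoin's proof-of-work in spirit only; the real system double-SHA-256-hashes block headers and retargets the difficulty every 2016 blocks:

```python
# Toy proof-of-work: find a nonce so that SHA-256(data + nonce) starts
# with `difficulty` zero hex digits. Expected work grows 16x per extra
# zero digit, which is how "more difficult over time" is enforced.
import hashlib
from itertools import count

def mine(data: bytes, difficulty: int) -> tuple[int, str]:
    prefix = "0" * difficulty
    for nonce in count():
        h = hashlib.sha256(data + str(nonce).encode()).hexdigest()
        if h.startswith(prefix):
            return nonce, h

nonce, digest = mine(b"block payload", difficulty=4)
print(nonce, digest)  # verifying the answer takes a single hash
```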

Bitcoin Statistics
[Charts: two slides of Bitcoin statistics.]

"Bitcoin Could Go To $1 Million"

P2P: Summary
- Many different styles; remember the pros and cons of each
  - Centralized, flooding, swarming, unstructured and structured routing
- Lessons learned:
  - Single points of failure are very bad
  - Flooding messages to everyone is bad
  - The underlying network topology is important
  - Not all nodes are equal
  - Need incentives to discourage freeloading
  - Privacy and security are important
  - Structure can provide theoretical bounds and guarantees