Peer-to-peer systems and

Slides:



Advertisements
Similar presentations
P2P data retrieval DHT (Distributed Hash Tables) Partially based on Hellerstein’s presentation at VLDB2004.
Advertisements

Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, Hari Balakrishnan MIT and Berkeley presented by Daniel Figueiredo Chord: A Scalable Peer-to-peer.
Pastry Peter Druschel, Rice University Antony Rowstron, Microsoft Research UK Some slides are borrowed from the original presentation by the authors.
Scalable Content-Addressable Network Lintao Liu
Peer-to-Peer Systems Chapter 25. What is Peer-to-Peer (P2P)? Napster? Gnutella? Most people think of P2P as music sharing.
Clayton Sullivan PEER-TO-PEER NETWORKS. INTRODUCTION What is a Peer-To-Peer Network A Peer Application Overlay Network Network Architecture and System.
Denial-of-Service Resilience in Peer-to-Peer Systems D. Dumitriu, E. Knightly, A. Kuzmanovic, I. Stoica and W. Zwaenepoel Presenter: Yan Gao.
Peer to Peer File Sharing Huseyin Ozgur TAN. What is Peer-to-Peer?  Every node is designed to(but may not by user choice) provide some service that helps.
Secure routing for structured peer-to-peer overlay networks (by Castro et al.) Shariq Rizvi CS 294-4: Peer-to-Peer Systems.
presented by Hasan SÖZER1 Scalable P2P Search Daniel A. Menascé George Mason University.
Chord-over-Chord Overlay Sudhindra Rao Ph.D Qualifier Exam Department of ECECS.
Freenet A Distributed Anonymous Information Storage and Retrieval System I Clarke O Sandberg I Clarke O Sandberg B WileyT W Hong.
Topics in Reliable Distributed Systems Fall Dr. Idit Keidar.
Wide-area cooperative storage with CFS
 Structured peer to peer overlay networks are resilient – but not secure.  Even a small fraction of malicious nodes may result in failure of correct.
1CS 6401 Peer-to-Peer Networks Outline Overview Gnutella Structured Overlays BitTorrent.
Introduction to Peer-to-Peer Networks. What is a P2P network Uses the vast resource of the machines at the edge of the Internet to build a network that.
P2P File Sharing Systems
INTRODUCTION TO PEER TO PEER NETWORKS Z.M. Joseph CSE 6392 – DB Exploration Spring 2006 CSE, UT Arlington.
1 Napster & Gnutella An Overview. 2 About Napster Distributed application allowing users to search and exchange MP3 files. Written by Shawn Fanning in.
Introduction Widespread unstructured P2P network
Distributed Systems Concepts and Design Chapter 10: Peer-to-Peer Systems Bruce Hammer, Steve Wallis, Raymond Ho.
Introduction to Peer-to-Peer Networks. What is a P2P network A P2P network is a large distributed system. It uses the vast resource of PCs distributed.
Content Overlays (Nick Feamster). 2 Content Overlays Distributed content storage and retrieval Two primary approaches: –Structured overlay –Unstructured.
Peer to Peer Research survey TingYang Chang. Intro. Of P2P Computers of the system was known as peers which sharing data files with each other. Build.
Chord: A Scalable Peer-to-peer Lookup Protocol for Internet Applications Xiaozhou Li COS 461: Computer Networks (precept 04/06/12) Princeton University.
1 Distributed Hash Tables (DHTs) Lars Jørgen Lillehovde Jo Grimstad Bang Distributed Hash Tables (DHTs)
Security Michael Foukarakis – 13/12/2004 A Survey of Peer-to-Peer Security Issues Dan S. Wallach Rice University,
Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications.
Peer-to-Peer Network Tzu-Wei Kuo. Outline What is Peer-to-Peer(P2P)? P2P Architecture Applications Advantages and Weaknesses Security Controversy.
SIGCOMM 2001 Lecture slides by Dr. Yingwu Zhu Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications.
1 Peer-to-Peer Technologies Seminar by: Kunal Goswami (05IT6006) School of Information Technology Guided by: Prof. C.R.Mandal, School of Information Technology.
Peer to Peer A Survey and comparison of peer-to-peer overlay network schemes And so on… Chulhyun Park
1 Secure Peer-to-Peer File Sharing Frans Kaashoek, David Karger, Robert Morris, Ion Stoica, Hari Balakrishnan MIT Laboratory.
1 Distributed Hash Table CS780-3 Lecture Notes In courtesy of Heng Yin.
Peer to Peer Network Design Discovery and Routing algorithms
Algorithms and Techniques in Structured Scalable Peer-to-Peer Networks
Peer-to-Peer Systems: An Overview Hongyu Li. Outline  Introduction  Characteristics of P2P  Algorithms  P2P Applications  Conclusion.
LOOKING UP DATA IN P2P SYSTEMS Hari Balakrishnan M. Frans Kaashoek David Karger Robert Morris Ion Stoica MIT LCS.
CS Spring 2014 CS 414 – Multimedia Systems Design Lecture 37 – Introduction to P2P (Part 1) Klara Nahrstedt.
INTERNET TECHNOLOGIES Week 10 Peer to Peer Paradigm 1.
P2P Search COP6731 Advanced Database Systems. P2P Computing  Powerful personal computer Share computing resources P2P Computing  Advantages: Shared.
P2P Search COP P2P Search Techniques Centralized P2P systems  e.g. Napster, Decentralized & unstructured P2P systems  e.g. Gnutella.
Large Scale Sharing Marco F. Duarte COMP 520: Distributed Systems September 19, 2004.
CS Spring 2010 CS 414 – Multimedia Systems Design Lecture 24 – Introduction to Peer-to-Peer (P2P) Systems Klara Nahrstedt (presented by Long Vu)
Distributed Web Systems Peer-to-Peer Systems Lecturer Department University.
Peer-to-Peer Information Systems Week 12: Naming
A Survey of Peer-to-Peer Content Distribution Technologies Stephanos Androutsellis-Theotokis and Diomidis Spinellis ACM Computing Surveys, December 2004.
Ion Stoica, Robert Morris, David Liben-Nowell, David R. Karger, M
An example of peer-to-peer application
CS 268: Lecture 22 (Peer-to-Peer Networks)
Pastry Scalable, decentralized object locations and routing for large p2p systems.
Peer-to-Peer Data Management
Controlling the Cost of Reliability in Peer-to-Peer Overlays
(slides by Nick Feamster)
CHAPTER 3 Architectures for Distributed Systems
Net 323 D: Networks Protocols
EE 122: Peer-to-Peer (P2P) Networks
Providing Secure Storage on the Internet
DHT Routing Geometries and Chord
Prof. Leonardo Mostarda University of Camerino
Presentation by Theodore Mao CS294-4: Peer-to-peer Systems
CS 162: P2P Networks Computer Science Division
Unstructured Routing : Gnutella and Freenet
DATA RETRIEVAL IN ADHOC NETWORKS
Mobile P2P Data Retrieval and Caching
An Overview of Peer-to-Peer
A Semantic Peer-to-Peer Overlay for Web Services Discovery
Peer-to-Peer Information Systems Week 12: Naming
#02 Peer to Peer Networking
Presentation transcript:

Peer-to-peer systems and Complex Adaptive Systems C.d.L. Informatica – Università di Bologna Peer-to-peer systems and overlay networks Fabio Picconi Dipartimento di Scienze dell’Informazione

Outline Introduction to P2P systems Common topologies Data location Churn Newscast algorithm Security

Peer-to-peer vs. client-server Server well connected to the “center” of the Internet Servers carries out critical tasks Clients only talk to server Only nodes located on the “periphery” of the Internet Tasks distributed across all nodes Clients talk to other clients

Example – Video sharing Client-server: YouTube Advantages Client can disconnect after upload Uploader needs little bandwidth Other users can find the file easily (just use search on server webpage) Disadvantages Server may not accept file or remove it later (according to content policy) Whole system depends on the server (what if shut down like Napster?) Server storage and bandwidth are expensive! downloader downloader downloader uploader downloader downloader client-server

Example – Video sharing Peer-to-peer: BitTorrent Advantages Does not depend on a central server Bandwidth shared across nodes (downloaders also act as uploaders) High scalability, low cost Disadvantages Seeder must remain on-line to guarantee file availability Content is more difficult to find (downloaders must find .torrent file) Freeloaders cheat in order to download without uploading downloader downloader downloader seeder downloader downloader peer-to-peer

Comparison: P2P vs. client-server Asymmetric: client and servers carry out different tasks Global knowledge: servers have a global view of the network Centralization: communications and management are centralized Single point of failure: a server failure brings down the system Limited scalability: servers easily overloaded Expensive: server storage and bandwidth capacity is not cheap Peer-to-peer Symmetric: each node carries out the same tasks Local knowledge: nodes only know a small set of other nodes Decentralization: nodes must self-organize in a decentralized way Robustness: several nodes may fail with little or no impact High scalability: high aggregate capacity, load distribution Low-cost: storage and bandwidth are contributed by users

Characterizing peer-to-peer systems The main characteristics of P2P systems are: decentralization (i.e., no central server) self-organization (e.g., adding new nodes and removing disconnected ones) symmetric communications (e.g., peers act as clients and servers) scalability (thanks to high aggregate capacity and load distribution) shared ownership (i.e., storage and bandwidth are contributed by peers) overlay construction and routing (i.e., nodes form a logical network on top of the underlying IP network) a message from one peer to another is sent through the underlying IP network

P2P environment P2P systems are deployed in a challenging environment: High latency and low bandwidth between nodes - a high hop count will result in a high end-to-end latency - transferring large files may take a long time Churn - nodes may disconnect temporarily - new nodes are constantly joining the system, while others leave the overlay permanently Security - P2P clients run on machines under full control of their users - data sent to other nodes may be erased, corrupted, disclosed, etc. - malicious users may try to bring down the system (e.g., routing attack) Selfishness - users may run hacked P2P clients in order to avoid contributing resources

Problems Some of the problems that a P2P systems designer must face: Overlay construction and maintenance - maintain a given overlay topology (e.g., random, two-level, ring, etc.) Data location - locate a given data object among a large number of nodes Data dissemination - propagate data in an efficient and robust manner Per-node state - keep the amount of state per node small Tolerance to churn - maintain system invariants (e.g., topology, data location, data availability) despite node arrivals and departures

Topology Some common topologies: Flat unstructured: a node can connect to any other node - only constraint: maximum degree dmax - fast join procedure - usually very tolerant to churn - good for data dissemination, bad for location Two-level unstructured: nodes connect to a supernode - supernodes form a small overlay - used for indexing and forwarding - large state and high load on supernodes Flat structured: constraints based on node ids - allows for efficient data location - constraints require long join and leave procedures - less robust in high-churn environments

Data location - Flooding Problem: find the set of nodes S that store a copy of object O Solutions: (1) Flooding: send a search message to all nodes [first Gnutella protocol] A search message contains either keywords or an object id Advantages: - simplicity - no topology constraints Disadvantages: - high network overhead (huge traffic generated by each search request) - flooding stopped by TTL (which produces search horizon) - only applicable to small number of nodes

Data location - Flooding (1) Flooding (cont.) Flooding in a flat unstructured network: obj search horizon for TTL = 2 search Objects that lie outside of the horizon are not found

Data location - Superpeers (2) Two-level overlay: use superpeers to index the locations of an object [eMule, Gnutella 2, BitTorrent] Each node connects to a superpeer and advertises the list of objects it stores Search requests are sent to the superpeer, which forwards them to other superpeers Advantages: - highly scalable Disadvantages: - superpeers must be realiable, powerful and well connected to the Internet (expensive) - superpeers must maintain large state - the system relies on a small number of superpeers

Data location - Superpeers (2) Two-level overlay (cont.) request obj response A two-level overlay is a partially centralized system In some systems superpeers do not connect to each other (e.g., BitTorrent)

Data location - KBR (3) Structured networks: use a routing algorithm that implements Key-Based Routing [Overnet, Kad, BitTorrent trackerless] Key-Based Routing (also known as Distributed Hash Tables, or DHTs) works as follows: each node is given a unique node identifier, or nodeid given a key k, the node whose nodeid is numerically closest to k among all nodes in the network is known as the root of key k given a routing key k, a KBR algorithm can route a message to the root of k in a small number of hops, usually O(log N) the location of an object with id objectid is tracked by the root of k = objectid thus, one can find the location of an object by routing a message to the root of k = objectid and querying the root for the location of the object

Data location - KBR (cont.) Key-Based Routing [Pastry] Source node id: 04F2 k = object id: 8955 Hop # Hop id Shared prefix length 0 04F2 0 1 85E0 1 2 8909 2 3 8957 3 4 8954 3 (root of k) route(k=8955,msg) 04F2 E25A obj 8955 3A79 C52A AC78 5230 8957 620F 8954 obj 8955 obj8955 stored on nodes 620F,C52A 8909 Object 8955 is tracked by node 8954, which knows of two copies stored at nodes 620F and C52A 85E0 8821 overlay address space [0000,FFFF]

Data location - KBR (cont.) Routing table for node 4F28 [Pastry] used to find next hop with longer shared prefix used to find the nodeid closest to a key that is close to the local nodeid In this example the routing table size is 4 x 15 = 60 entries, for a maximum network size of N = 65536 nodes. The average route length in this case is 4 hops.

Data location - KBR (cont.) (3) Structured networks (cont.) Advantages: - completely decentralized (no need for superpeers) - routing algorithm achieves low hop count for large network sizes Disadvantages: - each object must be tracked by a different node - objects are tracked by unreliable nodes (i.e., which may disconnect) - keyword-based searches are more difficult to implement than with superpeers (because objects are located by their objectid) - the overlay must be structured according to a given topology in order to achieve a low hop count - routing tables must be updated every time a node joins or leaves the overlay

Data location - Loosely structured overlays (4) Loosely structured networks: use hints on the location of objects [Freenet] Nodes locate objects by sending search requests containing the object id Requests are propagated using a technique similar to flooding Objects with similar identifiers are grouped on the same nodes request for AE5J C A B AE5J 5B20 response E D F AF02

Data location - Loosely structured overlays (4) Loosely structured networks (cont.) A search response leaves routing hints on the path back to the source Hints are used when propagating future requests for similar object ids Hints AE5J: B Hints AE5J: D 5B20: E C A request for AF02 B AE5J 5B20 F E D AF02 Hints AE5J: F

Data location - Loosely structured overlays (4) Loosely structured networks (cont.) Advantages: - no topology constraints, flat architecture - searches are more efficient than with plain flooding Disadvantages: - does not support keyword-based searches - search requests have a TTL - contrary to structured overlays, loosely structured overlay do not guarantee a low number of hops, nor that the object will be found

partially centralized Data location - Summary The location schemes described previously can be classified according to: degree of structure decentralization unstructured loosely structured structured partially centralized eMule Gnutella 2 BitTorrent - decentralized Gnutella Freenet Kad

Effects of churn Churn (node arrivals and departures) can have several effects on a P2P system: data objects may be become unavailable if all replicas disconnect routing tables may become inconsistent (e.g., entries may point to nodes which have disconnected) the overlay may become partitioned if several nodes suddenly disconnect:

Churn tolerance Node arrivals and departures must not disrupt the normal behavior of the system system invariants must be maintained - connected overlay (i.e., no partitions), low average path length - data objects accessible from anywhere in the network two types of churn tolerance: - dynamic repair: ability to react to changes in the overlay to maintain system invariants (e.g., heal partitions) - static resilience: ability to continue operating correctly before adaptation occurs (e.g., route messages through alternate paths)

Churn – Preventing partitions An simple way to prevent partitions is to increase the node degree Ring partitions can be avoided by keeping a list of successor nodes

Churn – Static resilience (Ring topology) Percentage of failed paths for various numbers of successor nodes 1 successor 16 successors 48 successors More successors reduce the chances of “opening” the ring

Churn – Static resilience (various topologies) Percentage of failed paths for various overlay topologies Some topologies provide more alternate paths between nodes than others

Churn – Dynamic repair (Ring topology) Dynamically update routing information to adapt to overlay changes Two types of repair algorithms: - reactive: start maintenance procedure immediately after detection - periodic: execute maintenance procedure periodically A reactive algorithm can bring down the system instead of repairing it: Each node disconnection triggers a maintenance procedure on a set of nodes Over a given churn rate, the maintenance traffic congests the network The network congestion leads to nodes being considered as disconnected This triggers even more maintenance procedures (positive feedback), eventually bringing down the network This effect may be avoided using a periodic maintenance algorithm

Churn – Summary Churn is an important issue in P2P overlays: Data may become unavailable, and routing information outdated Static resilience - depends on the topology (i.e., the number of alternate paths) - increases with the average node degree Dynamic repair protocols must be carefully designed - reactive protocols are usually faster - periodic protocols can handle higher churn rates

Newscast Peer-to-peer protocol that creates and maintains an unstructured overlay Highly resilient to churn Can be used to propagate information Extremely simple design based on information gossip: - Each node only knows about a small set of other knows (view) - Each node periodically picks a random node from this set - Both nodes exchange their views and update them The random view exchange makes the algorithm very robust to failures and changes in the overlay

Newscast - View exchange Each node maintains a view v containing c entries, where entry = {node address, timestamp} Each node executes the following code every T seconds: 1. select random entry r from local view 2. send local view plus an entry with the local address to node r 3. retrieve view of node r and merge it with local view 4. keep the c entries with the most recent timestamps The view of a node changes on each round

Newscast - View exchange (cont.) Example view exchange initiated by node A B,10 D,5 E,3 X,8 S,12 W,2 B D W A E view of node A (c = 6) S X

Newscast - View exchange (cont.) 1. Select random node B,10 D,5 E,3 X,8 S,12 W,2 selected node view of node A 2. Exchange views (plus local entry) with selected node A,15 B,10 D,5 E,3 X,8 S,12 W,2 E,15 J,14 C,9 D,8 H,10 L,14 Z,2 view of node A view of node E A E

Newscast - View exchange (cont.) 3. Merge views, and order by timestamp 4. Keep c most recent entries E,15 J,14 L,14 S,12 H,10 B,10 C,9 X,8 D,8 D,5 E,3 W,2 Z,2 E,15 J,14 L,14 S,12 H,10 B,10 merge result new view of node A (c = 6)

Newscast - Average path length Newscast overlays have a low average path length, i.e., O(log N) (average number of hops between any two nodes)

Newscast - Static resilience The overlay shows high static resilience

Newscast - Summary Simple peer-to-peer algorithm Nodes use only local information Periodic peer-wise data exchanges Emergent properties: - low average path length - resistance to high churn

Security Security in peer-to-peer systems is hard to enforce: Users have full control on their computers Modified clients may not follow the standard protocol Communications may be eavesdropped Data may be corrupted Private data stored on remote computers may be disclosed

Security - Weak identities The user may leave the system and rejoin it with a new identity (i.e, user id) If an attack is detected, the attacker can reenter the system with a new id An attacker may create a large number of false identities (Sybil attack) Example of Sybil Attack: Nodes S1 to S6 are actually 6 instances of the P2P client running on the same machine The attacker can intercept all traffic coming from or going to node A S1 S2 S6 A S3 S5 S4

Security - Strong identities The user cannot change its identity Solution: use a centralized, trusted Certification Authority (CA) - Each new user must obtain an identity certificate: certificate = { user id, IP address, user’s public key, signatureCA } - The certificate is digitally signed by the CA, whose public key is known by all users - A certificate cannot be forged (would require the CA’s private key) To prove his identity, a user signs a message with his private key, and attaches the corresponding certificate signed by the CA Strong identities prevent Sybil Attacks If an attacker is caught, it cannot easily rejoin the system

Security – Weak vs. strong identities Strong identities requires a centralized CA new nodes must contact the CA before joining the network: - the CA response may be slow (a few days) - if the CA is unavailable, new nodes cannot join the system the security of the system depends on the CA: - the CA must correctly verify the identity of the requester - the CA’s private key must be kept secret Many P2P systems use weak identities IP address already gives some identity information Some systems ensure anonymity (e.g., FreeNet)

The end You can get these slides from http://www.cs.unibo.it/~picconi/slides For any questions, mail me to: picconi@cs.unibo.it