Peer-to-peer systems and

Peer-to-peer systems and
Complex Adaptive Systems C.d.L. Informatica – Università di Bologna Peer-to-peer systems and overlay networks Fabio Picconi Dipartimento di Scienze dell’Informazione

Outline Introduction to P2P systems Common topologies Data location
Churn Newscast algorithm Security

Peer-to-peer vs. client-server
Server well connected to the “center” of the Internet Servers carries out critical tasks Clients only talk to server Only nodes located on the “periphery” of the Internet Tasks distributed across all nodes Clients talk to other clients

Example – Video sharing
Client-server: YouTube Advantages Client can disconnect after upload Uploader needs little bandwidth Other users can find the file easily (just use search on server webpage) Disadvantages Server may not accept file or remove it later (according to content policy) Whole system depends on the server (what if shut down like Napster?) Server storage and bandwidth are expensive! downloader downloader downloader uploader downloader downloader client-server

Example – Video sharing
Peer-to-peer: BitTorrent Advantages Does not depend on a central server Bandwidth shared across nodes (downloaders also act as uploaders) High scalability, low cost Disadvantages Seeder must remain on-line to guarantee file availability Content is more difficult to find (downloaders must find .torrent file) Freeloaders cheat in order to download without uploading downloader downloader downloader seeder downloader downloader peer-to-peer

Comparison: P2P vs. client-server
Asymmetric: client and servers carry out different tasks Global knowledge: servers have a global view of the network Centralization: communications and management are centralized Single point of failure: a server failure brings down the system Limited scalability: servers easily overloaded Expensive: server storage and bandwidth capacity is not cheap Peer-to-peer Symmetric: each node carries out the same tasks Local knowledge: nodes only know a small set of other nodes Decentralization: nodes must self-organize in a decentralized way Robustness: several nodes may fail with little or no impact High scalability: high aggregate capacity, load distribution Low-cost: storage and bandwidth are contributed by users

Characterizing peer-to-peer systems
The main characteristics of P2P systems are: decentralization (i.e., no central server) self-organization (e.g., adding new nodes and removing disconnected ones) symmetric communications (e.g., peers act as clients and servers) scalability (thanks to high aggregate capacity and load distribution) shared ownership (i.e., storage and bandwidth are contributed by peers) overlay construction and routing (i.e., nodes form a logical network on top of the underlying IP network) a message from one peer to another is sent through the underlying IP network

P2P environment P2P systems are deployed in a challenging environment:
High latency and low bandwidth between nodes - a high hop count will result in a high end-to-end latency - transferring large files may take a long time Churn - nodes may disconnect temporarily - new nodes are constantly joining the system, while others leave the overlay permanently Security - P2P clients run on machines under full control of their users - data sent to other nodes may be erased, corrupted, disclosed, etc. - malicious users may try to bring down the system (e.g., routing attack) Selfishness - users may run hacked P2P clients in order to avoid contributing resources

Problems Some of the problems that a P2P systems designer must face:
Overlay construction and maintenance - maintain a given overlay topology (e.g., random, two-level, ring, etc.) Data location - locate a given data object among a large number of nodes Data dissemination - propagate data in an efficient and robust manner Per-node state - keep the amount of state per node small Tolerance to churn - maintain system invariants (e.g., topology, data location, data availability) despite node arrivals and departures

Topology Some common topologies:
Flat unstructured: a node can connect to any other node - only constraint: maximum degree dmax - fast join procedure - usually very tolerant to churn - good for data dissemination, bad for location Two-level unstructured: nodes connect to a supernode - supernodes form a small overlay - used for indexing and forwarding - large state and high load on supernodes Flat structured: constraints based on node ids - allows for efficient data location - constraints require long join and leave procedures - less robust in high-churn environments

Data location - Flooding
Problem: find the set of nodes S that store a copy of object O Solutions: (1) Flooding: send a search message to all nodes [first Gnutella protocol] A search message contains either keywords or an object id Advantages: - simplicity - no topology constraints Disadvantages: - high network overhead (huge traffic generated by each search request) - flooding stopped by TTL (which produces search horizon) - only applicable to small number of nodes

Data location - Flooding
(1) Flooding (cont.) Flooding in a flat unstructured network: obj search horizon for TTL = 2 search Objects that lie outside of the horizon are not found

Data location - Superpeers
(2) Two-level overlay: use superpeers to index the locations of an object [eMule, Gnutella 2, BitTorrent] Each node connects to a superpeer and advertises the list of objects it stores Search requests are sent to the superpeer, which forwards them to other superpeers Advantages: - highly scalable Disadvantages: - superpeers must be realiable, powerful and well connected to the Internet (expensive) - superpeers must maintain large state - the system relies on a small number of superpeers

Data location - Superpeers
(2) Two-level overlay (cont.) request obj response A two-level overlay is a partially centralized system In some systems superpeers do not connect to each other (e.g., BitTorrent)

Data location - KBR (3) Structured networks: use a routing algorithm that implements Key-Based Routing [Overnet, Kad, BitTorrent trackerless] Key-Based Routing (also known as Distributed Hash Tables, or DHTs) works as follows: each node is given a unique node identifier, or nodeid given a key k, the node whose nodeid is numerically closest to k among all nodes in the network is known as the root of key k given a routing key k, a KBR algorithm can route a message to the root of k in a small number of hops, usually O(log N) the location of an object with id objectid is tracked by the root of k = objectid thus, one can find the location of an object by routing a message to the root of k = objectid and querying the root for the location of the object

Data location - KBR (cont.)
Key-Based Routing [Pastry] Source node id: 04F2 k = object id: 8955 Hop # Hop id Shared prefix length 0 04F2 0 1 85E0 1 (root of k) route(k=8955,msg) 04F2 E25A obj 8955 3A79 C52A AC78 5230 8957 620F 8954 obj 8955 obj8955 stored on nodes 620F,C52A 8909 Object 8955 is tracked by node 8954, which knows of two copies stored at nodes 620F and C52A 85E0 8821 overlay address space [0000,FFFF]

Routing table for node 4F28 [Pastry] used to find next hop with longer shared prefix used to find the nodeid closest to a key that is close to the local nodeid In this example the routing table size is 4 x 15 = 60 entries, for a maximum network size of N = nodes. The average route length in this case is 4 hops.

(3) Structured networks (cont.) Advantages: - completely decentralized (no need for superpeers) - routing algorithm achieves low hop count for large network sizes Disadvantages: - each object must be tracked by a different node - objects are tracked by unreliable nodes (i.e., which may disconnect) - keyword-based searches are more difficult to implement than with superpeers (because objects are located by their objectid) - the overlay must be structured according to a given topology in order to achieve a low hop count - routing tables must be updated every time a node joins or leaves the overlay

Data location - Loosely structured overlays
(4) Loosely structured networks: use hints on the location of objects [Freenet] Nodes locate objects by sending search requests containing the object id Requests are propagated using a technique similar to flooding Objects with similar identifiers are grouped on the same nodes request for AE5J C A B AE5J 5B20 response E D F AF02

(4) Loosely structured networks (cont.) A search response leaves routing hints on the path back to the source Hints are used when propagating future requests for similar object ids Hints AE5J: B Hints AE5J: D 5B20: E C A request for AF02 B AE5J 5B20 F E D AF02 Hints AE5J: F

(4) Loosely structured networks (cont.) Advantages: - no topology constraints, flat architecture - searches are more efficient than with plain flooding Disadvantages: - does not support keyword-based searches - search requests have a TTL - contrary to structured overlays, loosely structured overlay do not guarantee a low number of hops, nor that the object will be found

partially centralized
Data location - Summary The location schemes described previously can be classified according to: degree of structure decentralization unstructured loosely structured structured partially centralized eMule Gnutella 2 BitTorrent - decentralized Gnutella Freenet Kad

Effects of churn Churn (node arrivals and departures) can have several effects on a P2P system: data objects may be become unavailable if all replicas disconnect routing tables may become inconsistent (e.g., entries may point to nodes which have disconnected) the overlay may become partitioned if several nodes suddenly disconnect:

Churn tolerance Node arrivals and departures must not disrupt the normal behavior of the system system invariants must be maintained - connected overlay (i.e., no partitions), low average path length - data objects accessible from anywhere in the network two types of churn tolerance: - dynamic repair: ability to react to changes in the overlay to maintain system invariants (e.g., heal partitions) - static resilience: ability to continue operating correctly before adaptation occurs (e.g., route messages through alternate paths)

Churn – Preventing partitions
An simple way to prevent partitions is to increase the node degree Ring partitions can be avoided by keeping a list of successor nodes

Churn – Static resilience (Ring topology)
Percentage of failed paths for various numbers of successor nodes 1 successor 16 successors 48 successors More successors reduce the chances of “opening” the ring

Churn – Static resilience (various topologies)
Percentage of failed paths for various overlay topologies Some topologies provide more alternate paths between nodes than others

Churn – Dynamic repair (Ring topology)
Dynamically update routing information to adapt to overlay changes Two types of repair algorithms: - reactive: start maintenance procedure immediately after detection - periodic: execute maintenance procedure periodically A reactive algorithm can bring down the system instead of repairing it: Each node disconnection triggers a maintenance procedure on a set of nodes Over a given churn rate, the maintenance traffic congests the network The network congestion leads to nodes being considered as disconnected This triggers even more maintenance procedures (positive feedback), eventually bringing down the network This effect may be avoided using a periodic maintenance algorithm

Churn – Summary Churn is an important issue in P2P overlays:
Data may become unavailable, and routing information outdated Static resilience - depends on the topology (i.e., the number of alternate paths) - increases with the average node degree Dynamic repair protocols must be carefully designed - reactive protocols are usually faster - periodic protocols can handle higher churn rates

Newscast Peer-to-peer protocol that creates and maintains an unstructured overlay Highly resilient to churn Can be used to propagate information Extremely simple design based on information gossip: - Each node only knows about a small set of other knows (view) - Each node periodically picks a random node from this set - Both nodes exchange their views and update them The random view exchange makes the algorithm very robust to failures and changes in the overlay

Newscast - View exchange
Each node maintains a view v containing c entries, where entry = {node address, timestamp} Each node executes the following code every T seconds: 1. select random entry r from local view 2. send local view plus an entry with the local address to node r 3. retrieve view of node r and merge it with local view 4. keep the c entries with the most recent timestamps The view of a node changes on each round

Newscast - View exchange (cont.)
Example view exchange initiated by node A B,10 D,5 E,3 X,8 S,12 W,2 B D W A E view of node A (c = 6) S X

1. Select random node B,10 D,5 E,3 X,8 S,12 W,2 selected node view of node A 2. Exchange views (plus local entry) with selected node A,15 B,10 D,5 E,3 X,8 S,12 W,2 E,15 J,14 C,9 D,8 H,10 L,14 Z,2 view of node A view of node E A E

3. Merge views, and order by timestamp 4. Keep c most recent entries E,15 J,14 L,14 S,12 H,10 B,10 C,9 X,8 D,8 D,5 E,3 W,2 Z,2 E,15 J,14 L,14 S,12 H,10 B,10 merge result new view of node A (c = 6)

Newscast - Average path length
Newscast overlays have a low average path length, i.e., O(log N) (average number of hops between any two nodes)

Newscast - Static resilience
The overlay shows high static resilience

Newscast - Summary Simple peer-to-peer algorithm
Nodes use only local information Periodic peer-wise data exchanges Emergent properties: - low average path length - resistance to high churn

Security Security in peer-to-peer systems is hard to enforce:
Users have full control on their computers Modified clients may not follow the standard protocol Communications may be eavesdropped Data may be corrupted Private data stored on remote computers may be disclosed

Security - Weak identities
The user may leave the system and rejoin it with a new identity (i.e, user id) If an attack is detected, the attacker can reenter the system with a new id An attacker may create a large number of false identities (Sybil attack) Example of Sybil Attack: Nodes S1 to S6 are actually 6 instances of the P2P client running on the same machine The attacker can intercept all traffic coming from or going to node A S1 S2 S6 A S3 S5 S4

Security - Strong identities
The user cannot change its identity Solution: use a centralized, trusted Certification Authority (CA) - Each new user must obtain an identity certificate: certificate = { user id, IP address, user’s public key, signatureCA } - The certificate is digitally signed by the CA, whose public key is known by all users - A certificate cannot be forged (would require the CA’s private key) To prove his identity, a user signs a message with his private key, and attaches the corresponding certificate signed by the CA Strong identities prevent Sybil Attacks If an attacker is caught, it cannot easily rejoin the system

Security – Weak vs. strong identities
Strong identities requires a centralized CA new nodes must contact the CA before joining the network: - the CA response may be slow (a few days) - if the CA is unavailable, new nodes cannot join the system the security of the system depends on the CA: - the CA must correctly verify the identity of the requester - the CA’s private key must be kept secret Many P2P systems use weak identities IP address already gives some identity information Some systems ensure anonymity (e.g., FreeNet)

The end You can get these slides from
For any questions, mail me to:

Peer-to-peer systems and

Similar presentations

Presentation on theme: "Peer-to-peer systems and"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Peer-to-peer systems and

Similar presentations

Presentation on theme: "Peer-to-peer systems and"— Presentation transcript:

Similar presentations

About project

Feedback