Published by Esther Newton. Modified over 9 years ago.
1
LOOKING UP DATA IN P2P SYSTEMS. Hari Balakrishnan, M. Frans Kaashoek, David Karger, Robert Morris, Ion Stoica. MIT LCS
2
Key Idea Survey paper Discusses how to access data in a P2P system Covers four solutions –CAN –Chord –Pastry –Tapestry
3
INTRODUCTION P2P systems are popular due to –Low startup cost – High scalability at very low cost –Use of resources that would otherwise remain unused –Potential for greater robustness Fully decentralized and distributed
4
The lookup problem How do we locate data in large P2P systems? One solution –Distributed hash tables (DHT)
5
Previous solutions (I) Centralized database –Napster Not scalable Vulnerable to attacks on database
6
Previous solutions (II) Broadcasting –Customers broadcast their requests to their neighbors, which forward them to their own neighbors and so on –Gnutella –Does not scale either Broadcast messages consume too much bandwidth
7
Previous solutions (III) Hierarchy, as in Internet DNS –Organizes network nodes into a hierarchy –All searches start at the top of the hierarchy and propagate down –Used by KaZaA, Grokster and others –Nodes higher in the tree do much more work than lower nodes –Solution vulnerable to loss of root node(s)
8
Previous solutions (IV) Freenet –Forwards queries from node to node until requested data are found –Emphasis is on anonymity, not performance –Unpopular documents may become inaccessible (nobody cares!)
9
DISTRIBUTED HASH TABLES Implements primitive lookup(key) –Produces a path going from a node n0 to the node holding key Big tradeoff is between –Keeping paths short –Minimizing state information kept by nodes
10
Main design issues Mapping keys to nodes in a balanced way –Use a hash function Forwarding a lookup for a key to appropriate node –Find at each step a node closer to the node holding the key Building routing tables –Each node should have a successor
11
CAN Uses a d-dimensional key space –Partitioned into hyper-rectangles ("zones") –Each node manages a zone and is responsible for all keys in that zone
12
Neighbors Each node keeps track of addresses of all its neighbors –Routing table Neighbors are defined as nodes sharing a (d-1) dimensional hyper-plane –Contacts with fewer dimensions in common do not count
13
A two-dimensional example (I)
14
A two-dimensional example (II) Unit square with corners (0, 0), (1, 0), (1, 1), (0, 1); in reality the state space wraps. Zones, written as (x_min, y_min; x_max, y_max): (0, 0; 0.5, 0.5), (0, 0.5; 0.5, 1), (0.5, 0.5; 1, 1), (0.5, 0.25; 0.75, 0.5), (0.5, 0; 0.75, 0.25), (0.75, 0; 1, 0.5)
15
A path from (0.25, 0.3) to (0.8, 0.8) Unit square with corners (0, 0), (1, 0), (1, 1), (0, 1); in reality the state space wraps. Zones, written as (x_min, y_min; x_max, y_max): (0, 0; 0.5, 0.5), (0, 0.5; 0.5, 1), (0.5, 0.5; 1, 1), (0.5, 0.25; 0.75, 0.5), (0.5, 0; 0.75, 0.25), (0.75, 0; 1, 0.5)
16
Lookup Routing tries to approximate the straight path between current zone and zone holding the key Various optimizations attempt to reduce lookup latency
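A minimal sketch of greedy CAN-style routing over the six zones of the two-dimensional example. It assumes no wrap-around, treats two zones as neighbors when they share an edge, and forwards to the neighbor whose zone lies closest to the target point. The function names and the distance rule are illustrative assumptions, not the exact procedure used by CAN.

```python
from math import hypot

# Zones from the two-dimensional example, as (x_min, y_min, x_max, y_max).
ZONES = [
    (0.0, 0.0, 0.5, 0.5),
    (0.0, 0.5, 0.5, 1.0),
    (0.5, 0.5, 1.0, 1.0),
    (0.5, 0.25, 0.75, 0.5),
    (0.5, 0.0, 0.75, 0.25),
    (0.75, 0.0, 1.0, 0.5),
]

def contains(zone, point):
    """True if the point falls inside the zone (half-open on the upper sides)."""
    x0, y0, x1, y1 = zone
    return x0 <= point[0] < x1 and y0 <= point[1] < y1

def are_neighbors(a, b):
    """Zones are neighbors if they abut in one dimension and overlap in the other."""
    x_overlap = min(a[2], b[2]) - max(a[0], b[0])
    y_overlap = min(a[3], b[3]) - max(a[1], b[1])
    return (x_overlap == 0 and y_overlap > 0) or (y_overlap == 0 and x_overlap > 0)

def distance(zone, point):
    """Euclidean distance from the point to the nearest point of the zone."""
    x0, y0, x1, y1 = zone
    dx = max(x0 - point[0], 0.0, point[0] - x1)
    dy = max(y0 - point[1], 0.0, point[1] - y1)
    return hypot(dx, dy)

def route(start, target):
    """Greedy path of zones from the zone holding start to the zone holding target."""
    current = next(z for z in ZONES if contains(z, start))
    path = [current]
    while not contains(current, target):
        neighbors = [z for z in ZONES if are_neighbors(current, z)]
        best = min(neighbors, key=lambda z: distance(z, target))
        if distance(best, target) >= distance(current, target):
            raise RuntimeError("greedy routing made no progress")
        current = best
        path.append(current)
    return path

for zone in route((0.25, 0.3), (0.8, 0.8)):
    print(zone)
```

Under these assumptions the request leaves zone (0, 0; 0.5, 0.5), crosses (0, 0.5; 0.5, 1) and ends in (0.5, 0.5; 1, 1). The two zones that touch only at the corner (0.5, 0.5) are not neighbors, which is why a direct hop is not taken.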
17
Dynamic behavior When a node joins the network –It picks a random point in the space –Finds the node managing the zone that contains the point –Splits that node's current zone with it When a node departs –Zones are merged (a more complex process)
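The join steps can be sketched as follows for the two-dimensional case. This is an illustration only: the rule of halving the zone along its longer side and the helper names are assumptions made here, and neighbor-table updates and departures are left out.

```python
import random

def contains(zone, point):
    """True if the point falls inside the zone (x_min, y_min, x_max, y_max)."""
    x0, y0, x1, y1 = zone
    return x0 <= point[0] < x1 and y0 <= point[1] < y1

def split_zone(zone):
    """Halve a zone along its longer side (assumed splitting rule)."""
    x0, y0, x1, y1 = zone
    if (x1 - x0) >= (y1 - y0):
        mid = (x0 + x1) / 2
        return (x0, y0, mid, y1), (mid, y0, x1, y1)
    mid = (y0 + y1) / 2
    return (x0, y0, x1, mid), (x0, mid, x1, y1)

def join(zones, point):
    """Return the zone list after a new node joins at the given point."""
    owner = next(z for z in zones if contains(z, point))
    first_half, second_half = split_zone(owner)
    return [z for z in zones if z != owner] + [first_half, second_half]

# Start with one node owning the whole unit square, then add three nodes
# that each pick a random point, as on the slide.
zones = [(0.0, 0.0, 1.0, 1.0)]
for _ in range(3):
    zones = join(zones, (random.random(), random.random()))
print(zones)
```

Each join adds exactly one zone, so after three joins four nodes share the square.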
18
Fault-tolerance When a node fails, the neighbor with the smallest zone takes over –Multiple failures may cause too many nodes to handle multiple zones
19
CHORD Assigns ID's to keys and nodes in the same address space ID's are organized in a ring –ID 0 follows the highest ID Each node is responsible for all keys that immediately precede it in the key space
20
Example Ring with nodes N4, N12, N20, N24 and keys K1, K6, K10, K15
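A small sketch of the assignment rule, using the node and key IDs from the example ring: each key goes to the first node whose ID is equal to or follows the key, wrapping around past the highest ID. The function name `successor` is chosen here for illustration.

```python
from bisect import bisect_left

NODES = sorted([4, 12, 20, 24])   # node IDs from the example ring
KEYS = [1, 6, 10, 15]             # key IDs from the example ring

def successor(key):
    """First node whose ID is >= key, wrapping around to the lowest ID."""
    i = bisect_left(NODES, key)
    return NODES[i % len(NODES)]

for key in KEYS:
    print(f"K{key} -> N{successor(key)}")
# K1 -> N4, K6 -> N12, K10 -> N12, K15 -> N20
```

A key larger than every node ID, such as 25, wraps around and lands on N4.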
21
Finger table Each node keeps a table containing IP addresses of nodes – Halfway around in the key space –Quarter-of-the-way around – … Table has log N entries –Allows O(log N) searches
22
Partial example Ring with nodes N4, N12, N20, N24
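The finger table and the lookup it enables can be sketched for the four nodes of the example. The slides do not give the size of the ID space, so a 5-bit space (32 IDs) is assumed here; finger i of node n points to the successor of n + 2^i, which is the usual Chord definition. All function names are illustrative.

```python
from bisect import bisect_left

M = 5                      # assumed number of ID bits (not stated on the slides)
RING = 2 ** M
NODES = sorted([4, 12, 20, 24])

def successor(key):
    """First node at or after key on the ring."""
    i = bisect_left(NODES, key % RING)
    return NODES[i % len(NODES)]

def finger_table(n):
    """Finger i points to the successor of n + 2**i."""
    return [successor(n + 2 ** i) for i in range(M)]

FINGERS = {n: finger_table(n) for n in NODES}

def in_open(x, a, b):
    """True if x lies in the ring interval (a, b)."""
    return a < x < b if a < b else (x > a or x < b)

def in_half_open(x, a, b):
    """True if x lies in the ring interval (a, b]."""
    return a < x <= b if a < b else (x > a or x <= b)

def closest_preceding_finger(n, key):
    """Farthest finger of n that still precedes key."""
    for f in reversed(FINGERS[n]):
        if in_open(f, n, key):
            return f
    return n

def lookup(start, key):
    """Return (node responsible for key, nodes visited on the way)."""
    n, hops = start, [start]
    while not in_half_open(key, n, FINGERS[n][0]):
        nxt = closest_preceding_finger(n, key)
        if nxt == n:       # defensive guard; not reached on a consistent ring
            break
        n = nxt
        hops.append(n)
    return FINGERS[n][0], hops

print(FINGERS[4])        # [12, 12, 12, 12, 20]
print(lookup(24, 15))    # (20, [24, 12])
```

With only four nodes most fingers coincide; the logarithmic number of distinct hops becomes visible only on a well-populated ring.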
23
Fault-tolerance Each node has a successor list –Contains IP addresses of next r successors Guarantees routing progress as long as at least one of the r successors is up
24
Dynamic behavior New node n learns its place in the Chord ring by asking any extant node to do a lookup( n ) Must also –Update successor list of its predecessor –Create its own successor list
25
PASTRY Scalable, self-organizing, routing and object location infrastructure Each node has a node ID –IDs are uniformly distributed in the ID space Includes a proximity metric to measure distances between pairs of IDs
26
Pastry Nodes Each node maintains three sets of nodes –Leaf set: closest nodes in terms of node IDs; same function as Chord's successor list –Nodes in routing table: prefix routing (big idea) –Neighborhood set: closest nodes in terms of proximity metric
27
Dynamic behavior Pastry is self-organizing –Nodes come and go –Includes a seed discovery protocol
28
Prefix Routing At each step, a node forwards an incoming request to a node whose node ID has the largest common prefix with the destination ID. Destination ID: 1230. Node ID: 1023. Next hop: 12--
29
Routing table for node 1023
No common prefix: 0221 2230 3120
One common digit: 1130 1233 1302
Two common digits: 1003 1013 1032
Three common digits: 1020 1022
30
Routing request for node 1230
No common prefix: 0221 2230 3120
One common digit: 1130 1233 1302
Two common digits: 1003 1013 1032
Three common digits: 1020 1022
Request is always sent to a node having at least one more common prefix digit. Here it's node 1233.
31
At node 1233
No common prefix: 0221 2230 3120
One common digit: 1030 1130 1302
Two common digits: 1201 1211 1220
Three common digits: 1230 1232
Node with at least one more common prefix digit is node 1230
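The two routing steps can be replayed with a short sketch that uses the routing tables of nodes 1023 and 1233 from the slides. It is a simplification: it scans every known node and picks the one sharing the longest prefix with the destination, whereas an actual Pastry node indexes its table by prefix length and next digit. The names `TABLES`, `next_hop` and `route` are illustrative.

```python
# Routing table entries taken from the slides for nodes 1023 and 1233.
TABLES = {
    "1023": ["0221", "2230", "3120", "1130", "1233", "1302",
             "1003", "1013", "1032", "1020", "1022"],
    "1233": ["0221", "2230", "3120", "1030", "1130", "1302",
             "1201", "1211", "1220", "1230", "1232"],
}

def shared_prefix_len(a, b):
    """Number of leading digits that a and b have in common."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count

def next_hop(node, dest):
    """Known node sharing a strictly longer prefix with dest, or None."""
    candidates = TABLES.get(node, [])
    if not candidates:
        return None
    best = max(candidates, key=lambda c: shared_prefix_len(c, dest))
    if shared_prefix_len(best, dest) <= shared_prefix_len(node, dest):
        return None
    return best

def route(start, dest):
    """Follow next hops until the destination is reached or no progress is possible."""
    path = [start]
    while path[-1] != dest:
        hop = next_hop(path[-1], dest)
        if hop is None:
            break
        path.append(hop)
    return path

print(route("1023", "1230"))   # ['1023', '1233', '1230']
```

Every hop lengthens the shared prefix by at least one digit, so the number of hops is bounded by the number of digits in an ID.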
32
TAPESTRY Interprets keys as sequences of digits Incremental prefix routing –Similar to Pastry Main contribution is emphasis on proximity –In the actual world Reduces query latency Makes system much more complex
33
CONCLUSIONS Major issues include – Operational costs: searches are all O(log n ) ; storage costs vary – Fault-tolerance and concurrent changes: only Chord and Tapestry can handle them – Proximity routing: Pastry, CAN and Tapestry have heuristics – Malicious nodes: Pastry checks node ID's
34
Summary of costs
                 CAN                     Chord / Pastry / Tapestry
Node state (1)   d                       log N
Lookup (2)       d N^(1/d)               log N
Join (2)         d N^(1/d) + d log N     log^2 N
(1) number of other nodes known by a given
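To get a feel for the lookup row, the two expressions can be evaluated for a sample network size. The sketch below drops all constant factors, uses base-2 logarithms, and picks N = 2^20 nodes purely as an illustration; none of these choices come from the slides.

```python
from math import log2

def can_lookup_cost(n, d):
    """Order-of-magnitude CAN lookup cost d * N^(1/d), constants ignored."""
    return d * n ** (1.0 / d)

def log_lookup_cost(n):
    """Order-of-magnitude lookup cost log N for Chord, Pastry and Tapestry."""
    return log2(n)

N = 2 ** 20  # illustrative network size
for d in (2, 4, 10, 20):
    print(f"CAN, d={d:2d}: about {can_lookup_cost(N, d):7.1f}")
print(f"log N: about {log_lookup_cost(N):.1f}")
```

With few dimensions the CAN term grows much faster than log N; as d approaches log N it shrinks to the same order, at the price of more neighbor state per node.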