Pastry and Squirrel
Presented by Eirik T. Laberg, Håvard Semundseth, Orri G. Pálsson
What is the Pastry system?
An overlay network that handles:
  Routing between nodes
  Object localization
Each node is assigned a unique 128-bit nodeId, derived from a SHA-1 hash of either the node's public key or its IP address.
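A minimal sketch of how such a nodeId could be derived. SHA-1 produces 160 bits, so this sketch assumes the leading 128 bits are kept; that truncation, and the names `node_id` and `as_hex`, are illustrative assumptions rather than part of the presentation.

```python
import hashlib

ID_BITS = 128               # nodeId width given on the slide
HEX_DIGITS = ID_BITS // 4   # 32 hexadecimal digits


def node_id(identity: bytes) -> int:
    """Map a public key or IP address (as bytes) to a 128-bit nodeId."""
    digest = hashlib.sha1(identity).digest()  # SHA-1 yields 20 bytes (160 bits)
    # Assumption: keep the leading 16 bytes to fit the 128-bit id space.
    return int.from_bytes(digest[: ID_BITS // 8], "big")


def as_hex(value: int) -> str:
    """Render a nodeId as a fixed-width string of hexadecimal digits."""
    return format(value, f"0{HEX_DIGITS}x")
```

Because the hash is deterministic, the same public key or IP address always maps to the same nodeId, and different inputs spread roughly uniformly over the id space.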
Pastry Node
Leaf set (L): the |L| nodes whose nodeIds are numerically closest to the present node, half with larger and half with smaller nodeIds than the present node. The leaf set is mainly used when routing messages.
Routing table (R): consists of a number of rows, where row i contains nodes that share the first i digits of their nodeId with the local node.
Pastry Node (2)
Neighborhood set (M): contains the nodeIds and IP addresses of the |M| nodes that are closest to the local node according to the proximity metric. The neighborhood set is used as a starting point when constructing the routing table and for maintaining locality properties.
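The leaf set and routing table layout can be sketched as follows. This is an illustration only: nodeIds are written as hexadecimal strings, the id space is treated as a plain line rather than a ring, and the helper names (`shared_prefix_len`, `routing_slot`, `leaf_set`) are hypothetical.

```python
from typing import List, Tuple


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading digits two nodeIds have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def routing_slot(local: str, other: str) -> Tuple[int, int]:
    """Row and column where `other` belongs in `local`'s routing table.

    The row is the shared prefix length; the column is the first digit
    in which `other` differs from `local`.
    """
    row = shared_prefix_len(local, other)
    if row == len(local):
        raise ValueError("a node does not store itself in its routing table")
    return row, int(other[row], 16)


def leaf_set(local: str, known: List[str], size: int = 4) -> List[str]:
    """Pick size/2 numerically smaller and size/2 larger neighbours.

    Simplification: no wrap-around at the ends of the id space.
    """
    half = size // 2
    me = int(local, 16)
    ordered = sorted(known, key=lambda n: int(n, 16))
    smaller = [n for n in ordered if int(n, 16) < me]
    larger = [n for n in ordered if int(n, 16) > me]
    return smaller[-half:] + larger[:half]
```

For example, with local nodeId `a1b2`, the node `a1c0` shares two digits and differs in the third, so it belongs in row 2, column 12 (hex digit `c`).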
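Routing
A message with key D arrives at the node with nodeId A.
The node checks whether the key falls within the range of nodeIds in its leaf set.
  If yes, forward the message to the destination node: the leaf-set node whose nodeId is numerically closest to the key.
  If no, use the routing table: forward to a node whose nodeId shares a prefix with the key that is at least one digit longer than the prefix shared with the present node.
  If that table entry is empty or the node is unreachable, forward to a node that shares a prefix with the key at least as long as the present node's and is numerically closer to the key in the nodeId space.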
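The three routing cases can be sketched in a single function. This is a simplified illustration, not the reference implementation: nodeIds are hex strings, numeric distance ignores wrap-around, the routing table is a dictionary keyed by (row, column), and in the fallback case the sketch picks the numerically closest qualifying node, although any qualifying node would do. All names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def shared_prefix_len(a: str, b: str) -> int:
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def next_hop(local: str, key: str, leaf_set: List[str],
             routing_table: Dict[Tuple[int, int], str],
             alive: Callable[[str], bool] = lambda node: True) -> str:
    """Return the node the message should be forwarded to (or `local`)."""
    k = int(key, 16)

    def dist(node: str) -> int:
        return abs(int(node, 16) - k)

    # Case 1: the key falls within the range covered by the leaf set.
    members = [n for n in leaf_set if alive(n)] + [local]
    values = [int(n, 16) for n in members]
    if leaf_set and min(values) <= k <= max(values):
        return min(members, key=dist)

    # Case 2: routing table entry sharing at least one more digit.
    row = shared_prefix_len(local, key)
    if row == len(key):
        return local  # the key equals the local nodeId
    entry = routing_table.get((row, int(key[row], 16)))
    if entry is not None and alive(entry):
        return entry

    # Case 3: entry empty or unreachable. Use any known node with an
    # equally long prefix that is numerically closer to the key.
    known = sorted(set(leaf_set) | set(routing_table.values()))
    candidates = [n for n in known
                  if alive(n)
                  and shared_prefix_len(n, key) >= row
                  and dist(n) < dist(local)]
    return min(candidates, key=dist) if candidates else local
```

Returning `local` means the present node is the best known destination and delivers the message itself.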
Routing (2) Example:
Node Arrival (1/3)
The new node has nodeId X, and its nearby Pastry node is A.
It is assumed that the new node X initially knows about the nearby Pastry node A.
Node X asks A to route a “join” message with a key equal to X.
Pastry routes the join message to the existing node Z whose nodeId is numerically closest to X.
Nodes A, Z and all nodes encountered on the path send their state tables to X.
Node Arrival (2/3)
Node Arrival (3/3)
The new node X initializes its own state tables:
  The neighborhood set is initialized with the neighborhood set of A, which is closest to X in the proximity metric.
  Since Z is numerically closest to X, X’s leaf set is initialized with Z’s leaf set.
  Row 0 (R0) of A’s routing table is used to initialize row 0 of X.
  Row 1 (R1) of the routing table of B, the next node on the path, is used to initialize row 1 of X.
  …
Node X transmits a copy of its resulting state to all nodes in its neighborhood set (M), leaf set (L) and routing table (R).
The nodes in the Pastry network update their own state based on the information received.
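A rough sketch of this initialization, assuming each node on the join path (A, B, …, Z, in route order) reports its state as a dictionary with the hypothetical keys `neighborhood`, `leaf_set` and `rows`, and that the i-th node on the path has at least i + 1 routing table rows.

```python
from typing import Dict, List, Set


def init_state(path_states: List[Dict]) -> Dict:
    """Build the new node's state from the states of A, B, ..., Z."""
    first, last = path_states[0], path_states[-1]
    return {
        # A is close in the proximity metric, so reuse its neighborhood set.
        "neighborhood": list(first["neighborhood"]),
        # Z is numerically closest, so reuse its leaf set.
        "leaf_set": list(last["leaf_set"]),
        # Row i is copied from the i-th node on the path.
        "rows": [list(state["rows"][i]) for i, state in enumerate(path_states)],
    }


def recipients(state: Dict) -> Set[str]:
    """All nodes that should receive a copy of the new node's state."""
    nodes = set(state["neighborhood"]) | set(state["leaf_set"])
    for row in state["rows"]:
        nodes.update(row)
    return nodes
```

Taking row i from the i-th node works because each hop on the path shares at least one more digit with X than the previous one.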
Locality
What do we mean by locality? The ability to exploit “local” resources over “global” ones whenever possible.
The route chosen for a message is likely to be “good” with respect to the proximity metric.
How can we maintain this property when a new node X arrives?
We discuss the locality property with regard to:
  Locality in the routing table
  Route locality
  Locating the nearest among k nodes
Locality in the routing table
Goal: every routing table entry should refer to a node that is near the present node according to the proximity metric, among all live nodes with the appropriate prefix for that entry. This property must be maintained when a new node X arrives.
Stage one: require that node A is near X and that the entries in A’s R0 are close to A according to the proximity metric. We also assume that the entries in node B’s R1 are a reasonable choice for R1 of X. When the new node X initializes its state in this fashion, X’s routing table (R) and neighborhood set (M) approximate the desired locality property.
Problem: the quality of this approximation must be improved to avoid cascading errors that could lead to poor route locality.
Solution: a second stage. Node X requests the state from each of the nodes in its routing table and neighborhood set, and updates its own entries wherever a closer node is found. The neighborhood set contributes valuable information here.
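The second stage can be sketched as a simple "keep the closer node" update. The representation is an assumption: the routing table is a dictionary keyed by (row, column), `offered` is the list of (slot, node) pairs learned from other nodes' state, and `proximity` is a caller-supplied function returning the distance to a node under the proximity metric.

```python
from typing import Callable, Dict, Iterable, Tuple

Slot = Tuple[int, int]


def refine(table: Dict[Slot, str],
           offered: Iterable[Tuple[Slot, str]],
           proximity: Callable[[str], float]) -> Dict[Slot, str]:
    """Return a copy of `table` where each slot holds the closest node seen."""
    refined = dict(table)
    for slot, node in offered:
        current = refined.get(slot)
        # Fill empty slots, and replace entries when a closer node is offered.
        if current is None or proximity(node) < proximity(current):
            refined[slot] = node
    return refined
```

Each slot only ever changes to a node that is closer than the one it held, so the update never makes locality worse.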
Route locality
Each routing step moves the message closer to the destination in the nodeId space, while traveling the least possible distance in the proximity space.
Given that:
  A message routed from A to B at distance d cannot subsequently be routed to a node at a distance of less than d from A.
  Only “local” information is used: Pastry minimizes the distance of the next routing step with no sense of global direction.
This does not guarantee that the shortest path from source to destination is chosen.
Locating the nearest among k nodes
Goal: peer-to-peer applications may want to use Pastry to replicate information on the k Pastry nodes that are numerically closest to a given key in the nodeId space. A message routed from a client application (CA) should first reach a node near the CA among the k nodes numerically closest to the key.
Problem: since Pastry routes primarily on nodeId prefixes, it may miss nearby nodes with a different prefix than the key.
Solution: Pastry uses a heuristic to overcome the prefix mismatch. It detects when a message approaches the set of k nodes and switches to routing based on the numerically nearest address.
Heuristic: based on estimating the density of nodeIds.
Node departure
A node is considered failed when its immediate neighbors in the nodeId space can no longer communicate with it.
Leaf set entry: the failed node’s neighbor in the nodeId space contacts the live node with the largest index in L on the side of the failed node and asks for its leaf set.
Routing table entry: the node contacts another node in the same row and asks for that node’s entry for the failed position. If no node in the same row can supply a suitable entry, it asks a node in the row below.
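A sketch of the leaf set repair for the "larger" side, under simplifying assumptions: nodeIds are plain integers without wrap-around, and `fetch_leaf_set` is a hypothetical callback that stands in for asking a remote node for its leaf set.

```python
from typing import Callable, List


def repair_larger_side(local: int, larger: List[int], failed: int,
                       fetch_leaf_set: Callable[[int], List[int]],
                       half: int) -> List[int]:
    """Replace a failed entry on the larger side of the leaf set."""
    live = sorted(n for n in larger if n != failed)
    if not live:
        return live
    donor = live[-1]  # live node with the largest index on the failed side
    learned = {n for n in fetch_leaf_set(donor) if n > local and n != failed}
    # Keep the `half` nodes closest to the local node on this side.
    return sorted(set(live) | learned)[:half]
```

The donor's leaf set overlaps with the local one but extends further out, which is why it can supply a replacement for the failed entry.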
Arbitrary node failures
A node continues to be responsive, but behaves incorrectly or even maliciously.
Repeated queries fail each time, since they normally take the same route.
Solution: routing can be randomized. The choice among multiple nodes that satisfy the routing criteria should be made randomly.
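A minimal sketch of the randomized choice, assuming the caller has already collected the nodes that satisfy the routing criteria. The function name and the uniform choice are assumptions; the slide only states that the choice should be random.

```python
import random
from typing import List, Optional


def pick_next_hop(candidates: List[str],
                  rng: Optional[random.Random] = None) -> str:
    """Choose uniformly among the nodes that satisfy the routing criteria."""
    if not candidates:
        raise ValueError("no node satisfies the routing criteria")
    rng = rng or random.Random()
    return rng.choice(candidates)
```

With a random choice, a retried query is likely to take a different route and so has a chance of avoiding the misbehaving node.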
Some experimental results
Quad-processor Compaq AlphaServer ES MHz Alpha CPUs
6 GBytes of main memory
Tru64 UNIX
Implemented in Java
Pastry nodes were configured to run in a single Java VM
Routing performance
Maintaining the network (1/2)
Maintaining the network (2/2)
Shows the quality of the routing tables with respect to the locality property, and how the information exchanged during the join operation affects that quality.
Optimal: the entry is the best one (closest according to the proximity metric).
Sub-optimal: the entry was not the closest or was missing.
SL: considers only the appropriate row from each node along the route.
WT: fetches the entire state of each node along the path (omitting the second stage of update).
WTF: WT + the second stage of update.
Result: Pastry’s method of node integration (“WTF”) is effective in initializing the routing tables with good locality. Less information exchange during the join operation leads to lower quality with respect to locality.
Conclusion
Pastry is a generic peer-to-peer content location and routing system.
Scales well.
Used for applications like global file sharing, file storage etc.
Takes locality into account when routing messages.
What is Squirrel?
Squirrel is an alternative to web caches deployed on dedicated machines at the boundaries of corporate LANs.
Client desktop machines cooperate in a peer-to-peer fashion inside the LAN to provide the functionality of a proxy.
Traditional approach
One machine that has to be capable of handling peak traffic loads.
Expensive hardware and administrative costs.
Growth in the number of users requires hardware upgrades.
Single point of failure.
Web caching
If an object is found in the cache server, it is tested for freshness (TTL).
If it is fresh, the object is returned; otherwise a conditional GET (cGET) request is generated by the browser.
Two types of cGET:
  If-Modified-Since (uses a timestamp)
  If-None-Match (uses an ETag = hash of the web content)
The response to a cGET contains either the content or a not-modified message.
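The freshness test and the two cGET variants can be sketched as below. `If-Modified-Since`, `If-None-Match` and the 304 (Not Modified) status are standard HTTP; the function names and the cache-entry fields are hypothetical.

```python
from typing import Dict, Optional


def is_fresh(fetch_time: float, ttl: float, now: float) -> bool:
    """An object is fresh while its age is below its time-to-live."""
    return now - fetch_time < ttl


def conditional_headers(last_modified: Optional[str],
                        etag: Optional[str]) -> Dict[str, str]:
    """Build the request headers that turn a GET into a conditional GET."""
    headers: Dict[str, str] = {}
    if last_modified:
        headers["If-Modified-Since"] = last_modified  # timestamp variant
    if etag:
        headers["If-None-Match"] = etag               # ETag variant
    return headers


def resolve(status: int, cached_body: bytes, response_body: bytes) -> bytes:
    """304 means the cached copy is still valid; otherwise use the new body."""
    return cached_body if status == 304 else response_body
```

A stale object is therefore not discarded immediately: if the origin server answers 304, the cached copy is served without transferring the content again.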
Pastry
Squirrel uses Pastry to locate web objects by mapping the URL to a node.
It hashes the URL and uses the hash as a key.
If the web browser does not find the requested object in its cache, Squirrel tries to locate a copy on another node.
Models
Home store model
Directory model
Home store model
The home node of an object is the node whose nodeId is numerically closest to the object's objectId.
All external requests are routed through the home node.
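A sketch of how a URL could be mapped to its home node. It assumes a 128-bit circular id space and an objectId taken from the leading 128 bits of a SHA-1 hash of the URL; the function names and the example node list are illustrative. In a real deployment Pastry routing finds the home node, so no node needs the global list used here.

```python
import hashlib
from typing import Iterable

ID_SPACE = 2 ** 128  # assumed size of the circular id space


def object_id(url: str) -> int:
    """Hash a URL to an objectId (assumption: leading 128 bits of SHA-1)."""
    return int.from_bytes(hashlib.sha1(url.encode("utf-8")).digest()[:16], "big")


def ring_distance(a: int, b: int) -> int:
    """Numerical distance between two ids on the circular id space."""
    d = abs(a - b) % ID_SPACE
    return min(d, ID_SPACE - d)


def home_node(obj: int, node_ids: Iterable[int]) -> int:
    """The home node is the node whose nodeId is numerically closest."""
    return min(sorted(node_ids), key=lambda n: ring_distance(n, obj))
```

Since every client hashes the same URL to the same objectId, all requests for that URL inside the LAN converge on the same home node.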
Scenario Home store
Directory model
The home node holds a small directory of pointers to nodes (called delegates) that have recently accessed the object.
Additionally, it stores metadata about the object such as ETag, fetch time, last-modified time, TTL etc.
Requests are forwarded to a randomly chosen delegate that is known to have the object.
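The directory kept by a home node might look like the sketch below. The class name, the field names and the limit of four delegates are assumptions made for illustration; the slide only says that the directory is small.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DirectoryEntry:
    """Per-object state kept at the home node in the directory model."""
    etag: Optional[str] = None
    fetch_time: float = 0.0
    last_modified: Optional[str] = None
    ttl: float = 0.0
    delegates: List[str] = field(default_factory=list)

    def add_delegate(self, node: str, max_delegates: int = 4) -> None:
        """Record a node that just fetched the object, most recent last."""
        if node in self.delegates:
            self.delegates.remove(node)
        self.delegates.append(node)
        # Keep the directory small: drop the oldest pointers.
        del self.delegates[:-max_delegates]

    def pick_delegate(self, rng: Optional[random.Random] = None) -> Optional[str]:
        """Forward to a randomly chosen delegate, or None if there is none."""
        if not self.delegates:
            return None
        return (rng or random.Random()).choice(self.delegates)
```

Choosing the delegate at random spreads requests for a popular object over all nodes that hold a copy.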
Directory model scenario 1
Directory model scenario 2
Directory model scenario 3
Directory model scenario 4
Arrival, failure and departure
Arrival: the new node automatically becomes a home node. Its two neighboring nodes transfer the objects and directories whose objectIds are now numerically closest to the newly joined node.
Failure: future requests are routed to the node that has become numerically closest to the objectId.
Departure: nodes that are able to announce their intention to leave the system can transfer their objects and directories to their neighbors.
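The hand-over on arrival can be sketched from the point of view of one neighbor. The sketch assumes integer ids on a 128-bit ring and a dictionary from objectId to stored payload; all names are hypothetical.

```python
from typing import Dict, Tuple

ID_SPACE = 2 ** 128  # assumed size of the circular id space


def ring_distance(a: int, b: int) -> int:
    d = abs(a - b) % ID_SPACE
    return min(d, ID_SPACE - d)


def split_on_arrival(new_node: int, neighbor: int,
                     stored: Dict[int, bytes]) -> Tuple[Dict[int, bytes], Dict[int, bytes]]:
    """Split a neighbor's objects into (kept, moved to the new node)."""
    kept: Dict[int, bytes] = {}
    moved: Dict[int, bytes] = {}
    for oid, payload in stored.items():
        # An object moves only if the new node is now strictly closer.
        closer_to_new = ring_distance(oid, new_node) < ring_distance(oid, neighbor)
        (moved if closer_to_new else kept)[oid] = payload
    return kept, moved
```

Only the two numerical neighbors of the new node can hold objects that it should take over, so no other node is involved in the transfer.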
External bandwidth consumption
A donation of 100 MB of disk space from each client to Squirrel lowers the external bandwidth consumption to the level of a dedicated cache.
Latency
Latency depends on the number of LAN hops.
In traditional proxy caching, LAN hops = 2.
In Squirrel, LAN hops = 4-7.
LAN communication is fast, so user-perceived latency is minimal.
Load on each node (directory model)
Location      Objects/sec   Objects/min
Redmond*      48            388
Cambridge**   55            125
* 36,782 clients   ** 105 clients
Load on each node (home store model)
Location      Objects/sec   Objects/min
Redmond*      8             65
Cambridge**   9             35
* 36,782 clients   ** 105 clients
Average load in any given minute is 0.31 objects/min (for Redmond) for both models.
Squirrel performs web caching at low cost.
Fault tolerance
It is possible to lose the connection to the Internet due to a router failure.
An internal link or router failure partitions the network; Squirrel would partition itself into two separate systems.
Individual nodes can fail. Most nodes leave voluntarily.