SCRIBE: A large-scale and decentralized application-level multicast infrastructure
Overview: Pastry, a peer-to-peer routing substrate; PAST, a distributed file system layered on top of Pastry; SCRIBE, a decentralized publish/subscribe system.
Pastry – Quick Review: Chord-like routing; consistent hashing; prefix routing; leaf set.
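The routing ideas on this slide can be sketched in a few lines of Python. This is a simplified, hypothetical model, not Pastry's implementation: ids are assumed to be 128-bit values written as 32 hex digits (digit base 2^b with b = 4), SHA-1 stands in for the consistent hash, and a single `known` list replaces Pastry's separate routing table and leaf set.

```python
import hashlib

ID_DIGITS = 32  # assumption: 128-bit ids written as 32 hex digits (b = 4)
ID_SPACE = 16 ** ID_DIGITS


def make_id(name: str) -> str:
    """Consistent hashing: map a name to an id (SHA-1 truncated to 128 bits)."""
    return hashlib.sha1(name.encode("utf-8")).hexdigest()[:ID_DIGITS]


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading hex digits that a and b have in common."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


def distance(a: str, b: str) -> int:
    """Numeric distance between two ids on the circular id space."""
    diff = abs(int(a, 16) - int(b, 16))
    return min(diff, ID_SPACE - diff)


def next_hop(local: str, key: str, known: list[str]) -> str:
    """Pick the next hop for key, or return local if this node should deliver.

    `known` is a hypothetical stand-in for the routing table plus leaf set.
    """
    p = shared_prefix_len(local, key)
    # Prefix routing: prefer a node whose id shares a longer prefix with the key.
    longer = [n for n in known if shared_prefix_len(n, key) > p]
    if longer:
        return min(longer, key=lambda n: distance(n, key))
    # Fallback: a node with an equally long prefix that is numerically closer.
    closer = [n for n in known
              if shared_prefix_len(n, key) >= p
              and distance(n, key) < distance(local, key)]
    if closer:
        return min(closer, key=lambda n: distance(n, key))
    return local  # no better node is known: deliver here


# Usage: a key starting with "a3" is forwarded toward a node sharing that prefix.
local = "1" + "0" * 31
key = "a3" + "0" * 30
hop = next_hop(local, key, ["a0" + "0" * 30, "b" + "0" * 31, "a3" + "f" * 30])
```

Each hop either lengthens the shared prefix or strictly reduces the numeric distance to the key, which is why the loop of hops terminates.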
Pastry – Locality Properties: Short routes: the total distance traveled by a message averages 1.59 to 2.2 times the actual distance between source and destination. Route convergence: the distance two messages sent to the same key travel before their routes converge is approximately equal to the distance between their two source nodes.
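Pastry API: nodeID = pastryInit(Credentials) causes the local node to join the Pastry network; route(msg, key) routes msg to the node whose nodeID is numerically closest to key; send(msg, IP-addr) sends msg directly to the node with the given IP address. Applications must export: deliver(msg, key), called on the node numerically closest to key; forward(msg, key, nextID), called before msg is forwarded to nextID; newLeafs(leafSet), called when the leaf set changes.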
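A minimal sketch of how an application plugs into this API, written as a Python abstract base class. The names are adapted from the slide; `ToyPastry` is a hypothetical one-hop stand-in with no routing tables or credentials, and modelling forward's ability to change or cancel nextID as a return value (None stops the message) is an assumption of this sketch.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional

ID_SPACE = 2 ** 128  # assumed id width


class PastryApplication(ABC):
    """Upcalls an application exports (names adapted from the slide)."""

    @abstractmethod
    def deliver(self, msg: Any, key: int) -> None:
        """Called on the node whose nodeID is numerically closest to key."""

    @abstractmethod
    def forward(self, msg: Any, key: int, next_id: int) -> Optional[int]:
        """Called before msg is forwarded; returns the next hop to use,
        or None to stop the message at this node."""

    def new_leafs(self, leaf_set: List[int]) -> None:
        """Called when the leaf set changes; ignored by default."""


class ToyPastry:
    """Hypothetical single-process overlay that delivers in one hop."""

    def __init__(self) -> None:
        self.apps: Dict[int, PastryApplication] = {}

    def pastry_init(self, node_id: int, app: PastryApplication) -> int:
        self.apps[node_id] = app  # "join" the network
        return node_id

    def route(self, msg: Any, key: int) -> None:
        def dist(n: int) -> int:
            d = abs(n - key) % ID_SPACE
            return min(d, ID_SPACE - d)
        target = min(self.apps, key=dist)  # numerically closest live node
        self.apps[target].deliver(msg, key)


class Recorder(PastryApplication):
    """Example application that records what it receives."""

    def __init__(self) -> None:
        self.received: List[Any] = []

    def deliver(self, msg: Any, key: int) -> None:
        self.received.append((msg, key))

    def forward(self, msg: Any, key: int, next_id: int) -> Optional[int]:
        return next_id  # never intercept


net = ToyPastry()
a, b = Recorder(), Recorder()
net.pastry_init(10, a)
net.pastry_init(200, b)
net.route("hello", 190)  # 200 is numerically closest to 190
```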
SCRIBE: Built on top of Pastry; supports a large number of groups; handles a high rate of membership turnover. SCRIBE nodes can create groups, join groups, and multicast messages to groups.
SCRIBE API: create(credentials, groupID); join(credentials, groupID); leave(credentials, groupID); multicast(credentials, groupID, message).
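SCRIBE – Creating a Group: SCRIBE uses Pastry's route(msg, key) as route(CREATE, groupID), where groupID is the hash of the group's textual name concatenated with the creator's name. The message is delivered to the node whose nodeID is numerically closest to groupID, which becomes the rendez-vous point for the group (the root of the group's multicast tree). That node adds the group to its local list of groups and stores the credentials. Alternatively, the creator can use itself as the root, a good choice if the creator sends to the group often.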
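The groupID computation and the choice of rendez-vous point can be sketched as follows. SHA-1 truncated to a 128-bit id is an assumption of this sketch, and `group_id`, `ring_distance`, and `rendezvous_point` are hypothetical helper names.

```python
import hashlib

ID_BITS = 128  # assumed Pastry id width
ID_SPACE = 2 ** ID_BITS


def group_id(group_name: str, creator_name: str) -> int:
    """groupID = hash(textual name concatenated with creator name)."""
    digest = hashlib.sha1((group_name + creator_name).encode("utf-8")).digest()
    return int.from_bytes(digest[: ID_BITS // 8], "big")


def ring_distance(a: int, b: int) -> int:
    """Numeric distance on the circular id space."""
    d = abs(a - b) % ID_SPACE
    return min(d, ID_SPACE - d)


def rendezvous_point(gid: int, node_ids: list[int]) -> int:
    """The live node whose nodeID is numerically closest to gid is the root."""
    return min(node_ids, key=lambda n: ring_distance(n, gid))


gid = group_id("news", "alice")
```

Plain concatenation makes ("news", "alice") and ("newsa", "lice") hash identically; inserting a separator would avoid that, though the slide does not say whether SCRIBE does.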
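SCRIBE – Joining a Group: SCRIBE calls route(JOIN, groupID), which Pastry routes toward the rendez-vous point. The multicast tree is formed along the way: each node on the route adds the previous hop to its children table for the group and forwards the JOIN, unless it was already in the tree, in which case the JOIN stops there.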
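A sketch of how a JOIN builds the tree. For brevity it keeps a single hypothetical `GroupTree` object for the whole group, whereas in SCRIBE each node stores only its own children table; the Pastry route is assumed to be given as a list of node names ending at the root.

```python
from typing import Dict, List, Set


class GroupTree:
    """Hypothetical global view of one group's multicast tree."""

    def __init__(self, root: str) -> None:
        self.root = root                                    # rendez-vous point
        self.children: Dict[str, Set[str]] = {root: set()}  # children tables
        self.parent: Dict[str, str] = {}
        self.members: Set[str] = set()

    def join(self, route: List[str]) -> List[str]:
        """Apply a JOIN routed from route[0] to the root along `route`.
        Returns the nodes that became new forwarders."""
        if route[-1] != self.root:
            raise ValueError("a JOIN is routed to the rendez-vous point")
        joiner = route[0]
        self.members.add(joiner)
        if joiner in self.children:
            return []  # already on the tree (e.g. as a forwarder)
        self.children[joiner] = set()
        new_forwarders: List[str] = []
        for prev, node in zip(route, route[1:]):
            already_in_tree = node in self.children
            self.children.setdefault(node, set()).add(prev)  # record the child
            self.parent[prev] = node
            if already_in_tree:
                break  # JOIN intercepted here; the rest of the path exists
            new_forwarders.append(node)
        return new_forwarders


tree = GroupTree("R")
first = tree.join(["A", "F", "R"])   # F becomes a forwarder
second = tree.join(["B", "F", "R"])  # intercepted at F; R never sees it
```

The second JOIN stopping at F is the reason the rendez-vous point does not handle every join request.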
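SCRIBE – Leaving a Group: A leaving node records locally that it left the group; a node that receives a LEAVE removes the sender from its local children table. Whenever a node's children table for the group becomes empty and the node is not itself a member, it forwards a LEAVE to its parent, so parts of the multicast tree may be removed.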
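The recursive pruning can be sketched as one function over hypothetical `children`, `parent`, and `members` maps (child ids only; the IP addresses kept in real children tables are omitted).

```python
from typing import Dict, List, Set


def leave(node: str, children: Dict[str, Set[str]], parent: Dict[str, str],
          members: Set[str]) -> List[str]:
    """Handle `node` leaving the group; return the nodes pruned from the tree."""
    members.discard(node)  # record locally that the node left
    pruned: List[str] = []
    # A non-member with no children sends LEAVE to its parent; the parent
    # removes it and then repeats the same check for itself.
    while node in parent and not children.get(node) and node not in members:
        up = parent.pop(node)
        children.pop(node, None)
        children[up].discard(node)
        pruned.append(node)
        node = up
    return pruned


children = {"R": {"F"}, "F": {"A", "B"}, "A": set(), "B": set()}
parent = {"A": "F", "B": "F", "F": "R"}
members = {"A", "B"}
first = leave("A", children, parent, members)   # F still has child B
second = leave("B", children, parent, members)  # F becomes empty and is pruned
```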
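SCRIBE – Sending a Multicast Message: The source calls route(MULTICAST, groupID) to locate the rendez-vous point and asks it for its IP address, which it caches for subsequent messages. If the rendez-vous point fails, the source locates it again through Pastry; Pastry handles replication of the rendez-vous point. All messages are sent through the rendez-vous point, which disseminates them down the multicast tree.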
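Both halves of sending can be sketched: the source caches the root's address and re-locates it on failure, and the root forwards down the children tables. `locate_root` and `send` are hypothetical callbacks standing in for route(MULTICAST, groupID) and send(msg, IP-addr); retrying exactly once and the breadth-first order are assumptions of this sketch.

```python
from collections import deque
from typing import Callable, Dict, List, Optional, Set


class MulticastSource:
    """Source-side view of sending to one group (hypothetical class)."""

    def __init__(self, locate_root: Callable[[], str],
                 send: Callable[[str, str], bool]) -> None:
        self.locate_root = locate_root
        self.send = send
        self.root_addr: Optional[str] = None  # cached after the first lookup

    def multicast(self, msg: str) -> str:
        if self.root_addr is None:
            self.root_addr = self.locate_root()
        if not self.send(self.root_addr, msg):
            # Rendez-vous point failed: locate the (new) root and retry once.
            self.root_addr = self.locate_root()
            self.send(self.root_addr, msg)
        return self.root_addr


def disseminate(root: str, children: Dict[str, Set[str]],
                members: Set[str]) -> List[str]:
    """Forward from the root down the children tables (breadth-first) and
    return the members reached, in delivery order."""
    reached: List[str] = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if node in members:
            reached.append(node)
        queue.extend(sorted(children.get(node, set())))
    return reached


# The first root address (10.0.0.1) is down, so the source re-locates the root.
roots = iter(["10.0.0.1", "10.0.0.2"])
alive = {"10.0.0.2"}
source = MulticastSource(lambda: next(roots), lambda addr, msg: addr in alive)
used_root = source.multicast("update")
order = disseminate("R", {"R": {"F"}, "F": {"A", "B"}}, {"A", "B"})
```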
SCRIBE – Repairing the Multicast Tree: Messages are delivered best-effort and may arrive out of order. Each parent periodically sends a heartbeat message to its children. A child that suspects its parent has failed rejoins the tree by sending a new JOIN message. The rendez-vous point can also be repaired: Pastry replicates its state on nodes in the leaf set, and children JOIN the new root when they detect a missing heartbeat.
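Child-side failure detection can be sketched as a timeout on the last heartbeat. `ParentMonitor`, the timeout value, and the `rejoin` callback are hypothetical; times are passed in explicitly so the logic is deterministic, whereas a real node would read a monotonic clock.

```python
from typing import Callable, List


class ParentMonitor:
    """Child-side failure detector for one group (hypothetical class)."""

    def __init__(self, timeout: float, now: float) -> None:
        self.timeout = timeout  # assumed value; not given on the slide
        self.last_heard = now

    def on_heartbeat(self, now: float) -> None:
        """Record a heartbeat received from the parent."""
        self.last_heard = now

    def parent_suspected(self, now: float) -> bool:
        return now - self.last_heard > self.timeout

    def check(self, now: float, rejoin: Callable[[], None]) -> bool:
        """If the parent looks failed, send a new JOIN via `rejoin`."""
        if self.parent_suspected(now):
            rejoin()
            self.last_heard = now  # give the new parent a full timeout
            return True
        return False


sent: List[str] = []
monitor = ParentMonitor(timeout=3.0, now=0.0)
monitor.on_heartbeat(2.0)
quiet = monitor.check(4.0, lambda: sent.append("JOIN"))     # 2 s of silence
rejoined = monitor.check(5.5, lambda: sent.append("JOIN"))  # 3.5 s of silence
```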
SCRIBE – Forming a Multicast Tree: The rendez-vous point is the root. Forwarders may or may not be members of the group; each maintains a children table for the group holding the IP address and nodeID of each child.
SCRIBE – Strengths: Pastry handles root replication. The rendez-vous point does not handle all join requests, because forwarders already in the tree intercept them. Locality properties of Pastry: short routes keep the delay from the rendez-vous point to each member short; route convergence keeps the load imposed on the physical network small.
SCRIBE – Experimental Evaluation: Experimental results from simulation, focusing on three metrics: the delay to deliver events to group members; the stress on each node; the stress on each physical network link.
SCRIBE – Simulator Evaluation: 5,050 routers and 100,000 end nodes; 1,500 groups of different sizes; 10 runs using the same parameters but different random seeds, with all results averaged; results compared with IP multicast.
SCRIBE – Delay Penalty: RMD is the ratio between the maximum delay using SCRIBE and the maximum delay using IP multicast; RAD is the ratio between the average delay using SCRIBE and the average delay using IP multicast.
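The two ratios can be computed per group as below. The delay values are hypothetical and only illustrate the definitions; a ratio of 1.0 would mean SCRIBE matches IP multicast.

```python
from statistics import mean
from typing import Sequence, Tuple


def delay_penalty(scribe_delays: Sequence[float],
                  ip_delays: Sequence[float]) -> Tuple[float, float]:
    """Return (RMD, RAD) for one group, given the delay to each member
    under SCRIBE and under IP multicast."""
    rmd = max(scribe_delays) / max(ip_delays)   # ratio of maximum delays
    rad = mean(scribe_delays) / mean(ip_delays)  # ratio of average delays
    return rmd, rad


# Hypothetical delays (ms) to the same three members.
rmd, rad = delay_penalty([20, 60, 100], [20, 40, 60])
```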
SCRIBE – Node Stress: The average node is responsible for forwarding only a small number of multicast messages.
SCRIBE – Link Stress: Total number of links = 1,035,295; SCRIBE = 2,489,824 messages (mean link stress 2.4); IP multicast = 758,853 messages (mean link stress 0.7).
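The mean link stress figures follow from dividing each message count by the total number of links, using only the numbers on this slide:

```python
TOTAL_LINKS = 1_035_295
messages = {"SCRIBE": 2_489_824, "IP multicast": 758_853}

# Mean link stress = messages carried / number of physical links.
mean_link_stress = {name: count / TOTAL_LINKS for name, count in messages.items()}
for name, stress in mean_link_stress.items():
    print(f"{name}: {stress:.2f}")
# SCRIBE: 2.40
# IP multicast: 0.73

# SCRIBE sends roughly 3.3 times as many messages as IP multicast.
ratio = messages["SCRIBE"] / messages["IP multicast"]
```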
SCRIBE – Bottleneck Remover: Bottlenecks are low-capacity nodes, or high-capacity nodes with an extremely large number of children entries. A node over capacity drops children: it selects a child to drop and sends it a message containing its children table; the child chooses a new parent from that table and sends it a JOIN message. Result: the long tail in the node stress graph is removed, but average link stress increases.
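A sketch of the drop-and-rejoin step. The slide does not say which child is dropped or how the new parent is chosen, so both rules here are assumptions: drop the child farthest from the overloaded node, and let it pick the lowest-delay entry from the received children table. Hypothetical one-dimensional positions stand in for network delay.

```python
from typing import Callable, Dict, List, Set, Tuple


def remove_bottleneck(node: str, children: Dict[str, Set[str]], capacity: int,
                      delay: Callable[[str, str], float]) -> List[Tuple[str, str]]:
    """Shed children of an overloaded `node` until it is within `capacity`.
    Returns (dropped_child, new_parent) pairs."""
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    moves: List[Tuple[str, str]] = []
    while len(children[node]) > capacity:
        # Assumption: drop the child farthest from the overloaded node.
        victim = max(children[node], key=lambda c: (delay(node, c), c))
        children[node].discard(victim)
        # The victim receives node's remaining children table ...
        table = sorted(children[node])
        # ... picks the closest entry (assumption) and sends it a JOIN.
        new_parent = min(table, key=lambda c: (delay(victim, c), c))
        children.setdefault(new_parent, set()).add(victim)
        moves.append((victim, new_parent))
    return moves


pos = {"N": 0, "a": 1, "b": 2, "c": 10, "d": 11}
children = {"N": {"a", "b", "c", "d"}}
moves = remove_bottleneck("N", children, 2, lambda x, y: abs(pos[x] - pos[y]))
```

In this run the path from N to d grows from one tree hop to three (N, b, c, d), which illustrates why relieving node stress raises average link stress.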
SCRIBE – Scalability with Small Groups: 50,000 nodes; 30,000 groups with 11 members each. SCRIBE performs poorly with a large number of small groups, because their trees contain long paths of forwarders. SCRIBE collapse removes these long paths by removing nodes that are not members of a group and have only one entry in their children table.
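The collapse rule can be sketched as a tree walk that splices out every non-member forwarder with exactly one child, linking its parent directly to that child. The root (rendez-vous point) is never removed in this sketch; the tree shape below is hypothetical.

```python
from typing import Dict, List, Set


def collapse(root: str, children: Dict[str, Set[str]],
             members: Set[str]) -> List[str]:
    """Remove non-member forwarders with exactly one child; return them."""
    removed: List[str] = []

    def visit(node: str) -> None:
        new_children: Set[str] = set()
        for child in sorted(children.get(node, set())):
            # Skip over a chain of non-member, single-child forwarders.
            while child not in members and len(children.get(child, set())) == 1:
                (only_child,) = children.pop(child)
                removed.append(child)
                child = only_child
            new_children.add(child)
            visit(child)
        children[node] = new_children

    visit(root)
    return removed


children = {"R": {"F1", "F3"}, "F1": {"F2"}, "F2": {"A", "B"}, "F3": {"C"}}
members = {"A", "B", "C"}
removed = collapse("R", children, members)
```

F2 survives because it has two children, so it still does useful fan-out work.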
SCRIBE – Scalability with Small Groups (with collapse): Average link stress drops from 6.1 to 3.3; average number of children drops from 21.2 to 8.5.
SCRIBE – Conclusion: Fully decentralized; supports a large number of groups; supports large group sizes; multiple multicast sources per group. Questions?