Published by Catherine Adams. Modified over 9 years ago.
1
Gil Einziger, Roy Friedman
Computer Science Department, Technion
2
Background: Publish/Subscribe
Publisher: any entity that wishes to publish some event
– Example: "Look at my new hairstyle"
3
Background: Publish/Subscribe
Subscriber: any entity that wishes to be notified about events that match its interests (also called its subscription)
– "I want to know everything about hairstyles"
– "I want to know everything about Ariana Grande"
– "I only care about science fiction"
4
Background: Publish/Subscribe
The system's goal is to deliver notifications about events to interested subscribers, and only to them
– Decouples information producers from consumers
Applications:
– Social networking (Twitter and the like)
– Stock quotes
– Control systems
– Data-center management
– Etc.
5
Background: P2P
Decentralized systems in which (most of) the communication is performed directly between the end nodes of the system, the peers
Often, peers are machines donated by users
– But they can also be set-top boxes, routers, the servers of a large data center, etc.
Famous example applications:
– Skype, BitTorrent (and other file sharing), IPTV, Bitcoin
6
It's a Brave New World Out There
Most users access online content through their mobile devices
– Intermittent connectivity
– Limited bandwidth
– Limited battery life
– Limited resources
We need to decouple the devices used to access the data from the ones serving the P2P network
7
A Vision for Future P2P Solutions
Whether run as a true P2P network between donated machines or inside a data center:
– Decoupling between the devices that consume services and the ones providing them
  Incentives might take the form of revenue sharing with advertisers or paid subscribers
– P2P machines are used to provide multiple services
  It is not feasible to optimize the P2P overlay for a specific service
8
Problem Statement
A scalable and efficient pub/sub system for self-sustained P2P networks
– The challenge:
  Subscribers might not be present much of the time
  Short client sessions
  Must use the existing overlay
– Quality goals:
  High delivery rate
  Efficient publication delivery
  Reasonable latency
  High churn resilience
9
Our Solution: Overview
10
Our Solution: Subscribing to a Mailbox
Clients that are aware of mailboxes serving their topics simply notify those mailboxes
A client that is unaware of such mailboxes initiates multiple biased random walks
– Each mailbox distributes a hint (a Bloom filter of the topics it subscribes to) to its overlay neighbors, up to some distance
– The random walks favor visiting nodes whose hints include a match
– The random walks continue for a given TTL, trying to find as many matching mailboxes as possible
  If none is found, the home node becomes a mailbox for these topics
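The hint-guided search above can be sketched in Java, the project's implementation language. This is a minimal illustration under assumptions, not the project's code: `TopicHint`, `Neighbor`, `BiasedWalk`, the double-hashing scheme, and the `bias` weight are all hypothetical. A hint is a Bloom filter over topic names (false positives are possible, false negatives are not), and each walk step picks a neighbor with probability proportional to a weight that is larger when that neighbor's hint matches the topic. The TTL loop and the fallback of the home node becoming a mailbox are left out.

```java
import java.util.BitSet;
import java.util.List;
import java.util.Random;

/** Hypothetical topic hint: a small Bloom filter over topic names. */
class TopicHint {
    private final BitSet bits;
    private final int size;
    private final int hashes;

    TopicHint(int size, int hashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashes = hashes;
    }

    // Double hashing: derive the i-th bit position from two base hashes of the topic.
    private int position(String topic, int i) {
        int h1 = topic.hashCode();
        int h2 = (h1 >>> 16) | 1; // odd step so successive probes differ
        return Math.floorMod(h1 + i * h2, size);
    }

    void add(String topic) {
        for (int i = 0; i < hashes; i++) bits.set(position(topic, i));
    }

    // May return false positives, never false negatives.
    boolean mightContain(String topic) {
        for (int i = 0; i < hashes; i++) {
            if (!bits.get(position(topic, i))) return false;
        }
        return true;
    }
}

/** Hypothetical overlay neighbor together with the hint it advertised. */
class Neighbor {
    final String id;
    final TopicHint hint;

    Neighbor(String id, TopicHint hint) {
        this.id = id;
        this.hint = hint;
    }
}

class BiasedWalk {
    // Assumed weighting: a neighbor whose hint matches is `bias` times more likely to be chosen.
    static Neighbor nextHop(List<Neighbor> neighbors, String topic, double bias, Random rnd) {
        double[] weights = new double[neighbors.size()];
        double total = 0;
        for (int i = 0; i < weights.length; i++) {
            weights[i] = neighbors.get(i).hint.mightContain(topic) ? bias : 1.0;
            total += weights[i];
        }
        double r = rnd.nextDouble() * total;
        for (int i = 0; i < weights.length; i++) {
            r -= weights[i];
            if (r < 0) return neighbors.get(i);
        }
        return neighbors.get(neighbors.size() - 1); // guard against floating-point rounding
    }
}
```

With `bias = 9.0` and one matching neighbor out of two, the matching neighbor is chosen about 90% of the time. A false-positive match only costs an extra hop, since the walk keeps going until its TTL expires.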
11
Our Solution: Mailboxes
An overloaded mailbox can refuse to accept new clients and topics
Mailboxes disappear naturally due to churn, or when they are underutilized
The important objective is load sharing rather than load balancing
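The elastic behavior of a single mailbox can be illustrated with a small Java class. This is a hypothetical sketch: the class name, the thresholds `maxClients` and `minClients`, and measuring load as the number of registered clients are assumptions, not details from the slides. An overloaded mailbox refuses new registrations, and one that serves too few clients reports that it may evaporate.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical elastic mailbox. Load is approximated by the number of
 * registered clients; both thresholds are illustrative assumptions.
 */
class Mailbox {
    private final int maxClients; // assumed overload threshold
    private final int minClients; // assumed underutilization threshold
    private final Set<String> clients = new HashSet<>();

    Mailbox(int maxClients, int minClients) {
        this.maxClients = maxClients;
        this.minClients = minClients;
    }

    /** Returns false (a refusal) when overloaded; the client must look elsewhere. */
    boolean register(String clientId) {
        if (clients.contains(clientId)) return true; // already served here
        if (clients.size() >= maxClients) return false;
        clients.add(clientId);
        return true;
    }

    void unregister(String clientId) {
        clients.remove(clientId);
    }

    /** An underutilized mailbox may shut itself down (self-evaporation). */
    boolean shouldEvaporate() {
        return clients.size() < minClients;
    }

    int load() {
        return clients.size();
    }
}
```

In this sketch a refusal simply pushes the client to find or create another mailbox, so no global coordinator is needed, which fits the stated goal of load sharing rather than strict load balancing.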
12
Our Solution: Dissemination
Spanning tree among mailboxes that know each other
Random walks to discover new mailboxes and disseminate to them
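The spanning-tree part of mailbox-to-mailbox dissemination can be sketched as a flood along tree edges with duplicate suppression. This is a hypothetical, in-memory Java sketch: `TreeMailbox`, the per-mailbox set of seen event IDs, and direct method calls standing in for network messages are assumptions. The random walks that discover mailboxes outside the tree are not shown.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical mailbox node in the dissemination spanning tree. */
class TreeMailbox {
    final String id;
    final List<TreeMailbox> treeNeighbors = new ArrayList<>();
    final Set<String> seenEvents = new HashSet<>();

    TreeMailbox(String id) {
        this.id = id;
    }

    /** Adds an undirected tree edge between two mailboxes. */
    static void link(TreeMailbox a, TreeMailbox b) {
        a.treeNeighbors.add(b);
        b.treeNeighbors.add(a);
    }

    /** Forwards along tree edges; the seen-set makes duplicate arrivals harmless. */
    void receive(String eventId, TreeMailbox from) {
        if (!seenEvents.add(eventId)) return; // already handled here
        // At this point the mailbox would push the event to its registered clients.
        for (TreeMailbox n : treeNeighbors) {
            if (n != from) n.receive(eventId, this);
        }
    }
}
```

On a tree, skipping the sender is enough to deliver each event exactly once per mailbox; the seen-set additionally protects against duplicates when the same event also arrives via a random walk.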
14
Our Solution: Dissemination
Spanning tree + random walks between mailboxes
Normally, a mailbox pushes events to the corresponding registered clients
Additionally, out-of-band gossip between mailboxes and clients:
– Clients poll their set of known mailboxes
– Each polled mailbox and the client exchange their lists of known events
– Occurs periodically, plus after reconnection
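The out-of-band gossip step, in which a client and a polled mailbox exchange their lists of known events, amounts to a set reconciliation. A minimal Java sketch, assuming events are identified by string IDs and that exchanging full ID sets is acceptable (the class `EventGossip` and its method `exchange` are hypothetical):

```java
import java.util.Set;
import java.util.TreeSet;

class EventGossip {
    /**
     * One poll: client and mailbox compare the IDs of the events they know,
     * then each side fetches what it is missing. Returns the IDs the client learned.
     */
    static Set<String> exchange(Set<String> clientKnown, Set<String> mailboxKnown) {
        Set<String> missingAtClient = new TreeSet<>(mailboxKnown);
        missingAtClient.removeAll(clientKnown);
        Set<String> missingAtMailbox = new TreeSet<>(clientKnown);
        missingAtMailbox.removeAll(mailboxKnown);
        clientKnown.addAll(missingAtClient);   // client pulls events it missed
        mailboxKnown.addAll(missingAtMailbox); // mailbox learns events the client carried
        return missingAtClient;
    }
}
```

Because a client polls several mailboxes, in this sketch an event it learned from one mailbox is handed to the next mailbox it polls, so clients can carry events between mailboxes that the tree and the random walks missed. Running the exchange periodically and after each reconnection lets a client that was offline catch up.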
15
Implementation
– Written in Java
– Open-source project
  All code, including tests, is available online
– Can run on top of real IP networks as well as the PeerSim simulator
  On real networks, it runs on top of the OpenKAD implementation of the Kademlia DHT
– Measurements confirm that simulation and real-network results are similar for networks of the same size
  Simulations can be used to explore scalability
  Real networks can be used to validate simulation results
16
Evaluation: Methodology
Traces:
– Synthetic traces
  Subscriptions are spread uniformly across clients/home nodes
  Topic publication distribution is Zipf-like with α = 0.9
– Twitter traces
Metrics:
– Delivery rate
– Communication load
– Mailbox subscription pattern vs. that of users
– Effects of churn
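The Zipf-like publication distribution used in the synthetic traces can be reproduced with a simple inverse-CDF sampler. In this Java sketch, the topic of rank k is published with probability proportional to 1/k^α, with α = 0.9 as stated above; the class name `ZipfTopics` and the number of topics are assumptions, and the uniform assignment of subscriptions is not shown.

```java
import java.util.Random;

/** Hypothetical Zipf-like topic sampler: P(rank k) is proportional to 1 / k^alpha. */
class ZipfTopics {
    private final double[] cdf;

    ZipfTopics(int numTopics, double alpha) {
        double[] w = new double[numTopics];
        double sum = 0;
        for (int k = 1; k <= numTopics; k++) {
            w[k - 1] = 1.0 / Math.pow(k, alpha);
            sum += w[k - 1];
        }
        cdf = new double[numTopics];
        double acc = 0;
        for (int i = 0; i < numTopics; i++) {
            acc += w[i] / sum;
            cdf[i] = acc;
        }
        cdf[numTopics - 1] = 1.0; // remove floating-point drift at the tail
    }

    /** Probability of the topic with the given 1-based rank. */
    double probability(int rank) {
        return cdf[rank - 1] - (rank > 1 ? cdf[rank - 2] : 0.0);
    }

    /** Draws a 1-based topic rank: the smallest index whose CDF exceeds a uniform u. */
    int sample(Random rnd) {
        double u = rnd.nextDouble();
        int lo = 0, hi = cdf.length - 1;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (cdf[mid] > u) hi = mid; else lo = mid + 1;
        }
        return lo + 1;
    }
}
```

With α = 0.9 the most popular topic is published 2^0.9 ≈ 1.87 times as often as the second, so a handful of topics dominate while the long tail still receives some traffic.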
17
Results: Synthetic Workload
[Figure: delivery rate over time vs. network size; x-axis: time (minutes), y-axis: delivery rate]
1) The delivery rate approaches 100% after a few minutes
2) For 1500 nodes, simulation results closely match real runs
18
Results: Twitter Traces
[Figure: delivery rate over time; x-axis: time (minutes), y-axis: delivery rate]
19
Results: Load Distribution
[Figure: load distribution (Twitter); x-axis: node #, y-axis: total handled messages]
1) Almost all of the load falls on mailboxes
2) Even the most loaded nodes handle fewer than 10 messages per second
20
Results: Subscription Pattern
[Figure: subscription number per node, Twitter (Feb. 7 2010, 19:00-20:20); x-axis: node #, y-axis: subscription number]
The subscription pattern of mailboxes is much more uniform than that of clients, resulting in balanced dissemination trees
21
Results: Subscription Pattern
[Figure: subscription number per node; 1000 nodes, 3000 clients, 3 topics each; x-axis: node #, y-axis: subscription number]
22
Results: Subscription Pattern
[Figure: client subscriptions vs. mailbox subscriptions (#registered clients / #registered mailboxes)]
Only a small number of mailboxes register even for the most popular topics, so the dissemination trees are relatively small
23
Results: Single 10% Churn Event
[Figure: churn recovery time; x-axis: time (minutes), y-axis: miss rate]
24
Results: Repetitive 10% Churn
[Figure: churn recovery over time; x-axis: time (minutes), y-axis: miss rate]
25
Results: Repetitive 100% Churn
[Figure: churn recovery time (losing all mailboxes); x-axis: time (minutes), y-axis: miss rate]
26
Summary
The concept of elastic mailboxes
– Self-electing, self-evaporating, self-organizing
Complementary delivery mechanisms
– Spanning tree
– Random walks
– Out-of-band gossip through client interaction
End result:
– Mailboxes dramatically alleviate the scalability problem
– Highly efficient, highly effective, and highly robust to failures and churn
27
Open Issues
– Exploit subscription similarity
– Privacy
28
Q&A
Thanks for listening…
29
Client applications poll mailboxes in order to satisfy all their subscriptions.
Conformity Architecture: Mailbox
– A rendezvous point between publications and subscriptions.
– Mailboxes use load sharing to adjust their number to match the demand for publish/subscribe service.
– When busy mailboxes refuse or fail to serve client applications, these applications create additional mailboxes.
– A publish/subscribe architecture pushes content to the mailboxes.