Alex King Yeung Cheung and Hans-Arno Jacobsen
University of Toronto, Middleware Systems Research Group
June 24, 2010
ICDCS 2010
Problem
  Publishers can join anywhere in the broker overlay
    Closest broker
  Impact
    High delivery delay
    High system utilization
      Matching
      Bandwidth
      Subscription storage
Motivation
  High system utilization leads to overloads
    High response times
    Reliability issues
  Critical for enterprise-grade publish/subscribe systems
    GooPS: Google’s internal publish/subscribe middleware
    SuperMontage: Tibco’s pub/sub distribution network for Nasdaq’s quote and order processing system
    GDSN (Global Data Synchronization Network): global pub/sub network that allows retailers and suppliers to exchange supply chain data
Goal
  Adaptively move publishers to the area of matching subscribers
  Algorithms should be
    Dynamic
    Transparent
    Scalable
    Robust
Terminology
  [Figure: brokers B1 to B5 and publisher P, with labels for the reference broker, upstream, downstream, and publication flow]
Publisher Placement Algorithms
  POP (Publisher Optimistic Placement)
    Fully distributed design
    Retrieves trace information per traced publication
    Uses one metric: number of publication deliveries downstream
  GRAPE (Greedy Relocation Algorithm for Publishers of Events)
    Computations are centralized at each publisher’s broker, which makes implementing and debugging easier
    Retrieves trace information per trace session
    Can be customized to minimize delivery delay, broker load, or a specified combination of both
    Uses two metrics: average delivery delay and total system message rate
  Goal: Move publishers to where the subscribers are, based on past publication traffic
Choice of Minimizing Delivery Delay or Load
  [Figure: publisher P and two groups of subscribers]
    Publisher P: [class,`STOCK’], [symbol,`GOOG’], [volume, ]
    Subscriptions: [class,=,`STOCK’], [symbol,=,`GOOG’], [volume,>, ] and [class,=,`STOCK’], [symbol,=,`GOOG’], [volume,>,0]
    Message rates: 4 msg/s and 1 msg/s
    Scale: from 100% load and 0% delay to 0% load and 100% delay
GRAPE’s 3 Phases
  Phase 1
    Discover location of publication deliveries by tracing live publication messages
    Retrieve trace and broker performance information
  Phase 2
    Pinpoint the broker that minimizes the average delivery delay or system load in a centralized manner
  Phase 3
    Migrate the publisher to the broker decided in Phase 2
    Transparently, with minimal routing table update and message overhead
Phase 1 – Illustration
  GRAPE’s data structure per publisher
    Message ID    Trace session ID    Number of matching subscribers
    B34-M213      B34-M212            5
    B34-M215      B34-M
    B34-M216      B34-M212            5
    B34-M217      B34-M212            1
    B34-M220      B34-M212            3
    B34-M222      B34-M
    B34-M225      B34-M212            1
    B34-M226      B34-M
  Figure labels: trace session ID; start of bit vector; publications received at this broker; total number of deliveries made to local subscribers
Phase 1 – Trace Data and Broker Performance Retrieval
  [Figure: brokers B1, B5, B6, B7, and B8 with publisher P and subscribers. Replies accumulate broker by broker on the way back: Reply B8 and Reply B7; then Reply B8, B7, B6; then Reply B8, B7, B6, B5]
  Once G threshold publications are traced, the trace session ends…
Contents of Trace Reply in Phase 1
  Broker ID
  Neighbor ID(s)
  Bit vector (for estimating total system message rate)
  Total number of local deliveries (for estimating end-to-end delivery delay)
  Input queuing delay
  Average matching delay
  Output queuing delays to neighbor(s) and binding(s)
  In terms of message overhead, GRAPE adds 1 reply message per trace session
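The reply contents listed on this slide could be modeled roughly as follows. This is a minimal sketch, not GRAPE's actual data structure: the class name, field names, units, and the `merge_replies` helper are all hypothetical, and the reading of the bit vector (bit i set when the i-th traced publication was delivered to a local subscriber) is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TraceReply:
    """One broker's entry in a trace reply (all field names are hypothetical)."""
    broker_id: str
    neighbor_ids: List[str]
    bit_vector: int            # assumed: bit i set => i-th traced publication delivered locally
    local_deliveries: int      # total number of local deliveries in this trace session
    input_queuing_delay_ms: float
    avg_matching_delay_ms: float
    # output queuing delay per neighbor or binding, keyed by its name
    output_queuing_delays_ms: Dict[str, float] = field(default_factory=dict)


def merge_replies(own: TraceReply, downstream: List[List[TraceReply]]) -> List[TraceReply]:
    """Combine the entries received from downstream neighbors with this broker's own entry."""
    merged = [entry for reply in downstream for entry in reply]
    merged.append(own)
    return merged
```

Merging entries at every hop is one way to end up with a single reply message per trace session at the publisher's broker.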
Phase 2 – Broker Selection
  Simulate placing the publisher at every downstream broker and estimate the average end-to-end delivery delay
    Local delivery counts
    Processing delay at each broker: queuing and matching delays
    Publisher ping times to each broker
  Simulate placing the publisher at every downstream broker and estimate the total system message rate
    Bit vectors
Phase 2 – Estimating Average End-to-End Delivery Delay
  [Figure: publisher P at B1; local subscriber counts are 1 at B1, 2 at B6, 9 at B7, and 5 at B8]
  Ping time: 10 ms
  Per-broker delays
    B1: Input Q 30 ms, Matching 20 ms, Output Q (RMI) 100 ms, Output Q (B5) 50 ms
    B6: Input Q 20 ms, Matching 5 ms, Output Q (RMI) 45 ms, Output Q (B5) 25 ms, Output Q (B7) 40 ms, Output Q (B8) 35 ms
    B7: Input Q 30 ms, Matching 10 ms, Output Q (RMI) 70 ms, Output Q (B6) 30 ms
    B8: Input Q 35 ms, Matching 15 ms, Output Q (RMI) 75 ms, Output Q (B6) 35 ms
  Delay summed over subscribers, with the ping time counted once per delivery
    Subscriber at B1: [10 + (30+20+100)] × 1 = 160 ms
    Subscribers at B6: [10 + (30+20+50) + (20+5+45)] × 2 = 360 ms
    Subscribers at B7: [10 + (30+20+50) + (20+5+40) + (30+10+70)] × 9 = 2,565 ms
    Subscribers at B8: [10 + (30+20+50) + (20+5+35) + (35+15+75)] × 5 = 1,475 ms
  Average end-to-end delivery delay: (160 + 360 + 2,565 + 1,475) ÷ 17 ≈ 268 ms
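The estimate on this slide can be checked with a short script. It is a sketch under stated assumptions: the assignment of the listed delays to B1, B6, B7, and B8 is inferred from their order on the slide, the 10 ms ping time is assumed to be paid by every delivery (which reproduces the 268 ms average), and the dictionary layout and function names are hypothetical.

```python
# Per-broker delays in ms, in the order they appear on the slide (assignment inferred).
DELAYS = {
    "B1": {"in": 30, "match": 20, "out": {"RMI": 100, "B5": 50}},
    "B6": {"in": 20, "match": 5, "out": {"RMI": 45, "B5": 25, "B7": 40, "B8": 35}},
    "B7": {"in": 30, "match": 10, "out": {"RMI": 70, "B6": 30}},
    "B8": {"in": 35, "match": 15, "out": {"RMI": 75, "B6": 35}},
}
# Number of local subscribers at each delivering broker.
SUBSCRIBERS = {"B1": 1, "B6": 2, "B7": 9, "B8": 5}
# (broker, output queue taken) pairs on the way from B1 to each delivering broker.
PATHS = {
    "B1": [("B1", "RMI")],
    "B6": [("B1", "B5"), ("B6", "RMI")],
    "B7": [("B1", "B5"), ("B6", "B7"), ("B7", "RMI")],
    "B8": [("B1", "B5"), ("B6", "B8"), ("B8", "RMI")],
}
PING_MS = 10  # publisher-to-broker ping time


def hop_delay(broker, out_queue):
    """Input queuing + matching + the chosen output queue at one broker."""
    d = DELAYS[broker]
    return d["in"] + d["match"] + d["out"][out_queue]


def delivery_delay(dest):
    """Estimated delay of one delivery to a subscriber attached to `dest`."""
    return PING_MS + sum(hop_delay(b, q) for b, q in PATHS[dest])


def average_delay():
    """Delivery delay averaged over all subscribers."""
    total = sum(delivery_delay(b) * n for b, n in SUBSCRIBERS.items())
    return total / sum(SUBSCRIBERS.values())


print(round(average_delay()))  # 268
```

Per delivery, this gives 160 ms at B1, 180 ms at B6, 285 ms at B7, and 295 ms at B8, for a total of 4,560 ms over 17 subscribers.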
Phase 2 – Estimating Total Broker Message Rate
  [Figure: brokers B1, B6, B7, and B8 with publisher P and subscribers]
  Bit vectors are necessary for capturing publication deliveries to local subscribers in content-based pub/sub systems
  The message rate through a broker is calculated by using the bitwise OR operator to aggregate the bit vectors of all downstream brokers
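A minimal sketch of the OR aggregation follows. It assumes that bit i of a broker's vector is set when the i-th traced publication was delivered to one of its local subscribers, and that a broker's own vector is included in its aggregate. The function names, the tree layout, and the example vectors are hypothetical.

```python
from typing import Dict, List


def popcount(bits: int) -> int:
    """Number of set bits, i.e. number of traced publications marked."""
    return bin(bits).count("1")


def vector_through(broker: str, local: Dict[str, int],
                   children: Dict[str, List[str]]) -> int:
    """OR of the broker's own vector and those of all brokers downstream of it."""
    vector = local.get(broker, 0)
    for child in children.get(broker, []):
        vector |= vector_through(child, local, children)
    return vector


def total_messages(root: str, local: Dict[str, int],
                   children: Dict[str, List[str]]) -> int:
    """Sum, over every broker at or below `root`, of the publications passing through it."""
    total = popcount(vector_through(root, local, children))
    for child in children.get(root, []):
        total += total_messages(child, local, children)
    return total


# Hypothetical trace of 8 publications on the tree B1 -> B6 -> {B7, B8}.
children = {"B1": ["B6"], "B6": ["B7", "B8"]}
local = {"B1": 0b10000000, "B6": 0b00000001, "B7": 0b00001111, "B8": 0b00111100}
print(popcount(vector_through("B6", local, children)))  # 6
print(total_messages("B1", local, children))            # 21
```

Delivery counts alone would overestimate the load: B6, B7, and B8 report 1 + 4 + 4 = 9 local deliveries in this example, but only 6 distinct publications pass through B6, because a publication matched at both B7 and B8 crosses B6 once.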
Phase 2 – Minimizing Delivery Delay with Weight P%
  1. Get publisher-to-broker ping times
  2. Calculate the average delivery delay if the publisher is positioned at each of the downstream brokers
  3. Normalize, sort, and drop candidates with average delivery delays greater than P
  4. Calculate the total broker message rate if the publisher is positioned at each of the remaining candidate brokers
  5. Select the candidate that yields the lowest total system message rate
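The steps above can be sketched as a two-stage filter. The slide does not say how candidates are normalized, so min-max scaling of the prioritized metric to a 0-100% range is an assumption here, as are the function name, the tuple layout, and the example values.

```python
from typing import Dict, Tuple


def select_broker(candidates: Dict[str, Tuple[float, float]], p_percent: float) -> str:
    """Pick a broker given (prioritized metric, other metric) per candidate.

    Candidates whose normalized prioritized metric exceeds p_percent are dropped,
    then the survivor with the lowest other metric wins.
    """
    primaries = [primary for primary, _ in candidates.values()]
    lo, hi = min(primaries), max(primaries)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all candidates tie
    # Assumed normalization: best candidate maps to 0%, worst to 100%.
    normalized = {b: 100.0 * (primary - lo) / span
                  for b, (primary, _) in candidates.items()}
    kept = [b for b in sorted(normalized, key=normalized.get)
            if normalized[b] <= p_percent]
    return min(kept, key=lambda b: candidates[b][1])


# Hypothetical estimates: (average delivery delay in ms, total message rate in msg/s).
estimates = {"B1": (268, 40), "B6": (150, 30), "B7": (120, 55), "B8": (200, 20)}
print(select_broker(estimates, 0))    # B7
print(select_broker(estimates, 50))   # B6
print(select_broker(estimates, 100))  # B8
```

Swapping the two values in each tuple gives the mirrored procedure that prioritizes load and breaks ties on delivery delay.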
POP’s 3 Phases
  Phase 1
    Discover location of publication deliveries by probabilistically tracing live publication messages
  Phase 2
    Pinpoint the broker closest to the set of matching subscribers using trace data from Phase 1 in a decentralized fashion
  Phase 3
    Migrate the publisher to the broker decided in Phase 2
    Transparently, with minimal routing table update and message overhead
Phase 1 – Publication Tracing
  [Figure: broker overlay B1 to B8 with publisher P and subscribers. Trace replies (Reply 9, Reply 5, Reply 15) travel back toward the publisher and fill in each broker's Publisher Profile Table]
  Multiple publication traces are aggregated by: S_i = S_new + (1 - α) S_{i-1}
Phase 2 – Broker Selection
  [Figure: broker overlay B1 to B8 with publisher P and subscribers, annotated with Publisher Profile Table entries]
  Message fields shown: AdvId: P; DestId: null; Broker List: B1, B5, B6
Experiment Setup
  Experiments on both PlanetLab and a cluster testbed
  PlanetLab:
    63 brokers
    1 broker per box
    20 publishers with publication rate of 10 – 40 msg/min
    80 subscribers per publisher
    1600 subscribers in total
    P threshold of 50
    G threshold of 50
  Cluster testbed:
    127 brokers
    Up to 7 brokers per box
    30 publishers with publication rate of 30 – 300 msg/min
    200 subscribers per publisher
    6000 subscribers in total
    P threshold of 100
    G threshold of
Experiment Setup - Workloads
  2 workloads
  Random Scenario
    5% are high-rated; sink all traffic from their publisher
    25% are medium-rated; sink ~50% of traffic
    70% are low-rated; sink ~10% of traffic
    Subscribers are randomly placed on N brokers
  Enterprise Scenario
    5% are high-rated; sink all traffic from their publisher
    95% are low-rated; sink ~10% of traffic
    All high-rated subscribers are clustered onto one broker, and all low-rated subscribers onto N-1 brokers
Average Input Utilization Ratio vs Subscriber Distribution [Graph]
  Load reduction by up to 68%
Average Delivery Delay vs Subscriber Distribution [Graph]
  Delivery delay reduction by up to 68%
Average Message Overhead Ratio vs Subscriber Distribution [Graph]
Conclusions
  POP and GRAPE move publishers to areas of matching subscribers to
    Reduce load in the system to increase scalability, and/or
    Reduce average delivery delay on publication messages to improve performance
  POP is suitable for pub/sub systems that strive for simplicity, such as GooPS
  GRAPE is suitable for systems that strive to minimize in the extremes, such as system load in sensor networks and delivery delay in SuperMontage, or that want the flexibility to adjust performance based on resource usage
Related Approaches
  Filter-based Publish/Subscribe:
    Re-organize the broker overlay to minimize delivery delay and system load
    R. Baldoni et al. The Computer Journal
    Migliavacca et al. DEBS
  Multicast-based Publish/Subscribe:
    Assign similar subscriptions to one or more clusters of servers
    Suitable for static workloads
    May get false-positive publication delivery
    Architecture is fundamentally different from filter-based approaches
    Riabov et al. ICDCS 2002 and 2003
    Voulgaris et al. IPTPS 2006
    Baldoni et al. DEBS
Average Broker Message Rate vs Subscriber Distribution [Graph]
Average Output Utilization Ratio vs Subscriber Distribution [Graph]
Average Delivery Delay vs Subscriber Distribution [Graph]
Average Hop Count vs Subscriber Distribution [Graph]
Average Broker Message Rate vs Subscriber Distribution [Graph]
Average Delivery Delay vs Subscriber Distribution [Graph]
Average Message Overhead Ratio vs Time [Graph]
Message Rate vs Time [Graph]
Average Delivery Delay vs Time [Graph]
Average Hop Count vs Time [Graph]
Broker Selection Time vs Migration Hop Count [Graph]
Broker Selection Time vs Migration Hop Count [Graph]
Publisher Wait Time vs Migration Hop Count [Graph]
Results Summary
  Under random workload
    No significant performance differences between POP and GRAPE
    Prioritization metric and weight have almost no impact on GRAPE’s performance
  Increasing the number of publication samples on POP
    Increases the response time
    Increases the amount of message overhead
    Increases the average broker message rate
  GRAPE reduces the input utilization ratio by up to 68%, average message rate by 84%, average delivery delay by 68%, and message overhead relative to POP by 91%.
Phase 1 – Logging Publication History
  Each broker records, per publisher, the publications delivered to local subscribers
  Each trace session is identified by the message ID of the first publication of that session
  The trace session ID is in the header of each subsequent publication message
  G threshold publications are traced per trace session
POP - Intro
  Publisher Optimistic Placement
  Goal: Move publishers to the area with the highest publication delivery or concentration of matching subscribers
POP’s Methodology Overview
  3-phase algorithm:
    Phase 1: Discover location of publication deliveries by probabilistically tracing live publication messages
      Ongoing and efficient, with minimal network, computational, and storage overhead
    Phase 2: Pinpoint the broker closest to the set of matching subscribers using trace data from Phase 1 in a decentralized fashion
    Phase 3: Migrate the publisher to the broker decided in Phase 2
      Transparently, with minimal routing table update and message overhead
Phase 1 – Aggregated Replies
  [Figure: broker overlay B1 to B8 with publisher P and subscribers. Trace replies (Reply 9, Reply 5, Reply 15) travel back toward the publisher and fill in each broker's Publisher Profile Table]
  Multiple publication traces are aggregated by: S_i = S_new + (1 - α) S_{i-1}
Phase 2 – Decentralized Broker Selection Algorithm
  Phase 2 starts when P threshold publications are traced
  Goal: Pinpoint the broker that is closest to the highest concentration of matching subscribers
    Using trace information from only a subset of brokers
  The Next Best Broker condition:
    The next best neighboring broker is the one whose number of downstream subscribers is greater than the sum of all other neighbors' downstream subscribers plus the local broker's subscribers.
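The Next Best Broker condition can be written as a small predicate. This is an illustrative sketch only: the function name, argument layout, and subscriber counts are hypothetical, and the counts are assumed to come from the trace data gathered in Phase 1.

```python
from typing import Dict, Optional


def next_best_broker(local_subscribers: int,
                     downstream: Dict[str, int]) -> Optional[str]:
    """Return the neighbor to move toward, or None if the local broker is the best spot.

    `downstream` maps each neighbor to the number of subscribers reachable through it.
    """
    total = local_subscribers + sum(downstream.values())
    for neighbor, count in downstream.items():
        # count > (all other neighbors + local)  is the same as  2 * count > total
        if 2 * count > total:
            return neighbor
    return None


# Hypothetical counts: one neighbor holds a strict majority of the subscribers.
print(next_best_broker(2, {"B2": 4, "B5": 19}))           # B5
# No neighbor outweighs everything else, so the selection stops here.
print(next_best_broker(1, {"B5": 10, "B7": 9, "B8": 5}))  # None
```

At most one neighbor can satisfy the condition, since it requires a strict majority of all subscribers, so the walk from broker to broker is unambiguous and ends at the first broker where no neighbor qualifies.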
Phase 2 – Example
  [Figure: broker overlay B1 to B8 with publisher P and subscribers, annotated with Publisher Profile Table entries]
  Message fields shown: AdvId: P; DestId: null; Broker List: B1, B5, B6
Phase 3 - Example
  [Figure: broker overlay B1 to B8 with publisher P and subscribers, annotated with the routing table updates performed during migration]
    (1) Update last hop of P to B6-x
    (1) Update last hop of P to B6; (2) Remove all S with B6 as last hop
    (1) Update last hop of P to B6; (2) Remove all S with B5 as last hop; (3) Forward (all) matching S to B5
  How to tell when all subs are processed by B6 before P can publish again? DONE
Phase 2 – Minimizing Load with Weight P%
  1. Calculate the total broker message rate if the publisher is positioned at each of the downstream brokers
  2. Normalize, sort, and drop candidates with total message rate greater than P
  3. Get publisher-to-broker ping times on remaining candidates
  4. Calculate the average delivery delay if the publisher is positioned at each of the remaining downstream brokers
  5. Select the candidate that yields the lowest average delivery delay