Published by Jordan Hodges. Modified over 9 years ago.
1
Resource Allocation Algorithms for Event-Based Enterprise Systems
PhD Thesis Presentation
PhD Candidate: Alex K. Y. Cheung
Supervisor: Hans-Arno Jacobsen
MIDDLEWARE SYSTEMS RESEARCH GROUP, University of Toronto
March 28, 2011
2
PhD Thesis Presentation, Alex Cheung © 2011
Introduction to Distributed Content-based Publish/Subscribe
[Figure: a publisher and two subscribers connected through brokers, with multicast delivery and the advertisement path, subscription path, and publication path marked]
Labels in the figure:
▫ subscriber: brand = ‘Honda’, cashback > $2000
▫ subscriber: brand = ‘Honda’, cashback > $4000
▫ publisher: brand = ‘Honda’, cashback = $6000
▫ brand = ‘Honda’, cashback >= $0
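The matching step implied by the figure can be sketched in a few lines of Python. This is an illustration only, not PADRES code: the tuple-based predicate format, the `matches` function, and the non-matching ‘Toyota’ subscription are assumptions made for the example. The ‘Honda’ values are the ones shown on the slide.

```python
import operator

# Comparison operators a predicate may use (an assumed, simplified predicate language).
OPS = {
    "=": operator.eq,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def matches(subscription, publication):
    """Return True if every (attribute, operator, value) predicate holds for the publication."""
    for attribute, op, value in subscription:
        if attribute not in publication:
            return False
        if not OPS[op](publication[attribute], value):
            return False
    return True


# Values taken from the slide.
publication = {"brand": "Honda", "cashback": 6000}
subscription_1 = [("brand", "=", "Honda"), ("cashback", ">", 2000)]
subscription_2 = [("brand", "=", "Honda"), ("cashback", ">", 4000)]
# Hypothetical subscription added here to show a non-match.
subscription_3 = [("brand", "=", "Toyota"), ("cashback", ">", 0)]

for name, sub in [("s1", subscription_1), ("s2", subscription_2), ("s3", subscription_3)]:
    print(name, matches(sub, publication))
```

In a distributed deployment each broker would run a check of this kind only against the subscriptions routed through it, which is what lets the overlay forward a publication solely toward interested subscribers.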
3
Desirable Properties of Distributed Content-based Publish/Subscribe
Decoupling of data sources and sinks
Ease of component addition and removal
Flexible routing based on message content
Efficient use of network resources
Distributed broker overlay network
Scalable
Fault tolerant
4
Applications of Publish/Subscribe
Network and systems monitoring [Mukherjee 1994]
Business activity monitoring [Fawcett et al. 1999]
Business process execution [Schuler et al. 2001]
Workflow management [Cugola et al. 2001]
Multiplayer online games [Bharambe et al. 2002]
RSS filtering [Petrovic et al. 2005; Rose et al. 2007]
Automated service composition [Hu et al. 2008]
Resource discovery [Yan et al. 2009]
5
Real Deployments of Distributed Publish/Subscribe
GooPS
▫ Google’s pub/sub messaging middleware to integrate web applications (such as Gmail, Google Docs, Google Calendar) on a world-wide scale supporting millions of users
▫ Hundreds of brokers with tens of thousands of pub/sub clients
Yahoo Message Broker
▫ Yahoo’s pub/sub middleware to integrate applications with their database system, PNUTS
SuperMontage
▫ Tibco’s pub/sub distribution network for Nasdaq’s quote and order-processing system
GDSN (Global Data Synchronization Network)
▫ A global pub/sub network that allows retailers and suppliers (e.g., Walmart, Target, Metro) to exchange timely and accurate supply chain data
6
Contributions
Load Balancing in Content-based Publish/Subscribe Systems (ACM TOCS’10)
Publisher Placement Algorithms in Content-based Publish/Subscribe (IEEE ICDCS’10)
Green Resource Allocation Algorithms in Content-based Publish/Subscribe (IEEE ICDCS’11)
7
Problem
Brokers located in different geographical areas may suffer from uneven load distribution due to
▫ Heterogeneous servers
▫ Network congestion
▫ Different densities and interests of end-users
Consequences
▫ Overloaded brokers introduce high delivery delays and may ultimately crash from running out of memory
▫ The system does not scale with the added resources
8
Visualizing the Problem
[Figure: subscribers (S) and publishers (P)]
9
Overview of Load Balancing Approach
[Figure: publishers (P) and subscribers (S), with labels for Local Load Balancing, Global Load Balancing, the offloading broker, and the load-accepting broker]
10
Evaluation
Implemented on a real open source pub/sub system called PADRES
Evaluated on PlanetLab and a cluster testbed
Local and global load balancing
Homogeneous and heterogeneous servers
Compared against a naive approach
[Figure: Global LB Setup, with brokers B10–B12, B20–B22, B30–B32, B40–B42, B50–B52, and B60–B62, publishers (P), and subscribers (S)]
11
Summary
Load balancing enables the pub/sub system to scale with the number of resources
Load balancing solutions that are unaware of subscription load and relationships are ineffective
▫ Long response time
▫ Unstable system
12
Contributions
Load Balancing in Content-based Publish/Subscribe Systems (ACM TOCS’10)
Publisher Placement Algorithms in Content-based Publish/Subscribe (IEEE ICDCS’10)
Green Resource Allocation Algorithms in Content-based Publish/Subscribe (IEEE ICDCS’11)
13
Problem
Publishers can join any broker in the overlay, or simply the closest one
Consequences
▫ High delivery delay, leading to a sluggish system
▫ High resource usage in terms of matching, network bandwidth, and subscription storage, leading to high IT costs
[Figure: publishers (P) and subscribers (S)]
14
Approach
Adaptively move the publisher to the area of matching subscribers
Two unique solutions
▫ POP (Publisher Optimistic Placement): decision is based on the average number of downstream publication deliveries
▫ GRAPE (Greedy Relocation Algorithm for Publishers of Events): decision is based on the end-to-end delivery delay, total broker message rate, and user-specified inputs including the minimization metric (load/delivery delay) and weight
[Figure: subscribers (S) and publishers (P)]
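To make the role of GRAPE's user-specified inputs concrete, here is a minimal Python sketch of a weighted broker choice. It is not the GRAPE algorithm from the thesis: the slide names the inputs (delivery delay, total broker message rate, minimization metric, weight) but not how they are combined, so the normalized weighted sum, the `pick_broker` function, and the candidate figures below are all assumptions made for illustration.

```python
def pick_broker(candidates, metric="delay", weight=1.0):
    """Pick the candidate broker with the lowest weighted score.

    candidates: dict mapping broker name -> (delivery_delay, message_rate),
                both hypothetical estimates for hosting the publisher there.
    metric:     "delay" or "load", the metric the user wants to prioritize.
    weight:     emphasis on the prioritized metric, in [0, 1] (assumed scale).
    """
    if metric not in ("delay", "load"):
        raise ValueError("metric must be 'delay' or 'load'")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")

    # Normalize each metric by its maximum so the two are comparable.
    max_delay = max(delay for delay, _ in candidates.values()) or 1.0
    max_rate = max(rate for _, rate in candidates.values()) or 1.0

    def score(item):
        delay, rate = item[1]
        norm_delay, norm_rate = delay / max_delay, rate / max_rate
        if metric == "delay":
            primary, secondary = norm_delay, norm_rate
        else:
            primary, secondary = norm_rate, norm_delay
        return weight * primary + (1.0 - weight) * secondary

    return min(candidates.items(), key=score)[0]


# Hypothetical estimates: (delivery delay in ms, total broker message rate in msg/s).
candidates = {"B1": (120.0, 900.0), "B2": (40.0, 1500.0), "B3": (80.0, 600.0)}

print(pick_broker(candidates, metric="delay", weight=1.0))
print(pick_broker(candidates, metric="load", weight=1.0))
print(pick_broker(candidates, metric="delay", weight=0.5))
```

With full weight on one metric the choice ignores the other entirely. Intermediate weights trade the two off, which is the kind of control the slide attributes to GRAPE's inputs.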
15
Evaluation
Implemented on the open source pub/sub system called PADRES
Evaluated on PlanetLab and a cluster testbed
Enterprise and random workloads
Results
▫ Reduced delivery delay by up to 68%
▫ Reduced message rate by up to 85%
16
Summary
POP is suitable for pub/sub systems that strive for simplicity, such as GooPS
GRAPE is suitable for systems that need to minimize one metric to the extreme, such as system load in sensor networks or delivery delay in SuperMontage
17
Contributions
Load Balancing in Content-based Publish/Subscribe Systems (ACM TOCS’10)
Publisher Placement Algorithms in Content-based Publish/Subscribe (IEEE ICDCS’10)
Green Resource Allocation Algorithms in Content-based Publish/Subscribe (IEEE ICDCS’11)
18
Problem
What is the deployment strategy for the broker overlay, publisher assignment, and subscriber assignment that minimizes the broker message rate and the number of allocated brokers?
Proven to be an NP-complete problem
Benefits
▫ Increased capacity of the system
▫ More efficient energy usage of the allocated servers
▫ Fewer servers mean lower investment and maintenance costs
▫ In line with Green IT, which enterprises such as Google and Yahoo are currently pursuing
19
Approach
3-phase design
▫ Phase 1: Record the publications delivered to each subscription into bit vectors
▫ Phase 2: Use information from the bit vectors to allocate subscriptions to brokers using one of 10 algorithms
▫ Phase 3: Construct the broker overlay with 3 optimization techniques and deploy the new configuration
Most compelling properties
▫ Language independent: content-based (XPath, regex, ranged, SQL, composite subscriptions, etc.) and topic-based, such as GooPS
▫ Works effectively under any workload (defined or undefined)
20
Phase 1: Subscription Profiling
Profile of each subscriber per advertisement, maintained at the subscriber’s first broker
[Figure: a subscription profile made up of the message ID of the first index (B34-M213) and a bit vector (01010101010101), shown next to the publications delivered to the subscription: B34-M213, B34-M215, B34-M216, B34-M217, B34-M220, B34-M222, B34-M225, B34-M226]
Cardinality of the bit vector corresponds to the bandwidth requirement of the subscription
Used to compute the “closeness” between any two subscriptions in the clustering algorithm: closeness = |s_i ∩ s_j|
The bit vector has a fixed size, so shift left if the next publication is out of the bit vector’s range
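The profile described on this slide can be sketched in Python as follows. This is an assumed, simplified model rather than the thesis implementation: message IDs are reduced to integer sequence numbers from a single publisher (the slide uses IDs such as B34-M213), and the class and function names are hypothetical. A Python integer holds the bit vector, with bit k standing for message `first_id + k`. Under that layout, dropping the oldest entries is a right shift, which corresponds to the slide's "shift left" when index 0 is drawn on the left.

```python
class SubscriptionProfile:
    """Fixed-size bit vector recording which publications reached a subscription."""

    def __init__(self, size):
        self.size = size        # number of message IDs the vector can cover
        self.first_id = None    # message ID represented by bit 0
        self.bits = 0           # bit k set <=> message first_id + k was delivered

    def record(self, msg_id):
        """Mark msg_id as delivered, sliding the window forward if needed."""
        if self.first_id is None:
            self.first_id = msg_id
        offset = msg_id - self.first_id
        if offset < 0:
            return              # older than the window; ignored in this sketch
        if offset >= self.size:
            shift = offset - self.size + 1
            self.bits >>= shift           # drop the oldest entries
            self.first_id += shift
            offset -= shift
        self.bits |= 1 << offset

    def cardinality(self):
        """Number of delivered publications in the window (bandwidth estimate)."""
        return bin(self.bits).count("1")


def closeness(a, b):
    """closeness = |s_i ∩ s_j|: publications delivered to both subscriptions."""
    if a.first_id is None or b.first_id is None:
        return 0
    base = max(a.first_id, b.first_id)    # align both vectors on a common first ID
    a_bits = a.bits >> (base - a.first_id)
    b_bits = b.bits >> (base - b.first_id)
    return bin(a_bits & b_bits).count("1")


# Sequence numbers of the publications listed on the slide.
s_i = SubscriptionProfile(size=16)
for msg in (213, 215, 216, 217, 220, 222, 225, 226):
    s_i.record(msg)

# A second, hypothetical subscription that shares three of those publications.
s_j = SubscriptionProfile(size=16)
for msg in (215, 216, 220, 230):
    s_j.record(msg)

print(s_i.cardinality(), closeness(s_i, s_j))
```

Because the profile only records which message IDs were delivered, it never inspects the subscription itself, which is why the approach is independent of the subscription language.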
21
Phase 2: Subscription Allocation Algorithms
MANUAL/(AUTOMATIC)
▫ Tree with fanout of 2, manual (random) placement of clients
Fastest Broker First (FBF)
▫ Assign subscriptions randomly to the next most powerful broker
Bin Packing
▫ Like FBF, but assigns the next highest traffic subscription
PAIRWISE-N, PAIRWISE-K (related approaches in ICDCS’02)
▫ Subscription clustering where the number of clusters is given
CRAM (Clustering with Resource Awareness and Minimization)
▫ Dynamically determines the number of clusters
▫ Utilizes a new clustering algorithm that is more effective
▫ Evaluated with 4 different subscription closeness metrics, with one derived from Banavar et al. in ICDCS’99
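The difference between FBF and Bin Packing can be illustrated with a short Python sketch. It rests on several assumptions not stated on the slide: each broker has a single scalar capacity, each subscription has a single scalar traffic value (for instance derived from its bit vector cardinality), and a subscription goes to the first broker, in order of decreasing capacity, that still has room. The function name and the numbers are hypothetical.

```python
import random


def allocate(subscriptions, brokers, by_traffic=True, rng=None):
    """Assign subscriptions to brokers, most powerful broker first.

    subscriptions: dict name -> traffic requirement (assumed scalar).
    brokers:       dict name -> capacity (assumed scalar, same unit).
    by_traffic:    True  -> Bin Packing style (highest traffic subscription first);
                   False -> FBF style (subscriptions taken in random order).
    Returns a dict broker -> list of subscriptions, omitting unused brokers.
    """
    if by_traffic:
        order = sorted(subscriptions.items(), key=lambda kv: kv[1], reverse=True)
    else:
        order = list(subscriptions.items())
        (rng or random).shuffle(order)

    # Brokers ranked from most to least powerful; dicts keep insertion order.
    ranked = sorted(brokers.items(), key=lambda kv: kv[1], reverse=True)
    remaining = dict(ranked)
    allocation = {name: [] for name, _ in ranked}

    for sub, load in order:
        for name in remaining:
            if remaining[name] >= load:
                allocation[name].append(sub)
                remaining[name] -= load
                break
        else:
            raise ValueError(f"no broker has capacity left for {sub}")

    return {name: subs for name, subs in allocation.items() if subs}


# Hypothetical capacities and traffic requirements.
brokers = {"B1": 10, "B2": 6, "B3": 6}
subscriptions = {"s1": 5, "s2": 4, "s3": 3, "s4": 2, "s5": 1}

print(allocate(subscriptions, brokers, by_traffic=True))
print(allocate(subscriptions, brokers, by_traffic=False, rng=random.Random(0)))
```

In this example the traffic-ordered variant leaves broker B3 unallocated, which is the resource saving the allocation phase is after. Neither variant considers which subscriptions receive the same publications; that is the gap the clustering-based algorithms address.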
22
Bin Packing
[Figure: subscriptions (S)]
23
Bin Packing’s Allocation Result
[Figure: subscriptions (S)]
24
Phase 3: Broker Overlay Construction
[Figure: subscriptions (S)]
25
Bin Packing’s Final Overlay
[Figure: subscriptions (S) and publishers (P), with two labels reading (GRAPE)]
26
Evaluation
Implemented on the PADRES open source content-based pub/sub project
Evaluated on a cluster testbed with 80 brokers
Evaluated on SciNet, an HPC system, with 1000 brokers
Compared against two related works (Riabov et al., ICDCS’02; Banavar et al., ICDCS’99)
Homogeneous and heterogeneous scenarios
Workload saturates the initial deployment (MANUAL)
27
Evaluation Results on SciNet
Reduced message rate by up to 92%
Reduced number of allocated brokers by up to 91%
28
Summary
CRAM combines the benefits of
▫ Subscription clustering
▫ Resource awareness from Bin Packing
by simultaneously reducing both
▫ Broker message rates
▫ Number of allocated brokers
Bit vectors are powerful
▫ Language independent (XPath, regex, topics)
▫ Effective with any workload distribution
29
Conclusions
Load balancing increases
▫ Availability by circumventing overloads
▫ Scalability of the system
Publisher placement algorithms reduce
▫ Broker input load by up to 68%
▫ Broker message rate by up to 85%
▫ Delivery delay by up to 68%
Resource allocation algorithms reduce
▫ Average broker message rate by up to 92%
▫ Number of allocated brokers by up to 91%
30
Future Work
Self-tuning of load balancing parameters
React dynamically by growing and shrinking the network in incremental steps
Improve runtime of the CRAM algorithm by parallelization or reducing its computational complexity
Model workload with more sophisticated methods, such as stochastic processes, to improve accuracy of load estimation
Address fault resiliency in each approach
31
Q & A
32
Related Works - Clustering
Riabov et al. (ICDCS’02)
▫ The number of clusters K is pre-specified
▫ Each cluster is a multicast address, thus there is no upper limit on its size
▫ Event space is divided into grids
▫ Supports only ranged subscriptions
▫ Their pairwise clustering considers each subscription individually
Gryphon (ICDCS’99)
▫ Supports only equal and * subscriptions
▫ Each cluster is stored in memory, so an upper bound on its size is not a major concern
SUB-2-SUB (IPTPS’06)
▫ Supports only ranged subscriptions
▫ Each cluster is a p2p network, thus there is no upper limit on the cluster size
33
Related Works – Broker Overlay Construction, Publisher and Subscriber Placement Algorithms
Baldoni et al. (The Computer Journal), Jaeger et al. (SAC’07), Migliavacca et al. (DEBS’07)
▫ Reconfigure broker overlay to reduce delivery delay and broker processing load
Cheung et al. (Middleware’06, ICDCS’10)
▫ Load balancing by relocating subscriber clients
▫ Reduce delivery delay and broker processing load by relocating publisher clients