Slide 1 (04/18/2005): Flux: An Adaptive Partitioning Operator for Continuous Query Systems
M.A. Shah, J.M. Hellerstein, S. Chandrasekaran, M.J. Franklin (UC Berkeley)
Presenter: Bradley Momberger
Slide 2: Overview
● Introduction
● Background
● Experiments and Considerations
● Conclusion
Slide 3: Introduction
● Continuous query (CQ) systems
– Create unbounded, streaming results from unbounded, streaming data sources.
– May in the long run have scalability issues, due to the need for fast response times, the possibility of large numbers of users, and the management of potentially large histories.
– Are only as fast as their constituent operators allow.
Slide 4: Parallelism
● Traditional parallelism techniques
– Poor fit for CQ systems
– Not adaptive
● CQ requires adaptability to changing conditions
Slide 5: Overview
● Introduction
● Background
● Experiments and Considerations
● Conclusion
Slide 6: Background
● Exchange
– Producer-consumer pair
– Ex-Prod: intermediate producer instance connected to consumers
– Ex-Cons: intermediate consumer instance that polls inputs from all producers
– "Content-sensitive" routing
● RiverDQ
– "Content-insensitive" routing
– Random choice of Ex-Cons target (see the routing sketch below)
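To make the routing distinction concrete, here is a minimal Python sketch (not from the paper; the class names and the consumer `enqueue` method are illustrative assumptions) contrasting content-sensitive hash routing in the style of Exchange with content-insensitive random routing in the style of RiverDQ.

```python
import random

class ExchangeProducer:
    """Content-sensitive routing: a tuple's partitioning attribute
    determines which Ex-Cons instance must receive it (hash partitioning)."""
    def __init__(self, consumers, key_fn):
        self.consumers = consumers      # list of Ex-Cons endpoints
        self.key_fn = key_fn            # extracts the partitioning attribute

    def route(self, tup):
        idx = hash(self.key_fn(tup)) % len(self.consumers)
        self.consumers[idx].enqueue(tup)

class RiverDQProducer:
    """Content-insensitive routing: any consumer may receive any tuple,
    so the producer can simply pick a target at random."""
    def __init__(self, consumers):
        self.consumers = consumers

    def route(self, tup):
        random.choice(self.consumers).enqueue(tup)
```

The distinction matters for the rest of the talk: stateful operators such as group-by need content-sensitive routing, which is exactly what makes them vulnerable to skew across consumers.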
Slide 7: Flux
● Flux (Fault-tolerant Load-balancing eXchange)
– Load balancing through active repartitioning
– Producer-consumer pair
– Buffering and reordering
– Detection of imbalances
Slide 8: Short-Term Imbalances
● A stage runs only as fast as its slowest Ex-Cons
– Head-of-line blocking
– Uneven distribution over time
● The Flux-Prod solution (see the sketch below)
– Transient skew buffer: a hashtable buffer between the producer and Flux-Prod
– Get new tuples for each Flux-Cons as buffer space becomes available
– On-demand input reordering
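A minimal sketch of the transient-skew-buffer idea (illustrative only; the buffer layout, capacities, and method names are assumptions, not the paper's interface): the producer keeps a small per-destination buffer, refills whichever destinations still have space, and keeps feeding the fast consumers while a briefly slow one catches up, which avoids head-of-line blocking.

```python
from collections import deque

class TransientSkewBuffer:
    """Per-destination buffering between the producer and Flux-Prod.
    Tuples for a temporarily slow consumer wait in its slot while tuples
    for the other consumers keep flowing (on-demand input reordering)."""
    def __init__(self, num_consumers, slots_per_consumer):
        self.buffers = [deque() for _ in range(num_consumers)]
        self.capacity = slots_per_consumer

    def offer(self, dest, tup):
        """Producer side: accept a tuple only if its destination has space."""
        if len(self.buffers[dest]) < self.capacity:
            self.buffers[dest].append(tup)
            return True
        return False        # that destination is full; producer must wait

    def take(self, dest):
        """Flux-Prod side: pull the next tuple for a consumer that is ready."""
        return self.buffers[dest].popleft() if self.buffers[dest] else None
```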
Slide 9: Flux-Prod Design [figure]
Slide 10: Long-Term Imbalances
● Long-term skew eventually overloads the fixed-size buffers
– Cannot use the same strategy as for short-term imbalances
● The Flux-Cons solution
– Repartition at the consumer level
– Move partition states between consumers
– Aim for maximal benefit per state moved (see the sketch below)
– Avoid "thrashing"
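The paper's actual controller is more involved; the sketch below (helper names, the load bookkeeping, and the threshold are assumptions) only illustrates the flavor of consumer-level repartitioning: move a partition from the most loaded to the least loaded Flux-Cons, prefer the partition that relieves the most load per byte of state shipped, and only act when the imbalance exceeds a threshold so the controller does not thrash.

```python
def pick_partition_to_move(load_by_consumer, partition_loads, state_sizes,
                           threshold=0.10):
    """Pick one partition to move from the most to the least loaded Flux-Cons.

    load_by_consumer: {consumer_id: total load}
    partition_loads:  {consumer_id: {partition_id: load}}
    state_sizes:      {partition_id: bytes of state that would be shipped}
    Returns (partition_id, src, dst), or None if the stage is balanced enough.
    """
    src = max(load_by_consumer, key=load_by_consumer.get)   # most loaded
    dst = min(load_by_consumer, key=load_by_consumer.get)   # least loaded
    imbalance = load_by_consumer[src] - load_by_consumer[dst]
    if imbalance <= threshold * load_by_consumer[src]:
        return None        # small imbalance: moving now would risk thrashing

    # "Maximal benefit per state moved": load relieved per byte shipped,
    # restricted to moves that do not simply swap which side is overloaded.
    candidates = {pid: load / state_sizes[pid]
                  for pid, load in partition_loads[src].items()
                  if load < imbalance}
    if not candidates:
        return None
    best = max(candidates, key=candidates.get)
    return best, src, dst
```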
Slide 11: Flux-Cons Design [figure]
Slide 12: Memory-Constrained Environment
● First tests were done with adequate memory
– Does not necessarily reflect reality
– Memory shortages arise from large histories and extra operators
● Load shedding with little memory
– Push state to disk
– Move state to another site
– Decrease history size
● May not be acceptable in some applications
Slide 13: Flux and Constrained Memory
● Dual-destination repartitioning
– Other machines
– Disk storage
● Local mechanism (see the sketch below)
– Flux-Cons spills to disk when memory is low
– Retrieves from disk when memory becomes available
● Global memory-constrained repartitioning
– Poll Flux-Cons operators for memory usage
– Repartition based on the results
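A minimal sketch of the local spill mechanism (the file format, victim selection, and interface are assumptions for illustration, not the paper's design): when resident partition state exceeds the memory budget, the Flux-Cons writes partitions to disk, and it reloads them when they are needed again or when memory frees up.

```python
import os, pickle, tempfile

class SpillingFluxCons:
    """Sketch of a Flux-Cons that spills partition state to disk under
    memory pressure and reloads it when memory becomes available."""
    def __init__(self, memory_budget_bytes):
        self.budget = memory_budget_bytes
        self.resident = 0
        self.in_memory = {}   # partition_id -> (state, size_bytes)
        self.on_disk = {}     # partition_id -> spill file path

    def put(self, pid, state):
        size = len(pickle.dumps(state))
        self.in_memory[pid] = (state, size)
        self.resident += size
        self._spill_if_needed()

    def _spill_if_needed(self):
        # Spill the oldest resident partitions until we are back under budget;
        # a real policy would pick cold partitions first.
        while self.resident > self.budget and self.in_memory:
            pid, (state, size) = next(iter(self.in_memory.items()))
            fd, path = tempfile.mkstemp(suffix=".part")
            with os.fdopen(fd, "wb") as f:
                pickle.dump(state, f)
            del self.in_memory[pid]
            self.resident -= size
            self.on_disk[pid] = path

    def get(self, pid):
        """Reload a spilled partition on demand (or when memory frees up)."""
        if pid not in self.in_memory and pid in self.on_disk:
            path = self.on_disk.pop(pid)
            with open(path, "rb") as f:
                state = pickle.load(f)
            os.remove(path)
            self.put(pid, state)
        entry = self.in_memory.get(pid)
        return entry[0] if entry else None
```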
Slide 14: Memory-Adaptive Flux-Cons [figure]
Slide 15: Overview
● Introduction
● Background
● Experiments and Considerations
● Conclusion
Slide 16: Experimental Methodology
● Example operator
– Hash-based, windowed group-by-aggregate
– Computes a statistic over a fixed-size history (see the sketch below)
● Cluster hardware
– CPU: 1000 MIPS
– 1 GB main memory
● Network simulation
– 1 KB packet size, infinite bandwidth, 0.07 ms latency
– Virtual machines, simulated disk
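For concreteness, here is a minimal Python sketch of a hash-based, windowed group-by-aggregate in the spirit of the example operator; the grouping key, window length, and choice of average as the statistic are illustrative assumptions, not the paper's exact workload.

```python
from collections import defaultdict, deque

class WindowedGroupByAvg:
    """Maintains a fixed-size history of values per group (the group key is
    also the hash-partitioning attribute) and reports a running statistic,
    here the average, over that window."""
    def __init__(self, window_size):
        self.window_size = window_size
        self.histories = defaultdict(deque)   # group key -> recent values

    def insert(self, key, value):
        hist = self.histories[key]
        hist.append(value)
        if len(hist) > self.window_size:      # fixed-size history: evict oldest
            hist.popleft()
        return sum(hist) / len(hist)          # current aggregate for this group

# Example usage (hypothetical stream):
# op = WindowedGroupByAvg(window_size=10_000)
# running_avg = op.insert("sensor-7", 3.2)
```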
Slide 17: Experimental Methodology
● Simulator
– TelegraphCQ base system
– Operators share a physical CPU with the event simulator
– Aggregate evaluation and the scheduler are simulated
● Testbed
– Single producer-consumer stage
– 32 nodes in the simulated cluster
– The Ex-Cons operator dictates performance
Slide 18: Short-Term Imbalance Experiment
● Give the Flux stage a transient skew buffer
– Compare to a base Exchange stage with equivalent buffer space
● Comparison statistics
– 500 ms of load per virtual machine, assigned round-robin
– Simulated process: 0.1 ms processing, 0.05 ms sleep
– 16 s runtime (32 machines × 0.5 s/machine)
Slide 19: Short-Term Imbalance Experiment [figure]
Slide 20: Long-Term Imbalance Experiment
● Operator stage
– 64 partitions per virtual machine
– 10,000-tuple (800 KB) history per partition
– 160 KB skew buffer
– 0.2 μs per tuple for partition processing
● Network
– 500 Mbps throughput for partitions
– 250 Mbps point-to-point
Slide 21: Balancing Processing Load [figure]
Slide 22: Graceful Degradation [figure]
Slide 23: Varying Collection Time [figure]
Slide 24: Memory-Constrained Experiments
● Memory "pressure"
– 768 MB initial memory load (6 MB/partition × 128 partitions/machine)
– Available memory reduced to 512 MB (down from 1 GB) after 1 s of simulation
– 14 s required to push the remaining 256 MB, whether to disk or to other machines
Slide 25: Throughput during Memory Balancing [figure]
Slide 26: Avg. Latency during Memory Balancing [figure]
Slide 27: Average Latency Degradation [figure]
Slide 28: Hybrid Policy
● Combines the previous policies (see the sketch below)
– Memory-based policy when partitions are on disk, to minimize latency
– Load-balancing policy when all partitions are in memory, to maximize throughput
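A minimal sketch of the hybrid decision (the controller interface and callable policies are illustrative assumptions, not the paper's API): while any partition state sits on disk, run the memory-based policy so tuples stop paying for disk reads; once everything is resident again, switch to the load-balancing policy.

```python
def hybrid_controller_step(consumers, memory_policy, load_policy):
    """One controller round of the hybrid policy.

    consumers: iterable of Flux-Cons stubs exposing partitions_on_disk()
    memory_policy / load_policy: callables that, given the consumers, return
    a list of (partition, src, dst) moves to perform this round."""
    any_spilled = any(c.partitions_on_disk() for c in consumers)
    if any_spilled:
        # Some state is on disk: move spilled state toward machines with
        # free memory so per-tuple latency recovers quickly.
        return memory_policy(consumers)
    # All partitions resident: balance processing load for throughput.
    return load_policy(consumers)
```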
Slide 29: Comparative Review [figure: last 20 seconds of simulation vs. steady state]
Slide 30: Overview
● Introduction
● Background
● Experiments and Considerations
● Conclusion
Slide 31: Conclusions
● Flux
– Is a reusable mechanism
– Encapsulates adaptive repartitioning
– Extends the Exchange operator
– Alleviates short- and long-term imbalances
– Outperforms static partitioning when correcting imbalances
– Can use hybrid policies to adapt to changing processing and memory requirements