Slide 1 (04/18/2005): Flux: An Adaptive Partitioning Operator for Continuous Query Systems
M.A. Shah, J.M. Hellerstein, S. Chandrasekaran, M.J. Franklin (UC Berkeley)
Presenter: Bradley Momberger
Slide 2: Overview
● Introduction
● Background
● Experiments and Considerations
● Conclusion
Slide 3: Introduction
● Continuous query (CQ) systems
– Create unbounded, streaming results from unbounded, streaming data sources.
– May in the long run have scalability issues, due to the need for fast response times, the possibility of large numbers of users, and the management of potentially large histories.
– Are only as fast as their constituent operators allow.
Slide 4: Parallelism
● Traditional parallelism techniques
– Poor fit for CQ systems
– Not adaptive
● CQ requires adaptability to changing conditions
Slide 5: Overview
● Introduction
● Background
● Experiments and Considerations
● Conclusion
Slide 6: Background
● Exchange
– Producer-consumer pair
– Ex-Prod: intermediate producer instance connected to consumers
– Ex-Cons: intermediate consumer instance that polls inputs from all producers
– "Content-sensitive" routing
● RiverDQ
– "Content-insensitive" routing
– Random choice of Ex-Cons target (see the routing sketch below)
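To make the routing distinction concrete, here is a minimal Python sketch (not from the paper; the class names and the consumer `enqueue` method are illustrative assumptions) contrasting content-sensitive hash routing in the style of Exchange with content-insensitive random routing in the style of RiverDQ.

```python
import random

class ExchangeProducer:
    """Content-sensitive routing: a tuple's partitioning attribute
    determines which Ex-Cons instance must receive it (hash partitioning)."""
    def __init__(self, consumers, key_fn):
        self.consumers = consumers      # list of Ex-Cons endpoints
        self.key_fn = key_fn            # extracts the partitioning attribute

    def route(self, tup):
        idx = hash(self.key_fn(tup)) % len(self.consumers)
        self.consumers[idx].enqueue(tup)

class RiverDQProducer:
    """Content-insensitive routing: any consumer may receive any tuple,
    so the producer can simply pick a target at random."""
    def __init__(self, consumers):
        self.consumers = consumers

    def route(self, tup):
        random.choice(self.consumers).enqueue(tup)
```

The distinction matters for the rest of the talk: stateful operators such as group-by need content-sensitive routing, which is exactly what makes them vulnerable to skew across consumers.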
Slide 7: Flux
● Flux (Fault-tolerant Load-balancing eXchange)
– Load balancing through active repartitioning
– Producer-consumer pair
– Buffering and reordering
– Detection of imbalances
Slide 8: Short-Term Imbalances
● A stage runs only as fast as its slowest Ex-Cons
– Head-of-line blocking
– Uneven distribution over time
● The Flux-Prod solution (see the sketch below)
– Transient skew buffer: a hashtable buffer between the producer and Flux-Prod
– Get new tuples for each Flux-Cons as buffer space becomes available
– On-demand input reordering
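A minimal sketch of the transient-skew-buffer idea (illustrative only; the buffer layout, capacities, and method names are assumptions, not the paper's interface): the producer keeps a small per-destination buffer, refills whichever destinations still have space, and keeps feeding the fast consumers while a briefly slow one catches up, which avoids head-of-line blocking.

```python
from collections import deque

class TransientSkewBuffer:
    """Per-destination buffering between the producer and Flux-Prod.
    Tuples for a temporarily slow consumer wait in its slot while tuples
    for the other consumers keep flowing (on-demand input reordering)."""
    def __init__(self, num_consumers, slots_per_consumer):
        self.buffers = [deque() for _ in range(num_consumers)]
        self.capacity = slots_per_consumer

    def offer(self, dest, tup):
        """Producer side: accept a tuple only if its destination has space."""
        if len(self.buffers[dest]) < self.capacity:
            self.buffers[dest].append(tup)
            return True
        return False        # that destination is full; producer must wait

    def take(self, dest):
        """Flux-Prod side: pull the next tuple for a consumer that is ready."""
        return self.buffers[dest].popleft() if self.buffers[dest] else None
```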
Slide 9: Flux-Prod Design [figure]
Slide 10: Long-Term Imbalances
● Long-term skew eventually overloads the fixed-size buffers
– Cannot use the same strategy as for short-term imbalances
● The Flux-Cons solution
– Repartition at the consumer level
– Move partition states between consumers
– Aim for maximal benefit per state moved (see the sketch below)
– Avoid "thrashing"
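The paper's actual controller is more involved; the sketch below (helper names, the load bookkeeping, and the threshold are assumptions) only illustrates the flavor of consumer-level repartitioning: move a partition from the most loaded to the least loaded Flux-Cons, prefer the partition that relieves the most load per byte of state shipped, and only act when the imbalance exceeds a threshold so the controller does not thrash.

```python
def pick_partition_to_move(load_by_consumer, partition_loads, state_sizes,
                           threshold=0.10):
    """Pick one partition to move from the most to the least loaded Flux-Cons.

    load_by_consumer: {consumer_id: total load}
    partition_loads:  {consumer_id: {partition_id: load}}
    state_sizes:      {partition_id: bytes of state that would be shipped}
    Returns (partition_id, src, dst), or None if the stage is balanced enough.
    """
    src = max(load_by_consumer, key=load_by_consumer.get)   # most loaded
    dst = min(load_by_consumer, key=load_by_consumer.get)   # least loaded
    imbalance = load_by_consumer[src] - load_by_consumer[dst]
    if imbalance <= threshold * load_by_consumer[src]:
        return None        # small imbalance: moving now would risk thrashing

    # "Maximal benefit per state moved": load relieved per byte shipped,
    # restricted to moves that do not simply swap which side is overloaded.
    candidates = {pid: load / state_sizes[pid]
                  for pid, load in partition_loads[src].items()
                  if load < imbalance}
    if not candidates:
        return None
    best = max(candidates, key=candidates.get)
    return best, src, dst
```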
Slide 11: Flux-Cons Design [figure]
Slide 12: Memory-Constrained Environment
● First tests were done with adequate memory
– Does not necessarily reflect reality
– Memory shortages arise from large histories and extra operators
● Load shedding with little memory
– Push state to disk
– Move state to another site
– Decrease history size
● May not be acceptable in some applications
Slide 13: Flux and Constrained Memory
● Dual-destination repartitioning
– Other machines
– Disk storage
● Local mechanism (see the sketch below)
– Flux-Cons spills to disk when memory is low
– Retrieves from disk when memory becomes available
● Global memory-constrained repartitioning
– Poll Flux-Cons operators for memory usage
– Repartition based on the results
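A minimal sketch of the local spill mechanism (the file format, victim selection, and interface are assumptions for illustration, not the paper's design): when resident partition state exceeds the memory budget, the Flux-Cons writes partitions to disk, and it reloads them when they are needed again or when memory frees up.

```python
import os, pickle, tempfile

class SpillingFluxCons:
    """Sketch of a Flux-Cons that spills partition state to disk under
    memory pressure and reloads it when memory becomes available."""
    def __init__(self, memory_budget_bytes):
        self.budget = memory_budget_bytes
        self.resident = 0
        self.in_memory = {}   # partition_id -> (state, size_bytes)
        self.on_disk = {}     # partition_id -> spill file path

    def put(self, pid, state):
        size = len(pickle.dumps(state))
        self.in_memory[pid] = (state, size)
        self.resident += size
        self._spill_if_needed()

    def _spill_if_needed(self):
        # Spill the oldest resident partitions until we are back under budget;
        # a real policy would pick cold partitions first.
        while self.resident > self.budget and self.in_memory:
            pid, (state, size) = next(iter(self.in_memory.items()))
            fd, path = tempfile.mkstemp(suffix=".part")
            with os.fdopen(fd, "wb") as f:
                pickle.dump(state, f)
            del self.in_memory[pid]
            self.resident -= size
            self.on_disk[pid] = path

    def get(self, pid):
        """Reload a spilled partition on demand (or when memory frees up)."""
        if pid not in self.in_memory and pid in self.on_disk:
            path = self.on_disk.pop(pid)
            with open(path, "rb") as f:
                state = pickle.load(f)
            os.remove(path)
            self.put(pid, state)
        entry = self.in_memory.get(pid)
        return entry[0] if entry else None
```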
Slide 14: Memory-Adaptive Flux-Cons [figure]
Slide 15: Overview
● Introduction
● Background
● Experiments and Considerations
● Conclusion
Slide 16: Experimental Methodology
● Example operator
– Hash-based, windowed group-by-aggregate
– Computes a statistic over a fixed-size history (see the sketch below)
● Cluster hardware
– CPU: 1000 MIPS
– 1 GB main memory
● Network simulation
– 1 KB packet size, infinite bandwidth, 0.07 ms latency
– Virtual machines, simulated disk
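For concreteness, here is a minimal Python sketch of a hash-based, windowed group-by-aggregate in the spirit of the example operator; the grouping key, window length, and choice of average as the statistic are illustrative assumptions, not the paper's exact workload.

```python
from collections import defaultdict, deque

class WindowedGroupByAvg:
    """Maintains a fixed-size history of values per group (the group key is
    also the hash-partitioning attribute) and reports a running statistic,
    here the average, over that window."""
    def __init__(self, window_size):
        self.window_size = window_size
        self.histories = defaultdict(deque)   # group key -> recent values

    def insert(self, key, value):
        hist = self.histories[key]
        hist.append(value)
        if len(hist) > self.window_size:      # fixed-size history: evict oldest
            hist.popleft()
        return sum(hist) / len(hist)          # current aggregate for this group

# Example usage (hypothetical stream):
# op = WindowedGroupByAvg(window_size=10_000)
# running_avg = op.insert("sensor-7", 3.2)
```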
Slide 17: Experimental Methodology
● Simulator
– TelegraphCQ base system
– Operators share a physical CPU with the event simulator
– Aggregate evaluation and the scheduler are simulated
● Testbed
– Single producer-consumer stage
– 32 nodes in the simulated cluster
– The Ex-Cons operator dictates performance
Slide 18: Short-Term Imbalance Experiment
● Give the Flux stage a transient skew buffer
– Compare to a base Exchange stage with equivalent buffer space
● Comparison statistics
– 500 ms of load per virtual machine, assigned round-robin
– Simulated process: 0.1 ms processing, 0.05 ms sleep
– 16 s runtime (32 machines × 0.5 s/machine)
Slide 19: Short-Term Imbalance Experiment [figure]
Slide 20: Long-Term Imbalance Experiment
● Operator stage
– 64 partitions per virtual machine
– 10,000-tuple (800 KB) history per partition
– 160 KB skew buffer
– 0.2 μs per tuple for partition processing
● Network
– 500 Mbps throughput for partitions
– 250 Mbps point-to-point
Slide 21: Balancing Processing Load [figure]
Slide 22: Graceful Degradation [figure]
Slide 23: Varying Collection Time [figure]
Slide 24: Memory-Constrained Experiments
● Memory "pressure"
– 768 MB initial memory load (6 MB/partition × 128 partitions/machine)
– Available memory reduced to 512 MB (down from 1 GB) after 1 s of simulation
– 14 s required to push the remaining 256 MB, whether to disk or to other machines
Slide 25: Throughput during Memory Balancing [figure]
Slide 26: Avg. Latency during Memory Balancing [figure]
Slide 27: Average Latency Degradation [figure]
Slide 28: Hybrid Policy
● Combines the previous policies (see the sketch below)
– Memory-based policy when partitions are on disk, to minimize latency
– Load-balancing policy when all partitions are in memory, to maximize throughput
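A minimal sketch of the hybrid decision (the controller interface and callable policies are illustrative assumptions, not the paper's API): while any partition state sits on disk, run the memory-based policy so tuples stop paying for disk reads; once everything is resident again, switch to the load-balancing policy.

```python
def hybrid_controller_step(consumers, memory_policy, load_policy):
    """One controller round of the hybrid policy.

    consumers: iterable of Flux-Cons stubs exposing partitions_on_disk()
    memory_policy / load_policy: callables that, given the consumers, return
    a list of (partition, src, dst) moves to perform this round."""
    any_spilled = any(c.partitions_on_disk() for c in consumers)
    if any_spilled:
        # Some state is on disk: move spilled state toward machines with
        # free memory so per-tuple latency recovers quickly.
        return memory_policy(consumers)
    # All partitions resident: balance processing load for throughput.
    return load_policy(consumers)
```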
Slide 29: Comparative Review [figure: last 20 seconds of simulation vs. steady state]
Slide 30: Overview
● Introduction
● Background
● Experiments and Considerations
● Conclusion
Slide 31: Conclusions
● Flux
– Is a reusable mechanism
– Encapsulates adaptive repartitioning
– Extends the Exchange operator
– Alleviates short- and long-term imbalances
– Outperforms static partitioning when correcting imbalances
– Can use hybrid policies to adapt to changing processing and memory requirements