Online parameter optimization for elastic data stream processing
Heinze et al.
Scribed by Tao Feng, 02/18/2016
Recap
- Introduces an elastic scaling data stream processing system that achieves an efficient balance between monetary cost and quality of service (latency)
  - Threshold-based elastic scaling
  - Runtime and automated operator placement
- Online parameter optimization
  - Parameter search scheme
  - Cost function
  - Online profiler
    - When to trigger the optimization
    - Size of the utilization history
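
The recap items above can be illustrated with a minimal sketch: a threshold-based scaling rule, a cost function evaluated by replaying a utilization history, and a search over threshold pairs. Everything here is an assumption for illustration: the function names, the cost model (hosts used plus a fixed overload penalty), and the exhaustive search are hypothetical stand-ins, not the paper's cost function or search scheme.

```python
from itertools import product


def scaling_decision(utilization, lower, upper):
    """Threshold-based rule: scale out above `upper`, scale in below `lower`."""
    if utilization > upper:
        return 1    # add a host
    if utilization < lower:
        return -1   # release a host
    return 0        # keep the current allocation


def replay_cost(history, lower, upper, start_hosts=1, overload_penalty=10.0):
    """Replay a load history under one threshold pair.

    `history` holds loads in units of one host's capacity. The returned cost
    is hypothetical: one unit per host per interval, plus `overload_penalty`
    for every interval in which the allocated hosts were overloaded.
    """
    hosts = start_hosts
    cost = 0.0
    for load in history:
        utilization = load / hosts
        cost += hosts
        if utilization > 1.0:
            cost += overload_penalty
        hosts = max(1, hosts + scaling_decision(utilization, lower, upper))
    return cost


def search_thresholds(history, lowers, uppers):
    """Exhaustive search over threshold pairs; returns (cost, lower, upper)."""
    candidates = [
        (replay_cost(history, lo, up), lo, up)
        for lo, up in product(lowers, uppers)
        if lo < up
    ]
    return min(candidates)
```

In this sketch the penalty weight plays the role of the trade-off knob: a larger `overload_penalty` pushes the search toward thresholds that scale out earlier, at the price of more host-intervals.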
Evaluation
- Trade-off between monetary cost and quality of service
  - Manually tuned parameters
  - Optimized parameters (contribution of this paper)
  - Parameters learned by reinforcement learning
Pros & Cons
Pros
- Careful and extensive comparison of the trade-offs of the three baselines using different parameter values
- Simple architecture and intuitive algorithms for elastic scaling and parameter search
- The framework can be applied to other types of elastic scaling stream processing systems
- The prototype scales linearly with the number of queries and the window size
- …
Cons
- Cost function seems self-provable
  - MAX is predefined
- The overloading part could be further improved or explained in more detail
- The evaluation is based on the authors' own system
  - What would the situation be on a system used in industry (Storm)?
- Does not discuss load bursts that are much larger than the regular workload
- The user still needs to define the latency threshold
- Reinforcement learning: black box
Comments & Questions
- Why is monetary cost the only cost considered? Other costs, such as energy, could also be considered.
- What advantage is gained from making the upper and lower threshold granularities so small (which led to several thousand parameter configurations)?
- Besides CPU, network, and memory consumption, are there any other resources that should also be considered?
- Why can't memory and network overload be parametrized?
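
To make the granularity question concrete, the sketch below counts the (lower, upper) threshold pairs on a percentage grid for a given step size. The 0 to 100% range, the step sizes, and the function name are assumptions chosen for illustration; the paper's actual parameter space contains further parameters and may use different ranges.

```python
def count_threshold_pairs(step_percent):
    """Count (lower, upper) pairs on a 0..100% grid with lower < upper."""
    levels = range(0, 101, step_percent)
    return sum(1 for lo in levels for up in levels if lo < up)


print(count_threshold_pairs(10))  # 55
print(count_threshold_pairs(5))   # 210
print(count_threshold_pairs(1))   # 5050
```

Under these assumptions the number of pairs grows roughly quadratically as the step shrinks, which is one way a fine granularity reaches several thousand configurations from two thresholds alone.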