Operator Placement for In-Network Stream Query Processing
Outline
- Introduction
- Preliminaries
- Filter placement
- Extensions
- Conclusions
Introduction
- In-network query processing
- Consider a video surveillance application
  - Environment
  - Target: suspicious activity (dark, movement)
  - Need: a filter for calculating intensity (F1) and a filter for detecting sufficient motion (F2)
Introduction
- Previous work: push down all filters, since CPU cost << communication cost
- What if the queries involve expensive predicates?
- Objective
  - Place each filter at the "best" node, based on its selectivity and cost
  - Minimize the overall cost
Introduction
- Operator placement problem
- Tradeoff
  - Lower computational cost: put filters on the nodes higher up
  - Lower transmission cost: put filters on the nodes lower down
- Candidates: an m-level hierarchy and n filters give m^n possible solutions
- In this paper…
  - Key idea: model network links as filters
  - Content
    - Define the problem
    - Provide a greedy algorithm, which is not optimal in general
    - Present a polynomial-time optimal algorithm
    - Extend to multiway stream joins
    - …
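The size of the candidate space can be checked by enumerating every assignment of n filters to m levels. This is a minimal sketch; the function name `all_placements`, the tuple encoding, and the sizes used are illustrative assumptions, not taken from the slides.

```python
from itertools import product

def all_placements(m, n):
    """Yield every assignment of n filters to m nodes.

    A placement is a tuple p where p[j] is the 1-based node index
    chosen for filter j (hypothetical encoding, for illustration only).
    """
    return product(range(1, m + 1), repeat=n)

# Illustrative sizes: 4 levels and 4 filters give 4**4 candidates.
placements = list(all_placements(4, 4))
print(len(placements))  # 256
```

Exhaustive search is only feasible for tiny inputs like this one, which is why a polynomial-time algorithm matters.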
Preliminaries
- Consider a linear chain of nodes
- Notation
  - S = data acquired by node N_1
  - F = {F_1, F_2, …, F_n}
- Query
Cost Model
Three quantities:
- Selectivity of filter F: s(F)
  - fraction of the tuples in stream S that are expected to satisfy F
  - a stream entering F at rate r leaves it at rate s(F)·r
- Cost of filter F: c(F, i)
  - per-tuple cost of executing F on node N_i
  - c(F, i+1) = γ_i c(F, i), γ_i ≤ 1 (if i > 1)
- Cost of network transmission: l_i
  - per-tuple cost of transmitting from N_i to N_{i+1}
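The per-hop cost scaling and the rate reduction can be sketched as two small helpers. The names `cost_at_node`, `output_rate` and `gammas` are assumptions made for illustration; the scale factors 1/5, 1/2, 1/4 and the base costs 1300 and 2500 are the ones that appear in Example 2.2.

```python
from fractions import Fraction

def cost_at_node(base_cost, gammas, i):
    """Per-tuple cost c(F, i), given c(F, 1) = base_cost.

    gammas[k] is the assumed scale factor between N_(k+1) and N_(k+2),
    so c(F, i) = c(F, 1) * gammas[0] * ... * gammas[i-2].
    """
    cost = Fraction(base_cost)
    for g in gammas[: i - 1]:
        cost *= g
    return cost

def output_rate(rate, selectivity):
    """Rate of the stream leaving a filter fed at `rate`."""
    return rate * selectivity

gammas = [Fraction(1, 5), Fraction(1, 2), Fraction(1, 4)]
print(cost_at_node(1300, gammas, 3))       # 130
print(cost_at_node(2500, gammas, 4))       # 125/2
print(output_rate(100, Fraction(1, 2)))    # 50
```

Exact fractions are used so that the scaled costs compare cleanly against the hand-computed values.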
Cost Model
- Notation
  - P(F) = i if filter F is executed on N_i
  - F_i = {F | P(F) = i}
  - F' = F'_1, F'_2, …, F'_n' (a sequence of filters)
  - c(F', i) = the cost per tuple of executing F' at node N_i
  - r(F_i) = F_i in rank order
    - Ref.: J. Hellerstein and M. Stonebraker. Predicate migration: Optimizing queries with expensive predicates
- Cost on a single node
- Overall cost
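One way to write the two costs named above, chosen so that it expands to exactly the terms of Example 2.2. The summation form, the index m for the last node, and the convention l_m = 0 are assumptions of this sketch, and \mathcal{F}_i stands for the set of filters placed on N_i.

```latex
% Cost on a single node: filter j only sees tuples that passed filters 1..j-1.
c(F', i) = \sum_{j=1}^{n'} \Bigl( \prod_{k=1}^{j-1} s(F'_k) \Bigr) \, c(F'_j, i)

% Overall cost: node i only sees tuples that passed every filter placed below it,
% and forwards the survivors of its own filters over link l_i.
c(P) = \sum_{i=1}^{m} \Bigl( \prod_{F \,:\, P(F) < i} s(F) \Bigr)
       \Bigl[ c\bigl(r(\mathcal{F}_i), i\bigr)
              + \Bigl( \prod_{F \in \mathcal{F}_i} s(F) \Bigr) l_i \Bigr],
\qquad l_m = 0
```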
Example 2.2
s(F) = 1/2 for every filter F
c(P) = c(F_1, 1) + s(F_1) c(F_2, 1)
     + s(F_1) s(F_2) [ l_1 + l_2 + c(F_3, 3) ]
     + s(F_1) s(F_2) s(F_3) [ l_3 + c(F_4, 4) ]
where c(F_3, 3) = (1/5)(1/2)·1300 and c(F_4, 4) = (1/5)(1/2)(1/4)·2500
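The cost of this plan can be evaluated numerically with a short sketch. Two inputs are assumptions rather than values printed on this slide: the base costs c(F_1, 1) = 200 and c(F_2, 1) = 400 are derived from the ranks 400 and 800 in Example 3.3 using rank = c / (1 - s), and the link costs 700, 500, 300 are also read off Example 3.3. The function name `plan_cost` is hypothetical.

```python
from fractions import Fraction as Fr

SEL = {f: Fr(1, 2) for f in ("F1", "F2", "F3", "F4")}
# c(F, 1): 1300 and 2500 appear in Example 2.2; 200 and 400 are
# ASSUMED, derived from the ranks 400 and 800 via rank = c / (1 - s).
BASE_COST = {"F1": Fr(200), "F2": Fr(400), "F3": Fr(1300), "F4": Fr(2500)}
GAMMA = [Fr(1, 5), Fr(1, 2), Fr(1, 4)]   # cost scale factors per hop
LINK = [Fr(700), Fr(500), Fr(300)]       # l_1, l_2, l_3 (from Example 3.3)

def plan_cost(plan, m=4):
    """Expected per-tuple cost of `plan` (dict: filter -> node index)."""
    total, passing, scale = Fr(0), Fr(1), Fr(1)
    for i in range(1, m + 1):
        here = [f for f in plan if plan[f] == i]
        # Run the node's filters in rank order (rank = c / (1 - s)).
        here.sort(key=lambda f: BASE_COST[f] / (1 - SEL[f]))
        for f in here:
            total += passing * scale * BASE_COST[f]
            passing *= SEL[f]
        if i < m:
            total += passing * LINK[i - 1]   # ship survivors to the next node
            scale *= GAMMA[i - 1]
    return total

example_2_2 = {"F1": 1, "F2": 1, "F3": 3, "F4": 4}
print(float(plan_cost(example_2_2)))  # 777.8125
```

Under these assumed inputs the plan costs 777.8125 per input tuple.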
Filter Placement
1. Greedy algorithm
2. Optimal algorithm
Greedy algorithm
- Notation
  - c(P, i) = the part of the total cost c(P) incurred at N_i, including transmission from N_i to N_{i+1}
  - network link N_i to N_{i+1}: modeled as a filter F^l_i with s(F^l_i) = 0, c(F^l_i, 1) = l_i
Example 3.3
- At N_1: r(F_1) = 400, r(F_2) = 800, r(F_3) = 2600, r(F_4) = 5000; r(F^l_1) = 700 > r(F_1)
- At N_2: r(F_2) = 160, r(F_3) = 520, r(F_4) = 1000; r(F^l_2) = 500 > r(F_2)
- At N_3: r(F_3) = 260, r(F_4) = 500; r(F^l_3) = 300 > r(F_3)
- At N_4: r(F_4)
- c(P) =
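The node-by-node comparisons in this example can be reproduced with a short sketch of the greedy rule: at each node, run the filters whose local rank is below the rank of the outgoing link, and push the rest up. The rank formula c / (1 - s), the base costs 200, 400, 1300, 2500 (which give the listed ranks at s = 1/2), and the names `greedy_placement` and `plan_cost` are assumptions made for illustration.

```python
from fractions import Fraction as Fr

SEL = {f: Fr(1, 2) for f in ("F1", "F2", "F3", "F4")}
BASE_COST = {"F1": Fr(200), "F2": Fr(400), "F3": Fr(1300), "F4": Fr(2500)}  # assumed c(F, 1)
GAMMA = [Fr(1, 5), Fr(1, 2), Fr(1, 4)]   # cost scale factors per hop
LINK = [Fr(700), Fr(500), Fr(300)]       # l_1, l_2, l_3

def greedy_placement(m=4):
    """At each node keep the filters that rank below the outgoing link."""
    plan, remaining, scale = {}, list(BASE_COST), Fr(1)
    for i in range(1, m + 1):
        for f in list(remaining):
            local_rank = scale * BASE_COST[f] / (1 - SEL[f])
            # A link with selectivity 0 and cost l_i has rank l_i.
            if i == m or local_rank < LINK[i - 1]:
                plan[f] = i
                remaining.remove(f)
        if i < m:
            scale *= GAMMA[i - 1]
    return plan

def plan_cost(plan, m=4):
    """Expected per-tuple cost of a placement (filter -> node index)."""
    total, passing, scale = Fr(0), Fr(1), Fr(1)
    for i in range(1, m + 1):
        here = sorted((f for f in plan if plan[f] == i),
                      key=lambda f: BASE_COST[f] / (1 - SEL[f]))
        for f in here:
            total += passing * scale * BASE_COST[f]
            passing *= SEL[f]
        if i < m:
            total += passing * LINK[i - 1]
            scale *= GAMMA[i - 1]
    return total

plan = greedy_placement()
print(plan)                    # {'F1': 1, 'F2': 2, 'F3': 3, 'F4': 4}
print(float(plan_cost(plan)))  # 792.8125
```

With these inputs the greedy rule places one filter per node, matching the comparisons listed on the slide.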
Optimal algorithm
- Notation
  - network link N_i to N_{i+1}
Optimal algorithm
- Short-circuiting
- Rank
- Cost scaleup
Optimal algorithm
Example 3.7
- Model links as filters
- r(F_1) = 400, r(F_2) = 800, r(F_3) = 2600, r(F_4) = 5000, r(F^l_1) = 875, r(F^l_{2,4}) =
- r(F_1) < r(F_2) < r(F^l_1) < r(F_3) < r(F^l_{2,4}) < r(F_4)
- c(P) =
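A sketch of how the ordering in this example can be reproduced. Everything about the link filters here is an assumption reconstructed from the one visible value r(F^l_1) = 875: link i gets selectivity equal to the cost scale factor of its hop, its cost is l_i scaled up to N_1 units, and adjacent links whose ranks are out of order are merged (short-circuited). Base costs 200 and 400 are likewise assumed from the ranks 400 and 800. All function names are hypothetical.

```python
from fractions import Fraction as Fr

SEL = {f: Fr(1, 2) for f in ("F1", "F2", "F3", "F4")}
BASE_COST = {"F1": Fr(200), "F2": Fr(400), "F3": Fr(1300), "F4": Fr(2500)}
GAMMA = [Fr(1, 5), Fr(1, 2), Fr(1, 4)]   # must all be < 1 for rank() below
LINK = [Fr(700), Fr(500), Fr(300)]

def rank(cost, sel):
    return cost / (1 - sel)

def link_filters():
    """Links as filters [source_node, cost_in_N1_units, selectivity]."""
    groups, scale = [], Fr(1)
    for i, (l, g) in enumerate(zip(LINK, GAMMA), start=1):
        groups.append([i, l / scale, g])   # cost scale-up to N_1 units
        scale *= g
        # Short-circuit: merge while a later link ranks below an earlier one.
        while len(groups) > 1 and rank(*groups[-2][1:]) > rank(*groups[-1][1:]):
            _, c_b, s_b = groups.pop()
            src, c_a, s_a = groups[-1]
            groups[-1] = [src, c_a + s_a * c_b, s_a * s_b]
    return groups

def place(m=4):
    """Put each filter at the source node of the first link outranking it."""
    groups = link_filters()
    return {f: next((src for src, c, s in groups
                     if rank(BASE_COST[f], SEL[f]) < rank(c, s)), m)
            for f in BASE_COST}

def plan_cost(plan, m=4):
    total, passing, scale = Fr(0), Fr(1), Fr(1)
    for i in range(1, m + 1):
        for f in sorted((f for f in plan if plan[f] == i),
                        key=lambda f: rank(BASE_COST[f], SEL[f])):
            total += passing * scale * BASE_COST[f]
            passing *= SEL[f]
        if i < m:
            total += passing * LINK[i - 1]
            scale *= GAMMA[i - 1]
    return total

print([rank(c, s) for _, c, s in link_filters()])
print(place())                    # {'F1': 1, 'F2': 1, 'F3': 2, 'F4': 4}
print(float(plan_cost(place())))  # 747.8125
```

Under these assumptions links 2 and 3 merge into a single link from N_2 to N_4, and the resulting plan is cheaper than both the greedy plan and the plan of Example 2.2.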
Extensions
- Correlated filters
- Tree hierarchies
- Joins
- Other extensions
Correlated filters
- Definition
  - Conditional selectivity s(F|Q) = the fraction of tuples that satisfy F given that they satisfy all the filters in Q
- Reference
  - Optimal ordering of correlated filters at a single node is NP-hard
  - The referenced algorithm is guaranteed to find an ordering whose cost is at most 4 times the optimal cost
  - An approximation ratio of 4 is the best possible unless P = NP
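The definition of conditional selectivity can be made concrete by estimating it from a sample of tuples. This is a minimal sketch; the sample readings, the two predicates, and the name `conditional_selectivity` are made up for illustration.

```python
from fractions import Fraction

def conditional_selectivity(tuples, f, given):
    """Estimate s(F|Q): among tuples that satisfy every predicate in
    `given`, the fraction that also satisfies `f`."""
    survivors = [t for t in tuples if all(q(t) for q in given)]
    if not survivors:
        return None  # undefined: no tuple satisfies all of Q
    return Fraction(sum(1 for t in survivors if f(t)), len(survivors))

# Hypothetical sample of (intensity, motion) readings.
sample = [(10, 5), (12, 40), (80, 45), (15, 50), (90, 2), (8, 60)]

def is_dark(t):
    return t[0] < 20

def has_motion(t):
    return t[1] > 30

print(conditional_selectivity(sample, has_motion, []))         # 2/3
print(conditional_selectivity(sample, has_motion, [is_dark]))  # 3/4
```

In this sample the two filters are correlated: motion is more likely among dark readings (3/4) than overall (2/3), so the selectivity of one filter depends on which filters ran before it.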
Correlated filters
- Definition
- Short-circuiting
- Optimal solution
Tree hierarchy
- Each of the queries operates on different data.
- There is no sharing of computation or transmission among them.
Joins
- Problem: k different data streams acquired by N_1
- Solution
  - Reference: sliding-window join
  - MJoin operator at a single node; a join tree is left as future work
- Query
  - W_1 and W_2 represent the lengths of the windows (time-based or tuple-based) on streams S_1 and S_2.
Joins
- Join operator ⋈
  - Illustration: input streams at rates r_1 and r_2, output stream at rate s(⋈) r_1 r_2
  - Selectivity s(⋈) = the fraction of the cross product that occurs in the join result
  - Cost
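Join selectivity as defined here can be computed directly on two small inputs. This is a sketch with made-up data and an equality predicate chosen for illustration; the name `join_selectivity` and the rates 10 and 20 are assumptions.

```python
from fractions import Fraction

def join_selectivity(left, right, predicate):
    """Fraction of the cross product left x right that joins."""
    pairs = len(left) * len(right)
    if pairs == 0:
        return None  # undefined on an empty cross product
    matches = sum(1 for a in left for b in right if predicate(a, b))
    return Fraction(matches, pairs)

left = [1, 2, 3, 4]
right = [2, 4, 4, 7]
s = join_selectivity(left, right, lambda a, b: a == b)
print(s)            # 3/16

# Output rate as in the illustration: s * r_1 * r_2 (rates are made up).
print(s * 10 * 20)  # 75/2
```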
Joins
- Notation
  - F_i = filters that can be applied on S_i either before or after the join, |F_i| = n_i
  - F_12 = filters that can be applied only after the join
Joins
- Time complexity: O(n_2 n_1 m (n+m) log(n+m))
Extensions
- Constrained nodes
- Per-filter cost scaling
  - c(F, i+1) / c(F, i) may be different for different F.
  - Modeling network links as filters no longer applies.
  - The problem becomes NP-hard.
Conclusion
- Environment
- Operator placement problem
- Tradeoff
  - Lower computational cost: put filters on the nodes higher up
  - Lower transmission cost: put filters on the nodes lower down
- Provided: greedy algorithm and optimal algorithm
- Extensions
Lemma 3.1 by (2)
Theorem 3.2
- F_1 in P is chosen according to the theorem.
- ∵ Lemma 3.1 and s(F^l_1) = 0, ∴ there exists F'_1 in P' s.t. c(P', 1) < c(P, 1)
- ∵ Theorem 2.1, ∴ c(P, 1) ≦ c(P', 1)
- → contradiction
Lemma 3.4
Theorem 3.5
- F_1 in P is chosen according to the theorem.
- ∵ Lemma 3.4, ∴ there exists P' s.t. c(P', 1) < c(P, 1)
- ∵ Theorem 2.1, ∴ c(P, 1) ≦ c(P', 1)
- → contradiction
Lemma 3.6
- Suppose and the best
- P': moving the filters on node N_i to N_{i-1}
- P'': moving the filters on node N_i to N_{i+1}
- ∵ P is the best plan, ∴ c(P) < c(P'), c(P) < c(P'')
- → implies → contradiction