1
Operator Placement for In-Network Stream Query Processing
2
Outline: Introduction; Preliminaries; Filter placement; Extensions; Conclusions
3
Introduction. In-network query processing: consider a video surveillance application. Environment. Target: suspicious activity (dark, movement). Need: a filter for calculating intensity (F1) and a filter for detecting sufficient motion (F2).
4
Introduction. Previous work: push down all filters, since CPU cost << communication cost. What if the queries involve expensive predicates? Objective: place each filter at the "best" node based on selectivity and cost, so as to minimize the overall cost.
5
Introduction. Operator placement problem. Tradeoff: for lower computational cost, put filters on the nodes higher up; for lower transmission cost, put filters on the nodes lower down. Candidates: with an m-level hierarchy and n filters there are m^n possible solutions. In this paper, the key idea is to model network links as filters. Content: define the problem; provide a greedy algorithm that can fail to find the optimal plan; present a polynomial-time optimal algorithm; extend to multiway stream joins; …
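To make the m^n count concrete, the following is a minimal brute-force sketch in Python. It assumes the four-filter, four-node instance used in the worked examples of this deck (selectivity 1/2 for every filter, node-1 costs 200, 400, 1300, 2500, link costs 700, 500, 300, cost scaling factors 1/5, 1/2, 1/4). The names `plan_cost`, `brute_force`, `SEL`, `COST1`, `GAMMA`, and `LINK` are hypothetical, and the sketch assumes filters at one node run in increasing order of cost / (1 - selectivity).

```python
from fractions import Fraction as Fr
from itertools import product

# Assumed instance, taken from the worked examples (exact fractions avoid rounding).
SEL = {"F1": Fr(1, 2), "F2": Fr(1, 2), "F3": Fr(1, 2), "F4": Fr(1, 2)}
COST1 = {"F1": 200, "F2": 400, "F3": 1300, "F4": 2500}   # c(F, 1)
GAMMA = [Fr(1, 5), Fr(1, 2), Fr(1, 4)]                   # c(F, i+1) / c(F, i)
LINK = [700, 500, 300]                                   # l_1, l_2, l_3

def plan_cost(assign, sel, cost1, gamma, link):
    """Expected per-tuple cost; assign[f] is the 0-based node where f runs."""
    total, alive, scale = Fr(0), Fr(1), Fr(1)
    for node in range(len(link) + 1):
        # Filters placed on this node, cheapest rank first (assumes s(F) < 1).
        here = sorted((f for f in assign if assign[f] == node),
                      key=lambda f: cost1[f] / (1 - sel[f]))
        for f in here:
            total += alive * scale * cost1[f]
            alive *= sel[f]
        if node < len(link):          # the last node does not transmit
            total += alive * link[node]
            scale *= gamma[node]
    return total

def brute_force(sel, cost1, gamma, link):
    """Enumerate all m^n placements and return (count, best plan, best cost)."""
    names, m = list(cost1), len(link) + 1
    plans = [dict(zip(names, nodes))
             for nodes in product(range(m), repeat=len(names))]
    best = min(plans, key=lambda p: plan_cost(p, sel, cost1, gamma, link))
    return len(plans), best, plan_cost(best, sel, cost1, gamma, link)

count, best, cost = brute_force(SEL, COST1, GAMMA, LINK)
print(count, best, float(cost))
```

Enumeration is only practical for tiny instances, since the number of plans grows as m^n.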
6
Preliminaries. Consider a linear chain of nodes. Notation: S = data acquired by node N_1; F = { F_1, F_2, …, F_n }. Query
7
Cost Model. Three quantities. Selectivity of filter F: s(F), the fraction of the tuples in stream S that are expected to satisfy F (input rate r, output rate s(F) r). Cost of filter F: c(F, i), the per-tuple cost of execution on node N_i, with c(F, i+1) = γ_i c(F, i), γ_i ≤ 1 (if i > 1). Cost of network transmission: l_i, the per-tuple cost of transmitting from N_i to N_{i+1}.
8
Cost Model. Notation: P(F) = i if filter F is executed on N_i; F_i = { F | P(F) = i }; F' = F'_1, F'_2, …, F'_{n'}; c(F', i) = the cost per tuple of executing F' at node N_i; r(F_i) = F_i in rank order (ref. J. Hellerstein and M. Stonebraker, "Predicate migration: Optimizing queries with expensive predicates", 1993). Cost on a single node. Overall cost.
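The formulas for "cost on a single node" and "overall cost" did not survive in this transcript. The LaTeX below is a hedged reconstruction, not the slide's own text: it is the standard expected cost of a filter pipeline, written so that it expands to the same terms as Example 2.2. The shorthand s(F_i) for the product of the selectivities of all filters on N_i, and the convention l_m = 0 for the last node, are assumptions.

```latex
% Assumed reconstruction: F'_1, ..., F'_{n'} are executed in this order at node N_i.
c(F', i) = \sum_{j=1}^{n'} \Bigl( \prod_{k=1}^{j-1} s(F'_k) \Bigr) \, c(F'_j, i)

% Overall cost of a plan P on a chain N_1, ..., N_m, writing
% s(F_i) = \prod_{F \in F_i} s(F) and taking l_m = 0 (the last node does not transmit).
c(P) = \sum_{i=1}^{m} \Bigl( \prod_{j=1}^{i-1} s(F_j) \Bigr)
       \Bigl[ c\bigl(r(F_i), i\bigr) + s(F_i) \, l_i \Bigr]
```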
9
Example 2.2, with s(F) = 1/2
c(P) = c(F_1, 1) + s(F_1) c(F_2, 1) + s(F_1) s(F_2) [ l_1 + l_2 + c(F_3, 3) ] + s(F_1) s(F_2) s(F_3) [ l_3 + c(F_4, 4) ]
     = 200 + (1/2) 400 + (1/2)(1/2) [ 700 + 500 + (1/5)(1/2) 1300 ] + (1/2)(1/2)(1/2) [ 300 + (1/5)(1/2)(1/4) 2500 ]
     = 200 + 200 + 332.5 + 45.3125
     = 777.8125
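The arithmetic of Example 2.2 can be checked with a short Python sketch. The function name `plan_cost` and the list-of-lists plan representation are hypothetical. The plan itself (F1 and F2 on N1, nothing on N2, F3 on N3, F4 on N4) and the scaling factors 1/5, 1/2, 1/4 are read off the terms of the example.

```python
from fractions import Fraction as Fr

def plan_cost(placement, sel, cost1, gamma, link):
    """Expected per-tuple cost of a plan on a linear chain.

    placement[i] lists the filters run on node N_{i+1}, in execution order.
    cost1[f] is c(f, 1); gamma[i] scales costs from N_{i+1} to N_{i+2}.
    """
    total = Fr(0)
    alive = Fr(1)   # fraction of tuples that passed every filter so far
    scale = Fr(1)   # c(f, i) = scale * c(f, 1) on the current node
    for i, filters in enumerate(placement):
        for f in filters:
            total += alive * scale * cost1[f]
            alive *= sel[f]
        if i < len(link):             # the last node has no outgoing link
            total += alive * link[i]
            scale *= gamma[i]
    return total

sel = {f: Fr(1, 2) for f in ("F1", "F2", "F3", "F4")}
cost1 = {"F1": 200, "F2": 400, "F3": 1300, "F4": 2500}
gamma = [Fr(1, 5), Fr(1, 2), Fr(1, 4)]
link = [700, 500, 300]

example_2_2 = [["F1", "F2"], [], ["F3"], ["F4"]]
print(float(plan_cost(example_2_2, sel, cost1, gamma, link)))  # 777.8125
```

Exact fractions are used so that the result matches the hand computation without floating-point noise.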
10
Filter Placement: 1. Greedy algorithm 2. Optimal algorithm
11
Greedy algorithm. Notation: c(P, i) = the part of the total cost c(P) incurred at N_i, including transmission from N_i to N_{i+1}. The network link N_i to N_{i+1} is modeled as a filter F^l_i with s(F^l_i) = 0 and c(F^l_i, i) = l_i.
12
Example 3.3
At N_1: r(F_1) = 400, r(F_2) = 800, r(F_3) = 2600, r(F_4) = 5000; r(F^l_1) = 700 > r(F_1)
At N_2: r(F_2) = 160, r(F_3) = 520, r(F_4) = 1000; r(F^l_2) = 500 > r(F_2)
At N_3: r(F_3) = 260, r(F_4) = 500; r(F^l_3) = 300 > r(F_3)
At N_4: r(F_4)
c(P) = 200 + 350 + 40 + 125 + 32.5 + 37.5 + 7.8125 = 792.8125
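The greedy steps of Example 3.3 can be sketched in Python as follows. This is a reconstruction from the numbers in the example, not the slide's own pseudocode. It assumes rank r(F) = c(F, i) / (1 - s(F)) at node N_i, treats the outgoing link as a filter with selectivity 0 (so its rank is l_i), and runs on N_i exactly those remaining filters whose rank is below the link's rank. All function and variable names are hypothetical.

```python
from fractions import Fraction as Fr

def rank_at(f, scale, sel, cost1):
    """Rank of filter f on a node whose costs are scale * c(f, 1); assumes s(f) < 1."""
    return scale * cost1[f] / (1 - sel[f])

def greedy_placement(sel, cost1, gamma, link):
    """Node by node, keep the filters that rank below the outgoing link."""
    remaining, placement, scale = list(cost1), [], Fr(1)
    for node in range(len(link) + 1):
        key = lambda f: rank_at(f, scale, sel, cost1)
        if node < len(link):
            here = sorted((f for f in remaining if key(f) < link[node]), key=key)
        else:
            here = sorted(remaining, key=key)   # the last node runs whatever is left
        placement.append(here)
        remaining = [f for f in remaining if f not in here]
        if node < len(link):
            scale *= gamma[node]
    return placement

def plan_cost(placement, sel, cost1, gamma, link):
    """Expected per-tuple cost of a placement (filters plus link transmissions)."""
    total, alive, scale = Fr(0), Fr(1), Fr(1)
    for i, filters in enumerate(placement):
        for f in filters:
            total += alive * scale * cost1[f]
            alive *= sel[f]
        if i < len(link):
            total += alive * link[i]
            scale *= gamma[i]
    return total

sel = {f: Fr(1, 2) for f in ("F1", "F2", "F3", "F4")}
cost1 = {"F1": 200, "F2": 400, "F3": 1300, "F4": 2500}
gamma = [Fr(1, 5), Fr(1, 2), Fr(1, 4)]
link = [700, 500, 300]

greedy = greedy_placement(sel, cost1, gamma, link)
print(greedy, float(plan_cost(greedy, sel, cost1, gamma, link)))
```

On this instance the sketch places one filter per node and reproduces the cost 792.8125, which is higher than the 777.8125 of the plan in Example 2.2, so the greedy choice is not optimal here.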
13
Optimal algorithm. Notation: network link N_i to N_{i+1}:
14
Optimal algorithm: Short-circuiting; Rank; Cost scaleup
15
Optimal algorithm
16
Example 3.7. Model links as filters.
r(F_1) = 400, r(F_2) = 800, r(F_3) = 2600, r(F_4) = 5000, r(F^l_1) = 875, r(F^l_{2,4}) = 4571.42857142857 ≈ 4571.4
r(F_1) < r(F_2) < r(F^l_1) < r(F_3) < r(F^l_{2,4}) < r(F_4)
c(P) = 200 + 200 + 175 + 65 + 100 + 7.8125 = 747.8125
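The ranks in Example 3.7 can be reproduced with the Python sketch below. The link model is an assumption inferred from the numbers rather than stated in this transcript: the link from N_i to N_{i+1} is treated as a pseudo-filter whose selectivity is the cost scaling factor of that hop and whose cost is l_i expressed at N_1 scale, which gives 700 / (1 - 1/5) = 875 for the first link. Short-circuiting is sketched as merging adjacent links whose ranks are out of order. All function names are hypothetical.

```python
from fractions import Fraction as Fr

def rank(cost, sel):
    """Rank c / (1 - s); assumes s < 1."""
    return cost / (1 - sel)

def link_filters(gamma, link):
    """Pseudo-filter per link: (from node, to node, cost at N_1 scale, selectivity)."""
    out, scale = [], Fr(1)
    for i, (g, l) in enumerate(zip(gamma, link), start=1):
        out.append((i, i + 1, l / scale, g))
        scale *= g
    return out

def short_circuit(chain):
    """Merge consecutive links until their ranks are non-decreasing."""
    merged = []
    for item in chain:
        merged.append(item)
        while len(merged) > 1 and rank(*merged[-2][2:]) > rank(*merged[-1][2:]):
            (a, _, c1, s1), (_, b, c2, s2) = merged[-2], merged[-1]
            merged[-2:] = [(a, b, c1 + s1 * c2, s1 * s2)]
    return merged

def optimal_placement(sel, cost1, gamma, link):
    """Sort filters and merged links by rank, then read off the node of each filter."""
    items = [(rank(cost1[f], sel[f]), 0, f) for f in cost1]
    items += [(rank(c, s), 1, (a, b))
              for a, b, c, s in short_circuit(link_filters(gamma, link))]
    items.sort(key=lambda t: t[:2])          # on ties, filters go before links
    placement, node = [[] for _ in range(len(link) + 1)], 1
    for _, is_link, x in items:
        if is_link:
            node = x[1]                      # tuples are now at the link's end node
        else:
            placement[node - 1].append(x)
    return placement

sel = {f: Fr(1, 2) for f in ("F1", "F2", "F3", "F4")}
cost1 = {"F1": 200, "F2": 400, "F3": 1300, "F4": 2500}
gamma = [Fr(1, 5), Fr(1, 2), Fr(1, 4)]
link = [700, 500, 300]

links = short_circuit(link_filters(gamma, link))
print([(a, b, float(rank(c, s))) for a, b, c, s in links])
print(optimal_placement(sel, cost1, gamma, link))
```

Under these assumptions the links N_2 to N_3 and N_3 to N_4 merge into a single pseudo-filter spanning N_2 to N_4, so no filter is placed on N_3, which matches the rank order shown in the example.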
17
Extensions: Correlated filters; Tree hierarchies; Joins; Other extensions
18
Correlated filters. Definition: conditional selectivity s(F|Q) = the fraction of tuples that satisfy F given that they satisfy all the filters in Q. Reference: optimal ordering of correlated filters at a single node is NP-hard; the referenced algorithm is guaranteed to find a plan whose cost is at most 4 times the optimal cost; an approximation ratio of 4 is the best possible unless P = NP.
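As an illustration of the definition, conditional selectivity can be estimated from a sample of tuples. The Python sketch below is hypothetical throughout: the function name, the sample data, and the two predicates (loosely modeled on the "dark" and "movement" filters of the surveillance example) are made up for illustration.

```python
def conditional_selectivity(tuples, f, q):
    """Estimate s(f | q): among tuples passing every filter in q, the fraction passing f.

    Returns None when no sample tuple satisfies all filters in q.
    """
    passed = [t for t in tuples if all(g(t) for g in q)]
    if not passed:
        return None
    return sum(1 for t in passed if f(t)) / len(passed)

# Hypothetical sample: (intensity, motion) pairs.
frames = [(20, 9), (30, 8), (25, 1), (200, 9), (220, 0), (210, 1), (230, 0), (240, 2)]
dark = lambda t: t[0] < 50
moving = lambda t: t[1] > 5

print(conditional_selectivity(frames, moving, []))      # unconditional s(moving)
print(conditional_selectivity(frames, moving, [dark]))  # s(moving | dark)
```

In this made-up sample the two filters are correlated: movement is more frequent among dark frames than overall, so an ordering based on unconditional selectivities alone would misjudge the second filter.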
19
Correlated filters. Definition. Short-circuiting. Optimal solution. Tree hierarchy: each of the queries operates on different data, so there is no sharing of computation or transmission among them.
20
Joins. Problem: k different data streams acquired by N_1. Solution. Reference: sliding-window join; MJoin operator at a single node; join tree is left as future work. Query: W_1 and W_2 represent the lengths of the windows (time-based or tuple-based) on streams S_1 and S_2.
21
Joins. Join operator. Illustration: input rates r_1 and r_2, output rate s(⋈) r_1 r_2. Selectivity: s(⋈) = the fraction of the cross product that occurs in the join result. Cost.
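The selectivity definition above can be illustrated with a small Python sketch. The function names and the sample windows are hypothetical, and the equality predicate is only one possible join condition.

```python
from itertools import product

def join_selectivity(left, right, predicate):
    """Fraction of the cross product left x right that appears in the join result."""
    pairs = len(left) * len(right)
    if pairs == 0:
        return 0.0
    matches = sum(1 for a, b in product(left, right) if predicate(a, b))
    return matches / pairs

def join_output_rate(selectivity, r1, r2):
    """Output rate of the join for input rates r1 and r2."""
    return selectivity * r1 * r2

# Hypothetical window contents of two streams.
window_1 = [1, 2, 3, 4]
window_2 = [2, 4, 6]
s_join = join_selectivity(window_1, window_2, lambda a, b: a == b)
print(s_join, join_output_rate(s_join, r1=4, r2=3))
```

Here 2 of the 12 pairs in the cross product match, so the estimated selectivity is 1/6.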
22
Joins. Notation: F_i = filters that can be applied on S_i either before or after the join, with |F_i| = n_i; F_12 = filters that can be applied only after the join.
23
Joins. Time complexity: O(n_2 n_1 m (n+m) log(n+m))
24
Extensions. Constrained nodes. Per-filter cost scaling: c(F, i+1) / c(F, i) may be different for different F. Modeling network links as filters no longer applies. It becomes NP-hard.
25
Conclusion. Environment. Operator placement problem. Tradeoff: for lower computational cost, put filters on the nodes higher up; for lower transmission cost, put filters on the nodes lower down. Provided: a greedy algorithm and an optimal algorithm. Extensions.
26
Lemma 3.1 by (2)
27
Theorem 3.2. F_1 in P is chosen according to the theorem. ∵ Lemma 3.1 and s(F^l_1) = 0, ∴ ∃ F'_1 in P' s.t. c(P', 1) < c(P, 1). ∵ Theorem 2.1, ∴ c(P, 1) ≤ c(P', 1) → contradiction.
28
Lemma 3.4
29
Theorem 3.5. F_1 in P is chosen according to the theorem. ∵ Lemma 3.4, ∴ ∃ P' s.t. c(P', 1) < c(P, 1). ∵ Theorem 2.1, ∴ c(P, 1) ≤ c(P', 1) → contradiction.
30
Lemma 3.6. Suppose and the best. P': moving the filters on node N_i to N_{i-1}. P'': moving the filters on node N_i to N_{i+1}. ∵ P is the best plan, ∴ c(P) < c(P'), c(P) < c(P'') → implies → contradiction.