1
Network-Aware Query Processing for Stream-based Applications
Yanif Ahmad, Ugur Cetintemel (Brown University), VLDB 2004
2
One-line Comment
This paper addresses the operator placement problem in distributed query processing by using network latency information.
3
Contents
- Motivation
- Problem
- Solution Approach
- Centralized Version of the Algorithm
  - Edge
  - Edge+
  - In-Network
  - Latency-Constrained
- Distributed Version of the Algorithm
- Experiment
- Critique
4
Motivation
- A small-scale query processing system is not scalable.
- With a large number of data streams and query requests, widely distributed query processing is needed.
5
Problem
- Operator placement problem: the operators of a query processing tree (query plan) must be dispersed across the nodes of the IP network.
- [Figure: a processing tree of operators O00-O26 and its placement onto IP network nodes, with operator nodes and the application node shown.]
6
Problem: Formalized Version
- Operator placement problem: for efficient operator placement, minimize the bandwidth cost.
- O: operators; A: the arcs connecting their inputs and outputs; V: network nodes; E: their links; c(): link cost (bandwidth).
- Objective: minimize the total cost, the sum of c(a) over all arcs a in A, where c(a) = 0 for an arc a = (m, n) whose operators m and n are placed at the same node.
- The locations of the source operators are fixed in advance.
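To make the cost objective concrete, here is a minimal sketch in Python (hypothetical names, not from the paper) of how the total cost of a given operator-to-node mapping could be evaluated, assuming c(a) is the bandwidth cost of the network path between the two hosting nodes and zero when both operators of an arc are colocated.

```python
def placement_cost(arcs, placement, link_cost):
    """Total placement cost: sum of c(a) over all arcs a = (m, n) in A.

    arcs      -- list of (m, n) pairs, one per data flow between operators
    placement -- dict mapping each operator to the network node hosting it
    link_cost -- link_cost(u, v): bandwidth cost of the path between nodes
    """
    total = 0.0
    for m, n in arcs:
        u, v = placement[m], placement[n]
        if u == v:
            continue  # c(a) = 0 when both operators share a node
        total += link_cost(u, v)
    return total
```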
7
Solution Approach
Network-aware operator placement algorithms:
- Edge: considers only the sources and the proxy location.
- Edge+: Edge with pairwise server communication latencies.
- In-Network: considers the sources, the proxy, and a subset of all locations.
- Latency-bound algorithm.
8
Contents
- Motivation
- Problem
- Solution Approach
- Centralized Version of the Algorithm
- Distributed Version of the Algorithm
- Experiment
- Critique
9
Algorithm Design Principle
- Naive algorithm for operator placement: evaluate every possible mapping of operators to locations. Too complex.
- Greedy algorithm: evaluate only the locations that are likely candidates.
  - Place operators in post-order, so that when an operator is placed, its children are already placed and guide the choice (see the sketch below).
- [Figure: processing tree of operators O00-O26 being placed onto an IP network with sources S0 and S1, operator nodes, and the application node.]
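A minimal sketch of the greedy, post-order placement loop described above (Python; the operator structure, candidate generator, and scoring function are illustrative assumptions): children are placed before their parent, and only a small candidate set is scored for each operator.

```python
def greedy_place(root, candidates_for, score):
    """Greedy post-order placement.

    root           -- root operator; each operator exposes a .children list
    candidates_for -- candidates_for(op, child_locs): candidate locations
                      for op, given its children's already-chosen locations
    score          -- score(op, loc, child_locs): estimated cost of putting
                      op at loc
    Returns a dict mapping each operator to its chosen location.
    """
    placement = {}

    def visit(op):
        for child in op.children:
            visit(child)                      # post-order: children first
        child_locs = {c: placement[c] for c in op.children}
        candidates = candidates_for(op, child_locs)
        placement[op] = min(candidates,
                            key=lambda loc: score(op, loc, child_locs))

    visit(root)
    return placement
```

For a source (leaf) operator, candidates_for would simply return its fixed source location, since source placements are given in advance.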
10
Mapping Function
- [Figure: processing tree of operators O10-O29 illustrating the mapping function.]
11
Edge
- Candidate locations: the sources and the proxy.
- Candidates with high potential:
  (1) one of the children's locations,
  (2) a common location,
  (3) the proxy's location.
- Link cost: [given as a formula on the original slide, not reproduced here].
12
Edge (1): One of the children's locations
- Choose the child location whose link to the operator carries the largest tree cost, so that colocating the operator with that child removes the costliest edge (costs 30, 50, and 20 in the example, so the child with cost 50 is chosen).
- [Figure: processing tree over sources S0-S2; operator O10 with children O20, O21, O22 and edge costs 30, 50, 20.]
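One way to read candidate (1), sketched below (Python, hypothetical names): among the children's locations, pick the one whose edge to the operator carries the largest cost, so that colocating the operator with that child eliminates the most expensive link (50 rather than 30 or 20 in the slide's example).

```python
def heaviest_child_location(op, child_locs, edge_cost):
    """Candidate (1): the location of the child whose edge to op is the
    most expensive, so placing op there zeroes out the largest edge cost.

    child_locs -- dict child -> its already-chosen location
    edge_cost  -- dict (child, op) -> tree cost of that edge
    """
    heaviest = max(op.children, key=lambda c: edge_cost[(c, op)])
    return child_locs[heaviest]
```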
13
Edge (2): A common location
- Idea: placing an operator and its children at a common location makes the overlay cost between the operator and its children zero.
- The common location set cl is the set of places good for all children: the intersection of each child's dl, the set of its descendant leaf locations (see the sketch below).
- Example: dl(O11) = {S0, S1, S2}, cl(O00) = {S0, S1}.
- [Figure: processing tree of operators O00-O29 over sources S0-S2.]
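The following sketch (Python, hypothetical structure) computes dl(o), the set of descendant leaf locations, and cl(o) as the intersection of the children's dl sets; applied to the slide's example it would reproduce dl(O11) = {S0, S1, S2} and cl(O00) = {S0, S1}.

```python
def descendant_leaf_locations(op, source_location):
    """dl(op): locations of the source (leaf) operators below op."""
    if not op.children:                       # leaves are the data sources
        return {source_location[op]}
    dl = set()
    for child in op.children:
        dl |= descendant_leaf_locations(child, source_location)
    return dl

def common_locations(op, source_location):
    """cl(op): the locations shared by every child's descendant leaves."""
    child_dls = [descendant_leaf_locations(c, source_location)
                 for c in op.children]
    return set.intersection(*child_dls) if child_dls else set()
```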
14
Edge (3): Proxy's location
- Idea: if the tree costs are higher near the root, place the operator at the proxy location r.
- [Figure: processing tree of operators O00-O29 over sources S0-S2.]
15
Edge: Summary
16
Edge+
- Candidate locations: the sources and the proxy.
- Edge+ is Edge with the network latency d between two locations incorporated into the link cost and the mapping function.
17
In-Network Placement
- Candidate locations: arbitrary locations (including the sources and the proxy).
- The overlay cost and the mapping function are the same as in Edge+.
- Problem: reducing the candidate location set.
18
In-Network Placement: Approach
- Prune a candidate location unless its distance to every current child placement is less than all pairwise distances between the child placements (see the sketch below).
- [Figure: operators O00, O10, O11, O12 over network nodes N2, N4, N7, N8 with example distances 40, 30, 20, 50, 60, 30.]
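A sketch of this pruning rule (Python, hypothetical names): a candidate survives only if its distance to every current child placement is smaller than the smallest pairwise distance between the child placements.

```python
def prune_candidates(candidates, child_locs, distance):
    """Keep only candidates closer to every child placement than the
    child placements are to each other.

    candidates -- iterable of candidate network locations
    child_locs -- list of the children's current placements
    distance   -- distance(u, v): network distance/latency between locations
    """
    pairwise = [distance(a, b)
                for i, a in enumerate(child_locs)
                for b in child_locs[i + 1:]]
    if not pairwise:                  # zero or one child: nothing to prune on
        return list(candidates)
    threshold = min(pairwise)
    return [loc for loc in candidates
            if all(distance(loc, c) < threshold for c in child_locs)]
```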
19
Latency-Constrained Placement
- Find a configuration that satisfies the latency constraint.
- Latency constraint: for every leaf-to-root path in P (the set of leaf-to-root paths), the accumulated latency must stay within the bound l (a sketch of the check follows below).
- [Figure: example placement of O20, O21, O22 over sources S0, S1 and nodes N4, N5, N7 with path latencies 20, 50, 30, checked against a bound l = 75.]
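A minimal sketch of the constraint check (Python, hypothetical names), assuming the latency of a leaf-to-root path is the sum of the latencies between consecutive operators' hosting nodes and must not exceed the bound l (75 in the slide's example):

```python
def satisfies_latency_bound(paths, placement, latency, bound):
    """True if every leaf-to-root path in P stays within the latency bound.

    paths     -- list of operator paths, each ordered from leaf to root
    placement -- dict operator -> hosting network node
    latency   -- latency(u, v): network latency between two nodes
    bound     -- the latency constraint l
    """
    for path in paths:
        total = sum(latency(placement[a], placement[b])
                    for a, b in zip(path, path[1:]))
        if total > bound:
            return False
    return True
```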
20
Contents
- Motivation
- Problem
- Solution Approach
- Centralized Version of the Algorithm
- Distributed Version of the Algorithm
- Experiment
- Critique
21
Distributed Query Placement
- Reason: the centralized approach is not scalable.
  - It requires substantial network state.
  - The algorithm's complexity is high.
22
Distributed Query Placement
- Partition the processing tree into subtrees (zones) and assign each zone to a coordinator node (a partitioning sketch follows below).
- [Figure: processing tree of operators O1-O4 partitioned into zones handled by coordinators C1-C4, with the application proxy at the root.]
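The slide does not spell out how zones are formed; the sketch below (Python, with a hypothetical size-bounded policy) only illustrates the general idea of cutting the processing tree into subtrees and handing each to a coordinator.

```python
def partition_into_zones(root, max_zone_size):
    """Cut the processing tree into subtrees (zones) of bounded size.

    Returns a list of zones, each a list of operators; in a deployment each
    zone would be assigned to a coordinator node.
    """
    zones = []

    def visit(op, is_root=False):
        subtree = [op]
        for child in op.children:
            subtree.extend(visit(child))
        if is_root or len(subtree) >= max_zone_size:
            zones.append(subtree)     # close this zone at op
            return []                 # hand nothing up to the parent
        return subtree

    visit(root, is_root=True)
    return zones
```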
23
Distributed Query Placement
- [Figure: the coordinators C1-C4 arranged as a tree overlay.]
24
Experiment: Experimental Setup
- Processing tree: binary tree of depth 3 to 5.
- Network topology: maximum pairwise path delay of 500 ms.
- Server and proxy locations (APD: average proxy distance, ASD: average server distance):
  - Uniform: APD = ASD
  - Star: APD = 0.5 * ASD
  - Cluster: APD = 2 * ASD
- [Figure: server and proxy layouts for the uniform, cluster, and star configurations.]
25
Experiment
- Latency constraints: 120 ms (0.9nd, tight delay) vs. 300 ms (2.2nd, loose delay).
- Direct comparison against a baseline in which all operators are located at the proxy.
- Results: bandwidth consumption and latency stretch.
- [Figure: bandwidth consumption and latency stretch plots.]
26
Critique
Pros
- Tackles the operator placement problem with a focus on network-related costs (bandwidth, latency) rather than processing cost.
Cons
- Is this high-complexity algorithm practical to apply?
  - Heavy processing: the placement takes too long to complete.
  - Latency information from many locations is required.
  - The sequential, bottom-up convergence makes it unusable for complex query plans and topologies; a simpler algorithm would be more appropriate.
- Dynamics: the approach is not resilient to dynamic topology changes, such as node departures or latency changes.