Published by Kaliyah Faye. Modified over 10 years ago.
1
Using Partial Evaluation in Distributed Query Evaluation
Peter Buneman, Gao Cong, Wenfei Fan, Anastasios (Tasos) Kementsietsidis
2
© Anastasios Kementsietsidis, VLDB 2006
Cutting Down Trees…
Tell me when GOOG stock sells for 376: [//stock[code = GOOG ∧ sell = 376]]
[Figure: a portfolio XML tree with broker nodes (Merrill Lynch, Bache), market nodes (NASDAQ, NYSE), and stock nodes, each with code, buy, and sell: YHOO (buy $33, sell $35), GOOG (buy $374, sell $373), IBM (buy $80, sell $78), AAPL (buy $71, sell $65), GOOG (buy $370, sell $372). The tree is split across peers P0, P1, and P2.]
Let's stream! Not.
Let's do a depth-first traversal. We visit: P0, P1, P2, P1, P0, P2, P0
3
Status report…
We have XML trees, arbitrarily fragmented and distributed.
We want to execute Boolean XPath queries Q = [q] over the fragmented trees:
q := p | p/text() = str | label() = A | ¬q | q ∧ q | q ∨ q
p := ε | A | * | p//p | p/p | p[q]
Lessons learned:
–We want to visit each peer only once, irrespective of the number of (tree) fragments it stores.
–We want to minimize communication costs. Ideally, no fragment data should be sent while evaluating a query.
Our motto: send processing to data, NOT data to processing.
4
Partial Evaluation
Consider a function f(s, d) and part of its input, say s.
Partial evaluation specializes f(s, d) with respect to s, i.e., it performs the part of f's computation that depends only on s.
This generates a residual function g(d) that depends only on d.
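As a minimal sketch of this definition, take f(s, d) = power(n, x) with s = n known in advance. The `power` function and the `specialize_power` helper are illustrative names, not part of the talk. Specializing runs the loop over n once and leaves a residual function g(x) that mentions only x.

```python
def power(n, x):
    """f(s, d): the general function, with s = n and d = x."""
    result = 1
    for _ in range(n):
        result *= x
    return result


def specialize_power(n):
    """Perform the part of power's computation that depends only on n.

    Returns the residual function g(x) together with its source text.
    """
    # The repetition over n is unrolled here, once, at specialization time.
    body = " * ".join(["x"] * n) if n > 0 else "1"
    source = f"lambda x: {body}"
    return eval(source), source


g, source = specialize_power(3)
print(source)  # lambda x: x * x * x
print(g(5))    # 125
```

The residual function gives the same result as `power(3, 5)` but no longer looks at n.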
5
Tree Fragments
[Figure: the portfolio tree cut into four fragments (Fragment F0, Fragment F1, Fragment F2, Fragment F3), shown next to the corresponding fragment tree over F0, F1, F2, and F3. The tree holds brokers Bache and Merrill Lynch, markets NYSE and NASDAQ, and stocks IBM (buy $80, sell $78), YHOO (buy $33, sell $35), GOOG (buy $374, sell $373), AAPL (buy $71, sell $65), and GOOG (buy $370, sell $372).]
6
Partial Evaluation in Distributed Query Evaluation
Main idea: Given a query Q, send Q to every peer holding a fragment.
Q = [//stock[code = GOOG ∧ sell = 376]]
[Figure: fragment F0 (portfolio, broker Bache, market NYSE, stock IBM with buy $80 and sell $78) containing fragment nodes F1 and F3; peers P0, P1, and P2.]
Compute partial answers (Boolean formulas):
–Q is evaluated bottom-up.
–We use Boolean variables for the evaluation of fragment nodes.
P2 has two fragments but is only visited once.
Answer of Q: computed by solving a linear system of Boolean equations.
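A minimal sketch of the final step, under simplifying assumptions: each fragment reports a single Boolean formula whose variables stand for its sub-fragments. The tuple encoding, the variable names x0 to x3, and the fragment tree shape (F0 above F1 and F3, F1 above F2) are assumptions made for illustration. Because the fragments form a tree, the system is acyclic and can be solved by substitution.

```python
def solve(equations, name, solved=None):
    """Value of variable `name` in an acyclic system of Boolean equations.

    `equations` maps a variable name to a formula. A formula is a bool,
    ("var", other_name), ("not", f), ("and", f, g) or ("or", f, g).
    """
    if solved is None:
        solved = {}
    if name not in solved:
        solved[name] = evaluate(equations[name], equations, solved)
    return solved[name]


def evaluate(formula, equations, solved):
    if isinstance(formula, bool):
        return formula
    op = formula[0]
    if op == "var":
        return solve(equations, formula[1], solved)
    if op == "not":
        return not evaluate(formula[1], equations, solved)
    if op in ("and", "or"):
        left = evaluate(formula[1], equations, solved)
        right = evaluate(formula[2], equations, solved)
        return (left and right) if op == "and" else (left or right)
    raise ValueError(f"unknown operator: {op}")


# Hypothetical partial answers: x_i stands for "Q holds in F_i or below it".
equations = {
    "x0": ("or", ("var", "x1"), ("var", "x3")),  # F0: no local match, defers to F1 and F3
    "x1": ("or", False, ("var", "x2")),          # F1: no local match, defers to F2
    "x2": False,                                 # F2: leaf fragment, no match
    "x3": False,                                 # F3: leaf fragment, no match
}
print(solve(equations, "x0"))  # False
```

Each variable is evaluated once and cached in `solved`, so the work is proportional to the total size of the formulas.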
7
Query Evaluation
Q = [//stock[code = GOOG ∧ sell = 376]]
Query representation:
q0: code = GOOG
q1: sell = 376
q2: */q0 ∧ */q1
q3: stock[q2]
q4: //q3
Query Evaluation Example 1: [Figure: a market subtree containing a stock with code GOOG, buy $370, and sell $376.]
Query Evaluation Example 2: [Figure: a market subtree containing a stock with code GOOG and buy $370, plus a fragment node F.]
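The bottom-up evaluation of the sub-queries q0 to q4 listed above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the `Node` class, the tuple encoding of formulas, and the variable naming ("var", fragment, i) are hypothetical, text values are stored without the currency sign, and in the second example the fragment node F is assumed to be a child of the stock node. Every node gets a vector of five entries, one per sub-query; a fragment node contributes variables instead of truth values.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    text: str = ""
    children: list = field(default_factory=list)
    virtual: bool = False  # True: stands for a sub-fragment stored elsewhere


def f_or(a, b):
    """Disjunction of two formulas, folding constants where possible."""
    if a is True or b is True:
        return True
    if a is False:
        return b
    if b is False:
        return a
    return ("or", a, b)


def f_and(a, b):
    """Conjunction of two formulas, folding constants where possible."""
    if a is False or b is False:
        return False
    if a is True:
        return b
    if b is True:
        return a
    return ("and", a, b)


def eval_vector(node):
    """Return [q0, q1, q2, q3, q4] at `node`, computed from its children."""
    if node.virtual:
        return [("var", node.label, i) for i in range(5)]
    any_q0 = any_q1 = any_q4 = False
    for child in node.children:
        vec = eval_vector(child)
        any_q0 = f_or(any_q0, vec[0])
        any_q1 = f_or(any_q1, vec[1])
        any_q4 = f_or(any_q4, vec[4])
    q0 = node.label == "code" and node.text == "GOOG"
    q1 = node.label == "sell" and node.text == "376"
    q2 = f_and(any_q0, any_q1)          # */q0 and */q1
    q3 = f_and(node.label == "stock", q2)  # stock[q2]
    q4 = f_or(q3, any_q4)               # //q3
    return [q0, q1, q2, q3, q4]


example1 = Node("market", children=[Node("stock", children=[
    Node("code", "GOOG"), Node("buy", "370"), Node("sell", "376")])])
example2 = Node("market", children=[Node("stock", children=[
    Node("code", "GOOG"), Node("buy", "370"), Node("F", virtual=True)])])

print(eval_vector(example1)[4])  # True
print(eval_vector(example2)[4])  # ('or', ('var', 'F', 1), ('var', 'F', 4))
```

In the second example the answer cannot be decided locally, so the result is a residual formula: Q holds if the root of F is a matching sell element or if F contains a matching stock.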
8
The ParBoX Algorithm
Three stages
Stage 1: The querying peer P_Q sends query Q to all peers holding a fragment (the fragment tree is used to identify all such peers).
Stage 2: Evaluate Q, in parallel, over each fragment F_i in peer P_j.
Stage 3: Collect the partial answers in P_Q and compute the answer to Q.
Fragment placement: F0 (P0), F1 (P1), F2 (P2), F3 (P2)
Key considerations/concerns:
–(Total/parallel) computation costs.
–Communication costs.
–Level of fragmentation.
ParBoX comes in flavors:
–HybridParBoX
–FullDistParBoX
–LazyParBoX
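The three stages can be sketched as a small simulation. This is a deliberately simplified illustration, not the ParBoX algorithm itself: fragments are reduced to lists of (code, sell) quotes, stock elements are assumed never to be split across fragments, and a partial answer is just a local truth value plus the names of the sub-fragments it defers to. The names `PEERS`, `evaluate_at_peer`, and `parbox_sketch`, the fragment tree shape, and the assignment of quotes to fragments are all assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative data only. Placement follows the slide (F0 on P0, F1 on P1,
# F2 and F3 on P2); which quotes sit in which fragment is an assumption.
PEERS = {
    "P0": {"F0": {"quotes": [("IBM", 78)], "subfragments": ["F1", "F3"]}},
    "P1": {"F1": {"quotes": [("YHOO", 35)], "subfragments": ["F2"]}},
    "P2": {
        "F2": {"quotes": [("GOOG", 373)], "subfragments": []},
        "F3": {"quotes": [("AAPL", 65), ("GOOG", 372)], "subfragments": []},
    },
}


def evaluate_at_peer(fragments, code, sell):
    """Stage 2: a single visit evaluates every fragment the peer stores."""
    return {
        name: (
            any(c == code and s == sell for c, s in frag["quotes"]),
            frag["subfragments"],
        )
        for name, frag in fragments.items()
    }


def parbox_sketch(peers, root, code, sell):
    # Stages 1 and 2: ship the query (not the data) to every peer;
    # the peers evaluate their fragments in parallel.
    with ThreadPoolExecutor() as pool:
        replies = list(pool.map(
            lambda frags: evaluate_at_peer(frags, code, sell), peers.values()))
    partial = {}
    for reply in replies:
        partial.update(reply)

    # Stage 3: the querying peer combines the partial answers
    # along the fragment tree, starting from the root fragment.
    def answer(name):
        local_match, subfragments = partial[name]
        return local_match or any(answer(sub) for sub in subfragments)

    return answer(root)


print(parbox_sketch(PEERS, "F0", "GOOG", 376))  # False
print(parbox_sketch(PEERS, "F0", "GOOG", 372))  # True
```

Note that P2 receives one request and answers for both F2 and F3, which mirrors the "visit each peer only once" goal.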
9
Analysis of Algorithms
Algorithm         Visits/peer  Computation                                 Communication
NaiveCentralized  1            O(|Q| |T|)                                  O(|T|)
NaiveDistributed  card(S_i)    O(|Q| |T|)                                  O(|Q| card(T))
ParBoX            1            Tot: O(|Q| (|T| + card(T)))                 O(|Q| card(T))
                               Par: O(|Q| (max_Pj |F_Pj| + card(T)))
HybridParBoX      1            Tot: O(|Q| |T|)                             O(|T|)
                               Par: O(|Q| (max_Pj |F_Pj| + card(T)))
FullDistParBoX    card(S_i)    Tot: O(|Q| (|T| + card(T)))                 O(|Q| card(T))
                               Par: O(|Q| (max_Pj |F_Pj| + card(T)))
LazyParBoX        card(S_i)    Tot: O(|Q| (|T| + card(T)))                 O(|Q| card(T))
                               Par: O(|Q| card(T) max_T |F_i|)
Tot = total computation cost; Par = parallel computation cost.
card(S_i) = # of fragments in peer P_i
card(T) = # of fragments of tree T. Note that card(T) ≤ |T|.
|F_Pj| = sum of the sizes of the fragments in peer P_j
Communication costs are LOW and independent of T (the data).
Computation costs are comparable to the best-known centralized algorithm.
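To make the communication column concrete, the sketch below plugs numbers into three of the asymptotic terms, treating every hidden constant as 1. The sizes used (|Q| = 8, card(T) = 4, and two values of |T|) are hypothetical inputs chosen for illustration, not measurements, and the `COMMUNICATION` table and `communication_estimate` helper are illustrative names.

```python
# Communication terms copied from the table, with every hidden constant taken as 1.
COMMUNICATION = {
    "NaiveCentralized": lambda q, t, card_t: t,           # O(|T|)
    "NaiveDistributed": lambda q, t, card_t: q * card_t,  # O(|Q| card(T))
    "ParBoX":           lambda q, t, card_t: q * card_t,  # O(|Q| card(T))
}


def communication_estimate(algorithm, q, t, card_t):
    """Unit-cost estimate of the communication term for one algorithm."""
    return COMMUNICATION[algorithm](q, t, card_t)


# Hypothetical sizes: |Q| = 8 and card(T) = 4, with |T| growing.
for tree_size in (10_000, 1_000_000):
    print(tree_size,
          communication_estimate("NaiveCentralized", 8, tree_size, 4),
          communication_estimate("ParBoX", 8, tree_size, 4))
# 10000 10000 32
# 1000000 1000000 32
```

The ParBoX estimate stays at 32 while the centralized one grows with the tree, which is the sense in which communication is independent of the data.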
10
The Experimental Study
The setting:
–Ten (10) Linux machines (peers) distributed over a LAN.
–XMark sites are fragmented and distributed over the network. Their sizes vary between 5MB and 150MB.
The parameters:
–# of machines participating in each experiment
–Size of query Q
–Size of tree T
–The shape of the fragment tree:
  –Number of fragments in the tree
  –Nesting level (deep vs. shallow fragment trees)
  –Number of fragments per machine
11
NaiveCentralized vs. ParBoX
Setting: |T| = 50MB, |Q| = 8, # fragments/peer = 1
With |T| fixed, each machine we add shrinks the fragment allocated to each machine by less than the previous addition did.
Parallelism works! Shipping data costs!
12
Varying Query and Data Size
Setting: # peers = 8, # fragments/peer = 1
[Figure: fragment tree over fragments F0, F1, F2, F3, F4, F5, F6, and F7.]
13
Summary
We demonstrated, in practice, that partial evaluation is effective for query processing over fragmented XML document trees.
We presented the ParBoX family of algorithms for evaluating Boolean XPath queries.
Our algorithms guarantee that:
–Computation costs are optimal.
–Each peer is visited only once.
–Communication depends only on the query size (and not on the tree).
The question on everybody's mind… Can we extend this idea to non-Boolean XPath queries?
The answer is YES… but you have to wait a bit to read about it!