Published by Giles Harrison. Modified over 8 years ago.
1
Partial Query-Evaluation in Internet Query Engines
Jayavel Shanmugasundaram, Kristin Tufte, David DeWitt, David Maier, Jeffrey Naughton
University of Wisconsin & Oregon Graduate Institute
2
Outline: Motivation; Desired Operator Properties; Implementation Alternatives; Performance Evaluation; Conclusion
3
Querying the WWW: The Present
User question: Who won the Nobel prize for Physics in 1999?
Keyword query: Nobel prize physics 1999
[Diagram labels: www.google.com, Search Engine, The Internet, HTML File]
4
Querying the WWW: The Present
Want: 1998 Red BMW, no accidents, 20% < avg. model price
Keyword query: 1998 Red BMW price
[Diagram labels: www.google.com, Search Engine, The Internet, HTML File, HTML File]
5
Querying the WWW: The Future?
Want: 1998 Red BMW, no accidents, 20% < avg. model price
[Diagram labels: www.google+.com, High-level Query Through GUI, XML Query Language, Internet Query Engine, XML Query Engine (e.g., Niagara), The Internet, (Queryable) Data Source, (Queryable) XML Sources]
6
Inside the Internet Query Engine
[Query plan diagram. Inputs: Red Used BMW Cars (carId, model, price, otherinfo) and Accident Reports (carId), each collected by a Union. Operators: Not Exists; Group By producing (model, avgprice); Join on model = model and price <= 0.8 * avgprice.]
7
The Problem
Return results to users as soon as possible
“Results so far” for queries with blocking operators
Arbitrary blocking operators
  – Not exists, Average, Nest, …
Blocking operators occurring anywhere in the query
  – Potentially intermixed with non-blocking operators
8
Outline: Motivation; Desired Operator Properties; Implementation Alternatives; Performance Evaluation; Conclusion
9
What is a Partial Result of a Query?
Let the Full Result of Query Q on Inputs A and B be:
  – Q(A, B)
Then a Partial Result of Query Q on Inputs A and B is:
  – Q(PA, PB), where
  – PA ⊆ A
  – PB ⊆ B
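A minimal sketch of this definition, assuming the query is an ordinary Python function over two lists. The query `q`, the car ids, and the arrival order are hypothetical and chosen only to illustrate Q(A, B) versus Q(PA, PB).

```python
def q(cars, accidents):
    """Hypothetical query Q: car ids with no accident report (a NOT EXISTS)."""
    reported = set(accidents)
    return [car for car in cars if car not in reported]


A = [1, 3, 19, 5]   # full input A: all car ids
B = [3, 19]         # full input B: car ids that have an accident report

PA = A[:3]          # the part of A that has arrived so far (a subset of A)
PB = B[:1]          # the part of B that has arrived so far (a subset of B)

full_result = q(A, B)        # Q(A, B)
partial_result = q(PA, PB)   # Q(PA, PB)

print(full_result)     # [1, 5]
print(partial_result)  # [1, 19]
```

Car 19 appears in the partial result but not in the full result, because its accident report has not arrived yet. A partial result is therefore not always a subset of the full result.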
10
Maximal Output Property
Produce “correct” results as soon as possible
Why?
  – If the query is non-blocking: produces results soon
  – If the query is blocking: returns the “non-blocking parts” soon (e.g., outer join)
11
Inside the Internet Query Engine
[Query plan diagram. Inputs: Red 1998 BMW Cars (carId, model, price, otherinfo) and Accident Reports (carId), each collected by a Union. Operators: Not Exists; Group By producing (model, avgprice); Join on model = model and price <= 0.8 * avgprice.]
12
Anytime Property
Blocking operators should be able to return the “result so far” at any time
Why?
  – User can request partial results at any time
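One way to picture the anytime property is a group-by average that keeps running sums and can report its current state whenever asked. This is an illustrative sketch, not the engine's implementation; the class `AnytimeAverage` and its method names are hypothetical.

```python
from collections import defaultdict


class AnytimeAverage:
    """Hypothetical group-by average that can answer at any time."""

    def __init__(self):
        self._sum = defaultdict(float)   # model -> sum of prices seen so far
        self._count = defaultdict(int)   # model -> number of prices seen so far

    def consume(self, model, price):
        """Absorb one input tuple; never blocks waiting for end of input."""
        self._sum[model] += price
        self._count[model] += 1

    def result_so_far(self):
        """Return the averages over the input consumed up to now."""
        return {m: self._sum[m] / self._count[m] for m in self._sum}


avg = AnytimeAverage()
for model, price in [("Z3", 10000), ("Z3", 20000), ("400i", 30000), ("400i", 20000)]:
    avg.consume(model, price)
snapshot1 = avg.result_so_far()   # first partial result request

avg.consume("400i", 20000)        # more input arrives later
snapshot2 = avg.result_so_far()   # second partial result request
```

The two snapshots differ for the 400i model, which is why downstream operators need a way to absorb changes rather than only additions.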
13
Inside the Internet Query Engine
[Query plan diagram. Inputs: Red 1998 BMW Cars (carId, model, price, otherinfo) and Accident Reports (carId), each collected by a Union. Operators: Not Exists; Group By producing (model, avgprice); Join on model = model and price <= 0.8 * avgprice.]
14
Non-Monotonic Input/Output Property
Operators should handle “changes”, not just additions to input
Similarly, operators should produce “changes”, not just additions to output
Applies to both blocking and non-blocking operators
Why?
  – Partial results may represent “wrong” answers
  – Need to be corrected later
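The idea of operators exchanging changes can be sketched with a Not Exists operator that emits `("add", tuple)` and `("del", tuple)` records. This is a simplified, eager variant written for illustration; the class `NotExists`, its method names, and the change format are assumptions rather than the engine's actual interface.

```python
class NotExists:
    """Hypothetical NOT EXISTS: cars with no accident report, as a change stream."""

    def __init__(self):
        self.cars = {}         # carId -> car tuple seen so far
        self.reported = set()  # carIds with an accident report
        self.emitted = set()   # carIds currently present in the output

    def on_car(self, car):
        """Input addition on the car side."""
        car_id = car[0]
        self.cars[car_id] = car
        if car_id in self.reported:
            return []
        self.emitted.add(car_id)
        return [("add", car)]

    def on_car_deleted(self, car_id):
        """Input deletion on the car side (non-monotonic input)."""
        car = self.cars.pop(car_id, None)
        if car is not None and car_id in self.emitted:
            self.emitted.discard(car_id)
            return [("del", car)]
        return []

    def on_accident(self, car_id):
        """A late accident report retracts an earlier output (non-monotonic output)."""
        self.reported.add(car_id)
        if car_id in self.emitted:
            self.emitted.discard(car_id)
            return [("del", self.cars[car_id])]
        return []


op = NotExists()
changes = []
changes += op.on_car((1, "Z3", 10000))
changes += op.on_car((19, "Z3", 20000))
changes += op.on_accident(19)   # car 19 was a "wrong" answer; it is corrected here
```

After these calls, `changes` holds two additions followed by one deletion for car 19.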
15
Inside the Internet Query Engine
[Query plan diagram. Inputs: Red 1998 BMW Cars (carId, model, price, otherinfo) and Accident Reports (carId), each collected by a Union. Operators: Not Exists; Group By producing (model, avgprice); Join on model = model and price <= 0.8 * avgprice.]
16
Flexible Input Property
Should be able to process data from any input at any time
Processes data as it becomes available
Why?
  – If the query is non-blocking: can return results soon
  – If the query is blocking: faster partial result response time
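A symmetric hash join is a common example of an operator with flexible input: it keeps one hash table per input and, whenever a tuple arrives on either side, inserts it into its own table and probes the other. The sketch below is a minimal in-memory version; the class and method names are hypothetical.

```python
from collections import defaultdict


class SymmetricHashJoin:
    """Minimal in-memory symmetric hash join on an equality key."""

    def __init__(self):
        self.left_table = defaultdict(list)    # key -> left rows seen so far
        self.right_table = defaultdict(list)   # key -> right rows seen so far

    def on_left(self, key, row):
        """A left tuple arrives: store it, then probe the right table."""
        self.left_table[key].append(row)
        return [(row, match) for match in self.right_table.get(key, [])]

    def on_right(self, key, row):
        """A right tuple arrives: store it, then probe the left table."""
        self.right_table[key].append(row)
        return [(match, row) for match in self.left_table.get(key, [])]


join = SymmetricHashJoin()
out = []
out += join.on_left("Z3", (1, "Z3", 10000))       # no match yet
out += join.on_right("400i", ("400i", 25000))     # no match yet
out += join.on_right("Z3", ("Z3", 15000))         # matches car 1
out += join.on_left("400i", (5, "400i", 30000))   # matches the 400i average
```

Tuples are accepted in whatever order the two sources deliver them, and each match is produced as soon as both of its halves have arrived.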
17
A Note on Partial Result Accuracy
Focus is on producing partial results
Architecture is general enough to exploit existing techniques
  – Online aggregation [Hellerstein et al.]
  – Nested aggregates [Tan et al.]
Accuracy for general blocking operators?
18
Outline: Motivation; Desired Operator Properties; Implementation Alternatives; Performance Evaluation; Conclusion
19
Where do we start?
Use known flexible input, maximal output operator implementations
  – Non-blocking: select, symmetric hash join, XJoin
  – Blocking: group-by, symmetric outer join
Blocking operator implementations should satisfy the anytime property
All operator implementations should satisfy the non-monotonic input/output property
20
Non-Monotonic Input/Output
Re-evaluation Approach:
  – On partial result request, compute results “so far”
  – Then forget all “potentially incorrect” inputs
Differential Approach:
  – On partial result request, compute results “so far”
  – “Update” incorrect inputs for future result computation
21
Inside the Internet Query Engine
[Query plan diagram. Inputs: Red 1998 BMW Cars (carId, model, price, otherinfo) and Accident Reports (carId), each collected by a Union. Operators: Not Exists; Group By producing (model, avgprice); Join on model = model and price <= 0.8 * avgprice.]
22
Re-evaluation Join (1, Z3, 10000) (Z3, 15000) (19, Z3, 20000) (5, 400i, 30000) (400i, 25000) (1, Z3, 10000) (19, Z3, 20000) (5, 400i, 30000) (Z3, 15000) (400i, 25000) (1, Z3, 10000) (3, 400i, 20000)
23
Re-evaluation Join (1, Z3, 10000) (19, Z3, 20000) (5, 400i, 30000) (1, Z3, 10000) (3, 400i, 20000) (8, 400i, 20000) (Z3, 15000) (400i, 23333)
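The re-evaluation slides can be approximated as follows: on each partial result request, the averages and the join are recomputed from the car tuples seen so far, and nothing derived from the earlier partial averages is kept. This is a sketch under assumptions: the function names are hypothetical, and the input tuple (3, 400i, 20000) is inferred from the 25000 average and the output shown on the slides.

```python
def average_by_model(cars):
    """Group By: model -> average price over the cars seen so far."""
    totals = {}
    for _car_id, model, price in cars:
        s, n = totals.get(model, (0, 0))
        totals[model] = (s + price, n + 1)
    return {model: s / n for model, (s, n) in totals.items()}


def reevaluate(cars):
    """Recompute the whole join: cars priced at most 0.8 * their model's average."""
    averages = average_by_model(cars)
    return [car for car in cars if car[2] <= 0.8 * averages[car[1]]]


cars = [(1, "Z3", 10000), (19, "Z3", 20000), (5, "400i", 30000), (3, "400i", 20000)]
first = reevaluate(cars)          # first partial result request

cars.append((8, "400i", 20000))   # a new car arrives
second = reevaluate(cars)         # second request: everything is recomputed

print(first)    # [(1, 'Z3', 10000), (3, '400i', 20000)]
print(second)   # [(1, 'Z3', 10000)]
```

The second request redoes the Z3 work even though nothing about the Z3 model changed, which is the unnecessary computation this approach accepts in exchange for simplicity.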
24
Differential Join (1, Z3, 10000) (Z3, 15000) (19, Z3, 20000) (5, 400i, 30000) (400i, 25000) (1, Z3, 10000) (19, Z3, 20000) (5, 400i, 30000) (Z3, 15000) (400i, 25000) (1, Z3, 10000) (3, 400i, 20000)
25
Differential Join (1, Z3, 10000) (19, Z3, 20000) (5, 400i, 30000) (Z3, 15000) (400i, 25000) (3, 400i, 20000) update (400i, 23333) del (3, 400i, 20000)
26
Differential Join (1, Z3, 10000) (19, Z3, 20000) (5, 400i, 30000) (Z3, 15000) (400i, 23333) (3, 400i, 20000) del (3, 400i, 20000) (8, 400i, 20000)
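The differential slides can be sketched as a join that keeps its state and reacts to an updated average by emitting only the affected changes. The class `DifferentialJoin`, its method names, and the `("add" | "del", tuple)` change format are hypothetical; the data follows the slides, with car (3, 400i, 20000) assumed to be part of the input.

```python
class DifferentialJoin:
    """Hypothetical join of cars with per-model averages that emits changes."""

    def __init__(self):
        self.cars = []       # car tuples seen so far
        self.avg = {}        # model -> current average price
        self.output = set()  # car tuples currently in the result

    def _qualifies(self, car):
        _car_id, model, price = car
        return model in self.avg and price <= 0.8 * self.avg[model]

    def _refresh(self, cars):
        """Re-check only the given cars and report what changed."""
        changes = []
        for car in cars:
            ok = self._qualifies(car)
            if ok and car not in self.output:
                self.output.add(car)
                changes.append(("add", car))
            elif not ok and car in self.output:
                self.output.remove(car)
                changes.append(("del", car))
        return changes

    def on_car(self, car):
        self.cars.append(car)
        return self._refresh([car])

    def on_average(self, model, avgprice):
        """Insert or update one model's average; touch only that model's cars."""
        self.avg[model] = avgprice
        return self._refresh([c for c in self.cars if c[1] == model])


join = DifferentialJoin()
for car in [(1, "Z3", 10000), (19, "Z3", 20000), (5, "400i", 30000), (3, "400i", 20000)]:
    join.on_car(car)
added_z3 = join.on_average("Z3", 15000)
added_400i = join.on_average("400i", 25000)
corrected = join.on_average("400i", 23333)   # update (400i, 23333)
late_car = join.on_car((8, "400i", 20000))
```

The update to the 400i average produces a single `del` for car 3 and leaves the Z3 result untouched, which mirrors the "re-computes only what is necessary" point of the differential approach.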
27
Re-evaluation vs. Differential
Re-evaluation Approach:
  – Simple: just “forget” partial inputs
  – Easier to extend (no changes to tuple structure)
  – Unnecessary computation
Differential Approach:
  – Need to handle deletions/updates of inputs
  – Changes to tuple structure
  – Re-computes only what is necessary
28
Outline: Motivation; Desired Operator Properties; Implementation Alternatives; Performance Evaluation; Conclusion
29
Response Time
30
Outline: Motivation; Desired Operator Properties; Implementation Alternatives; Performance Evaluation; Conclusion
31
Conclusion
New properties for query engine operators
Operator implementation alternatives
  – Re-evaluation
  – Differential
Evaluation
  – Partial results improve response time
  – Re-evaluation approach is simpler
  – Differential approach is more efficient
32
Future Work
General GUI
Partial result accuracy for general blocking operators
Changes at finer granularities
Consistent partial results
33
Related Work
Online aggregation [Hellerstein et al.]
Nested aggregates [Tan et al.]
Online reordering [Raman et al.]
Symmetric hash join [Wilschut et al.]
Adaptive operators [Ives et al.]
XJoin [Urhan et al.]