Published by Danna Bitton. Modified over 9 years ago.
1
Relaxing Join and Selection Queries
Rares Vernica, UC Irvine, USA
Joint work with Nick Koudas, Chen Li, and Anthony K. H. Tung
2
Query Example

SELECT *
FROM Jobs J, Candidates C
WHERE J.Salary <= 95
  AND J.Zipcode = C.Zipcode
  AND C.WorkExp >= 5;

Jobs
ID   Company    Zipcode  Salary
J1   Broadcom   92047    80
J2   Intel      93652    95
J3   Microsoft  82632    120
J4   IBM        90391    130
...  ...        ...      ...

Candidates
ID   Zipcode  ExpSalary  WorkExp
C1   93652    120        3
C2   92612    130        6
C3   82632    100        5
C4   90391    150        1
...  ...      ...        ...
3
What if the query answer is empty?

SELECT *
FROM Jobs J, Candidates C
WHERE J.Salary <= 95
  AND J.Zipcode = C.Zipcode
  AND C.WorkExp >= 5;

Adjust the conditions
  What conditions to adjust?
  How to adjust them?
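To make the question concrete, the sketch below loads only the four rows per table that are visible on the Query Example slide (the elided rows are not guessed) into an in-memory SQLite database and runs the query. SQLite and the variable names are assumptions made for illustration; the talk does not say which system was used.

```python
import sqlite3

# Only the rows visible on the slide; the elided rows are left out.
jobs = [("J1", "Broadcom", 92047, 80), ("J2", "Intel", 93652, 95),
        ("J3", "Microsoft", 82632, 120), ("J4", "IBM", 90391, 130)]
candidates = [("C1", 93652, 120, 3), ("C2", 92612, 130, 6),
              ("C3", 82632, 100, 5), ("C4", 90391, 150, 1)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Jobs (ID TEXT, Company TEXT, Zipcode INTEGER, Salary INTEGER)")
conn.execute("CREATE TABLE Candidates (ID TEXT, Zipcode INTEGER, ExpSalary INTEGER, WorkExp INTEGER)")
conn.executemany("INSERT INTO Jobs VALUES (?, ?, ?, ?)", jobs)
conn.executemany("INSERT INTO Candidates VALUES (?, ?, ?, ?)", candidates)

# The original query: all three conditions together.
original = conn.execute(
    "SELECT J.ID, C.ID FROM Jobs J, Candidates C "
    "WHERE J.Salary <= 95 AND J.Zipcode = C.Zipcode AND C.WorkExp >= 5"
).fetchall()

# Each condition on its own is satisfiable; only the conjunction is empty.
jobs_ok = conn.execute("SELECT COUNT(*) FROM Jobs WHERE Salary <= 95").fetchone()[0]
cands_ok = conn.execute("SELECT COUNT(*) FROM Candidates WHERE WorkExp >= 5").fetchone()[0]
zip_pairs = conn.execute(
    "SELECT COUNT(*) FROM Jobs J, Candidates C WHERE J.Zipcode = C.Zipcode"
).fetchone()[0]
```

On these eight rows every condition matches something by itself, yet the conjunction matches nothing, which is why it is not obvious which condition to adjust.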
4
Example Percentages of Empty Result Queries
In a Customer Relationship Management (CRM) application developed by IBM
  18.07% (3,396 empty result queries in 18,793 queries)
In a real estate application developed by IBM
  5.75%
In a digital library application [JCM+00]
  10.53%
In a bioinformatics application [RCP+98]
  38%
Source: Gang Luo (IBM T.J. Watson Research Center, USA), Efficient Detection of Empty-Result Queries, VLDB 2006, p. 1015
5
Observations
(Jobs and Candidates tables as on the Query Example slide, with selections Salary <= 95 and WorkExp >= 5.)
Different ways to adjust the conditions: select vs. join
  Adjust join vs. adjust both selections
How much to adjust each condition?
  Salary <= 100 vs. Salary <= 120
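The observations can be tried out on the visible sample rows. This is a minimal sketch: the helper `answers` and the idea of relaxing the join by a numeric zipcode gap (`max_zip_gap`) are assumptions made for illustration, not definitions taken from the slides.

```python
# Visible sample rows only: id -> (zipcode, salary) and id -> (zipcode, work_exp).
JOBS = {"J1": (92047, 80), "J2": (93652, 95), "J3": (82632, 120), "J4": (90391, 130)}
CANDIDATES = {"C1": (93652, 3), "C2": (92612, 6), "C3": (82632, 5), "C4": (90391, 1)}

def answers(max_salary, min_exp, max_zip_gap=0):
    """Pairs satisfying the (possibly relaxed) selections and join.

    max_zip_gap is a hypothetical join relaxation: zipcodes may differ
    numerically by at most this amount (0 means the original equi-join).
    """
    return sorted(
        (j, c)
        for j, (jz, salary) in JOBS.items()
        for c, (cz, exp) in CANDIDATES.items()
        if salary <= max_salary and exp >= min_exp and abs(jz - cz) <= max_zip_gap
    )

original = answers(95, 5)                        # the query as written
salary_100 = answers(100, 5)                     # small relaxation of Salary
salary_120 = answers(120, 5)                     # larger relaxation of Salary
exp_3 = answers(95, 3)                           # relax the other selection
join_relaxed = answers(95, 5, max_zip_gap=600)   # relax only the join
```

Relaxing Salary to 100 still returns nothing, while 120 admits one pair; relaxing WorkExp or the join instead admits different pairs, so both the choice of condition and the amount matter.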
6
Contributions
Query relaxation framework for selections and joins
Lattice-based approach for query relaxation
Efficient relaxation algorithms
7
Overview
1. Motivation
2. Query Relaxation
3. Lattice-based Relaxation
4. Relaxation Algorithms
5. Variations
6. Experiments
8
Query Relaxation
Top-k / Nearest neighbor
  Weight for each condition
Skyline
  No weights are needed
  Conditions are not considered equal
  Return non-dominated points
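The reliance of top-k on weights can be seen in a small sketch. The points are hypothetical relaxation vectors (salary relaxation, work-experience relaxation), smaller being better, and the weighted-sum score is an assumed scoring function chosen for illustration.

```python
import heapq

def top_k(points, weights, k):
    """Return the k points with the smallest weighted sum of relaxations."""
    def score(point):
        return sum(w * x for w, x in zip(weights, point))
    return heapq.nsmallest(k, points, key=score)

# Hypothetical relaxation vectors: (salary relaxation, work-exp relaxation).
points = [(0, 2), (25, 0), (35, 4)]

equal_weights = top_k(points, (1, 1), 1)    # scores 2, 25, 39
exp_heavy = top_k(points, (1, 20), 1)       # scores 40, 25, 115
```

Changing the weights changes the winner, which is the dependence the skyline approach avoids.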
9
Query Relaxation
Skyline
Stephan Börzsönyi, Donald Kossmann, Konrad Stocker: The Skyline Operator. ICDE 2001
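A minimal sketch of the skyline operator, assuming every dimension is to be minimized. It is the quadratic textbook formulation, not one of the optimized algorithms from the cited paper, and the sample points are made up.

```python
def dominates(p, q):
    """p dominates q: no worse in every dimension, strictly better in at least one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def skyline(points):
    """Keep the points that no other point dominates (O(n^2) comparisons)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Made-up two-dimensional points, smaller is better in both dimensions.
points = [(50, 8), (80, 2), (60, 9), (90, 1), (95, 3)]
result = skyline(points)
```

Here (60, 9) is dominated by (50, 8) and (95, 3) by (80, 2); the remaining three points are incomparable, so all of them are returned without any weights.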
10
Overview
1. Motivation
2. Query Relaxation
3. Lattice-based Relaxation
4. Relaxation Algorithms
5. Variations
6. Experiments
11
Lattice-based Relaxation
(Jobs and Candidates tables as on the Query Example slide, with selections Salary <= 95 and WorkExp >= 5.)
R: select on Jobs
J: join condition
S: select on Candidates
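The slide names three conditions, R, J and S, but the lattice drawing itself did not survive. One plausible reading, assumed here, is that each lattice node is the set of conditions being relaxed, from none (the original query) to all three. The functions `lattice_levels` and `children` are hypothetical names for this sketch.

```python
from itertools import combinations

CONDITIONS = ("R", "J", "S")  # select on Jobs, join, select on Candidates

def lattice_levels(conditions=CONDITIONS):
    """Level k holds every set of exactly k relaxed conditions."""
    return [[frozenset(c) for c in combinations(conditions, k)]
            for k in range(len(conditions) + 1)]

def children(node, conditions=CONDITIONS):
    """Nodes reached from `node` by relaxing one more condition."""
    return [node | {c} for c in conditions if c not in node]

levels = lattice_levels()
level_sizes = [len(level) for level in levels]
from_r = children(frozenset({"R"}))
```

Under this reading the lattice has 2^3 = 8 nodes, with the original query at the top and the fully relaxed query at the bottom.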
12
Overview
1. Motivation
2. Query Relaxation
3. Lattice-based Relaxation
4. Relaxation Algorithms
5. Variations
6. Experiments
13
Relaxing Selection Conditions
(Jobs and Candidates tables as on the Query Example slide, with selections Salary <= 95 and WorkExp >= 5.)
Algorithm (INCORRECT):
1. Compute Skyline on Jobs
2. Compute Skyline on Candidates
3. Join the Skylines
On the example, joining the two Skylines gives an empty result.
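The failure of this algorithm can be reproduced on the visible sample rows. In this sketch the relaxation of a row is assumed to be the amount by which its selection bound would have to move (`salary_relax`, `exp_relax` are hypothetical helpers); with a single relaxed condition per table, the local skyline is simply the set of rows with the smallest relaxation.

```python
# Visible sample rows only: id -> (zipcode, salary) and id -> (zipcode, work_exp).
JOBS = {"J1": (92047, 80), "J2": (93652, 95), "J3": (82632, 120), "J4": (90391, 130)}
CANDIDATES = {"C1": (93652, 3), "C2": (92612, 6), "C3": (82632, 5), "C4": (90391, 1)}

def salary_relax(salary, limit=95):
    """How far Salary <= limit must be raised to admit this job."""
    return max(0, salary - limit)

def exp_relax(work_exp, minimum=5):
    """How far WorkExp >= minimum must be lowered to admit this candidate."""
    return max(0, minimum - work_exp)

def local_skyline(relaxation_by_id):
    """One dimension per table, so the skyline is the set of minimum rows."""
    best = min(relaxation_by_id.values())
    return {row_id for row_id, r in relaxation_by_id.items() if r == best}

job_sky = local_skyline({j: salary_relax(s) for j, (_, s) in JOBS.items()})
cand_sky = local_skyline({c: exp_relax(e) for c, (_, e) in CANDIDATES.items()})

# Step 3: join the two skylines on Zipcode.
joined = [(j, c) for j in sorted(job_sky) for c in sorted(cand_sky)
          if JOBS[j][0] == CANDIDATES[c][0]]
```

The per-table skylines keep only rows that already satisfy their selection, and none of those rows join, so the relaxed answer is as empty as the original one.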
14
Relaxing Selection Conditions
(Jobs and Candidates tables as on the Query Example slide, with selections Salary <= 95 and WorkExp >= 5.)
Join First Algorithm:
1. Compute the join (disregarding the selections)
2. Compute Skyline on join results
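The two Join First steps can be sketched on the visible sample rows as follows. The relaxation vector per joined pair (salary relaxation, work-experience relaxation) is an assumption about how the distance to the original selections is measured, and `join_first` is a hypothetical name.

```python
# Visible sample rows only: id -> (zipcode, salary) and id -> (zipcode, work_exp).
JOBS = {"J1": (92047, 80), "J2": (93652, 95), "J3": (82632, 120), "J4": (90391, 130)}
CANDIDATES = {"C1": (93652, 3), "C2": (92612, 6), "C3": (82632, 5), "C4": (90391, 1)}

def dominates(p, q):
    """p dominates q: no worse in every dimension, strictly better in one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def join_first(jobs, candidates, max_salary=95, min_exp=5):
    # Step 1: equi-join on Zipcode, ignoring both selections.
    joined = {}
    for j, (jz, salary) in jobs.items():
        for c, (cz, exp) in candidates.items():
            if jz == cz:
                joined[(j, c)] = (max(0, salary - max_salary), max(0, min_exp - exp))
    # Step 2: skyline over the relaxation vectors of the join results.
    return {pair: v for pair, v in joined.items()
            if not any(dominates(w, v) for w in joined.values())}

result = join_first(JOBS, CANDIDATES)
```

The join produces three pairs; (J4, C4) needs more relaxation on both selections than (J2, C1), so only (J2, C1) and (J3, C3) remain.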
15
Relaxing Selection Conditions: Variations
Pruning Join
  Build the Skyline during the join
Pruning Join+
  Pruning Join
  Build the local Skyline before the join
Sorted Access Join
  Fagin's Top-k: sort the columns on relaxation
  Compute the join Skyline
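A minimal sketch of the first variation, building the skyline while the join runs, on the visible sample rows. The hash join on Zipcode and the function name `pruning_join` are illustrative assumptions; the slides do not specify the join method or the pruning details.

```python
from collections import defaultdict

# Visible sample rows only: id -> (zipcode, salary) and id -> (zipcode, work_exp).
JOBS = {"J1": (92047, 80), "J2": (93652, 95), "J3": (82632, 120), "J4": (90391, 130)}
CANDIDATES = {"C1": (93652, 3), "C2": (92612, 6), "C3": (82632, 5), "C4": (90391, 1)}

def dominates(p, q):
    """p dominates q: no worse in every dimension, strictly better in one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def pruning_join(jobs, candidates, max_salary=95, min_exp=5):
    # Build a hash table on the join attribute of Candidates.
    by_zip = defaultdict(list)
    for c, (cz, exp) in candidates.items():
        by_zip[cz].append((c, exp))

    sky = {}  # current skyline: pair -> relaxation vector
    for j, (jz, salary) in jobs.items():
        for c, exp in by_zip.get(jz, []):
            v = (max(0, salary - max_salary), max(0, min_exp - exp))
            if any(dominates(w, v) for w in sky.values()):
                continue  # pruned: never stored as a join result
            # Evict skyline members that the new pair dominates.
            sky = {p: w for p, w in sky.items() if not dominates(v, w)}
            sky[(j, c)] = v
    return sky

result = pruning_join(JOBS, CANDIDATES)
```

The result is the same as computing the full join first, but dominated pairs such as (J4, C4) are discarded as soon as they are produced.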
16
Relaxing all conditions
Multi-Dim.-Index-based-Relaxation Algorithm:
1. Traverse the index structure top-down
2. Form pairs of nodes or records
3. Build the Skyline
[Figure labels: Skyline, Queue]
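The index-based algorithm is not reconstructed here. As a reference point only, the sketch below is a brute-force baseline that forms every pair of records and builds the skyline over three relaxations at once, which is the output an index-based traversal would be expected to produce on the visible sample rows. Measuring the join relaxation as the absolute numeric zipcode difference is an assumption.

```python
# Visible sample rows only: id -> (zipcode, salary) and id -> (zipcode, work_exp).
JOBS = {"J1": (92047, 80), "J2": (93652, 95), "J3": (82632, 120), "J4": (90391, 130)}
CANDIDATES = {"C1": (93652, 3), "C2": (92612, 6), "C3": (82632, 5), "C4": (90391, 1)}

def dominates(p, q):
    """p dominates q: no worse in every dimension, strictly better in one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def relax_all(jobs, candidates, max_salary=95, min_exp=5):
    # One vector per pair: (salary relaxation, zipcode gap, work-exp relaxation).
    vectors = {
        (j, c): (max(0, salary - max_salary), abs(jz - cz), max(0, min_exp - exp))
        for j, (jz, salary) in jobs.items()
        for c, (cz, exp) in candidates.items()
    }
    # Brute-force skyline over all pairs (quadratic in the number of pairs).
    return {pair: v for pair, v in vectors.items()
            if not any(dominates(w, v) for w in vectors.values())}

result = relax_all(JOBS, CANDIDATES)
```

Each surviving pair relaxes a different condition: (J1, C2) only the join, (J2, C1) only the Candidates selection, and (J3, C3) only the Jobs selection.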
17
Overview
1. Motivation
2. Query Relaxation
3. Lattice-based Relaxation
4. Relaxation Algorithms
5. Variations
6. Experiments
18
Variations
Computing Top-k over Skyline
  Weight to each condition
Queries with multiple joins
Conditions on nonnumeric attributes
  Dominance checking function
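The first variation, top-k over the skyline, can be sketched by ranking only the skyline members with a weighted sum. The relaxation vectors (salary relaxation, zipcode gap, work-experience relaxation), the weights, and the name `top_k_over_skyline` are all illustrative assumptions.

```python
import heapq

def dominates(p, q):
    """p dominates q: no worse in every dimension, strictly better in one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def top_k_over_skyline(vectors, weights, k):
    """Compute the skyline first, then return its k best members by weighted sum."""
    sky = [pair for pair, v in vectors.items()
           if not any(dominates(w, v) for w in vectors.values())]
    def score(pair):
        return sum(w * x for w, x in zip(weights, vectors[pair]))
    return heapq.nsmallest(k, sky, key=score)

# Hypothetical relaxation vectors for five job/candidate pairs.
VECTORS = {
    ("J1", "C2"): (0, 565, 0),
    ("J2", "C1"): (0, 0, 2),
    ("J3", "C3"): (25, 0, 0),
    ("J4", "C4"): (35, 0, 4),    # dominated by (J2, C1)
    ("J2", "C2"): (0, 1040, 0),  # dominated by (J1, C2)
}

# Assumed weights: work experience matters most, zipcode gap least.
best_two = top_k_over_skyline(VECTORS, weights=(10, 1, 1000), k=2)
```

With these weights the skyline members score 565, 2000 and 250, so the two best answers are (J3, C3) followed by (J1, C2).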
19
Overview
1. Motivation
2. Query Relaxation
3. Lattice-based Relaxation
4. Relaxation Algorithms
5. Variations
6. Experiments
20
Experimental Setting
Datasets
  Real
    1. Internet Movie Database (IMDB): Movies (120k) and ActorInMovies (1.2m)
    2. Census-Income, UCI KDD Repository: Census (200k)
  Synthetic
    Independent, Correlated, and Anticorrelated
Implementation
  GNU C++
  Spatial Index Library (R-tree)
  Linux, AMD Opteron 240, 1GB RAM
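The slides do not say how the synthetic datasets were generated. The sketch below is one simple, assumed way to produce two-dimensional independent, correlated and anticorrelated points in the unit square; the noise level, the function names, and the generator itself are hypothetical and are not the generator used in the experiments.

```python
import random

def generate(kind, n, seed=0):
    """Generate n two-dimensional points in [0, 1] x [0, 1] (illustrative only)."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        x = rng.random()
        noise = rng.gauss(0, 0.05)
        if kind == "independent":
            y = rng.random()      # y unrelated to x
        elif kind == "correlated":
            y = x + noise         # small x tends to come with small y
        elif kind == "anticorrelated":
            y = 1 - x + noise     # small x tends to come with large y
        else:
            raise ValueError(f"unknown kind: {kind}")
        points.append((x, min(1.0, max(0.0, y))))
    return points

def covariance(points):
    """Sample covariance of the two coordinates."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)

correlated = generate("correlated", 500)
anticorrelated = generate("anticorrelated", 500)
independent = generate("independent", 500)
```

In skyline workloads, anticorrelated data is generally the hard case because many points are good in one dimension and bad in the other, so few of them dominate each other.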
21
IMDB Dataset
Different algorithms, different behaviors
22
Different datasets, different behaviors
[Plots: Correlated Dataset, Anticorrelated Dataset, Independent Dataset]
23
How big is the Skyline?
24
Relaxing join takes time
Self-join on Census Dataset
25
Top-k over Skyline
IMDB Dataset
26
Related Work
Muslea et al.
  Alternate forms of conjunctive expressions
Efficient Skyline algorithms
  Selection queries
Efficient Top-k algorithms
  Require weights for conditions
27
Conclusions
Query relaxation framework for selections and joins
Lattice-based approach for query relaxation
Efficient relaxation algorithms
28
Future Work
Optimum use of the lattice structure
Relax conditions on string attributes
Algorithms applicable outside of databases
29
Questions?
31
Skyline vs. Top-k
32
Skyline vs. Top-k over Skyline