Relaxing Join and Selection Queries
  Rares Vernica, UC Irvine, USA
  Joint work with Nick Koudas, Chen Li, and Anthony K. H. Tung
Query Example
  SELECT *
  FROM Jobs J, Candidates C
  WHERE J.Salary <= 95
    AND J.Zipcode = C.Zipcode
    AND C.WorkExp >= 5;
  Tables:
    Jobs (ID, Company, Zipcode, Salary): J1 Broadcom, J2 Intel, J3 Microsoft, J4 IBM, …
    Candidates (ID, Zipcode, ExpSalary, WorkExp): …
What if the query answer is empty?
  SELECT *
  FROM Jobs J, Candidates C
  WHERE J.Salary <= 95
    AND J.Zipcode = C.Zipcode
    AND C.WorkExp >= 5;
  Adjust the conditions:
    What conditions to adjust?
    How to adjust them?
Example Percentages of Empty-Result Queries
  In a Customer Relationship Management (CRM) application developed by IBM: 18.07% (3,396 empty-result queries in 18,793 queries)
  In a real estate application developed by IBM: 5.75%
  In a digital library application [JCM+00]: 10.53%
  In a bioinformatics application [RCP+98]: 38%
  Source: Gang Luo (IBM T.J. Watson Research Center, USA), Efficient Detection of Empty-Result Queries, VLDB 2006, p. 1015
Observations
  Tables: Jobs (ID, Company, Zipcode, Salary), Candidates (ID, Zipcode, ExpSalary, WorkExp)
  Conditions: Salary <= 95, WorkExp >= 5
  Different ways to adjust the conditions:
    Select vs. join
    How much to adjust each condition? Salary <= 100 vs. Salary <= 120
    Adjust join vs. adjust both selections
Contributions
  Query relaxation framework for selections and joins
  Lattice-based approach for query relaxation
  Efficient relaxation algorithms
Overview
  1. Motivation
  2. Query Relaxation
  3. Lattice-based Relaxation
  4. Relaxation Algorithms
  5. Variations
  6. Experiments
Query Relaxation
  Top-k / nearest neighbor
    Weight for each condition
  Skyline
    No weights are needed
    Conditions are not considered equal
    Return non-dominated points
Query Relaxation: Skyline
  Stephan Börzsönyi, Donald Kossmann, Konrad Stocker: The Skyline Operator. ICDE 2001
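The "non-dominated points" idea from the Skyline slides can be sketched in a few lines. This is a minimal quadratic-time illustration, not the algorithm of the cited paper. It assumes each point is a tuple of relaxation amounts where smaller is better; the names `dominates` and `skyline` and the sample points are hypothetical.

```python
def dominates(a, b):
    """Return True if a is no worse than b in every dimension
    and strictly better in at least one (smaller is better)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def skyline(points):
    """Keep the points that no other point dominates (naive O(n^2) scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical relaxation vectors: (salary relaxation, work-experience relaxation).
points = [(5, 0), (0, 2), (3, 3), (10, 1), (0, 4)]
result = skyline(points)
print(result)
```

No weights are involved: (5, 0) and (0, 2) both survive because neither is better in every dimension.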
Lattice-based Relaxation
  Tables: Jobs (ID, Company, Zipcode, Salary), Candidates (ID, Zipcode, ExpSalary, WorkExp)
  R: selection on Jobs (Salary <= 95)
  J: join condition
  S: selection on Candidates (WorkExp >= 5)
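One way to picture a lattice over the three conditions is sketched below. It assumes that each lattice node is the set of conditions allowed to be relaxed, with the original query at the bottom and the fully relaxed query at the top. That reading is an interpretation for illustration, not something the slide states, and the function names are hypothetical.

```python
from itertools import combinations

# Condition labels from the slide: R (selection on Jobs),
# J (join condition), S (selection on Candidates).
CONDITIONS = ("R", "J", "S")


def lattice_levels(conditions):
    """Level k holds every set of exactly k relaxed conditions."""
    return [
        [frozenset(combo) for combo in combinations(conditions, k)]
        for k in range(len(conditions) + 1)
    ]


def children(node, conditions):
    """Nodes reachable by relaxing exactly one more condition."""
    return [node | {c} for c in conditions if c not in node]


levels = lattice_levels(CONDITIONS)
for k, nodes in enumerate(levels):
    print(k, ["".join(sorted(n)) or "(none)" for n in nodes])
```

Under this assumption, the lattice for one join and two selections has 8 nodes across 4 levels.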
Relaxing Selection Conditions
  Tables: Jobs (ID, Company, Zipcode, Salary), Candidates (ID, Zipcode, ExpSalary, WorkExp)
  Conditions: Salary <= 95, WorkExp >= 5
  Algorithm (INCORRECT):
    1. Compute Skyline on Jobs
    2. Compute Skyline on Candidates
    3. Join the Skylines
  Problem: the join of the two Skylines can be empty
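A small sketch of why joining per-table Skylines fails. The rows below are invented toy data, not the values from the slide, and the relaxation measure (how far a row misses its selection condition) is an assumption. With one selection per table, each local Skyline reduces to the rows with the smallest relaxation.

```python
SALARY_MAX = 95    # from the condition Salary <= 95
WORKEXP_MIN = 5    # from the condition WorkExp >= 5

# Hypothetical rows: (id, zipcode, salary) and (id, zipcode, work_exp).
jobs = [("J1", 92697, 120), ("J2", 92612, 100)]
candidates = [("C1", 92697, 3), ("C2", 90001, 4)]


def job_relaxation(job):
    """How much Salary <= 95 must be loosened to admit this job."""
    return max(0, job[2] - SALARY_MAX)


def candidate_relaxation(candidate):
    """How much WorkExp >= 5 must be loosened to admit this candidate."""
    return max(0, WORKEXP_MIN - candidate[2])


def local_skyline(rows, relaxation):
    """One-dimensional Skyline: the rows with minimal relaxation."""
    best = min(relaxation(r) for r in rows)
    return [r for r in rows if relaxation(r) == best]


job_sky = local_skyline(jobs, job_relaxation)                    # keeps J2
candidate_sky = local_skyline(candidates, candidate_relaxation)  # keeps C2

# Steps 1-3 of the incorrect algorithm: join the two local Skylines.
naive_answer = [(j[0], c[0]) for j in job_sky for c in candidate_sky if j[1] == c[1]]

# For comparison: the join over all rows, ignoring the selections.
full_join = [(j[0], c[0]) for j in jobs for c in candidates if j[1] == c[1]]

print(naive_answer, full_join)
```

In this toy data the locally best job and candidate sit in different zip codes, so the naive answer is empty even though the pair J1, C1 joins and could be returned after relaxing both selections.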
Relaxing Selection Conditions
  Tables: Jobs (ID, Company, Zipcode, Salary), Candidates (ID, Zipcode, ExpSalary, WorkExp)
  Conditions: Salary <= 95, WorkExp >= 5
  Join First algorithm:
    1. Compute the join (disregarding the selections)
    2. Compute Skyline on join results
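The two Join First steps can be sketched as follows. The rows are invented toy data, the relaxation of a selection is assumed to be the distance by which a row misses the condition, and all function names are hypothetical. The Skyline step is a naive quadratic scan chosen for brevity.

```python
SALARY_MAX = 95    # from Salary <= 95
WORKEXP_MIN = 5    # from WorkExp >= 5

# Hypothetical rows: (id, zipcode, salary) and (id, zipcode, work_exp).
jobs = [("J1", 92697, 120), ("J2", 92612, 96), ("J3", 92697, 98)]
candidates = [("C1", 92697, 3), ("C2", 92612, 1), ("C3", 92697, 4)]


def dominates(a, b):
    """a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def join_first(jobs, candidates):
    # Step 1: equi-join on Zipcode, disregarding the selections.
    joined = []
    for jid, jzip, salary in jobs:
        for cid, czip, exp in candidates:
            if jzip == czip:
                relax = (max(0, salary - SALARY_MAX), max(0, WORKEXP_MIN - exp))
                joined.append((jid, cid, relax))
    # Step 2: Skyline over the relaxation vectors of the join results.
    return [
        row for row in joined
        if not any(dominates(other[2], row[2]) for other in joined)
    ]


answers = join_first(jobs, candidates)
print(answers)
```

The result keeps the pair that needs the least salary relaxation and the pair that needs the least experience relaxation, since neither dominates the other.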
Relaxing Selection Conditions: Variations
  Pruning Join
    Build the Skyline during the join
  Pruning Join+
    Pruning Join
    Build the local Skyline before the join
  Sorted Access Join
    Fagin’s Top-k: sort the columns on relaxation
    Compute the join Skyline
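A rough sketch of the Pruning Join idea, building the Skyline while the join produces pairs instead of materializing the whole join first. The hash join on Zipcode, the toy rows, and the function names are assumptions made for illustration; the slide does not specify the join method.

```python
def dominates(a, b):
    """a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def insert_into_skyline(skyline, label, vec):
    """Add (label, vec) unless it is dominated; evict entries it dominates."""
    if any(dominates(v, vec) for _, v in skyline):
        return skyline  # pruned: this join result can never be in the answer
    kept = [(l, v) for l, v in skyline if not dominates(vec, v)]
    kept.append((label, vec))
    return kept


def pruning_join(jobs, candidates, salary_max, workexp_min):
    # Build a hash table on the join attribute (an assumed join strategy).
    by_zip = {}
    for cid, czip, exp in candidates:
        by_zip.setdefault(czip, []).append((cid, exp))
    skyline = []
    for jid, jzip, salary in jobs:
        for cid, exp in by_zip.get(jzip, []):
            vec = (max(0, salary - salary_max), max(0, workexp_min - exp))
            skyline = insert_into_skyline(skyline, (jid, cid), vec)
    return skyline


# Hypothetical rows: (id, zipcode, salary) and (id, zipcode, work_exp).
jobs = [("J1", 92697, 120), ("J2", 92612, 96), ("J3", 92697, 98)]
candidates = [("C1", 92697, 3), ("C2", 92612, 1), ("C3", 92697, 4)]

answers = pruning_join(jobs, candidates, salary_max=95, workexp_min=5)
print(answers)
```

The running Skyline stays small here: each new pair is either discarded immediately or replaces the entries it dominates, so dominated join results are never kept around.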
Relaxing All Conditions
  Multi-Dim.-Index-based-Relaxation algorithm:
    1. Traverse the index structure top-down
    2. Form pairs of nodes or records
    3. Build the Skyline
  Structures shown: Skyline, Queue
Variations
  Computing Top-k over Skyline
    Weight to each condition
  Queries with multiple joins
  Conditions on nonnumeric attributes
    Dominance checking function
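The Top-k over Skyline variation can be sketched as a ranking step applied to an already computed Skyline. The weighted-sum scoring function, the weights, and the sample Skyline entries below are assumptions for illustration; the slide only says that a weight is given to each condition.

```python
import heapq


def top_k_over_skyline(skyline, weights, k):
    """Rank Skyline entries by weighted total relaxation (smaller is better)
    and return the k best as (label, score) pairs."""
    def score(entry):
        _, vec = entry
        return sum(w * x for w, x in zip(weights, vec))

    best = heapq.nsmallest(k, skyline, key=score)
    return [(label, score((label, vec))) for label, vec in best]


# Hypothetical Skyline: (label, (salary relaxation, work-experience relaxation)).
skyline = [("J2-C2", (1, 4)), ("J3-C3", (3, 1)), ("J5-C6", (0, 6))]

# Hypothetical weights: relaxing work experience costs twice as much as salary.
ranked = top_k_over_skyline(skyline, weights=(1.0, 2.0), k=2)
print(ranked)
```

Changing the weights changes the order but never brings back a dominated answer, because the ranking only sees Skyline entries.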
Experimental Setting
  Datasets
    Real
      1. Internet Movie Database (IMDB): Movies (120k) & ActorInMovies (1.2m)
      2. Census-Income – UCI KDD Repository: Census (200k)
    Synthetic
      Independent, Correlated, and Anticorrelated
  Implementation
    GNU C++
    Spatial Index Library (R-tree)
    Linux, AMD Opteron 240, 1GB RAM
IMDB Dataset
  Different algorithms, different behaviors
Different datasets, different behaviors
  Panels: Correlated Dataset, Anticorrelated Dataset, Independent Dataset
How big is the Skyline?
Relaxing join takes time
  Self-join on Census Dataset
Top-k over Skyline
  IMDB Dataset
Related Work
  Muslea et al.
    Alternate forms of conjunctive expressions
  Efficient Skyline algorithms
    Selection queries
  Efficient Top-k algorithms
    Require weights for conditions
Conclusions
  Query relaxation framework for selections and joins
  Lattice-based approach for query relaxation
  Efficient relaxation algorithms
Future Work
  Optimal use of the lattice structure
  Relax conditions on string attributes
  Algorithms applicable outside databases
Questions?
Skyline vs. Top-k
Skyline vs. Top-k over Skyline