Relaxing Join and Selection Queries
Rares Vernica, UC Irvine, USA
Joint work with Nick Koudas, Chen Li, and Anthony K. H. Tung.

Rares Vernica, UC Irvine. Slide 2: Query Example
SELECT *
FROM Jobs J, Candidates C
WHERE J.Salary <= 95
  AND J.Zipcode = C.Zipcode
  AND C.WorkExp >= 5;
Jobs (ID, Company, Zipcode, Salary): J1 Broadcom, J2 Intel, J3 Microsoft, J4 IBM, …
Candidates (ID, Zipcode, ExpSalary, WorkExp): C…, …

Slide 3: What if the query answer is empty?
SELECT * FROM Jobs J, Candidates C
WHERE J.Salary <= 95 AND J.Zipcode = C.Zipcode AND C.WorkExp >= 5;
Adjust the conditions:
- What conditions to adjust?
- How to adjust them?

Slide 4: Example Percentages of Empty-Result Queries
- In a Customer Relationship Management (CRM) application developed by IBM: 18.07% (3,396 empty-result queries out of 18,793 queries)
- In a real estate application developed by IBM: 5.75%
- In a digital library application [JCM+00]: 10.53%
- In a bioinformatics application [RCP+98]: 38%
Source: Gang Luo (IBM T.J. Watson Research Center, USA), Efficient Detection of Empty-Result Queries, VLDB 2006, p. 1015.

Slide 5: Observations
(Figure: the Jobs and Candidates tables, annotated with the conditions Salary <= 95 and WorkExp >= 5.)
- There are different ways to adjust the conditions: relax a selection vs. relax the join.
- How much should each condition be adjusted? For example, Salary <= 100 vs. Salary <= 120.
- Relax the join vs. relax both selections.
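To make "how much to adjust each condition" concrete, the sketch below measures how far each condition of the example query would have to be loosened for a single (job, candidate) pair to qualify. It is a minimal illustration under stated assumptions, not the authors' code. The Job and Candidate types and the relaxation function are hypothetical. Relaxing the join is modeled as the absolute difference between numeric zipcodes.

```python
from typing import NamedTuple, Tuple


class Job(NamedTuple):
    id: str
    zipcode: int
    salary: float


class Candidate(NamedTuple):
    id: str
    zipcode: int
    work_exp: float


# Thresholds taken from the example query.
MAX_SALARY = 95    # J.Salary <= 95
MIN_WORK_EXP = 5   # C.WorkExp >= 5


def relaxation(job: Job, cand: Candidate) -> Tuple[float, float, float]:
    """Return (r, j, s): how much the Jobs selection (R), the join (J) and the
    Candidates selection (S) must be relaxed for this pair to satisfy the query.
    0 means the condition already holds."""
    r = float(max(0, job.salary - MAX_SALARY))       # Salary <= 95 + r
    j = float(abs(job.zipcode - cand.zipcode))       # assumption: |J.Zipcode - C.Zipcode| <= j
    s = float(max(0, MIN_WORK_EXP - cand.work_exp))  # WorkExp >= 5 - s
    return (r, j, s)


# Hypothetical pair: a job paying 120 and a candidate with 3 years of experience, same zipcode.
print(relaxation(Job("J1", 92697, 120), Candidate("C1", 92697, 3)))  # (25.0, 0.0, 2.0)
```

A pair that already satisfies every condition gets the vector (0.0, 0.0, 0.0). Any other pair is an answer only after relaxing the conditions whose entries are non-zero.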

Slide 6: Contributions
- Query relaxation framework for selections and joins
- Lattice-based approach for query relaxation
- Efficient relaxation algorithms

Slide 7: Overview
1. Motivation
2. Query Relaxation
3. Lattice-based Relaxation
4. Relaxation Algorithms
5. Variations
6. Experiments

Slide 8: Query Relaxation
- Top-k / Nearest neighbor: needs a weight for each condition.
- Skyline: no weights are needed; conditions are not considered equal; returns the non-dominated points.
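A top-k ranking depends on those per-condition weights. The following is a minimal sketch, assuming each answer is already described by a vector of relaxation amounts (smaller is better) and that the scoring function is a weighted sum. The function name top_k and the sample vectors are hypothetical.

```python
import heapq
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def top_k(points: Sequence[Vector], weights: Sequence[float], k: int) -> List[Vector]:
    """Return the k relaxation vectors with the smallest weighted sum."""
    def score(p: Vector) -> float:
        return sum(w * x for w, x in zip(weights, p))
    return heapq.nsmallest(k, points, key=score)


answers = [(0, 2), (15, 0), (3, 3)]   # hypothetical (Salary, WorkExp) relaxations
print(top_k(answers, weights=(1, 1), k=1))    # [(0, 2)]
print(top_k(answers, weights=(0.1, 1), k=1))  # [(15, 0)]
```

The winner changes with the weights. A Skyline sidesteps the choice of weights by returning every vector that no other vector dominates.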

Slide 9: Query Relaxation: Skyline
Reference: Stephan Börzsönyi, Donald Kossmann, Konrad Stocker. The Skyline Operator. ICDE 2001.
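The Skyline Operator paper describes, among other algorithms, a block-nested-loops (BNL) method. What follows is a simplified in-memory version of that idea. It assumes the window always fits in memory (the original also spills to a temporary file when the window is full) and that smaller values are better in every dimension.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """True if p is no worse than q everywhere and strictly better somewhere
    (smaller is better in every dimension)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline_bnl(points: Sequence[Point]) -> List[Point]:
    """Block-nested-loops Skyline with an unbounded in-memory window."""
    window: List[Point] = []
    for p in points:
        if any(dominates(w, p) for w in window):
            continue  # p is dominated by a window point: discard it
        # p survives: evict the window points it dominates, then keep it.
        window = [w for w in window if not dominates(p, w)]
        window.append(p)
    return window


print(skyline_bnl([(3, 3), (0, 2), (15, 0), (1, 2)]))  # [(0, 2), (15, 0)]
```

Discarding a point because a window point dominates it is safe even if that window point is later evicted. Whatever evicts it also dominates the discarded point, because dominance is transitive.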

Slide 10: Overview (section 3: Lattice-based Relaxation)

Slide 11: Lattice-based Relaxation
(Figure: the Jobs and Candidates tables, annotated with the conditions Salary <= 95 and WorkExp >= 5.)
- R: selection on Jobs (Salary <= 95)
- J: join condition (J.Zipcode = C.Zipcode)
- S: selection on Candidates (WorkExp >= 5)
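One way to picture the lattice is as the set of all subsets of {R, J, S}, ordered by inclusion, where each node names the conditions that may be relaxed. The sketch below enumerates the nodes level by level under that assumption. It does not show how the paper traverses or exploits the lattice, and the helper names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List

# R: selection on Jobs, J: join condition, S: selection on Candidates.
CONDITIONS = ("R", "J", "S")


def lattice_levels(conditions=CONDITIONS) -> List[List[FrozenSet[str]]]:
    """Level i holds every node that relaxes exactly i conditions."""
    return [[frozenset(c) for c in combinations(conditions, i)]
            for i in range(len(conditions) + 1)]


def parents(node: FrozenSet[str], conditions=CONDITIONS) -> List[FrozenSet[str]]:
    """Nodes one level up: relax exactly one more condition."""
    return [node | {c} for c in conditions if c not in node]


def label(node: FrozenSet[str]) -> str:
    return "".join(c for c in CONDITIONS if c in node) or "{}"


for level in lattice_levels():
    print([label(n) for n in level])
# ['{}']
# ['R', 'J', 'S']
# ['RJ', 'RS', 'JS']
# ['RJS']
```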

Slide 12: Overview (section 4: Relaxation Algorithms)

Slide 13: Relaxing Selection Conditions
(Figure: the Jobs and Candidates tables, annotated with the conditions Salary <= 95 and WorkExp >= 5.)
Algorithm:
1. Compute the Skyline on Jobs.
2. Compute the Skyline on Candidates.
3. Join the two Skylines.
This algorithm is INCORRECT: the join of the two local Skylines can be empty even though the join Skyline is not.
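A tiny hypothetical instance shows the failure. Each local Skyline keeps only the job and the candidate that need no relaxation, but those two rows have different zipcodes. The rows and helper names below are made up for illustration. The join condition stays exact, and only the two selections are relaxed.

```python
# Hypothetical rows: Jobs are (id, zipcode, salary); Candidates are (id, zipcode, work_exp).
jobs = [("J1", 1, 90), ("J2", 2, 110)]
cands = [("C1", 2, 6), ("C2", 1, 3)]


def r_relax(job):   # how far Salary <= 95 must be relaxed
    return max(0, job[2] - 95)


def s_relax(cand):  # how far WorkExp >= 5 must be relaxed
    return max(0, 5 - cand[2])


def local_skyline(rows, relax):
    # With a single selection per table, the local Skyline is simply
    # the set of rows with the minimum relaxation.
    best = min(relax(t) for t in rows)
    return [t for t in rows if relax(t) == best]


def equi_join(left, right):
    return [(a, b) for a in left for b in right if a[1] == b[1]]


wrong = equi_join(local_skyline(jobs, r_relax), local_skyline(cands, s_relax))
full = equi_join(jobs, cands)
print(wrong)                                        # []
print([(r_relax(a), s_relax(b)) for a, b in full])  # [(0, 2), (15, 0)]
```

The joined pairs (J1, C2) and (J2, C1) have relaxation vectors (0, 2) and (15, 0). Neither dominates the other, so both belong to the join Skyline that the local-Skyline plan misses.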

Slide 14: Relaxing Selection Conditions
(Figure: the Jobs and Candidates tables, annotated with the conditions Salary <= 95 and WorkExp >= 5.)
Join First Algorithm:
1. Compute the join (disregarding the selections).
2. Compute the Skyline on the join results.
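The two steps of Join First fit in a few lines. This sketch assumes an equi-join on a key and one relaxation function per relation, and it uses a quadratic dominance test where a real system would use an efficient Skyline algorithm. All function names and rows are hypothetical.

```python
def dominates(p, q):
    """Smaller is better in every dimension."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def join_first_skyline(left, right, key_left, key_right, relax_left, relax_right):
    # Step 1: equi-join, ignoring the selection conditions.
    joined = [(a, b) for a in left for b in right if key_left(a) == key_right(b)]
    # Step 2: Skyline over the (left, right) relaxation vectors of the join results.
    scored = [((relax_left(a), relax_right(b)), (a, b)) for a, b in joined]
    return [pair for vec, pair in scored
            if not any(dominates(other, vec) for other, _ in scored)]


jobs = [("J1", 1, 90), ("J2", 2, 110), ("J3", 2, 120)]   # (id, zipcode, salary)
cands = [("C1", 2, 6), ("C2", 1, 3)]                      # (id, zipcode, work_exp)
result = join_first_skyline(
    jobs, cands,
    key_left=lambda j: j[1], key_right=lambda c: c[1],
    relax_left=lambda j: max(0, j[2] - 95),   # Salary <= 95
    relax_right=lambda c: max(0, 5 - c[2]),   # WorkExp >= 5
)
print([(a[0], b[0]) for a, b in result])  # [('J1', 'C2'), ('J2', 'C1')]
```

(J3, C1) needs a relaxation of (25, 0). (J2, C1) needs only (15, 0), which dominates it, so (J3, C1) drops out.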

Slide 15: Relaxing Selection Conditions: Variations
- Pruning Join: build the Skyline during the join.
- Pruning Join+: Pruning Join, plus building the local Skyline before the join.
- Sorted Access Join: as in Fagin's Top-k algorithm, sort the columns on relaxation, then compute the join Skyline.
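One reading of Pruning Join, building the Skyline while the join runs, is a nested-loop join that does two things for each new result. It drops the result at once if the current Skyline dominates it. Otherwise it evicts the Skyline entries the new result dominates and keeps the result. This is an illustrative sketch with hypothetical names and rows, not the paper's implementation. It also omits the Pruning Join+ and Sorted Access Join refinements.

```python
def dominates(p, q):
    """Smaller is better in every dimension."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def pruning_join(left, right, key_left, key_right, relax_left, relax_right):
    """Nested-loop join that maintains the Skyline of join results on the fly."""
    sky = []  # (relaxation vector, joined pair) entries currently in the Skyline
    for a in left:
        for b in right:
            if key_left(a) != key_right(b):
                continue
            vec = (relax_left(a), relax_right(b))
            if any(dominates(v, vec) for v, _ in sky):
                continue  # pruned: never stored
            # Evict results the new one dominates, then keep it.
            sky = [(v, p) for v, p in sky if not dominates(vec, v)]
            sky.append((vec, (a, b)))
    return [pair for _, pair in sky]


jobs = [("J3", 2, 120), ("J1", 1, 90), ("J2", 2, 110)]   # (id, zipcode, salary)
cands = [("C1", 2, 6), ("C2", 1, 3)]                      # (id, zipcode, work_exp)
result = pruning_join(jobs, cands, lambda j: j[1], lambda c: c[1],
                      lambda j: max(0, j[2] - 95), lambda c: max(0, 5 - c[2]))
print([(a[0], b[0]) for a, b in result])  # [('J1', 'C2'), ('J2', 'C1')]
```

In this order, (J3, C1) enters the Skyline first. It is evicted once (J2, C1) arrives with the dominating vector (15, 0), so the join never has to materialize every result before pruning.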

Slide 16: Relaxing All Conditions
Multi-Dimensional-Index-based Relaxation Algorithm:
1. Traverse the index structure top-down.
2. Form pairs of nodes or records.
3. Build the Skyline.
(Figure: Skyline, Queue)
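The slide gives only an outline of the index-based algorithm. The following toy sketch follows that outline in the spirit of branch-and-bound skyline (BBS) search and rests on several assumptions:

- A tiny bottom-up tree (build) stands in for the R-tree from the Spatial Index Library.
- Zipcodes are treated as numbers, so the join can be relaxed by distance.
- Node pairs wait in a priority queue ordered by the sum of a lower bound on their (R, J, S) relaxation vector.

All names and rows are hypothetical.

```python
import heapq
from itertools import count

MAX_SALARY, MIN_WORK_EXP = 95, 5   # from the example query


def dominates(p, q):
    """Smaller is better in every dimension."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def build(rows, fanout=2):
    """Toy stand-in for an R-tree over (id, zipcode, value) rows.
    Leaves hold one row; each inner node stores the bounding box of its children."""
    level = [{"lo": r[1:], "hi": r[1:], "rec": r} for r in sorted(rows, key=lambda r: r[1])]
    while len(level) > 1:
        groups = [level[i:i + fanout] for i in range(0, len(level), fanout)]
        level = [{"lo": tuple(min(v) for v in zip(*(c["lo"] for c in g))),
                  "hi": tuple(max(v) for v in zip(*(c["hi"] for c in g))),
                  "kids": g} for g in groups]
    return level[0]


def lower_bound(x, y):
    """Smallest (R, J, S) relaxation any job under x can need with any candidate under y."""
    r = max(0, x["lo"][1] - MAX_SALARY)                            # Salary <= 95
    j = max(0, y["lo"][0] - x["hi"][0], x["lo"][0] - y["hi"][0])  # zipcode gap
    s = max(0, MIN_WORK_EXP - y["hi"][1])                          # WorkExp >= 5
    return (r, j, s)


def index_skyline(job_root, cand_root):
    tie = count()                     # tie-breaker so the heap never compares dicts
    heap = [(0, next(tie), job_root, cand_root)]
    sky = []                          # (relaxation vector, (job, candidate))
    while heap:
        _, _, x, y = heapq.heappop(heap)
        vec = lower_bound(x, y)
        if any(dominates(s, vec) for s, _ in sky):
            continue                  # every pair below (x, y) is dominated
        if "rec" in x and "rec" in y:
            sky.append((vec, (x["rec"], y["rec"])))
            continue
        # Expand whichever side is still an inner node (or both).
        for a in x.get("kids", [x]):
            for b in y.get("kids", [y]):
                heapq.heappush(heap, (sum(lower_bound(a, b)), next(tie), a, b))
    return sky


jobs = [("J1", 1, 90), ("J2", 2, 110), ("J3", 2, 120), ("J4", 5, 80)]
cands = [("C1", 2, 6), ("C2", 1, 3), ("C3", 9, 7)]
for vec, (a, b) in index_skyline(build(jobs), build(cands)):
    print(a[0], b[0], vec)
```

A node pair's lower bound never exceeds the relaxation of any record pair below it. So if a Skyline point already dominates that bound, the whole pair can be discarded without visiting its records. Processing pairs in increasing order of the summed bound means every potential dominator is seen first. On this data the Skyline is (J1, C1), (J1, C2) and (J2, C1).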

Slide 17: Overview (section 5: Variations)

Slide 18: Variations
- Computing Top-k over the Skyline: assign a weight to each condition.
- Queries with multiple joins.
- Conditions on nonnumeric attributes: use a dominance checking function.
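Top-k over the Skyline can be read as two steps: keep only the non-dominated answers, then rank them with per-condition weights. This is a minimal sketch under that reading. The function name is hypothetical, and a weighted sum is assumed as the ranking function.

```python
import heapq


def dominates(p, q):
    """Smaller is better in every dimension."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def top_k_over_skyline(points, weights, k):
    """Rank only the non-dominated relaxation vectors by a weighted sum."""
    sky = [p for p in points if not any(dominates(q, p) for q in points)]
    return heapq.nsmallest(k, sky, key=lambda p: sum(w * x for w, x in zip(weights, p)))


vectors = [(0, 1, 0), (0, 0, 2), (15, 0, 0), (25, 0, 0), (0, 8, 0)]  # (R, J, S)
print(top_k_over_skyline(vectors, weights=(1, 1, 1), k=4))
# [(0, 1, 0), (0, 0, 2), (15, 0, 0)]
```

Asking for k = 4 returns only three answers. A plain weighted top-4 over all five vectors would also include (0, 8, 0), which (0, 1, 0) dominates.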

Slide 19: Overview (section 6: Experiments)

Slide 20: Experimental Setting
Datasets
- Real:
  1. Internet Movie Database (IMDB): Movies (120k) and ActorInMovies (1.2M)
  2. Census-Income, UCI KDD Repository: Census (200k)
- Synthetic: Independent, Correlated, and Anticorrelated
Implementation
- GNU C++
- Spatial Index Library (R-tree)
- Linux, AMD Opteron 240, 1 GB RAM

Slide 21: IMDB Dataset: different algorithms, different behaviors

Slide 22: Different datasets, different behaviors (panels: Correlated Dataset, Anticorrelated Dataset, Independent Dataset)

Slide 23: How big is the Skyline?

Slide 24: Relaxing the join takes time (self-join on the Census dataset)

Slide 25: Top-k over Skyline (IMDB dataset)

Slide 26: Related Work
- Muslea et al.: alternate forms of conjunctive expressions
- Efficient Skyline algorithms: selection queries
- Efficient Top-k algorithms: require weights for conditions

Slide 27: Conclusions
- Query relaxation framework for selections and joins
- Lattice-based approach for query relaxation
- Efficient relaxation algorithms

Slide 28: Future Work
- Optimal use of the lattice structure
- Relaxing conditions on string attributes
- Algorithms applicable outside of databases

Questions?

Slide 31: Skyline vs. Top-k

Slide 32: Skyline vs. Top-k over Skyline