HKU CSIS DB Seminar Processing Ad-Hoc Joins on Mobile Devices HKU CSIS DB Seminar 10 Oct 2003 Speaker: Eric Lo.



Mobile Devices and Databases
Cellular phones and Personal Digital Assistants (PDAs) can query remote databases anywhere and at any time.
The connection channel is wireless, e.g., WAP, IEEE 802.11 (WiFi), GPRS, 3G.

HK Stock Exchange Example
11:55am: What is the stock price of “PCCW” (stock code 8) now?
    SELECT Stock_Price FROM DB WHERE Stock_Code = '8'
11:56am: HKD 2.5

HK Stock Exchange: There is no free lunch
Option 1: Charged by airtime.
11:55am: What is the stock price of “PCCW” (stock code 8) now?
    SELECT Stock_Price FROM DB WHERE Stock_Code = '8'
11:56am: HKD 2.5
[Figure: charges of $1, $2.8, $4.6, $10.2]

Option 2: Charged by amount of data transferred
Network traffic and QoS in wireless data networks depend strongly on factors such as network workload and the availability of network stations.
When the charge is by the amount of data transferred, minimizing the transfer cost means saving dollars.

Query more than one data source
Mobile users may wish to combine information from more than one remote database.
E.g., a vegetarian visiting Hong Kong looks for restaurants recommended by both the HK tourist office and the HK vegetarian community.

Example relations
[Tables R1 and R2 are not preserved in this transcript.]
Join query:
    SELECT R1.Name, R1.Address, R2.Cost FROM R1, R2 WHERE R1.Name = R2.Name

Motivations
Evaluating join queries on mobile devices. Considerations:
1. The mobile device has limited memory.
2. The transfer cost (in dollars) must be minimized.
3. The databases are non-collaborative.
4. Query results are small compared to the input relations (ad-hoc).

Download all relations?
Download both relations (HK tourist office and HK vegetarian directory) onto the mobile device and evaluate the join locally on the device.
Most mobile devices cannot hold the large amount of data from the remote databases.
The transfer cost is very high even though the result size is very small.

Outline
Introduction and motivation
A simple late-projection strategy
Block-merge join
Ship-data-as-queries join
RAMJ: Recursive and Adaptive Mobile Join
Experiment results
Conclusions and future work

A simple late-projection strategy
Traditional distributed query processing techniques such as semi-join involve shipping join columns and (whole) tuples directly across trusted distributed nodes.
In a highly selective join, most tuples fail to contribute to the final result.
Semi-join downloads non-key attributes that may not be included in the result.
Late projection: download and join only the distinct values of the join keys (do not download the non-key attributes).
Only tuples that belong to the join result require downloading the remaining non-key attributes.

A late projection strategy
    SELECT R1.Name, R1.Address, R2.Cost FROM R1, R2 WHERE R1.Name = R2.Name
Step 1: Download $\pi_{Name}(R1)$.
Step 2: Download $\pi_{Name}(R2)$.
Step 3: Evaluate $T = \pi_{Name}(R1) \bowtie \pi_{Name}(R2)$ locally.
Step 4: Evaluate $\pi_{Name,Address}(\sigma_{Name \in T}(R1))$.
Step 5: Evaluate $\pi_{Name,Cost}(\sigma_{Name \in T}(R2))$.
Step 6: Join the two result sets.
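The six steps can be sketched in Python as follows. This is only an illustration: two in-memory SQLite databases stand in for the remote servers, the rows are made up, and the helper names `make_db` and `late_projection_join` are hypothetical.

```python
import sqlite3

def make_db(create_sql, insert_sql, rows):
    """In-memory SQLite database standing in for one remote server (assumption)."""
    db = sqlite3.connect(":memory:")
    db.execute(create_sql)
    db.executemany(insert_sql, rows)
    return db

def late_projection_join(db1, db2):
    # Steps 1-2: download only the distinct join keys from each source.
    keys1 = {row[0] for row in db1.execute("SELECT DISTINCT Name FROM R1")}
    keys2 = {row[0] for row in db2.execute("SELECT DISTINCT Name FROM R2")}
    # Step 3: join the two key sets locally on the device.
    t = sorted(keys1 & keys2)
    if not t:
        return []
    marks = ",".join("?" * len(t))
    # Steps 4-5: fetch the non-key attributes only for keys in T.
    left = db1.execute(
        f"SELECT Name, Address FROM R1 WHERE Name IN ({marks})", t).fetchall()
    right = db2.execute(
        f"SELECT Name, Cost FROM R2 WHERE Name IN ({marks})", t).fetchall()
    # Step 6: join the two result sets locally.
    costs_by_name = {}
    for name, cost in right:
        costs_by_name.setdefault(name, []).append(cost)
    return sorted((n, a, c) for n, a in left for c in costs_by_name.get(n, []))

# Made-up sample rows for illustration only.
r1 = make_db("CREATE TABLE R1 (Name TEXT, Address TEXT)",
             "INSERT INTO R1 VALUES (?, ?)",
             [("Alpha Food", "1 Main St"), ("Beta Food", "2 Side St"),
              ("Ceta Food", "3 Hill Rd")])
r2 = make_db("CREATE TABLE R2 (Name TEXT, Cost REAL)",
             "INSERT INTO R2 VALUES (?, ?)",
             [("Beta Food", 80.0), ("Ceta Food", 120.0), ("Delta Food", 60.0)])
result = late_projection_join(r1, r2)
```

Only the `Name` columns and the two matching tuples per side cross the "network" here; the address of Alpha Food and the cost of Delta Food are never downloaded.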

Block-merge join (BMJ)
Late projection is still insufficient if the whole join column cannot fit into the memory of the mobile device.
Apply sort-merge join, with the sorting done on the servers.
One block of ordered join keys is downloaded from each server and the two blocks are joined locally until one block is exhausted; the next block is then downloaded.
Each block must:
 cover the same data range
 be sorted in the same order (e.g., both ascending)
 be small enough to reside in memory
Each block can be downloaded using the ROWNUM or LIMIT SQL constructs.
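A minimal sketch of the block-merge idea, assuming SQLite's `LIMIT` for block downloads and a `Name > last_key` predicate to resume after the previous block (the slides name ROWNUM/LIMIT but not the resume mechanism, so that part is an assumption). The helper names and sample rows are hypothetical.

```python
import sqlite3

def make_db(table, names):
    """In-memory SQLite database standing in for one remote server (assumption)."""
    db = sqlite3.connect(":memory:")
    db.execute(f"CREATE TABLE {table} (Name TEXT)")
    db.executemany(f"INSERT INTO {table} VALUES (?)", [(n,) for n in names])
    return db

def fetch_block(db, table, after, block_size):
    """Download the next block of distinct, ascending join keys greater than `after`."""
    if after is None:
        sql = f"SELECT DISTINCT Name FROM {table} ORDER BY Name LIMIT ?"
        params = (block_size,)
    else:
        sql = f"SELECT DISTINCT Name FROM {table} WHERE Name > ? ORDER BY Name LIMIT ?"
        params = (after, block_size)
    return [row[0] for row in db.execute(sql, params)]

def block_merge_join(db1, db2, block_size):
    joined = []
    b1 = fetch_block(db1, "R1", None, block_size)
    b2 = fetch_block(db2, "R2", None, block_size)
    i = j = 0
    while b1 and b2:
        if b1[i] == b2[j]:
            joined.append(b1[i])
            i += 1
            j += 1
        elif b1[i] < b2[j]:
            i += 1
        else:
            j += 1
        # When a block is exhausted, download the next one from that server.
        if i == len(b1):
            b1 = fetch_block(db1, "R1", b1[-1], block_size)
            i = 0
        if j == len(b2):
            b2 = fetch_block(db2, "R2", b2[-1], block_size)
            j = 0
    return joined

# Made-up sample rows; at most two keys per relation are held in memory at once.
r1 = make_db("R1", ["Alpha Food", "Beta Food", "Ceta Food", "Eta Food"])
r2 = make_db("R2", ["Beta Food", "Ceta Food", "Delta Food", "Zeta Food"])
joined_keys = block_merge_join(r1, r2, block_size=2)
```

Memory use is bounded by two blocks, but every join key on both sides is still downloaded once.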

Ship-data-as-queries join (SDAQ)
If |R1| << |R2|, the transfer cost can be reduced as follows:
1. Download the join column of R1 to the mobile device:
    SELECT Name FROM R1
2. Send the join keys to R2 in the form of SQL selection queries (e.g., if two keys were returned in step 1):
    SELECT Name FROM R2 WHERE Name IN ('Beta Food', 'Ceta Food')
3. The keys returned from R2 are the joined keys. Very few results are returned.
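The three SDAQ steps might look like this in Python. It is a sketch under assumptions: SQLite stands in for the remote servers, the sample rows are invented, and the `batch_size` parameter (which caps the length of each IN list) is not part of the slides.

```python
import sqlite3

def make_db(table, names):
    """In-memory SQLite database standing in for one remote server (assumption)."""
    db = sqlite3.connect(":memory:")
    db.execute(f"CREATE TABLE {table} (Name TEXT)")
    db.executemany(f"INSERT INTO {table} VALUES (?)", [(n,) for n in names])
    return db

def ship_data_as_queries(small_db, large_db, batch_size=50):
    # Step 1: download the join column of the smaller relation R1.
    keys = [row[0] for row in small_db.execute("SELECT DISTINCT Name FROM R1")]
    joined = []
    # Step 2: ship the keys to R2 as IN-list selection queries.
    for start in range(0, len(keys), batch_size):
        batch = keys[start:start + batch_size]
        marks = ",".join("?" * len(batch))
        rows = large_db.execute(
            f"SELECT DISTINCT Name FROM R2 WHERE Name IN ({marks})", batch)
        # Step 3: whatever R2 returns is a joined key.
        joined.extend(row[0] for row in rows)
    return sorted(joined)

# Made-up sample rows: R1 is much smaller than R2.
small = make_db("R1", ["Beta Food", "Ceta Food"])
large = make_db("R2", ["Alpha Food", "Beta Food", "Delta Food", "Eta Food",
                       "Gamma Food", "Zeta Food"])
sdaq_keys = ship_data_as_queries(small, large)
```

The keys of R2 that do not match are never transferred, which is where the saving comes from when R1 is much smaller than R2.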

Can we do even better?
Block-merge join (BMJ) handles the limited-memory problem, BUT it essentially downloads all join keys.
Ship-data-as-queries (SDAQ) does better ONLY if the sizes of the two relations differ greatly.
Pay a small overhead: build histograms that capture the data distribution of the target relations. This enables:
 join space pruning
 bucket-based joining

Pruning the data space
[Figure not preserved in this transcript.]

Constructing Histogram
Problem: mobile devices cannot receive the histograms kept by the remote databases (they are internal data structures).
Solution: construct queries that build the histogram through SQL.

String Histogram
Using the SUBSTRING function:
    SELECT SUBSTRING(Name,1,1) AS Bucket, COUNT(Name) AS Count
    FROM R1
    GROUP BY SUBSTRING(Name,1,1)
    HAVING COUNT(Name) > 0
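A small sketch of how such SQL-built histograms support join space pruning: a bucket that is empty on either side cannot produce join results and is dropped. SQLite stands in for the remote servers and spells the function `substr`; the alias `Cnt`, the helper names and the sample rows are illustrative assumptions.

```python
import sqlite3

HISTOGRAM_SQL = (
    "SELECT substr(Name, 1, 1) AS Bucket, COUNT(Name) AS Cnt "
    "FROM {table} GROUP BY substr(Name, 1, 1) HAVING COUNT(Name) > 0"
)

def make_db(table, names):
    """In-memory SQLite database standing in for one remote server (assumption)."""
    db = sqlite3.connect(":memory:")
    db.execute(f"CREATE TABLE {table} (Name TEXT)")
    db.executemany(f"INSERT INTO {table} VALUES (?)", [(n,) for n in names])
    return db

def string_histogram(db, table):
    """Return {first letter: number of tuples} for the given relation."""
    return dict(db.execute(HISTOGRAM_SQL.format(table=table)).fetchall())

def surviving_buckets(hist1, hist2):
    """Keep only buckets that are non-empty on both sides; the rest are pruned."""
    return sorted(set(hist1) & set(hist2))

# Made-up sample rows.
r1 = make_db("R1", ["Alpha Food", "Apple Cafe", "Beta Food", "Ceta Food"])
r2 = make_db("R2", ["Beta Food", "Delta Food", "Dim Sum House"])
hist1 = string_histogram(r1, "R1")
hist2 = string_histogram(r2, "R2")
candidates = surviving_buckets(hist1, hist2)
```

In this toy data only bucket `B` survives, so the keys starting with A, C and D never need to be downloaded.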

Numeric Histogram
Using the ROUND function:
    SELECT ROUND(Cost/(D/G)) AS Bucket, COUNT(ROUND(Cost/(D/G))) AS Count
    FROM R2
    GROUP BY ROUND(Cost/(D/G))
    HAVING COUNT(ROUND(Cost/(D/G))) > 0
G is the granularity, which specifies the number of buckets.
D is the numeric domain.
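The numeric variant can be tried out the same way. In this sketch the bucket width D/G is computed on the device and passed as a query parameter, SQLite stands in for the remote server, and the costs, D = 100 and G = 5 are made-up values.

```python
import sqlite3

def numeric_histogram(db, domain, granularity):
    """Return {bucket id: number of tuples} using ROUND(Cost / (D / G))."""
    width = domain / granularity  # D / G: the value range covered by one bucket
    sql = ("SELECT CAST(ROUND(Cost / ?) AS INTEGER) AS Bucket, COUNT(*) AS Cnt "
           "FROM R2 GROUP BY Bucket ORDER BY Bucket")
    return dict(db.execute(sql, (width,)).fetchall())

# Made-up sample rows.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE R2 (Name TEXT, Cost REAL)")
db.executemany("INSERT INTO R2 VALUES (?, ?)",
               [("a", 12.0), ("b", 20.0), ("c", 45.0), ("d", 55.0), ("e", 95.0)])
cost_hist = numeric_histogram(db, domain=100, granularity=5)
```

Because ROUND maps values to the nearest multiple of D/G, bucket ids range from 0 to G, and empty buckets (here bucket 4) simply do not appear in the result.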

A Bucket-based Approach
So far we know that:
 SDAQ is good when the input relations differ greatly in size.
 BMJ is better when the two input relations have similar sizes (why?).
 A histogram helps to prune the join space.
“Which method is better?” The histogram can do more!
The histogram has already partitioned the data space into buckets.
Depending on the data distribution within each bucket, the best action is assigned to that bucket adaptively.

Direct Join (~BMJ)
[Figure not preserved in this transcript.]

Ship-join-keys (~SDAQ)
[Figure not preserved in this transcript.]

Recursive Partitioning
Break a partition into finer ones and request a further histogram for it,
hoping that some sub-buckets will be pruned,
or that the cheap ship-join-keys join can be applied to some of the sub-buckets.

Recursive Partitioning
    SELECT SUBSTRING(Name,1,2) AS Bucket, COUNT(Name) AS Count
    FROM R2
    WHERE SUBSTRING(Name,1,1) = 'A'
    GROUP BY SUBSTRING(Name,1,2)
    HAVING COUNT(Name) > 0
Sub-bucket histograms shown on the slide:
    Bucket  Count        Bucket  Count
    AA      10           AB      54
    AB      54           AC      5
    AC      105          …
    …                    AX      90
    AX      85           AZ      32
    AY      12
    AZ      32
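The refinement query generalizes to any bucket prefix: group by one more character than the prefix has. A sketch, with SQLite's `substr` in place of SUBSTRING and with hypothetical names (`refine_histogram`, alias `Cnt`) and invented rows:

```python
import sqlite3

def refine_histogram(db, table, prefix):
    """Request a finer histogram for the bucket identified by `prefix`."""
    depth = len(prefix)
    sql = (f"SELECT substr(Name, 1, {depth + 1}) AS Bucket, COUNT(Name) AS Cnt "
           f"FROM {table} WHERE substr(Name, 1, {depth}) = ? "
           f"GROUP BY Bucket HAVING COUNT(Name) > 0")
    return dict(db.execute(sql, (prefix,)).fetchall())

# Made-up sample rows.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE R2 (Name TEXT)")
db.executemany("INSERT INTO R2 VALUES (?)",
               [("AA1",), ("AA2",), ("AB1",), ("AC1",), ("B1",)])
level2 = refine_histogram(db, "R2", "A")
level3 = refine_histogram(db, "R2", "AA")
```

Calling the function again on a sub-bucket such as `AA` yields the next level, which is what makes the partitioning recursive.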

Which action is the best for each bucket? The cost model!
The largest amount of data that can be transferred in one packet is called the MTU (Maximum Transmission Unit).
The largest segment of TCP data that can be transmitted is called the MSS (Maximum Segment Size).
$MTU = MSS + B_H$, where $B_H$ is the size of the headers.
To transfer B bytes of data, the actual number of bytes transferred is $T(B)$ (the formula is not preserved in this transcript).
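Since the transfer formula itself is missing from the transcript, the following is only one plausible reading of it: every packet carries at most MSS bytes of payload plus $B_H$ bytes of headers. The function name and the default values (1460 and 40 bytes, typical for TCP/IPv4 over Ethernet) are assumptions, not taken from the slides.

```python
import math

def transfer_bytes(payload_bytes, mss=1460, header_bytes=40):
    """Bytes on the wire for `payload_bytes` of data, assuming one header per packet."""
    if payload_bytes <= 0:
        return 0
    packets = math.ceil(payload_bytes / mss)  # number of packets needed
    return payload_bytes + packets * header_bytes

one_full_packet = transfer_bytes(1460)   # exactly one MTU
two_packets = transfer_bytes(1461)       # one byte over spills into a second packet
small_message = transfer_bytes(100)
```

Under this reading small messages are relatively expensive, because each one pays a full header regardless of how little payload it carries.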

Cost Model
Let $C_{R1}$ and $C_{R2}$ be the cost of accessing R1 and R2, respectively.
Sending a selection query Q to a server needs $T(B_{SQL} + B_{key})$ bytes.
$B_{key} = 4$ bytes for numeric attributes.
$B_{key} = 2L$ bytes for string attributes of length L.

Cost Model
Under these settings, we have to determine the cost of:
 Direct join: $C_1$
 Ship-join-keys join: $C_2$
 Recursive partitioning: $C_3$
Execute the minimal-cost action for each bucket adaptively.

$C_1$: Direct Join
Let $\alpha_i$ and $\beta_i$ be the i-th histogram buckets of R1 and R2, summarizing the same data region.
1. Send a selection query to R1: $C_{R1}\,T(B_{SQL} + B_{key})$
2. Receive the keys of $\alpha_i$: $C_{R1}\,T(|\alpha_i| B_{key})$
3. Send a selection query to R2: $C_{R2}\,T(B_{SQL} + B_{key})$
4. Receive the keys of $\beta_i$: $C_{R2}\,T(|\beta_i| B_{key})$
$C_1(\alpha_i,\beta_i) = (C_{R1} + C_{R2})\,T(B_{SQL} + B_{key}) + C_{R1}\,T(|\alpha_i| B_{key}) + C_{R2}\,T(|\beta_i| B_{key})$

$C_2$: Ship-join-keys, for $|\alpha_i| \le |\beta_i|$
1. Send a selection query to R1, which holds the smaller bucket $\alpha_i$: $C_{R1}\,T(B_{SQL} + B_{key})$
2. Receive the keys of $\alpha_i$ from R1: $C_{R1}\,T(|\alpha_i| B_{key})$
3. Send a selection query to R2, which holds the larger bucket, to check the existence of the $|\alpha_i|$ keys: $C_{R2}\,T(B_{SQL} + |\alpha_i| B_{key})$
4. Receive at most $|\alpha_i|$ keys from R2: $C_{R2}\,T(|\alpha_i| B_{key})$
$C_2(\alpha_i,\beta_i) = C_{R1}\,(T(B_{SQL} + B_{key}) + T(|\alpha_i| B_{key})) + C_{R2}\,T(B_{SQL} + 2|\alpha_i| B_{key})$
The last term combines steps 3 and 4 into a single transfer term.
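To see how the two costs compare, here is a small numeric sketch of $C_1$ and $C_2$. It assumes the one-header-per-packet transfer function (the slide's own formula is not preserved), reads the nested `T(T(...))` in the $C_2$ total as a typo for a single `T(...)`, and takes $\alpha_i$ as the bucket of R1 as in $C_1$; all parameter values are illustrative.

```python
import math

def T(b, mss=1460, bh=40):
    """Assumed transfer function: payload plus one header per MSS-sized packet."""
    return b + math.ceil(b / mss) * bh if b > 0 else 0

def cost_direct_join(a, b, c_r1, c_r2, b_sql, b_key):
    """C1: download the keys of both buckets (|alpha_i| = a, |beta_i| = b)."""
    return ((c_r1 + c_r2) * T(b_sql + b_key)
            + c_r1 * T(a * b_key) + c_r2 * T(b * b_key))

def cost_ship_join_keys(a, c_r1, c_r2, b_sql, b_key):
    """C2: download the smaller bucket (a keys) from R1 and ship it to R2."""
    return (c_r1 * (T(b_sql + b_key) + T(a * b_key))
            + c_r2 * T(b_sql + 2 * a * b_key))

# Illustrative parameters: unit access costs, 100-byte SQL text, 20-byte keys.
params = dict(c_r1=1, c_r2=1, b_sql=100, b_key=20)
c1_skewed = cost_direct_join(10, 1000, **params)
c2_skewed = cost_ship_join_keys(10, **params)
c1_similar = cost_direct_join(1000, 1000, **params)
c2_similar = cost_ship_join_keys(1000, **params)
```

With these numbers ship-join-keys wins when the bucket sizes differ greatly (10 versus 1000 keys), while direct join wins when both buckets hold 1000 keys, which matches the earlier remark about SDAQ and BMJ.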

$C_3$: Recursive Partitioning
We have to estimate the cost of:
1. Asking R1 for a finer histogram of that bucket.
2. Asking R2 for a finer histogram of that bucket.
3. For each pair of (future/virtual) sub-buckets, executing direct join, ship-join-keys or recursive partitioning again.
Example sub-bucket histograms (Bucket Count): AA 10, AB 54, AC 105, …, AX 85, AY 12, AZ 32 and AB 54, AC 5, …, AX 90, AZ 32.

Recursive Partitioning $C_3$
Ask R1 for a finer histogram: $C_h(G,R1) = C_{R1}\,(T(G(B_{key}+4)) + T(B_{SQL}+B_{key}))$
Ask R2 for a finer histogram: $C_h(G,R2) = C_{R2}\,(T(G(B_{key}+4)) + T(B_{SQL}+B_{key}))$
Each pair of (future) sub-buckets may recursively follow direct join, ship-join-keys or recursive partitioning again; the cost of this part is denoted $C_{RP}(\alpha_i,\beta_i)$ (its formula is not preserved in this transcript).

Recursive Partitioning $C_3$
$C_3(\alpha_i,\beta_i) = C_h(G,R1) + C_h(G,R2) + C_{RP}(\alpha_i,\beta_i)$
The sub-bucket histograms (AA 10, AB 54, AC 105, …, AX 85, AY 12, AZ 32 and AB 54, AC 5, …, AX 90, AZ 32) are not yet known at this point, so how can $C_{RP}$ be estimated?

Recursive Partitioning: Optimistic Estimation
Optimistically assume that all buckets at the next level are pruned.
This holds if the data distributions of the two datasets are very different.
Since all future (next-level) sub-buckets are pruned, they incur NO actions, so $C_{RP}(\alpha_i,\beta_i) = 0$.
Therefore: $C_3(\alpha_i,\beta_i) = C_h(G,R1) + C_h(G,R2)$
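Putting the three costs together, the per-bucket decision under the optimistic estimate can be sketched as below. The transfer function `T` is an assumed one-header-per-packet model, the `"prune"` shortcut for empty buckets and the role swap when $|\alpha_i| > |\beta_i|$ are illustrative additions, and all parameter values are made up.

```python
import math

def T(b, mss=1460, bh=40):
    """Assumed transfer function: payload plus one header per MSS-sized packet."""
    return b + math.ceil(b / mss) * bh if b > 0 else 0

def histogram_cost(g, c_r, b_sql, b_key):
    """C_h(G, R): send one histogram query and receive G (bucket, count) pairs."""
    return c_r * (T(g * (b_key + 4)) + T(b_sql + b_key))

def choose_action(a, b, g, c_r1, c_r2, b_sql, b_key):
    """Pick the cheapest action for a bucket pair with |alpha_i| = a, |beta_i| = b."""
    if a == 0 or b == 0:
        return "prune"  # an empty side cannot contribute join results
    # Ship the smaller bucket; swap the relation roles if R2's bucket is smaller.
    small, c_small, c_large = (a, c_r1, c_r2) if a <= b else (b, c_r2, c_r1)
    costs = {
        "direct-join": ((c_r1 + c_r2) * T(b_sql + b_key)
                        + c_r1 * T(a * b_key) + c_r2 * T(b * b_key)),
        "ship-join-keys": (c_small * (T(b_sql + b_key) + T(small * b_key))
                           + c_large * T(b_sql + 2 * small * b_key)),
        # Optimistic estimate: C_RP = 0, so only the two histogram requests count.
        "recursive-partitioning": (histogram_cost(g, c_r1, b_sql, b_key)
                                   + histogram_cost(g, c_r2, b_sql, b_key)),
    }
    return min(costs, key=costs.get)

# Illustrative parameters: G = 20, unit access costs, 100-byte SQL, 20-byte keys.
cfg = dict(g=20, c_r1=1, c_r2=1, b_sql=100, b_key=20)
tiny = choose_action(5, 5, **cfg)
skewed = choose_action(10, 1000, **cfg)
large = choose_action(1000, 1000, **cfg)
```

With these numbers, two tiny buckets are joined directly, a small/large pair uses ship-join-keys, and two large buckets are partitioned further, since under the optimistic assumption refinement costs only two histogram requests.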

Recursive Partitioning: Linear Interpolation Estimation
More accurate, but with a higher computational cost.
Exploit the histogram at the current level to estimate the distribution at the next level.
[Figure: we have histograms at the current level; we do NOT have histograms at the next level.]

Linear Interpolation Estimation
[Figure: two histograms over buckets b1 to b5.]
Select adjacent buckets as interpolation points.
Preserves the current trend.
Resistant to fluctuating distributions.

One problem left
1. Level 1: what is the cost of recursive partitioning (RP) on b2?
2. Estimate level 2 by linear interpolation; the sub-buckets $b_{2,1}, b_{2,2}, \ldots, b_{2,5}$ are found.
3. For each sub-bucket, determine which action is the most cost-efficient ($C_1$, $C_2$ or $C_3$).
 $C_1$ and $C_2$ of $b_{2,1}, b_{2,2}, \ldots, b_{2,5}$ can be determined.
 $C_3$ of $b_{2,1}, b_{2,2}, \ldots, b_{2,5}$? This starts from step 1 again, e.g., $b_{2,1}$ is refined into $b_{2,1,1}, \ldots, b_{2,1,5}$ at level 3.
[Figure: buckets b1 to b5 at level 1, with b2 refined at level 2.]

If you don’t understand…
The $C_3$ cost of one level depends on the next level through $C_{RP}(\alpha_i,\beta_i)$.
Fortunately, $C_{RP}(\alpha_i,\beta_i)$ is bounded by an inequality (not preserved in this transcript).
[Figure: a bucket pair $(\alpha_i,\beta_i)$ with the actions $C_1$, $C_2$, $C_3$ repeated across levels.]

Recursive Partitioning: Linear Interpolation Estimation
$C_3(\alpha_i,\beta_i) = C_h(G,R1) + C_h(G,R2) + C_{RP}(\alpha_i,\beta_i)$
In the optimistic estimation, the last term $C_{RP}(\alpha_i,\beta_i)$ is optimistically omitted.
Here, $C_{RP}(\alpha_i,\beta_i)$ is bounded by the inequality.
Summing up everything: (formula not preserved in this transcript).

RAMJ Algorithm
Recursive and Adaptive Mobile Join

Real Data Experiment
Real data set 1: DBLP
Join the relations “Conference” (235K tuples) and “Journal” (125K tuples) to find the publications that have the same title as both a conference paper and a journal paper.
3836 publications have the same title in both relations.
    SELECT R1.Title FROM Conference R1, Journal R2 WHERE R1.Title = R2.Title

Real Data Experiment
Real data set 2: Restaurants data set (crawled from a source that is not preserved in this transcript)
Join the relations “Steak” (4573 tuples) and “Vegetarian” (2098 tuples) to find the restaurants that offer both steak and vegetarian dishes (163 joined).
    SELECT R1.Name FROM Steak R1, Vegetarian R2 WHERE R1.Name = R2.Name

Real Data Experiment Result
    Algorithm   DBLP     Restaurant
    BMJ         45.89M   266.22K
    SDAQ        43.28M   180.15K
    RAMJ-OPT    30.11M   116.67K

Synthetic Data Experiment
Generate 3 relations with different distributions:
 Gaussian
 Negative exponential
 Zipf (skewness θ = 1)
Defaults: 10,000 tuples, domain = 100,000, G = 20

Synthetic Data Experiment Result
[Table: rows BMJ, SDAQ, RAMJ-OPT, RAMJ-LI; for each of the joins NegExp-Gaussian, Zipf-Gaussian and Zipf-NegExp, the columns are “Transferred (Bytes)” and “No. of joined keys”. The values are not preserved in this transcript.]

The impact of data skew
[Figure not preserved in this transcript.]

The impact of memory size
[Figure not preserved in this transcript.]

Conclusions and Future Work
Identified the requirements and limitations of evaluating ad-hoc joins on mobile devices.
Proposed a recursive and adaptive algorithm: RAMJ.
Future work:
 Extension to multi-way joins and multiple join attributes
 Extension to Top-K join; existing approaches to the Top-K problem work ONLY on collaborative databases

Q & A

Approach 2: Mediator
User queries are free-form, i.e., user KY may issue a join query that involves DB x and DB y, whereas user BY may issue a join query that involves DB e and DB f.
A mediator cannot answer those queries without prior preparation such as data integration and schema matching.
Mediator services may charge the users as well.

Approach 2: Distributed query processing?
Existing distributed database techniques work in trusted environments only.
Semi-join on DB1.Name = DB2.CustomerName:
1. Site DB1: evaluate $J = \pi_{Name}(DB1)$ [J = all names]
2. Send J from DB1 to DB2.
3. Site DB2: evaluate $K = \sigma_{CustomerName \in J}(DB2)$ [find all tuples whose CustomerName equals a Name in DB1]
4. Send K from DB2 to DB1.

Approach 2: Distributed query processing?
This does not work for our problem!
DB1 and DB2 are non-collaborative.
They would not accept “data structures” as input (e.g., a “join column” in semi-join or a “hash table” in bloom-join); they accept SQL only.
Semi-join can be made to work with some modifications: send J and K through the mobile device, which leads to a high transfer cost.

References
1. Processing Ad-hoc joins on mobile devices, submitted to EDBT.
2. P.A. Bernstein and N. Goodman. Power of natural semijoin. SIAM Journal on Computing.
3. P.A. Bernstein, N. Goodman, et al. Query processing in a system for distributed databases (SDD-1). ACM TODS.
4. N. Mamoulis, P. Kalnis, et al. Optimization of spatial joins on mobile devices. SSTD, 2003.

Other Join Types
Equi-join with selection constraints.
E.g., we are interested in restaurants that appear in both datasets and whose cost is less than $20:
    SELECT R1.Name, R1.Address, R2.Cost FROM R1, R2 WHERE R1.Name = R2.Name AND R2.Cost < 20
Add this condition to the histogram construction on R2, the relation that holds Cost:
    SELECT SUBSTRING(Name,1,1) AS Bucket, COUNT(Name) AS Count
    FROM R2
    WHERE Cost < 20
    GROUP BY SUBSTRING(Name,1,1)
    HAVING COUNT(Name) > 0

Iceberg Semi-join
Find all restaurants in R1 that are recommended by at least 10 users in a discussion group R2.
Properties:
 Equi-join between R1 and R2
 Results come from R1 only
 The condition is applied to R2 only (> 10 users)
    SELECT SUBSTRING(Name,1,1) AS Bucket, COUNT(Name) AS Count
    FROM R2
    GROUP BY SUBSTRING(Name,1,1)
    HAVING COUNT(Name) > t