1
Mining for Empty Rectangles in Large Data Sets
Jeff Edmonds, Jarek Gryz, Dongming Liang, Renee Miller
2
Matrix representation
[Figure: π_{A,B}(R ⋈ S) drawn as a 0-1 matrix, with the values of A and B as its axes.]
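The matrix representation can be sketched in a few lines. This is only an illustration: the function name `build_matrix` and the sample pairs are hypothetical, and it assumes a 1 marks an (A, B) combination that occurs in the join result and a 0 marks one that does not.

```python
def build_matrix(pairs):
    """Return (a_values, b_values, M) where M[j][i] == 1 iff
    (a_values[i], b_values[j]) occurs in pairs."""
    a_values = sorted({a for a, _ in pairs})   # distinct A values, in order
    b_values = sorted({b for _, b in pairs})   # distinct B values, in order
    col = {a: i for i, a in enumerate(a_values)}
    row = {b: j for j, b in enumerate(b_values)}
    M = [[0] * len(a_values) for _ in b_values]
    for a, b in pairs:
        M[row[b]][col[a]] = 1
    return a_values, b_values, M


# Hypothetical (A, B) pairs, for illustration only.
pairs = [(1, 3), (2, 1), (3, 6), (6, 7)]
a_values, b_values, M = build_matrix(pairs)
```

Sorting the distinct values keeps neighbouring matrix cells adjacent in the attribute domains, so a block of 0s corresponds to a pair of value ranges with no joined tuples.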
3
Find All Maximal 0-Rectangles
[Figure: the 0-1 matrix of π_{A,B}(R ⋈ S) with maximal 0-rectangles marked.]
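To make "maximal" concrete, here is a brute-force check, assuming the usual reading that a 0-rectangle is maximal when it cannot be grown by one row or one column in any direction while staying all-0. The function names and the sample matrix are hypothetical, and this is a definition check, not the algorithm of the talk.

```python
def is_zero_rectangle(M, top, left, bottom, right):
    """True if every entry of M in the inclusive block is 0."""
    return all(M[y][x] == 0
               for y in range(top, bottom + 1)
               for x in range(left, right + 1))


def is_maximal_zero_rectangle(M, top, left, bottom, right):
    """True if the block is all-0 and cannot grow by one row or column."""
    if not is_zero_rectangle(M, top, left, bottom, right):
        return False
    rows, cols = len(M), len(M[0])
    grown = [
        (top - 1, left, bottom, right),   # one row up
        (top, left - 1, bottom, right),   # one column left
        (top, left, bottom + 1, right),   # one row down
        (top, left, bottom, right + 1),   # one column right
    ]
    for t, l, b, r in grown:
        inside = t >= 0 and l >= 0 and b < rows and r < cols
        if inside and is_zero_rectangle(M, t, l, b, r):
            return False
    return True


# Hypothetical 3x3 matrix, indexed M[y][x].
M = [[0, 0, 1],
     [0, 0, 0],
     [1, 0, 0]]
```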
4
Example
[Figure: 0-1 matrix of π_{A,B}(R ⋈ S) with axes Car (BMW Z3, Honda L2, Toyota 6A, …) and Year.]
First BMW Z3 series cars were made in 1997.
5
Relation to Previous Work
Previous work: [Namaad, Hsu, Lee], [Lui, Ku, Hsu] & [Orlowski]
Problem: Find all maximal empty rectangles between points in real plane
Purpose: Machine Learning, Computational Geometry
# of maximal 0-rectangles: O( (# 1’s)^2 )
Our Work
Problem: Find all maximal empty rectangles within a 0-1 matrix
Purpose: Query Optimization
# of maximal 0-rectangles: O( #0’s )
6
Relation to Previous Work
Previous work: [Namaad, Hsu, Lee], [Lui, Ku, Hsu] & [Orlowski]
Time: O( # 1’s log(#1’s) # rectangles ) = O(|X||Y|)
Space: O(|X||Y|)
Our Work
Time: O( #0’s ) = O(|X||Y|)
Space: O(min(|X|, |Y|)), only two rows of matrix kept in memory
7
Relation to Previous Work
Previous work: [Namaad, Hsu, Lee], [Lui, Ku, Hsu] & [Orlowski]
Practical Implementation: Intensive random memory access
Scalable: Scales badly
Our Work
Practical Implementation: Requires a single scan of the sorted data
Scalable: Scales well wrt # of tuples in join, # of maximal rectangles, # of values |X| & |Y|
Practical? IBM paid us $25,000 to patent it!
8
Structure of Algorithm
loop y = 1..|Y|
  loop x = 1..|X|
    Construct staircase(x,y)
    Output all maximal 0-rectangles with <x,y> as bottom-right corner
Timing: O(1) amortized time per <x,y>
[Figure: X-Y matrix with the current position <x,y> marked.]
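The loop structure above can be sketched as follows. This is a simplified stand-in rather than the algorithm of the talk: it rebuilds the staircase for every <x,y> by scanning left over column heights, so it costs O(|X|) per position instead of O(1) amortized. The name `maximal_zero_rectangles`, the `height` array, and the sample matrix are assumptions made for illustration.

```python
def maximal_zero_rectangles(M):
    """Return every maximal 0-rectangle of M (indexed M[y][x]) as
    (top, left, bottom, right), each reported at its bottom-right corner."""
    rows, cols = len(M), len(M[0])
    height = [0] * cols              # per column: run of 0s ending at row y
    found = []
    for y in range(rows):
        for x in range(cols):
            height[x] = 0 if M[y][x] else height[x] + 1
        for x in range(cols):
            if M[y][x]:
                continue             # a 1 cannot be a corner of a 0-rectangle
            lowest = height[x]       # tallest rectangle spanning columns i..x
            blocked_below = y == rows - 1
            for i in range(x, -1, -1):   # walk the staircase leftwards
                if height[i] == 0:
                    break            # a 1 in row y ends the staircase
                lowest = min(lowest, height[i])
                blocked_below = blocked_below or M[y + 1][i] == 1
                blocked_left = i == 0 or height[i - 1] < lowest
                blocked_right = x == cols - 1 or height[x + 1] < lowest
                # Upward growth is always blocked: some column in i..x has
                # exactly `lowest` zeros above row y.
                if blocked_left and blocked_right and blocked_below:
                    found.append((y - lowest + 1, i, y, x))
    return found


# Hypothetical 3x3 matrix, for illustration only.
M = [[0, 0, 1],
     [0, 0, 0],
     [1, 0, 0]]
rectangles = maximal_zero_rectangles(M)
```

Only row y, row y+1, and the per-column heights are consulted at any time, which is in the spirit of keeping a small window of the matrix in memory during a single scan.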
9
Structure of Algorithm
loop y = 1..|Y|
  loop x = 1..|X|
    Construct staircase(x,y)
    Output all maximal 0-rectangles with <x,y> as bottom-right corner
Fifth: Query Optimization & Experimental Results
[Figure: X-Y matrix with the current position <x,y> marked.]
10
Staircase(x,y)
[Figure: staircase(x,y) drawn in the X-Y matrix as a stack of steps ending at <x,y>.]
11
Constructing Maximal Rectangles
12
Constructing Maximal Rectangles
[Figure: candidate rectangles with bottom-right corner <x,y>, labelled Too Narrow, Maximal, and Too Short.]
13
Constructing staircase(x,y) from staircase(x-1,y)
[Figure: Case 1, showing <x-1,y> and <x,y> in the matrix.]
14
Constructing staircase(x,y) from staircase(x-1,y)
[Figure: Case 2, showing <x-1,y> and <x,y> in the matrix.]
15
Constructing staircase(x,y) from staircase(x-1,y)
[Figure: staircase(x,y) built from staircase(x-1,y) in the X-Y matrix; steps are marked Delete or Keep, and candidate rectangles are marked Too Narrow, Maximal, or Too Short.]
16
Constructing x*(x,y) & y*(x,y)
[Figure: the staircase at <x-1,y>, with x*(x-1,y) and y*(x-1,y) marked.]
17
Constructing x*(x,y) & y*(x,y) from x*(x-1,y) & y*(x,y-1)
[Figure: x*(x,y) and y*(x,y) at <x,y>, obtained from x*(x-1,y) and the saved y*(x,y-1).]
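The slides do not define x* and y* in text, so the following is a sketch under an assumed reading: x*(x,y) is the leftmost column of the unbroken run of 0s ending at <x,y> in row y, and y*(x,y) is the topmost row of the unbroken run of 0s ending at <x,y> in column x. Under that assumption each value follows in constant time from x*(x-1,y) and a saved y*(x,y-1). The function name `run_starts` and the sample matrix are hypothetical.

```python
def run_starts(M):
    """Return {(x, y): (x_star, y_star)} for every 0-entry M[y][x]."""
    cols = len(M[0])
    y_star = [None] * cols        # saved from the previous row, one per column
    result = {}
    for y, row in enumerate(M):
        x_star = None             # carried along the current row
        for x, value in enumerate(row):
            if value == 1:
                x_star = None     # a 1 breaks the horizontal run
                y_star[x] = None  # and the vertical run in this column
                continue
            if x_star is None:
                x_star = x        # a new horizontal run starts here
            if y_star[x] is None:
                y_star[x] = y     # a new vertical run starts here
            result[(x, y)] = (x_star, y_star[x])
    return result


# Hypothetical 3x3 matrix, for illustration only.
M = [[0, 0, 1],
     [0, 0, 0],
     [1, 0, 0]]
starts = run_starts(M)
```

Only one saved value per column is carried from row to row, so the extra memory is proportional to the row width.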
18
Structure of Algorithm
loop y = 1..|Y|
  loop x = 1..|X|
    Construct staircase(x,y)
    Output all maximal 0-rectangles with <x,y> as bottom-right corner
Timing: O(1) amortized time per <x,y>
[Figure: X-Y matrix with the current position <x,y> marked.]
19
Timing
Only work that is not constant time: Delete
[Figure: the staircase at <x,y> in the X-Y matrix, with steps marked for deletion and candidate rectangles marked Too Narrow, Maximal, or Too Short.]
20
Timing
Amortized # of steps deleted (per <x,y>) = # of steps created (per <x,y>) ≤ 1
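The amortized argument can be illustrated with a stack of steps maintained along one row. This is a simplified analogue, not the data structure of the talk: it assumes each column is summarized by the height of its run of 0s, creates at most one step per position, and deletes every step that is too tall for the current column. The name `count_step_operations` and the sample heights are hypothetical.

```python
def count_step_operations(heights):
    """Sweep one row left to right and return (created, deleted) step counts."""
    stack = []                    # (left_x, height), heights strictly increasing
    created = deleted = 0
    for x, h in enumerate(heights):
        left = x
        while stack and stack[-1][1] > h:
            left, _ = stack.pop() # step is too tall for column x: delete it
            deleted += 1
        if h > 0 and (not stack or stack[-1][1] < h):
            stack.append((left, h))   # at most one new step per position
            created += 1
    return created, deleted


# Hypothetical column heights for a single row.
created, deleted = count_step_operations([2, 2, 1, 3, 0])
```

A step can be deleted only after it was created, so the total number of deletions over a row never exceeds the number of creations, which is at most one per position.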
21
Number of Maximal Rectangles
# of maximal 0-rectangles: O( (# 1’s)^2 ) [Namaad, Hsu, Lee]
Running time of alg = O( #0’s )
22
How many empty rectangles are there?
Tests were done on 4 pairs of attributes with numerical domains that appear in typical joins in a real-world workload of a health insurance company.
23
How big are the rectangles?
24
Query rewrite: simple case
Original query:
select … from R, S,... where R.C=S.C and 60<R.A<80 and 20<S.B<80 and...
Rewritten query:
select … from R, S,... where R.C=S.C and 60<R.A<80 and 20<S.B<60 and...
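The simple case can be sketched as trimming one range predicate. The slide does not give the empty rectangle's coordinates, so the bounds below are hypothetical and chosen to reproduce the rewrite shown; the sketch assumes query predicates are open ranges (lo < value < hi) and the empty rectangle is a closed region of (R.A, S.B) values in which the join produces no tuples. The function name `trim_b_range` is also an assumption.

```python
def trim_b_range(a_range, b_range, empty_rect):
    """Shrink the open range b_range when empty_rect spans all of a_range
    and covers one end of b_range; otherwise return b_range unchanged."""
    (a_lo, a_hi), (b_lo, b_hi) = a_range, b_range
    (ra_lo, ra_hi), (rb_lo, rb_hi) = empty_rect
    if not (ra_lo <= a_lo and a_hi <= ra_hi):
        return b_range            # rectangle does not span the A predicate
    new_lo, new_hi = b_lo, b_hi
    if rb_lo <= b_lo < rb_hi:
        new_lo = rb_hi            # low end of the B range is empty
    if rb_lo < b_hi <= rb_hi:
        new_hi = rb_lo            # high end of the B range is empty
    return new_lo, new_hi


# Hypothetical empty rectangle: no joined tuples with 60 <= A <= 80, 60 <= B <= 80.
rewritten = trim_b_range((60, 80), (20, 80), ((60, 80), (60, 80)))
```

If the returned range has its lower bound at or above its upper bound, the whole query region lies inside the empty rectangle. A rectangle that overlaps only the middle of the B range leaves the predicate unchanged here, since splitting a range is not a simple rewrite.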
25
Query rewrite: complex case
Original query:
select … from R, S,... where R.C=S.C and 60<R.A<80 and 20<S.B<80 and...
Rewritten query:
select … from R, S,... where R.C=S.C and (… and …) or ...
26
How much do the rectangles overlap with queries?
27
Query optimization experiments
real-world workload of 26 queries
5 of the queries “qualified” for the rewrite
only simple rewrites were considered
all rewrites led to improved performance