
Mining for Empty Rectangles in Large Data Sets


1 Mining for Empty Rectangles in Large Data Sets
Jeff Edmonds, Jarek Gryz, Dongming Liang, Renee Miller

2 Matrix representation
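π_A,B(R ⋈ S) as a 0-1 matrix: one axis lists the values of A, the other the values of B, and a 1 marks a pair of values that occurs in the join.
[Figure: example matrix with axes A and B; axis values include 1, 2, 3, 6, 7, 8.]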
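To make the representation concrete, here is a small Python sketch that builds the matrix from the projected pairs. It assumes (our choice, not necessarily the authors') that each axis consists of the sorted distinct values that actually occur, with columns X taken from A and rows Y from B; `to_matrix` is a hypothetical helper name.

```python
from typing import Iterable, List, Tuple


def to_matrix(pairs: Iterable[Tuple[int, int]]) -> Tuple[List[int], List[int], List[List[int]]]:
    """Build the 0-1 matrix of a two-attribute projection.

    Assumption of this sketch: columns are the sorted distinct A values (X),
    rows the sorted distinct B values (Y), and m[y][x] == 1 iff the pair
    (X[x], Y[y]) occurs in the projection.
    """
    pairs = set(pairs)                       # a projection has no duplicate pairs
    xs = sorted({a for a, _ in pairs})       # X: distinct values of A
    ys = sorted({b for _, b in pairs})       # Y: distinct values of B
    col = {a: i for i, a in enumerate(xs)}
    row = {b: j for j, b in enumerate(ys)}
    m = [[0] * len(xs) for _ in ys]
    for a, b in pairs:
        m[row[b]][col[a]] = 1
    return xs, ys, m
```

For example, `to_matrix([(1, 3), (2, 1), (6, 7)])` gives X = [1, 2, 6], Y = [1, 3, 7] and one 1 per row.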

3 Find All Maximal 0-Rectangles
um al A,B(R S) 1 1 2 3 6 7 8 A B 3 1 6 7 8

4 Example
[Figure: the 0-1 matrix of π_A,B(R ⋈ S) with axes Car (BMW Z3, Honda L2, Toyota 6A, …) and Year.]
One empty rectangle reflects a fact about the data: the first BMW Z3 series cars were made in 1997.

5 Relation to Previous Work
Works compared: [Naamad, Hsu, Lee]; [Liu, Ku, Hsu] & [Orlowski]; Our Work
Problem: find all maximal empty rectangles, between points in the real plane (previous work) or within a 0-1 matrix (our work)
Purpose: machine learning and computational geometry (previous work); query optimization (our work)
# of maximal 0-rectangles: O((#1’s)²) [Naamad, Hsu, Lee]; O(#0’s) (our work)

6 Relation to Previous Work
Time: O(#1’s · log(#1’s) + #rectangles), where #rectangles = O(|X||Y|) (previous work); O(#0’s) = O(|X||Y|) (our work)
Space: O(|X||Y|) (previous work); O(min(|X|, |Y|)) (our work), since only two rows of the matrix are kept in memory

7 Relation to Previous Work
Practical implementation: previous work needs intensive random memory access; our work requires a single scan of the sorted data
Scalable with respect to # of tuples in the join, # of maximal rectangles, and # of values |X| & |Y|: previous work scales badly; our work scales well
IBM paid us $25,000 to patent it! Practical?
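The single scan of the sorted data can be pictured with this Python sketch. It assumes the projected (a, b) pairs arrive sorted by b and that the sorted A domain `xs` is known in advance; rows exist only for b values that occur, and `row_pairs` is a hypothetical name. At most the current row and the next one are alive at any time, which is all a bottom-right-corner sweep needs. Orienting the matrix so that each row spans the smaller of the two domains keeps this buffer at O(min(|X|, |Y|)).

```python
from itertools import groupby
from typing import Iterable, Iterator, List, Optional, Tuple


def row_pairs(
    sorted_pairs: Iterable[Tuple[int, int]], xs: List[int]
) -> Iterator[Tuple[List[int], Optional[List[int]]]]:
    """Yield (row y, row y+1) from (a, b) pairs sorted by b; row y+1 is None at the end."""
    col = {a: i for i, a in enumerate(xs)}

    def rows() -> Iterator[List[int]]:
        # One pass over the input: each run of equal b values becomes one 0-1 row.
        for _, group in groupby(sorted_pairs, key=lambda t: t[1]):
            row = [0] * len(xs)
            for a, _ in group:
                row[col[a]] = 1
            yield row

    it = rows()
    current = next(it, None)
    while current is not None:  # at most two rows are alive at any time
        nxt = next(it, None)
        yield current, nxt
        current = nxt
```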

8 Structure of Algorithm
loop y = 1..|Y|
    loop x = 1..|X|
        Construct staircase(x,y)
        Output all maximal 0-rectangles with <x,y> as bottom-right corner
Timing: O(1) amortized time per <x,y>
[Figure: the matrix with axes X and Y and the current cell <x,y>; the parts of the talk are labeled First to Fourth.]
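A runnable Python sketch of this loop, under assumptions of our own: the matrix is a list of rows `m[y][x]` (0-based, row 0 at the top), the staircase is a list of hypothetical `(left, height)` steps, and y*(x,y) is represented by `up[x]`, the number of consecutive 0s in column x ending at row y. A step is output when it is blocked on the right (column x+1 has fewer 0s than the step is tall) and from below (row y+1 has a 1 within the step's columns). For simplicity it keeps the whole matrix in memory, although it only reads rows y and y+1, and its output scan does not try to match the slides' timing bound exactly.

```python
from typing import Iterator, List, Tuple

Rect = Tuple[int, int, int, int]  # (x_left, y_top, x_right, y_bottom), inclusive, 0-based


def maximal_zero_rectangles(m: List[List[int]]) -> Iterator[Rect]:
    """Yield each maximal 0-rectangle of m exactly once, at its bottom-right corner <x,y>."""
    h, w = len(m), len(m[0])
    up = [0] * w  # up[x] = consecutive 0s in column x ending at row y
    for y in range(h):
        for x in range(w):
            up[x] = 0 if m[y][x] else up[x] + 1
        below = m[y + 1] if y + 1 < h else None  # the only other row we look at
        stack: List[Tuple[int, int]] = []  # staircase(x, y): (left, height), heights increase
        last_one_below = -1  # nearest 1 at or left of x in row y+1
        for x in range(w):
            # Construct staircase(x, y) from staircase(x-1, y): delete steps that are
            # at least as tall as column x, then push at most one new step.
            left = x
            while stack and stack[-1][1] >= up[x]:
                left = stack.pop()[0]
            if up[x] > 0:
                stack.append((left, up[x]))
            if below is None or below[x]:
                last_one_below = x  # the bottom edge of the matrix acts like a row of 1s
            right = up[x + 1] if x + 1 < w else 0  # 0s available in column x+1
            # Scan from the tallest step down: taller than `right` means blocked on
            # the right; reaching column last_one_below means blocked from below.
            for step_left, height in reversed(stack):
                if height <= right:
                    break  # this step and all lower ones are too short
                if step_left <= last_one_below:
                    yield (step_left, y - height + 1, x, y)
                # otherwise the step is too narrow: it could grow downward
```

For example, `sorted(maximal_zero_rectangles([[0, 1], [0, 0]]))` gives `[(0, 0, 0, 1), (0, 1, 1, 1)]`, the left column and the bottom row.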

9 Structure of Algorithm
loop y = 1..|Y|
    loop x = 1..|X|
        Construct staircase(x,y)
        Output all maximal 0-rectangles with <x,y> as bottom-right corner
Fifth: Query Optimization & Experimental Results

10 Staircase(x,y)
[Figure: staircase(x,y) above and to the left of <x,y>, stored as a stack of steps numbered 1 to 5, with a corner labeled (x_r, y_r).]

11 Constructing Maximal Rectangles
[Figure: candidate rectangles with <x,y> as bottom-right corner.]

12 Constructing Maximal Rectangles
Each step of the staircase gives a candidate rectangle with <x,y> as its bottom-right corner; the candidates are classified as Too Narrow, Maximal, or Too short.
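The three labels can be computed per step. In this Python sketch (hypothetical names; steps are `(left, height)` pairs listed from the wide, short bottom step to the narrow, tall top step), a step is too short if column x+1 still has enough 0s to extend it to the right, and too narrow if row y+1 has no 1 under its columns, so it could extend downward. Whatever is blocked in both directions is maximal.

```python
from typing import List, Tuple


def classify_steps(stack: List[Tuple[int, int]], right_height: int, last_one_below: int) -> List[str]:
    """Label the steps of staircase(x, y), bottom (wide, short) to top (narrow, tall).

    right_height: consecutive 0s ending at row y in column x+1 (0 past the last column).
    last_one_below: column of the nearest 1 at or left of x in row y+1 (x on the last row).
    """
    labels = []
    for left, height in stack:
        if height <= right_height:
            labels.append("too short")   # could still grow into column x+1
        elif left > last_one_below:
            labels.append("too narrow")  # could still grow down into row y+1
        else:
            labels.append("maximal")
    return labels
```

Because the steps get narrower as they get taller, the maximal ones always form a contiguous run between the too-short steps at the bottom and the too-narrow steps at the top.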

13 Constructing staircase(x,y) from staircase(x-1,y)
Case 1
[Figure: staircase(x-1,y) at <x-1,y> and the new cell <x,y>.]

14 Constructing staircase(x,y) from staircase(x-1,y)
Case 2
[Figure: staircase(x-1,y) at <x-1,y> and the new cell <x,y>.]

15 Constructing staircase(x,y) from staircase(x-1,y)
Moving from <x-1,y> to <x,y>, each step of staircase(x-1,y) is either deleted or kept.
[Figure: the deleted steps, labeled Too Narrow, Maximal, and Too short, and the kept steps.]
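A Python sketch of the update, assuming steps are stored as `(left, height)` pairs with heights increasing toward the top of the stack and that `height` is the number of consecutive 0s in column x ending at row y (our representation; `extend_staircase` is a hypothetical name). Steps at least as tall as column x are deleted, the survivors are kept, and at most one new step is created, reaching as far left as the deleted steps did.

```python
from typing import List, Tuple

Step = Tuple[int, int]  # (left column, height in rows): a hypothetical representation


def extend_staircase(stack: List[Step], x: int, height: int) -> List[Step]:
    """Turn staircase(x-1, y) into staircase(x, y) in place; return the deleted steps."""
    deleted: List[Step] = []
    left = x
    while stack and stack[-1][1] >= height:
        step = stack.pop()       # too tall for column x: delete
        deleted.append(step)
        left = step[0]           # the new step reaches as far left as the deleted ones
    if height > 0:
        stack.append((left, height))  # at most one step is created per <x, y>
    return deleted
```

A 1 at <x,y> (height 0) deletes every step and leaves the staircase empty.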

16 Constructing x*(x,y) & y*(x,y)
[Figure: x*(x-1,y) and y*(x-1,y) marked relative to <x-1,y>.]

17 Constructing x*(x,y) & y*(x,y) from x*(x-1,y) & y*(x,y-1)
y*(x,y) is obtained from y*(x,y-1), which is saved from the previous row; x*(x,y) is obtained by querying x*(x-1,y).
[Figure: y*(x,y) and x*(x,y) at <x,y>, next to y*(x,y-1) and x*(x-1,y).]
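A Python sketch of the two recurrences. It assumes x*(x,y) is the column of the nearest 1 at or to the left of <x,y> in row y, and y*(x,y) the row of the nearest 1 at or above <x,y> in column x, with -1 when there is none; these definitions are our reading of the figures, and `star_tables` is a hypothetical helper. For illustration it stores full tables; a streaming version would keep only the `saved` array and the current row.

```python
from typing import List, Tuple


def star_tables(m: List[List[int]]) -> Tuple[List[List[int]], List[List[int]]]:
    """Compute x*(x, y) and y*(x, y) for every cell, each in O(1) from its predecessor."""
    h, w = len(m), len(m[0])
    xstar = [[-1] * w for _ in range(h)]
    ystar = [[-1] * w for _ in range(h)]
    saved = [-1] * w  # y*(x, y-1) for every column: the only state carried between rows
    for y in range(h):
        prev = -1  # x*(x-1, y)
        for x in range(w):
            prev = x if m[y][x] else prev
            saved[x] = y if m[y][x] else saved[x]
            xstar[y][x], ystar[y][x] = prev, saved[x]
    return xstar, ystar
```

Under these definitions, y - y*(x,y) is the number of consecutive 0s in column x ending at row y, which is the height of the step column x contributes to the staircase.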

18 Structure of Algorithm
loop y = 1..|Y|
    loop x = 1..|X|
        Construct staircase(x,y)
        Output all maximal 0-rectangles with <x,y> as bottom-right corner
Third: Timing, O(1) amortized time per <x,y>

19 Timing
The only work that does not take constant time is deleting steps.
[Figure: the deleted steps of the staircase, labeled Too Narrow, Maximal, and Too short.]

20 Timing
Amortized # of steps deleted (per <x,y>) = # of steps created (per <x,y>) ≤ 1
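The amortized argument can be checked by instrumenting the staircase updates. In this Python sketch (steps as `(left, height)` pairs, a representation we assume; `staircase_counts` is a hypothetical name), every 0 at <x,y> creates exactly one step and a step can be deleted only after it has been created, so the total deletion work over the whole sweep is bounded by #0’s.

```python
from typing import List, Tuple


def staircase_counts(m: List[List[int]]) -> Tuple[int, int, int]:
    """Sweep the matrix as in the staircase algorithm; return (#0s, #steps created, #steps deleted)."""
    w = len(m[0])
    up = [0] * w  # consecutive 0s in each column ending at the current row
    created = deleted = 0
    for row in m:
        stack: List[Tuple[int, int]] = []  # (left, height) steps
        for x in range(w):
            up[x] = 0 if row[x] else up[x] + 1
            left = x
            while stack and stack[-1][1] >= up[x]:
                left = stack.pop()[0]
                deleted += 1
            if up[x] > 0:  # a 0 at <x,y> creates exactly one step
                stack.append((left, up[x]))
                created += 1
    zeros = sum(r.count(0) for r in m)
    return zeros, created, deleted
```

For `[[0, 0], [0, 1]]` this returns `(3, 3, 2)`: three steps created, two deleted, and one left on the stack at the end of a row.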

21 Number of Maximal Rectangles
# of maximal 0-rectangles: O((#1’s)²) [Naamad, Hsu, Lee]
Running time of the algorithm: O(#0’s)

22 How many empty rectangles are there?
Tests were run on 4 pairs of numerical attributes that appear in typical joins in a real-world workload of a health insurance company.

23 How big are the rectangles?

24 Query rewrite: simple case
Original query:
select … from R, S,... where R.C=S.C and 60<R.A<80 and 20<S.B<80 and...
Rewritten query:
select … from R, S,... where R.C=S.C and 60<R.A<80 and 20<S.B<60 and...
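One way to decide whether this simple rewrite applies: if a mined empty rectangle spans the query's whole R.A range and covers the top (or bottom) end of its S.B range, the S.B range can be cut at the rectangle's edge. The Python sketch below assumes open query ranges (as in `60<R.A<80`), closed rectangle bounds, and, purely for illustration, a hypothetical empty rectangle 60 ≤ A ≤ 80, 60 ≤ B ≤ 80, which reproduces the rewrite above. `shrink_b_range` is our name, not the authors'.

```python
from typing import Optional, Tuple

Interval = Tuple[float, float]


def shrink_b_range(qa: Interval, qb: Interval, rect_a: Interval, rect_b: Interval) -> Optional[Interval]:
    """Narrow the open query range lo < B < hi using one empty rectangle.

    qa, qb: open query ranges on A and B.
    rect_a, rect_b: closed ranges of an empty rectangle (no joined tuple inside).
    Returns the new open B range, or None if the rectangle does not help.
    """
    (qa_lo, qa_hi), (qb_lo, qb_hi) = qa, qb
    (a1, a2), (b1, b2) = rect_a, rect_b
    if not (a1 <= qa_lo and qa_hi <= a2):
        return None                    # the rectangle must span the query's whole A range
    if qb_lo < b1 < qb_hi <= b2:
        return (qb_lo, b1)             # drop the empty top part b1 <= B < hi
    if b1 <= qb_lo < b2 < qb_hi:
        return (b2, qb_hi)             # drop the empty bottom part lo < B <= b2
    return None
```

With the assumed rectangle, `shrink_b_range((60, 80), (20, 80), (60, 80), (60, 80))` returns `(20, 60)`, i.e. `20<S.B<60`.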

25 Query rewrite: complex case
Original query:
select … from R, S,... where R.C=S.C and 60<R.A<80 and 20<S.B<80 and...
Rewritten query:
select … from R, S,... where R.C=S.C and (… and …) or ...
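When the empty rectangle cuts into the middle of the query's range, the rewrite becomes a disjunction. A minimal Python sketch of one way to produce it, assuming integer-valued attributes and closed ranges (unlike the slide's open ranges): subtract the rectangle from the query box, which leaves at most four disjoint boxes, and print them as an or of and-conditions. `subtract_box` and `to_predicate` are hypothetical names; this is a standard box-difference construction, not necessarily the rewrite the authors used.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (a_lo, a_hi, b_lo, b_hi), closed integer ranges


def subtract_box(query: Box, empty: Box) -> List[Box]:
    """Split a query box into disjoint boxes that avoid an empty rectangle."""
    qa1, qa2, qb1, qb2 = query
    a1, a2, b1, b2 = empty
    ia1, ia2 = max(qa1, a1), min(qa2, a2)  # overlap on A
    ib1, ib2 = max(qb1, b1), min(qb2, b2)  # overlap on B
    if ia1 > ia2 or ib1 > ib2:
        return [query]                     # no overlap: nothing to remove
    pieces = [
        (qa1, ia1 - 1, qb1, qb2),          # smaller A values, full B range
        (ia2 + 1, qa2, qb1, qb2),          # larger A values, full B range
        (ia1, ia2, qb1, ib1 - 1),          # overlapping A values, smaller B values
        (ia1, ia2, ib2 + 1, qb2),          # overlapping A values, larger B values
    ]
    return [p for p in pieces if p[0] <= p[1] and p[2] <= p[3]]


def to_predicate(boxes: List[Box]) -> str:
    """Render boxes as a disjunction of range conditions on R.A and S.B."""
    return " or ".join(f"({a1} <= R.A <= {a2} and {b1} <= S.B <= {b2})" for a1, a2, b1, b2 in boxes)
```

An empty list means the query box lies entirely inside the empty rectangle, so the query has no answers.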

26 How much do the rectangles overlap with queries?

27 Query optimization experiments
Real-world workload of 26 queries
5 of the queries "qualified" for the rewrite
Only simple rewrites were considered
All rewrites led to improved performance

