Mining for Empty Rectangles in Large Data Sets
Jeff Edmonds, Jarek Gryz, Dongming Liang, Renee Miller
Matrix representation of π_{A,B}(R ⋈ S), with axes A and B
Find All Maximal 0-Rectangles in the matrix of π_{A,B}(R ⋈ S), with axes A and B
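To make the problem statement concrete, the following Python sketch enumerates maximal 0-rectangles by brute force. It is only a reference definition for tiny inputs, not the algorithm of the talk. The function name `brute_force_maximal_zero_rectangles`, the list-of-lists matrix, and the 0-indexed `(x_left, y_top, x_right, y_bottom)` tuples are assumptions made for illustration.

```python
def brute_force_maximal_zero_rectangles(matrix):
    """Return every maximal all-0 rectangle as (x_left, y_top, x_right, y_bottom).

    Coordinates are 0-indexed and inclusive (an assumed convention).
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if n_rows else 0

    def all_zero(x1, y1, x2, y2):
        # False when the rectangle leaves the matrix or contains a 1.
        if x1 < 0 or y1 < 0 or x2 >= n_cols or y2 >= n_rows:
            return False
        return all(matrix[y][x] == 0
                   for y in range(y1, y2 + 1)
                   for x in range(x1, x2 + 1))

    found = []
    for y1 in range(n_rows):
        for y2 in range(y1, n_rows):
            for x1 in range(n_cols):
                for x2 in range(x1, n_cols):
                    if not all_zero(x1, y1, x2, y2):
                        continue
                    # Maximal: growing by one row or column in any direction fails.
                    grows = (all_zero(x1 - 1, y1, x2, y2) or
                             all_zero(x1, y1 - 1, x2, y2) or
                             all_zero(x1, y1, x2 + 1, y2) or
                             all_zero(x1, y1, x2, y2 + 1))
                    if not grows:
                        found.append((x1, y1, x2, y2))
    return found


example = [
    [0, 0, 1],
    [0, 0, 0],
    [1, 0, 0],
]
for rect in sorted(brute_force_maximal_zero_rectangles(example)):
    print(rect)
```

On this 3 x 3 example the sketch reports four overlapping maximal rectangles, which shows that maximal 0-rectangles are not disjoint and that their count can exceed what a partition of the 0s would give.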
Example: π_{A,B}(R ⋈ S) with attributes Car (BMW Z3, Honda L2, Toyota 6A, …) and Year. First BMW Z3 series cars were made in 1997, so the entries for BMW Z3 in earlier years should all be 0.
Relation to Previous Work
Previous work ([Lui, Ku, Hsu] & [Orlowski]; [Namaad, Hsu, Lee]):
  Purpose: Machine Learning; Computational Geometry
  Problem: find all maximal empty rectangles between points in the real plane
  # of maximal 0-rectangles: O((# 1’s)^2) [Namaad, Hsu, Lee]
Our work:
  Purpose: Query Optimization
  Problem: find all maximal empty rectangles within a 0-1 matrix
  # of maximal 0-rectangles: O(#0’s)
Relation to Previous Work: time and space
Previous work ([Lui, Ku, Hsu] & [Orlowski]; [Namaad, Hsu, Lee]):
  Time: O(#1’s log(#1’s) + # rectangles) = O(|X||Y|)
  Space: O(|X||Y|)
Our work:
  Time: O(#0’s) = O(|X||Y|)
  Space: O(min(|X|, |Y|)), only two rows of the matrix kept in memory
Relation to Previous Work: practical implementation
Previous work ([Lui, Ku, Hsu] & [Orlowski]; [Namaad, Hsu, Lee]):
  Scalable: scales badly
  Intensive random memory access
Our work:
  Scalable: scales well with respect to the # of tuples in the join, the # of maximal rectangles, and the # of values |X| & |Y|
  Requires a single scan of the sorted data
Practical? IBM paid us $25,000 to patent it!
Structure of Algorithm
loop y = 1..|Y|
    loop x = 1..|X|
        Output all maximal 0-rectangles with (x,y) as bottom-right corner
        Maintain the loop invariant
Timing: O(1) amortized time per (x,y)
Designing an Algorithm: Define Problem; Define Loop Invariants; Define Measure of Progress; Define Step; Define Exit Condition; Maintain Loop Inv; Make Progress; Initial Conditions; Ending.
(Figure: the trip to school as a measure of progress, from 79 km through 75 km down to 0 km, with an exit at each stage.)
Define the Loop Invariant: We have read the matrix up to (x,y) and cannot reread the matrix. We must output all maximal 0-rectangles with (x,y) as bottom-right corner. What must we remember?
Stack of steps (figure: the staircase of steps ending at (x,y) in the X-Y matrix, with x* and y* marked)
Constructing Maximal Rectangles (figure labels: Too narrow, Maximal, Too short)
Constructing staircase(x,y) from staircase(x-1,y): Case 1 (figure)
Constructing staircase(x,y) from staircase(x-1,y): Case 2 (figure)
Constructing staircase(x,y) from staircase(x-1,y) (figure labels: Too narrow, Maximal, Too short; Delete, Keep)
Constructing x* & y* (figure: position (x,y) with x* and y* marked)
Location of last 1 seen in each column
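The per-column bookkeeping named on this slide can be sketched in Python as below. The generator name `last_one_per_column`, the use of -1 for "no 1 seen yet", and 0-indexed rows are assumptions for illustration. The point is that a single top-to-bottom pass needs only one integer per column.

```python
def last_one_per_column(rows, n_cols):
    """Scan rows top to bottom; after each row yield, for every column,
    the index of the last row seen so far that holds a 1 (-1 if none)."""
    last_one = [-1] * n_cols          # one integer per column
    for y, row in enumerate(rows):
        for x, value in enumerate(row):
            if value == 1:
                last_one[x] = y
        yield list(last_one)          # copy so callers may keep snapshots


rows = [
    [0, 0, 1],
    [0, 0, 0],
    [1, 0, 0],
]
for y, snapshot in enumerate(last_one_per_column(rows, 3)):
    # y - last_one[x] is the length of the run of 0s ending at (x, y).
    print(y, snapshot, [y - top for top in snapshot])
```

The derived run lengths are what a staircase construction would consume: a column whose last 1 is far above the current row can support a tall empty rectangle.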
Structure of Algorithm
loop y = 1..|Y|
    loop x = 1..|X|
        Construct staircase(x,y)
        Output all maximal 0-rectangles with (x,y) as bottom-right corner
Timing: O(1) amortized time per (x,y)
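A minimal Python sketch of this loop structure follows. It is not the authors' implementation: it represents the staircase as a stack of `(x_start, height)` steps in the style of the standard histogram-stack technique. The names `maximal_zero_rectangles`, `height`, and `last_one_below`, the 0-indexed coordinates, and the all-1 sentinel row after the last row are all assumptions. The sketch looks one row ahead to decide whether a rectangle could still grow downward, so at any time it touches two matrix rows plus one counter per column.

```python
def maximal_zero_rectangles(matrix):
    """Return maximal all-0 rectangles as (x_left, y_top, x_right, y_bottom),
    0-indexed and inclusive, touching only rows y and y + 1 at any time."""
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if n_rows else 0
    height = [0] * n_cols              # run of 0s ending at row y, per column
    found = []
    for y in range(n_rows):
        row = matrix[y]
        # Sentinel: a row of 1s below the matrix blocks downward growth.
        next_row = matrix[y + 1] if y + 1 < n_rows else [1] * n_cols
        for x in range(n_cols):
            height[x] = 0 if row[x] else height[x] + 1
        stack = []                     # staircase: heights strictly increase
        last_one_below = -1            # rightmost 1 in next_row at or left of x
        for x in range(n_cols):
            # Construct staircase(x, y) from staircase(x - 1, y).
            start = x
            while stack and stack[-1][1] >= height[x]:
                start = stack.pop()[0]
            if height[x] > 0:
                stack.append((start, height[x]))
            if next_row[x]:
                last_one_below = x
            right_height = height[x + 1] if x + 1 < n_cols else 0
            # Only steps taller than the next column cannot grow rightward.
            # Every step scanned here is popped at x + 1, which pays for the scan.
            for x_start, h in reversed(stack):
                if h <= right_height:
                    break
                if last_one_below >= x_start:   # a 1 below blocks growth down
                    found.append((x_start, y - h + 1, x, y))
    return found


example = [
    [0, 0, 1],
    [0, 0, 0],
    [1, 0, 0],
]
print(sorted(maximal_zero_rectangles(example)))
```

In this sketch a step is reported only when it can grow neither right (it is taller than the next column's run of 0s) nor down (the next row has a 1 somewhere under it). Growth up and left is already ruled out by how the stack is built.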
Timing (figure labels: Too narrow, Maximal, Too short; Delete). Deleting steps is the only work that is not constant time.
Timing: Amortized # of steps deleted (per (x,y)) = # of steps created (per (x,y))
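The charging argument can be checked empirically with a small Python counter. This is a sketch under assumptions: the staircase is modelled as a stack of step heights rebuilt for each row, and `count_step_operations` is a hypothetical helper, not code from the talk. In this model each cell pushes at most one step and a step can be popped only once, so total deletions never exceed total creations, which never exceed the number of cells.

```python
def count_step_operations(matrix):
    """Return (created, deleted): how many staircase steps are pushed and popped."""
    n_cols = len(matrix[0]) if matrix else 0
    height = [0] * n_cols               # run of 0s ending at the current row
    created = deleted = 0
    for row in matrix:
        stack = []                      # heights of the current staircase steps
        for x in range(n_cols):
            height[x] = 0 if row[x] else height[x] + 1
            while stack and stack[-1] >= height[x]:
                stack.pop()
                deleted += 1
            if height[x] > 0:
                stack.append(height[x])  # at most one new step per cell
                created += 1
    return created, deleted


example = [
    [0, 0, 1],
    [0, 0, 0],
    [1, 0, 0],
]
created, deleted = count_step_operations(example)
print(created, deleted)
```

In this model exactly one step is created for every 0 cell, so the total stack work is bounded by the number of 0s plus the cost of reading the matrix.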
Number of Maximal Rectangles
# of maximal 0-rectangles: O((# 1’s)^2) [Namaad, Hsu, Lee]
Running time of alg = O(#0’s)