Mining for Empty Rectangles in Large Data Sets
Jeff Edmonds, Jarek Gryz, Dongming Liang, Renee Miller



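Matrix representation: the result of $\pi_{A,B}(R \bowtie S)$ is viewed as a 0-1 matrix with one column per value of A and one row per value of B. An entry is 1 if the pair (a, b) appears in the result and 0 otherwise.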
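A minimal Python sketch of building this matrix, assuming a hypothetical schema R(A, J) and S(J, B) joined on an attribute J, and taking X and Y to be the sorted distinct values of A in R and of B in S. The function name and schema are illustrative only.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Set, Tuple


def join_project_matrix(r: List[Tuple[Hashable, Hashable]],
                        s: List[Tuple[Hashable, Hashable]]):
    """Return (xs, ys, matrix) for pi_{A,B}(R join S).

    r holds (a, j) tuples and s holds (j, b) tuples (hypothetical schema).
    matrix[y][x] is 1 if (xs[x], ys[y]) is in the projected join, else 0.
    """
    by_key: Dict[Hashable, List[Hashable]] = defaultdict(list)
    for j, b in s:
        by_key[j].append(b)
    # Projected join result as a set of (a, b) pairs.
    pairs: Set[Tuple[Hashable, Hashable]] = {
        (a, b) for a, j in r for b in by_key.get(j, ())
    }
    xs = sorted({a for a, _ in r})  # columns: sorted values of A
    ys = sorted({b for _, b in s})  # rows: sorted values of B
    matrix = [[1 if (a, b) in pairs else 0 for a in xs] for b in ys]
    return xs, ys, matrix
```

Sorting X and Y matters because a "rectangle" is a contiguous range of A values crossed with a contiguous range of B values.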

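Find All Maximal 0-Rectangles in the matrix of $\pi_{A,B}(R \bowtie S)$: rectangles that contain only 0s and cannot be extended in any direction without including a 1.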
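To pin the definition down, here is a brute-force Python sketch (a reference check for small matrices, not the scalable algorithm). It assumes 0-indexed coordinates, with x a column (A value) and y a row (B value), and reports rectangles as hypothetical (x1, y1, x2, y2) tuples with inclusive bounds. A 0-rectangle is maximal exactly when none of its four one-column or one-row extensions is still all 0.

```python
from typing import List, Set, Tuple

Rect = Tuple[int, int, int, int]  # (x1, y1, x2, y2): inclusive column/row bounds


def all_zero(m: List[List[int]], x1: int, y1: int, x2: int, y2: int) -> bool:
    """True if every entry in columns x1..x2 and rows y1..y2 is 0."""
    return all(m[y][x] == 0 for y in range(y1, y2 + 1) for x in range(x1, x2 + 1))


def maximal_zero_rectangles_bruteforce(m: List[List[int]]) -> Set[Rect]:
    """Enumerate every maximal 0-rectangle by testing all candidate rectangles."""
    height = len(m)
    width = len(m[0]) if m else 0
    found: Set[Rect] = set()
    for y1 in range(height):
        for y2 in range(y1, height):
            for x1 in range(width):
                for x2 in range(x1, width):
                    if not all_zero(m, x1, y1, x2, y2):
                        continue
                    # Each side is blocked by the matrix border or by a 1.
                    left = x1 == 0 or not all_zero(m, x1 - 1, y1, x1 - 1, y2)
                    right = x2 == width - 1 or not all_zero(m, x2 + 1, y1, x2 + 1, y2)
                    up = y1 == 0 or not all_zero(m, x1, y1 - 1, x2, y1 - 1)
                    down = y2 == height - 1 or not all_zero(m, x1, y2 + 1, x2, y2 + 1)
                    if left and right and up and down:
                        found.add((x1, y1, x2, y2))
    return found
```

For a 3x3 matrix with a single 1 in the center, this yields the four border strips: the top row, the bottom row, the left column and the right column.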

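Example: $\pi_{A,B}(R \bowtie S)$ with A = Car (BMW Z3, Honda L2, Toyota 6A, …) and B = Year. The matrix contains an empty (all-0) rectangle for the BMW Z3 in the years before 1997, because the first BMW Z3 series cars were made in 1997.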

Relation to Previous Work: [Liu, Ku, Hsu] & [Orlowski] vs. our work
Problem (both): find all maximal empty rectangles.
  [Liu, Ku, Hsu] & [Orlowski]: among points in the real plane.
  Our work: within a 0-1 matrix.
Purpose:
  [Liu, Ku, Hsu] & [Orlowski]: machine learning, computational geometry.
  Our work: query optimization.
Number of maximal 0-rectangles:
  Previous bound: $O((\#1\text{'s})^2)$ [Naamad, Hsu, Lee].
  Our work: $O(\#0\text{'s})$.

Relation to Previous Work (continued)
Time:
  [Liu, Ku, Hsu] & [Orlowski]; [Naamad, Hsu, Lee]: $O(\#1\text{'s}\,\log(\#1\text{'s}) + \#\text{rectangles})$, where $\#\text{rectangles} = O(\#0\text{'s}) = O(|X|\,|Y|)$.
  Our work: $O(|X|\,|Y|)$.
Space:
  Our work: $O(\min(|X|, |Y|))$; only two rows of the matrix are kept in memory.

Relation to Previous Work (continued): [Liu, Ku, Hsu] & [Orlowski]; [Naamad, Hsu, Lee] vs. our work
Practical implementation:
  Previous work: intensive random memory access.
  Our work: requires a single scan of the sorted data.
Scalable:
  Previous work: scales badly.
  Our work: scales well with respect to the number of tuples in the join, the number of maximal rectangles, and the numbers of values |X| and |Y|.
Practical? IBM paid us $25,000 to patent it!
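A Python sketch of the "single scan of the sorted data" idea, under stated assumptions: the join result arrives as hypothetical (b, a) tuples sorted by B in the same order as ys, every b is in ys and every a is in xs. The generator materializes one 0-1 row at a time, and the second helper pairs each row with the next one, so at most two rows are held at once. The function names are ours, not the authors'.

```python
from typing import Hashable, Iterable, Iterator, List, Optional, Sequence, Tuple


def rows_from_sorted(pairs: Iterable[Tuple[Hashable, Hashable]],
                     xs: Sequence[Hashable],
                     ys: Sequence[Hashable]) -> Iterator[List[int]]:
    """Yield one 0-1 row per value in ys, reading the sorted pairs once."""
    col = {a: i for i, a in enumerate(xs)}  # A value -> column index
    it = iter(pairs)
    pending = next(it, None)
    for b in ys:
        row = [0] * len(xs)
        # Consume every tuple whose B value is this row's value.
        while pending is not None and pending[0] == b:
            row[col[pending[1]]] = 1
            pending = next(it, None)
        yield row


def with_lookahead(rows: Iterable[List[int]]) -> Iterator[Tuple[List[int], Optional[List[int]]]]:
    """Yield (row y, row y+1) pairs; the last row is paired with None."""
    it = iter(rows)
    cur = next(it, None)
    while cur is not None:
        nxt = next(it, None)
        yield cur, nxt
        cur = nxt
```

Each row has length |X|; orienting the matrix so that rows run along the attribute with fewer values keeps the memory at two rows of length $\min(|X|, |Y|)$.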

Structure of Algorithm
  loop y = 1..|Y|
    loop x = 1..|X|
      Output all maximal 0-rectangles with (x,y) as bottom-right corner.
      Maintain the loop invariant.
Timing: O(1) amortized time per entry (x,y).

Designing an Algorithm: define the problem; define the loop invariants; define a measure of progress (e.g., 79 km to school, then 75 km, ..., 0 km); define the step; define the exit condition; maintain the loop invariant; make progress; establish the initial conditions; reach the ending.

Define the Loop Invariant: we have read the matrix up to entry (x,y) and cannot reread it. We must output all maximal 0-rectangles with (x,y) as bottom-right corner. What must we remember?

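The staircase: what we remember at (x,y) is a stack of steps. Together the steps trace the up-and-left boundary of the 0s covered by 0-rectangles with (x,y) as bottom-right corner. x* and y* mark the last 1 to the left of (x,y) in row y and the last 1 above it in column x.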
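A direct (non-incremental) Python sketch of staircase(x,y), useful as a definition check. It assumes 0-indexed coordinates and represents each step as a hypothetical (left_x, top_y) pair: the 0-rectangle from (left_x, top_y) to (x, y) cannot be extended upward, and left_x is the leftmost column with that top row. It rereads the matrix, so it is only a reference, not the one-pass method.

```python
from typing import List, Tuple

Step = Tuple[int, int]  # (left_x, top_y)


def staircase(m: List[List[int]], x: int, y: int) -> List[Step]:
    """Steps of the staircase at (x, y), ordered left to right."""
    steps: List[Step] = []  # built right to left
    top = -1
    for c in range(x, -1, -1):
        if m[y][c] == 1:
            break  # reached x*, the last 1 to the left in row y
        # y*: last 1 at or above row y in column c (-1 if none)
        y_star = max((r for r in range(y + 1) if m[r][c] == 1), default=-1)
        top = max(top, y_star + 1)
        if steps and steps[-1][1] == top:
            steps[-1] = (c, top)  # same top row: the step extends further left
        else:
            steps.append((c, top))
    return steps[::-1]
```

If (x,y) itself is a 1, the loop stops immediately and the staircase is empty.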

Constructing Maximal Rectangles

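Constructing Maximal Rectangles (continued): each step of the staircase defines a candidate 0-rectangle with (x,y) as bottom-right corner. A candidate is too narrow if it can still be extended to the right, too short if it can still be extended downward, and maximal otherwise.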

Constructing staircase(x,y) from staircase(x-1,y): Case 1.

Constructing staircase(x,y) from staircase(x-1,y): Case 2.

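Constructing staircase(x,y) from staircase(x-1,y): the steps of staircase(x-1,y) that rise above the last 1 in column x are deleted, and the remaining steps are kept. A deleted step cannot be extended to the right, so it is either maximal or too short; a kept step is too narrow.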
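A minimal Python sketch of this update, assuming the staircase is a stack of hypothetical (left_x, top_y) steps with the tallest (smallest top_y) step on top, and that y_star is the row of the last 1 seen in column x. Steps whose top lies above y_star are popped (deleted); the rest are kept, and a step for column x is pushed when (x,y) is a 0 and its height is new. This is our reconstruction of the Delete/Keep rule, not the authors' code.

```python
from typing import List, Tuple

Step = Tuple[int, int]  # (left_x, top_y)


def extend_staircase(stack: List[Step], x: int, y: int, y_star: int) -> List[Step]:
    """Turn staircase(x-1, y) into staircase(x, y) in place.

    y_star is the row of the last 1 seen in column x (-1 if none).
    Returns the deleted steps, rightmost first.
    """
    top = y_star + 1  # top row of the run of 0s ending at (x, y) in column x
    deleted: List[Step] = []
    left = x
    # Delete: steps whose top lies above y_star cannot continue into column x.
    while stack and stack[-1][1] < top:
        step = stack.pop()
        deleted.append(step)
        left = step[0]  # the new step inherits the leftmost deleted column
    # Keep the rest; push a step for column x only if (x, y) is a 0
    # and no kept step already has the same top row.
    if top <= y and (not stack or stack[-1][1] > top):
        stack.append((left, top))
    return deleted
```

When (x,y) is a 1, y_star equals y, so every step is deleted and the new staircase is empty.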

Constructing x* and y*: for the current entry (x,y), x* is the column of the last 1 seen in row y, and y* is the row of the last 1 seen in column x.

Remember the location of the last 1 seen in each column. One value per column gives y* for every entry without rereading the matrix.
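A Python sketch of maintaining x* and y* in one row-major pass, assuming 0-indexed coordinates and using -1 for "no 1 seen yet". Between rows it remembers a single integer per column (the hypothetical last_one array); x* is recomputed as the row is read.

```python
from typing import Iterator, List, Tuple


def scan_stars(rows: List[List[int]]) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (x, y, x_star, y_star) for every entry, in row-major order.

    x_star: column of the last 1 seen so far in row y (-1 if none).
    y_star: row of the last 1 seen so far in column x (-1 if none).
    """
    width = len(rows[0]) if rows else 0
    last_one = [-1] * width  # location of the last 1 seen in each column
    for y, row in enumerate(rows):
        x_star = -1
        for x, value in enumerate(row):
            if value == 1:
                last_one[x] = y
                x_star = x
            yield x, y, x_star, last_one[x]
```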

Structure of Algorithm (refined)
  loop y = 1..|Y|
    loop x = 1..|X|
      Construct staircase(x,y) from staircase(x-1,y).
      Output all maximal 0-rectangles with (x,y) as bottom-right corner.
Timing: O(1) amortized time per entry (x,y).
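One possible end-to-end realization in Python, offered as a sketch rather than the authors' implementation. It keeps the staircase as a histogram-style stack of hypothetical (left_x, top_y) steps, and uses a sentinel column at the end of each row that behaves like a 1. A rectangle is reported when its step is deleted (so it cannot grow right), and only if row y+1 has a 1 beneath it (so it is not too short). For brevity it takes the matrix as a list, but at row y it reads only rows y and y+1. Output tuples are (x1, y1, x2, y2), 0-indexed and inclusive.

```python
from typing import Iterator, List, Optional, Tuple

Rect = Tuple[int, int, int, int]  # (x1, y1, x2, y2): inclusive column/row bounds


def maximal_zero_rectangles(rows: List[List[int]]) -> Iterator[Rect]:
    """Yield every maximal all-0 rectangle of a 0-1 matrix, scanning row by row."""
    if not rows:
        return
    width = len(rows[0])
    last_one = [-1] * width  # y*: row of the last 1 seen in each column (-1: none)
    for y, row in enumerate(rows):
        nxt: Optional[List[int]] = rows[y + 1] if y + 1 < len(rows) else None
        # ones_before[c] = number of 1s in row y+1 at columns 0..c-1
        ones_before = [0] * (width + 1)
        if nxt is not None:
            for c in range(width):
                ones_before[c + 1] = ones_before[c] + nxt[c]
        stack: List[Tuple[int, int]] = []  # staircase steps (left_x, top_y)
        for x in range(width + 1):
            if x < width and row[x] == 1:
                last_one[x] = y
            # Column `width` is a sentinel that acts like a 1 and flushes the stack.
            top = last_one[x] + 1 if x < width else y + 1
            left = x
            # Delete steps that rise above the 0-run in column x; they cannot
            # be extended to the right (they are not too narrow).
            while stack and stack[-1][1] < top:
                left, step_top = stack.pop()
                # Report it unless row y+1 is all 0 beneath it (too short).
                if nxt is None or ones_before[x] - ones_before[left] > 0:
                    yield (left, step_top, x - 1, y)
            # Add a step for column x if (x, y) is a 0 and its top row is new.
            if top <= y and (not stack or stack[-1][1] > top):
                stack.append((left, top))
```

Every step is pushed at most once per entry and popped at most once, and the "too short" test is a constant-time prefix-count lookup, which is where the O(1) amortized cost per entry comes from in this sketch.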

Timing: the only work per entry that is not constant time is deleting steps from the staircase.

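Timing (amortized): a step can be deleted only after it has been created, so the number of steps deleted never exceeds the number of steps created. At most one step is created per entry (x,y), so the algorithm takes O(1) amortized time per entry.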
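A self-contained Python sketch that checks this accounting on small inputs. It maintains the staircase as a stack of hypothetical (left_x, top_y) steps, row by row, and counts the steps created and deleted. Since a step is created only at a 0 entry, the number created is also bounded by the number of 0s.

```python
from typing import List, Tuple


def count_step_operations(rows: List[List[int]]) -> Tuple[int, int]:
    """Return (steps created, steps deleted) over a row-major scan."""
    created = deleted = 0
    width = len(rows[0]) if rows else 0
    last_one = [-1] * width  # row of the last 1 seen in each column
    for y, row in enumerate(rows):
        stack: List[Tuple[int, int]] = []  # staircase steps (left_x, top_y)
        for x in range(width):
            if row[x] == 1:
                last_one[x] = y
            top = last_one[x] + 1
            left = x
            while stack and stack[-1][1] < top:
                left = stack.pop()[0]  # each deletion removes one created step
                deleted += 1
            if top <= y and (not stack or stack[-1][1] > top):
                stack.append((left, top))  # at most one creation per entry
                created += 1
    return created, deleted
```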

Number of Maximal Rectangles: the number of maximal 0-rectangles is $O((\#1\text{'s})^2)$ [Naamad, Hsu, Lee]. The running time of the algorithm is $O(\#0\text{'s})$.
