Computing Diameter in the Streaming and Sliding-Window Models J. Feigenbaum, S. Kannan, J. Zhang.

Introduction ● Two computational models: 1. Streaming model 2. Sliding-window model ● The problem: diameter of a point set P in $\mathbb{R}^2$. The diameter is the maximum pairwise distance between points in P.

More about Models The streaming model ● A data stream is a sequence of data elements $a_1, a_2, \ldots, a_m$. ● A streaming algorithm is an algorithm that computes some function over a data stream and has the following properties: 1. The input data are accessed in sequential order. 2. The order of the data elements in the stream is not controlled by the algorithm. ● The length of the stream, m, is huge. Only space-efficient algorithms (sublinear or even polylog(m) space) are considered.
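As a minimal illustration of the one-pass, small-space requirement (not taken from the slides), consider the 1-d case, where the exact diameter is the largest value seen minus the smallest. The class name `StreamingDiameter1D` and its methods are hypothetical.

```python
class StreamingDiameter1D:
    """One-pass exact diameter of a stream of real numbers using O(1) space."""

    def __init__(self):
        self.lo = None  # smallest value seen so far
        self.hi = None  # largest value seen so far

    def add(self, x):
        # Each element is inspected once, in arrival order, and then forgotten.
        if self.lo is None:
            self.lo = self.hi = x
        else:
            self.lo = min(self.lo, x)
            self.hi = max(self.hi, x)

    def diameter(self):
        # An empty stream has diameter 0 by convention (an assumption of this sketch).
        return 0 if self.lo is None else self.hi - self.lo
```

This shortcut does not carry over directly to points in the plane, which is where the approximation algorithms on the later slides come in.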

Dynamic Algorithm in Computational Geometry ● Dynamic means that the set of objects under consideration may change. There could be additions and deletions to the point set P. ● Maintain the current set of geometric objects in certain data structures. Efficient updating and query answering are emphasized. ● May use linear space, unlike the streaming and sliding-window models.

More about Models (Continued) The sliding-window model ● The input is still a stream of data elements. ● A data element arrives at each time instant and expires n time steps later, where n is the window size. ● The current window at any time instant is the set of data elements that have not yet expired.
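To make the window semantics concrete, here is a brute-force baseline, assuming planar points given as `(x, y)` tuples. It stores the entire window, so it uses space linear in n, which is exactly what the sliding-window model tries to avoid. The class name `ExactWindowDiameter` is hypothetical.

```python
import math
from collections import deque
from itertools import combinations


class ExactWindowDiameter:
    """Exact diameter of the last n points; stores the whole window."""

    def __init__(self, n):
        # A deque with maxlen=n drops the oldest point when a new one arrives,
        # which models "expires n time steps later".
        self.window = deque(maxlen=n)

    def add(self, point):
        self.window.append(point)

    def diameter(self):
        # Quadratic-time scan over all pairs currently in the window.
        return max(
            (math.dist(p, q) for p, q in combinations(self.window, 2)),
            default=0.0,
        )
```

For example, with n = 3 and the stream (0,0), (10,0), (1,0), (2,0), the point (0,0) has expired by the fourth arrival, so the window diameter is the distance from (10,0) to (1,0).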

Computing Diameter in the Streaming Model ● A well-known diameter-approximation algorithm is streaming in nature. ● Project the points onto lines. ● Requires an angle $\theta \le \sqrt{2\varepsilon}$ between the segment pq and some line, so that $|\pi(p)\pi(q)| \ge |pq|\cos\theta \ge (1 - \theta^2/2)\,|pq| \ge (1-\varepsilon)\,|pq|$. ● The algorithm goes through the input once. It needs storage for $O(1/\sqrt{\varepsilon})$ points. To process each point, it performs $O(1/\sqrt{\varepsilon})$ projections.
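The projection idea can be sketched as follows. This is an illustrative sketch, not the authors' code: the class name `ProjectionDiameter` is hypothetical, the choice of θ = √(2ε) follows from the inequality chain above, and the sketch stores the two extreme projection values per line rather than the points themselves.

```python
import math


class ProjectionDiameter:
    """Streaming (1 - eps)-approximation of planar diameter via projections."""

    def __init__(self, eps):
        # Largest angle we tolerate between a segment and its nearest line.
        theta = math.sqrt(2 * eps)
        # k lines through the origin, pi/k apart: every direction is within
        # pi/(2k) <= theta of one of them.
        self.k = max(1, math.ceil(math.pi / (2 * theta)))
        self.dirs = [
            (math.cos(i * math.pi / self.k), math.sin(i * math.pi / self.k))
            for i in range(self.k)
        ]
        self.lo = [math.inf] * self.k   # smallest projection seen on each line
        self.hi = [-math.inf] * self.k  # largest projection seen on each line

    def add(self, p):
        # One projection per line for each arriving point.
        for i, (ux, uy) in enumerate(self.dirs):
            t = p[0] * ux + p[1] * uy
            if t < self.lo[i]:
                self.lo[i] = t
            if t > self.hi[i]:
                self.hi[i] = t

    def estimate(self):
        if self.lo[0] == math.inf:  # no points seen yet
            return 0.0
        # The widest spread over all lines never exceeds the true diameter.
        return max(h - l for l, h in zip(self.lo, self.hi))
```

The estimate is a lower bound on the true diameter because projection never lengthens a segment, and it is at least (1 − ε) times the diameter because the diametral pair is nearly parallel to one of the lines.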

Diameter Approximation in the Streaming Model Theorem 1 There is a streaming ε-approximation algorithm for diameter that needs storage for O(1/ε) points and processes each point in O(log(1/ε)) time. ● Take the first point of the stream as the “center” and divide the space into sectors of angle θ = ε/2(1-ε). ● For each sector, keep the point furthest from the center in that sector.

Diameter Approximation in the Streaming Model Let H be the maximum distance between the center and any other point, and let $T_{i,j}$ be the minimal distance between the boundary arcs of sector i (bb') and sector j (aa'). Approximate the diameter with $\max\{H, \max_{i,j} T_{i,j}\}$.

Maintaining Diameter in the Sliding-Window Model ● Our space-efficient method maintains the diameter for sliding windows when the set of points P can be bounded in a box that is not too “large”. ● Let R be the maximum, over all windows, of the ratio of the diameter to the minimal non-zero distance between any two points in that window. ● That the bounding space is “not too large” means $R < 2^n$.

Maintaining Diameter in the Sliding-Window Model Theorem 2 There is an ε-approximation algorithm that maintains the diameter for a planar point set in the sliding-window model using Poly(1/ε, log n, log R) bits of space.

Remove Irrelevant Points ● Consider maintaining the diameter in 1-d. ● A point will never realize any diameter if it is spatially located between two newer points. ● Remove these points. The locations of the remaining points would look like: (where $a_1$ is newer than $a_2$, which is newer than $a_3$, ...) ● The newer points would be located “inside” and the older points would be located “outside”.
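The pruning rule can be sketched as a single scan from newest to oldest that tracks the extent of all newer points. The function name `prune` is hypothetical, and treating an older point that coincides with a newer one as removable is an assumption of this sketch.

```python
def prune(stream):
    """Drop 1-d points that lie between two newer points.

    stream: coordinates in arrival order (oldest first).
    Returns the surviving (arrival_index, coordinate) pairs, oldest first.
    """
    kept = []
    lo = hi = None  # extent of all points newer than the one being examined
    for t in range(len(stream) - 1, -1, -1):  # scan newest to oldest
        x = stream[t]
        if lo is not None and lo <= x <= hi:
            # Covered by newer points: it can never be a window endpoint.
            continue
        kept.append((t, x))
        lo = x if lo is None else min(lo, x)
        hi = x if hi is None else max(hi, x)
    kept.reverse()
    return kept
```

On the stream 5, 1, 9, 4, 6, 5.5 only the first point is removed, because it lies between the newer points 1 and 9. The survivors alternate outward with age, matching the "newer inside, older outside" picture.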

The “Rounding” Method ● Take the newest point as the “center” c, and “round” down other points. ● Divide the line into intervals with endpoints $t_i$ such that $|c\,t_i| = (1+\varepsilon)^i d$ for some distance d (to be specified later). ● Round all points in the interval $[t_i, t_{i+1})$ down to $t_i$. ● In what follows we call the set of points after “rounding” a cluster. If $2^i$ original points are grouped into a cluster, we say the cluster is at level i.

Number of Points in a Cluster ● If multiple points are rounded to the same location, we can discard the older ones and keep only the newest one. ● Each interval then holds only one point. Let D be the diameter; the number of points k in a cluster is bounded by: $k \le \log_{1+\varepsilon}(D/d) = \frac{\log(D/d)}{\log(1+\varepsilon)} \le \frac{2}{\varepsilon}\log(D/d)$
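A rough sketch of the rounding step in 1-d, under stated assumptions: the function name `round_cluster` is hypothetical, intervals are laid out on both sides of the center, and points closer than d to the center are snapped to the center (the slides do not say how such points are handled). Because this sketch counts both sides and the interval containing the farthest point, its count can exceed the slide's bound by a small constant factor.

```python
import math


def round_cluster(center, points, eps, d):
    """Round 1-d points toward `center` onto the grid (1 + eps)**i * d.

    points: coordinates in arrival order (oldest first).
    Returns the sorted rounded coordinates, one per occupied interval.
    """
    kept = {}
    for x in points:  # newer points overwrite older ones in the same interval
        r = abs(x - center)
        if r < d:
            # Assumption of this sketch: very close points collapse to the center.
            key, rounded = (0, -1), center
        else:
            side = 1 if x > center else -1
            i = math.floor(math.log(r / d, 1 + eps))  # interval index
            key, rounded = (side, i), center + side * d * (1 + eps) ** i
        kept[key] = rounded
    return sorted(kept.values())
```

Each surviving location is within a factor (1 + ε) of the distance of every original point it represents, up to floating-point error at interval boundaries, and the number of locations grows only logarithmically in D/d.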

When Window Starts Sliding ● Need to consider addition and deletion. ● Deletion is easy, because the oldest point must be one of the cluster's extreme points. ● Addition is complicated, because we may need to update the cluster center for each point that arrives. ● Our solution: keep multiple clusters.

Multiple Clusters in a Window ● We allow at most two clusters to be at each “level”. ● When the number of clusters at “level” i exceeds 2, merge the oldest two clusters to form a cluster at “level” i+1. ● The window can thus be divided into clusters.
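The level bookkeeping behaves like a binary counter that tolerates two clusters per level. The sketch below tracks only the level of each cluster, ignoring the points inside and window expiry; the function name `add_point` and the list representation are hypothetical.

```python
def add_point(clusters):
    """Insert a new level-0 cluster and cascade merges.

    clusters: list of cluster levels, newest first (modified in place).
    A cluster at level i stands for 2**i original points.
    """
    clusters.insert(0, 0)  # the new point forms its own level-0 cluster
    level = 0
    while clusters.count(level) > 2:
        # Clusters of equal level are adjacent, so the oldest two at this
        # level are the last two positions holding that value.
        idx = [k for k, l in enumerate(clusters) if l == level]
        a, b = idx[-2], idx[-1]
        clusters[a:b + 1] = [level + 1]  # merge them into one higher-level cluster
        level += 1
    return clusters
```

After m insertions the levels account for exactly m points, and there are at most two clusters at each of the roughly log m levels, which is where the 2 log n cluster count on the later slides comes from.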

Clusters in a Window

Merge Clusters ● Cluster $c_1$ + cluster $c_2$ = cluster $c_3$ ● Make $\mathrm{Ctr}_2$ the center of cluster $c_3$

Merge Clusters (Continued) ● Discard the points in $c_1$ that are located between the centers of $c_1$ and $c_2$. ● If a point p in $c_1$ satisfies $|p\,\mathrm{Ctr}_1| \le (1+\varepsilon)\,|\mathrm{Ctr}_1 \mathrm{Ctr}_2|$, discard it, too.

Merge Clusters (Continued) ● Round the points in $c_2$ and those remaining in $c_1$ after the previous two steps using the center $\mathrm{Ctr}_2$. ● The value for d is lower bounded by $\varepsilon \cdot |\mathrm{Ctr}_1 \mathrm{Ctr}_2|$. The number of points in a cluster is then bounded by: $\frac{2}{\varepsilon}\left(\log R + \log\frac{1}{\varepsilon}\right)$

The Algorithm in 1-d ● Update: when a new point arrives, 1. Check the age of the boundary points of the oldest cluster. If one of them has expired, remove it. 2. Make the newly arrived point a cluster of size 1. Go through the clusters and merge clusters whenever necessary according to the rules stated above. 3. While going through the clusters, update the boundary points of any cluster that changed. 4. Update the window boundary points if necessary. ● Query Answer: Report the distance between the window boundary points as the window diameter.

Space Requirement ● Let $\mathrm{diam}_p$ be a diameter realized by point p. Each time we do “rounding,” we introduce a displacement for p of at most $\varepsilon \cdot \mathrm{diam}_p$. Also, p can be “rounded” at most log n times. ● Choose the rounding parameter to be at most $\varepsilon/(2\log n)$ so that the accumulated error stays bounded by the target ε. ● There are at most 2 log n clusters, and in each cluster at most $O\left(\frac{1}{\varepsilon}\log n\,(\log R + \log\log n + \log\frac{1}{\varepsilon})\right)$ points. Keeping the age may require log n bits of space for each point. The total space required is: $O\left(\frac{1}{\varepsilon}\log^3 n\,(\log R + \log\log n + \log\frac{1}{\varepsilon})\right)$
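The arithmetic behind these bounds can be spelled out as follows. Here $\varepsilon'$ is a name introduced only for this derivation; it denotes the per-rounding parameter $\varepsilon/(2\log n)$, and the per-cluster bound is the one from the Merge Clusters slide with ε replaced by $\varepsilon'$.

```latex
\begin{align*}
\text{displacement of } p
  &\le (\log n)\cdot \varepsilon' \cdot \mathrm{diam}_p
   = \tfrac{\varepsilon}{2}\,\mathrm{diam}_p
  && \text{with } \varepsilon' = \tfrac{\varepsilon}{2\log n},\\
\text{points per cluster}
  &\le \tfrac{2}{\varepsilon'}\Bigl(\log R + \log\tfrac{1}{\varepsilon'}\Bigr)
   = \tfrac{4\log n}{\varepsilon}\Bigl(\log R + \log\log n + \log\tfrac{2}{\varepsilon}\Bigr),\\
\text{total bits}
  &\le \underbrace{2\log n}_{\text{clusters}}
   \cdot O\!\Bigl(\tfrac{1}{\varepsilon}\log n\,\bigl(\log R + \log\log n + \log\tfrac{1}{\varepsilon}\bigr)\Bigr)
   \cdot \underbrace{\log n}_{\text{age per point}}\\
  &= O\!\Bigl(\tfrac{1}{\varepsilon}\log^{3} n\,\bigl(\log R + \log\log n + \log\tfrac{1}{\varepsilon}\bigr)\Bigr).
\end{align*}
```

The first line is why the factor 2 log n appears in the denominator: each of the at most log n roundings contributes its own small displacement, and the sum must stay within the target error.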

Time Complexity ● Query answer time is O(1). ● Worst-case update time is $O\left(\frac{1}{\varepsilon}\log^2 n\,(\log R + \log\log n + \log\frac{1}{\varepsilon})\right)$ because we may have cascading merges. ● The amortized update time is O(log n).

Extend the Algorithm to 2-d ● We will have a set of lines $l_0, l_1, \ldots$ and project the points in the plane onto the lines. ● Guarantee that any pair of points will be projected to a line with angle φ such that $1 - \cos\varphi \le \varepsilon/2$. ● Use the diameter-maintenance algorithm in 1-d for each line. ● Everything will have a multiplicative overhead of $O(1/\sqrt{\varepsilon})$.

Lower Bound for Maintaining Exact Diameter Theorem 3 To maintain the exact diameter in the sliding-window model requires Ω(n) bits of space. Consider 2n points $\{a_1, a_2, \ldots, a_{2n}\}$ with the following properties: – $a_{n+1}, a_{n+2}, \ldots, a_{2n}$ are located at coordinate zero. – $|a_1 a_n| \ge |a_2 a_{n+1}| \ge |a_3 a_{n+2}| \ge \ldots \ge |a_{n-1} a_{2n-2}| = 1$ – The coordinates of the points $a_j$ for j = 1, 2, ..., n−2 have the form $n \cdot k$ for some k = 1, 2, ..., n.

A Family of Point Sequences ● We show below two sequences in the family: [Figure: two point sequences on a line, with points labeled $a_1, a_2, \ldots, a_{n-2}, a_{n-1}, a_n, a_{n+1}$]

Lower Bound for Maintaining Exact Diameter (Continued) ● There are at least different sequences of 2n points satisfying the above properties. ● Need Ω(n) bits of space to distinguish them. (Note that here $R \le n^2 \ll 2^n$.)