Stream-based Geometric Algorithms


Stream-based Geometric Algorithms Piotr Indyk MIT

Streaming Algorithms for Geometric Problems
Input: a stream S = p_1 … p_n of points in R^d
Goal: compute a certain geometric quantity and/or structure
Variations:
  Dynamic case: points can be deleted
  Sliding window: points disappear after some time t

Minimum Spanning Tree
The tree has representation size Ω(n)
We only estimate the cost of the MST

Minimum Weight Matching

Minimum Weight Bichromatic Matching

Facility Location
Goal: choose a set F of facilities to minimize the sum of the distances from each point to its nearest facility, plus the number of facilities times f (the cost of opening one facility)
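
The objective on this slide can be written out as a short function. This is only a sketch of the cost being minimized, not of the streaming algorithm; the function name and the sample points are made up for illustration, and Euclidean distance is assumed.

```python
import math

def facility_location_cost(points, facilities, f):
    """Sum of distances to the nearest facility, plus f per open facility."""
    if not facilities:
        raise ValueError("at least one facility is required")
    # Service cost: each point pays the distance to its closest facility.
    service = sum(min(math.dist(p, q) for q in facilities) for p in points)
    # Opening cost: every facility in F costs f.
    return service + f * len(facilities)

# Hypothetical input: two nearby points and one far away.
points = [(0, 0), (0, 2), (10, 0)]
one = facility_location_cost(points, [(0, 0)], f=3)
two = facility_location_cost(points, [(0, 0), (10, 0)], f=3)
print(one, two)
```

With these numbers, opening a second facility near the far point lowers the total cost, which is the trade-off that f controls.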

K-median
K is given
Goal: choose K medians to minimize the sum of the distances from each point to its nearest median
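
To make the K-median objective concrete, here is a brute-force sketch. It assumes the medians are chosen among the input points (the discrete variant, an assumption not stated on the slide) and tries every K-subset, so it is only usable on tiny inputs. The function names are illustrative.

```python
import math
from itertools import combinations

def k_median_cost(points, medians):
    """Sum over all points of the distance to the nearest median."""
    return sum(min(math.dist(p, m) for m in medians) for p in points)

def brute_force_k_median(points, k):
    """Try every k-subset of the input points as medians (exponential time)."""
    return min(combinations(points, k),
               key=lambda medians: k_median_cost(points, medians))

# Hypothetical input: two well-separated pairs of points on a line.
points = [(0, 0), (1, 0), (10, 0), (11, 0)]
best = brute_force_k_median(points, 2)
print(best, k_median_cost(points, best))
```

The best solution places one median in each pair; the streaming algorithm in the talk only approximates this cost.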

Known Results
Computing Lp norms of a stream (Graham’s talk)
Clustering of points in metric spaces
  Charikar et al. ’97, ’03; Guha et al. ’00: K-center and K-median, (K) space, no deletions
  Meyerson ’02: Facility location, (|F|) space, no deletions

More Known Results
Approximate diameter, convex hulls, etc.
  Indyk ’03: high dimensions
  Feigenbaum et al., Hershberger et al., Cormode et al. ’03: low dimensions

Our Results

Problem      Type   Delete   Space                     Appr.
MST          Cost   Yes      polylog(D,n)              log D
MWM
MWBM*
Fac. Loc.           No                                 log^2 D
K-median     Full            poly(K, log D, log n)

* follows Charikar’02; also Varadarajan’02 and Indyk-Thaper’02

Applications
MST, MWM: ?
MWBM: similarity of low-dimensional data sets
Fac. Loc.: “clusterability” of a data set
K-median: allocation of servers to clients (Muthu’03)
log D might not be so bad in practice (1.1 in Indyk-Thaper’03)

Approach
Impose square grids G_0 … G_k, with side lengths 2^0, 2^1, …, 2^k, shifted at random.
For each square cell c in G_i, let n_P(c) be the number of points from P in c.
The algorithms will maintain certain statistics over n_P(.), which will allow them to approximately solve the problems.
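
The grid construction can be sketched as follows. This sketch stores every count n_P(c) exactly, so it does not achieve the small space of the streaming algorithms, which keep only statistics over the counts. The class name, the single shift vector shared by all levels, and the `seed` parameter are assumptions made for illustration.

```python
import random
from collections import Counter

class ShiftedGrids:
    """Exact cell counts n_P(c) for grids G_0..G_k with side lengths 2^0..2^k."""

    def __init__(self, k, dim, seed=None):
        rng = random.Random(seed)
        self.k = k
        # One random shift shared by all levels, so the grids are nested
        # (an assumption of this sketch).
        self.shift = [rng.uniform(0, 2 ** k) for _ in range(dim)]
        self.counts = [Counter() for _ in range(k + 1)]

    def cell(self, point, level):
        """Integer coordinates of the cell of G_level containing the point."""
        side = 2 ** level
        return tuple(int((x + s) // side) for x, s in zip(point, self.shift))

    def update(self, point, delta):
        """Add delta (+1 for an insertion, -1 for a deletion) at every level."""
        for level in range(self.k + 1):
            c = self.cell(point, level)
            self.counts[level][c] += delta
            if self.counts[level][c] == 0:
                del self.counts[level][c]  # keep only non-empty cells

    def insert(self, point):
        self.update(point, +1)

    def delete(self, point):
        self.update(point, -1)

grids = ShiftedGrids(k=3, dim=2, seed=0)
grids.insert((1, 1))
grids.insert((5, 5))
grids.delete((1, 1))
print([sum(level.values()) for level in grids.counts])
```

Because an update only adds or subtracts one per level, deletions are handled the same way as insertions, which is what the dynamic case requires.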

Estimators
MST: ∑_i 2^i ∑_{c ∈ G_i} [n_P(c) > 0]
MWM: ∑_i 2^i ∑_{c ∈ G_i} [n_P(c) is odd]
MWBM: ∑_i 2^i ∑_{c ∈ G_i} |n_G(c) - n_B(c)|
Fac. Loc.: ∑_i 2^i ∑_{c ∈ G_i} min[n_P(c), T_i]
K-median: ∑_{c ∈ B_j} n_P(c) for B_1 … B_l sampled from the G_i’s with density 1/K
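
The first three estimators can be evaluated directly from exact cell counts, as in the sketch below. It computes the raw sums as written on the slide, with no normalizing constants, and it fixes the grid shift to zero so the result is reproducible, whereas the algorithm shifts the grids at random. The function names are illustrative, and the facility location and K-median estimators are omitted because the thresholds T_i and the sampling procedure are not specified here.

```python
from collections import Counter

def cell_counts(points, k, shift):
    """Return one Counter per grid G_0..G_k mapping cell -> number of points."""
    levels = []
    for i in range(k + 1):
        side = 2 ** i
        levels.append(Counter(
            tuple(int((x + s) // side) for x, s in zip(p, shift))
            for p in points
        ))
    return levels

def mst_estimate(counts):
    # sum_i 2^i * (number of non-empty cells in G_i)
    return sum(2 ** i * sum(1 for n in level.values() if n > 0)
               for i, level in enumerate(counts))

def mwm_estimate(counts):
    # sum_i 2^i * (number of cells in G_i holding an odd number of points)
    return sum(2 ** i * sum(1 for n in level.values() if n % 2 == 1)
               for i, level in enumerate(counts))

def mwbm_estimate(green, blue):
    # sum_i 2^i * sum over cells of |n_G(c) - n_B(c)|
    total = 0
    for i, (g, b) in enumerate(zip(green, blue)):
        cells = set(g) | set(b)
        total += 2 ** i * sum(abs(g[c] - b[c]) for c in cells)
    return total

# Hypothetical input: two points at distance 3, grids with sides 1, 2, 4.
shift = (0, 0)
counts = cell_counts([(0, 0), (3, 0)], k=2, shift=shift)
green = cell_counts([(0, 0)], k=2, shift=shift)
blue = cell_counts([(3, 0)], k=2, shift=shift)
print(mst_estimate(counts), mwm_estimate(counts), mwbm_estimate(green, blue))
```

In this example the two points fall into the same cell only at the coarsest level, so the finer levels contribute to all three sums and the coarsest level contributes only to the MST sum.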

Proofs
View the grids as a probabilistic embedding of P into a tree (HSTs)
Show how to solve the problem in HSTs
Show how to express the solution using just the n_P(c)’s
First application of this kind of embedding to streaming

Conclusions and Open Problems
Replace log D by O(1)
Other applications?