Facility Location in Dynamic Geometric Data Streams
Christiane Lammersen, Christian Sohler



Dynamic Geometric Data Streams
Streams of geometric data arise in
– Mobile networks
– Sensor networks
– …
Continuously changing data
– Mobile networks: position of nodes
– Sensor networks: measured data
Communication in the form of update operations
– An update consists of the ID of the node, the old value, and the new value
IITK Workshop on Algorithms for Processing Massive Data Sets (Christiane Lammersen)

Hierarchical Communication Systems
– The upper layer offers the lower layer a certain service
– Each node can be a server
– Cost for server ↔ access time

Dynamic Geometric Data Streams
– m insert and delete operations
– Points in a low-dimensional, discrete space {1, …, Δ}^d
– polylog(Δ, m) memory space, one pass
[Indyk ’04]

Dynamic Uniform Facility Location Problem (FLP)
– Point set P
– Facilities have uniform opening cost f
– Clients have uniform demand b
– Goal: maintain F ⊆ P so as to minimize the total cost (the objective formula is not preserved in this transcript)
– FLP is related to k-Median, but |F| can be Ω(|P|) ⇒ problem in streaming ⇒ approximation of the cost
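The slide's objective formula is missing from the transcript. As a rough sketch only, the following assumes the standard uniform facility location objective, namely f · |F| plus b times the sum of distances from each point to its nearest open facility. The function name `uniform_flp_cost`, the Euclidean metric, and the default b = 1 are illustrative assumptions, not taken from the talk.

```python
from math import dist

def uniform_flp_cost(points, facilities, f, b=1.0):
    """Assumed objective: f * |F| + b * sum of nearest-facility distances."""
    if not facilities:
        raise ValueError("at least one facility must be open")
    opening = f * len(facilities)
    # Each client connects to its nearest open facility (Euclidean distance assumed).
    connection = b * sum(min(dist(p, q) for q in facilities) for p in points)
    return opening + connection

points = [(1, 1), (1, 2), (9, 9)]
one_facility = uniform_flp_cost(points, [(1, 1)], f=8)
two_facilities = uniform_flp_cost(points, [(1, 1), (9, 9)], f=8)
```

In this toy instance, opening a second facility at the far point costs an extra f = 8 but saves more than that in connection cost, which is the trade-off the heavy-cell rule on the later slides is meant to capture.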

Related Work
P. Indyk: Algorithms for Dynamic Geometric Problems over Data Streams, STOC ’04
– O(log² Δ)-approximation for the cost of FLP
– Idea: nested square grids, open a facility in all heavy cells
G. Frahling and C. Sohler: Coresets in Dynamic Geometric Data Streams, STOC ’05
– Space partition based on heavy cells

Construction of Our Streaming Method
– Deterministic method: E_det(P) = Θ(OPT(P))
– Randomized method: E_rand(P) = Θ(E_det(P))
– Streaming method: E_stream(P) = Θ(E_rand(P))

Deterministic Method
– Impose log(Δ)+1 nested square grids
– In each grid, identify the heavy cells
– Partition the input space based on the heavy cells
– For each cell size, count the number of points within cells of that size ⇒ estimator for cost (formula not preserved in this transcript)
[Indyk ’04, Frahling and Sohler ’05]
Idea: Open one facility in each heavy cell in the space partition.

Nested Grids
– Impose log(Δ)+1 nested square grids
– Example: Δ = 16, shown level by level from level 4 down to level 0 (figures not preserved)
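A minimal sketch of the nested grids, assuming that Δ is a power of two and that a cell in level i has side length 2^i (so level 0 holds unit cells and the top level is a single cell covering the whole space). The helper names `num_levels` and `cell_of` are hypothetical.

```python
def num_levels(delta):
    """Number of nested grids, log2(delta) + 1, for delta a power of two."""
    return delta.bit_length()

def cell_of(point, level):
    """Grid cell (as an index tuple) containing a point of {1..delta}^d at a level.

    Assumption: cells in level i have side length 2**i.
    """
    side = 1 << level
    # Shift coordinates to 0-based, then integer-divide by the cell side.
    return tuple((x - 1) // side for x in point)

levels = num_levels(16)
top_cell = cell_of((16, 16), 4)
mid_cell = cell_of((5, 9), 2)
```

With Δ = 16 this gives five grids, matching levels 4 through 0 in the slide example.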


Space Partition
– In each grid, identify the heavy cells
– Partition the input space based on the heavy cells
– A cell in level i is heavy if it contains at least f / 2^i points
– Example: f = 8, Δ = 16, shown level by level from level 4 down to level 0 (figures not preserved)
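The heavy-cell test can be sketched offline as follows, again assuming cells of side 2^i in level i and the threshold of at least f / 2^i points stated on the Randomized Method slide. This only identifies heavy cells per level. How the talk derives the final space partition from them is not spelled out in the transcript, so that step is omitted. The function name `heavy_cells` is hypothetical.

```python
from collections import Counter

def heavy_cells(points, f, delta):
    """Map each level i to the set of cells holding at least f / 2**i points."""
    heavy = {}
    for level in range(delta.bit_length()):  # levels 0 .. log2(delta)
        side = 1 << level
        counts = Counter(tuple((x - 1) // side for x in p) for p in points)
        threshold = f / 2 ** level
        heavy[level] = {cell for cell, n in counts.items() if n >= threshold}
    return heavy

# Four clustered points plus one far-away point, with f = 8 and delta = 16.
pts = [(1, 1), (1, 2), (2, 1), (2, 2), (16, 16)]
heavy = heavy_cells(pts, f=8, delta=16)
```

Here no unit cell is heavy (8 points would be needed), while the 2 × 2 cell holding the cluster becomes heavy at level 1, where the threshold drops to 4.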


Cost Estimator
– For each cell size, count the number of points within cells of that size ⇒ estimator for cost (formula not preserved in this transcript)
– The example counts shown on the slides are 9 points and 7 points

Value of Cost Estimator is Ω(OPT(P))
– Contribution of a heavy cell C in level i is at most (bound not preserved in this transcript)
– Contribution of a light cell C in level i is at most (bound not preserved in this transcript)
– A heavy cell in level i contains Ω(f / 2^i) points.
– The space partition is balanced.
– The distance of a cell in level i to a heavy cell is O(2^i).

Value of Cost Estimator is O(OPT(P))
– Contribution of a distant cell C in level i is at least n(C) · 2^(i-1)
– OPT(P) ≥ f · |F_OPT|
– Estimated cost for a near cell C in level i is n(C) · 2^i = O(f)
– There is a constant number of near cells.
– Estimated cost for near cells is O(f · |F_OPT|)
(Figure: a cell in level i and a radius of 2^(i-1))


Randomized Method
Idea:
– A heavy cell in level i contains at least f / 2^i points
– Sample a point in level i with probability 2^i / f
Problem: coin flips & delete operations
Solution:
– Hash function h_i : {1, …, Δ}^d → {1, …, ⌈f / 2^i⌉}
– Sample set S_i = { p ∈ P | h_i(p) = 1 }
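The point of replacing coin flips with a hash function is that an insert and a later delete of the same point reach the same sampling decision. A small sketch of that idea follows. It uses SHA-256 as a stand-in for whatever hash family the authors use, rounds the number of buckets up, keeps the sample in an explicit set (a real streaming algorithm would not store it this way), and assumes each point is present at most once. The names `bucket` and `LevelSample` are hypothetical.

```python
import hashlib
import math

def bucket(point, level, num_buckets, seed=0):
    """Deterministic stand-in for h_i: maps a point to a bucket in 1..num_buckets."""
    key = f"{seed}|{level}|{tuple(point)}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets + 1

class LevelSample:
    """Sample set S_i = {p in P | h_i(p) = 1} under insertions and deletions."""

    def __init__(self, level, f, seed=0):
        self.level = level
        self.seed = seed
        # About f / 2**level buckets, so a point is kept with probability ~ 2**level / f.
        self.num_buckets = max(1, math.ceil(f / 2 ** level))
        self.points = set()

    def selected(self, point):
        return bucket(point, self.level, self.num_buckets, self.seed) == 1

    def insert(self, point):
        if self.selected(point):
            self.points.add(tuple(point))

    def delete(self, point):
        # Same hash, same decision: a deleted point leaves the sample if it was in it.
        if self.selected(point):
            self.points.discard(tuple(point))

sample = LevelSample(level=0, f=8)
grid = [(x, y) for x in range(1, 17) for y in range(1, 17)]
for p in grid:
    sample.insert(p)
for p in grid:
    sample.delete(p)
```

After every inserted point has been deleted again the sample is empty, which independent coin flips at insert and delete time could not guarantee.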

Randomized Method
for each level i do
    F(i) ← set of all marked cells C in level i such that
        a) no subcell of C is marked
        b) no smaller cell within a distance of less than 2^(i-1) is marked
return (the returned expression is not preserved in this transcript)
E_rand(P) = Θ(E_det(P))

Streaming Method
Idea: reduction to counting distinct elements
Implementation:
– For each level i, count distinct elements in
  DE_1(i) = {C | C is in level i and marked} ∪ {C | C is in level i and a) or b) fails}
  and
  DE_2(i) = {C | C is in level i and a) or b) fails}
– Output the difference as the cost for level i
(Figure labels: DE_1(i), DE_2(i), DE_1(i+1), DE_2(i+1))
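The reduction works because DE_2(i) is a subset of DE_1(i), so the difference of the two distinct counts equals the number of marked cells in level i for which both a) and b) hold. The sketch below checks this identity with exact Python sets. In the streaming setting these would presumably be approximate distinct-element counters that tolerate deletions, which is not shown here. The function name and the cell labels are illustrative.

```python
def level_count(marked, failing):
    """|DE_1| - |DE_2| for one level.

    marked:  cells in the level that are marked
    failing: cells in the level for which condition a) or b) fails
    """
    de1 = set(marked) | set(failing)
    de2 = set(failing)
    return len(de1) - len(de2)

marked = {"A", "B", "C"}
failing = {"B", "D"}
count = level_count(marked, failing)
# The same number, computed directly as marked cells that pass both conditions.
direct = len(marked - failing)
```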

Conclusion & Future Work
Streaming algorithm for dynamic FLP:
– Constant-factor approximation of the cost
– Update time: O(log(1/δ) · polylog(Δ))
– Space: O(log(1/δ) · polylog(Δ))
– Failure probability: δ
Future work:
– Approximation factor not exponential in d
– (1+ε)-approximation algorithm

Thank you for your attention!
Department of Computer Science, Technische Universität Dortmund, Otto-Hahn-Str, Dortmund, Germany