Facility Location in Dynamic Geometric Data Streams
Christiane Lammersen, Christian Sohler
Dynamic Geometric Data Streams
– Streams of geometric data arise in
  – Mobile networks
  – Sensor networks
  – …
– Continuously changing data
  – Mobile networks: positions of nodes
  – Sensor networks: measured data
– Communication in the form of update operations
  – An update consists of the ID of the node, the old value, and the new value
IITK Workshop on Algorithms for Processing Massive Data Sets (Christiane Lammersen)
Hierarchical Communication Systems
– The upper layer offers the lower layer a certain service
– Each node can be a server
– Cost for server ↔ access time
Dynamic Geometric Data Streams
– $m$ insert and delete operations
– Points in a low-dimensional, discrete space $\{1, \dots, \Delta\}^d$
– $\mathrm{polylog}(\Delta, m)$ memory space, one pass [Indyk ’04]
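The update operations from the earlier slide (ID of node, old value, new value) fit this insert/delete model if each update is read as a delete of the old position followed by an insert of the new one. The following is a minimal sketch of that reading; the name `apply_update` is hypothetical, and the explicit multiset is only for illustration, since a streaming algorithm cannot afford to store all points.

```python
from collections import Counter

def apply_update(multiset, old, new):
    """Treat one update (old value, new value) as a delete followed by an insert.

    `old` is None when a node first appears; `new` is None when it leaves.
    """
    if old is not None:
        multiset[old] -= 1          # delete operation
        if multiset[old] == 0:
            del multiset[old]
    if new is not None:
        multiset[new] += 1          # insert operation

points = Counter()
apply_update(points, None, (3, 5))    # a node appears at (3, 5)
apply_update(points, None, (7, 2))    # a second node appears
apply_update(points, (3, 5), (4, 5))  # the first node moves
print(sorted(points.items()))
```

Each call here corresponds to at most two of the $m$ stream operations.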
Dynamic Uniform FLP
– Point set $P$
– Facilities have uniform opening cost $f$
– Clients have uniform demand $b$
– Goal: maintain $F \subseteq P$ so as to minimize the total cost [formula not preserved]
– FLP is related to $k$-Median, but $|F|$ can be $\Omega(|P|)$
  ⇒ problem in streaming
  ⇒ approximation of the cost
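For orientation, the cost of a fixed facility set can be sketched as follows. This assumes the standard uniform facility location objective, $f \cdot |F|$ plus the sum of distances from each point to its nearest facility, with Euclidean distances and unit demand; the objective on the slide itself is not preserved, so both the formula and the name `flp_cost` are assumptions.

```python
import math

def flp_cost(points, facilities, f):
    """Assumed objective: opening cost f per facility plus, for every point,
    the Euclidean distance to its nearest open facility (unit demand)."""
    if not facilities:
        raise ValueError("at least one facility is required")
    connection = sum(min(math.dist(p, q) for q in facilities) for p in points)
    return f * len(facilities) + connection

P = [(1, 1), (1, 2), (9, 9)]
print(flp_cost(P, [(1, 1)], f=8))          # one facility, long connection for (9, 9)
print(flp_cost(P, [(1, 1), (9, 9)], f=8))  # two facilities, short connections
```

The example shows the trade-off the problem is about: opening a second facility costs another $f$ but removes the long connection.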
Related Work
– P. Indyk: Algorithms for Dynamic Geometric Problems over Data Streams, STOC ’04
  – $O(\log^2 \Delta)$-approximation for the cost of FLP
  – Idea: nested square grids, open a facility in every heavy cell
– G. Frahling and C. Sohler: Coresets in Dynamic Geometric Data Streams, STOC ’05
  – Space partition based on heavy cells
Construction of Our Streaming Method
– Deterministic method: $E_{det}(P) = \Theta(\mathrm{OPT}(P))$
– Randomized method: $E_{rand}(P) = \Theta(E_{det}(P))$
– Streaming method: $E_{stream}(P) = \Theta(E_{rand}(P))$
Deterministic Method
– Impose $\log(\Delta)+1$ nested square grids
– In each grid, identify the heavy cells
– Partition the input space based on the heavy cells
– For each cell size, count the number of points within cells of that size
  ⇒ estimator for the cost [formula not preserved]
– Idea: open one facility in each heavy cell of the space partition
[Indyk ’04, Frahling and Sohler ’05]
Nested Grids
– Impose $\log(\Delta)+1$ nested square grids
– Example: $\Delta = 16$, giving grid levels 4, 3, 2, 1, 0
[Figure: the five nested grids, shown one level per animation step]
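The nested grids can be sketched with simple bit shifts. The sketch assumes that a cell in level $i$ has side length $2^i$ (so level 0 cells are single grid positions and the top level is one cell covering everything), that $\Delta$ is a power of two, and that coordinates start at 1; `num_levels` and `cell_of` are hypothetical helper names.

```python
def num_levels(delta):
    """Number of nested grids, log2(delta) + 1, for delta a power of two."""
    return delta.bit_length()

def cell_of(point, level):
    """Index of the level-`level` cell containing `point`.

    Assumes coordinates in {1, ..., delta} and cells of side length 2**level.
    """
    return tuple((x - 1) >> level for x in point)

delta = 16
for level in range(num_levels(delta) - 1, -1, -1):
    print(level, cell_of((11, 6), level))
```

Because the grids are nested, the cell of a point in level $i+1$ is obtained from its cell in level $i$ by halving each index.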
Space Partition
– In each grid, identify the heavy cells
– Partition the input space based on the heavy cells
– A cell in level $i$ is heavy if it contains at least $f / 2^i$ points
– Example: $f = 8$, $\Delta = 16$
[Figure: heavy cells marked level by level, from level 4 down to level 0, followed by the resulting space partition]
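The heavy-cell test can be sketched offline as below. It assumes cells of side length $2^i$ in level $i$, coordinates starting at 1, and $\Delta$ a power of two; the function name `heavy_cells` is hypothetical, and the sketch only identifies heavy cells per level, it does not reproduce the space partition built from them.

```python
from collections import Counter

def heavy_cells(points, f, delta):
    """Return {level: set of heavy cells}; a level-i cell is heavy if it
    contains at least f / 2**i points."""
    result = {}
    for i in range(delta.bit_length()):       # levels 0 .. log2(delta)
        counts = Counter(tuple((x - 1) >> i for x in p) for p in points)
        # n >= f / 2**i, written without division
        result[i] = {cell for cell, n in counts.items() if n * 2**i >= f}
    return result

points = [(1, 1), (1, 2), (2, 1), (2, 2), (9, 9)]
for level, cells in sorted(heavy_cells(points, f=8, delta=16).items()):
    print(level, sorted(cells))
```

With $f = 8$, the four clustered points make their cell heavy from level 1 upward, while the isolated point only contributes to heavy cells in levels 3 and 4, where the threshold has dropped to 1 or below.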
Cost Estimator
– For each cell size, count the number of points within cells of that size
  ⇒ estimator for the cost [formula not preserved]
[Figure: point counts per cell size in the example partition; the counts shown are 9 points and 7 points]
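Since the estimator formula itself is not preserved, the following is only a guess at its shape: each point in a partition cell of level $i$ is charged $2^i$, which matches the per-cell term $n(C) \cdot 2^i$ used in the analysis slides. Constant factors and any additional terms are unknown, the name `estimate` is hypothetical, and the assignment of the counts 9 and 7 to levels 1 and 2 is made up for the example.

```python
def estimate(points_per_level):
    """Assumed estimator shape: sum over levels i of (points in cells of level i) * 2**i."""
    return sum(n * 2**i for i, n in points_per_level.items())

# Hypothetical assignment of the two counts from the figure to levels.
counts = {1: 9, 2: 7}
print(estimate(counts))  # 9*2 + 7*4 = 46
```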
Value of Cost Estimator is $\Omega(\mathrm{OPT}(P))$
– The contribution of a heavy cell $C$ in level $i$ is at most [formula not preserved]
– The contribution of a light cell $C$ in level $i$ is at most [formula not preserved]
– A heavy cell in level $i$ contains $\Omega(f / 2^i)$ points
– The space partition is balanced
– The distance of a cell in level $i$ to a heavy cell is $O(2^i)$
Value of Cost Estimator is $O(\mathrm{OPT}(P))$
– The contribution of a distant cell $C$ in level $i$ is at least $n(C) \cdot 2^{i-1}$
– $\mathrm{OPT}(P) \ge f \cdot |F_{OPT}|$
– The estimated cost for a near cell $C$ in level $i$ is $n(C) \cdot 2^i = O(f)$
– There is a constant number of near cells
– The estimated cost for near cells is $O(f \cdot |F_{OPT}|)$
[Figure: a cell in level $i$ and a ball of radius $2^{i-1}$]
Randomized Method
– Idea:
  – A heavy cell in level $i$ contains at least $f / 2^i$ points
  – Sample a point in level $i$ with probability $2^i / f$
– Problem: coin flips and delete operations
– Solution:
  – Hash function $h_i : \{1, \dots, \Delta\}^d \to \{1, \dots, f / 2^i\}$
  – Sample set $S_i = \{ p \in P \mid h_i(p) = 1 \}$
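The point of the hash function is that the sampling decision for a point is the same at insert time and at delete time, which independent coin flips cannot guarantee. A minimal sketch of that idea follows; SHA-256 is a stand-in chosen for illustration, not the hash family used by the authors, $f$ is assumed to be an integer, and the names `h`, `is_sampled`, and `update_sample` are hypothetical.

```python
import hashlib
from collections import Counter

def h(level, point, f, seed=0):
    """Stand-in for h_i: maps a point to {1, ..., f / 2**level} (at least one bucket)."""
    buckets = max(1, f >> level)
    digest = hashlib.sha256(f"{seed}:{level}:{point}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % buckets + 1

def is_sampled(level, point, f, seed=0):
    """A point belongs to S_i exactly when h_i(point) == 1."""
    return h(level, point, f, seed) == 1

def update_sample(sample, level, point, f, sign):
    """Apply an insert (sign=+1) or delete (sign=-1) to the sample set S_i."""
    if is_sampled(level, point, f):
        sample[point] += sign
        if sample[point] == 0:
            del sample[point]

f, level = 8, 1
sample = Counter()
pts = [(x, y) for x in range(1, 5) for y in range(1, 5)]
for p in pts:
    update_sample(sample, level, p, f, +1)
print(len(sample), "of", len(pts), "points sampled")
for p in pts:
    update_sample(sample, level, p, f, -1)
print(len(sample), "points left after deleting everything")
```

Deleting every inserted point leaves the sample empty, because each delete hits exactly the points that were sampled on insert.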
Randomized Method
for each level $i$ do
    $F(i)$ ← set of all marked cells $C$ in level $i$ such that
        a) no subcell of $C$ is marked
        b) no smaller cell within a distance of less than $2^{i-1}$ is marked
return [formula not preserved]
$E_{rand}(P) = \Theta(E_{det}(P))$
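The selection of $F(i)$ can be sketched offline as follows. Several details are assumptions: a cell is represented as a pair (level, index tuple) with side length $2^{\text{level}}$, "smaller cell" is read as a marked cell of a lower level, and the distance between two cells is taken as the Euclidean gap between their boxes. The helper names `box`, `distance`, `is_subcell`, and `select_cells` are hypothetical, and the set of marked cells is simply given as input.

```python
import math

def box(level, idx):
    """Per-axis half-open interval [lo, hi) covered by a cell."""
    side = 1 << level
    return [(k * side, (k + 1) * side) for k in idx]

def distance(c1, c2):
    """Assumed cell distance: Euclidean gap between the two boxes (0 if they touch)."""
    gaps = [max(0, lo2 - hi1, lo1 - hi2)
            for (lo1, hi1), (lo2, hi2) in zip(box(*c1), box(*c2))]
    return math.sqrt(sum(g * g for g in gaps))

def is_subcell(small, big):
    """True if `small` lies at a lower level and inside `big`."""
    (j, idx_j), (i, idx_i) = small, big
    return j < i and tuple(k >> (i - j) for k in idx_j) == idx_i

def select_cells(marked, level):
    """F(level): marked cells of this level for which a) and b) both hold."""
    smaller = [s for s in marked if s[0] < level]
    chosen = set()
    for c in marked:
        if c[0] != level:
            continue
        a_holds = not any(is_subcell(s, c) for s in smaller)
        b_holds = not any(distance(s, c) < 2 ** (level - 1) for s in smaller)
        if a_holds and b_holds:
            chosen.add(c)
    return chosen

marked = {(2, (0, 0)), (2, (3, 3)), (2, (2, 0)), (1, (1, 1)), (1, (3, 0))}
print(select_cells(marked, 2))
print(sorted(select_cells(marked, 1)))
```

In the example, the level-2 cell (0, 0) is rejected by a) because it contains a marked subcell, and the level-2 cell (2, 0) is rejected by b) because a marked level-1 cell touches it; only (3, 3) survives in level 2.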
Streaming Method
– Idea: reduction to counting distinct elements
– Implementation:
  – For each level $i$, count the distinct elements in
    $DE_1(i) = \{C \mid C \text{ is in level } i \text{ and marked}\} \cup \{C \mid C \text{ is in level } i \text{ and a) or b) fails}\}$
    and
    $DE_2(i) = \{C \mid C \text{ is in level } i \text{ and a) or b) fails}\}$
  – Output the difference as the cost for level $i$
[Figure: $DE_1(i)$, $DE_2(i)$, $DE_1(i+1)$, $DE_2(i+1)$]
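The reduction works because the difference of the two distinct counts is exactly the number of marked cells for which a) and b) both hold. The sketch below checks this identity with exact Python sets standing in for the distinct-element counters; a streaming implementation would replace them with small-space sketches. The name `level_count` and the cell labels are hypothetical.

```python
def level_count(marked, fails):
    """|DE_1| - |DE_2| with DE_1 = marked ∪ fails and DE_2 = fails."""
    de1 = set(marked) | set(fails)
    de2 = set(fails)
    return len(de1) - len(de2)

marked = {"A", "B", "C"}   # marked cells in level i
fails = {"B", "D"}         # cells in level i where a) or b) fails
print(level_count(marked, fails))   # number of marked cells not in `fails`
print(len(marked - fails))          # same value, computed directly
```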
Conclusion & Future Work
– Streaming algorithm for dynamic FLP:
  – Constant-factor approximation of the cost
  – Update time: $O(\log(1/\delta) \cdot \mathrm{polylog}(\Delta))$
  – Space: $O(\log(1/\delta) \cdot \mathrm{polylog}(\Delta))$
  – Failure probability: $\delta$
– Future work:
  – Approximation factor not exponential in $d$
  – $(1+\varepsilon)$-approximation algorithm
Thank you for your attention!
Department of Computer Science, Technische Universität Dortmund
Otto-Hahn-Str, Dortmund, Germany