Distributed Computation of the Mode
Fabian Kuhn, Thomas Locher (ETH Zurich, Switzerland), Stefan Schmid (TU Munich, Germany)
PODC 2008

General Trend in Information Technology
New applications and system paradigms: from centralized systems over networked systems (the Internet) to large-scale distributed systems.

Distributed Data
- Earlier: data stored on a central server
- Today: data distributed over a network (e.g., distributed databases, sensor networks)
- Typically: data is stored where it occurs
- Nevertheless: queries often need to access all of the data, or a large portion of it
- Methods for distributed aggregation are needed

Model
- Network given by a graph G = (V, E); nodes: network devices, edges: communication links
- Data stored at the nodes; for simplicity, each node has exactly one data item (value)
- A query is initiated at some node
- The result of the query is computed by sending around (small) messages

Simple Aggregation Functions
- Simple (algebraic, distributive) aggregation functions such as min, max, sum, avg: one convergecast on a spanning tree
- On a BFS tree: time complexity O(D), where D is the diameter
- k independent simple functions: time O(D + k) using pipelining (see the sketch below)
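To make the convergecast concrete, here is a minimal single-process sketch in Python; the tree, node names, and the sum aggregate are illustrative assumptions, and a real distributed version sends one message per tree edge instead of recursing, so a BFS tree finishes in O(D) time.

```python
# Minimal sketch of a convergecast for a distributive aggregate (sum).
# Hypothetical example data; in the real setting each node sends one
# message with its subtree's aggregate to its parent.

def convergecast(children, value, root):
    """children: node -> list of children; value: node -> int."""
    def aggregate(v):
        # Combine the node's own value with its children's aggregates.
        return value[v] + sum(aggregate(c) for c in children[v])
    return aggregate(root)

children = {"r": ["a", "b"], "a": ["c"], "b": [], "c": []}
value = {"r": 1, "a": 2, "b": 3, "c": 4}
assert convergecast(children, value, "r") == 10
```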

The Mode
- Mode = most frequent element
- Every node has an element from {1, …, K}
- k distinct elements e_1, …, e_k with frequencies m_1 ≥ m_2 ≥ … ≥ m_k (k and the m_i are not known to the algorithm)
- Goal: find the mode, i.e., the element occurring m_1 times
- Per message: 1 element plus O(log n + log K) additional bits

Mode: Simple Algorithm
- Send all elements to the root, aggregating frequencies along the way
- Using pipelining: time O(D + k); always send the smallest element first to avoid empty queues
- For almost uniform frequency distributions, this algorithm is optimal
- Goal: a fast algorithm when the frequency distribution is skewed (see the sketch below)
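A minimal sketch of the simple algorithm, simulated centrally (hypothetical tree and elements): per-subtree frequency tables are merged on the way to the root, and the pipelining order only affects the O(D + k) time bound, not correctness.

```python
from collections import Counter

def mode_simple(children, element, root):
    """Merge per-subtree frequency tables toward the root."""
    def aggregate(v):
        counts = Counter({element[v]: 1})
        for c in children[v]:
            counts.update(aggregate(c))  # aggregate frequencies upward
        return counts
    counts = aggregate(root)
    return max(counts, key=counts.get)  # root outputs the mode

children = {"r": ["a", "b"], "a": [], "b": []}
element = {"r": 7, "a": 7, "b": 3}
assert mode_simple(children, element, "r") == 7
```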

Mode: Basic Idea
- Assume nodes have access to common random hash functions h_1, h_2, …, where h_i: {1, …, K} → {-1, +1}
- Apply h_i to all elements: each element e_j, with all m_j of its occurrences, falls into bin h_i(e_j)
[Figure: elements e_1, …, e_5 with frequencies m_1, …, m_5 hashed into the two bins; here h_i(e_1) = h_i(e_4) = h_i(e_5) = -1 and h_i(e_2) = h_i(e_3) = +1]

Mode: Basic Idea
- Intuition: the bin containing the mode tends to be larger
- Introduce a counter c_i for each element e_i
- Go through the hash functions h_1, h_2, …: for function h_j, increment c_i by the number of elements in bin h_j(e_i)
- Intuition: the counter c_1 of the mode will be the largest after some time (see the sketch below)
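The following Python sketch simulates the counter scheme centrally, with fresh random ±1 hash functions standing in for the common hash functions h_1, h_2, …; the frequency table and the number of hash functions are illustrative assumptions.

```python
import random

def mode_candidate(freq, num_hashes, seed=0):
    """freq: element -> frequency m_i; returns element with largest counter."""
    rng = random.Random(seed)
    counters = dict.fromkeys(freq, 0)
    for _ in range(num_hashes):
        # One common random hash function h_j: elements -> {-1, +1}.
        h = {e: rng.choice((-1, 1)) for e in freq}
        # Bin size = number of elements (with multiplicity) in each bin.
        bin_size = {s: sum(m for e, m in freq.items() if h[e] == s)
                    for s in (-1, 1)}
        # Increment each counter by the size of its element's bin.
        for e in freq:
            counters[e] += bin_size[h[e]]
    return max(counters, key=counters.get)

freq = {1: 40, 2: 20, 3: 15, 4: 10}          # mode is element 1
print(mode_candidate(freq, num_hashes=200))  # prints 1 w.h.p.
```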

Compare Counters
- Compare the counters c_1 and c_2 of elements e_1 and e_2
- If h_j(e_1) = h_j(e_2), c_1 and c_2 are increased by the same amount
- Consider only those j for which h_j(e_1) ≠ h_j(e_2)
- Change in the difference c_1 - c_2 for such a j: it increases by (m_1 - m_2) + Σ_{i≥3} Z_i, where Z_i = ±m_i, the sign depending on whether e_i falls into the bin of e_1 or of e_2

Counter Difference
- Given independent Z_1, …, Z_n with Pr(Z_i = α_i) = Pr(Z_i = -α_i) = 1/2, a Chernoff/Hoeffding bound gives
  Pr(|Z_1 + … + Z_n| ≥ δ) ≤ 2·exp(-δ² / (2(α_1² + … + α_n²)))
- Let H be the set of hash functions with h_j(e_1) ≠ h_j(e_2), |H| = s

Counter Difference
- F_2 = Σ_i m_i² is called the 2nd frequency moment
- The same argument works for all other counters: if h_j(e_1) ≠ h_j(e_i) for s hash functions, the difference c_1 - c_i concentrates around s·(m_1 - m_i); moreover, h_j(e_1) ≠ h_j(e_i) holds for roughly 1/2 of all hash functions
- After considering O(F_2/(m_1 - m_2)² · log n) hash functions: c_1 is the largest counter w.h.p. (see the derivation below)
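Spelling out the calculation behind this bound, a sketch in the notation above using the Hoeffding form of the Chernoff bound from the previous slide:

```latex
% Over the s hash functions j with h_j(e_1) \neq h_j(e_2):
\[
  c_1 - c_2 = s\,(m_1 - m_2) + \sum_{j \in H}\sum_{i \ge 3} Z_{j,i},
  \qquad Z_{j,i} = \pm m_i \ \text{each with probability } 1/2 .
\]
% The noise term has \sum \alpha^2 \le s F_2, so by the bound above
\[
  \Pr\Bigl[\bigl|\textstyle\sum Z_{j,i}\bigr| \ge s\,(m_1 - m_2)\Bigr]
  \le 2\exp\!\Bigl(-\frac{s\,(m_1 - m_2)^2}{2 F_2}\Bigr),
\]
% which is polynomially small for s = O(F_2/(m_1 - m_2)^2 \cdot \log n);
% a union bound over all k counters then makes c_1 the largest w.h.p.
```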

Distributed Implementation
- Assume nodes know the hash functions
- Bin sizes for each hash function: time O(D) (simply a sum, computed by a convergecast)
- Counters are updated in time O(D) (the root broadcasts the bin sizes)
- Computations for different hash functions can be pipelined
- Algorithm with time complexity O(D + F_2/(m_1 - m_2)² · log n); this is only good if m_1 - m_2 is large (see the sketch below)
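A sketch of one such round for a single hash function (illustrative helper names and data): the root learns both bin sizes with one convergecast and then broadcasts them, so each round costs O(D), and rounds for different hash functions overlap via pipelining.

```python
def bin_sizes_round(children, element, root, h):
    """One convergecast: count elements hashing to +1 and to -1."""
    def subtree_counts(v):
        plus = 1 if h(element[v]) == 1 else 0
        minus = 1 - plus
        for c in children[v]:
            p, m = subtree_counts(c)
            plus += p
            minus += m
        return plus, minus
    # The root would now broadcast (plus, minus) down the tree so that
    # every node can update the counter of its own element.
    return subtree_counts(root)

children = {"r": ["a", "b"], "a": [], "b": []}
element = {"r": 7, "a": 7, "b": 3}
print(bin_sizes_round(children, element, "r",
                      lambda e: 1 if e > 5 else -1))  # (2, 1)
```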

Improvement
- Only apply the randomized algorithm until, w.h.p., c_1 > c_i for all i with m_1 ≥ 2m_i
- Time: O(D + F_2/m_1² · log n)
- Apply the simple deterministic algorithm to the remaining elements
- Number of remaining elements e_i (those with 2m_i > m_1): at most 4F_2/m_1², since each such element contributes more than m_1²/4 to F_2
- Time of the second phase: O(D + F_2/m_1²)

Improved Algorithm
- Many details are missing (in particular, the algorithm needs to know F_2 and m_1)
- This can be done (for F_2: use ideas from [Alon, Matias, Szegedy 1999])
- If nodes have access to common random hash functions, the mode can be computed in time O(D + F_2/m_1² · log n)

Random Hash Functions
- We still need a mechanism that provides random hash functions
- Selecting the functions in advance (hard-wired into the algorithm): the algorithm does not work for all input distributions
- Choosing a truly random hash function h: [K] → {-1, +1} requires sending O(K) bits, but we want messages of size O(log K + log n)

Quasi-Random Hash Functions
- Fix a set H of hash functions with |H| = O(poly(n, K)) such that H satisfies a set of uniformity conditions
- Choosing a random hash function from H then requires only O(log n + log K) bits (see the sketch below)
- Show that the algorithm still works if the hash functions come from a set H satisfying the uniformity conditions
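A toy sketch of the resulting mechanism; the SHA-256-based family below is a stand-in for the paper's uniformity-condition family, not the actual construction. The initiating node draws one short random index into H and broadcasts it; every node then evaluates the same function locally.

```python
import hashlib

def h_from_family(index: int, element: int) -> int:
    """Hash function number `index` from a fixed public family."""
    digest = hashlib.sha256(f"{index}:{element}".encode()).digest()
    return 1 if digest[0] & 1 else -1

# Only the O(log |H|) bits of `index` are communicated; the hash
# values are recomputed locally and agree at all nodes.
index = 42
print([h_from_family(index, e) for e in range(1, 6)])
```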

Quasi-Random Hash Functions
- It is possible to give a set of uniformity conditions under which the algorithm can still be proven to work (quite involved…)
- Using the probabilistic method: show that a set H of size O(poly(n, K)) satisfying the uniformity conditions exists

Distributed Computation of the Mode
Theorem: The mode can be computed in time O(D + F_2/m_1² · log n) by a distributed algorithm.
Theorem: The time needed to compute the mode by a distributed algorithm is at least Ω(D + F_5/(m_1^5 · log n)).
The lower bound is based on a generalization (by Alon et al.) of Razborov's communication complexity lower bound for set disjointness.

Related Work
- Paper by Charikar, Chen, and Farach-Colton: finds an element with frequency (1-ε)·m_1 in a streaming model, using a different method
- It turns out:
  - the basic techniques of Charikar et al. can be applied in the distributed case
  - our techniques can be applied in the streaming model
  - both techniques yield the same results in both settings

Conclusions
- Obvious open problem: close the gap between the upper and the lower bound
- We believe the upper bound is tight
- Proving that the upper bound is tight would probably also prove a conjecture in [Alon, Matias, Szegedy 1999] regarding the space complexity of computing frequency moments in streaming models

Questions?