Distributed Clustering for Robust Aggregation in Large Networks Ittay Eyal, Idit Keidar, Raphi Rom Technion, Israel
Aggregation in Sensor Networks – Applications Temperature sensors thrown in the woods. Seismic sensors. Grid computing load.
Aggregation in Sensor Networks – Applications Large networks with light nodes. Target is a function of all sensed data. Average, min, majority, etc.
Tree Aggregation and Gossip Tree aggregation: a hierarchical solution. Fast: O(height of tree). Disadvantages: limited to static topology; no failure robustness. D. Kempe, A. Dobra, and J. Gehrke. Gossip-based computation of aggregate information. In FOCS, S. Nath, P. B. Gibbons, S. Seshan, and Z. R. Anderson. Synopsis diffusion for robust aggregation in sensor networks. In SenSys,
Tree Aggregation and Gossip Gossip: Each node maintains a synopsis. Occasionally, each node contacts a neighbor and they improve their synopses. Indifferent to topology changes. Crash robust.
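The gossip step described above can be sketched as follows. This is a minimal illustration, not the protocol from the talk or from the cited papers: it assumes a fully connected network, uses a single number as each node's synopsis, and "improves" two synopses by replacing both with their average. The function name `gossip_average` and the sample readings are hypothetical.

```python
import random

def gossip_average(values, rounds, seed=0):
    """Pairwise-averaging gossip on a fully connected network (an assumption)."""
    rng = random.Random(seed)
    synopses = list(values)  # each node's synopsis starts as its own sample
    n = len(synopses)
    for _ in range(rounds):
        # A random node i contacts a random other node j.
        i, j = rng.sample(range(n), 2)
        avg = (synopses[i] + synopses[j]) / 2.0
        # Both nodes adopt the average; the network-wide sum is preserved.
        synopses[i] = synopses[j] = avg
    return synopses

readings = [25.0, 26.0, 25.0, 28.0, 27.0]  # hypothetical sensor samples
result = gossip_average(readings, rounds=200)
print(result)
```

Because each exchange preserves the sum of all synopses, every node drifts toward the global average without any node ever seeing all the data, and no fixed tree is needed.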
Outliers - Definition Samples deviating from the distribution of the bulk of the data.
Sources of Outliers Sensor malfunction: short circuit in a seismic sensor. Sensing error: an animal sitting on a temperature sensor. Software bugs: in grid computing, a machine reports negative CPU usage. Interesting info: DDoS (irregular load on some machines in a grid); fire outbreak (extremely high temperature in a certain area of the woods); intrusion (a truck driving by a seismic detector).
The Implications of Outliers A single erroneous sample can radically offset the data. Readings: 25°, 26°, 25°, 28°, 98°, 120°, 27°. The average (47°) doesn’t tell the whole story.
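A small sketch of the effect, using only the seven readings that survive on this slide. The mean of just these seven values need not equal the 47° quoted above, since the original figure may have contained readings that were lost in extraction (an assumption). The 50° cut-off is hypothetical and chosen only to split the two groups in this example.

```python
import statistics

# The seven readings recoverable from the slide, in degrees.
readings = [25, 26, 25, 28, 98, 120, 27]

mean_all = statistics.mean(readings)      # pulled upward by the two hot readings
median_all = statistics.median(readings)  # stays with the bulk of the data

# Hypothetical cut-off at 50 degrees, used only to separate the two groups here.
bulk = [r for r in readings if r < 50]
mean_bulk = statistics.mean(bulk)

print(f"mean of all readings:  {mean_all:.1f}")
print(f"median of all readings: {median_all}")
print(f"mean of the bulk only: {mean_bulk:.1f}")
```

A single summary number hides the fact that the data falls into two groups, which is what motivates describing it with several clusters instead.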
Outlier Detection Challenge A double bind. A centralized solution won’t cut it: bandwidth limitations, power limitations, storage limitations. Yet no single node in the system has enough information on its own. (Figure labels: regular data distribution; outliers.)
Aggregate Clusters Each cluster has its own mean and mass. A bounded number (k) of clusters is maintained. (Figure, here k = 2: original samples a, b, c, d; aggregation of a and b, with no compression; aggregation of a, b and c.)
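The bounded set of mean-and-mass clusters can be sketched in one dimension as follows. The slide does not say which clusters are merged when the bound is exceeded; merging the two clusters with the closest means is an assumption made here, and the names `merge` and `compress` are hypothetical.

```python
def merge(a, b):
    """Merge two (mean, mass) clusters into one with a mass-weighted mean."""
    mean_a, mass_a = a
    mean_b, mass_b = b
    mass = mass_a + mass_b
    return ((mean_a * mass_a + mean_b * mass_b) / mass, mass)

def compress(clusters, k):
    """Merge closest clusters until at most k remain (assumed merge policy)."""
    clusters = sorted(clusters)
    while len(clusters) > k:
        # In one dimension the closest pair is adjacent once sorted by mean.
        i = min(range(len(clusters) - 1),
                key=lambda idx: clusters[idx + 1][0] - clusters[idx][0])
        # The merged mean lies between the two old means, so order is kept.
        clusters[i:i + 2] = [merge(clusters[i], clusters[i + 1])]
    return clusters

# Three samples of mass 1 each, with k = 2 as on the slide.
samples = [(25.0, 1.0), (26.0, 1.0), (98.0, 1.0)]
two = compress(samples[:2], k=2)  # two samples fit in k = 2: no compression
three = compress(samples, k=2)    # the third sample forces one merge
print(two)
print(three)
```

With two samples nothing is lost; adding the third forces the two nearby samples into one cluster of mass 2 while the distant sample keeps a cluster of its own.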
But what does the mean mean? The variance must be taken into account. (Figure labels: New Sample; Mean A; Mean B; Gaussian A; Gaussian B.)
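To illustrate why distance to the mean alone can mislead, the sketch below compares a new sample against two one-dimensional Gaussians. All numbers are hypothetical: cluster A is narrow, cluster B is wide, and the sample lies closer to A's mean but is far more plausible under B.

```python
import math

def log_density(x, mean, std):
    """Log of the 1-D Gaussian density with the given mean and standard deviation."""
    return -math.log(std * math.sqrt(2 * math.pi)) - 0.5 * ((x - mean) / std) ** 2

# Hypothetical clusters as (mean, standard deviation).
clusters = {"A": (0.0, 0.5), "B": (10.0, 6.0)}
sample = 4.0

# Choice by distance to the mean only.
nearest_by_mean = min(clusters, key=lambda c: abs(sample - clusters[c][0]))
# Choice by Gaussian likelihood, which accounts for the variance.
most_likely = max(clusters, key=lambda c: log_density(sample, *clusters[c]))

print("nearest mean:", nearest_by_mean)
print("most likely: ", most_likely)
```

The sample is 8 standard deviations from A but only 1 from B, so the two criteria disagree.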
Gossip Aggregation of Gaussian Clusters Each cluster is described by: mass, mean, covariance matrix. The distribution is described as k clusters. Nodes gossip: unify synopses by merging close clusters. (Figure: a + b = merge.)
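The slide does not spell out how two Gaussian clusters are merged. One common choice, assumed here, is moment matching: the merged cluster gets the summed mass, the mass-weighted mean, and a covariance that combines each cluster's own covariance with the spread of its mean around the new mean. The function name `merge_gaussians` and the tuple layout are hypothetical.

```python
def merge_gaussians(a, b):
    """Moment-matching merge of two clusters given as (mass, mean, cov).

    mean is a list of length d; cov is a d x d nested list.
    """
    (wa, ma, ca), (wb, mb, cb) = a, b
    w = wa + wb
    d = len(ma)
    mean = [(wa * ma[i] + wb * mb[i]) / w for i in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(d):
            # Within-cluster covariance plus the offset of each old mean
            # from the merged mean.
            sa = ca[i][j] + (ma[i] - mean[i]) * (ma[j] - mean[j])
            sb = cb[i][j] + (mb[i] - mean[i]) * (mb[j] - mean[j])
            cov[i][j] = (wa * sa + wb * sb) / w
    return (w, mean, cov)

zero = [[0.0, 0.0], [0.0, 0.0]]
a = (1.0, [0.0, 0.0], zero)  # a single sample at the origin
b = (1.0, [2.0, 0.0], zero)  # a single sample two units along the first axis
merged = merge_gaussians(a, b)
print(merged)
```

Merging two single samples yields a cluster of mass 2 centered between them, with variance only along the axis on which they differ.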
It works where it matters (Figure regions: Not Interesting; Easy.)
Protocol is Crash Robust (Figure curves: Regular, 5% crash probability per round; Standard, no crashes; Outlier robust, with and without crashes.)
Describe Elaborate Data
Temperature sensors on a fence by the woods (Figure panels: Fire, No Fire; Fire, No Fire; Fire, No Fire.)
Summary Robust aggregation requires outlier detection (27°, 98°, 120°). We present outlier detection by Gaussian clustering, merging close clusters.
Summary – Our Protocol Outlier Detection (where it’s important). Crash Robustness. Elaborate Data.
Ittay Eyal, Idit Keidar, Raphael Rom. Distributed Clustering for Robust Aggregation in Large Networks, Technion, 2009 Thank you