1 Distributed Clustering for Robust Aggregation in Large Networks Ittay Eyal, Idit Keidar, Raphi Rom Technion, Israel

2 Aggregation in Sensor Networks – Applications
- Temperature sensors thrown in the woods
- Seismic sensors
- Grid computing load

3 Aggregation in Sensor Networks – Applications
- Large networks, light nodes, low bandwidth
- Fault-prone sensors and network
- Multi-dimensional data (location × temperature)
- Target is a function of all sensed data: average temperature, max location, majority…

4 What has been done?

5 Tree Aggregation
- Hierarchical solution
- Fast: O(height of tree)
(figure: an aggregation tree over the sensor values 1, 3, 9, 11, 2, 10, 5, 6)

6 Tree Aggregation
- Hierarchical solution
- Fast: O(height of tree)
- Limited to a static topology
- No failure robustness
(figure: the same aggregation tree, with partial aggregates)
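
The tree aggregation on these slides can be sketched for a sum aggregate; the `Node` class and the tree layout below are hypothetical, not the authors' code. Each node combines its children's partial aggregates with its own reading and forwards one value upward, so the root obtains the total in O(height) rounds; the weakness the slide notes is that a crashed node silently loses its entire subtree.

```python
class Node:
    """A sensor in a static aggregation tree (illustrative sketch)."""

    def __init__(self, value, children=()):
        self.value = value
        self.children = list(children)

    def aggregate(self):
        """Return the sum of this subtree: one message per edge,
        completing in O(height of tree) communication rounds."""
        return self.value + sum(c.aggregate() for c in self.children)

# The sensor values from the slide, arranged in an arbitrary tree.
root = Node(1, [Node(3, [Node(9), Node(11)]),
                Node(2, [Node(10), Node(5), Node(6)])])
total = root.aggregate()   # 1+3+9+11+2+10+5+6 = 47
```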

7 Gossip
Gossip: each node maintains a synopsis.
(figure: nodes holding synopses 11, 9, 3, 1)
D. Kempe, A. Dobra, and J. Gehrke. Gossip-based computation of aggregate information. In FOCS, 2003.
S. Nath, P. B. Gibbons, S. Seshan, and Z. R. Anderson. Synopsis diffusion for robust aggregation in sensor networks. In SenSys, 2004.

8 Gossip
- Each node maintains a synopsis
- Occasionally, each node contacts a neighbor and they improve their synopses
(figure: two pairs of nodes average their synopses: 11 and 3 both become 7; 9 and 1 both become 5)

9 Gossip
- Each node maintains a synopsis
- Occasionally, each node contacts a neighbor and they improve their synopses
- Indifferent to topology changes
- Crash robust
- No data error robustness
- Proven convergence
(figure: all synopses converge to the average, 6)
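
The averaging gossip on these slides can be sketched as a toy simulation; `gossip_average` and its parameters are illustrative, not the authors' code. Each pairwise exchange replaces both synopses with their mean, which preserves the global sum, so every estimate converges to the true average (6 for the slide values 11, 9, 3, 1).

```python
import random

def gossip_average(values, rounds=200, seed=0):
    """Push-pull averaging gossip: in each round every node contacts a
    random peer and both replace their estimates with the pairwise mean.
    The sum is invariant, so all estimates converge to the average."""
    rng = random.Random(seed)
    est = list(values)
    n = len(est)
    for _ in range(rounds):
        for i in range(n):
            j = rng.randrange(n)
            mean = (est[i] + est[j]) / 2.0
            est[i] = est[j] = mean
    return est

# The synopses from the slides: 11, 9, 3, 1 converge toward their average, 6.
estimates = gossip_average([11, 9, 3, 1])
```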

10 A closer look at the problem

11 The Implications of Irregular Data
A single erroneous sample can radically offset the data.
(figure: readings 27°, 25°, 26°, 25°, 28°, 27°, plus faulty readings 98° and 120°)
The average (47°) doesn't tell the whole story.
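
The slide's point is easy to check numerically: with the eight readings shown, two faulty sensors drag the average to 47°, far from every regular reading, while a robust statistic such as the median stays near the bulk of the data. (The median here is just an illustration of the skew; it is not the protocol's technique.)

```python
# The readings from the slide: six sensors around 26° plus two faulty ones.
readings = [27, 25, 26, 25, 28, 27, 98, 120]

average = sum(readings) / len(readings)   # 376 / 8 = 47.0
middle = sorted(readings)[3:5]            # middle pair of the 8 sorted values
median = sum(middle) / 2                  # (27 + 27) / 2 = 27.0
```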

12 Sources of Irregular Data
- Sensor malfunction: a short circuit in a seismic sensor
- Sensing error: an animal sitting on a temperature sensor
- Software bugs: in grid computing, a machine reports negative CPU usage
- Interesting info – DDoS: irregular load on some machines in a grid
- Interesting info – fire outbreak: extremely high temperature in a certain area of the woods
- Interesting info – intrusion: a truck driving by a seismic detector

13 It Would Help to Know the Data Distribution
The average is 47°, but the distribution is bimodal, with peaks at 26.3° and 109°.
(figure: readings 27°, 25°, 26°, 25°, 28°, 27°, 98°, 120°)

14 Existing Distribution Estimation Solutions
Estimate a range of distributions [1,2] or cluster the data according to values [3,4]:
- Fast aggregation [1,2]
- Tolerate crash failures and dynamic networks [1,2]
- High bandwidth [3,4], multi-epoch [2,3,4]
- One-dimensional data only [1,2]
- No data error robustness [1,2]
1. M. Haridasan and R. van Renesse. Gossip-based distribution estimation in peer-to-peer networks. In International Workshop on Peer-to-Peer Systems (IPTPS 08), February 2008.
2. J. Sacha, J. Napper, C. Stratan, and G. Pierre. Reliable distribution estimation in decentralised environments. Submitted for publication, 2009.
3. W. Kowalczyk and N. A. Vlassis. Newscast EM. In Neural Information Processing Systems, 2004.
4. N. A. Vlassis, Y. Sfakianakis, and W. Kowalczyk. Gossip-based greedy Gaussian mixture learning. In Panhellenic Conference on Informatics, 2005.

15 Our Solution
Estimate a range of distributions by clustering the data according to values:
- Fast aggregation
- Tolerates crash failures and dynamic networks
- Low bandwidth, single epoch
- Multi-dimensional data
- Data error robustness by outlier detection
Outliers: samples deviating from the distribution of the bulk of the data.

16 Outlier Detection Challenge
(figure: readings 27°, 25°, 26°, 25°, 28°, 27°, plus 98° and 120°)

17 Outlier Detection Challenge
A double bind: identifying the outliers ({98°, 120°}) requires knowing the regular data distribution (~26°), and estimating that distribution requires excluding the outliers. No one in the system has enough information.

18 Aggregating Data Into Clusters
- Each cluster has its own mean and mass
- A bounded number (k) of clusters is maintained; here k = 2
(figure: samples a, b, c, d at values 1, 3, 5, 10 are merged step by step into k = 2 clusters)
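
The merge step above can be sketched with (mass, mean) pairs; the `merge` helper below is an illustrative reconstruction, not the authors' code. Summing masses and taking the mass-weighted mean keeps both the total mass and the overall sum of samples unchanged, which is what lets clusters stand in for the raw data.

```python
def merge(c1, c2):
    """Merge two clusters, each a (mass, mean) pair, into one whose mass
    is the sum and whose mean is the mass-weighted average."""
    m1, mu1 = c1
    m2, mu2 = c2
    m = m1 + m2
    return (m, (m1 * mu1 + m2 * mu2) / m)

# Samples a=1, b=3, c=5, d=10 from the slide, each with unit mass.
a, b, c, d = (1, 1.0), (1, 3.0), (1, 5.0), (1, 10.0)
ab = merge(a, b)     # a and b collapse into a cluster of mass 2 at mean 2
abc = merge(ab, c)   # adding c gives a cluster of mass 3 at mean 3
```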

19 But What Does the Mean Mean?
(figure: a new sample falls between mean A and mean B, with Gaussians A and B drawn around them)
The variance must be taken into account.
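
The slide's point — that raw distance to a cluster mean is misleading — can be illustrated with the one-dimensional Mahalanobis distance (the z-score). The cluster parameters below are made up for the illustration: A is narrow, B is wide, and a sample that is closer to A in absolute terms is still better explained by B.

```python
import math

def z_score(x, mean, var):
    """Distance from the cluster mean measured in standard deviations
    (the 1-d Mahalanobis distance)."""
    return abs(x - mean) / math.sqrt(var)

# Hypothetical clusters: A is narrow, B is wide.
mean_a, var_a = 0.0, 1.0
mean_b, var_b = 10.0, 100.0

x = 4.0
# x is closer to mean A in absolute distance...
assert abs(x - mean_a) < abs(x - mean_b)
# ...but far more plausible under B once variance is considered.
assert z_score(x, mean_a, var_a) > z_score(x, mean_b, var_b)
```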

20 Gossip Aggregation of Gaussian Clusters
- The distribution is described as k clusters
- Each cluster is described by: mass, mean, and covariance matrix (variance for 1-d data)

21 Gossip Aggregation of Gaussian Clusters
- Keep half, send half
- Merge received clusters
(figure: node a sends half of its clusters' mass to node b, which merges them into its own)
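
A single gossip step in this scheme can be sketched as follows. This is an illustrative 1-d reconstruction under stated simplifications: clusters are (mass, mean) pairs, variance is omitted, the closest pair is chosen by distance between means, and all function names are hypothetical. The sender halves each cluster's mass and ships the halves; the receiver appends them and merges the closest pair until at most k clusters remain, so total mass in the system is preserved.

```python
def halve(clusters):
    """Split each (mass, mean) cluster in two equal halves:
    keep one half, send the other. Total mass is preserved."""
    kept = [(m / 2, mu) for m, mu in clusters]
    sent = [(m / 2, mu) for m, mu in clusters]
    return kept, sent

def merge_pair(c1, c2):
    """Mass-weighted merge of two (mass, mean) clusters."""
    m = c1[0] + c2[0]
    return (m, (c1[0] * c1[1] + c2[0] * c2[1]) / m)

def receive(mine, incoming, k):
    """Append the received clusters, then repeatedly merge the pair with
    the nearest means until at most k clusters remain."""
    clusters = mine + incoming
    while len(clusters) > k:
        i, j = min(
            ((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
            key=lambda p: abs(clusters[p[0]][1] - clusters[p[1]][1]),
        )
        merged = merge_pair(clusters[i], clusters[j])
        clusters = [c for t, c in enumerate(clusters)
                    if t not in (i, j)] + [merged]
    return clusters

# Node P holds clusters around 26° and 110°; node Q holds one around 25°.
p = [(4, 26.0), (2, 110.0)]
q = [(2, 25.0)]
p_kept, p_sent = halve(p)
q_new = receive(q, p_sent, k=2)
```

Note how the outlier-like cluster near 110° survives the merge: the two clusters of regular data (25° and 26°) are closest, so they combine, while the far-away cluster keeps its own mass and mean.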

22 Distributed Clustering for Robust Aggregation
Our solution:
- Aggregate a mixture of Gaussian clusters
- Merge when necessary (when exceeding k clusters)
- Recognize outliers: by the time we need to merge, we can estimate the distribution

23 Simulation Results
1. Data error robustness
2. Crash robustness
3. Elaborate multidimensional data

24 It Works Where It Matters
(figure: problem regimes, with regions labeled "Not Interesting" and "Easy")

25 It Works Where It Matters
(figure: error of the aggregate, with and without outlier detection)

26 Simulation Results
1. Data error robustness
2. Crash robustness
3. Elaborate multidimensional data

27 Protocol is Crash Robust
(figure: error vs. round for three runs: no outlier detection with 5% crash probability, no outlier detection with no crashes, and outlier detection)

28 Simulation Results
1. Data error robustness
2. Crash robustness
3. Elaborate multidimensional data

29 Describe Elaborate Data
(figure: temperature vs. distance, with separate "Fire" and "No Fire" clusters)

30 Theoretical Results (In Progress)
The algorithm converges: eventually all nodes have the same clusters, forever.
- Note: this holds even without atomic actions
- The invariant is preserved by both send and receive
It converges to the "right" output: if outliers are "far enough" from the other samples, then they are never mixed into non-outlier clusters:
- They are discovered
- They do not bias the good samples' aggregate (where it matters)

31 Summary
Robust aggregation requires outlier detection.
We present outlier detection by Gaussian clustering.
(figure: regular readings around 27° are merged into one cluster; 98° and 120° remain outliers)

32 Summary – Our Protocol
- Elaborate data
- Crash robustness
- Outlier detection (where it matters)

33 Protocol is Crash Robust
- Simulation round: each node performs one gossip step
- After each round, 5% crash probability
- No message loss or corruption

