Distributed Clustering for Robust Aggregation in Large Networks Ittay Eyal, Idit Keidar, Raphi Rom Technion, Israel.

Presentation transcript:

Aggregation in Sensor Networks – Applications: temperature sensors thrown in the woods; seismic sensors; grid computing load.

Aggregation in Sensor Networks – Applications: Large networks, light nodes, low bandwidth. Fault-prone sensors and network. Multi-dimensional data (location × temperature). The target is a function of all sensed data: average temperature, max location, majority…

What has been done?

Tree Aggregation: Hierarchical solution. Fast: O(height of tree). Drawbacks: limited to static topology; no failure robustness.

Gossip: Each node maintains a synopsis. Occasionally, each node contacts a neighbor and they improve their synopses. Indifferent to topology changes. Crash robust. Proven convergence. Drawback: no data error robustness.
References: D. Kempe, A. Dobra, and J. Gehrke. Gossip-based computation of aggregate information. In FOCS. S. Nath, P. B. Gibbons, S. Seshan, and Z. R. Anderson. Synopsis diffusion for robust aggregation in sensor networks. In SenSys.

A closer look at the problem

The Implications of Irregular Data: A single erroneous sample can radically offset the data. The average (47°) doesn’t tell the whole story. Samples: 27°, 25°, 26°, 25°, 28°, 98°, 120°, 27°.

Sources of Irregular Data:
Sensor malfunction: a short circuit in a seismic sensor.
Sensing error: an animal sitting on a temperature sensor.
Software bugs: in grid computing, a machine reports negative CPU usage.
Interesting info, DDoS: irregular load on some machines in a grid.
Interesting info, fire outbreak: extremely high temperature in a certain area of the woods.
Interesting info, intrusion: a truck driving by a seismic detector.

It Would Help to Know The Data Distribution: Samples: 27°, 25°, 26°, 25°, 28°, 98°, 120°, 27°. The average is 47°. Bimodal distribution with peaks at 26.3° and 109°.
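The numbers on this slide can be checked with a few lines of arithmetic. The sketch below uses the eight samples from the slide; the split threshold of 60 is a hypothetical value chosen only to separate the two visible groups and is not part of the presentation.

```python
from statistics import mean

# The eight temperature samples shown on the slide.
samples = [27, 25, 26, 25, 28, 98, 120, 27]

# Hypothetical threshold, picked by eye to separate the two groups.
THRESHOLD = 60

low_group = [s for s in samples if s < THRESHOLD]
high_group = [s for s in samples if s >= THRESHOLD]

overall_average = mean(samples)   # 376 / 8
low_peak = mean(low_group)        # 158 / 6
high_peak = mean(high_group)      # 218 / 2

print(overall_average, round(low_peak, 1), high_peak)
```

The single average sits between the two groups and matches neither, which is the point the slide makes.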

Existing Distribution Estimation Solutions: Estimate a range of distributions [1,2] or cluster data according to values [3,4]. Fast aggregation [1,2]. Tolerate crash failures and dynamic networks [1,2]. Drawbacks: high bandwidth [3,4], multi-epoch [2,3,4], or one-dimensional data only [1,2]; no data error robustness [1,2].
[1] M. Haridasan and R. van Renesse. Gossip-based distribution estimation in peer-to-peer networks. In International Workshop on Peer-to-Peer Systems (IPTPS 08), February.
[2] J. Sacha, J. Napper, C. Stratan, and G. Pierre. Reliable distribution estimation in decentralised environments. Submitted for publication.
[3] W. Kowalczyk and N. A. Vlassis. Newscast EM. In Neural Information Processing Systems.
[4] N. A. Vlassis, Y. Sfakianakis, and W. Kowalczyk. Gossip-based greedy Gaussian mixture learning. In Panhellenic Conference on Informatics.

Our Solution: Estimate a range of distributions by data clustering according to values. Fast aggregation. Tolerate crash failures and dynamic networks. Low bandwidth, single epoch. Multi-dimensional data. Data error robustness by outlier detection. Outliers: samples deviating from the distribution of the bulk of the data.

Outlier Detection Challenge: Samples: 27°, 25°, 26°, 25°, 28°, 98°, 120°, 27°. A double bind between the regular data distribution (~26°) and the outliers {98°, 120°}: no one in the system has enough information.

Aggregating Data Into Clusters: Each cluster has its own mean and mass. A bounded number (k) of clusters is maintained; here k = 2. [Figure: original samples a, b, c, d; clustering a and b; clustering a, b and c; clustering all.]
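A minimal sketch of the idea on this slide, for one-dimensional data: each cluster is a (mean, mass) pair, and when more than k clusters exist, two are merged into one whose mean is the mass-weighted average. The choice of which pair to merge (here, the two with the closest means) and the function names are assumptions made for illustration; the slide does not specify them.

```python
from itertools import combinations

def merge(c1, c2):
    """Merge two (mean, mass) clusters into their mass-weighted combination."""
    (mean1, mass1), (mean2, mass2) = c1, c2
    total = mass1 + mass2
    return ((mean1 * mass1 + mean2 * mass2) / total, total)

def bound_clusters(clusters, k):
    """Merge clusters until at most k remain (assumed policy: closest means first)."""
    clusters = list(clusters)
    while len(clusters) > k:
        # Pick the pair of clusters whose means are closest.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: abs(clusters[pair[0]][0] - clusters[pair[1]][0]),
        )
        merged = merge(clusters[i], clusters[j])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters

# Four original samples, each a cluster of mass 1, reduced to k = 2 clusters.
original = [(25.0, 1), (26.0, 1), (98.0, 1), (120.0, 1)]
result = bound_clusters(original, k=2)
print(sorted(result))
```

Total mass is preserved by every merge, so the resulting clusters still account for all four samples.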

But What Does The Mean Mean? The variance must be taken into account. [Figure: a new sample shown with Mean A and Mean B, and with Gaussian A and Gaussian B.]
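The point of this slide can be illustrated with a small sketch. It compares assigning a new sample to the cluster with the nearest mean against assigning it by distance measured in standard deviations. The cluster parameters and the sample value are invented for illustration, and the standardized distance is a simplification of a full Gaussian likelihood comparison.

```python
def nearest_by_mean(x, clusters):
    """Pick the cluster whose mean is closest to x, ignoring variance."""
    return min(clusters, key=lambda name: abs(x - clusters[name][0]))

def nearest_by_std_distance(x, clusters):
    """Pick the cluster whose mean is closest to x in units of its standard deviation."""
    return min(
        clusters,
        key=lambda name: abs(x - clusters[name][0]) / clusters[name][1],
    )

# Hypothetical clusters as (mean, standard deviation): A is narrow, B is wide.
clusters = {"A": (0.0, 1.0), "B": (10.0, 10.0)}
sample = 4.0

choice_ignoring_variance = nearest_by_mean(sample, clusters)
choice_with_variance = nearest_by_std_distance(sample, clusters)
print(choice_ignoring_variance, choice_with_variance)
```

The sample is 4 units from A's mean and 6 from B's, but it lies 4 standard deviations from A and only 0.6 from B, so the two rules disagree.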

Gossip Aggregation of Gaussian Clusters: The distribution is described as k clusters. Each cluster is described by its mass, mean, and covariance matrix (variance for 1-d data).
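One common way to merge two such clusters is moment matching, which keeps the total mass, the overall mean, and the overall variance of the combined data. The sketch below does this for 1-d data; the class and function names are hypothetical, and the slide does not confirm that this exact merge rule is the one used in the protocol.

```python
from dataclasses import dataclass

@dataclass
class Cluster:
    mass: float
    mean: float
    variance: float  # 1-d case; a covariance matrix would replace this in general

def merge(a: Cluster, b: Cluster) -> Cluster:
    """Moment-matching merge: preserves total mass, mean, and variance."""
    mass = a.mass + b.mass
    mean = (a.mass * a.mean + b.mass * b.mean) / mass
    # Each cluster contributes its own spread plus the shift of its mean.
    variance = (
        a.mass * (a.variance + (a.mean - mean) ** 2)
        + b.mass * (b.variance + (b.mean - mean) ** 2)
    ) / mass
    return Cluster(mass, mean, variance)

# Two single-sample clusters (zero variance) at 0 and 2.
merged = merge(Cluster(1.0, 0.0, 0.0), Cluster(1.0, 2.0, 0.0))
print(merged)
```

The merged cluster has mean 1 and variance 1, which is the population variance of the two points 0 and 2.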

Gossip Aggregation of Gaussian Clusters: Keep half, send half; the receiver merges. [Figure: nodes a and b.]
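A rough sketch of the "keep half, send half" step, with clusters simplified to (mean, mass) pairs. The function names are hypothetical, and the receiver here only collects the incoming clusters; reducing them back to k clusters would be a separate merge step.

```python
def split_in_half(clusters):
    """Halve the mass of every cluster: one copy is kept, one is sent."""
    kept = [(mean, mass / 2) for mean, mass in clusters]
    sent = [(mean, mass / 2) for mean, mass in clusters]
    return kept, sent

def gossip_step(sender, receiver):
    """Sender keeps half of its mass and sends the other half to the receiver."""
    kept, sent = split_in_half(sender)
    return kept, receiver + sent

def total_mass(*nodes):
    return sum(mass for node in nodes for _, mass in node)

node_a = [(25.0, 1.0)]
node_b = [(98.0, 1.0)]
mass_before = total_mass(node_a, node_b)

node_a, node_b = gossip_step(node_a, node_b)
mass_after = total_mass(node_a, node_b)
print(node_a, node_b, mass_before, mass_after)
```

Because mass is only moved and never created or lost, the system-wide total stays constant across steps.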

Distributed Clustering for Robust Aggregation: Our solution: aggregate a mixture of Gaussian clusters; merge when necessary (exceeding k); recognize outliers. By the time we need to merge, we can estimate the distribution.

Simulation Results: 1. Data error robustness 2. Crash robustness 3. Elaborate multidimensional data

It Works Where It Matters. [Figure: regions labeled Not Interesting and Easy.]

It Works Where It Matters. [Plot: error, with and without outlier detection.]

Protocol is Crash Robust. [Plot: error vs. round; series: no outlier detection with 5% crash probability; no outlier detection with no crashes; outlier detection.]

Describe Elaborate Data. [Plot: axes Distance and Temperature; clusters labeled Fire and No Fire.]

Theoretical Results (In Progress): The algorithm converges: eventually all nodes have the same clusters forever. Note: this holds even without atomic actions; the invariant is preserved by both send and receive. … to the “right” output: if outliers are “far enough” from other samples, then they are never mixed into non-outlier clusters. They are discovered, and they do not bias the good samples’ aggregate (where it matters).

Summary: Robust aggregation requires outlier detection. We present outlier detection by Gaussian clustering. [Figure: samples 27°, 98°, 120°; label: Merge.]

Summary – Our Protocol: elaborate data; crash robustness; outlier detection (where it matters).

Protocol is Crash Robust: In each simulation round, each node performs one gossip step. After each round, nodes crash with 5% probability. No message loss or corruption.