Continuous Fragmented Skylines over Distributed Streams Odysseas Papapetrou and Minos Garofalakis SoftNet laboratory, Technical University of Crete.


New requirements for skylines
- Distributed and P2P algorithms, tracking of skylines, etc.
- Continuous monitoring of functional skylines with data fragmentation
  - Volatile data: sensor networks, network monitoring, financial streams
  - Skyline tracking is essential
  - Data points are fragmented over the network: no single node knows each point's coordinates
  - Coordinates of each point are computed by aggregation
  - Skyline dimensions are computed through (possibly) non-linear functions over the aggregate data

Example
- Weather sensors spread over the US
- Skyline of states with the most extreme weather situations
  - Lowest temperature, highest humidity
  - Lowest temperature, lowest dew point (dew point = f(temperature, humidity))
- Average values over all sensors in each state
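The dew-point dimension shows why the function has to be applied to the aggregate rather than to each local reading. The following is a minimal sketch, assuming the Magnus approximation for dew point (the slides do not say which formula f is) and hypothetical per-sensor readings; `dew_point` and `state_average` are illustrative names.

```python
import math

# Assumption: Magnus approximation constants (temperature in Celsius,
# relative humidity in percent). The slides only state dew point = f(T, RH).
A, B = 17.27, 237.7

def dew_point(temp_c, rel_humidity):
    """Approximate dew point in Celsius from temperature and relative humidity."""
    gamma = math.log(rel_humidity / 100.0) + A * temp_c / (B + temp_c)
    return B * gamma / (A - gamma)

def state_average(readings):
    """Average (temperature, humidity) over all sensors of one state."""
    n = len(readings)
    return (sum(t for t, _ in readings) / n,
            sum(h for _, h in readings) / n)

# Hypothetical readings (temperature, humidity) from three sensors in one state.
readings = [(-5.0, 80.0), (2.0, 60.0), (10.0, 30.0)]
avg_temp, avg_hum = state_average(readings)

# Skyline coordinate: f applied to the aggregated values ...
dew_of_avg = dew_point(avg_temp, avg_hum)
# ... which generally differs from the aggregate of f over the local readings.
avg_of_dew = sum(dew_point(t, h) for t, h in readings) / len(readings)
```

Because f is non-linear, a site cannot tell from its own readings alone how the state's dew-point coordinate moves, which is the difficulty the rest of the talk addresses.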

Challenges
- Distributed data
  - Data points are fragmented, so distributed skyline techniques cannot be applied
- Non-linear functions
  - The direction of a local update is not the same as the direction of the change in the skyline space
  - Impossible to filter out local updates
- Network cost
  - Prohibitive for voluminous streams
  - Financial streams: stock ticks (80 million updates per second)
  - Network packet monitoring (up to 100 Gbps)
  - Sensors (arbitrary frequency)

Our Contribution
- First work to address continuous fragmented functional skyline monitoring
- Decompose skyline monitoring into a set of threshold-crossing queries
  - Monitor using the Geometric Method
  - Minimize the number of queries
- Novel adaptive combination of streaming/geometric scheme
  - Stochastic model
  - Observes the sites' behavior
  - Switches to the most efficient monitoring scheme

Geometry to the rescue
- The geometric method [SIGMOD06, TODS07]
  - Distributed monitoring of threshold-crossing queries with fragmented data
  - Detect when $f(\bar{x}) > T$, where $\bar{x}$ is the aggregate value, for arbitrary $f$
- Key idea: cannot monitor the range, so monitor the domain
  - Any convex aggregate is within the balls with center $e + \Delta x_i / 2$ and radius $\|\Delta x_i\| / 2$
  - Check whether $f(y)$ is on the same side of $T$ for all $y$ in all balls
- [Figure labels: last known average ($e$); drift of x at node i ($\Delta x_i$); current average of x (unknown)]
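The local test each node runs can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the monitored function is the Euclidean norm, because for that function the minimum and maximum over a ball have a closed form. The names `local_ball`, `norm_range_over_ball`, and `ball_is_safe` are hypothetical.

```python
import math

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def local_ball(estimate, drift):
    """Ball whose diameter runs from the last known average to estimate + drift."""
    center = [e + d / 2.0 for e, d in zip(estimate, drift)]
    return center, norm(drift) / 2.0

def norm_range_over_ball(center, radius):
    """Exact min and max of the Euclidean norm over a ball (assumed f)."""
    c = norm(center)
    return max(0.0, c - radius), c + radius

def ball_is_safe(estimate, drift, threshold):
    """True if f stays on one side of the threshold everywhere in the ball."""
    lo, hi = norm_range_over_ball(*local_ball(estimate, drift))
    return hi <= threshold or lo > threshold

# Hypothetical values: last known average (3, 4), threshold 10.
estimate, threshold = (3.0, 4.0), 10.0
small_drift_ok = ball_is_safe(estimate, (1.0, 0.0), threshold)
large_drift_ok = ball_is_safe(estimate, (12.0, 0.0), threshold)
```

A node stays silent while its ball is safe. Only when the ball straddles the threshold does it need to contact the coordinator, which is where the communication savings come from.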

Monitoring of fragmented skylines
- Decompose skyline monitoring into threshold queries
- PIVOT: check the relative positioning of each object to fixed pivot points
  - Pivot points are defined in range space
- DIRECT: check the relative positioning of each pair of objects in range space
- [Figure: f(.) maps average values (e.g., avg #packets, tr. vol. per IP address) to range space; panels: PIVOT, DIRECT]

The PIVOT method
- Check the relative positioning of each object to fixed pivot points
- Pivot points: midpoints between two objects in f() space
- Geometric method to determine threshold crossings
- Example: function vector $f: \mathbb{R}^2 \to \mathbb{R}^2$
- [Figure: f(.) applied to average values, e.g., avg #packets, tr. vol. per IP address]
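One way to read the pivot construction is sketched below. It is a minimal illustration under stated assumptions: each pair of objects gets a pivot at the midpoint of their f() values, and each object gets one threshold query per dimension asking it to stay on its side of the pivot. The function names, the query tuple layout, and the handling of ties are hypothetical and not taken from the slides.

```python
def midpoint(a, b):
    """Pivot point: midpoint of two objects' positions in f() space."""
    return tuple((x + y) / 2.0 for x, y in zip(a, b))

def pivot_queries(name_a, fa, name_b, fb):
    """Per object and dimension, a query of the form (object, dim, threshold, side)."""
    pivot = midpoint(fa, fb)
    queries = []
    for dim, (xa, xb, p) in enumerate(zip(fa, fb, pivot)):
        if xa == xb:
            continue  # assumption: a tie yields no separating threshold
        queries.append((name_a, dim, p, "below" if xa < p else "above"))
        queries.append((name_b, dim, p, "below" if xb < p else "above"))
    return pivot, queries

def violated(query, f_value):
    """True if the object's current f() value has reached or crossed the pivot."""
    _, dim, threshold, side = query
    if side == "below":
        return f_value[dim] >= threshold
    return f_value[dim] <= threshold

# Hypothetical f() positions of two objects.
pivot, queries = pivot_queries("o1", (1.0, 4.0), "o2", (3.0, 2.0))
```

As long as no query is violated, the two objects keep their relative order in every dimension, so their dominance relationship cannot have changed.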


The PIVOT method
- Handling of threshold crossings
  - Synchronization: collect updated statistics for the violating object
  - Partial: updates at some nodes cancel out, so the partial average does not cause a threshold crossing
  - Full: recompute the skyline and update the threshold queries
- Full algorithm
  - Initialization: collect statistics and compute the initial skyline
  - Extract threshold queries and broadcast them to the nodes
  - A threshold crossing initiates the synchronization process

The DIRECT method
- Check the relative positioning of each pair of objects
  - No fixed pivot points, so possibly more slack for movement
- Threshold queries constructed on pairs of objects
  - $g(o_1|o_2) = f(o_1) - f(o_2)$; the dimensions of the function double
  - Threshold crossing when the sign of $g(o_1|o_2)[.]$ changes
- Example with 1-dim. objects: g(.)
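The pairwise test can be sketched in a few lines. This is a minimal illustration with hypothetical f() values, and `crossed_dimensions` is an illustrative helper name: it computes g as the component-wise difference and reports the dimensions whose sign changed.

```python
def g(f_o1, f_o2):
    """Pairwise function: component-wise difference of the two objects in f() space."""
    return tuple(a - b for a, b in zip(f_o1, f_o2))

def sign(x):
    return (x > 0) - (x < 0)

def crossed_dimensions(g_before, g_now):
    """Dimensions in which the sign of g changed, i.e. the pair swapped order."""
    return [d for d, (old, new) in enumerate(zip(g_before, g_now))
            if sign(old) != sign(new)]

# Hypothetical positions: o1 starts below o2 in dimension 0 and above it in dimension 1.
g_before = g((1.0, 4.0), (3.0, 2.0))
g_now = g((3.5, 4.0), (3.0, 3.0))
changed = crossed_dimensions(g_before, g_now)
```

Since g takes the aggregates of both objects as input, its domain has twice the dimensions of f, which matches the remark on the slide.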

Reducing the number of queries
- Composite objects
- Example for PIVOT
  - Group pivot points: $p_{1,5}$ and $p_{1,6}$ grouped to $p_{1,G}$
  - Keep the most restricting pivot points: $p_{1,5}$, $p_{1,6}$, $p_{1,G}$ dominated by $p_{1,4}$
- Total queries reduced to O(n)
- Same principles apply for DIRECT
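The idea of keeping only the most restricting pivot points can be illustrated in one dimension. This is a simplified sketch under that assumption, not the grouping into composite objects shown on the slide: a pivot that lies farther from the object than another pivot on the same side cannot be crossed first, so only the nearest pivot on each side needs a query. `most_restrictive` and the sample values are hypothetical.

```python
def most_restrictive(own_value, pivots):
    """Keep at most two pivots per dimension: the nearest below and the nearest above."""
    below = [p for p in pivots if p < own_value]
    above = [p for p in pivots if p > own_value]
    kept = []
    if below:
        kept.append(max(below))  # tightest lower bound
    if above:
        kept.append(min(above))  # tightest upper bound
    return kept

# Hypothetical object at 1.0 with pivots towards four other objects.
kept = most_restrictive(1.0, [2.0, 3.5, 4.0, 0.2])
```

With at most two queries per object and dimension, the total number of queries grows linearly with the number of objects instead of with the number of pairs.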

Adaptive method: Streaming vs Geometric
- Some queries are just too tight, causing frequent threshold crossings
  - Frequent synchronization is more expensive than streaming
  - Identify these queries and set the corresponding objects to streaming mode
- Cost model based on random walks and statistics
  - Adaptively switches between the streaming and geometric schemes
- Only for PIVOT: cannot be used in DIRECT
  - Objects are always examined in pairs

Experimental evaluation
- Baseline: all updates streamed to a coordinator
- Measure network efficiency
  - Transfer volume and number of messages
  - Accuracy always 100%
- Data sets: real-world and synthetic
  - Up to 94 million updates, 5000 sites, objects
- Functions used: identity, variance, Euclidean norm, L2 distance in 4 dimensions

Synthetic data sets
- Cost presented as ratio of baseline
- [Figure: x-axis: dimensions at domain space; 2 functions; series: Identity, Variance, Euclidean norm, L2 distance]

Conclusions
- First work on Continuous Fragmented Skylines
  - Objects are fragmented over the network
  - Skyline dimensions defined through arbitrary functions
  - Continuous maintenance
- PIVOT and DIRECT
  - Decomposition of fragmented skyline maintenance into threshold-crossing queries
  - Use of the Geometric Method to monitor these queries
- Optimizations
  - Reduction of queries to O(n)
  - Adaptive monitoring based on a novel cost model
- Scalable and efficient
  - Orders of magnitude network improvement compared to streaming

Thank you for your attention
Questions?
Work partially supported by LIFT: Using Local Inference in Massively Distributed Systems

Skylines 101
- Buying a used car
  - It should be cheap
  - But it should not be too old
  - And...
- Let the user decide on the trade-off between cheap and not too old
- [Figure: price vs. age axes, each from low to high, with "worst" and "best" corners marked]
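The used-car example corresponds to the standard skyline definition: a car belongs to the skyline if no other car is at least as good in every dimension and strictly better in one. A minimal sketch with hypothetical (price, age) pairs, where lower is better in both dimensions:

```python
def dominates(a, b):
    """a dominates b if a is no worse in every dimension and strictly better in one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def skyline(points):
    """Quadratic-time skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical cars as (price in thousands, age in years).
cars = [(10, 5), (8, 7), (12, 3), (11, 6), (9, 9)]
best_tradeoffs = skyline(cars)
```

Here (11, 6) drops out because (10, 5) is both cheaper and newer, and (9, 9) drops out because of (8, 7). The remaining cars are the trade-offs left for the user to choose from.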

Example
- Network monitoring at the edge routers
- [Figure: #packets vs. tr. vol., with P2P, DDoS attack, and DoS attack marked]
- Raw data (columns): router | target IP | #packets | vol.
- Dimensions (columns): target IP | #packets | vol. | var(vol.)
- [Figure: #packets vs. var(tr. vol.), with DDoS attack marked]

Synthetic data sets
- 1000 sites
- 2000 objects
- 10 million updates
- 2-4 functions

Synthetic data sets
- 2000 objects
- updates per site/object
- 2 dimensions

Real world data sets
- WEATHER: NOAA weather data ( )
  - ~94 million readings
  - 5423 sensors, 257 countries
  - Sensors monitor only one object!
- MOVIES: MovieLens movie ratings
  - 10 million ratings, movies, users
  - Assigned to 200 sites
Winter 2010/11