

A Renewal Theory Approach to Anomaly Detection in Communication Networks

Brian Thompson†, Tina Eliassi-Rad†‡
† Rutgers University
‡ Lawrence Livermore Lab

Introduction/Motivation

A communication network is a time-evolving graph that models interactions between entities over time.
Communication networks are pervasive in today's world: phone calls, blog posts, email, social network messages, IP connections.
They are also volatile, so static network analysis tools are not sufficient.
Goal: Efficiently identify local or global changes in communication activity or graph structure over time.

Model

Communication across an edge is modeled as a sequence of time-stamped events, which yields a distribution of inter-arrival times (IATs).
IATs for human interaction frequently follow a power-law distribution.
The Bounded Pareto distribution allows us to model communication concisely, and to make updates in real time and in constant space.

[Figure: event snapshots at t = 1, t = 2, t = 3 and the resulting summary graph.]
[Figure: Bounded Pareto distribution, with x_min and x_max marked on the axis.]

Our Approach

The recency function $\mathrm{Rec} : 2^T \times T \to [0,1]$ assigns a weight to edge e at time t based on its age, i.e. the time since the last event, subject to the constraints:
1. Rec(e,t) = 0 at the time an event occurs, 1 when age = x_max, and is increasing in between.
2. Rec(e,t) is uniform over [0,1] when sampled uniformly in time.
Rec is uniquely determined by the constraints. The uniformity property eliminates time-scale bias.

Consider the weighted graph G_t = (V,E) representing a communication network at time t, with w(e) = Rec(e,t). For E' ⊆ E, let X_{E',p} be the number of edges in E' with w(e) ≤ p. We define the p-divergence of E' as follows:

$$\mathrm{Div}_p(E') = \frac{1}{P(X \ge X_{E',p})}, \quad \text{where } X \sim \mathrm{Bin}(|E'|, p)$$

Intuitively, a p-divergence of d means that the probability of at least X_{E',p} edges occurring p-recently is 1/d.

Example: Let E' be the set of thick edges. Then |E'| = 6, X_{E',0.3} = 4, P(X ≥ 4) = 0.07, and Div_{0.3}(E') = 14.2.

The max-divergence of E' is the largest p-divergence over all thresholds:

$$\max_{p \in [0,1]} \mathrm{Div}_p(E')$$

A (maximal) p-component of G = (V,E) is a connected subgraph C = (V',E') such that (1) w(e) ≤ p for all e in E' and (2) w(e) > p for all e not in E' incident to V'. The set of p-components partitions V, for all p in [0,1].

[Figure: the p-components of G_t for p = 0.3, shown in blue.]

[Table with columns p, Component, Div. Only the first p value (0.1) and the component entries survive: {V1,V2}; {V1,V2,V3}; {V1,V2,V3}; {V4,V5}; {V1,V2,V3,V4,V5}; {V1,V2,V3,V4,V5}.]

Algorithm

The MCD Algorithm:
1. Calculate edge weights using the Recency function.
2. Gradually increase the edge threshold, updating components and divergence values as necessary.
3. Output: Disjoint components with max divergence.

Experimental Results

Experiments on 4 datasets: Enron email, LBNL IP traffic, Twitter messages, and Reality Mining Bluetooth proximity.
Clear and intuitive visualization reveals anomalous activity in the Bluetooth dataset at two points in time.

[Figure: Bluetooth dataset on Day 220 and Day 250; panels: Sorted by degree, Recency, MCD Analysis.]

A simple plot of MCD over time (left) identifies hand-labeled scanning activity in the LBNL dataset, as well as other anomalies overlooked by human analysts.
The plot at right shows scalability using the Twitter dataset (263k nodes, 308k edges, 1.1 million timestamps).

Conclusions

Traditional network analysis is inadequate for dealing with communication networks, which are dynamic and volatile.
Studying the inter-arrival time distributions of edges is a novel approach for analyzing communication networks.
Our algorithms are streaming, and run in O(m) space and O(m log m) time, where m is the number of edges in the dataset.
MCD analysis can be easily visualized and used as a tool for monitoring activity in a variety of real-world domains.

This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. IM Review and Release number
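
The definition of a p-component amounts to taking the connected components of the subgraph that keeps only edges with w(e) ≤ p. The sketch below illustrates that reading with a small union-find structure in Python. It is not the authors' MCD implementation: it covers only the component step for a single threshold, and the function name, vertex labels, and edge weights are all hypothetical.

```python
def p_components(vertices, edges, p):
    """Partition `vertices` into p-components.

    `edges` is a list of (u, v, w) triples with recency weight w in [0, 1].
    Two vertices share a component when a path of edges with w <= p joins them.
    """
    parent = {v: v for v in vertices}

    def find(v):
        # Follow parent links to the root, halving the path on the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v, w in edges:
        if w <= p:
            parent[find(u)] = find(v)  # merge the two components

    groups = {}
    for v in vertices:
        groups.setdefault(find(v), set()).add(v)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical five-vertex graph.
V = ["V1", "V2", "V3", "V4", "V5"]
E = [("V1", "V2", 0.05), ("V2", "V3", 0.20), ("V4", "V5", 0.25), ("V3", "V4", 0.50)]
for threshold in (0.1, 0.3, 0.5):
    print(threshold, p_components(V, E, threshold))
```

Because raising p only adds edges, components can only merge as the threshold grows. Processing edges in increasing weight order with the same union-find structure would therefore handle all thresholds in one pass after sorting, which is consistent with the O(m log m) running time stated in the conclusions.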