A Renewal Theory Approach to Anomaly Detection in Communication Networks
Brian Thompson†, Tina Eliassi-Rad†‡
† Rutgers University, ‡ Lawrence Livermore Lab
This work performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. IM Review and Release number

Introduction/Motivation
A communication network is a time-evolving graph that models interactions between entities over time.
Pervasive in today's world: phone calls, blog posts, email, social network messages, IP connections.
Volatile: static network analysis tools are not sufficient.
Goal: efficiently identify local or global changes in communication activity or graph structure over time.

Model
Communication across an edge is modeled as a sequence of time-stamped events, which yields a distribution of inter-arrival times (IATs).
IATs for human interaction frequently follow a power-law distribution. The Bounded Pareto distribution, a power law bounded by $x_{\min}$ and $x_{\max}$, allows us to model communication concisely and to make updates in real time and in constant space.
[Figure: Bounded Pareto IAT distribution, with bounds $x_{\min}$ and $x_{\max}$ marked]
[Figure: edge events at t = 1, 2, 3 and the resulting summary graph]

Our Approach

Recency
The recency function $\mathrm{Rec}: 2^T \times T \to [0,1]$ assigns a weight to edge e (given by its set of event times) at time t based on its age, i.e., the time since the last event, subject to the constraints:
1. $\mathrm{Rec}(e,t) = 0$ at the time an event occurs, $\mathrm{Rec}(e,t) = 1$ when the age equals $x_{\max}$, and Rec is increasing in between.
2. $\mathrm{Rec}(e,t)$ is uniform over [0,1] when sampled uniformly in time.
Rec is uniquely determined by these constraints. The uniformity property eliminates time-scale bias.

MCD Analysis
Consider the weighted graph $G_t = (V,E)$ representing a communication network at time t, with $w(e) = \mathrm{Rec}(e,t)$. For $E' \subseteq E$ and $p \in [0,1]$, let $X_{E',p} = |\{e \in E' : w(e) \le p\}|$ be the number of edges in E' with $w(e) \le p$. We define the p-divergence of E' as
$$\mathrm{Div}_p(E') = \frac{1}{\Pr[X \ge X_{E',p}]}, \qquad X \sim \mathrm{Bin}(|E'|, p).$$
By the uniformity property, each edge is p-recent ($w(e) \le p$) with probability p, so for independent edges the number of p-recent edges in E' follows $\mathrm{Bin}(|E'|, p)$. Intuitively, a p-divergence of d means that the probability of at least $X_{E',p}$ edges occurring p-recently is 1/d.
Example: let E' be the set of thick edges, so $|E'| = 6$ and $X_{E',0.3} = 4$. Then $\Pr[X \ge 4] \approx 0.07$ and $\mathrm{Div}_{0.3}(E') \approx 14.2$.
The max-divergence of E' is
$$\mathrm{Div}(E') = \max_{p \in [0,1]} \mathrm{Div}_p(E').$$
A (maximal) p-component of $G = (V,E)$ is a connected subgraph $C = (V',E')$ such that (1) $w(e) \le p$ for all $e \in E'$ and (2) $w(e) > p$ for all $e \notin E'$ incident to $V'$. For every $p \in [0,1]$, the p-components partition V. The p-components of $G_t$ for $p = 0.3$ are shown in blue.
[Table: p, Component, Div; components in order: $\{V_1,V_2\}$ (p = 0.1), $\{V_1,V_2,V_3\}$, $\{V_1,V_2,V_3\}$, $\{V_4,V_5\}$, $\{V_1,\dots,V_5\}$, $\{V_1,\dots,V_5\}$]

Algorithm
The MCD algorithm:
1. Calculate edge weights using the recency function.
2. Gradually increase the edge threshold p, updating components and divergence values as necessary.
3. Output: disjoint components with maximum divergence.

Experimental Results
Experiments on 4 datasets: Enron email, LBNL IP traffic, Twitter messages, and Reality Mining Bluetooth proximity.
A simple plot of MCD over time (left) identifies hand-labeled scanning activity in the LBNL dataset, as well as other anomalies overlooked by human analysts. The plot at right shows scalability on the Twitter dataset (263k nodes, 308k edges, 1.1 million timestamps).
A clear and intuitive visualization reveals anomalous activity in the Bluetooth dataset at two points in time.
[Figure: Reality Mining Bluetooth dataset on Day 220 and Day 250, sorted by degree]

Conclusions
Traditional network analysis is inadequate for communication networks, which are dynamic and volatile.
Studying the inter-arrival time distributions of edges is a novel approach to analyzing communication networks.
Our algorithms are streaming and run in O(m) space and O(m log m) time, where m is the number of edges in the dataset.
MCD analysis can be easily visualized and used as a tool for monitoring activity in a variety of real-world domains.
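The recency constraints above can be met by a concrete function if we assume each edge's IATs follow a Bounded Pareto distribution with known parameters x_min, x_max and shape alpha (how these are fitted is not described here, and the function names below are hypothetical). A standard renewal-theory result says that, at a uniformly random time, the age of a stationary renewal process has density $S(a)/\mu$, where S is the IAT survival function and $\mu$ the mean IAT. Taking $\mathrm{Rec}(a) = \frac{1}{\mu}\int_0^a S(x)\,dx$, the CDF of that age, gives Rec = 0 at an event, Rec = 1 at age x_max, an increasing function in between, and a uniform value when sampled uniformly in time (probability integral transform). This derivation is our own sketch rather than the poster's stated formula, and it takes the age directly instead of an edge's event set.

```python
import math


def survival_integral(age, x_min, x_max, alpha):
    """Integral over [0, age] of the Bounded Pareto survival function S(x) = P(IAT > x).

    S(x) = 1 for x < x_min, and for x_min <= x <= x_max
    S(x) = ((x_min/x)**alpha - (x_min/x_max)**alpha) / (1 - (x_min/x_max)**alpha).
    """
    age = min(max(age, 0.0), x_max)  # S = 0 beyond x_max, so larger ages are clamped
    if age <= x_min:
        return age  # S(x) = 1 below x_min
    ratio = (x_min / x_max) ** alpha
    if alpha == 1.0:
        power_part = x_min * math.log(age / x_min)
    else:
        power_part = x_min ** alpha * (age ** (1 - alpha) - x_min ** (1 - alpha)) / (1 - alpha)
    return x_min + (power_part - ratio * (age - x_min)) / (1 - ratio)


def recency(age, x_min, x_max, alpha):
    """Hypothetical Rec: CDF of the stationary age of a Bounded Pareto renewal process.

    Assumes the edge's IATs are Bounded Pareto with known (x_min, x_max, alpha).
    """
    mean_iat = survival_integral(x_max, x_min, x_max, alpha)  # E[IAT] = integral of S
    return survival_integral(age, x_min, x_max, alpha) / mean_iat
```

With x_min = 1, x_max = 2 and alpha = 2, the mean IAT is 4/3 and `recency(1.0, 1.0, 2.0, 2.0)` gives 0.75, since ages below x_min account for x_min / E[IAT] of the age distribution.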
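The divergence definitions and the threshold sweep of the MCD algorithm can be sketched in Python as below. This is a simplified batch sketch under stated assumptions, not the authors' streaming O(m log m) implementation: it recomputes each changed component's max-divergence from scratch. Two points are our own: (a) $X_{E',p}$ changes only at edge weights, and $\Pr[X \ge k]$ grows with p for fixed k, so the maximum over p is reached at an edge weight (or is 1 when p lies below every weight); (b) the poster does not say how the disjoint output components are chosen, so the hypothetical `select_disjoint` keeps components greedily by decreasing divergence. All function names are hypothetical.

```python
import math
from collections import defaultdict


def binom_tail(n, k, p):
    """P(X >= k) for X ~ Bin(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


def p_divergence(weights, p):
    """Div_p(E') = 1 / P(X >= X_{E',p}) with X ~ Bin(|E'|, p)."""
    x = sum(1 for w in weights if w <= p)  # X_{E',p}: number of p-recent edges
    tail = binom_tail(len(weights), x, p)
    return math.inf if tail == 0 else 1.0 / tail


def max_divergence(weights):
    """Max of Div_p(E') over p; only distinct positive edge weights need checking.

    For p below every weight, X_{E',p} = 0 and Div_p = 1. Assumption: p = 0 is
    skipped, since an edge of weight exactly 0 would make Div_0 infinite.
    """
    candidates = sorted({w for w in weights if w > 0})
    return max([1.0] + [p_divergence(weights, p) for p in candidates])


def find(parent, v):
    """Union-find root lookup with path halving."""
    while parent[v] != v:
        parent[v] = parent[parent[v]]
        v = parent[v]
    return v


def mcd_sweep(edges):
    """Hypothetical sweep: raise the threshold p through sorted edge weights, merging p-components.

    edges: list of (u, v, w) with w = Rec(e, t). Returns (max-divergence, vertex set)
    for every component created or changed at some threshold.
    """
    parent = {}
    for u, v, _ in edges:
        parent.setdefault(u, u)
        parent.setdefault(v, v)
    nodes = {v: {v} for v in parent}  # root -> vertices of its component
    weights = defaultdict(list)       # root -> weights of the component's edges
    found = []
    ordered = sorted(edges, key=lambda e: e[2])
    i = 0
    while i < len(ordered):
        p = ordered[i][2]
        touched = set()
        while i < len(ordered) and ordered[i][2] == p:  # all edges with w(e) == p
            u, v, w = ordered[i]
            ru, rv = find(parent, u), find(parent, v)
            if ru != rv:
                parent[rv] = ru
                nodes[ru] |= nodes.pop(rv)
                weights[ru].extend(weights.pop(rv, []))
            weights[ru].append(w)
            touched.add(ru)
            i += 1
        # A root touched earlier in this batch may have been merged since; re-resolve it.
        for r in {find(parent, r) for r in touched}:
            found.append((max_divergence(weights[r]), frozenset(nodes[r])))
    return found


def select_disjoint(found):
    """Assumed output rule: greedily keep the highest-divergence disjoint components."""
    chosen, used = [], set()
    for div, comp in sorted(found, key=lambda t: t[0], reverse=True):
        if not comp & used:
            chosen.append((div, set(comp)))
            used |= comp
    return chosen
```

For the path a-b (weight 0.05), b-c (0.1), c-d (0.9), the sweep records {a,b}, {a,b,c} and {a,b,c,d}; {a,b,c} has the largest max-divergence (about 100, reached at p = 0.1), so `select_disjoint` returns it alone.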