Streaming Models and Algorithms for Communication and Information Networks Brian Thompson (joint work with James Abello)



Outline
Introduction and Motivation
Our Approach: A Streaming Model, Algorithms
Experimental Results
Conclusions and Future Work


Problem Description
Data: A network (G; T)
  G = (V, E) is a graph
  T is a set of time-stamped events corresponding to nodes or edges in G
Goals:
  Identify recent correlated activity
  Measure influence between entities
Challenges:
  Scalability: networks may be very large, and space is limited
  Efficiency: high data rate, time-sensitive information
  Variability: entities have different temporal dynamics

Related Work
Time-evolving graph model: a sequence of “snapshots” (t = 1, t = 2, t = 3, t = 4)
Time series analysis

Related Work
Cascade model: starting from a set of seed nodes, information (product, news, virus) propagates through the network


Data Model
G is a graph
T is a set of time-stamped events corresponding to nodes or edges in G

Source   Recipient   Content                  Timestamp
Alice    (public)    “Fire at 2nd & Main!”    Tuesday, 9:25am
Bob      Cheng       (private message)        Tuesday, 9:27am
                     Fire...”                 Tuesday, 9:28am

[Figure: graph on the nodes Alice, Bob, Cheng, Devika, Elina]

Data Model (Node-centric)
[Figure: nodes Alice, Bob, Cheng, Devika, Elina]

Data Model (Edge-centric)
[Figure: edges among Alice, Bob, Cheng, Devika, Elina]

Renewal Theory
[Figure: timeline starting at 0 with event times t_1, t_2, t_3, t_4, t_5; S_3 marked]

Renewal Theory
[Figure: timeline starting at 0 with event times t_1, t_2, t_3, t_4, t_5; a time t marked]

The REWARDS Model (REneWal theory Approach for Real-time Data Streams)
We model a stream of communication data from a node or across an edge as a renewal process.
[Figure: discrete-event sequence t_1, t_2, t_3, t_4, t_5 and its inter-arrival time distribution, ranging from x_min to x_max]

The REWARDS Model (REneWal theory Approach for Real-time Data Streams)
Given a stream of time-stamped events, we estimate the parameters of the renewal process for each node or edge based on the inter-arrival times.
[Figure: discrete-event sequence t_1, t_2, t_3, t_4, t_5 and its inter-arrival time distribution, ranging from x_min to x_max]
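The per-node or per-edge bookkeeping described above can be sketched as follows. This is only an illustration under stated assumptions: the class names InterArrivalTracker and GapStats are hypothetical, and the summary statistics kept here (count, mean, x_min, x_max) are a stand-in for whatever renewal-process parameters the REWARDS model actually estimates. Each update touches a constant amount of state, which is consistent with the O(1)-per-communication update cost reported later in the talk.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, Optional


@dataclass
class GapStats:
    """Running summary of inter-arrival times for one node or edge (hypothetical)."""
    last_time: float
    count: int = 0
    total: float = 0.0
    x_min: float = float("inf")
    x_max: float = 0.0

    @property
    def mean(self) -> Optional[float]:
        # Mean inter-arrival time, or None before the second event arrives.
        return self.total / self.count if self.count else None


class InterArrivalTracker:
    """Consumes (key, timestamp) events in time order, one constant-time update each."""

    def __init__(self) -> None:
        self.stats: Dict[Hashable, GapStats] = {}

    def observe(self, key: Hashable, timestamp: float) -> None:
        s = self.stats.get(key)
        if s is None:
            # First event for this key: nothing to measure yet.
            self.stats[key] = GapStats(last_time=timestamp)
            return
        gap = timestamp - s.last_time
        s.count += 1
        s.total += gap
        s.x_min = min(s.x_min, gap)
        s.x_max = max(s.x_max, gap)
        s.last_time = timestamp


tracker = InterArrivalTracker()
for t in (0.0, 10.0, 25.0, 27.0):        # events on the edge Alice -> Bob
    tracker.observe(("Alice", "Bob"), t)
edge = tracker.stats[("Alice", "Bob")]
```

A key can be a node name for the node-centric view or a (source, recipient) pair for the edge-centric view; the tracker itself does not care which.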


Recency
Goal: highlight recent activity
Key idea: more recent = more relevant
Challenge: The most frequent communicators will always seem “recent”, overshadowing others’ behavior. We call this time-scale bias.
[Figure: activity timelines for users alice1337 and bob_iz_kewl at 8:00 am, 10:00 am, 12:00 pm, and NOW]
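The transcript does not preserve the definition of the recency function, so the following is only one plausible way to counter time-scale bias, not the authors' formula. The assumption is that an entity's time since its last event is scored against that entity's own history of inter-arrival times, so a score near 1 means "recent by this entity's standards". The function name normalized_recency is hypothetical.

```python
from bisect import bisect_right
from typing import Sequence


def normalized_recency(sorted_gaps: Sequence[float], age: float) -> float:
    """Fraction of this entity's past inter-arrival times that exceed `age`.

    `sorted_gaps` must be sorted ascending; `age` is the time since the
    entity's last event. The result lies in [0, 1] for every entity,
    regardless of how often it communicates.
    """
    if not sorted_gaps:
        raise ValueError("need at least one observed inter-arrival time")
    longer = len(sorted_gaps) - bisect_right(sorted_gaps, age)
    return longer / len(sorted_gaps)


# Gaps in minutes: a frequent communicator and an infrequent one.
frequent = [1, 2, 2, 3, 5]
infrequent = [60, 120, 240, 480]

score_frequent = normalized_recency(frequent, age=4)       # silent for 4 minutes
score_infrequent = normalized_recency(infrequent, age=90)  # silent for 90 minutes
```

In this toy example the infrequent user scores higher even though its last event is older in absolute terms, because 90 minutes is short relative to its usual gaps while 4 minutes is long for the frequent user.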


Recency
[Figure: Recency of Edge in Bluetooth Dataset]

Delay
Goal: measure influence of entity A on entity B
Key idea: study pairwise (A,B)-gaps
Challenge: More frequent communicators will tend to always have shorter “gaps”. This is another example of time-scale bias.
[Figure: activity timelines for users alice1337 and bob_iz_kewl at 8:00 am, 10:00 am, 12:00 pm, and NOW]



Divergence
Based on the Kolmogorov-Smirnov statistic, which compares the empirical distribution function (EDF) F_n(x) to a hypothesized CDF F(x)
Recency divergence compares recency values for a set of nodes or edges to the CDF for Uniform(0,1)
Delay divergence compares delay values for a set of edges, or for all (A,B)-gaps, to the CDF for Uniform(0,1)
[Figure: EDF plotted against the Uniform(0,1) CDF, with KS = 0.32]
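As a concrete illustration of the comparison above, here is a minimal sketch of the two-sided Kolmogorov-Smirnov statistic against Uniform(0,1), whose CDF on [0, 1] is simply F(x) = x. The function name ks_divergence_uniform is hypothetical, and the slides may use a one-sided or otherwise adjusted variant; this shows only the textbook statistic.

```python
from typing import Iterable


def ks_divergence_uniform(values: Iterable[float]) -> float:
    """Two-sided KS statistic between the EDF of `values` and Uniform(0,1)."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one value")
    d = 0.0
    for i, x in enumerate(xs):
        if not 0.0 <= x <= 1.0:
            raise ValueError("values must lie in [0, 1]")
        # The EDF jumps from i/n to (i+1)/n at x, while F(x) = x.
        # The largest gap occurs just before or just after a jump.
        d = max(d, (i + 1) / n - x, x - i / n)
    return d


skewed = ks_divergence_uniform([0.1, 0.2, 0.3, 0.4])          # bunched near 0
spread = ks_divergence_uniform([0.125, 0.375, 0.625, 0.875])  # evenly spread
```

Values that bunch at one end of [0, 1] produce a large statistic, while evenly spread values produce a small one, which is the behavior the divergence relies on to flag unusual concentrations of recency or delay values.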

Streaming Node-Centric Algorithm
Goal: Flag times at which a node exhibits anomalous activity (indicated by an unusually high concentration of recent outgoing communication)
Approach: Since the recency function is decreasing between consecutive communications, measure the recency divergence at a node only at times at which new activity occurs

The MCD Algorithm (Maximal Component Divergence Algorithm)
Goal: Identify subgraphs with correlated behavior
  Recency divergence to find recent anomalous activity
  Delay divergence to identify spheres of influence
Challenge: How do we overcome the combinatorial explosion?

The MCD Algorithm (Maximal Component Divergence Algorithm)
1. Calculate edge weights using recency or delay function
2. Gradually decrease the threshold, updating components and divergence values as necessary
3. Output: Disjoint components with max divergence

[Figure: example graph on the nodes V_1, V_2, V_3, V_4, V_5]

θ     Component                      Div(C)
0.9   {V_1, V_2}
      {V_1, V_2, V_3}
      {V_1, V_2, V_3}
      {V_4, V_5}
      {V_1, V_2, V_3, V_4, V_5}
      {V_1, V_2, V_3, V_4, V_5}
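The three steps above can be sketched with a union-find structure: processing edges in decreasing weight order is equivalent to gradually lowering the threshold θ. Everything beyond the three listed steps is an assumption made for illustration: the name mcd_sketch, the use of the two-sided KS statistic on a component's edge weights as Div(C), the hypothetical min_edges parameter, and the greedy rule for choosing disjoint components. This sketch also recomputes the divergence from scratch after every edge, so it does not achieve the O(m log m) bound stated in the talk.

```python
from typing import FrozenSet, List, Tuple


def ks_uniform(values: List[float]) -> float:
    """Two-sided KS statistic of `values` against Uniform(0,1)."""
    xs = sorted(values)
    n = len(xs)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))


def mcd_sketch(
    num_nodes: int,
    weighted_edges: List[Tuple[int, int, float]],
    min_edges: int = 1,  # hypothetical knob: ignore components with fewer edges
) -> List[Tuple[float, float, FrozenSet[int]]]:
    """Return disjoint (divergence, theta, component) triples, best first."""
    parent = list(range(num_nodes))
    members = {i: {i} for i in range(num_nodes)}
    weights = {i: [] for i in range(num_nodes)}

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    candidates = []
    # Step 2: lowering theta admits edges in decreasing weight order.
    for u, v, w in sorted(weighted_edges, key=lambda e: -e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:
            if len(members[ru]) < len(members[rv]):
                ru, rv = rv, ru            # merge the smaller set into the larger
            parent[rv] = ru
            members[ru] |= members.pop(rv)
            weights[ru] += weights.pop(rv)
        weights[ru].append(w)
        if len(weights[ru]) >= min_edges:
            candidates.append(
                (ks_uniform(weights[ru]), w, frozenset(members[ru]))
            )

    # Step 3 (assumed rule): greedily keep the highest-divergence components
    # that do not overlap anything already chosen.
    chosen, used = [], set()
    for div, theta, comp in sorted(candidates, key=lambda c: (-c[0], -c[1])):
        if used.isdisjoint(comp):
            chosen.append((div, theta, comp))
            used |= comp
    return chosen


# Step 1 is assumed done: weights in [0, 1] from a recency or delay function.
edges = [(0, 1, 0.95), (1, 2, 0.9), (0, 2, 0.85), (3, 4, 0.3), (2, 3, 0.1)]
result_all = mcd_sketch(5, edges)
result_min2 = mcd_sketch(5, edges, min_edges=2)
```

With min_edges=1 the single heaviest edge wins, because a one-sample KS statistic is large by construction; raising min_edges to 2 selects the triangle {0, 1, 2} instead. A practical version would likely need some correction for component size along these lines.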

Sample Output
[Table with columns: MCD, θ, #V(C), E-frac, %E(C), %E(G)]


Robustness to Time Scale
Simulation:
  R-MAT model, 128 vertices, average degree 16
  Inter-arrival times (IATs) for edge activity sampled from Bounded Pareto distributions, rate parameter between 10 minutes and 1 week
  Every 5 days, a randomly selected node has anomalous activity at 10x its normal rate
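Sampling inter-arrival times from a Bounded Pareto distribution can be done by inverting its CDF, as in the sketch below. The shape alpha and the bounds low and high are illustrative values only; the slides do not state how the simulation was parameterized beyond the range of the rate parameter, and the function names are hypothetical.

```python
import random
from typing import List


def bounded_pareto_inverse_cdf(u: float, alpha: float, low: float, high: float) -> float:
    """Map u in [0, 1] to a Bounded Pareto(alpha, low, high) value."""
    if not (0.0 < low < high) or alpha <= 0.0:
        raise ValueError("require 0 < low < high and alpha > 0")
    la, ha = low ** alpha, high ** alpha
    return (-(u * ha - u * la - ha) / (ha * la)) ** (-1.0 / alpha)


def simulate_event_times(n: int, alpha: float, low: float, high: float,
                         rng: random.Random) -> List[float]:
    """Generate n event times whose gaps are Bounded Pareto distributed."""
    t, times = 0.0, []
    for _ in range(n):
        t += bounded_pareto_inverse_cdf(rng.random(), alpha, low, high)
        times.append(t)
    return times


MINUTES_PER_WEEK = 7 * 24 * 60
rng = random.Random(0)  # fixed seed so the run is repeatable
times = simulate_event_times(1000, alpha=1.5, low=10.0,
                             high=float(MINUTES_PER_WEEK), rng=rng)
gaps = [b - a for a, b in zip([0.0] + times[:-1], times)]
```

Setting u = 0 returns the lower bound and u = 1 returns the upper bound, so every sampled gap stays within [low, high].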


Robustness to Time Scale
Conclusion: While it takes longer for anomalous activity to be recognized at nodes with lower rates, the magnitude of the peak seems to be independent of activity rate but highly correlated with degree.

Accuracy and Precision
Simulation: star network, 100 trials with only normal activity and 100 trials including a period of anomalous activity
ROC curves show accuracy and precision for several methods for distinguishing between the two scenarios
Conclusion: Especially when variability is introduced, our approach outperforms the WtdDeg and Z-Score metrics

Detection Latency
Data: Enron corpus, 1k nodes, 2k edges, 4k timestamps
Compare our approach with the GraphScope algorithm
Conclusion: The two algorithms seem to identify similar times of anomalous activity, but our approach based on the REWARDS model has a shorter response time

Anomaly Detection in IP Traffic
Data: LBNL network trace, > 9 million timestamps during one hour on December 15, 2004
Compare our approach with total network volume and with “scanning activity” labeled by LBNL analysts


Complexity Analysis
Dataset: Twitter messages, Nov – Oct (263k nodes, 308k edges, 1.1 million timestamps)
Updates: O(1) per communication
MCD Algorithm: O(m log m), where m = # of edges; can be approximated in effectively O(m) time


Future Work
Incorporate duration of communication and other node or edge attributes into our model
Make use of geographical and textual content
Use gap divergence to infer links, and compare to the approach of Gomez-Rodriguez et al.
Develop a streaming algorithm to identify emerging trends

Acknowledgements
Part of this work was conducted at Lawrence Livermore National Laboratory, under the guidance of Tina Eliassi-Rad.
This project is partially supported by a DHS Career Development Grant, under the auspices of CCICADA, a DHS Center of Excellence.