
Models and Algorithms for Event-Driven Networks
PhD Defense
Brian Thompson
Committee: Muthu Muthukrishnan (advisor), Danfeng Yao (Virginia Tech), Rebecca Wright, Paul Kantor, Hanghang Tong (CUNY City College)
December 19, 2013, Rutgers University

What is an event-driven network?

Outline
We consider three problems that arise in the study of event-driven networks:
1. Detecting correlated events
2. Discovering functional communities
3. Modeling academic collaboration

Themes
Temporal dynamics
Group behavior
Attribution
Computational feasibility

Detecting Correlated Events in Communication Networks
Joint work with James Abello

Problem Description
Setup: An event-driven network, where events indicate communication between two nodes.
Goal: Identify parts of the network with an unexpectedly high concentration of recent activity.
Challenges:
Scalability: data accumulates, so a concise representation is needed.
Efficiency: the data rate is high and the information is time-sensitive.
Variability: entities have different temporal dynamics.

Network Representation
Given an event-driven communication network on the nodes Muthu, Rebecca, Paul, Danfeng, and Hanghang:

Node 1     Node 2      Timestamp
Muthu      Rebecca     8:30 AM
Rebecca    Paul        9:00 AM
Muthu      Danfeng     9:15 AM
Paul       Hanghang    2:00 PM

Network Representation
For each pair of nodes (directed or undirected), we extract a time sequence, e.g. t1, t2, t3, t4, t5 for the pair (Muthu, Rebecca).
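The extraction step can be sketched in a few lines of Python. This is only an illustration of grouping the event table into per-pair time sequences; the function names `parse_time` and `time_sequences` and the choice to store times as minutes since midnight are assumptions made for the example, not part of the talk.

```python
from collections import defaultdict
from datetime import datetime


def parse_time(text):
    """Parse a clock time such as '8:30 AM' into minutes since midnight."""
    t = datetime.strptime(text, "%I:%M %p")
    return t.hour * 60 + t.minute


def time_sequences(events, directed=False):
    """Group (node1, node2, timestamp) events into one sorted time sequence per pair."""
    sequences = defaultdict(list)
    for u, v, stamp in events:
        # For undirected pairs, (u, v) and (v, u) share one key.
        key = (u, v) if directed else tuple(sorted((u, v)))
        sequences[key].append(parse_time(stamp))
    return {pair: sorted(times) for pair, times in sequences.items()}


# The four events from the example table.
events = [
    ("Muthu", "Rebecca", "8:30 AM"),
    ("Rebecca", "Paul", "9:00 AM"),
    ("Muthu", "Danfeng", "9:15 AM"),
    ("Paul", "Hanghang", "2:00 PM"),
]
sequences = time_sequences(events)
```

With `directed=True` the pair order is preserved, so messages from Muthu to Rebecca and from Rebecca to Muthu would form separate sequences.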

Network Representation
We can visualize the network like this:
[Figure: graph on the nodes Paul, Rebecca, Muthu, Danfeng, Hanghang]

Goal: Identify sets of nodes with an unexpectedly high concentration of recent activity.
Question: How to define "recent"?
The most frequent communications will always seem "recent", overshadowing others' behavior. We call this time-scale bias.
[Figure labels: NOW, Router Traffic, Attack Traffic, Temporal Bias]

Related Work
Time series analysis
Sequence of "summary graphs" (t = 1, t = 2, t = 3, t = 4)

Our Approach
1. Use a streaming stochastic model to concisely represent communication between each node pair
2. Define a notion of "recent" communication that addresses time-scale bias
3. Apply a statistical test to detect correlated recent activity among a set of nodes

The REWARDS Model
REneWal theory Approach for Real-time Data Streams
[Figure: a time sequence t1, t2, t3, t4, t5 and its inter-arrival time distribution, with the axis marked x_min and x_max]

The REWARDS Model
REneWal theory Approach for Real-time Data Streams
For each pair of nodes in the network, estimate the parameters of the renewal process that is most likely to have generated the corresponding time sequence.
[Figure: a time sequence t1, t2, t3, t4, t5 and its inter-arrival time distribution, with the axis marked x_min and x_max]
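As a rough illustration of maximum-likelihood fitting for a renewal process, the sketch below assumes exponentially distributed inter-arrival times, which is the simplest renewal process (a Poisson process). That distribution is an assumption made here for the example; the slide does not state which family REWARDS uses, and its figure marks a support from x_min to x_max that this sketch does not model. The function names are hypothetical.

```python
def inter_arrival_times(times):
    """Return the gaps between consecutive events of a time sequence."""
    ordered = sorted(times)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def fit_exponential_renewal(times):
    """Maximum-likelihood rate for a renewal process with exponential gaps.

    ASSUMPTION: exponential inter-arrival times. For that family the MLE of
    the rate is (number of gaps) / (sum of gaps).
    """
    gaps = inter_arrival_times(times)
    if not gaps or sum(gaps) <= 0:
        raise ValueError("need at least two distinct event times")
    return len(gaps) / sum(gaps)


# Five events, so four gaps: 2, 1, 4, 1 (total 8 time units).
rate = fit_exponential_renewal([0.0, 2.0, 3.0, 7.0, 8.0])
```

Because the estimate depends only on the number of gaps and their sum, it can be updated in constant time as each new event arrives, which fits the streaming setting described in the talk.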

Recency
[Figure: the time sequence t1, t2, t3, t4, t5 on a timeline from 0 to t]


The L-CORE Algorithm
Local algorithm for detecting CORrelated Events
[Figure: example with values 1.0, 0.9, 0.3, 0.8]

The G-CORE Algorithm
Global algorithm for detecting CORrelated Events
2. Initialize a disjoint set data structure on the nodes
3. Run a variant of the Union-Find algorithm, keeping track of the subgraphs with highest recency
[Figure: example graph with edge values 0.9, 0.75, 0.7, 0.1, 0.5, 0.3, and a table of node sets with values 0.900, 0.973, 0.500, 0.421]
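The disjoint-set idea can be sketched as follows. This is not the G-CORE algorithm itself: the slide does not say how edges are ordered or how the recency of a subgraph is computed, so the sketch assumes edges are processed in decreasing recency and scores each component by the mean recency of its edges. Both choices, and the names `DisjointSet` and `rank_subgraphs`, are assumptions for illustration.

```python
class DisjointSet:
    """Union-find with path halving and union by size, plus per-component edge statistics."""

    def __init__(self, nodes):
        self.parent = {v: v for v in nodes}
        self.members = {v: [v] for v in nodes}
        self.recency_sum = {v: 0.0 for v in nodes}
        self.edge_count = {v: 0 for v in nodes}

    def find(self, v):
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]  # path halving
            v = self.parent[v]
        return v

    def add_edge(self, u, v, recency):
        """Merge the components of u and v, record the edge, and return the root."""
        ru, rv = self.find(u), self.find(v)
        if ru != rv:
            if len(self.members[ru]) < len(self.members[rv]):
                ru, rv = rv, ru  # attach the smaller component to the larger
            self.parent[rv] = ru
            self.members[ru].extend(self.members.pop(rv))
            self.recency_sum[ru] += self.recency_sum.pop(rv)
            self.edge_count[ru] += self.edge_count.pop(rv)
        self.recency_sum[ru] += recency
        self.edge_count[ru] += 1
        return ru


def rank_subgraphs(nodes, edges):
    """Process edges by decreasing recency; snapshot each component as it forms.

    ASSUMPTION: a component's score is the mean recency of its edges.
    """
    ds = DisjointSet(nodes)
    snapshots = []
    for u, v, recency in sorted(edges, key=lambda e: e[2], reverse=True):
        root = ds.add_edge(u, v, recency)
        score = ds.recency_sum[root] / ds.edge_count[root]
        snapshots.append((score, frozenset(ds.members[root])))
    return sorted(snapshots, key=lambda s: s[0], reverse=True)


ranked = rank_subgraphs(
    ["a", "b", "c", "d", "e"],
    [("a", "b", 0.9), ("b", "c", 0.7), ("d", "e", 0.5)],
)
```

Each edge triggers at most one merge, so the whole pass is dominated by the initial sort of the edges.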

Complexity

Robustness to Time Scale
Simulation: star network, 100 trials with normal activity and 100 trials including a period of correlated activity.
Our approach is robust to temporal variability.

Detection Latency
Data: Enron corpus, ~1000 nodes and ~5000 events.
The algorithms identify similar times of correlated activity, but our approach has a shorter response time.

Visualization
Output from the G-CORE algorithm on the Bluetooth dataset at 12:00 PM on Day 100.

Summary of Contributions
REWARDS: a stochastic model for event-driven networks
A formal definition of recency that is time-scale invariant
L-CORE: a streaming local algorithm for detecting correlated recent activity among a given set of node pairs
G-CORE: an efficient global algorithm for detecting correlations throughout the network simultaneously

Discovering Functional Communities
Joint work with Linda Ness, David Shallcross, Devasis Bassu

Problem Description
Setup: An event-driven network, where events correspond to actions by a single node, each with an associated label.
Goal: Identify functional communities of individuals who use the same labels.
Challenges:
Scalability: there may be many nodes and many labels.
Mixed membership: each node may be part of more than one community.

Network Representation
Given a set of nodes and a collection of labeled events:
[Figure: the nodes Paul, Rebecca, Muthu, Danfeng, Hanghang]

Network Representation
[Figure: rows Hanghang, Rebecca, Paul, Danfeng, Muthu, with one bicluster highlighted; a later panel reorders the rows as Hanghang, Danfeng, Paul, Rebecca, Muthu]

Co-Clustering
Goal: Given a matrix, cluster the rows and columns simultaneously to reveal hidden structure.
Challenges:
The number and sizes of the clusters are not known a priori.
The number of possible co-clusterings is exponential in the size of the matrix.
[Figure: matrix with row clusters R1, R2 and column clusters C1, C2]

Related Work
Spectral methods use linear algebraic techniques such as SVD to fit a block diagonal structure.
They usually require the number of clusters to be pre-specified.
They are likely to perform well on the matrix on the left, but not the one on the right:
[Figure: two example matrices]

Our Approach
1. Define a quality metric for co-clusterings that rewards large, dense biclusters
2. Find a co-clustering that maximizes the metric value
This is NP-hard in general, so efficient heuristics are needed.
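To make "rewards large, dense biclusters" concrete, here is one hypothetical score for a single bicluster of a 0/1 matrix: the number of ones multiplied by the density. This formula is an assumption chosen for illustration and is not the metric from the talk; how per-bicluster scores are combined into a metric over a whole co-clustering is not given in the slides and is left out.

```python
def bicluster_score(matrix, rows, cols):
    """Hypothetical score: (number of ones) * density = ones^2 / area.

    ASSUMPTION: this formula is illustrative only. It grows with the number
    of ones (size) and shrinks when zeros dilute the block (density).
    """
    ones = sum(matrix[r][c] for r in rows for c in cols)
    area = len(rows) * len(cols)
    return ones * ones / area if area else 0.0


matrix = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
]

large_dense = bicluster_score(matrix, [0, 1], [0, 1, 2])               # 6 ones in 6 cells
small_dense = bicluster_score(matrix, [3], [3])                        # 1 one in 1 cell
large_sparse = bicluster_score(matrix, [0, 1, 2, 3], [0, 1, 2, 3])     # 9 ones in 16 cells
```

Under this score the fully dense 2-by-3 block ranks above the whole matrix, which in turn ranks above the single dense cell.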

Choosing a Metric
[Figure labels: large, dense, Property P1, Property P2]

The CC-MACS Algorithm
Co-Clustering via Maximal Anti-Chain Search
1. Build randomized k-d trees on the rows and columns
2. Initialize maximal anti-chains as the leaves of each tree
3. Traverse the trees simultaneously from the bottom up, greedily merging the rows or columns that result in the greatest increase in the metric value
4. Output the co-clustering with the best metric value
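The role of a maximal anti-chain can be illustrated on a small hand-built tree. The sketch below is not CC-MACS: it skips the randomized k-d tree construction and the metric-driven greedy choice, and only shows how an anti-chain that starts at the leaves defines a clustering and how one bottom-up merge coarsens it. The tree encoding and function names are assumptions for the example.

```python
def leaves_under(tree, node):
    """Return the leaves below `node`; a node missing from `tree` is itself a leaf."""
    children = tree.get(node, [])
    if not children:
        return [node]
    found = []
    for child in children:
        found.extend(leaves_under(tree, child))
    return found


def initial_antichain(tree, root):
    """The anti-chain made of all leaves: every row is its own cluster."""
    return set(leaves_under(tree, root))


def merge_up(tree, antichain, parent):
    """Replace all children of `parent` by `parent`, moving the anti-chain up one level."""
    children = tree[parent]
    if not all(child in antichain for child in children):
        raise ValueError("all children must be in the anti-chain before merging")
    return (antichain - set(children)) | {parent}


def clusters(tree, antichain):
    """Each anti-chain node stands for the cluster of leaves beneath it."""
    return sorted(sorted(leaves_under(tree, node)) for node in antichain)


# Hypothetical tree over four matrix rows r1..r4.
tree = {"root": ["A", "B"], "A": ["r1", "r2"], "B": ["r3", "r4"]}
start = initial_antichain(tree, "root")
after_merge = merge_up(tree, start, "A")
result = clusters(tree, after_merge)
```

In the algorithm described on the slide, the choice of which merge to apply would be driven by the increase in the metric value; here the merge at node A is chosen by hand.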


Experiments: Synthetic Data

Experiments: Visual Comparison
Matrices with known structure, taken from the NIST Matrix Market repository.
[Figure panels: Original Matrix, Randomly Permuted, Cross-Association]

Experiments: Web Memes
Meme-Tracker dataset of Leskovec et al. Top biclusters returned by the CC-MACS algorithm:
# of Domains, # of Memes, Density, Topic
212698.2%, St. Jude Children's Hospital
517896.1%, Brazilian news
63998.7%, Spanish news
62099.2%, Tech news
617100.0%, Politics

Summary of Contributions
A new class of co-clustering metrics that reward large, dense biclusters
The CC-MACS algorithm, which efficiently searches the space of possible co-clusterings for one that maximizes the value of a given metric
Advantages over existing methods:
No need to specify the number of clusters in advance
Not limited to matrices with a block diagonal structure

Modeling Collaboration in Academia
Joint work with Graham Cormode, Qiang Ma, Muthu Muthukrishnan

Problem Description

Related Work
Model one researcher's papers and citations over time
Model as a static network: same collaborations and number of papers per year
[Figure labels: +3, +6, +9]

Our Approach
Model the system as a repeated game, where the researchers choose collaborators each year in an attempt to maximize their long-term academic success.
Determine which sets of collaboration strategies form a game equilibrium, such that no pair of researchers would benefit from changing their strategies in order to collaborate with each other.

Game-Theoretic Model

Main Results

Future Directions
Do there exist equilibria in the dynamic game?
Extend the model to allow mixed strategies
Analyze the game under other metrics of academic success besides the h-index
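For reference, the h-index named above is the largest h such that a researcher has at least h papers with at least h citations each. A minimal sketch of the standard computation follows; the function name `h_index` and the sample citation counts are made up for the example.

```python
def h_index(citations):
    """Largest h such that at least h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


example = h_index([10, 8, 5, 4, 3])
```

Swapping in a different success metric, as the slide suggests, would mean replacing this function while keeping the rest of the game unchanged.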

Models and Algorithms for Event-Driven Networks
1. Detecting correlated events
New stochastic model to address the issue of time-scale bias
Efficiently find subgraphs with unusually high recent activity
2. Discovering functional communities
New class of metrics to reward large, dense biclusters
CC-MACS algorithm efficiently finds a good co-clustering
3. Modeling academic collaboration
Game-theoretic model allows formal analysis and simulation of collaborative behavior in a dynamic setting

Other Work
Measuring pairwise influence: Use the REWARDS model to measure influence between nodes based on the times of their respective activity.
Innovation and circulation in information networks: Determine the most likely sources of new content, and measure the importance of each node in the diffusion process.
Cascade partitioning: Infer likely threads of related content from temporal and relational information alone.

I owe much gratitude to:
My committee: Muthu Muthukrishnan, Danfeng Yao, Rebecca Wright, Paul Kantor, and Hanghang Tong
Fred Roberts, Tami Carpenter, Tina Eliassi-Rad, and James Abello, for mentoring me over the years
My other collaborators, mentors, and friends at Rutgers, DIMACS/CCICADA, ACS, and elsewhere
The DHS Fellowship, which funded me for 3 years
Last but not least, my family and friends
