Published by Meredith Hawkins. Modified over 9 years ago.
1
Models and Algorithms for Event-Driven Networks
PhD Defense
Brian Thompson
Committee: Muthu Muthukrishnan (advisor), Danfeng Yao (Virginia Tech), Rebecca Wright, Paul Kantor, Hanghang Tong (CUNY City College)
December 19, 2013
Rutgers University
2
What is an event-driven network?
3
Outline
We consider three problems that arise in the study of event-driven networks:
1. Detecting correlated events
2. Discovering functional communities
3. Modeling academic collaboration
4
Themes
Temporal dynamics
Group behavior
Attribution
Computational feasibility
5
Detecting Correlated Events in Communication Networks
Joint work with James Abello
6
Problem Description
Setup: An event-driven network, where events indicate communication between two nodes
Goal: Identify parts of the network with an unexpectedly high concentration of recent activity
Challenges:
Scalability: data accumulates, so a concise representation is needed
Efficiency: high data rate, time-sensitive information
Variability: entities have different temporal dynamics
7
Network Representation
Given an event-driven communication network on the nodes Muthu, Rebecca, Paul, Danfeng, and Hanghang:
Node 1 | Node 2 | Timestamp
Muthu | Rebecca | 8:30 AM
Rebecca | Paul | 9:00 AM
Muthu | Danfeng | 9:15 AM
Paul | Hanghang | 2:00 PM
8
Network Representation
For each pair of nodes (directed or undirected), we extract a time sequence.
Example for the pair Muthu and Rebecca: t1, t2, t3, t4, t5
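The extraction step can be sketched in a few lines of Python. This is a minimal illustration rather than the author's code: the function name `pair_time_sequences` and the choice to encode timestamps as minutes since midnight are assumptions made for the example, using the four events from the table on the earlier slide.

```python
from collections import defaultdict

def pair_time_sequences(events, directed=False):
    """Group (node1, node2, timestamp) events into one sorted time sequence per node pair."""
    sequences = defaultdict(list)
    for u, v, t in events:
        # For undirected pairs, (u, v) and (v, u) share one canonical key.
        key = (u, v) if directed else tuple(sorted((u, v)))
        sequences[key].append(t)
    return {pair: sorted(times) for pair, times in sequences.items()}

# Example events; timestamps are minutes since midnight (an assumed encoding).
events = [
    ("Muthu", "Rebecca", 510),   # 8:30 AM
    ("Rebecca", "Paul", 540),    # 9:00 AM
    ("Muthu", "Danfeng", 555),   # 9:15 AM
    ("Paul", "Hanghang", 840),   # 2:00 PM
]
sequences = pair_time_sequences(events)
```

Keeping one sorted list per pair is the simplest representation. A streaming system would more likely keep only summary statistics per pair, which is what a concise per-pair model is meant to provide.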
9
Network Representation
We can visualize the network like this:
[Figure: graph on the nodes Paul, Rebecca, Muthu, Danfeng, Hanghang]
10
Goal: Identify sets of nodes with an unexpectedly high concentration of recent activity
Question: How to define “recent”?
The most frequent communications will always seem “recent”, overshadowing others’ behavior. We call this time-scale bias.
[Figure labels: NOW, Router Traffic, Attack Traffic, Temporal Bias]
11
Related Work
Time series analysis
Sequence of “summary graphs” (t = 1, t = 2, t = 3, t = 4)
12
Our Approach
1. Use a streaming stochastic model to concisely represent communication between each node pair
2. Define a notion of “recent” communication that addresses time-scale bias
3. Apply a statistical test to detect correlated recent activity among a set of nodes
13
The REWARDS Model
REneWal theory Approach for Real-time Data Streams
Time sequence: t1, t2, t3, t4, t5
[Figure: inter-arrival time distribution, with x_min and x_max marked]
14
The REWARDS Model
For each pair of nodes in the network, estimate the parameters of the renewal process that is most likely to have generated the corresponding time sequence.
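As a rough illustration of fitting a renewal process to one time sequence, the sketch below assumes exponentially distributed inter-arrival times (a Poisson process, the simplest renewal process). This is a stand-in chosen for the example: the inter-arrival distribution actually used by REWARDS, with its x_min and x_max bounds, is not recoverable from this transcript. The function names are hypothetical.

```python
def inter_arrival_times(times):
    """Return the gaps between consecutive events in a time sequence."""
    ts = sorted(times)
    return [later - earlier for earlier, later in zip(ts, ts[1:])]

def fit_exponential_rate(times):
    """Maximum-likelihood rate of an exponential inter-arrival distribution.

    ASSUMPTION: exponential gaps are used for illustration only; the
    distribution used by the REWARDS model is not given in this transcript.
    """
    gaps = inter_arrival_times(times)
    total = sum(gaps)
    if not gaps or total <= 0:
        raise ValueError("need at least two distinct timestamps")
    # For exponential gaps, the MLE of the rate is (number of gaps) / (sum of gaps).
    return len(gaps) / total

times = [0.0, 2.0, 3.0, 7.0, 8.0]
rate = fit_exponential_rate(times)
```

The estimate depends only on the number of gaps and their sum, so it can be updated incrementally as events arrive, which matches the streaming setting described in the deck.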
15
Recency
[Figure: timeline from 0 to t with events t1, t2, t3, t4, t5]
16
Recency
17
Recency
18
The L-CORE Algorithm
Local algorithm for detecting CORrelated Events
[Figure values: 1.0, 0.9, 0.3, 0.8]
19
The G-CORE Algorithm
Global algorithm for detecting CORrelated Events
2. Initialize a disjoint set data structure on the nodes
3. Run a variant of the Union-Find algorithm, keeping track of the subgraphs with highest recency
[Figure: example graph with numeric labels; a "Node set" column lists the values 0.900, 0.973, 0.500, 0.421]
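The disjoint-set idea behind these steps can be sketched as follows. This is plain Union-Find rather than the G-CORE variant, whose bookkeeping for subgraph recency is not given in this transcript. The helper `components_above`, the threshold parameter, and the edge values in the example are all hypothetical.

```python
class DisjointSet:
    """Standard Union-Find with path halving and union by size."""

    def __init__(self, nodes):
        self.parent = {n: n for n in nodes}
        self.size = {n: 1 for n in nodes}

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        return True

def components_above(nodes, edge_recency, threshold):
    """Merge nodes along edges in decreasing recency order, stopping below the threshold.

    ASSUMPTION: a fixed threshold is a simplification for illustration;
    it is not taken from the slides.
    """
    ds = DisjointSet(nodes)
    for (u, v), r in sorted(edge_recency.items(), key=lambda kv: -kv[1]):
        if r < threshold:
            break
        ds.union(u, v)
    groups = {}
    for n in nodes:
        groups.setdefault(ds.find(n), set()).add(n)
    # Report only groups joined by at least one high-recency edge.
    return [g for g in groups.values() if len(g) > 1]

nodes = ["A", "B", "C", "D", "E"]
edge_recency = {("A", "B"): 0.9, ("B", "C"): 0.75, ("C", "D"): 0.3, ("D", "E"): 0.8}
result = components_above(nodes, edge_recency, threshold=0.7)
```

Processing edges in sorted order means each edge is examined once, and with path halving and union by size each Union-Find operation runs in near-constant amortized time.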
20
Complexity
21
Robustness to Time Scale
Simulation: star network, with 100 trials of normal activity and 100 trials that include a period of correlated activity
Our approach is robust to temporal variability.
22
Detection Latency
Data: Enron corpus, ~1000 nodes and ~5000 events
The algorithms identify similar times of correlated activity, but our approach has a shorter response time.
23
Visualization
Output from the G-CORE algorithm on the Bluetooth dataset at 12:00 PM on Day 100
24
Summary of Contributions
REWARDS: a stochastic model for event-driven networks
A formal definition of recency that is time-scale invariant
L-CORE: a streaming local algorithm for detecting correlated recent activity among a given set of node pairs
G-CORE: an efficient global algorithm for detecting correlations throughout the network simultaneously
25
Discovering Functional Communities
Joint work with Linda Ness, David Shallcross, Devasis Bassu
26
Problem Description
Setup: An event-driven network, where events correspond to actions by a single node, each with an associated label
Goal: Identify functional communities of individuals who use the same labels
Challenges:
Scalability: there may be many nodes and many labels
Mixed membership: each node may be part of more than one community
27
Network Representation
Given a set of nodes and a collection of labeled events:
[Figure: nodes Paul, Rebecca, Muthu, Danfeng, Hanghang]
28
Network Representation
[Figure: nodes Hanghang, Rebecca, Paul, Danfeng, Muthu, with a bicluster marked]
29
Network Representation
[Figure: nodes Hanghang, Rebecca, Paul, Danfeng, Muthu]
30
Network Representation
[Figure: nodes Hanghang, Danfeng, Paul, Rebecca, Muthu]
31
Co-Clustering
Goal: Given a matrix, cluster the rows and columns simultaneously to reveal hidden structure
Challenges:
Don’t know the number or sizes of clusters a priori
Number of possible co-clusterings is exponential in the size of the matrix
[Figure: matrix with row clusters R1, R2 and column clusters C1, C2]
32
Related Work
Spectral methods use linear algebraic techniques such as SVD to fit a block diagonal structure
Usually require the number of clusters to be pre-specified
Likely to perform well on the matrix on the left, but not the one on the right:
[Figure: two example matrices]
33
Our Approach
1. Define a quality metric for co-clusterings that rewards large, dense biclusters
2. Find a co-clustering that maximizes the metric value
NP-hard in general, so efficient heuristics are needed
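To make "large" and "dense" concrete, the sketch below computes the size and density of one bicluster in a binary matrix. It illustrates only these two ingredients. The actual quality metric that combines them is not given in this transcript, and the function name `bicluster_stats` and the example matrix are hypothetical.

```python
def bicluster_stats(matrix, rows, cols):
    """Return (area, density) for the submatrix picked out by the given row and column indices.

    area    = number of cells in the bicluster
    density = fraction of those cells that are nonzero
    """
    area = len(rows) * len(cols)
    if area == 0:
        return 0, 0.0
    ones = sum(1 for r in rows for c in cols if matrix[r][c])
    return area, ones / area

# Rows could be nodes and columns labels; a 1 means the node used the label.
matrix = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 1],
]
small_dense = bicluster_stats(matrix, rows=[0, 1], cols=[0, 1])
larger_sparser = bicluster_stats(matrix, rows=[0, 1, 2], cols=[0, 1])
```

The two calls show the tension a metric has to resolve: adding row 2 enlarges the bicluster from 4 to 6 cells but lowers its density from 1.0 to 5/6.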
34
Choosing a Metric
[Figure labels: large, dense, Property P1, Property P2]
35
The CC-MACS Algorithm
Co-Clustering via Maximal Anti-Chain Search
1. Build randomized k-d trees on the rows and columns
2. Initialize maximal anti-chains as the leaves of each tree
3. Traverse the trees simultaneously from the bottom up, greedily merging the rows or columns that result in the greatest increase in the metric value
4. Output the co-clustering with the best metric value
45
Experiments: Synthetic Data
46
Experiments: Visual Comparison
Matrices with known structure, taken from the NIST Matrix Market repository
[Figure panels: Original Matrix, Randomly Permuted, Cross-Association]
47
Experiments: Web Memes
Meme-Tracker dataset of Leskovec et al.
Top biclusters returned by the CC-MACS algorithm:
# of Domains, # of Memes, Density | Topic
212698.2% | St. Jude Children’s Hospital
517896.1% | Brazilian news
63998.7% | Spanish news
62099.2% | Tech news
617100.0% | Politics
48
Summary of Contributions
A new class of co-clustering metrics that reward large, dense biclusters
The CC-MACS algorithm, which efficiently searches the space of possible co-clusterings for one that maximizes the value of a given metric
Advantages over existing methods:
Do not need to specify the number of clusters in advance
Not limited to matrices with a block diagonal structure
49
Modeling Collaboration in Academia
Joint work with Graham Cormode, Qiang Ma, Muthu Muthukrishnan
50
Problem Description
51
Related Work
Model one researcher’s papers and citations over time
Model as a static network: same collaborations and number of papers per year
[Figure labels: +3, +6, +9]
52
Our Approach
Model the system as a repeated game, where the researchers choose collaborators each year in an attempt to maximize their long-term academic success
Determine which sets of collaboration strategies form a game equilibrium, such that no pair of researchers would benefit from changing their strategies in order to collaborate with each other
53
Game-Theoretic Model
54
Main Results
55
Future Directions
Do there exist equilibria in the dynamic game?
Extend the model to allow mixed strategies
Analyze the game under other metrics of academic success besides the h-index
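For reference, the h-index named above is the largest h such that a researcher has at least h papers with at least h citations each. A minimal sketch of that standard definition follows. The function name and the example citation counts are illustrative, not taken from the slides.

```python
def h_index(citations):
    """Largest h such that at least h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h

example = h_index([10, 8, 5, 4, 3])
```

In the example the four most-cited papers each have at least 4 citations, while the fifth has only 3, so the h-index is 4.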
56
Summary
1. Detecting correlated events
New stochastic model to address the issue of time-scale bias
Efficiently find subgraphs with unusually high recent activity
2. Discovering functional communities
New class of metrics to reward large, dense biclusters
CC-MACS algorithm efficiently finds a good co-clustering
3. Modeling academic collaboration
Game-theoretic model allows formal analysis and simulation of collaborative behavior in a dynamic setting
57
Other Work
Measuring pairwise influence: Use the REWARDS model to measure influence between nodes based on the times of their respective activity
Innovation and circulation in information networks: Determine the most likely sources of new content, and measure the importance of each node in the diffusion process
Cascade partitioning: Infer likely threads of related content from temporal and relational information alone
58
I owe much gratitude to:
My committee: Muthu Muthukrishnan, Danfeng Yao, Rebecca Wright, Paul Kantor, and Hanghang Tong
Fred Roberts, Tami Carpenter, Tina Eliassi-Rad, and James Abello, for mentoring me over the years
My other collaborators, mentors, and friends at Rutgers, DIMACS/CCICADA, ACS, and elsewhere
The DHS Fellowship, which funded me for 3 years
Last but not least, my family and friends