1
Copyright 2006, Data Mining Research Laboratory
An Event-based Framework for Characterizing the Evolutionary Behavior of Interaction Graphs
Sitaram Asur, Srinivasan Parthasarathy and Duygu Ucar
Department of Computer Science, The Ohio State University
2
Motivation
Interaction Networks
–Represent scientific data from various domains
–Nodes represent entities
–Edges represent interactions among entities
–Examples:
  Biological networks: protein-protein interaction (PPI) networks, gene expression networks
  Collaboration networks
  Social networks, online communities, blog networks
Figures: protein-protein interactions in yeast (Jeong et al., 2001); physicist collaboration network (Newman and Girvan, 2004)
3
Motivation
Mining interaction networks is important
–Gain insight into the structure, properties and behavior of these networks [Newman, 2001]
The modular nature of interaction networks is important
–Co-expression networks: dense components -> functional modules
–Social networks: clusters -> community structure
4
Motivation
A large number of earlier approaches focused on mining static interaction networks
Many important real-world networks are dynamic
Figure: temporal protein interaction network of the yeast mitotic cell cycle (Ulrik de Lichtenberg et al., Science 307, 724 (2005))
5
Motivation
Dynamic Interaction Networks
–Nodes and interactions change over time
–The structure of the network changes
Need for a structured method to characterize and model evolution
–Understand the nature of change (evolution) in networks
–Consider the evolution of individuals and communities
–Develop models for reasoning about and inferring future events
6
Workflow
Evolving Graph -> Temporal Snapshots (S_i, S_{i+1}) -> Clustering (C_i, C_{i+1}) -> Event Detection -> Behavioral Patterns -> Analysis and Inference (iterate over i)
7
Temporal Snapshots
Split the graph data into non-overlapping temporal snapshots
–Each snapshot corresponds to a graph
–It consists of all nodes and interactions active in that time period
–A node is active if it has an interaction in that time period
Figure: two example snapshots, T1 and T2, over nodes A to G
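The snapshot step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: interactions are taken to be `(u, v, t)` triples with a numeric timestamp, and the names `split_into_snapshots`, `start` and `width` are hypothetical, not taken from the slides.

```python
from collections import defaultdict


def split_into_snapshots(interactions, start, width):
    """Bucket (u, v, t) interactions into non-overlapping windows.

    Window i covers timestamps in [start + i*width, start + (i+1)*width).
    A node belongs to a snapshot only if it has an interaction in that window.
    """
    snapshots = defaultdict(lambda: {"nodes": set(), "edges": set()})
    for u, v, t in interactions:
        index = (t - start) // width          # which window this interaction falls in
        snapshot = snapshots[index]
        snapshot["edges"].add(frozenset((u, v)))  # undirected edge
        snapshot["nodes"].update((u, v))          # both endpoints are active
    return dict(sorted(snapshots.items()))


# Hypothetical yearly snapshots of a small co-authorship log.
log = [("A", "B", 1997), ("B", "C", 1997), ("A", "D", 1998)]
yearly = split_into_snapshots(log, start=1997, width=1)
```

Because activity is defined per window, node C above is part of the 1997 snapshot but absent from the 1998 one.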
8
Clustering
Represent the snapshot graphs using clusters
–Clusters of a graph can provide structural information
–Examine the evolution of clusters over time
–Can provide insight into corresponding changes to the graph
–The MCL clustering algorithm is employed in this work
–Ensemble clustering approaches can be employed to obtain robust clusters (Asur et al., ISMB 2007)
Figure: the clustered snapshots T1 and T2 over nodes A to G
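For readers unfamiliar with MCL (the Markov Cluster algorithm), the following is a compact pure-Python sketch of its core loop: add self-loops, normalise columns, then alternate expansion (matrix squaring) and inflation (element-wise power followed by renormalisation). It is not the authors' implementation; the parameter values, the convergence test and the cluster read-out threshold are illustrative assumptions, and it is meant only for small dense matrices.

```python
def _normalise_columns(m):
    """Scale every column of a square matrix so that it sums to 1."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def _matmul(a, b):
    """Plain O(n^3) product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(adjacency, inflation=2.0, max_iter=100, tol=1e-9):
    """Cluster a graph given as a non-empty square adjacency matrix."""
    n = len(adjacency)
    # Self-loops keep every column sum positive.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = _normalise_columns(m)
    for _ in range(max_iter):
        expanded = _matmul(m, m)                                # expansion
        inflated = _normalise_columns(
            [[x ** inflation for x in row] for row in expanded])  # inflation
        change = max(abs(inflated[i][j] - m[i][j])
                     for i in range(n) for j in range(n))
        m = inflated
        if change < tol:
            break
    # Rows with mass on the diagonal act as attractors; the non-zero
    # entries of such a row are the members of its cluster.
    clusters = set()
    for i in range(n):
        if m[i][i] > 1e-6:
            clusters.add(frozenset(j for j in range(n) if m[i][j] > 1e-6))
    return clusters


# Two disjoint triangles: nodes 0-2 and nodes 3-5.
two_triangles = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
found = mcl(two_triangles)
```

A higher `inflation` value generally yields finer clusters; production use would call for a sparse-matrix implementation rather than this dense version.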
9
Community-based Event Detection
Events: Continue, Merge, Split, Form, Dissolve
Figure: example clusters at T=1 through T=6 illustrating each event
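The slide shows the five community events only pictorially, so the set-based sketch below rests on assumed definitions: Continue means identical membership; Merge and Split use a hypothetical overlap threshold `kappa` plus a "more than half of each part" condition; Form and Dissolve mean that no two members were, or remain, clustered together. The function and parameter names are illustrative, and clusters are assumed to be non-empty.

```python
from itertools import combinations


def community_events(prev, curr, kappa=0.5):
    """Compare two snapshots, each a dict: cluster name -> set of nodes."""
    events = []

    # Continue: a cluster keeps exactly the same members.
    for a, members_a in prev.items():
        for b, members_b in curr.items():
            if members_a == members_b:
                events.append(("continue", a, b))

    def absorbed(parts, whole):
        # The union of the parts overlaps `whole` by more than kappa, and
        # more than half of each part lies inside `whole`.
        union = set().union(*(members for _, members in parts))
        share = len(union & whole) / max(len(union), len(whole))
        return share > kappa and all(
            len(members & whole) > len(members) / 2 for _, members in parts)

    # Merge: two earlier clusters are absorbed by one later cluster.
    for pair in combinations(prev.items(), 2):
        for b, members_b in curr.items():
            if absorbed(pair, members_b):
                events.append(("merge", (pair[0][0], pair[1][0]), b))

    # Split: one earlier cluster breaks into two later clusters.
    for pair in combinations(curr.items(), 2):
        for a, members_a in prev.items():
            if absorbed(pair, members_a):
                events.append(("split", a, (pair[0][0], pair[1][0])))

    # Form: no two members of a new cluster shared a cluster before.
    for b, members_b in curr.items():
        if all(len(members_a & members_b) <= 1 for members_a in prev.values()):
            events.append(("form", b))

    # Dissolve: no two members of an old cluster stay together.
    for a, members_a in prev.items():
        if all(len(members_a & members_b) <= 1 for members_b in curr.values()):
            events.append(("dissolve", a))

    return events


before = {"C1": {1, 2, 3}, "C2": {4, 5, 6}, "C3": {7, 8}}
after = {"D1": {1, 2, 3, 4, 5, 6}, "D2": {7, 8}}
detected = community_events(before, after)
```

In this toy example C1 and C2 merge into D1 while C3 continues as D2.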
10
Entity-based Event Detection
Events: Appear, Disappear, Join, Leave
Figure: example clusters at T=1 through T=4, with nodes A and B illustrating each event
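The four entity events can be sketched with set differences. The definitions here are assumptions, since the slide gives only a picture: a node appears or disappears according to whether it is active in a snapshot, and Join and Leave are measured against a later cluster that retains more than half of an earlier cluster's members. The name `entity_events` is hypothetical.

```python
def entity_events(prev, curr):
    """Node-level events between two snapshots (dict: cluster name -> set)."""
    prev_nodes = set().union(*prev.values())
    curr_nodes = set().union(*curr.values())
    events = {
        "appear": curr_nodes - prev_nodes,     # active now, not before
        "disappear": prev_nodes - curr_nodes,  # active before, not now
        "join": set(),
        "leave": set(),
    }
    for a, members_a in prev.items():
        for b, members_b in curr.items():
            # Treat b as the successor of a if it keeps most of a's members.
            if len(members_a & members_b) > len(members_a) / 2:
                events["join"] |= {(v, b) for v in members_b - members_a}
                events["leave"] |= {(v, a) for v in members_a - members_b}
    return events


before = {"C1": {"A", "X", "Y"}, "C2": {"B", "Z"}}
after = {"D1": {"X", "Y", "B"}, "D2": {"Z", "N"}}
changes = entity_events(before, after)
```

Under these definitions a node that disappears from the network also counts as leaving its cluster, as node A does above.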
11
Event Detection
Represent each set of snapshot clusters as a $k \times N$ binary cluster-membership matrix
Use bitwise operators to compute the events between each successive pair of matrices (snapshots)
Example: Continue event
$\mathrm{Continue}(C_j, C_k) = \big[\mathrm{AND}(S_i(j), S_{i+1}(k)) = \mathrm{OR}(S_i(j), S_{i+1}(k))\big]$
The event detection algorithm is linear in the number of nodes in the graph: $O(N)$
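The Continue test from this slide can be illustrated with Python integers used as bit vectors, one integer per row of the cluster-membership matrix. The helper names and the node-to-bit mapping are assumptions made for this sketch; only the AND == OR comparison comes from the slide.

```python
def membership_rows(clusters, node_index):
    """Encode each cluster as an integer whose bit n is set if node n is a member."""
    rows = []
    for members in clusters:
        row = 0
        for node in members:
            row |= 1 << node_index[node]
        rows.append(row)
    return rows


def continue_events(rows_i, rows_next):
    """Return (j, k) pairs where cluster j of S_i continues as cluster k of S_{i+1}.

    AND equals OR exactly when the two rows are identical bit for bit.
    """
    return [(j, k)
            for j, a in enumerate(rows_i)
            for k, b in enumerate(rows_next)
            if a and (a & b) == (a | b)]


node_index = {name: bit for bit, name in enumerate("ABCDEFG")}
s_i = membership_rows([{"A", "B", "C"}, {"D", "E"}], node_index)
s_next = membership_rows([{"D", "E"}, {"A", "B", "F"}], node_index)
continued = continue_events(s_i, s_next)
```

Here only the cluster {D, E} continues, so the result pairs row 1 of the first snapshot with row 0 of the second.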
12
Temporal Analysis
Use critical events for analysis
Form and Dissolve events
–Used to study group formation and dissipation
Merge and Split events
–Evolution of groups
Continue events
–Stability of clusters/groups
–Evolution of topics in a collaboration network
13
Behavioral Analysis
Use the discovered entity-based critical events to compose incremental measures that capture behavioral patterns
These behavioral measures can then be used to analyze the evolutionary behavior of nodes and clusters
Four behavioral measures
–Stability Index
–Sociability Index
–Popularity Index
–Influence Index
14
Case Study 1: DBLP Collaboration Network
Data from 28 key conferences in databases, data mining and AI over 10 years
Authors (nodes) connected by collaborations (edges)
23,136 nodes and 54,989 edges
Collaboration networks display many of the structural features of social networks (Kempe, Kleinberg and Tardos, 2003; Newman, 2001)
15
Case Study 2: Clinical Trials Network
Clinical trials
–Can provide information on risks, benefits and optimal dosage levels
–Consist of observations of patients on the drug as well as some on placebo
–Generally represented as a set of multivariate time series
Evolving clinical trials network
–Nodes represent patients
–Correlations among patients are modeled as edges
–Edges change over time as correlations change
Motivation: use the evolution of correlations to identify potential toxic effects of drugs
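One way to picture the correlation-based edges is sketched below. The slides do not say which correlation measure or cut-off was used, so Pearson correlation, the absolute-value test and the `threshold` of 0.8 are all assumptions, as are the function names and the toy measurements.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def correlation_edges(series, threshold=0.8):
    """Connect two patients when their series correlate strongly in this window."""
    return {frozenset((p, q))
            for p, q in combinations(series, 2)
            if abs(pearson(series[p], series[q])) >= threshold}


# Hypothetical measurements for three patients within one time window.
window = {"p1": [1, 2, 3, 4], "p2": [2, 4, 6, 8], "p3": [4, 1, 3, 2]}
edges = correlation_edges(window)
```

Recomputing the edge set for each time window yields one snapshot graph per window, so edges come and go as correlations change.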
16
Stability Index
Propensity of a node to interact with the same group of people over time
The stability of a node over time is computed incrementally from the stability of the clusters it belongs to
17
Stability for Clinical Trials Data
Nodes with low Stability Index values represent patients with fluctuating correlation values (outliers)
Null hypothesis:
–If the drug does not result in toxicity, then outliers are likely to be flagged at random from each group (drug and placebo)
Experiment on a clinical trials network for diabetes patients
–19 nodes (patients) were found to have a Stability Index below the threshold
–The drug under study was discontinued due to possible toxic effects
18 out of the 19 were on the drug!
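To get a feel for how unlikely an 18-of-19 split is under the null hypothesis, a one-sided binomial tail can be computed. The slides do not give the sizes of the drug and placebo groups, so the probability of 0.5 used here (equal-sized groups) is purely an assumption, and the result is illustrative rather than a reported statistic.

```python
from math import comb


def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Assumed: each flagged outlier is equally likely to come from either group.
p_value = binomial_tail(n=19, k=18, p=0.5)
```

With these assumptions the tail is 20 / 2^19, roughly 3.8e-5, so flagging at random is a poor explanation for the observed split.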
18
Sociability Index
Incremental measure of the different interactions a node participates in
Opposite of the Stability Index
Does not represent degree!
19
Sociability Index for Community Prediction
Goal: identify future cluster co-occurrences from historical data for the DBLP dataset
Key intuition: if two authors have high sociability and have not yet collaborated (not been clustered together), there is a high chance they will
Setup: use the data for 1997-2001 to predict cluster co-occurrences for 2002-2006
20
Experimental Results
Comparison with other measures (Liben-Nowell and Kleinberg, CIKM 2003)
–Common Neighbors
–Adamic-Adar
–Jaccard
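The three baseline measures have standard definitions in the link-prediction literature, sketched here for an undirected graph stored as a dict of neighbour sets. The function names and the toy graph are illustrative; the slides do not show how the comparison was implemented.

```python
from math import log


def common_neighbors(adj, u, v):
    """Number of neighbours shared by u and v."""
    return len(adj[u] & adj[v])


def jaccard(adj, u, v):
    """Shared neighbours divided by all neighbours of either node."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def adamic_adar(adj, u, v):
    """Shared neighbours weighted by 1 / log(degree), so rare ones count more.

    For distinct u and v, a shared neighbour has degree >= 2, so log(...) > 0.
    """
    return sum(1.0 / log(len(adj[z])) for z in adj[u] & adj[v])


# Toy undirected graph (every edge is listed in both directions).
graph = {
    "a": {"c", "d"},
    "b": {"c", "d", "e"},
    "c": {"a", "b"},
    "d": {"a", "b", "e"},
    "e": {"b", "d"},
}
scores = (common_neighbors(graph, "a", "b"),
          jaccard(graph, "a", "b"),
          adamic_adar(graph, "a", "b"))
```

All three score a candidate pair from static neighbourhood structure only, which is what distinguishes them from a measure built on events over time.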
21
Popularity Index
Measure of the attraction of nodes to a cluster
Influence measure of a cluster
Does not reflect the size of the cluster
DBLP dataset
–Can be used to identify hot topics
–If a large number of nodes join a cluster and they all work on a similar topic, it indicates a buzz around that topic for that year
22
Application of Popularity Index
Example: XML
Year 1999: 3 authors (XML and web applications)
Year 2000: 50 joins
–30 of these authors published papers on XML
23
Influence Index
Measure of the influence of a node on others
Influence in terms of participation in critical events
Influence of a node initially computed as: [equation not preserved in this transcript]
Follower nodes need to be pruned, unless [condition not preserved in this transcript]
24
Top Influential Authors – DBLP dataset
25
Diffusion Models
Study the spread of information in an evolving interaction network (Kempe et al., 2003, 2005)
–Nodes are activated with information
–Newly activated nodes become contagious briefly
–Information propagates through the network
–An activation function maps the weights of a node's links to determine whether it is activated
  SUM activation: if the sum of weights > threshold, activate
  MAX activation: if any single weight > threshold, activate
Figure: activation spreading over time steps t1 to t4
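The SUM and MAX activation rules can be sketched as a small simulation. Several points are assumptions rather than details from the slides: "contagious briefly" is read as "contagious for exactly one step", the graph is static and given as incoming link weights, and the names `diffuse`, `in_weights` and `steps` are hypothetical.

```python
def diffuse(in_weights, seeds, threshold, mode="sum", steps=10):
    """Spread activation from `seeds` and return the set of active nodes.

    in_weights maps a node to {source: weight} for its incoming links.
    Only nodes activated in the previous step are contagious.
    """
    active = set(seeds)
    contagious = set(seeds)
    for _ in range(steps):
        newly_active = set()
        for node, incoming in in_weights.items():
            if node in active:
                continue
            weights = [w for source, w in incoming.items() if source in contagious]
            if not weights:
                continue
            score = sum(weights) if mode == "sum" else max(weights)
            if score > threshold:
                newly_active.add(node)
        if not newly_active:
            break
        active |= newly_active
        contagious = newly_active   # earlier nodes stop being contagious
    return active


links = {"c": {"a": 0.3, "b": 0.3}, "d": {"c": 0.6}, "e": {"c": 0.2}}
by_sum = diffuse(links, seeds={"a", "b"}, threshold=0.5, mode="sum")
by_max = diffuse(links, seeds={"a", "b"}, threshold=0.5, mode="max")
```

In this toy network SUM activation reaches c and then d, whereas MAX activation never gets past the seeds because no single link exceeds the threshold.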
26
Diffusion Models – Influence Maximization
Influence maximization problem: find the initial set of nodes that activates the largest number of nodes over a time period
–Critical in applications such as viral marketing and epidemiological research
–Complicated in dynamic interaction networks because the network changes over time
Need for dynamic measures that reflect the current status of the network
–Sociability Index used to weight links
  Highly sociable nodes have a high propensity to pass on information
–Influence Index used to determine the initial set of active nodes
–Comparison with random choice of nodes and degree-based selection (Wasserman and Faust, 1994)
27
Conclusions
Most real-world graphs are dynamic in nature
–Need for analysis, reasoning and inference
–Proposed an event-based framework
  Clusters capture structure at different snapshots
  Critical events over clusters identify dynamic properties of graphs
  Behavioral patterns are incrementally composed from critical events
–The proposed method is useful in many application domains
  Protein function prediction, drug design, recommender systems, viral marketing, epidemiology
Workflow recap: Temporal Snapshots -> Clustering -> Event Detection -> Behavioral Patterns -> Analysis and Inference
28
Future Directions
Extensions to large interaction graphs
Use of semantic information for reasoning and inference
–Merge and Split events
  If two clusters have high semantic similarity, the probability of a Merge is high
–Continue events
  Track the evolution of topics
  Sequences of Form, Continue, Continue …
Multi-scale temporal modeling
–Analyze snapshots of different granularity
29
Poster #36, this evening (Mon 13th Aug, 6:15 – 9:15 pm)
This work was supported by the following grants:
–DOE Early Career Principal Investigator Award No. DE-FG02-04ER25611
–NSF CAREER Grant IIS-0347662
Contacts:
–Sitaram Asur: asur@cse.ohio-state.edu
–Dr Srinivasan Parthasarathy: srini@cse.ohio-state.edu
–Duygu Ucar: ucar@cse.ohio-state.edu
Group webpage: http://dmrl.cse.ohio-state.edu
Thanks!