Association Mining via Co-clustering of Sparse Matrices Brian Thompson *, Linda Ness †, David Shallcross †, Devasis Bassu † *†


1 Association Mining via Co-clustering of Sparse Matrices Brian Thompson *, Linda Ness †, David Shallcross †, Devasis Bassu † *†

2 Clustering
Given: a sparse binary matrix
Goal: cluster the rows so that similar rows are in the same cluster
Challenges:
  The number of clusters is not known a priori
  The solution must be efficient; making all pairwise comparisons is too expensive
[Figure: matrix rows grouped into clusters R1, R2, R3]

3 Co-Clustering
Given: a sparse binary matrix
Goal: cluster the rows and columns so that they form large, dense biclusters
Challenges:
  The number of clusters is not known a priori
  The solution must be efficient; making all pairwise comparisons is too expensive
[Figure: matrix with row clusters R1, R2, R3 and column clusters C1, C2, C3]
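To make "dense bicluster" concrete, here is a minimal sketch that scores one candidate bicluster by the fraction of its cells that hold a 1. The function name `bicluster_density` and the set-of-coordinates representation of the sparse matrix are assumptions made for illustration; the slides do not say how the authors score or search for biclusters.

```python
from typing import Iterable, Set, Tuple


def bicluster_density(
    ones: Iterable[Tuple[int, int]],
    row_cluster: Set[int],
    col_cluster: Set[int],
) -> float:
    """Fraction of cells in (row_cluster x col_cluster) that contain a 1.

    `ones` lists the (row, column) coordinates of the nonzero entries,
    which is a natural way to store a sparse binary matrix.
    """
    area = len(row_cluster) * len(col_cluster)
    if area == 0:
        return 0.0
    hits = sum(1 for r, c in ones if r in row_cluster and c in col_cluster)
    return hits / area


# Hypothetical 3x3 matrix with a full 2x2 block and one isolated entry.
ones = {(0, 0), (0, 1), (1, 0), (1, 1), (2, 2)}
dense_block = bicluster_density(ones, {0, 1}, {0, 1})
whole_matrix = bicluster_density(ones, {0, 1, 2}, {0, 1, 2})
```

The cost is linear in the number of nonzero entries, so a score like this stays cheap on sparse data. Finding good row and column clusters in the first place is the hard part and is not attempted here.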

4 The 1-minute talk: What do we want to do?

5 Event | Timestamp
Alice: “Go to WSDM!” | Feb. 7, 2012 9:45pm
Bob → Chris: (private) | Feb. 8, 2012 12:30am
Chris: “RT @Alice...” | Feb. 8, 2012 12:37am
[Figure: edge-centric and node-centric views of the network among Alice, Bob, Chris, Dave, and Eve]


7 The 2-minute talk: What is our approach?

8 Problem Description
Given: a network (G;T)
  G = (V,E) is a graph
  T is a set of discrete-event sequences corresponding to elements of G
Goals:
  Identify recent correlated activity
  Measure the influence of one entity on another
Challenges:
  Scalability: comparing every set or even pair of entities is too expensive
  Variability: different entities have very different properties
[Figure: a discrete-event sequence]
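One way to picture the network (G;T) is as a graph plus a table of sorted timestamps keyed by node or edge. The sketch below is only an illustration of that idea; the class name `EventNetwork`, its methods, and the use of plain floats for timestamps are all assumptions and do not come from the talk.

```python
import bisect
from dataclasses import dataclass, field
from typing import Dict, Hashable, List, Set, Tuple


@dataclass
class EventNetwork:
    """A graph G = (V, E) together with T, one event sequence per element."""

    nodes: Set[Hashable] = field(default_factory=set)
    edges: Set[Tuple[Hashable, Hashable]] = field(default_factory=set)
    # Maps a node, or a (source, target) edge tuple, to its sorted timestamps.
    events: Dict[Hashable, List[float]] = field(default_factory=dict)

    def add_node_event(self, node: Hashable, timestamp: float) -> None:
        self.nodes.add(node)
        bisect.insort(self.events.setdefault(node, []), timestamp)

    def add_edge_event(self, src: Hashable, dst: Hashable, timestamp: float) -> None:
        self.nodes.update((src, dst))
        self.edges.add((src, dst))
        bisect.insort(self.events.setdefault((src, dst), []), timestamp)


# Hypothetical events, with timestamps given as hours since some origin.
net = EventNetwork()
net.add_node_event("alice", 3.0)
net.add_node_event("alice", 1.0)
net.add_edge_event("bob", "chris", 2.0)
```

Keeping each sequence sorted on insertion means later steps, such as computing inter-arrival times, can read the list directly. The sketch assumes node names are not themselves tuples, so node keys and edge keys cannot collide.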

9 Approach
[Figure: event timelines for users alice1337 and bob_iz_kool on a time axis (8:00 am, 10:00 am, 12:00 pm, NOW!), labeled with recency and pairwise gap, next to an inter-arrival time distribution with x min and x max marked]

10 The 5-minute talk: How does our model address temporal variability in a network?

11 The REWARDS Model: REneWal theory Approach for Real-time Data Streams
We model a stream of communication data as a renewal process: a sequence of time-stamped events sampled from a distribution of inter-arrival times (IATs).
[Figure: discrete-event sequence t1, t2, t3, t4, t5; inter-arrival time distribution with x min and x max marked]

12 The REWARDS Model: REneWal theory Approach for Real-time Data Streams
Given a stream of time-stamped events, we estimate the parameters of the renewal process for each node or edge based on its event inter-arrival times.
[Figure: discrete-event sequence t1, t2, t3, t4, t5; inter-arrival time distribution with x min and x max marked]
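The first step of that estimation, turning one entity's timestamps into inter-arrival times, can be sketched as below. The summary that follows it (smallest gap, largest gap, mean) is a deliberately simple stand-in chosen for illustration; the slides do not specify which distribution family or estimator the REWARDS model uses, and the names `inter_arrival_times` and `summarize_iats` are hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def inter_arrival_times(timestamps: Sequence[float]) -> List[float]:
    """Gaps between consecutive events, after sorting the timestamps."""
    ts = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ts, ts[1:])]


def summarize_iats(timestamps: Sequence[float]) -> Optional[Dict[str, float]]:
    """Empirical summary of the IATs; None when fewer than two events exist.

    Assumption: the observed extremes are used as crude estimates of
    x_min and x_max. This is not necessarily the authors' estimator.
    """
    iats = inter_arrival_times(timestamps)
    if not iats:
        return None
    return {
        "x_min": min(iats),
        "x_max": max(iats),
        "mean": sum(iats) / len(iats),
    }


# Hypothetical event times t1..t4 for one node, in hours.
iats = inter_arrival_times([0.0, 2.0, 3.0, 7.0])
summary = summarize_iats([0.0, 2.0, 3.0, 7.0])
```

Running the summary once per node or edge yields a separate set of parameters for each entity, which is one way to accommodate entities with very different activity levels.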

13 Recency
[Figure: event timelines for users alice1337 and bob_iz_kool on a time axis (8:00 am, 10:00 am, 12:00 pm, NOW!)]

14 Pairwise Gaps
[Figure: event timelines for users alice1337 and bob_iz_kool on a time axis (8:00 am, 10:00 am, 12:00 pm, NOW!)]

15 Divergence
Based on the Kolmogorov-Smirnov (KS) statistic, which compares the empirical distribution function (EDF) F_n(x) to a hypothetical CDF F(x):
  Recency divergence compares recency values for a set of nodes or edges to the Triangle(0,1) distribution
  Gap divergence compares pairwise (A,B)-gaps to the distribution expected if A and B were independent
[Figure: EDF F_n(x) plotted against CDF F(x), KS = 0.32]
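The one-sample KS statistic is the largest vertical distance between the EDF and the reference CDF, D_n = sup_x |F_n(x) - F(x)|. A minimal sketch follows. The function name `ks_statistic` is an assumption, and the uniform CDF in the usage lines is only a placeholder reference: it is not the Triangle(0,1) distribution or the independence baseline from the slide, whose exact forms are not given here.

```python
from typing import Callable, Sequence


def ks_statistic(sample: Sequence[float], cdf: Callable[[float], float]) -> float:
    """One-sample Kolmogorov-Smirnov statistic against a reference CDF.

    The EDF is a step function, so the supremum is attained at a sample
    point, either just before or just after the step at that point.
    """
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must be non-empty")
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # i / n is the EDF just after x; (i - 1) / n is the EDF just before x.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def uniform_cdf(x: float) -> float:
    """Placeholder reference: CDF of the uniform distribution on [0, 1]."""
    return min(1.0, max(0.0, x))


# Hypothetical values in [0, 1], all in the lower part of the range.
divergence = ks_statistic([0.1, 0.2, 0.3, 0.4], uniform_cdf)
```

To compute a divergence in the sense of the slide, `cdf` would be replaced by the appropriate reference distribution, and `sample` by the recency values or (A,B)-gaps for the chosen set of nodes or edges.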

16 LBNL Case Study

17 Acknowledgements/Disclaimer
This research was supported by the Intelligence Advanced Research Projects Activity (IARPA) via Air Force Research Laboratory (AFRL) contract number FA8650-10-C-706. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, AFRL, or the U.S. Government. Any misinformation, mistakes, or misunderstanding resulting from this talk are solely the fault of the speaker.


