Association Mining via Co-clustering of Sparse Matrices
Brian Thompson*, Linda Ness†, David Shallcross†, Devasis Bassu†



Clustering
Given: a sparse binary matrix
Goal: cluster the rows so that similar rows are in the same cluster
Challenges:
- The number of clusters is not known a priori
- The solution must be efficient; making all pairwise comparisons is too expensive
[Figure: matrix with row clusters R1, R2, R3]

Co-Clustering
Given: a sparse binary matrix
Goal: cluster the rows and columns so that they form large, dense biclusters
Challenges:
- The number of clusters is not known a priori
- The solution must be efficient; making all pairwise comparisons is too expensive
[Figure: matrix with row clusters R1, R2, R3 and column clusters C1, C2, C3]
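The slide asks for "large, dense biclusters" but does not say how density is measured. A minimal sketch, assuming density means the fraction of cells equal to 1 inside a chosen row set and column set; the function name `bicluster_density` and the dict-of-sets matrix layout are illustrative assumptions, not the authors' implementation:

```python
def bicluster_density(rows, row_ids, col_ids):
    """Fraction of 1-entries in the submatrix row_ids x col_ids.

    rows maps each row id to the set of column ids whose entry is 1,
    which is a compact way to store a sparse binary matrix.
    (Hypothetical helper; the slides do not define a density measure.)
    """
    col_set = set(col_ids)
    if not row_ids or not col_set:
        return 0.0
    # Count the 1-entries of each selected row that fall in the selected columns.
    ones = sum(len(rows.get(r, set()) & col_set) for r in row_ids)
    return ones / (len(row_ids) * len(col_set))


# Toy sparse binary matrix: r1 and r2 share columns, r3 is unrelated.
matrix = {
    "r1": {"c1", "c2"},
    "r2": {"c1", "c2", "c3"},
    "r3": {"c5"},
}
density = bicluster_density(matrix, ["r1", "r2"], ["c1", "c2"])
```

Storing only the nonzero columns per row keeps the cost of scoring a candidate bicluster proportional to the number of 1-entries in its rows rather than to the full matrix size.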

The 1-minute talk
What do we want to do?

Event | Timestamp
Alice: “Go to WSDM!” | Feb. 7, :45pm
Bob → Chris: (private) | Feb. 8, :30am
Chris: | 8, :37am
[Figure: edge-centric and node-centric views of a network over Alice, Bob, Chris, Dave, and Eve]

The 2-minute talk
What is our approach?

Problem Description
Given: a network (G; T), where
- G = (V, E) is a graph
- T is a set of discrete-event sequences corresponding to elements of G
Goals:
- Identify recent correlated activity
- Measure the influence of one entity on another
Challenges:
- Scalability: comparing every set, or even every pair, of entities is too expensive
- Variability: different entities have very different properties
[Figure: a discrete-event sequence]

Approach
[Figure: event timelines for users alice1337 and bob_iz_kool from 8:00 am through 10:00 am and 12:00 pm to now, annotated with recency and pairwise gap; inter-arrival time distribution on [x_min, x_max]]

The 5-minute talk
How does our model address temporal variability in a network?

The REWARDS Model (REneWal theory Approach for Real-time Data Streams)
We model a stream of communication data as a renewal process: a sequence of time-stamped events sampled from a distribution of inter-arrival times (IATs).
[Figure: discrete-event sequence t1, t2, t3, t4, t5; inter-arrival time distribution on [x_min, x_max]]
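To make the renewal-process definition concrete, the sketch below generates a discrete-event sequence by drawing independent inter-arrival times and accumulating them into timestamps. The slides do not state the shape of the IAT distribution, so a uniform distribution on [x_min, x_max] is assumed here purely for illustration; `simulate_renewal` and its parameters are hypothetical names:

```python
import random


def simulate_renewal(n_events, x_min, x_max, start=0.0, seed=None):
    """Generate n_events timestamps of a renewal process.

    Each inter-arrival time is drawn independently from the same
    distribution, here Uniform(x_min, x_max) as an assumed stand-in.
    """
    rng = random.Random(seed)
    t = start
    times = []
    for _ in range(n_events):
        t += rng.uniform(x_min, x_max)  # next event = previous event + fresh IAT
        times.append(t)
    return times


times = simulate_renewal(5, x_min=1.0, x_max=6.0, seed=0)
```

The defining property is that successive gaps are independent and identically distributed; any other distribution on positive values could replace the uniform draw.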

The REWARDS Model (REneWal theory Approach for Real-time Data Streams)
Given a stream of time-stamped events, we estimate the parameters of the renewal process for each node or edge based on its event inter-arrival times.
[Figure: discrete-event sequence t1, t2, t3, t4, t5; inter-arrival time distribution on [x_min, x_max]]
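The estimation step can be sketched as follows. The slides do not specify which parameters are estimated or how; since the figure labels the distribution's support with x_min and x_max, this example assumes those two are the parameters and uses the smallest and largest observed gaps as naive estimates. Both function names are hypothetical:

```python
def inter_arrival_times(timestamps):
    """Return the gaps between consecutive events, after sorting by time."""
    ts = sorted(timestamps)
    return [b - a for a, b in zip(ts, ts[1:])]


def estimate_iat_range(timestamps):
    """Naive estimate of (x_min, x_max): the observed minimum and maximum IAT.

    This is an assumed estimator for illustration, not the authors' method.
    """
    iats = inter_arrival_times(timestamps)
    if not iats:
        raise ValueError("need at least two events to observe an inter-arrival time")
    return min(iats), max(iats)


# Event times for one node or edge, in arbitrary time units.
events = [0.0, 4.0, 5.0, 11.0, 13.0]
iats = inter_arrival_times(events)
x_min, x_max = estimate_iat_range(events)
```

Running the estimator separately per node or edge is what lets each entity have its own notion of "recent" or "unusually quiet".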

Recency
[Figure: event timelines for users alice1337 and bob_iz_kool from 8:00 am through 10:00 am and 12:00 pm to now]

Pairwise Gaps
[Figure: event timelines for users alice1337 and bob_iz_kool from 8:00 am through 10:00 am and 12:00 pm to now]

Divergence
Based on the Kolmogorov-Smirnov (KS) statistic, which compares the empirical distribution function (EDF) F_n(x) to a hypothetical CDF F(x):
- Recency divergence compares recency values for a set of nodes or edges to the Triangle(0,1) distribution
- Gap divergence compares pairwise (A,B)-gaps to the theoretical distribution if A and B were independent
[Figure: EDF F_n(x) plotted against CDF F(x); KS = 0.32]
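The one-sample KS statistic named on this slide is the largest vertical distance between the EDF of the observations and the reference CDF. The sketch below computes it for any CDF passed in as a function. The slide does not give the exact form of Triangle(0,1), so `triangle_cdf` assumes a triangular distribution on [0, 1] with its mode at 0 (density 2(1 - x)); that choice, like the function names, is an assumption for illustration only:

```python
def ks_statistic(sample, cdf):
    """One-sample Kolmogorov-Smirnov statistic: sup over x of |F_n(x) - F(x)|."""
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must be non-empty")
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The EDF jumps from i/n to (i+1)/n at x; check both sides of the jump.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def triangle_cdf(x):
    """CDF of an assumed triangular distribution on [0, 1] with mode 0."""
    x = min(max(x, 0.0), 1.0)
    return 2 * x - x * x


def uniform_cdf(x):
    """CDF of Uniform(0, 1), used here only as a simple reference case."""
    return min(max(x, 0.0), 1.0)


recency_values = [0.1, 0.2, 0.3, 0.4]
d_uniform = ks_statistic(recency_values, uniform_cdf)
d_triangle = ks_statistic(recency_values, triangle_cdf)
```

A larger statistic means the observed values deviate more from the reference distribution, which is the sense in which the slide uses it as a divergence.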

LBNL Case Study

Acknowledgements/Disclaimer
This research was supported by the Intelligence Advanced Research Projects Activity (IARPA) via Air Force Research Laboratory (AFRL) contract number FA C-706. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, AFRL, or the U.S. Government. Any misinformation, mistakes, or misunderstanding resulting from this talk are solely the fault of the speaker.