Discovering Functional Communities in Social Media

Presentation transcript:

Discovering Functional Communities in Social Media
Brian Thompson, Linda Ness, David Shallcross, Devasis Bassu
Workshop on Data Mining in Social Networks (DMSN) @ ICDM 2015

Problem Description
Setup: Social media users share or endorse content such as products, artists, news, and music.
Goal: Identify functional communities of users who share or endorse common content.
Challenges:
  Scalability: there may be many users and a large volume of content.
  Mixed membership: each user may belong to more than one community.

Examples
Memes shared by users of online social media.
Products endorsed by users of a recommender system.
[Image credits: Yannick Feder; http://blog.soton.ac.uk/hive]

Data Representation
[Figure: example matrix with rows Muthu, Danfeng, Paul, Rebecca, Hanghang and columns A, B, C, D]
A bicluster $B$ is formed by the intersection of a set of rows with a set of columns.
Permuting rows and columns can reveal latent structure. In the permuted view of the same matrix, the rows are ordered Muthu, Rebecca, Paul, Danfeng, Hanghang and the columns A, C, B, D.
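The effect of permuting rows and columns can be sketched in a few lines of Python. The row and column names and the permuted orders come from the slides; the 0/1 cell values are hypothetical, because the transcript does not preserve the figure's contents, and `permute` is an illustrative helper rather than anything from the authors' code.

```python
# Hypothetical 0/1 data: the slide's actual cell values are not in the transcript.
rows = ["Muthu", "Danfeng", "Paul", "Rebecca", "Hanghang"]
cols = ["A", "B", "C", "D"]
matrix = [
    [1, 0, 1, 0],  # Muthu
    [0, 1, 0, 1],  # Danfeng
    [0, 1, 0, 1],  # Paul
    [1, 0, 1, 0],  # Rebecca
    [0, 1, 0, 1],  # Hanghang
]


def permute(matrix, rows, cols, row_order, col_order):
    """Return a copy of the matrix with rows and columns in the given label orders."""
    r_idx = [rows.index(r) for r in row_order]
    c_idx = [cols.index(c) for c in col_order]
    return [[matrix[i][j] for j in c_idx] for i in r_idx]


# Orders taken from the slide's permuted view.
row_order = ["Muthu", "Rebecca", "Paul", "Danfeng", "Hanghang"]
col_order = ["A", "C", "B", "D"]
permuted = permute(matrix, rows, cols, row_order, col_order)

for name, row in zip(row_order, permuted):
    print(f"{name:9s}", row)
```

With these made-up values, the permuted matrix shows two dense blocks on the diagonal that are hard to see in the original ordering.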

Co-Clustering
Goal: Given a matrix, cluster the rows and columns simultaneously to reveal hidden structure.
Challenges:
  The number and sizes of the clusters are not known a priori.
  The number of possible co-clusterings is exponential in the size of the matrix.
[Figure: co-clustered matrix with labels R1, R2, C1, C2]
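To get a feel for how quickly the search space grows, the sketch below counts co-clusterings under one simple assumption that the slides do not spell out: a co-clustering is any pair consisting of a partition of the rows and a partition of the columns. Under that assumption the count for an $m \times n$ matrix is the product of the Bell numbers $B(m)$ and $B(n)$. The function names are illustrative.

```python
def bell_numbers(limit):
    """Return [B(0), ..., B(limit)], computed with the Bell triangle."""
    bells = [1]
    row = [1]
    for _ in range(limit):
        # Each new row starts with the last entry of the previous row.
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
        bells.append(row[0])
    return bells


def count_coclusterings(m, n):
    """Number of (row partition, column partition) pairs for an m x n matrix."""
    bells = bell_numbers(max(m, n))
    return bells[m] * bells[n]


for size in (2, 4, 8, 16):
    print(size, count_coclusterings(size, size))
```

Even at 16 rows and 16 columns, exhaustive enumeration is already out of reach, which is why a heuristic search is needed.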

Related Work
Spectral methods use linear algebraic techniques such as SVD to fit a block diagonal structure.
They usually require the number of clusters to be pre-specified.
They are likely to perform well on the left-hand matrix in the slide's figure, but not on the right-hand one.
[Figure: two example matrices, left and right]

Our Approach
Define a quality metric for co-clusterings that rewards large, dense biclusters.
Find a co-clustering that maximizes the metric value.
This is NP-hard in general, so efficient heuristics are needed.

Notes: Others have taken a similar approach. The novelty of our work is:
(1) the metrics we consider reward large, dense clusters, as opposed to other metrics, which optimize encoding cost or other objectives;
(2) our approach allows any number of clusters, and the run time does not depend on the output;
(3) we do not require a block diagonal structure;
(4) our algorithm efficiently traverses a wide range of possible co-clusterings without getting stuck at local optima.

Choosing a Metric
Motivated by two desired properties, P1 and P2 [illustrated by figures on the slide], we propose the following class of metrics:

$$\mu_\alpha = \sum_{B \in \Pi} \underbrace{\frac{a(B)^2}{s(B)}}_{\text{large}} \cdot \underbrace{\left(\frac{w(B)}{a(B)}\right)^{2+\alpha}}_{\text{dense}}$$

where $a(B)$ = area, $s(B)$ = semiperimeter, and $w(B)$ = weight of bicluster $B$.

Proposition: $\mu_\alpha$ satisfies P1 and P2 for all $\alpha \ge 0$.
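A minimal sketch of how such a metric could be evaluated for a given co-clustering is shown below. The slide only names area, semiperimeter, and weight, so the definitions used here are assumptions: area is (number of rows) x (number of columns) of a bicluster, semiperimeter is (number of rows) + (number of columns), and weight is the sum of the matrix entries inside the bicluster. The function name `mu_alpha` and the label-list representation of a co-clustering are also illustrative.

```python
from collections import Counter, defaultdict


def mu_alpha(matrix, row_labels, col_labels, alpha=0.0):
    """Sum of (area^2 / semiperimeter) * density^(2 + alpha) over all biclusters.

    row_labels[i] and col_labels[j] give the cluster of row i and column j.
    Area, semiperimeter and weight follow the assumed definitions in the text.
    """
    row_sizes = Counter(row_labels)
    col_sizes = Counter(col_labels)

    # Accumulate the weight (sum of entries) of every bicluster.
    weight = defaultdict(float)
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            weight[(row_labels[i], col_labels[j])] += value

    total = 0.0
    for (r, c), w in weight.items():
        area = row_sizes[r] * col_sizes[c]
        semiperimeter = row_sizes[r] + col_sizes[c]
        total += (area ** 2 / semiperimeter) * (w / area) ** (2 + alpha)
    return total


# Hypothetical block-diagonal matrix: a dense 2x2 block and a dense 3x2 block.
example = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
good = mu_alpha(example, [0, 0, 1, 1, 1], [0, 0, 1, 1])
lumped = mu_alpha(example, [0, 0, 0, 0, 0], [0, 0, 0, 0])
print(good, lumped)
```

On this toy input the co-clustering that isolates the two dense blocks scores higher than lumping everything into a single bicluster, and increasing `alpha` widens the gap because the density term is penalized more heavily.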

The CC-MACS Algorithm
Co-Clustering via Maximal Anti-Chain Search
Build randomized k-d trees on the rows and columns.
Initialize maximal anti-chains as the leaves of each tree.
Traverse the trees simultaneously from the bottom up, greedily merging the rows or columns that result in the greatest increase in the metric value.
Output the co-clustering with the best metric value.
Complexity: $O(N \cdot \log^2(mn))$ time for an $m \times n$ matrix, where $N \ll mn$ is the number of non-zero values.

Traverse all the way to the top, instead of stopping at a local optimum.

Output the best co-clustering found.
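The search strategy described on these slides (merge greedily from the bottom up, continue all the way to the top, and return the best co-clustering seen) can be illustrated with a much simpler stand-in. The sketch below is not CC-MACS: it uses no k-d trees or anti-chains, and instead tries every pairwise merge of row clusters or column clusters at each step, which is only practical for tiny matrices. The metric definitions (area, semiperimeter, weight) and all function names are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def mu_alpha(matrix, row_labels, col_labels, alpha=0.0):
    """Assumed metric: sum over biclusters of (area^2 / semiperimeter) * density^(2 + alpha)."""
    row_sizes, col_sizes = Counter(row_labels), Counter(col_labels)
    weight = defaultdict(float)
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            weight[(row_labels[i], col_labels[j])] += value
    total = 0.0
    for (r, c), w in weight.items():
        area = row_sizes[r] * col_sizes[c]
        total += (area ** 2 / (row_sizes[r] + col_sizes[c])) * (w / area) ** (2 + alpha)
    return total


def merged(labels, a, b):
    """Return a copy of labels in which cluster b has been absorbed into cluster a."""
    return [a if x == b else x for x in labels]


def greedy_coclustering(matrix, alpha=0.0):
    """Greedy bottom-up merging that keeps going to the top and returns the best state seen."""
    rows = list(range(len(matrix)))      # every row starts in its own cluster
    cols = list(range(len(matrix[0])))   # every column starts in its own cluster
    best = (mu_alpha(matrix, rows, cols, alpha), rows, cols)
    while len(set(rows)) > 1 or len(set(cols)) > 1:
        candidates = []
        for a, b in combinations(sorted(set(rows)), 2):
            new_rows = merged(rows, a, b)
            candidates.append((mu_alpha(matrix, new_rows, cols, alpha), new_rows, cols))
        for a, b in combinations(sorted(set(cols)), 2):
            new_cols = merged(cols, a, b)
            candidates.append((mu_alpha(matrix, rows, new_cols, alpha), rows, new_cols))
        # Take the best available merge even if it lowers the score,
        # so the search does not stop at a local optimum.
        score, rows, cols = max(candidates, key=lambda cand: cand[0])
        if score > best[0]:
            best = (score, rows, cols)
    return best


# Hypothetical matrix with two interleaved dense biclusters.
data = [
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
]
best_score, best_rows, best_cols = greedy_coclustering(data)
print(best_score, best_rows, best_cols)
```

On this toy input the best state groups rows 0 and 3 with columns 0 and 2, and rows 1, 2, and 4 with columns 1 and 3. The later forced merges score lower, which is why the best state is recorded along the way rather than taken from the final step.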

Experiments: Synthetic Data
1024×1024 matrix with dense biclusters of size 4×4.
Compared with the Cross-Association algorithm (Chakrabarti et al., KDD ’04) via $F_1$-score:

$$F_1 = 2 \cdot \frac{\text{accuracy} \times \text{precision}}{\text{accuracy} + \text{precision}}$$

Averaged over 10 trials.
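The score above is a harmonic mean of two quantities. The slide names them accuracy and precision, whereas the textbook $F_1$ combines precision and recall, so the sketch below takes two generic scores and makes no assumption about how the authors computed either one. The function name and the example values are illustrative only.

```python
def harmonic_f1(first, second):
    """Harmonic mean of two scores in [0, 1]; returns 0.0 when both are zero."""
    if first + second == 0:
        return 0.0
    return 2 * first * second / (first + second)


# Made-up values, not results from the experiments.
balanced = harmonic_f1(0.8, 0.8)
lopsided = harmonic_f1(0.5, 1.0)
print(balanced, lopsided)
```

Because the harmonic mean is pulled toward the smaller input, a method has to do well on both quantities to obtain a high score.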

Experiments: Visual Comparison
Matrices with known structure (NIST Matrix Market repository), constructed from the domains of finite element modeling (FIDAP005, top, and FIDAPM05, middle) and quantum chemistry (QC324, bottom).
[Figure panels: Original Matrix, Randomly Permuted, Cross-Association, CC-MACS ($\mu_2$)]

Experiments: Web Memes
Meme-Tracker dataset (Leskovec et al., KDD ’09): ~70k rows (domains), ~50k columns (memes, or "phrase clusters"), ~4m non-zeros (~0.1%).
Top biclusters returned by the CC-MACS algorithm:

# of Domains    # of Memes    Density    Topic
21              26            98.2%      St. Jude Children’s Hospital
5               178           96.1%      Brazilian news
6               39            98.7%      Spanish news
(one count missing: 20)       99.2%      Tech news
(one count missing: 17)       100.0%     Politics
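The density figures can be sanity-checked with a little arithmetic, assuming density means the number of non-zero cells divided by the area (rows x columns). That definition is an assumption, and the inputs are the rounded figures from the slide, so the results are approximate.

```python
def density(nonzeros, n_rows, n_cols):
    """Fraction of cells that are non-zero in an n_rows x n_cols (sub)matrix."""
    return nonzeros / (n_rows * n_cols)


# Whole dataset, using the rounded sizes from the slide (~70k x ~50k, ~4m non-zeros).
overall = density(4_000_000, 70_000, 50_000)

# Top bicluster: 21 domains x 26 memes at 98.2% density implies roughly this many non-zeros.
cells = 21 * 26
implied_nonzeros = round(0.982 * cells)

print(f"overall density: {overall:.2%}")
print(f"top bicluster: {implied_nonzeros} of {cells} cells non-zero")
```

The top biclusters are therefore several hundred times denser than the matrix as a whole, which is what makes them stand out as candidate communities.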

Summary of Contributions
A new class of co-clustering metrics that reward large, dense biclusters.
The CC-MACS algorithm, which efficiently searches the space of possible co-clusterings for one that maximizes the value of a given metric.
Advantages over existing methods:
  No need to specify the number of clusters in advance.
  Not limited to matrices with a block diagonal structure.

Future Work
Consider a more general class of bicluster partitions (not just those formed by co-clustering rows and columns).
Bound the approximation factor of our heuristic algorithm.
Adapt our approach for distributed computation.

Related Projects
Infer pairwise influence between nodes based on the times of their respective activity (MILCOM).
Detect correlated activity in communication networks (DaMNet).
Discover cascades when textual content is not available.

Contact Info
Brian Thompson
bthompso8784@gmail.com
http://pidancer.com