1
Discovering Functional Communities in Social Media
Brian Thompson, Linda Ness, David Shallcross, Devasis Bassu
Workshop on Data Mining in Social Networks, ICDM 2015
2
Problem Description
Setup: Social media users share or endorse content such as products, artists, news, and music.
Goal: Identify functional communities of users who share or endorse common content.
Challenges:
Scalability: there may be many users and a large volume of content.
Mixed membership: each user may belong to more than one community.
3
Examples
Memes shared by users of online social media
Products endorsed by users of a recommender system
[credit: Yannick Feder] [credit: …]
4
Data Representation
[Figure: user-by-content matrix with rows Muthu, Danfeng, Paul, Rebecca, Hanghang and columns A, B, C, D]
A bicluster $B$ is formed by the intersection of a set of rows with a set of columns.
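To make the definition concrete, here is a minimal Python sketch. The matrix `M` is a hypothetical 0/1 user-by-content matrix (not the one pictured on the slide), and the helper names `bicluster` and `nonzero_fraction` are illustrative only. Note that the chosen rows and columns do not need to be adjacent.

```python
def bicluster(matrix, rows, cols):
    """Return the submatrix formed by intersecting the given row indices
    with the given column indices, i.e. the cells of the bicluster."""
    return [[matrix[i][j] for j in cols] for i in rows]


def nonzero_fraction(submatrix):
    """Fraction of cells in the submatrix that are non-zero."""
    cells = [value for row in submatrix for value in row]
    return sum(1 for value in cells if value != 0) / len(cells)


# Hypothetical 0/1 matrix: rows = users, columns = content items.
M = [
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
]

B = bicluster(M, rows=[0, 2], cols=[0, 1])
print(B, nonzero_fraction(B))  # [[1, 1], [1, 1]] 1.0
```

Rows 0 and 2 both endorse items 0 and 1, so that bicluster is fully dense even though the two rows are not adjacent in `M`.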
5
Data Representation
[Figure: the same matrix, rows Muthu, Danfeng, Paul, Rebecca, Hanghang and columns A, B, C, D]
Permuting rows and columns can reveal latent structure.
6
Data Representation
[Figure: the matrix after permutation, with columns ordered A, C, B, D and rows ordered Muthu, Rebecca, Paul, Danfeng, Hanghang]
Permuting rows and columns can reveal latent structure.
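Once row and column cluster labels are known, a permutation that makes each cluster contiguous can be obtained by sorting indices on their labels. The sketch below assumes the labels are given (in practice they are exactly what a co-clustering algorithm has to find); the matrix, the labels, and the function name `reorder_by_labels` are hypothetical.

```python
def reorder_by_labels(matrix, row_labels, col_labels):
    """Permute rows and columns so that indices sharing a cluster label
    become contiguous. sorted() is stable, so ties keep their input order."""
    row_order = sorted(range(len(matrix)), key=lambda i: row_labels[i])
    col_order = sorted(range(len(matrix[0])), key=lambda j: col_labels[j])
    permuted = [[matrix[i][j] for j in col_order] for i in row_order]
    return permuted, row_order, col_order


# Hypothetical matrix whose two blocks are interleaved.
M = [
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
]

permuted, rows, cols = reorder_by_labels(M, [0, 1, 0, 1], [0, 1, 0, 1])
print(permuted)    # [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
print(rows, cols)  # [0, 2, 1, 3] [0, 2, 1, 3]
```

The underlying data is unchanged; only the display order differs, which is why the block structure was "latent" before the permutation.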
7
Co-Clustering
Goal: Given a matrix, cluster the rows and columns simultaneously to reveal hidden structure.
Challenges:
We don't know the number or sizes of the clusters a priori.
The number of possible co-clusterings is at least exponential in the size of the matrix.
[Figure: row clusters R1, R2 and column clusters C1, C2]
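To see how quickly the search space grows, assume a co-clustering is a pair consisting of a partition of the rows and a partition of the columns. The number of partitions of $k$ items is the Bell number Bell(k), so an $m \times n$ matrix admits Bell(m) · Bell(n) co-clusterings, a count that grows faster than any exponential. A short sketch using the Bell triangle (function names are illustrative):

```python
def bell_numbers(limit):
    """Return [Bell(0), ..., Bell(limit)] computed with the Bell triangle."""
    bells = [1]
    row = [1]
    for _ in range(limit):
        # Each triangle row starts with the last entry of the previous row.
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
        bells.append(row[0])
    return bells


def count_coclusterings(m, n):
    """Number of (row partition, column partition) pairs for an m x n matrix."""
    bells = bell_numbers(max(m, n))
    return bells[m] * bells[n]


print(bell_numbers(6))            # [1, 1, 2, 5, 15, 52, 203]
print(count_coclusterings(4, 4))  # 225
```

Even a 10×10 matrix already has 115975² = 13,450,200,625 co-clusterings, which is why exhaustive search is out of the question.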
8
Related Work
Spectral methods use linear-algebraic techniques such as the SVD to fit a block-diagonal structure.
They usually require the number of clusters to be specified in advance.
They are likely to perform well on the matrix on the left, but not on the one on the right.
9
Our Approach
Define a quality metric for co-clusterings that rewards large, dense biclusters.
Find a co-clustering that maximizes the metric value. This is NP-hard in general, so efficient heuristics are needed.
Others have taken a similar approach. The novelty of our work is that (1) the metrics we consider reward large, dense clusters, as opposed to other metrics, which optimize encoding cost or other objectives; (2) our approach allows any number of clusters, and its run time does not depend on the output; (3) we do not require a block-diagonal structure; and (4) our algorithm efficiently traverses a wide range of possible co-clusterings without getting stuck at local optima.
10
Choosing a Metric
Motivated by two desired properties, P1 and P2, we propose the following class of metrics for a co-clustering $\Pi$:
$$\mu_\alpha(\Pi) = \sum_{B \in \Pi} \frac{a(B)^2}{s(B)} \cdot \left( \frac{w(B)}{a(B)} \right)^{2+\alpha}$$
where $a(B)$ = area, $s(B)$ = semiperimeter, and $w(B)$ = weight of bicluster $B$. The first factor rewards large biclusters and the second rewards dense ones.
Proposition: $\mu_\alpha$ satisfies P1 and P2 for all $\alpha \ge 0$.
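A minimal sketch of how such a metric can be evaluated, assuming the form $\mu_\alpha(\Pi) = \sum_{B \in \Pi} \frac{a(B)^2}{s(B)} \left(\frac{w(B)}{a(B)}\right)^{2+\alpha}$, with an $r \times c$ bicluster having area $rc$, semiperimeter $r + c$, and (our assumption) weight equal to the sum of its entries. A co-clustering is encoded by hypothetical per-row and per-column cluster labels; every (row cluster, column cluster) pair is one bicluster.

```python
from collections import defaultdict


def mu_alpha(matrix, row_labels, col_labels, alpha=2.0):
    """Score a co-clustering with the (assumed) metric mu_alpha.
    Assumes w(B) is the sum of the entries inside bicluster B."""
    if alpha < 0:
        raise ValueError("alpha must be >= 0")
    # Group row and column indices by their cluster label.
    row_groups, col_groups = defaultdict(list), defaultdict(list)
    for i, label in enumerate(row_labels):
        row_groups[label].append(i)
    for j, label in enumerate(col_labels):
        col_groups[label].append(j)

    total = 0.0
    for rows in row_groups.values():
        for cols in col_groups.values():
            area = len(rows) * len(cols)              # a(B)
            semiperimeter = len(rows) + len(cols)     # s(B)
            weight = sum(matrix[i][j] for i in rows for j in cols)  # w(B)
            # "large" factor times "dense" factor.
            total += (area ** 2 / semiperimeter) * (weight / area) ** (2 + alpha)
    return total


# Hypothetical 4x4 matrix with two dense 2x2 blocks on the diagonal.
M = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
print(mu_alpha(M, [0, 0, 1, 1], [0, 0, 1, 1]))  # 8.0 (the two blocks)
print(mu_alpha(M, [0] * 4, [0] * 4))            # 2.0 (one big bicluster)
print(mu_alpha(M, range(4), range(4)))          # 4.0 (all singletons)
```

With $\alpha = 2$ the block co-clustering wins. In this particular example $\alpha = 0$ scores the block co-clustering and the single 50%-dense bicluster equally (8.0 each), which illustrates how a larger $\alpha$ penalizes sparse biclusters more heavily.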
11
The CC-MACS Algorithm (Co-Clustering via Maximal Anti-Chain Search)
Build randomized k-d trees on the rows and on the columns.
Initialize the maximal anti-chains as the leaves of each tree.
Traverse the trees simultaneously from the bottom up, greedily merging the rows or columns that result in the greatest increase in the metric value.
Output the co-clustering with the best metric value.
Complexity: $O(N \log^2 mn)$ time for an $m \times n$ matrix, where $N \ll mn$ is the number of non-zero values.
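The following is a simplified, unoptimized Python sketch of the traversal, not the authors' implementation. Assumptions: a balanced binary tree over the current index order is a hypothetical stand-in for the randomized k-d trees (so structure must already be roughly contiguous to be found); the metric is taken to be $\mu_\alpha(\Pi) = \sum_B \frac{a(B)^2}{s(B)} (w(B)/a(B))^{2+\alpha}$ with $w(B)$ the sum of entries; and every candidate merge is rescored from scratch, so it does not achieve the stated $O(N \log^2 mn)$ bound. An anti-chain is a set of tree nodes; a merge replaces two sibling nodes with their parent.

```python
def build_tree(indices):
    """Balanced binary tree over `indices` (stand-in for a randomized k-d
    tree). Maps each internal node to its (left, right) children; nodes are
    tuples of leaf indices."""
    children = {}

    def split(node):
        if len(node) > 1:
            mid = len(node) // 2
            left, right = node[:mid], node[mid:]
            children[node] = (left, right)
            split(left)
            split(right)

    split(tuple(indices))
    return children


def score(matrix, row_chain, col_chain, alpha):
    """Assumed mu_alpha of the co-clustering defined by two anti-chains."""
    total = 0.0
    for rows in row_chain:
        for cols in col_chain:
            area = len(rows) * len(cols)
            weight = sum(matrix[i][j] for i in rows for j in cols)
            total += area ** 2 / (len(rows) + len(cols)) * (weight / area) ** (2 + alpha)
    return total


def cc_macs_sketch(matrix, alpha=2.0):
    m, n = len(matrix), len(matrix[0])
    trees = {"row": build_tree(range(m)), "col": build_tree(range(n))}
    # The maximal anti-chains start at the leaves of each tree.
    chains = {"row": {(i,) for i in range(m)}, "col": {(j,) for j in range(n)}}
    best = (score(matrix, chains["row"], chains["col"], alpha),
            set(chains["row"]), set(chains["col"]))
    while True:
        candidates = []
        for side in ("row", "col"):
            for parent, (left, right) in trees[side].items():
                # A merge is possible once both children are on the anti-chain.
                if left in chains[side] and right in chains[side]:
                    trial = dict(chains)
                    trial[side] = (chains[side] - {left, right}) | {parent}
                    value = score(matrix, trial["row"], trial["col"], alpha)
                    candidates.append((value, side, trial[side]))
        if not candidates:  # both anti-chains have reached their roots
            break
        # Take the best merge even if it lowers the score, so the search
        # continues to the top instead of stopping at a local optimum.
        value, side, new_chain = max(candidates, key=lambda c: c[0])
        chains[side] = new_chain
        if value > best[0]:
            best = (value, set(chains["row"]), set(chains["col"]))
    return best


M = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
value, row_chain, col_chain = cc_macs_sketch(M)
print(value, sorted(row_chain), sorted(col_chain))  # 8.0 [(0, 1), (2, 3)] [(0, 1), (2, 3)]
```

On this toy matrix the search walks all the way up to both roots but returns the best state it passed through, the two dense 2×2 blocks.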
18
Traverse all the way to the top, instead of stopping at a local optimum.
19
Output the best co-clustering found.
20
Experiments: Synthetic Data
1024×1024 matrix with dense biclusters of size 4×4.
Compared to the Cross-Association algorithm of Chakrabarti et al. (KDD ’04) via the $F_1$-score:
$$F_1 = 2 \cdot \frac{\text{accuracy} \times \text{precision}}{\text{accuracy} + \text{precision}}$$
Averaged over 10 trials.
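The slide does not say how the synthetic matrices were generated, so the following is only a hypothetical generator of the kind of input described: an $n \times n$ 0/1 matrix tiled along the diagonal with fully dense `block`×`block` biclusters (no noise), then hidden by a random permutation of rows and columns. The function name, the tiling, and the absence of noise are all assumptions.

```python
import random


def planted_biclusters(n=1024, block=4, seed=0):
    """Hypothetical generator: n // block dense block x block biclusters on
    the diagonal, rows and columns randomly permuted. Returns the set of
    non-zero (row, col) coordinates plus ground-truth row/column labels."""
    if n % block:
        raise ValueError("n must be a multiple of block")
    rng = random.Random(seed)
    row_perm, col_perm = list(range(n)), list(range(n))
    rng.shuffle(row_perm)
    rng.shuffle(col_perm)

    row_labels, col_labels = [0] * n, [0] * n
    for i in range(n):
        # Position i of the unpermuted matrix lies in diagonal block i // block.
        row_labels[row_perm[i]] = i // block
        col_labels[col_perm[i]] = i // block

    nonzeros = set()
    for b in range(n // block):
        for i in range(b * block, (b + 1) * block):
            for j in range(b * block, (b + 1) * block):
                nonzeros.add((row_perm[i], col_perm[j]))
    return nonzeros, row_labels, col_labels


nonzeros, row_labels, col_labels = planted_biclusters()
print(len(nonzeros))  # 4096 = 256 biclusters x 16 cells
```

Keeping the ground-truth labels alongside the matrix is what makes a score such as $F_1$ computable after a co-clustering algorithm has run on the permuted data.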
21
Experiments: Visual Comparison
Matrices with known structure from the NIST Matrix Market repository, constructed from the domains of finite element modeling (FIDAP005, top, and FIDAPM05, middle) and quantum chemistry (QC324, bottom).
[Figure: for each matrix, panels show the Original Matrix, the Randomly Permuted matrix, and the results of Cross-Association and CC-MACS ($\mu_2$)]
22
Experiments: Web Memes
Meme-Tracker dataset (Leskovec et al., KDD ’09): ~70k domains (rows), ~50k memes or ”phrase clusters” (columns), ~4m non-zeros (~0.1%)
Top biclusters returned by the CC-MACS algorithm (# of domains, # of memes, density, topic):
21, 26, 98.2%, St. Jude Children’s Hospital
5, 178, 96.1%, Brazilian news
6, 39, 98.7%, Spanish news
20, 99.2%, Tech news
17, 100.0%, Politics
23
Summary of Contributions
A new class of co-clustering metrics that reward large, dense biclusters.
The CC-MACS algorithm, which efficiently searches the space of possible co-clusterings for one that maximizes the value of a given metric.
Advantages over existing methods:
No need to specify the number of clusters in advance.
Not limited to matrices with a block-diagonal structure.
24
Future Work
Consider a more general class of bicluster partitions (not just those formed by co-clustering rows and columns).
Bound the approximation factor of our heuristic algorithm.
Adapt our approach for distributed computation.
Related Projects
Infer pairwise influence between nodes based on the times of their respective activity (MILCOM).
Detect correlated activity in communication networks (DaMNet).
Discover cascades when textual content is not available.
25
Contact Info: Brian Thompson