1 Clustering Applications at Yahoo! Deepayan Chakrabarti

Slides:



Advertisements
Similar presentations
Consistent Bipartite Graph Co-Partitioning for High-Order Heterogeneous Co-Clustering Tie-Yan Liu WSM Group, Microsoft Research Asia Joint work.
Advertisements

TrustRank Algorithm Srđan Luković 2010/3482
Lecture outline Density-based clustering (DB-Scan) – Reference: Martin Ester, Hans-Peter Kriegel, Jorg Sander, Xiaowei Xu: A Density-Based Algorithm for.
CMU SCS Copyright: C. Faloutsos (2012)# : Multimedia Databases and Data Mining Lecture #27: Graph mining - Communities and a paradox Christos.
1 Connectivity Structure of Bipartite Graphs via the KNC-Plot Erik Vee joint work with Ravi Kumar, Andrew Tomkins.
Information Networks Graph Clustering Lecture 14.
Online Social Networks and Media. Graph partitioning The general problem – Input: a graph G=(V,E) edge (u,v) denotes similarity between u and v weighted.
Contextual Advertising by Combining Relevance with Click Feedback D. Chakrabarti D. Agarwal V. Josifovski.
Bandits for Taxonomies: A Model-based Approach Sandeep Pandey Deepak Agarwal Deepayan Chakrabarti Vanja Josifovski.
Lecture outline Clustering aggregation – Reference: A. Gionis, H. Mannila, P. Tsaparas: Clustering aggregation, ICDE 2004 Co-clustering (or bi-clustering)
Communities in Heterogeneous Networks Chapter 4 1 Chapter 4, Community Detection and Mining in Social Media. Lei Tang and Huan Liu, Morgan & Claypool,
Spectral Clustering 指導教授 : 王聖智 S. J. Wang 學生 : 羅介暐 Jie-Wei Luo.
Discovering Overlapping Groups in Social Media Xufei Wang, Lei Tang, Huiji Gao, and Huan Liu Arizona State University.
Normalized Cuts and Image Segmentation Jianbo Shi and Jitendra Malik, Presented by: Alireza Tavakkoli.
1 Estimating Rates of Rare Events at Multiple Resolutions Deepak Agarwal Andrei Broder Deepayan Chakrabarti Dejan Diklic Vanja Josifovski Mayssam Sayyadian.
Fully Automatic Cross-Associations Deepayan Chakrabarti (CMU) Spiros Papadimitriou (CMU) Dharmendra Modha (IBM) Christos Faloutsos (CMU and IBM)
Clustering (Part II) 11/26/07. Spectral Clustering.
A scalable multilevel algorithm for community structure detection
1 AutoPart: Parameter-Free Graph Partitioning and Outlier Detection Deepayan Chakrabarti
Clustering In Large Graphs And Matrices Petros Drineas, Alan Frieze, Ravi Kannan, Santosh Vempala, V. Vinay Presented by Eric Anderson.
Web Projections Learning from Contextual Subgraphs of the Web Jure Leskovec, CMU Susan Dumais, MSR Eric Horvitz, MSR.
1 Fully Automatic Cross-Associations Deepayan Chakrabarti (CMU) Spiros Papadimitriou (CMU) Dharmendra Modha (IBM) Christos Faloutsos (CMU and IBM)
1 Matching DOM Trees to Search Logs for Accurate Webpage Clustering Deepayan Chakrabarti Rupesh Mehta.
FLANN Fast Library for Approximate Nearest Neighbors
Image Segmentation Image segmentation is the operation of partitioning an image into a collection of connected sets of pixels. 1. into regions, which usually.
COMMUNITIES IN MULTI-MODE NETWORKS 1. Heterogeneous Network Heterogeneous kinds of objects in social media – YouTube Users, tags, videos, ads – Del.icio.us.
Domain decomposition in parallel computing Ashok Srinivasan Florida State University COT 5410 – Spring 2004.
Segmentation using eigenvectors Papers: “Normalized Cuts and Image Segmentation”. Jianbo Shi and Jitendra Malik, IEEE, 2000 “Segmentation using eigenvectors:
Physical Database Design & Performance. Optimizing for Query Performance For DBs with high retrieval traffic as compared to maintenance traffic, optimizing.
1 A Graph-Theoretic Approach to Webpage Segmentation Deepayan Chakrabarti Ravi Kumar
A Scalable Self-organizing Map Algorithm for Textual Classification: A Neural Network Approach to Thesaurus Generation Dmitri G. Roussinov Department of.
Jure Leskovec Computer Science Department Cornell University / Stanford University Joint work with: Jon Kleinberg (Cornell), Christos.
Output URL Bidding Panagiotis Papadimitriou, Hector Garcia-Molina, (Stanford University) Ali Dasdan, Santanu Kolay (Ebay Inc)
CS654: Digital Image Analysis
Detecting Communities Via Simultaneous Clustering of Graphs and Folksonomies Akshay Java Anupam Joshi Tim Finin University of Maryland, Baltimore County.
Jiafeng Guo(ICT) Xueqi Cheng(ICT) Hua-Wei Shen(ICT) Gu Xu (MSRA) Speaker: Rui-Rui Li Supervisor: Prof. Ben Kao.
Spectral Clustering Jianping Fan Dept of Computer Science UNC, Charlotte.
Learning Spectral Clustering, With Application to Speech Separation F. R. Bach and M. I. Jordan, JMLR 2006.
Predictive Discrete Latent Factor Models for large incomplete dyadic data Deepak Agarwal, Srujana Merugu, Abhishek Agarwal Y! Research MMDS Workshop, Stanford.
CS 484 Load Balancing. Goal: All processors working all the time Efficiency of 1 Distribute the load (work) to meet the goal Two types of load balancing.
Jure Leskovec Kevin J. Lang Anirban Dasgupta Michael W. Mahoney WWW’ 2008 Statistical Properties of Community Structure in Large Social and Information.
Data Structures and Algorithms in Parallel Computing Lecture 7.
About Me Swaroop Butala  MSCS – graduating in Dec 09  Specialization: Systems and Databases  Interests:  Learning new technologies  Application of.
ApproxHadoop Bringing Approximations to MapReduce Frameworks
Graph Partitioning using Single Commodity Flows
Efficient Semi-supervised Spectral Co-clustering with Constraints
Graphs, Vectors, and Matrices Daniel A. Spielman Yale University AMS Josiah Willard Gibbs Lecture January 6, 2016.
Community structure in graphs Santo Fortunato. More links “inside” than “outside” Graphs are “sparse” “Communities”
Non-parametric Methods for Clustering Continuous and Categorical Data Steven X. Wang Dept. of Math. and Stat. York University May 13, 2010.
Ariel Fuxman, Panayiotis Tsaparas, Kannan Achan, Rakesh Agrawal (2008) - Akanksha Saxena 1.
CS 440 Database Management Systems Web Data Management 1.
CS 540 Database Management Systems Web Data Management some slides are due to Kevin Chang 1.
Purnamrita Sarkar (Carnegie Mellon) Andrew W. Moore (Google, Inc.)
A Peta-Scale Graph Mining System
Data Driven Resource Allocation for Distributed Learning
NetMine: Mining Tools for Large Graphs
Graph and Tensor Mining for fun and profit
Large Graph Mining: Power Tools and a Practitioner’s guide
Bandits for Taxonomies: A Model-based Approach
Jianping Fan Dept of CS UNC-Charlotte
Chapter 6. Large Scale Optimization
Grouping.
Discovering Functional Communities in Social Media
Statistical properties of network community structure
Overcoming Resolution Limits in MDL Community Detection
Spectral Clustering Eric Xing Lecture 8, August 13, 2010
3.3 Network-Centric Community Detection
Computational Advertising and
Chapter 6. Large Scale Optimization
--WWW 2010, Hongji Bao, Edward Y. Chang
Presentation transcript:

1 Clustering Applications at Yahoo! Deepayan Chakrabarti

2 Outline Clustering, in itself, is often not the primary problem  Exploratory analysis is rarely needed Methods are often tied to the “final” goal Data Problem Method Issues Data Problem Method Issues Data Problem Method Issues …

3 Outline Advertiser-keyword graph (+ social networks)  Graph Partitioning CTR predictions for ads on webpages  Co-clustering Query refinement and suggestions  Local search methods Conclusions

4 Outline Advertiser-keyword graph (+ social networks)  Graph Partitioning CTR predictions for ads on webpages  Co-clustering Query refinement and suggestions  Local search methods Conclusions

5 Graph Partitioning (Applications) Find clusters of advertisers and keywords  Keyword suggestions  Running experiments on some “natural” clusters Similar:  Y! Answers  Flickr Advertiser Bidded Keyword bids ~10M nodes

6 Graph Partitioning (Applications) Find clusters of IM users  Targeted advertising  Exploratory analysis Clusters of the Web Graph  Distributed pagerank computation ~100M nodes Who-messages-whom IM graph

7 Graph Partitioning (Methods) Basic “Global” spectral partitioning [Ng+/01]  Find 2 nd eigenvector of the graph Laplacian  This embeds all nodes on the real line  Split the line in two, to get two clusters Can approximate the optimal conductance cut  For more clusters: Use k eigenvectors (for known k), or Split in two, and recurse on each cluster

8 Graph Partitioning (Methods) However, this has problems [Lang/05, Leskovec+/08]:  Min. conductance or quotient cuts lead to “small chunks” Better balance  worse cuts Large-sized low-quotient cuts are actually just unions of whiskers “Whiskers”

9 Graph Partitioning (Methods) However, this has problems:  Min. conductance or quotient cuts lead to “small chunks” Balance is not very strongly encouraged  Recursive partitioning takes too long Each eigenvector computation yields only a small cluster being broken off Two alternatives:  More balanced cuts  recursion is faster  Unbalanced cuts, but much faster computation

10 Graph Partitioning (Methods) Achieving better balance  Combine algorithms (e.g., [Anderson+/08]) METIS (more balanced cuts), followed by Flow-based improvement (conductance)  Stronger balance constraints Many nodesOne node Spectral embedding Perfect balance on real line: NP-Hard Perfect balance on hypersphere: SDP formulation [Lang/05]

11 Graph Partitioning (Methods) Faster Computation via “local” graph partitioning [Spielman+/04, Anderson+/06]  Pick seeds randomly  Build local clusters around seed  Bite off cluster, and repeat Time for each iteration is proportional to the size of the local cluster (scalability) Better for large graphs, or when not all clusters are needed

12 Graph Partitioning (Issues) Many complex networks are very different from planar/mesh-like networks Good small cuts Good large cuts are hard to find, and may not even exist Hard to have a good hierarchical partitioning Is conductance really the best objective?  METIS+flow finds lower conductance cuts, but  Local spectral methods finds more “tightly-knit” cuts [Leskovec+/08]  Is there a good compromise?

13 Graph Partitioning (Issues) Speed and Scalability  Most implementations have trouble with large graphs, or graphs with some extremely high- degree nodes  Map-reduce style partitioning algorithms?

14 Outline Advertiser-keyword graph (+ social networks)  Graph Partitioning CTR predictions for ads on webpages  Co-clustering Query refinement and suggestions  Local search methods Conclusions

15 Co-clustering (Applications) The Content Match problem  Predict click-thru rate (CTR)  Extreme sparsity Few views Even fewer clicks Ads Webpages clicks views = CTR

16 Co-clustering (Methods) Approximate cell CTR using block CTR [Dhillon+/03] Minimize divergence between the original matrix and its reconstruction Combats sparsity Ad clusters Webpage clusters = row effect + column effect + block effectCell CTR

17 Co-clustering (Issues) Picking the right number of clusters  Use MDL [Chakrabarti+/2004] Handling new ads/pages  Combine co-clustering with feature-based prediction models [Agarwal+/2007] Handling extra information  E.g., if each page and ad can be categorized into a taxonomy [Chakrabarti+/2007]  Can the taxonomy be automatically modified?

18 Co-clustering (Issues) Iterative process (hard for map-reduce)  Factor-3 approximation by just clustering webpages and ads separately [Dasgupta+/2008]

19 Outline Advertiser-keyword graph (+ social networks)  Graph Partitioning CTR predictions for ads on webpages  Co-clustering Query refinement and suggestions  Local search methods Conclusions

20 Query Refinement User inputs ambiguous query (“madonna”) Search engine asks: “Did you mean: songs, videos, pictures?” lyrics “song title” album … Refinement = cluster of terms

21 pictures photos songs lyrics albums Query Refinement Suppose we could relate queries with keywords Honda Ford Queries Related Keywords Madonna Beatles

22 Query Refinement (Problem) Different from plain bipartite graph partitioning  Don’t confuse users! only 3 or fewer clusters for each query only a few easily-distinguishable clusters overall Clustering quality now also depends on  the query workload  the algo that picks the “top-3” clusters for any query

23 Query Refinement (Method) Can optimally pick top-k clusters for any query [Wang+/2009]  for a wide range of matching functions  only if clusters are disjoint Iteratively improve clustering via local search  Move a keyword to a new cluster  Update top-k clusters for all queries in workload  Repeat

24 Query Refinement (Issues) Iterative  slow  Each iteration has to go over the entire historical query logs  Optimality guarantees?  Modeling issues: How do we present a cluster to the user? Cluster naming? How does a user interact with a cluster?

25 Outline Advertiser-keyword graph (+ social networks)  Graph Partitioning CTR predictions for ads on webpages  Co-clustering Query refinement and suggestions  Local search methods Conclusions

26 Conclusions Standalone clustering applications are rare!  Constraints on clustering What use will it serve (as in query refinement)? What is the desired “natural” cluster size?  Clustering for predictions (e.g., CTR) Missing values Extreme sparsity  Combining clustering with explore-exploit

27 Conclusions Even for standalone clustering:  Scalability concerns 10s to 100s of millions of nodes Skewed degree distributions Algorithms amenable to map-reduce  The right “balance” relaxation Tradeoffs between cluster “compactness”, conductance, and partition sizes Is it even reasonable to require balance?

28 References On Spectral Clustering: Analysis and an Algorithm, by Ng, Jordan, & Weiss, in NIPS 2002 Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems, by Spielman & Teng, STOC 2004 Fixing two weaknesses of the Spectral Method, by Lang, NIPS 2005 Local Graph Partitioning using PageRank Vectors, by Anderson, Chung, & Lang, SODA 2008 An algorithm for improving graph partitions, by Anderson & Lang, SODA 2008 Finding dense and isolated submarkets in a sponsored search spending graph, by Lang & Anderson, CIKM 2007 Clustering of bipartite advertiser-keyword graph, by Carrasco, Fain, Lang, & Zhukov, IEEE Computer Society, 2003 Statistical Properties of Community Structure in Large Social and Information Networks, by Leskovec, Lang, Dasgupta, & Mahoney, WWW 2008 Approximate Algorithms for Co-Clustering, by Anagnostopoulos, Dasgupta, & Kumar, PODS 2008 Information-theoretic co-clustering, by Dhillon, Mallela, & Modha, in KDD 2003 Fully Automatic Cross-Associations, by Chakrabarti, Papadimitriou, & Faloutsos, in KDD 2004 Predictive discrete latent factor models for large scale dyadic data, by Merugu, & Agarwal, in KDD 2007 Estimating Rates of Rare Events at Multiple Resolutions, by Agarwal, Broder, Chakrabarti, Diklic, Josifovski, & Sayyadian, in KDD 2007 Mining Broad Latent Query Aspects from Search Sessions, by Wang, Chakrabarti, & Punera, in KDD 2009

29 Graph Partitioning (Applications) Find clusters of users posting in similar groups  Enhance group activity  Are their special groups? Can we build special tools for them? Similar:  Y! Answers  Flickr Users Yahoo! Groups ~100M users, ~10M groups

30 Graph Partitioning (Issues) Many complex networks are very different from planar/mesh-like networks Good small cuts Good large cuts are hard to find, and may not even exist Hard to have a good hierarchical partitioning Good large cuts  If they exist, most algorithms find them  If not: METIS+flow finds lower conductance cuts, but local spectral methods finds more “coherent” cuts  Is there a good compromise?

31 Query Refinement Given an ambiguous query, present user with up to 5 refinements  madonna  songs, videos, pictures Each refinement represents a cluster of terms  songs = (lyrics, “song title”, album, …)

32 lyrics albums pictures photos songs Query Refinement How can we relate “madonna” and “songs”? Search “madonna” Search “madonna songs” click no click Beatles Honda Ford QueriesReformulations Madonna