Lecture outline
Clustering aggregation
– Reference: A. Gionis, H. Mannila, P. Tsaparas: Clustering aggregation, ICDE 2005
Co-clustering (or bi-clustering)
References:
– A. Anagnostopoulos, A. Dasgupta and R. Kumar: Approximation algorithms for co-clustering, PODS 2008
– K. Puolamäki, S. Hanhijärvi and G. Garriga: An approximation ratio for biclustering, Information Processing Letters 2008

Clustering aggregation
Many different clusterings exist for the same dataset!
– Different objective functions
– Different algorithms
– Different numbers of clusters
Which clustering is the best?
– Aggregation: we do not need to decide, but rather find a reconciliation between the different outputs

The clustering-aggregation problem
Input:
– n objects X = {x1, x2, …, xn}
– m clusterings C1, …, Cm of the objects (each a partition: a collection of disjoint sets that cover X)
Output:
– a single partition C that is as close as possible to all input partitions
How do we measure closeness of clusterings?
– the disagreement distance

Disagreement distance
For an object x and a clustering C, C(x) is the index of the set in the partition that contains x.
For two partitions P and Q, and objects x, y in X, define I_P,Q(x,y) = 1 if exactly one of P and Q places x and y in the same cluster (that is, P(x) = P(y) and Q(x) ≠ Q(y), or P(x) ≠ P(y) and Q(x) = Q(y)), and I_P,Q(x,y) = 0 otherwise.
If I_P,Q(x,y) = 1 we say that x, y create a disagreement between partitions P and Q.
The disagreement distance counts the disagreeing pairs: D(P,Q) = ∑_{x,y} I_P,Q(x,y).
Example:
 U | P | Q
x1 | 1 | 1
x2 | 1 | 2
x3 | 2 | 1
x4 | 3 | 3
x5 | 3 | 4
The disagreeing pairs are (x1,x2), (x1,x3) and (x4,x5), so D(P,Q) = 3.
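A minimal Python sketch of this pairwise definition; representing a clustering as a dict from object to cluster label is our own convention for these notes, not something from the paper.

```python
from itertools import combinations

def disagreement_distance(P, Q):
    """Number of pairs {x, y} on which partitions P and Q disagree.

    P and Q map each object to its cluster label (dicts over the same
    set of objects).
    """
    d = 0
    for x, y in combinations(list(P), 2):
        same_in_P = P[x] == P[y]
        same_in_Q = Q[x] == Q[y]
        if same_in_P != same_in_Q:  # exactly one partition puts x, y together
            d += 1
    return d

# The example table above:
P = {'x1': 1, 'x2': 1, 'x3': 2, 'x4': 3, 'x5': 3}
Q = {'x1': 1, 'x2': 2, 'x3': 1, 'x4': 3, 'x5': 4}
print(disagreement_distance(P, Q))  # -> 3
```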

Metric property for disagreement distance
For any clustering C: D(C,C) = 0
D(C,C') ≥ 0 for every pair of clusterings C, C'
D(C,C') = D(C',C)
Triangle inequality?
It is sufficient to show that for each pair of points x, y ∈ X:
I_x,y(C1,C3) ≤ I_x,y(C1,C2) + I_x,y(C2,C3)
I_x,y takes values 0/1, so the triangle inequality can only be violated when
– I_x,y(C1,C3) = 1 and I_x,y(C1,C2) = 0 and I_x,y(C2,C3) = 0
– Is this possible? No: if C1 agrees with C2 on x, y, and C2 agrees with C3 on x, y, then C1 and C3 must also agree on x, y, so I_x,y(C1,C3) = 0.

Clustering aggregation
Given partitions C1, …, Cm, find a partition C such that
D(C) = ∑_{i=1..m} D(C, Ci)
is minimized; D(C) is the aggregation cost.
(Example on the slide: a table with three input clusterings C1, C2, C3 of six points x1, …, x6, together with the resulting aggregate clustering C.)
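The objective in code, reusing disagreement_distance from the sketch above:

```python
def aggregation_cost(C, inputs):
    """D(C): total disagreement distance from C to the m input partitions."""
    return sum(disagreement_distance(C, Ci) for Ci in inputs)
```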

Why clustering aggregation?
Clustering categorical data: the two problems are equivalent, since each categorical attribute induces one input partition (objects that share a value of the attribute fall in the same cluster).
 U | City        | Profession | Nationality
x1 | New York    | Doctor     | U.S.
x2 | New York    | Teacher    | Canada
x3 | Boston      | Doctor     | U.S.
x4 | Boston      | Teacher    | Canada
x5 | Los Angeles | Lawyer     | Mexican
x6 | Los Angeles | Actor      | Mexican
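For instance, the table above unpacks into three input partitions, one per attribute, in the same hypothetical dict representation used earlier:

```python
# Each categorical attribute induces one input partition: objects sharing a
# value are in the same cluster (the attribute value serves as the label).
rows = {
    'x1': ('New York', 'Doctor', 'U.S.'),
    'x2': ('New York', 'Teacher', 'Canada'),
    'x3': ('Boston', 'Doctor', 'U.S.'),
    'x4': ('Boston', 'Teacher', 'Canada'),
    'x5': ('Los Angeles', 'Lawyer', 'Mexican'),
    'x6': ('Los Angeles', 'Actor', 'Mexican'),
}
partitions = [{x: vals[a] for x, vals in rows.items()} for a in range(3)]
# partitions[0] groups by City, [1] by Profession, [2] by Nationality;
# each is a valid input to the aggregation problem.
```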

Why clustering aggregation?
Identify the correct number of clusters
– the optimization function does not require an explicit number of clusters
Detect outliers
– outliers are defined as points for which there is no consensus

Why clustering aggregation?
Improve the robustness of clustering algorithms
– different algorithms have different weaknesses; combining them can produce a better result

Why clustering aggregation?
Privacy-preserving clustering
– different companies have data for the same users; they can compute an aggregate clustering without sharing the actual data

Complexity of Clustering Aggregation
The clustering-aggregation problem is NP-hard
– it is the median partition problem [Barthélemy and Leclerc 1995]
We therefore look for heuristics and approximate solutions: an algorithm ALG is a c-approximation if ALG(I) ≤ c · OPT(I) on every instance I.

A simple 2-approximation algorithm
The disagreement distance D(C,P) is a metric.
The algorithm BEST: select among the input clusterings the clustering C* that minimizes the total distance D(C*) = ∑_i D(C*, Ci).
– This is a 2-approximate solution. Why? For any input Ci and the optimal clustering COPT, the triangle inequality gives ∑_j D(Ci,Cj) ≤ ∑_j (D(Ci,COPT) + D(COPT,Cj)); averaging over all inputs Ci shows that the best input is within a factor 2 of COPT.
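BEST in one line, on top of the helpers sketched above:

```python
def best(inputs):
    """BEST: return the input clustering that minimizes the total
    disagreement distance to all input clusterings."""
    return min(inputs, key=lambda C: aggregation_cost(C, inputs))
```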

A 3-approximation algorithm
The BALLS algorithm:
– Select a point x and look at the set B of points within distance 1/2 of x
– If the average distance of x to B is less than 1/4, create the cluster B ∪ {x}
– Otherwise, create a singleton cluster {x}
– Repeat until all points are exhausted
Theorem: the BALLS algorithm has worst-case approximation factor 3.
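A sketch of BALLS under our reading of the slide: we take d(x,y) to be the fraction of input clusterings that separate x and y, and pick points in arbitrary order; the paper may fix a particular order and distance weighting.

```python
def balls(objects, inputs):
    """BALLS: pick an unclustered point x, collect the ball B of unclustered
    points within distance 1/2 of x, open the cluster B + {x} if x's average
    distance to B is below 1/4, and otherwise make x a singleton."""
    def d(x, y):  # fraction of input clusterings that separate x and y
        return sum(Ci[x] != Ci[y] for Ci in inputs) / len(inputs)

    remaining = set(objects)
    clusters = []
    while remaining:
        x = next(iter(remaining))  # an arbitrary unclustered point
        B = [y for y in remaining if y != x and d(x, y) <= 0.5]
        if B and sum(d(x, y) for y in B) / len(B) < 0.25:
            cluster = set(B) | {x}
        else:
            cluster = {x}
        clusters.append(cluster)
        remaining -= cluster
    return clusters
```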

Other algorithms
AGGLO:
– Start with all points in singleton clusters
– Merge the two clusters with the smallest average inter-cluster edge weight
– Repeat until the smallest average weight exceeds 1/2
LOCAL (sketched below):
– Start with a random partition of the points
– Remove a point from its cluster and try to merge it into another cluster, or make it a singleton, whenever this improves the aggregation cost
– Repeat until no further improvement is possible
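One possible reading of LOCAL as greedy single-point moves; the integer cluster labels, the move order, and the tie-breaking are our assumptions:

```python
def local(objects, inputs, partition):
    """LOCAL: move single points between clusters (or into a fresh singleton)
    as long as a move lowers the aggregation cost; partition maps each object
    to an integer cluster label and is modified in place."""
    improved = True
    while improved:
        improved = False
        for x in objects:
            original = partition[x]
            fresh = max(partition.values()) + 1  # label for a new singleton
            best_label = original
            best_cost = aggregation_cost(partition, inputs)
            for lab in set(partition.values()) | {fresh}:
                partition[x] = lab
                cost = aggregation_cost(partition, inputs)
                if cost < best_cost:
                    best_label, best_cost = lab, cost
            partition[x] = best_label
            if best_label != original:
                improved = True
    return partition
```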

Clustering Robustness

Lecture outline
Clustering aggregation
– Reference: A. Gionis, H. Mannila, P. Tsaparas: Clustering aggregation, ICDE 2005
Co-clustering (or bi-clustering)
References:
– A. Anagnostopoulos, A. Dasgupta and R. Kumar: Approximation algorithms for co-clustering, PODS 2008
– K. Puolamäki, S. Hanhijärvi and G. Garriga: An approximation ratio for biclustering, Information Processing Letters 2008

A Clustering
m points in R^n; group them into k clusters.
Represent them by a matrix A ∈ R^{m×n}
– a point corresponds to a row of A
Cluster: partition the rows into k groups.

Co-Clustering
Co-clustering: cluster the rows and columns of A simultaneously.
(Example on the slide: k = 2 row clusters and ℓ = 2 column clusters, forming co-clusters.)

Motivation: Sponsored Search
Sponsored search is the main source of revenue for search engines:
– Advertisers bid on keywords
– A user makes a query
– The engine shows ads of advertisers that are relevant and have high bids
– The user clicks an ad, or not

Motivation: Sponsored Search
For every (advertiser, keyword) pair we have:
– Bid amount
– Impressions
– Number of clicks
Mine this information at query time
– Maximize clicks / revenue

Co-Clusters in Sponsored Search
(Figure: a bid matrix over advertisers and keywords, with entries such as Skis.com, Air France, Vancouver and "ski boots"; the highlighted cell is the bid of skis.com for the keyword "ski boots".)
Markets = co-clusters: all the keywords in a co-cluster are relevant to its set of advertisers.

Co-Clustering in Sponsored Search
Applications:
– Keyword suggestion: recommend other relevant keywords to advertisers
– Broad matching / market expansion: include more advertisers in a query
– Isolate submarkets: important for economists; allows different advertising approaches
– Build taxonomies of advertisers / keywords

A Clustering of the rows
m points in R^n; group them into k clusters.
Represent them by a matrix A ∈ R^{m×n}
– a point corresponds to a row of A
Clustering: a partitioning of the rows into k groups.

Clustering of the columns
n points in R^m; group them into ℓ clusters.
Represent them by a matrix A ∈ R^{m×n}
– a point corresponds to a column of A
Clustering: a partitioning of the columns into ℓ groups.

Cost of clustering
In the representation A', every point of A (row or column) is replaced by the corresponding representative (row or column).
The quality of the clustering is measured by computing distances between the entries of A and A':
– k-means clustering: cost = ∑_{i=1..m} ∑_{j=1..n} (A(i,j) − A'(i,j))²
– k-median clustering: cost = ∑_{i=1..m} ∑_{j=1..n} |A(i,j) − A'(i,j)|

Co-Clustering
Co-clustering: cluster the rows and columns of A ∈ R^{m×n} simultaneously into k row clusters and ℓ column clusters.
Every cell of A is represented by a cell of A'; all cells in the same co-cluster are represented by the same value.
(Figure: the original data A next to the co-cluster representation A', which can be written as a product R·M·C of a row-assignment matrix R, a k×ℓ matrix M of co-cluster values, and a column-assignment matrix C.)

Co-Clustering Objective Function
In A', every cell of A is replaced by the representative value of its co-cluster.
The quality of the co-clustering is measured by computing distances between the entries of A and A':
– k-means co-clustering: cost = ∑_{i=1..m} ∑_{j=1..n} (A(i,j) − A'(i,j))²
– k-median co-clustering: cost = ∑_{i=1..m} ∑_{j=1..n} |A(i,j) − A'(i,j)|
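A small numpy sketch of these objectives; that the best block value is the mean for the k-means cost and the median for the k-median cost is a standard fact we are assuming, not something stated on the slide:

```python
import numpy as np

def coclustering_cost(A, row_labels, col_labels, norm="l2"):
    """Cost of the co-clustering (row_labels, col_labels) of matrix A.

    A' replaces every cell by its co-cluster representative: the block
    mean for the k-means (squared L2) objective, the block median for
    the k-median (L1) objective.
    """
    A_prime = np.empty_like(A, dtype=float)
    for r in np.unique(row_labels):
        for c in np.unique(col_labels):
            block_idx = np.ix_(row_labels == r, col_labels == c)
            block = A[block_idx]
            A_prime[block_idx] = block.mean() if norm == "l2" else np.median(block)
    diff = A.astype(float) - A_prime
    return (diff ** 2).sum() if norm == "l2" else np.abs(diff).sum()
```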

Some Background
Co-clustering is also known as biclustering, block clustering, …
Many objective functions exist for co-clustering:
– this is one of the easier ones
– others factor out row/column averages (priors)
– others are based on information-theoretic ideas (e.g., KL divergence)
There is a lot of existing work, but it is mostly heuristic:
– k-means style, alternating between rows and columns
– spectral techniques

Algorithm
1. Cluster the rows of A
2. Cluster the columns of A
3. Combine

Properties of the algorithm
Theorem 1: the algorithm, run with optimal row and column clusterings, is a 3-approximation to the co-clustering optimum.
Theorem 2: for the L2 distance function, the algorithm with optimal row/column clusterings is a 2-approximation.

Algorithm: details
The clustering of the m rows of A assigns every row to a cluster with a label in {1, …, k}:
– R(i) = r_i with 1 ≤ r_i ≤ k
The clustering of the n columns of A assigns every column to a cluster with a label in {1, …, ℓ}:
– C(j) = c_j with 1 ≤ c_j ≤ ℓ
A'(i,j) is labeled by the pair (r_i, c_j); cell (i,j) is in the same co-cluster as (i',j') if A'(i,j) = A'(i',j').
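Putting the three steps together: a minimal sketch that substitutes scikit-learn's KMeans for the optimal row and column clusterings assumed by the theorems, and fills each co-cluster with its block mean (our choice, matching the k-means objective):

```python
import numpy as np
from sklearn.cluster import KMeans

def cocluster(A, k, l):
    """Cluster rows, cluster columns, combine: every co-cluster (r, c) is
    represented by the mean of its block of A."""
    row_labels = KMeans(n_clusters=k, n_init=10).fit_predict(A)    # step 1
    col_labels = KMeans(n_clusters=l, n_init=10).fit_predict(A.T)  # step 2
    A_prime = np.empty_like(A, dtype=float)
    for r in range(k):                                             # step 3
        for c in range(l):
            block_idx = np.ix_(row_labels == r, col_labels == c)
            block = A[block_idx]
            if block.size:  # guard in case a cluster came back empty
                A_prime[block_idx] = block.mean()
    return row_labels, col_labels, A_prime
```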