Clustering Using Pairwise Comparisons


1 Clustering Using Pairwise Comparisons
R. Srikant, ECE/CSL, University of Illinois at Urbana-Champaign

2 Coauthors
Barbara Dembin, Siddhartha Satpathi
Builds on the work in R. Wu, J. Xu, R. Srikant, L. Massoulie, M. Lelarge, and B. Hajek, "Clustering and Inference from Pairwise Comparisons" (arXiv: v2).

3 Outline
Traditional Noisy Pairwise Comparisons
Our Problem: Clustering Users
Algorithm in Prior Work
New Algorithm
Conclusions

4 Noisy pairwise comparisons
[Illustration: an Amazon search for DSLR cameras; the user buys one of the listed items, implicitly ranking it above the others.]
Item 1 < Item 2; Item 3 < Item 2
Goal: Infer information about user preferences from such pairwise rankings.

5 Bradley-Terry model
Item i is associated with a score θ_i.
P(item i is preferred over item j) = e^{θ_i} / (e^{θ_i} + e^{θ_j})
Goal: Estimate the vector θ from the pairwise comparisons.
Assumption: all users belong to one cluster, i.e., have the same θ vector, so we can aggregate the results from all users to estimate θ.
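A minimal sketch of the model in Python (the function names and the toy scores are illustrative, not from the slides):

    import numpy as np

    def preference_prob(theta, i, j):
        # P(item i is preferred over item j) under the Bradley-Terry model
        return np.exp(theta[i]) / (np.exp(theta[i]) + np.exp(theta[j]))

    def sample_comparison(theta, i, j, rng):
        # +1 if item i wins the comparison, -1 if item j wins
        return 1 if rng.random() < preference_prob(theta, i, j) else -1

    rng = np.random.default_rng(0)
    theta = np.array([0.0, 1.0, 2.0])        # toy scores for 3 items
    print(preference_prob(theta, 2, 0))      # about 0.88: item 2 usually beats item 0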

6 The data about the m items
[Illustration: the observation matrix; rows are users, columns are the item pairs (1, 2), (1, 3), ..., (1, m), (2, 3), ..., (m-1, m); an entry is +1 if the user preferred the first item of the pair and -1 if the user preferred the second.]

7 Maximum likelihood estimation
Let R_ij be the number of times item i is preferred over item j.
Maximum likelihood estimate: θ̂ = argmax_γ L(γ), where L(γ) = Σ_{i,j} R_ij log( e^{γ_i} / (e^{γ_i} + e^{γ_j}) ).
Well studied: (Hunter 2004), (Negahban, Oh, D. Shah 2014).
Non-parametric: N. B. Shah and Wainwright (2016).
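For concreteness, a small sketch of the maximum likelihood estimate by gradient ascent (an illustration only; the cited works use more refined methods such as minorization-maximization and spectral approaches):

    import numpy as np

    def bt_mle(R, steps=2000, lr=0.05):
        # R[i, j] = number of times item i was preferred over item j
        m = R.shape[0]
        gamma = np.zeros(m)
        for _ in range(steps):
            diff = gamma[:, None] - gamma[None, :]     # gamma_i - gamma_j
            p = 1.0 / (1.0 + np.exp(-diff))            # p[i, j] = P(i beats j)
            # gradient of L(gamma) = sum_{i,j} R_ij log p_ij with respect to gamma_k
            grad = (R * (1 - p)).sum(axis=1) - (R * (1 - p)).sum(axis=0)
            gamma += lr * grad / max(R.sum(), 1)       # normalized ascent step
            gamma -= gamma.mean()                      # scores are identifiable only up to a shift
        return gamma

    # toy data for 3 items; item 2 wins most of its comparisons
    R = np.array([[0, 2, 1],
                  [8, 0, 3],
                  [9, 7, 0]])
    print(bt_mle(R))    # estimated scores increase from item 0 to item 2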

8 Outline
Traditional Noisy Pairwise Comparisons
Our Problem: Clustering Users
Algorithm in Prior Work
New Algorithm
Conclusions

9 Clustering Users & Ranking Items
[Illustration: an Amazon search for cameras.]
Different types of users use different score vectors.
Cluster users of the same type together, and then estimate the Bradley-Terry parameters for each cluster.

10 Generalized Bradley-Terry model
n users and m items (n, m → ∞).
Users are in r clusters (r is a constant); users in cluster k have the same score vector θ_k:
P(item i is preferred over item j by a user in cluster k) = e^{θ_{k,i}} / (e^{θ_{k,i}} + e^{θ_{k,j}})
Each user compares a pair of items with probability 1 − ϵ; we want to allow ϵ to be close to 1, i.e., each comparison is observed only with small probability.
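A sketch of how data from this model might be generated (function and variable names, and the toy sizes, are illustrative):

    import numpy as np

    def generate_comparisons(theta, labels, eps, rng):
        # theta: r x m matrix of cluster score vectors; labels[u] = cluster of user u
        # Each user compares each pair (i, j), i < j, independently with probability 1 - eps.
        # Returns {user: [(i, j, outcome), ...]} with outcome = +1 if i wins, -1 if j wins.
        r, m = theta.shape
        data = {}
        for u, k in enumerate(labels):
            obs = []
            for i in range(m):
                for j in range(i + 1, m):
                    if rng.random() < 1 - eps:
                        p = np.exp(theta[k, i]) / (np.exp(theta[k, i]) + np.exp(theta[k, j]))
                        obs.append((i, j, 1 if rng.random() < p else -1))
            data[u] = obs
        return data

    rng = np.random.default_rng(1)
    theta = rng.normal(size=(2, 20))            # 2 clusters, 20 items (toy sizes)
    labels = rng.integers(0, 2, size=50)        # 50 users
    data = generate_comparisons(theta, labels, eps=0.8, rng=rng)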

11 Observation Matrix
[Illustration: rows are users, columns are the item pairs (1, 2), (1, 3), ..., (1, m), (2, 3), ..., (m-1, m); an entry is +1 if the user preferred the first item of the pair and -1 if the user preferred the second.]

12 Observation Matrix
[Illustration: the same matrix with missing entries marked '?': each user compares only a small random subset of the pairs, so most entries are unobserved.]

13 Questions
We focus on the clustering problem. Once users are clustered, parameter estimation can be performed using other techniques; the results here don't explicitly depend on the Bradley-Terry model.
What is the minimum number of samples (pairwise comparisons) needed to cluster the users from pairwise comparison data?
What algorithm should we use to achieve this limit?
We will answer these questions in reverse order.

14 Outline
Traditional Noisy Pairwise Comparisons
Our Problem: Clustering Users
Algorithm in Prior Work
New Algorithm
Conclusions

15 Net Wins Matrix
[Illustration with m = 4 items: a user's row of comparisons over the pairs (1,2), (1,3), (1,4), (2,3), (2,4), (3,4) is collapsed into a row over the items 1, 2, 3, 4, whose entry for item i is the number of comparisons item i won minus the number it lost for that user.]

16 Why Net Wins Matrix?
The original pairwise comparison data is very noisy, unless the same pair of items is shown to the same user many times (which is not the case in our model).
The net wins matrix reduces the O(m²) comparisons for each user to information about the m items, which makes the data less noisy.
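One way to form the net wins matrix from comparison data in the format sketched above:

    import numpy as np

    def net_wins_matrix(data, n_users, m):
        # S[u, i] = (# comparisons item i won) - (# comparisons item i lost) for user u
        S = np.zeros((n_users, m))
        for u, obs in data.items():
            for i, j, outcome in obs:
                S[u, i] += outcome    # outcome = +1 means i won, -1 means i lost
                S[u, j] -= outcome
        return S

    S = net_wins_matrix(data, n_users=50, m=20)   # continuing the toy example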

17 Spectral Clustering: Clustering rows of Net Wins Matrix
Step 1: The expected net wins matrix has only r linearly independent rows, so the net wins matrix computed from the data has r dominant singular values.
[Figure: singular value distribution of the net wins matrix for an example with r = 10.]

18 Spectral Clustering
Step 2: Perform a singular value decomposition, retain only the top r singular values (σ_1 > … > σ_r), and set the rest equal to zero.

19 Spectral Clustering
Step 3: Cluster the rows of the rank-r projection, for example using the k-means algorithm.
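A sketch of Steps 2-3 applied to the net wins matrix S from above, with SciPy's k-means standing in for whatever k-means variant is used in the paper:

    import numpy as np
    from scipy.cluster.vq import kmeans2

    def spectral_cluster_users(S, r):
        # Step 2: rank-r projection of the net wins matrix via the SVD
        U, sigma, Vt = np.linalg.svd(S, full_matrices=False)
        S_r = (U[:, :r] * sigma[:r]) @ Vt[:r, :]    # keep only the top r singular values
        # Step 3: cluster the rows of the rank-r projection with k-means
        _, labels = kmeans2(S_r, r, minit='++')
        return labels

    est_labels = spectral_cluster_users(S, r=2)     # continuing the toy example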

20 Result from Prior Work (assume m = n)
With r² log³ n pairwise comparisons per user, at most K log n users are misclustered with high probability.
While the fraction of misclustered users goes to zero, the rate at which it goes to zero is not satisfactory.
Moreover, to prove that perfect clustering (all users correctly clustered with high probability) is achieved, we need n r² log⁵ n pairwise comparisons per user.
Can we prove that perfect clustering is achieved with high probability with far fewer comparisons? Yes, by tweaking the previous algorithm (spectral clustering on the net wins matrix).

21 Outline
Traditional Noisy Pairwise Comparisons
Our Problem: Clustering Users
Algorithm in Prior Work
New Algorithm
Conclusions

22 Outline of the Algorithm
Split the items into different partitions, and only consider the pairwise comparison data within each partition (inspired by (Vu, 2014) for community detection).
Apply the previous algorithm to each data partition, and cluster the users based on the information in each partition.
This can result in inconsistent clusters: users 1 and 2 may be in the same cluster in one partition, but not in another. Which of these clusterings is correct?
Use simple majority voting to correct errors, i.e., assign each user to the cluster to which it belongs most often.

23 Data Partitioning
Split the items into L sets. Example: L = 2, with item sets {1, 2, 3} and {4, 5, 6}.
[Illustration: the comparison matrix over all pairs (1, 2), (1, 3), ..., (5, 6) is split into one matrix over the pairs within {1, 2, 3} and one over the pairs within {4, 5, 6}; comparisons between items in different sets, such as (1, 4) or (2, 6), are discarded.]
This gives L pairwise comparison matrices and L net wins matrices. (Note: some data is lost.)
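A sketch of the partitioning step (here the items are assigned to the L sets uniformly at random; the helper names are illustrative):

    import numpy as np

    def partition_comparisons(data, m, L, rng):
        # Assign each item to one of L sets; for each user keep only the comparisons
        # whose two items fall in the same set (cross-set comparisons are discarded).
        item_set = rng.integers(0, L, size=m)
        parts = [{u: [] for u in data} for _ in range(L)]
        for u, obs in data.items():
            for i, j, outcome in obs:
                if item_set[i] == item_set[j]:
                    parts[item_set[i]][u].append((i, j, outcome))
        return parts, item_set

    parts, item_set = partition_comparisons(data, m=20, L=2, rng=rng)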

24 Cluster Users Based on Each Partition
[Illustration: partition 1's net wins matrix is over its items (e.g., 1, 3, 4, ..., 18) and partition L's is over its items (e.g., 2, 5, 19, 33, ...); each of the L net wins matrices is fed to spectral clustering, producing L different clusterings of the users into r clusters.]

25 Numbering the Clusters
Number the clusters 1, 2, …, r arbitrarily in the first data partition.
For the second partition, the cluster that overlaps the most with cluster 1 of Partition 1 is called cluster 1, the cluster that overlaps the most with cluster 2 of Partition 1 is called cluster 2, and so on.
[Illustration: Partition 1 has clusters labeled 1, 2, 3; the cluster labels for Partitions 2-4 are not yet assigned.]

26 Numbering the Clusters
Number the clusters 1, 2, …, r in the results from the first data partition.
For each subsequent partition, the cluster that overlaps the most with cluster 1 of Partition 1 is called cluster 1, the cluster that overlaps the most with cluster 2 of Partition 1 is called cluster 2, and so on.
[Illustration: the clusters of Partitions 2-4 are relabeled to match Partition 1, e.g. (3, 2, 1), (1, 3, 2), and (2, 1, 3).]
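A sketch of the relabeling step; this greedy matching by overlap size is one way to implement the rule described on the slide:

    import numpy as np

    def align_labels(ref_labels, labels, r):
        # Relabel `labels` so each of its clusters takes the number of the
        # reference cluster it overlaps with the most.
        overlap = np.zeros((r, r), dtype=int)   # overlap[b, a] = # users with current label b and reference label a
        for a, b in zip(ref_labels, labels):
            overlap[b, a] += 1
        mapping = -np.ones(r, dtype=int)
        used = set()
        # process label pairs in order of decreasing overlap, matching each current
        # cluster to an as-yet-unused reference label
        for b, a in sorted(np.ndindex(r, r), key=lambda t: -overlap[t[0], t[1]]):
            if mapping[b] == -1 and a not in used:
                mapping[b] = a
                used.add(a)
        return mapping[labels]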

27 Clustering the Users
A user may belong to cluster 1 in one partition, but to some other cluster in another partition. Majority voting determines the final cluster for each user.
[Illustration: with L = 4 data partitions and r = 3 clusters, user u is assigned to the cluster it belongs to in the most partitions; here, cluster 2.]
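A sketch of the voting step:

    import numpy as np

    def majority_vote(all_labels):
        # all_labels: L x n array; row ell is the aligned clustering from partition ell.
        # Each user is assigned to the cluster label it received most often.
        all_labels = np.asarray(all_labels)
        n_users = all_labels.shape[1]
        return np.array([np.bincount(all_labels[:, u]).argmax() for u in range(n_users)])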

28 Summary of the algorithm
[Illustration of the pipeline: partition the items uniformly into L sets; form a net wins matrix for each partition; run spectral clustering on each partition to obtain L clusterings of the users into r clusters; combine them by majority voting to produce the final clustering of users.]
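Chaining together the sketches from the previous slides (all helper functions and toy sizes as assumed above):

    def cluster_users(data, n_users, m, r, L, rng):
        # Partition items, spectrally cluster users within each partition,
        # align the cluster numbering to that of the first partition, then vote.
        parts, _ = partition_comparisons(data, m, L, rng)
        clusterings = [spectral_cluster_users(net_wins_matrix(p, n_users, m), r) for p in parts]
        aligned = [clusterings[0]] + [align_labels(clusterings[0], c, r) for c in clusterings[1:]]
        return majority_vote(aligned)

    final_labels = cluster_users(data, n_users=50, m=20, r=2, L=2, rng=rng)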

29 Main Result
Previous result: if more than n r² log⁵ n pairwise comparisons per user are available, all users are correctly clustered w.p. at least 1 − 1/n.
New result: if more than r log⁵ n pairwise comparisons per user are available, all users are correctly clustered w.p. at least 1 − 1/n.
Key idea: spectral clustering on its own leaves many users incorrectly clustered. Split the items into many groups, perform spectral clustering on each, and combine the results using majority voting.
This works despite the loss of data in the partitioning process, and the idea applies to more general models than the Bradley-Terry model.

30 Outline of the Proof: Part I
Two rows of the expected net wins matrix belonging to different clusters are well separated (by assumption): ||S_u − S_v||_2 > C_1 (1 − ϵ) n.
Let P_r(·) denote the rank-r projection and Ŝ the observed net wins matrix. Using concentration inequalities, ||P_r(Ŝ)_u − S_u||_2 ≤ C_2 √((1 − ϵ) n) log n.

31 Outline of the Proof: Part II
All the clusters are well separated with high probability if we have a lot of measurements (as in the previous paper).
But with fewer measurements, the probability of misclustering a given user is some δ that does not go to zero as n → ∞.
[Illustration: the projected rows concentrate tightly around S_u and S_v when there are many measurements, but the two clouds overlap when measurements are few.]

32 Outline of the Proof: Part III
Partition items into 𝐿 sets In each set, user 𝑒 is misclustered w.p. Ξ΄ By the Chernoff bound, 𝑃( 𝑒 is misclustered in more than 𝐿/2 sets) < 𝑒π‘₯𝑝(βˆ’ π›Ώβˆ’ 𝐿/2) For 𝐿=𝐢 log⁑(𝑛), majority voting clusters all users correctly
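A concrete instantiation of the last two steps, using Hoeffding's form of the Chernoff bound so that c(δ) = 2(1/2 − δ)²:

\[
  \Pr\bigl(u \text{ misclustered in more than } L/2 \text{ sets}\bigr)
    \le e^{-2(1/2-\delta)^2 L}
    = n^{-2(1/2-\delta)^2 C}
  \quad\text{for } L = C \log n,
\]
\[
  \Pr(\text{some user is misclustered by the vote})
    \le n \cdot n^{-2(1/2-\delta)^2 C}
    \le \frac{1}{n}
  \quad\text{when } C \ge \frac{1}{(1/2-\delta)^2}.
\]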

33 Lower Bound on Sample Complexity
Event A: two users from different clusters have no pairwise comparisons. If A occurs, the users cannot all be clustered correctly.
P(A) → 1 as n → ∞ when 1 − ϵ < O(log n / n²), i.e., when the expected number of comparisons per user is below roughly log n.

34 Main Result
If more than r log⁵ n pairwise comparisons per user are available, all users are correctly clustered w.p. at least 1 − 1/n.
The number of comparisons required is within a polylog factor of the lower bound.
Assumption required for the main result: the rows of the expected net wins matrix corresponding to different clusters are well separated.

35 Related Work
Vu (2014): exact cluster recovery in community detection through spectral methods; partitions the data into two sets, using one for clustering and the other to correct errors in the recovered clusters.
Lu-Negahban (2014): Bradley-Terry parameters are different for each user, but form a low-rank matrix.
Park, Neeman, Zhang, Sanghavi (2015): related to the model above, but with a different algorithm.
Oh, Thekumparampil, Xu (2015): generalization to multi-item rankings.

36 Conclusions
An algorithm that achieves perfect clustering with high probability: majority voting over spectral clusterings of different data partitions.
The number of samples required is within a poly(log n) factor of a lower bound.

