Learning Clusterwise Similarity with First-Order Features
Aron Culotta and Andrew McCallum
University of Massachusetts - Amherst
NIPS Workshop on Theoretical Foundations of Clustering
December 10, 2005
Supervised Clustering
–Estimate a pairwise similarity metric
Supervised Clustering
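Conditional Models of Identity Uncertainty with Application to Noun Coreference [McCallum, Wellner 04]
[Figure: graphical model over three mention nodes x1, x2, x3 (the mentions are "He", "Jon", and "Jonathan"). Each pair of mentions has a coreference variable (y12, y13, y23) scored by the learned pairwise metric (factors g12, g13, g23); the factor g123 is a transitivity-checking function.]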
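The role of the transitivity-checking function can be illustrated with a minimal sketch. It assumes the pairwise decisions are stored in a dictionary keyed by unordered pairs; the name `is_transitive` and this data layout are illustrative, not taken from [McCallum, Wellner 04].

```python
from itertools import combinations

def is_transitive(y, nodes):
    """Return True if the pairwise decisions y describe a valid clustering.

    y maps frozenset({i, j}) -> 1 (coreferent) or 0 (not coreferent).
    A triple with exactly two positive decisions is inconsistent: if i~j
    and j~k, then i~k must also hold.
    """
    for a, b, c in combinations(nodes, 3):
        positives = (y[frozenset((a, b))]
                     + y[frozenset((b, c))]
                     + y[frozenset((a, c))])
        if positives == 2:
            return False
    return True

# Hypothetical decisions for the three mentions x1, x2, x3.
nodes = [1, 2, 3]
consistent = {frozenset((1, 2)): 1, frozenset((2, 3)): 1, frozenset((1, 3)): 1}
inconsistent = {frozenset((1, 2)): 1, frozenset((2, 3)): 1, frozenset((1, 3)): 0}
```

Here `is_transitive(consistent, nodes)` holds, while `inconsistent` merges x1 with x2 and x2 with x3 but keeps x1 and x3 apart, so the check fails.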
Inference = Graph Partitioning [McCallum, Wellner 04] [Boykov et al 99] [Bansal et al 02]
[Figure: the mentions "He", "Jon", and "Jonathan" as graph nodes x1, x2, x3 to be partitioned.]
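To make the graph-partitioning view concrete, here is a small sketch that scores a partition by the sum of its within-cluster edge weights and finds the best partition by brute force. The objective, the edge weights, and the exhaustive search are assumptions for illustration; enumerating all partitions is only feasible for a handful of nodes, and the cited work relies on approximate partitioning methods instead.

```python
from itertools import combinations

def partitions(items):
    """Yield every partition of a list as a list of clusters (lists)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        # Put `first` in a cluster of its own...
        yield [[first]] + part
        # ...or add it to each existing cluster in turn.
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]

def score(partition, w):
    """Sum of edge weights between nodes placed in the same cluster."""
    return sum(w[frozenset((a, b))]
               for cluster in partition
               for a, b in combinations(cluster, 2))

def best_partition(items, w):
    """Exhaustively pick the highest-scoring partition."""
    return max(partitions(items), key=lambda p: score(p, w))

# Hypothetical pairwise scores: positive favors merging, negative favors splitting.
w = {
    frozenset(("Jon", "Jonathan")): 2.0,
    frozenset(("Jon", "He")): 0.5,
    frozenset(("Jonathan", "He")): -1.0,
}
best = best_partition(["He", "Jon", "Jonathan"], w)
```

With these made-up weights the best partition groups "Jon" with "Jonathan" and leaves "He" alone, because adding "He" to the cluster would bring in the negative edge to "Jonathan".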
Inside the Pairwise Metric
–String x_i has low edit distance to x_j
–x_i is a pronoun in the same sentence as x_j
–x_i is the same number and gender as x_j
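The three features above can be sketched as binary functions of a mention pair, combined into a weighted score. The mention dictionaries, the edit-distance threshold of 2, and the weights are all hypothetical; in the supervised setting the weights would be learned from labeled data.

```python
def edit_distance(s, t):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (cs != ct)))  # substitution or match
        prev = cur
    return prev[-1]

def pair_features(xi, xj, max_dist=2):
    """Binary features of a mention pair (max_dist is an assumed threshold)."""
    return [
        int(edit_distance(xi["text"], xj["text"]) <= max_dist),
        int(xi["is_pronoun"] and xi["sentence"] == xj["sentence"]),
        int(xi["number"] == xj["number"] and xi["gender"] == xj["gender"]),
    ]

def pair_score(xi, xj, weights):
    """Weighted sum of the pairwise features."""
    return sum(w * f for w, f in zip(weights, pair_features(xi, xj)))

# Hypothetical mentions and weights.
he = {"text": "He", "is_pronoun": True, "sentence": 0, "number": "sg", "gender": "m"}
jon = {"text": "Jon", "is_pronoun": False, "sentence": 0, "number": "sg", "gender": "m"}
weights = [2.0, 1.0, 1.5]
```

Calling `pair_score(he, jon, weights)` sums the weights of the features that fire for that pair.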
Drawbacks of Pairwise Metric
Cannot represent cluster-wide constraints, e.g.:
–A cluster containing pronouns should contain at least one non-pronoun.
–A researcher is unlikely to publish in more than 5 different conferences in the same year.
–A person is unlikely to have more than 3 different job titles in the same year. [Milch et al 04]
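A constraint like the conference limit depends on the whole cluster, so no function of a single pair can evaluate it. A minimal sketch, assuming each record in a cluster is a dictionary with hypothetical `year` and `venue` fields:

```python
from collections import defaultdict

def too_many_venues(cluster, limit=5):
    """True if any single year has more than `limit` distinct venues."""
    venues_by_year = defaultdict(set)
    for record in cluster:
        venues_by_year[record["year"]].add(record["venue"])
    return any(len(venues) > limit for venues in venues_by_year.values())

# Hypothetical author cluster: six distinct venues in the same year.
cluster = [{"year": 2005, "venue": f"Conf{i}"} for i in range(6)]
```

Every pair of records in `cluster` differs in only one venue, which is unremarkable on its own; the violation only appears when the cluster is examined as a unit.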
Clusterwise Metric
Measures compatibility of all nodes in a cluster
Enables first-order features
–mean, median, mode of attributes
–maximum string edit distance is K
–cluster size is greater than N
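The first-order features listed above can be sketched as a single function over an entire cluster. The argument layout is an assumption: `values` is a numeric attribute of each node, `strings` holds the node strings, and `dist` is any string distance supplied by the caller.

```python
from itertools import combinations
from statistics import mean, median, mode

def cluster_features(values, strings, dist, k, n):
    """First-order features computed over all nodes of one cluster."""
    # Largest distance between any two strings in the cluster (0 if fewer than two).
    max_dist = max((dist(a, b) for a, b in combinations(strings, 2)), default=0)
    return {
        "mean": mean(values),
        "median": median(values),
        "mode": mode(values),
        "max_edit_distance_is_k": max_dist == k,
        "size_greater_than_n": len(strings) > n,
    }

# Stand-in distance for illustration only (not a true edit distance).
length_gap = lambda a, b: abs(len(a) - len(b))
feats = cluster_features([2004, 2004, 2005], ["Jon", "Jonathan", "J."],
                         dist=length_gap, k=6, n=2)
```

Passing the distance in as a parameter keeps the sketch independent of any particular string metric; a real edit distance could be substituted without changing the feature code.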
Probabilistic Interpretation of Pairwise Metric Learning
[Figure: graphical model with nodes x1, x2, x3, pairwise variables y12, y13, y23, and pairwise factors g12, g13, g23.]
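Probabilistic Interpretation of Clusterwise Metric Learning
[Figure: the same graphical model (nodes x1, x2, x3, pairwise variables y12, y13, y23, factors g12, g13, g23) extended with a clusterwise variable y123 and its factor g123.]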
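The two interpretations can be sketched as factorizations over the variables named in the figures. The exact form below is an assumption: the names g_ij, g_123, y_ij, and y_123 come from the slides, while the feature functions f_k, the weights \lambda_k, and the normalizers Z and Z' are illustrative. Any factor tying y_123 to the pairwise variables is left out for brevity.

```latex
% Pairwise metric: one log-linear factor per pair of nodes
P(y_{12}, y_{13}, y_{23} \mid x)
  = \frac{1}{Z(x)} \prod_{i<j} g_{ij}(y_{ij}, x_i, x_j),
\qquad
g_{ij}(y_{ij}, x_i, x_j) = \exp\Big( \sum_k \lambda_k f_k(x_i, x_j, y_{ij}) \Big)

% Clusterwise metric: an additional variable and factor over the whole cluster
P(y_{12}, y_{13}, y_{23}, y_{123} \mid x)
  = \frac{1}{Z'(x)} \Big[ \prod_{i<j} g_{ij}(y_{ij}, x_i, x_j) \Big]\,
    g_{123}(y_{123}, x_1, x_2, x_3)
```

Under this reading, learning the metric amounts to estimating the weights \lambda_k from labeled clusterings, and the clusterwise factor is where first-order features over all of x_1, x_2, x_3 can enter.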
Empirical Results
Citation matching
–paper deduplication
–author deduplication/disambiguation
Proper noun coreference
Modest but consistent improvements over the pairwise metric (10-30% error reduction)
Implications of Clusterwise Metric
[Figure: three nodes x1, x2, x3, annotated "Locally compatible" and "Globally incompatible".]
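One way to read "locally compatible, globally incompatible" is a cluster in which every pair passes the pairwise test while the cluster as a whole fails a clusterwise one. The sketch below is an assumed illustration, not a reconstruction of the original figure: both tests and the three mentions are hypothetical.

```python
from itertools import combinations

def pair_compatible(a, b):
    """Hypothetical pairwise test: number and gender agree."""
    return a["number"] == b["number"] and a["gender"] == b["gender"]

def cluster_compatible(cluster):
    """Hypothetical clusterwise test: at least one non-pronoun is present."""
    return any(not m["is_pronoun"] for m in cluster)

# Three pronouns that agree with one another pair by pair.
cluster = [
    {"text": "He", "is_pronoun": True, "number": "sg", "gender": "m"},
    {"text": "him", "is_pronoun": True, "number": "sg", "gender": "m"},
    {"text": "his", "is_pronoun": True, "number": "sg", "gender": "m"},
]

locally_ok = all(pair_compatible(a, b) for a, b in combinations(cluster, 2))
globally_ok = cluster_compatible(cluster)
```

Here `locally_ok` is true and `globally_ok` is false: a purely pairwise metric would accept this cluster, while a clusterwise feature can reject it.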
Open Questions
What is the geometric interpretation of a clusterwise metric?
What are the implications of clusterwise metrics for common clustering methods?
What is the kernel interpretation of a clusterwise metric?
References
N. Bansal et al. Correlation Clustering. FOCS 2002.
Y. Boykov et al. Fast Approximate Energy Minimization via Graph Cuts. ICCV.
A. Culotta and A. McCallum. Practical Markov logic containing first-order quantifiers with application to identity uncertainty. Technical Report IR-430, University of Massachusetts, September.
A. McCallum and B. Wellner. Conditional models of identity uncertainty with applications to proper noun coreference. NIPS 2004.
B. Milch et al. BLOG: Relational modeling with unknown objects. Statistical Relational Learning Workshop, ICML 2004.