Bayesian Sets Zoubin Ghahramani and Katherine A. Heller NIPS 2005 Presented by Qi An Mar. 17th, 2006
Outline Introduction Bayesian Sets Implementation –Binary data –Exponential families Experimental results Conclusions
Introduction Inspired by “Google™ Sets”. What do Jesus and Darwin have in common? –Two different views on the origin of man –There are colleges at Cambridge University named after them. The objective is to retrieve items from a concept or cluster, given a query consisting of a few items from that cluster.
Introduction Consider a universe of items $D$, which can be a set of web pages, movies, people or any other objects depending on the application. Make a query with a small subset of items $D_c \subset D$, which are assumed to be examples of some cluster in the data. The algorithm provides a completion to the query set, $D'_c \subset D$. It presumably includes all the elements in $D_c$ and other elements in $D$ that are also in this cluster.
Introduction View the problem from two perspectives: –Clustering on demand: unlike completely unsupervised clustering algorithms, here the query provides supervised hints or constraints as to the membership of a particular cluster. –Information retrieval: retrieve the items that are relevant to the query and rank the output by relevance to the query.
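Bayesian Sets A very simple algorithm. Given the item set $D$ and a query $D_c \subset D$, we aim to rank the elements of $D$ by how well they would “fit into” a set which includes $D_c$. Define a score for each $x \in D$:
$$\mathrm{score}(x) = \frac{p(x \mid D_c)}{p(x)}$$
From Bayes' rule, the score can be rewritten as:
$$\mathrm{score}(x) = \frac{p(x, D_c)}{p(x)\, p(D_c)}$$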
Bayesian Sets Intuitively, the score compares the probability that $x$ and the query set $D_c$ were generated by the same model with the same unknown parameters $\theta$, to the probability that $x$ and $D_c$ came from models with different parameters $\theta$ and $\theta'$.
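To make the comparison concrete, the terms in the score can be written as marginal likelihoods. The following is a sketch assuming the items are drawn i.i.d. from a model $p(x \mid \theta)$ with prior $p(\theta)$, and writing $D_c$ for the query set:

```latex
\begin{align}
p(x) &= \int p(x \mid \theta)\, p(\theta)\, d\theta \\
p(D_c) &= \int \prod_{x_i \in D_c} p(x_i \mid \theta)\, p(\theta)\, d\theta \\
p(x, D_c) &= \int p(x \mid \theta) \prod_{x_i \in D_c} p(x_i \mid \theta)\, p(\theta)\, d\theta
\end{align}
```

The numerator $p(x, D_c)$ shares a single $\theta$ between $x$ and the query, while the denominator $p(x)\,p(D_c)$ integrates over two independent parameter draws.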
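Sparse Binary Data Assume each item $x_i \in D_c$ is a binary vector $x_i = (x_{i1}, \ldots, x_{iJ})$ with $x_{ij} \in \{0, 1\}$, where each component is a binary variable from an independent Bernoulli distribution:
$$p(x_i \mid \theta) = \prod_{j=1}^{J} \theta_j^{x_{ij}} (1 - \theta_j)^{1 - x_{ij}}$$
The conjugate prior for a Bernoulli distribution is a Beta distribution:
$$p(\theta \mid \alpha, \beta) = \prod_{j=1}^{J} \frac{\Gamma(\alpha_j + \beta_j)}{\Gamma(\alpha_j)\,\Gamma(\beta_j)}\, \theta_j^{\alpha_j - 1} (1 - \theta_j)^{\beta_j - 1}$$
For a query $D_c = \{x_i\}$ consisting of $N$ vectors:
$$p(D_c \mid \alpha, \beta) = \prod_{j=1}^{J} \frac{\Gamma(\alpha_j + \beta_j)}{\Gamma(\alpha_j)\,\Gamma(\beta_j)} \cdot \frac{\Gamma(\tilde{\alpha}_j)\,\Gamma(\tilde{\beta}_j)}{\Gamma(\tilde{\alpha}_j + \tilde{\beta}_j)}$$
where $\tilde{\alpha}_j = \alpha_j + \sum_{i=1}^{N} x_{ij}$ and $\tilde{\beta}_j = \beta_j + N - \sum_{i=1}^{N} x_{ij}$.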
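The Beta-Bernoulli marginal likelihood of a query can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, the query is assumed to be a list of equal-length 0/1 lists, and the hyperparameters are assumed to be positive. It uses the standard updates (each alpha grows by the count of ones in that feature, each beta by the count of zeros) and the fact that each per-feature factor is a ratio of Beta functions.

```python
from math import lgamma


def update_hyperparameters(query, alpha, beta):
    """Return (alpha_tilde, beta_tilde) after observing the query vectors.

    alpha_tilde[j] = alpha[j] + (number of ones in feature j)
    beta_tilde[j]  = beta[j]  + (number of zeros in feature j)
    """
    n = len(query)
    counts = [sum(x[j] for x in query) for j in range(len(alpha))]
    alpha_t = [a + s for a, s in zip(alpha, counts)]
    beta_t = [b + n - s for b, s in zip(beta, counts)]
    return alpha_t, beta_t


def log_beta_fn(a, b):
    """Log of the Beta function B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_marginal_likelihood(query, alpha, beta):
    """log p(D_c | alpha, beta) = sum_j [log B(alpha~_j, beta~_j) - log B(alpha_j, beta_j)]."""
    alpha_t, beta_t = update_hyperparameters(query, alpha, beta)
    return sum(
        log_beta_fn(at, bt) - log_beta_fn(a, b)
        for a, b, at, bt in zip(alpha, beta, alpha_t, beta_t)
    )
```

For a single feature and a single observed 1, the marginal likelihood reduces to the prior mean alpha / (alpha + beta), which is a convenient sanity check.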
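Sparse Binary Data The score can be computed as:
$$\mathrm{score}(x) = \prod_{j=1}^{J} \frac{\alpha_j + \beta_j}{\alpha_j + \beta_j + N} \left( \frac{\tilde{\alpha}_j}{\alpha_j} \right)^{x_j} \left( \frac{\tilde{\beta}_j}{\beta_j} \right)^{1 - x_j}$$
If we take the log of the score and put the entire data set into one large matrix $X$ with $J$ columns, we can compute a vector $s$ of log scores for all points using a single matrix-vector multiplication:
$$s = c + Xq$$
where
$$c = \sum_{j=1}^{J} \left[ \log(\alpha_j + \beta_j) - \log(\alpha_j + \beta_j + N) + \log \tilde{\beta}_j - \log \beta_j \right]$$
and
$$q_j = \log \tilde{\alpha}_j - \log \alpha_j - \log \tilde{\beta}_j + \log \beta_j$$
Here $\tilde{\alpha}_j = \alpha_j + \sum_i x_{ij}$ and $\tilde{\beta}_j = \beta_j + N - \sum_i x_{ij}$ are the Beta hyperparameters updated with the $N$ query items.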
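The log-score computation can be sketched as follows. This is an illustrative pure-Python version under stated assumptions: `bayesian_sets_log_scores` is a hypothetical name, `X` is a list of 0/1 rows, `query_rows` holds the row indices forming the query, and `alpha` and `beta` are positive Beta hyperparameters chosen by the caller. The updated hyperparameters are the prior values plus the per-feature counts of ones and zeros in the query.

```python
from math import log


def bayesian_sets_log_scores(X, query_rows, alpha, beta):
    """Return the log score of every row of X with respect to the query.

    Computes s = c + X q, where c is a scalar and q a length-J vector
    derived from the prior and query-updated Beta hyperparameters.
    """
    n = len(query_rows)
    J = len(alpha)

    # Per-feature counts of ones among the query items.
    counts = [sum(X[i][j] for i in query_rows) for j in range(J)]
    alpha_t = [alpha[j] + counts[j] for j in range(J)]
    beta_t = [beta[j] + n - counts[j] for j in range(J)]

    # Constant term shared by all items.
    c = sum(
        log(alpha[j] + beta[j]) - log(alpha[j] + beta[j] + n)
        + log(beta_t[j]) - log(beta[j])
        for j in range(J)
    )
    # Per-feature weight added whenever an item has feature j switched on.
    q = [
        log(alpha_t[j]) - log(alpha[j]) - log(beta_t[j]) + log(beta[j])
        for j in range(J)
    ]

    # Matrix-vector product X q, written out row by row.
    return [c + sum(x_j * q_j for x_j, q_j in zip(row, q)) for row in X]
```

Sorting the items by descending log score gives the ranked completion of the query. With an array library such as NumPy the last line becomes `c + X @ q`, and a sparse `X` keeps the cost proportional to the number of nonzero entries.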
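Exponential Families If the distribution for the model is not a Bernoulli distribution, but has the form of an exponential family:
$$p(x \mid \theta) = f(x)\, g(\theta) \exp\{\theta^{\top} u(x)\}$$
we can use the conjugate prior:
$$p(\theta \mid \eta, \nu) = h(\eta, \nu)\, g(\theta)^{\eta} \exp\{\theta^{\top} \nu\}$$
so that, for a query $D_c$ of $N$ items, the score is:
$$\mathrm{score}(x) = \frac{h(\eta + 1,\, \nu + u(x))\; h\!\left(\eta + N,\, \nu + \sum_{i} u(x_i)\right)}{h(\eta, \nu)\; h\!\left(\eta + N + 1,\, \nu + u(x) + \sum_{i} u(x_i)\right)}$$
where the sums run over the items $x_i \in D_c$.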
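As one concrete instance of the exponential-family score, consider count data with a Poisson likelihood and a Gamma prior on the rate. This choice is an assumption made for illustration and does not come from the slides. With natural parameter $\theta = \log \lambda$ and sufficient statistic $u(x) = x$, the conjugate prior's normalizer works out to $h(\eta, \nu) = \eta^{\nu} / \Gamma(\nu)$. The function names below are hypothetical.

```python
from math import lgamma, log


def log_h(eta, nu):
    """log h(eta, nu) for the Poisson-Gamma pair: h = eta**nu / Gamma(nu)."""
    return nu * log(eta) - lgamma(nu)


def poisson_log_score(x, query, eta, nu):
    """Log score of a count x against a query of counts.

    Implements
      log h(eta+1, nu+u(x)) + log h(eta+N, nu+S)
      - log h(eta, nu) - log h(eta+N+1, nu+u(x)+S)
    with u(x) = x and S the sum of the query counts.
    """
    n = len(query)
    s = sum(query)  # summed sufficient statistics of the query items
    return (
        log_h(eta + 1, nu + x)
        + log_h(eta + n, nu + s)
        - log_h(eta, nu)
        - log_h(eta + n + 1, nu + s + x)
    )
```

A count close to the query values should receive a higher log score than a count far from them; for example, with the query `[5, 6, 4]` and `eta = nu = 1.0`, the item `5` scores above the item `50`.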
Experimental results The experiments are performed on three different datasets: the Grolier Encyclopedia dataset, the EachMovie dataset and the NIPS authors dataset. The algorithm runs very quickly on all three datasets.
Conclusions A simple algorithm which takes a query of a small set of items and returns additional items from the universe $D$ belonging to this set. The score is computed w.r.t. a statistical model, and the unknown model parameters are all marginalized out. With conjugate priors, the score can be computed exactly and efficiently. The method does well when compared to Google Sets in terms of set completions. The algorithm is very flexible in that it can be combined with a wide variety of data types and probabilistic models.