Bayesian Sets Zoubin Ghahramani and Katherine A. Heller NIPS 2005 Presented by Qi An Mar. 17th, 2006.

Similar presentations
Part 2: Unsupervised Learning

Sinead Williamson, Chong Wang, Katherine A. Heller, David M. Blei
Context-based object-class recognition and retrieval by generalized correlograms by J. Amores, N. Sebe and P. Radeva Discussion led by Qi An Duke University.
INTRODUCTION TO MACHINE LEARNING Bayesian Estimation.
Fast Bayesian Matching Pursuit Presenter: Changchun Zhang ECE / CMR Tennessee Technological University November 12, 2010 Reading Group (Authors: Philip.
CSC321: 2011 Introduction to Neural Networks and Machine Learning Lecture 10: The Bayesian way to fit models Geoffrey Hinton.
What is Statistical Modeling
Searchable Web sites Recommendation Date: 2012/2/20 Source: WSDM'11 Speaker: I-Chih Chiu Advisor: Dr. Koh Jia-ling
Rob Fergus Courant Institute of Mathematical Sciences New York University A Variational Approach to Blind Image Deconvolution.
Industrial Engineering College of Engineering Bayesian Kernel Methods for Binary Classification and Online Learning Problems Theodore Trafalis Workshop.
Relational Learning with Gaussian Processes By Wei Chu, Vikas Sindhwani, Zoubin Ghahramani, S.Sathiya Keerthi (Columbia, Chicago, Cambridge, Yahoo!) Presented.
Estimation of Distribution Algorithms Ata Kaban School of Computer Science The University of Birmingham.
Predictive Automatic Relevance Determination by Expectation Propagation Yuan (Alan) Qi Thomas P. Minka Rosalind W. Picard Zoubin Ghahramani.
1 Unsupervised Learning With Non-ignorable Missing Data Machine Learning Group Talk University of Toronto Monday Oct 4, 2004 Ben Marlin Sam Roweis Rich.
Bayesian Content-Based Image Retrieval research with: Katherine A. Heller based on (Heller and Ghahramani, 2006) part IB, paper 8, Lent.
A Probabilistic Model for Classification of Multiple-Record Web Documents June Tang Yiu-Kai Ng.
Iterative Set Expansion of Named Entities using the Web Richard C. Wang and William W. Cohen Language Technologies Institute Carnegie Mellon University.
Presenting: Assaf Tzabari
Collaborative Ordinal Regression Shipeng Yu Joint work with Kai Yu, Volker Tresp and Hans-Peter Kriegel University of Munich, Germany Siemens Corporate.
Review of Lecture Two Linear Regression Normal Equation
Binary Variables (1) Coin flipping: heads=1, tails=0 Bernoulli Distribution.
Modeling Documents by Combining Semantic Concepts with Unsupervised Statistical Learning Author: Chaitanya Chemudugunta America Holloway Padhraic Smyth.
Bayesian Hierarchical Clustering Paper by K. Heller and Z. Ghahramani ICML 2005 Presented by HAO-WEI, YEH.
Bayesian networks Classification, segmentation, time series prediction and more.
Bayesian Extension to the Language Model for Ad Hoc Information Retrieval Hugo Zaragoza, Djoerd Hiemstra, Michael Tipping Presented by Chen Yi-Ting.
High-Dimensional Unsupervised Selection and Estimation of a Finite Generalized Dirichlet Mixture model Based on Minimum Message Length by Nizar Bouguila.
K-Hit Query: Top-k Query Processing with Probabilistic Utility Function SIGMOD2015 Peng Peng, Raymond C.-W. Wong CSE, HKUST 1.
1 Opinion Retrieval from Blogs Wei Zhang, Clement Yu, and Weiyi Meng (2007 CIKM)
Randomized Algorithms for Bayesian Hierarchical Clustering
Learning the Structure of Related Tasks Presented by Lihan He Machine Learning Reading Group Duke University 02/03/2006 A. Niculescu-Mizil, R. Caruana.
Ranking Clusters for Web Search Gianluca Demartini Paul–Alexandru Chirita Ingo Brunkhorst Wolfgang Nejdl L3S Info Lunch Hannover
Chapter 11 Statistical Techniques. Data Warehouse and Data Mining Chapter 11 2 Chapter Objectives  Understand when linear regression is an appropriate.
Multi-Speaker Modeling with Shared Prior Distributions and Model Structures for Bayesian Speech Synthesis Kei Hashimoto, Yoshihiko Nankaku, and Keiichi.
The Infinite Hierarchical Factor Regression Model Piyush Rai and Hal Daume III NIPS 2008 Presented by Bo Chen March 26, 2009.
Active learning Haidong Shi, Nanyi Zeng Nov,12,2008.
Bayesian Prior and Posterior Study Guide for ES205 Yu-Chi Ho Jonathan T. Lee Nov. 24, 2000.
Multiple Instance Learning for Sparse Positive Bags Razvan C. Bunescu Machine Learning Group Department of Computer Sciences University of Texas at Austin.
Statistical Models for Partial Membership Katherine Heller Gatsby Computational Neuroscience Unit, UCL Sinead Williamson and Zoubin Ghahramani University.
Lecture 2: Statistical learning primer for biologists
Dependence Language Model for Information Retrieval Jianfeng Gao, Jian-Yun Nie, Guangyuan Wu, Guihong Cao, Dependence Language Model for Information Retrieval,
Dimensionality Reduction in Unsupervised Learning of Conditional Gaussian Networks Authors: Pegna, J.M., Lozano, J.A., Larragnaga, P., and Inza, I. In.
Bayesian Speech Synthesis Framework Integrating Training and Synthesis Processes Kei Hashimoto, Yoshihiko Nankaku, and Keiichi Tokuda Nagoya Institute.
Ranking Categories for Faceted Search Gianluca Demartini L3S Research Seminars Hannover, 09 June 2006.
Feature Selection for SVMs J. Weston et al., NIPS 2000 오장민 (2000/01/04) Second reference: Mark A. Hall, Correlation-based Feature Selection for Machine.
Statistics Sampling Distributions and Point Estimation of Parameters Contents, figures, and exercises come from the textbook: Applied Statistics and Probability.
Personalization Services in CADAL Zhang Yin Zhuang Yuting Wu Jiangqin College of Computer Science, Zhejiang University November 19, 2006.
1 Random Walks on the Click Graph Nick Craswell and Martin Szummer Microsoft Research Cambridge SIGIR 2007.
Introduction to Information Retrieval Introduction to Information Retrieval Lecture Probabilistic Information Retrieval.
Multi-label Prediction via Sparse Infinite CCA Piyush Rai and Hal Daume III NIPS 2009 Presented by Lingbo Li ECE, Duke University July 16th, 2010 Note:
Learning to Rank: From Pairwise Approach to Listwise Approach Authors: Zhe Cao, Tao Qin, Tie-Yan Liu, Ming-Feng Tsai, and Hang Li Presenter: Davidson Date:
Hierarchical Beta Process and the Indian Buffet Process by R. Thibaux and M. I. Jordan Discussion led by Qi An.
Bayesian Hierarchical Clustering Paper by K. Heller and Z. Ghahramani ICML 2005 Presented by David Williams Paper Discussion Group ( )
A Collapsed Variational Bayesian Inference Algorithm for Latent Dirichlet Allocation Yee W. Teh, David Newman and Max Welling Published on NIPS 2006 Discussion.
Canadian Bioinformatics Workshops
Bayesian Extension to the Language Model for Ad Hoc Information Retrieval Hugo Zaragoza, Djoerd Hiemstra, Michael Tipping Microsoft Research Cambridge,
MLPR - Questions. Can you go through integration, differentiation etc. Why do we need priors? Difference between prior and posterior. What does Bayesian.
DEEP LEARNING BOOK CHAPTER to CHAPTER 6
Learning Recommender Systems with Adaptive Regularization
Reading Notes Wang Ning Lab of Database and Information Systems
ICS 280 Learning in Graphical Models
Alan Qi Thomas P. Minka Rosalind W. Picard Zoubin Ghahramani
Multimodal Learning with Deep Boltzmann Machines
Machine Learning Basics
Distributions and Concepts in Probability Theory
A Non-Parametric Bayesian Method for Inferring Hidden Causes
Logistic Regression & Parallel SGD
Shashi Shekhar Weili Wu Sanjay Chawla Ranga Raju Vatsavai
Michal Rosen-Zvi University of California, Irvine
Learning From Observed Data
Machine Learning – a Probabilistic Perspective
Presentation transcript:

Bayesian Sets Zoubin Ghahramani and Katherine A. Heller NIPS 2005 Presented by Qi An Mar. 17th, 2006

Outline
Introduction
Bayesian Sets
Implementation
–Binary data
–Exponential families
Experimental results
Conclusions

Introduction
Inspired by "Google™ Sets"
What do Jesus and Darwin have in common?
–Two different views on the origin of man
–There are colleges at Cambridge University named after them
The objective is to retrieve items from a concept or cluster, given a query consisting of a few items from that cluster

Introduction
Consider a universe of items D, which can be a set of web pages, movies, people, or any other objects, depending on the application.
The user makes a query D_c ⊂ D, a small subset of items which are assumed to be examples of some cluster in the data.
The algorithm provides a completion to the query set D_c: a set D'_c that presumably includes all the elements in D_c and other elements of D that are also in this cluster.

Introduction
View the problem from two perspectives:
–Clustering on demand: unlike completely unsupervised clustering algorithms, here the query provides supervised hints or constraints as to the membership of a particular cluster.
–Information retrieval: retrieve the items that are relevant to the query and rank the output by relevance to the query.

Bayesian Sets
A very simple algorithm. Given D and D_c, we aim to rank the elements of D by how well they would "fit into" a set which includes D_c.
Define a score for each x ∈ D:
score(x) = p(x | D_c) / p(x)
From Bayes' rule, the score can be re-written as:
score(x) = p(x, D_c) / ( p(x) p(D_c) )

Bayesian Sets
Intuitively, the score compares the probability that x and D_c were generated by the same model with the same unknown parameters θ, to the probability that x and D_c came from models with different parameters θ and θ′.
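
Each quantity in the score is a marginal likelihood with the model parameters integrated out; in the paper's notation:

```latex
\mathrm{score}(x) = \frac{p(x, D_c)}{p(x)\, p(D_c)}, \qquad
p(x) = \int p(x \mid \theta)\, p(\theta)\, d\theta, \qquad
p(D_c) = \int \prod_{i=1}^{N} p(x_i \mid \theta)\, p(\theta)\, d\theta, \qquad
p(x, D_c) = \int p(x \mid \theta) \prod_{i=1}^{N} p(x_i \mid \theta)\, p(\theta)\, d\theta .
```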

Bayesian Sets
[Algorithm slide: given a model p(x | θ), a prior p(θ), and the query D_c, compute score(x) for every x ∈ D and return the items of D sorted by decreasing score.]

Sparse Binary Data
Assume each item x_i is a binary vector (x_i1, …, x_iJ) where each component x_ij is a binary variable from an independent Bernoulli distribution:
p(x_i | θ) = ∏_j θ_j^x_ij (1 − θ_j)^(1 − x_ij)
The conjugate prior for a Bernoulli distribution is a Beta distribution:
p(θ | α, β) = ∏_j [Γ(α_j + β_j) / (Γ(α_j) Γ(β_j))] θ_j^(α_j − 1) (1 − θ_j)^(β_j − 1)
For a query D_c = {x_1, …, x_N} consisting of N items:

Sparse Binary Data
The score can be computed as:
score(x) = ∏_j [(α_j + β_j) / (α_j + β_j + N)] (α̃_j / α_j)^x_j (β̃_j / β_j)^(1 − x_j)
where α̃_j = α_j + Σ_i x_ij and β̃_j = β_j + N − Σ_i x_ij.
If we take the log of the score and put the entire data set into one large matrix X with J columns, we can compute the vector s of log scores for all points using a single matrix–vector multiplication:
s = c + X q
where c = Σ_j [log(α_j + β_j) − log(α_j + β_j + N) + log β̃_j − log β_j] and q_j = log α̃_j − log α_j − log β̃_j + log β_j.
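
To make the vectorized scoring concrete, here is a minimal NumPy sketch (not the authors' code; the empirical prior α = κm, β = κ(1 − m), with m the data mean, follows the paper's suggestion, while the function and variable names are ours):

```python
import numpy as np

def bayesian_sets_log_scores(X, query_idx, kappa=2.0):
    """Log-score every item (row of X) against the query set D_c.

    X         : (n_items, J) array of 0/1 feature indicators
    query_idx : row indices of the query items D_c
    kappa     : prior strength; alpha_j = kappa * m_j, beta_j = kappa * (1 - m_j)
                with m_j the mean of column j, as suggested in the paper
    """
    X = np.asarray(X, dtype=float)
    N = len(query_idx)

    m = X.mean(axis=0)
    alpha = kappa * m
    beta = kappa * (1.0 - m)

    # Posterior Beta counts after conditioning on the query set
    sum_x = X[query_idx].sum(axis=0)
    alpha_t = alpha + sum_x            # alpha~_j
    beta_t = beta + N - sum_x          # beta~_j

    # log score(x) = c + q . x, so all scores are one matrix-vector product
    c = np.sum(np.log(alpha + beta) - np.log(alpha + beta + N)
               + np.log(beta_t) - np.log(beta))
    q = (np.log(alpha_t) - np.log(alpha)
         - np.log(beta_t) + np.log(beta))
    return c + X @ q

# Toy usage: rows 0 and 1 form the query; rows sharing their features rank high.
X = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 1]])
print(np.argsort(-bayesian_sets_log_scores(X, [0, 1])))
```

Since q is the only per-feature quantity, X can be kept as a sparse matrix and scoring the whole collection costs a single sparse matrix–vector product, which is why the method is fast on large, sparse datasets.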

Exponential Families
If the model distribution is not a Bernoulli but a general exponential family:
p(x | θ) = f(x) g(θ) exp{θᵀ u(x)}
we can use the conjugate prior:
p(θ | η, ν) = h(η, ν) g(θ)^η exp{θᵀ ν}
so that the score is:
score(x) = [h(η + 1, ν + u(x)) h(η + N, ν + Σ_i u(x_i))] / [h(η, ν) h(η + N + 1, ν + u(x) + Σ_i u(x_i))]
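
A minimal sketch of this generic score, assuming only a function that evaluates the log normalizer log h; the Beta–Bernoulli correspondence used as a sanity check, and all names below, are our additions, not the paper's:

```python
import numpy as np
from scipy.special import gammaln

def exp_family_log_score(log_h, eta, nu, N, u_sum, u_x):
    """log score(x) for an exponential-family model with conjugate prior
    p(theta | eta, nu) = h(eta, nu) g(theta)^eta exp(theta' nu), where
    u_sum = sum_i u(x_i) over the N query items and u_x = u(x)."""
    return (log_h(eta + 1, nu + u_x)
            + log_h(eta + N, nu + u_sum)
            - log_h(eta, nu)
            - log_h(eta + N + 1, nu + u_x + u_sum))

# Bernoulli as a special case: natural parameter theta = log p/(1-p), u(x) = x,
# and a Beta(a, b) prior on p corresponds to nu = a, eta = a + b, with
# normalizer h(eta, nu) = Gamma(eta) / (Gamma(nu) Gamma(eta - nu)).
def log_h_bernoulli(eta, nu):
    return gammaln(eta) - gammaln(nu) - gammaln(eta - nu)

a, b = 1.0, 2.0                       # Beta(1, 2) prior
query = np.array([1, 1, 0, 1])        # N = 4 binary query observations
ls = exp_family_log_score(log_h_bernoulli, eta=a + b, nu=a,
                          N=len(query), u_sum=query.sum(), u_x=1)
# Direct Beta-Bernoulli check: p(x=1 | D_c) / p(x=1) = (4/7) / (1/3) = 12/7
print(np.exp(ls))                     # ~1.7143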

Experimental results
The experiments are performed on three different datasets: the Grolier Encyclopedia dataset, the EachMovie dataset, and the NIPS authors dataset.
The algorithm runs very quickly on all three datasets.

Experimental results
[Results tables and figures not captured in the transcript.]

Conclusions
A simple algorithm which takes a query consisting of a small set of items and returns additional items from D belonging to this set.
The score is computed w.r.t. a statistical model, and the unknown model parameters are all marginalized out.
With conjugate priors, the score can be computed exactly and efficiently.
The method does well when compared to Google Sets in terms of set completions.
The algorithm is very flexible in that it can be combined with a wide variety of data types and probabilistic models.