Indian Buffet Process
Xiaolong Wang and Daniel Khashabi, UIUC, 2013

Contents
Mixture Model: finite mixture, infinite mixture
Matrix Feature Model: finite features, infinite features (Indian Buffet Process)

Bayesian Nonparametrics
Models with an unbounded number of elements; the Dirichlet Process (DP) for infinite mixture models, with various applications: hierarchies, topics and syntactic classes, objects appearing in one image.
Cons: these models are limited to cases that can be modeled with a DP, i.e. each set of observations is generated by only one latent component.

Bayesian Nonparametrics contd.
In practice there may be more complicated interactions between latent variables and observations.
Solution: look for more flexible nonparametric models. Such interactions can be captured via a binary matrix whose columns are latent features; infinite features means an infinite number of columns.

Finite Mixture Model
Set of observations X = {x_1, ..., x_N}; a constant number of clusters K. The cluster assignment for x_i is c_i, collected in the cluster-assignment vector c. Each sample's probability under the model is a mixture over components, the likelihood of the samples is the product over i, and the prior on the component probabilities is a symmetric Dirichlet distribution.
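In standard notation (following the Griffiths & Ghahramani review cited at the end), these quantities can be written as:
P(x_i \mid \boldsymbol{\pi}) = \sum_{k=1}^{K} \pi_k \, p(x_i \mid c_i = k), \qquad P(X \mid \boldsymbol{\pi}) = \prod_{i=1}^{N} P(x_i \mid \boldsymbol{\pi}), \qquad \boldsymbol{\pi} = (\pi_1, \ldots, \pi_K) \sim \mathrm{Dirichlet}(\alpha/K, \ldots, \alpha/K)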

Finite Mixture Model
Since we want the mixture model to be valid for any choice of component distribution, we only place assumptions on the cluster assignments; learning the cluster assignments c is the goal of fitting this mixture model. The model can be summarized by the relation between the posterior over assignments, the likelihood, the prior, and the marginal likelihood (evidence); to have a valid model, all of these distributions must be valid.
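Concretely, the relationship between these distributions is Bayes' rule applied to the assignment vector:
P(\mathbf{c} \mid X) = \frac{P(X \mid \mathbf{c}) \, P(\mathbf{c})}{P(X)} \qquad \text{(posterior = likelihood × prior / marginal likelihood)}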

Finite Mixture Model contd.
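Integrating out the mixing weights gives the standard prior over assignment vectors, with m_k the number of samples assigned to cluster k; this is the quantity taken to the infinite limit on the next slides:
P(\mathbf{c}) = \int P(\mathbf{c} \mid \boldsymbol{\pi})\, p(\boldsymbol{\pi})\, d\boldsymbol{\pi} = \prod_{k=1}^{K} \frac{\Gamma(m_k + \alpha/K)}{\Gamma(\alpha/K)} \cdot \frac{\Gamma(\alpha)}{\Gamma(N + \alpha)}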

Infinite mixture model
Take the number of clusters K to infinity in the likelihood; it is like saying that we have an infinite-dimensional multinomial cluster distribution. Since we always have a finite number of samples in reality, only a finite number of clusters are actually used, so we define two sets of clusters: K_+, the number of classes for which m_k > 0, and the remaining unused classes. Assume a reordering of the class labels such that the occupied classes come first.

Infinite mixture model
Now we return to the previous slides and set K → ∞ in the formulas. If we take this limit directly, the marginal probability of any particular assignment vector goes to 0. Instead we can model the problem by defining probabilities on partitions of the samples rather than on class labels for each sample.

Infinite mixture model contd.
Define a partition of the N objects: we want to partition the objects into classes, and work with the equivalence class [c] of assignment vectors that induce the same partition. This gives a valid probability distribution for an infinite mixture model, and it is exchangeable with respect to the cluster assignments, which is important for Gibbs sampling (and the Chinese restaurant process). De Finetti's theorem explains why exchangeable observations are conditionally independent given some probability distribution.
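Taking the limit over partitions rather than label vectors gives the standard result:
\lim_{K \to \infty} P([\mathbf{c}]) = \alpha^{K_+} \left( \prod_{k=1}^{K_+} (m_k - 1)! \right) \frac{\Gamma(\alpha)}{\Gamma(N + \alpha)}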

Chinese Restaurant Process (CRP), with parameter α
(Animation over several slides: customers arrive one at a time and choose a table, either joining an already occupied table or starting a new one.)
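The seating rule illustrated by the animation is the standard CRP conditional, with m_k the number of customers already seated at table k:
P(c_i = k \mid c_1, \ldots, c_{i-1}) = \frac{m_k}{i - 1 + \alpha} \quad (\text{existing table } k), \qquad P(c_i = \text{new table} \mid c_1, \ldots, c_{i-1}) = \frac{\alpha}{i - 1 + \alpha}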

CRP: Gibbs sampling
The Gibbs sampler requires the full conditional distribution of each assignment c_i given the others, both for the finite mixture model and for the infinite mixture model.
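A standard form of these conditionals (the prior part, to be multiplied by the likelihood term p(x_i | c_i = k, ...); m_{-i,k} is the number of other samples assigned to cluster k):
Finite mixture (K clusters): P(c_i = k \mid \mathbf{c}_{-i}) = \frac{m_{-i,k} + \alpha/K}{N - 1 + \alpha}
Infinite mixture (K → ∞): P(c_i = k \mid \mathbf{c}_{-i}) = \frac{m_{-i,k}}{N - 1 + \alpha} \ (\text{occupied } k), \qquad P(c_i = \text{new class} \mid \mathbf{c}_{-i}) = \frac{\alpha}{N - 1 + \alpha}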

Beyond the limit of a single label
In latent class models, each object (e.g. a word) has only one latent label (e.g. a topic): a finite number of latent labels gives LDA; an infinite number gives the DPM.
In latent feature (latent structure) models, each object (e.g. a graph) has multiple latent features (entities): a finite number of latent features gives the Finite Feature Model (FFM); an infinite number gives the Indian Buffet Process (IBP).
In the feature matrix, rows are data points and columns are latent features. Movie preference example: rows are movies (e.g. Rise of the Planet of the Apes); columns are latent features (made in the U.S., is science fiction, has apes in it, ...).

Latent Feature Model
F: latent feature matrix, decomposed into a binary matrix Z (which features each object has) and a value matrix V (the feature values), with p(F) = p(Z) p(V). Z is always binary, while F may be continuous-valued or discrete-valued depending on V.

Finite Feature Model
Generating Z, an N×K binary matrix: for each column k, draw a weight from a Beta distribution; then for each object i, flip a coin with that bias to decide z_ik. Integrating out the weights induces a distribution over Z.
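With the Beta(α/K, 1) prior used in the IBP construction, and m_k = Σ_i z_ik, this works out to:
\pi_k \sim \mathrm{Beta}(\alpha/K, 1), \qquad z_{ik} \mid \pi_k \sim \mathrm{Bernoulli}(\pi_k)
P(Z) = \prod_{k=1}^{K} \frac{\frac{\alpha}{K}\, \Gamma(m_k + \frac{\alpha}{K})\, \Gamma(N - m_k + 1)}{\Gamma(N + 1 + \frac{\alpha}{K})}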

Finite Feature Model
Under this generative process Z is sparse: the expected number of non-zero entries is N · K · E[π_k] = Nα / (1 + α/K) ≤ Nα, even as K grows.

Indian Buffet Process, 1st Representation
Difficulty: as K → ∞, P(Z) → 0 for any particular matrix. Solution: define equivalence classes of random binary feature matrices via the left-ordered form function lof(Z): compute the history h of each feature (column) k (the column read as a binary number over the rows), then order the columns by decreasing h. Z1 and Z2 are lof-equivalent iff lof(Z1) = lof(Z2); the equivalence class is written [Z]. [Z] is described by the counts K_h of columns with each history h, and its cardinality is K! / ∏_h K_h! (with K_0 counting the all-zero columns).

Indian Buffet Process, 1st Representation
Given the finite-K distribution P(Z), derive P([Z]) in the limit K → ∞ (where the number of non-empty columns K_+ is finite almost surely). The derivation uses the harmonic sum H_N = Σ_{j=1}^{N} 1/j. The limit is well defined because K_+ < ∞ a.s., and P([Z]) depends on Z only through K_h, the number of features (columns) with history h; permuting the rows (data points) does not change the K_h, so the distribution is exchangeable.
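The resulting limiting distribution, as given in Griffiths & Ghahramani (2011), is:
P([Z]) = \frac{\alpha^{K_+}}{\prod_{h > 0} K_h!} \, \exp(-\alpha H_N) \, \prod_{k=1}^{K_+} \frac{(N - m_k)!\,(m_k - 1)!}{N!}, \qquad H_N = \sum_{j=1}^{N} \frac{1}{j}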

Indian Buffet Process, 2nd Representation: Customers & Dishes
An Indian restaurant with infinitely many dishes (columns). The first customer tastes the first K1^(1) dishes, with K1^(1) sampled from a Poisson(α) distribution. The i-th customer tastes each previously sampled dish k with probability m_k / i, where m_k is the number of previous customers who took dish k, and then tastes K1^(i) further new dishes, with K1^(i) drawn from a Poisson(α/i) distribution.
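As an illustration (not from the original slides), here is a minimal Python sketch of this generative process; the function name sample_ibp is hypothetical:

import numpy as np

def sample_ibp(N, alpha, seed=None):
    # Draw a binary feature matrix Z from the Indian Buffet Process
    # using the customers-and-dishes metaphor (2nd representation).
    rng = np.random.default_rng(seed)
    dish_counts = []            # dish_counts[k] = m_k, customers who have taken dish k
    chosen_per_customer = []    # dish indices chosen by each customer
    for i in range(1, N + 1):
        chosen = []
        # taste each previously sampled dish k with probability m_k / i
        for k, m_k in enumerate(dish_counts):
            if rng.random() < m_k / i:
                chosen.append(k)
                dish_counts[k] += 1
        # then taste Poisson(alpha / i) brand-new dishes
        for _ in range(rng.poisson(alpha / i)):
            dish_counts.append(1)
            chosen.append(len(dish_counts) - 1)
        chosen_per_customer.append(chosen)
    Z = np.zeros((N, len(dish_counts)), dtype=int)   # N x K_plus binary matrix
    for i, chosen in enumerate(chosen_per_customer):
        Z[i, chosen] = 1
    return Z

Z = sample_ibp(N=10, alpha=2.0, seed=0)
print(Z.shape, Z.sum(axis=1))   # row sums are Poisson(alpha) distributed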

Indian Buffet Process, 2nd Representation: Customers & Dishes
(Figure: an example draw from the process, with each cell annotated by the probability m_k / i with which that customer took that dish.)

Indian Buffet Process, 2nd Representation: Customers & Dishes
From the generative process we obtain P(Z) for matrices in the customer ordering. Note that permuting the first K1^(1) dishes, the next K1^(2), the next K1^(3), ..., and the last K1^(N) dishes (columns) does not change P(Z); all of the permuted matrices belong to the same lof equivalence class [Z]. Counting the number of such permutations shows that the 2nd representation is equivalent to the 1st representation.
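Following the same review, the distribution produced by the restaurant ordering and the number of such orderings that map to one lof class can be written as:
P(Z) = \frac{\alpha^{K_+}}{\prod_{i=1}^{N} K_1^{(i)}!} \, \exp(-\alpha H_N) \, \prod_{k=1}^{K_+} \frac{(N - m_k)!\,(m_k - 1)!}{N!}, \qquad \#\{\text{orderings of } [Z]\} = \frac{\prod_{i=1}^{N} K_1^{(i)}!}{\prod_{h>0} K_h!}
Multiplying the two recovers P([Z]) from the 1st representation.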

Indian Buffet Process, 3rd Representation: Distribution over Collections of Histories
Directly generate the left-ordered-form (lof) matrix Z: for each possible history h (with m_h the number of non-zero elements in h), generate K_h columns with history h, which yields a distribution over the collection of histories. Note that permuting the digits of h does not change m_h (nor P(K)), and permuting rows means the customers are exchangeable.
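One standard way to state this representation is that the column counts are independent Poisson draws, one per history:
K_h \sim \mathrm{Poisson}\!\left( \alpha \, \frac{(m_h - 1)!\,(N - m_h)!}{N!} \right) \text{ independently for each } h > 0, \qquad \sum_{h > 0} \alpha \, \frac{(m_h - 1)!\,(N - m_h)!}{N!} = \alpha H_N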

Indian Buffet Process
Effective dimension of the model, K_+: follows a Poisson(αH_N) distribution; this derives from the 2nd representation by summing the Poisson components. Number of features possessed by each object: follows a Poisson(α) distribution; this derives from the 2nd representation because the first customer chooses Poisson(α) dishes and the customers are exchangeable, so any customer can be permuted into first position. Z is sparse: from the 2nd (or 1st) representation, the expected number of non-zero entries in each row is α, so the expected number of non-zero entries in Z is Nα.

IBP: Gibbs sampling
We need the full conditional of each entry z_ik given Z_{-(ik)}, the entries of Z other than z_ik, and the data; p(X | Z) depends on the model chosen for the observed data. By exchangeability, consider generating row i as if it were the last customer; the IBP then gives the prior probability of z_ik in terms of m_{-i,k}, the number of other objects possessing feature k. If the sampled value is z_ik = 0 and m_{-i,k} = 0, delete that column. At the end of the row, draw new dishes (columns) from the IBP prior while taking the likelihood into account; this is approximated by truncation, computing probabilities for a range of numbers of new dishes up to an upper bound.
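A plausible rendering of the conditionals referred to here, following the usual collapsed IBP Gibbs sampler, is:
P(z_{ik} = 1 \mid Z_{-(ik)}, X) \propto \frac{m_{-i,k}}{N}\, p(X \mid Z, z_{ik} = 1), \qquad P(z_{ik} = 0 \mid Z_{-(ik)}, X) \propto \left(1 - \frac{m_{-i,k}}{N}\right) p(X \mid Z, z_{ik} = 0)
with the number of new dishes for object i given a Poisson(α/N) prior, combined with the likelihood of the correspondingly augmented matrix and truncated at an upper bound.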

More talk on applications
As a prior distribution in models with an infinite number of features: modeling protein interactions; models of bipartite graphs in which one side has an undefined number of elements; binary matrix factorization for modeling dyadic data; extracting features from similarity judgments; latent features in link prediction; independent components analysis and sparse factor analysis.
More on inference: stick-breaking representation (Yee Whye Teh et al., 2007); variational inference (Finale Doshi-Velez et al., 2009); accelerated inference (Finale Doshi-Velez et al., 2009).

References
Griffiths, T. L., & Ghahramani, Z. (2011). The Indian buffet process: An introduction and review. Journal of Machine Learning Research, 12, 1185-1224.
Slides: Tom Griffiths, "The Indian buffet process".