Course: Neural Networks, Instructor: Professor L.Behera.

Presentation transcript:

Dirichlet Process. Joy Bhattacharjee, Department of ChE, IIT Kanpur. Johann Peter Gustav Lejeune Dirichlet

What is a Dirichlet Process? The Dirichlet process is a stochastic process used in Bayesian nonparametric models of data, particularly in Dirichlet process mixture models (also known as infinite mixture models). It is a distribution over distributions, i.e. each draw from a Dirichlet process is itself a distribution. It is called a Dirichlet process because its finite-dimensional marginal distributions are Dirichlet distributed.

Dirichlet Priors: A distribution over possible parameter vectors of the multinomial distribution, so its values must lie in the k-dimensional simplex. The Beta distribution is the 2-parameter special case. Expectation: $E[\theta_i] = \alpha_i / \sum_j \alpha_j$. A conjugate prior to the multinomial. (Graphical model: α → θ → x_i, for i = 1, …, N.)

What is the Dirichlet Distribution? Methods to generate a Dirichlet distribution: Polya’s urn, stick breaking, Chinese restaurant process.

Samples from a DP

Dirichlet Distribution

Polya’s Urn scheme: Suppose we want to generate a realization of Q ~ Dir(α). To start, put α_i balls of color i, for i = 1, 2, …, k, in an urn. Note that α_i > 0 is not necessarily an integer, so we may have a fractional or even an irrational number of balls of color i in our urn! At each iteration, draw one ball uniformly at random from the urn, and then place it back into the urn along with an additional ball of the same color. As we iterate this procedure more and more times, the proportions of balls of each color will converge to a pmf that is a sample from the distribution Dir(α).
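The urn procedure above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `polya_urn_sample`, the `n_draws` cutoff and the optional `rng` argument are assumptions made here, not part of the slides, and a finite number of draws only approximates the limiting proportions.

```python
import random

def polya_urn_sample(alpha, n_draws=5000, rng=None):
    """Approximate one draw from Dir(alpha) by running a Polya urn.

    alpha   : list of positive weights (initial "ball counts", may be fractional)
    n_draws : number of urn iterations (hypothetical cutoff; the exact sample
              is only reached in the limit)
    """
    if any(a <= 0 for a in alpha):
        raise ValueError("all alpha_i must be positive")
    rng = rng or random.Random()
    counts = list(alpha)          # current (possibly fractional) ball counts
    colors = range(len(counts))
    for _ in range(n_draws):
        # Draw a ball with probability proportional to the current counts.
        i = rng.choices(colors, weights=counts, k=1)[0]
        # Put it back together with one extra ball of the same color.
        counts[i] += 1.0
    total = sum(counts)
    # The color proportions approximate a pmf drawn from Dir(alpha).
    return [c / total for c in counts]

if __name__ == "__main__":
    print(polya_urn_sample([0.5, 1.0, 2.5], rng=random.Random(0)))
```

Running the function several times with different seeds gives different pmfs, which is the point: each run is one (approximate) realization of Q.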

Mathematical form:

Stick Breaking Process: The stick-breaking approach to generating a random vector with a Dir(α) distribution involves iteratively breaking a stick of length 1 into k pieces in such a way that the lengths of the k pieces follow a Dir(α) distribution. The following figure illustrates this process with simulation results.
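One standard way to carry out this finite stick breaking is to break off, at step i, a Beta(α_i, α_{i+1} + … + α_k) fraction of whatever stick is left. The sketch below assumes that construction; the function name `dirichlet_stick_breaking` and the `rng` argument are hypothetical names chosen for the example.

```python
import random

def dirichlet_stick_breaking(alpha, rng=None):
    """Draw one vector from Dir(alpha) by breaking a unit stick into k pieces."""
    if any(a <= 0 for a in alpha):
        raise ValueError("all alpha_i must be positive")
    rng = rng or random.Random()
    remaining = 1.0               # length of stick not yet assigned
    pieces = []
    for i, a in enumerate(alpha[:-1]):
        tail = sum(alpha[i + 1:])             # alpha_{i+1} + ... + alpha_k
        frac = rng.betavariate(a, tail)       # fraction of the remainder to break off
        pieces.append(remaining * frac)
        remaining *= 1.0 - frac
    pieces.append(remaining)      # the last piece is whatever is left
    return pieces

if __name__ == "__main__":
    print(dirichlet_stick_breaking([1.0, 2.0, 3.0], rng=random.Random(0)))
```

By construction the pieces are non-negative and sum to 1, so each call returns a point on the simplex.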

Stick Breaking Process (Figure: a unit stick broken step by step, with labels 0.4, 0.6, 0.5, 0.3, 0.3, 0.8, 0.24, and atoms drawn from G0.) What is G? A sample from the DP. The θ parameters for each datum are drawn from it. Because the probability of drawing the same θ twice is positive, G must be discrete. It depends somehow on θ and G0.
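A discrete G of this kind can be sketched with the usual stick-breaking construction of the DP: break off a Beta(1, α) fraction of the remaining stick for each atom and draw the atom's location from G0. The code below is a truncated sketch under that assumption; `truncated_dp_sample`, `base_sampler` and `n_atoms` are names invented for the example, and α is the concentration parameter of the DP.

```python
import random

def truncated_dp_sample(alpha, base_sampler, n_atoms=100, rng=None):
    """Approximate G ~ DP(alpha, G0) by its first n_atoms stick-breaking atoms.

    base_sampler : function rng -> theta, drawing one value from G0 (assumed)
    Returns (atoms, weights, leftover), where leftover is the stick mass that
    the truncation did not assign to any atom.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    rng = rng or random.Random()
    remaining = 1.0
    atoms, weights = [], []
    for _ in range(n_atoms):
        frac = rng.betavariate(1.0, alpha)    # fraction of the remaining stick
        weights.append(remaining * frac)      # weight of this atom
        atoms.append(base_sampler(rng))       # atom location drawn from G0
        remaining *= 1.0 - frac
    return atoms, weights, remaining

if __name__ == "__main__":
    # G0 is taken to be a standard normal purely for illustration.
    atoms, weights, leftover = truncated_dp_sample(
        2.0, lambda r: r.gauss(0.0, 1.0), n_atoms=50, rng=random.Random(0))
    print(max(weights), leftover)
```

Because G puts positive weight on each atom, drawing θ values from it repeats atoms with positive probability, which is the discreteness noted on the slide.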

Chinese Restaurant Process

Chinese Restaurant Process CRP is a distribution on partitions that captures the clustering effect of the DP
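The clustering effect can be seen by simulating the seating rule: with n customers already seated, the next customer joins an existing table with probability proportional to its size and opens a new table with probability proportional to α. The sketch below assumes that standard rule; `chinese_restaurant_process` and its arguments are hypothetical names for the example.

```python
import random

def chinese_restaurant_process(n_customers, alpha, rng=None):
    """Seat customers one at a time and return (seating, table_sizes)."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    rng = rng or random.Random()
    table_sizes = []              # table_sizes[k] = customers at table k
    seating = []                  # seating[i] = table index of customer i
    for _ in range(n_customers):
        # Existing table k has weight n_k; a new table has weight alpha.
        weights = table_sizes + [alpha]
        table = rng.choices(range(len(weights)), weights=weights, k=1)[0]
        if table == len(table_sizes):
            table_sizes.append(1)         # open a new table
        else:
            table_sizes[table] += 1       # join an existing table
        seating.append(table)
    return seating, table_sizes

if __name__ == "__main__":
    seating, sizes = chinese_restaurant_process(100, 1.0, rng=random.Random(0))
    print(len(sizes), sizes)
```

The list `seating` is one random partition of the customers; large tables tend to attract more customers, which is the "rich get richer" behaviour behind DP clustering.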

Nested CRP: To generate a document given a tree with L levels: choose a path from the root of the tree to a leaf; draw a vector θ of topic mixing proportions from an L-dimensional Dirichlet; generate the words in the document from a mixture of the topics along the path, with mixing proportions θ.

Nested CRP: Used for modeling topic hierarchies by Blei et al., 2004. (Figure: restaurants visited on Day 1, Day 2, Day 3.)

Properties of the DP: Let (Θ, Σ) be a measurable space, G0 be a probability measure on the space, and α be a positive real number. A Dirichlet process is any distribution of a random probability measure G over (Θ, Σ) such that, for all finite partitions (A1,…,Ar) of Θ, $(G(A_1), \ldots, G(A_r)) \sim \mathrm{Dir}(\alpha G_0(A_1), \ldots, \alpha G_0(A_r))$. Draws from G are generally not distinct. The number of distinct values grows as O(log n).
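The O(log n) growth can be made concrete with a short worked calculation. It assumes the standard sequential view of the DP, in which the i-th draw takes a new value with probability α/(α + i − 1); the symbol K_n for the number of distinct values among n draws is introduced here for illustration only.

```latex
\mathbb{E}[K_n]
  = \sum_{i=1}^{n} \frac{\alpha}{\alpha + i - 1}
  \approx \alpha \, \ln\!\left(1 + \frac{n}{\alpha}\right)
  = O(\log n)
```

For example, with α = 1 the sum is the harmonic number 1 + 1/2 + … + 1/n, so 1000 draws are expected to contain only about 7.5 distinct values.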

In general, an infinite set of random variables is said to be infinitely exchangeable if for every finite subset {x_1, …, x_n} and for any permutation π we have $p(x_1, \ldots, x_n) = p(x_{\pi(1)}, \ldots, x_{\pi(n)})$. Note that infinite exchangeability is not the same as being independent and identically distributed (i.i.d.)! Using de Finetti’s theorem, it is possible to show that our draws θ are infinitely exchangeable. Thus the mixture components may be sampled in any order.

Mixture Model Inference: We want to find a clustering of the data: an assignment of values to the hidden class variable. Sometimes we also want the component parameters. In most finite mixture models, this can be found with EM. The Dirichlet process is a non-parametric prior and doesn’t permit EM, so we use Gibbs sampling instead.
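As a hedged illustration of what one Gibbs step looks like, the conditional below is the usual collapsed update for a DP mixture with a conjugate base measure; the slides do not say which Gibbs variant is meant, so this is an assumption. Here z_i is the hidden class of datum x_i, n_{-i,k} is the size of cluster k without datum i, f is the component likelihood, and H_{-i,k} is the posterior over θ given the other data in cluster k (f, H and n are notation introduced for this sketch).

```latex
p(z_i = k \mid z_{-i}, x) \;\propto\;
\begin{cases}
  n_{-i,k} \displaystyle\int f(x_i \mid \theta)\, dH_{-i,k}(\theta)
    & \text{for an existing cluster } k, \\[2ex]
  \alpha \displaystyle\int f(x_i \mid \theta)\, dG_0(\theta)
    & \text{for a new cluster.}
\end{cases}
```

Each sweep resamples every z_i from this distribution in turn; by exchangeability, each datum can be treated as if it were the last one to arrive.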

Finite mixture model

Infinite mixture model

DP Mixture model

Agglomerative Clustering: Pros: Doesn’t need a generative model (number of clusters, parametric distribution). Cons: Ad hoc, no probabilistic foundation, intractable for large data sets. (Table, Num Clusters: Max Distance: 20: –, 19: 5, 18: 5, 17: 5, 16: 8, 15: 8, 14: 8, 13: 8, 12: 8, 11: 9, 10: 9, 9: –, 8: 10, 7: 10, 6: 10, 5: 10, 4: 12, 3: 12, 2: 15, 1: 16.)

Mixture Model Clustering Examples: K-means, mixture of Gaussians, Naïve Bayes Pros: Sound probabilistic foundation, efficient even for large data sets Cons: Requires generative model, including number of clusters (mixture components)

Applications: Clustering in Natural Language Processing: document clustering for topic, genre, sentiment…; word clustering for part of speech (POS), word sense disambiguation (WSD), synonymy…; topic clustering across documents; noun coreference (we don’t know how many entities there are); other identity uncertainty problems such as deduping; grammar induction. Sequence modeling (the “infinite HMM”): topic segmentation; sequence models for POS tagging. Society modeling in public places. Unsupervised machine learning: useful any time you want to cluster or do unsupervised learning without specifying the number of clusters.

References: Bela A. Frigyik, Amol Kapila, and Maya R. Gupta, University of Washington, Seattle: Introduction to the Dirichlet Distribution and Related Processes, UWEE Technical Report, report number UWEETR2010-0006. Yee Whye Teh, University College London: Dirichlet Process. Khalid El-Arini, Select Lab meeting, October 2006. Teg Grenager, Natural Language Processing, Stanford University: introduction to the Chinese restaurant process and stick-breaking scheme. Wikipedia.

Questions? Suggest some distributions that can use a Dirichlet process to find classes. What are the applications in finite mixture models? Comment on: The DP of a cluster is also a Dirichlet distribution.