Multitask Learning Using Dirichlet Process


Ya Xue, July 1, 2005

Outline
- Task defined: infinite mixture of priors
  - Multitask learning
  - Dirichlet process
- Task undefined: expert network
  - Finite expert network
  - Infinite expert network

Multitask Learning - Common Prior Model: M classification tasks, each with its own weight vector w_m; all task weights share a common prior on w.
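A minimal sketch of this setup under assumed notation (per-task data, a logistic/probit link, and a single shared Gaussian prior; these forms are assumptions, not taken from the original slide):

```latex
% Task m = 1,...,M has data (x_{mn}, y_{mn}), n = 1,...,N_m.
% Per-task linear classifier, e.g. a logistic/probit model:
p(y_{mn} = 1 \mid x_{mn}, w_m) = \sigma\!\left(w_m^{\top} x_{mn}\right)
% Common prior model: one Gaussian prior shared by all task weights
w_m \sim \mathcal{N}(\mu, \Sigma), \qquad m = 1, \dots, M
```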

Drawback of This Model: Assume each w_m is a two-dimensional vector. A single shared Gaussian prior pulls all task weights toward one mode, so it cannot represent tasks whose weights fall into several distinct groups.

Proposed Model: the task weights w_m are drawn from a Gaussian mixture model.
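Written out (the component count K, mixture weights pi_k, and component parameters are assumed notation):

```latex
w_m \sim \sum_{k=1}^{K} \pi_k \, \mathcal{N}(\mu_k, \Sigma_k),
\qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1
```

Tasks whose weights come from the same mixture component are related, while separate components capture groups of unrelated tasks.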

Two Special Cases: common prior model - a single Gaussian component, so tasks are similar; piecewise linear classifier - point-mass components, so tasks in the same cluster are identical. Similar vs. identical.
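The two limits, in the same assumed notation:

```latex
% K = 1: the mixture collapses to the common prior model, so tasks are only "similar".
w_m \sim \mathcal{N}(\mu_1, \Sigma_1)
% \Sigma_k \to 0: components become point masses, so tasks sharing a component have
% identical weights, and the overall classifier is piecewise linear across clusters.
w_m \sim \sum_{k=1}^{K} \pi_k \, \delta_{\mu_k}
```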

Clustering: the unknown parameters are the mixture weights and the component means and covariances. A further uncertainty is the number of components K, which calls for model selection, i.e. computing the evidence (marginal likelihood) for each candidate K.
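For a finite mixture this means comparing candidate values of K by their evidence, a rough sketch being:

```latex
p(\mathcal{D} \mid K) \;=\; \int p(\mathcal{D} \mid \theta, K)\, p(\theta \mid K)\, d\theta,
\qquad \hat{K} \;=\; \arg\max_{K}\; p(\mathcal{D} \mid K)
```

Here theta collects the mixture weights and component parameters; the integral is typically intractable, which motivates the Dirichlet process construction on the next slide.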

Clustering with DP: No Model Selection. We rewrite the model in another form and place a Dirichlet process prior on the task-specific parameters.
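A hedged sketch of the DP version (theta_m, G_0, and alpha are assumed names for the task-specific parameters, base distribution, and concentration parameter):

```latex
\theta_m = (\mu_m, \Sigma_m), \qquad w_m \sim \mathcal{N}(\mu_m, \Sigma_m),
\qquad \theta_m \mid G \;\sim\; G, \qquad G \;\sim\; \mathrm{DP}(\alpha, G_0)
```

Because a draw G from a Dirichlet process is almost surely discrete, the theta_m take repeated values, so tasks cluster automatically and the number of clusters never has to be fixed in advance.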

Stick-Breaking View of DP: draw v_k ~ Beta(1, α) and set π_k = v_k ∏_{l<k} (1 − v_l), with atoms θ*_k drawn i.i.d. from the base distribution G_0; finally we get G = Σ_k π_k δ_{θ*_k}.
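A small runnable sketch (Python; the helper name and the 2-D Gaussian base distribution are illustrative assumptions) that draws a truncated stick-breaking approximation to G and assigns tasks to its atoms:

```python
import numpy as np

def stick_breaking_weights(alpha, truncation, rng):
    """Truncated stick-breaking: v_k ~ Beta(1, alpha), pi_k = v_k * prod_{l<k}(1 - v_l)."""
    v = rng.beta(1.0, alpha, size=truncation)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    return v * remaining

rng = np.random.default_rng(0)
pi = stick_breaking_weights(alpha=1.0, truncation=50, rng=rng)
atoms = rng.normal(0.0, 3.0, size=(50, 2))        # theta*_k ~ G_0 (here a 2-D Gaussian base)
z = rng.choice(50, size=8, p=pi / pi.sum())       # cluster indicator for each of 8 tasks
theta = atoms[z]                                  # tasks in the same cluster share an atom
print(np.round(pi[:5], 3), z)
```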

Prediction Rule of DP for Posterior Inference: suppose θ_{M+1} is a new draw and there are K distinct values θ*_1, ..., θ*_K among θ_1, ..., θ_M, with n_k of them equal to θ*_k. Then θ_{M+1} belongs to existing cluster k with probability n_k / (M + α), and belongs to a new cluster with probability α / (M + α).
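A hedged Python sketch of the same rule, as it would appear inside a collapsed Gibbs step (the function name and counts are illustrative; in a full sampler these prior probabilities are further multiplied by each cluster's data likelihood):

```python
import numpy as np

def crp_predictive(cluster_counts, alpha):
    """Polya-urn / CRP rule: P(existing cluster k) = n_k/(M+alpha), P(new) = alpha/(M+alpha)."""
    counts = np.asarray(cluster_counts, dtype=float)
    M = counts.sum()
    return np.concatenate([counts, [alpha]]) / (M + alpha)

rng = np.random.default_rng(1)
counts = [4, 2, 1]                               # n_k for K = 3 existing clusters, M = 7 tasks
probs = crp_predictive(counts, alpha=1.0)
assignment = rng.choice(len(probs), p=probs)     # index 3 would mean "open a new cluster"
print(np.round(probs, 3), assignment)
```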

Toy Problem: results shown for eight synthetic tasks, Task 1 through Task 8 (figures omitted).

Expert Network

Mathematical Model: each gating node j produces input-dependent gating weights over its children; the likelihood mixes the experts' predictions according to these gating weights.
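A hedged sketch of the gating and likelihood terms (the softmax gating parameters v_j and the generic expert predictive density are assumptions, not the slide's exact forms):

```latex
g_j(x) \;=\; \frac{\exp\!\left(v_j^{\top} x\right)}{\sum_{j'} \exp\!\left(v_{j'}^{\top} x\right)},
\qquad
p(y \mid x) \;=\; \sum_{m=1}^{M} p(m \mid x)\; p(y \mid x, \text{expert } m)
```

For a one-level network p(m | x) = g_m(x); for a tree-structured network it is the product of gating probabilities along the path to expert m, as on the next slide.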

Mathematical Model: the probability of selecting expert m is the product of the gating probabilities along the unique path from the root node to expert m.

Example

Infinite Expert Network: an infinite number of gating nodes, extending the finite expert network nonparametrically.