Multitask Learning Using Dirichlet Process


Ya Xue, July 1, 2005

Outline
- Task defined: infinite mixture of priors
  - Multitask learning
  - Dirichlet process
- Task undefined: expert network
  - Finite expert network
  - Infinite expert network

Multitask Learning - Common Prior Model: M classification tasks, each with its own weight vector w_m; all task weights share a common prior on w.
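A minimal sketch of this setup under assumed notation (per-task data, a logistic/probit link, and a single shared Gaussian prior; these forms are assumptions, not taken from the original slide):

```latex
% Task m = 1,...,M has data (x_{mn}, y_{mn}), n = 1,...,N_m.
% Per-task linear classifier, e.g. a logistic/probit model:
p(y_{mn} = 1 \mid x_{mn}, w_m) = \sigma\!\left(w_m^{\top} x_{mn}\right)
% Common prior model: one Gaussian prior shared by all task weights
w_m \sim \mathcal{N}(\mu, \Sigma), \qquad m = 1, \dots, M
```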

Drawback of This Model: Assume each w_m is a two-dimensional vector. A single shared Gaussian prior pulls all task weights toward one mode, so it cannot represent tasks whose weights fall into several distinct groups.

Proposed Model: the task weights w_m are drawn from a Gaussian mixture model.
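Written out (the component count K, mixture weights pi_k, and component parameters are assumed notation):

```latex
w_m \sim \sum_{k=1}^{K} \pi_k \, \mathcal{N}(\mu_k, \Sigma_k),
\qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1
```

Tasks whose weights come from the same mixture component are related, while separate components capture groups of unrelated tasks.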

Two Special Cases: common prior model - a single Gaussian component, so tasks are similar; piecewise linear classifier - point-mass components, so tasks in the same cluster are identical. Similar vs. identical.
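The two limits, in the same assumed notation:

```latex
% K = 1: the mixture collapses to the common prior model, so tasks are only "similar".
w_m \sim \mathcal{N}(\mu_1, \Sigma_1)
% \Sigma_k \to 0: components become point masses, so tasks sharing a component have
% identical weights, and the overall classifier is piecewise linear across clusters.
w_m \sim \sum_{k=1}^{K} \pi_k \, \delta_{\mu_k}
```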

Clustering: the unknown parameters are the mixture weights and the component means and covariances. A further uncertainty is the number of components K, which calls for model selection, i.e. computing the evidence (marginal likelihood) for each candidate K.
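For a finite mixture this means comparing candidate values of K by their evidence, a rough sketch being:

```latex
p(\mathcal{D} \mid K) \;=\; \int p(\mathcal{D} \mid \theta, K)\, p(\theta \mid K)\, d\theta,
\qquad \hat{K} \;=\; \arg\max_{K}\; p(\mathcal{D} \mid K)
```

Here theta collects the mixture weights and component parameters; the integral is typically intractable, which motivates the Dirichlet process construction on the next slide.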

Clustering with DP: No Model Selection. We rewrite the model in another form and place a Dirichlet process prior on the task-specific parameters.
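A hedged sketch of the DP version (theta_m, G_0, and alpha are assumed names for the task-specific parameters, base distribution, and concentration parameter):

```latex
\theta_m = (\mu_m, \Sigma_m), \qquad w_m \sim \mathcal{N}(\mu_m, \Sigma_m),
\qquad \theta_m \mid G \;\sim\; G, \qquad G \;\sim\; \mathrm{DP}(\alpha, G_0)
```

Because a draw G from a Dirichlet process is almost surely discrete, the theta_m take repeated values, so tasks cluster automatically and the number of clusters never has to be fixed in advance.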

Stick-Breaking View of DP: draw v_k ~ Beta(1, α) and set π_k = v_k ∏_{l<k} (1 − v_l), with atoms θ*_k drawn i.i.d. from the base distribution G_0; finally we get G = Σ_k π_k δ_{θ*_k}.
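A small runnable sketch (Python; the helper name and the 2-D Gaussian base distribution are illustrative assumptions) that draws a truncated stick-breaking approximation to G and assigns tasks to its atoms:

```python
import numpy as np

def stick_breaking_weights(alpha, truncation, rng):
    """Truncated stick-breaking: v_k ~ Beta(1, alpha), pi_k = v_k * prod_{l<k}(1 - v_l)."""
    v = rng.beta(1.0, alpha, size=truncation)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    return v * remaining

rng = np.random.default_rng(0)
pi = stick_breaking_weights(alpha=1.0, truncation=50, rng=rng)
atoms = rng.normal(0.0, 3.0, size=(50, 2))        # theta*_k ~ G_0 (here a 2-D Gaussian base)
z = rng.choice(50, size=8, p=pi / pi.sum())       # cluster indicator for each of 8 tasks
theta = atoms[z]                                  # tasks in the same cluster share an atom
print(np.round(pi[:5], 3), z)
```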

Prediction Rule of DP for Posterior Inference: suppose θ_{M+1} is a new draw and there are K distinct values θ*_1, ..., θ*_K among θ_1, ..., θ_M, with n_k of them equal to θ*_k. Then θ_{M+1} belongs to existing cluster k with probability n_k / (M + α), and belongs to a new cluster with probability α / (M + α).
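A hedged Python sketch of the same rule, as it would appear inside a collapsed Gibbs step (the function name and counts are illustrative; in a full sampler these prior probabilities are further multiplied by each cluster's data likelihood):

```python
import numpy as np

def crp_predictive(cluster_counts, alpha):
    """Polya-urn / CRP rule: P(existing cluster k) = n_k/(M+alpha), P(new) = alpha/(M+alpha)."""
    counts = np.asarray(cluster_counts, dtype=float)
    M = counts.sum()
    return np.concatenate([counts, [alpha]]) / (M + alpha)

rng = np.random.default_rng(1)
counts = [4, 2, 1]                               # n_k for K = 3 existing clusters, M = 7 tasks
probs = crp_predictive(counts, alpha=1.0)
assignment = rng.choice(len(probs), p=probs)     # index 3 would mean "open a new cluster"
print(np.round(probs, 3), assignment)
```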

Toy Problem: results shown for eight synthetic tasks, Task 1 through Task 8 (figures omitted).

Expert Network

Mathematical Model: each gating node j produces input-dependent gating weights over its children; the likelihood mixes the experts' predictions according to these gating weights.
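A hedged sketch of the gating and likelihood terms (the softmax gating parameters v_j and the generic expert predictive density are assumptions, not the slide's exact forms):

```latex
g_j(x) \;=\; \frac{\exp\!\left(v_j^{\top} x\right)}{\sum_{j'} \exp\!\left(v_{j'}^{\top} x\right)},
\qquad
p(y \mid x) \;=\; \sum_{m=1}^{M} p(m \mid x)\; p(y \mid x, \text{expert } m)
```

For a one-level network p(m | x) = g_m(x); for a tree-structured network it is the product of gating probabilities along the path to expert m, as on the next slide.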

Mathematical Model: the probability of selecting expert m is the product of the gating probabilities along the unique path from the root node to expert m.

Example

Infinite Expert Network: an infinite number of gating nodes, extending the finite expert network nonparametrically.