Multitask Learning Using Dirichlet Process
Ya Xue
July 1, 2005
Outline
- Task defined: infinite mixture of priors
  - Multitask learning
  - Dirichlet process
- Task undefined: expert network
  - Finite expert network
  - Infinite expert network
Multitask Learning - Common Prior Model
M classification tasks, task m with its own weight vector $\mathbf{w}_m$.
Shared prior of w: all $\mathbf{w}_m$ are drawn from one common prior, which couples the tasks and lets them share statistical strength.
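A minimal sketch of the common-prior hierarchy, assuming a generalized linear classifier per task (the link $p(\cdot)$ and the Gaussian form of the prior are assumptions; the slide only states that the $\mathbf{w}_m$ share one prior):

```latex
% Task m has data (x_{mn}, y_{mn}), n = 1..N_m, and weight vector w_m.
\[
  y_{mn} \sim p\!\left(y \mid \mathbf{w}_m^{\top}\mathbf{x}_{mn}\right),
  \qquad
  \mathbf{w}_m \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma}),
  \quad m = 1, \dots, M
\]
% All M tasks draw w_m from the same Gaussian, which couples the tasks.
```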
Drawback of This Model
Assume each $\mathbf{w}_m$ is a two-dimensional vector. If the tasks fall into several distinct groups, a single common prior pulls every $\mathbf{w}_m$ toward one mean and cannot capture the group structure.
Proposed Model w is drawn from a Gaussian mixture model:
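Written out, the proposed prior replaces the single Gaussian with a K-component mixture (standard mixture notation; the component parameters reappear on the clustering slide below):

```latex
\[
  \mathbf{w}_m \sim \sum_{k=1}^{K} \pi_k \,
    \mathcal{N}(\boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k),
  \qquad \sum_{k=1}^{K} \pi_k = 1 .
\]
```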
Two Special Cases
- Common prior model - single Gaussian (K = 1): all tasks are drawn from one Gaussian, i.e. the tasks are similar.
- Piecewise linear classifier - point-mass components: tasks in the same cluster share exactly the same $\mathbf{w}$, i.e. the tasks within a cluster are identical.
The contrast is similar vs. identical tasks; see the two limits written out below.
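The two limits in the mixture notation (reading "point mass function" as the limit $\boldsymbol{\Sigma}_k \to 0$ is my interpretation):

```latex
% K = 1: the mixture collapses to the common prior model (similar tasks).
\[ \mathbf{w}_m \sim \mathcal{N}(\boldsymbol{\mu}_1, \boldsymbol{\Sigma}_1) \]
% Sigma_k -> 0: components become point masses; tasks in the same
% cluster share identical parameters (identical tasks).
\[ \mathbf{w}_m \sim \sum_{k=1}^{K} \pi_k \, \delta_{\boldsymbol{\theta}_k} \]
```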
Clustering
Unknown parameters: the mixture weights and component parameters $\{\pi_k, \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k\}$.
Another uncertainty: the number of components K.
Model selection: compute the evidence (marginal likelihood) for each candidate K.
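The evidence for a candidate K integrates out all mixture parameters $\boldsymbol{\theta}$ (the standard marginal-likelihood definition; the slide names it but does not display it):

```latex
\[
  p(\mathcal{D} \mid K)
  = \int p(\mathcal{D} \mid \boldsymbol{\theta}, K)\,
         p(\boldsymbol{\theta} \mid K)\, d\boldsymbol{\theta}
\]
% Model selection: fit each candidate K and keep the one with the
% largest evidence -- costly, and it commits to a single K.
```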
Clustering with DP: No Model Selection
We rewrite the model in another form: each $\mathbf{w}_m$ is drawn from a random distribution G, and we define a Dirichlet process prior for the parameters, which sidesteps model selection over K.
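In standard DP notation (concentration parameter $\alpha$ and base distribution $G_0$ are the usual symbols, assumed here):

```latex
\[
  \mathbf{w}_m \mid G \sim G, \qquad G \sim \mathrm{DP}(\alpha, G_0)
\]
% G drawn from a DP is discrete with probability one, so the w_m
% cluster automatically and the number of clusters is inferred
% rather than selected.
```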
Stick-Breaking View of DP
Start with a stick of length 1. For $k = 1, 2, \dots$, draw $\beta_k \sim \mathrm{Beta}(1, \alpha)$ and break off the fraction $\beta_k$ of the remaining stick, giving weight $\pi_k = \beta_k \prod_{j<k} (1 - \beta_j)$; draw the atom $\boldsymbol{\theta}_k \sim G_0$. Finally we get $G = \sum_{k=1}^{\infty} \pi_k \, \delta_{\boldsymbol{\theta}_k}$.
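A runnable truncated stick-breaking sampler, as a sketch (the truncation level, the value of $\alpha$, and the Gaussian base distribution are illustrative assumptions):

```python
import numpy as np

def stick_breaking(alpha, n_atoms, base_sampler, rng=None):
    """Truncated stick-breaking draw of G ~ DP(alpha, G0).

    beta_k ~ Beta(1, alpha); pi_k = beta_k * prod_{j<k} (1 - beta_j);
    atoms theta_k ~ G0. Returns (weights pi, atoms theta).
    """
    rng = rng or np.random.default_rng()
    betas = rng.beta(1.0, alpha, size=n_atoms)               # stick fractions
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas)[:-1]))
    weights = betas * remaining                              # pi_k
    atoms = np.array([base_sampler(rng) for _ in range(n_atoms)])
    return weights, atoms

# Example: G0 = N(0, 1). A small alpha concentrates mass on a few atoms,
# i.e. a few clusters of tasks.
pi, theta = stick_breaking(alpha=1.0, n_atoms=50,
                           base_sampler=lambda r: r.normal(0.0, 1.0))
print(pi[:5], pi.sum())  # weights decay; the sum is < 1 due to truncation
```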
Prediction Rule of DP for Posterior Inference
$\mathbf{w}_{M+1}$ is a new data point. Assuming there are K distinct values $\mathbf{w}_1^*, \dots, \mathbf{w}_K^*$ among $\mathbf{w}_1, \dots, \mathbf{w}_M$:
$\mathbf{w}_{M+1}$ belongs to an existing cluster k: probability $\dfrac{n_k}{M + \alpha}$, where $n_k$ is the number of points in cluster k.
$\mathbf{w}_{M+1}$ belongs to a new cluster: probability $\dfrac{\alpha}{M + \alpha}$, with $\mathbf{w}_{M+1} \sim G_0$.
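The same rule as a sequential sampler (the Chinese-restaurant-process view; the loop over eight "tasks" mirrors the toy problem below and is purely illustrative):

```python
import numpy as np

def crp_assign(counts, alpha, rng=None):
    """Sample the cluster of the next item under the DP prediction rule.

    Existing cluster k is chosen with probability n_k / (M + alpha);
    a new cluster with probability alpha / (M + alpha).
    Returns an index; len(counts) means "open a new cluster".
    """
    rng = rng or np.random.default_rng()
    M = sum(counts)
    probs = np.array(counts + [alpha], dtype=float) / (M + alpha)
    return rng.choice(len(counts) + 1, p=probs)

rng = np.random.default_rng(0)
counts = []                      # cluster sizes n_k
for _ in range(8):               # grow a partition of 8 tasks
    k = crp_assign(counts, alpha=1.0, rng=rng)
    if k == len(counts):
        counts.append(1)         # open a new cluster
    else:
        counts[k] += 1           # join existing cluster k
print(counts)
```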
Toy Problem
[Figure slides: classification results on eight synthetic tasks, Task 1 through Task 8]
Expert Network
Mathematical Model
Gating node j: each internal node j applies a gating function that softly routes the input $\mathbf{x}$ toward its children.
Likelihood: a mixture of the experts' predictions, with each expert weighted by the gating probabilities along its path.
Mathematical Model
$\ell_m$ is the unique path from the root node to expert m, where the mixture weight of expert m is the product of the gating probabilities of the nodes on $\ell_m$.
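A sketch of the resulting likelihood in hierarchical mixture-of-experts form (the logistic gate and the binary tree are assumptions; the path-product structure follows the slide):

```latex
% Binary gate at node j; s_{jm} says which branch path l_m takes at j.
\[
  g_j(\mathbf{x}) = \sigma(\mathbf{v}_j^{\top}\mathbf{x}), \qquad
  p(y \mid \mathbf{x})
  = \sum_{m} \Big( \prod_{j \in \ell_m}
      g_j(\mathbf{x})^{s_{jm}} \big(1 - g_j(\mathbf{x})\big)^{1 - s_{jm}}
    \Big)\, p_m(y \mid \mathbf{x})
\]
```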
Example
Infinite Expert Network
Infinite number of gating nodes: the finite tree is replaced by a nonparametric construction, so the number of experts need not be fixed in advance.
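One way to read this slide (an assumption on my part, following the stick-breaking view above; the slide does not display the construction): let the gating weights come from a stick-breaking process, so the expert count is unbounded:

```latex
\[
  p(y \mid \mathbf{x}) = \sum_{m=1}^{\infty} \pi_m \, p_m(y \mid \mathbf{x}),
  \qquad
  \pi_m = \beta_m \prod_{j < m} (1 - \beta_j), \quad
  \beta_j \sim \mathrm{Beta}(1, \alpha)
\]
```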