1
Learning the Structure of Related Tasks A. Niculescu-Mizil, R. Caruana Presented by Lihan He Machine Learning Reading Group, Duke University 02/03/2006
2
Outline Introduction Learning a single Bayesian network from data Learning from related tasks Experimental results Conclusions
3
Introduction Graphical model: nodes represent random variables; edges represent dependencies. Undirected graphical model: Markov network. Directed graphical model: Bayesian network B = {G, θ}: edges encode causal relationships between nodes, and the structure G is a directed acyclic graph (DAG), i.e. no directed cycles are allowed. [Figure: an example DAG over nodes x1, x2, x3, x4]
4
Introduction Goal: simultaneously learn Bayesian network structures for multiple tasks. The tasks are related, so the structures are likely to be similar, but not identical. Example: gene expression data. Plan: 1) learn a single structure from data; 2) generalize to multi-task learning by placing a joint prior on the structures.
5
Single Bayesian network learning from data A Bayes net B = {G, θ} is defined over a set of n random variables X = {X_1, X_2, …, X_n}. The joint probability P(X) factorizes according to G as P(X) = ∏_{i=1}^{n} P(X_i | Pa(X_i)), where Pa(X_i) are the parents of X_i in G. Given a dataset D = {x^1, x^2, …, x^m}, where each sample x^j = (x_1, x_2, …, x_n), we can learn both the structure G and the parameters θ from D.
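As a concrete illustration of the factorization (not from the slides), here is a minimal Python sketch for a four-node network shaped like the example DAG on the previous slide; the structure and all CPT values are made-up assumptions.

```python
# A four-node network with the factorization
#   P(x1, x2, x3, x4) = P(x1) P(x2|x1) P(x3|x1) P(x4|x2, x3)
# All CPT values below are made up for illustration.

p_x1 = {0: 0.7, 1: 0.3}                    # P(x1)
p_x2 = {(0,): {0: 0.8, 1: 0.2},            # P(x2 | x1)
        (1,): {0: 0.4, 1: 0.6}}
p_x3 = {(0,): {0: 0.9, 1: 0.1},            # P(x3 | x1)
        (1,): {0: 0.5, 1: 0.5}}
p_x4 = {(0, 0): {0: 0.99, 1: 0.01},        # P(x4 | x2, x3)
        (0, 1): {0: 0.7, 1: 0.3},
        (1, 0): {0: 0.6, 1: 0.4},
        (1, 1): {0: 0.1, 1: 0.9}}

def joint(x1, x2, x3, x4):
    """P(X): the product of each node's CPT entry given its parents."""
    return (p_x1[x1]
            * p_x2[(x1,)][x2]
            * p_x3[(x1,)][x3]
            * p_x4[(x2, x3)][x4])

print(joint(1, 0, 1, 1))  # 0.3 * 0.4 * 0.5 * 0.3 = 0.018
```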
6
Single Bayesian network learning from data Model selection: find the structure G with the highest posterior P(G|D) over all possible G. Searching over all possible G is infeasible: for n = 4 there are 543 possible DAGs; for n = 10 there are O(10^18) possible DAGs. Question: how can we search for the best structure in this huge space of possible DAGs?
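The DAG counts on this slide can be verified with Robinson's recurrence for the number of labeled DAGs; a short sketch:

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def num_dags(n):
    """Number of labeled DAGs on n nodes, via Robinson's recurrence:
    a(n) = sum_{k=1..n} (-1)^(k+1) C(n,k) 2^(k(n-k)) a(n-k), a(0) = 1."""
    if n == 0:
        return 1
    return sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * num_dags(n - k)
               for k in range(1, n + 1))

print(num_dags(4))   # 543, as on the slide
print(num_dags(10))  # 4175098976430598143, i.e. O(10^18)
```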
7
Single Bayesian network learning from data Algorithm (greedy hill climbing with random restarts; a code sketch follows below): 1) Randomly generate an initial DAG and evaluate its score; 2) Evaluate the scores of all the neighbors of the current DAG; 3) while some neighbor has a higher score than the current DAG: move to the neighbor that has the highest score, and evaluate the scores of all the neighbors of the new DAG; 4) Repeat (1)-(3) a number of times, starting from a different DAG each time.
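A minimal Python sketch of this greedy search. `random_init_fn`, `neighbors_fn`, and `score_fn` are assumed helpers, not from the paper (a concrete neighbor generator is sketched after the next slide); the score would typically be log P(G | D).

```python
import random

def hill_climb(random_init_fn, neighbors_fn, score_fn, n_restarts=10, seed=0):
    """Greedy hill climbing with random restarts (steps 1-4 on the slide).

    random_init_fn(rng) -> a random DAG, neighbors_fn(G) -> iterable of DAGs,
    score_fn(G) -> a score such as log P(G | D); all three are assumed helpers."""
    rng = random.Random(seed)
    best_g, best_score = None, float("-inf")
    for _ in range(n_restarts):                       # step 4: random restarts
        g = random_init_fn(rng)                       # step 1: random initial DAG
        s = score_fn(g)
        while True:
            # step 2: evaluate the scores of all neighbors of the current DAG
            candidates = [(score_fn(g2), g2) for g2 in neighbors_fn(g)]
            if not candidates:
                break
            s2, g2 = max(candidates, key=lambda t: t[0])
            if s2 <= s:                               # local maximum reached
                break
            g, s = g2, s2                             # step 3: move to best neighbor
        if s > best_score:
            best_g, best_score = g, s
    return best_g, best_score
```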
8
Single Bayesian network learning from data Neighbors of a structure G: the set of all DAGs that can be obtained by adding, removing, or reversing a single edge in G. Every neighbor must satisfy the acyclicity constraint. [Figure: an example DAG over x1–x4 together with four of its neighbors] A neighbor generator is sketched below.
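A sketch of the neighbor generation with the acyclicity check; representing a DAG as a set of directed edges is my assumption, not the paper's.

```python
def is_acyclic(nodes, edges):
    """Kahn's algorithm: True iff the directed graph has no cycle."""
    indeg = {v: 0 for v in nodes}
    for (u, v) in edges:
        indeg[v] += 1
    frontier = [v for v in nodes if indeg[v] == 0]
    seen = 0
    while frontier:
        u = frontier.pop()
        seen += 1
        for (a, b) in edges:
            if a == u:
                indeg[b] -= 1
                if indeg[b] == 0:
                    frontier.append(b)
    return seen == len(nodes)

def neighbors(nodes, edges):
    """All DAGs reachable by adding, removing, or reversing one edge."""
    out = []
    for e in edges:                          # removal never creates a cycle
        out.append(edges - {e})
    for (u, v) in edges:                     # reversal: keep only if still acyclic
        cand = (edges - {(u, v)}) | {(v, u)}
        if is_acyclic(nodes, cand):
            out.append(cand)
    for u in nodes:                          # addition: keep only if still acyclic
        for v in nodes:
            if u != v and (u, v) not in edges and (v, u) not in edges:
                cand = edges | {(u, v)}
                if is_acyclic(nodes, cand):
                    out.append(cand)
    return out

# Example: the chain x1 -> x2 -> x3 -> x4
nodes = {"x1", "x2", "x3", "x4"}
chain = {("x1", "x2"), ("x2", "x3"), ("x3", "x4")}
print(len(neighbors(nodes, chain)))  # 9: 3 removals + 3 reversals + 3 legal additions
```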
9
Learning from related tasks Given i.i.d. datasets D_1, D_2, …, D_k, simultaneously learn the networks B_1 = {G_1, θ_1}, B_2 = {G_2, θ_2}, …, B_k = {G_k, θ_k}. The structures (G_1, G_2, …, G_k) are similar, but not identical.
10
Learning from related tasks One more assumption: the parameters of the different networks are independent, P(θ_1, …, θ_k | G_1, …, G_k) = ∏_{i=1}^{k} P(θ_i | G_i). This is not strictly true, but it makes structure learning more efficient; since the focus here is structure learning rather than parameter learning, the assumption is acceptable.
11
Learning from related tasks Prior: If the structures are not related, then G_1, …, G_k are independent a priori, P(G_1, …, G_k) = ∏_{i=1}^{k} P(G_i), and the structures are learned independently for each task. If the structures are required to be identical, the problem reduces to learning a single structure: learn one network on the pooled data with an extra task-indicator node TSK, under the restriction that TSK is always a parent of all the other nodes (a sketch of the pooling step follows below). The common structure is then obtained by removing the node TSK and all the edges connected to it.
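A minimal sketch of the pooling step for the identical-structure case, assuming each task's data is a NumPy array over the same n variables; the function name is hypothetical.

```python
import numpy as np

def pool_with_task_node(datasets):
    """Stack the per-task datasets and append a TSK indicator column,
    so that a single-network structure learner can be run on the pooled
    data (datasets: list of (m_i, n) arrays over the same n variables)."""
    rows = []
    for task_id, d in enumerate(datasets):
        tsk = np.full((d.shape[0], 1), task_id)   # last column is TSK
        rows.append(np.hstack([d, tsk]))
    return np.vstack(rows)

# Example with two tasks over 3 binary variables (made-up data)
d1 = np.array([[0, 1, 0], [1, 1, 0]])
d2 = np.array([[1, 0, 1]])
print(pool_with_task_node([d1, d2]))
```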
12
Learning from related tasks Prior: Between independent and identical: penalize each edge (X_i, X_j) whose status differs between two DAGs, P(G_l, G_m) ∝ (1−δ)^{d(G_l, G_m)}, where d(G_l, G_m) counts the node pairs whose edge differs between G_l and G_m. δ = 0: independent; δ = 1: identical; 0 < δ < 1 interpolates between the two. For the k-task prior, every pair of structures is penalized: P(G_1, …, G_k) ∝ ∏_{l<m} (1−δ)^{d(G_l, G_m)}.
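A sketch of this joint prior in log space, reusing the edge-set representation from the earlier sketch. The paper's normalizing constant is omitted here, and δ must be strictly below 1 for the logarithm to be finite.

```python
from itertools import combinations
from math import log

def edge_state(edges, u, v):
    """Status of the pair (u, v): 0 = absent, 1 = u->v, 2 = v->u."""
    if (u, v) in edges: return 1
    if (v, u) in edges: return 2
    return 0

def n_diff(nodes, e1, e2):
    """Number of node pairs whose edge status differs between two DAGs."""
    return sum(edge_state(e1, u, v) != edge_state(e2, u, v)
               for u, v in combinations(sorted(nodes), 2))

def log_joint_prior(nodes, structures, delta):
    """log P(G_1,...,G_k) up to a constant: each differing edge in each pair
    of DAGs contributes a factor (1 - delta); delta = 0 recovers the
    independent prior, delta -> 1 forces identical structures."""
    return sum(n_diff(nodes, e1, e2) * log(1.0 - delta)
               for e1, e2 in combinations(structures, 2))

nodes = {"x1", "x2", "x3"}
g1 = {("x1", "x2"), ("x2", "x3")}
g2 = {("x1", "x2"), ("x3", "x2")}
print(log_joint_prior(nodes, [g1, g2], delta=0.5))  # one differing pair: log(0.5)
```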
13
Learning from related tasks Model selection: find the configuration with the highest P(G_1, …, G_k | D_1, …, D_k). Same idea as single-task structure learning. Question: what is a neighbor of (G_1, …, G_k)? Def 1: change one edge (add, remove, or reverse) in each DAG independently; size of the neighborhood: O(n^{2k}). Def 2: Def 1 plus one more constraint: all the edge changes happen between the same two nodes in every DAG of (G_1, …, G_k); size of the neighborhood: O(n^2 · 3^k). A sketch of Def 2's per-pair enumeration follows below.
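A sketch of Def 2's enumeration for one node pair: every combination of {absent, u→v, v→u} across the k tasks (3^k combinations per pair), keeping only combinations in which every DAG stays acyclic. The `is_acyclic` helper repeats the one from the earlier neighbor sketch so the block stands alone.

```python
from itertools import product

def is_acyclic(nodes, edges):
    """DFS cycle check (same helper as in the earlier neighbor sketch)."""
    adj = {v: [] for v in nodes}
    for a, b in edges:
        adj[a].append(b)
    state = {v: 0 for v in nodes}        # 0 = unseen, 1 = on stack, 2 = done
    def dfs(v):
        state[v] = 1
        for w in adj[v]:
            if state[w] == 1 or (state[w] == 0 and not dfs(w)):
                return False
        state[v] = 2
        return True
    return all(state[v] == 2 or dfs(v) for v in nodes)

def pair_neighbors(nodes, structures, u, v):
    """Def 2 neighbors that touch only the node pair (u, v): every combination
    of {absent, u->v, v->u} across the k tasks, kept only if all DAGs remain
    acyclic."""
    out = []
    for states in product((0, 1, 2), repeat=len(structures)):
        cand = []
        for st, edges in zip(states, structures):
            e = edges - {(u, v), (v, u)}
            if st == 1:
                e = e | {(u, v)}
            elif st == 2:
                e = e | {(v, u)}
            cand.append(e)
        if all(is_acyclic(nodes, e) for e in cand):
            out.append(tuple(cand))
    return out

# Two tasks over 3 nodes; vary the pair (x1, x3) in both DAGs at once
nodes = {"x1", "x2", "x3"}
g1 = {("x1", "x2"), ("x2", "x3")}
g2 = {("x1", "x2")}
print(len(pair_neighbors(nodes, [g1, g2], "x1", "x3")))  # 6 of the 9 combos stay acyclic
```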
14
Learning from related tasks Acceleration: at each iteration the algorithm must find the best score over the set of neighbors. It is not necessary to search all of its elements: consider partially specified neighbors in which the edge changes of the first i tasks are fixed and the remaining k − i tasks are left unspecified. If an upper bound on the score of every completion of such a partial specification is below the best score found so far, the whole subset of neighbors can be pruned (a generic sketch follows below).
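A generic sketch of this pruning idea, not the paper's exact procedure: partial specifications are extended one task at a time, and a subtree is skipped whenever an optimistic bound (supplied as an assumed helper) cannot beat the best complete neighbor found so far.

```python
def prune_search(k, choices, partial_bound, complete_score):
    """Branch-and-bound over partially specified neighbors.  `choices` is
    the per-task set of local edge changes (e.g. the 3 states of one pair);
    `partial_bound(prefix)` must upper-bound the score of every completion
    of `prefix`; `complete_score` scores a fully specified neighbor.
    Both scoring functions are assumed helpers."""
    best = (float("-inf"), None)

    def recurse(prefix):
        nonlocal best
        if len(prefix) == k:                      # all k tasks specified
            s = complete_score(prefix)
            if s > best[0]:
                best = (s, prefix)
            return
        for c in choices[len(prefix)]:
            cand = prefix + (c,)
            if partial_bound(cand) > best[0]:     # otherwise prune the subtree
                recurse(cand)

    recurse(())
    return best

# Toy demo: per-task "changes" are numbers, the score is their sum, and the
# bound adds the per-task maxima of the still-unspecified tasks.
choices = [(0, 1, 2)] * 3
complete_score = lambda p: sum(p)
partial_bound = lambda p: sum(p) + 2 * (3 - len(p))
print(prune_search(3, choices, partial_bound, complete_score))  # (6, (2, 2, 2))
```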
15
Results Start from an original network and delete each edge with probability P_del to create 5 related tasks; 1000 data points per task; 10 trials. Compute the KL divergence and the edit distance between each learned structure and the true structure. [Figures: KL divergence and edit distance of the learned structures] A sketch of the task-generation step follows below.
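A sketch of the task-generation step described on this slide; since deleting edges from a DAG can never introduce a cycle, every generated structure is a valid DAG. The function name and example network are assumptions.

```python
import random

def make_related_tasks(edges, p_del, k=5, seed=0):
    """Create k related structures from an original network by deleting
    each edge independently with probability p_del."""
    rng = random.Random(seed)
    return [{e for e in edges if rng.random() >= p_del} for _ in range(k)]

original = {("x1", "x2"), ("x2", "x3"), ("x2", "x4"), ("x3", "x4")}
for t in make_related_tasks(original, p_del=0.25):
    print(sorted(t))
```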
16
Learning from related tasks