1
Learning the Structure of Related Tasks A. Niculescu-Mizil, R. Caruana Presented by Lihan He Machine Learning Reading Group, Duke University 02/03/2006
2
Outline Introduction Learning a single Bayesian network from data Learning from related tasks Experimental results Conclusions
3
Introduction Graphical model: nodes represent random variables; edges represent dependencies. Undirected graphical model: Markov network. Directed graphical model: Bayesian network B = {G, θ}: edges encode causal relationships between nodes, and the structure G is a directed acyclic graph (DAG), i.e. no directed cycles are allowed. [Figure: an example DAG over nodes x1, x2, x3, x4]
4
Introduction Goal: simultaneously learn Bayesian network structures for multiple tasks. The tasks are related, so the structures are likely to be similar, but not identical. Example: gene expression data. Plan: 1) learn a single structure from data; 2) generalize to multi-task learning by placing a joint prior on the structures.
5
Single Bayesian network learning from data A Bayes net B = {G, θ} is defined over a set of n random variables X = {X_1, X_2, …, X_n}. The joint probability P(X) factorizes according to G as P(X) = ∏_{i=1}^{n} P(X_i | Pa(X_i)), where Pa(X_i) are the parents of X_i in G. Given a dataset D = {x^1, x^2, …, x^m}, where each sample x^j = (x_1, x_2, …, x_n), we can learn both the structure G and the parameters θ from D.
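As a concrete illustration of the factorization (not from the slides), here is a minimal Python sketch for a four-node network shaped like the example DAG on the previous slide; the structure and all CPT values are made-up assumptions.

```python
# A four-node network with the factorization
#   P(x1, x2, x3, x4) = P(x1) P(x2|x1) P(x3|x1) P(x4|x2, x3)
# All CPT values below are made up for illustration.

p_x1 = {0: 0.7, 1: 0.3}                    # P(x1)
p_x2 = {(0,): {0: 0.8, 1: 0.2},            # P(x2 | x1)
        (1,): {0: 0.4, 1: 0.6}}
p_x3 = {(0,): {0: 0.9, 1: 0.1},            # P(x3 | x1)
        (1,): {0: 0.5, 1: 0.5}}
p_x4 = {(0, 0): {0: 0.99, 1: 0.01},        # P(x4 | x2, x3)
        (0, 1): {0: 0.7, 1: 0.3},
        (1, 0): {0: 0.6, 1: 0.4},
        (1, 1): {0: 0.1, 1: 0.9}}

def joint(x1, x2, x3, x4):
    """P(X): the product of each node's CPT entry given its parents."""
    return (p_x1[x1]
            * p_x2[(x1,)][x2]
            * p_x3[(x1,)][x3]
            * p_x4[(x2, x3)][x4])

print(joint(1, 0, 1, 1))  # 0.3 * 0.4 * 0.5 * 0.3 = 0.018
```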
6
Single Bayesian network learning from data Model selection: find the structure G with the highest posterior P(G|D) over all possible G. Searching over all possible G is infeasible: for n = 4 there are 543 possible DAGs; for n = 10 there are O(10^18) possible DAGs. Question: how can we search for the best structure in this huge space of possible DAGs?
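The DAG counts on this slide can be verified with Robinson's recurrence for the number of labeled DAGs; a short sketch:

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def num_dags(n):
    """Number of labeled DAGs on n nodes, via Robinson's recurrence:
    a(n) = sum_{k=1..n} (-1)^(k+1) C(n,k) 2^(k(n-k)) a(n-k), a(0) = 1."""
    if n == 0:
        return 1
    return sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * num_dags(n - k)
               for k in range(1, n + 1))

print(num_dags(4))   # 543, as on the slide
print(num_dags(10))  # 4175098976430598143, i.e. O(10^18)
```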
7
Single Bayesian network learning from data Algorithm (greedy hill climbing with random restarts; a code sketch follows below): 1) Randomly generate an initial DAG and evaluate its score; 2) Evaluate the scores of all the neighbors of the current DAG; 3) while some neighbor has a higher score than the current DAG: move to the neighbor that has the highest score, and evaluate the scores of all the neighbors of the new DAG; 4) Repeat (1)-(3) a number of times, starting from a different DAG each time.
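A minimal Python sketch of this greedy search. `random_init_fn`, `neighbors_fn`, and `score_fn` are assumed helpers, not from the paper (a concrete neighbor generator is sketched after the next slide); the score would typically be log P(G | D).

```python
import random

def hill_climb(random_init_fn, neighbors_fn, score_fn, n_restarts=10, seed=0):
    """Greedy hill climbing with random restarts (steps 1-4 on the slide).

    random_init_fn(rng) -> a random DAG, neighbors_fn(G) -> iterable of DAGs,
    score_fn(G) -> a score such as log P(G | D); all three are assumed helpers."""
    rng = random.Random(seed)
    best_g, best_score = None, float("-inf")
    for _ in range(n_restarts):                       # step 4: random restarts
        g = random_init_fn(rng)                       # step 1: random initial DAG
        s = score_fn(g)
        while True:
            # step 2: evaluate the scores of all neighbors of the current DAG
            candidates = [(score_fn(g2), g2) for g2 in neighbors_fn(g)]
            if not candidates:
                break
            s2, g2 = max(candidates, key=lambda t: t[0])
            if s2 <= s:                               # local maximum reached
                break
            g, s = g2, s2                             # step 3: move to best neighbor
        if s > best_score:
            best_g, best_score = g, s
    return best_g, best_score
```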
8
Single Bayesian network learning from data Neighbors of a structure G: the set of all DAGs that can be obtained by adding, removing, or reversing a single edge in G. Every neighbor must satisfy the acyclicity constraint. [Figure: an example DAG over x1–x4 together with four of its neighbors] A neighbor generator is sketched below.
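A sketch of the neighbor generation with the acyclicity check; representing a DAG as a set of directed edges is my assumption, not the paper's.

```python
def is_acyclic(nodes, edges):
    """Kahn's algorithm: True iff the directed graph has no cycle."""
    indeg = {v: 0 for v in nodes}
    for (u, v) in edges:
        indeg[v] += 1
    frontier = [v for v in nodes if indeg[v] == 0]
    seen = 0
    while frontier:
        u = frontier.pop()
        seen += 1
        for (a, b) in edges:
            if a == u:
                indeg[b] -= 1
                if indeg[b] == 0:
                    frontier.append(b)
    return seen == len(nodes)

def neighbors(nodes, edges):
    """All DAGs reachable by adding, removing, or reversing one edge."""
    out = []
    for e in edges:                          # removal never creates a cycle
        out.append(edges - {e})
    for (u, v) in edges:                     # reversal: keep only if still acyclic
        cand = (edges - {(u, v)}) | {(v, u)}
        if is_acyclic(nodes, cand):
            out.append(cand)
    for u in nodes:                          # addition: keep only if still acyclic
        for v in nodes:
            if u != v and (u, v) not in edges and (v, u) not in edges:
                cand = edges | {(u, v)}
                if is_acyclic(nodes, cand):
                    out.append(cand)
    return out

# Example: the chain x1 -> x2 -> x3 -> x4
nodes = {"x1", "x2", "x3", "x4"}
chain = {("x1", "x2"), ("x2", "x3"), ("x3", "x4")}
print(len(neighbors(nodes, chain)))  # 9: 3 removals + 3 reversals + 3 legal additions
```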
9
Learning from related tasks Given i.i.d. datasets D_1, D_2, …, D_k, simultaneously learn the networks B_1 = {G_1, θ_1}, B_2 = {G_2, θ_2}, …, B_k = {G_k, θ_k}. The structures (G_1, G_2, …, G_k) are similar, but not identical.
10
Learning from related tasks One more assumption: the parameters of the different networks are independent, P(θ_1, …, θ_k | G_1, …, G_k) = ∏_{i=1}^{k} P(θ_i | G_i). This is not strictly true, but it makes structure learning more efficient; since the focus here is structure learning rather than parameter learning, the assumption is acceptable.
11
Learning from related tasks Prior: If the structures are not related, then G_1, …, G_k are independent a priori, P(G_1, …, G_k) = ∏_{i=1}^{k} P(G_i), and the structures are learned independently for each task. If the structures are required to be identical, the problem reduces to learning a single structure: learn one network on the pooled data with an extra task-indicator node TSK, under the restriction that TSK is always a parent of all the other nodes (a sketch of the pooling step follows below). The common structure is then obtained by removing the node TSK and all the edges connected to it.
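A minimal sketch of the pooling step for the identical-structure case, assuming each task's data is a NumPy array over the same n variables; the function name is hypothetical.

```python
import numpy as np

def pool_with_task_node(datasets):
    """Stack the per-task datasets and append a TSK indicator column,
    so that a single-network structure learner can be run on the pooled
    data (datasets: list of (m_i, n) arrays over the same n variables)."""
    rows = []
    for task_id, d in enumerate(datasets):
        tsk = np.full((d.shape[0], 1), task_id)   # last column is TSK
        rows.append(np.hstack([d, tsk]))
    return np.vstack(rows)

# Example with two tasks over 3 binary variables (made-up data)
d1 = np.array([[0, 1, 0], [1, 1, 0]])
d2 = np.array([[1, 0, 1]])
print(pool_with_task_node([d1, d2]))
```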
12
Learning from related tasks Prior: Between independent and identical: penalize each edge (X_i, X_j) whose status differs between two DAGs, P(G_l, G_m) ∝ (1−δ)^{d(G_l, G_m)}, where d(G_l, G_m) counts the node pairs whose edge differs between G_l and G_m. δ = 0: independent; δ = 1: identical; 0 < δ < 1 interpolates between the two. For the k-task prior, every pair of structures is penalized: P(G_1, …, G_k) ∝ ∏_{l<m} (1−δ)^{d(G_l, G_m)}.
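A sketch of this joint prior in log space, reusing the edge-set representation from the earlier sketch. The paper's normalizing constant is omitted here, and δ must be strictly below 1 for the logarithm to be finite.

```python
from itertools import combinations
from math import log

def edge_state(edges, u, v):
    """Status of the pair (u, v): 0 = absent, 1 = u->v, 2 = v->u."""
    if (u, v) in edges: return 1
    if (v, u) in edges: return 2
    return 0

def n_diff(nodes, e1, e2):
    """Number of node pairs whose edge status differs between two DAGs."""
    return sum(edge_state(e1, u, v) != edge_state(e2, u, v)
               for u, v in combinations(sorted(nodes), 2))

def log_joint_prior(nodes, structures, delta):
    """log P(G_1,...,G_k) up to a constant: each differing edge in each pair
    of DAGs contributes a factor (1 - delta); delta = 0 recovers the
    independent prior, delta -> 1 forces identical structures."""
    return sum(n_diff(nodes, e1, e2) * log(1.0 - delta)
               for e1, e2 in combinations(structures, 2))

nodes = {"x1", "x2", "x3"}
g1 = {("x1", "x2"), ("x2", "x3")}
g2 = {("x1", "x2"), ("x3", "x2")}
print(log_joint_prior(nodes, [g1, g2], delta=0.5))  # one differing pair: log(0.5)
```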
13
Learning from related tasks Model selection: find the configuration with the highest P(G_1, …, G_k | D_1, …, D_k). Same idea as single-task structure learning. Question: what is a neighbor of (G_1, …, G_k)? Def 1: change one edge (add, remove, or reverse) in each DAG independently; size of the neighborhood: O(n^{2k}). Def 2: Def 1 plus one more constraint: all the edge changes happen between the same two nodes in every DAG of (G_1, …, G_k); size of the neighborhood: O(n^2 · 3^k). A sketch of Def 2's per-pair enumeration follows below.
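A sketch of Def 2's enumeration for one node pair: every combination of {absent, u→v, v→u} across the k tasks (3^k combinations per pair), keeping only combinations in which every DAG stays acyclic. The `is_acyclic` helper repeats the one from the earlier neighbor sketch so the block stands alone.

```python
from itertools import product

def is_acyclic(nodes, edges):
    """DFS cycle check (same helper as in the earlier neighbor sketch)."""
    adj = {v: [] for v in nodes}
    for a, b in edges:
        adj[a].append(b)
    state = {v: 0 for v in nodes}        # 0 = unseen, 1 = on stack, 2 = done
    def dfs(v):
        state[v] = 1
        for w in adj[v]:
            if state[w] == 1 or (state[w] == 0 and not dfs(w)):
                return False
        state[v] = 2
        return True
    return all(state[v] == 2 or dfs(v) for v in nodes)

def pair_neighbors(nodes, structures, u, v):
    """Def 2 neighbors that touch only the node pair (u, v): every combination
    of {absent, u->v, v->u} across the k tasks, kept only if all DAGs remain
    acyclic."""
    out = []
    for states in product((0, 1, 2), repeat=len(structures)):
        cand = []
        for st, edges in zip(states, structures):
            e = edges - {(u, v), (v, u)}
            if st == 1:
                e = e | {(u, v)}
            elif st == 2:
                e = e | {(v, u)}
            cand.append(e)
        if all(is_acyclic(nodes, e) for e in cand):
            out.append(tuple(cand))
    return out

# Two tasks over 3 nodes; vary the pair (x1, x3) in both DAGs at once
nodes = {"x1", "x2", "x3"}
g1 = {("x1", "x2"), ("x2", "x3")}
g2 = {("x1", "x2")}
print(len(pair_neighbors(nodes, [g1, g2], "x1", "x3")))  # 6 of the 9 combos stay acyclic
```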
14
Learning from related tasks Acceleration: at each iteration the algorithm must find the best score over the set of neighbors. It is not necessary to search all of its elements: consider partially specified neighbors in which the edge changes of the first i tasks are fixed and the remaining k − i tasks are left unspecified. If an upper bound on the score of every completion of such a partial specification is below the best score found so far, the whole subset of neighbors can be pruned (a generic sketch follows below).
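A generic sketch of this pruning idea, not the paper's exact procedure: partial specifications are extended one task at a time, and a subtree is skipped whenever an optimistic bound (supplied as an assumed helper) cannot beat the best complete neighbor found so far.

```python
def prune_search(k, choices, partial_bound, complete_score):
    """Branch-and-bound over partially specified neighbors.  `choices` is
    the per-task set of local edge changes (e.g. the 3 states of one pair);
    `partial_bound(prefix)` must upper-bound the score of every completion
    of `prefix`; `complete_score` scores a fully specified neighbor.
    Both scoring functions are assumed helpers."""
    best = (float("-inf"), None)

    def recurse(prefix):
        nonlocal best
        if len(prefix) == k:                      # all k tasks specified
            s = complete_score(prefix)
            if s > best[0]:
                best = (s, prefix)
            return
        for c in choices[len(prefix)]:
            cand = prefix + (c,)
            if partial_bound(cand) > best[0]:     # otherwise prune the subtree
                recurse(cand)

    recurse(())
    return best

# Toy demo: per-task "changes" are numbers, the score is their sum, and the
# bound adds the per-task maxima of the still-unspecified tasks.
choices = [(0, 1, 2)] * 3
complete_score = lambda p: sum(p)
partial_bound = lambda p: sum(p) + 2 * (3 - len(p))
print(prune_search(3, choices, partial_bound, complete_score))  # (6, (2, 2, 2))
```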
15
Results Start from an original network and delete each edge with probability P_del to create 5 related tasks; 1000 data points per task; 10 trials. Compute the KL divergence and the edit distance between each learned structure and the true structure. [Figures: KL divergence and edit distance of the learned structures] A sketch of the task-generation step follows below.
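A sketch of the task-generation step described on this slide; since deleting edges from a DAG can never introduce a cycle, every generated structure is a valid DAG. The function name and example network are assumptions.

```python
import random

def make_related_tasks(edges, p_del, k=5, seed=0):
    """Create k related structures from an original network by deleting
    each edge independently with probability p_del."""
    rng = random.Random(seed)
    return [{e for e in edges if rng.random() >= p_del} for _ in range(k)]

original = {("x1", "x2"), ("x2", "x3"), ("x2", "x4"), ("x3", "x4")}
for t in make_related_tasks(original, p_del=0.25):
    print(sorted(t))
```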
16
Learning from related tasks