Slide 1

Two Approaches to Bayesian Network Structure Learning
Yael Kinderman & Tali Goren

Goal: Compare an algorithm that learns a BN tree structure (TAN) with an algorithm that learns a constraint-free structure (Build-BN).

Problem definition: Finding an exact BN structure for complete discrete data.
- Known to be NP-hard.
- A maximization problem over a defined score.

Build-BN algorithm attributes:
- No structural constraints.
- A straightforward approach: no computation is avoided.
- Feasible only for small networks (< 30 variables).

Crucial facts at the core of the algorithm (see the sketch below):
- There are scoring functions that decompose into local scores (we used BIC for the algorithm).
- Every DAG has at least one node with no outgoing arcs (a sink).

Implementation note: Build-BN requires a lot of memory; therefore the implementation makes heavy use of the file system.
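To make the decomposability fact concrete, here is a minimal sketch (not the authors' code; Python, with hypothetical names): with a decomposable score such as BIC, the score of an entire DAG is just the sum of per-family local scores, which is what lets Build-BN precompute local scores once and reuse them everywhere.

```python
# Minimal illustration of score decomposability. `local_score` stands in for
# any decomposable scoring function (e.g. the local BIC of the next slide);
# `parents` maps each variable to its parent set in the candidate DAG.
def network_score(parents, local_score):
    """Score of a whole DAG = sum of per-variable local scores."""
    return sum(local_score(x, ps) for x, ps in parents.items())
```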
Slide 2

Algorithm's Flow

Step I: Find local scores. For every variable x \in V and every candidate parent set vs \subseteq V \setminus \{x\}, calculate the local BIC:

BIC(x, vs) = \sum_{j} \sum_{k} N_{jk} \log \frac{N_{jk}}{N_j} - \frac{\log N}{2} \, q_{vs} (r_x - 1)

Where:
- k iterates over all possible values of x,
- j iterates over all possible configurations of Pa(x) = vs,
- N = the number of samples,
- N_j = the number of samples where Pa(x) = j,
- N_jk = the number of samples where Pa(x) = j and x = k,
- r_x = the number of values of x, and q_vs = the number of parent configurations.

All in all, n \cdot 2^{n-1} scores are calculated in this step.

Step II: Find best parents. For every variable x and every var-set W \subseteq V \setminus \{x\}, find the best parents of x within W: either W itself, or the best parents within some W \setminus \{w\}. Traversing var-sets in lexicographic order (smaller to larger) lets each var-set reuse its subsets' results, for a time complexity of O((n-1) \cdot 2^{n-1}). (A sketch of both steps follows below.)
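A sketch of Steps I-II, assuming the training data is a NumPy integer array `data` of shape (N, n); the function names, and counting only observed parent configurations in the penalty term, are our simplifications, not the authors' implementation:

```python
from collections import Counter
from itertools import combinations
import math
import numpy as np

def bic_local(data, x, parents):
    """Step I: BIC(x, vs) = sum_jk N_jk*log(N_jk/N_j) - (log N / 2)*q*(r_x - 1)."""
    N = data.shape[0]
    r = len(np.unique(data[:, x]))                     # number of values of x
    joint = Counter((tuple(row[list(parents)]), row[x]) for row in data)
    par = Counter(tuple(row[list(parents)]) for row in data)
    ll = sum(njk * math.log(njk / par[j]) for (j, _), njk in joint.items())
    q = max(len(par), 1)                               # parent configurations (observed
                                                       # ones; a simplification)
    return ll - 0.5 * math.log(N) * q * (r - 1)

def best_parents(data, n):
    """Step II: for every x and every var-set W (not containing x), the best
    parent set within W is either W itself or the best set within some W\\{w}."""
    bp = {}                                            # (x, W) -> (score, parent set)
    for x in range(n):
        others = [v for v in range(n) if v != x]
        for size in range(len(others) + 1):            # smaller var-sets first
            for W in combinations(others, size):
                Wf = frozenset(W)
                cand = (bic_local(data, x, W), Wf)     # W itself as the parents
                for w in W:                            # ...or reuse a smaller var-set
                    prev = bp[(x, Wf - {w})]
                    if prev[0] > cand[0]:
                        cand = prev
                bp[(x, Wf)] = cand
    return bp
```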
Slide 3

Algorithm's Flow – cont.

Step III: Find best sinks. For each of the 2^n var-sets W we find a best sink. Let Sink*(W) denote the best sink of var-set W. Then Sink*(W) can be found by:

Sink^*(W) = \arg\max_{s \in W} \left[ score(s, g^*_s(W \setminus \{s\})) + score(G^*(W \setminus \{s\})) \right]

Where:
- g^*_s(var-set) = the best set of parents for s within the var-set,
- G^*(var-set) = the highest-scoring network over a var-set.

We traverse var-sets in lexicographic order and reuse scores that were calculated in previous iterations (see the sketch below).
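Continuing the sketch under the same assumptions (`bp` is the best-parents table from the previous block): the dynamic program walks var-sets from small to large, and for each W records both the best sink and the score G*(W) of the best network over W.

```python
from itertools import combinations

def best_sinks(bp, n):
    """Step III: Sink*(W) maximizes score(s, g*_s(W\\{s})) + score(G*(W\\{s}))."""
    G = {frozenset(): 0.0}                 # G[W] = score of the best network over W
    sink = {}                              # sink[W] = Sink*(W)
    for size in range(1, n + 1):           # smaller var-sets first, so G[W\{s}] exists
        for W in combinations(range(n), size):
            Wf = frozenset(W)
            score, s = max((G[Wf - {v}] + bp[(v, Wf - {v})][0], v) for v in W)
            G[Wf], sink[Wf] = score, s
    return G, sink
```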
Slide 4

Algorithm's Flow – cont.

Step IV: Find best order. The best sinks immediately yield the best ordering (in reverse order):

ord^*_i(V) = Sink^*\left( V \setminus \bigcup_{j=i+1}^{|V|} \{ ord^*_j(V) \} \right)

Step V: Find best network. Having the best order ord^*_i(V) and the best parents g^*(W) for each W \subseteq V, we can find the network as follows:

Pa(ord^*_i) = g^*_{ord^*_i}(\{ ord^*_1, \ldots, ord^*_{i-1} \})

In other words: the i-th variable in the optimal ordering picks its best parents from the var-set that contains all the variables that are its predecessors in the ordering (a sketch follows below).
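A sketch of Steps IV-V under the same assumptions: peeling best sinks off the full variable set yields the optimal ordering in reverse, after which each variable looks up its best parents among its predecessors.

```python
def best_network(bp, sink, n):
    """Steps IV-V: recover the optimal ordering and the optimal parent sets."""
    order, W = [], frozenset(range(n))
    while W:                               # Step IV: repeatedly remove the best sink
        s = sink[W]
        order.append(s)
        W = W - {s}
    order.reverse()                        # sinks come out in reverse order
    parents = {}
    for i, x in enumerate(order):          # Step V: parents from the predecessors
        parents[x] = bp[(x, frozenset(order[:i]))][1]
    return order, parents
```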
Slide 5

Using the BN for Prediction

5-fold cross-validation:
- 80% of the data is used for building the structure & CPDs,
- 20% of the data is used for label prediction.

Predicting the label C of a given sample (x_1, \ldots, x_n) is done using:

\hat{c} = \arg\max_{c} P(C = c, x_1, \ldots, x_n)
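A sketch of the prediction step, assuming the learned structure (`parents` from the previous block) and a hypothetical CPD lookup `cpd(x, value, parent_values)` estimated on the training fold: each candidate class value is scored by the joint probability the network assigns to the completed sample.

```python
import math

def predict_label(sample, c_index, c_values, parents, cpd):
    """Return argmax_c P(C = c, x_1, ..., x_n) under the learned BN.
    `cpd` is a hypothetical lookup, assumed to return strictly positive
    probabilities (e.g. from smoothed counts)."""
    best_c, best_lp = None, -math.inf
    for c in c_values:
        assign = dict(enumerate(sample))
        assign[c_index] = c                    # complete the sample with C = c
        lp = sum(math.log(cpd(x, assign[x],    # decomposable joint log-probability
                              tuple(assign[p] for p in sorted(ps))))
                 for x, ps in parents.items())
        if lp > best_lp:
            best_c, best_lp = c, lp
    return best_c
```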
Slide 6

Test over the Famous 'Student' Model

Testing our implementation on synthetic data: we simulated 300 samples according to the BN and the CPDs as presented in class. Prediction was performed using TAN and Build-BN.

Note: In Build-BN, 4 out of the 5 cross-validation folds gave the net shown above.

Build-BN result – prediction success rate: 0.836
TAN result – prediction success rate: 0.85
Slide 7

Experimental Results

Data taken from the UCI machine learning repository.

A possible explanation for the last two results:
- Zoo – only 101 instances…
- Vehicle – what's wrong with this data?!

Note the low in-degrees: the models induced by these data sets are by nature close to trees.
Slide 8

Example I: Corral

Build-BN does not force 'irrelevant' variables to be linked into the BN.

Build-BN result – prediction success rate: 0.969
TAN result – prediction success rate: 0.937
Slide 9

Example II: Tic-Tac-Toe

Having no constraints on the structure enables better prediction.

TAN result – prediction success rate: 0.653
Build-BN result – prediction success rate: 0.844

References:
Silander, T. & Myllymäki, P. (HIIT). A Simple Approach for Finding the Globally Optimal Bayesian Network Structure.