Model Averaging with Discrete Bayesian Network Classifiers


1 Model Averaging with Discrete Bayesian Network Classifiers
Denver Dash and Gregory F. Cooper
In Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics (AISTATS 2003)

2 Contents
- Model averaging over a class of discrete Bayesian network classifiers
  - The class is defined by a partial ordering and a bounded in-degree k.
- Theoretical results (for N nodes)
  - The class has at least […] distinct structures.
  - The summation can be performed in […] time.
  - Approximate averaging in O(N) time.
- Experiments
  - The technique can be beneficial even when the generating distribution is not a member of the class.
  - Characterize the performance over several parameters.
(c) 2003 SNU CSE Biointelligence Lab

3 Bayesian network classifiers
- Naïve Bayes classifier and general Bayesian network classifiers
- [Figure: two network diagrams over the class node C and the features F1, F2, …, FN]
- Optimal in zero-one loss
- Poor generalization performance could be improved by Bayesian model averaging.
  - → The space of network structures is super-exponential.

4 In this paper
- Bayesian model averaging over a restricted class of Bayesian network classifiers
  - The class is restricted by a partial order (π) and a bounded in-degree (k).
- Contributions
  - The factorization of the conditionals, applied to the task of classification.
  - Show that MA over this class can be approximated by a single network S* → calculation in O(N) time.
  - Empirical evaluation of the method compared with
    - A single naïve Bayes classifier
    - A single Bayesian network learned by a greedy search
    - Exact MA on naïve Bayes classifiers.

5 Notations
- The classification problem
  - A set of features F = {F1, F2, …, FN}.
  - X0 = C, X1 = F1, …, XN = FN → X (in Bayesian networks)
  - A set of classes C = {C1, C2, …, CNC}.
  - A database D = {D1, D2, …, DR}.
- A Bayesian network
  - G(X): a DAG structure
  - Xi: a multinomial distribution
  - Pi: the parents of Xi
  - A parameter θijk
  - Parameter set θ
- Other assumptions: parameter independence, Dirichlet priors, …

6 Fixed network structures
- With the network parameters θ fixed
- Bayesian averaging over the parameters with conjugate priors
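As a minimal sketch of what averaging over the parameters with a conjugate prior amounts to for a single multinomial variable: under a Dirichlet prior, the posterior-mean estimate of each parameter is a smoothed count ratio. The function name and the symmetric pseudo-count `alpha` are illustrative assumptions and do not come from the slides.

```python
from collections import Counter

def dirichlet_posterior_mean(observations, states, alpha=1.0):
    """Posterior-mean multinomial parameters under a symmetric Dirichlet prior.

    alpha is an assumed per-state pseudo-count (a hypothetical choice).
    """
    counts = Counter(observations)
    total = len(observations) + alpha * len(states)
    # (prior pseudo-count + observed count) / (total pseudo-count + sample size)
    return {s: (counts[s] + alpha) / total for s in states}

theta = dirichlet_posterior_mean(["a", "a", "b"], states=["a", "b", "c"])
```

In a network, the same estimate would be computed separately for every configuration of a node's parents.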

7 Averaging with a fixed ordering (1)
- For a structural feature, e.g. XL → XM
  - The posterior probability P(XL → XM|D)
- The structure modularity
- The marginal likelihood (decomposable)

8 Averaging with a fixed ordering (2)
Then the posterior probability of a structural feature can be represented as […]
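A minimal sketch of why structure modularity and a decomposable marginal likelihood help under a fixed ordering: when each structure's score is a product of per-node terms and each node's parent set can be chosen independently, the sum over all structures equals a product of per-node sums. The scores below are made-up numbers, and the function names are hypothetical.

```python
from itertools import product
from math import prod, isclose

# Hypothetical per-node scores, one entry per candidate parent set.
# In the slides' setting each entry would combine a structure prior term
# with a family marginal likelihood; these values are for illustration only.
family_scores = {
    "X1": [0.7],                 # first in the ordering: only the empty parent set
    "X2": [0.2, 0.5],            # parent sets {} and {X1}
    "X3": [0.1, 0.3, 0.4, 0.6],  # parent sets {}, {X1}, {X2}, {X1, X2}
}

def sum_over_structures(scores):
    """Brute force: enumerate every structure (one parent set per node)."""
    return sum(prod(choice) for choice in product(*scores.values()))

def product_of_sums(scores):
    """Factorized form: sum each node's scores, then multiply the sums."""
    return prod(sum(node_scores) for node_scores in scores.values())

brute = sum_over_structures(family_scores)
fast = product_of_sums(family_scores)
```

The brute-force version visits 1 × 2 × 4 = 8 structures here, while the factorized version only touches the 7 individual scores; the gap widens quickly as nodes are added.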

9 Averaging with a fixed ordering (3)
- Enumerating the possible parents of Xi given a partial ordering:
  - π: <{X1, X3}, {X2, X4}>, k = 2.
  - P20 = ∅, P21 = {X1}, P22 = {X3}, P23 = {X1, X3}.
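The enumeration in this example can be sketched as follows. The function name and the representation of π as a list of blocks are assumptions for illustration; the reading used here is that nodes in earlier blocks may be parents of nodes in later blocks.

```python
from itertools import combinations

def candidate_parent_sets(node, partial_order, k):
    """List parent sets of `node` consistent with a partial ordering and in-degree <= k.

    partial_order is a list of blocks (sets of node names); only nodes in
    strictly earlier blocks are considered as possible parents.
    """
    block_index = next(i for i, block in enumerate(partial_order) if node in block)
    predecessors = sorted(n for block in partial_order[:block_index] for n in block)
    sets = []
    # Enumerate subsets of the predecessors by increasing size, up to k.
    for size in range(min(k, len(predecessors)) + 1):
        sets.extend(frozenset(c) for c in combinations(predecessors, size))
    return sets

pi = [{"X1", "X3"}, {"X2", "X4"}]
parents_of_x2 = candidate_parent_sets("X2", pi, k=2)
```

For X2 this yields the four sets listed on the slide: the empty set, {X1}, {X3}, and {X1, X3}.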

10 Averaging with a fixed ordering (4)

11 Averaging with a fixed ordering (5)

12 Averaging with a fixed ordering (6)
- Dynamic programming solution
- Finally, […]

13 Model averaging for predictions
- The probability of a new example can be calculated in the same way as the probability of a structural feature.
- Hence, […]
- The parameter value θijk is used in place of the Kronecker delta function.
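One way to read this, as a hedged sketch rather than the slides' exact formula: for each node, the parameter estimate under every candidate parent set is weighted by that parent set's score, and the per-node weighted averages are multiplied together. The function name, the data layout, and all numbers below are hypothetical.

```python
from math import prod, isclose

def averaged_prediction(per_node_terms):
    """Model-averaged probability of one fully observed example.

    per_node_terms: for each node, a list of (weight, theta) pairs, one per
    candidate parent set. `weight` stands in for f(Xi, Pi^v | D), and `theta`
    is the parameter estimate for the example's values under that parent set.
    """
    return prod(
        sum(w * t for w, t in terms) / sum(w for w, _ in terms)
        for terms in per_node_terms
    )

example_terms = [
    [(1.0, 0.6)],              # node with a single candidate parent set
    [(3.0, 0.5), (1.0, 0.9)],  # node with two candidate parent sets
]
p = averaged_prediction(example_terms)
```

Dividing each node's weighted sum by its total weight normalizes the result, so a node with a single candidate parent set simply contributes its own parameter value.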

14 Approximation on the model averaging
- The time bound is still severe even for moderate cases (k = 3 or 4).
- One approximation
  - Order the set of possible parents for Xi based on the function f(Xi, Piν|D) and prune them.
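A minimal sketch of the order-and-prune step for a single node. The slide does not state the cutoff rule, so keeping a fixed number of top-scoring parent sets is an assumption here, as are the function name and the scores.

```python
def prune_parent_sets(scored_sets, keep):
    """Keep the `keep` highest-scoring candidate parent sets for one node.

    scored_sets: list of (parent_set, score) pairs, where score stands in for
    f(Xi, Pi^v | D). The top-n cutoff is an assumed rule for illustration.
    """
    ranked = sorted(scored_sets, key=lambda pair: pair[1], reverse=True)
    return ranked[:keep]

# Hypothetical scores for the four candidate parent sets of one node.
scored = [
    (frozenset(), 0.05),
    (frozenset({"X1"}), 0.60),
    (frozenset({"X3"}), 0.10),
    (frozenset({"X1", "X3"}), 0.25),
]
kept = prune_parent_sets(scored, keep=2)
```

With these numbers the two retained sets are {X1} and {X1, X3}, and the later summations would run over those two instead of all four.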

15 Experimental evaluation (1)
- Performance metric: δ = (R1 - R2) / (T - R2)
- Synthetic data sets
- Comparisons between exact averaging and approximation

16 Experimental evaluation (2)
Approximate model averaging vs. greedy thick-thin search

17 Experimental evaluation (3)
- Synthetic data from the ALARM network
- AMA (approximate model averaging) vs. GTT (greedy thick-thin search)

18 Experimental evaluation (4)
Real classification data sets from the UCI repository

19 Discussion
- Approximate model averaging outperforms a single BN classifier.
- Simplicity of the implementation.
- Future work
  - Find a better method for optimizing the ordering.
  - Applications to real-world problems.
  - Relax the assumption of complete data.

