1
Model Averaging with Discrete Bayesian Network Classifiers
Denver Dash and Gregory F. Cooper. In Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics (AISTATS 2003).
2
Contents
- Model averaging over a class of discrete Bayesian network classifiers defined by a partial ordering and a bounded in-degree k.
- Theoretical results (for N nodes):
  - The class contains at least exponentially many distinct structures.
  - The exact summation over the class can be performed in time polynomial in N (for fixed k).
  - Approximate averaging is possible in O(N) time.
- Experiments:
  - The technique can be beneficial even when the generating distribution is not a member of the class.
  - Performance is characterized over several parameters.
3
Bayesian network classifiers
Naïve Bayes classifier: the class node C is the sole parent of the features F1, F2, …, FN.
General Bayesian network classifiers: an arbitrary DAG over C, F1, F2, …, FN.
- Classification with the true posterior P(C | F1, …, FN) is optimal under zero-one loss.
- The poor generalization performance of a single learned structure could be improved by Bayesian model averaging, but the space of network structures is super-exponential.
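For reference, the factorization implied by the naïve Bayes structure (a standard identity rather than something transcribed from the slide):

```latex
P(C = c \mid f_1, \dots, f_N) \;\propto\; P(C = c) \prod_{i=1}^{N} P(F_i = f_i \mid C = c)
```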
4
In this paper
- Bayesian model averaging over a restricted class of Bayesian network classifiers, defined by a partial order (π) and a bounded in-degree (k).
- Contributions:
  - A factorization of the conditionals that applies model averaging to the task of classification.
  - A proof that MA over this class can be approximated by a calculation on a single network S* in O(N) time.
  - An empirical evaluation of the method compared with:
    - a single naïve Bayes classifier,
    - a single Bayesian network learned by a greedy search,
    - exact MA over naïve Bayes classifiers.
5
Notation
- The classification problem:
  - A set of features F = {F1, F2, …, FN}.
  - A set of classes C = {C1, C2, …, CNC}.
  - A database D = {D1, D2, …, DR}.
  - In the Bayesian network, X0 = C, X1 = F1, …, XN = FN, collected as X.
- A Bayesian network over X:
  - G(X): a DAG structure.
  - Each Xi: a multinomial distribution.
  - Pi: the parents of Xi.
  - A parameter θijk for each node, parent configuration, and state; the full parameter set is θ.
- Other assumptions: parameter independence, Dirichlet priors, …
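In the usual discrete-network indexing, assumed here as the standard convention, a parameter and its conjugate prior read:

```latex
\theta_{ijk} = P(X_i = k \mid P_i = j), \qquad
(\theta_{ij1}, \dots, \theta_{ijr_i}) \sim \mathrm{Dirichlet}(\alpha_{ij1}, \dots, \alpha_{ijr_i})
```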
6
Fixed network structures
- Classification with the network parameters θ held fixed.
- Bayesian averaging over the parameters with conjugate priors.
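With the structure held fixed, integrating out θ under these Dirichlet priors gives the standard predictive form (the counts N_ijk and hyperparameters α_ijk follow the usual convention and are assumptions here, not the slide's own symbols):

```latex
P(X_i = k \mid P_i = j, D) = \frac{N_{ijk} + \alpha_{ijk}}{N_{ij} + \alpha_{ij}},
\qquad N_{ij} = \sum_k N_{ijk}, \quad \alpha_{ij} = \sum_k \alpha_{ijk}
```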
7
Averaging with a fixed ordering (1)
For a structural feature, e.g. an edge XL → XM, the posterior probability P(XL → XM | D) is obtained from:
- the structure-modular prior, and
- the marginal likelihood (which is decomposable).
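In generic notation (the standard decomposable, structure-modular form; the symbols are assumptions rather than the slide's own), these two properties give:

```latex
P(S \mid D) \;\propto\; \prod_{i=0}^{N} q_i\!\left(P_i^S\right)\,
  \mathrm{score}\!\left(X_i, P_i^S \mid D\right),
\qquad
\mathrm{score}(X_i, P_i \mid D)
  = \prod_{j} \frac{\Gamma(\alpha_{ij})}{\Gamma(\alpha_{ij} + N_{ij})}
    \prod_{k} \frac{\Gamma(\alpha_{ijk} + N_{ijk})}{\Gamma(\alpha_{ijk})}
```

Here q_i(P_i) is the structure-modular prior weight of node X_i taking parent set P_i, and the second expression is the familiar Dirichlet marginal likelihood of one family.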
8
Averaging with a fixed ordering (2)
Then, the posterior probability of a structural feature can be represented as follows.
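Because the parent set of each node can be chosen independently once the ordering is fixed, an edge feature such as X_L → X_M depends only on X_M's family, and its posterior reduces to a ratio of per-node sums (written in the generic notation above, not transcribed from the slide):

```latex
P(X_L \to X_M \mid D)
  = \frac{\sum_{P_M \ni X_L} q_M(P_M)\,\mathrm{score}(X_M, P_M \mid D)}
         {\sum_{P_M} q_M(P_M)\,\mathrm{score}(X_M, P_M \mid D)}
```

Both sums range over parent sets consistent with π and with |P_M| ≤ k.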
9
Averaging with a fixed ordering (3)
Enumerating the possible parent sets of Xi given a partial ordering, e.g. π = <{X1, X3}, {X2, X4}> with k = 2: X2 lies in the second layer, so its candidate parents are drawn from {X1, X3}, giving
P2^0 = ∅, P2^1 = {X1}, P2^2 = {X3}, P2^3 = {X1, X3}.
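A minimal sketch of this enumeration in Python (the function name and the layered-list representation of π are illustrative assumptions, not the paper's notation):

```python
from itertools import combinations

def candidate_parent_sets(layers, node, k):
    """Enumerate the parent sets allowed for `node` under a layered
    partial ordering `layers` (a list of sets of variable names) and an
    in-degree bound k: only variables in strictly earlier layers may be
    parents, and at most k of them."""
    predecessors = []
    for layer in layers:
        if node in layer:
            break
        predecessors.extend(sorted(layer))
    return [set(c) for size in range(k + 1)
            for c in combinations(predecessors, size)]

# The slide's example: pi = <{X1, X3}, {X2, X4}>, k = 2.
print(candidate_parent_sets([{"X1", "X3"}, {"X2", "X4"}], "X2", 2))
# -> [set(), {'X1'}, {'X3'}, {'X1', 'X3'}]
```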
10
Averaging with a fixed ordering (4)
11
Averaging with a fixed ordering (5)
12
Averaging with a fixed ordering (6)
A dynamic programming solution accumulates the per-node sums over candidate parent sets; finally, these are combined into the averaged quantity.
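A small sketch of the per-node building block that such a summation relies on, assuming local log scores have already been computed for each candidate parent set (the data structures and the uniform structure prior are illustrative assumptions):

```python
import math

def parent_set_weights(local_log_scores):
    """Normalize log score(Xi, P | D) over one node's candidate parent
    sets P (keys are frozensets), returning each set's posterior weight
    under a fixed ordering and a uniform structure prior.  Uses the
    log-sum-exp trick for numerical stability."""
    m = max(local_log_scores.values())
    unnorm = {p: math.exp(s - m) for p, s in local_log_scores.items()}
    z = sum(unnorm.values())
    return {p: w / z for p, w in unnorm.items()}

# Illustrative log scores for three candidate parent sets of one node.
weights = parent_set_weights({
    frozenset(): -110.2,
    frozenset({"X1"}): -104.7,
    frozenset({"X1", "X3"}): -105.9,
})
# e.g. the averaged probability that X1 is a parent of this node:
p_x1_parent = sum(w for p, w in weights.items() if "X1" in p)
```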
13
Model averaging for predictions
The probability of a new example can be calculated in the same way as the posterior probability of a structural feature; the parameter value θijk is used in place of the Kronecker delta function.
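A reconstruction of the resulting predictive formula under the same fixed-ordering assumptions (generic notation, not a verbatim transcription of the slide; the posterior-mean parameter θ̂ replaces the Kronecker delta that appears in the structural-feature case):

```latex
P(x \mid D)
  = \prod_{i=0}^{N}
    \frac{\sum_{P_i} q_i(P_i)\,\mathrm{score}(X_i, P_i \mid D)\,
          \hat{\theta}\!\left(x_i \mid \mathrm{pa}_i^{P_i}(x)\right)}
         {\sum_{P_i} q_i(P_i)\,\mathrm{score}(X_i, P_i \mid D)}
```

For classification, x0 is set to each candidate class c in turn and the results are normalized over c.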
14
Approximation on the model averaging
The time bound is still severe even for moderate cases (k = 3 or 4). One approximation: order the candidate parent sets Piν of each Xi by the scoring function f(Xi, Piν | D) and prune the low-scoring ones.
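A sketch of the pruning idea, assuming a local scoring function f(Xi, Piν | D) is available as a Python callable (the function names and the cut-off parameter are illustrative, not the paper's exact rule):

```python
def prune_parent_sets(candidates, local_score, keep=3):
    """Rank one node's candidate parent sets by a local score
    f(Xi, P | D) and keep only the `keep` best, shrinking the per-node
    summation to a constant number of terms."""
    return sorted(candidates, key=local_score, reverse=True)[:keep]
```

If only a constant number of parent sets survives per node, the per-node work becomes constant, which is consistent with the O(N) cost of the approximate averaging claimed earlier.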
15
Experimental evaluation (1)
Performance metric: δ = (R1 − R2) / (T − R2).
Synthetic data sets.
Comparisons between exact averaging and the approximation.
16
Experimental evaluation (2)
Approximate model averaging (AMA) vs. greedy thick-thin (GTT) search.
17
Experimental evaluation (3)
Synthetic data from the ALARM network: AMA vs. GTT.
18
Experimental evaluation (4)
Real classification data sets from the UCI repository.
19
Discussion
- Approximate model averaging outperforms a single BN classifier.
- The implementation is simple.
Future work
- Find a better method for optimizing over the ordering.
- Apply the method to real-world problems.
- Relax the assumption of complete data.