Model Averaging with Discrete Bayesian Network Classifiers
Denver Dash and Gregory F. Cooper
In the Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics (AISTATS 2003)
(c) 2003 SNU CSE Biointelligence Lab

Contents
  Model averaging over a class of discrete Bayesian network classifiers
    The class is defined by a partial ordering and a bounded in-degree k.
  Theoretical results (for N nodes)
    A lower bound on the number of distinct structures in the class.
    A bound on the time needed to perform the summation over the class.
    Approximate averaging in O(N) time.
  Experiments
    The technique can be beneficial even when the generating distribution is not a member of the class.
    Performance is characterized over several parameters.
Bayesian network classifiers
  Naïve Bayes classifier and general Bayesian network classifiers
  [Figures: two networks over the class node C and the features F1, F2, …, FN]
  Optimal in zero-one loss
  Poor generalization performance could be improved by Bayesian model averaging.
  The space of network structures is super-exponential.
In this paper
  Bayesian model averaging over a restricted class of Bayesian network classifiers
    The class is restricted by a partial order (π) and a bounded in-degree (k).
  Contributions
    Apply the factorization of the conditionals to the task of classification.
    Show that model averaging (MA) over this class can be approximated by a calculation on a single network S* in O(N) time.
    Empirical evaluation of the method, compared with:
      a single naïve Bayes classifier
      a single Bayesian network learned by a greedy search
      exact MA on naïve Bayes classifiers
Notations
  The classification problem
    A set of features F = {F1, F2, …, FN}.
    A set of classes C = {C1, C2, …, CNC}.
    A database D = {D1, D2, …, DR}.
    Variables X in the Bayesian network: X0 = C, X1 = F1, …, XN = FN.
  A Bayesian network G(X): a DAG structure
    Xi: a variable with a multinomial distribution
    Pi: the parents of Xi
    A parameter θijk
    Parameter set θ
  Other assumptions: parameter independence, Dirichlet priors, …
Fixed network structures
  With fixed network parameters θ
  Bayesian averaging over the parameters with conjugate priors
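As a rough illustration of averaging over parameters with a conjugate prior: under a Dirichlet prior with hyperparameters α_ijk and observed counts N_ijk, the standard Dirichlet-multinomial result gives the posterior-mean estimate (α_ijk + N_ijk) / (α_ij + N_ij), where α_ij and N_ij are the sums over k. The sketch below assumes that form; the function name and the example counts are hypothetical and not taken from the slides.

```python
def posterior_mean_params(counts, alphas):
    """Posterior-mean parameters for one parent configuration j of X_i.

    counts[k] is N_ijk (observed count of value k),
    alphas[k] is alpha_ijk (Dirichlet hyperparameter for value k).
    """
    if len(counts) != len(alphas):
        raise ValueError("counts and alphas must have the same length")
    total = sum(counts) + sum(alphas)
    return [(n + a) / total for n, a in zip(counts, alphas)]


# Hypothetical binary variable: 3 and 1 observations, uniform prior.
theta = posterior_mean_params([3, 1], [1.0, 1.0])
```

With these made-up counts the estimate for the first value is 4/6 and for the second 2/6, so the prior pulls the raw frequencies 3/4 and 1/4 toward uniform.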
Averaging with a fixed ordering (1)
  For a structural feature, e.g. the edge XL → XM
    The posterior probability P(XL → XM | D)
  Structure modularity
  The marginal likelihood (decomposable)
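Decomposability means the marginal likelihood is a product of one local term per node, so its logarithm is a sum of local scores. The sketch below assumes the standard Bayesian Dirichlet form of the local term, which follows from the Dirichlet-prior and parameter-independence assumptions listed under Notations; the function names and data layout are illustrative only.

```python
from math import lgamma


def log_local_score(counts, alphas):
    """Log of the local marginal-likelihood term for one node X_i.

    counts[j][k] is N_ijk and alphas[j][k] is alpha_ijk, where j indexes
    the parent configurations and k the values of X_i.
    """
    score = 0.0
    for n_j, a_j in zip(counts, alphas):
        # Term for parent configuration j: Gamma(a_ij) / Gamma(a_ij + N_ij)
        score += lgamma(sum(a_j)) - lgamma(sum(a_j) + sum(n_j))
        # Terms for each value k: Gamma(a_ijk + N_ijk) / Gamma(a_ijk)
        for n, a in zip(n_j, a_j):
            score += lgamma(a + n) - lgamma(a)
    return score


def log_marginal_likelihood(per_node):
    """Sum the local log scores; per_node holds one (counts, alphas) pair per node."""
    return sum(log_local_score(c, a) for c, a in per_node)


# Hypothetical single binary node with no parents and one observation.
single = log_local_score([[1, 0]], [[1.0, 1.0]])
```

For the single-observation example the score is log(1/2), which matches the intuition that a uniform prior assigns probability 1/2 to the first observed value.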
Averaging with a fixed ordering (2)
  Then, the posterior probability of a structural feature can be represented as,
Averaging with a fixed ordering (3)
  Enumerating the possible parents of Xi given a partial ordering
    Example: π = <{X1, X3}, {X2, X4}>, k = 2.
    P20 = ∅, P21 = {X1}, P22 = {X3}, P23 = {X1, X3}.
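The enumeration can be sketched in a few lines. This is a minimal sketch, assuming that the candidate parents of a node are the nodes in strictly earlier blocks of the partial ordering (which is consistent with the X2 example on this slide); the function name and the list-of-blocks representation are hypothetical.

```python
from itertools import combinations


def possible_parent_sets(node, ordering, k):
    """List every parent set of `node` with at most k members.

    ordering is a list of blocks (lists of node names). Assumption: only
    nodes in blocks strictly before the block containing `node` may be parents.
    """
    predecessors = []
    for block in ordering:
        if node in block:
            break
        predecessors.extend(block)
    else:
        raise ValueError(f"{node} does not appear in the ordering")
    parent_sets = []
    for size in range(min(k, len(predecessors)) + 1):
        parent_sets.extend(frozenset(c) for c in combinations(predecessors, size))
    return parent_sets


ordering = [["X1", "X3"], ["X2", "X4"]]
parents_of_x2 = possible_parent_sets("X2", ordering, k=2)
```

For X2 this yields the empty set, {X1}, {X3}, and {X1, X3}, the four sets listed on the slide. A node in the first block, such as X1, gets only the empty set.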
Averaging with a fixed ordering (4)
Averaging with a fixed ordering (5)
Averaging with a fixed ordering (6)
  Dynamic programming solution
  Finally,
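The reason a fixed ordering makes the sum over structures tractable can be checked numerically. With an ordering, every combination of per-node parent sets is acyclic, so a sum over all structures of a product of per-node terms equals a product of per-node sums. The sketch below compares the two computations on made-up per-node scores; the function names and the numbers are hypothetical.

```python
from itertools import product
from math import prod


def sum_over_structures_brute_force(scores):
    """Sum the product of per-node scores over every combination of parent sets.

    scores[i] lists one score per possible parent set of node i.
    The number of combinations grows exponentially with the number of nodes.
    """
    return sum(prod(choice) for choice in product(*scores))


def sum_over_structures_factored(scores):
    """Same quantity as a product of per-node sums, linear in the total number of scores."""
    return prod(sum(node_scores) for node_scores in scores)


# Hypothetical scores for three nodes with 2, 3, and 1 possible parent sets.
scores = [[0.2, 0.5], [0.1, 0.3, 0.6], [1.0]]
brute = sum_over_structures_brute_force(scores)
factored = sum_over_structures_factored(scores)
```

Both values agree up to floating-point rounding, while the factored form touches each per-node score only once.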
Model averaging for predictions
  The probability of a new example can be calculated in the same way as the probability of a structural feature.
  Hence, the parameter value θijk is used in place of the Kronecker delta function.
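One way to picture the resulting computation: for each variable, the parameter estimates under the different possible parent sets are averaged with weights f(Xi, Piν | D), and the per-variable averages are multiplied. The sketch below assumes exactly that weighted-average form, with normalization by the sum of the weights; the function names and numbers are hypothetical and the weights are not derived from any data set.

```python
from math import prod


def averaged_node_prob(thetas, weights):
    """Weighted average of the parameter estimates for one variable.

    thetas[v] is the estimate of the observed value under the v-th possible
    parent set; weights[v] is the (unnormalized) weight of that parent set.
    """
    if len(thetas) != len(weights):
        raise ValueError("thetas and weights must have the same length")
    return sum(t * w for t, w in zip(thetas, weights)) / sum(weights)


def averaged_example_prob(nodes):
    """Model-averaged probability of a complete example.

    nodes holds one (thetas, weights) pair per variable.
    """
    return prod(averaged_node_prob(t, w) for t, w in nodes)


# Hypothetical example with two variables.
example = [([0.5], [1.0]), ([0.9, 0.6], [3.0, 1.0])]
p_example = averaged_example_prob(example)
```

For classification, such a value would be computed once per candidate class label and the results normalized over the labels.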
Approximating the model averaging
  The time bound is still severe even for moderate cases (k = 3 or 4).
  One approximation
    Order the set of possible parents for Xi based on the function f(Xi, Piν | D) and prune them.
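The ordering-and-pruning step might look like the sketch below. The slide does not say how the cutoff is chosen, so the fixed count `n_keep` is an assumption, as are the function name and the example scores.

```python
def prune_parent_sets(scored_sets, n_keep):
    """Keep the n_keep parent sets with the largest score.

    scored_sets is a list of (parent_set, f_value) pairs for one variable.
    n_keep is a hypothetical cutoff; the slides do not specify one.
    """
    if n_keep < 1:
        raise ValueError("n_keep must be at least 1")
    ranked = sorted(scored_sets, key=lambda item: item[1], reverse=True)
    return ranked[:n_keep]


# Hypothetical scores for three candidate parent sets of one variable.
candidates = [
    (frozenset({"X1"}), 0.02),
    (frozenset(), 0.5),
    (frozenset({"X1", "X3"}), 0.4),
]
kept = prune_parent_sets(candidates, n_keep=2)
```

Here the low-scoring set {X1} is dropped, and later sums run over two parent sets instead of three.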
Experimental evaluation (1)
  Performance metric: δ = (R1 - R2) / (T - R2)
  Synthetic data sets
    Comparisons between exact averaging and the approximation
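Reading the metric as δ = (R1 - R2) / (T - R2), it expresses how much of the gap between the second method's score R2 and the best attainable score T is closed by the first method's score R1. The slide does not define R1, R2, or T, so this interpretation, the function name, and the numbers below are assumptions.

```python
def relative_improvement(r1, r2, t):
    """Compute (r1 - r2) / (t - r2).

    Assumed reading: r1 and r2 are the scores of two methods and t is the
    maximum attainable score. Undefined when r2 already equals t.
    """
    if t == r2:
        raise ValueError("metric is undefined when r2 equals t")
    return (r1 - r2) / (t - r2)


# Hypothetical scores out of a maximum of 100.
delta = relative_improvement(90, 80, 100)
```

Under this reading, a positive δ favors the first method, a negative δ favors the second, and δ = 1 means the first method reaches the maximum score.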
Experimental evaluation (2)
  Approximate model averaging vs. greedy thick-thin search
Experimental evaluation (3)
  Synthetic data from the ALARM network
  AMA vs. GTT
Experimental evaluation (4)
  Real classification data sets from the UCI repository
Discussion
  Approximate model averaging outperforms a single BN classifier.
  The implementation is simple.
  Future work
    Find a better method for optimizing the ordering.
    Apply the method to real-world problems.
    Relax the assumption of complete data.