Model Averaging with Discrete Bayesian Network Classifiers

Denver Dash and Gregory F. Cooper. In Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics (AISTATS 2003).

(c) 2003 SNU CSE Biointelligence Lab
Contents
- Model averaging over a class of discrete Bayesian network classifiers
  - Defined by a partial ordering and a bounded in-degree k.
- Theoretical results (for N nodes)
  - The class has at least […] distinct structures.
  - The summation can be performed in […] time.
  - Approximate averaging in O(N) time.
- Experiments
  - The technique can be beneficial even when the generating distribution is not a member of the class.
  - Performance is characterized over several parameters.

Bayesian network classifiers
Naïve Bayes classifier (figure: nodes C, F1, F2, …, FN)
General Bayesian network classifiers (figure: nodes F1, C, F2, …, FN)
- Optimal in zero-one loss.
- Poor generalization performance could be improved by Bayesian model averaging, but the space of network structures is super-exponential.

In this paper
- Bayesian model averaging (MA) over a restricted class of Bayesian network classifiers
  - Defined by a partial order (π) and a bounded in-degree (k).
- Contributions
  - Adapt the factorization of the conditionals to the task of classification.
  - Show that MA over this class can be approximated by a single network S*, allowing calculation in O(N) time.
  - Empirical evaluation of the method compared with
    - A single naïve Bayes classifier
    - A single Bayesian network learned by a greedy search
    - Exact MA on naïve Bayes classifiers.

Notations
- The classification problem
  - A set of features F = {F1, F2, …, FN}.
  - X0 = C, X1 = F1, …, XN = FN, giving the node set X of the Bayesian network.
  - A set of classes C = {C1, C2, …, C_{N_C}}.
  - A database D = {D1, D2, …, DR}.
- A Bayesian network G(X): a DAG structure
  - Xi: a node with a multinomial distribution
  - Πi: the parents of Xi
  - A parameter θijk: the probability that Xi takes its k-th value given the j-th configuration of Πi
  - Parameter set θ
- Other assumptions: parameter independence, Dirichlet priors, …

Fixed network structures
- With fixed network parameters θ
- Bayesian averaging over the parameters with conjugate priors
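
As a minimal sketch of the conjugate-prior case, assuming a Dirichlet prior with hyperparameters alpha_ijk and complete data, averaging over the parameters of one node gives the posterior mean (alpha_ijk + N_ijk) / (alpha_ij + N_ij), where N_ijk counts the records with Xi in its k-th value and its parents in their j-th configuration. The function name and the nested-list layout are illustrative choices, not taken from the paper.

```python
def posterior_mean_params(counts, alpha):
    """Posterior-mean parameters for a single node under a Dirichlet prior.

    counts[j][k]: records with parent configuration j and node value k.
    alpha[j][k]:  Dirichlet hyperparameters (hypothetical layout).
    Returns theta_hat[j][k] = (alpha + count) / (row total of alpha + count).
    """
    theta_hat = []
    for n_row, a_row in zip(counts, alpha):
        total = sum(n_row) + sum(a_row)
        theta_hat.append([(n + a) / total for n, a in zip(n_row, a_row)])
    return theta_hat


# Toy example: a binary node with one binary parent and a uniform prior.
counts = [[3, 1], [0, 4]]
alpha = [[1, 1], [1, 1]]
theta_hat = posterior_mean_params(counts, alpha)
```

Each row of `theta_hat` is a proper distribution over the node's values, and a zero count never produces a zero probability because the prior contributes pseudo-counts.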

Averaging with a fixed ordering (1)
- For a structural feature, e.g. XL → XM
  - The posterior probability P(XL → XM | D)
- Structure modularity
- The marginal likelihood (decomposable)
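
The reason modularity and decomposability matter can be sketched in a few lines: when the score of a structure is a product of per-node terms and each node picks its parents independently from its predecessors, the sum over all structures equals a product of per-node sums. The sketch below checks this on a toy case. It assumes a total ordering for simplicity (the slides allow a partial ordering), and `toy_score` is a hypothetical stand-in for the prior times the local marginal likelihood.

```python
import itertools
import math


def parent_sets(i, order, k):
    """Yield every subset of the predecessors of order[i] with at most k nodes."""
    preds = order[:i]
    for size in range(min(k, len(preds)) + 1):
        yield from itertools.combinations(preds, size)


def sum_factorized(order, k, local_score):
    """Product over nodes of the summed local scores of their allowed parent sets."""
    total = 1.0
    for i, node in enumerate(order):
        total *= sum(local_score(node, ps) for ps in parent_sets(i, order, k))
    return total


def sum_brute_force(order, k, local_score):
    """Explicit sum over every structure consistent with the ordering."""
    choices = [list(parent_sets(i, order, k)) for i in range(len(order))]
    total = 0.0
    for structure in itertools.product(*choices):
        total += math.prod(local_score(n, ps) for n, ps in zip(order, structure))
    return total


def toy_score(node, parents):
    # Hypothetical local score; any positive per-node function works here.
    return 1.0 / (1 + len(parents) + node)


order = [0, 1, 2, 3]
factorized = sum_factorized(order, 2, toy_score)
brute = sum_brute_force(order, 2, toy_score)
```

The two results agree, but the factorized version only touches each node's parent sets once, while the brute-force loop visits every combination of them.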

Model averaging for predictions
- The probability of a new example can be calculated in much the same way as the posterior probability of a structural feature.
- Hence, the parameter value θijk is used in place of the Kronecker delta function.
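
In the notation of these slides, the averaged prediction can be sketched as below. This is a reconstruction under stated assumptions rather than a formula quoted from the paper: f(X_i, Π_i^ν | D) is taken to be the per-node term (structure prior times local marginal likelihood) for the ν-th candidate parent set, the posterior-mean parameter is the one selected by the values that the example x assigns to X_i and Π_i^ν, and κ is a normalizing constant.

```latex
P(\mathbf{x} \mid D)
  \;=\; \kappa \prod_{i=0}^{N} \sum_{\nu} \hat{\theta}^{\,\nu}_{ijk}\, f(X_i, \Pi_i^{\nu} \mid D),
\qquad
\kappa^{-1} \;=\; \prod_{i=0}^{N} \sum_{\nu} f(X_i, \Pi_i^{\nu} \mid D)
```

Replacing each parameter by an indicator that a given edge belongs to the ν-th parent set turns the same expression into the posterior probability of that edge.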

Approximation on the model averaging
- The time bound is still severe even for moderate cases (k = 3 or 4).
- One approximation
  - Order the possible parent sets Πi^ν of Xi by the function f(Xi, Πi^ν | D) and prune them.
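
A minimal sketch of the pruning idea, assuming that "prune" means keeping only the highest-scoring parent sets for each node. The cutoff `keep`, the function names, and `toy_score` (a stand-in for the scoring function f) are hypothetical.

```python
import itertools


def top_parent_sets(node, candidates, k, score, keep):
    """Rank all parent sets of size <= k by score and return the best `keep`."""
    sets = [
        ps
        for size in range(min(k, len(candidates)) + 1)
        for ps in itertools.combinations(candidates, size)
    ]
    sets.sort(key=lambda ps: score(node, ps), reverse=True)
    return sets[:keep]


def toy_score(node, parents):
    # Hypothetical score: favors sets containing node 0 and penalizes size.
    return (2.0 if 0 in parents else 1.0) / (1 + len(parents))


best = top_parent_sets(node=3, candidates=[0, 1, 2], k=2, score=toy_score, keep=3)
```

Averaging then runs over the retained sets only, so the per-node cost depends on `keep` rather than on the number of subsets of size at most k.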

Experimental evaluation (1)
- Performance metric: δ = (R1 - R2) / (T - R2)
- Synthetic data sets
- Comparisons between exact averaging and approximation
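
As a small worked example, assuming the metric is meant to be grouped as (R1 - R2) / (T - R2), δ measures how much of the gap between R2 and T is closed by R1. The slide does not define R1, R2, or T, so the sketch treats them as plain numeric scores; the function name and the sample values are made up for illustration.

```python
def relative_improvement(r1, r2, t):
    """Return delta = (r1 - r2) / (t - r2) under the assumed grouping."""
    if t == r2:
        raise ValueError("t and r2 must differ, otherwise delta is undefined")
    return (r1 - r2) / (t - r2)


# Made-up scores: r1 closes half of the gap between r2 and t.
delta = relative_improvement(r1=0.75, r2=0.5, t=1.0)
```

Under this reading, δ = 0 means no gain over R2, δ = 1 means R1 reaches T, and negative values mean R1 is worse than R2.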

Discussion
- Approximate model averaging outperforms a single BN classifier.
- Simplicity of the implementation.
- Future work
  - Find a better method for optimizing the ordering.
  - Applications to real-world problems.
  - Relax the assumption of complete data.
