Published by Sherman Morris. Modified over 9 years ago.
1
Intelligent Database Systems Lab
國立雲林科技大學 National Yunlin University of Science and Technology
General statistical inference for discrete and mixed spaces by an approximate application of the maximum entropy principle
Authors: Lian Yan and David J. Miller
IEEE Transactions on Neural Networks, Vol. 11, No. 3, May 2000
Advisor: Dr. Hsu
Graduate: Keng-Wei Chang
2
Outline
- Motivation
- Objective
- Introduction
- Maximum Entropy Joint PMF
- Extensions for More General Inference Problems
- Experimental Results
- Conclusions and Possible Extensions
N.Y.U.S.T. I.M.
3
Motivation
- The maximum entropy (ME) joint probability mass function (pmf) is powerful and does not require an explicit expression of conditional independence.
- However, the huge learning complexity has severely limited the use of this approach.
4
Objective
- Propose an ME approach whose learning is quite tractable.
- Extend the approach to mixed (discrete and continuous) data.
5
1. Introduction
- probability mass function (pmf)
- Given the joint pmf, one can compute a posteriori probabilities for:
  - a single, fixed feature given knowledge of the remaining feature values: statistical classification
  - the same feature when some feature values are missing: statistical classification with missing features
  - any (e.g., user-specified) discrete feature dimensions given values for the other features: generalized classification
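The three inference tasks listed above can be illustrated with a minimal sketch. The toy joint pmf, the feature indices, and the helper name `posterior` are all assumptions made for illustration; they do not come from the paper, and a brute-force table like this is only workable for tiny feature spaces.

```python
# Hypothetical toy joint pmf over three binary features (F1, F2, F3).
# The numbers are illustrative only; they are not taken from the paper.
joint = {
    (0, 0, 0): 0.20, (0, 0, 1): 0.05,
    (0, 1, 0): 0.10, (0, 1, 1): 0.15,
    (1, 0, 0): 0.05, (1, 0, 1): 0.10,
    (1, 1, 0): 0.05, (1, 1, 1): 0.30,
}

def posterior(joint, targets, evidence):
    """Return P[targets | evidence] as a dict keyed by target value tuples.

    targets  -- list of feature indices to infer
    evidence -- dict {feature index: observed value}; features that are in
                neither targets nor evidence are treated as missing and are
                summed out.
    """
    scores = {}
    for f, p in joint.items():
        if all(f[i] == v for i, v in evidence.items()):
            key = tuple(f[i] for i in targets)
            scores[key] = scores.get(key, 0.0) + p
    total = sum(scores.values())
    return {k: s / total for k, s in scores.items()}

# Statistical classification: infer F3 from F1 and F2.
p_class = posterior(joint, [2], {0: 1, 1: 1})
# Classification with a missing feature: F2 is unobserved.
p_missing = posterior(joint, [2], {0: 1})
# Generalized classification: infer the pair (F1, F2) from F3.
p_general = posterior(joint, [0, 1], {2: 1})
```

The same function serves all three cases; only the choice of target and evidence indices changes, which is the sense in which one joint pmf acts as a general inference engine.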
6
1. Introduction
- Multiple Networks Approach
- Bayesian Networks
- Maximum Entropy Models
- Advantages of the Proposed ME Method over BNs
7
1.1 Multiple Networks Approach
- Candidate models: multilayer perceptrons (MLPs), radial basis functions, support vector machines.
- One would train one network for each feature.
- Example: classification of documents into multiple topics, where one network makes an individual yes/no decision for the presence of each possible topic.
- This is the multiple networks approach.
8
1.1 Multiple Networks Approach
- Several potential difficulties:
  - increased learning and storage complexity
  - accuracy of inferences: the approach ignores dependencies between features
- Example: separate networks predict F_1 = 1 and F_2 = 1, respectively, but the joint event (F_1 = 1, F_2 = 1) has zero probability.
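The inconsistency in the example above can be reproduced with a small sketch. The joint pmf below is hypothetical, and each "network" is idealized as a predictor that returns the most probable value of its own feature; three-valued features are assumed so that the effect shows up even with perfect per-feature predictors.

```python
# Hypothetical joint pmf over two three-valued features (F1, F2).
# Configurations not listed have probability zero, including (1, 1).
joint = {
    (0, 0): 0.1, (0, 1): 0.2,
    (1, 0): 0.2, (1, 2): 0.2,
    (2, 1): 0.2, (2, 2): 0.1,
}

def marginal(joint, i):
    """Marginal pmf of feature i, obtained by summing out the other feature."""
    out = {}
    for f, p in joint.items():
        out[f[i]] = out.get(f[i], 0.0) + p
    return out

def most_probable(pmf):
    """Value with the largest probability."""
    return max(pmf, key=pmf.get)

# Each idealized per-feature predictor looks only at its own marginal.
pred_f1 = most_probable(marginal(joint, 0))
pred_f2 = most_probable(marginal(joint, 1))

# Probability of the combined prediction under the true joint pmf.
joint_prob_of_prediction = joint.get((pred_f1, pred_f2), 0.0)
```

Both predictors pick the value 1, yet the pair (1, 1) never occurs under the joint pmf. A model of the joint pmf would not make this combined prediction.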
9
1.2 Bayesian Networks
- Handle missing features and capture dependencies between the multiple features.
- Represent the joint pmf explicitly as a product of conditional probabilities.
- Versatile tools for inference that have a convenient, informative representation…
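The product form mentioned above is the standard Bayesian network factorization. The slide's own equation is not preserved in this transcript, so the notation below, in particular pa(F_i) for the parents of F_i in the network graph, is an assumption.

```latex
P[F_1 = f_1, \ldots, F_N = f_N]
  \;=\; \prod_{i=1}^{N} P\bigl[F_i = f_i \,\big|\, \mathrm{pa}(F_i)\bigr]
```

Each factor conditions a feature only on its parents, which is where the explicit conditional independence assumptions enter.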
10
1.2 Bayesian Networks
- Several difficulties with BNs:
  - conditional independence relations between features must be specified explicitly
  - learning requires optimizing over the set of possible BN structures
  - sequential, greedy methods may be suboptimal
  - with sequential learning, it is unclear where to stop to avoid overfitting
11
1.3 Maximum Entropy Models
- Cheeseman proposed finding the maximum entropy (ME) joint pmf consistent with arbitrary lower-order probability constraints.
- The approach is powerful, allowing the joint pmf to express general dependencies between features.
12
1.3 Maximum Entropy Models
- Several difficulties with ME:
  - learning (estimating the ME joint pmf) is difficult
  - Ku and Kullback proposed an iterative algorithm that satisfies one constraint at a time, but doing so can cause violation of others
  - they only presented results for dimension N = 4 and J = 2 discrete values per feature
  - Pearl cites complexity as the main barrier to using ME
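The one-constraint-at-a-time behaviour can be seen in a minimal sketch of iterative proportional fitting, a standard algorithm of this kind. Treating it as a stand-in for the Ku and Kullback procedure is an assumption, and the toy size (N = 3, J = 2), the reference pmf, and all helper names are hypothetical.

```python
from itertools import combinations, product

N, J = 3, 2  # hypothetical toy size: 3 features, 2 values each
space = list(product(range(J), repeat=N))

# Illustrative joint pmf, used only to generate mutually consistent
# pairwise target pmfs. The numbers are not from the paper.
reference = dict(zip(space, [0.20, 0.05, 0.10, 0.15, 0.05, 0.10, 0.05, 0.30]))

def pairwise(pmf, k, l):
    """Marginal pmf of the feature pair (F_k, F_l)."""
    out = {}
    for f, p in pmf.items():
        out[(f[k], f[l])] = out.get((f[k], f[l]), 0.0) + p
    return out

pairs = list(combinations(range(N), 2))
targets = {pair: pairwise(reference, *pair) for pair in pairs}

def fit_one(pmf, pair):
    """Rescale pmf so its marginal on `pair` matches the target exactly."""
    k, l = pair
    current = pairwise(pmf, k, l)
    return {f: p * targets[pair][(f[k], f[l])] / current[(f[k], f[l])]
            for f, p in pmf.items()}

def violation(pmf, pair):
    """Largest absolute gap between the model and target pairwise pmfs."""
    model_pair = pairwise(pmf, *pair)
    return max(abs(model_pair[key] - t) for key, t in targets[pair].items())

# One pass over the constraints, starting from the uniform pmf.
model = {f: 1.0 / len(space) for f in space}
for pair in pairs:
    model = fit_one(model, pair)
# The last constraint now holds, but satisfying it disturbed the first one.
first_after_one_sweep = violation(model, pairs[0])
last_after_one_sweep = violation(model, pairs[-1])

# Repeated sweeps drive all violations toward zero in this toy problem.
for _ in range(200):
    for pair in pairs:
        model = fit_one(model, pair)
worst_after_many_sweeps = max(violation(model, pair) for pair in pairs)
```

The table has J^N entries, so this direct form only works for very small N and J, which is consistent with the small problem size reported on the slide.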
13
1.4 Advantages of the Proposed ME Method over BNs
- Our approach does not require explicit conditional independence relations.
- It provides an effective joint optimization learning technique.
14
2. Maximum Entropy Joint PMF
- a random feature vector
- the full discrete feature space
15
2. Maximum Entropy Joint PMF
- The pairwise pmfs constrain the joint pmf to agree with them.
- The ME joint pmf consistent with these pairwise pmfs has the Gibbs form, parameterized by Lagrange multipliers.
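The slide's equation is not preserved in this transcript. For reference, maximizing entropy subject to pairwise marginal constraints yields a pmf of the following general shape; the symbol γ for an individual Lagrange multiplier and the index convention k < l are assumptions and may differ from the paper's notation.

```latex
P[\mathbf{f}] \;=\;
  \frac{\exp\Bigl(\sum_{k<l} \gamma(F_k = f_k,\, F_l = f_l)\Bigr)}
       {\sum_{\mathbf{f}'} \exp\Bigl(\sum_{k<l} \gamma(F_k = f'_k,\, F_l = f'_l)\Bigr)}
```

The denominator sums over every configuration of the full discrete feature space, which is the source of the intractability discussed in the slides that follow.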
16
2. Maximum Entropy Joint PMF
- Each Lagrange multiplier corresponds to an equality constraint on an individual pairwise probability.
- The joint pmf is specified by the set of Lagrange multipliers, Γ.
- Although these probabilities also depend on Γ, they can often be tractably computed.
17
2. Maximum Entropy Joint PMF
- Two major difficulties:
  - the optimization requires a calculation that is intractable
  - the cost D will require marginalizations over the joint pmf, which are also intractable
- These difficulties inspired the approximate ME method.
18
2.1 Review of the ME Formulation for Classification
- the random feature vector
- The joint pmf still has the intractable form (1).
- Classification does not require computing the joint pmf, but rather just the a posteriori probabilities.
- Still not feasible!
19
2.1 Review of the ME Formulation for Classification
- Here we review a tractable, approximate method:
  - Joint PMF Form
  - Support Approximation
  - Lagrangian Formulation
20
2.1.1 Joint PMF Form
- via Bayes rule
- where
21
2.1.2 Support Approximation
- The approximation may have some effect on the accuracy of the learned model, but it will not sacrifice our inference capability.
- The full feature space is replaced by a subset, which makes learning computationally feasible.
- Example: for N = 19, the reduction from 40 billion to 100 is huge.
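The effect of restricting the support can be sketched as follows. Everything here is an assumption made for illustration: the alphabet sizes, the randomly drawn support, the pairwise multiplier function `gamma`, and the choice to normalize a Gibbs-style score over the subset. The point is only that the normalizing sum runs over about 100 vectors rather than over the full space.

```python
import math
import random
from itertools import combinations

# Hypothetical alphabet sizes for N = 19 discrete features (illustrative,
# not the paper's data set).
alphabet_sizes = [4] * 10 + [3] * 9
full_space_size = math.prod(alphabet_sizes)  # number of joint configurations

# Hypothetical support: at most 100 feature vectors standing in for the subset.
random.seed(0)
support = list({tuple(random.randrange(a) for a in alphabet_sizes)
                for _ in range(100)})

def gamma(k, fk, l, fl):
    """Hypothetical pairwise multiplier; any real-valued table would do."""
    return 0.1 if fk == fl else 0.0

def score(f):
    """Sum of pairwise multipliers over all feature pairs of vector f."""
    return sum(gamma(k, f[k], l, f[l])
               for k, l in combinations(range(len(f)), 2))

# Normalize over the support only, never over the full feature space.
z = sum(math.exp(score(f)) for f in support)
reduced_pmf = {f: math.exp(score(f)) / z for f in support}
```

With these assumed alphabet sizes the full space has roughly twenty billion configurations, so enumerating it for every normalization would be impractical, while the restricted sum is trivial.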
22
2.1.3 Lagrangian Formulation
- i.e.,
- then the joint entropy for
23
2.1.3 Lagrangian Formulation
- suggest the cross entropy
- the cross entropy/Kullback distance
24
2.1.3 Lagrangian Formulation
- for pairwise constraints involving the class label, P[F_k, C]
25
2.1.3 Lagrangian Formulation
- The overall constraint cost D is formed as a sum of all the individual pairwise costs.
- Given D and H, we can form the Lagrangian cost function.
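The two ingredients named above, a constraint cost D summed over feature pairs and a joint entropy H, can be sketched as follows. The direction of the Kullback distance (target pmf first, model pmf second), the toy pmfs, and the function names are assumptions; how the paper combines D and H into the Lagrangian is not reproduced here.

```python
import math
from itertools import combinations, product

def pairwise(pmf, k, l):
    """Marginal pmf of the feature pair (F_k, F_l)."""
    out = {}
    for f, p in pmf.items():
        out[(f[k], f[l])] = out.get((f[k], f[l]), 0.0) + p
    return out

def kl_distance(p, q):
    """Kullback distance sum_x p(x) log(p(x) / q(x)); zero-mass terms skipped."""
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)

def entropy(pmf):
    """Joint entropy H = -sum_f P[f] log P[f]."""
    return -sum(p * math.log(p) for p in pmf.values() if p > 0)

def constraint_cost(target, model, n):
    """D: sum of the individual pairwise costs over all feature pairs."""
    return sum(kl_distance(pairwise(target, k, l), pairwise(model, k, l))
               for k, l in combinations(range(n), 2))

# Hypothetical toy pmfs over three binary features (illustrative numbers).
space = list(product(range(2), repeat=3))
target = dict(zip(space, [0.20, 0.05, 0.10, 0.15, 0.05, 0.10, 0.05, 0.30]))
uniform = {f: 1.0 / len(space) for f in space}

D_uniform = constraint_cost(target, uniform, 3)  # uniform model misses the targets
D_self = constraint_cost(target, target, 3)      # a perfect match costs nothing
H_uniform = entropy(uniform)                     # the largest possible entropy
```

The uniform pmf has the highest entropy but a positive constraint cost, which shows the tension that a combined cost function has to resolve.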
26
3. Extensions for More General Inference Problems
- General Statistical Inference
  - Joint PMF Representation
  - Support Approximation
  - Lagrangian Formulation
- Discussion
- Mixed Discrete and Continuous Feature Space
27
3.1.1 Joint PMF Representation
- the a posteriori probabilities have
28
3.1.1 Joint PMF Representation
- with respect to each feature F_i, the joint pmf as
29
3.1.2 Support Approximation
- reduced joint pmf for
- if there is a set
30
3.1.3 Lagrangian Formulation
- the joint entropy H can be written
31
3.1.3 Lagrangian Formulation
- the pairwise pmf P_M[F_k, F_l] can be calculated in two different ways
- and
32
3.1.3 Lagrangian Formulation
- overall constraint cost D
33
3.1.3 Lagrangian Formulation
36
3.2 Discussion
- Choice of Constraints: encode all probabilities of second order
- Tractability of Learning
- Qualitative Comparison of Methods
37
3.3 Mixed Discrete and Continuous Feature Space
- the feature vector will be written
- our objective is to learn
38
3.3 Mixed Discrete and Continuous Feature Space
- given our choice of constraints, these probabilities
- decompose the joint density as
39
3.3 Mixed Discrete and Continuous Feature Space
- a conditional mean constraint on A_i given C = c
- a pair of continuous features A_i, A_j
40
4. Experimental Results
- Evaluation of generalized classification performance on data sets used solely for classification: Mushroom, Congress, Nursery, Zoo, Hepatitis
- Generalized classification performance on data sets that indicate multiple possible class features: Solar Flare, Flag, Horse Colic
- Classification performance on data sets with mixed continuous and discrete features: Credit Approval, Hepatitis, Horse Colic
41
4. Experimental Results
- The ME method was compared with:
  - BN
  - decision trees (DT)
  - a powerful extension of DT: mixtures of DT
  - multilayer perceptrons (MLP)
42
4. Experimental Results
- For an arbitrary feature to be inferred, F_i, the method computes the a posteriori probabilities.
43
4. Experimental Results
- The following criteria are used to evaluate all the methods:
  (1) misclassification rate on the test set for the data set's class label
  (2) the same as (1), with a single feature missing at random
  (3) average misclassification rate on the test set
  (4) misclassification rate on the test set, based on predicting a pair of randomly chosen features
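Criteria (1) and (3) can be sketched as below. The test vectors, the stand-in predictor `toy_predict`, and the reading of criterion (3) as an average over every feature taken in turn as the target are assumptions; the slides do not spell out the averaging.

```python
def misclassification_rate(predict, test_set, target):
    """Fraction of test vectors whose `target` feature is predicted wrongly."""
    errors = 0
    for x in test_set:
        hidden = list(x)
        hidden[target] = None  # the predictor must not see the true value
        if predict(hidden, target) != x[target]:
            errors += 1
    return errors / len(test_set)

def toy_predict(x, target):
    """Hypothetical stand-in for a trained model: copy the next feature."""
    return x[(target + 1) % len(x)]

# Hypothetical test set of three binary features; the last one plays the
# role of the class label.
test_set = [(0, 0, 1), (1, 1, 1), (0, 1, 0), (1, 1, 0)]
class_index = 2

# Criterion (1): error rate on the class label.
rate_class = misclassification_rate(toy_predict, test_set, class_index)

# Criterion (3), under the assumed reading: average over all target features.
rates = [misclassification_rate(toy_predict, test_set, t) for t in range(3)]
average_rate = sum(rates) / len(rates)
```

Any model that returns a value for a hidden feature can be passed in place of `toy_predict`, so the same harness applies to each of the compared methods.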
46
4. Experimental Results
47
5. Conclusions and Possible Extensions
- Regression
- Large-Scale Problems
- Model Selection: Searching for ME Constraints
- Applications
48
Personal opinion …