1
Maximum Entropy Model & Generalized Iterative Scaling
Arindam Bose
CS 621 – Artificial Intelligence
27th August, 2007
2
Statistical Classification
In statistical classification problems, the task is to estimate the probability of class "a" occurring with context "b", i.e. P(a,b).
The context depends on the nature of the task. E.g., in NLP tasks, the context may consist of several words and their associated syntactic labels.
3
Training Data
We gather information from training data.
A large training set will contain some information about the co-occurrence of a's and b's, but that information is never enough to completely specify P(a,b) for all possible (a,b) pairs.
4
Task Formulation
Find a method that uses the sparse evidence about the a's and b's to reliably estimate a probability model P(a,b).
Principle of Maximum Entropy: the correct distribution P(a,b) is the one that maximizes entropy, or "uncertainty", subject to constraints.
The constraints represent the evidence; the resulting optimization problem is sketched below.
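As a sketch, the principle can be written as a constrained optimization problem. This is the standard formulation, with binary features $g_i$ (introduced on a later slide) encoding the evidence; it is not copied from the slides' own equations:

```latex
% Maximize entropy H(P) subject to the evidence (feature-expectation) constraints.
\begin{aligned}
P^{*} \;=\; \arg\max_{P}\; & -\sum_{a,b} P(a,b)\,\log P(a,b) \\
\text{subject to}\quad
  & \sum_{a,b} P(a,b)\, g_i(a,b) \;=\; \sum_{a,b} \tilde{P}(a,b)\, g_i(a,b),
    \qquad i = 1,\dots,k, \\
  & \sum_{a,b} P(a,b) \;=\; 1 .
\end{aligned}
```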
5
The Philosophy
Make inferences on the basis of partial information without biasing the assignment, which would amount to arbitrary assumptions about information we do not have.
Maximize entropy while remaining consistent with the evidence.
6
Representing Evidence
Encode useful facts as features and impose constraints on the expectations of these features.
A feature is a binary-valued function, e.g.
  g(f,h) = 1 if current_token_capitalized(h) = true and f = location_start
         = 0 otherwise
Given k features, the constraints have the form $E_{P}[g_i] = E_{\tilde{P}}[g_i]$, $i = 1, \dots, k$, i.e. the model's expectation of each feature should match the observed (empirical) expectation. A small sketch of such a feature follows.
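A minimal Python sketch of the feature above, keeping the slide's g(f,h) argument order. The predicate name current_token_capitalized and the label location_start come from the slide; the dict representation of the context h is an assumption for illustration:

```python
# A binary feature g(f, h): it fires only when the current token in the
# context h is capitalized AND the proposed class f is "location_start".
def current_token_capitalized(h):
    # h is assumed to be a dict holding the current token under "token".
    return h["token"][:1].isupper()

def g_capitalized_location(f, h):
    return 1 if current_token_capitalized(h) and f == "location_start" else 0

# Example usage:
h = {"token": "Chicago"}
print(g_capitalized_location("location_start", h))  # 1
print(g_capitalized_location("other", h))           # 0
```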
7
Maximum Entropy Model
The maximum entropy solution allows computation of P(f|h) for any f (a possible future/class) given any h (a possible history/context).
The "history" or "context" is the conditioning data that enables the decision.
8
The Model
In the model produced by M.E. estimation, every feature $g_i$ has an associated parameter $\alpha_i$.
The conditional probability is calculated as
  $P(f|h) = \frac{1}{Z(h)} \prod_{i=1}^{k} \alpha_i^{\,g_i(h,f)}$, where $Z(h) = \sum_{f} \prod_{i=1}^{k} \alpha_i^{\,g_i(h,f)}$.
The M.E. estimation technique guarantees that for every feature $g_i$, the expected value of $g_i$ according to the M.E. model will equal the empirical expectation of $g_i$ in the training corpus. A sketch of this computation follows.
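A small sketch of how $P(f|h)$ could be evaluated once the parameters are known, assuming features is a list of callables g(h, f) returning 0 or 1 and classes is the set of possible futures; both are hypothetical placeholders:

```python
from math import prod  # Python 3.8+

def cond_prob(f, h, alphas, features, classes):
    """P(f|h) = (1/Z(h)) * prod_i alpha_i ** g_i(h, f)."""
    def unnormalized(fut):
        return prod(a ** g(h, fut) for a, g in zip(alphas, features))
    z = sum(unnormalized(fut) for fut in classes)  # Z(h): normalizing constant
    return unnormalized(f) / z
```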
9
Generalized Iterative Scaling
Generalized Iterative Scaling (GIS) finds the parameters $\alpha_i$ of the distribution P.
GIS requires the constraint that $\sum_{i=1}^{k} g_i(h,f) = C$ for all (h,f), for some constant C.
If this does not hold, choose $C = \max_{h,f} \sum_{i=1}^{k} g_i(h,f)$ and add a correctional feature $g_{k+1}$ such that $g_{k+1}(h,f) = C - \sum_{i=1}^{k} g_i(h,f)$. A sketch of this construction follows.
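A sketch of how C and the correctional feature could be built from a list of feature functions; the helper name add_correction_feature and the pairs argument are assumptions for illustration:

```python
def add_correction_feature(features, pairs):
    """Ensure feature counts sum to a constant C on every (h, f) pair.

    `pairs` is the collection of (h, f) combinations the model will see;
    C is chosen as the maximum total feature count over those pairs.
    """
    C = max(sum(g(h, f) for g in features) for h, f in pairs)

    def g_correction(h, f):
        # Fills the gap so that all features (including this one) sum to C.
        return C - sum(g(h, f) for g in features)

    return features + [g_correction], C
```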
10
The GIS Procedure
Initialize $\alpha_i^{(0)} = 1$ and repeatedly update
  $\alpha_i^{(n+1)} = \alpha_i^{(n)} \left( \frac{E_{\tilde{P}}[g_i]}{E_{P^{(n)}}[g_i]} \right)^{1/C}$.
It can be proven that the probability sequence whose parameters are defined by this procedure converges to a unique and positive solution. A minimal sketch of the loop follows.
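A compact sketch of the GIS loop under the update rule above. The expectation inputs are assumed to be supplied by the caller (empirical expectations computed once from the data, model expectations recomputed each iteration):

```python
def gis(empirical, model_expectations, C, n_features, iterations=100):
    """Generalized Iterative Scaling (a sketch).

    empirical[i] holds E_P~[g_i], computed once from the training data;
    model_expectations(alphas) returns the list of E_P[g_i] under the
    current parameters alphas.
    """
    alphas = [1.0] * n_features                # alpha_i^(0) = 1
    for _ in range(iterations):
        expected = model_expectations(alphas)  # E_P[g_i] for the current model
        alphas = [
            a * (empirical[i] / expected[i]) ** (1.0 / C)
            for i, a in enumerate(alphas)
        ]
    return alphas
```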
11
Computation
Each iteration requires computation of $E_{\tilde{P}}[g_i]$ and $E_{P}[g_i]$.
Given the training sample $(h_1,f_1), \dots, (h_N,f_N)$, calculation of $E_{\tilde{P}}[g_i] = \frac{1}{N}\sum_{j=1}^{N} g_i(h_j,f_j)$ is straightforward (a sketch follows).
The computation of the model's feature expectation $E_{P}[g_i]$ can be intractable.
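A sketch of the straightforward part, the empirical expectations; sample is assumed to be a list of (h, f) pairs and features a list of callables g(h, f):

```python
def empirical_expectations(features, sample):
    """E_P~[g_i] = (1/N) * sum_j g_i(h_j, f_j) over the training sample."""
    N = len(sample)
    return [sum(g(h, f) for h, f in sample) / N for g in features]
```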
12
Computation of $E_{P}[g_i]$
We have $E_{P}[g_i] = \sum_{h,f} P(h,f)\, g_i(h,f)$, and by Bayes' rule, $P(h,f) = P(h)\, P(f|h)$.
We use an approximation, $E_{P}[g_i] \approx \frac{1}{N}\sum_{j=1}^{N} \sum_{f} P(f|h_j)\, g_i(h_j,f)$, summing over all the histories in the training sample, and not over every possible history. A sketch of this approximation follows.
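A sketch of the approximate model expectation; cond_prob is assumed to be a callable (f, h) -> P(f|h), such as the one sketched earlier, and classes the set of possible futures:

```python
def approx_model_expectations(features, sample, classes, cond_prob):
    """E_P[g_i] ~= (1/N) * sum_j sum_f P(f|h_j) * g_i(h_j, f).

    Sums only over histories h_j seen in the training sample, using the
    empirical distribution of histories in place of P(h).
    """
    N = len(sample)
    return [
        sum(cond_prob(f, h) * g(h, f) for h, _ in sample for f in classes) / N
        for g in features
    ]
```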
13
Termination and Running Time
Terminate after a fixed number of iterations (e.g. 100) or when the change in log likelihood becomes negligible.
The running time of each iteration is dominated by the computation of the model expectations $E_{P}[g_i]$, which is O(NPA), with N the training-set size, P the number of possible futures, and A the average number of active features per event. A sketch of a log-likelihood convergence check follows.
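A small sketch of the "negligible change in log likelihood" stopping test; the function names and the tolerance value are assumptions for illustration:

```python
from math import log

def log_likelihood(sample, cond_prob):
    """Training-data log likelihood, tracked across GIS iterations."""
    return sum(log(cond_prob(f, h)) for h, f in sample)

def converged(prev_ll, curr_ll, tol=1e-6):
    # Stop when the improvement in log likelihood falls below tol.
    return abs(curr_ll - prev_ll) < tol
```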