
1 Maximum Entropy Model & Generalized Iterative Scaling
Arindam Bose
CS 621 – Artificial Intelligence
27th August, 2007

2 Statistical Classification
- Statistical classification problems: the task is to find the probability of a class "a" occurring with a context "b", i.e. P(a,b)
- The context depends on the nature of the task; e.g. in NLP tasks, the context may consist of several words and their associated syntactic labels

3 Training Data
- Gather information from the training data
- A large training set will contain some information about the co-occurrence of a's and b's
- The information is never enough to completely specify P(a,b) for all possible (a,b) pairs

4 Task formulation
- Find a method that uses the sparse evidence about the a's and b's to reliably estimate a probability model P(a,b)
- Principle of Maximum Entropy: the correct distribution P(a,b) is the one that maximizes entropy, or "uncertainty", subject to constraints
- The constraints represent the evidence
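In symbols, the principle amounts to the following constrained optimization (a minimal sketch in LaTeX; the feature functions g_i and the empirical distribution p~ used here anticipate the next slides and are not spelled out on this one):

\max_p \; H(p) = -\sum_{a,b} p(a,b)\,\log p(a,b)
\quad \text{subject to} \quad
E_p[g_i] = E_{\tilde p}[g_i] \;\; (i = 1,\dots,k),
\qquad \sum_{a,b} p(a,b) = 1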

5 The Philosophy
- Make inferences on the basis of partial information without biasing the assignment, which would amount to arbitrary assumptions about information that we do not have
- Maximize entropy while remaining consistent with the evidence

6 Representing evidence
- Encode useful facts as features and impose constraints on the expectations of these features
- A feature is a binary-valued function, e.g.
    g(f,h) = 1 if current_token_capitalized(h) = true and f = location_start
           = 0 otherwise
- Given k features, the constraints have the form E_p[g_i] = E_p~[g_i] for i = 1, ..., k, i.e. the model's expectation for each feature should match the observed (empirical) expectation
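A minimal Python sketch of such a feature and of its observed (empirical) expectation over a training sample; the helper names and the toy sample are illustrative assumptions, not from the slides:

def current_token_capitalized(h):
    # h is a small dict describing the context; here we only inspect the current token
    return h["current_token"][:1].isupper()

def g0(f, h):
    # The slide's feature: fires when the current token in h is capitalized
    # and the candidate class f is "location_start"
    return 1 if current_token_capitalized(h) and f == "location_start" else 0

def observed_expectation(g, sample):
    # Empirical expectation of feature g over a training sample of (h, f) pairs
    return sum(g(f, h) for h, f in sample) / len(sample)

sample = [({"current_token": "Paris"}, "location_start"),
          ({"current_token": "went"}, "other")]
print(observed_expectation(g0, sample))   # prints 0.5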

7 Maximum Entropy Model
- The maximum entropy solution allows the computation of P(f|h) for any f (a possible future/class) given any h (a possible history/context)
- The "history" or "context" is the conditioning data that enables the decision

8 The Model
- In the model produced by M.E. estimation, every feature g_i has an associated parameter alpha_i
- The conditional probability is calculated as
    P(f|h) = (1/Z(h)) * prod_i alpha_i^g_i(f,h),  where  Z(h) = sum_f prod_i alpha_i^g_i(f,h)
- The M.E. estimation technique guarantees that for every feature g_i, the expected value of g_i according to the M.E. model will equal the empirical expectation of g_i in the training corpus
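A possible rendering of the product-form model in Python, assuming the features g_i(f,h) and parameters alpha_i are held in parallel lists (a sketch, not the author's implementation):

def conditional_prob(f, h, futures, features, alphas):
    # P(f | h) = (1 / Z(h)) * prod_i alpha_i ** g_i(f, h)
    def weight(cand):
        w = 1.0
        for g_i, a_i in zip(features, alphas):
            w *= a_i ** g_i(cand, h)
        return w
    z = sum(weight(cand) for cand in futures)   # the normalizer Z(h)
    return weight(f) / z

With all alpha_i equal to 1 this reduces to the uniform distribution over the candidate futures, i.e. the maximum-entropy model with no constraints at all.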

9 Generalized Iterative Scaling
- Generalized Iterative Scaling (GIS) finds the parameters alpha_i of the distribution P
- GIS requires the constraint that the feature counts sum to a constant: sum_i g_i(f,h) = C for all (f,h)
- If not, choose C = max over (f,h) of sum_i g_i(f,h) and add a correctional feature g_{k+1} such that g_{k+1}(f,h) = C - sum_i g_i(f,h)
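One way this correction can be set up in code (a sketch under the assumption that C is taken as the maximum feature count over the pairs considered; the names are illustrative):

def add_correction_feature(features, sample, futures):
    # C is the largest total feature count over the (f, h) pairs considered;
    # the extra feature pads every pair's total up to exactly C
    C = max(sum(g(f, h) for g in features)
            for h, _ in sample for f in futures)
    def g_corr(f, h):
        return C - sum(g(f, h) for g in features)
    return features + [g_corr], C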

10 The GIS procedure
- Initialize alpha_i^(0) = 1 and iterate
    alpha_i^(n+1) = alpha_i^(n) * ( E_p~[g_i] / E_p^(n)[g_i] )^(1/C)
- It can be proven that a probability sequence whose parameters are defined by this procedure converges to a unique and positive solution
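A compact sketch of this loop, reusing observed_expectation and conditional_prob from the sketches above and the model_expectations helper sketched after slide 12 below; it illustrates the update rule and is not the original code:

def gis(features, futures, sample, C, iterations=100):
    # alpha_i start at 1; each iteration rescales them by
    # (observed expectation / model expectation) ** (1 / C)
    alphas = [1.0] * len(features)
    observed = [observed_expectation(g_i, sample) for g_i in features]
    for _ in range(iterations):
        expected = model_expectations(features, futures, sample, alphas)
        alphas = [a * (obs / mod) ** (1.0 / C)
                  for a, obs, mod in zip(alphas, observed, expected)]
    return alphas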

11 Computation
- Each iteration requires the computation of the empirical expectation E_p~[g_i] and the model expectation E_p^(n)[g_i]
- Given the training sample (h_1,f_1), ..., (h_N,f_N), the calculation of E_p~[g_i] = (1/N) * sum_j g_i(f_j,h_j) is straightforward
- The computation of the model's feature expectation E_p^(n)[g_i] can be intractable

12 Computation of the model expectation E_p[g_i]
- We have E_p[g_i] = sum over (h,f) of P(h,f) * g_i(f,h)
- By Bayes' rule, P(h,f) = P(h) * P(f|h)
- We use an approximation: sum over the histories h_j that occur in the training sample (each weighted 1/N), and not over all possible histories, giving
    E_p[g_i] ≈ (1/N) * sum_j sum_f P(f|h_j) * g_i(f,h_j)
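The approximation can be written directly as a loop over the training histories (a sketch that reuses conditional_prob from the earlier slide; the function name is an assumption):

def model_expectations(features, futures, sample, alphas):
    # Approximate E_p[g_i]: sum only over the histories h_j that occur in the
    # training sample (each weighted 1/N), and over every candidate future f
    # weighted by the current model's P(f | h_j)
    exps = [0.0] * len(features)
    for h, _ in sample:
        for f in futures:
            p = conditional_prob(f, h, futures, features, alphas)
            for i, g_i in enumerate(features):
                exps[i] += p * g_i(f, h)
    return [e / len(sample) for e in exps]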

13 Termination and Running Time
- Terminate after a fixed number of iterations (e.g. 100) or when the change in the log likelihood becomes negligible
- The running time of each iteration is dominated by the computation of the model expectations E_p^(n)[g_i], which is O(NPA), where N is the training set size, P the number of possible classes (futures), and A the average number of features active for a (f,h) pair
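For the log-likelihood stopping criterion, a small helper such as the following could be used (again a sketch built on conditional_prob from the earlier slide):

from math import log

def avg_log_likelihood(features, futures, sample, alphas):
    # Average log P(f | h) over the training pairs under the current model;
    # iteration can stop when this stops improving by more than a small tolerance
    return sum(log(conditional_prob(f, h, futures, features, alphas))
               for h, f in sample) / len(sample)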

