A Framework For Tuning Posterior Entropy
Rajhans Samdani (1), Ming-Wei Chang (2), and Dan Roth (1)
(1) University of Illinois at Urbana-Champaign, (2) Microsoft Research

Expectation Maximization and Variations

EM is the most popular algorithm for unsupervised and semi-supervised learning. Its E-step is an inference step: it infers a posterior distribution over the output variables. Different variations of EM change this E-step (the inference step):
- Standard EM infers a posterior distribution spread over all outputs.
- Hard EM infers the most likely variable assignment at each iteration, i.e., a posterior peaked on just one output.
- Constrained EM adds constraints to the inference step: Posterior Regularization (PR) and Constraint-Driven Learning (CoDL).

Standard EM / Posterior Regularization (Ganchev et al., 10)
  E-step: argmin_q KL(q(y), P(y | x; w))  s.t.  E_q[Uy] ≤ b
  M-step: argmax_w E_q log P(x, y; w)

Hard EM / Constraint-Driven Learning (Chang et al., 07)
  E-step: y* = argmax_y P(y | x; w)  s.t.  Uy ≤ b
  M-step: argmax_w E_q log P(x, y; w)

It is not clear which version to use!

Tuning the Posterior Entropy: Unified Expectation Maximization (UEM)

UEM is a framework for explicitly tuning the entropy of the posterior distribution during the E-step (the inference step). It minimizes a modified KL divergence KL(q, P(y | x; w); γ), where γ changes the entropy of the posterior:

  KL(q, p; γ) = Σ_y [ γ q(y) log q(y) − q(y) log p(y) ]

Different γ values yield different EM algorithms: γ controls the "hardness" of inference as a means to better adapt learning to the underlying distribution, data, initialization, constraints, etc.

[Figure: the effect of changing γ on the posterior q, showing the conditional distribution p next to q for a range of γ values.]

Unification: changing γ recovers existing EM algorithms:
- γ = 1: EM (no constraints) or PR (with constraints);
- γ ≤ 0: hard EM (no constraints) or CoDL (with constraints);
- γ in between: deterministic annealing (Smith and Eisner, 04; Hofmann, 99), which varies γ across iterations.
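To make the effect of γ concrete, here is a minimal numpy sketch of the unconstrained UEM E-step (no Uy ≤ b constraints). Minimizing the modified divergence above over the probability simplex gives q(y) ∝ p(y)^(1/γ) for γ > 0 and a point mass on argmax_y p(y) for γ ≤ 0; the function name and the toy distribution are illustrative rather than taken from the paper.

```python
import numpy as np

def uem_posterior(p, gamma):
    """Unconstrained UEM E-step: argmin_q sum_y [gamma q(y) log q(y) - q(y) log p(y)].

    gamma = 1     -> q = p (standard EM)
    gamma <= 0    -> q is a point mass on argmax_y p(y) (hard EM)
    gamma -> inf  -> q tends toward the uniform distribution
    Assumes p(y) > 0 for all y.
    """
    p = np.asarray(p, dtype=float)
    if gamma <= 0:
        # The objective is linear (gamma = 0) or concave (gamma < 0) in q,
        # so a minimizer sits at a vertex of the simplex: the point mass on argmax p.
        q = np.zeros_like(p)
        q[p.argmax()] = 1.0
        return q
    # For gamma > 0 the minimizer over the simplex is q(y) proportional to p(y)**(1/gamma).
    logq = np.log(p) / gamma
    q = np.exp(logq - logq.max())
    return q / q.sum()

p = np.array([0.6, 0.3, 0.1])          # toy conditional distribution P(y | x; w)
for g in [-1.0, 0.0, 0.5, 1.0, 5.0]:
    q = uem_posterior(p, g)
    entropy = -np.sum(q[q > 0] * np.log(q[q > 0]))
    print(f"gamma = {g:4}: q = {np.round(q, 3)}, H(q) = {entropy:.3f}")
```

Running the loop shows the entropy of q climbing from 0 (hard EM) through H(p) at γ = 1 and toward the uniform distribution's entropy as γ grows; this is the knob the experiments below tune.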
Experimental Evidence for Tuning Posterior Entropy

We test whether tuning the posterior entropy via γ improves performance over the baselines, namely EM / Posterior Regularization (PR), which corresponds to γ = 1.0, and hard EM / Constraint-Driven Learning (CoDL), which corresponds to γ ≤ 0. We also study the relation between the quality of initialization and γ (the "hardness" of inference). In almost all of our experiments, the best UEM algorithm corresponds to a γ somewhere between 0 and 1, which we discovered through the UEM framework. Food for thought: why and how does the posterior entropy affect learning so much?

Experiments: Unsupervised POS Tagging

We model the problem as a first-order HMM and try initializations of varying quality:
- uniform initialization: initialize all states with equal probability;
- supervised initialization: initialize with parameters trained on varying amounts of labeled data (5, 10, 20, or 40-80 examples).

[Figure: performance relative to EM as a function of γ, with EM and hard EM as reference points, for uniform initialization and for initialization with 5, 10, 20, and 40-80 labeled examples.]

Observation: the better the initialization, the better hard inference performs.

Experiments: Entity-Relation Extraction

We extract entity types (e.g. Loc, Org, Per) and relation types (e.g. Lives-in, Org-based-in, Killed) between pairs of entities, and add constraints:
- type constraints between entities and relations;
- expected-count constraints to regularize the number of 'None' relations.
This is semi-supervised learning with a small amount of labeled data.

[Figure: macro-F1 scores as a function of the percentage of labeled data.]

Experiments: Word Alignment

We align words from a source language S to a target language T, trying the En-Fr and En-Es pairs, and use an HMM-based model with agreement constraints for word alignment. By tuning γ in UEM, we reduce the alignment error rate over PR by 20% and over CoDL by 40%.

[Figure: Es-En word alignment error rate as a function of the number of unlabeled sentences.]

Supported by the Army Research Laboratory (ARL), the Defense Advanced Research Projects Agency (DARPA), and the Office of Naval Research (ONR).
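As a closing illustration: these experiments add constraints of the form Uy ≤ b (in expectation, E_q[Uy] ≤ b) to the E-step, as in the Standard EM / Posterior Regularization box at the top. Below is a minimal numpy sketch of one way to compute such a constrained E-step for a small, enumerable output space, handling the γ = 1 (PR-style) projection by gradient ascent on non-negative dual multipliers. The function and variable names are illustrative, and a real sequence model (e.g., the HMMs used for tagging and alignment) would compute the required expectations with dynamic programming rather than by enumeration.

```python
import numpy as np

def constrained_e_step(p, u, b, lr=0.5, iters=1000):
    """Project the model posterior p(y) onto {q : E_q[u(y)] <= b}.

    p : (n,) model posterior P(y | x; w) over an enumerable output space
    u : (n, k) constraint features; row y holds u(y) (the 'Uy' terms)
    b : (k,) bounds on the expected constraint features
    Returns the q minimizing KL(q, p) subject to E_q[u(y)] <= b, using
    q_lambda(y) proportional to p(y) * exp(-lambda . u(y)) and gradient
    ascent on the non-negative dual multipliers lambda.
    """
    lam = np.zeros(u.shape[1])
    for _ in range(iters):
        scores = np.log(p) - u @ lam            # log p(y) - lambda . u(y)
        q = np.exp(scores - scores.max())
        q /= q.sum()
        grad = q @ u - b                        # dual gradient: E_q[u(y)] - b
        lam = np.maximum(0.0, lam + lr * grad)  # keep multipliers non-negative
    return q

# Toy example: three candidate outputs; one expected-count constraint caps the
# probability of output 2 (say, the 'None' relation) at 0.2.
p = np.array([0.2, 0.1, 0.7])
u = np.array([[0.0], [0.0], [1.0]])             # u(y) = indicator of output 2
b = np.array([0.2])
q = constrained_e_step(p, u, b)
print("q =", np.round(q, 3), " E_q[u] =", np.round(q @ u, 3))
```

When the constraint features decompose over the same structure as the model, q_lambda stays factorized and the needed expectations can be computed with forward-backward; for γ ≤ 0 (the CoDL case), the analogous step is a constrained argmax over y.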