Graphical Models over Multiple Strings Markus Dreyer and Jason Eisner Dept. of Computer Science, Johns Hopkins University EMNLP 2009 Presented by Ji Zongcheng.

Contents Overview Motivation Formal Modeling Approach Approximate Inference Experiments Conclusions

Overview We study graphical modeling in the case of string-valued random variables, rather than over finite domains such as booleans, words, or tags. Whereas a weighted finite-state transducer can model the probabilistic relationship between two strings, we are interested in building up joint models of three or more strings.

Graphical models Build: choose the variables, their domains, and their possible direct interactions  Train: learn the parameters θ of p(V1, ..., Vn) (choice of training procedure)  Infer: predict unobserved variables from observed ones (choice of exact or approximate inference algorithm)

Weighted finite-state machines FSA: finite-state automaton; WFSA: weighted finite-state automaton; FST: finite-state transducer; WFST: weighted finite-state transducer; FSM: finite-state machine; WFSM: weighted finite-state machine. K = 1: acceptor; K = 2: transducer; K > 2: (multi-tape) machine, where K is the number of tapes.
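To make the K = 1 case concrete, here is a minimal, self-contained sketch of a WFSA: it assigns a weight to a string by summing, over all accepting paths, the product of the arc weights along each path. The toy automaton and its weights are invented for illustration and are not from the paper.

```python
from collections import defaultdict

def wfsa_weight(string, start, arcs, finals):
    """Weight of `string` under a WFSA given as:
       arcs:   {(state, symbol): [(next_state, arc_weight), ...]}
       finals: {state: final_weight}
       The result is the sum over accepting paths of the product of weights."""
    forward = {start: 1.0}                       # total weight of reaching each state
    for symbol in string:
        nxt = defaultdict(float)
        for state, w in forward.items():
            for dest, aw in arcs.get((state, symbol), []):
                nxt[dest] += w * aw              # multiply along a path, sum over paths
        forward = dict(nxt)
    return sum(w * finals.get(s, 0.0) for s, w in forward.items())

# Toy acceptor over the letters of "brechen": it gives nonzero weight only to
# strings that end in 'e' followed by one or more 'n's.
arcs = {
    (0, 'b'): [(0, 0.5)], (0, 'r'): [(0, 0.5)], (0, 'c'): [(0, 0.5)],
    (0, 'h'): [(0, 0.5)], (0, 'n'): [(0, 0.5)],
    (0, 'e'): [(0, 0.5), (1, 0.4)],
    (1, 'n'): [(1, 1.0)],
}
finals = {1: 1.0}
print(wfsa_weight("brechen", 0, arcs, finals))   # a small positive weight
```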

Contents Overview Motivation Formal Modeling Approach Approximate Inference Training the Model Parameters Comparison With Other Approaches Experiments Conclusions

Motivation String mapping between different forms and representations is ubiquitous in NLP and computational linguistics. However, many problems involve more than just two strings:  Morphological paradigms (e.g. the infinitive, past, and present-tense forms of a verb)  Word translation  Cognates in multiple languages  Modern and ancestral word forms  In bioinformatics and in system combination, multiple sequences need to be aligned  … We propose a unified model for multiple strings that is suitable for all of the problems mentioned above.

Contents Overview Motivation Formal Modeling Approach Approximate Inference Experiments Conclusions

Formal Modeling Approach Variables  A Markov Random Field (MRF) is a joint model of a set of random variables, V = {V1, ..., Vn}  We assume that all variables are string-valued; this assumption is not crucial, since most other value types can easily be encoded as strings Factors  A factor (or potential function) Fj : A → R≥0 maps an assignment of values to a non-negative real  Unary factor: a WFSA  Binary factor: a WFST  A factor that depends on k > 2 variables: a WFSM  An MRF defines a probability for each assignment A of values to the variables in V (see the factorization below)
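The factorization referenced above, reconstructed in standard MRF notation (the paper's own equation may be typeset slightly differently):

```latex
p(A) \;=\; \frac{1}{Z}\prod_{j} F_j(A),
\qquad
Z \;=\; \sum_{A'} \prod_{j} F_j(A'),
```

where each factor F_j in fact depends only on the values that A assigns to the variables adjacent to F_j in the factor graph, and Z normalizes over all assignments.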

Formal Modeling Approach Parameters  A vector of feature weights θ ∈ R^d  Eisner (2002) explains how to specify and train such parameterized WFSMs Power of the formalism  The framework is powerful enough to express computationally undecidable problems  The graphical models literature has developed many methods: exact or approximate inference, etc. Figure 1: Example of a factor graph
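A hedged sketch of one common way to parameterize arc weights in such machines, in the spirit of Eisner (2002): each arc carries a set of features, and its weight is the exponential of the dot product between those features and the global weight vector θ. The feature names below are hypothetical and are not the paper's actual feature set.

```python
import math

# Hypothetical feature weights; a real model learns these during training.
theta = {"copy": 1.2, "insert:e": -0.3, "subst:a->i": 0.7}

def arc_weight(active_features, theta):
    """Log-linear arc weight: exp(theta . features) for the features on this arc."""
    return math.exp(sum(theta.get(f, 0.0) for f in active_features))

print(arc_weight(["copy", "subst:a->i"], theta))   # exp(1.2 + 0.7) ≈ 6.69
```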

Contents Overview Motivation Formal Modeling Approach Approximate Inference Experiments Conclusions

Approximate Inference Belief Propagation (BP)  BP generalizes the forward-backward algorithm, which applies only to chain-structured factor graphs  Loopy Belief Propagation: BP applied to factor graphs that have cycles

Approximate Inference How does BP work in general?  Each variable V maintains a belief about its value  Two kinds of messages are passed: from variables to factors and from factors to variables  The final beliefs are the output of the algorithm  If variable V is observed to have value v: modify the message equations (the paper's (2) and (4)) by multiplying in an evidence potential that is 1 on v and 0 on all other values  (The standard update equations and a small sketch are given below.)
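The quantities referred to above, written in generic sum-product notation (the paper's equations (1)-(4) may use different numbering and symbols): the belief at a variable, the variable-to-factor message, and the factor-to-variable message, where a ranges over assignments to the variables adjacent to F.

```latex
b_V(v) \;\propto\; \prod_{F \in N(V)} \mu_{F \to V}(v),
\qquad
\mu_{V \to F}(v) \;=\; \prod_{F' \in N(V)\setminus\{F\}} \mu_{F' \to V}(v),
```

```latex
\mu_{F \to V}(v) \;=\; \sum_{a \,:\, a(V)=v} F(a)
  \prod_{V' \in N(F)\setminus\{V\}} \mu_{V' \to F}\bigl(a(V')\bigr).
```

Because the variables here are string-valued, each message and belief ranges over infinitely many strings; the paper therefore represents messages and beliefs as WFSAs, so the pointwise products above become weighted automaton intersections and the sums over the other variables are carried out with finite-state operations on the factor WFSMs.

For intuition about the updates themselves, here is a minimal, self-contained loopy BP sketch over small finite domains (illustrative only; it uses dictionaries and tables where the paper uses weighted finite-state machines):

```python
import itertools

def prod(xs):
    result = 1.0
    for x in xs:
        result *= x
    return result

def loopy_bp(variables, factors, n_iters=10):
    """Sum-product loopy BP.
       variables: {name (str): list of possible values}
       factors:   list of (scope, table); scope is a tuple of variable names and
                  table maps a tuple of values (in scope order) to a weight >= 0."""
    msgs = {}                                    # msgs[(sender, receiver)] = {value: weight}
    for j, (scope, _) in enumerate(factors):
        for v in scope:
            msgs[(j, v)] = {x: 1.0 for x in variables[v]}   # factor j -> variable v
            msgs[(v, j)] = {x: 1.0 for x in variables[v]}   # variable v -> factor j

    for _ in range(n_iters):
        # variable -> factor messages: product of the other incoming factor messages
        for j, (scope, _) in enumerate(factors):
            for v in scope:
                msgs[(v, j)] = {
                    x: prod(msgs[(k, v)][x]
                            for k, (scope_k, _) in enumerate(factors)
                            if k != j and v in scope_k)
                    for x in variables[v]
                }
        # factor -> variable messages: sum out the other variables in the factor's scope
        for j, (scope, table) in enumerate(factors):
            for i, v in enumerate(scope):
                out = {x: 0.0 for x in variables[v]}
                for a in itertools.product(*(variables[u] for u in scope)):
                    w = table.get(a, 0.0)
                    for k, u in enumerate(scope):
                        if k != i:
                            w *= msgs[(u, j)][a[k]]
                    out[a[i]] += w
                msgs[(j, v)] = out

    beliefs = {}                                 # belief = normalized product of incoming messages
    for v, dom in variables.items():
        b = {x: prod(msgs[(j, v)][x]
                     for j, (scope, _) in enumerate(factors) if v in scope)
             for x in dom}
        z = sum(b.values()) or 1.0
        beliefs[v] = {x: w / z for x, w in b.items()}
    return beliefs

# Toy usage: two binary variables joined by a factor that prefers agreement.
# An observed variable can be handled by adding a unary factor that is 1 on
# the observed value and 0 elsewhere, as described on the slide above.
variables = {"A": [0, 1], "B": [0, 1]}
factors = [(("A", "B"), {(0, 0): 2.0, (1, 1): 2.0, (0, 1): 1.0, (1, 0): 1.0})]
print(loopy_bp(variables, factors)["A"])         # {0: 0.5, 1: 0.5}
```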

Contents Overview Motivation Formal Modeling Approach Approximate Inference Experiments Conclusions

Experiments Reconstruct missing word forms in morphological paradigms  Given the lemma (e.g. brechen)  Observed forms (e.g. brachen, bricht, …)  Predict the missing forms (e.g. breche, brichst, ...)

Experiments Development Data (100 verbs)

Experiments Test Data (9293 test paradigms)

Contents Overview Motivation Formal Modeling Approach Approximate Inference Experiments Conclusions

Conclusions Graphical models with string-valued variables  Factors are defined by weighted finite-state machines (WFSAs, WFSTs, WFSMs)  Approximate inference can be done by loopy BP Potentially applicable to  Transliteration  Cognate modeling  Multiple-sequence alignment  System combination

Thank you!