1 Learning with Latent Alignment Structures: Quasi-synchronous Grammar and Tree-edit CRFs for Question Answering and Textual Entailment
Mengqiu Wang
Joint work with Chris Manning and Noah Smith

2 Task definition
At a high level: learning the syntactic and semantic relations between two pieces of text, with an application-specific definition of those relations.
Question Answering
Q: Who is the leader of France?
A: Bush later met with French President Jacques Chirac.
Machine Translation
C: 温总理昨天会见了日本首相安倍晋三。
E: Premier Wen Jiabao met with Japanese Prime Minister Shinzo Abe yesterday.
Summarization
T: US rounds up 400 Saddam diehards as group claims anti-US attacks in Iraq.
S: US rounded up 400 people in Iraq.
Textual Entailment (IE, IR, QA, SUM)
Txt: Responding to Scheuer's comments in La Repubblica, the prime minister's office said the analysts' allegations, "beyond being false, are also absolutely incompatible with the contents of the conversation between Prime Minister Silvio Berlusconi and U.S. Ambassador to Rome Mel Sembler."
Hyp: Mel Sembler represents the U.S.

3 The Challenges
The latent alignment structure:
QA: Who is the leader of France? / Bush later met with French President Jacques Chirac.
MT: 温总理昨天会见了日本首相安倍晋三。 / Premier Wen Jiabao met with Japanese Prime Minister Shinzo Abe yesterday.
Sum: US rounds up 400 Saddam diehards as group claims anti-US attacks in Iraq. / US rounded up 400 people in Iraq.
RTE: Responding to … the conversation between Prime Minister Silvio Berlusconi and U.S. Ambassador to Rome Mel Sembler." / Mel Sembler represents the U.S.

4 Other modeling challenges
Question-Answer Ranking
Q: Who is the leader of France?
1. Bush later met with French president Jacques Chirac.
2. Henri Hadjenberg, who is the leader of France's Jewish community, …
3. …

5 Semantic Transformations
Q: "Who is the leader of France?"
A: Bush later met with French president Jacques Chirac.

6 Syntactic Transformations
[Figure: dependency trees for "Who is the leader of France?" and "Bush met with French president Jacques Chirac", with a "mod" edge highlighting the corresponding modifier relation across the two trees.]

7 Syntactic Variations
[Figure: dependency trees for "Who is the leader of France?" and "Henri Hadjenberg, who is the leader of France's Jewish community", with a "mod" edge highlighting how the same relation is realized in a different syntactic configuration.]

8 What's been done?
The latent alignment problem: instead of treating alignment as a latent variable, treat it as a separate task. First find the best alignment, then proceed with the rest of the task.
Pros: usually simple and efficient.
Cons: not very robust; there is no way to correct alignment errors in later steps.
Modeling syntax and semantics: extract features from syntactic parse trees and semantic resources, then feed them into a linear classifier. Syntax and semantics enrich the feature space, but there is no principled way to exploit the syntax.
Pros: no need to worry about trees too much.
Cons: ad hoc.

9 What I think an ideal model should do
Carry alignment uncertainty into the final task: treat alignment as a latent variable and jointly learn the proper alignment structure and the overall task. In other words, model the distribution over alignments and sum over all possible alignments at decoding time (see the equation below).
Syntax-based and feature-rich models: directly model syntax, and enable the use of rich semantic features and features from other world-knowledge resources.
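As a worked equation (my notation; the slide states the idea only in prose): with the alignment a between the two texts treated as a latent variable, decoding marginalizes over all alignments instead of committing to a single best one:

```latex
% y: task label (e.g., correct answer / entailment); s, t: the two texts;
% a: a candidate alignment structure between them.
P(y \mid s, t) \;=\; \sum_{a} P(y, a \mid s, t)
% ...rather than the pipeline approximation P(y \mid s, t, \hat{a})
% with \hat{a} = \arg\max_{a} P(a \mid s, t).
```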

10 Road map
Present two models that address the issues raised:
1. A model based on Quasi-synchronous Grammar (EMNLP '07), with experiments on a Question Answering task.
2. A tree-edit CRF model (current work), with experiments on RTE.
Then discuss and compare the two models: modeling power, pros and cons, and future work.

11 Switching gears…
Quasi-synchronous Grammar for Question Answering

12 Tree-edit CRFs for RTE
An extension of McCallum et al.'s UAI 2005 work on CRFs for finite-state string edit distance.
Key attractions (a formula sketch follows this list):
Models the transformation of dependency parse trees, and thus directly models syntax, unlike McCallum et al. '05, which only models word strings.
Discriminatively trained (not a generative model, unlike QG).
Trained on both positive and negative instances of sentence pairs (QG is trained only on positive Q/A pairs).
The underlying graphical model is an undirected CRF (QG is essentially a directed Bayes net).
A joint model over alignments (vs. the local alignment models in QG).
Feature-rich.
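A plausible rendering of the objective these points imply (my reconstruction under the FSM construction described on the following slides, not a formula from the talk): the conditional probability of a label sums the scores of all edit sequences whose state path runs through that label's state set:

```latex
P(y \mid T_1, T_2) \;=\;
  \frac{\sum_{e \in \mathcal{E}_y(T_1, T_2)} \prod_j \psi(s_{j-1}, s_j, e_j)}
       {\sum_{y'} \sum_{e \in \mathcal{E}_{y'}(T_1, T_2)} \prod_j \psi(s_{j-1}, s_j, e_j)}
% \mathcal{E}_y: edit-operation sequences routed through label y's state set;
% \psi: a non-negative potential on each state transition / edit operation.
```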

13 The TE-CRF model in detail
First, let's look at the correspondence between alignments (with constraints) and edit operations.

14 [Figure: aligned dependency trees for Q: "Who is the leader of France?" and A: "Bush met with French president Jacques Chirac", with nodes annotated with POS tags (NNP, VBD, …) and NE/type tags (person, location, qword), linked by dependency relations (subj, obj, det, nmod, …); aligned node pairs are labeled with the edit operations substitute, delete, insert, and "fancy substitute".]

15 The TE-CRF model in detail
Each valid tree-edit operation sequence that transforms one tree into the other corresponds to an alignment. A tree-edit operation sequence is modeled as a transition sequence among a set of states in an FSM (a toy dynamic-programming sketch follows).
[Figure: FSM states S1, S2, S3 with delete/substitute/insert transitions, and an example edit-operation sequence unrolled as a state path.]
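As a toy illustration of "summing over all valid edit sequences", here is a forward dynamic program for the string case that McCallum et al.'s model (and hence TE-CRFs) builds on. Everything below is a simplification assumed for exposition: a single FSM state and scalar per-operation weights, where the real model has multiple states, tree-structured input, and feature-rich potentials.

```python
import math

def edit_forward(src, tgt, w_sub, w_del, w_ins):
    """Total unnormalized score of ALL edit sequences (substitute /
    delete / insert) transforming src into tgt -- the quantity a
    string-edit CRF normalizes over."""
    n, m = len(src), len(tgt)
    # alpha[i][j]: mass of all edit prefixes consuming src[:i], emitting tgt[:j]
    alpha = [[0.0] * (m + 1) for _ in range(n + 1)]
    alpha[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0:            # delete src[i-1]
                alpha[i][j] += alpha[i - 1][j] * math.exp(w_del(src[i - 1]))
            if j > 0:            # insert tgt[j-1]
                alpha[i][j] += alpha[i][j - 1] * math.exp(w_ins(tgt[j - 1]))
            if i > 0 and j > 0:  # substitute src[i-1] -> tgt[j-1]
                alpha[i][j] += alpha[i - 1][j - 1] * math.exp(
                    w_sub(src[i - 1], tgt[j - 1]))
    return alpha[n][m]

# Toy weights: identical words substitute cheaply; everything else is penalized.
Z = edit_forward(
    ["the", "leader", "of", "France"], ["French", "president"],
    w_sub=lambda a, b: 2.0 if a == b else -1.0,
    w_del=lambda a: -0.5,
    w_ins=lambda b: -0.5)
print(Z)  # sums over every valid edit path, not just the cheapest one
```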

16 FSM
[Figure: one edit-operation sequence unrolled as a path through the FSM states. There are many other valid edit sequences for the same tree pair.]

17 FSM, cont.
[Figure: two copies of the FSM (states S1, S2, S3 with delete/substitute/insert transitions), one forming the Positive State Set and one the Negative State Set; a shared Start state enters each set, and each set exits to a shared Stop state, via ε-transitions.]

18 FSM transitions
[Figure: the full transition lattice from Start through the positive and negative state sets to Stop; each edit-operation sequence traces a path that stays within a single state set.]
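Under this construction, classification amounts to comparing the total path mass routed through each state set (my summary of the construction; notation mine):

```latex
P(\text{positive} \mid T_1, T_2) \;=\;
  \frac{Z_{\text{pos}}(T_1, T_2)}{Z_{\text{pos}}(T_1, T_2) + Z_{\text{neg}}(T_1, T_2)}
% Z_pos, Z_neg: sums of edit-path scores through the positive and
% negative state sets, each computable with a forward pass.
```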

19 Parameterization
[Figure: an S1 → S2 substitute transition, with feature weights labeled "positive or negative" (specific to one state set) and "positive and negative" (tied across both state sets).]
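The natural CRF parameterization of each such factor is log-linear in the features; this is the standard form, which the slide's figure instantiates for a substitute transition:

```latex
\psi(s \to s', e, x) \;=\; \exp\big(\theta^{\top} f(s, s', e, x)\big)
% s -> s': FSM state transition; e: the edit operation; x: the tree pair;
% f: the feature vector (slide 21); \theta: the learned weights.
```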

20 Training using EM
E-step and M-step; the M-step is optimized using L-BFGS. The lower bound follows from Jensen's inequality (see below).
[Figure: the E-step and M-step update equations.]
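The bound in question is the usual EM lower bound on the log conditional likelihood, obtained by Jensen's inequality with the edit sequences e as the latent variable (standard EM algebra; the slide shows the corresponding update equations):

```latex
\log P_\theta(y \mid x) \;=\; \log \sum_{e} P_\theta(y, e \mid x)
  \;\ge\; \sum_{e} q(e) \log \frac{P_\theta(y, e \mid x)}{q(e)}
% E-step: q(e) := P_{\theta_{\text{old}}}(e \mid y, x)  (tightens the bound);
% M-step: maximize the bound over \theta, here with L-BFGS.
```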

21 Features for RTE
Substitution features:
Same: Word / WordWithNE / Lemma / NETag / Verb / Noun / Adj / Adv / Other
Sub/MisSub: Punct / Stopword / ModalWord
Antonym / Hypernym / Synonym / NomBank / Country
Different: NE / POS
Unrelated words
Delete features: Stopword / Punct / NE / Other / Polarity / Quantifier / Likelihood / Conditional / If
Insert features: Stopword / Punct / NE / Other / Polarity / Quantifier / Likelihood / Conditional / If
Tree features: RootAligned / RootAlignedSameWord; (Parent, Child, DepRel) triple match/mismatch
Date/Time/Numerical features: DateMismatch, hasNumDetMismatch, normalizedFormMismatch
(A feature-extraction sketch follows this list.)
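A sketch of how the substitution features above might be computed for one aligned node pair. The Token attributes and the synonym lookup are illustrative assumptions, not the talk's actual feature extractor:

```python
from dataclasses import dataclass

@dataclass
class Token:
    word: str
    lemma: str
    pos: str
    ne_tag: str  # e.g. "person", "location", or "" for none

def substitution_features(src: Token, tgt: Token, synonyms: set) -> dict:
    """Binary features for one substitute edit, mirroring the feature
    groups on the slide (feature names here are hypothetical)."""
    f = {}
    f["same_word"] = src.word.lower() == tgt.word.lower()
    f["same_lemma"] = src.lemma == tgt.lemma
    f["same_ne_tag"] = bool(src.ne_tag) and src.ne_tag == tgt.ne_tag
    f["ne_mismatch"] = bool(src.ne_tag) and bool(tgt.ne_tag) and src.ne_tag != tgt.ne_tag
    f["pos_mismatch"] = src.pos != tgt.pos
    # lexical-semantic relation, e.g. from a WordNet-derived table (assumed)
    f["synonym"] = (src.lemma, tgt.lemma) in synonyms
    f["unrelated"] = not any(f.values())  # none of the above fired
    return f

feats = substitution_features(
    Token("leader", "leader", "NN", ""),
    Token("president", "president", "NN", ""),
    synonyms={("leader", "president")})
print({k: v for k, v in feats.items() if v})  # {'synonym': True}
```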

22 Tree-edit CRFs for Textual Entailment: preliminary results
Trained on RTE2 dev, tested on RTE2 test; model taken after 50 EM iterations: acc 0.6275.
RTE2 official results:
1. Hickl (LCC): acc 0.7538
2. Tatu (LCC): acc 0.7375
3. Zanzotto (Milan & Rome): acc 0.6388
4. Adams (Dallas): acc 0.6262, MAP 0.6282

23 Comparison: QG vs. TE-CRFs
QG:
1. Generative
2. Directed (a Bayes net), local
3. Allows arbitrary swapping in alignment
4. Allows limited use of semantic features (a lexical-semantic log-linear model inside a mixture model)
5. Computationally cheaper
TE-CRFs:
1. Discriminative
2. Undirected (a CRF), global
3. No swapping: cannot do substitutions that involve swapping (can be extended; see future work)
4. Allows arbitrary semantic features
5. Computationally more expensive

24 Future work
QG:
1. Generative → train discriminatively using Noah's contrastive estimation
2. Directed, Bayes net, local → higher-order Markovization
3. Allows arbitrary swapping in alignment
4. Allows limited use of semantic features (lexical-semantic log-linear model in a mixture model)
5. Computationally cheaper
6. Run RTE experiments
TE-CRFs:
1. Discriminative
2. Undirected, CRF, global
3. No swapping → constrained unordered trees; fancier edit operations (e.g., substituting sub-trees)
4. Allows arbitrary semantic features
5. More expensive
6. Run QA and MT alignment experiments

25 Thank you! Questions?