Two Discourse Driven Language Models for Semantics


Two Discourse Driven Language Models for Semantics
Haoruo Peng and Dan Roth, ACL 2016

A Typical "Semantic Sequence" Problem

First, let's start with a motivating example:

[Kevin] was robbed by [Robert], and [he] was arrested by the police.

- Co-reference: to whom does "he" refer?
- QA: Who was arrested? / Was Kevin arrested by the police?
- Event prediction: What happened to Robert after the robbery?

"Understanding what could come next" is central to multiple natural language understanding tasks.

Two Key Questions

1. How do we model the sequential nature of natural language at a semantic level, and what do we mean by "a semantic level"?
   - Outcome: Semantic Language Models (SemLMs)
   - Quality evaluation of SemLMs
2. How do we use the model to better support NLU tasks?
   - Application to co-reference resolution
   - Application to shallow discourse parsing

Frame-Based Semantic Sequence

[Kevin] was robbed by [Robert], and [he] was arrested by the police.

For the modelling problem, we choose the frame-based definition: each predicate forms a frame with its arguments, here rob(sub: Robert, obj: Kevin) and arrest(sub: the police, obj: he).

[Chambers and Jurafsky, 2008; Bejan, 2008; Jans et al., 2012; Granroth-Wilding et al., 2015; Pichotta and Mooney, 2016]

What We Do Differently

[Kevin] was robbed by [Robert], but the police mistakenly arrested [him].

We infuse discourse information into the mix: discourse connectives (here, "but") are modeled between frames, alongside the predicates and their arguments. This is important, too.

Two Different Sequences

[Kevin] was robbed by [Robert], but the police mistakenly arrested [him].

- Frame-Chain (FC) sequences, built from SRL and shallow discourse parsing:
  Rob.01  but  Arrest.01  EOS
- Entity-Centered (EC) sequences, built from SRL and co-reference (here centered on Kevin):
  Rob.01-obj  but  Arrest.01-obj

Two models for different usages; a sketch of how both sequences are assembled follows below.
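
A minimal sketch of how the two token sequences might be assembled from SRL, discourse-parse, and co-reference output; the frame, connective, and chain structures below are illustrative stand-ins, not the authors' actual data format.

```python
# Build Frame-Chain (FC) and Entity-Centered (EC) SemLM sequences.
# The input structures here are hypothetical stand-ins for SRL,
# shallow-discourse, and co-reference output on the example sentence.

frames = [
    {"predicate": "Rob.01",    "sub": "Robert",     "obj": "Kevin"},
    {"predicate": "Arrest.01", "sub": "the police", "obj": "him"},
]
connectives = {(0, 1): "but"}         # connective between frame 0 and frame 1
coref_chain = {"Kevin", "him", "he"}  # mentions of the target entity

def fc_sequence(frames, connectives):
    """Frame-Chain: predicates interleaved with discourse connectives."""
    seq = []
    for i, f in enumerate(frames):
        seq.append(f["predicate"])
        if (i, i + 1) in connectives:
            seq.append(connectives[(i, i + 1)])
    seq.append("EOS")
    return seq

def ec_sequence(frames, connectives, chain):
    """Entity-Centered: predicate plus the role the target entity fills."""
    seq = []
    for i, f in enumerate(frames):
        for role in ("sub", "obj"):
            if f[role] in chain:
                seq.append(f"{f['predicate']}-{role}")
        if (i, i + 1) in connectives:
            seq.append(connectives[(i, i + 1)])
    return seq

print(fc_sequence(frames, connectives))               # ['Rob.01', 'but', 'Arrest.01', 'EOS']
print(ec_sequence(frames, connectives, coref_chain))  # ['Rob.01-obj', 'but', 'Arrest.01-obj']
```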

Semantic Language Model (SemLM)

SemLM = language models + semantic sequences → semantic knowledge.

SemLM units:
- Disambiguated predicates (e.g. Rob.01), or disambiguated predicates with an argument role label (e.g. Rob.01-obj)
- Discourse markers, including connectives such as "but" and the end-of-sequence symbol EOS

Language Model Implementations

- N-gram (UNI, BG, TRI)
- Skip-gram (SG)
- Continuous bag-of-words (CBOW)
- Log-bilinear (LB), an extension to CBOW

A training sketch for one of these follows below.
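
As one illustration, here is a minimal sketch of training the skip-gram variant over semantic sequences using gensim; the toy corpus and hyperparameters are assumptions, not the paper's settings.

```python
from gensim.models import Word2Vec

# Each "sentence" is one semantic sequence of SemLM units
# (predicates, predicate-role tokens, discourse markers).
sequences = [
    ["rob.01", "but", "arrest.01", "indict.01", "convict.01", "EOS"],
    ["rob.01-obj", "but", "arrest.01-obj"],
    # ... one sequence per document (FC) or entity chain (EC)
]

# sg=1 selects skip-gram; CBOW would be sg=0. Values are illustrative.
model = Word2Vec(sequences, vector_size=100, window=3, min_count=1, sg=1)

# The trained embeddings can later be reused as downstream features.
print(model.wv.most_similar("arrest.01", topn=3))
```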

Building SemLMs from Scratch

Corpus: 20 years of the NYT, 1.8M documents, annotated with the Illinois NLP packages (semantic role labeling, shallow discourse parsing, end-to-end co-reference).

From annotated documents to the SemLM vocabulary:
- FrameNet mapping
- Augment to verb phrases
- Augment to compound verbs
- Filter rare units (keep those occurring >= 20 times)
- Add "UNK"

Building SemLMs from Scratch

Augment to verb phrases:
- + preposition: "take over" → take.03(over)
- + negation: "not like" → like.01(not)
- be + adjective: "be happy" → be.02(happy)

Augment to compound verbs:
- "eat and drink" → eat.01-drink.01
- "decide to buy" → decide.01-buy.01

Together, these define the actual vocabulary, e.g. "He doesn't want to give up." → want.01(not)-give.08(up). A sketch of this string construction follows below.
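
A small sketch of how such augmented unit strings might be assembled; the helper functions and rule encoding are hypothetical, shown only to make the notation concrete.

```python
def semlm_unit(frame, particle=None, negated=False, adjective=None):
    """Build an augmented SemLM unit string, e.g. want.01(not) or be.02(happy).

    `frame` is a disambiguated predicate such as 'take.03'; the keyword
    modifiers mirror the three verb-phrase augmentations on this slide.
    """
    mods = []
    if negated:
        mods.append("not")
    if particle:
        mods.append(particle)
    if adjective:
        mods.append(adjective)
    return frame + "".join(f"({m})" for m in mods)

def compound(*units):
    """Join units into a compound verb, e.g. decide.01-buy.01."""
    return "-".join(units)

# "He doesn't want to give up." -> want.01(not)-give.08(up)
print(compound(semlm_unit("want.01", negated=True),
               semlm_unit("give.08", particle="up")))
```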

Building SemLMs from Scratch

SemLM training uses settings similar to standard language model training. Two models (FC / EC) times four implementations gives 8 different SemLMs.

Design Choices

Generate a probabilistic model, as opposed to script learning, e.g. Narrative Schema [Chambers and Jurafsky, 2009], which is too sparse.

Preprocessing (SRL, co-reference, etc.) introduces noise, but large data yields robustness (shown in the quality evaluation).

Entity modeling in semantic sequences: the right level of abstraction is hard to determine.
- [The doctor] told [Susan] that [she] had been busy. → Person
- [The doctor] told [Susan] that [she] had cancer. → Doctor/Patient
- [Mary] told [Susan] that [she] had cancer. → ?

Quality of SemLMs

Two standard tests: a perplexity test and a narrative cloze test.

Test corpora:
- NYT held-out data (10% of the NYT corpus), with automatic annotation
- Gold PropBank data with frame chains, with gold annotation
- Gold OntoNotes data with co-reference chains, with gold annotation

Quality of SemLMs: Perplexity Test

(Results table shown in the original slides; a sketch of the perplexity computation follows below.)
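
For reference, a minimal sketch of how perplexity might be computed for a bigram SemLM with add-one smoothing; this is a generic illustration, not the paper's exact estimator.

```python
import math
from collections import Counter

def bigram_perplexity(train_seqs, test_seqs):
    """Perplexity of an add-one-smoothed bigram model over SemLM units."""
    unigrams, bigrams = Counter(), Counter()
    vocab = set()
    for seq in train_seqs:
        seq = ["BOS"] + seq
        vocab.update(seq)
        unigrams.update(seq[:-1])              # context counts
        bigrams.update(zip(seq[:-1], seq[1:]))
    V = len(vocab)

    log_prob, n_tokens = 0.0, 0
    for seq in test_seqs:
        seq = ["BOS"] + seq
        for prev, cur in zip(seq[:-1], seq[1:]):
            p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + V)
            log_prob += math.log(p)
            n_tokens += 1
    return math.exp(-log_prob / n_tokens)

train = [["rob.01", "but", "arrest.01", "EOS"],
         ["arrest.01", "indict.01", "convict.01", "EOS"]]
test = [["rob.01", "arrest.01", "EOS"]]
print(bigram_perplexity(train, test))
```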

Quality of SemLMs: Narrative Cloze Test

Evaluated here with mean reciprocal rank (MRR); Recall@30 results are in the paper. (Results shown in the original slides; an MRR sketch follows below.)
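
A small sketch of the MRR computation used to score the cloze test, assuming each test case is a ranked candidate list plus the held-out gold unit.

```python
def mean_reciprocal_rank(cases):
    """cases: iterable of (ranked_candidates, gold_unit) pairs.

    For each held-out position in the cloze test, the model ranks
    candidate units; the score is the mean of 1/rank of the gold unit
    (0 if the gold unit is not ranked at all).
    """
    total = 0.0
    for ranked, gold in cases:
        if gold in ranked:
            total += 1.0 / (ranked.index(gold) + 1)  # ranks are 1-based
    return total / len(cases)

cases = [
    (["convict.01", "rescue.01", "flee.01"], "convict.01"),  # rank 1
    (["flee.01", "convict.01", "rescue.01"], "convict.01"),  # rank 2
]
print(mean_reciprocal_rank(cases))  # (1.0 + 0.5) / 2 = 0.75
```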

Application to NLP Tasks: Co-reference Resolution

Mention-pair model; SemLM conditional probabilities are added as additional features. Wiseman et al. report better results, but their gains are orthogonal to ours.

Application to NLP Tasks: Shallow Discourse Parsing

CoNLL shared task setting (connective sense classification); SemLM conditional probabilities are added as additional features. A sketch of such a feature follows below.
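
A minimal sketch of how a SemLM conditional probability might be turned into features for either task; `semlm.cond_prob(unit, context)` is a hypothetical interface onto a trained model, not the authors' actual API.

```python
import math

def semlm_features(semlm, context, candidate):
    """Conditional-probability features from a trained SemLM.

    `semlm.cond_prob(unit, context)` is assumed to return
    P(unit | context) under the trained model. For co-reference,
    `candidate` would be the predicate-role unit of a candidate
    antecedent's mention; for discourse parsing, the connective
    whose sense is being classified.
    """
    p = semlm.cond_prob(candidate, context)
    return {
        "semlm_prob": p,
        "semlm_logprob": math.log(p) if p > 0 else -1e9,
    }
```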

Conclusion

How do we model the sequential nature of NL at a semantic level?
- SemLMs: discourse driven; two models, four implementations; high quality.

How do we use the modelling to better support NLU tasks?
- Two tasks, which utilize the two models separately.
- SemLM conditional probabilities as features; trained embeddings.

SemLMs available: email me at hpeng7@illinois.edu

Thank you!

Example Output

FC-LB:
- P(convict.01 | arrest.01, indict.01) = 8.2×10⁻³
- P(rescue.01 | arrest.01, indict.01) = 6.7×10⁻⁵

EC-LB:
- P(arrest.01-obj | rob.01-obj, but) = 2.1×10⁻³
- P(arrest.01-obj | rob.01-sub, but) = 1.7×10⁻⁴
- P(arrest.01-obj | rob.01-sub, and) = 7.3×10⁻³