1 Introduction to Computational Linguistics Eleni Miltsakaki AUTH Spring 2006-Lecture 6.

Presentation transcript:

1 Introduction to Computational Linguistics Eleni Miltsakaki AUTH Spring 2006-Lecture 6

2 What’s the plan for today?
–Brief review of Trueswell et al.’s experiments
–LTAG: a lexicalized tree adjoining grammar
–DLTAG: a lexicalized tree adjoining grammar for discourse
–DLTAG-based parsing system
–Annotation of the Penn Discourse Treebank (PDTB)

3 Trueswell et al 1999 “The kindergarten-path effect: Studying on-line sentence processing in young children”, in Cognition (1999)

4 The garden-path theory At points of syntactic ambiguity the syntactically simplest alternative is chosen: e.g. minimal attachment (e.g., Frazier and Rayner 1982, Ferreira and Clifton 1986) However, it has been shown that non-syntactic sources of information can mediate garden-path effects (e.g., Altmann and Steedman 1988, Tanenhaus et al 1995)

5 Referential principle Example: if two thieves are evoked in the context and then we hear Ann hit the thief with… we prefer the NP-attachment reading (Crain & Steedman 1985)

6 Experiment 1 Methodology: eye-tracking Participants: 16 5-year-old children Material: –Put the frog on the napkin in the box (ambiguous between DESTINATION and MODIFIER) –Put the frog that’s on the napkin in the box (unambiguous)

7 Head mounted eye tracker

8 1 and 2 referent context

9 Unambiguous

10 Analysis Percentage of trials with eye-fixation to INCORRECT DESTINATION (i.e., the empty napkin)
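The measure on this slide can be sketched in a few lines of Python. This is only an illustration: the trial records, the condition labels, and the function name `percent_incorrect_fixations` are assumptions made for the example and are not data or code from the study.

```python
from collections import defaultdict

# Hypothetical trial records (NOT data from the study):
# (condition, whether the participant fixated the incorrect destination)
trials = [
    ("1-referent ambiguous", True),
    ("1-referent ambiguous", False),
    ("1-referent unambiguous", False),
    ("2-referent ambiguous", True),
    ("2-referent ambiguous", True),
    ("2-referent unambiguous", False),
]


def percent_incorrect_fixations(records):
    """Percentage of trials per condition with a look to the incorrect destination."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for condition, looked in records:
        totals[condition] += 1
        if looked:
            hits[condition] += 1
    return {c: 100.0 * hits[c] / totals[c] for c in totals}


rates = percent_incorrect_fixations(trials)
for condition, rate in sorted(rates.items()):
    print(f"{condition}: {rate:.1f}%")
```

Comparing the ambiguous and unambiguous rates within each referent context is what shows whether a garden path occurred.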

11

12 Results
–VP-attachment preference for children: 5-year-olds prefer to interpret the ambiguous ‘on the napkin’ as destination regardless of referential context
–Children are insensitive to the “Referential Principle”
–They don’t ‘recover’ from the initial interpretation
–In the 2-referent ambiguous condition they picked the Target animal at chance

13 Experiment 2 Participants: 12 adults Same material Same methodology

14

15 Results Adults experienced garden path in the 1-referent ambiguous condition only

16

17 Conclusions Adults and children differ in how they handle temporary syntactic ambiguity –Adults resolve ambiguity according to the Referential Principle: modifier in 2-referent context, destination in 1-referent context –Children are insensitive to the Referential Principle: They resolve the ambiguity to the VP-attachment interpretation, i.e., destination

18 Explanation of VP-attachment preference in children Minimal attachment? Lexical frequency?

19 Tree adjoining grammar
Introduced by Joshi, Levy & Takahashi (1975) and Joshi (1985)
Linguistically motivated
–Tree generating grammar (generates tree structures, not just strings). Example: I want him to leave, I promised him to leave
–Allows factoring recursion from the statement of linguistic constraints (dependencies), thus simplifying linguistic description (Kroch & Joshi 1985)
Formally motivated
–A (new) class of grammars that describe mildly context-sensitive languages (Joshi et al 1991)

20 TAG formalism
Concepts: lexicalization and locality/recursion
–Who do you like t?
–Who does John think that you like t?
–Who does John think that Mary said that you like t?
Elementary objects: initial trees and auxiliary trees
Operations: substitution and adjunction

21

22 Adjunction

23 Adjunction
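The two TAG operations can be sketched in Python. This is a minimal illustration under stated assumptions: the `Node` class, the helper names, and the toy trees for "John often sleeps" are invented for the example and are not taken from the slides or from the XTAG grammar. Adjunction splices an auxiliary tree in at a node with the same label: the subtree below that node moves under the auxiliary tree's foot node.

```python
import copy
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    foot: bool = False   # foot node of an auxiliary tree (usually written X*)
    subst: bool = False  # substitution site (usually written with a down arrow)


def leaves(node: Node) -> List[str]:
    """Left-to-right labels of the leaf nodes, i.e. the yield of the tree."""
    if not node.children:
        return [node.label]
    return [w for child in node.children for w in leaves(child)]


def find_foot(node: Node) -> Optional[Node]:
    """Return the foot node of an auxiliary tree, or None if there is none."""
    if node.foot:
        return node
    for child in node.children:
        found = find_foot(child)
        if found is not None:
            return found
    return None


def substitute(site: Node, initial: Node) -> None:
    """Fill a substitution site with a copy of an initial tree of the same label."""
    if not site.subst or site.label != initial.label:
        raise ValueError("substitution needs a matching substitution site")
    site.children = copy.deepcopy(initial.children)
    site.subst = False


def adjoin(target: Node, aux: Node) -> None:
    """Adjoin a copy of auxiliary tree `aux` at node `target` (modifies target)."""
    aux = copy.deepcopy(aux)
    foot = find_foot(aux)
    if foot is None or not (aux.label == foot.label == target.label):
        raise ValueError("adjunction needs root, foot and target labels to match")
    foot.children = target.children  # subtree below the target moves under the foot
    foot.foot = False
    target.children = aux.children   # the auxiliary tree takes the target's place


# Toy elementary trees (hypothetical, for illustration only)
np_site = Node("NP", subst=True)
vp = Node("VP", [Node("V", [Node("sleeps")])])
sentence = Node("S", [np_site, vp])                       # initial tree anchored by "sleeps"
john = Node("NP", [Node("John")])                         # initial tree anchored by "John"
often = Node("VP", [Node("ADV", [Node("often")]), Node("VP", foot=True)])  # auxiliary tree

substitute(np_site, john)
before = leaves(sentence)
adjoin(vp, often)
after = leaves(sentence)
print(before, "->", after)
```

Because the auxiliary tree's root and foot carry the same label, adjunction can apply again at the new VP node, which is how recursion is factored out of the elementary trees.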

24 Derived and derivation trees

25 Basic references: DLTAG, PDTB
–Anchoring a Lexicalized Tree-Adjoining Grammar for Discourse (1998). B. Webber and A. Joshi
–What are Little Texts Made of? A Structural Presuppositional Account Using Lexicalized TAG. B. Webber, A. Joshi, A. Knott, M. Stone
–DLTAG System: Discourse Parsing with a Lexicalized Tree-Adjoining Grammar (2001). K. Forbes, E. Miltsakaki, R. Prasad, A. Sarkar, A. Joshi and B. Webber
–The Penn Discourse Treebank (2004). E. Miltsakaki, R. Prasad, A. Joshi and B. Webber

26 Motivation and basics of the DLTAG approach
–Discourse meaning: more than its parts
–Compositional vs non-compositional aspects of discourse meaning
–This distinction is often conflated in related work
–Smooth transition from sentence level structure to discourse level structure

27 The DLTAG view of discourse connectives
Discourse connectives are treated as higher level predicates taking clausal arguments
Basic types of discourse connectives:
–Structural: subordinate conjunctions (when, although, because, etc.) and coordinate conjunctions (and, but, or)
–“Anaphoric”: adverbials (however, therefore, as a result, etc.)
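The classification on this slide can be written down as a small lookup table. The sketch below covers only the example connectives listed on the slide; the dictionary name, the function name, and the whitespace and case normalisation are assumptions made for illustration, not part of the DLTAG system.

```python
# Connective inventory taken from the slide's examples only (not exhaustive)
CONNECTIVE_TYPES = {
    "when": ("structural", "subordinate conjunction"),
    "although": ("structural", "subordinate conjunction"),
    "because": ("structural", "subordinate conjunction"),
    "and": ("structural", "coordinate conjunction"),
    "but": ("structural", "coordinate conjunction"),
    "or": ("structural", "coordinate conjunction"),
    "however": ("anaphoric", "adverbial"),
    "therefore": ("anaphoric", "adverbial"),
    "as a result": ("anaphoric", "adverbial"),
}


def classify_connective(expression):
    """Return (type, category) for a listed connective, or None if it is not listed."""
    key = " ".join(expression.lower().split())  # normalise case and spacing
    return CONNECTIVE_TYPES.get(key)


print(classify_connective("because"))      # structural, subordinate conjunction
print(classify_connective("As a result"))  # anaphoric, adverbial
```

A lookup like this only identifies candidate connectives; deciding whether a given token actually functions as a discourse connective still depends on context.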

28 Elements of LTAG
• Initial and auxiliary trees
• Initial: encode predicate-argument dependencies
• Auxiliary: recursive, modify elementary trees
• Anchors of elementary trees are semantic predicates
• Substitution and adjunction
D-LTAG is similar
• Anchors of elementary trees are semantic features which can be lexicalized with discourse connectives

29 D-LTAG Structures and Semantics Initial Trees (a) John failed his exam because he was lazy

30 Auxiliary trees (a) Mary saw John but she decided to ignore him. (b) Mary saw John. She decided to ignore him. 1. On the one hand, John loves Barolo. 2. So he ordered three cases of the ’97. 3. On the other hand, he had to cancel the order 4. because he then found that he was broke.

31 Phenomena that DLTAG captures Arguments of a coherence relation can be stretched “long distance” Multiple discourse connectives can appear in a single sentence or even a single clause Coherence relations can vary in how and when they are realized lexically

32 Stretching arguments On the one hand, John loves Barolo. So he ordered three cases of the ’97. On the other hand, he had to cancel the order because he then found that he was broke.

33 Non-Compositional Semantics Non-defeasible vs defeasible causal connection (a) The City Council refused the women a permit because they feared violence. (b) The City Council refused the women a permit. They feared violence. Presuppositional semantics (Knott et al, 1996): –Defeasible rule: When people go to the zoo, they leave their work behind. (c) John went to the zoo. However, he took his cell phone with him.

34 DLTAG system for parsing discourse Theoretical framework: DLTAG Main system components: –Sentence level parsing –Tree extractor –Tree mapper –Discourse input representation –Discourse level parsing

35 Parser (Sarkar, 2000) –XTAG grammar –One derivation per sentence E.g. Mary was amazed

36 Tree extractor: identifying discourse units (a) While she was eating lunch she saw a dog

37 Tree mapper From sentence level structure to discourse structure

38 Discourse input representation

39 System Architecture

40 Example Discourse (a) Mary was amazed. (b) While she was eating lunch, she saw a dog. (c) She’d seen a lot of dogs, but this one was amazing. (d) The dog barked and Mary smiled. (e) Then, she gave it a sandwich

41 Derived and Derivation trees

42 Corpus example The pilots could play hardball by noting they were crucial to any sale or restructuring because they can refuse to fly the airplanes. If they were to insist on a low bid of, say $200 a share the board mightn’t be able to obtain a higher offer from the bidders because banks might hesitate to finance a transaction the pilots oppose. Also, because UAL chairman Stephen Wolf and other UAL executives have joined the pilots’ bid, the board might be able to exclude him for its deliberations in order to be fair to other bidders (Wall Street Journal) LEXTRACT (Xia et al 2000)

43 Corpus: Derivation Tree

44 Derived Tree

45 Summary points of the DLTAG system
Implementation of D-LTAG
• Use LTAG grammar to parse each clause
• Use the same LTAG-based parser both at the sentence level and discourse level
• Build the semantics compositionally from the sentence to the discourse level
• Factor away non-compositional semantic contributions
In the output representation
• The semantics of the connectives form only part of the compositional derivation of discourse relations
• Discourse connectives are NOT viewed as names of relations

46 The Penn Discourse Treebank
• Annotation of discourse connectives and their arguments
• Large scale: annotation of the entire Penn Treebank (1 million words)

47 Merits of the PDTB
• Discourse relations are lexically grounded
–Exposing a clearly defined level of discourse structure
–Enabling annotations with high reliability
• Building on existing syntactic and semantic layers of annotation (Treebank, PropBank)
• Annotations independent of the DLTAG (or any other) framework

48 Project description
• Annotation of connectives in the Penn Treebank
• 30K tokens of connectives: 20K explicit conns + 10K implicit conns
• Annotation of ARG1 and ARG2 of conns
–Ex. Mary left early because she was sick.
–ARG1: Mary left early
–CONN: because
–ARG2: she was sick
• Four annotators at the beginning, then two
• To come: Semantic role labels for ARG1 and ARG2
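The ARG1 / CONN / ARG2 triple from the example can be held in a small record type. The sketch below is an assumption-laden illustration: the class `ConnectiveAnnotation` and the helper `annotate_medial` are hypothetical names, and the string-splitting heuristic only handles a connective sitting between its two arguments. It is not the PDTB annotation format, and in the project the arguments are marked by human annotators.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectiveAnnotation:
    conn: str              # the connective, e.g. "because"
    arg1: str              # text of the first argument
    arg2: str              # text of the second argument
    explicit: bool = True  # False for implicit connectives supplied by annotators


def annotate_medial(sentence: str, conn: str) -> ConnectiveAnnotation:
    """Toy heuristic: split a sentence around a connective that sits between its arguments."""
    words = sentence.rstrip(".").split()
    i = words.index(conn)  # raises ValueError if the connective is absent
    return ConnectiveAnnotation(
        conn=conn,
        arg1=" ".join(words[:i]),
        arg2=" ".join(words[i + 1:]),
    )


ann = annotate_medial("Mary left early because she was sick.", "because")
print(ann)
```

The heuristic breaks as soon as the subordinate clause is preposed ("Because she was sick, Mary left early") or the connective is an adverbial whose first argument lies in an earlier sentence, which is part of why the extent of arguments needs explicit guidelines.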

49 Connectives
• Subordinate conjunctions (when, because, although, etc.)
ARG1 – ARG2
(1) Because [the drought reduced U.S. stockpiles], [they have more than enough storage space for their new crop], and that permits them to wait for prices to rise.

50 Connectives
• Coordinate conjunctions (and, but, or, etc.)
ARG1 – ARG2
(2) [William Gates and Paul Allen in 1975 developed an early language-housekeeper system for PCs], and [Gates became an industry billionaire six years after IBM adapted one of these versions in 1981].

51 Connectives
• Adverbials (therefore, then, as a result, etc.)
ARG1 – ARG2
(3) For years, costume jewelry makers fought a losing battle. Jewelry displays in department stores were often cluttered and uninspired. And the merchandise was, well, fake. As a result, marketers of faux gems steadily lost space in department stores to more fashionable rivals -- cosmetics makers.

52 Connectives
• Implicit (annotators provide a named expression for the implicit connective)
ARG1 – ARG2
(4) …[The $6 billion that some 40 companies are looking to raise in the year ending March 31 compares with only $2.7 billion raised on the capital market in the previous fiscal year]. IMPLICIT-(In contrast) [In fiscal 1984 before Mr. Gandhi came to power, only $810 million was raised].

53 Annotation guidelines
• What counts as a connective?
–Including distinction between clausal adverbials and discourse adverbials
• What counts as an argument?
–Minimally a clause
• How far does the argument extend?
–Including distinction between arguments (ARG1 and ARG2) and supplements to arguments (SUP1 and SUP2 respectively)
–Interesting comparison with PropBank annotations of verbs

54 WordFreak (T. Morton & J. Lacivita)

55 Preliminary experiments
• 10 explicit connectives (2717 tokens): therefore, as a result, instead, otherwise, nevertheless, because, although, even though, when, so that
• 386 tokens of implicit connectives
• 2 annotators

56 Inter-annotator agreement (1)
• Measure by token (ARG1+ARG2)
–ARG1 and ARG2 counted together
–Total number of connective ARG1/ARG2 tokens = 2717
• Agreement = 82.8%
–Subord. Conj. = 86%
–Adverbials = 57%

57 Agreement per connective (1)
CONNECTIVES       AGR No.   Conn. Total   % AGR
When                                      %
Because                                   88.2%
Even though                               88.3%
Although                                  81.8%
So that                                   79.4%
TOTAL SUBCONJ                             %
Nevertheless                              %
Otherwise                                 91.3%
Instead                                   61.0%
As a result                               45.2%
Therefore                                 78.6%
TOTAL ADV
OVERALL TOTAL                             %

58 Inter-annotator agreement (2)
• Measure by ARG (ARG1, ARG2)
–Check agreement for ARG1 and ARG2 separately
–Total number of argument tokens = 5434 (2717 ARG1 + 2717 ARG2)
• Agreement = 90.2%
–ARG1 = 86.3%
–ARG2 = 94.1%
–Subord. Conj. = 92.4%
–Adverbials = 71.8%
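The difference between the two measures can be made concrete with a short Python sketch. It assumes, purely for illustration, that each annotator's output for a connective token is a pair of character spans `(arg1_span, arg2_span)`. The span values below are invented and the function names are hypothetical; none of this is PDTB data. Measuring by token requires both arguments to match exactly, while measuring by ARG scores each argument on its own, so the denominator doubles (2 × 2717 = 5434 in the experiment).

```python
# Hypothetical annotations: one (arg1_span, arg2_span) pair per connective token,
# each span a (start, end) character offset. Values are invented for illustration.
annotator_a = [((0, 3), (4, 7)), ((0, 5), (6, 9)), ((2, 4), (5, 8)), ((0, 2), (3, 6))]
annotator_b = [((0, 3), (4, 7)), ((1, 5), (6, 9)), ((2, 4), (5, 8)), ((0, 2), (3, 5))]


def agreement_by_token(a, b):
    """Percent of connective tokens where ARG1 and ARG2 both match exactly."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def agreement_by_arg(a, b):
    """Percent of individual arguments (ARG1 and ARG2 counted separately) that match exactly."""
    matches = sum((x[0] == y[0]) + (x[1] == y[1]) for x, y in zip(a, b))
    return 100.0 * matches / (2 * len(a))


by_token = agreement_by_token(annotator_a, annotator_b)
by_arg = agreement_by_arg(annotator_a, annotator_b)
print(f"by token: {by_token:.1f}%  by ARG: {by_arg:.1f}%")
```

The by-ARG figure can never be lower than the by-token figure, because a token with one matching argument earns half credit instead of none.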

59 Agreement per connective (2)
CONNECTIVES       AGR No.   Conn. Total   % AGR
When                                      %
Because                                   93.4%
Even though                               94.1%
Although                                  90.1%
So that                                   89.2%
TOTAL SUBCONJ                             %
Nevertheless                              %
Otherwise                                 95.7%
Instead                                   72.9%
As a result                               65.5%
Therefore                                 87.5%
TOTAL ADV                                 %
OVERALL TOTAL                             %

60 Analysis of disagreement
Majority of disagreement due to ‘partial overlap’: 79%
(5) It was forced into liquidation before trial when investors yanked their funds after the government demanded a huge pre-trial asset forfeiture.
DISAGREEMENT TYPE        No.    %
Missing annotations             %
No overlap                      5.6%
PARTIAL OVERLAP TOTAL    422    79%
–Parentheticals                 %
–Higher verb                    33.9%
–Dependent clause               34.1%
–Other                          1.1%
Unresolved               10     1.9%
TOTAL                    534    100%

61 Reanalysis of agreement
• Inter-annotator agreement counting in partial overlap: 94.5%
• Dealing with extent of the argument
–Revise guidelines
–BUT: Some disagreement will persist
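Counting partial overlap as agreement can be sketched as follows. As before, the representation is an assumption: each argument is a half-open character span `(start, end)`, the span values are invented, and the function names are hypothetical. The slides do not specify how overlap was computed, so this shows only one plausible reading, in which two spans agree if they share at least one character.

```python
# Hypothetical annotations: one (arg1_span, arg2_span) pair per connective token.
annotator_a = [((0, 3), (4, 7)), ((0, 5), (6, 9)), ((2, 4), (5, 8)), ((0, 2), (3, 6)), ((0, 2), (3, 6))]
annotator_b = [((0, 3), (4, 7)), ((1, 5), (6, 9)), ((2, 4), (5, 8)), ((0, 2), (3, 5)), ((0, 2), (7, 9))]


def spans_overlap(s, t):
    """True if two half-open (start, end) spans share at least one position."""
    return s[0] < t[1] and t[0] < s[1]


def exact_agreement(a, b):
    """Percent of arguments whose spans are identical."""
    matches = sum((x[0] == y[0]) + (x[1] == y[1]) for x, y in zip(a, b))
    return 100.0 * matches / (2 * len(a))


def relaxed_agreement(a, b):
    """Percent of arguments whose spans at least partially overlap."""
    matches = sum(
        spans_overlap(x[0], y[0]) + spans_overlap(x[1], y[1]) for x, y in zip(a, b)
    )
    return 100.0 * matches / (2 * len(a))


exact = exact_agreement(annotator_a, annotator_b)
relaxed = relaxed_agreement(annotator_a, annotator_b)
print(f"exact: {exact:.1f}%  relaxed: {relaxed:.1f}%")
```

Only the last token's ARG2 still counts as a disagreement under the relaxed measure, because its two spans do not overlap at all.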

62 Comparing predicates
• PropBank – sentence level predicates (verbs)
–Arity of arguments: Hard
–Extent of the argument: Easy
• Penn Discourse Treebank – discourse predicates
–Arity of arguments: Easy
–Extent of the argument: Hard

63 Summary points for PDTB
• The Penn Discourse Treebank
–Large scale discourse annotation
–Basic level of annotation: connectives and their arguments
–Links to Penn Treebank and Penn PropBank (rich substrate for extracting syntactic and semantic features)
–Expected completion November 2005
• Inter-annotator agreement
–Most conservative: 82.8%
–Relaxing exact match: 94.5%