1 Natural Language Processing Chapter 18. 2 Outline Reference –Kinds of reference phenomena –Constraints on co-reference –Preferences for co-reference.

Slides:



Advertisements
Similar presentations
Discourse Structure and Discourse Coherence
Advertisements

Referring Expressions: Definition Referring expressions are words or phrases, the semantic interpretation of which is a discourse entity (also called referent)
Semantics (Representing Meaning)
Discourse Analysis David M. Cassel Natural Language Processing Villanova University April 21st, 2005 David M. Cassel Natural Language Processing Villanova.
Unit 6 Predicates, Referring Expressions, and Universe of Discourse Part 1: Practices 1-7.
1 Discourse, coherence and anaphora resolution Lecture 16.
Pragmatics II: Discourse structure Ling 571 Fei Xia Week 7: 11/10/05.
Discourse Martin Hassel KTH NADA Royal Institute of Technology Stockholm
Natural Language Processing
1/18 Dialogue systems R Mitkov (ed.) The Oxford Handbook of Computational Linguistics, Oxford (2004): OUP – Chapters 6 ( “ Discourse ” Allan Ramsay), 7.
Anaphora Resolution Spring 2010, UCSC – Adrian Brasoveanu [Slides based on various sources, collected over a couple of years and repeatedly modified –
Chapter 18: Discourse Tianjun Fu Ling538 Presentation Nov 30th, 2006.
INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING NLP-AI IIIT-Hyderabad CIIL, Mysore ICON DECEMBER, 2003.
Pronouns.
 Christel Kemke 2007/08 COMP 4060 Natural Language Processing Discourse and Dialogue.
ZERO PRONOUN RESOLUTION IN JAPANESE Jeffrey Shu Ling 575 Discourse and Dialogue.
Discourse: Reference Ling571 Deep Processing Techniques for NLP March 2, 2011.
CS 4705 Algorithms for Reference Resolution. Anaphora resolution Finding in a text all the referring expressions that have one and the same denotation.
CS 4705 Discourse Structure and Text Coherence. What makes a text/dialogue coherent? Incoherent? “Consider, for example, the difference between passages.
CS 4705 Lecture 21 Algorithms for Reference Resolution.
Natural Language Generation Martin Hassel KTH CSC Royal Institute of Technology Stockholm
1 Introduction to Computational Linguistics Eleni Miltsakaki AUTH Fall 2005-Lecture 2.
CS 4705 Pronouns and Reference Resolution. Gracie: Oh yeah... and then Mr. and Mrs. Jones were having matrimonial trouble, and my brother was hired to.
CS 4705 Lecture 20 Reference. Pragmatics Context-dependent meaning Jeb Bush was helped by his brother and so was Frank Lautenberg. Mike Bloomberg bet.
1 Pragmatics: Discourse Analysis J&M’s Chapter 21.
Pragmatics I: Reference resolution Ling 571 Fei Xia Week 7: 11/8/05.
Designed by Elisa Paramore
Linguistic Theory Lecture 3 Movement. A brief history of movement Movements as ‘special rules’ proposed to capture facts that phrase structure rules cannot.
Introduction to English Syntax Level 1 Course Ron Kuzar Department of English Language and Literature University of Haifa Chapter 2 Sentences: From Lexicon.
1 Natural Language Processing Computational Discourse.
Fall 2005 Lecture Notes #7 EECS 595 / LING 541 / SI 661 Natural Language Processing.
Binding Theory Describing Relationships between Nouns.
1 CS3730/ISP3120 Discourse Processing and Pragmatics Lecture Notes Jan 10, 12.
Discourse Read J & M Chapter More than One Sentence at a Time The alphas have a long-standing hatred of the betas. Their leaders have decided that.
Phrases Composition. Goals: Using prepositions in writing 1.Do not end sentences on prepositions. 2.Reduce strings of prepositional phrases. 3.Begin sentences.
Artificial Intelligence: Natural Language
Reference Resolution. Sue bought a cup of coffee and a donut from Jane. She met John as she left. He looked at her enviously as she drank the coffee.
Coherence and Coreference Introduction to Discourse and Dialogue CS 359 October 2, 2001.
1 Special Electives of Comp.Linguistics: Processing Anaphoric Expressions Eleni Miltsakaki AUTH Fall 2005-Lecture 7.
Rules, Movement, Ambiguity
1/21 Automatic Discovery of Intentions in Text and its Application to Question Answering (ACL 2005 Student Research Workshop )
ACE TESOL Diploma Program – London Language Institute OBJECTIVES You will understand: 1. The terminology and concepts of semantics, pragmatics and discourse.
1 LIN 1310B Introduction to Linguistics Prof: Nikolay Slavkov TA: Qinghua Tang CLASS 16, March 6, 2007.
Topic and the Representation of Discourse Content
Reference Resolution CMSC Discourse and Dialogue September 30, 2004.
Pragmatics Nuha Alwadaani.
Support Vector Machines and Kernel Methods for Co-Reference Resolution 2007 Summer Workshop on Human Language Technology Center for Language and Speech.
SYNTAX.
NLP. Introduction to NLP Anaphora –I went to see my grandfather at the hospital. The old man has been there for weeks. He had surgery a few days ago.
Reference Resolution CMSC Natural Language Processing January 15, 2008.
An evolutionary approach for improving the quality of automatic summaries Constantin Orasan Research Group in Computational Linguistics School of Humanities,
Discourse: Structure and Coherence Kathy McKeown Thanks to Dan Jurafsky, Diane Litman, Andy Kehler, Jim Martin.
Discourse & Natural Language Generation Martin Hassel KTH NADA Royal Institute of Technology Stockholm
Fall 2004 Lecture Notes #8 EECS 595 / LING 541 / SI 661 Natural Language Processing.
Natural Language Processing COMPSCI 423/723 Rohit Kate.
Unit 6 Predicates, Referring Expressions, and Universe of Discourse.
Discourse: Structure and Coherence Kathy McKeown Thanks to Dan Jurafsky, Diane Litman, Andy Kehler, Jim Martin.
Implicature. I. Definition The term “Implicature” accounts for what a speaker can imply, suggest or mean, as distinct from what the speaker literally.
Discourse Analysis Natural Language Understanding basis
Statistical NLP: Lecture 3
Part I: Basics and Constituency
Referring Expressions: Definition
Clustering Algorithms for Noun Phrase Coreference Resolution
Lecture 17: Discourse: Anaphora Resolution and Coherence
Algorithms for Reference Resolution
CSCI 5832 Natural Language Processing
CSCI 5832 Natural Language Processing
Pronouns and Reference Resolution
CS4705 Natural Language Processing
Presentation transcript:

1 Natural Language Processing Chapter 18

2 Outline Reference –Kinds of reference phenomena –Constraints on co-reference –Preferences for co-reference –The Lappin-Leass algorithm for coreference Coherence –Hobbs coherence relations –Rhetorical Structure Theory

3 Part I: Reference Resolution John went to Bill’s car dealership to check out an Acura Integra. He looked at it for half an hour I’d like to get from Boston to San Francisco, on either December 5th or December 6th. It’s ok if it stops in another city along they way

4 Some terminology John went to Bill’s car dealership to check out an Acura Integra. He looked at it for half an hour Reference: process by which speakers use words John and he to denote a particular person –Referring expression: John, he –Referent: the actual entity (but as a shorthand we might call “John” the referent). –John and he “corefer” –Antecedent: John –Anaphor: he

5 Discourse Model Model of the entities the discourse is about A referent is first evoked into the model. Then later it is accessed from the model John He Corefer Evoke Access

6 Many types of reference (after Webber 91) According to John, Bob bought Sue an Integra, and Sue bought Fred a Legend –But that turned out to be a lie (a speech act) –But that was false (proposition) –That struck me as a funny way to describe the situation (manner of description) –That caused Sue to become rather poor (event) –That caused them both to become rather poor (combination of several events)

7 Reference Phenomena Indefinite noun phrases: generally new –I saw an Acura Integra today –Some Acura Integras were being unloaded… –I am going to the dealership to buy an Acura Integra today. (specific/non-specific) I hope they still have it I hope they have a car I like Definite noun phrases: identifiable to hearer because –Mentioned: I saw an Acura Integra today. The Integra was white –Identifiable from beliefs: The Indianapolis 500 –Inherently unique: The fastest car in …

8 Reference Phenomena: Pronouns I saw an Acura Integra today. It was white Compared to definite noun phrases, pronouns require more referent salience. –John went to Bob’s party, and parked next to a beautiful Acura Integra –He got out and talked to Bob, the owner, for more than an hour. –Bob told him that he recently got engaged and that they are moving into a new home on Main Street. –??He also said that he bought it yesterday. –He also said that he bought the Acura yesterday

9 Salience Via Structural Recency E: So, you have the engine assembly finished. Now attach the rope. By the way, did you buy the gas can today? A: Yes E: Did it cost much? A: No E: Good. Ok, have you got it attached yet?

10 More on Pronouns Cataphora: pronoun appears before referent: –Before he bought it, John checked over the Integra very carefully.

11 Inferrables I almost bought an Acura Integra today, but the engine seemed noisy. Mix the flour, butter, and water. –Kneed the dough until smooth and shiny –Spread the paste over the blueberries –Stir the batter until all lumps are gone.

12 Discontinuous sets John has an Acura and Mary has a Suburu. They drive them all the time.

13 Generics I saw no less than 6 Acura Integras today. They are the coolest cars.

14 Pronominal Reference Resolution Given a pronoun, find the referent (either in text or as a entity in the world) We will approach this today in 3 steps –Hard constraints on reference –Soft constraints on reference –Algorithms which use these constraints

15 Why people care Classic: "text understanding" Information extraction, information retrieval, summarization…

16 What influences pronoun resolution? Syntax Semantics/world knowledge

17 Why syntax matters John kicked Bill. Mary told him to go home. Bill was kicked by John. Mary told him to go home. John kicked Bill. Mary punched him.

18 Why syntax matters John kicked Bill. Mary told him to go home. Bill was kicked by John. Mary told him to go home. John kicked Bill. Mary punched him. John

19 Why syntax matters John kicked Bill. Mary told him to go home. Bill was kicked by John. Mary told him to go home. John kicked Bill. Mary punched him. Bill

20 Why syntax matters John kicked Bill. Mary told him to go home. Bill was kicked by John. Mary told him to go home. John kicked Bill. Mary punched him. Bill

21 Why syntax matters John kicked Bill. Mary told him to go home. Bill was kicked by John. Mary told him to go home. John kicked Bill. Mary punched him. Grammatical role hierarchy

22 Why syntax matters John kicked Bill. Mary told him to go home. Bill was kicked by John. Mary told him to go home. John kicked Bill. Mary punched him. Grammatical role parallelism

23 Why semantics matters The city council denied the demonstrators a permit because they {feared|advocated} violence.

24 Why semantics matters The city council denied the demonstrators a permit because they {feared|advocated} violence.

25 Why semantics matters The city council denied the demonstrators a permit because they {feared|advocated} violence.

26 Why knowledge matters John hit Bill. He was severely injured.

27 Margaret Thatcher admires Hillary Clinton, and George W. Bush absolutely worships her. Why Knowledge Matters

28 Hard constraints on coreference Number agreement –John has an Acura. It is red. Person and case agreement –*John and Mary have Acuras. We love them (where We=John and Mary) Gender agreement –John has an Acura. He/it/she is attractive. Syntactic constraints –John bought himself a new Acura (himself=John) –John bought him a new Acura (him = not John)

29 Pronoun Interpretation Preferences Selectional Restrictions –John parked his Acura in the garage. He had driven it around for hours. Recency –John has an Integra. Bill has a Legend. Mary likes to drive it.

30 Pronoun Interpretation Preferences Grammatical Role: Subject preference –John went to the Acura dealership with Bill. He bought an Integra. –Bill went to the Acura dealership with John. He bought an Integra –(?) John and Bill went to the Acura dealership. He bought an Integra

31 Repeated Mention preference John needed a car to get to his new job. He decided that he wanted something sporty. Bill went to the Acura dealership with him. He bought an Integra.

32 Parallelism Preference Mary went with Sue to the Acura dealership. Sally went with her to the Mazda dealership. Mary went with Sue to the Acura dealership. Sally told her not to buy anything.

33 Verb Semantics Preferences John telephoned Bill. He lost the pamphlet on Acuras. John criticized Bill. He lost the pamphlet on Acuras. Implicit causality –Implicit cause of criticizing is object. –Implicit cause of telephoning is subject.

34 Verb Preferences John seized the Acura pamphlet from Bill. He loves reading about cars. John passed the Acura pamphlet to Bill. He loves reading about cars.

35 Pronoun Resolution Algorithm Lappin and Leass (1994): Given he/she/it, assign antecedent. Implements only recency and syntactic preferences Two steps –Discourse model update When a new noun phrase is encountered, add a representation to discourse model with a salience value Modify saliences. –Pronoun resolution Choose the most salient antecedent

36 Salience Factors and Weights From Lappin and Leass Sentence recency100 Subject emphasis80 Existential emphasis70 Accusative (direct object) emphasis50 Ind. Obj and oblique emphasis40 Non-adverbial emphasis50 Head noun emphasis80

37 Recency Weights are cut in half after each sentence is processed This, and a sentence recency weight (100 for new sentences, cut in half each time), captures the recency preferences

38 Lappin and Leass (cont) Grammatical role preference –Subject > existential predicate nominal > object > indirect object > demarcated adverbial PP Examples –An Acura Integra is parked in the lot (subject) –There is an Acura Integra parked in the lot (ex. pred nominal) –John parked an Acura Integra in the lot (object) –John gave his Acura Integra a bath (indirect obj) –In his Acura Integra, John showed Susan his new CD player (demarcated adverbial PP) Head noun emphasis factor gives above 80 points, but followed embedded NP nothing: –The owner’s manual for an Acura Integra is on John’s desk

39 Lappin and Leass Algorithm Collect the potential referents (up to 4 sentences back) Remove potential referents that do not agree in number or gender with the pronoun Remove potential references that do not pass syntactic coreference constraints Compute total salience value of referent from all factors, including, if applicable, role parallelism (+35) or cataphora (-175). Select referent with highest salience value. In case of tie, select closest.

40 Example John saw a beautiful Acura Integra at the dealership. He showed it to Bob. He bought it. recSubjExistObjInd- obj Non- adv Head N Total John Integra dealership Sentence 1:

41 After sentence 1 Cut all values in half ReferentPhrasesValue John{John}155 Integra{a beautiful Acura Integra} 140 dealership{the dealership115

42 He showed it to Bob He specifies male gender So Step 2 reduces set of referents to only John. Now update discourse model: He in current sentence (recency=100), subject position (=80), not adverbial (=50) not embedded (=80), so add 310: ReferentPhrasesValue John{John, he1} Integra{a beautiful Acura Integra}140 dealership{the dealership115

43 He showed it to Bob Can be Integra or dealership. Need to add weights: –Parallelism: it + Integra are objects (dealership is not), so +35 for integra –Integra 175 to dealership 115, so pick Integra Update discourse model: it is nonembedded object, gets =280:

44 He showed it to Bob ReferentPhrasesValue John{John, he1}465 Integra{a beautiful Acura Integra, it1} 420 dealership{the dealership}115

45 He showed it to Bob Bob is new referent, is oblique argument, weight is =270 ReferentPhrasesValue John{John, he1}465 Integra{a beautiful Acura Integra, it1}420 Bob{Bob}270 dealership{the dealership}115

46 He bought it Drop weights in half: ReferentPhrasesValue John{John, he1}232.5 Integra{a beautiful Acura Integra, it1}210 Bob{Bob}135 dealership{the dealership}57.5 He2 will be resolved to John, and it2 to Integra

47 A search-based solution Hobbs 1978: Resolving pronoun references

48 Hobbs 1978 Assessment of difficulty of problem Incidence of the phenomenon A simple algorithm that has become a baseline See handout

49 A parse tree

50 Hobbs’s point …the naïve approach is quite good. Computationally speaking, it will be a long time before a semantically based algorithm is sophisticated enough to perform as well, and these results set a very high standard for any other approach to aim for.

51 Hobbs’s point Yet there is every reason to pursue a semantically based approach. The naïve algorithm does not work. Any one can think of examples where it fails. In these cases it not only fails; it gives no indication that it has failed and offers no help in finding the real antecedent. (p. 345)

52 Reference Resolution: Summary Lots of other algorithms and other constraints –Centering theory: constraints which focus on discourse state, and focus. (read on your own) –Hobbs: ref. resolution as by-product of general reasoning (later in these notes) –Mitkov et al. (e.g.) Machine learning

53 Part II: Text Coherence

54 What Makes a Discourse Coherent? The reason is that these utterances, when juxtaposed, will not exhibit coherence. Almost certainly not. Do you have a discourse? Assume that you have collected an arbitrary set of well- formed and independently interpretable utterances, for instance, by randomly selecting one sentence from each of the previous chapters of this book.

55 Better? Assume that you have collected an arbitrary set of well-formed and independently interpretable utterances, for instance, by randomly selecting one sentence from each of the previous chapters of this book. Do you have a discourse? Almost certainly not. The reason is that these utterances, when juxtaposed, will not exhibit coherence.

56 Coherence John hid Bill’s car keys. He was drunk ??John hid Bill’s car keys. He likes spinach

57 What makes a text coherent? Appropriate use of coherence relations between subparts of the discourse -- rhetorical structure Appropriate sequencing of subparts of the discourse -- discourse/topic structure Appropriate use of referring expressions

58 Hobbs 1979 Coherence Relations Result Infer that the state or event asserted by S0 causes or could cause the state or event asserted by S1. –John bought an Acura. His father went ballistic.

59 Hobbs: “Explanation” Infer that the state or event asserted by S1 causes or could cause the state or event asserted by S0 John hid Bill’s car keys. He was drunk

60 Hobbs: “Parallel” Infer p(a1,a2...) from the assertion of S0 and p(b1,b2…) from the assertion of S1, where ai and bi are similar, for all I. John bought an Acura. Bill leased a BMW.

61 Hobbs “Elaboration” Infer the same proposition P from the assertions of S0 and S1: John bought an Acura this weekend. He purchased a beautiful new Integra for 20 thousand dollars at Bill’s dealership on Saturday afternoon.

62 An Inference-Based Algorithm Abduction A  B; B; infer A (unsound) All Jaguars are fast. John’s car is fast. Abductively infer: John’s car is a Jaguar. Defeasible: John’s car is a Porsche, though. When we use abduction to recognize discourse coherence, we want the best explanation. Probabilities, heuristics, or both (Hobbs)

63 Example See lecture

64 Rhetorical Structure Theory One theory of discourse structure, based on identifying relations between segments of the text –Nucleus/satellite notion encodes asymmetry –Some rhetorical relations: Elaboration (set/member, class/instance, whole/part…) Contrast: multinuclear Condition: Sat presents precondition for N Purpose: Sat presents goal of the activity in N

65 Relations A sample definition –Relation: evidence –Constraints on N: H might not believe N as much as S think s/he should –Constraints on Sat: H already believes or will believe Sat An example: The governor supports big business. He is sure to veto House Bill 1711.

66 Automatic Rhetorical Structure Labeling Supervised machine learning –Get a group of annotators to assign a set of RST relations to a text –Extract a set of surface features from the text that might signal the presence of the rhetorical relations in that text –Train a supervised ML system based on the training set

67 Features Explicit markers: because, however, therefore, then, etc. Tendency of certain syntactic structures to signal certain relations: Infinitives are often used to signal purpose relations: Use rm to delete files. Ordering Tense/aspect Intonation

68 Some Problems with RST How many Rhetorical Relations are there? How can we use RST in dialogue as well as monologue? RST forces an artificial tree structure on discousres Difficult to get annotators to agree on labeling the same texts

69 Summary Reference –Kinds of reference phenomena –Constraints on co-reference –Preferences for co-reference –The Lappin-Leass algorithm for coreference Coherence –Hobbs coherence relations –Rhetorical Structure Theory