Discourse Analysis David M. Cassel Natural Language Processing Villanova University April 21st, 2005

Presentation transcript:

Discourse Analysis
David M. Cassel, Natural Language Processing, Villanova University, April 21st, 2005

Discourse Analysis
Discourse: collocated, related groups of sentences (from the book).

Discourse Analysis
- Discourse Model: a model to represent the entities mentioned in the discourse
- Coreference or Anaphora Resolution: determining which entity a referring expression refers to
- Coherence: modeling the logical flow of the discourse
The book also discusses psycholinguistic studies of reference and coherence.
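To make the discourse-model idea concrete, here is a minimal Python sketch of a structure that records evoked entities and the expressions that refer back to them. The names DiscourseEntity, DiscourseModel, evoke, and refer are illustrative choices, not terms defined by the book or the slides.

from dataclasses import dataclass, field

@dataclass
class DiscourseEntity:
    """One entity evoked by the discourse (e.g., a person named in the text)."""
    name: str
    mentions: list = field(default_factory=list)  # referring expressions that corefer with this entity

class DiscourseModel:
    """Tracks the entities the discourse has introduced so far."""
    def __init__(self):
        self.entities = []

    def evoke(self, name):
        """A first mention introduces (evokes) a new entity."""
        entity = DiscourseEntity(name=name, mentions=[name])
        self.entities.append(entity)
        return entity

    def refer(self, expression, entity):
        """A later referring expression is linked back to an existing entity (coreference)."""
        entity.mentions.append(expression)

# Mirroring the Floyd example: "Gavin Floyd" evokes an entity; "he", "Floyd",
# and "The righthander" then corefer with it.
model = DiscourseModel()
floyd = model.evoke("Gavin Floyd")
for expression in ["he", "Floyd", "The righthander"]:
    model.refer(expression, floyd)
print(floyd.mentions)  # ['Gavin Floyd', 'he', 'Floyd', 'The righthander']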

Anaphora Resolution
Before the game, manager Charlie Manuel said Gavin Floyd's performance would not affect whether he remains with the team when Vicente Padilla comes off the disabled list Tuesday. Then Floyd went out and had a nightmarish first inning: four walks, one wild pitch, one hit, four runs. After the game, Manuel said Floyd's disastrous outing had not changed his mind. The righthander will remain with the club and be used in relief. "The pitcher we saw in St. Louis is a pitcher who has the ability to be a very good major-league pitcher," he said. "He didn't have command of his fastball and couldn't get his breaking ball over tonight.... Maybe the cold was affecting his breaking ball, because he was bouncing a lot of them." -- Sam Carchidi, Philadelphia Inquirer, 4/16/05

Discourse Model
[Diagram: the referring expressions "Gavin Floyd", "he", "Floyd", "The righthander", "The pitcher we saw in St. Louis", and "his" evoke (introduce), refer to, and corefer with the discourse entities Gavin Floyd, Charlie Manuel, and Vicente Padilla. Adapted from Figure 18.1, Speech & Language Processing.]

Types of Anaphoric References
- Indefinite noun phrases: "A baseball player like that should do well."
- Definite noun phrases: "The righthander will remain with the club."
- Pronouns: "He had a bad game."
- Demonstratives: "This player has a bright future."
- One-anaphora: "I saw no less than 6 Acura Integras today. Now I want one." (from the book)
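The distinctions above can be approximated with a very rough surface heuristic that looks at the first word of a noun phrase. This toy classifier is an illustration only, not a method from the book, and it ignores many harder cases.

def reference_type(np_tokens):
    """Classify a referring expression by its first word (toy heuristic)."""
    first = np_tokens[0].lower()
    if first in {"he", "she", "it", "they", "him", "her", "them"}:
        return "pronoun"
    if first in {"this", "that", "these", "those"}:
        return "demonstrative"
    if first == "one":
        return "one-anaphora"
    if first == "the":
        return "definite NP"
    if first in {"a", "an"}:
        return "indefinite NP"
    return "other"

print(reference_type(["The", "righthander"]))  # definite NP
print(reference_type(["He"]))                  # pronoun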

Reference Constraints
- Number Agreement: "Floyd pitched 6 innings. They went well."
- Person and Case: "He didn't have command of his fastball."
- Gender Agreement: "Floyd took his glove with him. It fit well."
- Syntactic Constraints: "Floyd threw him the ball."
- Selectional Restrictions: "Floyd stepped onto the mound with the ball. He threw it really fast."
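The number and gender constraints can be applied mechanically to prune candidate antecedents. Below is a hedged sketch; the dictionary representation of pronouns and candidates, and the feature values on them, are assumptions made for illustration.

def agrees(pronoun, candidate):
    """Return True if a candidate antecedent is compatible with the pronoun's
    number and gender (toy feature values, assumed for this sketch)."""
    if pronoun["number"] != candidate["number"]:
        return False   # "They went well." cannot pick up the singular "Floyd"
    if pronoun["gender"] not in ("any", candidate["gender"]):
        return False   # "It fit well." cannot pick up the masculine "Floyd"
    return True

pronoun = {"text": "It", "number": "sg", "gender": "neut"}
candidates = [
    {"text": "Floyd",     "number": "sg", "gender": "masc"},
    {"text": "his glove", "number": "sg", "gender": "neut"},
]
print([c["text"] for c in candidates if agrees(pronoun, c)])  # ['his glove']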

Preferences
- Recency: "Floyd threw the ball. Lieberthal picked it up. He put the ball in his pocket."
- Grammatical Role: "Floyd threw the ball to Lieberthal. His arm was getting tired."
- Repeated Mention (see article)
- Parallelism: "Floyd threw a ball to Lieberthal. Wagner threw a ball to him, too."
- Verb Semantics: "John telephoned Bill. He lost the pamphlet on Acuras." vs. "John criticized Bill. He lost the pamphlet on Acuras."

Pronoun Resolution Algorithms
Traditional:
- Carter: shallow parsing
- Rich, LuperFoy: distributed architecture
- Carbonell, Brown: multi-strategy
- Rico Pérez: scalar product
- Mitkov: combination of linguistic and statistical knowledge (high 80s)
- Lappin, Leass: syntax-based (86%)
- Hobbs: Tree Search Algorithm (91.7%)
- Grosz, Joshi, Weinstein: Centering Algorithm (77.6%)
- Hobbs: Coherence
Alternative:
- Nasukawa: knowledge-independent (93.8%)
- Dagan, Itai: statistical, corpus processing (87% for "genuine" it)
- Connolly, Burger, Day: machine learning
- Aone, Bennett: machine learning ("close to 90%")
- Mitkov: uncertainty reasoning
- Mitkov: 2-engine (~90%)
- Tin, Akman: situational semantics
- Say, Vakman

Lappin & Leass
The book presents a slightly modified algorithm for nonreflexive, third-person pronouns. It has two parts:
1. Update the discourse model with salience values
2. Resolve pronouns
Let's apply this to some text: "In the afternoon, Gavin Floyd played baseball at the park. Then he went to a bar with Mike Lieberthal. He enjoyed a beer."

Salience Factors
Factor: Weight
- Sentence recency: 100
- Subject emphasis: 80
- Existential emphasis: 70
- Accusative (direct object) emphasis: 50
- Indirect object / oblique complement emphasis: 40
- Non-adverbial emphasis: 50
- Head noun emphasis: 80

Pronoun Salience
Factor: Weight
- Role parallelism: 35
- Cataphora: -175
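For the resolution sketch further below, the two weight tables can be collected into plain Python dictionaries. The constant and key names are my own; the numbers are exactly those in the tables above.

# Salience factor weights from the table above.
SALIENCE_WEIGHTS = {
    "sentence_recency": 100,
    "subject_emphasis": 80,
    "existential_emphasis": 70,
    "accusative_emphasis": 50,        # direct object
    "indirect_object_emphasis": 40,   # indirect object / oblique complement
    "non_adverbial_emphasis": 50,
    "head_noun_emphasis": 80,
}

# Pronoun-specific factors applied when scoring a candidate against a pronoun.
PRONOUN_WEIGHTS = {
    "role_parallelism": 35,
    "cataphora": -175,
}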

L&L Algorithm
1. Collect the potential referents (up to four sentences back).
2. Remove potential referents that do not agree in number or gender with the pronoun.
3. Remove potential referents that do not pass intrasentential syntactic coreference constraints.
4. Compute the total salience value of each referent by adding any applicable values to its existing salience value.
5. Select the referent with the highest salience value.
6. In case of ties, select the closest referent in terms of string position.
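A rough Python sketch of these steps follows, reusing agrees() and PRONOUN_WEIGHTS from the earlier snippets. The candidate representation (dicts with precomputed salience, sentence, role, and position fields) and the helper violates_syntax() are assumptions for illustration; a real implementation also needs a parse tree for the syntactic filtering and the salience bookkeeping the book describes.

def violates_syntax(pronoun, candidate):
    """Placeholder for intrasentential syntactic coreference constraints;
    real filtering (e.g., reflexive/binding rules) needs a parse tree."""
    return False

def resolve_pronoun(pronoun, entities, max_sentences_back=4):
    # 1. Collect potential referents from up to four sentences back.
    candidates = [e for e in entities
                  if 0 <= pronoun["sentence"] - e["sentence"] <= max_sentences_back]

    # 2. Remove candidates that disagree in number or gender.
    candidates = [c for c in candidates if agrees(pronoun, c)]

    # 3. Remove candidates that violate intrasentential syntactic constraints.
    candidates = [c for c in candidates if not violates_syntax(pronoun, c)]

    # 4. Add any applicable pronoun-specific salience to each candidate's total.
    for c in candidates:
        c["score"] = c["salience"]
        if c.get("role") == pronoun.get("role"):
            c["score"] += PRONOUN_WEIGHTS["role_parallelism"]

    # 5. Pick the highest total; 6. break ties by closeness in string position
    #    (here assumed to mean the nearest preceding mention, i.e., largest position).
    return max(candidates, key=lambda c: (c["score"], c["position"]), default=None)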

Example
In the afternoon, Gavin Floyd played baseball at the park. Then he went to a bar with Mike Lieberthal. He enjoyed a beer.
[Table: salience of the entities introduced by the first sentence (the afternoon, Gavin Floyd, baseball, the park), broken down by factor: Rec, Subj, Exist, Obj, Ind-Obj, Non-Adv, Head Noun, Total.]

Example
[Table: salience carried forward when the second sentence is read: the afternoon 90, Gavin Floyd 155, baseball 125, the park 75; the new entities a bar and Mike Lieberthal are added to the model.]

Example
[Table: the pronoun "he" in the second sentence is resolved to Gavin Floyd, so the two are merged into a single referent {Gavin Floyd, he}; the afternoon (90), baseball (125), and the park (75) keep their carried values.]

Example
Carried salience values when the third sentence is processed: the afternoon 45, {Gavin Floyd, he} 230, baseball 62, the park 37, a bar 115, Mike Lieberthal 75, a beer 280.
Gavin Floyd gets 35 points for Role Parallelism; Mike Lieberthal does not.
Floyd => 265 points, Lieberthal => 75 points. We pick Floyd as the antecedent of "He".
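The final comparison on this slide amounts to the following arithmetic, with the carried values taken directly from the slide and the role-parallelism bonus applied only to Floyd's cluster.

# Carried salience from the table above, before pronoun-specific factors.
carried = {"Gavin Floyd": 230, "Mike Lieberthal": 75}
role_parallelism = 35   # "He" is a subject, as was the earlier "he" in Floyd's cluster

scores = {
    "Gavin Floyd": carried["Gavin Floyd"] + role_parallelism,  # 265
    "Mike Lieberthal": carried["Mike Lieberthal"],             # 75
}
antecedent = max(scores, key=scores.get)
print(antecedent, scores[antecedent])  # Gavin Floyd 265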

Summary
Discourse Analysis requires processing more text than POS tagging or finding entities. Part of tracing the flow of discourse is resolving anaphora. That resolution lets us capture more relationships and other information than we could otherwise.