Discourse Analysis. David M. Cassel, Natural Language Processing, Villanova University, April 21st, 2005.


1 Discourse Analysis
David M. Cassel
Natural Language Processing
Villanova University
April 21st, 2005

2 April 2005 Discourse Analysis David M. Cassel Discourse Analysis Discourse: collocated, related groups of sentences (from book)

3 Discourse Analysis
Discourse Model -- a model to represent the entities mentioned in the discourse
Coreference or Anaphora Resolution -- determining which entity a referring expression refers to
Coherence -- modeling the logical flow of the discourse
The book also discusses Psycholinguistic Studies of Reference and Coherence

4 Anaphora Resolution
Before the game, manager Charlie Manuel said Gavin Floyd's performance would not affect whether he remains with the team when Vicente Padilla comes off the disabled list Tuesday. Then Floyd went out and had a nightmarish first inning: four walks, one wild pitch, one hit, four runs. After the game, Manuel said Floyd's disastrous outing had not changed his mind. The righthander will remain with the club and be used in relief. "The pitcher we saw in St. Louis is a pitcher who has the ability to be a very good major-league pitcher," he said. "He didn't have command of his fastball and couldn't get his breaking ball over tonight.... Maybe the cold was affecting his breaking ball, because he was bouncing a lot of them."
-- Sam Carchidi, Philadelphia Inquirer, 4/16/05

5 Discourse Model
[Figure: entities in the discourse model (Gavin Floyd, Charlie Manuel, Vicente Padilla) connected to referring expressions in the text (Gavin Floyd, he, Floyd, The righthander, The pitcher we saw in St. Louis, his). Arrow labels: evoke (introduce), refer, corefer.]
Adapted from Figure 18.1, Speech & Language Processing
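The three arrow labels in the figure (evoke, refer, corefer) can be read as operations on a small data structure. The following is only a rough sketch under that reading: the class name `DiscourseModel`, its methods, and the entity keys are hypothetical and do not come from the slides or the book.

```python
class DiscourseModel:
    """Toy discourse model: maps each entity to the expressions that mention it."""

    def __init__(self):
        self.mentions = {}  # entity key -> list of referring expressions

    def evoke(self, entity, expression):
        # The first mention introduces (evokes) a new entity into the model.
        if entity in self.mentions:
            raise ValueError(f"{entity!r} has already been evoked")
        self.mentions[entity] = [expression]

    def refer(self, entity, expression):
        # A later mention refers to an entity that is already in the model.
        self.mentions[entity].append(expression)

    def corefer(self, expr_a, expr_b):
        # Two expressions corefer if they mention the same entity.
        return any(expr_a in exprs and expr_b in exprs
                   for exprs in self.mentions.values())


model = DiscourseModel()
model.evoke("floyd", "Gavin Floyd")
model.refer("floyd", "Floyd")
model.refer("floyd", "The righthander")
model.evoke("manuel", "Charlie Manuel")
model.refer("manuel", "Manuel")

print(model.corefer("Floyd", "The righthander"))
print(model.corefer("Floyd", "Manuel"))
```

Deciding which entity key to pass to `refer` is exactly the anaphora resolution problem. The sketch assumes that decision has already been made.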

6 Types of Anaphoric References
Indefinite noun phrases: A baseball player like that should do well.
Definite noun phrases: The righthander will remain with the club.
Pronouns: He had a bad game.
Demonstratives: This player has a bright future.
One-anaphora: I saw no less than 6 Acura Integras today. Now I want one. (from book)

7 Reference Constraints
Number Agreement: Floyd pitched 6 innings. They went well.
Person and Case: He didn’t have command of his fastball.
Gender Agreement: Floyd took his glove with him. It fit well.
Syntactic Constraints: Floyd threw him the ball.
Selectional Restrictions: Floyd stepped onto the mound with the ball. He threw it really fast.
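The number and gender constraints above act as hard filters on candidate antecedents. Here is a minimal sketch of such a filter, assuming each candidate already carries number and gender features. The `Mention` class, the `PRONOUN_FEATURES` table, and the feature labels are illustrative assumptions, not part of the presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    text: str
    number: str  # "sg" or "pl"
    gender: str  # "masc", "fem", or "neut"


# Hypothetical feature table for third-person pronouns: (number, gender).
# A gender of None means the pronoun does not constrain gender.
PRONOUN_FEATURES = {
    "he": ("sg", "masc"), "him": ("sg", "masc"), "his": ("sg", "masc"),
    "she": ("sg", "fem"), "her": ("sg", "fem"),
    "it": ("sg", "neut"), "its": ("sg", "neut"),
    "they": ("pl", None), "them": ("pl", None), "their": ("pl", None),
}


def agrees(pronoun, candidate):
    """Return True if the candidate matches the pronoun in number and gender."""
    number, gender = PRONOUN_FEATURES[pronoun.lower()]
    if candidate.number != number:
        return False
    return gender is None or candidate.gender == gender


def filter_candidates(pronoun, candidates):
    """Keep only the candidates that pass the agreement constraints."""
    return [c for c in candidates if agrees(pronoun, c)]


candidates = [
    Mention("Floyd", "sg", "masc"),
    Mention("6 innings", "pl", "neut"),
    Mention("his glove", "sg", "neut"),
]

print([c.text for c in filter_candidates("They", candidates)])
print([c.text for c in filter_candidates("It", candidates)])
```

With these features, "They" keeps only "6 innings" and "It" keeps only "his glove", as in the slide's examples. Syntactic constraints and selectional restrictions need a parse and lexical knowledge, so they are left out of this sketch.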

8 Preferences
Recency: Floyd threw the ball. Lieberthal picked it up. He put the ball in his pocket.
Grammatical Role: Floyd threw the ball to Lieberthal. His arm was getting tired.
Repeated Mention: (See article)
Parallelism: Floyd threw a ball to Lieberthal. Wagner threw a ball to him, too.
Verb Semantics: John telephoned Bill. He lost the pamphlet on Acuras. John criticized Bill. He lost the pamphlet on Acuras.

9 Pronoun Resolution Algorithms
Traditional
  Carter: shallow parsing
  Rich, LuperFoy: distributed architecture
  Carbonell, Brown: multi-strategy
  Rico Pérez: scalar product
  Mitkov: combination of linguistic, statistical (high 80s)
  Lappin, Leass: syntax-based (86%)
  Hobbs: Tree Search Algorithm (91.7%)
  Grosz, Joshi, Weinstein: Centering Algorithm (77.6%)
  Hobbs: Coherence
Alternative
  Nasukawa: knowledge-independent (93.8%)
  Dagan, Itai: statistical, corpus processing (87% for “genuine” it)
  Connolly, Burger, Day: machine learning
  Aone, Bennett: machine learning (“close to 90%”)
  Mitkov: uncertainty reasoning
  Mitkov: 2-engine (~90%)
  Tin, Akman: situational semantics
  Say, Vakman

10 Lappin & Leass
Book presents a slightly modified algorithm for nonreflexive, 3rd person pronouns.
Two parts:
  Update discourse model with salience value
  Resolve pronouns
Let’s apply this to some text:
In the afternoon, Gavin Floyd played baseball at the park. Then he went to a bar with Mike Lieberthal. He enjoyed a beer.

11 Salience Factors
Factor                                         Weight
Sentence recency                               100
Subject emphasis                               80
Existential emphasis                           70
Accusative (direct object) emphasis            50
Indirect object, oblique complement emphasis   40
Non-adverbial emphasis                         50
Head noun emphasis                             80

12 Pronoun Salience
Factor             Weight
Role parallelism   35
Cataphora          -175
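The two weight tables translate directly into lookup tables. A referent's salience for one sentence is then the sum of the weights of the factors that apply to it. This is a small sketch under that reading; the dictionary keys and the function name `salience` are illustrative choices, not names from the book.

```python
# Weights copied from the Salience Factors table.
SALIENCE_WEIGHTS = {
    "recency": 100,
    "subject": 80,
    "existential": 70,
    "direct_object": 50,
    "indirect_object": 40,  # also covers oblique complements
    "non_adverbial": 50,
    "head_noun": 80,
}

# Per-pronoun adjustments from the Pronoun Salience table.
PRONOUN_WEIGHTS = {
    "role_parallelism": 35,
    "cataphora": -175,
}


def salience(factors):
    """Sum the weights of the salience factors that apply to one mention."""
    return sum(SALIENCE_WEIGHTS[name] for name in factors)


# A subject that is a head noun outside any adverbial phrase,
# mentioned in the current sentence:
print(salience(["recency", "subject", "non_adverbial", "head_noun"]))  # 310
```

The adjustments in `PRONOUN_WEIGHTS` are kept separate because they depend on the pronoun being resolved and are applied only at resolution time.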

13 L&L Algorithm
1. Collect the potential referents (up to four sentences back).
2. Remove potential referents that do not agree in number or gender with the pronoun.
3. Remove potential referents that do not pass intrasentential syntactic coreference constraints.
4. Compute the total salience value of the referent by adding any applicable values to existing salience value.
5. Select the referent with the highest salience value. In case of ties, select closest referent in terms of string position.
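The five steps can be sketched as a single function. This is a simplified illustration, not the published implementation. The `Referent` fields, the `agrees` and `syntax_ok` callbacks, and the token positions in the example are assumptions. The salience values 230, 115, and 75 are the carried values from the worked example in this presentation, and the adjustments 35 and -175 are the pronoun salience weights.

```python
from dataclasses import dataclass

ROLE_PARALLELISM = 35  # pronoun salience weights from the slides
CATAPHORA = -175


@dataclass
class Referent:
    text: str
    sentence: int    # sentence index of the most recent mention
    position: int    # string position of the most recent mention
    salience: float  # salience accumulated so far
    role: str        # grammatical role of the most recent mention


def resolve(pron_sentence, pron_position, pron_role, referents,
            agrees, syntax_ok, window=4):
    """Return the preferred antecedent, or None if no candidate survives."""
    # Step 1: collect referents from up to `window` sentences back.
    pool = [r for r in referents if 0 <= pron_sentence - r.sentence <= window]
    # Step 2: drop referents that fail number or gender agreement.
    pool = [r for r in pool if agrees(r)]
    # Step 3: drop referents that fail the syntactic coreference constraints.
    pool = [r for r in pool if syntax_ok(r)]
    if not pool:
        return None

    # Step 4: add the pronoun-specific adjustments to the existing salience.
    def total(r):
        value = r.salience
        if r.role == pron_role:
            value += ROLE_PARALLELISM
        if r.position > pron_position:
            value += CATAPHORA
        return value

    # Step 5: highest total wins; ties go to the closest referent.
    return max(pool, key=lambda r: (total(r), -abs(pron_position - r.position)))


referents = [
    Referent("Gavin Floyd", 1, 11, 230, "subject"),
    Referent("a bar", 1, 15, 115, "oblique"),
    Referent("Mike Lieberthal", 1, 18, 75, "oblique"),
]
people = {"Gavin Floyd", "Mike Lieberthal"}

best = resolve(2, 20, "subject", referents,
               agrees=lambda r: r.text in people,
               syntax_ok=lambda r: True)
print(best.text)
```

Passing agreement and syntactic checks in as callbacks keeps the sketch short. In a full system both would be computed from a parse of the text.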

14 Example
In the afternoon, Gavin Floyd played baseball at the park. Then he went to a bar with Mike Lieberthal. He enjoyed a beer.
Columns: Rec, Subj, Exist, Obj, Ind-Obj, Non-Adv, Head Noun, Total (only non-empty cells are listed for each row)
the afternoon: 100, 80; Total 180
Gavin Floyd: 100, 80, 50, 80; Total 310
baseball: 100, 50; Total 250
the park: 100, 50; Total 150

15 Example
In the afternoon, Gavin Floyd played baseball at the park. Then he went to a bar with Mike Lieberthal. He enjoyed a beer.
Columns: Carry, Rec, Subj, Exist, Obj, Ind-Obj, Non-Adv, Head Noun, Total (only non-empty cells are listed for each row)
the afternoon: Carry 90
Gavin Floyd: Carry 155
baseball: Carry 125
the park: Carry 75
a bar: 100, 50, 80; Total 230
Mike Lieberthal: 100, 50; Total 150

16 Example
In the afternoon, Gavin Floyd played baseball at the park. Then he went to a bar with Mike Lieberthal. He enjoyed a beer.
Columns: Carry, Rec, Subj, Exist, Obj, Ind-Obj, Non-Adv, Head Noun, Total (only non-empty cells are listed for each row)
the afternoon: Carry 90
{Gavin Floyd, he}: Carry 155; 100, 80, 50, 80; Total 465
baseball: Carry 125
the park: Carry 75
a bar: 100, 50, 80; Total 230
Mike Lieberthal: 100, 50; Total 150
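The Carry values come from halving each referent's total when a new sentence is processed. Once "he" is resolved to Gavin Floyd, the pronoun's own score is added to Floyd's carried value. The following short sketch reproduces that arithmetic using the totals shown for the first sentence; the function name `carry_forward` is an illustrative assumption.

```python
def carry_forward(totals):
    """Halve every salience value when moving on to the next sentence."""
    return {referent: value / 2 for referent, value in totals.items()}


# Totals after the first sentence, as given in the example.
sentence1 = {
    "the afternoon": 180,
    "Gavin Floyd": 310,
    "baseball": 250,
    "the park": 150,
}

carried = carry_forward(sentence1)
print(carried)

# "he" in the second sentence scores 100 + 80 + 50 + 80 = 310 on its own.
# Merging it with its antecedent adds that score to Floyd's carried value.
he_score = 100 + 80 + 50 + 80
print(carried["Gavin Floyd"] + he_score)  # 465.0
```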

17 Example
In the afternoon, Gavin Floyd played baseball at the park. Then he went to a bar with Mike Lieberthal. He enjoyed a beer.
Carry
the afternoon: 45
{Gavin Floyd, he}: 230
baseball: 62
the park: 37
a bar: 115
Mike Lieberthal: 75
a beer: 280
Gavin Floyd gets 35 points for Role Parallelism. Mike Lieberthal does not.
Floyd => 265 points
Lieberthal => 75 points
We pick Floyd as the antecedent of He.

18 Summary
Discourse Analysis requires processing more text than POS tagging or finding entities.
Part of tracing the flow of discourse is resolving anaphora.
That resolution lets us capture more relationships and other information than we could otherwise.

