
1 Natural Language Processing COMPSCI 423/723 Rohit Kate

2 Discourse Processing Reference: Jurafsky & Martin book, Chapter 21

3 Basic Steps of Natural Language Processing
Sound waves → (phonetics) → words → (syntactic processing) → parses → (semantic processing) → meaning → (discourse processing) → meaning in context
This is a conceptual pipeline; humans and computers may process multiple stages simultaneously

4 Discourse
So far we have always analyzed one sentence in isolation, syntactically and/or semantically
Natural languages are spoken or written as collections of sentences
In general, a sentence cannot be understood in isolation:
– Today was Jack's birthday. Penny and Janet went to the store. They were going to get presents. Janet decided to get a kite. "Don't do that," said Penny. "Jack has a kite. He will make you take it back."

5 Discourse
A discourse is a coherent, structured group of sentences
– Examples: monologues (including reading passages), dialogues
Very little work has been done on understanding beyond a single sentence, i.e. understanding a whole paragraph or an entire document together
Important tasks in processing a discourse:
– Discourse Segmentation
– Determining Coherence Relations
– Anaphora Resolution
Ideally, deep understanding is needed to do well on these tasks, but so far shallow methods have been used

6 Discourse Segmentation
Discourse segmentation: separating a document into a linear sequence of subtopics
– For example, scientific articles are segmented into Abstract, Introduction, Methods, Results, Conclusions
– This is often a simplification of the higher-level structure of a discourse
Applications of automatic discourse segmentation:
– Summarization: summarize each segment separately
– Information Retrieval or Information Extraction: apply to the appropriate segment
Related task: paragraph segmentation, for example of a speech transcript

7 Unsupervised Discourse Segmentation
Given raw text, segment it into multi-paragraph subtopics
Unsupervised: no training data is given for the task
Cohesion-based approach: segment into subtopics whose sentences/paragraphs are cohesive with each other; a dip in cohesion marks a subtopic boundary

8 Cohesion
Cohesion: links between text units due to linguistic devices
Lexical cohesion: use of the same or similar words to link text units
– Today was Jack's birthday. Penny and Janet went to the store. They were going to get presents. Janet decided to get a kite. "Don't do that," said Penny. "Jack has a kite. He will make you take it back."
Non-lexical cohesion: for example, using anaphora

9 Cohesion-based Unsupervised Discourse Segmentation
TextTiling algorithm (Hearst, 1997): compare adjacent blocks of text and look for shifts in vocabulary
Pre-processing: tokenization, stop-word removal, stemming
Divide the text into pseudo-sentences of equal length (say, 20 words)

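As a rough illustration of this preprocessing step, here is a minimal Python sketch; it assumes NLTK's Porter stemmer is available, and the stop-word list and function name are placeholders rather than TextTiling's reference implementation:

from nltk.stem import PorterStemmer

STOP_WORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "was", "it"}
stemmer = PorterStemmer()

def pseudo_sentences(text, length=20):
    # Tokenize, strip punctuation, drop stop words, stem, and regroup the
    # remaining tokens into equal-length pseudo-sentences.
    tokens = [stemmer.stem(t) for t in
              (w.strip('.,!?";:').lower() for w in text.split())
              if t and t not in STOP_WORDS]
    return [tokens[i:i + length] for i in range(0, len(tokens), length)]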

11 TextTiling Algorithm contd.
Compute a lexical cohesion score at each gap between pseudo-sentences
Lexical cohesion score: similarity of the words before and after the gap (take, say, 10 pseudo-sentences before and 10 pseudo-sentences after)
Similarity: cosine similarity between the word vectors (high if words co-occur)

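A minimal sketch of this block-comparison step, assuming a simple bag-of-words representation; the names here are illustrative, not Hearst's reference code:

from collections import Counter
import math

def cosine_similarity(left_words, right_words):
    # Cosine similarity between two bags of words (high if words co-occur).
    left, right = Counter(left_words), Counter(right_words)
    dot = sum(left[w] * right[w] for w in left if w in right)
    norm = (math.sqrt(sum(c * c for c in left.values()))
            * math.sqrt(sum(c * c for c in right.values())))
    return dot / norm if norm else 0.0

def gap_scores(pseudo_sents, block_size=10):
    # Lexical cohesion score at each gap: similarity of the block_size
    # pseudo-sentences before the gap to the block_size after it.
    scores = []
    for gap in range(1, len(pseudo_sents)):
        left = [w for ps in pseudo_sents[max(0, gap - block_size):gap] for w in ps]
        right = [w for ps in pseudo_sents[gap:gap + block_size] for w in ps]
        scores.append(cosine_similarity(left, right))
    return scores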

13 TextTiling Algorithm contd.
Plot the similarity and compute the depth score of each "similarity valley": (a − b) + (c − b), where b is the similarity at the valley bottom and a and c are the similarity peaks to its left and right
Assign a segment boundary if the depth score is larger than a threshold (e.g. one standard deviation deeper than the mean valley depth)

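Continuing the sketch, the depth scores and boundary test might look like this; the toy scores and the exact cutoff (mean plus one standard deviation, following the slide's example) are assumptions:

import statistics

def depth_scores(scores):
    # Depth at gap i is (a - b) + (c - b): b is the similarity at the gap,
    # a and c are the peaks reached by climbing left and right.
    depths = []
    for i, b in enumerate(scores):
        a, j = b, i - 1
        while j >= 0 and scores[j] >= a:           # climb left while rising
            a, j = scores[j], j - 1
        c, j = b, i + 1
        while j < len(scores) and scores[j] >= c:  # climb right while rising
            c, j = scores[j], j + 1
        depths.append((a - b) + (c - b))
    return depths

def boundaries(depths):
    cutoff = statistics.mean(depths) + statistics.stdev(depths)
    return [i for i, d in enumerate(depths) if d > cutoff]

print(boundaries(depth_scores([0.5, 0.4, 0.1, 0.45, 0.5, 0.12, 0.4])))  # [2, 5]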

15 [Figure: example similarity plot with detected subtopic boundaries, from (Hearst, 1994)]

16 Supervised Discourse Segmentation
Supervised data is easy to get for some segmentation tasks, e.g. paragraph segmentation
– Useful for finding paragraphs in speech recognition output
Model it as a classification task: classify whether a sentence boundary is a paragraph boundary
– Use any classifier: SVM, Naïve Bayes, Maximum Entropy, etc.
Or model it as a sequence labeling task: label each sentence boundary as "paragraph boundary" or "not a paragraph boundary"

19 Supervised Discourse Segmentation contd.
Features:
– Cohesion features: word overlap, word cosine similarity, anaphora, etc.
– Additional features: discourse markers or cue words
Discourse marker or cue phrase: a word or phrase that signals discourse structure
– For example, "good evening" or "joining us now" in broadcast news
– "Coming up next" at the end of a segment, "Company Incorporated" at the beginning of a segment, etc.
– Either hand-code them or determine them automatically by feature selection
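As a hedged sketch, the classification formulation might look like the following with scikit-learn; the feature function, cue-phrase list, and toy training data are hypothetical stand-ins, not from the literature:

from sklearn.linear_model import LogisticRegression

CUE_PHRASES = {"good evening", "joining us now", "coming up next"}

def boundary_features(prev_sentence, next_sentence):
    # Cohesion feature (word overlap) plus a cue-phrase feature.
    prev_words = set(prev_sentence.lower().split())
    next_words = set(next_sentence.lower().split())
    overlap = len(prev_words & next_words)
    has_cue = any(next_sentence.lower().startswith(c) for c in CUE_PHRASES)
    return [overlap, int(has_cue)]

# Toy training data: [overlap, has_cue] -> 1 means "paragraph boundary".
X = [[5, 0], [0, 1], [4, 0], [1, 1]]
y = [0, 1, 0, 1]
clf = LogisticRegression().fit(X, y)
print(clf.predict([boundary_features("He mentioned new jobs.",
                                     "Coming up next, sports.")]))  # likely [1]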

20 Discourse Segmentation Evaluation
Precision, recall, and F-measure are not good metrics here because they are not sensitive to near misses
One good metric is WindowDiff (Pevzner & Hearst, 2002)
Slide a window of length k across the reference (correct) and hypothesized segmentations and count the number of segmentation boundaries in each
WindowDiff: the average difference in the number of boundaries in the sliding window
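A minimal sketch of WindowDiff; it assumes each segmentation is encoded as a 0/1 boundary indicator at every candidate position, which is one convenient encoding, not the only one:

def window_diff(reference, hypothesis, k):
    # Fraction of length-k windows in which the two segmentations disagree
    # on the number of boundaries (Pevzner & Hearst, 2002).
    n = len(reference)
    disagreements = sum(
        1 for i in range(n - k + 1)
        if sum(reference[i:i + k]) != sum(hypothesis[i:i + k])
    )
    return disagreements / (n - k + 1)

ref = [0, 0, 1, 0, 0, 0]
print(window_diff(ref, [0, 1, 0, 0, 0, 0], k=3))  # near miss: 0.25
print(window_diff(ref, [0, 0, 0, 0, 0, 1], k=3))  # far miss: 1.0

Unlike exact boundary matching, the near miss is penalized much less than the distant one.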

21 Text Coherence
A collection of independent sentences does not make a discourse, because it lacks coherence
Coherence: a meaning relation between two units of text; it explains how the meanings of different units combine to build the meaning of the larger unit (by contrast, cohesion is links between units)
– John hid Bill's car keys. He was drunk. [Explanation]
– John hid Bill's car keys. He likes spinach. [??? no obvious relation]
Humans try to find coherence between sentences all the time

22 Coherence Relations
Coherence relations: the set of connections between units in a discourse. A few such relations, from Hobbs (1979):
– Result: The Tin Woodman was caught in the rain. His joints rusted.
– Parallel: The scarecrow wanted some brains. The Tin Woodman wanted a heart.
– Elaboration: Dorothy was from Kansas. She lived in the midst of the great Kansas prairies.
– Occasion: Dorothy picked up the oil-can. She oiled the Tin Woodman's joints.

23 Discourse Structure
Discourse structure: the hierarchical structure of a discourse according to the coherence relations
Example:
– John went to the bank to deposit his paycheck. He then took a train to Bill's car dealership. He needed to buy a car. The company he works for now isn't near any public transportation. He also wanted to talk to Bill about their softball league.

24 Discourse Structure contd.
Analogous to a syntactic tree structure; a node in the tree represents locally coherent sentences, i.e. a discourse segment (the structure is hierarchical, not linear)
For the previous example:
Occasion
├── John went to the bank to deposit his paycheck.
└── Explanation
    ├── He then took a train to Bill's car dealership.
    └── Parallel
        ├── Explanation
        │   ├── He needed to buy a car.
        │   └── The company he works for now isn't near any public transportation.
        └── He also wanted to talk to Bill about their softball league.

25 Discourse Structure contd.
What are the uses of discourse structure?
– Summarization systems may skip or merge segments connected by an Elaboration relation
– Question-answering systems can search within segments connected by Explanation relations
– Information extraction systems need not merge information from segments not linked by any relation
– A semantic parser may build a larger meaning representation of the whole discourse

26 Discourse Parsing
Coherence relation assignment: automatically determining the coherence relations between units of a discourse
Discourse parsing: automatically finding the discourse structure of an entire discourse
Both are largely unsolved problems, but some shallow methods work to some degree, for example using cue phrases (or discourse markers)

27 Automatic Coherence Assignment
Shallow cue-phrase-based algorithm:
1. Identify cue phrases in the text
2. Segment the text into discourse segments, using the cue phrases
3. Assign coherence relations between consecutive discourse segments

28 1. Identify Cue Phrases
Phrases that signal discourse structure, e.g. "joining us now", "coming up next"
Connectives: "because", "although", "for example", "with", "and"
However, their occurrence is not always indicative of a discourse relation: they are ambiguous
– With its distant orbit, Mars exhibits frigid weather conditions (discourse use)
– We can see Mars with an ordinary telescope (non-discourse use)
Use simple heuristics (e.g. whether "with" is capitalized, i.e. sentence-initial), but in general use techniques similar to word sense disambiguation
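A toy version of the capitalization heuristic mentioned above; the connective list is illustrative, and real systems would use WSD-style classifiers instead:

CONNECTIVES = {"with", "because", "although", "and"}

def is_discourse_marker(sentence):
    # Heuristic: a connective counts as a discourse marker only when it
    # opens the sentence (hence is capitalized).
    first = sentence.split()[0]
    return first.strip(",.").lower() in CONNECTIVES and first[0].isupper()

print(is_discourse_marker("With its distant orbit, Mars exhibits frigid weather conditions"))  # True
print(is_discourse_marker("We can see Mars with an ordinary telescope"))  # False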

29 2. Segment Text into Discourse Segments
Segments are usually sentences, so sentence segmentation may suffice
However, clauses are often more appropriate:
– With its distant orbit, | Mars exhibits frigid weather conditions [the two clauses are linked by an Explanation relation]
Use hand-written rules, or utilize syntactic parses, to get such segments

30 3. Classify the Relation between Neighboring Segments
Use rules based on the cue phrases and connectives
– For example, a sentence beginning with "Because" indicates an Explanation relation with the next segment
Or train classifiers using appropriate features
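A sketch of the rule-based step; the cue-to-relation table below is illustrative, not an exhaustive or standard mapping:

CUE_TO_RELATION = {
    "because": "Explanation",
    "although": "Contrast",
    "for example": "Elaboration",
}

def classify_relation(segment):
    # Rule: a segment-initial cue phrase signals the relation to the
    # neighboring segment; otherwise fall back to a trained classifier.
    lowered = segment.lower()
    for cue, relation in CUE_TO_RELATION.items():
        if lowered.startswith(cue):
            return relation
    return None

print(classify_relation("Because he was drunk, John hid Bill's car keys."))  # Explanation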

31 Drawback of the Cue-phrase-based Algorithm
Sometimes relations are not signaled by cue phrases but are implicit in the syntax, word choice, negation, etc.:
– I don't want a truck. I'd prefer a convertible. [Contrast]
It is difficult to encode such rules manually, or to get labeled training examples
One solution: automatically find easy examples with cue phrases, then remove the cue phrases to generate difficult supervised training examples
– Found with a cue phrase: I don't want a truck although I'd prefer a convertible. [Contrast]
– After removing the cue phrase: I don't want a truck. I'd prefer a convertible. [Contrast]
Train using words, word pairs, POS tags, etc. as features
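A sketch of this bootstrapping idea; the regular-expression patterns and function name are assumptions for illustration:

import re

PATTERNS = [
    (re.compile(r"^(.*?),? although (.*)$", re.I), "Contrast"),
    (re.compile(r"^(.*?),? because (.*)$", re.I), "Explanation"),
]

def harvest(sentence):
    # Split an "easy" sentence on its connective, record the relation the
    # connective signals, and return the two halves with the cue removed.
    for pattern, relation in PATTERNS:
        m = pattern.match(sentence)
        if m:
            return m.group(1).strip(), m.group(2).strip(), relation
    return None

print(harvest("I don't want a truck although I'd prefer a convertible."))
# ("I don't want a truck", "I'd prefer a convertible.", "Contrast")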

33 Penn Discourse Treebank
A recently released corpus that is likely to lead to better systems for discourse processing
Encodes the coherence relations associated with discourse connectives
Linked to the Penn Treebank
http://www.seas.upenn.edu/~pdtb/

34 Reference Resolution
Reference resolution: the task of determining which entities are referred to by which linguistic expressions
To understand any discourse it is necessary to know which entities are being talked about at each point
Mr. Obama visited the city. The president talked about Milwaukee's economy. He mentioned new jobs.
– "Mr. Obama", "The president", and "He" are referring expressions for the referent Barack Obama, and they corefer
– Anaphora: when a referring expression refers to a previously introduced entity (the antecedent), it is called anaphoric, e.g. "The president", "He"
– Cataphora: when a referring expression refers to an entity that is introduced later, it is called cataphoric, e.g. "the city"

35 Two Reference Resolution Tasks
Coreference resolution: the task of finding referring expressions that refer to the same entity, i.e. finding coreference chains
– In the previous example the coreference chains are: {Mr. Obama, The president, He}, {the city, Milwaukee's}
Pronominal anaphora resolution: the task of finding the antecedent for a single pronoun
– In the previous example, "He" refers to "Mr. Obama"
A lot of work has been done on these tasks in the last 15 or so years [Ng, 2010]

36 Supervised Pronominal Anaphora Resolution
Given a pronoun and an entity mentioned earlier, classify whether the pronoun refers to that entity, given the surrounding context
First filter out pleonastic pronouns, like "It" in "It is raining.", using hand-written rules
Use any classifier; obtain positive examples from training data and generate negative examples by pairing each pronoun with other (incorrect) entities
Mr. Obama visited the city. The president talked about Milwaukee's economy. He mentioned new jobs.
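A sketch of the training-pair generation described above; mention extraction and the downstream feature function are assumed to exist elsewhere:

def make_pairs(pronoun, antecedent, other_mentions):
    # The true antecedent yields a positive example; every other preceding
    # (incorrect) entity yields a negative one.
    pairs = [((pronoun, antecedent), 1)]
    pairs += [((pronoun, m), 0) for m in other_mentions]
    return pairs

for (p, m), label in make_pairs("He", "Mr. Obama", ["the city", "Milwaukee"]):
    print(p, "->", m, label)
# He -> Mr. Obama 1
# He -> the city 0
# He -> Milwaukee 0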

37 Features for Pronominal Anaphora Resolution
Constraints:
– Number agreement: singular pronouns (it/he/she/his/her/him) refer to singular entities; plural pronouns (we/they/us/them) refer to plural entities
– Person agreement: he/she/they etc. must refer to a third-person entity
– Gender agreement: he -> John; she -> Mary; it -> car
– Certain syntactic constraints: John bought himself a new car. [himself -> John] John bought him a new car. [him cannot be John]

38 Features for Pronominal Anaphora Resolution contd.
Preferences:
– Recency: more recently mentioned entities are more likely to be referred to: John went to a movie. Jack went as well. He was not busy. [He -> Jack]
– Grammatical role: entities in subject position are more likely to be referred to than entities in object position: John went to a movie with Jack. He was not busy. [He -> John]
– Parallelism: John went with Jack to a movie. Joe went with him to a bar. [him -> Jack]

39 Features for Pronominal Anaphora Resolution contd.
More preferences:
– Verb semantics: certain verbs seem to bias whether subsequent pronouns refer to their subjects or their objects: John telephoned Bill. He lost the laptop. [He -> John] John criticized Bill. He lost the laptop. [He -> Bill]
– Selectional restrictions: restrictions due to semantics: John parked his car in the garage after driving it around for hours. [it -> the car, since one drives cars, not garages]
Encode all of these, and maybe more, as features
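A toy encoding of a few of these constraints and preferences as features; the small pronoun and entity lexicons are illustrative stand-ins for real lexical resources:

PRONOUN_INFO = {
    "he": ("sg", "masc"), "she": ("sg", "fem"), "it": ("sg", "neut"),
    "him": ("sg", "masc"), "her": ("sg", "fem"), "they": ("pl", None),
}
ENTITY_INFO = {"John": ("sg", "masc"), "Mary": ("sg", "fem"), "car": ("sg", "neut")}

def anaphora_features(pronoun, entity, distance_in_sentences):
    # Agreement constraints as binary features, plus a recency preference.
    number_p, gender_p = PRONOUN_INFO[pronoun.lower()]
    number_e, gender_e = ENTITY_INFO[entity]
    return {
        "number_agree": int(number_p == number_e),
        "gender_agree": int(gender_p is None or gender_p == gender_e),
        "recency": 1.0 / (1 + distance_in_sentences),  # prefer recent mentions
    }

print(anaphora_features("he", "John", 1))
# {'number_agree': 1, 'gender_agree': 1, 'recency': 0.5}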

40 Coreference Resolution
Can be done analogously to pronominal anaphora resolution: given an anaphor and a potential antecedent, classify the pair as true or false
Some approaches instead cluster the referring expressions rather than doing binary classification
Additional features incorporate aliases, variations in names, etc., e.g. Mr. Obama / Barack Obama; Megabucks / Megabucks Inc.
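A toy alias test for the name-variation feature; the suffix and honorific lists are illustrative:

SUFFIXES = {"inc.", "inc", "incorporated", "corp.", "ltd."}
HONORIFICS = {"mr.", "mrs.", "ms.", "dr."}

def is_alias(name_a, name_b):
    # Strip corporate suffixes and honorifics, then test whether one name's
    # tokens are a subset of the other's (e.g. a surname-only mention).
    def normalize(name):
        return {t for t in name.lower().split()
                if t not in SUFFIXES and t not in HONORIFICS}
    a, b = normalize(name_a), normalize(name_b)
    return a <= b or b <= a

print(is_alias("Megabucks Inc.", "Megabucks"))  # True
print(is_alias("Mr. Obama", "Barack Obama"))    # True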

