Natural Language Processing COMPSCI 423/723 Rohit Kate

Discourse Processing Reference: Jurafsky & Martin book, Chapter 21

Basic Steps of Natural Language Processing Sound waves → (Phonetics) → Words → (Syntactic processing) → Parses → (Semantic processing) → Meaning → (Discourse processing) → Meaning in context This is a conceptual pipeline; humans or computers may process multiple stages simultaneously

Discourse So far we have always analyzed one sentence in isolation, syntactically and/or semantically Natural languages are spoken or written as collections of sentences In general, a sentence cannot be understood in isolation: – Today was Jack's birthday. Penny and Janet went to the store. They were going to get presents. Janet decided to get a kite. "Don't do that," said Penny. "Jack has a kite. He will make you take it back."

Discourse A discourse is a coherent, structured group of sentences – Examples: monologues (including reading passages), dialogues Very little work has been done on understanding beyond a sentence, i.e. understanding a whole paragraph or an entire document together Important tasks in processing a discourse: – Discourse Segmentation – Determining Coherence Relations – Anaphora Resolution Ideally, deep understanding is needed to do well on these tasks, but so far shallow methods have been used

Discourse Segmentation Discourse Segmentation: Separating a document into a linear sequence of subtopics – For example, scientific articles are segmented into Abstract, Introduction, Methods, Results, Conclusions – This is often a simplification of the higher-level structure of a discourse Applications of automatic discourse segmentation: – Summarization: summarize each segment separately – Information Retrieval or Information Extraction: apply to the appropriate segment Related task: paragraph segmentation, for example of a speech transcript

Unsupervised Discourse Segmentation Given raw text, segment it into multi-paragraph subtopics Unsupervised: no training data is given for the task Cohesion-based approach: segment into subtopics within which sentences/paragraphs are cohesive with each other; a dip in cohesion marks a subtopic boundary

Cohesion Cohesion: Links between text units due to linguistic devices Lexical Cohesion: Use of the same or similar words to link text units – Today was Jack's birthday. Penny and Janet went to the store. They were going to get presents. Janet decided to get a kite. "Don't do that," said Penny. "Jack has a kite. He will make you take it back." Non-lexical Cohesion: For example, using anaphora

Cohesion-based Unsupervised Discourse Segmentation TextTiling algorithm (Hearst, 1997): compare adjacent blocks of text and look for shifts in vocabulary – Pre-processing: tokenization, stop-word removal, stemming – Divide the text into pseudo-sentences of equal length (say 20 words)

TextTiling Algorithm contd. Compute a lexical cohesion score at each gap between pseudo-sentences Lexical cohesion score: similarity of the words before and after the gap (take, say, the 10 pseudo-sentences before and the 10 pseudo-sentences after) Similarity: cosine similarity between the two word vectors (high if the same words co-occur on both sides of the gap)
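
A minimal sketch of this gap-scoring step in Python (the stop-word list, pseudo-sentence length, and block size below are illustrative placeholders, not the exact settings from Hearst, 1997):

```python
import math
import re
from collections import Counter

# Toy stop-word list; a real implementation would use a fuller list
# plus stemming, as in Hearst (1997).
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "was", "it"}

def pseudo_sentences(text, length=20):
    """Tokenize, drop stop words, and split into equal-length pseudo-sentences."""
    tokens = [t for t in re.findall(r"[a-z']+", text.lower())
              if t not in STOP_WORDS]
    return [tokens[i:i + length] for i in range(0, len(tokens), length)]

def cosine(c1, c2):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(c1[w] * c2[w] for w in set(c1) & set(c2))
    norm = (math.sqrt(sum(v * v for v in c1.values()))
            * math.sqrt(sum(v * v for v in c2.values())))
    return dot / norm if norm else 0.0

def gap_scores(pseudo_sents, block=10):
    """Lexical cohesion score at each gap: cosine similarity between the
    blocks of pseudo-sentences before and after the gap."""
    scores = []
    for gap in range(1, len(pseudo_sents)):
        before = Counter(w for ps in pseudo_sents[max(0, gap - block):gap] for w in ps)
        after = Counter(w for ps in pseudo_sents[gap:gap + block] for w in ps)
        scores.append(cosine(before, after))
    return scores
```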

TextTiling Algorithm contd. Plot the similarity and compute the depth score of each “similarity valley”: (a − b) + (c − b), where b is the similarity at the bottom of the valley and a and c are the similarities at the peaks to its left and right Assign a segment boundary where the depth score is larger than a threshold (e.g. one standard deviation deeper than the mean valley depth)
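
Continuing the sketch, depth scoring and boundary selection might look like this (the threshold follows the slide’s example of one standard deviation beyond the mean depth; Hearst’s paper explores other cutoffs):

```python
import statistics

def depth_scores(scores):
    """Depth (a - b) + (c - b) at each gap, where b is the similarity at the
    gap and a, c are the peaks reached by climbing left and right while
    the similarity keeps rising."""
    depths = []
    for i, b in enumerate(scores):
        a, j = b, i - 1
        while j >= 0 and scores[j] >= a:            # climb the left side of the valley
            a, j = scores[j], j - 1
        c, j = b, i + 1
        while j < len(scores) and scores[j] >= c:   # climb the right side
            c, j = scores[j], j + 1
        depths.append((a - b) + (c - b))
    return depths

def boundaries(scores):
    """Return gap indices whose depth exceeds the threshold: here, one
    standard deviation above the mean depth, per the slide's example."""
    depths = depth_scores(scores)
    if len(depths) < 2:
        return []
    threshold = statistics.mean(depths) + statistics.stdev(depths)
    return [i for i, d in enumerate(depths) if d > threshold]
```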

[Figure: example plot of gap similarities with detected subtopic boundaries, from (Hearst, 1994)]

Supervised Discourse Segmentation Easy to get supervised data for some segmentation tasks – For example, paragraph segmentation – Useful for finding paragraphs in speech recognition output Model as a classification task: classify whether each sentence boundary is a paragraph boundary – Use any classifier: SVM, Naïve Bayes, Maximum Entropy, etc. Or model as a sequence labeling task: label each sentence boundary as “paragraph boundary” or “not a paragraph boundary”

Supervised Discourse Segmentation Features: – Cohesion features: word overlap, word cosine similarity, anaphora, etc. – Additional features: discourse markers or cue words Discourse marker or cue phrase/word: a word or phrase that signals discourse structure – For example, “good evening”, “joining us now” in broadcast news – “Coming up next” at the end of a segment, “Company Incorporated” at the beginning of a segment, etc. – Either hand-code them or determine them automatically by feature selection
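
As a sketch, the cohesion features for one candidate boundary could be computed as below; the cue-phrase set is taken from the slide, while the feature names and representation are illustrative:

```python
CUE_PHRASES = {"good evening", "joining us now", "coming up next"}  # from the slide

def boundary_features(prev_sents, next_sents):
    """Feature dictionary for one candidate sentence boundary, given the
    sentences before and after it; feed this to any classifier
    (SVM, Naive Bayes, Maximum Entropy, ...)."""
    prev_words = {w for s in prev_sents for w in s.lower().split()}
    next_words = {w for s in next_sents for w in s.lower().split()}
    overlap = prev_words & next_words
    first_sent = next_sents[0].lower() if next_sents else ""
    denom = (len(prev_words) * len(next_words)) ** 0.5
    return {
        "word_overlap": len(overlap),
        "word_cosine": len(overlap) / denom if denom else 0.0,  # cosine on binary vectors
        "next_starts_with_cue": any(first_sent.startswith(c) for c in CUE_PHRASES),
    }
```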

Discourse Segmentation Evaluation Precision, recall and F-measure are not good metrics here because they are not sensitive to near misses One good metric is WindowDiff (Pevzner & Hearst, 2002) Slide a window of length k across the reference (correct) and the hypothesized segmentations and count the number of segmentation boundaries in each WindowDiff metric: average difference in the number of boundaries in the sliding window
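
A minimal WindowDiff sketch, assuming segmentations are given as 0/1 boundary-indicator sequences over the units of the text:

```python
def window_diff(reference, hypothesis, k):
    """WindowDiff (Pevzner & Hearst, 2002). `reference` and `hypothesis`
    are equal-length 0/1 sequences (1 = segment boundary after this unit);
    k is the window size, often half the mean reference segment length.
    Returns a penalty in [0, 1]; lower is better."""
    n = len(reference)
    assert len(hypothesis) == n and 0 < k < n
    disagreements = sum(
        1 for i in range(n - k)
        if sum(reference[i:i + k]) != sum(hypothesis[i:i + k])
    )
    return disagreements / (n - k)

# A near miss is penalized only in the few windows that straddle it:
ref = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
hyp = [0, 0, 0, 1, 0, 0, 1, 0, 0, 0]
print(window_diff(ref, hyp, k=3))   # small, nonzero penalty
```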

Text Coherence A collection of independent sentences does not make a discourse, because it lacks coherence Coherence: a meaning relation between two units of text; it explains how the meanings of different units combine to build the meaning of the larger unit (in contrast, cohesion is about links between units) – John hid Bill’s car keys. He was drunk. [Explanation] – John hid Bill’s car keys. He likes spinach. [??? – no evident relation] Humans try to find coherence between sentences all the time

Coherence Relations Coherence Relations: a set of connections between units in a discourse. A few more such relations, from Hobbs (1979): – The Tin Woodman was caught in the rain. His joints rusted. [Result] – The scarecrow wanted some brains. The Tin Woodman wanted a heart. [Parallel] – Dorothy was from Kansas. She lived in the midst of the great Kansas prairies. [Elaboration] – Dorothy picked up the oil-can. She oiled the Tin Woodman’s joints. [Occasion]

Discourse Structure Discourse Structure: The hierarchical structure of a discourse according to the coherence relations. John went to the bank to deposit his paycheck. He then took a train to Bill’s car dealership. He needed to buy a car. The company he works for now isn’t near any public transportation. He also wanted to talk to Bill about their softball league.

Discourse Structure Discourse Structure: The hierarchical structure of a discourse according to the coherence relations. Analogous to a syntactic tree structure: a node in the tree covers a locally coherent group of sentences, a discourse segment (so the structure is hierarchical, not linear) For the five sentences above (S1 = “John went to the bank...”, S2 = “He then took a train...”, S3 = “He needed to buy a car.”, S4 = “The company he works for...”, S5 = “He also wanted to talk to Bill about their softball league.”), the structure is: Occasion(S1, Explanation(S2, Parallel(Explanation(S3, S4), S5)))

Discourse Structure What are the uses of discourse structure? – Summarization systems may skip or merge segments connected by an Elaboration relation – Question-answering systems can search within segments connected by Explanation relations – Information extraction systems need not merge information from segments not linked by any relation – A semantic parser may build a larger meaning representation of the whole discourse

Discourse Parsing Coherence Relation Assignment: automatically determining the coherence relations between units of a discourse Discourse Parsing: automatically finding the discourse structure of an entire discourse Both are largely unsolved problems, but some shallow methods work to some degree, for example methods using cue phrases (or discourse markers)

Automatic Coherence Assignment Shallow cue-phrase-based algorithm: 1. Identify cue phrases in a text 2. Segment the text into discourse segments using the cue phrases 3. Assign coherence relations between consecutive discourse segments

1. Identify Cue Phrases Phrases that signal discourse structure, e.g. “joining us now”, “coming up next”, etc. Connectives: “because”, “although”, “example”, “with”, “and” However, their occurrence is not always indicative of a discourse relation: they are ambiguous – With its distant orbit, Mars exhibits frigid weather conditions [discourse use of “with”] – We can see Mars with an ordinary telescope [non-discourse use of “with”] Use simple heuristics, e.g. whether “with” is capitalized (sentence-initial), but in general use techniques similar to word sense disambiguation

2. Segment Text into Discourse Segments Segments are usually sentences, so sentence segmentation may suffice However, clauses are often more appropriate – With its distant orbit, Mars exhibits frigid weather conditions [the two clauses form separate segments linked by an Explanation relation] Use hand-written rules or utilize syntactic parses to get such segments

3. Classify Relation between Neighboring Segments Use rules based on the cue phrases and connectives – For example, a segment beginning with “Because” indicates an Explanation relation with the next segment Or train classifiers using appropriate features
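
A toy sketch of such rules; the cue-to-relation table is a small illustrative sample, not a complete inventory:

```python
import re

CUE_TO_RELATION = {           # illustrative sample of connectives
    "because": "Explanation",
    "although": "Contrast",
    "but": "Contrast",
    "for example": "Elaboration",
}

def classify_relation(segment1, segment2):
    """Assign a coherence relation between two adjacent discourse segments.
    These toy rules only inspect connectives in the second segment;
    None signals an implicit relation, left to a trained classifier."""
    second = segment2.lower()
    for cue, relation in CUE_TO_RELATION.items():
        if re.search(r"\b" + re.escape(cue) + r"\b", second):
            return relation
    return None

print(classify_relation("John missed the meeting.",
                        "Because his car broke down."))   # -> 'Explanation'
```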

Drawback of Cue-phrase-based Algorithm Sometimes relations are not signaled by cue phrases but are implicit in syntax, words, negation etc.: – I don’t want a truck. I’d prefer a convertible. [Contrast] It is difficult to encode rules for such cases manually or to get labeled training examples One solution: automatically find easy examples with cue phrases, then remove the cue phrases to generate difficult supervised training examples – I don’t want a truck although I’d prefer a convertible. [Contrast, signaled by “although”] – After removing the cue phrase: I don’t want a truck. I’d prefer a convertible. [still labeled Contrast] Train using words, word pairs, POS tags, etc. as features
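
A sketch of this data-generation trick, with illustrative patterns for just two connectives:

```python
import re

PATTERNS = [                  # illustrative; real systems mine many connectives
    (re.compile(r"^(.+?),?\s+although\s+(.+)$", re.IGNORECASE), "Contrast"),
    (re.compile(r"^(.+?),?\s+because\s+(.+)$", re.IGNORECASE), "Explanation"),
]

def make_implicit_example(sentence):
    """Harvest a sentence with an explicit connective, label it by that
    connective, and delete the connective to simulate an implicit relation.
    Returns ((segment1, segment2), relation) or None."""
    for pattern, relation in PATTERNS:
        m = pattern.match(sentence)
        if m:
            s1 = m.group(1).strip().rstrip(".") + "."
            s2 = m.group(2).strip().rstrip(".")
            s2 = s2[:1].upper() + s2[1:] + "."
            return (s1, s2), relation
    return None

print(make_implicit_example("I don't want a truck although I'd prefer a convertible."))
# -> (("I don't want a truck.", "I'd prefer a convertible."), 'Contrast')
```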

Penn Discourse Treebank A recently released corpus that is likely to lead to better systems for discourse processing Encodes coherence relations associated with discourse connectives Linked to the Penn Treebank

Reference Resolution Reference Resolution: the task of determining which entities are referred to by which linguistic expressions To understand any discourse it is necessary to know which entities are being talked about at which point Mr. Obama visited the city. The president talked about Milwaukee’s economy. He mentioned new jobs. – “Mr. Obama”, “The president” and “He” are referring expressions for the referent Barack Obama, and they corefer – Anaphora: when a referring expression refers to a previously introduced entity (the antecedent), the referring expression is called anaphoric, e.g. “The president”, “He” – Cataphora: when a referring expression refers to an entity which is introduced later, the referring expression is called cataphoric, e.g. “the city”

Two Reference Resolution Tasks Coreference Resolution: the task of finding referring expressions that refer to the same entity, i.e. finding coreference chains – In the previous example the coreference chains are: {Mr. Obama, The president, He} and {the city, Milwaukee’s} Pronominal Anaphora Resolution: the task of finding the antecedent of a single pronoun – In the previous example, “He” refers to “Mr. Obama” A lot of work has been done on these tasks in the last 15 or so years [Ng, 2010]

Supervised Pronominal Anaphora Resolution Given a pronoun and an entity mentioned earlier, classify whether the pronoun refers to that entity, given the surrounding context First filter out pleonastic pronouns, as in “It is raining.”, using hand-written rules Use any classifier; obtain positive examples from the training data, and generate negative examples by pairing each pronoun with other (incorrect) entities Mr. Obama visited the city. The president talked about Milwaukee’s economy. He mentioned new jobs.
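
A minimal sketch of generating classifier instances, using a hypothetical Mention record (a real system would take mentions, positions, and gold antecedents from an annotated corpus):

```python
from dataclasses import dataclass

@dataclass
class Mention:                 # hypothetical minimal record for illustration
    text: str
    position: int              # token index in the discourse
    is_pronoun: bool

def training_pairs(mentions, gold_antecedent):
    """Yield (pronoun, candidate, label) triples: pair each pronoun with
    every preceding non-pronoun mention; `gold_antecedent` maps a
    pronoun's position to its true antecedent's position."""
    for m in mentions:
        if not m.is_pronoun:
            continue
        for cand in mentions:
            if cand.position >= m.position or cand.is_pronoun:
                continue
            yield m, cand, gold_antecedent.get(m.position) == cand.position

mentions = [Mention("Mr. Obama", 0, False), Mention("the city", 3, False),
            Mention("The president", 5, False), Mention("He", 12, True)]
for pron, cand, label in training_pairs(mentions, {12: 0}):
    print(pron.text, "->", cand.text, label)
```

Note that in this toy setup only the single annotated antecedent counts as positive, even though “The president” also corefers with “He”; real training schemes handle this, for example by taking the closest coreferent mention as the positive example.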

Features for Pronominal Anaphora Resolution Constraints: – Number agreement: singular pronouns (it/he/she/his/her/him) refer to singular entities and plural pronouns (we/they/us/them) refer to plural entities – Person agreement: he/she/they etc. must refer to a third-person entity – Gender agreement: he -> John; she -> Mary; it -> car – Certain syntactic constraints: John bought himself a new car. [himself -> John] John bought him a new car. [him cannot be John]

Features for Pronominal Anaphora Resolution Preferences: – Recency: more recently mentioned entities are more likely to be referred to John went to a movie. Jack went as well. He was not busy. – Grammatical role: entities in the subject position are more likely to be referred to than entities in the object position John went to a movie with Jack. He was not busy. – Parallelism: John went with Jack to a movie. Joe went with him to a bar.

Features for Pronominal Anaphora Resolution Preferences: – Verb semantics: certain verbs seem to bias whether subsequent pronouns refer to their subjects or their objects John telephoned Bill. He lost the laptop. John criticized Bill. He lost the laptop. – Selectional restrictions: restrictions due to semantics John parked his car in the garage after driving it around for hours. [“it” must be something driveable] Encode all these, and maybe more, as features
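
A sketch of encoding a few of these constraints and preferences as features; the pronoun tables are small samples, and the Candidate record with its attributes is a hypothetical stand-in for information produced by earlier processing:

```python
from collections import namedtuple

# Hypothetical candidate-antecedent record filled in by earlier processing.
Candidate = namedtuple("Candidate", "text number gender distance is_subject")

SINGULAR = {"he", "she", "it", "him", "her", "his"}     # sample lists
PLURAL = {"they", "them", "their", "we", "us"}
MASCULINE = {"he", "him", "his"}
FEMININE = {"she", "her", "hers"}

def anaphora_features(pronoun, cand):
    """Feature dictionary for a (pronoun, candidate antecedent) pair."""
    p = pronoun.lower()
    return {
        "number_agree": (p in SINGULAR and cand.number == "sg")
                        or (p in PLURAL and cand.number == "pl"),
        "gender_agree": (p not in MASCULINE or cand.gender == "m")
                        and (p not in FEMININE or cand.gender == "f"),
        "recency": 1.0 / (1 + cand.distance),   # closer mentions score higher
        "cand_is_subject": cand.is_subject,     # grammatical-role preference
    }

print(anaphora_features("he", Candidate("John", "sg", "m", 2, True)))
```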

Coreference Resolution Can be done analogously to pronominal anaphora resolution: given an anaphor and a potential antecedent, classify the pair as coreferent or not Some approaches instead cluster the referring expressions rather than doing binary classification Additional features incorporate aliases and variations in names, e.g. Mr. Obama / Barack Obama; Megabucks / Megabucks Inc.
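
A sketch of alias features for proper-name mentions; the honorific and corporate-suffix lists are illustrative:

```python
CORP_SUFFIXES = {"inc", "inc.", "corp", "corp.", "co.", "ltd"}   # illustrative
HONORIFICS = {"mr.", "mrs.", "ms.", "dr."}

def alias_features(mention1, mention2):
    """Name-variation features for a pair of proper-name mentions."""
    strip = HONORIFICS | CORP_SUFFIXES
    t1 = [w for w in mention1.lower().split() if w not in strip]
    t2 = [w for w in mention2.lower().split() if w not in strip]
    return {
        "exact_match": t1 == t2,                                   # Megabucks / Megabucks Inc.
        "last_token_match": bool(t1 and t2) and t1[-1] == t2[-1],  # Mr. Obama / Barack Obama
        "any_token_overlap": bool(set(t1) & set(t2)),
    }

print(alias_features("Mr. Obama", "Barack Obama"))     # last_token_match: True
print(alias_features("Megabucks", "Megabucks Inc."))   # exact_match: True
```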