Lexical chains for summarization: a summary of Silber & McCoy’s work, by Keith Trnka.


Outline
– Lexical chains
– Silber & McCoy’s algorithm for computing lexical chains
– Silber & McCoy’s evaluation of lexical chains for text summarization

Lexical chains
Intuitively, a lexical chain represents a recurring noun concept within a document, one that may be referred to by different nouns.* Thus, computing lexical chains involves the problem of word sense disambiguation (WSD), among others.
– WSD: given a word and multiple meanings for the word, choose the correct meaning (sense)
* to my knowledge this has not been proven; it is only an intuition

Lexical chains
Manually annotated example (from Barzilay & Elhadad ’97, Appendices A and B):
When Microsoft Senior Vice President Steve Ballmer first heard his company was planning to make a huge investment in an Internet service offering movie reviews and local entertainment information in major cities across the nation, he went to Chairman Bill Gates with his concerns. After all, Ballmer has billions of dollars of his own money in Microsoft stock, and entertainment isn't exactly the company's strong point. But Gates dismissed such reservations. Microsoft's competitive advantage, he responded, was its expertise in Bayesian networks. Asked recently when computers would finally begin to understand human speech, Gates began discussing the critical role of "Bayesian" systems. Ask any other software executive about anything Bayesian and you're liable to get a blank stare. Is Gates onto something? Is this alien-sounding technology Microsoft's new secret weapon? Bayesian networks are complex diagrams that organize the body of knowledge in any given area by mapping out cause-and-effect relationships among key variables and encoding them with numbers that represent the extent to which one variable is likely to affect another. Programmed into computers, these systems can automatically generate optimal predictions or decisions even when key pieces of information are missing.

Lexical chains
How do we compute them? An interpretation is a set of lexical chains in which each noun occurrence is mapped to exactly one chain. We could generate all possible interpretations and pick the “best” one:
– the one with the lengthiest lexical chains

Silber & McCoy’s algorithm
Problem: the number of possible interpretations is exponential, so enumerating them takes too long.
Solution: don’t compute them explicitly.

Silber & McCoy’s algorithm
Each lexical chain is associated with a concept, and each concept can be mapped to a core meaning (word sense). Implicitly build all possible chains for each sense in WordNet:
– note: senses in WordNet are represented as synonym sets (synsets)
– these implicit chains are called meta-chains
– a single meta-chain represents all possible lexical chains for that core meaning
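A rough sketch (not the authors’ code) of how a meta-chain index can be kept: a map from each candidate core meaning to the noun occurrences collected under it. The synset identifiers below are invented stand-ins for real WordNet synsets:

```python
from collections import defaultdict

# Map each candidate core meaning (synset id) to the noun
# occurrences collected under it. Synset ids here are invented
# stand-ins for real WordNet synsets.
meta_chains = defaultdict(list)

def add_occurrence(sense_id, word, position):
    """Record that `word` at `position` can belong to the
    meta-chain whose core meaning is `sense_id`."""
    meta_chains[sense_id].append((word, position))

# A noun with two senses is added to both meta-chains;
# the best interpretation is chosen later.
add_occurrence("machine.n.01", "computer", 0)
add_occurrence("expert.n.01", "computer", 0)
add_occurrence("machine.n.01", "device", 3)

print(len(meta_chains["machine.n.01"]))  # 2 occurrences so far
```

Note that an ambiguous noun lands in several meta-chains at once; disambiguation happens only when the best interpretation is selected.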

Silber & McCoy’s algorithm
Given a meta-chain identified by a word sense S and the next noun in the input, for each sense NS of the noun, add the noun to the meta-chain if:
– NS and S are the same sense
– NS and S are synonyms
– S is a kind of NS (NS is a hypernym of S)
– NS is a kind of S (NS is a hyponym of S)
The last two relations are computed from WordNet, which provides a tree of “is a” relationships.
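The four conditions above can be sketched with a toy stand-in for WordNet’s noun hierarchy; the sense names and hand-coded hypernym links below are invented for illustration, not taken from WordNet:

```python
# A toy stand-in for WordNet's noun "is a" hierarchy: each sense
# maps to its hypernym (parent). Sense names are invented.
hypernym_of = {
    "dog.n.01": "canine.n.01",
    "canine.n.01": "animal.n.01",
    "cat.n.01": "animal.n.01",
}

# Toy synonym sets (synsets): senses sharing a set are synonyms.
synonyms = {
    "dog.n.01": {"dog.n.01", "domestic_dog.n.01"},
}

def ancestors(sense):
    """All hypernyms of `sense`, walking up the toy tree."""
    out = set()
    while sense in hypernym_of:
        sense = hypernym_of[sense]
        out.add(sense)
    return out

def can_join(S, NS):
    """True if a noun with sense NS may join the meta-chain
    whose core meaning is S, per the four conditions above."""
    if NS == S:
        return True                      # same sense
    if NS in synonyms.get(S, set()):
        return True                      # synonyms
    if NS in ancestors(S):
        return True                      # NS is a hypernym of S
    if S in ancestors(NS):
        return True                      # NS is a hyponym of S
    return False

print(can_join("dog.n.01", "animal.n.01"))  # True (hypernym)
print(can_join("dog.n.01", "cat.n.01"))     # False (sibling only)
```

In a real implementation the hierarchy walk would query WordNet itself rather than a hand-coded dictionary.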

Silber & McCoy’s algorithm
How do we identify nouns in the input?
– use a part-of-speech (POS) tagger before computing lexical chains
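A minimal sketch of the noun-identification step, assuming the tagger’s output is already available as (token, tag) pairs with Penn Treebank-style tags:

```python
# (token, POS) pairs as a Penn Treebank-style tagger would emit them.
tagged = [
    ("Gates", "NNP"), ("dismissed", "VBD"), ("such", "JJ"),
    ("reservations", "NNS"), (".", "."),
]

# Penn Treebank noun tags: common and proper, singular and plural.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}

def nouns(tagged_tokens):
    """Keep only the tokens the tagger marked as nouns; these are
    the candidates fed to the lexical-chain builder."""
    return [tok for tok, tag in tagged_tokens if tag in NOUN_TAGS]

print(nouns(tagged))  # ['Gates', 'reservations']
```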

Silber & McCoy’s algorithm
How do we find the best interpretation?
– for each noun occurrence in the document, find the chain it “fits” into best
– give a score to each chain by giving a score to each link in the chain and summing them
– the score of each noun occurrence in a chain is the score of its strongest backward-looking relationship to other words in the chain

Silber & McCoy’s algorithm
Scoring a noun occurrence’s strongest backward-looking relationship:
– the first occurrence in the chain is given the maximal score
– other scores are computed based on:
  – relationship type: same word, synonym, hypernym, or sibling in the hypernym/hyponym tree
  – distance: within 1 sentence, within 2 sentences, same paragraph, etc.
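The shape of that computation can be sketched as a pair of lookup tables; the weights below are invented placeholders, not the values actually tuned in Silber & McCoy’s paper:

```python
# Placeholder weights: the real values in Silber & McCoy's paper
# differ; only the structure of the lookup matters here.
RELATION_WEIGHT = {"same_word": 1.0, "synonym": 0.9,
                   "hypernym": 0.7, "sibling": 0.4}
DISTANCE_WEIGHT = {"one_sentence": 1.0, "two_sentences": 0.8,
                   "same_paragraph": 0.6, "farther": 0.3}

def link_score(relation, distance):
    """Score one backward-looking link by relation type and
    textual distance, as the slides describe."""
    return RELATION_WEIGHT[relation] * DISTANCE_WEIGHT[distance]

def chain_score(links):
    """A chain's score is the sum of its per-occurrence link
    scores; the first occurrence gets the maximal score, 1.0."""
    return 1.0 + sum(link_score(r, d) for r, d in links)

score = chain_score([("synonym", "one_sentence"),
                     ("hypernym", "same_paragraph")])
print(round(score, 2))  # 2.32
```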

Silber & McCoy’s algorithm
Because WordNet is a constant-size resource (and given some implementation details), computing the best interpretation is linear in the number of nouns in the source document.

Silber & McCoy’s algorithm
So what do we do with our interpretation?
– compute the most important chains: the strong chains (Barzilay & Elhadad ’97)
– strong chains are those with a score at least two standard deviations above the mean score
And what do we do with our strong chains?
– generate a summary!
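The strong-chain cutoff is a one-liner; the chain scores below are invented toy values just to exercise the criterion:

```python
from statistics import mean, pstdev

def strong_chains(scores):
    """Keep chains scoring at least two standard deviations
    above the mean chain score (Barzilay & Elhadad's criterion)."""
    cutoff = mean(scores) + 2 * pstdev(scores)
    return [s for s in scores if s >= cutoff]

# Toy chain scores: one chain clearly dominates the rest.
scores = [1.0, 1.0, 1.0, 1.0, 1.0, 9.0]
print(strong_chains(scores))  # [9.0]
```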

Lexical chains (cont’d)
Barzilay & Elhadad’s summary generation heuristics:
– one method is to find the first sentence referring to each strong chain and include those sentences (in document order) in the summary
– another is to find the first sentence using only the representative nouns in a lexical chain
– representative nouns are the noun senses in a lexical chain that best identify the chain
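The first heuristic can be sketched as follows, assuming each strong chain is represented simply as the set of its member nouns (the sentences and chains below are made up):

```python
def first_sentence_per_chain(sentences, strong_chains):
    """For each strong chain (here, a set of its member nouns),
    pick the first sentence mentioning any member, then emit the
    chosen sentences in document order."""
    picked = set()
    for chain in strong_chains:
        for i, sent in enumerate(sentences):
            tokens = {w.strip(".,").lower() for w in sent.split()}
            if chain & tokens:
                picked.add(i)
                break
    return [sentences[i] for i in sorted(picked)]

sentences = [
    "Gates praised Bayesian networks.",
    "The networks encode cause-and-effect relationships.",
    "Ballmer had concerns about the investment.",
]
chains = [{"networks", "diagrams"}, {"investment", "stock"}]
print(first_sentence_per_chain(sentences, chains))
```

The second heuristic would differ only in matching against the chain’s representative nouns instead of all members.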

Silber & McCoy’s evaluation
How do we evaluate the lexical chain algorithm’s potential for summarization?
– We could generate summaries and have humans evaluate them, but that would conflate evaluation of the summary-generation algorithm with evaluation of the lexical chains algorithm.

Silber & McCoy’s evaluation
For background, consider the summarization process:
source document → intermediate representation → summary

Silber & McCoy’s evaluation
A summary should contain the important points of the document. Therefore, it will share properties with the document:
– the cosine distance between their term vectors might be low, for instance
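For instance, cosine distance over simple term-count vectors can be computed as below (the example texts are made up; a real evaluation would use the actual documents and summaries):

```python
import math
from collections import Counter

def cosine_distance(text_a, text_b):
    """1 minus the cosine similarity of simple term-count vectors;
    low distance means the two texts use similar vocabulary."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return 1.0 - dot / norm

doc = "bayesian networks encode relationships between variables"
summary = "bayesian networks encode relationships"
unrelated = "ballmer went to gates with concerns"

# A faithful summary sits closer to its document than unrelated text.
print(cosine_distance(doc, summary) < cosine_distance(doc, unrelated))
```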

Silber & McCoy’s evaluation
Idea: convert both the source document and the summary to the intermediate representation and compare the representations.
source document → intermediate representation ← summary

Silber & McCoy’s evaluation
To evaluate lexical chains, we compute two similarity measures:
– the percentage of noun senses in the summary that are in the strong lexical chains of the document
– the percentage of lexical chains of the document that have a noun sense in the summary
– note: noun senses in the summary are determined by the lexical chains algorithm
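The two percentages can be sketched as set operations; the synset names and counts below are invented toy data, not the paper’s actual results:

```python
def coverage_metrics(summary_senses, strong_chains):
    """The two evaluation percentages: how many summary noun senses
    appear in the document's strong chains, and how many strong
    chains are represented by some summary sense."""
    chain_senses = set().union(*strong_chains)
    senses_covered = sum(1 for s in summary_senses if s in chain_senses)
    chains_covered = sum(1 for c in strong_chains
                         if c & set(summary_senses))
    return (100.0 * senses_covered / len(summary_senses),
            100.0 * chains_covered / len(strong_chains))

# Invented toy data: 4 of 5 summary senses fall in some strong
# chain, and both strong chains are mentioned in the summary.
summary_senses = ["network.n.01", "system.n.01", "gates.n.01",
                  "diagram.n.01", "weapon.n.01"]
chains = [{"network.n.01", "system.n.01", "diagram.n.01"},
          {"gates.n.01", "executive.n.01"}]
print(coverage_metrics(summary_senses, chains))  # (80.0, 100.0)
```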

Silber & McCoy’s evaluation
Results:
– 24 documents with summaries were used, varying in document type (paper, book chapter) and discipline (computer science, anthropology, etc.)
– percentage of noun senses in the summary that are in the strong lexical chains of the document: about 80%
– percentage of lexical chains of the document that have a noun sense in the summary: about 80%

Silber & McCoy’s evaluation
Results (cont’d):
– two documents in particular scored very poorly; without them, both percentages were about 85%
– Why? I’ll hopefully figure it out later, but here are some deficiencies that I would like to look into:
  – pronoun resolution
  – non-noun concepts (or at least words that aren’t tagged as nouns)
  – noun compounds (for example, Monte Carlo algorithms)

References / FYI
Barzilay, R. and M. Elhadad. 1997. Using Lexical Chains for Text Summarization. In Proceedings of the Intelligent Scalable Text Summarization Workshop (ISTS ’97), Madrid, Spain.
– Obtained from CiteSeer
Morris, J. and G. Hirst. Lexical cohesion computed by thesaural relations as an indicator of the structure of text. Computational Linguistics, 18.
– Original lexical chains work
Silber, G. and K. McCoy. Efficiently Computed Lexical Chains As an Intermediate Representation for Automatic Text Summarization. Computational Linguistics.
– Obtained via McCoy’s website