Extractive Summarization using Inter- and Intra- Event Relevance
Wenjie Li, Wei Xu, Mingli Wu, Chunfa Yuan, Qin Lu
H.K. Polytechnic Univ. & Tsinghua Univ.

Presentation transcript:

Extractive Summarization using Inter- and Intra- Event Relevance
Wenjie Li, Wei Xu, Mingli Wu, Chunfa Yuan, Qin Lu
H.K. Polytechnic Univ. & Tsinghua Univ., China

2 Introduction
Extractive Summarization
- Extract concepts from documents (e.g. keywords, entities, events)
  - How to define concepts
- Identify the “most important” concepts
  - How to judge the importance of the concepts
- Create summaries by selecting sentences according to what concepts they contain

3 Introduction (Cont’d)
Motivation
- Events contain important information
- Documents describe more than one similar or related event
- Most existing event-based summarization approaches explore the importance of events independently, or need syntactic analysis
What we suggest
- Semi-structured events extracted with shallow NLP
- Event-relevance based summarization
  - Determine salient concepts using event relevance with a graph ranking algorithm
- What sorts of event relevance are better in what case
  - Intra-event relevance (direct relationship)
  - Inter-event relevance (indirect relationship)

4 Related Work
Event-based summarization
- Topics are judged as sub-events by humans; sentence relevance to each sub-event is then determined (Daniel et al., 2003)
- Atomic events, based on co-occurrence statistics of named entity relations (Filatova and Hatzivassiloglou, 2004)
- Employ the distribution of discourse entities to improve summary coherence (Barzilay and Lapata, 2005)
- Event-centric method; needs syntactic analysis of sentences (Vanderwende, 2004)
Summarization with graph ranking algorithms (e.g. PageRank)
- Sentence similarity according to term vectors (Mihalcea, 2005)
- Sentences are linked if they share similar events (Yoshioka and Haraguchi, 2004)
- The importance of the verbs and nouns constructing events is weighted as individual nodes; needs syntactic analysis (Vanderwende, 2004; Leskovec, 2004)

5 Event Definition
Comments
- Events are collections of activities together with associated entities
- It is more appropriate to consider events at sentence level than at document level
- Not all verbs denote an event happening
- Semantic similarity or relatedness between action words should be taken into account
Solution
- Semi-structured events: take advantage of statistical techniques from the IR community and structured information from the IE community
- Avoid the complexity of deep semantic and syntactic processing

6 Event Definition (Cont’d)
Who did What to Whom, When and Where
Event = event term + associated named entities
Definition: event term
- Verbs and action nouns appearing at least once between two named entities
- Characterize actions or incident occurrences
- Roughly relate to “did What”
Definition: associated named entities
- Named entities connected to the event term
- 4 types of named entities + high frequency nouns: Person, Organization, Location, Date
- Convey the information of “Who”, “Whom”, “When” and “Where”
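The event definition above can be sketched in a few lines. This is only an illustration under stated assumptions: the tag set ("NE", "V", "AN", "O") and the function name `extract_events` are hypothetical, tokens are assumed to be tagged already, and "between two named entities" is read here as "between two consecutive named entities in the same sentence".

```python
from typing import List, Tuple

# Hypothetical tag set for this sketch: "NE" marks a named entity,
# "V" a verb, "AN" an action noun, "O" anything else.
Token = Tuple[str, str]  # (word, tag)


def extract_events(tokens: List[Token]) -> List[Tuple[str, List[str]]]:
    """Return (event term, associated named entities) pairs for one sentence."""
    ne_positions = [i for i, (_, tag) in enumerate(tokens) if tag == "NE"]
    events = []
    # Assumption: look at each pair of consecutive named entities.
    for left, right in zip(ne_positions, ne_positions[1:]):
        for word, tag in tokens[left + 1:right]:
            if tag in ("V", "AN"):
                events.append((word, [tokens[left][0], tokens[right][0]]))
    return events


sentence = [("Gates", "NE"), ("urged", "V"), ("an", "O"),
            ("alliance", "AN"), ("with", "O"), ("Netscape", "NE")]
events = extract_events(sentence)
```

In this toy sentence both "urged" and "alliance" end up as event terms associated with "Gates" and "Netscape".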

7 Event Map
- Events are related to one another semantically, temporally, spatially, causally or conditionally
Graph structure
- Nodes: event terms (ET) & named entities (NE)
  - Words in either their original form or morphological variations are represented with a single node, regardless of how many times they appear
  - Nodes represent concepts rather than instances
  - Advantages (vs. event/sentence nodes):
    - Convenient for analyzing the relevance among event terms and named entities, by either semantic or distributional similarity
    - Allows concept extraction and further conceptual compression
- Links: undirected
  - All event terms and named entities involved can be explicitly or implicitly related

8 Event Map (Cont’d) – An Example
Segment of a text from DUC2004 on “antitrust case against Microsoft”
S1: The Justice Department and the 20 states suing Microsoft believe that the tape will strengthen their case because it shows Gates saying he was not involved in plans to take what the government alleges were illegal steps to stifle competition in the Internet software market.
S2: It showed a few brief clips of a point in the deposition when Gates was asked about a meeting on June 21, 1995, at which, the government alleges, Microsoft offered to divide the browser market with Netscape and to make an investment in the company, which is its chief rival in that market.
S3: In the taped deposition, Gates says he recalled being asked by one of his subordinates whether he thought it made sense to invest in Netscape.
S4: But in an [...] on May 31, 1995, Gates urged an alliance with Netscape.
S5: The contradiction between Gates' deposition and his [...], though, does not of itself speak to the issue of whether Microsoft made an illegal offer to Netscape.

9 Event Map (Cont’d) – An Example
[Figure: event map of the example text; legend: event term (ET), named entity (NE)]

10 Event Map (Cont’d)
Weighted graph
- We integrate the strength of the connections between nodes into the event graph model, in terms of relevance defined from different perspectives
- Links are weighted by the relevance between nodes
PageRank: graph ranking algorithm
- Calculates the significance of a node according to the relevance between nodes and the structure of the graph
Focus of our research
- How to derive the relevance between nodes according to intra- and/or inter-event relevance
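As a rough illustration of ranking nodes on a weighted graph, the sketch below runs a PageRank-style power iteration in which each node passes its score to its neighbours in proportion to link weight. It is not taken from the slides: the function name `weighted_pagerank`, the damping factor of 0.85, the fixed iteration count and the toy weights are all assumptions, and every node is assumed to have at least one link.

```python
def weighted_pagerank(weights, damping=0.85, iterations=50):
    """weights: dict node -> dict neighbour -> non-negative relevance.

    For an undirected event map the weights should be symmetric.
    Assumption: every node has at least one link with positive weight.
    """
    nodes = list(weights)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    out_sum = {v: sum(weights[v].values()) for v in nodes}
    for _ in range(iterations):
        new_rank = {}
        for v in nodes:
            # Score flowing into v, split by relative link weight at the source.
            incoming = sum(
                rank[u] * weights[u][v] / out_sum[u]
                for u in nodes
                if v in weights[u] and out_sum[u] > 0
            )
            new_rank[v] = (1 - damping) / n + damping * incoming
        rank = new_rank
    return rank


# Toy event map: one event term linked to two named entities.
toy = {
    "urge": {"Gates": 2, "Netscape": 1},
    "Gates": {"urge": 2},
    "Netscape": {"urge": 1},
}
ranks = weighted_pagerank(toy)
```

In the toy graph the event term collects score from both entities, and "Gates" outranks "Netscape" because its link carries more weight.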

11 Intra- & Inter- Event Relevance
Relevance Matrix
- Matrix element: relevance between nodes (ETs, NEs)

                    Event Term (ET)   Named Entity (NE)
Event Term (ET)     R(ET, ET)         R(ET, NE)
Named Entity (NE)   R(NE, ET)         R(NE, NE)

12 Intra- & Inter- Event Relevance (Cont’d)
Relevance Matrix
Intra-event relevance
- Direct relevance, explicit in the text (event map)
- Measures connections between actions and arguments
- Symmetric
- Blocks of the relevance matrix: R(ET, NE) and R(NE, ET)

13 Intra- & Inter- Event Relevance (Cont’d)
Relevance Matrix
Inter-event relevance
- Indirect relevance, which needs to be derived from an external resource or from the overall event distribution
- Measures how an event term/named entity connects to another event term/named entity
- Blocks of the relevance matrix: R(ET, ET) and R(NE, NE)
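One way to picture the relevance matrix is as four blocks stacked into a single square matrix over all nodes. The sketch below is a minimal illustration with plain Python lists; the function name `assemble_relevance_matrix` is hypothetical, and R(NE, ET) is taken to be the transpose of R(ET, NE), following the symmetry noted for intra-event relevance.

```python
def assemble_relevance_matrix(r_et_et, r_et_ne, r_ne_ne):
    """Stack the blocks into one (m + n) x (m + n) matrix.

    r_et_et: m x m, r_et_ne: m x n, r_ne_ne: n x n (lists of lists).
    Assumption: R(NE, ET) is the transpose of R(ET, NE).
    """
    r_ne_et = [list(column) for column in zip(*r_et_ne)]
    top = [ee + en for ee, en in zip(r_et_et, r_et_ne)]
    bottom = [ne + nn for ne, nn in zip(r_ne_et, r_ne_ne)]
    return top + bottom


# Two event terms and one named entity.
matrix = assemble_relevance_matrix(
    r_et_et=[[0, 1], [1, 0]],
    r_et_ne=[[2], [3]],
    r_ne_ne=[[0]],
)
```

Setting a block to zeros is then a simple way to switch one kind of relevance off when comparing configurations.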

14 Intra- & Inter- Event Relevance (Cont’d)
How to determine intra-event relevance
- Intra-event relevance can be established simply by counting how many times an event term and a named entity are associated
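The counting step can be sketched as follows, assuming events have already been extracted as (event term, named entities) pairs with words normalized to one form per concept. The function name `intra_event_relevance` and the toy events are illustrative only.

```python
from collections import Counter


def intra_event_relevance(events):
    """Count how often each (event term, named entity) pair is associated."""
    counts = Counter()
    for term, entities in events:
        for entity in entities:
            counts[(term, entity)] += 1
    return counts


toy_events = [
    ("urge", ["Gates", "Netscape"]),
    ("invest", ["Netscape"]),
    ("urge", ["Gates"]),
]
r_et_ne = intra_event_relevance(toy_events)
```

Pairs that never co-occur simply read as 0, since `Counter` returns 0 for missing keys.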

15 Intra- & Inter- Event Relevance (Cont’d)
How to determine inter-event relevance
Event term relevance: R(ET, ET)
1. Semantic relevance from WordNet
  - We use WordNet::Similarity to measure the relatedness of concepts (event terms in our case), with the lesk metric
2. Topic-specific relevance from documents
  - Assumption: if 2 events are concerned with the same participant, location or time, these 2 events are interrelated in some way
  - Event term relevance can then be derived from the number of named entities they share
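The topic-specific variant lends itself to a short sketch: relevance between two event terms is taken as the raw number of named entities they share. Whether the original system normalizes this count is not stated here, so the unnormalized count, the function name `event_term_relevance` and the toy data are assumptions.

```python
from itertools import combinations


def event_term_relevance(term_entities):
    """term_entities: dict event term -> set of associated named entities.

    Returns {(term_a, term_b): number of shared entities} for pairs that
    share at least one entity, with term_a < term_b alphabetically.
    """
    relevance = {}
    for a, b in combinations(sorted(term_entities), 2):
        shared = len(term_entities[a] & term_entities[b])
        if shared:
            relevance[(a, b)] = shared
    return relevance


toy_terms = {
    "urge": {"Gates", "Netscape"},
    "invest": {"Netscape", "Microsoft"},
    "sue": {"Justice Department", "Microsoft"},
}
r_et_et = event_term_relevance(toy_terms)
```

The same function gives a document-based R(NE, NE) if it is fed a mapping from each named entity to the set of event terms it occurs with.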

16 Intra- & Inter- Event Relevance (Cont’d)
How to determine inter-event relevance
Named entity relevance: R(NE, NE)
1. Named entity relevance from documents
  - Named entity relevance can then be derived from the number of event terms they share
2. Named entity relevance from clustering
  - We proposed a clustering algorithm based on words

Location            Person                            Date                            Organization
Mississippi         Professor Sir Richard Southwood   First six months of last year   Long Beach City Council
Mississippi River   Sir Richard Southwood             Last year                       San Jose City Council
                    Richard Southwood                                                 City Council
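The slide does not spell out the clustering algorithm, so the following is only a stand-in to make the idea of word-based clustering concrete: names that share at least one word (case-insensitive) are merged into the same cluster. The function name `cluster_by_shared_words` and the merging rule are assumptions, not the authors' method, and in practice it would presumably be run separately per entity type.

```python
def cluster_by_shared_words(names):
    """Group names so that names sharing any word land in the same cluster."""
    clusters = []  # each entry: (set of words, list of names)
    for name in names:
        words = set(name.lower().split())
        merged_words, merged_names = set(words), [name]
        remaining = []
        for cluster_words, cluster_names in clusters:
            if cluster_words & words:
                # The new name bridges into this cluster: merge them.
                merged_words |= cluster_words
                merged_names = cluster_names + merged_names
            else:
                remaining.append((cluster_words, cluster_names))
        clusters = remaining + [(merged_words, merged_names)]
    return [sorted(cluster_names) for _, cluster_names in clusters]


groups = cluster_by_shared_words([
    "Mississippi",
    "Sir Richard Southwood",
    "Mississippi River",
    "Richard Southwood",
])
```

With this rule "Mississippi" and "Mississippi River" fall into one cluster and the two "Southwood" variants into another, in line with the columns of the example table.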

17 Intra- & Inter- Event Relevance (Cont’d)
How to determine inter-event relevance
Named entity relevance: R(NE, NE)
3. Named entity relevance from sentence patterns
  - Named entity relevance can then be revealed by sentence context
  - Examples of sentence patterns
    - Window-based: neighboring named entities are usually relevant
    - [...], a-position-name of [...], does something. [...] and another [...] do something.
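The window-based pattern can be sketched by counting how often two named entities appear next to each other within a sentence. In this sketch the window is measured over the sequence of named entities rather than over tokens, which is an assumption, as are the function name `window_relevance` and the default window of 1.

```python
from collections import Counter


def window_relevance(sentences, window=1):
    """sentences: list of lists of named entities, in order of appearance.

    Counts unordered pairs of distinct entities that are at most `window`
    positions apart in the entity sequence of a sentence.
    """
    counts = Counter()
    for entities in sentences:
        for i, first in enumerate(entities):
            for second in entities[i + 1:i + 1 + window]:
                if first != second:
                    counts[tuple(sorted((first, second)))] += 1
    return counts


r_ne_ne = window_relevance([
    ["Gates", "Microsoft", "Netscape"],
    ["Microsoft", "Netscape"],
])
```

Keys are stored as alphabetically sorted pairs, so look-ups should use the same ordering.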

18 Extractive Summarization System Overview
1. Recognize 4 types of named entities, nouns and verbs with GATE
2. Determine event terms (with stemming, without stop words)
3. Extract events
4. Derive event relevance
5. Determine the salience of concepts with PageRank
6. Select sentences containing salient concepts
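The final step of the pipeline, picking sentences that contain salient concepts, might look like the sketch below. Scoring a sentence by the sum of its concepts' salience, the word budget and the function name `select_sentences` are assumptions made for illustration; only exact duplicate sentences are dropped.

```python
def select_sentences(sentences, salience, max_words):
    """sentences: list of (text, set of concepts); salience: concept -> score.

    Greedily takes the highest-scoring sentences that still fit the budget.
    """
    ranked = sorted(
        sentences,
        key=lambda item: sum(salience.get(c, 0.0) for c in item[1]),
        reverse=True,
    )
    summary, seen, used = [], set(), 0
    for text, _ in ranked:
        length = len(text.split())
        if text in seen or used + length > max_words:
            continue  # skip exact duplicates and sentences that do not fit
        summary.append(text)
        seen.add(text)
        used += length
    return summary


toy_salience = {"urge": 0.5, "Gates": 0.3, "Netscape": 0.2, "invest": 0.1}
toy_sentences = [
    ("Gates urged an alliance with Netscape", {"urge", "Gates", "Netscape"}),
    ("He thought about whether to invest in Netscape", {"invest", "Netscape"}),
    ("Gates urged an alliance with Netscape", {"urge", "Gates", "Netscape"}),
]
summary = select_sentences(toy_sentences, toy_salience, max_words=14)
```

Summing salience favours long sentences; dividing by sentence length would be one alternative, though nothing here indicates which the original system used.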

19 Evaluation
Task: automatic text summarization
Data: DUC 2001 multi-document summarization task
- 30 English document sets
- Summaries of 50, 100, 200 and 400 words
- Each set includes 10.3 documents, 602 sentences, 216 event terms, named entities
Evaluation metric: ROUGE
- Automatic evaluation
- Based on n-gram co-occurrence
- Comparing with human judgments
Method: focus on the efficiency and potential of the event-relevance based approach
- No other features, such as sentence position, headline, publication dates, etc.
- Simple greedy strategy to extract the most salient sentences
- No avoidance of sentence redundancy; only identical sentences are removed
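To make "based on n-gram co-occurrence" concrete, here is a simplified recall-style ROUGE-N against a single reference. It is a sketch only: the official ROUGE toolkit supports options such as stemming and multiple references that are left out, and the names `ngrams` and `rouge_n_recall` are chosen for this example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """Share of reference n-grams that also occur in the candidate (clipped)."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


candidate = "Gates urged an alliance with Netscape"
reference = "Gates urged Microsoft to form an alliance with Netscape"
rouge_1 = rouge_n_recall(candidate, reference, n=1)
rouge_2 = rouge_n_recall(candidate, reference, n=2)
```

Here six of the nine reference unigrams and four of the eight reference bigrams are matched by the candidate.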

20 Evaluation (1)
Intra-event relevance: R(ET, NE) & R(NE, ET)
High frequency nouns
- Some frequently occurring nouns, such as “hurricane” and “euro”, are not marked by general NE taggers, but they indicate persons, organizations, locations or important objects
- A noun is considered a frequent noun when its frequency is larger than 10
- 5% improvement with high frequency nouns
[Table: ROUGE scores (including ROUGE-W) for R(ET, NE), NE w/ high frequency nouns vs. NE w/o high frequency nouns]

21 Evaluation (2)
Inter-event relevance: R(ET, ET) & R(NE, NE)
R(ET, ET)
- 2 approaches:
  - Semantic relevance from WordNet
  - Topic-specific relevance from documents: number of named entities shared by event terms
- Example event term pairs with highest relevance
  - “abort”-“confirm” (semantics, antonymous)
  - “vote”-“confirm” (associated, causal)
- “Document” outperforms “WordNet” by 4%
  - Reason: WordNet may introduce unnecessary relatedness in topic-specific documents
[Table: ROUGE scores (including ROUGE-W) for R(ET, ET), semantic relevance from WordNet vs. topic-specific relevance from documents]

22 Evaluation (3)
Inter-event relevance: R(ET, ET) & R(NE, NE)
R(NE, NE)
- Example named entity pairs with highest relevance
  - “Louisiana”-“Florida” (something may happen in both places)
  - “Florida”-“Andrew” (something about Andrew may happen in Florida)
- Best result is “Document”: NE relevance derived from the number of shared event terms
  - Reason: relevance derived from clustering and neighborhoods can also be discovered by [...]
[Table: ROUGE scores (including ROUGE-W) for R(NE, NE), relevance from documents vs. relevance from clustering vs. relevance from window-based context]

23 Evaluation (4)
Event relevance
- Integration of R(ET, NE), R(ET, ET) and R(NE, NE)
- Different summary lengths: 50, 100, 200 and 400 words
- Baseline: “Event-based Extractive Summarization”, Elena Filatova & Vasileios Hatzivassiloglou, 2004
  - DUC 2001, 200-word summaries, ROUGE-1 about 0.3
- Significant improvement compared with the baseline
- Event-based approaches prefer longer summaries
[Table: ROUGE for R(NE, NE), R(ET, NE), R(ET, ET) and R(ET, NE) + R(ET, ET) + R(NE, NE)]

24 Conclusion & Future Work
Conclusion
- Extract events according to actions and associated named entities
- Use events to denote concepts
- Use inter- and intra-event relevance to rank events
- The event-based summarizer gives good performance on news documents
Future work
- Improve event representation to build a more powerful event-based summarization system
- Text compression technique based on concepts
- What features of a document set favor event-based approaches (beyond the news domain)
- Influence of IE performance (e.g. POS tagger, NE tagger)

25 Thank You Very Much
Do you have any questions?