Published by Cory Fox; modified over 8 years ago
1
Extractive Summarization using Inter- and Intra- Event Relevance
Wenjie Li, Wei Xu, Mingli Wu, Chunfa Yuan, Qin Lu
The Hong Kong Polytechnic University & Tsinghua University, China
2
Introduction: Extractive Summarization
− Extract concepts from documents, e.g., keywords, entities, events (how to define concepts)
− Identify the "most important" concepts (how to judge the importance of the concepts)
− Create summaries by selecting sentences according to the concepts they contain
3
Introduction (Cont'd)
Motivation
− Events contain important information.
− Documents describe more than one similar or related event.
− Most existing event-based summarization approaches assess the importance of events independently, or require syntactic analysis.
What we suggest
− Semi-structured events extracted with shallow NLP
− Event-relevance-based summarization: determine salient concepts using event relevance with a graph ranking algorithm
− Which sorts of event relevance work better in which cases:
  − Intra-event relevance (direct relationship)
  − Inter-event relevance (indirect relationship)
4
Related Work
Event-based summarization
− Topics are judged as sub-events by humans, and sentence relevance to each sub-event is determined (Daniel et al., 2003)
− Atomic events based on co-occurrence statistics of named entity relations (Filatova and Hatzivassiloglou, 2004)
− The distribution of discourse entities is used to improve summary coherence (Barzilay and Lapata, 2005)
− Event-centric method; needs syntactic analysis of sentences (Vanderwende, 2004)
Summarization with graph ranking algorithms (e.g., PageRank)
− Sentence similarity according to term vectors (Mihalcea, 2005)
− Sentences are linked if they share similar events (Yoshioka and Haraguchi, 2004)
− The importance of the verbs and nouns constructing events is weighted as individual nodes; needs syntactic analysis (Vanderwende, 2004; Leskovec, 2004)
5
Event Definition
Comments
− Events are collections of activities together with associated entities.
− It is more appropriate to consider events at the sentence level rather than the document level.
− Not all verbs denote that an event happens.
− Semantic similarity or relatedness between action words should be taken into account.
Solution
− Semi-structured events: take advantage of statistical techniques from the IR community and structured information from the IE community
− Avoid the complexity of deep semantic and syntactic processing
6
Event Definition (Cont'd)
Who did What to Whom, When and Where
Event = event term + associated named entities
Definition: event term
− Verbs and action nouns appearing at least once between two named entities
− Characterize actions or incident occurrences
− Roughly relate to "did What"
Definition: associated named entities
− Named entities connected to an event term
− Four types of named entities (Person, Organization, Location, Date) plus high-frequency nouns
− Convey the information of "Who", "Whom", "When" and "Where"
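The event-term definition can be sketched as a filter over a tagged sentence. This is a minimal illustration, not the authors' implementation: it assumes tokens already carry hypothetical tags ("NE" named entity, "V" verb, "AN" action noun, "O" other) from some upstream tagger, and it treats every named entity in the sentence as associated with its events, following the sentence-level view above.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical tag set: "NE" named entity, "V" verb, "AN" action noun, "O" other.
Token = Tuple[str, str]
Event = Tuple[str, FrozenSet[str]]


def extract_events(sentence: List[Token]) -> List[Event]:
    """Return (event term, associated named entities) pairs for one sentence."""
    ne_positions = [i for i, (_, tag) in enumerate(sentence) if tag == "NE"]
    if len(ne_positions) < 2:
        return []  # an event term must appear between two named entities
    # Assumption: all named entities in the sentence are associated with its events.
    entities = frozenset(word for word, tag in sentence if tag == "NE")
    events: List[Event] = []
    for i, (word, tag) in enumerate(sentence):
        # Keep verbs/action nouns with at least one NE before and one NE after them.
        if tag in ("V", "AN") and ne_positions[0] < i < ne_positions[-1]:
            events.append((word, entities))
    return events


sentence = [("Gates", "NE"), ("urged", "V"), ("an", "O"),
            ("alliance", "AN"), ("with", "O"), ("Netscape", "NE")]
print(extract_events(sentence))
```

For S4 of the example below ("Gates urged an alliance with Netscape"), both "urged" and "alliance" qualify, each associated with Gates and Netscape.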
7
Event Map
Events are related to one another semantically, temporally, spatially, causally or conditionally.
Graph structure
− Nodes: event terms (ET) and named entities (NE)
  − A word in its original form and its morphological variants is represented by a single node, regardless of how many times it appears.
  − Nodes represent concepts rather than instances.
  − Advantages (vs. event or sentence nodes): it is convenient to analyze the relevance among event terms and named entities by either semantic or distributional similarity, and it allows concept extraction and further conceptual compression.
− Links: undirected
  − All event terms and named entities involved can be explicitly or implicitly related.
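A minimal sketch of building the event map from extracted events, assuming events arrive as (event term, named entities) pairs. `crude_stem` is a hypothetical suffix stripper standing in for whatever stemmer the system actually used, and only event terms are stemmed here. Tagging each node with its type keeps an event term and a named entity with the same spelling apart.

```python
from typing import Callable, FrozenSet, Iterable, Set, Tuple

Node = Tuple[str, str]  # ("ET", concept) or ("NE", entity)


def crude_stem(word: str) -> str:
    """Hypothetical stand-in for a real stemmer: lowercase and strip a few suffixes."""
    w = word.lower()
    for suffix in ("ing", "ed", "es", "s"):
        if w.endswith(suffix) and len(w) > len(suffix) + 2:
            return w[: -len(suffix)]
    return w


def build_event_map(
    events: Iterable[Tuple[str, Iterable[str]]],
    stem: Callable[[str], str] = crude_stem,
) -> Tuple[Set[Node], Set[FrozenSet[Node]]]:
    """Build an undirected event map whose nodes are concepts, not word instances."""
    nodes: Set[Node] = set()
    edges: Set[FrozenSet[Node]] = set()
    for term, entities in events:
        et: Node = ("ET", stem(term))  # morphological variants share one node
        nodes.add(et)
        for entity in entities:
            ne: Node = ("NE", entity)
            nodes.add(ne)
            edges.add(frozenset((et, ne)))  # frozenset: {a, b} == {b, a}
    return nodes, edges
```

With this toy stemmer, "urged" and "urging" collapse into one node, so a repeated Gates link adds no new edge.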
8
Event Map (Cont'd): An Example
Segment of a text from DUC 2004 on "antitrust case against Microsoft":
S1: The Justice Department and the 20 states suing Microsoft believe that the tape will strengthen their case because it shows Gates saying he was not involved in plans to take what the government alleges were illegal steps to stifle competition in the Internet software market.
S2: It showed a few brief clips of a point in the deposition when Gates was asked about a meeting on June 21, 1995, at which, the government alleges, Microsoft offered to divide the browser market with Netscape and to make an investment in the company, which is its chief rival in that market.
S3: In the taped deposition, Gates says he recalled being asked by one of his subordinates whether he thought it made sense to invest in Netscape.
S4: But in an e-mail on May 31, 1995, Gates urged an alliance with Netscape.
S5: The contradiction between Gates' deposition and his e-mail, though, does not of itself speak to the issue of whether Microsoft made an illegal offer to Netscape.
9
Event Map (Cont'd): An Example
[Figure: event map of sentences S1 to S5; legend: event term (ET), named entity (NE)]
10
Event Map (Cont'd)
Weighted graph
− We integrate the strength of the connections between nodes into the event graph model, in terms of relevance defined from different perspectives.
− Edge weights: the relevance between nodes
PageRank (graph ranking algorithm)
− Calculates the significance of each node according to the relevance weights and the structure of the graph
Focus of our research
− How to derive the relevance weights from intra- and/or inter-event relevance
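Ranking nodes in the weighted event map can be sketched with a weighted variant of PageRank computed by power iteration. This is an illustration under assumptions: the graph is a symmetric nested dict of positive relevance weights with every node present as a key, the damping factor 0.85 is the common default rather than a value stated in the slides, and nodes without links spread their score uniformly.

```python
from typing import Dict, Hashable

Weights = Dict[Hashable, Dict[Hashable, float]]


def weighted_pagerank(
    weights: Weights, damping: float = 0.85, tol: float = 1e-10, max_iter: int = 200
) -> Dict[Hashable, float]:
    """Score nodes so that a node is important if relevant nodes link to it."""
    nodes = list(weights)
    n = len(nodes)
    score = {v: 1.0 / n for v in nodes}
    # Each node splits its score among neighbours in proportion to edge weight.
    out_weight = {v: sum(weights[v].values()) for v in nodes}
    for _ in range(max_iter):
        # Score held by nodes without links is redistributed uniformly.
        dangling = sum(score[u] for u in nodes if out_weight[u] == 0)
        new = {}
        for v in nodes:
            incoming = sum(
                score[u] * weights[u][v] / out_weight[u]
                for u in nodes
                if v in weights[u] and out_weight[u] > 0
            )
            new[v] = (1 - damping) / n + damping * (incoming + dangling / n)
        delta = sum(abs(new[v] - score[v]) for v in nodes)
        score = new
        if delta < tol:
            break
    return score


relevance = {
    "Gates": {"Netscape": 2.0},
    "Netscape": {"Gates": 2.0, "urge": 1.0},
    "urge": {"Netscape": 1.0},
}
ranks = weighted_pagerank(relevance)
print(max(ranks, key=ranks.get))  # → Netscape
```

Because the scores always sum to 1, salience values from different relevance settings stay on a comparable scale.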
11
Intra- & Inter- Event Relevance
Relevance matrix: each element is the relevance between two nodes (ETs, NEs)

                  | Event Term (ET) | Named Entity (NE)
Event Term (ET)   | R(ET, ET)       | R(ET, NE)
Named Entity (NE) | R(NE, ET)       | R(NE, NE)
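One way to assemble the relevance matrix is sketched below, assuming each block is supplied as a plain Python function; `relevance_matrix` and its parameter names are hypothetical. A block function that always returns 0 switches that kind of relevance off, so intra-event relevance (R(ET, NE)) and inter-event relevance (R(ET, ET), R(NE, NE)) can be tried separately. R(NE, ET) is filled from R(ET, NE), assuming the matrix is symmetric.

```python
from typing import Callable, Dict, List, Tuple

Node = Tuple[str, str]  # ("ET", event term) or ("NE", named entity)
Relevance = Callable[[str, str], float]


def relevance_matrix(
    event_terms: List[str],
    entities: List[str],
    r_et_et: Relevance,
    r_et_ne: Relevance,
    r_ne_ne: Relevance,
) -> Dict[Node, Dict[Node, float]]:
    """Assemble the 2x2 block relevance matrix as a sparse nested dict."""
    nodes: List[Node] = [("ET", t) for t in event_terms] + [("NE", e) for e in entities]

    def relevance(a: Node, b: Node) -> float:
        (kind_a, word_a), (kind_b, word_b) = a, b
        if kind_a == "ET" and kind_b == "ET":
            return r_et_et(word_a, word_b)
        if kind_a == "NE" and kind_b == "NE":
            return r_ne_ne(word_a, word_b)
        # Mixed block: R(NE, ET) is taken to equal R(ET, NE).
        return r_et_ne(word_a, word_b) if kind_a == "ET" else r_et_ne(word_b, word_a)

    matrix: Dict[Node, Dict[Node, float]] = {v: {} for v in nodes}
    for a in nodes:
        for b in nodes:
            weight = relevance(a, b) if a != b else 0.0
            if weight > 0:
                matrix[a][b] = weight  # keep only non-zero entries
    return matrix
```

Every node appears as a key even when it has no links, which is the shape a weighted graph-ranking routine typically expects.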
12
Intra- & Inter- Event Relevance (Cont'd)
Relevance matrix: intra-event relevance, R(ET, NE) and R(NE, ET)
− Direct relevance, explicit in the text (event map)
− Measures the connections between actions and their arguments
− Symmetric: R(ET, NE) = R(NE, ET)
13
Intra- & Inter- Event Relevance (Cont'd)
Relevance matrix: inter-event relevance, R(ET, ET) and R(NE, NE)
− Indirect relevance, which needs to be derived from external resources or from the overall event distribution
− Measures how an event term (or named entity) connects to another event term (or named entity)
14
Intra- & Inter- Event Relevance (Cont'd)
How to determine intra-event relevance
− Intra-event relevance R(ET, NE) can be established simply by counting how many times an event term and a named entity are associated in events.
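The counting can be sketched with `collections.Counter`, assuming events are (event term, named entities) pairs; each pair is counted at most once per event. The slides do not say whether the counts are normalized, so this sketch returns raw counts.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def intra_event_relevance(
    events: Iterable[Tuple[str, Iterable[str]]]
) -> Dict[Tuple[str, str], int]:
    """R(ET, NE): how many events associate an event term with a named entity."""
    counts: Counter = Counter()
    for term, entities in events:
        for entity in set(entities):  # count each pair once per event
            counts[(term, entity)] += 1
    return dict(counts)


events = [("urge", ["Gates", "Netscape"]), ("offer", ["Microsoft", "Netscape"]), ("urge", ["Gates"])]
print(intra_event_relevance(events)[("urge", "Gates")])  # → 2
```

R(NE, ET) is read from the same table with the key reversed.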
15
Intra- & Inter- Event Relevance (Cont'd)
How to determine inter-event relevance: event term relevance, R(ET, ET)
1. Semantic relevance from WordNet
   − We use WordNet::Similarity to measure the relatedness of concepts (event terms in our case) and choose the lesk metric.
2. Topic-specific relevance from documents
   − Assumption: if two events concern the same participant, location or time, the two events are interrelated in some way.
   − Event term relevance can then be derived from the number of named entities the event terms share.
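The topic-specific measure can be sketched as counting distinct shared named entities, assuming (event term, named entities) pairs as input. The WordNet lesk measure comes from the Perl package WordNet::Similarity and is not reproduced here.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def shared_entity_relevance(
    events: Iterable[Tuple[str, Iterable[str]]]
) -> Dict[Tuple[str, str], int]:
    """Topic-specific R(ET, ET): number of distinct named entities two event terms share."""
    entities_of: Dict[str, Set[str]] = defaultdict(set)
    for term, entities in events:
        entities_of[term].update(entities)
    relevance: Dict[Tuple[str, str], int] = {}
    for a, b in combinations(sorted(entities_of), 2):
        shared = len(entities_of[a] & entities_of[b])
        if shared:
            # Store both orders so lookups do not depend on argument order.
            relevance[(a, b)] = relevance[(b, a)] = shared
    return relevance
```

Swapping the roles (counting the event terms that two named entities share) gives the document-based R(NE, NE) described on the next slide.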
16
Intra- & Inter- Event Relevance (Cont'd)
How to determine inter-event relevance: named entity relevance, R(NE, NE)
1. Named entity relevance from documents
   − Named entity relevance can be derived from the number of event terms the named entities share.
2. Named entity relevance from clustering
   − We propose a word-based clustering algorithm. Example clusters:

   Location          | Person                          | Date                          | Organization
   Mississippi       | Professor Sir Richard Southwood | First six months of last year | Long Beach City Council
   Mississippi River | Sir Richard Southwood           | Last year                     | San Jose City Council
                     | Richard Southwood               |                               | City Council
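The slides say only that the clustering is word-based, so the following is a hypothetical stand-in rather than the authors' algorithm: within one entity type, named entities that share a non-stopword word (case-insensitive) are merged with union-find. The stop list is an assumption. On the example columns above, it reproduces each column as one cluster.

```python
from typing import Dict, Iterable, List

# Hypothetical stop list: function words should not link two entities.
STOPWORDS = {"a", "an", "and", "of", "the"}


def cluster_by_shared_words(
    entities: List[str], stopwords: Iterable[str] = STOPWORDS
) -> List[List[str]]:
    """Group same-type named entities that share at least one content word."""
    stop = set(stopwords)
    parent = list(range(len(entities)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    first_owner: Dict[str, int] = {}
    for i, entity in enumerate(entities):
        for word in entity.lower().split():
            if word in stop:
                continue
            if word in first_owner:
                parent[find(i)] = find(first_owner[word])  # union the two groups
            else:
                first_owner[word] = i
    groups: Dict[int, List[str]] = {}
    for i, entity in enumerate(entities):
        groups.setdefault(find(i), []).append(entity)
    return list(groups.values())


print(cluster_by_shared_words(["Mississippi", "Mississippi River", "Louisiana"]))
# → [['Mississippi', 'Mississippi River'], ['Louisiana']]
```

Entities in the same cluster could then be given a positive R(NE, NE); how the original system weighted them is not stated.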
17
Intra- & Inter- Event Relevance (Cont'd)
How to determine inter-event relevance: named entity relevance, R(NE, NE)
3. Named entity relevance from sentence patterns
   − Named entity relevance can be revealed by sentence context.
   − Example sentence patterns (named entity slots marked [NE]): "[NE], a-position-name of [NE], does something." and "[NE] and another [NE] do something."
   − Window-based: neighboring named entities are usually relevant.
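The window-based idea can be sketched as counting named-entity pairs whose mentions fall within a fixed number of tokens of each other in the same sentence. The default window of 5 tokens and the input format (per-sentence lists of (token position, entity) mentions) are assumptions; the slides give neither.

```python
from collections import Counter
from typing import Dict, List, Tuple

Mention = Tuple[int, str]  # (token position in the sentence, named entity)


def window_relevance(
    sentences: List[List[Mention]], window: int = 5
) -> Dict[Tuple[str, str], int]:
    """Count how often two named entities occur within `window` tokens of each other."""
    counts: Counter = Counter()
    for mentions in sentences:
        for i, (pos_a, a) in enumerate(mentions):
            for pos_b, b in mentions[i + 1 :]:
                if a != b and abs(pos_a - pos_b) <= window:
                    pair = (a, b) if a < b else (b, a)  # unordered pair key
                    counts[pair] += 1
    return dict(counts)


print(window_relevance([[(0, "Gates"), (6, "Netscape"), (20, "Microsoft")]], window=8))
# → {('Gates', 'Netscape'): 1}
```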
18
Extractive Summarization
System overview
1. Recognize the four types of named entities, nouns and verbs with GATE
2. Determine event terms (with stemming, without stop words)
3. Extract events
4. Derive event relevance
5. Determine the salience of concepts with PageRank
6. Select sentences containing salient concepts
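The final selection step can be sketched as follows, assuming concept salience scores (for example from PageRank) and, for each sentence, the set of concepts it contains. Scoring a sentence by the summed salience of its distinct concepts is an assumption; the slides only say that sentences containing salient concepts are selected. Following the simple greedy strategy described in the evaluation, the loop takes the highest-scoring sentences, drops exact duplicates, and skips any sentence that would exceed the word budget.

```python
from typing import Dict, List, Set


def select_sentences(
    sentences: List[str],
    concepts_in: List[Set[str]],
    salience: Dict[str, float],
    word_budget: int,
) -> List[str]:
    """Greedily pick sentences with the highest total concept salience."""
    # Score each sentence by summing the salience of the distinct concepts it contains.
    order = sorted(
        range(len(sentences)),
        key=lambda i: sum(salience.get(c, 0.0) for c in concepts_in[i]),
        reverse=True,
    )
    summary: List[str] = []
    seen: Set[str] = set()
    used = 0
    for i in order:
        text = sentences[i]
        length = len(text.split())
        if text in seen or used + length > word_budget:
            continue  # drop exact duplicates and sentences that would overflow
        summary.append(text)
        seen.add(text)
        used += length
    return summary
```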
19
Evaluation
Task: automatic text summarization
Data: DUC 2001 multi-document summarization task
− 30 English document sets
− Summaries of 50, 100, 200 and 400 words
− Each set includes on average 10.3 documents, 602 sentences, 216 event terms and 148.5 named entities
Evaluation metric: ROUGE
− Automatic evaluation
− Based on n-gram co-occurrence between system summaries and human reference summaries
Method
− Focus on the efficiency and potential of the event-relevance-based approach
− No other features, such as sentence position, headline or publication date
− A simple greedy strategy extracts the most salient sentences
− Sentence redundancy is not handled; only identical sentences are removed
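To make "n-gram co-occurrence" concrete, here is a simplified ROUGE-N recall: clipped n-gram matches divided by the number of n-grams in the reference summaries. It is only an illustration; the official ROUGE toolkit adds options such as stemming and stopword removal, and ROUGE-W uses a weighted longest common subsequence, none of which is reproduced here.

```python
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate: str, references: List[str], n: int = 1) -> float:
    """Simplified ROUGE-N: clipped n-gram matches over total reference n-grams."""
    cand = ngrams(candidate.lower().split(), n)
    matched = total = 0
    for ref in references:
        ref_counts = ngrams(ref.lower().split(), n)
        # A reference n-gram matches at most as often as it appears in the candidate.
        matched += sum(min(count, cand[g]) for g, count in ref_counts.items())
        total += sum(ref_counts.values())
    return matched / total if total else 0.0


print(rouge_n_recall("gates urged an alliance", ["gates urged netscape alliance"]))  # → 0.75
```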
20
Evaluation (1)
Intra-event relevance: R(ET, NE) and R(NE, ET)
High-frequency nouns
− Some frequently occurring nouns, such as "hurricane" and "euro", are not marked by general NE taggers, but they indicate persons, organizations, locations or important objects.
− A noun is considered a high-frequency noun when its frequency is larger than 10.
− 5% improvement with high-frequency nouns

R(ET, NE) | NE w/ High-Frequency Nouns | NE w/o High-Frequency Nouns
ROUGE-1   | 0.33320                    | 0.34859
ROUGE-2   | 0.06260                    | 0.07157
ROUGE-W   | 0.12965                    | 0.13471
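Selecting high-frequency nouns can be sketched as below. The threshold of 10 comes from the slide; the Penn Treebank-style tags (nouns tagged with an "NN" prefix) and the case-insensitive comparison against the NE tagger's output are assumptions.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def high_frequency_nouns(
    tagged_tokens: Iterable[Tuple[str, str]],
    named_entities: Set[str],
    threshold: int = 10,
) -> List[str]:
    """Nouns occurring more than `threshold` times that the NE tagger did not mark."""
    counts = Counter(word.lower() for word, tag in tagged_tokens if tag.startswith("NN"))
    known = {e.lower() for e in named_entities}
    return sorted(w for w, c in counts.items() if c > threshold and w not in known)


tokens = [("hurricane", "NN")] * 11 + [("storm", "NN")] * 10 + [("Florida", "NNP")] * 12
print(high_frequency_nouns(tokens, {"Florida"}))  # → ['hurricane']
```

"storm" is excluded because 10 occurrences do not exceed the threshold, and "Florida" is already a named entity.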
21
Evaluation (2)
Inter-event relevance: R(ET, ET) and R(NE, NE)
R(ET, ET), two approaches:
− Semantic relevance from WordNet
− Topic-specific relevance from documents: the number of named entities shared by two event terms
Example event term pairs with the highest relevance:
− "abort"-"confirm" (semantic, antonymous)
− "vote"-"confirm" (associated, causal)
"Document" outperforms "WordNet" by about 4% on ROUGE-1.
Reason: WordNet may introduce relatedness that is unnecessary for topic-specific documents.

R(ET, ET) | Semantic relevance from WordNet | Topic-specific relevance from documents
ROUGE-1   | 0.32917                         | 0.34178
ROUGE-2   | 0.05737                         | 0.06852
ROUGE-W   | 0.11959                         | 0.13471
22
Evaluation (3)
Inter-event relevance: R(ET, ET) and R(NE, NE)
R(NE, NE)
Example named entity pairs with the highest relevance:
− "Louisiana"-"Florida" (something may happen in both places)
− "Florida"-"Andrew" (something may happen about Andrew in Florida)
The best ROUGE-1 and ROUGE-W results come from "Document", which derives NE relevance from the number of shared event terms; the window-based context is best on ROUGE-2.
Reason: the relevance derived from clustering and from neighborhoods can also be discovered by the document-based R(NE, NE).

R(NE, NE) | Relevance from Documents | Relevance from Clustering | Relevance from Window-based Context
ROUGE-1   | 0.35212                  | 0.33561                   | 0.34466
ROUGE-2   | 0.07107                  | 0.07286                   | 0.07508
ROUGE-W   | 0.13603                  | 0.13109                   | 0.13523
23
Evaluation (4)
Event relevance: integration of R(ET, NE), R(ET, ET) and R(NE, NE)
Different summary lengths: 50, 100, 200 and 400 words
Baseline: "Event-based Extractive Summarization" (Elena Filatova and Vasileios Hatzivassiloglou, 2004); DUC 2001, 200-word summaries, ROUGE-1 about 0.3
Clear improvement over the baseline: at 200 words, ROUGE-1 ranges from 0.34178 to 0.35212 vs. about 0.3
Event-based approaches favor longer summaries.

ROUGE-1                         | 50 words | 100 words | 200 words | 400 words
R(NE, NE)                       | 0.22383  | 0.28584   | 0.35212   | 0.41612
R(ET, NE)                       | 0.22224  | 0.27947   | 0.34859   | 0.41644
R(ET, ET)                       | 0.20616  | 0.26923   | 0.34178   | 0.41201
R(ET, NE) + R(ET, ET) + R(NE, NE) | 0.21311  | 0.27939   | 0.34630   | 0.41639
24
Conclusion & Future Work
Conclusion
− Extract events according to actions and associated named entities
− Use events to denote concepts
− Use inter- and intra-event relevance to rank events
− The event-based summarizer performs well on news documents
Future work
− Improve event representation to build a more powerful event-based summarization system
− Text compression techniques based on concepts
− Which features of a document set favor event-based approaches (beyond the news domain)
− Influence of IE performance (e.g., POS tagger, NE tagger)
25
Thank you very much. Do you have any questions?