2
NYU/CRL system for DUC and Prospect for Single Document Summaries
Satoshi Sekine (New York University)
Chikashi Nobata (CRL, Japan)
September 14, 2001, DUC2001 Workshop
3
Objective
Use IE technologies for summarization:
– Named Entity recognition
– Automatic pattern discovery: find important phrases (patterns) of the domain
Combine them with summarization technologies:
– Important sentence extraction (sentence position, length, TF/IDF, headline)
4
Important Sentence Extraction
Combine 5 scores:
– Sentence position
– Sentence length
– TF/IDF
– Similarity to headline
– Pattern
Optimize the scoring functions and their weights on training data
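The slide says the five scores are combined with weights tuned on training data, but does not give the form of the combination. The sketch below assumes a plain weighted sum; the feature names, `combined_score`, and `extract_sentences` are hypothetical, and returning the selected sentences in document order is a design choice, not something the slide specifies.

```python
from typing import Dict, List

# The five evidence sources listed on the slide (names are illustrative).
FEATURES = ("position", "length", "tfidf", "headline", "pattern")


def combined_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Assumed form: a linear combination of the per-sentence scores."""
    return sum(weights[name] * features[name] for name in FEATURES)


def extract_sentences(sentence_features: List[Dict[str, float]],
                      weights: Dict[str, float], k: int) -> List[int]:
    """Indices of the k highest-scoring sentences, returned in document order."""
    ranked = sorted(range(len(sentence_features)),
                    key=lambda i: combined_score(sentence_features[i], weights),
                    reverse=True)
    return sorted(ranked[:k])
```

Keeping the weights as a separate dictionary means they can be re-fitted on training data without touching the scoring code.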
5
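Alternative scores for Sentence position
(sentence i of an n-sentence document; T is a position threshold)
1. Score = max(1/i, 1/(n-i+1))
2. Score = 1/i
3. Score = 1 if i < T, 0 otherwise
[Plots: Score vs. Sentence position for each alternative]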
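The three position functions can be written directly from the formulas on the slide. This sketch assumes i is 1-based; the function names are hypothetical, and the slide does not give a value for the threshold T.

```python
def position_threshold(i: int, t: int) -> float:
    """1 for sentences before position T, 0 otherwise (i is 1-based)."""
    return 1.0 if i < t else 0.0


def position_inverse(i: int) -> float:
    """Decays with distance from the start of the document."""
    return 1.0 / i


def position_both_ends(i: int, n: int) -> float:
    """Rewards sentences near either the beginning or the end of an n-sentence document."""
    return max(1.0 / i, 1.0 / (n - i + 1))
```

For a 10-sentence document, `position_both_ends` gives 1.0 to both the first and the last sentence, while `position_inverse` favours only the opening.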
6
Alternative scores for Sentence length & TF/IDF
Sentence length (L = sentence length, C = a length cutoff):
1. Score = L
2. Score = L if L > C; L - C otherwise
TF/IDF (three alternative TF functions):
TF = tf(w), (tf(w)-1)/tf(w), or tf(w)/(tf(w)+1)
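A sketch of the length and TF/IDF alternatives. The slide lists only the TF variants, so the IDF term here, log(N / df(w)), is an assumption, as is summing over the distinct words of the sentence; all function names are hypothetical.

```python
import math
from typing import Dict, List


def length_score_raw(length: int) -> float:
    """Alternative 1: the sentence length itself."""
    return float(length)


def length_score_cutoff(length: int, c: int) -> float:
    """Alternative 2: length above the cutoff C, otherwise the non-positive value length - C."""
    return float(length) if length > c else float(length - c)


# The three TF variants from the slide; tf is the word's frequency in the document.
TF_VARIANTS = {
    "raw": lambda tf: float(tf),
    "minus_one": lambda tf: (tf - 1) / tf,
    "saturating": lambda tf: tf / (tf + 1),
}


def tfidf_score(sentence: List[str], doc_tf: Dict[str, int],
                df: Dict[str, int], n_docs: int, variant: str = "saturating") -> float:
    """Sum of TF * IDF over the distinct words of a sentence.

    IDF = log(n_docs / df) is an assumed definition, not taken from the slide.
    """
    tf_fn = TF_VARIANTS[variant]
    score = 0.0
    for w in set(sentence):
        tf = doc_tf.get(w, 0)
        if tf == 0:
            continue  # word not counted in the document; skip to avoid division by zero
        idf = math.log(n_docs / df.get(w, 1))
        score += tf_fn(tf) * idf
    return score
```

The saturating variant tf/(tf+1) stays below 1, so a single very frequent word cannot dominate the sentence score the way it can under raw tf.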
7
Alternative scores for Headline
1. TF/IDF ratio between the sentence words that also appear in the headline and all words in the sentence
2. TF ratio between the overlapping Named Entities (NEs) and all NEs in the sentence, with TF = tf(e)/(1+tf(e))
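Both headline scores are ratios of "overlapping mass" to "total mass" within a sentence. In this sketch the per-word TF/IDF weight is passed in as a callable (`weight`, hypothetical), and NE strings are assumed to come from an NE tagger that is not shown; the function names are also hypothetical.

```python
from typing import Callable, Dict, Iterable


def headline_word_score(sentence: Iterable[str], headline: Iterable[str],
                        weight: Callable[[str], float]) -> float:
    """TF/IDF mass of sentence words also in the headline / TF/IDF mass of all sentence words."""
    words, head = set(sentence), set(headline)
    total = sum(weight(w) for w in words)
    if total == 0:
        return 0.0
    return sum(weight(w) for w in words if w in head) / total


def headline_ne_score(sentence_nes: Iterable[str], headline_nes: Iterable[str],
                      ne_tf: Dict[str, int]) -> float:
    """The same ratio over named entities, with TF = tf(e) / (1 + tf(e))."""
    def tf(e: str) -> float:
        count = ne_tf.get(e, 0)
        return count / (1 + count)

    entities, head = set(sentence_nes), set(headline_nes)
    total = sum(tf(e) for e in entities)
    if total == 0:
        return 0.0
    return sum(tf(e) for e in entities if e in head) / total
```

For example, a sentence mentioning "Los Angeles" (tf 3) and "U.S. Geological Survey" (tf 1) under a headline containing only "Los Angeles" scores 0.75 / (0.75 + 0.5) = 0.6 on the NE measure.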
8
Pattern
Assumption: patterns (phrases) that appear often in the domain are important
Strategy
– We intended to use IR to find a larger set of documents in the domain, but used the given document set instead
– NEs were treated as a class rather than as literal strings
9
Pattern discovery
Procedure
– Analyze sentences (NE tagging, dependency parsing)
– Extract all sub-trees from the dependency trees in the domain
– Score each tree based on its frequency and the TF/IDF of its words
– Regard high-scoring trees as important patterns
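A minimal sketch of the sub-tree step, assuming each sentence has already been NE-tagged and parsed into (word, NE class) nodes plus head indices (the tagger and parser are not shown). NEs are generalized to their class, as the slides describe. Enumerating every sub-tree grows exponentially, so this sketch caps sub-tree size with a hypothetical `max_size`. The slide does not give the scoring formula; `rank_patterns` uses a hypothetical one (frequency times summed IDF of the pattern's tokens). All names are illustrative.

```python
from collections import Counter
from itertools import product
from typing import Dict, List, Optional, Tuple

# A node is (word, NE class or None); heads[i] is the parent of node i, -1 for the root.
Node = Tuple[str, Optional[str]]


def node_label(node: Node) -> str:
    """Named entities are generalized to their class, e.g. <LOCATION>."""
    word, ne_class = node
    return f"<{ne_class}>" if ne_class else word


def rooted_subtrees(i: int, children: Dict[int, List[int]], nodes: List[Node],
                    max_size: int) -> List[Tuple[str, int]]:
    """Connected sub-trees rooted at node i with at most max_size nodes, as
    (canonical string, size) pairs; children are sorted, so sibling order is ignored."""
    per_child = [[None] + rooted_subtrees(c, children, nodes, max_size)
                 for c in children[i]]
    results = []
    for combo in product(*per_child):  # each child is either left out or contributes one sub-tree
        parts = [p for p in combo if p is not None]
        size = 1 + sum(s for _, s in parts)
        if size > max_size:
            continue
        label = node_label(nodes[i])
        if parts:
            label = "(" + label + " " + " ".join(sorted(t for t, _ in parts)) + ")"
        results.append((label, size))
    return results


def count_patterns(trees: List[Tuple[List[Node], List[int]]], max_size: int = 3) -> Counter:
    """Number of sentences in which each sub-tree occurs."""
    freq: Counter = Counter()
    for nodes, heads in trees:
        children: Dict[int, List[int]] = {i: [] for i in range(len(nodes))}
        for i, h in enumerate(heads):
            if h >= 0:
                children[h].append(i)
        found = set()
        for i in range(len(nodes)):
            found.update(t for t, _ in rooted_subtrees(i, children, nodes, max_size))
        freq.update(found)
    return freq


def rank_patterns(freq: Counter, idf: Dict[str, float],
                  default_idf: float = 1.0) -> List[Tuple[str, float]]:
    """Hypothetical score: frequency times the summed IDF of the pattern's tokens."""
    scored = []
    for pattern, f in freq.items():
        tokens = pattern.replace("(", " ").replace(")", " ").split()
        scored.append((pattern, f * sum(idf.get(t, default_idf) for t in tokens)))
    return sorted(scored, key=lambda p: p[1], reverse=True)
```

Because "Tokyo" and "Lima" both become `<LOCATION>`, the sentences "earthquake struck Tokyo" and "earthquake struck Lima" share the pattern `(struck <LOCATION> earthquake)`, which is what lets frequency accumulate across documents in the domain.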
10
Optimal weight
Optimal weights were found on the training set.
Contribution of each score:
Score | weight × std. dev.
Position | 277
Length | 8
TF/IDF | 96
Headline | 18
Pattern | 2
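Multiplying a weight by the standard deviation of its feature makes contributions comparable across features measured on different scales. The sketch below computes that product and each score's share of the total using the slide's numbers; using the population standard deviation (`statistics.pstdev`) is an assumption, and the function names are hypothetical.

```python
import statistics
from typing import Dict, List


def contribution(weight: float, scores: List[float]) -> float:
    """weight x standard deviation of the feature's scores on the training data."""
    return weight * statistics.pstdev(scores)


def contribution_shares(contrib: Dict[str, float]) -> Dict[str, float]:
    """Fraction of the summed contribution attributable to each score."""
    total = sum(contrib.values())
    return {name: value / total for name, value in contrib.items()}


# Contribution values as reported on the slide.
SLIDE_CONTRIBUTIONS = {"Position": 277, "Length": 8, "TF/IDF": 96, "Headline": 18, "Pattern": 2}
shares = contribution_shares(SLIDE_CONTRIBUTIONS)
```

The contributions sum to 401, so position accounts for about 69% (277/401) and TF/IDF for about 24%, while the pattern score contributes about 0.5%.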
11
Evaluation Result
Subjective evaluation (V; out of 12), averaged over all documents
Criterion | System | Lead | Average
Grammaticality | 3.711 (5) | 3.236 | 3.580
Cohesion | 3.054 (1) | 2.926 | 2.676
Organization | 3.215 (1) | 3.081 | 2.870
Total | 9.980 (1) | 9.243 | 9.126
12
Prospect for Single Document Summaries
Important sentence extraction CAN be summarization, but summarization is NOT important sentence extraction.
13
DUC
We are aiming for Document Understanding.
How can understanding be instantiated?
– Make a summary
– Extract essential points and principal relations
– Answer questions
– Comprehension test
14
Example Earthquake jolts Los Angeles area LOS ANGELES (AP) — An earthquake shook the greater Los Angeles area Sunday, but there were no immediate reports of damage or injuries. The quake had a preliminary magnitude of 4.2 and was centered about one mile southeast of West Hollywood, said Lucy Jones of the U.S. Geological Survey. The quake was felt in downtown Los Angeles where it rolled for about four seconds and also shook in the suburban areas of Van Nuys, Whittier and Glendale.
15
Essential points
Event (Earthquake)
– When: Sunday, September 9, 2001
– Where: greater Los Angeles area
– Magnitude: 4.2
– Injury: No
– Death: No
– Damage: No
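The essential points form an IE-style template: fixed slots for one event type. A sketch of that template as a data structure, filled with the values from the example article; the class name, slot types, and the `summary` wording are hypothetical illustrations, not the authors' system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EarthquakeEvent:
    """Slots of the essential-point template (hypothetical representation)."""
    when: str
    where: str
    magnitude: Optional[float]
    injury: bool
    death: bool
    damage: bool

    def summary(self) -> str:
        """Render the filled template as one sentence (hypothetical wording)."""
        head = f"Magnitude {self.magnitude} earthquake" if self.magnitude is not None else "Earthquake"
        text = f"{head} in the {self.where} on {self.when}"
        absent = [name for name, present in
                  (("injuries", self.injury), ("deaths", self.death), ("damage", self.damage))
                  if not present]
        if absent:
            listed = absent[0] if len(absent) == 1 else ", ".join(absent[:-1]) + " or " + absent[-1]
            text += f"; no {listed} reported"
        return text + "."


event = EarthquakeEvent(when="Sunday, September 9, 2001", where="greater Los Angeles area",
                        magnitude=4.2, injury=False, death=False, damage=False)
```

Here `event.summary()` yields "Magnitude 4.2 earthquake in the greater Los Angeles area on Sunday, September 9, 2001; no injuries, deaths or damage reported.", a summary generated from understood content rather than extracted sentences.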
16
How can we make it?
IE is a hint (a step):
– IE is a version of document understanding limited to a specific domain and task that are given in advance
– Document understanding can be achieved by upgrading IE technologies, i.e., by removing “specific” and “given in advance”
17
Our approach
Essential points can be found by searching for frequently mentioned patterns in the same domain.
Strategy
– Given a document, find its domain by IR
– Find frequently mentioned patterns
– Extract information matching those patterns
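The first step of the strategy, finding a document's domain by IR, can be sketched as nearest-neighbour retrieval. The slides do not name an IR model; cosine similarity over raw term counts is a swapped-in stand-in, and `find_domain_documents` and its parameter `k` are hypothetical. The retrieved documents would then serve as the domain set for pattern discovery.

```python
import math
from collections import Counter
from typing import List


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def find_domain_documents(document: List[str], collection: List[List[str]], k: int) -> List[int]:
    """Indices of the k collection documents most similar to the given document."""
    query = Counter(document)
    sims = [cosine(query, Counter(doc)) for doc in collection]
    return sorted(range(len(collection)), key=lambda i: sims[i], reverse=True)[:k]
```

A real system would more likely weight terms by TF/IDF before comparing; raw counts keep the sketch short.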
18
Single Document Summarization
Has to be continued:
– To pursue research on “understanding”
– To find something beyond sentence extraction
– To observe humans performing the summary task
– To bring in newcomers (like us)