Headline Generation Based on Statistical Translation. Michele Banko (Computer Science Department, Johns Hopkins University), Vibhu O. Mittal (Just Research). Presenter.


1 Headline Generation Based on Statistical Translation
Michele Banko (Computer Science Department, Johns Hopkins University)
Vibhu O. Mittal (Just Research)
Michael J. Witbrock (Lycos Inc.)
ACL 2000
Presenter: 翁鴻加

2 Abstract
• Extractive approaches cannot generate document summaries shorter than one sentence
• Non-extractive approach: statistical models of term selection
• Actual headlines are often ungrammatical, incomplete phrases

3 Introduction
• Generating effective summaries requires the ability to select, evaluate, order, and aggregate items of information according to their relevance to the subject
• Previous work has focused on extractive summarization, which has three drawbacks:
  1. Inability to generate coherent summaries shorter than the smallest text span considered (typically a sentence)
  2. The most important information is often scattered across multiple sentences
  3. A tendency to select long sentences

4 The System
• Content selection
• Summary generation
  1. Summary length: fixed, based on document genre
  2. A coherently ordered summary is produced from the selected content

5 The System
• Assumption: the likelihood of a word appearing in the summary is independent of the other words in the summary (an initial modeling choice)
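Under this independence assumption, content selection reduces to a per-word estimate of how likely a document word is to appear in the headline. The sketch below is a minimal illustration, not the authors' implementation: the function name `estimate_selection_probs`, the relative-frequency estimator, and the toy data are all assumptions made for the example.

```python
from collections import Counter

def estimate_selection_probs(pairs):
    """Estimate P(w in headline | w in document) by relative frequency.

    `pairs` is an iterable of (document_tokens, headline_tokens).
    Hypothetical helper for illustration only.
    """
    in_doc = Counter()   # number of documents containing w
    in_both = Counter()  # number of those whose headline also contains w
    for doc_tokens, headline_tokens in pairs:
        headline_vocab = set(headline_tokens)
        for w in set(doc_tokens):
            in_doc[w] += 1
            if w in headline_vocab:
                in_both[w] += 1
    return {w: in_both[w] / in_doc[w] for w in in_doc}

# Toy training pairs (invented for the example).
pairs = [
    (["clinton", "met", "advisers", "peace"], ["clinton", "peace"]),
    (["clinton", "visits", "israel"], ["israel", "visit"]),
]
probs = estimate_selection_probs(pairs)
```

Because each word is scored on its own, the table of estimates can be built in a single pass over the training pairs.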

6 The System
• A bigram language model is used instead of a higher-order n-gram model
• Model: zero-level; cross-validation is used to learn the weights
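One way to read this slide is that a candidate headline is scored by a weighted sum of log-probabilities: word selection, headline length, and bigram ordering. The sketch below assumes that form. The add-k smoothing, the function names, and the fixed weights `alpha`, `beta`, `gamma` are illustrative assumptions; in the system described, the weights are learned by cross-validation.

```python
import math
from collections import Counter

def train_bigram(headlines, vocab_size, k=1.0):
    """Return an add-k smoothed bigram log-probability function (assumed smoothing)."""
    unigrams = Counter()
    bigrams = Counter()
    for h in headlines:
        tokens = ["<s>"] + list(h)
        for prev, cur in zip(tokens, tokens[1:]):
            unigrams[prev] += 1
            bigrams[(prev, cur)] += 1

    def logp(prev, cur):
        return math.log((bigrams[(prev, cur)] + k) / (unigrams[prev] + k * vocab_size))

    return logp

def score(candidate, select_prob, length_prob, bigram_logp,
          alpha=1.0, beta=1.0, gamma=1.0):
    """Weighted sum of selection, length, and ordering log-probabilities."""
    tokens = ["<s>"] + list(candidate)
    selection = sum(math.log(select_prob[w]) for w in candidate)
    length = math.log(length_prob[len(candidate)])
    ordering = sum(bigram_logp(p, c) for p, c in zip(tokens, tokens[1:]))
    return alpha * selection + beta * length + gamma * ordering

# Toy data (invented): two training headlines over a six-word vocabulary.
headlines = [["clinton", "to", "meet", "arafat"],
             ["clinton", "to", "visit", "israel"]]
bigram_logp = train_bigram(headlines, vocab_size=6)
select_prob = {w: 0.5 for h in headlines for w in h}
length_prob = {4: 1.0}

good = score(["clinton", "to", "meet", "arafat"], select_prob, length_prob, bigram_logp)
bad = score(["arafat", "meet", "to", "clinton"], select_prob, length_prob, bigram_logp)
```

Both candidates contain the same words, so only the bigram term separates them; the ordering seen in training scores higher.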

7 Experiments
Generated headlines by length:
  Length  Generated headline
  3       Clinton netanyahu arafat
  4       Clinton to mideast peace
  5       Clinton to meet netanyahu arafat
  6       Clinton to meet Netanyahu Arafat Israel
Source article: U.S. Pushes for Mideast Peace
President Clinton met with his top Mideast advisers, including Secretary of State Madeleine Albright and U.S. peace envoy Dennis Ross, in preparation for a session with Israel Prime Minister Benjamin Netanyahu tomorrow. Palestinian leader Yasser Arafat is to meet with Clinton later this week. Published reports in Israel say Netanyahu will warn Clinton that Israel can’t withdraw from more than nine percent of the West Bank in its next scheduled pullback, although Clinton wants a 12-15 percent pullback.

8 Experiment
• Corpus: 25,000 news articles from Reuters dated between 1/1/1997 and 1/6/1997
• Punctuation stripped, except apostrophes
• 44,000 unique tokens in the articles
• 15,000 tokens in the headlines
• Modeling all pairwise conditional probabilities added complexity, so a limited vocabulary was used
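The preprocessing step can be sketched as follows. Only "strip punctuation except apostrophes" comes from the slide; lowercasing, whitespace tokenization, and replacing punctuation with a space (so "12-15" becomes two tokens) are assumptions made for the example, and `tokenize` is a hypothetical name.

```python
import re

def tokenize(text):
    """Lowercase, replace punctuation other than apostrophes with spaces, and split."""
    cleaned = re.sub(r"[^\w\s']", " ", text.lower())
    return cleaned.split()

tokens = tokenize("Israel can't withdraw, Clinton says.")
```

Keeping apostrophes preserves contractions and possessives such as "can't" as single tokens.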

9 Experiments
• Lack of sufficient training data
• Lexical model
Results on 1000 unseen documents:
  Gen. headline length   Word overlap   Percentage of complete matches
  4                      0.2140         19.71%
  5                      0.2027         14.10%
  6                      0.2080         12.14%
  7                      0.1754         08.70%
  8                      0.1244         11.90%
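The slide does not define the two metrics, so the sketch below works from an assumed reading: word overlap is the fraction of generated terms that also occur in the actual headline, and a complete match is a generated headline whose terms all occur in the actual headline. Both function names are hypothetical.

```python
def word_overlap(generated, reference):
    """Fraction of generated terms that also appear in the reference headline."""
    if not generated:
        return 0.0
    ref = set(reference)
    return sum(1 for w in generated if w in ref) / len(generated)

def is_complete_match(generated, reference):
    """True when every generated term appears in the reference headline."""
    return set(generated) <= set(reference)

generated = ["clinton", "to", "meet", "arafat"]
reference = ["clinton", "meets", "arafat"]
overlap = word_overlap(generated, reference)
complete = is_complete_match(generated, reference)
```

Under this reading, exact string matching penalizes near-misses such as "meet" versus "meets", which affects how the overlap figures should be interpreted.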

10 Multiple Selection Models: POS and Position
• Part-of-speech information: learn which word-senses are more likely to be part of a headline, and order them coherently
• Position information: estimate the probability of a token appearing in the headline given that it appeared in the 1st, 2nd, 3rd, or 4th quartile of the body of the article
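The position model can be sketched as a relative-frequency estimate per quartile. This is a simplified illustration under stated assumptions: counts are pooled over all tokens instead of being kept per word, quartiles are computed from 0-based token indices, and the names `quartile` and `estimate_position_probs` are hypothetical.

```python
from collections import Counter

def quartile(index, length):
    """Map a 0-based token index to a quartile (1 to 4) of a document of `length` tokens."""
    return 4 * index // length + 1

def estimate_position_probs(pairs):
    """Estimate P(token in headline | token appeared in quartile q)."""
    seen = Counter()  # tokens observed in each quartile
    kept = Counter()  # of those, tokens that also appear in the headline
    for doc, headline in pairs:
        head = set(headline)
        for i, w in enumerate(doc):
            q = quartile(i, len(doc))
            seen[q] += 1
            if w in head:
                kept[q] += 1
    return {q: kept[q] / seen[q] for q in sorted(seen)}

# Toy pair (invented): an 8-token body and its headline.
doc = ["clinton", "met", "his", "advisers", "in", "preparation", "for", "session"]
position_probs = estimate_position_probs([(doc, ["clinton", "advisers"])])
```

In this toy pair the headline words fall in the first two quartiles, so the later quartiles receive probability zero; a real estimate would need far more data.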

11 Experiments
Overlap with headline:
  L   Lex       +position   +POS      +position+POS
  1   0.37414   0.39888     0.30522   0.40538
  2   0.24818   0.26923     0.27246   0.27838
  3   0.21831   0.24612     0.20388   0.25048
  4   0.21404   0.24011     0.18721   0.25741
  5   0.20272   0.21685     0.18447   0.21947
  6   0.20804   0.19886     0.17593   0.21168

12 Some "equally good" generated headlines count as errors
  Original term                    Generated term
  Nations Top Judge                Rehnquist
  Kaczynski                        Unabomber Suspect
  ER                               Top-Rated Hospital Drama
  Wall Street Stocks Decline       Dow Jones index lower
  49ers Roll Over Vikings 38-22    49ers to nfc title game
  Corn, Wheat prices Fall          Soybean grain prices lower

13 Conclusion and Future Work
• This paper has presented an approach that makes it possible to generate coherent summaries shorter than a single sentence
• With a slight generalization of the system, the summaries need not contain any of the words in the original document
• Given good corpora, this approach could be applied across languages, for example Japanese documents with English headlines

