
1 Automatic Summarization
Heng Ji

2 Table of contents
1. Motivation
2. Overview of summarization systems
3. Summarization techniques
4. Building a summarization system
5. Evaluating summaries
6. The future

3 MILAN, Italy, April 18. A small airplane crashed into a government
building in heart of Milan, setting the top floors on fire, Italian police reported. There were no immediate reports on casualties as rescue workers attempted to clear the area in the city's financial district. Few details of the crash were available, but news reports about it immediately set off fears that it might be a terrorist act akin to the Sept. 11 attacks in the United States. Those fears sent U.S. stocks tumbling to session lows in late morning trading. Witnesses reported hearing a loud explosion from the 30-story office building, which houses the administrative offices of the local Lombardy region and sits next to the city's central train station. Italian state television said the crash put a hole in the 25th floor of the Pirelli building. News reports said smoke poured from the opening. Police and ambulances rushed to the building in downtown Milan. No further details were immediately available. (Radev, 2004)

4 MILAN, Italy, April 18. [Same article text as the previous slide.]

5 What happened? How many victims? When, where? Says who?
[Article text repeated from slide 3, annotated with questions:] What happened? How many victims? When, where? Says who? Was it a terrorist act? What was the target?

6 1. How many people were injured?
2. How many people were killed? (age, number, gender, description)
3. Was the pilot killed?
4. Where was the plane coming from?
5. Was it an accident (technical problem, illness, terrorist act)?
6. Who was the pilot? (age, number, gender, description)
7. When did the plane crash?
8. How tall is the Pirelli building?
9. Who was on the plane with the pilot?
10. Did the plane catch fire before hitting the building?
11. What was the weather like at the time of the crash?
12. When was the building built?
13. What direction was the plane flying?
14. How many people work in the building?
15. How many people were in the building at the time of the crash?
16. How many people were taken to the hospital?
17. What kind of aircraft was used?

7 Abstracts of papers — time saving

8 Table of contents
1. Motivation
2. Overview of summarization systems
3. Summarization techniques
4. Building a summarization system
5. Evaluating summaries
6. The future

9 Definitions
Summary definition (Sparck Jones, 1999):
“a reductive transformation of source text to summary text through content condensation by selection and/or generalization on what is important in the source.”

10 Schematic summary processing model
Source text → (Interpretation) → Source representation → (Transformation) → Summary representation → (Generation) → Summary text

11 Summarizing factors (Sparck Jones 2007)
Input
- subject type: domain
- genre: newspaper articles, editorials, letters, reports...
- form: regular text structure; free-form
- source size: single doc; multiple docs (few; many)
Purpose
- situation: embedded in a larger system (MT, IR) or not?
- audience: focused or general
- usage: IR, sorting, skimming...
Output
- completeness: include all aspects, or focus on some?
- format: paragraph, table, etc.
- style: informative, indicative, aggregative, critical...

12 Examples
Exercise: summarize the following texts for the following readers:
text 1: Coup Attempt
text 2: children's story
reader 1: your friend, who knows nothing about South Africa.
reader 2: someone who lives in South Africa and knows the political situation.
reader 3: your 4-year-old niece.
reader 4: the Library of Congress.
These two examples show that a summary cannot be context-free. The first is a news story about a coup attempt in South Africa: how would you summarize it for Reader 1, who knows nothing about South Africa, versus Reader 2, who lives in South Africa and knows the background? The second is a children's story: how would you summarize it for a 4-year-old girl versus the Library of Congress? The summaries should be different. You can try.

13 90 Soldiers Arrested After Coup Attempt In Tribal Homeland
MMABATHO, South Africa (AP) About 90 soldiers have been arrested and face possible death sentences stemming from a coup attempt in Bophuthatswana, leaders of the tribal homeland said Friday. Rebel soldiers staged the takeover bid Wednesday, detaining homeland President Lucas Mangope and several top Cabinet officials for 15 hours before South African soldiers and police rushed to the homeland, rescuing the leaders and restoring them to power. At least three soldiers and two civilians died in the uprising. Bophuthatswana's Minister of Justice G. Godfrey Mothibe told a news conference that those arrested have been charged with high treason and if convicted could be sentenced to death. He said the accused were to appear in court Monday. All those arrested in the coup attempt have been described as young troops, the most senior being a warrant officer. During the coup rebel soldiers installed as head of state Rocky Malebane-Metsing, leader of the opposition Progressive Peoples Party. Malebane-Metsing escaped capture and his whereabouts remained unknown, officials said. Several unsubstantiated reports said he fled to nearby Botswana. Warrant Officer M.T.F. Phiri, described by Mangope as one of the coup leaders, was arrested Friday in Mmabatho, capital of the nominally independent homeland, officials said. Bophuthatswana, which has a population of 1.7 million spread over seven separate land blocks, is one of 10 tribal homelands in South Africa. About half of South Africa's 26 million blacks live in the homelands, none of which are recognized internationally. Hennie Riekert, the homeland's defense minister, said South African troops were to remain in Bophuthatswana but will not become a ``permanent presence.'' Bophuthatswana's Foreign Minister Solomon Rathebe defended South Africa's intervention. ``The fact that ... the South African government (was invited) to assist in this drama is not anything new nor peculiar to Bophuthatswana,'' Rathebe said. ``But why South Africa, one might ask? Because she is the only country with whom Bophuthatswana enjoys diplomatic relations and has formal agreements.'' Mangope described the mutual defense treaty between the homeland and South Africa as ``similar to the NATO agreement,'' referring to the Atlantic military alliance. He did not elaborate. Asked about the causes of the coup, Mangope said, ``We granted people freedom perhaps ... to the extent of planning a thing like this.'' The uprising began around 2 a.m. Wednesday when rebel soldiers took Mangope and his top ministers from their homes to the national sports stadium. On Wednesday evening, South African soldiers and police stormed the stadium, rescuing Mangope and his Cabinet. South African President P.W. Botha and three of his Cabinet ministers flew to Mmabatho late Wednesday and met with Mangope, the homeland's only president since it was declared independent in 1977. The South African government has said, without producing evidence, that the outlawed African National Congress may be linked to the coup. The ANC, based in Lusaka, Zambia, dismissed the claims and said South Africa's actions showed that it maintains tight control over the homeland governments. The group seeks to topple the Pretoria government. The African National Congress and other anti-government organizations consider the homelands part of an apartheid system designed to fragment the black majority and deny them political rights in South Africa. 13

14 If You Give a Mouse a Cookie Laura Joffe Numeroff © 1985
If you give a mouse a cookie,he’s going to ask for a glass of milk. When you give him the milk, he’ll probably ask you for a straw. When he’s finished, he’ll ask for a napkin. Then he’ll want to look in the mirror to make sure he doesn’t have a milk mustache. When he looks into the mirror, he might notice his hair needs a trim. So he’ll probably ask for a pair of nail scissors. When he’s finished giving himself a trim, he’ll want a broom to sweep up. He’ll start sweeping. He might get carried away and sweep every room in the house. He may even end up washing the floors as well. When he’s done, he’ll probably want to take a nap. You’ll have to fix up a little box for him with a blanket and a pillow. He’ll crawl in, make himself comfortable, and fluff the pillow a few times. He’ll probably ask you to read him a story. When you read to him from one of your picture books, he'll ask to see the pictures. When he looks at the pictures, he’ll get so excited that he’ll want to draw one of his own. He’ll ask for paper and crayons. He’ll draw a picture. When the picture is finished, he’ll want to sign his name, with a pen. Then he’ll want to hang his picture on your refrigerator. Which means he’ll need Scotch tape. He’ll hang up his drawing and stand back to look at it. Looking at the refrigerator will remind him that he’s thirsty. So…he’ll ask for a glass of milk. And chances are that if he asks for a glass of milk, he’s going to want a cookie to go with it. Library of Congress summary: Relating the cycle of requests a mouse is likely to make after you give him a cookie takes the reader through a young boy’s day. 14

15 ‘Genres’ of Summary?
Indicative vs. informative: used for quick categorization vs. content processing.
Extract vs. abstract: lists fragments of text vs. re-phrases content coherently.
Generic vs. query-oriented: provides the author's view vs. reflects the user's interest.
Background vs. just-the-news: assumes the reader's prior knowledge is poor vs. up-to-date.
Single-document vs. multi-document source: based on one text vs. fuses together many texts.

16 Statistical scoring
Scoring techniques:
- Word frequencies throughout the text (Luhn, 1958)
- Position in the text (Edmundson, 1969)
- Title method (Edmundson, 1969)
- Cue phrases in sentences (Edmundson, 1969)
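To make the frequency idea concrete, here is a minimal Python sketch of Luhn-style scoring: every sentence is scored by the document-level frequencies of its content words, and the highest-scoring sentences form the extract. The tokenizer, the tiny stopword list, and the length normalization are illustrative assumptions, not part of Luhn's original method.

```python
from collections import Counter
import re

STOPWORDS = {"the", "a", "an", "of", "in", "to", "and", "is", "was", "for", "on"}

def score_sentences(text):
    """Score each sentence by the summed frequency of its content words (Luhn-style)."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]
    freq = Counter(words)
    scored = []
    for sent in sentences:
        content = [w for w in re.findall(r"[a-z']+", sent.lower()) if w not in STOPWORDS]
        score = sum(freq[w] for w in content) / (len(content) or 1)  # normalize by length
        scored.append((score, sent))
    return sorted(scored, reverse=True)

# Take the top-k sentences as an extract:
# summary = [sent for _, sent in score_sentences(document)[:3]]
```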

17 Topic Signature Method (Hovy and Lin, 1998)
Claim: script identification can be approximated at the lexical level, using automatically acquired 'word families'.
Idea: create topic signatures, where each concept is defined by the frequency distribution of its related words (concepts):
TS = {topic, signature} = {topic, (t1, w1), (t2, w2), ...}
e.g., restaurant-visit → waiter + menu + food + eat ...
(the inverse of query expansion in IR)
Hovy and Lin proposed the concept of topic signatures, which can play a central role in summarization and IR. They claim that script identification can be approximated at the lexical level using automatically acquired 'word families'. For example, visiting a restaurant involves the concepts of waiter, menu, food, pay, etc. They therefore create topic signatures in which each concept is defined by the frequency distribution of its related words: the topic is the target concept and the signature is a vector of related terms, where each term ti is highly correlated with the topic, with association weight wi.

18 Signature term extraction
Likelihood ratio (Dunning, 1993), a hypothesis-testing method: to acquire signature terms, the likelihood ratio test compares two hypotheses.
H1: relevance to the topic is independent of the term ti.
H2: the presence of ti indicates stronger relevance.
From the term and document counts we can then build a contingency table and compute the ratio.
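A minimal sketch of the Dunning (1993) log-likelihood ratio for one candidate term, computed from its 2x2 contingency table of counts in topic-relevant versus background documents. The function name, the count layout, and the cut-off value mentioned in the comment are illustrative assumptions.

```python
import math

def loglikelihood_ratio(k11, k12, k21, k22):
    """-2 log lambda for a 2x2 contingency table:
    k11 = occurrences of the term in topic-relevant text,
    k12 = occurrences of the term in background text,
    k21 = other tokens in topic-relevant text,
    k22 = other tokens in background text."""
    def ll(k, n, p):
        # log-likelihood of k successes in n trials under success probability p
        p = min(max(p, 1e-12), 1 - 1e-12)
        return k * math.log(p) + (n - k) * math.log(1 - p)

    n1, n2 = k11 + k21, k12 + k22
    p1, p2 = k11 / n1, k12 / n2          # separate probabilities (H2)
    p = (k11 + k12) / (n1 + n2)          # shared probability (H1)
    return 2 * (ll(k11, n1, p1) + ll(k12, n2, p2) - ll(k11, n1, p) - ll(k12, n2, p))

# Terms whose statistic exceeds a cut-off (e.g. 10.83, the chi-square critical
# value at p = 0.001 with one degree of freedom) are kept as signature terms.
```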

19 Example Signatures
Here are examples of signature terms. For the banking topic, its signature words include ....
In Lin's work, the score of a sentence is simply the sum of the scores of all content-bearing terms in the sentence. This method beats the baseline position method and the tf*idf method.

20 Graph-based Methods
Degree centrality, LexRank, Continuous LexRank

21 An example graph (image courtesy: Wikipedia)

22 Text as a graph
Sentences in the text are modelled as vertices of the graph. Two vertices are connected if there exists a similarity relation between them; in TextRank this is typically the normalized word overlap Similarity(Si, Sj) = |{w : w ∈ Si and w ∈ Sj}| / (log|Si| + log|Sj|). After the ranking algorithm is run on the graph, sentences are sorted in reverse order of their score, and the top-ranked sentences are selected (a sketch follows below).
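A minimal sketch of graph-based sentence ranking in the TextRank style: sentences are vertices, edges are weighted by the normalized word-overlap similarity above, and scores are computed by power iteration with the usual damping factor of 0.85. The tokenization and iteration count are illustrative choices.

```python
import math
import re

def similarity(s1, s2):
    """TextRank-style normalized word overlap between two sentences."""
    w1, w2 = set(re.findall(r"\w+", s1.lower())), set(re.findall(r"\w+", s2.lower()))
    if len(w1) < 2 or len(w2) < 2:
        return 0.0
    return len(w1 & w2) / (math.log(len(w1)) + math.log(len(w2)))

def textrank(sentences, d=0.85, iterations=30):
    n = len(sentences)
    # weighted, undirected similarity graph over sentences
    W = [[similarity(a, b) if i != j else 0.0 for j, b in enumerate(sentences)]
         for i, a in enumerate(sentences)]
    out_sum = [sum(row) or 1.0 for row in W]
    scores = [1.0] * n
    for _ in range(iterations):  # power iteration
        scores = [(1 - d) + d * sum(W[j][i] / out_sum[j] * scores[j] for j in range(n))
                  for i in range(n)]
    # highest-scoring sentences first
    return sorted(zip(scores, sentences), reverse=True)
```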

23 Why does TextRank work?
Through the graph, TextRank identifies connections between various entities and implements the concept of recommendation. A text unit recommends other related text units, and the strength of the recommendation is recursively computed based on the importance of the units making the recommendation. Sentences that are highly recommended by other sentences in the text are likely to be more informative.

24 Abstractive Summarization
Problems? Out-of-vocabulary (OOV) words.

25 Pointer-Generator Network
How to deal with OOV words? Learn a generation probability
p_gen = σ(w_c · c_i + w_s · s_i + w_x · x_i + b)
p_gen is used as a soft switch to choose between generating a word from the vocabulary by sampling from P_vocab, or copying a word from the input sequence by sampling from the attention distribution a_i. Sample from the extended-vocabulary distribution:
P(w) = p_gen · P_vocab(w) + (1 − p_gen) · Σ_{j: w_j = w} a_i,j
(The second term raises the probability of generating words that appear in the input sentence.)
Vinyals et al., 2015
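A hedged sketch (in PyTorch) of how the final extended-vocabulary distribution can be assembled from p_gen, the vocabulary distribution, and the attention weights. The tensor names, shapes, and the helper function itself are illustrative assumptions, not taken from a particular implementation.

```python
import torch

def final_distribution(p_vocab, attention, src_ids_ext, p_gen, extended_vocab_size):
    """Mix generation and copying into one distribution over the extended vocabulary.
    p_vocab:      (batch, vocab_size)       softmax over the fixed vocabulary
    attention:    (batch, src_len)          attention weights a_i over source tokens
    src_ids_ext:  (batch, src_len), long    source token ids in the extended vocabulary
    p_gen:        (batch, 1)                soft switch in [0, 1]
    """
    batch, vocab_size = p_vocab.size()
    # generation part: p_gen * P_vocab(w); OOV slots start at zero
    dist = torch.zeros(batch, extended_vocab_size)
    dist[:, :vocab_size] = p_gen * p_vocab
    # copy part: (1 - p_gen) * sum of attention over positions where w appears
    dist.scatter_add_(1, src_ids_ext, (1.0 - p_gen) * attention)
    return dist
```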

26 Pointer-Generator Network

27 Coverage Mechanism
Repetition is a common problem for sequence-to-sequence models. Maintain a coverage vector
cov_i = Σ_{j=0}^{i−1} a_j
the (unnormalized) distribution over the source document words, in order to ensure the attention mechanism is informed of its previous decisions. Add it to the alignment model:
e_ij = v^T tanh(W s_{i−1} + U h_j + V cov_i + b_attn)
Loss function:
Loss_i = − log P(w_i*) + λ Σ_j min(a_ij, cov_ij)
(In the last term, j runs over the positions of the two vectors, and we want the smaller of the two values to be zero: if the model has already attended to a position, it should not attend there again.)
Tu et al., 2016
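A small sketch of the coverage vector and the coverage penalty: attention is accumulated over decoder steps, and at each step the element-wise minimum with the current attention is summed. Tensor names and shapes are illustrative assumptions.

```python
import torch

def coverage_loss(attentions):
    """attentions: (tgt_len, batch, src_len) attention distributions, one per decoder step.
    Returns the summed coverage penalty  sum_i sum_j min(a_ij, cov_ij)."""
    coverage = torch.zeros_like(attentions[0])          # cov_0 = 0
    loss = 0.0
    for a in attentions:                                # loop over decoder steps i
        loss = loss + torch.min(a, coverage).sum(dim=-1).mean()
        coverage = coverage + a                         # cov_{i+1} = cov_i + a_i
    return loss
```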

28 Table of contents
1. Motivation
2. Definition, genres and types of summaries
3. Summarization techniques
4. Building a summarization system
5. Evaluating summaries
6. The future
This part is nothing new. I just introduce the concrete process of building a simple summarization system.

29 Document Understanding Conference
Organized by NIST (National Institute of Standards and Technology).
Summarization tasks:
- 100-word, query-independent single-document and multi-document summarization
- 2003: query-dependent summarization
- 2004: multi-lingual summarization
- query-focused multi-document summarization
Here the DUC evaluations are used as the example. NIST organizes this series of summarization evaluations: at first the task was single-document summarization only, then it evolved to multi-document summarization, and often a topic query is provided.

30 Query-focused Multi-document Summarization
Task description
System implementation:
- Feature-driven system design
- Improvement based on machine learning
- Post-processing
Now we focus on a summarization system for the query-focused MDS task. First I will give the task description, then present the summarization process.

31 Task Description
Combination of QA and summarization.
Input: a topic query and 25 related documents.
Output: a summary of no more than 250 words.
News documents from the Associated Press, New York Times and Xinhua News Agency, related to recent important events.
Evaluation using ROUGE and Pyramid; each topic has four human summaries.

32 DUC corpus example (1)
Each topic is composed of a topic query and 25 related documents. Query sample:
<topic>
<num> D0601A </num>
<title> Native American Reservation System - pros and cons </title>
<narr>
Discuss conditions on American Indian reservations or among Native American communities. Include the benefits and drawbacks of the reservation system. Include legal privileges and problems.
</narr>
</topic>
This is a query sample.

33 DUC corpus example (2)
Document sample:
<DOC>
<DOCNO> APW </DOCNO>
<DOCTYPE> NEWS STORY </DOCTYPE>
<DATE_TIME> :56 </DATE_TIME>
<HEADLINE> Clinton To Visit Navajo Community </HEADLINE>
By CHRIS ROBERTS, Associated Press Writer
<TEXT>
<P> ALBUQUERQUE, N.M. (AP) -- At Mesa Elementary School in far northwest New Mexico, Navajo children line up to use the few computers connected to the Internet. Their time online must be short for everyone to get a chance. </P>
… …
<P> Navajo Nation:
</TEXT>
</DOC>
This is the document sample.

34 Sentence simplification
Delete meaningless words in sentences:
- News-specific noisy words
- Content-irrelevant words
Rule-based method:
- The beginning of a news story, e.g., "ALBUQUERQUE, N.M. (AP)";
- Initial words in the sentence, such as "and", "also", "besides,", "though,", "in addition", "somebody said", "somebody says";
- "somebody (pronoun) / It is said/reported/noticed/thought that";
- Parenthesized content in capitalized letters;
- …
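A sketch of how such rules can be implemented with regular expressions. The patterns below are illustrative examples of the rule types listed above, not the exact rules used in the system.

```python
import re

DATELINE = re.compile(r"^[A-Z][A-Z .,'-]+\((AP|AFP|Reuters)\)\s*--?\s*")   # "ALBUQUERQUE, N.M. (AP) --"
LEADING_CONNECTIVE = re.compile(r"^(And|Also|Besides|Though|In addition)\s*,?\s*", re.I)
ATTRIBUTION = re.compile(r"^(It is (said|reported|noticed|thought) that|\w+ (said|says) that)\s*", re.I)
PAREN_CAPS = re.compile(r"\(\s*[A-Z][A-Z0-9 .&'-]*\s*\)")                  # parenthesized all-caps content

def simplify(sentence):
    """Strip dateline, leading connectives, attribution lead-ins and all-caps parentheticals."""
    for pattern in (DATELINE, LEADING_CONNECTIVE, ATTRIBUTION, PAREN_CAPS):
        sentence = pattern.sub("", sentence)
    return re.sub(r"\s{2,}", " ", sentence).strip()

# simplify("ALBUQUERQUE, N.M. (AP) -- Besides, officials said the plan (NATO) was approved.")
# -> "officials said the plan was approved."
```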

35 Sentence ordering
Ordering sentences by score alone gives no logical flow in the content, so we use temporal sentence ordering:
- Acquire the time stamp from the original texts;
- Order sentences according to the publication time of their documents;
- For sentences from the same document, order them by their occurrence in the document.
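A sketch of this temporal ordering: selected sentences are sorted first by the publication time of their source document and then by their position within that document. The record layout and field names are illustrative assumptions.

```python
def order_sentences(selected):
    """selected: a list of records such as
    {"text": "...", "doc_time": "2006-01-24 10:56", "position": 3}
    where doc_time is the publication time of the source document and
    position is the sentence's index within that document."""
    ranked = sorted(selected, key=lambda s: (s["doc_time"], s["position"]))
    return [s["text"] for s in ranked]
```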

36 Table of contents
1. Motivation
2. Genres and types of summaries
3. Approaches and paradigms
4. Summarization techniques
5. Evaluating summaries
6. The future
The last part is to evaluate summaries.

37 How Can You Evaluate a Summary?
When you generate a summary:
1. obtain the gold-standard summaries (written by humans),
2. choose a granularity (clause; sentence; paragraph),
3. create a similarity measure for that granularity (word overlap; multi-word overlap; perfect match),
4. measure the similarity of each unit in the new summary to the most similar unit(s) in the gold standard,
5. measure Recall and Precision, e.g., (Kupiec et al., 95).
When you get a summary, how can you evaluate it? First, you need the gold-standard summaries, which are generally written by humans. Second, you should choose a granularity to evaluate: n-gram, clause, sentence, or something else. Third, a similarity measure must be created for that granularity. Fourth, measure the similarity of each unit.

38 Evaluation
Manual:
- Responsiveness (content)
- Linguistic quality (readability): grammaticality, non-redundancy, referential clarity, focus, structure
- Five-point scale (1 very poor, 5 very good)
Semi-automatic:
- Pyramid: SCU annotation
Automatic:
- ROUGE (ROUGE-2, ROUGE-SU4), BE (Basic Element)
After we have the gold-standard summaries, we evaluate the system-generated summaries. Three kinds of evaluation are generally used. The first is manual evaluation: human assessors assign a score to each summary on each of the criteria above, for example Focus (containing few irrelevant details) and Non-redundancy (repeating little of the same information), on a five-point scale where 1 means very poor and 5 means very good. The second, Pyramid, can be seen as semi-automatic: assessors need to annotate the summary content units (SCUs) for each summary, and some call it a manual method because it mainly relies on human effort. Because the first two methods cost a lot of human effort, the third method, ROUGE, is still the most widely used evaluation.

39 Pyramid (1)
The pyramid method is designed to address the observation that summaries from different humans always have partly overlapping content. It includes a manual annotation method to identify Summary Content Units (SCUs) and to quantify the proportion of model summaries that express each piece of content. Each SCU has a weight representing the number of models it occurs in, ranging from 1 to maxn, where maxn is the total number of models. There are very few SCUs expressed in all models (i.e., weight = maxn), and increasingly many SCUs at each lower weight, with the most SCUs at weight = 1.
[Figure: pyramid with tiers of weight 1 to 4.]

40 SCU example
We can see that the underlined spans represent the same content. We create an SCU for it and assign it a weight; because it occurs in 4 model summaries, its weight is set to 4.

41 Pyramid(2) The approach involves two phases of manual annotation:
pyramid construction;
annotation against the pyramid, to determine which SCUs in the pyramid have been expressed in the peer summary.
Let Di be the number of SCUs in the summary that appear in tier Ti, where i is the weight of the SCUs in that tier. The total SCU weight of the summary is then D = Σ_i (i × Di).
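A sketch of the raw content score D under this definition: since D = Σ_i i × Di is just the sum of the weights of the SCUs matched in the peer summary, it can be computed directly from an SCU-to-weight mapping. (The published pyramid score further normalizes D, e.g. against an optimal summary with the same number of SCUs; that step is omitted here.) Function and variable names are illustrative.

```python
def pyramid_weight(summary_scus, scu_weight):
    """summary_scus: SCU identifiers matched in the peer summary.
    scu_weight: maps each SCU to its weight, i.e. the number of model
    summaries (the tier Ti) in which that SCU appears.
    Returns D = sum_i i * Di, the total weight of the matched SCUs."""
    return sum(scu_weight[scu] for scu in summary_scus)
```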

42 ROUGE basics
ROUGE (Recall-Oriented Understudy for Gisting Evaluation): recall-oriented, within-sentence n-gram overlap with the model summaries.
Models: no theoretical limit to their number; system output has been compared to 4 models, and manual summaries to 3 models.
ROUGE scores correlate reasonably well with human coverage judgements, but they do not address summary discourse characteristics and do not capture text cohesion or coherence.
ROUGE v1.2.1 measures:
- ROUGE-1, 2, 3, 4: n-gram matching where n = 1, 2, 3, 4
- ROUGE-LCS: longest common subsequence
The last evaluation method is ROUGE. It is simple and based on the n-gram overlap between the system summary and the model summaries; you are expected to be able to use this tool. It was developed by Chin-Yew Lin at ISI/USC.

43 ROUGE: Recall-Oriented Understudy for Gisting Evaluation
ROUGE: n-gram co-occurrence metrics measuring content overlap, based on the counts of n-gram overlaps between candidate and model summaries.
In the formula, the numerator is the count of n-gram overlaps between the system and model summaries, and the denominator is the total number of n-grams in the model summaries.

44 ROUGE
ROUGE-n = [ Σ_{j=1..h} Σ_{i ∈ Nn} min(Xn(i), Mn(i, j)) ] / [ Σ_{j=1..h} Σ_{i ∈ Nn} Mn(i, j) ]
where Nn represents the set of all n-grams and i is one member of Nn; Xn(i) is the number of times the n-gram i occurred in the candidate summary, and Mn(i, j) is the number of times the n-gram i occurred in the j-th model reference (human) summary. There are h human summaries in total.
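A sketch of ROUGE-1 recall against multiple reference summaries, following the clipped-count formulation above. The tokenization and the function interface are illustrative, and the official ROUGE toolkit applies additional options (stemming, jackknifing) not shown here.

```python
from collections import Counter
import re

def rouge_n(candidate, references, n=1):
    """Clipped n-gram recall of `candidate` against a list of reference summaries."""
    def ngrams(text):
        toks = re.findall(r"\w+", text.lower())
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))

    cand = ngrams(candidate)
    overlap, total = 0, 0
    for ref in references:                      # sum over the h model summaries
        ref_counts = ngrams(ref)
        overlap += sum(min(cand[g], c) for g, c in ref_counts.items())
        total += sum(ref_counts.values())
    return overlap / total if total else 0.0
```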

45 Table of contents
1. Motivation
2. Genres and types of summaries
3. Approaches and paradigms
4. Summarization methods
5. Evaluating summaries
6. The future

