
EUSUM: Extracting Easy-to-Understand English Summaries for Non-Native Readers. Xiaojun Wan, Huiying Li, Jianguo Xiao. pp. 491-498, SIGIR 2010.


1 EUSUM: Extracting Easy-to-Understand English Summaries for Non-Native Readers. Xiaojun Wan, Huiying Li, Jianguo Xiao. pp. 491-498 (SIGIR 2010). Authors: Xiaojun Wan (Associate Researcher), Huiying Li, Jianguo Xiao (Professor), Institute of Computer Science and Technology, Peking University, Beijing, China; Key Laboratory of Computational Linguistics (Peking University), MOE, China. Presenter: Zhong-Yong.

2 Outline 1. Introduction 2. Related Work 3. The EUSUM system
4. Experiments 5. Discussion 6. Conclusion and Future work

3 Introduction (1/6) To date, various summarization methods and a number of summarization systems have been developed, such as MEAD, NewsInEssence and NewsBlaster. These methods and systems focus on how to improve the informativeness, diversity or fluency of the English summary. They usually produce the same English summaries for all users. Presenter's note: To date, prior research has proposed a variety of summarization systems and methods, for example MEAD, NewsInEssence, and NewsBlaster.

4 NewsInEssence http://www.newsinessence.com

5 NewsBlaster http://newsblaster.cs.columbia.edu/

6 Newsblaster http://www.newsblastergame.com/

7 Introduction (2/6) These methods and systems focus on how to improve the informativeness, diversity or fluency of the English summary. They usually produce the same English summaries for all users. Presenter's note: The methods above all aim at improving informativeness, diversity, and fluency, but they implicitly assume that every reader has the same English proficiency.

8 Introduction (3/6) Different users usually have different English reading levels because they have different English education backgrounds and learning environments. Chinese readers usually have less ability to read English summaries than native English readers. Presenter's note: Different readers inevitably have different English levels, because of their different English education backgrounds and learning environments; Chinese readers generally read English summaries less easily than native English readers.

9 Introduction (4/6) In this study, the authors argue that the English summaries produced by existing methods and systems are not fit for non-native readers. They use a new factor, "reading easiness" (or difficulty), for document summarization. The factor indicates whether the summary is easy for non-native readers to understand. The reading easiness of a summary depends on the reading easiness of each sentence in the summary. Presenter's note: The authors argue that previous summarization methods are not suitable for non-native readers, so this paper's system incorporates a new factor, "reading easiness". 1) The factor indicates whether non-native readers can easily understand the content. 2) Reading easiness is scored sentence by sentence within the summary.

10 Introduction (5/6) Propose a novel summarization system, EUSUM (Easy-to-Understand Summarization), for incorporating the reading easiness factor into the final summary. Presenter's note: The paper proposes a new summarization system; the difference from prior work is that it incorporates reading easiness.

11 Introduction (6/6) The contribution of this paper is summarized as follows: 1) Examine a new factor, reading easiness, for document summarization. 2) Propose a novel summarization system, EUSUM, for incorporating the new factor and producing easy-to-understand summaries for non-native readers. 3) Conduct both automatic evaluation and a user study to verify the effectiveness of the proposed system. Presenter's note: The contributions are: examining reading easiness for summarization; proposing a system that incorporates it and produces summaries non-native readers can easily understand; and verifying the system's effectiveness with automatic evaluation and a user study. Next, the related work on summarization is reviewed.

12 Related Work (1/7) Document Summarization
For single document summarization, the sentence score is usually computed from linguistic feature values, such as term frequency, sentence position, and topic signatures [19, 22]. [19] C. Y. Lin and E. Hovy. The Automated Acquisition of Topic Signatures for Text Summarization. In Proceedings of the 17th Conference on Computational Linguistics. (C: 213) [22] H. P. Luhn. The Automatic Creation of Literature Abstracts. IBM Journal of Research and Development, 2(2). (C: 1403) Presenter's note: Prior work falls into two areas; the first is document summarization. For single-document summarization, sentences are scored by linguistic features, for example term frequency, sentence position, and topic signatures.

13 Topic signature A topic signature is a set of related concepts: for example, Restaurant-visit involves at least the concepts menu, eat, pay, and possibly waiter. [19] C. Y. Lin and E. Hovy. The Automated Acquisition of Topic Signatures for Text Summarization. In Proceedings of the 17th Conference on Computational Linguistics. (C: 213) Presenter's note: A topic signature is a set whose members are all related to one concept; for example, Restaurant includes menu, eat, pay, and waiter.

14 Related Work (2/7) Document Summarization
The summary sentences can also be selected by using machine learning methods [1, 17] or graph-based methods [8, 23]. [1] M. R. Amini and P. Gallinari. The Use of Unlabeled Data to Improve Supervised Learning for Text Summarization. In Proceedings of SIGIR 2002. (C: 57) [17] J. Kupiec, J. Pedersen, and F. Chen. A Trainable Document Summarizer. In Proceedings of SIGIR 1995. (C: 764) [8] G. Erkan and D. R. Radev. LexPageRank: Prestige in Multi-Document Text Summarization. In Proceedings of EMNLP 2004. (C: 96) [23] R. Mihalcea and P. Tarau. TextRank: Bringing Order into Texts. In Proceedings of EMNLP 2004. (C: 278) Presenter's note: Summary sentences can also be selected with machine learning or graph-based methods.

15 Related Work (3/7) Document Summarization
For multi-document summarization, the centroid-based method [27] is a typical approach; it scores sentences based on cluster centroids, position, and TF-IDF features. NeATS [20] makes use of features such as topic signatures to select important sentences. [27] D. R. Radev, H. Y. Jing, M. Stys, and D. Tam. Centroid-based summarization of multiple documents. Information Processing and Management, 40. (C: 502) [20] C.-Y. Lin and E. H. Hovy. From Single to Multi-document Summarization: A Prototype System and its Evaluation. In Proceedings of ACL-02. (C: 96) Presenter's note: For multi-document summarization, the most typical approach is the centroid-based method, which scores sentences by cluster centroids, sentence position, and TF-IDF values; other work, such as NeATS, uses different features.

16 Related Work (4/7) Document Summarization
None of these summarization methods considers the reading easiness of the summary for non-native readers. Presenter's note: None of the methods above takes reading easiness into account.

17 Related Work (5/7) Reading Difficulty Prediction
Earlier work on reading difficulty prediction was conducted for the purpose of education or language learning. For example, one purpose is to find reading materials of the appropriate difficulty level for English as a First or Second Language students. Presenter's note: The second body of related work concerns reading easiness (also called reading difficulty). Early work applied it to education and language learning, for example to select suitable reading material for students of English as a first or second language.

18 Related Work (6/7) Reading Difficulty Prediction
Almost all earlier work focuses on document-level reading difficulty prediction. In this study, the authors investigate reading difficulty (or easiness) prediction of English sentences for Chinese readers. Presenter's note: Most prior work judges readability at the document level; this paper instead predicts how easy individual sentences are to read and uses that to build the summary.

19 Related Work (7/7) Reading Difficulty Prediction
Note that sentence ordering in a long summary also influences the reading difficulty or readability of the summary, but sentence ordering is a separate research problem and is not taken into account in this study. Presenter's note: Sentence ordering affects the readability of a summary, but this paper does not address that problem.

20 The EUSUM system (1/19) System overview
The main idea of the proposed EUSUM system is to incorporate the sentence-level reading easiness factor into the summary extraction process. Each sentence is associated with two factors: informativeness and reading easiness. The reading easiness of a sentence is measured by an EU (easy-to-understand) score, which is predicted by using statistical regression methods. Presenter's note: System overview: the main idea is to take reading easiness into account when extracting sentences; each sentence has two scores, informativeness and reading easiness, and the latter is predicted by statistical regression.

21 The EUSUM system (2/19) Sentence-Level Reading Easiness Prediction
Reading easiness refers to how easily a text can be understood by non-native readers. The larger the value, the more easily the text can be understood. Presenter's note: Reading easiness measures how easily a non-native reader can understand a sentence; a larger regression value means an easier sentence.

22 The EUSUM system (3/19) Sentence-Level Reading Easiness Prediction
Many students have some difficulty reading original English news or summaries. The two factors that most influence the reading process are as follows: 1) Unknown or difficult English words: for example, most Chinese college students do not know words such as "seismographs" or "woodbine". 2) Complex sentence structure: for example, a sentence with two or more clauses introduced by a subordinating conjunction is usually difficult to read. Presenter's note: Many students have trouble reading English news or summaries; the two main factors are unknown words (e.g. seismograph, woodbine) and complex sentence structure (e.g. several clauses joined by subordinating conjunctions).

23 The EUSUM system (4/19) Sentence-Level Reading Easiness Prediction
The authors adopt the ε-support vector regression (ε-SVR) method for the reading easiness prediction task. Presenter's note: Next, how the reading easiness score is computed: the authors use ε-support vector regression to predict it.

24 The EUSUM system (5/19) Sentence-Level Reading Easiness Prediction
In the experiments, the LIBSVM tool with the RBF kernel is used for the regression task. Parameters are selected by 10-fold cross validation via grid search, choosing the best parameters with respect to mean square error (MSE), and the best parameters are then used to train on the whole training set. Presenter's note: Parameter selection: LIBSVM with the RBF (radial basis function) kernel is used for the regression; 10-fold cross-validation with grid search picks the parameters with the lowest MSE, and the final model is trained on the full training set with those parameters.
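The following is a minimal sketch of this training step. The paper uses the LIBSVM tool directly; here scikit-learn (whose SVR wraps LIBSVM) is used for illustration, the feature matrix X (one 8-dimensional row per sentence) and the averaged human labels y are assumed to already exist, and the grid values are illustrative rather than the paper's.

    # Sketch: epsilon-SVR with an RBF kernel, tuned by 10-fold grid-search CV on MSE.
    from sklearn.svm import SVR
    from sklearn.model_selection import GridSearchCV

    def train_easiness_regressor(X, y):
        """X: (n_sentences, 8) feature matrix; y: labeled easiness scores in [1, 5]."""
        param_grid = {                              # illustrative grid, not the paper's values
            "C": [2 ** k for k in range(-2, 6)],
            "gamma": [2 ** k for k in range(-6, 2)],
            "epsilon": [0.1, 0.2, 0.5],
        }
        search = GridSearchCV(
            SVR(kernel="rbf"), param_grid,
            cv=10,                                  # 10-fold cross validation
            scoring="neg_mean_squared_error",       # pick parameters by MSE
        )
        search.fit(X, y)                            # refit=True retrains on the full training set
        return search.best_estimator_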

25 The EUSUM system (6/19) Sentence-Level Reading Easiness Prediction
Two groups of features are used for each sentence: the first group includes surface features, and the second group includes parse-based features. Presenter's note: The SVR needs each sentence represented as a feature vector, so two groups of features are chosen: surface features and parse-based features.

26 The EUSUM system (7/19) Sentence-Level Reading Easiness Prediction
The four surface features are as follows: 1) Sentence length: the number of words in the sentence. 2) Average word length: the average length of the words in the sentence. 3) CET-4 word percentage: the percentage of words in the sentence that appear in the CET-4 word list (690 words). 4) Number of peculiar words: the number of infrequently occurring words in the sentence. Presenter's note: A longer sentence is harder to understand; words with fewer letters are easier to recognize; CET-4 is the basic English test taken by mainland Chinese students; peculiar words are defined from the collected experimental data as the 2,000 least frequent words. A sketch of these features follows.
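A minimal sketch of the four surface features. It assumes the CET-4 word list and the peculiar-word list (per the note above, the least frequent words in the collected data) are available as Python sets named cet4_words and peculiar_words, and uses plain whitespace tokenization for simplicity.

    # Sketch: the four surface features of a single sentence.
    def surface_features(sentence, cet4_words, peculiar_words):
        words = sentence.lower().split()                      # simplified tokenization
        n = len(words)
        return [
            n,                                                # 1) sentence length (word count)
            sum(len(w) for w in words) / n,                   # 2) average word length
            sum(w in cet4_words for w in words) / n,          # 3) CET-4 word percentage
            sum(w in peculiar_words for w in words),          # 4) number of peculiar words
        ]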

27 The EUSUM system (8/19) Sentence-Level Reading Easiness Prediction
The Stanford Lexicalized Parser [16] with the provided English PCFG model is used to parse each sentence into a parse tree. [16] D. Klein and C. D. Manning. Fast Exact Inference with a Factored Model for Natural Language Parsing. In Proceedings of NIPS. (C: 297) Presenter's note: Before computing the parse-based features, each sentence is converted into a parse tree with the Stanford Lexicalized Parser.

28 The EUSUM system (9/19) Sentence-Level Reading Easiness Prediction
The four parse features are as follows: 1) Depth of the parse tree: the depth of the generated parse tree. 2) Number of SBARs in the parse tree: an SBAR is a clause introduced by a (possibly empty) subordinating conjunction. 3) Number of NPs in the parse tree: the number of noun phrases. 4) Number of VPs in the parse tree: the number of verb phrases. Presenter's note: A deeper parse tree indicates a more complex sentence structure, and more subordinate clauses also indicate higher complexity; the noun phrase and verb phrase counts are the remaining two features. A sketch of these features follows.
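A minimal sketch of the four parse-based features. It assumes the Stanford parser has already produced a bracketed parse string for the sentence and only uses nltk's Tree class to read it; tree.height() is used as a stand-in for the tree depth.

    # Sketch: the four parse-based features, from a bracketed parse string.
    from nltk import Tree

    def parse_features(parse_str):
        tree = Tree.fromstring(parse_str)        # e.g. "(ROOT (S (NP ...) (VP ...)))"
        labels = [t.label() for t in tree.subtrees()]
        return [
            tree.height(),                       # 1) depth of the parse tree
            labels.count("SBAR"),                # 2) number of SBARs (subordinate clauses)
            labels.count("NP"),                  # 3) number of noun phrases
            labels.count("VP"),                  # 4) number of verb phrases
        ]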

29 The EUSUM system (10/19) Sentence-Level Reading Easiness Prediction
Each sentence si is associated with a reading easiness score EaseScore(si) predicted by the ε-SVR method. The larger the score, the more easily the sentence is understood. The score is finally normalized by dividing by the maximum score. Presenter's note: Each sentence's eight-dimensional feature vector is fed to the SVR to obtain its EaseScore; as noted earlier, a higher score means easier reading, and dividing by the maximum score keeps the values in the range 0 to 1.

30 The EUSUM system (11/19) Sentence-Level Informativeness Evaluation Centroid-based Method
The score for each sentence is a linear combination of weights computed from the following three features. 1) Centroid-based weight: the weight C(si) of sentence si is the cosine similarity between the sentence text and the concatenated text of the whole document set D, normalized by dividing by the maximal weight. Presenter's note: Next, how the informativeness score is computed. Two methods are used; the first is centroid-based, where the score is a linear combination of three features. The centroid-based weight is each sentence's cosine similarity to the concatenated text of the document set, divided by the maximum weight so that values fall between 0 and 1.

31 The EUSUM system (12/19) Sentence-Level Informativeness Evaluation Centroid-based Method
2) Sentence position: the weight P(si) of sentence si reflects its position priority, P(si) = 1 - (posi - 1)/ni, where posi is the position number of sentence si in its document and ni is the total number of sentences in that document. Presenter's note: The second feature is sentence position; posi is the sentence's position within its own document and ni is the number of sentences in that document.

32 The EUSUM system (13/19) Sentence-Level Informativeness Evaluation Centroid-based Method
3) First sentence similarity: the weight F(si) is the cosine similarity between sentence si and the first sentence of the same document. The three weights are summed to obtain the overall score InfoScore(si) of sentence si, which is normalized by dividing by the maximum score. Presenter's note: The third feature is each sentence's similarity to the first sentence of its own document. The three weights are added to give InfoScore, and each sentence's InfoScore is divided by the largest InfoScore so that the values fall between 0 and 1. A sketch of the whole centroid-based score follows.
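A minimal sketch of the centroid-based InfoScore. TF-IDF cosine similarity is used here for both C(si) and F(si) (the exact term weighting used in the paper may differ), the three weights are summed without extra coefficients as the slide states, and docs is assumed to be a list of documents, each given as a list of sentence strings.

    # Sketch: centroid-based informativeness = C(si) + P(si) + F(si), normalized to [0, 1].
    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def centroid_info_scores(docs):
        sents = [s for doc in docs for s in doc]
        vec = TfidfVectorizer().fit(sents)
        S = vec.transform(sents)
        D = vec.transform([" ".join(sents)])                 # concatenated text of the whole set
        C = cosine_similarity(S, D).ravel()
        C = C / C.max()                                      # centroid weight, divided by the maximum

        P, F = [], []
        for doc in docs:
            first = vec.transform([doc[0]])                  # first sentence of this document
            n_i = len(doc)
            for pos, s in enumerate(doc, start=1):
                P.append(1 - (pos - 1) / n_i)                # position weight P(si)
                F.append(cosine_similarity(vec.transform([s]), first)[0, 0])
        scores = C + np.array(P) + np.array(F)               # sum of the three weights
        return scores / scores.max()                         # normalize by the maximum score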

33 The EUSUM system (14/19) Sentence-Level Informativeness Evaluation Graph-based Method
Given a document set D, let G = (V, E) be an undirected graph reflecting the relationships between sentences in the document set. V is the set of vertices, and each vertex si in V is a sentence in the document set. E is the set of edges; each edge eij in E is associated with an affinity weight f(si, sj) between sentences si and sj (i ≠ j), computed using the standard cosine measure between the two sentences. Presenter's note: The second way to compute informativeness is graph-based. Each vertex is a sentence and each edge carries the affinity weight between two sentences, computed with the standard cosine measure; since cosine needs sentence vectors, I assume the vectors are built from the sentence text (term vectors).

34 The EUSUM system (15/19) Sentence-Level Informativeness Evaluation Graph-based Method
An affinity matrix M describes G, with each entry corresponding to the weight of an edge in the graph: M = (Mi,j)|V|×|V| is defined by Mi,j = f(si, sj). M is then normalized to M~ so that each row sums to 1. Presenter's note: Matrix M is built from the pairwise cosine values, and each row is divided by its row sum so that every row adds up to one. (The slide also shows an example graph with edges e12, e13, e14, e21, e31, e41.)

35 The EUSUM system (16/19) Sentence-Level Informativeness Evaluation Graph-based Method
The sentence scores are computed iteratively with the PageRank-style update InfoScore(si) = μ · Σj InfoScore(sj) · M~(j,i) + (1 − μ)/|V| (summing over j ≠ i), where μ is the damping factor, usually set to 0.85 as in the PageRank algorithm. The score of each sentence is normalized by dividing by the maximum score. Presenter's note: The InfoScore is obtained from this formula with μ = 0.85; as with the centroid-based method, each score is divided by the maximum so that it falls between 0 and 1. A sketch follows.
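A minimal sketch of the graph-based InfoScore: a cosine affinity matrix with the diagonal zeroed, row normalization to M~, and the damped PageRank-style iteration with μ = 0.85. The update form below is the standard one for this family of graph-based summarizers and is an assumption rather than a copy of the paper's exact formula.

    # Sketch: graph-based informativeness via a damped PageRank-style iteration.
    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def graph_info_scores(sents, mu=0.85, n_iter=100):
        M = cosine_similarity(TfidfVectorizer().fit_transform(sents))  # affinity weights f(si, sj)
        np.fill_diagonal(M, 0.0)                                       # edges only for i != j
        row_sums = M.sum(axis=1, keepdims=True)
        M_tilde = np.divide(M, row_sums, out=np.zeros_like(M), where=row_sums > 0)
        n = len(sents)
        scores = np.full(n, 1.0 / n)
        for _ in range(n_iter):                                        # power iteration
            scores = mu * (M_tilde.T @ scores) + (1 - mu) / n
        return scores / scores.max()                                   # normalize by the maximum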

36 A brief summary Each sentence has two scores:
1. EaseScore (reading easiness, 8 features), predicted by ε-SVR with an RBF kernel. 2. InfoScore, computed either by the centroid-based method (linear combination of 3 features) or by the graph-based method (affinity matrix M).

37 The EUSUM system (17/19) Summary Extraction
After obtaining the reading easiness score and the informativeness score of each sentence in the document set, the two scores are linearly combined (with weight λ on reading easiness) to get the combined score of each sentence. λ is not set to a large value, in order to maintain the content informativeness of the extracted summary. Presenter's note: The two scores are added linearly; λ is kept small so that the summary does not lose informativeness.

38 The EUSUM system (18/19) Summary Extraction
For multi-document summarization, some sentences highly overlap with each other, so the same greedy algorithm as in [31] is applied to penalize sentences that highly overlap with other highly scored sentences. [31] X. Wan, J. Yang, and J. Xiao. Using cross-document random walks for topic-focused multi-document summarization. In Proceedings of WI 2006. (C: 13) Presenter's note: In multi-document summarization some sentences overlap heavily with others, so a greedy algorithm is used to penalize these highly overlapping sentences.

39 The EUSUM system (19/19) Summary Extraction
The final rank score RankScore(si) of each sentence si is initialized to its combined score CombinedScore(si). At each iteration, the most highly ranked sentence (say si) is selected into the summary, and the rank score of each remaining sentence sj is penalized by a formula in which ω > 0 is the penalty degree factor. Presenter's note: The RankScore is initialized to the CombinedScore. In each iteration the highest-scoring sentence is selected, and the scores of the remaining sentences are lowered in proportion to their similarity to the selected sentence si; ω is mainly used to enforce the diversity of the summary. A sketch of this greedy step follows.
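A minimal sketch of the greedy extraction step. The combined score is assumed to be λ·EaseScore + (1 − λ)·InfoScore, and since the slide's penalty formula is not reproduced here, the penalty used below (ω times the cosine similarity to the selected sentence times that sentence's score) is an assumed MMR-style form consistent with the note above; sim is a precomputed sentence-by-sentence cosine similarity matrix.

    # Sketch: greedy summary extraction with a redundancy penalty (assumed form).
    import numpy as np

    def extract_summary(sents, ease, info, sim, lam=0.2, omega=10.0, max_words=100):
        rank = lam * np.asarray(ease) + (1 - lam) * np.asarray(info)   # CombinedScore(si)
        selected, length = [], 0
        candidates = set(range(len(sents)))
        while candidates and length < max_words:
            i = max(candidates, key=lambda k: rank[k])                 # top-ranked remaining sentence
            selected.append(i)
            candidates.remove(i)
            length += len(sents[i].split())
            for j in candidates:                                       # penalize overlapping sentences
                rank[j] -= omega * sim[i, j] * rank[i]                 # assumed penalty term
        return [sents[i] for i in selected]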

40 Experiment Reading Easiness Prediction Experimental setup
DUC 2001 provided 309 news articles for document summarization tasks, grouped into 30 document sets; the articles were selected from TREC-9. Five document sets (d04, d05, d06, d08, d11) with 54 news articles were chosen from the DUC 2001 test set. The documents were then split into sentences, 1,736 sentences in total. Presenter's note: DUC 2001 provides 309 news articles from TREC-9 in 30 document sets; 5 sets (54 articles) are chosen and split into 1,736 sentences.

41 Experiment Reading Easiness Prediction Experimental setup
Two college students (one undergraduate and one graduate student) manually and independently labeled the reading easiness score of each sentence. The score ranges from 1 to 5: 1 means "very hard to understand", 3 means "mostly understandable", and 5 means "very easy to understand". The final reading easiness score is the average of the two annotators' scores. Presenter's note: Two students rate each sentence from 1 (very hard) to 5 (very easy); the final reading easiness score is the average of the two.

42 Experiment Reading Easiness Prediction Experimental setup
The labeled sentences were randomly separated into a training set of 1,482 sentences and a test set of 254 sentences, and the LIBSVM tool was used for training and testing. Two standard metrics evaluate the prediction results: Mean Square Error (MSE), a measure of how correct each prediction is on average, penalizing more severe errors more heavily; and Pearson's correlation coefficient (ρ), a measure of whether the trend of the predictions matches the trend of the human-labeled data. Presenter's note: 1,482 sentences are used for training and 254 for testing, with LIBSVM; MSE averages the squared differences between predicted and labeled values, and the correlation coefficient measures whether the predicted and human-labeled values follow the same trend. A sketch of the two metrics follows.
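A minimal sketch of the two metrics, with y_true as the averaged human labels on the test set and y_pred as the ε-SVR predictions; numpy and scipy are assumed to be available.

    # Sketch: the two evaluation metrics for the regression output.
    import numpy as np
    from scipy.stats import pearsonr

    def evaluate_predictions(y_true, y_pred):
        y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
        mse = np.mean((y_true - y_pred) ** 2)        # Mean Square Error
        rho, _ = pearsonr(y_true, y_pred)            # Pearson's correlation coefficient
        return mse, rho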

43 Experiment Reading Easiness Prediction Experimental Results
The results guarantee that the use of reading easiness scores in the summarization process is feasible. Presenter's note: The results table on the slide shows that predicting reading easiness with the SVR (using the eight features) is feasible for use in the summarization process.

44 Experiment Document Summarization Experimental Setup
The remaining 25 document sets were used for summarization evaluation; the average number of documents per set is 10. A summary of 100 words was required for each document set. Generic reference summaries provided by NIST annotators were used for evaluation. Presenter's note: The other 25 document sets are used for summarization evaluation; each set gets a 100-word summary, evaluated against the generic reference summaries.

45 Experiment Document Summarization Experimental Setup
A summary is evaluated from the following two aspects. Content informativeness: how much the summary reflects the major content of the document set. Presenter's note: The first evaluation aspect is content informativeness, i.e. how much of the main content the summary conveys.

46 Experiment Document Summarization Experimental Setup
The ROUGE toolkit was used for automatic evaluation of content informativeness. The toolkit is officially adopted by DUC for automatic summarization evaluation. It measures summary quality by counting overlapping units, such as n-grams, word sequences, and word pairs, between the candidate summary and the reference summary, and reports separate F-measure scores for 1-, 2-, 3- and 4-grams. Presenter's note: ROUGE 1.5.5 is used; it is DUC's official evaluation toolkit, it counts the overlap (n-grams, word sequences, word pairs) between the candidate and reference summaries, and it reports F-scores for 1- to 4-grams.

47 Experiment Document Summarization Experimental Setup
Reading easiness: the average reading easiness score of the sentences in a summary is used as the summary's reading easiness level, and the overall reading easiness score is the average across all 25 document sets. Presenter's note: The second aspect is reading easiness, taken as the average EaseScore of the summary's sentences, averaged again over the 25 document sets.

48 Experiment Document Summarization Experimental Setup
User studies were also performed, with 4 Chinese college students as subjects. A user study tool was developed to help the subjects evaluate each summary on the two aspects of content informativeness and reading easiness. Each subject assigns a score from 1 to 5 on each aspect for each summary: for reading easiness, 1 means "very hard to understand" and 5 means "very easy to understand"; for content informativeness, 1 means "least informative" and 5 means "very informative". Presenter's note: Four college students took part; a tool was developed so each participant could rate every summary from 1 to 5 on each of the two aspects.

49 Experiment Document Summarization Experimental Setup
Two summarization systems' results were compared (Centroid vs. Graph). The final score of a summary on one aspect is the average of the scores assigned by the four subjects, and the overall scores are averaged across all subjects and all 25 document sets. Presenter's note: The centroid-based and graph-based systems are compared; the final score averages the four subjects' ratings and then the results over the 25 document sets.

50 Experiment Document Summarization Automatic Evaluation Results
The combination weight λ in EUSUM can be set to any non-negative value; it ranges from 0 to 1 in the experiments, because a much larger λ would lead to a big sacrifice of content informativeness in the summary. The penalty degree factor ω for EUSUM is set to 10. Presenter's note: For the combined sentence score, λ ranges from 0 to 1 and ω is set to 10.

51 Experiment Document Summarization Automatic Evaluation Results
Observe that with the increase of λ, the summary's reading easiness quickly becomes significantly different from that of the summary with λ=0, while the summary's content informativeness is not significantly affected when λ is set to a small value. Presenter's note: This experiment examines, under the centroid-based method, how λ affects the two aspects (informativeness and reading easiness); as λ increases, reading easiness improves clearly over λ=0 while content informativeness is hardly affected, because adjusting λ changes which sentences enter the summary.


53 Experiment Document Summarization Automatic Evaluation Results
When λ is set to a small value, the content informativeness of the extracted summary is almost unaffected, but its reading easiness can be significantly improved. Presenter's note: This experiment examines the effect of varying λ under the graph-based method; a small λ barely changes the summary's informativeness, but any non-zero λ already has a large effect on reading easiness.

54 Experiment Document Summarization Automatic Evaluation Results
When λ is fixed, the ROUGE-1, ROUGE-W and ROUGE-SU* scores of EUSUM(Graph) are higher than the corresponding scores of EUSUM(Centroid), and EUSUM(Graph) can extract more easy-to-understand summaries than EUSUM(Centroid). Presenter's note: With λ fixed, the graph-based scores are higher than the centroid-based ones; in other words, the graph-based method is better suited to producing easy-to-read summaries.

55 Experiment Document Summarization Automatic Evaluation Results
For example, sentence length is one of the important features for reading easiness prediction, and a shorter sentence is more likely to be easy to understand. Presenter's note: Sentence length is an important factor; when λ is larger, so that reading easiness carries more weight, the average sentence length in the summary becomes shorter.

56 How the penalty weight ω influences the two aspects of the proposed summarization system.
The reading easiness scores of EUSUM(Graph) under different settings tend to increase as ω increases; after ω exceeds 10, the reading easiness scores for most settings no longer change. Presenter's note: This slide examines how the penalty weight ω affects the two aspects (reading easiness and content informativeness); reading easiness tends to increase with ω, but the change levels off around ω = 10, which is why ω = 10 was used in the earlier experiments.

57 Experiment Document Summarization User Study Results
In order to validate the effectiveness of the system with real non-native readers, two user study procedures were performed. User study 1: the summaries extracted by EUSUM(Centroid) (λ=0) and EUSUM(Graph) (λ=0.2) are compared and scored by the subjects. Presenter's note: Two user studies validate the system; the first compares the centroid-based system with λ=0 against the graph-based system with λ=0.2.

58 Experiment Document Summarization User Study Results
The user study results verify that the summaries produced by EUSUM(Graph) (λ=0.2) are indeed significantly easier for non-native readers to understand, while the content informativeness of the two systems is not significantly different. Presenter's note: As the results table shows, the graph-based summaries are indeed easier for non-native students to read, with no significant difference in content informativeness.

59 Experiment Document Summarization User Study Results
User study 2: the summaries extracted by EUSUM(Graph) (λ=0) and EUSUM(Graph) (λ=0.3) are compared and scored by the subjects. The results verify that the summaries produced by EUSUM(Graph) (λ=0.3) are indeed significantly easier for non-native readers to understand, while the content informativeness of the two systems is not significantly different. Presenter's note: The second study compares the same graph-based method with λ=0 and λ=0.3; the λ=0.3 summaries are easier to read, again with no significant difference in informativeness.

60 Experiment Document Summarization Running Examples
EUSUM(Centroid) (λ=0) for D14: A U.S. Air Force F-111 fighter-bomber crashed today in Saudi Arabia, killing both crew members, U.S. military officials reported. ( ) A jet trainer crashed Sunday on the flight deck of the aircraft carrier Lexington in the Gulf of Mexico, killing five people, injuring at least two and damaging several aircraft. (3.182) U.S. Air Force war planes participating in Operation Desert Shield are flying again after they were ordered grounded for 24 hours following a rash of crashes. ( ) A U.S. military jet crashed today in a remote, forested area in northern Japan, but the pilot bailed out safely and was taken by helicopter to an American military base, officials said. ( ) Presenter's note: Next, two example summaries are shown.

61 Experiment Document Summarization Running Examples
EUSUM(Graph)(λ=0) for D14: The U.S. military aircraft crashed about 800 meters northeast of a Kadena Air Base runway and the crash site is within the air base's facilities. ( ) Two U.S. Air Force F-16 fighter jets crashed in the air today and exploded, an air force spokeswoman said. ( ) West German police spokesman Hugo Lenxweiler told the AP in a telephone interview that one of the pilots was killed in the accident. ( ) Even before Thursday's fatal crash, 12 major accidents of military aircraft had killed 95 people this year alone. ( ) Air Force Spokesman 1st Lt. Al Sattler said the pilot in the Black Forest crash ejected safely before the crash and was taken to Ramstein Air Base to be examined. ( )

62 Conclusion & Future Work
Investigate the new factor of reading easiness for document summarization. Propose a novel summarization system, EUSUM, for producing easy-to-understand summaries for non-native readers. Presenter's note: The paper introduces the new factor of reading easiness and proposes a summarization system aimed at non-native readers.

63 Conclusion & Future Work
Future work will improve the summary's reading easiness in the following two ways: 1) Summary fluency (e.g. sentence ordering in a summary) influences the reading easiness of a summary, and the fluency factor will be considered in the summarization system. 2) More sophisticated sentence reduction and sentence simplification techniques will be investigated for improving the summary's readability. Presenter's note: Future work also starts from reading easiness: incorporate summary fluency, and use sentence reduction and simplification to improve readability.

