A progressive sentence selection strategy for document summarization Presenter: Bo-Sheng Wang Authors: You Ouyang, Wenjie Li, Renxian Zhang, Sujian Li, Qin Lu IPM 2013


Outline: Motivation, Objectives, Methodology, Experiments, Conclusions, Comments

Motivation Since there are many overlapping concepts in the input documents, it is unnecessary and redundant to mention one concept repeatedly in the summary.

Objectives They mainly consider the problem of how to construct summaries with good saliency and coverage.

Methodology In this paper, they propose a novel sentence selection strategy that follows a progressive way to select the summary sentences.

Methodology Step 1: Define the subsuming relationship between two sentences. The relationship between two sentences is determined by the relations between their concepts.

Methodology Step 1-1: The target is to study the subsuming relations between the words in the input documents, based on (1) a linguistic relation database (WordNet) and (2) frequency-based statistics (co-occurrence).

Methodology They expect the relations to have the characteristics listed below: 1. Sentence-level coverage. 2. Set-based coverage. 3. Transitive reduction.

Methodology - Sentence-level coverage In document summarization, a document set sometimes consists of only a few documents (for example, 10 documents per set in the DUC 2004 data). They therefore intend to study sentence-level co-occurrence statistics instead of document-level co-occurrence.

Methodology - Set-based coverage Sentence-level co-occurrence is sparser than document-level co-occurrence due to the shorter length of sentences. Therefore, the sentence-level coverage of a word with respect to another word is usually much smaller. They intend to examine the coverage not only between two words, but also between a word and a word set. (For example, there are two common phrases "King Norodom" and "Prince Norodom". In the input documents, the coverage of "Norodom" with respect to either "King" or "Prince" is not large enough, and thus "Norodom" is not recognized as subsumed by either of the two. On the other hand, "Norodom" is almost entirely covered by the set {"King", "Prince"}. Therefore, if a set-based coverage is defined, more relations can be discovered.)

Methodology - Transitive reduction They also conduct a transitive reduction on the relations, i.e., for three words a, b, c that satisfy a > b, b > c and a > c (where a > b denotes a subsuming b), the long-range relation a > c is ignored, since they prefer to include the subsuming word b in the summary before the subsumed word c.
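The transitive-reduction step can be sketched as follows. This is an illustrative sketch, not the authors' code: the function name and the pair-set representation of the relations are assumptions.

```python
def transitive_reduction(relations):
    """Drop long-range relations a > c when a > b and b > c already hold,
    keeping only the direct subsuming links.

    `relations` is a set of (a, b) pairs meaning "a subsumes b".
    (Illustrative sketch; representation is an assumption.)
    """
    reduced = set(relations)
    for a, c in relations:
        # (a, c) is redundant if some intermediate word b links a to c.
        for b in {y for x, y in relations if x == a}:
            if b != c and (b, c) in relations:
                reduced.discard((a, c))
                break
    return reduced

# Example: a > b, b > c, a > c  ->  the shortcut a > c is removed.
rels = {("a", "b"), ("b", "c"), ("a", "c")}
print(sorted(transitive_reduction(rels)))  # [('a', 'b'), ('b', 'c')]
```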

Methodology - Necessary measures Spanned Sentence Set (SPAN): SPAN(w) is the spanned sentence set of a word w in document set D, where S_D denotes the set of all sentences in D. SPAN(w) = {s | s ∈ S_D ∧ w ∈ s}, i.e., the set of sentences that contain w. Concept Coverage (COV): COV(w|W) = |SPAN(w) ∩ (∪_i SPAN(w_i))| / |SPAN(w)|, defined as the proportion of the sentences in SPAN(w) that also appear in the spanned sets of the words w_i in W.
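A minimal sketch of SPAN and COV on a toy corpus; the function names and the set-of-words sentence representation are assumptions for illustration.

```python
def span(w, sentences):
    """SPAN(w): indices of the sentences in the document set that contain word w."""
    return {i for i, s in enumerate(sentences) if w in s}

def cov(w, W, sentences):
    """COV(w|W): proportion of SPAN(w) covered by the union of SPAN(w_i), w_i in W."""
    sp = span(w, sentences)
    if not sp:
        return 0.0
    union = set().union(*(span(wi, sentences) for wi in W))
    return len(sp & union) / len(sp)

# Toy data echoing the "King/Prince Norodom" example from the slides.
docs = [
    {"king", "norodom"},
    {"prince", "norodom"},
    {"king", "cambodia"},
]
print(cov("norodom", ["king"], docs))            # 0.5: "king" alone covers half
print(cov("norodom", ["king", "prince"], docs))  # 1.0: the set covers all of SPAN("norodom")
```

This mirrors the set-based-coverage argument: neither word alone reaches a high coverage of "norodom", but the set {"king", "prince"} covers it entirely.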

Methodology Step 1-2: (1) They define the concept of a "connected word": for W = {w_1, ..., w_l} and W' = {w'_1, ..., w'_m}, a word in W' is connected to a word w_i in W if there exist words w_l1, ..., w_lk ∈ W ∪ W' such that w_i < w_l1 ∧ w_l1 < w_l2 ∧ ... ∧ w_l(k-1) < w_lk ∧ w_lk < w'_1. (2) The conditional saliency of a sentence s given a previously selected sentence s' is calculated as a weighted sum over the importance of all its "connected words": CS(s|s') = Σ_{w_i ∈ s} log(max_{w'_j ∈ s'} CON(w_i|w'_j) · score(w_i))

Methodology Step 2: 1. Every word that is not subsumed by any other word is regarded as a general word and attached to ROOT-W. 2. The score of each unselected sentence is calculated from its conditional saliency to each selected sentence. Formula: Score(s|S_old) = max_{s_t ∈ S_old} {CS(s|s_t)} × 1/len(s) × (1 − pos(s)). Penalizing: score(w_i) = α · score(w_i)
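The progressive selection loop described above can be sketched as follows. This is an illustrative approximation, not the authors' implementation: `con` stands in for the CON(w|w') connection strengths, the seed sentence, the `1e-6` floor, and the `log(1 + x)` smoothing inside the saliency sum are assumptions, and pos(s) is approximated by the normalized sentence index.

```python
import math

def progressive_select(sentences, con, word_score, k, alpha=0.5):
    """Sketch of the progressive loop: each new sentence is scored by
    Score(s|S_old) = max_{s_t in S_old} CS(s|s_t) * 1/len(s) * (1 - pos(s)),
    and word scores are penalized by alpha once a sentence is selected.
    `con` maps (w, w') pairs to connection strengths (assumed helper data)."""
    word_score = dict(word_score)  # copy so penalization stays local
    n = len(sentences)

    def cs(s, s_prev):
        # CS(s|s'): sum over words of log of the best connection times the word score.
        # The 1e-6 floor and log(1 + x) smoothing are assumptions for stability.
        total = 0.0
        for w in s:
            best = max(con.get((w, wp), 1e-6) for wp in s_prev)
            total += math.log(1.0 + best * word_score.get(w, 0.0))
        return total

    selected = [0]  # assumption: a top general sentence seeds the summary
    while len(selected) < k:
        best_i, best_sc = None, float("-inf")
        for i, s in enumerate(sentences):
            if i in selected:
                continue
            sc = max(cs(s, sentences[t]) for t in selected)
            sc *= (1.0 / len(s)) * (1.0 - i / n)  # length and position factors
            if sc > best_sc:
                best_i, best_sc = i, sc
        selected.append(best_i)
        for w in sentences[best_i]:               # penalize covered words
            word_score[w] = alpha * word_score.get(w, 0.0)
    return selected

# Tiny example: the sentence sharing "norodom" with the seed wins the second slot.
sents = [["king", "norodom"], ["prince", "norodom"], ["cambodia", "vote"]]
con = {("prince", "king"): 0.8, ("norodom", "norodom"): 1.0, ("cambodia", "king"): 0.1}
ws = {"king": 1.0, "norodom": 0.9, "prince": 0.7, "cambodia": 0.5, "vote": 0.3}
sel = progressive_select(sents, con, ws, 2)
print(sel)  # [0, 1]
```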

Experiments Setup: (1) Evaluated on a generic multi-document summarization data set. (2) Evaluated on a query-focused multi-document summarization data set. Pre-processing: removing the stop-words and stemming the remaining words.

Experiments - Evaluation metrics ROUGE: the state-of-the-art automatic summarization evaluation toolkit; it mainly makes use of N-gram comparison. DUC
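A minimal ROUGE-N recall sketch illustrating the N-gram comparison; the official ROUGE toolkit additionally handles stemming, stop-word removal, multiple references, and F-measures.

```python
from collections import Counter

def rouge_n_recall(candidate, reference, n=2):
    """ROUGE-N recall: overlapping n-grams between a candidate summary and one
    reference, divided by the reference n-gram count. (Illustrative sketch.)"""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    cand, ref = ngrams(candidate.split()), ngrams(reference.split())
    # Clipped overlap: each n-gram counts at most as often as it appears in the reference.
    overlap = sum(min(c, ref[g]) for g, c in cand.items())
    total = sum(ref.values())
    return overlap / total if total else 0.0

print(rouge_n_recall("the cat sat on the mat", "the cat lay on the mat", n=2))  # 0.6
```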

Experiments - Generic summarization (Result tables not preserved in the transcript.)

Experiments - Query-focused summarization (Result table not preserved in the transcript.)

Conclusions The progressive system consistently performs better than the sequential system on every data set. The method is comparable with the best submitted systems. The results clearly demonstrate the advantages of the progressive sentence selection strategy in constructing summaries with better saliency and coverage.

Comments Advantages: the method achieves better saliency and coverage; in the unsupervised case, finding the number of categories can save some time. Applications: object discovery.

Comments Advantages: the method achieves better saliency and coverage. Disadvantage: the method takes more time than traditional methods. Applications: sentence selection.