Mutual-reinforcement document summarization using embedded graph based sentence clustering for storytelling. Zhengchen Zhang, Shuzhi Sam Ge, Hongsheng He.


IPM 2012. Presenter: Hao-Chin Chang, Department of Computer Science & Information Engineering, National Taiwan Normal University, 2011/03/09.

Outline
- Introduction
- Sentence ranking using embedded graph based sentence clustering
  - Document modeling
  - Embedded graph based sentence clustering
  - Mutual-reinforcement ranking algorithm
- Experiment
- Conclusion and future work

Introduction
There are three phases in the framework:
- Document modeling: the document is modeled as a weighted graph whose vertexes represent the sentences of the document.
- Sentence clustering: the sentences are clustered into different groups to find the latent topics in the story. To alleviate the influence of unrelated sentences in clustering, an embedding process is employed to optimize the document model.
- Sentence ranking: we propose a framework that considers the mutual effect between clusters, sentences and terms, instead of the relationship between documents, sentences and terms, to exploit cluster-level information.

Introduction
The sentences are clustered into different groups according to the distance between sentences. To alleviate the influence of unrelated sentences in sentence clustering, an embedding process is employed to optimize the graph vertexes.
The contributions of this paper are summarized as follows:
- An embedded graph based sentence clustering method is proposed for sentence grouping of a document, which is robust with respect to different cluster numbers.
- An iterative ranking method is presented which considers the mutual reinforcement between terms, sentences and sentence clusters.
- A document summarization framework considering sentence cluster information is proposed, and the framework is evaluated on the DUC data sets.

Sentence ranking using embedded graph based sentence clustering
To reduce the influence of low cosine-similarity weights and to enhance sentence clustering performance, an embedding algorithm is performed on the graph. The sentences are then ranked according to the mutual effects between sentences, terms and clusters, based on the following assumptions:
- A sentence is assigned a high rank if it is similar to many high-ranking clusters and contains many high-ranking terms.
- The rank of a cluster is high if it contains many high-ranking sentences and many high-ranking terms.
- The rank of a term is high if it appears in many high-ranking sentences and clusters.

Document modeling
The weight wij of an edge denotes the closeness between sentences si and sj, computed as the cosine similarity between the term vectors of the two sentences.
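The graph construction above can be sketched as follows. This is a minimal example using raw term counts; the paper's exact term weighting (e.g. TF-IDF) is not reproduced here, so the matrix `D` and function name are illustrative assumptions.

```python
import numpy as np

def similarity_graph(D):
    """Build the weighted sentence graph: w_ij is the cosine
    similarity between the term vectors of sentences s_i and s_j."""
    norms = np.linalg.norm(D, axis=0, keepdims=True)
    X = D / np.maximum(norms, 1e-12)   # unit-length sentence vectors
    W = X.T @ X                        # pairwise cosine similarities
    np.fill_diagonal(W, 0.0)           # no self-loop edges
    return W

# toy term-sentence matrix: 4 terms (rows) x 3 sentences (columns)
D = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])
W = similarity_graph(D)   # W[0, 1] == 0.5: sentences 0 and 1 share one term
```

The resulting symmetric matrix W is the adjacency matrix of the weighted sentence graph used in the following steps.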

Embedded graph based sentence clustering
To alleviate the influence of unrelated sentences, we embed the original sentence matrix D of a document into a lower-dimensional space, inspired by Locally Linear Embedding (LLE). A sentence di, a column vector of D, is expressed as a linear combination of its ni most similar sentences dj, where Ni denotes the set of sentences most similar to di.

Embedded graph based sentence clustering
In graph embedding, we minimize the following cost function of the approximation error to determine the optimal weight matrix rij:
ε(W) = Σi ‖di − Σ_{j∈Ni} rij dj‖²
where Wi = [ri1, …, rin_i] are the weights connecting di to its neighbors. Setting the partial derivatives with respect to each weight rij to zero, Wi is found by solving the resulting system of linear equations.
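The weight-solving step above can be sketched as the standard LLE least-squares solve. The function name, the regularization term, and the explicit neighbor list are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def reconstruction_weights(D, i, neighbors, reg=1e-3):
    """Solve for the weights r_ij that best reconstruct sentence d_i
    from its most similar sentences (the LLE weight step).

    D: (terms x sentences) matrix; neighbors: indices of N_i.
    """
    di = D[:, i]
    Z = D[:, neighbors] - di[:, None]    # shift neighbors so d_i is the origin
    C = Z.T @ Z                          # local Gram (covariance) matrix
    # regularize: C can be singular when neighbors are nearly collinear
    C = C + reg * np.trace(C) * np.eye(len(neighbors))
    w = np.linalg.solve(C, np.ones(len(neighbors)))
    return w / w.sum()                   # enforce the sum-to-one constraint
```

For example, a sentence vector lying exactly halfway between two neighbors gets weights [0.5, 0.5], reflecting that either neighbor reconstructs it equally well.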

Embedded graph based sentence clustering
The vectors of the embedded sentences with enhanced relationships, d̃i, are obtained by minimizing the same cost function over the embedded coordinates while keeping the weights fixed. The embedding operation preserves the relationship between each sentence and its set of neighbors, and its performance improves as more points are available in the graph.
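A minimal sketch of this embedding step, assuming the standard LLE solution of taking the bottom eigenvectors of M = (I − R)ᵀ(I − R); the function name and interface are hypothetical:

```python
import numpy as np

def embed(R, d_out):
    """Embed sentences given the reconstruction-weight matrix R
    (row i holds the weights r_ij, summing to one).

    Minimizing the reconstruction cost over the new coordinates
    yields the eigenvectors of M = (I - R)^T (I - R) with the
    smallest eigenvalues; the very first one is the constant
    vector (eigenvalue 0) and is discarded.
    """
    n = R.shape[0]
    IR = np.eye(n) - R
    M = IR.T @ IR
    vals, vecs = np.linalg.eigh(M)       # eigenvalues in ascending order
    return vecs[:, 1:d_out + 1]          # skip the constant eigenvector
```

Each row of the returned matrix is the low-dimensional coordinate of one sentence, on which the K-means or agglomerative clustering of the later experiments can be run.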

Mutual-reinforcement ranking algorithm
The cluster-level information and the latent theme information in the clusters are employed for document summarization. D is the sentence–term matrix of the document, and a cluster cj is represented by a vector that sums the vectors of all the sentences in the cluster. The weight of the edge connecting term tl and cluster cj is derived from this cluster vector.

Mutual-reinforcement ranking algorithm
r(si) is the rank of sentence si, r(cj) is the rank of cluster cj, and r(tl) is the rank of term tl. The ranks are updated iteratively, and the sentence with the highest score is selected as part of the summary.
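The three ranking assumptions can be sketched as a power-iteration-style update. The exact update rule and normalization in the paper may differ; the matrices `ST`, `SC`, `CT` and the weighting by a, b, c are an illustrative approximation:

```python
import numpy as np

def mutual_reinforce_rank(ST, SC, CT, a=0.5, b=0.5, c=1.0, iters=100):
    """Iteratively rank sentences, clusters and terms.

    ST: sentence-term weights, SC: sentence-cluster weights,
    CT: cluster-term weights.  Each rank vector is refreshed from
    the other two (weighted by a, b, c) and renormalized, so high
    ranks reinforce each other across the three levels.
    """
    ns, nt = ST.shape
    nc = SC.shape[1]
    rs, rc, rt = np.ones(ns), np.ones(nc), np.ones(nt)
    for _ in range(iters):
        rs = a * SC @ rc + b * ST @ rt      # sentences: from clusters + terms
        rc = a * SC.T @ rs + c * CT @ rt    # clusters: from sentences + terms
        rt = b * ST.T @ rs + c * CT.T @ rc  # terms: from sentences + clusters
        rs /= np.linalg.norm(rs)
        rc /= np.linalg.norm(rc)
        rt /= np.linalg.norm(rt)
    return rs, rc, rt
```

On a toy example where sentence 0 contains two terms and sentence 1 only one, the iteration ranks sentence 0 higher, matching the first assumption above.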

Experiment

Experiment
Baselines include K-means (KM) and agglomerative (AGG) clustering. The proposed MREG algorithm is named KM-TSC-EM; a variant using mutual reinforcement between sentences and clusters only is named KM-SC. TSC is short for Term, Sentence and Cluster; EM is short for Embedded graph based sentence clustering.

Experiment
Weight settings and ROUGE-1 scores:
- a = 0.25, b = 0.25, c = 0.50
- a = 0.50, b = 0.50, c = 1.0
- a = 0.750, b = 1.0, c = 1.0: ROUGE-1 score 0.31841
- a = 0.750, b = 1.0, c = 0.50: ROUGE-1 score 0.31769

Conclusion
The sentences were clustered into different groups, and an embedding process was employed to reduce the effect of unrelated sentences and to enhance sentence clustering performance. Performance comparison of different combinations of components showed that the algorithm improved system performance and was more robust with respect to different cluster numbers.

Camille Guinaudeau, Guillaume Gravier, Pascale Sébillot, "Enhancing lexical cohesion measure with confidence measures, semantic relations and language model interpolation for multimedia spoken content topic segmentation," Computer Speech & Language 2012. Keyword: confidence score.

(Figure: summary schematic diagram — training text corpus; relevance, term-frequency and context features; generalization; collection; summary generation over the corpus; KL(D||S); first-pass retrieval system KL(S||D); sentences … Sn.)